Paper Group AWR 2
Squared Earth Mover’s Distance-based Loss for Training Deep Neural Networks. LIFT: Learned Invariant Feature Transform. Wide & Deep Learning for Recommender Systems. Face Detection with End-to-End Integration of a ConvNet and a 3D Model. Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science. Improving the Neural Algorithm of Artistic Style. Modeling the Dynamics of Online Learning Activity. Decoupled Neural Interfaces using Synthetic Gradients. Deep Learning for Identifying Metastatic Breast Cancer. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Modelling Sentence Pairs with Tree-structured Attentive Encoder. A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation. Resnet in Resnet: Generalizing Residual Architectures. Learning from Simulated and Unsupervised Images through Adversarial Training. Coupled Generative Adversarial Networks.
Squared Earth Mover’s Distance-based Loss for Training Deep Neural Networks
Title | Squared Earth Mover’s Distance-based Loss for Training Deep Neural Networks |
Authors | Le Hou, Chen-Ping Yu, Dimitris Samaras |
Abstract | In the context of single-label classification, despite the huge success of deep learning, the commonly used cross-entropy loss function ignores the intricate inter-class relationships that often exist in real-life tasks such as age classification. In this work, we propose to leverage these relationships between classes by training deep nets with the exact squared Earth Mover’s Distance (also known as Wasserstein distance) for single-label classification. The squared EMD loss uses the predicted probabilities of all classes and penalizes mispredictions according to a ground distance matrix that quantifies the dissimilarities between classes. We demonstrate that on datasets with strong inter-class relationships, such as an ordering between classes, our exact squared EMD losses yield new state-of-the-art results. Furthermore, we propose a method to automatically learn this matrix using the CNN’s own features during training. We show that our method can learn a ground distance matrix efficiently with no inter-class relationship priors and yield the same performance gain. Finally, we show that our method can be generalized to applications that lack strong inter-class relationships and still maintain state-of-the-art performance. Therefore, with limited computational overhead, one can always deploy the proposed loss function in place of the conventional cross-entropy on any dataset. |
Tasks | |
Published | 2016-11-17 |
URL | http://arxiv.org/abs/1611.05916v4 |
PDF | http://arxiv.org/pdf/1611.05916v4.pdf |
PWC | https://paperswithcode.com/paper/squared-earth-movers-distance-based-loss-for |
Repo | https://github.com/luke321321/portfolio |
Framework | none |
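When the classes are ordered and the ground distance between neighbouring classes is 1, the squared EMD between two discrete distributions has a closed form as the squared difference of their CDFs. A minimal numpy sketch under that assumption (the paper's learned ground-distance-matrix variant is not shown):

```python
import numpy as np

def squared_emd_loss(p, q):
    """Squared Earth Mover's Distance between predicted class probabilities p
    and a one-hot target q, assuming ordered classes with unit ground distance
    between neighbours. Under that assumption the loss reduces to the squared
    difference of the two cumulative distributions."""
    return np.sum((np.cumsum(p) - np.cumsum(q)) ** 2)

# Toy age-classification example: predicting a class near the target (class 3)
# is penalised less than predicting a distant class.
target = np.array([0.0, 0.0, 0.0, 1.0])
near = np.array([0.1, 0.1, 0.7, 0.1])
far = np.array([0.7, 0.1, 0.1, 0.1])
print(squared_emd_loss(near, target) < squared_emd_loss(far, target))  # True
```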
LIFT: Learned Invariant Feature Transform
Title | LIFT: Learned Invariant Feature Transform |
Authors | Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, Pascal Fua |
Abstract | We introduce a novel Deep Network architecture that implements the full feature point handling pipeline, that is, detection, orientation estimation, and feature description. While previous works have successfully tackled each one of these problems individually, we show how to learn to do all three in a unified manner while preserving end-to-end differentiability. We then demonstrate that our Deep pipeline outperforms state-of-the-art methods on a number of benchmark datasets, without the need for retraining. |
Tasks | |
Published | 2016-03-30 |
URL | http://arxiv.org/abs/1603.09114v2 |
PDF | http://arxiv.org/pdf/1603.09114v2.pdf |
PWC | https://paperswithcode.com/paper/lift-learned-invariant-feature-transform |
Repo | https://github.com/cvlab-epfl/LIFT |
Framework | none |
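A structural sketch of the three-stage pipeline the abstract describes: detect a keypoint, estimate the patch orientation, then compute a descriptor. The stub functions below are placeholders for the paper's learned CNN components, which LIFT chains with differentiable crop and rotation layers:

```python
import numpy as np

# Hypothetical stand-ins for the three learned components; in LIFT each is a
# small network and the stages are connected end-to-end differentiably.
def detect(image):                  # score map -> keypoint location
    return np.unravel_index(np.argmax(image), image.shape)

def estimate_orientation(patch):    # patch -> angle in radians
    return 0.0

def describe(patch, angle):         # oriented patch -> fixed-length descriptor
    return np.resize(patch, 128)

def lift_pipeline(image, patch_size=32):
    y, x = detect(image)
    h = patch_size // 2
    patch = image[max(0, y - h):y + h, max(0, x - h):x + h]
    angle = estimate_orientation(patch)
    return describe(patch, angle)

descriptor = lift_pipeline(np.random.rand(256, 256))
print(descriptor.shape)  # (128,)
```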
Wide & Deep Learning for Recommender Systems
Title | Wide & Deep Learning for Recommender Systems |
Authors | Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, Hemal Shah |
Abstract | Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning—jointly trained wide linear models and deep neural networks—to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on Google Play, a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. We have also open-sourced our implementation in TensorFlow. |
Tasks | Click-Through Rate Prediction, Feature Engineering, Recommendation Systems |
Published | 2016-06-24 |
URL | http://arxiv.org/abs/1606.07792v1 |
PDF | http://arxiv.org/pdf/1606.07792v1.pdf |
PWC | https://paperswithcode.com/paper/wide-deep-learning-for-recommender-systems |
Repo | https://github.com/pollyyu/Final_Project_MachineLearning_in_TensorFlow_Berkeley |
Framework | tf |
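A toy numpy forward pass illustrating the combination the abstract describes: a wide linear model over cross-product features and a deep network over learned embeddings, whose logits are summed into one sigmoid for joint training. All dimensions and the single-embedding input are illustrative assumptions, not the production model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

# Toy dimensions (assumptions, not from the paper).
n_wide, vocab, emb_dim, hidden = 100, 1000, 8, 32
rng = np.random.default_rng(0)
w_wide = rng.normal(size=n_wide)            # wide: linear over cross-product features
emb = rng.normal(size=(vocab, emb_dim))     # deep: embeddings for sparse ids
W1 = rng.normal(size=(emb_dim, hidden))
w_deep = rng.normal(size=hidden)
b = 0.0

def wide_and_deep(x_wide, sparse_id):
    wide_logit = w_wide @ x_wide
    deep_logit = w_deep @ relu(emb[sparse_id] @ W1)
    # Joint training: both logits feed a single logistic output.
    return sigmoid(wide_logit + deep_logit + b)

print(wide_and_deep(rng.random(n_wide), 42))
```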
Face Detection with End-to-End Integration of a ConvNet and a 3D Model
Title | Face Detection with End-to-End Integration of a ConvNet and a 3D Model |
Authors | Yunzhu Li, Benyuan Sun, Tianfu Wu, Yizhou Wang |
Abstract | This paper presents a method for face detection in the wild, which integrates a ConvNet and a 3D mean face model in an end-to-end multi-task discriminative learning framework. The 3D mean face model is predefined and fixed (e.g., we used the one provided in the AFLW dataset). The ConvNet consists of two components: (i) The face proposal component computes face bounding box proposals via estimating facial key-points and the 3D transformation (rotation and translation) parameters for each predicted key-point w.r.t. the 3D mean face model. (ii) The face verification component computes detection results by pruning and refining proposals based on configuration pooling over facial key-points. The proposed method addresses two issues in adapting state-of-the-art generic object detection ConvNets (e.g., Faster R-CNN) for face detection: (i) One is to eliminate the heuristic design of predefined anchor boxes in the region proposal network (RPN) by exploiting a 3D mean face model. (ii) The other is to replace the generic RoI (Region-of-Interest) pooling layer with a configuration pooling layer to respect underlying object structures. The multi-task loss consists of three terms: the classification softmax loss and the location smooth L1 losses [14] of both the facial key-points and the face bounding boxes. In experiments, our ConvNet is trained on the AFLW dataset only and tested on the FDDB benchmark with fine-tuning and on the AFW benchmark without fine-tuning. The proposed method obtains very competitive state-of-the-art performance in the two benchmarks. |
Tasks | Face Detection, Face Verification, Object Detection |
Published | 2016-06-02 |
URL | http://arxiv.org/abs/1606.00850v3 |
PDF | http://arxiv.org/pdf/1606.00850v3.pdf |
PWC | https://paperswithcode.com/paper/face-detection-with-end-to-end-integration-of |
Repo | https://github.com/tfwu/FaceDetection-ConvNet-3D |
Framework | tf |
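A sketch of the three-term multi-task loss named in the abstract, combining a classification softmax loss with smooth L1 losses for keypoints and boxes. The weighting factors lam_kp and lam_box are assumptions for illustration:

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth L1 (Huber with delta = 1), as in Fast R-CNN [14]."""
    a = np.abs(x)
    return np.where(a < 1.0, 0.5 * x ** 2, a - 0.5)

def softmax_cross_entropy(logits, label):
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def multi_task_loss(cls_logits, label, kp_pred, kp_gt, box_pred, box_gt,
                    lam_kp=1.0, lam_box=1.0):
    # Three terms: classification softmax loss plus smooth-L1 losses for the
    # facial keypoints and the face bounding boxes (weights are assumptions).
    return (softmax_cross_entropy(cls_logits, label)
            + lam_kp * smooth_l1(kp_pred - kp_gt).sum()
            + lam_box * smooth_l1(box_pred - box_gt).sum())
```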
Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science
Title | Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science |
Authors | Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, Jason H. Moore |
Abstract | As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning—pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design. |
Tasks | Automated Feature Engineering, AutoML, Hyperparameter Optimization, Neural Architecture Search |
Published | 2016-03-20 |
URL | http://arxiv.org/abs/1603.06212v1 |
PDF | http://arxiv.org/pdf/1603.06212v1.pdf |
PWC | https://paperswithcode.com/paper/evaluation-of-a-tree-based-pipeline |
Repo | https://github.com/rhiever/tpot |
Framework | none |
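A typical usage sketch for the open-source TPOT implementation linked above, using its basic TPOTClassifier interface (argument defaults and exact behaviour may differ across releases):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Evolve pipelines for 5 generations with a population of 20 candidates.
tpot = TPOTClassifier(generations=5, population_size=20,
                      random_state=42, verbosity=2)
tpot.fit(X_tr, y_tr)
print(tpot.score(X_te, y_te))
tpot.export('best_pipeline.py')  # writes the winning pipeline as Python code
```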
Improving the Neural Algorithm of Artistic Style
Title | Improving the Neural Algorithm of Artistic Style |
Authors | Roman Novak, Yaroslav Nikulin |
Abstract | In this work we investigate different avenues of improving the Neural Algorithm of Artistic Style (by Leon A. Gatys, Alexander S. Ecker and Matthias Bethge, arXiv:1508.06576). While showing great results when transferring homogeneous and repetitive patterns, the original style representation often fails to capture more complex properties, like having separate styles of foreground and background. This leads to visual artifacts and undesirable textures appearing in unexpected regions when performing style transfer. We tackle this issue with a variety of approaches, mostly by modifying the style representation in order for it to capture more information and impose a tighter constraint on the style transfer result. In our experiments, we subjectively evaluate our best method as producing improvements in style transfer quality ranging from barely noticeable to significant. |
Tasks | Style Transfer |
Published | 2016-05-15 |
URL | http://arxiv.org/abs/1605.04603v1 |
PDF | http://arxiv.org/pdf/1605.04603v1.pdf |
PWC | https://paperswithcode.com/paper/improving-the-neural-algorithm-of-artistic |
Repo | https://github.com/telecombcn-dl/2018-dlai-team5 |
Framework | tf |
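For reference, the style representation the paper sets out to improve is the layer-wise Gram matrix of CNN activations from Gatys et al. A minimal numpy version with one common normalisation is sketched below; the paper's modifications to this representation are not shown:

```python
import numpy as np

def gram_matrix(features):
    """Gram-matrix style representation: `features` is a (channels, h*w)
    activation map from one CNN layer. Normalising by c * n is one common
    convention; exact scaling varies between implementations."""
    c, n = features.shape
    return features @ features.T / (c * n)

def style_loss(gen_feats, style_feats):
    g_gen, g_style = gram_matrix(gen_feats), gram_matrix(style_feats)
    return np.mean((g_gen - g_style) ** 2)

rng = np.random.default_rng(0)
f_gen, f_style = rng.random((64, 32 * 32)), rng.random((64, 32 * 32))
print(style_loss(f_gen, f_style))
```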
Modeling the Dynamics of Online Learning Activity
Title | Modeling the Dynamics of Online Learning Activity |
Authors | Charalampos Mavroforakis, Isabel Valera, Manuel Gomez Rodriguez |
Abstract | People are increasingly relying on the Web and social media to find solutions to their problems in a wide range of domains. In this online setting, closely related problems often lead to the same characteristic learning pattern, in which people sharing these problems visit related pieces of information, perform almost identical queries or, more generally, take a series of similar actions. In this paper, we introduce a novel modeling framework for clustering continuous-time grouped streaming data, the hierarchical Dirichlet Hawkes process (HDHP), which allows us to automatically uncover a wide variety of learning patterns from detailed traces of learning activity. Our model allows for efficient inference, scaling to millions of actions taken by thousands of users. Experiments on real data gathered from Stack Overflow reveal that our framework can recover meaningful learning patterns in terms of both content and temporal dynamics, as well as accurately track users’ interests and goals over time. |
Tasks | |
Published | 2016-10-18 |
URL | http://arxiv.org/abs/1610.05775v1 |
PDF | http://arxiv.org/pdf/1610.05775v1.pdf |
PWC | https://paperswithcode.com/paper/modeling-the-dynamics-of-online-learning |
Repo | https://github.com/Networks-Learning/hdhp.py |
Framework | none |
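The self-exciting building block underneath the hierarchical Dirichlet Hawkes process is the plain Hawkes process. A minimal sketch of its conditional intensity with an exponential kernel follows; the hierarchical Dirichlet machinery from the paper is not shown, and all parameter values are placeholders:

```python
import numpy as np

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, omega=1.0):
    """Conditional intensity of a Hawkes process with exponential kernel:
    lambda(t) = mu + alpha * sum_{t_i < t} omega * exp(-omega * (t - t_i)).
    Each past event temporarily raises the rate of future events."""
    past = np.asarray([ti for ti in event_times if ti < t])
    return mu + alpha * np.sum(omega * np.exp(-omega * (t - past)))

events = [0.5, 1.2, 1.3, 4.0]
print(hawkes_intensity(2.0, events))   # elevated by the burst around t = 1.2
print(hawkes_intensity(10.0, events))  # decays back toward the base rate mu
```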
Decoupled Neural Interfaces using Synthetic Gradients
Title | Decoupled Neural Interfaces using Synthetic Gradients |
Authors | Max Jaderberg, Wojciech Marian Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, David Silver, Koray Kavukcuoglu |
Abstract | Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating an error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In this work we break this constraint by decoupling modules: we introduce a model of the future computation of the network graph which predicts what the modelled subgraph will produce using only local information. In particular we focus on modelling error gradients: by using the modelled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously, i.e. we realise decoupled neural interfaces. We show results for feed-forward models, where every layer is trained asynchronously; recurrent neural networks (RNNs), where predicting one’s future gradient extends the time over which the RNN can effectively model; and a hierarchical RNN system with ticking at different timescales. Finally, we demonstrate that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass – amounting to independent networks which co-learn such that they can be composed into a single functioning corporation. |
Tasks | |
Published | 2016-08-18 |
URL | http://arxiv.org/abs/1608.05343v2 |
PDF | http://arxiv.org/pdf/1608.05343v2.pdf |
PWC | https://paperswithcode.com/paper/decoupled-neural-interfaces-using-synthetic |
Repo | https://github.com/TheoryDev/Deep-neural-network-training-optimisation |
Framework | pytorch |
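A minimal numpy sketch of the decoupling idea: a small model predicts the gradient of the loss with respect to a module's activation from that activation alone, the module updates immediately with the prediction, and the predictor is later regressed toward the true gradient once it arrives. The linear parameterisation and learning rate are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, lr = 16, 0.01
M = rng.normal(scale=0.01, size=(d, d))   # linear synthetic-gradient model

def synthetic_gradient(h):
    # Predict dL/dh from the activation h alone (local information only).
    return h @ M

def decoupled_update(W, x, h):
    # The module updates immediately with the predicted gradient, without
    # waiting for the rest of the network to finish (the decoupling step).
    return W - lr * np.outer(x, synthetic_gradient(h))

def fit_sg_model(h, true_grad):
    # When the true backpropagated gradient eventually arrives, regress the
    # synthetic-gradient model toward it (an L2 regression step).
    global M
    M -= lr * np.outer(h, synthetic_gradient(h) - true_grad)
```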
Deep Learning for Identifying Metastatic Breast Cancer
Title | Deep Learning for Identifying Metastatic Breast Cancer |
Authors | Dayong Wang, Aditya Khosla, Rishab Gargeya, Humayun Irshad, Andrew H. Beck |
Abstract | The International Symposium on Biomedical Imaging (ISBI) held a grand challenge to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies. Our team won both competitions in the grand challenge, obtaining an area under the receiver operating characteristic curve (AUC) of 0.925 for the task of whole slide image classification and a score of 0.7051 for the tumor localization task. A pathologist independently reviewed the same images, obtaining a whole slide image classification AUC of 0.966 and a tumor localization score of 0.733. Combining our deep learning system’s predictions with the human pathologist’s diagnoses increased the pathologist’s AUC to 0.995, representing an approximately 85 percent reduction in human error rate. These results demonstrate the power of using deep learning to produce significant improvements in the accuracy of pathological diagnoses. |
Tasks | Image Classification |
Published | 2016-06-18 |
URL | http://arxiv.org/abs/1606.05718v1 |
PDF | http://arxiv.org/pdf/1606.05718v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-identifying-metastatic |
Repo | https://github.com/martin-fabbri/kaggle-histopathologic-cancer-detector |
Framework | none |
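A quick check of the quoted 85 percent figure, reading one minus AUC as a rough error rate (an interpretation implied, not stated, by the abstract):

```python
# Pathologist alone: AUC 0.966 -> "error" 0.034.
# Pathologist + deep learning: AUC 0.995 -> "error" 0.005.
reduction = (0.034 - 0.005) / 0.034
print(f"{reduction:.0%}")  # ~85%
```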
V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation
Title | V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation |
Authors | Fausto Milletari, Nassir Navab, Seyed-Ahmad Ahmadi |
Abstract | Convolutional Neural Networks (CNNs) have been recently employed to solve problems from both the computer vision and medical image analysis fields. Despite their popularity, most approaches are only able to process 2D images while most medical data used in clinical practice consists of 3D volumes. In this work we propose an approach to 3D image segmentation based on a volumetric, fully convolutional, neural network. Our CNN is trained end-to-end on MRI volumes depicting the prostate, and learns to predict segmentation for the whole volume at once. We introduce a novel objective function based on the Dice coefficient, which we optimise during training. In this way we can deal with situations where there is a strong imbalance between the number of foreground and background voxels. To cope with the limited number of annotated volumes available for training, we augment the data by applying random non-linear transformations and histogram matching. We show in our experimental evaluation that our approach achieves good performance on challenging test data while requiring only a fraction of the processing time needed by other previous methods. |
Tasks | Medical Image Segmentation, Semantic Segmentation, Volumetric Medical Image Segmentation |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.04797v1 |
PDF | http://arxiv.org/pdf/1606.04797v1.pdf |
PWC | https://paperswithcode.com/paper/v-net-fully-convolutional-neural-networks-for |
Repo | https://github.com/alexbmp/run-vnet-keras |
Framework | none |
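A minimal numpy sketch of a Dice-based objective in the spirit of the paper; the squared terms in the denominator follow the V-Net formulation, while the epsilon smoothing term is an implementation assumption:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-7):
    """Dice-based objective: `pred` holds per-voxel foreground probabilities,
    `target` holds binary labels. Because Dice is a ratio of overlap to total
    foreground mass, it is insensitive to the background/foreground imbalance
    that plagues per-voxel cross-entropy on medical volumes."""
    pred, target = pred.ravel(), target.ravel()
    dice = 2.0 * np.sum(pred * target) / (np.sum(pred ** 2)
                                          + np.sum(target ** 2) + eps)
    return 1.0 - dice

# A perfect prediction gives loss ~0 even with one foreground voxel in 512.
gt = np.zeros((8, 8, 8)); gt[4, 4, 4] = 1.0
print(dice_loss(gt, gt))  # ~0.0
```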
Modelling Sentence Pairs with Tree-structured Attentive Encoder
Title | Modelling Sentence Pairs with Tree-structured Attentive Encoder |
Authors | Yao Zhou, Cong Liu, Yan Pan |
Abstract | We describe an attentive encoder that combines tree-structured recursive neural networks and sequential recurrent neural networks for modelling sentence pairs. Since existing attentive models exert attention on the sequential structure, we propose a way to incorporate attention into the tree topology. Specifically, given a pair of sentences, our attentive encoder uses the representation of one sentence, generated via an RNN, to guide the structural encoding of the other sentence on its dependency parse tree. We evaluate the proposed attentive encoder on three tasks: semantic similarity, paraphrase identification and true-false question selection. Experimental results show that our encoder outperforms all baselines and achieves state-of-the-art results on two tasks. |
Tasks | Paraphrase Identification, Semantic Similarity, Semantic Textual Similarity |
Published | 2016-10-10 |
URL | http://arxiv.org/abs/1610.02806v1 |
PDF | http://arxiv.org/pdf/1610.02806v1.pdf |
PWC | https://paperswithcode.com/paper/modelling-sentence-pairs-with-tree-structured |
Repo | https://github.com/yoosan/sentpair |
Framework | none |
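A hedged sketch of the guiding mechanism the abstract describes: the RNN representation of one sentence scores the nodes of the other sentence's dependency tree. The additive-attention parameterisation (W, U, v) is a common choice assumed here, not necessarily the paper's exact form:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def guided_tree_attention(node_states, sentence_vec, W, U, v):
    """node_states: (n, d) array of tree-node representations of sentence B;
    sentence_vec: RNN representation of sentence A that guides the attention.
    Returns an attention-weighted summary of B's tree nodes."""
    scores = np.array([v @ np.tanh(W @ h + U @ sentence_vec)
                       for h in node_states])
    weights = softmax(scores)       # how relevant each node is to sentence A
    return weights @ node_states

rng = np.random.default_rng(0)
n, d, ds, k = 5, 16, 16, 8
summary = guided_tree_attention(rng.normal(size=(n, d)), rng.normal(size=ds),
                                rng.normal(size=(k, d)), rng.normal(size=(k, ds)),
                                rng.normal(size=k))
print(summary.shape)  # (16,)
```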
A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation
Title | A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation |
Authors | Junyoung Chung, Kyunghyun Cho, Yoshua Bengio |
Abstract | The existing machine translation systems, whether phrase-based or neural, have relied almost exclusively on word-level modelling with explicit segmentation. In this paper, we ask a fundamental question: can neural machine translation generate a character sequence without any explicit segmentation? To answer this question, we evaluate an attention-based encoder-decoder with a subword-level encoder and a character-level decoder on four language pairs (En-Cs, En-De, En-Ru and En-Fi) using the parallel corpora from WMT’15. Our experiments show that the models with a character-level decoder outperform the ones with a subword-level decoder on all of the four language pairs. Furthermore, the ensembles of neural models with a character-level decoder outperform the state-of-the-art non-neural machine translation systems on En-Cs, En-De and En-Fi and perform comparably on En-Ru. |
Tasks | Machine Translation |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06147v4 |
PDF | http://arxiv.org/pdf/1603.06147v4.pdf |
PWC | https://paperswithcode.com/paper/a-character-level-decoder-without-explicit |
Repo | https://github.com/nyu-dl/dl4mt-cdec |
Framework | none |
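A toy sketch of character-level greedy decoding, the key property being that the output vocabulary is just the character set, so no segmentation is ever needed. The recurrent cell and weights below are random stand-ins, not the paper's bi-scale recurrent decoder:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def greedy_char_decode(context, step, W_out, chars, max_len=50, eos='\n'):
    # `step` is any recurrent cell (h, context) -> h'; W_out maps the hidden
    # state to logits over the raw character vocabulary `chars`.
    h = np.zeros(W_out.shape[1])
    out = []
    for _ in range(max_len):
        h = step(h, context)
        c = chars[int(np.argmax(softmax(W_out @ h)))]
        if c == eos:
            break
        out.append(c)
    return ''.join(out)

# Toy usage with random weights; a real system learns these end-to-end.
rng = np.random.default_rng(0)
d = 32
chars = list("abcdefghij") + ['\n']
W_in = rng.normal(size=(d, d))
W_out = rng.normal(size=(len(chars), d))
step = lambda h, ctx: np.tanh(W_in @ h + ctx)
print(greedy_char_decode(rng.normal(size=d), step, W_out, chars))
```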
Resnet in Resnet: Generalizing Residual Architectures
Title | Resnet in Resnet: Generalizing Residual Architectures |
Authors | Sasha Targ, Diogo Almeida, Kevin Lyman |
Abstract | Residual networks (ResNets) have recently achieved state-of-the-art on challenging computer vision tasks. We introduce Resnet in Resnet (RiR): a deep dual-stream architecture that generalizes ResNets and standard CNNs and is easily implemented with no computational overhead. RiR consistently improves performance over ResNets, outperforms architectures with similar amounts of augmentation on CIFAR-10, and establishes a new state-of-the-art on CIFAR-100. |
Tasks | |
Published | 2016-03-25 |
URL | http://arxiv.org/abs/1603.08029v1 |
PDF | http://arxiv.org/pdf/1603.08029v1.pdf |
PWC | https://paperswithcode.com/paper/resnet-in-resnet-generalizing-residual |
Repo | https://github.com/osmr/imgclsmob |
Framework | mxnet |
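A fully-connected analogue of the generalized residual block (the paper uses convolutions): a residual stream with an identity shortcut and a transient stream without one, each layer mixing information from both streams:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def rir_block(r, t, W_rr, W_tr, W_rt, W_tt):
    """Generalized residual block, dense analogue of the RiR unit: r is the
    residual stream, t the transient stream. Only r keeps an identity
    shortcut, so t is free to discard information layer to layer."""
    r_next = relu(r @ W_rr + t @ W_tr + r)   # identity shortcut on r only
    t_next = relu(r @ W_rt + t @ W_tt)       # no shortcut: t can forget
    return r_next, t_next

rng = np.random.default_rng(0)
d = 8
Ws = [rng.normal(scale=0.1, size=(d, d)) for _ in range(4)]
r, t = rng.normal(size=d), rng.normal(size=d)
r, t = rir_block(r, t, *Ws)
print(r.shape, t.shape)  # (8,) (8,)
```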
Learning from Simulated and Unsupervised Images through Adversarial Training
Title | Learning from Simulated and Unsupervised Images through Adversarial Training |
Authors | Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, Russ Webb |
Abstract | With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations. However, learning from synthetic images may not achieve the desired performance due to a gap between synthetic and real image distributions. To reduce this gap, we propose Simulated+Unsupervised (S+U) learning, where the task is to learn a model to improve the realism of a simulator’s output using unlabeled real data, while preserving the annotation information from the simulator. We develop a method for S+U learning that uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors. We make several key modifications to the standard GAN algorithm to preserve annotations, avoid artifacts, and stabilize training: (i) a ‘self-regularization’ term, (ii) a local adversarial loss, and (iii) updating the discriminator using a history of refined images. We show that this enables generation of highly realistic images, which we demonstrate both qualitatively and with a user study. We quantitatively evaluate the generated images by training models for gaze estimation and hand pose estimation. We show a significant improvement over using synthetic images, and achieve state-of-the-art results on the MPIIGaze dataset without any labeled real data. |
Tasks | Domain Adaptation, Gaze Estimation, Hand Pose Estimation, Image-to-Image Translation, Pose Estimation |
Published | 2016-12-22 |
URL | http://arxiv.org/abs/1612.07828v2 |
PDF | http://arxiv.org/pdf/1612.07828v2.pdf |
PWC | https://paperswithcode.com/paper/learning-from-simulated-and-unsupervised |
Repo | https://github.com/shinseung428/simGAN_NYU_Hand |
Framework | tf |
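A minimal sketch of modification (iii) from the abstract, the history of refined images used when updating the discriminator; the capacity and random-eviction policy are assumptions:

```python
import random

class RefinedImageHistory:
    """History buffer in the spirit of S+U learning: the discriminator sees a
    mix of freshly refined images and samples from past refined images, which
    keeps it from forgetting artifacts the refiner used to produce and
    stabilises adversarial training."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = []

    def sample_batch(self, fresh, k):
        # Return k images from history (fresh ones while history is short),
        # then store the fresh batch, evicting random old entries when full.
        old = (random.sample(self.buffer, k) if len(self.buffer) >= k
               else list(fresh[:k]))
        for img in fresh:
            if len(self.buffer) < self.capacity:
                self.buffer.append(img)
            else:
                self.buffer[random.randrange(self.capacity)] = img
        return old

history = RefinedImageHistory(capacity=4)
print(history.sample_batch(['img0', 'img1'], k=2))
```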
Coupled Generative Adversarial Networks
Title | Coupled Generative Adversarial Networks |
Authors | Ming-Yu Liu, Oncel Tuzel |
Abstract | We propose coupled generative adversarial network (CoGAN) for learning a joint distribution of multi-domain images. In contrast to the existing approaches, which require tuples of corresponding images in different domains in the training set, CoGAN can learn a joint distribution without any tuple of corresponding images. It can learn a joint distribution with just samples drawn from the marginal distributions. This is achieved by enforcing a weight-sharing constraint that limits the network capacity and favors a joint distribution solution over a product of marginal distributions. We apply CoGAN to several joint distribution learning tasks, including learning a joint distribution of color and depth images, and learning a joint distribution of face images with different attributes. For each task it successfully learns the joint distribution without any tuple of corresponding images. We also demonstrate its applications to domain adaptation and image transformation. |
Tasks | Domain Adaptation, Image-to-Image Translation |
Published | 2016-06-24 |
URL | http://arxiv.org/abs/1606.07536v2 |
PDF | http://arxiv.org/pdf/1606.07536v2.pdf |
PWC | https://paperswithcode.com/paper/coupled-generative-adversarial-networks |
Repo | https://github.com/eriklindernoren/PyTorch-GAN |
Framework | pytorch |
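A toy numpy sketch of the weight-sharing constraint: the two generators share the early layers that decode high-level semantics and keep separate output layers for domain-specific low-level detail. All shapes and the two-layer depth are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
W_shared = rng.normal(size=(64, 128))   # shared by both generators
W_out_a = rng.normal(size=(128, 784))   # domain A head (e.g. color)
W_out_b = rng.normal(size=(128, 784))   # domain B head (e.g. depth)

def generate_pair(z):
    # Identical early weights force the same noise vector to decode to the
    # same high-level content, yielding corresponding images in both domains
    # even though training never sees paired examples.
    h = relu(z @ W_shared)
    return h @ W_out_a, h @ W_out_b

img_a, img_b = generate_pair(rng.normal(size=64))
print(img_a.shape, img_b.shape)  # (784,) (784,)
```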