Paper Group AWR 2
Squared Earth Mover’s Distance-based Loss for Training Deep Neural Networks. LIFT: Learned Invariant Feature Transform. Wide & Deep Learning for Recommender Systems. Face Detection with End-to-End Integration of a ConvNet and a 3D Model. Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science. Improving the Neural Algorithm of Artistic Style. Modeling the Dynamics of Online Learning Activity. Decoupled Neural Interfaces using Synthetic Gradients. Deep Learning for Identifying Metastatic Breast Cancer. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Modelling Sentence Pairs with Tree-structured Attentive Encoder. A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation. Resnet in Resnet: Generalizing Residual Architectures. Learning from Simulated and Unsupervised Images through Adversarial Training. Coupled Generative Adversarial Networks.
Squared Earth Mover’s Distance-based Loss for Training Deep Neural Networks
Title | Squared Earth Mover’s Distance-based Loss for Training Deep Neural Networks |
Authors | Le Hou, Chen-Ping Yu, Dimitris Samaras |
Abstract | In the context of single-label classification, despite the huge success of deep learning, the commonly used cross-entropy loss function ignores the intricate inter-class relationships that often exist in real-life tasks such as age classification. In this work, we propose to leverage these relationships between classes by training deep nets with the exact squared Earth Mover’s Distance (also known as Wasserstein distance) for single-label classification. The squared EMD loss uses the predicted probabilities of all classes and penalizes mispredictions according to a ground distance matrix that quantifies the dissimilarities between classes. We demonstrate that on datasets with strong inter-class relationships, such as an ordering between classes, our exact squared EMD losses yield new state-of-the-art results. Furthermore, we propose a method to automatically learn this matrix using the CNN’s own features during training. We show that our method can learn a ground distance matrix efficiently with no inter-class relationship priors and yield the same performance gain. Finally, we show that our method can be generalized to applications that lack strong inter-class relationships and still maintain state-of-the-art performance. Therefore, with limited computational overhead, one can always deploy the proposed loss function in place of the conventional cross-entropy on any dataset. |
Tasks | |
Published | 2016-11-17 |
URL | http://arxiv.org/abs/1611.05916v4 |
PDF | http://arxiv.org/pdf/1611.05916v4.pdf |
PWC | https://paperswithcode.com/paper/squared-earth-movers-distance-based-loss-for |
Repo | https://github.com/luke321321/portfolio |
Framework | none |
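When the classes are ordered and the ground distance between neighbouring classes is 1, the squared EMD between two discrete distributions has a closed form as the squared difference of their CDFs. A minimal numpy sketch under that assumption (the paper's learned ground-distance-matrix variant is not shown):

```python
import numpy as np

def squared_emd_loss(p, q):
    """Squared Earth Mover's Distance between predicted class probabilities p
    and a one-hot target q, assuming ordered classes with unit ground distance
    between neighbours. Under that assumption the loss reduces to the squared
    difference of the two cumulative distributions."""
    return np.sum((np.cumsum(p) - np.cumsum(q)) ** 2)

# Toy age-classification example: predicting a class near the target (class 3)
# is penalised less than predicting a distant class.
target = np.array([0.0, 0.0, 0.0, 1.0])
near = np.array([0.1, 0.1, 0.7, 0.1])
far = np.array([0.7, 0.1, 0.1, 0.1])
print(squared_emd_loss(near, target) < squared_emd_loss(far, target))  # True
```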
LIFT: Learned Invariant Feature Transform
Title | LIFT: Learned Invariant Feature Transform |
Authors | Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, Pascal Fua |
Abstract | We introduce a novel Deep Network architecture that implements the full feature point handling pipeline, that is, detection, orientation estimation, and feature description. While previous works have successfully tackled each one of these problems individually, we show how to learn to do all three in a unified manner while preserving end-to-end differentiability. We then demonstrate that our Deep pipeline outperforms state-of-the-art methods on a number of benchmark datasets, without the need for retraining. |
Tasks | |
Published | 2016-03-30 |
URL | http://arxiv.org/abs/1603.09114v2 |
PDF | http://arxiv.org/pdf/1603.09114v2.pdf |
PWC | https://paperswithcode.com/paper/lift-learned-invariant-feature-transform |
Repo | https://github.com/cvlab-epfl/LIFT |
Framework | none |
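A structural sketch of the three-stage pipeline the abstract describes: detect a keypoint, estimate the patch orientation, then compute a descriptor. The stub functions below are placeholders for the paper's learned CNN components, which LIFT chains with differentiable crop and rotation layers:

```python
import numpy as np

# Hypothetical stand-ins for the three learned components; in LIFT each is a
# small network and the stages are connected end-to-end differentiably.
def detect(image):                  # score map -> keypoint location
    return np.unravel_index(np.argmax(image), image.shape)

def estimate_orientation(patch):    # patch -> angle in radians
    return 0.0

def describe(patch, angle):         # oriented patch -> fixed-length descriptor
    return np.resize(patch, 128)

def lift_pipeline(image, patch_size=32):
    y, x = detect(image)
    h = patch_size // 2
    patch = image[max(0, y - h):y + h, max(0, x - h):x + h]
    angle = estimate_orientation(patch)
    return describe(patch, angle)

descriptor = lift_pipeline(np.random.rand(256, 256))
print(descriptor.shape)  # (128,)
```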
Wide & Deep Learning for Recommender Systems
Title | Wide & Deep Learning for Recommender Systems |
Authors | Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, Hemal Shah |
Abstract | Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning—jointly trained wide linear models and deep neural networks—to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on Google Play, a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. We have also open-sourced our implementation in TensorFlow. |
Tasks | Click-Through Rate Prediction, Feature Engineering, Recommendation Systems |
Published | 2016-06-24 |
URL | http://arxiv.org/abs/1606.07792v1 |
PDF | http://arxiv.org/pdf/1606.07792v1.pdf |
PWC | https://paperswithcode.com/paper/wide-deep-learning-for-recommender-systems |
Repo | https://github.com/pollyyu/Final_Project_MachineLearning_in_TensorFlow_Berkeley |
Framework | tf |
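A toy numpy forward pass illustrating the combination the abstract describes: a wide linear model over cross-product features and a deep network over learned embeddings, whose logits are summed into one sigmoid for joint training. All dimensions and the single-embedding input are illustrative assumptions, not the production model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

# Toy dimensions (assumptions, not from the paper).
n_wide, vocab, emb_dim, hidden = 100, 1000, 8, 32
rng = np.random.default_rng(0)
w_wide = rng.normal(size=n_wide)            # wide: linear over cross-product features
emb = rng.normal(size=(vocab, emb_dim))     # deep: embeddings for sparse ids
W1 = rng.normal(size=(emb_dim, hidden))
w_deep = rng.normal(size=hidden)
b = 0.0

def wide_and_deep(x_wide, sparse_id):
    wide_logit = w_wide @ x_wide
    deep_logit = w_deep @ relu(emb[sparse_id] @ W1)
    # Joint training: both logits feed a single logistic output.
    return sigmoid(wide_logit + deep_logit + b)

print(wide_and_deep(rng.random(n_wide), 42))
```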
Face Detection with End-to-End Integration of a ConvNet and a 3D Model
Title | Face Detection with End-to-End Integration of a ConvNet and a 3D Model |
Authors | Yunzhu Li, Benyuan Sun, Tianfu Wu, Yizhou Wang |
Abstract | This paper presents a method for face detection in the wild, which integrates a ConvNet and a 3D mean face model in an end-to-end multi-task discriminative learning framework. The 3D mean face model is predefined and fixed (e.g., we used the one provided in the AFLW dataset). The ConvNet consists of two components: (i) The face proposal component computes face bounding box proposals via estimating facial key-points and the 3D transformation (rotation and translation) parameters for each predicted key-point w.r.t. the 3D mean face model. (ii) The face verification component computes detection results by pruning and refining proposals based on configuration pooling over facial key-points. The proposed method addresses two issues in adapting state-of-the-art generic object detection ConvNets (e.g., Faster R-CNN) for face detection: (i) One is to eliminate the heuristic design of predefined anchor boxes in the region proposal network (RPN) by exploiting a 3D mean face model. (ii) The other is to replace the generic RoI (Region-of-Interest) pooling layer with a configuration pooling layer to respect underlying object structures. The multi-task loss consists of three terms: the classification softmax loss and the location smooth L1 losses [14] of both the facial key-points and the face bounding boxes. In experiments, our ConvNet is trained on the AFLW dataset only and tested on the FDDB benchmark with fine-tuning and on the AFW benchmark without fine-tuning. The proposed method obtains very competitive state-of-the-art performance in the two benchmarks. |
Tasks | Face Detection, Face Verification, Object Detection |
Published | 2016-06-02 |
URL | http://arxiv.org/abs/1606.00850v3 |
PDF | http://arxiv.org/pdf/1606.00850v3.pdf |
PWC | https://paperswithcode.com/paper/face-detection-with-end-to-end-integration-of |
Repo | https://github.com/tfwu/FaceDetection-ConvNet-3D |
Framework | tf |
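A sketch of the three-term multi-task loss named in the abstract, combining a classification softmax loss with smooth L1 losses for keypoints and boxes. The weighting factors lam_kp and lam_box are assumptions for illustration:

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth L1 (Huber with delta = 1), as in Fast R-CNN [14]."""
    a = np.abs(x)
    return np.where(a < 1.0, 0.5 * x ** 2, a - 0.5)

def softmax_cross_entropy(logits, label):
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def multi_task_loss(cls_logits, label, kp_pred, kp_gt, box_pred, box_gt,
                    lam_kp=1.0, lam_box=1.0):
    # Three terms: classification softmax loss plus smooth-L1 losses for the
    # facial keypoints and the face bounding boxes (weights are assumptions).
    return (softmax_cross_entropy(cls_logits, label)
            + lam_kp * smooth_l1(kp_pred - kp_gt).sum()
            + lam_box * smooth_l1(box_pred - box_gt).sum())
```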
Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science
Title | Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science |
Authors | Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, Jason H. Moore |
Abstract | As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning—pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design. |
Tasks | Automated Feature Engineering, AutoML, Hyperparameter Optimization, Neural Architecture Search |
Published | 2016-03-20 |
URL | http://arxiv.org/abs/1603.06212v1 |
PDF | http://arxiv.org/pdf/1603.06212v1.pdf |
PWC | https://paperswithcode.com/paper/evaluation-of-a-tree-based-pipeline |
Repo | https://github.com/rhiever/tpot |
Framework | none |
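A typical usage sketch for the open-source TPOT implementation linked above, using its basic TPOTClassifier interface (argument defaults and exact behaviour may differ across releases):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Evolve pipelines for 5 generations with a population of 20 candidates.
tpot = TPOTClassifier(generations=5, population_size=20,
                      random_state=42, verbosity=2)
tpot.fit(X_tr, y_tr)
print(tpot.score(X_te, y_te))
tpot.export('best_pipeline.py')  # writes the winning pipeline as Python code
```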
Improving the Neural Algorithm of Artistic Style
Title | Improving the Neural Algorithm of Artistic Style |
Authors | Roman Novak, Yaroslav Nikulin |
Abstract | In this work we investigate different avenues of improving the Neural Algorithm of Artistic Style (by Leon A. Gatys, Alexander S. Ecker and Matthias Bethge, arXiv:1508.06576). While showing great results when transferring homogeneous and repetitive patterns, the original style representation often fails to capture more complex properties, like having separate styles of foreground and background. This leads to visual artifacts and undesirable textures appearing in unexpected regions when performing style transfer. We tackle this issue with a variety of approaches, mostly by modifying the style representation in order for it to capture more information and impose a tighter constraint on the style transfer result. In our experiments, we subjectively evaluate our best method as producing improvements in style transfer quality ranging from barely noticeable to significant. |
Tasks | Style Transfer |
Published | 2016-05-15 |
URL | http://arxiv.org/abs/1605.04603v1 |
PDF | http://arxiv.org/pdf/1605.04603v1.pdf |
PWC | https://paperswithcode.com/paper/improving-the-neural-algorithm-of-artistic |
Repo | https://github.com/telecombcn-dl/2018-dlai-team5 |
Framework | tf |
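For reference, the style representation the paper sets out to improve is the layer-wise Gram matrix of CNN activations from Gatys et al. A minimal numpy version with one common normalisation is sketched below; the paper's modifications to this representation are not shown:

```python
import numpy as np

def gram_matrix(features):
    """Gram-matrix style representation: `features` is a (channels, h*w)
    activation map from one CNN layer. Normalising by c * n is one common
    convention; exact scaling varies between implementations."""
    c, n = features.shape
    return features @ features.T / (c * n)

def style_loss(gen_feats, style_feats):
    g_gen, g_style = gram_matrix(gen_feats), gram_matrix(style_feats)
    return np.mean((g_gen - g_style) ** 2)

rng = np.random.default_rng(0)
f_gen, f_style = rng.random((64, 32 * 32)), rng.random((64, 32 * 32))
print(style_loss(f_gen, f_style))
```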
Modeling the Dynamics of Online Learning Activity
Title | Modeling the Dynamics of Online Learning Activity |
Authors | Charalampos Mavroforakis, Isabel Valera, Manuel Gomez Rodriguez |
Abstract | People are increasingly relying on the Web and social media to find solutions to their problems in a wide range of domains. In this online setting, closely related problems often lead to the same characteristic learning pattern, in which people sharing these problems visit related pieces of information, perform almost identical queries or, more generally, take a series of similar actions. In this paper, we introduce a novel modeling framework for clustering continuous-time grouped streaming data, the hierarchical Dirichlet Hawkes process (HDHP), which allows us to automatically uncover a wide variety of learning patterns from detailed traces of learning activity. Our model allows for efficient inference, scaling to millions of actions taken by thousands of users. Experiments on real data gathered from Stack Overflow reveal that our framework can recover meaningful learning patterns in terms of both content and temporal dynamics, as well as accurately track users’ interests and goals over time. |
Tasks | |
Published | 2016-10-18 |
URL | http://arxiv.org/abs/1610.05775v1 |
PDF | http://arxiv.org/pdf/1610.05775v1.pdf |
PWC | https://paperswithcode.com/paper/modeling-the-dynamics-of-online-learning |
Repo | https://github.com/Networks-Learning/hdhp.py |
Framework | none |
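The self-exciting building block underneath the hierarchical Dirichlet Hawkes process is the plain Hawkes process. A minimal sketch of its conditional intensity with an exponential kernel follows; the hierarchical Dirichlet machinery from the paper is not shown, and all parameter values are placeholders:

```python
import numpy as np

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, omega=1.0):
    """Conditional intensity of a Hawkes process with exponential kernel:
    lambda(t) = mu + alpha * sum_{t_i < t} omega * exp(-omega * (t - t_i)).
    Each past event temporarily raises the rate of future events."""
    past = np.asarray([ti for ti in event_times if ti < t])
    return mu + alpha * np.sum(omega * np.exp(-omega * (t - past)))

events = [0.5, 1.2, 1.3, 4.0]
print(hawkes_intensity(2.0, events))   # elevated by the burst around t = 1.2
print(hawkes_intensity(10.0, events))  # decays back toward the base rate mu
```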
Decoupled Neural Interfaces using Synthetic Gradients
Title | Decoupled Neural Interfaces using Synthetic Gradients |
Authors | Max Jaderberg, Wojciech Marian Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, David Silver, Koray Kavukcuoglu |
Abstract | Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating an error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In this work we break this constraint by decoupling modules: we introduce a model of the future computation of the network graph which predicts what the modelled subgraph will produce using only local information. In particular we focus on modelling error gradients: by using the modelled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously, i.e. we realise decoupled neural interfaces. We show results for feed-forward models, where every layer is trained asynchronously; recurrent neural networks (RNNs), where predicting one’s future gradient extends the time over which the RNN can effectively model; and a hierarchical RNN system with ticking at different timescales. Finally, we demonstrate that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass – amounting to independent networks which co-learn such that they can be composed into a single functioning corporation. |
Tasks | |
Published | 2016-08-18 |
URL | http://arxiv.org/abs/1608.05343v2 |
PDF | http://arxiv.org/pdf/1608.05343v2.pdf |
PWC | https://paperswithcode.com/paper/decoupled-neural-interfaces-using-synthetic |
Repo | https://github.com/TheoryDev/Deep-neural-network-training-optimisation |
Framework | pytorch |
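A minimal numpy sketch of the decoupling idea: a small model predicts the gradient of the loss with respect to a module's activation from that activation alone, the module updates immediately with the prediction, and the predictor is later regressed toward the true gradient once it arrives. The linear parameterisation and learning rate are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, lr = 16, 0.01
M = rng.normal(scale=0.01, size=(d, d))   # linear synthetic-gradient model

def synthetic_gradient(h):
    # Predict dL/dh from the activation h alone (local information only).
    return h @ M

def decoupled_update(W, x, h):
    # The module updates immediately with the predicted gradient, without
    # waiting for the rest of the network to finish (the decoupling step).
    return W - lr * np.outer(x, synthetic_gradient(h))

def fit_sg_model(h, true_grad):
    # When the true backpropagated gradient eventually arrives, regress the
    # synthetic-gradient model toward it (an L2 regression step).
    global M
    M -= lr * np.outer(h, synthetic_gradient(h) - true_grad)
```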
Deep Learning for Identifying Metastatic Breast Cancer
Title | Deep Learning for Identifying Metastatic Breast Cancer |
Authors | Dayong Wang, Aditya Khosla, Rishab Gargeya, Humayun Irshad, Andrew H. Beck |
Abstract | The International Symposium on Biomedical Imaging (ISBI) held a grand challenge to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies. Our team won both competitions in the grand challenge, obtaining an area under the receiver operating characteristic curve (AUC) of 0.925 for the task of whole slide image classification and a score of 0.7051 for the tumor localization task. A pathologist independently reviewed the same images, obtaining a whole slide image classification AUC of 0.966 and a tumor localization score of 0.733. Combining our deep learning system’s predictions with the human pathologist’s diagnoses increased the pathologist’s AUC to 0.995, representing an approximately 85 percent reduction in human error rate. These results demonstrate the power of using deep learning to produce significant improvements in the accuracy of pathological diagnoses. |
Tasks | Image Classification |
Published | 2016-06-18 |
URL | http://arxiv.org/abs/1606.05718v1 |
PDF | http://arxiv.org/pdf/1606.05718v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-identifying-metastatic |
Repo | https://github.com/martin-fabbri/kaggle-histopathologic-cancer-detector |
Framework | none |
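A quick check of the quoted 85 percent figure, reading one minus AUC as a rough error rate (an interpretation implied, not stated, by the abstract):

```python
# Pathologist alone: AUC 0.966 -> "error" 0.034.
# Pathologist + deep learning: AUC 0.995 -> "error" 0.005.
reduction = (0.034 - 0.005) / 0.034
print(f"{reduction:.0%}")  # ~85%
```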
V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation
Title | V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation |
Authors | Fausto Milletari, Nassir Navab, Seyed-Ahmad Ahmadi |
Abstract | Convolutional Neural Networks (CNNs) have been recently employed to solve problems from both the computer vision and medical image analysis fields. Despite their popularity, most approaches are only able to process 2D images while most medical data used in clinical practice consists of 3D volumes. In this work we propose an approach to 3D image segmentation based on a volumetric, fully convolutional, neural network. Our CNN is trained end-to-end on MRI volumes depicting the prostate, and learns to predict segmentation for the whole volume at once. We introduce a novel objective function based on the Dice coefficient, which we optimise during training. In this way we can deal with situations where there is a strong imbalance between the number of foreground and background voxels. To cope with the limited number of annotated volumes available for training, we augment the data by applying random non-linear transformations and histogram matching. We show in our experimental evaluation that our approach achieves good performance on challenging test data while requiring only a fraction of the processing time needed by other previous methods. |
Tasks | Medical Image Segmentation, Semantic Segmentation, Volumetric Medical Image Segmentation |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.04797v1 |
PDF | http://arxiv.org/pdf/1606.04797v1.pdf |
PWC | https://paperswithcode.com/paper/v-net-fully-convolutional-neural-networks-for |
Repo | https://github.com/alexbmp/run-vnet-keras |
Framework | none |
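A minimal numpy sketch of a Dice-based objective in the spirit of the paper; the squared terms in the denominator follow the V-Net formulation, while the epsilon smoothing term is an implementation assumption:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-7):
    """Dice-based objective: `pred` holds per-voxel foreground probabilities,
    `target` holds binary labels. Because Dice is a ratio of overlap to total
    foreground mass, it is insensitive to the background/foreground imbalance
    that plagues per-voxel cross-entropy on medical volumes."""
    pred, target = pred.ravel(), target.ravel()
    dice = 2.0 * np.sum(pred * target) / (np.sum(pred ** 2)
                                          + np.sum(target ** 2) + eps)
    return 1.0 - dice

# A perfect prediction gives loss ~0 even with one foreground voxel in 512.
gt = np.zeros((8, 8, 8)); gt[4, 4, 4] = 1.0
print(dice_loss(gt, gt))  # ~0.0
```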
Modelling Sentence Pairs with Tree-structured Attentive Encoder
Title | Modelling Sentence Pairs with Tree-structured Attentive Encoder |
Authors | Yao Zhou, Cong Liu, Yan Pan |
Abstract | We describe an attentive encoder that combines tree-structured recursive neural networks and sequential recurrent neural networks for modelling sentence pairs. Since existing attentive models exert attention on the sequential structure, we propose a way to incorporate attention into the tree topology. Specifically, given a pair of sentences, our attentive encoder uses the representation of one sentence, generated via an RNN, to guide the structural encoding of the other sentence on its dependency parse tree. We evaluate the proposed attentive encoder on three tasks: semantic similarity, paraphrase identification and true-false question selection. Experimental results show that our encoder outperforms all baselines and achieves state-of-the-art results on two tasks. |
Tasks | Paraphrase Identification, Semantic Similarity, Semantic Textual Similarity |
Published | 2016-10-10 |
URL | http://arxiv.org/abs/1610.02806v1 |
PDF | http://arxiv.org/pdf/1610.02806v1.pdf |
PWC | https://paperswithcode.com/paper/modelling-sentence-pairs-with-tree-structured |
Repo | https://github.com/yoosan/sentpair |
Framework | none |
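A hedged sketch of the guiding mechanism the abstract describes: the RNN representation of one sentence scores the nodes of the other sentence's dependency tree. The additive-attention parameterisation (W, U, v) is a common choice assumed here, not necessarily the paper's exact form:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def guided_tree_attention(node_states, sentence_vec, W, U, v):
    """node_states: (n, d) array of tree-node representations of sentence B;
    sentence_vec: RNN representation of sentence A that guides the attention.
    Returns an attention-weighted summary of B's tree nodes."""
    scores = np.array([v @ np.tanh(W @ h + U @ sentence_vec)
                       for h in node_states])
    weights = softmax(scores)       # how relevant each node is to sentence A
    return weights @ node_states

rng = np.random.default_rng(0)
n, d, ds, k = 5, 16, 16, 8
summary = guided_tree_attention(rng.normal(size=(n, d)), rng.normal(size=ds),
                                rng.normal(size=(k, d)), rng.normal(size=(k, ds)),
                                rng.normal(size=k))
print(summary.shape)  # (16,)
```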
A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation
Title | A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation |
Authors | Junyoung Chung, Kyunghyun Cho, Yoshua Bengio |
Abstract | The existing machine translation systems, whether phrase-based or neural, have relied almost exclusively on word-level modelling with explicit segmentation. In this paper, we ask a fundamental question: can neural machine translation generate a character sequence without any explicit segmentation? To answer this question, we evaluate an attention-based encoder-decoder with a subword-level encoder and a character-level decoder on four language pairs (En-Cs, En-De, En-Ru and En-Fi) using the parallel corpora from WMT’15. Our experiments show that the models with a character-level decoder outperform the ones with a subword-level decoder on all of the four language pairs. Furthermore, the ensembles of neural models with a character-level decoder outperform the state-of-the-art non-neural machine translation systems on En-Cs, En-De and En-Fi and perform comparably on En-Ru. |
Tasks | Machine Translation |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06147v4 |
PDF | http://arxiv.org/pdf/1603.06147v4.pdf |
PWC | https://paperswithcode.com/paper/a-character-level-decoder-without-explicit |
Repo | https://github.com/nyu-dl/dl4mt-cdec |
Framework | none |
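A toy sketch of character-level greedy decoding, the key property being that the output vocabulary is just the character set, so no segmentation is ever needed. The recurrent cell and weights below are random stand-ins, not the paper's bi-scale recurrent decoder:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def greedy_char_decode(context, step, W_out, chars, max_len=50, eos='\n'):
    # `step` is any recurrent cell (h, context) -> h'; W_out maps the hidden
    # state to logits over the raw character vocabulary `chars`.
    h = np.zeros(W_out.shape[1])
    out = []
    for _ in range(max_len):
        h = step(h, context)
        c = chars[int(np.argmax(softmax(W_out @ h)))]
        if c == eos:
            break
        out.append(c)
    return ''.join(out)

# Toy usage with random weights; a real system learns these end-to-end.
rng = np.random.default_rng(0)
d = 32
chars = list("abcdefghij") + ['\n']
W_in = rng.normal(size=(d, d))
W_out = rng.normal(size=(len(chars), d))
step = lambda h, ctx: np.tanh(W_in @ h + ctx)
print(greedy_char_decode(rng.normal(size=d), step, W_out, chars))
```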
Resnet in Resnet: Generalizing Residual Architectures
Title | Resnet in Resnet: Generalizing Residual Architectures |
Authors | Sasha Targ, Diogo Almeida, Kevin Lyman |
Abstract | Residual networks (ResNets) have recently achieved state-of-the-art on challenging computer vision tasks. We introduce Resnet in Resnet (RiR): a deep dual-stream architecture that generalizes ResNets and standard CNNs and is easily implemented with no computational overhead. RiR consistently improves performance over ResNets, outperforms architectures with similar amounts of augmentation on CIFAR-10, and establishes a new state-of-the-art on CIFAR-100. |
Tasks | |
Published | 2016-03-25 |
URL | http://arxiv.org/abs/1603.08029v1 |
PDF | http://arxiv.org/pdf/1603.08029v1.pdf |
PWC | https://paperswithcode.com/paper/resnet-in-resnet-generalizing-residual |
Repo | https://github.com/osmr/imgclsmob |
Framework | mxnet |
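A fully-connected analogue of the generalized residual block (the paper uses convolutions): a residual stream with an identity shortcut and a transient stream without one, each layer mixing information from both streams:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def rir_block(r, t, W_rr, W_tr, W_rt, W_tt):
    """Generalized residual block, dense analogue of the RiR unit: r is the
    residual stream, t the transient stream. Only r keeps an identity
    shortcut, so t is free to discard information layer to layer."""
    r_next = relu(r @ W_rr + t @ W_tr + r)   # identity shortcut on r only
    t_next = relu(r @ W_rt + t @ W_tt)       # no shortcut: t can forget
    return r_next, t_next

rng = np.random.default_rng(0)
d = 8
Ws = [rng.normal(scale=0.1, size=(d, d)) for _ in range(4)]
r, t = rng.normal(size=d), rng.normal(size=d)
r, t = rir_block(r, t, *Ws)
print(r.shape, t.shape)  # (8,) (8,)
```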
Learning from Simulated and Unsupervised Images through Adversarial Training
Title | Learning from Simulated and Unsupervised Images through Adversarial Training |
Authors | Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, Russ Webb |
Abstract | With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations. However, learning from synthetic images may not achieve the desired performance due to a gap between synthetic and real image distributions. To reduce this gap, we propose Simulated+Unsupervised (S+U) learning, where the task is to learn a model to improve the realism of a simulator’s output using unlabeled real data, while preserving the annotation information from the simulator. We develop a method for S+U learning that uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors. We make several key modifications to the standard GAN algorithm to preserve annotations, avoid artifacts, and stabilize training: (i) a ‘self-regularization’ term, (ii) a local adversarial loss, and (iii) updating the discriminator using a history of refined images. We show that this enables generation of highly realistic images, which we demonstrate both qualitatively and with a user study. We quantitatively evaluate the generated images by training models for gaze estimation and hand pose estimation. We show a significant improvement over using synthetic images, and achieve state-of-the-art results on the MPIIGaze dataset without any labeled real data. |
Tasks | Domain Adaptation, Gaze Estimation, Hand Pose Estimation, Image-to-Image Translation, Pose Estimation |
Published | 2016-12-22 |
URL | http://arxiv.org/abs/1612.07828v2 |
PDF | http://arxiv.org/pdf/1612.07828v2.pdf |
PWC | https://paperswithcode.com/paper/learning-from-simulated-and-unsupervised |
Repo | https://github.com/shinseung428/simGAN_NYU_Hand |
Framework | tf |
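A minimal sketch of modification (iii) from the abstract, the history of refined images used when updating the discriminator; the capacity and random-eviction policy are assumptions:

```python
import random

class RefinedImageHistory:
    """History buffer in the spirit of S+U learning: the discriminator sees a
    mix of freshly refined images and samples from past refined images, which
    keeps it from forgetting artifacts the refiner used to produce and
    stabilises adversarial training."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = []

    def sample_batch(self, fresh, k):
        # Return k images from history (fresh ones while history is short),
        # then store the fresh batch, evicting random old entries when full.
        old = (random.sample(self.buffer, k) if len(self.buffer) >= k
               else list(fresh[:k]))
        for img in fresh:
            if len(self.buffer) < self.capacity:
                self.buffer.append(img)
            else:
                self.buffer[random.randrange(self.capacity)] = img
        return old

history = RefinedImageHistory(capacity=4)
print(history.sample_batch(['img0', 'img1'], k=2))
```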
Coupled Generative Adversarial Networks
Title | Coupled Generative Adversarial Networks |
Authors | Ming-Yu Liu, Oncel Tuzel |
Abstract | We propose coupled generative adversarial network (CoGAN) for learning a joint distribution of multi-domain images. In contrast to the existing approaches, which require tuples of corresponding images in different domains in the training set, CoGAN can learn a joint distribution without any tuple of corresponding images. It can learn a joint distribution with just samples drawn from the marginal distributions. This is achieved by enforcing a weight-sharing constraint that limits the network capacity and favors a joint distribution solution over a product of marginal distributions. We apply CoGAN to several joint distribution learning tasks, including learning a joint distribution of color and depth images, and learning a joint distribution of face images with different attributes. For each task it successfully learns the joint distribution without any tuple of corresponding images. We also demonstrate its applications to domain adaptation and image transformation. |
Tasks | Domain Adaptation, Image-to-Image Translation |
Published | 2016-06-24 |
URL | http://arxiv.org/abs/1606.07536v2 |
PDF | http://arxiv.org/pdf/1606.07536v2.pdf |
PWC | https://paperswithcode.com/paper/coupled-generative-adversarial-networks |
Repo | https://github.com/eriklindernoren/PyTorch-GAN |
Framework | pytorch |
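A toy numpy sketch of the weight-sharing constraint: the two generators share the early layers that decode high-level semantics and keep separate output layers for domain-specific low-level detail. All shapes and the two-layer depth are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
W_shared = rng.normal(size=(64, 128))   # shared by both generators
W_out_a = rng.normal(size=(128, 784))   # domain A head (e.g. color)
W_out_b = rng.normal(size=(128, 784))   # domain B head (e.g. depth)

def generate_pair(z):
    # Identical early weights force the same noise vector to decode to the
    # same high-level content, yielding corresponding images in both domains
    # even though training never sees paired examples.
    h = relu(z @ W_shared)
    return h @ W_out_a, h @ W_out_b

img_a, img_b = generate_pair(rng.normal(size=64))
print(img_a.shape, img_b.shape)  # (784,) (784,)
```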