May 7, 2019

Paper Group AWR 2

Squared Earth Mover’s Distance-based Loss for Training Deep Neural Networks. LIFT: Learned Invariant Feature Transform. Wide & Deep Learning for Recommender Systems. Face Detection with End-to-End Integration of a ConvNet and a 3D Model. Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science. Improving the Neural Algorithm of Artistic Style …

Squared Earth Mover’s Distance-based Loss for Training Deep Neural Networks

Title Squared Earth Mover’s Distance-based Loss for Training Deep Neural Networks
Authors Le Hou, Chen-Ping Yu, Dimitris Samaras
Abstract In the context of single-label classification, despite the huge success of deep learning, the commonly used cross-entropy loss function ignores the intricate inter-class relationships that often exist in real-life tasks such as age classification. In this work, we propose to leverage these relationships between classes by training deep nets with the exact squared Earth Mover’s Distance (also known as Wasserstein distance) for single-label classification. The squared EMD loss uses the predicted probabilities of all classes and penalizes the mis-predictions according to a ground distance matrix that quantifies the dissimilarities between classes. We demonstrate that on datasets with strong inter-class relationships such as an ordering between classes, our exact squared EMD losses yield new state-of-the-art results. Furthermore, we propose a method to automatically learn this matrix using the CNN’s own features during training. We show that our method can learn a ground distance matrix efficiently with no inter-class relationship priors and yield the same performance gain. Finally, we show that our method can be generalized to applications that lack strong inter-class relationships and still maintain state-of-the-art performance. Therefore, with limited computational overhead, one can always deploy the proposed loss function on any dataset over the conventional cross-entropy.
Tasks
Published 2016-11-17
URL http://arxiv.org/abs/1611.05916v4
PDF http://arxiv.org/pdf/1611.05916v4.pdf
PWC https://paperswithcode.com/paper/squared-earth-movers-distance-based-loss-for
Repo https://github.com/luke321321/portfolio
Framework none
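
For ordered classes with unit ground distance between neighbours, the EMD between two discrete distributions reduces to the L1 distance between their CDFs, so the squared variant can be computed from cumulative sums. Below is a minimal NumPy sketch of that special case (the paper’s learned ground-distance matrix variant is not shown, and the function names are mine):

```python
import numpy as np

def squared_emd_loss(p, t):
    """Squared EMD between predicted class probabilities p and a one-hot
    (or soft) target t, assuming the classes are ordered with unit ground
    distance between neighbours. Under that assumption the EMD is the L1
    distance between CDFs; the squared variant sums squared CDF gaps."""
    cdf_diff = np.cumsum(p - t)
    return np.sum(cdf_diff ** 2)

# For 5 ordered classes (e.g. age bins), predicting the class next to
# the true one is penalised far less than predicting the farthest one.
t    = np.array([1., 0., 0., 0., 0.])
near = np.array([0., 1., 0., 0., 0.])
far  = np.array([0., 0., 0., 0., 1.])
print(squared_emd_loss(near, t))  # 1.0
print(squared_emd_loss(far, t))   # 4.0
```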

LIFT: Learned Invariant Feature Transform

Title LIFT: Learned Invariant Feature Transform
Authors Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, Pascal Fua
Abstract We introduce a novel Deep Network architecture that implements the full feature point handling pipeline, that is, detection, orientation estimation, and feature description. While previous works have successfully tackled each one of these problems individually, we show how to learn to do all three in a unified manner while preserving end-to-end differentiability. We then demonstrate that our Deep pipeline outperforms state-of-the-art methods on a number of benchmark datasets, without the need for retraining.
Tasks
Published 2016-03-30
URL http://arxiv.org/abs/1603.09114v2
PDF http://arxiv.org/pdf/1603.09114v2.pdf
PWC https://paperswithcode.com/paper/lift-learned-invariant-feature-transform
Repo https://github.com/cvlab-epfl/LIFT
Framework none
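
The three stages are chained into one differentiable pipeline: detect a keypoint, crop a patch, estimate its orientation, then describe it. The stubs below only illustrate that data flow; in the paper each stage is a CNN and the cropping and rotation steps are differentiable spatial transformers, none of which is reproduced here:

```python
import numpy as np

# Stub version of the LIFT pipeline's wiring, not the learned networks.
def detect(image):
    return np.unravel_index(np.argmax(image), image.shape)  # (y, x)

def estimate_orientation(patch):
    return 0.0  # the paper predicts one angle per patch with a CNN

def describe(patch):
    return patch.flatten()  # the paper outputs a learned descriptor

image = np.random.rand(64, 64)
y, x = detect(image)
patch = image[max(y - 8, 0):y + 8, max(x - 8, 0):x + 8]
angle = estimate_orientation(patch)  # would rotate the patch before describing
descriptor = describe(patch)
```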

Wide & Deep Learning for Recommender Systems

Title Wide & Deep Learning for Recommender Systems
Authors Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, Hemal Shah
Abstract Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning—jointly trained wide linear models and deep neural networks—to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on Google Play, a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. We have also open-sourced our implementation in TensorFlow.
Tasks Click-Through Rate Prediction, Feature Engineering, Recommendation Systems
Published 2016-06-24
URL http://arxiv.org/abs/1606.07792v1
PDF http://arxiv.org/pdf/1606.07792v1.pdf
PWC https://paperswithcode.com/paper/wide-deep-learning-for-recommender-systems
Repo https://github.com/pollyyu/Final_Project_MachineLearning_in_TensorFlow_Berkeley
Framework tf
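
The joint model is simple to state: a linear model over sparse cross-product features (wide) and an MLP over dense embeddings of sparse ids (deep) produce logits that are summed before a single sigmoid, and both parts are trained jointly against that output. A toy NumPy forward pass, with all vocabulary sizes, indices and weights invented for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Wide part: a linear model over sparse cross-product features.
n_cross = 1000                       # hypothetical cross-feature vocabulary
w_wide = rng.normal(0, 0.01, n_cross)
active_cross = [12, 407, 988]        # cross features firing for one example

# Deep part: dense embeddings for sparse ids fed through a small MLP.
n_ids, emb_dim = 500, 8
emb_table = rng.normal(0, 0.01, (n_ids, emb_dim))
active_ids = [3, 77]                 # e.g. a user id and an app id
W1 = rng.normal(0, 0.1, (len(active_ids) * emb_dim, 16))
w2 = rng.normal(0, 0.1, 16)

deep_in = emb_table[active_ids].reshape(-1)
deep_logit = np.maximum(W1.T @ deep_in, 0) @ w2   # one ReLU hidden layer
wide_logit = w_wide[active_cross].sum()

# The two logits are summed before the sigmoid; training backpropagates
# through both parts at once.
p_click = sigmoid(wide_logit + deep_logit)
```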

Face Detection with End-to-End Integration of a ConvNet and a 3D Model

Title Face Detection with End-to-End Integration of a ConvNet and a 3D Model
Authors Yunzhu Li, Benyuan Sun, Tianfu Wu, Yizhou Wang
Abstract This paper presents a method for face detection in the wild, which integrates a ConvNet and a 3D mean face model in an end-to-end multi-task discriminative learning framework. The 3D mean face model is predefined and fixed (e.g., we used the one provided in the AFLW dataset). The ConvNet consists of two components: (i) The face proposal component computes face bounding box proposals via estimating facial key-points and the 3D transformation (rotation and translation) parameters for each predicted key-point w.r.t. the 3D mean face model. (ii) The face verification component computes detection results by pruning and refining proposals based on facial key-point based configuration pooling. The proposed method addresses two issues in adapting state-of-the-art generic object detection ConvNets (e.g., faster R-CNN) for face detection: (i) One is to eliminate the heuristic design of predefined anchor boxes in the region proposal network (RPN) by exploiting a 3D mean face model. (ii) The other is to replace the generic RoI (Region-of-Interest) pooling layer with a configuration pooling layer to respect underlying object structures. The multi-task loss consists of three terms: the classification Softmax loss and the location smooth L1 losses [14] of both the facial key-points and the face bounding boxes. In experiments, our ConvNet is trained on the AFLW dataset only and tested on the FDDB benchmark with fine-tuning and on the AFW benchmark without fine-tuning. The proposed method obtains very competitive state-of-the-art performance in the two benchmarks.
Tasks Face Detection, Face Verification, Object Detection
Published 2016-06-02
URL http://arxiv.org/abs/1606.00850v3
PDF http://arxiv.org/pdf/1606.00850v3.pdf
PWC https://paperswithcode.com/paper/face-detection-with-end-to-end-integration-of
Repo https://github.com/tfwu/FaceDetection-ConvNet-3D
Framework tf
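
The proposal step can be made concrete: given predicted rotation and translation parameters, project the fixed 3D mean-face keypoints through a pinhole camera and take their bounding box as the face proposal. A sketch with made-up mean-face coordinates and focal length (the paper uses the AFLW 3D mean face and a learned pose per keypoint):

```python
import numpy as np

# Hypothetical 3D mean-face keypoints; the paper uses the AFLW model.
mean_face_3d = np.array([[-30.,  20., 10.],   # left eye
                         [ 30.,  20., 10.],   # right eye
                         [  0.,   0., 25.],   # nose tip
                         [  0., -30.,  5.]])  # mouth

def propose_box(rotation, translation, focal=500.0):
    pts = mean_face_3d @ rotation.T + translation  # rigid transform
    proj = focal * pts[:, :2] / pts[:, 2:3]        # pinhole projection
    (x0, y0), (x1, y1) = proj.min(0), proj.max(0)
    return x0, y0, x1, y1

R = np.eye(3)                      # frontal face
t = np.array([0.0, 0.0, 800.0])    # 800 units in front of the camera
print(propose_box(R, t))           # bounding box of the projected keypoints
```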

Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

Title Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science
Authors Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, Jason H. Moore
Abstract As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning—pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design.
Tasks Automated Feature Engineering, AutoML, Hyperparameter Optimization, Neural Architecture Search
Published 2016-03-20
URL http://arxiv.org/abs/1603.06212v1
PDF http://arxiv.org/pdf/1603.06212v1.pdf
PWC https://paperswithcode.com/paper/evaluation-of-a-tree-based-pipeline
Repo https://github.com/rhiever/tpot
Framework none
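
Since the linked repo is TPOT itself, a minimal usage example is worth showing. The API below matches TPOT’s long-documented interface; check the repo for the current one:

```python
# Evolve a scikit-learn pipeline with TPOT's genetic programming search.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

tpot = TPOTClassifier(generations=5, population_size=20,
                      verbosity=2, random_state=42)
tpot.fit(X_tr, y_tr)              # evolves preprocessing + model pipelines
print(tpot.score(X_te, y_te))     # accuracy of the best Pareto-front pipeline
tpot.export('best_pipeline.py')   # writes the winning pipeline as Python code
```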

Improving the Neural Algorithm of Artistic Style

Title Improving the Neural Algorithm of Artistic Style
Authors Roman Novak, Yaroslav Nikulin
Abstract In this work we investigate different avenues of improving the Neural Algorithm of Artistic Style (by Leon A. Gatys, Alexander S. Ecker and Matthias Bethge, arXiv:1508.06576). While showing great results when transferring homogeneous and repetitive patterns, the original style representation often fails to capture more complex properties, like having separate styles of foreground and background. This leads to visual artifacts and undesirable textures appearing in unexpected regions when performing style transfer. We tackle this issue with a variety of approaches, mostly by modifying the style representation in order for it to capture more information and impose a tighter constraint on the style transfer result. In our experiments, we subjectively evaluate our best method as producing improvements in the quality of style transfer that range from barely noticeable to significant.
Tasks Style Transfer
Published 2016-05-15
URL http://arxiv.org/abs/1605.04603v1
PDF http://arxiv.org/pdf/1605.04603v1.pdf
PWC https://paperswithcode.com/paper/improving-the-neural-algorithm-of-artistic
Repo https://github.com/telecombcn-dl/2018-dlai-team5
Framework tf
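
The style representation being modified here is the Gram matrix of CNN feature maps from Gatys et al.; the paper’s variants (e.g. capturing separate foreground and background styles) build on it rather than replace it. A NumPy sketch of the baseline representation and a layer-wise style loss (the normalisation constant differs between implementations):

```python
import numpy as np

def gram(features):
    """Gram matrix of a (channels, height*width) feature map: the
    style representation of Gatys et al. that this paper refines."""
    c, hw = features.shape
    return features @ features.T / (c * hw)

def style_loss(f_gen, f_style):
    """Squared Frobenius distance between Gram matrices at one layer."""
    return np.sum((gram(f_gen) - gram(f_style)) ** 2)

f_gen = np.random.rand(64, 32 * 32)     # generated-image features
f_style = np.random.rand(64, 32 * 32)   # style-image features
print(style_loss(f_gen, f_style))
```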

Modeling the Dynamics of Online Learning Activity

Title Modeling the Dynamics of Online Learning Activity
Authors Charalampos Mavroforakis, Isabel Valera, Manuel Gomez Rodriguez
Abstract People are increasingly relying on the Web and social media to find solutions to their problems in a wide range of domains. In this online setting, closely related problems often lead to the same characteristic learning pattern, in which people sharing these problems visit related pieces of information, perform almost identical queries or, more generally, take a series of similar actions. In this paper, we introduce a novel modeling framework for clustering continuous-time grouped streaming data, the hierarchical Dirichlet Hawkes process (HDHP), which allows us to automatically uncover a wide variety of learning patterns from detailed traces of learning activity. Our model allows for efficient inference, scaling to millions of actions taken by thousands of users. Experiments on real data gathered from Stack Overflow reveal that our framework can recover meaningful learning patterns in terms of both content and temporal dynamics, as well as accurately track users’ interests and goals over time.
Tasks
Published 2016-10-18
URL http://arxiv.org/abs/1610.05775v1
PDF http://arxiv.org/pdf/1610.05775v1.pdf
PWC https://paperswithcode.com/paper/modeling-the-dynamics-of-online-learning
Repo https://github.com/Networks-Learning/hdhp.py
Framework none
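
The building block of the HDHP is the self-exciting Hawkes process, whose intensity jumps after each event and decays afterwards, which is what lets the model capture bursty learning sessions. A univariate sketch with an exponential kernel (parameters are arbitrary; the paper’s hierarchical, marked construction is not shown):

```python
import numpy as np

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, omega=1.0):
    """Intensity of a univariate Hawkes process with exponential kernel:
    lambda(t) = mu + alpha * sum_i exp(-omega * (t - t_i)) over past
    events t_i < t. Recent events excite the process; old ones decay."""
    past = np.asarray([s for s in event_times if s < t])
    return mu + alpha * np.exp(-omega * (t - past)).sum()

events = [1.0, 1.2, 1.3, 5.0]
print(hawkes_intensity(1.4, events))  # bursty: recent events raise the rate
print(hawkes_intensity(4.9, events))  # long gap: rate decays back toward mu
```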

Decoupled Neural Interfaces using Synthetic Gradients

Title Decoupled Neural Interfaces using Synthetic Gradients
Authors Max Jaderberg, Wojciech Marian Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, David Silver, Koray Kavukcuoglu
Abstract Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating an error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In this work we break this constraint by decoupling modules by introducing a model of the future computation of the network graph. These models predict what the modelled subgraph will produce using only local information. In particular we focus on modelling error gradients: by using the modelled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously i.e. we realise decoupled neural interfaces. We show results for feed-forward models, where every layer is trained asynchronously, recurrent neural networks (RNNs) where predicting one’s future gradient extends the time over which the RNN can effectively model, and also a hierarchical RNN system with ticking at different timescales. Finally, we demonstrate that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass – amounting to independent networks which co-learn such that they can be composed into a single functioning corporation.
Tasks
Published 2016-08-18
URL http://arxiv.org/abs/1608.05343v2
PDF http://arxiv.org/pdf/1608.05343v2.pdf
PWC https://paperswithcode.com/paper/decoupled-neural-interfaces-using-synthetic
Repo https://github.com/TheoryDev/Deep-neural-network-training-optimisation
Framework pytorch
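
The mechanism is easiest to see on a single layer: a small auxiliary model predicts the layer’s upstream gradient from its own activations, the layer updates immediately with that prediction, and the auxiliary model is later regressed toward the true gradient when it arrives. A toy NumPy sketch; the target, shapes and learning rate are invented, and real DNI modules are small networks, often also conditioned on labels:

```python
import numpy as np

rng = np.random.default_rng(0)

W1 = rng.normal(0, 0.1, (4, 8))  # layer under training: x (4) -> h (8)
M = np.zeros((8, 8))             # synthetic-gradient model: h -> dL/dh
lr = 0.01

for step in range(100):
    x = rng.normal(size=4)
    h = np.tanh(x @ W1)

    # Decoupled update: W1 steps immediately using the *predicted*
    # gradient, without waiting for the rest of the net to backprop.
    g_syn = h @ M
    W1 -= lr * np.outer(x, g_syn * (1 - h ** 2))

    # Later (asynchronously, in the paper) the true gradient arrives and
    # the synthetic-gradient model is regressed toward it.
    g_true = 2 * (h - np.ones(8))   # gradient of ||h - 1||^2 w.r.t. h
    M -= lr * np.outer(h, h @ M - g_true)
```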

Deep Learning for Identifying Metastatic Breast Cancer

Title Deep Learning for Identifying Metastatic Breast Cancer
Authors Dayong Wang, Aditya Khosla, Rishab Gargeya, Humayun Irshad, Andrew H. Beck
Abstract The International Symposium on Biomedical Imaging (ISBI) held a grand challenge to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies. Our team won both competitions in the grand challenge, obtaining an area under the receiver operating curve (AUC) of 0.925 for the task of whole slide image classification and a score of 0.7051 for the tumor localization task. A pathologist independently reviewed the same images, obtaining a whole slide image classification AUC of 0.966 and a tumor localization score of 0.733. Combining our deep learning system’s predictions with the human pathologist’s diagnoses increased the pathologist’s AUC to 0.995, representing an approximately 85 percent reduction in human error rate. These results demonstrate the power of using deep learning to produce significant improvements in the accuracy of pathological diagnoses.
Tasks Image Classification
Published 2016-06-18
URL http://arxiv.org/abs/1606.05718v1
PDF http://arxiv.org/pdf/1606.05718v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-identifying-metastatic
Repo https://github.com/martin-fabbri/kaggle-histopathologic-cancer-detector
Framework none

V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation

Title V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation
Authors Fausto Milletari, Nassir Navab, Seyed-Ahmad Ahmadi
Abstract Convolutional Neural Networks (CNNs) have been recently employed to solve problems from both the computer vision and medical image analysis fields. Despite their popularity, most approaches are only able to process 2D images while most medical data used in clinical practice consists of 3D volumes. In this work we propose an approach to 3D image segmentation based on a volumetric, fully convolutional neural network. Our CNN is trained end-to-end on MRI volumes depicting prostate, and learns to predict segmentation for the whole volume at once. We introduce a novel objective function, which we optimise during training, based on the Dice coefficient. In this way we can deal with situations where there is a strong imbalance between the number of foreground and background voxels. To cope with the limited number of annotated volumes available for training, we augment the data by applying random non-linear transformations and histogram matching. We show in our experimental evaluation that our approach achieves good performance on challenging test data while requiring only a fraction of the processing time needed by other previous methods.
Tasks Medical Image Segmentation, Semantic Segmentation, Volumetric Medical Image Segmentation
Published 2016-06-15
URL http://arxiv.org/abs/1606.04797v1
PDF http://arxiv.org/pdf/1606.04797v1.pdf
PWC https://paperswithcode.com/paper/v-net-fully-convolutional-neural-networks-for
Repo https://github.com/alexbmp/run-vnet-keras
Framework none
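
The novel objective is the soft Dice loss: because it scores the overlap ratio between prediction and ground truth, it is insensitive to the huge number of background voxels that would dominate a plain cross-entropy. A NumPy sketch of the formulation in the paper, which uses squared terms in the denominator (the epsilon for numerical safety is my addition):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss from V-Net: 1 - 2*sum(p*g) / (sum(p^2) + sum(g^2)).
    pred holds foreground probabilities, target holds binary labels."""
    intersection = np.sum(pred * target)
    denom = np.sum(pred ** 2) + np.sum(target ** 2)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

pred = np.random.rand(16, 16, 16)                             # probabilities
target = (np.random.rand(16, 16, 16) > 0.97).astype(float)    # sparse foreground
print(dice_loss(pred, target))
```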

Modelling Sentence Pairs with Tree-structured Attentive Encoder

Title Modelling Sentence Pairs with Tree-structured Attentive Encoder
Authors Yao Zhou, Cong Liu, Yan Pan
Abstract We describe an attentive encoder that combines tree-structured recursive neural networks and sequential recurrent neural networks for modelling sentence pairs. Since existing attentive models exert attention on the sequential structure, we propose a way to incorporate attention into the tree topology. Specifically, given a pair of sentences, our attentive encoder uses the representation of one sentence, which is generated via an RNN, to guide the structural encoding of the other sentence on the dependency parse tree. We evaluate the proposed attentive encoder on three tasks: semantic similarity, paraphrase identification and true-false question selection. Experimental results show that our encoder outperforms all baselines and achieves state-of-the-art results on two tasks.
Tasks Paraphrase Identification, Semantic Similarity, Semantic Textual Similarity
Published 2016-10-10
URL http://arxiv.org/abs/1610.02806v1
PDF http://arxiv.org/pdf/1610.02806v1.pdf
PWC https://paperswithcode.com/paper/modelling-sentence-pairs-with-tree-structured
Repo https://github.com/yoosan/sentpair
Framework none
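
The idea in miniature: node states in sentence B’s parse tree are reweighted by how well they match sentence A’s RNN representation before being composed bottom-up. The sketch below shows only that guided-attention step at a single tree node, with an illustrative dot-product score; dimensions and the scoring function are not the paper’s exact parameterisation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

d = 16
rng = np.random.default_rng(0)
s_a = rng.normal(size=d)            # sentence A's representation, from an RNN
children = rng.normal(size=(3, d))  # hidden states of one tree node's children

attn = softmax(children @ s_a)      # attention weights guided by sentence A
parent = np.tanh(attn @ children)   # attentive composition into the parent node
```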

A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation

Title A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation
Authors Junyoung Chung, Kyunghyun Cho, Yoshua Bengio
Abstract Existing machine translation systems, whether phrase-based or neural, have relied almost exclusively on word-level modelling with explicit segmentation. In this paper, we ask a fundamental question: can neural machine translation generate a character sequence without any explicit segmentation? To answer this question, we evaluate an attention-based encoder-decoder with a subword-level encoder and a character-level decoder on four language pairs (En-Cs, En-De, En-Ru and En-Fi) using the parallel corpora from WMT’15. Our experiments show that the models with a character-level decoder outperform the ones with a subword-level decoder on all of the four language pairs. Furthermore, the ensembles of neural models with a character-level decoder outperform the state-of-the-art non-neural machine translation systems on En-Cs, En-De and En-Fi and perform comparably on En-Ru.
Tasks Machine Translation
Published 2016-03-19
URL http://arxiv.org/abs/1603.06147v4
PDF http://arxiv.org/pdf/1603.06147v4.pdf
PWC https://paperswithcode.com/paper/a-character-level-decoder-without-explicit
Repo https://github.com/nyu-dl/dl4mt-cdec
Framework none

Resnet in Resnet: Generalizing Residual Architectures

Title Resnet in Resnet: Generalizing Residual Architectures
Authors Sasha Targ, Diogo Almeida, Kevin Lyman
Abstract Residual networks (ResNets) have recently achieved state-of-the-art on challenging computer vision tasks. We introduce Resnet in Resnet (RiR): a deep dual-stream architecture that generalizes ResNets and standard CNNs and is easily implemented with no computational overhead. RiR consistently improves performance over ResNets, outperforms architectures with similar amounts of augmentation on CIFAR-10, and establishes a new state-of-the-art on CIFAR-100.
Tasks
Published 2016-03-25
URL http://arxiv.org/abs/1603.08029v1
PDF http://arxiv.org/pdf/1603.08029v1.pdf
PWC https://paperswithcode.com/paper/resnet-in-resnet-generalizing-residual
Repo https://github.com/osmr/imgclsmob
Framework mxnet
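
The generalized residual block keeps two parallel streams with learned cross-talk: a residual stream with an identity shortcut (as in a ResNet) and a transient stream without one (as in a plain CNN). A toy NumPy version, with dense maps standing in for the paper’s convolutions and invented dimensions:

```python
import numpy as np

d = 8
rng = np.random.default_rng(0)
W_rr, W_tr, W_rt, W_tt = (rng.normal(0, 0.1, (d, d)) for _ in range(4))

def rir_block(r, t):
    """One RiR block: both streams read from both streams, but only the
    residual stream r carries an identity shortcut."""
    r_new = np.maximum(r @ W_rr + t @ W_tr + r, 0)  # shortcut on r only
    t_new = np.maximum(r @ W_rt + t @ W_tt, 0)      # no shortcut on t
    return r_new, t_new

r, t = rng.normal(size=d), rng.normal(size=d)
r, t = rir_block(r, t)
```

Setting the cross-stream maps to zero recovers a ResNet block plus an independent plain layer, which is the sense in which the block generalizes both architectures.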

Learning from Simulated and Unsupervised Images through Adversarial Training

Title Learning from Simulated and Unsupervised Images through Adversarial Training
Authors Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, Russ Webb
Abstract With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations. However, learning from synthetic images may not achieve the desired performance due to a gap between synthetic and real image distributions. To reduce this gap, we propose Simulated+Unsupervised (S+U) learning, where the task is to learn a model to improve the realism of a simulator’s output using unlabeled real data, while preserving the annotation information from the simulator. We develop a method for S+U learning that uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors. We make several key modifications to the standard GAN algorithm to preserve annotations, avoid artifacts, and stabilize training: (i) a ‘self-regularization’ term, (ii) a local adversarial loss, and (iii) updating the discriminator using a history of refined images. We show that this enables generation of highly realistic images, which we demonstrate both qualitatively and with a user study. We quantitatively evaluate the generated images by training models for gaze estimation and hand pose estimation. We show a significant improvement over using synthetic images, and achieve state-of-the-art results on the MPIIGaze dataset without any labeled real data.
Tasks Domain Adaptation, Gaze Estimation, Hand Pose Estimation, Image-to-Image Translation, Pose Estimation
Published 2016-12-22
URL http://arxiv.org/abs/1612.07828v2
PDF http://arxiv.org/pdf/1612.07828v2.pdf
PWC https://paperswithcode.com/paper/learning-from-simulated-and-unsupervised
Repo https://github.com/shinseung428/simGAN_NYU_Hand
Framework tf
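
Of the three GAN modifications, the history of refined images is the most self-contained: the discriminator sees half current refiner output and half images drawn from a buffer of past refinements, so the refiner cannot drift back to artifacts the discriminator has already forgotten. A sketch of such a buffer; the capacity and the exact batch-splitting and replacement policy are assumptions:

```python
import random

class RefinedImageHistory:
    """Buffer of previously refined images used when updating the
    discriminator, one of SimGAN's stabilising modifications."""
    def __init__(self, capacity=512):
        self.capacity = capacity
        self.images = []

    def sample_and_replace(self, new_images):
        """Return a discriminator batch: half history, half fresh images,
        while refreshing part of the buffer with the fresh ones."""
        half = len(new_images) // 2
        if len(self.images) < self.capacity:
            self.images.extend(new_images)
            return list(new_images)
        old = random.sample(self.images, half)
        for img in new_images[:half]:
            self.images[random.randrange(self.capacity)] = img
        return old + list(new_images[half:])

# Usage with placeholder "images" (strings stand in for arrays):
buf = RefinedImageHistory()
batch_for_discriminator = buf.sample_and_replace([f"refined_{i}" for i in range(8)])
```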

Coupled Generative Adversarial Networks

Title Coupled Generative Adversarial Networks
Authors Ming-Yu Liu, Oncel Tuzel
Abstract We propose coupled generative adversarial network (CoGAN) for learning a joint distribution of multi-domain images. In contrast to the existing approaches, which require tuples of corresponding images in different domains in the training set, CoGAN can learn a joint distribution without any tuple of corresponding images. It can learn a joint distribution with just samples drawn from the marginal distributions. This is achieved by enforcing a weight-sharing constraint that limits the network capacity and favors a joint-distribution solution over a product of marginal distributions. We apply CoGAN to several joint distribution learning tasks, including learning a joint distribution of color and depth images, and learning a joint distribution of face images with different attributes. For each task it successfully learns the joint distribution without any tuple of corresponding images. We also demonstrate its applications to domain adaptation and image transformation.
Tasks Domain Adaptation, Image-to-Image Translation
Published 2016-06-24
URL http://arxiv.org/abs/1606.07536v2
PDF http://arxiv.org/pdf/1606.07536v2.pdf
PWC https://paperswithcode.com/paper/coupled-generative-adversarial-networks
Repo https://github.com/eriklindernoren/PyTorch-GAN
Framework pytorch
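
The weight-sharing constraint is the whole trick: the two generators share their first layers, which decode high-level semantics, and keep domain-specific output layers, so samples drawn from the same noise vector stay semantically aligned across domains. A toy NumPy sketch with dense layers standing in for the paper’s deconvolutions and invented dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

z_dim, h_dim, x_dim = 16, 32, 64
W_shared = rng.normal(0, 0.1, (z_dim, h_dim))  # shared across both domains
W_dom_a = rng.normal(0, 0.1, (h_dim, x_dim))   # domain A output head
W_dom_b = rng.normal(0, 0.1, (h_dim, x_dim))   # domain B output head

def generate_pair(z):
    """One noise vector, two renderings: the shared layer fixes the
    semantics, the per-domain heads fix the appearance."""
    h = np.maximum(z @ W_shared, 0)
    return h @ W_dom_a, h @ W_dom_b

x_a, x_b = generate_pair(rng.normal(size=z_dim))
```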