Paper Group AWR 6
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
Title | Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks |
Authors | Tim Salimans, Diederik P. Kingma |
Abstract | We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited. Although our method is much simpler, it still provides much of the speed-up of full batch normalization. In addition, the computational overhead of our method is lower, permitting more optimization steps to be taken in the same amount of time. We demonstrate the usefulness of our method on applications in supervised image recognition, generative modelling, and deep reinforcement learning. |
Tasks | Image Classification |
Published | 2016-02-25 |
URL | http://arxiv.org/abs/1602.07868v3 |
http://arxiv.org/pdf/1602.07868v3.pdf | |
PWC | https://paperswithcode.com/paper/weight-normalization-a-simple |
Repo | https://github.com/TimSalimans/weight_norm |
Framework | none |
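The reparameterization itself is small enough to sketch in full. A minimal numpy version of w = g·v/||v|| together with the gradient expressions from the paper (variable names are ours, not taken from the linked repo):

```python
import numpy as np

def weight_norm_forward(v, g):
    """Effective weight vector w = g * v / ||v||."""
    return (g / np.linalg.norm(v)) * v

def weight_norm_backward(v, g, grad_w):
    """Map a gradient w.r.t. w back to gradients w.r.t. g and v
    (equations from the paper)."""
    norm_v = np.linalg.norm(v)
    grad_g = np.dot(grad_w, v) / norm_v
    grad_v = (g / norm_v) * grad_w - (g * grad_g / norm_v**2) * v
    return grad_g, grad_v

v, g = np.random.randn(5), 1.0
w = weight_norm_forward(v, g)
grad_g, grad_v = weight_norm_backward(v, g, np.random.randn(5))
```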
Hadamard Product for Low-rank Bilinear Pooling
Title | Hadamard Product for Low-rank Bilinear Pooling |
Authors | Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang |
Abstract | Bilinear models provide rich representations compared with linear models. They have been applied in various visual tasks, such as object recognition, segmentation, and visual question-answering, achieving state-of-the-art performance by taking advantage of the expanded representations. However, bilinear representations tend to be high-dimensional, limiting their applicability to computationally complex tasks. We propose low-rank bilinear pooling using the Hadamard product for an efficient attention mechanism for multimodal learning. We show that our model outperforms compact bilinear pooling on visual question-answering tasks, with state-of-the-art results on the VQA dataset, while being more parsimonious. |
Tasks | Visual Question Answering |
Published | 2016-10-14 |
URL | http://arxiv.org/abs/1610.04325v4 |
http://arxiv.org/pdf/1610.04325v4.pdf | |
PWC | https://paperswithcode.com/paper/hadamard-product-for-low-rank-bilinear |
Repo | https://github.com/Cadene/vqa.pytorch |
Framework | pytorch |
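The pooling reduces a bilinear form to f = Pᵀ(σ(Uᵀx) ∘ σ(Vᵀy)), with ∘ the Hadamard product. A hedged numpy sketch; matrix names follow common notation for this model and the sizes are illustrative:

```python
import numpy as np

def low_rank_bilinear_pool(x, y, U, V, P):
    """f = P^T (tanh(U^T x) * tanh(V^T y)); '*' is elementwise."""
    return P.T @ (np.tanh(U.T @ x) * np.tanh(V.T @ y))

dx, dy, d, c = 2048, 1024, 512, 10        # illustrative sizes
x, y = np.random.randn(dx), np.random.randn(dy)
U, V = np.random.randn(dx, d), np.random.randn(dy, d)
P = np.random.randn(d, c)
f = low_rank_bilinear_pool(x, y, U, V, P)  # (c,) joint representation
```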
Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction
Title | Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction |
Authors | Richard Zhang, Phillip Isola, Alexei A. Efros |
Abstract | We propose split-brain autoencoders, a straightforward modification of the traditional autoencoder architecture, for unsupervised representation learning. The method adds a split to the network, resulting in two disjoint sub-networks. Each sub-network is trained to perform a difficult task – predicting one subset of the data channels from another. Together, the sub-networks extract features from the entire input signal. By forcing the network to solve cross-channel prediction tasks, we induce a representation within the network which transfers well to other, unseen tasks. This method achieves state-of-the-art performance on several large-scale transfer learning benchmarks. |
Tasks | Representation Learning, Transfer Learning, Unsupervised Representation Learning |
Published | 2016-11-29 |
URL | http://arxiv.org/abs/1611.09842v3 |
http://arxiv.org/pdf/1611.09842v3.pdf | |
PWC | https://paperswithcode.com/paper/split-brain-autoencoders-unsupervised |
Repo | https://github.com/richzhang/splitbrainauto |
Framework | none |
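A conceptual PyTorch sketch of the cross-channel setup, assuming a Lab-style 1 + 2 channel split; the tiny architectures and the regression loss are our simplifications (the paper quantizes targets and trains with classification losses):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two disjoint sub-networks, each predicting one channel subset
# from the other.
def branch(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, out_ch, 3, padding=1),
    )

net_a = branch(1, 2)                     # predicts ab from L
net_b = branch(2, 1)                     # predicts L from ab

img = torch.randn(8, 3, 32, 32)          # stand-in for Lab images
L, ab = img[:, :1], img[:, 1:]
loss = F.mse_loss(net_a(L), ab) + F.mse_loss(net_b(ab), L)
loss.backward()
# Concatenating the two sub-networks' features gives the representation
# that is transferred to downstream tasks.
```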
Making Deep Neural Networks Robust to Label Noise: a Loss Correction Approach
Title | Making Deep Neural Networks Robust to Label Noise: a Loss Correction Approach |
Authors | Giorgio Patrini, Alessandro Rozza, Aditya Menon, Richard Nock, Lizhen Qu |
Abstract | We present a theoretically grounded approach to train deep neural networks, including recurrent networks, subject to class-dependent label noise. We propose two procedures for loss correction that are agnostic to both application domain and network architecture. They simply amount to at most a matrix inversion and multiplication, provided that we know the probability of each class being corrupted into another. We further show how one can estimate these probabilities, adapting a recent technique for noise estimation to the multi-class setting, and thus providing an end-to-end framework. Extensive experiments on MNIST, IMDB, CIFAR-10, CIFAR-100 and a large scale dataset of clothing images employing a diversity of architectures — stacking dense, convolutional, pooling, dropout, batch normalization, word embedding, LSTM and residual layers — demonstrate the noise robustness of our proposals. Incidentally, we also prove that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise. |
Tasks | |
Published | 2016-09-13 |
URL | http://arxiv.org/abs/1609.03683v2 |
http://arxiv.org/pdf/1609.03683v2.pdf | |
PWC | https://paperswithcode.com/paper/making-deep-neural-networks-robust-to-label |
Repo | https://github.com/giorgiop/loss-correction |
Framework | none |
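The two corrections are, as the abstract says, a matrix multiplication and at most one inversion. A minimal numpy sketch, assuming the row-stochastic noise transition matrix T is already known or estimated:

```python
import numpy as np

# T[i, j] = P(noisy label j | true label i).

def backward_corrected_loss(loss_vec, T):
    """'Backward' correction: premultiply the per-class loss vector by T^-1."""
    return np.linalg.inv(T) @ loss_vec

def forward_corrected_loss(probs, noisy_label, T):
    """'Forward' correction: pass the softmax output through T
    before taking the log-loss."""
    noisy_probs = T.T @ probs
    return -np.log(noisy_probs[noisy_label] + 1e-12)

T = np.array([[0.8, 0.2],
              [0.1, 0.9]])       # toy 2-class noise model
probs = np.array([0.7, 0.3])     # network softmax output
print(forward_corrected_loss(probs, noisy_label=0, T=T))
```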
Gated End-to-End Memory Networks
Title | Gated End-to-End Memory Networks |
Authors | Julien Perez, Fei Liu |
Abstract | Machine reading using differentiable reasoning models has recently shown remarkable progress. In this context, End-to-End trainable Memory Networks, MemN2N, have demonstrated promising performance on simple natural language based reasoning tasks such as factual reasoning and basic deduction. However, other tasks, namely multi-fact question-answering, positional reasoning or dialog-related tasks, remain challenging, particularly due to the necessity of more complex interactions between the memory and controller modules composing this family of models. In this paper, we introduce a novel end-to-end memory access regulation mechanism inspired by the current progress on the connection short-cutting principle in the field of computer vision. Concretely, we develop a Gated End-to-End trainable Memory Network architecture, GMemN2N. From the machine learning perspective, this new capability is learned in an end-to-end fashion without the use of any additional supervision signal, which is, to the best of our knowledge, the first of its kind. Our experiments show significant improvements on the most challenging tasks in the 20 bAbI dataset, without the use of any domain knowledge. Then, we show improvements on the dialog bAbI tasks, including the real human-bot conversation-based Dialog State Tracking Challenge (DSTC-2) dataset. On these two datasets, our model sets the new state of the art. |
Tasks | Question Answering, Reading Comprehension |
Published | 2016-10-13 |
URL | http://arxiv.org/abs/1610.04211v2 |
http://arxiv.org/pdf/1610.04211v2.pdf | |
PWC | https://paperswithcode.com/paper/gated-end-to-end-memory-networks |
Repo | https://github.com/cstghitpku/GateMemN2N |
Framework | tf |
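The gate added to each memory hop follows the highway-network pattern. A numpy sketch of one gated hop, with the gate parameters named after the paper's notation as we understand it:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_hop(u, o, W_T, b_T):
    """One gated memory hop: the gate T(u) = sigmoid(W_T u + b_T)
    decides how much of the memory output o to mix into the
    controller state u."""
    gate = sigmoid(W_T @ u + b_T)
    return o * gate + u * (1.0 - gate)

d = 64
u = np.random.randn(d)                 # controller state
o = np.random.randn(d)                 # memory read output for this hop
W_T, b_T = np.random.randn(d, d) * 0.01, np.zeros(d)
u_next = gated_hop(u, o, W_T, b_T)
```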
Trace Norm Regularised Deep Multi-Task Learning
Title | Trace Norm Regularised Deep Multi-Task Learning |
Authors | Yongxin Yang, Timothy M. Hospedales |
Abstract | We propose a framework for training multiple neural networks simultaneously. The parameters from all models are regularised by the tensor trace norm, so that each neural network is encouraged to reuse others’ parameters if possible – this is the main motivation behind multi-task learning. In contrast to many deep multi-task learning models, we do not predefine a parameter sharing strategy by specifying which layers have tied parameters. Instead, our framework considers sharing for all shareable layers, and the sharing strategy is learned in a data-driven way. |
Tasks | Multi-Task Learning |
Published | 2016-06-13 |
URL | http://arxiv.org/abs/1606.04038v2 |
http://arxiv.org/pdf/1606.04038v2.pdf | |
PWC | https://paperswithcode.com/paper/trace-norm-regularised-deep-multi-task |
Repo | https://github.com/wOOL/TNRDMTL |
Framework | tf |
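One common way to realize a tensor trace norm is to penalize the nuclear norms of the tensor's unfoldings. A numpy sketch under that assumption; the sum-of-unfoldings surrogate and the gamma weights are our choices, not necessarily the exact variant used in the paper:

```python
import numpy as np

def nuclear_norm(M):
    """Sum of singular values."""
    return np.linalg.svd(M, compute_uv=False).sum()

def tensor_trace_norm(W, gammas=(1.0, 1.0, 1.0)):
    """W: (T, d_in, d_out), one layer's weights stacked across T tasks."""
    unfoldings = [
        W.reshape(W.shape[0], -1),                     # mode-1
        W.transpose(1, 0, 2).reshape(W.shape[1], -1),  # mode-2
        W.transpose(2, 0, 1).reshape(W.shape[2], -1),  # mode-3
    ]
    return sum(g * nuclear_norm(M) for g, M in zip(gammas, unfoldings))

W = np.random.randn(4, 128, 64)   # 4 tasks sharing one layer's shape
reg = tensor_trace_norm(W)        # added to the sum of the task losses
```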
Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification
Title | Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification |
Authors | Yongxi Lu, Abhishek Kumar, Shuangfei Zhai, Yu Cheng, Tara Javidi, Rogerio Feris |
Abstract | Multi-task learning aims to improve generalization performance of multiple prediction tasks by appropriately sharing relevant information across them. In the context of deep neural networks, this idea is often realized by hand-designed network architectures with layers that are shared across tasks and branches that encode task-specific features. However, the space of possible multi-task deep architectures is combinatorially large, and the final architecture is often arrived at by manual exploration of this space subject to the designer's bias, which can be both error-prone and tedious. In this work, we propose a principled approach for designing compact multi-task deep learning architectures. Our approach starts with a thin network and dynamically widens it in a greedy manner during training, using a novel criterion that promotes grouping of similar tasks together. Our extensive evaluation on person attribute classification tasks involving facial and clothing attributes suggests that the models produced by the proposed method are fast, compact, and can closely match or exceed the state-of-the-art accuracy of strong baselines built from much more expensive models. |
Tasks | Multi-Task Learning |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05377v1 |
http://arxiv.org/pdf/1611.05377v1.pdf | |
PWC | https://paperswithcode.com/paper/fully-adaptive-feature-sharing-in-multi-task |
Repo | https://github.com/hardianlawi/MTL-Homoscedastic-Uncertainty |
Framework | tf |
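A rough sketch of the grouping signal, assuming task affinity is estimated from how often two tasks err on the same examples; the affinity formula and the single greedy merge below are illustrative simplifications of the paper's criterion, which also trades affinity off against a model-complexity penalty when widening the network:

```python
import numpy as np

def task_affinity(errors):
    """errors: (num_tasks, num_examples), 1 = example is hard for the task.
    Affinity is high when two tasks tend to err on the same examples."""
    e = errors.astype(float)
    co = e @ e.T / e.shape[1]                # P(i and j err together)
    marg = e.mean(axis=1, keepdims=True)     # P(i errs)
    return co / np.sqrt(marg @ marg.T + 1e-12)

errors = (np.random.rand(5, 1000) < 0.2).astype(int)
A = task_affinity(errors)
np.fill_diagonal(A, -np.inf)                 # ignore self-affinity
i, j = np.unravel_index(np.argmax(A), A.shape)
print(f"merge tasks {i} and {j} into one shared branch")
```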
Decision Forests, Convolutional Networks and the Models in-Between
Title | Decision Forests, Convolutional Networks and the Models in-Between |
Authors | Yani Ioannou, Duncan Robertson, Darko Zikic, Peter Kontschieder, Jamie Shotton, Matthew Brown, Antonio Criminisi |
Abstract | This paper investigates the connections between two state-of-the-art classifiers: decision forests (DFs, including decision jungles) and convolutional neural networks (CNNs). Decision forests are computationally efficient thanks to their conditional computation property (computation is confined to only a small region of the tree, the nodes along a single branch). CNNs achieve state-of-the-art accuracy thanks to their representation learning capabilities. We present a systematic analysis of how to fuse conditional computation with representation learning and achieve a continuum of hybrid models with different ratios of accuracy vs. efficiency. We call this new family of hybrid models conditional networks. Conditional networks can be thought of as: i) decision trees augmented with data transformation operators, or ii) CNNs with block-diagonal sparse weight matrices and explicit data routing functions. Experimental validation is performed on the common task of image classification on both the CIFAR and ImageNet datasets. Compared to state-of-the-art CNNs, our hybrid models yield the same accuracy at a fraction of the compute cost and with a much smaller number of parameters. |
Tasks | Image Classification, Representation Learning |
Published | 2016-03-03 |
URL | http://arxiv.org/abs/1603.01250v1 |
http://arxiv.org/pdf/1603.01250v1.pdf | |
PWC | https://paperswithcode.com/paper/decision-forests-convolutional-networks-and |
Repo | https://github.com/PierrickPochelu/word_tree_label |
Framework | tf |
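A conceptual PyTorch sketch of conditional computation: a cheap router selects one branch per input and only the selected branch is evaluated. The hard arg-max routing rule and the layer sizes are ours, for illustration:

```python
import torch
import torch.nn as nn

class ConditionalBlock(nn.Module):
    def __init__(self, dim, num_branches=2):
        super().__init__()
        self.router = nn.Linear(dim, num_branches)
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
            for _ in range(num_branches)
        )

    def forward(self, x):
        route = self.router(x).argmax(dim=1)   # hard routing decision
        out = torch.zeros_like(x)
        for b, branch in enumerate(self.branches):
            mask = route == b
            if mask.any():                     # compute only the used branches
                out[mask] = branch(x[mask])
        return out

block = ConditionalBlock(dim=16)
y = block(torch.randn(8, 16))
```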
Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling
Title | Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling |
Authors | Bing Liu, Ian Lane |
Abstract | Attention-based encoder-decoder neural network models have recently shown promising results in machine translation and speech recognition. In this work, we propose an attention-based neural network model for joint intent detection and slot filling, both of which are critical steps for many speech understanding and dialog systems. Unlike in machine translation and speech recognition, alignment is explicit in slot filling. We explore different strategies for incorporating this alignment information into the encoder-decoder framework. Drawing on the attention mechanism in the encoder-decoder model, we further propose introducing attention to the alignment-based RNN models. Such attention provides additional information for intent classification and slot label prediction. Our independent task models achieve state-of-the-art intent detection error rate and slot filling F1 score on the benchmark ATIS task. Our joint training model further obtains 0.56% absolute (23.8% relative) error reduction on intent detection and 0.23% absolute gain on slot filling over the independent task models. |
Tasks | Intent Classification, Intent Detection, Slot Filling |
Published | 2016-09-06 |
URL | http://arxiv.org/abs/1609.01454v1 |
http://arxiv.org/pdf/1609.01454v1.pdf | |
PWC | https://paperswithcode.com/paper/attention-based-recurrent-neural-network |
Repo | https://github.com/Sungguk/Jointly-Training-of-Sequence-Labeling-and-Classification |
Framework | tf |
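A minimal PyTorch sketch of the joint setup: a shared encoder feeding a per-token slot head and an utterance-level intent head. The BiLSTM sizes and the use of the last hidden state for intent are our simplifications; the paper additionally uses attention over the encoder states:

```python
import torch
import torch.nn as nn

class JointNLU(nn.Module):
    def __init__(self, vocab, n_slots, n_intents, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.enc = nn.LSTM(d, d, bidirectional=True, batch_first=True)
        self.slot_head = nn.Linear(2 * d, n_slots)
        self.intent_head = nn.Linear(2 * d, n_intents)

    def forward(self, tokens):
        h, _ = self.enc(self.emb(tokens))           # (B, T, 2d)
        slot_logits = self.slot_head(h)             # per-token slot labels
        intent_logits = self.intent_head(h[:, -1])  # utterance-level intent
        return slot_logits, intent_logits

model = JointNLU(vocab=1000, n_slots=20, n_intents=8)
slots, intent = model(torch.randint(0, 1000, (4, 12)))
```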
Non-convex Global Minimization and False Discovery Rate Control for the TREX
Title | Non-convex Global Minimization and False Discovery Rate Control for the TREX |
Authors | Jacob Bien, Irina Gaynanova, Johannes Lederer, Christian Müller |
Abstract | The TREX is a recently introduced method for performing sparse high-dimensional regression. Despite its statistical promise as an alternative to the lasso, square-root lasso, and scaled lasso, the TREX is computationally challenging in that it requires solving a non-convex optimization problem. This paper shows a remarkable result: despite the non-convexity of the TREX problem, there exists a polynomial-time algorithm that is guaranteed to find the global minimum. This result adds the TREX to a very short list of non-convex optimization problems that can be globally optimized (principal components analysis being a famous example). After deriving and developing this new approach, we demonstrate that (i) the ability of the preexisting TREX heuristic to reach the global minimum is strongly dependent on the difficulty of the underlying statistical problem, (ii) the new polynomial-time algorithm for TREX permits a novel variable ranking and selection scheme, (iii) this scheme can be incorporated into a rule that controls the false discovery rate (FDR) of included features in the model. To achieve this last aim, we provide an extension of the results of Barber & Candes (2015) to establish that the knockoff filter framework can be applied to the TREX. This investigation thus provides both a rare case study of a heuristic for non-convex optimization and a novel way of exploiting non-convexity for statistical inference. |
Tasks | |
Published | 2016-04-22 |
URL | http://arxiv.org/abs/1604.06815v2 |
http://arxiv.org/pdf/1604.06815v2.pdf | |
PWC | https://paperswithcode.com/paper/non-convex-global-minimization-and-false |
Repo | https://github.com/muellsen/TREX |
Framework | none |
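The paper's key observation is that the TREX objective splits into 2p convex subproblems, one per choice of which coordinate (and sign) attains the max in the denominator. The sketch below only illustrates that decomposition; it uses a generic derivative-free solver rather than the second-order cone programs that give the polynomial-time guarantee:

```python
import numpy as np
from scipy.optimize import minimize

# TREX objective:
#   f(b) = ||y - Xb||_2^2 / (c * max_j |x_j^T (y - Xb)|) + ||b||_1

def trex_subproblem(X, y, j, s, c=0.5):
    """Subproblem where coordinate j with sign s attains the max."""
    def obj(b):
        r = y - X @ b
        denom = c * s * (X[:, j] @ r)
        if denom <= 1e-10:
            return 1e10              # penalize infeasible sign patterns
        return (r @ r) / denom + np.abs(b).sum()
    res = minimize(obj, np.zeros(X.shape[1]), method="Nelder-Mead")
    return res.fun, res.x

n, p = 50, 5
X, y = np.random.randn(n, p), np.random.randn(n)
best_val, best_beta = min(
    (trex_subproblem(X, y, j, s) for j in range(p) for s in (+1, -1)),
    key=lambda t: t[0],
)
```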
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
Title | Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks |
Authors | Tianfan Xue, Jiajun Wu, Katherine L. Bouman, William T. Freeman |
Abstract | We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods, which have tackled this problem in a deterministic or non-parametric way, we propose a novel approach that models future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. Future frame synthesis is challenging, as it involves low- and high-level image and motion understanding. We propose a novel network structure, namely a Cross Convolutional Network, to aid in synthesizing future frames; this network structure encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, as well as on real-world videos. We also show that our model can be applied to tasks such as visual analogy-making, and present an analysis of the learned network representations. |
Tasks | |
Published | 2016-07-09 |
URL | http://arxiv.org/abs/1607.02586v1 |
http://arxiv.org/pdf/1607.02586v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-dynamics-probabilistic-future-frame |
Repo | https://github.com/tensorflow/models |
Framework | tf |
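The cross convolution at the heart of the model convolves each sample's image feature maps with that sample's predicted kernels. A PyTorch sketch using the standard grouped-convolution trick; shapes are illustrative, and in the paper both tensors come from learned encoders:

```python
import torch
import torch.nn.functional as F

B, C, H, W, K = 4, 8, 32, 32, 5
feature_maps = torch.randn(B, C, H, W)   # would come from the image encoder
kernels = torch.randn(B, C, K, K)        # would come from the motion encoder

# Fold the batch into channels so each sample is convolved with its own
# kernels (depthwise: one kernel per channel).
x = feature_maps.reshape(1, B * C, H, W)
w = kernels.reshape(B * C, 1, K, K)
out = F.conv2d(x, w, padding=K // 2, groups=B * C).reshape(B, C, H, W)
```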
Domain Separation Networks
Title | Domain Separation Networks |
Authors | Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, Dumitru Erhan |
Abstract | The cost of large scale data collection and annotation often makes the application of machine learning algorithms to new tasks or datasets prohibitively expensive. One approach circumventing this cost is training models on synthetic data where annotations are provided automatically. Despite their appeal, such models often fail to generalize from synthetic to real images, necessitating domain adaptation algorithms to manipulate these models before they can be successfully applied. Existing approaches focus either on mapping representations from one domain to the other, or on learning to extract features that are invariant to the domain from which they were extracted. However, by focusing only on creating a mapping or shared representation between the two domains, they ignore the individual characteristics of each domain. We suggest that explicitly modeling what is unique to each domain can improve a model’s ability to extract domain-invariant features. Inspired by work on private-shared component analysis, we explicitly learn to extract image representations that are partitioned into two subspaces: one component which is private to each domain and one which is shared across domains. Our model is trained not only to perform the task we care about in the source domain, but also to use the partitioned representation to reconstruct the images from both domains. Our novel architecture results in a model that outperforms the state-of-the-art on a range of unsupervised domain adaptation scenarios and additionally produces visualizations of the private and shared representations enabling interpretation of the domain adaptation process. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2016-08-22 |
URL | http://arxiv.org/abs/1608.06019v1 |
http://arxiv.org/pdf/1608.06019v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-separation-networks |
Repo | https://github.com/tensorflow/models/tree/master/research/domain_adaptation |
Framework | tf |
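A sketch of the "difference loss" that drives the private/shared split: the squared Frobenius norm of the correlation between the two representations of a batch. The mean-centering is our choice of normalization; the full objective also adds task, reconstruction, and domain-similarity terms:

```python
import torch

def difference_loss(h_shared, h_private):
    """Soft orthogonality between the shared and private encodings
    of the same batch."""
    hs = h_shared - h_shared.mean(0, keepdim=True)
    hp = h_private - h_private.mean(0, keepdim=True)
    return (hs.t() @ hp).pow(2).sum()

h_s, h_p = torch.randn(32, 100), torch.randn(32, 100)
print(difference_loss(h_s, h_p))
```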
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Title | COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images |
Authors | Andreas Veit, Tomas Matera, Lukas Neumann, Jiri Matas, Serge Belongie |
Abstract | This paper describes the COCO-Text dataset. In recent years, large-scale datasets like SUN and ImageNet drove the advancement of scene understanding and object recognition. The goal of COCO-Text is to advance the state of the art in text detection and recognition in natural images. The dataset is based on the MS COCO dataset, which contains images of complex everyday scenes. The images were not collected with text in mind and thus contain a broad variety of text instances. To reflect the diversity of text in natural scenes, we annotate text with (a) location in terms of a bounding box, (b) fine-grained classification into machine printed text and handwritten text, (c) classification into legible and illegible text, (d) script of the text and (e) transcriptions of legible text. The dataset contains over 173k text annotations in over 63k images. We provide a statistical analysis of the accuracy of our annotations. In addition, we present an analysis of three leading state-of-the-art photo Optical Character Recognition (OCR) approaches on our dataset. While scene text detection and recognition have enjoyed strong advances in recent years, we identify significant shortcomings that motivate future work. |
Tasks | Object Recognition, Optical Character Recognition, Scene Text Detection, Scene Understanding |
Published | 2016-01-26 |
URL | http://arxiv.org/abs/1601.07140v2 |
http://arxiv.org/pdf/1601.07140v2.pdf | |
PWC | https://paperswithcode.com/paper/coco-text-dataset-and-benchmark-for-text |
Repo | https://github.com/OzHsu23/chineseocr |
Framework | tf |
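A hypothetical annotation record illustrating the five annotation types (a)-(e) listed in the abstract; the field names are ours and do not claim to match the official COCO-Text JSON schema:

```python
# Hypothetical record, one per text instance.
annotation = {
    "image_id": 123456,
    "bbox": [210.0, 87.5, 64.0, 18.0],  # (a) location: x, y, width, height
    "class": "machine printed",         # (b) machine printed vs. handwritten
    "legibility": "legible",            # (c) legible vs. illegible
    "script": "latin",                  # (d) script of the text
    "utf8_string": "EXIT",              # (e) transcription (legible text only)
}
```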
Deep Shading: Convolutional Neural Networks for Screen-Space Shading
Title | Deep Shading: Convolutional Neural Networks for Screen-Space Shading |
Authors | Oliver Nalbach, Elena Arabadzhiyska, Dushyant Mehta, Hans-Peter Seidel, Tobias Ritschel |
Abstract | In computer vision, convolutional neural networks (CNNs) have recently achieved new levels of performance for several inverse problems where RGB pixel appearance is mapped to attributes such as positions, normals or reflectance. In computer graphics, screen-space shading has recently increased the visual quality in interactive image synthesis, where per-pixel attributes such as positions, normals or reflectance of a virtual 3D scene are converted into RGB pixel appearance, enabling effects like ambient occlusion, indirect light, scattering, depth-of-field, motion blur, or anti-aliasing. In this paper we consider the diagonal problem: synthesizing appearance from given per-pixel attributes using a CNN. The resulting Deep Shading simulates various screen-space effects at competitive quality and speed while not being programmed by human experts but learned from example images. |
Tasks | Image Generation |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06078v2 |
http://arxiv.org/pdf/1603.06078v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-shading-convolutional-neural-networks |
Repo | https://github.com/paragchaudhuri/CS775Project |
Framework | none |
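A toy PyTorch sketch of the mapping the paper learns: stack per-pixel scene attributes (a deferred-shading G-buffer) as input channels and regress RGB appearance. The three-layer net is only illustrative; the paper uses a much deeper U-Net-style architecture:

```python
import torch
import torch.nn as nn

gbuffer = torch.cat([
    torch.randn(1, 3, 128, 128),   # camera-space positions
    torch.randn(1, 3, 128, 128),   # normals
    torch.randn(1, 3, 128, 128),   # diffuse reflectance
], dim=1)                          # (1, 9, H, W) attribute stack

net = nn.Sequential(
    nn.Conv2d(9, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),  # RGB in [0, 1]
)
rgb = net(gbuffer)   # trained against rendered ground-truth shading
```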
A Novel Framework to Expedite Systematic Reviews by Automatically Building Information Extraction Training Corpora
Title | A Novel Framework to Expedite Systematic Reviews by Automatically Building Information Extraction Training Corpora |
Authors | Tanmay Basu, Shraman Kumar, Abhishek Kalyan, Priyanka Jayaswal, Pawan Goyal, Stephen Pettifer, Siddhartha R. Jonnalagadda |
Abstract | A systematic review identifies and collates various clinical studies and compares data elements and results in order to provide an evidence-based answer for a particular clinical question. The process is manual and very time-consuming, and a tool to automate it is lacking. The aim of this work is to develop a framework using natural language processing and machine learning to build information extraction algorithms that identify data elements in a new primary publication, without having to go through the expensive task of manual annotation to build gold standards for each data element type. The system is developed in two stages. Initially, it uses information contained in existing systematic reviews to identify the sentences from the PDF files of the included references that contain specific data elements of interest, using a modified Jaccard similarity measure. These sentences are treated as labeled data. A Support Vector Machine (SVM) classifier is trained on this labeled data to extract data elements of interest from a new article. We conducted experiments on Cochrane Database systematic reviews related to congestive heart failure, using inclusion criteria as an example data element. The empirical results show that the proposed system automatically identifies sentences containing the data element of interest with high recall (93.75%) and reasonable precision (27.05%, which means the reviewers have to read only 3.7 sentences on average). The empirical results suggest that the tool retrieves valuable information from the reference articles, even when it is time-consuming to identify them manually. We therefore hope that the tool will be useful for automatic data extraction from biomedical research publications. The future scope of this work is to generalize this information framework to all types of systematic reviews. |
Tasks | |
Published | 2016-06-21 |
URL | http://arxiv.org/abs/1606.06424v1 |
http://arxiv.org/pdf/1606.06424v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-framework-to-expedite-systematic |
Repo | https://github.com/tanmaybasu/Towards-Expediting-the-Process-of-Building-Systematic-Review-using-Machine-Learning |
Framework | none |
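A sketch of the two-stage pipeline under stated assumptions: sentences similar to the review text by a Jaccard measure over token sets become positive labels, and a TF-IDF plus linear SVM classifier is trained on them. The threshold and the toy data are ours:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def jaccard(a, b):
    """Token-set Jaccard similarity between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)

# Stage 1: label sentences by similarity to the review's data element text.
review_text = "patients with congestive heart failure aged over 65"
sentences = [
    "We enrolled patients with congestive heart failure aged over 65.",
    "The weather station recorded daily rainfall totals.",
]
labels = [int(jaccard(review_text, s) > 0.3) for s in sentences]

# Stage 2: train an SVM on the automatically labeled sentences.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(sentences, labels)
print(clf.predict(["Eligible subjects had heart failure and were over 65."]))
```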