Paper Group AWR 6
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
Title | Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks |
Authors | Tim Salimans, Diederik P. Kingma |
Abstract | We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited. Although our method is much simpler, it still provides much of the speed-up of full batch normalization. In addition, the computational overhead of our method is lower, permitting more optimization steps to be taken in the same amount of time. We demonstrate the usefulness of our method on applications in supervised image recognition, generative modelling, and deep reinforcement learning. |
Tasks | Image Classification |
Published | 2016-02-25 |
URL | http://arxiv.org/abs/1602.07868v3 |
http://arxiv.org/pdf/1602.07868v3.pdf | |
PWC | https://paperswithcode.com/paper/weight-normalization-a-simple |
Repo | https://github.com/TimSalimans/weight_norm |
Framework | none |
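The reparameterization itself is small enough to sketch in full. A minimal numpy version of w = g·v/||v|| together with the gradient expressions from the paper (variable names are ours, not taken from the linked repo):

```python
import numpy as np

def weight_norm_forward(v, g):
    """Effective weight vector w = g * v / ||v||."""
    return (g / np.linalg.norm(v)) * v

def weight_norm_backward(v, g, grad_w):
    """Map a gradient w.r.t. w back to gradients w.r.t. g and v
    (equations from the paper)."""
    norm_v = np.linalg.norm(v)
    grad_g = np.dot(grad_w, v) / norm_v
    grad_v = (g / norm_v) * grad_w - (g * grad_g / norm_v**2) * v
    return grad_g, grad_v

v, g = np.random.randn(5), 1.0
w = weight_norm_forward(v, g)
grad_g, grad_v = weight_norm_backward(v, g, np.random.randn(5))
```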
Hadamard Product for Low-rank Bilinear Pooling
Title | Hadamard Product for Low-rank Bilinear Pooling |
Authors | Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang |
Abstract | Bilinear models provide rich representations compared with linear models. They have been applied in various visual tasks, such as object recognition, segmentation, and visual question-answering, achieving state-of-the-art performance by taking advantage of the expanded representations. However, bilinear representations tend to be high-dimensional, limiting their applicability to computationally complex tasks. We propose low-rank bilinear pooling using the Hadamard product for an efficient attention mechanism for multimodal learning. We show that our model outperforms compact bilinear pooling on visual question-answering tasks, with state-of-the-art results on the VQA dataset, while being more parsimonious. |
Tasks | Visual Question Answering |
Published | 2016-10-14 |
URL | http://arxiv.org/abs/1610.04325v4 |
http://arxiv.org/pdf/1610.04325v4.pdf | |
PWC | https://paperswithcode.com/paper/hadamard-product-for-low-rank-bilinear |
Repo | https://github.com/Cadene/vqa.pytorch |
Framework | pytorch |
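The pooling reduces a bilinear form to f = Pᵀ(σ(Uᵀx) ∘ σ(Vᵀy)), with ∘ the Hadamard product. A hedged numpy sketch; matrix names follow common notation for this model and the sizes are illustrative:

```python
import numpy as np

def low_rank_bilinear_pool(x, y, U, V, P):
    """f = P^T (tanh(U^T x) * tanh(V^T y)); '*' is elementwise."""
    return P.T @ (np.tanh(U.T @ x) * np.tanh(V.T @ y))

dx, dy, d, c = 2048, 1024, 512, 10        # illustrative sizes
x, y = np.random.randn(dx), np.random.randn(dy)
U, V = np.random.randn(dx, d), np.random.randn(dy, d)
P = np.random.randn(d, c)
f = low_rank_bilinear_pool(x, y, U, V, P)  # (c,) joint representation
```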
Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction
Title | Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction |
Authors | Richard Zhang, Phillip Isola, Alexei A. Efros |
Abstract | We propose split-brain autoencoders, a straightforward modification of the traditional autoencoder architecture, for unsupervised representation learning. The method adds a split to the network, resulting in two disjoint sub-networks. Each sub-network is trained to perform a difficult task – predicting one subset of the data channels from another. Together, the sub-networks extract features from the entire input signal. By forcing the network to solve cross-channel prediction tasks, we induce a representation within the network which transfers well to other, unseen tasks. This method achieves state-of-the-art performance on several large-scale transfer learning benchmarks. |
Tasks | Representation Learning, Transfer Learning, Unsupervised Representation Learning |
Published | 2016-11-29 |
URL | http://arxiv.org/abs/1611.09842v3 |
http://arxiv.org/pdf/1611.09842v3.pdf | |
PWC | https://paperswithcode.com/paper/split-brain-autoencoders-unsupervised |
Repo | https://github.com/richzhang/splitbrainauto |
Framework | none |
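A conceptual PyTorch sketch of the cross-channel setup, assuming a Lab-style 1 + 2 channel split; the tiny architectures and the regression loss are our simplifications (the paper quantizes targets and trains with classification losses):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two disjoint sub-networks, each predicting one channel subset
# from the other.
def branch(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, out_ch, 3, padding=1),
    )

net_a = branch(1, 2)                     # predicts ab from L
net_b = branch(2, 1)                     # predicts L from ab

img = torch.randn(8, 3, 32, 32)          # stand-in for Lab images
L, ab = img[:, :1], img[:, 1:]
loss = F.mse_loss(net_a(L), ab) + F.mse_loss(net_b(ab), L)
loss.backward()
# Concatenating the two sub-networks' features gives the representation
# that is transferred to downstream tasks.
```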
Making Deep Neural Networks Robust to Label Noise: a Loss Correction Approach
Title | Making Deep Neural Networks Robust to Label Noise: a Loss Correction Approach |
Authors | Giorgio Patrini, Alessandro Rozza, Aditya Menon, Richard Nock, Lizhen Qu |
Abstract | We present a theoretically grounded approach to train deep neural networks, including recurrent networks, subject to class-dependent label noise. We propose two procedures for loss correction that are agnostic to both application domain and network architecture. They simply amount to at most a matrix inversion and multiplication, provided that we know the probability of each class being corrupted into another. We further show how one can estimate these probabilities, adapting a recent technique for noise estimation to the multi-class setting, and thus providing an end-to-end framework. Extensive experiments on MNIST, IMDB, CIFAR-10, CIFAR-100 and a large scale dataset of clothing images employing a diversity of architectures — stacking dense, convolutional, pooling, dropout, batch normalization, word embedding, LSTM and residual layers — demonstrate the noise robustness of our proposals. Incidentally, we also prove that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise. |
Tasks | |
Published | 2016-09-13 |
URL | http://arxiv.org/abs/1609.03683v2 |
http://arxiv.org/pdf/1609.03683v2.pdf | |
PWC | https://paperswithcode.com/paper/making-deep-neural-networks-robust-to-label |
Repo | https://github.com/giorgiop/loss-correction |
Framework | none |
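The two corrections are, as the abstract says, a matrix multiplication and at most one inversion. A minimal numpy sketch, assuming the row-stochastic noise transition matrix T is already known or estimated:

```python
import numpy as np

# T[i, j] = P(noisy label j | true label i).

def backward_corrected_loss(loss_vec, T):
    """'Backward' correction: premultiply the per-class loss vector by T^-1."""
    return np.linalg.inv(T) @ loss_vec

def forward_corrected_loss(probs, noisy_label, T):
    """'Forward' correction: pass the softmax output through T
    before taking the log-loss."""
    noisy_probs = T.T @ probs
    return -np.log(noisy_probs[noisy_label] + 1e-12)

T = np.array([[0.8, 0.2],
              [0.1, 0.9]])       # toy 2-class noise model
probs = np.array([0.7, 0.3])     # network softmax output
print(forward_corrected_loss(probs, noisy_label=0, T=T))
```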
Gated End-to-End Memory Networks
Title | Gated End-to-End Memory Networks |
Authors | Julien Perez, Fei Liu |
Abstract | Machine reading using differentiable reasoning models has recently shown remarkable progress. In this context, End-to-End trainable Memory Networks, MemN2N, have demonstrated promising performance on simple natural language based reasoning tasks such as factual reasoning and basic deduction. However, other tasks, namely multi-fact question-answering, positional reasoning or dialog-related tasks, remain challenging, particularly due to the necessity of more complex interactions between the memory and controller modules composing this family of models. In this paper, we introduce a novel end-to-end memory access regulation mechanism inspired by the current progress on the connection short-cutting principle in the field of computer vision. Concretely, we develop a Gated End-to-End trainable Memory Network architecture, GMemN2N. From the machine learning perspective, this new capability is learned in an end-to-end fashion without the use of any additional supervision signal, which is, to the best of our knowledge, the first of its kind. Our experiments show significant improvements on the most challenging tasks in the 20 bAbI dataset, without the use of any domain knowledge. Then, we show improvements on the dialog bAbI tasks, including the real human-bot conversation-based Dialog State Tracking Challenge (DSTC-2) dataset. On these two datasets, our model sets the new state of the art. |
Tasks | Question Answering, Reading Comprehension |
Published | 2016-10-13 |
URL | http://arxiv.org/abs/1610.04211v2 |
http://arxiv.org/pdf/1610.04211v2.pdf | |
PWC | https://paperswithcode.com/paper/gated-end-to-end-memory-networks |
Repo | https://github.com/cstghitpku/GateMemN2N |
Framework | tf |
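The gate added to each memory hop follows the highway-network pattern. A numpy sketch of one gated hop, with the gate parameters named after the paper's notation as we understand it:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_hop(u, o, W_T, b_T):
    """One gated memory hop: the gate T(u) = sigmoid(W_T u + b_T)
    decides how much of the memory output o to mix into the
    controller state u."""
    gate = sigmoid(W_T @ u + b_T)
    return o * gate + u * (1.0 - gate)

d = 64
u = np.random.randn(d)                 # controller state
o = np.random.randn(d)                 # memory read output for this hop
W_T, b_T = np.random.randn(d, d) * 0.01, np.zeros(d)
u_next = gated_hop(u, o, W_T, b_T)
```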
Trace Norm Regularised Deep Multi-Task Learning
Title | Trace Norm Regularised Deep Multi-Task Learning |
Authors | Yongxin Yang, Timothy M. Hospedales |
Abstract | We propose a framework for training multiple neural networks simultaneously. The parameters from all models are regularised by the tensor trace norm, so that each neural network is encouraged to reuse others’ parameters if possible – this is the main motivation behind multi-task learning. In contrast to many deep multi-task learning models, we do not predefine a parameter sharing strategy by specifying which layers have tied parameters. Instead, our framework considers sharing for all shareable layers, and the sharing strategy is learned in a data-driven way. |
Tasks | Multi-Task Learning |
Published | 2016-06-13 |
URL | http://arxiv.org/abs/1606.04038v2 |
http://arxiv.org/pdf/1606.04038v2.pdf | |
PWC | https://paperswithcode.com/paper/trace-norm-regularised-deep-multi-task |
Repo | https://github.com/wOOL/TNRDMTL |
Framework | tf |
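One common way to realize a tensor trace norm is to penalize the nuclear norms of the tensor's unfoldings. A numpy sketch under that assumption; the sum-of-unfoldings surrogate and the gamma weights are our choices, not necessarily the exact variant used in the paper:

```python
import numpy as np

def nuclear_norm(M):
    """Sum of singular values."""
    return np.linalg.svd(M, compute_uv=False).sum()

def tensor_trace_norm(W, gammas=(1.0, 1.0, 1.0)):
    """W: (T, d_in, d_out), one layer's weights stacked across T tasks."""
    unfoldings = [
        W.reshape(W.shape[0], -1),                     # mode-1
        W.transpose(1, 0, 2).reshape(W.shape[1], -1),  # mode-2
        W.transpose(2, 0, 1).reshape(W.shape[2], -1),  # mode-3
    ]
    return sum(g * nuclear_norm(M) for g, M in zip(gammas, unfoldings))

W = np.random.randn(4, 128, 64)   # 4 tasks sharing one layer's shape
reg = tensor_trace_norm(W)        # added to the sum of the task losses
```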
Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification
Title | Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification |
Authors | Yongxi Lu, Abhishek Kumar, Shuangfei Zhai, Yu Cheng, Tara Javidi, Rogerio Feris |
Abstract | Multi-task learning aims to improve generalization performance of multiple prediction tasks by appropriately sharing relevant information across them. In the context of deep neural networks, this idea is often realized by hand-designed network architectures with layers that are shared across tasks and branches that encode task-specific features. However, the space of possible multi-task deep architectures is combinatorially large, and the final architecture is often arrived at by manual exploration of this space subject to the designer's bias, which can be both error-prone and tedious. In this work, we propose a principled approach for designing compact multi-task deep learning architectures. Our approach starts with a thin network and dynamically widens it in a greedy manner during training, using a novel criterion that promotes grouping of similar tasks together. Our extensive evaluation on person attribute classification tasks involving facial and clothing attributes suggests that the models produced by the proposed method are fast, compact, and can closely match or exceed the state-of-the-art accuracy of strong baselines built from much more expensive models. |
Tasks | Multi-Task Learning |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05377v1 |
http://arxiv.org/pdf/1611.05377v1.pdf | |
PWC | https://paperswithcode.com/paper/fully-adaptive-feature-sharing-in-multi-task |
Repo | https://github.com/hardianlawi/MTL-Homoscedastic-Uncertainty |
Framework | tf |
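A rough sketch of the grouping signal, assuming task affinity is estimated from how often two tasks err on the same examples; the affinity formula and the single greedy merge below are illustrative simplifications of the paper's criterion, which also trades affinity off against a model-complexity penalty when widening the network:

```python
import numpy as np

def task_affinity(errors):
    """errors: (num_tasks, num_examples), 1 = example is hard for the task.
    Affinity is high when two tasks tend to err on the same examples."""
    e = errors.astype(float)
    co = e @ e.T / e.shape[1]                # P(i and j err together)
    marg = e.mean(axis=1, keepdims=True)     # P(i errs)
    return co / np.sqrt(marg @ marg.T + 1e-12)

errors = (np.random.rand(5, 1000) < 0.2).astype(int)
A = task_affinity(errors)
np.fill_diagonal(A, -np.inf)                 # ignore self-affinity
i, j = np.unravel_index(np.argmax(A), A.shape)
print(f"merge tasks {i} and {j} into one shared branch")
```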
Decision Forests, Convolutional Networks and the Models in-Between
Title | Decision Forests, Convolutional Networks and the Models in-Between |
Authors | Yani Ioannou, Duncan Robertson, Darko Zikic, Peter Kontschieder, Jamie Shotton, Matthew Brown, Antonio Criminisi |
Abstract | This paper investigates the connections between two state-of-the-art classifiers: decision forests (DFs, including decision jungles) and convolutional neural networks (CNNs). Decision forests are computationally efficient thanks to their conditional computation property (computation is confined to only a small region of the tree, the nodes along a single branch). CNNs achieve state-of-the-art accuracy thanks to their representation learning capabilities. We present a systematic analysis of how to fuse conditional computation with representation learning and achieve a continuum of hybrid models with different ratios of accuracy vs. efficiency. We call this new family of hybrid models conditional networks. Conditional networks can be thought of as: i) decision trees augmented with data transformation operators, or ii) CNNs with block-diagonal sparse weight matrices and explicit data routing functions. Experimental validation is performed on the common task of image classification on both the CIFAR and ImageNet datasets. Compared to state-of-the-art CNNs, our hybrid models yield the same accuracy at a fraction of the compute cost and with a much smaller number of parameters. |
Tasks | Image Classification, Representation Learning |
Published | 2016-03-03 |
URL | http://arxiv.org/abs/1603.01250v1 |
http://arxiv.org/pdf/1603.01250v1.pdf | |
PWC | https://paperswithcode.com/paper/decision-forests-convolutional-networks-and |
Repo | https://github.com/PierrickPochelu/word_tree_label |
Framework | tf |
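A conceptual PyTorch sketch of conditional computation: a cheap router selects one branch per input and only the selected branch is evaluated. The hard arg-max routing rule and the layer sizes are ours, for illustration:

```python
import torch
import torch.nn as nn

class ConditionalBlock(nn.Module):
    def __init__(self, dim, num_branches=2):
        super().__init__()
        self.router = nn.Linear(dim, num_branches)
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
            for _ in range(num_branches)
        )

    def forward(self, x):
        route = self.router(x).argmax(dim=1)   # hard routing decision
        out = torch.zeros_like(x)
        for b, branch in enumerate(self.branches):
            mask = route == b
            if mask.any():                     # compute only the used branches
                out[mask] = branch(x[mask])
        return out

block = ConditionalBlock(dim=16)
y = block(torch.randn(8, 16))
```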
Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling
Title | Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling |
Authors | Bing Liu, Ian Lane |
Abstract | Attention-based encoder-decoder neural network models have recently shown promising results in machine translation and speech recognition. In this work, we propose an attention-based neural network model for joint intent detection and slot filling, both of which are critical steps for many speech understanding and dialog systems. Unlike in machine translation and speech recognition, alignment is explicit in slot filling. We explore different strategies for incorporating this alignment information into the encoder-decoder framework. Drawing on the attention mechanism in the encoder-decoder model, we further propose introducing attention to the alignment-based RNN models. Such attention provides additional information for intent classification and slot label prediction. Our independent task models achieve state-of-the-art intent detection error rate and slot filling F1 score on the benchmark ATIS task. Our joint training model further obtains 0.56% absolute (23.8% relative) error reduction on intent detection and 0.23% absolute gain on slot filling over the independent task models. |
Tasks | Intent Classification, Intent Detection, Slot Filling |
Published | 2016-09-06 |
URL | http://arxiv.org/abs/1609.01454v1 |
http://arxiv.org/pdf/1609.01454v1.pdf | |
PWC | https://paperswithcode.com/paper/attention-based-recurrent-neural-network |
Repo | https://github.com/Sungguk/Jointly-Training-of-Sequence-Labeling-and-Classification |
Framework | tf |
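A minimal PyTorch sketch of the joint setup: a shared encoder feeding a per-token slot head and an utterance-level intent head. The BiLSTM sizes and the use of the last hidden state for intent are our simplifications; the paper additionally uses attention over the encoder states:

```python
import torch
import torch.nn as nn

class JointNLU(nn.Module):
    def __init__(self, vocab, n_slots, n_intents, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.enc = nn.LSTM(d, d, bidirectional=True, batch_first=True)
        self.slot_head = nn.Linear(2 * d, n_slots)
        self.intent_head = nn.Linear(2 * d, n_intents)

    def forward(self, tokens):
        h, _ = self.enc(self.emb(tokens))           # (B, T, 2d)
        slot_logits = self.slot_head(h)             # per-token slot labels
        intent_logits = self.intent_head(h[:, -1])  # utterance-level intent
        return slot_logits, intent_logits

model = JointNLU(vocab=1000, n_slots=20, n_intents=8)
slots, intent = model(torch.randint(0, 1000, (4, 12)))
```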
Non-convex Global Minimization and False Discovery Rate Control for the TREX
Title | Non-convex Global Minimization and False Discovery Rate Control for the TREX |
Authors | Jacob Bien, Irina Gaynanova, Johannes Lederer, Christian Müller |
Abstract | The TREX is a recently introduced method for performing sparse high-dimensional regression. Despite its statistical promise as an alternative to the lasso, square-root lasso, and scaled lasso, the TREX is computationally challenging in that it requires solving a non-convex optimization problem. This paper shows a remarkable result: despite the non-convexity of the TREX problem, there exists a polynomial-time algorithm that is guaranteed to find the global minimum. This result adds the TREX to a very short list of non-convex optimization problems that can be globally optimized (principal components analysis being a famous example). After deriving and developing this new approach, we demonstrate that (i) the ability of the preexisting TREX heuristic to reach the global minimum is strongly dependent on the difficulty of the underlying statistical problem, (ii) the new polynomial-time algorithm for TREX permits a novel variable ranking and selection scheme, (iii) this scheme can be incorporated into a rule that controls the false discovery rate (FDR) of included features in the model. To achieve this last aim, we provide an extension of the results of Barber & Candes (2015) to establish that the knockoff filter framework can be applied to the TREX. This investigation thus provides both a rare case study of a heuristic for non-convex optimization and a novel way of exploiting non-convexity for statistical inference. |
Tasks | |
Published | 2016-04-22 |
URL | http://arxiv.org/abs/1604.06815v2 |
http://arxiv.org/pdf/1604.06815v2.pdf | |
PWC | https://paperswithcode.com/paper/non-convex-global-minimization-and-false |
Repo | https://github.com/muellsen/TREX |
Framework | none |
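The paper's key observation is that the TREX objective splits into 2p convex subproblems, one per choice of which coordinate (and sign) attains the max in the denominator. The sketch below only illustrates that decomposition; it uses a generic derivative-free solver rather than the second-order cone programs that give the polynomial-time guarantee:

```python
import numpy as np
from scipy.optimize import minimize

# TREX objective:
#   f(b) = ||y - Xb||_2^2 / (c * max_j |x_j^T (y - Xb)|) + ||b||_1

def trex_subproblem(X, y, j, s, c=0.5):
    """Subproblem where coordinate j with sign s attains the max."""
    def obj(b):
        r = y - X @ b
        denom = c * s * (X[:, j] @ r)
        if denom <= 1e-10:
            return 1e10              # penalize infeasible sign patterns
        return (r @ r) / denom + np.abs(b).sum()
    res = minimize(obj, np.zeros(X.shape[1]), method="Nelder-Mead")
    return res.fun, res.x

n, p = 50, 5
X, y = np.random.randn(n, p), np.random.randn(n)
best_val, best_beta = min(
    (trex_subproblem(X, y, j, s) for j in range(p) for s in (+1, -1)),
    key=lambda t: t[0],
)
```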
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
Title | Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks |
Authors | Tianfan Xue, Jiajun Wu, Katherine L. Bouman, William T. Freeman |
Abstract | We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods, which have tackled this problem in a deterministic or non-parametric way, we propose a novel approach that models future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. Future frame synthesis is challenging, as it involves low- and high-level image and motion understanding. We propose a novel network structure, namely a Cross Convolutional Network, to aid in synthesizing future frames; this network structure encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, as well as on real-world videos. We also show that our model can be applied to tasks such as visual analogy-making, and present an analysis of the learned network representations. |
Tasks | |
Published | 2016-07-09 |
URL | http://arxiv.org/abs/1607.02586v1 |
http://arxiv.org/pdf/1607.02586v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-dynamics-probabilistic-future-frame |
Repo | https://github.com/tensorflow/models |
Framework | tf |
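The cross convolution at the heart of the model convolves each sample's image feature maps with that sample's predicted kernels. A PyTorch sketch using the standard grouped-convolution trick; shapes are illustrative, and in the paper both tensors come from learned encoders:

```python
import torch
import torch.nn.functional as F

B, C, H, W, K = 4, 8, 32, 32, 5
feature_maps = torch.randn(B, C, H, W)   # would come from the image encoder
kernels = torch.randn(B, C, K, K)        # would come from the motion encoder

# Fold the batch into channels so each sample is convolved with its own
# kernels (depthwise: one kernel per channel).
x = feature_maps.reshape(1, B * C, H, W)
w = kernels.reshape(B * C, 1, K, K)
out = F.conv2d(x, w, padding=K // 2, groups=B * C).reshape(B, C, H, W)
```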
Domain Separation Networks
Title | Domain Separation Networks |
Authors | Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, Dumitru Erhan |
Abstract | The cost of large scale data collection and annotation often makes the application of machine learning algorithms to new tasks or datasets prohibitively expensive. One approach circumventing this cost is training models on synthetic data where annotations are provided automatically. Despite their appeal, such models often fail to generalize from synthetic to real images, necessitating domain adaptation algorithms to manipulate these models before they can be successfully applied. Existing approaches focus either on mapping representations from one domain to the other, or on learning to extract features that are invariant to the domain from which they were extracted. However, by focusing only on creating a mapping or shared representation between the two domains, they ignore the individual characteristics of each domain. We suggest that explicitly modeling what is unique to each domain can improve a model’s ability to extract domain-invariant features. Inspired by work on private-shared component analysis, we explicitly learn to extract image representations that are partitioned into two subspaces: one component which is private to each domain and one which is shared across domains. Our model is trained not only to perform the task we care about in the source domain, but also to use the partitioned representation to reconstruct the images from both domains. Our novel architecture results in a model that outperforms the state-of-the-art on a range of unsupervised domain adaptation scenarios and additionally produces visualizations of the private and shared representations enabling interpretation of the domain adaptation process. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2016-08-22 |
URL | http://arxiv.org/abs/1608.06019v1 |
http://arxiv.org/pdf/1608.06019v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-separation-networks |
Repo | https://github.com/tensorflow/models/tree/master/research/domain_adaptation |
Framework | tf |
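A sketch of the "difference loss" that drives the private/shared split: the squared Frobenius norm of the correlation between the two representations of a batch. The mean-centering is our choice of normalization; the full objective also adds task, reconstruction, and domain-similarity terms:

```python
import torch

def difference_loss(h_shared, h_private):
    """Soft orthogonality between the shared and private encodings
    of the same batch."""
    hs = h_shared - h_shared.mean(0, keepdim=True)
    hp = h_private - h_private.mean(0, keepdim=True)
    return (hs.t() @ hp).pow(2).sum()

h_s, h_p = torch.randn(32, 100), torch.randn(32, 100)
print(difference_loss(h_s, h_p))
```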
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Title | COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images |
Authors | Andreas Veit, Tomas Matera, Lukas Neumann, Jiri Matas, Serge Belongie |
Abstract | This paper describes the COCO-Text dataset. In recent years, large-scale datasets like SUN and ImageNet drove the advancement of scene understanding and object recognition. The goal of COCO-Text is to advance the state of the art in text detection and recognition in natural images. The dataset is based on the MS COCO dataset, which contains images of complex everyday scenes. The images were not collected with text in mind and thus contain a broad variety of text instances. To reflect the diversity of text in natural scenes, we annotate text with (a) location in terms of a bounding box, (b) fine-grained classification into machine printed text and handwritten text, (c) classification into legible and illegible text, (d) script of the text and (e) transcriptions of legible text. The dataset contains over 173k text annotations in over 63k images. We provide a statistical analysis of the accuracy of our annotations. In addition, we present an analysis of three leading state-of-the-art photo Optical Character Recognition (OCR) approaches on our dataset. While scene text detection and recognition have enjoyed strong advances in recent years, we identify significant shortcomings that motivate future work. |
Tasks | Object Recognition, Optical Character Recognition, Scene Text Detection, Scene Understanding |
Published | 2016-01-26 |
URL | http://arxiv.org/abs/1601.07140v2 |
http://arxiv.org/pdf/1601.07140v2.pdf | |
PWC | https://paperswithcode.com/paper/coco-text-dataset-and-benchmark-for-text |
Repo | https://github.com/OzHsu23/chineseocr |
Framework | tf |
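A hypothetical annotation record illustrating the five annotation types (a)-(e) listed in the abstract; the field names are ours and do not claim to match the official COCO-Text JSON schema:

```python
# Hypothetical record, one per text instance.
annotation = {
    "image_id": 123456,
    "bbox": [210.0, 87.5, 64.0, 18.0],  # (a) location: x, y, width, height
    "class": "machine printed",         # (b) machine printed vs. handwritten
    "legibility": "legible",            # (c) legible vs. illegible
    "script": "latin",                  # (d) script of the text
    "utf8_string": "EXIT",              # (e) transcription (legible text only)
}
```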
Deep Shading: Convolutional Neural Networks for Screen-Space Shading
Title | Deep Shading: Convolutional Neural Networks for Screen-Space Shading |
Authors | Oliver Nalbach, Elena Arabadzhiyska, Dushyant Mehta, Hans-Peter Seidel, Tobias Ritschel |
Abstract | In computer vision, convolutional neural networks (CNNs) have recently achieved new levels of performance for several inverse problems where RGB pixel appearance is mapped to attributes such as positions, normals or reflectance. In computer graphics, screen-space shading has recently increased the visual quality in interactive image synthesis, where per-pixel attributes such as positions, normals or reflectance of a virtual 3D scene are converted into RGB pixel appearance, enabling effects like ambient occlusion, indirect light, scattering, depth-of-field, motion blur, or anti-aliasing. In this paper we consider the diagonal problem: synthesizing appearance from given per-pixel attributes using a CNN. The resulting Deep Shading simulates various screen-space effects at competitive quality and speed while not being programmed by human experts but learned from example images. |
Tasks | Image Generation |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06078v2 |
http://arxiv.org/pdf/1603.06078v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-shading-convolutional-neural-networks |
Repo | https://github.com/paragchaudhuri/CS775Project |
Framework | none |
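A toy PyTorch sketch of the mapping the paper learns: stack per-pixel scene attributes (a deferred-shading G-buffer) as input channels and regress RGB appearance. The three-layer net is only illustrative; the paper uses a much deeper U-Net-style architecture:

```python
import torch
import torch.nn as nn

gbuffer = torch.cat([
    torch.randn(1, 3, 128, 128),   # camera-space positions
    torch.randn(1, 3, 128, 128),   # normals
    torch.randn(1, 3, 128, 128),   # diffuse reflectance
], dim=1)                          # (1, 9, H, W) attribute stack

net = nn.Sequential(
    nn.Conv2d(9, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),  # RGB in [0, 1]
)
rgb = net(gbuffer)   # trained against rendered ground-truth shading
```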
A Novel Framework to Expedite Systematic Reviews by Automatically Building Information Extraction Training Corpora
Title | A Novel Framework to Expedite Systematic Reviews by Automatically Building Information Extraction Training Corpora |
Authors | Tanmay Basu, Shraman Kumar, Abhishek Kalyan, Priyanka Jayaswal, Pawan Goyal, Stephen Pettifer, Siddhartha R. Jonnalagadda |
Abstract | A systematic review identifies and collates various clinical studies and compares data elements and results in order to provide an evidence-based answer for a particular clinical question. The process is manual and very time-consuming, and a tool to automate it is lacking. The aim of this work is to develop a framework using natural language processing and machine learning to build information extraction algorithms that identify data elements in a new primary publication, without having to go through the expensive task of manual annotation to build gold standards for each data element type. The system is developed in two stages. Initially, it uses information contained in existing systematic reviews to identify the sentences from the PDF files of the included references that contain specific data elements of interest, using a modified Jaccard similarity measure. These sentences are treated as labeled data. A Support Vector Machine (SVM) classifier is trained on this labeled data to extract data elements of interest from a new article. We conducted experiments on Cochrane Database systematic reviews related to congestive heart failure, using inclusion criteria as an example data element. The empirical results show that the proposed system automatically identifies sentences containing the data element of interest with high recall (93.75%) and reasonable precision (27.05%, which means the reviewers have to read only 3.7 sentences on average). The empirical results suggest that the tool retrieves valuable information from the reference articles, even when it is time-consuming to identify them manually. We therefore hope that the tool will be useful for automatic data extraction from biomedical research publications. The future scope of this work is to generalize this information framework to all types of systematic reviews. |
Tasks | |
Published | 2016-06-21 |
URL | http://arxiv.org/abs/1606.06424v1 |
http://arxiv.org/pdf/1606.06424v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-framework-to-expedite-systematic |
Repo | https://github.com/tanmaybasu/Towards-Expediting-the-Process-of-Building-Systematic-Review-using-Machine-Learning |
Framework | none |
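A sketch of the two-stage pipeline under stated assumptions: sentences similar to the review text by a Jaccard measure over token sets become positive labels, and a TF-IDF plus linear SVM classifier is trained on them. The threshold and the toy data are ours:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def jaccard(a, b):
    """Token-set Jaccard similarity between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)

# Stage 1: label sentences by similarity to the review's data element text.
review_text = "patients with congestive heart failure aged over 65"
sentences = [
    "We enrolled patients with congestive heart failure aged over 65.",
    "The weather station recorded daily rainfall totals.",
]
labels = [int(jaccard(review_text, s) > 0.3) for s in sentences]

# Stage 2: train an SVM on the automatically labeled sentences.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(sentences, labels)
print(clf.predict(["Eligible subjects had heart failure and were over 65."]))
```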