October 20, 2019

3307 words 16 mins read

Paper Group AWR 205

Trained Rank Pruning for Efficient Deep Neural Networks. An Unsupervised Learning Model for Deformable Medical Image Registration. Did the Model Understand the Question?. Identifying Well-formed Natural Language Questions. A Comprehensive Analysis of Deep Regression. Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation. Mu …

Trained Rank Pruning for Efficient Deep Neural Networks

Title Trained Rank Pruning for Efficient Deep Neural Networks
Authors Yuhui Xu, Yuxi Li, Shuai Zhang, Wei Wen, Botao Wang, Yingyong Qi, Yiran Chen, Weiyao Lin, Hongkai Xiong
Abstract The performance of Deep Neural Networks (DNNs) has kept improving in recent years with increasing network depth and width. To enable DNNs on edge devices like mobile phones, researchers have proposed several network compression methods, including pruning, quantization and factorization. Among the factorization-based approaches, low-rank approximation has been widely adopted because of its solid theoretical rationale and efficient implementations. Several previous works attempted to directly approximate a pre-trained model by low-rank decomposition; however, small approximation errors in parameters can ripple into a large prediction loss. As a result, performance usually drops significantly and sophisticated fine-tuning is required to recover accuracy. We argue that it is not optimal to separate low-rank approximation from training. Unlike previous works, this paper integrates low-rank approximation and regularization into the training process. We propose Trained Rank Pruning (TRP), which alternates between low-rank approximation and training. TRP maintains the capacity of the original network while imposing low-rank constraints during training. A nuclear-norm regularizer, optimized by stochastic sub-gradient descent, further encourages low rank in TRP. The TRP-trained network is inherently low-rank and can be approximated with negligible performance loss, eliminating fine-tuning after low-rank approximation. The methods are comprehensively evaluated on CIFAR-10 and ImageNet, outperforming previous compression methods based on low-rank approximation. Code is available: https://github.com/yuhuixu1993/Trained-Rank-Pruning
Tasks Quantization
Published 2018-12-06
URL https://arxiv.org/abs/1812.02402v3
PDF https://arxiv.org/pdf/1812.02402v3.pdf
PWC https://paperswithcode.com/paper/trained-rank-pruning-for-efficient-deep
Repo https://github.com/yuhuixu1993/Trained-Rank-Pruning
Framework pytorch
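
The training loop is easy to sketch: periodically replace each weight matrix with a truncated SVD reconstruction, and add a nuclear-norm sub-gradient to the weight updates. Below is a minimal PyTorch sketch of those two ingredients, not the authors' implementation; the energy threshold, regularization weight `lam`, projection period `k`, and the `two_dim_weights` helper (selecting reshaped convolution kernels) are all illustrative assumptions.

```python
import torch

def low_rank_project(weight, energy=0.98):
    # keep the leading singular values covering `energy` of the spectrum
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    k = int(torch.searchsorted(torch.cumsum(S, 0) / S.sum(),
                               torch.tensor(energy)).item()) + 1
    return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

def nuclear_subgradient(weight):
    # a sub-gradient of the nuclear norm ||W||_* is U @ Vh from the SVD of W
    U, _, Vh = torch.linalg.svd(weight, full_matrices=False)
    return U @ Vh

# inside an ordinary training loop (sketch):
#   loss.backward()
#   for p in two_dim_weights(model):
#       p.grad += lam * nuclear_subgradient(p.data)   # nuclear-norm term
#   optimizer.step()
#   every k iterations:
#       p.data.copy_(low_rank_project(p.data))        # TRP projection step
```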

An Unsupervised Learning Model for Deformable Medical Image Registration

Title An Unsupervised Learning Model for Deformable Medical Image Registration
Authors Guha Balakrishnan, Amy Zhao, Mert R. Sabuncu, John Guttag, Adrian V. Dalca
Abstract We present a fast learning-based algorithm for deformable, pairwise 3D medical image registration. Current registration methods optimize an objective function independently for each pair of images, which can be time-consuming for large data. We define registration as a parametric function, and optimize its parameters given a set of images from a collection of interest. Given a new pair of scans, we can quickly compute a registration field by directly evaluating the function using the learned parameters. We model this function using a convolutional neural network (CNN), and use a spatial transform layer to reconstruct one image from another while imposing smoothness constraints on the registration field. The proposed method does not require supervised information such as ground truth registration fields or anatomical landmarks. We demonstrate registration accuracy comparable to state-of-the-art 3D image registration, while operating orders of magnitude faster in practice. Our method promises to significantly speed up medical image analysis and processing pipelines, while facilitating novel directions in learning-based registration and its applications. Our code is available at https://github.com/balakg/voxelmorph .
Tasks Deformable Medical Image Registration, Image Registration, Medical Image Registration
Published 2018-02-07
URL http://arxiv.org/abs/1802.02604v3
PDF http://arxiv.org/pdf/1802.02604v3.pdf
PWC https://paperswithcode.com/paper/an-unsupervised-learning-model-for-deformable
Repo https://github.com/yh854/Rigid-Registration-of-3D-MRI-Based-on-Unsupervised-Learning
Framework tf
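
The training signal is unsupervised: warp the moving image with the predicted field, compare it to the fixed image, and penalize non-smooth fields. Below is a 2-D sketch of such a loss in PyTorch (the paper's model is a 3-D CNN with a U-Net-style architecture); the displacement field is assumed to be in normalized grid coordinates, and the smoothness weight `lam` is illustrative.

```python
import torch
import torch.nn.functional as F

def registration_loss(moving, fixed, flow, lam=0.01):
    """moving, fixed: (N, C, H, W); flow: (N, 2, H, W) displacements in
    normalized [-1, 1] grid coordinates."""
    n, _, h, w = moving.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=moving.device),
                            torch.linspace(-1, 1, w, device=moving.device),
                            indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    # spatial-transform step: resample the moving image along the warped grid
    warped = F.grid_sample(moving, grid + flow.permute(0, 2, 3, 1),
                           align_corners=True)
    sim = F.mse_loss(warped, fixed)                   # image similarity
    smooth = (flow[..., 1:, :] - flow[..., :-1, :]).pow(2).mean() + \
             (flow[..., :, 1:] - flow[..., :, :-1]).pow(2).mean()
    return sim + lam * smooth                         # smoothness penalty
```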

Did the Model Understand the Question?

Title Did the Model Understand the Question?
Authors Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, Kedar Dhamdhere
Abstract We analyze state-of-the-art deep learning models for three tasks: question answering on (1) images, (2) tables, and (3) passages of text. Using the notion of attribution (word importance), we find that these deep networks often ignore important question terms. Leveraging such behavior, we perturb questions to craft a variety of adversarial examples. Our strongest attacks drop the accuracy of a visual question answering model from 61.1% to 19%, and that of a tabular question answering model from 33.5% to 3.3%. Additionally, we show how attributions can strengthen attacks proposed by Jia and Liang (2017) on paragraph comprehension models. Our results demonstrate that attributions can augment standard measures of accuracy and empower investigation of model performance. When a model is accurate but for the wrong reasons, attributions can surface erroneous logic in the model that indicates inadequacies in the test data.
Tasks Question Answering, Visual Question Answering
Published 2018-05-14
URL http://arxiv.org/abs/1805.05492v1
PDF http://arxiv.org/pdf/1805.05492v1.pdf
PWC https://paperswithcode.com/paper/did-the-model-understand-the-question
Repo https://github.com/ankurtaly/Integrated-Gradients
Framework tf
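
The attributions in this paper come from Integrated Gradients (the linked repo). A generic sketch of that attribution method: average the input gradients of the target score along a straight path from a baseline to the input, then scale by the input difference. The `model` call signature below is an assumption.

```python
import torch

def integrated_gradients(model, x, baseline, target, steps=50):
    # Riemann approximation of the path integral of gradients
    total = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        score = model(point)[0, target]   # assumes a (batch, classes) output
        score.backward()
        total += point.grad
    return (x - baseline) * total / steps
```

Question terms receiving near-zero attribution are exactly the candidates the paper perturbs to build its adversarial examples.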

Identifying Well-formed Natural Language Questions

Title Identifying Well-formed Natural Language Questions
Authors Manaal Faruqui, Dipanjan Das
Abstract Understanding search queries is a hard problem as it involves dealing with “word salad” text ubiquitously issued by users. However, if a query resembles a well-formed question, a natural language processing pipeline is able to perform more accurate interpretation, thus reducing downstream compounding errors. Hence, identifying whether or not a query is well formed can enhance query understanding. Here, we introduce a new task of identifying a well-formed natural language question. We construct and release a dataset of 25,100 publicly available questions classified into well-formed and non-wellformed categories and report an accuracy of 70.7% on the test set. We also show that our classifier can be used to improve the performance of neural sequence-to-sequence models for generating questions for reading comprehension.
Tasks Query Wellformedness
Published 2018-08-28
URL http://arxiv.org/abs/1808.09419v1
PDF http://arxiv.org/pdf/1808.09419v1.pdf
PWC https://paperswithcode.com/paper/identifying-well-formed-natural-language
Repo https://github.com/google-research-datasets/query-wellformedness
Framework none
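
A minimal baseline sketch for the released data, assuming (as in the linked repo) tab-separated lines pairing a question with a human well-formedness score in [0, 1]. Thresholding at 0.8 to get binary labels follows the paper, while the TF-IDF model is just an illustrative stand-in for the paper's classifier.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def load(path, threshold=0.8):
    questions, labels = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            text, score = line.rstrip("\n").rsplit("\t", 1)
            questions.append(text)
            labels.append(int(float(score) >= threshold))  # well-formed or not
    return questions, labels

X_train, y_train = load("train.tsv")
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print(clf.predict(["who direct the movie inception", "Who directed Inception?"]))
```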

A Comprehensive Analysis of Deep Regression

Title A Comprehensive Analysis of Deep Regression
Authors Stéphane Lathuilière, Pablo Mesejo, Xavier Alameda-Pineda, Radu Horaud
Abstract Deep learning has revolutionized data science, and recently its popularity has grown exponentially, as has the number of papers employing deep networks. Vision tasks, such as human pose estimation, did not escape this trend. There is a large number of deep models in which small changes in the network architecture, or in the data pre-processing, together with the stochastic nature of the optimization procedures, produce notably different results, making it extremely difficult to sift out the methods that significantly outperform others. This situation motivates the current study, in which we perform a systematic evaluation and statistical analysis of vanilla deep regression, i.e. convolutional neural networks with a linear regression top layer. This is the first comprehensive analysis of deep regression techniques. We perform experiments on four vision problems, and report confidence intervals for the median performance as well as the statistical significance of the results, if any. Surprisingly, the variability due to different data pre-processing procedures generally eclipses the variability due to modifications in the network architecture. Our results reinforce the hypothesis that, in general, an adequately tuned general-purpose network (e.g. VGG-16 or ResNet-50) can yield results close to the state of the art without resorting to more complex and ad-hoc regression models.
Tasks Pose Estimation
Published 2018-03-22
URL http://arxiv.org/abs/1803.08450v2
PDF http://arxiv.org/pdf/1803.08450v2.pdf
PWC https://paperswithcode.com/paper/a-comprehensive-analysis-of-deep-regression
Repo https://github.com/seanmcgovern21/Machine-Learning-CS539
Framework pytorch
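
"Vanilla deep regression" here means nothing more than a standard backbone with a linear regression layer on top. A PyTorch sketch follows; the output dimension and the choice of loss are task-dependent assumptions, not prescriptions from the paper.

```python
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=None)             # or VGG-16, as in the paper
backbone.fc = nn.Linear(backbone.fc.in_features, 2)  # linear regression top layer
criterion = nn.MSELoss()  # e.g. 2-D landmark coordinates; targets are task-specific
```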

Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation

Title Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation
Authors Chengyang Li, Dan Song, Ruofeng Tong, Min Tang
Abstract Multispectral pedestrian detection has attracted increasing attention from the research community due to its crucial competence for many around-the-clock applications (e.g., video surveillance and autonomous driving), especially under insufficient illumination conditions. We create a human baseline over the KAIST dataset and reveal that there is still a large gap between current top detectors and human performance. To narrow this gap, we propose a network fusion architecture, which consists of a multispectral proposal network to generate pedestrian proposals, and a subsequent multispectral classification network to distinguish pedestrian instances from hard negatives. The unified network is learned by jointly optimizing pedestrian detection and semantic segmentation tasks. The final detections are obtained by integrating the outputs from different modalities as well as the two stages. The approach significantly outperforms state-of-the-art methods on the KAIST dataset while remaining fast. Additionally, we contribute a sanitized version of training annotations for the KAIST dataset, and examine the effects caused by different kinds of annotation errors. Future research on this problem will benefit from the sanitized version, which eliminates the interference of annotation errors.
Tasks Autonomous Driving, Pedestrian Detection, Semantic Segmentation
Published 2018-08-14
URL http://arxiv.org/abs/1808.04818v1
PDF http://arxiv.org/pdf/1808.04818v1.pdf
PWC https://paperswithcode.com/paper/multispectral-pedestrian-detection-via
Repo https://github.com/Li-Chengyang/MSDS-RCNN
Framework tf
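
The architectural idea is two modality streams whose fused features are supervised by both detection and segmentation heads. The toy module below sketches only that fusion-plus-auxiliary-segmentation pattern; channel sizes are arbitrary and the proposal/classification stages of MSDS-RCNN are omitted.

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Sketch of halfway feature fusion for RGB + thermal inputs."""
    def __init__(self):
        super().__init__()
        self.rgb = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.thermal = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.fused = nn.Conv2d(64, 64, 1)     # 1x1 conv after concatenation
        self.seg_head = nn.Conv2d(64, 2, 1)   # auxiliary segmentation supervision

    def forward(self, rgb, thermal):
        f = torch.cat([self.rgb(rgb), self.thermal(thermal)], dim=1)
        f = self.fused(f)
        return f, self.seg_head(f)  # features for detection + segmentation logits
```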

Multilevel Artificial Neural Network Training for Spatially Correlated Learning

Title Multilevel Artificial Neural Network Training for Spatially Correlated Learning
Authors C. B. Scott, Eric Mjolsness
Abstract Multigrid modeling algorithms are a technique used to accelerate relaxation models running on a hierarchy of similar graphlike structures. We introduce and demonstrate a new method for training neural networks which uses multilevel methods. Using an objective function derived from a graph-distance metric, we perform orthogonally-constrained optimization to find optimal prolongation and restriction maps between graphs. We compare and contrast several methods for performing this numerical optimization, and additionally present some new theoretical results on upper bounds of this type of objective function. Once calculated, these optimal maps between graphs form the core of Multiscale Artificial Neural Network (MsANN) training, a new procedure we present which simultaneously trains a hierarchy of neural network models of varying spatial resolution. Parameter information is passed between members of this hierarchy according to standard coarsening and refinement schedules from the multiscale modelling literature. In our machine learning experiments, these models are able to learn faster than default training, achieving a comparable level of error in an order of magnitude fewer training examples.
Tasks
Published 2018-06-14
URL https://arxiv.org/abs/1806.05703v3
PDF https://arxiv.org/pdf/1806.05703v3.pdf
PWC https://paperswithcode.com/paper/multilevel-artificial-neural-network-training
Repo https://github.com/scottcb/MsANN
Framework tf
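
Parameter transfer between levels is done with prolongation and restriction maps. A toy NumPy sketch, with a random orthonormal map standing in for the optimized one from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
fine_dim, coarse_dim = 64, 16
# stand-in prolongation map with orthonormal columns; the paper instead finds P
# by orthogonally-constrained optimization of a graph-distance objective
P, _ = np.linalg.qr(rng.standard_normal((fine_dim, coarse_dim)))

def restrict(theta_fine):    # fine -> coarse parameter transfer
    return P.T @ theta_fine

def prolong(theta_coarse):   # coarse -> fine parameter transfer
    return P @ theta_coarse

# schedule sketch: train briefly at the coarse level, prolong the parameters to
# the fine level, train there, restrict back, and repeat (a multigrid V-cycle)
```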

Transferring Knowledge across Learning Processes

Title Transferring Knowledge across Learning Processes
Authors Sebastian Flennerhag, Pablo G. Moreno, Neil D. Lawrence, Andreas Damianou
Abstract In complex transfer learning scenarios new tasks might not be tightly linked to previous tasks. Approaches that transfer information contained only in the final parameters of a source model will therefore struggle. Instead, transfer learning at a higher level of abstraction is needed. We propose Leap, a framework that achieves this by transferring knowledge across learning processes. We associate each task with a manifold on which the training process travels from initialization to final parameters and construct a meta-learning objective that minimizes the expected length of this path. Our framework leverages only information obtained during training and can be computed on the fly at negligible cost. We demonstrate that our framework outperforms competing methods, both in meta-learning and transfer learning, on a set of computer vision tasks. Finally, we demonstrate that Leap can transfer knowledge across learning processes in demanding reinforcement learning environments (Atari) that involve millions of gradient steps.
Tasks Meta-Learning, Transfer Learning
Published 2018-12-03
URL http://arxiv.org/abs/1812.01054v3
PDF http://arxiv.org/pdf/1812.01054v3.pdf
PWC https://paperswithcode.com/paper/transferring-knowledge-across-learning
Repo https://github.com/amzn/metalearn-leap
Framework pytorch
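
A heavily simplified sketch of the transfer mechanism: train a clone on a task, then pull the shared initialization along the resulting training path. Ignoring Leap's loss-coordinate weighting, the pull telescopes to the displacement between initial and final parameters, so this collapses to a Reptile-style first-order update rather than the paper's exact objective; `make_task_loader` and both learning rates are assumptions.

```python
import copy
import itertools
import torch
import torch.nn.functional as F

def leap_like_meta_step(init, make_task_loader, inner_steps=10,
                        inner_lr=0.1, meta_lr=0.01):
    model = copy.deepcopy(init)
    opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for x, y in itertools.islice(make_task_loader(), inner_steps):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    with torch.no_grad():  # pull the initialization along the task's path
        for p0, p1 in zip(init.parameters(), model.parameters()):
            p0 += meta_lr * (p1 - p0)
```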

Truncated Back-propagation for Bilevel Optimization

Title Truncated Back-propagation for Bilevel Optimization
Authors Amirreza Shaban, Ching-An Cheng, Nathan Hatch, Byron Boots
Abstract Bilevel optimization has been recently revisited for designing and analyzing algorithms in hyperparameter tuning and meta learning tasks. However, due to its nested structure, evaluating exact gradients for high-dimensional problems is computationally challenging. One heuristic to circumvent this difficulty is to use the approximate gradient given by performing truncated back-propagation through the iterative optimization procedure that solves the lower-level problem. Although promising empirical performance has been reported, its theoretical properties are still unclear. In this paper, we analyze the properties of this family of approximate gradients and establish sufficient conditions for convergence. We validate this on several hyperparameter tuning and meta learning tasks. We find that optimization with the approximate gradient computed using few-step back-propagation often performs comparably to optimization with the exact gradient, while requiring far less memory and half the computation time.
Tasks bilevel optimization, Meta-Learning
Published 2018-10-25
URL http://arxiv.org/abs/1810.10667v2
PDF http://arxiv.org/pdf/1810.10667v2.pdf
PWC https://paperswithcode.com/paper/truncated-back-propagation-for-bilevel
Repo https://github.com/lucfra/FAR-HO
Framework tf
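
The approximation is easy to state in code: unroll the inner optimization for T steps but keep the autograd graph only over the last K, then differentiate the outer loss through that tail. A PyTorch sketch; the loss callables and learning rate are placeholders, and `hyper` must require grad and enter `inner_loss`.

```python
import torch

def truncated_hypergradient(hyper, w0, inner_loss, outer_loss,
                            T=100, K=10, lr=0.1):
    w = w0.clone().requires_grad_(True)
    for t in range(T):
        keep_graph = t >= T - K   # only the last K steps stay differentiable
        g = torch.autograd.grad(inner_loss(w, hyper), w,
                                create_graph=keep_graph)[0]
        w = w - lr * g
        if not keep_graph:
            w = w.detach().requires_grad_(True)
    # approximate gradient of the outer objective w.r.t. the hyperparameters
    return torch.autograd.grad(outer_loss(w), hyper)[0]
```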

Multi-level Semantic Feature Augmentation for One-shot Learning

Title Multi-level Semantic Feature Augmentation for One-shot Learning
Authors Zitian Chen, Yanwei Fu, Yinda Zhang, Yu-Gang Jiang, Xiangyang Xue, Leonid Sigal
Abstract The ability to quickly recognize and learn new visual concepts from limited samples enables humans to swiftly adapt to new environments. This ability is enabled by semantic associations of novel concepts with those that have already been learned and stored in memory. Computers can begin to acquire similar abilities by utilizing a semantic concept space. A concept space is a high-dimensional semantic space in which similar abstract concepts appear close and dissimilar ones far apart. In this paper, we propose a novel approach to one-shot learning that builds on this idea. Our approach learns to map a novel sample instance to a concept, relates that concept to the existing ones in the concept space and generates new instances, by interpolating among the concepts, to help learning. Instead of synthesizing new image instances, we propose to directly synthesize instance features by leveraging semantics using a novel auto-encoder network we call dual TriNet. The encoder part of the TriNet learns to map multi-layer visual features of deep CNNs, that is, multi-level concepts, to a semantic vector. In the semantic space, we search for related concepts, which are then projected back into the image feature spaces by the decoder portion of the TriNet. Two strategies in the semantic space are explored. Notably, this seemingly simple strategy results in complex augmented feature distributions in the image feature space, leading to substantially better performance.
Tasks One-Shot Learning
Published 2018-04-15
URL http://arxiv.org/abs/1804.05298v4
PDF http://arxiv.org/pdf/1804.05298v4.pdf
PWC https://paperswithcode.com/paper/multi-level-semantic-feature-augmentation-for
Repo https://github.com/tankche1/Semantic-Feature-Augmentation-in-Few-shot-Learning
Framework pytorch
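
A toy, single-level version of the augmentation idea: encode instance features into a semantic space, perturb there, and decode back to get synthesized features. The real dual TriNet ties the encoder/decoder to multiple CNN layers and a word-embedding semantic space; the dimensions and the Gaussian perturbation below are illustrative.

```python
import torch
import torch.nn as nn

class TriNetSketch(nn.Module):
    def __init__(self, feat_dim=512, sem_dim=300):
        super().__init__()
        self.enc = nn.Linear(feat_dim, sem_dim)   # feature -> semantic vector
        self.dec = nn.Linear(sem_dim, feat_dim)   # semantic vector -> feature

    def augment(self, feat, noise=0.1, n=5):
        z = self.enc(feat)
        zs = z + noise * torch.randn(n, *z.shape)  # perturb in semantic space
        return self.dec(zs)                        # synthesized instance features
```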

ARM: Augment-REINFORCE-Merge Gradient for Stochastic Binary Networks

Title ARM: Augment-REINFORCE-Merge Gradient for Stochastic Binary Networks
Authors Mingzhang Yin, Mingyuan Zhou
Abstract To backpropagate the gradients through stochastic binary layers, we propose the augment-REINFORCE-merge (ARM) estimator that is unbiased, exhibits low variance, and has low computational complexity. Exploiting variable augmentation, REINFORCE, and reparameterization, the ARM estimator achieves adaptive variance reduction for Monte Carlo integration by merging two expectations via common random numbers. The variance-reduction mechanism of the ARM estimator can also be attributed to either antithetic sampling in an augmented space, or the use of an optimal anti-symmetric “self-control” baseline function together with the REINFORCE estimator in that augmented space. Experimental results show the ARM estimator provides state-of-the-art performance in auto-encoding variational inference and maximum likelihood estimation, for discrete latent variable models with one or multiple stochastic binary layers. Python code for reproducible research is publicly available.
Tasks Data Augmentation, Latent Variable Models
Published 2018-07-30
URL https://arxiv.org/abs/1807.11143v2
PDF https://arxiv.org/pdf/1807.11143v2.pdf
PWC https://paperswithcode.com/paper/arm-augment-reinforce-merge-gradient-for
Repo https://github.com/mingzhang-yin/ARM-gradient
Framework tf
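
For a single stochastic binary unit, the ARM estimator has a closed form built from one shared uniform draw. A NumPy sketch, with a sanity check against the exact gradient:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def arm_gradient(f, phi, n=100000, seed=0):
    """ARM estimate of d/dphi E_{z ~ Bernoulli(sigmoid(phi))}[f(z)]:
    E_u[(f(1[u > sigmoid(-phi)]) - f(1[u < sigmoid(phi)])) * (u - 1/2)]."""
    u = np.random.default_rng(seed).uniform(size=n)
    z1 = (u > sigmoid(-phi)).astype(float)   # one expectation
    z2 = (u < sigmoid(phi)).astype(float)    # its antithetic twin
    return np.mean((f(z1) - f(z2)) * (u - 0.5))

phi, f = 0.3, lambda z: (z - 0.49) ** 2
exact = sigmoid(phi) * (1 - sigmoid(phi)) * (f(1.0) - f(0.0))
print(arm_gradient(f, phi), exact)  # the two should closely agree
```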

GIRNet: Interleaved Multi-Task Recurrent State Sequence Models

Title GIRNet: Interleaved Multi-Task Recurrent State Sequence Models
Authors Divam Gupta, Tanmoy Chakraborty, Soumen Chakrabarti
Abstract In several natural language tasks, labeled sequences are available in separate domains (say, languages), but the goal is to label sequences with mixed domain (such as code-switched text). Or, we may have available models for labeling whole passages (say, with sentiments), which we would like to exploit toward better position-specific label inference (say, target-dependent sentiment annotation). A key characteristic shared across such tasks is that different positions in a primary instance can benefit from different ‘experts’ trained from auxiliary data, but labeled primary instances are scarce, and labeling the best expert for each position entails unacceptable cognitive burden. We propose GIRNet, a unified position-sensitive multi-task recurrent neural network (RNN) architecture for such applications. Auxiliary and primary tasks need not share training instances. Auxiliary RNNs are trained over auxiliary instances. A primary instance is also submitted to each auxiliary RNN, but their state sequences are gated and merged into a novel composite state sequence tailored to the primary inference task. Our approach is in sharp contrast to recent multi-task networks like the cross-stitch and sluice network, which do not control state transfer at such fine granularity. We demonstrate the superiority of GIRNet using three applications: sentiment classification of code-switched passages, part-of-speech tagging of code-switched text, and target position-sensitive annotation of sentiment in monolingual passages. In all cases, we establish new state-of-the-art performance beyond recent competitive baselines.
Tasks Part-Of-Speech Tagging, Sentiment Analysis
Published 2018-11-28
URL http://arxiv.org/abs/1811.11456v2
PDF http://arxiv.org/pdf/1811.11456v2.pdf
PWC https://paperswithcode.com/paper/girnet-interleaved-multi-task-recurrent-state
Repo https://github.com/divamgupta/mtl_girnet
Framework tf
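
The gating-and-merging step can be sketched compactly: run the (frozen) auxiliary RNNs on the primary instance, then compute position-wise mixture weights over their state sequences. Shapes and the linear gate below are assumptions, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class GatedMerge(nn.Module):
    """Merge auxiliary RNN state sequences with position-wise expert weights."""
    def __init__(self, hidden, n_aux):
        super().__init__()
        self.gate = nn.Linear(hidden * n_aux, n_aux)

    def forward(self, aux_states):                 # list of (B, T, H) tensors
        stacked = torch.stack(aux_states, dim=2)   # (B, T, n_aux, H)
        weights = torch.softmax(self.gate(torch.cat(aux_states, dim=-1)), dim=-1)
        return (weights.unsqueeze(-1) * stacked).sum(dim=2)  # composite (B, T, H)
```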

Is rotation forest the best classifier for problems with continuous features?

Title Is rotation forest the best classifier for problems with continuous features?
Authors A. Bagnall, M. Flynn, J. Large, J. Line, A. Bostrom, G. Cawley
Abstract In short, our experiments suggest that yes, on average, rotation forest is better than the most common alternatives when all the attributes are real-valued. Rotation forest is a tree based ensemble that performs transforms on subsets of attributes prior to constructing each tree. We present an empirical comparison of classifiers for problems with only real-valued features. We evaluate classifiers from three families of algorithms: support vector machines; tree-based ensembles; and neural networks tuned with a large grid search. We compare classifiers on unseen data based on the quality of the decision rule (using classification error), the ability to rank cases (area under the receiver operating characteristic curve), and the probability estimates (using negative log likelihood). We conclude that, in answer to the question posed in the title, yes, rotation forest is significantly more accurate on average than competing techniques when compared on three distinct sets of datasets. Further, we assess the impact of the design features of rotation forest through an ablative study that transforms random forest into rotation forest. We identify the major limitation of rotation forest as its scalability, particularly in the number of attributes. To overcome this problem we develop a model to predict the train time of the algorithm and hence propose a contract version of rotation forest where a run time cap is imposed a priori. We demonstrate that on large problems rotation forest can be made an order of magnitude faster without significant loss of accuracy. We also show that there is no real benefit (on average) from tuning rotation forest. We maintain that without any domain knowledge to indicate an algorithm preference, rotation forest should be the default algorithm of choice for problems with continuous attributes.
Tasks
Published 2018-09-18
URL https://arxiv.org/abs/1809.06705v2
PDF https://arxiv.org/pdf/1809.06705v2.pdf
PWC https://paperswithcode.com/paper/is-rotation-forest-the-best-classifier-for
Repo https://github.com/Liam-E2/RotationForest
Framework none
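
The essential mechanism: for each tree, partition the features into random subsets, fit PCA to each subset, and train the tree on the rotated data. A compact scikit-learn sketch that omits the per-class bootstrap sampling of the full algorithm:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

class TinyRotationForest:
    def __init__(self, n_trees=10, subset_size=3, seed=0):
        self.n_trees, self.k = n_trees, subset_size
        self.rng = np.random.default_rng(seed)
        self.models = []

    def _rotate(self, X, groups, pcas):
        return np.hstack([p.transform(X[:, g]) for g, p in zip(groups, pcas)])

    def fit(self, X, y):
        d = X.shape[1]
        for _ in range(self.n_trees):
            perm = self.rng.permutation(d)          # random disjoint subsets
            groups = [perm[i:i + self.k] for i in range(0, d, self.k)]
            pcas = [PCA().fit(X[:, g]) for g in groups]
            tree = DecisionTreeClassifier().fit(self._rotate(X, groups, pcas), y)
            self.models.append((groups, pcas, tree))
        return self

    def predict_proba(self, X):
        return np.mean([t.predict_proba(self._rotate(X, g, p))
                        for g, p, t in self.models], axis=0)
```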

What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation?

Title What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation?
Authors Nikolaus Mayer, Eddy Ilg, Philipp Fischer, Caner Hazirbas, Daniel Cremers, Alexey Dosovitskiy, Thomas Brox
Abstract The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations. As a result, the research challenge shifts from devising algorithms to creating suitable and abundant training data for supervised learning. How to efficiently create such training data? The dominant data acquisition method in visual recognition is based on web data and manual annotation. Yet, for many computer vision problems, such as stereo or optical flow estimation, this approach is not feasible because humans cannot manually enter a pixel-accurate flow field. In this paper, we promote the use of synthetically generated data for the purpose of training deep networks on such tasks. We suggest multiple ways to generate such data and evaluate the influence of dataset properties on the performance and generalization properties of the resulting networks. We also demonstrate the benefit of learning schedules that use different types of data at selected stages of the training process.
Tasks Optical Flow Estimation
Published 2018-01-19
URL http://arxiv.org/abs/1801.06397v3
PDF http://arxiv.org/pdf/1801.06397v3.pdf
PWC https://paperswithcode.com/paper/what-makes-good-synthetic-training-data-for
Repo https://github.com/lmb-freiburg/optical-flow-2d-data-generation
Framework none

Findings of the E2E NLG Challenge

Title Findings of the E2E NLG Challenge
Authors Ondřej Dušek, Jekaterina Novikova, Verena Rieser
Abstract This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems. Recent end-to-end generation systems are promising since they reduce the need for data annotation. However, they are currently limited to small, delexicalised datasets. The E2E NLG shared task aims to assess whether these novel approaches can generate better-quality output by learning from a dataset containing higher lexical richness, syntactic complexity and diverse discourse phenomena. We compare 62 systems submitted by 17 institutions, covering a wide range of approaches, including machine learning architectures – with the majority implementing sequence-to-sequence models (seq2seq) – as well as systems based on grammatical rules and templates.
Tasks Data-to-Text Generation, Spoken Dialogue Systems, Text Generation
Published 2018-10-02
URL http://arxiv.org/abs/1810.01170v1
PDF http://arxiv.org/pdf/1810.01170v1.pdf
PWC https://paperswithcode.com/paper/findings-of-the-e2e-nlg-challenge
Repo https://github.com/UFAL-DSG/tgen
Framework tf