October 21, 2019

Paper Group AWR 123

Deep Super Learner: A Deep Ensemble for Classification Problems. Large Margin Deep Networks for Classification. h-detach: Modifying the LSTM Gradient Towards Better Optimization. Fast Gaussian Process Based Gradient Matching for Parameter Identification in Systems of Nonlinear ODEs. Specification-Guided Safety Verification for Feedforward Neural Ne …

Deep Super Learner: A Deep Ensemble for Classification Problems


Title	Deep Super Learner: A Deep Ensemble for Classification Problems
Authors	Steven Young, Tamer Abdou, Ayse Bener
Abstract	Deep learning has become very popular for tasks such as predictive modeling and pattern recognition in handling big data. Deep learning is a powerful machine learning method that extracts lower level features and feeds them forward for the next layer to identify higher level features that improve performance. However, deep neural networks have drawbacks, which include many hyper-parameters and infinite architectures, opaqueness into results, and relatively slower convergence on smaller datasets. While traditional machine learning algorithms can address these drawbacks, they are not typically capable of the performance levels achieved by deep neural networks. To improve performance, ensemble methods are used to combine multiple base learners. Super learning is an ensemble that finds the optimal combination of diverse learning algorithms. This paper proposes deep super learning as an approach which achieves log loss and accuracy results competitive to deep neural networks while employing traditional machine learning algorithms in a hierarchical structure. The deep super learner is flexible, adaptable, and easy to train with good performance across different tasks using identical hyper-parameter values. Using traditional machine learning requires fewer hyper-parameters, allows transparency into results, and has relatively fast convergence on smaller datasets. Experimental results show that the deep super learner has superior performance compared to the individual base learners, single-layer ensembles, and in some cases deep neural networks. Performance of the deep super learner may further be improved with task-specific tuning.
Tasks
Published	2018-03-06
URL	http://arxiv.org/abs/1803.02323v1
PDF	http://arxiv.org/pdf/1803.02323v1.pdf
PWC	https://paperswithcode.com/paper/deep-super-learner-a-deep-ensemble-for
Repo	https://github.com/levyben/DeepSuperLearner
Framework	none
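
The core idea of super learning mentioned in the abstract, finding a combination of base learners that minimizes log loss, can be illustrated with a small sketch. This is not the authors' implementation: it assumes only two base learners, binary labels, and a plain grid search over a single convex weight, and all names and numbers (`best_blend_weight`, `preds_a`, `preds_b`) are hypothetical. In practice the predictions being blended would be held-out (cross-validated) predictions.

```python
import math

def log_loss(y_true, probs, eps=1e-12):
    """Mean negative log-likelihood for binary labels and P(y=1) predictions."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def blend(preds_a, preds_b, w):
    """Convex combination of two learners' predicted probabilities."""
    return [w * a + (1.0 - w) * b for a, b in zip(preds_a, preds_b)]

def best_blend_weight(y_true, preds_a, preds_b, steps=100):
    """Grid-search the weight w in [0, 1] that minimizes log loss.

    Assumption: a grid search stands in for whatever optimizer a real
    super learner would use to fit the combination weights.
    """
    best_w, best_loss = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        loss = log_loss(y_true, blend(preds_a, preds_b, w))
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss

# Hypothetical held-out predictions from two base learners.
y = [1, 0, 1, 1, 0, 0]
preds_a = [0.9, 0.4, 0.6, 0.8, 0.3, 0.6]
preds_b = [0.6, 0.1, 0.9, 0.5, 0.4, 0.2]

w, loss = best_blend_weight(y, preds_a, preds_b)
print(f"weight on learner A: {w:.2f}, blended log loss: {loss:.4f}")
```

Because the grid includes w = 0 and w = 1, the blended loss can never be worse than the better of the two base learners on the data used to fit the weight. A deep super learner, as described in the abstract, stacks such ensembles in a hierarchical structure.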

Large Margin Deep Networks for Classification


Title	Large Margin Deep Networks for Classification
Authors	Gamaleldin F. Elsayed, Dilip Krishnan, Hossein Mobahi, Kevin Regan, Samy Bengio
Abstract	We present a formulation of deep learning that aims at producing a large margin classifier. The notion of margin, minimum distance to a decision boundary, has served as the foundation of several theoretically profound and empirically successful results for both classification and regression tasks. However, most large margin algorithms are applicable only to shallow models with a preset feature representation; and conventional margin methods for neural networks only enforce margin at the output layer. Such methods are therefore not well suited for deep networks. In this work, we propose a novel loss function to impose a margin on any chosen set of layers of a deep network (including input and hidden layers). Our formulation allows choosing any norm on the metric measuring the margin. We demonstrate that the decision boundary obtained by our loss has nice properties compared to standard classification loss functions. Specifically, we show improved empirical results on the MNIST, CIFAR-10 and ImageNet datasets on multiple tasks: generalization from small training sets, corrupted labels, and robustness against adversarial perturbations. The resulting loss is general and complementary to existing data augmentation (such as random/adversarial input transform) and regularization techniques (such as weight decay, dropout, and batch norm).
Tasks	Data Augmentation
Published	2018-03-15
URL	http://arxiv.org/abs/1803.05598v2
PDF	http://arxiv.org/pdf/1803.05598v2.pdf
PWC	https://paperswithcode.com/paper/large-margin-deep-networks-for-classification
Repo	https://github.com/zsef123/Large_Margin_Loss_PyTorch
Framework	pytorch
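
To make the notion of margin concrete, here is a minimal sketch of a hinge-style loss on the distance to the decision boundary. It makes several assumptions that go beyond the abstract: the classifier is linear (so the distance to the boundary between two classes is exact rather than approximated), the distance is measured in the l2 norm, and the per-class penalties are aggregated with a max. The function names and the example weights are hypothetical.

```python
import math

def scores(W, b, x):
    """Class scores f_k(x) = w_k . x + b_k for a linear classifier."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def margin_loss(W, b, x, label, gamma):
    """Penalize x if it lies closer than gamma to any boundary of its true class.

    For a linear model the signed l2 distance from x to the boundary between
    class j and the true class y is (f_j - f_y) / ||w_j - w_y||_2; it is
    negative when x is on the correct side.
    """
    f = scores(W, b, x)
    loss = 0.0
    for j in range(len(W)):
        if j == label:
            continue
        diff = [wj - wy for wj, wy in zip(W[j], W[label])]
        norm = math.sqrt(sum(d * d for d in diff))
        signed_dist = (f[j] - f[label]) / (norm + 1e-12)
        loss = max(loss, max(0.0, gamma + signed_dist))
    return loss

# Hypothetical two-class example in 2-D.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
print(margin_loss(W, b, [3.0, 0.0], label=0, gamma=1.0))  # far from the boundary
print(margin_loss(W, b, [0.5, 0.0], label=0, gamma=1.0))  # inside the margin
```

For a deep network the same quantity would presumably be computed with gradients of the score difference with respect to the chosen layer's activations in place of `w_j - w_y`, which is what lets the margin be imposed at input or hidden layers.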

h-detach: Modifying the LSTM Gradient Towards Better Optimization


Title	h-detach: Modifying the LSTM Gradient Towards Better Optimization
Authors	Devansh Arpit, Bhargav Kanuparthi, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio
Abstract	Recurrent neural networks are known for their notorious exploding and vanishing gradient problem (EVGP). This problem becomes more evident in tasks where the information needed to correctly solve them exists over long time scales, because EVGP prevents important gradient components from being back-propagated adequately over a large number of steps. We introduce a simple stochastic algorithm (h-detach) that is specific to LSTM optimization and targeted towards addressing this problem. Specifically, we show that when the LSTM weights are large, the gradient components through the linear path (cell state) in the LSTM computational graph get suppressed. Based on the hypothesis that these components carry information about long term dependencies (which we show empirically), their suppression can prevent LSTMs from capturing them. Our algorithm (code available at https://github.com/bhargav104/h-detach) prevents gradients flowing through this path from getting suppressed, thus allowing the LSTM to capture such dependencies better. We show significant improvements over vanilla LSTM gradient based training in terms of convergence speed, robustness to seed and learning rate, and generalization using our modification of LSTM gradient on various benchmark datasets.
Tasks
Published	2018-10-06
URL	http://arxiv.org/abs/1810.03023v2
PDF	http://arxiv.org/pdf/1810.03023v2.pdf
PWC	https://paperswithcode.com/paper/h-detach-modifying-the-lstm-gradient-towards
Repo	https://github.com/bhargav104/h-detach
Framework	pytorch

Fast Gaussian Process Based Gradient Matching for Parameter Identification in Systems of Nonlinear ODEs


Title	Fast Gaussian Process Based Gradient Matching for Parameter Identification in Systems of Nonlinear ODEs
Authors	Philippe Wenk, Alkis Gotovos, Stefan Bauer, Nico Gorbach, Andreas Krause, Joachim M. Buhmann
Abstract	Parameter identification and comparison of dynamical systems is a challenging task in many fields. Bayesian approaches based on Gaussian process regression over time-series data have been successfully applied to infer the parameters of a dynamical system without explicitly solving it. While the benefits in computational cost are well established, a rigorous mathematical framework has been missing. We offer a novel interpretation which leads to a better understanding and improvements in state-of-the-art performance in terms of accuracy for nonlinear dynamical systems.
Tasks	Time Series
Published	2018-04-12
URL	http://arxiv.org/abs/1804.04378v2
PDF	http://arxiv.org/pdf/1804.04378v2.pdf
PWC	https://paperswithcode.com/paper/fast-gaussian-process-based-gradient-matching
Repo	https://github.com/wenkph/FGPGM
Framework	none

Specification-Guided Safety Verification for Feedforward Neural Networks


Title	Specification-Guided Safety Verification for Feedforward Neural Networks
Authors	Weiming Xiang, Hoang-Dung Tran, Taylor T. Johnson
Abstract	This paper presents a specification-guided safety verification method for feedforward neural networks with general activation functions. As such feedforward networks are memoryless, they can be abstractly represented as mathematical functions, and the reachability analysis of the neural network amounts to interval analysis problems. In the framework of interval analysis, a computationally efficient formula which can quickly compute the output interval sets of a neural network is developed. Then, a specification-guided reachability algorithm is developed. Specifically, the bisection process in the verification algorithm is completely guided by a given safety specification. Due to the employment of the safety specification, unnecessary computations are avoided and thus the computational cost can be reduced significantly. Experiments show that the proposed method enjoys much more efficiency in safety verification with significantly less computational cost.
Tasks
Published	2018-12-14
URL	http://arxiv.org/abs/1812.06161v1
PDF	http://arxiv.org/pdf/1812.06161v1.pdf
PWC	https://paperswithcode.com/paper/specification-guided-safety-verification-for
Repo	https://github.com/verivital/nnv
Framework	none
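
The combination of interval analysis and specification-guided bisection described in the abstract can be sketched as follows. This is an illustrative toy, not the method implemented in the linked repository: it assumes a tanh network (any monotone activation would work the same way), a safety specification given as an output box, and a fixed bisection depth. All names (`affine_bounds`, `network_bounds`, `verify`) and the example network are hypothetical.

```python
import math

def affine_bounds(W, b, lo, hi):
    """Interval image of x -> W x + b for x in the box [lo, hi]."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = u = bias
        for w, a, c in zip(row, lo, hi):
            # A non-negative weight keeps the bound order; a negative one swaps it.
            if w >= 0:
                l += w * a
                u += w * c
            else:
                l += w * c
                u += w * a
        out_lo.append(l)
        out_hi.append(u)
    return out_lo, out_hi

def network_bounds(layers, lo, hi):
    """Propagate a box through the layers; tanh on all but the last layer."""
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if i < len(layers) - 1:
            # tanh is monotone, so applying it to the bounds is sound.
            lo = [math.tanh(v) for v in lo]
            hi = [math.tanh(v) for v in hi]
    return lo, hi

def verify(layers, lo, hi, safe_lo, safe_hi, depth=12):
    """Return 'safe', 'unsafe', or 'unknown' for inputs in the box [lo, hi]."""
    out_lo, out_hi = network_bounds(layers, lo, hi)
    bounds = list(zip(out_lo, out_hi, safe_lo, safe_hi))
    if all(s <= l and u <= t for l, u, s, t in bounds):
        return "safe"      # output box lies inside the safe box
    if any(u < s or l > t for l, u, s, t in bounds):
        return "unsafe"    # every input in this box violates the specification
    if depth == 0:
        return "unknown"
    # Only undecided boxes are split: the specification guides the bisection.
    k = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    mid = 0.5 * (lo[k] + hi[k])
    left = verify(layers, lo, hi[:k] + [mid] + hi[k + 1:], safe_lo, safe_hi, depth - 1)
    if left == "unsafe":
        return "unsafe"
    right = verify(layers, lo[:k] + [mid] + lo[k + 1:], hi, safe_lo, safe_hi, depth - 1)
    if right == "unsafe":
        return "unsafe"
    return "safe" if left == right == "safe" else "unknown"

# Hypothetical network computing tanh(x) + tanh(-x), which is always 0.
layers = [([[1.0], [-1.0]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
result = verify(layers, [-1.0], [1.0], [-0.5], [0.5])
print(result)
```

On the whole input box a single interval pass is too coarse to decide this example, so the box is split until each piece is decided. Boxes that are already proven safe are never refined further, which is where the savings claimed in the abstract would come from.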

Deep learning for denoising


Title	Deep learning for denoising
Authors	Siwei Yu, Jianwei Ma, Wenlong Wang
Abstract	Compared with traditional seismic noise attenuation algorithms that depend on signal models and their corresponding prior assumptions, a deep neural network for noise removal is trained on a large training set, where the inputs are the raw datasets and the corresponding outputs are the desired clean data. After the completion of training, the deep learning method achieves adaptive denoising with no requirements of (i) accurate modeling of the signal and noise, or (ii) optimal parameter tuning. We call this intelligent denoising. We use a convolutional neural network as the basic tool for deep learning. In random and linear noise attenuation, the training set is generated with artificially added noise. In the multiple attenuation step, the training set is generated with the acoustic wave equation. Stochastic gradient descent is used to solve for the optimal parameters of the convolutional neural network. The runtime of deep learning on a graphics processing unit for denoising has the same order as the f-x deconvolution method. Synthetic and field results show the potential applications of deep learning in automatic attenuation of random noise (with unknown variance), linear noise, and multiples.
Tasks	Denoising
Published	2018-10-27
URL	https://arxiv.org/abs/1810.11614v2
PDF	https://arxiv.org/pdf/1810.11614v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-tutorial-for-denoising
Repo	https://github.com/macaba/NNDN
Framework	none

Learning to Navigate in Cities Without a Map


Title	Learning to Navigate in Cities Without a Map
Authors	Piotr Mirowski, Matthew Koichi Grimes, Mateusz Malinowski, Karl Moritz Hermann, Keith Anderson, Denis Teplyashin, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman, Raia Hadsell
Abstract	Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support continuous self-localisation (“I am here”) and a representation of the goal (“I am going there”). Building upon recent research that applies deep reinforcement learning to maze navigation problems, we present an end-to-end deep reinforcement learning approach that can be applied on a city scale. Recognising that successful navigation relies on integration of general policies with locale-specific knowledge, we propose a dual pathway architecture that allows locale-specific features to be encapsulated, while still enabling transfer to multiple cities. We present an interactive navigation environment that uses Google StreetView for its photographic content and worldwide coverage, and demonstrate that our learning method allows agents to learn to navigate multiple cities and to traverse to target destinations that may be kilometres away. The project webpage http://streetlearn.cc contains a video summarising our research and showing the trained agent in diverse city environments and on the transfer task, the form to request the StreetLearn dataset and links to further resources. The StreetLearn environment code is available at https://github.com/deepmind/streetlearn
Tasks	Autonomous Navigation
Published	2018-03-31
URL	http://arxiv.org/abs/1804.00168v3
PDF	http://arxiv.org/pdf/1804.00168v3.pdf
PWC	https://paperswithcode.com/paper/learning-to-navigate-in-cities-without-a-map
Repo	https://github.com/heiner/scalable_agent
Framework	tf

Extreme Network Compression via Filter Group Approximation


Title	Extreme Network Compression via Filter Group Approximation
Authors	Bo Peng, Wenming Tan, Zheyang Li, Shun Zhang, Di Xie, Shiliang Pu
Abstract	In this paper we propose a novel decomposition method based on filter group approximation, which can significantly reduce the redundancy of deep convolutional neural networks (CNNs) while maintaining the majority of feature representation. Unlike other low-rank decomposition algorithms which operate on the spatial or channel dimension of filters, our proposed method mainly focuses on exploiting the filter group structure for each layer. For several commonly used CNN models, including VGG and ResNet, our method can reduce floating-point operations (FLOPs) by over 80% with less accuracy drop than state-of-the-art methods on various image classification datasets. Besides, experiments demonstrate that our method is conducive to alleviating degeneracy of the compressed network, which hurts the convergence and performance of the network.
Tasks	Image Classification
Published	2018-07-30
URL	http://arxiv.org/abs/1807.11254v2
PDF	http://arxiv.org/pdf/1807.11254v2.pdf
PWC	https://paperswithcode.com/paper/extreme-network-compression-via-filter-group
Repo	https://github.com/lhaof/deep-learning-for-mobile-device
Framework	none

Systematic Generalization: What Is Required and Can It Be Learned?


Title	Systematic Generalization: What Is Required and Can It Be Learned?
Authors	Dzmitry Bahdanau, Shikhar Murty, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, Aaron Courville
Abstract	Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated. We compare both types of models in how much they lend themselves to a particular form of systematic generalization. Using a synthetic VQA test, we evaluate which models are capable of reasoning about all possible object pairs after training on only a small subset of them. Our findings show that the generalization of modular models is much more systematic and that it is highly sensitive to the module layout, i.e. to how exactly the modules are connected. We furthermore investigate if modular models that generalize well could be made more end-to-end by learning their layout and parametrization. We find that end-to-end methods from prior work often learn inappropriate layouts or parametrizations that do not facilitate systematic generalization. Our results suggest that, in addition to modularity, systematic generalization in language understanding may require explicit regularizers or priors.
Tasks	Visual Question Answering
Published	2018-11-30
URL	http://arxiv.org/abs/1811.12889v3
PDF	http://arxiv.org/pdf/1811.12889v3.pdf
PWC	https://paperswithcode.com/paper/systematic-generalization-what-is-required
Repo	https://github.com/rizar/systematic-generalization-sqoop
Framework	pytorch

IRLAS: Inverse Reinforcement Learning for Architecture Search


Title	IRLAS: Inverse Reinforcement Learning for Architecture Search
Authors	Minghao Guo, Zhao Zhong, Wei Wu, Dahua Lin, Junjie Yan
Abstract	In this paper, we propose an inverse reinforcement learning method for architecture search (IRLAS), which trains an agent to learn to search network structures that are topologically inspired by human-designed networks. Most existing architecture search approaches totally neglect the topological characteristics of architectures, which results in complicated architectures with high inference latency. Motivated by the fact that human-designed networks are elegant in topology with a fast inference speed, we propose a mirror stimuli function inspired by biological cognition theory to extract the abstract topological knowledge of an expert human-designed network (ResNeXt). To avoid imposing too strong a prior over the search space, we introduce inverse reinforcement learning to train the mirror stimuli function and exploit it as a heuristic guidance for architecture search, easily generalized to different architecture search algorithms. On CIFAR-10, the best architecture searched by our proposed IRLAS achieves a 2.60% error rate. For the ImageNet mobile setting, our model achieves a state-of-the-art top-1 accuracy of 75.28%, while being 2~4x faster than most auto-generated architectures. A fast version of this model is 10% faster than MobileNetV2, while maintaining a higher accuracy.
Tasks	Neural Architecture Search
Published	2018-12-13
URL	https://arxiv.org/abs/1812.05285v5
PDF	https://arxiv.org/pdf/1812.05285v5.pdf
PWC	https://paperswithcode.com/paper/irlas-inverse-reinforcement-learning-for
Repo	https://github.com/gmh14/IRLAS
Framework	pytorch

The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German Highways for Validation of Highly Automated Driving Systems


Title	The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German Highways for Validation of Highly Automated Driving Systems
Authors	Robert Krajewski, Julian Bock, Laurent Kloeker, Lutz Eckstein
Abstract	Scenario-based testing for the safety validation of highly automated vehicles is a promising approach that is being examined in research and industry. This approach heavily relies on data from real-world scenarios to derive the necessary scenario information for testing. Measurement data should be collected at a reasonable effort, contain naturalistic behavior of road users and include all data relevant for a description of the identified scenarios in sufficient quality. However, the current measurement methods fail to meet at least one of the requirements. Thus, we propose a novel method to measure data from an aerial perspective for scenario-based validation fulfilling the mentioned requirements. Furthermore, we provide a large-scale naturalistic vehicle trajectory dataset from German highways called highD. We evaluate the data in terms of quantity, variety and contained scenarios. Our dataset consists of 16.5 hours of measurements from six locations with 110 000 vehicles, a total driven distance of 45 000 km and 5600 recorded complete lane changes. The highD dataset is available online at: http://www.highD-dataset.com
Tasks
Published	2018-10-11
URL	http://arxiv.org/abs/1810.05642v1
PDF	http://arxiv.org/pdf/1810.05642v1.pdf
PWC	https://paperswithcode.com/paper/the-highd-dataset-a-drone-dataset-of
Repo	https://github.com/RobertKrajewski/highD-dataset
Framework	none

Learning Latent Permutations with Gumbel-Sinkhorn Networks


Title	Learning Latent Permutations with Gumbel-Sinkhorn Networks
Authors	Gonzalo Mena, David Belanger, Scott Linderman, Jasper Snoek
Abstract	Permutations and matchings are core building blocks in a variety of latent variable models, as they allow us to align, canonicalize, and sort data. Learning in such models is difficult, however, because exact marginalization over these combinatorial objects is intractable. In response, this paper introduces a collection of new methods for end-to-end learning in such models that approximate discrete maximum-weight matching using the continuous Sinkhorn operator. Sinkhorn iteration is attractive because it functions as a simple, easy-to-implement analog of the softmax operator. With this, we can define the Gumbel-Sinkhorn method, an extension of the Gumbel-Softmax method (Jang et al. 2016, Maddison et al. 2016) to distributions over latent matchings. We demonstrate the effectiveness of our method by outperforming competitive baselines on a range of qualitatively different tasks: sorting numbers, solving jigsaw puzzles, and identifying neural signals in worms.
Tasks	Latent Variable Models
Published	2018-02-23
URL	http://arxiv.org/abs/1802.08665v1
PDF	http://arxiv.org/pdf/1802.08665v1.pdf
PWC	https://paperswithcode.com/paper/learning-latent-permutations-with-gumbel
Repo	https://github.com/HeddaCohenIndelman/Learning-Gumbel-Sinkhorn-Permutations-w-Pytorch
Framework	pytorch

Direct Output Connection for a High-Rank Language Model


Title	Direct Output Connection for a High-Rank Language Model
Authors	Sho Takase, Jun Suzuki, Masaaki Nagata
Abstract	This paper proposes a state-of-the-art recurrent neural network (RNN) language model that combines probability distributions computed not only from a final RNN layer but also from middle layers. Our proposed method raises the expressive power of a language model based on the matrix factorization interpretation of language modeling introduced by Yang et al. (2018). The proposed method improves the current state-of-the-art language model and achieves the best score on the Penn Treebank and WikiText-2, which are the standard benchmark datasets. Moreover, we indicate our proposed method contributes to two application tasks: machine translation and headline generation. Our code is publicly available at: https://github.com/nttcslab-nlp/doc_lm.
Tasks	Constituency Parsing, Language Modelling, Machine Translation
Published	2018-08-30
URL	http://arxiv.org/abs/1808.10143v2
PDF	http://arxiv.org/pdf/1808.10143v2.pdf
PWC	https://paperswithcode.com/paper/direct-output-connection-for-a-high-rank
Repo	https://github.com/nttcslab-nlp/doc_lm
Framework	pytorch

A Survey of Word Embeddings Evaluation Methods


Title	A Survey of Word Embeddings Evaluation Methods
Authors	Amir Bakarov
Abstract	Word embeddings are real-valued word representations able to capture lexical semantics and trained on natural language corpora. Models proposing these representations have gained popularity in the recent years, but the issue of the most adequate evaluation method still remains open. This paper presents an extensive overview of the field of word embeddings evaluation, highlighting main problems and proposing a typology of approaches to evaluation, summarizing 16 intrinsic methods and 12 extrinsic methods. I describe both widely-used and experimental methods, systematize information about evaluation datasets and discuss some key challenges.
Tasks	Word Embeddings
Published	2018-01-21
URL	http://arxiv.org/abs/1801.09536v1
PDF	http://arxiv.org/pdf/1801.09536v1.pdf
PWC	https://paperswithcode.com/paper/a-survey-of-word-embeddings-evaluation
Repo	https://github.com/avi-jit/SWOW-eval
Framework	none

Combo Loss: Handling Input and Output Imbalance in Multi-Organ Segmentation


Title	Combo Loss: Handling Input and Output Imbalance in Multi-Organ Segmentation
Authors	Saeid Asgari Taghanaki, Yefeng Zheng, S. Kevin Zhou, Bogdan Georgescu, Puneet Sharma, Daguang Xu, Dorin Comaniciu, Ghassan Hamarneh
Abstract	Simultaneous segmentation of multiple organs from different medical imaging modalities is a crucial task as it can be utilized for computer-aided diagnosis, computer-assisted surgery, and therapy planning. Thanks to the recent advances in deep learning, several deep neural networks for medical image segmentation have been introduced successfully for this purpose. In this paper, we focus on learning a deep multi-organ segmentation network that labels voxels. In particular, we examine the critical choice of a loss function in order to handle the notorious imbalance problem that plagues both the input and output of a learning model. The input imbalance refers to the class-imbalance in the input training samples (i.e., small foreground objects embedded in an abundance of background voxels, as well as organs of varying sizes). The output imbalance refers to the imbalance between the false positives and false negatives of the inference model. In order to tackle both types of imbalance during training and inference, we introduce a new curriculum learning based loss function. Specifically, we leverage the Dice similarity coefficient to deter model parameters from being held at bad local minima and at the same time gradually learn better model parameters by penalizing for false positives/negatives using a cross entropy term. We evaluated the proposed loss function on three datasets: whole body positron emission tomography (PET) scans with 5 target organs, magnetic resonance imaging (MRI) prostate scans, and ultrasound echocardiography images with a single target organ, i.e., the left ventricle. We show that a simple network architecture with the proposed integrative loss function can outperform state-of-the-art methods and results of the competing methods can be improved when our proposed loss is used.
Tasks	Medical Image Segmentation, Semantic Segmentation
Published	2018-05-08
URL	http://arxiv.org/abs/1805.02798v5
PDF	http://arxiv.org/pdf/1805.02798v5.pdf
PWC	https://paperswithcode.com/paper/combo-loss-handling-input-and-output
Repo	https://github.com/asgsaeid/ComboLoss
Framework	none
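
The abstract describes a loss that combines a Dice term with a cross entropy term that penalizes false positives and false negatives. One plausible form of such a combination is sketched below for the binary case. The exact weighting used in the paper may differ: the parameters `alpha` (balance between the two terms) and `beta` (balance between false negatives and false positives), the smoothing constant, and all function names are assumptions made for illustration.

```python
import math

def soft_dice(targets, probs, smooth=1.0):
    """Soft Dice coefficient between binary targets and predicted probabilities."""
    inter = sum(t * p for t, p in zip(targets, probs))
    return (2.0 * inter + smooth) / (sum(targets) + sum(probs) + smooth)

def weighted_cross_entropy(targets, probs, beta=0.5, eps=1e-7):
    """Cross entropy where beta > 0.5 penalizes false negatives more heavily."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= beta * t * math.log(p) + (1.0 - beta) * (1 - t) * math.log(1.0 - p)
    return total / len(targets)

def combo_loss(targets, probs, alpha=0.5, beta=0.5):
    """Weighted cross entropy minus Dice; lower is better."""
    ce = weighted_cross_entropy(targets, probs, beta)
    return alpha * ce - (1.0 - alpha) * soft_dice(targets, probs)

# Hypothetical voxel labels with a small foreground region.
targets = [1, 0, 0, 0, 0, 0, 0, 1]
good = [0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.8]
missed = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]  # foreground missed entirely

print(combo_loss(targets, good), combo_loss(targets, missed))
```

The Dice term is insensitive to the number of background voxels, which addresses the input imbalance, while `beta` lets the cross entropy term trade off false negatives against false positives, which addresses the output imbalance.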