October 20, 2019

3007 words 15 mins read

Paper Group AWR 284

Improving the Resolution of CNN Feature Maps Efficiently with Multisampling. Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods. Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards. Curriculum Adversarial Training. Deep Leaf Segmentation Using Synthetic D …

Improving the Resolution of CNN Feature Maps Efficiently with Multisampling

Title Improving the Resolution of CNN Feature Maps Efficiently with Multisampling
Authors Shayan Sadigh, Pradeep Sen
Abstract We describe a new class of subsampling techniques for CNNs, termed multisampling, that significantly increases the amount of information kept by feature maps through subsampling layers. One version of our method, which we call checkered subsampling, significantly improves the accuracy of state-of-the-art architectures such as DenseNet and ResNet without any additional parameters and, remarkably, improves the accuracy of certain pretrained ImageNet models without any training or fine-tuning. We glean new insight into the nature of data augmentations and demonstrate, for the first time, that coarse feature maps are significantly bottlenecking the performance of neural networks in image classification.
Tasks Image Classification
Published 2018-05-28
URL http://arxiv.org/abs/1805.10766v1
PDF http://arxiv.org/pdf/1805.10766v1.pdf
PWC https://paperswithcode.com/paper/improving-the-resolution-of-cnn-feature-maps
Repo https://github.com/ShayanPersonal/checkered-cnn
Framework pytorch
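
The abstract above does not spell out the mechanics, so here is a minimal sketch of the checkered-subsampling idea as we read it (an assumption on our part, not the authors' reference implementation in the linked checkered-cnn repo): instead of keeping one of the four stride-2 sublattices, keep the two complementary sublattices that form a checkerboard, so twice as many activations survive each subsampling layer.

```python
import torch

def checkered_subsample(x: torch.Tensor) -> torch.Tensor:
    """x: (N, C, H, W) -> (2*N, C, H//2, W//2), stacking the two complementary
    stride-2 sublattices (a checkerboard) along the batch ("sample") dimension."""
    even = x[:, :, 0::2, 0::2]   # sublattice at (even row, even col)
    odd = x[:, :, 1::2, 1::2]    # complementary sublattice at (odd row, odd col)
    return torch.cat([even, odd], dim=0)

x = torch.randn(1, 64, 32, 32)
print(checkered_subsample(x).shape)  # torch.Size([2, 64, 16, 16])
```

Stacking the sublattices along the batch dimension is only one possible bookkeeping choice; see the linked repository for the authors' actual layers.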

Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods

Title Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods
Authors Peter Henderson, Joshua Romoff, Joelle Pineau
Abstract Recent analyses of certain gradient descent optimization methods have shown that performance can degrade in some settings - such as with stochasticity or implicit momentum. In deep reinforcement learning (Deep RL), such optimization methods are often used for training neural networks via the temporal difference error or policy gradient. As an agent improves over time, the optimization target changes and thus the loss landscape (and local optima) change. Due to the failure modes of those methods, the ideal choice of optimizer for Deep RL remains unclear. As such, we provide an empirical analysis of the effects that a wide range of gradient descent optimizers and their hyperparameters have on policy gradient methods, a subset of Deep RL algorithms, for benchmark continuous control tasks. We find that adaptive optimizers have a narrow window of effective learning rates, diverging in other cases, and that the effectiveness of momentum varies depending on the properties of the environment. Our analysis suggests that there is significant interplay between the dynamics of the environment and Deep RL algorithm properties which aren’t necessarily accounted for by traditional adaptive gradient methods. We provide suggestions for optimal settings of current methods and further lines of research based on our findings.
Tasks Continuous Control, Policy Gradient Methods
Published 2018-10-05
URL http://arxiv.org/abs/1810.02525v1
PDF http://arxiv.org/pdf/1810.02525v1.pdf
PWC https://paperswithcode.com/paper/where-did-my-optimum-go-an-empirical-analysis
Repo https://github.com/facebookresearch/WhereDidMyOptimumGo
Framework pytorch
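
As a concrete illustration of the kind of sweep the paper performs (a toy stand-in, not the paper's continuous-control benchmarks), the sketch below grid-searches optimizers and learning rates for a REINFORCE-style policy gradient on a 10-armed bandit; the narrow window of effective learning rates can be probed by checking which settings recover the best arm.

```python
import torch

rewards = torch.linspace(0.0, 1.0, 10)  # arm 9 is optimal

def run(opt_cls, lr, steps=500):
    logits = torch.zeros(10, requires_grad=True)
    opt = opt_cls([logits], lr=lr)
    for _ in range(steps):
        dist = torch.distributions.Categorical(logits=logits)
        a = dist.sample((64,))
        r = rewards[a]
        loss = -(dist.log_prob(a) * (r - r.mean())).mean()  # REINFORCE with a mean baseline
        opt.zero_grad(); loss.backward(); opt.step()
    return rewards[logits.argmax()].item()  # reward of the greedily chosen arm

for opt_cls in (torch.optim.SGD, torch.optim.Adam, torch.optim.RMSprop):
    for lr in (1e-3, 1e-2, 1e-1, 1.0):
        print(opt_cls.__name__, lr, run(opt_cls, lr))
```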

Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

Title Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards
Authors Rituraj Kaushik, Konstantinos Chatzilygeroudis, Jean-Baptiste Mouret
Abstract The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, the current algorithms lack an effective exploration strategy to deal with sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the expected return, and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) with much less interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.
Tasks Continuous Control, Efficient Exploration
Published 2018-06-25
URL https://arxiv.org/abs/1806.09351v3
PDF https://arxiv.org/pdf/1806.09351v3.pdf
PWC https://paperswithcode.com/paper/multi-objective-model-based-policy-search-for
Repo https://github.com/resibots/kaushik_2018_multi-dex
Framework none
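
A minimal sketch of the Pareto step mentioned above (the candidate scores are made up, and Multi-DEX itself runs a Pareto-based optimizer over policy parameters rather than filtering a fixed list): keep the candidates that are non-dominated under the three objectives, all treated as maximized.

```python
def pareto_front(candidates):
    """candidates: list of (novelty, expected_return, model_confidence) tuples."""
    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and strictly better somewhere
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

print(pareto_front([(0.9, 0.1, 0.5), (0.2, 0.8, 0.6), (0.1, 0.1, 0.1)]))
# -> [(0.9, 0.1, 0.5), (0.2, 0.8, 0.6)]  (the third candidate is dominated)
```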

Curriculum Adversarial Training

Title Curriculum Adversarial Training
Authors Qi-Zhi Cai, Min Du, Chang Liu, Dawn Song
Abstract Recently, deep learning has been applied to many security-sensitive applications, such as facial authentication. The existence of adversarial examples hinders such applications. The state-of-the-art result on defense shows that adversarial training can be applied to train a robust model on MNIST against adversarial examples; but it fails to achieve a high empirical worst-case accuracy on more complex tasks, such as CIFAR-10 and SVHN. In our work, we propose curriculum adversarial training (CAT) to resolve this issue. The basic idea is to develop a curriculum of adversarial examples generated by attacks with a wide range of strengths. With two techniques to mitigate the forgetting and the generalization issues, we demonstrate that CAT can improve the prior art’s empirical worst-case accuracy by a large margin of 25% on CIFAR-10 and 35% on SVHN. At the same time, the model’s performance on non-adversarial inputs is comparable to that of state-of-the-art models.
Tasks
Published 2018-05-13
URL http://arxiv.org/abs/1805.04807v1
PDF http://arxiv.org/pdf/1805.04807v1.pdf
PWC https://paperswithcode.com/paper/curriculum-adversarial-training
Repo https://github.com/sunblaze-ucb/curriculum-adversarial-training-CAT
Framework pytorch
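
A hedged sketch of the curriculum idea (the attack, the strength sampling, and the function names are our assumptions, not the authors' exact recipe): train against PGD attacks whose maximum strength k_max grows over the curriculum, while sampling the per-batch strength from {0, ..., k_max} so weaker attacks keep being revisited, which is one way to address the forgetting issue mentioned above.

```python
import random
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=1):
    """k-step projected gradient descent attack within an L-infinity ball of radius eps."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv

def cat_step(model, optimizer, x, y, k_max):
    k = random.randint(0, k_max)                       # revisit weaker attacks too
    x_in = x if k == 0 else pgd_attack(model, x, y, steps=k)
    loss = F.cross_entropy(model(x_in), y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# usage: increase k_max over training once robust accuracy at the current strength is high enough
```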

Deep Leaf Segmentation Using Synthetic Data

Title Deep Leaf Segmentation Using Synthetic Data
Authors Daniel Ward, Peyman Moghadam, Nicolas Hudson
Abstract Automated segmentation of individual leaves of a plant in an image is a prerequisite for measuring more complex phenotypic traits in high-throughput phenotyping. Applying state-of-the-art machine learning approaches to tackle leaf instance segmentation requires a large amount of manually annotated training data. Currently, the benchmark datasets for leaf segmentation contain only a few hundred labeled training images. In this paper, we propose a framework for leaf instance segmentation by augmenting real plant datasets with generated synthetic images of plants inspired by domain randomisation. We train a state-of-the-art deep learning segmentation architecture (Mask-RCNN) with a combination of real and synthetic images of Arabidopsis plants. Our proposed approach achieves a 90% leaf segmentation score on the A1 test set, outperforming state-of-the-art approaches for the CVPPP Leaf Segmentation Challenge (LSC). Our approach also achieves 81% mean performance over all five test datasets.
Tasks Instance Segmentation, Semantic Segmentation
Published 2018-07-28
URL http://arxiv.org/abs/1807.10931v3
PDF http://arxiv.org/pdf/1807.10931v3.pdf
PWC https://paperswithcode.com/paper/deep-leaf-segmentation-using-synthetic-data
Repo https://github.com/DanielCWard/Deep-Leaf-Segmentation-Using-Synthetic-Data
Framework none

NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition

Title NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition
Authors Fenfen Sheng, Zhineng Chen, Bo Xu
Abstract Scene text recognition has attracted a great deal of research due to its importance to various applications. Existing methods mainly adopt recurrence- or convolution-based networks. Though they have obtained good performance, these methods still suffer from two limitations: slow training speed due to the internal recurrence of RNNs, and high complexity due to stacked convolutional layers for long-term feature extraction. This paper, for the first time, proposes a no-recurrence sequence-to-sequence text recognizer, named NRTR, that dispenses with recurrences and convolutions entirely. NRTR follows the encoder-decoder paradigm, where the encoder uses stacked self-attention to extract image features, and the decoder applies stacked self-attention to recognize text based on the encoder output. NRTR relies solely on the self-attention mechanism and thus can be trained with more parallelization and less complexity. Considering that scene images have large variation in text and background, we further design a modality-transform block to effectively transform 2D input images into 1D sequences, combined with the encoder to extract more discriminative features. NRTR achieves state-of-the-art or highly competitive performance on both regular and irregular benchmarks, while requiring only a small fraction of the training time of the best model from the literature (at least 8 times faster).
Tasks Scene Text Recognition
Published 2018-06-04
URL https://arxiv.org/abs/1806.00926v2
PDF https://arxiv.org/pdf/1806.00926v2.pdf
PWC https://paperswithcode.com/paper/nrtr-a-no-recurrence-sequence-to-sequence
Repo https://github.com/Belval/NRTR
Framework tf
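
A rough sketch of the modality-transform plus self-attention encoder described above (the layer sizes and the two-layer CNN are assumptions, not NRTR's exact configuration; requires PyTorch 1.9+ for `batch_first`): a small strided CNN turns the 2D image into a 1D feature sequence that a stacked self-attention encoder then processes.

```python
import torch
import torch.nn as nn

class ModalityTransform(nn.Module):
    """Collapse a 2D image into a 1D feature sequence for a self-attention encoder."""
    def __init__(self, d_model=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, img):                       # img: (N, 1, H, W)
        f = self.conv(img)                        # (N, d_model, H/4, W/4)
        n, c, h, w = f.shape
        return f.permute(0, 3, 2, 1).reshape(n, w * h, c)   # (N, seq_len, d_model)

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True), num_layers=6)
seq = ModalityTransform()(torch.randn(2, 1, 32, 100))
print(encoder(seq).shape)                         # torch.Size([2, 200, 256])
```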

COLA: Decentralized Linear Learning

Title COLA: Decentralized Linear Learning
Authors Lie He, An Bian, Martin Jaggi
Abstract Decentralized machine learning is a promising emerging paradigm in view of global challenges of data ownership and privacy. We consider learning of linear classification and regression models, in the setting where the training data is decentralized over many user devices, and the learning algorithm must run on-device, on an arbitrary communication network, without a central coordinator. We propose COLA, a new decentralized training algorithm with strong theoretical guarantees and superior practical performance. Our framework overcomes many limitations of existing methods, and achieves communication efficiency, scalability, elasticity as well as resilience to changes in data and participating devices.
Tasks
Published 2018-08-13
URL https://arxiv.org/abs/1808.04883v4
PDF https://arxiv.org/pdf/1808.04883v4.pdf
PWC https://paperswithcode.com/paper/cola-decentralized-linear-learning
Repo https://github.com/epfml/cola
Framework pytorch
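
For intuition only, here is a toy decentralized linear-learning loop on a ring topology; note this is plain decentralized gradient averaging, not COLA's local-subproblem formulation, and all sizes are made up. Each node takes a ridge-regression gradient step on its own data shard and then averages its weights with its ring neighbours, with no central coordinator.

```python
import torch

torch.manual_seed(0)
n_nodes, d = 4, 10
w_true = torch.randn(d)
shards = []
for _ in range(n_nodes):
    X = torch.randn(50, d)
    shards.append((X, X @ w_true + 0.01 * torch.randn(50)))
w = [torch.zeros(d) for _ in range(n_nodes)]

for step in range(200):
    # local ridge-regression gradient step on each node's own shard
    for i, (X, y) in enumerate(shards):
        grad = X.T @ (X @ w[i] - y) / len(y) + 0.01 * w[i]
        w[i] = w[i] - 0.1 * grad
    # gossip: each node averages its weights with its two ring neighbours
    w = [(w[(i - 1) % n_nodes] + w[i] + w[(i + 1) % n_nodes]) / 3 for i in range(n_nodes)]

print(torch.norm(w[0] - w_true))  # consensus weights should land close to w_true
```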

Language Modeling with Sparse Product of Sememe Experts

Title Language Modeling with Sparse Product of Sememe Experts
Authors Yihong Gu, Jun Yan, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin, Leyu Lin
Abstract Most language modeling methods rely on large-scale data to statistically learn the sequential patterns of words. In this paper, we argue that words are atomic language units but not necessarily atomic semantic units. Inspired by HowNet, we use sememes, the minimum semantic units in human languages, to represent the implicit semantics behind words for language modeling, named the Sememe-Driven Language Model (SDLM). More specifically, to predict the next word, SDLM first estimates the sememe distribution given the textual context. Afterward, it regards each sememe as a distinct semantic expert, and these experts jointly identify the most probable senses and the corresponding word. In this way, SDLM enables language models to work beyond word-level manipulation to fine-grained sememe-level semantics and offers us more powerful tools to fine-tune language models and improve the interpretability as well as the robustness of language models. Experiments on language modeling and the downstream application of headline generation demonstrate the significant effect of SDLM. Source code and data used in the experiments can be accessed at https://github.com/thunlp/SDLM-pytorch.
Tasks Language Modelling
Published 2018-10-29
URL http://arxiv.org/abs/1810.12387v1
PDF http://arxiv.org/pdf/1810.12387v1.pdf
PWC https://paperswithcode.com/paper/language-modeling-with-sparse-product-of
Repo https://github.com/thunlp/SDLM-pytorch
Framework pytorch
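
A simplified sketch of the sememe-expert head described above (dimensions and the exact way experts are combined are assumptions; the linked SDLM-pytorch repository is the reference): the context vector scores sememes, each sememe expert holds a distribution over words, and word log-probabilities come from the sememe-weighted product of experts in log space.

```python
import torch
import torch.nn as nn

class SememeLMHead(nn.Module):
    def __init__(self, hidden=256, n_sememes=1400, vocab=10000):
        super().__init__()
        self.sememe_scorer = nn.Linear(hidden, n_sememes)
        self.expert_logits = nn.Parameter(0.01 * torch.randn(n_sememes, vocab))
    def forward(self, h):                                        # h: (batch, hidden)
        sememe_probs = torch.softmax(self.sememe_scorer(h), dim=-1)   # (batch, n_sememes)
        expert_logp = torch.log_softmax(self.expert_logits, dim=-1)   # per-sememe word distributions
        logp = sememe_probs @ expert_logp        # weighted product of experts, in log space
        return torch.log_softmax(logp, dim=-1)   # renormalize into a word distribution

head = SememeLMHead()
print(head(torch.randn(4, 256)).shape)  # torch.Size([4, 10000])
```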

SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis

Title SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis
Authors Wengling Chen, James Hays
Abstract Synthesizing realistic images from human drawn sketches is a challenging problem in computer graphics and vision. Existing approaches either need exact edge maps, or rely on retrieval of existing photographs. In this work, we propose a novel Generative Adversarial Network (GAN) approach that synthesizes plausible images from 50 categories including motorcycles, horses and couches. We demonstrate a data augmentation technique for sketches which is fully automatic, and we show that the augmented data is helpful to our task. We introduce a new network building block suitable for both the generator and discriminator which improves the information flow by injecting the input image at multiple scales. Compared to state-of-the-art image translation methods, our approach generates more realistic images and achieves significantly higher Inception Scores.
Tasks Data Augmentation, Image Generation
Published 2018-01-09
URL http://arxiv.org/abs/1801.02753v2
PDF http://arxiv.org/pdf/1801.02753v2.pdf
PWC https://paperswithcode.com/paper/sketchygan-towards-diverse-and-realistic
Repo https://github.com/wchen342/SketchyGAN
Framework tf
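
A hedged sketch of the "inject the input at multiple scales" idea (this is a plain resize-and-concatenate block, not SketchyGAN's actual Masked Residual Unit): at each scale the sketch is resized to match the feature map and concatenated before the convolution, so every stage of the generator and discriminator sees the input image directly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InjectBlock(nn.Module):
    def __init__(self, in_ch, out_ch, sketch_ch=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + sketch_ch, out_ch, 3, padding=1)
    def forward(self, feat, sketch):
        # resize the input sketch to the current feature resolution and concatenate it
        s = F.interpolate(sketch, size=feat.shape[-2:], mode="bilinear", align_corners=False)
        return F.relu(self.conv(torch.cat([feat, s], dim=1)))

block = InjectBlock(64, 128)
out = block(torch.randn(1, 64, 32, 32), torch.randn(1, 1, 256, 256))
print(out.shape)  # torch.Size([1, 128, 32, 32])
```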

Non-Matrix Tactile Sensors: How Can Be Exploited Their Local Connectivity For Predicting Grasp Stability?

Title Non-Matrix Tactile Sensors: How Can Be Exploited Their Local Connectivity For Predicting Grasp Stability?
Authors Brayan S. Zapata-Impata, Pablo Gil, Fernando Torres
Abstract Tactile sensors supply useful information during the interaction with an object that can be used for assessing the stability of a grasp. Most of the previous works on this topic processed tactile readings as signals by calculating hand-picked features. Some of them have processed these readings as images calculating characteristics on matrix-like sensors. In this work, we explore how non-matrix sensors (sensors with taxels not arranged exactly in a matrix) can be processed as tactile images as well. In addition, we prove that they can be used for predicting grasp stability by training a Convolutional Neural Network (CNN) with them. We captured over 2500 real three-fingered grasps on 41 everyday objects to train a CNN that exploited the local connectivity inherent on the non-matrix tactile sensors, achieving 94.2% F1-score on predicting stability.
Tasks
Published 2018-09-14
URL http://arxiv.org/abs/1809.05551v1
PDF http://arxiv.org/pdf/1809.05551v1.pdf
PWC https://paperswithcode.com/paper/non-matrix-tactile-sensors-how-can-be
Repo https://github.com/yayaneath/biotac-sp-images
Framework none
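
A minimal sketch of the tactile-image construction described above (the taxel-to-pixel layout below is hypothetical, not the BioTac mapping used in the paper): scatter the non-matrix taxel readings onto a small image grid so a standard CNN can exploit their local adjacency.

```python
import torch

# hypothetical (row, col) position on a 6x5 grid for each of 24 taxels
TAXEL_POS = [(r, c) for r in range(6) for c in range(5)][:24]

def taxels_to_image(readings: torch.Tensor) -> torch.Tensor:
    """readings: (batch, 24) -> (batch, 1, 6, 5) tactile 'images'."""
    img = torch.zeros(readings.shape[0], 1, 6, 5)
    for i, (r, c) in enumerate(TAXEL_POS):
        img[:, 0, r, c] = readings[:, i]
    return img

print(taxels_to_image(torch.rand(8, 24)).shape)  # torch.Size([8, 1, 6, 5])
```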

There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average

Title There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average
Authors Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson
Abstract Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. To understand consistency regularization, we conceptually explore how loss geometry interacts with training procedures. The consistency loss dramatically improves generalization performance over supervised-only training; however, we show that SGD struggles to converge on the consistency loss and continues to make large steps that lead to changes in predictions on the test data. Motivated by these observations, we propose to train consistency-based methods with Stochastic Weight Averaging (SWA), a recent approach which averages weights along the trajectory of SGD with a modified learning rate schedule. We also propose fast-SWA, which further accelerates convergence by averaging multiple points within each cycle of a cyclical learning rate schedule. With weight averaging, we achieve the best known semi-supervised results on CIFAR-10 and CIFAR-100, over many different quantities of labeled training data. For example, we achieve 5.0% error on CIFAR-10 with only 4000 labels, compared to the previous best result in the literature of 6.3%.
Tasks Domain Adaptation, Semi-Supervised Image Classification
Published 2018-06-14
URL http://arxiv.org/abs/1806.05594v3
PDF http://arxiv.org/pdf/1806.05594v3.pdf
PWC https://paperswithcode.com/paper/there-are-many-consistent-explanations-of
Repo https://github.com/benathi/fastswa-semi-sup
Framework pytorch
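
A bare-bones sketch of the weight-averaging step (a plain running average of parameters; the cyclical learning-rate schedule and the authors' full recipe live in the linked fastswa-semi-sup repo): fast-SWA differs from SWA mainly in adding snapshots to this average several times per learning-rate cycle instead of once per cycle.

```python
import copy
import torch

class WeightAverager:
    """Keeps a running average of a model's parameters."""
    def __init__(self, model):
        self.avg_model = copy.deepcopy(model)
        self.n = 0
    @torch.no_grad()
    def update(self, model):
        self.n += 1
        for p_avg, p in zip(self.avg_model.parameters(), model.parameters()):
            p_avg.mul_((self.n - 1) / self.n).add_(p / self.n)

# usage: call averager.update(model) at several points within each cyclical-LR cycle,
# then evaluate averager.avg_model (after refreshing batch-norm statistics).
```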

Uncertainty-Aware Attention for Reliable Interpretation and Prediction

Title Uncertainty-Aware Attention for Reliable Interpretation and Prediction
Authors Jay Heo, Hae Beom Lee, Saehoon Kim, Juho Lee, Kwang Joon Kim, Eunho Yang, Sung Ju Hwang
Abstract The attention mechanism is effective in both focusing deep learning models on relevant features and interpreting them. However, attentions may be unreliable since the networks that generate them are often trained in a weakly-supervised manner. To overcome this limitation, we introduce the notion of input-dependent uncertainty to the attention mechanism, such that it generates attention for each feature with varying degrees of noise based on the given input, learning larger variance on instances it is uncertain about. We learn this Uncertainty-aware Attention (UA) mechanism using variational inference, and validate it on various risk prediction tasks from electronic health records, on which our model significantly outperforms existing attention models. The analysis of the learned attentions shows that our model generates attentions that comply with clinicians’ interpretation and provides richer interpretation via learned variance. Further evaluation of both the accuracy of the uncertainty calibration and the prediction performance with the “I don’t know” decision shows that UA yields networks with high reliability as well.
Tasks Calibration
Published 2018-05-24
URL http://arxiv.org/abs/1805.09653v1
PDF http://arxiv.org/pdf/1805.09653v1.pdf
PWC https://paperswithcode.com/paper/uncertainty-aware-attention-for-reliable
Repo https://github.com/jayheo/UA
Framework tf
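
A simplified sketch of input-dependent stochastic attention (a reparameterized Gaussian over attention scores; the paper's variational-inference objective and clinical-feature setup are not reproduced here): the network predicts a mean and a log-variance per feature, and sampled, sigmoid-squashed attention weights gate the input.

```python
import torch
import torch.nn as nn

class UncertainAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.mu = nn.Linear(dim, dim)
        self.log_var = nn.Linear(dim, dim)
    def forward(self, x):                        # x: (batch, dim)
        mu, log_var = self.mu(x), self.log_var(x)
        a = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)   # reparameterization trick
        return torch.sigmoid(a) * x              # attention-gated features

att = UncertainAttention(32)
print(att(torch.randn(4, 32)).shape)  # torch.Size([4, 32])
```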

PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning

Title PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning
Authors Yunbo Wang, Zhifeng Gao, Mingsheng Long, Jianmin Wang, Philip S. Yu
Abstract We present PredRNN++, an improved recurrent network for video predictive learning. In pursuit of a greater spatiotemporal modeling capability, our approach increases the transition depth between adjacent states by leveraging a novel recurrent unit, which is named Causal LSTM for re-organizing the spatial and temporal memories in a cascaded mechanism. However, there is still a dilemma in video predictive learning: increasingly deep-in-time models have been designed for capturing complex variations, while introducing more difficulties in the gradient back-propagation. To alleviate this undesirable effect, we propose a Gradient Highway architecture, which provides alternative shorter routes for gradient flows from outputs back to long-range inputs. This architecture works seamlessly with causal LSTMs, enabling PredRNN++ to capture short-term and long-term dependencies adaptively. We assess our model on both synthetic and real video datasets, showing its ability to ease the vanishing gradient problem and yield state-of-the-art prediction results even in a difficult objects occlusion scenario.
Tasks
Published 2018-04-17
URL http://arxiv.org/abs/1804.06300v2
PDF http://arxiv.org/pdf/1804.06300v2.pdf
PWC https://paperswithcode.com/paper/predrnn-towards-a-resolution-of-the-deep-in
Repo https://github.com/Yunbo426/predrnn-pp
Framework tf
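
A rough sketch of the Gradient Highway idea (the gating form below is a generic highway-style unit, an approximation rather than the paper's exact equations): a learned switch gate mixes a transformed version of the current input with the previous highway state, giving gradients a short path across time steps.

```python
import torch
import torch.nn as nn

class GradientHighwayUnit(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.p = nn.Conv2d(ch, ch, 3, padding=1)        # candidate from the current input
        self.s = nn.Conv2d(2 * ch, ch, 3, padding=1)    # switch gate
    def forward(self, x, z_prev):
        p = torch.tanh(self.p(x))
        s = torch.sigmoid(self.s(torch.cat([x, z_prev], dim=1)))
        return s * p + (1 - s) * z_prev                 # new highway state z_t

ghu = GradientHighwayUnit(16)
print(ghu(torch.randn(1, 16, 8, 8), torch.zeros(1, 16, 8, 8)).shape)  # (1, 16, 8, 8)
```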

Masked Conditional Neural Networks for Audio Classification

Title Masked Conditional Neural Networks for Audio Classification
Authors Fady Medhat, David Chesmore, John Robinson
Abstract We present the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN) designed for temporal signal recognition. The CLNN takes into consideration the temporal nature of the sound signal and the MCLNN extends upon the CLNN through a binary mask to preserve the spatial locality of the features and allows an automated exploration of the features combination analogous to hand-crafting the most relevant features for the recognition task. MCLNN has achieved competitive recognition accuracies on the GTZAN and the ISMIR2004 music datasets that surpass several state-of-the-art neural network based architectures and hand-crafted methods applied on both datasets.
Tasks Audio Classification
Published 2018-03-06
URL http://arxiv.org/abs/1803.02421v2
PDF http://arxiv.org/pdf/1803.02421v2.pdf
PWC https://paperswithcode.com/paper/masked-conditional-neural-networks-for-audio
Repo https://github.com/fadymedhat/MCLNN
Framework tf
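
A minimal sketch of the masking idea (the band pattern below is illustrative, not the MCLNN paper's exact mask design): a fixed binary mask zeroes most of a dense layer's weights so each hidden unit sees only a local band of input features, which is what enforces the spatial locality mentioned above.

```python
import torch
import torch.nn as nn

def band_mask(out_f, in_f, bandwidth=5):
    """Binary mask where each output unit connects to a local band of inputs."""
    mask = torch.zeros(out_f, in_f)
    for i in range(out_f):
        start = int(i * in_f / out_f)
        mask[i, start:start + bandwidth] = 1.0
    return mask

class MaskedLinear(nn.Linear):
    def __init__(self, in_f, out_f, bandwidth=5):
        super().__init__(in_f, out_f)
        self.register_buffer("mask", band_mask(out_f, in_f, bandwidth))
    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

layer = MaskedLinear(60, 40)
print(layer(torch.randn(8, 60)).shape)  # torch.Size([8, 40])
```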

Towards multi-instrument drum transcription

Title Towards multi-instrument drum transcription
Authors Richard Vogl, Gerhard Widmer, Peter Knees
Abstract Automatic drum transcription, a subtask of the more general automatic music transcription, deals with extracting drum instrument note onsets from an audio source. Recently, progress in transcription performance has been made using non-negative matrix factorization as well as deep learning methods. However, these works primarily focus on transcribing three drum instruments only: snare drum, bass drum, and hi-hat. Yet, for many applications, the ability to transcribe more drum instruments which make up standard drum kits used in western popular music would be desirable. In this work, convolutional and convolutional recurrent neural networks are trained to transcribe a wider range of drum instruments. First, the shortcomings of publicly available datasets in this context are discussed. To overcome these limitations, a larger synthetic dataset is introduced. Then, methods to train models using the new dataset focusing on generalization to real world data are investigated. Finally, the trained models are evaluated on publicly available datasets and results are discussed. The contributions of this work comprise: (i.) a large-scale synthetic dataset for drum transcription, (ii.) first steps towards an automatic drum transcription system that supports a larger range of instruments by evaluating and discussing training setups and the impact of datasets in this context, and (iii.) a publicly available set of trained models for drum transcription. Additional materials are available at http://ifs.tuwien.ac.at/~vogl/dafx2018
Tasks Drum Transcription
Published 2018-06-18
URL http://arxiv.org/abs/1806.06676v2
PDF http://arxiv.org/pdf/1806.06676v2.pdf
PWC https://paperswithcode.com/paper/towards-multi-instrument-drum-transcription
Repo https://github.com/keunwoochoi/DrummerNet
Framework pytorch