Paper Group AWR 284
Improving the Resolution of CNN Feature Maps Efficiently with Multisampling
Title | Improving the Resolution of CNN Feature Maps Efficiently with Multisampling |
Authors | Shayan Sadigh, Pradeep Sen |
Abstract | We describe a new class of subsampling techniques for CNNs, termed multisampling, that significantly increases the amount of information kept by feature maps through subsampling layers. One version of our method, which we call checkered subsampling, significantly improves the accuracy of state-of-the-art architectures such as DenseNet and ResNet without any additional parameters and, remarkably, improves the accuracy of certain pretrained ImageNet models without any training or fine-tuning. We glean new insight into the nature of data augmentations and demonstrate, for the first time, that coarse feature maps are significantly bottlenecking the performance of neural networks in image classification. |
Tasks | Image Classification |
Published | 2018-05-28 |
URL | http://arxiv.org/abs/1805.10766v1 |
PDF | http://arxiv.org/pdf/1805.10766v1.pdf |
PWC | https://paperswithcode.com/paper/improving-the-resolution-of-cnn-feature-maps |
Repo | https://github.com/ShayanPersonal/checkered-cnn |
Framework | pytorch |
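The abstract above describes subsampling layers that keep more information than a conventional strided layer. The following is a minimal sketch of that idea on a plain 2D list, assuming that the checkered variant keeps the top-left and bottom-right samples of each 2x2 window and stores them as two separate submaps. The function names and the choice of positions are illustrative assumptions, not code from the authors' repository.

```python
def strided_subsample(fmap):
    """Conventional stride-2 subsampling: keep the top-left sample of each 2x2 window."""
    return [row[0::2] for row in fmap[0::2]]


def checkered_subsample(fmap):
    """Keep two samples per 2x2 window as two submaps.

    Assumption for illustration: the kept positions are top-left and
    bottom-right, which forms a checkerboard pattern over the input.
    """
    top_left = [row[0::2] for row in fmap[0::2]]
    bottom_right = [row[1::2] for row in fmap[1::2]]
    return [top_left, bottom_right]


# A 4x4 feature map holding the values 0..15 in row-major order.
fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
single = strided_subsample(fmap)      # one 2x2 map, 4 of 16 values kept
submaps = checkered_subsample(fmap)   # two 2x2 submaps, 8 of 16 values kept
```

In this sketch the strided version retains a quarter of the input values, while the checkered version retains half of them at the same per-submap resolution.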
Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods
Title | Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods |
Authors | Peter Henderson, Joshua Romoff, Joelle Pineau |
Abstract | Recent analyses of certain gradient descent optimization methods have shown that performance can degrade in some settings - such as with stochasticity or implicit momentum. In deep reinforcement learning (Deep RL), such optimization methods are often used for training neural networks via the temporal difference error or policy gradient. As an agent improves over time, the optimization target changes and thus the loss landscape (and local optima) change. Due to the failure modes of those methods, the ideal choice of optimizer for Deep RL remains unclear. As such, we provide an empirical analysis of the effects that a wide range of gradient descent optimizers and their hyperparameters have on policy gradient methods, a subset of Deep RL algorithms, for benchmark continuous control tasks. We find that adaptive optimizers have a narrow window of effective learning rates, diverging in other cases, and that the effectiveness of momentum varies depending on the properties of the environment. Our analysis suggests that there is significant interplay between the dynamics of the environment and Deep RL algorithm properties which aren’t necessarily accounted for by traditional adaptive gradient methods. We provide suggestions for optimal settings of current methods and further lines of research based on our findings. |
Tasks | Continuous Control, Policy Gradient Methods |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.02525v1 |
PDF | http://arxiv.org/pdf/1810.02525v1.pdf |
PWC | https://paperswithcode.com/paper/where-did-my-optimum-go-an-empirical-analysis |
Repo | https://github.com/facebookresearch/WhereDidMyOptimumGo |
Framework | pytorch |
Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards
Title | Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards |
Authors | Rituraj Kaushik, Konstantinos Chatzilygeroudis, Jean-Baptiste Mouret |
Abstract | The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, the current algorithms lack an effective exploration strategy to deal with sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the expected return and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) in much lower interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS. |
Tasks | Continuous Control, Efficient Exploration |
Published | 2018-06-25 |
URL | https://arxiv.org/abs/1806.09351v3 |
PDF | https://arxiv.org/pdf/1806.09351v3.pdf |
PWC | https://paperswithcode.com/paper/multi-objective-model-based-policy-search-for |
Repo | https://github.com/resibots/kaushik_2018_multi-dex |
Framework | none |
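The abstract above frames policy search as a three-objective problem solved with a Pareto-based optimizer. The sketch below illustrates only the underlying notion of Pareto dominance, assuming all three objectives are expressed as scores to be maximized. The candidate tuples (novelty, expected return, model-accuracy score) are made-up values, and the abstract does not say which specific multi-objective algorithm the authors use.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical scores: (novelty, expected return, model-accuracy score).
candidates = [
    (0.9, 0.1, 0.5),
    (0.2, 0.8, 0.6),
    (0.1, 0.1, 0.4),
    (0.9, 0.1, 0.7),
]
front = pareto_front(candidates)
```

A Pareto-based optimizer keeps a whole set of such non-dominated trade-offs rather than collapsing the objectives into one weighted sum.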
Curriculum Adversarial Training
Title | Curriculum Adversarial Training |
Authors | Qi-Zhi Cai, Min Du, Chang Liu, Dawn Song |
Abstract | Recently, deep learning has been applied to many security-sensitive applications, such as facial authentication. The existence of adversarial examples hinders such applications. The state-of-the-art result on defense shows that adversarial training can be applied to train a robust model on MNIST against adversarial examples; but it fails to achieve a high empirical worst-case accuracy on more complex tasks, such as CIFAR-10 and SVHN. In our work, we propose curriculum adversarial training (CAT) to resolve this issue. The basic idea is to develop a curriculum of adversarial examples generated by attacks with a wide range of strengths. With two techniques to mitigate the forgetting and the generalization issues, we demonstrate that CAT can improve the prior art’s empirical worst-case accuracy by a large margin of 25% on CIFAR-10 and 35% on SVHN. At the same time, the model’s performance on non-adversarial inputs is comparable to the state-of-the-art models. |
Tasks | |
Published | 2018-05-13 |
URL | http://arxiv.org/abs/1805.04807v1 |
PDF | http://arxiv.org/pdf/1805.04807v1.pdf |
PWC | https://paperswithcode.com/paper/curriculum-adversarial-training |
Repo | https://github.com/sunblaze-ucb/curriculum-adversarial-training-CAT |
Framework | pytorch |
Deep Leaf Segmentation Using Synthetic Data
Title | Deep Leaf Segmentation Using Synthetic Data |
Authors | Daniel Ward, Peyman Moghadam, Nicolas Hudson |
Abstract | Automated segmentation of individual leaves of a plant in an image is a prerequisite to measure more complex phenotypic traits in high-throughput phenotyping. Applying state-of-the-art machine learning approaches to tackle leaf instance segmentation requires a large amount of manually annotated training data. Currently, the benchmark datasets for leaf segmentation contain only a few hundred labeled training images. In this paper, we propose a framework for leaf instance segmentation by augmenting real plant datasets with generated synthetic images of plants inspired by domain randomisation. We train a state-of-the-art deep learning segmentation architecture (Mask-RCNN) with a combination of real and synthetic images of Arabidopsis plants. Our proposed approach achieves a 90% leaf segmentation score on the A1 test set, outperforming the state-of-the-art approaches for the CVPPP Leaf Segmentation Challenge (LSC). Our approach also achieves 81% mean performance over all five test datasets. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2018-07-28 |
URL | http://arxiv.org/abs/1807.10931v3 |
PDF | http://arxiv.org/pdf/1807.10931v3.pdf |
PWC | https://paperswithcode.com/paper/deep-leaf-segmentation-using-synthetic-data |
Repo | https://github.com/DanielCWard/Deep-Leaf-Segmentation-Using-Synthetic-Data |
Framework | none |
NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition
Title | NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition |
Authors | Fenfen Sheng, Zhineng Chen, Bo Xu |
Abstract | Scene text recognition has attracted a great deal of research due to its importance to various applications. Existing methods mainly adopt recurrence- or convolution-based networks. Though they have obtained good performance, these methods still suffer from two limitations: slow training speed due to the internal recurrence of RNNs, and high complexity due to stacked convolutional layers for long-term feature extraction. This paper, for the first time, proposes a no-recurrence sequence-to-sequence text recognizer, named NRTR, that dispenses with recurrences and convolutions entirely. NRTR follows the encoder-decoder paradigm, where the encoder uses stacked self-attention to extract image features, and the decoder applies stacked self-attention to recognize texts based on encoder output. NRTR relies solely on the self-attention mechanism and thus can be trained with more parallelization and less complexity. Considering that scene images have large variation in text and background, we further design a modality-transform block to effectively transform 2D input images to 1D sequences, combined with the encoder to extract more discriminative features. NRTR achieves state-of-the-art or highly competitive performance on both regular and irregular benchmarks, while requiring only a small fraction of the training time of the best model from the literature (at least 8 times faster). |
Tasks | Scene Text Recognition |
Published | 2018-06-04 |
URL | https://arxiv.org/abs/1806.00926v2 |
PDF | https://arxiv.org/pdf/1806.00926v2.pdf |
PWC | https://paperswithcode.com/paper/nrtr-a-no-recurrence-sequence-to-sequence |
Repo | https://github.com/Belval/NRTR |
Framework | tf |
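The abstract above relies on stacked self-attention in place of recurrence. As a rough sketch of a single attention step, the code below implements standard scaled dot-product self-attention in plain Python. It assumes the usual Transformer formulation and, for brevity, uses the inputs directly as queries, keys and values with no learned projections, multiple heads or stacking. The abstract does not spell out these details for NRTR, so treat them as assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention over a list of equal-length vectors.

    Each output vector is a weighted average of all input vectors, with
    weights given by a softmax over scaled dot products.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(seq)
```

Every position is computed independently of the others, which is what allows this kind of layer to be parallelized across the sequence, unlike a recurrent update.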
COLA: Decentralized Linear Learning
Title | COLA: Decentralized Linear Learning |
Authors | Lie He, An Bian, Martin Jaggi |
Abstract | Decentralized machine learning is a promising emerging paradigm in view of global challenges of data ownership and privacy. We consider learning of linear classification and regression models, in the setting where the training data is decentralized over many user devices, and the learning algorithm must run on-device, on an arbitrary communication network, without a central coordinator. We propose COLA, a new decentralized training algorithm with strong theoretical guarantees and superior practical performance. Our framework overcomes many limitations of existing methods, and achieves communication efficiency, scalability, elasticity as well as resilience to changes in data and participating devices. |
Tasks | |
Published | 2018-08-13 |
URL | https://arxiv.org/abs/1808.04883v4 |
PDF | https://arxiv.org/pdf/1808.04883v4.pdf |
PWC | https://paperswithcode.com/paper/cola-decentralized-linear-learning |
Repo | https://github.com/epfml/cola |
Framework | pytorch |
Language Modeling with Sparse Product of Sememe Experts
Title | Language Modeling with Sparse Product of Sememe Experts |
Authors | Yihong Gu, Jun Yan, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin, Leyu Lin |
Abstract | Most language modeling methods rely on large-scale data to statistically learn the sequential patterns of words. In this paper, we argue that words are atomic language units but not necessarily atomic semantic units. Inspired by HowNet, we use sememes, the minimum semantic units in human languages, to represent the implicit semantics behind words for language modeling, named Sememe-Driven Language Model (SDLM). More specifically, to predict the next word, SDLM first estimates the sememe distribution given the textual context. Afterward, it regards each sememe as a distinct semantic expert, and these experts jointly identify the most probable senses and the corresponding word. In this way, SDLM enables language models to work beyond word-level manipulation to fine-grained sememe-level semantics and offers us more powerful tools to fine-tune language models and improve the interpretability as well as the robustness of language models. Experiments on language modeling and the downstream application of headline generation demonstrate the significant effect of SDLM. Source code and data used in the experiments can be accessed at https://github.com/thunlp/SDLM-pytorch. |
Tasks | Language Modelling |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12387v1 |
PDF | http://arxiv.org/pdf/1810.12387v1.pdf |
PWC | https://paperswithcode.com/paper/language-modeling-with-sparse-product-of |
Repo | https://github.com/thunlp/SDLM-pytorch |
Framework | pytorch |
SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis
Title | SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis |
Authors | Wengling Chen, James Hays |
Abstract | Synthesizing realistic images from human drawn sketches is a challenging problem in computer graphics and vision. Existing approaches either need exact edge maps, or rely on retrieval of existing photographs. In this work, we propose a novel Generative Adversarial Network (GAN) approach that synthesizes plausible images from 50 categories including motorcycles, horses and couches. We demonstrate a data augmentation technique for sketches which is fully automatic, and we show that the augmented data is helpful to our task. We introduce a new network building block suitable for both the generator and discriminator which improves the information flow by injecting the input image at multiple scales. Compared to state-of-the-art image translation methods, our approach generates more realistic images and achieves significantly higher Inception Scores. |
Tasks | Data Augmentation, Image Generation |
Published | 2018-01-09 |
URL | http://arxiv.org/abs/1801.02753v2 |
PDF | http://arxiv.org/pdf/1801.02753v2.pdf |
PWC | https://paperswithcode.com/paper/sketchygan-towards-diverse-and-realistic |
Repo | https://github.com/wchen342/SketchyGAN |
Framework | tf |
Non-Matrix Tactile Sensors: How Can Be Exploited Their Local Connectivity For Predicting Grasp Stability?
Title | Non-Matrix Tactile Sensors: How Can Be Exploited Their Local Connectivity For Predicting Grasp Stability? |
Authors | Brayan S. Zapata-Impata, Pablo Gil, Fernando Torres |
Abstract | Tactile sensors supply useful information during the interaction with an object that can be used for assessing the stability of a grasp. Most of the previous works on this topic processed tactile readings as signals by calculating hand-picked features. Some of them have processed these readings as images calculating characteristics on matrix-like sensors. In this work, we explore how non-matrix sensors (sensors with taxels not arranged exactly in a matrix) can be processed as tactile images as well. In addition, we prove that they can be used for predicting grasp stability by training a Convolutional Neural Network (CNN) with them. We captured over 2500 real three-fingered grasps on 41 everyday objects to train a CNN that exploited the local connectivity inherent on the non-matrix tactile sensors, achieving 94.2% F1-score on predicting stability. |
Tasks | |
Published | 2018-09-14 |
URL | http://arxiv.org/abs/1809.05551v1 |
PDF | http://arxiv.org/pdf/1809.05551v1.pdf |
PWC | https://paperswithcode.com/paper/non-matrix-tactile-sensors-how-can-be |
Repo | https://github.com/yayaneath/biotac-sp-images |
Framework | none |
There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average
Title | There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average |
Authors | Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson |
Abstract | Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. To understand consistency regularization, we conceptually explore how loss geometry interacts with training procedures. The consistency loss dramatically improves generalization performance over supervised-only training; however, we show that SGD struggles to converge on the consistency loss and continues to make large steps that lead to changes in predictions on the test data. Motivated by these observations, we propose to train consistency-based methods with Stochastic Weight Averaging (SWA), a recent approach which averages weights along the trajectory of SGD with a modified learning rate schedule. We also propose fast-SWA, which further accelerates convergence by averaging multiple points within each cycle of a cyclical learning rate schedule. With weight averaging, we achieve the best known semi-supervised results on CIFAR-10 and CIFAR-100, over many different quantities of labeled training data. For example, we achieve 5.0% error on CIFAR-10 with only 4000 labels, compared to the previous best result in the literature of 6.3%. |
Tasks | Domain Adaptation, Semi-Supervised Image Classification |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05594v3 |
PDF | http://arxiv.org/pdf/1806.05594v3.pdf |
PWC | https://paperswithcode.com/paper/there-are-many-consistent-explanations-of |
Repo | https://github.com/benathi/fastswa-semi-sup |
Framework | pytorch |
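The abstract above proposes averaging weights along the SGD trajectory. A minimal sketch of the averaging step alone is shown below, assuming an equal-weight running mean over weight snapshots. The snapshot values are made up, and when snapshots are collected (for example once per learning-rate cycle, or several times per cycle as the abstract describes for fast-SWA) is left outside the sketch.

```python
def update_average(avg_weights, new_weights, num_averaged):
    """Fold one more weight snapshot into an equal-weight running average.

    num_averaged is the number of snapshots already included in avg_weights.
    """
    return [(a * num_averaged + w) / (num_averaged + 1)
            for a, w in zip(avg_weights, new_weights)]


# Hypothetical flattened weight vectors captured at three points in training.
snapshots = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]

swa = list(snapshots[0])
for n, weights in enumerate(snapshots[1:], start=1):
    swa = update_average(swa, weights, n)
```

The running form avoids storing every snapshot: only the current average and a counter need to be kept alongside the model being trained.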
Uncertainty-Aware Attention for Reliable Interpretation and Prediction
Title | Uncertainty-Aware Attention for Reliable Interpretation and Prediction |
Authors | Jay Heo, Hae Beom Lee, Saehoon Kim, Juho Lee, Kwang Joon Kim, Eunho Yang, Sung Ju Hwang |
Abstract | The attention mechanism is effective in both focusing deep learning models on relevant features and interpreting them. However, attentions may be unreliable since the networks that generate them are often trained in a weakly-supervised manner. To overcome this limitation, we introduce the notion of input-dependent uncertainty to the attention mechanism, such that it generates attention for each feature with varying degrees of noise based on the given input, to learn larger variance on instances it is uncertain about. We learn this Uncertainty-aware Attention (UA) mechanism using variational inference, and validate it on various risk prediction tasks from electronic health records on which our model significantly outperforms existing attention models. The analysis of the learned attentions shows that our model generates attentions that comply with clinicians’ interpretation, and provides richer interpretation via learned variance. Further evaluation of both the accuracy of the uncertainty calibration and the prediction performance with an “I don’t know” decision shows that UA yields networks with high reliability as well. |
Tasks | Calibration |
Published | 2018-05-24 |
URL | http://arxiv.org/abs/1805.09653v1 |
PDF | http://arxiv.org/pdf/1805.09653v1.pdf |
PWC | https://paperswithcode.com/paper/uncertainty-aware-attention-for-reliable |
Repo | https://github.com/jayheo/UA |
Framework | tf |
PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning
Title | PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning |
Authors | Yunbo Wang, Zhifeng Gao, Mingsheng Long, Jianmin Wang, Philip S. Yu |
Abstract | We present PredRNN++, an improved recurrent network for video predictive learning. In pursuit of a greater spatiotemporal modeling capability, our approach increases the transition depth between adjacent states by leveraging a novel recurrent unit, which is named Causal LSTM for re-organizing the spatial and temporal memories in a cascaded mechanism. However, there is still a dilemma in video predictive learning: increasingly deep-in-time models have been designed for capturing complex variations, while introducing more difficulties in the gradient back-propagation. To alleviate this undesirable effect, we propose a Gradient Highway architecture, which provides alternative shorter routes for gradient flows from outputs back to long-range inputs. This architecture works seamlessly with causal LSTMs, enabling PredRNN++ to capture short-term and long-term dependencies adaptively. We assess our model on both synthetic and real video datasets, showing its ability to ease the vanishing gradient problem and yield state-of-the-art prediction results even in a difficult objects occlusion scenario. |
Tasks | |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06300v2 |
PDF | http://arxiv.org/pdf/1804.06300v2.pdf |
PWC | https://paperswithcode.com/paper/predrnn-towards-a-resolution-of-the-deep-in |
Repo | https://github.com/Yunbo426/predrnn-pp |
Framework | tf |
Masked Conditional Neural Networks for Audio Classification
Title | Masked Conditional Neural Networks for Audio Classification |
Authors | Fady Medhat, David Chesmore, John Robinson |
Abstract | We present the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN) designed for temporal signal recognition. The CLNN takes into consideration the temporal nature of the sound signal and the MCLNN extends upon the CLNN through a binary mask to preserve the spatial locality of the features and allows an automated exploration of the features combination analogous to hand-crafting the most relevant features for the recognition task. MCLNN has achieved competitive recognition accuracies on the GTZAN and the ISMIR2004 music datasets that surpass several state-of-the-art neural network based architectures and hand-crafted methods applied on both datasets. |
Tasks | Audio Classification |
Published | 2018-03-06 |
URL | http://arxiv.org/abs/1803.02421v2 |
PDF | http://arxiv.org/pdf/1803.02421v2.pdf |
PWC | https://paperswithcode.com/paper/masked-conditional-neural-networks-for-audio |
Repo | https://github.com/fadymedhat/MCLNN |
Framework | tf |
Towards multi-instrument drum transcription
Title | Towards multi-instrument drum transcription |
Authors | Richard Vogl, Gerhard Widmer, Peter Knees |
Abstract | Automatic drum transcription, a subtask of the more general automatic music transcription, deals with extracting drum instrument note onsets from an audio source. Recently, progress in transcription performance has been made using non-negative matrix factorization as well as deep learning methods. However, these works primarily focus on transcribing three drum instruments only: snare drum, bass drum, and hi-hat. Yet, for many applications, the ability to transcribe more drum instruments which make up standard drum kits used in western popular music would be desirable. In this work, convolutional and convolutional recurrent neural networks are trained to transcribe a wider range of drum instruments. First, the shortcomings of publicly available datasets in this context are discussed. To overcome these limitations, a larger synthetic dataset is introduced. Then, methods to train models using the new dataset focusing on generalization to real world data are investigated. Finally, the trained models are evaluated on publicly available datasets and results are discussed. The contributions of this work comprise: (i.) a large-scale synthetic dataset for drum transcription, (ii.) first steps towards an automatic drum transcription system that supports a larger range of instruments by evaluating and discussing training setups and the impact of datasets in this context, and (iii.) a publicly available set of trained models for drum transcription. Additional materials are available at http://ifs.tuwien.ac.at/~vogl/dafx2018 |
Tasks | Drum Transcription |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06676v2 |
PDF | http://arxiv.org/pdf/1806.06676v2.pdf |
PWC | https://paperswithcode.com/paper/towards-multi-instrument-drum-transcription |
Repo | https://github.com/keunwoochoi/DrummerNet |
Framework | pytorch |