October 21, 2019

Paper Group AWR 127

Temporal Gaussian Mixture Layer for Videos

Title Temporal Gaussian Mixture Layer for Videos
Authors AJ Piergiovanni, Michael S. Ryoo
Abstract We introduce a new convolutional layer named the Temporal Gaussian Mixture (TGM) layer and present how it can be used to efficiently capture longer-term temporal information in continuous activity videos. The TGM layer is a temporal convolutional layer governed by a much smaller set of parameters (e.g., location/variance of Gaussians) that are fully differentiable. We present our fully convolutional video models with multiple TGM layers for activity detection. Extensive experiments on multiple datasets, including Charades and MultiTHUMOS, confirm the effectiveness of TGM layers, which significantly outperform the state of the art.
Tasks Action Detection, Activity Detection
Published 2018-03-16
URL https://arxiv.org/abs/1803.06316v6
PDF https://arxiv.org/pdf/1803.06316v6.pdf
PWC https://paperswithcode.com/paper/temporal-gaussian-mixture-layer-for-videos
Repo https://github.com/piergiaj/evanet-iccv19
Framework tf
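
The idea in the TGM abstract, a temporal kernel governed by Gaussian locations and variances rather than by one free weight per time step, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the tanh/exp parameterization, and the softmax mixing are assumptions made for illustration only.

```python
import numpy as np


def tgm_kernels(centers, log_widths, mix_logits, length):
    """Build temporal kernels as soft mixtures of Gaussians (illustrative only).

    centers:    (M,) unconstrained values, squashed onto [0, length - 1]
    log_widths: (M,) log standard deviations of the Gaussians
    mix_logits: (C, M) unnormalized mixing weights, one row per output kernel
    Returns an array of shape (C, length).
    """
    positions = np.arange(length, dtype=float)
    # Hypothetical parameterization: tanh keeps each center inside the kernel.
    mu = (length - 1) * (np.tanh(centers) + 1.0) / 2.0
    sigma = np.exp(log_widths)
    gauss = np.exp(-0.5 * ((positions[None, :] - mu[:, None]) / sigma[:, None]) ** 2)
    # Normalize each Gaussian over the kernel length (assumes sigma is not tiny).
    gauss /= gauss.sum(axis=1, keepdims=True)
    # Softmax over the M Gaussians gives each output kernel its mixing weights.
    weights = np.exp(mix_logits - mix_logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ gauss


def temporal_conv(features, kernel):
    """Apply one temporal kernel to a 1-D feature sequence, keeping its length."""
    return np.convolve(features, kernel, mode="same")


kernels = tgm_kernels(np.zeros(3), np.zeros(3), np.zeros((2, 3)), length=5)
smoothed = temporal_conv(np.ones(8), kernels[0])
```

In this sketch the number of learnable values (the centers, widths, and mixing logits) does not depend on the kernel length, which is one way to read the abstract's "much smaller set of parameters".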

Hyperspherical Variational Auto-Encoders

Title Hyperspherical Variational Auto-Encoders
Authors Tim R. Davidson, Luca Falorsi, Nicola De Cao, Thomas Kipf, Jakub M. Tomczak
Abstract The Variational Auto-Encoder (VAE) is one of the most used unsupervised machine learning models. But although the default choice of a Gaussian distribution for both the prior and posterior represents a mathematically convenient distribution often leading to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments we show how such a hyperspherical VAE, or $\mathcal{S}$-VAE, is more suitable for capturing data with a hyperspherical latent structure, while outperforming a normal, $\mathcal{N}$-VAE, in low dimensions on other data types.
Tasks
Published 2018-04-03
URL http://arxiv.org/abs/1804.00891v2
PDF http://arxiv.org/pdf/1804.00891v2.pdf
PWC https://paperswithcode.com/paper/hyperspherical-variational-auto-encoders
Repo https://github.com/nicola-decao/s-vae
Framework tf

Conditional Linear Regression

Title Conditional Linear Regression
Authors Diego Calderon, Brendan Juba, Sirui Li, Zongyi Li, Lisa Ruan
Abstract Work in machine learning and statistics commonly focuses on building models that capture the vast majority of data, possibly ignoring a segment of the population as outliers. However, there does not often exist a good model on the whole dataset, so we seek to find a small subset where there exists a useful model. We are interested in finding a linear rule capable of achieving more accurate predictions for just a segment of the population. We give an efficient algorithm with theoretical analysis for the conditional linear regression task, which is the joint task of identifying a significant segment of the population, described by a k-DNF, along with its linear regression fit.
Tasks
Published 2018-06-06
URL https://arxiv.org/abs/1806.02326v2
PDF https://arxiv.org/pdf/1806.02326v2.pdf
PWC https://paperswithcode.com/paper/conditional-linear-regression
Repo https://github.com/wumming/lud
Framework none

Efficient Model-Free Reinforcement Learning Using Gaussian Process

Title Efficient Model-Free Reinforcement Learning Using Gaussian Process
Authors Ying Fan, Letian Chen, Yizhou Wang
Abstract Efficient reinforcement learning usually takes advantage of demonstrations or a good exploration strategy. By applying posterior sampling in model-free RL under a GP hypothesis, we propose the Gaussian Process Posterior Sampling Reinforcement Learning (GPPSTD) algorithm for continuous state spaces, giving theoretical justifications and empirical results. We also provide theoretical and empirical results showing that various demonstrations can lower the expected uncertainty and benefit posterior sampling exploration. In this way, we combine the demonstration and exploration processes to achieve more efficient reinforcement learning.
Tasks
Published 2018-12-11
URL http://arxiv.org/abs/1812.04359v1
PDF http://arxiv.org/pdf/1812.04359v1.pdf
PWC https://paperswithcode.com/paper/efficient-model-free-reinforcement-learning
Repo https://github.com/Eunice330/model-based-RL
Framework pytorch

Neural Processes

Title Neural Processes
Authors Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, Yee Whye Teh
Abstract A neural network (NN) is a parameterised function that can be tuned via gradient descent to approximate a labelled collection of data with high precision. A Gaussian process (GP), on the other hand, is a probabilistic model that defines a distribution over possible functions, and is updated in light of data via the rules of probabilistic inference. GPs are probabilistic, data-efficient and flexible, however they are also computationally intensive and thus limited in their applicability. We introduce a class of neural latent variable models which we call Neural Processes (NPs), combining the best of both worlds. Like GPs, NPs define distributions over functions, are capable of rapid adaptation to new observations, and can estimate the uncertainty in their predictions. Like NNs, NPs are computationally efficient during training and evaluation but also learn to adapt their priors to data. We demonstrate the performance of NPs on a range of learning tasks, including regression and optimisation, and compare and contrast with related models in the literature.
Tasks Latent Variable Models
Published 2018-07-04
URL http://arxiv.org/abs/1807.01622v1
PDF http://arxiv.org/pdf/1807.01622v1.pdf
PWC https://paperswithcode.com/paper/neural-processes
Repo https://github.com/Arnaud15/CS236_Neural_Processes_For_Image_Completion
Framework pytorch

Deep Underwater Image Enhancement

Title Deep Underwater Image Enhancement
Authors Saeed Anwar, Chongyi Li, Fatih Porikli
Abstract In an underwater scene, wavelength-dependent light absorption and scattering degrade the visibility of images, causing low contrast and distorted color casts. To address this problem, we propose a convolutional neural network based image enhancement model, i.e., UWCNN, which is trained efficiently using a synthetic underwater image database. Unlike existing works that require estimating the parameters of an underwater imaging model or impose inflexible frameworks applicable only to specific scenes, our model directly reconstructs the clear latent underwater image by leveraging an automatic end-to-end and data-driven training mechanism. Compliant with underwater imaging models and optical properties of underwater scenes, we first synthesize ten different marine image databases. Then, we separately train multiple UWCNN models for each underwater image formation type. Experimental results on real-world and synthetic underwater images demonstrate that the presented method generalizes well on different underwater scenes and outperforms the existing methods both qualitatively and quantitatively. Besides, we conduct an ablation study to demonstrate the effect of each component in our network.
Tasks Image Enhancement
Published 2018-07-10
URL http://arxiv.org/abs/1807.03528v1
PDF http://arxiv.org/pdf/1807.03528v1.pdf
PWC https://paperswithcode.com/paper/deep-underwater-image-enhancement
Repo https://github.com/saeed-anwar/UWCNN
Framework tf

Comparing Dynamics: Deep Neural Networks versus Glassy Systems

Title Comparing Dynamics: Deep Neural Networks versus Glassy Systems
Authors M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, M. Wyart, G. Biroli
Abstract We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular, the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe a typical glassy behavior, thus suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.
Tasks
Published 2018-03-19
URL http://arxiv.org/abs/1803.06969v2
PDF http://arxiv.org/pdf/1803.06969v2.pdf
PWC https://paperswithcode.com/paper/comparing-dynamics-deep-neural-networks-1
Repo https://github.com/mbaityje/DEEP-GLASS
Framework pytorch

End-to-End Speech Recognition From the Raw Waveform

Title End-to-End Speech Recognition From the Raw Waveform
Authors Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux
Abstract State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture. The first one is inspired by gammatone filterbanks (Hoshen et al., 2015; Sainath et al., 2015), and the second one by the scattering transform (Zeghidour et al., 2017). We propose two modifications to these architectures and systematically compare them to mel-filterbanks on the Wall Street Journal dataset. The first modification is the addition of an instance normalization layer, which greatly improves on the gammatone-based trainable filterbanks and speeds up the training of the scattering-based filterbanks. The second one relates to the low-pass filter used in these approaches. These modifications consistently improve performance for both approaches, and remove the need for a careful initialization in scattering-based trainable filterbanks. In particular, we show a consistent improvement in word error rate of the trainable filterbanks relative to comparable mel-filterbanks. It is the first time end-to-end models trained from the raw signal significantly outperform mel-filterbanks on a large vocabulary task under clean recording conditions.
Tasks End-To-End Speech Recognition, Speech Recognition
Published 2018-06-19
URL http://arxiv.org/abs/1806.07098v2
PDF http://arxiv.org/pdf/1806.07098v2.pdf
PWC https://paperswithcode.com/paper/end-to-end-speech-recognition-from-the-raw
Repo https://github.com/renyuanL/ry-Speech-commands
Framework tf

LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration

Title LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration
Authors Gellért Weisz, András György, Csaba Szepesvári
Abstract We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution. The goal of the configurator is to find a configuration that runs fast on average on most instances, and do so with the least amount of total work. It can run a chosen solver on a random instance until the solver finishes or a timeout is reached. We propose LeapsAndBounds, an algorithm that tests configurations on randomly selected problem instances for longer and longer time. We prove that the capped expected runtime of the configuration returned by LeapsAndBounds is close to the optimal expected runtime, while our algorithm’s running time is near-optimal. Our results show that LeapsAndBounds is more efficient than the recent algorithm of Kleinberg et al. (2017), which, to our knowledge, is the only other algorithm configuration method with non-trivial theoretical guarantees. Experimental results on configuring a public SAT solver on a new benchmark dataset also stand witness to the superiority of our method.
Tasks
Published 2018-07-02
URL http://arxiv.org/abs/1807.00755v1
PDF http://arxiv.org/pdf/1807.00755v1.pdf
PWC https://paperswithcode.com/paper/leapsandbounds-a-method-for-approximately
Repo https://github.com/drgrhm/alg_config
Framework none

On the Power of Over-parametrization in Neural Networks with Quadratic Activation

Title On the Power of Over-parametrization in Neural Networks with Quadratic Activation
Authors Simon S. Du, Jason D. Lee
Abstract We provide new theoretical insights on why over-parametrization is effective in learning neural networks. For a shallow network with $k$ hidden nodes, quadratic activation, and $n$ training data points, we show that as long as $k \ge \sqrt{2n}$, over-parametrization enables local search algorithms to find a globally optimal solution for general smooth and convex loss functions. Further, even though the number of parameters may exceed the sample size, using the theory of Rademacher complexity, we show that with weight decay, the solution also generalizes well if the data is sampled from a regular distribution such as a Gaussian. To prove that when $k \ge \sqrt{2n}$ the loss function has benign landscape properties, we adopt an idea from smoothed analysis, which may have other applications in studying loss surfaces of neural networks.
Tasks
Published 2018-03-03
URL http://arxiv.org/abs/1803.01206v2
PDF http://arxiv.org/pdf/1803.01206v2.pdf
PWC https://paperswithcode.com/paper/on-the-power-of-over-parametrization-in-1
Repo https://github.com/Clumsyndicate/One_layer_analysis_network
Framework tf
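
As a small worked example of the width condition $k \ge \sqrt{2n}$ quoted in the abstract above, the sketch below computes the smallest integer $k$ that satisfies it for a given number of training points. The function name is hypothetical, and the code only illustrates the arithmetic of the bound, not the paper's analysis.

```python
import math


def min_hidden_width(n):
    """Smallest integer k with k >= sqrt(2n), i.e. k * k >= 2 * n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    k = math.isqrt(2 * n)  # floor of sqrt(2n), computed exactly on integers
    return k if k * k == 2 * n else k + 1


width_for_50 = min_hidden_width(50)      # 2n = 100
width_for_5000 = min_hidden_width(5000)  # 2n = 10000
```

With n = 50 the condition gives k = 10, and with n = 5000 it gives k = 100, so the required width grows only with the square root of the sample size.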

Clipped Action Policy Gradient

Title Clipped Action Policy Gradient
Authors Yasuhiro Fujita, Shin-ichi Maeda
Abstract Many continuous control tasks have bounded action spaces. When policy gradient methods are applied to such tasks, out-of-bound actions need to be clipped before execution, while policies are usually optimized as if the actions are not clipped. We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that our estimator, named clipped action policy gradient (CAPG), is unbiased and achieves lower variance than the conventional estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the conventional estimator, indicating that it is a better policy gradient estimator for continuous control tasks. The source code is available at https://github.com/pfnet-research/capg.
Tasks Continuous Control, Policy Gradient Methods
Published 2018-02-21
URL http://arxiv.org/abs/1802.07564v2
PDF http://arxiv.org/pdf/1802.07564v2.pdf
PWC https://paperswithcode.com/paper/clipped-action-policy-gradient
Repo https://github.com/pfnet-research/capg
Framework none
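
One way to picture how knowledge of clipping can enter a policy gradient estimator, as the CAPG abstract above describes, is to replace the Gaussian log-density of an out-of-bound action with the log-probability of the whole tail that is clipped to the same bound. The sketch below does this for a scalar Gaussian policy. The function name and this exact form are assumptions for illustration, not taken from the abstract; the authors' repository is the authoritative reference.

```python
import math


def clipped_action_log_prob(action, mean, std, low, high):
    """Log-probability term for a scalar Gaussian policy with clipped actions.

    Assumed form (illustrative): inside the bounds use the Gaussian
    log-density; outside, use the log of the tail mass that clipping
    maps onto the nearest bound.
    """
    def z(x):
        return (x - mean) / std

    if action <= low:
        # log P(u <= low) = log Phi(z(low)), with Phi(x) = 0.5 * erfc(-x / sqrt(2))
        return math.log(0.5 * math.erfc(-z(low) / math.sqrt(2.0)))
    if action >= high:
        # log P(u >= high) = log(1 - Phi(z(high))) = log(0.5 * erfc(z / sqrt(2)))
        return math.log(0.5 * math.erfc(z(high) / math.sqrt(2.0)))
    # Ordinary Gaussian log-density for in-bound actions.
    return -0.5 * z(action) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


inside = clipped_action_log_prob(0.0, mean=0.0, std=1.0, low=-1.0, high=1.0)
below = clipped_action_log_prob(-2.0, mean=0.0, std=1.0, low=-1.0, high=1.0)
above = clipped_action_log_prob(3.0, mean=0.0, std=1.0, low=-1.0, high=1.0)
```

In a policy gradient implementation, the gradient of this term with respect to the policy parameters (here `mean` and `std`) would take the place of the usual gradient of the log-density. Every action below `low` yields the same term, which is the intuition behind the variance reduction the abstract claims.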

Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data

Title Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data
Authors Amjad Almahairi, Sai Rajeswar, Alessandro Sordoni, Philip Bachman, Aaron Courville
Abstract Learning inter-domain mappings from unpaired data can improve performance in structured prediction tasks, such as image segmentation, by reducing the need for paired data. CycleGAN was recently proposed for this problem, but critically assumes the underlying inter-domain mapping is approximately deterministic and one-to-one. This assumption renders the model ineffective for tasks requiring flexible, many-to-many mappings. We propose a new model, called Augmented CycleGAN, which learns many-to-many mappings between domains. We examine Augmented CycleGAN qualitatively and quantitatively on several image datasets.
Tasks Semantic Segmentation, Structured Prediction
Published 2018-02-27
URL http://arxiv.org/abs/1802.10151v2
PDF http://arxiv.org/pdf/1802.10151v2.pdf
PWC https://paperswithcode.com/paper/augmented-cyclegan-learning-many-to-many
Repo https://github.com/NathanDeMaria/AugmentedCycleGAN
Framework pytorch

Learning cross-lingual phonological and orthagraphic adaptations: a case study in improving neural machine translation between low-resource languages

Title Learning cross-lingual phonological and orthagraphic adaptations: a case study in improving neural machine translation between low-resource languages
Authors Saurav Jha, Akhilesh Sudhakar, Anil Kumar Singh
Abstract Out-of-vocabulary (OOV) words can pose serious challenges for machine translation (MT) tasks, and in particular, for low-resource language (LRL) pairs, i.e., language pairs for which few or no parallel corpora exist. Our work adapts variants of seq2seq models to perform transduction of such words from Hindi to Bhojpuri (an LRL instance), learning from a set of cognate pairs built from a bilingual dictionary of Hindi–Bhojpuri words. We demonstrate that our models can be effectively used for language pairs that have limited parallel corpora; our models work at the character level to grasp phonetic and orthographic similarities across multiple types of word adaptations, whether synchronic or diachronic, loan words or cognates. We describe the training aspects of several character level NMT systems that we adapted to this task and characterize their typical errors. Our method improves BLEU score by 6.3 on the Hindi-to-Bhojpuri translation task. Further, we show that such transductions can generalize well to other languages by applying it successfully to Hindi – Bangla cognate pairs. Our work can be seen as an important step in the process of: (i) resolving the OOV words problem arising in MT tasks, (ii) creating effective parallel corpora for resource-constrained languages, and (iii) leveraging the enhanced semantic knowledge captured by word-level embeddings to perform character-level tasks.
Tasks Machine Translation
Published 2018-11-21
URL https://arxiv.org/abs/1811.08816v2
PDF https://arxiv.org/pdf/1811.08816v2.pdf
PWC https://paperswithcode.com/paper/neural-machine-translation-based-word
Repo https://github.com/Saurav0074/nmt-based-word-transduction
Framework tf

Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks

Title Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks
Authors Jianyu Wang, Haichao Zhang
Abstract In this paper, we study fast training of adversarially robust models. From analyses of the state-of-the-art defense method, i.e., multi-step adversarial training, we hypothesize that the gradient magnitude is linked to model robustness. Motivated by this, we propose to perturb both the image and the label during training, which we call Bilateral Adversarial Training (BAT). To generate the adversarial label, we derive a closed-form heuristic solution. To generate the adversarial image, we use a one-step targeted attack with the target label being the most confusing class. In the experiments, we first show that random start and the most confusing target attack effectively prevent the label leaking and gradient masking problems. Then, coupled with the adversarial label part, our model significantly improves the state-of-the-art results. For example, against the PGD100 white-box attack with cross-entropy loss, on CIFAR10, we achieve 63.7% versus 47.2%; on SVHN, we achieve 59.1% versus 42.1%. Finally, the experiment on the very (computationally) challenging ImageNet dataset further demonstrates the effectiveness of our fast method.
Tasks
Published 2018-11-26
URL https://arxiv.org/abs/1811.10716v2
PDF https://arxiv.org/pdf/1811.10716v2.pdf
PWC https://paperswithcode.com/paper/bilateral-adversarial-training-towards-fast
Repo https://github.com/Line290/FeatureAttack
Framework pytorch

Continuous-time Models for Stochastic Optimization Algorithms

Title Continuous-time Models for Stochastic Optimization Algorithms
Authors Antonio Orvieto, Aurelien Lucchi
Abstract We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. We exploit these continuous-time models, together with simple Lyapunov analysis as well as tools from stochastic calculus, in order to derive convergence bounds for various types of non-convex functions. Guided by such analysis, we show that the same Lyapunov arguments hold in discrete-time, leading to matching rates. In addition, we use these models and Ito calculus to infer novel insights on the dynamics of SGD, proving that a decreasing learning rate acts as time warping or, equivalently, as landscape stretching.
Tasks Stochastic Optimization
Published 2018-10-05
URL https://arxiv.org/abs/1810.02565v3
PDF https://arxiv.org/pdf/1810.02565v3.pdf
PWC https://paperswithcode.com/paper/continuous-time-models-for-stochastic
Repo https://github.com/aorvieto/SGD-SVRG-models
Framework none