October 21, 2019

Paper Group AWR 127

Temporal Gaussian Mixture Layer for Videos. Hyperspherical Variational Auto-Encoders. Conditional Linear Regression. Efficient Model-Free Reinforcement Learning Using Gaussian Process. Neural Processes. Deep Underwater Image Enhancement. Comparing Dynamics: Deep Neural Networks versus Glassy Systems. End-to-End Speech Recognition From the Raw Wavef …

Temporal Gaussian Mixture Layer for Videos


Title	Temporal Gaussian Mixture Layer for Videos
Authors	AJ Piergiovanni, Michael S. Ryoo
Abstract	We introduce a new convolutional layer named the Temporal Gaussian Mixture (TGM) layer and show how it can be used to efficiently capture longer-term temporal information in continuous activity videos. The TGM layer is a temporal convolutional layer governed by a much smaller set of parameters (e.g., location/variance of Gaussians) that are fully differentiable. We present our fully convolutional video models with multiple TGM layers for activity detection. Extensive experiments on multiple datasets, including Charades and MultiTHUMOS, confirm the effectiveness of TGM layers, significantly outperforming the state of the art.
Tasks	Action Detection, Activity Detection
Published	2018-03-16
URL	https://arxiv.org/abs/1803.06316v6
PDF	https://arxiv.org/pdf/1803.06316v6.pdf
PWC	https://paperswithcode.com/paper/temporal-gaussian-mixture-layer-for-videos
Repo	https://github.com/piergiaj/evanet-iccv19
Framework	tf
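
The abstract above describes a temporal convolution whose kernel is generated from a few Gaussian parameters rather than learned tap by tap. The following is a minimal sketch of that idea in plain Python, not the authors' implementation: the function names, the softmax mixing of Gaussians, and the example values are assumptions made for illustration.

```python
import math

def gaussian_kernel(center, width, length):
    """One temporal kernel of `length` taps shaped like a Gaussian, normalized to sum to 1."""
    raw = [math.exp(-0.5 * ((t - center) / width) ** 2) for t in range(length)]
    total = sum(raw)
    return [v / total for v in raw]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mixed_kernel(centers, widths, logits, length):
    """Mix several Gaussian kernels with softmax weights (an assumed mixing scheme)."""
    weights = softmax(logits)
    kernels = [gaussian_kernel(c, w, length) for c, w in zip(centers, widths)]
    return [sum(wt * k[t] for wt, k in zip(weights, kernels)) for t in range(length)]

def temporal_filter(signal, kernel):
    """Slide the kernel over a 1-D signal (cross-correlation, no padding)."""
    size = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(size))
        for i in range(len(signal) - size + 1)
    ]

# A 15-tap kernel described by only 6 numbers: 2 centers, 2 widths, 2 mixing logits.
kernel = mixed_kernel(centers=[2.0, 10.0], widths=[1.0, 3.0], logits=[0.0, 0.0], length=15)
filtered = temporal_filter([1.0] * 40, kernel)
```

In this sketch the number of free parameters depends on the number of Gaussians, not on the kernel length, which is the property the abstract attributes to the TGM layer.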

Hyperspherical Variational Auto-Encoders


Title	Hyperspherical Variational Auto-Encoders
Authors	Tim R. Davidson, Luca Falorsi, Nicola De Cao, Thomas Kipf, Jakub M. Tomczak
Abstract	The Variational Auto-Encoder (VAE) is one of the most used unsupervised machine learning models. But although the default choice of a Gaussian distribution for both the prior and posterior represents a mathematically convenient distribution often leading to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments we show how such a hyperspherical VAE, or $\mathcal{S}$-VAE, is more suitable for capturing data with a hyperspherical latent structure, while outperforming a normal, $\mathcal{N}$-VAE, in low dimensions on other data types.
Tasks
Published	2018-04-03
URL	http://arxiv.org/abs/1804.00891v2
PDF	http://arxiv.org/pdf/1804.00891v2.pdf
PWC	https://paperswithcode.com/paper/hyperspherical-variational-auto-encoders
Repo	https://github.com/nicola-decao/s-vae
Framework	tf

Conditional Linear Regression


Title	Conditional Linear Regression
Authors	Diego Calderon, Brendan Juba, Sirui Li, Zongyi Li, Lisa Ruan
Abstract	Work in machine learning and statistics commonly focuses on building models that capture the vast majority of data, possibly ignoring a segment of the population as outliers. However, there does not often exist a good model on the whole dataset, so we seek to find a small subset where there exists a useful model. We are interested in finding a linear rule capable of achieving more accurate predictions for just a segment of the population. We give an efficient algorithm with theoretical analysis for the conditional linear regression task, which is the joint task of identifying a significant segment of the population, described by a k-DNF, along with its linear regression fit.
Tasks
Published	2018-06-06
URL	https://arxiv.org/abs/1806.02326v2
PDF	https://arxiv.org/pdf/1806.02326v2.pdf
PWC	https://paperswithcode.com/paper/conditional-linear-regression
Repo	https://github.com/wumming/lud
Framework	none
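
To make the objects in the abstract above concrete, here is a small sketch of what a conditional linear regression solution looks like: a k-DNF over Boolean attributes that selects a segment, plus a line fitted only on that segment. It does not implement the paper's search algorithm; the data layout, function names, and toy rows are hypothetical.

```python
def satisfies_dnf(terms, attrs):
    """A DNF is an OR of terms; each term is an AND of (attribute_index, wanted_value) literals."""
    return any(all(attrs[i] == wanted for i, wanted in term) for term in terms)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form, one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_on_segment(terms, rows):
    """Keep only rows whose Boolean attributes satisfy the DNF, then fit a line on them."""
    segment = [(x, y) for attrs, x, y in rows if satisfies_dnf(terms, attrs)]
    xs = [x for x, _ in segment]
    ys = [y for _, y in segment]
    return len(segment), fit_line(xs, ys)

# Toy rows: (Boolean attributes, feature x, target y). Only rows with both
# attributes set follow y = 3x; the others behave like outliers.
rows = [
    ((True, True), 1.0, 3.0),
    ((True, True), 2.0, 6.0),
    ((True, False), 3.0, 100.0),
    ((False, True), 4.0, -7.0),
    ((True, True), 5.0, 15.0),
]
condition = [[(0, True), (1, True)]]  # a single term with 2 literals, i.e. a 2-DNF
segment_size, (slope, intercept) = fit_on_segment(condition, rows)
```

On these toy rows the fit over the selected segment recovers the line y = 3x, whereas a fit over all rows would be dominated by the two outlying points.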

Efficient Model-Free Reinforcement Learning Using Gaussian Process


Title	Efficient Model-Free Reinforcement Learning Using Gaussian Process
Authors	Ying Fan, Letian Chen, Yizhou Wang
Abstract	Efficient reinforcement learning usually takes advantage of demonstrations or a good exploration strategy. By applying posterior sampling in model-free RL under a GP hypothesis, we propose the Gaussian Process Posterior Sampling Reinforcement Learning (GPPSTD) algorithm for continuous state spaces, giving theoretical justifications and empirical results. We also provide theoretical and empirical results showing that varied demonstrations can lower expected uncertainty and benefit posterior sampling exploration. In this way, we combine the demonstration and exploration processes to achieve more efficient reinforcement learning.
Tasks
Published	2018-12-11
URL	http://arxiv.org/abs/1812.04359v1
PDF	http://arxiv.org/pdf/1812.04359v1.pdf
PWC	https://paperswithcode.com/paper/efficient-model-free-reinforcement-learning
Repo	https://github.com/Eunice330/model-based-RL
Framework	pytorch

Neural Processes


Title	Neural Processes
Authors	Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, Yee Whye Teh
Abstract	A neural network (NN) is a parameterised function that can be tuned via gradient descent to approximate a labelled collection of data with high precision. A Gaussian process (GP), on the other hand, is a probabilistic model that defines a distribution over possible functions, and is updated in light of data via the rules of probabilistic inference. GPs are probabilistic, data-efficient and flexible; however, they are also computationally intensive and thus limited in their applicability. We introduce a class of neural latent variable models which we call Neural Processes (NPs), combining the best of both worlds. Like GPs, NPs define distributions over functions, are capable of rapid adaptation to new observations, and can estimate the uncertainty in their predictions. Like NNs, NPs are computationally efficient during training and evaluation but also learn to adapt their priors to data. We demonstrate the performance of NPs on a range of learning tasks, including regression and optimisation, and compare and contrast with related models in the literature.
Tasks	Latent Variable Models
Published	2018-07-04
URL	http://arxiv.org/abs/1807.01622v1
PDF	http://arxiv.org/pdf/1807.01622v1.pdf
PWC	https://paperswithcode.com/paper/neural-processes
Repo	https://github.com/Arnaud15/CS236_Neural_Processes_For_Image_Completion
Framework	pytorch

Deep Underwater Image Enhancement


Title	Deep Underwater Image Enhancement
Authors	Saeed Anwar, Chongyi Li, Fatih Porikli
Abstract	In an underwater scene, wavelength-dependent light absorption and scattering degrade the visibility of images, causing low contrast and distorted color casts. To address this problem, we propose a convolutional neural network based image enhancement model, i.e., UWCNN, which is trained efficiently using a synthetic underwater image database. Unlike existing works that require estimating the parameters of an underwater imaging model or impose inflexible frameworks applicable only to specific scenes, our model directly reconstructs the clear latent underwater image by leveraging an automatic end-to-end and data-driven training mechanism. Compliant with underwater imaging models and optical properties of underwater scenes, we first synthesize ten different marine image databases. Then, we separately train multiple UWCNN models for each underwater image formation type. Experimental results on real-world and synthetic underwater images demonstrate that the presented method generalizes well on different underwater scenes and outperforms the existing methods both qualitatively and quantitatively. Besides, we conduct an ablation study to demonstrate the effect of each component in our network.
Tasks	Image Enhancement
Published	2018-07-10
URL	http://arxiv.org/abs/1807.03528v1
PDF	http://arxiv.org/pdf/1807.03528v1.pdf
PWC	https://paperswithcode.com/paper/deep-underwater-image-enhancement
Repo	https://github.com/saeed-anwar/UWCNN
Framework	tf

Comparing Dynamics: Deep Neural Networks versus Glassy Systems


Title	Comparing Dynamics: Deep Neural Networks versus Glassy Systems
Authors	M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, M. Wyart, G. Biroli
Abstract	We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular, the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe a typical glassy behavior, thus suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.
Tasks
Published	2018-03-19
URL	http://arxiv.org/abs/1803.06969v2
PDF	http://arxiv.org/pdf/1803.06969v2.pdf
PWC	https://paperswithcode.com/paper/comparing-dynamics-deep-neural-networks-1
Repo	https://github.com/mbaityje/DEEP-GLASS
Framework	pytorch

End-to-End Speech Recognition From the Raw Waveform


Title	End-to-End Speech Recognition From the Raw Waveform
Authors	Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux
Abstract	State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture. The first one is inspired by gammatone filterbanks (Hoshen et al., 2015; Sainath et al., 2015), and the second one by the scattering transform (Zeghidour et al., 2017). We propose two modifications to these architectures and systematically compare them to mel-filterbanks on the Wall Street Journal dataset. The first modification is the addition of an instance normalization layer, which greatly improves on the gammatone-based trainable filterbanks and speeds up the training of the scattering-based filterbanks. The second one relates to the low-pass filter used in these approaches. These modifications consistently improve performance for both approaches, and remove the need for a careful initialization in scattering-based trainable filterbanks. In particular, we show a consistent improvement in word error rate of the trainable filterbanks relative to comparable mel-filterbanks. It is the first time end-to-end models trained from the raw signal significantly outperform mel-filterbanks on a large vocabulary task under clean recording conditions.
Tasks	End-To-End Speech Recognition, Speech Recognition
Published	2018-06-19
URL	http://arxiv.org/abs/1806.07098v2
PDF	http://arxiv.org/pdf/1806.07098v2.pdf
PWC	https://paperswithcode.com/paper/end-to-end-speech-recognition-from-the-raw
Repo	https://github.com/renyuanL/ry-Speech-commands
Framework	tf

LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration


Title	LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration
Authors	Gellért Weisz, András György, Csaba Szepesvári
Abstract	We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution. The goal of the configurator is to find a configuration that runs fast on average on most instances, and do so with the least amount of total work. It can run a chosen solver on a random instance until the solver finishes or a timeout is reached. We propose LeapsAndBounds, an algorithm that tests configurations on randomly selected problem instances for longer and longer time. We prove that the capped expected runtime of the configuration returned by LeapsAndBounds is close to the optimal expected runtime, while our algorithm’s running time is near-optimal. Our results show that LeapsAndBounds is more efficient than the recent algorithm of Kleinberg et al. (2017), which, to our knowledge, is the only other algorithm configuration method with non-trivial theoretical guarantees. Experimental results on configuring a public SAT solver on a new benchmark dataset also stand witness to the superiority of our method.
Tasks
Published	2018-07-02
URL	http://arxiv.org/abs/1807.00755v1
PDF	http://arxiv.org/pdf/1807.00755v1.pdf
PWC	https://paperswithcode.com/paper/leapsandbounds-a-method-for-approximately
Repo	https://github.com/drgrhm/alg_config
Framework	none

On the Power of Over-parametrization in Neural Networks with Quadratic Activation


Title	On the Power of Over-parametrization in Neural Networks with Quadratic Activation
Authors	Simon S. Du, Jason D. Lee
Abstract	We provide new theoretical insights on why over-parametrization is effective in learning neural networks. For a shallow network with $k$ hidden nodes, quadratic activation, and $n$ training data points, we show that as long as $k \ge \sqrt{2n}$, over-parametrization enables local search algorithms to find a globally optimal solution for general smooth and convex loss functions. Further, although the number of parameters may exceed the sample size, using the theory of Rademacher complexity, we show that, with weight decay, the solution also generalizes well if the data is sampled from a regular distribution such as a Gaussian. To prove that the loss function has benign landscape properties when $k \ge \sqrt{2n}$, we adopt an idea from smoothed analysis, which may have other applications in studying loss surfaces of neural networks.
Tasks
Published	2018-03-03
URL	http://arxiv.org/abs/1803.01206v2
PDF	http://arxiv.org/pdf/1803.01206v2.pdf
PWC	https://paperswithcode.com/paper/on-the-power-of-over-parametrization-in-1
Repo	https://github.com/Clumsyndicate/One_layer_analysis_network
Framework	tf
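
As a quick worked example of the width condition $k \ge \sqrt{2n}$ quoted in the abstract above, the snippet below computes the smallest integer hidden width that satisfies it for a given number of training points. The function name and the sample sizes are chosen here purely for illustration.

```python
import math

def min_hidden_width(n):
    """Smallest integer k with k >= sqrt(2n), computed exactly with integer arithmetic."""
    k = math.isqrt(2 * n)  # floor of sqrt(2n)
    return k if k * k == 2 * n else k + 1

for n in (2, 1_000, 50_000):
    print(n, min_hidden_width(n))
```

The required width grows only with the square root of the sample size, so under this condition tens of thousands of training points call for a few hundred hidden nodes.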

Clipped Action Policy Gradient


Title	Clipped Action Policy Gradient
Authors	Yasuhiro Fujita, Shin-ichi Maeda
Abstract	Many continuous control tasks have bounded action spaces. When policy gradient methods are applied to such tasks, out-of-bound actions need to be clipped before execution, while policies are usually optimized as if the actions are not clipped. We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that our estimator, named clipped action policy gradient (CAPG), is unbiased and achieves lower variance than the conventional estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the conventional estimator, indicating that it is a better policy gradient estimator for continuous control tasks. The source code is available at https://github.com/pfnet-research/capg.
Tasks	Continuous Control, Policy Gradient Methods
Published	2018-02-21
URL	http://arxiv.org/abs/1802.07564v2
PDF	http://arxiv.org/pdf/1802.07564v2.pdf
PWC	https://paperswithcode.com/paper/clipped-action-policy-gradient
Repo	https://github.com/pfnet-research/capg
Framework	none
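
The idea in the abstract above, using the knowledge that out-of-bound actions are clipped, can be illustrated for a one-dimensional Gaussian policy. This is a hedged sketch rather than the code from the linked repository: it assumes that every sampled action at or beyond a bound is scored by the log of the total probability mass beyond that bound, while in-bound actions keep the usual log-density. All function names are hypothetical.

```python
import math

def normal_logpdf(u, mu, sigma):
    """Log-density of N(mu, sigma^2) at u."""
    z = (u - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def lower_tail(x, mu, sigma):
    """P(U <= x) for U ~ N(mu, sigma^2), written with erfc."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))

def upper_tail(x, mu, sigma):
    """P(U >= x) for U ~ N(mu, sigma^2), written with erfc."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def clipped_action_logprob(u, mu, sigma, low, high):
    """Log-probability term that accounts for clipping of u to [low, high].

    Assumption for this sketch: all samples below `low` are executed as `low`,
    so they share the log of the mass below `low` (likewise above `high`).
    Very distant bounds can underflow the tail mass to 0; that is not handled here.
    """
    if u <= low:
        return math.log(lower_tail(low, mu, sigma))
    if u >= high:
        return math.log(upper_tail(high, mu, sigma))
    return normal_logpdf(u, mu, sigma)

# Two different out-of-bound samples receive the same score, because both
# are executed as the same clipped action.
a = clipped_action_logprob(-3.0, mu=0.0, sigma=1.0, low=-1.0, high=1.0)
b = clipped_action_logprob(-5.0, mu=0.0, sigma=1.0, low=-1.0, high=1.0)
```

Differentiating this term with respect to the policy parameters, instead of the plain log-density, is what removes the dependence on how far outside the bounds a sample happened to land.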

Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data


Title	Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data
Authors	Amjad Almahairi, Sai Rajeswar, Alessandro Sordoni, Philip Bachman, Aaron Courville
Abstract	Learning inter-domain mappings from unpaired data can improve performance in structured prediction tasks, such as image segmentation, by reducing the need for paired data. CycleGAN was recently proposed for this problem, but critically assumes the underlying inter-domain mapping is approximately deterministic and one-to-one. This assumption renders the model ineffective for tasks requiring flexible, many-to-many mappings. We propose a new model, called Augmented CycleGAN, which learns many-to-many mappings between domains. We examine Augmented CycleGAN qualitatively and quantitatively on several image datasets.
Tasks	Semantic Segmentation, Structured Prediction
Published	2018-02-27
URL	http://arxiv.org/abs/1802.10151v2
PDF	http://arxiv.org/pdf/1802.10151v2.pdf
PWC	https://paperswithcode.com/paper/augmented-cyclegan-learning-many-to-many
Repo	https://github.com/NathanDeMaria/AugmentedCycleGAN
Framework	pytorch

Learning cross-lingual phonological and orthagraphic adaptations: a case study in improving neural machine translation between low-resource languages


Title	Learning cross-lingual phonological and orthagraphic adaptations: a case study in improving neural machine translation between low-resource languages
Authors	Saurav Jha, Akhilesh Sudhakar, Anil Kumar Singh
Abstract	Out-of-vocabulary (OOV) words can pose serious challenges for machine translation (MT) tasks, and in particular, for low-resource language (LRL) pairs, i.e., language pairs for which few or no parallel corpora exist. Our work adapts variants of seq2seq models to perform transduction of such words from Hindi to Bhojpuri (an LRL instance), learning from a set of cognate pairs built from a bilingual dictionary of Hindi–Bhojpuri words. We demonstrate that our models can be effectively used for language pairs that have limited parallel corpora; our models work at the character level to grasp phonetic and orthographic similarities across multiple types of word adaptations, whether synchronic or diachronic, loan words or cognates. We describe the training aspects of several character-level NMT systems that we adapted to this task and characterize their typical errors. Our method improves BLEU score by 6.3 on the Hindi-to-Bhojpuri translation task. Further, we show that such transductions can generalize well to other languages by applying them successfully to Hindi–Bangla cognate pairs. Our work can be seen as an important step in the process of: (i) resolving the OOV words problem arising in MT tasks, (ii) creating effective parallel corpora for resource-constrained languages, and (iii) leveraging the enhanced semantic knowledge captured by word-level embeddings to perform character-level tasks.
Tasks	Machine Translation
Published	2018-11-21
URL	https://arxiv.org/abs/1811.08816v2
PDF	https://arxiv.org/pdf/1811.08816v2.pdf
PWC	https://paperswithcode.com/paper/neural-machine-translation-based-word
Repo	https://github.com/Saurav0074/nmt-based-word-transduction
Framework	tf

Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks


Title	Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks
Authors	Jianyu Wang, Haichao Zhang
Abstract	In this paper, we study fast training of adversarially robust models. From the analyses of the state-of-the-art defense method, i.e., multi-step adversarial training, we hypothesize that the gradient magnitude is linked to model robustness. Motivated by this, we propose to perturb both the image and the label during training, which we call Bilateral Adversarial Training (BAT). To generate the adversarial label, we derive a closed-form heuristic solution. To generate the adversarial image, we use a one-step targeted attack with the target label being the most confusing class. In the experiments, we first show that random start and the most confusing target attack effectively prevent the label leaking and gradient masking problems. Then, coupled with the adversarial label part, our model significantly improves the state-of-the-art results. For example, against the PGD100 white-box attack with cross-entropy loss, on CIFAR10, we achieve 63.7% versus 47.2%; on SVHN, we achieve 59.1% versus 42.1%. Finally, the experiment on the very (computationally) challenging ImageNet dataset further demonstrates the effectiveness of our fast method.
Tasks
Published	2018-11-26
URL	https://arxiv.org/abs/1811.10716v2
PDF	https://arxiv.org/pdf/1811.10716v2.pdf
PWC	https://paperswithcode.com/paper/bilateral-adversarial-training-towards-fast
Repo	https://github.com/Line290/FeatureAttack
Framework	pytorch

Continuous-time Models for Stochastic Optimization Algorithms


Title	Continuous-time Models for Stochastic Optimization Algorithms
Authors	Antonio Orvieto, Aurelien Lucchi
Abstract	We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. We exploit these continuous-time models, together with simple Lyapunov analysis as well as tools from stochastic calculus, in order to derive convergence bounds for various types of non-convex functions. Guided by such analysis, we show that the same Lyapunov arguments hold in discrete-time, leading to matching rates. In addition, we use these models and Ito calculus to infer novel insights on the dynamics of SGD, proving that a decreasing learning rate acts as time warping or, equivalently, as landscape stretching.
Tasks	Stochastic Optimization
Published	2018-10-05
URL	https://arxiv.org/abs/1810.02565v3
PDF	https://arxiv.org/pdf/1810.02565v3.pdf
PWC	https://paperswithcode.com/paper/continuous-time-models-for-stochastic
Repo	https://github.com/aorvieto/SGD-SVRG-models
Framework	none