Paper Group AWR 127
Temporal Gaussian Mixture Layer for Videos. Hyperspherical Variational Auto-Encoders. Conditional Linear Regression. Efficient Model-Free Reinforcement Learning Using Gaussian Process. Neural Processes. Deep Underwater Image Enhancement. Comparing Dynamics: Deep Neural Networks versus Glassy Systems. End-to-End Speech Recognition From the Raw Wavef …
Temporal Gaussian Mixture Layer for Videos
Title | Temporal Gaussian Mixture Layer for Videos |
Authors | AJ Piergiovanni, Michael S. Ryoo |
Abstract | We introduce a new convolutional layer named the Temporal Gaussian Mixture (TGM) layer and show how it can be used to efficiently capture longer-term temporal information in continuous activity videos. The TGM layer is a temporal convolutional layer governed by a much smaller set of fully differentiable parameters (e.g., the locations/variances of Gaussians). We present our fully convolutional video models with multiple TGM layers for activity detection. Extensive experiments on multiple datasets, including Charades and MultiTHUMOS, confirm the effectiveness of TGM layers, significantly outperforming the state of the art. |
Tasks | Action Detection, Activity Detection |
Published | 2018-03-16 |
URL | https://arxiv.org/abs/1803.06316v6 |
PDF | https://arxiv.org/pdf/1803.06316v6.pdf |
PWC | https://paperswithcode.com/paper/temporal-gaussian-mixture-layer-for-videos |
Repo | https://github.com/piergiaj/evanet-iccv19 |
Framework | tf |
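The TGM layer described above builds temporal convolution kernels from a small set of learnable Gaussian parameters. Below is a minimal PyTorch sketch of that idea (not the authors' implementation; the single input channel, kernel length, and mixing scheme are illustrative assumptions):

```python
# Minimal sketch of a Temporal Gaussian Mixture (TGM) layer: temporal conv
# kernels are mixtures of learnable 1-D Gaussians (locations and widths).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TGMLayer(nn.Module):
    def __init__(self, n_gaussians=8, n_out=16, kernel_length=31):
        super().__init__()
        self.kernel_length = kernel_length
        # Learnable centers (in [-1, 1]) and widths of the Gaussians.
        self.centers = nn.Parameter(torch.linspace(-0.8, 0.8, n_gaussians))
        self.log_sigmas = nn.Parameter(torch.zeros(n_gaussians))
        # Each output channel mixes the Gaussians via a softmax.
        self.mix_logits = nn.Parameter(torch.randn(n_out, n_gaussians))

    def forward(self, x):                                   # x: (batch, 1, time)
        t = torch.linspace(-1.0, 1.0, self.kernel_length, device=x.device)
        sigmas = F.softplus(self.log_sigmas) + 1e-3
        # (n_gaussians, kernel_length) normalized Gaussian kernels
        g = torch.exp(-0.5 * ((t[None, :] - self.centers[:, None]) / sigmas[:, None]) ** 2)
        g = g / g.sum(dim=1, keepdim=True)
        # Mix Gaussians into one temporal kernel per output channel.
        kernels = F.softmax(self.mix_logits, dim=1) @ g     # (n_out, kernel_length)
        return F.conv1d(x, kernels.unsqueeze(1), padding=self.kernel_length // 2)

feat = torch.randn(2, 1, 128)      # per-frame features collapsed to one channel
print(TGMLayer()(feat).shape)      # torch.Size([2, 16, 128])
```

Because only the Gaussian centers, widths, and mixing weights are learned, such a layer covers a long temporal extent with far fewer parameters than a dense temporal convolution.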
Hyperspherical Variational Auto-Encoders
Title | Hyperspherical Variational Auto-Encoders |
Authors | Tim R. Davidson, Luca Falorsi, Nicola De Cao, Thomas Kipf, Jakub M. Tomczak |
Abstract | The Variational Auto-Encoder (VAE) is one of the most used unsupervised machine learning models. But although the default choice of a Gaussian distribution for both the prior and posterior represents a mathematically convenient distribution often leading to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments we show how such a hyperspherical VAE, or $\mathcal{S}$-VAE, is more suitable for capturing data with a hyperspherical latent structure, while outperforming a normal, $\mathcal{N}$-VAE, in low dimensions on other data types. |
Tasks | |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.00891v2 |
PDF | http://arxiv.org/pdf/1804.00891v2.pdf |
PWC | https://paperswithcode.com/paper/hyperspherical-variational-auto-encoders |
Repo | https://github.com/nicola-decao/s-vae |
Framework | tf |
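The S-VAE replaces the Gaussian posterior with a von Mises-Fisher (vMF) distribution on the hypersphere. As a small illustration of the distribution involved (not the paper's code; the function name is ours), the vMF log-density can be evaluated with SciPy's modified Bessel functions:

```python
# vMF log-density on the (m-1)-sphere: f(x; mu, kappa) = C_m(kappa) exp(kappa mu^T x)
import numpy as np
from scipy.special import ive   # exponentially scaled modified Bessel function

def vmf_log_density(x, mu, kappa):
    """mu is a unit mean direction, kappa > 0 a concentration; x is a unit vector."""
    m = len(x)
    # log C_m(kappa) = (m/2 - 1) log kappa - (m/2) log(2 pi) - log I_{m/2-1}(kappa),
    # with log I_v(kappa) = log ive(v, kappa) + kappa (ive is the scaled Bessel fn).
    log_norm = ((m / 2 - 1) * np.log(kappa)
                - (m / 2) * np.log(2 * np.pi)
                - (np.log(ive(m / 2 - 1, kappa)) + kappa))
    return log_norm + kappa * float(mu @ x)

mu = np.array([0.0, 0.0, 1.0])                  # mean direction on S^2
x = np.array([0.0, 0.6, 0.8])                   # already unit-norm
print(vmf_log_density(x, mu, kappa=5.0))
```

In an S-VAE the encoder outputs such a mean direction and concentration instead of a Gaussian mean and covariance; sampling and the KL term against the uniform distribution on the sphere require additional machinery not shown here.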
Conditional Linear Regression
Title | Conditional Linear Regression |
Authors | Diego Calderon, Brendan Juba, Sirui Li, Zongyi Li, Lisa Ruan |
Abstract | Work in machine learning and statistics commonly focuses on building models that capture the vast majority of data, possibly ignoring a segment of the population as outliers. However, there does not often exist a good model on the whole dataset, so we seek to find a small subset where there exists a useful model. We are interested in finding a linear rule capable of achieving more accurate predictions for just a segment of the population. We give an efficient algorithm with theoretical analysis for the conditional linear regression task, which is the joint task of identifying a significant segment of the population, described by a k-DNF, along with its linear regression fit. |
Tasks | |
Published | 2018-06-06 |
URL | https://arxiv.org/abs/1806.02326v2 |
PDF | https://arxiv.org/pdf/1806.02326v2.pdf |
PWC | https://paperswithcode.com/paper/conditional-linear-regression |
Repo | https://github.com/wumming/lud |
Framework | none |
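The conditional linear regression task pairs a segment description with a linear fit. The toy brute-force sketch below conveys the flavour only: it searches single conjunctions rather than the k-DNFs handled by the paper's efficient algorithm, and it carries none of the paper's guarantees:

```python
# Toy conditional linear regression: pick a conjunction of boolean conditions
# whose segment admits the best ordinary-least-squares fit, subject to coverage.
import itertools
import numpy as np

def conditional_lr(X, y, conditions, min_coverage=0.2, max_terms=2):
    """conditions: boolean matrix (n_samples, n_conditions)."""
    n = len(y)
    best = None
    for k in range(1, max_terms + 1):
        for idx in itertools.combinations(range(conditions.shape[1]), k):
            mask = conditions[:, idx].all(axis=1)          # conjunction of literals
            if mask.sum() < min_coverage * n:
                continue
            w, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
            mse = np.mean((X[mask] @ w - y[mask]) ** 2)
            if best is None or mse < best[0]:
                best = (mse, idx, w)
    return best  # (segment MSE, selected conditions, linear fit)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
cond = rng.random((500, 4)) < 0.5
# Only the segment where condition 0 holds follows a linear rule.
y = np.where(cond[:, 0], X @ np.array([1.0, -2.0, 0.5]), rng.normal(size=500))
print(conditional_lr(X, y, cond))
```

Searching richer segment descriptions efficiently, with provable guarantees, is exactly what the paper's algorithm addresses.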
Efficient Model-Free Reinforcement Learning Using Gaussian Process
Title | Efficient Model-Free Reinforcement Learning Using Gaussian Process |
Authors | Ying Fan, Letian Chen, Yizhou Wang |
Abstract | Efficient reinforcement learning usually takes advantage of demonstrations or a good exploration strategy. By applying posterior sampling in model-free RL under a Gaussian process (GP) hypothesis, we propose the Gaussian Process Posterior Sampling Reinforcement Learning (GPPSTD) algorithm for continuous state spaces, and provide theoretical justifications and empirical results. We also provide theoretical and empirical evidence that diverse demonstrations can lower expected uncertainty and benefit posterior-sampling exploration. In this way, we combine demonstration and exploration to achieve more efficient reinforcement learning. |
Tasks | |
Published | 2018-12-11 |
URL | http://arxiv.org/abs/1812.04359v1 |
PDF | http://arxiv.org/pdf/1812.04359v1.pdf |
PWC | https://paperswithcode.com/paper/efficient-model-free-reinforcement-learning |
Repo | https://github.com/Eunice330/model-based-RL |
Framework | pytorch |
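GPPSTD combines a GP posterior with posterior-sampling (Thompson-style) exploration. The hedged toy below shows that flavour on a one-step bandit using scikit-learn; it is not the paper's algorithm:

```python
# Posterior-sampling exploration with a GP: fit a GP to observed returns,
# sample one function from its posterior, and act greedily on the sample.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
true_value = lambda a: np.sin(3 * a)                 # unknown action-value function
actions = np.linspace(-1, 1, 200).reshape(-1, 1)

X_obs, y_obs = [], []
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-2)
for step in range(30):
    if X_obs:
        gp.fit(np.array(X_obs), np.array(y_obs))
        sampled = gp.sample_y(actions, random_state=step).ravel()  # posterior sample
        a = actions[np.argmax(sampled)]              # Thompson-sampling action
    else:
        a = actions[rng.integers(len(actions))]      # first action is random
    r = true_value(a[0]) + 0.1 * rng.normal()        # noisy observed return
    X_obs.append(a); y_obs.append(r)

print("best action found:", X_obs[int(np.argmax(y_obs))])
```

Seeding the observation set with demonstrations, as the abstract suggests, would shrink the posterior early and reduce how much exploration the sampler needs.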
Neural Processes
Title | Neural Processes |
Authors | Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, Yee Whye Teh |
Abstract | A neural network (NN) is a parameterised function that can be tuned via gradient descent to approximate a labelled collection of data with high precision. A Gaussian process (GP), on the other hand, is a probabilistic model that defines a distribution over possible functions, and is updated in light of data via the rules of probabilistic inference. GPs are probabilistic, data-efficient and flexible, however they are also computationally intensive and thus limited in their applicability. We introduce a class of neural latent variable models which we call Neural Processes (NPs), combining the best of both worlds. Like GPs, NPs define distributions over functions, are capable of rapid adaptation to new observations, and can estimate the uncertainty in their predictions. Like NNs, NPs are computationally efficient during training and evaluation but also learn to adapt their priors to data. We demonstrate the performance of NPs on a range of learning tasks, including regression and optimisation, and compare and contrast with related models in the literature. |
Tasks | Latent Variable Models |
Published | 2018-07-04 |
URL | http://arxiv.org/abs/1807.01622v1 |
PDF | http://arxiv.org/pdf/1807.01622v1.pdf |
PWC | https://paperswithcode.com/paper/neural-processes |
Repo | https://github.com/Arnaud15/CS236_Neural_Processes_For_Image_Completion |
Framework | pytorch |
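A Neural Process encodes context (x, y) pairs into a summary representation and decodes predictive distributions at target inputs. The PyTorch sketch below implements only the deterministic (conditional) variant with illustrative sizes; the paper's full model adds a latent path:

```python
# Conditional Neural Process sketch: mean-aggregate encoded context pairs,
# then decode a Gaussian prediction at each target input.
import torch
import torch.nn as nn

class TinyCNP(nn.Module):
    def __init__(self, x_dim=1, y_dim=1, r_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim + y_dim, 64), nn.ReLU(),
                                     nn.Linear(64, r_dim))
        self.decoder = nn.Sequential(nn.Linear(r_dim + x_dim, 64), nn.ReLU(),
                                     nn.Linear(64, 2 * y_dim))  # mean and pre-scale

    def forward(self, x_ctx, y_ctx, x_tgt):
        r = self.encoder(torch.cat([x_ctx, y_ctx], dim=-1)).mean(dim=1, keepdim=True)
        r = r.expand(-1, x_tgt.size(1), -1)                    # one r per target point
        out = self.decoder(torch.cat([r, x_tgt], dim=-1))
        mu, pre_sigma = out.chunk(2, dim=-1)
        return mu, 0.01 + nn.functional.softplus(pre_sigma)

# One training step on a random sine task (negative log-likelihood of targets).
model = TinyCNP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(8, 20, 1) * 6 - 3
y = torch.sin(x) + 0.05 * torch.randn_like(x)
mu, sigma = model(x[:, :10], y[:, :10], x)                     # 10 context, 20 targets
loss = -torch.distributions.Normal(mu, sigma).log_prob(y).mean()
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```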
Deep Underwater Image Enhancement
Title | Deep Underwater Image Enhancement |
Authors | Saeed Anwar, Chongyi Li, Fatih Porikli |
Abstract | In an underwater scene, wavelength-dependent light absorption and scattering degrade the visibility of images, causing low contrast and distorted color casts. To address this problem, we propose a convolutional neural network based image enhancement model, i.e., UWCNN, which is trained efficiently using a synthetic underwater image database. Unlike existing works that require estimating the parameters of an underwater imaging model or that impose inflexible frameworks applicable only to specific scenes, our model directly reconstructs the clear latent underwater image by leveraging an automatic end-to-end, data-driven training mechanism. Compliant with underwater imaging models and the optical properties of underwater scenes, we first synthesize ten different marine image databases. Then, we separately train multiple UWCNN models for each underwater image formation type. Experimental results on real-world and synthetic underwater images demonstrate that the presented method generalizes well to different underwater scenes and outperforms existing methods both qualitatively and quantitatively. In addition, we conduct an ablation study to demonstrate the effect of each component of our network. |
Tasks | Image Enhancement |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03528v1 |
PDF | http://arxiv.org/pdf/1807.03528v1.pdf |
PWC | https://paperswithcode.com/paper/deep-underwater-image-enhancement |
Repo | https://github.com/saeed-anwar/UWCNN |
Framework | tf |
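UWCNN is a fully convolutional model that reconstructs the clear image directly from the degraded input. The miniature below is a hedged sketch of that style of network; the layer counts, widths, and exact residual formulation are assumptions, not the paper's architecture:

```python
# Miniature UWCNN-style enhancer: stacked conv blocks, dense concatenation of
# their features with the input, and a residual correction added to the image.
import torch
import torch.nn as nn

class MiniUWCNN(nn.Module):
    def __init__(self, width=16):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(3, width, 3, padding=1), nn.ReLU())
        self.b2 = nn.Sequential(nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
        self.b3 = nn.Sequential(nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(3 * width + 3, 3, 3, padding=1)

    def forward(self, x):
        f1 = self.b1(x); f2 = self.b2(f1); f3 = self.b3(f2)
        residual = self.out(torch.cat([f1, f2, f3, x], dim=1))
        return torch.clamp(x + residual, 0.0, 1.0)   # enhanced image in [0, 1]

img = torch.rand(1, 3, 64, 64)
print(MiniUWCNN()(img).shape)                        # torch.Size([1, 3, 64, 64])
```

In the paper, one such model is trained per synthesized water type, all on synthetic degraded/clean pairs.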
Comparing Dynamics: Deep Neural Networks versus Glassy Systems
Title | Comparing Dynamics: Deep Neural Networks versus Glassy Systems |
Authors | M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, M. Wyart, G. Biroli |
Abstract | We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular, the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe a typical glassy behavior, thus suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized. |
Tasks | |
Published | 2018-03-19 |
URL | http://arxiv.org/abs/1803.06969v2 |
PDF | http://arxiv.org/pdf/1803.06969v2.pdf |
PWC | https://paperswithcode.com/paper/comparing-dynamics-deep-neural-networks-1 |
Repo | https://github.com/mbaityje/DEEP-GLASS |
Framework | pytorch |
End-to-End Speech Recognition From the Raw Waveform
Title | End-to-End Speech Recognition From the Raw Waveform |
Authors | Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux |
Abstract | State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture. The first is inspired by gammatone filterbanks (Hoshen et al., 2015; Sainath et al., 2015), and the second by the scattering transform (Zeghidour et al., 2017). We propose two modifications to these architectures and systematically compare them to mel-filterbanks on the Wall Street Journal dataset. The first modification is the addition of an instance normalization layer, which greatly improves the gammatone-based trainable filterbanks and speeds up the training of the scattering-based filterbanks. The second relates to the low-pass filter used in these approaches. These modifications consistently improve performance for both approaches and remove the need for careful initialization of scattering-based trainable filterbanks. In particular, we show a consistent improvement in word error rate of the trainable filterbanks relative to comparable mel-filterbanks. This is the first time that end-to-end models trained from the raw signal significantly outperform mel-filterbanks on a large-vocabulary task under clean recording conditions. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07098v2 |
PDF | http://arxiv.org/pdf/1806.07098v2.pdf |
PWC | https://paperswithcode.com/paper/end-to-end-speech-recognition-from-the-raw |
Repo | https://github.com/renyuanL/ry-Speech-commands |
Framework | tf |
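The paper studies trainable replacements for mel-filterbanks, with instance normalization as a key addition. The sketch below is a rough stand-in for such a learnable time-domain front end; the filter lengths, strides, and compression are arbitrary choices, not the paper's:

```python
# Learnable filterbank front end: band-pass conv filters on the raw waveform,
# squaring, a decimating per-channel low-pass, log compression, instance norm.
import torch
import torch.nn as nn

class LearnableFrontEnd(nn.Module):
    def __init__(self, n_filters=40, filt_len=401, hop=160):
        super().__init__()
        self.filters = nn.Conv1d(1, n_filters, filt_len, stride=1,
                                 padding=filt_len // 2, bias=False)
        self.lowpass = nn.Conv1d(n_filters, n_filters, 81, stride=hop, padding=40,
                                 groups=n_filters, bias=False)   # per-channel low-pass
        self.norm = nn.InstanceNorm1d(n_filters, affine=True)

    def forward(self, wav):                      # wav: (batch, 1, samples)
        x = self.filters(wav) ** 2               # band-pass filtering + rectification
        x = torch.log1p(self.lowpass(x).abs())   # decimation + log compression
        return self.norm(x)                      # instance normalization

feats = LearnableFrontEnd()(torch.randn(2, 1, 16000))
print(feats.shape)                               # torch.Size([2, 40, 100])
```

The acoustic model then consumes these learned features exactly as it would consume mel-filterbank frames.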
LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration
Title | LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration |
Authors | Gellért Weisz, András György, Csaba Szepesvári |
Abstract | We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution. The goal of the configurator is to find a configuration that runs fast on average on most instances, and to do so with the least amount of total work. It can run a chosen solver on a random instance until the solver finishes or a timeout is reached. We propose LeapsAndBounds, an algorithm that tests configurations on randomly selected problem instances for longer and longer times. We prove that the capped expected runtime of the configuration returned by LeapsAndBounds is close to the optimal expected runtime, while our algorithm's running time is near-optimal. Our results show that LeapsAndBounds is more efficient than the recent algorithm of Kleinberg et al. (2017), which, to our knowledge, is the only other algorithm configuration method with non-trivial theoretical guarantees. Experimental results on configuring a public SAT solver on a new benchmark dataset also attest to the superiority of our method. |
Tasks | |
Published | 2018-07-02 |
URL | http://arxiv.org/abs/1807.00755v1 |
PDF | http://arxiv.org/pdf/1807.00755v1.pdf |
PWC | https://paperswithcode.com/paper/leapsandbounds-a-method-for-approximately |
Repo | https://github.com/drgrhm/alg_config |
Framework | none |
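LeapsAndBounds tests configurations on random instances with capped, progressively longer runs. The toy below only illustrates the capped-run measurement idea; it is not the paper's procedure and carries none of its guarantees:

```python
# Toy capped-run comparison of configurations: measure each configuration's
# mean runtime under a cap, doubling the per-run budget and sample size.
import numpy as np

rng = np.random.default_rng(0)

def run_solver(config, instance, cap):
    """Stand-in for running a solver; returns the runtime truncated at the cap."""
    true_time = rng.exponential(scale=config)       # hypothetical runtime model
    return min(true_time, cap)

configs = [0.5, 1.0, 2.0, 4.0]                      # hypothetical mean runtimes
best = None
for phase in range(4):
    cap, n_runs = 2.0 ** phase, 10 * 2 ** phase     # doubling cap and sample size
    means = [np.mean([run_solver(c, i, cap) for i in range(n_runs)]) for c in configs]
    best = configs[int(np.argmin(means))]
print("configuration with lowest capped mean runtime:", best)
```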
On the Power of Over-parametrization in Neural Networks with Quadratic Activation
Title | On the Power of Over-parametrization in Neural Networks with Quadratic Activation |
Authors | Simon S. Du, Jason D. Lee |
Abstract | We provide new theoretical insights on why over-parametrization is effective in learning neural networks. For a shallow network with $k$ hidden nodes, quadratic activation, and $n$ training data points, we show that as long as $k \ge \sqrt{2n}$, over-parametrization enables local search algorithms to find a \emph{globally} optimal solution for general smooth and convex loss functions. Further, even though the number of parameters may exceed the sample size, we use the theory of Rademacher complexity to show that, with weight decay, the solution also generalizes well if the data are sampled from a regular distribution such as a Gaussian. To prove that the loss function has benign landscape properties when $k \ge \sqrt{2n}$, we adopt an idea from smoothed analysis, which may have other applications in studying the loss surfaces of neural networks. |
Tasks | |
Published | 2018-03-03 |
URL | http://arxiv.org/abs/1803.01206v2 |
PDF | http://arxiv.org/pdf/1803.01206v2.pdf |
PWC | https://paperswithcode.com/paper/on-the-power-of-over-parametrization-in-1 |
Repo | https://github.com/Clumsyndicate/One_layer_analysis_network |
Framework | tf |
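The paper's regime is a one-hidden-layer network with quadratic activation and $k \ge \sqrt{2n}$ hidden units. The snippet below is a small numerical illustration of that setting (plain gradient-based training on random data), not a proof of the result:

```python
# One-hidden-layer network with quadratic activation in the k >= sqrt(2n) regime.
import torch

torch.manual_seed(0)
n, d = 32, 10
k = int((2 * n) ** 0.5) + 1                  # k >= sqrt(2n)
X, y = torch.randn(n, d), torch.randn(n)

W = (0.1 * torch.randn(d, k)).requires_grad_()
a = (0.1 * torch.randn(k)).requires_grad_()
opt = torch.optim.Adam([W, a], lr=1e-2)
for step in range(3000):
    pred = ((X @ W) ** 2) @ a                # quadratic activation, linear output layer
    loss = ((pred - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(k, float(loss))                        # loss typically falls by orders of magnitude
```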
Clipped Action Policy Gradient
Title | Clipped Action Policy Gradient |
Authors | Yasuhiro Fujita, Shin-ichi Maeda |
Abstract | Many continuous control tasks have bounded action spaces. When policy gradient methods are applied to such tasks, out-of-bound actions need to be clipped before execution, while policies are usually optimized as if the actions are not clipped. We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that our estimator, named clipped action policy gradient (CAPG), is unbiased and achieves lower variance than the conventional estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the conventional estimator, indicating that it is a better policy gradient estimator for continuous control tasks. The source code is available at https://github.com/pfnet-research/capg. |
Tasks | Continuous Control, Policy Gradient Methods |
Published | 2018-02-21 |
URL | http://arxiv.org/abs/1802.07564v2 |
PDF | http://arxiv.org/pdf/1802.07564v2.pdf |
PWC | https://paperswithcode.com/paper/clipped-action-policy-gradient |
Repo | https://github.com/pfnet-research/capg |
Framework | none |
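The core of CAPG is to score a clipped action with the probability mass of the Gaussian tail beyond the bound rather than the density at the pre-clipped action. A simplified one-dimensional illustration (not the authors' implementation) is:

```python
# Log-probability of a clipped Gaussian policy: use the tail mass at the bounds.
import torch
from torch.distributions import Normal

def clipped_gaussian_log_prob(action, mu, sigma, low=-1.0, high=1.0):
    dist = Normal(mu, sigma)
    log_tail_low = dist.cdf(torch.tensor(low)).clamp_min(1e-12).log()
    log_tail_high = (1.0 - dist.cdf(torch.tensor(high))).clamp_min(1e-12).log()
    inside = dist.log_prob(action)
    return torch.where(action <= low, log_tail_low,
                       torch.where(action >= high, log_tail_high, inside))

mu = torch.tensor(0.8, requires_grad=True)
a = torch.clamp(torch.tensor(1.3), -1.0, 1.0)        # executed (clipped) action
logp = clipped_gaussian_log_prob(a, mu, torch.tensor(0.2))
logp.backward()                                      # gradient for a policy-gradient update
print(float(logp), float(mu.grad))
```

Using the tail mass removes the spurious variance introduced by scoring out-of-bound samples with the density, which is the intuition behind the variance reduction claimed in the abstract.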
Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data
Title | Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data |
Authors | Amjad Almahairi, Sai Rajeswar, Alessandro Sordoni, Philip Bachman, Aaron Courville |
Abstract | Learning inter-domain mappings from unpaired data can improve performance in structured prediction tasks, such as image segmentation, by reducing the need for paired data. CycleGAN was recently proposed for this problem, but critically assumes the underlying inter-domain mapping is approximately deterministic and one-to-one. This assumption renders the model ineffective for tasks requiring flexible, many-to-many mappings. We propose a new model, called Augmented CycleGAN, which learns many-to-many mappings between domains. We examine Augmented CycleGAN qualitatively and quantitatively on several image datasets. |
Tasks | Semantic Segmentation, Structured Prediction |
Published | 2018-02-27 |
URL | http://arxiv.org/abs/1802.10151v2 |
PDF | http://arxiv.org/pdf/1802.10151v2.pdf |
PWC | https://paperswithcode.com/paper/augmented-cyclegan-learning-many-to-many |
Repo | https://github.com/NathanDeMaria/AugmentedCycleGAN |
Framework | pytorch |
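Augmented CycleGAN obtains many-to-many mappings by augmenting each domain with a latent code that the generators consume. The placeholder sketch below shows only that conditioning mechanism; the cycles, discriminators, and latent encoders of the full model are omitted:

```python
# Latent-conditioned generator: the same input image plus different latent codes
# yields different outputs, enabling many-to-many mappings between domains.
import torch
import torch.nn as nn

class LatentConditionedGenerator(nn.Module):
    def __init__(self, z_dim=8, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + z_dim, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh())

    def forward(self, img, z):
        # Broadcast the latent code over spatial positions and concatenate.
        z_map = z[:, :, None, None].expand(-1, -1, img.size(2), img.size(3))
        return self.net(torch.cat([img, z_map], dim=1))

g = LatentConditionedGenerator()
img = torch.rand(1, 3, 32, 32) * 2 - 1
out1 = g(img, torch.randn(1, 8))         # same input image ...
out2 = g(img, torch.randn(1, 8))         # ... different latent -> different output
print((out1 - out2).abs().mean().item() > 0)
```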
Learning cross-lingual phonological and orthographic adaptations: a case study in improving neural machine translation between low-resource languages
Title | Learning cross-lingual phonological and orthographic adaptations: a case study in improving neural machine translation between low-resource languages |
Authors | Saurav Jha, Akhilesh Sudhakar, Anil Kumar Singh |
Abstract | Out-of-vocabulary (OOV) words can pose serious challenges for machine translation (MT) tasks, and in particular, for low-resource language (LRL) pairs, i.e., language pairs for which few or no parallel corpora exist. Our work adapts variants of seq2seq models to perform transduction of such words from Hindi to Bhojpuri (an LRL instance), learning from a set of cognate pairs built from a bilingual dictionary of Hindi–Bhojpuri words. We demonstrate that our models can be effectively used for language pairs that have limited parallel corpora; our models work at the character level to grasp phonetic and orthographic similarities across multiple types of word adaptations, whether synchronic or diachronic, loan words or cognates. We describe the training aspects of several character level NMT systems that we adapted to this task and characterize their typical errors. Our method improves BLEU score by 6.3 on the Hindi-to-Bhojpuri translation task. Further, we show that such transductions can generalize well to other languages by applying it successfully to Hindi – Bangla cognate pairs. Our work can be seen as an important step in the process of: (i) resolving the OOV words problem arising in MT tasks, (ii) creating effective parallel corpora for resource-constrained languages, and (iii) leveraging the enhanced semantic knowledge captured by word-level embeddings to perform character-level tasks. |
Tasks | Machine Translation |
Published | 2018-11-21 |
URL | https://arxiv.org/abs/1811.08816v2 |
PDF | https://arxiv.org/pdf/1811.08816v2.pdf |
PWC | https://paperswithcode.com/paper/neural-machine-translation-based-word |
Repo | https://github.com/Saurav0074/nmt-based-word-transduction |
Framework | tf |
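The models operate at the character level, transducing a source word into its cognate. The compact sketch below is a generic single-layer GRU encoder-decoder with placeholder vocabulary and sizes, not the specific NMT systems adapted in the paper:

```python
# Character-level seq2seq transducer: encode the source word, decode the target
# word with teacher forcing, train with cross-entropy over characters.
import torch
import torch.nn as nn

class CharTransducer(nn.Module):
    def __init__(self, vocab=64, emb=32, hid=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, src_chars, tgt_chars):
        _, h = self.encoder(self.embed(src_chars))            # encode source word
        dec_out, _ = self.decoder(self.embed(tgt_chars), h)   # teacher forcing
        return self.out(dec_out)                              # logits over characters

model = CharTransducer()
src = torch.randint(0, 64, (4, 7))     # 4 source words, 7 characters each
tgt = torch.randint(0, 64, (4, 7))
logits = model(src, tgt)
loss = nn.functional.cross_entropy(logits.reshape(-1, 64), tgt.reshape(-1))
print(logits.shape, float(loss))
```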
Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks
Title | Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks |
Authors | Jianyu Wang, Haichao Zhang |
Abstract | In this paper, we study fast training of adversarially robust models. From analyses of the state-of-the-art defense method, i.e., multi-step adversarial training, we hypothesize that the gradient magnitude is linked to model robustness. Motivated by this, we propose to perturb both the image and the label during training, which we call Bilateral Adversarial Training (BAT). To generate the adversarial label, we derive a closed-form heuristic solution. To generate the adversarial image, we use a one-step targeted attack with the target label being the most confusing class. In the experiments, we first show that random start and the most-confusing-target attack effectively prevent the label leaking and gradient masking problems. Then, coupled with the adversarial label part, our model significantly improves on the state-of-the-art results. For example, against the PGD100 white-box attack with cross-entropy loss, we achieve 63.7% versus 47.2% on CIFAR10, and 59.1% versus 42.1% on SVHN. Finally, the experiment on the very (computationally) challenging ImageNet dataset further demonstrates the effectiveness of our fast method. |
Tasks | |
Published | 2018-11-26 |
URL | https://arxiv.org/abs/1811.10716v2 |
PDF | https://arxiv.org/pdf/1811.10716v2.pdf |
PWC | https://paperswithcode.com/paper/bilateral-adversarial-training-towards-fast |
Repo | https://github.com/Line290/FeatureAttack |
Framework | pytorch |
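BAT perturbs both image and label; the image perturbation is a one-step targeted attack toward the most confusing class. The sketch below illustrates only that step with a toy model and an assumed epsilon, and omits the adversarial-label component:

```python
# One-step targeted attack toward the most confusing (highest-scoring wrong) class.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # toy classifier
x = torch.rand(8, 3, 32, 32, requires_grad=True)
y = torch.randint(0, 10, (8,))

logits = model(x)
wrong = logits.detach().clone()
wrong.scatter_(1, y[:, None], float("-inf"))        # mask out the true class
target = wrong.argmax(dim=1)                        # most confusing (wrong) class

loss = nn.functional.cross_entropy(logits, target)
loss.backward()
eps = 2.0 / 255.0
x_adv = (x - eps * x.grad.sign()).clamp(0, 1).detach()   # step toward the target class
print((x_adv - x).abs().max().item() <= eps + 1e-6)
```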
Continuous-time Models for Stochastic Optimization Algorithms
Title | Continuous-time Models for Stochastic Optimization Algorithms |
Authors | Antonio Orvieto, Aurelien Lucchi |
Abstract | We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. We exploit these continuous-time models, together with simple Lyapunov analysis as well as tools from stochastic calculus, in order to derive convergence bounds for various types of non-convex functions. Guided by such analysis, we show that the same Lyapunov arguments hold in discrete time, leading to matching rates. In addition, we use these models and Itô calculus to derive novel insights into the dynamics of SGD, proving that a decreasing learning rate acts as time warping or, equivalently, as landscape stretching. |
Tasks | Stochastic Optimization |
Published | 2018-10-05 |
URL | https://arxiv.org/abs/1810.02565v3 |
PDF | https://arxiv.org/pdf/1810.02565v3.pdf |
PWC | https://paperswithcode.com/paper/continuous-time-models-for-stochastic |
Repo | https://github.com/aorvieto/SGD-SVRG-models |
Framework | none |
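The paper models SGD with a learning-rate-dependent stochastic differential equation. The snippet below simulates such an SDE with Euler-Maruyama on a toy one-dimensional objective; the objective, noise scale, and step size are assumptions, not the paper's:

```python
# Euler-Maruyama simulation of a continuous-time SGD model:
# dX = -f'(X) dt + sqrt(eta) * sigma dW, with one SGD step taken as dt = eta.
import numpy as np

rng = np.random.default_rng(0)
f_grad = lambda x: x ** 3 - x              # gradient of a double-well objective
eta, sigma, T = 0.05, 0.5, 2000            # learning rate, noise scale, steps
dt = eta                                   # one SGD step corresponds to dt = eta

x = 1.5
for _ in range(T):
    noise = np.sqrt(eta) * sigma * np.sqrt(dt) * rng.normal()   # diffusion increment
    x += -f_grad(x) * dt + noise                                # drift + noise
print("final iterate:", x)
```

Shrinking eta over the iterations in this simulation slows the effective clock of the SDE, which is the "time warping" interpretation of a decreasing learning rate mentioned in the abstract.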