Paper Group ANR 787
On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning
Title | On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning |
Authors | Jian Li, Xuanyuan Luo, Mingda Qiao |
Abstract | Generalization error (also known as the out-of-sample error) measures how well the hypothesis learned from training data generalizes to previously unseen data. Proving tight generalization error bounds is a central question in statistical learning theory. In this paper, we obtain generalization error bounds for learning general non-convex objectives, which has attracted significant attention in recent years. We develop a new framework, termed Bayes-Stability, for proving algorithm-dependent generalization error bounds. The new framework combines ideas from both the PAC-Bayesian theory and the notion of algorithmic stability. Applying the Bayes-Stability method, we obtain new data-dependent generalization bounds for stochastic gradient Langevin dynamics (SGLD) and several other noisy gradient methods (e.g., with momentum, mini-batch and acceleration, Entropy-SGD). Our result recovers (and is typically tighter than) a recent result in Mou et al. (2018) and improves upon the results in Pensia et al. (2018). Our experiments demonstrate that our data-dependent bounds can distinguish randomly labelled data from normal data, which provides an explanation for the intriguing phenomenon observed in Zhang et al. (2017a). We also study the setting where the total loss is the sum of a bounded loss and an additional ℓ2 regularization term. We obtain new generalization bounds for the continuous Langevin dynamics in this setting by developing a new Log-Sobolev inequality for the parameter distribution at any time. Our new bounds are more desirable when the noise level of the process is not small, and do not become vacuous even when T tends to infinity. |
Tasks | |
Published | 2019-02-02 |
URL | https://arxiv.org/abs/1902.00621v4 |
https://arxiv.org/pdf/1902.00621v4.pdf | |
PWC | https://paperswithcode.com/paper/on-generalization-error-bounds-of-noisy |
Repo | |
Framework | |
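The analysis above centers on noisy gradient methods such as SGLD. For orientation, here is a minimal sketch of a single SGLD update; the function names and the inverse-temperature parametrization are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sgld_step(theta, grad_fn, lr, beta, rng):
    """One stochastic gradient Langevin dynamics (SGLD) update:
    a (mini-batch) gradient step plus Gaussian noise whose scale
    is set by the step size lr and inverse temperature beta."""
    noise = rng.normal(size=theta.shape) * np.sqrt(2.0 * lr / beta)
    return theta - lr * grad_fn(theta) + noise

# Usage sketch: minimize a toy quadratic loss with SGLD.
rng = np.random.default_rng(0)
theta = rng.normal(size=3)
for _ in range(1000):
    theta = sgld_step(theta, grad_fn=lambda t: 2.0 * t, lr=1e-2, beta=100.0, rng=rng)
```

Smaller beta means more injected noise; the paper's bounds are most relevant precisely when this noise level is not small.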
Ordinal Monte Carlo Tree Search
Title | Ordinal Monte Carlo Tree Search |
Authors | Tobias Joppen, Johannes Fürnkranz |
Abstract | In many problem settings, most notably in game playing, an agent receives a possibly delayed reward for its actions. Often, those rewards are handcrafted and not naturally given. Even simple terminal-only rewards, like winning equals 1 and losing equals -1, cannot be seen as an unbiased statement, since these values are chosen arbitrarily, and the behavior of the learner may change with different encodings, such as setting the value of a loss to -0.5, which is often done in practice to encourage learning. It is hard to argue about good rewards, and the performance of an agent often depends on the design of the reward signal. In particular, in domains where states by nature only have an ordinal ranking and where meaningful distance information between game state values is not available, a numerical reward signal is necessarily biased. In this paper, we take a look at Monte Carlo Tree Search (MCTS), a popular algorithm to solve MDPs, highlight a recurring problem concerning its use of rewards, and show that an ordinal treatment of the rewards overcomes this problem. Using the General Video Game Playing framework we show a dominance of our newly proposed ordinal MCTS algorithm over preference-based MCTS, vanilla MCTS and various other MCTS variants. |
Tasks | |
Published | 2019-01-14 |
URL | http://arxiv.org/abs/1901.04274v1 |
http://arxiv.org/pdf/1901.04274v1.pdf | |
PWC | https://paperswithcode.com/paper/ordinal-monte-carlo-tree-search |
Repo | |
Framework | |
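To make the ordinal-reward idea concrete, here is a hedged sketch of a bandit-style selection rule that ranks arms by the probability of beating a random outcome from the pooled play-out results, rather than by a mean numeric reward; the exact statistic used by ordinal MCTS (e.g., a Borda-style score) may differ.

```python
import math
from collections import Counter

def ordinal_value(arm_counts, pooled_counts):
    """Probability that a random outcome of this arm beats a random
    outcome drawn from all arms pooled (ties count 1/2). Outcomes are
    ordinal labels assumed comparable, e.g. loss < draw < win."""
    n = sum(arm_counts.values()) * sum(pooled_counts.values())
    p = 0.0
    for o_a, c_a in arm_counts.items():
        for o_p, c_p in pooled_counts.items():
            p += c_a * c_p * (1.0 if o_a > o_p else 0.5 if o_a == o_p else 0.0)
    return p / n

def select_arm(arms, total_visits, c=1.4):
    """UCB1-style selection with the ordinal win probability as the
    exploitation term instead of an average numeric reward."""
    pooled = Counter()
    for counts in arms.values():
        pooled.update(counts)
    return max(arms, key=lambda a: ordinal_value(arms[a], pooled)
               + c * math.sqrt(math.log(total_visits) / sum(arms[a].values())))
```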
Improved Variational Neural Machine Translation by Promoting Mutual Information
Title | Improved Variational Neural Machine Translation by Promoting Mutual Information |
Authors | Arya D. McCarthy, Xian Li, Jiatao Gu, Ning Dong |
Abstract | Posterior collapse plagues VAEs for text, especially for conditional text generation with strong autoregressive decoders. In this work, we address this problem in variational neural machine translation by explicitly promoting mutual information between the latent variables and the data. Our model extends the conditional variational autoencoder (CVAE) with two new ingredients: first, we propose a modified evidence lower bound (ELBO) objective which explicitly promotes mutual information; second, we regularize the probabilities of the decoder by mixing an auxiliary factorized distribution which is directly predicted by the latent variables. We present empirical results on the Transformer architecture and show that the proposed model effectively addresses posterior collapse: latent variables are no longer ignored in the presence of a powerful decoder. As a result, the proposed model yields improved translation quality while demonstrating superior performance in terms of data efficiency and robustness. |
Tasks | Machine Translation, Text Generation |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.09237v1 |
https://arxiv.org/pdf/1909.09237v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-variational-neural-machine |
Repo | |
Framework | |
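The modified objective can be pictured as the usual ELBO plus a mutual-information bonus. Below is a hedged PyTorch sketch that estimates I(x; z) with a batch-mixture approximation of the aggregate posterior; this particular estimator is an assumption for illustration, not necessarily the paper's.

```python
import math
import torch

def mi_promoting_loss(recon_logprob, z, q_mu, q_logvar, mi_weight=1.0):
    """Negative (ELBO + mi_weight * I(x; z)) for a Gaussian CVAE posterior.
    recon_logprob: [B] log p(x|z); z, q_mu, q_logvar: [B, D].
    I(x; z) is estimated as E[log q(z|x) - log q(z)], approximating the
    aggregate posterior q(z) by the mixture over the batch."""
    B, D = z.shape
    var = q_logvar.exp()
    diff = z.unsqueeze(1) - q_mu.unsqueeze(0)                   # [B, B, D]
    log_qz_cond = -0.5 * (diff.pow(2) / var.unsqueeze(0)
                          + q_logvar.unsqueeze(0)
                          + math.log(2 * math.pi)).sum(-1)      # log q(z_i | x_j)
    log_qz_x = log_qz_cond.diagonal()                           # log q(z_i | x_i)
    log_qz = torch.logsumexp(log_qz_cond, dim=1) - math.log(B)  # log q(z_i)
    mi_est = (log_qz_x - log_qz).mean()
    kl = 0.5 * (q_mu.pow(2) + var - q_logvar - 1.0).sum(-1).mean()
    elbo = recon_logprob.mean() - kl
    return -(elbo + mi_weight * mi_est)
```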
Representing Formal Languages: A Comparison Between Finite Automata and Recurrent Neural Networks
Title | Representing Formal Languages: A Comparison Between Finite Automata and Recurrent Neural Networks |
Authors | Joshua J. Michalenko, Ameesh Shah, Abhinav Verma, Richard G. Baraniuk, Swarat Chaudhuri, Ankit B. Patel |
Abstract | We investigate the internal representations that a recurrent neural network (RNN) uses while learning to recognize a regular formal language. Specifically, we train an RNN on positive and negative examples from a regular language, and ask if there is a simple decoding function that maps states of this RNN to states of the minimal deterministic finite automaton (MDFA) for the language. Our experiments show that such a decoding function indeed exists, and that it maps states of the RNN not to MDFA states, but to states of an *abstraction* obtained by clustering small sets of MDFA states into “superstates”. A qualitative analysis reveals that the abstraction often has a simple interpretation. Overall, the results suggest a strong structural relationship between internal representations used by RNNs and finite automata, and explain the well-known ability of RNNs to recognize formal grammatical structure. |
Tasks | |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1902.10297v1 |
http://arxiv.org/pdf/1902.10297v1.pdf | |
PWC | https://paperswithcode.com/paper/representing-formal-languages-a-comparison |
Repo | |
Framework | |
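A hedged sketch of the kind of decoding experiment the abstract describes: fit a simple classifier from RNN hidden states to clusters of MDFA states (“superstates”) and check how decodable the abstraction is. The decoder family and evaluation here are illustrative choices, not the paper's exact protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def superstate_decodability(rnn_states, mdfa_states, state_to_superstate):
    """rnn_states: [N, H] hidden vectors recorded while reading strings;
    mdfa_states: length-N MDFA state ids reached at the same steps;
    state_to_superstate: dict clustering MDFA states into superstates.
    Returns the accuracy of a linear decoder onto superstates."""
    y = np.array([state_to_superstate[s] for s in mdfa_states])
    decoder = LogisticRegression(max_iter=1000).fit(rnn_states, y)
    return decoder.score(rnn_states, y)
```

High accuracy onto superstates but low accuracy onto raw MDFA states would reproduce the paper's qualitative finding.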
Replication-based emulation of the response distribution of stochastic simulators using generalized lambda distributions
Title | Replication-based emulation of the response distribution of stochastic simulators using generalized lambda distributions |
Authors | X. Zhu, B. Sudret |
Abstract | Due to limited computational power, performing uncertainty quantification analyses with complex computational models can be a challenging task. This is exacerbated in the context of stochastic simulators, the response of which to a given set of input parameters, rather than being a deterministic value, is a random variable with unknown probability density function (PDF). Of interest in this paper is the construction of a surrogate that can accurately predict this response PDF for any input parameters. We suggest using a flexible distribution family – the generalized lambda distribution – to approximate the response PDF. The associated distribution parameters are cast as functions of input parameters and represented by sparse polynomial chaos expansions. To build such a surrogate model, we propose an approach that locally infers the response PDF at each point of the experimental design from replicated model evaluations. Two versions of this framework are proposed and compared on analytical examples and case studies. |
Tasks | |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.09067v1 |
https://arxiv.org/pdf/1911.09067v1.pdf | |
PWC | https://paperswithcode.com/paper/replication-based-emulation-of-the-response |
Repo | |
Framework | |
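For concreteness, here is a sketch of the generalized lambda distribution in the FKML parametrization, a common choice for this family (whether the paper uses exactly this parametrization is an assumption); a surrogate would make the four lambdas functions of the simulator inputs via polynomial chaos expansions.

```python
import numpy as np

def gld_quantile(u, lam1, lam2, lam3, lam4):
    """FKML quantile function of the generalized lambda distribution:
    lam1 location, lam2 > 0 scale, lam3/lam4 tail shapes (nonzero here;
    the lam -> 0 limits need a separate branch)."""
    return lam1 + ((u**lam3 - 1.0) / lam3 - ((1.0 - u)**lam4 - 1.0) / lam4) / lam2

def sample_gld(n, lams, rng):
    """Inverse-transform sampling: push uniforms through the quantile fn."""
    return gld_quantile(rng.uniform(size=n), *lams)

rng = np.random.default_rng(0)
draws = sample_gld(10_000, (0.0, 1.0, 0.1, 0.2), rng)  # illustrative lambdas
```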
From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions
Title | From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions |
Authors | David Mareček, Rudolf Rosa |
Abstract | We inspect the multi-head self-attention in Transformer NMT encoders for three source languages, looking for patterns that could have a syntactic interpretation. In many of the attention heads, we frequently find sequences of consecutive states attending to the same position, which resemble syntactic phrases. We propose a transparent deterministic method of quantifying the amount of syntactic information present in the self-attentions, based on automatically building and evaluating phrase-structure trees from the phrase-like sequences. We compare the resulting trees to existing constituency treebanks, both manually and by computing precision and recall. |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01958v1 |
https://arxiv.org/pdf/1906.01958v1.pdf | |
PWC | https://paperswithcode.com/paper/from-balustrades-to-pierre-vinken-looking-for |
Repo | |
Framework | |
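The phrase-like pattern the abstract quantifies can be detected with a few lines of code: find maximal runs of consecutive states whose strongest attention lands on the same position. This sketch is a simplified stand-in for the paper's tree-building method (e.g., it uses a hard argmax rather than any weight threshold).

```python
import numpy as np

def phrase_like_runs(attn, min_len=2):
    """attn[i, j]: attention weight from encoder state i to position j
    for one head. Returns (first, last, attended_position) for each
    maximal run of length >= min_len with a shared argmax target."""
    targets = attn.argmax(axis=1)
    runs, start = [], 0
    for i in range(1, len(targets) + 1):
        if i == len(targets) or targets[i] != targets[start]:
            if i - start >= min_len:
                runs.append((start, i - 1, int(targets[start])))
            start = i
    return runs
```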
Improving Neural Architecture Search Image Classifiers via Ensemble Learning
Title | Improving Neural Architecture Search Image Classifiers via Ensemble Learning |
Authors | Vladimir Macko, Charles Weill, Hanna Mazzawi, Javier Gonzalvo |
Abstract | Finding the best neural network architecture requires significant time, resources, and human expertise. These challenges are partially addressed by neural architecture search (NAS), which is able to find the best convolutional layer or cell that is then used as a building block for the network. However, once a good building block is found, manual design is still required to assemble the final architecture as a combination of multiple blocks under a predefined parameter budget constraint. A common solution is to stack these blocks into a single tower and adjust the width and depth to fill the parameter budget. However, these single-tower architectures may not be optimal. Instead, in this paper we present the AdaNAS algorithm, which uses ensemble techniques to compose a neural network as an ensemble of smaller networks automatically. Additionally, we introduce a novel technique based on knowledge distillation to iteratively train the smaller networks using the previous ensemble as a teacher. Our experiments demonstrate that ensembles of networks improve accuracy over a single neural network while keeping the same number of parameters. Our models achieve results comparable with the state of the art on CIFAR-10 and set a new state of the art on CIFAR-100. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2019-03-14 |
URL | http://arxiv.org/abs/1903.06236v1 |
http://arxiv.org/pdf/1903.06236v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-neural-architecture-search-image |
Repo | |
Framework | |
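The iterative teacher idea from the abstract (train each new subnetwork against both the labels and the soft predictions of the current ensemble) can be sketched with a standard distillation loss. The hyper-parameters and the logit-averaging teacher are illustrative assumptions, not AdaNAS's exact recipe.

```python
import torch
import torch.nn.functional as F

def ensemble_logits(models, x):
    """The current ensemble acts as the teacher: average member logits."""
    with torch.no_grad():
        return torch.stack([m(x) for m in models]).mean(dim=0)

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Cross-entropy on hard labels mixed with KL to the softened teacher."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```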
Spatio-Temporal Adversarial Learning for Detecting Unseen Falls
Title | Spatio-Temporal Adversarial Learning for Detecting Unseen Falls |
Authors | Shehroz S. Khan, Jacob Nogas, Alex Mihailidis |
Abstract | Fall detection is an important problem from both the health and machine learning perspective. A fall can lead to severe injuries, long term impairments or even death in some cases. In terms of machine learning, it presents a severe class-imbalance problem with very few or no training data for falls, owing to the fact that falls occur rarely. In this paper, we take an alternate philosophy to detect falls in the absence of their training data, by training the classifier on only the normal activities (which are available in abundance) and identifying a fall as an anomaly. To realize such a classifier, we use an adversarial learning framework, which comprises a spatio-temporal autoencoder for reconstructing input video frames and a spatio-temporal convolution network to discriminate them against original video frames. 3D convolutions are used to learn spatial and temporal features from the input video frames. The adversarial learning of the spatio-temporal autoencoder enables reconstructing the normal activities of daily living efficiently, thus rendering the detection of unseen falls plausible within this framework. We tested the performance of the proposed framework on camera sensing modalities that may preserve an individual’s privacy (fully or partially), such as thermal and depth cameras. Our results on three publicly available datasets show that the proposed spatio-temporal adversarial framework performed better than other baseline frame-based (or spatial) adversarial learning methods. |
Tasks | |
Published | 2019-05-19 |
URL | https://arxiv.org/abs/1905.07817v2 |
https://arxiv.org/pdf/1905.07817v2.pdf | |
PWC | https://paperswithcode.com/paper/spatio-temporal-adversarial-learning-for |
Repo | |
Framework | |
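A minimal sketch of the anomaly-scoring idea: a 3D-convolutional autoencoder trained only on normal activities should reconstruct falls poorly, so the per-clip reconstruction error serves as the fall score. Layer sizes are placeholders, and the adversarial discriminator from the paper is omitted.

```python
import torch
import torch.nn as nn

class STAutoencoder(nn.Module):
    """Toy spatio-temporal autoencoder over video clips [B, 1, T, H, W]."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1))
    def forward(self, clip):
        return self.dec(self.enc(clip))

def fall_score(model, clip):
    """Higher reconstruction error means a more anomalous (fall-like) clip."""
    with torch.no_grad():
        recon = model(clip)
    return ((recon - clip) ** 2).mean(dim=[1, 2, 3, 4])
```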
EdgeNet: Semantic Scene Completion from RGB-D images
Title | EdgeNet: Semantic Scene Completion from RGB-D images |
Authors | Aloisio Dourado, Teofilo Emidio de Campos, Hansung Kim, Adrian Hilton |
Abstract | Semantic scene completion is the task of predicting a complete 3D representation of volumetric occupancy with corresponding semantic labels for a scene from a single point of view. Previous works on semantic scene completion from RGB-D data used either only depth or depth with colour by projecting the 2D image into the 3D volume, resulting in a sparse data representation. In this work, we present a new strategy to encode colour information in 3D space using edge detection and flipped truncated signed distance. We also present EdgeNet, a new end-to-end neural network architecture capable of handling features generated from the fusion of depth and edge information. Experimental results show an improvement of 6.9% over the state-of-the-art result on real data for end-to-end approaches. |
Tasks | Edge Detection |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.02893v1 |
https://arxiv.org/pdf/1908.02893v1.pdf | |
PWC | https://paperswithcode.com/paper/edgenet-semantic-scene-completion-from-rgb-d |
Repo | |
Framework | |
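The "flipped truncated signed distance" encoding mentioned above puts the sharp value transition at the surface instead of at the truncation band. A hedged sketch (the truncation constant and sign convention are assumptions):

```python
import numpy as np

def flipped_tsdf(dist, sign, tau=0.24):
    """dist: unsigned distance of each voxel to the nearest surface;
    sign: +1 in front of / -1 behind the surface; tau: truncation (m).
    Values jump between -1 and +1 at the surface and decay to 0 at tau,
    giving the network a strong gradient where geometry actually is."""
    d = np.minimum(np.abs(dist), tau) / tau
    return sign * (1.0 - d)
```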
Quality Aware Generative Adversarial Networks
Title | Quality Aware Generative Adversarial Networks |
Authors | Parimala Kancharla, Sumohana S. Channappayya |
Abstract | Generative Adversarial Networks (GANs) have become a very popular tool for implicitly learning high-dimensional probability distributions. Several improvements have been made to the original GAN formulation to address some of its shortcomings, such as mode collapse, convergence issues, entanglement, and poor visual quality. While a significant effort has been directed towards improving the visual quality of images generated by GANs, it is rather surprising that objective image quality metrics have neither been employed as cost functions nor as regularizers in GAN objective functions. In this work, we show how a distance metric that is a variant of the Structural SIMilarity (SSIM) index (a popular full-reference image quality assessment algorithm), and a novel quality-aware discriminator gradient penalty function that is inspired by the Natural Image Quality Evaluator (NIQE, a popular no-reference image quality assessment algorithm), can each be used as excellent regularizers for GAN objective functions. Specifically, we demonstrate state-of-the-art performance using the Wasserstein GAN gradient penalty (WGAN-GP) framework over CIFAR-10, STL10 and CelebA datasets. |
Tasks | Image Quality Assessment, No-Reference Image Quality Assessment |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03149v1 |
https://arxiv.org/pdf/1911.03149v1.pdf | |
PWC | https://paperswithcode.com/paper/quality-aware-generative-adversarial-networks |
Repo | |
Framework | |
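As a sketch of how an objective quality metric can act as a regularizer, here is a simplified single-window SSIM distance; real SSIM uses local Gaussian windows, and the paper's variant and its weighting in the WGAN-GP objective are not reproduced here.

```python
import torch

def ssim_distance(x, y, C1=0.01**2, C2=0.03**2):
    """1 - SSIM(x, y) computed over whole images x, y of shape [B, C, H, W].
    Can be added, suitably weighted, to a generator loss as a quality-aware
    regularizer (an illustration of the idea, not the paper's exact term)."""
    dims = [1, 2, 3]
    mu_x, mu_y = x.mean(dim=dims), y.mean(dim=dims)
    var_x, var_y = x.var(dim=dims), y.var(dim=dims)
    cov = ((x - mu_x[:, None, None, None])
           * (y - mu_y[:, None, None, None])).mean(dim=dims)
    ssim = ((2 * mu_x * mu_y + C1) * (2 * cov + C2)) / \
           ((mu_x**2 + mu_y**2 + C1) * (var_x + var_y + C2))
    return (1.0 - ssim).mean()
```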
Flexible Modeling of Diversity with Strongly Log-Concave Distributions
Title | Flexible Modeling of Diversity with Strongly Log-Concave Distributions |
Authors | Joshua Robinson, Suvrit Sra, Stefanie Jegelka |
Abstract | Strongly log-concave (SLC) distributions are a rich class of discrete probability distributions over subsets of some ground set. They are strictly more general than strongly Rayleigh (SR) distributions such as the well-known determinantal point process. While SR distributions offer elegant models of diversity, they lack an easy control over how they express diversity. We propose SLC as the right extension of SR that enables easier, more intuitive control over diversity, illustrating this via examples of practical importance. We develop two fundamental tools needed to apply SLC distributions to learning and inference: sampling and mode finding. For sampling we develop an MCMC sampler and give theoretical mixing time bounds. For mode finding, we establish a weak log-submodularity property for SLC functions and derive optimization guarantees for a distorted greedy algorithm. |
Tasks | |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05413v1 |
https://arxiv.org/pdf/1906.05413v1.pdf | |
PWC | https://paperswithcode.com/paper/flexible-modeling-of-diversity-with-strongly |
Repo | |
Framework | |
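For intuition about the sampling tool, here is a generic Metropolis add/delete walk over subsets; the paper's mixing-time guarantees come from the SLC structure of the target, which this sketch does not check.

```python
import math
import random

def subset_mcmc(log_p, ground_set, steps, seed=0):
    """Metropolis chain over subsets of ground_set (a list of items).
    log_p(S) returns the unnormalized log-probability of frozenset S.
    Proposal: flip membership of a uniformly random element (symmetric)."""
    rng = random.Random(seed)
    S = frozenset()
    for _ in range(steps):
        e = rng.choice(ground_set)
        T = S - {e} if e in S else S | {e}
        if math.log(rng.random() + 1e-300) < log_p(T) - log_p(S):
            S = T
    return S
```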
Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation
Title | Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation |
Authors | Rae Jeong, Yusuf Aytar, David Khosid, Yuxiang Zhou, Jackie Kay, Thomas Lampe, Konstantinos Bousmalis, Francesco Nori |
Abstract | Collecting and automatically obtaining reward signals from real robotic visual data for the purposes of training reinforcement learning algorithms can be quite challenging and time-consuming. Methods for utilizing unlabeled data have huge potential to further accelerate robotic learning. We consider here the problem of performing manipulation tasks from pixels. In such tasks, choosing an appropriate state representation is crucial for planning and control. This is even more relevant with real images, where noise, occlusions and resolution affect the accuracy and reliability of state estimation. In this work, we learn a latent state representation implicitly with deep reinforcement learning in simulation, and then adapt it to the real domain using unlabeled real robot data. We propose to do so by optimizing sequence-based self-supervised objectives. These exploit the temporal nature of robot experience, and can be common in both the simulated and real domains, without assuming any alignment of underlying states in simulated and unlabeled real images. We propose the Contrastive Forward Dynamics loss, which combines dynamics model learning with time-contrastive techniques. The learned state representation that results from our methods can be used to robustly solve a manipulation task in simulation and to successfully transfer the learned skill to a real system. We demonstrate the effectiveness of our approaches by training a vision-based reinforcement learning agent for cube stacking. Agents trained with our method, using only 5 hours of unlabeled real robot data for adaptation, show a clear improvement over domain randomization and standard visual domain adaptation techniques for sim-to-real transfer. |
Tasks | Domain Adaptation |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09470v1 |
https://arxiv.org/pdf/1910.09470v1.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-sim-to-real-adaptation-for |
Repo | |
Framework | |
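A hedged sketch of a contrastive forward-dynamics objective in the spirit of the abstract: a dynamics model predicts the next latent state from the current latent and action, and must identify the true next embedding among the batch (InfoNCE-style). The paper's exact loss and architecture may differ.

```python
import torch
import torch.nn.functional as F

def cfd_loss(z_t, a_t, z_tp1, dynamics, temperature=0.1):
    """z_t, z_tp1: [B, D] latent states; a_t: [B, A] actions;
    dynamics: network mapping [B, D + A] -> [B, D]."""
    pred = F.normalize(dynamics(torch.cat([z_t, a_t], dim=-1)), dim=-1)
    target = F.normalize(z_tp1, dim=-1)
    logits = pred @ target.t() / temperature        # [B, B] similarity scores
    labels = torch.arange(z_t.size(0), device=z_t.device)
    return F.cross_entropy(logits, labels)          # true next state on diagonal
```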
Bivariate Beta-LSTM
Title | Bivariate Beta-LSTM |
Authors | Kyungwoo Song, JoonHo Jang, Seung jae Shin, Il-Chul Moon |
Abstract | Long Short-Term Memory (LSTM) infers long-term dependencies through a cell state maintained by the input and forget gate structures, which model a gate output as a value in [0,1] through a sigmoid function. However, due to the gradual transition of the sigmoid function, the sigmoid gate is not flexible in representing multi-modality or skewness. Besides, previous models do not model the correlation between the gates, which would provide a new way to adopt an inductive bias about the relationship between the previous and current inputs. This paper proposes a new gate structure with the bivariate Beta distribution. The proposed gate structure enables probabilistic modeling of the gates within the LSTM cell so that modelers can customize the cell state flow with priors and distributions. Moreover, we theoretically show a higher upper bound on the gradient compared to the sigmoid function, and we empirically observe that the bivariate Beta distribution gate structure provides higher gradient values in training. We demonstrate the effectiveness of the bivariate Beta gate structure on sentence classification, image classification, polyphonic music modeling, and image caption generation. |
Tasks | Density Estimation, Image Classification, Music Modeling, Sentence Classification |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10521v3 |
https://arxiv.org/pdf/1905.10521v3.pdf | |
PWC | https://paperswithcode.com/paper/bivariate-beta-lstm |
Repo | |
Framework | |
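To make the gate idea tangible, here is a simplified cell whose input and forget gates are sampled from Beta distributions via the reparameterized sampler, instead of being squashed by a sigmoid. This independent-Beta stand-in omits the bivariate correlation structure that is the paper's main contribution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaGateCell(nn.Module):
    """LSTM-style cell with stochastic Beta-distributed input/forget gates."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.lin = nn.Linear(input_size + hidden_size, 6 * hidden_size)

    def forward(self, x, state):
        h, c = state
        a_i, b_i, a_f, b_f, g, o = self.lin(torch.cat([x, h], -1)).chunk(6, -1)
        # softplus keeps the Beta concentration parameters strictly positive
        i_gate = torch.distributions.Beta(F.softplus(a_i) + 1e-4,
                                          F.softplus(b_i) + 1e-4).rsample()
        f_gate = torch.distributions.Beta(F.softplus(a_f) + 1e-4,
                                          F.softplus(b_f) + 1e-4).rsample()
        c = f_gate * c + i_gate * torch.tanh(g)   # gates can now be multi-modal/skewed
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)
```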
From the Internet of Information to the Internet of Intelligence
Title | From the Internet of Information to the Internet of Intelligence |
Authors | F. Richard Yu |
Abstract | In the era of the Internet of information, we have gone through layering, cross-layer, and cross-system design paradigms. Recently, the “curse of modeling” and “curse of dimensionality” of the cross-system design paradigm have resulted in the popularity of using artificial intelligence (AI) to optimize the Internet of information. However, many significant research challenges remain to be addressed for the AI approach, including the lack of high-quality training data due to privacy and resource constraints in this data-driven approach. To address these challenges, we need to take a look at humans’ cooperation on a larger time scale. To facilitate cooperation in modern history, we have built three major technologies: the “grid of transportation”, the “grid of energy”, and the “Internet of information”. In this paper, we argue that the next cooperation paradigm could be the “Internet of intelligence” (Intelligence-Net), where intelligence can be easily obtained like energy and information, enabled by the recent advances in blockchain technology. We present some recent advances in these areas, and discuss some open issues and challenges that need to be addressed in the future. |
Tasks | |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1909.08068v1 |
https://arxiv.org/pdf/1909.08068v1.pdf | |
PWC | https://paperswithcode.com/paper/from-the-internet-of-information-to-the |
Repo | |
Framework | |
Self-supervised Learning with Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera
Title | Self-supervised Learning with Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera |
Authors | Yuhua Chen, Cordelia Schmid, Cristian Sminchisescu |
Abstract | We present GLNet, a self-supervised framework for learning depth, optical flow, camera pose and intrinsic parameters from monocular video, addressing the difficulty of acquiring realistic ground-truth for such tasks. We propose three contributions: 1) we design new loss functions that capture multiple geometric constraints (e.g., epipolar geometry) as well as an adaptive photometric loss that supports multiple moving objects, rigid and non-rigid; 2) we extend the model so that it predicts camera intrinsics, making it applicable to uncalibrated video; and 3) we propose several online refinement strategies that rely on the symmetry of our self-supervised loss in training and testing, in particular optimizing model parameters and/or the output of different tasks, thus leveraging their mutual interactions. The idea of jointly optimizing the system output under all geometric and photometric constraints can be viewed as a dense generalization of classical bundle adjustment. We demonstrate the effectiveness of our method on KITTI and Cityscapes, where we outperform previous self-supervised approaches on multiple tasks. We also show good generalization for transfer learning to YouTube videos. |
Tasks | Optical Flow Estimation, Transfer Learning |
Published | 2019-07-12 |
URL | https://arxiv.org/abs/1907.05820v2 |
https://arxiv.org/pdf/1907.05820v2.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-learning-with-geometric |
Repo | |
Framework | |
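One of the geometric constraints the abstract mentions, the epipolar constraint, is easy to state in code: corresponding points and the fundamental matrix built from the predicted pose must satisfy p2^T F p1 = 0. The residual below is a hedged illustration; GLNet's actual loss formulation is not reproduced here.

```python
import torch

def epipolar_residual(pts1, pts2, R, t, K):
    """pts1, pts2: [N, 3] homogeneous pixel coords of matched points;
    R: [3, 3] rotation, t: [3] translation, K: [3, 3] intrinsics.
    Returns mean |p2^T F p1| with F = K^-T [t]_x R K^-1."""
    zero = torch.zeros((), dtype=R.dtype)
    tx = torch.stack([torch.stack([zero, -t[2], t[1]]),
                      torch.stack([t[2], zero, -t[0]]),
                      torch.stack([-t[1], t[0], zero])])   # cross-product matrix
    Kinv = torch.inverse(K)
    F = Kinv.t() @ tx @ R @ Kinv                           # fundamental matrix
    return ((pts2 @ F) * pts1).sum(dim=1).abs().mean()
```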