July 29, 2019

2902 words 14 mins read

Paper Group AWR 136


Hyperparameter Optimization: A Spectral Approach

Title Hyperparameter Optimization: A Spectral Approach
Authors Elad Hazan, Adam Klivans, Yang Yuan
Abstract We give a simple, fast algorithm for hyperparameter optimization inspired by techniques from the analysis of Boolean functions. We focus on the high-dimensional regime where the canonical example is training a neural network with a large number of hyperparameters. The algorithm — an iterative application of compressed sensing techniques for orthogonal polynomials — requires only uniform sampling of the hyperparameters and is thus easily parallelizable. Experiments for training deep neural networks on Cifar-10 show that compared to state-of-the-art tools (e.g., Hyperband and Spearmint), our algorithm finds significantly improved solutions, in some cases better than what is attainable by hand-tuning. In terms of overall running time (i.e., time required to sample various settings of hyperparameters plus additional computation time), we are at least an order of magnitude faster than Hyperband and Bayesian Optimization. We also outperform Random Search 8x. Additionally, our method comes with provable guarantees and yields the first improvements on the sample complexity of learning decision trees in over two decades. In particular, we obtain the first quasi-polynomial time algorithm for learning noisy decision trees with polynomial sample complexity.
Tasks Hyperparameter Optimization
Published 2017-06-02
URL http://arxiv.org/abs/1706.00764v4
PDF http://arxiv.org/pdf/1706.00764v4.pdf
PWC https://paperswithcode.com/paper/hyperparameter-optimization-a-spectral
Repo https://github.com/callowbird/Harmonica
Framework none
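
To make the spectral idea concrete, here is a minimal sketch (with a hypothetical toy objective, not the authors' Harmonica code): sample binary hyperparameter settings uniformly, fit a sparse low-degree polynomial to the observed losses with Lasso (the compressed-sensing step), and read the influential monomials off the recovered coefficients.

```python
# A minimal sketch of the spectral approach on a hypothetical toy objective.
import itertools
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_hparams, n_samples, degree = 10, 200, 2

def loss(x):  # hypothetical expensive evaluation; x has entries in {-1, +1}
    return 2.0 * x[0] * x[3] - 1.5 * x[7] + 0.1 * rng.normal()

X = rng.choice([-1.0, 1.0], size=(n_samples, n_hparams))  # uniform sampling
y = np.array([loss(x) for x in X])

# Parity (Fourier) features: monomials chi_S(x) = prod_{i in S} x_i, |S| <= degree.
subsets = [s for d in range(1, degree + 1)
           for s in itertools.combinations(range(n_hparams), d)]
Phi = np.stack([X[:, list(s)].prod(axis=1) for s in subsets], axis=1)

fit = Lasso(alpha=0.05).fit(Phi, y)            # sparse polynomial recovery
for s, c in zip(subsets, fit.coef_):
    if abs(c) > 0.1:
        print(f"influential monomial {s}: coefficient {c:+.2f}")
```

Because the samples are drawn uniformly and independently, the evaluation loop parallelizes trivially, which is the practical point the abstract emphasizes.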

Neural Machine Translation

Title Neural Machine Translation
Authors Philipp Koehn
Abstract Draft of a textbook chapter on neural machine translation: a comprehensive treatment of the topic, ranging from an introduction to neural networks and computation graphs, through a description of the currently dominant attentional sequence-to-sequence model, to recent refinements, alternative architectures, and open challenges. Written as a chapter for the textbook Statistical Machine Translation and used in the JHU Fall 2017 class on machine translation.
Tasks Machine Translation
Published 2017-09-22
URL http://arxiv.org/abs/1709.07809v1
PDF http://arxiv.org/pdf/1709.07809v1.pdf
PWC https://paperswithcode.com/paper/neural-machine-translation
Repo https://github.com/rsennrich/wmt16-scripts
Framework none
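
Since the chapter centers on the attentional sequence-to-sequence model, a tiny numpy sketch of the attention step may help fix ideas (the shapes and the dot-product scoring rule are illustrative assumptions, not the chapter's notation):

```python
# Minimal dot-product attention: mix encoder states by relevance to the decoder state.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """Return a context vector: an attention-weighted mix of encoder states."""
    scores = encoder_states @ decoder_state      # (src_len,) alignment scores
    weights = softmax(scores)                    # normalized attention weights
    return weights @ encoder_states, weights     # (hidden,), (src_len,)

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 8))                      # 6 source positions, hidden size 8
s = rng.normal(size=8)                           # current decoder state
context, attn = attend(s, H)
print(attn.round(3), context.shape)
```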

A Novel Approach for Image Segmentation based on Histograms computed from Hue-data

Title A Novel Approach for Image Segmentation based on Histograms computed from Hue-data
Authors Viraj Mavani, Ayesha Gurnani, Jhanvi Shah
Abstract Computer vision is growing day by day in terms of user-specific applications, and the first step of any such application is segmenting an image. In this paper, we propose a novel, grass-roots image segmentation algorithm for cases in which the background has a uniform color distribution. The algorithm can be used for images of flowers, birds, insects and many more subjects where such background conditions occur. With image segmentation, a computer's visual understanding improves manifold, and it can even attain near-human accuracy during classification.
Tasks Semantic Segmentation
Published 2017-07-30
URL http://arxiv.org/abs/1707.09643v1
PDF http://arxiv.org/pdf/1707.09643v1.pdf
PWC https://paperswithcode.com/paper/a-novel-approach-for-image-segmentation-based
Repo https://github.com/Alihussain1/Image-Segmentation
Framework none
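
The core premise is easy to sketch: if the background color is uniform, its hue dominates the hue histogram, so masking out pixels near the peak hue isolates the foreground. A hedged OpenCV sketch (the bin count and tolerance are assumptions, not the paper's parameters):

```python
# Segment by masking the dominant (background) hue; a sketch of the idea only.
import numpy as np
import cv2  # OpenCV

def segment_by_hue(bgr_image, tolerance=10):
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    hue = hsv[:, :, 0].astype(int)                  # OpenCV hue range: 0..179
    hist = np.bincount(hue.ravel(), minlength=180)  # hue histogram
    background_hue = int(hist.argmax())             # peak = uniform background
    # circular distance on the hue wheel
    distance = np.minimum(np.abs(hue - background_hue),
                          180 - np.abs(hue - background_hue))
    return (distance > tolerance).astype(np.uint8)  # 1 = foreground

# mask = segment_by_hue(cv2.imread("flower.jpg"))   # hypothetical input file
```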

Stabilizing GAN Training with Multiple Random Projections

Title Stabilizing GAN Training with Multiple Random Projections
Authors Behnam Neyshabur, Srinadh Bhojanapalli, Ayan Chakrabarti
Abstract Training generative adversarial networks is unstable in high dimensions, as the true data distribution tends to be concentrated in a small fraction of the ambient space. The discriminator is then quickly able to classify nearly all generated samples as fake, leaving the generator without meaningful gradients and causing it to deteriorate after a point in training. In this work, we propose training a single generator simultaneously against an array of discriminators, each of which looks at a different random low-dimensional projection of the data. Individual discriminators, now provided with restricted views of the input, are unable to reject generated samples perfectly and continue to provide meaningful gradients to the generator throughout training. Meanwhile, the generator learns to produce samples consistent with the full data distribution to satisfy all discriminators simultaneously. We demonstrate the practical utility of this approach experimentally, and show that it is able to produce image samples with higher quality than traditional training with a single discriminator.
Tasks
Published 2017-05-22
URL http://arxiv.org/abs/1705.07831v2
PDF http://arxiv.org/pdf/1705.07831v2.pdf
PWC https://paperswithcode.com/paper/stabilizing-gan-training-with-multiple-random
Repo https://github.com/ayanc/rpgan
Framework tf
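
The training setup is simple to sketch. Below is a heavily simplified PyTorch version with toy vector data and tiny MLPs (the paper works with images and convolutional discriminators): one generator against K discriminators, each of which sees only a fixed random projection of its input.

```python
# One generator vs. K discriminators on fixed random projections; a toy sketch.
import torch
import torch.nn as nn

data_dim, latent_dim, proj_dim, K = 64, 16, 8, 4
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
Ds = [nn.Sequential(nn.Linear(proj_dim, 64), nn.ReLU(), nn.Linear(64, 1))
      for _ in range(K)]
projections = [torch.randn(data_dim, proj_dim) / data_dim ** 0.5
               for _ in range(K)]                       # fixed for all of training
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_ds = [torch.optim.Adam(D.parameters(), lr=1e-4) for D in Ds]

def step(real):                                         # real: (batch, data_dim)
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)
    z = torch.randn(real.size(0), latent_dim)
    fake = G(z)
    for D, P, opt_d in zip(Ds, projections, opt_ds):    # each D sees only x @ P
        d_loss = bce(D(real @ P), ones) + bce(D(fake.detach() @ P), zeros)
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # the generator must satisfy all K restricted discriminators at once
    g_loss = sum(bce(D(fake @ P), ones) for D, P in zip(Ds, projections))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

step(torch.randn(32, data_dim))
```

Because each discriminator only ever sees a low-dimensional view, none of them can become perfect, which is what keeps gradients flowing to the generator.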

ShapeWorld - A new test methodology for multimodal language understanding

Title ShapeWorld - A new test methodology for multimodal language understanding
Authors Alexander Kuhnle, Ann Copestake
Abstract We introduce a novel framework for evaluating multimodal deep learning models with respect to their language understanding and generalization abilities. In this approach, artificial data is automatically generated according to the experimenter’s specifications. The content of the data, both during training and evaluation, can be controlled in detail, which enables tasks to be created that require true generalization abilities, in particular the combination of previously introduced concepts in novel ways. We demonstrate the potential of our methodology by evaluating various visual question answering models on four different tasks, and show how our framework gives us detailed insights into their capabilities and limitations. By open-sourcing our framework, we hope to stimulate progress in the field of multimodal language understanding.
Tasks Visual Question Answering
Published 2017-04-14
URL http://arxiv.org/abs/1704.04517v1
PDF http://arxiv.org/pdf/1704.04517v1.pdf
PWC https://paperswithcode.com/paper/shapeworld-a-new-test-methodology-for
Repo https://github.com/AlexKuhnle/ShapeWorld
Framework tf
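
The key property of the methodology, that ground truth is known by construction, can be illustrated with a toy generator (this is not the ShapeWorld API, just the idea):

```python
# Toy scene generator with caption/answer pairs whose truth is known exactly.
import random

SHAPES, COLORS = ["square", "circle", "triangle"], ["red", "green", "blue"]

def generate_scene(n_objects=4, rng=random.Random(0)):
    return [{"shape": rng.choice(SHAPES), "color": rng.choice(COLORS),
             "x": rng.random(), "y": rng.random()} for _ in range(n_objects)]

def question(scene, shape, color):
    """'Is there a <color> <shape>?' with a ground-truth answer by construction."""
    answer = any(o["shape"] == shape and o["color"] == color for o in scene)
    return f"Is there a {color} {shape}?", answer

scene = generate_scene()
q, a = question(scene, "circle", "red")
print(q, a)   # evaluation data whose correctness is known exactly
```

Because the experimenter controls which shape/color combinations appear during training, held-out questions can force genuinely novel concept combinations at test time.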

Tensor-on-tensor regression

Title Tensor-on-tensor regression
Authors Eric F. Lock
Abstract We propose a framework for the linear prediction of a multi-way array (i.e., a tensor) from another multi-way array of arbitrary dimension, using the contracted tensor product. This framework generalizes several existing approaches, including methods to predict a scalar outcome from a tensor, a matrix from a matrix, or a tensor from a scalar. We describe an approach that exploits the multiway structure of both the predictors and the outcomes by restricting the coefficients to have reduced CP-rank. We propose a general and efficient algorithm for penalized least-squares estimation, which allows for a ridge (L_2) penalty on the coefficients. The objective is shown to give the mode of a Bayesian posterior, which motivates a Gibbs sampling algorithm for inference. We illustrate the approach with an application to facial image data. An R package is available at https://github.com/lockEF/MultiwayRegression .
Tasks
Published 2017-01-04
URL http://arxiv.org/abs/1701.01037v2
PDF http://arxiv.org/pdf/1701.01037v2.pdf
PWC https://paperswithcode.com/paper/tensor-on-tensor-regression
Repo https://github.com/lockEF/MultiwayRegression
Framework none
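
The prediction rule is a contracted tensor product with a CP-rank-constrained coefficient array, which a few lines of numpy make concrete (the dimensions and rank here are illustrative; the linked R package implements the actual penalized estimator):

```python
# Contracted tensor product prediction with a CP-rank-R coefficient array.
import numpy as np

rng = np.random.default_rng(0)
N, P1, P2, Q1, R = 50, 4, 5, 3, 2        # N samples: X is N x P1 x P2, Y is N x Q1

# CP-rank-R coefficient tensor B of shape (P1, P2, Q1): B = sum_r a_r ⊗ b_r ⊗ c_r
A = rng.normal(size=(P1, R))
Bf = rng.normal(size=(P2, R))
C = rng.normal(size=(Q1, R))
B = np.einsum("pr,qr,sr->pqs", A, Bf, C)

X = rng.normal(size=(N, P1, P2))
Y_hat = np.einsum("npq,pqs->ns", X, B)   # contracted tensor product <X, B>
print(Y_hat.shape)                        # (50, 3)
```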

Generate To Adapt: Aligning Domains using Generative Adversarial Networks

Title Generate To Adapt: Aligning Domains using Generative Adversarial Networks
Authors Swami Sankaranarayanan, Yogesh Balaji, Carlos D. Castillo, Rama Chellappa
Abstract Domain Adaptation is an actively researched problem in Computer Vision. In this work, we propose an approach that leverages unsupervised data to bring the source and target distributions closer in a learned joint feature space. We accomplish this by inducing a symbiotic relationship between the learned embedding and a generative adversarial network. This is in contrast to methods which use the adversarial framework for realistic data generation and for retraining deep models with such data. We demonstrate the strength and generality of our approach by performing experiments on three different tasks with varying levels of difficulty: (1) digit classification (MNIST, SVHN and USPS datasets), (2) object recognition using the OFFICE dataset, and (3) domain adaptation from synthetic to real data. Our method achieves state-of-the-art performance in most experimental settings and is the only GAN-based method that has been shown to work well across different datasets such as OFFICE and DIGITS.
Tasks Domain Adaptation, Object Recognition
Published 2017-04-06
URL http://arxiv.org/abs/1704.01705v4
PDF http://arxiv.org/pdf/1704.01705v4.pdf
PWC https://paperswithcode.com/paper/generate-to-adapt-aligning-domains-using
Repo https://github.com/watay147/tele_project
Framework pytorch
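
The "symbiotic" setup can be sketched schematically. The PyTorch toy below uses tiny MLPs on flat vectors (the paper uses convolutional networks on images) and is an approximation of the design rather than a reimplementation, but it shows the gradient flow: an embedding F is trained both to classify source data and, through a generator G, to fool a discriminator D on source and target inputs alike.

```python
# Schematic GTA-style training step; toy networks, assumes equal batch sizes.
import torch
import torch.nn as nn

d_in, d_emb, n_classes = 32, 16, 10
F = nn.Sequential(nn.Linear(d_in, d_emb), nn.ReLU())   # shared embedding
C = nn.Linear(d_emb, n_classes)                        # source-supervised classifier
G = nn.Sequential(nn.Linear(d_emb, d_in))              # embedding -> "image"
D = nn.Sequential(nn.Linear(d_in, 1))                  # real source vs. generated
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()
opts = {m: torch.optim.Adam(m.parameters(), lr=1e-4) for m in (F, C, G, D)}

def step(x_src, y_src, x_tgt):
    ones = torch.ones(x_src.size(0), 1)
    zeros = torch.zeros(x_src.size(0), 1)
    # 1) discriminator: real source data vs. data generated from embeddings
    fake_src, fake_tgt = G(F(x_src)), G(F(x_tgt))
    d_loss = (bce(D(x_src), ones) + bce(D(fake_src.detach()), zeros)
              + bce(D(fake_tgt.detach()), zeros))
    opts[D].zero_grad(); d_loss.backward(); opts[D].step()
    # 2) embedding + classifier + generator: classify source data correctly and
    #    make generated data look real for source AND unlabeled target embeddings,
    #    which pulls the target embedding toward the source distribution
    fcg_loss = (ce(C(F(x_src)), y_src)
                + bce(D(G(F(x_src))), ones) + bce(D(G(F(x_tgt))), ones))
    for m in (F, C, G): opts[m].zero_grad()
    fcg_loss.backward()
    for m in (F, C, G): opts[m].step()

step(torch.randn(8, d_in), torch.randint(0, n_classes, (8,)), torch.randn(8, d_in))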

Online control of the false discovery rate with decaying memory

Title Online control of the false discovery rate with decaying memory
Authors Aaditya Ramdas, Fanny Yang, Martin J. Wainwright, Michael I. Jordan
Abstract In the online multiple testing problem, p-values corresponding to different null hypotheses are observed one by one, and the decision of whether or not to reject the current hypothesis must be made immediately, after which the next p-value is observed. Alpha-investing algorithms to control the false discovery rate (FDR), formulated by Foster and Stine, have been generalized and applied to many settings, including quality-preserving databases in science and multiple A/B or multi-armed bandit tests for internet commerce. This paper improves the class of generalized alpha-investing algorithms (GAI) in four ways: (a) we show how to uniformly improve the power of the entire class of monotone GAI procedures by awarding more alpha-wealth for each rejection, giving a win-win resolution to a recent dilemma raised by Javanmard and Montanari, (b) we demonstrate how to incorporate prior weights to indicate domain knowledge of which hypotheses are likely to be non-null, (c) we allow for differing penalties for false discoveries to indicate that some hypotheses may be more important than others, (d) we define a new quantity called the decaying memory false discovery rate (mem-FDR) that may be more meaningful for truly temporal applications, and which alleviates problems that we describe and refer to as “piggybacking” and “alpha-death”. Our GAI++ algorithms incorporate all four generalizations simultaneously, and reduce to more powerful variants of earlier algorithms when the weights and decay are all set to unity. Finally, we also describe a simple method to derive new online FDR rules based on an estimated false discovery proportion.
Tasks
Published 2017-10-02
URL http://arxiv.org/abs/1710.00499v1
PDF http://arxiv.org/pdf/1710.00499v1.pdf
PWC https://paperswithcode.com/paper/online-control-of-the-false-discovery-rate
Repo https://github.com/fanny-yang/MABFDR
Framework none
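
For readers new to the area, here is a minimal alpha-investing loop in the spirit of the algorithms the paper generalizes (the spending rule and payout below are illustrative choices, not the GAI++ procedures):

```python
# A toy alpha-investing loop: spend alpha-wealth per test, earn it back on rejections.
def alpha_investing(p_values, wealth=0.025, payout=0.025):
    """Process p-values online; return the list of reject decisions."""
    decisions = []
    for p in p_values:
        level = wealth / 2.0                 # invest half the current wealth
        reject = p <= level
        if reject:
            wealth += payout                 # earn wealth back on a discovery
        else:
            wealth -= level / (1.0 - level)  # pay for the failed test
        decisions.append(reject)
    return decisions

print(alpha_investing([0.001, 0.8, 0.03, 0.5, 0.0001]))
```

Contribution (a) of the paper amounts to awarding more wealth per rejection than rules like this one, uniformly increasing power, and contribution (d) replaces the implicit "infinite memory" of the wealth process with an exponentially decaying one.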

Poincaré Embeddings for Learning Hierarchical Representations

Title Poincaré Embeddings for Learning Hierarchical Representations
Authors Maximilian Nickel, Douwe Kiela
Abstract Representation learning has become an invaluable approach for learning from symbolic data such as text and graphs. However, while complex symbolic datasets often exhibit a latent hierarchical structure, state-of-the-art methods typically learn embeddings in Euclidean vector spaces, which do not account for this property. For this purpose, we introduce a new approach for learning hierarchical representations of symbolic data by embedding them into hyperbolic space – or more precisely into an n-dimensional Poincaré ball. Due to the underlying hyperbolic geometry, this allows us to learn parsimonious representations of symbolic data by simultaneously capturing hierarchy and similarity. We introduce an efficient algorithm to learn the embeddings based on Riemannian optimization and show experimentally that Poincaré embeddings outperform Euclidean embeddings significantly on data with latent hierarchies, both in terms of representation capacity and in terms of generalization ability.
Tasks Graph Embedding, Representation Learning
Published 2017-05-22
URL http://arxiv.org/abs/1705.08039v2
PDF http://arxiv.org/pdf/1705.08039v2.pdf
PWC https://paperswithcode.com/paper/poincare-embeddings-for-learning-hierarchical
Repo https://github.com/TatsuyaShirakawa/poincare-embedding
Framework none
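
The two ingredients the abstract names, the Poincaré-ball distance and the Riemannian optimization step, fit in a short numpy sketch (the learning rate and projection epsilon are assumptions):

```python
# Poincaré distance and the Riemannian SGD update for points in the open unit ball.
import numpy as np

def poincare_distance(u, v):
    uu, vv, diff = u @ u, v @ v, u - v
    return np.arccosh(1 + 2 * (diff @ diff) / ((1 - uu) * (1 - vv)))

def riemannian_update(theta, euclidean_grad, lr=0.01, eps=1e-5):
    # conformal metric: Riemannian grad = ((1 - |theta|^2)^2 / 4) * Euclidean grad
    scale = (1 - theta @ theta) ** 2 / 4.0
    theta = theta - lr * scale * euclidean_grad
    norm = np.linalg.norm(theta)
    if norm >= 1:                         # project back into the open unit ball
        theta = theta / norm * (1 - eps)
    return theta

u, v = np.array([0.1, 0.2]), np.array([0.7, -0.5])
print(poincare_distance(u, v))
```

Note how the update shrinks near the boundary of the ball: points deep in the hierarchy (large norm) move slowly, which is part of why the geometry captures hierarchy so compactly.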

Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification

Title Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification
Authors Muhammad Zeshan Afzal, Andreas Kölsch, Sheraz Ahmed, Marcus Liwicki
Abstract We present an exhaustive investigation of recent Deep Learning architectures, algorithms, and strategies for the task of document image classification, ultimately reducing the error by more than half. Existing approaches, such as DeepDocClassifier, apply standard Convolutional Network architectures with transfer learning from the object recognition domain. The contribution of the paper is threefold: First, it investigates recently introduced very deep neural network architectures (GoogLeNet, VGG, ResNet) using transfer learning (from real images). Second, it proposes transfer learning from a huge set of document images, i.e., 400,000 documents. Third, it analyzes the impact of the amount of training data (document images) and other parameters on classification performance. We use two datasets, the Tobacco-3482 and the large-scale RVL-CDIP dataset. We achieve an accuracy of 91.13% on the Tobacco-3482 dataset, while earlier approaches reach only 77.6%; thus, a relative error reduction of more than 60% is achieved. On the large RVL-CDIP dataset, an accuracy of 90.97% is achieved, corresponding to a relative error reduction of 11.5%.
Tasks Document Image Classification, Image Classification, Object Recognition, Transfer Learning
Published 2017-04-11
URL http://arxiv.org/abs/1704.03557v1
PDF http://arxiv.org/pdf/1704.03557v1.pdf
PWC https://paperswithcode.com/paper/cutting-the-error-by-half-investigation-of
Repo https://github.com/microsoft/unilm/tree/master/layoutlm
Framework pytorch
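
The transfer-learning recipe the paper investigates looks roughly like this in torchvision (a hedged sketch: the fine-tuning schedule, the document-image pretraining stage, and the paper's exact architectures are omitted; with older torchvision, use `pretrained=True` instead of the `weights` argument):

```python
# Start from an ImageNet-pretrained network and swap the head for 10 document classes.
import torch.nn as nn
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 10)   # e.g., 10 Tobacco-3482 classes

# Optionally freeze the pretrained trunk and train only the new head first:
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")
```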

Efficient and principled score estimation with Nyström kernel exponential families

Title Efficient and principled score estimation with Nyström kernel exponential families
Authors Dougal J. Sutherland, Heiko Strathmann, Michael Arbel, Arthur Gretton
Abstract We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite-dimensional. The model is learned by fitting the derivative of the log density, the score, thus avoiding the need to compute a normalization constant. Our approach improves the computational efficiency of an earlier solution by using a low-rank, Nyström-like solution. The new solution retains the consistency and convergence rates of the full-rank solution (exactly in Fisher distance, and nearly in other distances), with guarantees on the degree of cost and storage reduction. We evaluate the method in experiments on density estimation and in the construction of an adaptive Hamiltonian Monte Carlo sampler. Compared to an existing score learning approach using a denoising autoencoder, our estimator is empirically more data-efficient when estimating the score, runs faster, and has fewer parameters (which can be tuned in a principled and interpretable way), in addition to providing statistical guarantees.
Tasks Denoising, Density Estimation
Published 2017-05-23
URL http://arxiv.org/abs/1705.08360v5
PDF http://arxiv.org/pdf/1705.08360v5.pdf
PWC https://paperswithcode.com/paper/efficient-and-principled-score-estimation
Repo https://github.com/karlnapf/nystrom-kexpfam
Framework tf
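
The low-rank device is the classic Nyström approximation; the sketch below shows only that ingredient, not the paper's score-matching estimator built on top of it:

```python
# Nyström: approximate an n x n kernel matrix from m << n inducing points.
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
m = 50
idx = rng.choice(len(X), size=m, replace=False)    # random Nyström subsample
K_nm = gaussian_kernel(X, X[idx])                  # (n, m)
K_mm = gaussian_kernel(X[idx], X[idx])             # (m, m)
K_approx = K_nm @ np.linalg.pinv(K_mm) @ K_nm.T    # rank-m approximation
K_full = gaussian_kernel(X, X)
print("relative error:", np.linalg.norm(K_approx - K_full) / np.linalg.norm(K_full))
```

Solving in the m-dimensional subspace rather than the full n-dimensional one is what delivers the cost and storage reductions the abstract guarantees.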

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Title Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Authors Christoph Dann, Tor Lattimore, Emma Brunskill
Abstract Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.
Tasks
Published 2017-03-22
URL http://arxiv.org/abs/1703.07710v3
PDF http://arxiv.org/pdf/1703.07710v3.pdf
PWC https://paperswithcode.com/paper/unifying-pac-and-regret-uniform-pac-bounds
Repo https://github.com/chrodan/FiniteEpisodicRL.jl
Framework none
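
A paraphrased statement of the Uniform-PAC property may help (the notation below is ours, not copied verbatim from the paper): a single high-probability event must cover every accuracy level at once.

```latex
% Uniform-PAC, paraphrased: with probability at least 1 - delta, for ALL
% epsilon simultaneously, the number of epsilon-suboptimal episodes is bounded.
\[
\Pr\!\Big(\forall\, \varepsilon > 0:\;
  \big|\{\, k : \Delta_k > \varepsilon \,\}\big|
  \;\le\; F\!\big(\tfrac{1}{\varepsilon}, \log\tfrac{1}{\delta}\big)
\Big) \;\ge\; 1 - \delta,
\]
% where Delta_k is the optimality gap of the policy played in episode k and
% F is polynomial in its arguments. Classical PAC fixes one epsilon in
% advance; quantifying over all epsilon at once is what lets Uniform-PAC
% imply both PAC and high-probability regret guarantees.
```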

A Comparison of Neural Models for Word Ordering

Title A Comparison of Neural Models for Word Ordering
Authors Eva Hasler, Felix Stahlberg, Marcus Tomalin, Adrià de Gispert, Bill Byrne
Abstract We compare several language models for the word-ordering task and propose a new bag-to-sequence neural model based on attention-based sequence-to-sequence models. We evaluate the model on a large German WMT data set where it significantly outperforms existing models. We also describe a novel search strategy for LM-based word ordering and report results on the English Penn Treebank. Our best model setup outperforms prior work both in terms of speed and quality.
Tasks
Published 2017-08-05
URL http://arxiv.org/abs/1708.01809v1
PDF http://arxiv.org/pdf/1708.01809v1.pdf
PWC https://paperswithcode.com/paper/a-comparison-of-neural-models-for-word
Repo https://github.com/ehasler/tensorflow
Framework tf
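
LM-based word ordering reduces to search over orderings of a bag of words under a language-model score; a toy beam search makes this concrete (the bigram scores below are made up, and the paper uses neural LMs and a bag-to-sequence attention model):

```python
# Beam search over orderings of a bag of words under a toy bigram LM score.
from collections import Counter

def order_words(bag, score, beam_size=4):
    beams = [((), Counter(bag), 0.0)]            # (sequence, remaining, logprob)
    for _ in range(len(bag)):
        candidates = []
        for seq, remaining, lp in beams:
            for w in remaining:
                nxt = remaining.copy()
                nxt[w] -= 1
                if nxt[w] == 0:
                    del nxt[w]
                candidates.append((seq + (w,), nxt, lp + score(seq, w)))
        beams = sorted(candidates, key=lambda b: -b[2])[:beam_size]
    return beams[0][0]

bigram = {("<s>", "the"): -0.1, ("the", "cat"): -0.2, ("cat", "sleeps"): -0.3}
def score(seq, w):
    prev = seq[-1] if seq else "<s>"
    return bigram.get((prev, w), -5.0)           # unseen bigrams are unlikely

print(order_words(["sleeps", "the", "cat"], score))  # ('the', 'cat', 'sleeps')
```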

Light-Head R-CNN: In Defense of Two-Stage Object Detector

Title Light-Head R-CNN: In Defense of Two-Stage Object Detector
Authors Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun
Abstract In this paper, we first investigate why typical two-stage methods are not as fast as single-stage, fast detectors like YOLO and SSD. We find that Faster R-CNN and R-FCN perform an intensive computation after or before RoI warping: Faster R-CNN involves two fully connected layers for RoI recognition, while R-FCN produces a large score map. Thus, the speed of these networks is slow due to the heavy-head design of the architecture, and even if we significantly reduce the base model, the computation cost cannot be decreased much. We propose a new two-stage detector, Light-Head R-CNN, to address this shortcoming in current two-stage approaches. In our design, we make the head of the network as light as possible, by using a thin feature map and a cheap R-CNN subnet (pooling and a single fully connected layer). Our ResNet-101 based Light-Head R-CNN outperforms state-of-the-art object detectors on COCO while keeping time efficiency. More importantly, by simply replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN gets 30.7 mmAP at 102 FPS on COCO, significantly outperforming single-stage, fast detectors like YOLO and SSD in both speed and accuracy. Code will be made publicly available.
Tasks
Published 2017-11-20
URL http://arxiv.org/abs/1711.07264v2
PDF http://arxiv.org/pdf/1711.07264v2.pdf
PWC https://paperswithcode.com/paper/light-head-r-cnn-in-defense-of-two-stage
Repo https://github.com/rickyHong/pytorch-light-head-rcnn-repl
Framework pytorch
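
The "light head" itself is easy to sketch: a thin feature map, RoI pooling, and a single fully connected layer. The PyTorch toy below substitutes torchvision's plain `roi_pool` for the paper's position-sensitive RoI pooling and a 1x1 convolution for its large separable convolution, so it is an approximation of the design, not a reimplementation (channel counts are assumptions):

```python
# Sketch of a light detection head: thin feature map -> RoI pooling -> one FC.
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

thin = nn.Conv2d(2048, 10 * 7 * 7, kernel_size=1)  # thin feature map (490 ch.)
fc = nn.Linear(10 * 7 * 7 * 7 * 7, 81)             # single cheap FC head

feat = torch.randn(1, 2048, 25, 25)                # backbone output (stride 16)
boxes = torch.tensor([[0, 40.0, 40.0, 200.0, 200.0]])  # (batch_idx, x1, y1, x2, y2)
pooled = roi_pool(thin(feat), boxes, output_size=(7, 7), spatial_scale=1 / 16)
logits = fc(pooled.flatten(1))                     # per-RoI class scores
print(logits.shape)                                # torch.Size([1, 81])
```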

Safe Mutations for Deep and Recurrent Neural Networks through Output Gradients

Title Safe Mutations for Deep and Recurrent Neural Networks through Output Gradients
Authors Joel Lehman, Jay Chen, Jeff Clune, Kenneth O. Stanley
Abstract While neuroevolution (evolving neural networks) has a successful track record across a variety of domains from reinforcement learning to artificial life, it is rarely applied to large, deep neural networks. A central reason is that while random mutation generally works in low dimensions, a random perturbation of thousands or millions of weights is likely to break existing functionality, providing no learning signal even if some individual weight changes were beneficial. This paper proposes a solution by introducing a family of safe mutation (SM) operators that aim within the mutation operator itself to find a degree of change that does not alter network behavior too much, but still facilitates exploration. Importantly, these SM operators do not require any additional interactions with the environment. The most effective SM variant capitalizes on the intriguing opportunity to scale the degree of mutation of each individual weight according to the sensitivity of the network’s outputs to that weight, which requires computing the gradient of outputs with respect to the weights (instead of the gradient of error, as in conventional deep learning). This safe mutation through gradients (SM-G) operator dramatically increases the ability of a simple genetic algorithm-based neuroevolution method to find solutions in high-dimensional domains that require deep and/or recurrent neural networks (which tend to be particularly brittle to mutation), including domains that require processing raw pixels. By improving our ability to evolve deep neural networks, this new safer approach to mutation expands the scope of domains amenable to neuroevolution.
Tasks Artificial Life
Published 2017-12-18
URL http://arxiv.org/abs/1712.06563v3
PDF http://arxiv.org/pdf/1712.06563v3.pdf
PWC https://paperswithcode.com/paper/safe-mutations-for-deep-and-recurrent-neural
Repo https://github.com/uber-research/safemutations
Framework pytorch
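
The SM-G operator is compact enough to sketch in PyTorch. Note one simplification: the gradient of the summed outputs below stands in for the per-output sensitivities the paper computes, and the network, inputs, and mutation scale are toy assumptions:

```python
# Safe mutation through gradients (simplified): scale each weight's perturbation
# by the sensitivity of the network's outputs to that weight.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 4))
x = torch.randn(32, 8)                    # reference inputs (e.g., recent states)

# Sensitivity proxy: gradient of the summed outputs w.r.t. each weight.
net.zero_grad()
net(x).sum().backward()

with torch.no_grad():
    for p in net.parameters():
        sensitivity = p.grad.abs() + 1e-8   # avoid division by zero
        p.add_(torch.randn_like(p) * 0.1 / sensitivity)  # per-weight-scaled mutation
```

Weights the outputs are sensitive to receive small perturbations, weights the outputs barely depend on receive large ones, so the mutated network still behaves similarly while exploration continues, and no extra environment interactions are needed.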