Paper Group AWR 136
Hyperparameter Optimization: A Spectral Approach
Title | Hyperparameter Optimization: A Spectral Approach |
Authors | Elad Hazan, Adam Klivans, Yang Yuan |
Abstract | We give a simple, fast algorithm for hyperparameter optimization inspired by techniques from the analysis of Boolean functions. We focus on the high-dimensional regime where the canonical example is training a neural network with a large number of hyperparameters. The algorithm — an iterative application of compressed sensing techniques for orthogonal polynomials — requires only uniform sampling of the hyperparameters and is thus easily parallelizable. Experiments for training deep neural networks on Cifar-10 show that compared to state-of-the-art tools (e.g., Hyperband and Spearmint), our algorithm finds significantly improved solutions, in some cases better than what is attainable by hand-tuning. In terms of overall running time (i.e., time required to sample various settings of hyperparameters plus additional computation time), we are at least an order of magnitude faster than Hyperband and Bayesian Optimization. We also outperform Random Search 8x. Additionally, our method comes with provable guarantees and yields the first improvements on the sample complexity of learning decision trees in over two decades. In particular, we obtain the first quasi-polynomial time algorithm for learning noisy decision trees with polynomial sample complexity. |
Tasks | Hyperparameter Optimization |
Published | 2017-06-02 |
URL | http://arxiv.org/abs/1706.00764v4 |
http://arxiv.org/pdf/1706.00764v4.pdf | |
PWC | https://paperswithcode.com/paper/hyperparameter-optimization-a-spectral |
Repo | https://github.com/callowbird/Harmonica |
Framework | none |
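To make the spectral idea concrete, here is a minimal sketch of one stage of the approach, assuming all hyperparameters are binary and encoded as ±1. scikit-learn's Lasso stands in for the paper's compressed-sensing solver, and `objective` is a placeholder for the expensive train-and-evaluate call; this is an illustration of the idea, not the Harmonica implementation linked above.

```python
from itertools import combinations
import numpy as np
from sklearn.linear_model import Lasso

def parity_features(X, degree):
    """All monomials (parities) of the +/-1 variables up to `degree`."""
    n = X.shape[1]
    feats, names = [np.ones(len(X))], [()]
    for d in range(1, degree + 1):
        for S in combinations(range(n), d):
            feats.append(np.prod(X[:, S], axis=1))
            names.append(S)
    return np.column_stack(feats), names

def spectral_stage(objective, n, samples=100, degree=2, alpha=0.1, top=5):
    # Uniform sampling of hyperparameter settings; trivially parallelizable.
    X = np.random.choice([-1.0, 1.0], size=(samples, n))
    y = np.array([objective(x) for x in X])
    Phi, names = parity_features(X, degree)
    lasso = Lasso(alpha=alpha).fit(Phi, y)  # sparse recovery of Fourier coefficients
    idx = np.argsort(-np.abs(lasso.coef_))[:top]
    # The few large coefficients identify the influential hyperparameters.
    return [(names[i], lasso.coef_[i]) for i in idx if names[i]]
```

A full run would fix the variables appearing in the recovered monomials to their best-performing values and recurse on the remaining ones.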
Neural Machine Translation
Title | Neural Machine Translation |
Authors | Philipp Koehn |
Abstract | Draft of a textbook chapter on neural machine translation: a comprehensive treatment of the topic, ranging from an introduction to neural networks and computation graphs, through a description of the currently dominant attentional sequence-to-sequence model, to recent refinements, alternative architectures, and open challenges. Written as a chapter for the textbook Statistical Machine Translation and used in the JHU Fall 2017 class on machine translation. |
Tasks | Machine Translation |
Published | 2017-09-22 |
URL | http://arxiv.org/abs/1709.07809v1 |
http://arxiv.org/pdf/1709.07809v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-machine-translation |
Repo | https://github.com/rsennrich/wmt16-scripts |
Framework | none |
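Since the chapter centers on the attentional sequence-to-sequence model, a tiny numpy illustration of the attention computation at a single decoder step may help. Dot-product scoring is used for brevity (the chapter also treats additive, Bahdanau-style scoring), and all shapes are illustrative.

```python
import numpy as np

def attention(decoder_state, encoder_states):
    """One decoder step of dot-product attention.

    decoder_state:  (d,)   current decoder hidden state
    encoder_states: (T, d) hidden states over the input sentence
    Returns the context vector and the attention weights.
    """
    scores = encoder_states @ decoder_state   # (T,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over source positions
    context = weights @ encoder_states        # weighted sum of encoder states
    return context, weights

ctx, w = attention(np.random.randn(8), np.random.randn(5, 8))
```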
A Novel Approach for Image Segmentation based on Histograms computed from Hue-data
Title | A Novel Approach for Image Segmentation based on Histograms computed from Hue-data |
Authors | Viraj Mavani, Ayesha Gurnani, Jhanvi Shah |
Abstract | Computer vision is growing day by day in terms of user-specific applications, and the first step of any such application is segmenting an image. In this paper, we propose a novel, grass-roots image segmentation algorithm for cases in which the background has a uniform color distribution. The algorithm can be used for images of flowers, birds, insects, and many other subjects photographed against such backgrounds. With this segmentation, a computer's visual understanding improves manifold, and it can even attain near-human accuracy during classification. |
Tasks | Semantic Segmentation |
Published | 2017-07-30 |
URL | http://arxiv.org/abs/1707.09643v1 |
http://arxiv.org/pdf/1707.09643v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-approach-for-image-segmentation-based |
Repo | https://github.com/Alihussain1/Image-Segmentation |
Framework | none |
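The method reduces to a few lines when the background hue is uniform: histogram the hue channel, treat the dominant bin as background, and keep everything else. A hedged OpenCV sketch follows; the bin count and band width are illustrative guesses, not values from the paper.

```python
import cv2
import numpy as np

def segment_by_hue(bgr_image, bins=36, band=1):
    """Mask out the dominant hue, assumed to be a uniform background."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    hue = hsv[:, :, 0]                            # OpenCV hue range: 0..179
    hist, _ = np.histogram(hue, bins=bins, range=(0, 180))
    k = int(np.argmax(hist))                      # dominant (background) bin
    lo = max(0, k - band) * (180 // bins)
    hi = min(bins, k + band + 1) * (180 // bins)
    background = (hue >= lo) & (hue < hi)
    mask = (~background).astype(np.uint8) * 255
    return cv2.bitwise_and(bgr_image, bgr_image, mask=mask)

# foreground = segment_by_hue(cv2.imread("flower.jpg"))
```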
Stabilizing GAN Training with Multiple Random Projections
Title | Stabilizing GAN Training with Multiple Random Projections |
Authors | Behnam Neyshabur, Srinadh Bhojanapalli, Ayan Chakrabarti |
Abstract | Training generative adversarial networks is unstable in high-dimensions as the true data distribution tends to be concentrated in a small fraction of the ambient space. The discriminator is then quickly able to classify nearly all generated samples as fake, leaving the generator without meaningful gradients and causing it to deteriorate after a point in training. In this work, we propose training a single generator simultaneously against an array of discriminators, each of which looks at a different random low-dimensional projection of the data. Individual discriminators, now provided with restricted views of the input, are unable to reject generated samples perfectly and continue to provide meaningful gradients to the generator throughout training. Meanwhile, the generator learns to produce samples consistent with the full data distribution to satisfy all discriminators simultaneously. We demonstrate the practical utility of this approach experimentally, and show that it is able to produce image samples with higher quality than traditional training with a single discriminator. |
Tasks | |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07831v2 |
http://arxiv.org/pdf/1705.07831v2.pdf | |
PWC | https://paperswithcode.com/paper/stabilizing-gan-training-with-multiple-random |
Repo | https://github.com/ayanc/rpgan |
Framework | tf |
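A condensed PyTorch sketch of the training scheme described in the abstract: K fixed random projections, one discriminator per projection, and a generator trained against the sum of all K losses. Network sizes and optimizer settings are placeholders, not the authors' configuration (their code, linked above, is in TensorFlow).

```python
import torch
import torch.nn as nn

D_DATA, D_PROJ, K, D_Z = 784, 64, 8, 100   # illustrative sizes

# One fixed random low-dimensional projection per discriminator.
projections = [torch.randn(D_DATA, D_PROJ) / D_PROJ ** 0.5 for _ in range(K)]

G = nn.Sequential(nn.Linear(D_Z, 256), nn.ReLU(), nn.Linear(256, D_DATA))
Ds = [nn.Sequential(nn.Linear(D_PROJ, 128), nn.ReLU(), nn.Linear(128, 1))
      for _ in range(K)]

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_ds = [torch.optim.Adam(D.parameters(), lr=2e-4) for D in Ds]

def step(real):                                # real: (batch, D_DATA)
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)
    fake = G(torch.randn(real.size(0), D_Z))
    # Each discriminator sees only its own random projection of the data.
    for D, P, opt in zip(Ds, projections, opt_ds):
        loss_d = bce(D(real @ P), ones) + bce(D(fake.detach() @ P), zeros)
        opt.zero_grad()
        loss_d.backward()
        opt.step()
    # The generator must satisfy all K restricted views simultaneously.
    loss_g = sum(bce(D(fake @ P), ones) for D, P in zip(Ds, projections))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```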
ShapeWorld - A new test methodology for multimodal language understanding
Title | ShapeWorld - A new test methodology for multimodal language understanding |
Authors | Alexander Kuhnle, Ann Copestake |
Abstract | We introduce a novel framework for evaluating multimodal deep learning models with respect to their language understanding and generalization abilities. In this approach, artificial data is automatically generated according to the experimenter’s specifications. The content of the data, both during training and evaluation, can be controlled in detail, which enables tasks to be created that require true generalization abilities, in particular the combination of previously introduced concepts in novel ways. We demonstrate the potential of our methodology by evaluating various visual question answering models on four different tasks, and show how our framework gives us detailed insights into their capabilities and limitations. By open-sourcing our framework, we hope to stimulate progress in the field of multimodal language understanding. |
Tasks | Visual Question Answering |
Published | 2017-04-14 |
URL | http://arxiv.org/abs/1704.04517v1 |
http://arxiv.org/pdf/1704.04517v1.pdf | |
PWC | https://paperswithcode.com/paper/shapeworld-a-new-test-methodology-for |
Repo | https://github.com/AlexKuhnle/ShapeWorld |
Framework | tf |
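To illustrate the data-generation idea (not the actual ShapeWorld grammar or API): sample a symbolic world of colored shapes, render it, and emit a caption together with its truth value. Because the world is symbolic, train and evaluation concept combinations can be controlled exactly.

```python
import random
import numpy as np

COLORS = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}
SHAPES = ["square", "circle"]

def draw(world, size=32):
    """Render a symbolic world of (color, shape, center) entities."""
    img = np.zeros((size, size, 3), np.uint8)
    yy, xx = np.mgrid[:size, :size]
    for color, shape, (cx, cy) in world:
        if shape == "square":
            mask = (abs(xx - cx) < 4) & (abs(yy - cy) < 4)
        else:
            mask = (xx - cx) ** 2 + (yy - cy) ** 2 < 16
        img[mask] = COLORS[color]
    return img

def sample():
    world = [(random.choice(list(COLORS)), random.choice(SHAPES),
              (random.randint(6, 25), random.randint(6, 25)))
             for _ in range(random.randint(1, 3))]
    color, shape = random.choice(list(COLORS)), random.choice(SHAPES)
    caption = f"there is a {color} {shape}"
    truth = any(c == color and s == shape for c, s, _ in world)
    return draw(world), caption, truth
```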
Tensor-on-tensor regression
Title | Tensor-on-tensor regression |
Authors | Eric F. Lock |
Abstract | We propose a framework for the linear prediction of a multi-way array (i.e., a tensor) from another multi-way array of arbitrary dimension, using the contracted tensor product. This framework generalizes several existing approaches, including methods to predict a scalar outcome from a tensor, a matrix from a matrix, or a tensor from a scalar. We describe an approach that exploits the multiway structure of both the predictors and the outcomes by restricting the coefficients to have reduced CP-rank. We propose a general and efficient algorithm for penalized least-squares estimation, which allows for a ridge (L_2) penalty on the coefficients. The objective is shown to give the mode of a Bayesian posterior, which motivates a Gibbs sampling algorithm for inference. We illustrate the approach with an application to facial image data. An R package is available at https://github.com/lockEF/MultiwayRegression . |
Tasks | |
Published | 2017-01-04 |
URL | http://arxiv.org/abs/1701.01037v2 |
http://arxiv.org/pdf/1701.01037v2.pdf | |
PWC | https://paperswithcode.com/paper/tensor-on-tensor-regression |
Repo | https://github.com/lockEF/MultiwayRegression |
Framework | none |
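The forward map is compact enough to state directly: with the coefficient array B constrained to CP-rank R, the contracted tensor product becomes a single einsum. Shapes below are illustrative, and the paper's penalized least-squares estimation of the factors is omitted.

```python
import numpy as np

def predict(X, U, V, W):
    """Contracted tensor product <X, B> with B in rank-R CP form.

    X: (n, p1, p2) predictor array
    B[p1, p2, q] = sum_r U[p1, r] * V[p2, r] * W[q, r]
    Returns Y_hat of shape (n, q): contraction over the p1, p2 modes.
    """
    return np.einsum("npq,pr,qr,mr->nm", X, U, V, W)

n, p1, p2, q, R = 50, 6, 7, 4, 3
X = np.random.randn(n, p1, p2)
U, V, W = (np.random.randn(d, R) for d in (p1, p2, q))
Y_hat = predict(X, U, V, W)   # (50, 4)
```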
Generate To Adapt: Aligning Domains using Generative Adversarial Networks
Title | Generate To Adapt: Aligning Domains using Generative Adversarial Networks |
Authors | Swami Sankaranarayanan, Yogesh Balaji, Carlos D. Castillo, Rama Chellappa |
Abstract | Domain Adaptation is an actively researched problem in Computer Vision. In this work, we propose an approach that leverages unsupervised data to bring the source and target distributions closer in a learned joint feature space. We accomplish this by inducing a symbiotic relationship between the learned embedding and a generative adversarial network. This is in contrast to methods which use the adversarial framework for realistic data generation and retraining deep models with such data. We demonstrate the strength and generality of our approach by performing experiments on three different tasks with varying levels of difficulty: (1) Digit classification (MNIST, SVHN and USPS datasets) (2) Object recognition using OFFICE dataset and (3) Domain adaptation from synthetic to real data. Our method achieves state-of-the-art performance in most experimental settings and is, to our knowledge, the only GAN-based method that has been shown to work well across different datasets such as OFFICE and DIGITS. |
Tasks | Domain Adaptation, Object Recognition |
Published | 2017-04-06 |
URL | http://arxiv.org/abs/1704.01705v4 |
http://arxiv.org/pdf/1704.01705v4.pdf | |
PWC | https://paperswithcode.com/paper/generate-to-adapt-aligning-domains-using |
Repo | https://github.com/watay147/tele_project |
Framework | pytorch |
Online control of the false discovery rate with decaying memory
Title | Online control of the false discovery rate with decaying memory |
Authors | Aaditya Ramdas, Fanny Yang, Martin J. Wainwright, Michael I. Jordan |
Abstract | In the online multiple testing problem, p-values corresponding to different null hypotheses are observed one by one, and the decision of whether or not to reject the current hypothesis must be made immediately, after which the next p-value is observed. Alpha-investing algorithms to control the false discovery rate (FDR), formulated by Foster and Stine, have been generalized and applied to many settings, including quality-preserving databases in science and multiple A/B or multi-armed bandit tests for internet commerce. This paper improves the class of generalized alpha-investing algorithms (GAI) in four ways: (a) we show how to uniformly improve the power of the entire class of monotone GAI procedures by awarding more alpha-wealth for each rejection, giving a win-win resolution to a recent dilemma raised by Javanmard and Montanari, (b) we demonstrate how to incorporate prior weights to indicate domain knowledge of which hypotheses are likely to be non-null, (c) we allow for differing penalties for false discoveries to indicate that some hypotheses may be more important than others, (d) we define a new quantity called the decaying memory false discovery rate (mem-FDR) that may be more meaningful for truly temporal applications, and which alleviates problems that we describe and refer to as “piggybacking” and “alpha-death”. Our GAI++ algorithms incorporate all four generalizations simultaneously, and reduce to more powerful variants of earlier algorithms when the weights and decay are all set to unity. Finally, we also describe a simple method to derive new online FDR rules based on an estimated false discovery proportion. |
Tasks | |
Published | 2017-10-02 |
URL | http://arxiv.org/abs/1710.00499v1 |
http://arxiv.org/pdf/1710.00499v1.pdf | |
PWC | https://paperswithcode.com/paper/online-control-of-the-false-discovery-rate |
Repo | https://github.com/fanny-yang/MABFDR |
Framework | none |
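For orientation, here is a toy alpha-investing loop in the spirit of the GAI family, with a multiplicative wealth decay marking where the decaying-memory idea enters. This is an illustrative simplification, not the paper's GAI++ procedure; the spending and payout schedules are arbitrary constants.

```python
class AlphaInvesting:
    """Toy alpha-investing (Foster-Stine style); not the paper's GAI++ rules."""

    def __init__(self, w0=0.05, payout=0.045, decay=1.0):
        self.wealth, self.payout, self.decay = w0, payout, decay

    def test(self, p_value):
        self.wealth *= self.decay               # decaying memory enters here
        alpha_t = self.wealth / 2               # spend half the current wealth
        reject = p_value <= alpha_t
        if reject:
            self.wealth += self.payout          # earn alpha-wealth on rejection
        else:
            self.wealth -= alpha_t / (1 - alpha_t)
        return reject

proc = AlphaInvesting(decay=0.99)
decisions = [proc.test(p) for p in (0.001, 0.20, 0.04, 0.60)]
```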
Poincaré Embeddings for Learning Hierarchical Representations
Title | Poincaré Embeddings for Learning Hierarchical Representations |
Authors | Maximilian Nickel, Douwe Kiela |
Abstract | Representation learning has become an invaluable approach for learning from symbolic data such as text and graphs. However, while complex symbolic datasets often exhibit a latent hierarchical structure, state-of-the-art methods typically learn embeddings in Euclidean vector spaces, which do not account for this property. For this purpose, we introduce a new approach for learning hierarchical representations of symbolic data by embedding them into hyperbolic space – or more precisely into an n-dimensional Poincaré ball. Due to the underlying hyperbolic geometry, this allows us to learn parsimonious representations of symbolic data by simultaneously capturing hierarchy and similarity. We introduce an efficient algorithm to learn the embeddings based on Riemannian optimization and show experimentally that Poincaré embeddings outperform Euclidean embeddings significantly on data with latent hierarchies, both in terms of representation capacity and in terms of generalization ability. |
Tasks | Graph Embedding, Representation Learning |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.08039v2 |
http://arxiv.org/pdf/1705.08039v2.pdf | |
PWC | https://paperswithcode.com/paper/poincare-embeddings-for-learning-hierarchical |
Repo | https://github.com/TatsuyaShirakawa/poincare-embedding |
Framework | none |
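Two formulas carry most of the method and are short enough to sketch in numpy: the Poincaré distance, and the Riemannian SGD step that rescales the Euclidean gradient by (1 - ||θ||²)²/4 and retracts back into the open unit ball. The loss and negative sampling from the paper are omitted.

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance between two points in the Poincare ball."""
    diff = u - v
    return np.arccosh(1 + 2 * (diff @ diff) /
                      ((1 - u @ u) * (1 - v @ v)))

def rsgd_step(theta, euclidean_grad, lr=0.01, eps=1e-5):
    """One Riemannian SGD update in the Poincare ball."""
    # The conformal metric gives this Euclidean-to-Riemannian rescaling.
    riem_grad = ((1 - theta @ theta) ** 2 / 4) * euclidean_grad
    theta = theta - lr * riem_grad
    norm = np.linalg.norm(theta)
    if norm >= 1:                     # retract strictly inside the unit ball
        theta = (1 - eps) * theta / norm
    return theta
```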
Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification
Title | Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification |
Authors | Muhammad Zeshan Afzal, Andreas Kölsch, Sheraz Ahmed, Marcus Liwicki |
Abstract | We present an exhaustive investigation of recent Deep Learning architectures, algorithms, and strategies for the task of document image classification to finally reduce the error by more than half. Existing approaches, such as the DeepDocClassifier, apply standard Convolutional Network architectures with transfer learning from the object recognition domain. The contribution of the paper is threefold: First, it investigates recently introduced very deep neural network architectures (GoogLeNet, VGG, ResNet) using transfer learning (from real images). Second, it proposes transfer learning from a huge set of document images, i.e. 400,000 documents. Third, it analyzes the impact of the amount of training data (document images) and other parameters on classification performance. We use two datasets, the Tobacco-3482 and the large-scale RVL-CDIP dataset. We achieve an accuracy of 91.13% for the Tobacco-3482 dataset while earlier approaches reach only 77.6%. Thus, a relative error reduction of more than 60% is achieved. For the large dataset RVL-CDIP, an accuracy of 90.97% is achieved, corresponding to a relative error reduction of 11.5%. |
Tasks | Document Image Classification, Image Classification, Object Recognition, Transfer Learning |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03557v1 |
http://arxiv.org/pdf/1704.03557v1.pdf | |
PWC | https://paperswithcode.com/paper/cutting-the-error-by-half-investigation-of |
Repo | https://github.com/microsoft/unilm/tree/master/layoutlm |
Framework | pytorch |
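The transfer-learning recipe itself is standard and easy to sketch with torchvision: start from weights pretrained on real images (or, per the paper's second contribution, on a large document-image corpus) and swap the classifier head for the ten Tobacco-3482 classes. This is a generic fine-tuning sketch, not the authors' exact training setup; learning rates are illustrative.

```python
import torch
import torch.nn as nn
import torchvision.models as models

NUM_CLASSES = 10                     # Tobacco-3482 has ten document classes

model = models.resnet50(pretrained=True)          # ImageNet initialization
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Fine-tune the whole network, with a larger learning rate on the fresh head.
head_ids = {id(p) for p in model.fc.parameters()}
body = [p for p in model.parameters() if id(p) not in head_ids]
optimizer = torch.optim.SGD(
    [{"params": body, "lr": 1e-4},
     {"params": model.fc.parameters(), "lr": 1e-3}],
    momentum=0.9)
criterion = nn.CrossEntropyLoss()
```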
Efficient and principled score estimation with Nyström kernel exponential families
Title | Efficient and principled score estimation with Nyström kernel exponential families |
Authors | Dougal J. Sutherland, Heiko Strathmann, Michael Arbel, Arthur Gretton |
Abstract | We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite-dimensional. The model is learned by fitting the derivative of the log density, the score, thus avoiding the need to compute a normalization constant. Our approach improves the computational efficiency of an earlier solution by using a low-rank, Nyström-like solution. The new solution retains the consistency and convergence rates of the full-rank solution (exactly in Fisher distance, and nearly in other distances), with guarantees on the degree of cost and storage reduction. We evaluate the method in experiments on density estimation and in the construction of an adaptive Hamiltonian Monte Carlo sampler. Compared to an existing score learning approach using a denoising autoencoder, our estimator is empirically more data-efficient when estimating the score, runs faster, and has fewer parameters (which can be tuned in a principled and interpretable way), in addition to providing statistical guarantees. |
Tasks | Denoising, Density Estimation |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08360v5 |
http://arxiv.org/pdf/1705.08360v5.pdf | |
PWC | https://paperswithcode.com/paper/efficient-and-principled-score-estimation |
Repo | https://github.com/karlnapf/nystrom-kexpfam |
Framework | tf |
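The Nyström step that delivers the speedup can be shown in isolation: approximate the full n×n kernel matrix from m inducing points. The Gaussian kernel and sizes below are illustrative; the score-matching objective built on top of this approximation is in the paper.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def nystrom(X, m=50, sigma=1.0, jitter=1e-8):
    """Low-rank approximation K ~= K_nm K_mm^{-1} K_mn from m inducing points."""
    Z = X[np.random.choice(len(X), m, replace=False)]
    K_nm = gaussian_kernel(X, Z, sigma)
    K_mm = gaussian_kernel(Z, Z, sigma) + jitter * np.eye(m)
    return K_nm @ np.linalg.solve(K_mm, K_nm.T)

X = np.random.randn(500, 2)
K_approx = nystrom(X)        # (500, 500) matrix of rank at most 50
```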
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Title | Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning |
Authors | Christoph Dann, Tor Lattimore, Emma Brunskill |
Abstract | Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon. |
Tasks | |
Published | 2017-03-22 |
URL | http://arxiv.org/abs/1703.07710v3 |
http://arxiv.org/pdf/1703.07710v3.pdf | |
PWC | https://paperswithcode.com/paper/unifying-pac-and-regret-uniform-pac-bounds |
Repo | https://github.com/chrodan/FiniteEpisodicRL.jl |
Framework | none |
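In symbols, the Uniform-PAC condition sketched in the abstract reads roughly as follows; this is a hedged restatement, and the exact constants and the dependence on states, actions, and horizon are in the paper.

```latex
% N_eps counts episodes k whose policy pi_k is more than eps-suboptimal.
N_\epsilon := \big|\{\, k : V^* - V^{\pi_k} > \epsilon \,\}\big|,
\qquad
\Pr\!\Big[\exists\, \epsilon > 0 :\;
    N_\epsilon > F\big(\tfrac{1}{\epsilon}, \log\tfrac{1}{\delta}\big)\Big]
\;\le\; \delta
```

for some polynomial F. Because the bound holds for all ε simultaneously, fixing ε recovers a PAC guarantee, while integrating over ε yields a high-probability regret bound.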
A Comparison of Neural Models for Word Ordering
Title | A Comparison of Neural Models for Word Ordering |
Authors | Eva Hasler, Felix Stahlberg, Marcus Tomalin, Adrià de Gispert, Bill Byrne |
Abstract | We compare several language models for the word-ordering task and propose a new bag-to-sequence neural model based on attention-based sequence-to-sequence models. We evaluate the model on a large German WMT data set where it significantly outperforms existing models. We also describe a novel search strategy for LM-based word ordering and report results on the English Penn Treebank. Our best model setup outperforms prior work both in terms of speed and quality. |
Tasks | |
Published | 2017-08-05 |
URL | http://arxiv.org/abs/1708.01809v1 |
http://arxiv.org/pdf/1708.01809v1.pdf | |
PWC | https://paperswithcode.com/paper/a-comparison-of-neural-models-for-word |
Repo | https://github.com/ehasler/tensorflow |
Framework | tf |
Light-Head R-CNN: In Defense of Two-Stage Object Detector
Title | Light-Head R-CNN: In Defense of Two-Stage Object Detector |
Authors | Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun |
Abstract | In this paper, we first investigate why typical two-stage methods are not as fast as single-stage, fast detectors like YOLO and SSD. We find that Faster R-CNN and R-FCN perform an intensive computation after or before RoI warping. Faster R-CNN involves two fully connected layers for RoI recognition, while R-FCN produces large score maps. Thus, these networks are slow due to the heavy-head design of the architecture. Even if we significantly reduce the base model, the computation cost cannot be largely decreased accordingly. We propose a new two-stage detector, Light-Head R-CNN, to address this shortcoming in current two-stage approaches. In our design, we make the head of the network as light as possible, by using a thin feature map and a cheap R-CNN subnet (pooling and a single fully connected layer). Our ResNet-101-based Light-Head R-CNN outperforms state-of-the-art object detectors on COCO while keeping time efficiency. More importantly, by simply replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN gets 30.7 mmAP at 102 FPS on COCO, significantly outperforming the single-stage, fast detectors like YOLO and SSD on both speed and accuracy. Code will be made publicly available. |
Tasks | |
Published | 2017-11-20 |
URL | http://arxiv.org/abs/1711.07264v2 |
http://arxiv.org/pdf/1711.07264v2.pdf | |
PWC | https://paperswithcode.com/paper/light-head-r-cnn-in-defense-of-two-stage |
Repo | https://github.com/rickyHong/pytorch-light-head-rcnn-repl |
Framework | pytorch |
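A PyTorch sketch of the "light head" itself: a large separable convolution producing a thin score map (the paper uses 490 = 10 x 7 x 7 channels), RoI pooling, and a single cheap fully connected layer. torchvision's RoIAlign substitutes here for the paper's PS RoI pooling, and the paper sums two separable-convolution branches rather than the one shown, so this is an approximation of the design, not a reimplementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import RoIAlign

class LightHead(nn.Module):
    """Thin feature map + single-FC R-CNN subnet (sketch)."""

    def __init__(self, c_in=2048, thin=490, pool=7, num_classes=81):
        super().__init__()
        k, mid = 15, 256
        # Large separable convolution: k x 1 followed by 1 x k, thin output.
        self.sep = nn.Sequential(
            nn.Conv2d(c_in, mid, (k, 1), padding=(k // 2, 0)),
            nn.Conv2d(mid, thin, (1, k), padding=(0, k // 2)),
        )
        self.roi = RoIAlign((pool, pool), spatial_scale=1 / 16, sampling_ratio=2)
        self.fc = nn.Linear(thin * pool * pool, 2048)   # the single cheap FC
        self.cls = nn.Linear(2048, num_classes)
        self.reg = nn.Linear(2048, 4 * num_classes)

    def forward(self, feat, rois):   # rois: (R, 5) = (batch_idx, x1, y1, x2, y2)
        x = self.roi(self.sep(feat), rois)
        x = self.fc(x.flatten(1)).relu()
        return self.cls(x), self.reg(x)
```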
Safe Mutations for Deep and Recurrent Neural Networks through Output Gradients
Title | Safe Mutations for Deep and Recurrent Neural Networks through Output Gradients |
Authors | Joel Lehman, Jay Chen, Jeff Clune, Kenneth O. Stanley |
Abstract | While neuroevolution (evolving neural networks) has a successful track record across a variety of domains from reinforcement learning to artificial life, it is rarely applied to large, deep neural networks. A central reason is that while random mutation generally works in low dimensions, a random perturbation of thousands or millions of weights is likely to break existing functionality, providing no learning signal even if some individual weight changes were beneficial. This paper proposes a solution by introducing a family of safe mutation (SM) operators that aim within the mutation operator itself to find a degree of change that does not alter network behavior too much, but still facilitates exploration. Importantly, these SM operators do not require any additional interactions with the environment. The most effective SM variant capitalizes on the intriguing opportunity to scale the degree of mutation of each individual weight according to the sensitivity of the network’s outputs to that weight, which requires computing the gradient of outputs with respect to the weights (instead of the gradient of error, as in conventional deep learning). This safe mutation through gradients (SM-G) operator dramatically increases the ability of a simple genetic algorithm-based neuroevolution method to find solutions in high-dimensional domains that require deep and/or recurrent neural networks (which tend to be particularly brittle to mutation), including domains that require processing raw pixels. By improving our ability to evolve deep neural networks, this new safer approach to mutation expands the scope of domains amenable to neuroevolution. |
Tasks | Artificial Life |
Published | 2017-12-18 |
URL | http://arxiv.org/abs/1712.06563v3 |
http://arxiv.org/pdf/1712.06563v3.pdf | |
PWC | https://paperswithcode.com/paper/safe-mutations-for-deep-and-recurrent-neural |
Repo | https://github.com/uber-research/safemutations |
Framework | pytorch |
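The SM-G operator is straightforward to sketch in PyTorch: accumulate, per weight, the squared gradients of each output dimension (the gradient of outputs, not of an error), and divide the Gaussian perturbation by the resulting sensitivity. The architecture, batch, and scaling constant below are illustrative, and the batch-summed gradient is a simplification of the per-example sensitivities.

```python
import torch
import torch.nn as nn

def safe_mutation(net, inputs, sigma=0.1, eps=1e-8):
    """Perturb weights inversely to output sensitivity (SM-G style sketch)."""
    out = net(inputs)
    sens = [torch.zeros_like(p) for p in net.parameters()]
    for j in range(out.shape[1]):      # gradient of each output dimension
        grads = torch.autograd.grad(out[:, j].sum(), net.parameters(),
                                    retain_graph=True)
        for s, g in zip(sens, grads):
            s += g ** 2
    with torch.no_grad():
        for p, s in zip(net.parameters(), sens):
            # Weights the outputs are sensitive to receive smaller mutations.
            p += sigma * torch.randn_like(p) / (s.sqrt() + eps)

net = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 4))
safe_mutation(net, torch.randn(64, 10))
```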