July 29, 2019

2832 words 14 mins read

Paper Group AWR 183


Convolutional Networks for Spherical Signals

Title Convolutional Networks for Spherical Signals
Authors Taco Cohen, Mario Geiger, Jonas Köhler, Max Welling
Abstract The success of convolutional networks in learning problems involving planar signals such as images is due to their ability to exploit the translation symmetry of the data distribution through weight sharing. Many areas of science and engineering deal with signals with other symmetries, such as rotation-invariant data on the sphere. Examples include climate and weather science, astrophysics, and chemistry. In this paper we present spherical convolutional networks. These networks use convolutions on the sphere and rotation group, which results in rotational weight sharing and rotation equivariance. Using a synthetic spherical MNIST dataset, we show that spherical convolutional networks are very effective at dealing with rotationally invariant classification problems.
Tasks
Published 2017-09-14
URL http://arxiv.org/abs/1709.04893v2
PDF http://arxiv.org/pdf/1709.04893v2.pdf
PWC https://paperswithcode.com/paper/convolutional-networks-for-spherical-signals
Repo https://github.com/jonas-koehler/s2cnn
Framework pytorch
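
The core idea, equivariance through weight sharing over rotations, can be illustrated without spherical harmonics. The sketch below is a minimal planar analogy on the cyclic group C4 (quarter-turn rotations), not the paper's actual sphere/SO(3) convolution; `c4_equivariant_conv` is a hypothetical helper written only for this illustration.

```python
# Minimal analogy to spherical CNNs on the cyclic rotation group C4:
# correlating an image with all four rotated copies of one shared kernel
# yields a stack of feature maps that transforms predictably when the
# input is rotated (rotation equivariance via rotational weight sharing).
import numpy as np
from scipy.signal import correlate2d

def c4_equivariant_conv(image, kernel):
    """One feature map per quarter-turn rotation of the shared kernel."""
    return np.stack([
        correlate2d(image, np.rot90(kernel, k), mode="same")
        for k in range(4)
    ])

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
maps = c4_equivariant_conv(image, kernel)

# Rotating the input rotates each map and cyclically shifts the stack --
# the property the paper generalizes from the plane to the sphere.
rotated_maps = c4_equivariant_conv(np.rot90(image), kernel)
assert np.allclose(rotated_maps,
                   np.stack([np.rot90(m) for m in np.roll(maps, 1, axis=0)]))
```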

Image Restoration by Iterative Denoising and Backward Projections

Title Image Restoration by Iterative Denoising and Backward Projections
Authors Tom Tirer, Raja Giryes
Abstract Inverse problems appear in many applications, such as image deblurring and inpainting. The common approach to address them is to design a specific algorithm for each problem. The Plug-and-Play (P&P) framework, which has been recently introduced, allows solving general inverse problems by leveraging the impressive capabilities of existing denoising algorithms. While this fresh strategy has found many applications, a burdensome parameter tuning is often required in order to obtain high-quality results. In this work, we propose an alternative method for solving inverse problems using off-the-shelf denoisers, which requires less parameter tuning. First, we transform a typical cost function, composed of fidelity and prior terms, into a closely related, novel optimization problem. Then, we propose an efficient minimization scheme with a plug-and-play property, i.e., the prior term is handled solely by a denoising operation. Finally, we present an automatic tuning mechanism to set the method’s parameters. We provide a theoretical analysis of the method, and empirically demonstrate its competitiveness with task-specific techniques and the P&P approach for image inpainting and deblurring.
Tasks Deblurring, Denoising, Image Inpainting, Image Restoration
Published 2017-10-18
URL http://arxiv.org/abs/1710.06647v4
PDF http://arxiv.org/pdf/1710.06647v4.pdf
PWC https://paperswithcode.com/paper/image-restoration-by-iterative-denoising-and
Repo https://github.com/tomtirer/IDBP
Framework none
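
A plug-and-play iteration of this flavor is easy to sketch for inpainting. The snippet below is not the authors' exact IDBP algorithm, only a hedged illustration of alternating a denoising step with a backward projection onto the observed data; a Gaussian blur stands in for a real off-the-shelf denoiser.

```python
# Hedged sketch of an IDBP-style plug-and-play loop for inpainting:
# alternate denoising (the prior) with re-imposing observed pixels
# (the backward projection onto the data).
import numpy as np
from scipy.ndimage import gaussian_filter

def inpaint(observed, mask, iterations=50, sigma=1.0):
    """observed: image with holes; mask: True where pixels were measured."""
    x = observed.copy()
    x[~mask] = observed[mask].mean()   # crude initialization of the holes
    for _ in range(iterations):
        x = gaussian_filter(x, sigma)  # prior handled purely by a denoiser
        x[mask] = observed[mask]       # project back onto the measurements
    return x
```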

Population Based Training of Neural Networks

Title Population Based Training of Neural Networks
Authors Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, Koray Kavukcuoglu
Abstract Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss function, and optimisation algorithm. In this work we present \emph{Population Based Training (PBT)}, a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. With just a small modification to a typical distributed hyperparameter training framework, our method allows robust and reliable training of models. We demonstrate the effectiveness of PBT on deep reinforcement learning problems, showing faster wall-clock convergence and higher final performance of agents by optimising over a suite of hyperparameters. In addition, we show the same method can be applied to supervised learning for machine translation, where PBT is used to maximise the BLEU score directly, and also to training of Generative Adversarial Networks to maximise the Inception score of generated images. In all cases PBT results in the automatic discovery of hyperparameter schedules and model selection which results in stable training and better final performance.
Tasks Machine Translation, Model Selection
Published 2017-11-27
URL http://arxiv.org/abs/1711.09846v2
PDF http://arxiv.org/pdf/1711.09846v2.pdf
PWC https://paperswithcode.com/paper/population-based-training-of-neural-networks
Repo https://github.com/cipher813/pbt
Framework none
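
The exploit/explore loop at the heart of PBT fits in a few lines. The toy below is a minimal reconstruction on a quadratic objective, not DeepMind's distributed implementation: workers train in parallel, and periodically the worst performers copy the weights of the best (exploit) and perturb the copied hyperparameters (explore).

```python
# Toy PBT loop: a population jointly optimizes weights (theta) and a
# hyperparameter (the learning rate) on f(theta) = theta**2.
import random

def sgd_step(theta, lr):
    return theta - lr * 2 * theta  # gradient of theta**2 is 2*theta

population = [{"theta": random.uniform(-5, 5),
               "lr": 10 ** random.uniform(-3, -0.5)} for _ in range(8)]

for t in range(200):
    for w in population:
        w["theta"] = sgd_step(w["theta"], w["lr"])
    if t % 20 == 19:                                    # exploit/explore
        population.sort(key=lambda w: w["theta"] ** 2)  # lowest loss first
        for loser, winner in zip(population[-2:], population[:2]):
            loser["theta"] = winner["theta"]            # exploit: copy weights
            loser["lr"] = winner["lr"] * random.choice([0.8, 1.2])  # explore

best = min(population, key=lambda w: w["theta"] ** 2)
print(f"best loss {best['theta'] ** 2:.2e} at lr {best['lr']:.3g}")
```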

Feature selection in high-dimensional dataset using MapReduce

Title Feature selection in high-dimensional dataset using MapReduce
Authors Claudio Reggiani, Yann-Aël Le Borgne, Gianluca Bontempi
Abstract This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features.
Tasks Feature Selection
Published 2017-09-07
URL http://arxiv.org/abs/1709.02327v1
PDF http://arxiv.org/pdf/1709.02327v1.pdf
PWC https://paperswithcode.com/paper/feature-selection-in-high-dimensional-dataset
Repo https://github.com/creggian/spark-ifs
Framework none
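
The algorithm being distributed is greedy mRMR. A single-machine sketch of the selection loop is below; the MapReduce partitioning itself is omitted, and scikit-learn's mutual-information estimators stand in for whatever estimator the Spark implementation uses.

```python
# Greedy minimum-Redundancy-Maximum-Relevance selection (single machine).
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, k):
    relevance = mutual_info_classif(X, y)       # MI(feature; label)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info_regression(X[:, [j]], X[:, s])[0]
                                  for s in selected])
            score = relevance[j] - redundancy   # max relevance, min redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```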

Inferring Generative Model Structure with Static Analysis

Title Inferring Generative Model Structure with Static Analysis
Authors Paroma Varma, Bryan He, Payal Bajaj, Imon Banerjee, Nishith Khandwala, Daniel L. Rubin, Christopher Ré
Abstract Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The structure of these models affects training label quality, but is difficult to learn without any ground truth labels. We instead rely on these weak supervision sources having some structure by virtue of being encoded programmatically. We present Coral, a paradigm that infers generative model structure by statically analyzing the code for these heuristics, thus reducing the data required to learn structure significantly. We prove that Coral’s sample complexity scales quasilinearly with the number of heuristics and number of relations found, improving over the standard sample complexity, which is exponential in $n$ for identifying $n^{\textrm{th}}$ degree relations. Experimentally, Coral matches or outperforms traditional structure learning approaches by up to 3.81 F1 points. Using Coral to model dependencies instead of assuming independence results in better performance than a fully supervised model by 3.07 accuracy points when heuristics are used to label radiology data without ground truth labels.
Tasks
Published 2017-09-07
URL http://arxiv.org/abs/1709.02477v1
PDF http://arxiv.org/pdf/1709.02477v1.pdf
PWC https://paperswithcode.com/paper/inferring-generative-model-structure-with
Repo https://github.com/HazyResearch/babble
Framework none
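
Coral's key move, reading dependencies between heuristics out of their source code, can be mimicked with Python's `ast` module. The sketch below is a deliberate simplification (the heuristics and the shared-primitive criterion are invented for illustration): two heuristics written over the same input primitives get a dependency edge.

```python
# Infer dependency structure between labeling heuristics by static
# analysis: heuristics sharing input primitives are marked as dependent.
import ast
import inspect
from itertools import combinations

def area(width, height):    return 1 if width * height > 50 else -1
def is_wide(width, height): return 1 if width / height > 2 else -1
def is_dark(intensity):     return 1 if intensity < 0.3 else -1

def primitives_used(fn):
    """Parameter names that are actually referenced in the function body."""
    tree = ast.parse(inspect.getsource(fn))
    params = {a.arg for a in tree.body[0].args.args}
    return {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)} & params

for f, g in combinations([area, is_wide, is_dark], 2):
    shared = primitives_used(f) & primitives_used(g)
    if shared:
        print(f"dependency: {f.__name__} -- {g.__name__} via {sorted(shared)}")
```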

Unsupervised Context-Sensitive Spelling Correction of English and Dutch Clinical Free-Text with Word and Character N-Gram Embeddings

Title Unsupervised Context-Sensitive Spelling Correction of English and Dutch Clinical Free-Text with Word and Character N-Gram Embeddings
Authors Pieter Fivez, Simon Šuster, Walter Daelemans
Abstract We present an unsupervised context-sensitive spelling correction method for clinical free-text that uses word and character n-gram embeddings. Our method generates misspelling replacement candidates and ranks them according to their semantic fit, by calculating a weighted cosine similarity between the vectorized representation of a candidate and the misspelling context. To tune the parameters of this model, we generate self-induced spelling error corpora. We perform our experiments for two languages. For English, we greatly outperform off-the-shelf spelling correction tools on a manually annotated MIMIC-III test set, and counter the frequency bias of a noisy channel model, showing that neural embeddings can be successfully exploited to improve upon the state-of-the-art. For Dutch, we also outperform an off-the-shelf spelling correction tool on manually annotated clinical records from the Antwerp University Hospital, but can offer no empirical evidence that our method counters the frequency bias of a noisy channel model in this case as well. However, both our context-sensitive model and our implementation of the noisy channel model obtain high scores on the test set, establishing a state-of-the-art for Dutch clinical spelling correction with the noisy channel model.
Tasks Spelling Correction
Published 2017-10-19
URL http://arxiv.org/abs/1710.07045v1
PDF http://arxiv.org/pdf/1710.07045v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-context-sensitive-spelling-1
Repo https://github.com/clips/clinspell
Framework none
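
The ranking step is straightforward to sketch. The function below is a hedged simplification of the paper's candidate ranking (the weighting scheme and candidate generation are omitted): candidates are scored by cosine similarity between their embedding and the averaged context embedding. `embeddings` is assumed to be a word-to-vector dict, e.g. from a fastText-style model.

```python
# Rank spelling-correction candidates by semantic fit with the context.
import numpy as np

def cosine(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def rank_candidates(candidates, context_words, embeddings):
    context = np.mean([embeddings[w] for w in context_words
                       if w in embeddings], axis=0)
    scored = [(c, cosine(embeddings[c], context))
              for c in candidates if c in embeddings]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```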

Gated Orthogonal Recurrent Units: On Learning to Forget

Title Gated Orthogonal Recurrent Units: On Learning to Forget
Authors Li Jing, Caglar Gulcehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljačić, Yoshua Bengio
Abstract We present a novel recurrent neural network (RNN) based model that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant/irrelevant information in its memory. We achieve this by extending unitary RNNs with a gating mechanism. Our model is able to outperform LSTMs, GRUs and Unitary RNNs on several long-term dependency benchmark tasks. We empirically show both that orthogonal/unitary RNNs lack the ability to forget and that GORU can simultaneously remember long-term dependencies while forgetting irrelevant information, an ability that plays an important role in recurrent neural networks. We provide competitive results along with an analysis of our model on many natural sequential tasks including bAbI question answering, TIMIT speech spectrum prediction, Penn TreeBank, and synthetic tasks that involve long-term dependencies such as algorithmic, parenthesis, denoising and copying tasks.
Tasks Denoising, Question Answering
Published 2017-06-08
URL http://arxiv.org/abs/1706.02761v3
PDF http://arxiv.org/pdf/1706.02761v3.pdf
PWC https://paperswithcode.com/paper/gated-orthogonal-recurrent-units-on-learning
Repo https://github.com/jingli9111/GORU-tensorflow
Framework tf
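
A toy step of the gating idea can be written in a few lines of NumPy. This is a simplification for intuition, not the exact GORU cell from the paper: an orthogonal recurrent matrix preserves the memory's norm, and a learned gate decides per dimension how much to forget.

```python
# Toy gated-orthogonal recurrent step (simplified; untrained weights).
import numpy as np

rng = np.random.default_rng(0)
n = 4
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal matrix
W_g, b_g = rng.standard_normal((n, n)), np.zeros(n)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def goru_like_step(h, x):
    candidate = np.tanh(U @ h + x)    # norm-preserving memory path
    gate = sigmoid(W_g @ x + b_g)     # learned, input-dependent forgetting
    return gate * candidate + (1 - gate) * h
```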

Is space a word, too?

Title Is space a word, too?
Authors Jake Ryland Williams, Giovanni C. Santia
Abstract For words, rank-frequency distributions have long been heralded for adherence to a potentially-universal phenomenon known as Zipf’s law. The hypothetical form of this empirical phenomenon was refined by Benoît Mandelbrot to that which is presently referred to as the Zipf-Mandelbrot law. Parallel to this, Herbert Simon proposed a selection model potentially explaining Zipf’s law. However, a significant dispute between Simon and Mandelbrot, notable empirical exceptions, and the lack of a strong empirical connection between Simon’s model and the Zipf-Mandelbrot law have left the questions of universality and mechanistic generation open. We offer a resolution to these issues by exhibiting how the dark matter of word segmentation, i.e., space, punctuation, etc., connects the Zipf-Mandelbrot law to Simon’s mechanistic process. This explains Mandelbrot’s refinement as no more than a fudge factor, accommodating the effects of the exclusion of the rank-frequency dark matter. Thus, integrating these non-word objects resolves a more-generalized rank-frequency law. Since this relies upon the integration of space, etc., we find support for the hypothesis that $all$ are generated by common processes, indicating from a physical perspective that space is a word, too.
Tasks
Published 2017-10-20
URL http://arxiv.org/abs/1710.07729v1
PDF http://arxiv.org/pdf/1710.07729v1.pdf
PWC https://paperswithcode.com/paper/is-space-a-word-too
Repo https://github.com/gsantia/Gutenberg
Framework none
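
Simon's selection model, central to the abstract's argument, is a two-line generative process: with probability alpha introduce a new word, otherwise repeat a past token chosen proportionally to its frequency. The sketch below reproduces that process and prints the heavy-tailed rank-frequency counts it generates (approximately Zipfian for small alpha).

```python
# Simon's model: new word with prob. alpha, else preferential attachment.
import random
from collections import Counter

def simon_text(n_tokens, alpha=0.1, seed=0):
    rng = random.Random(seed)
    tokens, next_word = [0], 1
    for _ in range(n_tokens - 1):
        if rng.random() < alpha:
            tokens.append(next_word)           # introduce a brand-new word
            next_word += 1
        else:
            tokens.append(rng.choice(tokens))  # repeat, freq-proportional
    return tokens

counts = sorted(Counter(simon_text(100_000)).values(), reverse=True)
for rank in (1, 10, 100, 1000):
    print(rank, counts[rank - 1])
```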

Confidence through Attention

Title Confidence through Attention
Authors Matīss Rikters, Mark Fishel
Abstract Attention distributions of the generated translations are a useful by-product of attention-based recurrent neural network translation models and can be treated as soft alignments between the input and output tokens. In this work, we use attention distributions as a confidence metric for output translations. We present two strategies of using the attention distributions: filtering out bad translations from a large back-translated corpus, and selecting the best translation in a hybrid setup of two different translation systems. While manual evaluation indicated only a weak correlation between our confidence score and human judgments, the use-cases showed improvements of up to 2.22 BLEU points for filtering and 0.99 points for hybrid translation, tested on English<->German and English<->Latvian translation.
Tasks Machine Translation
Published 2017-10-10
URL http://arxiv.org/abs/1710.03743v1
PDF http://arxiv.org/pdf/1710.03743v1.pdf
PWC https://paperswithcode.com/paper/confidence-through-attention
Repo https://github.com/M4t1ss/ConfidenceThroughAttention
Framework none
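
The confidence scores themselves are simple functions of the attention matrix. The metric below is a simplification in the spirit of the paper, not its exact penalties: average negative entropy of each target token's attention distribution, so sharp alignments score higher than dispersed ones.

```python
# Attention-based confidence: sharper attention rows => higher score.
import numpy as np

def attention_confidence(attn):
    """attn: (target_len, source_len), rows sum to 1."""
    entropy = -np.sum(attn * np.log(attn + 1e-12), axis=1)
    return -entropy.mean()

sharp = np.eye(3)                 # each output token attends to one input
flat = np.full((3, 3), 1 / 3)     # attention spread evenly
assert attention_confidence(sharp) > attention_confidence(flat)
```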

Parametric Gaussian Process Regression for Big Data

Title Parametric Gaussian Process Regression for Big Data
Authors Maziar Raissi
Abstract This work introduces the concept of parametric Gaussian processes (PGPs), which is built upon the seemingly self-contradictory idea of making Gaussian processes parametric. Parametric Gaussian processes, by construction, are designed to operate in “big data” regimes where one is interested in quantifying the uncertainty associated with noisy data. The proposed methodology circumvents the well-established need for stochastic variational inference, a scalable algorithm for approximating posterior distributions. The effectiveness of the proposed approach is demonstrated using an illustrative example with simulated data and a benchmark dataset in the airline industry with approximately 6 million records.
Tasks Gaussian Processes
Published 2017-04-11
URL http://arxiv.org/abs/1704.03144v2
PDF http://arxiv.org/pdf/1704.03144v2.pdf
PWC https://paperswithcode.com/paper/parametric-gaussian-process-regression-for
Repo https://github.com/maziarraissi/ParametricGP
Framework tf
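
For context, the object being made parametric is standard GP regression, which the sketch below implements with an RBF kernel; the paper's actual mini-batch PGP update is not reproduced here.

```python
# Exact GP regression baseline (O(n^3)), the model PGP scales to big data.
import numpy as np

def rbf(a, b, lengthscale=1.0):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=0.1):
    K = rbf(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    K_s = rbf(x_test, x_train)
    mean = K_s @ np.linalg.solve(K, y_train)
    var = 1.0 - np.einsum("ij,ji->i", K_s, np.linalg.solve(K, K_s.T))
    return mean, var                  # rbf(x, x) has unit diagonal
```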

Unified Embedding and Metric Learning for Zero-Exemplar Event Detection

Title Unified Embedding and Metric Learning for Zero-Exemplar Event Detection
Authors Noureldien Hussein, Efstratios Gavves, Arnold W. M. Smeulders
Abstract Event detection in unconstrained videos is conceived as a content-based video retrieval with two modalities: textual and visual. Given a text describing a novel event, the goal is to rank related videos accordingly. This task is zero-exemplar: no video examples of the novel event are given. Related works train a bank of concept detectors on external data sources. These detectors predict confidence scores for test videos, which are ranked and retrieved accordingly. In contrast, we learn a joint space in which the visual and textual representations are embedded. The space casts a novel event as a probability of pre-defined events. Also, it learns to measure the distance between an event and its related videos. Our model is trained end-to-end on publicly available EventNet. When applied to the TRECVID Multimedia Event Detection dataset, it outperforms the state-of-the-art by a considerable margin.
Tasks Metric Learning, Video Retrieval
Published 2017-05-05
URL http://arxiv.org/abs/1705.02148v1
PDF http://arxiv.org/pdf/1705.02148v1.pdf
PWC https://paperswithcode.com/paper/unified-embedding-and-metric-learning-for
Repo https://github.com/noureldien/unified_embedding
Framework none
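
Once both modalities live in the joint space, retrieval reduces to nearest-neighbor ranking. The helper below sketches only that final step; the text and video encoders are assumed to exist and are not shown.

```python
# Rank videos for a zero-exemplar text query in a shared embedding space.
import numpy as np

def rank_videos(query_vec, video_vecs):
    """query_vec: (d,); video_vecs: (n, d). Returns indices, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    v = video_vecs / np.linalg.norm(video_vecs, axis=1, keepdims=True)
    return np.argsort(-(v @ q))       # descending cosine similarity
```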

Teacher-Student Curriculum Learning

Title Teacher-Student Curriculum Learning
Authors Tambet Matiisen, Avital Oliver, Taco Cohen, John Schulman
Abstract We propose Teacher-Student Curriculum Learning (TSCL), a framework for automatic curriculum learning, where the Student tries to learn a complex task and the Teacher automatically chooses subtasks from a given set for the Student to train on. We describe a family of Teacher algorithms that rely on the intuition that the Student should practice more those tasks on which it makes the fastest progress, i.e. where the slope of the learning curve is highest. In addition, the Teacher algorithms address the problem of forgetting by also choosing tasks where the Student’s performance is getting worse. We demonstrate that TSCL matches or surpasses the results of carefully hand-crafted curricula in two tasks: addition of decimal numbers with an LSTM and navigation in Minecraft. Our automatically generated curriculum enabled the agent to solve a Minecraft maze that could not be solved at all when training directly on the maze, and learning was an order of magnitude faster than with uniform sampling of subtasks.
Tasks
Published 2017-07-01
URL http://arxiv.org/abs/1707.00183v2
PDF http://arxiv.org/pdf/1707.00183v2.pdf
PWC https://paperswithcode.com/paper/teacher-student-curriculum-learning
Repo https://github.com/tambetm/TSCL
Framework none
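
One Teacher strategy from the family described above samples the subtask with the largest absolute learning-curve slope. The class below is a hedged toy version; the window size and epsilon-greedy exploration are assumptions, not the paper's exact settings.

```python
# Toy TSCL Teacher: prefer tasks where the Student's score is changing
# fastest, in either direction (fast progress or forgetting).
import random
from collections import defaultdict

class Teacher:
    def __init__(self, n_tasks, window=10, eps=0.1):
        self.scores = defaultdict(list)
        self.n_tasks, self.window, self.eps = n_tasks, window, eps

    def choose_task(self):
        if random.random() < self.eps:           # occasional exploration
            return random.randrange(self.n_tasks)
        def abs_slope(t):
            s = self.scores[t][-self.window:]
            return abs(s[-1] - s[0]) if len(s) > 1 else float("inf")
        return max(range(self.n_tasks), key=abs_slope)

    def update(self, task, score):               # Student reports back
        self.scores[task].append(score)
```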

Dropout Feature Ranking for Deep Learning Models

Title Dropout Feature Ranking for Deep Learning Models
Authors Chun-Hao Chang, Ladislav Rampasek, Anna Goldenberg
Abstract Deep neural networks (DNNs) achieve state-of-the-art results in a variety of domains. Unfortunately, DNNs are notorious for their non-interpretability, which limits their applicability in hypothesis-driven domains such as biology and healthcare. Moreover, in the resource-constrained setting, it is critical to design tests that rely on fewer, more informative features, leading to high accuracy within a reasonable budget. We aim to close this gap by proposing a new general feature ranking method for deep learning. We show that our simple yet effective method performs on par with or compares favorably to eight strawman, classical and deep-learning feature ranking methods in two simulations and five very different datasets, on tasks ranging from classification to regression, in both static and time series scenarios. We also illustrate the use of our method on a drug response dataset and show that it identifies genes relevant to the drug response.
Tasks Time Series
Published 2017-12-22
URL http://arxiv.org/abs/1712.08645v2
PDF http://arxiv.org/pdf/1712.08645v2.pdf
PWC https://paperswithcode.com/paper/dropout-feature-ranking-for-deep-learning
Repo https://github.com/zzzace2000/dropout-feature-ranking
Framework pytorch
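
The ranking mechanism can be sketched as a learnable dropout gate per input feature. The module below is a hedged approximation of the idea (the concrete relaxation and the exact objective are assumptions, not a verbatim reproduction of the paper's method): after training, the features the model refuses to drop rank highest.

```python
# Learn per-feature dropout rates over a frozen model; rank by keep prob.
import torch

class DropoutRanker(torch.nn.Module):
    def __init__(self, model, n_features, temperature=0.1):
        super().__init__()
        self.model = model                         # pretrained, frozen net
        self.logit = torch.nn.Parameter(torch.zeros(n_features))
        self.temperature = temperature

    def forward(self, x):
        u = torch.rand_like(x)
        noise = torch.log(u) - torch.log1p(-u)     # logistic noise
        gate = torch.sigmoid((self.logit + noise) / self.temperature)
        return self.model(x * gate)                # soft per-feature dropout

    def keep_probability(self):
        return torch.sigmoid(self.logit).detach()

# Assumed training loop: minimize task loss + lambda * keep_probability().sum(),
# then rank features by keep_probability(), highest = most important.
```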

Representations of language in a model of visually grounded speech signal

Title Representations of language in a model of visually grounded speech signal
Authors Grzegorz Chrupała, Lieke Gelderloos, Afra Alishahi
Abstract We present a visually grounded model of speech perception which projects spoken utterances and images to a joint semantic space. We use a multi-layer recurrent highway network to model the temporal nature of spoken speech, and show that it learns to extract both form and meaning-based linguistic knowledge from the input signal. We carry out an in-depth analysis of the representations used by different components of the trained model and show that encoding of semantic aspects tends to become richer as we go up the hierarchy of layers, whereas encoding of form-related aspects of the language input tends to initially increase and then plateau or decrease.
Tasks
Published 2017-02-07
URL http://arxiv.org/abs/1702.01991v3
PDF http://arxiv.org/pdf/1702.01991v3.pdf
PWC https://paperswithcode.com/paper/representations-of-language-in-a-model-of
Repo https://github.com/gchrupala/visually-grounded-speech
Framework none
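
Models of this kind are typically trained with a margin-based ranking loss in the joint semantic space. The loss below is a generic sketch of that objective; the margin value and exact formulation are assumptions, not taken from the paper.

```python
# Margin ranking loss over a batch of matched speech/image embeddings.
import torch

def contrastive_loss(speech_emb, image_emb, margin=0.2):
    """Both (batch, d) and L2-normalized; row i of each modality matches."""
    scores = speech_emb @ image_emb.t()                # cosine similarities
    positives = scores.diag().unsqueeze(1)
    cost = (margin + scores - positives).clamp(min=0)  # hinge on mismatches
    cost.fill_diagonal_(0)                             # ignore matched pairs
    return cost.mean()
```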

Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources

Title Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources
Authors Adrian Bulat, Georgios Tzimiropoulos
Abstract Our goal is to design architectures that retain the groundbreaking performance of CNNs for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation and face alignment. We exhaustively evaluate various design choices, identify performance bottlenecks, and more importantly propose multiple orthogonal ways to boost performance. (b) Based on our analysis, we propose a novel hierarchical, parallel and multi-scale residual architecture that yields large performance improvement over the standard bottleneck block while having the same number of parameters, thus bridging the gap between the original network and its binarized counterpart. (c) We perform a large number of ablation studies that shed light on the properties and the performance of the proposed block. (d) We present results for experiments on the most challenging datasets for human pose estimation and face alignment, reporting in many cases state-of-the-art performance. Code can be downloaded from https://www.adrianbulat.com/binary-cnn-landmarks
Tasks Face Alignment, Pose Estimation
Published 2017-03-02
URL http://arxiv.org/abs/1703.00862v2
PDF http://arxiv.org/pdf/1703.00862v2.pdf
PWC https://paperswithcode.com/paper/binarized-convolutional-landmark-localizers
Repo https://github.com/1adrianb/binary-human-pose-estimation
Framework torch
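
The binarization being studied replaces real-valued weights with their sign times a per-filter scale, as in XNOR-style networks. The helper below sketches just that step; the paper's proposed hierarchical residual block is not reproduced.

```python
# Sign-and-scale weight binarization for a conv layer's weight tensor.
import torch

def binarize(weight):
    """weight: (out_ch, in_ch, kh, kw) -> entries in {-alpha, +alpha}."""
    alpha = weight.abs().mean(dim=(1, 2, 3), keepdim=True)  # per filter
    return alpha * weight.sign()
```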