July 29, 2019

2832 words 14 mins read

Paper Group AWR 183


Convolutional Networks for Spherical Signals

Title Convolutional Networks for Spherical Signals
Authors Taco Cohen, Mario Geiger, Jonas Köhler, Max Welling
Abstract The success of convolutional networks in learning problems involving planar signals such as images is due to their ability to exploit the translation symmetry of the data distribution through weight sharing. Many areas of science and engineering deal with signals with other symmetries, such as rotation-invariant data on the sphere. Examples include climate and weather science, astrophysics, and chemistry. In this paper we present spherical convolutional networks. These networks use convolutions on the sphere and rotation group, which results in rotational weight sharing and rotation equivariance. Using a synthetic spherical MNIST dataset, we show that spherical convolutional networks are very effective at dealing with rotationally invariant classification problems.
Tasks
Published 2017-09-14
URL http://arxiv.org/abs/1709.04893v2
PDF http://arxiv.org/pdf/1709.04893v2.pdf
PWC https://paperswithcode.com/paper/convolutional-networks-for-spherical-signals
Repo https://github.com/jonas-koehler/s2cnn
Framework pytorch
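
The core idea, equivariance through weight sharing over rotations, can be illustrated without spherical harmonics. The sketch below is a minimal planar analogy on the cyclic group C4 (quarter-turn rotations), not the paper's actual sphere/SO(3) convolution; `c4_equivariant_conv` is a hypothetical helper written only for this illustration.

```python
# Minimal analogy to spherical CNNs on the cyclic rotation group C4:
# correlating an image with all four rotated copies of one shared kernel
# yields a stack of feature maps that transforms predictably when the
# input is rotated (rotation equivariance via rotational weight sharing).
import numpy as np
from scipy.signal import correlate2d

def c4_equivariant_conv(image, kernel):
    """One feature map per quarter-turn rotation of the shared kernel."""
    return np.stack([
        correlate2d(image, np.rot90(kernel, k), mode="same")
        for k in range(4)
    ])

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
maps = c4_equivariant_conv(image, kernel)

# Rotating the input rotates each map and cyclically shifts the stack --
# the property the paper generalizes from the plane to the sphere.
rotated_maps = c4_equivariant_conv(np.rot90(image), kernel)
assert np.allclose(rotated_maps,
                   np.stack([np.rot90(m) for m in np.roll(maps, 1, axis=0)]))
```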

Image Restoration by Iterative Denoising and Backward Projections

Title Image Restoration by Iterative Denoising and Backward Projections
Authors Tom Tirer, Raja Giryes
Abstract Inverse problems appear in many applications, such as image deblurring and inpainting. The common approach to address them is to design a specific algorithm for each problem. The Plug-and-Play (P&P) framework, which has been recently introduced, allows solving general inverse problems by leveraging the impressive capabilities of existing denoising algorithms. While this fresh strategy has found many applications, a burdensome parameter tuning is often required in order to obtain high-quality results. In this work, we propose an alternative method for solving inverse problems using off-the-shelf denoisers, which requires less parameter tuning. First, we transform a typical cost function, composed of fidelity and prior terms, into a closely related, novel optimization problem. Then, we propose an efficient minimization scheme with a plug-and-play property, i.e., the prior term is handled solely by a denoising operation. Finally, we present an automatic tuning mechanism to set the method’s parameters. We provide a theoretical analysis of the method, and empirically demonstrate its competitiveness with task-specific techniques and the P&P approach for image inpainting and deblurring.
Tasks Deblurring, Denoising, Image Inpainting, Image Restoration
Published 2017-10-18
URL http://arxiv.org/abs/1710.06647v4
PDF http://arxiv.org/pdf/1710.06647v4.pdf
PWC https://paperswithcode.com/paper/image-restoration-by-iterative-denoising-and
Repo https://github.com/tomtirer/IDBP
Framework none
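
A plug-and-play iteration of this flavor is easy to sketch for inpainting. The snippet below is not the authors' exact IDBP algorithm, only a hedged illustration of alternating a denoising step with a backward projection onto the observed data; a Gaussian blur stands in for a real off-the-shelf denoiser.

```python
# Hedged sketch of an IDBP-style plug-and-play loop for inpainting:
# alternate denoising (the prior) with re-imposing observed pixels
# (the backward projection onto the data).
import numpy as np
from scipy.ndimage import gaussian_filter

def inpaint(observed, mask, iterations=50, sigma=1.0):
    """observed: image with holes; mask: True where pixels were measured."""
    x = observed.copy()
    x[~mask] = observed[mask].mean()   # crude initialization of the holes
    for _ in range(iterations):
        x = gaussian_filter(x, sigma)  # prior handled purely by a denoiser
        x[mask] = observed[mask]       # project back onto the measurements
    return x
```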

Population Based Training of Neural Networks

Title Population Based Training of Neural Networks
Authors Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, Koray Kavukcuoglu
Abstract Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss function, and optimisation algorithm. In this work we present \emph{Population Based Training (PBT)}, a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. With just a small modification to a typical distributed hyperparameter training framework, our method allows robust and reliable training of models. We demonstrate the effectiveness of PBT on deep reinforcement learning problems, showing faster wall-clock convergence and higher final performance of agents by optimising over a suite of hyperparameters. In addition, we show the same method can be applied to supervised learning for machine translation, where PBT is used to maximise the BLEU score directly, and also to training of Generative Adversarial Networks to maximise the Inception score of generated images. In all cases PBT results in the automatic discovery of hyperparameter schedules and model selection which results in stable training and better final performance.
Tasks Machine Translation, Model Selection
Published 2017-11-27
URL http://arxiv.org/abs/1711.09846v2
PDF http://arxiv.org/pdf/1711.09846v2.pdf
PWC https://paperswithcode.com/paper/population-based-training-of-neural-networks
Repo https://github.com/cipher813/pbt
Framework none
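
The exploit/explore loop at the heart of PBT fits in a few lines. The toy below is a minimal reconstruction on a quadratic objective, not DeepMind's distributed implementation: workers train in parallel, and periodically the worst performers copy the weights of the best (exploit) and perturb the copied hyperparameters (explore).

```python
# Toy PBT loop: a population jointly optimizes weights (theta) and a
# hyperparameter (the learning rate) on f(theta) = theta**2.
import random

def sgd_step(theta, lr):
    return theta - lr * 2 * theta  # gradient of theta**2 is 2*theta

population = [{"theta": random.uniform(-5, 5),
               "lr": 10 ** random.uniform(-3, -0.5)} for _ in range(8)]

for t in range(200):
    for w in population:
        w["theta"] = sgd_step(w["theta"], w["lr"])
    if t % 20 == 19:                                    # exploit/explore
        population.sort(key=lambda w: w["theta"] ** 2)  # lowest loss first
        for loser, winner in zip(population[-2:], population[:2]):
            loser["theta"] = winner["theta"]            # exploit: copy weights
            loser["lr"] = winner["lr"] * random.choice([0.8, 1.2])  # explore

best = min(population, key=lambda w: w["theta"] ** 2)
print(f"best loss {best['theta'] ** 2:.2e} at lr {best['lr']:.3g}")
```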

Feature selection in high-dimensional dataset using MapReduce

Title Feature selection in high-dimensional dataset using MapReduce
Authors Claudio Reggiani, Yann-Aël Le Borgne, Gianluca Bontempi
Abstract This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features.
Tasks Feature Selection
Published 2017-09-07
URL http://arxiv.org/abs/1709.02327v1
PDF http://arxiv.org/pdf/1709.02327v1.pdf
PWC https://paperswithcode.com/paper/feature-selection-in-high-dimensional-dataset
Repo https://github.com/creggian/spark-ifs
Framework none
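
The algorithm being distributed is greedy mRMR. A single-machine sketch of the selection loop is below; the MapReduce partitioning itself is omitted, and scikit-learn's mutual-information estimators stand in for whatever estimator the Spark implementation uses.

```python
# Greedy minimum-Redundancy-Maximum-Relevance selection (single machine).
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, k):
    relevance = mutual_info_classif(X, y)       # MI(feature; label)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info_regression(X[:, [j]], X[:, s])[0]
                                  for s in selected])
            score = relevance[j] - redundancy   # max relevance, min redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```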

Inferring Generative Model Structure with Static Analysis

Title Inferring Generative Model Structure with Static Analysis
Authors Paroma Varma, Bryan He, Payal Bajaj, Imon Banerjee, Nishith Khandwala, Daniel L. Rubin, Christopher Ré
Abstract Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The structure of these models affects training label quality, but is difficult to learn without any ground truth labels. We instead rely on these weak supervision sources having some structure by virtue of being encoded programmatically. We present Coral, a paradigm that infers generative model structure by statically analyzing the code for these heuristics, thus reducing the data required to learn structure significantly. We prove that Coral’s sample complexity scales quasilinearly with the number of heuristics and number of relations found, improving over the standard sample complexity, which is exponential in $n$ for identifying $n^{\textrm{th}}$ degree relations. Experimentally, Coral matches or outperforms traditional structure learning approaches by up to 3.81 F1 points. Using Coral to model dependencies instead of assuming independence results in better performance than a fully supervised model by 3.07 accuracy points when heuristics are used to label radiology data without ground truth labels.
Tasks
Published 2017-09-07
URL http://arxiv.org/abs/1709.02477v1
PDF http://arxiv.org/pdf/1709.02477v1.pdf
PWC https://paperswithcode.com/paper/inferring-generative-model-structure-with
Repo https://github.com/HazyResearch/babble
Framework none
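
Coral's key move, reading dependencies between heuristics out of their source code, can be mimicked with Python's `ast` module. The sketch below is a deliberate simplification (the heuristics and the shared-primitive criterion are invented for illustration): two heuristics written over the same input primitives get a dependency edge.

```python
# Infer dependency structure between labeling heuristics by static
# analysis: heuristics sharing input primitives are marked as dependent.
import ast
import inspect
from itertools import combinations

def area(width, height):    return 1 if width * height > 50 else -1
def is_wide(width, height): return 1 if width / height > 2 else -1
def is_dark(intensity):     return 1 if intensity < 0.3 else -1

def primitives_used(fn):
    """Parameter names that are actually referenced in the function body."""
    tree = ast.parse(inspect.getsource(fn))
    params = {a.arg for a in tree.body[0].args.args}
    return {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)} & params

for f, g in combinations([area, is_wide, is_dark], 2):
    shared = primitives_used(f) & primitives_used(g)
    if shared:
        print(f"dependency: {f.__name__} -- {g.__name__} via {sorted(shared)}")
```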

Unsupervised Context-Sensitive Spelling Correction of English and Dutch Clinical Free-Text with Word and Character N-Gram Embeddings

Title Unsupervised Context-Sensitive Spelling Correction of English and Dutch Clinical Free-Text with Word and Character N-Gram Embeddings
Authors Pieter Fivez, Simon Šuster, Walter Daelemans
Abstract We present an unsupervised context-sensitive spelling correction method for clinical free-text that uses word and character n-gram embeddings. Our method generates misspelling replacement candidates and ranks them according to their semantic fit, by calculating a weighted cosine similarity between the vectorized representation of a candidate and the misspelling context. To tune the parameters of this model, we generate self-induced spelling error corpora. We perform our experiments for two languages. For English, we greatly outperform off-the-shelf spelling correction tools on a manually annotated MIMIC-III test set, and counter the frequency bias of a noisy channel model, showing that neural embeddings can be successfully exploited to improve upon the state-of-the-art. For Dutch, we also outperform an off-the-shelf spelling correction tool on manually annotated clinical records from the Antwerp University Hospital, but can offer no empirical evidence that our method counters the frequency bias of a noisy channel model in this case as well. However, both our context-sensitive model and our implementation of the noisy channel model obtain high scores on the test set, establishing a state-of-the-art for Dutch clinical spelling correction with the noisy channel model.
Tasks Spelling Correction
Published 2017-10-19
URL http://arxiv.org/abs/1710.07045v1
PDF http://arxiv.org/pdf/1710.07045v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-context-sensitive-spelling-1
Repo https://github.com/clips/clinspell
Framework none
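
The ranking step is straightforward to sketch. The function below is a hedged simplification of the paper's candidate ranking (the weighting scheme and candidate generation are omitted): candidates are scored by cosine similarity between their embedding and the averaged context embedding. `embeddings` is assumed to be a word-to-vector dict, e.g. from a fastText-style model.

```python
# Rank spelling-correction candidates by semantic fit with the context.
import numpy as np

def cosine(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def rank_candidates(candidates, context_words, embeddings):
    context = np.mean([embeddings[w] for w in context_words
                       if w in embeddings], axis=0)
    scored = [(c, cosine(embeddings[c], context))
              for c in candidates if c in embeddings]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```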

Gated Orthogonal Recurrent Units: On Learning to Forget

Title Gated Orthogonal Recurrent Units: On Learning to Forget
Authors Li Jing, Caglar Gulcehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljačić, Yoshua Bengio
Abstract We present a novel recurrent neural network (RNN) based model that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant/irrelevant information in its memory. We achieve this by extending unitary RNNs with a gating mechanism. Our model is able to outperform LSTMs, GRUs and Unitary RNNs on several long-term dependency benchmark tasks. We empirically show both that orthogonal/unitary RNNs lack the ability to forget and that GORU can simultaneously remember long-term dependencies while forgetting irrelevant information, an ability that plays an important role in recurrent neural networks. We provide competitive results along with an analysis of our model on many natural sequential tasks including bAbI question answering, TIMIT speech spectrum prediction, Penn TreeBank, and synthetic tasks that involve long-term dependencies such as algorithmic, parenthesis, denoising and copying tasks.
Tasks Denoising, Question Answering
Published 2017-06-08
URL http://arxiv.org/abs/1706.02761v3
PDF http://arxiv.org/pdf/1706.02761v3.pdf
PWC https://paperswithcode.com/paper/gated-orthogonal-recurrent-units-on-learning
Repo https://github.com/jingli9111/GORU-tensorflow
Framework tf
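
A toy step of the gating idea can be written in a few lines of NumPy. This is a simplification for intuition, not the exact GORU cell from the paper: an orthogonal recurrent matrix preserves the memory's norm, and a learned gate decides per dimension how much to forget.

```python
# Toy gated-orthogonal recurrent step (simplified; untrained weights).
import numpy as np

rng = np.random.default_rng(0)
n = 4
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal matrix
W_g, b_g = rng.standard_normal((n, n)), np.zeros(n)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def goru_like_step(h, x):
    candidate = np.tanh(U @ h + x)    # norm-preserving memory path
    gate = sigmoid(W_g @ x + b_g)     # learned, input-dependent forgetting
    return gate * candidate + (1 - gate) * h
```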

Is space a word, too?

Title Is space a word, too?
Authors Jake Ryland Williams, Giovanni C. Santia
Abstract For words, rank-frequency distributions have long been heralded for adherence to a potentially-universal phenomenon known as Zipf’s law. The hypothetical form of this empirical phenomenon was refined by Benoît Mandelbrot to that which is presently referred to as the Zipf-Mandelbrot law. Parallel to this, Herbert Simon proposed a selection model potentially explaining Zipf’s law. However, a significant dispute between Simon and Mandelbrot, notable empirical exceptions, and the lack of a strong empirical connection between Simon’s model and the Zipf-Mandelbrot law have left the questions of universality and mechanistic generation open. We offer a resolution to these issues by exhibiting how the dark matter of word segmentation, i.e., space, punctuation, etc., connects the Zipf-Mandelbrot law to Simon’s mechanistic process. This explains Mandelbrot’s refinement as no more than a fudge factor, accommodating the effects of the exclusion of the rank-frequency dark matter. Thus, integrating these non-word objects resolves a more-generalized rank-frequency law. Since this relies upon the integration of space, etc., we find support for the hypothesis that $all$ are generated by common processes, indicating from a physical perspective that space is a word, too.
Tasks
Published 2017-10-20
URL http://arxiv.org/abs/1710.07729v1
PDF http://arxiv.org/pdf/1710.07729v1.pdf
PWC https://paperswithcode.com/paper/is-space-a-word-too
Repo https://github.com/gsantia/Gutenberg
Framework none
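
Simon's selection model, central to the abstract's argument, is a two-line generative process: with probability alpha introduce a new word, otherwise repeat a past token chosen proportionally to its frequency. The sketch below reproduces that process and prints the heavy-tailed rank-frequency counts it generates (approximately Zipfian for small alpha).

```python
# Simon's model: new word with prob. alpha, else preferential attachment.
import random
from collections import Counter

def simon_text(n_tokens, alpha=0.1, seed=0):
    rng = random.Random(seed)
    tokens, next_word = [0], 1
    for _ in range(n_tokens - 1):
        if rng.random() < alpha:
            tokens.append(next_word)           # introduce a brand-new word
            next_word += 1
        else:
            tokens.append(rng.choice(tokens))  # repeat, freq-proportional
    return tokens

counts = sorted(Counter(simon_text(100_000)).values(), reverse=True)
for rank in (1, 10, 100, 1000):
    print(rank, counts[rank - 1])
```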

Confidence through Attention

Title Confidence through Attention
Authors Matīss Rikters, Mark Fishel
Abstract Attention distributions of the generated translations are a useful by-product of attention-based recurrent neural network translation models and can be treated as soft alignments between the input and output tokens. In this work, we use attention distributions as a confidence metric for output translations. We present two strategies of using the attention distributions: filtering out bad translations from a large back-translated corpus, and selecting the best translation in a hybrid setup of two different translation systems. While manual evaluation indicated only a weak correlation between our confidence score and human judgments, the use-cases showed improvements of up to 2.22 BLEU points for filtering and 0.99 points for hybrid translation, tested on English<->German and English<->Latvian translation.
Tasks Machine Translation
Published 2017-10-10
URL http://arxiv.org/abs/1710.03743v1
PDF http://arxiv.org/pdf/1710.03743v1.pdf
PWC https://paperswithcode.com/paper/confidence-through-attention
Repo https://github.com/M4t1ss/ConfidenceThroughAttention
Framework none
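
The confidence scores themselves are simple functions of the attention matrix. The metric below is a simplification in the spirit of the paper, not its exact penalties: average negative entropy of each target token's attention distribution, so sharp alignments score higher than dispersed ones.

```python
# Attention-based confidence: sharper attention rows => higher score.
import numpy as np

def attention_confidence(attn):
    """attn: (target_len, source_len), rows sum to 1."""
    entropy = -np.sum(attn * np.log(attn + 1e-12), axis=1)
    return -entropy.mean()

sharp = np.eye(3)                 # each output token attends to one input
flat = np.full((3, 3), 1 / 3)     # attention spread evenly
assert attention_confidence(sharp) > attention_confidence(flat)
```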

Parametric Gaussian Process Regression for Big Data

Title Parametric Gaussian Process Regression for Big Data
Authors Maziar Raissi
Abstract This work introduces the concept of parametric Gaussian processes (PGPs), which is built upon the seemingly self-contradictory idea of making Gaussian processes parametric. Parametric Gaussian processes, by construction, are designed to operate in “big data” regimes where one is interested in quantifying the uncertainty associated with noisy data. The proposed methodology circumvents the well-established need for stochastic variational inference, a scalable algorithm for approximating posterior distributions. The effectiveness of the proposed approach is demonstrated using an illustrative example with simulated data and a benchmark dataset in the airline industry with approximately 6 million records.
Tasks Gaussian Processes
Published 2017-04-11
URL http://arxiv.org/abs/1704.03144v2
PDF http://arxiv.org/pdf/1704.03144v2.pdf
PWC https://paperswithcode.com/paper/parametric-gaussian-process-regression-for
Repo https://github.com/maziarraissi/ParametricGP
Framework tf
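
For context, the object being made parametric is standard GP regression, which the sketch below implements with an RBF kernel; the paper's actual mini-batch PGP update is not reproduced here.

```python
# Exact GP regression baseline (O(n^3)), the model PGP scales to big data.
import numpy as np

def rbf(a, b, lengthscale=1.0):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=0.1):
    K = rbf(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    K_s = rbf(x_test, x_train)
    mean = K_s @ np.linalg.solve(K, y_train)
    var = 1.0 - np.einsum("ij,ji->i", K_s, np.linalg.solve(K, K_s.T))
    return mean, var                  # rbf(x, x) has unit diagonal
```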

Unified Embedding and Metric Learning for Zero-Exemplar Event Detection

Title Unified Embedding and Metric Learning for Zero-Exemplar Event Detection
Authors Noureldien Hussein, Efstratios Gavves, Arnold W. M. Smeulders
Abstract Event detection in unconstrained videos is conceived as a content-based video retrieval with two modalities: textual and visual. Given a text describing a novel event, the goal is to rank related videos accordingly. This task is zero-exemplar: no video examples of the novel event are given. Related works train a bank of concept detectors on external data sources. These detectors predict confidence scores for test videos, which are ranked and retrieved accordingly. In contrast, we learn a joint space in which the visual and textual representations are embedded. The space casts a novel event as a probability of pre-defined events. Also, it learns to measure the distance between an event and its related videos. Our model is trained end-to-end on publicly available EventNet. When applied to the TRECVID Multimedia Event Detection dataset, it outperforms the state-of-the-art by a considerable margin.
Tasks Metric Learning, Video Retrieval
Published 2017-05-05
URL http://arxiv.org/abs/1705.02148v1
PDF http://arxiv.org/pdf/1705.02148v1.pdf
PWC https://paperswithcode.com/paper/unified-embedding-and-metric-learning-for
Repo https://github.com/noureldien/unified_embedding
Framework none
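
Once both modalities live in the joint space, retrieval reduces to nearest-neighbor ranking. The helper below sketches only that final step; the text and video encoders are assumed to exist and are not shown.

```python
# Rank videos for a zero-exemplar text query in a shared embedding space.
import numpy as np

def rank_videos(query_vec, video_vecs):
    """query_vec: (d,); video_vecs: (n, d). Returns indices, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    v = video_vecs / np.linalg.norm(video_vecs, axis=1, keepdims=True)
    return np.argsort(-(v @ q))       # descending cosine similarity
```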

Teacher-Student Curriculum Learning

Title Teacher-Student Curriculum Learning
Authors Tambet Matiisen, Avital Oliver, Taco Cohen, John Schulman
Abstract We propose Teacher-Student Curriculum Learning (TSCL), a framework for automatic curriculum learning, where the Student tries to learn a complex task and the Teacher automatically chooses subtasks from a given set for the Student to train on. We describe a family of Teacher algorithms that rely on the intuition that the Student should practice more those tasks on which it makes the fastest progress, i.e. where the slope of the learning curve is highest. In addition, the Teacher algorithms address the problem of forgetting by also choosing tasks where the Student’s performance is getting worse. We demonstrate that TSCL matches or surpasses the results of carefully hand-crafted curricula in two tasks: addition of decimal numbers with an LSTM and navigation in Minecraft. Our automatically generated curriculum enabled the agent to solve a Minecraft maze that could not be solved at all when training directly on the maze, and learning was an order of magnitude faster than with uniform sampling of subtasks.
Tasks
Published 2017-07-01
URL http://arxiv.org/abs/1707.00183v2
PDF http://arxiv.org/pdf/1707.00183v2.pdf
PWC https://paperswithcode.com/paper/teacher-student-curriculum-learning
Repo https://github.com/tambetm/TSCL
Framework none
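
One Teacher strategy from the family described above samples the subtask with the largest absolute learning-curve slope. The class below is a hedged toy version; the window size and epsilon-greedy exploration are assumptions, not the paper's exact settings.

```python
# Toy TSCL Teacher: prefer tasks where the Student's score is changing
# fastest, in either direction (fast progress or forgetting).
import random
from collections import defaultdict

class Teacher:
    def __init__(self, n_tasks, window=10, eps=0.1):
        self.scores = defaultdict(list)
        self.n_tasks, self.window, self.eps = n_tasks, window, eps

    def choose_task(self):
        if random.random() < self.eps:           # occasional exploration
            return random.randrange(self.n_tasks)
        def abs_slope(t):
            s = self.scores[t][-self.window:]
            return abs(s[-1] - s[0]) if len(s) > 1 else float("inf")
        return max(range(self.n_tasks), key=abs_slope)

    def update(self, task, score):               # Student reports back
        self.scores[task].append(score)
```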

Dropout Feature Ranking for Deep Learning Models

Title Dropout Feature Ranking for Deep Learning Models
Authors Chun-Hao Chang, Ladislav Rampasek, Anna Goldenberg
Abstract Deep neural networks (DNNs) achieve state-of-the-art results in a variety of domains. Unfortunately, DNNs are notorious for their non-interpretability, which limits their applicability in hypothesis-driven domains such as biology and healthcare. Moreover, in the resource-constrained setting, it is critical to design tests that rely on fewer, more informative features, leading to high accuracy within a reasonable budget. We aim to close this gap by proposing a new general feature ranking method for deep learning. We show that our simple yet effective method performs on par with or compares favorably to eight strawman, classical and deep-learning feature ranking methods in two simulations and five very different datasets, on tasks ranging from classification to regression, in both static and time series scenarios. We also illustrate the use of our method on a drug response dataset and show that it identifies genes relevant to the drug response.
Tasks Time Series
Published 2017-12-22
URL http://arxiv.org/abs/1712.08645v2
PDF http://arxiv.org/pdf/1712.08645v2.pdf
PWC https://paperswithcode.com/paper/dropout-feature-ranking-for-deep-learning
Repo https://github.com/zzzace2000/dropout-feature-ranking
Framework pytorch
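
The ranking mechanism can be sketched as a learnable dropout gate per input feature. The module below is a hedged approximation of the idea (the concrete relaxation and the exact objective are assumptions, not a verbatim reproduction of the paper's method): after training, the features the model refuses to drop rank highest.

```python
# Learn per-feature dropout rates over a frozen model; rank by keep prob.
import torch

class DropoutRanker(torch.nn.Module):
    def __init__(self, model, n_features, temperature=0.1):
        super().__init__()
        self.model = model                         # pretrained, frozen net
        self.logit = torch.nn.Parameter(torch.zeros(n_features))
        self.temperature = temperature

    def forward(self, x):
        u = torch.rand_like(x)
        noise = torch.log(u) - torch.log1p(-u)     # logistic noise
        gate = torch.sigmoid((self.logit + noise) / self.temperature)
        return self.model(x * gate)                # soft per-feature dropout

    def keep_probability(self):
        return torch.sigmoid(self.logit).detach()

# Assumed training loop: minimize task loss + lambda * keep_probability().sum(),
# then rank features by keep_probability(), highest = most important.
```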

Representations of language in a model of visually grounded speech signal

Title Representations of language in a model of visually grounded speech signal
Authors Grzegorz Chrupała, Lieke Gelderloos, Afra Alishahi
Abstract We present a visually grounded model of speech perception which projects spoken utterances and images to a joint semantic space. We use a multi-layer recurrent highway network to model the temporal nature of spoken speech, and show that it learns to extract both form and meaning-based linguistic knowledge from the input signal. We carry out an in-depth analysis of the representations used by different components of the trained model and show that encoding of semantic aspects tends to become richer as we go up the hierarchy of layers, whereas encoding of form-related aspects of the language input tends to initially increase and then plateau or decrease.
Tasks
Published 2017-02-07
URL http://arxiv.org/abs/1702.01991v3
PDF http://arxiv.org/pdf/1702.01991v3.pdf
PWC https://paperswithcode.com/paper/representations-of-language-in-a-model-of
Repo https://github.com/gchrupala/visually-grounded-speech
Framework none
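
Models of this kind are typically trained with a margin-based ranking loss in the joint semantic space. The loss below is a generic sketch of that objective; the margin value and exact formulation are assumptions, not taken from the paper.

```python
# Margin ranking loss over a batch of matched speech/image embeddings.
import torch

def contrastive_loss(speech_emb, image_emb, margin=0.2):
    """Both (batch, d) and L2-normalized; row i of each modality matches."""
    scores = speech_emb @ image_emb.t()                # cosine similarities
    positives = scores.diag().unsqueeze(1)
    cost = (margin + scores - positives).clamp(min=0)  # hinge on mismatches
    cost.fill_diagonal_(0)                             # ignore matched pairs
    return cost.mean()
```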

Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources

Title Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources
Authors Adrian Bulat, Georgios Tzimiropoulos
Abstract Our goal is to design architectures that retain the groundbreaking performance of CNNs for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation and face alignment. We exhaustively evaluate various design choices, identify performance bottlenecks, and more importantly propose multiple orthogonal ways to boost performance. (b) Based on our analysis, we propose a novel hierarchical, parallel and multi-scale residual architecture that yields large performance improvement over the standard bottleneck block while having the same number of parameters, thus bridging the gap between the original network and its binarized counterpart. (c) We perform a large number of ablation studies that shed light on the properties and the performance of the proposed block. (d) We present results for experiments on the most challenging datasets for human pose estimation and face alignment, reporting in many cases state-of-the-art performance. Code can be downloaded from https://www.adrianbulat.com/binary-cnn-landmarks
Tasks Face Alignment, Pose Estimation
Published 2017-03-02
URL http://arxiv.org/abs/1703.00862v2
PDF http://arxiv.org/pdf/1703.00862v2.pdf
PWC https://paperswithcode.com/paper/binarized-convolutional-landmark-localizers
Repo https://github.com/1adrianb/binary-human-pose-estimation
Framework torch
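
The binarization being studied replaces real-valued weights with their sign times a per-filter scale, as in XNOR-style networks. The helper below sketches just that step; the paper's proposed hierarchical residual block is not reproduced.

```python
# Sign-and-scale weight binarization for a conv layer's weight tensor.
import torch

def binarize(weight):
    """weight: (out_ch, in_ch, kh, kw) -> entries in {-alpha, +alpha}."""
    alpha = weight.abs().mean(dim=(1, 2, 3), keepdim=True)  # per filter
    return alpha * weight.sign()
```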