May 7, 2019


Paper Group AWR 96

DeMoN: Depth and Motion Network for Learning Monocular Stereo


Title	DeMoN: Depth and Motion Network for Learning Monocular Stereo
Authors	Benjamin Ummenhofer, Huizhong Zhou, Jonas Uhrig, Nikolaus Mayer, Eddy Ilg, Alexey Dosovitskiy, Thomas Brox
Abstract	In this paper we formulate structure from motion as a learning problem. We train a convolutional network end-to-end to compute depth and camera motion from successive, unconstrained image pairs. The architecture is composed of multiple stacked encoder-decoder networks, the core part being an iterative network that is able to improve its own predictions. The network estimates not only depth and motion, but additionally surface normals, optical flow between the images and confidence of the matching. A crucial component of the approach is a training loss based on spatial relative differences. Compared to traditional two-frame structure from motion methods, results are more accurate and more robust. In contrast to the popular depth-from-single-image networks, DeMoN learns the concept of matching and, thus, better generalizes to structures not seen during training.
Tasks	Depth And Camera Motion, Optical Flow Estimation
Published	2016-12-07
URL	http://arxiv.org/abs/1612.02401v2
PDF	http://arxiv.org/pdf/1612.02401v2.pdf
PWC	https://paperswithcode.com/paper/demon-depth-and-motion-network-for-learning
Repo	https://github.com/lmb-freiburg/demon
Framework	tf

Recurrent Neural Networks for Polyphonic Sound Event Detection in Real Life Recordings


Title	Recurrent Neural Networks for Polyphonic Sound Event Detection in Real Life Recordings
Authors	Giambattista Parascandolo, Heikki Huttunen, Tuomas Virtanen
Abstract	In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs). A single multilabel BLSTM RNN is trained to map acoustic features of a mixture signal, consisting of sounds from multiple classes, to binary activity indicators of each event class. Our method is tested on a large database of real-life recordings, with 61 classes (e.g. music, car, speech) from 10 different everyday contexts. The proposed method outperforms previous approaches by a large margin, and the results are further improved using data augmentation techniques. Overall, our system reports an average F1-score of 65.5% on 1 second blocks and 64.7% on single frames, a relative improvement over the previous state-of-the-art approach of 6.8% and 15.1% respectively.
Tasks	Data Augmentation, Sound Event Detection
Published	2016-04-04
URL	http://arxiv.org/abs/1604.00861v1
PDF	http://arxiv.org/pdf/1604.00861v1.pdf
PWC	https://paperswithcode.com/paper/recurrent-neural-networks-for-polyphonic
Repo	https://github.com/yardencsGitHub/tweetynet
Framework	tf
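
The abstract above describes per-class binary activity indicators scored with an F1 measure on frames. The following is a minimal sketch of how such a frame-level F1 could be computed, assuming the reference and prediction are given as lists of frames, each a list of 0/1 flags per class. The function name and the data layout are illustrative assumptions; the paper's exact evaluation protocol (for example, how 1 second blocks are averaged) is not reproduced here.

```python
def frame_f1(reference, predicted):
    """F1 over all (frame, class) activity flags.

    reference, predicted: lists of frames; each frame is a list of
    0/1 indicators, one per event class (hypothetical layout).
    """
    tp = fp = fn = 0
    for ref_frame, pred_frame in zip(reference, predicted):
        for ref_active, pred_active in zip(ref_frame, pred_frame):
            if ref_active and pred_active:
                tp += 1      # event active and detected
            elif pred_active:
                fp += 1      # detected but not active
            elif ref_active:
                fn += 1      # active but missed
    if tp == 0:
        return 0.0           # avoids division by zero when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two frames, three classes: several events may be active in the same frame.
reference = [[1, 0, 1], [0, 1, 0]]
predicted = [[1, 0, 0], [0, 1, 1]]
score = frame_f1(reference, predicted)
```

Because every (frame, class) pair is counted independently, overlapping events in one frame are handled naturally, which is the property that makes the task polyphonic.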

Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks


Title	Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks
Authors	Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi
Abstract	Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence’s saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering that recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.
Tasks
Published	2016-08-12
URL	http://arxiv.org/abs/1608.03644v4
PDF	http://arxiv.org/pdf/1608.03644v4.pdf
PWC	https://paperswithcode.com/paper/deep-motif-dashboard-visualizing-and
Repo	https://github.com/QData/DeepMotif
Framework	torch
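
The saliency map mentioned in the abstract above scores each nucleotide by a first-order derivative of the prediction with respect to the input. The sketch below illustrates that idea on a toy logistic model over a one-hot encoded sequence, where the gradient can be written by hand. The model, the weights in `TOY_WEIGHTS`, and the choice to report the absolute gradient at the observed base are all assumptions made for illustration; they stand in for the paper's CNN and RNN models and are not taken from the DeepMotif code.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def predict(x, weights, bias):
    """Toy logistic model: sigmoid of a weighted sum over all positions."""
    z = bias + sum(
        w * v
        for w_row, x_row in zip(weights, x)
        for w, v in zip(w_row, x_row)
    )
    return 1.0 / (1.0 + math.exp(-z))


def saliency(seq, weights, bias):
    """Per-nucleotide saliency: |d prediction / d input| at the observed base."""
    x = one_hot(seq)
    p = predict(x, weights, bias)
    scale = p * (1.0 - p)  # derivative of the sigmoid with respect to z
    # d p / d x[i][b] = scale * weights[i][b]; multiplying by the one-hot row
    # keeps only the entry for the base actually present at position i.
    return [
        abs(scale * sum(w * v for w, v in zip(w_row, x_row)))
        for w_row, x_row in zip(weights, x)
    ]


# Hypothetical weights for a length-3 sequence (rows: positions, columns: A, C, G, T).
TOY_WEIGHTS = [
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, -0.5, 0.0],
]
scores = saliency("ACG", TOY_WEIGHTS, 0.0)
```

For a real network the hand-written gradient would be replaced by automatic differentiation; the per-position reduction stays the same.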

Building Large Machine Reading-Comprehension Datasets using Paragraph Vectors


Title	Building Large Machine Reading-Comprehension Datasets using Paragraph Vectors
Authors	Radu Soricut, Nan Ding
Abstract	We present a dual contribution to the task of machine reading-comprehension: a technique for creating large-sized machine-comprehension (MC) datasets using paragraph-vector models; and a novel, hybrid neural-network architecture that combines the representation power of recurrent neural networks with the discriminative power of fully-connected multi-layered networks. We use the MC-dataset generation technique to build a dataset of around 2 million examples, for which we empirically determine the high-ceiling of human performance (around 91% accuracy), as well as the performance of a variety of computer models. Among all the models we have experimented with, our hybrid neural-network architecture achieves the highest performance (83.2% accuracy). The remaining gap to the human-performance ceiling provides enough room for future model improvements.
Tasks	Machine Reading Comprehension, Reading Comprehension
Published	2016-12-13
URL	http://arxiv.org/abs/1612.04342v1
PDF	http://arxiv.org/pdf/1612.04342v1.pdf
PWC	https://paperswithcode.com/paper/building-large-machine-reading-comprehension
Repo	https://github.com/google/mcafp
Framework	none


Tweet2Vec: Character-Based Distributed Representations for Social Media


Title	Tweet2Vec: Character-Based Distributed Representations for Social Media
Authors	Bhuwan Dhingra, Zhong Zhou, Dylan Fitzpatrick, Michael Muehl, William W. Cohen
Abstract	Text from social media provides a set of challenges that can cause traditional NLP approaches to fail. Informal language, spelling errors, abbreviations, and special characters are all commonplace in these posts, leading to a prohibitively large vocabulary size for word-level approaches. We propose a character composition model, tweet2vec, which finds vector-space representations of whole tweets by learning complex, non-local dependencies in character sequences. The proposed model outperforms a word-level baseline at predicting user-annotated hashtags associated with the posts, doing significantly better when the input contains many out-of-vocabulary words or unusual character sequences. Our tweet2vec encoder is publicly available.
Tasks
Published	2016-05-11
URL	http://arxiv.org/abs/1605.03481v2
PDF	http://arxiv.org/pdf/1605.03481v2.pdf
PWC	https://paperswithcode.com/paper/tweet2vec-character-based-distributed
Repo	https://github.com/bdhingra/tweet2vec
Framework	none

The VGLC: The Video Game Level Corpus


Title	The VGLC: The Video Game Level Corpus
Authors	Adam James Summerville, Sam Snodgrass, Michael Mateas, Santiago Ontañón
Abstract	Levels are a key component of many different video games, and a large body of work has been produced on how to procedurally generate game levels. Recently, machine learning techniques have been applied to video game level generation with the aim of automatically generating levels that have the properties of the training corpus. To that end, we have made available a corpus of video game levels in an easy-to-parse format ideal for machine learning and other game AI research purposes.
Tasks
Published	2016-06-23
URL	http://arxiv.org/abs/1606.07487v2
PDF	http://arxiv.org/pdf/1606.07487v2.pdf
PWC	https://paperswithcode.com/paper/the-vglc-the-video-game-level-corpus
Repo	https://github.com/TheVGLC/TheVGLC
Framework	none

Composing graphical models with neural networks for structured representations and fast inference


Title	Composing graphical models with neural networks for structured representations and fast inference
Authors	Matthew J. Johnson, David Duvenaud, Alexander B. Wiltschko, Sandeep R. Datta, Ryan P. Adams
Abstract	We propose a general modeling and inference framework that composes probabilistic graphical models with deep learning methods and combines their respective strengths. Our model family augments graphical structure in latent variables with neural network observation models. For inference, we extend variational autoencoders to use graphical model approximating distributions with recognition networks that output conjugate potentials. All components of these models are learned simultaneously with a single objective, giving a scalable algorithm that leverages stochastic variational inference, natural gradients, graphical model message passing, and the reparameterization trick. We illustrate this framework with several example models and an application to mouse behavioral phenotyping.
Tasks
Published	2016-03-20
URL	http://arxiv.org/abs/1603.06277v5
PDF	http://arxiv.org/pdf/1603.06277v5.pdf
PWC	https://paperswithcode.com/paper/composing-graphical-models-with-neural
Repo	https://github.com/mattjj/svae
Framework	none

Playing SNES in the Retro Learning Environment


Title	Playing SNES in the Retro Learning Environment
Authors	Nadav Bhonker, Shai Rozenberg, Itay Hubara
Abstract	Mastering a video game requires skill, tactics and strategy. While these attributes may be acquired naturally by human players, teaching them to a computer program is a far more challenging task. In recent years, extensive research was carried out in the field of reinforcement learning and numerous algorithms were introduced, aiming to learn how to perform human tasks such as playing video games. As a result, the Arcade Learning Environment (ALE) (Bellemare et al., 2013) has become a commonly used benchmark environment allowing algorithms to train on various Atari 2600 games. In many games the state-of-the-art algorithms outperform humans. In this paper we introduce a new learning environment, the Retro Learning Environment — RLE, that can run games on the Super Nintendo Entertainment System (SNES), Sega Genesis and several other gaming consoles. The environment is expandable, allowing for more video games and consoles to be easily added to the environment, while maintaining the same interface as ALE. Moreover, RLE is compatible with Python and Torch. SNES games pose a significant challenge to current algorithms due to their higher level of complexity and versatility.
Tasks	Atari Games, SNES Games
Published	2016-11-07
URL	http://arxiv.org/abs/1611.02205v2
PDF	http://arxiv.org/pdf/1611.02205v2.pdf
PWC	https://paperswithcode.com/paper/playing-snes-in-the-retro-learning
Repo	https://github.com/nadavbh12/Retro-Learning-Environment
Framework	torch

Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization


Title	Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization
Authors	Spyros Gidaris, Nikos Komodakis
Abstract	The problem of computing category agnostic bounding box proposals is utilized as a core component in many computer vision tasks and thus has lately attracted a lot of attention. In this work we propose a new approach to tackle this problem that is based on an active strategy for generating box proposals that starts from a set of seed boxes, which are uniformly distributed on the image, and then progressively moves its attention to the promising image areas where it is more likely to discover well localized bounding box proposals. We call our approach AttractioNet and a core component of it is a CNN-based category agnostic object location refinement module that is capable of yielding accurate and robust bounding box predictions regardless of the object category. We extensively evaluate our AttractioNet approach on several image datasets (i.e. COCO, PASCAL, ImageNet detection and NYU-Depth V2 datasets) reporting on all of them state-of-the-art results that surpass the previous work in the field by a significant margin and also providing strong empirical evidence that our approach is capable of generalizing to unseen categories. Furthermore, we evaluate our AttractioNet proposals in the context of the object detection task using a VGG16-Net based detector and the achieved detection performance on COCO manages to significantly surpass all other VGG16-Net based detectors while even being competitive with a heavily tuned ResNet-101 based detector. Code as well as box proposals computed for several datasets are available at: https://github.com/gidariss/AttractioNet.
Tasks	Object Detection
Published	2016-06-14
URL	http://arxiv.org/abs/1606.04446v1
PDF	http://arxiv.org/pdf/1606.04446v1.pdf
PWC	https://paperswithcode.com/paper/attend-refine-repeat-active-box-proposal
Repo	https://github.com/gidariss/AttractioNet
Framework	none

In the Saddle: Chasing Fast and Repeatable Features


Title	In the Saddle: Chasing Fast and Repeatable Features
Authors	Javier Aldana-Iuit, Dmytro Mishkin, Ondrej Chum, Jiri Matas
Abstract	We present a novel similarity-covariant feature detector that extracts points whose neighbourhoods, when treated as a 3D intensity surface, have a saddle-like intensity profile. The saddle condition is verified efficiently by intensity comparisons on two concentric rings that must have exactly two dark-to-bright and two bright-to-dark transitions satisfying certain geometric constraints. Experiments show that the Saddle features are general, evenly spread and appear in high density in a range of images. The Saddle detector is among the fastest proposed. In comparison with detectors of similar speed, the Saddle features show superior matching performance on a number of challenging datasets.
Tasks
Published	2016-08-24
URL	http://arxiv.org/abs/1608.06800v1
PDF	http://arxiv.org/pdf/1608.06800v1.pdf
PWC	https://paperswithcode.com/paper/in-the-saddle-chasing-fast-and-repeatable
Repo	https://github.com/aldanjav/saddle_detector
Framework	none
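
The ring test described in the abstract above, exactly two dark-to-bright and two bright-to-dark transitions, can be sketched as a walk around a circular list of pixel intensities. This is a simplified illustration on a single ring: the reference intensity, the `offset` threshold, and all function names are assumptions, and the detector's second ring and its geometric constraints are left out.

```python
def label_ring(ring, reference, offset):
    """Label each ring pixel as bright (+1), dark (-1) or neutral (0)."""
    labels = []
    for value in ring:
        if value > reference + offset:
            labels.append(1)
        elif value < reference - offset:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


def count_transitions(labels):
    """Count dark-to-bright and bright-to-dark changes around the ring.

    Neutral pixels are skipped, and the last pixel wraps around to the first.
    """
    signs = [label for label in labels if label != 0]
    if len(signs) < 2:
        return 0, 0
    dark_to_bright = bright_to_dark = 0
    for current, following in zip(signs, signs[1:] + signs[:1]):
        if current == -1 and following == 1:
            dark_to_bright += 1
        elif current == 1 and following == -1:
            bright_to_dark += 1
    return dark_to_bright, bright_to_dark


def looks_like_saddle(ring, reference, offset):
    """True if the ring alternates bright, dark, bright, dark exactly once each."""
    dark_to_bright, bright_to_dark = count_transitions(
        label_ring(ring, reference, offset)
    )
    return dark_to_bright == 2 and bright_to_dark == 2


# A saddle-like ring: two bright arcs separated by two dark arcs.
saddle_ring = [200, 200, 100, 50, 50, 100, 200, 200, 100, 50, 50, 100]
# An edge-like ring: one bright half and one dark half.
edge_ring = [200] * 6 + [50] * 6
is_saddle = looks_like_saddle(saddle_ring, reference=100, offset=20)
is_edge_saddle = looks_like_saddle(edge_ring, reference=100, offset=20)
```

An edge produces only one transition of each kind, so the count of two is what separates a saddle from a step edge in this sketch.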

Realistic risk-mitigating recommendations via inverse classification


Title	Realistic risk-mitigating recommendations via inverse classification
Authors	Michael T. Lash, W. Nick Street
Abstract	Inverse classification, the process of making meaningful perturbations to a test point such that it is more likely to have a desired classification, has previously been addressed using data from a single static point in time. Such an approach yields inflated probability estimates, stemming from an implicitly made assumption that recommendations are implemented instantaneously. We propose using longitudinal data to alleviate such issues in two ways. First, we use past outcome probabilities as features in the present. Use of such past probabilities ties historical behavior to the present, allowing for more information to be taken into account when making initial probability estimates and subsequently performing inverse classification. Secondly, following inverse classification application, optimized instances’ unchangeable features (e.g., age) are updated using values from the next longitudinal time period. Optimized test instance probabilities are then reassessed. Updating the unchangeable features in this manner reflects the notion that improvements in outcome likelihood, which result from following the inverse classification recommendations, do not materialize instantaneously. As our experiments demonstrate, more realistic estimates of probability can be obtained by factoring in such considerations.
Tasks
Published	2016-11-13
URL	http://arxiv.org/abs/1611.04199v1
PDF	http://arxiv.org/pdf/1611.04199v1.pdf
PWC	https://paperswithcode.com/paper/realistic-risk-mitigating-recommendations-via
Repo	https://github.com/michael-lash/LongARIC
Framework	none

Local minima in training of neural networks


Title	Local minima in training of neural networks
Authors	Grzegorz Swirszcz, Wojciech Marian Czarnecki, Razvan Pascanu
Abstract	There has been a lot of recent interest in trying to characterize the error surface of deep models. This stems from a long standing question. Given that deep networks are highly nonlinear systems optimized by local gradient methods, why do they not seem to be affected by bad local minima? It is widely believed that training of deep models using gradient methods works so well because the error surface either has no local minima, or if they exist they need to be close in value to the global minimum. It is known that such results hold under very strong assumptions which are not satisfied by real models. In this paper we present examples showing that for such a theorem to be true, additional assumptions on the data, initialization schemes and/or the model classes have to be made. We look at the particular case of finite size datasets. We demonstrate that in this scenario one can construct counter-examples (datasets or initialization schemes) where the network does become susceptible to bad local minima over the weight space.
Tasks
Published	2016-11-19
URL	http://arxiv.org/abs/1611.06310v2
PDF	http://arxiv.org/pdf/1611.06310v2.pdf
PWC	https://paperswithcode.com/paper/local-minima-in-training-of-neural-networks
Repo	https://github.com/jchunn/Ambition
Framework	tf

Real-Time Intensity-Image Reconstruction for Event Cameras Using Manifold Regularisation


Title	Real-Time Intensity-Image Reconstruction for Event Cameras Using Manifold Regularisation
Authors	Christian Reinbacher, Gottfried Graber, Thomas Pock
Abstract	Event cameras or neuromorphic cameras mimic the human perception system as they measure the per-pixel intensity change rather than the actual intensity level. In contrast to traditional cameras, such cameras capture new information about the scene at MHz frequency in the form of sparse events. The high temporal resolution comes at the cost of losing the familiar per-pixel intensity information. In this work we propose a variational model that accurately models the behaviour of event cameras, enabling reconstruction of intensity images with arbitrary frame rate in real-time. Our method is formulated on a per-event-basis, where we explicitly incorporate information about the asynchronous nature of events via an event manifold induced by the relative timestamps of events. In our experiments we verify that solving the variational model on the manifold produces high-quality images without explicitly estimating optical flow.
Tasks	Image Reconstruction, Optical Flow Estimation
Published	2016-07-21
URL	http://arxiv.org/abs/1607.06283v2
PDF	http://arxiv.org/pdf/1607.06283v2.pdf
PWC	https://paperswithcode.com/paper/real-time-intensity-image-reconstruction-for
Repo	https://github.com/VLOGroup/dvs-reconstruction
Framework	none

A segmental framework for fully-unsupervised large-vocabulary speech recognition


Title	A segmental framework for fully-unsupervised large-vocabulary speech recognition
Authors	Herman Kamper, Aren Jansen, Sharon Goldwater
Abstract	Zero-resource speech technology is a growing research area that aims to develop methods for speech processing in the absence of transcriptions, lexicons, or language modelling text. Early term discovery systems focused on identifying isolated recurring patterns in a corpus, while more recent full-coverage systems attempt to completely segment and cluster the audio into word-like units—effectively performing unsupervised speech recognition. This article presents the first attempt we are aware of to apply such a system to large-vocabulary multi-speaker data. Our system uses a Bayesian modelling framework with segmental word representations: each word segment is represented as a fixed-dimensional acoustic embedding obtained by mapping the sequence of feature frames to a single embedding vector. We compare our system on English and Xitsonga datasets to state-of-the-art baselines, using a variety of measures including word error rate (obtained by mapping the unsupervised output to ground truth transcriptions). Very high word error rates are reported—in the order of 70–80% for speaker-dependent and 80–95% for speaker-independent systems—highlighting the difficulty of this task. Nevertheless, in terms of cluster quality and word segmentation metrics, we show that by imposing a consistent top-down segmentation while also using bottom-up knowledge from detected syllable boundaries, both single-speaker and multi-speaker versions of our system outperform a purely bottom-up single-speaker syllable-based approach. We also show that the discovered clusters can be made less speaker- and gender-specific by using an unsupervised autoencoder-like feature extractor to learn better frame-level features (prior to embedding). Our system’s discovered clusters are still less pure than those of unsupervised term discovery systems, but provide far greater coverage.
Tasks	Language Modelling, Large Vocabulary Continuous Speech Recognition, Speech Recognition
Published	2016-06-22
URL	http://arxiv.org/abs/1606.06950v2
PDF	http://arxiv.org/pdf/1606.06950v2.pdf
PWC	https://paperswithcode.com/paper/a-segmental-framework-for-fully-unsupervised
Repo	https://github.com/kamperh/recipe_bucktsong_awe
Framework	tf

Robust Probabilistic Modeling with Bayesian Data Reweighting


Title	Robust Probabilistic Modeling with Bayesian Data Reweighting
Authors	Yixin Wang, Alp Kucukelbir, David M. Blei
Abstract	Probabilistic models analyze data by relying on a set of assumptions. Data that exhibit deviations from these assumptions can undermine inference and prediction quality. Robust models offer protection against mismatch between a model’s assumptions and reality. We propose a way to systematically detect and mitigate mismatch of a large class of probabilistic models. The idea is to raise the likelihood of each observation to a weight and then to infer both the latent variables and the weights from data. Inferring the weights allows a model to identify observations that match its assumptions and down-weight others. This enables robust inference and improves predictive accuracy. We study four different forms of mismatch with reality, ranging from missing latent groups to structure misspecification. A Poisson factorization analysis of the Movielens 1M dataset shows the benefits of this approach in a practical scenario.
Tasks
Published	2016-06-13
URL	http://arxiv.org/abs/1606.03860v3
PDF	http://arxiv.org/pdf/1606.03860v3.pdf
PWC	https://paperswithcode.com/paper/robust-probabilistic-modeling-with-bayesian
Repo	https://github.com/yixinwang/robust-rpm-public
Framework	none
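
The core idea in the abstract above is to raise the likelihood of each observation to a weight, which in log space means multiplying each log-likelihood term by that weight. The sketch below shows this for a Gaussian model with hand-picked weights, so the effect of down-weighting an outlier is visible. The paper infers the weights from data together with the latent variables; fixing them by hand, the Gaussian model, and all names here are simplifying assumptions for illustration.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


def reweighted_loglik(data, weights, mean, std):
    """Sum of w_n * log p(x_n), i.e. the log of prod_n p(x_n) ** w_n."""
    return sum(w * gaussian_logpdf(x, mean, std) for x, w in zip(data, weights))


def weighted_mean(data, weights):
    """Maximizer of the reweighted Gaussian log-likelihood over the mean."""
    return sum(w * x for x, w in zip(data, weights)) / sum(weights)


# Four well-behaved points and one outlier that violates the model's assumptions.
data = [1.0, 1.2, 0.8, 1.0, 10.0]
uniform = [1.0] * 5
downweighted = [1.0, 1.0, 1.0, 1.0, 0.05]  # hypothetical weights, set by hand

plain_estimate = weighted_mean(data, uniform)
robust_estimate = weighted_mean(data, downweighted)
```

With uniform weights the outlier drags the estimated mean well away from the bulk of the data; with the outlier's weight near zero the estimate stays close to 1.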