Paper Group AWR 96
DeMoN: Depth and Motion Network for Learning Monocular Stereo
Title | DeMoN: Depth and Motion Network for Learning Monocular Stereo |
Authors | Benjamin Ummenhofer, Huizhong Zhou, Jonas Uhrig, Nikolaus Mayer, Eddy Ilg, Alexey Dosovitskiy, Thomas Brox |
Abstract | In this paper we formulate structure from motion as a learning problem. We train a convolutional network end-to-end to compute depth and camera motion from successive, unconstrained image pairs. The architecture is composed of multiple stacked encoder-decoder networks, the core part being an iterative network that is able to improve its own predictions. The network estimates not only depth and motion, but additionally surface normals, optical flow between the images and confidence of the matching. A crucial component of the approach is a training loss based on spatial relative differences. Compared to traditional two-frame structure from motion methods, results are more accurate and more robust. In contrast to the popular depth-from-single-image networks, DeMoN learns the concept of matching and, thus, better generalizes to structures not seen during training. |
Tasks | Depth And Camera Motion, Optical Flow Estimation |
Published | 2016-12-07 |
URL | http://arxiv.org/abs/1612.02401v2 |
PDF | http://arxiv.org/pdf/1612.02401v2.pdf |
PWC | https://paperswithcode.com/paper/demon-depth-and-motion-network-for-learning |
Repo | https://github.com/lmb-freiburg/demon |
Framework | tf |
Recurrent Neural Networks for Polyphonic Sound Event Detection in Real Life Recordings
Title | Recurrent Neural Networks for Polyphonic Sound Event Detection in Real Life Recordings |
Authors | Giambattista Parascandolo, Heikki Huttunen, Tuomas Virtanen |
Abstract | In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short-term memory (BLSTM) recurrent neural networks (RNNs). A single multilabel BLSTM RNN is trained to map acoustic features of a mixture signal, consisting of sounds from multiple classes, to binary activity indicators of each event class. Our method is tested on a large database of real-life recordings, with 61 classes (e.g. music, car, speech) from 10 different everyday contexts. The proposed method outperforms previous approaches by a large margin, and the results are further improved using data augmentation techniques. Overall, our system reports an average F1-score of 65.5% on 1 second blocks and 64.7% on single frames, a relative improvement over the previous state-of-the-art approach of 6.8% and 15.1%, respectively. |
Tasks | Data Augmentation, Sound Event Detection |
Published | 2016-04-04 |
URL | http://arxiv.org/abs/1604.00861v1 |
PDF | http://arxiv.org/pdf/1604.00861v1.pdf |
PWC | https://paperswithcode.com/paper/recurrent-neural-networks-for-polyphonic |
Repo | https://github.com/yardencsGitHub/tweetynet |
Framework | tf |
Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks
Title | Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks |
Authors | Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi |
Abstract | Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence’s saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering that recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs and the dependencies among them. |
Tasks | |
Published | 2016-08-12 |
URL | http://arxiv.org/abs/1608.03644v4 |
PDF | http://arxiv.org/pdf/1608.03644v4.pdf |
PWC | https://paperswithcode.com/paper/deep-motif-dashboard-visualizing-and |
Repo | https://github.com/QData/DeepMotif |
Framework | torch |
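The saliency-map idea in the Deep Motif Dashboard abstract (first-order derivatives of the prediction with respect to each nucleotide) can be illustrated with a toy sketch. This is not the DeMo Dashboard code: the `WEIGHTS` table and the `score` function are a hypothetical stand-in for a trained model, and the derivative is approximated by central differences instead of backpropagation.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


# Hypothetical toy "model": one weight per (position, base), then a sigmoid.
WEIGHTS = [
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],  # strongly rewards C at position 1
    [0.0, 0.0, 1.5, 0.0],  # rewards G at position 2
    [0.0, 0.0, 0.0, 0.2],
]


def score(x):
    """Prediction score in (0, 1) for a one-hot encoded sequence."""
    z = sum(w * v for wrow, xrow in zip(WEIGHTS, x) for w, v in zip(wrow, xrow))
    return 1.0 / (1.0 + math.exp(-z))


def saliency(x, eps=1e-5):
    """Absolute first-order derivative of the score at each observed base."""
    values = []
    for i, row in enumerate(x):
        j = row.index(1.0)  # column of the observed base
        up = [r[:] for r in x]
        down = [r[:] for r in x]
        up[i][j] += eps
        down[i][j] -= eps
        values.append(abs(score(up) - score(down)) / (2 * eps))
    return values


sal = saliency(one_hot("ACGT"))
print(sal)
```

In this toy setting the position with the largest weight receives the largest saliency, which is the kind of per-nucleotide importance ranking the abstract describes.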
Building Large Machine Reading-Comprehension Datasets using Paragraph Vectors
Title | Building Large Machine Reading-Comprehension Datasets using Paragraph Vectors |
Authors | Radu Soricut, Nan Ding |
Abstract | We present a dual contribution to the task of machine reading-comprehension: a technique for creating large-sized machine-comprehension (MC) datasets using paragraph-vector models; and a novel, hybrid neural-network architecture that combines the representation power of recurrent neural networks with the discriminative power of fully-connected multi-layered networks. We use the MC-dataset generation technique to build a dataset of around 2 million examples, for which we empirically determine the high-ceiling of human performance (around 91% accuracy), as well as the performance of a variety of computer models. Among all the models we have experimented with, our hybrid neural-network architecture achieves the highest performance (83.2% accuracy). The remaining gap to the human-performance ceiling provides enough room for future model improvements. |
Tasks | Machine Reading Comprehension, Reading Comprehension |
Published | 2016-12-13 |
URL | http://arxiv.org/abs/1612.04342v1 |
PDF | http://arxiv.org/pdf/1612.04342v1.pdf |
PWC | https://paperswithcode.com/paper/building-large-machine-reading-comprehension |
Repo | https://github.com/google/mcafp |
Framework | none |
Tweet2Vec: Character-Based Distributed Representations for Social Media
Title | Tweet2Vec: Character-Based Distributed Representations for Social Media |
Authors | Bhuwan Dhingra, Zhong Zhou, Dylan Fitzpatrick, Michael Muehl, William W. Cohen |
Abstract | Text from social media provides a set of challenges that can cause traditional NLP approaches to fail. Informal language, spelling errors, abbreviations, and special characters are all commonplace in these posts, leading to a prohibitively large vocabulary size for word-level approaches. We propose a character composition model, tweet2vec, which finds vector-space representations of whole tweets by learning complex, non-local dependencies in character sequences. The proposed model outperforms a word-level baseline at predicting user-annotated hashtags associated with the posts, doing significantly better when the input contains many out-of-vocabulary words or unusual character sequences. Our tweet2vec encoder is publicly available. |
Tasks | |
Published | 2016-05-11 |
URL | http://arxiv.org/abs/1605.03481v2 |
PDF | http://arxiv.org/pdf/1605.03481v2.pdf |
PWC | https://paperswithcode.com/paper/tweet2vec-character-based-distributed |
Repo | https://github.com/bdhingra/tweet2vec |
Framework | none |
The VGLC: The Video Game Level Corpus
Title | The VGLC: The Video Game Level Corpus |
Authors | Adam James Summerville, Sam Snodgrass, Michael Mateas, Santiago Ontañón |
Abstract | Levels are a key component of many different video games, and a large body of work has been produced on how to procedurally generate game levels. Recently, machine learning techniques have been applied to video game level generation with the aim of automatically generating levels that have the properties of the training corpus. To that end, we have made available a corpus of video game levels in an easy-to-parse format ideal for machine learning and other game AI research purposes. |
Tasks | |
Published | 2016-06-23 |
URL | http://arxiv.org/abs/1606.07487v2 |
PDF | http://arxiv.org/pdf/1606.07487v2.pdf |
PWC | https://paperswithcode.com/paper/the-vglc-the-video-game-level-corpus |
Repo | https://github.com/TheVGLC/TheVGLC |
Framework | none |
Composing graphical models with neural networks for structured representations and fast inference
Title | Composing graphical models with neural networks for structured representations and fast inference |
Authors | Matthew J. Johnson, David Duvenaud, Alexander B. Wiltschko, Sandeep R. Datta, Ryan P. Adams |
Abstract | We propose a general modeling and inference framework that composes probabilistic graphical models with deep learning methods and combines their respective strengths. Our model family augments graphical structure in latent variables with neural network observation models. For inference, we extend variational autoencoders to use graphical model approximating distributions with recognition networks that output conjugate potentials. All components of these models are learned simultaneously with a single objective, giving a scalable algorithm that leverages stochastic variational inference, natural gradients, graphical model message passing, and the reparameterization trick. We illustrate this framework with several example models and an application to mouse behavioral phenotyping. |
Tasks | |
Published | 2016-03-20 |
URL | http://arxiv.org/abs/1603.06277v5 |
PDF | http://arxiv.org/pdf/1603.06277v5.pdf |
PWC | https://paperswithcode.com/paper/composing-graphical-models-with-neural |
Repo | https://github.com/mattjj/svae |
Framework | none |
Playing SNES in the Retro Learning Environment
Title | Playing SNES in the Retro Learning Environment |
Authors | Nadav Bhonker, Shai Rozenberg, Itay Hubara |
Abstract | Mastering a video game requires skill, tactics and strategy. While these attributes may be acquired naturally by human players, teaching them to a computer program is a far more challenging task. In recent years, extensive research was carried out in the field of reinforcement learning and numerous algorithms were introduced, aiming to learn how to perform human tasks such as playing video games. As a result, the Arcade Learning Environment (ALE) (Bellemare et al., 2013) has become a commonly used benchmark environment allowing algorithms to train on various Atari 2600 games. In many games the state-of-the-art algorithms outperform humans. In this paper we introduce a new learning environment, the Retro Learning Environment — RLE, that can run games on the Super Nintendo Entertainment System (SNES), Sega Genesis and several other gaming consoles. The environment is expandable, allowing for more video games and consoles to be easily added to the environment, while maintaining the same interface as ALE. Moreover, RLE is compatible with Python and Torch. SNES games pose a significant challenge to current algorithms due to their higher level of complexity and versatility. |
Tasks | Atari Games, SNES Games |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.02205v2 |
PDF | http://arxiv.org/pdf/1611.02205v2.pdf |
PWC | https://paperswithcode.com/paper/playing-snes-in-the-retro-learning |
Repo | https://github.com/nadavbh12/Retro-Learning-Environment |
Framework | torch |
Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization
Title | Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization |
Authors | Spyros Gidaris, Nikos Komodakis |
Abstract | Computing category-agnostic bounding box proposals is a core component of many computer vision tasks and has therefore attracted a lot of attention lately. In this work we propose a new approach to this problem based on an active strategy for generating box proposals: it starts from a set of seed boxes, uniformly distributed over the image, and then progressively moves its attention to the promising image areas where it is more likely to discover well-localized bounding box proposals. We call our approach AttractioNet; a core component of it is a CNN-based category-agnostic object location refinement module that is capable of yielding accurate and robust bounding box predictions regardless of the object category. We extensively evaluate AttractioNet on several image datasets (i.e. COCO, PASCAL, ImageNet detection and NYU-Depth V2), reporting state-of-the-art results on all of them that surpass previous work in the field by a significant margin, and providing strong empirical evidence that our approach is capable of generalizing to unseen categories. Furthermore, we evaluate our AttractioNet proposals in the context of the object detection task using a VGG16-Net based detector; the achieved detection performance on COCO significantly surpasses all other VGG16-Net based detectors while even being competitive with a heavily tuned ResNet-101 based detector. Code as well as box proposals computed for several datasets are available at: https://github.com/gidariss/AttractioNet. |
Tasks | Object Detection |
Published | 2016-06-14 |
URL | http://arxiv.org/abs/1606.04446v1 |
PDF | http://arxiv.org/pdf/1606.04446v1.pdf |
PWC | https://paperswithcode.com/paper/attend-refine-repeat-active-box-proposal |
Repo | https://github.com/gidariss/AttractioNet |
Framework | none |
In the Saddle: Chasing Fast and Repeatable Features
Title | In the Saddle: Chasing Fast and Repeatable Features |
Authors | Javier Aldana-Iuit, Dmytro Mishkin, Ondrej Chum, Jiri Matas |
Abstract | We present a novel similarity-covariant feature detector that extracts points whose neighbourhoods, when treated as a 3D intensity surface, have a saddle-like intensity profile. The saddle condition is verified efficiently by intensity comparisons on two concentric rings that must have exactly two dark-to-bright and two bright-to-dark transitions satisfying certain geometric constraints. Experiments show that the Saddle features are general, evenly spread and appear in high density in a range of images. The Saddle detector is among the fastest proposed. In comparison with detectors of similar speed, the Saddle features show superior matching performance on a number of challenging datasets. |
Tasks | |
Published | 2016-08-24 |
URL | http://arxiv.org/abs/1608.06800v1 |
PDF | http://arxiv.org/pdf/1608.06800v1.pdf |
PWC | https://paperswithcode.com/paper/in-the-saddle-chasing-fast-and-repeatable |
Repo | https://github.com/aldanjav/saddle_detector |
Framework | none |
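The transition-counting part of the saddle condition in the abstract above can be sketched for a single ring. This is a simplified illustration, not the released detector: the labelling rule, the `offset` threshold, and the choice to skip pixels close to the centre intensity are assumptions, and the second ring and the geometric constraints are omitted.

```python
def ring_labels(center, ring, offset):
    """Label ring pixels: +1 if clearly brighter than the centre, -1 if clearly darker, else 0."""
    labels = []
    for value in ring:
        if value > center + offset:
            labels.append(1)
        elif value < center - offset:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


def count_transitions(labels):
    """Count cyclic (dark-to-bright, bright-to-dark) transitions, skipping 0 labels."""
    signs = [label for label in labels if label != 0]
    if len(signs) < 2:
        return (0, 0)
    dark_to_bright = bright_to_dark = 0
    # Pair every element with its successor, wrapping around the ring.
    for a, b in zip(signs, signs[1:] + signs[:1]):
        if a == -1 and b == 1:
            dark_to_bright += 1
        elif a == 1 and b == -1:
            bright_to_dark += 1
    return (dark_to_bright, bright_to_dark)


def looks_like_saddle(center, ring, offset=5):
    """True when the ring alternates bright, dark, bright, dark around the centre."""
    return count_transitions(ring_labels(center, ring, offset)) == (2, 2)


saddle_ring = [200, 200, 50, 50, 200, 200, 50, 50]
edge_ring = [200, 200, 200, 200, 50, 50, 50, 50]
print(looks_like_saddle(120, saddle_ring), looks_like_saddle(120, edge_ring))
```

A plain intensity edge yields only one transition in each direction, so it is rejected, while a bright/dark/bright/dark arrangement around the centre passes.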
Realistic risk-mitigating recommendations via inverse classification
Title | Realistic risk-mitigating recommendations via inverse classification |
Authors | Michael T. Lash, W. Nick Street |
Abstract | Inverse classification, the process of making meaningful perturbations to a test point such that it is more likely to have a desired classification, has previously been addressed using data from a single static point in time. Such an approach yields inflated probability estimates, stemming from an implicitly made assumption that recommendations are implemented instantaneously. We propose using longitudinal data to alleviate such issues in two ways. First, we use past outcome probabilities as features in the present. Use of such past probabilities ties historical behavior to the present, allowing for more information to be taken into account when making initial probability estimates and subsequently performing inverse classification. Secondly, following inverse classification application, optimized instances’ unchangeable features (e.g., age) are updated using values from the next longitudinal time period. Optimized test instance probabilities are then reassessed. Updating the unchangeable features in this manner reflects the notion that improvements in outcome likelihood, which result from following the inverse classification recommendations, do not materialize instantaneously. As our experiments demonstrate, more realistic estimates of probability can be obtained by factoring in such considerations. |
Tasks | |
Published | 2016-11-13 |
URL | http://arxiv.org/abs/1611.04199v1 |
PDF | http://arxiv.org/pdf/1611.04199v1.pdf |
PWC | https://paperswithcode.com/paper/realistic-risk-mitigating-recommendations-via |
Repo | https://github.com/michael-lash/LongARIC |
Framework | none |
Local minima in training of neural networks
Title | Local minima in training of neural networks |
Authors | Grzegorz Swirszcz, Wojciech Marian Czarnecki, Razvan Pascanu |
Abstract | There has been a lot of recent interest in trying to characterize the error surface of deep models. This stems from a long-standing question: given that deep networks are highly nonlinear systems optimized by local gradient methods, why do they not seem to be affected by bad local minima? It is widely believed that training of deep models using gradient methods works so well because the error surface either has no local minima, or, if they exist, they need to be close in value to the global minimum. It is known that such results hold under very strong assumptions which are not satisfied by real models. In this paper we present examples showing that for such a theorem to be true, additional assumptions on the data, initialization schemes and/or the model classes have to be made. We look at the particular case of finite size datasets. We demonstrate that in this scenario one can construct counter-examples (datasets or initialization schemes) for which the network does become susceptible to bad local minima over the weight space. |
Tasks | |
Published | 2016-11-19 |
URL | http://arxiv.org/abs/1611.06310v2 |
PDF | http://arxiv.org/pdf/1611.06310v2.pdf |
PWC | https://paperswithcode.com/paper/local-minima-in-training-of-neural-networks |
Repo | https://github.com/jchunn/Ambition |
Framework | tf |
Real-Time Intensity-Image Reconstruction for Event Cameras Using Manifold Regularisation
Title | Real-Time Intensity-Image Reconstruction for Event Cameras Using Manifold Regularisation |
Authors | Christian Reinbacher, Gottfried Graber, Thomas Pock |
Abstract | Event cameras or neuromorphic cameras mimic the human perception system as they measure the per-pixel intensity change rather than the actual intensity level. In contrast to traditional cameras, such cameras capture new information about the scene at MHz frequency in the form of sparse events. The high temporal resolution comes at the cost of losing the familiar per-pixel intensity information. In this work we propose a variational model that accurately models the behaviour of event cameras, enabling reconstruction of intensity images with arbitrary frame rate in real-time. Our method is formulated on a per-event-basis, where we explicitly incorporate information about the asynchronous nature of events via an event manifold induced by the relative timestamps of events. In our experiments we verify that solving the variational model on the manifold produces high-quality images without explicitly estimating optical flow. |
Tasks | Image Reconstruction, Optical Flow Estimation |
Published | 2016-07-21 |
URL | http://arxiv.org/abs/1607.06283v2 |
PDF | http://arxiv.org/pdf/1607.06283v2.pdf |
PWC | https://paperswithcode.com/paper/real-time-intensity-image-reconstruction-for |
Repo | https://github.com/VLOGroup/dvs-reconstruction |
Framework | none |
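To make concrete what "per-pixel intensity change" events look like as data, the sketch below accumulates event polarities into an image. This is the naive baseline of direct integration, not the variational model on the event manifold that the abstract proposes. The event tuple layout `(timestamp, x, y, polarity)` and the `contrast` step are assumptions chosen for illustration.

```python
def integrate_events(events, height, width, contrast=0.1):
    """Sum signed events per pixel into an image of accumulated intensity change.

    events: iterable of (timestamp, x, y, polarity) with polarity +1 or -1.
    contrast: assumed intensity step represented by a single event.
    """
    image = [[0.0] * width for _ in range(height)]
    # Process events in time order, as they would arrive from the sensor.
    for _timestamp, x, y, polarity in sorted(events):
        image[y][x] += contrast * polarity
    return image


events = [
    (0.002, 0, 0, 1),
    (0.001, 0, 0, 1),
    (0.003, 1, 0, -1),
]
image = integrate_events(events, height=2, width=2)
print(image)
```

Direct integration of this kind accumulates sensor noise over time, which is one reason a regularised reconstruction such as the one described in the abstract is attractive.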
A segmental framework for fully-unsupervised large-vocabulary speech recognition
Title | A segmental framework for fully-unsupervised large-vocabulary speech recognition |
Authors | Herman Kamper, Aren Jansen, Sharon Goldwater |
Abstract | Zero-resource speech technology is a growing research area that aims to develop methods for speech processing in the absence of transcriptions, lexicons, or language modelling text. Early term discovery systems focused on identifying isolated recurring patterns in a corpus, while more recent full-coverage systems attempt to completely segment and cluster the audio into word-like units—effectively performing unsupervised speech recognition. This article presents the first attempt we are aware of to apply such a system to large-vocabulary multi-speaker data. Our system uses a Bayesian modelling framework with segmental word representations: each word segment is represented as a fixed-dimensional acoustic embedding obtained by mapping the sequence of feature frames to a single embedding vector. We compare our system on English and Xitsonga datasets to state-of-the-art baselines, using a variety of measures including word error rate (obtained by mapping the unsupervised output to ground truth transcriptions). Very high word error rates are reported—in the order of 70–80% for speaker-dependent and 80–95% for speaker-independent systems—highlighting the difficulty of this task. Nevertheless, in terms of cluster quality and word segmentation metrics, we show that by imposing a consistent top-down segmentation while also using bottom-up knowledge from detected syllable boundaries, both single-speaker and multi-speaker versions of our system outperform a purely bottom-up single-speaker syllable-based approach. We also show that the discovered clusters can be made less speaker- and gender-specific by using an unsupervised autoencoder-like feature extractor to learn better frame-level features (prior to embedding). Our system’s discovered clusters are still less pure than those of unsupervised term discovery systems, but provide far greater coverage. |
Tasks | Language Modelling, Large Vocabulary Continuous Speech Recognition, Speech Recognition |
Published | 2016-06-22 |
URL | http://arxiv.org/abs/1606.06950v2 |
PDF | http://arxiv.org/pdf/1606.06950v2.pdf |
PWC | https://paperswithcode.com/paper/a-segmental-framework-for-fully-unsupervised |
Repo | https://github.com/kamperh/recipe_bucktsong_awe |
Framework | tf |
Robust Probabilistic Modeling with Bayesian Data Reweighting
Title | Robust Probabilistic Modeling with Bayesian Data Reweighting |
Authors | Yixin Wang, Alp Kucukelbir, David M. Blei |
Abstract | Probabilistic models analyze data by relying on a set of assumptions. Data that exhibit deviations from these assumptions can undermine inference and prediction quality. Robust models offer protection against mismatch between a model’s assumptions and reality. We propose a way to systematically detect and mitigate mismatch of a large class of probabilistic models. The idea is to raise the likelihood of each observation to a weight and then to infer both the latent variables and the weights from data. Inferring the weights allows a model to identify observations that match its assumptions and down-weight others. This enables robust inference and improves predictive accuracy. We study four different forms of mismatch with reality, ranging from missing latent groups to structure misspecification. A Poisson factorization analysis of the Movielens 1M dataset shows the benefits of this approach in a practical scenario. |
Tasks | |
Published | 2016-06-13 |
URL | http://arxiv.org/abs/1606.03860v3 |
PDF | http://arxiv.org/pdf/1606.03860v3.pdf |
PWC | https://paperswithcode.com/paper/robust-probabilistic-modeling-with-bayesian |
Repo | https://github.com/yixinwang/robust-rpm-public |
Framework | none |
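The core idea in the Bayesian Data Reweighting abstract, raising the likelihood of each observation to a weight, can be sketched for a unit-variance Gaussian. In this simplified illustration the weights are fixed by hand; the paper infers them jointly with the latent variables, which requires a prior over the weights that is not shown here. All function names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, std=1.0):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def reweighted_loglik(data, weights, mean):
    """log of prod_n p(x_n | mean) ** w_n, i.e. sum_n w_n * log p(x_n | mean)."""
    return sum(w * gaussian_logpdf(x, mean) for x, w in zip(data, weights))


def weighted_mean(data, weights):
    """Maximiser of the reweighted Gaussian log-likelihood in the mean, for fixed weights."""
    return sum(w * x for x, w in zip(data, weights)) / sum(weights)


data = [0.9, 1.0, 1.1, 10.0]        # the last point does not match the model
uniform = [1.0, 1.0, 1.0, 1.0]      # ordinary likelihood
downweighted = [1.0, 1.0, 1.0, 0.01]

print(weighted_mean(data, uniform))       # pulled towards the outlier
print(weighted_mean(data, downweighted))  # stays near the bulk of the data
```

With uniform weights the estimate is dragged towards the mismatched observation, while a small weight on that observation keeps the estimate near the remaining data.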