May 7, 2019

Paper Group AWR 33

Variational Neural Machine Translation. Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure. GPU-accelerated real-time stixel computation. Using Discourse Signals for Robust Instructor Intervention Prediction. Slack and Margin Rescaling as Convex Extensions of Supermodular Functions. Single-Cha …

Variational Neural Machine Translation


Title	Variational Neural Machine Translation
Authors	Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan, Min Zhang
Abstract	Models of neural machine translation are often from a discriminative family of encoder-decoders that learn a conditional distribution of a target sentence given a source sentence. In this paper, we propose a variational model to learn this conditional distribution for neural machine translation: a variational encoder-decoder model that can be trained end-to-end. Different from the vanilla encoder-decoder model that generates target translations from hidden representations of source sentences alone, the variational model introduces a continuous latent variable to explicitly model underlying semantics of source sentences and to guide the generation of target translations. In order to perform efficient posterior inference and large-scale training, we build a neural posterior approximator conditioned on both the source and the target sides, and equip it with a reparameterization technique to estimate the variational lower bound. Experiments on both Chinese-English and English-German translation tasks show that the proposed variational neural machine translation achieves significant improvements over the vanilla neural machine translation baselines.
Tasks	Machine Translation
Published	2016-05-25
URL	http://arxiv.org/abs/1605.07869v2
PDF	http://arxiv.org/pdf/1605.07869v2.pdf
PWC	https://paperswithcode.com/paper/variational-neural-machine-translation
Repo	https://github.com/DeepLearnXMU/VNMT
Framework	none
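
The reparameterization technique and the variational lower bound mentioned in the abstract can be illustrated with a minimal sketch. It assumes a diagonal Gaussian posterior and a diagonal Gaussian prior, which the abstract does not spell out; the function names and the `log_likelihood` argument (standing in for the decoder's log-probability of the target) are hypothetical and not taken from the VNMT repository.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I).

    Sampling eps separately keeps z a deterministic function of mu and
    log_var, which is what lets gradients flow through the sample.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def gaussian_kl(mu_q, log_var_q, mu_p, log_var_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def lower_bound(log_likelihood, mu_q, log_var_q, mu_p, log_var_p):
    """Single-sample lower-bound estimate: log p(y | x, z) - KL(q || p)."""
    return log_likelihood - gaussian_kl(mu_q, log_var_q, mu_p, log_var_p)
```

In training, `log_likelihood` would be evaluated at a latent `z` drawn with `reparameterize`, and the negated bound would serve as the loss.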

Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure


Title	Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure
Authors	Oded Avraham, Yoav Goldberg
Abstract	We suggest a new method for creating and using gold-standard datasets for word similarity evaluation. Our goal is to improve the reliability of the evaluation, and we do this by redesigning the annotation task to achieve higher inter-rater agreement, and by defining a performance measure which takes the reliability of each annotation decision in the dataset into account.
Tasks
Published	2016-11-11
URL	http://arxiv.org/abs/1611.03641v2
PDF	http://arxiv.org/pdf/1611.03641v2.pdf
PWC	https://paperswithcode.com/paper/improving-reliability-of-word-similarity
Repo	https://github.com/oavraham1/ag-evaluation
Framework	none

GPU-accelerated real-time stixel computation


Title	GPU-accelerated real-time stixel computation
Authors	Daniel Hernandez-Juarez, Antonio Espinosa, David Vázquez, Antonio Manuel López, Juan Carlos Moure
Abstract	The Stixel World is a medium-level, compact representation of road scenes that abstracts millions of disparity pixels into hundreds or thousands of stixels. The goal of this work is to implement and evaluate a complete multi-stixel estimation pipeline on an embedded, energy-efficient, GPU-accelerated device. This work presents a full GPU-accelerated implementation of stixel estimation that produces reliable results at 26 frames per second (real-time) on the Tegra X1 for disparity images of 1024x440 pixels and stixel widths of 5 pixels, and achieves more than 400 frames per second on a high-end Titan X GPU card.
Tasks
Published	2016-10-13
URL	http://arxiv.org/abs/1610.04124v1
PDF	http://arxiv.org/pdf/1610.04124v1.pdf
PWC	https://paperswithcode.com/paper/gpu-accelerated-real-time-stixel-computation
Repo	https://github.com/dhernandez0/stixels
Framework	none

Using Discourse Signals for Robust Instructor Intervention Prediction


Title	Using Discourse Signals for Robust Instructor Intervention Prediction
Authors	Muthu Kumar Chandrasekaran, Carrie Demmans Epp, Min-Yen Kan, Diane Litman
Abstract	We tackle the prediction of instructor intervention in student posts from discussion forums in Massive Open Online Courses (MOOCs). Our key finding is that using automatically obtained discourse relations improves the prediction of when instructors intervene in student discussions, when compared with a state-of-the-art, feature-rich baseline. Our supervised classifier makes use of an automatic discourse parser which outputs Penn Discourse Treebank (PDTB) tags that represent in-post discourse features. We show PDTB relation-based features increase the robustness of the classifier and complement baseline features in recalling more diverse instructor intervention patterns. In comprehensive experiments over 14 MOOC offerings from several disciplines, the PDTB discourse features improve performance on average. The resultant models are less dependent on domain-specific vocabulary, allowing them to better generalize to new courses.
Tasks
Published	2016-12-03
URL	http://arxiv.org/abs/1612.00944v1
PDF	http://arxiv.org/pdf/1612.00944v1.pdf
PWC	https://paperswithcode.com/paper/using-discourse-signals-for-robust-instructor
Repo	https://github.com/WING-NUS/lib4moocdata
Framework	none

Slack and Margin Rescaling as Convex Extensions of Supermodular Functions


Title	Slack and Margin Rescaling as Convex Extensions of Supermodular Functions
Authors	Matthew B. Blaschko
Abstract	Slack and margin rescaling are variants of the structured output SVM, which is frequently applied to problems in computer vision such as image segmentation, object localization, and learning parts-based object models. They define convex surrogates to task-specific loss functions, which, when specialized to non-additive loss functions for multi-label problems, yield extensions to increasing set functions. We demonstrate in this paper that we may use these concepts to define polynomial time convex extensions of arbitrary supermodular functions, providing an analysis framework for the tightness of these surrogates. This analysis framework shows that, while neither margin nor slack rescaling dominates the other, known bounds on supermodular functions can be used to derive extensions that dominate both of these, indicating possible directions for defining novel structured output prediction surrogates. In addition to the analysis of structured prediction loss functions, these results imply an approach to supermodular minimization in which margin rescaling is combined with non-polynomial time convex extensions to compute a sequence of LP relaxations reminiscent of a cutting plane method. This approach is applied to the problem of selecting representative exemplars from a set of images, validating our theoretical contributions.
Tasks	Object Localization, Semantic Segmentation, Structured Prediction
Published	2016-06-19
URL	http://arxiv.org/abs/1606.05918v2
PDF	http://arxiv.org/pdf/1606.05918v2.pdf
PWC	https://paperswithcode.com/paper/slack-and-margin-rescaling-as-convex
Repo	https://github.com/blaschko/supermodularLP
Framework	none

Single-Channel Multi-Speaker Separation using Deep Clustering


Title	Single-Channel Multi-Speaker Separation using Deep Clustering
Authors	Yusuf Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey
Abstract	Deep clustering is a recently introduced deep learning architecture that uses discriminatively trained embeddings as the basis for clustering. It was recently applied to spectrogram segmentation, resulting in impressive results on speaker-independent multi-speaker separation. In this paper we extend the baseline system with an end-to-end signal approximation objective that greatly improves performance on a challenging speech separation task. We first significantly improve upon the baseline system performance by incorporating better regularization, larger temporal context, and a deeper architecture, culminating in an overall improvement in signal to distortion ratio (SDR) of 10.3 dB compared to the baseline of 6.0 dB for two-speaker separation, as well as a 7.1 dB SDR improvement for three-speaker separation. We then extend the model to incorporate an enhancement layer to refine the signal estimates, and perform end-to-end training through both the clustering and enhancement stages to maximize signal fidelity. We evaluate the results using automatic speech recognition. The new signal approximation objective, combined with end-to-end training, produces unprecedented performance, reducing the word error rate (WER) from 89.1% down to 30.8%. This represents a major advancement towards solving the cocktail party problem.
Tasks	Speaker Separation, Speech Recognition, Speech Separation
Published	2016-07-07
URL	http://arxiv.org/abs/1607.02173v1
PDF	http://arxiv.org/pdf/1607.02173v1.pdf
PWC	https://paperswithcode.com/paper/single-channel-multi-speaker-separation-using
Repo	https://github.com/ishandutta2007/Speech-Denoising-Landscape
Framework	none

Multi-Oriented Text Detection with Fully Convolutional Networks


Title	Multi-Oriented Text Detection with Fully Convolutional Networks
Authors	Zheng Zhang, Chengquan Zhang, Wei Shen, Cong Yao, Wenyu Liu, Xiang Bai
Abstract	In this paper, we propose a novel approach for text detection in natural images. Both local and global cues are taken into account for localizing text lines in a coarse-to-fine procedure. First, a Fully Convolutional Network (FCN) model is trained to predict the salient map of text regions in a holistic manner. Then, text line hypotheses are estimated by combining the salient map and character components. Finally, another FCN classifier is used to predict the centroid of each character, in order to remove the false hypotheses. The framework is general for handling text in multiple orientations, languages and fonts. The proposed method consistently achieves the state-of-the-art performance on three text detection benchmarks: MSRA-TD500, ICDAR2015 and ICDAR2013.
Tasks
Published	2016-04-14
URL	http://arxiv.org/abs/1604.04018v2
PDF	http://arxiv.org/pdf/1604.04018v2.pdf
PWC	https://paperswithcode.com/paper/multi-oriented-text-detection-with-fully
Repo	https://github.com/stupidZZ/FCN_Text
Framework	torch

Matching Networks for One Shot Learning


Title	Matching Networks for One Shot Learning
Authors	Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, Daan Wierstra
Abstract	Learning from a few examples remains a key challenge in machine learning. Despite recent advances in important domains such as vision and language, the standard supervised deep learning paradigm does not offer a satisfactory solution for learning new concepts rapidly from little data. In this work, we employ ideas from metric learning based on deep neural features and from recent advances that augment neural networks with external memories. Our framework learns a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types. We then define one-shot learning problems on vision (using Omniglot, ImageNet) and language tasks. Our algorithm improves one-shot accuracy on ImageNet from 87.6% to 93.2% and from 88.0% to 93.8% on Omniglot compared to competing approaches. We also demonstrate the usefulness of the same model on language modeling by introducing a one-shot task on the Penn Treebank.
Tasks	Few-Shot Image Classification, Few-Shot Learning, Language Modelling, Metric Learning, Omniglot, One-Shot Learning
Published	2016-06-13
URL	http://arxiv.org/abs/1606.04080v2
PDF	http://arxiv.org/pdf/1606.04080v2.pdf
PWC	https://paperswithcode.com/paper/matching-networks-for-one-shot-learning
Repo	https://github.com/schatty/matching-networks-tf
Framework	tf
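
The mapping from a labelled support set and an unlabelled example to a label, as described in the abstract, can be sketched as a similarity-weighted vote over the support labels. This is a minimal illustration that assumes embeddings are already computed and uses a softmax over cosine similarities as the attention weights; the function names are hypothetical and not taken from the linked repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def matching_predict(query, support, num_classes):
    """Return class probabilities for `query`.

    support: non-empty list of (embedding, label) pairs, label in [0, num_classes).
    Each support example votes for its own label with a softmax attention weight.
    """
    sims = [cosine(query, emb) for emb, _ in support]
    peak = max(sims)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in sims]
    total = sum(weights)
    probs = [0.0] * num_classes
    for w, (_, label) in zip(weights, support):
        probs[label] += w / total
    return probs
```

Because the prediction depends only on the support set passed in at call time, new classes can be handled by swapping the support set, with no fine-tuning step.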

Improving Neural Language Models with a Continuous Cache


Title	Improving Neural Language Models with a Continuous Cache
Authors	Edouard Grave, Armand Joulin, Nicolas Usunier
Abstract	We propose an extension to neural network language models to adapt their prediction to the recent history. Our model is a simplified version of memory augmented networks, which stores past hidden activations as memory and accesses them through a dot product with the current hidden activation. This mechanism is very efficient and scales to very large memory sizes. We also draw a link between the use of external memory in neural networks and cache models used with count-based language models. We demonstrate on several language model datasets that our approach performs significantly better than recent memory augmented networks.
Tasks	Language Modelling
Published	2016-12-13
URL	http://arxiv.org/abs/1612.04426v1
PDF	http://arxiv.org/pdf/1612.04426v1.pdf
PWC	https://paperswithcode.com/paper/improving-neural-language-models-with-a
Repo	https://github.com/arvieFrydenlund/awd-lstm-lm
Framework	pytorch
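
The cache mechanism in the abstract (past hidden activations stored as memory and accessed through a dot product with the current hidden activation) can be sketched as follows. The sketch assumes each stored hidden state is paired with the word that followed it, and that the cache distribution is linearly interpolated with the base model's distribution; the names `theta` and `lam` and their default values are illustrative assumptions.

```python
import math


def cache_distribution(h_t, memory, vocab_size, theta=1.0):
    """Distribution over the vocabulary induced by the cache.

    memory: non-empty list of (hidden_state, next_word_id) pairs from recent history.
    Each entry adds exp(theta * <h_t, h_i>) of mass to the word that followed h_i.
    """
    scores = [theta * sum(a * b for a, b in zip(h_t, h_i)) for h_i, _ in memory]
    peak = max(scores)  # subtract the max for numerical stability
    probs = [0.0] * vocab_size
    total = 0.0
    for s, (_, word) in zip(scores, memory):
        w = math.exp(s - peak)
        probs[word] += w
        total += w
    return [p / total for p in probs]


def interpolate(p_model, p_cache, lam=0.2):
    """Mix the base language model and cache distributions with weight lam on the cache."""
    return [(1.0 - lam) * pm + lam * pc for pm, pc in zip(p_model, p_cache)]
```

No parameters are learned in this step, which is why such a cache can be attached to an already trained language model.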

Deep Interactive Object Selection


Title	Deep Interactive Object Selection
Authors	Ning Xu, Brian Price, Scott Cohen, Jimei Yang, Thomas Huang
Abstract	Interactive object selection is a very important research problem and has many applications. Previous algorithms require substantial user interactions to estimate the foreground and background distributions. In this paper, we present a novel deep learning based algorithm which has a much better understanding of objectness and thus can reduce user interactions to just a few clicks. Our algorithm transforms user-provided positive and negative clicks into two Euclidean distance maps which are then concatenated with the RGB channels of images to compose (image, user interactions) pairs. We generate many such pairs by combining several random sampling strategies to model user click patterns and use them to fine-tune deep Fully Convolutional Networks (FCNs). Finally, the output probability maps of our FCN-8s model are integrated with graph cut optimization to refine the boundary segments. Our model is trained on the PASCAL segmentation dataset and evaluated on other datasets with different object classes. Experimental results on both seen and unseen objects clearly demonstrate that our algorithm has a good generalization ability and is superior to all existing interactive object selection approaches.
Tasks
Published	2016-03-13
URL	http://arxiv.org/abs/1603.04042v1
PDF	http://arxiv.org/pdf/1603.04042v1.pdf
PWC	https://paperswithcode.com/paper/deep-interactive-object-selection
Repo	https://github.com/IntelVCL/Intseg
Framework	tf
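
The click encoding described in the abstract (positive and negative clicks turned into two Euclidean distance maps and concatenated with the RGB channels) can be sketched in a few lines. This is a brute-force illustration on nested lists; the cap value of 255 and the function names are assumptions for the example, not details taken from the abstract or the linked repository.

```python
import math


def distance_map(height, width, clicks, cap=255.0):
    """Per-pixel Euclidean distance to the nearest click, capped at `cap`.

    clicks: list of (row, col) positions. With no clicks, every pixel gets `cap`.
    """
    if not clicks:
        return [[cap] * width for _ in range(height)]
    return [
        [min(cap, min(math.hypot(r - cr, c - cc) for cr, cc in clicks))
         for c in range(width)]
        for r in range(height)
    ]


def build_input(rgb, positive_clicks, negative_clicks):
    """Stack RGB with the two distance maps into a 5-channel image.

    rgb: height x width nested list of (R, G, B) tuples.
    Returns a height x width nested list of (R, G, B, pos_dist, neg_dist) tuples.
    """
    height, width = len(rgb), len(rgb[0])
    pos = distance_map(height, width, positive_clicks)
    neg = distance_map(height, width, negative_clicks)
    return [
        [tuple(rgb[r][c]) + (pos[r][c], neg[r][c]) for c in range(width)]
        for r in range(height)
    ]
```

The resulting 5-channel array is the kind of (image, user interactions) pair that would be fed to the segmentation network.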

SQuAD: 100,000+ Questions for Machine Comprehension of Text


Title	SQuAD: 100,000+ Questions for Machine Comprehension of Text
Authors	Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
Abstract	We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage. We analyze the dataset to understand the types of reasoning required to answer the questions, leaning heavily on dependency and constituency trees. We build a strong logistic regression model, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%). However, human performance (86.8%) is much higher, indicating that the dataset presents a good challenge problem for future research. The dataset is freely available at https://stanford-qa.com
Tasks	Question Answering, Reading Comprehension
Published	2016-06-16
URL	http://arxiv.org/abs/1606.05250v3
PDF	http://arxiv.org/pdf/1606.05250v3.pdf
PWC	https://paperswithcode.com/paper/squad-100000-questions-for-machine
Repo	https://github.com/ZhangShiyue/QGforQA
Framework	tf

Personalized Speech recognition on mobile devices


Title	Personalized Speech recognition on mobile devices
Authors	Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander Gruenstein, Francoise Beaufays, Carolina Parada
Abstract	We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its memory footprint using an SVD-based compression scheme. Additionally, we minimize our memory footprint by using a single language model for both dictation and voice command domains, constructed using Bayesian interpolation. Finally, in order to properly handle device-specific information, such as proper names and other context-dependent information, we inject vocabulary items into the decoder graph and bias the language model on-the-fly. Our system achieves 13.5% word error rate on an open-ended dictation task, running with a median speed that is seven times faster than real-time.
Tasks	Language Modelling, Large Vocabulary Continuous Speech Recognition, Speech Recognition
Published	2016-03-10
URL	http://arxiv.org/abs/1603.03185v2
PDF	http://arxiv.org/pdf/1603.03185v2.pdf
PWC	https://paperswithcode.com/paper/personalized-speech-recognition-on-mobile
Repo	https://github.com/knlee-voice/PaperNotes
Framework	none

Hierarchical Object Detection with Deep Reinforcement Learning


Title	Hierarchical Object Detection with Deep Reinforcement Learning
Authors	Miriam Bellver, Xavier Giro-i-Nieto, Ferran Marques, Jordi Torres
Abstract	We present a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. The key idea is to focus on those parts of the image that contain richer information and zoom on them. We train an intelligent agent that, given an image window, is capable of deciding where to focus the attention among five different predefined region candidates (smaller windows). This procedure is iterated, providing a hierarchical image analysis. We compare two different candidate proposal strategies to guide the object search: with and without overlap. Moreover, our work compares two different strategies to extract features from a convolutional neural network for each region proposal: a first one that computes new feature maps for each region proposal, and a second one that computes the feature maps for the whole image to later generate crops for each region proposal. Experiments indicate better results for the overlapping candidate proposal strategy and a loss of performance for the cropped image features due to the loss of spatial resolution. We argue that, while this loss seems unavoidable when working with large numbers of object candidates, the much smaller number of region proposals generated by our reinforcement learning agent makes it feasible to extract features for each location without sharing convolutional computation among regions.
Tasks	Object Detection
Published	2016-11-11
URL	http://arxiv.org/abs/1611.03718v2
PDF	http://arxiv.org/pdf/1611.03718v2.pdf
PWC	https://paperswithcode.com/paper/hierarchical-object-detection-with-deep
Repo	https://github.com/imatge-upc/detection-2016-nipsws
Framework	none

Fast Bayesian Non-Negative Matrix Factorisation and Tri-Factorisation


Title	Fast Bayesian Non-Negative Matrix Factorisation and Tri-Factorisation
Authors	Thomas Brouwer, Jes Frellsen, Pietro Liò
Abstract	We present a fast variational Bayesian algorithm for performing non-negative matrix factorisation and tri-factorisation. We show that our approach achieves faster convergence per iteration and timestep (wall-clock) than Gibbs sampling and non-probabilistic approaches, and does not require additional samples to estimate the posterior. We show that in particular for matrix tri-factorisation convergence is difficult, but our variational Bayesian approach offers a fast solution, allowing the tri-factorisation approach to be used more effectively.
Tasks
Published	2016-10-26
URL	http://arxiv.org/abs/1610.08127v1
PDF	http://arxiv.org/pdf/1610.08127v1.pdf
PWC	https://paperswithcode.com/paper/fast-bayesian-non-negative-matrix
Repo	https://github.com/ThomasBrouwer/HMF
Framework	none

Learning to Detect Multiple Photographic Defects


Title	Learning to Detect Multiple Photographic Defects
Authors	Ning Yu, Xiaohui Shen, Zhe Lin, Radomir Mech, Connelly Barnes
Abstract	In this paper, we introduce the problem of simultaneously detecting multiple photographic defects. We aim at detecting the existence, severity, and potential locations of common photographic defects related to color, noise, blur and composition. The automatic detection of such defects could be used to provide users with suggestions for how to improve photos without the need to laboriously try various correction methods. Defect detection could also help users select photos of higher quality while filtering out those with severe defects in photo curation and summarization. To investigate this problem, we collected a large-scale dataset of user annotations on seven common photographic defects, which allows us to evaluate algorithms by measuring their consistency with human judgments. Our new dataset enables us to formulate the problem as a multi-task learning problem and train a multi-column deep convolutional neural network (CNN) to simultaneously predict the severity of all the defects. Unlike some existing single-defect estimation methods that rely on low-level statistics and may fail in many cases on natural photographs, our model is able to understand image contents and quality at a higher level. As a result, in our experiments, we show that our model has predictions with much higher consistency with human judgments than low-level methods as well as several baseline CNN models. Our model also performs better than an average human from our user study.
Tasks	Multi-Task Learning
Published	2016-12-06
URL	http://arxiv.org/abs/1612.01635v5
PDF	http://arxiv.org/pdf/1612.01635v5.pdf
PWC	https://paperswithcode.com/paper/learning-to-detect-multiple-photographic
Repo	https://github.com/ningyu1991/DefectDetection
Framework	caffe2