May 7, 2019

2808 words 14 mins read

Paper Group AWR 33

Variational Neural Machine Translation. Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure. GPU-accelerated real-time stixel computation. Using Discourse Signals for Robust Instructor Intervention Prediction. Slack and Margin Rescaling as Convex Extensions of Supermodular Functions. Single-Cha …

Variational Neural Machine Translation

Title Variational Neural Machine Translation
Authors Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan, Min Zhang
Abstract Models of neural machine translation are often from a discriminative family of encoder-decoders that learn a conditional distribution of a target sentence given a source sentence. In this paper, we propose a variational model to learn this conditional distribution for neural machine translation: a variational encoder-decoder model that can be trained end-to-end. Different from the vanilla encoder-decoder model that generates target translations from hidden representations of source sentences alone, the variational model introduces a continuous latent variable to explicitly model underlying semantics of source sentences and to guide the generation of target translations. In order to perform efficient posterior inference and large-scale training, we build a neural posterior approximator conditioned on both the source and the target sides, and equip it with a reparameterization technique to estimate the variational lower bound. Experiments on both Chinese-English and English-German translation tasks show that the proposed variational neural machine translation achieves significant improvements over the vanilla neural machine translation baselines.
Tasks Machine Translation
Published 2016-05-25
URL http://arxiv.org/abs/1605.07869v2
PDF http://arxiv.org/pdf/1605.07869v2.pdf
PWC https://paperswithcode.com/paper/variational-neural-machine-translation
Repo https://github.com/DeepLearnXMU/VNMT
Framework none
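
The trick that makes the variational lower bound trainable is the reparameterization of a Gaussian latent variable. The sketch below is a minimal, framework-free illustration that assumes a diagonal-Gaussian posterior q(z | x, y) and a diagonal-Gaussian prior p(z | x) conditioned on the source only; the abstract describes only the posterior, so the prior's form and all function names here are our assumptions, not code from the VNMT repository.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) per dimension.

    Writing the sample this way keeps z a differentiable function of
    mu and log_var, so gradients can flow through the sampling step.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """Closed-form KL(q || p) between diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def negative_elbo(log_likelihood, mu_q, log_var_q, mu_p, log_var_p):
    """Loss to minimise: -(E_q[log p(y | x, z)] - KL(q(z | x, y) || p(z | x))).

    `log_likelihood` stands in (hypothetically) for the decoder's
    log-probability of the reference translation given one sampled z.
    """
    return -(log_likelihood - kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p))


# Draw one latent sample with a seeded generator for reproducibility.
z = reparameterize([0.0, 1.0], [0.0, -2.0], random.Random(0))
```

With a single sample per sentence pair, the first ELBO term is estimated from z and the KL term is exact, which is the usual low-variance setup for this kind of model.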

Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure

Title Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure
Authors Oded Avraham, Yoav Goldberg
Abstract We suggest a new method for creating and using gold-standard datasets for word similarity evaluation. Our goal is to improve the reliability of the evaluation, and we do this by redesigning the annotation task to achieve higher inter-rater agreement, and by defining a performance measure which takes the reliability of each annotation decision in the dataset into account.
Tasks
Published 2016-11-11
URL http://arxiv.org/abs/1611.03641v2
PDF http://arxiv.org/pdf/1611.03641v2.pdf
PWC https://paperswithcode.com/paper/improving-reliability-of-word-similarity
Repo https://github.com/oavraham1/ag-evaluation
Framework none

GPU-accelerated real-time stixel computation

Title GPU-accelerated real-time stixel computation
Authors Daniel Hernandez-Juarez, Antonio Espinosa, David Vázquez, Antonio Manuel López, Juan Carlos Moure
Abstract The Stixel World is a medium-level, compact representation of road scenes that abstracts millions of disparity pixels into hundreds or thousands of stixels. The goal of this work is to implement and evaluate a complete multi-stixel estimation pipeline on an embedded, energy-efficient, GPU-accelerated device. This work presents a full GPU-accelerated implementation of stixel estimation that produces reliable results at 26 frames per second (real-time) on the Tegra X1 for disparity images of 1024x440 pixels and stixel widths of 5 pixels, and achieves more than 400 frames per second on a high-end Titan X GPU card.
Tasks
Published 2016-10-13
URL http://arxiv.org/abs/1610.04124v1
PDF http://arxiv.org/pdf/1610.04124v1.pdf
PWC https://paperswithcode.com/paper/gpu-accelerated-real-time-stixel-computation
Repo https://github.com/dhernandez0/stixels
Framework none

Using Discourse Signals for Robust Instructor Intervention Prediction

Title Using Discourse Signals for Robust Instructor Intervention Prediction
Authors Muthu Kumar Chandrasekaran, Carrie Demmans Epp, Min-Yen Kan, Diane Litman
Abstract We tackle the prediction of instructor intervention in student posts from discussion forums in Massive Open Online Courses (MOOCs). Our key finding is that using automatically obtained discourse relations improves the prediction of when instructors intervene in student discussions, when compared with a state-of-the-art, feature-rich baseline. Our supervised classifier makes use of an automatic discourse parser which outputs Penn Discourse Treebank (PDTB) tags that represent in-post discourse features. We show PDTB relation-based features increase the robustness of the classifier and complement baseline features in recalling more diverse instructor intervention patterns. In comprehensive experiments over 14 MOOC offerings from several disciplines, the PDTB discourse features improve performance on average. The resultant models are less dependent on domain-specific vocabulary, allowing them to better generalize to new courses.
Tasks
Published 2016-12-03
URL http://arxiv.org/abs/1612.00944v1
PDF http://arxiv.org/pdf/1612.00944v1.pdf
PWC https://paperswithcode.com/paper/using-discourse-signals-for-robust-instructor
Repo https://github.com/WING-NUS/lib4moocdata
Framework none

Slack and Margin Rescaling as Convex Extensions of Supermodular Functions

Title Slack and Margin Rescaling as Convex Extensions of Supermodular Functions
Authors Matthew B. Blaschko
Abstract Slack and margin rescaling are variants of the structured output SVM, which is frequently applied to problems in computer vision such as image segmentation, object localization, and learning parts-based object models. They define convex surrogates to task-specific loss functions, which, when specialized to non-additive loss functions for multi-label problems, yield extensions to increasing set functions. We demonstrate in this paper that we may use these concepts to define polynomial-time convex extensions of arbitrary supermodular functions, providing an analysis framework for the tightness of these surrogates. This analysis framework shows that, while neither margin nor slack rescaling dominates the other, known bounds on supermodular functions can be used to derive extensions that dominate both of these, indicating possible directions for defining novel structured output prediction surrogates. In addition to the analysis of structured prediction loss functions, these results imply an approach to supermodular minimization in which margin rescaling is combined with non-polynomial-time convex extensions to compute a sequence of LP relaxations reminiscent of a cutting plane method. This approach is applied to the problem of selecting representative exemplars from a set of images, validating our theoretical contributions.
Tasks Object Localization, Semantic Segmentation, Structured Prediction
Published 2016-06-19
URL http://arxiv.org/abs/1606.05918v2
PDF http://arxiv.org/pdf/1606.05918v2.pdf
PWC https://paperswithcode.com/paper/slack-and-margin-rescaling-as-convex
Repo https://github.com/blaschko/supermodularLP
Framework none
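
For reference, with score s(y) = <w, phi(x_i, y)> and task loss Delta, the standard structured-output SVM surrogates are margin rescaling, max_y [Delta(y_i, y) + s(y) - s(y_i)], and slack rescaling, max_y Delta(y_i, y) [1 + s(y) - s(y_i)]. The sketch below evaluates both by brute-force enumeration on a tiny multi-label problem with Hamming loss. Enumeration is exponential in the number of labels, so this only makes the definitions concrete; it does not reproduce the paper's polynomial-time extensions, and the weights are made-up illustration values.

```python
from itertools import product


def hamming(a, b):
    """Hamming loss between two binary label vectors (a non-negative task loss)."""
    return sum(ai != bi for ai, bi in zip(a, b))


def margin_rescaling(scores, delta, y_true):
    """max_y [Delta(y_true, y) + s(y) - s(y_true)] over all candidate labelings."""
    s_true = scores[y_true]
    return max(delta(y_true, y) + s - s_true for y, s in scores.items())


def slack_rescaling(scores, delta, y_true):
    """max_y Delta(y_true, y) * (1 + s(y) - s(y_true)) over all candidate labelings."""
    s_true = scores[y_true]
    return max(delta(y_true, y) * (1.0 + s - s_true) for y, s in scores.items())


# Toy linear scores s(y) = sum_j w_j * y_j over every binary labeling of 2 labels.
w = (0.5, 0.2)
scores = {y: sum(wj * yj for wj, yj in zip(w, y)) for y in product((0, 1), repeat=2)}
y_true = (1, 0)
y_pred = max(scores, key=scores.get)  # highest-scoring labeling, here (1, 1)

mr = margin_rescaling(scores, hamming, y_true)
sr = slack_rescaling(scores, hamming, y_true)
# Both surrogates upper-bound the task loss of the prediction.
assert mr >= hamming(y_true, y_pred) and sr >= hamming(y_true, y_pred)
```

Here margin rescaling gives 1.7 and slack rescaling 1.4, but the order can flip for other scores (for instance, a single wrong labeling with loss 2 scoring 0.5 above the ground truth gives 2.5 versus 3.0), so neither surrogate is pointwise larger.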

Single-Channel Multi-Speaker Separation using Deep Clustering

Title Single-Channel Multi-Speaker Separation using Deep Clustering
Authors Yusuf Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey
Abstract Deep clustering is a recently introduced deep learning architecture that uses discriminatively trained embeddings as the basis for clustering. It was recently applied to spectrogram segmentation, with impressive results on speaker-independent multi-speaker separation. In this paper we extend the baseline system with an end-to-end signal approximation objective that greatly improves performance on a challenging speech separation task. We first significantly improve upon the baseline system performance by incorporating better regularization, larger temporal context, and a deeper architecture, culminating in an overall improvement in signal-to-distortion ratio (SDR) of 10.3 dB compared to the baseline of 6.0 dB for two-speaker separation, as well as a 7.1 dB SDR improvement for three-speaker separation. We then extend the model to incorporate an enhancement layer to refine the signal estimates, and perform end-to-end training through both the clustering and enhancement stages to maximize signal fidelity. We evaluate the results using automatic speech recognition. The new signal approximation objective, combined with end-to-end training, produces unprecedented performance, reducing the word error rate (WER) from 89.1% down to 30.8%. This represents a major advancement towards solving the cocktail party problem.
Tasks Speaker Separation, Speech Recognition, Speech Separation
Published 2016-07-07
URL http://arxiv.org/abs/1607.02173v1
PDF http://arxiv.org/pdf/1607.02173v1.pdf
PWC https://paperswithcode.com/paper/single-channel-multi-speaker-separation-using
Repo https://github.com/ishandutta2007/Speech-Denoising-Landscape
Framework none
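
The abstract builds on the deep clustering baseline without restating its objective. In the original deep clustering formulation (Hershey et al.), each time-frequency bin gets a unit-norm embedding (rows of V, N x D) and the target is a one-hot speaker assignment (rows of Y, N x C); the loss is ||V V^T - Y Y^T||_F^2, which expands to ||V^T V||_F^2 - 2 ||V^T Y||_F^2 + ||Y^T Y||_F^2, so the N x N affinity matrices never have to be formed. The pure-Python sketch below checks that identity on a toy example. It illustrates only the baseline embedding objective, not the paper's new signal approximation objective.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def frob_sq(m):
    """Squared Frobenius norm."""
    return sum(x * x for row in m for x in row)


def dc_loss_naive(V, Y):
    """||V V^T - Y Y^T||_F^2, forming the N x N affinity matrices explicitly."""
    A = matmul(V, transpose(V))
    B = matmul(Y, transpose(Y))
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def dc_loss(V, Y):
    """Same value using only D x D, D x C and C x C products (cheap when N is large)."""
    Vt, Yt = transpose(V), transpose(Y)
    return frob_sq(matmul(Vt, V)) - 2 * frob_sq(matmul(Vt, Y)) + frob_sq(matmul(Yt, Y))


# Toy example: 3 time-frequency bins, 2-D embeddings, 2 speakers.
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # unit-norm embedding per bin
Y = [[1, 0], [1, 0], [0, 1]]              # one-hot dominant speaker per bin
assert dc_loss(V, Y) == dc_loss_naive(V, Y)
```

Using the one-hot targets themselves as embeddings gives a loss of zero, which is the ideal clustering the network is pushed towards.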

Multi-Oriented Text Detection with Fully Convolutional Networks

Title Multi-Oriented Text Detection with Fully Convolutional Networks
Authors Zheng Zhang, Chengquan Zhang, Wei Shen, Cong Yao, Wenyu Liu, Xiang Bai
Abstract In this paper, we propose a novel approach for text detection in natural images. Both local and global cues are taken into account for localizing text lines in a coarse-to-fine procedure. First, a Fully Convolutional Network (FCN) model is trained to predict the salient map of text regions in a holistic manner. Then, text line hypotheses are estimated by combining the salient map and character components. Finally, another FCN classifier is used to predict the centroid of each character, in order to remove the false hypotheses. The framework is general for handling text in multiple orientations, languages and fonts. The proposed method consistently achieves state-of-the-art performance on three text detection benchmarks: MSRA-TD500, ICDAR2015 and ICDAR2013.
Tasks
Published 2016-04-14
URL http://arxiv.org/abs/1604.04018v2
PDF http://arxiv.org/pdf/1604.04018v2.pdf
PWC https://paperswithcode.com/paper/multi-oriented-text-detection-with-fully
Repo https://github.com/stupidZZ/FCN_Text
Framework torch

Matching Networks for One Shot Learning

Title Matching Networks for One Shot Learning
Authors Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, Daan Wierstra
Abstract Learning from a few examples remains a key challenge in machine learning. Despite recent advances in important domains such as vision and language, the standard supervised deep learning paradigm does not offer a satisfactory solution for learning new concepts rapidly from little data. In this work, we employ ideas from metric learning based on deep neural features and from recent advances that augment neural networks with external memories. Our framework learns a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types. We then define one-shot learning problems on vision (using Omniglot, ImageNet) and language tasks. Our algorithm improves one-shot accuracy on ImageNet from 87.6% to 93.2% and from 88.0% to 93.8% on Omniglot compared to competing approaches. We also demonstrate the usefulness of the same model on language modeling by introducing a one-shot task on the Penn Treebank.
Tasks Few-Shot Image Classification, Few-Shot Learning, Language Modelling, Metric Learning, Omniglot, One-Shot Learning
Published 2016-06-13
URL http://arxiv.org/abs/1606.04080v2
PDF http://arxiv.org/pdf/1606.04080v2.pdf
PWC https://paperswithcode.com/paper/matching-networks-for-one-shot-learning
Repo https://github.com/schatty/matching-networks-tf
Framework tf
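
A matching network classifies a query by an attention-weighted vote over the support set: the predicted label distribution is the sum over i of a(x_query, x_i) y_i, where a is a softmax over cosine similarities between embedded examples. The sketch below assumes the embeddings have already been computed (hand-picked 2-D vectors stand in for the trained embedding networks) and omits the paper's full context embeddings; the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def matching_predict(query_emb, support_embs, support_labels, num_classes):
    """P(label | query, support) = sum_i softmax_i(cos(query, x_i)) * onehot(y_i)."""
    sims = [cosine(query_emb, s) for s in support_embs]
    m = max(sims)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in sims]
    z = sum(weights)
    probs = [0.0] * num_classes
    for w, label in zip(weights, support_labels):
        probs[label] += w / z  # each support example votes for its own label
    return probs


# One-shot, 3-way toy episode with hand-picked 2-D "embeddings".
support = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
labels = [0, 1, 2]
probs = matching_predict([0.9, 0.2], support, labels, num_classes=3)
```

Adapting to a new class only requires adding its labelled examples to the support set, with no fine-tuning, which is the property the abstract emphasizes.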

Improving Neural Language Models with a Continuous Cache

Title Improving Neural Language Models with a Continuous Cache
Authors Edouard Grave, Armand Joulin, Nicolas Usunier
Abstract We propose an extension to neural network language models to adapt their prediction to the recent history. Our model is a simplified version of memory augmented networks, which stores past hidden activations as memory and accesses them through a dot product with the current hidden activation. This mechanism is very efficient and scales to very large memory sizes. We also draw a link between the use of external memory in neural networks and cache models used with count-based language models. We demonstrate on several language model datasets that our approach performs significantly better than recent memory augmented networks.
Tasks Language Modelling
Published 2016-12-13
URL http://arxiv.org/abs/1612.04426v1
PDF http://arxiv.org/pdf/1612.04426v1.pdf
PWC https://paperswithcode.com/paper/improving-neural-language-models-with-a
Repo https://github.com/arvieFrydenlund/awd-lstm-lm
Framework pytorch
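
The cache mechanism in the abstract can be read as p_cache(w) proportional to the sum over past steps i of 1[x_{i+1} = w] * exp(theta * <h_t, h_i>), combined with the model's own distribution. This minimal sketch assumes linear interpolation as the combination rule; theta (how peaked the cache distribution is) and lam (interpolation weight) are tuning parameters, and the hidden states and vocabulary probabilities are invented toy values.

```python
import math
from collections import defaultdict


def cache_distribution(h_t, past_hidden, past_next_words, theta):
    """p_cache(w) proportional to sum_i 1[word after step i == w] * exp(theta * <h_t, h_i>)."""
    if not past_hidden:
        return {}
    scores = [theta * sum(a * b for a, b in zip(h_t, h_i)) for h_i in past_hidden]
    m = max(scores)  # stabilise the exponentials
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    p = defaultdict(float)
    for w, word in zip(weights, past_next_words):
        p[word] += w / z
    return dict(p)


def interpolate(p_vocab, p_cache, lam):
    """Linear interpolation (1 - lam) * p_vocab + lam * p_cache over the union of words."""
    words = set(p_vocab) | set(p_cache)
    return {w: (1 - lam) * p_vocab.get(w, 0.0) + lam * p_cache.get(w, 0.0) for w in words}


# Toy numbers: two cached steps whose hidden states were followed by "cat" and "dog".
p_vocab = {"the": 0.5, "cat": 0.3, "dog": 0.2}
p_cache = cache_distribution([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], ["cat", "dog"], theta=1.0)
p = interpolate(p_vocab, p_cache, lam=0.5)
```

Because the cache only stores (hidden state, next word) pairs and needs no training, it can be bolted onto a pretrained model, which is why it scales to large memories.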

Deep Interactive Object Selection

Title Deep Interactive Object Selection
Authors Ning Xu, Brian Price, Scott Cohen, Jimei Yang, Thomas Huang
Abstract Interactive object selection is a very important research problem and has many applications. Previous algorithms require substantial user interactions to estimate the foreground and background distributions. In this paper, we present a novel deep learning based algorithm which has a much better understanding of objectness and thus can reduce user interactions to just a few clicks. Our algorithm transforms user-provided positive and negative clicks into two Euclidean distance maps, which are then concatenated with the RGB channels of images to compose (image, user interactions) pairs. We generate many such pairs by combining several random sampling strategies to model user click patterns and use them to fine-tune deep Fully Convolutional Networks (FCNs). Finally, the output probability maps of our FCN-8s model are integrated with graph cut optimization to refine the boundary segments. Our model is trained on the PASCAL segmentation dataset and evaluated on other datasets with different object classes. Experimental results on both seen and unseen objects clearly demonstrate that our algorithm has a good generalization ability and is superior to all existing interactive object selection approaches.
Tasks
Published 2016-03-13
URL http://arxiv.org/abs/1603.04042v1
PDF http://arxiv.org/pdf/1603.04042v1.pdf
PWC https://paperswithcode.com/paper/deep-interactive-object-selection
Repo https://github.com/IntelVCL/Intseg
Framework tf
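
The input encoding described in the abstract, two Euclidean distance maps stacked with RGB, can be sketched as follows. This is a minimal illustration on nested lists: clicks are (row, col) pixel coordinates, and capping distances at 255 so they share the 8-bit range of the colour channels is an assumption of this sketch. The function names are hypothetical, not from the Intseg repository.

```python
import math


def click_distance_map(height, width, clicks, cap=255.0):
    """Per-pixel Euclidean distance to the nearest click, truncated at `cap`.

    With no clicks of this type every pixel gets the cap value.
    """
    if not clicks:
        return [[cap] * width for _ in range(height)]
    return [
        [min(cap, min(math.hypot(r - cr, c - cc) for cr, cc in clicks)) for c in range(width)]
        for r in range(height)
    ]


def build_network_input(rgb, positive_clicks, negative_clicks):
    """Stack RGB with positive/negative distance maps into an H x W x 5 input."""
    h, w = len(rgb), len(rgb[0])
    pos = click_distance_map(h, w, positive_clicks)
    neg = click_distance_map(h, w, negative_clicks)
    return [[list(rgb[r][c]) + [pos[r][c], neg[r][c]] for c in range(w)] for r in range(h)]


# 3 x 3 black image, one positive click in the centre, no negative clicks.
image = [[(0, 0, 0)] * 3 for _ in range(3)]
x = build_network_input(image, positive_clicks=[(1, 1)], negative_clicks=[])
```

Each extra click only changes the two distance channels, so a new (image, user interactions) training pair can be synthesised cheaply by sampling click positions.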

SQuAD: 100,000+ Questions for Machine Comprehension of Text

Title SQuAD: 100,000+ Questions for Machine Comprehension of Text
Authors Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
Abstract We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage. We analyze the dataset to understand the types of reasoning required to answer the questions, leaning heavily on dependency and constituency trees. We build a strong logistic regression model, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%). However, human performance (86.8%) is much higher, indicating that the dataset presents a good challenge problem for future research. The dataset is freely available at https://stanford-qa.com
Tasks Question Answering, Reading Comprehension
Published 2016-06-16
URL http://arxiv.org/abs/1606.05250v3
PDF http://arxiv.org/pdf/1606.05250v3.pdf
PWC https://paperswithcode.com/paper/squad-100000-questions-for-machine
Repo https://github.com/ZhangShiyue/QGforQA
Framework tf
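
The abstract reports results as F1 scores (20% for a simple baseline, 51.0% for logistic regression, 86.8% for humans). Below is a minimal re-implementation of the usual token-overlap F1 for SQuAD, modelled on the normalisation steps of the official evaluation script as we understand them (lower-casing, removing punctuation and the articles a/an/the, collapsing whitespace). Treat it as an approximation rather than the official scorer.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(s):
    """Lower-case, strip punctuation and the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def f1(prediction, ground_truth):
    """Token-level F1 between one predicted span and one reference span."""
    pred = normalize_answer(prediction).split()
    gold = normalize_answer(ground_truth).split()
    common = Counter(pred) & Counter(gold)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction, ground_truths):
    """Score against every reference answer for the question and keep the best match."""
    return max(f1(prediction, g) for g in ground_truths)
```

For example, predicting "Paris, France" against the reference "Paris" gives precision 1/2 and recall 1, hence F1 = 2/3; partial credit like this is why F1 is reported alongside exact match.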

Personalized Speech recognition on mobile devices

Title Personalized Speech recognition on mobile devices
Authors Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander Gruenstein, Francoise Beaufays, Carolina Parada
Abstract We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its memory footprint using an SVD-based compression scheme. Additionally, we minimize our memory footprint by using a single language model for both dictation and voice command domains, constructed using Bayesian interpolation. Finally, in order to properly handle device-specific information, such as proper names and other context-dependent information, we inject vocabulary items into the decoder graph and bias the language model on-the-fly. Our system achieves 13.5% word error rate on an open-ended dictation task, running with a median speed that is seven times faster than real-time.
Tasks Language Modelling, Large Vocabulary Continuous Speech Recognition, Speech Recognition
Published 2016-03-10
URL http://arxiv.org/abs/1603.03185v2
PDF http://arxiv.org/pdf/1603.03185v2.pdf
PWC https://paperswithcode.com/paper/personalized-speech-recognition-on-mobile
Repo https://github.com/knlee-voice/PaperNotes
Framework none
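
The abstract names two memory savers, a quantized LSTM and an SVD-based compression scheme, without giving their details. The sketch below only shows the generic storage arithmetic behind them: a rank-k factorisation of an m x n matrix costs k(m + n) numbers instead of mn, and 8-bit symmetric quantisation stores one small integer per weight plus a shared scale. The matrix sizes, the 50% budget, and the quantisation scheme are illustrative assumptions, not the paper's settings.

```python
def dense_params(m, n):
    """Parameters in a dense m x n weight matrix."""
    return m * n


def low_rank_params(m, n, k):
    """Parameters after replacing W (m x n) by U (m x k) times V (k x n), e.g. from a truncated SVD."""
    return k * (m + n)


def max_rank_for_budget(m, n, fraction):
    """Largest rank k whose factorised size is at most `fraction` of the dense size."""
    return int(fraction * dense_params(m, n)) // (m + n)


def quantize_int8(values):
    """Symmetric linear 8-bit quantisation with a single shared scale (assumed scheme)."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid a zero scale for all-zero input
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [qi * scale for qi in q]


# A hypothetical 1000 x 500 matrix compressed to at most half its dense size.
k = max_rank_for_budget(1000, 500, 0.5)
```

Low-rank factorisation pays off only when k is well below mn / (m + n); quantisation then shrinks whatever factors remain by a further factor of about four relative to 32-bit floats.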

Hierarchical Object Detection with Deep Reinforcement Learning

Title Hierarchical Object Detection with Deep Reinforcement Learning
Authors Miriam Bellver, Xavier Giro-i-Nieto, Ferran Marques, Jordi Torres
Abstract We present a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. The key idea is to focus on those parts of the image that contain richer information and zoom in on them. We train an intelligent agent that, given an image window, is capable of deciding where to focus the attention among five different predefined region candidates (smaller windows). This procedure is iterated, providing a hierarchical image analysis. We compare two different candidate proposal strategies to guide the object search: with and without overlap. Moreover, our work compares two different strategies to extract features from a convolutional neural network for each region proposal: a first one that computes new feature maps for each region proposal, and a second one that computes the feature maps for the whole image to later generate crops for each region proposal. Experiments indicate better results for the overlapping candidate proposal strategy and a loss of performance for the cropped image features due to the loss of spatial resolution. We argue that, while this loss seems unavoidable when working with large amounts of object candidates, the much smaller number of region proposals generated by our reinforcement learning agent makes it feasible to extract features for each location without sharing convolutional computation among regions.
Tasks Object Detection
Published 2016-11-11
URL http://arxiv.org/abs/1611.03718v2
PDF http://arxiv.org/pdf/1611.03718v2.pdf
PWC https://paperswithcode.com/paper/hierarchical-object-detection-with-deep
Repo https://github.com/imatge-upc/detection-2016-nipsws
Framework none
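
Each zoom step chooses one of five sub-windows of the current window. The sketch below assumes the five candidates are the four corner-anchored windows plus a centred one, each `scale` times the parent's width and height: scale=0.5 makes the four corners tile the parent, while a larger value such as 0.75 (an assumed setting) makes them overlap. The `policy` argument is a hypothetical stand-in for the trained agent; boxes are (x0, y0, x1, y1).

```python
def five_subregions(box, scale=0.5):
    """Four corner windows and one centred window, each `scale` of the parent size."""
    x0, y0, x1, y1 = box
    w, h = (x1 - x0) * scale, (y1 - y0) * scale
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    return [
        (x0, y0, x0 + w, y0 + h),                  # top-left
        (x1 - w, y0, x1, y0 + h),                  # top-right
        (x0, y1 - h, x0 + w, y1),                  # bottom-left
        (x1 - w, y1 - h, x1, y1),                  # bottom-right
        (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2),  # centre
    ]


def zoom(box, policy, steps, scale=0.5):
    """Repeatedly replace the window by the sub-window the policy picks."""
    path = [box]
    for _ in range(steps):
        box = five_subregions(box, scale)[policy(box)]
        path.append(box)
    return path


# A toy policy that always picks the centre window of a 64 x 64 image.
path = zoom((0, 0, 64, 64), policy=lambda b: 4, steps=2)
```

Because each step shrinks the window geometrically, only a handful of regions are ever visited, which is what lets the paper afford a separate feature extraction per region.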

Fast Bayesian Non-Negative Matrix Factorisation and Tri-Factorisation

Title Fast Bayesian Non-Negative Matrix Factorisation and Tri-Factorisation
Authors Thomas Brouwer, Jes Frellsen, Pietro Liò
Abstract We present a fast variational Bayesian algorithm for performing non-negative matrix factorisation and tri-factorisation. We show that our approach achieves faster convergence per iteration and timestep (wall-clock) than Gibbs sampling and non-probabilistic approaches, and does not require additional samples to estimate the posterior. We show that convergence is particularly difficult for matrix tri-factorisation, but our variational Bayesian approach offers a fast solution, allowing the tri-factorisation approach to be used more effectively.
Tasks
Published 2016-10-26
URL http://arxiv.org/abs/1610.08127v1
PDF http://arxiv.org/pdf/1610.08127v1.pdf
PWC https://paperswithcode.com/paper/fast-bayesian-non-negative-matrix
Repo https://github.com/ThomasBrouwer/HMF
Framework none

Learning to Detect Multiple Photographic Defects

Title Learning to Detect Multiple Photographic Defects
Authors Ning Yu, Xiaohui Shen, Zhe Lin, Radomir Mech, Connelly Barnes
Abstract In this paper, we introduce the problem of simultaneously detecting multiple photographic defects. We aim to detect the existence, severity, and potential locations of common photographic defects related to color, noise, blur and composition. The automatic detection of such defects could be used to provide users with suggestions for how to improve photos without the need to laboriously try various correction methods. Defect detection could also help users select photos of higher quality while filtering out those with severe defects in photo curation and summarization. To investigate this problem, we collected a large-scale dataset of user annotations on seven common photographic defects, which allows us to evaluate algorithms by measuring their consistency with human judgments. Our new dataset enables us to formulate the problem as a multi-task learning problem and train a multi-column deep convolutional neural network (CNN) to simultaneously predict the severity of all the defects. Unlike some existing single-defect estimation methods that rely on low-level statistics and may fail in many cases on natural photographs, our model is able to understand image contents and quality at a higher level. As a result, our experiments show that our model's predictions are much more consistent with human judgments than those of low-level methods and several baseline CNN models. Our model also performs better than an average human from our user study.
Tasks Multi-Task Learning
Published 2016-12-06
URL http://arxiv.org/abs/1612.01635v5
PDF http://arxiv.org/pdf/1612.01635v5.pdf
PWC https://paperswithcode.com/paper/learning-to-detect-multiple-photographic
Repo https://github.com/ningyu1991/DefectDetection
Framework caffe2