Paper Group AWR 40
Problems With Evaluation of Word Embeddings Using Word Similarity Tasks
Title | Problems With Evaluation of Word Embeddings Using Word Similarity Tasks |
Authors | Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, Chris Dyer |
Abstract | Lacking standardized extrinsic evaluation methods for vector representations of words, the NLP community has relied heavily on word similarity tasks as a proxy for intrinsic evaluation of word vectors. Word similarity evaluation, which correlates the distance between vectors and human judgments of semantic similarity, is attractive because it is computationally inexpensive and fast. In this paper we present several problems associated with the evaluation of word vectors on word similarity datasets, and summarize existing solutions. Our study suggests that the use of word similarity tasks for evaluation of word vectors is not sustainable and calls for further research on evaluation methods. |
Tasks | Semantic Similarity, Semantic Textual Similarity, Word Embeddings |
Published | 2016-05-08 |
URL | http://arxiv.org/abs/1605.02276v3 |
http://arxiv.org/pdf/1605.02276v3.pdf | |
PWC | https://paperswithcode.com/paper/problems-with-evaluation-of-word-embeddings |
Repo | https://github.com/avi-jit/SWOW-eval |
Framework | none |
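To make the critiqued protocol concrete, here is a minimal numpy/scipy sketch (not the authors' code) of a word similarity evaluation: score word pairs by the cosine similarity of their vectors and report the Spearman correlation with human judgments. The toy vocabulary and ratings are invented for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def evaluate_word_similarity(embeddings, pairs, human_scores):
    """embeddings: dict word -> vector; pairs: list of (w1, w2) tuples;
    human_scores: human similarity judgments, aligned with pairs."""
    model_scores = [cosine(embeddings[w1], embeddings[w2]) for w1, w2 in pairs]
    rho, _ = spearmanr(model_scores, human_scores)  # rank correlation
    return rho

# Toy usage with random vectors (illustration only):
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["cup", "mug", "car"]}
print(evaluate_word_similarity(emb, [("cup", "mug"), ("cup", "car")], [9.0, 2.5]))
```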
Very Deep Convolutional Networks for End-to-End Speech Recognition
Title | Very Deep Convolutional Networks for End-to-End Speech Recognition |
Authors | Yu Zhang, William Chan, Navdeep Jaitly |
Abstract | Sequence-to-sequence models have shown success in end-to-end speech recognition. However, these models have only used shallow acoustic encoder networks. In our work, we successively train very deep convolutional networks to add more expressive power and better generalization for end-to-end ASR models. We apply network-in-network principles, batch normalization, residual connections and convolutional LSTMs to build very deep recurrent and convolutional structures. Our models exploit the spectral structure in the feature space and add computational depth without overfitting issues. We experiment with the WSJ ASR task and achieve a 10.5% word error rate without any dictionary or language model, using a 15-layer deep network. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2016-10-10 |
URL | http://arxiv.org/abs/1610.03022v1 |
http://arxiv.org/pdf/1610.03022v1.pdf | |
PWC | https://paperswithcode.com/paper/very-deep-convolutional-networks-for-end-to |
Repo | https://github.com/colaprograms/speechify |
Framework | tf |
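As a rough illustration of the building block the paper stacks in its encoder, here is a PyTorch sketch (the linked repo is TensorFlow) of one residual convolutional block over spectrogram features: conv, batch norm, ReLU, twice, plus an identity shortcut. Channel counts and feature-map shapes here are hypothetical, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):  # x: (batch, channels, time, freq) feature maps
        h = torch.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return torch.relu(h + x)  # residual connection

x = torch.randn(2, 32, 100, 40)  # a batch of log-mel features (made-up shapes)
print(ResidualConvBlock(32)(x).shape)
```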
Dictionary Learning for Massive Matrix Factorization
Title | Dictionary Learning for Massive Matrix Factorization |
Authors | Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux |
Abstract | Sparse matrix factorization is a popular tool to obtain interpretable data decompositions, which are also effective for data completion or denoising. Its applicability to large datasets has been addressed with online and randomized methods that reduce the complexity in one of the matrix dimensions, but not in both. In this paper, we tackle very large matrices in both dimensions. We propose a new factorization method that scales gracefully to terabyte-scale datasets which could not be processed by previous algorithms in a reasonable amount of time. We demonstrate the efficiency of our approach on massive functional Magnetic Resonance Imaging (fMRI) data, and on matrix completion problems for recommender systems, where we obtain significant speed-ups compared to state-of-the-art coordinate descent methods. |
Tasks | Dictionary Learning, Matrix Completion, Recommendation Systems |
Published | 2016-05-03 |
URL | http://arxiv.org/abs/1605.00937v2 |
http://arxiv.org/pdf/1605.00937v2.pdf | |
PWC | https://paperswithcode.com/paper/dictionary-learning-for-massive-matrix |
Repo | https://github.com/arthurmensch/modl |
Framework | none |
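A simplified numpy sketch of the core idea, not the modl implementation: online dictionary learning where each update touches only a random subsample of the rows, so the per-iteration cost shrinks in both dimensions. For brevity this sketch uses a ridge code instead of the lasso and a plain gradient step instead of the paper's surrogate-based update.

```python
import numpy as np

def online_dl_subsampled(X, n_atoms=10, n_iter=100, row_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    D = rng.normal(size=(n_rows, n_atoms))  # dictionary
    for _ in range(n_iter):
        j = rng.integers(n_cols)            # stream one column (one sample)
        rows = rng.choice(n_rows, int(row_frac * n_rows), replace=False)
        Dr, xr = D[rows], X[rows, j]
        # code the sample from the subsampled rows only (ridge for brevity)
        alpha = np.linalg.solve(Dr.T @ Dr + 0.1 * np.eye(n_atoms), Dr.T @ xr)
        # gradient step on the subsampled dictionary rows only
        D[rows] -= 0.01 * np.outer(Dr @ alpha - xr, alpha)
    return D

D = online_dl_subsampled(np.random.default_rng(1).normal(size=(200, 500)))
print(D.shape)
```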
Sequence Graph Transform (SGT): A Feature Extraction Function for Sequence Data Mining (Extended Version)
Title | Sequence Graph Transform (SGT): A Feature Extraction Function for Sequence Data Mining (Extended Version) |
Authors | Chitta Ranjan, Samaneh Ebrahimi, Kamran Paynabar |
Abstract | The ubiquitous presence of sequence data across fields such as the web, healthcare, bioinformatics, and text mining has made sequence mining a vital research area. However, sequence mining is particularly challenging because of the difficulty of finding a (dis)similarity or distance between sequences: a distance measure is not obvious because sequences are unstructured, i.e., arbitrary strings of arbitrary length. Feature representations such as n-grams are often used, but they either compromise between extracting short- and long-term sequence patterns or incur high computational cost. We propose a new function, Sequence Graph Transform (SGT), that extracts short- and long-term sequence features and embeds them in a finite-dimensional feature space. Importantly, SGT has low computational cost and can capture any amount of short- to long-term patterns without any increase in computation, which we also prove theoretically in this paper. As a result, SGT yields superior results, with significantly higher accuracy and lower computation than existing methods. We demonstrate this through several experiments and real-world applications of SGT to clustering, classification, search, and visualization. |
Tasks | |
Published | 2016-08-11 |
URL | http://arxiv.org/abs/1608.03533v9 |
http://arxiv.org/pdf/1608.03533v9.pdf | |
PWC | https://paperswithcode.com/paper/sequence-graph-transform-sgt-a-feature |
Repo | https://github.com/cran2367/sgt |
Framework | tf |
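A hedged numpy sketch of the SGT idea: for every ordered symbol pair (u, v), accumulate exp(-kappa * gap) over all positions where v occurs after u, giving a fixed-size |alphabet|^2 feature vector per sequence. The normalization below (dividing by the count of u) is a simplification; the paper's exact normalization differs, so treat this as an illustration rather than the reference code.

```python
import numpy as np

def sgt_features(seq, alphabet, kappa=1.0):
    idx = {a: i for i, a in enumerate(alphabet)}
    W = np.zeros((len(alphabet), len(alphabet)))  # pairwise decayed weights
    C = np.zeros(len(alphabet))                   # occurrence counts of u
    for i, u in enumerate(seq):
        C[idx[u]] += 1
        for j in range(i + 1, len(seq)):
            # longer gaps contribute less; kappa tunes short- vs long-term
            W[idx[u], idx[seq[j]]] += np.exp(-kappa * (j - i))
    return (W / np.maximum(C[:, None], 1)).ravel()

print(sgt_features("ABBA", alphabet="AB"))
```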
Improving Variational Inference with Inverse Autoregressive Flow
Title | Improving Variational Inference with Inverse Autoregressive Flow |
Authors | Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling |
Abstract | The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast to earlier published flows, scales well to high-dimensional latent spaces. The proposed flow consists of a chain of invertible transformations, where each transformation is based on an autoregressive neural network. In experiments, we show that IAF significantly improves upon diagonal Gaussian approximate posteriors. In addition, we demonstrate that a novel type of variational autoencoder, coupled with IAF, is competitive with neural autoregressive models in terms of attained log-likelihood on natural images, while allowing significantly faster synthesis. |
Tasks | |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.04934v2 |
http://arxiv.org/pdf/1606.04934v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-variational-inference-with-inverse |
Repo | https://github.com/openai/iaf |
Framework | tf |
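A numpy sketch of a single IAF step, under stated assumptions: an autoregressive network produces (m, s) with m_i and s_i depending only on z_{<i} (stubbed here with strictly lower-triangular linear maps in place of a MADE-style network). The gated update z' = sigmoid(s) * z + (1 - sigmoid(s)) * m has a triangular Jacobian, so log|det| is just the sum of log gates.

```python
import numpy as np

def autoregressive_net(z, W_m, W_s):
    # Strictly lower-triangular weights enforce the autoregressive property
    # (output i sees only z_{<i}); a stand-in for a real MADE-style network.
    return np.tril(W_m, k=-1) @ z, np.tril(W_s, k=-1) @ z

def iaf_step(z, W_m, W_s):
    m, s = autoregressive_net(z, W_m, W_s)
    gate = 1.0 / (1.0 + np.exp(-s))   # sigmoid gating, as in the paper
    z_new = gate * z + (1.0 - gate) * m
    log_det = np.sum(np.log(gate))    # triangular Jacobian -> cheap log-det
    return z_new, log_det

rng = np.random.default_rng(0)
z = rng.normal(size=4)
z2, ld = iaf_step(z, rng.normal(size=(4, 4)), rng.normal(size=(4, 4)))
print(z2, ld)
```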
Dynamic Memory Networks for Visual and Textual Question Answering
Title | Dynamic Memory Networks for Visual and Textual Question Answering |
Authors | Caiming Xiong, Stephen Merity, Richard Socher |
Abstract | Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision. |
Tasks | Question Answering, Visual Question Answering |
Published | 2016-03-04 |
URL | http://arxiv.org/abs/1603.01417v1 |
http://arxiv.org/pdf/1603.01417v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-memory-networks-for-visual-and |
Repo | https://github.com/radiodee1/awesome-chatbot |
Framework | tf |
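A compact numpy sketch of the episodic-memory idea behind the DMN family: score each encoded fact against the question and the current memory, form an attention-weighted episode, and update the memory. The gate features follow the paper's spirit, but the parameterization here is heavily simplified (soft attention in place of the attention-based GRU, a tanh mix in place of the learned memory update).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def episode_update(facts, q, m, w):
    # interaction features between each fact f, the question q, and memory m
    Z = np.stack([np.concatenate([f * q, f * m, np.abs(f - q), np.abs(f - m)])
                  for f in facts])
    g = softmax(Z @ w)               # attention over supporting facts
    episode = g @ facts              # soft attention (GRU variant omitted)
    return np.tanh(episode + m + q)  # simplified memory update

rng = np.random.default_rng(0)
facts = rng.normal(size=(5, 8))      # 5 encoded sentences
q, m = rng.normal(size=8), rng.normal(size=8)
print(episode_update(facts, q, m, rng.normal(size=32)))
```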
Match-SRNN: Modeling the Recursive Matching Structure with Spatial RNN
Title | Match-SRNN: Modeling the Recursive Matching Structure with Spatial RNN |
Authors | Shengxian Wan, Yanyan Lan, Jun Xu, Jiafeng Guo, Liang Pang, Xueqi Cheng |
Abstract | Semantic matching, which aims to determine the matching degree between two texts, is a fundamental problem for many NLP applications. Recently, deep learning approaches have been applied to this problem and significant improvements have been achieved. In this paper, we propose to view the generation of the global interaction between two texts as a recursive process: i.e., the interaction of two texts at each position is a composition of the interactions between their prefixes as well as the word-level interaction at the current position. Based on this idea, we propose a novel deep architecture, namely Match-SRNN, to model the recursive matching structure. Firstly, a tensor is constructed to capture the word-level interactions. Then a spatial RNN is applied to integrate the local interactions recursively, with importance determined by four types of gates. Finally, the matching score is calculated based on the global interaction. We show that, when degenerated to the exact matching scenario, Match-SRNN can approximate the dynamic programming process of the longest common subsequence, so there exists a clear interpretation for Match-SRNN. Our experiments on two semantic matching tasks showed the effectiveness of Match-SRNN and its ability to visualize the learned matching structure. |
Tasks | |
Published | 2016-04-15 |
URL | http://arxiv.org/abs/1604.04378v1 |
http://arxiv.org/pdf/1604.04378v1.pdf | |
PWC | https://paperswithcode.com/paper/match-srnn-modeling-the-recursive-matching |
Repo | https://github.com/T-Almeida/tensorflow-keras-multidimensional-rnn |
Framework | tf |
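A numpy sketch of the spatial-RNN recursion described above: the hidden state at position (i, j) composes the states of the three prefix neighbours with the local word-interaction signal, mirroring the LCS dynamic program. The four gate types from the paper are omitted here; this is a structural illustration, not the trained model.

```python
import numpy as np

def spatial_rnn(S, w, U, dim=4):
    # S[i, j] is the word-level interaction between text1[i] and text2[j]
    n, m = S.shape
    H = np.zeros((n + 1, m + 1, dim))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prefix = H[i - 1, j] + H[i, j - 1] + H[i - 1, j - 1]  # gates omitted
            H[i, j] = np.tanh(w * S[i - 1, j - 1] + U @ prefix)
    return H[n, m]  # global interaction, fed to the matching-score layer

rng = np.random.default_rng(0)
S = rng.normal(size=(3, 5))
print(spatial_rnn(S, rng.normal(size=4), rng.normal(size=(4, 4))))
```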
Unified Framework for Quantification
Title | Unified Framework for Quantification |
Authors | Aykut Firat |
Abstract | Quantification is the machine learning task of estimating test-data class proportions that are not necessarily similar to those in training. Apart from its intrinsic value as an aggregate statistic, quantification output can also be used to optimize classifier probabilities, thereby increasing classification accuracy. We unify major quantification approaches under a constrained multi-variate regression framework, and use mathematical programming to estimate class proportions for different loss functions. With this modeling approach, we extend existing binary-only quantification approaches to multi-class settings as well. We empirically verify our unified framework by experimenting with several multi-class datasets including the Stanford Sentiment Treebank and CIFAR-10. |
Tasks | |
Published | 2016-06-02 |
URL | http://arxiv.org/abs/1606.00868v1 |
http://arxiv.org/pdf/1606.00868v1.pdf | |
PWC | https://paperswithcode.com/paper/unified-framework-for-quantification |
Repo | https://github.com/aykutfirat/Quantification |
Framework | none |
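A numpy/scipy sketch of the unified view: estimate the class-proportion vector p by solving a constrained least-squares problem A p ≈ b with p restricted to the probability simplex. For the classic "adjusted count" instance shown in the toy example, A holds the classifier's confusion rates and b the observed predicted-class rates; both numbers below are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def quantify(A, b):
    k = A.shape[1]
    cons = ({"type": "eq", "fun": lambda p: p.sum() - 1.0},)  # simplex: sum to 1
    res = minimize(lambda p: np.sum((A @ p - b) ** 2),
                   x0=np.full(k, 1.0 / k),
                   bounds=[(0.0, 1.0)] * k, constraints=cons)
    return res.x

# Toy binary example (hypothetical rates):
A = np.array([[0.9, 0.2],    # P(pred=0 | true=0), P(pred=0 | true=1)
              [0.1, 0.8]])
b = np.array([0.55, 0.45])   # observed prediction rates on test data
print(quantify(A, b))        # estimated test-class proportions ~ [0.5, 0.5]
```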
Exponential Machines
Title | Exponential Machines |
Authors | Alexander Novikov, Mikhail Trofimov, Ivan Oseledets |
Abstract | Modeling interactions between features improves the performance of machine learning solutions in many domains (e.g. recommender systems or sentiment analysis). In this paper, we introduce Exponential Machines (ExM), a predictor that models all interactions of every order. The key idea is to represent an exponentially large tensor of parameters in a factorized format called Tensor Train (TT). The Tensor Train format regularizes the model and provides control over the number of underlying parameters. To train the model, we develop a stochastic Riemannian optimization procedure, which allows us to fit tensors with 2^160 entries. We show that the model achieves state-of-the-art performance on synthetic data with high-order interactions and performs on par with high-order factorization machines on the MovieLens 100K recommender-system dataset. |
Tasks | Recommendation Systems, Sentiment Analysis |
Published | 2016-05-12 |
URL | http://arxiv.org/abs/1605.03795v3 |
http://arxiv.org/pdf/1605.03795v3.pdf | |
PWC | https://paperswithcode.com/paper/exponential-machines |
Repo | https://github.com/emstoudenmire/TNML |
Framework | none |
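A numpy sketch of how scoring works in TT format: the exponentially large weight tensor over all interaction subsets is never materialized; the score is a chain of small matrix products, one TT core per feature. The core shapes and random initialization below are illustrative, not the paper's trained model.

```python
import numpy as np

def tt_score(x, cores):
    """Score of feature vector x under a weight tensor in TT format.
    cores[k] has shape (r_k, 2, r_{k+1}); slice 0 'skips' feature k and
    slice 1 'includes' it, so the running product implicitly enumerates
    all 2^d interaction subsets at O(d * r^2) cost, never O(2^d)."""
    v = np.ones((1,))
    for k, G in enumerate(cores):
        v = v @ (G[:, 0, :] + x[k] * G[:, 1, :])
    return float(v.sum())  # final TT rank is 1, so v is a scalar

rng = np.random.default_rng(0)
d, r = 6, 3
cores = [rng.normal(size=(1 if k == 0 else r, 2, 1 if k == d - 1 else r)) / r
         for k in range(d)]
print(tt_score(rng.integers(0, 2, size=d).astype(float), cores))
```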
Low-rank Optimization with Convex Constraints
Title | Low-rank Optimization with Convex Constraints |
Authors | Christian Grussler, Anders Rantzer, Pontus Giselsson |
Abstract | The problem of low-rank approximation with convex constraints, which appears in data analysis, system identification, model order reduction, low-order controller design and low-complexity modelling is considered. Given a matrix, the objective is to find a low-rank approximation that meets rank and convex constraints, while minimizing the distance to the matrix in the squared Frobenius norm. In many situations, this non-convex problem is convexified by nuclear norm regularization. However, we will see that the approximations obtained by this method may be far from optimal. In this paper, we propose an alternative convex relaxation that uses the convex envelope of the squared Frobenius norm and the rank constraint. With this approach, easily verifiable conditions are obtained under which the solutions to the convex relaxation and the original non-convex problem coincide. An SDP representation of the convex envelope is derived, which allows us to apply this approach to several known problems. Our example on optimal low-rank Hankel approximation/model reduction illustrates that the proposed convex relaxation performs consistently better than nuclear norm regularization and may outperform balanced truncation. |
Tasks | |
Published | 2016-06-06 |
URL | http://arxiv.org/abs/1606.01793v3 |
http://arxiv.org/pdf/1606.01793v3.pdf | |
PWC | https://paperswithcode.com/paper/low-rank-optimization-with-convex-constraints |
Repo | https://github.com/LowRankOpt/LRINorm |
Framework | none |
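A numpy sketch contrasting the two standard operations discussed above: hard rank truncation (the optimal unconstrained low-rank approximation, by Eckart-Young) versus singular value soft-thresholding, the proximal step behind nuclear norm regularization, which shrinks all singular values and can land far from the best rank-r approximation. The paper's proposed convex envelope is not reproduced here.

```python
import numpy as np

def truncated_svd(M, r):
    # best rank-r approximation in Frobenius norm
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def svt(M, tau):
    # singular value soft-thresholding: prox of the nuclear norm
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 6))
for approx in (truncated_svd(M, 2), svt(M, tau=1.5)):
    print(np.linalg.matrix_rank(approx), np.linalg.norm(M - approx))
```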
Data-driven generation of spatio-temporal routines in human mobility
Title | Data-driven generation of spatio-temporal routines in human mobility |
Authors | Luca Pappalardo, Filippo Simini |
Abstract | The generation of realistic spatio-temporal trajectories of human mobility is of fundamental importance in a wide range of applications, such as the development of protocols for mobile ad-hoc networks or what-if analyses in urban ecosystems. Current generative algorithms fail to accurately reproduce individuals' recurrent schedules while also accounting for the possibility that individuals may break their routine during periods of variable duration. In this article we present DITRAS (DIary-based TRAjectory Simulator), a framework to simulate the spatio-temporal patterns of human mobility. DITRAS operates in two steps: the generation of a mobility diary and the translation of the mobility diary into a mobility trajectory. We propose a data-driven algorithm which constructs a diary generator from real data, capturing the tendency of individuals to follow or break their routine. We also propose a trajectory generator based on the concepts of preferential exploration and preferential return. We instantiate DITRAS with the proposed diary and trajectory generators and compare the resulting algorithm against real data and against synthetic data produced by other generative algorithms, built by instantiating DITRAS with several combinations of diary and trajectory generators. We show that the proposed algorithm reproduces the statistical properties of real trajectories most accurately, taking a step forward in understanding the origin of the spatio-temporal patterns of human mobility. |
Tasks | |
Published | 2016-07-16 |
URL | http://arxiv.org/abs/1607.05952v3 |
http://arxiv.org/pdf/1607.05952v3.pdf | |
PWC | https://paperswithcode.com/paper/data-driven-generation-of-spatio-temporal |
Repo | https://github.com/jonpappalord/DITRAS |
Framework | none |
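A numpy sketch of the trajectory-generator ingredient named above: preferential return (revisit a location with probability proportional to past visits) versus preferential exploration (visit a new location with probability that decays as more distinct places are seen). The diary generator and the spatial choice of *which* new location to visit are omitted; the rho and gamma values are typical figures from the exploration-and-return literature, not fitted parameters.

```python
import numpy as np

def generate_trajectory(n_steps, rho=0.6, gamma=0.21, seed=0):
    rng = np.random.default_rng(seed)
    visits = {0: 1}                               # location -> visit count
    traj = [0]
    for _ in range(n_steps):
        p_new = rho * len(visits) ** (-gamma)     # exploration probability
        if rng.random() < p_new:
            loc = max(visits) + 1                 # explore a new location
        else:                                     # preferential return
            locs, counts = zip(*visits.items())
            loc = int(rng.choice(locs, p=np.array(counts) / sum(counts)))
        visits[loc] = visits.get(loc, 0) + 1
        traj.append(loc)
    return traj

print(generate_trajectory(20))
```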
Safe Exploration in Finite Markov Decision Processes with Gaussian Processes
Title | Safe Exploration in Finite Markov Decision Processes with Gaussian Processes |
Authors | Matteo Turchetta, Felix Berkenkamp, Andreas Krause |
Abstract | In classical reinforcement learning, agents exploring an environment accept arbitrary short-term loss for long-term gain. This is infeasible for safety-critical applications, such as robotics, where even a single unsafe action may cause system failure. In this paper, we address the problem of safely exploring finite Markov decision processes (MDPs). We define safety in terms of an a priori unknown safety constraint that depends on states and actions. We aim to explore the MDP under this constraint, assuming that the unknown function satisfies regularity conditions expressed via a Gaussian process prior. We develop a novel algorithm for this task and prove that it is able to completely explore the safely reachable part of the MDP without violating the safety constraint. To achieve this, it cautiously explores safe states and actions in order to gain statistical confidence about the safety of unvisited state-action pairs from noisy observations collected while navigating the environment. Moreover, the algorithm explicitly considers reachability when exploring the MDP, ensuring that it does not get stuck in any state with no safe way out. We demonstrate our method on digital terrain models for the task of exploring an unknown map with a rover. |
Tasks | Gaussian Processes, Safe Exploration |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.04753v2 |
http://arxiv.org/pdf/1606.04753v2.pdf | |
PWC | https://paperswithcode.com/paper/safe-exploration-in-finite-markov-decision |
Repo | https://github.com/befelix/SafeMDP |
Framework | none |
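A schematic numpy sketch of the safe-set expansion step described above: a state is certified safe only if its GP lower confidence bound clears the threshold *and* it is reachable from, and has a safe way back to, the current safe set. The GP update itself is left abstract, and the one-step adjacency and bounds below are invented for illustration.

```python
import numpy as np

def expand_safe_set(lower_bound, safe, adjacency, h):
    """lower_bound[s]: GP lower confidence bound on the safety function at s;
    safe: boolean array of currently certified states; adjacency[s, t]: True
    if the agent can move s -> t in one step; h: safety threshold."""
    changed = True
    while changed:
        changed = False
        for s in range(len(safe)):
            if safe[s] or lower_bound[s] < h:
                continue
            reachable = adjacency[safe][:, s].any()  # reachable from safe set
            returnable = adjacency[s][safe].any()    # a safe way back exists
            if reachable and returnable:
                safe[s], changed = True, True
    return safe

lb = np.array([1.0, 0.8, 0.2, 0.9])
adj = np.ones((4, 4), dtype=bool)
print(expand_safe_set(lb, np.array([True, False, False, False]), adj, h=0.5))
```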
Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation
Title | Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation |
Authors | Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji |
Abstract | Named Entity Disambiguation (NED) refers to the task of resolving multiple named entity mentions in a document to their correct references in a knowledge base (KB) (e.g., Wikipedia). In this paper, we propose a novel embedding method specifically designed for NED. The proposed method jointly maps words and entities into the same continuous vector space. We extend the skip-gram model by using two models. The KB graph model learns the relatedness of entities using the link structure of the KB, whereas the anchor context model aims to align vectors such that similar words and entities occur close to one another in the vector space by leveraging KB anchors and their context words. By combining contexts based on the proposed embedding with standard NED features, we achieved state-of-the-art accuracy of 93.1% on the standard CoNLL dataset and 85.2% on the TAC 2010 dataset. |
Tasks | Entity Disambiguation, Entity Linking |
Published | 2016-01-06 |
URL | http://arxiv.org/abs/1601.01343v4 |
http://arxiv.org/pdf/1601.01343v4.pdf | |
PWC | https://paperswithcode.com/paper/joint-learning-of-the-embedding-of-words-and |
Repo | https://github.com/wikipedia2vec/wikipedia2vec |
Framework | none |
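A numpy sketch of how a joint word-entity embedding is typically used at disambiguation time (a simplification of the paper's full feature set): represent the mention context by the average of its word vectors, then rank candidate KB entities by cosine similarity in the shared space. The vocabulary and entity names below are hypothetical.

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def rank_candidates(context_words, candidates, word_vecs, entity_vecs):
    ctx = np.mean([word_vecs[w] for w in context_words], axis=0)
    return sorted(candidates, key=lambda e: -cosine(ctx, entity_vecs[e]))

rng = np.random.default_rng(0)
word_vecs = {w: rng.normal(size=16) for w in ["the", "river", "bank"]}
entity_vecs = {e: rng.normal(size=16)
               for e in ["Bank_(finance)", "Bank_(geography)"]}  # made-up KB
print(rank_candidates(["the", "river", "bank"],
                      list(entity_vecs), word_vecs, entity_vecs))
```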
Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond
Title | Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond |
Authors | Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, Bing Xiang |
Abstract | In this work, we model abstractive text summarization using attentional encoder-decoder recurrent neural networks, and show that they achieve state-of-the-art performance on two different corpora. We propose several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling keywords, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time. Our work shows that many of our proposed models contribute further improvements in performance. We also propose a new dataset consisting of multi-sentence summaries, and establish performance benchmarks for further research. |
Tasks | Abstractive Text Summarization, Text Summarization |
Published | 2016-02-19 |
URL | http://arxiv.org/abs/1602.06023v5 |
http://arxiv.org/pdf/1602.06023v5.pdf | |
PWC | https://paperswithcode.com/paper/abstractive-text-summarization-using-sequence |
Repo | https://github.com/yunzhusong/AAAI20-PORLHG |
Framework | none |
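A numpy sketch of the attentional decoding step at the heart of these models: score the encoder states against the decoder state, softmax the scores into attention weights, and build the context vector used to emit the next summary word. The paper's switching/pointer and hierarchical-attention mechanisms are omitted, and the bilinear scoring form is one common choice, not necessarily the paper's.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(enc_states, dec_state, W):
    scores = enc_states @ W @ dec_state  # bilinear attention scores
    weights = softmax(scores)            # one weight per source position
    context = weights @ enc_states       # attention-weighted source summary
    return context, weights

rng = np.random.default_rng(0)
enc = rng.normal(size=(7, 10))           # 7 encoded source tokens
ctx, w = attention_step(enc, rng.normal(size=10), rng.normal(size=(10, 10)))
print(w.round(2), ctx.shape)
```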
Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus
Title | Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus |
Authors | Iulian Vlad Serban, Alberto García-Durán, Caglar Gulcehre, Sungjin Ahn, Sarath Chandar, Aaron Courville, Yoshua Bengio |
Abstract | Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances. However, to date, there are no large-scale question-answer corpora available. In this paper we present the 30M Factoid Question-Answer Corpus, an enormous question-answer pair corpus produced by applying a novel neural network architecture to the Freebase knowledge base to transduce facts into natural-language questions. The produced question-answer pairs are evaluated both by human evaluators and with automatic evaluation metrics, including well-established machine translation and sentence similarity metrics. Across all evaluation criteria the question-generation model outperforms the competing template-based baseline. Furthermore, when presented to human evaluators, the generated questions appear comparable in quality to real human-generated questions. |
Tasks | Machine Translation, Question Generation |
Published | 2016-03-22 |
URL | http://arxiv.org/abs/1603.06807v2 |
http://arxiv.org/pdf/1603.06807v2.pdf | |
PWC | https://paperswithcode.com/paper/generating-factoid-questions-with-recurrent |
Repo | https://github.com/imatge-upc/vqa-2016-cvprw |
Framework | tf |
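A small Python sketch of the data-construction setup, not the neural model: each Freebase fact is a (subject, relationship, object) triple, and the goal is a natural-language question whose answer is the object. The template baseline the paper compares against looks roughly like the following; the templates and example fact here are hypothetical.

```python
# Hypothetical relation-to-question templates (illustration only).
TEMPLATES = {
    "place_of_birth": "where was {subject} born?",
    "profession": "what did {subject} do for a living?",
}

def fact_to_qa(subject, relationship, obj):
    """Turn a KB fact into a (question, answer) pair via a template;
    the paper's neural model replaces this lookup with a seq2seq transducer."""
    question = TEMPLATES[relationship].format(subject=subject)
    return question, obj

print(fact_to_qa("marie curie", "place_of_birth", "warsaw"))
```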