May 7, 2019

2804 words 14 mins read

Paper Group AWR 40

Problems With Evaluation of Word Embeddings Using Word Similarity Tasks

Title Problems With Evaluation of Word Embeddings Using Word Similarity Tasks
Authors Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, Chris Dyer
Abstract Lacking standardized extrinsic evaluation methods for vector representations of words, the NLP community has relied heavily on word similarity tasks as a proxy for intrinsic evaluation of word vectors. Word similarity evaluation, which correlates the distance between vectors with human judgments of semantic similarity, is attractive because it is computationally inexpensive and fast. In this paper we present several problems associated with the evaluation of word vectors on word similarity datasets, and summarize existing solutions. Our study suggests that the use of word similarity tasks for evaluation of word vectors is not sustainable and calls for further research on evaluation methods.
Tasks Semantic Similarity, Semantic Textual Similarity, Word Embeddings
Published 2016-05-08
URL http://arxiv.org/abs/1605.02276v3
PDF http://arxiv.org/pdf/1605.02276v3.pdf
PWC https://paperswithcode.com/paper/problems-with-evaluation-of-word-embeddings
Repo https://github.com/avi-jit/SWOW-eval
Framework none
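
The evaluation protocol the paper critiques is easy to state concretely: score each word pair with the cosine similarity of its vectors, then report the Spearman correlation against the human ratings. A minimal sketch (the vectors and ratings below are made-up toy data):

```python
# Minimal word-similarity evaluation: Spearman correlation between
# cosine similarities of word vectors and human similarity ratings.
# The vectors and ratings here are illustrative, not real data.
import numpy as np
from scipy.stats import spearmanr

embeddings = {
    "cat": np.array([0.9, 0.1, 0.3]),
    "dog": np.array([0.8, 0.2, 0.4]),
    "car": np.array([0.1, 0.9, 0.2]),
}
# (word1, word2, human similarity rating on e.g. a 0-10 scale)
pairs = [("cat", "dog", 8.5), ("cat", "car", 1.5), ("dog", "car", 2.0)]

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in pairs]
human_scores = [r for _, _, r in pairs]
rho, _ = spearmanr(model_scores, human_scores)
print(f"Spearman rho: {rho:.3f}")
```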

Very Deep Convolutional Networks for End-to-End Speech Recognition

Title Very Deep Convolutional Networks for End-to-End Speech Recognition
Authors Yu Zhang, William Chan, Navdeep Jaitly
Abstract Sequence-to-sequence models have shown success in end-to-end speech recognition. However, these models have only used shallow acoustic encoder networks. In our work, we successively train very deep convolutional networks to add more expressive power and better generalization to end-to-end ASR models. We apply network-in-network principles, batch normalization, residual connections, and convolutional LSTMs to build very deep recurrent and convolutional structures. Our models exploit the spectral structure in the feature space and add computational depth without overfitting issues. We experiment with the WSJ ASR task and achieve a 10.5% word error rate without any dictionary or language model, using a 15-layer deep network.
Tasks End-To-End Speech Recognition, Speech Recognition
Published 2016-10-10
URL http://arxiv.org/abs/1610.03022v1
PDF http://arxiv.org/pdf/1610.03022v1.pdf
PWC https://paperswithcode.com/paper/very-deep-convolutional-networks-for-end-to
Repo https://github.com/colaprograms/speechify
Framework tf
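
The building block the paper stacks to deepen the encoder is a convolutional layer with batch normalization and a residual connection. A minimal PyTorch-style sketch of such a block is below; the linked repo uses TensorFlow, and the layer sizes here are illustrative rather than the paper's configuration.

```python
# Sketch of one residual convolutional block of the kind the paper stacks
# to deepen the acoustic encoder (conv + batch norm + residual connection).
# Written in PyTorch for brevity; channel and kernel sizes are illustrative.
import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # residual connection

# Input shaped (batch, channels, time, frequency), e.g. a filterbank spectrogram.
x = torch.randn(2, 32, 100, 40)
print(ResidualConvBlock(32)(x).shape)  # torch.Size([2, 32, 100, 40])
```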

Dictionary Learning for Massive Matrix Factorization

Title Dictionary Learning for Massive Matrix Factorization
Authors Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux
Abstract Sparse matrix factorization is a popular tool for obtaining interpretable data decompositions, which are also effective for data completion and denoising. Its applicability to large datasets has been addressed with online and randomized methods that reduce the complexity in one of the matrix dimensions, but not in both. In this paper, we tackle matrices that are very large in both dimensions. We propose a new factorization method that scales gracefully to terabyte-scale datasets which could not be processed by previous algorithms in a reasonable amount of time. We demonstrate the efficiency of our approach on massive functional Magnetic Resonance Imaging (fMRI) data and on matrix completion problems for recommender systems, where we obtain significant speed-ups compared to state-of-the-art coordinate descent methods.
Tasks Dictionary Learning, Matrix Completion, Recommendation Systems
Published 2016-05-03
URL http://arxiv.org/abs/1605.00937v2
PDF http://arxiv.org/pdf/1605.00937v2.pdf
PWC https://paperswithcode.com/paper/dictionary-learning-for-massive-matrix
Repo https://github.com/arthurmensch/modl
Framework none
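
For reference, the standard online sparse factorization that the paper builds on can be run with scikit-learn's MiniBatchDictionaryLearning; the paper's method (in the linked modl repo) additionally subsamples the other matrix dimension at each iteration. A baseline sketch with toy data:

```python
# Baseline online sparse matrix factorization with scikit-learn.
# The paper's contribution (the `modl` repo) additionally subsamples
# rows at each iteration to scale in *both* matrix dimensions; this
# snippet only shows the standard online method it builds on.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 200))  # samples x features (toy data)

dico = MiniBatchDictionaryLearning(n_components=20, alpha=1.0,
                                   batch_size=50, random_state=0)
codes = dico.fit_transform(X)          # sparse codes, shape (1000, 20)
D = dico.components_                   # learned dictionary, shape (20, 200)
print(codes.shape, D.shape)
```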

Sequence Graph Transform (SGT): A Feature Extraction Function for Sequence Data Mining (Extended Version)

Title Sequence Graph Transform (SGT): A Feature Extraction Function for Sequence Data Mining (Extended Version)
Authors Chitta Ranjan, Samaneh Ebrahimi, Kamran Paynabar
Abstract The ubiquitous presence of sequence data across fields such as the web, healthcare, bioinformatics, and text mining has made sequence mining a vital research area. However, sequence mining is particularly challenging because of the difficulty of measuring (dis)similarity or distance between sequences: a distance measure is not obvious for unstructured data such as arbitrary strings of arbitrary length. Feature representations, such as n-grams, are often used, but they either compromise on capturing both short- and long-term sequence patterns or are computationally expensive. We propose a new function, Sequence Graph Transform (SGT), that extracts short- and long-term sequence features and embeds them in a finite-dimensional feature space. Importantly, SGT is computationally cheap and can extract any amount of short- to long-term patterns without any increase in computation, as we also prove theoretically in this paper. As a result, SGT yields superior results, with significantly higher accuracy and lower computation than existing methods. We demonstrate this through several experiments and real-world applications of SGT to clustering, classification, search, and visualization.
Tasks
Published 2016-08-11
URL http://arxiv.org/abs/1608.03533v9
PDF http://arxiv.org/pdf/1608.03533v9.pdf
PWC https://paperswithcode.com/paper/sequence-graph-transform-sgt-a-feature
Repo https://github.com/cran2367/sgt
Framework tf
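
A simplified sketch of the transform's core idea: for every ordered symbol pair (u, v), accumulate an exponentially decaying weight over all positions where v follows u, then normalize by the number of occurrences of u. The paper defines additional normalization variants; this is a toy rendering, not the linked repo's implementation.

```python
# Simplified sketch of the Sequence Graph Transform: for every ordered
# symbol pair (u, v), sum exp(-kappa * gap) over all positions where v
# occurs after u, then normalize by the occurrences of u. kappa tunes
# the short- vs long-term emphasis; the paper adds further variants.
import math
from collections import defaultdict

def sgt(sequence, alphabet, kappa=1.0):
    weights = defaultdict(float)
    counts = defaultdict(int)
    for i, u in enumerate(sequence):
        counts[u] += 1
        for j in range(i + 1, len(sequence)):
            weights[(u, sequence[j])] += math.exp(-kappa * (j - i))
    # Flatten into a fixed-length vector over the alphabet.
    return [weights[(u, v)] / counts[u] if counts[u] else 0.0
            for u in alphabet for v in alphabet]

features = sgt("BBACACAABA", alphabet="ABC")
print(len(features))  # 9-dimensional embedding for a 3-symbol alphabet
```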

Improving Variational Inference with Inverse Autoregressive Flow

Title Improving Variational Inference with Inverse Autoregressive Flow
Authors Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling
Abstract The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast to earlier published flows, scales well to high-dimensional latent spaces. The proposed flow consists of a chain of invertible transformations, where each transformation is based on an autoregressive neural network. In experiments, we show that IAF significantly improves upon diagonal Gaussian approximate posteriors. In addition, we demonstrate that a novel type of variational autoencoder, coupled with IAF, is competitive with neural autoregressive models in terms of attained log-likelihood on natural images, while allowing significantly faster synthesis.
Tasks
Published 2016-06-15
URL http://arxiv.org/abs/1606.04934v2
PDF http://arxiv.org/pdf/1606.04934v2.pdf
PWC https://paperswithcode.com/paper/improving-variational-inference-with-inverse
Repo https://github.com/openai/iaf
Framework tf
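
One IAF step is cheap to write down: an autoregressive network maps z to (m, s), and the numerically stable update z' = σ(s)·z + (1−σ(s))·m has a triangular Jacobian whose log-determinant is just Σ log σ(s). In the sketch below a single strictly masked linear layer stands in for the paper's MADE-style deep networks:

```python
# One inverse autoregressive flow (IAF) step, following the paper's
# numerically stable update z' = sigmoid(s) * z + (1 - sigmoid(s)) * m.
# For brevity the autoregressive net is a single strictly-masked linear
# layer (output i sees only z_<i); the paper uses deeper MADE-style nets.
import torch

def iaf_step(z, W_m, W_s, b_m, b_s):
    d = z.shape[-1]
    mask = torch.tril(torch.ones(d, d), diagonal=-1)  # strict lower triangle
    m = z @ (W_m * mask).T + b_m
    s = z @ (W_s * mask).T + b_s
    sigma = torch.sigmoid(s)
    z_new = sigma * z + (1 - sigma) * m
    log_det = torch.log(sigma).sum(dim=-1)  # triangular Jacobian => cheap
    return z_new, log_det

d = 4
z = torch.randn(8, d)
params = [torch.randn(d, d), torch.randn(d, d), torch.zeros(d), torch.zeros(d)]
z_new, log_det = iaf_step(z, *params)
print(z_new.shape, log_det.shape)
```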

Dynamic Memory Networks for Visual and Textual Question Answering

Title Dynamic Memory Networks for Visual and Textual Question Answering
Authors Caiming Xiong, Stephen Merity, Richard Socher
Abstract Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training, or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes, we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.
Tasks Question Answering, Visual Question Answering
Published 2016-03-04
URL http://arxiv.org/abs/1603.01417v1
PDF http://arxiv.org/pdf/1603.01417v1.pdf
PWC https://paperswithcode.com/paper/dynamic-memory-networks-for-visual-and
Repo https://github.com/radiodee1/awesome-chatbot
Framework tf
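
A sketch of the soft-attention episode update the paper describes: interaction features between each fact, the question, and the current memory feed a small gating network, and the episode is the attention-weighted sum of facts. This is simplified from the paper, which also proposes an attention-based GRU; all sizes are illustrative.

```python
# Sketch of the DMN+ soft-attention step over input facts: interaction
# features between each fact f_i, the question q, and the memory m feed
# a gating network, and the episode is the attention-weighted sum of
# the facts. Simplified from the paper; dimensions are illustrative.
import torch
import torch.nn as nn

def episode(facts, q, m, gate_net):
    # facts: (n, d); q, m: (d,)
    z = torch.cat([facts * q, facts * m,
                   (facts - q).abs(), (facts - m).abs()], dim=-1)  # (n, 4d)
    g = torch.softmax(gate_net(z).squeeze(-1), dim=0)  # attention over facts
    return g @ facts  # (d,) attention-weighted episode

d, n = 16, 5
gate_net = nn.Sequential(nn.Linear(4 * d, d), nn.Tanh(), nn.Linear(d, 1))
facts, q, m = torch.randn(n, d), torch.randn(d), torch.randn(d)
print(episode(facts, q, m, gate_net).shape)  # torch.Size([16])
```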

Match-SRNN: Modeling the Recursive Matching Structure with Spatial RNN

Title Match-SRNN: Modeling the Recursive Matching Structure with Spatial RNN
Authors Shengxian Wan, Yanyan Lan, Jun Xu, Jiafeng Guo, Liang Pang, Xueqi Cheng
Abstract Semantic matching, which aims to determine the matching degree between two texts, is a fundamental problem for many NLP applications. Recently, deep learning approaches have been applied to this problem and significant improvements have been achieved. In this paper, we propose to view the generation of the global interaction between two texts as a recursive process: the interaction of two texts at each position is a composition of the interactions between their prefixes and the word-level interaction at the current position. Based on this idea, we propose a novel deep architecture, namely Match-SRNN, to model the recursive matching structure. First, a tensor is constructed to capture the word-level interactions. Then a spatial RNN is applied to integrate the local interactions recursively, with importance determined by four types of gates. Finally, the matching score is calculated based on the global interaction. We show that, when degenerated to the exact-matching scenario, Match-SRNN can approximate the dynamic programming process of the longest common subsequence, so there is a clear interpretation for Match-SRNN. Our experiments on two semantic matching tasks show the effectiveness of Match-SRNN and its ability to visualize the learned matching structure.
Tasks
Published 2016-04-15
URL http://arxiv.org/abs/1604.04378v1
PDF http://arxiv.org/pdf/1604.04378v1.pdf
PWC https://paperswithcode.com/paper/match-srnn-modeling-the-recursive-matching
Repo https://github.com/T-Almeida/tensorflow-keras-multidimensional-rnn
Framework tf
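
The core of the model is a two-dimensional recurrence: the hidden state at position (i, j) is computed from the word-level interaction at (i, j) and the three predecessor states (i−1, j), (i, j−1), and (i−1, j−1). A minimal numpy sketch with a plain tanh cell standing in for the paper's gated unit:

```python
# Sketch of the spatial (2D) recurrence at the heart of Match-SRNN: the
# hidden state at (i, j) combines the word-level interaction s[i, j]
# with the three predecessor states. A plain tanh cell stands in for
# the paper's gated (GRU-like) cell.
import numpy as np

def spatial_rnn(S, d, rng):
    # S: (n1, n2) word-interaction tensor (a scalar per position here).
    n1, n2 = S.shape
    W = rng.standard_normal((d, 3 * d + 1)) * 0.1
    H = np.zeros((n1 + 1, n2 + 1, d))  # zero-padded borders
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            inp = np.concatenate([H[i - 1, j], H[i, j - 1],
                                  H[i - 1, j - 1], [S[i - 1, j - 1]]])
            H[i, j] = np.tanh(W @ inp)
    return H[n1, n2]  # global interaction used for the matching score

rng = np.random.default_rng(0)
S = rng.random((4, 6))  # e.g. similarities between word pairs of two texts
print(spatial_rnn(S, d=8, rng=rng).shape)  # (8,)
```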

Unified Framework for Quantification

Title Unified Framework for Quantification
Authors Aykut Firat
Abstract Quantification is the machine learning task of estimating test-data class proportions that are not necessarily similar to those in training. Apart from its intrinsic value as an aggregate statistic, quantification output can also be used to optimize classifier probabilities, thereby increasing classification accuracy. We unify major quantification approaches under a constrained multi-variate regression framework, and use mathematical programming to estimate class proportions for different loss functions. With this modeling approach, we extend existing binary-only quantification approaches to multi-class settings as well. We empirically verify our unified framework by experimenting with several multi-class datasets including the Stanford Sentiment Treebank and CIFAR-10.
Tasks
Published 2016-06-02
URL http://arxiv.org/abs/1606.00868v1
PDF http://arxiv.org/pdf/1606.00868v1.pdf
PWC https://paperswithcode.com/paper/unified-framework-for-quantification
Repo https://github.com/aykutfirat/Quantification
Framework none
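
The regression view is straightforward to instantiate for squared loss: estimate class proportions p by solving min ‖Ap − b‖² subject to p ≥ 0 and Σp = 1, where A holds the classifier's conditional prediction rates estimated on training data and b is the predicted-class distribution on the test set. A sketch with illustrative numbers:

```python
# Quantification as constrained regression: estimate test-set class
# proportions p by solving min ||A p - b||^2 s.t. p >= 0, sum(p) = 1,
# where A[i, j] = P(predict class i | true class j), estimated on
# training data (e.g. via cross-validation), and b is the classifier's
# predicted-class distribution on the test set. Numbers are illustrative.
import numpy as np
from scipy.optimize import minimize

A = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.7, 0.2],
              [0.1, 0.2, 0.7]])
b = np.array([0.5, 0.3, 0.2])  # observed predicted-class proportions

res = minimize(lambda p: np.sum((A @ p - b) ** 2),
               x0=np.full(3, 1 / 3),
               bounds=[(0, 1)] * 3,
               constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1}])
print(res.x)  # estimated true class proportions
```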

Exponential Machines

Title Exponential Machines
Authors Alexander Novikov, Mikhail Trofimov, Ivan Oseledets
Abstract Modeling interactions between features improves the performance of machine learning solutions in many domains (e.g. recommender systems or sentiment analysis). In this paper, we introduce Exponential Machines (ExM), a predictor that models all interactions of every order. The key idea is to represent an exponentially large tensor of parameters in a factorized format called Tensor Train (TT). The Tensor Train format regularizes the model and provides control over the number of underlying parameters. To train the model, we develop a stochastic Riemannian optimization procedure, which allows us to fit tensors with 2^160 entries. We show that the model achieves state-of-the-art performance on synthetic data with high-order interactions and performs on par with high-order factorization machines on the MovieLens 100K recommender-system dataset.
Tasks Recommendation Systems, Sentiment Analysis
Published 2016-05-12
URL http://arxiv.org/abs/1605.03795v3
PDF http://arxiv.org/pdf/1605.03795v3.pdf
PWC https://paperswithcode.com/paper/exponential-machines
Repo https://github.com/emstoudenmire/TNML
Framework none
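
The TT parameterization is compact to sketch: with binary features, the score is a chain of small matrix products, one core slice per feature, so all 2^n interaction weights live in O(n·r²) parameters. The cores below are random for illustration; the paper trains them with Riemannian optimization.

```python
# Sketch of the Tensor Train parameterization behind Exponential
# Machines: with binary features x_1..x_n, the score is a product of
# small TT cores, one per feature, so all 2^n interaction weights are
# represented with O(n * r^2) parameters. Random cores for illustration.
import numpy as np

def tt_score(x, cores):
    # cores[k] has shape (2, r, r); pick the slice for feature value x_k.
    v = np.ones(cores[0].shape[1])
    for k, x_k in enumerate(x):
        v = v @ cores[k][x_k]
    return v.sum()  # contract away the final TT rank dimension

rng = np.random.default_rng(0)
n, r = 10, 4
cores = [rng.standard_normal((2, r, r)) * 0.5 for _ in range(n)]
x = rng.integers(0, 2, size=n)  # a binary feature vector
print(tt_score(x, cores))
```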

Low-rank Optimization with Convex Constraints

Title Low-rank Optimization with Convex Constraints
Authors Christian Grussler, Anders Rantzer, Pontus Giselsson
Abstract The problem of low-rank approximation with convex constraints, which appears in data analysis, system identification, model order reduction, low-order controller design and low-complexity modelling is considered. Given a matrix, the objective is to find a low-rank approximation that meets rank and convex constraints, while minimizing the distance to the matrix in the squared Frobenius norm. In many situations, this non-convex problem is convexified by nuclear norm regularization. However, we will see that the approximations obtained by this method may be far from optimal. In this paper, we propose an alternative convex relaxation that uses the convex envelope of the squared Frobenius norm and the rank constraint. With this approach, easily verifiable conditions are obtained under which the solutions to the convex relaxation and the original non-convex problem coincide. An SDP representation of the convex envelope is derived, which allows us to apply this approach to several known problems. Our example on optimal low-rank Hankel approximation/model reduction illustrates that the proposed convex relaxation performs consistently better than nuclear norm regularization and may outperform balanced truncation.
Tasks
Published 2016-06-06
URL http://arxiv.org/abs/1606.01793v3
PDF http://arxiv.org/pdf/1606.01793v3.pdf
PWC https://paperswithcode.com/paper/low-rank-optimization-with-convex-constraints
Repo https://github.com/LowRankOpt/LRINorm
Framework none
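
For context, the nuclear-norm relaxation the paper improves on is usually applied through singular-value soft-thresholding, the proximal operator of τ‖X‖*. A sketch of that baseline is below; the paper's SDP-representable convex envelope is not reproduced here.

```python
# The nuclear-norm baseline the paper compares against: the proximal
# operator of tau * ||X||_* is singular-value soft-thresholding. The
# paper's alternative (a convex envelope with an SDP representation)
# is not reproduced in this sketch.
import numpy as np

def svt(M, tau):
    """Singular-value thresholding: prox of tau * nuclear norm at M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 5))
M_lr = svt(M, tau=1.0)
print(np.linalg.matrix_rank(M_lr))  # typically lower rank than M
```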

Data-driven generation of spatio-temporal routines in human mobility

Title Data-driven generation of spatio-temporal routines in human mobility
Authors Luca Pappalardo, Filippo Simini
Abstract The generation of realistic spatio-temporal trajectories of human mobility is of fundamental importance in a wide range of applications, such as the development of protocols for mobile ad-hoc networks or what-if analysis in urban ecosystems. Current generative algorithms fail to accurately reproduce individuals’ recurrent schedules while also accounting for the possibility that individuals may break their routine for periods of variable duration. In this article we present DITRAS (DIary-based TRAjectory Simulator), a framework to simulate the spatio-temporal patterns of human mobility. DITRAS operates in two steps: the generation of a mobility diary and the translation of the mobility diary into a mobility trajectory. We propose a data-driven algorithm that constructs a diary generator from real data, capturing the tendency of individuals to follow or break their routine. We also propose a trajectory generator based on the concepts of preferential exploration and preferential return. We instantiate DITRAS with the proposed diary and trajectory generators and compare the resulting algorithm with real data and with synthetic data produced by other generative algorithms, built by instantiating DITRAS with several combinations of diary and trajectory generators. We show that the proposed algorithm reproduces the statistical properties of real trajectories most accurately, taking a step forward in understanding the origin of the spatio-temporal patterns of human mobility.
Tasks
Published 2016-07-16
URL http://arxiv.org/abs/1607.05952v3
PDF http://arxiv.org/pdf/1607.05952v3.pdf
PWC https://paperswithcode.com/paper/data-driven-generation-of-spatio-temporal
Repo https://github.com/jonpappalord/DITRAS
Framework none
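
The trajectory half of the framework rests on preferential exploration and preferential return. A simplified sketch of that mechanism follows; the constants ρ and γ are illustrative defaults from the exploration-and-preferential-return literature, and the diary generator (deciding when the routine is followed or broken) is omitted.

```python
# Simplified preferential exploration / preferential return generator:
# with probability rho * S^(-gamma) the agent explores a new location,
# otherwise it returns to a past location chosen proportionally to its
# visit frequency. DITRAS's diary generator is omitted; rho and gamma
# are illustrative values, not fitted parameters.
import random
from collections import Counter

def epr_trajectory(steps, n_locations=100, rho=0.6, gamma=0.21, seed=0):
    rng = random.Random(seed)
    visits = Counter({0: 1})  # start at location 0
    trajectory = [0]
    for _ in range(steps):
        S = len(visits)  # number of distinct locations visited so far
        if rng.random() < rho * S ** (-gamma) and S < n_locations:
            new = rng.choice([l for l in range(n_locations) if l not in visits])
            visits[new] += 1
            trajectory.append(new)
        else:  # preferential return, weighted by past visit frequency
            locs, freqs = zip(*visits.items())
            ret = rng.choices(locs, weights=freqs)[0]
            visits[ret] += 1
            trajectory.append(ret)
    return trajectory

print(len(set(epr_trajectory(500))), "distinct locations visited")
```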

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

Title Safe Exploration in Finite Markov Decision Processes with Gaussian Processes
Authors Matteo Turchetta, Felix Berkenkamp, Andreas Krause
Abstract In classical reinforcement learning, when exploring an environment, agents accept arbitrary short-term loss for long-term gain. This is infeasible for safety-critical applications, such as robotics, where even a single unsafe action may cause system failure. In this paper, we address the problem of safely exploring finite Markov decision processes (MDP). We define safety in terms of an a priori unknown safety constraint that depends on states and actions. We aim to explore the MDP under this constraint, assuming that the unknown function satisfies regularity conditions expressed via a Gaussian process prior. We develop a novel algorithm for this task and prove that it is able to completely explore the safely reachable part of the MDP without violating the safety constraint. To achieve this, it cautiously explores safe states and actions in order to gain statistical confidence about the safety of unvisited state-action pairs from noisy observations collected while navigating the environment. Moreover, the algorithm explicitly considers reachability when exploring the MDP, ensuring that it does not get stuck in any state with no safe way out. We demonstrate our method on digital terrain models for the task of exploring an unknown map with a rover.
Tasks Gaussian Processes, Safe Exploration
Published 2016-06-15
URL http://arxiv.org/abs/1606.04753v2
PDF http://arxiv.org/pdf/1606.04753v2.pdf
PWC https://paperswithcode.com/paper/safe-exploration-in-finite-markov-decision
Repo https://github.com/befelix/SafeMDP
Framework none
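
The safety test at the heart of the algorithm can be sketched directly: a state (or state-action pair) is certified safe only if the GP posterior's lower confidence bound on the unknown safety function stays above the threshold. The full algorithm additionally enforces reachability and returnability over the MDP, which this toy omits.

```python
# Pessimistic safety check from a GP posterior: certify a candidate as
# safe only if mean - beta * std >= h, where h is the safety threshold.
# The full algorithm also enforces reachability/returnability in the
# MDP; the 1-D states and observations below are illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

h, beta = 0.0, 2.0  # safety threshold and confidence scaling
X_obs = np.array([[0.0], [1.0], [2.0]])  # visited states
y_obs = np.array([0.8, 0.6, 0.3])        # noisy safety observations

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
gp.fit(X_obs, y_obs)

X_cand = np.linspace(0, 5, 11).reshape(-1, 1)  # candidate states
mean, std = gp.predict(X_cand, return_std=True)
safe = mean - beta * std >= h
print(X_cand[safe].ravel())  # states the GP certifies as safe
```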

Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation

Title Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation
Authors Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji
Abstract Named Entity Disambiguation (NED) refers to the task of resolving multiple named entity mentions in a document to their correct references in a knowledge base (KB) (e.g., Wikipedia). In this paper, we propose a novel embedding method specifically designed for NED. The proposed method jointly maps words and entities into the same continuous vector space. We extend the skip-gram model by using two models. The KB graph model learns the relatedness of entities using the link structure of the KB, whereas the anchor context model aims to align vectors such that similar words and entities occur close to one another in the vector space by leveraging KB anchors and their context words. By combining contexts based on the proposed embedding with standard NED features, we achieved state-of-the-art accuracy of 93.1% on the standard CoNLL dataset and 85.2% on the TAC 2010 dataset.
Tasks Entity Disambiguation, Entity Linking
Published 2016-01-06
URL http://arxiv.org/abs/1601.01343v4
PDF http://arxiv.org/pdf/1601.01343v4.pdf
PWC https://paperswithcode.com/paper/joint-learning-of-the-embedding-of-words-and
Repo https://github.com/wikipedia2vec/wikipedia2vec
Framework none
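
The linked repo ships as the wikipedia2vec library, whose documented interface loads a pretrained model and exposes word and entity vectors in one shared space. A usage sketch follows; the API names are taken from the repo's README and should be treated as assumptions, and the model file is a placeholder for a pretrained model you supply.

```python
# Loading jointly trained word and entity vectors with the linked
# wikipedia2vec library. Method names follow the repo's README (treat
# them as assumptions); the model path is a placeholder.
from wikipedia2vec import Wikipedia2Vec

model = Wikipedia2Vec.load("enwiki_20180420_300d.pkl")  # placeholder path

word_vec = model.get_word_vector("apple")            # word embedding
entity_vec = model.get_entity_vector("Apple Inc.")   # entity embedding

# Words and entities share one vector space, so nearest neighbors can
# mix both kinds of items.
for item, score in model.most_similar(model.get_entity("Apple Inc."), 5):
    print(item, score)
```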

Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond

Title Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond
Authors Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, Bing Xiang
Abstract In this work, we model abstractive text summarization using Attentional Encoder-Decoder Recurrent Neural Networks, and show that they achieve state-of-the-art performance on two different corpora. We propose several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time. Our work shows that many of our proposed models contribute to further improvement in performance. We also propose a new dataset consisting of multi-sentence summaries, and establish performance benchmarks for further research.
Tasks Abstractive Text Summarization, Text Summarization
Published 2016-02-19
URL http://arxiv.org/abs/1602.06023v5
PDF http://arxiv.org/pdf/1602.06023v5.pdf
PWC https://paperswithcode.com/paper/abstractive-text-summarization-using-sequence
Repo https://github.com/yunzhusong/AAAI20-PORLHG
Framework none
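
The attentional step at the core of the encoder-decoder is compact: score each encoder state against the current decoder state, softmax-normalize, and take the weighted sum as the context vector. The bilinear score below is one common choice, not necessarily the paper's exact variant, which further adds switching pointers, hierarchical attention, and feature-rich encoders.

```python
# Basic attention step of an attentional encoder-decoder: score encoder
# states against the decoder state, normalize with a softmax, and take
# the weighted sum as the context vector. A bilinear score is used here
# for brevity; the paper builds several extensions on this core.
import torch

def attention_context(enc_states, dec_state, W):
    # enc_states: (src_len, d); dec_state: (d,); W: bilinear weight (d, d)
    scores = enc_states @ (W @ dec_state)  # (src_len,)
    alphas = torch.softmax(scores, dim=0)  # attention distribution
    return alphas @ enc_states, alphas     # context vector: (d,)

d, src_len = 32, 12
W = torch.randn(d, d) * 0.1
enc_states, dec_state = torch.randn(src_len, d), torch.randn(d)
ctx, alphas = attention_context(enc_states, dec_state, W)
print(ctx.shape, float(alphas.sum()))  # torch.Size([32]) 1.0
```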

Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus

Title Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus
Authors Iulian Vlad Serban, Alberto García-Durán, Caglar Gulcehre, Sungjin Ahn, Sarath Chandar, Aaron Courville, Yoshua Bengio
Abstract Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances. However, to date, no large-scale question-answer corpora are available. In this paper we present the 30M Factoid Question-Answer Corpus, an enormous corpus of question-answer pairs produced by applying a novel neural network architecture to the knowledge base Freebase to transduce facts into natural language questions. The produced question-answer pairs are evaluated both by human evaluators and using automatic evaluation metrics, including well-established machine translation and sentence similarity metrics. Across all evaluation criteria the question-generation model outperforms the competing template-based baseline. Furthermore, when presented to human evaluators, the generated questions appear comparable in quality to real human-generated questions.
Tasks Machine Translation, Question Generation
Published 2016-03-22
URL http://arxiv.org/abs/1603.06807v2
PDF http://arxiv.org/pdf/1603.06807v2.pdf
PWC https://paperswithcode.com/paper/generating-factoid-questions-with-recurrent
Repo https://github.com/imatge-upc/vqa-2016-cvprw
Framework tf
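
A loose sketch of the fact-to-question idea: embed the (subject, relation, object) atoms of a KB fact, pool them into a fact encoding, and condition a recurrent decoder on it. The paper uses learned fact embeddings plus a placeholder mechanism for rare entities; everything below, including the averaging encoder, is illustrative.

```python
# Sketch of fact-conditioned question generation: embed the (subject,
# relation, object) atoms of a KB fact, pool them into a fact encoding,
# and use it as the initial state of a recurrent decoder that emits the
# question. Averaging is a stand-in for the paper's fact embeddings.
import torch
import torch.nn as nn

vocab_size, kb_size, d = 1000, 500, 64
atom_emb = nn.Embedding(kb_size, d)   # subject / relation / object atoms
decoder = nn.GRU(input_size=d, hidden_size=d, batch_first=True)
out_proj = nn.Linear(d, vocab_size)

fact = torch.tensor([[3, 42, 7]])     # (subject, relation, object) ids
fact_vec = atom_emb(fact).mean(dim=1) # (1, d) fact encoding
h0 = fact_vec.unsqueeze(0)            # initialize the decoder with the fact

tokens = torch.randn(1, 5, d)         # embedded question prefix (toy input)
output, _ = decoder(tokens, h0)
logits = out_proj(output)             # per-step next-word distribution
print(logits.shape)                   # torch.Size([1, 5, 1000])
```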