Paper Group AWR 40
Problems With Evaluation of Word Embeddings Using Word Similarity Tasks
Title | Problems With Evaluation of Word Embeddings Using Word Similarity Tasks |
Authors | Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, Chris Dyer |
Abstract | Lacking standardized extrinsic evaluation methods for vector representations of words, the NLP community has relied heavily on word similarity tasks as a proxy for intrinsic evaluation of word vectors. Word similarity evaluation, which correlates the distance between vectors and human judgments of semantic similarity, is attractive because it is computationally inexpensive and fast. In this paper we present several problems associated with the evaluation of word vectors on word similarity datasets, and summarize existing solutions. Our study suggests that the use of word similarity tasks for evaluation of word vectors is not sustainable and calls for further research on evaluation methods. |
Tasks | Semantic Similarity, Semantic Textual Similarity, Word Embeddings |
Published | 2016-05-08 |
URL | http://arxiv.org/abs/1605.02276v3 |
http://arxiv.org/pdf/1605.02276v3.pdf | |
PWC | https://paperswithcode.com/paper/problems-with-evaluation-of-word-embeddings |
Repo | https://github.com/avi-jit/SWOW-eval |
Framework | none |
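To make the critiqued protocol concrete, here is a minimal numpy/scipy sketch (not the authors' code) of a word similarity evaluation: score word pairs by the cosine similarity of their vectors and report the Spearman correlation with human judgments. The toy vocabulary and ratings are invented for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def evaluate_word_similarity(embeddings, pairs, human_scores):
    """embeddings: dict word -> vector; pairs: list of (w1, w2) tuples;
    human_scores: human similarity judgments, aligned with pairs."""
    model_scores = [cosine(embeddings[w1], embeddings[w2]) for w1, w2 in pairs]
    rho, _ = spearmanr(model_scores, human_scores)  # rank correlation
    return rho

# Toy usage with random vectors (illustration only):
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["cup", "mug", "car"]}
print(evaluate_word_similarity(emb, [("cup", "mug"), ("cup", "car")], [9.0, 2.5]))
```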
Very Deep Convolutional Networks for End-to-End Speech Recognition
Title | Very Deep Convolutional Networks for End-to-End Speech Recognition |
Authors | Yu Zhang, William Chan, Navdeep Jaitly |
Abstract | Sequence-to-sequence models have shown success in end-to-end speech recognition. However, these models have only used shallow acoustic encoder networks. In our work, we successively train very deep convolutional networks to add more expressive power and better generalization for end-to-end ASR models. We apply network-in-network principles, batch normalization, residual connections and convolutional LSTMs to build very deep recurrent and convolutional structures. Our models exploit the spectral structure in the feature space and add computational depth without overfitting issues. We experiment with the WSJ ASR task and achieve a 10.5% word error rate without any dictionary or language model, using a 15-layer deep network. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2016-10-10 |
URL | http://arxiv.org/abs/1610.03022v1 |
http://arxiv.org/pdf/1610.03022v1.pdf | |
PWC | https://paperswithcode.com/paper/very-deep-convolutional-networks-for-end-to |
Repo | https://github.com/colaprograms/speechify |
Framework | tf |
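As a rough illustration of the building block the paper stacks in its encoder, here is a PyTorch sketch (the linked repo is TensorFlow) of one residual convolutional block over spectrogram features: conv, batch norm, ReLU, twice, plus an identity shortcut. Channel counts and feature-map shapes here are hypothetical, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):  # x: (batch, channels, time, freq) feature maps
        h = torch.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return torch.relu(h + x)  # residual connection

x = torch.randn(2, 32, 100, 40)  # a batch of log-mel features (made-up shapes)
print(ResidualConvBlock(32)(x).shape)
```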
Dictionary Learning for Massive Matrix Factorization
Title | Dictionary Learning for Massive Matrix Factorization |
Authors | Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux |
Abstract | Sparse matrix factorization is a popular tool to obtain interpretable data decompositions, which are also effective for data completion or denoising. Its applicability to large datasets has been addressed with online and randomized methods that reduce the complexity in one of the matrix dimensions, but not in both. In this paper, we tackle very large matrices in both dimensions. We propose a new factorization method that scales gracefully to terabyte-scale datasets which could not be processed by previous algorithms in a reasonable amount of time. We demonstrate the efficiency of our approach on massive functional Magnetic Resonance Imaging (fMRI) data, and on matrix completion problems for recommender systems, where we obtain significant speed-ups compared to state-of-the-art coordinate descent methods. |
Tasks | Dictionary Learning, Matrix Completion, Recommendation Systems |
Published | 2016-05-03 |
URL | http://arxiv.org/abs/1605.00937v2 |
http://arxiv.org/pdf/1605.00937v2.pdf | |
PWC | https://paperswithcode.com/paper/dictionary-learning-for-massive-matrix |
Repo | https://github.com/arthurmensch/modl |
Framework | none |
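A simplified numpy sketch of the core idea, not the modl implementation: online dictionary learning where each update touches only a random subsample of the rows, so the per-iteration cost shrinks in both dimensions. For brevity this sketch uses a ridge code instead of the lasso and a plain gradient step instead of the paper's surrogate-based update.

```python
import numpy as np

def online_dl_subsampled(X, n_atoms=10, n_iter=100, row_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    D = rng.normal(size=(n_rows, n_atoms))  # dictionary
    for _ in range(n_iter):
        j = rng.integers(n_cols)            # stream one column (one sample)
        rows = rng.choice(n_rows, int(row_frac * n_rows), replace=False)
        Dr, xr = D[rows], X[rows, j]
        # code the sample from the subsampled rows only (ridge for brevity)
        alpha = np.linalg.solve(Dr.T @ Dr + 0.1 * np.eye(n_atoms), Dr.T @ xr)
        # gradient step on the subsampled dictionary rows only
        D[rows] -= 0.01 * np.outer(Dr @ alpha - xr, alpha)
    return D

D = online_dl_subsampled(np.random.default_rng(1).normal(size=(200, 500)))
print(D.shape)
```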
Sequence Graph Transform (SGT): A Feature Extraction Function for Sequence Data Mining (Extended Version)
Title | Sequence Graph Transform (SGT): A Feature Extraction Function for Sequence Data Mining (Extended Version) |
Authors | Chitta Ranjan, Samaneh Ebrahimi, Kamran Paynabar |
Abstract | The ubiquitous presence of sequence data across fields such as the web, healthcare, bioinformatics, and text mining has made sequence mining a vital research area. However, sequence mining is particularly challenging because of the difficulty of finding a (dis)similarity or distance between sequences: a distance measure is not obvious because sequences are unstructured, i.e., arbitrary strings of arbitrary length. Feature representations such as n-grams are often used, but they either compromise between extracting short- and long-term sequence patterns or incur high computational cost. We propose a new function, Sequence Graph Transform (SGT), that extracts short- and long-term sequence features and embeds them in a finite-dimensional feature space. Importantly, SGT has low computational cost and can capture any amount of short- to long-term patterns without any increase in computation, which we also prove theoretically in this paper. As a result, SGT yields superior results, with significantly higher accuracy and lower computation than existing methods. We demonstrate this through several experiments and real-world applications of SGT to clustering, classification, search, and visualization. |
Tasks | |
Published | 2016-08-11 |
URL | http://arxiv.org/abs/1608.03533v9 |
http://arxiv.org/pdf/1608.03533v9.pdf | |
PWC | https://paperswithcode.com/paper/sequence-graph-transform-sgt-a-feature |
Repo | https://github.com/cran2367/sgt |
Framework | tf |
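A hedged numpy sketch of the SGT idea: for every ordered symbol pair (u, v), accumulate exp(-kappa * gap) over all positions where v occurs after u, giving a fixed-size |alphabet|^2 feature vector per sequence. The normalization below (dividing by the count of u) is a simplification; the paper's exact normalization differs, so treat this as an illustration rather than the reference code.

```python
import numpy as np

def sgt_features(seq, alphabet, kappa=1.0):
    idx = {a: i for i, a in enumerate(alphabet)}
    W = np.zeros((len(alphabet), len(alphabet)))  # pairwise decayed weights
    C = np.zeros(len(alphabet))                   # occurrence counts of u
    for i, u in enumerate(seq):
        C[idx[u]] += 1
        for j in range(i + 1, len(seq)):
            # longer gaps contribute less; kappa tunes short- vs long-term
            W[idx[u], idx[seq[j]]] += np.exp(-kappa * (j - i))
    return (W / np.maximum(C[:, None], 1)).ravel()

print(sgt_features("ABBA", alphabet="AB"))
```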
Improving Variational Inference with Inverse Autoregressive Flow
Title | Improving Variational Inference with Inverse Autoregressive Flow |
Authors | Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling |
Abstract | The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast to earlier published flows, scales well to high-dimensional latent spaces. The proposed flow consists of a chain of invertible transformations, where each transformation is based on an autoregressive neural network. In experiments, we show that IAF significantly improves upon diagonal Gaussian approximate posteriors. In addition, we demonstrate that a novel type of variational autoencoder, coupled with IAF, is competitive with neural autoregressive models in terms of attained log-likelihood on natural images, while allowing significantly faster synthesis. |
Tasks | |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.04934v2 |
http://arxiv.org/pdf/1606.04934v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-variational-inference-with-inverse |
Repo | https://github.com/openai/iaf |
Framework | tf |
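A numpy sketch of a single IAF step, under stated assumptions: an autoregressive network produces (m, s) with m_i and s_i depending only on z_{<i} (stubbed here with strictly lower-triangular linear maps in place of a MADE-style network). The gated update z' = sigmoid(s) * z + (1 - sigmoid(s)) * m has a triangular Jacobian, so log|det| is just the sum of log gates.

```python
import numpy as np

def autoregressive_net(z, W_m, W_s):
    # Strictly lower-triangular weights enforce the autoregressive property
    # (output i sees only z_{<i}); a stand-in for a real MADE-style network.
    return np.tril(W_m, k=-1) @ z, np.tril(W_s, k=-1) @ z

def iaf_step(z, W_m, W_s):
    m, s = autoregressive_net(z, W_m, W_s)
    gate = 1.0 / (1.0 + np.exp(-s))   # sigmoid gating, as in the paper
    z_new = gate * z + (1.0 - gate) * m
    log_det = np.sum(np.log(gate))    # triangular Jacobian -> cheap log-det
    return z_new, log_det

rng = np.random.default_rng(0)
z = rng.normal(size=4)
z2, ld = iaf_step(z, rng.normal(size=(4, 4)), rng.normal(size=(4, 4)))
print(z2, ld)
```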
Dynamic Memory Networks for Visual and Textual Question Answering
Title | Dynamic Memory Networks for Visual and Textual Question Answering |
Authors | Caiming Xiong, Stephen Merity, Richard Socher |
Abstract | Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision. |
Tasks | Question Answering, Visual Question Answering |
Published | 2016-03-04 |
URL | http://arxiv.org/abs/1603.01417v1 |
http://arxiv.org/pdf/1603.01417v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-memory-networks-for-visual-and |
Repo | https://github.com/radiodee1/awesome-chatbot |
Framework | tf |
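A compact numpy sketch of the episodic-memory idea behind the DMN family: score each encoded fact against the question and the current memory, form an attention-weighted episode, and update the memory. The gate features follow the paper's spirit, but the parameterization here is heavily simplified (soft attention in place of the attention-based GRU, a tanh mix in place of the learned memory update).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def episode_update(facts, q, m, w):
    # interaction features between each fact f, the question q, and memory m
    Z = np.stack([np.concatenate([f * q, f * m, np.abs(f - q), np.abs(f - m)])
                  for f in facts])
    g = softmax(Z @ w)               # attention over supporting facts
    episode = g @ facts              # soft attention (GRU variant omitted)
    return np.tanh(episode + m + q)  # simplified memory update

rng = np.random.default_rng(0)
facts = rng.normal(size=(5, 8))      # 5 encoded sentences
q, m = rng.normal(size=8), rng.normal(size=8)
print(episode_update(facts, q, m, rng.normal(size=32)))
```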
Match-SRNN: Modeling the Recursive Matching Structure with Spatial RNN
Title | Match-SRNN: Modeling the Recursive Matching Structure with Spatial RNN |
Authors | Shengxian Wan, Yanyan Lan, Jun Xu, Jiafeng Guo, Liang Pang, Xueqi Cheng |
Abstract | Semantic matching, which aims to determine the matching degree between two texts, is a fundamental problem for many NLP applications. Recently, deep learning approaches have been applied to this problem and significant improvements have been achieved. In this paper, we propose to view the generation of the global interaction between two texts as a recursive process: i.e., the interaction of two texts at each position is a composition of the interactions between their prefixes as well as the word-level interaction at the current position. Based on this idea, we propose a novel deep architecture, namely Match-SRNN, to model the recursive matching structure. Firstly, a tensor is constructed to capture the word-level interactions. Then a spatial RNN is applied to integrate the local interactions recursively, with importance determined by four types of gates. Finally, the matching score is calculated based on the global interaction. We show that, when degenerated to the exact matching scenario, Match-SRNN can approximate the dynamic programming process of the longest common subsequence, so there exists a clear interpretation for Match-SRNN. Our experiments on two semantic matching tasks showed the effectiveness of Match-SRNN and its ability to visualize the learned matching structure. |
Tasks | |
Published | 2016-04-15 |
URL | http://arxiv.org/abs/1604.04378v1 |
http://arxiv.org/pdf/1604.04378v1.pdf | |
PWC | https://paperswithcode.com/paper/match-srnn-modeling-the-recursive-matching |
Repo | https://github.com/T-Almeida/tensorflow-keras-multidimensional-rnn |
Framework | tf |
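A numpy sketch of the spatial-RNN recursion described above: the hidden state at position (i, j) composes the states of the three prefix neighbours with the local word-interaction signal, mirroring the LCS dynamic program. The four gate types from the paper are omitted here; this is a structural illustration, not the trained model.

```python
import numpy as np

def spatial_rnn(S, w, U, dim=4):
    # S[i, j] is the word-level interaction between text1[i] and text2[j]
    n, m = S.shape
    H = np.zeros((n + 1, m + 1, dim))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prefix = H[i - 1, j] + H[i, j - 1] + H[i - 1, j - 1]  # gates omitted
            H[i, j] = np.tanh(w * S[i - 1, j - 1] + U @ prefix)
    return H[n, m]  # global interaction, fed to the matching-score layer

rng = np.random.default_rng(0)
S = rng.normal(size=(3, 5))
print(spatial_rnn(S, rng.normal(size=4), rng.normal(size=(4, 4))))
```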
Unified Framework for Quantification
Title | Unified Framework for Quantification |
Authors | Aykut Firat |
Abstract | Quantification is the machine learning task of estimating test-data class proportions that are not necessarily similar to those in training. Apart from its intrinsic value as an aggregate statistic, quantification output can also be used to optimize classifier probabilities, thereby increasing classification accuracy. We unify major quantification approaches under a constrained multi-variate regression framework, and use mathematical programming to estimate class proportions for different loss functions. With this modeling approach, we extend existing binary-only quantification approaches to multi-class settings as well. We empirically verify our unified framework by experimenting with several multi-class datasets including the Stanford Sentiment Treebank and CIFAR-10. |
Tasks | |
Published | 2016-06-02 |
URL | http://arxiv.org/abs/1606.00868v1 |
http://arxiv.org/pdf/1606.00868v1.pdf | |
PWC | https://paperswithcode.com/paper/unified-framework-for-quantification |
Repo | https://github.com/aykutfirat/Quantification |
Framework | none |
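A numpy/scipy sketch of the unified view: estimate the class-proportion vector p by solving a constrained least-squares problem A p ≈ b with p restricted to the probability simplex. For the classic "adjusted count" instance shown in the toy example, A holds the classifier's confusion rates and b the observed predicted-class rates; both numbers below are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def quantify(A, b):
    k = A.shape[1]
    cons = ({"type": "eq", "fun": lambda p: p.sum() - 1.0},)  # simplex: sum to 1
    res = minimize(lambda p: np.sum((A @ p - b) ** 2),
                   x0=np.full(k, 1.0 / k),
                   bounds=[(0.0, 1.0)] * k, constraints=cons)
    return res.x

# Toy binary example (hypothetical rates):
A = np.array([[0.9, 0.2],    # P(pred=0 | true=0), P(pred=0 | true=1)
              [0.1, 0.8]])
b = np.array([0.55, 0.45])   # observed prediction rates on test data
print(quantify(A, b))        # estimated test-class proportions ~ [0.5, 0.5]
```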
Exponential Machines
Title | Exponential Machines |
Authors | Alexander Novikov, Mikhail Trofimov, Ivan Oseledets |
Abstract | Modeling interactions between features improves the performance of machine learning solutions in many domains (e.g. recommender systems or sentiment analysis). In this paper, we introduce Exponential Machines (ExM), a predictor that models all interactions of every order. The key idea is to represent an exponentially large tensor of parameters in a factorized format called Tensor Train (TT). The Tensor Train format regularizes the model and provides control over the number of underlying parameters. To train the model, we develop a stochastic Riemannian optimization procedure, which allows us to fit tensors with 2^160 entries. We show that the model achieves state-of-the-art performance on synthetic data with high-order interactions and performs on par with high-order factorization machines on the MovieLens 100K recommender-system dataset. |
Tasks | Recommendation Systems, Sentiment Analysis |
Published | 2016-05-12 |
URL | http://arxiv.org/abs/1605.03795v3 |
http://arxiv.org/pdf/1605.03795v3.pdf | |
PWC | https://paperswithcode.com/paper/exponential-machines |
Repo | https://github.com/emstoudenmire/TNML |
Framework | none |
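A numpy sketch of how scoring works in TT format: the exponentially large weight tensor over all interaction subsets is never materialized; the score is a chain of small matrix products, one TT core per feature. The core shapes and random initialization below are illustrative, not the paper's trained model.

```python
import numpy as np

def tt_score(x, cores):
    """Score of feature vector x under a weight tensor in TT format.
    cores[k] has shape (r_k, 2, r_{k+1}); slice 0 'skips' feature k and
    slice 1 'includes' it, so the running product implicitly enumerates
    all 2^d interaction subsets at O(d * r^2) cost, never O(2^d)."""
    v = np.ones((1,))
    for k, G in enumerate(cores):
        v = v @ (G[:, 0, :] + x[k] * G[:, 1, :])
    return float(v.sum())  # final TT rank is 1, so v is a scalar

rng = np.random.default_rng(0)
d, r = 6, 3
cores = [rng.normal(size=(1 if k == 0 else r, 2, 1 if k == d - 1 else r)) / r
         for k in range(d)]
print(tt_score(rng.integers(0, 2, size=d).astype(float), cores))
```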
Low-rank Optimization with Convex Constraints
Title | Low-rank Optimization with Convex Constraints |
Authors | Christian Grussler, Anders Rantzer, Pontus Giselsson |
Abstract | The problem of low-rank approximation with convex constraints, which appears in data analysis, system identification, model order reduction, low-order controller design and low-complexity modelling is considered. Given a matrix, the objective is to find a low-rank approximation that meets rank and convex constraints, while minimizing the distance to the matrix in the squared Frobenius norm. In many situations, this non-convex problem is convexified by nuclear norm regularization. However, we will see that the approximations obtained by this method may be far from optimal. In this paper, we propose an alternative convex relaxation that uses the convex envelope of the squared Frobenius norm and the rank constraint. With this approach, easily verifiable conditions are obtained under which the solutions to the convex relaxation and the original non-convex problem coincide. An SDP representation of the convex envelope is derived, which allows us to apply this approach to several known problems. Our example on optimal low-rank Hankel approximation/model reduction illustrates that the proposed convex relaxation performs consistently better than nuclear norm regularization and may outperform balanced truncation. |
Tasks | |
Published | 2016-06-06 |
URL | http://arxiv.org/abs/1606.01793v3 |
http://arxiv.org/pdf/1606.01793v3.pdf | |
PWC | https://paperswithcode.com/paper/low-rank-optimization-with-convex-constraints |
Repo | https://github.com/LowRankOpt/LRINorm |
Framework | none |
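A numpy sketch contrasting the two standard operations discussed above: hard rank truncation (the optimal unconstrained low-rank approximation, by Eckart-Young) versus singular value soft-thresholding, the proximal step behind nuclear norm regularization, which shrinks all singular values and can land far from the best rank-r approximation. The paper's proposed convex envelope is not reproduced here.

```python
import numpy as np

def truncated_svd(M, r):
    # best rank-r approximation in Frobenius norm
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def svt(M, tau):
    # singular value soft-thresholding: prox of the nuclear norm
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 6))
for approx in (truncated_svd(M, 2), svt(M, tau=1.5)):
    print(np.linalg.matrix_rank(approx), np.linalg.norm(M - approx))
```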
Data-driven generation of spatio-temporal routines in human mobility
Title | Data-driven generation of spatio-temporal routines in human mobility |
Authors | Luca Pappalardo, Filippo Simini |
Abstract | The generation of realistic spatio-temporal trajectories of human mobility is of fundamental importance in a wide range of applications, such as the development of protocols for mobile ad-hoc networks or what-if analyses in urban ecosystems. Current generative algorithms fail to accurately reproduce individuals' recurrent schedules while also accounting for the possibility that individuals may break their routine during periods of variable duration. In this article we present DITRAS (DIary-based TRAjectory Simulator), a framework to simulate the spatio-temporal patterns of human mobility. DITRAS operates in two steps: the generation of a mobility diary and the translation of the mobility diary into a mobility trajectory. We propose a data-driven algorithm which constructs a diary generator from real data, capturing the tendency of individuals to follow or break their routine. We also propose a trajectory generator based on the concepts of preferential exploration and preferential return. We instantiate DITRAS with the proposed diary and trajectory generators and compare the resulting algorithm against real data and against synthetic data produced by other generative algorithms, built by instantiating DITRAS with several combinations of diary and trajectory generators. We show that the proposed algorithm reproduces the statistical properties of real trajectories most accurately, taking a step forward in understanding the origin of the spatio-temporal patterns of human mobility. |
Tasks | |
Published | 2016-07-16 |
URL | http://arxiv.org/abs/1607.05952v3 |
http://arxiv.org/pdf/1607.05952v3.pdf | |
PWC | https://paperswithcode.com/paper/data-driven-generation-of-spatio-temporal |
Repo | https://github.com/jonpappalord/DITRAS |
Framework | none |
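A numpy sketch of the trajectory-generator ingredient named above: preferential return (revisit a location with probability proportional to past visits) versus preferential exploration (visit a new location with probability that decays as more distinct places are seen). The diary generator and the spatial choice of *which* new location to visit are omitted; the rho and gamma values are typical figures from the exploration-and-return literature, not fitted parameters.

```python
import numpy as np

def generate_trajectory(n_steps, rho=0.6, gamma=0.21, seed=0):
    rng = np.random.default_rng(seed)
    visits = {0: 1}                               # location -> visit count
    traj = [0]
    for _ in range(n_steps):
        p_new = rho * len(visits) ** (-gamma)     # exploration probability
        if rng.random() < p_new:
            loc = max(visits) + 1                 # explore a new location
        else:                                     # preferential return
            locs, counts = zip(*visits.items())
            loc = int(rng.choice(locs, p=np.array(counts) / sum(counts)))
        visits[loc] = visits.get(loc, 0) + 1
        traj.append(loc)
    return traj

print(generate_trajectory(20))
```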
Safe Exploration in Finite Markov Decision Processes with Gaussian Processes
Title | Safe Exploration in Finite Markov Decision Processes with Gaussian Processes |
Authors | Matteo Turchetta, Felix Berkenkamp, Andreas Krause |
Abstract | In classical reinforcement learning, agents exploring an environment accept arbitrary short-term loss for long-term gain. This is infeasible for safety-critical applications, such as robotics, where even a single unsafe action may cause system failure. In this paper, we address the problem of safely exploring finite Markov decision processes (MDPs). We define safety in terms of an a priori unknown safety constraint that depends on states and actions. We aim to explore the MDP under this constraint, assuming that the unknown function satisfies regularity conditions expressed via a Gaussian process prior. We develop a novel algorithm for this task and prove that it is able to completely explore the safely reachable part of the MDP without violating the safety constraint. To achieve this, it cautiously explores safe states and actions in order to gain statistical confidence about the safety of unvisited state-action pairs from noisy observations collected while navigating the environment. Moreover, the algorithm explicitly considers reachability when exploring the MDP, ensuring that it does not get stuck in any state with no safe way out. We demonstrate our method on digital terrain models for the task of exploring an unknown map with a rover. |
Tasks | Gaussian Processes, Safe Exploration |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.04753v2 |
http://arxiv.org/pdf/1606.04753v2.pdf | |
PWC | https://paperswithcode.com/paper/safe-exploration-in-finite-markov-decision |
Repo | https://github.com/befelix/SafeMDP |
Framework | none |
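A schematic numpy sketch of the safe-set expansion step described above: a state is certified safe only if its GP lower confidence bound clears the threshold *and* it is reachable from, and has a safe way back to, the current safe set. The GP update itself is left abstract, and the one-step adjacency and bounds below are invented for illustration.

```python
import numpy as np

def expand_safe_set(lower_bound, safe, adjacency, h):
    """lower_bound[s]: GP lower confidence bound on the safety function at s;
    safe: boolean array of currently certified states; adjacency[s, t]: True
    if the agent can move s -> t in one step; h: safety threshold."""
    changed = True
    while changed:
        changed = False
        for s in range(len(safe)):
            if safe[s] or lower_bound[s] < h:
                continue
            reachable = adjacency[safe][:, s].any()  # reachable from safe set
            returnable = adjacency[s][safe].any()    # a safe way back exists
            if reachable and returnable:
                safe[s], changed = True, True
    return safe

lb = np.array([1.0, 0.8, 0.2, 0.9])
adj = np.ones((4, 4), dtype=bool)
print(expand_safe_set(lb, np.array([True, False, False, False]), adj, h=0.5))
```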
Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation
Title | Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation |
Authors | Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji |
Abstract | Named Entity Disambiguation (NED) refers to the task of resolving multiple named entity mentions in a document to their correct references in a knowledge base (KB) (e.g., Wikipedia). In this paper, we propose a novel embedding method specifically designed for NED. The proposed method jointly maps words and entities into the same continuous vector space. We extend the skip-gram model by using two models. The KB graph model learns the relatedness of entities using the link structure of the KB, whereas the anchor context model aims to align vectors such that similar words and entities occur close to one another in the vector space by leveraging KB anchors and their context words. By combining contexts based on the proposed embedding with standard NED features, we achieved state-of-the-art accuracy of 93.1% on the standard CoNLL dataset and 85.2% on the TAC 2010 dataset. |
Tasks | Entity Disambiguation, Entity Linking |
Published | 2016-01-06 |
URL | http://arxiv.org/abs/1601.01343v4 |
http://arxiv.org/pdf/1601.01343v4.pdf | |
PWC | https://paperswithcode.com/paper/joint-learning-of-the-embedding-of-words-and |
Repo | https://github.com/wikipedia2vec/wikipedia2vec |
Framework | none |
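A numpy sketch of how a joint word-entity embedding is typically used at disambiguation time (a simplification of the paper's full feature set): represent the mention context by the average of its word vectors, then rank candidate KB entities by cosine similarity in the shared space. The vocabulary and entity names below are hypothetical.

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def rank_candidates(context_words, candidates, word_vecs, entity_vecs):
    ctx = np.mean([word_vecs[w] for w in context_words], axis=0)
    return sorted(candidates, key=lambda e: -cosine(ctx, entity_vecs[e]))

rng = np.random.default_rng(0)
word_vecs = {w: rng.normal(size=16) for w in ["the", "river", "bank"]}
entity_vecs = {e: rng.normal(size=16)
               for e in ["Bank_(finance)", "Bank_(geography)"]}  # made-up KB
print(rank_candidates(["the", "river", "bank"],
                      list(entity_vecs), word_vecs, entity_vecs))
```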
Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond
Title | Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond |
Authors | Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, Bing Xiang |
Abstract | In this work, we model abstractive text summarization using attentional encoder-decoder recurrent neural networks, and show that they achieve state-of-the-art performance on two different corpora. We propose several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling keywords, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time. Our work shows that many of our proposed models contribute further improvements in performance. We also propose a new dataset consisting of multi-sentence summaries, and establish performance benchmarks for further research. |
Tasks | Abstractive Text Summarization, Text Summarization |
Published | 2016-02-19 |
URL | http://arxiv.org/abs/1602.06023v5 |
http://arxiv.org/pdf/1602.06023v5.pdf | |
PWC | https://paperswithcode.com/paper/abstractive-text-summarization-using-sequence |
Repo | https://github.com/yunzhusong/AAAI20-PORLHG |
Framework | none |
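A numpy sketch of the attentional decoding step at the heart of these models: score the encoder states against the decoder state, softmax the scores into attention weights, and build the context vector used to emit the next summary word. The paper's switching/pointer and hierarchical-attention mechanisms are omitted, and the bilinear scoring form is one common choice, not necessarily the paper's.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(enc_states, dec_state, W):
    scores = enc_states @ W @ dec_state  # bilinear attention scores
    weights = softmax(scores)            # one weight per source position
    context = weights @ enc_states       # attention-weighted source summary
    return context, weights

rng = np.random.default_rng(0)
enc = rng.normal(size=(7, 10))           # 7 encoded source tokens
ctx, w = attention_step(enc, rng.normal(size=10), rng.normal(size=(10, 10)))
print(w.round(2), ctx.shape)
```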
Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus
Title | Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus |
Authors | Iulian Vlad Serban, Alberto García-Durán, Caglar Gulcehre, Sungjin Ahn, Sarath Chandar, Aaron Courville, Yoshua Bengio |
Abstract | Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances. However, to date, there are no large-scale question-answer corpora available. In this paper we present the 30M Factoid Question-Answer Corpus, an enormous question-answer pair corpus produced by applying a novel neural network architecture to the Freebase knowledge base to transduce facts into natural-language questions. The produced question-answer pairs are evaluated both by human evaluators and with automatic evaluation metrics, including well-established machine translation and sentence similarity metrics. Across all evaluation criteria the question-generation model outperforms the competing template-based baseline. Furthermore, when presented to human evaluators, the generated questions appear comparable in quality to real human-generated questions. |
Tasks | Machine Translation, Question Generation |
Published | 2016-03-22 |
URL | http://arxiv.org/abs/1603.06807v2 |
http://arxiv.org/pdf/1603.06807v2.pdf | |
PWC | https://paperswithcode.com/paper/generating-factoid-questions-with-recurrent |
Repo | https://github.com/imatge-upc/vqa-2016-cvprw |
Framework | tf |
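A small Python sketch of the data-construction setup, not the neural model: each Freebase fact is a (subject, relationship, object) triple, and the goal is a natural-language question whose answer is the object. The template baseline the paper compares against looks roughly like the following; the templates and example fact here are hypothetical.

```python
# Hypothetical relation-to-question templates (illustration only).
TEMPLATES = {
    "place_of_birth": "where was {subject} born?",
    "profession": "what did {subject} do for a living?",
}

def fact_to_qa(subject, relationship, obj):
    """Turn a KB fact into a (question, answer) pair via a template;
    the paper's neural model replaces this lookup with a seq2seq transducer."""
    question = TEMPLATES[relationship].format(subject=subject)
    return question, obj

print(fact_to_qa("marie curie", "place_of_birth", "warsaw"))
```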