October 15, 2019

Paper Group NANR 249

Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming


Title	Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming
Authors	Fei Wang, James Decker, Xilun Wu, Gregory Essertel, Tiark Rompf
Abstract	Training of deep learning models depends on gradient descent and end-to-end differentiation. Under the slogan of differentiable programming, there is an increasing demand for efficient automatic gradient computation for emerging network architectures that incorporate dynamic control flow, especially in NLP. In this paper we propose an implementation of backpropagation using functions with callbacks, where the forward pass is executed as a sequence of function calls, and the backward pass as a corresponding sequence of function returns. A key realization is that this technique of chaining callbacks is well known in the programming languages community as continuation-passing style (CPS). Any program can be converted to this form using standard techniques, and hence, any program can be mechanically converted to compute gradients. Our approach achieves the same flexibility as other reverse-mode automatic differentiation (AD) techniques, but it can be implemented without any auxiliary data structures besides the function call stack, and it can easily be combined with graph construction and native code generation techniques through forms of multi-stage programming, leading to a highly efficient implementation that combines the performance benefits of define-then-run software frameworks such as TensorFlow with the expressiveness of define-by-run frameworks such as PyTorch.
Tasks	Code Generation, graph construction
Published	2018-12-01
URL	http://papers.nips.cc/paper/8221-backpropagation-with-callbacks-foundations-for-efficient-and-expressive-differentiable-programming
PDF	http://papers.nips.cc/paper/8221-backpropagation-with-callbacks-foundations-for-efficient-and-expressive-differentiable-programming.pdf
PWC	https://paperswithcode.com/paper/backpropagation-with-callbacks-foundations
Repo
Framework
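
The callback idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation; the names `Num`, `add`, `mul`, and `grad` are hypothetical and chosen only for this example. Each operator takes a continuation `k`, runs the rest of the forward pass by calling `k`, and accumulates gradients after `k` returns, so the backward pass is simply the sequence of function returns.

```python
class Num:
    """A value x paired with a mutable gradient accumulator d."""
    def __init__(self, x):
        self.x = x
        self.d = 0.0

def add(a, b, k):
    y = Num(a.x + b.x)
    k(y)               # forward: hand the result to the rest of the program
    a.d += y.d         # backward: executed once the continuation has returned
    b.d += y.d

def mul(a, b, k):
    y = Num(a.x * b.x)
    k(y)
    a.d += b.x * y.d   # d(a*b)/da = b
    b.d += a.x * y.d   # d(a*b)/db = a

def grad(f, x0):
    """Derivative of a CPS-style function f at x0."""
    x = Num(x0)
    def seed(out):
        out.d = 1.0    # the final continuation seeds the output gradient
    f(x, seed)
    return x.d

def f(x, k):
    # f(x) = x * x + x, written in continuation-passing style
    mul(x, x, lambda t: add(t, x, k))

def cube(x, k):
    # cube(x) = x * x * x
    mul(x, x, lambda t: mul(t, x, k))
```

Calling `grad(f, 3.0)` differentiates `x * x + x` at 3. No tape or graph is built in this sketch; the only bookkeeping is the Python call stack.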

Tweety at SemEval-2018 Task 2: Predicting Emojis using Hierarchical Attention Neural Networks and Support Vector Machine


Title	Tweety at SemEval-2018 Task 2: Predicting Emojis using Hierarchical Attention Neural Networks and Support Vector Machine
Authors	Daniel Kopev, Atanas Atanasov, Dimitrina Zlatkova, Momchil Hardalov, Ivan Koychev, Ivelina Nikolova, Galia Angelova
Abstract	We present the system built for SemEval-2018 Task 2 on Emoji Prediction. Although Twitter messages are very short, we managed to design a wide variety of features: textual, semantic, sentiment, emotion-, and color-related ones. We investigated different methods of text preprocessing including replacing text emojis with respective tokens and splitting hashtags to capture more meaning. To represent text we used word n-grams and word embeddings. We experimented with a wide range of classifiers and our best results were achieved using an SVM-based classifier and a Hierarchical Attention Neural Network.
Tasks	Word Embeddings
Published	2018-06-01
URL	https://www.aclweb.org/anthology/S18-1080/
PDF	https://www.aclweb.org/anthology/S18-1080
PWC	https://paperswithcode.com/paper/tweety-at-semeval-2018-task-2-predicting
Repo
Framework

GATED FAST WEIGHTS FOR ASSOCIATIVE RETRIEVAL


Title	GATED FAST WEIGHTS FOR ASSOCIATIVE RETRIEVAL
Authors	Imanol Schlag, Jürgen Schmidhuber
Abstract	We improve previous end-to-end differentiable neural networks (NNs) with fast weight memories. A gate mechanism updates fast weights at every time step of a sequence through two separate outer-product-based matrices generated by slow parts of the net. The system is trained on a complex sequence to sequence variation of the Associative Retrieval Problem with roughly 70 times more temporal memory (i.e. time-varying variables) than similar-sized standard recurrent NNs (RNNs). In terms of accuracy and number of parameters, our architecture outperforms a variety of RNNs, including Long Short-Term Memory, Hypernetworks, and related fast weight architectures.
Tasks
Published	2018-01-01
URL	https://openreview.net/forum?id=HJ8W1Q-0Z
PDF	https://openreview.net/pdf?id=HJ8W1Q-0Z
PWC	https://paperswithcode.com/paper/gated-fast-weights-for-associative-retrieval
Repo
Framework
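
The abstract above mentions a gate that updates fast weights through two outer-product-based matrices. The sketch below shows one plausible reading of that sentence, not the paper's exact equations: the update rule `F <- G * H + (1 - G) * F`, the `tanh` and sigmoid squashing, and the function name `gated_update` are all assumptions made for illustration. The four input vectors stand in for outputs of the slow network.

```python
import math

def outer(u, v):
    """Outer product of two vectors as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_update(F, h_u, h_v, g_u, g_v):
    """Assumed rule: F <- G * H + (1 - G) * F, applied elementwise.

    H is a candidate matrix and G a gate matrix, each built as an
    outer product of two vectors produced by the slow network.
    """
    H = outer([math.tanh(a) for a in h_u], [math.tanh(b) for b in h_v])
    G = outer([sigmoid(a) for a in g_u], [sigmoid(b) for b in g_v])
    return [
        [G[i][j] * H[i][j] + (1.0 - G[i][j]) * F[i][j] for j in range(len(F[0]))]
        for i in range(len(F))
    ]

# One step on a 2x2 fast weight matrix with all-zero slow-network outputs.
F0 = [[1.0, 1.0], [1.0, 1.0]]
F1 = gated_update(F0, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

With zero inputs the candidate matrix is zero and every gate entry is 0.25, so each fast weight decays from 1.0 to 0.75.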

A Multi-task Approach to Learning Multilingual Representations


Title	A Multi-task Approach to Learning Multilingual Representations
Authors	Karan Singla, Dogan Can, Shrikanth Narayanan
Abstract	We present a novel multi-task modeling approach to learning multilingual distributed representations of text. Our system learns word and sentence embeddings jointly by training a multilingual skip-gram model together with a cross-lingual sentence similarity model. Our architecture can transparently use both monolingual and sentence aligned bilingual corpora to learn multilingual embeddings, thus covering a vocabulary significantly larger than the vocabulary of the bilingual corpora alone. Our model shows competitive performance in a standard cross-lingual document classification task. We also show the effectiveness of our method in a limited resource scenario.
Tasks	Cross-Lingual Document Classification, Document Classification, Sentence Embeddings, Word Embeddings
Published	2018-07-01
URL	https://www.aclweb.org/anthology/P18-2035/
PDF	https://www.aclweb.org/anthology/P18-2035
PWC	https://paperswithcode.com/paper/a-multi-task-approach-to-learning
Repo
Framework

Multilingual Seq2seq Training with Similarity Loss for Cross-Lingual Document Classification


Title	Multilingual Seq2seq Training with Similarity Loss for Cross-Lingual Document Classification
Authors	Katherine Yu, Haoran Li, Barlas Oguz
Abstract	In this paper we continue experiments where neural machine translation training is used to produce joint cross-lingual fixed-dimensional sentence embeddings. In this framework we introduce a simple method of adding a loss to the learning objective which penalizes distance between representations of bilingually aligned sentences. We evaluate cross-lingual transfer using two approaches, cross-lingual similarity search on an aligned corpus (Europarl) and cross-lingual document classification on a recently published benchmark Reuters corpus, and we find the similarity loss significantly improves performance on both. Furthermore, we notice that while our Reuters results are very competitive, our English results are not as competitive, showing room for improvement in the current cross-lingual state-of-the-art. Our results are based on a set of 6 European languages.
Tasks	Cross-Lingual Document Classification, Cross-Lingual Transfer, Document Classification, Machine Translation, Representation Learning, Sentence Embedding, Sentence Embeddings, Word Embeddings
Published	2018-07-01
URL	https://www.aclweb.org/anthology/W18-3023/
PDF	https://www.aclweb.org/anthology/W18-3023
PWC	https://paperswithcode.com/paper/multilingual-seq2seq-training-with-similarity
Repo
Framework
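
The abstract above describes adding a loss that penalizes distance between representations of aligned sentences. A minimal sketch of such a term follows. The abstract does not name the distance, so squared Euclidean distance is an assumption here, as are the function names and the `weight` parameter.

```python
def similarity_loss(src_embs, tgt_embs):
    """Mean squared Euclidean distance between aligned sentence embeddings.

    src_embs[i] and tgt_embs[i] are the embeddings of a sentence and its
    translation; both arguments are lists of equal-length float vectors.
    """
    if len(src_embs) != len(tgt_embs):
        raise ValueError("aligned batches must have the same length")
    total = 0.0
    for u, v in zip(src_embs, tgt_embs):
        total += sum((a - b) ** 2 for a, b in zip(u, v))
    return total / len(src_embs)

def combined_loss(translation_loss, src_embs, tgt_embs, weight=1.0):
    """Translation objective plus a weighted similarity penalty (hypothetical weighting)."""
    return translation_loss + weight * similarity_loss(src_embs, tgt_embs)

src = [[1.0, 2.0], [0.0, 0.0]]
tgt = [[1.0, 0.0], [3.0, 4.0]]
penalty = similarity_loss(src, tgt)            # (4 + 25) / 2
total = combined_loss(2.0, src, tgt, weight=0.5)
```

In a real training setup the penalty would be computed on encoder outputs inside the model's framework so that gradients flow through it; plain lists are used here only to keep the example self-contained.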

Demo2Vec: Reasoning Object Affordances From Online Videos


Title	Demo2Vec: Reasoning Object Affordances From Online Videos
Authors	Kuan Fang, Te-Lin Wu, Daniel Yang, Silvio Savarese, Joseph J. Lim
Abstract	Watching expert demonstrations is an important way for humans and robots to reason about affordances of unseen objects. In this paper, we consider the problem of reasoning object affordances through the feature embedding of demonstration videos. We design the Demo2Vec model which learns to extract embedded vectors of demonstration videos and predicts the interaction region and the action label on a target image of the same object. We introduce the Online Product Review dataset for Affordance (OPRA) by collecting and labeling diverse YouTube product review videos. Our Demo2Vec model outperforms various recurrent neural network baselines on the collected dataset.
Tasks
Published	2018-06-01
URL	http://openaccess.thecvf.com/content_cvpr_2018/html/Fang_Demo2Vec_Reasoning_Object_CVPR_2018_paper.html
PDF	http://openaccess.thecvf.com/content_cvpr_2018/papers/Fang_Demo2Vec_Reasoning_Object_CVPR_2018_paper.pdf
PWC	https://paperswithcode.com/paper/demo2vec-reasoning-object-affordances-from
Repo
Framework

Lyapunov Functions for First-Order Methods: Tight Automated Convergence Guarantees


Title	Lyapunov Functions for First-Order Methods: Tight Automated Convergence Guarantees
Authors	Adrien Taylor, Bryan Van Scoy, Laurent Lessard
Abstract	We present a novel way of generating Lyapunov functions for proving linear convergence rates of first-order optimization methods. Our approach provably obtains the fastest linear convergence rate that can be verified by a quadratic Lyapunov function (with given states), and only relies on solving a small-sized semidefinite program. Our approach combines the advantages of performance estimation problems (PEP, due to Drori and Teboulle (2014)) and integral quadratic constraints (IQC, due to Lessard et al. (2016)), and relies on convex interpolation (due to Taylor et al. (2017c;b)).
Tasks
Published	2018-07-01
URL	https://icml.cc/Conferences/2018/Schedule?showEvent=2168
PDF	http://proceedings.mlr.press/v80/taylor18a/taylor18a.pdf
PWC	https://paperswithcode.com/paper/lyapunov-functions-for-first-order-methods
Repo
Framework

Crowdsourced Multimodal Corpora Collection Tool


Title	Crowdsourced Multimodal Corpora Collection Tool
Authors	Patrik Jonell, Catharine Oertel, Dimosthenis Kontogiorgos, Jonas Beskow, Joakim Gustafson
Abstract
Tasks
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1117/
PDF	https://www.aclweb.org/anthology/L18-1117
PWC	https://paperswithcode.com/paper/crowdsourced-multimodal-corpora-collection
Repo
Framework

Nonparametric Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information


Title	Nonparametric Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information
Authors	Yichong Xu, Hariank Muthakana, Sivaraman Balakrishnan, Aarti Singh, Artur Dubrawski
Abstract	In supervised learning, we leverage a labeled dataset to design methods for function estimation. In many practical situations, we are able to obtain alternative feedback, possibly at a low cost. A broad goal is to understand the usefulness of, and to design algorithms to exploit, this alternative feedback. We focus on a semi-supervised setting where we obtain additional ordinal (or comparison) information for potentially unlabeled samples. We consider ordinal feedback of varying qualities where we have either a perfect ordering of the samples, a noisy ordering of the samples or noisy pairwise comparisons between the samples. We provide a precise quantification of the usefulness of these types of ordinal feedback in non-parametric regression, showing that in many cases it is possible to accurately estimate an underlying function with a very small labeled set, effectively escaping the curse of dimensionality. We develop an algorithm called Ranking-Regression (RR) and analyze its accuracy as a function of size of the labeled and unlabeled datasets and various noise parameters. We also present lower bounds, that establish fundamental limits for the task and show that RR is optimal in a variety of settings. Finally, we present experiments that show the efficacy of RR and investigate its robustness to various sources of noise and model-misspecification.
Tasks
Published	2018-07-01
URL	https://icml.cc/Conferences/2018/Schedule?showEvent=2095
PDF	http://proceedings.mlr.press/v80/xu18e/xu18e.pdf
PWC	https://paperswithcode.com/paper/nonparametric-regression-with-comparisons-1
Repo
Framework

Streaming word similarity mining on the cheap


Title	Streaming word similarity mining on the cheap
Authors	Olof Görnerup, Daniel Gillblad
Abstract	Accurately and efficiently estimating word similarities from text is fundamental in natural language processing. In this paper, we propose a fast and lightweight method for estimating similarities from streams by explicitly counting second-order co-occurrences. The method rests on the observation that words that are highly correlated with respect to such counts are also highly similar with respect to first-order co-occurrences. Using buffers of co-occurred words per word to count second-order co-occurrences, we can then estimate similarities in a single pass over data without having to do prohibitively expensive similarity calculations. We demonstrate that this approach is scalable, converges rapidly, behaves robustly under parameter changes, and that it captures word similarities on par with those given by state-of-the-art word embeddings.
Tasks	Document Classification, Word Alignment, Word Embeddings
Published	2018-10-01
URL	https://www.aclweb.org/anthology/D18-1172/
PDF	https://www.aclweb.org/anthology/D18-1172
PWC	https://paperswithcode.com/paper/streaming-word-similarity-mining-on-the-cheap
Repo
Framework

A Named Entity Recognition Shootout for German


Title	A Named Entity Recognition Shootout for German
Authors	Martin Riedl, Sebastian Padó
Abstract	We ask how to practically build a model for German named entity recognition (NER) that performs at the state of the art for both contemporary and historical texts, i.e., a big-data and a small-data scenario. The two best-performing model families are pitted against each other (linear-chain CRFs and BiLSTM) to observe the trade-off between expressiveness and data requirements. BiLSTM outperforms the CRF when large datasets are available but performs worse on the smallest dataset. BiLSTMs profit substantially from transfer learning, which enables them to be trained on multiple corpora, resulting in a new state-of-the-art model for German NER on two contemporary German corpora (CoNLL 2003 and GermEval 2014) and two historic corpora.
Tasks	Entity Linking, Named Entity Recognition, Question Answering, Representation Learning, Transfer Learning
Published	2018-07-01
URL	https://www.aclweb.org/anthology/P18-2020/
PDF	https://www.aclweb.org/anthology/P18-2020
PWC	https://paperswithcode.com/paper/a-named-entity-recognition-shootout-for
Repo
Framework

Binary Partitions with Approximate Minimum Impurity


Title	Binary Partitions with Approximate Minimum Impurity
Authors	Eduardo Laber, Marco Molinaro, Felipe Mello Pereira
Abstract	The problem of splitting attributes is one of the main steps in the construction of decision trees. In order to decide the best split, impurity measures such as Entropy and Gini are widely used. In practice, decision-tree inducers use heuristics for finding splits with small impurity when they consider nominal attributes with a large number of distinct values. However, there are no known guarantees for the quality of the splits obtained by these heuristics. To fill this gap, we propose two new splitting procedures that provably achieve near-optimal impurity. We also report experiments that provide evidence that the proposed methods are interesting candidates to be employed in splitting nominal attributes with many values during decision tree/random forest induction.
Tasks
Published	2018-07-01
URL	https://icml.cc/Conferences/2018/Schedule?showEvent=1929
PDF	http://proceedings.mlr.press/v80/laber18a/laber18a.pdf
PWC	https://paperswithcode.com/paper/binary-partitions-with-approximate-minimum
Repo
Framework
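
To make the quantity in the abstract above concrete, the sketch below scores one candidate binary partition of a nominal attribute by its weighted Gini impurity. It shows only how a split is evaluated, not the paper's splitting procedures. The function names and the normalization by the total number of rows are assumptions made for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(rows, left_values):
    """Weighted Gini impurity of a binary partition.

    rows: list of (attribute_value, class_label) pairs.
    left_values: set of attribute values sent to the left branch;
    every other value goes to the right branch.
    """
    left = [y for v, y in rows if v in left_values]
    right = [y for v, y in rows if v not in left_values]
    n = len(rows)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

rows = [("a", "x"), ("a", "x"), ("b", "y"), ("c", "y")]
pure = split_impurity(rows, {"a"})          # both sides are single-class
mixed = split_impurity(rows, {"a", "b"})    # left side mixes x and y
```

An attribute with k distinct values admits 2^(k-1) - 1 distinct binary partitions, which is why exhaustive scoring becomes impractical for attributes with many values.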

Annotating Chinese Light Verb Constructions according to PARSEME guidelines


Title	Annotating Chinese Light Verb Constructions according to PARSEME guidelines
Authors	Menghan Jiang, Natalia Klyueva, Hongzhi Xu, Chu-Ren Huang
Abstract
Tasks	Machine Translation
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1394/
PDF	https://www.aclweb.org/anthology/L18-1394
PWC	https://paperswithcode.com/paper/annotating-chinese-light-verb-constructions
Repo
Framework

NORMA: Neighborhood Sensitive Maps for Multilingual Word Embeddings


Title	NORMA: Neighborhood Sensitive Maps for Multilingual Word Embeddings
Authors	Ndapa Nakashole
Abstract	Inducing multilingual word embeddings by learning a linear map between embedding spaces of different languages achieves remarkable accuracy on related languages. However, accuracy drops substantially when translating between distant languages. Given that languages exhibit differences in vocabulary, grammar, written form, or syntax, one would expect that embedding spaces of different languages have different structures especially for distant languages. With the goal of capturing such differences, we propose a method for learning neighborhood sensitive maps, NORMA. Our experiments show that NORMA outperforms current state-of-the-art methods for word translation between distant languages.
Tasks	Machine Translation, Multilingual Word Embeddings, Word Embeddings
Published	2018-10-01
URL	https://www.aclweb.org/anthology/D18-1047/
PDF	https://www.aclweb.org/anthology/D18-1047
PWC	https://paperswithcode.com/paper/norma-neighborhood-sensitive-maps-for
Repo
Framework

The Context-Aware Learner


Title	The Context-Aware Learner
Authors	Conor Durkan, Amos Storkey, Harrison Edwards
Abstract	One important aspect of generalization in machine learning involves reasoning about previously seen data in new settings. Such reasoning requires learning disentangled representations of data which are interpretable in isolation, but can also be combined in a new, unseen scenario. To this end, we introduce the context-aware learner, a model based on the variational autoencoding framework, which can learn such representations across data sets exhibiting a number of distinct contexts. Moreover, it is successfully able to combine these representations to generate data not seen at training time. The model enjoys an exponential increase in representational ability for a linear increase in context count. We demonstrate that the theory readily extends to a meta-learning setting such as this, and describe a fully unsupervised model in complete generality. Finally, we validate our approach using an adaptation with weak supervision.
Tasks	Meta-Learning
Published	2018-01-01
URL	https://openreview.net/forum?id=BJRxfZbAW
PDF	https://openreview.net/pdf?id=BJRxfZbAW
PWC	https://paperswithcode.com/paper/the-context-aware-learner
Repo
Framework