Paper Group NANR 249
Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming
Title | Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming |
Authors | Fei Wang, James Decker, Xilun Wu, Gregory Essertel, Tiark Rompf |
Abstract | Training of deep learning models depends on gradient descent and end-to-end differentiation. Under the slogan of differentiable programming, there is an increasing demand for efficient automatic gradient computation for emerging network architectures that incorporate dynamic control flow, especially in NLP. In this paper we propose an implementation of backpropagation using functions with callbacks, where the forward pass is executed as a sequence of function calls, and the backward pass as a corresponding sequence of function returns. A key realization is that this technique of chaining callbacks is well known in the programming languages community as continuation-passing style (CPS). Any program can be converted to this form using standard techniques, and hence, any program can be mechanically converted to compute gradients. Our approach achieves the same flexibility as other reverse-mode automatic differentiation (AD) techniques, but it can be implemented without any auxiliary data structures besides the function call stack, and it can easily be combined with graph construction and native code generation techniques through forms of multi-stage programming, leading to a highly efficient implementation that combines the performance benefits of define-then-run software frameworks such as TensorFlow with the expressiveness of define-by-run frameworks such as PyTorch. |
Tasks | Code Generation, graph construction |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/8221-backpropagation-with-callbacks-foundations-for-efficient-and-expressive-differentiable-programming |
http://papers.nips.cc/paper/8221-backpropagation-with-callbacks-foundations-for-efficient-and-expressive-differentiable-programming.pdf | |
PWC | https://paperswithcode.com/paper/backpropagation-with-callbacks-foundations |
Repo | |
Framework | |
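The callback idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Num`, `add`, `mul`, and `grad` are hypothetical, and only scalar addition and multiplication are covered. Each operator computes its result, calls the continuation `k` with it (the forward pass), and updates the input gradients once `k` returns (the backward pass), so the call stack is the only bookkeeping structure.

```python
class Num:
    """A scalar value paired with its accumulated gradient (hypothetical helper)."""

    def __init__(self, x):
        self.x = x    # forward value
        self.d = 0.0  # gradient of the final result with respect to this value


def add(a, b, k):
    y = Num(a.x + b.x)
    k(y)          # forward pass: run the rest of the program on y
    a.d += y.d    # backward pass: executes after the callback has returned
    b.d += y.d


def mul(a, b, k):
    y = Num(a.x * b.x)
    k(y)
    a.d += b.x * y.d
    b.d += a.x * y.d


def grad(f, x0):
    """Differentiate a CPS function f(x, k) at x0."""
    x = Num(x0)

    def seed(y):
        # Final continuation: seed the output gradient with 1.
        y.d = 1.0

    f(x, seed)
    return x.d


def f(x, k):
    # f(x) = x * x + x, written in continuation-passing style.
    mul(x, x, lambda t: add(t, x, k))


def cube(x, k):
    # cube(x) = x * x * x, written in continuation-passing style.
    mul(x, x, lambda t: mul(t, x, k))


if __name__ == "__main__":
    print(grad(f, 3.0))     # derivative 2x + 1 evaluated at x = 3
    print(grad(cube, 2.0))  # derivative 3x^2 evaluated at x = 2
```

Writing `f` by hand in continuation-passing style is what the abstract says can be automated with standard program transformations; the sketch only shows why function returns line up with the backward pass.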
Tweety at SemEval-2018 Task 2: Predicting Emojis using Hierarchical Attention Neural Networks and Support Vector Machine
Title | Tweety at SemEval-2018 Task 2: Predicting Emojis using Hierarchical Attention Neural Networks and Support Vector Machine |
Authors | Daniel Kopev, Atanas Atanasov, Dimitrina Zlatkova, Momchil Hardalov, Ivan Koychev, Ivelina Nikolova, Galia Angelova |
Abstract | We present the system built for SemEval-2018 Task 2 on Emoji Prediction. Although Twitter messages are very short, we managed to design a wide variety of features: textual, semantic, sentiment, emotion-, and color-related ones. We investigated different methods of text preprocessing, including replacing text emojis with respective tokens and splitting hashtags to capture more meaning. To represent text, we used word n-grams and word embeddings. We experimented with a wide range of classifiers, and our best results were achieved using an SVM-based classifier and a Hierarchical Attention Neural Network. |
Tasks | Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1080/ |
https://www.aclweb.org/anthology/S18-1080 | |
PWC | https://paperswithcode.com/paper/tweety-at-semeval-2018-task-2-predicting |
Repo | |
Framework | |
GATED FAST WEIGHTS FOR ASSOCIATIVE RETRIEVAL
Title | GATED FAST WEIGHTS FOR ASSOCIATIVE RETRIEVAL |
Authors | Imanol Schlag, Jürgen Schmidhuber |
Abstract | We improve previous end-to-end differentiable neural networks (NNs) with fast weight memories. A gate mechanism updates fast weights at every time step of a sequence through two separate outer-product-based matrices generated by slow parts of the net. The system is trained on a complex sequence to sequence variation of the Associative Retrieval Problem with roughly 70 times more temporal memory (i.e. time-varying variables) than similar-sized standard recurrent NNs (RNNs). In terms of accuracy and number of parameters, our architecture outperforms a variety of RNNs, including Long Short-Term Memory, Hypernetworks, and related fast weight architectures. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=HJ8W1Q-0Z |
https://openreview.net/pdf?id=HJ8W1Q-0Z | |
PWC | https://paperswithcode.com/paper/gated-fast-weights-for-associative-retrieval |
Repo | |
Framework | |
A Multi-task Approach to Learning Multilingual Representations
Title | A Multi-task Approach to Learning Multilingual Representations |
Authors | Karan Singla, Dogan Can, Shrikanth Narayanan |
Abstract | We present a novel multi-task modeling approach to learning multilingual distributed representations of text. Our system learns word and sentence embeddings jointly by training a multilingual skip-gram model together with a cross-lingual sentence similarity model. Our architecture can transparently use both monolingual and sentence aligned bilingual corpora to learn multilingual embeddings, thus covering a vocabulary significantly larger than the vocabulary of the bilingual corpora alone. Our model shows competitive performance in a standard cross-lingual document classification task. We also show the effectiveness of our method in a limited resource scenario. |
Tasks | Cross-Lingual Document Classification, Document Classification, Sentence Embeddings, Word Embeddings |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-2035/ |
https://www.aclweb.org/anthology/P18-2035 | |
PWC | https://paperswithcode.com/paper/a-multi-task-approach-to-learning |
Repo | |
Framework | |
Multilingual Seq2seq Training with Similarity Loss for Cross-Lingual Document Classification
Title | Multilingual Seq2seq Training with Similarity Loss for Cross-Lingual Document Classification |
Authors | Katherine Yu, Haoran Li, Barlas Oguz |
Abstract | In this paper we continue experiments where neural machine translation training is used to produce joint cross-lingual fixed-dimensional sentence embeddings. In this framework we introduce a simple method of adding a loss to the learning objective which penalizes distance between representations of bilingually aligned sentences. We evaluate cross-lingual transfer using two approaches, cross-lingual similarity search on an aligned corpus (Europarl) and cross-lingual document classification on a recently published benchmark Reuters corpus, and we find the similarity loss significantly improves performance on both. Furthermore, we notice that while our Reuters results are very competitive, our English results are not as competitive, showing room for improvement in the current cross-lingual state-of-the-art. Our results are based on a set of 6 European languages. |
Tasks | Cross-Lingual Document Classification, Cross-Lingual Transfer, Document Classification, Machine Translation, Representation Learning, Sentence Embedding, Sentence Embeddings, Word Embeddings |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-3023/ |
https://www.aclweb.org/anthology/W18-3023 | |
PWC | https://paperswithcode.com/paper/multilingual-seq2seq-training-with-similarity |
Repo | |
Framework | |
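The similarity loss described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the paper's formulation: the squared Euclidean distance, the mean over the batch, the `weight` parameter, and the function names are all hypothetical choices, since the abstract only says that the added loss penalizes distance between representations of aligned sentences.

```python
def similarity_loss(src_embs, tgt_embs):
    """Mean squared Euclidean distance between aligned sentence embeddings.

    src_embs[i] and tgt_embs[i] are assumed to be fixed-dimensional
    embeddings of the i-th bilingually aligned sentence pair.
    """
    if len(src_embs) != len(tgt_embs):
        raise ValueError("aligned batches must have the same length")
    if not src_embs:
        raise ValueError("batch must not be empty")
    total = 0.0
    for s, t in zip(src_embs, tgt_embs):
        total += sum((a - b) ** 2 for a, b in zip(s, t))
    return total / len(src_embs)


def combined_objective(translation_loss, src_embs, tgt_embs, weight=1.0):
    """Add the weighted similarity penalty to a given translation loss."""
    return translation_loss + weight * similarity_loss(src_embs, tgt_embs)


if __name__ == "__main__":
    src = [[1.0, 2.0], [0.0, 0.0]]
    tgt = [[1.0, 2.0], [3.0, 4.0]]
    print(similarity_loss(src, tgt))
    print(combined_objective(2.0, src, tgt, weight=0.1))
```

In a real training setup the penalty would be computed on differentiable tensors so that it contributes gradients to the encoder; plain lists are used here only to keep the sketch self-contained.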
Demo2Vec: Reasoning Object Affordances From Online Videos
Title | Demo2Vec: Reasoning Object Affordances From Online Videos |
Authors | Kuan Fang, Te-Lin Wu, Daniel Yang, Silvio Savarese, Joseph J. Lim |
Abstract | Watching expert demonstrations is an important way for humans and robots to reason about affordances of unseen objects. In this paper, we consider the problem of reasoning object affordances through the feature embedding of demonstration videos. We design the Demo2Vec model which learns to extract embedded vectors of demonstration videos and predicts the interaction region and the action label on a target image of the same object. We introduce the Online Product Review dataset for Affordance (OPRA) by collecting and labeling diverse YouTube product review videos. Our Demo2Vec model outperforms various recurrent neural network baselines on the collected dataset. |
Tasks | |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Fang_Demo2Vec_Reasoning_Object_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Fang_Demo2Vec_Reasoning_Object_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/demo2vec-reasoning-object-affordances-from |
Repo | |
Framework | |
Lyapunov Functions for First-Order Methods: Tight Automated Convergence Guarantees
Title | Lyapunov Functions for First-Order Methods: Tight Automated Convergence Guarantees |
Authors | Adrien Taylor, Bryan Van Scoy, Laurent Lessard |
Abstract | We present a novel way of generating Lyapunov functions for proving linear convergence rates of first-order optimization methods. Our approach provably obtains the fastest linear convergence rate that can be verified by a quadratic Lyapunov function (with given states), and only relies on solving a small-sized semidefinite program. Our approach combines the advantages of performance estimation problems (PEP, due to Drori and Teboulle (2014)) and integral quadratic constraints (IQC, due to Lessard et al. (2016)), and relies on convex interpolation (due to Taylor et al. (2017c;b)). |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2168 |
http://proceedings.mlr.press/v80/taylor18a/taylor18a.pdf | |
PWC | https://paperswithcode.com/paper/lyapunov-functions-for-first-order-methods |
Repo | |
Framework | |
Crowdsourced Multimodal Corpora Collection Tool
Title | Crowdsourced Multimodal Corpora Collection Tool |
Authors | Patrik Jonell, Catharine Oertel, Dimosthenis Kontogiorgos, Jonas Beskow, Joakim Gustafson |
Abstract | |
Tasks | |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1117/ |
https://www.aclweb.org/anthology/L18-1117 | |
PWC | https://paperswithcode.com/paper/crowdsourced-multimodal-corpora-collection |
Repo | |
Framework | |
Nonparametric Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information
Title | Nonparametric Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information |
Authors | Yichong Xu, Hariank Muthakana, Sivaraman Balakrishnan, Aarti Singh, Artur Dubrawski |
Abstract | In supervised learning, we leverage a labeled dataset to design methods for function estimation. In many practical situations, we are able to obtain alternative feedback, possibly at a low cost. A broad goal is to understand the usefulness of, and to design algorithms to exploit, this alternative feedback. We focus on a semi-supervised setting where we obtain additional ordinal (or comparison) information for potentially unlabeled samples. We consider ordinal feedback of varying qualities where we have either a perfect ordering of the samples, a noisy ordering of the samples or noisy pairwise comparisons between the samples. We provide a precise quantification of the usefulness of these types of ordinal feedback in non-parametric regression, showing that in many cases it is possible to accurately estimate an underlying function with a very small labeled set, effectively escaping the curse of dimensionality. We develop an algorithm called Ranking-Regression (RR) and analyze its accuracy as a function of the size of the labeled and unlabeled datasets and various noise parameters. We also present lower bounds that establish fundamental limits for the task and show that RR is optimal in a variety of settings. Finally, we present experiments that show the efficacy of RR and investigate its robustness to various sources of noise and model-misspecification. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2095 |
http://proceedings.mlr.press/v80/xu18e/xu18e.pdf | |
PWC | https://paperswithcode.com/paper/nonparametric-regression-with-comparisons-1 |
Repo | |
Framework | |
Streaming word similarity mining on the cheap
Title | Streaming word similarity mining on the cheap |
Authors | Olof Görnerup, Daniel Gillblad |
Abstract | Accurately and efficiently estimating word similarities from text is fundamental in natural language processing. In this paper, we propose a fast and lightweight method for estimating similarities from streams by explicitly counting second-order co-occurrences. The method rests on the observation that words that are highly correlated with respect to such counts are also highly similar with respect to first-order co-occurrences. Using buffers of co-occurred words per word to count second-order co-occurrences, we can then estimate similarities in a single pass over data without having to do prohibitively expensive similarity calculations. We demonstrate that this approach is scalable, converges rapidly, behaves robustly under parameter changes, and that it captures word similarities on par with those given by state-of-the-art word embeddings. |
Tasks | Document Classification, Word Alignment, Word Embeddings |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/D18-1172/ |
https://www.aclweb.org/anthology/D18-1172 | |
PWC | https://paperswithcode.com/paper/streaming-word-similarity-mining-on-the-cheap |
Repo | |
Framework | |
A Named Entity Recognition Shootout for German
Title | A Named Entity Recognition Shootout for German |
Authors | Martin Riedl, Sebastian Padó |
Abstract | We ask how to practically build a model for German named entity recognition (NER) that performs at the state of the art for both contemporary and historical texts, i.e., a big-data and a small-data scenario. The two best-performing model families are pitted against each other (linear-chain CRFs and BiLSTM) to observe the trade-off between expressiveness and data requirements. BiLSTM outperforms the CRF when large datasets are available but performs worse on the smallest dataset. BiLSTMs profit substantially from transfer learning, which enables them to be trained on multiple corpora, resulting in a new state-of-the-art model for German NER on two contemporary German corpora (CoNLL 2003 and GermEval 2014) and two historic corpora. |
Tasks | Entity Linking, Named Entity Recognition, Question Answering, Representation Learning, Transfer Learning |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-2020/ |
https://www.aclweb.org/anthology/P18-2020 | |
PWC | https://paperswithcode.com/paper/a-named-entity-recognition-shootout-for |
Repo | |
Framework | |
Binary Partitions with Approximate Minimum Impurity
Title | Binary Partitions with Approximate Minimum Impurity |
Authors | Eduardo Laber, Marco Molinaro, Felipe Mello Pereira |
Abstract | The problem of splitting attributes is one of the main steps in the construction of decision trees. In order to decide the best split, impurity measures such as Entropy and Gini are widely used. In practice, decision-tree inducers use heuristics for finding splits with small impurity when they consider nominal attributes with a large number of distinct values. However, there are no known guarantees for the quality of the splits obtained by these heuristics. To fill this gap, we propose two new splitting procedures that provably achieve near-optimal impurity. We also report experiments that provide evidence that the proposed methods are interesting candidates to be employed in splitting nominal attributes with many values during decision tree/random forest induction. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=1929 |
http://proceedings.mlr.press/v80/laber18a/laber18a.pdf | |
PWC | https://paperswithcode.com/paper/binary-partitions-with-approximate-minimum |
Repo | |
Framework | |
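To make the splitting problem in the abstract above concrete, the sketch below scores a binary partition of a nominal attribute's values by its size-weighted impurity and finds the best partition by exhaustive search. It is a brute-force baseline for illustration, not one of the paper's two procedures. The function names, the `value_counts` layout (attribute value mapped to per-class counts), and the size-weighted averaging are assumptions. The search visits every nonempty proper subset, which grows exponentially with the number of distinct values and is why heuristics are used in practice.

```python
from itertools import combinations
from math import log2


def gini(counts):
    """Gini impurity of a vector of class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy (in bits) of a vector of class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)


def split_impurity(value_counts, left_values, impurity=gini):
    """Size-weighted impurity of the partition (left_values, remaining values)."""
    num_classes = len(next(iter(value_counts.values())))
    left = [0] * num_classes
    right = [0] * num_classes
    for value, counts in value_counts.items():
        side = left if value in left_values else right
        for i, c in enumerate(counts):
            side[i] += c
    n = sum(left) + sum(right)
    return (sum(left) * impurity(left) + sum(right) * impurity(right)) / n


def best_split_brute_force(value_counts, impurity=gini):
    """Try every nonempty proper subset as the left side; return (score, left set)."""
    values = sorted(value_counts)
    best = None
    for r in range(1, len(values)):
        for left in combinations(values, r):
            score = split_impurity(value_counts, set(left), impurity)
            if best is None or score < best[0]:
                best = (score, set(left))
    return best


if __name__ == "__main__":
    # Hypothetical data: three attribute values, two classes.
    counts = {"a": [8, 0], "b": [0, 6], "c": [7, 1]}
    print(best_split_brute_force(counts))
    print(best_split_brute_force(counts, impurity=entropy))
```

Each partition is scored twice here (once per choice of left side); fixing one value on the left would halve the work without changing the result.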
Annotating Chinese Light Verb Constructions according to PARSEME guidelines
Title | Annotating Chinese Light Verb Constructions according to PARSEME guidelines |
Authors | Menghan Jiang, Natalia Klyueva, Hongzhi Xu, Chu-Ren Huang |
Abstract | |
Tasks | Machine Translation |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1394/ |
https://www.aclweb.org/anthology/L18-1394 | |
PWC | https://paperswithcode.com/paper/annotating-chinese-light-verb-constructions |
Repo | |
Framework | |
NORMA: Neighborhood Sensitive Maps for Multilingual Word Embeddings
Title | NORMA: Neighborhood Sensitive Maps for Multilingual Word Embeddings |
Authors | Ndapa Nakashole |
Abstract | Inducing multilingual word embeddings by learning a linear map between embedding spaces of different languages achieves remarkable accuracy on related languages. However, accuracy drops substantially when translating between distant languages. Given that languages exhibit differences in vocabulary, grammar, written form, or syntax, one would expect that embedding spaces of different languages have different structures especially for distant languages. With the goal of capturing such differences, we propose a method for learning neighborhood sensitive maps, NORMA. Our experiments show that NORMA outperforms current state-of-the-art methods for word translation between distant languages. |
Tasks | Machine Translation, Multilingual Word Embeddings, Word Embeddings |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/D18-1047/ |
https://www.aclweb.org/anthology/D18-1047 | |
PWC | https://paperswithcode.com/paper/norma-neighborhood-sensitive-maps-for |
Repo | |
Framework | |
The Context-Aware Learner
Title | The Context-Aware Learner |
Authors | Conor Durkan, Amos Storkey, Harrison Edwards |
Abstract | One important aspect of generalization in machine learning involves reasoning about previously seen data in new settings. Such reasoning requires learning disentangled representations of data which are interpretable in isolation, but can also be combined in a new, unseen scenario. To this end, we introduce the context-aware learner, a model based on the variational autoencoding framework, which can learn such representations across data sets exhibiting a number of distinct contexts. Moreover, it is successfully able to combine these representations to generate data not seen at training time. The model enjoys an exponential increase in representational ability for a linear increase in context count. We demonstrate that the theory readily extends to a meta-learning setting such as this, and describe a fully unsupervised model in complete generality. Finally, we validate our approach using an adaptation with weak supervision. |
Tasks | Meta-Learning |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=BJRxfZbAW |
https://openreview.net/pdf?id=BJRxfZbAW | |
PWC | https://paperswithcode.com/paper/the-context-aware-learner |
Repo | |
Framework | |