Paper Group AWR 117
Episodic Curiosity through Reachability
Title | Episodic Curiosity through Reachability |
Authors | Nikolay Savinov, Anton Raichuk, Raphaël Marinier, Damien Vincent, Marc Pollefeys, Timothy Lillicrap, Sylvain Gelly |
Abstract | Rewards are sparse in the real world and most of today’s reinforcement learning algorithms struggle with such sparsity. One solution to this problem is to allow the agent to create rewards for itself, thus making rewards dense and more suitable for learning. In particular, inspired by curious behaviour in animals, observing something novel could be rewarded with a bonus. Such a bonus is summed with the real task reward, making it possible for RL algorithms to learn from the combined reward. We propose a new curiosity method which uses episodic memory to form the novelty bonus. To determine the bonus, the current observation is compared with the observations in memory. Crucially, the comparison is done based on how many environment steps it takes to reach the current observation from those in memory, which incorporates rich information about environment dynamics. This allows us to overcome the known “couch-potato” issue of prior work, where the agent finds a way to instantly gratify itself by exploiting actions which lead to hardly predictable consequences. We test our approach in visually rich 3D environments in ViZDoom, DMLab and MuJoCo. In navigational tasks from ViZDoom and DMLab, our agent outperforms the state-of-the-art curiosity method ICM. In MuJoCo, an ant equipped with our curiosity module learns locomotion from first-person-view curiosity alone. |
Tasks | |
Published | 2018-10-04 |
URL | https://arxiv.org/abs/1810.02274v5 |
https://arxiv.org/pdf/1810.02274v5.pdf | |
PWC | https://paperswithcode.com/paper/episodic-curiosity-through-reachability |
Repo | https://github.com/google-research/episodic-curiosity |
Framework | tf |
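A minimal NumPy sketch of the reachability-based bonus, assuming a trained comparator is available: the `reachability_score` stub, the 90th-percentile aggregation, and all constants (`alpha`, `beta`, `novelty_threshold`, `memory_size`) are illustrative placeholders rather than the paper's exact settings.

```python
import numpy as np

def reachability_score(obs_a, obs_b):
    """Stub for the trained comparator network: should return a value in [0, 1],
    high if obs_b is reachable from obs_a within k environment steps.
    Replaced here by a toy embedding-distance heuristic (assumption)."""
    return float(np.exp(-np.linalg.norm(obs_a - obs_b)))

class EpisodicCuriosity:
    def __init__(self, alpha=1.0, beta=0.5, novelty_threshold=0.5, memory_size=200):
        self.alpha, self.beta = alpha, beta          # bonus scale and shift
        self.novelty_threshold = novelty_threshold   # gate for adding to memory
        self.memory_size = memory_size
        self.memory = []                             # episodic memory of observations

    def bonus(self, obs):
        if not self.memory:
            self.memory.append(obs)
            return self.alpha * self.beta
        # Aggregate per-memory reachability scores (90th percentile: an assumption).
        scores = [reachability_score(m, obs) for m in self.memory]
        similarity = float(np.percentile(scores, 90))
        # Only store observations that are genuinely far from everything in memory.
        if similarity < self.novelty_threshold and len(self.memory) < self.memory_size:
            self.memory.append(obs)
        return self.alpha * (self.beta - similarity)

ec = EpisodicCuriosity()
obs = np.random.randn(8)
combined_reward = 0.0 + ec.bonus(obs)   # sparse task reward + curiosity bonus
```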
Labeling Gaps Between Words: Recognizing Overlapping Mentions with Mention Separators
Title | Labeling Gaps Between Words: Recognizing Overlapping Mentions with Mention Separators |
Authors | Aldrian Obaja Muis, Wei Lu |
Abstract | In this paper, we propose a new model that is capable of recognizing overlapping mentions. We introduce a novel notion of mention separators that can be effectively used to capture how mentions overlap with one another. On top of a novel multigraph representation that we introduce, we show that efficient and exact inference can still be performed. We present some theoretical analysis on the differences between our model and a recently proposed model for recognizing overlapping mentions, and discuss the possible implications of the differences. Through extensive empirical analysis on standard datasets, we demonstrate the effectiveness of our approach. |
Tasks | |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09073v1 |
http://arxiv.org/pdf/1810.09073v1.pdf | |
PWC | https://paperswithcode.com/paper/labeling-gaps-between-words-recognizing |
Repo | https://github.com/fishjh2/merge_label |
Framework | pytorch |
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Title | Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks |
Authors | Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville |
Abstract | Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hierarchy of constituents. This paper proposes to add such an inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference. |
Tasks | Constituency Grammar Induction, Language Modelling |
Published | 2018-10-22 |
URL | https://arxiv.org/abs/1810.09536v6 |
https://arxiv.org/pdf/1810.09536v6.pdf | |
PWC | https://paperswithcode.com/paper/ordered-neurons-integrating-tree-structures |
Repo | https://github.com/TieDanCuihua/ORDERED-NEURONS-INTEGRATING-TREE-STRUCTURES-INTO-RECURRENT-NEURAL-NETWORKS--tensorflow |
Framework | tf |
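The ordering is enforced through a cumulative softmax ("cumax") that yields monotonically non-decreasing gate activations. The NumPy sketch below reproduces the master-gate arithmetic of a single ON-LSTM cell update; the learned weight matrices that normally produce the gate logits are omitted, so the function's inputs are placeholders.

```python
import numpy as np

def cumax(x):
    """Cumulative softmax: cumsum(softmax(x)); values rise monotonically towards 1."""
    e = np.exp(x - x.max())
    return np.cumsum(e / e.sum())

def on_lstm_cell_update(c_prev, f, i, c_hat, master_f_logits, master_i_logits):
    """One ON-LSTM cell update. f, i, c_hat are the standard LSTM forget gate,
    input gate and candidate cell (assumed computed elsewhere); only the
    master-gate combination from the paper is shown."""
    f_master = cumax(master_f_logits)           # ~[0,..,0,1,..,1]: protects high-level memory
    i_master = 1.0 - cumax(master_i_logits)     # ~[1,..,1,0,..,0]: writes low-level memory
    overlap = f_master * i_master               # neurons governed by both master gates
    f_hat = f * overlap + (f_master - overlap)
    i_hat = i * overlap + (i_master - overlap)
    return f_hat * c_prev + i_hat * c_hat

d = 6
c_new = on_lstm_cell_update(
    c_prev=np.zeros(d),
    f=np.full(d, 0.5), i=np.full(d, 0.5), c_hat=np.random.randn(d),
    master_f_logits=np.random.randn(d), master_i_logits=np.random.randn(d),
)
```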
Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models
Title | Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models |
Authors | Tong Niu, Mohit Bansal |
Abstract | We present two categories of model-agnostic adversarial strategies that reveal the weaknesses of several generative, task-oriented dialogue models: Should-Not-Change strategies that evaluate over-sensitivity to small and semantics-preserving edits, as well as Should-Change strategies that test if a model is over-stable against subtle yet semantics-changing modifications. We next perform adversarial training with each strategy, employing a max-margin approach for negative generative examples. This not only makes the target dialogue model more robust to the adversarial inputs, but also helps it perform significantly better on the original inputs. Moreover, training on all strategies combined yields further improvements, achieving new state-of-the-art performance on the original task (also verified via human evaluation). In addition to adversarial training, we also address robustness at the model level by feeding the model subword units as both inputs and outputs, and show that the resulting model is equally competitive, requires only 1/4 of the original vocabulary size, and is robust to one of the adversarial strategies (to which the original model is vulnerable) even without adversarial training. |
Tasks | |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02079v1 |
http://arxiv.org/pdf/1809.02079v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-over-sensitivity-and-over |
Repo | https://github.com/WolfNiu/AdversarialDialogue |
Framework | tf |
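A hedged sketch of the max-margin idea: the ground-truth response should score higher under the original input than under its adversarial counterpart. The margin value and the batch of sequence log-likelihoods below are illustrative assumptions rather than the paper's exact training setup.

```python
import torch
import torch.nn.functional as F

def max_margin_adversarial_loss(logp_response_given_original,
                                logp_response_given_adversarial,
                                margin=1.0):
    """Hinge loss: the ground-truth response should be at least `margin`
    more likely under the original input than under the perturbed one."""
    return F.relu(margin
                  - logp_response_given_original
                  + logp_response_given_adversarial).mean()

# Toy usage with made-up sequence log-likelihoods for a batch of 3 dialogues.
logp_orig = torch.tensor([-12.3, -8.7, -15.1])
logp_adv = torch.tensor([-11.9, -10.2, -14.8])
loss = max_margin_adversarial_loss(logp_orig, logp_adv)
```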
Predefined Sparseness in Recurrent Sequence Models
Title | Predefined Sparseness in Recurrent Sequence Models |
Authors | Thomas Demeester, Johannes Deleu, Fréderic Godin, Chris Develder |
Abstract | Inducing sparseness while training neural networks has been shown to yield models with a lower memory footprint but similar effectiveness to dense models. However, sparseness is typically induced starting from a dense model, and thus this advantage does not hold during training. We propose techniques to enforce sparseness upfront in recurrent sequence models for NLP applications, to also benefit training. First, in language modeling, we show how to increase hidden state sizes in recurrent layers without increasing the number of parameters, leading to more expressive models. Second, for sequence labeling, we show that word embeddings with predefined sparseness lead to similar performance as dense embeddings, at a fraction of the number of trainable parameters. |
Tasks | Language Modelling, Word Embeddings |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08720v1 |
http://arxiv.org/pdf/1808.08720v1.pdf | |
PWC | https://paperswithcode.com/paper/predefined-sparseness-in-recurrent-sequence |
Repo | https://github.com/tdmeeste/SparseSeqModels |
Framework | pytorch |
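A small PyTorch sketch of predefined embedding sparseness, assuming word ids are sorted by descending corpus frequency so that rarer words keep fewer non-zero dimensions. The linear allocation rule is an illustrative assumption, and a real implementation would store only the unmasked entries to actually save parameters.

```python
import torch
import torch.nn as nn

class PredefinedSparseEmbedding(nn.Module):
    """Embedding matrix with a fixed, frequency-dependent sparsity mask."""
    def __init__(self, vocab_size, dim, min_dims=8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(vocab_size, dim) * 0.1)
        mask = torch.zeros(vocab_size, dim)
        for rank in range(vocab_size):
            # Linearly shrink the number of active dimensions with frequency rank.
            n_active = max(min_dims, int(dim * (1 - rank / vocab_size)))
            mask[rank, :n_active] = 1.0
        self.register_buffer("mask", mask)

    def forward(self, token_ids):
        return (self.weight * self.mask)[token_ids]

emb = PredefinedSparseEmbedding(vocab_size=1000, dim=64)
vectors = emb(torch.tensor([0, 500, 999]))   # a frequent, a mid-frequency and a rare word
```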
Iterative annotation to ease neural network training: Specialized machine learning in medical image analysis
Title | Iterative annotation to ease neural network training: Specialized machine learning in medical image analysis |
Authors | Brendon Lutnick, Brandon Ginley, Darshana Govind, Sean D. McGarry, Peter S. LaViolette, Rabi Yacoub, Sanjay Jain, John E. Tomaszewski, Kuang-Yu Jen, Pinaki Sarder |
Abstract | Neural networks promise to bring robust, quantitative analysis to medical fields, but adoption is limited by the technicalities of training these networks. To address this translation gap between medical researchers and neural networks in the field of pathology, we have created an intuitive interface which utilizes the commonly used whole slide image (WSI) viewer, Aperio ImageScope (Leica Biosystems Imaging, Inc.), for the annotation and display of neural network predictions on WSIs. Leveraging this, we propose the use of a human-in-the-loop strategy to reduce the burden of WSI annotation. We track network performance improvements as a function of iteration and quantify the use of this pipeline for the segmentation of renal histologic findings on WSIs. More specifically, we present network performance when applied to segmentation of renal micro compartments, and demonstrate multi-class segmentation in human and mouse renal tissue slides. Finally, to show the adaptability of this technique to other medical imaging fields, we demonstrate its ability to iteratively segment human prostate glands from radiology imaging data. |
Tasks | |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07509v1 |
http://arxiv.org/pdf/1812.07509v1.pdf | |
PWC | https://paperswithcode.com/paper/iterative-annotation-to-ease-neural-network |
Repo | https://github.com/SarderLab/H-AI-L |
Framework | tf |
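A schematic version of the human-in-the-loop pipeline, with `train_fn`, `predict_fn`, and `correct_fn` left as placeholder callables; the actual system wires these to a segmentation network and the ImageScope annotation workflow.

```python
def human_in_the_loop_segmentation(wsi_slides, train_fn, predict_fn,
                                   correct_fn, n_iterations=5, seed_size=2):
    """Schematic iterative-annotation loop: train on a small seed set, predict on
    a new slide, let an expert correct the prediction in the WSI viewer, and fold
    the corrections back into the training set before retraining."""
    annotated = [(s, correct_fn(s, prediction=None)) for s in wsi_slides[:seed_size]]
    remaining = list(wsi_slides[seed_size:])
    model = None
    for _ in range(n_iterations):
        model = train_fn(annotated, init=model)     # (re)train the segmentation network
        if not remaining:
            break
        slide = remaining.pop(0)
        pred = predict_fn(model, slide)             # network proposes contours
        fixed = correct_fn(slide, prediction=pred)  # expert edits them, which is much
        annotated.append((slide, fixed))            # faster than annotating from scratch
    return model
```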
Rafiki: Machine Learning as an Analytics Service System
Title | Rafiki: Machine Learning as an Analytics Service System |
Authors | Wei Wang, Sheng Wang, Jinyang Gao, Meihui Zhang, Gang Chen, Teck Khim Ng, Beng Chin Ooi |
Abstract | Big data analytics has gained massive momentum in the last few years. Applying machine learning models to big data has become an implicit requirement or an expectation for most analysis tasks, especially in high-stakes applications. Typical applications include sentiment analysis of reviews for analyzing online products, image classification in food-logging applications for monitoring users’ daily intake, and stock movement prediction. Extending traditional database systems to support the above analysis is intriguing but challenging. First, it is almost impossible to implement all machine learning models in database engines. Second, expert knowledge is required to optimize the training and inference procedures in terms of efficiency and effectiveness, which imposes a heavy burden on system users. In this paper, we develop and present a system, called Rafiki, to provide training and inference services for machine learning models and to facilitate complex analytics on top of cloud platforms. Rafiki provides distributed hyper-parameter tuning for the training service, and online ensemble modeling for the inference service, which trades off latency and accuracy. Experimental results confirm the efficiency, effectiveness, scalability and usability of Rafiki. |
Tasks | AutoML, Hyperparameter Optimization, Image Classification, Sentiment Analysis |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06087v1 |
http://arxiv.org/pdf/1804.06087v1.pdf | |
PWC | https://paperswithcode.com/paper/rafiki-machine-learning-as-an-analytics |
Repo | https://github.com/nginyc/rafiki |
Framework | none |
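A toy illustration of the latency/accuracy trade-off behind the inference service: greedily add models by validation accuracy while an (assumed additive) latency budget allows. Rafiki's actual controller is considerably more sophisticated; this only conveys the idea.

```python
def select_ensemble(models, latency_budget_ms):
    """Greedily add models in order of decreasing validation accuracy as long as
    the (assumed additive) latency budget allows."""
    chosen, total_latency = [], 0.0
    for m in sorted(models, key=lambda m: m["accuracy"], reverse=True):
        if total_latency + m["latency_ms"] <= latency_budget_ms:
            chosen.append(m["name"])
            total_latency += m["latency_ms"]
    return chosen, total_latency

models = [
    {"name": "inception", "accuracy": 0.78, "latency_ms": 120},
    {"name": "resnet50",  "accuracy": 0.76, "latency_ms": 90},
    {"name": "mobilenet", "accuracy": 0.71, "latency_ms": 25},
]
print(select_ensemble(models, latency_budget_ms=150))   # (['inception', 'mobilenet'], 145.0)
```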
Bilingual Sentiment Embeddings: Joint Projection of Sentiment Across Languages
Title | Bilingual Sentiment Embeddings: Joint Projection of Sentiment Across Languages |
Authors | Jeremy Barnes, Roman Klinger, Sabine Schulte im Walde |
Abstract | Sentiment analysis in low-resource languages suffers from a lack of annotated corpora to estimate high-performing models. Machine translation and bilingual word embeddings provide some relief through cross-lingual sentiment approaches. However, they either require large amounts of parallel data or do not sufficiently capture sentiment information. We introduce Bilingual Sentiment Embeddings (BLSE), which jointly represent sentiment information in a source and target language. This model only requires a small bilingual lexicon, a source-language corpus annotated for sentiment, and monolingual word embeddings for each language. We perform experiments on three language combinations (Spanish, Catalan, Basque) for sentence-level cross-lingual sentiment classification and find that our model significantly outperforms state-of-the-art methods on four out of six experimental setups, as well as capturing complementary information to machine translation. Our analysis of the resulting embedding space provides evidence that it represents sentiment information in the resource-poor target language without any annotated data in that language. |
Tasks | Machine Translation, Sentiment Analysis, Word Embeddings |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09016v1 |
http://arxiv.org/pdf/1805.09016v1.pdf | |
PWC | https://paperswithcode.com/paper/bilingual-sentiment-embeddings-joint |
Repo | https://github.com/jbarnesspain/blse |
Framework | pytorch |
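A PyTorch sketch of the joint projection, assuming mean-pooled sentence vectors and a simple weighted sum of two losses: a translation-pair alignment term over the bilingual lexicon and a sentiment term on projected source-language sentences. Dimensions and the `alpha` weight are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BLSE(nn.Module):
    """Two linear projections into a shared space, trained jointly with
    (i) an alignment loss over bilingual lexicon pairs and
    (ii) a sentiment loss on projected source-language sentence vectors."""
    def __init__(self, dim=300, shared_dim=300, n_classes=2):
        super().__init__()
        self.proj_src = nn.Linear(dim, shared_dim, bias=False)
        self.proj_tgt = nn.Linear(dim, shared_dim, bias=False)
        self.clf = nn.Linear(shared_dim, n_classes)

    def loss(self, lex_src, lex_tgt, sent_vecs, sent_labels, alpha=0.5):
        align = ((self.proj_src(lex_src) - self.proj_tgt(lex_tgt)) ** 2).sum(dim=1).mean()
        senti = F.cross_entropy(self.clf(self.proj_src(sent_vecs)), sent_labels)
        return alpha * align + (1 - alpha) * senti

model = BLSE(dim=50)
loss = model.loss(
    lex_src=torch.randn(100, 50), lex_tgt=torch.randn(100, 50),  # lexicon translation pairs
    sent_vecs=torch.randn(32, 50),                               # mean-pooled word vectors
    sent_labels=torch.randint(0, 2, (32,)),
)
```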
Improving Aspect Term Extraction with Bidirectional Dependency Tree Representation
Title | Improving Aspect Term Extraction with Bidirectional Dependency Tree Representation |
Authors | Huaishao Luo, Tianrui Li, Bing Liu, Bin Wang, Herwig Unger |
Abstract | Aspect term extraction is one of the important subtasks in aspect-based sentiment analysis. Previous studies have shown that using dependency tree structure representation is promising for this task. However, most dependency tree structures involve only one directional propagation on the dependency tree. In this paper, we first propose a novel bidirectional dependency tree network to extract dependency structure features from the given sentences. The key idea is to explicitly incorporate both representations gained separately from the bottom-up and top-down propagation on the given dependency syntactic tree. An end-to-end framework is then developed to integrate the embedded representations and BiLSTM plus CRF to learn both tree-structured and sequential features to solve the aspect term extraction problem. Experimental results demonstrate that the proposed model outperforms state-of-the-art baseline models on four benchmark SemEval datasets. |
Tasks | Aspect-Based Sentiment Analysis, Sentiment Analysis |
Published | 2018-05-21 |
URL | https://arxiv.org/abs/1805.07889v2 |
https://arxiv.org/pdf/1805.07889v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-aspect-term-extraction-with |
Repo | https://github.com/ArrowLuo/BiDTree |
Framework | tf |
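A toy NumPy sketch of the bidirectional propagation over a dependency tree: one bottom-up pass aggregating child states and one top-down pass propagating parent states, concatenated per token. Real BiDTree uses tree-structured LSTM cells; the tanh/mean combination and the hand-picked heads below are purely illustrative.

```python
import numpy as np

def bidirectional_tree_states(word_vecs, heads, root):
    """Toy bottom-up/top-down propagation over a dependency tree, where heads[i]
    is the head index of token i (the root points to itself). The concatenated
    states would be fed into the BiLSTM+CRF tagger downstream."""
    n, d = word_vecs.shape
    children = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if i != root:
            children[h].append(i)

    up = np.zeros((n, d))
    def bottom_up(i):
        child_states = [bottom_up(c) for c in children[i]]
        agg = np.mean(child_states, axis=0) if child_states else np.zeros(d)
        up[i] = np.tanh(word_vecs[i] + agg)
        return up[i]
    bottom_up(root)

    down = np.zeros((n, d))
    def top_down(i, parent_state):
        down[i] = np.tanh(word_vecs[i] + parent_state)
        for c in children[i]:
            top_down(c, down[i])
    top_down(root, np.zeros(d))

    return np.concatenate([up, down], axis=1)

# "delicious food but slow service", with a hand-picked head for each token.
vecs = np.random.randn(5, 16)
states = bidirectional_tree_states(vecs, heads=[1, 1, 1, 4, 1], root=1)   # (5, 32)
```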
Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages
Title | Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages |
Authors | Katharina Kann, Manuel Mager, Ivan Meza-Ruiz, Hinrich Schütze |
Abstract | Morphological segmentation for polysynthetic languages is challenging, because a word may consist of many individual morphemes and training data can be extremely scarce. Since neural sequence-to-sequence (seq2seq) models define the state of the art for morphological segmentation in high-resource settings and for (mostly) European languages, we first show that they also obtain competitive performance for Mexican polysynthetic languages in minimal-resource settings. We then propose two novel multi-task training approaches (one with and one without the need for external unlabeled resources) and two corresponding data augmentation methods, improving over the neural baseline for all languages. Finally, we explore cross-lingual transfer as a third way to fortify our neural model and show that we can train a single multi-lingual model for related languages while maintaining comparable or even improved performance, thus reducing the number of parameters by close to 75%. We provide our morphological segmentation datasets for Mexicanero, Nahuatl, Wixarika and Yorem Nokki for future research. |
Tasks | Cross-Lingual Transfer, Data Augmentation |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06024v1 |
http://arxiv.org/pdf/1804.06024v1.pdf | |
PWC | https://paperswithcode.com/paper/fortification-of-neural-morphological |
Repo | https://github.com/pywirrarika/naki |
Framework | none |
A Universal Music Translation Network
Title | A Universal Music Translation Network |
Authors | Noam Mor, Lior Wolf, Adam Polyak, Yaniv Taigman |
Abstract | We present a method for translating music across musical instruments, genres, and styles. This method is based on a multi-domain wavenet autoencoder, with a shared encoder and a disentangled latent space that is trained end-to-end on waveforms. Employing a diverse training dataset and large net capacity, the domain-independent encoder allows us to translate even from musical domains that were not seen during training. The method is unsupervised and does not rely on supervision in the form of matched samples between domains or musical transcriptions. We evaluate our method on NSynth, as well as on a dataset collected from professional musicians, and achieve convincing translations, even when translating from whistling, potentially enabling the creation of instrumental music by untrained humans. |
Tasks | |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.07848v2 |
http://arxiv.org/pdf/1805.07848v2.pdf | |
PWC | https://paperswithcode.com/paper/a-universal-music-translation-network |
Repo | https://github.com/ShichengChen/WaveNetSeparateAudio |
Framework | pytorch |
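A schematic PyTorch layout of the shared-encoder, per-domain-decoder design. The real model is a WaveNet autoencoder trained on raw waveforms with pitch augmentation and a domain-confusion term; the small 1-D convolution stacks here are placeholders for those components.

```python
import torch
import torch.nn as nn

class MultiDomainTranslator(nn.Module):
    """Shared, domain-independent encoder and one decoder per output domain."""
    def __init__(self, n_domains=6, channels=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, padding=4),
        )
        self.decoders = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(channels, channels, kernel_size=9, padding=4), nn.ReLU(),
                nn.Conv1d(channels, 1, kernel_size=9, padding=4),
            ) for _ in range(n_domains)
        ])

    def forward(self, waveform, target_domain):
        latent = self.encoder(waveform)               # disentangled latent code
        return self.decoders[target_domain](latent)   # render it in the target domain

model = MultiDomainTranslator()
out = model(torch.randn(1, 1, 16000), target_domain=2)   # 1 s of 16 kHz audio into domain 2
```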
Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs
Title | Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs |
Authors | Yogesh Balaji, Hamed Hassani, Rama Chellappa, Soheil Feizi |
Abstract | Building on the success of deep learning, two modern approaches to learning a probability model from data are Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs). VAEs consider an explicit probability model for the data and compute a generative distribution by maximizing a variational lower-bound on the log-likelihood function. GANs, however, compute a generative model by minimizing a distance between observed and generated probability distributions without considering an explicit model for the observed data. The lack of an explicit probability model in GANs prohibits the computation of sample likelihoods within their framework and limits their use in statistical inference problems. In this work, we resolve this issue by constructing an explicit probability model that can be used to compute sample likelihood statistics in GANs. In particular, we prove that under this probability model, a family of Wasserstein GANs with an entropy regularization can be viewed as a generative model that maximizes a variational lower-bound on average sample log likelihoods, an approach that VAEs are based on. This result makes a principled connection between two modern generative models, namely GANs and VAEs. In addition to the aforementioned theoretical results, we compute likelihood statistics for GANs trained on Gaussian, MNIST, SVHN, CIFAR-10 and LSUN datasets. Our numerical results validate the proposed theory. |
Tasks | |
Published | 2018-10-09 |
URL | https://arxiv.org/abs/1810.04147v2 |
https://arxiv.org/pdf/1810.04147v2.pdf | |
PWC | https://paperswithcode.com/paper/entropic-gans-meet-vaes-a-statistical |
Repo | https://github.com/yogeshbalaji/EntropicGANs_meet_VAEs |
Framework | tf |
Domain Adapted Word Embeddings for Improved Sentiment Classification
Title | Domain Adapted Word Embeddings for Improved Sentiment Classification |
Authors | Prathusha K Sarma, Yingyu Liang, William A Sethares |
Abstract | Generic word embeddings are trained on large-scale generic corpora; Domain Specific (DS) word embeddings are trained only on data from a domain of interest. This paper proposes a method to combine the breadth of generic embeddings with the specificity of domain specific embeddings. The resulting embeddings, called Domain Adapted (DA) word embeddings, are formed by aligning corresponding word vectors using Canonical Correlation Analysis (CCA) or the related nonlinear Kernel CCA. Evaluation results on sentiment classification tasks show that the DA embeddings substantially outperform both generic and DS embeddings when used as input features to standard or state-of-the-art sentence encoding algorithms for classification. |
Tasks | Sentiment Analysis, Word Embeddings |
Published | 2018-05-11 |
URL | http://arxiv.org/abs/1805.04576v1 |
http://arxiv.org/pdf/1805.04576v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-adapted-word-embeddings-for-improved |
Repo | https://github.com/GallupGovt/multivac |
Framework | none |
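A minimal sketch of the CCA alignment with scikit-learn, assuming a shared vocabulary between the generic and domain-specific embeddings; averaging the two projected views and the choice of `n_components` are assumptions, and the paper additionally explores kernel CCA.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def domain_adapt_embeddings(generic, specific, n_components=50):
    """Align the two embedding spaces for a shared vocabulary with linear CCA and
    average the projected views to obtain the Domain Adapted (DA) embeddings."""
    cca = CCA(n_components=n_components, max_iter=1000)
    gen_proj, spec_proj = cca.fit_transform(generic, specific)
    return (gen_proj + spec_proj) / 2.0

# Toy data: 500 shared vocabulary words, 300-d generic and 100-d domain-specific vectors.
rng = np.random.default_rng(0)
generic = rng.normal(size=(500, 300))
specific = rng.normal(size=(500, 100))
da_embeddings = domain_adapt_embeddings(generic, specific)
print(da_embeddings.shape)   # (500, 50)
```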
Characterizing Entities in the Bitcoin Blockchain
Title | Characterizing Entities in the Bitcoin Blockchain |
Authors | Marc Jourdan, Sebastien Blandin, Laura Wynter, Pralhad Deshpande |
Abstract | Bitcoin has created a new exchange paradigm within which financial transactions can be trusted without an intermediary. This premise of a free decentralized transactional network however requires, in its current implementation, unrestricted access to the ledger for peer-based transaction verification. A number of studies have shown that, in this pseudonymous context, identities can be leaked based on transaction features or off-network information. In this work, we analyze the information revealed by the pattern of transactions in the neighborhood of a given entity transaction. By definition, these features which pertain to an extended network are not directly controllable by the entity, but might enable leakage of information about transacting entities. We define a number of new features relevant to entity characterization on the Bitcoin Blockchain and study their efficacy in practice. We show that even a weak attacker with shallow data mining knowledge is able to leverage these features to characterize the entity properties. |
Tasks | |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.11956v1 |
http://arxiv.org/pdf/1810.11956v1.pdf | |
PWC | https://paperswithcode.com/paper/characterizing-entities-in-the-bitcoin |
Repo | https://github.com/Maru92/EntityAddressBitcoin |
Framework | none |
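An illustrative (not the paper's) feature extractor over an entity's one-hop transaction neighborhood, applied to a toy transaction list; the actual study defines a considerably richer feature set over the Blockchain graph.

```python
import numpy as np

def neighborhood_features(entity, transactions):
    """A few generic statistics over the transactions within one hop of an entity.
    `transactions` is a list of dicts with keys "sender", "receiver", "value"."""
    direct = [t for t in transactions if entity in (t["sender"], t["receiver"])]
    counterparties = {t["receiver"] if t["sender"] == entity else t["sender"] for t in direct}
    neigh = [t for t in transactions
             if t["sender"] in counterparties or t["receiver"] in counterparties]
    values = np.array([t["value"] for t in neigh]) if neigh else np.array([0.0])
    return {
        "n_direct_tx": len(direct),
        "n_counterparties": len(counterparties),
        "n_neighborhood_tx": len(neigh),
        "neighborhood_mean_value": float(values.mean()),
        "neighborhood_value_std": float(values.std()),
    }

txs = [
    {"sender": "A", "receiver": "B", "value": 0.4},
    {"sender": "B", "receiver": "C", "value": 1.2},
    {"sender": "D", "receiver": "A", "value": 0.1},
]
print(neighborhood_features("A", txs))
```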
Identifying and Alleviating Concept Drift in Streaming Tensor Decomposition
Title | Identifying and Alleviating Concept Drift in Streaming Tensor Decomposition |
Authors | Ravdeep Pasricha, Ekta Gujral, Evangelos E. Papalexakis |
Abstract | Tensor decompositions are used in various data mining applications from social network to medical applications and are extremely useful in discovering latent structures or concepts in the data. Many real-world applications are dynamic in nature and so are their data. To deal with this dynamic nature of data, there exist a variety of online tensor decomposition algorithms. A central assumption in all those algorithms is that the number of latent concepts remains fixed throughout the entire stream. However, this need not be the case. Every incoming batch in the stream may have a different number of latent concepts, and the difference in latent concepts from one tensor batch to another can provide insights into how our findings in a particular application behave and deviate over time. In this paper, we define “concept” and “concept drift” in the context of streaming tensor decomposition, as the manifestation of the variability of latent concepts throughout the stream. Furthermore, we introduce SeekAndDestroy, an algorithm that detects concept drift in streaming tensor decomposition and is able to produce results robust to that drift. To the best of our knowledge, this is the first work that investigates concept drift in streaming tensor decomposition. We extensively evaluate SeekAndDestroy on synthetic datasets, which exhibit a wide variety of realistic drift. Our experiments demonstrate the effectiveness of SeekAndDestroy, both in the detection of concept drift and in the alleviation of its effects, producing results with similar quality to decomposing the entire tensor in one shot. Additionally, in real datasets, SeekAndDestroy outperforms other streaming baselines, while discovering novel useful components. |
Tasks | |
Published | 2018-04-25 |
URL | http://arxiv.org/abs/1804.09619v2 |
http://arxiv.org/pdf/1804.09619v2.pdf | |
PWC | https://paperswithcode.com/paper/identifying-and-alleviating-concept-drift-in |
Repo | https://github.com/ravdeep003/conceptDrift |
Framework | none |
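A crude sketch of per-batch concept (rank) tracking with TensorLy, assuming the number of latent concepts can be estimated as the smallest CP rank whose reconstruction error falls below a tolerance. SeekAndDestroy uses a dedicated rank-estimation routine and explicitly matches and updates components across batches, which is not shown here.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

def estimate_rank(batch, candidate_ranks=(1, 2, 3, 4, 5), tol=0.05):
    """Pick the smallest CP rank whose relative reconstruction error is below `tol`."""
    norm = tl.norm(batch)
    for r in candidate_ranks:
        cp = parafac(batch, rank=r, n_iter_max=200)
        if tl.norm(batch - tl.cp_to_tensor(cp)) / norm < tol:
            return r
    return candidate_ranks[-1]

def detect_concept_drift(batches):
    """Flag batches whose estimated number of latent concepts differs from the previous batch."""
    prev = None
    for i, batch in enumerate(batches):
        rank = estimate_rank(tl.tensor(batch))
        if prev is not None and rank != prev:
            print(f"batch {i}: concept drift, rank {prev} -> {rank}")
        prev = rank

batches = [np.random.rand(10, 10, 5) for _ in range(3)]
detect_concept_drift(batches)
```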