October 21, 2019

3068 words 15 mins read

Paper Group AWR 117

Episodic Curiosity through Reachability. Labeling Gaps Between Words: Recognizing Overlapping Mentions with Mention Separators. Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks. Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models. Predefined Sparseness in Recurrent Sequence Models. Iterative ann …

Episodic Curiosity through Reachability

Title Episodic Curiosity through Reachability
Authors Nikolay Savinov, Anton Raichuk, Raphaël Marinier, Damien Vincent, Marc Pollefeys, Timothy Lillicrap, Sylvain Gelly
Abstract Rewards are sparse in the real world and most of today’s reinforcement learning algorithms struggle with such sparsity. One solution to this problem is to allow the agent to create rewards for itself - thus making rewards dense and more suitable for learning. In particular, inspired by curious behaviour in animals, observing something novel could be rewarded with a bonus. Such a bonus is summed with the real task reward - making it possible for RL algorithms to learn from the combined reward. We propose a new curiosity method which uses episodic memory to form the novelty bonus. To determine the bonus, the current observation is compared with the observations in memory. Crucially, the comparison is done based on how many environment steps it takes to reach the current observation from those in memory - which incorporates rich information about environment dynamics. This allows us to overcome the known “couch-potato” issues of prior work - when the agent finds a way to instantly gratify itself by exploiting actions which lead to hardly predictable consequences. We test our approach in visually rich 3D environments in ViZDoom, DMLab and MuJoCo. In navigational tasks from ViZDoom and DMLab, our agent outperforms the state-of-the-art curiosity method ICM. In MuJoCo, an ant equipped with our curiosity module learns locomotion out of the first-person-view curiosity only.
Tasks
Published 2018-10-04
URL https://arxiv.org/abs/1810.02274v5
PDF https://arxiv.org/pdf/1810.02274v5.pdf
PWC https://paperswithcode.com/paper/episodic-curiosity-through-reachability
Repo https://github.com/google-research/episodic-curiosity
Framework tf
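
To make the reachability idea above concrete, here is a minimal sketch of the episodic novelty bonus in Python. The function names, constants, and the max-aggregation are illustrative assumptions, not the authors' implementation; `reachability_fn` stands in for the trained comparator network.

```python
def curiosity_bonus(current_emb, memory, reachability_fn,
                    novelty_threshold=0.5, alpha=1.0, beta=0.5):
    """Sketch of a reachability-based novelty bonus (illustrative only).

    reachability_fn(a, b) plays the role of the trained comparator that
    predicts whether observation b is reachable from a within a few
    environment steps (near 1.0 = familiar, near 0.0 = novel).
    """
    # Compare the current observation against everything in episodic memory.
    scores = [reachability_fn(m, current_emb) for m in memory]
    similarity = max(scores) if scores else 0.0

    # The bonus is large when the observation is far (in steps) from known ones.
    bonus = alpha * (beta - similarity)

    # Only genuinely novel observations are added to memory.
    if similarity < novelty_threshold:
        memory.append(current_emb)
    return bonus
```

The bonus is then added to the environment reward before being passed to the RL algorithm, which is what makes the combined reward dense.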

Labeling Gaps Between Words: Recognizing Overlapping Mentions with Mention Separators

Title Labeling Gaps Between Words: Recognizing Overlapping Mentions with Mention Separators
Authors Aldrian Obaja Muis, Wei Lu
Abstract In this paper, we propose a new model that is capable of recognizing overlapping mentions. We introduce a novel notion of mention separators that can be effectively used to capture how mentions overlap with one another. On top of a novel multigraph representation that we introduce, we show that efficient and exact inference can still be performed. We present some theoretical analysis on the differences between our model and a recently proposed model for recognizing overlapping mentions, and discuss the possible implications of the differences. Through extensive empirical analysis on standard datasets, we demonstrate the effectiveness of our approach.
Tasks
Published 2018-10-22
URL http://arxiv.org/abs/1810.09073v1
PDF http://arxiv.org/pdf/1810.09073v1.pdf
PWC https://paperswithcode.com/paper/labeling-gaps-between-words-recognizing
Repo https://github.com/fishjh2/merge_label
Framework pytorch

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Title Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Authors Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville
Abstract Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hierarchy of constituents. This paper proposes to add such an inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.
Tasks Constituency Grammar Induction, Language Modelling
Published 2018-10-22
URL https://arxiv.org/abs/1810.09536v6
PDF https://arxiv.org/pdf/1810.09536v6.pdf
PWC https://paperswithcode.com/paper/ordered-neurons-integrating-tree-structures
Repo https://github.com/TieDanCuihua/ORDERED-NEURONS-INTEGRATING-TREE-STRUCTURES-INTO-RECURRENT-NEURAL-NETWORKS--tensorflow
Framework tf
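
The ordering mechanism above is driven by a cumulative-softmax activation. Below is a small Python sketch of that activation and the two master gates, written from our reading of the abstract rather than taken from the authors' code; the `master_gates` helper and its argument names are assumptions.

```python
import numpy as np

def cumax(x):
    """Cumulative softmax: a monotonically non-decreasing gate in [0, 1]."""
    e = np.exp(x - np.max(x))
    return np.cumsum(e / np.sum(e))

def master_gates(forget_logits, input_logits):
    """Sketch of ordered master gates: neurons are ordered, the master forget
    gate controls how high up the ordering old information is erased, and the
    master input gate controls how deep new information is written."""
    f_tilde = cumax(forget_logits)        # rises from 0 to 1 along the ordering
    i_tilde = 1.0 - cumax(input_logits)   # falls from 1 to 0 along the ordering
    omega = f_tilde * i_tilde             # overlap where both gates are active
    return f_tilde, i_tilde, omega
```

Because `f_tilde` is non-decreasing, once a high-ranked neuron forgets, every neuron below it in the ordering forgets as well, which is exactly the "closing all nested constituents" behaviour the abstract describes.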

Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models

Title Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models
Authors Tong Niu, Mohit Bansal
Abstract We present two categories of model-agnostic adversarial strategies that reveal the weaknesses of several generative, task-oriented dialogue models: Should-Not-Change strategies that evaluate over-sensitivity to small and semantics-preserving edits, as well as Should-Change strategies that test if a model is over-stable against subtle yet semantics-changing modifications. We next perform adversarial training with each strategy, employing a max-margin approach for negative generative examples. This not only makes the target dialogue model more robust to the adversarial inputs, but also helps it perform significantly better on the original inputs. Moreover, training on all strategies combined achieves further improvements, achieving a new state-of-the-art performance on the original task (also verified via human evaluation). In addition to adversarial training, we also address the robustness task at the model-level, by feeding it subword units as both inputs and outputs, and show that the resulting model is equally competitive, requires only 1/4 of the original vocabulary size, and is robust to one of the adversarial strategies (to which the original model is vulnerable) even without adversarial training.
Tasks
Published 2018-09-06
URL http://arxiv.org/abs/1809.02079v1
PDF http://arxiv.org/pdf/1809.02079v1.pdf
PWC https://paperswithcode.com/paper/adversarial-over-sensitivity-and-over
Repo https://github.com/WolfNiu/AdversarialDialogue
Framework tf
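
The abstract mentions a max-margin objective for negative generative examples. A generic version of that term is sketched below; how the scores are computed (for example, sequence log-likelihoods of the original versus adversarial response) is the paper's choice and only assumed here.

```python
def max_margin_loss(score_original, score_adversarial, margin=1.0):
    """Generic max-margin term: push the model to prefer the original
    response over the adversarial one by at least `margin`."""
    return max(0.0, margin - score_original + score_adversarial)
```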

Predefined Sparseness in Recurrent Sequence Models

Title Predefined Sparseness in Recurrent Sequence Models
Authors Thomas Demeester, Johannes Deleu, Fréderic Godin, Chris Develder
Abstract Inducing sparseness while training neural networks has been shown to yield models with a lower memory footprint but similar effectiveness to dense models. However, sparseness is typically induced starting from a dense model, and thus this advantage does not hold during training. We propose techniques to enforce sparseness upfront in recurrent sequence models for NLP applications, to also benefit training. First, in language modeling, we show how to increase hidden state sizes in recurrent layers without increasing the number of parameters, leading to more expressive models. Second, for sequence labeling, we show that word embeddings with predefined sparseness lead to similar performance as dense embeddings, at a fraction of the number of trainable parameters.
Tasks Language Modelling, Word Embeddings
Published 2018-08-27
URL http://arxiv.org/abs/1808.08720v1
PDF http://arxiv.org/pdf/1808.08720v1.pdf
PWC https://paperswithcode.com/paper/predefined-sparseness-in-recurrent-sequence
Repo https://github.com/tdmeeste/SparseSeqModels
Framework pytorch
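
As a rough illustration of sparseness fixed *before* training, the sketch below assigns each word a fixed subset of embedding dimensions and keeps only those entries trainable. This is a schematic of the idea, assuming a uniform density per word; the paper's actual allocation scheme may differ.

```python
import numpy as np

def predefined_sparse_embeddings(vocab_size, emb_dim, density=0.25, seed=0):
    """Embeddings whose sparsity pattern is fixed up front (illustrative)."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((vocab_size, emb_dim), dtype=bool)
    k = max(1, int(emb_dim * density))
    for w in range(vocab_size):
        # Each word gets a fixed random subset of dimensions.
        mask[w, rng.choice(emb_dim, size=k, replace=False)] = True
    weights = rng.standard_normal((vocab_size, emb_dim)) * mask
    return weights, mask  # during training, gradients would be masked as well
```

Only the masked-in entries count as trainable parameters, which is how the memory advantage holds during training rather than only after pruning.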

Iterative annotation to ease neural network training: Specialized machine learning in medical image analysis

Title Iterative annotation to ease neural network training: Specialized machine learning in medical image analysis
Authors Brendon Lutnick, Brandon Ginley, Darshana Govind, Sean D. McGarry, Peter S. LaViolette, Rabi Yacoub, Sanjay Jain, John E. Tomaszewski, Kuang-Yu Jen, Pinaki Sarder
Abstract Neural networks promise to bring robust, quantitative analysis to medical fields, but adoption is limited by the technicalities of training these networks. To address this translation gap between medical researchers and neural networks in the field of pathology, we have created an intuitive interface which utilizes the commonly used whole slide image (WSI) viewer, Aperio ImageScope (Leica Biosystems Imaging, Inc.), for the annotation and display of neural network predictions on WSIs. Leveraging this, we propose the use of a human-in-the-loop strategy to reduce the burden of WSI annotation. We track network performance improvements as a function of iteration and quantify the use of this pipeline for the segmentation of renal histologic findings on WSIs. More specifically, we present network performance when applied to segmentation of renal micro compartments, and demonstrate multi-class segmentation in human and mouse renal tissue slides. Finally, to show the adaptability of this technique to other medical imaging fields, we demonstrate its ability to iteratively segment human prostate glands from radiology imaging data.
Tasks
Published 2018-12-18
URL http://arxiv.org/abs/1812.07509v1
PDF http://arxiv.org/pdf/1812.07509v1.pdf
PWC https://paperswithcode.com/paper/iterative-annotation-to-ease-neural-network
Repo https://github.com/SarderLab/H-AI-L
Framework tf
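
The human-in-the-loop strategy described above can be summarised by the loop below. The function names and structure are illustrative assumptions, not the H-AI-L pipeline API: predictions on new slides are corrected by an expert and fed back as training data, so annotation effort shrinks as the model improves.

```python
def human_in_the_loop(train_fn, predict_fn, correct_fn, unlabeled_slides, rounds=5):
    """Schematic iterative-annotation loop (hypothetical helper functions)."""
    labeled, model = [], None
    for r in range(rounds):
        batch = unlabeled_slides[r::rounds]  # a fresh batch of slides each round
        predictions = ([predict_fn(model, s) for s in batch]
                       if model is not None else [None] * len(batch))
        corrected = [correct_fn(s, p) for s, p in zip(batch, predictions)]  # expert fixes
        labeled.extend(zip(batch, corrected))
        model = train_fn(labeled)            # retrain or fine-tune on all corrections
    return model
```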

Rafiki: Machine Learning as an Analytics Service System

Title Rafiki: Machine Learning as an Analytics Service System
Authors Wei Wang, Sheng Wang, Jinyang Gao, Meihui Zhang, Gang Chen, Teck Khim Ng, Beng Chin Ooi
Abstract Big data analytics has gained massive momentum in the last few years. Applying machine learning models to big data has become an implicit requirement or an expectation for most analysis tasks, especially in high-stakes applications. Typical applications include sentiment analysis against reviews for analyzing on-line products, image classification in food logging applications for monitoring users’ daily intake and stock movement prediction. Extending traditional database systems to support the above analysis is intriguing but challenging. First, it is almost impossible to implement all machine learning models in the database engines. Second, expert knowledge is required to optimize the training and inference procedures in terms of efficiency and effectiveness, which imposes a heavy burden on the system users. In this paper, we develop and present a system, called Rafiki, to provide the training and inference service of machine learning models, and facilitate complex analytics on top of cloud platforms. Rafiki provides distributed hyper-parameter tuning for the training service, and online ensemble modeling for the inference service which trades off between latency and accuracy. Experimental results confirm the efficiency, effectiveness, scalability and usability of Rafiki.
Tasks AutoML, Hyperparameter Optimization, Image Classification, Sentiment Analysis
Published 2018-04-17
URL http://arxiv.org/abs/1804.06087v1
PDF http://arxiv.org/pdf/1804.06087v1.pdf
PWC https://paperswithcode.com/paper/rafiki-machine-learning-as-an-analytics
Repo https://github.com/nginyc/rafiki
Framework none
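
To illustrate the latency/accuracy trade-off in online ensemble inference, here is a toy selection routine. It is not Rafiki's scheduler (which is more sophisticated); the dictionary fields and greedy policy are assumptions for the sake of the example.

```python
def select_ensemble(models, latency_budget_ms):
    """Greedily pick the most accurate models that fit a latency budget,
    assuming ensemble members run sequentially.

    `models` is a list of dicts with hypothetical fields
    {"name", "accuracy", "latency_ms"}.
    """
    chosen, total_latency = [], 0.0
    for m in sorted(models, key=lambda m: m["accuracy"], reverse=True):
        if total_latency + m["latency_ms"] <= latency_budget_ms:
            chosen.append(m["name"])
            total_latency += m["latency_ms"]
    return chosen, total_latency
```

A larger budget admits more (or slower but stronger) members, trading latency for accuracy exactly as the abstract describes.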

Bilingual Sentiment Embeddings: Joint Projection of Sentiment Across Languages

Title Bilingual Sentiment Embeddings: Joint Projection of Sentiment Across Languages
Authors Jeremy Barnes, Roman Klinger, Sabine Schulte im Walde
Abstract Sentiment analysis in low-resource languages suffers from a lack of annotated corpora to estimate high-performing models. Machine translation and bilingual word embeddings provide some relief through cross-lingual sentiment approaches. However, they either require large amounts of parallel data or do not sufficiently capture sentiment information. We introduce Bilingual Sentiment Embeddings (BLSE), which jointly represent sentiment information in a source and target language. This model only requires a small bilingual lexicon, a source-language corpus annotated for sentiment, and monolingual word embeddings for each language. We perform experiments on three language combinations (Spanish, Catalan, Basque) for sentence-level cross-lingual sentiment classification and find that our model significantly outperforms state-of-the-art methods on four out of six experimental setups, as well as capturing complementary information to machine translation. Our analysis of the resulting embedding space provides evidence that it represents sentiment information in the resource-poor target language without any annotated data in that language.
Tasks Machine Translation, Sentiment Analysis, Word Embeddings
Published 2018-05-23
URL http://arxiv.org/abs/1805.09016v1
PDF http://arxiv.org/pdf/1805.09016v1.pdf
PWC https://paperswithcode.com/paper/bilingual-sentiment-embeddings-joint
Repo https://github.com/jbarnesspain/blse
Framework pytorch
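
The joint objective described above combines an alignment term over a small bilingual lexicon with a sentiment term in the projected source space. The sketch below is our reading of that objective, not the reference implementation; `M_src`/`M_trg` are the projection matrices, `lex_src`/`lex_trg` row-aligned lexicon embeddings, `X_sent` source sentence vectors, `y_sent` sentiment labels, and `W` a softmax classifier, all assumed names.

```python
import numpy as np

def blse_loss(M_src, M_trg, lex_src, lex_trg, X_sent, y_sent, W, alpha=0.5):
    """Minimal sketch of a joint projection + sentiment objective."""
    # Alignment: projected translation pairs from the lexicon should be close.
    align = np.mean(np.sum((lex_src @ M_src - lex_trg @ M_trg) ** 2, axis=1))

    # Sentiment: softmax classifier on projected source-language sentences.
    logits = (X_sent @ M_src) @ W
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    xent = -np.mean(np.log(probs[np.arange(len(y_sent)), y_sent] + 1e-12))

    return alpha * xent + (1.0 - alpha) * align
```

At test time, target-language sentences are projected with `M_trg` and classified with the same `W`, which is how sentiment transfers without any annotated target data.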

Improving Aspect Term Extraction with Bidirectional Dependency Tree Representation

Title Improving Aspect Term Extraction with Bidirectional Dependency Tree Representation
Authors Huaishao Luo, Tianrui Li, Bing Liu, Bin Wang, Herwig Unger
Abstract Aspect term extraction is one of the important subtasks in aspect-based sentiment analysis. Previous studies have shown that using dependency tree structure representation is promising for this task. However, most dependency tree structures involve only one directional propagation on the dependency tree. In this paper, we first propose a novel bidirectional dependency tree network to extract dependency structure features from the given sentences. The key idea is to explicitly incorporate both representations gained separately from the bottom-up and top-down propagation on the given dependency syntactic tree. An end-to-end framework is then developed to integrate the embedded representations and BiLSTM plus CRF to learn both tree-structured and sequential features to solve the aspect term extraction problem. Experimental results demonstrate that the proposed model outperforms state-of-the-art baseline models on four benchmark SemEval datasets.
Tasks Aspect-Based Sentiment Analysis, Sentiment Analysis
Published 2018-05-21
URL https://arxiv.org/abs/1805.07889v2
PDF https://arxiv.org/pdf/1805.07889v2.pdf
PWC https://paperswithcode.com/paper/improving-aspect-term-extraction-with
Repo https://github.com/ArrowLuo/BiDTree
Framework tf
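
The bidirectional propagation idea can be pictured as two recursive passes over the dependency tree, sketched below. This is a schematic of the idea in the abstract, not the BiDTree architecture; `tree[node]` lists children, `node_vec[node]` is a node's input representation, and `combine` is an assumed merge function.

```python
def bottom_up(tree, node, node_vec, combine):
    """Bottom-up pass: each node's state summarizes its subtree."""
    child_states = [bottom_up(tree, c, node_vec, combine) for c in tree.get(node, [])]
    return combine(node_vec[node], child_states)

def top_down(tree, node, node_vec, combine, parent_state=None, states=None):
    """Top-down pass: each node's state also depends on its parent's,
    so information flows from the root outward."""
    states = {} if states is None else states
    parents = [parent_state] if parent_state is not None else []
    states[node] = combine(node_vec[node], parents)
    for c in tree.get(node, []):
        top_down(tree, c, node_vec, combine, states[node], states)
    return states
```

Concatenating the two per-node states yields the tree-structured features that are then fed to the BiLSTM-CRF tagger.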

Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages

Title Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages
Authors Katharina Kann, Manuel Mager, Ivan Meza-Ruiz, Hinrich Schütze
Abstract Morphological segmentation for polysynthetic languages is challenging, because a word may consist of many individual morphemes and training data can be extremely scarce. Since neural sequence-to-sequence (seq2seq) models define the state of the art for morphological segmentation in high-resource settings and for (mostly) European languages, we first show that they also obtain competitive performance for Mexican polysynthetic languages in minimal-resource settings. We then propose two novel multi-task training approaches - one with and one without the need for external unlabeled resources - and two corresponding data augmentation methods, improving over the neural baseline for all languages. Finally, we explore cross-lingual transfer as a third way to fortify our neural model and show that we can train one single multi-lingual model for related languages while maintaining comparable or even improved performance, thus reducing the amount of parameters by close to 75%. We provide our morphological segmentation datasets for Mexicanero, Nahuatl, Wixarika and Yorem Nokki for future research.
Tasks Cross-Lingual Transfer, Data Augmentation
Published 2018-04-17
URL http://arxiv.org/abs/1804.06024v1
PDF http://arxiv.org/pdf/1804.06024v1.pdf
PWC https://paperswithcode.com/paper/fortification-of-neural-morphological
Repo https://github.com/pywirrarika/naki
Framework none

A Universal Music Translation Network

Title A Universal Music Translation Network
Authors Noam Mor, Lior Wolf, Adam Polyak, Yaniv Taigman
Abstract We present a method for translating music across musical instruments, genres, and styles. This method is based on a multi-domain wavenet autoencoder, with a shared encoder and a disentangled latent space that is trained end-to-end on waveforms. Employing a diverse training dataset and large net capacity, the domain-independent encoder allows us to translate even from musical domains that were not seen during training. The method is unsupervised and does not rely on supervision in the form of matched samples between domains or musical transcriptions. We evaluate our method on NSynth, as well as on a dataset collected from professional musicians, and achieve convincing translations, even when translating from whistling, potentially enabling the creation of instrumental music by untrained humans.
Tasks
Published 2018-05-21
URL http://arxiv.org/abs/1805.07848v2
PDF http://arxiv.org/pdf/1805.07848v2.pdf
PWC https://paperswithcode.com/paper/a-universal-music-translation-network
Repo https://github.com/ShichengChen/WaveNetSeparateAudio
Framework pytorch

Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs

Title Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs
Authors Yogesh Balaji, Hamed Hassani, Rama Chellappa, Soheil Feizi
Abstract Building on the success of deep learning, two modern approaches to learn a probability model from the data are Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs). VAEs consider an explicit probability model for the data and compute a generative distribution by maximizing a variational lower-bound on the log-likelihood function. GANs, however, compute a generative model by minimizing a distance between observed and generated probability distributions without considering an explicit model for the observed data. The lack of an explicit probability model in GANs prohibits the computation of sample likelihoods in their frameworks and limits their use in statistical inference problems. In this work, we resolve this issue by constructing an explicit probability model that can be used to compute sample likelihood statistics in GANs. In particular, we prove that under this probability model, a family of Wasserstein GANs with an entropy regularization can be viewed as a generative model that maximizes a variational lower-bound on average sample log likelihoods, an approach that VAEs are based on. This result makes a principled connection between two modern generative models, namely GANs and VAEs. In addition to the aforementioned theoretical results, we compute likelihood statistics for GANs trained on Gaussian, MNIST, SVHN, CIFAR-10 and LSUN datasets. Our numerical results validate the proposed theory.
Tasks
Published 2018-10-09
URL https://arxiv.org/abs/1810.04147v2
PDF https://arxiv.org/pdf/1810.04147v2.pdf
PWC https://paperswithcode.com/paper/entropic-gans-meet-vaes-a-statistical
Repo https://github.com/yogeshbalaji/EntropicGANs_meet_VAEs
Framework tf
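
For reference, the "variational lower-bound on the log-likelihood" that VAEs maximize has the standard form below, written for a latent-variable model with prior p(x), decoder p(y|x), and approximate posterior q(x|y); the paper's coupling-based formulation for entropy-regularized Wasserstein GANs differs in its details, so this is only the textbook bound the abstract refers back to.

```latex
\log p(y) \;\ge\; \mathbb{E}_{q(x \mid y)}\big[\log p(y \mid x)\big]
        \;+\; \mathbb{E}_{q(x \mid y)}\big[\log p(x)\big]
        \;+\; H\big(q(x \mid y)\big)
```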

Domain Adapted Word Embeddings for Improved Sentiment Classification

Title Domain Adapted Word Embeddings for Improved Sentiment Classification
Authors Prathusha K Sarma, Yingyu Liang, William A Sethares
Abstract Generic word embeddings are trained on large-scale generic corpora; Domain Specific (DS) word embeddings are trained only on data from a domain of interest. This paper proposes a method to combine the breadth of generic embeddings with the specificity of domain specific embeddings. The resulting embeddings, called Domain Adapted (DA) word embeddings, are formed by aligning corresponding word vectors using Canonical Correlation Analysis (CCA) or the related nonlinear Kernel CCA. Evaluation results on sentiment classification tasks show that the DA embeddings substantially outperform both generic and DS embeddings when used as input features to standard or state-of-the-art sentence encoding algorithms for classification.
Tasks Sentiment Analysis, Word Embeddings
Published 2018-05-11
URL http://arxiv.org/abs/1805.04576v1
PDF http://arxiv.org/pdf/1805.04576v1.pdf
PWC https://paperswithcode.com/paper/domain-adapted-word-embeddings-for-improved
Repo https://github.com/GallupGovt/multivac
Framework none
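
A minimal sketch of the CCA-based combination described above is given below, using scikit-learn's CCA. The averaging of the two projected views is one simple choice; the paper's exact weighting and combination details may differ.

```python
from sklearn.cross_decomposition import CCA

def domain_adapted_embeddings(generic, specific, n_components=100):
    """Combine generic and domain-specific embeddings via CCA (illustrative).

    `generic` and `specific` are (vocab_size, dim) matrices for the same words,
    row-aligned on the shared vocabulary.
    """
    cca = CCA(n_components=n_components)
    g_proj, s_proj = cca.fit_transform(generic, specific)
    # Average the two projected views to obtain the Domain Adapted embeddings.
    return 0.5 * (g_proj + s_proj)
```

Swapping in Kernel CCA would give the nonlinear variant mentioned in the abstract; the rest of the pipeline stays the same.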

Characterizing Entities in the Bitcoin Blockchain

Title Characterizing Entities in the Bitcoin Blockchain
Authors Marc Jourdan, Sebastien Blandin, Laura Wynter, Pralhad Deshpande
Abstract Bitcoin has created a new exchange paradigm within which financial transactions can be trusted without an intermediary. This premise of a free decentralized transactional network however requires, in its current implementation, unrestricted access to the ledger for peer-based transaction verification. A number of studies have shown that, in this pseudonymous context, identities can be leaked based on transaction features or off-network information. In this work, we analyze the information revealed by the pattern of transactions in the neighborhood of a given entity transaction. By definition, these features which pertain to an extended network are not directly controllable by the entity, but might enable leakage of information about transacting entities. We define a number of new features relevant to entity characterization on the Bitcoin Blockchain and study their efficacy in practice. We show that even a weak attacker with shallow data mining knowledge is able to leverage these features to characterize the entity properties.
Tasks
Published 2018-10-29
URL http://arxiv.org/abs/1810.11956v1
PDF http://arxiv.org/pdf/1810.11956v1.pdf
PWC https://paperswithcode.com/paper/characterizing-entities-in-the-bitcoin
Repo https://github.com/Maru92/EntityAddressBitcoin
Framework none
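
As a rough illustration of neighborhood-based features of the kind the paper studies, the sketch below walks a fixed number of hops around an entity's transactions and computes simple statistics. The concrete feature set, field names, and graph representation here are assumptions, not taken from the paper.

```python
def neighborhood_features(tx_graph, entity_txs, hops=1):
    """Toy extraction of transaction-neighborhood features (illustrative).

    `tx_graph` maps a transaction id to the ids of transactions it connects to;
    `entity_txs` are the transactions attributed to the entity of interest.
    """
    features = []
    for tx in entity_txs:
        frontier, seen = {tx}, {tx}
        for _ in range(hops):
            frontier = {n for t in frontier for n in tx_graph.get(t, [])} - seen
            seen |= frontier
        neighborhood = seen - {tx}
        features.append({
            "tx": tx,
            "neighborhood_size": len(neighborhood),
            "avg_degree": (sum(len(tx_graph.get(t, [])) for t in neighborhood)
                           / max(1, len(neighborhood))),
        })
    return features
```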

Identifying and Alleviating Concept Drift in Streaming Tensor Decomposition

Title Identifying and Alleviating Concept Drift in Streaming Tensor Decomposition
Authors Ravdeep Pasricha, Ekta Gujral, Evangelos E. Papalexakis
Abstract Tensor decompositions are used in various data mining applications from social network to medical applications and are extremely useful in discovering latent structures or concepts in the data. Many real-world applications are dynamic in nature and so are their data. To deal with this dynamic nature of data, there exist a variety of online tensor decomposition algorithms. A central assumption in all those algorithms is that the number of latent concepts remains fixed throughout the entire stream. However, this need not be the case. Every incoming batch in the stream may have a different number of latent concepts, and the difference in latent concepts from one tensor batch to another can provide insights into how our findings in a particular application behave and deviate over time. In this paper, we define “concept” and “concept drift” in the context of streaming tensor decomposition, as the manifestation of the variability of latent concepts throughout the stream. Furthermore, we introduce SeekAndDestroy, an algorithm that detects concept drift in streaming tensor decomposition and is able to produce results robust to that drift. To the best of our knowledge, this is the first work that investigates concept drift in streaming tensor decomposition. We extensively evaluate SeekAndDestroy on synthetic datasets, which exhibit a wide variety of realistic drift. Our experiments demonstrate the effectiveness of SeekAndDestroy, both in the detection of concept drift and in the alleviation of its effects, producing results with similar quality to decomposing the entire tensor in one shot. Additionally, in real datasets, SeekAndDestroy outperforms other streaming baselines, while discovering novel useful components.
Tasks
Published 2018-04-25
URL http://arxiv.org/abs/1804.09619v2
PDF http://arxiv.org/pdf/1804.09619v2.pdf
PWC https://paperswithcode.com/paper/identifying-and-alleviating-concept-drift-in
Repo https://github.com/ravdeep003/conceptDrift