Paper Group AWR 117
Episodic Curiosity through Reachability
Title | Episodic Curiosity through Reachability |
Authors | Nikolay Savinov, Anton Raichuk, Raphaël Marinier, Damien Vincent, Marc Pollefeys, Timothy Lillicrap, Sylvain Gelly |
Abstract | Rewards are sparse in the real world and most of today’s reinforcement learning algorithms struggle with such sparsity. One solution to this problem is to allow the agent to create rewards for itself, thus making rewards dense and more suitable for learning. In particular, inspired by curious behaviour in animals, observing something novel could be rewarded with a bonus. Such a bonus is summed with the real task reward, making it possible for RL algorithms to learn from the combined reward. We propose a new curiosity method which uses episodic memory to form the novelty bonus. To determine the bonus, the current observation is compared with the observations in memory. Crucially, the comparison is done based on how many environment steps it takes to reach the current observation from those in memory, which incorporates rich information about environment dynamics. This allows us to overcome the known “couch-potato” issue of prior work, where the agent finds a way to instantly gratify itself by exploiting actions which lead to hardly predictable consequences. We test our approach in visually rich 3D environments in ViZDoom, DMLab and MuJoCo. In navigational tasks from ViZDoom and DMLab, our agent outperforms the state-of-the-art curiosity method ICM. In MuJoCo, an ant equipped with our curiosity module learns locomotion from first-person-view curiosity alone. |
Tasks | |
Published | 2018-10-04 |
URL | https://arxiv.org/abs/1810.02274v5 |
https://arxiv.org/pdf/1810.02274v5.pdf | |
PWC | https://paperswithcode.com/paper/episodic-curiosity-through-reachability |
Repo | https://github.com/google-research/episodic-curiosity |
Framework | tf |
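A minimal NumPy sketch of the reachability-based bonus, assuming a trained comparator is available: the `reachability_score` stub, the 90th-percentile aggregation, and all constants (`alpha`, `beta`, `novelty_threshold`, `memory_size`) are illustrative placeholders rather than the paper's exact settings.

```python
import numpy as np

def reachability_score(obs_a, obs_b):
    """Stub for the trained comparator network: should return a value in [0, 1],
    high if obs_b is reachable from obs_a within k environment steps.
    Replaced here by a toy embedding-distance heuristic (assumption)."""
    return float(np.exp(-np.linalg.norm(obs_a - obs_b)))

class EpisodicCuriosity:
    def __init__(self, alpha=1.0, beta=0.5, novelty_threshold=0.5, memory_size=200):
        self.alpha, self.beta = alpha, beta          # bonus scale and shift
        self.novelty_threshold = novelty_threshold   # gate for adding to memory
        self.memory_size = memory_size
        self.memory = []                             # episodic memory of observations

    def bonus(self, obs):
        if not self.memory:
            self.memory.append(obs)
            return self.alpha * self.beta
        # Aggregate per-memory reachability scores (90th percentile: an assumption).
        scores = [reachability_score(m, obs) for m in self.memory]
        similarity = float(np.percentile(scores, 90))
        # Only store observations that are genuinely far from everything in memory.
        if similarity < self.novelty_threshold and len(self.memory) < self.memory_size:
            self.memory.append(obs)
        return self.alpha * (self.beta - similarity)

ec = EpisodicCuriosity()
obs = np.random.randn(8)
combined_reward = 0.0 + ec.bonus(obs)   # sparse task reward + curiosity bonus
```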
Labeling Gaps Between Words: Recognizing Overlapping Mentions with Mention Separators
Title | Labeling Gaps Between Words: Recognizing Overlapping Mentions with Mention Separators |
Authors | Aldrian Obaja Muis, Wei Lu |
Abstract | In this paper, we propose a new model that is capable of recognizing overlapping mentions. We introduce a novel notion of mention separators that can be effectively used to capture how mentions overlap with one another. On top of a novel multigraph representation that we introduce, we show that efficient and exact inference can still be performed. We present some theoretical analysis on the differences between our model and a recently proposed model for recognizing overlapping mentions, and discuss the possible implications of the differences. Through extensive empirical analysis on standard datasets, we demonstrate the effectiveness of our approach. |
Tasks | |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09073v1 |
http://arxiv.org/pdf/1810.09073v1.pdf | |
PWC | https://paperswithcode.com/paper/labeling-gaps-between-words-recognizing |
Repo | https://github.com/fishjh2/merge_label |
Framework | pytorch |
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Title | Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks |
Authors | Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville |
Abstract | Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hierarchy of constituents. This paper proposes to add such an inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference. |
Tasks | Constituency Grammar Induction, Language Modelling |
Published | 2018-10-22 |
URL | https://arxiv.org/abs/1810.09536v6 |
https://arxiv.org/pdf/1810.09536v6.pdf | |
PWC | https://paperswithcode.com/paper/ordered-neurons-integrating-tree-structures |
Repo | https://github.com/TieDanCuihua/ORDERED-NEURONS-INTEGRATING-TREE-STRUCTURES-INTO-RECURRENT-NEURAL-NETWORKS--tensorflow |
Framework | tf |
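The ordering is enforced through a cumulative softmax ("cumax") that yields monotonically non-decreasing gate activations. The NumPy sketch below reproduces the master-gate arithmetic of a single ON-LSTM cell update; the learned weight matrices that normally produce the gate logits are omitted, so the function's inputs are placeholders.

```python
import numpy as np

def cumax(x):
    """Cumulative softmax: cumsum(softmax(x)); values rise monotonically towards 1."""
    e = np.exp(x - x.max())
    return np.cumsum(e / e.sum())

def on_lstm_cell_update(c_prev, f, i, c_hat, master_f_logits, master_i_logits):
    """One ON-LSTM cell update. f, i, c_hat are the standard LSTM forget gate,
    input gate and candidate cell (assumed computed elsewhere); only the
    master-gate combination from the paper is shown."""
    f_master = cumax(master_f_logits)           # ~[0,..,0,1,..,1]: protects high-level memory
    i_master = 1.0 - cumax(master_i_logits)     # ~[1,..,1,0,..,0]: writes low-level memory
    overlap = f_master * i_master               # neurons governed by both master gates
    f_hat = f * overlap + (f_master - overlap)
    i_hat = i * overlap + (i_master - overlap)
    return f_hat * c_prev + i_hat * c_hat

d = 6
c_new = on_lstm_cell_update(
    c_prev=np.zeros(d),
    f=np.full(d, 0.5), i=np.full(d, 0.5), c_hat=np.random.randn(d),
    master_f_logits=np.random.randn(d), master_i_logits=np.random.randn(d),
)
```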
Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models
Title | Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models |
Authors | Tong Niu, Mohit Bansal |
Abstract | We present two categories of model-agnostic adversarial strategies that reveal the weaknesses of several generative, task-oriented dialogue models: Should-Not-Change strategies that evaluate over-sensitivity to small and semantics-preserving edits, as well as Should-Change strategies that test if a model is over-stable against subtle yet semantics-changing modifications. We next perform adversarial training with each strategy, employing a max-margin approach for negative generative examples. This not only makes the target dialogue model more robust to the adversarial inputs, but also helps it perform significantly better on the original inputs. Moreover, training on all strategies combined yields further improvements, achieving new state-of-the-art performance on the original task (also verified via human evaluation). In addition to adversarial training, we also address robustness at the model level by feeding the model subword units as both inputs and outputs, and show that the resulting model is equally competitive, requires only 1/4 of the original vocabulary size, and is robust to one of the adversarial strategies (to which the original model is vulnerable) even without adversarial training. |
Tasks | |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02079v1 |
http://arxiv.org/pdf/1809.02079v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-over-sensitivity-and-over |
Repo | https://github.com/WolfNiu/AdversarialDialogue |
Framework | tf |
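A hedged sketch of the max-margin idea: the ground-truth response should score higher under the original input than under its adversarial counterpart. The margin value and the batch of sequence log-likelihoods below are illustrative assumptions rather than the paper's exact training setup.

```python
import torch
import torch.nn.functional as F

def max_margin_adversarial_loss(logp_response_given_original,
                                logp_response_given_adversarial,
                                margin=1.0):
    """Hinge loss: the ground-truth response should be at least `margin`
    more likely under the original input than under the perturbed one."""
    return F.relu(margin
                  - logp_response_given_original
                  + logp_response_given_adversarial).mean()

# Toy usage with made-up sequence log-likelihoods for a batch of 3 dialogues.
logp_orig = torch.tensor([-12.3, -8.7, -15.1])
logp_adv = torch.tensor([-11.9, -10.2, -14.8])
loss = max_margin_adversarial_loss(logp_orig, logp_adv)
```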
Predefined Sparseness in Recurrent Sequence Models
Title | Predefined Sparseness in Recurrent Sequence Models |
Authors | Thomas Demeester, Johannes Deleu, Fréderic Godin, Chris Develder |
Abstract | Inducing sparseness while training neural networks has been shown to yield models with a lower memory footprint but similar effectiveness to dense models. However, sparseness is typically induced starting from a dense model, and thus this advantage does not hold during training. We propose techniques to enforce sparseness upfront in recurrent sequence models for NLP applications, to also benefit training. First, in language modeling, we show how to increase hidden state sizes in recurrent layers without increasing the number of parameters, leading to more expressive models. Second, for sequence labeling, we show that word embeddings with predefined sparseness lead to similar performance as dense embeddings, at a fraction of the number of trainable parameters. |
Tasks | Language Modelling, Word Embeddings |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08720v1 |
http://arxiv.org/pdf/1808.08720v1.pdf | |
PWC | https://paperswithcode.com/paper/predefined-sparseness-in-recurrent-sequence |
Repo | https://github.com/tdmeeste/SparseSeqModels |
Framework | pytorch |
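A small PyTorch sketch of predefined embedding sparseness, assuming word ids are sorted by descending corpus frequency so that rarer words keep fewer non-zero dimensions. The linear allocation rule is an illustrative assumption, and a real implementation would store only the unmasked entries to actually save parameters.

```python
import torch
import torch.nn as nn

class PredefinedSparseEmbedding(nn.Module):
    """Embedding matrix with a fixed, frequency-dependent sparsity mask."""
    def __init__(self, vocab_size, dim, min_dims=8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(vocab_size, dim) * 0.1)
        mask = torch.zeros(vocab_size, dim)
        for rank in range(vocab_size):
            # Linearly shrink the number of active dimensions with frequency rank.
            n_active = max(min_dims, int(dim * (1 - rank / vocab_size)))
            mask[rank, :n_active] = 1.0
        self.register_buffer("mask", mask)

    def forward(self, token_ids):
        return (self.weight * self.mask)[token_ids]

emb = PredefinedSparseEmbedding(vocab_size=1000, dim=64)
vectors = emb(torch.tensor([0, 500, 999]))   # a frequent, a mid-frequency and a rare word
```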
Iterative annotation to ease neural network training: Specialized machine learning in medical image analysis
Title | Iterative annotation to ease neural network training: Specialized machine learning in medical image analysis |
Authors | Brendon Lutnick, Brandon Ginley, Darshana Govind, Sean D. McGarry, Peter S. LaViolette, Rabi Yacoub, Sanjay Jain, John E. Tomaszewski, Kuang-Yu Jen, Pinaki Sarder |
Abstract | Neural networks promise to bring robust, quantitative analysis to medical fields, but adoption is limited by the technicalities of training these networks. To address this translation gap between medical researchers and neural networks in the field of pathology, we have created an intuitive interface which utilizes the commonly used whole slide image (WSI) viewer, Aperio ImageScope (Leica Biosystems Imaging, Inc.), for the annotation and display of neural network predictions on WSIs. Leveraging this, we propose the use of a human-in-the-loop strategy to reduce the burden of WSI annotation. We track network performance improvements as a function of iteration and quantify the use of this pipeline for the segmentation of renal histologic findings on WSIs. More specifically, we present network performance when applied to segmentation of renal micro compartments, and demonstrate multi-class segmentation in human and mouse renal tissue slides. Finally, to show the adaptability of this technique to other medical imaging fields, we demonstrate its ability to iteratively segment human prostate glands from radiology imaging data. |
Tasks | |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07509v1 |
http://arxiv.org/pdf/1812.07509v1.pdf | |
PWC | https://paperswithcode.com/paper/iterative-annotation-to-ease-neural-network |
Repo | https://github.com/SarderLab/H-AI-L |
Framework | tf |
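A schematic version of the human-in-the-loop pipeline, with `train_fn`, `predict_fn`, and `correct_fn` left as placeholder callables; the actual system wires these to a segmentation network and the ImageScope annotation workflow.

```python
def human_in_the_loop_segmentation(wsi_slides, train_fn, predict_fn,
                                   correct_fn, n_iterations=5, seed_size=2):
    """Schematic iterative-annotation loop: train on a small seed set, predict on
    a new slide, let an expert correct the prediction in the WSI viewer, and fold
    the corrections back into the training set before retraining."""
    annotated = [(s, correct_fn(s, prediction=None)) for s in wsi_slides[:seed_size]]
    remaining = list(wsi_slides[seed_size:])
    model = None
    for _ in range(n_iterations):
        model = train_fn(annotated, init=model)     # (re)train the segmentation network
        if not remaining:
            break
        slide = remaining.pop(0)
        pred = predict_fn(model, slide)             # network proposes contours
        fixed = correct_fn(slide, prediction=pred)  # expert edits them, which is much
        annotated.append((slide, fixed))            # faster than annotating from scratch
    return model
```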
Rafiki: Machine Learning as an Analytics Service System
Title | Rafiki: Machine Learning as an Analytics Service System |
Authors | Wei Wang, Sheng Wang, Jinyang Gao, Meihui Zhang, Gang Chen, Teck Khim Ng, Beng Chin Ooi |
Abstract | Big data analytics has gained massive momentum in the last few years. Applying machine learning models to big data has become an implicit requirement or an expectation for most analysis tasks, especially in high-stakes applications. Typical applications include sentiment analysis of reviews for analyzing online products, image classification in food-logging applications for monitoring users’ daily intake, and stock movement prediction. Extending traditional database systems to support the above analysis is intriguing but challenging. First, it is almost impossible to implement all machine learning models in database engines. Second, expert knowledge is required to optimize the training and inference procedures in terms of efficiency and effectiveness, which imposes a heavy burden on system users. In this paper, we develop and present a system, called Rafiki, to provide training and inference services for machine learning models and to facilitate complex analytics on top of cloud platforms. Rafiki provides distributed hyper-parameter tuning for the training service, and online ensemble modeling for the inference service, which trades off latency and accuracy. Experimental results confirm the efficiency, effectiveness, scalability and usability of Rafiki. |
Tasks | AutoML, Hyperparameter Optimization, Image Classification, Sentiment Analysis |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06087v1 |
http://arxiv.org/pdf/1804.06087v1.pdf | |
PWC | https://paperswithcode.com/paper/rafiki-machine-learning-as-an-analytics |
Repo | https://github.com/nginyc/rafiki |
Framework | none |
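A toy illustration of the latency/accuracy trade-off behind the inference service: greedily add models by validation accuracy while an (assumed additive) latency budget allows. Rafiki's actual controller is considerably more sophisticated; this only conveys the idea.

```python
def select_ensemble(models, latency_budget_ms):
    """Greedily add models in order of decreasing validation accuracy as long as
    the (assumed additive) latency budget allows."""
    chosen, total_latency = [], 0.0
    for m in sorted(models, key=lambda m: m["accuracy"], reverse=True):
        if total_latency + m["latency_ms"] <= latency_budget_ms:
            chosen.append(m["name"])
            total_latency += m["latency_ms"]
    return chosen, total_latency

models = [
    {"name": "inception", "accuracy": 0.78, "latency_ms": 120},
    {"name": "resnet50",  "accuracy": 0.76, "latency_ms": 90},
    {"name": "mobilenet", "accuracy": 0.71, "latency_ms": 25},
]
print(select_ensemble(models, latency_budget_ms=150))   # (['inception', 'mobilenet'], 145.0)
```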
Bilingual Sentiment Embeddings: Joint Projection of Sentiment Across Languages
Title | Bilingual Sentiment Embeddings: Joint Projection of Sentiment Across Languages |
Authors | Jeremy Barnes, Roman Klinger, Sabine Schulte im Walde |
Abstract | Sentiment analysis in low-resource languages suffers from a lack of annotated corpora to estimate high-performing models. Machine translation and bilingual word embeddings provide some relief through cross-lingual sentiment approaches. However, they either require large amounts of parallel data or do not sufficiently capture sentiment information. We introduce Bilingual Sentiment Embeddings (BLSE), which jointly represent sentiment information in a source and target language. This model only requires a small bilingual lexicon, a source-language corpus annotated for sentiment, and monolingual word embeddings for each language. We perform experiments on three language combinations (Spanish, Catalan, Basque) for sentence-level cross-lingual sentiment classification and find that our model significantly outperforms state-of-the-art methods on four out of six experimental setups, as well as capturing complementary information to machine translation. Our analysis of the resulting embedding space provides evidence that it represents sentiment information in the resource-poor target language without any annotated data in that language. |
Tasks | Machine Translation, Sentiment Analysis, Word Embeddings |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09016v1 |
http://arxiv.org/pdf/1805.09016v1.pdf | |
PWC | https://paperswithcode.com/paper/bilingual-sentiment-embeddings-joint |
Repo | https://github.com/jbarnesspain/blse |
Framework | pytorch |
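A PyTorch sketch of the joint projection, assuming mean-pooled sentence vectors and a simple weighted sum of two losses: a translation-pair alignment term over the bilingual lexicon and a sentiment term on projected source-language sentences. Dimensions and the `alpha` weight are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BLSE(nn.Module):
    """Two linear projections into a shared space, trained jointly with
    (i) an alignment loss over bilingual lexicon pairs and
    (ii) a sentiment loss on projected source-language sentence vectors."""
    def __init__(self, dim=300, shared_dim=300, n_classes=2):
        super().__init__()
        self.proj_src = nn.Linear(dim, shared_dim, bias=False)
        self.proj_tgt = nn.Linear(dim, shared_dim, bias=False)
        self.clf = nn.Linear(shared_dim, n_classes)

    def loss(self, lex_src, lex_tgt, sent_vecs, sent_labels, alpha=0.5):
        align = ((self.proj_src(lex_src) - self.proj_tgt(lex_tgt)) ** 2).sum(dim=1).mean()
        senti = F.cross_entropy(self.clf(self.proj_src(sent_vecs)), sent_labels)
        return alpha * align + (1 - alpha) * senti

model = BLSE(dim=50)
loss = model.loss(
    lex_src=torch.randn(100, 50), lex_tgt=torch.randn(100, 50),  # lexicon translation pairs
    sent_vecs=torch.randn(32, 50),                               # mean-pooled word vectors
    sent_labels=torch.randint(0, 2, (32,)),
)
```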
Improving Aspect Term Extraction with Bidirectional Dependency Tree Representation
Title | Improving Aspect Term Extraction with Bidirectional Dependency Tree Representation |
Authors | Huaishao Luo, Tianrui Li, Bing Liu, Bin Wang, Herwig Unger |
Abstract | Aspect term extraction is one of the important subtasks in aspect-based sentiment analysis. Previous studies have shown that using dependency tree structure representation is promising for this task. However, most dependency tree structures involve only one directional propagation on the dependency tree. In this paper, we first propose a novel bidirectional dependency tree network to extract dependency structure features from the given sentences. The key idea is to explicitly incorporate both representations gained separately from the bottom-up and top-down propagation on the given dependency syntactic tree. An end-to-end framework is then developed to integrate the embedded representations and BiLSTM plus CRF to learn both tree-structured and sequential features to solve the aspect term extraction problem. Experimental results demonstrate that the proposed model outperforms state-of-the-art baseline models on four benchmark SemEval datasets. |
Tasks | Aspect-Based Sentiment Analysis, Sentiment Analysis |
Published | 2018-05-21 |
URL | https://arxiv.org/abs/1805.07889v2 |
https://arxiv.org/pdf/1805.07889v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-aspect-term-extraction-with |
Repo | https://github.com/ArrowLuo/BiDTree |
Framework | tf |
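A toy NumPy sketch of the bidirectional propagation over a dependency tree: one bottom-up pass aggregating child states and one top-down pass propagating parent states, concatenated per token. Real BiDTree uses tree-structured LSTM cells; the tanh/mean combination and the hand-picked heads below are purely illustrative.

```python
import numpy as np

def bidirectional_tree_states(word_vecs, heads, root):
    """Toy bottom-up/top-down propagation over a dependency tree, where heads[i]
    is the head index of token i (the root points to itself). The concatenated
    states would be fed into the BiLSTM+CRF tagger downstream."""
    n, d = word_vecs.shape
    children = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if i != root:
            children[h].append(i)

    up = np.zeros((n, d))
    def bottom_up(i):
        child_states = [bottom_up(c) for c in children[i]]
        agg = np.mean(child_states, axis=0) if child_states else np.zeros(d)
        up[i] = np.tanh(word_vecs[i] + agg)
        return up[i]
    bottom_up(root)

    down = np.zeros((n, d))
    def top_down(i, parent_state):
        down[i] = np.tanh(word_vecs[i] + parent_state)
        for c in children[i]:
            top_down(c, down[i])
    top_down(root, np.zeros(d))

    return np.concatenate([up, down], axis=1)

# "delicious food but slow service", with a hand-picked head for each token.
vecs = np.random.randn(5, 16)
states = bidirectional_tree_states(vecs, heads=[1, 1, 1, 4, 1], root=1)   # (5, 32)
```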
Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages
Title | Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages |
Authors | Katharina Kann, Manuel Mager, Ivan Meza-Ruiz, Hinrich Schütze |
Abstract | Morphological segmentation for polysynthetic languages is challenging, because a word may consist of many individual morphemes and training data can be extremely scarce. Since neural sequence-to-sequence (seq2seq) models define the state of the art for morphological segmentation in high-resource settings and for (mostly) European languages, we first show that they also obtain competitive performance for Mexican polysynthetic languages in minimal-resource settings. We then propose two novel multi-task training approaches (one with and one without the need for external unlabeled resources) and two corresponding data augmentation methods, improving over the neural baseline for all languages. Finally, we explore cross-lingual transfer as a third way to fortify our neural model and show that we can train a single multi-lingual model for related languages while maintaining comparable or even improved performance, thus reducing the number of parameters by close to 75%. We provide our morphological segmentation datasets for Mexicanero, Nahuatl, Wixarika and Yorem Nokki for future research. |
Tasks | Cross-Lingual Transfer, Data Augmentation |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06024v1 |
http://arxiv.org/pdf/1804.06024v1.pdf | |
PWC | https://paperswithcode.com/paper/fortification-of-neural-morphological |
Repo | https://github.com/pywirrarika/naki |
Framework | none |
A Universal Music Translation Network
Title | A Universal Music Translation Network |
Authors | Noam Mor, Lior Wolf, Adam Polyak, Yaniv Taigman |
Abstract | We present a method for translating music across musical instruments, genres, and styles. This method is based on a multi-domain wavenet autoencoder, with a shared encoder and a disentangled latent space that is trained end-to-end on waveforms. Employing a diverse training dataset and large net capacity, the domain-independent encoder allows us to translate even from musical domains that were not seen during training. The method is unsupervised and does not rely on supervision in the form of matched samples between domains or musical transcriptions. We evaluate our method on NSynth, as well as on a dataset collected from professional musicians, and achieve convincing translations, even when translating from whistling, potentially enabling the creation of instrumental music by untrained humans. |
Tasks | |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.07848v2 |
http://arxiv.org/pdf/1805.07848v2.pdf | |
PWC | https://paperswithcode.com/paper/a-universal-music-translation-network |
Repo | https://github.com/ShichengChen/WaveNetSeparateAudio |
Framework | pytorch |
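A schematic PyTorch layout of the shared-encoder, per-domain-decoder design. The real model is a WaveNet autoencoder trained on raw waveforms with pitch augmentation and a domain-confusion term; the small 1-D convolution stacks here are placeholders for those components.

```python
import torch
import torch.nn as nn

class MultiDomainTranslator(nn.Module):
    """Shared, domain-independent encoder and one decoder per output domain."""
    def __init__(self, n_domains=6, channels=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, padding=4),
        )
        self.decoders = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(channels, channels, kernel_size=9, padding=4), nn.ReLU(),
                nn.Conv1d(channels, 1, kernel_size=9, padding=4),
            ) for _ in range(n_domains)
        ])

    def forward(self, waveform, target_domain):
        latent = self.encoder(waveform)               # disentangled latent code
        return self.decoders[target_domain](latent)   # render it in the target domain

model = MultiDomainTranslator()
out = model(torch.randn(1, 1, 16000), target_domain=2)   # 1 s of 16 kHz audio into domain 2
```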
Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs
Title | Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs |
Authors | Yogesh Balaji, Hamed Hassani, Rama Chellappa, Soheil Feizi |
Abstract | Building on the success of deep learning, two modern approaches to learning a probability model from data are Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs). VAEs consider an explicit probability model for the data and compute a generative distribution by maximizing a variational lower-bound on the log-likelihood function. GANs, however, compute a generative model by minimizing a distance between observed and generated probability distributions without considering an explicit model for the observed data. The lack of an explicit probability model in GANs prohibits the computation of sample likelihoods within their framework and limits their use in statistical inference problems. In this work, we resolve this issue by constructing an explicit probability model that can be used to compute sample likelihood statistics in GANs. In particular, we prove that under this probability model, a family of Wasserstein GANs with an entropy regularization can be viewed as a generative model that maximizes a variational lower-bound on average sample log likelihoods, an approach that VAEs are based on. This result makes a principled connection between two modern generative models, namely GANs and VAEs. In addition to the aforementioned theoretical results, we compute likelihood statistics for GANs trained on Gaussian, MNIST, SVHN, CIFAR-10 and LSUN datasets. Our numerical results validate the proposed theory. |
Tasks | |
Published | 2018-10-09 |
URL | https://arxiv.org/abs/1810.04147v2 |
https://arxiv.org/pdf/1810.04147v2.pdf | |
PWC | https://paperswithcode.com/paper/entropic-gans-meet-vaes-a-statistical |
Repo | https://github.com/yogeshbalaji/EntropicGANs_meet_VAEs |
Framework | tf |
Domain Adapted Word Embeddings for Improved Sentiment Classification
Title | Domain Adapted Word Embeddings for Improved Sentiment Classification |
Authors | Prathusha K Sarma, Yingyu Liang, William A Sethares |
Abstract | Generic word embeddings are trained on large-scale generic corpora; Domain Specific (DS) word embeddings are trained only on data from a domain of interest. This paper proposes a method to combine the breadth of generic embeddings with the specificity of domain specific embeddings. The resulting embeddings, called Domain Adapted (DA) word embeddings, are formed by aligning corresponding word vectors using Canonical Correlation Analysis (CCA) or the related nonlinear Kernel CCA. Evaluation results on sentiment classification tasks show that the DA embeddings substantially outperform both generic and DS embeddings when used as input features to standard or state-of-the-art sentence encoding algorithms for classification. |
Tasks | Sentiment Analysis, Word Embeddings |
Published | 2018-05-11 |
URL | http://arxiv.org/abs/1805.04576v1 |
http://arxiv.org/pdf/1805.04576v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-adapted-word-embeddings-for-improved |
Repo | https://github.com/GallupGovt/multivac |
Framework | none |
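A minimal sketch of the CCA alignment with scikit-learn, assuming a shared vocabulary between the generic and domain-specific embeddings; averaging the two projected views and the choice of `n_components` are assumptions, and the paper additionally explores kernel CCA.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def domain_adapt_embeddings(generic, specific, n_components=50):
    """Align the two embedding spaces for a shared vocabulary with linear CCA and
    average the projected views to obtain the Domain Adapted (DA) embeddings."""
    cca = CCA(n_components=n_components, max_iter=1000)
    gen_proj, spec_proj = cca.fit_transform(generic, specific)
    return (gen_proj + spec_proj) / 2.0

# Toy data: 500 shared vocabulary words, 300-d generic and 100-d domain-specific vectors.
rng = np.random.default_rng(0)
generic = rng.normal(size=(500, 300))
specific = rng.normal(size=(500, 100))
da_embeddings = domain_adapt_embeddings(generic, specific)
print(da_embeddings.shape)   # (500, 50)
```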
Characterizing Entities in the Bitcoin Blockchain
Title | Characterizing Entities in the Bitcoin Blockchain |
Authors | Marc Jourdan, Sebastien Blandin, Laura Wynter, Pralhad Deshpande |
Abstract | Bitcoin has created a new exchange paradigm within which financial transactions can be trusted without an intermediary. This premise of a free decentralized transactional network however requires, in its current implementation, unrestricted access to the ledger for peer-based transaction verification. A number of studies have shown that, in this pseudonymous context, identities can be leaked based on transaction features or off-network information. In this work, we analyze the information revealed by the pattern of transactions in the neighborhood of a given entity transaction. By definition, these features which pertain to an extended network are not directly controllable by the entity, but might enable leakage of information about transacting entities. We define a number of new features relevant to entity characterization on the Bitcoin Blockchain and study their efficacy in practice. We show that even a weak attacker with shallow data mining knowledge is able to leverage these features to characterize the entity properties. |
Tasks | |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.11956v1 |
http://arxiv.org/pdf/1810.11956v1.pdf | |
PWC | https://paperswithcode.com/paper/characterizing-entities-in-the-bitcoin |
Repo | https://github.com/Maru92/EntityAddressBitcoin |
Framework | none |
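An illustrative (not the paper's) feature extractor over an entity's one-hop transaction neighborhood, applied to a toy transaction list; the actual study defines a considerably richer feature set over the Blockchain graph.

```python
import numpy as np

def neighborhood_features(entity, transactions):
    """A few generic statistics over the transactions within one hop of an entity.
    `transactions` is a list of dicts with keys "sender", "receiver", "value"."""
    direct = [t for t in transactions if entity in (t["sender"], t["receiver"])]
    counterparties = {t["receiver"] if t["sender"] == entity else t["sender"] for t in direct}
    neigh = [t for t in transactions
             if t["sender"] in counterparties or t["receiver"] in counterparties]
    values = np.array([t["value"] for t in neigh]) if neigh else np.array([0.0])
    return {
        "n_direct_tx": len(direct),
        "n_counterparties": len(counterparties),
        "n_neighborhood_tx": len(neigh),
        "neighborhood_mean_value": float(values.mean()),
        "neighborhood_value_std": float(values.std()),
    }

txs = [
    {"sender": "A", "receiver": "B", "value": 0.4},
    {"sender": "B", "receiver": "C", "value": 1.2},
    {"sender": "D", "receiver": "A", "value": 0.1},
]
print(neighborhood_features("A", txs))
```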
Identifying and Alleviating Concept Drift in Streaming Tensor Decomposition
Title | Identifying and Alleviating Concept Drift in Streaming Tensor Decomposition |
Authors | Ravdeep Pasricha, Ekta Gujral, Evangelos E. Papalexakis |
Abstract | Tensor decompositions are used in various data mining applications from social network to medical applications and are extremely useful in discovering latent structures or concepts in the data. Many real-world applications are dynamic in nature and so are their data. To deal with this dynamic nature of data, there exist a variety of online tensor decomposition algorithms. A central assumption in all those algorithms is that the number of latent concepts remains fixed throughout the entire stream. However, this need not be the case. Every incoming batch in the stream may have a different number of latent concepts, and the difference in latent concepts from one tensor batch to another can provide insights into how our findings in a particular application behave and deviate over time. In this paper, we define “concept” and “concept drift” in the context of streaming tensor decomposition, as the manifestation of the variability of latent concepts throughout the stream. Furthermore, we introduce SeekAndDestroy, an algorithm that detects concept drift in streaming tensor decomposition and is able to produce results robust to that drift. To the best of our knowledge, this is the first work that investigates concept drift in streaming tensor decomposition. We extensively evaluate SeekAndDestroy on synthetic datasets, which exhibit a wide variety of realistic drift. Our experiments demonstrate the effectiveness of SeekAndDestroy, both in the detection of concept drift and in the alleviation of its effects, producing results with similar quality to decomposing the entire tensor in one shot. Additionally, in real datasets, SeekAndDestroy outperforms other streaming baselines, while discovering novel useful components. |
Tasks | |
Published | 2018-04-25 |
URL | http://arxiv.org/abs/1804.09619v2 |
http://arxiv.org/pdf/1804.09619v2.pdf | |
PWC | https://paperswithcode.com/paper/identifying-and-alleviating-concept-drift-in |
Repo | https://github.com/ravdeep003/conceptDrift |
Framework | none |
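A crude sketch of per-batch concept (rank) tracking with TensorLy, assuming the number of latent concepts can be estimated as the smallest CP rank whose reconstruction error falls below a tolerance. SeekAndDestroy uses a dedicated rank-estimation routine and explicitly matches and updates components across batches, which is not shown here.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

def estimate_rank(batch, candidate_ranks=(1, 2, 3, 4, 5), tol=0.05):
    """Pick the smallest CP rank whose relative reconstruction error is below `tol`."""
    norm = tl.norm(batch)
    for r in candidate_ranks:
        cp = parafac(batch, rank=r, n_iter_max=200)
        if tl.norm(batch - tl.cp_to_tensor(cp)) / norm < tol:
            return r
    return candidate_ranks[-1]

def detect_concept_drift(batches):
    """Flag batches whose estimated number of latent concepts differs from the previous batch."""
    prev = None
    for i, batch in enumerate(batches):
        rank = estimate_rank(tl.tensor(batch))
        if prev is not None and rank != prev:
            print(f"batch {i}: concept drift, rank {prev} -> {rank}")
        prev = rank

batches = [np.random.rand(10, 10, 5) for _ in range(3)]
detect_concept_drift(batches)
```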