Paper Group ANR 739
xSense: Learning Sense-Separated Sparse Representations and Textual Definitions for Explainable Word Sense Networks. Learning Safe Policies with Expert Guidance. Convolutional Gated Recurrent Units for Medical Relation Classification. DJAM: distributed Jacobi asynchronous method for learning personal models. Synergistic Reconstruction and Synthesis …
xSense: Learning Sense-Separated Sparse Representations and Textual Definitions for Explainable Word Sense Networks
Title | xSense: Learning Sense-Separated Sparse Representations and Textual Definitions for Explainable Word Sense Networks |
Authors | Ting-Yun Chang, Ta-Chung Chi, Shang-Chi Tsai, Yun-Nung Chen |
Abstract | Despite the success achieved on various natural language processing tasks, word embeddings are difficult to interpret due to their dense vector representations. This paper focuses on interpreting embeddings along several aspects, including sense separation in the vector dimensions and definition generation. Specifically, given a context together with a target word, our algorithm first projects the target word embedding to a high-dimensional sparse vector and picks the specific dimensions that, given the encoded contextual information, best explain the semantic meaning of the target word, so that the sense of the target word can be indirectly inferred. Finally, our algorithm applies an RNN to generate the textual definition of the target word in human-readable form, which enables direct interpretation of the corresponding word embedding. This paper also introduces a large, high-quality context-definition dataset that consists of sense definitions together with multiple example sentences per polysemous word, a valuable resource for definition modeling and word sense disambiguation. Experiments show superior performance in terms of BLEU score and human evaluation. |
Tasks | Word Embeddings, Word Sense Disambiguation |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03348v1 |
PDF | http://arxiv.org/pdf/1809.03348v1.pdf |
PWC | https://paperswithcode.com/paper/xsense-learning-sense-separated-sparse |
Repo | |
Framework | |
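The sparse-projection step described in the abstract can be illustrated with a minimal PyTorch sketch: a dense word embedding is projected to a high-dimensional non-negative code, and the context selects the few dimensions that best explain the target sense. All module names, layer choices, and sizes below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the sense-separated sparse projection idea.
import torch
import torch.nn as nn

class SparseSenseProjector(nn.Module):
    def __init__(self, emb_dim=300, sparse_dim=3000, k=5):
        super().__init__()
        self.project = nn.Linear(emb_dim, sparse_dim)         # dense -> high-dimensional code
        self.context_scorer = nn.Linear(emb_dim, sparse_dim)  # context scores each dimension
        self.k = k

    def forward(self, word_emb, context_emb):
        # ReLU keeps the high-dimensional code non-negative and mostly sparse.
        sparse_code = torch.relu(self.project(word_emb))           # (B, sparse_dim)
        dim_weights = torch.softmax(self.context_scorer(context_emb), dim=-1)
        # The top-k weighted dimensions stand in for the inferred sense;
        # a definition-generating RNN would condition on them downstream.
        topk = torch.topk(sparse_code * dim_weights, self.k, dim=-1)
        return sparse_code, topk.indices

model = SparseSenseProjector()
code, sense_dims = model(torch.randn(2, 300), torch.randn(2, 300))  # random stand-ins
```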
Learning Safe Policies with Expert Guidance
Title | Learning Safe Policies with Expert Guidance |
Authors | Jessie Huang, Fa Wu, Doina Precup, Yang Cai |
Abstract | We propose a framework for ensuring safe behavior of a reinforcement learning agent when the reward function may be difficult to specify. In order to do this, we rely on the existence of demonstrations from expert policies, and we provide a theoretical framework for the agent to optimize in the space of rewards consistent with its existing knowledge. We propose two methods to solve the resulting optimization: an exact ellipsoid-based method and a method in the spirit of the “follow-the-perturbed-leader” algorithm. Our experiments demonstrate the behavior of our algorithm in both discrete and continuous problems. The trained agent safely avoids states with potential negative effects while imitating the behavior of the expert in the other states. |
Tasks | |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.08313v2 |
PDF | http://arxiv.org/pdf/1805.08313v2.pdf |
PWC | https://paperswithcode.com/paper/learning-safe-policies-with-expert-guidance |
Repo | |
Framework | |
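The abstract mentions a solver in the spirit of follow-the-perturbed-leader. The snippet below is only a generic FTPL loop for online linear optimization over a small finite decision set, meant to illustrate that algorithmic idea; the paper's reward-consistency constraints and policy optimization are not modeled, and all quantities are synthetic.

```python
# Generic follow-the-perturbed-leader (FTPL) loop; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
decisions = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])  # hypothetical feature vectors
cum_reward = np.zeros(2)                                     # accumulated linear reward vector
payoff = 0.0

for t in range(100):
    noise = rng.exponential(scale=1.0, size=2)               # perturbation of the leader
    choice = np.argmax(decisions @ (cum_reward + noise))     # pick the perturbed leader
    observed = rng.normal(size=2)                            # stand-in for environment feedback
    payoff += decisions[choice] @ observed
    cum_reward += observed
```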
Convolutional Gated Recurrent Units for Medical Relation Classification
Title | Convolutional Gated Recurrent Units for Medical Relation Classification |
Authors | Bin He, Yi Guan, Rui Dai |
Abstract | Convolutional neural network (CNN) and recurrent neural network (RNN) models have become the mainstream methods for relation classification. We propose a unified architecture, which exploits the advantages of CNN and RNN simultaneously, to identify medical relations in clinical records, with only word embedding features. Our model learns phrase-level features through a CNN layer, and these feature representations are directly fed into a bidirectional gated recurrent unit (GRU) layer to capture long-term feature dependencies. We evaluate our model on two clinical datasets, and experiments demonstrate that our model performs significantly better than previous single-model methods on both datasets. |
Tasks | Relation Classification |
Published | 2018-07-29 |
URL | http://arxiv.org/abs/1807.11082v1 |
PDF | http://arxiv.org/pdf/1807.11082v1.pdf |
PWC | https://paperswithcode.com/paper/convolutional-gated-recurrent-units-for |
Repo | |
Framework | |
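The architecture in the abstract (word embeddings → CNN phrase features → bidirectional GRU → relation classifier) can be sketched in a few lines of PyTorch. The hyperparameters and the single convolution width are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal CNN + bidirectional GRU relation classifier sketch.
import torch
import torch.nn as nn

class ConvGRUClassifier(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, channels=128, hidden=128, n_relations=8):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, channels, kernel_size=3, padding=1)  # phrase-level features
        self.gru = nn.GRU(channels, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_relations)

    def forward(self, tokens):                               # tokens: (B, T) word ids
        x = self.emb(tokens).transpose(1, 2)                 # (B, emb_dim, T) for Conv1d
        phrases = torch.relu(self.conv(x)).transpose(1, 2)   # (B, T, channels)
        _, h = self.gru(phrases)                             # h: (2, B, hidden)
        return self.out(torch.cat([h[0], h[1]], dim=-1))     # relation logits

logits = ConvGRUClassifier()(torch.randint(0, 10000, (4, 20)))
```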
DJAM: distributed Jacobi asynchronous method for learning personal models
Title | DJAM: distributed Jacobi asynchronous method for learning personal models |
Authors | Inês Almeida, João Xavier |
Abstract | Processing data collected by a network of agents often boils down to solving an optimization problem. The distributed nature of these problems calls for methods that are, themselves, distributed. While most collaborative learning problems require agents to reach a common (or consensus) model, there are situations in which the consensus solution may not be optimal. For instance, agents may want to reach a compromise between agreeing with their neighbors and minimizing a personal loss function. We present DJAM, a Jacobi-like distributed algorithm for learning personalized models. This method is implementation-friendly: it has no hyperparameters that need tuning, it is asynchronous, and its updates only require single-neighbor interactions. We prove that DJAM converges with probability one to the solution, provided that the personal loss functions are strongly convex and have Lipschitz gradient. We then give evidence that DJAM is on par with state-of-the-art methods: our method reaches a solution with error similar to the error of a carefully tuned ADMM in about the same number of single-neighbor interactions. |
Tasks | |
Published | 2018-03-26 |
URL | http://arxiv.org/abs/1803.09737v2 |
PDF | http://arxiv.org/pdf/1803.09737v2.pdf |
PWC | https://paperswithcode.com/paper/djam-distributed-jacobi-asynchronous-method |
Repo | |
Framework | |
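A toy NumPy simulation conveys the flavor of the asynchronous, single-neighbor, Jacobi-style updates described above. The quadratic personal losses, the coupling weight rho, and the exact update rule below are simplifying assumptions, not the paper's formulation or its convergence setting.

```python
# Toy asynchronous Jacobi-style updates for personalized models on a path graph.
import numpy as np

rng = np.random.default_rng(1)
n, d, rho = 5, 3, 1.0
targets = rng.normal(size=(n, d))                       # theta_i: each agent's personal target
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
w = np.zeros((n, d))                                    # personal models
copies = {i: {j: np.zeros(d) for j in neighbors[i]} for i in range(n)}  # stored neighbor models

for _ in range(5000):
    i = rng.integers(n)                                 # asynchronous wake-up
    j = rng.choice(neighbors[i])                        # single-neighbor interaction
    copies[i][j] = w[j].copy()                          # refresh only that neighbor's copy
    # Minimizer of 0.5*||w_i - theta_i||^2 + (rho/2) * sum_k ||w_i - copy_k||^2
    # with the stored neighbor copies held fixed (a Jacobi-type step).
    s = sum(copies[i].values())
    w[i] = (targets[i] + rho * s) / (1.0 + rho * len(neighbors[i]))
```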
Synergistic Reconstruction and Synthesis via Generative Adversarial Networks for Accelerated Multi-Contrast MRI
Title | Synergistic Reconstruction and Synthesis via Generative Adversarial Networks for Accelerated Multi-Contrast MRI |
Authors | Salman Ul Hassan Dar, Mahmut Yurt, Mohammad Shahdloo, Muhammed Emrullah Ildız, Tolga Çukur |
Abstract | Multi-contrast MRI acquisitions of an anatomy enrich the magnitude of information available for diagnosis. Yet, excessive scan times associated with additional contrasts may be a limiting factor. Two mainstream approaches for enhanced scan efficiency are reconstruction of undersampled acquisitions and synthesis of missing acquisitions. In reconstruction, performance decreases towards higher acceleration factors with diminished sampling density particularly at high-spatial-frequencies. In synthesis, the absence of data samples from the target contrast can lead to artefactual sensitivity or insensitivity to image features. Here we propose a new approach for synergistic reconstruction-synthesis of multi-contrast MRI based on conditional generative adversarial networks. The proposed method preserves high-frequency details of the target contrast by relying on the shared high-frequency information available from the source contrast, and prevents feature leakage or loss by relying on the undersampled acquisitions of the target contrast. Demonstrations on brain MRI datasets from healthy subjects and patients indicate the superior performance of the proposed method compared to previous state-of-the-art. The proposed method can help improve the quality and scan efficiency of multi-contrast MRI exams. |
Tasks | |
Published | 2018-05-27 |
URL | http://arxiv.org/abs/1805.10704v1 |
PDF | http://arxiv.org/pdf/1805.10704v1.pdf |
PWC | https://paperswithcode.com/paper/synergistic-reconstruction-and-synthesis-via |
Repo | |
Framework | |
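The complementary inputs described in the abstract, an undersampled target contrast plus a fully sampled source contrast, together with a standard k-space data-consistency step, can be sketched as follows. The generator itself is stubbed out, and the shapes, sampling mask, and single data-consistency pass are illustrative assumptions.

```python
# Sketch of conditional inputs and a k-space data-consistency step (generator stubbed).
import numpy as np

def data_consistency(pred_img, acquired_kspace, mask):
    # Keep acquired k-space samples of the target contrast, fill the rest from the prediction.
    pred_k = np.fft.fft2(pred_img)
    return np.abs(np.fft.ifft2(np.where(mask, acquired_kspace, pred_k)))

rng = np.random.default_rng(0)
source = rng.normal(size=(64, 64))                    # fully sampled source contrast (stub)
target_k = np.fft.fft2(rng.normal(size=(64, 64)))     # "acquired" target k-space (stub)
mask = rng.random((64, 64)) < 0.3                     # ~30% sampled k-space
zero_filled = np.abs(np.fft.ifft2(target_k * mask))   # undersampled target reconstruction
generator_input = np.stack([zero_filled, source])     # what a conditional GAN generator would see
prediction = zero_filled                              # placeholder for the generator output
recon = data_consistency(prediction, target_k, mask)
```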
Sparse and Low-rank Tensor Estimation via Cubic Sketchings
Title | Sparse and Low-rank Tensor Estimation via Cubic Sketchings |
Authors | Botao Hao, Anru Zhang, Guang Cheng |
Abstract | In this paper, we propose a general framework for sparse and low-rank tensor estimation from cubic sketchings. A two-stage non-convex implementation is developed based on sparse tensor decomposition and thresholded gradient descent, which ensures exact recovery in the noiseless case and stable recovery in the noisy case with high probability. The non-asymptotic analysis sheds light on an interplay between optimization error and statistical error. The proposed procedure is shown to be rate-optimal under certain conditions. As a technical by-product, novel high-order concentration inequalities are derived for studying high-moment sub-Gaussian tensors. An interesting tensor formulation illustrates the potential application to high-order interaction pursuit in high-dimensional linear regression. |
Tasks | |
Published | 2018-01-29 |
URL | https://arxiv.org/abs/1801.09326v4 |
PDF | https://arxiv.org/pdf/1801.09326v4.pdf |
PWC | https://paperswithcode.com/paper/sparse-and-low-rank-tensor-estimation-via |
Repo | |
Framework | |
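A toy sparse-recovery analogue illustrates the thresholded-gradient-descent stage mentioned above: gradient steps on a least-squares loss followed by hard thresholding that keeps the largest entries. The paper's actual setting (rank-1 sketches of a sparse, low-rank tensor) is richer; the vector problem, sizes, and step size here are assumptions for illustration.

```python
# Toy iterative hard-thresholding loop for sparse vector recovery.
import numpy as np

rng = np.random.default_rng(0)
d, n, s = 100, 80, 4
x_true = np.zeros(d)
x_true[rng.choice(d, s, replace=False)] = rng.normal(size=s)
A = rng.normal(size=(n, d)) / np.sqrt(n)              # sketching/measurement matrix
y = A @ x_true

def hard_threshold(x, s):
    keep = np.argsort(np.abs(x))[-s:]                 # indices of the s largest entries
    out = np.zeros_like(x)
    out[keep] = x[keep]
    return out

x = np.zeros(d)
for _ in range(300):
    x = hard_threshold(x - 0.5 * A.T @ (A @ x - y), s)
print(np.linalg.norm(x - x_true))                     # should be small in this toy setting
```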
Why Do Neural Response Generation Models Prefer Universal Replies?
Title | Why Do Neural Response Generation Models Prefer Universal Replies? |
Authors | Bowen Wu, Nan Jiang, Zhifeng Gao, Mengyuan Li, Zongsheng Wang, Suke Li, Qihang Feng, Wenge Rong, Baoxun Wang |
Abstract | Recent advances in sequence-to-sequence learning reveal a purely data-driven approach to the response generation task. Despite its diverse applications, existing neural models are prone to producing short and generic replies, making it infeasible to tackle open-domain challenges. In this research, we analyze this critical issue in light of the model's optimization goal and the specific characteristics of the human-to-human dialog corpus. By decomposing the black box into parts, we conduct a detailed analysis of the probability limit to reveal the reason behind these universal replies. Based on these analyses, we propose a max-margin ranking regularization term that discourages the models from leaning toward these replies. Finally, empirical experiments on case studies and benchmarks with several metrics validate this approach. |
Tasks | |
Published | 2018-08-28 |
URL | https://arxiv.org/abs/1808.09187v2 |
PDF | https://arxiv.org/pdf/1808.09187v2.pdf |
PWC | https://paperswithcode.com/paper/why-do-neural-response-generation-models |
Repo | |
Framework | |
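One way to read the proposed regularizer is as a margin between the log-likelihood the model assigns to the reference reply and to a generic reply. The sketch below encodes that reading; the exact form, normalization, and weight in the paper may differ.

```python
# Max-margin ranking regularizer sketch (assumed form, not the paper's exact loss).
import torch

def ranking_regularizer(logp_reference, logp_generic, margin=1.0):
    # Inputs are per-sequence mean token log-probabilities, shape (B,).
    return torch.clamp(margin - (logp_reference - logp_generic), min=0.0).mean()

nll = torch.tensor(2.3)                                            # usual seq2seq NLL (stub)
reg = ranking_regularizer(torch.tensor([-1.2, -0.9]), torch.tensor([-0.8, -1.5]))
total_loss = nll + 0.1 * reg                                       # 0.1 is an assumed weight
```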
Representation Balancing MDPs for Off-Policy Policy Evaluation
Title | Representation Balancing MDPs for Off-Policy Policy Evaluation |
Authors | Yao Liu, Omer Gottesman, Aniruddh Raghu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, Emma Brunskill |
Abstract | We study the problem of off-policy policy evaluation (OPPE) in RL. In contrast to prior work, we consider how to estimate both the individual policy value and average policy value accurately. We draw inspiration from recent work in causal reasoning, and propose a new finite sample generalization error bound for value estimates from MDP models. Using this upper bound as an objective, we develop a learning algorithm of an MDP model with a balanced representation, and show that our approach can yield substantially lower MSE in common synthetic benchmarks and a HIV treatment simulation domain. |
Tasks | |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09044v4 |
PDF | http://arxiv.org/pdf/1805.09044v4.pdf |
PWC | https://paperswithcode.com/paper/representation-balancing-mdps-for-off-policy |
Repo | |
Framework | |
Comparison of computer systems and ranking criteria for automatic melanoma detection in dermoscopic images
Title | Comparison of computer systems and ranking criteria for automatic melanoma detection in dermoscopic images |
Authors | Kajsa Møllersen, Maciel Zortea, Thomas R. Schopf, Herbert Kirchesch, Fred Godtliebsen |
Abstract | Melanoma is the deadliest form of skin cancer. Computer systems can assist in melanoma detection, but are not widespread in clinical practice. In 2016, an open challenge in classification of dermoscopic images of skin lesions was announced. A training set of 900 images with corresponding class labels and semi-automatic/manual segmentation masks was released for the challenge. An independent test set of 379 images was used to rank the participants. This article demonstrates the impact of ranking criteria, segmentation method and classifier, and highlights the clinical perspective. We compare five different measures for diagnostic accuracy by analysing the resulting ranking of the computer systems in the challenge. The choice of performance measure had a great impact on the ranking. Systems that were ranked among the top three for one measure dropped to the bottom half when the performance measure was changed. Nevus Doctor, a computer system previously developed by the authors, was used to investigate the impact of segmentation and classifier. The unexpectedly small impact of automatic versus semi-automatic/manual segmentation suggests that improvements of the automatic segmentation method w.r.t. resemblance to semi-automatic/manual segmentation will not improve diagnostic accuracy substantially. A small set of similar classification algorithms was used to investigate the impact of classifier on the diagnostic accuracy. The variability in diagnostic accuracy for different classifier algorithms was larger than the variability for segmentation methods, and suggests a focus for future investigations. From a clinical perspective, the misclassification of a melanoma as benign has far greater cost than the misclassification of a benign lesion. For computer systems to have clinical impact, their performance should be ranked by a high-sensitivity measure. |
Tasks | |
Published | 2018-02-05 |
URL | http://arxiv.org/abs/1802.01301v1 |
PDF | http://arxiv.org/pdf/1802.01301v1.pdf |
PWC | https://paperswithcode.com/paper/comparison-of-computer-systems-and-ranking |
Repo | |
Framework | |
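The point about ranking criteria is easy to reproduce on synthetic scores: the same systems can order differently under an overall measure such as AUC than under a sensitivity-oriented measure. Everything below is synthetic and only illustrates how a ranking is recomputed per measure.

```python
# Synthetic illustration: score the same systems under two different measures.
import numpy as np
from sklearn.metrics import roc_auc_score, recall_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)                         # 1 = melanoma (synthetic labels)
systems = {name: np.clip(y * rng.uniform(0.3, 0.9) + rng.normal(0, 0.4, 200), 0, 1)
           for name in ["A", "B", "C"]}                  # three hypothetical classifiers

for name, scores in systems.items():
    auc = roc_auc_score(y, scores)
    sensitivity = recall_score(y, scores > 0.5)          # sensitivity at a fixed threshold
    print(f"system {name}: AUC={auc:.3f}  sensitivity@0.5={sensitivity:.3f}")
```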
From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings
Title | From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings |
Authors | Johannes Bjerva, Isabelle Augenstein |
Abstract | A core part of linguistic typology is the classification of languages according to linguistic properties, such as those detailed in the World Atlas of Language Structure (WALS). Doing this manually is prohibitively time-consuming, which is in part evidenced by the fact that only 100 out of over 7,000 languages spoken in the world are fully covered in WALS. We learn distributed language representations, which can be used to predict typological properties on a massively multilingual scale. Additionally, quantitative and qualitative analyses of these language embeddings can tell us how language similarities are encoded in NLP models for tasks at different typological levels. The representations are learned in an unsupervised manner alongside tasks at three typological levels: phonology (grapheme-to-phoneme prediction, and phoneme reconstruction), morphology (morphological inflection), and syntax (part-of-speech tagging). We consider more than 800 languages and find significant differences in the language representations encoded, depending on the target task. For instance, although Norwegian Bokmål and Danish are typologically close to one another, they are phonologically distant, which is reflected in their language embeddings growing relatively distant in a phonological task. We are also able to predict typological features in WALS with high accuracies, even for unseen language families. |
Tasks | Morphological Inflection, Part-Of-Speech Tagging |
Published | 2018-02-23 |
URL | http://arxiv.org/abs/1802.09375v1 |
PDF | http://arxiv.org/pdf/1802.09375v1.pdf |
PWC | https://paperswithcode.com/paper/from-phonology-to-syntax-unsupervised |
Repo | |
Framework | |
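The general recipe, learning a per-language embedding jointly with a task model so that the language vectors absorb task-relevant typological similarity, can be sketched with a toy tagger. The model sizes, tagset, and architecture are illustrative assumptions, not the paper's setup.

```python
# Toy POS tagger with a jointly learned language embedding.
import torch
import torch.nn as nn

class TaggerWithLanguageEmbedding(nn.Module):
    def __init__(self, vocab=5000, n_langs=800, emb=64, lang_dim=32, tags=17):
        super().__init__()
        self.tok = nn.Embedding(vocab, emb)
        self.lang = nn.Embedding(n_langs, lang_dim)       # the language representations of interest
        self.rnn = nn.GRU(emb + lang_dim, 64, batch_first=True, bidirectional=True)
        self.out = nn.Linear(128, tags)

    def forward(self, tokens, lang_id):                   # tokens: (B, T), lang_id: (B,)
        t = self.tok(tokens)
        l = self.lang(lang_id).unsqueeze(1).expand(-1, tokens.size(1), -1)
        h, _ = self.rnn(torch.cat([t, l], dim=-1))
        return self.out(h)                                # per-token tag logits

logits = TaggerWithLanguageEmbedding()(torch.randint(0, 5000, (2, 8)), torch.tensor([3, 41]))
```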
Multi-modal Capsule Routing for Actor and Action Video Segmentation Conditioned on Natural Language Queries
Title | Multi-modal Capsule Routing for Actor and Action Video Segmentation Conditioned on Natural Language Queries |
Authors | Bruce McIntosh, Kevin Duarte, Yogesh S Rawat, Mubarak Shah |
Abstract | In this paper, we propose an end-to-end capsule network for pixel level localization of actors and actions present in a video. The localization is performed based on a natural language query through which an actor and action are specified. We propose to encode both the video as well as textual input in the form of capsules, which provide more effective representation in comparison with standard convolution based features. We introduce a novel capsule based attention mechanism for fusion of video and text capsules for text selected video segmentation. The attention mechanism is performed via joint EM routing over video and text capsules for text selected actor and action localization. The existing works on actor-action localization are mainly focused on localization in a single frame instead of the full video. Different from existing works, we propose to perform the localization on all frames of the video. To validate the potential of the proposed network for actor and action localization on all the frames of a video, we extend an existing actor-action dataset (A2D) with annotations for all the frames. The experimental evaluation demonstrates the effectiveness of the proposed capsule network for text selective actor and action localization in videos, and it also improves upon the performance of the existing state-of-the-art works on single frame-based localization. |
Tasks | Action Localization, Video Semantic Segmentation |
Published | 2018-12-02 |
URL | http://arxiv.org/abs/1812.00303v1 |
PDF | http://arxiv.org/pdf/1812.00303v1.pdf |
PWC | https://paperswithcode.com/paper/multi-modal-capsule-routing-for-actor-and |
Repo | |
Framework | |
PaccMann: Prediction of anticancer compound sensitivity with multi-modal attention-based neural networks
Title | PaccMann: Prediction of anticancer compound sensitivity with multi-modal attention-based neural networks |
Authors | Ali Oskooei, Jannis Born, Matteo Manica, Vigneshwari Subramanian, Julio Sáez-Rodríguez, María Rodríguez Martínez |
Abstract | We present a novel approach for the prediction of anticancer compound sensitivity by means of multi-modal attention-based neural networks (PaccMann). In our approach, we integrate three key pillars of drug sensitivity, namely, the molecular structure of compounds, transcriptomic profiles of cancer cells as well as prior knowledge about interactions among proteins within cells. Our models ingest a drug-cell pair, consisting of the SMILES encoding of a compound and the gene expression profile of a cancer cell, and predict an IC50 sensitivity value. Gene expression profiles are encoded using an attention-based encoding mechanism that assigns high weights to the most informative genes. We present and study three encoders for the SMILES strings of compounds: 1) bidirectional recurrent, 2) convolutional, and 3) attention-based encoders. We compare our devised models against a baseline model that ingests engineered fingerprints to represent the molecular structure. We demonstrate that using our attention-based encoders, we can surpass the baseline model. The use of attention-based encoders enhances interpretability and enables us to identify genes, bonds and atoms that were used by the network to make a prediction. |
Tasks | |
Published | 2018-11-16 |
URL | https://arxiv.org/abs/1811.06802v2 |
PDF | https://arxiv.org/pdf/1811.06802v2.pdf |
PWC | https://paperswithcode.com/paper/paccmann-prediction-of-anticancer-compound |
Repo | |
Framework | |
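The gene-expression attention described in the abstract amounts to a softmax over per-gene scores that reweights the expression profile before it is combined with a compound encoding. The sketch below stubs the SMILES encoder as a plain GRU; the gene count, sizes, and names are illustrative assumptions, not the authors' model.

```python
# Toy attention-over-genes model for IC50 regression (SMILES encoder stubbed as a GRU).
import torch
import torch.nn as nn

class ToyDrugSensitivityModel(nn.Module):
    def __init__(self, n_genes=1000, smiles_vocab=40, hidden=64):
        super().__init__()
        self.gene_scores = nn.Parameter(torch.zeros(n_genes))       # attention logits per gene
        self.smiles_emb = nn.Embedding(smiles_vocab, hidden)
        self.smiles_enc = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(n_genes + hidden, hidden),
                                  nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, expression, smiles_tokens):
        attn = torch.softmax(self.gene_scores, dim=0)               # highlights informative genes
        genes = expression * attn                                   # (B, n_genes)
        _, h = self.smiles_enc(self.smiles_emb(smiles_tokens))      # h: (1, B, hidden)
        return self.head(torch.cat([genes, h.squeeze(0)], dim=-1))  # predicted IC50

pred = ToyDrugSensitivityModel()(torch.randn(2, 1000), torch.randint(0, 40, (2, 30)))
```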
Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks
Title | Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks |
Authors | Jörg Tiedemann, Yves Scherrer |
Abstract | In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones. We test this hypothesis by measuring the perplexity of such models when applied to paraphrases of the source language. The intuition is that an encoder produces better representations if a decoder is capable of recognizing synonymous sentences in the same language even though the model is never trained for that task. In our setup, we add 16 different auxiliary languages to a bidirectional bilingual baseline model (English-French) and test it with in-domain and out-of-domain paraphrases in English. The results show that the perplexity is significantly reduced in each of the cases, indicating that meaning can be grounded in translation. This is further supported by a study on paraphrase generation that we also include at the end of the paper. |
Tasks | Paraphrase Generation |
Published | 2018-08-21 |
URL | https://arxiv.org/abs/1808.06826v2 |
PDF | https://arxiv.org/pdf/1808.06826v2.pdf |
PWC | https://paperswithcode.com/paper/translational-grounding-using-paraphrase |
Repo | |
Framework | |
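The evaluation hinges on scoring paraphrases with the model and comparing perplexities. Given per-token log-probabilities from any NMT toolkit, the perplexity computation itself is a one-liner; the numbers below are placeholders.

```python
# Perplexity from per-token natural-log probabilities (placeholder values).
import math

def perplexity(token_logprobs):
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

paraphrase_scores = [[-1.1, -0.4, -2.0, -0.7], [-0.9, -0.6, -1.2]]
print([round(perplexity(s), 2) for s in paraphrase_scores])
```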
Hyperbolic Embeddings for Learning Options in Hierarchical Reinforcement Learning
Title | Hyperbolic Embeddings for Learning Options in Hierarchical Reinforcement Learning |
Authors | Saket Tiwari, M. Prannoy |
Abstract | Hierarchical reinforcement learning deals with the problem of breaking down large tasks into meaningful sub-tasks. Autonomous discovery of these sub-tasks has remained a challenging problem. We propose a novel method of learning sub-tasks by combining paradigms of routing in computer networks and graph-based skill discovery within the options framework to define meaningful sub-goals. We apply recent advances in learning embeddings using Riemannian optimisation to embed the state set into hyperbolic space and create a model of the environment. In doing so we enforce a global topology on the states and are able to exploit this topology to learn meaningful sub-tasks. We demonstrate empirically, both in discrete and continuous domains, how these embeddings can improve the learning of meaningful sub-tasks. |
Tasks | Hierarchical Reinforcement Learning |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01487v2 |
PDF | http://arxiv.org/pdf/1812.01487v2.pdf |
PWC | https://paperswithcode.com/paper/hyperbolic-embeddings-for-learning-options-in |
Repo | |
Framework | |
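The hyperbolic-embedding machinery the abstract relies on is the standard Poincaré-ball distance together with a Riemannian rescaling of Euclidean gradients. The functions below sketch that generic machinery only; the paper's sub-goal discovery and routing-inspired procedure are not reproduced.

```python
# Poincaré-ball distance and Riemannian gradient rescaling (generic sketch).
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    num = 2.0 * np.sum((u - v) ** 2)
    den = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + num / (den + eps))

def riemannian_grad(theta, euclidean_grad):
    # Conformal factor of the Poincaré metric: updates shrink near the ball's boundary.
    return ((1.0 - np.sum(theta ** 2)) ** 2 / 4.0) * euclidean_grad

u, v = np.array([0.1, 0.2]), np.array([-0.3, 0.05])
print(poincare_distance(u, v))
```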
Interactive Agent Modeling by Learning to Probe
Title | Interactive Agent Modeling by Learning to Probe |
Authors | Tianmin Shu, Caiming Xiong, Ying Nian Wu, Song-Chun Zhu |
Abstract | The ability to model other agents, such as understanding their intentions and skills, is essential to an agent's interactions with other agents. Conventional agent modeling relies on passive observation from demonstrations. In this work, we propose an interactive agent modeling scheme enabled by encouraging an agent to learn to probe. In particular, the probing agent (i.e., a learner) learns to interact with the environment and with a target agent (i.e., a demonstrator) to maximize the change in the observed behaviors of that agent. Through probing, rich behaviors can be observed and are used for enhancing the agent modeling to learn a more accurate mind model of the target agent. Our framework consists of two learning processes: i) imitation learning for an approximated agent model and ii) pure curiosity-driven reinforcement learning for an efficient probing policy to discover new behaviors that otherwise cannot be observed. We have validated our approach in four different tasks. The experimental results suggest that the agent model learned by our approach i) generalizes better in novel scenarios than the ones learned by passive observation, random probing, and other curiosity-driven approaches do, and ii) can be used for enhancing performance in multiple applications including distilling optimal planning to a policy net, collaboration, and competition. A video demo is available at https://www.dropbox.com/s/8mz6rd3349tso67/Probing_Demo.mov?dl=0 |
Tasks | Imitation Learning |
Published | 2018-10-01 |
URL | http://arxiv.org/abs/1810.00510v1 |
PDF | http://arxiv.org/pdf/1810.00510v1.pdf |
PWC | https://paperswithcode.com/paper/interactive-agent-modeling-by-learning-to |
Repo | |
Framework | |
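One concrete way to reward probing for new behavior is to score how much the target agent's observed action distribution deviates from what the learner's current agent model predicts. The KL-based reward below is an assumption chosen for illustration, not the paper's exact curiosity signal.

```python
# Illustrative probing reward: divergence between observed and predicted behavior.
import numpy as np

def probing_reward(predicted_probs, observed_probs, eps=1e-8):
    q = np.asarray(predicted_probs, dtype=float) + eps
    p = np.asarray(observed_probs, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))            # KL(observed || predicted)

print(probing_reward([0.2, 0.5, 0.3], [0.7, 0.2, 0.1]))
```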