July 26, 2019

2473 words 12 mins read

Paper Group NAWR 14

Learning to Align the Source Code to the Compiled Object Code

Title Learning to Align the Source Code to the Compiled Object Code
Authors Dor Levy, Lior Wolf
Abstract We propose a new neural network architecture and use it for the task of statement-by-statement alignment of source code and its compiled object code. Our architecture learns the alignment between the two sequences – one being the translation of the other – by mapping each statement to a context-dependent representation vector and aligning such vectors using a grid of the two sequence domains. Our experiments include short C functions, both artificial and human-written, and show that our neural network architecture is able to predict the alignment with high accuracy, outperforming known baselines. We also demonstrate that our model is general and can learn to solve graph problems such as the Traveling Salesman Problem.
Tasks
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=821
PDF http://proceedings.mlr.press/v70/levy17a/levy17a.pdf
PWC https://paperswithcode.com/paper/learning-to-align-the-source-code-to-the
Repo https://github.com/DorLevyML/learn-align
Framework tf
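
The grid idea can be illustrated with a minimal sketch: score every (source statement, object statement) pair on a grid of the two sequence domains, then read an alignment off the grid. The toy context vectors and plain dot-product scoring below are assumptions for illustration, not the paper's learned representations:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def alignment_grid(src_vecs, obj_vecs):
    """Score every (source statement, object statement) pair on a grid.

    Each cell holds a similarity between the two context-dependent vectors.
    """
    return [[dot(s, o) for o in obj_vecs] for s in src_vecs]

def align(src_vecs, obj_vecs):
    """For each object-code statement, pick the best-matching source statement."""
    grid = alignment_grid(src_vecs, obj_vecs)
    return [max(range(len(src_vecs)), key=lambda i: grid[i][j])
            for j in range(len(obj_vecs))]

# Toy context vectors: source statements 0 and 1 compile to object
# statements 0/1 and 2 respectively.
src = [[1.0, 0.0], [0.0, 1.0]]
obj = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
print(align(src, obj))  # → [0, 0, 1]
```

In the paper the cell scores come from a neural network over context-dependent statement representations; the grid structure itself is what this sketch shows.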

Outcome-adaptive lasso: variable selection for causal inference

Title Outcome-adaptive lasso: variable selection for causal inference
Authors Susan M Shortreed, Ashkan Ertefaie
Abstract Methodological advancements, including propensity score methods, have resulted in improved unbiased estimation of treatment effects from observational data. Traditionally, a “throw in the kitchen sink” approach has been used to select covariates for inclusion in the propensity score, but recent work shows that including unnecessary covariates can impact both the bias and statistical efficiency of propensity score estimators. In particular, the inclusion of covariates that impact exposure but not the outcome can inflate standard errors without improving bias, while the inclusion of covariates associated with the outcome but unrelated to exposure can improve precision. We propose the outcome-adaptive lasso for selecting appropriate covariates for inclusion in propensity score models, to account for confounding bias while maintaining statistical efficiency. This proposed approach can perform variable selection in the presence of a large number of spurious covariates, i.e. covariates unrelated to outcome or exposure. We present theoretical and simulation results indicating that the outcome-adaptive lasso selects the propensity score model that includes all true confounders and predictors of outcome, while excluding other covariates. We illustrate covariate selection using the outcome-adaptive lasso, including comparison to alternative approaches, using simulated data and in a survey of patients using opioid therapy to manage chronic pain.
Tasks Causal Inference
Published 2017-03-08
URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5591052/
PDF https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5591052/pdf/nihms-852754.pdf
PWC https://paperswithcode.com/paper/outcome-adaptive-lasso-variable-selection-for
Repo https://github.com/tom-beer/Outcome-Adaptive-LASSO
Framework none
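
The two-step idea can be sketched as follows: fit an outcome regression to derive per-covariate adaptive penalty weights, then fit a penalized exposure model with those weights, so that covariates with no outcome association are penalized heavily. The simulated data, the linear-probability exposure model, and the coordinate-descent solver below are simplifications for illustration (the paper uses penalized logistic regression for the propensity score):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: column 0 is a confounder (affects exposure A and outcome Y),
# column 1 affects exposure only, column 2 is pure noise.
n = 2000
X = rng.normal(size=(n, 3))
A = (X[:, 0] + X[:, 1] + rng.normal(size=n) > 0).astype(float)
Y = 2.0 * A + 1.5 * X[:, 0] + rng.normal(size=n)

# Step 1: outcome regression gives adaptive weights — covariates with small
# outcome coefficients receive large penalties.
design = np.column_stack([np.ones(n), A, X])
beta = np.linalg.lstsq(design, Y, rcond=None)[0]
outcome_coef = beta[2:]                      # coefficients of the covariates
gamma = 2.0
weights = 1.0 / (np.abs(outcome_coef) ** gamma + 1e-12)

# Step 2: weighted lasso for the exposure model, via coordinate descent
# with per-coordinate soft-thresholding.
def weighted_lasso(X, y, w, lam, n_iter=200):
    b = np.zeros(X.shape[1])
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r = y - X @ b + X[:, j] * b[j]   # partial residual excluding j
            rho = X[:, j] @ r
            thresh = lam * w[j]
            b[j] = np.sign(rho) * max(abs(rho) - thresh, 0.0) / col_ss[j]
    return b

b = weighted_lasso(X, A - A.mean(), weights, lam=50.0)
# The confounder (column 0) survives selection; the exposure-only and noise
# covariates are shrunk to zero despite the exposure-only one predicting A.
```

The key behaviour the sketch reproduces is that the exposure-only covariate, which would be selected by a propensity model fit on its own, is excluded because the outcome regression assigns it a near-infinite penalty.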

Discriminative Information Retrieval for Question Answering Sentence Selection

Title Discriminative Information Retrieval for Question Answering Sentence Selection
Authors Tongfei Chen, Benjamin Van Durme
Abstract We propose a framework for discriminative IR atop linguistic features, trained to improve the recall of answer candidate passage retrieval, the initial step in text-based question answering. We formalize this as an instance of linear feature-based IR, demonstrating a 34%–43% improvement in recall for candidate triage for QA.
Tasks Information Retrieval, Question Answering
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-2114/
PDF https://www.aclweb.org/anthology/E17-2114
PWC https://paperswithcode.com/paper/discriminative-information-retrieval-for
Repo https://github.com/ctongfei/probe
Framework none
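
The linear feature-based scoring at the core of the framework can be sketched as follows; the feature names and weights are hypothetical stand-ins for the paper's learned linguistic features:

```python
# Hypothetical features; in practice the weights are trained to maximise
# recall of answer-bearing sentences in the candidate pool.
weights = {"tfidf_overlap": 1.2, "bm25": 0.8, "ner_match": 2.0}

def score(feature_vec, w):
    """Linear feature-based IR: the score is a weighted sum of features."""
    return sum(w.get(name, 0.0) * value for name, value in feature_vec.items())

def rank(candidates, w):
    """Return candidate sentence ids sorted by descending score."""
    return sorted(candidates, key=lambda c: score(candidates[c], w), reverse=True)

candidates = {
    "s1": {"tfidf_overlap": 0.1, "bm25": 0.3, "ner_match": 0.0},
    "s2": {"tfidf_overlap": 0.4, "bm25": 0.5, "ner_match": 1.0},
    "s3": {"tfidf_overlap": 0.2, "bm25": 0.1, "ner_match": 0.0},
}
print(rank(candidates, weights))  # → ['s2', 's1', 's3']
```

Recall@k for triage then amounts to checking whether an answer-bearing sentence appears in the top k of this ranking.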

Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space

Title Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Authors Vijay Prakash Dwivedi, Manish Shrivastava
Abstract Word embeddings are used for several linguistic problems and NLP tasks, and solutions to such problems have improved greatly thanks to recent breakthroughs in the vector representation of words and research in vector space models. However, embedding phrases while keeping their semantics consistent with words has been challenging. We propose a novel methodology using Siamese deep neural networks to embed multi-word units and fine-tune the current state-of-the-art word embeddings, keeping both in the same vector space. We show several semantic relations between words and phrases using the embeddings generated by our system, and demonstrate that the similarity between words and their corresponding paraphrases is maximized using the modified embeddings.
Tasks Phrase Vector Embedding, Semantic Textual Similarity, Word Embeddings
Published 2017-12-18
URL https://cdn.iiit.ac.in/cdn/ltrc.iiit.ac.in/icon2017/proceedings/icon2017/pdf/W17-7526.pdf
PDF https://cdn.iiit.ac.in/cdn/ltrc.iiit.ac.in/icon2017/proceedings/icon2017/pdf/W17-7526.pdf
PWC https://paperswithcode.com/paper/beyond-word2vec-embedding-words-and-phrases
Repo https://github.com/vijaydwivedi75/Beyond_word2vec
Framework tf
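
A rough sketch of the shared-encoder idea: both branches of a Siamese network apply the same encoder, so a single word and a multi-word phrase land in one vector space and can be compared directly. The toy lookup table and mean pooling below stand in for the paper's trained network:

```python
import math

# Toy embedding table (invented values); in the paper the shared encoder
# is a deep network fine-tuned so paraphrases score highest.
EMB = {"new": [0.9, 0.1], "york": [0.8, 0.3], "nyc": [0.85, 0.2], "apple": [0.1, 0.9]}

def encode(tokens):
    """Shared encoder (here: mean pooling) applied to a word or a phrase."""
    dim = len(next(iter(EMB.values())))
    vecs = [EMB[t] for t in tokens]
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# The paraphrase pair scores higher than an unrelated word.
sim_para = cosine(encode(["new", "york"]), encode(["nyc"]))
sim_rand = cosine(encode(["new", "york"]), encode(["apple"]))
```

The training objective sketched in the abstract pushes `sim_para` toward its maximum while negative pairs like the second one are pushed apart.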

Multimodal Autoencoder: A Deep Learning Approach to Filling In Missing Sensor Data and Enabling Better Mood Prediction

Title Multimodal Autoencoder: A Deep Learning Approach to Filling In Missing Sensor Data and Enabling Better Mood Prediction
Authors Natasha Jaques, Sara Taylor, Akane Sano, Rosalind W. Picard
Abstract To accomplish forecasting of mood in real-world situations, affective computing systems need to collect and learn from multimodal data collected over weeks or months of daily use. Such systems are likely to encounter frequent data loss, e.g. when a phone loses location access, or when a sensor is recharging. Lost data can handicap classifiers trained with all modalities present in the data. This paper describes a new technique for handling missing multimodal data using a specialized denoising autoencoder: the Multimodal Autoencoder (MMAE). Empirical results from over 200 participants and 5500 days of data demonstrate that the MMAE is able to predict the feature values from multiple missing modalities more accurately than reconstruction methods such as principal components analysis (PCA). We discuss several practical benefits of the MMAE’s encoding and show that it can provide robust mood prediction even when up to three quarters of the data sources are lost.
Tasks Denoising
Published 2017-10-26
URL https://www.semanticscholar.org/paper/Multimodal-autoencoder%3A-A-deep-learning-approach-to-Jaques-Taylor/7adc6c841b7a3a11bfd8da1878085f51d2a82393
PDF https://affect.media.mit.edu/pdfs/17.Jaques_autoencoder_ACII.pdf
PWC https://paperswithcode.com/paper/multimodal-autoencoder-a-deep-learning
Repo https://github.com/natashamjaques/MultimodalAutoencoder
Framework tf
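
The MMAE's training scheme — corrupting inputs by dropping whole modalities so the autoencoder learns to reconstruct them — can be sketched as follows. The modality names and feature slices are hypothetical:

```python
import random

random.seed(0)

# Hypothetical modalities mapped to their feature indices in a sample.
MODALITIES = {"phone": [0, 1, 2], "wearable": [3, 4], "survey": [5]}

def corrupt(sample, p_drop=0.5):
    """Zero out entire modalities at random, mimicking real-world sensor
    loss (a recharging wearable, revoked location access). The denoising
    autoencoder is trained to map the corrupted sample back to the clean one."""
    out = list(sample)
    for _, idxs in MODALITIES.items():
        if random.random() < p_drop:
            for i in idxs:
                out[i] = 0.0
    return out

clean = [0.2, 0.5, 0.1, 0.9, 0.3, 0.7]
noisy = corrupt(clean)
# Training pairs are (noisy, clean); at test time, genuinely missing
# modalities are zero-filled the same way before encoding.
```

The point of whole-modality dropout, rather than per-feature noise, is that it matches how data is actually lost: a sensor disappears entirely, taking all of its features with it.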

Sentiment Analysis of Tunisian Dialects: Linguistic Ressources and Experiments

Title Sentiment Analysis of Tunisian Dialects: Linguistic Ressources and Experiments
Authors Salima Medhaffar, Fethi Bougares, Yannick Estève, Lamia Hadrich-Belguith
Abstract Dialectal Arabic (DA) is significantly different from the Arabic language taught in schools and used in written communication and formal speech (broadcast news, religion, politics, etc.). There is much existing research on Arabic sentiment analysis (SA); however, it is generally restricted to Modern Standard Arabic (MSA) or to dialects of economic or political interest. In this paper we are interested in SA of the Tunisian Dialect. We utilize machine learning techniques to determine the polarity of comments written in Tunisian Dialect. First, we evaluate SA system performance with models trained on freely available MSA and multi-dialectal data sets. We then collect and annotate a Tunisian Dialect corpus of 17,000 comments from Facebook. This corpus yields a significant accuracy improvement compared to the best model trained on other Arabic dialects or MSA data. We believe that this first freely available corpus will be valuable to researchers working in the field of Tunisian sentiment analysis and similar areas.
Tasks Sentiment Analysis
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1307/
PDF https://www.aclweb.org/anthology/W17-1307
PWC https://paperswithcode.com/paper/sentiment-analysis-of-tunisian-dialects
Repo https://github.com/fbougares/TSAC
Framework none

Protein Interface Prediction using Graph Convolutional Networks

Title Protein Interface Prediction using Graph Convolutional Networks
Authors Alex Fout, Jonathon Byrd, Basir Shariat, Asa Ben-Hur
Abstract We consider the prediction of interfaces between proteins, a challenging problem with important applications in drug discovery and design, and examine the performance of existing and newly proposed spatial graph convolution operators for this task. By performing convolution over a local neighborhood of a node of interest, we are able to stack multiple layers of convolution and learn effective latent representations that integrate information across the graph representing the three-dimensional structure of a protein of interest. An architecture that combines the learned features across pairs of proteins is then used to classify pairs of amino acid residues as part of an interface or not. In our experiments, several graph convolution operators yielded accuracy better than the state-of-the-art SVM method on this task.
Tasks Drug Discovery
Published 2017-12-01
URL http://papers.nips.cc/paper/7231-protein-interface-prediction-using-graph-convolutional-networks
PDF http://papers.nips.cc/paper/7231-protein-interface-prediction-using-graph-convolutional-networks.pdf
PWC https://paperswithcode.com/paper/protein-interface-prediction-using-graph
Repo https://github.com/fouticus/pipgcn
Framework tf
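
One spatial graph-convolution layer of the kind examined here can be sketched as follows: each node combines its own features with an aggregate of its neighbors' features, and stacking such layers widens the receptive field. Scalar mixing weights stand in for the learned weight matrices:

```python
def graph_conv(node_feats, neighbors, w_self, w_neigh):
    """One spatial graph-convolution layer: each node mixes its own features
    with the average of its neighbors' features, followed by a ReLU."""
    out = []
    for i, x in enumerate(node_feats):
        nbrs = neighbors[i]
        if nbrs:
            avg = [sum(node_feats[j][k] for j in nbrs) / len(nbrs)
                   for k in range(len(x))]
        else:
            avg = [0.0] * len(x)
        out.append([max(0.0, w_self * x[k] + w_neigh * avg[k])
                    for k in range(len(x))])
    return out

feats = [[1.0], [2.0], [3.0]]            # toy per-residue features
neighbors = {0: [1], 1: [0, 2], 2: [1]}  # spatial neighborhood of each residue
print(graph_conv(feats, neighbors, w_self=1.0, w_neigh=0.5))  # → [[2.0], [3.0], [4.0]]
```

For interface prediction, two such stacks (one per protein) produce residue representations that are then combined pairwise for classification.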

Sense Embeddings in Knowledge-Based Word Sense Disambiguation

Title Sense Embeddings in Knowledge-Based Word Sense Disambiguation
Authors Loïc Vial, Benjamin Lecouteux, Didier Schwab
Abstract
Tasks Machine Translation, Word Embeddings, Word Sense Disambiguation
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-6940/
PDF https://www.aclweb.org/anthology/W17-6940
PWC https://paperswithcode.com/paper/sense-embeddings-in-knowledge-based-word
Repo https://github.com/getalp/WSD-IWCS2017-Vialetal
Framework none

Graph Convolutional Networks for Named Entity Recognition

Title Graph Convolutional Networks for Named Entity Recognition
Authors Alberto Cetoli, Stefano Bragaglia, Andrew O'Harney, Marc Sloan
Abstract
Tasks Feature Engineering, Named Entity Recognition, Semantic Role Labeling
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-7607/
PDF https://www.aclweb.org/anthology/W17-7607
PWC https://paperswithcode.com/paper/graph-convolutional-networks-for-named-entity
Repo https://github.com/contextscout/gcn_ner
Framework tf

Incremental Discontinuous Phrase Structure Parsing with the GAP Transition

Title Incremental Discontinuous Phrase Structure Parsing with the GAP Transition
Authors Maximin Coavoux, Benoît Crabbé
Abstract This article introduces a novel transition system for discontinuous lexicalized constituent parsing called SR-GAP. It is an extension of the shift-reduce algorithm with an additional gap transition. Evaluation on two German treebanks shows that SR-GAP outperforms the previous best transition-based discontinuous parser (Maier, 2015) by a large margin (it is notably twice as accurate on the prediction of discontinuous constituents), and is competitive with the state of the art (Fernández-González and Martins, 2015). As a side contribution, we adapt span features (Hall et al., 2014) to discontinuous parsing.
Tasks
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-1118/
PDF https://www.aclweb.org/anthology/E17-1118
PWC https://paperswithcode.com/paper/incremental-discontinuous-phrase-structure
Repo https://github.com/mcoavoux/mtg
Framework none

Robust Training under Linguistic Adversity

Title Robust Training under Linguistic Adversity
Authors Yitong Li, Trevor Cohn, Timothy Baldwin
Abstract Deep neural networks have achieved remarkable results across many language processing tasks; however, they have been shown to be susceptible to overfitting and highly sensitive to noise, including adversarial attacks. In this work, we propose a linguistically-motivated approach for training robust models based on exposing the model to corrupted text examples at training time. We consider several flavours of linguistically plausible corruption, including lexical, semantic and syntactic methods. Empirically, we evaluate our method with a convolutional neural model across a range of sentiment analysis datasets. Compared with a baseline and the dropout method, our method achieves better overall performance.
Tasks Sentiment Analysis, Speech Recognition, Word Embeddings
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-2004/
PDF https://www.aclweb.org/anthology/E17-2004
PWC https://paperswithcode.com/paper/robust-training-under-linguistic-adversity
Repo https://github.com/lrank/Linguistic_adversity
Framework tf
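
The corruption-at-training-time idea can be sketched as follows; the synonym table and probabilities are toy stand-ins for the lexical resources and corruption flavours the paper draws on:

```python
import random

random.seed(1)

# Toy synonym table; the paper's lexical corruption uses real semantic
# resources, and further flavours include syntactic rewrites.
SYNONYMS = {"good": ["fine", "great"], "movie": ["film"]}

def corrupt_sentence(tokens, p_syn=0.3, p_drop=0.1):
    """Produce a linguistically plausible corrupted copy of a sentence:
    occasionally drop a word or swap it for a synonym. Training on such
    copies exposes the model to noise it will meet at test time."""
    out = []
    for tok in tokens:
        r = random.random()
        if r < p_drop:
            continue                                   # word dropout
        if tok in SYNONYMS and r < p_drop + p_syn:
            out.append(random.choice(SYNONYMS[tok]))   # lexical substitution
        else:
            out.append(tok)
    return out

print(corrupt_sentence(["good", "movie", "tonight"]))
```

Each training epoch can draw fresh corruptions of the same sentence, so the effective training set grows without new labels.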

Dict2vec : Learning Word Embeddings using Lexical Dictionaries

Title Dict2vec : Learning Word Embeddings using Lexical Dictionaries
Authors Julien Tissier, Christophe Gravier, Amaury Habrard
Abstract Learning word embeddings on large unlabeled corpora has been shown to improve many natural language tasks. The most efficient and popular approaches learn or retrofit such representations using additional external data; the resulting embeddings are generally better than their corpus-only counterparts, although such resources cover only a fraction of the words in the vocabulary. In this paper, we propose a new approach, Dict2vec, based on one of the largest yet most refined data sources for describing words – natural language dictionaries. Dict2vec builds new word pairs from dictionary entries so that semantically related words are moved closer together, and negative sampling filters out pairs whose words are unrelated in dictionaries. We evaluate the word representations obtained using Dict2vec on eleven datasets for the word similarity task and on four datasets for a text classification task.
Tasks Knowledge Graphs, Learning Word Embeddings, Machine Translation, Reading Comprehension, Semantic Role Labeling, Text Classification, Word Embeddings, Word Sense Disambiguation
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1024/
PDF https://www.aclweb.org/anthology/D17-1024
PWC https://paperswithcode.com/paper/dict2vec-learning-word-embeddings-using
Repo https://github.com/tca19/dict2vec
Framework none
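
The pair-building step can be sketched on a toy dictionary. The strong/weak distinction below (mutual vs. one-directional mention between definitions) follows Dict2vec's description, while the dictionary entries themselves are invented:

```python
# Toy dictionary; real Dict2vec uses full natural language dictionaries.
DICTIONARY = {
    "car": "a road vehicle with an engine",
    "vehicle": "a machine such as a car used for transport",
    "engine": "a machine that converts energy into motion",
}

def mentions(word, definition):
    return word in definition.split()

def dictionary_pairs(dictionary):
    """Strong pairs: each word's definition mentions the other.
    Weak pairs: only one definition mentions the other word."""
    strong, weak = set(), set()
    words = sorted(dictionary)
    for a in words:
        for b in words:
            if a >= b:
                continue
            ab = mentions(b, dictionary[a])
            ba = mentions(a, dictionary[b])
            if ab and ba:
                strong.add((a, b))
            elif ab or ba:
                weak.add((a, b))
    return strong, weak

strong, weak = dictionary_pairs(DICTIONARY)
print(strong, weak)  # → {('car', 'vehicle')} {('car', 'engine')}
```

During training, strong pairs are pulled together more forcefully than weak ones, and negative samples are drawn only from words that never co-occur in definitions.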

Magnets for Sarcasm: Making Sarcasm Detection Timely, Contextual and Very Personal

Title Magnets for Sarcasm: Making Sarcasm Detection Timely, Contextual and Very Personal
Authors Aniruddha Ghosh, Tony Veale
Abstract Sarcasm is a pervasive phenomenon in social media, permitting the concise communication of meaning, affect and attitude. Concision requires wit to produce and wit to understand, which demands from each party knowledge of norms, context and a speaker's mindset. Insight into a speaker's psychological profile at the time of production is a valuable source of context for sarcasm detection. Using a neural architecture, we show significant gains in detection accuracy when knowledge of the speaker's mood at the time of production can be inferred. We focus on sarcasm detection on Twitter, and show that the mood exhibited by a speaker over the tweets leading up to a new post is as useful a cue for sarcasm as the topical context of the post itself. This work opens the door to an empirical exploration not just of sarcasm in text but of the sarcastic state of mind.
Tasks Sarcasm Detection
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1050/
PDF https://www.aclweb.org/anthology/D17-1050
PWC https://paperswithcode.com/paper/magnets-for-sarcasm-making-sarcasm-detection
Repo https://github.com/AniSkywalker/SarcasmDetection
Framework tf

Authorship attribution of source code by using back propagation neural network based on particle swarm optimization

Title Authorship attribution of source code by using back propagation neural network based on particle swarm optimization
Authors Xinyu Yang, Guoai Xu, Qi Li, Yanhui Guo, Miao Zhang
Abstract Authorship attribution is the task of identifying the most likely author of a given sample among a set of known candidate authors. It can be applied not only to discover the original author of plain text, such as novels, blogs, emails and posts, but also to identify source code programmers. Authorship attribution of source code is required in diverse applications, ranging from malicious code tracking to settling authorship disputes and software plagiarism detection. This paper proposes a new method to identify the programmer of Java source code samples with higher accuracy. To this end, it introduces a back propagation (BP) neural network based on particle swarm optimization (PSO) into authorship attribution of source code. The method begins by computing a set of defined feature metrics, including lexical and layout metrics and structure and syntax metrics, 19 dimensions in total. These metrics are then input to a neural network for supervised learning, whose weights are produced by the hybrid PSO-BP algorithm. The effectiveness of the proposed method is evaluated on a collected dataset of 3,022 Java files belonging to 40 authors. Experimental results show that the proposed method achieves 91.060% accuracy, and a comparison with previous work on authorship attribution of Java source code shows that it outperforms the others overall, with an acceptable overhead.
Tasks
Published 2017-11-02
URL https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0187204
PDF https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0187204&type=printable
PWC https://paperswithcode.com/paper/authorship-attribution-of-source-code-by
Repo https://github.com/ml-in-programming/ml-on-source-code-models
Framework pytorch
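
The PSO component can be sketched in isolation; here it minimizes a toy quadratic, whereas in the paper the objective would be the BP network's training error as a function of its weight vector:

```python
import random

random.seed(42)

def pso(loss, dim, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimisation: each particle tracks its own
    best position (pbest) and is pulled toward both pbest and the swarm's
    global best (gbest)."""
    pos = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = loss(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Sanity check on a known minimum at (0.3, -0.2).
best, best_val = pso(lambda p: (p[0] - 0.3) ** 2 + (p[1] + 0.2) ** 2, dim=2)
```

In the hybrid scheme, PSO's global search is typically combined with BP's gradient steps to refine the network weights it finds.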

Further Investigation into Reference Bias in Monolingual Evaluation of Machine Translation

Title Further Investigation into Reference Bias in Monolingual Evaluation of Machine Translation
Authors Qingsong Ma, Yvette Graham, Timothy Baldwin, Qun Liu
Abstract Monolingual evaluation of Machine Translation (MT) aims to simplify human assessment by requiring assessors to compare the meaning of the MT output with a reference translation, opening up the task to a much larger pool of genuinely qualified evaluators. Monolingual evaluation runs the risk, however, of bias in favour of MT systems that happen to produce translations superficially similar to the reference and, consistent with this intuition, previous investigations have concluded monolingual assessment to be strongly biased in this respect. On re-examination of past analyses, however, we identify a series of potential analytical errors that raise important questions about the reliability of past conclusions. We subsequently carry out further investigation into reference bias via direct human assessment of MT adequacy using quality-controlled crowd-sourcing. Contrary to both intuition and past conclusions, our results show no significant evidence of reference bias in monolingual evaluation of MT.
Tasks Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1262/
PDF https://www.aclweb.org/anthology/D17-1262
PWC https://paperswithcode.com/paper/further-investigation-into-reference-bias-in
Repo https://github.com/qingsongma/percentage-refBias
Framework none