Paper Group NAWR 14
Learning to Align the Source Code to the Compiled Object Code
Title | Learning to Align the Source Code to the Compiled Object Code |
Authors | Dor Levy, Lior Wolf |
Abstract | We propose a new neural network architecture and use it for the task of statement-by-statement alignment of source code and its compiled object code. Our architecture learns the alignment between the two sequences – one being the translation of the other – by mapping each statement to a context-dependent representation vector and aligning such vectors using a grid of the two sequence domains. Our experiments include short C functions, both artificial and human-written, and show that our neural network architecture is able to predict the alignment with high accuracy, outperforming known baselines. We also demonstrate that our model is general and can learn to solve graph problems such as the Traveling Salesman Problem. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=821 |
http://proceedings.mlr.press/v70/levy17a/levy17a.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-align-the-source-code-to-the |
Repo | https://github.com/DorLevyML/learn-align |
Framework | tf |
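A minimal sketch of the grid-alignment idea in the abstract above: each statement on either side is mapped to a context-dependent vector, and a grid of pairwise scores is normalized into a per-statement alignment distribution. The dot-product scorer and all dimensions below are illustrative assumptions, not the paper's trained model.

```python
import numpy as np

def alignment_grid(src_vecs, obj_vecs):
    """Return an (n_obj, n_src) matrix: row i is a distribution over source
    statements to which object-code statement i may align."""
    scores = obj_vecs @ src_vecs.T                 # pairwise dot products
    scores -= scores.max(axis=1, keepdims=True)    # numerically stable softmax
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
src = rng.normal(size=(5, 8))   # 5 source statements, 8-dim context vectors
obj = rng.normal(size=(7, 8))   # 7 object-code statements
grid = alignment_grid(src, obj)
print(grid.shape)                                 # (7, 5)
print(bool(np.allclose(grid.sum(axis=1), 1.0)))   # True: each row is a distribution
```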
Outcome-adaptive lasso: variable selection for causal inference
Title | Outcome-adaptive lasso: variable selection for causal inference |
Authors | Susan M Shortreed, Ashkan Ertefaie |
Abstract | Methodological advancements, including propensity score methods, have resulted in improved unbiased estimation of treatment effects from observational data. Traditionally, a “throw in the kitchen sink” approach has been used to select covariates for inclusion in the propensity score, but recent work shows that including unnecessary covariates can affect both the bias and statistical efficiency of propensity score estimators. In particular, the inclusion of covariates that impact exposure but not the outcome can inflate standard errors without improving bias, while the inclusion of covariates associated with the outcome but unrelated to exposure can improve precision. We propose the outcome-adaptive lasso for selecting appropriate covariates for inclusion in propensity score models, to account for confounding bias while maintaining statistical efficiency. The proposed approach can perform variable selection in the presence of a large number of spurious covariates, i.e. covariates unrelated to the outcome or exposure. We present theoretical and simulation results indicating that the outcome-adaptive lasso selects the propensity score model that includes all true confounders and predictors of outcome, while excluding other covariates. We illustrate covariate selection using the outcome-adaptive lasso, including comparison to alternative approaches, using simulated data and a survey of patients using opioid therapy to manage chronic pain. |
Tasks | Causal Inference |
Published | 2017-03-08 |
URL | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5591052/ |
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5591052/pdf/nihms-852754.pdf | |
PWC | https://paperswithcode.com/paper/outcome-adaptive-lasso-variable-selection-for |
Repo | https://github.com/tom-beer/Outcome-Adaptive-LASSO |
Framework | none |
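The selection mechanism can be sketched with a toy linear version: penalty weights for the exposure model are derived from outcome-model coefficients, so covariates unrelated to the outcome are penalized heavily and drop out. Everything below — the linear exposure model, the small coordinate-descent lasso, and the simulated data — is an illustrative assumption; the paper's estimator uses a logistic propensity model.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=300):
    """Plain lasso by coordinate descent on (1/2n)||y - Xb||^2 + lam ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            rho = X[:, j] @ r / n
            z = X[:, j] @ X[:, j] / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return beta

rng = np.random.default_rng(1)
n = 2000
conf = rng.normal(size=n)          # true confounder: affects exposure and outcome
pred = rng.normal(size=n)          # predictor of outcome only
spur = rng.normal(size=n)          # spurious: affects exposure only
a = conf + spur + rng.normal(scale=0.5, size=n)            # exposure
y = 2.0 * conf + 2.0 * pred + a + rng.normal(size=n)       # outcome

X = np.column_stack([conf, pred, spur])
# Outcome model gives adaptive weights: small outcome coefficient => heavy penalty.
beta_out = np.linalg.lstsq(np.column_stack([X, a]), y, rcond=None)[0][:3]
w = np.abs(beta_out)
Xs = X * w                          # rescaling trick: column j penalized by 1/w_j
beta = lasso_cd(Xs, a, lam=0.1) * w
print(abs(beta[0]) > 0.5)           # True: confounder kept in the exposure model
print(abs(beta[2]) < 1e-8)          # True: spurious covariate excluded
```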
Discriminative Information Retrieval for Question Answering Sentence Selection
Title | Discriminative Information Retrieval for Question Answering Sentence Selection |
Authors | Tongfei Chen, Benjamin Van Durme |
Abstract | We propose a framework for discriminative IR atop linguistic features, trained to improve the recall of answer candidate passage retrieval, the initial step in text-based question answering. We formalize this as an instance of linear feature-based IR, demonstrating a 34%–43% improvement in recall for candidate triage for QA. |
Tasks | Information Retrieval, Question Answering |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2114/ |
https://www.aclweb.org/anthology/E17-2114 | |
PWC | https://paperswithcode.com/paper/discriminative-information-retrieval-for |
Repo | https://github.com/ctongfei/probe |
Framework | none |
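Linear feature-based IR, as named in the abstract, scores each (question, sentence) pair by a weighted sum of pair features and ranks candidates by that score. The two features and the hand-set weights below are illustrative assumptions; in the paper the weights are learned.

```python
import numpy as np

def features(question, sentence):
    """Two toy pair features: term overlap and a length ratio."""
    q, s = set(question.lower().split()), set(sentence.lower().split())
    overlap = len(q & s) / max(len(q), 1)
    length_ratio = min(len(s), len(q)) / max(len(s), len(q), 1)
    return np.array([overlap, length_ratio])

def rank(question, candidates, w):
    """Rank candidates by the linear score w . f(question, candidate)."""
    scores = [float(w @ features(question, c)) for c in candidates]
    order = np.argsort(scores)[::-1]
    return [candidates[i] for i in order]

w = np.array([2.0, 0.5])   # assumed weights; learned in practice to maximize recall
q = "who wrote the iliad"
cands = ["the iliad is attributed to homer",
         "the eiffel tower is in paris",
         "homer wrote the iliad and the odyssey"]
print(rank(q, cands, w)[0])   # "homer wrote the iliad and the odyssey"
```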
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Title | Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space |
Authors | Vijay Prakash Dwivedi, Manish Shrivastava |
Abstract | Word embeddings are used for several linguistic problems and NLP tasks. Solutions to many such problems have improved greatly thanks to recent breakthroughs in the vector representation of words and research in vector space models. However, embedding phrases while keeping their semantics consistent with their component words has been challenging. We propose a novel methodology using Siamese deep neural networks to embed multi-word units and fine-tune the current state-of-the-art word embeddings, keeping both in the same vector space. We show several semantic relations between words and phrases using the embeddings generated by our system, and show that the similarity between words and their corresponding paraphrases is maximized using the modified embeddings. |
Tasks | Phrase Vector Embedding, Semantic Textual Similarity, Word Embeddings |
Published | 2017-12-18 |
URL | https://cdn.iiit.ac.in/cdn/ltrc.iiit.ac.in/icon2017/proceedings/icon2017/pdf/W17-7526.pdf |
https://cdn.iiit.ac.in/cdn/ltrc.iiit.ac.in/icon2017/proceedings/icon2017/pdf/W17-7526.pdf | |
PWC | https://paperswithcode.com/paper/beyond-word2vec-embedding-words-and-phrases |
Repo | https://github.com/vijaydwivedi75/Beyond_word2vec |
Framework | tf |
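The core Siamese idea in the abstract — one shared transformation applied to both branches so words and phrases land in the same space — can be sketched as follows. The random vectors, the shared linear projection, and averaging as the phrase composition are all assumptions for illustration, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=16) for w in
         ["new", "york", "big", "apple", "city", "banana"]}
W = rng.normal(size=(16, 16)) / 4.0      # shared projection used by BOTH branches

def embed_word(w):
    return W @ vocab[w]

def embed_phrase(words):
    # Toy composition: mean of word vectors, then the same shared projection.
    return W @ np.mean([vocab[w] for w in words], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Word and phrase embeddings are directly comparable in the shared space.
sim = cosine(embed_phrase(["new", "york"]), embed_word("city"))
print(-1.0 <= sim <= 1.0)   # True
```

Training would then maximize such similarities for paraphrase pairs (and minimize them for negatives), which is what the shared weights of a Siamese network make possible.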
Multimodal Autoencoder: A Deep Learning Approach to Filling In Missing Sensor Data and Enabling Better Mood Prediction
Title | Multimodal Autoencoder: A Deep Learning Approach to Filling In Missing Sensor Data and Enabling Better Mood Prediction |
Authors | Natasha Jaques, Sara Taylor, Akane Sano, Rosalind W. Picard |
Abstract | To accomplish forecasting of mood in real-world situations, affective computing systems need to collect and learn from multimodal data collected over weeks or months of daily use. Such systems are likely to encounter frequent data loss, e.g. when a phone loses location access, or when a sensor is recharging. Lost data can handicap classifiers trained with all modalities present in the data. This paper describes a new technique for handling missing multimodal data using a specialized denoising autoencoder: the Multimodal Autoencoder (MMAE). Empirical results from over 200 participants and 5500 days of data demonstrate that the MMAE is able to predict the feature values from multiple missing modalities more accurately than reconstruction methods such as principal components analysis (PCA). We discuss several practical benefits of the MMAE’s encoding and show that it can provide robust mood prediction even when up to three quarters of the data sources are lost. |
Tasks | Denoising |
Published | 2017-10-26 |
URL | https://www.semanticscholar.org/paper/Multimodal-autoencoder%3A-A-deep-learning-approach-to-Jaques-Taylor/7adc6c841b7a3a11bfd8da1878085f51d2a82393 |
https://affect.media.mit.edu/pdfs/17.Jaques_autoencoder_ACII.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-autoencoder-a-deep-learning |
Repo | https://github.com/natashamjaques/MultimodalAutoencoder |
Framework | tf |
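The MMAE setting — reconstructing a lost modality from the modalities that remain — can be illustrated with a much simpler stand-in: a linear reconstructor fit by least squares instead of a trained autoencoder. The two synthetic "modalities" and their correlation are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
m1 = rng.normal(size=(n, 3))                                       # e.g. phone features
m2 = m1 @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(n, 2))  # correlated sensor

X_masked = np.hstack([m1, np.zeros_like(m2)])   # modality 2 lost (zero-filled)
X_full = np.hstack([m1, m2])

# Fit the reconstructor on a training half; evaluate on the held-out half.
W, *_ = np.linalg.lstsq(X_masked[:250], X_full[:250], rcond=None)
recon = X_masked[250:] @ W
err_model = np.mean((recon[:, 3:] - m2[250:]) ** 2)
err_zero = np.mean(m2[250:] ** 2)               # "leave it missing" baseline
print(err_model < err_zero)                     # True: imputation beats zero-filling
```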
Sentiment Analysis of Tunisian Dialects: Linguistic Ressources and Experiments
Title | Sentiment Analysis of Tunisian Dialects: Linguistic Ressources and Experiments |
Authors | Salima Medhaffar, Fethi Bougares, Yannick Estève, Lamia Hadrich-Belguith |
Abstract | Dialectal Arabic (DA) is significantly different from the Arabic language taught in schools and used in written communication and formal speech (broadcast news, religion, politics, etc.). There is much existing research in the field of Arabic language Sentiment Analysis (SA); however, it is generally restricted to Modern Standard Arabic (MSA) or to dialects of economic or political interest. In this paper we are interested in SA of the Tunisian Dialect. We use Machine Learning techniques to determine the polarity of comments written in Tunisian Dialect. First, we evaluate SA system performance with models trained on freely available MSA and multi-dialectal data sets. We then collect and annotate a Tunisian Dialect corpus of 17,000 comments from Facebook. This corpus yields a significant accuracy improvement over the best model trained on other Arabic dialects or MSA data. We believe that this first freely available corpus will be valuable to researchers working on Tunisian Sentiment Analysis and in similar areas. |
Tasks | Sentiment Analysis |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1307/ |
https://www.aclweb.org/anthology/W17-1307 | |
PWC | https://paperswithcode.com/paper/sentiment-analysis-of-tunisian-dialects |
Repo | https://github.com/fbougares/TSAC |
Framework | none |
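A minimal example of the "Machine Learning techniques to determine the polarity of comments" mentioned in the abstract: multinomial Naive Bayes over bag-of-words counts with Laplace smoothing. The tiny English toy data below merely stands in for the Tunisian-dialect corpus; none of the paper's actual data or features are used.

```python
from collections import Counter
import math

def train(docs):
    """Count words per polarity class and documents per class."""
    counts = {"pos": Counter(), "neg": Counter()}
    n_docs = Counter()
    for label, text in docs:
        counts[label].update(text.split())
        n_docs[label] += 1
    vocab = set(w for c in counts.values() for w in c)
    return counts, n_docs, vocab

def predict(text, counts, n_docs, vocab):
    """Pick the class with the highest smoothed log-likelihood."""
    best, best_lp = None, -math.inf
    total = sum(n_docs.values())
    for label, c in counts.items():
        lp = math.log(n_docs[label] / total)
        denom = sum(c.values()) + len(vocab)
        for w in text.split():
            lp += math.log((c[w] + 1) / denom)   # Laplace smoothing
        if lp > best_lp:
            best, best_lp = label, lp
    return best

docs = [("pos", "great service very happy"),
        ("pos", "really good food"),
        ("neg", "terrible service very bad"),
        ("neg", "awful bad food")]
model = train(docs)
print(predict("good happy service", *model))   # pos
```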
Protein Interface Prediction using Graph Convolutional Networks
Title | Protein Interface Prediction using Graph Convolutional Networks |
Authors | Alex Fout, Jonathon Byrd, Basir Shariat, Asa Ben-Hur |
Abstract | We consider the prediction of interfaces between proteins, a challenging problem with important applications in drug discovery and design, and examine the performance of existing and newly proposed spatial graph convolution operators for this task. By performing convolution over a local neighborhood of a node of interest, we are able to stack multiple layers of convolution and learn effective latent representations that integrate information across the graph representing the three-dimensional structure of a protein of interest. An architecture that combines the learned features across pairs of proteins is then used to classify pairs of amino acid residues as part of an interface or not. In our experiments, several graph convolution operators yielded accuracy better than the state-of-the-art SVM method on this task. |
Tasks | Drug Discovery |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/7231-protein-interface-prediction-using-graph-convolutional-networks |
http://papers.nips.cc/paper/7231-protein-interface-prediction-using-graph-convolutional-networks.pdf | |
PWC | https://paperswithcode.com/paper/protein-interface-prediction-using-graph |
Repo | https://github.com/fouticus/pipgcn |
Framework | tf |
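A sketch of spatial graph convolution in the spirit of the abstract: each node combines its own features with the mean of its neighbours' features through two weight matrices, and layers can be stacked. The weight shapes, the mean aggregation, and the ReLU are illustrative assumptions; the paper's exact operators differ.

```python
import numpy as np

def graph_conv(H, adj, Wc, Wn, b):
    """One spatial graph-convolution layer: self term + mean-neighbour term."""
    deg = adj.sum(axis=1, keepdims=True)
    neigh = (adj @ H) / np.maximum(deg, 1)            # mean over neighbours
    return np.maximum(H @ Wc + neigh @ Wn + b, 0.0)   # ReLU

rng = np.random.default_rng(0)
n_nodes, d_in, d_out = 6, 4, 8
H = rng.normal(size=(n_nodes, d_in))                  # residue features
adj = (rng.random((n_nodes, n_nodes)) < 0.4).astype(float)
adj = np.maximum(adj, adj.T)                          # undirected neighbourhood
np.fill_diagonal(adj, 0)
Wc = rng.normal(size=(d_in, d_out))                   # self-weight matrix
Wn = rng.normal(size=(d_in, d_out))                   # neighbour-weight matrix
b = np.zeros(d_out)

H1 = graph_conv(H, adj, Wc, Wn, b)   # output feeds the next stacked layer
print(H1.shape)                      # (6, 8)
```

For interface prediction, such per-residue representations from the two proteins would then be paired and fed to a classifier.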
Sense Embeddings in Knowledge-Based Word Sense Disambiguation
Title | Sense Embeddings in Knowledge-Based Word Sense Disambiguation |
Authors | Loïc Vial, Benjamin Lecouteux, Didier Schwab |
Abstract | |
Tasks | Machine Translation, Word Embeddings, Word Sense Disambiguation |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-6940/ |
https://www.aclweb.org/anthology/W17-6940 | |
PWC | https://paperswithcode.com/paper/sense-embeddings-in-knowledge-based-word |
Repo | https://github.com/getalp/WSD-IWCS2017-Vialetal |
Framework | none |
Graph Convolutional Networks for Named Entity Recognition
Title | Graph Convolutional Networks for Named Entity Recognition |
Authors | Alberto Cetoli, Stefano Bragaglia, Andrew O'Harney, Marc Sloan |
Abstract | |
Tasks | Feature Engineering, Named Entity Recognition, Semantic Role Labeling |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-7607/ |
https://www.aclweb.org/anthology/W17-7607 | |
PWC | https://paperswithcode.com/paper/graph-convolutional-networks-for-named-entity |
Repo | https://github.com/contextscout/gcn_ner |
Framework | tf |
Incremental Discontinuous Phrase Structure Parsing with the GAP Transition
Title | Incremental Discontinuous Phrase Structure Parsing with the GAP Transition |
Authors | Maximin Coavoux, Benoît Crabbé |
Abstract | This article introduces a novel transition system for discontinuous lexicalized constituent parsing called SR-GAP. It is an extension of the shift-reduce algorithm with an additional gap transition. Evaluation on two German treebanks shows that SR-GAP outperforms the previous best transition-based discontinuous parser (Maier, 2015) by a large margin (it is notably twice as accurate on the prediction of discontinuous constituents), and is competitive with the state of the art (Fernández-González and Martins, 2015). As a side contribution, we adapt span features (Hall et al., 2014) to discontinuous parsing. |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1118/ |
https://www.aclweb.org/anthology/E17-1118 | |
PWC | https://paperswithcode.com/paper/incremental-discontinuous-phrase-structure |
Repo | https://github.com/mcoavoux/mtg |
Framework | none |
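A toy illustration of the gap idea (not the paper's exact transition system, which uses a stack/deque configuration and lexicalized constituents): a GAP action sets aside the item below the stack top, so a later REDUCE can combine two non-adjacent items into one discontinuous constituent.

```python
def parse(tokens, actions):
    """Run a toy shift/gap/reduce machine and return the final stack."""
    stack, gapped, buf = [], [], list(tokens)
    for act in actions:
        if act == "SHIFT":
            stack.append(buf.pop(0))
        elif act == "GAP":
            gapped.append(stack.pop(-2))   # set aside the intervening item
        elif act.startswith("REDUCE-"):
            label = act.split("-", 1)[1]
            right, left = stack.pop(), stack.pop()
            stack.extend(gapped)           # gapped items return to the stack
            gapped.clear()
            stack.append((label, left, right))
    return stack

# "wakes ... up" forms a discontinuous constituent around "her":
result = parse(["wakes", "her", "up"],
               ["SHIFT", "SHIFT", "SHIFT", "GAP", "REDUCE-VP"])
print(result)   # ['her', ('VP', 'wakes', 'up')]
```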
Robust Training under Linguistic Adversity
Title | Robust Training under Linguistic Adversity |
Authors | Yitong Li, Trevor Cohn, Timothy Baldwin |
Abstract | Deep neural networks have achieved remarkable results across many language processing tasks; however, they have been shown to be susceptible to overfitting and highly sensitive to noise, including adversarial attacks. In this work, we propose a linguistically motivated approach for training robust models based on exposing the model to corrupted text examples at training time. We consider several flavours of linguistically plausible corruption, including lexical, semantic and syntactic methods. Empirically, we evaluate our method with a convolutional neural model across a range of sentiment analysis datasets. Compared with a baseline and the dropout method, our method achieves better overall performance. |
Tasks | Sentiment Analysis, Speech Recognition, Word Embeddings |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2004/ |
https://www.aclweb.org/anthology/E17-2004 | |
PWC | https://paperswithcode.com/paper/robust-training-under-linguistic-adversity |
Repo | https://github.com/lrank/Linguistic_adversity |
Framework | tf |
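Training-time corruption of the kind the abstract describes can be sketched as a function applied to each training sentence before the model sees it. The word-dropout and synonym-swap flavours below, and the tiny synonym table, are illustrative assumptions; the paper's corruption methods are linguistically richer.

```python
import random

SYNONYMS = {"good": "great", "movie": "film", "bad": "poor"}   # toy lexical table

def corrupt(sentence, rng, p_drop=0.2, p_swap=0.3):
    """Return a corrupted copy: random word dropout + lexical substitution."""
    out = []
    for w in sentence.split():
        if rng.random() < p_drop:
            continue                        # word dropout
        if w in SYNONYMS and rng.random() < p_swap:
            w = SYNONYMS[w]                 # lexical substitution
        out.append(w)
    return " ".join(out)

rng = random.Random(0)
for _ in range(3):
    # Each epoch the model would see a different corrupted variant.
    print(corrupt("a good movie with a bad ending", rng))
```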
Dict2vec : Learning Word Embeddings using Lexical Dictionaries
Title | Dict2vec : Learning Word Embeddings using Lexical Dictionaries |
Authors | Julien Tissier, Christophe Gravier, Amaury Habrard |
Abstract | Learning word embeddings on large unlabeled corpora has been shown to improve many natural language tasks. The most efficient and popular approaches learn or retrofit such representations using additional external data. The resulting embeddings are generally better than their corpus-only counterparts, although such resources cover only a fraction of the words in the vocabulary. In this paper, we propose a new approach, Dict2vec, based on one of the largest yet most refined data sources for describing words – natural language dictionaries. Dict2vec builds new word pairs from dictionary entries so that semantically related words are moved closer, and negative sampling filters out pairs whose words are unrelated in dictionaries. We evaluate the word representations obtained using Dict2vec on eleven datasets for the word similarity task and on four datasets for a text classification task. |
Tasks | Knowledge Graphs, Learning Word Embeddings, Machine Translation, Reading Comprehension, Semantic Role Labeling, Text Classification, Word Embeddings, Word Sense Disambiguation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1024/ |
https://www.aclweb.org/anthology/D17-1024 | |
PWC | https://paperswithcode.com/paper/dict2vec-learning-word-embeddings-using |
Repo | https://github.com/tca19/dict2vec |
Framework | none |
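The pair-building step described in the abstract can be sketched as follows: a "strong" pair when two headwords each appear in the other's definition, a "weak" pair when the mention is one-directional. The toy definitions are illustrative, and this is a simplification of Dict2vec's actual pair construction.

```python
definitions = {
    "car":     ["a", "road", "vehicle", "with", "an", "engine"],
    "vehicle": ["a", "machine", "such", "as", "a", "car", "or", "bus"],
    "engine":  ["a", "machine", "that", "converts", "energy", "into", "motion"],
}

strong, weak = set(), set()
for w, defn in definitions.items():
    for v in defn:
        if v in definitions and v != w:      # only pair headwords
            pair = tuple(sorted((w, v)))
            if w in definitions[v]:
                strong.add(pair)             # mutual mention
            else:
                weak.add(pair)               # one-directional mention
print(sorted(strong))   # [('car', 'vehicle')]
print(sorted(weak))     # [('car', 'engine')]
```

These pairs then serve as extra positive examples during embedding training, pulling dictionary-related words closer together.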
Magnets for Sarcasm: Making Sarcasm Detection Timely, Contextual and Very Personal
Title | Magnets for Sarcasm: Making Sarcasm Detection Timely, Contextual and Very Personal |
Authors | Aniruddha Ghosh, Tony Veale |
Abstract | Sarcasm is a pervasive phenomenon in social media, permitting the concise communication of meaning, affect and attitude. Concision requires wit to produce and wit to understand, which demands from each party knowledge of norms, context and a speaker's mindset. Insight into a speaker's psychological profile at the time of production is a valuable source of context for sarcasm detection. Using a neural architecture, we show significant gains in detection accuracy when knowledge of the speaker's mood at the time of production can be inferred. We focus on sarcasm detection on Twitter, and show that the mood exhibited by a speaker over the tweets leading up to a new post is as useful a cue for sarcasm as the topical context of the post itself. The work opens the door to an empirical exploration not just of sarcasm in text but of the sarcastic state of mind. |
Tasks | Sarcasm Detection |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1050/ |
https://www.aclweb.org/anthology/D17-1050 | |
PWC | https://paperswithcode.com/paper/magnets-for-sarcasm-making-sarcasm-detection |
Repo | https://github.com/AniSkywalker/SarcasmDetection |
Framework | tf |
Authorship attribution of source code by using back propagation neural network based on particle swarm optimization
Title | Authorship attribution of source code by using back propagation neural network based on particle swarm optimization |
Authors | Xinyu Yang, Guoai Xu, Qi Li, Yanhui Guo, Miao Zhang |
Abstract | Authorship attribution is the task of identifying the most likely author of a given sample among a set of known candidate authors. It can not only be applied to discover the original author of plain text, such as novels, blogs, emails or posts, but can also be used to identify the programmer of source code. Authorship attribution of source code is required in diverse applications, ranging from malicious code tracking to resolving authorship disputes and software plagiarism detection. This paper proposes a new method to identify the programmer of Java source code samples with higher accuracy. To this end, it introduces a back propagation (BP) neural network based on particle swarm optimization (PSO) into authorship attribution of source code. It begins by computing a set of defined feature metrics, including lexical and layout metrics and structure and syntax metrics, 19 dimensions in total. These metrics are then input to the neural network for supervised learning, whose weights are produced by the hybrid PSO-BP algorithm. The effectiveness of the proposed method is evaluated on a collected dataset of 3,022 Java files belonging to 40 authors. Experimental results show that the proposed method achieves 91.060% accuracy, and a comparison with previous work on authorship attribution of source code for the Java language shows that it outperforms other methods overall, with acceptable overhead. |
Tasks | |
Published | 2017-11-02 |
URL | https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0187204 |
https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0187204&type=printable | |
PWC | https://paperswithcode.com/paper/authorship-attribution-of-source-code-by |
Repo | https://github.com/ml-in-programming/ml-on-source-code-models |
Framework | pytorch |
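A minimal particle swarm optimization sketch, as a stand-in for the paper's PSO-BP weight training: here the swarm minimizes a simple quadratic loss instead of a neural network's training error. All hyperparameters below are illustrative, not the paper's settings.

```python
import numpy as np

def pso(loss, dim, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Return (best position, best loss) after a standard PSO run."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, size=(n_particles, dim))   # candidate weight vectors
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([loss(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()            # global best position
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Velocity: inertia + pull toward personal best + pull toward global best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        vals = np.array([loss(p) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, float(loss(g))

# Quadratic stand-in for a network's training loss; minimum at the origin.
best, best_val = pso(lambda p: float(np.sum(p ** 2)), dim=4)
print(best_val < 1e-2)   # True: the swarm converges near the minimum
```

In the paper's setting, `loss` would be the BP network's training error as a function of its weight vector, with BP refining the swarm's solution.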
Further Investigation into Reference Bias in Monolingual Evaluation of Machine Translation
Title | Further Investigation into Reference Bias in Monolingual Evaluation of Machine Translation |
Authors | Qingsong Ma, Yvette Graham, Timothy Baldwin, Qun Liu |
Abstract | Monolingual evaluation of Machine Translation (MT) aims to simplify human assessment by requiring assessors to compare the meaning of the MT output with a reference translation, opening up the task to a much larger pool of genuinely qualified evaluators. Monolingual evaluation runs the risk, however, of bias in favour of MT systems that happen to produce translations superficially similar to the reference, and, consistent with this intuition, previous investigations have concluded monolingual assessment to be strongly biased in this respect. On re-examination of past analyses, however, we identify a series of potential analytical errors that raise important questions about the reliability of past conclusions. We subsequently carry out further investigation into reference bias via direct human assessment of MT adequacy using quality-controlled crowd-sourcing. Contrary to both intuition and past conclusions, results show no significant evidence of reference bias in monolingual evaluation of MT. |
Tasks | Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1262/ |
https://www.aclweb.org/anthology/D17-1262 | |
PWC | https://paperswithcode.com/paper/further-investigation-into-reference-bias-in |
Repo | https://github.com/qingsongma/percentage-refBias |
Framework | none |