May 4, 2019

Paper Group NANR 153

Convolutional Neural Networks vs. Convolution Kernels: Feature Engineering for Answer Sentence Reranking. Hypothesis Testing in Unsupervised Domain Adaptation with Applications in Alzheimer’s Disease. Shallow Parsing Pipeline - Hindi-English Code-Mixed Social Media Text. Quantized Random Projections and Non-Linear Estimation of Cosine Similarity. R …

Convolutional Neural Networks vs. Convolution Kernels: Feature Engineering for Answer Sentence Reranking


Title	Convolutional Neural Networks vs. Convolution Kernels: Feature Engineering for Answer Sentence Reranking
Authors	Kateryna Tymoshenko, Daniele Bonadiman, Alessandro Moschitti
Abstract
Tasks	Feature Engineering, Learning-To-Rank, Question Answering
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1152/
PDF	https://www.aclweb.org/anthology/N16-1152
PWC	https://paperswithcode.com/paper/convolutional-neural-networks-vs-convolution
Repo
Framework

Hypothesis Testing in Unsupervised Domain Adaptation with Applications in Alzheimer’s Disease


Title	Hypothesis Testing in Unsupervised Domain Adaptation with Applications in Alzheimer’s Disease
Authors	Hao Zhou, Vamsi K. Ithapu, Sathya Narayanan Ravi, Vikas Singh, Grace Wahba, Sterling C. Johnson
Abstract	Consider samples from two different data sources $\{\mathbf{x}_s^i\} \sim P_{\rm source}$ and $\{\mathbf{x}_t^i\} \sim P_{\rm target}$. We only observe their transformed versions $h(\mathbf{x}_s^i)$ and $g(\mathbf{x}_t^i)$, for some known function classes $h(\cdot)$ and $g(\cdot)$. Our goal is to perform a statistical test checking whether $P_{\rm source} = P_{\rm target}$ while removing the distortions induced by the transformations. This problem is closely related to concepts underlying numerous domain adaptation algorithms. In our case, it is motivated by the need to combine clinical and imaging-based biomarkers from multiple sites and/or batches, where this problem is fairly common and impedes analyses with much larger sample sizes. We develop a framework that addresses this problem using ideas from hypothesis testing on the transformed measurements, wherein the distortions need to be estimated in tandem with the testing. We derive a simple algorithm and study its convergence and consistency properties in detail, and we also provide lower-bound strategies based on recent work in continuous optimization. On a dataset of individuals at risk for neurological disease, our results are competitive with alternative procedures that are twice as expensive and in some cases operationally infeasible to implement.
Tasks	Domain Adaptation, Unsupervised Domain Adaptation
Published	2016-12-01
URL	http://papers.nips.cc/paper/6209-hypothesis-testing-in-unsupervised-domain-adaptation-with-applications-in-alzheimers-disease
PDF	http://papers.nips.cc/paper/6209-hypothesis-testing-in-unsupervised-domain-adaptation-with-applications-in-alzheimers-disease.pdf
PWC	https://paperswithcode.com/paper/hypothesis-testing-in-unsupervised-domain
Repo
Framework
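The abstract describes testing whether $P_{\rm source} = P_{\rm target}$ only after the distortions $h$ and $g$ have been removed. The sketch below illustrates that idea in its simplest form and is not the authors' algorithm: it assumes a hypothetical affine distortion whose parameters are already known (the paper estimates the distortions jointly with the test), and it uses a plain permutation test on the difference of means as the two-sample test. The helper names `permutation_test` and `undo_affine` are illustrative.

```python
import random
import statistics


def permutation_test(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation test of equal means; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(xs) - statistics.fmean(ys))
    pooled = list(xs) + list(ys)
    n = len(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel samples, as allowed under the null hypothesis
        stat = abs(statistics.fmean(pooled[:n]) - statistics.fmean(pooled[n:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def undo_affine(values, scale, shift):
    """Invert a hypothetical distortion of the form h(x) = scale * x + shift."""
    return [(v - shift) / scale for v in values]


if __name__ == "__main__":
    rng = random.Random(1)
    # Both sources are drawn from the same distribution, N(0, 1).
    src = [rng.gauss(0.0, 1.0) for _ in range(200)]
    tgt = [rng.gauss(0.0, 1.0) for _ in range(200)]
    seen_src = [2.0 * x + 1.0 for x in src]  # hypothetical h(x) = 2x + 1
    seen_tgt = [x - 0.5 for x in tgt]        # hypothetical g(x) = x - 0.5
    print("raw p-value:", permutation_test(seen_src, seen_tgt))
    print("corrected p-value:",
          permutation_test(undo_affine(seen_src, 2.0, 1.0),
                           undo_affine(seen_tgt, 1.0, -0.5)))
```

On the raw measurements the test flags a difference caused purely by the distortions; after inverting them, any remaining difference reflects only sampling noise.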

Shallow Parsing Pipeline - Hindi-English Code-Mixed Social Media Text


Title	Shallow Parsing Pipeline - Hindi-English Code-Mixed Social Media Text
Authors	Arnav Sharma, Sakshi Gupta, Raveesh Motlani, Piyush Bansal, Manish Shrivastava, Radhika Mamidi, Dipti M. Sharma
Abstract
Tasks	Language Identification
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1159/
PDF	https://www.aclweb.org/anthology/N16-1159
PWC	https://paperswithcode.com/paper/shallow-parsing-pipeline-hindi-english-code
Repo
Framework

Quantized Random Projections and Non-Linear Estimation of Cosine Similarity


Title	Quantized Random Projections and Non-Linear Estimation of Cosine Similarity
Authors	Ping Li, Michael Mitzenmacher, Martin Slawski
Abstract	Random projections constitute a simple yet effective technique for dimensionality reduction, with applications in learning and search problems. In the present paper, we consider the problem of estimating cosine similarities when the projected data undergo scalar quantization to $b$ bits. Here we argue that the maximum likelihood estimator (MLE) is a principled approach to dealing with the non-linearity resulting from quantization, and we subsequently study its computational and statistical properties. A specific focus is on the trade-off between bit depth and the number of projections given a fixed budget of bits for storage or transmission. Along the way, we also touch upon the existence of a qualitative counterpart to the Johnson-Lindenstrauss lemma in the presence of quantization.
Tasks	Dimensionality Reduction, Quantization
Published	2016-12-01
URL	http://papers.nips.cc/paper/6492-quantized-random-projections-and-non-linear-estimation-of-cosine-similarity
PDF	http://papers.nips.cc/paper/6492-quantized-random-projections-and-non-linear-estimation-of-cosine-similarity.pdf
PWC	https://paperswithcode.com/paper/quantized-random-projections-and-non-linear
Repo
Framework
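For the special case $b = 1$, the estimator has a closed form. With Gaussian random projections, the probability that the signs of two projections differ is $\theta/\pi$, where $\theta$ is the angle between the vectors, so the MLE of $\theta$ is $\pi$ times the observed disagreement rate and the cosine estimate is $\cos\hat\theta$. The sketch below covers only this 1-bit case; the multi-bit MLE studied in the paper is not reproduced here. Dimensions and helper names are illustrative.

```python
import math
import random


def sign_bits(x, directions):
    """Project x onto each random direction and keep only the sign (b = 1 bit)."""
    return [1 if sum(r * xi for r, xi in zip(d, x)) >= 0 else 0 for d in directions]


def estimate_cosine(bits_u, bits_v):
    """1-bit MLE: the disagreement rate estimates theta / pi; map back with cos."""
    k = len(bits_u)
    disagree = sum(a != b for a, b in zip(bits_u, bits_v))
    return math.cos(math.pi * disagree / k)


def true_cosine(u, v):
    """Exact cosine similarity, for comparison."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 20, 4000  # input dimension and number of projections (1 bit each)
    u = [rng.gauss(0, 1) for _ in range(d)]
    v = [a + 0.5 * rng.gauss(0, 1) for a in u]  # a vector correlated with u
    # Gaussian random directions, shared by both vectors.
    directions = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]
    est = estimate_cosine(sign_bits(u, directions), sign_bits(v, directions))
    print(f"true {true_cosine(u, v):.3f}  estimated {est:.3f}")
```

Storing more bits per projection retains more information per projection, which is the bit-depth versus number-of-projections trade-off the abstract refers to.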

Retrofitting Sense-Specific Word Vectors Using Parallel Text


Title	Retrofitting Sense-Specific Word Vectors Using Parallel Text
Authors	Allyson Ettinger, Philip Resnik, Marine Carpuat
Abstract
Tasks	Word Alignment, Word Sense Disambiguation
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1163/
PDF	https://www.aclweb.org/anthology/N16-1163
PWC	https://paperswithcode.com/paper/retrofitting-sense-specific-word-vectors
Repo
Framework

An Empirical Study of Arabic Formulaic Sequence Extraction Methods


Title	An Empirical Study of Arabic Formulaic Sequence Extraction Methods
Authors	Ayman Alghamdi, Eric Atwell, Claire Brierley
Abstract	This paper aims to implement what is referred to as the "collocation of Arabic keywords" approach for extracting formulaic sequences (FSs) in the form of high-frequency but semantically regular formulas that are not restricted to any syntactic construction or semantic domain. The study applies several distributional semantic models to automatically extract relevant FSs related to Arabic keywords. The data sets used in this experiment are derived from a newly developed corpus-based Arabic wordlist of 5,189 lexical items that represent a variety of Modern Standard Arabic (MSA) genres and regions; the new wordlist is based on overlapping frequencies from a comprehensive comparison of four large Arabic corpora with a total size of over 8 billion running words. Empirical n-best precision evaluation methods are used to determine the best association measures (AMs) for extracting high-frequency and meaningful FSs. The gold-standard reference FS list was developed in previous studies and manually evaluated against well-established quantitative and qualitative criteria. The results demonstrate that the MI.log_f AM achieved the best results in extracting significant FSs from the large MSA corpus, while the T-score AM achieved the worst.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1080/
PDF	https://www.aclweb.org/anthology/L16-1080
PWC	https://paperswithcode.com/paper/an-empirical-study-of-arabic-formulaic
Repo
Framework
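The abstract ranks candidate FSs with association measures (AMs). As a hedged illustration, the sketch below computes pointwise mutual information, the T-score, and MI.log_f from raw co-occurrence counts. The MI.log_f formula used here, MI multiplied by $\ln(f_{xy}+1)$, is an assumption based on the definition commonly given for Sketch Engine; the paper's exact implementation may differ. All counts are made up.

```python
import math


def mutual_information(f_xy, f_x, f_y, n):
    """Pointwise MI in bits: log2 of observed over expected co-occurrence."""
    return math.log2(f_xy * n / (f_x * f_y))


def mi_log_f(f_xy, f_x, f_y, n):
    """MI weighted by log frequency (assumed definition: MI * ln(f_xy + 1))."""
    return mutual_information(f_xy, f_x, f_y, n) * math.log(f_xy + 1)


def t_score(f_xy, f_x, f_y, n):
    """(observed - expected) / sqrt(observed)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)


if __name__ == "__main__":
    # Hypothetical counts: (pair, f_xy, f_x, f_y); n is the corpus size in tokens.
    n = 1_000_000
    candidates = [
        (("keyword", "rare_partner"), 3, 500, 4),
        (("keyword", "common_partner"), 200, 500, 20_000),
    ]
    for name, measure in [("MI.log_f", mi_log_f), ("T-score", t_score)]:
        ranked = sorted(candidates, key=lambda c: measure(*c[1:], n), reverse=True)
        print(name, [pair for pair, *_ in ranked])
```

Plain MI tends to favour rare pairs; the $\ln(f_{xy}+1)$ factor pulls frequent, well-attested pairs back up the ranking.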

Integration of Lexical and Semantic Knowledge for Sentiment Analysis in SMS


Title	Integration of Lexical and Semantic Knowledge for Sentiment Analysis in SMS
Authors	Wejdene Khiari, Mathieu Roche, Asma Bouhafs Hafsia
Abstract	With the explosive growth of online social media (forums, blogs, and social networks), exploiting these new information sources has become essential. Our work is based on the sud4science project, whose goal is to perform multidisciplinary work on a corpus of authentic SMS messages in French, collected in 2011 and anonymised (88milSMS corpus: http://88milsms.huma-num.fr). This paper presents a new method that integrates opinion-detection knowledge from an SMS corpus by combining lexical and semantic information. More precisely, our approach gives more weight to words carrying sentiment (i.e., words present in a dedicated dictionary) in a three-class classification task: positive, negative, and neutral. The experiments were conducted on two corpora: an elongated SMS corpus (i.e., messages with repeated characters) and a non-elongated SMS corpus. Non-elongated SMS messages were classified much better than elongated ones. Overall, this study shows that integrating semantic knowledge always improves classification.
Tasks	Sentiment Analysis
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1188/
PDF	https://www.aclweb.org/anthology/L16-1188
PWC	https://paperswithcode.com/paper/integration-of-lexical-and-semantic-knowledge
Repo
Framework
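The abstract's central idea is to give extra weight to words found in a sentiment dictionary, and it contrasts elongated with non-elongated messages. The sketch below shows one simple way to build such features; the lexicon, the weight of 2.0, and the collapse-to-one-character normalization are all hypothetical choices, not the paper's resources or settings.

```python
import re
from collections import Counter

# Hypothetical mini sentiment dictionary and weight; not the resource used in the paper.
SENTIMENT_LEXICON = {"super", "génial", "nul", "triste"}
LEXICON_WEIGHT = 2.0

# A character repeated three or more times marks an elongated word ("suuuper").
ELONGATION = re.compile(r"(\w)\1{2,}")


def features(message):
    """Bag-of-words counts where dictionary (sentiment) words get extra weight."""
    counts = Counter()
    for token in re.findall(r"\w+", message.lower()):
        # Crude normalization: collapse each elongated run to a single character.
        token = ELONGATION.sub(r"\1", token)
        counts[token] += LEXICON_WEIGHT if token in SENTIMENT_LEXICON else 1.0
    return counts


if __name__ == "__main__":
    print(features("C'est suuuuper génial !!!"))
```

Without the normalization step, "suuuuper" would not match the dictionary entry "super", which hints at why elongated messages are harder to classify.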

End-to-End Argumentation Mining in Student Essays


Title	End-to-End Argumentation Mining in Student Essays
Authors	Isaac Persing, Vincent Ng
Abstract
Tasks	Argument Mining
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1164/
PDF	https://www.aclweb.org/anthology/N16-1164
PWC	https://paperswithcode.com/paper/end-to-end-argumentation-mining-in-student
Repo
Framework

Activity Modeling in Email


Title	Activity Modeling in Email
Authors	Ashequl Qadir, Michael Gamon, Patrick Pantel, Ahmed Hassan Awadallah
Abstract
Tasks
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1171/
PDF	https://www.aclweb.org/anthology/N16-1171
PWC	https://paperswithcode.com/paper/activity-modeling-in-email
Repo
Framework

Modeling Complement Types in Phrase-Based SMT


Title	Modeling Complement Types in Phrase-Based SMT
Authors	Marion Weller-Di Marco, Alexander Fraser, Sabine Schulte im Walde
Abstract
Tasks	Machine Translation
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2205/
PDF	https://www.aclweb.org/anthology/W16-2205
PWC	https://paperswithcode.com/paper/modeling-complement-types-in-phrase-based-smt
Repo
Framework

Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t.


Title	Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t.
Authors	Anna Gladkova, Aleksandr Drozd, Satoshi Matsuoka
Abstract
Tasks	Morphological Analysis, Word Embeddings, Word Sense Disambiguation
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-2002/
PDF	https://www.aclweb.org/anthology/N16-2002
PWC	https://paperswithcode.com/paper/analogy-based-detection-of-morphological-and
Repo
Framework

ArgRewrite: A Web-based Revision Assistant for Argumentative Writings


Title	ArgRewrite: A Web-based Revision Assistant for Argumentative Writings
Authors	Fan Zhang, Rebecca Hwa, Diane Litman, Homa B. Hashemi
Abstract
Tasks
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-3008/
PDF	https://www.aclweb.org/anthology/N16-3008
PWC	https://paperswithcode.com/paper/argrewrite-a-web-based-revision-assistant-for
Repo
Framework

Improving Translation Selection with Supersenses


Title	Improving Translation Selection with Supersenses
Authors	Haiqing Tang, Deyi Xiong, Oier Lopez de Lacalle, Eneko Agirre
Abstract	Selecting appropriate translations for source words with multiple meanings remains a challenge for statistical machine translation (SMT). One reason is that most SMT systems are not good at detecting the proper sense of a polysemous word when it appears in different contexts. In this paper, we adopt a supersense tagging method to annotate source words with coarse-grained ontological concepts. To enable the system to choose an appropriate translation for a word or phrase according to its annotated supersense, we propose two translation models with supersense knowledge: a maximum entropy based model and a supersense embedding model. The effectiveness of our proposed models is validated on a large-scale English-to-Spanish translation task. Results indicate that our method can significantly improve translation quality by correctly conveying the meaning of the source language in the target language.
Tasks	Machine Translation, Word Sense Disambiguation
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1293/
PDF	https://www.aclweb.org/anthology/C16-1293
PWC	https://paperswithcode.com/paper/improving-translation-selection-with
Repo
Framework
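A maximum entropy model scores each candidate translation with a weighted sum of features and normalizes the scores with a softmax. The sketch below shows how a supersense feature can flip the preferred translation of "bank"; the supersense labels follow WordNet lexicographer-file names, but the feature template and all weights are hypothetical and are not the paper's trained model.

```python
import math

# Hypothetical weights for log-linear features of the form
# (source word, supersense, candidate translation); None means "any supersense".
WEIGHTS = {
    ("bank", "noun.group", "banco"): 2.0,    # financial institution
    ("bank", "noun.object", "orilla"): 2.0,  # river bank
    ("bank", None, "banco"): 0.5,            # context-free preference
}


def translation_probs(word, supersense, candidates):
    """Maximum-entropy (softmax) distribution over candidate translations."""
    scores = {
        cand: WEIGHTS.get((word, supersense, cand), 0.0)
        + WEIGHTS.get((word, None, cand), 0.0)
        for cand in candidates
    }
    z = sum(math.exp(s) for s in scores.values())  # partition function
    return {cand: math.exp(s) / z for cand, s in scores.items()}


if __name__ == "__main__":
    for tag in ("noun.group", "noun.object"):
        probs = translation_probs("bank", tag, ["banco", "orilla"])
        print(tag, {c: round(p, 3) for c, p in probs.items()})
```

In a trained system the weights would be learned from aligned data rather than set by hand.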

Latent Topic Embedding


Title	Latent Topic Embedding
Authors	Di Jiang, Lei Shi, Rongzhong Lian, Hua Wu
Abstract	Topic modeling and word embedding are two important techniques for deriving latent semantics from data. General-purpose topic models typically work at a coarse granularity, capturing word co-occurrence at the document or sentence level. In contrast, word embedding models usually work at a much finer granularity, modeling word co-occurrence within small sliding windows. To derive latent semantics from word co-occurrence at different levels of granularity, we propose a novel model named Latent Topic Embedding (LTE), which seamlessly integrates topic generation and embedding learning in one unified framework. We further propose an efficient Monte Carlo EM algorithm to estimate the parameters of interest. By retaining the individual advantages of topic modeling and word embedding, LTE yields better latent topics and word embeddings. Extensive experiments verify the superiority of LTE over state-of-the-art methods.
Tasks	Topic Models, Word Embeddings
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1253/
PDF	https://www.aclweb.org/anthology/C16-1253
PWC	https://paperswithcode.com/paper/latent-topic-embedding
Repo
Framework
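The abstract contrasts document-level co-occurrence (topic models) with sliding-window co-occurrence (word embeddings). The sketch below only illustrates that difference in granularity by counting both kinds of co-occurrence on a toy document; it does not implement LTE or its Monte Carlo EM algorithm. The window size and function names are illustrative choices.

```python
from collections import Counter
from itertools import combinations


def document_cooccurrence(docs):
    """Count each unordered word pair once per document in which both appear."""
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts


def window_cooccurrence(docs, window=2):
    """Count unordered word pairs that appear within `window` positions of each other."""
    counts = Counter()
    for doc in docs:
        for i, w in enumerate(doc):
            for j in range(i + 1, min(i + 1 + window, len(doc))):
                a, b = sorted((w, doc[j]))
                if a != b:
                    counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    docs = [["stock", "market", "fell", "while", "river", "bank", "flooded"]]
    print("document-level:", document_cooccurrence(docs)[("market", "river")])
    print("window-level:  ", window_cooccurrence(docs)[("market", "river")])
```

"market" and "river" share a document but never a small window, so only the coarse, topic-style statistic links them.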

Inherently Pronominal Verbs in Czech: Description and Conversion Based on Treebank Annotation


Title	Inherently Pronominal Verbs in Czech: Description and Conversion Based on Treebank Annotation
Authors	Zdeňka Urešová, Eduard Bejček, Jan Hajič
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-1812/
PDF	https://www.aclweb.org/anthology/W16-1812
PWC	https://paperswithcode.com/paper/inherently-pronominal-verbs-in-czech
Repo
Framework