May 5, 2019

2186 words 11 mins read

Paper Group NANR 148

CUFE at SemEval-2016 Task 4: A Gated Recurrent Model for Sentiment Classification. Towards producing bilingual lexica from monolingual corpora. Linguistic features for Hindi light verb construction identification. Cross-lingual Learning of an Open-domain Semantic Parser. ConFarm: Extracting Surface Representations of Verb and Noun Constructions fro …

CUFE at SemEval-2016 Task 4: A Gated Recurrent Model for Sentiment Classification

Title CUFE at SemEval-2016 Task 4: A Gated Recurrent Model for Sentiment Classification
Authors Mahmoud Nabil, Amir Atiya, Mohamed Aly
Abstract
Tasks Language Modelling, Sentiment Analysis, Word Embeddings
Published 2016-06-01
URL https://www.aclweb.org/anthology/S16-1005/
PDF https://www.aclweb.org/anthology/S16-1005
PWC https://paperswithcode.com/paper/cufe-at-semeval-2016-task-4-a-gated-recurrent
Repo
Framework

Towards producing bilingual lexica from monolingual corpora

Title Towards producing bilingual lexica from monolingual corpora
Authors Jingyi Han, Núria Bel
Abstract Bilingual lexica are the basis for many cross-lingual natural language processing tasks. Recent work has shown success in learning bilingual dictionaries by taking advantage of comparable corpora and a diverse set of signals derived from monolingual corpora. In the present work, we describe an approach to automatically learn bilingual lexica by training a supervised classifier using word embedding-based vectors of only a few hundred translation-equivalent word pairs. The word embedding representations of translation pairs were obtained from source and target monolingual corpora, which are not necessarily related. Our classifier is able to predict whether a new word pair is under a translation relation or not. We tested it on two quite distinct language pairs: Chinese-Spanish and English-Spanish. The classifiers achieved more than 0.90 precision and recall for both language pairs in different evaluation scenarios. These results show a high potential for this method to be used in bilingual lexica production for language pairs with a reduced amount of parallel or comparable corpora, in particular for phrase table expansion in Statistical Machine Translation systems.
Tasks Machine Translation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1353/
PDF https://www.aclweb.org/anthology/L16-1353
PWC https://paperswithcode.com/paper/towards-producing-bilingual-lexica-from
Repo
Framework
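The pair-classification idea above can be sketched as a binary classifier over features derived from source- and target-language embeddings. The sketch below is illustrative only: it uses random vectors in place of real monolingual embeddings, elementwise products as the pair representation (a common choice, not necessarily the paper's exact one), and a minimal logistic regression trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 50  # embedding dimension (illustrative)

def pair_features(n, translated):
    """Elementwise products of toy source/target vectors, standing in
    for features of a real embedding-based pair representation."""
    src = rng.normal(size=(n, DIM))
    if translated:
        tgt = src + 0.3 * rng.normal(size=(n, DIM))  # correlated vectors
    else:
        tgt = rng.normal(size=(n, DIM))              # unrelated vectors
    return src * tgt

X = np.vstack([pair_features(200, True), pair_features(200, False)])
y = np.concatenate([np.ones(200), np.zeros(200)])

# Minimal logistic regression trained by gradient descent.
w = np.zeros(DIM)
b = 0.0
for _ in range(500):
    z = np.clip(X @ w + b, -30, 30)
    p = 1.0 / (1.0 + np.exp(-z))
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)

pred = (np.clip(X @ w + b, -30, 30) > 0.0).astype(float)
accuracy = float(np.mean(pred == y))
print(f"training accuracy: {accuracy:.2f}")
```

Because translated pairs have correlated halves, their product features have positive mean, which a linear model separates easily; the paper's setting, with real embeddings and only a few hundred labeled pairs, is of course harder.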

Linguistic features for Hindi light verb construction identification

Title Linguistic features for Hindi light verb construction identification
Authors Ashwini Vaidya, Sumeet Agarwal, Martha Palmer
Abstract Light verb constructions (LVC) in Hindi are highly productive. Distinguishing a case such as nirnay lenaa 'decision take; decide' from an ordinary verb-argument combination such as kaagaz lenaa 'paper take; take (a) paper' has been shown to aid NLP applications such as parsing (Begum et al., 2011) and machine translation (Pal et al., 2011). In this paper, we propose an LVC identification system using language-specific features for Hindi which shows an improvement over previous work (Begum et al., 2011). To build our system, we carry out a linguistic analysis of Hindi LVCs using Hindi Treebank annotations and propose two new features that are aimed at capturing the diversity of Hindi LVCs in the corpus. We find that our model performs robustly across a diverse range of LVCs and our results underscore the importance of semantic features, which is in keeping with the findings for English. Our error analysis also demonstrates that our classifier can be used to further refine LVC annotations in the Hindi Treebank and make them more consistent across the board.
Tasks Machine Translation
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1125/
PDF https://www.aclweb.org/anthology/C16-1125
PWC https://paperswithcode.com/paper/linguistic-features-for-hindi-light-verb
Repo
Framework

Cross-lingual Learning of an Open-domain Semantic Parser

Title Cross-lingual Learning of an Open-domain Semantic Parser
Authors Kilian Evang, Johan Bos
Abstract We propose a method for learning semantic CCG parsers by projecting annotations via a parallel corpus. The method opens an avenue towards cheaply creating multilingual semantic parsers mapping open-domain text to formal meaning representations. A first cross-lingually learned Dutch (from English) semantic parser obtains f-scores ranging from 42.99% to 69.22% depending on the level of label informativity taken into account, compared to 58.40% to 78.88% for the underlying source-language system. These are promising numbers compared to state-of-the-art semantic parsing in open domains.
Tasks Semantic Parsing
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1056/
PDF https://www.aclweb.org/anthology/C16-1056
PWC https://paperswithcode.com/paper/cross-lingual-learning-of-an-open-domain
Repo
Framework

ConFarm: Extracting Surface Representations of Verb and Noun Constructions from Dependency Annotated Corpora of Russian

Title ConFarm: Extracting Surface Representations of Verb and Noun Constructions from Dependency Annotated Corpora of Russian
Authors Nikita Mediankin
Abstract ConFarm is a web service dedicated to the extraction of surface representations of verb and noun constructions from dependency-annotated corpora of Russian texts. Currently, the extraction of constructions with a specific lemma from SynTagRus and the Russian National Corpus is available. The system provides a flexible interface that allows users to fine-tune the output. Extracted constructions are grouped by their contents to allow for compact representation, and the groups are visualised as a graph to help users navigate the extraction results. ConFarm differs from similar existing tools for Russian in that it offers full constructions, as opposed to extracting separate dependents of a search word or working with collocations, and allows users to discover unexpected constructions as opposed to searching for examples of a user-defined construction.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-2050/
PDF https://www.aclweb.org/anthology/C16-2050
PWC https://paperswithcode.com/paper/confarm-extracting-surface-representations-of
Repo
Framework

An Unsupervised Multi-Document Summarization Framework Based on Neural Document Model

Title An Unsupervised Multi-Document Summarization Framework Based on Neural Document Model
Authors Shulei Ma, Zhi-Hong Deng, Yunlun Yang
Abstract In the age of information explosion, multi-document summarization is attracting particular attention for its ability to help people get the main ideas in a short time. Traditional extractive methods simply treat the document set as a group of sentences while ignoring the global semantics of the documents. Meanwhile, neural document models are effective at representing the semantic content of documents in low-dimensional vectors. In this paper, we propose a document-level reconstruction framework named DocRebuild, which reconstructs the documents with summary sentences through a neural document model and selects summary sentences to minimize the reconstruction error. We also apply two strategies, sentence filtering and beam search, to improve the performance of our method. Experimental results on the benchmark datasets DUC 2006 and DUC 2007 show that DocRebuild is effective and outperforms the state-of-the-art unsupervised algorithms.
Tasks Document Summarization, Multi-Document Summarization, Sentiment Analysis
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1143/
PDF https://www.aclweb.org/anthology/C16-1143
PWC https://paperswithcode.com/paper/an-unsupervised-multi-document-summarization
Repo
Framework
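The select-to-minimize-reconstruction-error idea behind DocRebuild can be sketched with two deliberate simplifications: the "document model" below is just the mean of toy sentence vectors (the paper uses a neural document model), and selection is greedy rather than beam search. All vectors and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sentence vectors; a real system would use a neural document model.
sentences = rng.normal(size=(12, 16))      # 12 sentences, 16-dim vectors
doc_vec = sentences.mean(axis=0)           # document representation

def reconstruction_error(selected):
    """Error of rebuilding the document vector from the chosen sentences."""
    return float(np.linalg.norm(doc_vec - sentences[selected].mean(axis=0)))

# Greedy selection: repeatedly add the sentence that lowers the error most.
budget = 3
summary = []
for _ in range(budget):
    candidates = [i for i in range(len(sentences)) if i not in summary]
    best = min(candidates, key=lambda i: reconstruction_error(summary + [i]))
    summary.append(best)

print("selected sentences:", summary)
print("reconstruction error:", round(reconstruction_error(summary), 4))
```

Swapping the greedy loop for beam search, as the paper does, keeps several partial selections alive per step instead of only the single best one.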

Learning grammatical categories using paradigmatic representations: Substitute words for language acquisition

Title Learning grammatical categories using paradigmatic representations: Substitute words for language acquisition
Authors Mehmet Ali Yatbaz, Volkan Cirik, Aylin Küntay, Deniz Yuret
Abstract Learning syntactic categories is a fundamental task in language acquisition. Previous studies show that co-occurrence patterns of preceding and following words are essential to group words into categories. However, the neighboring words, or frames, are rarely repeated exactly in the data. This creates data sparsity and hampers learning for frame based models. In this work, we propose a paradigmatic representation of word context which uses probable substitutes instead of frames. Our experiments on child-directed speech show that models based on probable substitutes learn more accurate categories with fewer examples compared to models based on frames.
Tasks Language Acquisition
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1068/
PDF https://www.aclweb.org/anthology/C16-1068
PWC https://paperswithcode.com/paper/learning-grammatical-categories-using
Repo
Framework
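The substitute-word idea above, representing a position by the words that could plausibly fill it rather than by its exact neighbor frame, can be sketched with a toy bigram model: score each candidate w for the slot "left _ right" by P(w | left) * P(right | w). The corpus and the bigram scoring are illustrative stand-ins for the paper's actual language model.

```python
from collections import Counter

corpus = "the cat sat on the mat the dog sat on the rug".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab = sorted(unigrams)

def substitutes(left, right):
    """P(w | left _ right) proportional to P(w|left) * P(right|w),
    estimated from raw bigram counts (no smoothing)."""
    scores = {}
    for w in vocab:
        p_w_given_left = bigrams[(left, w)] / unigrams[left]
        p_right_given_w = bigrams[(w, right)] / unigrams[w]
        score = p_w_given_left * p_right_given_w
        if score > 0:
            scores[w] = score
    total = sum(scores.values()) or 1.0
    return {w: s / total for w, s in scores.items()}

print(substitutes("the", "sat"))  # words that fit "the _ sat"
```

On this toy corpus the slot "the _ sat" is filled by "cat" and "dog" with equal probability; words that share a substitute distribution like this are exactly the ones a paradigmatic model groups into one category.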

Grammatical error correction using neural machine translation

Title Grammatical error correction using neural machine translation
Authors Zheng Yuan, Ted Briscoe
Abstract
Tasks Grammatical Error Correction, Language Modelling, Machine Translation
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1042/
PDF https://www.aclweb.org/anthology/N16-1042
PWC https://paperswithcode.com/paper/grammatical-error-correction-using-neural
Repo
Framework

A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability

Title A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability
Authors Elif Ahsen Acar, Deniz Zeyrek, Murathan Kurfalı, Cem Bozşahin
Abstract This study primarily aims to build a Turkish psycholinguistic database including three variables: word frequency, age of acquisition (AoA), and imageability, where AoA and imageability information are limited to nouns. We used a corpus-based approach to obtain information about the AoA variable. We built two corpora: a child literature corpus (CLC) including 535 books written for children aged 3-12, and a corpus of transcribed children's speech (CSC) at ages 1;4-4;8. A comparison between the word frequencies of the CLC and CSC gave positive correlation results, suggesting the usability of the CLC to extract AoA information. We assumed that frequent words of the CLC would correspond to early-acquired words, whereas frequent words of a corpus of adult language would correspond to late-acquired words. To validate the AoA results from our corpus-based approach, a rated-AoA questionnaire was conducted with adults. Imageability values were collected via a different questionnaire conducted with adults. We conclude that it is possible to deduce AoA information for high-frequency words with the corpus-based approach. The results for low-frequency words were inconclusive, which is attributed to the fact that corpus-based AoA information is affected by the strong negative correlation between corpus frequency and rated AoA.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1571/
PDF https://www.aclweb.org/anthology/L16-1571
PWC https://paperswithcode.com/paper/a-turkish-database-for-psycholinguistic
Repo
Framework

Algorithms and matching lower bounds for approximately-convex optimization

Title Algorithms and matching lower bounds for approximately-convex optimization
Authors Andrej Risteski, Yuanzhi Li
Abstract In recent years, a rapidly increasing number of applications in practice requires solving non-convex objectives, like training neural networks, learning graphical models, maximum likelihood estimation, etc. Though simple heuristics such as gradient descent with very few modifications tend to work well, theoretical understanding is very weak. We consider possibly the most natural class of non-convex functions where one could hope to obtain provable guarantees: functions that are "approximately convex", i.e. functions $\tilde{f}: \mathbb{R}^d \to \mathbb{R}$ for which there exists a convex function $f$ such that for all $x$, $\tilde{f}(x) - f(x) \le \Delta$ for a fixed value $\Delta$. We then want to minimize $\tilde{f}$, i.e. output a point $\tilde{x}$ such that $\tilde{f}(\tilde{x}) \le \min_{x} \tilde{f}(x) + \epsilon$. It is quite natural to conjecture that for fixed $\epsilon$, the problem gets harder for larger $\Delta$; however, the exact dependency of $\epsilon$ and $\Delta$ is not known. In this paper, we strengthen the known information-theoretic lower bounds on the trade-off between $\epsilon$ and $\Delta$ substantially, and exhibit an algorithm that matches these lower bounds for a large class of convex bodies.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6576-algorithms-and-matching-lower-bounds-for-approximately-convex-optimization
PDF http://papers.nips.cc/paper/6576-algorithms-and-matching-lower-bounds-for-approximately-convex-optimization.pdf
PWC https://paperswithcode.com/paper/algorithms-and-matching-lower-bounds-for
Repo
Framework

Multiple Emotions Detection in Conversation Transcripts

Title Multiple Emotions Detection in Conversation Transcripts
Authors Duc-Anh Phan, Hiroyuki Shindo, Yuji Matsumoto
Abstract
Tasks
Published 2016-10-01
URL https://www.aclweb.org/anthology/Y16-2006/
PDF https://www.aclweb.org/anthology/Y16-2006
PWC https://paperswithcode.com/paper/multiple-emotions-detection-in-conversation
Repo
Framework

Refurbishing a Morphological Database for German

Title Refurbishing a Morphological Database for German
Authors Petra Steiner
Abstract The CELEX database is one of the standard lexical resources for German. It yields a wealth of data, especially for phonological and morphological applications. The morphological part comprises deep-structure morphological analyses of German. However, as it was developed in the Nineties, both encoding and spelling are outdated. About one fifth of over 50,000 datasets contain umlauts and signs such as ß. Changes to a modern version cannot be obtained by simple substitution. In this paper, we briefly describe the original content and form of the orthographic and morphological database for German in CELEX. Then we present our work on modernizing the linguistic data. Lemmas and morphological analyses are transferred to a modern standard of encoding by first merging orthographic and morphological information of the lemmas and their entries and then performing a second substitution for the morphs within their morphological analyses. Changes to modern German spelling are performed by substitution rules according to orthographical standards. We show an example of the use of the data for the disambiguation of morphological structures. The discussion describes prospects of future work on this or similar lexicons. The Perl script is publicly available on our website.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1176/
PDF https://www.aclweb.org/anthology/L16-1176
PWC https://paperswithcode.com/paper/refurbishing-a-morphological-database-for
Repo
Framework
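The rule-based substitution step described in the abstract can be sketched as a simple mapping pass. The mapping below is hypothetical: the actual CELEX encoding conventions and the authors' Perl rules are not given here, so the old-to-new pairs are purely illustrative.

```python
# Hypothetical encoding map: old ASCII workarounds -> modern characters.
# The real CELEX conventions differ; this only illustrates the idea of
# applying substitution rules to modernize lemma encodings.
SUBSTITUTIONS = {
    '"a': "ä", '"o': "ö", '"u': "ü",
    '"A': "Ä", '"O': "Ö", '"U': "Ü",
    "$": "ß",
}

def modernize(lemma: str) -> str:
    """Apply every substitution rule to one lemma string."""
    for old, new in SUBSTITUTIONS.items():
        lemma = lemma.replace(old, new)
    return lemma

print(modernize("Stra$e"))   # with this toy map -> Straße
print(modernize('"Ubung'))   # with this toy map -> Übung
```

As the abstract notes, a single substitution pass like this is not enough in practice: the same rules must also be applied consistently inside the morphological analyses, which is why the authors merge orthographic and morphological information first.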

Building a Cross-document Event-Event Relation Corpus

Title Building a Cross-document Event-Event Relation Corpus
Authors Yu Hong, Tongtao Zhang, Tim O'Gorman, Sharone Horowit-Hendler, Heng Ji, Martha Palmer
Abstract
Tasks Knowledge Base Population
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-1701/
PDF https://www.aclweb.org/anthology/W16-1701
PWC https://paperswithcode.com/paper/building-a-cross-document-event-event
Repo
Framework

Towards a QUD-Based Analysis of Gapping Constructions

Title Towards a QUD-Based Analysis of Gapping Constructions
Authors Sang-Hee Park
Abstract
Tasks
Published 2016-10-01
URL https://www.aclweb.org/anthology/Y16-2028/
PDF https://www.aclweb.org/anthology/Y16-2028
PWC https://paperswithcode.com/paper/towards-a-qud-based-analysis-of-gapping
Repo
Framework

Character-Aware Neural Networks for Arabic Named Entity Recognition for Social Media

Title Character-Aware Neural Networks for Arabic Named Entity Recognition for Social Media
Authors Mourad Gridach
Abstract Named Entity Recognition (NER) is the task of classifying or labelling atomic elements in text into categories such as Person, Location, or Organisation. For the Arabic language, recognizing named entities is a challenging task because of the complexity and unique characteristics of the language. In addition, most previous work focuses on Modern Standard Arabic (MSA); however, recognizing named entities in social media is becoming more interesting these days. Dialectal Arabic (DA) and MSA are both used in social media, which makes the task even more challenging. Most state-of-the-art Arabic NER systems rely heavily on handcrafted features and lexicons, which is time-consuming. In this paper, we introduce a novel neural network architecture that benefits from both character- and word-level representations automatically, by using a combination of a bidirectional LSTM and a Conditional Random Field (CRF), eliminating the need for most feature engineering. Moreover, our model relies on unsupervised word representations learned from unannotated corpora. Experimental results demonstrate that our model achieves state-of-the-art performance on a publicly available benchmark for Arabic NER in social media, surpassing the previous system by a large margin.
Tasks Feature Engineering, Information Retrieval, Machine Translation, Named Entity Recognition, Opinion Mining, Question Answering, Text Clustering
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-3703/
PDF https://www.aclweb.org/anthology/W16-3703
PWC https://paperswithcode.com/paper/character-aware-neural-networks-for-arabic
Repo
Framework