May 5, 2019

2186 words 11 mins read

Paper Group NANR 148

CUFE at SemEval-2016 Task 4: A Gated Recurrent Model for Sentiment Classification. Towards producing bilingual lexica from monolingual corpora. Linguistic features for Hindi light verb construction identification. Cross-lingual Learning of an Open-domain Semantic Parser. ConFarm: Extracting Surface Representations of Verb and Noun Constructions fro …

CUFE at SemEval-2016 Task 4: A Gated Recurrent Model for Sentiment Classification

Title CUFE at SemEval-2016 Task 4: A Gated Recurrent Model for Sentiment Classification
Authors Mahmoud Nabil, Amir Atiya, Mohamed Aly
Abstract
Tasks Language Modelling, Sentiment Analysis, Word Embeddings
Published 2016-06-01
URL https://www.aclweb.org/anthology/S16-1005/
PDF https://www.aclweb.org/anthology/S16-1005
PWC https://paperswithcode.com/paper/cufe-at-semeval-2016-task-4-a-gated-recurrent
Repo
Framework

Towards producing bilingual lexica from monolingual corpora

Title Towards producing bilingual lexica from monolingual corpora
Authors Jingyi Han, Núria Bel
Abstract Bilingual lexica are the basis for many cross-lingual natural language processing tasks. Recent work has shown success in learning bilingual dictionaries by taking advantage of comparable corpora and a diverse set of signals derived from monolingual corpora. In the present work, we describe an approach to automatically learn bilingual lexica by training a supervised classifier using word embedding-based vectors of only a few hundred translation-equivalent word pairs. The word embedding representations of translation pairs were obtained from source and target monolingual corpora, which are not necessarily related. Our classifier is able to predict whether a new word pair is under a translation relation or not. We tested it on two quite distinct language pairs: Chinese-Spanish and English-Spanish. The classifiers achieved more than 0.90 precision and recall for both language pairs in different evaluation scenarios. These results show a high potential for this method to be used in bilingual lexica production for language pairs with a reduced amount of parallel or comparable corpora, in particular for phrase table expansion in Statistical Machine Translation systems.
Tasks Machine Translation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1353/
PDF https://www.aclweb.org/anthology/L16-1353
PWC https://paperswithcode.com/paper/towards-producing-bilingual-lexica-from
Repo
Framework
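The pair-classification idea above can be sketched as a binary classifier over features derived from source- and target-language embeddings. The sketch below is illustrative only: it uses random vectors in place of real monolingual embeddings, elementwise products as the pair representation (a common choice, not necessarily the paper's exact one), and a minimal logistic regression trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 50  # embedding dimension (illustrative)

def pair_features(n, translated):
    """Elementwise products of toy source/target vectors, standing in
    for features of a real embedding-based pair representation."""
    src = rng.normal(size=(n, DIM))
    if translated:
        tgt = src + 0.3 * rng.normal(size=(n, DIM))  # correlated vectors
    else:
        tgt = rng.normal(size=(n, DIM))              # unrelated vectors
    return src * tgt

X = np.vstack([pair_features(200, True), pair_features(200, False)])
y = np.concatenate([np.ones(200), np.zeros(200)])

# Minimal logistic regression trained by gradient descent.
w = np.zeros(DIM)
b = 0.0
for _ in range(500):
    z = np.clip(X @ w + b, -30, 30)
    p = 1.0 / (1.0 + np.exp(-z))
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)

pred = (np.clip(X @ w + b, -30, 30) > 0.0).astype(float)
accuracy = float(np.mean(pred == y))
print(f"training accuracy: {accuracy:.2f}")
```

Because translated pairs have correlated halves, their product features have positive mean, which a linear model separates easily; the paper's setting, with real embeddings and only a few hundred labeled pairs, is of course harder.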

Linguistic features for Hindi light verb construction identification

Title Linguistic features for Hindi light verb construction identification
Authors Ashwini Vaidya, Sumeet Agarwal, Martha Palmer
Abstract Light verb constructions (LVC) in Hindi are highly productive. Distinguishing a case such as nirnay lenaa 'decision take; decide' from an ordinary verb-argument combination such as kaagaz lenaa 'paper take; take (a) paper' has been shown to aid NLP applications such as parsing (Begum et al., 2011) and machine translation (Pal et al., 2011). In this paper, we propose an LVC identification system using language-specific features for Hindi which shows an improvement over previous work (Begum et al., 2011). To build our system, we carry out a linguistic analysis of Hindi LVCs using Hindi Treebank annotations and propose two new features that are aimed at capturing the diversity of Hindi LVCs in the corpus. We find that our model performs robustly across a diverse range of LVCs and our results underscore the importance of semantic features, which is in keeping with the findings for English. Our error analysis also demonstrates that our classifier can be used to further refine LVC annotations in the Hindi Treebank and make them more consistent across the board.
Tasks Machine Translation
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1125/
PDF https://www.aclweb.org/anthology/C16-1125
PWC https://paperswithcode.com/paper/linguistic-features-for-hindi-light-verb
Repo
Framework

Cross-lingual Learning of an Open-domain Semantic Parser

Title Cross-lingual Learning of an Open-domain Semantic Parser
Authors Kilian Evang, Johan Bos
Abstract We propose a method for learning semantic CCG parsers by projecting annotations via a parallel corpus. The method opens an avenue towards cheaply creating multilingual semantic parsers mapping open-domain text to formal meaning representations. A first cross-lingually learned Dutch (from English) semantic parser obtains f-scores ranging from 42.99% to 69.22% depending on the level of label informativity taken into account, compared to 58.40% to 78.88% for the underlying source-language system. These are promising numbers compared to state-of-the-art semantic parsing in open domains.
Tasks Semantic Parsing
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1056/
PDF https://www.aclweb.org/anthology/C16-1056
PWC https://paperswithcode.com/paper/cross-lingual-learning-of-an-open-domain
Repo
Framework

ConFarm: Extracting Surface Representations of Verb and Noun Constructions from Dependency Annotated Corpora of Russian

Title ConFarm: Extracting Surface Representations of Verb and Noun Constructions from Dependency Annotated Corpora of Russian
Authors Nikita Mediankin
Abstract ConFarm is a web service dedicated to the extraction of surface representations of verb and noun constructions from dependency-annotated corpora of Russian texts. Currently, the extraction of constructions with a specific lemma from SynTagRus and the Russian National Corpus is available. The system provides a flexible interface that allows users to fine-tune the output. Extracted constructions are grouped by their contents to allow for compact representation, and the groups are visualised as a graph to help users navigate the extraction results. ConFarm differs from similar existing tools for Russian in that it offers full constructions, as opposed to extracting separate dependents of a search word or working with collocations, and allows users to discover unexpected constructions as opposed to searching for examples of a user-defined construction.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-2050/
PDF https://www.aclweb.org/anthology/C16-2050
PWC https://paperswithcode.com/paper/confarm-extracting-surface-representations-of
Repo
Framework

An Unsupervised Multi-Document Summarization Framework Based on Neural Document Model

Title An Unsupervised Multi-Document Summarization Framework Based on Neural Document Model
Authors Shulei Ma, Zhi-Hong Deng, Yunlun Yang
Abstract In the age of information explosion, multi-document summarization is attracting particular attention for its ability to help people get the main ideas in a short time. Traditional extractive methods simply treat the document set as a group of sentences while ignoring the global semantics of the documents. Meanwhile, neural document models are effective at representing the semantic content of documents in low-dimensional vectors. In this paper, we propose a document-level reconstruction framework named DocRebuild, which reconstructs the documents with summary sentences through a neural document model and selects summary sentences to minimize the reconstruction error. We also apply two strategies, sentence filtering and beam search, to improve the performance of our method. Experimental results on the benchmark datasets DUC 2006 and DUC 2007 show that DocRebuild is effective and outperforms the state-of-the-art unsupervised algorithms.
Tasks Document Summarization, Multi-Document Summarization, Sentiment Analysis
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1143/
PDF https://www.aclweb.org/anthology/C16-1143
PWC https://paperswithcode.com/paper/an-unsupervised-multi-document-summarization
Repo
Framework
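The select-to-minimize-reconstruction-error idea behind DocRebuild can be sketched with two deliberate simplifications: the "document model" below is just the mean of toy sentence vectors (the paper uses a neural document model), and selection is greedy rather than beam search. All vectors and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sentence vectors; a real system would use a neural document model.
sentences = rng.normal(size=(12, 16))      # 12 sentences, 16-dim vectors
doc_vec = sentences.mean(axis=0)           # document representation

def reconstruction_error(selected):
    """Error of rebuilding the document vector from the chosen sentences."""
    return float(np.linalg.norm(doc_vec - sentences[selected].mean(axis=0)))

# Greedy selection: repeatedly add the sentence that lowers the error most.
budget = 3
summary = []
for _ in range(budget):
    candidates = [i for i in range(len(sentences)) if i not in summary]
    best = min(candidates, key=lambda i: reconstruction_error(summary + [i]))
    summary.append(best)

print("selected sentences:", summary)
print("reconstruction error:", round(reconstruction_error(summary), 4))
```

Swapping the greedy loop for beam search, as the paper does, keeps several partial selections alive per step instead of only the single best one.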

Learning grammatical categories using paradigmatic representations: Substitute words for language acquisition

Title Learning grammatical categories using paradigmatic representations: Substitute words for language acquisition
Authors Mehmet Ali Yatbaz, Volkan Cirik, Aylin Küntay, Deniz Yuret
Abstract Learning syntactic categories is a fundamental task in language acquisition. Previous studies show that co-occurrence patterns of preceding and following words are essential to group words into categories. However, the neighboring words, or frames, are rarely repeated exactly in the data. This creates data sparsity and hampers learning for frame based models. In this work, we propose a paradigmatic representation of word context which uses probable substitutes instead of frames. Our experiments on child-directed speech show that models based on probable substitutes learn more accurate categories with fewer examples compared to models based on frames.
Tasks Language Acquisition
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1068/
PDF https://www.aclweb.org/anthology/C16-1068
PWC https://paperswithcode.com/paper/learning-grammatical-categories-using
Repo
Framework
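The substitute-word idea above, representing a position by the words that could plausibly fill it rather than by its exact neighbor frame, can be sketched with a toy bigram model: score each candidate w for the slot "left _ right" by P(w | left) * P(right | w). The corpus and the bigram scoring are illustrative stand-ins for the paper's actual language model.

```python
from collections import Counter

corpus = "the cat sat on the mat the dog sat on the rug".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab = sorted(unigrams)

def substitutes(left, right):
    """P(w | left _ right) proportional to P(w|left) * P(right|w),
    estimated from raw bigram counts (no smoothing)."""
    scores = {}
    for w in vocab:
        p_w_given_left = bigrams[(left, w)] / unigrams[left]
        p_right_given_w = bigrams[(w, right)] / unigrams[w]
        score = p_w_given_left * p_right_given_w
        if score > 0:
            scores[w] = score
    total = sum(scores.values()) or 1.0
    return {w: s / total for w, s in scores.items()}

print(substitutes("the", "sat"))  # words that fit "the _ sat"
```

On this toy corpus the slot "the _ sat" is filled by "cat" and "dog" with equal probability; words that share a substitute distribution like this are exactly the ones a paradigmatic model groups into one category.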

Grammatical error correction using neural machine translation

Title Grammatical error correction using neural machine translation
Authors Zheng Yuan, Ted Briscoe
Abstract
Tasks Grammatical Error Correction, Language Modelling, Machine Translation
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1042/
PDF https://www.aclweb.org/anthology/N16-1042
PWC https://paperswithcode.com/paper/grammatical-error-correction-using-neural
Repo
Framework

A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability

Title A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability
Authors Elif Ahsen Acar, Deniz Zeyrek, Murathan Kurfalı, Cem Bozşahin
Abstract This study primarily aims to build a Turkish psycholinguistic database including three variables: word frequency, age of acquisition (AoA), and imageability, where AoA and imageability information are limited to nouns. We used a corpus-based approach to obtain information about the AoA variable. We built two corpora: a child literature corpus (CLC) including 535 books written for children aged 3-12, and a corpus of transcribed children's speech (CSC) at ages 1;4-4;8. A comparison between the word frequencies of the CLC and CSC gave positive correlation results, suggesting the usability of the CLC to extract AoA information. We assumed that frequent words of the CLC would correspond to early-acquired words, whereas frequent words of a corpus of adult language would correspond to late-acquired words. To validate the AoA results from our corpus-based approach, a rated-AoA questionnaire was conducted with adults. Imageability values were collected via a different questionnaire conducted with adults. We conclude that it is possible to deduce AoA information for high-frequency words with the corpus-based approach. The results for low-frequency words were inconclusive, which is attributed to the fact that corpus-based AoA information is affected by the strong negative correlation between corpus frequency and rated AoA.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1571/
PDF https://www.aclweb.org/anthology/L16-1571
PWC https://paperswithcode.com/paper/a-turkish-database-for-psycholinguistic
Repo
Framework

Algorithms and matching lower bounds for approximately-convex optimization

Title Algorithms and matching lower bounds for approximately-convex optimization
Authors Andrej Risteski, Yuanzhi Li
Abstract In recent years, a rapidly increasing number of applications in practice requires solving non-convex objectives, like training neural networks, learning graphical models, maximum likelihood estimation, etc. Though simple heuristics such as gradient descent with very few modifications tend to work well, theoretical understanding is very weak. We consider possibly the most natural class of non-convex functions where one could hope to obtain provable guarantees: functions that are "approximately convex", i.e. functions $\tilde{f}: \mathbb{R}^d \to \mathbb{R}$ for which there exists a convex function $f$ such that for all $x$, $\tilde{f}(x) - f(x) \le \Delta$ for a fixed value $\Delta$. We then want to minimize $\tilde{f}$, i.e. output a point $\tilde{x}$ such that $\tilde{f}(\tilde{x}) \le \min_{x} \tilde{f}(x) + \epsilon$. It is quite natural to conjecture that for fixed $\epsilon$, the problem gets harder for larger $\Delta$; however, the exact dependency of $\epsilon$ and $\Delta$ is not known. In this paper, we strengthen the known information-theoretic lower bounds on the trade-off between $\epsilon$ and $\Delta$ substantially, and exhibit an algorithm that matches these lower bounds for a large class of convex bodies.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6576-algorithms-and-matching-lower-bounds-for-approximately-convex-optimization
PDF http://papers.nips.cc/paper/6576-algorithms-and-matching-lower-bounds-for-approximately-convex-optimization.pdf
PWC https://paperswithcode.com/paper/algorithms-and-matching-lower-bounds-for
Repo
Framework

Multiple Emotions Detection in Conversation Transcripts

Title Multiple Emotions Detection in Conversation Transcripts
Authors Duc-Anh Phan, Hiroyuki Shindo, Yuji Matsumoto
Abstract
Tasks
Published 2016-10-01
URL https://www.aclweb.org/anthology/Y16-2006/
PDF https://www.aclweb.org/anthology/Y16-2006
PWC https://paperswithcode.com/paper/multiple-emotions-detection-in-conversation
Repo
Framework

Refurbishing a Morphological Database for German

Title Refurbishing a Morphological Database for German
Authors Petra Steiner
Abstract The CELEX database is one of the standard lexical resources for German. It yields a wealth of data, especially for phonological and morphological applications. The morphological part comprises deep-structure morphological analyses of German. However, as it was developed in the Nineties, both encoding and spelling are outdated. About one fifth of over 50,000 datasets contain umlauts and signs such as ß. Changes to a modern version cannot be obtained by simple substitution. In this paper, we briefly describe the original content and form of the orthographic and morphological database for German in CELEX. Then we present our work on modernizing the linguistic data. Lemmas and morphological analyses are transferred to a modern standard of encoding by first merging orthographic and morphological information of the lemmas and their entries and then performing a second substitution for the morphs within their morphological analyses. Changes to modern German spelling are performed by substitution rules according to orthographical standards. We show an example of the use of the data for the disambiguation of morphological structures. The discussion describes prospects of future work on this or similar lexicons. The Perl script is publicly available on our website.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1176/
PDF https://www.aclweb.org/anthology/L16-1176
PWC https://paperswithcode.com/paper/refurbishing-a-morphological-database-for
Repo
Framework
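The rule-based substitution step described in the abstract can be sketched as a simple mapping pass. The mapping below is hypothetical: the actual CELEX encoding conventions and the authors' Perl rules are not given here, so the old-to-new pairs are purely illustrative.

```python
# Hypothetical encoding map: old ASCII workarounds -> modern characters.
# The real CELEX conventions differ; this only illustrates the idea of
# applying substitution rules to modernize lemma encodings.
SUBSTITUTIONS = {
    '"a': "ä", '"o': "ö", '"u': "ü",
    '"A': "Ä", '"O': "Ö", '"U': "Ü",
    "$": "ß",
}

def modernize(lemma: str) -> str:
    """Apply every substitution rule to one lemma string."""
    for old, new in SUBSTITUTIONS.items():
        lemma = lemma.replace(old, new)
    return lemma

print(modernize("Stra$e"))   # with this toy map -> Straße
print(modernize('"Ubung'))   # with this toy map -> Übung
```

As the abstract notes, a single substitution pass like this is not enough in practice: the same rules must also be applied consistently inside the morphological analyses, which is why the authors merge orthographic and morphological information first.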

Building a Cross-document Event-Event Relation Corpus

Title Building a Cross-document Event-Event Relation Corpus
Authors Yu Hong, Tongtao Zhang, Tim O'Gorman, Sharone Horowit-Hendler, Heng Ji, Martha Palmer
Abstract
Tasks Knowledge Base Population
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-1701/
PDF https://www.aclweb.org/anthology/W16-1701
PWC https://paperswithcode.com/paper/building-a-cross-document-event-event
Repo
Framework

Towards a QUD-Based Analysis of Gapping Constructions

Title Towards a QUD-Based Analysis of Gapping Constructions
Authors Sang-Hee Park
Abstract
Tasks
Published 2016-10-01
URL https://www.aclweb.org/anthology/Y16-2028/
PDF https://www.aclweb.org/anthology/Y16-2028
PWC https://paperswithcode.com/paper/towards-a-qud-based-analysis-of-gapping
Repo
Framework

Character-Aware Neural Networks for Arabic Named Entity Recognition for Social Media

Title Character-Aware Neural Networks for Arabic Named Entity Recognition for Social Media
Authors Mourad Gridach
Abstract Named Entity Recognition (NER) is the task of classifying or labelling atomic elements in text into categories such as Person, Location, or Organisation. For the Arabic language, recognizing named entities is a challenging task because of the complexity and unique characteristics of the language. In addition, most previous work focuses on Modern Standard Arabic (MSA); however, recognizing named entities in social media is becoming more interesting these days. Dialectal Arabic (DA) and MSA are both used in social media, which makes the task even more challenging. Most state-of-the-art Arabic NER systems rely heavily on handcrafted features and lexicons, which is time-consuming. In this paper, we introduce a novel neural network architecture that benefits from both character- and word-level representations automatically, by using a combination of a bidirectional LSTM and a Conditional Random Field (CRF), eliminating the need for most feature engineering. Moreover, our model relies on unsupervised word representations learned from unannotated corpora. Experimental results demonstrate that our model achieves state-of-the-art performance on a publicly available benchmark for Arabic NER in social media, surpassing the previous system by a large margin.
Tasks Feature Engineering, Information Retrieval, Machine Translation, Named Entity Recognition, Opinion Mining, Question Answering, Text Clustering
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-3703/
PDF https://www.aclweb.org/anthology/W16-3703
PWC https://paperswithcode.com/paper/character-aware-neural-networks-for-arabic
Repo
Framework