Paper Group NANR 148
CUFE at SemEval-2016 Task 4: A Gated Recurrent Model for Sentiment Classification
Title | CUFE at SemEval-2016 Task 4: A Gated Recurrent Model for Sentiment Classification |
Authors | Mahmoud Nabil, Amir Atiya, Mohamed Aly |
Abstract | |
Tasks | Language Modelling, Sentiment Analysis, Word Embeddings |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1005/ |
PWC | https://paperswithcode.com/paper/cufe-at-semeval-2016-task-4-a-gated-recurrent |
Repo | |
Framework | |
Towards producing bilingual lexica from monolingual corpora
Title | Towards producing bilingual lexica from monolingual corpora |
Authors | Jingyi Han, Núria Bel |
Abstract | Bilingual lexica are the basis for many cross-lingual natural language processing tasks. Recent work has shown success in learning bilingual dictionaries by taking advantage of comparable corpora and a diverse set of signals derived from monolingual corpora. In the present work, we describe an approach to automatically learn bilingual lexica by training a supervised classifier using word embedding-based vectors of only a few hundred translation-equivalent word pairs. The word embedding representations of translation pairs were obtained from source and target monolingual corpora, which are not necessarily related. Our classifier is able to predict whether a new word pair is under a translation relation or not. We tested it on two quite distinct language pairs, Chinese-Spanish and English-Spanish. The classifiers achieved more than 0.90 precision and recall for both language pairs in different evaluation scenarios. These results show a high potential for this method to be used in bilingual lexica production for language pairs with a reduced amount of parallel or comparable corpora, in particular for phrase table expansion in Statistical Machine Translation systems. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1353/ |
PWC | https://paperswithcode.com/paper/towards-producing-bilingual-lexica-from |
Repo | |
Framework | |
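The classifier described in the abstract above can be illustrated with a small sketch. Everything below is invented for illustration: the embeddings are random toy vectors rather than monolingually trained ones, and the element-wise absolute difference is just one plausible pair feature, not necessarily the one used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim, n_pairs = 50, 300  # "a few hundred" translation-equivalent word pairs

# Toy stand-in for monolingually trained embeddings: a translation pair's
# target vector lies near its source vector; negative pairs are unrelated.
src = rng.normal(size=(n_pairs, dim))
tgt_pos = src + 0.1 * rng.normal(size=(n_pairs, dim))  # translation pairs
tgt_neg = rng.normal(size=(n_pairs, dim))              # random word pairs

# Illustrative pair feature: element-wise absolute difference of embeddings.
X = np.vstack([np.abs(src - tgt_pos), np.abs(src - tgt_neg)])
y = np.array([1] * n_pairs + [0] * n_pairs)

# Train on half the pairs, evaluate "translation or not" on the other half.
clf = LogisticRegression(max_iter=1000).fit(X[::2], y[::2])
accuracy = clf.score(X[1::2], y[1::2])
```

With real embeddings the two monolingual spaces are not aligned, which is why the paper trains a classifier on seed pairs instead of relying on raw vector distance.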
Linguistic features for Hindi light verb construction identification
Title | Linguistic features for Hindi light verb construction identification |
Authors | Ashwini Vaidya, Sumeet Agarwal, Martha Palmer |
Abstract | Light verb constructions (LVC) in Hindi are highly productive. Distinguishing a case such as nirnay lenaa 'decision take; decide' from an ordinary verb-argument combination such as kaagaz lenaa 'paper take; take (a) paper' has been shown to aid NLP applications such as parsing (Begum et al., 2011) and machine translation (Pal et al., 2011). In this paper, we propose an LVC identification system using language-specific features for Hindi which shows an improvement over previous work (Begum et al., 2011). To build our system, we carry out a linguistic analysis of Hindi LVCs using Hindi Treebank annotations and propose two new features that are aimed at capturing the diversity of Hindi LVCs in the corpus. We find that our model performs robustly across a diverse range of LVCs, and our results underscore the importance of semantic features, which is in keeping with the findings for English. Our error analysis also demonstrates that our classifier can be used to further refine LVC annotations in the Hindi Treebank and make them more consistent across the board. |
Tasks | Machine Translation |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1125/ |
PWC | https://paperswithcode.com/paper/linguistic-features-for-hindi-light-verb |
Repo | |
Framework | |
Cross-lingual Learning of an Open-domain Semantic Parser
Title | Cross-lingual Learning of an Open-domain Semantic Parser |
Authors | Kilian Evang, Johan Bos |
Abstract | We propose a method for learning semantic CCG parsers by projecting annotations via a parallel corpus. The method opens an avenue towards cheaply creating multilingual semantic parsers mapping open-domain text to formal meaning representations. A first cross-lingually learned Dutch (from English) semantic parser obtains f-scores ranging from 42.99% to 69.22% depending on the level of label informativity taken into account, compared to 58.40% to 78.88% for the underlying source-language system. These are promising numbers compared to state-of-the-art semantic parsing in open domains. |
Tasks | Semantic Parsing |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1056/ |
PWC | https://paperswithcode.com/paper/cross-lingual-learning-of-an-open-domain |
Repo | |
Framework | |
ConFarm: Extracting Surface Representations of Verb and Noun Constructions from Dependency Annotated Corpora of Russian
Title | ConFarm: Extracting Surface Representations of Verb and Noun Constructions from Dependency Annotated Corpora of Russian |
Authors | Nikita Mediankin |
Abstract | ConFarm is a web service dedicated to the extraction of surface representations of verb and noun constructions from dependency-annotated corpora of Russian texts. Currently, the extraction of constructions with a specific lemma from SynTagRus and the Russian National Corpus is available. The system provides a flexible interface that allows users to fine-tune the output. Extracted constructions are grouped by their contents to allow for compact representation, and the groups are visualised as a graph in order to help navigate the extraction results. ConFarm differs from similar existing tools for the Russian language in that it offers full constructions, as opposed to extracting separate dependents of a search word or working with collocations, and allows users to discover unexpected constructions as opposed to searching for examples of a user-defined construction. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2050/ |
PWC | https://paperswithcode.com/paper/confarm-extracting-surface-representations-of |
Repo | |
Framework | |
An Unsupervised Multi-Document Summarization Framework Based on Neural Document Model
Title | An Unsupervised Multi-Document Summarization Framework Based on Neural Document Model |
Authors | Shulei Ma, Zhi-Hong Deng, Yunlun Yang |
Abstract | In the age of information explosion, multi-document summarization is attracting particular attention for its ability to help people get the main ideas in a short time. Traditional extractive methods simply treat the document set as a group of sentences while ignoring the global semantics of the documents. Meanwhile, neural document models are effective at representing the semantic content of documents in low-dimensional vectors. In this paper, we propose a document-level reconstruction framework named DocRebuild, which reconstructs the documents with summary sentences through a neural document model and selects summary sentences to minimize the reconstruction error. We also apply two strategies, sentence filtering and beam search, to improve the performance of our method. Experimental results on the benchmark datasets DUC 2006 and DUC 2007 show that DocRebuild is effective and outperforms the state-of-the-art unsupervised algorithms. |
Tasks | Document Summarization, Multi-Document Summarization, Sentiment Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1143/ |
PWC | https://paperswithcode.com/paper/an-unsupervised-multi-document-summarization |
Repo | |
Framework | |
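DocRebuild's selection step can be sketched as a greedy loop that minimizes reconstruction error. In this toy illustration, TF-IDF vectors stand in for the paper's neural document model, and the four sentences and the length budget are invented:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "the storm closed roads across the region",
    "officials said power outages affected thousands",
    "the local team won its third game in a row",
    "repair crews worked overnight to restore power",
]

vec = TfidfVectorizer().fit(sentences)
S = vec.transform(sentences).toarray()
doc = S.mean(axis=0)  # stand-in for the document-set representation

def rebuild_error(selected):
    # Distance between the document vector and its reconstruction
    # from the currently selected summary sentences.
    return np.linalg.norm(doc - S[selected].mean(axis=0))

selected, budget = [], 2
while len(selected) < budget:
    # Greedily add the sentence that most reduces reconstruction error.
    best = min((i for i in range(len(sentences)) if i not in selected),
               key=lambda i: rebuild_error(selected + [i]))
    selected.append(best)

summary = [sentences[i] for i in selected]
```

The paper additionally applies sentence filtering and beam search on top of this basic greedy scheme.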
Learning grammatical categories using paradigmatic representations: Substitute words for language acquisition
Title | Learning grammatical categories using paradigmatic representations: Substitute words for language acquisition |
Authors | Mehmet Ali Yatbaz, Volkan Cirik, Aylin Küntay, Deniz Yuret |
Abstract | Learning syntactic categories is a fundamental task in language acquisition. Previous studies show that co-occurrence patterns of preceding and following words are essential to group words into categories. However, the neighboring words, or frames, are rarely repeated exactly in the data. This creates data sparsity and hampers learning for frame-based models. In this work, we propose a paradigmatic representation of word context which uses probable substitutes instead of frames. Our experiments on child-directed speech show that models based on probable substitutes learn more accurate categories with fewer examples compared to models based on frames. |
Tasks | Language Acquisition |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1068/ |
PWC | https://paperswithcode.com/paper/learning-grammatical-categories-using |
Repo | |
Framework | |
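The paradigmatic idea above can be sketched on a toy corpus: score probable substitutes for each word slot (here with a crude add-one-smoothed bigram model instead of the paper's language model, over an invented six-sentence corpus), then cluster word types by their average substitute distributions:

```python
from collections import Counter, defaultdict
import numpy as np
from sklearn.cluster import KMeans

corpus = [
    "the dog ran home", "the cat ran home",
    "the dog slept here", "the cat slept here",
    "a dog walked home", "a cat walked here",
]
sents = [s.split() for s in corpus]
vocab = sorted({w for s in sents for w in s})

# Add-one-smoothed bigram score: how well w fits between prev and nxt.
bigram = Counter((s[i], s[i + 1]) for s in sents for i in range(len(s) - 1))
def score(prev, w, nxt):
    return (bigram[(prev, w)] + 1) * (bigram[(w, nxt)] + 1)

# Average substitute distribution for each word type.
subs = defaultdict(lambda: np.zeros(len(vocab)))
occ = Counter()
for s in sents:
    for i in range(1, len(s) - 1):  # slots with both neighbours present
        dist = np.array([score(s[i - 1], w, s[i + 1]) for w in vocab], float)
        subs[s[i]] += dist / dist.sum()
        occ[s[i]] += 1

words = [w for w in vocab if occ[w]]
X = np.array([subs[w] / occ[w] for w in words])
labels = dict(zip(words, KMeans(n_clusters=2, n_init=10,
                                random_state=0).fit_predict(X)))
```

Even this crude model groups the nouns (dog, cat) apart from the verbs (ran, slept, walked), because words that accept the same substitutes end up with similar distributions regardless of whether their exact frames ever repeat.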
Grammatical error correction using neural machine translation
Title | Grammatical error correction using neural machine translation |
Authors | Zheng Yuan, Ted Briscoe |
Abstract | |
Tasks | Grammatical Error Correction, Language Modelling, Machine Translation |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1042/ |
PWC | https://paperswithcode.com/paper/grammatical-error-correction-using-neural |
Repo | |
Framework | |
A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability
Title | A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability |
Authors | Elif Ahsen Acar, Deniz Zeyrek, Murathan Kurfalı, Cem Bozşahin |
Abstract | This study primarily aims to build a Turkish psycholinguistic database including three variables: word frequency, age of acquisition (AoA), and imageability, where AoA and imageability information are limited to nouns. We used a corpus-based approach to obtain information about the AoA variable. We built two corpora: a child literature corpus (CLC) including 535 books written for children aged 3-12, and a corpus of transcribed children's speech (CSC) at ages 1;4-4;8. A comparison between the word frequencies of the CLC and CSC gave positive correlation results, suggesting the usability of the CLC to extract AoA information. We assumed that frequent words of the CLC would correspond to early-acquired words whereas frequent words of a corpus of adult language would correspond to late-acquired words. To validate AoA results from our corpus-based approach, a rated AoA questionnaire was conducted on adults. Imageability values were collected via a different questionnaire conducted on adults. We conclude that it is possible to deduce AoA information for high-frequency words with the corpus-based approach. The results for low-frequency words were inconclusive, which is attributed to the fact that corpus-based AoA information is affected by the strong negative correlation between corpus frequency and rated AoA. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1571/ |
PWC | https://paperswithcode.com/paper/a-turkish-database-for-psycholinguistic |
Repo | |
Framework | |
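The frequency-correlation check described above can be sketched in a few lines. The Turkish word counts below are invented for illustration; in the paper the counts come from the 535-book CLC and the transcribed-speech CSC:

```python
import numpy as np
from collections import Counter

# Invented frequency counts standing in for the CLC and CSC corpora.
clc = Counter({"anne": 120, "su": 95, "kedi": 80, "okul": 30, "vergi": 2})
csc = Counter({"anne": 300, "su": 180, "kedi": 150, "okul": 20, "vergi": 1})

# Correlate log frequencies over the shared vocabulary; a strong positive
# correlation supports using CLC frequencies as a proxy for AoA.
shared = sorted(set(clc) & set(csc))
x = np.log([clc[w] for w in shared])
y = np.log([csc[w] for w in shared])
r = float(np.corrcoef(x, y)[0, 1])
```

The paper's caveat applies here too: such frequency-based proxies are only reliable for high-frequency words, since low corpus frequency and late rated AoA are confounded.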
Algorithms and matching lower bounds for approximately-convex optimization
Title | Algorithms and matching lower bounds for approximately-convex optimization |
Authors | Andrej Risteski, Yuanzhi Li |
Abstract | In recent years, a rapidly increasing number of applications in practice require solving non-convex objectives, like training neural networks, learning graphical models, maximum likelihood estimation, etc. Though simple heuristics such as gradient descent with very few modifications tend to work well, theoretical understanding is very weak. We consider possibly the most natural class of non-convex functions where one could hope to obtain provable guarantees: functions that are ``approximately convex'', i.e. functions $\tilde{f}: \mathbb{R}^d \to \mathbb{R}$ for which there exists a \emph{convex function} $f$ such that for all $x$, $\tilde{f}(x) - f(x) \le \Delta$ for a fixed value $\Delta$. We then want to minimize $\tilde{f}$, i.e. output a point $\tilde{x}$ such that $\tilde{f}(\tilde{x}) \le \min_{x} \tilde{f}(x) + \epsilon$. It is quite natural to conjecture that for fixed $\epsilon$ the problem gets harder for larger $\Delta$; however, the exact dependency of $\epsilon$ and $\Delta$ is not known. In this paper, we strengthen the known \emph{information-theoretic} lower bounds on the trade-off between $\epsilon$ and $\Delta$ substantially, and exhibit an algorithm that matches these lower bounds for a large class of convex bodies. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6576-algorithms-and-matching-lower-bounds-for-approximately-convex-optimization |
PDF | http://papers.nips.cc/paper/6576-algorithms-and-matching-lower-bounds-for-approximately-convex-optimization.pdf |
PWC | https://paperswithcode.com/paper/algorithms-and-matching-lower-bounds-for |
Repo | |
Framework | |
Multiple Emotions Detection in Conversation Transcripts
Title | Multiple Emotions Detection in Conversation Transcripts |
Authors | Duc-Anh Phan, Hiroyuki Shindo, Yuji Matsumoto |
Abstract | |
Tasks | |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/Y16-2006/ |
PWC | https://paperswithcode.com/paper/multiple-emotions-detection-in-conversation |
Repo | |
Framework | |
Refurbishing a Morphological Database for German
Title | Refurbishing a Morphological Database for German |
Authors | Petra Steiner |
Abstract | The CELEX database is one of the standard lexical resources for German. It yields a wealth of data, especially for phonological and morphological applications. The morphological part comprises deep-structure morphological analyses of German. However, as it was developed in the Nineties, both encoding and spelling are outdated. About one fifth of the over 50,000 datasets contain umlauts and signs such as ß. Changes to a modern version cannot be obtained by simple substitution. In this paper, we shortly describe the original content and form of the orthographic and morphological database for German in CELEX. Then we present our work on modernizing the linguistic data. Lemmas and morphological analyses are transferred to a modern standard of encoding by first merging orthographic and morphological information of the lemmas and their entries and then performing a second substitution for the morphs within their morphological analyses. Changes to modern German spelling are performed by substitution rules according to orthographical standards. We show an example of the use of the data for the disambiguation of morphological structures. The discussion describes prospects of future work on this or similar lexicons. The Perl script is publicly available on our website. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1176/ |
PWC | https://paperswithcode.com/paper/refurbishing-a-morphological-database-for |
Repo | |
Framework | |
Building a Cross-document Event-Event Relation Corpus
Title | Building a Cross-document Event-Event Relation Corpus |
Authors | Yu Hong, Tongtao Zhang, Tim O'Gorman, Sharone Horowit-Hendler, Heng Ji, Martha Palmer |
Abstract | |
Tasks | Knowledge Base Population |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1701/ |
PWC | https://paperswithcode.com/paper/building-a-cross-document-event-event |
Repo | |
Framework | |
Towards a QUD-Based Analysis of Gapping Constructions
Title | Towards a QUD-Based Analysis of Gapping Constructions |
Authors | Sang-Hee Park |
Abstract | |
Tasks | |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/Y16-2028/ |
PWC | https://paperswithcode.com/paper/towards-a-qud-based-analysis-of-gapping |
Repo | |
Framework | |
Character-Aware Neural Networks for Arabic Named Entity Recognition for Social Media
Title | Character-Aware Neural Networks for Arabic Named Entity Recognition for Social Media |
Authors | Mourad Gridach |
Abstract | Named Entity Recognition (NER) is the task of classifying or labelling atomic elements in text into categories such as Person, Location or Organisation. For the Arabic language, recognizing named entities is a challenging task because of the complexity and unique characteristics of the language. In addition, most previous work focuses on Modern Standard Arabic (MSA); however, recognizing named entities in social media is becoming more interesting these days. Dialectal Arabic (DA) and MSA are both used in social media, which makes the task even more challenging. Most state-of-the-art Arabic NER systems rely heavily on handcrafted engineering features and lexicons, which is time-consuming. In this paper, we introduce a novel neural network architecture which benefits from both character- and word-level representations automatically, by using a combination of a bidirectional LSTM and a Conditional Random Field (CRF), eliminating the need for most feature engineering. Moreover, our model relies on unsupervised word representations learned from unannotated corpora. Experimental results demonstrate that our model achieves state-of-the-art performance on a publicly available benchmark for Arabic NER on social media, surpassing the previous system by a large margin. |
Tasks | Feature Engineering, Information Retrieval, Machine Translation, Named Entity Recognition, Opinion Mining, Question Answering, Text Clustering |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3703/ |
PWC | https://paperswithcode.com/paper/character-aware-neural-networks-for-arabic |
Repo | |
Framework | |
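The tagger described above ends in a CRF layer over BiLSTM emission scores. A minimal numpy sketch of that layer's Viterbi decoding is shown below; the emission and transition scores are made-up numbers (in the model they are learned), and the 3-token example and tag set are invented:

```python
import numpy as np

tags = ["O", "B-PER", "I-PER"]

def viterbi(emissions, transitions):
    """emissions: (seq_len, n_tags); transitions[i, j]: score of tag i -> j."""
    seq_len, n_tags = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        # total[i, j]: best path ending in tag i at t-1, then tag j at t.
        total = score[:, None] + transitions + emissions[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # Backtrack from the best final tag.
    path = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [tags[i] for i in reversed(path)]

# Toy scores for a 3-token sentence: transitions penalise an I-PER tag
# that does not follow B-PER/I-PER, as a trained CRF typically learns to.
emissions = np.array([[0.1, 2.0, 0.5],
                      [0.2, 0.3, 1.5],
                      [2.0, 0.1, 0.1]])
transitions = np.array([[0.0, 0.0, -5.0],   # from O
                        [0.0, -1.0, 1.0],   # from B-PER
                        [0.5, -1.0, 0.5]])  # from I-PER
best_tags = viterbi(emissions, transitions)
```

The point of the CRF on top of per-token scores is visible here: the transition matrix enforces a globally consistent tag sequence rather than taking each token's argmax independently.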