Paper Group NANR 189
Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language
Title | Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language |
Authors | Mo Shen, Wingmui Li, HyunJeong Choe, Chenhui Chu, Daisuke Kawahara, Sadao Kurohashi |
Abstract | In this paper, we propose a new annotation approach to Chinese word segmentation, part-of-speech (POS) tagging and dependency labelling that aims to overcome the two major issues in traditional morphology-based annotation: Inconsistency and data sparsity. We re-annotate the Penn Chinese Treebank 5.0 (CTB5) and demonstrate the advantages of this approach compared to the original CTB5 annotation through word segmentation, POS tagging and machine translation experiments. |
Tasks | Chinese Word Segmentation, Machine Translation, Morphological Analysis, Part-Of-Speech Tagging |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1029/ |
PWC | https://paperswithcode.com/paper/consistent-word-segmentation-part-of-speech |
Repo | |
Framework | |
Splitting compounds with ngrams
Title | Splitting compounds with ngrams |
Authors | Naomi Tachikawa Shapiro |
Abstract | Compound words with unmarked word boundaries are problematic for many tasks in NLP and computational linguistics, including information extraction, machine translation, and syllabification. This paper introduces a simple, proof-of-concept language modeling approach to automatic compound segmentation, as applied to Finnish. This approach utilizes an off-the-shelf morphological analyzer to split training words into their constituent morphemes. A language model is subsequently trained on ngrams composed of morphemes, morpheme boundaries, and word boundaries. Linguistic constraints are then used to weed out phonotactically ill-formed segmentations, thereby allowing the language model to select the best grammatical segmentation. This approach achieves an accuracy of ~97%. |
Tasks | Language Modelling, Machine Translation, Morphological Analysis, Semantic Parsing |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1061/ |
PWC | https://paperswithcode.com/paper/splitting-compounds-with-ngrams |
Repo | |
Framework | |
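The abstract above describes scoring candidate segmentations with a language model over morpheme ngrams. A minimal sketch of that general idea follows, assuming a bigram model with add-one smoothing and a tiny hand-split training list. The names `TRAIN`, `train_bigrams`, `candidates`, and `best_split` are hypothetical, and the sketch omits the morphological analyzer and the phonotactic constraints the paper uses, so it is not the author's implementation.

```python
import math
from collections import Counter

# Hypothetical toy training data: words already split into morphemes.
# The paper obtains such splits from a morphological analyzer.
TRAIN = [["kirja", "kauppa"], ["kirja"], ["kauppa"], ["auto", "kauppa"], ["auto"]]


def train_bigrams(words):
    """Count unigram contexts and bigrams, using '#' as the word boundary."""
    unigrams, bigrams = Counter(), Counter()
    for morphs in words:
        seq = ["#"] + morphs + ["#"]
        unigrams.update(seq[:-1])          # every token that starts a bigram
        bigrams.update(zip(seq, seq[1:]))
    return unigrams, bigrams


def score(morphs, unigrams, bigrams, vocab_size):
    """Log-probability of a segmentation under an add-one smoothed bigram model."""
    seq = ["#"] + morphs + ["#"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(seq, seq[1:])
    )


def candidates(word, lexicon):
    """Yield every way to split `word` into morphemes found in `lexicon`."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        if word[:i] in lexicon:
            for rest in candidates(word[i:], lexicon):
                yield [word[:i]] + rest


def best_split(word, unigrams, bigrams):
    """Return the highest-scoring segmentation, or the word itself if none exists."""
    lexicon = {m for m in unigrams if m != "#"}
    vocab_size = len(lexicon) + 1          # morphemes plus the boundary symbol
    options = list(candidates(word, lexicon))
    if not options:
        return [word]
    return max(options, key=lambda m: score(m, unigrams, bigrams, vocab_size))
```

Calling `train_bigrams(TRAIN)` and passing the two counters to `best_split` segments a word seen only in parts during training. A fuller version would filter the candidates with language-specific constraints before scoring, as the abstract describes.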
Using phone features to improve dialogue state tracking generalisation to unseen states
Title | Using phone features to improve dialogue state tracking generalisation to unseen states |
Authors | Iñigo Casanueva, Thomas Hain, Mauro Nicolao, Phil Green |
Abstract | |
Tasks | Dialogue State Tracking, Spoken Language Understanding |
Published | 2016-09-01 |
URL | https://www.aclweb.org/anthology/W16-3611/ |
PWC | https://paperswithcode.com/paper/using-phone-features-to-improve-dialogue |
Repo | |
Framework | |
LVCSR System on a Hybrid GPU-CPU Embedded Platform for Real-Time Dialog Applications
Title | LVCSR System on a Hybrid GPU-CPU Embedded Platform for Real-Time Dialog Applications |
Authors | Alexei V. Ivanov, Patrick L. Lange, David Suendermann-Oeft |
Abstract | |
Tasks | Language Modelling, Large Vocabulary Continuous Speech Recognition, Speech Recognition, Spoken Language Understanding |
Published | 2016-09-01 |
URL | https://www.aclweb.org/anthology/W16-3627/ |
PWC | https://paperswithcode.com/paper/lvcsr-system-on-a-hybrid-gpu-cpu-embedded |
Repo | |
Framework | |
Unravelling Names of Fictional Characters
Title | Unravelling Names of Fictional Characters |
Authors | Katerina Papantoniou, Stasinos Konstantopoulos |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1203/ |
PWC | https://paperswithcode.com/paper/unravelling-names-of-fictional-characters |
Repo | |
Framework | |
Analysing the Integration of Semantic Web Features for Document Planning across Genres
Title | Analysing the Integration of Semantic Web Features for Document Planning across Genres |
Authors | Marta Vicente, Elena Lloret |
Abstract | |
Tasks | Text Generation |
Published | 2016-09-01 |
URL | https://www.aclweb.org/anthology/W16-3513/ |
PWC | https://paperswithcode.com/paper/analysing-the-integration-of-semantic-web |
Repo | |
Framework | |
Fast Coupled Sequence Labeling on Heterogeneous Annotations via Context-aware Pruning
Title | Fast Coupled Sequence Labeling on Heterogeneous Annotations via Context-aware Pruning |
Authors | Zhenghua Li, Jiayuan Chao, Min Zhang, Jiwen Yang |
Abstract | |
Tasks | |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1072/ |
PWC | https://paperswithcode.com/paper/fast-coupled-sequence-labeling-on |
Repo | |
Framework | |
Identifying Individual Differences in Gender, Ethnicity, and Personality from Dialogue for Deception Detection
Title | Identifying Individual Differences in Gender, Ethnicity, and Personality from Dialogue for Deception Detection |
Authors | Sarah Ita Levitan, Yocheved Levitan, Guozhen An, Michelle Levine, Rivka Levitan, Andrew Rosenberg, Julia Hirschberg |
Abstract | |
Tasks | Deception Detection |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0806/ |
PWC | https://paperswithcode.com/paper/identifying-individual-differences-in-gender |
Repo | |
Framework | |
Zero-resource Dependency Parsing: Boosting Delexicalized Cross-lingual Transfer with Linguistic Knowledge
Title | Zero-resource Dependency Parsing: Boosting Delexicalized Cross-lingual Transfer with Linguistic Knowledge |
Authors | Lauriane Aufrant, Guillaume Wisniewski, François Yvon |
Abstract | This paper studies cross-lingual transfer for dependency parsing, focusing on very low-resource settings where delexicalized transfer is the only fully automatic option. We show how to boost parsing performance by rewriting the source sentences so as to better match the linguistic regularities of the target language. We contrast a data-driven approach with an approach relying on linguistically motivated rules automatically extracted from the World Atlas of Language Structures. Our findings are backed up by experiments involving 40 languages. They show that both approaches greatly outperform the baseline, the knowledge-driven method yielding the best accuracies, with average improvements of +2.9 UAS, and up to +90 UAS (absolute) on some frequent PoS configurations. |
Tasks | Active Learning, Cross-Lingual Transfer, Dependency Parsing, Machine Translation |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1012/ |
PWC | https://paperswithcode.com/paper/zero-resource-dependency-parsing-boosting |
Repo | |
Framework | |
Political News Sentiment Analysis for Under-resourced Languages
Title | Political News Sentiment Analysis for Under-resourced Languages |
Authors | Patrik F. Bakken, Terje A. Bratlie, Cristina Marco, Jon Atle Gulla |
Abstract | This paper presents classification results for the analysis of sentiment in political news articles. The domain of political news is particularly challenging, as journalists are presumably objective, whilst at the same time opinions can be subtly expressed. To deal with this challenge, in this work we conduct a two-step classification model, distinguishing first subjective and second positive and negative sentiment texts. More specifically, we propose a shallow machine learning approach where only minimal features are needed to train the classifier, including sentiment-bearing Co-Occurring Terms (COTs) and negation words. This approach yields close to state-of-the-art results. Contrary to results in other domains, the use of negations as features does not have a positive impact in the evaluation results. This method is particularly suited for languages that suffer from a lack of resources, such as sentiment lexicons or parsers, and for those systems that need to function in real-time. |
Tasks | Sentiment Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1281/ |
PWC | https://paperswithcode.com/paper/political-news-sentiment-analysis-for-under |
Repo | |
Framework | |
Discontinuity (Re)²-visited: A Minimalist Approach to Pseudoprojective Constituent Parsing
Title | Discontinuity (Re)²-visited: A Minimalist Approach to Pseudoprojective Constituent Parsing |
Authors | Yannick Versley |
Abstract | |
Tasks | Constituency Parsing, Dependency Parsing |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0907/ |
PWC | https://paperswithcode.com/paper/discontinuity-re-visited-a-minimalist |
Repo | |
Framework | |
Finding metaphorical triggers through source (not target) domain lexicalization patterns
Title | Finding metaphorical triggers through source (not target) domain lexicalization patterns |
Authors | Jenny Lederer |
Abstract | |
Tasks | |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-1101/ |
PWC | https://paperswithcode.com/paper/finding-metaphorical-triggers-through-source |
Repo | |
Framework | |
Legacy language atlas data mining: mapping Kru languages
Title | Legacy language atlas data mining: mapping Kru languages |
Authors | Dafydd Gibbon |
Abstract | An online tool based on dialectometric methods, DistGraph, is applied to a group of Kru languages of Côte d'Ivoire, Liberia and Burkina Faso. The inputs to this resource consist of tables of languages × linguistic features (e.g. phonological, lexical or grammatical), and statistical and graphical outputs are generated which show similarities and differences between the languages in terms of the features as virtual distances. In the present contribution, attention is focussed on the consonant systems of the languages, a traditional starting point for language comparison. The data are harvested from a legacy language data resource based on fieldwork in the 1970s and 1980s, a language atlas of the Kru languages. The method on which the online tool is based extends beyond documentation of individual languages to the documentation of language groups, and supports difference-based prioritisation in education programmes, decisions on language policy and documentation and conservation funding, as well as research on language typology and heritage documentation of history and migration. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1515/ |
PWC | https://paperswithcode.com/paper/legacy-language-atlas-data-mining-mapping-kru |
Repo | |
Framework | |
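The abstract above describes turning a table of languages × features into pairwise "virtual distances". The abstract does not say which distance measure DistGraph uses, so the sketch below assumes a normalized Hamming distance over binary consonant features. The language labels and feature values in `FEATURES` are placeholders rather than data from the Kru atlas, and `distance` and `distance_table` are hypothetical names.

```python
from itertools import combinations

# Placeholder table: rows are languages, columns are consonant features
# (1 = the consonant is in the inventory, 0 = it is not). Not real atlas data.
FEATURES = {
    "LangA": {"p": 1, "b": 1, "kp": 1, "gb": 0},
    "LangB": {"p": 1, "b": 1, "kp": 0, "gb": 0},
    "LangC": {"p": 1, "b": 0, "kp": 0, "gb": 1},
}


def distance(a, b):
    """Share of features on which two languages differ (normalized Hamming)."""
    keys = sorted(set(a) | set(b))
    differing = sum(a.get(k) != b.get(k) for k in keys)
    return differing / len(keys)


def distance_table(table):
    """Pairwise distances for every unordered pair of languages."""
    return {
        (x, y): distance(table[x], table[y])
        for x, y in combinations(sorted(table), 2)
    }
```

The resulting dictionary can feed a clustering or plotting step. Any other feature-based metric could be substituted for `distance` without changing the rest of the sketch.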
Language Resource Addition Strategies for Raw Text Parsing
Title | Language Resource Addition Strategies for Raw Text Parsing |
Authors | Atsushi Ushiku, Tetsuro Sasada, Shinsuke Mori |
Abstract | We focus on the improvement of accuracy of raw text parsing, from the viewpoint of language resource addition. In Japanese, the raw text parsing is divided into three steps: word segmentation, part-of-speech tagging, and dependency parsing. We investigate the contribution of language resource addition in each of three steps to the improvement in accuracy for two domain corpora. The experimental results show that this improvement depends on the target domain. For example, when we handle well-written texts of limited vocabulary, white paper, an effective language resource is a word-POS pair sequence corpus for the parsing accuracy. So we conclude that it is important to check out the characteristics of the target domain and to choose a suitable language resource addition strategy for the parsing accuracy improvement. |
Tasks | Dependency Parsing, Part-Of-Speech Tagging |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1105/ |
PWC | https://paperswithcode.com/paper/language-resource-addition-strategies-for-raw |
Repo | |
Framework | |
Most "babies" are "little" and most "problems" are "huge": Compositional Entailment in Adjective-Nouns
Title | Most "babies" are "little" and most "problems" are "huge": Compositional Entailment in Adjective-Nouns |
Authors | Ellie Pavlick, Chris Callison-Burch |
Abstract | |
Tasks | Common Sense Reasoning, Natural Language Inference |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1204/ |
PWC | https://paperswithcode.com/paper/most-babies-are-little-and-most-problems-are |
Repo | |
Framework | |