Paper Group NANR 125
CaseSummarizer: A System for Automated Summarization of Legal Texts. The NTNU-YZU System in the AESW Shared Task: Automated Evaluation of Scientific Writing Using a Convolutional Neural Network. Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language. TLT-CRF: A Lexicon-supported Morphological Tagger for Latin Ba …
CaseSummarizer: A System for Automated Summarization of Legal Texts
Title | CaseSummarizer: A System for Automated Summarization of Legal Texts |
Authors | Seth Polsley, Pooja Jhunjhunwala, Ruihong Huang |
Abstract | Attorneys, judges, and others in the justice system are constantly surrounded by large amounts of legal text, which can be difficult to manage across many cases. We present CaseSummarizer, a tool for automated text summarization of legal documents which uses standard summary methods based on word frequency augmented with additional domain-specific knowledge. Summaries are then provided through an informative interface with abbreviations, significance heat maps, and other flexible controls. It is evaluated using ROUGE and human scoring against several other summarization systems, including summary text and feedback provided by domain experts. |
Tasks | Text Generation, Text Summarization |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2054/ |
PWC | https://paperswithcode.com/paper/casesummarizer-a-system-for-automated |
Repo | |
Framework | |
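The CaseSummarizer abstract describes word-frequency sentence scoring augmented with domain-specific knowledge. A minimal sketch of that idea in Python, with a hypothetical stop-word list and legal cue terms (the paper's actual lexicon, scoring details, and interface are not reproduced here):

```python
import re
from collections import Counter

# Hypothetical stop words and legal cue terms; the paper's actual
# domain-specific lexicon is not given in the abstract.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that"}
DOMAIN_TERMS = {"plaintiff", "defendant", "court", "judgment"}

def summarize(text, n_sentences=2, domain_boost=2.0):
    """Rank sentences by mean word frequency, boosting domain terms,
    and return the top n in original document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for w in re.findall(r"[a-z]+", text.lower())
                   if w not in STOPWORDS)

    def score(sent):
        toks = [w for w in re.findall(r"[a-z]+", sent.lower())
                if w not in STOPWORDS]
        if not toks:
            return 0.0
        return sum(freq[w] * (domain_boost if w in DOMAIN_TERMS else 1.0)
                   for w in toks) / len(toks)

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in top]
```

Reordering the selected sentences back into document order is a common choice for readability; the boost factor is an assumed stand-in for the paper's domain augmentation.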
The NTNU-YZU System in the AESW Shared Task: Automated Evaluation of Scientific Writing Using a Convolutional Neural Network
Title | The NTNU-YZU System in the AESW Shared Task: Automated Evaluation of Scientific Writing Using a Convolutional Neural Network |
Authors | Lung-Hao Lee, Bo-Lin Lin, Liang-Chih Yu, Yuen-Hsien Tseng |
Abstract | |
Tasks | Grammatical Error Detection |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0513/ |
PWC | https://paperswithcode.com/paper/the-ntnu-yzu-system-in-the-aesw-shared-task |
Repo | |
Framework | |
Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language
Title | Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language |
Authors | Yow-Ting Shiue, Hsin-Hsi Chen |
Abstract | Automated grammatical error detection, which helps users improve their writing, is an important application in NLP. Recently more and more people are learning Chinese, and an automated error detection system can be helpful for these learners. This paper proposes n-gram features, dependency count features, dependency bigram features, and single-character features to determine whether a Chinese sentence contains word usage errors, in which a word is written in a wrong form or the word selection is inappropriate. By marking potential errors at the level of sentence segments, typically delimited by punctuation marks, the system lets the learner try to correct the problems without the assistance of a language teacher. Experiments on the HSK corpus show that the classifier combining all sets of features achieves an accuracy of 0.8423. By utilizing certain combinations of the feature sets, we can construct a system that favors precision or recall. The best precision we achieve is 0.9536, indicating that our system is reliable and seldom produces misleading results. |
Tasks | Grammatical Error Detection |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1033/ |
PWC | https://paperswithcode.com/paper/detecting-word-usage-errors-in-chinese |
Repo | |
Framework | |
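The abstract above classifies sentences by combining feature families extracted from punctuation-delimited segments. A small sketch of two of those families (single-character and character-bigram presence features); the dependency-based features would require a parser and are omitted, and the feature-name scheme here is an assumption:

```python
import re

def ngram_features(segment, n):
    """Presence features for the character n-grams of one segment."""
    return {f"{n}g:{segment[i:i + n]}": 1 for i in range(len(segment) - n + 1)}

def segment_features(sentence):
    """Split a sentence into punctuation-delimited segments, as the paper
    does, and collect single-character and character-bigram features.
    The resulting dict can feed any standard sparse classifier."""
    segments = [s for s in re.split(r"[，。！？,.!?]", sentence) if s]
    feats = {}
    for seg in segments:
        feats.update(ngram_features(seg, 1))
        feats.update(ngram_features(seg, 2))
    return feats
```

Extracting n-grams per segment, rather than over the whole sentence, keeps features from crossing the punctuation boundaries the paper uses for error localization.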
TLT-CRF: A Lexicon-supported Morphological Tagger for Latin Based on Conditional Random Fields
Title | TLT-CRF: A Lexicon-supported Morphological Tagger for Latin Based on Conditional Random Fields |
Authors | Tim vor der Brück, Alexander Mehler |
Abstract | We present a morphological tagger for Latin, called TTLab Latin Tagger based on Conditional Random Fields (TLT-CRF), which uses a large Latin lexicon. Beyond Part of Speech (PoS), TLT-CRF tags eight inflectional categories of verbs, adjectives, and nouns. It utilizes a statistical model based on CRFs together with a rule interpreter that addresses scenarios of sparse training data. We present results of evaluating TLT-CRF to answer the question of what can be learnt following the paradigm of 1st order CRFs in conjunction with a large lexical resource and a rule interpreter. Furthermore, we investigate the contingency of representational features and targeted parts of speech to learn about selective features. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1240/ |
PWC | https://paperswithcode.com/paper/tlt-crf-a-lexicon-supported-morphological |
Repo | |
Framework | |
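The TLT-CRF abstract pairs a statistical model with a large lexicon. The lexicon-as-constraint idea can be sketched as below: the lexicon restricts each token's candidate tags, and a score picks among them. The mini-lexicon and the unigram scores are hypothetical stand-ins; the actual system uses learned 1st-order CRF feature weights, not per-tag constants:

```python
# Hypothetical mini-lexicon mapping Latin word forms to admissible PoS tags;
# the real lexicon covers far more forms and eight inflectional categories.
LEXICON = {
    "puella": {"NOUN"},
    "amat": {"VERB"},
    "magna": {"ADJ", "NOUN"},
}
# Stand-in unigram scores in place of learned CRF feature weights.
TAG_SCORE = {"NOUN": 0.5, "VERB": 0.4, "ADJ": 0.6}

def tag(tokens):
    """Per token, pick the best-scoring tag among those the lexicon allows;
    unknown forms fall back to the full tag set (open-class guess)."""
    out = []
    for tok in tokens:
        candidates = LEXICON.get(tok, set(TAG_SCORE))
        out.append(max(candidates, key=lambda t: TAG_SCORE[t]))
    return out
```

Restricting the search to lexicon-licensed tags is what lets such a tagger stay robust when training data is sparse, which is the scenario the paper's rule interpreter targets.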
Incremental Fine-grained Information Status Classification Using Attention-based LSTMs
Title | Incremental Fine-grained Information Status Classification Using Attention-based LSTMs |
Authors | Yufang Hou |
Abstract | Information status plays an important role in discourse processing. According to the hearer's common sense knowledge and his comprehension of the preceding text, a discourse entity can be old, mediated, or new. In this paper, we propose an attention-based LSTM model to address the problem of fine-grained information status classification in an incremental manner. Our approach resembles how human beings process the task, i.e., deciding the information status of the current discourse entity based on its preceding context. Experimental results on the ISNotes corpus (Markert et al., 2012) reveal that (1) despite its moderate result, our model with only word embedding features captures, to a large extent, the semantic knowledge needed for the task; and (2) when incorporating several additional simple features, our model achieves competitive results compared to the state-of-the-art approach (Hou et al., 2013), which depends heavily on many hand-crafted semantic features. |
Tasks | Common Sense Reasoning |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1177/ |
PWC | https://paperswithcode.com/paper/incremental-fine-grained-information-status |
Repo | |
Framework | |
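The model above attends over the preceding context when classifying the current discourse entity. The core attention-pooling step can be sketched in plain Python; this is generic dot-product attention, an assumed simplification of the paper's learned attention mechanism:

```python
import math

def attend(query, context):
    """Dot-product attention: softmax weights over preceding-context
    vectors, plus the weighted average used as the pooled representation."""
    scores = [sum(q * c for q, c in zip(query, vec)) for vec in context]
    m = max(scores)                         # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    pooled = [sum(w * vec[i] for w, vec in zip(weights, context))
              for i in range(len(query))]
    return weights, pooled
```

In the incremental setting the paper describes, `context` would hold representations of entities mentioned so far and `query` the current mention, so classification sees only preceding text.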
From built examples to attested examples: a syntax-based query for non-specialists
Title | From built examples to attested examples: a syntax-based query for non-specialists |
Authors | Ilaine Wang, Sylvain Kahane, Isabelle Tellier |
Abstract | |
Tasks | |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/Y16-3011/ |
PWC | https://paperswithcode.com/paper/from-built-examples-to-attested-examples-a |
Repo | |
Framework | |
Beyond Plain Spatial Knowledge: Determining Where Entities Are and Are Not Located, and For How Long
Title | Beyond Plain Spatial Knowledge: Determining Where Entities Are and Are Not Located, and For How Long |
Authors | Alakananda Vempala, Eduardo Blanco |
Abstract | |
Tasks | Coreference Resolution, Question Answering |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1142/ |
PWC | https://paperswithcode.com/paper/beyond-plain-spatial-knowledge-determining |
Repo | |
Framework | |
Universal Dependencies: A Cross-Linguistic Perspective on Grammar and Lexicon
Title | Universal Dependencies: A Cross-Linguistic Perspective on Grammar and Lexicon |
Authors | Joakim Nivre |
Abstract | Universal Dependencies is an initiative to develop cross-linguistically consistent grammatical annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning and parsing research from a language typology perspective. It assumes a dependency-based approach to syntax and a lexicalist approach to morphology, which together entail that the fundamental units of grammatical annotation are words. Words have properties captured by morphological annotation and enter into relations captured by syntactic annotation. Moreover, priority is given to relations between lexical content words, as opposed to grammatical function words. In this position paper, I discuss how this approach allows us to capture similarities and differences across typologically diverse languages. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3806/ |
PWC | https://paperswithcode.com/paper/universal-dependencies-a-cross-linguistic |
Repo | |
Framework | |
Reading-Time Annotations for "Balanced Corpus of Contemporary Written Japanese"
Title | Reading-Time Annotations for "Balanced Corpus of Contemporary Written Japanese" |
Authors | Masayuki Asahara, Hajime Ono, Edson T. Miyamoto |
Abstract | The Dundee Eyetracking Corpus contains eyetracking data collected while native speakers of English and French read newspaper editorial articles. Similar resources for other languages are still rare, especially for languages in which words are not overtly delimited with spaces. This is a report on a project to build an eyetracking corpus for Japanese. Measurements were collected while 24 native speakers of Japanese read excerpts from the Balanced Corpus of Contemporary Written Japanese. Texts were presented with or without segmentation (i.e., with or without spaces at the boundaries between bunsetsu segments) and with two methodologies (eyetracking and self-paced reading presentation). Readers' background information, including vocabulary-size estimation and Japanese reading-span score, was also collected. As an example of the possible uses for the corpus, we also report analyses investigating the phenomenon of anti-locality. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1066/ |
PWC | https://paperswithcode.com/paper/reading-time-annotations-for-balanced-corpus |
Repo | |
Framework | |
An Aligned French-Chinese corpus of 10K segments from university educational material
Title | An Aligned French-Chinese corpus of 10K segments from university educational material |
Authors | Ruslan Kalitvianski, Lingxiao Wang, Valérie Bellynck, Christian Boitet |
Abstract | This paper describes a corpus of nearly 10K French-Chinese aligned segments, produced by post-editing machine-translated computer science courseware. This corpus was built from 2013 to 2016 within the PROJECT_NAME project, by native Chinese students. The quality, as judged by native speakers, is adequate for understanding (far better than by reading only the original French) and for getting better marks. This corpus is annotated at segment level by a self-assessed quality score. It has been directly used as supplemental training data to build a statistical machine translation system dedicated to that sublanguage, and can be used to extract the specific bilingual terminology. To our knowledge, it is the first corpus of this kind to be released. |
Tasks | Machine Translation |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4915/ |
PWC | https://paperswithcode.com/paper/an-aligned-french-chinese-corpus-of-10k |
Repo | |
Framework | |
Automatic Corpus Extension for Data-driven Natural Language Generation
Title | Automatic Corpus Extension for Data-driven Natural Language Generation |
Authors | Elena Manishina, Bassam Jabaian, Stéphane Huet, Fabrice Lefèvre |
Abstract | As data-driven approaches started to make their way into the Natural Language Generation (NLG) domain, the need for automation of corpus building and extension became apparent. Corpus creation and extension in the data-driven NLG domain have traditionally involved manual paraphrasing performed either by a group of experts or through crowd-sourcing. Building the training corpora manually is a costly enterprise which requires a lot of time and human resources. We propose to automate the process of corpus extension by integrating automatically obtained synonyms and paraphrases. Our methodology allowed us to significantly increase the size of the training corpus and its level of variability (the number of distinct tokens and specific syntactic structures). Our extension solutions are fully automatic and require only some initial validation. The human evaluation results confirm that in many cases native speakers favor the outputs of the model built on the extended corpus. |
Tasks | Text Generation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1575/ |
PWC | https://paperswithcode.com/paper/automatic-corpus-extension-for-data-driven |
Repo | |
Framework | |
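The extension method above integrates automatically obtained synonyms into the training corpus. The substitution step can be sketched as below; the synonym table here is hypothetical, whereas the paper obtains synonyms and paraphrases automatically and validates them:

```python
from itertools import product

# Hypothetical synonym table; the paper derives these automatically.
SYNONYMS = {
    "cheap": ["inexpensive", "affordable"],
    "hotel": ["inn"],
}

def extend(utterance):
    """Generate every variant obtained by swapping in listed synonyms,
    keeping the original token when no synonyms are known."""
    slots = [[tok] + SYNONYMS.get(tok, []) for tok in utterance.split()]
    return [" ".join(choice) for choice in product(*slots)]
```

The Cartesian product grows quickly, which is one reason such pipelines keep a validation pass over the generated variants before training.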
Generating a Yiddish Speech Corpus, Forced Aligner and Basic ASR System for the AHEYM Project
Title | Generating a Yiddish Speech Corpus, Forced Aligner and Basic ASR System for the AHEYM Project |
Authors | Malgorzata Ćavar, Damir Ćavar, Dov-Ber Kerler, Anya Quilitzsch |
Abstract | To create automatic transcription and annotation tools for the AHEYM corpus of recorded interviews with Yiddish speakers in Eastern Europe, we develop initial Yiddish language resources that are used for adaptations of speech and language technologies. Our project aims at the development of resources and technologies that can make the entire AHEYM corpus and other Yiddish resources more accessible not only to the community of Yiddish speakers or linguists with language expertise, but also to historians and experts from other disciplines or the general public. In this paper we describe the rationale behind our approach, the procedures and methods, and challenges that are not specific to the AHEYM corpus but apply to all documentary language data collected in the field. To the best of our knowledge, this is the first attempt to create a speech corpus and speech technologies for Yiddish, and the first attempt to develop speech and language technologies to transcribe and translate a large collection of Yiddish spoken language resources. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1744/ |
PWC | https://paperswithcode.com/paper/generating-a-yiddish-speech-corpus-forced |
Repo | |
Framework | |
LSTM CCG Parsing
Title | LSTM CCG Parsing |
Authors | Mike Lewis, Kenton Lee, Luke Zettlemoyer |
Abstract | |
Tasks | CCG Supertagging, Structured Prediction, Word Embeddings |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1026/ |
PWC | https://paperswithcode.com/paper/lstm-ccg-parsing |
Repo | |
Framework | |
Dynamic Feature Induction: The Last Gist to the State-of-the-Art
Title | Dynamic Feature Induction: The Last Gist to the State-of-the-Art |
Authors | Jinho D. Choi |
Abstract | |
Tasks | Feature Engineering, Named Entity Recognition, Part-Of-Speech Tagging |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1031/ |
PWC | https://paperswithcode.com/paper/dynamic-feature-induction-the-last-gist-to |
Repo | |
Framework | |
Zara: A Virtual Interactive Dialogue System Incorporating Emotion, Sentiment and Personality Recognition
Title | Zara: A Virtual Interactive Dialogue System Incorporating Emotion, Sentiment and Personality Recognition |
Authors | Pascale Fung, Anik Dey, Farhad Bin Siddique, Ruixi Lin, Yang Yang, Dario Bertero, Yan Wan, Ricky Ho Yin Chan, Chien-Sheng Wu |
Abstract | Zara, or 'Zara the Supergirl', is a virtual robot that can exhibit empathy while interacting with a user, with the aid of its built-in facial and emotion recognition, sentiment analysis, and speech modules. At the end of the 5-10 minute conversation, Zara can give a personality analysis of the user based on all the user utterances. We have also implemented real-time emotion recognition, using a CNN model that detects emotion from raw audio without feature extraction, and have achieved an average of 65.7% accuracy on six different emotion classes, an impressive 4.5% improvement over the conventional feature-based SVM classification. We have also described a CNN-based sentiment analysis module trained using out-of-domain data, which recognizes sentiment from the speech recognition transcript and has a 74.8 F-measure when tested on human-machine dialogues. |
Tasks | Emotion Recognition, Feature Engineering, Sentiment Analysis, Speech Recognition |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2058/ |
PWC | https://paperswithcode.com/paper/zara-a-virtual-interactive-dialogue-system |
Repo | |
Framework | |
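Several entries above report F-measures (e.g. Zara's 74.8 on sentiment). For reference, the standard balanced F1 computation from true positives, false positives, and false negatives:

```python
def f_measure(tp, fp, fn):
    """Balanced F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Papers sometimes report a weighted F-beta instead; the abstracts here do not state a beta, so the balanced form is shown.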