May 5, 2019


Paper Group NANR 125


CaseSummarizer: A System for Automated Summarization of Legal Texts


Title	CaseSummarizer: A System for Automated Summarization of Legal Texts
Authors	Seth Polsley, Pooja Jhunjhunwala, Ruihong Huang
Abstract	Attorneys, judges, and others in the justice system are constantly surrounded by large amounts of legal text, which can be difficult to manage across many cases. We present CaseSummarizer, a tool for automated text summarization of legal documents which uses standard summary methods based on word frequency augmented with additional domain-specific knowledge. Summaries are then provided through an informative interface with abbreviations, significance heat maps, and other flexible controls. It is evaluated using ROUGE and human scoring against several other summarization systems, including summary text and feedback provided by domain experts.
Tasks	Text Generation, Text Summarization
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-2054/
PDF	https://www.aclweb.org/anthology/C16-2054
PWC	https://paperswithcode.com/paper/casesummarizer-a-system-for-automated
Repo
Framework
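
The abstract describes CaseSummarizer as building on standard word-frequency summarization. As a rough illustration of that baseline only (not the authors' system, and without its domain-specific knowledge), a minimal Python sketch can score each sentence by the average frequency of its non-stopword tokens and keep the top `k`. The stopword list, the naive regex sentence splitter, and the `summarize` function are all assumptions made for this illustration.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was",
             "for", "on", "that", "by", "with"}

def summarize(text: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence: str) -> float:
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOPWORDS]
        # Average (not summed) frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Restore document order for the selected sentences.
    return [sentences[i] for i in sorted(ranked[:k])]
```

For example, on a three-sentence passage about a contract dispute, the sentences that repeat "court" and "contract" outrank an unrelated sentence about the weather.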

The NTNU-YZU System in the AESW Shared Task: Automated Evaluation of Scientific Writing Using a Convolutional Neural Network


Title	The NTNU-YZU System in the AESW Shared Task: Automated Evaluation of Scientific Writing Using a Convolutional Neural Network
Authors	Lung-Hao Lee, Bo-Lin Lin, Liang-Chih Yu, Yuen-Hsien Tseng
Abstract
Tasks	Grammatical Error Detection
Published	2016-06-01
URL	https://www.aclweb.org/anthology/W16-0513/
PDF	https://www.aclweb.org/anthology/W16-0513
PWC	https://paperswithcode.com/paper/the-ntnu-yzu-system-in-the-aesw-shared-task
Repo
Framework

Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language


Title	Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language
Authors	Yow-Ting Shiue, Hsin-Hsi Chen
Abstract	Automated grammatical error detection, which helps users improve their writing, is an important application in NLP. Recently more and more people are learning Chinese, and an automated error detection system can be helpful for these learners. This paper proposes n-gram features, dependency count features, dependency bigram features, and single-character features to determine whether a Chinese sentence contains word usage errors, in which a word is written in the wrong form or the word selection is inappropriate. By marking potential errors at the level of sentence segments, typically delimited by punctuation marks, the system lets learners try to correct the problems without the assistance of a language teacher. Experiments on the HSK corpus show that the classifier combining all sets of features achieves an accuracy of 0.8423. By using certain combinations of the feature sets, we can construct a system that favors either precision or recall. The best precision we achieve is 0.9536, indicating that our system is reliable and seldom produces misleading results.
Tasks	Grammatical Error Detection
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1033/
PDF	https://www.aclweb.org/anthology/L16-1033
PWC	https://paperswithcode.com/paper/detecting-word-usage-errors-in-chinese
Repo
Framework
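
The abstract mentions n-gram features and marking errors on sentence segments delimited by punctuation. A minimal sketch of those two preprocessing steps follows, assuming a hypothetical delimiter set and plain character n-grams; the paper's actual n-gram features may be defined differently (for example, via corpus frequencies), and `split_segments` and `char_ngrams` are illustrative names, not from the paper.

```python
import re
from collections import Counter

# Full-width Chinese and ASCII punctuation treated as segment delimiters;
# this delimiter set is an assumption, not taken from the paper.
DELIMITERS = r"[，。！？；：、,.!?;:]"

def split_segments(sentence: str) -> list[str]:
    """Split a sentence into punctuation-delimited segments, dropping empty ones."""
    return [seg for seg in re.split(DELIMITERS, sentence) if seg]

def char_ngrams(segment: str, n: int) -> Counter:
    """Count the character n-grams of a segment (no word segmentation needed)."""
    return Counter(segment[i:i + n] for i in range(len(segment) - n + 1))
```

Using characters rather than words keeps this illustration independent of a word segmenter; each segment's counts could then be passed to a per-segment classifier.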

TLT-CRF: A Lexicon-supported Morphological Tagger for Latin Based on Conditional Random Fields


Title	TLT-CRF: A Lexicon-supported Morphological Tagger for Latin Based on Conditional Random Fields
Authors	Tim vor der Brück, Alexander Mehler
Abstract	We present a morphological tagger for Latin, called TTLab Latin Tagger based on Conditional Random Fields (TLT-CRF), which uses a large Latin lexicon. Beyond Part of Speech (PoS), TLT-CRF tags eight inflectional categories of verbs, adjectives or nouns. It utilizes a statistical model based on CRFs together with a rule interpreter that addresses scenarios of sparse training data. We present results of evaluating TLT-CRF to answer the question of what can be learnt following the paradigm of 1st-order CRFs in conjunction with a large lexical resource and a rule interpreter. Furthermore, we investigate the contingency of representational features and targeted parts of speech to learn about selective features.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1240/
PDF	https://www.aclweb.org/anthology/L16-1240
PWC	https://paperswithcode.com/paper/tlt-crf-a-lexicon-supported-morphological
Repo
Framework

Incremental Fine-grained Information Status Classification Using Attention-based LSTMs


Title	Incremental Fine-grained Information Status Classification Using Attention-based LSTMs
Authors	Yufang Hou
Abstract	Information status plays an important role in discourse processing. According to the hearer's common sense knowledge and his comprehension of the preceding text, a discourse entity could be old, mediated or new. In this paper, we propose an attention-based LSTM model to address the problem of fine-grained information status classification in an incremental manner. Our approach resembles how human beings process the task, i.e., it decides the information status of the current discourse entity based on its preceding context. Experimental results on the ISNotes corpus (Markert et al., 2012) reveal that (1) despite its moderate results, our model with only word embedding features captures, to a large extent, the semantic knowledge needed for the task; and (2) when several additional simple features are incorporated, our model achieves results competitive with the state-of-the-art approach (Hou et al., 2013), which depends heavily on many hand-crafted semantic features.
Tasks	Common Sense Reasoning
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1177/
PDF	https://www.aclweb.org/anthology/C16-1177
PWC	https://paperswithcode.com/paper/incremental-fine-grained-information-status
Repo
Framework

From built examples to attested examples: a syntax-based query for non-specialists


Title	From built examples to attested examples: a syntax-based query for non-specialists
Authors	Ilaine Wang, Sylvain Kahane, Isabelle Tellier
Abstract
Tasks
Published	2016-10-01
URL	https://www.aclweb.org/anthology/Y16-3011/
PDF	https://www.aclweb.org/anthology/Y16-3011
PWC	https://paperswithcode.com/paper/from-built-examples-to-attested-examples-a
Repo
Framework

Beyond Plain Spatial Knowledge: Determining Where Entities Are and Are Not Located, and For How Long


Title	Beyond Plain Spatial Knowledge: Determining Where Entities Are and Are Not Located, and For How Long
Authors	Alakananda Vempala, Eduardo Blanco
Abstract
Tasks	Coreference Resolution, Question Answering
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-1142/
PDF	https://www.aclweb.org/anthology/P16-1142
PWC	https://paperswithcode.com/paper/beyond-plain-spatial-knowledge-determining
Repo
Framework

Universal Dependencies: A Cross-Linguistic Perspective on Grammar and Lexicon


Title	Universal Dependencies: A Cross-Linguistic Perspective on Grammar and Lexicon
Authors	Joakim Nivre
Abstract	Universal Dependencies is an initiative to develop cross-linguistically consistent grammatical annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning and parsing research from a language typology perspective. It assumes a dependency-based approach to syntax and a lexicalist approach to morphology, which together entail that the fundamental units of grammatical annotation are words. Words have properties captured by morphological annotation and enter into relations captured by syntactic annotation. Moreover, priority is given to relations between lexical content words, as opposed to grammatical function words. In this position paper, I discuss how this approach allows us to capture similarities and differences across typologically diverse languages.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-3806/
PDF	https://www.aclweb.org/anthology/W16-3806
PWC	https://paperswithcode.com/paper/universal-dependencies-a-cross-linguistic
Repo
Framework

Reading-Time Annotations for “Balanced Corpus of Contemporary Written Japanese”


Title	Reading-Time Annotations for “Balanced Corpus of Contemporary Written Japanese”
Authors	Masayuki Asahara, Hajime Ono, Edson T. Miyamoto
Abstract	The Dundee Eyetracking Corpus contains eyetracking data collected while native speakers of English and French read newspaper editorial articles. Similar resources for other languages are still rare, especially for languages in which words are not overtly delimited with spaces. This is a report on a project to build an eyetracking corpus for Japanese. Measurements were collected while 24 native speakers of Japanese read excerpts from the Balanced Corpus of Contemporary Written Japanese. Texts were presented with or without segmentation (i.e., with or without spaces at the boundaries between bunsetsu segments) and with two types of methodologies (eyetracking and self-paced reading presentation). Readers' background information, including vocabulary-size estimates and Japanese reading-span scores, was also collected. As an example of the possible uses of the corpus, we also report analyses investigating the phenomenon of anti-locality.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1066/
PDF	https://www.aclweb.org/anthology/C16-1066
PWC	https://paperswithcode.com/paper/reading-time-annotations-for-balanced-corpus
Repo
Framework

An Aligned French-Chinese corpus of 10K segments from university educational material


Title	An Aligned French-Chinese corpus of 10K segments from university educational material
Authors	Ruslan Kalitvianski, Lingxiao Wang, Valérie Bellynck, Christian Boitet
Abstract	This paper describes a corpus of nearly 10K French-Chinese aligned segments, produced by post-editing machine-translated computer science courseware. This corpus was built from 2013 to 2016 within the PROJECT_NAME project by native Chinese students. The quality, as judged by native speakers, is adequate for understanding (far better than reading only the original French) and for getting better marks. The corpus is annotated at segment level with a self-assessed quality score. It has been used directly as supplemental training data to build a statistical machine translation system dedicated to that sublanguage, and can be used to extract the specific bilingual terminology. To our knowledge, it is the first corpus of this kind to be released.
Tasks	Machine Translation
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4915/
PDF	https://www.aclweb.org/anthology/W16-4915
PWC	https://paperswithcode.com/paper/an-aligned-french-chinese-corpus-of-10k
Repo
Framework

Automatic Corpus Extension for Data-driven Natural Language Generation


Title	Automatic Corpus Extension for Data-driven Natural Language Generation
Authors	Elena Manishina, Bassam Jabaian, Stéphane Huet, Fabrice Lefèvre
Abstract	As data-driven approaches started to make their way into the Natural Language Generation (NLG) domain, the need to automate corpus building and extension became apparent. Corpus creation and extension in data-driven NLG traditionally involved manual paraphrasing performed either by a group of experts or through crowdsourcing. Building training corpora manually is a costly enterprise that requires a lot of time and human resources. We propose to automate the process of corpus extension by integrating automatically obtained synonyms and paraphrases. Our methodology allowed us to significantly increase the size of the training corpus and its level of variability (the number of distinct tokens and specific syntactic structures). Our extension solutions are fully automatic and require only some initial validation. The human evaluation results confirm that in many cases native speakers favor the outputs of the model built on the extended corpus.
Tasks	Text Generation
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1575/
PDF	https://www.aclweb.org/anthology/L16-1575
PWC	https://paperswithcode.com/paper/automatic-corpus-extension-for-data-driven
Repo
Framework
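
The idea of extending a corpus by integrating synonyms can be illustrated with a toy sketch. Assuming a small hand-written synonym lexicon (the paper obtains synonyms and paraphrases automatically, so this is not its pipeline), each token of a training utterance is either kept or swapped for one of its synonyms, and the combinations are enumerated up to a cap. The `SYNONYMS` table and `expand` function are hypothetical.

```python
from itertools import islice, product

# Hypothetical synonym lexicon; the paper obtains synonyms automatically.
SYNONYMS = {
    "cheap": ["inexpensive", "affordable"],
    "restaurant": ["eatery"],
}

def expand(utterance: str, max_variants: int = 10) -> list[str]:
    """Generate variants of a training utterance by swapping in synonyms."""
    tokens = utterance.split()
    # Each position can keep its original token or use any listed synonym.
    options = [[tok] + SYNONYMS.get(tok, []) for tok in tokens]
    # product() enumerates combinations lazily; islice caps how many are built.
    return [" ".join(choice) for choice in islice(product(*options), max_variants)]
```

The cap matters because the number of variants grows multiplicatively with the number of substitutable tokens; the first variant is always the unchanged utterance.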

Generating a Yiddish Speech Corpus, Forced Aligner and Basic ASR System for the AHEYM Project


Title	Generating a Yiddish Speech Corpus, Forced Aligner and Basic ASR System for the AHEYM Project
Authors	Malgorzata Ćavar, Damir Ćavar, Dov-Ber Kerler, Anya Quilitzsch
Abstract	To create automatic transcription and annotation tools for the AHEYM corpus of recorded interviews with Yiddish speakers in Eastern Europe, we develop initial Yiddish language resources that are used for adaptations of speech and language technologies. Our project aims at the development of resources and technologies that can make the entire AHEYM corpus and other Yiddish resources more accessible not only to the community of Yiddish speakers or linguists with language expertise, but also to historians, experts from other disciplines, and the general public. In this paper we describe the rationale behind our approach, the procedures and methods, and challenges that are not specific to the AHEYM corpus but apply to all documentary language data collected in the field. To the best of our knowledge, this is the first attempt to create a speech corpus and speech technologies for Yiddish. This is also the first attempt to work out speech and language technologies to transcribe and translate a large collection of Yiddish spoken language resources.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1744/
PDF	https://www.aclweb.org/anthology/L16-1744
PWC	https://paperswithcode.com/paper/generating-a-yiddish-speech-corpus-forced
Repo
Framework

LSTM CCG Parsing


Title	LSTM CCG Parsing
Authors	Mike Lewis, Kenton Lee, Luke Zettlemoyer
Abstract
Tasks	CCG Supertagging, Structured Prediction, Word Embeddings
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1026/
PDF	https://www.aclweb.org/anthology/N16-1026
PWC	https://paperswithcode.com/paper/lstm-ccg-parsing
Repo
Framework

Dynamic Feature Induction: The Last Gist to the State-of-the-Art


Title	Dynamic Feature Induction: The Last Gist to the State-of-the-Art
Authors	Jinho D. Choi
Abstract
Tasks	Feature Engineering, Named Entity Recognition, Part-Of-Speech Tagging
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1031/
PDF	https://www.aclweb.org/anthology/N16-1031
PWC	https://paperswithcode.com/paper/dynamic-feature-induction-the-last-gist-to
Repo
Framework

Zara: A Virtual Interactive Dialogue System Incorporating Emotion, Sentiment and Personality Recognition


Title	Zara: A Virtual Interactive Dialogue System Incorporating Emotion, Sentiment and Personality Recognition
Authors	Pascale Fung, Anik Dey, Farhad Bin Siddique, Ruixi Lin, Yang Yang, Dario Bertero, Yan Wan, Ricky Ho Yin Chan, Chien-Sheng Wu
Abstract	Zara, or ‘Zara the Supergirl’, is a virtual robot that can exhibit empathy while interacting with a user, with the aid of its built-in facial and emotion recognition, sentiment analysis, and speech modules. At the end of a 5-10 minute conversation, Zara can give a personality analysis of the user based on all the user's utterances. We have also implemented real-time emotion recognition using a CNN model that detects emotion from raw audio without feature extraction, achieving an average accuracy of 65.7% on six different emotion classes, a 4.5% improvement over conventional feature-based SVM classification. We also describe a CNN-based sentiment analysis module, trained on out-of-domain data, that recognizes sentiment from the speech recognition transcript and achieves an F-measure of 74.8 when tested on human-machine dialogues.
Tasks	Emotion Recognition, Feature Engineering, Sentiment Analysis, Speech Recognition
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-2058/
PDF	https://www.aclweb.org/anthology/C16-2058
PWC	https://paperswithcode.com/paper/zara-a-virtual-interactive-dialogue-system
Repo
Framework