May 5, 2019


Paper Group NANR 125



Title CaseSummarizer: A System for Automated Summarization of Legal Texts
Authors Seth Polsley, Pooja Jhunjhunwala, Ruihong Huang
Abstract Attorneys, judges, and others in the justice system are constantly surrounded by large amounts of legal text, which can be difficult to manage across many cases. We present CaseSummarizer, a tool for automated text summarization of legal documents which uses standard summary methods based on word frequency augmented with additional domain-specific knowledge. Summaries are then provided through an informative interface with abbreviations, significance heat maps, and other flexible controls. It is evaluated using ROUGE and human scoring against several other summarization systems, including summary text and feedback provided by domain experts.
Tasks Text Generation, Text Summarization
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-2054/
PDF https://www.aclweb.org/anthology/C16-2054
PWC https://paperswithcode.com/paper/casesummarizer-a-system-for-automated
Repo
Framework
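
The abstract above mentions summary methods based on word frequency, augmented with domain-specific knowledge. As a rough illustration only, and not the CaseSummarizer implementation, the sketch below scores each sentence by the average corpus frequency of its non-stopword tokens and adds a bonus for terms in a caller-supplied list. The stopword set, the `boost_terms` parameter, and the scoring formula are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "on", "in", "to", "and", "was", "is"}

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Lowercased alphabetic tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, k=1, boost_terms=()):
    """Return the k highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    boost = {t.lower() for t in boost_terms}

    def score(sentence):
        words = tokenize(sentence)
        if not words:
            return 0.0
        base = sum(freq[w] for w in words) / len(words)  # average word frequency
        bonus = sum(1 for w in words if w in boost)      # hypothetical domain bonus
        return base + bonus

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

text = ("The court ruled on the appeal. The weather was nice. "
        "The court denied the appeal of the defendant.")
print(summarize(text, k=1))
print(summarize(text, k=1, boost_terms=("denied",)))
```

The `boost_terms` argument stands in for whatever domain knowledge a legal summarizer might use; a real system would likely weight entities, dates, or section structure rather than a flat term list.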

The NTNU-YZU System in the AESW Shared Task: Automated Evaluation of Scientific Writing Using a Convolutional Neural Network

Title The NTNU-YZU System in the AESW Shared Task: Automated Evaluation of Scientific Writing Using a Convolutional Neural Network
Authors Lung-Hao Lee, Bo-Lin Lin, Liang-Chih Yu, Yuen-Hsien Tseng
Abstract
Tasks Grammatical Error Detection
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0513/
PDF https://www.aclweb.org/anthology/W16-0513
PWC https://paperswithcode.com/paper/the-ntnu-yzu-system-in-the-aesw-shared-task
Repo
Framework

Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language

Title Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language
Authors Yow-Ting Shiue, Hsin-Hsi Chen
Abstract Automated grammatical error detection, which helps users improve their writing, is an important application in NLP. Recently more and more people are learning Chinese, and an automated error detection system can be helpful for the learners. This paper proposes n-gram features, dependency count features, dependency bigram features, and single-character features to determine if a Chinese sentence contains word usage errors, in which a word is written in a wrong form or the word selection is inappropriate. With potential errors marked at the level of sentence segments, typically delimited by punctuation marks, the learner can try to correct the problems without the assistance of a language teacher. Experiments on the HSK corpus show that the classifier combining all sets of features achieves an accuracy of 0.8423. By utilizing certain combinations of the sets of features, we can construct a system that favors precision or recall. The best precision we achieve is 0.9536, indicating that our system is reliable and seldom produces misleading results.
Tasks Grammatical Error Detection
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1033/
PDF https://www.aclweb.org/anthology/L16-1033
PWC https://paperswithcode.com/paper/detecting-word-usage-errors-in-chinese
Repo
Framework
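
The abstract above describes marking errors at the level of sentence segments delimited by punctuation, using n-gram features among others. The sketch below shows one way such segments and n-gram counts could be produced. It is an assumption-laden stand-in: it uses character n-grams because word n-grams would need an external Chinese word segmenter, the delimiter set and feature naming are invented for this example, and the dependency features and the classifier itself are not shown.

```python
import re
from collections import Counter

# Punctuation used to split a sentence into segments (an illustrative choice).
SEGMENT_DELIMS = r"[，。！？；、,.!?;]"

def split_segments(sentence):
    # Split on punctuation and drop empty pieces.
    return [seg for seg in re.split(SEGMENT_DELIMS, sentence) if seg.strip()]

def char_ngrams(segment, n):
    # All contiguous character n-grams of the segment.
    return [segment[i:i + n] for i in range(len(segment) - n + 1)]

def ngram_features(segment, max_n=2):
    # Count n-grams for n = 1..max_n, keyed as "<n>gram=<text>".
    feats = Counter()
    for n in range(1, max_n + 1):
        for gram in char_ngrams(segment, n):
            feats[f"{n}gram={gram}"] += 1
    return feats

for seg in split_segments("我今天很高兴，因为天气很好。"):
    print(seg, dict(ngram_features(seg)))
```

Each segment's feature counts could then be fed to any standard classifier to decide whether that segment contains a usage error.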

TLT-CRF: A Lexicon-supported Morphological Tagger for Latin Based on Conditional Random Fields

Title TLT-CRF: A Lexicon-supported Morphological Tagger for Latin Based on Conditional Random Fields
Authors Tim vor der Brück, Alexander Mehler
Abstract We present a morphological tagger for Latin, called TTLab Latin Tagger based on Conditional Random Fields (TLT-CRF), which uses a large Latin lexicon. Beyond Part of Speech (PoS), TLT-CRF tags eight inflectional categories of verbs, adjectives or nouns. It utilizes a statistical model based on CRFs together with a rule interpreter that addresses scenarios of sparse training data. We present results of evaluating TLT-CRF to answer the question of what can be learnt following the paradigm of 1st order CRFs in conjunction with a large lexical resource and a rule interpreter. Furthermore, we investigate the contingency of representational features and targeted parts of speech to learn about selective features.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1240/
PDF https://www.aclweb.org/anthology/L16-1240
PWC https://paperswithcode.com/paper/tlt-crf-a-lexicon-supported-morphological
Repo
Framework

Incremental Fine-grained Information Status Classification Using Attention-based LSTMs

Title Incremental Fine-grained Information Status Classification Using Attention-based LSTMs
Authors Yufang Hou
Abstract Information status plays an important role in discourse processing. According to the hearer's common sense knowledge and his comprehension of the preceding text, a discourse entity could be old, mediated or new. In this paper, we propose an attention-based LSTM model to address the problem of fine-grained information status classification in an incremental manner. Our approach resembles how human beings process the task, i.e., decide the information status of the current discourse entity based on its preceding context. Experimental results on the ISNotes corpus (Markert et al., 2012) reveal that (1) despite its moderate result, our model with only word embedding features captures the semantic knowledge needed for the task to a large extent; and (2) when several additional simple features are incorporated, our model achieves competitive results compared to the state-of-the-art approach (Hou et al., 2013), which heavily depends on many hand-crafted semantic features.
Tasks Common Sense Reasoning
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1177/
PDF https://www.aclweb.org/anthology/C16-1177
PWC https://paperswithcode.com/paper/incremental-fine-grained-information-status
Repo
Framework

From built examples to attested examples: a syntax-based query for non-specialists

Title From built examples to attested examples: a syntax-based query for non-specialists
Authors Ilaine Wang, Sylvain Kahane, Isabelle Tellier
Abstract
Tasks
Published 2016-10-01
URL https://www.aclweb.org/anthology/Y16-3011/
PDF https://www.aclweb.org/anthology/Y16-3011
PWC https://paperswithcode.com/paper/from-built-examples-to-attested-examples-a
Repo
Framework

Beyond Plain Spatial Knowledge: Determining Where Entities Are and Are Not Located, and For How Long

Title Beyond Plain Spatial Knowledge: Determining Where Entities Are and Are Not Located, and For How Long
Authors Alakananda Vempala, Eduardo Blanco
Abstract
Tasks Coreference Resolution, Question Answering
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-1142/
PDF https://www.aclweb.org/anthology/P16-1142
PWC https://paperswithcode.com/paper/beyond-plain-spatial-knowledge-determining
Repo
Framework

Universal Dependencies: A Cross-Linguistic Perspective on Grammar and Lexicon

Title Universal Dependencies: A Cross-Linguistic Perspective on Grammar and Lexicon
Authors Joakim Nivre
Abstract Universal Dependencies is an initiative to develop cross-linguistically consistent grammatical annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning and parsing research from a language typology perspective. It assumes a dependency-based approach to syntax and a lexicalist approach to morphology, which together entail that the fundamental units of grammatical annotation are words. Words have properties captured by morphological annotation and enter into relations captured by syntactic annotation. Moreover, priority is given to relations between lexical content words, as opposed to grammatical function words. In this position paper, I discuss how this approach allows us to capture similarities and differences across typologically diverse languages.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-3806/
PDF https://www.aclweb.org/anthology/W16-3806
PWC https://paperswithcode.com/paper/universal-dependencies-a-cross-linguistic
Repo
Framework

Reading-Time Annotations for "Balanced Corpus of Contemporary Written Japanese"

Title Reading-Time Annotations for "Balanced Corpus of Contemporary Written Japanese"
Authors Masayuki Asahara, Hajime Ono, Edson T. Miyamoto
Abstract The Dundee Eyetracking Corpus contains eyetracking data collected while native speakers of English and French read newspaper editorial articles. Similar resources for other languages are still rare, especially for languages in which words are not overtly delimited with spaces. This is a report on a project to build an eyetracking corpus for Japanese. Measurements were collected while 24 native speakers of Japanese read excerpts from the Balanced Corpus of Contemporary Written Japanese. Texts were presented with or without segmentation (i.e. with or without space at the boundaries between bunsetsu segmentations) and with two types of methodologies (eyetracking and self-paced reading presentation). Readers' background information, including vocabulary-size estimation and Japanese reading-span score, was also collected. As an example of the possible uses for the corpus, we also report analyses investigating the phenomena of anti-locality.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1066/
PDF https://www.aclweb.org/anthology/C16-1066
PWC https://paperswithcode.com/paper/reading-time-annotations-for-balanced-corpus
Repo
Framework

An Aligned French-Chinese corpus of 10K segments from university educational material

Title An Aligned French-Chinese corpus of 10K segments from university educational material
Authors Ruslan Kalitvianski, Lingxiao Wang, Valérie Bellynck, Christian Boitet
Abstract This paper describes a corpus of nearly 10K French-Chinese aligned segments, produced by post-editing machine translated computer science courseware. This corpus was built from 2013 to 2016 within the PROJECT_NAME project, by native Chinese students. The quality, as judged by native speakers, is adequate for understanding (far better than by reading only the original French) and for getting better marks. This corpus is annotated at segment-level by a self-assessed quality score. It has been directly used as supplemental training data to build a statistical machine translation system dedicated to that sublanguage, and can be used to extract the specific bilingual terminology. To our knowledge, it is the first corpus of this kind to be released.
Tasks Machine Translation
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4915/
PDF https://www.aclweb.org/anthology/W16-4915
PWC https://paperswithcode.com/paper/an-aligned-french-chinese-corpus-of-10k
Repo
Framework

Automatic Corpus Extension for Data-driven Natural Language Generation

Title Automatic Corpus Extension for Data-driven Natural Language Generation
Authors Elena Manishina, Bassam Jabaian, Stéphane Huet, Fabrice Lefèvre
Abstract As data-driven approaches started to make their way into the Natural Language Generation (NLG) domain, the need for automation of corpus building and extension became apparent. Corpus creation and extension in data-driven NLG domain traditionally involved manual paraphrasing performed by either a group of experts or with resort to crowd-sourcing. Building the training corpora manually is a costly enterprise which requires a lot of time and human resources. We propose to automate the process of corpus extension by integrating automatically obtained synonyms and paraphrases. Our methodology allowed us to significantly increase the size of the training corpus and its level of variability (the number of distinct tokens and specific syntactic structures). Our extension solutions are fully automatic and require only some initial validation. The human evaluation results confirm that in many cases native speakers favor the outputs of the model built on the extended corpus.
Tasks Text Generation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1575/
PDF https://www.aclweb.org/anthology/L16-1575
PWC https://paperswithcode.com/paper/automatic-corpus-extension-for-data-driven
Repo
Framework
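
The abstract above proposes extending an NLG training corpus by integrating automatically obtained synonyms and paraphrases. A minimal sketch of the synonym half of that idea follows. It is not the authors' pipeline: the `synonyms` dictionary is a hypothetical hand-written input (the paper obtains its synonyms automatically), and whitespace tokenization is a simplification.

```python
from itertools import product

def extend_corpus(sentences, synonyms):
    """Generate sentence variants by substituting known synonyms for tokens.

    `synonyms` maps a lowercase token to a list of replacements; it is a
    hypothetical, caller-supplied resource used only for this illustration.
    """
    extended = []
    seen = set()
    for sentence in sentences:
        tokens = sentence.split()
        # For each token: the original form plus any listed synonyms.
        options = [[tok] + list(synonyms.get(tok.lower(), [])) for tok in tokens]
        for combo in product(*options):
            variant = " ".join(combo)
            if variant not in seen:  # keep each variant once, in generation order
                seen.add(variant)
                extended.append(variant)
    return extended

synonyms = {"cheap": ["inexpensive"], "restaurant": ["place"]}
for variant in extend_corpus(["a cheap restaurant"], synonyms):
    print(variant)
```

The number of variants grows multiplicatively with the number of substitutable tokens, so a practical version would probably cap or filter the output, which fits the abstract's remark that some initial validation is still required.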

Generating a Yiddish Speech Corpus, Forced Aligner and Basic ASR System for the AHEYM Project

Title Generating a Yiddish Speech Corpus, Forced Aligner and Basic ASR System for the AHEYM Project
Authors Malgorzata Ćavar, Damir Ćavar, Dov-Ber Kerler, Anya Quilitzsch
Abstract To create automatic transcription and annotation tools for the AHEYM corpus of recorded interviews with Yiddish speakers in Eastern Europe we develop initial Yiddish language resources that are used for adaptations of speech and language technologies. Our project aims at the development of resources and technologies that can make the entire AHEYM corpus and other Yiddish resources more accessible to not only the community of Yiddish speakers or linguists with language expertise, but also historians and experts from other disciplines or the general public. In this paper we describe the rationale behind our approach, the procedures and methods, and challenges that are not specific to the AHEYM corpus, but apply to all documentary language data that is collected in the field. To the best of our knowledge, this is the first attempt to create a speech corpus and speech technologies for Yiddish. This is also the first attempt to work out speech and language technologies to transcribe and translate a large collection of Yiddish spoken language resources.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1744/
PDF https://www.aclweb.org/anthology/L16-1744
PWC https://paperswithcode.com/paper/generating-a-yiddish-speech-corpus-forced
Repo
Framework

LSTM CCG Parsing

Title LSTM CCG Parsing
Authors Mike Lewis, Kenton Lee, Luke Zettlemoyer
Abstract
Tasks CCG Supertagging, Structured Prediction, Word Embeddings
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1026/
PDF https://www.aclweb.org/anthology/N16-1026
PWC https://paperswithcode.com/paper/lstm-ccg-parsing
Repo
Framework

Dynamic Feature Induction: The Last Gist to the State-of-the-Art

Title Dynamic Feature Induction: The Last Gist to the State-of-the-Art
Authors Jinho D. Choi
Abstract
Tasks Feature Engineering, Named Entity Recognition, Part-Of-Speech Tagging
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1031/
PDF https://www.aclweb.org/anthology/N16-1031
PWC https://paperswithcode.com/paper/dynamic-feature-induction-the-last-gist-to
Repo
Framework

Zara: A Virtual Interactive Dialogue System Incorporating Emotion, Sentiment and Personality Recognition

Title Zara: A Virtual Interactive Dialogue System Incorporating Emotion, Sentiment and Personality Recognition
Authors Pascale Fung, Anik Dey, Farhad Bin Siddique, Ruixi Lin, Yang Yang, Dario Bertero, Yan Wan, Ricky Ho Yin Chan, Chien-Sheng Wu
Abstract Zara, or 'Zara the Supergirl', is a virtual robot that can exhibit empathy while interacting with a user, with the aid of its built-in facial and emotion recognition, sentiment analysis, and speech module. At the end of the 5-10 minute conversation, Zara can give a personality analysis of the user based on all the user utterances. We have also implemented real-time emotion recognition, using a CNN model that detects emotion from raw audio without feature extraction, and have achieved an average of 65.7% accuracy on six different emotion classes, which is an impressive 4.5% improvement over the conventional feature-based SVM classification. Also, we have described a CNN-based sentiment analysis module trained using out-of-domain data, which recognizes sentiment from the speech recognition transcript and has a 74.8 F-measure when tested on human-machine dialogues.
Tasks Emotion Recognition, Feature Engineering, Sentiment Analysis, Speech Recognition
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-2058/
PDF https://www.aclweb.org/anthology/C16-2058
PWC https://paperswithcode.com/paper/zara-a-virtual-interactive-dialogue-system
Repo
Framework