May 4, 2019


Paper Group NANR 184


Segmenting Hashtags using Automatically Created Training Data


Title	Segmenting Hashtags using Automatically Created Training Data
Authors	Arda Çelebi, Arzucan Özgür
Abstract	Hashtags, which are commonly composed of multiple words, are increasingly used to convey the actual messages in tweets. Understanding what tweets say increasingly depends on understanding hashtags. Therefore, identifying the individual words that constitute a hashtag is an important yet challenging task due to the abrupt nature of the language used in tweets. In this study, we introduce a feature-rich approach based on supervised machine learning methods to segment hashtags. Our approach is unsupervised in the sense that, instead of using manually segmented hashtags to train the machine learning classifiers, we automatically create our training data by using tweets as well as by automatically extracting hashtag segmentations from a large corpus. We achieve promising results with such automatically created noisy training data.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1476/
PDF	https://www.aclweb.org/anthology/L16-1476
PWC	https://paperswithcode.com/paper/segmenting-hashtags-using-automatically
Repo
Framework
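
The abstract above does not spell out the classifier or its features, so the following is only a minimal sketch of the underlying task: splitting a hashtag into words. It swaps in a simple dictionary-based dynamic program over unigram scores, which is not the authors' supervised method. The vocabulary, the counts, and the unknown-word penalty are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical toy unigram counts; a real system would estimate these
# from a large corpus. They are assumptions, not data from the paper.
UNIGRAM_COUNTS = {
    "i": 500, "love": 300, "new": 400, "york": 200,
    "now": 350, "playing": 150, "play": 250, "in": 600, "g": 5,
}
TOTAL = sum(UNIGRAM_COUNTS.values())
MAX_WORD_LEN = 20  # longest candidate word considered


def word_logprob(word):
    """Log-probability of a known word, or a length-based penalty otherwise."""
    count = UNIGRAM_COUNTS.get(word)
    if count is None:
        # Assumed penalty: unknown strings get less likely as they get longer.
        return math.log(1.0 / TOTAL) - 2.0 * len(word)
    return math.log(count / TOTAL)


def segment_hashtag(hashtag):
    """Return the highest-scoring split of a hashtag into lowercase words."""
    text = hashtag.lstrip("#").lower()
    n = len(text)
    # best[i] holds (score, start index of the last word) for text[:i].
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            score = best[start][0] + word_logprob(text[start:end])
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the back-pointers from the end of the string.
    words = []
    pos = n
    while pos > 0:
        start = best[pos][1]
        words.append(text[start:pos])
        pos = start
    return words[::-1]


print(segment_hashtag("#ILoveNewYork"))
print(segment_hashtag("#nowplaying"))
```

A segmenter of this kind could serve as one source of noisy training labels or as a baseline; the feature-rich classifiers described in the abstract would replace the unigram score.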

D-GloVe: A Feasible Least Squares Model for Estimating Word Embedding Densities


Title	D-GloVe: A Feasible Least Squares Model for Estimating Word Embedding Densities
Authors	Shoaib Jameel, Steven Schockaert
Abstract	We propose a new word embedding model, inspired by GloVe, which is formulated as a feasible least squares optimization problem. In contrast to existing models, we explicitly represent the uncertainty about the exact definition of each word vector. To this end, we estimate the error that results from using noisy co-occurrence counts in the formulation of the model, and we model the imprecision that results from including uninformative context words. Our experimental results demonstrate that this model compares favourably with existing word embedding models.
Tasks	Word Embeddings
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1174/
PDF	https://www.aclweb.org/anthology/C16-1174
PWC	https://paperswithcode.com/paper/d-glove-a-feasible-least-squares-model-for
Repo
Framework

Potential and Limits of Using Post-edits as Reference Translations for MT Evaluation


Title	Potential and Limits of Using Post-edits as Reference Translations for MT Evaluation
Authors	Maja Popovic, Mihael Arčan, Arle Lommel
Abstract
Tasks	Common Sense Reasoning, Machine Translation
Published	2016-01-01
URL	https://www.aclweb.org/anthology/W16-3410/
PDF	https://www.aclweb.org/anthology/W16-3410
PWC	https://paperswithcode.com/paper/potential-and-limits-of-using-post-edits-as
Repo
Framework

MuTUAL: A Controlled Authoring Support System Enabling Contextual Machine Translation


Title	MuTUAL: A Controlled Authoring Support System Enabling Contextual Machine Translation
Authors	Rei Miyata, Anthony Hartley, Kyo Kageura, Cécile Paris, Masao Utiyama, Eiichiro Sumita
Abstract	The paper introduces a web-based authoring support system, MuTUAL, which aims to help writers create multilingual texts. The highlighted feature of the system is that it enables machine translation (MT) to generate outputs appropriate to their functional context within the target document. Our system is operational online, implementing core mechanisms for document structuring and controlled writing. These include a topic template and a controlled language authoring assistant, linked to our statistical MT system.
Tasks	Machine Translation
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-2008/
PDF	https://www.aclweb.org/anthology/C16-2008
PWC	https://paperswithcode.com/paper/mutual-a-controlled-authoring-support-system
Repo
Framework

Challenges and Solutions for Consistent Annotation of Vietnamese Treebank


Title	Challenges and Solutions for Consistent Annotation of Vietnamese Treebank
Authors	Quy Nguyen, Yusuke Miyao, Ha Le, Ngan Nguyen
Abstract	Treebanks are important resources for researchers in natural language processing, speech recognition, theoretical linguistics, etc. To strengthen the automatic processing of the Vietnamese language, a Vietnamese treebank has been built. However, the quality of this treebank is not satisfactory and is a possible source of the low performance of Vietnamese language processing. We have been building a new treebank for Vietnamese with about 40,000 sentences annotated with three layers: word segmentation, part-of-speech tagging, and bracketing. In this paper, we describe several challenges of the Vietnamese language and how we solve them in developing annotation guidelines. We also present our methods to improve the quality of the annotation guidelines and ensure annotation accuracy and consistency. Experimental results show that inter-annotator agreement ratios and accuracy are higher than 90%, which is satisfactory.
Tasks	Part-Of-Speech Tagging, Speech Recognition
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1243/
PDF	https://www.aclweb.org/anthology/L16-1243
PWC	https://paperswithcode.com/paper/challenges-and-solutions-for-consistent
Repo
Framework

SideNoter: Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation


Title	SideNoter: Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation
Authors	Takeshi Abekawa, Akiko Aizawa
Abstract	In this paper, we discuss our ongoing efforts to construct a scientific paper browsing system that helps users read and understand advanced technical content distributed in PDF. Since PDF is a format specifically designed for printing, the layout and logical structures of documents are indistinguishably embedded in the file. It requires much effort to extract natural language text from PDF files and, conversely, to display semantic annotations produced by NLP tools on the original page layout. In our browsing system, we tackle these issues caused by the gap between printable documents and plain text. Our system provides ways to extract natural language sentences from PDF files together with their logical structures, and also to map arbitrary textual spans to their corresponding regions on page images. We set up a demonstration system using papers published in the ACL Anthology and demonstrate the enhanced search and refined recommendation functions, which we plan to make widely available to NLP researchers.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-2029/
PDF	https://www.aclweb.org/anthology/C16-2029
PWC	https://paperswithcode.com/paper/sidenoter-scholarly-paper-browsing-system
Repo
Framework

NYU-MILA Neural Machine Translation Systems for WMT’16


Title	NYU-MILA Neural Machine Translation Systems for WMT’16
Authors	Junyoung Chung, Kyunghyun Cho, Yoshua Bengio
Abstract
Tasks	Machine Translation
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2309/
PDF	https://www.aclweb.org/anthology/W16-2309
PWC	https://paperswithcode.com/paper/nyu-mila-neural-machine-translation-systems
Repo
Framework

Greedy Feature Construction


Title	Greedy Feature Construction
Authors	Dino Oglic, Thomas Gärtner
Abstract	We present an effective method for supervised feature construction. The main goal of the approach is to construct a feature representation for which a set of linear hypotheses is of sufficient capacity – large enough to contain a satisfactory solution to the considered problem and small enough to allow good generalization from a small number of training examples. We achieve this goal with a greedy procedure that constructs features by empirically fitting squared error residuals. The proposed constructive procedure is consistent and can output a rich set of features. The effectiveness of the approach is evaluated empirically by fitting a linear ridge regression model in the constructed feature space and our empirical results indicate a superior performance of our approach over competing methods.
Tasks
Published	2016-12-01
URL	http://papers.nips.cc/paper/6557-greedy-feature-construction
PDF	http://papers.nips.cc/paper/6557-greedy-feature-construction.pdf
PWC	https://paperswithcode.com/paper/greedy-feature-construction
Repo
Framework
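
The greedy step named in the abstract above, constructing features by fitting squared error residuals, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: the candidate family (random cosine features on one-dimensional inputs), the random search over candidates, and every parameter name are hypothetical choices.

```python
import math
import random


def greedy_construct(xs, ys, n_features=5, n_candidates=100, seed=0):
    """Greedily add features, each chosen to best fit the current residual.

    Assumption: candidates are cos(w * x + b) with randomly drawn w and b.
    Returns the chosen (w, b, coef) triples and the residual sum of squares
    before the first step and after every step.
    """
    rng = random.Random(seed)
    mean_y = sum(ys) / len(ys)
    residual = [y - mean_y for y in ys]
    features = []
    errors = [sum(r * r for r in residual)]
    for _step in range(n_features):
        best = None
        for _cand in range(n_candidates):
            w = rng.gauss(0.0, 2.0)
            b = rng.uniform(0.0, 2.0 * math.pi)
            phi = [math.cos(w * x + b) for x in xs]
            norm = sum(p * p for p in phi)
            if norm == 0.0:
                continue
            # Least-squares coefficient of this candidate against the residual.
            coef = sum(p * r for p, r in zip(phi, residual)) / norm
            # Reduction in squared error if this candidate is added.
            gain = coef * coef * norm
            if best is None or gain > best[0]:
                best = (gain, w, b, coef, phi)
        _, w, b, coef, phi = best
        features.append((w, b, coef))
        residual = [r - coef * p for r, p in zip(residual, phi)]
        errors.append(sum(r * r for r in residual))
    return features, errors


xs = [i / 10 for i in range(50)]
ys = [math.sin(x) for x in xs]
features, errors = greedy_construct(xs, ys)
print([round(e, 4) for e in errors])
```

Each step can only lower the residual sum of squares, because the coefficient is the least-squares optimum for the chosen candidate. In the setting the abstract describes, a linear ridge regression model would then be fit in the constructed feature space; that step is omitted here.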

Chinese Couplet Generation with Neural Network Structures


Title	Chinese Couplet Generation with Neural Network Structures
Authors	Rui Yan, Cheng-Te Li, Xiaohua Hu, Ming Zhang
Abstract
Tasks	Language Modelling, Text Generation
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-1222/
PDF	https://www.aclweb.org/anthology/P16-1222
PWC	https://paperswithcode.com/paper/chinese-couplet-generation-with-neural
Repo
Framework

Neural Network-Based Model for Japanese Predicate Argument Structure Analysis


Title	Neural Network-Based Model for Japanese Predicate Argument Structure Analysis
Authors	Tomohide Shibata, Daisuke Kawahara, Sadao Kurohashi
Abstract
Tasks	Machine Translation, Semantic Role Labeling
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-1117/
PDF	https://www.aclweb.org/anthology/P16-1117
PWC	https://paperswithcode.com/paper/neural-network-based-model-for-japanese
Repo
Framework

SYN2015: Representative Corpus of Contemporary Written Czech


Title	SYN2015: Representative Corpus of Contemporary Written Czech
Authors	Michal Křen, Václav Cvrček, Tomáš Čapka, Anna Čermáková, Milena Hnátková, Lucie Chlumská, Tomáš Jelínek, Dominika Kováříková, Vladimír Petkevič, Pavel Procházka, Hana Skoumalová, Michal Škrabal, Petr Truneček, Pavel Vondřička, Adrian Jan Zasina
Abstract	The paper concentrates on the design, composition and annotation of SYN2015, a new 100-million representative corpus of contemporary written Czech. SYN2015 is a sequel of the representative corpora of the SYN series that can be described as traditional (as opposed to the web-crawled corpora), featuring cleared copyright issues, well-defined composition, reliability of annotation and high-quality text processing. At the same time, SYN2015 is designed as a reflection of the variety of written Czech text production with necessary methodological and technological enhancements that include a detailed bibliographic annotation and text classification based on an updated scheme. The corpus has been produced using a completely rebuilt text processing toolchain called SynKorp. SYN2015 is lemmatized, morphologically and syntactically annotated with state-of-the-art tools. It has been published within the framework of the Czech National Corpus and it is available via the standard corpus query interface KonText at http://kontext.korpus.cz as well as a dataset in shuffled format.
Tasks	Text Classification
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1400/
PDF	https://www.aclweb.org/anthology/L16-1400
PWC	https://paperswithcode.com/paper/syn2015-representative-corpus-of-contemporary
Repo
Framework

Two End-to-end Shallow Discourse Parsers for English and Chinese in CoNLL-2016 Shared Task


Title	Two End-to-end Shallow Discourse Parsers for English and Chinese in CoNLL-2016 Shared Task
Authors	Jianxiang Wang, Man Lan
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/K16-2004/
PDF	https://www.aclweb.org/anthology/K16-2004
PWC	https://paperswithcode.com/paper/two-end-to-end-shallow-discourse-parsers-for
Repo
Framework

Paraphrase for Open Question Answering: New Dataset and Methods


Title	Paraphrase for Open Question Answering: New Dataset and Methods
Authors	Ying Xu, Pascual Martínez-Gómez, Yusuke Miyao, Randy Goebel
Abstract
Tasks	Open Information Extraction, Question Answering, Semantic Parsing
Published	2016-06-01
URL	https://www.aclweb.org/anthology/W16-0109/
PDF	https://www.aclweb.org/anthology/W16-0109
PWC	https://paperswithcode.com/paper/paraphrase-for-open-question-answering-new
Repo
Framework

Gender-Distinguishing Features in Film Dialogue


Title	Gender-Distinguishing Features in Film Dialogue
Authors	Alexandra Schofield, Leo Mehr
Abstract
Tasks	Language Modelling
Published	2016-06-01
URL	https://www.aclweb.org/anthology/W16-0204/
PDF	https://www.aclweb.org/anthology/W16-0204
PWC	https://paperswithcode.com/paper/gender-distinguishing-features-in-film
Repo
Framework

Speech Intelligibility and the Production of Fricative and Affricate among Mandarin-speaking Children with Cerebral Palsy


Title	Speech Intelligibility and the Production of Fricative and Affricate among Mandarin-speaking Children with Cerebral Palsy
Authors	Chin-Ting Liu, Li-mei Chen, Yu-Ching Lin, Chia-Fang Cheng, Hui-chen Chang
Abstract
Tasks
Published	2016-10-01
URL	https://www.aclweb.org/anthology/O16-1016/
PDF	https://www.aclweb.org/anthology/O16-1016
PWC	https://paperswithcode.com/paper/speech-intelligibility-and-the-production-of
Repo
Framework