May 4, 2019

Paper Group NANR 184

Segmenting Hashtags using Automatically Created Training Data

Title Segmenting Hashtags using Automatically Created Training Data
Authors Arda Çelebi, Arzucan Özgür
Abstract Hashtags, which are commonly composed of multiple words, are increasingly used to convey the actual messages in tweets. Understanding what tweets are saying is getting more dependent on understanding hashtags. Therefore, identifying the individual words that constitute a hashtag is an important, yet a challenging task due to the abrupt nature of the language used in tweets. In this study, we introduce a feature-rich approach based on using supervised machine learning methods to segment hashtags. Our approach is unsupervised in the sense that instead of using manually segmented hashtags for training the machine learning classifiers, we automatically create our training data by using tweets as well as by automatically extracting hashtag segmentations from a large corpus. We achieve promising results with such automatically created noisy training data.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1476/
PDF https://www.aclweb.org/anthology/L16-1476
PWC https://paperswithcode.com/paper/segmenting-hashtags-using-automatically
Repo
Framework
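The abstract describes a supervised, feature-rich segmenter trained on automatically created data. As a point of reference only, here is a much simpler, swapped-in baseline, not the authors' method: dictionary-based segmentation that picks the split maximizing unigram log-probability by dynamic programming. The word counts in `COUNTS` and the unseen-word penalty are hypothetical toy choices, not data from the paper.

```python
import math
from functools import lru_cache

# Hypothetical toy unigram counts; a real segmenter would estimate these
# from a large corpus such as a tweet collection.
COUNTS = {"we": 50, "love": 40, "new": 30, "york": 20,
          "newyork": 1, "i": 60, "lo": 2, "ve": 1}
TOTAL = sum(COUNTS.values())


def word_logprob(word):
    """Log-probability of a word; unseen strings get a length-based penalty (an assumption)."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment(hashtag):
    """Return the most probable split of a hashtag body into known words."""
    text = hashtag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def best(i):
        # best(i) -> (total log-probability, words) for the suffix text[i:]
        if i == len(text):
            return 0.0, ()
        candidates = []
        for j in range(i + 1, len(text) + 1):
            score, rest = best(j)
            candidates.append((word_logprob(text[i:j]) + score, (text[i:j],) + rest))
        return max(candidates)

    return list(best(0)[1])


if __name__ == "__main__":
    print(segment("#WeLoveNewYork"))  # -> ['we', 'love', 'new', 'york']
```

Note how "newyork" is in the toy dictionary yet loses to "new" + "york", because the two frequent words together are more probable than the rare joined form.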

D-GloVe: A Feasible Least Squares Model for Estimating Word Embedding Densities

Title D-GloVe: A Feasible Least Squares Model for Estimating Word Embedding Densities
Authors Shoaib Jameel, Steven Schockaert
Abstract We propose a new word embedding model, inspired by GloVe, which is formulated as a feasible least squares optimization problem. In contrast to existing models, we explicitly represent the uncertainty about the exact definition of each word vector. To this end, we estimate the error that results from using noisy co-occurrence counts in the formulation of the model, and we model the imprecision that results from including uninformative context words. Our experimental results demonstrate that this model compares favourably with existing word embedding models.
Tasks Word Embeddings
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1174/
PDF https://www.aclweb.org/anthology/C16-1174
PWC https://paperswithcode.com/paper/d-glove-a-feasible-least-squares-model-for
Repo
Framework
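For background, D-GloVe starts from GloVe (Pennington et al., 2014), which fits word vectors by weighted least squares over the co-occurrence counts $X_{ij}$. The objective below is that of standard GloVe, not the D-GloVe formulation, which this summary does not spell out; per the abstract, D-GloVe additionally models the error from noisy counts $X_{ij}$ and the imprecision from uninformative context words.

```latex
J = \sum_{i,j:\,X_{ij} > 0} f(X_{ij})
    \left( \mathbf{w}_i^{\top} \tilde{\mathbf{w}}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max}, \\
  1 & \text{otherwise.}
\end{cases}
```

The original GloVe paper used $x_{\max} = 100$ and $\alpha = 3/4$.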

Potential and Limits of Using Post-edits as Reference Translations for MT Evaluation

Title Potential and Limits of Using Post-edits as Reference Translations for MT Evaluation
Authors Maja Popovic, Mihael Arčan, Arle Lommel
Abstract
Tasks Common Sense Reasoning, Machine Translation
Published 2016-01-01
URL https://www.aclweb.org/anthology/W16-3410/
PDF https://www.aclweb.org/anthology/W16-3410
PWC https://paperswithcode.com/paper/potential-and-limits-of-using-post-edits-as
Repo
Framework

MuTUAL: A Controlled Authoring Support System Enabling Contextual Machine Translation

Title MuTUAL: A Controlled Authoring Support System Enabling Contextual Machine Translation
Authors Rei Miyata, Anthony Hartley, Kyo Kageura, Cécile Paris, Masao Utiyama, Eiichiro Sumita
Abstract The paper introduces a web-based authoring support system, MuTUAL, which aims to help writers create multilingual texts. The highlighted feature of the system is that it enables machine translation (MT) to generate outputs appropriate to their functional context within the target document. Our system is operational online, implementing core mechanisms for document structuring and controlled writing. These include a topic template and a controlled language authoring assistant, linked to our statistical MT system.
Tasks Machine Translation
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-2008/
PDF https://www.aclweb.org/anthology/C16-2008
PWC https://paperswithcode.com/paper/mutual-a-controlled-authoring-support-system
Repo
Framework

Challenges and Solutions for Consistent Annotation of Vietnamese Treebank

Title Challenges and Solutions for Consistent Annotation of Vietnamese Treebank
Authors Quy Nguyen, Yusuke Miyao, Ha Le, Ngan Nguyen
Abstract Treebanks are important resources for researchers in natural language processing, speech recognition, theoretical linguistics, etc. To strengthen the automatic processing of the Vietnamese language, a Vietnamese treebank has been built. However, the quality of this treebank is not satisfactory and is a possible source for the low performance of Vietnamese language processing. We have been building a new treebank for Vietnamese with about 40,000 sentences annotated with three layers: word segmentation, part-of-speech tagging, and bracketing. In this paper, we describe several challenges of Vietnamese language and how we solve them in developing annotation guidelines. We also present our methods to improve the quality of the annotation guidelines and ensure annotation accuracy and consistency. Experiment results show that inter-annotator agreement ratios and accuracy are higher than 90% which is satisfactory.
Tasks Part-Of-Speech Tagging, Speech Recognition
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1243/
PDF https://www.aclweb.org/anthology/L16-1243
PWC https://paperswithcode.com/paper/challenges-and-solutions-for-consistent
Repo
Framework
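The abstract reports inter-annotator agreement above 90% but this summary does not say which measure was used. For reference, here is a minimal sketch of two common measures for two annotators' aligned label sequences (for example POS tags): raw observed agreement and Cohen's kappa, which corrects for chance agreement. The function names are illustrative, not taken from the paper.

```python
from collections import Counter


def observed_agreement(a, b):
    """Fraction of items on which two annotators assign the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), with p_e from each annotator's label distribution."""
    p_o = observed_agreement(a, b)
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    ann1 = ["N", "V", "N", "A"]
    ann2 = ["N", "V", "V", "A"]
    print(observed_agreement(ann1, ann2))      # -> 0.75
    print(round(cohens_kappa(ann1, ann2), 3))  # -> 0.636
```

Kappa is lower than raw agreement here because part of the 75% overlap would be expected by chance given how often each annotator uses each tag.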

SideNoter: Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation

Title SideNoter: Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation
Authors Takeshi Abekawa, Akiko Aizawa
Abstract In this paper, we discuss our ongoing efforts to construct a scientific paper browsing system that helps users to read and understand advanced technical content distributed in PDF. Since PDF is a format specifically designed for printing, layout and logical structures of documents are indistinguishably embedded in the file. It requires much effort to extract natural language text from PDF files, and reversely, display semantic annotations produced by NLP tools on the original page layout. In our browsing system, we tackle these issues caused by the gap between printable document and plain text. Our system provides ways to extract natural language sentences from PDF files together with their logical structures, and also to map arbitrary textual spans to their corresponding regions on page images. We set up a demonstration system using papers published in ACL anthology and demonstrate the enhanced search and refined recommendation functions which we plan to make widely available to NLP researchers.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-2029/
PDF https://www.aclweb.org/anthology/C16-2029
PWC https://paperswithcode.com/paper/sidenoter-scholarly-paper-browsing-system
Repo
Framework

NYU-MILA Neural Machine Translation Systems for WMT’16

Title NYU-MILA Neural Machine Translation Systems for WMT’16
Authors Junyoung Chung, Kyunghyun Cho, Yoshua Bengio
Abstract
Tasks Machine Translation
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2309/
PDF https://www.aclweb.org/anthology/W16-2309
PWC https://paperswithcode.com/paper/nyu-mila-neural-machine-translation-systems
Repo
Framework

Greedy Feature Construction

Title Greedy Feature Construction
Authors Dino Oglic, Thomas Gärtner
Abstract We present an effective method for supervised feature construction. The main goal of the approach is to construct a feature representation for which a set of linear hypotheses is of sufficient capacity – large enough to contain a satisfactory solution to the considered problem and small enough to allow good generalization from a small number of training examples. We achieve this goal with a greedy procedure that constructs features by empirically fitting squared error residuals. The proposed constructive procedure is consistent and can output a rich set of features. The effectiveness of the approach is evaluated empirically by fitting a linear ridge regression model in the constructed feature space and our empirical results indicate a superior performance of our approach over competing methods.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6557-greedy-feature-construction
PDF http://papers.nips.cc/paper/6557-greedy-feature-construction.pdf
PWC https://paperswithcode.com/paper/greedy-feature-construction
Repo
Framework
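The loop the abstract describes (construct features one at a time by fitting the current squared-error residual, then fit linear ridge regression in the constructed feature space) can be sketched as follows. This is a minimal illustration under assumptions of our own, not the authors' algorithm: candidate features are random cosine ridge functions cos(w·x + b) picked by sampling, rather than by whatever fitting procedure the paper actually uses, and `greedy_features`, `transform`, and all parameter defaults are hypothetical.

```python
import numpy as np


def ridge_fit(Phi, y, lam):
    """Closed-form ridge regression weights for design matrix Phi."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)


def transform(X, params):
    """Map inputs to the constructed space: a bias column plus one cosine feature per (w, b)."""
    cols = [np.ones(X.shape[0])] + [np.cos(X @ w + b) for w, b in params]
    return np.column_stack(cols)


def greedy_features(X, y, n_features=10, n_candidates=200, lam=1e-3, seed=0):
    """Greedily add the sampled cosine feature that best fits the current residual."""
    rng = np.random.default_rng(seed)
    params = []                       # chosen (w, b) pairs
    Phi = np.ones((X.shape[0], 1))    # start from a bias-only model
    for _ in range(n_features):
        residual = y - Phi @ ridge_fit(Phi, y, lam)
        best = None
        for _ in range(n_candidates):
            w = rng.normal(size=X.shape[1])
            b = rng.uniform(0.0, 2.0 * np.pi)
            f = np.cos(X @ w + b)
            # squared error of the best least-squares fit a * f to the residual
            a = (f @ residual) / (f @ f)
            err = np.sum((residual - a * f) ** 2)
            if best is None or err < best[0]:
                best = (err, w, b, f)
        _, w, b, f = best
        params.append((w, b))
        Phi = np.column_stack([Phi, f])
    # final linear ridge regression in the constructed feature space
    return params, ridge_fit(Phi, y, lam)


if __name__ == "__main__":
    X = np.linspace(-2, 2, 100).reshape(-1, 1)
    y = np.sin(3 * X[:, 0])
    params, coef = greedy_features(X, y, n_features=5)
    mse = np.mean((transform(X, params) @ coef - y) ** 2)
    print(f"training MSE with {len(params)} features: {mse:.4f}")
```

Refitting the ridge model after every addition keeps each residual relative to the best linear fit so far, which is what makes the selection greedy in the squared-error sense.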

Chinese Couplet Generation with Neural Network Structures

Title Chinese Couplet Generation with Neural Network Structures
Authors Rui Yan, Cheng-Te Li, Xiaohua Hu, Ming Zhang
Abstract
Tasks Language Modelling, Text Generation
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-1222/
PDF https://www.aclweb.org/anthology/P16-1222
PWC https://paperswithcode.com/paper/chinese-couplet-generation-with-neural
Repo
Framework

Neural Network-Based Model for Japanese Predicate Argument Structure Analysis

Title Neural Network-Based Model for Japanese Predicate Argument Structure Analysis
Authors Tomohide Shibata, Daisuke Kawahara, Sadao Kurohashi
Abstract
Tasks Machine Translation, Semantic Role Labeling
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-1117/
PDF https://www.aclweb.org/anthology/P16-1117
PWC https://paperswithcode.com/paper/neural-network-based-model-for-japanese
Repo
Framework

SYN2015: Representative Corpus of Contemporary Written Czech

Title SYN2015: Representative Corpus of Contemporary Written Czech
Authors Michal Křen, Václav Cvrček, Tomáš Čapka, Anna Čermáková, Milena Hnátková, Lucie Chlumská, Tomáš Jelínek, Dominika Kováříková, Vladimír Petkevič, Pavel Procházka, Hana Skoumalová, Michal Škrabal, Petr Truneček, Pavel Vondřička, Adrian Jan Zasina
Abstract The paper concentrates on the design, composition and annotation of SYN2015, a new 100-million representative corpus of contemporary written Czech. SYN2015 is a sequel of the representative corpora of the SYN series that can be described as traditional (as opposed to the web-crawled corpora), featuring cleared copyright issues, well-defined composition, reliability of annotation and high-quality text processing. At the same time, SYN2015 is designed as a reflection of the variety of written Czech text production with necessary methodological and technological enhancements that include a detailed bibliographic annotation and text classification based on an updated scheme. The corpus has been produced using a completely rebuilt text processing toolchain called SynKorp. SYN2015 is lemmatized, morphologically and syntactically annotated with state-of-the-art tools. It has been published within the framework of the Czech National Corpus and it is available via the standard corpus query interface KonText at http://kontext.korpus.cz as well as a dataset in shuffled format.
Tasks Text Classification
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1400/
PDF https://www.aclweb.org/anthology/L16-1400
PWC https://paperswithcode.com/paper/syn2015-representative-corpus-of-contemporary
Repo
Framework

Two End-to-end Shallow Discourse Parsers for English and Chinese in CoNLL-2016 Shared Task

Title Two End-to-end Shallow Discourse Parsers for English and Chinese in CoNLL-2016 Shared Task
Authors Jianxiang Wang, Man Lan
Abstract
Tasks
Published 2016-08-01
URL https://www.aclweb.org/anthology/K16-2004/
PDF https://www.aclweb.org/anthology/K16-2004
PWC https://paperswithcode.com/paper/two-end-to-end-shallow-discourse-parsers-for
Repo
Framework

Paraphrase for Open Question Answering: New Dataset and Methods

Title Paraphrase for Open Question Answering: New Dataset and Methods
Authors Ying Xu, Pascual Martínez-Gómez, Yusuke Miyao, Randy Goebel
Abstract
Tasks Open Information Extraction, Question Answering, Semantic Parsing
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0109/
PDF https://www.aclweb.org/anthology/W16-0109
PWC https://paperswithcode.com/paper/paraphrase-for-open-question-answering-new
Repo
Framework

Gender-Distinguishing Features in Film Dialogue

Title Gender-Distinguishing Features in Film Dialogue
Authors Alexandra Schofield, Leo Mehr
Abstract
Tasks Language Modelling
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0204/
PDF https://www.aclweb.org/anthology/W16-0204
PWC https://paperswithcode.com/paper/gender-distinguishing-features-in-film
Repo
Framework

Speech Intelligibility and the Production of Fricative and Affricate among Mandarin-speaking Children with Cerebral Palsy

Title Speech Intelligibility and the Production of Fricative and Affricate among Mandarin-speaking Children with Cerebral Palsy
Authors Chin-Ting Liu, Li-mei Chen, Yu-Ching Lin, Chia-Fang Cheng, Hui-chen Chang
Abstract
Tasks
Published 2016-10-01
URL https://www.aclweb.org/anthology/O16-1016/
PDF https://www.aclweb.org/anthology/O16-1016
PWC https://paperswithcode.com/paper/speech-intelligibility-and-the-production-of
Repo
Framework