Paper Group NANR 184
Segmenting Hashtags using Automatically Created Training Data
Title | Segmenting Hashtags using Automatically Created Training Data |
Authors | Arda Çelebi, Arzucan Özgür |
Abstract | Hashtags, which are commonly composed of multiple words, are increasingly used to convey the actual messages in tweets. Understanding what tweets are saying is getting more dependent on understanding hashtags. Therefore, identifying the individual words that constitute a hashtag is an important, yet a challenging task due to the abrupt nature of the language used in tweets. In this study, we introduce a feature-rich approach based on using supervised machine learning methods to segment hashtags. Our approach is unsupervised in the sense that instead of using manually segmented hashtags for training the machine learning classifiers, we automatically create our training data by using tweets as well as by automatically extracting hashtag segmentations from a large corpus. We achieve promising results with such automatically created noisy training data. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1476/ |
PDF | https://www.aclweb.org/anthology/L16-1476 |
PWC | https://paperswithcode.com/paper/segmenting-hashtags-using-automatically |
Repo | |
Framework | |
D-GloVe: A Feasible Least Squares Model for Estimating Word Embedding Densities
Title | D-GloVe: A Feasible Least Squares Model for Estimating Word Embedding Densities |
Authors | Shoaib Jameel, Steven Schockaert |
Abstract | We propose a new word embedding model, inspired by GloVe, which is formulated as a feasible least squares optimization problem. In contrast to existing models, we explicitly represent the uncertainty about the exact definition of each word vector. To this end, we estimate the error that results from using noisy co-occurrence counts in the formulation of the model, and we model the imprecision that results from including uninformative context words. Our experimental results demonstrate that this model compares favourably with existing word embedding models. |
Tasks | Word Embeddings |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1174/ |
PDF | https://www.aclweb.org/anthology/C16-1174 |
PWC | https://paperswithcode.com/paper/d-glove-a-feasible-least-squares-model-for |
Repo | |
Framework | |
Potential and Limits of Using Post-edits as Reference Translations for MT Evaluation
Title | Potential and Limits of Using Post-edits as Reference Translations for MT Evaluation |
Authors | Maja Popovic, Mihael Arčan, Arle Lommel |
Abstract | |
Tasks | Common Sense Reasoning, Machine Translation |
Published | 2016-01-01 |
URL | https://www.aclweb.org/anthology/W16-3410/ |
PDF | https://www.aclweb.org/anthology/W16-3410 |
PWC | https://paperswithcode.com/paper/potential-and-limits-of-using-post-edits-as |
Repo | |
Framework | |
MuTUAL: A Controlled Authoring Support System Enabling Contextual Machine Translation
Title | MuTUAL: A Controlled Authoring Support System Enabling Contextual Machine Translation |
Authors | Rei Miyata, Anthony Hartley, Kyo Kageura, Cécile Paris, Masao Utiyama, Eiichiro Sumita |
Abstract | The paper introduces a web-based authoring support system, MuTUAL, which aims to help writers create multilingual texts. The highlighted feature of the system is that it enables machine translation (MT) to generate outputs appropriate to their functional context within the target document. Our system is operational online, implementing core mechanisms for document structuring and controlled writing. These include a topic template and a controlled language authoring assistant, linked to our statistical MT system. |
Tasks | Machine Translation |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2008/ |
PDF | https://www.aclweb.org/anthology/C16-2008 |
PWC | https://paperswithcode.com/paper/mutual-a-controlled-authoring-support-system |
Repo | |
Framework | |
Challenges and Solutions for Consistent Annotation of Vietnamese Treebank
Title | Challenges and Solutions for Consistent Annotation of Vietnamese Treebank |
Authors | Quy Nguyen, Yusuke Miyao, Ha Le, Ngan Nguyen |
Abstract | Treebanks are important resources for researchers in natural language processing, speech recognition, theoretical linguistics, etc. To strengthen the automatic processing of the Vietnamese language, a Vietnamese treebank has been built. However, the quality of this treebank is not satisfactory and is a possible source for the low performance of Vietnamese language processing. We have been building a new treebank for Vietnamese with about 40,000 sentences annotated with three layers: word segmentation, part-of-speech tagging, and bracketing. In this paper, we describe several challenges of Vietnamese language and how we solve them in developing annotation guidelines. We also present our methods to improve the quality of the annotation guidelines and ensure annotation accuracy and consistency. Experiment results show that inter-annotator agreement ratios and accuracy are higher than 90% which is satisfactory. |
Tasks | Part-Of-Speech Tagging, Speech Recognition |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1243/ |
PDF | https://www.aclweb.org/anthology/L16-1243 |
PWC | https://paperswithcode.com/paper/challenges-and-solutions-for-consistent |
Repo | |
Framework | |
SideNoter: Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation
Title | SideNoter: Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation |
Authors | Takeshi Abekawa, Akiko Aizawa |
Abstract | In this paper, we discuss our ongoing efforts to construct a scientific paper browsing system that helps users to read and understand advanced technical content distributed in PDF. Since PDF is a format specifically designed for printing, layout and logical structures of documents are indistinguishably embedded in the file. It requires much effort to extract natural language text from PDF files, and reversely, display semantic annotations produced by NLP tools on the original page layout. In our browsing system, we tackle these issues caused by the gap between printable document and plain text. Our system provides ways to extract natural language sentences from PDF files together with their logical structures, and also to map arbitrary textual spans to their corresponding regions on page images. We setup a demonstration system using papers published in ACL anthology and demonstrate the enhanced search and refined recommendation functions which we plan to make widely available to NLP researchers. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2029/ |
PDF | https://www.aclweb.org/anthology/C16-2029 |
PWC | https://paperswithcode.com/paper/sidenoter-scholarly-paper-browsing-system |
Repo | |
Framework | |
NYU-MILA Neural Machine Translation Systems for WMT’16
Title | NYU-MILA Neural Machine Translation Systems for WMT’16 |
Authors | Junyoung Chung, Kyunghyun Cho, Yoshua Bengio |
Abstract | |
Tasks | Machine Translation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2309/ |
PDF | https://www.aclweb.org/anthology/W16-2309 |
PWC | https://paperswithcode.com/paper/nyu-mila-neural-machine-translation-systems |
Repo | |
Framework | |
Greedy Feature Construction
Title | Greedy Feature Construction |
Authors | Dino Oglic, Thomas Gärtner |
Abstract | We present an effective method for supervised feature construction. The main goal of the approach is to construct a feature representation for which a set of linear hypotheses is of sufficient capacity – large enough to contain a satisfactory solution to the considered problem and small enough to allow good generalization from a small number of training examples. We achieve this goal with a greedy procedure that constructs features by empirically fitting squared error residuals. The proposed constructive procedure is consistent and can output a rich set of features. The effectiveness of the approach is evaluated empirically by fitting a linear ridge regression model in the constructed feature space and our empirical results indicate a superior performance of our approach over competing methods. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6557-greedy-feature-construction |
PDF | http://papers.nips.cc/paper/6557-greedy-feature-construction.pdf |
PWC | https://paperswithcode.com/paper/greedy-feature-construction |
Repo | |
Framework | |
Chinese Couplet Generation with Neural Network Structures
Title | Chinese Couplet Generation with Neural Network Structures |
Authors | Rui Yan, Cheng-Te Li, Xiaohua Hu, Ming Zhang |
Abstract | |
Tasks | Language Modelling, Text Generation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1222/ |
PDF | https://www.aclweb.org/anthology/P16-1222 |
PWC | https://paperswithcode.com/paper/chinese-couplet-generation-with-neural |
Repo | |
Framework | |
Neural Network-Based Model for Japanese Predicate Argument Structure Analysis
Title | Neural Network-Based Model for Japanese Predicate Argument Structure Analysis |
Authors | Tomohide Shibata, Daisuke Kawahara, Sadao Kurohashi |
Abstract | |
Tasks | Machine Translation, Semantic Role Labeling |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1117/ |
PDF | https://www.aclweb.org/anthology/P16-1117 |
PWC | https://paperswithcode.com/paper/neural-network-based-model-for-japanese |
Repo | |
Framework | |
SYN2015: Representative Corpus of Contemporary Written Czech
Title | SYN2015: Representative Corpus of Contemporary Written Czech |
Authors | Michal Křen, Václav Cvrček, Tomáš Čapka, Anna Čermáková, Milena Hnátková, Lucie Chlumská, Tomáš Jelínek, Dominika Kováříková, Vladimír Petkevič, Pavel Procházka, Hana Skoumalová, Michal Škrabal, Petr Truneček, Pavel Vondřička, Adrian Jan Zasina |
Abstract | The paper concentrates on the design, composition and annotation of SYN2015, a new 100-million representative corpus of contemporary written Czech. SYN2015 is a sequel of the representative corpora of the SYN series that can be described as traditional (as opposed to the web-crawled corpora), featuring cleared copyright issues, well-defined composition, reliability of annotation and high-quality text processing. At the same time, SYN2015 is designed as a reflection of the variety of written Czech text production with necessary methodological and technological enhancements that include a detailed bibliographic annotation and text classification based on an updated scheme. The corpus has been produced using a completely rebuilt text processing toolchain called SynKorp. SYN2015 is lemmatized, morphologically and syntactically annotated with state-of-the-art tools. It has been published within the framework of the Czech National Corpus and it is available via the standard corpus query interface KonText at http://kontext.korpus.cz as well as a dataset in shuffled format. |
Tasks | Text Classification |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1400/ |
PDF | https://www.aclweb.org/anthology/L16-1400 |
PWC | https://paperswithcode.com/paper/syn2015-representative-corpus-of-contemporary |
Repo | |
Framework | |
Two End-to-end Shallow Discourse Parsers for English and Chinese in CoNLL-2016 Shared Task
Title | Two End-to-end Shallow Discourse Parsers for English and Chinese in CoNLL-2016 Shared Task |
Authors | Jianxiang Wang, Man Lan |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/K16-2004/ |
PDF | https://www.aclweb.org/anthology/K16-2004 |
PWC | https://paperswithcode.com/paper/two-end-to-end-shallow-discourse-parsers-for |
Repo | |
Framework | |
Paraphrase for Open Question Answering: New Dataset and Methods
Title | Paraphrase for Open Question Answering: New Dataset and Methods |
Authors | Ying Xu, Pascual Martínez-Gómez, Yusuke Miyao, Randy Goebel |
Abstract | |
Tasks | Open Information Extraction, Question Answering, Semantic Parsing |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0109/ |
PDF | https://www.aclweb.org/anthology/W16-0109 |
PWC | https://paperswithcode.com/paper/paraphrase-for-open-question-answering-new |
Repo | |
Framework | |
Gender-Distinguishing Features in Film Dialogue
Title | Gender-Distinguishing Features in Film Dialogue |
Authors | Alexandra Schofield, Leo Mehr |
Abstract | |
Tasks | Language Modelling |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0204/ |
PDF | https://www.aclweb.org/anthology/W16-0204 |
PWC | https://paperswithcode.com/paper/gender-distinguishing-features-in-film |
Repo | |
Framework | |
Speech Intelligibility and the Production of Fricative and Affricate among Mandarin-speaking Children with Cerebral Palsy
Title | Speech Intelligibility and the Production of Fricative and Affricate among Mandarin-speaking Children with Cerebral Palsy |
Authors | Chin-Ting Liu, Li-mei Chen, Yu-Ching Lin, Chia-Fang Cheng, Hui-chen Chang |
Abstract | |
Tasks | |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/O16-1016/ |
PDF | https://www.aclweb.org/anthology/O16-1016 |
PWC | https://paperswithcode.com/paper/speech-intelligibility-and-the-production-of |
Repo | |
Framework | |