July 26, 2019


Paper Group NANR 108

Byte-based Neural Machine Translation. Vector space models for evaluating semantic fluency in autism. Learning from Parenthetical Sentences for Term Translation in Machine Translation. Estimating High-dimensional Non-Gaussian Multiple Index Models via Stein’s Lemma. Exploratory Analysis for Ontology Learning from Social Events on Social Media Strea …

Byte-based Neural Machine Translation

Title Byte-based Neural Machine Translation
Authors Marta R. Costa-jussà, Carlos Escolano, José A. R. Fonollosa
Abstract This paper presents experiments comparing character-based and byte-based neural machine translation systems. The main motivation of the byte-based neural machine translation system is to build multi-lingual neural machine translation systems that can share the same vocabulary. We compare the performance of both systems in several language pairs and we see that the performance in test is similar for most language pairs while the training time is slightly reduced in the case of byte-based neural machine translation.
Tasks Language Modelling, Machine Translation, Named Entity Recognition, Speech Recognition
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4123/
PDF https://www.aclweb.org/anthology/W17-4123
PWC https://paperswithcode.com/paper/byte-based-neural-machine-translation
Repo
Framework
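The vocabulary-sharing argument for byte-based models can be illustrated with a small sketch: character units grow with every new script, while byte units are always drawn from a fixed set of at most 256 symbols. The helper names below are illustrative, not from the paper.

```python
# Sketch: contrast character-level and byte-level source tokenization,
# the core design choice the paper compares.

def char_tokens(text: str):
    """Character-level units: the vocabulary grows with every new script."""
    return list(text)

def byte_tokens(text: str):
    """Byte-level units: a fixed vocabulary of at most 256 symbols,
    shareable across all languages in a multilingual system."""
    return list(text.encode("utf-8"))

src = "Köln"
print(char_tokens(src))  # ['K', 'ö', 'l', 'n'] -> 4 character units
print(byte_tokens(src))  # [75, 195, 182, 108, 110] -> 5 byte units ('ö' is 2 bytes)
```

Note that byte sequences are slightly longer for non-ASCII text, which is the trade-off against the fixed, shareable vocabulary.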

Vector space models for evaluating semantic fluency in autism

Title Vector space models for evaluating semantic fluency in autism
Authors Emily Prud'hommeaux, Jan van Santen, Douglas Gliner
Abstract A common test administered during neurological examination is the semantic fluency test, in which the patient must list as many examples of a given semantic category as possible under timed conditions. Poor performance is associated with neurological conditions characterized by impairments in executive function, such as dementia, schizophrenia, and autism spectrum disorder (ASD). Methods for analyzing semantic fluency responses at the level of detail necessary to uncover these differences have typically relied on subjective manual annotation. In this paper, we explore automated approaches for scoring semantic fluency responses that leverage ontological resources and distributional semantic models to characterize the semantic fluency responses produced by young children with and without ASD. Using these methods, we find significant differences in the semantic fluency responses of children with ASD, demonstrating the utility of using objective methods for clinical language analysis.
Tasks
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-2006/
PDF https://www.aclweb.org/anthology/P17-2006
PWC https://paperswithcode.com/paper/vector-space-models-for-evaluating-semantic
Repo
Framework
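The distributional idea behind this kind of scoring can be sketched in a few lines: embed each response item and measure how semantically close consecutive items are. The toy vectors and the adjacent-pair scoring rule below are illustrative assumptions, not the paper's actual model or data.

```python
# Sketch: score a semantic fluency response ("name as many animals as you can")
# by the mean cosine similarity of consecutive items under toy word vectors.
import math

TOY_VECTORS = {  # hypothetical low-dimensional embeddings for the category "animals"
    "dog":  [0.9, 0.1, 0.0],
    "cat":  [0.8, 0.2, 0.1],
    "lion": [0.4, 0.7, 0.2],
    "car":  [0.0, 0.1, 0.9],  # an off-category intrusion
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def coherence(response):
    """Mean similarity of adjacent items; lower values suggest weaker
    semantic clustering within the response."""
    sims = [cosine(TOY_VECTORS[a], TOY_VECTORS[b])
            for a, b in zip(response, response[1:])]
    return sum(sims) / len(sims)

print(coherence(["dog", "cat", "lion"]))  # tight cluster -> higher score
print(coherence(["dog", "car", "lion"]))  # intrusion lowers the score
```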

Learning from Parenthetical Sentences for Term Translation in Machine Translation

Title Learning from Parenthetical Sentences for Term Translation in Machine Translation
Authors Guoping Huang, Jiajun Zhang, Yu Zhou, Chengqing Zong
Abstract Terms extensively exist in specific domains, and term translation plays a critical role in domain-specific machine translation (MT) tasks. However, it is challenging to translate them correctly, given the huge number of pre-existing terms and the endless stream of new ones. To achieve better term translation quality, it is necessary to inject external term knowledge into the underlying MT system. Fortunately, there is plenty of term translation knowledge in parenthetical sentences on the Internet. In this paper, we propose a simple, straightforward and effective framework to improve term translation by learning from parenthetical sentences. This framework includes: (1) a focused web crawler; (2) a parenthetical sentence filter, acquiring parenthetical sentences including bilingual term pairs; (3) a term translation knowledge extractor, extracting bilingual term translation candidates; (4) a probability learner, generating the term translation table for MT decoders. Extensive experiments demonstrate that our proposed framework significantly improves the translation quality of terms and sentences.
Tasks Machine Translation
Published 2017-12-01
URL https://www.aclweb.org/anthology/W17-6005/
PDF https://www.aclweb.org/anthology/W17-6005
PWC https://paperswithcode.com/paper/learning-from-parenthetical-sentences-for
Repo
Framework
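Steps (2) and (3) of the framework can be sketched minimally: find sentences with a parenthetical and pair its content with the immediately preceding words as term-translation candidates. The regex and the fixed word window below are simplifying assumptions, not the authors' exact filter; their probability learner would then prune spurious pairs.

```python
# Sketch: extract (preceding-phrase, parenthetical) candidate term pairs.
import re

def parenthetical_candidates(sentence, window=4):
    """For each parenthetical, pair its content with up to `window`
    immediately preceding words as translation candidates."""
    out = []
    for m in re.finditer(r"\(([^)]+)\)", sentence):
        left_words = sentence[:m.start()].split()
        out.append((" ".join(left_words[-window:]), m.group(1)))
    return out

s = "The system uses statistical machine translation (SMT) and a language model (LM)."
print(parenthetical_candidates(s))
# [('uses statistical machine translation', 'SMT'),
#  ('and a language model', 'LM')]
```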

Estimating High-dimensional Non-Gaussian Multiple Index Models via Stein’s Lemma

Title Estimating High-dimensional Non-Gaussian Multiple Index Models via Stein’s Lemma
Authors Zhuoran Yang, Krishnakumar Balasubramanian, Zhaoran Wang, Han Liu
Abstract We consider estimating the parametric components of semiparametric multi-index models in high dimensions. To bypass the requirements of Gaussianity or elliptical symmetry of covariates in existing methods, we propose to leverage a second-order Stein’s method with score function-based corrections. We prove that our estimator achieves a near-optimal statistical rate of convergence even when the score function or the response variable is heavy-tailed. To establish the key concentration results, we develop a data-driven truncation argument that may be of independent interest. We supplement our theoretical findings with simulations.
Tasks
Published 2017-12-01
URL http://papers.nips.cc/paper/7190-estimating-high-dimensional-non-gaussian-multiple-index-models-via-steins-lemma
PDF http://papers.nips.cc/paper/7190-estimating-high-dimensional-non-gaussian-multiple-index-models-via-steins-lemma.pdf
PWC https://paperswithcode.com/paper/estimating-high-dimensional-non-gaussian
Repo
Framework
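The first-order Stein identity that this line of work generalizes can be sanity-checked numerically: for X ~ N(0, 1) the score function is S(x) = x, and E[f(X)·S(X)] = E[f′(X)]. The sketch below only verifies that identity by quadrature; it is not the paper's high-dimensional, heavy-tailed estimator.

```python
# Numerical check of the first-order Stein identity E[f(X) S(X)] = E[f'(X)]
# for the standard normal, where the score is S(x) = x.
import math

def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def expect(g, lo=-8.0, hi=8.0, n=4000):
    """Trapezoidal approximation of E[g(X)] under the standard normal."""
    h = (hi - lo) / n
    total = 0.5 * (g(lo) * normal_pdf(lo) + g(hi) * normal_pdf(hi))
    total += sum(g(lo + i * h) * normal_pdf(lo + i * h) for i in range(1, n))
    return total * h

f = lambda x: x ** 3            # test function
fprime = lambda x: 3 * x ** 2   # its derivative

lhs = expect(lambda x: f(x) * x)  # E[f(X) S(X)]
rhs = expect(fprime)              # E[f'(X)]
print(lhs, rhs)                   # both come out to about 3 (= E[X^4])
```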

Exploratory Analysis for Ontology Learning from Social Events on Social Media Streaming in Spanish

Title Exploratory Analysis for Ontology Learning from Social Events on Social Media Streaming in Spanish
Authors Enrique Valeriano, Arturo Oncevay-Marcos
Abstract
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-7001/
PDF https://www.aclweb.org/anthology/W17-7001
PWC https://paperswithcode.com/paper/exploratory-analysis-for-ontology-learning
Repo
Framework

SEE: Towards Semi-Supervised End-to-End Scene Text Recognition

Title SEE: Towards Semi-Supervised End-to-End Scene Text Recognition
Authors Christian Bartz, Haojin Yang, Christoph Meinel
Abstract Detecting and recognizing text in natural scene images is a challenging, yet not completely solved task. In recent years several new systems that try to solve at least one of the two sub-tasks (text detection and text recognition) have been proposed. In this paper we present SEE, a step towards semi-supervised neural networks for scene text detection and recognition, that can be optimized end-to-end. Most existing works consist of multiple deep neural networks and several pre-processing steps. In contrast to this, we propose to use a single deep neural network, that learns to detect and recognize text from natural images, in a semi-supervised way. SEE is a network that integrates and jointly learns a spatial transformer network, which can learn to detect text regions in an image, and a text recognition network that takes the identified text regions and recognizes their textual content. We introduce the idea behind our novel approach and show its feasibility, by performing a range of experiments on standard benchmark datasets, where we achieve competitive results.
Tasks Optical Character Recognition, Scene Text Detection, Scene Text Recognition
Published 2017-12-14
URL https://arxiv.org/abs/1712.05404
PDF https://arxiv.org/pdf/1712.05404
PWC https://paperswithcode.com/paper/see-towards-semi-supervisedend-to-end-scene
Repo
Framework

Comparison of SMT and NMT trained with large Patent Corpora: Japio at WAT2017

Title Comparison of SMT and NMT trained with large Patent Corpora: Japio at WAT2017
Authors Satoshi Kinoshita, Tadaaki Oshio, Tomoharu Mitsuhashi
Abstract Japio participates in patent subtasks (JPC-EJ/JE/CJ/KJ) with phrase-based statistical machine translation (SMT) and neural machine translation (NMT) systems which are trained with its own patent corpora in addition to the subtask corpora provided by organizers of WAT2017. In EJ and CJ subtasks, SMT and NMT systems whose sizes of training corpora are about 50 million and 10 million sentence pairs respectively achieved comparable scores for automatic evaluations, but NMT systems were superior to SMT systems for both official and in-house human evaluations.
Tasks Information Retrieval, Machine Translation
Published 2017-11-01
URL https://www.aclweb.org/anthology/W17-5713/
PDF https://www.aclweb.org/anthology/W17-5713
PWC https://paperswithcode.com/paper/comparison-of-smt-and-nmt-trained-with-large
Repo
Framework

Salience Rank: Efficient Keyphrase Extraction with Topic Modeling

Title Salience Rank: Efficient Keyphrase Extraction with Topic Modeling
Authors Nedelina Teneva, Weiwei Cheng
Abstract Topical PageRank (TPR) uses the latent topic distribution inferred by Latent Dirichlet Allocation (LDA) to rank noun phrases extracted from documents. The ranking procedure consists of running PageRank K times, where K is the number of topics used in the LDA model. In this paper, we propose a modification of TPR, called Salience Rank. Salience Rank only needs to run PageRank once and extracts comparable or better keyphrases on benchmark datasets. In addition to the quality and efficiency benefits, our method has the flexibility to extract keyphrases with varying tradeoffs between topic specificity and corpus specificity.
Tasks Part-Of-Speech Tagging
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-2084/
PDF https://www.aclweb.org/anthology/P17-2084
PWC https://paperswithcode.com/paper/salience-rank-efficient-keyphrase-extraction
Repo
Framework
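The single-run idea can be sketched as personalized PageRank over a word co-occurrence graph, with a per-word salience score folded into the teleport distribution instead of one PageRank pass per LDA topic. The graph, salience values, and damping factor below are toy values, not the paper's setup.

```python
# Sketch: one personalized PageRank pass with salience as the teleport vector.

def salience_rank(graph, salience, damping=0.85, iters=50):
    """graph: {word: [neighbor, ...]} co-occurrence links (assumed symmetric);
    salience: {word: weight} used as the personalization vector."""
    words = list(graph)
    z = sum(salience[w] for w in words)
    teleport = {w: salience[w] / z for w in words}
    rank = {w: 1.0 / len(words) for w in words}
    for _ in range(iters):
        new = {}
        for w in words:
            incoming = sum(rank[u] / len(graph[u]) for u in words if w in graph[u])
            new[w] = (1 - damping) * teleport[w] + damping * incoming
        rank = new
    return rank

graph = {"neural": ["network", "model"], "network": ["neural"],
         "model": ["neural", "data"], "data": ["model"]}
salience = {"neural": 0.9, "network": 0.6, "model": 0.5, "data": 0.2}
ranks = salience_rank(graph, salience)
print(max(ranks, key=ranks.get))  # the well-connected, high-salience word wins
```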

Adapting Kernel Representations Online Using Submodular Maximization

Title Adapting Kernel Representations Online Using Submodular Maximization
Authors Matthew Schlegel, Yangchen Pan, Jiecao Chen, Martha White
Abstract Kernel representations provide a nonlinear representation, through similarities to prototypes, but require only simple linear learning algorithms given those prototypes. In a continual learning setting, with a constant stream of observations, it is critical to have an efficient mechanism for sub-selecting prototypes amongst observations. In this work, we develop an approximately submodular criterion for this setting, and an efficient online greedy submodular maximization algorithm for optimizing the criterion. We extend streaming submodular maximization algorithms to continual learning, by removing the need for multiple passes—which is infeasible—and instead introducing the idea of coverage time. We propose a general block-diagonal approximation for the greedy update with our criterion, that enables updates linear in the number of prototypes. We empirically demonstrate the effectiveness of this approximation, in terms of approximation quality, significant runtime improvements, and effective prediction performance.
Tasks Continual Learning
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=668
PDF http://proceedings.mlr.press/v70/schlegel17a/schlegel17a.pdf
PWC https://paperswithcode.com/paper/adapting-kernel-representations-online-using
Repo
Framework
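The single-pass selection idea can be sketched with a classic submodular coverage objective: keep an observation as a prototype only if its marginal gain in coverage is large enough. The facility-location criterion and the fixed threshold below are illustrative stand-ins; the paper's criterion and its coverage-time notion are more involved.

```python
# Sketch: streaming greedy prototype selection under a submodular objective.
import math

def rbf(x, y, gamma=1.0):
    return math.exp(-gamma * (x - y) ** 2)

def coverage(prototypes, data):
    """Submodular facility-location objective: each point is credited with
    its best similarity to any chosen prototype."""
    return sum(max((rbf(x, p) for p in prototypes), default=0.0) for x in data)

def streaming_select(stream, budget=3, threshold=0.3):
    prototypes, seen = [], []
    for x in stream:
        seen.append(x)
        gain = coverage(prototypes + [x], seen) - coverage(prototypes, seen)
        if len(prototypes) < budget and gain >= threshold:
            prototypes.append(x)
    return prototypes

stream = [0.0, 0.1, 5.0, 5.1, 10.0, 0.05]
print(streaming_select(stream))  # roughly one prototype per cluster
```

A near-duplicate observation adds almost no coverage, so it is skipped without ever storing the full stream, which is the point of the streaming setting.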

A Universal Dependencies Treebank for Marathi

Title A Universal Dependencies Treebank for Marathi
Authors Vinit Ravishankar
Abstract
Tasks Dependency Parsing, Transliteration
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-7623/
PDF https://www.aclweb.org/anthology/W17-7623
PWC https://paperswithcode.com/paper/a-universal-dependencies-treebank-for-marathi
Repo
Framework

Evaluating Natural Language Understanding Services for Conversational Question Answering Systems

Title Evaluating Natural Language Understanding Services for Conversational Question Answering Systems
Authors Daniel Braun, Adrian Hernandez Mendez, Florian Matthes, Manfred Langen
Abstract Conversational interfaces recently gained a lot of attention. One of the reasons for the current hype is the fact that chatbots (one particularly popular form of conversational interfaces) can nowadays be created without any programming knowledge, thanks to different toolkits and so-called Natural Language Understanding (NLU) services. While these NLU services are already widely used in both industry and science, they have so far not been analysed systematically. In this paper, we present a method to evaluate the classification performance of NLU services. Moreover, we present two new corpora, one consisting of annotated questions and one consisting of annotated questions with the corresponding answers. Based on these corpora, we conduct an evaluation of some of the most popular NLU services. Thereby we want to enable both researchers and companies to make more educated decisions about which service they should use.
Tasks Chatbot, Dialogue Management, Question Answering
Published 2017-08-01
URL https://www.aclweb.org/anthology/W17-5522/
PDF https://www.aclweb.org/anthology/W17-5522
PWC https://paperswithcode.com/paper/evaluating-natural-language-understanding
Repo
Framework
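The evaluation the paper describes boils down to standard per-intent classification metrics over each service's predictions. The sketch below computes per-intent F1 on toy gold/predicted labels; the label names are illustrative examples, not the paper's corpora.

```python
# Sketch: per-intent precision/recall/F1 for an NLU service's intent predictions.

def f1_per_intent(gold, pred):
    intents = set(gold) | set(pred)
    out = {}
    for intent in intents:
        tp = sum(g == p == intent for g, p in zip(gold, pred))
        fp = sum(p == intent and g != intent for g, p in zip(gold, pred))
        fn = sum(g == intent and p != intent for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        out[intent] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return out

gold = ["FindConnection", "DepartureTime", "FindConnection", "DepartureTime"]
pred = ["FindConnection", "FindConnection", "FindConnection", "DepartureTime"]
print(f1_per_intent(gold, pred))
```

Comparing these per-intent scores across services, on the same held-out questions, is what makes the comparison systematic rather than anecdotal.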

CUNI NMT System for WAT 2017 Translation Tasks

Title CUNI NMT System for WAT 2017 Translation Tasks
Authors Tom Kocmi, Dušan Variš, Ondřej Bojar
Abstract The paper presents this year's CUNI submissions to the WAT 2017 Translation Task focusing on the Japanese-English translation, namely the Scientific papers subtask, Patents subtask and Newswire subtask. We compare two neural network architectures, the standard sequence-to-sequence with attention (Seq2Seq) and an architecture using convolutional sentence encoder (FBConv2Seq), both implemented in the NMT framework Neural Monkey that we currently participate in developing. We also compare various types of preprocessing of the source Japanese sentences and their impact on the overall results. Furthermore, we include the results of our experiments with out-of-domain data obtained by combining the corpora provided for each subtask.
Tasks Machine Translation, Tokenization
Published 2017-11-01
URL https://www.aclweb.org/anthology/W17-5715/
PDF https://www.aclweb.org/anthology/W17-5715
PWC https://paperswithcode.com/paper/cuni-nmt-system-for-wat-2017-translation
Repo
Framework

Applying Deep Neural Network to Retrieve Relevant Civil Law Articles

Title Applying Deep Neural Network to Retrieve Relevant Civil Law Articles
Authors Anh Hang Nga Tran
Abstract The paper addresses the legal question answering information retrieval (IR) task at the Competition on Legal Information Extraction/Entailment (COLIEE) 2017. Our proposed methodology utilizes deep neural networks, natural language processing and word2vec. The system was evaluated using training and testing data from the competition. Our system mainly focuses on retrieving relevant civil law articles for given bar exam questions. The corpus of legal questions is drawn from Japanese Legal Bar exam queries. We implemented a deep neural network combined with additional NLP and word2vec features to retrieve the corresponding civil law articles for a given 'Yes/No' bar exam question. This paper focuses on clustering related words in order to acquire relevant civil law articles. All evaluation was done on the COLIEE 2017 training and test data set. The experimental results are very promising.
Tasks Information Retrieval, Natural Language Inference, Part-Of-Speech Tagging, Question Answering
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-2007/
PDF https://doi.org/10.26615/issn.1314-9156.2017_007
PWC https://paperswithcode.com/paper/applying-deep-neural-network-to-retrieve
Repo
Framework

Representation and Interchange of Linguistic Annotation. An In-Depth, Side-by-Side Comparison of Three Designs

Title Representation and Interchange of Linguistic Annotation. An In-Depth, Side-by-Side Comparison of Three Designs
Authors Richard Eckart de Castilho, Nancy Ide, Emanuele Lapponi, Stephan Oepen, Keith Suderman, Erik Velldal, Marc Verhagen
Abstract For decades, most self-respecting linguistic engineering initiatives have designed and implemented custom representations for various layers of, for example, morphological, syntactic, and semantic analysis. Despite occasional efforts at harmonization or even standardization, our field today is blessed with a multitude of ways of encoding and exchanging linguistic annotations of these types, both at the levels of 'abstract syntax', naming choices, and of course file formats. To a large degree, it is possible to work within and across design plurality by conversion, and often there may be good reasons for divergent design reflecting differences in use. However, it is likely that some abstract commonalities across choices of representation are obscured by more superficial differences, and conversely there is no obvious procedure to tease apart what actually constitute contentful vs. mere technical divergences. In this study, we seek to conceptually align three representations for common types of morpho-syntactic analysis, pinpoint what in our view constitute contentful differences, and reflect on the underlying principles and specific requirements that led to individual choices. We expect that a more in-depth understanding of these choices across designs may lead to increased harmonization, or at least to more informed design of future representations.
Tasks
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-0808/
PDF https://www.aclweb.org/anthology/W17-0808
PWC https://paperswithcode.com/paper/representation-and-interchange-of-linguistic
Repo
Framework

Assessing SRL Frameworks with Automatic Training Data Expansion

Title Assessing SRL Frameworks with Automatic Training Data Expansion
Authors Silvana Hartmann, Éva Mújdricza-Maydt, Ilia Kuznetsov, Iryna Gurevych, Anette Frank
Abstract We present the first experiment-based study that explicitly contrasts the three major semantic role labeling frameworks. As a prerequisite, we create a dataset labeled with parallel FrameNet-, PropBank-, and VerbNet-style labels for German. We train a state-of-the-art SRL tool for German for the different annotation styles and provide a comparative analysis across frameworks. We further explore the behavior of the frameworks with automatic training data generation. VerbNet provides larger semantic expressivity than PropBank, and we find that its generalization capacity approaches PropBank in SRL training, but it benefits less from training data expansion than the sparse-data affected FrameNet.
Tasks Question Answering, Semantic Role Labeling
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-0814/
PDF https://www.aclweb.org/anthology/W17-0814
PWC https://paperswithcode.com/paper/assessing-srl-frameworks-with-automatic
Repo
Framework