July 26, 2019


Paper Group NANR 56


A Report on the 2017 Native Language Identification Shared Task


Title	A Report on the 2017 Native Language Identification Shared Task
Authors	Shervin Malmasi, Keelan Evanini, Aoife Cahill, Joel Tetreault, Robert Pugh, Christopher Hamill, Diane Napolitano, Yao Qian
Abstract	Native Language Identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is typically framed as a classification task where the set of L1s is known a priori. Two previous shared tasks on NLI have been organized where the aim was to identify the L1 of learners of English based on essays (2013) and spoken responses (2016) they provided during a standardized assessment of academic English proficiency. The 2017 shared task combines the inputs from the two prior tasks for the first time. There are three tracks: NLI on the essay only, NLI on the spoken response only (based on a transcription of the response and i-vector acoustic features), and NLI using both responses. We believe this makes for a more interesting shared task while building on the methods and results from the previous two shared tasks. In this paper, we report the results of the shared task. A total of 19 teams competed across the three different sub-tasks. The fusion track showed that combining the written and spoken responses provides a large boost in prediction accuracy. Multiple classifier systems (e.g. ensembles and meta-classifiers) were the most effective in all tasks, with most based on traditional classifiers (e.g. SVMs) with lexical/syntactic features.
Tasks	Grammatical Error Correction, Language Acquisition, Language Identification, Native Language Identification
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-5007/
PDF	https://www.aclweb.org/anthology/W17-5007
PWC	https://paperswithcode.com/paper/a-report-on-the-2017-native-language
Repo
Framework
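
The report's finding that multiple classifier systems worked best can be illustrated with the simplest fusion scheme: plurality voting over the L1 labels that several base classifiers predict. This is a minimal sketch under stated assumptions. The classifier names and predictions are hypothetical. A meta-classifier, by contrast, would learn how to combine the base outputs rather than simply counting votes.

```python
from collections import Counter


def plurality_vote(predictions):
    """Fuse per-classifier L1 predictions by plurality vote.

    predictions: one list of labels per classifier, all of equal length.
    Ties go to the tied label proposed by the earliest classifier.
    """
    fused = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        fused.append(next(lab for lab in labels if counts[lab] == best))
    return fused


# Hypothetical outputs of three base classifiers on three test responses.
essay_clf = ["GER", "ITA", "HIN"]
speech_clf = ["GER", "SPA", "TEL"]
ngram_clf = ["FRE", "SPA", "HIN"]
print(plurality_vote([essay_clf, speech_clf, ngram_clf]))  # ['GER', 'SPA', 'HIN']
```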

Improving Implicit Discourse Relation Recognition with Discourse-specific Word Embeddings


Title	Improving Implicit Discourse Relation Recognition with Discourse-specific Word Embeddings
Authors	Changxing Wu, Xiaodong Shi, Yidong Chen, Jinsong Su, Boli Wang
Abstract	We introduce a simple and effective method to learn discourse-specific word embeddings (DSWE) for implicit discourse relation recognition. Specifically, DSWE is learned by performing connective classification on massive amounts of explicit discourse data, and is capable of capturing discourse relationships between words. On the PDTB data set, using DSWE as features achieves significant improvements over baselines.
Tasks	Machine Translation, Question Answering, Word Embeddings
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-2042/
PDF	https://www.aclweb.org/anthology/P17-2042
PWC	https://paperswithcode.com/paper/improving-implicit-discourse-relation-1
Repo
Framework

Native Language Identification Using a Mixture of Character and Word N-grams


Title	Native Language Identification Using a Mixture of Character and Word N-grams
Authors	Elham Mohammadi, Hadi Veisi, Hessam Amini
Abstract	Native language identification (NLI) is the task of determining an author's native language, based on a piece of his/her writing in a second language. In recent years, NLI has received much attention due to its challenging nature and its applications in language pedagogy and forensic linguistics. We participated in the NLI2017 shared task under the name UT-DSP. In our effort to implement a method for native language identification, we made use of a fusion of character and word N-grams, and achieved an optimal F1-Score of 77.64%, using both essay and speech transcription datasets.
Tasks	Language Acquisition, Language Identification, Native Language Identification
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-5022/
PDF	https://www.aclweb.org/anthology/W17-5022
PWC	https://paperswithcode.com/paper/native-language-identification-using-a
Repo
Framework
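
The fusion of character and word N-grams described above can be sketched as a feature extractor that merges both kinds of counts into one sparse feature dictionary. The N-gram orders and the lowercasing are assumptions for illustration, not the UT-DSP settings; in a full system these features would be vectorized, weighted and passed to a classifier.

```python
from collections import Counter


def char_ngrams(text, n):
    """Counts of overlapping character n-grams (spaces included)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Counts of word n-grams over a whitespace tokenization."""
    tokens = text.split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def mixed_features(text, char_orders=(2, 3), word_orders=(1, 2)):
    """Merge character and word n-gram counts into one sparse feature dict.

    Keys are prefixed (c2:, w1:, ...) so a character n-gram never collides
    with a word n-gram. The default orders are illustrative assumptions.
    """
    text = text.lower()
    feats = Counter()
    for n in char_orders:
        feats.update({f"c{n}:{g}": v for g, v in char_ngrams(text, n).items()})
    for n in word_orders:
        feats.update({f"w{n}:{g}": v for g, v in word_ngrams(text, n).items()})
    return feats


feats = mixed_features("I am agree")
print(feats["c2: a"], feats["w2:am agree"])  # 2 1
```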


Identifying and Tracking Sentiments and Topics from Social Media Texts during Natural Disasters


Title	Identifying and Tracking Sentiments and Topics from Social Media Texts during Natural Disasters
Authors	Min Yang, Jincheng Mei, Heng Ji, Wei Zhao, Zhou Zhao, Xiaojun Chen
Abstract	We study the problem of identifying the topics and sentiments and tracking their shifts from social media texts in different geographical regions during emergencies and disasters. We propose a location-based dynamic sentiment-topic model (LDST) which can jointly model topic, sentiment, time and geolocation information. The experimental results demonstrate that LDST performs very well at discovering topics and sentiments from social media and tracking their shifts in different geographical regions during emergencies and disasters. We will release the data and source code after this work is published.
Tasks	Topic Models
Published	2017-09-01
URL	https://www.aclweb.org/anthology/D17-1055/
PDF	https://www.aclweb.org/anthology/D17-1055
PWC	https://paperswithcode.com/paper/identifying-and-tracking-sentiments-and
Repo
Framework

Elucidating Conceptual Properties from Word Embeddings


Title	Elucidating Conceptual Properties from Word Embeddings
Authors	Kyoung-Rok Jang, Sung-Hyon Myaeng
Abstract	In this paper, we introduce a method of identifying the components (i.e. dimensions) of word embeddings that strongly signify properties of a word. By elucidating such properties hidden in word embeddings, we could make word embeddings more interpretable, and also perform property-based meaning comparison. With this capability, we can answer questions like "To what degree does a given word have the property cuteness?" or "From what perspective are two words similar?". We verify our method by examining how the strength of property-signifying components correlates with the degree of prototypicality of a target word.
Tasks	Decision Making, Named Entity Recognition, Sentiment Analysis, Word Embeddings
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1911/
PDF	https://www.aclweb.org/anthology/W17-1911
PWC	https://paperswithcode.com/paper/elucidating-conceptual-properties-from-word
Repo
Framework
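
The abstract does not spell out how property-signifying components are found. As a stand-in only (a simple heuristic, not necessarily the authors' method), one can rank dimensions by how much higher their mean value is for words known to have a property than for words that lack it, then read a word's value on the top dimensions as its property strength. The toy vectors and word lists are hypothetical.

```python
def property_dimensions(embeddings, positives, negatives, top_k=1):
    """Rank dimensions by mean value over `positives` minus mean over `negatives`."""
    dim = len(next(iter(embeddings.values())))

    def mean(words, d):
        return sum(embeddings[w][d] for w in words) / len(words)

    scores = [(mean(positives, d) - mean(negatives, d), d) for d in range(dim)]
    scores.sort(reverse=True)  # largest positive-vs-negative gap first
    return [d for _, d in scores[:top_k]]


def property_strength(embeddings, word, dims):
    """Average value of `word` on the selected property dimensions."""
    return sum(embeddings[word][d] for d in dims) / len(dims)


# Hypothetical 3-d embeddings: "cute" words vs. non-cute words.
emb = {
    "kitten": [0.9, 0.1, 0.3],
    "puppy": [0.8, 0.2, 0.1],
    "rock": [0.1, 0.3, 0.2],
    "hammer": [0.0, 0.2, 0.4],
    "bunny": [0.7, 0.4, 0.0],
}
dims = property_dimensions(emb, ["kitten", "puppy"], ["rock", "hammer"])
print(dims, property_strength(emb, "bunny", dims))  # [0] 0.7
```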

Automatic Community Creation for Abstractive Spoken Conversations Summarization


Title	Automatic Community Creation for Abstractive Spoken Conversations Summarization
Authors	Karan Singla, Evgeny Stepanov, Ali Orkan Bayer, Giuseppe Carenini, Giuseppe Riccardi
Abstract	Summarization of spoken conversations is a challenging task, since it requires deep understanding of dialogs. Abstractive summarization techniques rely on linking the summary sentences to sets of original conversation sentences, i.e. communities. Unfortunately, such linking information is rarely available or requires trained annotators. We propose and experiment with automatic community creation using cosine similarity at different levels of representation: raw text, WordNet SynSet IDs, and word embeddings. We show that the abstractive summarization systems with automatic communities significantly outperform previously published results on both English and Italian corpora.
Tasks	Abstractive Text Summarization, Word Embeddings
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4506/
PDF	https://www.aclweb.org/anthology/W17-4506
PWC	https://paperswithcode.com/paper/automatic-community-creation-for-abstractive
Repo
Framework
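
Community creation at the raw-text level can be sketched as linking each summary sentence to every conversation sentence whose bag-of-words cosine similarity with it reaches a threshold. The threshold of 0.3, the whitespace tokenization and the example sentences are assumptions; the WordNet SynSet and word-embedding levels would swap in different vector representations.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Raw-text representation: lowercase whitespace tokens with counts."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_communities(summary, conversation, threshold=0.3):
    """Link each summary sentence (by index) to the conversation sentences it resembles."""
    conv_vecs = [bag_of_words(s) for s in conversation]
    communities = {}
    for i, sentence in enumerate(summary):
        vec = bag_of_words(sentence)
        communities[i] = [j for j, cv in enumerate(conv_vecs) if cosine(vec, cv) >= threshold]
    return communities


conversation = [
    "the printer is broken again",
    "did you restart it",
    "yes the printer still shows an error",
    "ok lunch at noon",
]
summary = ["the printer is broken"]
print(build_communities(summary, conversation))  # {0: [0, 2]}
```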

Identifying Deception in Indonesian Transcribed Interviews through Lexical-based Approach


Title	Identifying Deception in Indonesian Transcribed Interviews through Lexical-based Approach
Authors	Tifani Warnita, Dessi Puji Lestari
Abstract
Tasks
Published	2017-11-01
URL	https://www.aclweb.org/anthology/Y17-1022/
PDF	https://www.aclweb.org/anthology/Y17-1022
PWC	https://paperswithcode.com/paper/identifying-deception-in-indonesian
Repo
Framework

Process-constrained batch Bayesian optimisation


Title	Process-constrained batch Bayesian optimisation
Authors	Pratibha Vellanki, Santu Rana, Sunil Gupta, David Rubin, Alessandra Sutti, Thomas Dorin, Murray Height, Paul Sanders, Svetha Venkatesh
Abstract	Prevailing batch Bayesian optimisation (BO) methods allow all control variables to be freely altered at each iteration. Real-world experiments, however, often have physical limitations making it time-consuming to alter all settings for each recommendation in a batch. This gives rise to a unique problem in BO: in a recommended batch, a set of variables that are expensive to change experimentally needs to be fixed, while the remaining control variables can be varied. We formulate this as a process-constrained batch Bayesian optimisation problem. We propose two algorithms, pc-BO(basic) and pc-BO(nested). pc-BO(basic) is simpler but lacks a convergence guarantee. In contrast, pc-BO(nested) is slightly more complex, but admits convergence analysis. We show that the regret of pc-BO(nested) is sublinear. We demonstrate the performance of both pc-BO(basic) and pc-BO(nested) by optimising benchmark test functions, tuning hyper-parameters of an SVM classifier, optimising the heat-treatment process for an Al-Sc alloy to achieve target hardness, and optimising the short polymer fibre production process.
Tasks	Bayesian Optimisation
Published	2017-12-01
URL	http://papers.nips.cc/paper/6933-process-constrained-batch-bayesian-optimisation
PDF	http://papers.nips.cc/paper/6933-process-constrained-batch-bayesian-optimisation.pdf
PWC	https://paperswithcode.com/paper/process-constrained-batch-bayesian
Repo
Framework

Alto: Rapid Prototyping for Parsing and Translation


Title	Alto: Rapid Prototyping for Parsing and Translation
Authors	Johannes Gontrum, Jonas Groschwitz, Alexander Koller, Christoph Teichmann
Abstract	We present Alto, a rapid prototyping tool for new grammar formalisms. Alto implements generic but efficient algorithms for parsing, translation, and training for a range of monolingual and synchronous grammar formalisms. It can easily be extended to new formalisms, which makes all of these algorithms immediately available for the new formalism.
Tasks	Machine Translation, Semantic Parsing
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-3008/
PDF	https://www.aclweb.org/anthology/E17-3008
PWC	https://paperswithcode.com/paper/alto-rapid-prototyping-for-parsing-and
Repo
Framework

EviNets: Neural Networks for Combining Evidence Signals for Factoid Question Answering


Title	EviNets: Neural Networks for Combining Evidence Signals for Factoid Question Answering
Authors	Denis Savenkov, Eugene Agichtein
Abstract	A critical task for question answering is the final answer selection stage, which has to combine multiple signals available about each answer candidate. This paper proposes EviNets: a novel neural network architecture for factoid question answering. EviNets scores candidate answer entities by combining the available supporting evidence, e.g., structured knowledge bases and unstructured text documents. EviNets represents each piece of evidence with a dense embedding vector, scores its relevance to the question, and aggregates the support for each candidate to predict its final score. Each of the components is generic and allows plugging in a variety of models for semantic similarity scoring and information aggregation. We demonstrate the effectiveness of EviNets in experiments on the existing TREC QA and WikiMovies benchmarks, and on the new Yahoo! Answers dataset introduced in this paper. EviNets can be extended to other information types and could facilitate future work on combining evidence signals for joint reasoning in question answering.
Tasks	Answer Selection, Feature Engineering, Information Retrieval, Knowledge Base Question Answering, Question Answering, Semantic Similarity, Semantic Textual Similarity
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-2047/
PDF	https://www.aclweb.org/anthology/P17-2047
PWC	https://paperswithcode.com/paper/evinets-neural-networks-for-combining
Repo
Framework
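
The three steps the abstract lists (embed each piece of evidence, score its relevance to the question, aggregate support per candidate) can be sketched with fixed toy vectors. The dot-product relevance, the softmax normalization and the summed aggregation are assumptions standing in for EviNets' learned components; the paper allows plugging in different similarity and aggregation models.

```python
import math


def score_candidates(question_vec, evidence, candidates):
    """Score each candidate by the relevance-weighted evidence that mentions it.

    evidence: list of (embedding_vector, set_of_candidates_mentioned).
    """
    relevances = [sum(q * e for q, e in zip(question_vec, vec)) for vec, _ in evidence]
    # Softmax over evidence relevance, shifted by the max for numerical stability.
    top = max(relevances)
    weights = [math.exp(r - top) for r in relevances]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Aggregate: sum the weights of all evidence that supports each candidate.
    return {
        c: sum(w for w, (_, mentioned) in zip(weights, evidence) if c in mentioned)
        for c in candidates
    }


# Hypothetical question and evidence embeddings.
question = [1.0, 0.0]
evidence = [
    ([2.0, 0.0], {"Paris"}),
    ([0.0, 1.0], {"Lyon"}),
    ([1.0, 0.5], {"Paris", "Lyon"}),
]
scores = score_candidates(question, evidence, ["Paris", "Lyon"])
print(max(scores, key=scores.get))  # Paris
```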

Do LSTMs really work so well for PoS tagging? – A replication study


Title	Do LSTMs really work so well for PoS tagging? – A replication study
Authors	Tobias Horsmann, Torsten Zesch
Abstract	A recent study by Plank et al. (2016) found that LSTM-based PoS taggers considerably improve over the current state-of-the-art when evaluated on the corpora of the Universal Dependencies project that use a coarse-grained tagset. We replicate this study using a fresh collection of 27 corpora of 21 languages that are annotated with fine-grained tagsets of varying size. Our replication confirms the result in general, and we additionally find that the advantage of LSTMs is even bigger for larger tagsets. However, we also find that for the very large tagsets of morphologically rich languages, hand-crafted morphological lexicons are still necessary to reach state-of-the-art performance.
Tasks	Feature Engineering, Part-Of-Speech Tagging
Published	2017-09-01
URL	https://www.aclweb.org/anthology/D17-1076/
PDF	https://www.aclweb.org/anthology/D17-1076
PWC	https://paperswithcode.com/paper/do-lstms-really-work-so-well-for-pos-tagging
Repo
Framework

Adaptive SVRG Methods under Error Bound Conditions with Unknown Growth Parameter


Title	Adaptive SVRG Methods under Error Bound Conditions with Unknown Growth Parameter
Authors	Yi Xu, Qihang Lin, Tianbao Yang
Abstract	The error bound, an inherent property of an optimization problem, has recently seen renewed use in developing algorithms with improved global convergence without strong convexity. The most studied error bound is the quadratic error bound, which generalizes strong convexity and is satisfied by a large family of machine learning problems. Quadratic error bounds have been leveraged to achieve linear convergence in many first-order methods, including the stochastic variance reduced gradient (SVRG) method, one of the most important stochastic optimization methods in machine learning. However, work in this direction faces a critical issue: the algorithms must depend on an unknown growth parameter (a generalization of the strong convexity modulus) in the error bound. This parameter is difficult to estimate exactly, and algorithms that choose it heuristically have no theoretical convergence guarantee. To address this issue, we propose novel SVRG methods that automatically search for this unknown parameter on the fly during optimization while still obtaining almost the same convergence rate as when the parameter is known. We also analyze the convergence properties of SVRG methods under the Hölderian error bound, which generalizes the quadratic error bound.
Tasks	Stochastic Optimization
Published	2017-12-01
URL	http://papers.nips.cc/paper/6920-adaptive-svrg-methods-under-error-bound-conditions-with-unknown-growth-parameter
PDF	http://papers.nips.cc/paper/6920-adaptive-svrg-methods-under-error-bound-conditions-with-unknown-growth-parameter.pdf
PWC	https://paperswithcode.com/paper/adaptive-svrg-methods-under-error-bound
Repo
Framework
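
For reference, plain SVRG on a least-squares problem (which satisfies the quadratic error bound) looks roughly like the sketch below. This is not the paper's adaptive method: the step size and epoch length are fixed by hand here, which is the kind of tuning the paper's algorithms aim to replace by searching for the unknown growth parameter on the fly. The data and hyper-parameters are hypothetical.

```python
import random


def grad_i(w, x, y):
    """Gradient of the single-sample loss 0.5 * (x.w - y)^2 with respect to w."""
    r = sum(a * b for a, b in zip(x, w)) - y
    return [r * a for a in x]


def full_grad(w, X, Y):
    """Average gradient over all samples."""
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        for k, v in enumerate(grad_i(w, x, y)):
            g[k] += v / len(X)
    return g


def svrg(X, Y, eta, epochs, inner, seed=0):
    """Plain SVRG for least squares with a hand-picked step size and epoch length."""
    rng = random.Random(seed)
    w_snap = [0.0] * len(X[0])
    for _ in range(epochs):
        mu = full_grad(w_snap, X, Y)  # full gradient at the snapshot
        w = list(w_snap)
        for _ in range(inner):
            i = rng.randrange(len(X))
            g_w = grad_i(w, X[i], Y[i])
            g_s = grad_i(w_snap, X[i], Y[i])
            # Variance-reduced estimate: unbiased, with variance shrinking near the optimum.
            w = [wk - eta * (a - b + m) for wk, a, b, m in zip(w, g_w, g_s, mu)]
        w_snap = w  # next snapshot = last inner iterate
    return w_snap


# Hypothetical consistent system with exact solution w = [2, -1].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
Y = [2.0, -1.0, 1.0, 3.0]
w = svrg(X, Y, eta=0.05, epochs=100, inner=16)
print([round(v, 3) for v in w])
```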

Extending hybrid word-character neural machine translation with multi-task learning of morphological analysis


Title	Extending hybrid word-character neural machine translation with multi-task learning of morphological analysis
Authors	Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo
Abstract
Tasks	Machine Translation, Morphological Analysis, Multi-Task Learning
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4727/
PDF	https://www.aclweb.org/anthology/W17-4727
PWC	https://paperswithcode.com/paper/extending-hybrid-word-character-neural
Repo
Framework

A rule-based system for cross-lingual parsing of Romance languages with Universal Dependencies


Title	A rule-based system for cross-lingual parsing of Romance languages with Universal Dependencies
Authors	Marcos Garcia, Pablo Gamallo
Abstract	This article describes MetaRomance, a rule-based cross-lingual parser for Romance languages submitted to the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. The system is an almost delexicalized parser which does not need training data to analyze Romance languages. It contains linguistically motivated rules based on PoS-tag patterns. The rules included in MetaRomance were developed in about 12 hours by one expert with no prior knowledge of Universal Dependencies, and can be easily extended using a transparent formalism. In this paper we compare the performance of MetaRomance with other supervised systems participating in the competition, paying special attention to the parsing of different treebanks of the same language. We also compare our system with a delexicalized parser for Romance languages, and take advantage of the harmonized annotation of Universal Dependencies to propose a language ranking based on the syntactic distance each variety has from Romance languages.
Tasks	Dependency Parsing
Published	2017-08-01
URL	https://www.aclweb.org/anthology/K17-3029/
PDF	https://www.aclweb.org/anthology/K17-3029
PWC	https://paperswithcode.com/paper/a-rule-based-system-for-cross-lingual-parsing
Repo
Framework
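
A PoS-pattern rule of the kind the abstract mentions can be sketched as: for each token with tag X, search in a given direction for the nearest token with tag Y and attach to it with a fixed relation. The rules, the root choice, the fallback attachment and the example are hypothetical simplifications, not MetaRomance's actual rules or formalism; only the UPOS tags and relation labels follow Universal Dependencies naming.

```python
# Hypothetical rules: (dependent UPOS, head UPOS, search direction, UD relation).
RULES = [
    ("DET", "NOUN", +1, "det"),
    ("ADJ", "NOUN", +1, "amod"),
    ("ADJ", "NOUN", -1, "amod"),
    ("NOUN", "VERB", +1, "nsubj"),
    ("NOUN", "VERB", -1, "obj"),
]


def parse(tags):
    """Return one (head, deprel) pair per token; heads are 1-based, 0 = root."""
    n = len(tags)
    # Assumption: the first VERB is the root (or token 0 if there is no verb).
    root = next((i for i, t in enumerate(tags) if t == "VERB"), 0)
    result = [None] * n
    result[root] = (0, "root")
    for i, tag in enumerate(tags):
        if i == root:
            continue
        for dep, head, step, rel in RULES:
            if tag != dep:
                continue
            j = i + step
            while 0 <= j < n and tags[j] != head:  # nearest matching head
                j += step
            if 0 <= j < n:
                result[i] = (j + 1, rel)
                break
        if result[i] is None:
            result[i] = (root + 1, "dep")  # fallback: attach to the root
    return result


# Portuguese-like "o gato come o peixe" (the cat eats the fish).
print(parse(["DET", "NOUN", "VERB", "DET", "NOUN"]))
# [(2, 'det'), (3, 'nsubj'), (0, 'root'), (5, 'det'), (3, 'obj')]
```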

運用類神經網路方法之語言端點偵測研究 (A Study on Voice Activation Detection by Using Neural Networks) [In Chinese]


Title	運用類神經網路方法之語言端點偵測研究 (A Study on Voice Activation Detection by Using Neural Networks) [In Chinese]
Authors	Yu-Chih Deng, Chen-Yu Chiang, Chen-Ming Pan
Abstract
Tasks
Published	2017-11-01
URL	https://www.aclweb.org/anthology/O17-1002/
PDF	https://www.aclweb.org/anthology/O17-1002
PWC	https://paperswithcode.com/paper/ec-eccc2e-13a1eae-c-ea-c-c-a-study-on-voice
Repo
Framework