Paper Group NANR 26
DLATK: Differential Language Analysis ToolKit
Title | DLATK: Differential Language Analysis ToolKit |
Authors | H. Andrew Schwartz, Salvatore Giorgi, Maarten Sap, Patrick Crutchley, Lyle Ungar, Johannes Eichstaedt |
Abstract | We present Differential Language Analysis Toolkit (DLATK), an open-source Python package and command-line tool developed for conducting social-scientific language analyses. While DLATK provides standard NLP pipeline steps such as tokenization or SVM classification, its novel strengths lie in analyses useful for psychological, health, and social science: (1) incorporation of extra-linguistic structured information, (2) specified levels and units of analysis (e.g. document, user, community), (3) statistical metrics for continuous outcomes, and (4) robust, proven, and accurate pipelines for social-scientific prediction problems. DLATK integrates multiple popular packages (SKLearn, Mallet), enables interactive usage (Jupyter Notebooks), and generally follows object-oriented principles to make it easy to tie in additional libraries or storage technologies. |
Tasks | Tokenization |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-2010/ |
https://www.aclweb.org/anthology/D17-2010 | |
PWC | https://paperswithcode.com/paper/dlatk-differential-language-analysis-toolkit |
Repo | |
Framework | |
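The abstract above highlights DLATK's group-level units of analysis and its correlational metrics for continuous outcomes. A minimal sketch of that idea, not DLATK's actual API (all user names, texts, and outcome scores below are hypothetical):

```python
from collections import Counter
from math import sqrt

def user_level_relative_freqs(docs_by_user):
    """Aggregate token counts from documents up to the user level,
    then normalize to relative frequencies (a DLATK-style unit shift)."""
    feats = {}
    for user, docs in docs_by_user.items():
        counts = Counter(tok for doc in docs for tok in doc.lower().split())
        total = sum(counts.values())
        feats[user] = {w: c / total for w, c in counts.items()}
    return feats

def pearson(xs, ys):
    """Pearson correlation, the standard metric for continuous outcomes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sqrt(sum((x - mx) ** 2 for x in xs))
    vy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

# toy data: per-user documents and a continuous well-being score
docs = {
    "u1": ["happy happy day", "so happy"],
    "u2": ["sad day", "very sad indeed"],
    "u3": ["happy enough", "fine day"],
}
wellbeing = {"u1": 9.0, "u2": 2.0, "u3": 6.0}

feats = user_level_relative_freqs(docs)
xs = [feats[u].get("happy", 0.0) for u in sorted(docs)]
ys = [wellbeing[u] for u in sorted(docs)]
r = pearson(xs, ys)
```

Correlating a word's user-level relative frequency with an outcome, rather than classifying single documents, is the "differential language analysis" pattern the toolkit is built around.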
Combining Textual and Speech Features in the NLI Task Using State-of-the-Art Machine Learning Techniques
Title | Combining Textual and Speech Features in the NLI Task Using State-of-the-Art Machine Learning Techniques |
Authors | Pavel Ircing, Jan Švec, Zbyněk Zajíc, Barbora Hladká, Martin Holub |
Abstract | We summarize the involvement of our CEMI team in the "NLI Shared Task 2017", which deals with both textual and speech input data. We submitted the results achieved by using three different system architectures; each of them combines multiple supervised learning models trained on various feature sets. As expected, better results are achieved with the systems that use both the textual data and the spoken responses. Combining the input data of two different modalities led to a rather dramatic improvement in classification performance. Our best performing method is based on a set of feed-forward neural networks whose hidden-layer outputs are combined together using a softmax layer. We achieved a macro-averaged F1 score of 0.9257 on the evaluation (unseen) test set and our team placed first in the main task together with three other teams. |
Tasks | Language Acquisition, Language Identification |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-5021/ |
https://www.aclweb.org/anthology/W17-5021 | |
PWC | https://paperswithcode.com/paper/combining-textual-and-speech-features-in-the |
Repo | |
Framework | |
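The CEMI abstract describes late fusion: per-modality networks whose outputs are combined by a single softmax layer. A toy sketch of that combination step, with hypothetical logits standing in for the networks' hidden-layer outputs (not the authors' actual architecture):

```python
from math import exp

def softmax(zs):
    """Numerically stable softmax over a list of scores."""
    m = max(zs)
    es = [exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

# hypothetical per-modality scores for three L1 classes
text_logits = [2.0, 0.5, -1.0]
speech_logits = [1.5, 1.0, -0.5]

# late fusion: the paper feeds the subnetworks' hidden outputs into one
# softmax layer; summing logits before the softmax is the simplest analogue
fused = softmax([t + s for t, s in zip(text_logits, speech_logits)])
pred = max(range(len(fused)), key=fused.__getitem__)
```

The point of fusing before the final softmax, rather than averaging each modality's independent predictions, is that the combining layer can learn how much to trust each modality per class.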
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Title | Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017) |
Authors | |
Abstract | |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1700/ |
https://www.aclweb.org/anthology/W17-1700 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-13th-workshop-on-multiword |
Repo | |
Framework | |
XJSA at SemEval-2017 Task 4: A Deep System for Sentiment Classification in Twitter
Title | XJSA at SemEval-2017 Task 4: A Deep System for Sentiment Classification in Twitter |
Authors | Yazhou Hao, YangYang Lan, Yufei Li, Chen Li |
Abstract | This paper describes the XJSA system submission from XJTU. Our system was created for SemEval-2017 Task 4, subtask A, which is very popular and fundamental. The system is based on a convolutional neural network and word embeddings. We used two pre-trained word vectors and adopted a dynamic strategy for k-max pooling. |
Tasks | Semantic Parsing, Sentiment Analysis, Speech Recognition |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2122/ |
https://www.aclweb.org/anthology/S17-2122 | |
PWC | https://paperswithcode.com/paper/xjsa-at-semeval-2017-task-4-a-deep-system-for |
Repo | |
Framework | |
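The XJSA abstract mentions a dynamic strategy for k-max pooling. In the usual formulation, k-max pooling keeps the k largest activations of a feature map while preserving their order, with k chosen dynamically from the sentence length. A sketch of that operation (the specific k rule here is illustrative, not the paper's):

```python
def dynamic_kmax_pool(values, k_top, length_ratio):
    """Keep the k largest activations, preserving their original order.
    k is chosen dynamically: at least k_top, scaled with sequence length."""
    k = max(k_top, int(round(length_ratio * len(values))))
    top_idx = sorted(
        sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    )
    return [values[i] for i in top_idx]

acts = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8]   # toy 1-D feature map
pooled = dynamic_kmax_pool(acts, k_top=2, length_ratio=0.5)
```

Unlike 1-max pooling, this retains several salient activations and their relative positions, which matters for sentiment cues spread across a tweet.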
Demographic Word Embeddings for Racism Detection on Twitter
Title | Demographic Word Embeddings for Racism Detection on Twitter |
Authors | Mohammed Hasanuzzaman, Gaël Dias, Andy Way |
Abstract | Most social media platforms grant users freedom of speech by allowing them to freely express their thoughts, beliefs, and opinions. Although this represents incredible and unique communication opportunities, it also presents important challenges. Online racism is such an example. In this study, we present a supervised learning strategy to detect racist language on Twitter based on word embeddings that incorporate demographic (Age, Gender, and Location) information. Our methodology achieves reasonable classification accuracy over a gold standard dataset (F1=76.3%) and significantly improves over the classification performance of demographic-agnostic models. |
Tasks | Word Embeddings |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1093/ |
https://www.aclweb.org/anthology/I17-1093 | |
PWC | https://paperswithcode.com/paper/demographic-word-embeddings-for-racism |
Repo | |
Framework | |
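The abstract above describes word embeddings that incorporate demographic information. One common way to do this is to extend each word vector with demographic indicator features; the sketch below shows that concatenation (dimensions, buckets, and encoding are hypothetical, not the paper's):

```python
def demographic_embedding(word_vec, age_bucket, gender, region,
                          n_ages=3, n_regions=4):
    """Append one-hot demographic indicators (age bucket, gender, region)
    to a pretrained word vector, yielding a demographic-aware embedding."""
    age = [1.0 if i == age_bucket else 0.0 for i in range(n_ages)]
    gen = [float(gender)]                     # 0/1 indicator
    reg = [1.0 if i == region else 0.0 for i in range(n_regions)]
    return word_vec + age + gen + reg

# toy 2-d word vector for a user aged into bucket 1, gender=1, region 3
v = demographic_embedding([0.2, -0.1], age_bucket=1, gender=1, region=3)
```

A downstream classifier can then learn, for instance, that a term is a strong racism signal only when used by certain demographic groups, which demographic-agnostic embeddings cannot capture.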
TBX in ODD: Schema-agnostic specification and documentation for TermBase eXchange
Title | TBX in ODD: Schema-agnostic specification and documentation for TermBase eXchange |
Authors | Stefan Pernes, Laurent Romary |
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-7006/ |
https://www.aclweb.org/anthology/W17-7006 | |
PWC | https://paperswithcode.com/paper/tbx-in-odd-schema-agnostic-specification-and |
Repo | |
Framework | |
Book Review: Syntax-Based Statistical Machine Translation by Philip Williams, Rico Sennrich, Matt Post and Philipp Koehn
Title | Book Review: Syntax-Based Statistical Machine Translation by Philip Williams, Rico Sennrich, Matt Post and Philipp Koehn |
Authors | Christian Hadiwinoto |
Abstract | |
Tasks | Machine Translation |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/J17-4006/ |
https://www.aclweb.org/anthology/J17-4006 | |
PWC | https://paperswithcode.com/paper/book-review-syntax-based-statistical-machine |
Repo | |
Framework | |
Nonlinear Acceleration of Stochastic Algorithms
Title | Nonlinear Acceleration of Stochastic Algorithms |
Authors | Damien Scieur, Francis Bach, Alexandre D’Aspremont |
Abstract | Extrapolation methods use the last few iterates of an optimization algorithm to produce a better estimate of the optimum. They were shown to achieve optimal convergence rates in a deterministic setting using simple gradient iterates. Here, we study extrapolation methods in a stochastic setting, where the iterates are produced by either a simple or an accelerated stochastic gradient algorithm. We first derive convergence bounds for arbitrary, potentially biased perturbations, then produce asymptotic bounds using the ratio between the variance of the noise and the accuracy of the current point. Finally, we apply this acceleration technique to stochastic algorithms such as SGD, SAGA, SVRG and Katyusha in different settings, and show significant performance gains. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6987-nonlinear-acceleration-of-stochastic-algorithms |
http://papers.nips.cc/paper/6987-nonlinear-acceleration-of-stochastic-algorithms.pdf | |
PWC | https://paperswithcode.com/paper/nonlinear-acceleration-of-stochastic |
Repo | |
Framework | |
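The abstract above explains that extrapolation methods combine the last few iterates of an optimization algorithm to produce a better estimate of the optimum. The scalar, deterministic special case of this idea is Aitken's Δ² process; the sketch below shows it on a linear fixed-point iteration (this is an illustration of the extrapolation principle, not the paper's regularized stochastic scheme):

```python
def fixed_point_iterates(g, x0, n):
    """Run n steps of the fixed-point iteration x_{k+1} = g(x_k)."""
    xs = [x0]
    for _ in range(n):
        xs.append(g(xs[-1]))
    return xs

def aitken(x0, x1, x2):
    """Aitken's Delta^2 extrapolation: the scalar case of combining
    recent iterates to cancel the dominant error term."""
    denom = x2 - 2 * x1 + x0
    return x2 - (x2 - x1) ** 2 / denom

g = lambda x: 0.5 * x + 1.0        # linear map with fixed point x* = 2
xs = fixed_point_iterates(g, 0.0, 2)
x_star = aitken(*xs)
```

For a linear contraction the extrapolated point is exact after three iterates, which is why such schemes achieve accelerated rates; the paper's contribution is making this robust when the iterates carry stochastic-gradient noise.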
SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
Title | SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation |
Authors | Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia |
Abstract | Semantic Textual Similarity (STS) measures the meaning similarity of sentences. Applications include machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, dialog and conversational systems. The STS shared task is a venue for assessing the current state-of-the-art. The 2017 task focuses on multilingual and cross-lingual pairs with one sub-track exploring MT quality estimation (MTQE) data. The task obtained strong participation from 31 teams, with 17 participating in *all language tracks*. We summarize performance and review a selection of well performing methods. Analysis highlights common errors, providing insight into the limitations of existing models. To support ongoing work on semantic representations, the *STS Benchmark* is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017). |
Tasks | Machine Translation, Natural Language Inference, Question Answering, Semantic Textual Similarity |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2001/ |
https://www.aclweb.org/anthology/S17-2001 | |
PWC | https://paperswithcode.com/paper/semeval-2017-task-1-semantic-textual-1 |
Repo | |
Framework | |
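STS systems output a similarity score on a 0-5 scale for each sentence pair. The classic entry-level baseline is bag-of-words cosine similarity rescaled to that range; a minimal sketch (this is a generic baseline, not any participant's system):

```python
from collections import Counter
from math import sqrt

def cosine_bow(s1, s2):
    """Cosine similarity between bag-of-words vectors of two sentences."""
    c1, c2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = sqrt(sum(v * v for v in c1.values()))
    n2 = sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def sts_score(s1, s2):
    """Map cosine similarity onto the 0-5 STS scale."""
    return 5.0 * cosine_bow(s1, s2)
```

Systems are then ranked by the Pearson correlation between their scores and the human gold scores, so what matters is ordering pairs correctly, not matching the scale exactly.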
ICE: Idiom and Collocation Extractor for Research and Education
Title | ICE: Idiom and Collocation Extractor for Research and Education |
Authors | Vasanthi Vuppuluri, Shahryar Baki, An Nguyen, Rakesh Verma |
Abstract | Collocation and idiom extraction are well-known challenges with many potential applications in Natural Language Processing (NLP). Our experimental, open-source software system, called ICE, is a Python package for flexibly extracting collocations and idioms, currently in English. It also has a competitive POS tagger that can be used alone or as part of collocation/idiom extraction. ICE is available free of cost for research and educational uses in two user-friendly formats. This paper gives an overview of ICE and its performance, and briefly describes the research underlying the extraction algorithms. |
Tasks | Question Answering |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-3027/ |
https://www.aclweb.org/anthology/E17-3027 | |
PWC | https://paperswithcode.com/paper/ice-idiom-and-collocation-extractor-for |
Repo | |
Framework | |
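The ICE abstract does not detail its algorithms, but the textbook starting point for collocation extraction is scoring word pairs by an association measure such as pointwise mutual information (PMI). A minimal sketch of that standard technique, not ICE's actual method:

```python
from collections import Counter
from math import log2

def pmi_collocations(tokens, min_count=2):
    """Score adjacent bigrams by pointwise mutual information:
    PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) ).
    High PMI flags pairs that co-occur far more than chance."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n, nb = len(tokens), len(tokens) - 1
    scores = {}
    for (a, b), c in bigrams.items():
        if c >= min_count:
            p_ab = c / nb
            p_a, p_b = unigrams[a] / n, unigrams[b] / n
            scores[(a, b)] = log2(p_ab / (p_a * p_b))
    return sorted(scores.items(), key=lambda kv: -kv[1])

tokens = "kick the bucket we kick the bucket".split()
top = pmi_collocations(tokens)[0]
```

A `min_count` threshold is important in practice because PMI overrates rare pairs that co-occur once by coincidence.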
UParse: the Edinburgh system for the CoNLL 2017 UD shared task
Title | UParse: the Edinburgh system for the CoNLL 2017 UD shared task |
Authors | Clara Vania, Xingxing Zhang, Adam Lopez |
Abstract | This paper presents our submissions for the CoNLL 2017 UD Shared Task. Our parser, called UParse, is based on a neural network graph-based dependency parser. The parser uses features from a bidirectional LSTM to produce a distribution over possible heads for each word in the sentence. To allow transfer learning for low-resource treebanks and surprise languages, we train several multilingual models for related languages, grouped by their genus and language families. Out of 33 participants, our system ranks 9th in the main results, with 75.49 UAS and 68.87 LAS F1 scores (averaged across 81 treebanks). |
Tasks | Dependency Parsing, Machine Translation, Question Answering, Transfer Learning |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/K17-3010/ |
https://www.aclweb.org/anthology/K17-3010 | |
PWC | https://paperswithcode.com/paper/uparse-the-edinburgh-system-for-the-conll |
Repo | |
Framework | |
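The UParse abstract describes scoring possible heads for each word and reports UAS. The decoding and evaluation steps can be sketched as below: greedy per-word head selection from an arc-score matrix, plus the unlabeled attachment score metric (scores are toy values; UParse's actual decoder and BiLSTM scorer are more involved):

```python
def predict_heads(arc_scores):
    """arc_scores[i][j] = score that token j heads token i (j=0 is ROOT).
    First-order greedy decoding: pick the argmax head per word."""
    return [max(range(len(row)), key=row.__getitem__) for row in arc_scores]

def uas(pred, gold):
    """Unlabeled attachment score: fraction of words with the correct head."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

scores = [
    [3.0, 0.1, 0.2],   # word 1: ROOT scores highest
    [0.0, 2.5, 0.3],   # word 2: word 1 scores highest
]
heads = predict_heads(scores)
```

Greedy argmax decoding can produce cycles; production graph-based parsers instead run a maximum-spanning-tree algorithm over the same score matrix to guarantee a valid tree.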
Head-Lexicalized Bidirectional Tree LSTMs
Title | Head-Lexicalized Bidirectional Tree LSTMs |
Authors | Zhiyang Teng, Yue Zhang |
Abstract | Sequential LSTMs have been extended to model tree structures, giving competitive results for a number of tasks. Existing methods model constituent trees by bottom-up combinations of constituent nodes, making direct use of input word information only for leaf nodes. This is different from sequential LSTMs, which contain references to input words for each node. In this paper, we propose a method for automatic head-lexicalization for tree-structure LSTMs, propagating head words from leaf nodes to every constituent node. In addition, enabled by head lexicalization, we build a tree LSTM in the top-down direction, which corresponds to bidirectional sequential LSTMs in structure. Experiments show that both extensions give better representations of tree structures. Our final model gives the best results on the Stanford Sentiment Treebank and highly competitive results on the TREC question type classification task. |
Tasks | Language Modelling, Relation Extraction, Sentiment Analysis |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/Q17-1012/ |
https://www.aclweb.org/anthology/Q17-1012 | |
PWC | https://paperswithcode.com/paper/head-lexicalized-bidirectional-tree-lstms |
Repo | |
Framework | |
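The abstract above proposes propagating head words from leaf nodes up to every constituent node. The structural part of that idea, choosing each node's head from one of its children and passing it upward, can be sketched as follows (the rule table and tree are hypothetical; the paper learns head-lexicalization automatically rather than using hand-written rules):

```python
def propagate_heads(tree, head_rules):
    """tree: (label, left, right) for internal nodes, (POS, word) for leaves.
    Propagate a head word bottom-up using per-label head-child rules."""
    if len(tree) == 2:                       # leaf: (POS, word)
        return {"label": tree[0], "head": tree[1], "children": []}
    label, left, right = tree
    l = propagate_heads(left, head_rules)
    r = propagate_heads(right, head_rules)
    side = head_rules.get(label, "left")     # default: head from left child
    head = l["head"] if side == "left" else r["head"]
    return {"label": label, "head": head, "children": [l, r]}

tree = ("S",
        ("NP", ("DT", "the"), ("NN", "cat")),
        ("VBD", "slept"))
rules = {"S": "right", "NP": "right"}        # hypothetical head-finding rules
root = propagate_heads(tree, rules)
```

Once every constituent carries a head word, a tree LSTM can use that word's embedding as direct lexical input at internal nodes, which is what enables the paper's top-down, bidirectional extension.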
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms
Title | Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms |
Authors | |
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-3600/ |
https://www.aclweb.org/anthology/W17-3600 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-6th-workshop-on-recent |
Repo | |
Framework | |
BIT at SemEval-2017 Task 1: Using Semantic Information Space to Evaluate Semantic Textual Similarity
Title | BIT at SemEval-2017 Task 1: Using Semantic Information Space to Evaluate Semantic Textual Similarity |
Authors | Hao Wu, Heyan Huang, Ping Jian, Yuhang Guo, Chao Su |
Abstract | This paper presents three systems for semantic textual similarity (STS) evaluation at the SemEval-2017 STS task. One is an unsupervised system and the other two are supervised systems which simply employ the unsupervised one. All our systems mainly depend on the Semantic Information Space (SIS), which is constructed based on the semantic hierarchical taxonomy in WordNet, to compute non-overlapping information content (IC) of sentences. Our team ranked 2nd among 31 participating teams by the primary score, the mean Pearson correlation coefficient (PCC) over 7 tracks, and achieved the best performance on the Track 1 (AR-AR) dataset. |
Tasks | Information Retrieval, Machine Translation, Question Answering, Semantic Textual Similarity, Text Summarization |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2007/ |
https://www.aclweb.org/anthology/S17-2007 | |
PWC | https://paperswithcode.com/paper/bit-at-semeval-2017-task-1-using-semantic |
Repo | |
Framework | |
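The key quantity in the abstract above is information content, IC(c) = -log p(c), summed over a sentence's concepts without double-counting shared ones. A toy sketch of that computation (the concept frequencies are hypothetical; the paper derives them from WordNet's taxonomy):

```python
from math import log

def ic(freq, total):
    """Information content of a concept: -log p(concept).
    Rare concepts carry more information than frequent ones."""
    return -log(freq / total)

def sentence_ic(concepts, freqs, total):
    """Sum IC over the *set* of a sentence's concepts, so a hypernym
    shared by two words is counted once — the non-overlapping idea."""
    return sum(ic(freqs[c], total) for c in set(concepts))

freqs = {"entity": 100, "animal": 50, "cat": 5}  # toy corpus concept counts
total = 100
val = sentence_ic(["cat", "animal", "cat"], freqs, total)
```

Sentence similarity can then be defined from the IC the two sentences share relative to the IC they carry in total, which is what the SIS-based systems compute.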
Nonparametric Bayesian Semi-supervised Word Segmentation
Title | Nonparametric Bayesian Semi-supervised Word Segmentation |
Authors | Ryo Fujii, Ryo Domoto, Daichi Mochihashi |
Abstract | This paper presents a novel hybrid generative/discriminative model of word segmentation based on nonparametric Bayesian methods. Unlike ordinary discriminative word segmentation, which relies only on labeled data, our semi-supervised model also leverages huge amounts of unlabeled text to automatically learn new "words", and further constrains them by using labeled data to segment non-standard texts such as those found in social networking services. Specifically, our hybrid model combines a discriminative classifier (CRF; Lafferty et al., 2001) and unsupervised word segmentation (NPYLM; Mochihashi et al., 2009), with a transparent exchange of information between these two model structures within the semi-supervised framework (JESS-CM; Suzuki and Isozaki, 2008). We confirmed that it can appropriately segment non-standard texts like those in Twitter and Weibo and has nearly state-of-the-art accuracy on standard datasets in Japanese, Chinese, and Thai. |
Tasks | Language Modelling, Machine Translation, Speech Recognition, Tokenization |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/Q17-1013/ |
https://www.aclweb.org/anthology/Q17-1013 | |
PWC | https://paperswithcode.com/paper/nonparametric-bayesian-semi-supervised-word |
Repo | |
Framework | |
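The generative side of the model above scores segmentations under a learned word distribution. The core decoding step, Viterbi search over all split points under a unigram word model, can be sketched as below (the probability table is hypothetical and hand-set; NPYLM learns it from unlabeled text, and the paper couples it with a CRF):

```python
from math import log

def segment(text, word_probs, unk_logp=-10.0, max_len=8):
    """Viterbi segmentation under a unigram word model: dynamic program
    over split points, backing off to a penalty for unknown characters."""
    n = len(text)
    best = [0.0] + [float("-inf")] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            w = text[j:i]
            if w in word_probs:
                lp = log(word_probs[w])
            elif i - j == 1:
                lp = unk_logp          # unknown single character: penalized
            else:
                continue
            if best[j] + lp > best[i]:
                best[i], back[i] = best[j] + lp, j
    words, i = [], n                   # recover the best split by backtracking
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]

probs = {"we": 0.2, "ibo": 0.1, "weibo": 0.3,
         "users": 0.2, "use": 0.1, "rs": 0.05}
result = segment("weibousers", probs)
```

Because "weibo" has higher probability than the "we" + "ibo" split, the DP prefers the longer word, which is exactly how a learned lexicon lets the model absorb new non-standard vocabulary.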