July 26, 2019

1876 words 9 mins read

Paper Group NANR 26


DLATK: Differential Language Analysis ToolKit. Combining Textual and Speech Features in the NLI Task Using State-of-the-Art Machine Learning Techniques. Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017). XJSA at SemEval-2017 Task 4: A Deep System for Sentiment Classification in Twitter. Demographic Word Embeddings for Racism Dete …

DLATK: Differential Language Analysis ToolKit

Title DLATK: Differential Language Analysis ToolKit
Authors H. Andrew Schwartz, Salvatore Giorgi, Maarten Sap, Patrick Crutchley, Lyle Ungar, Johannes Eichstaedt
Abstract We present Differential Language Analysis Toolkit (DLATK), an open-source Python package and command-line tool developed for conducting social-scientific language analyses. While DLATK provides standard NLP pipeline steps such as tokenization or SVM classification, its novel strengths lie in analyses useful for psychological, health, and social science: (1) incorporation of extra-linguistic structured information, (2) specified levels and units of analysis (e.g. document, user, community), (3) statistical metrics for continuous outcomes, and (4) robust, proven, and accurate pipelines for social-scientific prediction problems. DLATK integrates multiple popular packages (SKLearn, Mallet), enables interactive usage (Jupyter Notebooks), and generally follows object-oriented principles to make it easy to tie in additional libraries or storage technologies.
Tasks Tokenization
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-2010/
PDF https://www.aclweb.org/anthology/D17-2010
PWC https://paperswithcode.com/paper/dlatk-differential-language-analysis-toolkit
Repo
Framework
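
DLATK's actual interface is a command-line tool backed by a database, so the snippet below is only a minimal, self-contained sketch of the core differential-language-analysis step the abstract describes: aggregating word frequencies at a chosen level of analysis (here, user) and correlating them with a continuous outcome. The data, the outcome values, and all names are hypothetical; nothing here comes from the DLATK API.

```python
# Illustrative sketch only (not the DLATK API): aggregate token frequencies
# per user and correlate each token's relative frequency with an outcome.
from collections import Counter
from scipy.stats import pearsonr

# Hypothetical data: user -> messages, user -> continuous outcome (e.g. survey score)
messages = {
    "user_a": ["had a great day", "great coffee this morning"],
    "user_b": ["so tired today", "tired and stressed"],
    "user_c": ["great news and great weather"],
}
outcomes = {"user_a": 4.5, "user_b": 1.5, "user_c": 4.0}

# Level of analysis = user: pool all of a user's messages before counting.
user_freqs = {}
for user, msgs in messages.items():
    counts = Counter(" ".join(msgs).split())
    total = sum(counts.values())
    user_freqs[user] = {w: c / total for w, c in counts.items()}

# Correlate each word's per-user relative frequency with the outcome.
vocab = {w for freqs in user_freqs.values() for w in freqs}
users = sorted(messages)
for word in sorted(vocab):
    x = [user_freqs[u].get(word, 0.0) for u in users]
    y = [outcomes[u] for u in users]
    if len(set(x)) > 1:                      # skip words with constant frequency
        r, p = pearsonr(x, y)
        print(f"{word:10s} r={r:+.2f} p={p:.2f}")
```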

Combining Textual and Speech Features in the NLI Task Using State-of-the-Art Machine Learning Techniques

Title Combining Textual and Speech Features in the NLI Task Using State-of-the-Art Machine Learning Techniques
Authors Pavel Ircing, Jan Švec, Zbyněk Zajíc, Barbora Hladká, Martin Holub
Abstract We summarize the involvement of our CEMI team in the "NLI Shared Task 2017", which deals with both textual and speech input data. We submitted the results achieved by using three different system architectures; each of them combines multiple supervised learning models trained on various feature sets. As expected, better results are achieved with the systems that use both the textual data and the spoken responses. Combining the input data of two different modalities led to a rather dramatic improvement in classification performance. Our best performing method is based on a set of feed-forward neural networks whose hidden-layer outputs are combined using a softmax layer. We achieved a macro-averaged F1 score of 0.9257 on the evaluation (unseen) test set and our team placed first in the main task together with three other teams.
Tasks Language Acquisition, Language Identification
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-5021/
PDF https://www.aclweb.org/anthology/W17-5021
PWC https://paperswithcode.com/paper/combining-textual-and-speech-features-in-the
Repo
Framework
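
A minimal PyTorch sketch of the fusion architecture the abstract describes: one feed-forward branch per modality, with the hidden-layer outputs combined by a final softmax layer over the L1 classes. The feature dimensions and hidden size are illustrative assumptions, not the CEMI team's configuration.

```python
import torch
import torch.nn as nn

class MultimodalNLI(nn.Module):
    """Sketch (not the CEMI code): text branch + speech branch, fused at the end."""
    def __init__(self, text_dim, speech_dim, hidden=256, n_classes=11):
        super().__init__()
        self.text_branch = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.speech_branch = nn.Sequential(nn.Linear(speech_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(2 * hidden, n_classes)   # softmax applied in the loss

    def forward(self, text_feats, speech_feats):
        h = torch.cat([self.text_branch(text_feats),
                       self.speech_branch(speech_feats)], dim=-1)
        return self.classifier(h)   # use with nn.CrossEntropyLoss (log-softmax inside)

# Assumed feature sizes, for illustration only.
model = MultimodalNLI(text_dim=5000, speech_dim=400)
logits = model(torch.randn(8, 5000), torch.randn(8, 400))
print(logits.shape)   # torch.Size([8, 11])
```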

Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

Title Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Authors
Abstract
Tasks
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1700/
PDF https://www.aclweb.org/anthology/W17-1700
PWC https://paperswithcode.com/paper/proceedings-of-the-13th-workshop-on-multiword
Repo
Framework

XJSA at SemEval-2017 Task 4: A Deep System for Sentiment Classification in Twitter

Title XJSA at SemEval-2017 Task 4: A Deep System for Sentiment Classification in Twitter
Authors Yazhou Hao, YangYang Lan, Yufei Li, Chen Li
Abstract This paper describes the XJSA system submission from XJTU. Our system was created for SemEval-2017 Task 4, subtask A, which is very popular and fundamental. The system is based on a convolutional neural network and word embeddings. We used two sets of pre-trained word vectors and adopted a dynamic strategy for k-max pooling.
Tasks Semantic Parsing, Sentiment Analysis, Speech Recognition
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2122/
PDF https://www.aclweb.org/anthology/S17-2122
PWC https://paperswithcode.com/paper/xjsa-at-semeval-2017-task-4-a-deep-system-for
Repo
Framework
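
The abstract names a dynamic k-max pooling strategy without spelling it out, so the sketch below shows a generic PyTorch version: k grows with sentence length, and the k largest activations per channel are kept in their original left-to-right order. The scaling rule for k is an assumption for illustration, not the authors' exact choice.

```python
import torch

def dynamic_k_max_pool(features, sent_len, k_top=4):
    """features: (batch, channels, seq_len). Keep the k largest activations
    per channel, preserving their original order in the sequence."""
    k = max(k_top, sent_len // 2)        # assumed scaling rule, for illustration
    k = min(k, features.size(-1))
    _, idx = features.topk(k, dim=-1)    # indices of the k largest values
    idx, _ = idx.sort(dim=-1)            # restore original word order
    return features.gather(-1, idx)

conv_out = torch.randn(2, 100, 30)       # e.g. output of a 1-D convolution layer
pooled = dynamic_k_max_pool(conv_out, sent_len=30)
print(pooled.shape)                      # torch.Size([2, 100, 15])
```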

Demographic Word Embeddings for Racism Detection on Twitter

Title Demographic Word Embeddings for Racism Detection on Twitter
Authors Mohammed Hasanuzzaman, Gaël Dias, Andy Way
Abstract Most social media platforms grant users freedom of speech by allowing them to freely express their thoughts, beliefs, and opinions. Although this represents incredible and unique communication opportunities, it also presents important challenges. Online racism is one such example. In this study, we present a supervised learning strategy to detect racist language on Twitter based on word embeddings that incorporate demographic (age, gender, and location) information. Our methodology achieves reasonable classification accuracy over a gold standard dataset (F1 = 76.3%) and significantly improves over the classification performance of demographic-agnostic models.
Tasks Word Embeddings
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-1093/
PDF https://www.aclweb.org/anthology/I17-1093
PWC https://paperswithcode.com/paper/demographic-word-embeddings-for-racism
Repo
Framework
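
A toy sketch of the overall pipeline, not the authors' models: represent each tweet by an averaged word vector, append demographic features, and train a standard supervised classifier. The random embeddings are stand-ins for the demographic-aware embeddings the paper actually trains, and the labels and feature encodings are made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vocab = {"hate": 0, "love": 1, "people": 2, "go": 3, "back": 4}
emb = rng.normal(size=(len(vocab), 50))       # stand-in word embeddings

def featurize(tweet, age, gender, location):
    ids = [vocab[w] for w in tweet.split() if w in vocab]
    text_vec = emb[ids].mean(axis=0) if ids else np.zeros(50)
    demo_vec = np.array([age / 100.0, gender, location])   # assumed encoding
    return np.concatenate([text_vec, demo_vec])

X = np.stack([
    featurize("hate people go back", 35, 1, 2),
    featurize("love people", 22, 0, 1),
])
y = np.array([1, 0])                          # toy labels: 1 = racist, 0 = non-racist

clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```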

TBX in ODD: Schema-agnostic specification and documentation for TermBase eXchange

Title TBX in ODD: Schema-agnostic specification and documentation for TermBase eXchange
Authors Stefan Pernes, Laurent Romary
Abstract
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-7006/
PDF https://www.aclweb.org/anthology/W17-7006
PWC https://paperswithcode.com/paper/tbx-in-odd-schema-agnostic-specification-and
Repo
Framework

Book Review: Syntax-Based Statistical Machine Translation by Philip Williams, Rico Sennrich, Matt Post and Philipp Koehn

Title Book Review: Syntax-Based Statistical Machine Translation by Philip Williams, Rico Sennrich, Matt Post and Philipp Koehn
Authors Christian Hadiwinoto
Abstract
Tasks Machine Translation
Published 2017-12-01
URL https://www.aclweb.org/anthology/J17-4006/
PDF https://www.aclweb.org/anthology/J17-4006
PWC https://paperswithcode.com/paper/book-review-syntax-based-statistical-machine
Repo
Framework

Nonlinear Acceleration of Stochastic Algorithms

Title Nonlinear Acceleration of Stochastic Algorithms
Authors Damien Scieur, Francis Bach, Alexandre D’Aspremont
Abstract Extrapolation methods use the last few iterates of an optimization algorithm to produce a better estimate of the optimum. They were shown to achieve optimal convergence rates in a deterministic setting using simple gradient iterates. Here, we study extrapolation methods in a stochastic setting, where the iterates are produced by either a simple or an accelerated stochastic gradient algorithm. We first derive convergence bounds for arbitrary, potentially biased perturbations, then produce asymptotic bounds using the ratio between the variance of the noise and the accuracy of the current point. Finally, we apply this acceleration technique to stochastic algorithms such as SGD, SAGA, SVRG and Katyusha in different settings, and show significant performance gains.
Tasks
Published 2017-12-01
URL http://papers.nips.cc/paper/6987-nonlinear-acceleration-of-stochastic-algorithms
PDF http://papers.nips.cc/paper/6987-nonlinear-acceleration-of-stochastic-algorithms.pdf
PWC https://paperswithcode.com/paper/nonlinear-acceleration-of-stochastic
Repo
Framework
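
A small NumPy sketch of the kind of extrapolation the abstract studies: take the last few iterates of a noisy gradient method, solve a regularized least-squares problem for weights that sum to one, and return the weighted combination of iterates. The regularization constant, the choice of iterates to combine, and the toy quadratic objective are assumptions for illustration, not the paper's exact algorithm or experiments.

```python
import numpy as np

def rna_extrapolate(iterates, lam=1e-8):
    """iterates: list of 1-D arrays x_0, ..., x_N produced by an optimizer."""
    X = np.stack(iterates)                    # (N+1, d)
    D = X[1:] - X[:-1]                        # successive differences, (N, d)
    G = D @ D.T                               # Gram matrix of the differences
    z = np.linalg.solve(G + lam * np.linalg.norm(G) * np.eye(len(G)),
                        np.ones(len(G)))
    c = z / z.sum()                           # weights constrained to sum to 1
    return c @ X[1:]                          # weighted combination of the iterates

# Example: noisy gradient descent on f(x) = 0.5 * x^T A x (minimum at the origin).
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])
x, step, history = np.array([5.0, 5.0]), 0.09, []
for _ in range(10):
    x = x - step * (A @ x + 0.01 * rng.normal(size=2))   # stochastic gradient step
    history.append(x.copy())

print("last iterate      :", history[-1])
print("extrapolated point:", rna_extrapolate(history))
```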

SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation

Title SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
Authors Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia
Abstract Semantic Textual Similarity (STS) measures the meaning similarity of sentences. Applications include machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, dialog and conversational systems. The STS shared task is a venue for assessing the current state-of-the-art. The 2017 task focuses on multilingual and cross-lingual pairs with one sub-track exploring MT quality estimation (MTQE) data. The task obtained strong participation from 31 teams, with 17 participating in all language tracks. We summarize performance and review a selection of well performing methods. Analysis highlights common errors, providing insight into the limitations of existing models. To support ongoing work on semantic representations, the STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017).
Tasks Machine Translation, Natural Language Inference, Question Answering, Semantic Textual Similarity
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2001/
PDF https://www.aclweb.org/anthology/S17-2001
PWC https://paperswithcode.com/paper/semeval-2017-task-1-semantic-textual-1
Repo
Framework
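
For readers unfamiliar with the task setup, here is a hedged baseline sketch, not a participating system: score each sentence pair by cosine similarity of averaged word vectors, map the score onto the 0-5 STS scale, and evaluate with the Pearson correlation the shared task uses as its primary metric. The embedding table and gold scores are stand-ins.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
vocab = ["a", "man", "is", "playing", "guitar", "dog", "running"]
emb = {w: rng.normal(size=25) for w in vocab}       # stand-in word vectors

def sentence_vec(sentence):
    vecs = [emb[w] for w in sentence.lower().split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(25)

def sts_score(s1, s2):
    v1, v2 = sentence_vec(s1), sentence_vec(s2)
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return 2.5 * (cos + 1.0)                        # map cosine in [-1, 1] onto 0-5

pairs = [("a man is playing a guitar", "a man is playing guitar"),
         ("a dog is running", "a dog is running"),
         ("a dog is running", "a man is playing guitar")]
gold = [4.8, 5.0, 0.6]                              # hypothetical gold annotations
pred = [sts_score(a, b) for a, b in pairs]
r, _ = pearsonr(pred, gold)                         # primary metric of the task
print([round(p, 2) for p in pred], round(r, 2))
```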

ICE: Idiom and Collocation Extractor for Research and Education

Title ICE: Idiom and Collocation Extractor for Research and Education
Authors Vasanthi Vuppuluri, Shahryar Baki, An Nguyen, Rakesh Verma
Abstract Collocation and idiom extraction are well-known challenges with many potential applications in Natural Language Processing (NLP). Our experimental, open-source software system, called ICE, is a python package for flexibly extracting collocations and idioms, currently in English. It also has a competitive POS tagger that can be used alone or as part of collocation/idiom extraction. ICE is available free of cost for research and educational uses in two user-friendly formats. This paper gives an overview of ICE and its performance, and briefly describes the research underlying the extraction algorithms.
Tasks Question Answering
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-3027/
PDF https://www.aclweb.org/anthology/E17-3027
PWC https://paperswithcode.com/paper/ice-idiom-and-collocation-extractor-for
Repo
Framework
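
ICE's own interface is not reproduced here; the snippet below is a generic PMI-based collocation sketch using NLTK, included only to illustrate the kind of extraction the abstract describes. The corpus and thresholds are arbitrary demo choices.

```python
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

nltk.download("genesis", quiet=True)            # small built-in corpus for the demo
words = nltk.corpus.genesis.words("english-web.txt")

bigram_measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(words)
finder.apply_freq_filter(3)                     # ignore very rare bigrams
top = finder.nbest(bigram_measures.pmi, 10)     # rank by pointwise mutual information
print(top)
```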

UParse: the Edinburgh system for the CoNLL 2017 UD shared task

Title UParse: the Edinburgh system for the CoNLL 2017 UD shared task
Authors Clara Vania, Xingxing Zhang, Adam Lopez
Abstract This paper presents our submissions for the CoNLL 2017 UD Shared Task. Our parser, called UParse, is based on a neural network graph-based dependency parser. The parser uses features from a bidirectional LSTM to produce a distribution over possible heads for each word in the sentence. To allow transfer learning for low-resource treebanks and surprise languages, we train several multilingual models for related languages, grouped by their genus and language families. Out of 33 participants, our system ranked 9th in the main results, with 75.49 UAS and 68.87 LAS F1 scores (average across 81 treebanks).
Tasks Dependency Parsing, Machine Translation, Question Answering, Transfer Learning
Published 2017-08-01
URL https://www.aclweb.org/anthology/K17-3010/
PDF https://www.aclweb.org/anthology/K17-3010
PWC https://paperswithcode.com/paper/uparse-the-edinburgh-system-for-the-conll
Repo
Framework
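
A minimal PyTorch sketch, not UParse itself, of the graph-based idea in the abstract: a BiLSTM encodes the sentence, every candidate head is scored for every word, and a softmax over heads gives the distribution the parser decodes from. The bilinear scorer and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class HeadScorer(nn.Module):
    def __init__(self, emb_dim=100, hidden=200):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.bilinear = nn.Parameter(torch.randn(2 * hidden, 2 * hidden) * 0.01)

    def forward(self, word_embs):
        # word_embs: (batch, n+1, emb_dim), with position 0 as an artificial ROOT token
        h, _ = self.lstm(word_embs)                      # (batch, n+1, 2*hidden)
        scores = h @ self.bilinear @ h.transpose(1, 2)   # (batch, n+1, n+1): dependent x head
        return scores.log_softmax(dim=-1)                # distribution over heads per word

scorer = HeadScorer()
sent = torch.randn(1, 6, 100)                            # ROOT + 5 words, random embeddings
head_logprobs = scorer(sent)
print(head_logprobs.shape, head_logprobs[0, 1].exp().sum())  # (1, 6, 6), sums to ~1
```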

Head-Lexicalized Bidirectional Tree LSTMs

Title Head-Lexicalized Bidirectional Tree LSTMs
Authors Zhiyang Teng, Yue Zhang
Abstract Sequential LSTMs have been extended to model tree structures, giving competitive results for a number of tasks. Existing methods model constituent trees by bottom-up combinations of constituent nodes, making direct use of input word information only for leaf nodes. This is different from sequential LSTMs, which contain references to input words for each node. In this paper, we propose a method for automatic head-lexicalization for tree-structure LSTMs, propagating head words from leaf nodes to every constituent node. In addition, enabled by head lexicalization, we build a tree LSTM in the top-down direction, which corresponds to bidirectional sequential LSTMs in structure. Experiments show that both extensions give better representations of tree structures. Our final model gives the best results on the Stanford Sentiment Treebank and highly competitive results on the TREC question type classification task.
Tasks Language Modelling, Relation Extraction, Sentiment Analysis
Published 2017-01-01
URL https://www.aclweb.org/anthology/Q17-1012/
PDF https://www.aclweb.org/anthology/Q17-1012
PWC https://paperswithcode.com/paper/head-lexicalized-bidirectional-tree-lstms
Repo
Framework
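
The sketch below illustrates only the head-lexicalization idea from the abstract, not the paper's full bidirectional tree LSTM: at each binary constituent a learned gate mixes the head representations of the left and right children, so lexical information propagates from the leaves to every node. The gating form and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class HeadPropagation(nn.Module):
    """Toy head-lexicalization: soft choice of head child at each constituent."""
    def __init__(self, dim=50):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 1)

    def combine(self, left_head, right_head):
        g = torch.sigmoid(self.gate(torch.cat([left_head, right_head], dim=-1)))
        return g * left_head + (1 - g) * right_head      # soft choice of head child

    def forward(self, tree, embed):
        # tree: a token string (leaf) or a (left_subtree, right_subtree) pair
        if isinstance(tree, str):
            return embed(tree)
        left = self.forward(tree[0], embed)
        right = self.forward(tree[1], embed)
        return self.combine(left, right)

vocab = {"the": 0, "cat": 1, "sleeps": 2}
emb = nn.Embedding(len(vocab), 50)
lookup = lambda w: emb(torch.tensor(vocab[w]))
model = HeadPropagation()
root_head = model((("the", "cat"), "sleeps"), lookup)    # head vector for the sentence
print(root_head.shape)                                   # torch.Size([50])
```
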
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms

Title Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms
Authors
Abstract
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-3600/
PDF https://www.aclweb.org/anthology/W17-3600
PWC https://paperswithcode.com/paper/proceedings-of-the-6th-workshop-on-recent
Repo
Framework

BIT at SemEval-2017 Task 1: Using Semantic Information Space to Evaluate Semantic Textual Similarity

Title BIT at SemEval-2017 Task 1: Using Semantic Information Space to Evaluate Semantic Textual Similarity
Authors Hao Wu, Heyan Huang, Ping Jian, Yuhang Guo, Chao Su
Abstract This paper presents three systems for semantic textual similarity (STS) evaluation at the SemEval-2017 STS task. One is an unsupervised system and the other two are supervised systems which simply employ the unsupervised one. All our systems mainly depend on the Semantic Information Space (SIS), which is constructed based on the semantic hierarchical taxonomy in WordNet, to compute the non-overlapping information content (IC) of sentences. Our team ranked 2nd among 31 participating teams on the primary score, the mean Pearson correlation coefficient (PCC) over 7 tracks, and achieved the best performance on the Track 1 (AR-AR) dataset.
Tasks Information Retrieval, Machine Translation, Question Answering, Semantic Textual Similarity, Text Summarization
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2007/
PDF https://www.aclweb.org/anthology/S17-2007
PWC https://paperswithcode.com/paper/bit-at-semeval-2017-task-1-using-semantic
Repo
Framework
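
A toy, self-contained illustration of information content (IC), the quantity the abstract builds on, together with one possible reading of a "non-overlapping" sentence score in which each concept is counted only once. The frequency table and the shared-versus-union score are assumptions for illustration; the paper derives IC from the WordNet taxonomy.

```python
import math

# Hypothetical concept frequencies; the root concept carries the total mass.
concept_freq = {"entity": 1000, "animal": 120, "dog": 15, "poodle": 2, "guitar": 8}
total = concept_freq["entity"]

def ic(concept):
    """IC(c) = -log p(c): rarer, more specific concepts carry more information."""
    return -math.log(concept_freq[concept] / total)

def sentence_ic(concepts):
    # "Non-overlapping": each concept is counted once, so repeats add nothing.
    return sum(ic(c) for c in set(concepts))

s1 = ["dog", "animal"]
s2 = ["poodle", "animal"]
shared = sentence_ic(set(s1) & set(s2))
union = sentence_ic(set(s1) | set(s2))
print(f"IC(dog)={ic('dog'):.2f}  IC(poodle)={ic('poodle'):.2f}")
print(f"similarity ~ shared/union IC = {shared / union:.2f}")
```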

Nonparametric Bayesian Semi-supervised Word Segmentation

Title Nonparametric Bayesian Semi-supervised Word Segmentation
Authors Ryo Fujii, Ryo Domoto, Daichi Mochihashi
Abstract This paper presents a novel hybrid generative/discriminative model of word segmentation based on nonparametric Bayesian methods. Unlike ordinary discriminative word segmentation, which relies only on labeled data, our semi-supervised model also leverages huge amounts of unlabeled text to automatically learn new "words", and further constrains them using labeled data to segment non-standard texts such as those found in social networking services. Specifically, our hybrid model combines a discriminative classifier (CRF; Lafferty et al., 2001) and unsupervised word segmentation (NPYLM; Mochihashi et al., 2009), with a transparent exchange of information between these two model structures within the semi-supervised framework (JESS-CM; Suzuki and Isozaki, 2008). We confirmed that it can appropriately segment non-standard texts like those in Twitter and Weibo and has nearly state-of-the-art accuracy on standard datasets in Japanese, Chinese, and Thai.
Tasks Language Modelling, Machine Translation, Speech Recognition, Tokenization
Published 2017-01-01
URL https://www.aclweb.org/anthology/Q17-1013/
PDF https://www.aclweb.org/anthology/Q17-1013
PWC https://paperswithcode.com/paper/nonparametric-bayesian-semi-supervised-word
Repo
Framework
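
As a rough intuition for the generative side of the abstract, the sketch below segments a character string with a fixed unigram lexicon and dynamic programming. It is a much simplified stand-in for NPYLM: there is no nonparametric prior, no CRF, and no semi-supervised coupling, and the lexicon probabilities are made up.

```python
import math

# Toy unigram lexicon with made-up probabilities.
lexicon = {"i": 0.1, "like": 0.05, "green": 0.02, "tea": 0.03, "greentea": 0.0001}
UNK_LOGP = math.log(1e-8)                        # penalty for unknown chunks

def segment(text, max_word_len=8):
    n = len(text)
    best = [(-math.inf, 0)] * (n + 1)            # (best log prob, backpointer)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            logp = math.log(lexicon[word]) if word in lexicon else UNK_LOGP
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)
    # Recover the segmentation from the backpointers.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return list(reversed(words))

print(segment("ilikegreentea"))                  # -> ['i', 'like', 'green', 'tea']
```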