Paper Group NANR 26
DLATK: Differential Language Analysis ToolKit
Title | DLATK: Differential Language Analysis ToolKit |
Authors | H. Andrew Schwartz, Salvatore Giorgi, Maarten Sap, Patrick Crutchley, Lyle Ungar, Johannes Eichstaedt |
Abstract | We present Differential Language Analysis Toolkit (DLATK), an open-source Python package and command-line tool developed for conducting social-scientific language analyses. While DLATK provides standard NLP pipeline steps such as tokenization or SVM classification, its novel strengths lie in analyses useful for psychological, health, and social science: (1) incorporation of extra-linguistic structured information, (2) specified levels and units of analysis (e.g. document, user, community), (3) statistical metrics for continuous outcomes, and (4) robust, proven, and accurate pipelines for social-scientific prediction problems. DLATK integrates multiple popular packages (SKLearn, Mallet), enables interactive usage (Jupyter Notebooks), and generally follows object-oriented principles to make it easy to tie in additional libraries or storage technologies. |
Tasks | Tokenization |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-2010/ |
https://www.aclweb.org/anthology/D17-2010 | |
PWC | https://paperswithcode.com/paper/dlatk-differential-language-analysis-toolkit |
Repo | |
Framework | |
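The abstract above highlights DLATK's group-level units of analysis and its correlational metrics for continuous outcomes. A minimal sketch of that idea, not DLATK's actual API (all user names, texts, and outcome scores below are hypothetical):

```python
from collections import Counter
from math import sqrt

def user_level_relative_freqs(docs_by_user):
    """Aggregate token counts from documents up to the user level,
    then normalize to relative frequencies (a DLATK-style unit shift)."""
    feats = {}
    for user, docs in docs_by_user.items():
        counts = Counter(tok for doc in docs for tok in doc.lower().split())
        total = sum(counts.values())
        feats[user] = {w: c / total for w, c in counts.items()}
    return feats

def pearson(xs, ys):
    """Pearson correlation, the standard metric for continuous outcomes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sqrt(sum((x - mx) ** 2 for x in xs))
    vy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

# toy data: per-user documents and a continuous well-being score
docs = {
    "u1": ["happy happy day", "so happy"],
    "u2": ["sad day", "very sad indeed"],
    "u3": ["happy enough", "fine day"],
}
wellbeing = {"u1": 9.0, "u2": 2.0, "u3": 6.0}

feats = user_level_relative_freqs(docs)
xs = [feats[u].get("happy", 0.0) for u in sorted(docs)]
ys = [wellbeing[u] for u in sorted(docs)]
r = pearson(xs, ys)
```

Correlating a word's user-level relative frequency with an outcome, rather than classifying single documents, is the "differential language analysis" pattern the toolkit is built around.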
Combining Textual and Speech Features in the NLI Task Using State-of-the-Art Machine Learning Techniques
Title | Combining Textual and Speech Features in the NLI Task Using State-of-the-Art Machine Learning Techniques |
Authors | Pavel Ircing, Jan Švec, Zbyněk Zajíc, Barbora Hladká, Martin Holub |
Abstract | We summarize the involvement of our CEMI team in the "NLI Shared Task 2017", which deals with both textual and speech input data. We submitted the results achieved by using three different system architectures; each of them combines multiple supervised learning models trained on various feature sets. As expected, better results are achieved with the systems that use both the textual data and the spoken responses. Combining the input data of two different modalities led to a rather dramatic improvement in classification performance. Our best performing method is based on a set of feed-forward neural networks whose hidden-layer outputs are combined together using a softmax layer. We achieved a macro-averaged F1 score of 0.9257 on the evaluation (unseen) test set and our team placed first in the main task together with three other teams. |
Tasks | Language Acquisition, Language Identification |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-5021/ |
https://www.aclweb.org/anthology/W17-5021 | |
PWC | https://paperswithcode.com/paper/combining-textual-and-speech-features-in-the |
Repo | |
Framework | |
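The CEMI abstract describes late fusion: per-modality networks whose outputs are combined by a single softmax layer. A toy sketch of that combination step, with hypothetical logits standing in for the networks' hidden-layer outputs (not the authors' actual architecture):

```python
from math import exp

def softmax(zs):
    """Numerically stable softmax over a list of scores."""
    m = max(zs)
    es = [exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

# hypothetical per-modality scores for three L1 classes
text_logits = [2.0, 0.5, -1.0]
speech_logits = [1.5, 1.0, -0.5]

# late fusion: the paper feeds the subnetworks' hidden outputs into one
# softmax layer; summing logits before the softmax is the simplest analogue
fused = softmax([t + s for t, s in zip(text_logits, speech_logits)])
pred = max(range(len(fused)), key=fused.__getitem__)
```

The point of fusing before the final softmax, rather than averaging each modality's independent predictions, is that the combining layer can learn how much to trust each modality per class.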
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Title | Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017) |
Authors | |
Abstract | |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1700/ |
https://www.aclweb.org/anthology/W17-1700 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-13th-workshop-on-multiword |
Repo | |
Framework | |
XJSA at SemEval-2017 Task 4: A Deep System for Sentiment Classification in Twitter
Title | XJSA at SemEval-2017 Task 4: A Deep System for Sentiment Classification in Twitter |
Authors | Yazhou Hao, YangYang Lan, Yufei Li, Chen Li |
Abstract | This paper describes the XJSA system submission from XJTU. Our system was created for SemEval-2017 Task 4, subtask A, which is very popular and fundamental. The system is based on a convolutional neural network and word embeddings. We used two pre-trained word vectors and adopted a dynamic strategy for k-max pooling. |
Tasks | Semantic Parsing, Sentiment Analysis, Speech Recognition |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2122/ |
https://www.aclweb.org/anthology/S17-2122 | |
PWC | https://paperswithcode.com/paper/xjsa-at-semeval-2017-task-4-a-deep-system-for |
Repo | |
Framework | |
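The XJSA abstract mentions a dynamic strategy for k-max pooling. In the usual formulation, k-max pooling keeps the k largest activations of a feature map while preserving their order, with k chosen dynamically from the sentence length. A sketch of that operation (the specific k rule here is illustrative, not the paper's):

```python
def dynamic_kmax_pool(values, k_top, length_ratio):
    """Keep the k largest activations, preserving their original order.
    k is chosen dynamically: at least k_top, scaled with sequence length."""
    k = max(k_top, int(round(length_ratio * len(values))))
    top_idx = sorted(
        sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    )
    return [values[i] for i in top_idx]

acts = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8]   # toy 1-D feature map
pooled = dynamic_kmax_pool(acts, k_top=2, length_ratio=0.5)
```

Unlike 1-max pooling, this retains several salient activations and their relative positions, which matters for sentiment cues spread across a tweet.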
Demographic Word Embeddings for Racism Detection on Twitter
Title | Demographic Word Embeddings for Racism Detection on Twitter |
Authors | Mohammed Hasanuzzaman, Gaël Dias, Andy Way |
Abstract | Most social media platforms grant users freedom of speech by allowing them to freely express their thoughts, beliefs, and opinions. Although this represents incredible and unique communication opportunities, it also presents important challenges. Online racism is such an example. In this study, we present a supervised learning strategy to detect racist language on Twitter based on word embeddings that incorporate demographic (Age, Gender, and Location) information. Our methodology achieves reasonable classification accuracy over a gold standard dataset (F1=76.3%) and significantly improves over the classification performance of demographic-agnostic models. |
Tasks | Word Embeddings |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1093/ |
https://www.aclweb.org/anthology/I17-1093 | |
PWC | https://paperswithcode.com/paper/demographic-word-embeddings-for-racism |
Repo | |
Framework | |
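The abstract above describes word embeddings that incorporate demographic information. One common way to do this is to extend each word vector with demographic indicator features; the sketch below shows that concatenation (dimensions, buckets, and encoding are hypothetical, not the paper's):

```python
def demographic_embedding(word_vec, age_bucket, gender, region,
                          n_ages=3, n_regions=4):
    """Append one-hot demographic indicators (age bucket, gender, region)
    to a pretrained word vector, yielding a demographic-aware embedding."""
    age = [1.0 if i == age_bucket else 0.0 for i in range(n_ages)]
    gen = [float(gender)]                     # 0/1 indicator
    reg = [1.0 if i == region else 0.0 for i in range(n_regions)]
    return word_vec + age + gen + reg

# toy 2-d word vector for a user aged into bucket 1, gender=1, region 3
v = demographic_embedding([0.2, -0.1], age_bucket=1, gender=1, region=3)
```

A downstream classifier can then learn, for instance, that a term is a strong racism signal only when used by certain demographic groups, which demographic-agnostic embeddings cannot capture.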
TBX in ODD: Schema-agnostic specification and documentation for TermBase eXchange
Title | TBX in ODD: Schema-agnostic specification and documentation for TermBase eXchange |
Authors | Stefan Pernes, Laurent Romary |
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-7006/ |
https://www.aclweb.org/anthology/W17-7006 | |
PWC | https://paperswithcode.com/paper/tbx-in-odd-schema-agnostic-specification-and |
Repo | |
Framework | |
Book Review: Syntax-Based Statistical Machine Translation by Philip Williams, Rico Sennrich, Matt Post and Philipp Koehn
Title | Book Review: Syntax-Based Statistical Machine Translation by Philip Williams, Rico Sennrich, Matt Post and Philipp Koehn |
Authors | Christian Hadiwinoto |
Abstract | |
Tasks | Machine Translation |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/J17-4006/ |
https://www.aclweb.org/anthology/J17-4006 | |
PWC | https://paperswithcode.com/paper/book-review-syntax-based-statistical-machine |
Repo | |
Framework | |
Nonlinear Acceleration of Stochastic Algorithms
Title | Nonlinear Acceleration of Stochastic Algorithms |
Authors | Damien Scieur, Francis Bach, Alexandre D’Aspremont |
Abstract | Extrapolation methods use the last few iterates of an optimization algorithm to produce a better estimate of the optimum. They were shown to achieve optimal convergence rates in a deterministic setting using simple gradient iterates. Here, we study extrapolation methods in a stochastic setting, where the iterates are produced by either a simple or an accelerated stochastic gradient algorithm. We first derive convergence bounds for arbitrary, potentially biased perturbations, then produce asymptotic bounds using the ratio between the variance of the noise and the accuracy of the current point. Finally, we apply this acceleration technique to stochastic algorithms such as SGD, SAGA, SVRG and Katyusha in different settings, and show significant performance gains. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6987-nonlinear-acceleration-of-stochastic-algorithms |
http://papers.nips.cc/paper/6987-nonlinear-acceleration-of-stochastic-algorithms.pdf | |
PWC | https://paperswithcode.com/paper/nonlinear-acceleration-of-stochastic |
Repo | |
Framework | |
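The abstract above explains that extrapolation methods combine the last few iterates of an optimization algorithm to produce a better estimate of the optimum. The scalar, deterministic special case of this idea is Aitken's Δ² process; the sketch below shows it on a linear fixed-point iteration (this is an illustration of the extrapolation principle, not the paper's regularized stochastic scheme):

```python
def fixed_point_iterates(g, x0, n):
    """Run n steps of the fixed-point iteration x_{k+1} = g(x_k)."""
    xs = [x0]
    for _ in range(n):
        xs.append(g(xs[-1]))
    return xs

def aitken(x0, x1, x2):
    """Aitken's Delta^2 extrapolation: the scalar case of combining
    recent iterates to cancel the dominant error term."""
    denom = x2 - 2 * x1 + x0
    return x2 - (x2 - x1) ** 2 / denom

g = lambda x: 0.5 * x + 1.0        # linear map with fixed point x* = 2
xs = fixed_point_iterates(g, 0.0, 2)
x_star = aitken(*xs)
```

For a linear contraction the extrapolated point is exact after three iterates, which is why such schemes achieve accelerated rates; the paper's contribution is making this robust when the iterates carry stochastic-gradient noise.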
SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
Title | SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation |
Authors | Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia |
Abstract | Semantic Textual Similarity (STS) measures the meaning similarity of sentences. Applications include machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, dialog and conversational systems. The STS shared task is a venue for assessing the current state-of-the-art. The 2017 task focuses on multilingual and cross-lingual pairs with one sub-track exploring MT quality estimation (MTQE) data. The task obtained strong participation from 31 teams, with 17 participating in *all language tracks*. We summarize performance and review a selection of well performing methods. Analysis highlights common errors, providing insight into the limitations of existing models. To support ongoing work on semantic representations, the *STS Benchmark* is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017). |
Tasks | Machine Translation, Natural Language Inference, Question Answering, Semantic Textual Similarity |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2001/ |
https://www.aclweb.org/anthology/S17-2001 | |
PWC | https://paperswithcode.com/paper/semeval-2017-task-1-semantic-textual-1 |
Repo | |
Framework | |
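STS systems output a similarity score on a 0-5 scale for each sentence pair. The classic entry-level baseline is bag-of-words cosine similarity rescaled to that range; a minimal sketch (this is a generic baseline, not any participant's system):

```python
from collections import Counter
from math import sqrt

def cosine_bow(s1, s2):
    """Cosine similarity between bag-of-words vectors of two sentences."""
    c1, c2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = sqrt(sum(v * v for v in c1.values()))
    n2 = sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def sts_score(s1, s2):
    """Map cosine similarity onto the 0-5 STS scale."""
    return 5.0 * cosine_bow(s1, s2)
```

Systems are then ranked by the Pearson correlation between their scores and the human gold scores, so what matters is ordering pairs correctly, not matching the scale exactly.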
ICE: Idiom and Collocation Extractor for Research and Education
Title | ICE: Idiom and Collocation Extractor for Research and Education |
Authors | Vasanthi Vuppuluri, Shahryar Baki, An Nguyen, Rakesh Verma |
Abstract | Collocation and idiom extraction are well-known challenges with many potential applications in Natural Language Processing (NLP). Our experimental, open-source software system, called ICE, is a Python package for flexibly extracting collocations and idioms, currently in English. It also has a competitive POS tagger that can be used alone or as part of collocation/idiom extraction. ICE is available free of cost for research and educational uses in two user-friendly formats. This paper gives an overview of ICE and its performance, and briefly describes the research underlying the extraction algorithms. |
Tasks | Question Answering |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-3027/ |
https://www.aclweb.org/anthology/E17-3027 | |
PWC | https://paperswithcode.com/paper/ice-idiom-and-collocation-extractor-for |
Repo | |
Framework | |
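The ICE abstract does not detail its algorithms, but the textbook starting point for collocation extraction is scoring word pairs by an association measure such as pointwise mutual information (PMI). A minimal sketch of that standard technique, not ICE's actual method:

```python
from collections import Counter
from math import log2

def pmi_collocations(tokens, min_count=2):
    """Score adjacent bigrams by pointwise mutual information:
    PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) ).
    High PMI flags pairs that co-occur far more than chance."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n, nb = len(tokens), len(tokens) - 1
    scores = {}
    for (a, b), c in bigrams.items():
        if c >= min_count:
            p_ab = c / nb
            p_a, p_b = unigrams[a] / n, unigrams[b] / n
            scores[(a, b)] = log2(p_ab / (p_a * p_b))
    return sorted(scores.items(), key=lambda kv: -kv[1])

tokens = "kick the bucket we kick the bucket".split()
top = pmi_collocations(tokens)[0]
```

A `min_count` threshold is important in practice because PMI overrates rare pairs that co-occur once by coincidence.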
UParse: the Edinburgh system for the CoNLL 2017 UD shared task
Title | UParse: the Edinburgh system for the CoNLL 2017 UD shared task |
Authors | Clara Vania, Xingxing Zhang, Adam Lopez |
Abstract | This paper presents our submissions for the CoNLL 2017 UD Shared Task. Our parser, called UParse, is based on a neural network graph-based dependency parser. The parser uses features from a bidirectional LSTM to produce a distribution over possible heads for each word in the sentence. To allow transfer learning for low-resource treebanks and surprise languages, we train several multilingual models for related languages, grouped by their genus and language families. Out of 33 participants, our system ranks 9th in the main results, with 75.49 UAS and 68.87 LAS F1 scores (averaged across 81 treebanks). |
Tasks | Dependency Parsing, Machine Translation, Question Answering, Transfer Learning |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/K17-3010/ |
https://www.aclweb.org/anthology/K17-3010 | |
PWC | https://paperswithcode.com/paper/uparse-the-edinburgh-system-for-the-conll |
Repo | |
Framework | |
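The UParse abstract describes scoring possible heads for each word and reports UAS. The decoding and evaluation steps can be sketched as below: greedy per-word head selection from an arc-score matrix, plus the unlabeled attachment score metric (scores are toy values; UParse's actual decoder and BiLSTM scorer are more involved):

```python
def predict_heads(arc_scores):
    """arc_scores[i][j] = score that token j heads token i (j=0 is ROOT).
    First-order greedy decoding: pick the argmax head per word."""
    return [max(range(len(row)), key=row.__getitem__) for row in arc_scores]

def uas(pred, gold):
    """Unlabeled attachment score: fraction of words with the correct head."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

scores = [
    [3.0, 0.1, 0.2],   # word 1: ROOT scores highest
    [0.0, 2.5, 0.3],   # word 2: word 1 scores highest
]
heads = predict_heads(scores)
```

Greedy argmax decoding can produce cycles; production graph-based parsers instead run a maximum-spanning-tree algorithm over the same score matrix to guarantee a valid tree.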
Head-Lexicalized Bidirectional Tree LSTMs
Title | Head-Lexicalized Bidirectional Tree LSTMs |
Authors | Zhiyang Teng, Yue Zhang |
Abstract | Sequential LSTMs have been extended to model tree structures, giving competitive results for a number of tasks. Existing methods model constituent trees by bottom-up combinations of constituent nodes, making direct use of input word information only for leaf nodes. This is different from sequential LSTMs, which contain references to input words for each node. In this paper, we propose a method for automatic head-lexicalization for tree-structure LSTMs, propagating head words from leaf nodes to every constituent node. In addition, enabled by head lexicalization, we build a tree LSTM in the top-down direction, which corresponds to bidirectional sequential LSTMs in structure. Experiments show that both extensions give better representations of tree structures. Our final model gives the best results on the Stanford Sentiment Treebank and highly competitive results on the TREC question type classification task. |
Tasks | Language Modelling, Relation Extraction, Sentiment Analysis |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/Q17-1012/ |
https://www.aclweb.org/anthology/Q17-1012 | |
PWC | https://paperswithcode.com/paper/head-lexicalized-bidirectional-tree-lstms |
Repo | |
Framework | |
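The abstract above proposes propagating head words from leaf nodes up to every constituent node. The structural part of that idea, choosing each node's head from one of its children and passing it upward, can be sketched as follows (the rule table and tree are hypothetical; the paper learns head-lexicalization automatically rather than using hand-written rules):

```python
def propagate_heads(tree, head_rules):
    """tree: (label, left, right) for internal nodes, (POS, word) for leaves.
    Propagate a head word bottom-up using per-label head-child rules."""
    if len(tree) == 2:                       # leaf: (POS, word)
        return {"label": tree[0], "head": tree[1], "children": []}
    label, left, right = tree
    l = propagate_heads(left, head_rules)
    r = propagate_heads(right, head_rules)
    side = head_rules.get(label, "left")     # default: head from left child
    head = l["head"] if side == "left" else r["head"]
    return {"label": label, "head": head, "children": [l, r]}

tree = ("S",
        ("NP", ("DT", "the"), ("NN", "cat")),
        ("VBD", "slept"))
rules = {"S": "right", "NP": "right"}        # hypothetical head-finding rules
root = propagate_heads(tree, rules)
```

Once every constituent carries a head word, a tree LSTM can use that word's embedding as direct lexical input at internal nodes, which is what enables the paper's top-down, bidirectional extension.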
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms
Title | Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms |
Authors | |
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-3600/ |
https://www.aclweb.org/anthology/W17-3600 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-6th-workshop-on-recent |
Repo | |
Framework | |
BIT at SemEval-2017 Task 1: Using Semantic Information Space to Evaluate Semantic Textual Similarity
Title | BIT at SemEval-2017 Task 1: Using Semantic Information Space to Evaluate Semantic Textual Similarity |
Authors | Hao Wu, Heyan Huang, Ping Jian, Yuhang Guo, Chao Su |
Abstract | This paper presents three systems for semantic textual similarity (STS) evaluation at the SemEval-2017 STS task. One is an unsupervised system and the other two are supervised systems which simply employ the unsupervised one. All our systems mainly depend on the Semantic Information Space (SIS), which is constructed based on the semantic hierarchical taxonomy in WordNet, to compute non-overlapping information content (IC) of sentences. Our team ranked 2nd among 31 participating teams by the primary score, the mean Pearson correlation coefficient (PCC) over 7 tracks, and achieved the best performance on the Track 1 (AR-AR) dataset. |
Tasks | Information Retrieval, Machine Translation, Question Answering, Semantic Textual Similarity, Text Summarization |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2007/ |
https://www.aclweb.org/anthology/S17-2007 | |
PWC | https://paperswithcode.com/paper/bit-at-semeval-2017-task-1-using-semantic |
Repo | |
Framework | |
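The key quantity in the abstract above is information content, IC(c) = -log p(c), summed over a sentence's concepts without double-counting shared ones. A toy sketch of that computation (the concept frequencies are hypothetical; the paper derives them from WordNet's taxonomy):

```python
from math import log

def ic(freq, total):
    """Information content of a concept: -log p(concept).
    Rare concepts carry more information than frequent ones."""
    return -log(freq / total)

def sentence_ic(concepts, freqs, total):
    """Sum IC over the *set* of a sentence's concepts, so a hypernym
    shared by two words is counted once — the non-overlapping idea."""
    return sum(ic(freqs[c], total) for c in set(concepts))

freqs = {"entity": 100, "animal": 50, "cat": 5}  # toy corpus concept counts
total = 100
val = sentence_ic(["cat", "animal", "cat"], freqs, total)
```

Sentence similarity can then be defined from the IC the two sentences share relative to the IC they carry in total, which is what the SIS-based systems compute.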
Nonparametric Bayesian Semi-supervised Word Segmentation
Title | Nonparametric Bayesian Semi-supervised Word Segmentation |
Authors | Ryo Fujii, Ryo Domoto, Daichi Mochihashi |
Abstract | This paper presents a novel hybrid generative/discriminative model of word segmentation based on nonparametric Bayesian methods. Unlike ordinary discriminative word segmentation, which relies only on labeled data, our semi-supervised model also leverages huge amounts of unlabeled text to automatically learn new "words", and further constrains them by using labeled data to segment non-standard texts such as those found in social networking services. Specifically, our hybrid model combines a discriminative classifier (CRF; Lafferty et al., 2001) and unsupervised word segmentation (NPYLM; Mochihashi et al., 2009), with a transparent exchange of information between these two model structures within the semi-supervised framework (JESS-CM; Suzuki and Isozaki, 2008). We confirmed that it can appropriately segment non-standard texts like those in Twitter and Weibo and has nearly state-of-the-art accuracy on standard datasets in Japanese, Chinese, and Thai. |
Tasks | Language Modelling, Machine Translation, Speech Recognition, Tokenization |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/Q17-1013/ |
https://www.aclweb.org/anthology/Q17-1013 | |
PWC | https://paperswithcode.com/paper/nonparametric-bayesian-semi-supervised-word |
Repo | |
Framework | |
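The generative side of the model above scores segmentations under a learned word distribution. The core decoding step, Viterbi search over all split points under a unigram word model, can be sketched as below (the probability table is hypothetical and hand-set; NPYLM learns it from unlabeled text, and the paper couples it with a CRF):

```python
from math import log

def segment(text, word_probs, unk_logp=-10.0, max_len=8):
    """Viterbi segmentation under a unigram word model: dynamic program
    over split points, backing off to a penalty for unknown characters."""
    n = len(text)
    best = [0.0] + [float("-inf")] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            w = text[j:i]
            if w in word_probs:
                lp = log(word_probs[w])
            elif i - j == 1:
                lp = unk_logp          # unknown single character: penalized
            else:
                continue
            if best[j] + lp > best[i]:
                best[i], back[i] = best[j] + lp, j
    words, i = [], n                   # recover the best split by backtracking
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]

probs = {"we": 0.2, "ibo": 0.1, "weibo": 0.3,
         "users": 0.2, "use": 0.1, "rs": 0.05}
result = segment("weibousers", probs)
```

Because "weibo" has higher probability than the "we" + "ibo" split, the DP prefers the longer word, which is exactly how a learned lexicon lets the model absorb new non-standard vocabulary.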