July 26, 2019


Paper Group NANR 101

Feature Hashing for Language and Dialect Identification. Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking. Developing a Suite of Mobile Applications for Collaborative Language Documentation. Identifying Humor in Reviews using Background Text Sources. Extensible Hierarchical Method of Detecting Interactive Actions …

Feature Hashing for Language and Dialect Identification


Title	Feature Hashing for Language and Dialect Identification
Authors	Shervin Malmasi, Mark Dras
Abstract	We evaluate feature hashing for language identification (LID), a method not previously used for this task. Using a standard dataset, we first show that while feature performance is high, LID data is highly dimensional and mostly sparse (>99.5%) as it includes large vocabularies for many languages; memory requirements grow as languages are added. Next we apply hashing using various hash sizes, demonstrating that there is no performance loss with dimensionality reductions of up to 86%. We also show that using an ensemble of low-dimension hash-based classifiers further boosts performance. Feature hashing is highly useful for LID and holds great promise for future work in this area.
Tasks	Dimensionality Reduction, Information Retrieval, Language Identification, Machine Translation, Text Categorization
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-2063/
PDF	https://www.aclweb.org/anthology/P17-2063
PWC	https://paperswithcode.com/paper/feature-hashing-for-language-and-dialect
Repo
Framework
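The hashing trick described in the abstract can be sketched in a few lines: each feature string is hashed straight to a column index in a fixed-size vector, so no vocabulary has to be stored and memory no longer grows as languages are added. This is a minimal illustration, not the authors' code; the character-trigram features, the MD5 hash, the 1,024-bucket default and the signed-hash variant are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-grams over the text padded with one space on each side."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def hash_features(text, num_buckets=1024, n=3):
    """Project n-gram counts into a fixed-size vector with the hashing trick.

    One byte of the digest chooses a +1/-1 sign so that colliding
    features tend to cancel rather than pile up (a common variant,
    assumed here rather than taken from the paper).
    """
    vec = [0.0] * num_buckets
    for gram, count in Counter(char_ngrams(text, n)).items():
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "little") % num_buckets
        sign = 1.0 if digest[8] % 2 == 0 else -1.0
        vec[index] += sign * count
    return vec


# Usage: the vector length is fixed no matter how many languages or n-grams appear.
vec = hash_features("bonjour tout le monde", num_buckets=256)
print(len(vec))  # 256
```

MD5 is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would make the vectors irreproducible across runs. For real experiments, scikit-learn's `HashingVectorizer` (with `analyzer="char"` and an `n_features` setting) implements the same idea.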

Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking


Title	Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking
Authors	Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, Yejin Choi
Abstract	We present an analytic study on the language of news media in the context of political fact-checking and fake news detection. We compare the language of real news with that of satire, hoaxes, and propaganda to find linguistic characteristics of untrustworthy text. To probe the feasibility of automatic political fact-checking, we also present a case study based on PolitiFact.com using their factuality judgments on a 6-point scale. Experiments show that while media fact-checking remains an open research question, stylistic cues can help determine the truthfulness of text.
Tasks	Fake News Detection
Published	2017-09-01
URL	https://www.aclweb.org/anthology/D17-1317/
PDF	https://www.aclweb.org/anthology/D17-1317
PWC	https://paperswithcode.com/paper/truth-of-varying-shades-analyzing-language-in
Repo
Framework

Developing a Suite of Mobile Applications for Collaborative Language Documentation


Title	Developing a Suite of Mobile Applications for Collaborative Language Documentation
Authors	Mat Bettinson, Steven Bird
Abstract
Tasks
Published	2017-03-01
URL	https://www.aclweb.org/anthology/W17-0121/
PDF	https://www.aclweb.org/anthology/W17-0121
PWC	https://paperswithcode.com/paper/developing-a-suite-of-mobile-applications-for
Repo
Framework

Identifying Humor in Reviews using Background Text Sources


Title	Identifying Humor in Reviews using Background Text Sources
Authors	Alex Morales, Chengxiang Zhai
Abstract	We study the problem of automatically identifying humorous text from a new kind of text data, i.e., online reviews. We propose a generative language model, based on the theory of incongruity, to model humorous text, which allows us to leverage background text sources, such as Wikipedia entry descriptions, and enables construction of multiple features for identifying humorous reviews. Evaluation of these features using supervised learning for classifying reviews into humorous and non-humorous reviews shows that the features constructed based on the proposed generative model are much more effective than the major features proposed in the existing literature, allowing us to achieve almost 86% accuracy. These humorous review predictions can also supply good indicators for identifying helpful reviews.
Tasks	Language Modelling
Published	2017-09-01
URL	https://www.aclweb.org/anthology/D17-1051/
PDF	https://www.aclweb.org/anthology/D17-1051
PWC	https://paperswithcode.com/paper/identifying-humor-in-reviews-using-background
Repo
Framework
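As a rough illustration of how a background text source can feed an incongruity-style feature, the sketch below scores a review by its average negative log-probability under a smoothed unigram model of the background text (for example, a Wikipedia description of the reviewed item). This is a hypothetical feature written for intuition only; it is not the generative model proposed in the paper, and the add-one smoothing and the toy sentences are assumptions.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    denom = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / denom for w in vocab}


def divergence_from_background(review, background):
    """Average negative log-probability (in nats) of the review under the background LM.

    Higher values mean the review's words are less expected given the background text.
    """
    vocab = set(review) | set(background)
    model = unigram_model(background, vocab)
    return -sum(math.log(model[w]) for w in review) / len(review)


background = "the phone has a large screen and long battery life".split()
on_topic = "the battery life is long".split()
odd = "my cat uses it as a sled".split()
print(divergence_from_background(on_topic, background) < divergence_from_background(odd, background))  # True
```

A score like this would only be one input to a supervised classifier, in line with the abstract's description of constructing multiple features.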

Extensible Hierarchical Method of Detecting Interactive Actions for Video Understanding


Title	Extensible Hierarchical Method of Detecting Interactive Actions for Video Understanding
Authors	Jinyoung Moon, Junho Jin, Yongjin Kwon, Kyuchang Kang, Jongyoul Park, Kyoung Park
Abstract	For video understanding, namely analyzing who did what in a video, actions along with objects are primary elements. Most studies on actions have handled recognition problems for a well‐trimmed video and focused on enhancing their classification performance. However, action detection, including localization as well as recognition, is required because, in general, actions intersect in time and space. In addition, most studies have not considered extensibility for a newly added action that has been previously trained. Therefore, proposed in this paper is an extensible hierarchical method for detecting generic actions, which combine object movements and spatial relations between two objects, and inherited actions, which are determined by the related objects through an ontology and rule based methodology. The hierarchical design of the method enables it to detect any interactive actions based on the spatial relations between two objects. The method using object information achieves an F‐measure of 90.27%. Moreover, this paper describes the extensibility of the method for a new action contained in a video from a video domain that is different from the dataset used.
Tasks	Action Detection, Action Recognition In Videos, Video Understanding
Published	2017-08-11
URL	https://doi.org/10.4218/etrij.17.0116.0054
PDF	https://onlinelibrary.wiley.com/doi/pdf/10.4218/etrij.17.0116.0054
PWC	https://paperswithcode.com/paper/extensible-hierarchical-method-of-detecting
Repo
Framework
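The F-measure reported in the abstract is, in its usual balanced form, the harmonic mean of precision and recall over detected actions. The helper below computes it from detection counts; the counts in the usage line are made up for illustration and are not the paper's data, and because the abstract does not say whether a weighted variant was used, β = 1 is assumed.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 90 correct detections, 8 spurious, 11 missed.
print(round(f_measure(90, 8, 11), 4))  # 0.9045
```

With β = 1 this reduces to 2·TP / (2·TP + FP + FN), which here is 180 / 199.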

English Multiword Expression-aware Dependency Parsing Including Named Entities


Title	English Multiword Expression-aware Dependency Parsing Including Named Entities
Authors	Akihiko Kato, Hiroyuki Shindo, Yuji Matsumoto
Abstract	Because syntactic structures and spans of multiword expressions (MWEs) are independently annotated in many English syntactic corpora, they are generally inconsistent with respect to one another, which is harmful to the implementation of an aggregate system. In this work, we construct a corpus that ensures consistency between dependency structures and MWEs, including named entities. Further, we explore models that predict both MWE-spans and an MWE-aware dependency structure. Experimental results show that our joint model using additional MWE-span features achieves an MWE recognition improvement of 1.35 points over a pipeline model.
Tasks	Dependency Parsing
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-2068/
PDF	https://www.aclweb.org/anthology/P17-2068
PWC	https://paperswithcode.com/paper/english-multiword-expression-aware-dependency
Repo
Framework

Embedded Semantic Lexicon Induction with Joint Global and Local Optimization


Title	Embedded Semantic Lexicon Induction with Joint Global and Local Optimization
Authors	Sujay Kumar Jauhar, Eduard Hovy
Abstract	Creating annotated frame lexicons such as PropBank and FrameNet is expensive and labor intensive. We present a method to induce an embedded frame lexicon in a minimally supervised fashion using nothing more than unlabeled predicate-argument word pairs. We hypothesize that aggregating such pair selectional preferences across training leads us to a global understanding that captures predicate-argument frame structure. Our approach revolves around a novel integration between a predictive embedding model and an Indian Buffet Process posterior regularizer. We show, through our experimental evaluation, that we outperform baselines on two tasks and can learn an embedded frame lexicon that is able to capture some interesting generalities in relation to hand-crafted semantic frames.
Tasks	Question Answering, Semantic Parsing
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-1025/
PDF	https://www.aclweb.org/anthology/S17-1025
PWC	https://paperswithcode.com/paper/embedded-semantic-lexicon-induction-with
Repo
Framework
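The Indian Buffet Process mentioned in the abstract is a prior over binary matrices with an unbounded number of columns: each row (a "customer") switches on a subset of latent features (the "dishes"). The sketch below draws a matrix from the standard IBP prior with concentration α. It shows only the prior, not the posterior regularization the paper builds on it, and reading rows as predicates and columns as latent frame properties is an assumption made for intuition.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def sample_ibp(num_customers, alpha, seed=0):
    """Draw a binary customer-by-dish matrix from the IBP prior."""
    rng = random.Random(seed)
    dish_counts = []  # m_k: how many earlier customers took dish k
    rows = []
    for i in range(1, num_customers + 1):
        # Existing dish k is taken with probability m_k / i.
        row = [1 if rng.random() < m_k / i else 0 for m_k in dish_counts]
        # Then Poisson(alpha / i) brand-new dishes are added.
        new_dishes = sample_poisson(alpha / i, rng)
        for k, taken in enumerate(row):
            dish_counts[k] += taken
        row.extend([1] * new_dishes)
        dish_counts.extend([1] * new_dishes)
        rows.append(row)
    width = len(dish_counts)
    # Pad earlier rows with zeros for dishes introduced later.
    return [row + [0] * (width - len(row)) for row in rows]


for row in sample_ibp(5, alpha=2.0, seed=0):
    print(row)
```

The expected number of active features grows only logarithmically with the number of rows, which is what lets the model leave the number of latent features open.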

Cross-lingual Learning of Semantic Textual Similarity with Multilingual Word Representations


Title	Cross-lingual Learning of Semantic Textual Similarity with Multilingual Word Representations
Authors	Johannes Bjerva, Robert Östling
Abstract
Tasks	Machine Translation, Semantic Textual Similarity
Published	2017-05-01
URL	https://www.aclweb.org/anthology/W17-0224/
PDF	https://www.aclweb.org/anthology/W17-0224
PWC	https://paperswithcode.com/paper/cross-lingual-learning-of-semantic-textual
Repo
Framework

Multiple Choice Question Generation Utilizing An Ontology


Title	Multiple Choice Question Generation Utilizing An Ontology
Authors	Katherine Stasaski, Marti A. Hearst
Abstract	Ontologies provide a structured representation of concepts and the relationships which connect them. This work investigates how a pre-existing educational Biology ontology can be used to generate useful practice questions for students by using the connectivity structure in a novel way. It also introduces a novel way to generate multiple-choice distractors from the ontology, and compares this to a baseline of using embedding representations of nodes. An assessment by an experienced science teacher shows a significant advantage over a baseline when using the ontology for distractor generation. A subsequent study with three science teachers on the results of a modified question generation algorithm finds significant improvements. An in-depth analysis of the teachers' comments yields useful insights for any researcher working on automated question generation for educational applications.
Tasks	Question Generation
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-5034/
PDF	https://www.aclweb.org/anthology/W17-5034
PWC	https://paperswithcode.com/paper/multiple-choice-question-generation-utilizing
Repo
Framework
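The embedding baseline the abstract compares against can be pictured as nearest-neighbour retrieval: the concepts whose vectors are closest to the correct answer are offered as distractors. The sketch below assumes cosine similarity and uses tiny hand-written vectors; both the similarity measure and the toy embeddings are hypothetical, and the paper's ontology-based distractor method is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def embedding_distractors(answer, embeddings, k=3):
    """Return the k concepts whose vectors are most similar to the answer's."""
    target = embeddings[answer]
    scored = sorted(
        ((cosine(target, vec), name) for name, vec in embeddings.items() if name != answer),
        reverse=True,
    )
    return [name for _, name in scored[:k]]


# Hypothetical toy node embeddings (not from the paper's ontology).
EMBEDDINGS = {
    "mitochondrion": [0.9, 0.1, 0.0],
    "chloroplast": [0.8, 0.2, 0.1],
    "ribosome": [0.7, 0.3, 0.0],
    "photosynthesis": [0.0, 0.2, 0.9],
}
print(embedding_distractors("mitochondrion", EMBEDDINGS, k=2))  # ['chloroplast', 'ribosome']
```

A weakness of this baseline is that the nearest neighbours can be near-synonyms of the answer, which is one reason structural information from an ontology may help.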

DT_Team at SemEval-2017 Task 1: Semantic Similarity Using Alignments, Sentence-Level Embeddings and Gaussian Mixture Model Output


Title	DT_Team at SemEval-2017 Task 1: Semantic Similarity Using Alignments, Sentence-Level Embeddings and Gaussian Mixture Model Output
Authors	Nabin Maharjan, Rajendra Banjade, Dipesh Gautam, Lasang J. Tamang, Vasile Rus
Abstract	We describe our system (DT Team) submitted to SemEval-2017 Task 1, the Semantic Textual Similarity (STS) challenge for English (Track 5). We developed three different models with various features including similarity scores calculated using word and chunk alignments, word/sentence embeddings, and a Gaussian Mixture Model (GMM). The correlation between our system's output and the human judgments was up to 0.8536, which is more than 10% above baseline, and almost as good as the best performing system, which was at 0.8547 correlation (the difference is just about 0.1%). Also, our system produced leading results when evaluated with a separate STS benchmark dataset. The word alignment and sentence embeddings based features were found to be very effective.
Tasks	Lemmatization, Semantic Similarity, Semantic Textual Similarity, Sentence Embeddings, Tokenization, Word Alignment
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2014/
PDF	https://www.aclweb.org/anthology/S17-2014
PWC	https://paperswithcode.com/paper/dt_team-at-semeval-2017-task-1-semantic
Repo
Framework
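STS systems such as this one are scored by the Pearson correlation between their predicted similarity scores and the human ratings, which is the correlation the abstract reports. The function below is a plain implementation; the gold and system scores in the usage lines are invented values on the 0 to 5 STS scale, not the task data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between system scores and gold similarity ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores on the 0-5 STS scale.
gold = [4.8, 3.2, 0.5, 2.0, 5.0]
system = [4.5, 3.5, 1.0, 1.8, 4.6]
print(round(pearson(gold, system), 4))
```

On Python 3.10 and later, `statistics.correlation` computes the same quantity. Because Pearson's r is invariant to linear rescaling, a system is rewarded for ranking and spacing pairs correctly rather than for matching the 0 to 5 scale exactly.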

The Effect of Excluding Out of Domain Training Data from Supervised Named-Entity Recognition


Title	The Effect of Excluding Out of Domain Training Data from Supervised Named-Entity Recognition
Authors	Adam Persson
Abstract
Tasks	Named Entity Recognition
Published	2017-05-01
URL	https://www.aclweb.org/anthology/W17-0240/
PDF	https://www.aclweb.org/anthology/W17-0240
PWC	https://paperswithcode.com/paper/the-effect-of-excluding-out-of-domain
Repo
Framework

Online Partial Least Square Optimization: Dropping Convexity for Better Efficiency and Scalability


Title	Online Partial Least Square Optimization: Dropping Convexity for Better Efficiency and Scalability
Authors	Zhehui Chen, Lin F. Yang, Chris Junchi Li, Tuo Zhao
Abstract	Multiview representation learning is popular for latent factor analysis. Many existing approaches formulate multiview representation learning as convex optimization problems, where global optima can be obtained by certain algorithms in polynomial time. However, much evidence has corroborated that heuristic nonconvex approaches also have good empirical computational performance and convergence to the global optima, although there is a lack of theoretical justification. Such a gap between theory and practice motivates us to study a nonconvex formulation for multiview representation learning, which can be efficiently solved by a simple stochastic gradient descent method. By analyzing the dynamics of the algorithm based on diffusion processes, we establish a global rate of convergence to the global optima. Numerical experiments are provided to support our theory.
Tasks	Representation Learning
Published	2017-08-01
URL	https://icml.cc/Conferences/2017/Schedule?showEvent=791
PDF	http://proceedings.mlr.press/v70/chen17h/chen17h.pdf
PWC	https://paperswithcode.com/paper/online-partial-least-square-optimization
Repo
Framework
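In its simplest form, the leading partial least squares pair maximizes uᵀ E[x yᵀ] v over unit vectors u and v. A simple streaming way to chase that objective is an Oja-style stochastic update that nudges u toward x (yᵀv) and v toward y (xᵀu) for each incoming pair and then renormalizes. This sketch illustrates that general idea on synthetic data; it is not claimed to be the exact algorithm or step-size schedule analyzed in the paper, and the learning rate, dimensions and toy data generator are assumptions.

```python
import math
import random


def normalize(vec):
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def online_pls(stream, dim_x, dim_y, lr=0.01, seed=0):
    """Streaming estimate of the top PLS directions (u, v) from (x, y) pairs."""
    rng = random.Random(seed)
    u = normalize([rng.gauss(0, 1) for _ in range(dim_x)])
    v = normalize([rng.gauss(0, 1) for _ in range(dim_y)])
    for x, y in stream:
        # Both projections use the current (pre-update) u and v.
        yv, xu = dot(y, v), dot(x, u)
        u = normalize([ui + lr * xi * yv for ui, xi in zip(u, x)])
        v = normalize([vi + lr * yi * xu for vi, yi in zip(v, y)])
    return u, v


def toy_stream(n, seed=1):
    """Hypothetical data whose cross-covariance E[x y^T] is e1 e2^T plus noise."""
    rng = random.Random(seed)
    for _ in range(n):
        z = rng.gauss(0, 1)  # shared latent factor
        x = [z + 0.1 * rng.gauss(0, 1), 0.1 * rng.gauss(0, 1), 0.1 * rng.gauss(0, 1)]
        y = [0.1 * rng.gauss(0, 1), z + 0.1 * rng.gauss(0, 1)]
        yield x, y


u, v = online_pls(toy_stream(5000), dim_x=3, dim_y=2)
# u should end up close to +/-[1, 0, 0] and v close to +/-[0, 1].
print([round(c, 2) for c in u], [round(c, 2) for c in v])
```

Each step touches one sample and costs O(dim_x + dim_y), which is the efficiency argument for the online, nonconvex route; the sign of (u, v) is arbitrary because flipping both leaves the objective unchanged.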

Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe


Title	Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe
Authors	Milan Straka, Jana Straková
Abstract	Many natural language processing tasks, including the most advanced ones, routinely start with several basic processing steps: tokenization and segmentation, most likely also POS tagging and lemmatization, and commonly parsing as well. A multilingual pipeline performing these steps can be trained using the Universal Dependencies project, which contains annotations of the described tasks for 50 languages in the latest release UD 2.0. We present an update to UDPipe, a simple-to-use pipeline processing CoNLL-U version 2.0 files, which performs these tasks for multiple languages without requiring additional external data. We provide models for all 50 languages of UD 2.0, and furthermore, the pipeline can be trained easily using data in CoNLL-U format. UDPipe is a standalone application in C++, with bindings available for Python, Java, C# and Perl. In the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, UDPipe was the eighth-best system, while achieving low running times and moderately sized models.
Tasks	Dependency Parsing, Lemmatization, Tokenization
Published	2017-08-01
URL	https://www.aclweb.org/anthology/K17-3009/
PDF	https://www.aclweb.org/anthology/K17-3009
PWC	https://paperswithcode.com/paper/tokenizing-pos-tagging-lemmatizing-and
Repo
Framework
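UDPipe reads and writes CoNLL-U, where each token is one line of ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines start with `#`, and a blank line ends a sentence. The reader below is a hypothetical minimal sketch of that layout for inspecting pipeline output; it skips multiword-token ranges and empty nodes, and for real work UDPipe's own bindings or a third-party package such as `conllu` on PyPI are the safer choice.

```python
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Parse CoNLL-U text into a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (5.1)
        token = dict(zip(FIELDS, cols))
        token["id"] = int(token["id"])
        token["head"] = None if token["head"] == "_" else int(token["head"])
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


SAMPLE = "\n".join([
    "# text = The dog barks.",
    "\t".join(["1", "The", "the", "DET", "DT", "Definite=Def|PronType=Art", "2", "det", "_", "_"]),
    "\t".join(["2", "dog", "dog", "NOUN", "NN", "Number=Sing", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "barks", "bark", "VERB", "VBZ", "Number=Sing|Person=3", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["4", ".", ".", "PUNCT", ".", "_", "3", "punct", "_", "_"]),
    "",
])
print([(t["form"], t["head"], t["deprel"]) for t in parse_conllu(SAMPLE)[0]])
# [('The', 2, 'det'), ('dog', 3, 'nsubj'), ('barks', 0, 'root'), ('.', 3, 'punct')]
```

HEAD 0 marks the root of the dependency tree, so each sentence has exactly one token attached to 0 in a well-formed tree.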

SemEval-2017 Task 9: Abstract Meaning Representation Parsing and Generation


Title	SemEval-2017 Task 9: Abstract Meaning Representation Parsing and Generation
Authors	Jonathan May, Jay Priyadarshi
Abstract	In this report we summarize the results of the 2017 AMR SemEval shared task. The task consisted of two separate yet related subtasks. In the parsing subtask, participants were asked to produce Abstract Meaning Representation (AMR) (Banarescu et al., 2013) graphs for a set of English sentences in the biomedical domain. In the generation subtask, participants were asked to generate English sentences given AMR graphs in the news/forum domain. A total of five sites participated in the parsing subtask, and four participated in the generation subtask. Along with a description of the task and the participants' systems, we show various score ablations and some sample outputs.
Tasks	AMR Parsing, Machine Translation
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2090/
PDF	https://www.aclweb.org/anthology/S17-2090
PWC	https://paperswithcode.com/paper/semeval-2017-task-9-abstract-meaning
Repo
Framework
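AMR graphs are usually written in PENMAN notation and, for parser evaluation, viewed as sets of (source, role, target) triples; the Smatch metric compares such triple sets. The sketch below converts a simple PENMAN string into triples for the textbook example "The boy wants to go". It is a hypothetical minimal reader that handles nesting and re-entrant variables only, not the task's official tooling.

```python
import re

# Tokens: parentheses, the "/" separator, roles like ":ARG0", and bare symbols.
TOKEN_RE = re.compile(r"\(|\)|/|:[^\s()]+|[^\s()/:]+")


def penman_to_triples(text):
    """Convert a simple PENMAN-notation AMR into (source, role, target) triples.

    Handles nested nodes and re-entrant variables only; quoted strings and
    other PENMAN features are deliberately left out of this sketch.
    """
    tokens = TOKEN_RE.findall(text)
    pos = 0
    triples = []

    def expect(symbol):
        nonlocal pos
        if tokens[pos] != symbol:
            raise ValueError(f"expected {symbol!r} at token {pos}, got {tokens[pos]!r}")
        pos += 1

    def parse_node():
        nonlocal pos
        expect("(")
        var = tokens[pos]
        pos += 1
        expect("/")
        triples.append((var, "instance", tokens[pos]))
        pos += 1
        while tokens[pos] != ")":
            role = tokens[pos][1:]  # drop the leading ":"
            pos += 1
            if tokens[pos] == "(":
                target = parse_node()
            else:
                target = tokens[pos]  # re-entrant variable or constant
                pos += 1
            triples.append((var, role, target))
        expect(")")
        return var

    parse_node()
    return triples


AMR = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))"
for triple in penman_to_triples(AMR):
    print(triple)
```

The re-entrant `b` shows why AMR is a graph rather than a tree: the boy is both the one who wants and the one who goes.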

MI&T Lab at SemEval-2017 task 4: An Integrated Training Method of Word Vector for Sentiment Classification


Title	MI&T Lab at SemEval-2017 task 4: An Integrated Training Method of Word Vector for Sentiment Classification
Authors	Jingjing Zhao, Yan Yang, Bing Xu
Abstract	A CNN method for the sentiment classification task in Task 4A of SemEval 2017 is presented. To address the slow training of word vectors with word2vec, a method of training word vectors by integrating word2vec and a Convolutional Neural Network (CNN) is proposed. This training method not only improves the training speed of word2vec, but also makes the word vectors more effective for the target task. Furthermore, the word2vec model adopts a full connection between the input layer and the projection layer of the Continuous Bag-of-Words (CBOW) architecture to acquire the semantic information of the original sentence.
Tasks	Sentiment Analysis, Twitter Sentiment Analysis
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2114/
PDF	https://www.aclweb.org/anthology/S17-2114
PWC	https://paperswithcode.com/paper/mit-lab-at-semeval-2017-task-4-an-integrated
Repo
Framework
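For reference, standard CBOW averages the input vectors of the context words in its projection layer and predicts the centre word from that average. The sketch below shows one training step of plain CBOW with a full softmax, using hypothetical helper names and a toy five-word vocabulary. It illustrates the standard architecture only and does not reproduce the paper's fully connected projection layer or its CNN integration; production word2vec replaces the full softmax with hierarchical softmax or negative sampling for speed.

```python
import math
import random


def init_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def cbow_step(context_ids, target_id, w_in, w_out, lr=0.1):
    """One SGD step of CBOW with a full softmax; returns the loss before the update."""
    dim = len(w_in[0])
    n_ctx = len(context_ids)
    # Projection layer: average of the context words' input vectors.
    h = [sum(w_in[c][d] for c in context_ids) / n_ctx for d in range(dim)]
    # Output layer: one score per vocabulary word, then a numerically stable softmax.
    scores = [sum(h[d] * row[d] for d in range(dim)) for row in w_out]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[target_id])
    # d(loss)/d(score_w) = p_w - 1[w == target]
    grad_scores = [p - (1.0 if w == target_id else 0.0) for w, p in enumerate(probs)]
    grad_h = [sum(g * row[d] for g, row in zip(grad_scores, w_out)) for d in range(dim)]
    for g, row in zip(grad_scores, w_out):
        for d in range(dim):
            row[d] -= lr * g * h[d]
    # Each context occurrence receives an equal share of the projection gradient.
    for c in context_ids:
        for d in range(dim):
            w_in[c][d] -= lr * grad_h[d] / n_ctx
    return loss


rng = random.Random(0)
vocab = ["the", "cat", "sat", "on", "mat"]
w_in = init_matrix(len(vocab), 8, rng)
w_out = init_matrix(len(vocab), 8, rng)
# Predict "sat" (id 2) from the context "the cat ... on the", i.e. ids [0, 1, 3, 0].
losses = [cbow_step([0, 1, 3, 0], 2, w_in, w_out) for _ in range(50)]
print(losses[0] > losses[-1])  # True
```

Because the projection is a plain average, word order inside the context window is discarded; that is the part of CBOW the paper's fully connected input-to-projection layer is described as changing.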