July 26, 2019

Paper Group NANR 101

Feature Hashing for Language and Dialect Identification

Title Feature Hashing for Language and Dialect Identification
Authors Shervin Malmasi, Mark Dras
Abstract We evaluate feature hashing for language identification (LID), a method not previously used for this task. Using a standard dataset, we first show that while feature performance is high, LID data is highly dimensional and mostly sparse (>99.5%) as it includes large vocabularies for many languages; memory requirements grow as languages are added. Next we apply hashing using various hash sizes, demonstrating that there is no performance loss with dimensionality reductions of up to 86%. We also show that using an ensemble of low-dimension hash-based classifiers further boosts performance. Feature hashing is highly useful for LID and holds great promise for future work in this area.
Tasks Dimensionality Reduction, Information Retrieval, Language Identification, Machine Translation, Text Categorization
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-2063/
PDF https://www.aclweb.org/anthology/P17-2063
PWC https://paperswithcode.com/paper/feature-hashing-for-language-and-dialect
Repo
Framework
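
The hashing idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the character trigram features, the CRC32 hash, and the names `char_ngrams` and `hash_features` are assumptions chosen for illustration. Each feature is mapped to one of a fixed number of buckets, so the vector length stays constant however many languages or vocabulary items are added, at the cost of occasional collisions.

```python
import zlib


def char_ngrams(text, n=3):
    """Return character n-grams of `text`, padded with one space on each side."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def hash_features(text, n_buckets=1024, n=3):
    """Map n-gram counts into a fixed-length vector (the hashing trick).

    Assumption: CRC32 stands in for whatever hash function a real system uses.
    Distinct n-grams may collide in the same bucket; their counts then add up.
    """
    vec = [0] * n_buckets
    for gram in char_ngrams(text, n):
        idx = zlib.crc32(gram.encode("utf-8")) % n_buckets
        vec[idx] += 1
    return vec


if __name__ == "__main__":
    english = hash_features("the quick brown fox", n_buckets=64)
    swedish = hash_features("den snabba bruna räven", n_buckets=64)
    # Both vectors have the same length regardless of the vocabulary seen.
    print(len(english), len(swedish))
```

A smaller `n_buckets` reduces memory but raises the collision rate, which is the trade-off the abstract explores with different hash sizes.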

Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking

Title Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking
Authors Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, Yejin Choi
Abstract We present an analytic study on the language of news media in the context of political fact-checking and fake news detection. We compare the language of real news with that of satire, hoaxes, and propaganda to find linguistic characteristics of untrustworthy text. To probe the feasibility of automatic political fact-checking, we also present a case study based on PolitiFact.com using their factuality judgments on a 6-point scale. Experiments show that while media fact-checking remains an open research question, stylistic cues can help determine the truthfulness of text.
Tasks Fake News Detection
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1317/
PDF https://www.aclweb.org/anthology/D17-1317
PWC https://paperswithcode.com/paper/truth-of-varying-shades-analyzing-language-in
Repo
Framework

Developing a Suite of Mobile Applications for Collaborative Language Documentation

Title Developing a Suite of Mobile Applications for Collaborative Language Documentation
Authors Mat Bettinson, Steven Bird
Abstract
Tasks
Published 2017-03-01
URL https://www.aclweb.org/anthology/W17-0121/
PDF https://www.aclweb.org/anthology/W17-0121
PWC https://paperswithcode.com/paper/developing-a-suite-of-mobile-applications-for
Repo
Framework

Identifying Humor in Reviews using Background Text Sources

Title Identifying Humor in Reviews using Background Text Sources
Authors Alex Morales, Chengxiang Zhai
Abstract We study the problem of automatically identifying humorous text from a new kind of text data, i.e., online reviews. We propose a generative language model, based on the theory of incongruity, to model humorous text, which allows us to leverage background text sources, such as Wikipedia entry descriptions, and enables construction of multiple features for identifying humorous reviews. Evaluation of these features using supervised learning for classifying reviews into humorous and non-humorous reviews shows that the features constructed based on the proposed generative model are much more effective than the major features proposed in the existing literature, allowing us to achieve almost 86% accuracy. These humorous review predictions can also supply good indicators for identifying helpful reviews.
Tasks Language Modelling
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1051/
PDF https://www.aclweb.org/anthology/D17-1051
PWC https://paperswithcode.com/paper/identifying-humor-in-reviews-using-background
Repo
Framework

Extensible Hierarchical Method of Detecting Interactive Actions for Video Understanding

Title Extensible Hierarchical Method of Detecting Interactive Actions for Video Understanding
Authors Jinyoung Moon, Junho Jin, Yongjin Kwon, Kyuchang Kang, Jongyoul Park, Kyoung Park
Abstract For video understanding, namely analyzing who did what in a video, actions along with objects are primary elements. Most studies on actions have handled recognition problems for a well‐trimmed video and focused on enhancing their classification performance. However, action detection, including localization as well as recognition, is required because, in general, actions intersect in time and space. In addition, most studies have not considered extensibility for a newly added action that has been previously trained. Therefore, proposed in this paper is an extensible hierarchical method for detecting generic actions, which combine object movements and spatial relations between two objects, and inherited actions, which are determined by the related objects through an ontology and rule based methodology. The hierarchical design of the method enables it to detect any interactive actions based on the spatial relations between two objects. The method using object information achieves an F‐measure of 90.27%. Moreover, this paper describes the extensibility of the method for a new action contained in a video from a video domain that is different from the dataset used.
Tasks Action Detection, Action Recognition In Videos, Video Understanding
Published 2017-08-11
URL https://doi.org/10.4218/etrij.17.0116.0054
PDF https://onlinelibrary.wiley.com/doi/pdf/10.4218/etrij.17.0116.0054
PWC https://paperswithcode.com/paper/extensible-hierarchical-method-of-detecting
Repo
Framework

English Multiword Expression-aware Dependency Parsing Including Named Entities

Title English Multiword Expression-aware Dependency Parsing Including Named Entities
Authors Akihiko Kato, Hiroyuki Shindo, Yuji Matsumoto
Abstract Because syntactic structures and spans of multiword expressions (MWEs) are independently annotated in many English syntactic corpora, they are generally inconsistent with respect to one another, which is harmful to the implementation of an aggregate system. In this work, we construct a corpus that ensures consistency between dependency structures and MWEs, including named entities. Further, we explore models that predict both MWE-spans and an MWE-aware dependency structure. Experimental results show that our joint model using additional MWE-span features achieves an MWE recognition improvement of 1.35 points over a pipeline model.
Tasks Dependency Parsing
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-2068/
PDF https://www.aclweb.org/anthology/P17-2068
PWC https://paperswithcode.com/paper/english-multiword-expression-aware-dependency
Repo
Framework

Embedded Semantic Lexicon Induction with Joint Global and Local Optimization

Title Embedded Semantic Lexicon Induction with Joint Global and Local Optimization
Authors Sujay Kumar Jauhar, Eduard Hovy
Abstract Creating annotated frame lexicons such as PropBank and FrameNet is expensive and labor intensive. We present a method to induce an embedded frame lexicon in a minimally supervised fashion using nothing more than unlabeled predicate-argument word pairs. We hypothesize that aggregating such pair selectional preferences across training leads us to a global understanding that captures predicate-argument frame structure. Our approach revolves around a novel integration between a predictive embedding model and an Indian Buffet Process posterior regularizer. We show, through our experimental evaluation, that we outperform baselines on two tasks and can learn an embedded frame lexicon that is able to capture some interesting generalities in relation to hand-crafted semantic frames.
Tasks Question Answering, Semantic Parsing
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-1025/
PDF https://www.aclweb.org/anthology/S17-1025
PWC https://paperswithcode.com/paper/embedded-semantic-lexicon-induction-with
Repo
Framework

Cross-lingual Learning of Semantic Textual Similarity with Multilingual Word Representations

Title Cross-lingual Learning of Semantic Textual Similarity with Multilingual Word Representations
Authors Johannes Bjerva, Robert Östling
Abstract
Tasks Machine Translation, Semantic Textual Similarity
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0224/
PDF https://www.aclweb.org/anthology/W17-0224
PWC https://paperswithcode.com/paper/cross-lingual-learning-of-semantic-textual
Repo
Framework

Multiple Choice Question Generation Utilizing An Ontology

Title Multiple Choice Question Generation Utilizing An Ontology
Authors Katherine Stasaski, Marti A. Hearst
Abstract Ontologies provide a structured representation of concepts and the relationships which connect them. This work investigates how a pre-existing educational Biology ontology can be used to generate useful practice questions for students by using the connectivity structure in a novel way. It also introduces a novel way to generate multiple-choice distractors from the ontology, and compares this to a baseline of using embedding representations of nodes. An assessment by an experienced science teacher shows a significant advantage over a baseline when using the ontology for distractor generation. A subsequent study with three science teachers on the results of a modified question generation algorithm finds significant improvements. An in-depth analysis of the teachers' comments yields useful insights for any researcher working on automated question generation for educational applications.
Tasks Question Generation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-5034/
PDF https://www.aclweb.org/anthology/W17-5034
PWC https://paperswithcode.com/paper/multiple-choice-question-generation-utilizing
Repo
Framework

DT_Team at SemEval-2017 Task 1: Semantic Similarity Using Alignments, Sentence-Level Embeddings and Gaussian Mixture Model Output

Title DT_Team at SemEval-2017 Task 1: Semantic Similarity Using Alignments, Sentence-Level Embeddings and Gaussian Mixture Model Output
Authors Nabin Maharjan, Rajendra Banjade, Dipesh Gautam, Lasang J. Tamang, Vasile Rus
Abstract We describe our system (DT Team) submitted at SemEval-2017 Task 1, Semantic Textual Similarity (STS) challenge for English (Track 5). We developed three different models with various features including similarity scores calculated using word and chunk alignments, word/sentence embeddings, and Gaussian Mixture Model (GMM). The correlation between our system's output and the human judgments was up to 0.8536, which is more than 10% above baseline, and almost as good as the best performing system which was at 0.8547 correlation (the difference is just about 0.1%). Also, our system produced leading results when evaluated with a separate STS benchmark dataset. The word alignment and sentence embeddings based features were found to be very effective.
Tasks Lemmatization, Semantic Similarity, Semantic Textual Similarity, Sentence Embeddings, Tokenization, Word Alignment
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2014/
PDF https://www.aclweb.org/anthology/S17-2014
PWC https://paperswithcode.com/paper/dt_team-at-semeval-2017-task-1-semantic
Repo
Framework
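
The "about 0.1%" gap quoted in the abstract above can be checked with a few lines of arithmetic. This is only a sanity check on the two reported correlations; the variable names are illustrative, and the baseline score is not given in the abstract, so it is left out.

```python
# Correlations reported in the abstract.
ours = 0.8536   # DT Team's best correlation with human judgments
best = 0.8547   # best performing system in the track

# Absolute gap, expressed in correlation points and in percentage points.
abs_gap = best - ours
abs_gap_points = abs_gap * 100

# Relative gap: how far below the best system's score, as a fraction of it.
rel_gap = abs_gap / best

print(f"absolute gap: {abs_gap:.4f} ({abs_gap_points:.2f} percentage points)")
print(f"relative gap: {rel_gap:.2%}")
```

Both readings of the difference, absolute and relative, come out close to the 0.1% stated in the abstract.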

The Effect of Excluding Out of Domain Training Data from Supervised Named-Entity Recognition

Title The Effect of Excluding Out of Domain Training Data from Supervised Named-Entity Recognition
Authors Adam Persson
Abstract
Tasks Named Entity Recognition
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0240/
PDF https://www.aclweb.org/anthology/W17-0240
PWC https://paperswithcode.com/paper/the-effect-of-excluding-out-of-domain
Repo
Framework

Online Partial Least Square Optimization: Dropping Convexity for Better Efficiency and Scalability

Title Online Partial Least Square Optimization: Dropping Convexity for Better Efficiency and Scalability
Authors Zhehui Chen, Lin F. Yang, Chris Junchi Li, Tuo Zhao
Abstract Multiview representation learning is popular for latent factor analysis. Many existing approaches formulate the multiview representation learning as convex optimization problems, where global optima can be obtained by certain algorithms in polynomial time. However, much evidence has corroborated that heuristic nonconvex approaches also have good empirical computational performance and convergence to the global optima, although there is a lack of theoretical justification. Such a gap between theory and practice motivates us to study a nonconvex formulation for multiview representation learning, which can be efficiently solved by a simple stochastic gradient descent method. By analyzing the dynamics of the algorithm based on diffusion processes, we establish a global rate of convergence to the global optima. Numerical experiments are provided to support our theory.
Tasks Representation Learning
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=791
PDF http://proceedings.mlr.press/v70/chen17h/chen17h.pdf
PWC https://paperswithcode.com/paper/online-partial-least-square-optimization
Repo
Framework

Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe

Title Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe
Authors Milan Straka, Jana Straková
Abstract Many natural language processing tasks, including the most advanced ones, routinely start with several basic processing steps: tokenization and segmentation, most likely also POS tagging and lemmatization, and commonly parsing as well. A multilingual pipeline performing these steps can be trained using the Universal Dependencies project, which contains annotations of the described tasks for 50 languages in the latest release UD 2.0. We present an update to UDPipe, a simple-to-use pipeline processing CoNLL-U version 2.0 files, which performs these tasks for multiple languages without requiring additional external data. We provide models for all 50 languages of UD 2.0, and furthermore, the pipeline can be trained easily using data in CoNLL-U format. UDPipe is a standalone application in C++, with bindings available for Python, Java, C# and Perl. In the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, UDPipe was the eighth best system, while achieving low running times and moderately sized models.
Tasks Dependency Parsing, Lemmatization, Tokenization
Published 2017-08-01
URL https://www.aclweb.org/anthology/K17-3009/
PDF https://www.aclweb.org/anthology/K17-3009
PWC https://paperswithcode.com/paper/tokenizing-pos-tagging-lemmatizing-and
Repo
Framework
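
To make the CoNLL-U format mentioned in the abstract above concrete, here is a minimal reader sketch. It does not use UDPipe or its bindings; the function name `parse_conllu` and the sample sentence are assumptions for illustration. It relies only on the general shape of the format: one token per line with ten tab-separated columns, `#` comment lines, and a blank line between sentences.

```python
# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical three-token sentence, built with explicit tabs.
SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "\t".join(["1", "Dogs", "dog", "NOUN", "_", "Number=Plur", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bark", "bark", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for token in parse_conllu(SAMPLE)[0]:
        print(token["form"], token["upos"], token["head"], token["deprel"])
```

The `head` column holds the index of each token's syntactic parent, with `0` marking the root, which is how a single file carries tokenization, tagging, lemmas, and the dependency tree together.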

SemEval-2017 Task 9: Abstract Meaning Representation Parsing and Generation

Title SemEval-2017 Task 9: Abstract Meaning Representation Parsing and Generation
Authors Jonathan May, Jay Priyadarshi
Abstract In this report we summarize the results of the 2017 AMR SemEval shared task. The task consisted of two separate yet related subtasks. In the parsing subtask, participants were asked to produce Abstract Meaning Representation (AMR) (Banarescu et al., 2013) graphs for a set of English sentences in the biomedical domain. In the generation subtask, participants were asked to generate English sentences given AMR graphs in the news/forum domain. A total of five sites participated in the parsing subtask, and four participated in the generation subtask. Along with a description of the task and the participants' systems, we show various score ablations and some sample outputs.
Tasks Amr Parsing, Machine Translation
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2090/
PDF https://www.aclweb.org/anthology/S17-2090
PWC https://paperswithcode.com/paper/semeval-2017-task-9-abstract-meaning
Repo
Framework

MI&T Lab at SemEval-2017 task 4: An Integrated Training Method of Word Vector for Sentiment Classification

Title MI&T Lab at SemEval-2017 task 4: An Integrated Training Method of Word Vector for Sentiment Classification
Authors Jingjing Zhao, Yan Yang, Bing Xu
Abstract A CNN method for the sentiment classification task in Task 4A of SemEval 2017 is presented. To address the slow training of word vectors with word2vec, a method of training word vectors by integrating word2vec and a Convolutional Neural Network (CNN) is proposed. This training method not only improves the training speed of word2vec, but also makes the word vectors more effective for the target task. Furthermore, word2vec adopts a full connection between the input layer and the projection layer of the Continuous Bag-of-Words (CBOW) model to acquire the semantic information of the original sentence.
Tasks Sentiment Analysis, Twitter Sentiment Analysis
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2114/
PDF https://www.aclweb.org/anthology/S17-2114
PWC https://paperswithcode.com/paper/mit-lab-at-semeval-2017-task-4-an-integrated
Repo
Framework