October 15, 2019


Paper Group NANR 181


Getting to "Hearer-old": Charting Referring Expressions Across Time

Title Getting to "Hearer-old": Charting Referring Expressions Across Time
Authors Ieva Staliūnaitė, Hannah Rohde, Bonnie Webber, Annie Louis
Abstract When a reader is first introduced to an entity, its referring expression must describe the entity. For entities that are widely known, a single word or phrase often suffices. This paper presents the first study of how expressions that refer to the same entity develop over time. We track thousands of person and organization entities over 20 years of New York Times (NYT). As entities move from hearer-new (first introduction to the NYT audience) to hearer-old (common knowledge) status, we show empirically that the referring expressions along this trajectory depend on the type of the entity, and exhibit linguistic properties related to becoming common knowledge (e.g., shorter length, less use of appositives, more definiteness). These properties can also be used to build a model to predict how long it will take for an entity to reach hearer-old status. Our results reach 10-30% absolute improvement over a majority-class baseline.
Tasks Coreference Resolution
Published 2018-10-01
URL https://www.aclweb.org/anthology/D18-1466/
PDF https://www.aclweb.org/anthology/D18-1466
PWC https://paperswithcode.com/paper/getting-to-hearer-old-charting-referring
Repo
Framework
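
The linguistic cues named in the abstract (expression length, appositives, definiteness) can be captured with simple surface features. The sketch below is illustrative only; the feature definitions and example expressions are assumptions, not the authors' implementation.

```python
def referring_expression_features(expr: str) -> dict:
    """Toy surface features of a referring expression, loosely following
    the cues named in the abstract (length, appositives, definiteness)."""
    tokens = expr.split()
    return {
        "length": len(tokens),
        # A comma inside the expression is a crude proxy for an appositive.
        "has_appositive": "," in expr,
        # A leading definite article is a crude definiteness signal.
        "definite": tokens[0].lower() == "the" if tokens else False,
    }

# Hearer-new style: long, appositive-bearing first introduction.
new = referring_expression_features(
    "Satya Nadella, the chief executive of Microsoft")
# Hearer-old style: short, definite later mention.
old = referring_expression_features("the Microsoft chief")

print(new["length"] > old["length"])  # first mentions tend to be longer
print(new["has_appositive"], old["has_appositive"])
```

Features like these, tracked over years of mentions, are the kind of input a classifier could use to predict time-to-hearer-old status.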

Modeling Personality Traits of Filipino Twitter Users

Title Modeling Personality Traits of Filipino Twitter Users
Authors Edward Tighe, Charibeth Cheng
Abstract Recent studies in the field of text-based personality recognition experiment with different languages, feature extraction techniques, and machine learning algorithms to create better and more accurate models; however, little focus is placed on exploring the language use of a group of individuals defined by nationality. Individuals of the same nationality share certain practices and communicate certain ideas that can become embedded in their natural language. Many nationals are also not limited to speaking just one language: Filipinos, for example, speak Filipino and English, the two national languages of the Philippines. The addition of several regional/indigenous languages, along with the commonness of code-switching, allows Filipinos to have a rich vocabulary. This presents an opportunity to create a text-based personality model based on how Filipinos speak, regardless of the language they use. To do so, data was collected from 250 Filipino Twitter users. Different combinations of data processing techniques were tested to create personality models for each of the Big Five. The results for both regression and classification show that Conscientiousness is consistently the easiest trait to model, followed by Extraversion. Classification models for Agreeableness and Neuroticism had subpar performances, but performed better than those of Openness. An analysis of personality trait score representation showed that classifying extreme outliers generally produces better results for all traits except Neuroticism and Openness.
Tasks
Published 2018-06-01
URL https://www.aclweb.org/anthology/W18-1115/
PDF https://www.aclweb.org/anthology/W18-1115
PWC https://paperswithcode.com/paper/modeling-personality-traits-of-filipino
Repo
Framework

An Approach to the CLPsych 2018 Shared Task Using Top-Down Text Representation and Simple Bottom-Up Model Selection

Title An Approach to the CLPsych 2018 Shared Task Using Top-Down Text Representation and Simple Bottom-Up Model Selection
Authors Micah Iserman, Molly Ireland, Andrew Littlefield, Tyler Davis, Sage Maliepaard
Abstract
Tasks Model Selection
Published 2018-06-01
URL https://www.aclweb.org/anthology/papers/W18-0605/w18-0605
PDF https://www.aclweb.org/anthology/W18-0605
PWC https://paperswithcode.com/paper/an-approach-to-the-clpsych-2018-shared-task
Repo
Framework

MIAPARLE: Online training for the discrimination of stress contrasts

Title MIAPARLE: Online training for the discrimination of stress contrasts
Authors Jean-Philippe Goldman, Sandra Schwab
Abstract
Tasks
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1364/
PDF https://www.aclweb.org/anthology/L18-1364
PWC https://paperswithcode.com/paper/miaparle-online-training-for-the
Repo
Framework

Limbic: Author-Based Sentiment Aspect Modeling Regularized with Word Embeddings and Discourse Relations

Title Limbic: Author-Based Sentiment Aspect Modeling Regularized with Word Embeddings and Discourse Relations
Authors Zhe Zhang, Munindar Singh
Abstract We propose Limbic, an unsupervised probabilistic model that addresses the problem of discovering aspects and sentiments and associating them with authors of opinionated texts. Limbic combines three ideas, incorporating authors, discourse relations, and word embeddings. For discourse relations, Limbic adopts a generative process regularized by a Markov Random Field. To promote words with high semantic similarity into the same topic, Limbic captures semantic regularities from word embeddings via a generalized Pólya Urn process. We demonstrate that Limbic (1) discovers aspects associated with sentiments with high lexical diversity, and (2) outperforms state-of-the-art models by a substantial margin in topic cohesion and sentiment classification.
Tasks Semantic Similarity, Semantic Textual Similarity, Sentiment Analysis, Topic Models, Word Embeddings
Published 2018-10-01
URL https://www.aclweb.org/anthology/D18-1378/
PDF https://www.aclweb.org/anthology/D18-1378
PWC https://paperswithcode.com/paper/limbic-author-based-sentiment-aspect-modeling
Repo
Framework

Infant Word Comprehension-to-Production Index Applied to Investigation of Noun Learning Predominance Using Cross-lingual CDI database

Title Infant Word Comprehension-to-Production Index Applied to Investigation of Noun Learning Predominance Using Cross-lingual CDI database
Authors Yasuhiro Minami, Tessei Kobayashi, Yuko Okumura
Abstract
Tasks
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1362/
PDF https://www.aclweb.org/anthology/L18-1362
PWC https://paperswithcode.com/paper/infant-word-comprehension-to-production-index
Repo
Framework

Automata Guided Hierarchical Reinforcement Learning for Zero-Shot Skill Composition

Title Automata Guided Hierarchical Reinforcement Learning for Zero-Shot Skill Composition
Authors Xiao Li, Yao Ma, Calin Belta
Abstract An obstacle that prevents the wide adoption of (deep) reinforcement learning (RL) in control systems is its need for a large number of interactions with the environment in order to master a skill. The learned skill usually generalizes poorly across domains, and re-training is often necessary when presented with a new task. We present a framework that combines techniques in formal methods with hierarchical reinforcement learning (HRL). The set of techniques we provide allows for the convenient specification of tasks with logical expressions, learns hierarchical policies (a meta-controller and low-level controllers) with well-defined intrinsic rewards using any RL method, and is able to construct new skills from existing ones without additional learning. We evaluate the proposed methods in a simple grid world simulation as well as in simulation on a Baxter robot.
Tasks Hierarchical Reinforcement Learning
Published 2018-01-01
URL https://openreview.net/forum?id=BJgVaG-Ab
PDF https://openreview.net/pdf?id=BJgVaG-Ab
PWC https://paperswithcode.com/paper/automata-guided-hierarchical-reinforcement-1
Repo
Framework
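
The core idea of specifying tasks with logical expressions and rewarding progress can be sketched as a finite-state automaton that tracks task completion and pays an intrinsic reward on each transition. The task ("fetch the key, then open the door"), the event names, and the reward values below are illustrative assumptions, not the paper's construction.

```python
# Hypothetical task "fetch the key, then open the door", encoded as a DFA.
# Transitions fire on high-level events emitted by low-level controllers.
TRANSITIONS = {
    ("start", "got_key"): "has_key",
    ("has_key", "opened_door"): "done",
}

def step(state, event):
    """Advance the automaton; intrinsic reward 1.0 for progress, else 0.0."""
    nxt = TRANSITIONS.get((state, event), state)
    return nxt, (1.0 if nxt != state else 0.0)

state, total = "start", 0.0
for event in ["bumped_wall", "got_key", "got_key", "opened_door"]:
    state, r = step(state, event)
    total += r

print(state, total)  # -> done 2.0
```

Because the automaton is separate from the low-level controllers, two such task automata can in principle be composed (e.g., by a product construction) without retraining the skills, which is the zero-shot composition the title refers to.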

Aggressive language in an online hacking forum

Title Aggressive language in an online hacking forum
Authors Andrew Caines, Sergio Pastrana, Alice Hutchings, Paula Buttery
Abstract We probe the heterogeneity in levels of abusive language in different sections of the Internet, using an annotated corpus of Wikipedia page edit comments to train a binary classifier for abuse detection. Our test data come from the CrimeBB Corpus of hacking-related forum posts, and we find that (a) forum interactions are rarely abusive, (b) the abusive language which does exist tends to be relatively mild compared to that found in the Wikipedia comments domain, and (c) it tends to involve aggressive posturing rather than hate speech or threats of violence. We observe that the purpose of conversations in online forums tends to be more constructive and informative than in Wikipedia page edit comments, which are geared more towards adversarial interactions, and that this may explain the lower levels of abuse found in our forum data. Further work remains to compare these results with other inter-domain classification experiments, and to understand the impact of aggressive language in forum conversations.
Tasks Abuse Detection
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-5109/
PDF https://www.aclweb.org/anthology/W18-5109
PWC https://paperswithcode.com/paper/aggressive-language-in-an-online-hacking
Repo
Framework
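
A binary abuse classifier of the kind the abstract describes can be sketched with a bag-of-words Naive Bayes model. The toy documents and the choice of Naive Bayes below are assumptions for illustration; the paper's actual classifier and training data (Wikipedia edit comments) are not reproduced here.

```python
from collections import Counter
from math import log

def train_nb(docs, labels):
    """Multinomial Naive Bayes with add-one smoothing over token counts."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter(labels)
    for doc, y in zip(docs, labels):
        counts[y].update(doc.lower().split())
    vocab = set(counts[0]) | set(counts[1])
    return counts, priors, vocab

def predict(model, doc):
    """Pick the class with the higher smoothed log-likelihood."""
    counts, priors, vocab = model
    scores = {}
    for y in (0, 1):
        total = sum(counts[y].values())
        s = log(priors[y] / sum(priors.values()))
        for tok in doc.lower().split():
            s += log((counts[y][tok] + 1) / (total + len(vocab)))
        scores[y] = s
    return max(scores, key=scores.get)

# Toy training data standing in for annotated comments (1 = abusive).
docs = ["you are an idiot", "thanks for the fix", "idiot go away",
        "nice edit thanks", "you fool"]
labels = [1, 0, 1, 0, 1]
model = train_nb(docs, labels)
print(predict(model, "you idiot"))          # -> 1
print(predict(model, "thanks for the edit"))  # -> 0
```

Training on one domain (Wikipedia comments) and testing on another (forum posts), as the paper does, is exactly where such simple lexical models tend to degrade, which motivates the inter-domain comparisons the authors call for.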

Sanskrit Sandhi Splitting using seq2(seq)2

Title Sanskrit Sandhi Splitting using seq2(seq)2
Authors Rahul Aralikatte, Neelamadhav Gantayat, Naveen Panwar, Anush Sankaran, Senthil Mani
Abstract In Sanskrit, small words (morphemes) are combined to form compound words through a process known as Sandhi. Sandhi splitting is the process of splitting a given compound word into its constituent morphemes. Although rules governing word splitting exist in the language, it is highly challenging to identify the location of the splits in a compound word. Though existing Sandhi splitting systems incorporate these pre-defined splitting rules, they have low accuracy, as the same compound word might be broken down in multiple ways that all yield syntactically correct splits. In this research, we propose a novel deep learning architecture called Double Decoder RNN (DD-RNN), which (i) predicts the location of the split(s) with 95% accuracy, and (ii) predicts the constituent words (learning the Sandhi splitting rules) with 79.5% accuracy, outperforming the state-of-the-art by 20%. Additionally, we show the generalization capability of our deep learning model with competitive results on the problem of Chinese word segmentation.
Tasks Chinese Word Segmentation
Published 2018-10-01
URL https://www.aclweb.org/anthology/D18-1530/
PDF https://www.aclweb.org/anthology/D18-1530
PWC https://paperswithcode.com/paper/sanskrit-sandhi-splitting-using-seq2seq2
Repo
Framework
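
The ambiguity that makes rule-based splitting hard can be illustrated by enumerating split points whose halves are both known morphemes. The toy vocabulary and compound below are invented for illustration, and real Sandhi additionally rewrites sounds at the boundary, which the sketch ignores.

```python
def candidate_splits(compound, vocab):
    """Enumerate binary splits whose halves are both known morphemes.
    A deliberately simplified view: real Sandhi also rewrites sounds
    at the boundary, which multiplies the candidates further."""
    return [(compound[:i], compound[i:])
            for i in range(1, len(compound))
            if compound[:i] in vocab and compound[i:] in vocab]

# Toy vocabulary; two splits survive the rules, mirroring the ambiguity
# the abstract describes.
vocab = {"rama", "laya", "ramal", "aya"}
print(candidate_splits("ramalaya", vocab))
# -> [('rama', 'laya'), ('ramal', 'aya')]
```

Since rules alone cannot decide between such candidates, the paper learns the choice from data: one decoder predicts where to split and the other predicts what the constituent words are.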

Learning and Using the Arrow of Time

Title Learning and Using the Arrow of Time
Authors Donglai Wei, Joseph J. Lim, Andrew Zisserman, William T. Freeman
Abstract We seek to understand the arrow of time in videos: what makes videos look like they are playing forwards or backwards? Can we visualize the cues? Can the arrow of time be a supervisory signal useful for activity analysis? To this end, we build three large-scale video datasets and apply a learning-based approach to these tasks. To learn the arrow of time efficiently and reliably, we design a ConvNet suitable for extended temporal footprints and for class activation visualization, and study the effect of artificial cues, such as cinematographic conventions, on learning. Our trained model achieves state-of-the-art performance on large-scale real-world video datasets. Through cluster analysis and localization of important regions for the prediction, we examine learned visual cues that are consistent among many samples and show when and where they occur. Lastly, we use the trained ConvNet for two applications: self-supervision for action recognition, and video forensics, i.e., determining whether Hollywood film clips have been deliberately reversed in time, a manipulation often used for special effects.
Tasks Temporal Action Localization
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Wei_Learning_and_Using_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Wei_Learning_and_Using_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/learning-and-using-the-arrow-of-time
Repo
Framework

Read and Comprehend by Gated-Attention Reader with More Belief

Title Read and Comprehend by Gated-Attention Reader with More Belief
Authors Haohui Deng, Yik-Cheung Tam
Abstract Gated-Attention (GA) Reader has been effective for reading comprehension. GA Reader makes two assumptions: (1) a uni-directional attention that uses an input query to gate token encodings of a document; (2) only the encoding at the cloze position of an input query is considered for answer prediction. In this paper, we propose Collaborative Gating (CG) and Self-Belief Aggregation (SBA) to address these assumptions respectively. In CG, we first use an input document to gate token encodings of an input query so that the influence of irrelevant query tokens may be reduced. Then the filtered query is used to gate token encodings of the document in a collaborative fashion. In SBA, we conjecture that query tokens other than the cloze token may be informative for answer prediction. We apply self-attention to link the cloze token with other tokens in a query so that the importance of query tokens with respect to the cloze position is weighted. Then their evidences are weighted, propagated, and aggregated for better reading comprehension. Experiments show that our approaches advance the state-of-the-art results on the CNN, Daily Mail, and Who Did What public test sets.
Tasks Reading Comprehension, Word Alignment
Published 2018-06-01
URL https://www.aclweb.org/anthology/N18-4012/
PDF https://www.aclweb.org/anthology/N18-4012
PWC https://paperswithcode.com/paper/read-and-comprehend-by-gated-attention-reader
Repo
Framework
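
The gating idea, i.e., attending from each target vector over context vectors and multiplying the target elementwise by the attended mix, can be sketched in a few lines. The tiny vectors and the specific gating function below are assumptions chosen to show the mechanism, not the paper's architecture.

```python
from math import exp

def softmax(xs):
    m = max(xs)
    es = [exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def gate(target, context):
    """Gate each target vector by an attention-weighted mix of context
    vectors (elementwise product): one reading of GA-style gating."""
    gated = []
    for t in target:
        weights = softmax([sum(a * b for a, b in zip(t, c)) for c in context])
        mix = [sum(w * c[i] for w, c in zip(weights, context))
               for i in range(len(t))]
        gated.append([a * b for a, b in zip(t, mix)])
    return gated

# Collaborative gating sketch: the document filters the query first,
# then the filtered query gates the document.
doc = [[1.0, 0.0], [0.0, 1.0]]
query = [[1.0, 1.0]]
filtered_query = gate(query, doc)
gated_doc = gate(doc, filtered_query)
print(gated_doc)
```

The two-pass ordering (document gates query, then filtered query gates document) is the collaborative step that distinguishes CG from the original uni-directional GA gating.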

Chrono at SemEval-2018 Task 6: A System for Normalizing Temporal Expressions

Title Chrono at SemEval-2018 Task 6: A System for Normalizing Temporal Expressions
Authors Amy Olex, Luke Maffey, Nicholas Morgan, Bridget McInnes
Abstract Temporal information extraction is a challenging task. Here we describe Chrono, a hybrid rule-based and machine learning system that identifies temporal expressions in text and normalizes them into the SCATE schema. After minor parsing logic adjustments, Chrono has emerged as the top performing system for SemEval 2018 Task 6: Parsing Time Normalizations.
Tasks Temporal Information Extraction
Published 2018-06-01
URL https://www.aclweb.org/anthology/S18-1012/
PDF https://www.aclweb.org/anthology/S18-1012
PWC https://paperswithcode.com/paper/chrono-at-semeval-2018-task-6-a-system-for
Repo
Framework
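
A minimal example of the rule-based half of such a hybrid system is a regex that finds one date shape and normalizes it to ISO form. The single pattern below is a toy assumption; Chrono itself targets the much richer SCATE schema and many more expression types.

```python
import re
from datetime import datetime

MONTHS = ("January|February|March|April|May|June|July|August"
          "|September|October|November|December")
PATTERN = re.compile(rf"({MONTHS}) (\d{{1,2}}), (\d{{4}})")

def normalize_dates(text):
    """Find 'Month D, YYYY' expressions and normalize them to ISO dates.
    A toy rule in the spirit of rule-based temporal normalization."""
    out = []
    for m in PATTERN.finditer(text):
        dt = datetime.strptime(" ".join(m.groups()), "%B %d %Y")
        out.append((m.group(0), dt.date().isoformat()))
    return out

print(normalize_dates("The task ran on June 5, 2018 and ended July 12, 2018."))
# -> [('June 5, 2018', '2018-06-05'), ('July 12, 2018', '2018-07-12')]
```

Rules like this handle the regular cases; the machine-learned component in a hybrid system then covers expressions too irregular or context-dependent for patterns.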

Treat the system like a human student: Automatic naturalness evaluation of generated text without reference texts

Title Treat the system like a human student: Automatic naturalness evaluation of generated text without reference texts
Authors Isabel Groves, Ye Tian, Ioannis Douratsos
Abstract The currently most popular method for automatic Natural Language Generation (NLG) evaluation is comparing generated text with human-written reference sentences using a metrics system, which has drawbacks in reliability and scalability. We draw inspiration from second language (L2) assessment and extract a set of linguistic features to predict human judgments of sentence naturalness. Our experiment using a small dataset showed that the feature-based approach yields promising results, with the added potential of offering interpretable insight into the source of the problems.
Tasks Image Captioning, Machine Translation, Text Generation, Text Summarization
Published 2018-11-01
URL https://www.aclweb.org/anthology/W18-6512/
PDF https://www.aclweb.org/anthology/W18-6512
PWC https://paperswithcode.com/paper/treat-the-system-like-a-human-student
Repo
Framework
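
The paper's reference-free approach rests on shallow linguistic features of the generated sentence itself. The particular features below (length, type-token ratio, repeated bigrams) are plausible examples of L2-assessment-style cues, not the paper's actual feature set.

```python
def naturalness_features(sentence):
    """A few shallow, L2-assessment-style cues; illustrative only."""
    tokens = sentence.lower().split()
    types = set(tokens)
    return {
        "num_tokens": len(tokens),
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
        "mean_word_len": sum(map(len, tokens)) / len(tokens) if tokens else 0.0,
        # NLG systems sometimes stutter; a repeated adjacent bigram is a
        # cheap unnaturalness signal.
        "repeated_bigram": any(
            tokens[i:i + 2] == tokens[i + 2:i + 4]
            for i in range(len(tokens) - 3)),
    }

print(naturalness_features("the cat the cat sat on the mat"))
print(naturalness_features("the quick brown fox jumps over the lazy dog"))
```

A regressor trained on such features against human naturalness ratings needs no reference sentences at all, which is the scalability advantage the abstract claims over reference-based metrics.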

HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization

Title HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization
Authors Bin Zhao, Xuelong Li, Xiaoqiang Lu
Abstract Although video summarization has achieved great success in recent years, few approaches have recognized the influence of video structure on the summarization results. Video data follow a hierarchical structure: a video is composed of shots, and a shot is composed of several frames. Generally, shots provide the activity-level information people need to understand the video content. However, few existing summarization approaches pay attention to the shot segmentation procedure. They generate shots by trivial strategies, such as fixed-length segmentation, which may destroy the underlying hierarchical structure of the video data and further reduce the quality of the generated summaries. To address this problem, we propose a structure-adaptive video summarization approach that integrates shot segmentation and video summarization into a Hierarchical Structure-Adaptive RNN, denoted HSA-RNN. We evaluate the proposed approach on four popular datasets, i.e., SumMe, TVsum, CoSum and VTW. The experimental results demonstrate the effectiveness of HSA-RNN on the video summarization task.
Tasks Video Summarization
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Zhao_HSA-RNN_Hierarchical_Structure-Adaptive_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhao_HSA-RNN_Hierarchical_Structure-Adaptive_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/hsa-rnn-hierarchical-structure-adaptive-rnn
Repo
Framework

A Bird’s-eye View of Language Processing Projects at the Romanian Academy

Title A Bird’s-eye View of Language Processing Projects at the Romanian Academy
Authors Dan Tufiș, Dan Cristea
Abstract
Tasks Autonomous Vehicles, Keyword Spotting, Transliteration
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1388/
PDF https://www.aclweb.org/anthology/L18-1388
PWC https://paperswithcode.com/paper/a-birdas-eye-view-of-language-processing
Repo
Framework