January 24, 2020

2732 words 13 mins read

Paper Group NANR 184



Uphill from here: Sentiment patterns in videos from left- and right-wing YouTube news channels

Title Uphill from here: Sentiment patterns in videos from left- and right-wing YouTube news channels
Authors Felix Soldner, Justin Chun-ting Ho, Mykola Makhortykh, Isabelle W.J. van der Vegt, Maximilian Mozes, Bennett Kleinberg
Abstract News consumption exhibits an increasing shift towards online sources, which bring platforms such as YouTube more into focus. As a result, politically loaded news is easier to distribute and receives more attention, but this also raises concerns about the formation of isolated ideological communities. Understanding how such news is communicated and received is becoming increasingly important. To expand our understanding in this domain, we apply a linguistic temporal trajectory analysis to analyze sentiment patterns in English-language videos from news channels on YouTube. We examine transcripts from videos distributed through eight channels with pro-left and pro-right political leanings. Using unsupervised clustering, we identify seven different sentiment patterns in the transcripts. We found that the use of two sentiment patterns differed significantly depending on political leaning. Furthermore, we used predictive models to examine how different sentiment patterns relate to video popularity and whether they differ depending on the channel's political leaning. No clear relations between sentiment patterns and popularity were found. However, when sentiment is averaged per video, results indicate that videos from pro-right news channels are more popular and that negative sentiment further increases that popularity.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-2110/
PDF https://www.aclweb.org/anthology/W19-2110
PWC https://paperswithcode.com/paper/uphill-from-here-sentiment-patterns-in-videos
Repo
Framework
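As a rough illustration of the trajectory-plus-clustering approach sketched in the abstract, the snippet below scores fixed-length segments of each transcript with a sentiment lexicon and clusters the resulting per-video curves. It is a minimal sketch, not the authors' pipeline: the VADER analyzer, the 20-segment resolution, and plain k-means are assumptions for the example (seven clusters simply echoes the paper's pattern count).

```python
# Minimal sketch: per-video sentiment trajectories clustered with k-means.
# Assumes NLTK's VADER lexicon is available (nltk.download("vader_lexicon")).
import numpy as np
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.cluster import KMeans

def sentiment_trajectory(transcript: str, n_segments: int = 20) -> np.ndarray:
    """Split a transcript into equal-sized word segments and score each one."""
    sia = SentimentIntensityAnalyzer()
    words = transcript.split()
    segments = np.array_split(words, n_segments)
    return np.array([sia.polarity_scores(" ".join(seg))["compound"] for seg in segments])

def cluster_trajectories(transcripts: list[str], n_clusters: int = 7) -> np.ndarray:
    """Cluster fixed-length sentiment curves; returns a cluster label per video."""
    curves = np.stack([sentiment_trajectory(t) for t in transcripts])
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(curves)
```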

Data Set for Stance and Sentiment Analysis from User Comments on Croatian News

Title Data Set for Stance and Sentiment Analysis from User Comments on Croatian News
Authors Mihaela Bošnjak, Mladen Karan
Abstract Nowadays it is becoming more important than ever to find new ways of extracting useful information from the ever-growing amount of user-generated data available online. In this paper, we describe the creation of a data set that contains news articles and corresponding comments from the Croatian news outlet 24 sata. Our annotation scheme is specifically tailored for the task of detecting stances and sentiment from user comments as well as assessing if commentator claims are verifiable. Through this data, we hope to get a better understanding of the public's viewpoint on various events. In addition, we also explore the potential of applying supervised machine learning models to automate annotation of more data.
Tasks Sentiment Analysis
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-3707/
PDF https://www.aclweb.org/anthology/W19-3707
PWC https://paperswithcode.com/paper/data-set-for-stance-and-sentiment-analysis
Repo
Framework

Formalism for a language agnostic language learning game and productive grid generation

Title Formalism for a language agnostic language learning game and productive grid generation
Authors Sylvain Hatier, Arnaud Bey, Mathieu Loiseau
Abstract
Tasks
Published 2019-09-01
URL https://www.aclweb.org/anthology/W19-6306/
PDF https://www.aclweb.org/anthology/W19-6306
PWC https://paperswithcode.com/paper/formalism-for-a-language-agnostic-language
Repo
Framework

A Wide-Coverage Context-Free Grammar for Icelandic and an Accompanying Parsing System

Title A Wide-Coverage Context-Free Grammar for Icelandic and an Accompanying Parsing System
Authors Vilhjálmur Þorsteinsson, Hulda Óladóttir, Hrafn Loftsson
Abstract We present an open-source, wide-coverage context-free grammar (CFG) for Icelandic, and an accompanying parsing system. The grammar has over 5,600 nonterminals, 4,600 terminals and 19,000 productions in fully expanded form, with feature agreement constraints for case, gender, number and person. The parsing system consists of an enhanced Earley-based parser and a mechanism to select best-scoring parse trees from shared packed parse forests. Our parsing system is able to parse about 90% of all sentences in articles published on the main Icelandic news websites. Preliminary evaluation with evalb shows an F-measure of 70.72% on parsed sentences. Our system demonstrates that parsing a morphologically rich language using a wide-coverage CFG can be practical.
Tasks
Published 2019-09-01
URL https://www.aclweb.org/anthology/R19-1160/
PDF https://www.aclweb.org/anthology/R19-1160
PWC https://paperswithcode.com/paper/a-wide-coverage-context-free-grammar-for
Repo
Framework
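To make the CFG-plus-chart-parsing setup concrete, here is a toy sketch using NLTK. The two-rule grammar and English sentence are invented for illustration only; the paper's grammar has thousands of symbols with feature agreement constraints, and its parser is an enhanced Earley parser rather than NLTK's generic chart parser.

```python
# Toy illustration of parsing with a hand-written CFG and a chart parser.
import nltk

grammar = nltk.CFG.fromstring("""
S   -> NP VP
NP  -> Det N | N
VP  -> V NP
Det -> 'the'
N   -> 'parser' | 'sentence'
V   -> 'reads'
""")

parser = nltk.ChartParser(grammar)  # generic chart parser; the paper uses an enhanced Earley parser
for tree in parser.parse("the parser reads the sentence".split()):
    tree.pretty_print()
```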

SDC - Stacked Dilated Convolution: A Unified Descriptor Network for Dense Matching Tasks

Title SDC - Stacked Dilated Convolution: A Unified Descriptor Network for Dense Matching Tasks
Authors René Schuster, Oliver Wasenmüller, Christian Unger, Didier Stricker
Abstract Dense pixel matching is important for many computer vision tasks such as disparity and flow estimation. We present a robust, unified descriptor network that considers a large context region with high spatial variance. Our network has a very large receptive field and avoids striding layers to maintain spatial resolution. These properties are achieved by creating a novel neural network layer that consists of multiple, parallel, stacked dilated convolutions (SDC). Several of these layers are combined to form our SDC descriptor network. In our experiments, we show that our SDC features outperform state-of-the-art feature descriptors in terms of accuracy and robustness. In addition, we demonstrate the superior performance of SDC in state-of-the-art stereo matching, optical flow and scene flow algorithms on several famous public benchmarks.
Tasks Optical Flow Estimation, Stereo Matching, Stereo Matching Hand
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Schuster_SDC_-_Stacked_Dilated_Convolution_A_Unified_Descriptor_Network_for_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Schuster_SDC_-_Stacked_Dilated_Convolution_A_Unified_Descriptor_Network_for_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/sdc-stacked-dilated-convolution-a-unified-1
Repo
Framework
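A minimal PyTorch sketch of the SDC building block described above: several dilated convolutions applied in parallel to the same input and concatenated channel-wise, with stride 1 so spatial resolution is kept. The channel counts, kernel size, dilation rates, and activation are assumptions for the example, not the configuration from the paper.

```python
import torch
import torch.nn as nn

class SDCLayer(nn.Module):
    """Parallel dilated convolutions over the same input, concatenated channel-wise.
    Stride is 1 everywhere, so spatial resolution is preserved."""
    def __init__(self, in_ch: int, out_ch_per_branch: int = 16,
                 dilations=(1, 2, 4, 8), kernel_size: int = 3):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch_per_branch, kernel_size,
                      padding=d * (kernel_size // 2), dilation=d)
            for d in dilations
        ])
        self.act = nn.ReLU()  # activation choice is illustrative

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(torch.cat([branch(x) for branch in self.branches], dim=1))

# Stacking a few SDC layers yields a descriptor network with a large receptive field.
net = nn.Sequential(SDCLayer(3), SDCLayer(64), SDCLayer(64))
features = net(torch.randn(1, 3, 128, 128))  # -> (1, 64, 128, 128)
```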

Measuring Immediate Adaptation Performance for Neural Machine Translation

Title Measuring Immediate Adaptation Performance for Neural Machine Translation
Authors Patrick Simianer, Joern Wuebker, John DeNero
Abstract Incremental domain adaptation, in which a system learns from the correct output for each input immediately after making its prediction for that input, can dramatically improve system performance for interactive machine translation. Users of interactive systems are sensitive to the speed of adaptation and how often a system repeats mistakes, despite being corrected. Adaptation is most commonly assessed using corpus-level BLEU- or TER-derived metrics that do not explicitly take adaptation speed into account. We find that these metrics often do not capture immediate adaptation effects, such as zero-shot and one-shot learning of domain-specific lexical items. To this end, we propose new metrics that directly evaluate immediate adaptation performance for machine translation. We use these metrics to choose the most suitable adaptation method from a range of different adaptation techniques for neural machine translation systems.
Tasks Domain Adaptation, Machine Translation, One-Shot Learning
Published 2019-06-01
URL https://www.aclweb.org/anthology/N19-1206/
PDF https://www.aclweb.org/anthology/N19-1206
PWC https://paperswithcode.com/paper/measuring-immediate-adaptation-performance
Repo
Framework
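In the spirit of the immediate-adaptation metrics discussed above, the sketch below scores one-shot learning of domain-specific terms: a term counts as learned if the system produces it at its second reference occurrence, i.e., the first chance after the user correction. This is a hypothetical illustration, not one of the metrics actually proposed in the paper.

```python
from collections import Counter

def one_shot_recall(outputs: list[list[str]], references: list[list[str]],
                    target_terms: set[str]) -> float:
    """Hypothetical one-shot adaptation score.
    outputs/references are token lists per segment, in the order the user saw them.
    The first reference occurrence of a term plays the role of the correction;
    the term is 'learned' if the output contains it at the second occurrence."""
    ref_counts: Counter = Counter()
    learned, evaluated = 0, 0
    for out_tokens, ref_tokens in zip(outputs, references):
        for term in target_terms:
            if term in ref_tokens:
                ref_counts[term] += 1
                if ref_counts[term] == 2:   # first reappearance after the "correction"
                    evaluated += 1
                    learned += term in out_tokens
    return learned / evaluated if evaluated else 0.0
```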

MICRON: Multigranular Interaction for Contextualizing RepresentatiON in Non-factoid Question Answering

Title MICRON: Multigranular Interaction for Contextualizing RepresentatiON in Non-factoid Question Answering
Authors Hojae Han, Seungtaek Choi, Haeju Park, Seung-won Hwang
Abstract This paper studies the problem of non-factoid question answering, where the answer may span multiple sentences. Existing solutions can be categorized into representation- and interaction-focused approaches. We combine their complementary strengths in a hybrid approach that allows multi-granular interactions but represents them at the word level, enabling easy integration with strong word-level signals. Specifically, we propose MICRON: Multigranular Interaction for Contextualizing RepresentatiON, a novel approach which derives contextualized unigram representations from n-grams. Our contributions are as follows: First, we enable multi-granular matches between question and answer n-grams. Second, by contextualizing word representations with surrounding n-grams, MICRON can naturally utilize word-based signals for query term weighting, known to be effective in information retrieval. We validate MICRON on two public non-factoid question answering datasets, WikiPassageQA and InsuranceQA, showing that it achieves state-of-the-art results among baselines with reported performance on both datasets.
Tasks Information Retrieval, Question Answering
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1601/
PDF https://www.aclweb.org/anthology/D19-1601
PWC https://paperswithcode.com/paper/micron-multigranular-interaction-for
Repo
Framework
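The idea of enriching word-level representations with surrounding n-gram context can be sketched with parallel 1D convolutions of different widths whose outputs are summed back at each word position. This is only an illustration of the general mechanism; the widths, dimensions, and aggregation are assumptions and do not reproduce the MICRON architecture.

```python
import torch
import torch.nn as nn

class NgramContextualizer(nn.Module):
    """Word-level representations enriched with n-gram context via parallel
    1D convolutions (one per n-gram width), summed position-wise."""
    def __init__(self, dim: int = 128, ngram_sizes=(1, 2, 3)):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(dim, dim, kernel_size=n, padding=n // 2) for n in ngram_sizes
        ])

    def forward(self, word_embs: torch.Tensor) -> torch.Tensor:
        # word_embs: (batch, seq_len, dim); Conv1d expects (batch, dim, seq_len)
        x = word_embs.transpose(1, 2)
        seq_len = x.size(2)
        # Even kernel widths produce one extra position; trim back to seq_len.
        out = sum(conv(x)[:, :, :seq_len] for conv in self.convs)
        return out.transpose(1, 2)  # back to (batch, seq_len, dim)

ctx = NgramContextualizer()
tokens = torch.randn(2, 10, 128)   # toy batch of word embeddings
print(ctx(tokens).shape)           # torch.Size([2, 10, 128])
```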

Vecalign: Improved Sentence Alignment in Linear Time and Space

Title Vecalign: Improved Sentence Alignment in Linear Time and Space
Authors Brian Thompson, Philipp Koehn
Abstract We introduce Vecalign, a novel bilingual sentence alignment method which is linear in time and space with respect to the number of sentences being aligned and which requires only bilingual sentence embeddings. On a standard German–French test set, Vecalign outperforms the previous state-of-the-art method (which has quadratic time complexity and requires a machine translation system) by 5 F1 points. It substantially outperforms the popular Hunalign toolkit at recovering Bible verse alignments in medium- to low-resource language pairs, and it improves downstream MT quality by 1.7 and 1.6 BLEU in Sinhala-English and Nepali-English, respectively, compared to the Hunalign-based Paracrawl pipeline.
Tasks Machine Translation, Sentence Embeddings
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1136/
PDF https://www.aclweb.org/anthology/D19-1136
PWC https://paperswithcode.com/paper/vecalign-improved-sentence-alignment-in
Repo
Framework
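The core intuition (score candidate sentence pairs by embedding similarity and recover a monotonic alignment) can be sketched with a quadratic dynamic program over cosine similarities. Vecalign itself supports many-to-many alignments and reaches linear time and space via a coarse-to-fine approximation, so the sketch below, with its assumed skip cost and pre-normalized embeddings, is illustrative only.

```python
import numpy as np

def align(src_embs: np.ndarray, tgt_embs: np.ndarray, skip_cost: float = 0.3):
    """Monotonic 1-1 sentence alignment by DP over cosine similarities.
    src_embs: (n, d), tgt_embs: (m, d); rows assumed L2-normalized."""
    sim = src_embs @ tgt_embs.T                      # cosine similarity matrix
    n, m = sim.shape
    score = np.full((n + 1, m + 1), -np.inf)
    score[0, :] = -skip_cost * np.arange(m + 1)
    score[:, 0] = -skip_cost * np.arange(n + 1)
    back = np.zeros((n + 1, m + 1), dtype=int)       # 0=match, 1=skip src, 2=skip tgt
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = (score[i - 1, j - 1] + sim[i - 1, j - 1],
                          score[i - 1, j] - skip_cost,
                          score[i, j - 1] - skip_cost)
            back[i, j] = int(np.argmax(candidates))
            score[i, j] = candidates[back[i, j]]
    # Trace back the best path and collect matched index pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if back[i, j] == 0:
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif back[i, j] == 1:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```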

Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text

Title Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text
Authors Tianwen Jiang, Tong Zhao, Bing Qin, Ting Liu, Nitesh Chawla, Meng Jiang
Abstract Conditions are essential in scientific statements. Without precisely specified conditions (e.g., equipment, environment), the facts (e.g., observations) in those statements may no longer be valid. Existing ScienceIE methods, which aim at extracting factual tuples from scientific text, do not consider the conditions. In this work, we propose a new sequence labeling framework (as well as a new tag schema) to jointly extract the fact and condition tuples from statement sentences. The framework has (1) a multi-output module to generate one or multiple tuples and (2) a multi-input module to feed in multiple types of signals as sequences. It improves F1 score relatively by 4.2% on BioNLP2013 and by 6.2% on a new bio-text dataset for tuple extraction.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1029/
PDF https://www.aclweb.org/anthology/D19-1029
PWC https://paperswithcode.com/paper/multi-input-multi-output-sequence-labeling
Repo
Framework

Cross-lingual Structure Transfer for Relation and Event Extraction

Title Cross-lingual Structure Transfer for Relation and Event Extraction
Authors Ananya Subburathinam, Di Lu, Heng Ji, Jonathan May, Shih-Fu Chang, Avirup Sil, Clare Voss
Abstract The identification of complex semantic structures such as events and entity relations, already a challenging Information Extraction task, is doubly difficult from sources written in under-resourced and under-annotated languages. We investigate the suitability of cross-lingual structure transfer techniques for these tasks. We exploit relation- and event-relevant language-universal features, leveraging both symbolic (including part-of-speech and dependency path) and distributional (including type representation and contextualized representation) information. By representing all entity mentions, event triggers, and contexts in this complex and structured multilingual common space, using graph convolutional networks, we can train a relation or event extractor from source language annotations and apply it to the target language. Extensive experiments on cross-lingual relation and event transfer among English, Chinese, and Arabic demonstrate that our approach achieves performance comparable to state-of-the-art supervised models trained on up to 3,000 manually annotated mentions: up to 62.6% F-score for Relation Extraction, and 63.1% F-score for Event Argument Role Labeling. The event argument role labeling model transferred from English to Chinese achieves performance similar to a model trained on Chinese. We thus find that language-universal symbolic and distributional representations are complementary for cross-lingual structure transfer.
Tasks Relation Extraction
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1030/
PDF https://www.aclweb.org/anthology/D19-1030
PWC https://paperswithcode.com/paper/cross-lingual-structure-transfer-for-relation
Repo
Framework
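A minimal sketch of the kind of graph convolution used to pool information over the structured common space (entity mentions, triggers, and context words connected by dependency edges). The symmetric normalization below is a generic Kipf-and-Welling-style choice assumed for the example, not the exact architecture from the paper.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (num_nodes, in_dim); adj: (num_nodes, num_nodes) with 0/1 edges.
        a_hat = adj + torch.eye(adj.size(0))          # add self-loops
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt
        return torch.relu(norm_adj @ self.linear(h))

# Toy usage: 5 nodes (e.g., tokens along a dependency path), 32-dim features.
h = torch.randn(5, 32)
adj = torch.tensor([[0, 1, 0, 0, 0],
                    [1, 0, 1, 0, 0],
                    [0, 1, 0, 1, 0],
                    [0, 0, 1, 0, 1],
                    [0, 0, 0, 1, 0]], dtype=torch.float)
print(GCNLayer(32, 16)(h, adj).shape)  # torch.Size([5, 16])
```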

You Reap What You Sow: Using Videos to Generate High Precision Object Proposals for Weakly-Supervised Object Detection

Title You Reap What You Sow: Using Videos to Generate High Precision Object Proposals for Weakly-Supervised Object Detection
Authors Krishna Kumar Singh, Yong Jae Lee
Abstract We propose a novel way of using videos to obtain high precision object proposals for weakly-supervised object detection. Existing weakly-supervised detection approaches use off-the-shelf proposal methods like edge boxes or selective search to obtain candidate boxes. These methods provide high recall but at the expense of thousands of noisy proposals. Thus, the entire burden of finding the few relevant object regions is left to the ensuing object mining step. To mitigate this issue, we focus instead on improving the precision of the initial candidate object proposals. Since we cannot rely on localization annotations, we turn to video and leverage motion cues to automatically estimate the extent of objects to train a Weakly-supervised Region Proposal Network (W-RPN). We use the W-RPN to generate high precision object proposals, which are in turn used to re-rank high recall proposals like edge boxes or selective search according to their spatial overlap. Our W-RPN proposals lead to significant improvement in performance for state-of-the-art weakly-supervised object detection approaches on PASCAL VOC 2007 and 2012.
Tasks Object Detection, Weakly Supervised Object Detection
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Singh_You_Reap_What_You_Sow_Using_Videos_to_Generate_High_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Singh_You_Reap_What_You_Sow_Using_Videos_to_Generate_High_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/you-reap-what-you-sow-using-videos-to
Repo
Framework
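The re-ranking step described above can be sketched as scoring each high-recall proposal (e.g., from edge boxes or selective search) by its best spatial overlap with any high-precision W-RPN proposal. The [x1, y1, x2, y2] box format and the use of plain IoU as the overlap measure are assumptions for the example.

```python
import numpy as np

def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def rerank_proposals(recall_boxes: np.ndarray, wrpn_boxes: np.ndarray) -> np.ndarray:
    """Sort high-recall proposals by their best overlap with any W-RPN proposal."""
    scores = np.array([iou(b, wrpn_boxes).max() for b in recall_boxes])
    return recall_boxes[np.argsort(-scores)]
```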

Lexical concreteness in narrative

Title Lexical concreteness in narrative
Authors Michael Flor, Swapna Somasundaran
Abstract This study explores the relation between lexical concreteness and narrative text quality. We present a methodology to quantitatively measure the lexical concreteness of a text. We apply it to a corpus of student stories, scored according to writing evaluation rubrics. Lexical concreteness is weakly to moderately related to story quality, depending on story type. The relation is mostly borne by adjectives and nouns, but also found for adverbs and verbs.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-3408/
PDF https://www.aclweb.org/anthology/W19-3408
PWC https://paperswithcode.com/paper/lexical-concreteness-in-narrative
Repo
Framework
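A minimal sketch of a text-level concreteness measure in the spirit of the paper: average per-word ratings from a concreteness lexicon such as the Brysbaert et al. norms. The whitespace tokenization and the toy lexicon are assumptions; the paper's methodology (including per-part-of-speech breakdowns) is not reproduced.

```python
# Minimal sketch: average lexical concreteness of a text from a word-rating lexicon.
# `concreteness` maps lowercased words to ratings (e.g., on a 1-5 scale); loading a
# real lexicon such as the Brysbaert et al. norms is left out here.
from statistics import mean

def text_concreteness(text: str, concreteness: dict[str, float]) -> float:
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    rated = [concreteness[w] for w in words if w in concreteness]
    return mean(rated) if rated else float("nan")

toy_norms = {"table": 4.9, "idea": 1.6, "story": 2.4}   # made-up example values
print(text_concreteness("The idea behind the story.", toy_norms))  # mean of 1.6 and 2.4
```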

Estimating Information Flow in DNNs

Title Estimating Information Flow in DNNs
Authors Ziv Goldfeld, Ewout van den Berg, Kristjan Greenewald, Brian Kingsbury, Igor Melnyk, Nam Nguyen, Yury Polyanskiy
Abstract We study the evolution of internal representations during deep neural network (DNN) training, aiming to demystify the compression aspect of the information bottleneck theory. The theory suggests that DNN training comprises a rapid fitting phase followed by a slower compression phase, in which the mutual information I(X;T) between the input X and internal representations T decreases. Several papers observe compression of estimated mutual information on different DNN models, but the true I(X;T) over these networks is provably either constant (discrete X) or infinite (continuous X). This work explains the discrepancy between theory and experiments, and clarifies what was actually measured by these past works. To this end, we introduce an auxiliary (noisy) DNN framework for which I(X;T) is a meaningful quantity that depends on the network’s parameters. This noisy framework is shown to be a good proxy for the original (deterministic) DNN both in terms of performance and the learned representations. We then develop a rigorous estimator for I(X;T) in noisy DNNs and observe compression in various models. By relating I(X;T) in the noisy DNN to an information-theoretic communication problem, we show that compression is driven by the progressive clustering of hidden representations of inputs from the same class. Several methods to directly monitor clustering of hidden representations, both in noisy and deterministic DNNs, are used to show that meaningful clusters form in the T space. Finally, we return to the estimator of I(X;T) employed in past works, and demonstrate that while it fails to capture the true (vacuous) mutual information, it does serve as a measure for clustering. This clarifies the past observations of compression and isolates the geometric clustering of hidden representations as the true phenomenon of interest.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=HkxOoiAcYX
PDF https://openreview.net/pdf?id=HkxOoiAcYX
PWC https://paperswithcode.com/paper/estimating-information-flow-in-dnns
Repo
Framework
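The auxiliary noisy-DNN idea can be sketched as injecting isotropic Gaussian noise into an internal representation, so that T = f(X) + Z is a stochastic map and I(X;T) is finite and parameter-dependent. The layer below, its noise level, and where it is inserted are assumptions for illustration; the paper's mutual-information estimator is not reproduced.

```python
import torch
import torch.nn as nn

class GaussianNoise(nn.Module):
    """Adds zero-mean isotropic Gaussian noise to its input during training,
    turning the internal representation T = f(X) + Z into a stochastic map
    whose mutual information with the input is finite and parameter-dependent."""
    def __init__(self, sigma: float = 0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            return x + self.sigma * torch.randn_like(x)
        return x

# A toy "noisy DNN": noise is injected after the hidden activation.
net = nn.Sequential(nn.Linear(784, 128), nn.Tanh(), GaussianNoise(0.1), nn.Linear(128, 10))
```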

Team GPLSI. Approach for automated fact checking

Title Team GPLSI. Approach for automated fact checking
Authors Aimée Alonso-Reina, Robiert Sepúlveda-Torres, Estela Saquete, Manuel Palomar
Abstract The FEVER 2.0 Shared Task is a challenge aimed at developing automated fact-checking systems. Our approach for FEVER 2.0 is based on a previous proposal developed by Team Athene UKP TU Darmstadt. Our proposal modifies the sentence retrieval phase, using statement extraction and representation in the form of triplets (subject, object, action). Triplets are extracted from the claim and compared to triplets extracted from Wikipedia articles using semantic similarity. Our results are satisfactory, but there is room for improvement.
Tasks Semantic Similarity, Semantic Textual Similarity
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-6617/
PDF https://www.aclweb.org/anthology/D19-6617
PWC https://paperswithcode.com/paper/team-gplsi-approach-for-automated-fact
Repo
Framework

Fully Unsupervised Crosslingual Semantic Textual Similarity Metric Based on BERT for Identifying Parallel Data

Title Fully Unsupervised Crosslingual Semantic Textual Similarity Metric Based on BERT for Identifying Parallel Data
Authors Chi-kiu Lo, Michel Simard
Abstract We present a fully unsupervised crosslingual semantic textual similarity (STS) metric, based on contextual embeddings extracted from BERT – Bidirectional Encoder Representations from Transformers (Devlin et al., 2019). The goal of crosslingual STS is to measure to what degree two segments of text in different languages express the same meaning. Not only is it a key task in crosslingual natural language understanding (XLU), it is also particularly useful for identifying parallel resources for training and evaluating downstream multilingual natural language processing (NLP) applications, such as machine translation. Most previous crosslingual STS methods relied heavily on existing parallel resources, thus leading to a circular dependency problem. With the advent of massively multilingual context representation models such as BERT, which are trained on the concatenation of non-parallel data from each language, we show that the deadlock around parallel resources can be broken. We perform intrinsic evaluations on crosslingual STS data sets and extrinsic evaluations on parallel corpus filtering and human translation equivalence assessment tasks. Our results show that the unsupervised crosslingual STS metric using BERT without fine-tuning achieves performance on par with supervised or weakly supervised approaches.
Tasks Machine Translation, Semantic Textual Similarity
Published 2019-11-01
URL https://www.aclweb.org/anthology/K19-1020/
PDF https://www.aclweb.org/anthology/K19-1020
PWC https://paperswithcode.com/paper/fully-unsupervised-crosslingual-semantic
Repo
Framework
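As a rough sketch of unsupervised crosslingual similarity from BERT without fine-tuning, the snippet below mean-pools contextual embeddings from a multilingual checkpoint and takes the cosine similarity of two segments. The bert-base-multilingual-cased checkpoint and mean pooling are assumptions for the example; the paper's metric is more involved than plain cosine over pooled vectors.

```python
# Sketch: unsupervised crosslingual similarity via multilingual BERT embeddings.
# Uses Hugging Face transformers; checkpoint and mean pooling are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state     # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)         # mean over real tokens

def crosslingual_similarity(src: str, tgt: str) -> float:
    return torch.cosine_similarity(embed(src), embed(tgt)).item()

print(crosslingual_similarity("The cat sleeps on the mat.", "Le chat dort sur le tapis."))
```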