October 15, 2019

Paper Group NANR 225

Efficient Sequence Learning with Group Recurrent Networks

Title Efficient Sequence Learning with Group Recurrent Networks
Authors Fei Gao, Lijun Wu, Li Zhao, Tao Qin, Xueqi Cheng, Tie-Yan Liu
Abstract Recurrent neural networks have achieved state-of-the-art results in many artificial intelligence tasks, such as language modeling, neural machine translation, and speech recognition. One of the key factors in these successes is big models. However, training such big models usually takes days or even weeks, even when using tens of GPU cards. In this paper, we propose an efficient architecture to improve the efficiency of such RNN model training, which adopts the group strategy for recurrent layers, while exploiting the representation rearrangement strategy between layers as well as time steps. To demonstrate the advantages of our models, we conduct experiments on several datasets and tasks. The results show that our architecture achieves comparable or better accuracy compared with baselines, with a much smaller number of parameters and at a much lower computational cost.
Tasks Language Modelling, Machine Translation, Speech Recognition
Published 2018-06-01
URL https://www.aclweb.org/anthology/N18-1073/
PDF https://www.aclweb.org/anthology/N18-1073
PWC https://paperswithcode.com/paper/efficient-sequence-learning-with-group
Repo
Framework
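
The abstract above mentions a group strategy for recurrent layers and a representation rearrangement between layers and time steps, without giving formulas. A minimal NumPy sketch of the general idea follows, under stated assumptions: the plain tanh cell, the function names (`group_rnn_step`, `rearrange`, `param_count`), and the transpose-based shuffle are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def group_rnn_step(x, h, W_x, W_h, b):
    """One tanh-RNN step in which each group only sees its own slice.

    x:   (groups, in_per_group)   input split into groups
    h:   (groups, hid_per_group)  previous hidden state split into groups
    W_x: (groups, in_per_group, hid_per_group)
    W_h: (groups, hid_per_group, hid_per_group)
    b:   (groups, hid_per_group)
    """
    return np.tanh(
        np.einsum("gi,gih->gh", x, W_x)
        + np.einsum("gj,gjh->gh", h, W_h)
        + b
    )

def rearrange(h):
    """Hypothetical rearrangement: interleave units so that each new group
    holds units drawn from every old group (transpose, then reshape back)."""
    groups, per_group = h.shape
    return h.T.reshape(groups, per_group)

def param_count(input_size, hidden_size, groups):
    """Parameters of the grouped cell above (weights plus biases)."""
    in_g, hid_g = input_size // groups, hidden_size // groups
    return groups * (in_g * hid_g + hid_g * hid_g + hid_g)

if __name__ == "__main__":
    print(param_count(8, 8, groups=1))  # ungrouped cell
    print(param_count(8, 8, groups=2))  # two groups: roughly half the weights
    print(rearrange(np.arange(8).reshape(2, 4)))
```

In this sketch the weight count shrinks roughly by the number of groups, and the rearrangement is what lets information cross group boundaries from one layer or time step to the next.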

Multi-label Learning for Large Text Corpora using Latent Variable Model with Provable Gurantees

Title Multi-label Learning for Large Text Corpora using Latent Variable Model with Provable Gurantees
Authors Sayantan Dasgupta
Abstract Here we study the problem of learning labels for large text corpora where each document can be assigned a variable number of labels. The problem is trivial when the label dimensionality is small and can be easily solved by a series of one-vs-all classifiers. However, as the label dimensionality increases, the parameter space of such one-vs-all classifiers becomes extremely large and outstrips the memory. Here we propose a latent variable model to reduce the size of the parameter space, but still efficiently learn the labels. We learn the model using spectral learning and show how to extract the parameters using only three passes through the training dataset. Further, we analyse the sample complexity of our model using PAC learning theory and then demonstrate the performance of our algorithm on several benchmark datasets in comparison with existing algorithms.
Tasks Multi-Label Learning
Published 2018-01-01
URL https://openreview.net/forum?id=HJ_X8GupW
PDF https://openreview.net/pdf?id=HJ_X8GupW
PWC https://paperswithcode.com/paper/multi-label-learning-for-large-text-corpora
Repo
Framework

The ACL Anthology: Current State and Future Directions

Title The ACL Anthology: Current State and Future Directions
Authors Daniel Gildea, Min-Yen Kan, Nitin Madnani, Christoph Teichmann, Martin Villalba
Abstract
Tasks
Published 2018-07-01
URL https://www.aclweb.org/anthology/papers/W18-2504/w18-2504
PDF https://www.aclweb.org/anthology/W18-2504v2
PWC https://paperswithcode.com/paper/the-acl-anthology-current-state-and-future
Repo
Framework

Combination of Supervised and Reinforcement Learning For Vision-Based Autonomous Control

Title Combination of Supervised and Reinforcement Learning For Vision-Based Autonomous Control
Authors Dmitry Kangin, Nicolas Pugeault
Abstract Reinforcement learning methods have recently achieved impressive results on a wide range of control problems. However, especially with complex inputs, they still require an extensive amount of training data in order to converge to a meaningful solution. This limitation largely prohibits their usage for complex input spaces such as video signals, and it is still impossible to use them for a number of complex problems in real-world environments, including many of those for video-based control. Supervised learning, by contrast, is capable of learning from a relatively small number of samples; however, it does not take into account reward-based control policies and is not capable of providing independent control policies. In this article we propose a model-free control method, which uses a combination of reinforcement and supervised learning for autonomous control and paves the way towards policy-based control in real-world environments. We use the SpeedDreams/TORCS video game to demonstrate that our approach requires far fewer samples (hundreds of thousands against millions or tens of millions) compared to state-of-the-art reinforcement learning techniques on similar data, and at the same time outperforms both supervised and reinforcement learning approaches in terms of quality. Additionally, we demonstrate the applicability of the method to MuJoCo control problems.
Tasks
Published 2018-01-01
URL https://openreview.net/forum?id=BkeC_J-R-
PDF https://openreview.net/pdf?id=BkeC_J-R-
PWC https://paperswithcode.com/paper/combination-of-supervised-and-reinforcement
Repo
Framework

Interactional Stancetaking in Online Forums

Title Interactional Stancetaking in Online Forums
Authors Scott F. Kiesling, Umashanthi Pavalanathan, Jim Fitzpatrick, Xiaochuang Han, Jacob Eisenstein
Abstract Language is shaped by the relationships between the speaker/writer and the audience, the object of discussion, and the talk itself. In turn, language is used to reshape these relationships over the course of an interaction. Computational researchers have succeeded in operationalizing sentiment, formality, and politeness, but each of these constructs captures only some aspects of social and relational meaning. Theories of interactional stancetaking have been put forward as holistic accounts, but until now, these theories have been applied only through detailed qualitative analysis of (portions of) a few individual conversations. In this article, we propose a new computational operationalization of interpersonal stancetaking. We begin with annotations of three linked stance dimensions (affect, investment, and alignment) on 68 conversation threads from the online platform Reddit. Using these annotations, we investigate thread structure and linguistic properties of stancetaking in online conversations. We identify lexical features that characterize the extremes along each stancetaking dimension, and show that these stancetaking properties can be predicted with moderate accuracy from bag-of-words features, even with a relatively small labeled training set. These quantitative analyses are supplemented by extensive qualitative analysis, highlighting the compatibility of computational and qualitative methods in synthesizing evidence about the creation of interactional meaning.
Tasks
Published 2018-12-01
URL https://www.aclweb.org/anthology/J18-4007/
PDF https://www.aclweb.org/anthology/J18-4007
PWC https://paperswithcode.com/paper/interactional-stancetaking-in-online-forums
Repo
Framework

The Data Challenge in Misinformation Detection: Source Reputation vs. Content Veracity

Title The Data Challenge in Misinformation Detection: Source Reputation vs. Content Veracity
Authors Fatemeh Torabi Asr, Maite Taboada
Abstract Misinformation detection at the level of full news articles is a text classification problem. Reliably labeled data in this domain is rare. Previous work relied on news articles collected from so-called "reputable" and "suspicious" websites and labeled accordingly. We leverage fact-checking websites to collect individually-labeled news articles with regard to the veracity of their content and use this data to test the cross-domain generalization of a classifier trained on bigger text collections but labeled according to source reputation. Our results suggest that reputation-based classification is not sufficient for predicting the veracity level of the majority of news articles, and that the system performance on different test datasets depends on topic distribution. Therefore collecting well-balanced and carefully-assessed training data is a priority for developing robust misinformation detection systems.
Tasks Domain Generalization, Fake News Detection, Text Classification
Published 2018-11-01
URL https://www.aclweb.org/anthology/W18-5502/
PDF https://www.aclweb.org/anthology/W18-5502
PWC https://paperswithcode.com/paper/the-data-challenge-in-misinformation
Repo
Framework

A POS Tagging Model Adapted to Learner English

Title A POS Tagging Model Adapted to Learner English
Authors Ryo Nagata, Tomoya Mizumoto, Yuta Kikuchi, Yoshifumi Kawasaki, Kotaro Funakoshi
Abstract There has been very limited work on the adaptation of Part-Of-Speech (POS) tagging to learner English despite the fact that POS tagging is widely used in related tasks. In this paper, we explore how we can adapt POS tagging to learner English efficiently and effectively. Based on the discussion of possible causes of POS tagging errors in learner English, we show that deep neural models are particularly suitable for this. Considering the previous findings and the discussion, we introduce the design of our model based on bidirectional Long Short-Term Memory. In addition, we describe how to adapt it to a wide variety of native languages (potentially, hundreds of them). In the evaluation section, we empirically show that it is effective for POS tagging in learner English, achieving an accuracy of 0.964, which significantly outperforms the state-of-the-art POS-tagger. We further investigate the tagging results in detail, revealing which part of the model design does or does not improve the performance.
Tasks Grammatical Error Correction, Part-Of-Speech Tagging
Published 2018-11-01
URL https://www.aclweb.org/anthology/W18-6106/
PDF https://www.aclweb.org/anthology/W18-6106
PWC https://paperswithcode.com/paper/a-pos-tagging-model-adapted-to-learner
Repo
Framework

A Joint Model of Conversational Discourse Latent Topics on Microblogs

Title A Joint Model of Conversational Discourse Latent Topics on Microblogs
Authors Jing Li, Yan Song, Zhongyu Wei, Kam-Fai Wong
Abstract Conventional topic models are ineffective for topic extraction from microblog messages, because the data sparseness exhibited in short messages lacking structure and contexts results in poor message-level word co-occurrence patterns. To address this issue, we organize microblog messages as conversation trees based on their reposting and replying relations, and propose an unsupervised model that jointly learns word distributions to represent: (1) different roles of conversational discourse, and (2) various latent topics in reflecting content information. By explicitly distinguishing the probabilities of messages with varying discourse roles in containing topical words, our model is able to discover clusters of discourse words that are indicative of topical content. In an automatic evaluation on large-scale microblog corpora, our joint model yields topics with better coherence scores than competitive topic models from previous studies. Qualitative analysis on model outputs indicates that our model induces meaningful representations for both discourse and topics. We further present an empirical study on microblog summarization based on the outputs of our joint model. The results show that the jointly modeled discourse and topic representations can effectively indicate summary-worthy content in microblog conversations.
Tasks Topic Models
Published 2018-12-01
URL https://www.aclweb.org/anthology/J18-4008/
PDF https://www.aclweb.org/anthology/J18-4008
PWC https://paperswithcode.com/paper/a-joint-model-of-conversational-discourse
Repo
Framework

Deep Lipschitz networks and Dudley GANs

Title Deep Lipschitz networks and Dudley GANs
Authors Ehsan Abbasnejad, Javen Shi, Anton van den Hengel
Abstract Generative adversarial networks (GANs) have enjoyed great success, but they often suffer from instability during training, which has motivated many attempts to resolve this issue. A theoretical explanation for the cause of instability is provided in Wasserstein GAN (WGAN), and the Wasserstein distance is proposed to stabilize the training. Though WGAN is indeed more stable than previous GANs, it takes many more iterations and much more time to train. This is because the ways to ensure the Lipschitz condition in WGAN (such as weight clipping) significantly limit the capacity of the network. In this paper, we argue that it is beneficial to ensure the Lipschitz condition as well as maintain sufficient capacity and expressiveness of the network. To facilitate this, we develop both theoretical and practical building blocks, with which one can construct different neural networks using a large range of metrics, as well as ensure the Lipschitz condition and sufficient capacity of the networks. Using the proposed building blocks, and a special choice of metric called the Dudley metric, we propose Dudley GAN, which outperforms the state of the art in both convergence and sample quality. We discover a natural link between Dudley GAN (and its extension) and empirical risk minimization, which gives rise to generalization analysis.
Tasks
Published 2018-01-01
URL https://openreview.net/forum?id=rkw-jlb0W
PDF https://openreview.net/pdf?id=rkw-jlb0W
PWC https://paperswithcode.com/paper/deep-lipschitz-networks-and-dudley-gans
Repo
Framework

What makes us laugh? Investigations into Automatic Humor Classification

Title What makes us laugh? Investigations into Automatic Humor Classification
Authors Vikram Ahuja, Taradheesh Bali, Navjyoti Singh
Abstract Most scholarly works in the field of computational detection of humour derive their inspiration from the incongruity theory. Incongruity is an indispensable facet in drawing a line between humorous and non-humorous occurrences but is immensely inadequate in shedding light on what actually made the particular occurrence a funny one. Classical theories like Script-based Semantic Theory of Humour and General Verbal Theory of Humour try and achieve this feat to an adequate extent. In this paper we adhere to a more holistic approach towards classification of humour based on these classical theories with a few improvements and revisions. Through experiments based on our linear approach and performed on large data-sets of jokes, we are able to demonstrate the adaptability and show componentizability of our model, and that a host of classification techniques can be used to overcome the challenging problem of distinguishing between various categories and sub-categories of jokes.
Tasks Humor Detection
Published 2018-06-01
URL https://www.aclweb.org/anthology/W18-1101/
PDF https://www.aclweb.org/anthology/W18-1101
PWC https://paperswithcode.com/paper/what-makes-us-laugh-investigations-into
Repo
Framework

We Usually Don’t Like Going to the Dentist: Using Common Sense to Detect Irony on Twitter

Title We Usually Don’t Like Going to the Dentist: Using Common Sense to Detect Irony on Twitter
Authors Cynthia Van Hee, Els Lefever, Véronique Hoste
Abstract Although common sense and connotative knowledge come naturally to most people, computers still struggle to perform well on tasks for which such extratextual information is required. Automatic approaches to sentiment analysis and irony detection have revealed that the lack of such world knowledge undermines classification performance. In this article, we therefore address the challenge of modeling implicit or prototypical sentiment in the framework of automatic irony detection. Starting from manually annotated connoted situation phrases (e.g., "flight delays," "sitting the whole day at the doctor's office"), we defined the implicit sentiment held towards such situations automatically by using both a lexico-semantic knowledge base and a data-driven method. We further investigate how such implicit sentiment information affects irony detection by assessing a state-of-the-art irony classifier before and after it is informed with implicit sentiment information.
Tasks Common Sense Reasoning, Sentiment Analysis
Published 2018-12-01
URL https://www.aclweb.org/anthology/J18-4010/
PDF https://www.aclweb.org/anthology/J18-4010
PWC https://paperswithcode.com/paper/we-usually-dont-like-going-to-the-dentist
Repo
Framework

Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos

Title Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos
Authors Hsien-Tzu Cheng, Chun-Hung Chao, Jin-Dong Dong, Hao-Kai Wen, Tyng-Luh Liu, Min Sun
Abstract Automatic saliency prediction in 360° videos is critical for viewpoint guidance applications (e.g., Facebook 360 Guide). We propose a spatial-temporal network which is (1) unsupervisedly trained and (2) tailor-made for the 360° viewing sphere. Note that most existing methods are less scalable since they rely on annotated saliency maps for training. Most importantly, they convert the 360° sphere to 2D images (e.g., a single equirectangular image or multiple separate Normal Field-of-View (NFoV) images), which introduces distortion and image boundaries. In contrast, we propose a simple and effective Cube Padding (CP) technique as follows. Firstly, we render the 360° view on six faces of a cube using perspective projection. Thus, it introduces very little distortion. Then, we concatenate all six faces while utilizing the connectivity between faces on the cube for image padding (i.e., Cube Padding) in convolution, pooling, and convolutional LSTM layers. In this way, CP introduces no image boundary while being applicable to almost all Convolutional Neural Network (CNN) structures. To evaluate our method, we propose Wild-360, a new 360° video saliency dataset, containing challenging videos with saliency heatmap annotations. In experiments, our method outperforms all baseline methods in both speed and quality.
Tasks Saliency Prediction
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Cheng_Cube_Padding_for_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Cheng_Cube_Padding_for_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/cube-padding-for-weakly-supervised-saliency-1
Repo
Framework
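
The abstract above describes padding each cube face with pixels from its neighbouring faces instead of zeros. A simplified NumPy sketch of that idea follows; it is not the paper's code. It assumes the four side faces are stored in the order front, right, back, left with a consistent left-to-right orientation, and it only pads horizontally. Padding against the top and bottom faces needs per-edge rotations and is omitted here. The function name `pad_side_faces` is hypothetical.

```python
import numpy as np

def pad_side_faces(faces, pad):
    """Horizontally pad the four side faces of a cube map from their neighbours.

    faces: array of shape (4, H, W), assumed order front, right, back, left.
    pad:   number of columns to add on each side (must be >= 1).
    Returns an array of shape (4, H, W + 2 * pad).
    """
    if pad < 1:
        raise ValueError("pad must be at least 1")
    left_neighbours = np.roll(faces, 1, axis=0)    # entry i holds face i - 1
    right_neighbours = np.roll(faces, -1, axis=0)  # entry i holds face i + 1
    return np.concatenate(
        [
            left_neighbours[:, :, -pad:],  # rightmost columns of the face to the left
            faces,
            right_neighbours[:, :, :pad],  # leftmost columns of the face to the right
        ],
        axis=2,
    )

if __name__ == "__main__":
    faces = np.arange(4 * 2 * 3).reshape(4, 2, 3)
    print(pad_side_faces(faces, pad=1).shape)
```

Because the side faces form a closed ring, every padded column comes from real image content, so a convolution applied afterwards sees no artificial boundary along these edges.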

Low-Resource Machine Transliteration Using Recurrent Neural Networks of Asian Languages

Title Low-Resource Machine Transliteration Using Recurrent Neural Networks of Asian Languages
Authors Ngoc Tan Le, Fatiha Sadat
Abstract Grapheme-to-phoneme models are key components in automatic speech recognition and text-to-speech systems. With low-resource language pairs that do not have available and well-developed pronunciation lexicons, grapheme-to-phoneme models are particularly useful. These models are based on initial alignments between grapheme source and phoneme target sequences. Inspired by sequence-to-sequence recurrent neural network-based translation methods, the current research presents an approach that applies an alignment representation for input sequences and pre-trained source and target embeddings to overcome the transliteration problem for a low-resource language pair. We participated in the NEWS 2018 shared task for the English-Vietnamese transliteration task.
Tasks Machine Translation, Speech Recognition, Transliteration, Word Embeddings
Published 2018-07-01
URL https://www.aclweb.org/anthology/W18-2414/
PDF https://www.aclweb.org/anthology/W18-2414
PWC https://paperswithcode.com/paper/low-resource-machine-transliteration-using
Repo
Framework

Linguistic and Sociolinguistic Annotation of 17th Century Dutch Letters

Title Linguistic and Sociolinguistic Annotation of 17th Century Dutch Letters
Authors Marijn Schraagen, Feike Dietz, Marjo van Koppen
Abstract
Tasks
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1184/
PDF https://www.aclweb.org/anthology/L18-1184
PWC https://paperswithcode.com/paper/linguistic-and-sociolinguistic-annotation-of
Repo
Framework

Nonoverlap-Promoting Variable Selection

Title Nonoverlap-Promoting Variable Selection
Authors Pengtao Xie, Hongbao Zhang, Yichen Zhu, Eric Xing
Abstract Variable selection is a classic problem in machine learning (ML), widely used to find important explanatory factors and to improve the generalization performance and interpretability of ML models. In this paper, we consider variable selection for models where multiple responses are to be predicted based on the same set of covariates. Since each response is relevant to a unique subset of covariates, we desire the selected variables for different responses to have small overlap. We propose a regularizer that simultaneously encourages orthogonality and sparsity, which jointly have the effect of reducing overlap. We apply this regularizer to four model instances and develop efficient algorithms to solve the regularized problems. We provide a formal analysis of why the proposed regularizer can reduce generalization error. Experiments on both simulation studies and real-world datasets demonstrate the effectiveness of the proposed regularizer in selecting less-overlapped variables and improving generalization performance.
Tasks
Published 2018-07-01
URL https://icml.cc/Conferences/2018/Schedule?showEvent=1886
PDF http://proceedings.mlr.press/v80/xie18b/xie18b.pdf
PWC https://paperswithcode.com/paper/nonoverlap-promoting-variable-selection
Repo
Framework
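
The abstract above says the regularizer encourages orthogonality and sparsity together but does not state its form. The NumPy sketch below uses a common stand-in, a squared Frobenius distance between the Gram matrix and the identity plus an L1 term, purely to illustrate why the combination discourages overlapping supports. This form, the weight `gamma`, and the helper names are assumptions, not the paper's definition.

```python
import numpy as np

def nonoverlap_penalty(W, gamma=1.0):
    """Stand-in penalty: ||W^T W - I||_F^2 + gamma * ||W||_1.

    W: (n_features, n_responses), one weight vector per response (column).
    """
    gram = W.T @ W
    ortho = np.sum((gram - np.eye(W.shape[1])) ** 2)  # pushes columns towards orthonormal
    sparse = np.abs(W).sum()                          # pushes entries towards zero
    return float(ortho + gamma * sparse)

def support_overlap(w1, w2, tol=1e-8):
    """Jaccard overlap between the sets of selected (non-zero) variables."""
    s1, s2 = np.abs(w1) > tol, np.abs(w2) > tol
    union = np.logical_or(s1, s2).sum()
    return float(np.logical_and(s1, s2).sum() / union) if union else 0.0

if __name__ == "__main__":
    disjoint = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
    shared = np.array([[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
    for name, W in [("disjoint", disjoint), ("shared", shared)]:
        print(name, nonoverlap_penalty(W), support_overlap(W[:, 0], W[:, 1]))
```

With the same L1 norm, the two columns that select the same variable incur the larger penalty in this sketch, because sparse vectors can only be orthogonal when their supports barely intersect.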