January 24, 2020

2373 words 12 mins read

Paper Group NANR 245

Transformer Dissection: An Unified Understanding for Transformer’s Attention via the Lens of Kernel. LexicalAT: Lexical-Based Adversarial Reinforcement Training for Robust Sentiment Classification. KSU at SemEval-2019 Task 3: Hybrid Features for Emotion Recognition in Textual Conversation. Split or Merge: Which is Better for Unsupervised RST Parsin …

Transformer Dissection: An Unified Understanding for Transformer’s Attention via the Lens of Kernel

Title Transformer Dissection: An Unified Understanding for Transformer’s Attention via the Lens of Kernel
Authors Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
Abstract Transformer is a powerful architecture that achieves superior performance on various sequence learning tasks, including neural machine translation, language understanding, and sequence prediction. At the core of the Transformer is the attention mechanism, which concurrently processes all inputs in the streams. In this paper, we present a new formulation of attention via the lens of the kernel. To be more precise, we realize that attention can be seen as applying a kernel smoother over the inputs, with the kernel scores being the similarities between inputs. This new formulation gives us a better way to understand individual components of the Transformer's attention, such as a better way to integrate the positional embedding. Another important advantage of our kernel-based formulation is that it paves the way to a larger space of composing Transformer's attention. As an example, we propose a new variant of Transformer's attention which models the input as a product of symmetric kernels. This approach achieves performance competitive with the current state-of-the-art model with less computation. In our experiments, we empirically study different kernel construction strategies on two widely used tasks: neural machine translation and sequence prediction.
Tasks Machine Translation
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1443/
PDF https://www.aclweb.org/anthology/D19-1443
PWC https://paperswithcode.com/paper/transformer-dissection-an-unified-1
Repo
Framework
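
A minimal NumPy sketch of the kernel-smoother view of attention described in the abstract above: each output is a weighted average of the value vectors, with weights given by a normalized kernel between the query and the keys. The exponential scaled-dot-product kernel used here recovers standard softmax attention; the shapes and the kernel choice are illustrative, not the paper's exact construction.

```python
import numpy as np

def kernel_attention(Q, K, V, kernel=None):
    """Attention as a kernel smoother: out_i = sum_j k(q_i, k_j) v_j / sum_j k(q_i, k_j)."""
    if kernel is None:
        d = Q.shape[-1]
        kernel = lambda q, k: np.exp(q @ k.T / np.sqrt(d))   # exponential scaled-dot-product kernel
    scores = kernel(Q, K)                                     # (n, m) unnormalized kernel scores
    weights = scores / scores.sum(axis=-1, keepdims=True)     # kernel-smoother normalization
    return weights @ V                                        # weighted average of the values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(kernel_attention(Q, K, V).shape)  # (4, 8)
```

Swapping in a different `kernel` (for example, one that folds positional information into the similarity) is exactly the kind of composition the kernel view is meant to expose.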

LexicalAT: Lexical-Based Adversarial Reinforcement Training for Robust Sentiment Classification

Title LexicalAT: Lexical-Based Adversarial Reinforcement Training for Robust Sentiment Classification
Authors Jingjing Xu, Liang Zhao, Hanqi Yan, Qi Zeng, Yun Liang, Xu Sun
Abstract Recent work has shown that current text classification models are fragile and sensitive to simple perturbations. In this work, we propose a novel adversarial training approach, LexicalAT, to improve the robustness of current classification models. The proposed approach consists of a generator and a classifier. The generator learns to generate examples to attack the classifier while the classifier learns to defend against these attacks. Considering the diversity of attacks, the generator uses a large-scale lexical knowledge base, WordNet, to generate attacking examples by replacing some words in training examples with their synonyms (e.g., sad and unhappy), neighbor words (e.g., fox and wolf), or super-superior words (e.g., chair and armchair). Due to the discrete generation step in the generator, we use policy gradient, a reinforcement learning approach, to train the two modules. Experiments show LexicalAT outperforms strong baselines and reduces test errors on various neural networks, including CNN, RNN, and BERT.
Tasks Sentiment Analysis, Text Classification
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1554/
PDF https://www.aclweb.org/anthology/D19-1554
PWC https://paperswithcode.com/paper/lexicalat-lexical-based-adversarial
Repo
Framework
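
A minimal sketch of the generator's action space described in the abstract: perturb a training example by swapping a word for one of its WordNet synonyms. The policy-gradient training of the generator and classifier is not shown, and the whitespace tokenization here is a simplification.

```python
import random
from nltk.corpus import wordnet as wn  # assumes nltk.download('wordnet') has been run

def synonym_attack(sentence, rng=None):
    """Replace one word with a WordNet synonym (e.g. 'sad' -> 'unhappy')."""
    rng = rng or random.Random(0)
    tokens = sentence.split()
    order = list(range(len(tokens)))
    rng.shuffle(order)
    for i in order:
        lemmas = {l.name().replace('_', ' ')
                  for s in wn.synsets(tokens[i].lower()) for l in s.lemmas()}
        lemmas.discard(tokens[i].lower())
        if lemmas:                            # found a word with at least one synonym
            tokens[i] = rng.choice(sorted(lemmas))
            break
    return ' '.join(tokens)

print(synonym_attack("the movie was sad and slow"))
```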

KSU at SemEval-2019 Task 3: Hybrid Features for Emotion Recognition in Textual Conversation

Title KSU at SemEval-2019 Task 3: Hybrid Features for Emotion Recognition in Textual Conversation
Authors Nourah Alswaidan, Mohamed El Bachir Menai
Abstract We propose a model to address emotion recognition in textual conversation based on automatically extracted features and human-engineered features. The proposed model utilizes a fast gated recurrent unit backed by CuDNN and a convolutional neural network to automatically extract features. The human-engineered features are the term frequency-inverse document frequency (TF-IDF) of semantic meaning and mood tags extracted from SenticNet.
Tasks Emotion Recognition
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2041/
PDF https://www.aclweb.org/anthology/S19-2041
PWC https://paperswithcode.com/paper/ksu-at-semeval-2019-task-3-hybrid-features
Repo
Framework
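
A hedged tf.keras sketch of the hybrid-feature idea: one branch learns features from token ids with a GRU (CuDNN-backed on GPU) and a CNN, another branch takes pre-computed TF-IDF features, and the branches are concatenated before classification. All sizes are placeholders; the authors' actual architecture and feature set are not specified in the abstract.

```python
import tensorflow as tf

# Hypothetical sizes; the authors' configuration is not given in the abstract.
VOCAB, SEQ_LEN, TFIDF_DIM, N_EMOTIONS = 20000, 50, 300, 4

tokens = tf.keras.Input(shape=(SEQ_LEN,), name="token_ids")
tfidf = tf.keras.Input(shape=(TFIDF_DIM,), name="tfidf_features")   # e.g. TF-IDF of mood tags

x = tf.keras.layers.Embedding(VOCAB, 128)(tokens)
gru = tf.keras.layers.GRU(64)(x)                                     # fast GRU branch
cnn = tf.keras.layers.GlobalMaxPooling1D()(
    tf.keras.layers.Conv1D(64, 3, activation="relu")(x))            # CNN branch

h = tf.keras.layers.concatenate([gru, cnn, tfidf])                   # learned + engineered features
out = tf.keras.layers.Dense(N_EMOTIONS, activation="softmax")(h)

model = tf.keras.Model([tokens, tfidf], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```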

Split or Merge: Which is Better for Unsupervised RST Parsing?

Title Split or Merge: Which is Better for Unsupervised RST Parsing?
Authors Naoki Kobayashi, Tsutomu Hirao, Kengo Nakamura, Hidetaka Kamigaito, Manabu Okumura, Masaaki Nagata
Abstract Rhetorical Structure Theory (RST) parsing is crucial for many downstream NLP tasks that require a discourse structure for a text. Most previous RST parsers have been based on supervised learning approaches; that is, they require an annotated corpus of sufficient size and quality and depend heavily on the language and domain of that corpus. In this paper, we present two language-independent unsupervised RST parsing methods based on dynamic programming. The first builds the optimal tree in terms of a dissimilarity score function defined for splitting a text span into smaller ones. The second builds the optimal tree in terms of a similarity score function defined for merging two adjacent spans into a larger one. Experimental results on English and German RST treebanks show that our parser based on span merging achieved the best score, around 0.8 F1, which is close to the scores of previous supervised parsers.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1587/
PDF https://www.aclweb.org/anthology/D19-1587
PWC https://paperswithcode.com/paper/split-or-merge-which-is-better-for
Repo
Framework
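
A small sketch of the span-merging dynamic program the abstract describes: a CKY-style recursion that, for every span, picks the split whose two halves score highest under a similarity function for merging adjacent spans. The similarity function here is a toy placeholder for the paper's unsupervised span similarity.

```python
import functools

def build_tree(sim, n):
    """CKY-style DP over EDUs 0..n-1: for each span pick the split whose two
    halves are most similar under sim(i, k, j), a placeholder score for
    merging [i, k) with [k, j). Returns (total_score, nested tree)."""
    @functools.lru_cache(maxsize=None)
    def best(i, j):
        if j - i == 1:
            return 0.0, i                           # a single EDU is a leaf
        candidates = []
        for k in range(i + 1, j):
            sl, left = best(i, k)
            sr, right = best(k, j)
            candidates.append((sl + sr + sim(i, k, j), (left, right)))
        return max(candidates, key=lambda c: c[0])  # keep the highest-scoring split
    return best(0, n)

# toy similarity: prefer balanced merges (stand-in for a real vector similarity)
toy_sim = lambda i, k, j: -abs((k - i) - (j - k))
print(build_tree(toy_sim, 5))
```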

Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)

Title Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)
Authors
Abstract
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-6400/
PDF https://www.aclweb.org/anthology/D19-6400
PWC https://paperswithcode.com/paper/proceedings-of-the-beyond-vision-and-language
Repo
Framework

Combining Knowledge Hunting and Neural Language Models to Solve the Winograd Schema Challenge

Title Combining Knowledge Hunting and Neural Language Models to Solve the Winograd Schema Challenge
Authors Ashok Prakash, Arpit Sharma, Arindam Mitra, Chitta Baral
Abstract Winograd Schema Challenge (WSC) is a pronoun resolution task which seems to require reasoning with commonsense knowledge. The needed knowledge is not present in the given text. Automatic extraction of the needed knowledge is a bottleneck in solving the challenge. The existing state-of-the-art approach uses the knowledge embedded in their pre-trained language model. However, the language models only embed part of the knowledge, the ones related to frequently co-existing concepts. This limits the performance of such models on the WSC problems. In this work, we build on the language-model-based methods and augment them with a commonsense knowledge hunting (using automatic extraction from text) module and an explicit reasoning module. Our end-to-end system built in such a manner improves on the accuracy of two of the available language-model-based approaches by 5.53% and 7.7%, respectively. Overall our system achieves the state-of-the-art accuracy of 71.06% on the WSC dataset, an improvement of 7.36% over the previous best.
Tasks Language Modelling
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1614/
PDF https://www.aclweb.org/anthology/P19-1614
PWC https://paperswithcode.com/paper/combining-knowledge-hunting-and-neural
Repo
Framework
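
A sketch of the language-model half only (the knowledge-hunting and explicit-reasoning modules are not shown): each candidate antecedent is substituted for the pronoun and the version the LM finds more plausible wins. GPT-2 from Hugging Face Transformers stands in for the pre-trained language models the paper builds on.

```python
import re
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def lm_score(sentence):
    """Negative mean token loss; higher means the LM finds the sentence more plausible."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        return -lm(ids, labels=ids).loss.item()

def resolve(schema, pronoun, candidates):
    """Substitute each candidate for the pronoun and keep the most plausible sentence."""
    sub = lambda c: re.sub(rf"\b{re.escape(pronoun)}\b", c, schema, count=1)
    return max(candidates, key=lambda c: lm_score(sub(c)))

schema = "The trophy doesn't fit into the suitcase because it is too large."
print(resolve(schema, "it", ["the trophy", "the suitcase"]))  # expected: the trophy
```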

Linguistic features and proficiency classification in L2 Spanish and L2 Portuguese

Title Linguistic features and proficiency classification in L2 Spanish and L2 Portuguese
Authors Iria del Río
Abstract
Tasks
Published 2019-09-01
URL https://www.aclweb.org/anthology/W19-6304/
PDF https://www.aclweb.org/anthology/W19-6304
PWC https://paperswithcode.com/paper/linguistic-features-and-proficiency
Repo
Framework

Applying BERT to Document Retrieval with Birch

Title Applying BERT to Document Retrieval with Birch
Authors Zeynep Akkalyoncu Yilmaz, Shengjin Wang, Wei Yang, Haotian Zhang, Jimmy Lin
Abstract We present Birch, a system that applies BERT to document retrieval via integration with the open-source Anserini information retrieval toolkit to demonstrate end-to-end search over large document collections. Birch implements simple ranking models that achieve state-of-the-art effectiveness on standard TREC newswire and social media test collections. This demonstration focuses on technical challenges in the integration of NLP and IR capabilities, along with the design rationale behind our approach to tightly-coupled integration between Python (to support neural networks) and the Java Virtual Machine (to support document retrieval using the open-source Lucene search library). We demonstrate integration of Birch with an existing search interface as well as interactive notebooks that highlight its capabilities in an easy-to-understand manner.
Tasks Information Retrieval
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-3004/
PDF https://www.aclweb.org/anthology/D19-3004
PWC https://paperswithcode.com/paper/applying-bert-to-document-retrieval-with
Repo
Framework
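
A sketch of the reranking half of such a pipeline: candidate documents, assumed to come from Anserini/Lucene retrieval, are rescored with a BERT-style cross-encoder. The model name below is a publicly available relevance cross-encoder used as a stand-in; Birch itself fine-tunes BERT on TREC-style data and integrates with Anserini over the JVM.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"   # stand-in relevance model, not Birch's own
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

def rerank(query, docs):
    """Rescore candidate documents (e.g. BM25 hits from Anserini) with a cross-encoder."""
    batch = tok([query] * len(docs), docs, padding=True, truncation=True,
                max_length=512, return_tensors="pt")
    with torch.no_grad():
        scores = model(**batch).logits.squeeze(-1)
    return sorted(zip(docs, scores.tolist()), key=lambda p: -p[1])

candidates = ["Lucene is an open-source search library written in Java.",
              "Bananas are rich in potassium."]
for doc, score in rerank("open-source information retrieval toolkit", candidates):
    print(f"{score:7.2f}  {doc}")
```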

CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection

Title CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection
Authors Lu Zhang, Jianming Zhang, Zhe Lin, Huchuan Lu, You He
Abstract Detecting salient objects in cluttered scenes is a big challenge. To address this problem, we argue that the model needs to learn discriminative semantic features for salient objects. To this end, we propose to leverage captioning as an auxiliary semantic task to boost salient object detection in complex scenarios. Specifically, we develop a CapSal model which consists of two sub-networks, the Image Captioning Network (ICN) and the Local-Global Perception Network (LGPN). ICN encodes the embedding of a generated caption to capture the semantic information of major objects in the scene, while LGPN incorporates the captioning embedding with local-global visual contexts for predicting the saliency map. ICN and LGPN are jointly trained to model high-level semantics as well as visual saliency. Extensive experiments demonstrate the effectiveness of image captioning in boosting the performance of salient object detection. In particular, our model performs significantly better than the state-of-the-art methods on several challenging datasets of complex scenarios.
Tasks Image Captioning, Object Detection, Salient Object Detection
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Zhang_CapSal_Leveraging_Captioning_to_Boost_Semantics_for_Salient_Object_Detection_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_CapSal_Leveraging_Captioning_to_Boost_Semantics_for_Salient_Object_Detection_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/capsal-leveraging-captioning-to-boost
Repo
Framework
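
A toy PyTorch sketch of the fusion idea in the abstract: a caption embedding is broadcast over spatial visual features and combined with them to predict a per-pixel saliency map. The layer sizes and the simple additive fusion are illustrative assumptions, not the paper's ICN/LGPN architecture.

```python
import torch
import torch.nn as nn

class CaptionGuidedSaliency(nn.Module):
    """Toy fusion of a caption embedding with spatial visual features (illustrative only)."""
    def __init__(self, vis_ch=256, cap_dim=300, hid=128):
        super().__init__()
        self.cap_proj = nn.Linear(cap_dim, hid)
        self.vis_proj = nn.Conv2d(vis_ch, hid, kernel_size=1)
        self.head = nn.Conv2d(hid, 1, kernel_size=1)          # per-pixel saliency logit

    def forward(self, vis_feats, cap_emb):
        # vis_feats: (B, C, H, W) from a backbone; cap_emb: (B, cap_dim) from a captioning net
        v = self.vis_proj(vis_feats)
        c = self.cap_proj(cap_emb)[:, :, None, None]          # broadcast over H, W
        return torch.sigmoid(self.head(torch.relu(v + c)))    # (B, 1, H, W) saliency map

model = CaptionGuidedSaliency()
sal = model(torch.randn(2, 256, 32, 32), torch.randn(2, 300))
print(sal.shape)  # torch.Size([2, 1, 32, 32])
```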

Unsupervised 3D Reconstruction Networks

Title Unsupervised 3D Reconstruction Networks
Authors Geonho Cha, Minsik Lee, Songhwai Oh
Abstract In this paper, we propose 3D unsupervised reconstruction networks (3D-URN), which reconstruct the 3D structures of instances in a given object category from their 2D feature points under an orthographic camera model. 3D-URN consists of a 3D shape reconstructor and a rotation estimator, which are trained in a fully-unsupervised manner incorporating the proposed unsupervised loss functions. The role of the 3D shape reconstructor is to reconstruct the 3D shape of an instance from its 2D feature points, and the rotation estimator infers the camera pose. After training, 3D-URN can infer the 3D structure of an unseen instance in the same category, which is not possible in the conventional schemes of non-rigid structure from motion and structure from category. The experimental result shows the state-of-the-art performance, which demonstrates the effectiveness of the proposed method.
Tasks 3D Reconstruction
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Cha_Unsupervised_3D_Reconstruction_Networks_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Cha_Unsupervised_3D_Reconstruction_Networks_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/unsupervised-3d-reconstruction-networks
Repo
Framework
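
A short sketch of the kind of unsupervised signal an orthographic camera model makes possible: the reconstructed 3D points, rotated by the estimated camera and projected by dropping the depth axis, should land on the observed 2D feature points. This generic reprojection error only illustrates the setup; the paper's actual loss functions are more involved.

```python
import torch

def orthographic_reprojection_loss(points_3d, rotation, points_2d):
    """points_3d: (B, N, 3) reconstructed shape; rotation: (B, 3, 3) estimated camera;
    points_2d: (B, N, 2) observed (centered) 2D feature points.
    Orthographic projection keeps the first two coordinates after rotation."""
    rotated = torch.bmm(points_3d, rotation.transpose(1, 2))  # (B, N, 3)
    projected = rotated[..., :2]                              # drop the depth axis
    return ((projected - points_2d) ** 2).mean()

# toy check: a shape projected with the same rotation gives zero loss
B, N = 2, 10
X = torch.randn(B, N, 3)
R = torch.linalg.qr(torch.randn(B, 3, 3)).Q                   # random orthonormal cameras
obs = torch.bmm(X, R.transpose(1, 2))[..., :2]
print(orthographic_reprojection_loss(X, R, obs))              # ~0
```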

Enthymemetic Conditionals: Topoi as a guide for acceptability

Title Enthymemetic Conditionals: Topoi as a guide for acceptability
Authors Eimear Maguire
Abstract
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-1008/
PDF https://www.aclweb.org/anthology/W19-1008
PWC https://paperswithcode.com/paper/enthymemetic-conditionals-topoi-as-a-guide
Repo
Framework

Cross-sectional Learning of Extremal Dependence among Financial Assets

Title Cross-sectional Learning of Extremal Dependence among Financial Assets
Authors Xing Yan, Qi Wu, Wen Zhang
Abstract We propose a novel probabilistic model to facilitate the learning of multivariate tail dependence of multiple financial assets. Our method allows one to construct, from known random vectors such as the standard normal, sophisticated joint heavy-tailed random vectors featuring not only distinct marginal tail heaviness but also a flexible tail dependence structure. The novelty lies in that pairwise tail dependence between any two dimensions is modeled separately from their correlation, and can vary according to its own parameter rather than the correlation parameter, which is an essential advantage over many commonly used methods such as the multivariate t or elliptical distributions. It is also intuitive to interpret, easy to track, and simple to sample compared to the copula approach. We show its flexible tail dependence structure through simulation. Coupled with a GARCH model to eliminate serial dependence of each individual asset return series, we use this method to model and forecast the multivariate conditional distribution of stock returns, and obtain notable performance improvements in multi-dimensional coverage tests. Besides, our empirical finding about the asymmetry of the tails of the idiosyncratic component as well as the market component is interesting and worth studying further.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/8641-cross-sectional-learning-of-extremal-dependence-among-financial-assets
PDF http://papers.nips.cc/paper/8641-cross-sectional-learning-of-extremal-dependence-among-financial-assets.pdf
PWC https://paperswithcode.com/paper/cross-sectional-learning-of-extremal
Repo
Framework
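
The abstract's starting point is building joint heavy-tailed vectors out of standard normals while keeping marginal tail heaviness controllable per dimension. The sketch below only illustrates that separation with a generic Gaussian-copula-plus-Student-t-marginals transform; it is not the paper's construction, whose point is a richer, separately parameterized tail dependence structure that this simple recipe lacks.

```python
import numpy as np
from scipy.stats import norm, t

def heavy_tailed_sample(n, corr, tail_dfs, seed=0):
    """Draw correlated Gaussians, then give each dimension its own tail heaviness
    via a Student-t quantile transform (smaller df = heavier tail)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(corr)
    z = rng.standard_normal((n, len(tail_dfs))) @ L.T    # correlated standard normals
    u = norm.cdf(z)                                       # uniform marginals, Gaussian dependence
    return np.column_stack([t.ppf(u[:, i], df) for i, df in enumerate(tail_dfs)])

corr = np.array([[1.0, 0.6], [0.6, 1.0]])
x = heavy_tailed_sample(100_000, corr, tail_dfs=[3, 10])
print(np.abs(x).max(axis=0))   # the df=3 dimension typically shows far heavier tails
```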

Towards Incremental Learning of Word Embeddings Using Context Informativeness

Title Towards Incremental Learning of Word Embeddings Using Context Informativeness
Authors Alexandre Kabbach, Kristina Gulordava, Aurélie Herbelot
Abstract In this paper, we investigate the task of learning word embeddings from very sparse data in an incremental, cognitively-plausible way. We focus on the notion of 'informativeness', that is, the idea that some content is more valuable to the learning process than other content. We further highlight the challenges of online learning and argue that previous systems fall short of implementing incrementality. Concretely, we incorporate informativeness in a previously proposed model of nonce learning, using it for context selection and learning rate modulation. We test our system on the task of learning new words from definitions, as well as on the task of learning new words from potentially uninformative contexts. We demonstrate that informativeness is crucial to obtaining state-of-the-art performance in a truly incremental setup.
Tasks Learning Word Embeddings, Word Embeddings
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-2022/
PDF https://www.aclweb.org/anthology/P19-2022
PWC https://paperswithcode.com/paper/towards-incremental-learning-of-word
Repo
Framework
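
A toy sketch of the two uses of informativeness named in the abstract: context selection (weighting context words) and learning-rate modulation (scaling the update by how informative the context is overall). The update rule and the scores below are illustrative assumptions, not the authors' nonce-learning model.

```python
import numpy as np

def update_nonce(nonce_vec, context_vecs, informativeness, base_lr=0.5):
    """One incremental update of a new word's vector from a single context.

    context_vecs: (k, d) embeddings of the context words.
    informativeness: (k,) scores in [0, 1], e.g. how predictive each context word is.
    """
    w = np.asarray(informativeness, dtype=float)
    if w.sum() == 0:
        return nonce_vec                                     # uninformative context: no update
    target = (w[:, None] * context_vecs).sum(0) / w.sum()    # informativeness-weighted centroid
    lr = base_lr * w.mean()                                  # modulate step size by informativeness
    return (1 - lr) * nonce_vec + lr * target

d = 50
nonce = np.zeros(d)
ctx = np.random.default_rng(0).normal(size=(4, d))
nonce = update_nonce(nonce, ctx, informativeness=[0.9, 0.1, 0.0, 0.7])
print(np.linalg.norm(nonce))
```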

G3raphGround: Graph-Based Language Grounding

Title G3raphGround: Graph-Based Language Grounding
Authors Mohit Bajaj, Lanjun Wang, Leonid Sigal
Abstract In this paper we present an end-to-end framework for grounding of phrases in images. In contrast to previous works, our model, which we call GraphGround, uses graphs to formulate more complex, non-sequential dependencies among proposal image regions and phrases. We capture intra-modal dependencies using a separate graph neural network for each modality (visual and lingual), and then use conditional message-passing in another graph neural network to fuse their outputs and capture cross-modal relationships. This final representation results in grounding decisions. The framework supports many-to-many matching and is able to ground single phrase to multiple image regions and vice versa. We validate our design choices through a series of ablation studies and illustrate state-of-the-art performance on Flickr30k and ReferIt Game benchmark datasets.
Tasks
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Bajaj_G3raphGround_Graph-Based_Language_Grounding_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Bajaj_G3raphGround_Graph-Based_Language_Grounding_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/g3raphground-graph-based-language-grounding
Repo
Framework
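
A simplified PyTorch sketch of the overall shape of such a model: one message-passing module per modality over fully-connected graphs, followed by phrase-region scoring. The paper's conditional cross-modal message passing is replaced here by a plain bilinear scorer, so this is only a structural illustration.

```python
import torch
import torch.nn as nn

class MessagePassing(nn.Module):
    """One round of fully-connected message passing (illustrative, not the paper's GNN)."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)
        self.upd = nn.GRUCell(dim, dim)

    def forward(self, h):                                    # h: (n, dim) node states
        n = h.size(0)
        pairs = torch.cat([h[:, None].expand(n, n, -1),
                           h[None, :].expand(n, n, -1)], dim=-1)
        m = torch.relu(self.msg(pairs)).mean(dim=1)          # aggregate messages per node
        return self.upd(m, h)

class PhraseRegionGrounder(nn.Module):
    """Intra-modal passing for regions and phrases, then pairwise grounding scores."""
    def __init__(self, dim=128):
        super().__init__()
        self.visual_gnn = MessagePassing(dim)
        self.lingual_gnn = MessagePassing(dim)
        self.score = nn.Bilinear(dim, dim, 1)

    def forward(self, region_feats, phrase_feats):
        r = self.visual_gnn(region_feats)                    # (R, dim)
        p = self.lingual_gnn(phrase_feats)                   # (P, dim)
        R, P, d = r.size(0), p.size(0), r.size(-1)
        return self.score(p[:, None].expand(P, R, d).reshape(-1, d),
                          r[None, :].expand(P, R, d).reshape(-1, d)).view(P, R)

model = PhraseRegionGrounder()
scores = model(torch.randn(12, 128), torch.randn(3, 128))
print(scores.shape)  # torch.Size([3, 12]) phrase-to-region grounding logits
```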

Mixture-Kernel Graph Attention Network for Situation Recognition

Title Mixture-Kernel Graph Attention Network for Situation Recognition
Authors Mohammed Suhail, Leonid Sigal
Abstract Understanding images beyond salient actions involves reasoning about scene context, objects, and the roles they play in the captured event. Situation recognition has recently been introduced as the task of jointly reasoning about the verbs (actions) and a set of semantic-role and entity (noun) pairs in the form of action frames. Labeling an image with an action frame requires an assignment of values (nouns) to the roles based on the observed image content. Among the inherent challenges are the rich conditional structured dependencies between the output role assignments and the overall semantic sparsity. In this paper, we propose a novel mixture-kernel attention graph neural network (GNN) architecture designed to address these challenges. Our GNN enables dynamic graph structure during training and inference, through the use of a graph attention mechanism, and context-aware interactions between role pairs. We illustrate the efficacy of our model and design choices by conducting experiments on imSitu benchmark dataset, with accuracy improvements of up to 10% over the state-of-the-art.
Tasks
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Suhail_Mixture-Kernel_Graph_Attention_Network_for_Situation_Recognition_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Suhail_Mixture-Kernel_Graph_Attention_Network_for_Situation_Recognition_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/mixture-kernel-graph-attention-network-for
Repo
Framework
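
A small PyTorch sketch of the core idea as the abstract presents it: edge weights in a graph attention layer come from a learned mixture of similarity kernels over node features, which makes the effective graph structure dynamic. The two kernels and the single-head layout are simplifying assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureKernelAttention(nn.Module):
    """Graph attention whose edge weights mix two similarity kernels
    (scaled dot product and RBF) with learned mixture coefficients."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.mix = nn.Parameter(torch.zeros(2))              # logits over the two kernels
        self.log_gamma = nn.Parameter(torch.zeros(()))       # RBF bandwidth (log scale)

    def forward(self, h):                                    # h: (n, dim) node features
        q = self.proj(h)
        dot = q @ q.t() / q.size(-1) ** 0.5                  # kernel 1: scaled dot product
        rbf = torch.exp(-self.log_gamma.exp() * torch.cdist(q, q) ** 2)  # kernel 2: RBF
        w = F.softmax(self.mix, dim=0)
        attn = F.softmax(w[0] * dot + w[1] * rbf, dim=-1)    # dynamic, fully-connected graph
        return attn @ h                                       # attention-weighted node update

layer = MixtureKernelAttention(64)
out = layer(torch.randn(7, 64))                               # e.g. 7 role/entity nodes
print(out.shape)  # torch.Size([7, 64])
```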