February 1, 2020

2874 words 14 mins read

Paper Group AWR 169


Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards. Mixture Models for Diverse Machine Translation: Tricks of the Trade. Factor Graph Neural Network. Scoring Sentence Singletons and Pairs for Abstractive Summarization. word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs. Primal-Dual Block Frank …

Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards

Title Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards
Authors Alexander Trott, Stephan Zheng, Caiming Xiong, Richard Socher
Abstract While using shaped rewards can be beneficial when solving sparse reward tasks, their successful application often requires careful engineering and is problem specific. For instance, in tasks where the agent must achieve some goal state, simple distance-to-goal reward shaping often fails, as it renders learning vulnerable to local optima. We introduce a simple and effective model-free method to learn from shaped distance-to-goal rewards on tasks where success depends on reaching a goal state. Our method introduces an auxiliary distance-based reward based on pairs of rollouts to encourage diverse exploration. This approach effectively prevents learning dynamics from stabilizing around local optima induced by the naive distance-to-goal reward shaping and enables policies to efficiently solve sparse reward tasks. Our augmented objective does not require any additional reward engineering or domain expertise to implement and converges to the original sparse objective as the agent learns to solve the task. We demonstrate that our method successfully solves a variety of hard-exploration tasks (including maze navigation and 3D construction in a Minecraft environment), where naive distance-based reward shaping otherwise fails, and intrinsic curiosity and reward relabeling strategies exhibit poor performance.
Tasks
Published 2019-11-04
URL https://arxiv.org/abs/1911.01417v1
PDF https://arxiv.org/pdf/1911.01417v1.pdf
PWC https://paperswithcode.com/paper/keeping-your-distance-solving-sparse-reward
Repo https://github.com/salesforce/sibling-rivalry
Framework pytorch
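
The abstract describes a pair-of-rollouts reward that balances reaching the goal against staying away from a sibling trajectory. Below is a minimal sketch of that idea, assuming Euclidean distances and a terminal-state formulation; the weighting and success test are illustrative, not the paper's exact objective.

```python
import numpy as np

def self_balancing_reward(terminal_a, terminal_b, goal, alpha=1.0, success_radius=0.1):
    """Terminal rewards for a pair of sibling rollouts (simplified sketch).

    Each rollout is rewarded for ending close to the goal and for ending far
    from its sibling, which discourages both rollouts from collapsing onto the
    same local optimum. `alpha` and `success_radius` are illustrative knobs,
    not values from the paper.
    """
    def reward(own_end, other_end):
        if np.linalg.norm(own_end - goal) < success_radius:
            return 1.0                                            # sparse success bonus
        shaped = -np.linalg.norm(own_end - goal)                  # distance-to-goal shaping
        diversity = alpha * np.linalg.norm(own_end - other_end)   # keep your distance from the sibling
        return shaped + diversity

    return reward(terminal_a, terminal_b), reward(terminal_b, terminal_a)
```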

Mixture Models for Diverse Machine Translation: Tricks of the Trade

Title Mixture Models for Diverse Machine Translation: Tricks of the Trade
Authors Tianxiao Shen, Myle Ott, Michael Auli, Marc’Aurelio Ranzato
Abstract Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Tasks Machine Translation, Text Generation
Published 2019-02-20
URL https://arxiv.org/abs/1902.07816v2
PDF https://arxiv.org/pdf/1902.07816v2.pdf
PWC https://paperswithcode.com/paper/mixture-models-for-diverse-machine
Repo https://github.com/pytorch/fairseq
Framework pytorch

Factor Graph Neural Network

Title Factor Graph Neural Network
Authors Zhen Zhang, Fan Wu, Wee Sun Lee
Abstract Most of the successful deep neural network architectures are structured, often consisting of elements like convolutional neural networks and gated recurrent neural networks. Recently, graph neural networks have been successfully applied to graph structured data such as point cloud and molecular data. These networks often only consider pairwise dependencies, as they operate on a graph structure. We generalize the graph neural network into a factor graph neural network (FGNN) in order to capture higher order dependencies. We show that FGNN is able to represent Max-Product Belief Propagation, an approximate inference algorithm on probabilistic graphical models; hence it is able to do well when Max-Product does well. Promising results on both synthetic and real datasets demonstrate the effectiveness of the proposed model.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.00554v1
PDF https://arxiv.org/pdf/1906.00554v1.pdf
PWC https://paperswithcode.com/paper/190600554
Repo https://github.com/zzhang1987/Factor-Graph-Neural-Network
Framework pytorch
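
The core idea is message passing over a factor graph so that each factor node can encode a higher-order dependency among several variables. Here is a minimal PyTorch sketch, assuming a dense 0/1 factor-variable membership matrix and simple sum aggregation; the paper relates its aggregation to Max-Product, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn

class FactorGraphLayer(nn.Module):
    """One round of variable <-> factor message passing (illustrative sketch,
    not the exact FGNN parameterization from the paper)."""

    def __init__(self, dim):
        super().__init__()
        self.var_to_fac = nn.Linear(dim, dim)
        self.fac_to_var = nn.Linear(dim, dim)

    def forward(self, var_feats, fac_feats, membership):
        # membership: (num_factors, num_vars) 0/1 matrix saying which variables
        # participate in which factor (this is where higher-order structure lives).
        fac_feats = fac_feats + membership @ torch.relu(self.var_to_fac(var_feats))
        var_feats = var_feats + membership.t() @ torch.relu(self.fac_to_var(fac_feats))
        return var_feats, fac_feats
```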

Scoring Sentence Singletons and Pairs for Abstractive Summarization

Title Scoring Sentence Singletons and Pairs for Abstractive Summarization
Authors Logan Lebanoff, Kaiqiang Song, Franck Dernoncourt, Doo Soon Kim, Seokhwan Kim, Walter Chang, Fei Liu
Abstract When writing a summary, humans tend to choose content from one or two sentences and merge them into a single summary sentence. However, the mechanisms behind the selection of one or multiple source sentences remain poorly understood. Sentence fusion assumes multi-sentence input; yet sentence selection methods only work with single sentences and not combinations of them. There is thus a crucial gap between sentence selection and fusion to support summarizing by both compressing single sentences and fusing pairs. This paper attempts to bridge the gap by ranking sentence singletons and pairs together in a unified space. Our proposed framework attempts to model human methodology by selecting either a single sentence or a pair of sentences, then compressing or fusing the sentence(s) to produce a summary sentence. We conduct extensive experiments on both single- and multi-document summarization datasets and report findings on sentence selection and abstraction.
Tasks Abstractive Text Summarization, Document Summarization, Multi-Document Summarization
Published 2019-05-31
URL https://arxiv.org/abs/1906.00077v1
PDF https://arxiv.org/pdf/1906.00077v1.pdf
PWC https://paperswithcode.com/paper/190600077
Repo https://github.com/ucfnlp/summarization-sing-pair-mix
Framework tf
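
The framework ranks sentence singletons and pairs in one unified candidate space before compressing or fusing the winner. The sketch below shows that candidate construction; `scorer` is a stand-in for the paper's learned ranking model.

```python
from itertools import combinations

def rank_singletons_and_pairs(sentences, scorer):
    """Rank single sentences and sentence pairs together in one candidate
    space (schematic sketch; `scorer` stands in for a trained ranker)."""
    candidates = [(s,) for s in sentences]
    candidates += list(combinations(sentences, 2))
    return sorted(candidates, key=scorer, reverse=True)

# Toy usage with a length-based stand-in scorer instead of a trained model.
docs = ["The storm hit the coast overnight.",
        "Thousands lost power.",
        "Crews are working to restore service."]
ranked = rank_singletons_and_pairs(docs, scorer=lambda c: sum(len(s) for s in c))
best = ranked[0]  # a singleton would be compressed, a pair would be fused
```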

word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs

Title word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
Authors Yo Joong Choe, Kyubyong Park, Dongwoo Kim
Abstract We present word2word, a publicly available dataset and an open-source Python package for cross-lingual word translations extracted from sentence-level parallel corpora. Our dataset provides top-k word translations in 3,564 (directed) language pairs across 62 languages in OpenSubtitles2018 (Lison et al., 2018). To obtain this dataset, we use a count-based bilingual lexicon extraction model based on the observation that not only source and target words but also source words themselves can be highly correlated. We illustrate that the resulting bilingual lexicons have high coverage and attain competitive translation quality for several language pairs. We wrap our dataset and model in an easy-to-use Python library, which supports downloading and retrieving top-k word translations in any of the supported language pairs as well as computing top-k word translations for custom parallel corpora.
Tasks
Published 2019-11-27
URL https://arxiv.org/abs/1911.12019v1
PDF https://arxiv.org/pdf/1911.12019v1.pdf
PWC https://paperswithcode.com/paper/word2word-a-collection-of-bilingual-lexicons
Repo https://github.com/Kyubyong/word2word
Framework none
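
The lexicons are built with a count-based extraction model over sentence-aligned parallel text. Below is a toy sketch using plain co-occurrence counts; the released model additionally corrects for monolingual collocations among source words, which is omitted here.

```python
from collections import Counter, defaultdict

def extract_lexicon(parallel_pairs, k=5):
    """Toy count-based bilingual lexicon extraction from sentence-aligned
    text (plain co-occurrence counts only)."""
    cooc = defaultdict(Counter)
    for src_sent, tgt_sent in parallel_pairs:
        src_tokens, tgt_tokens = set(src_sent.split()), set(tgt_sent.split())
        for s in src_tokens:
            cooc[s].update(tgt_tokens)          # count target words seen alongside s
    return {s: [t for t, _ in cooc[s].most_common(k)] for s in cooc}

lexicon = extract_lexicon([("i love you", "je t'aime"),
                           ("i see you", "je te vois")])
print(lexicon["i"][:2])
```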

Primal-Dual Block Frank-Wolfe

Title Primal-Dual Block Frank-Wolfe
Authors Qi Lei, Jiacheng Zhuo, Constantine Caramanis, Inderjit S. Dhillon, Alexandros G. Dimakis
Abstract We propose a variant of the Frank-Wolfe algorithm for solving a class of sparse/low-rank optimization problems. Our formulation includes Elastic Net, regularized SVMs and phase retrieval as special cases. The proposed Primal-Dual Block Frank-Wolfe algorithm reduces the per-iteration cost while maintaining linear convergence rate. The per iteration cost of our method depends on the structural complexity of the solution (i.e. sparsity/low-rank) instead of the ambient dimension. We empirically show that our algorithm outperforms the state-of-the-art methods on (multi-class) classification tasks.
Tasks
Published 2019-06-06
URL https://arxiv.org/abs/1906.02436v1
PDF https://arxiv.org/pdf/1906.02436v1.pdf
PWC https://paperswithcode.com/paper/primal-dual-block-frank-wolfe
Repo https://github.com/CarlsonZhuo/primal_dual_frank_wolfe
Framework none
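
For context, the classical Frank-Wolfe step the paper builds on is a linear minimization oracle over the constraint set followed by a convex combination. The sketch below runs vanilla Frank-Wolfe on an L1 ball for a sparse least-squares toy problem; it does not implement the paper's primal-dual block variant, which additionally maintains dual variables and updates only a sparse block per iteration.

```python
import numpy as np

def frank_wolfe_l1(grad_f, x0, radius=1.0, iters=100):
    """Vanilla Frank-Wolfe over the L1 ball of the given radius."""
    x = x0.copy()
    for t in range(iters):
        g = grad_f(x)
        i = np.argmax(np.abs(g))
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])   # linear minimization oracle: best vertex of the L1 ball
        gamma = 2.0 / (t + 2.0)          # standard diminishing step size
        x = (1 - gamma) * x + gamma * s
    return x

# Example: sparse least squares, min_x ||Ax - b||^2 subject to ||x||_1 <= 1
A, b = np.random.randn(50, 20), np.random.randn(50)
x_hat = frank_wolfe_l1(lambda x: 2 * A.T @ (A @ x - b), np.zeros(20))
```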

Human-Centered Emotion Recognition in Animated GIFs

Title Human-Centered Emotion Recognition in Animated GIFs
Authors Zhengyuan Yang, Yixuan Zhang, Jiebo Luo
Abstract As an intuitive way of expressing emotion, animated Graphical Interchange Format (GIF) images have been widely used on social media. Most previous studies on automated GIF emotion recognition fail to effectively utilize GIF’s unique properties, and this potentially limits the recognition performance. In this study, we demonstrate the importance of human related information in GIFs and conduct human-centered GIF emotion recognition with a proposed Keypoint Attended Visual Attention Network (KAVAN). The framework consists of a facial attention module and a hierarchical segment temporal module. The facial attention module exploits the strong relationship between GIF contents and human characters, and extracts frame-level visual features with a focus on human faces. The Hierarchical Segment LSTM (HS-LSTM) module is then proposed to better learn global GIF representations. Our proposed framework outperforms the state-of-the-art on the MIT GIFGIF dataset. Furthermore, the facial attention module provides reliable facial region mask predictions, which improves the model’s interpretability.
Tasks Emotion Recognition
Published 2019-04-27
URL http://arxiv.org/abs/1904.12201v1
PDF http://arxiv.org/pdf/1904.12201v1.pdf
PWC https://paperswithcode.com/paper/human-centered-emotion-recognition-in
Repo https://github.com/zyang-ur/human-centered-GIF
Framework none
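
The facial attention module biases frame-level feature pooling toward detected faces. Below is a schematic PyTorch sketch of keypoint-guided attention pooling, assuming a precomputed soft face mask per frame; it is a reading of the idea, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FacialAttentionPooling(nn.Module):
    """Pool spatial CNN features with attention biased toward face regions."""

    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat_map, face_mask):
        # feat_map: (B, C, H, W); face_mask: (B, 1, H, W), soft mask in [0, 1]
        # derived from detected facial keypoints.
        bias = face_mask.log().clamp(min=-10)        # strong but finite penalty off-face
        logits = self.attn(feat_map) + bias
        weights = torch.softmax(logits.flatten(2), dim=-1).view_as(logits)
        return (feat_map * weights).sum(dim=(2, 3))  # (B, C) frame-level feature
```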

Predictive Inequity in Object Detection

Title Predictive Inequity in Object Detection
Authors Benjamin Wilson, Judy Hoffman, Jamie Morgenstern
Abstract In this work, we investigate whether state-of-the-art object detection systems have equitable predictive performance on pedestrians with different skin tones. This work is motivated by many recent examples of ML and vision systems displaying higher error rates for certain demographic groups than others. We annotate an existing large scale dataset which contains pedestrians, BDD100K, with Fitzpatrick skin tones in ranges [1-3] or [4-6]. We then provide an in-depth comparative analysis of performance between these two skin tone groupings, finding that neither time of day nor occlusion explains this behavior, suggesting this disparity is not merely the result of pedestrians in the 4-6 range appearing in more difficult scenes for detection. We investigate to what extent time of day, occlusion, and reweighting the supervised loss during training affect this predictive bias.
Tasks Object Detection
Published 2019-02-21
URL http://arxiv.org/abs/1902.11097v1
PDF http://arxiv.org/pdf/1902.11097v1.pdf
PWC https://paperswithcode.com/paper/predictive-inequity-in-object-detection
Repo https://github.com/benjaminrwilson/inequity-release
Framework pytorch
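
At its core, the analysis compares detector performance between two skin-tone groups of ground-truth pedestrians. Here is a coarse sketch of such a group-wise comparison, assuming boxes as (x1, y1, x2, y2) tuples and a fixed IoU threshold; it mirrors the study only at a very high level.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def groupwise_recall(detections, annotations, thresh=0.5):
    """Recall per annotation group (e.g., Fitzpatrick 1-3 vs. 4-6).
    `annotations` maps group name -> list of ground-truth boxes."""
    stats = {}
    for group, boxes in annotations.items():
        hits = sum(1 for box in boxes if any(iou(box, d) > thresh for d in detections))
        stats[group] = hits / max(len(boxes), 1)
    return stats
```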

Correlated Variational Auto-Encoders

Title Correlated Variational Auto-Encoders
Authors Da Tang, Dawen Liang, Tony Jebara, Nicholas Ruozzi
Abstract Variational Auto-Encoders (VAEs) are capable of learning latent representations for high dimensional data. However, due to the i.i.d. assumption, VAEs only optimize the singleton variational distributions and fail to account for the correlations between data points, which might be crucial for learning latent representations from datasets where we know a priori that correlations exist. We propose Correlated Variational Auto-Encoders (CVAEs) that can take the correlation structure into consideration when learning latent representations with VAEs. CVAEs apply a prior based on the correlation structure. To address the intractability introduced by the correlated prior, we develop an approximation by averaging a set of tractable lower bounds over all maximal acyclic subgraphs of the undirected correlation graph. Experimental results on matching and link prediction on public benchmark rating datasets and spectral clustering on a synthetic dataset show the effectiveness of the proposed method over baseline algorithms.
Tasks Link Prediction
Published 2019-05-14
URL https://arxiv.org/abs/1905.05335v4
PDF https://arxiv.org/pdf/1905.05335v4.pdf
PWC https://paperswithcode.com/paper/correlated-variational-auto-encoders
Repo https://github.com/datang1992/Correlated-VAEs
Framework tf

Deep Anomaly Detection with Deviation Networks

Title Deep Anomaly Detection with Deviation Networks
Authors Guansong Pang, Chunhua Shen, Anton van den Hengel
Abstract Although deep learning has been applied to successfully address many data mining problems, relatively limited work has been done on deep learning for anomaly detection. Existing deep anomaly detection methods, which focus on learning new feature representations to enable downstream anomaly detection methods, perform indirect optimization of anomaly scores, leading to data-inefficient learning and suboptimal anomaly scoring. Also, they are typically designed as unsupervised learning due to the lack of large-scale labeled anomaly data. As a result, it is difficult for them to leverage prior knowledge (e.g., a few labeled anomalies) when such information is available, as in many real-world anomaly detection applications. This paper introduces a novel anomaly detection framework and its instantiation to address these problems. Instead of representation learning, our method performs end-to-end learning of anomaly scores via neural deviation learning, in which we leverage a few (e.g., multiple to dozens) labeled anomalies and a prior probability to enforce statistically significant deviations of the anomaly scores of anomalies from those of normal data objects in the upper tail. Extensive results show that our method can be trained substantially more data-efficiently and achieves significantly better anomaly scoring than state-of-the-art competing methods.
Tasks Anomaly Detection, Cyber Attack Detection, Fraud Detection, Network Intrusion Detection, Representation Learning
Published 2019-11-19
URL https://arxiv.org/abs/1911.08623v1
PDF https://arxiv.org/pdf/1911.08623v1.pdf
PWC https://paperswithcode.com/paper/deep-anomaly-detection-with-deviation
Repo https://github.com/GuansongPang/deviation-network
Framework tf
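
The deviation learning idea is to draw reference scores from a prior, then push labeled anomalies several standard deviations above that reference while keeping normal data near it. A sketch of such a loss is below; the Gaussian prior, margin, and sample count are illustrative choices, not necessarily the paper's settings.

```python
import torch

def deviation_loss(scores, labels, margin=5.0, n_ref=5000):
    """Deviation-style loss sketch.

    `scores`: (N,) anomaly scores from the network.
    `labels`: (N,) floats, 1 for labeled anomalies, 0 for (assumed) normal data.
    """
    ref = torch.randn(n_ref)                      # reference scores from a N(0, 1) prior
    dev = (scores - ref.mean()) / ref.std()       # z-score-like deviation from the reference
    inlier_term = (1 - labels) * dev.abs()        # keep normal data near the reference
    outlier_term = labels * torch.clamp(margin - dev, min=0)  # push anomalies above the margin
    return (inlier_term + outlier_term).mean()
```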

Multi-Object Representation Learning with Iterative Variational Inference

Title Multi-Object Representation Learning with Iterative Variational Inference
Authors Klaus Greff, Raphaël Lopez Kaufman, Rishabh Kabra, Nick Watters, Chris Burgess, Daniel Zoran, Loic Matthey, Matthew Botvinick, Alexander Lerchner
Abstract Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even considering multiple objects, or treats segmentation as an (often supervised) preprocessing step. Instead, we argue for the importance of learning to segment and represent objects jointly. We demonstrate that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations. Our method learns – without supervision – to inpaint occluded parts, and extrapolates to scenes with more objects and to unseen objects with novel feature combinations. We also show that, due to the use of iterative variational inference, our system is able to learn multi-modal posteriors for ambiguous inputs and extends naturally to sequences.
Tasks Representation Learning
Published 2019-03-01
URL https://arxiv.org/abs/1903.00450v2
PDF https://arxiv.org/pdf/1903.00450v2.pdf
PWC https://paperswithcode.com/paper/multi-object-representation-learning-with
Repo https://github.com/MichaelKevinKelly/IODINE
Framework pytorch
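
Iterative variational inference here means repeatedly refining per-slot posterior parameters using the current reconstruction error. The loop below is a heavily simplified sketch; `decoder` and `refiner` are hypothetical modules, and the plain MSE stands in for the full ELBO the paper optimizes.

```python
import torch

def iterative_inference(image, decoder, refiner, mu, logvar, steps=5):
    """Shape of iterative inference over K object slots (schematic only)."""
    for _ in range(steps):
        mu = mu.detach().requires_grad_(True)
        logvar = logvar.detach().requires_grad_(True)
        z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)   # reparameterized sample
        rgb, masks = decoder(z)        # per-slot images (B, K, C, H, W) and mixing weights (B, K, 1, H, W)
        recon = (masks * rgb).sum(dim=1)
        loss = ((image - recon) ** 2).mean()
        g_mu, g_lv = torch.autograd.grad(loss, (mu, logvar))
        mu, logvar = refiner(mu, logvar, g_mu, g_lv)           # learned posterior update
    return mu, logvar
```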

Semantic Foreground Inpainting from Weak Supervision

Title Semantic Foreground Inpainting from Weak Supervision
Authors Chenyang Lu, Gijs Dubbelman
Abstract Semantic scene understanding is an essential task for self-driving vehicles and mobile robots. In our work, we aim to estimate a semantic segmentation map, in which the foreground objects are removed and semantically inpainted with background classes, from a single RGB image. This semantic foreground inpainting task is performed by a single-stage convolutional neural network (CNN) that contains our novel max-pooling as inpainting (MPI) module, which is trained with weak supervision, i.e., it does not require manual background annotations for the foreground regions to be inpainted. Our approach is inherently more efficient than the previous two-stage state-of-the-art method, and outperforms it by a margin of 3% IoU for the inpainted foreground regions on Cityscapes. The performance margin increases to 6% IoU, when tested on the unseen KITTI dataset. The code and the manually annotated datasets for testing are shared with the research community at https://github.com/Chenyang-Lu/semantic-foreground-inpainting.
Tasks Scene Understanding, Semantic Segmentation
Published 2019-09-10
URL https://arxiv.org/abs/1909.04564v3
PDF https://arxiv.org/pdf/1909.04564v3.pdf
PWC https://paperswithcode.com/paper/semantic-foreground-inpainting-from-weak
Repo https://github.com/Chenyang-Lu/semantic-foreground-inpainting
Framework none
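
The max-pooling as inpainting (MPI) idea can be pictured as letting background feature responses flood into masked-out foreground regions. Below is a schematic sketch, assuming a binary foreground mask aligned with the feature map; it is a reading of the idea, not the paper's exact module.

```python
import torch
import torch.nn.functional as F

def max_pool_inpaint(features, fg_mask, iters=5):
    """Fill foreground regions of a feature map with surrounding background
    responses via repeated 3x3 max pooling.

    features: (B, C, H, W) feature map.
    fg_mask:  (B, 1, H, W) binary mask, 1 on foreground pixels to be inpainted.
    """
    filled = features * (1 - fg_mask)                            # erase foreground features
    for _ in range(iters):
        pooled = F.max_pool2d(filled, 3, stride=1, padding=1)    # spread background one step inward
        filled = torch.where(fg_mask.bool(), pooled, filled)     # overwrite only inside the mask
    return filled
```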

On large-scale genre classification in symbolically encoded music by automatic identification of repeating patterns

Title On large-scale genre classification in symbolically encoded music by automatic identification of repeating patterns
Authors Andres Ferraro, Kjell Lemström
Abstract The importance of repetitions in music is well-known. In this paper, we study music repetitions in the context of effective and efficient automatic genre classification in large-scale music-databases. We aim at enhancing the access and organization of pieces of music in Digital Libraries by allowing automatic categorization of entire collections by considering only their musical content. We hand over to the public a set of genre-specific patterns to support research in musicology. The patterns can be used, for instance, to explore and analyze the relations between musical genres. There are many existing algorithms that could be used to identify and extract repeating patterns in symbolically encoded music. In our case, the extracted patterns are used as representations of the pieces of music on the underlying corpus and, subsequently, to train and evaluate a classifier to automatically identify genres. In this paper, we apply two very fast algorithms enabling us to experiment on large and diverse corpora. Thus, we are able to find patterns with strong discrimination power that can be used in various applications. We carried out experiments on a corpus containing over 40,000 MIDI files annotated with at least one genre. The experiments suggest that our approach is scalable and capable of dealing with real-world-size music collections.
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09242v1
PDF https://arxiv.org/pdf/1910.09242v1.pdf
PWC https://paperswithcode.com/paper/on-large-scale-genre-classification-in
Repo https://github.com/andrebola/patterns-genres
Framework none
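
Once repeating patterns are extracted, each piece becomes a bag of pattern identifiers that a standard classifier can consume. The toy sketch below uses made-up pattern tokens and assumes the pattern-discovery step (the paper's fast algorithms over symbolic music) has already run.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each piece represented by its extracted repeating patterns, serialized as
# whitespace-separated tokens (hypothetical preprocessing and labels).
pieces = ["p12 p87 p87 p31", "p55 p55 p44", "p12 p31 p99"]
genres = ["jazz", "classical", "jazz"]

clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(pieces, genres)
print(clf.predict(["p87 p31"]))   # classify a new piece from its pattern counts
```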

Learning Data Augmentation Strategies for Object Detection

Title Learning Data Augmentation Strategies for Object Detection
Authors Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le
Abstract Data augmentation is a critical component of training deep learning models. Although data augmentation has been shown to significantly improve image classification, its potential has not been thoroughly investigated for object detection. Given the additional cost for annotating images for object detection, data augmentation may be of even greater importance for this computer vision task. In this work, we study the impact of data augmentation on object detection. We first demonstrate that data augmentation operations borrowed from image classification may be helpful for training detection models, but the improvement is limited. Thus, we investigate how learned, specialized data augmentation policies improve generalization performance for detection models. Importantly, these augmentation policies only affect training and leave a trained model unchanged during evaluation. Experiments on the COCO dataset indicate that an optimized data augmentation policy improves detection accuracy by more than +2.3 mAP, and allows a single inference model to achieve a state-of-the-art accuracy of 50.7 mAP. Importantly, the best policy found on COCO may be transferred unchanged to other detection datasets and models to improve predictive accuracy. For example, the best augmentation policy identified with COCO improves a strong baseline on PASCAL-VOC by +2.7 mAP. Our results also reveal that a learned augmentation policy is superior to state-of-the-art architecture regularization methods for object detection, even when considering strong baselines. Code for training with the learned policy is available online at https://github.com/tensorflow/tpu/tree/master/models/official/detection
Tasks Data Augmentation, Image Augmentation, Image Classification, Object Detection
Published 2019-06-26
URL https://arxiv.org/abs/1906.11172v1
PDF https://arxiv.org/pdf/1906.11172v1.pdf
PWC https://paperswithcode.com/paper/learning-data-augmentation-strategies-for
Repo https://github.com/chenyouxin113/sota-status-investigation
Framework tf
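
A learned detection augmentation policy can be thought of as a set of sub-policies, each a short sequence of (operation, probability, magnitude) triples applied only at training time. The operation names, values, and `ops` registry below are illustrative placeholders, not the policy learned in the paper.

```python
import random

# Hypothetical policy: each sub-policy is a sequence of
# (operation name, application probability, magnitude) triples.
POLICY = [
    [("color_jitter", 0.6, 4), ("translate_boxes_y", 0.4, 6)],
    [("equalize", 0.8, None), ("bbox_cutout", 0.3, 2)],
]

def apply_policy(image, boxes, ops, policy=POLICY):
    """Apply one randomly chosen sub-policy to an (image, boxes) pair.
    `ops` maps operation names to callables; only used during training,
    evaluation sees unmodified images."""
    for name, prob, magnitude in random.choice(policy):
        if random.random() < prob:
            image, boxes = ops[name](image, boxes, magnitude)
    return image, boxes
```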

Unified Vision-Language Pre-Training for Image Captioning and VQA

Title Unified Vision-Language Pre-Training for Image Captioning and VQA
Authors Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao
Abstract This paper presents a unified Vision-Language Pre-training (VLP) model. The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models. The unified VLP model is pre-trained on a large amount of image-text pairs using the unsupervised learning objectives of two tasks: bidirectional and sequence-to-sequence (seq2seq) masked vision-language prediction. The two tasks differ solely in what context the prediction conditions on. This is controlled by utilizing specific self-attention masks for the shared transformer network. To the best of our knowledge, VLP is the first reported model that achieves state-of-the-art results on both vision-language generation and understanding tasks, as disparate as image captioning and visual question answering, across three challenging benchmark datasets: COCO Captions, Flickr30k Captions, and VQA 2.0. The code and the pre-trained models are available at https://github.com/LuoweiZhou/VLP.
Tasks Image Captioning, Question Answering, Text Generation, Visual Question Answering
Published 2019-09-24
URL https://arxiv.org/abs/1909.11059v3
PDF https://arxiv.org/pdf/1909.11059v3.pdf
PWC https://paperswithcode.com/paper/unified-vision-language-pre-training-for
Repo https://github.com/LuoweiZhou/VLP
Framework pytorch
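
The bidirectional and seq2seq objectives share one transformer and differ only in the self-attention mask. Below is a schematic sketch of the two mask types over concatenated image regions and caption tokens; the layout illustrates the idea rather than the exact implementation.

```python
import torch

def attention_mask(n_img, n_txt, seq2seq=True):
    """Build a (n_img + n_txt) x (n_img + n_txt) self-attention mask.

    Bidirectional: every position attends to every position.
    Seq2seq: image regions attend only to image regions, while each text
    token attends to all image regions and to earlier text tokens.
    """
    n = n_img + n_txt
    if not seq2seq:
        return torch.ones(n, n)
    mask = torch.zeros(n, n)
    mask[:n_img, :n_img] = 1                                      # image -> image
    mask[n_img:, :n_img] = 1                                      # text -> image
    mask[n_img:, n_img:] = torch.tril(torch.ones(n_txt, n_txt))   # causal text -> text
    return mask
```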