Paper Group AWR 169
Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards
Title | Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards |
Authors | Alexander Trott, Stephan Zheng, Caiming Xiong, Richard Socher |
Abstract | While using shaped rewards can be beneficial when solving sparse reward tasks, their successful application often requires careful engineering and is problem specific. For instance, in tasks where the agent must achieve some goal state, simple distance-to-goal reward shaping often fails, as it renders learning vulnerable to local optima. We introduce a simple and effective model-free method to learn from shaped distance-to-goal rewards on tasks where success depends on reaching a goal state. Our method introduces an auxiliary distance-based reward based on pairs of rollouts to encourage diverse exploration. This approach effectively prevents learning dynamics from stabilizing around local optima induced by the naive distance-to-goal reward shaping and enables policies to efficiently solve sparse reward tasks. Our augmented objective does not require any additional reward engineering or domain expertise to implement and converges to the original sparse objective as the agent learns to solve the task. We demonstrate that our method successfully solves a variety of hard-exploration tasks (including maze navigation and 3D construction in a Minecraft environment), where naive distance-based reward shaping otherwise fails, and intrinsic curiosity and reward relabeling strategies exhibit poor performance. |
Tasks | |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01417v1 |
https://arxiv.org/pdf/1911.01417v1.pdf | |
PWC | https://paperswithcode.com/paper/keeping-your-distance-solving-sparse-reward |
Repo | https://github.com/salesforce/sibling-rivalry |
Framework | pytorch |
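The auxiliary pairwise reward described in the abstract can be sketched as follows; this is a minimal illustration of the core idea, with the distance function, clipping threshold, and reward attribution chosen for simplicity rather than taken from the released sibling-rivalry implementation.

```python
import numpy as np

def self_balancing_reward(terminal_a, terminal_b, goal, epsilon=1.0):
    """Sketch of a pairwise shaped reward for two sibling rollouts.

    Each rollout is rewarded for ending near the goal, plus a bonus for
    ending far from its sibling's terminal state (clipped at epsilon) to
    discourage both rollouts from collapsing onto the same local optimum.
    """
    d = lambda x, y: np.linalg.norm(np.asarray(x) - np.asarray(y))
    r_a = -d(terminal_a, goal) + min(d(terminal_a, terminal_b), epsilon)
    r_b = -d(terminal_b, goal) + min(d(terminal_b, terminal_a), epsilon)
    return r_a, r_b

# Toy usage: two rollouts ending at different points, goal at the origin.
print(self_balancing_reward([0.5, 0.0], [2.0, 1.0], goal=[0.0, 0.0]))
```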
Mixture Models for Diverse Machine Translation: Tricks of the Trade
Title | Mixture Models for Diverse Machine Translation: Tricks of the Trade |
Authors | Tianxiao Shen, Myle Ott, Michael Auli, Marc’Aurelio Ranzato |
Abstract | Mixture models are among the simplest and most widely used latent variable models, yet they have received relatively little attention in text generation applications such as machine translation, where a latent variable can be used to produce a diverse set of hypotheses. This paper presents an extensive empirical study of mixture models for diverse machine translation, analyzing how design choices such as the parameterization, the prior, hard versus soft EM, and online versus offline assignment affect training degeneracies and final performance, and identifying variants that offer a favorable trade-off between translation quality and diversity. |
Tasks | Machine Translation, Text Generation |
Published | 2019-02-20 |
URL | https://arxiv.org/abs/1902.07816v2 |
https://arxiv.org/pdf/1902.07816v2.pdf | |
PWC | https://paperswithcode.com/paper/mixture-models-for-diverse-machine |
Repo | https://github.com/pytorch/fairseq |
Framework | pytorch |
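As a rough illustration of hard-EM training of a mixture of models (not fairseq's implementation), the sketch below assigns each example to the component with the lowest loss and updates only that component; the loss function and toy regression data are stand-ins for the translation setting.

```python
import torch

def hard_em_step(components, optimizer, X, Y, loss_fn):
    """One hard-EM update for a mixture of models (sketch).

    E-step: assign each example to the component with the smallest loss.
    M-step: backpropagate each example's loss only through its assigned component.
    loss_fn(model, X, Y) must return per-example losses of shape (N,).
    """
    with torch.no_grad():
        losses = torch.stack([loss_fn(c, X, Y) for c in components])  # (K, N)
        assignment = losses.argmin(dim=0)                              # hard responsibilities

    optimizer.zero_grad()
    total = torch.tensor(0.0)
    for k, c in enumerate(components):
        mask = assignment == k
        if mask.any():
            total = total + loss_fn(c, X[mask], Y[mask]).sum()
    total.backward()
    optimizer.step()

# Toy usage: two linear "experts" fit to a regression batch.
comps = [torch.nn.Linear(4, 1), torch.nn.Linear(4, 1)]
opt = torch.optim.SGD([p for c in comps for p in c.parameters()], lr=0.1)
per_example = lambda c, X, Y: ((c(X) - Y) ** 2).mean(dim=1)
hard_em_step(comps, opt, torch.randn(16, 4), torch.randn(16, 1), per_example)
```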
Factor Graph Neural Network
Title | Factor Graph Neural Network |
Authors | Zhen Zhang, Fan Wu, Wee Sun Lee |
Abstract | Most of the successful deep neural network architectures are structured, often consisting of elements like convolutional neural networks and gated recurrent neural networks. Recently, graph neural networks have been successfully applied to graph structured data such as point clouds and molecular data. These networks often only consider pairwise dependencies, as they operate on a graph structure. We generalize the graph neural network into a factor graph neural network (FGNN) in order to capture higher order dependencies. We show that FGNN is able to represent Max-Product Belief Propagation, an approximate inference algorithm on probabilistic graphical models; hence it is able to do well when Max-Product does well. Promising results on both synthetic and real datasets demonstrate the effectiveness of the proposed model. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00554v1 |
https://arxiv.org/pdf/1906.00554v1.pdf | |
PWC | https://paperswithcode.com/paper/190600554 |
Repo | https://github.com/zzhang1987/Factor-Graph-Neural-Network |
Framework | pytorch |
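To make the higher-order message passing concrete, here is a minimal sketch of one factor-graph layer under simplifying assumptions (it is not the authors' code): variable features are gathered into each factor, transformed, and scattered back with max aggregation, loosely mirroring max-product message passing.

```python
import torch

def factor_graph_layer(var_feats, factors, W_vf, W_fv):
    """One sketch of a factor-graph neural network layer.

    var_feats: (num_vars, d) variable node features.
    factors:   list of index tuples, one tuple of variable indices per factor.
    W_vf, W_fv: linear maps (d -> d) for variable-to-factor and factor-to-variable steps.
    Factor features are formed by max-pooling their member variables; updated
    variable features take the max over messages from incident factors.
    """
    out = var_feats.clone()
    for f in factors:
        members = var_feats[list(f)]                                  # gather variables in this factor
        factor_feat = torch.relu(W_vf(members)).max(dim=0).values     # variable -> factor
        msg = torch.relu(W_fv(factor_feat))                           # factor -> variable message
        out[list(f)] = torch.maximum(out[list(f)], msg)               # max aggregation at variables
    return out

# Toy usage: 4 variables, two ternary factors.
d = 8
W_vf, W_fv = torch.nn.Linear(d, d), torch.nn.Linear(d, d)
x = torch.randn(4, d)
print(factor_graph_layer(x, [(0, 1, 2), (1, 2, 3)], W_vf, W_fv).shape)
```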
Scoring Sentence Singletons and Pairs for Abstractive Summarization
Title | Scoring Sentence Singletons and Pairs for Abstractive Summarization |
Authors | Logan Lebanoff, Kaiqiang Song, Franck Dernoncourt, Doo Soon Kim, Seokhwan Kim, Walter Chang, Fei Liu |
Abstract | When writing a summary, humans tend to choose content from one or two sentences and merge them into a single summary sentence. However, the mechanisms behind the selection of one or multiple source sentences remain poorly understood. Sentence fusion assumes multi-sentence input; yet sentence selection methods only work with single sentences and not combinations of them. There is thus a crucial gap between sentence selection and fusion to support summarizing by both compressing single sentences and fusing pairs. This paper attempts to bridge the gap by ranking sentence singletons and pairs together in a unified space. Our proposed framework attempts to model human methodology by selecting either a single sentence or a pair of sentences, then compressing or fusing the sentence(s) to produce a summary sentence. We conduct extensive experiments on both single- and multi-document summarization datasets and report findings on sentence selection and abstraction. |
Tasks | Abstractive Text Summarization, Document Summarization, Multi-Document Summarization |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1906.00077v1 |
https://arxiv.org/pdf/1906.00077v1.pdf | |
PWC | https://paperswithcode.com/paper/190600077 |
Repo | https://github.com/ucfnlp/summarization-sing-pair-mix |
Framework | tf |
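The pipeline described in the abstract above ranks sentence singletons and pairs in one unified space and then compresses or fuses the selected text. The snippet below sketches the candidate enumeration and ranking step; the scoring function is a caller-supplied placeholder standing in for the learned scorer, which is not reproduced here.

```python
from itertools import combinations

def rank_singletons_and_pairs(sentences, score_fn, top_n=3):
    """Rank single sentences and sentence pairs in one unified space (sketch).

    score_fn maps a tuple of 1 or 2 sentences to a relevance score; in the
    paper this is a learned model, here it is supplied by the caller.
    """
    candidates = [(s,) for s in sentences] + list(combinations(sentences, 2))
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:top_n]

# Toy usage with a trivial length-based stand-in for the learned scorer.
doc = ["The storm hit the coast on Monday.",
       "Thousands of homes lost power.",
       "Officials opened emergency shelters."]
for cand in rank_singletons_and_pairs(doc, score_fn=lambda c: sum(len(s) for s in c)):
    print(cand)
```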
word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
Title | word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs |
Authors | Yo Joong Choe, Kyubyong Park, Dongwoo Kim |
Abstract | We present word2word, a publicly available dataset and an open-source Python package for cross-lingual word translations extracted from sentence-level parallel corpora. Our dataset provides top-k word translations in 3,564 (directed) language pairs across 62 languages in OpenSubtitles2018 (Lison et al., 2018). To obtain this dataset, we use a count-based bilingual lexicon extraction model based on the observation that not only source and target words but also source words themselves can be highly correlated. We illustrate that the resulting bilingual lexicons have high coverage and attain competitive translation quality for several language pairs. We wrap our dataset and model in an easy-to-use Python library, which supports downloading and retrieving top-k word translations in any of the supported language pairs as well as computing top-k word translations for custom parallel corpora. |
Tasks | |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12019v1 |
https://arxiv.org/pdf/1911.12019v1.pdf | |
PWC | https://paperswithcode.com/paper/word2word-a-collection-of-bilingual-lexicons |
Repo | https://github.com/Kyubyong/word2word |
Framework | none |
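Based on the package's README, basic usage looks roughly like the snippet below; treat the exact class name and call signature as assumptions and consult the repository for the authoritative API.

```python
# pip install word2word  (assumed package name; see the repository for details)
from word2word import Word2word

en2fr = Word2word("en", "fr")   # downloads the en-fr lexicon on first use
print(en2fr("apple"))           # top-k French translations of "apple"
```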
Primal-Dual Block Frank-Wolfe
Title | Primal-Dual Block Frank-Wolfe |
Authors | Qi Lei, Jiacheng Zhuo, Constantine Caramanis, Inderjit S. Dhillon, Alexandros G. Dimakis |
Abstract | We propose a variant of the Frank-Wolfe algorithm for solving a class of sparse/low-rank optimization problems. Our formulation includes Elastic Net, regularized SVMs and phase retrieval as special cases. The proposed Primal-Dual Block Frank-Wolfe algorithm reduces the per-iteration cost while maintaining linear convergence rate. The per-iteration cost of our method depends on the structural complexity of the solution (i.e. sparsity/low-rank) instead of the ambient dimension. We empirically show that our algorithm outperforms the state-of-the-art methods on (multi-class) classification tasks. |
Tasks | |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02436v1 |
https://arxiv.org/pdf/1906.02436v1.pdf | |
PWC | https://paperswithcode.com/paper/primal-dual-block-frank-wolfe |
Repo | https://github.com/CarlsonZhuo/primal_dual_frank_wolfe |
Framework | none |
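For context, the snippet below implements a plain Frank-Wolfe iteration over an L1-ball constraint, whose linear minimization oracle touches a single coordinate per step (the kind of structural sparsity the abstract exploits). It is the classical method, not the paper's primal-dual block variant.

```python
import numpy as np

def frank_wolfe_l1(grad_fn, x0, radius=1.0, iters=100):
    """Classical Frank-Wolfe over the L1 ball of the given radius (sketch).

    The linear minimization oracle over the L1 ball returns a signed vertex
    along the coordinate with the largest-magnitude gradient, so each update
    moves along one coordinate direction only.
    """
    x = np.array(x0, dtype=float)
    for t in range(iters):
        g = grad_fn(x)
        i = np.argmax(np.abs(g))          # LMO: pick one coordinate
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (t + 2.0)           # standard step size
        x = (1 - gamma) * x + gamma * s
    return x

# Toy usage: minimize ||Ax - b||^2 over the L1 ball.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 10)), rng.normal(size=20)
x = frank_wolfe_l1(lambda x: 2 * A.T @ (A @ x - b), np.zeros(10), radius=2.0)
print(np.round(x, 3))
```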
Human-Centered Emotion Recognition in Animated GIFs
Title | Human-Centered Emotion Recognition in Animated GIFs |
Authors | Zhengyuan Yang, Yixuan Zhang, Jiebo Luo |
Abstract | As an intuitive way of expressing emotion, animated Graphical Interchange Format (GIF) images have been widely used on social media. Most previous studies on automated GIF emotion recognition fail to effectively utilize GIF’s unique properties, and this potentially limits the recognition performance. In this study, we demonstrate the importance of human-related information in GIFs and conduct human-centered GIF emotion recognition with a proposed Keypoint Attended Visual Attention Network (KAVAN). The framework consists of a facial attention module and a hierarchical segment temporal module. The facial attention module exploits the strong relationship between GIF contents and human characters, and extracts frame-level visual features with a focus on human faces. The Hierarchical Segment LSTM (HS-LSTM) module is then proposed to better learn global GIF representations. Our proposed framework outperforms the state-of-the-art on the MIT GIFGIF dataset. Furthermore, the facial attention module provides reliable facial region mask predictions, which improves the model’s interpretability. |
Tasks | Emotion Recognition |
Published | 2019-04-27 |
URL | http://arxiv.org/abs/1904.12201v1 |
http://arxiv.org/pdf/1904.12201v1.pdf | |
PWC | https://paperswithcode.com/paper/human-centered-emotion-recognition-in |
Repo | https://github.com/zyang-ur/human-centered-GIF |
Framework | none |
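A minimal sketch of the facial-attention step described above: frame-level features are pooled with spatial attention derived from face-keypoint heatmaps. The shapes and the softmax-based attention are assumptions for illustration, not the authors' KAVAN implementation.

```python
import torch

def facial_attention_pool(frame_feats, keypoint_heatmaps):
    """Pool spatial frame features with attention focused on facial regions (sketch).

    frame_feats:       (T, C, H, W) per-frame CNN feature maps.
    keypoint_heatmaps: (T, H, W) facial keypoint heatmaps (higher = more face).
    Returns (T, C) frame-level features weighted toward human faces.
    """
    T, C, H, W = frame_feats.shape
    attn = torch.softmax(keypoint_heatmaps.reshape(T, -1), dim=-1).reshape(T, 1, H, W)
    return (frame_feats * attn).sum(dim=(-1, -2))   # attention-weighted spatial pooling

# Toy usage.
feats = torch.randn(8, 64, 7, 7)
heat = torch.rand(8, 7, 7)
print(facial_attention_pool(feats, heat).shape)     # torch.Size([8, 64])
```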
Predictive Inequity in Object Detection
Title | Predictive Inequity in Object Detection |
Authors | Benjamin Wilson, Judy Hoffman, Jamie Morgenstern |
Abstract | In this work, we investigate whether state-of-the-art object detection systems have equitable predictive performance on pedestrians with different skin tones. This work is motivated by many recent examples of ML and vision systems displaying higher error rates for certain demographic groups than others. We annotate an existing large scale dataset which contains pedestrians, BDD100K, with Fitzpatrick skin tones in ranges [1-3] or [4-6]. We then provide an in-depth comparative analysis of performance between these two skin tone groupings, finding that neither time of day nor occlusion explain this behavior, suggesting this disparity is not merely the result of pedestrians in the 4-6 range appearing in more difficult scenes for detection. We investigate to what extent time of day, occlusion, and reweighting the supervised loss during training affect this predictive bias. |
Tasks | Object Detection |
Published | 2019-02-21 |
URL | http://arxiv.org/abs/1902.11097v1 |
http://arxiv.org/pdf/1902.11097v1.pdf | |
PWC | https://paperswithcode.com/paper/predictive-inequity-in-object-detection |
Repo | https://github.com/benjaminrwilson/inequity-release |
Framework | pytorch |
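The core measurement in this study is a comparison of detection performance across skin-tone groups. The sketch below computes a simple group-wise recall; the matching criterion and data layout are simplifying assumptions, not the paper's evaluation protocol.

```python
def groupwise_recall(detections, annotations, iou_thresh=0.5):
    """Recall per demographic group (sketch).

    annotations: list of dicts with keys 'box' and 'group'.
    detections:  list of predicted boxes [x1, y1, x2, y2].
    A ground-truth box counts as detected if some prediction overlaps it
    with IoU >= iou_thresh.
    """
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-9)

    hits, totals = {}, {}
    for gt in annotations:
        g = gt["group"]
        totals[g] = totals.get(g, 0) + 1
        if any(iou(gt["box"], d) >= iou_thresh for d in detections):
            hits[g] = hits.get(g, 0) + 1
    return {g: hits.get(g, 0) / totals[g] for g in totals}

# Toy usage with two groups labeled by Fitzpatrick range.
gts = [{"box": [0, 0, 10, 10], "group": "FP 1-3"},
       {"box": [20, 20, 30, 30], "group": "FP 4-6"}]
print(groupwise_recall(detections=[[1, 1, 10, 10]], annotations=gts))
```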
Correlated Variational Auto-Encoders
Title | Correlated Variational Auto-Encoders |
Authors | Da Tang, Dawen Liang, Tony Jebara, Nicholas Ruozzi |
Abstract | Variational Auto-Encoders (VAEs) are capable of learning latent representations for high dimensional data. However, due to the i.i.d. assumption, VAEs only optimize the singleton variational distributions and fail to account for the correlations between data points, which might be crucial for learning latent representations from datasets where a priori we know correlations exist. We propose Correlated Variational Auto-Encoders (CVAEs) that can take the correlation structure into consideration when learning latent representations with VAEs. CVAEs apply a prior based on the correlation structure. To address the intractability introduced by the correlated prior, we develop an approximation by averaging a set of tractable lower bounds over all maximal acyclic subgraphs of the undirected correlation graph. Experimental results on matching and link prediction on public benchmark rating datasets and spectral clustering on a synthetic dataset show the effectiveness of the proposed method over baseline algorithms. |
Tasks | Link Prediction |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05335v4 |
https://arxiv.org/pdf/1905.05335v4.pdf | |
PWC | https://paperswithcode.com/paper/correlated-variational-auto-encoders |
Repo | https://github.com/datang1992/Correlated-VAEs |
Framework | tf |
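The approximation described in the abstract, averaging tractable lower bounds over maximal acyclic subgraphs of the correlation graph, can be sketched structurally as follows. The per-subgraph bound is left as a caller-supplied function, and sampling random spanning forests is a simplifying assumption rather than the authors' exact procedure.

```python
import random

def random_spanning_forest(num_nodes, edges, rng):
    """Sample a maximal acyclic subgraph (spanning forest) of an undirected graph (sketch)."""
    parent = list(range(num_nodes))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    shuffled = edges[:]
    rng.shuffle(shuffled)
    kept = []
    for u, v in shuffled:
        ru, rv = find(u), find(v)
        if ru != rv:              # adding this edge keeps the subgraph acyclic
            parent[ru] = rv
            kept.append((u, v))
    return kept

def averaged_lower_bound(num_nodes, edges, tree_bound, num_samples=8):
    """Average a tractable per-subgraph bound over sampled maximal acyclic subgraphs."""
    rng = random.Random(0)
    forests = [random_spanning_forest(num_nodes, edges, rng) for _ in range(num_samples)]
    return sum(tree_bound(f) for f in forests) / num_samples

# Toy usage: the bound here is a stand-in counting kept correlation edges.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(averaged_lower_bound(4, edges, tree_bound=lambda f: float(len(f))))
```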
Deep Anomaly Detection with Deviation Networks
Title | Deep Anomaly Detection with Deviation Networks |
Authors | Guansong Pang, Chunhua Shen, Anton van den Hengel |
Abstract | Although deep learning has been applied to successfully address many data mining problems, relatively limited work has been done on deep learning for anomaly detection. Existing deep anomaly detection methods, which focus on learning new feature representations to enable downstream anomaly detection methods, perform indirect optimization of anomaly scores, leading to data-inefficient learning and suboptimal anomaly scoring. Also, they are typically designed as unsupervised learning due to the lack of large-scale labeled anomaly data. As a result, it is difficult for them to leverage prior knowledge (e.g., a few labeled anomalies) when such information is available, as it is in many real-world anomaly detection applications. This paper introduces a novel anomaly detection framework and its instantiation to address these problems. Instead of representation learning, our method performs end-to-end learning of anomaly scores via neural deviation learning, in which we leverage a few (e.g., several to dozens of) labeled anomalies and a prior probability to enforce statistically significant deviations of the anomaly scores of anomalies from those of normal data objects in the upper tail. Extensive results show that our method can be trained substantially more data-efficiently and achieves significantly better anomaly scoring than state-of-the-art competing methods. |
Tasks | Anomaly Detection, Cyber Attack Detection, Fraud Detection, Network Intrusion Detection, Representation Learning |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08623v1 |
https://arxiv.org/pdf/1911.08623v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-anomaly-detection-with-deviation |
Repo | https://github.com/GuansongPang/deviation-network |
Framework | tf |
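The deviation-learning objective described above can be sketched as follows: an anomaly score is standardized against scores drawn from a standard normal prior, and a contrastive-style loss pulls normal points toward zero deviation while pushing labeled anomalies beyond a margin. The margin and number of prior samples are assumptions chosen to match the abstract's description, not values taken from the released code.

```python
import torch

def deviation_loss(scores, labels, margin=5.0, prior_samples=5000):
    """Deviation loss sketch: labels=1 mark labeled anomalies, labels=0 normal data.

    The reference mean/std come from scores drawn from a standard normal prior;
    normal points are pulled toward zero deviation, anomalies pushed beyond the margin.
    """
    ref = torch.randn(prior_samples)                   # prior over normal anomaly scores
    dev = (scores - ref.mean()) / (ref.std() + 1e-8)   # standardized deviation
    inlier_term = (1 - labels) * dev.abs()
    outlier_term = labels * torch.clamp(margin - dev, min=0.0)
    return (inlier_term + outlier_term).mean()

# Toy usage: a scalar-output scorer over feature vectors.
scorer = torch.nn.Linear(16, 1)
x, y = torch.randn(32, 16), torch.randint(0, 2, (32,)).float()
loss = deviation_loss(scorer(x).squeeze(-1), y)
loss.backward()
print(float(loss))
```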
Multi-Object Representation Learning with Iterative Variational Inference
Title | Multi-Object Representation Learning with Iterative Variational Inference |
Authors | Klaus Greff, Raphaël Lopez Kaufman, Rishabh Kabra, Nick Watters, Chris Burgess, Daniel Zoran, Loic Matthey, Matthew Botvinick, Alexander Lerchner |
Abstract | Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even considering multiple objects, or treats segmentation as an (often supervised) preprocessing step. Instead, we argue for the importance of learning to segment and represent objects jointly. We demonstrate that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations. Our method learns – without supervision – to inpaint occluded parts, and extrapolates to scenes with more objects and to unseen objects with novel feature combinations. We also show that, due to the use of iterative variational inference, our system is able to learn multi-modal posteriors for ambiguous inputs and extends naturally to sequences. |
Tasks | Representation Learning |
Published | 2019-03-01 |
URL | https://arxiv.org/abs/1903.00450v2 |
https://arxiv.org/pdf/1903.00450v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-object-representation-learning-with |
Repo | https://github.com/MichaelKevinKelly/IODINE |
Framework | pytorch |
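A structural sketch of the iterative variational inference loop over K object slots described above, with tiny placeholder networks; it is not the authors' implementation. Posterior parameters for every slot are refined for several steps using gradients of a reconstruction-plus-KL objective.

```python
import torch

def iterative_refinement(image, decoder, refine_net, K=4, z_dim=16, steps=3):
    """Iterative variational inference over K object slots (structural sketch).

    image:      (C, H, W) input.
    decoder:    maps a (K, z_dim) latent sample to ((K, C, H, W) rgb, (K, 1, H, W) mask logits).
    refine_net: maps concatenated [mu, logvar, their gradients] to additive updates.
    """
    mu = torch.zeros(K, z_dim, requires_grad=True)
    logvar = torch.zeros(K, z_dim, requires_grad=True)
    for _ in range(steps):
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterized sample
        rgb, mask_logits = decoder(z)
        masks = torch.softmax(mask_logits, dim=0)              # slots compete per pixel
        recon = (masks * rgb).sum(dim=0)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum()
        loss = ((recon - image) ** 2).sum() + kl
        g_mu, g_logvar = torch.autograd.grad(loss, (mu, logvar))
        update = refine_net(torch.cat([mu, logvar, g_mu, g_logvar], dim=-1))
        d_mu, d_logvar = update.chunk(2, dim=-1)
        mu = (mu + d_mu).detach().requires_grad_(True)         # refined posterior means
        logvar = (logvar + d_logvar).detach().requires_grad_(True)
    return mu, logvar

# Toy usage with tiny placeholder networks.
z_dim, K, C, H, W = 16, 4, 3, 8, 8
dec_net = torch.nn.Linear(z_dim, (C + 1) * H * W)
def decoder(z):
    out = dec_net(z).reshape(K, C + 1, H, W)
    return out[:, :C], out[:, C:]
refine_net = torch.nn.Linear(4 * z_dim, 2 * z_dim)
mu, logvar = iterative_refinement(torch.rand(C, H, W), decoder, refine_net, K, z_dim)
print(mu.shape, logvar.shape)
```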
Semantic Foreground Inpainting from Weak Supervision
Title | Semantic Foreground Inpainting from Weak Supervision |
Authors | Chenyang Lu, Gijs Dubbelman |
Abstract | Semantic scene understanding is an essential task for self-driving vehicles and mobile robots. In our work, we aim to estimate a semantic segmentation map, in which the foreground objects are removed and semantically inpainted with background classes, from a single RGB image. This semantic foreground inpainting task is performed by a single-stage convolutional neural network (CNN) that contains our novel max-pooling as inpainting (MPI) module, which is trained with weak supervision, i.e., it does not require manual background annotations for the foreground regions to be inpainted. Our approach is inherently more efficient than the previous two-stage state-of-the-art method, and outperforms it by a margin of 3% IoU for the inpainted foreground regions on Cityscapes. The performance margin increases to 6% IoU, when tested on the unseen KITTI dataset. The code and the manually annotated datasets for testing are shared with the research community at https://github.com/Chenyang-Lu/semantic-foreground-inpainting. |
Tasks | Scene Understanding, Semantic Segmentation |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04564v3 |
https://arxiv.org/pdf/1909.04564v3.pdf | |
PWC | https://paperswithcode.com/paper/semantic-foreground-inpainting-from-weak |
Repo | https://github.com/Chenyang-Lu/semantic-foreground-inpainting |
Framework | none |
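The max-pooling as inpainting (MPI) idea can be sketched as follows; this is an interpretation of the abstract, not the released code. Foreground locations in a (post-ReLU, non-negative) feature map are zeroed and then repeatedly filled from their surroundings by masked max-pooling.

```python
import torch
import torch.nn.functional as F

def max_pool_inpaint(features, fg_mask, iters=5):
    """Fill foreground feature locations from background context via max-pooling (sketch).

    features: (B, C, H, W) feature map, assumed non-negative (post-ReLU),
              so zeros mark empty locations.
    fg_mask:  (B, 1, H, W) binary mask, 1 where foreground objects are removed.
    """
    bg = features * (1 - fg_mask)
    filled = bg
    for _ in range(iters):
        pooled = F.max_pool2d(filled, kernel_size=3, stride=1, padding=1)
        filled = torch.where(fg_mask.bool(), pooled, bg)   # only overwrite foreground
    return filled

# Toy usage: a 1-channel map with a foreground square in the middle.
x = torch.rand(1, 1, 16, 16)
m = torch.zeros(1, 1, 16, 16)
m[..., 6:10, 6:10] = 1
print(max_pool_inpaint(x, m).shape)
```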
On large-scale genre classification in symbolically encoded music by automatic identification of repeating patterns
Title | On large-scale genre classification in symbolically encoded music by automatic identification of repeating patterns |
Authors | Andres Ferraro, Kjell Lemström |
Abstract | The importance of repetitions in music is well-known. In this paper, we study music repetitions in the context of effective and efficient automatic genre classification in large-scale music databases. We aim to enhance the access to and organization of pieces of music in Digital Libraries by allowing automatic categorization of entire collections by considering only their musical content. We hand over to the public a set of genre-specific patterns to support research in musicology. The patterns can be used, for instance, to explore and analyze the relations between musical genres. There are many existing algorithms that could be used to identify and extract repeating patterns in symbolically encoded music. In our case, the extracted patterns are used as representations of the pieces of music in the underlying corpus and, subsequently, to train and evaluate a classifier to automatically identify genres. In this paper, we apply two very fast algorithms enabling us to experiment on large and diverse corpora. Thus, we are able to find patterns with strong discrimination power that can be used in various applications. We carried out experiments on a corpus containing over 40,000 MIDI files annotated with at least one genre. The experiments suggest that our approach is scalable and capable of dealing with real-world-size music collections. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09242v1 |
https://arxiv.org/pdf/1910.09242v1.pdf | |
PWC | https://paperswithcode.com/paper/on-large-scale-genre-classification-in |
Repo | https://github.com/andrebola/patterns-genres |
Framework | none |
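The classification pipeline implied by the abstract, repeating patterns extracted from each piece and used as features for a genre classifier, can be sketched as a bag-of-patterns model. The pattern IDs below are hypothetical outputs of a pattern-discovery step, which is not shown.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each piece is represented by the IDs of repeating patterns found in it
# (hypothetical output of a pattern-discovery algorithm), joined as tokens.
pieces = ["p12 p7 p7 p33", "p7 p41 p41", "p5 p5 p12 p90", "p41 p90 p90"]
genres = ["classical", "jazz", "classical", "jazz"]

clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(pieces, genres)
print(clf.predict(["p7 p41 p12"]))   # genre prediction from pattern occurrences
```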
Learning Data Augmentation Strategies for Object Detection
Title | Learning Data Augmentation Strategies for Object Detection |
Authors | Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le |
Abstract | Data augmentation is a critical component of training deep learning models. Although data augmentation has been shown to significantly improve image classification, its potential has not been thoroughly investigated for object detection. Given the additional cost of annotating images for object detection, data augmentation may be of even greater importance for this computer vision task. In this work, we study the impact of data augmentation on object detection. We first demonstrate that data augmentation operations borrowed from image classification may be helpful for training detection models, but the improvement is limited. Thus, we investigate how learned, specialized data augmentation policies improve generalization performance for detection models. Importantly, these augmentation policies only affect training and leave a trained model unchanged during evaluation. Experiments on the COCO dataset indicate that an optimized data augmentation policy improves detection accuracy by more than +2.3 mAP and allows a single inference model to achieve a state-of-the-art accuracy of 50.7 mAP. Importantly, the best policy found on COCO may be transferred unchanged to other detection datasets and models to improve predictive accuracy. For example, the best augmentation policy identified with COCO improves a strong baseline on PASCAL-VOC by +2.7 mAP. Our results also reveal that a learned augmentation policy is superior to state-of-the-art architecture regularization methods for object detection, even when considering strong baselines. Code for training with the learned policy is available online at https://github.com/tensorflow/tpu/tree/master/models/official/detection |
Tasks | Data Augmentation, Image Augmentation, Image Classification, Object Detection |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.11172v1 |
https://arxiv.org/pdf/1906.11172v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-data-augmentation-strategies-for |
Repo | https://github.com/chenyouxin113/sota-status-investigation |
Framework | tf |
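A learned augmentation policy in this setting is typically a list of sub-policies, each a sequence of (operation, probability, magnitude) triples applied jointly to an image and its bounding boxes. The sketch below shows that structure with two illustrative toy operations; it is not the policy found by the paper, and the operations are stand-ins.

```python
import random

def apply_sub_policy(image, boxes, sub_policy, rng=random):
    """Apply one sub-policy: a sequence of (op, probability, magnitude) triples (sketch).

    Each op transforms the image and, when geometric, its bounding boxes too.
    """
    for op, prob, magnitude in sub_policy:
        if rng.random() < prob:
            image, boxes = op(image, boxes, magnitude)
    return image, boxes

# Two illustrative ops on a toy image given as a nested list of pixel rows.
def brightness(image, boxes, m):
    return [[min(255, px + 10 * m) for px in row] for row in image], boxes

def translate_x(image, boxes, m):
    shifted = [[x1 + m, y1, x2 + m, y2] for x1, y1, x2, y2 in boxes]
    return image, shifted          # image shift omitted for brevity

sub_policy = [(translate_x, 0.6, 4), (brightness, 0.8, 2)]
img, bxs = apply_sub_policy([[0, 10], [20, 30]], [[0, 0, 5, 5]], sub_policy)
print(img, bxs)
```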
Unified Vision-Language Pre-Training for Image Captioning and VQA
Title | Unified Vision-Language Pre-Training for Image Captioning and VQA |
Authors | Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao |
Abstract | This paper presents a unified Vision-Language Pre-training (VLP) model. The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models. The unified VLP model is pre-trained on a large amount of image-text pairs using the unsupervised learning objectives of two tasks: bidirectional and sequence-to-sequence (seq2seq) masked vision-language prediction. The two tasks differ solely in what context the prediction conditions on. This is controlled by utilizing specific self-attention masks for the shared transformer network. To the best of our knowledge, VLP is the first reported model that achieves state-of-the-art results on both vision-language generation and understanding tasks, as disparate as image captioning and visual question answering, across three challenging benchmark datasets: COCO Captions, Flickr30k Captions, and VQA 2.0. The code and the pre-trained models are available at https://github.com/LuoweiZhou/VLP. |
Tasks | Image Captioning, Question Answering, Text Generation, Visual Question Answering |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.11059v3 |
https://arxiv.org/pdf/1909.11059v3.pdf | |
PWC | https://paperswithcode.com/paper/unified-vision-language-pre-training-for |
Repo | https://github.com/LuoweiZhou/VLP |
Framework | pytorch |
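The key control described in the abstract above is the self-attention mask: the same transformer is trained with either bidirectional or seq2seq masked prediction depending on which positions each token may attend to. Below is a rough sketch of the two mask shapes for image tokens followed by text tokens; it is an interpretation of the abstract, not the released code.

```python
import torch

def attention_mask(num_img_tokens, num_txt_tokens, mode):
    """Build an (L, L) self-attention mask; True = position may be attended to (sketch).

    'bidirectional': every token attends to every token.
    'seq2seq':       all positions attend to the image tokens, while each text
                     token additionally attends only to earlier text tokens.
    """
    L = num_img_tokens + num_txt_tokens
    if mode == "bidirectional":
        return torch.ones(L, L, dtype=torch.bool)
    mask = torch.zeros(L, L, dtype=torch.bool)
    mask[:, :num_img_tokens] = True                          # everyone sees the image/context tokens
    causal = torch.tril(torch.ones(num_txt_tokens, num_txt_tokens)).bool()
    mask[num_img_tokens:, num_img_tokens:] = causal          # causal mask over the text segment
    return mask

print(attention_mask(2, 3, "seq2seq").int())
```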