February 2, 2020

3241 words 16 mins read

Paper Group AWR 24

Adversarial NLI: A New Benchmark for Natural Language Understanding. CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages. On Learning Paradigms for the Travelling Salesman Problem. Posterior inference unchained with EL_2O. Contrastive Multiview Coding. Meta-Learning Representations for Continual Learning. Learning What and Where …

Adversarial NLI: A New Benchmark for Natural Language Understanding

Title Adversarial NLI: A New Benchmark for Natural Language Understanding
Authors Yixin Nie, Adina Williams, Emily Dinan, Mohit Bansal, Jason Weston, Douwe Kiela
Abstract We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure. We show that training models on this new dataset leads to state-of-the-art performance on a variety of popular NLI benchmarks, while posing a more difficult challenge with its new test set. Our analysis sheds light on the shortcomings of current state-of-the-art models, and shows that non-expert annotators are successful at finding their weaknesses. The data collection method can be applied in a never-ending learning scenario, becoming a moving target for NLU, rather than a static benchmark that will quickly saturate.
Tasks
Published 2019-10-31
URL https://arxiv.org/abs/1910.14599v1
PDF https://arxiv.org/pdf/1910.14599v1.pdf
PWC https://paperswithcode.com/paper/adversarial-nli-a-new-benchmark-for-natural
Repo https://github.com/facebookresearch/anli
Framework none
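The collection procedure above is essentially a loop in which annotators keep writing hypotheses until the current model is fooled; verified fooling examples then seed the next round. A minimal sketch of that loop is below, with `ask_annotator` and `nli_model` as hypothetical placeholders for the human interface and the model in the loop, not the authors' actual tooling.

```python
def collect_round(contexts, nli_model, ask_annotator, max_tries=5):
    """Adversarial human-and-model-in-the-loop collection (illustrative only).

    ask_annotator(context) -> (target_label, hypothesis) written by a human
    nli_model(context, hypothesis) -> predicted label of the current model
    """
    collected = []
    for context in contexts:
        for _ in range(max_tries):
            target_label, hypothesis = ask_annotator(context)   # human writes a hypothesis
            predicted = nli_model(context, hypothesis)           # current model predicts
            if predicted != target_label:                        # model fooled -> keep the example
                collected.append((context, hypothesis, target_label))
                break
    return collected
```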

CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages

Title CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages
Authors Kyubyong Park, Thomas Mulc
Abstract We describe our development of CSS10, a collection of single speaker speech datasets for ten languages. It is composed of short audio clips from LibriVox audiobooks and their aligned texts. To validate its quality we train two neural text-to-speech models on each dataset. Subsequently, we conduct Mean Opinion Score tests on the synthesized speech samples. We make our datasets, pre-trained models, and test resources publicly available. We hope they will be used for future speech tasks.
Tasks
Published 2019-03-27
URL https://arxiv.org/abs/1903.11269v3
PDF https://arxiv.org/pdf/1903.11269v3.pdf
PWC https://paperswithcode.com/paper/css10-a-collection-of-single-speaker-speech
Repo https://github.com/Kyubyong/css10
Framework tf

On Learning Paradigms for the Travelling Salesman Problem

Title On Learning Paradigms for the Travelling Salesman Problem
Authors Chaitanya K. Joshi, Thomas Laurent, Xavier Bresson
Abstract We explore the impact of learning paradigms on training deep neural networks for the Travelling Salesman Problem. We design controlled experiments to train supervised learning (SL) and reinforcement learning (RL) models on fixed graph sizes up to 100 nodes, and evaluate them on variable sized graphs up to 500 nodes. Beyond not needing labelled data, our results reveal favorable properties of RL over SL: RL training leads to better emergent generalization to variable graph sizes and is a key component for learning scale-invariant solvers for novel combinatorial problems.
Tasks
Published 2019-10-16
URL https://arxiv.org/abs/1910.07210v2
PDF https://arxiv.org/pdf/1910.07210v2.pdf
PWC https://paperswithcode.com/paper/on-learning-paradigms-for-the-travelling
Repo https://github.com/chaitjo/learning-paradigms-for-tsp
Framework pytorch
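Both the SL and RL models in this paper are ultimately judged by the length of the tours they produce (for RL, tour length is also the reward signal). A small, purely illustrative helper for evaluating a predicted tour, not taken from the authors' repository:

```python
import numpy as np

def tour_length(coords, tour):
    """Total length of a closed TSP tour.

    coords: (n, 2) array of node coordinates
    tour:   permutation of range(n), e.g. as produced by an SL or RL decoder
    """
    ordered = coords[np.asarray(tour)]
    diffs = np.roll(ordered, -1, axis=0) - ordered   # edge vectors, wrapping back to the start
    return float(np.linalg.norm(diffs, axis=1).sum())

# Example: random 100-node instance evaluated on the identity tour
coords = np.random.rand(100, 2)
print(tour_length(coords, list(range(100))))
```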

Posterior inference unchained with EL_2O

Title Posterior inference unchained with EL_2O
Authors Uros Seljak, Byeonghee Yu
Abstract Statistical inference of analytically intractable posteriors is a difficult problem because of the need to marginalize over correlated variables, and stochastic methods such as MCMC and VI are commonly used. We argue that the stochastic KL divergence minimization used by MCMC and VI is noisy, and we instead propose EL_2O, expectation optimization of the squared L_2 distance between the approximate log posterior q and the un-normalized log posterior p. When sampling from q, the solutions agree with stochastic-KL-minimization VI in the large-sample limit; however, EL_2O is free of sampling noise, has better optimization properties, and requires only as many sample evaluations as the number of parameters being optimized if q covers p. As a consequence, increasing the expressivity of q improves both the quality of the results and the convergence rate, allowing EL_2O to approach exact inference. Automatic differentiation enables Hessian, gradient, and gradient-free versions of the method, which can determine M(M+2)/2+1, M+1, and 1 parameter(s) of q with a single sample, respectively. EL_2O provides a reliable estimate of the quality of the approximating posterior, and converges rapidly for a full-rank Gaussian approximation of q and extensions beyond it, such as nonlinear transformations and Gaussian mixtures. These can handle general posteriors while still allowing fast analytic marginalizations. We test it on several examples, including a realistic 13-dimensional galaxy clustering analysis, showing that it is several orders of magnitude faster than MCMC while giving smooth and accurate non-Gaussian posteriors, often requiring only a few to a few dozen iterations.
Tasks
Published 2019-01-14
URL https://arxiv.org/abs/1901.04454v2
PDF https://arxiv.org/pdf/1901.04454v2.pdf
PWC https://paperswithcode.com/paper/posterior-inference-unchained-with-el_2o
Repo https://github.com/bccp/DeepUQ
Framework tf
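The core idea is to fit q by matching log q to the un-normalized log p in a squared L_2 sense on samples drawn from q, rather than minimizing a noisy stochastic KL estimate. Below is a toy, value-matching (gradient-free) illustration for a one-dimensional Gaussian q; the target `log_p` is made up for the example, and the paper's gradient and Hessian variants need fewer samples per update.

```python
import numpy as np

def log_p(x):
    # Hypothetical unnormalized log posterior used only for this toy example
    return -0.5 * (x - 1.0) ** 2 / 0.3 - 0.1 * x ** 4

def el2o_gaussian(log_p, mu=0.0, sigma=1.0, n_samples=8, n_iter=30, seed=0):
    """Fit a 1-D Gaussian q to an unnormalized log posterior by least-squares
    matching of log q and log p on samples drawn from q (toy EL_2O-style loop)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        x = rng.normal(mu, sigma, n_samples)
        # Fit a quadratic a*x^2 + b*x + c to log p(x); c absorbs the normalization constant.
        A = np.stack([x ** 2, x, np.ones_like(x)], axis=1)
        a, b, _ = np.linalg.lstsq(A, log_p(x), rcond=None)[0]
        if a >= 0:                       # fitted curvature must be negative for a valid Gaussian
            continue
        sigma = np.sqrt(-0.5 / a)        # match the quadratic to log N(mu, sigma^2)
        mu = -b / (2.0 * a)
    return mu, sigma

print(el2o_gaussian(log_p))
```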

Contrastive Multiview Coding

Title Contrastive Multiview Coding
Authors Yonglong Tian, Dilip Krishnan, Phillip Isola
Abstract Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a “dog” can be seen, heard, and felt). We investigate the classic hypothesis that a powerful representation is one that models view-invariant factors. We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. Our approach scales to any number of views, and is view-agnostic. We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics. Our approach achieves state-of-the-art results on image and video unsupervised learning benchmarks. Code is released at: http://github.com/HobbitLong/CMC/.
Tasks Object Classification, Self-Supervised Image Classification
Published 2019-06-13
URL https://arxiv.org/abs/1906.05849v4
PDF https://arxiv.org/pdf/1906.05849v4.pdf
PWC https://paperswithcode.com/paper/contrastive-multiview-coding
Repo https://github.com/szq0214/Rethinking-Image-Mixture-for-Unsupervised-Learning
Framework pytorch
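The multiview contrastive loss pulls together embeddings of two views of the same scene and pushes apart embeddings of views from different scenes. A minimal InfoNCE-style sketch for two views in PyTorch, omitting the memory bank and the many-view extensions used in the released CMC code:

```python
import torch
import torch.nn.functional as F

def multiview_nce(z1, z2, temperature=0.07):
    """InfoNCE-style loss between embeddings of two views of the same scenes.

    z1, z2: (B, D) embeddings; row i of z1 and row i of z2 come from the same scene.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))           # positives sit on the diagonal
    # Symmetric loss: view1 -> view2 and view2 -> view1
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random "view" embeddings
loss = multiview_nce(torch.randn(16, 128), torch.randn(16, 128))
```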

Meta-Learning Representations for Continual Learning

Title Meta-Learning Representations for Continual Learning
Authors Khurram Javed, Martha White
Abstract A continual learning agent should be able to build on top of existing knowledge to learn on new data quickly while minimizing forgetting. Current intelligent systems based on neural network function approximators arguably do the opposite—they are highly prone to forgetting and rarely trained to facilitate future learning. One reason for this poor behavior is that they learn from a representation that is not explicitly trained for these two goals. In this paper, we propose OML, an objective that directly minimizes catastrophic interference by learning representations that accelerate future learning and are robust to forgetting under online updates in continual learning. We show that it is possible to learn naturally sparse representations that are more effective for online updating. Moreover, our algorithm is complementary to existing continual learning strategies, such as MER and GEM. Finally, we demonstrate that a basic online updating strategy on representations learned by OML is competitive with rehearsal based methods for continual learning. We release an implementation of our method at https://github.com/khurramjaved96/mrcl .
Tasks Continual Learning, Meta-Learning
Published 2019-05-29
URL https://arxiv.org/abs/1905.12588v2
PDF https://arxiv.org/pdf/1905.12588v2.pdf
PWC https://paperswithcode.com/paper/meta-learning-representations-for-continual
Repo https://github.com/Khurramjaved96/mrcl
Framework pytorch
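OML is a meta-objective: simulate online SGD on a trajectory through the representation, then update the representation so that this online learning interferes less with held-out data. A heavily simplified sketch of one meta-update with toy dimensions and random data (the real setup meta-learns a convolutional representation on Omniglot-style streams):

```python
import torch
import torch.nn.functional as F

# Toy shapes; names and sizes are assumptions for illustration only.
rep = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU())   # representation (meta-learned)
head = torch.nn.Linear(64, 5)                                         # prediction network (online-updated)
meta_opt = torch.optim.Adam(list(rep.parameters()) + list(head.parameters()), lr=1e-3)

def oml_step(traj_x, traj_y, rand_x, rand_y, inner_lr=0.01):
    """One meta-update: online SGD on a trajectory through the representation,
    then evaluate interference/forgetting on held-out (random) data."""
    fast_w, fast_b = head.weight, head.bias
    for x, y in zip(traj_x, traj_y):                       # inner loop, one sample at a time
        logits = F.linear(rep(x.unsqueeze(0)), fast_w, fast_b)
        loss = F.cross_entropy(logits, y.unsqueeze(0))
        gw, gb = torch.autograd.grad(loss, (fast_w, fast_b), create_graph=True)
        fast_w, fast_b = fast_w - inner_lr * gw, fast_b - inner_lr * gb
    meta_loss = F.cross_entropy(F.linear(rep(rand_x), fast_w, fast_b), rand_y)
    meta_opt.zero_grad()
    meta_loss.backward()                                   # backprop through the inner updates
    meta_opt.step()
    return meta_loss.item()

# Toy usage with random data
tx, ty = torch.randn(5, 20), torch.randint(0, 5, (5,))
rx, ry = torch.randn(8, 20), torch.randint(0, 5, (8,))
print(oml_step(tx, ty, rx, ry))
```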

Learning What and Where to Transfer

Title Learning What and Where to Transfer
Authors Yunhun Jang, Hankook Lee, Sung Ju Hwang, Jinwoo Shin
Abstract As the application of deep learning has expanded to real-world problems with insufficient volumes of training data, transfer learning has recently gained much attention as a means of improving performance in such small-data regimes. However, when existing methods are applied between heterogeneous architectures and tasks, managing their detailed configurations becomes more important and often requires exhaustive tuning for the desired performance. To address the issue, we propose a novel transfer learning approach based on meta-learning that can automatically learn what knowledge to transfer from the source network and where to transfer it in the target network. Given source and target networks, we propose an efficient training scheme to learn meta-networks that decide (a) which pairs of layers between the source and target networks should be matched for knowledge transfer and (b) which features, and how much knowledge from each feature, should be transferred. We validate our meta-transfer approach against recent transfer learning methods on various datasets and network architectures, on which our automated scheme significantly outperforms prior baselines that decide “what and where to transfer” in a hand-crafted manner.
Tasks Meta-Learning, Transfer Learning
Published 2019-05-15
URL https://arxiv.org/abs/1905.05901v1
PDF https://arxiv.org/pdf/1905.05901v1.pdf
PWC https://paperswithcode.com/paper/learning-what-and-where-to-transfer
Repo https://github.com/jindongwang/transferlearning
Framework pytorch
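The transfer itself reduces to a feature-matching loss in which meta-learned weights decide (a) which source/target layer pairs to match and (b) how much of each channel to transfer. A hypothetical sketch of that weighted loss; the tensor shapes and the assumption that features are already projected to a shared size are illustration choices, not the authors' implementation:

```python
import torch

def weighted_transfer_loss(src_feats, tgt_feats, pair_weights, channel_weights):
    """Feature-matching loss with meta-learned pair and channel weights.

    src_feats, tgt_feats: lists of (B, C, H, W) feature maps (same shared shape here)
    pair_weights:    (n_src, n_tgt) nonnegative weights over layer pairs
    channel_weights: (n_src, n_tgt, C) per-channel transfer amounts
    """
    loss = 0.0
    for i, s in enumerate(src_feats):
        for j, t in enumerate(tgt_feats):
            diff = (s.detach() - t) ** 2                    # match target to the frozen source
            per_channel = diff.mean(dim=(0, 2, 3))          # (C,)
            loss = loss + pair_weights[i, j] * (channel_weights[i, j] * per_channel).sum()
    return loss

# Toy usage: 2 source layers, 2 target layers, 8 channels
src = [torch.randn(4, 8, 16, 16) for _ in range(2)]
tgt = [torch.randn(4, 8, 16, 16, requires_grad=True) for _ in range(2)]
pw = torch.softmax(torch.randn(2, 2), dim=1)
cw = torch.sigmoid(torch.randn(2, 2, 8))
print(weighted_transfer_loss(src, tgt, pw, cw))
```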

Self-Supervised Learning of Pretext-Invariant Representations

Title Self-Supervised Learning of Pretext-Invariant Representations
Authors Ishan Misra, Laurens van der Maaten
Abstract The goal of self-supervised learning from images is to construct image representations that are semantically meaningful via pretext tasks that do not require semantic annotations for a large training set of images. Many pretext tasks lead to representations that are covariant with image transformations. We argue that, instead, semantic representations ought to be invariant under such transformations. Specifically, we develop Pretext-Invariant Representation Learning (PIRL, pronounced as “pearl”) that learns invariant representations based on pretext tasks. We use PIRL with a commonly used pretext task that involves solving jigsaw puzzles. We find that PIRL substantially improves the semantic quality of the learned image representations. Our approach sets a new state-of-the-art in self-supervised learning from images on several popular benchmarks for self-supervised learning. Despite being unsupervised, PIRL outperforms supervised pre-training in learning image representations for object detection. Altogether, our results demonstrate the potential of self-supervised learning of image representations with good invariance properties.
Tasks Object Detection, Representation Learning, Self-Supervised Image Classification, Semi-Supervised Image Classification
Published 2019-12-04
URL https://arxiv.org/abs/1912.01991v1
PDF https://arxiv.org/pdf/1912.01991v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-learning-of-pretext-invariant
Repo https://github.com/akwasigroch/Pretext-Invariant-Representations
Framework pytorch
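PIRL's invariance objective compares the representation of an image with the representation of its jigsaw-transformed version. A toy sketch of the jigsaw "view" is below (the paper works with a fixed permutation set, per-patch encoding, and a memory bank of negatives); the contrastive objective itself has the same form as the CMC sketch earlier in this post.

```python
import torch

def jigsaw(image, grid=3):
    """Shuffle an image's tiles to form the jigsaw view whose representation
    PIRL encourages to match that of the original image (toy version)."""
    _, h, w = image.shape
    th, tw = h // grid, w // grid
    tiles = [image[:, i*th:(i+1)*th, j*tw:(j+1)*tw] for i in range(grid) for j in range(grid)]
    perm = torch.randperm(len(tiles)).tolist()
    rows = [torch.cat([tiles[perm[r*grid + k]] for k in range(grid)], dim=2) for r in range(grid)]
    return torch.cat(rows, dim=1)

# A noise-contrastive loss is then applied between features of `image` and
# features of `jigsaw(image)`, as in the CMC sketch above.
view = jigsaw(torch.randn(3, 96, 96))
```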

Learning Representations by Maximizing Mutual Information Across Views

Title Learning Representations by Maximizing Mutual Information Across Views
Authors Philip Bachman, R Devon Hjelm, William Buchwalter
Abstract We propose an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context. For example, one could produce multiple views of a local spatio-temporal context by observing it from different locations (e.g., camera positions within a scene), and via different modalities (e.g., tactile, auditory, or visual). Or, an ImageNet image could provide a context from which one produces multiple views by repeatedly applying data augmentation. Maximizing mutual information between features extracted from these views requires capturing information about high-level factors whose influence spans multiple views – e.g., presence of certain objects or occurrence of certain events. Following our proposed approach, we develop a model which learns image representations that significantly outperform prior methods on the tasks we consider. Most notably, using self-supervised learning, our model learns representations which achieve 68.1% accuracy on ImageNet using standard linear evaluation. This beats prior results by over 12% and concurrent results by 7%. When we extend our model to use mixture-based representations, segmentation behaviour emerges as a natural side-effect. Our code is available online: https://github.com/Philip-Bachman/amdim-public.
Tasks Data Augmentation, Representation Learning, Self-Supervised Image Classification
Published 2019-06-03
URL https://arxiv.org/abs/1906.00910v2
PDF https://arxiv.org/pdf/1906.00910v2.pdf
PWC https://paperswithcode.com/paper/190600910
Repo https://github.com/Philip-Bachman/amdim-public
Framework pytorch
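The "multiple views by repeatedly applying data augmentation" step is simply two independently sampled augmentations of the same image; the mutual-information objective between their features is again a contrastive loss. An illustrative torchvision pipeline, where the specific augmentation parameters are assumptions rather than the paper's:

```python
from torchvision import transforms

# Two independently sampled augmentations of the same image act as two "views"
# of a shared context (parameter values here are illustrative).
augment = transforms.Compose([
    transforms.RandomResizedCrop(128, scale=(0.3, 1.0)),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.25),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def two_views(pil_image):
    """Each call to `augment` resamples its random parameters, so the two tensors
    are different views whose features should share high-level information."""
    return augment(pil_image), augment(pil_image)
```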

Attention-guided Network for Ghost-free High Dynamic Range Imaging

Title Attention-guided Network for Ghost-free High Dynamic Range Imaging
Authors Qingsen Yan, Dong Gong, Qinfeng Shi, Anton van den Hengel, Chunhua Shen, Ian Reid, Yanning Zhang
Abstract Ghosting artifacts caused by moving objects or misalignments are a key challenge in high dynamic range (HDR) imaging for dynamic scenes. Previous methods first register the input low dynamic range (LDR) images using optical flow before merging them, which is error-prone and causes ghosts in the results. A very recent work tries to bypass optical flow via a deep network with skip connections; however, it still suffers from ghosting artifacts under severe movement. To avoid ghosting at the source, we propose a novel attention-guided end-to-end deep neural network (AHDRNet) to produce high-quality ghost-free HDR images. Unlike previous methods that directly stack the LDR images or features for merging, we use attention modules to guide the merging according to the reference image. The attention modules automatically suppress undesired components caused by misalignments and saturation and enhance desirable fine details in the non-reference images. In addition to the attention model, we use dilated residual dense blocks (DRDBs) to make full use of the hierarchical features and increase the receptive field for hallucinating the missing details. The proposed AHDRNet is a non-flow-based method, and so also avoids the artifacts generated by optical-flow estimation error. Experiments on different datasets show that the proposed AHDRNet achieves state-of-the-art quantitative and qualitative results.
Tasks Optical Flow Estimation
Published 2019-04-23
URL http://arxiv.org/abs/1904.10293v1
PDF http://arxiv.org/pdf/1904.10293v1.pdf
PWC https://paperswithcode.com/paper/attention-guided-network-for-ghost-free-high
Repo https://github.com/JimmyChame/The-State-of-the-Art-in-HDR-Deghosting
Framework none
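The attention-guided merging can be pictured as a gate: an attention map conditioned on the reference features suppresses misaligned or saturated regions of each non-reference feature map before merging. A hypothetical PyTorch module sketching that gate (channel counts and layer choices are assumptions; the DRDB merging network is omitted):

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Toy attention-guided gating: non-reference LDR features are modulated by
    an attention map conditioned on the reference features, suppressing
    misaligned/saturated regions before merging."""
    def __init__(self, channels=64):
        super().__init__()
        self.att = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),                      # per-pixel, per-channel gate in [0, 1]
        )

    def forward(self, ref_feat, nonref_feat):
        gate = self.att(torch.cat([ref_feat, nonref_feat], dim=1))
        return nonref_feat * gate              # attended non-reference features

# Toy usage
gate = AttentionGate(64)
out = gate(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```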

Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis

Title Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis
Authors Qi Mao, Hsin-Ying Lee, Hung-Yu Tseng, Siwei Ma, Ming-Hsuan Yang
Abstract Most conditional generation tasks expect diverse outputs given a single conditional context. However, conditional generative adversarial networks (cGANs) often focus on the prior conditional information and ignore the input noise vectors, which contribute to the output variations. Recent attempts to resolve the mode collapse issue for cGANs are usually task-specific and computationally expensive. In this work, we propose a simple yet effective regularization term to address the mode collapse issue for cGANs. The proposed method explicitly maximizes the ratio of the distance between generated images with respect to the corresponding latent codes, thus encouraging the generators to explore more minor modes during training. This mode seeking regularization term is readily applicable to various conditional generation tasks without imposing training overhead or modifying the original network structures. We validate the proposed algorithm on three conditional image synthesis tasks including categorical generation, image-to-image translation, and text-to-image synthesis with different baseline models. Both qualitative and quantitative results demonstrate the effectiveness of the proposed regularization method for improving diversity without loss of quality.
Tasks Image Generation, Image-to-Image Translation
Published 2019-03-13
URL https://arxiv.org/abs/1903.05628v6
PDF https://arxiv.org/pdf/1903.05628v6.pdf
PWC https://paperswithcode.com/paper/mode-seeking-generative-adversarial-networks
Repo https://github.com/HelenMao/MSGAN
Framework pytorch
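The regularizer itself is nearly a one-liner: for two latent codes under the same condition, maximize the ratio of the distance between the generated images to the distance between the codes, implemented below as minimizing the inverse ratio. A sketch of how it could be added to the generator loss; the distance choice and weighting are left to the user:

```python
import torch

def mode_seeking_loss(img1, img2, z1, z2, eps=1e-5):
    """Mode-seeking regularization: minimizing dz / di encourages the generator
    to map distinct latent codes to distinct images for the same condition.

    img1 = G(c, z1), img2 = G(c, z2) for the same conditional context c.
    """
    dz = torch.mean(torch.abs(z1 - z2))     # distance between latent codes
    di = torch.mean(torch.abs(img1 - img2)) # distance between generated images
    return dz / (di + eps)                  # add to the generator loss with some weight

# Toy usage
loss = mode_seeking_loss(torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64),
                         torch.randn(4, 128), torch.randn(4, 128))
```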

Push it to the Limit: Discover Edge-Cases in Image Data with Autoencoders

Title Push it to the Limit: Discover Edge-Cases in Image Data with Autoencoders
Authors Ilja Manakov, Volker Tresp
Abstract In this paper, we focus on the problem of identifying semantic factors of variation in large image datasets. By training a convolutional Autoencoder on the image data, we create encodings, which describe each datapoint at a higher level of abstraction than pixel-space. We then apply Principal Component Analysis to the encodings to disentangle the factors of variation in the data. Sorting the dataset according to the values of individual principal components, we find that samples at the high and low ends of the distribution often share specific semantic characteristics. We refer to these groups of samples as semantic groups. When applied to real-world data, this method can help discover unwanted edge-cases.
Tasks
Published 2019-10-07
URL https://arxiv.org/abs/1910.02713v1
PDF https://arxiv.org/pdf/1910.02713v1.pdf
PWC https://paperswithcode.com/paper/push-it-to-the-limit-discover-edge-cases-in
Repo https://github.com/IljaManakov/PushItToTheLimit
Framework pytorch
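Once the autoencoder is trained, the pipeline is: encodings, then PCA, then sorting by a principal component, then inspecting the extremes. A short sketch using scikit-learn, with random arrays standing in for real encoder outputs:

```python
import numpy as np
from sklearn.decomposition import PCA

def semantic_groups(encodings, component=0, k=16):
    """Sort the dataset along one principal component of the autoencoder
    encodings and return indices of the low/high extremes, where samples tend
    to share semantic characteristics (candidate edge cases)."""
    flat = encodings.reshape(len(encodings), -1)
    scores = PCA(n_components=component + 1).fit_transform(flat)[:, component]
    order = np.argsort(scores)
    return order[:k], order[-k:]              # low end, high end

# Toy usage with random "encodings" standing in for encoder outputs
low, high = semantic_groups(np.random.randn(1000, 8, 4, 4))
```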

PointDAN: A Multi-Scale 3D Domain Adaption Network for Point Cloud Representation

Title PointDAN: A Multi-Scale 3D Domain Adaption Network for Point Cloud Representation
Authors Can Qin, Haoxuan You, Lichen Wang, C. -C. Jay Kuo, Yun Fu
Abstract Domain Adaptation (DA) approaches have achieved significant improvements in a wide range of machine learning and computer vision tasks (i.e., classification, detection, and segmentation). However, as far as we are aware, few methods yet achieve domain adaptation directly on 3D point cloud data. The unique challenge of point cloud data lies in its abundant spatial geometric information, with the semantics of the whole object contributed by its regional geometric structures. Most general-purpose DA methods, which strive for global feature alignment and ignore local geometric information, are therefore not suitable for 3D domain alignment. In this paper, we propose a novel 3D Domain Adaptation Network for point cloud data (PointDAN). PointDAN jointly aligns global and local features at multiple levels. For local alignment, we propose a Self-Adaptive (SA) node module with an adjusted receptive field to model the discriminative local structures for aligning domains. To represent hierarchically scaled features, a node-attention module is further introduced to weight the relationships of SA nodes across objects and domains. For global alignment, an adversarial-training strategy is employed to learn and align global features across domains. Since there is no common evaluation benchmark for the 3D point cloud DA scenario, we build a general benchmark (i.e., PointDA-10) extracted from three popular 3D object/scene datasets (i.e., ModelNet, ShapeNet and ScanNet) for cross-domain 3D object classification. Extensive experiments on PointDA-10 illustrate the superiority of our model over state-of-the-art general-purpose DA methods.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2019-11-07
URL https://arxiv.org/abs/1911.02744v1
PDF https://arxiv.org/pdf/1911.02744v1.pdf
PWC https://paperswithcode.com/paper/pointdan-a-multi-scale-3d-domain-adaption
Repo https://github.com/canqin001/PointDAN
Framework pytorch
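For the global alignment part, an adversarial strategy of this kind is typically implemented with a domain discriminator behind a gradient-reversal layer. A generic sketch of that piece (the feature size and discriminator architecture are assumptions; the SA-node and node-attention modules for local alignment are omitted):

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated gradient in the
    backward pass, so the feature extractor learns domain-invariant features."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

domain_head = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 2))

def domain_loss(global_feat, domain_label, lam=1.0):
    # global_feat: (B, 1024) pooled point-cloud features; domain_label: 0 = source, 1 = target
    logits = domain_head(GradReverse.apply(global_feat, lam))
    return nn.functional.cross_entropy(logits, domain_label)

# Toy usage
loss = domain_loss(torch.randn(8, 1024), torch.randint(0, 2, (8,)))
```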

Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction

Title Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
Authors Aviral Kumar, Justin Fu, George Tucker, Sergey Levine
Abstract Off-policy reinforcement learning aims to leverage experience collected from prior policies for sample-efficient learning. However, in practice, commonly used off-policy approximate dynamic programming methods based on Q-learning and actor-critic methods are highly sensitive to the data distribution, and can make only limited progress without collecting additional on-policy data. As a step towards more robust off-policy algorithms, we study the setting where the off-policy experience is fixed and there is no further interaction with the environment. We identify bootstrapping error as a key source of instability in current methods. Bootstrapping error is due to bootstrapping from actions that lie outside of the training data distribution, and it accumulates via the Bellman backup operator. We theoretically analyze bootstrapping error, and demonstrate how carefully constraining action selection in the backup can mitigate it. Based on our analysis, we propose a practical algorithm, bootstrapping error accumulation reduction (BEAR). We demonstrate that BEAR is able to learn robustly from different off-policy distributions, including random and suboptimal demonstrations, on a range of continuous control tasks.
Tasks Continuous Control, Q-Learning
Published 2019-06-03
URL https://arxiv.org/abs/1906.00949v2
PDF https://arxiv.org/pdf/1906.00949v2.pdf
PWC https://paperswithcode.com/paper/190600949
Repo https://github.com/aviralkumar2907/BEAR
Framework pytorch
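"Carefully constraining action selection in the backup" amounts to keeping the learned policy's actions within the support of the data; BEAR measures this with a sampled kernel MMD between policy actions and dataset actions. An illustrative sketch of that penalty, with the kernel and bandwidth chosen here as assumptions:

```python
import torch

def gaussian_mmd(policy_actions, data_actions, sigma=10.0):
    """Sampled kernel MMD^2 between actions from the learned policy and actions
    from the behavior data; penalizing it keeps the backup's action selection
    close to the training distribution.

    policy_actions, data_actions: (N, action_dim) tensors of sampled actions.
    """
    def kernel(x, y):
        d = (x.unsqueeze(1) - y.unsqueeze(0)).pow(2).sum(-1)   # pairwise squared distances
        return torch.exp(-d / (2 * sigma))
    k_pp = kernel(policy_actions, policy_actions).mean()
    k_dd = kernel(data_actions, data_actions).mean()
    k_pd = kernel(policy_actions, data_actions).mean()
    return k_pp + k_dd - 2 * k_pd

# Toy usage: 10 sampled policy actions vs 10 dataset actions (action dim 6)
mmd = gaussian_mmd(torch.randn(10, 6), torch.randn(10, 6))
```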

Question Embeddings Based on Shannon Entropy: Solving intent classification task in goal-oriented dialogue system

Title Question Embeddings Based on Shannon Entropy: Solving intent classification task in goal-oriented dialogue system
Authors Aleksandr Perevalov, Daniil Kurushin, Rustam Faizrakhmanov, Farida Khabibrakhmanova
Abstract Question-answering systems and voice assistants are becoming a major part of the client service departments of many organizations, helping them reduce staff labor costs. In many such systems there is a natural language understanding module that solves the intent classification task. This task is complicated by its case dependency: every subject area has its own semantic kernel. The state-of-the-art approaches for intent classification are various machine learning and deep learning methods that use text vector representations as input. Basic vector representation models such as bag-of-words and TF-IDF generate sparse matrices, which become very large as the amount of input data grows. Modern methods such as word2vec and FastText use neural networks to learn word embeddings of fixed dimension. While developing a question-answering system for students and enrollees of Perm National Research Polytechnic University, we faced the problem of user intent detection. The subject area of our system is very specific, which is why there is a lack of training data; this makes the intent classification task more challenging for state-of-the-art deep learning methods. In this paper, we propose an approach to question embedding representation based on the calculation of Shannon entropy. The goal of the approach is to produce low-dimensional question vectors, as neural approaches do, and to outperform the related methods described above under the condition of a small dataset. We evaluate and compare our model with existing ones using logistic regression and a dataset of questions asked by students and enrollees, labeled into six classes. Experimental comparison of the proposed approach and other models revealed that the proposed model performed better on the given task.
Tasks Intent Classification, Intent Detection, Question Answering, Word Embeddings
Published 2019-03-25
URL http://arxiv.org/abs/1904.00785v1
PDF http://arxiv.org/pdf/1904.00785v1.pdf
PWC https://paperswithcode.com/paper/question-embeddings-based-on-shannon-entropy
Repo https://github.com/Perevalov/intent_classifier
Framework tf
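The abstract does not spell out the exact construction, so the following is only a plausible illustration of an entropy-based weighting: each word is weighted by one minus the normalized Shannon entropy of its distribution over intent classes, so class-discriminative words dominate the question vector. Every detail below is an assumption, not the paper's method.

```python
import numpy as np
from collections import Counter, defaultdict

def entropy_word_weights(questions, labels):
    """Illustrative only: weight each word by 1 - normalized Shannon entropy of
    its distribution over intent classes; the paper's construction may differ."""
    classes = sorted(set(labels))
    counts = defaultdict(Counter)
    for q, y in zip(questions, labels):
        for w in q.lower().split():
            counts[w][y] += 1
    weights = {}
    for w, c in counts.items():
        p = np.array([c[k] for k in classes], dtype=float)
        p = p / p.sum()
        h = -np.sum(p * np.log(p + 1e-12)) / np.log(len(classes))   # normalized entropy
        weights[w] = 1.0 - h
    return weights

# Toy usage with two questions and two intent classes
weights = entropy_word_weights(["open my account", "close the account"], ["open", "close"])
```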