April 3, 2020


Paper Group AWR 36

Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training. Iterate Averaging Helps: An Alternative Perspective in Deep Learning. Bio-Inspired Modality Fusion for Active Speaker Detection. Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications. SimLoss: Class Similarities in Cross Entropy …


Title	Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
Authors	Weituo Hao, Chunyuan Li, Xiujun Li, Lawrence Carin, Jianfeng Gao
Abstract	Learning to navigate in a visual environment following natural-language instructions is a challenging task, because the multimodal inputs to the agent are highly variable, and the training data on a new task is often limited. In this paper, we present the first pre-training and fine-tuning paradigm for vision-and-language navigation (VLN) tasks. By training on a large amount of image-text-action triplets in a self-supervised learning manner, the pre-trained model provides generic representations of visual environments and language instructions. It can be easily used as a drop-in for existing VLN frameworks, leading to the proposed agent called Prevalent. It learns more effectively in new tasks and generalizes better in a previously unseen environment. The performance is validated on three VLN tasks. On the Room-to-Room benchmark, our model improves the state-of-the-art from 47% to 51% on success rate weighted by path length. Further, the learned representation is transferable to other VLN tasks. On two recent tasks, vision-and-dialog navigation and “Help, Anna!”, the proposed Prevalent leads to significant improvement over existing methods, achieving a new state of the art.
Tasks
Published	2020-02-25
URL	https://arxiv.org/abs/2002.10638v1
PDF	https://arxiv.org/pdf/2002.10638v1.pdf
PWC	https://paperswithcode.com/paper/towards-learning-a-generic-agent-for-vision
Repo	https://github.com/weituo12321/PREVALENT
Framework	none

Iterate Averaging Helps: An Alternative Perspective in Deep Learning


Title	Iterate Averaging Helps: An Alternative Perspective in Deep Learning
Authors	Diego Granziol, Xingchen Wan, Stephen Roberts
Abstract	Iterate averaging has a rich history in optimisation, but has only very recently been popularised in deep learning. We investigate its effects in a deep learning context, and argue that previous explanations of its efficacy, which place a high importance on the local geometry (flatness vs sharpness) of final solutions, are not necessarily relevant. We instead argue that the robustness of iterate averaging towards the typically very high estimation noise in deep learning, and the various regularisation effects that averaging exerts, are the key reasons for the performance gain; indeed, this effect is made even more prominent by the over-parameterisation of modern networks. Inspired by this, we propose Gadam, which combines Adam with iterate averaging to address one of the key problems of adaptive optimisers: that they often generalise worse. Without compromising adaptivity and with minimal additional computational burden, we show that Gadam (and its variant GadamX) achieve a generalisation performance that is consistently superior to tuned SGD and is even on par with or better than SGD with iterate averaging on various image classification (CIFAR 10/100 and ImageNet 32$\times$32) and language tasks (PTB).
Tasks	Image Classification
Published	2020-03-02
URL	https://arxiv.org/abs/2003.01247v1
PDF	https://arxiv.org/pdf/2003.01247v1.pdf
PWC	https://paperswithcode.com/paper/iterate-averaging-helps-an-alternative
Repo	https://github.com/diegogranziol/Gadam
Framework	pytorch
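
The idea of iterate averaging can be illustrated with a minimal sketch. It is not the authors' Gadam implementation: it uses plain noisy gradient descent on a scalar quadratic rather than Adam, and the names `sgd_with_iterate_averaging`, `avg_start` and `noise_std` are hypothetical choices made for this illustration.

```python
import random


def sgd_with_iterate_averaging(grad, w0, lr, steps, avg_start, noise_std=1.0, seed=0):
    """Run noisy gradient descent on a scalar and average the late iterates.

    Returns the last iterate and the running mean of all iterates with
    index >= avg_start. Gaussian noise stands in for gradient estimation noise.
    """
    rng = random.Random(seed)
    w = w0
    w_avg = 0.0
    n_avg = 0
    for t in range(steps):
        noisy_grad = grad(w) + rng.gauss(0.0, noise_std)
        w = w - lr * noisy_grad
        if t >= avg_start:
            n_avg += 1
            w_avg += (w - w_avg) / n_avg  # incremental running mean
    return w, w_avg


def quadratic_grad(w):
    """Gradient of the toy objective 0.5 * w**2, whose minimiser is w = 0."""
    return w


last, averaged = sgd_with_iterate_averaging(
    quadratic_grad, w0=5.0, lr=0.1, steps=2000, avg_start=1000
)
print("last iterate:", last)
print("averaged iterate:", averaged)
```

In this toy setting the last iterate keeps fluctuating around the minimiser because of the injected noise, while the averaged iterate pools many noisy iterates and so tends to sit closer to it.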

Bio-Inspired Modality Fusion for Active Speaker Detection


Title	Bio-Inspired Modality Fusion for Active Speaker Detection
Authors	Gustavo Assunção, Nuno Gonçalves, Paulo Menezes
Abstract	Human beings have developed fantastic abilities to integrate information from various sensory sources exploring their inherent complementarity. Perceptual capabilities are therefore heightened enabling, for instance, the well known “cocktail party” and McGurk effects, i.e. speech disambiguation from a panoply of sound signals. This fusion ability is also key in refining the perception of sound source location, as in distinguishing whose voice is being heard in a group conversation. Furthermore, Neuroscience has successfully identified the superior colliculus region in the brain as the one responsible for this modality fusion, with a handful of biological models having been proposed to approach its underlying neurophysiological process. Deriving inspiration from one of these models, this paper presents a methodology for effectively fusing correlated auditory and visual information for active speaker detection. Such an ability can have a wide range of applications, from teleconferencing systems to social robotics. The detection approach initially routes auditory and visual information through two specialized neural network structures. The resulting embeddings are fused via a novel layer based on the superior colliculus, whose topological structure emulates spatial neuron cross-mapping of unimodal perceptual fields. The validation process employed two publicly available datasets, with achieved results confirming and greatly surpassing initial expectations.
Tasks
Published	2020-02-28
URL	https://arxiv.org/abs/2003.00063v1
PDF	https://arxiv.org/pdf/2003.00063v1.pdf
PWC	https://paperswithcode.com/paper/bio-inspired-modality-fusion-for-active
Repo	https://github.com/gustavomiguelsa/SCF
Framework	none

Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications


Title	Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications
Authors	Biagio Brattoli, Joseph Tighe, Fedor Zhdanov, Pietro Perona, Krzysztof Chalupka
Abstract	Trained on large datasets, deep learning (DL) can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem. ZSL trains a model once, and generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classification. Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features. This is in contrast to previous video ZSL methods, which use pretrained feature extractors. We also extend the current benchmarking paradigm: Previous techniques aim to make the test task unknown at training time but fall short of this goal. We encourage domain shift across training and test data and disallow tailoring a ZSL model to a specific test dataset. We outperform the state-of-the-art by a wide margin. Our code, evaluation procedure and model weights are available at github.com/bbrattoli/ZeroShotVideoClassification.
Tasks	Video Classification, Zero-Shot Learning
Published	2020-03-03
URL	https://arxiv.org/abs/2003.01455v3
PDF	https://arxiv.org/pdf/2003.01455v3.pdf
PWC	https://paperswithcode.com/paper/rethinking-zero-shot-video-classification-end
Repo	https://github.com/bbrattoli/ZeroShotVideoClassification
Framework	pytorch

SimLoss: Class Similarities in Cross Entropy


Title	SimLoss: Class Similarities in Cross Entropy
Authors	Konstantin Kobs, Michael Steininger, Albin Zehe, Florian Lautenschlager, Andreas Hotho
Abstract	One common loss function in neural network classification tasks is Categorical Cross Entropy (CCE), which punishes all misclassifications equally. However, classes often have an inherent structure. For instance, classifying an image of a rose as “violet” is better than as “truck”. We introduce SimLoss, a drop-in replacement for CCE that incorporates class similarities along with two techniques to construct such matrices from task-specific knowledge. We test SimLoss on Age Estimation and Image Classification and find that it brings significant improvements over CCE on several metrics. SimLoss therefore allows for explicit modeling of background knowledge by simply exchanging the loss function, while keeping the neural network architecture the same. Code and additional resources can be found at https://github.com/konstantinkobs/SimLoss.
Tasks	Age Estimation, Image Classification
Published	2020-03-06
URL	https://arxiv.org/abs/2003.03182v1
PDF	https://arxiv.org/pdf/2003.03182v1.pdf
PWC	https://paperswithcode.com/paper/simloss-class-similarities-in-cross-entropy
Repo	https://github.com/konstantinkobs/SimLoss
Framework	pytorch
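
One way to fold class similarities into cross entropy is sketched below. It assumes a similarity matrix `sim` whose entry `sim[y][c]` lies in [0, 1] with `sim[y][y] = 1`, and it scores the similarity-weighted sum of predicted probabilities instead of the target probability alone. This form and the function name `sim_loss` are assumptions made for illustration; the exact definition should be checked against the paper and repository.

```python
import math


def sim_loss(probs, targets, sim):
    """Similarity-weighted cross entropy (illustrative form).

    probs:   list of per-sample class probability lists (each sums to 1)
    targets: list of integer class labels
    sim:     square matrix, sim[y][c] = similarity of class c to target y
    """
    total = 0.0
    for p, y in zip(probs, targets):
        # Probability mass on the target and on classes similar to it.
        weighted = sum(s * q for s, q in zip(sim[y], p))
        total += -math.log(weighted)
    return total / len(probs)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical similarities: class 1 is half as good a guess as class 0 and vice versa.
similar = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]

probs = [[0.7, 0.2, 0.1]]
print(sim_loss(probs, [0], identity))  # equals plain cross entropy, -log(0.7)
print(sim_loss(probs, [0], similar))   # lower, since mass on class 1 is partly credited
```

With the identity matrix the sketch reduces to ordinary categorical cross entropy, which matches the abstract's description of a drop-in replacement.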

The POLAR Framework: Polar Opposites Enable Interpretability of Pre-Trained Word Embeddings


Title	The POLAR Framework: Polar Opposites Enable Interpretability of Pre-Trained Word Embeddings
Authors	Binny Mathew, Sandipan Sikdar, Florian Lemmerich, Markus Strohmaier
Abstract	We introduce POLAR - a framework that adds interpretability to pre-trained word embeddings via the adoption of semantic differentials. Semantic differentials are a psychometric construct for measuring the semantics of a word by analysing its position on a scale between two polar opposites (e.g., cold – hot, soft – hard). The core idea of our approach is to transform existing, pre-trained word embeddings via semantic differentials to a new “polar” space with interpretable dimensions defined by such polar opposites. Our framework also allows for selecting the most discriminative dimensions from a set of polar dimensions provided by an oracle, i.e., an external source. We demonstrate the effectiveness of our framework by deploying it to various downstream tasks, in which our interpretable word embeddings achieve a performance that is comparable to the original word embeddings. We also show that the interpretable dimensions selected by our framework align with human judgement. Together, these results demonstrate that interpretability can be added to word embeddings without compromising performance. Our work is relevant for researchers and engineers interested in interpreting pre-trained word embeddings.
Tasks	Word Embeddings
Published	2020-01-27
URL	https://arxiv.org/abs/2001.09876v2
PDF	https://arxiv.org/pdf/2001.09876v2.pdf
PWC	https://paperswithcode.com/paper/the-polar-framework-polar-opposites-enable
Repo	https://github.com/Sandipan99/POLAR
Framework	none

Selecting Relevant Features from a Universal Representation for Few-shot Classification


Title	Selecting Relevant Features from a Universal Representation for Few-shot Classification
Authors	Nikita Dvornik, Cordelia Schmid, Julien Mairal
Abstract	Popular approaches for few-shot classification consist of first learning a generic data representation based on a large annotated dataset, before adapting the representation to new classes given only a few labeled samples. In this work, we propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches. First, we obtain a universal representation by training a set of semantically different feature extractors. Then, given a few-shot learning task, we use our universal feature bank to automatically select the most relevant representations. We show that a simple non-parametric classifier built on top of such features produces high accuracy and generalizes to domains never seen during training, which leads to state-of-the-art results on MetaDataset and improved accuracy on mini-ImageNet.
Tasks	Feature Selection, Few-Shot Learning
Published	2020-03-20
URL	https://arxiv.org/abs/2003.09338v1
PDF	https://arxiv.org/pdf/2003.09338v1.pdf
PWC	https://paperswithcode.com/paper/selecting-relevant-features-from-a-universal
Repo	https://github.com/dvornikita/SUR
Framework	none

Extreme Classification via Adversarial Softmax Approximation


Title	Extreme Classification via Adversarial Softmax Approximation
Authors	Robert Bamler, Stephan Mandt
Abstract	Training a classifier over a large number of classes, known as ‘extreme classification’, has become a topic of major interest with applications in technology, science, and e-commerce. Traditional softmax regression induces a gradient cost proportional to the number of classes $C$, which often is prohibitively expensive. A popular scalable softmax approximation relies on uniform negative sampling, which suffers from slow convergence due to a poor signal-to-noise ratio. In this paper, we propose a simple training method for drastically enhancing the gradient signal by drawing negative samples from an adversarial model that mimics the data distribution. Our contributions are three-fold: (i) an adversarial sampling mechanism that produces negative samples at a cost only logarithmic in $C$, thus still resulting in cheap gradient updates; (ii) a mathematical proof that this adversarial sampling minimizes the gradient variance while any bias due to non-uniform sampling can be removed; (iii) experimental results on large scale data sets that show a reduction of the training time by an order of magnitude relative to several competitive baselines.
Tasks
Published	2020-02-15
URL	https://arxiv.org/abs/2002.06298v1
PDF	https://arxiv.org/pdf/2002.06298v1.pdf
PWC	https://paperswithcode.com/paper/extreme-classification-via-adversarial-1
Repo	https://github.com/mandt-lab/adversarial-negative-sampling
Framework	tf

Semi-Supervised Neural Architecture Search


Title	Semi-Supervised Neural Architecture Search
Authors	Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu
Abstract	Neural architecture search (NAS) relies on a good controller to generate better architectures or predict the accuracy of given architectures. However, training the controller requires both abundant and high-quality pairs of architectures and their accuracy, while it is costly to evaluate an architecture and obtain its accuracy. In this paper, we propose SemiNAS, a semi-supervised NAS approach that leverages numerous unlabeled architectures (without evaluation and thus nearly no cost) to improve the controller. Specifically, SemiNAS 1) trains an initial controller with a small set of architecture-accuracy data pairs; 2) uses the trained controller to predict the accuracy of a large number of architectures (without evaluation); and 3) adds the generated data pairs to the original data to further improve the controller. SemiNAS has two advantages: 1) It reduces the computational cost under the same accuracy guarantee. 2) It achieves higher accuracy under the same computational cost. On the NASBench-101 benchmark dataset, it discovers a top 0.01% architecture after evaluating roughly 300 architectures, with only 1/7 of the computational cost of regularized evolution and gradient-based methods. On ImageNet, it achieves a state-of-the-art top-1 error rate of 23.5% (under the mobile setting) using 4 GPU-days for search. We further apply it to the LJSpeech text-to-speech task, where it achieves a 97% intelligibility rate in the low-resource setting and a 15% test error rate in the robustness setting, with 9% and 7% improvements over the baseline respectively. Our code is available at https://github.com/renqianluo/SemiNAS.
Tasks	Natural Language Transduction, Neural Architecture Search
Published	2020-02-24
URL	https://arxiv.org/abs/2002.10389v2
PDF	https://arxiv.org/pdf/2002.10389v2.pdf
PWC	https://paperswithcode.com/paper/semi-supervised-neural-architecture-search
Repo	https://github.com/renqianluo/SemiNAS
Framework	pytorch

Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions


Title	Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions
Authors	Ricard Durall, Margret Keuper, Janis Keuper
Abstract	Generative convolutional deep neural networks, e.g. popular GAN architectures, rely on convolution-based up-sampling methods to produce non-scalar outputs like images or video sequences. In this paper, we show that common up-sampling methods, known as up-convolution or transposed convolution, cause the inability of such models to reproduce spectral distributions of natural training data correctly. This effect is independent of the underlying architecture and we show that it can be used to easily detect generated data like deepfakes with up to 100% accuracy on public benchmarks. To overcome this drawback of current generative models, we propose to add a novel spectral regularization term to the training optimization objective. We show that this approach not only allows training spectrally consistent GANs that avoid high-frequency errors, but also that a correct approximation of the frequency spectrum has positive effects on the training stability and output quality of generative networks.
Tasks
Published	2020-03-03
URL	https://arxiv.org/abs/2003.01826v1
PDF	https://arxiv.org/pdf/2003.01826v1.pdf
PWC	https://paperswithcode.com/paper/watch-your-up-convolution-cnn-based
Repo	https://github.com/cc-hpc-itwm/UpConv
Framework	pytorch

Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding


Title	Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding
Authors	Victor Veitch, Anisha Zaveri
Abstract	It is a truth universally acknowledged that an observed association without known mechanism must be in want of a causal estimate. However, causal estimation from observational data often relies on the (untestable) assumption of ‘no unobserved confounding’. Violations of this assumption can induce bias in effect estimates. In principle, such bias could invalidate or reverse the conclusions of a study. However, in some cases, we might hope that the influence of unobserved confounders is weak relative to a ‘large’ estimated effect, so the qualitative conclusions are robust to bias from unobserved confounding. The purpose of this paper is to develop Austen plots, a sensitivity analysis tool to aid such judgments by making it easier to reason about potential bias induced by unobserved confounding. We formalize confounding strength in terms of how strongly the confounder influences treatment assignment and outcome. For a target level of bias, an Austen plot shows the minimum values of treatment and outcome influence required to induce that level of bias. Domain experts can then make subjective judgments about whether such strong confounders are plausible. To aid this judgment, the Austen plot additionally displays the estimated influence strength of (groups of) the observed covariates. Austen plots generalize the classic sensitivity analysis approach of Imbens [Imb03]. Critically, Austen plots allow any approach for modeling the observed data and producing the initial estimate. We illustrate the tool by assessing biases for several real causal inference problems, using a variety of machine learning approaches for the initial data analysis. Code is available at https://github.com/anishazaveri/austen_plots
Tasks	Causal Inference
Published	2020-03-03
URL	https://arxiv.org/abs/2003.01747v1
PDF	https://arxiv.org/pdf/2003.01747v1.pdf
PWC	https://paperswithcode.com/paper/sense-and-sensitivity-analysis-simple-post
Repo	https://github.com/anishazaveri/austen_plots
Framework	none

Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning


Title	Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning
Authors	Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio
Abstract	We introduce a parameterization method called Neural Bayes which allows computing statistical quantities that are in general difficult to compute and opens avenues for formulating new objectives for unsupervised representation learning. Specifically, given an observed random variable $\mathbf{x}$ and a latent discrete variable $z$, we can express $p(\mathbf{x} \mid z)$, $p(z \mid \mathbf{x})$ and $p(z)$ in closed form in terms of a sufficiently expressive function (e.g., a neural network) using our parameterization without restricting the class of these distributions. To demonstrate its usefulness, we develop two independent use cases for this parameterization: 1. Mutual Information Maximization (MIM): MIM has become a popular means for self-supervised representation learning. Neural Bayes allows us to compute mutual information between observed random variables $\mathbf{x}$ and latent discrete random variables $z$ in closed form. We use this for learning image representations and show its usefulness on downstream classification tasks. 2. Disjoint Manifold Labeling: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution. This can be seen as a specific form of clustering where each disjoint manifold in the support is a separate cluster. We design clustering tasks that obey this formulation and empirically show that the model optimally labels the disjoint manifolds. Our code is available at https://github.com/salesforce/NeuralBayes
Tasks	Representation Learning, Unsupervised Representation Learning
Published	2020-02-20
URL	https://arxiv.org/abs/2002.09046v1
PDF	https://arxiv.org/pdf/2002.09046v1.pdf
PWC	https://paperswithcode.com/paper/neural-bayes-a-generic-parameterization
Repo	https://github.com/salesforce/NeuralBayes
Framework	pytorch
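
The closed-form mutual information mentioned in the abstract can be illustrated under one assumption: that the network output for class $k$ is read as the posterior $p(z = k \mid \mathbf{x})$, so that $p(z = k)$ is its average over the data. The mutual information is then the average KL divergence between each posterior and that prior. The sketch below computes this empirically; the function name `mutual_information` and the list-of-lists input are hypothetical, and this is not the repository's code.

```python
import math


def mutual_information(posteriors):
    """Empirical I(x; z) from per-sample posteriors over a discrete latent z.

    posteriors[i][k] is taken to be p(z = k | x_i); each row sums to 1.
    The prior p(z = k) is estimated as the mean posterior over samples,
    and I(x; z) as the mean KL divergence from the prior to each posterior.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    prior = [sum(row[j] for row in posteriors) / n for j in range(k)]
    mi = 0.0
    for row in posteriors:
        for q, pz in zip(row, prior):
            if q > 0.0:  # 0 * log(0) is treated as 0
                mi += q * math.log(q / pz)
    return mi / n


# Uninformative posteriors: z carries no information about x.
print(mutual_information([[0.5, 0.5], [0.5, 0.5]]))
# Confident, balanced assignments: z carries log(2) nats about x.
print(mutual_information([[1.0, 0.0], [0.0, 1.0]]))
```

The estimate is largest when each sample is assigned confidently while the classes are used evenly, which is the behaviour a mutual information maximization objective rewards.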

Nonlinear classifiers for ranking problems based on kernelized SVM


Title	Nonlinear classifiers for ranking problems based on kernelized SVM
Authors	Václav Mácha, Lukáš Adam, Václav Šmídl
Abstract	Many classification problems focus on maximizing the performance only on the samples with the highest relevance instead of all samples. As an example, we can mention ranking problems, accuracy at the top or search engines where only the top few queries matter. In our previous work, we derived a general framework including several classes of these linear classification problems. In this paper, we extend the framework to nonlinear classifiers. Utilizing a similarity to SVM, we dualize the problems, add kernels and propose a componentwise dual ascent method. This allows us to perform one iteration in less than 20 milliseconds on relatively large datasets such as FashionMNIST.
Tasks
Published	2020-02-26
URL	https://arxiv.org/abs/2002.11436v1
PDF	https://arxiv.org/pdf/2002.11436v1.pdf
PWC	https://paperswithcode.com/paper/nonlinear-classifiers-for-ranking-problems
Repo	https://github.com/VaclavMacha/ClassificationOnTop_new.jl
Framework	none

Ada-LISTA: Learned Solvers Adaptive to Varying Models


Title	Ada-LISTA: Learned Solvers Adaptive to Varying Models
Authors	Aviad Aberdam, Alona Golts, Michael Elad
Abstract	Neural networks that are based on unfolding of an iterative solver, such as LISTA (learned iterative soft threshold algorithm), are widely used due to their accelerated performance. Nevertheless, as opposed to non-learned solvers, these networks are trained on a certain dictionary, and therefore they are inapplicable for varying model scenarios. This work introduces an adaptive learned solver, termed Ada-LISTA, which receives pairs of signals and their corresponding dictionaries as inputs, and learns a universal architecture to serve them all. We prove that this scheme is guaranteed to solve sparse coding in linear rate for varying models, including dictionary perturbations and permutations. We also provide an extensive numerical study demonstrating its practical adaptation capabilities. Finally, we deploy Ada-LISTA to natural image inpainting, where the patch-masks vary spatially, thus requiring such an adaptation.
Tasks	Image Denoising, Image Inpainting
Published	2020-01-23
URL	https://arxiv.org/abs/2001.08456v2
PDF	https://arxiv.org/pdf/2001.08456v2.pdf
PWC	https://paperswithcode.com/paper/ada-lista-learned-solvers-adaptive-to-varying
Repo	https://github.com/aaberdam/AdaLISTA
Framework	pytorch
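
For context, the non-learned solver that LISTA-style networks unfold is the iterative soft threshold algorithm (ISTA). The sketch below is plain ISTA for a fixed dictionary, not Ada-LISTA: there are no learned weights, and the names `ista`, `lam` and `step` are illustrative. It assumes `step` is at most the reciprocal of the largest eigenvalue of $D^\top D$.

```python
def soft_threshold(v, theta):
    """Shrink v towards zero by theta; values within [-theta, theta] become 0."""
    if v > theta:
        return v - theta
    if v < -theta:
        return v + theta
    return 0.0


def ista(D, y, lam, step, iters):
    """Minimise 0.5 * ||y - D x||^2 + lam * ||x||_1 with ISTA.

    D is a dictionary given as a list of rows, y is the signal.
    Each iteration takes a gradient step on the quadratic term and then
    applies soft thresholding with threshold step * lam.
    """
    m, n = len(D), len(D[0])
    x = [0.0] * n
    for _ in range(iters):
        residual = [y[i] - sum(D[i][j] * x[j] for j in range(n)) for i in range(m)]
        grad = [-sum(D[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x


# With an identity dictionary the solution is the soft-thresholded signal.
D = [[1.0, 0.0], [0.0, 1.0]]
print(ista(D, y=[3.0, 0.2], lam=0.5, step=1.0, iters=10))
```

A learned variant replaces the fixed matrices and thresholds in each iteration with trained parameters; the abstract's point is that Ada-LISTA additionally takes the dictionary as an input so that one trained network can serve varying models.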

Understanding and Enhancing Mixed Sample Data Augmentation


Title	Understanding and Enhancing Mixed Sample Data Augmentation
Authors	Ethan Harris, Antonia Marcu, Matthew Painter, Mahesan Niranjan, Adam Prügel-Bennett, Jonathon Hare
Abstract	Mixed Sample Data Augmentation (MSDA) has received increasing attention in recent years, with many successful variants such as MixUp and CutMix. Following insight on the efficacy of CutMix in particular, we propose FMix, an MSDA that uses binary masks obtained by applying a threshold to low frequency images sampled from Fourier space. FMix improves performance over MixUp and CutMix for a number of state-of-the-art models across a range of data sets and problem settings. We go on to analyse MixUp, CutMix, and FMix from an information theoretic perspective, characterising learned models in terms of how they progressively compress the input with depth. Ultimately, our analyses allow us to decouple two complementary properties of augmentations, and present a unified framework for reasoning about MSDA. Code for all experiments is available at https://github.com/ecs-vlc/FMix.
Tasks	Data Augmentation, Image Classification
Published	2020-02-27
URL	https://arxiv.org/abs/2002.12047v1
PDF	https://arxiv.org/pdf/2002.12047v1.pdf
PWC	https://paperswithcode.com/paper/understanding-and-enhancing-mixed-sample-data
Repo	https://github.com/ecs-vlc/FMix
Framework	pytorch