February 1, 2020

2938 words 14 mins read

Paper Group AWR 113

Robust Learning with Jacobian Regularization. Compact Global Descriptor for Neural Networks. Proactive Human-Machine Conversation with Explicit Conversation Goals. Sum-of-Squares Polynomial Flow. Progressive Pose Attention Transfer for Person Image Generation. Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradie …

Robust Learning with Jacobian Regularization

Title Robust Learning with Jacobian Regularization
Authors Judy Hoffman, Daniel A. Roberts, Sho Yaida
Abstract Design of reliable systems must guarantee stability against input perturbations. In machine learning, such a guarantee entails preventing overfitting and ensuring the robustness of models against corruption of input data. In order to maximize stability, we analyze and develop a computationally efficient implementation of Jacobian regularization that increases the classification margins of neural networks. The stabilizing effect of the Jacobian regularizer leads to significant improvements in robustness, as measured against both random and adversarial input perturbations, without severely degrading generalization on clean data. (A minimal sketch of the regularizer follows this entry.)
Tasks
Published 2019-08-07
URL https://arxiv.org/abs/1908.02729v1
PDF https://arxiv.org/pdf/1908.02729v1.pdf
PWC https://paperswithcode.com/paper/robust-learning-with-jacobian-regularization
Repo https://github.com/facebookresearch/jacobian_regularizer
Framework pytorch
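
A minimal sketch of the regularizer, assuming a standard PyTorch classifier: the squared Frobenius norm of the input-output Jacobian is estimated with random unit projections, so each projection costs only one extra backward pass. The function name and single-projection default are illustrative; the official repo above packages the same idea as a reusable module.

```python
import torch

def jacobian_penalty(model, x, n_proj=1):
    """Estimate ||J||_F^2 for the input-output Jacobian J using random
    unit projections: for v uniform on the unit sphere in R^C,
    E_v[||J^T v||^2] = ||J||_F^2 / C, so we rescale by the output dim C."""
    x = x.clone().requires_grad_(True)
    out = model(x)                              # logits, shape (batch, C)
    C = out.shape[1]
    penalty = x.new_zeros(())
    for _ in range(n_proj):
        v = torch.randn_like(out)
        v = v / v.norm(dim=1, keepdim=True)     # random unit projection
        (jv,) = torch.autograd.grad(out, x, grad_outputs=v, create_graph=True)
        penalty = penalty + jv.pow(2).sum() / x.shape[0]
    return C * penalty / n_proj
```

In training, the term is simply added to the task loss with a small weight, e.g. `loss = F.cross_entropy(model(x), y) + 0.01 * jacobian_penalty(model, x)`.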

Compact Global Descriptor for Neural Networks

Title Compact Global Descriptor for Neural Networks
Authors Xiangyu He, Ke Cheng, Qiang Chen, Qinghao Hu, Peisong Wang, Jian Cheng
Abstract Long-range dependency modeling, widely used to capture spatiotemporal correlations, has been shown to be effective in CNN-dominated computer vision tasks. Yet neither stacking convolutional operations to enlarge receptive fields nor recent non-local modules is computationally efficient. In this paper, we present a generic family of lightweight global descriptors for modeling the interactions between positions across different dimensions (e.g., channels, frames). This descriptor enables subsequent convolutions to access informative global features with negligible computational complexity and parameter overhead. Benchmark experiments show that the proposed method can compete with state-of-the-art long-range mechanisms at a significant reduction in extra computing cost. Code is available at https://github.com/HolmesShuan/Compact-Global-Descriptor. (A hedged sketch of the general idea follows this entry.)
Tasks Audio Classification, Deep Attention, Image Classification, Object Detection
Published 2019-07-23
URL https://arxiv.org/abs/1907.09665v3
PDF https://arxiv.org/pdf/1907.09665v3.pdf
PWC https://paperswithcode.com/paper/compact-global-descriptor-for-neural-networks
Repo https://github.com/HolmesShuan/Compact-Global-Descriptor
Framework pytorch
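
The exact CGD operator is in the linked repo; as a hedged illustration of the general recipe (pool the map into a compact global descriptor, model interactions between dimensions cheaply, and feed the result back to the local features), here is a generic channel-gating stand-in, not the authors' formulation:

```python
import torch
import torch.nn as nn

class GlobalDescriptorGate(nn.Module):
    """Generic sketch: average-pool to a global descriptor, model
    channel interactions with a tiny bottleneck MLP, and re-inject the
    result as a multiplicative gate on the feature map."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(),
            nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, H, W)
        g = x.mean(dim=(2, 3))                   # compact global descriptor
        return x * self.fc(g)[:, :, None, None]  # negligible extra FLOPs
```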

Proactive Human-Machine Conversation with Explicit Conversation Goals

Title Proactive Human-Machine Conversation with Explicit Conversation Goals
Authors Wenquan Wu, Zhen Guo, Xiangyang Zhou, Hua Wu, Xiyuan Zhang, Rongzhong Lian, Haifeng Wang
Abstract Though great progress has been made in human-machine conversation, current dialogue systems are still in their infancy: they usually converse passively, uttering words more as a matter of response than on their own initiative. In this paper, we take a radical step towards building a human-like conversational agent: endowing it with the ability to proactively lead the conversation (introducing a new topic or maintaining the current one). To facilitate the development of such conversation systems, we create a new dataset named DuConv, in which one participant acts as the conversation leader and the other as the follower. The leader is provided with a knowledge graph and asked to sequentially change the discussion topics, following the given conversation goal, while keeping the dialogue as natural and engaging as possible. DuConv poses a very challenging task, as the model needs to both understand dialogue and plan over the given knowledge graph. We establish baseline results on this dataset (about 270K utterances and 30K dialogues) using several state-of-the-art models. Experimental results show that dialogue models that plan over the knowledge graph can make full use of related knowledge to generate more diverse multi-turn conversations. The baseline systems and the dataset are publicly available.
Tasks
Published 2019-06-13
URL https://arxiv.org/abs/1906.05572v2
PDF https://arxiv.org/pdf/1906.05572v2.pdf
PWC https://paperswithcode.com/paper/proactive-human-machine-conversation-with
Repo https://github.com/baidu/Dialogue
Framework tf

Sum-of-Squares Polynomial Flow

Title Sum-of-Squares Polynomial Flow
Authors Priyank Jaini, Kira A. Selby, Yaoliang Yu
Abstract The triangular map is a recent construct in probability theory that allows one to transform any source probability density into any target density. Based on triangular maps, we propose a general framework for high-dimensional density estimation by specifying one-dimensional transformations (equivalently, conditional densities) and appropriate conditioner networks. This framework (a) reveals the commonalities and differences of existing autoregressive and flow-based methods, (b) allows a unified understanding of the limitations and representational power of these recent approaches, and (c) motivates us to uncover a new Sum-of-Squares (SOS) flow that is interpretable, universal, and easy to train. We perform several synthetic experiments on various density geometries to demonstrate the benefits (and shortcomings) of such transformations. SOS flows achieve competitive results in simulations and on several real-world datasets. (A sketch of the SOS building block follows this entry.)
Tasks Density Estimation
Published 2019-05-07
URL https://arxiv.org/abs/1905.02325v2
PDF https://arxiv.org/pdf/1905.02325v2.pdf
PWC https://paperswithcode.com/paper/sum-of-squares-polynomial-flow
Repo https://github.com/GinGinWang/MTQ
Framework pytorch
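
For intuition, here is a sketch of the one-dimensional SOS building block as we read it from the paper: $T(z) = c + \int_0^z \sum_k \big(\sum_l a_{kl}\,u^l\big)^2\,du$, monotone because its derivative is a sum of squares. In the full flow the coefficients `a` and offset `c` come from a conditioner network; here they are plain arguments.

```python
import torch

def sos_transform(z, a, c=0.0):
    """1-D SOS transform T(z) = c + sum_k ∫_0^z (sum_l a[k,l] u^l)^2 du,
    expanded term by term: each product a[k,l] a[k,m] u^(l+m) integrates
    to a[k,l] a[k,m] z^(l+m+1) / (l+m+1). a has shape (K, L+1)."""
    out = torch.full_like(z, float(c))
    K, L1 = a.shape
    for k in range(K):
        for l in range(L1):
            for m in range(L1):
                p = l + m + 1
                out = out + a[k, l] * a[k, m] * z.pow(p) / p
    return out
```

Stacking one such monotone map per coordinate, each conditioned on the previous coordinates, gives the triangular map the framework builds on.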

Progressive Pose Attention Transfer for Person Image Generation

Title Progressive Pose Attention Transfer for Person Image Generation
Authors Zhen Zhu, Tengteng Huang, Baoguang Shi, Miao Yu, Bofei Wang, Xiang Bai
Abstract This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose. The generator comprises a sequence of Pose-Attentional Transfer Blocks, each of which transfers certain regions it attends to, generating the person image progressively. Compared with previous works, our generated person images possess better appearance and shape consistency with the input images and thus look significantly more realistic. The efficacy and efficiency of the proposed network are validated both qualitatively and quantitatively on Market-1501 and DeepFashion. Furthermore, the proposed architecture can generate training images for person re-identification, alleviating data insufficiency. Code and models are available at: https://github.com/tengteng95/Pose-Transfer.git. (A hedged sketch of the block follows this entry.)
Tasks Image Generation, Person Re-Identification, Pose Transfer
Published 2019-04-06
URL https://arxiv.org/abs/1904.03349v3
PDF https://arxiv.org/pdf/1904.03349v3.pdf
PWC https://paperswithcode.com/paper/progressive-pose-attention-transfer-for
Repo https://github.com/zsypotter/pose_transfer_keypoint
Framework pytorch
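
A hedged sketch of one Pose-Attentional Transfer Block, assuming two parallel pathways (image code and pose code) in which the pose pathway emits a soft mask that gates a residual update of the image code. Channel sizes, normalization choices, and the exact update rule are our assumptions; the authors' repo has the real block.

```python
import torch
import torch.nn as nn

class PoseAttentionBlock(nn.Module):
    """Sketch: the pose pathway attends to the regions to transfer and
    gates a residual update of the image pathway."""
    def __init__(self, ch):
        super().__init__()
        self.img_conv = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU())
        self.pose_conv = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU())
        self.att = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, f_img, f_pose):
        f_pose = self.pose_conv(f_pose)
        mask = torch.sigmoid(self.att(f_pose))       # where to transfer
        f_img = f_img + mask * self.img_conv(f_img)  # gated residual update
        return f_img, f_pose
```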

Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradient Descent

Title Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradient Descent
Authors Tomer Lancewicki, Selcuk Kopru
Abstract Stochastic Gradient Descent (SGD) methods are prominent for training machine learning and deep learning models. The performance of these techniques depends on their hyperparameter tuning over time and varies for different models and problems. Manual adjustment of hyperparameters is costly and time-consuming, and even when done correctly it lacks theoretical justification, which inevitably leads to “rule of thumb” settings. In this paper, we propose a generic approach that utilizes the statistics of an unbiased gradient estimator to automatically and simultaneously adjust two paramount hyperparameters: the learning rate and momentum. We deploy the proposed general technique with various SGD methods to train Convolutional Neural Networks (CNNs). The results match the performance of the best settings obtained through an exhaustive search and therefore remove the need for tedious manual tuning. (A hedged illustration of gradient-statistics-driven tuning follows this entry.)
Tasks
Published 2019-08-20
URL https://arxiv.org/abs/1908.07607v1
PDF https://arxiv.org/pdf/1908.07607v1.pdf
PWC https://paperswithcode.com/paper/190807607
Repo https://github.com/eBay/AutoOpt
Framework pytorch
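
The paper's adjustment rules are derived from the statistics of an unbiased gradient estimator; the exact formulas live in the paper and the AutoOpt repo. As a clearly swapped-in illustration of the broader idea, driving hyperparameters from gradient statistics rather than hand tuning, here is a classic signal-to-noise heuristic for the learning rate:

```python
import torch

def snr_scaled_lr(base_lr, grads):
    """Scale a base learning rate by the gradient signal-to-noise ratio
    ||E[g]||^2 / E[||g||^2], estimated from a list of per-microbatch
    gradient vectors. A generic heuristic for illustration only, not
    the adjustment rule derived in the paper."""
    g = torch.stack(grads)                   # (n, d)
    signal = g.mean(0).pow(2).sum()          # ||E[g]||^2
    power = g.pow(2).sum(1).mean()           # E[||g||^2]
    return base_lr * (signal / (power + 1e-12)).item()
```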

Asking Clarifying Questions in Open-Domain Information-Seeking Conversations

Title Asking Clarifying Questions in Open-Domain Information-Seeking Conversations
Authors Mohammad Aliannejadi, Hamed Zamani, Fabio Crestani, W. Bruce Croft
Abstract Users often fail to formulate their complex information needs in a single query. As a consequence, they may need to scan multiple result pages or reformulate their queries, which may be a frustrating experience. Alternatively, systems can improve user satisfaction by proactively asking questions of the users to clarify their information needs. Asking clarifying questions is especially important in conversational systems since they can only return a limited number of (often only one) result(s). In this paper, we formulate the task of asking clarifying questions in open-domain information-seeking conversational systems. To this end, we propose an offline evaluation methodology for the task and collect a dataset, called Qulac, through crowdsourcing. Our dataset is built on top of the TREC Web Track 2009-2012 data and consists of over 10K question-answer pairs for 198 TREC topics with 762 facets. Our experiments on an oracle model demonstrate that asking only one good question leads to over 170% retrieval performance improvement in terms of P@1, which clearly demonstrates the potential impact of the task. We further propose a retrieval framework consisting of three components: question retrieval, question selection, and document retrieval. In particular, our question selection model takes into account the original query and previous question-answer interactions while selecting the next question. Our model significantly outperforms competitive baselines. To foster research in this area, we have made Qulac publicly available.
Tasks
Published 2019-07-15
URL https://arxiv.org/abs/1907.06554v1
PDF https://arxiv.org/pdf/1907.06554v1.pdf
PWC https://paperswithcode.com/paper/asking-clarifying-questions-in-open-domain
Repo https://github.com/aliannejadi/qulac
Framework none

Certified Adversarial Robustness via Randomized Smoothing

Title Certified Adversarial Robustness via Randomized Smoothing
Authors Jeremy M Cohen, Elan Rosenfeld, J. Zico Kolter
Abstract We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the $\ell_2$ norm. This “randomized smoothing” technique has been proposed recently in the literature, but existing guarantees are loose. We prove a tight robustness guarantee in the $\ell_2$ norm for smoothing with Gaussian noise. We use randomized smoothing to obtain an ImageNet classifier with, e.g., a certified top-1 accuracy of 49% under adversarial perturbations with $\ell_2$ norm less than 0.5 (=127/255). No certified defense has been shown feasible on ImageNet except for smoothing. On smaller-scale datasets where competing approaches to certified $\ell_2$ robustness are viable, smoothing delivers higher certified accuracies. Our strong empirical results suggest that randomized smoothing is a promising direction for future research into adversarially robust classification. Code and models are available at http://github.com/locuslab/smoothing. (A minimal Monte-Carlo sketch of certification follows this entry.)
Tasks Adversarial Defense
Published 2019-02-08
URL https://arxiv.org/abs/1902.02918v2
PDF https://arxiv.org/pdf/1902.02918v2.pdf
PWC https://paperswithcode.com/paper/certified-adversarial-robustness-via
Repo https://github.com/locuslab/smoothing
Framework pytorch
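
A minimal Monte-Carlo sketch of the certification step for a single input, assuming a PyTorch base classifier: sample Gaussian-noised copies, take the empirical top-class probability $p_A$, and convert it to the paper's radius $R = \sigma\,\Phi^{-1}(p_A)$. The real CERTIFY procedure in the linked repo additionally lower-bounds $p_A$ with a binomial confidence interval; using the raw estimate as below is illustrative only.

```python
import torch
from scipy.stats import norm

@torch.no_grad()
def certify(model, x, sigma, n=1000, num_classes=10):
    """Estimate the smoothed classifier's prediction on x (C, H, W) and
    a certified l2 radius R = sigma * Phi^{-1}(pA), abstaining when the
    empirical top-class probability pA does not exceed 1/2."""
    counts = torch.zeros(num_classes)
    for _ in range(n):
        noisy = (x + sigma * torch.randn_like(x)).unsqueeze(0)
        counts[model(noisy).argmax().item()] += 1
    p_a, top = (counts / n).max(0)
    if p_a.item() <= 0.5:
        return top.item(), 0.0               # abstain: no certificate
    return top.item(), sigma * norm.ppf(p_a.item())
```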

Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples?

Title Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples?
Authors Nicholas Carlini
Abstract No.
Tasks Adversarial Attack, Adversarial Defense
Published 2019-02-06
URL http://arxiv.org/abs/1902.02322v1
PDF http://arxiv.org/pdf/1902.02322v1.pdf
PWC https://paperswithcode.com/paper/is-ami-attacks-meet-interpretability-robust
Repo https://github.com/carlini/AmI
Framework tf

Interpretable Embeddings From Molecular Simulations Using Gaussian Mixture Variational Autoencoders

Title Interpretable Embeddings From Molecular Simulations Using Gaussian Mixture Variational Autoencoders
Authors Yasemin Bozkurt Varolgunes, Tristan Bereau, Joseph F. Rudzinski
Abstract Extracting insight from the enormous quantity of data generated from molecular simulations requires the identification of a small number of collective variables whose corresponding low-dimensional free-energy landscape retains the essential features of the underlying system. Data-driven techniques provide a systematic route to constructing this landscape, without the need for extensive a priori intuition into the relevant driving forces. In particular, autoencoders are powerful tools for dimensionality reduction, as they naturally force an information bottleneck and, thereby, a low-dimensional embedding of the essential features. While variational autoencoders ensure continuity of the embedding by assuming a unimodal Gaussian prior, this is at odds with the multi-basin free-energy landscapes that typically arise from the identification of meaningful collective variables. In this work, we incorporate this physical intuition into the prior by employing a Gaussian mixture variational autoencoder (GMVAE), which encourages the separation of metastable states within the embedding. The GMVAE performs dimensionality reduction and clustering within a single unified framework, and is capable of identifying the inherent dimensionality of the input data, in terms of the number of Gaussians required to categorize the data. We illustrate our approach on two toy models, alanine dipeptide, and a challenging disordered peptide ensemble, demonstrating the enhanced clustering effect of the GMVAE prior compared to standard VAEs. The resulting embeddings appear to be promising representations for constructing Markov state models, highlighting the transferability of the dimensionality reduction from static equilibrium properties to dynamics. (A sketch of the mixture prior follows this entry.)
Tasks Dimensionality Reduction
Published 2019-12-22
URL https://arxiv.org/abs/1912.12175v1
PDF https://arxiv.org/pdf/1912.12175v1.pdf
PWC https://paperswithcode.com/paper/interpretable-embeddings-from-molecular
Repo https://github.com/yabozkurt/gmvae
Framework tf
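
The ingredient that distinguishes a GMVAE from a standard VAE is the mixture prior over the latent code. A minimal sketch of its log-density, which enters the ELBO in place of the unimodal $N(0, I)$ term (the paper's full objective also handles cluster assignments; shapes and names here are illustrative):

```python
import math
import torch

def gmm_log_prior(z, pi, mu, log_var):
    """log p(z) under a K-component diagonal Gaussian mixture prior.
    Shapes: z (B, D); pi (K,); mu, log_var (K, D)."""
    z = z.unsqueeze(1)                                   # (B, 1, D)
    log_comp = -0.5 * (((z - mu) ** 2) / log_var.exp()
                       + log_var + math.log(2 * math.pi)).sum(-1)  # (B, K)
    return torch.logsumexp(torch.log(pi) + log_comp, dim=1)        # (B,)
```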

Expert Sample Consensus Applied to Camera Re-Localization

Title Expert Sample Consensus Applied to Camera Re-Localization
Authors Eric Brachmann, Carsten Rother
Abstract Fitting model parameters to a set of noisy data points is a common problem in computer vision. In this work, we fit the 6D camera pose to a set of noisy correspondences between the 2D input image and a known 3D environment. We estimate these correspondences from the image using a neural network. Since the correspondences often contain outliers, we utilize a robust estimator such as Random Sample Consensus (RANSAC) or Differentiable RANSAC (DSAC) to fit the pose parameters. When the problem domain, e.g., the space of all 2D-3D correspondences, is large or ambiguous, a single network does not cover the domain well. Mixture of Experts (MoE) is a popular strategy to divide a problem domain among an ensemble of specialized networks, so-called experts, where a gating network decides which expert is responsible for a given input. In this work, we introduce Expert Sample Consensus (ESAC), which integrates DSAC into a MoE. Our main technical contribution is an efficient method to train ESAC jointly and end-to-end. We demonstrate experimentally that ESAC handles two real-world problems, scalability and ambiguity, better than competing methods. We apply ESAC to fitting simple geometric models to synthetic images and to camera re-localization on difficult, real datasets. (A minimal sketch of the gated inference follows this entry.)
Tasks
Published 2019-08-07
URL https://arxiv.org/abs/1908.02484v1
PDF https://arxiv.org/pdf/1908.02484v1.pdf
PWC https://paperswithcode.com/paper/expert-sample-consensus-applied-to-camera-re
Repo https://github.com/vislearn/esac
Framework pytorch
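
A minimal sketch of ESAC-style inference under our reading of the abstract: a gating network routes the input to one specialized expert, the expert predicts correspondences, and a (differentiable) sample-consensus stage fits the pose. All names here are placeholders, and the joint end-to-end training that is the paper's main contribution is omitted.

```python
import torch

def esac_inference(gating_net, experts, image, sample_consensus):
    """Route the input to the most probable expert, predict 2D-3D
    correspondences with it, then fit the 6D pose by (differentiable)
    RANSAC. `sample_consensus` stands in for DSAC."""
    probs = torch.softmax(gating_net(image), dim=-1)
    e = int(torch.argmax(probs))             # hard selection at test time
    correspondences = experts[e](image)      # expert-specific predictions
    return sample_consensus(correspondences)
```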

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

Title Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution
Authors Yunpeng Chen, Haoqi Fan, Bing Xu, Zhicheng Yan, Yannis Kalantidis, Marcus Rohrbach, Shuicheng Yan, Jiashi Feng
Abstract In natural images, information is conveyed at different frequencies: higher frequencies are usually encoded with fine details and lower frequencies with global structures. Similarly, the output feature maps of a convolution layer can be seen as a mixture of information at different frequencies. In this work, we propose to factorize the mixed feature maps by their frequencies and design a novel Octave Convolution (OctConv) operation that stores and processes the feature maps that vary more slowly in space at a lower spatial resolution, reducing both memory and computation cost. Unlike existing multi-scale methods, OctConv is formulated as a single, generic, plug-and-play convolutional unit that can be used as a direct replacement for (vanilla) convolutions without any adjustments to the network architecture. It is also orthogonal and complementary to methods that propose better topologies or reduce channel-wise redundancy, such as group or depth-wise convolutions. We experimentally show that simply replacing convolutions with OctConv consistently boosts accuracy for both image and video recognition tasks while reducing memory and computational cost. An OctConv-equipped ResNet-152 achieves 82.9% top-1 classification accuracy on ImageNet with merely 22.2 GFLOPs. (A minimal sketch of the operator follows this entry.)
Tasks Image Classification, Video Recognition
Published 2019-04-10
URL https://arxiv.org/abs/1904.05049v3
PDF https://arxiv.org/pdf/1904.05049v3.pdf
PWC https://paperswithcode.com/paper/drop-an-octave-reducing-spatial-redundancy-in
Repo https://github.com/matsuren/OctaveConv.pytorch
Framework pytorch
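
A minimal sketch of the operator, assuming the common formulation with a channel fraction `alpha` routed to the low-frequency branch at half resolution and four paths exchanging information (H→H, H→L, L→H, L→L); the first- and last-layer variants that create or merge the two branches are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctConv(nn.Module):
    """Octave Convolution sketch: high-frequency features at full
    resolution, low-frequency features at half resolution, with
    pooling/upsampling to exchange information between the two."""
    def __init__(self, in_ch, out_ch, k=3, alpha=0.5):
        super().__init__()
        l_in, l_out = int(alpha * in_ch), int(alpha * out_ch)
        h_in, h_out = in_ch - l_in, out_ch - l_out
        p = k // 2
        self.hh = nn.Conv2d(h_in, h_out, k, padding=p)  # H -> H
        self.hl = nn.Conv2d(h_in, l_out, k, padding=p)  # H -> L
        self.lh = nn.Conv2d(l_in, h_out, k, padding=p)  # L -> H
        self.ll = nn.Conv2d(l_in, l_out, k, padding=p)  # L -> L

    def forward(self, x_h, x_l):
        # x_h: (B, h_in, H, W); x_l: (B, l_in, H/2, W/2)
        y_h = self.hh(x_h) + F.interpolate(self.lh(x_l),
                                           scale_factor=2, mode='nearest')
        y_l = self.ll(x_l) + self.hl(F.avg_pool2d(x_h, 2))
        return y_h, y_l
```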

Rethinking Normalization and Elimination Singularity in Neural Networks

Title Rethinking Normalization and Elimination Singularity in Neural Networks
Authors Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille
Abstract In this paper, we study normalization methods for neural networks from the perspective of elimination singularity. Elimination singularities correspond to points on the training trajectory where neurons become consistently deactivated; they cause degenerate manifolds in the loss landscape, which slow down training and harm model performance. We show that channel-based normalizations (e.g., Layer Normalization and Group Normalization) are unable to guarantee a far distance from elimination singularities, in contrast with Batch Normalization, which by design keeps models from getting too close to them. To address this issue, we propose Batch-Channel Normalization (BCN), which uses batch knowledge to avoid elimination singularities when training channel-normalized models. Unlike Batch Normalization, BCN can run in both large-batch and micro-batch training settings. The effectiveness of BCN is verified on many tasks, including image classification, object detection, instance segmentation, and semantic segmentation. The code is available at https://github.com/joe-siyuan-qiao/Batch-Channel-Normalization. (A minimal sketch follows this entry.)
Tasks Image Classification, Instance Segmentation, Object Detection, Semantic Segmentation
Published 2019-11-21
URL https://arxiv.org/abs/1911.09738v1
PDF https://arxiv.org/pdf/1911.09738v1.pdf
PWC https://paperswithcode.com/paper/rethinking-normalization-and-elimination
Repo https://github.com/joe-siyuan-qiao/Batch-Channel-Normalization
Framework pytorch
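
A minimal sketch of the layer under the reading that BCN composes batch statistics with channel statistics; the paper's micro-batch variant replaces mini-batch statistics with running estimates, which this two-line composition does not capture.

```python
import torch.nn as nn

class BatchChannelNorm(nn.Module):
    """Sketch: batch normalization (the batch knowledge that keeps
    models away from elimination singularities) followed by a
    channel-based normalization (GroupNorm) within each sample."""
    def __init__(self, ch, groups=32):
        super().__init__()
        self.bn = nn.BatchNorm2d(ch)
        self.gn = nn.GroupNorm(groups, ch)

    def forward(self, x):
        return self.gn(self.bn(x))
```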

Induction Networks for Few-Shot Text Classification

Title Induction Networks for Few-Shot Text Classification
Authors Ruiying Geng, Binhua Li, Yongbin Li, Xiaodan Zhu, Ping Jian, Jian Sun
Abstract Text classification tends to struggle when data is deficient or when the model needs to adapt to unseen classes. In such challenging scenarios, recent studies have used meta-learning to simulate the few-shot task, in which new queries are compared to a small support set at the sample-wise level. However, this sample-wise comparison can be severely disturbed by the varied expressions within the same class. We should therefore learn a general representation of each class in the support set and compare new queries against it. In this paper, we propose a novel Induction Network that learns such a generalized class-wise representation by leveraging the dynamic routing algorithm in meta-learning. In this way, the model is better able to induce and generalize. We evaluate the proposed model on a well-studied sentiment classification dataset (English) and a real-world dialogue intent classification dataset (Chinese). Experimental results show that on both datasets the proposed model significantly outperforms existing state-of-the-art approaches, proving the effectiveness of class-wise generalization in few-shot text classification. (A sketch of the routing step follows this entry.)
Tasks Intent Classification, Meta-Learning, Sentiment Analysis, Text Classification
Published 2019-02-27
URL https://arxiv.org/abs/1902.10482v2
PDF https://arxiv.org/pdf/1902.10482v2.pdf
PWC https://paperswithcode.com/paper/few-shot-text-classification-with-induction
Repo https://github.com/laohur/LearnToCompareText
Framework pytorch
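
A sketch of the dynamic-routing step that induces one class vector from the K encoded support samples of a class. The transformation matrices and other details of the paper's induction module are omitted, so this shows only the routing-by-agreement loop:

```python
import torch

def squash(v, dim=-1, eps=1e-9):
    """Capsule-style squashing nonlinearity: preserves direction and
    maps the norm into [0, 1)."""
    n2 = (v ** 2).sum(dim, keepdim=True)
    return (n2 / (1 + n2)) * v / (n2.sqrt() + eps)

def induce_class_vector(support, iters=3):
    """support: (K, D) encoded support samples of one class."""
    b = torch.zeros(support.size(0))            # routing logits
    for _ in range(iters):
        d = torch.softmax(b, dim=0)             # coupling coefficients
        c = squash((d.unsqueeze(1) * support).sum(0))  # candidate class vector
        b = b + support @ c                     # agreement update
    return c
```

New queries are then compared against the induced class vectors rather than against individual support samples.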

Risky Action Recognition in Lane Change Video Clips using Deep Spatiotemporal Networks with Segmentation Mask Transfer

Title Risky Action Recognition in Lane Change Video Clips using Deep Spatiotemporal Networks with Segmentation Mask Transfer
Authors Ekim Yurtsever, Yongkang Liu, Jacob Lambert, Chiyomi Miyajima, Eijiro Takeuchi, Kazuya Takeda, John H. L. Hansen
Abstract Advanced driver assistance and automated driving systems rely on risk estimation modules to predict and avoid dangerous situations. Current methods use expensive sensor setups and complex processing pipelines, limiting their availability and robustness. To address these issues, we introduce a novel deep-learning-based action recognition framework for classifying dangerous lane change behavior in short video clips captured by a monocular camera. We designed a deep spatiotemporal classification network that uses the pre-trained state-of-the-art instance segmentation network Mask R-CNN as its spatial feature extractor. The Long Short-Term Memory (LSTM) and shallower final classification layers of the proposed method were trained on a semi-naturalistic lane change dataset with annotated risk labels. A comprehensive comparison of state-of-the-art feature extractors was carried out to find the best network layout and training strategy. The best result, an AUC of 0.937, was obtained with the proposed network. Our code and trained models are available open source. (A sketch of the layout follows this entry.)
Tasks Instance Segmentation, Semantic Segmentation
Published 2019-06-07
URL https://arxiv.org/abs/1906.02859v2
PDF https://arxiv.org/pdf/1906.02859v2.pdf
PWC https://paperswithcode.com/paper/risky-action-recognition-in-lane-change-video
Repo https://github.com/Ekim-Yurtsever/DeepTL-Lane-Change-Classification
Framework tf
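
A sketch of the overall layout, assuming a per-frame feature extractor standing in for the Mask R-CNN backbone; the feature dimension, hidden size, and two-class head are our assumptions:

```python
import torch
import torch.nn as nn

class ClipRiskClassifier(nn.Module):
    """Sketch: per-frame spatial features -> LSTM -> risky/safe head.
    `extractor` maps a batch of frames (B, C, H, W) to (B, feat_dim);
    the paper uses a pre-trained Mask R-CNN in this role."""
    def __init__(self, extractor, feat_dim=1024, hidden=128):
        super().__init__()
        self.extractor = extractor
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, clip):                 # clip: (B, T, C, H, W)
        T = clip.shape[1]
        feats = torch.stack([self.extractor(clip[:, t]) for t in range(T)], 1)
        _, (h, _) = self.lstm(feats)         # final hidden state
        return self.head(h[-1])              # logits: risky vs. safe
```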