Paper Group AWR 113
Robust Learning with Jacobian Regularization
Title | Robust Learning with Jacobian Regularization |
Authors | Judy Hoffman, Daniel A. Roberts, Sho Yaida |
Abstract | Design of reliable systems must guarantee stability against input perturbations. In machine learning, such guarantee entails preventing overfitting and ensuring robustness of models against corruption of input data. In order to maximize stability, we analyze and develop a computationally efficient implementation of Jacobian regularization that increases classification margins of neural networks. The stabilizing effect of the Jacobian regularizer leads to significant improvements in robustness, as measured against both random and adversarial input perturbations, without severely degrading generalization properties on clean data. |
Tasks | |
Published | 2019-08-07 |
URL | https://arxiv.org/abs/1908.02729v1, https://arxiv.org/pdf/1908.02729v1.pdf |
PWC | https://paperswithcode.com/paper/robust-learning-with-jacobian-regularization |
Repo | https://github.com/facebookresearch/jacobian_regularizer |
Framework | pytorch |
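The regularizer described in the abstract penalizes how strongly a model's outputs react to small input changes, which can be measured by the Frobenius norm of the input-output Jacobian. The following is a minimal pure-Python sketch under stated assumptions: `toy_model`, the weight `lam`, and the central finite-difference Jacobian are all hypothetical illustrations, not the computationally efficient implementation in the linked repository.

```python
import math

def toy_model(x):
    """Hypothetical 2-input, 2-output model used only for illustration."""
    return [math.tanh(x[0] + 2.0 * x[1]), 0.5 * x[0] * x[1]]

def jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian: J[i][j] = d f_i / d x_j."""
    n_out = len(f(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return J

def jacobian_penalty(f, x):
    """Squared Frobenius norm of the Jacobian of f at x."""
    return sum(v * v for row in jacobian(f, x) for v in row)

def regularized_loss(task_loss, f, x, lam=0.01):
    """Task loss plus (lam / 2) times the Jacobian penalty (lam is an assumed weight)."""
    return task_loss + 0.5 * lam * jacobian_penalty(f, x)
```

A smaller penalty means that a perturbation of the input moves the outputs less, which is the stabilizing effect the abstract refers to. Finite differences cost two model evaluations per input dimension, so this form is only practical for toy inputs.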
Compact Global Descriptor for Neural Networks
Title | Compact Global Descriptor for Neural Networks |
Authors | Xiangyu He, Ke Cheng, Qiang Chen, Qinghao Hu, Peisong Wang, Jian Cheng |
Abstract | Long-range dependency modeling, widely used in capturing spatiotemporal correlation, has been shown to be effective in CNN-dominated computer vision tasks. Yet neither stacks of convolutional operations to enlarge receptive fields nor recent nonlocal modules are computationally efficient. In this paper, we present a generic family of lightweight global descriptors for modeling the interactions between positions across different dimensions (e.g., channels, frames). This descriptor enables subsequent convolutions to access the informative global features with negligible computational complexity and parameters. Benchmark experiments show that the proposed method can compete with state-of-the-art long-range mechanisms with a significant reduction in extra computing cost. Code available at https://github.com/HolmesShuan/Compact-Global-Descriptor. |
Tasks | Audio Classification, Deep Attention, Image Classification, Object Detection |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.09665v3, https://arxiv.org/pdf/1907.09665v3.pdf |
PWC | https://paperswithcode.com/paper/compact-global-descriptor-for-neural-networks |
Repo | https://github.com/HolmesShuan/Compact-Global-Descriptor |
Framework | pytorch |
Proactive Human-Machine Conversation with Explicit Conversation Goals
Title | Proactive Human-Machine Conversation with Explicit Conversation Goals |
Authors | Wenquan Wu, Zhen Guo, Xiangyang Zhou, Hua Wu, Xiyuan Zhang, Rongzhong Lian, Haifeng Wang |
Abstract | Though great progress has been made in human-machine conversation, current dialogue systems are still in their infancy: they usually converse passively and utter words more as a matter of response than on their own initiative. In this paper, we take a radical step towards building a human-like conversational agent: endowing it with the ability to proactively lead the conversation (introducing a new topic or maintaining the current topic). To facilitate the development of such conversation systems, we create a new dataset named DuConv in which one participant acts as the conversation leader and the other acts as the follower. The leader is provided with a knowledge graph and asked to sequentially change the discussion topics, following the given conversation goal, while keeping the dialogue as natural and engaging as possible. DuConv enables a very challenging task, as the model needs to both understand dialogue and plan over the given knowledge graph. We establish baseline results on this dataset (about 270K utterances and 30K dialogues) using several state-of-the-art models. Experimental results show that dialogue models that plan over the knowledge graph can make full use of related knowledge to generate more diverse multi-turn conversations. The baseline systems along with the dataset are publicly available. |
Tasks | |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05572v2, https://arxiv.org/pdf/1906.05572v2.pdf |
PWC | https://paperswithcode.com/paper/proactive-human-machine-conversation-with |
Repo | https://github.com/baidu/Dialogue |
Framework | tf |
Sum-of-Squares Polynomial Flow
Title | Sum-of-Squares Polynomial Flow |
Authors | Priyank Jaini, Kira A. Selby, Yaoliang Yu |
Abstract | The triangular map is a recent construct in probability theory that allows one to transform any source probability density function to any target density function. Based on triangular maps, we propose a general framework for high-dimensional density estimation, by specifying one-dimensional transformations (equivalently conditional densities) and appropriate conditioner networks. This framework (a) reveals the commonalities and differences of existing autoregressive and flow-based methods, (b) allows a unified understanding of the limitations and representation power of these recent approaches, and (c) motivates us to uncover a new Sum-of-Squares (SOS) flow that is interpretable, universal, and easy to train. We perform several synthetic experiments on various density geometries to demonstrate the benefits (and shortcomings) of such transformations. SOS flows achieve competitive results in simulations and several real-world datasets. |
Tasks | Density Estimation |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02325v2, https://arxiv.org/pdf/1905.02325v2.pdf |
PWC | https://paperswithcode.com/paper/sum-of-squares-polynomial-flow |
Repo | https://github.com/GinGinWang/MTQ |
Framework | pytorch |
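The one-dimensional transformations mentioned in the abstract need to be increasing for a triangular map to be invertible. The sketch below illustrates one way to get that property: integrating a sum of squared polynomials, whose integrand is never negative. It assumes the name "Sum-of-Squares" refers to this construction; the function names and the coefficient layout are hypothetical, and the paper should be consulted for the exact parameterization and for how conditioner networks produce the coefficients.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def sos_transform(z, polys, c=0.0):
    """T(z) = c + integral from 0 to z of sum_k p_k(u)^2 du.

    Each p_k is a list of ascending coefficients. The integrand is a sum of
    squares, so T is non-decreasing in z.
    """
    total = c
    for p in polys:
        squared = poly_mul(p, p)
        # Integrate each monomial coef * u^n from 0 to z in closed form.
        total += sum(coef * z ** (n + 1) / (n + 1) for n, coef in enumerate(squared))
    return total
```

For example, with the single polynomial p(u) = 1 + u, the transform is T(z) = z + z^2 + z^3 / 3, which increases everywhere even though p itself changes sign at u = -1.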
Progressive Pose Attention Transfer for Person Image Generation
Title | Progressive Pose Attention Transfer for Person Image Generation |
Authors | Zhen Zhu, Tengteng Huang, Baoguang Shi, Miao Yu, Bofei Wang, Xiang Bai |
Abstract | This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose. The generator of the network comprises a sequence of Pose-Attentional Transfer Blocks, each of which transfers certain regions it attends to, generating the person image progressively. Compared with those in previous works, our generated person images possess better appearance consistency and shape consistency with the input images, and are thus significantly more realistic-looking. The efficacy and efficiency of the proposed network are validated both qualitatively and quantitatively on Market-1501 and DeepFashion. Furthermore, the proposed architecture can generate training images for person re-identification, alleviating data insufficiency. Codes and models are available at: https://github.com/tengteng95/Pose-Transfer.git. |
Tasks | Image Generation, Person Re-Identification, Pose Transfer |
Published | 2019-04-06 |
URL | https://arxiv.org/abs/1904.03349v3, https://arxiv.org/pdf/1904.03349v3.pdf |
PWC | https://paperswithcode.com/paper/progressive-pose-attention-transfer-for |
Repo | https://github.com/zsypotter/pose_transfer_keypoint |
Framework | pytorch |
Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradient Descent
Title | Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradient Descent |
Authors | Tomer Lancewicki, Selcuk Kopru |
Abstract | Stochastic Gradient Descent (SGD) methods are prominent for training machine learning and deep learning models. The performance of these techniques depends on their hyperparameter tuning over time and varies for different models and problems. Manual adjustment of hyperparameters is very costly and time-consuming, and even if done correctly, it lacks theoretical justification, which inevitably leads to “rule of thumb” settings. In this paper, we propose a generic approach that utilizes the statistics of an unbiased gradient estimator to automatically and simultaneously adjust two paramount hyperparameters: the learning rate and momentum. We deploy the proposed general technique for various SGD methods to train Convolutional Neural Networks (CNNs). The results match the performance of the best settings obtained through an exhaustive search and therefore remove the need for tedious manual tuning. |
Tasks | |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07607v1, https://arxiv.org/pdf/1908.07607v1.pdf |
PWC | https://paperswithcode.com/paper/190807607 |
Repo | https://github.com/eBay/AutoOpt |
Framework | pytorch |
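To make the two hyperparameters named in the abstract concrete, here is a minimal sketch of the standard SGD-with-momentum update on a one-dimensional problem. It uses fixed, hand-picked values of `lr` and `momentum` and does not implement the automatic adjustment the paper proposes; the function name and the toy quadratic objective are assumptions made for illustration.

```python
def sgd_momentum(grad, w0, lr, momentum, steps):
    """Run `steps` iterations of the classical momentum update.

    v <- momentum * v - lr * grad(w)
    w <- w + v
    """
    w, v = w0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(w)
        w = w + v
    return w

def quadratic_grad(w):
    """Gradient of the toy objective f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

# A reasonable setting converges to the minimizer w = 3.
w_good = sgd_momentum(quadratic_grad, w0=0.0, lr=0.1, momentum=0.9, steps=500)

# A learning rate that is too large diverges on the same problem.
w_bad = sgd_momentum(quadratic_grad, w0=0.0, lr=1.5, momentum=0.0, steps=50)
```

The contrast between `w_good` and `w_bad` shows why these settings normally need a search: the same code either converges or blows up depending on two numbers.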
Asking Clarifying Questions in Open-Domain Information-Seeking Conversations
Title | Asking Clarifying Questions in Open-Domain Information-Seeking Conversations |
Authors | Mohammad Aliannejadi, Hamed Zamani, Fabio Crestani, W. Bruce Croft |
Abstract | Users often fail to formulate their complex information needs in a single query. As a consequence, they may need to scan multiple result pages or reformulate their queries, which may be a frustrating experience. Alternatively, systems can improve user satisfaction by proactively asking questions of the users to clarify their information needs. Asking clarifying questions is especially important in conversational systems since they can only return a limited number of (often only one) result(s). In this paper, we formulate the task of asking clarifying questions in open-domain information-seeking conversational systems. To this end, we propose an offline evaluation methodology for the task and collect a dataset, called Qulac, through crowdsourcing. Our dataset is built on top of the TREC Web Track 2009-2012 data and consists of over 10K question-answer pairs for 198 TREC topics with 762 facets. Our experiments on an oracle model demonstrate that asking only one good question leads to over 170% retrieval performance improvement in terms of P@1, which clearly demonstrates the potential impact of the task. We further propose a retrieval framework consisting of three components: question retrieval, question selection, and document retrieval. In particular, our question selection model takes into account the original query and previous question-answer interactions while selecting the next question. Our model significantly outperforms competitive baselines. To foster research in this area, we have made Qulac publicly available. |
Tasks | |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06554v1, https://arxiv.org/pdf/1907.06554v1.pdf |
PWC | https://paperswithcode.com/paper/asking-clarifying-questions-in-open-domain |
Repo | https://github.com/aliannejadi/qulac |
Framework | none |
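The retrieval gain quoted in the abstract is stated in terms of P@1, the precision of the top-ranked result, averaged over queries. A small sketch of how such a metric and a relative improvement are computed is given below. The relevance lists are hypothetical toy data, not Qulac results, and the function names are chosen for illustration.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k ranked results that are relevant.

    `relevance` is a list of 0/1 labels in ranked order.
    """
    return sum(relevance[:k]) / k

def mean_p_at_1(rankings):
    """Average P@1 over a list of per-query relevance lists."""
    return sum(precision_at_k(r, 1) for r in rankings) / len(rankings)

def relative_improvement(baseline, improved):
    """Improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline

# Hypothetical rankings for four queries, before and after a clarifying question.
before = [[1, 0], [0, 1], [0, 0], [0, 1]]
after = [[1, 0], [1, 0], [0, 0], [0, 1]]
gain = relative_improvement(mean_p_at_1(before), mean_p_at_1(after))
```

Because the improvement is relative, a gain above 100% simply means that the score more than doubled, which is easy to reach when the baseline P@1 is low.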
Certified Adversarial Robustness via Randomized Smoothing
Title | Certified Adversarial Robustness via Randomized Smoothing |
Authors | Jeremy M Cohen, Elan Rosenfeld, J. Zico Kolter |
Abstract | We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the $\ell_2$ norm. This “randomized smoothing” technique has been proposed recently in the literature, but existing guarantees are loose. We prove a tight robustness guarantee in $\ell_2$ norm for smoothing with Gaussian noise. We use randomized smoothing to obtain an ImageNet classifier with e.g. a certified top-1 accuracy of 49% under adversarial perturbations with $\ell_2$ norm less than 0.5 (=127/255). No certified defense has been shown feasible on ImageNet except for smoothing. On smaller-scale datasets where competing approaches to certified $\ell_2$ robustness are viable, smoothing delivers higher certified accuracies. Our strong empirical results suggest that randomized smoothing is a promising direction for future research into adversarially robust classification. Code and models are available at http://github.com/locuslab/smoothing. |
Tasks | Adversarial Defense |
Published | 2019-02-08 |
URL | https://arxiv.org/abs/1902.02918v2, https://arxiv.org/pdf/1902.02918v2.pdf |
PWC | https://paperswithcode.com/paper/certified-adversarial-robustness-via |
Repo | https://github.com/locuslab/smoothing |
Framework | pytorch |
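The smoothing procedure in the abstract can be sketched in a few lines: classify many Gaussian-noised copies of the input, take the majority vote, and convert class probabilities into an $\ell_2$ radius. The sketch assumes the radius takes the form R = (sigma / 2) * (Phi^{-1}(p_a) - Phi^{-1}(p_b)), where p_a is a lower bound on the top-class probability and p_b an upper bound on the runner-up probability. The function names and the toy base classifier are hypothetical, and the statistical estimation of those bounds is omitted; the linked repository contains the authors' procedure.

```python
import random
from collections import Counter
from statistics import NormalDist

def smoothed_predict(base_classifier, x, sigma, n_samples=1000, seed=0):
    """Majority vote of the base classifier over Gaussian-perturbed copies of x."""
    rng = random.Random(seed)
    votes = Counter(
        base_classifier([xi + rng.gauss(0.0, sigma) for xi in x])
        for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]

def certified_radius(sigma, p_a, p_b):
    """l2 radius (sigma / 2) * (Phi^{-1}(p_a) - Phi^{-1}(p_b)).

    p_a and p_b must lie strictly between 0 and 1.
    """
    inv_cdf = NormalDist().inv_cdf
    return 0.5 * sigma * (inv_cdf(p_a) - inv_cdf(p_b))

def sign_classifier(x):
    """Hypothetical base classifier: class 1 if the first coordinate is positive."""
    return 1 if x[0] > 0 else 0
```

Calling `smoothed_predict(sign_classifier, [1.0], sigma=0.25)` votes over noisy copies of the point, and `certified_radius` shows the trade-off in the noise level: a larger `sigma` scales the radius up but makes the base classifier's job harder, which lowers p_a.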
Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples?
Title | Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples? |
Authors | Nicholas Carlini |
Abstract | No. |
Tasks | Adversarial Attack, Adversarial Defense |
Published | 2019-02-06 |
URL | http://arxiv.org/abs/1902.02322v1, http://arxiv.org/pdf/1902.02322v1.pdf |
PWC | https://paperswithcode.com/paper/is-ami-attacks-meet-interpretability-robust |
Repo | https://github.com/carlini/AmI |
Framework | tf |
Interpretable Embeddings From Molecular Simulations Using Gaussian Mixture Variational Autoencoders
Title | Interpretable Embeddings From Molecular Simulations Using Gaussian Mixture Variational Autoencoders |
Authors | Yasemin Bozkurt Varolgunes, Tristan Bereau, Joseph F. Rudzinski |
Abstract | Extracting insight from the enormous quantity of data generated from molecular simulations requires the identification of a small number of collective variables whose corresponding low-dimensional free-energy landscape retains the essential features of the underlying system. Data-driven techniques provide a systematic route to constructing this landscape, without the need for extensive a priori intuition into the relevant driving forces. In particular, autoencoders are powerful tools for dimensionality reduction, as they naturally force an information bottleneck and, thereby, a low-dimensional embedding of the essential features. While variational autoencoders ensure continuity of the embedding by assuming a unimodal Gaussian prior, this is at odds with the multi-basin free-energy landscapes that typically arise from the identification of meaningful collective variables. In this work, we incorporate this physical intuition into the prior by employing a Gaussian mixture variational autoencoder (GMVAE), which encourages the separation of metastable states within the embedding. The GMVAE performs dimensionality reduction and clustering within a single unified framework, and is capable of identifying the inherent dimensionality of the input data, in terms of the number of Gaussians required to categorize the data. We illustrate our approach on two toy models, alanine dipeptide, and a challenging disordered peptide ensemble, demonstrating the enhanced clustering effect of the GMVAE prior compared to standard VAEs. The resulting embeddings appear to be promising representations for constructing Markov state models, highlighting the transferability of the dimensionality reduction from static equilibrium properties to dynamics. |
Tasks | Dimensionality Reduction |
Published | 2019-12-22 |
URL | https://arxiv.org/abs/1912.12175v1, https://arxiv.org/pdf/1912.12175v1.pdf |
PWC | https://paperswithcode.com/paper/interpretable-embeddings-from-molecular |
Repo | https://github.com/yabozkurt/gmvae |
Framework | tf |
Expert Sample Consensus Applied to Camera Re-Localization
Title | Expert Sample Consensus Applied to Camera Re-Localization |
Authors | Eric Brachmann, Carsten Rother |
Abstract | Fitting model parameters to a set of noisy data points is a common problem in computer vision. In this work, we fit the 6D camera pose to a set of noisy correspondences between the 2D input image and a known 3D environment. We estimate these correspondences from the image using a neural network. Since the correspondences often contain outliers, we utilize a robust estimator such as Random Sample Consensus (RANSAC) or Differentiable RANSAC (DSAC) to fit the pose parameters. When the problem domain, e.g. the space of all 2D-3D correspondences, is large or ambiguous, a single network does not cover the domain well. Mixture of Experts (MoE) is a popular strategy to divide a problem domain among an ensemble of specialized networks, so-called experts, where a gating network decides which expert is responsible for a given input. In this work, we introduce Expert Sample Consensus (ESAC), which integrates DSAC in a MoE. Our main technical contribution is an efficient method to train ESAC jointly and end-to-end. We demonstrate experimentally that ESAC handles two real-world problems, scalability and ambiguity, better than competing methods. We apply ESAC to fitting simple geometric models to synthetic images, and to camera re-localization for difficult, real datasets. |
Tasks | |
Published | 2019-08-07 |
URL | https://arxiv.org/abs/1908.02484v1, https://arxiv.org/pdf/1908.02484v1.pdf |
PWC | https://paperswithcode.com/paper/expert-sample-consensus-applied-to-camera-re |
Repo | https://github.com/vislearn/esac |
Framework | pytorch |
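The robust estimator that the abstract builds on, RANSAC, can be illustrated on the simplest geometric model, a 2D line. The sketch below is plain RANSAC, not DSAC or ESAC: it has no learned correspondences, no differentiable scoring, and no gating network. The function name, the tolerance, and the synthetic points are assumptions made for illustration.

```python
import random

def ransac_line(points, n_iters=200, tol=0.1, seed=0):
    """Fit y = a * x + b by repeatedly sampling two points and counting inliers.

    Returns the best (a, b) found and its inlier count.
    """
    rng = random.Random(seed)
    best_model, best_count = None, -1
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # this sketch cannot represent vertical lines
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Consensus set: points whose vertical residual is within the tolerance.
        count = sum(1 for x, y in points if abs(y - (a * x + b)) <= tol)
        if count > best_count:
            best_model, best_count = (a, b), count
    return best_model, best_count

# Synthetic data: 20 points on y = 2x + 1 plus 5 gross outliers.
inliers = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outliers = [(3.0, 40.0), (7.0, -15.0), (11.0, 60.0), (15.0, -30.0), (18.0, 90.0)]
model, support = ransac_line(inliers + outliers)
```

A least-squares fit over all 25 points would be pulled toward the outliers, whereas the hypothesis with the largest consensus set ignores them. Camera pose estimation follows the same pattern with a pose solver in place of the two-point line fit.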
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution
Title | Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution |
Authors | Yunpeng Chen, Haoqi Fan, Bing Xu, Zhicheng Yan, Yannis Kalantidis, Marcus Rohrbach, Shuicheng Yan, Jiashi Feng |
Abstract | In natural images, information is conveyed at different frequencies where higher frequencies are usually encoded with fine details and lower frequencies are usually encoded with global structures. Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies. In this work, we propose to factorize the mixed feature maps by their frequencies, and design a novel Octave Convolution (OctConv) operation to store and process feature maps that vary spatially “slower” at a lower spatial resolution, reducing both memory and computation cost. Unlike existing multi-scale methods, OctConv is formulated as a single, generic, plug-and-play convolutional unit that can be used as a direct replacement of (vanilla) convolutions without any adjustments in the network architecture. It is also orthogonal and complementary to methods that suggest better topologies or reduce channel-wise redundancy like group or depth-wise convolutions. We experimentally show that by simply replacing convolutions with OctConv, we can consistently boost accuracy for both image and video recognition tasks, while reducing memory and computational cost. An OctConv-equipped ResNet-152 can achieve 82.9% top-1 classification accuracy on ImageNet with merely 22.2 GFLOPs. |
Tasks | Image Classification, Video Recognition |
Published | 2019-04-10 |
URL | https://arxiv.org/abs/1904.05049v3, https://arxiv.org/pdf/1904.05049v3.pdf |
PWC | https://paperswithcode.com/paper/drop-an-octave-reducing-spatial-redundancy-in |
Repo | https://github.com/matsuren/OctaveConv.pytorch |
Framework | pytorch |
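The memory saving claimed in the abstract follows from simple arithmetic once some channels are stored at a lower resolution. The sketch below counts stored activations, assuming that "octave" means the low-frequency group is kept at half the height and half the width, and that a fraction `alpha` of the channels belongs to that group. The function name is hypothetical, and no convolution is performed here.

```python
def octave_feature_memory(channels, height, width, alpha):
    """Count stored activations when a fraction `alpha` of the channels
    is kept at half spatial resolution and the rest at full resolution.
    """
    low_channels = int(alpha * channels)
    high_channels = channels - low_channels
    full_res = high_channels * height * width
    half_res = low_channels * (height // 2) * (width // 2)
    return full_res + half_res

vanilla = octave_feature_memory(64, 32, 32, alpha=0.0)
octave = octave_feature_memory(64, 32, 32, alpha=0.5)
ratio = octave / vanilla
```

With half of the channels at half resolution, each low-frequency channel needs a quarter of the storage, so the total falls to 1/2 + 1/2 * 1/4 = 0.625 of the vanilla feature map.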
Rethinking Normalization and Elimination Singularity in Neural Networks
Title | Rethinking Normalization and Elimination Singularity in Neural Networks |
Authors | Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille |
Abstract | In this paper, we study normalization methods for neural networks from the perspective of elimination singularity. Elimination singularities correspond to the points on the training trajectory where neurons become consistently deactivated. They cause degenerate manifolds in the loss landscape which will slow down training and harm model performance. We show that channel-based normalizations (e.g. Layer Normalization and Group Normalization) are unable to guarantee sufficient distance from elimination singularities, in contrast with Batch Normalization, which by design keeps models from getting too close to them. To address this issue, we propose BatchChannel Normalization (BCN), which uses batch knowledge to avoid the elimination singularities in the training of channel-normalized models. Unlike Batch Normalization, BCN is able to run in both large-batch and micro-batch training settings. The effectiveness of BCN is verified on many tasks, including image classification, object detection, instance segmentation, and semantic segmentation. The code is here: https://github.com/joe-siyuan-qiao/Batch-Channel-Normalization. |
Tasks | Image Classification, Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09738v1, https://arxiv.org/pdf/1911.09738v1.pdf |
PWC | https://paperswithcode.com/paper/rethinking-normalization-and-elimination |
Repo | https://github.com/joe-siyuan-qiao/Batch-Channel-Normalization |
Framework | pytorch |
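The distinction the abstract draws between batch-based and channel-based normalization comes down to which axis the statistics are computed over. The sketch below shows the two on a batch of feature vectors in pure Python. It is not BCN itself, it omits the learnable scale and shift, and the helper names are hypothetical.

```python
import math

def _standardize(values, eps=1e-5):
    """Subtract the mean and divide by the standard deviation of `values`."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def batch_norm(batch):
    """Normalize each feature (column) across the samples in the batch."""
    columns = [_standardize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]

def layer_norm(batch):
    """Normalize each sample (row) across its own features."""
    return [_standardize(row) for row in batch]

batch = [[1.0, 2.0, 3.0], [3.0, 6.0, 9.0]]
bn_out = batch_norm(batch)
ln_out = layer_norm(batch)

# With a single sample, batch statistics carry no information:
# every feature is mapped to zero.
micro_out = batch_norm([[1.0, 2.0, 3.0]])
```

The last line shows why plain batch statistics break down in micro-batch training, the setting in which the abstract says BCN still runs. `layer_norm` is unaffected by batch size because it never looks across samples.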
Induction Networks for Few-Shot Text Classification
Title | Induction Networks for Few-Shot Text Classification |
Authors | Ruiying Geng, Binhua Li, Yongbin Li, Xiaodan Zhu, Ping Jian, Jian Sun |
Abstract | Text classification tends to struggle when data is deficient or when it needs to adapt to unseen classes. In such challenging scenarios, recent studies have used meta-learning to simulate the few-shot task, in which new queries are compared to a small support set at the sample-wise level. However, this sample-wise comparison may be severely disturbed by the various expressions in the same class. Therefore, we should be able to learn a general representation of each class in the support set and then compare it to new queries. In this paper, we propose a novel Induction Network to learn such a generalized class-wise representation, by innovatively leveraging the dynamic routing algorithm in meta-learning. In this way, we find the model is able to induce and generalize better. We evaluate the proposed model on a well-studied sentiment classification dataset (English) and a real-world dialogue intent classification dataset (Chinese). Experiment results show that on both datasets, the proposed model significantly outperforms the existing state-of-the-art approaches, proving the effectiveness of class-wise generalization in few-shot text classification. |
Tasks | Intent Classification, Meta-Learning, Sentiment Analysis, Text Classification |
Published | 2019-02-27 |
URL | https://arxiv.org/abs/1902.10482v2, https://arxiv.org/pdf/1902.10482v2.pdf |
PWC | https://paperswithcode.com/paper/few-shot-text-classification-with-induction |
Repo | https://github.com/laohur/LearnToCompareText |
Framework | pytorch |
Risky Action Recognition in Lane Change Video Clips using Deep Spatiotemporal Networks with Segmentation Mask Transfer
Title | Risky Action Recognition in Lane Change Video Clips using Deep Spatiotemporal Networks with Segmentation Mask Transfer |
Authors | Ekim Yurtsever, Yongkang Liu, Jacob Lambert, Chiyomi Miyajima, Eijiro Takeuchi, Kazuya Takeda, John H. L. Hansen |
Abstract | Advanced driver assistance and automated driving systems rely on risk estimation modules to predict and avoid dangerous situations. Current methods use expensive sensor setups and complex processing pipelines, limiting their availability and robustness. To address these issues, we introduce a novel deep-learning-based action recognition framework for classifying dangerous lane change behavior in short video clips captured by a monocular camera. We designed a deep spatiotemporal classification network that uses the pre-trained state-of-the-art instance segmentation network Mask R-CNN as its spatial feature extractor for this task. The Long Short-Term Memory (LSTM) and shallower final classification layers of the proposed method were trained on a semi-naturalistic lane change dataset with annotated risk labels. A comprehensive comparison of state-of-the-art feature extractors was carried out to find the best network layout and training strategy. The best result, with a 0.937 AUC score, was obtained with the proposed network. Our code and trained models are available open-source. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.02859v2, https://arxiv.org/pdf/1906.02859v2.pdf |
PWC | https://paperswithcode.com/paper/risky-action-recognition-in-lane-change-video |
Repo | https://github.com/Ekim-Yurtsever/DeepTL-Lane-Change-Classification |
Framework | tf |