Paper Group AWR 81
Bridging Adversarial Robustness and Gradient Interpretability
Title | Bridging Adversarial Robustness and Gradient Interpretability |
Authors | Beomsu Kim, Junghoon Seo, Taegyun Jeon |
Abstract | Adversarial training is a training scheme designed to counter adversarial attacks by augmenting the training dataset with adversarial examples. Surprisingly, several studies have observed that loss gradients from adversarially trained DNNs are visually more interpretable than those from standard DNNs. Although this phenomenon is interesting, only a few works have offered an explanation. In this paper, we attempted to bridge this gap between adversarial robustness and gradient interpretability. To this end, we identified that loss gradients from adversarially trained DNNs align better with human perception because adversarial training restricts gradients closer to the image manifold. We then demonstrated that adversarial training causes loss gradients to be quantitatively meaningful. Finally, we showed that under the adversarial training framework, there exists an empirical trade-off between test accuracy and loss gradient interpretability and proposed two potential approaches to resolving this trade-off. |
Tasks | |
Published | 2019-03-27 |
URL | http://arxiv.org/abs/1903.11626v2 |
http://arxiv.org/pdf/1903.11626v2.pdf | |
PWC | https://paperswithcode.com/paper/bridging-adversarial-robustness-and-gradient |
Repo | https://github.com/1202kbs/Robustness-and-Interpretability |
Framework | tf |
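The interplay the paper studies can be reproduced at toy scale: train a classifier on both clean and adversarially perturbed inputs, then inspect the loss gradient with respect to the input. The sketch below uses FGSM-style perturbations on a logistic-regression "image" classifier in NumPy; the paper itself works with DNNs and stronger attacks, so treat the model, data, epsilon, and learning rate here as illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_grad(w, x, y):
    """Gradient of the logistic loss w.r.t. the *input* x (the 'loss gradient'
    whose interpretability the paper studies)."""
    p = sigmoid(x @ w)
    return (p - y) * w          # d/dx of -y log p - (1 - y) log(1 - p)

def fgsm(w, x, y, eps):
    """Fast Gradient Sign Method perturbation of the input."""
    return np.clip(x + eps * np.sign(input_grad(w, x, y)), 0.0, 1.0)

# toy data: 200 'images' with 64 pixels in [0, 1]
X = rng.random((200, 64))
y = (X[:, :32].mean(axis=1) > X[:, 32:].mean(axis=1)).astype(float)

w, lr, eps = np.zeros(64), 0.1, 0.05
for epoch in range(50):
    for xi, yi in zip(X, y):
        xi_adv = fgsm(w, xi, yi, eps)          # augment with an adversarial example
        for xb in (xi, xi_adv):                # train on clean + adversarial input
            p = sigmoid(xb @ w)
            w -= lr * (p - yi) * xb            # gradient step on the weights

# loss gradients w.r.t. inputs after adversarial training
grads = np.array([input_grad(w, xi, yi) for xi, yi in zip(X, y)])
print("mean |input gradient|:", np.abs(grads).mean())
```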
Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection
Title | Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection |
Authors | Shifeng Zhang, Cheng Chi, Yongqiang Yao, Zhen Lei, Stan Z. Li |
Abstract | Object detection has been dominated by anchor-based detectors for several years. Recently, anchor-free detectors have become popular due to the proposal of FPN and Focal Loss. In this paper, we first point out that the essential difference between anchor-based and anchor-free detection is actually how to define positive and negative training samples, which leads to the performance gap between them. If they adopt the same definition of positive and negative samples during training, there is no obvious difference in the final performance, no matter regressing from a box or a point. This shows that how to select positive and negative training samples is important for current object detectors. Then, we propose an Adaptive Training Sample Selection (ATSS) to automatically select positive and negative samples according to statistical characteristics of objects. It significantly improves the performance of anchor-based and anchor-free detectors and bridges the gap between them. Finally, we discuss the necessity of tiling multiple anchors per location on the image to detect objects. Extensive experiments conducted on MS COCO support our aforementioned analysis and conclusions. With the newly introduced ATSS, we improve state-of-the-art detectors by a large margin to 50.7% AP without introducing any overhead. The code is available at https://github.com/sfzhang15/ATSS |
Tasks | Object Detection |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02424v3 |
https://arxiv.org/pdf/1912.02424v3.pdf | |
PWC | https://paperswithcode.com/paper/bridging-the-gap-between-anchor-based-and |
Repo | https://github.com/sfzhang15/ATSS |
Framework | pytorch |
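The ATSS assignment rule itself is compact enough to sketch directly. The NumPy function below follows the procedure the abstract describes: pick the k anchors whose centers are closest to the ground-truth center, set the IoU threshold to the mean plus the standard deviation of their IoUs, and keep candidates above the threshold whose centers fall inside the box. In the paper the top-k candidates are collected per FPN level and per ground-truth box; this single-level, single-box version and the anchor grid are simplifying assumptions.

```python
import numpy as np

def iou(anchors, gt):
    """IoU between N anchor boxes and one ground-truth box, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(anchors[:, 0], gt[0]); y1 = np.maximum(anchors[:, 1], gt[1])
    x2 = np.minimum(anchors[:, 2], gt[2]); y2 = np.minimum(anchors[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_a + area_g - inter)

def atss_select(anchors, gt, k=9):
    """Return a boolean mask of positive anchors for one ground-truth box."""
    centers = np.stack([(anchors[:, 0] + anchors[:, 2]) / 2,
                        (anchors[:, 1] + anchors[:, 3]) / 2], axis=1)
    gt_center = np.array([(gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2])
    dist = np.linalg.norm(centers - gt_center, axis=1)
    candidates = np.argsort(dist)[:k]              # k closest anchors (per pyramid level in the paper)
    ious = iou(anchors[candidates], gt)
    thr = ious.mean() + ious.std()                 # adaptive, per-object IoU threshold
    inside = ((centers[candidates] >= gt[:2]) & (centers[candidates] <= gt[2:])).all(axis=1)
    positive = np.zeros(len(anchors), dtype=bool)
    positive[candidates[(ious >= thr) & inside]] = True
    return positive

anchors = np.array([[x, y, x + 32, y + 32] for x in range(0, 128, 16) for y in range(0, 128, 16)], float)
gt = np.array([40.0, 40.0, 100.0, 90.0])
print("positive anchors:", atss_select(anchors, gt).sum())
```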
MelNet: A Generative Model for Audio in the Frequency Domain
Title | MelNet: A Generative Model for Audio in the Frequency Domain |
Authors | Sean Vasquez, Mike Lewis |
Abstract | Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps. While long-range dependencies are difficult to model directly in the time domain, we show that they can be more tractably modelled in two-dimensional time-frequency representations such as spectrograms. By leveraging this representational advantage, in conjunction with a highly expressive probabilistic model and a multiscale generation procedure, we design a model capable of generating high-fidelity audio samples which capture structure at timescales that time-domain models have yet to achieve. We apply our model to a variety of audio generation tasks, including unconditional speech generation, music generation, and text-to-speech synthesis—showing improvements over previous approaches in both density estimates and human judgments. |
Tasks | Audio Generation, Music Generation, Speech Synthesis, Text-To-Speech Synthesis |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01083v1 |
https://arxiv.org/pdf/1906.01083v1.pdf | |
PWC | https://paperswithcode.com/paper/melnet-a-generative-model-for-audio-in-the |
Repo | https://github.com/YuvalBecker/MelNet |
Framework | tf |
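MelNet is an autoregressive model over (mel-scaled) spectrograms rather than raw waveforms, so its input is a compact two-dimensional grid instead of tens of thousands of samples per second. The sketch below only builds such a time-frequency representation with SciPy, using a linear-frequency log-magnitude STFT rather than a true mel spectrogram and a synthetic tone in place of real audio; it does not implement the model's multiscale generation.

```python
import numpy as np
from scipy.signal import stft

sr = 16000                                   # sample rate (Hz)
t = np.arange(0, 2.0, 1.0 / sr)
audio = 0.5 * np.sin(2 * np.pi * 440 * t)    # 2 s of a 440 Hz tone as a stand-in waveform

# 2-D time-frequency representation: one second of audio becomes a modest number of
# frames x frequency bins instead of 16000 raw samples per second.
freqs, times, Z = stft(audio, fs=sr, nperseg=1024, noverlap=768)
log_mag = np.log1p(np.abs(Z))                # log-magnitude spectrogram

print("waveform samples:", audio.size)
print("spectrogram shape (freq bins x frames):", log_mag.shape)
```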
Edge-Guided Occlusion Fading Reduction for a Light-Weighted Self-Supervised Monocular Depth Estimation
Title | Edge-Guided Occlusion Fading Reduction for a Light-Weighted Self-Supervised Monocular Depth Estimation |
Authors | Kuo-Shiuan Peng, Gregory Ditzler, Jerzy Rozenblit |
Abstract | Self-supervised monocular depth estimation methods generally suffer from occlusion fading due to the lack of supervision by per-pixel ground truth. Although a post-processing method was proposed by Godard et al. to reduce occlusion fading, the compensated results have a severe halo effect. In this paper, we propose a novel Edge-Guided post-processing to reduce the occlusion fading issue for self-supervised monocular depth estimation. We further introduce Atrous Spatial Pyramid Pooling (ASPP) into the network to reduce the computational costs and improve the inference performance. The proposed ASPP-based network is lighter, faster, and better than current commonly used depth estimation networks. This light-weight network only needs 8.1 million parameters and can achieve up to 40 frames per second for $256\times512$ input in the inference stage using a single NVIDIA GTX 1080 GPU. The proposed network also outperforms the current state-of-the-art on the KITTI benchmarks. The ASPP-based network and Edge-Guided post-processing produce better results than the competitors, both quantitatively and qualitatively. |
Tasks | Depth Estimation, Monocular Depth Estimation |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11705v1 |
https://arxiv.org/pdf/1911.11705v1.pdf | |
PWC | https://paperswithcode.com/paper/edge-guided-occlusion-fading-reduction-for-a |
Repo | https://github.com/kspeng/lw-eg-monodepth |
Framework | tf |
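For context, the baseline the paper improves on is the flip-based post-processing of Godard et al.: predict disparity for the image and for its mirrored copy, mirror the second prediction back, and blend the two with border-dependent weights to suppress occlusion fading near the image edges. The NumPy sketch below implements that baseline with a stand-in network and a hypothetical 5%-ramp weighting; the paper's Edge-Guided variant instead derives the blending weights from image edges, which is not reproduced here.

```python
import numpy as np

def predict_disparity(image):
    """Stand-in for the depth network: returns a disparity map with the input's H x W."""
    h, w = image.shape[:2]
    return np.tile(np.linspace(0.1, 1.0, w), (h, 1))

def flip_post_process(image):
    """Flip-based post-processing (Godard et al.): blend the disparity of the image and of
    its mirrored copy, trusting each prediction away from its own occluded border."""
    d = predict_disparity(image)
    d_flip = predict_disparity(image[:, ::-1])[:, ::-1]      # predict on the mirror, then un-mirror
    h, w = d.shape
    ramp = np.clip(1.0 - 20.0 * np.linspace(0.0, 1.0, w), 0.0, 1.0)  # ~5% ramp at the left edge
    w_left = np.tile(ramp, (h, 1))            # weight for the flipped prediction (left border)
    w_right = w_left[:, ::-1]                 # weight for the direct prediction (right border)
    w_mean = np.clip(1.0 - w_left - w_right, 0.0, 1.0)       # plain average in the interior
    return w_left * d_flip + w_right * d + w_mean * 0.5 * (d + d_flip)

image = np.zeros((64, 256))
print("post-processed disparity shape:", flip_post_process(image).shape)
```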
Fair Kernel Regression via Fair Feature Embedding in Kernel Space
Title | Fair Kernel Regression via Fair Feature Embedding in Kernel Space |
Authors | Austin Okray, Hui Hu, Chao Lan |
Abstract | In recent years, there have been significant efforts on mitigating unethical demographic biases in machine learning methods. However, very little has been done for kernel methods. In this paper, we propose a new fair kernel regression method via fair feature embedding (FKR-F$^2$E) in kernel space. Motivated by prior works on feature selection in kernel space and feature processing for fair machine learning, we propose to learn fair feature embedding functions that minimize the demographic discrepancy of feature distributions in kernel space. Compared to the state-of-the-art fair kernel regression method and several baseline methods, we show FKR-F$^2$E achieves significantly lower prediction disparity across three real-world data sets. |
Tasks | Feature Selection |
Published | 2019-07-04 |
URL | https://arxiv.org/abs/1907.02242v2 |
https://arxiv.org/pdf/1907.02242v2.pdf | |
PWC | https://paperswithcode.com/paper/fair-kernel-regression-via-fair-feature |
Repo | https://github.com/aokray/FKRFFE |
Framework | none |
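One standard way to measure the demographic discrepancy of feature distributions in kernel space is the squared Maximum Mean Discrepancy between the two groups' kernel mean embeddings. The sketch below computes that quantity with an RBF kernel in NumPy; whether FKR-F$^2$E uses exactly this discrepancy, and how it is folded into the regression objective, are assumptions not taken from the abstract.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X_a, X_b, gamma=0.5):
    """Squared Maximum Mean Discrepancy between two groups' feature distributions
    in the RKHS induced by the RBF kernel: ||mu_a - mu_b||^2_H (biased estimator)."""
    return (rbf_kernel(X_a, X_a, gamma).mean()
            + rbf_kernel(X_b, X_b, gamma).mean()
            - 2 * rbf_kernel(X_a, X_b, gamma).mean())

rng = np.random.default_rng(0)
X_a = rng.normal(0.0, 1.0, size=(100, 5))      # features of demographic group A
X_b = rng.normal(0.3, 1.0, size=(100, 5))      # features of demographic group B, slightly shifted
print("squared MMD between groups:", mmd2(X_a, X_b))
```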
Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization
Title | Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization |
Authors | Md Mahfuzur Rahman Siddiquee, Zongwei Zhou, Nima Tajbakhsh, Ruibin Feng, Michael B. Gotway, Yoshua Bengio, Jianming Liang |
Abstract | Generative adversarial networks (GANs) have ushered in a revolution in image-to-image translation. The development and proliferation of GANs raise an interesting question: can we train a GAN to remove an object, if present, from an image while otherwise preserving the image? Specifically, can a GAN “virtually heal” anyone by turning their medical image, with an unknown health status (diseased or healthy), into a healthy one, so that diseased regions could be revealed by subtracting those two images? Such a task requires a GAN to identify a minimal subset of target pixels for domain translation, an ability that we call fixed-point translation, which no GAN is equipped with yet. Therefore, we propose a new GAN, called Fixed-Point GAN, trained by (1) supervising same-domain translation through a conditional identity loss, and (2) regularizing cross-domain translation through revised adversarial, domain classification, and cycle consistency losses. Based on fixed-point translation, we further derive a novel framework for disease detection and localization using only image-level annotation. Qualitative and quantitative evaluations demonstrate that the proposed method outperforms the state of the art in multi-domain image-to-image translation and that it surpasses predominant weakly-supervised localization methods in both disease detection and localization. Implementation is available at https://github.com/jlianglab/Fixed-Point-GAN. |
Tasks | Image-to-Image Translation |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.06965v2 |
https://arxiv.org/pdf/1908.06965v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-fixed-points-in-generative |
Repo | https://github.com/jlianglab/Fixed-Point-GAN |
Framework | pytorch |
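Two of the training signals named in the abstract are easy to write down: the conditional identity loss that supervises same-domain translation (translating an image to its own domain should be a fixed point) and the cycle-consistency loss on cross-domain translation. The sketch below composes the two with a placeholder generator and placeholder loss weights; the revised adversarial and domain-classification losses, which need a trained critic, are deliberately left out.

```python
import numpy as np

def generator(image, target_domain):
    """Placeholder for the conditional generator G(image, target_domain)."""
    return image + 0.01 * target_domain

def l1(a, b):
    return np.abs(a - b).mean()

def identity_and_cycle_losses(x, source_domain, target_domain, lam_id=10.0, lam_cyc=10.0):
    # conditional identity loss: same-domain translation should leave the image unchanged,
    # i.e. the input is supervised to be a fixed point of G(., source_domain)
    identity = l1(generator(x, source_domain), x)
    # cycle-consistency loss: translate to the target domain and back, then recover the input
    fake = generator(x, target_domain)
    cycle = l1(generator(fake, source_domain), x)
    # the revised adversarial and domain-classification losses are omitted here
    return lam_id * identity + lam_cyc * cycle

x = np.random.default_rng(0).random((1, 64, 64))
print("identity + cycle loss:", identity_and_cycle_losses(x, source_domain=0.0, target_domain=1.0))
```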
Probabilistic Reconstruction Networks for 3D Shape Inference from a Single Image
Title | Probabilistic Reconstruction Networks for 3D Shape Inference from a Single Image |
Authors | Roman Klokov, Jakob Verbeek, Edmond Boyer |
Abstract | We study end-to-end learning strategies for 3D shape inference from images, in particular from a single image. Several approaches in this direction have been investigated that explore different shape representations and suitable learning architectures. We focus instead on the underlying probabilistic mechanisms involved and contribute a more principled probabilistic inference-based reconstruction framework, which we coin Probabilistic Reconstruction Networks. This framework expresses image conditioned 3D shape inference through a family of latent variable models, and naturally decouples the choice of shape representations from the inference itself. Moreover, it suggests different options for the image conditioning and allows training in two regimes, using either Monte Carlo or variational approximation of the marginal likelihood. Using our Probabilistic Reconstruction Networks we obtain single image 3D reconstruction results that set a new state of the art on the ShapeNet dataset in terms of the intersection over union and earth mover’s distance evaluation metrics. Interestingly, we obtain these results using a basic voxel grid representation, improving over recent work based on finer point cloud or mesh based representations. |
Tasks | 3D Reconstruction, Latent Variable Models |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07475v1 |
https://arxiv.org/pdf/1908.07475v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-reconstruction-networks-for-3d |
Repo | https://github.com/Regenerator/prns |
Framework | pytorch |
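The two training regimes mentioned in the abstract, a variational bound and a Monte Carlo estimate of the marginal likelihood, can be contrasted on a one-dimensional toy latent-variable model. The sketch below uses Gaussians for the prior, decoder, and approximate posterior (all hypothetical stand-ins for the paper's image encoder and voxel decoder) and shows that the importance-sampled estimate of log p(x) upper-bounds the ELBO.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Toy latent-variable model standing in for p(shape | image):
#   prior   p(z)    = N(0, 1)
#   decoder p(x|z)  = N(z, 0.5^2)
#   encoder q(z|x)  = N(0.8 * x, 0.6^2)   (a fixed, hand-picked approximate posterior)
x = 1.2
mu_q, sig_q = 0.8 * x, 0.6
z = rng.normal(mu_q, sig_q, size=1000)                      # samples from q(z|x)

log_w = (norm.logpdf(x, loc=z, scale=0.5)                   # log p(x|z)
         + norm.logpdf(z, loc=0.0, scale=1.0)               # log p(z)
         - norm.logpdf(z, loc=mu_q, scale=sig_q))           # log q(z|x)

elbo = log_w.mean()                                         # variational bound
log_px_mc = logsumexp(log_w) - np.log(len(z))               # importance-sampled Monte Carlo estimate
print(f"ELBO = {elbo:.3f}  <=  Monte Carlo log p(x) = {log_px_mc:.3f}")
```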
Learning Belief Representations for Imitation Learning in POMDPs
Title | Learning Belief Representations for Imitation Learning in POMDPs |
Authors | Tanmay Gangwani, Joel Lehman, Qiang Liu, Jian Peng |
Abstract | We consider the problem of imitation learning from expert demonstrations in partially observable Markov decision processes (POMDPs). Belief representations, which characterize the distribution over the latent states in a POMDP, have been modeled using recurrent neural networks and probabilistic latent variable models, and shown to be effective for reinforcement learning in POMDPs. In this work, we investigate the belief representation learning problem for generative adversarial imitation learning in POMDPs. Instead of training the belief module and the policy separately as suggested in prior work, we learn the belief module jointly with the policy, using a task-aware imitation loss to ensure that the representation is more aligned with the policy’s objective. To improve the robustness of the representation, we introduce several informative belief regularization techniques, including multi-step prediction of dynamics and action sequences. Evaluated on various partially observable continuous-control locomotion tasks, our belief-module imitation learning approach (BMIL) substantially outperforms several baselines, including the original GAIL algorithm and the task-agnostic belief learning algorithm. Extensive ablation analysis indicates the effectiveness of task-aware belief learning and belief regularization. |
Tasks | Continuous Control, Imitation Learning, Latent Variable Models, Representation Learning |
Published | 2019-06-22 |
URL | https://arxiv.org/abs/1906.09510v1 |
https://arxiv.org/pdf/1906.09510v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-belief-representations-for-imitation |
Repo | https://github.com/tgangwani/BMIL |
Framework | pytorch |
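The belief regularizer described in the abstract rewards beliefs that remain predictive of the near future. The NumPy sketch below shows that idea in isolation: a recurrent stand-in compresses the observation history into a belief vector, and a linear head predicts the next k observations, with the squared error acting as the auxiliary loss. The cell, the prediction head, and the horizon k are all illustrative assumptions; in BMIL this loss is trained jointly with the GAIL policy objective.

```python
import numpy as np

rng = np.random.default_rng(0)
d_obs, d_belief, k = 6, 12, 3

W_b = rng.normal(scale=0.1, size=(d_obs + d_belief, d_belief))   # belief update (GRU stand-in)
W_p = rng.normal(scale=0.1, size=(d_belief, k * d_obs))          # multi-step prediction head

def update_belief(belief, obs):
    return np.tanh(np.concatenate([obs, belief]) @ W_b)

def multistep_prediction_loss(observations):
    """Belief regularizer: from the belief at time t, predict the next k observations
    and penalize the squared error (a stand-in for the paper's auxiliary losses)."""
    belief = np.zeros(d_belief)
    loss, count = 0.0, 0
    for t, obs in enumerate(observations[:-k]):
        belief = update_belief(belief, obs)
        pred = (belief @ W_p).reshape(k, d_obs)
        target = observations[t + 1:t + 1 + k]
        loss += ((pred - target) ** 2).mean()
        count += 1
    return loss / count

trajectory = rng.normal(size=(30, d_obs))   # partially observed rollout (observations only)
print("multi-step prediction loss:", multistep_prediction_loss(trajectory))
```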
Scraping Social Media Photos Posted in Kenya and Elsewhere to Detect and Analyze Food Types
Title | Scraping Social Media Photos Posted in Kenya and Elsewhere to Detect and Analyze Food Types |
Authors | Kaihong Wang, Mona Jalal, Sankara Jefferson, Yi Zheng, Elaine O. Nsoesie, Margrit Betke |
Abstract | Monitoring population-level changes in diet could be useful for education and for implementing interventions to improve health. Research has shown that data from social media sources can be used for monitoring dietary behavior. We propose a scrape-by-location methodology to create food image datasets from Instagram posts. We used it to collect 3.56 million images over a period of 20 days in March 2019. We also propose a scrape-by-keywords methodology and used it to scrape ~30,000 images and their captions of 38 Kenyan food types. We publish two datasets of 104,000 and 8,174 image/caption pairs, respectively. With the first dataset, Kenya104K, we train a Kenyan Food Classifier, called KenyanFC, to distinguish Kenyan food from non-food images posted in Kenya. We used the second dataset, KenyanFood13, to train a classifier KenyanFTR, short for Kenyan Food Type Recognizer, to recognize 13 popular food types in Kenya. The KenyanFTR is a multimodal deep neural network that can identify 13 types of Kenyan foods using both images and their corresponding captions. Experiments show that the average top-1 accuracy of KenyanFC is 99% over 10,400 tested Instagram images and that of KenyanFTR is 81% over 8,174 tested data points. Ablation studies show that three of the 13 food types are particularly difficult to categorize based on image content only and that adding analysis of captions to the image analysis yields a classifier that is 9 percentage points more accurate than a classifier that relies only on images. Our food trend analysis revealed that cakes and roasted meats were the most popular foods in photographs on Instagram in Kenya in March 2019. |
Tasks | |
Published | 2019-08-31 |
URL | https://arxiv.org/abs/1909.00134v1 |
https://arxiv.org/pdf/1909.00134v1.pdf | |
PWC | https://paperswithcode.com/paper/scraping-social-media-photos-posted-in-kenya |
Repo | https://github.com/monajalal/Kenyan-Food |
Framework | pytorch |
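The multimodal aspect of KenyanFTR, combining an image with its caption, can be illustrated with simple late fusion: concatenate an image embedding and a text embedding and train one softmax classifier on top. The sketch below does exactly that on random stand-in features in NumPy; the real system uses deep image and text encoders, so the feature dimensions, classifier, and training loop here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_img, d_txt, n_classes = 200, 64, 32, 13

# stand-ins for image features (e.g. a CNN embedding) and caption features (e.g. a text encoder)
img_feat = rng.normal(size=(n, d_img))
txt_feat = rng.normal(size=(n, d_txt))
labels = rng.integers(0, n_classes, size=n)

# multimodal late fusion: concatenate the two embeddings, then one softmax layer
fused = np.concatenate([img_feat, txt_feat], axis=1)
W = rng.normal(scale=0.01, size=(d_img + d_txt, n_classes))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for _ in range(200):                                  # a few steps of cross-entropy SGD
    probs = softmax(fused @ W)
    probs[np.arange(n), labels] -= 1.0                # gradient of cross-entropy w.r.t. the logits
    W -= 0.1 * fused.T @ probs / n

pred = softmax(fused @ W).argmax(axis=1)
print("train top-1 accuracy:", (pred == labels).mean())
```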
Large-Scale Answerer in Questioner’s Mind for Visual Dialog Question Generation
Title | Large-Scale Answerer in Questioner’s Mind for Visual Dialog Question Generation |
Authors | Sang-Woo Lee, Tong Gao, Sohee Yang, Jaejun Yoo, Jung-Woo Ha |
Abstract | Answerer in Questioner’s Mind (AQM) is an information-theoretic framework that has been recently proposed for task-oriented dialog systems. AQM benefits from asking the question that maximizes the expected information gain. However, due to its intrinsic nature of explicitly calculating the information gain, AQM has a limitation when the solution space is very large. To address this, we propose AQM+ that can deal with a large-scale problem and ask a question that is more coherent with the current context of the dialog. We evaluate our method on GuessWhich, a challenging task-oriented visual dialog problem, where the number of candidate classes is near 10K. Our experimental results and ablation studies show that AQM+ outperforms the state-of-the-art models by a remarkable margin with a reasonable approximation. In particular, the proposed AQM+ reduces the error by more than 60% as the dialog proceeds, while the comparative algorithms diminish the error by less than 6%. Based on our results, we argue that AQM+ is a general task-oriented dialog algorithm that can be applied to non-yes-or-no responses. |
Tasks | Question Generation, Visual Dialog |
Published | 2019-02-22 |
URL | http://arxiv.org/abs/1902.08355v1 |
http://arxiv.org/pdf/1902.08355v1.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-answerer-in-questioners-mind-for |
Repo | https://github.com/naver/aqm-plus |
Framework | pytorch |
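The quantity at the heart of AQM is the expected information gain of a candidate question: the entropy of the current belief over candidate classes minus the expected entropy after observing the answer. The NumPy sketch below computes it exactly for a toy setting with a handful of candidates and yes/no answers; AQM+ exists precisely because this exact computation stops scaling when the candidate set approaches 10K, so the answer model and problem sizes here are illustrative only.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def information_gain(prior, answer_model):
    """Expected reduction in entropy over the candidate classes from asking one question.
    prior:        p(c), shape (C,)
    answer_model: p(a | c, q) for this question, shape (C, A)
    """
    p_a = prior @ answer_model                          # p(a) = sum_c p(c) p(a|c)
    posterior = (prior[:, None] * answer_model) / p_a   # p(c|a), shape (C, A)
    expected_post_entropy = sum(p_a[a] * entropy(posterior[:, a]) for a in range(len(p_a)))
    return entropy(prior) - expected_post_entropy

rng = np.random.default_rng(0)
C, A, Q = 50, 2, 10                                     # candidates, yes/no answers, questions
prior = np.full(C, 1.0 / C)
questions = [rng.dirichlet(np.ones(A), size=C) for _ in range(Q)]  # p(a|c,q) per question
gains = [information_gain(prior, q) for q in questions]
print("best question:", int(np.argmax(gains)), "gain:", max(gains))
```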
Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering
Title | Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering |
Authors | Shiyue Zhang, Mohit Bansal |
Abstract | Text-based Question Generation (QG) aims at generating natural and relevant questions that can be answered by a given answer in some context. Existing QG models suffer from a “semantic drift” problem, i.e., the semantics of the model-generated question drifts away from the given context and answer. In this paper, we first propose two semantics-enhanced rewards obtained from downstream question paraphrasing and question answering tasks to regularize the QG model to generate semantically valid questions. Second, since the traditional evaluation metrics (e.g., BLEU) often fall short in evaluating the quality of generated questions, we propose a QA-based evaluation method which measures the QG model’s ability to mimic human annotators in generating QA training data. Experiments show that our method achieves the new state-of-the-art performance w.r.t. traditional metrics, and also performs best on our QA-based evaluation metrics. Further, we investigate how to use our QG model to augment QA datasets and enable semi-supervised QA. We propose two ways to generate synthetic QA pairs: generate new questions from existing articles or collect QA pairs from new articles. We also propose two empirically effective strategies, a data filter and mixing mini-batch training, to properly use the QG-generated data for QA. Experiments show that our method improves over both BiDAF and BERT QA baselines, even without introducing new articles. |
Tasks | Question Answering, Question Generation |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06356v1 |
https://arxiv.org/pdf/1909.06356v1.pdf | |
PWC | https://paperswithcode.com/paper/addressing-semantic-drift-in-question |
Repo | https://github.com/ZhangShiyue/QGforQA |
Framework | tf |
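Two of the semi-supervised QA strategies named in the abstract, the data filter and mixing mini-batch training, reduce to simple data plumbing. The sketch below shows one plausible reading: drop synthetic QA pairs that a pretrained QA model scores below a confidence threshold, then build mini-batches that mix gold and synthetic examples in a fixed ratio. The scoring function, threshold, and gold fraction are hypothetical; the paper's exact filtering criterion may differ.

```python
import random

def filter_synthetic(qa_pairs, qa_model_score, threshold=0.7):
    """Keep only synthetic (question, answer, context) triples that a pretrained QA
    model answers with high confidence; the threshold is a hypothetical value."""
    return [qa for qa in qa_pairs if qa_model_score(qa) >= threshold]

def mixed_minibatches(gold, synthetic, batch_size=32, gold_fraction=0.5):
    """Yield mini-batches that mix gold and filtered synthetic QA pairs."""
    n_gold = int(batch_size * gold_fraction)
    while True:
        batch = random.sample(gold, n_gold) + random.sample(synthetic, batch_size - n_gold)
        random.shuffle(batch)
        yield batch

# usage sketch (qa_model_score and the data sets are placeholders)
gold = [("q_gold", "a", "ctx")] * 100
synthetic = [("q_syn", "a", "ctx")] * 400
synthetic = filter_synthetic(synthetic, qa_model_score=lambda qa: 0.9)
batches = mixed_minibatches(gold, synthetic)
print(len(next(batches)), "examples per mixed batch")
```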
Let’s Ask Again: Refine Network for Automatic Question Generation
Title | Let’s Ask Again: Refine Network for Automatic Question Generation |
Authors | Preksha Nema, Akash Kumar Mohankumar, Mitesh M. Khapra, Balaji Vasan Srinivasan, Balaraman Ravindran |
Abstract | In this work, we focus on the task of Automatic Question Generation (AQG) where given a passage and an answer the task is to generate the corresponding question. It is desired that the generated question should be (i) grammatically correct, (ii) answerable from the passage, and (iii) specific to the given answer. An analysis of existing AQG models shows that they produce questions which do not adhere to one or more of the above-mentioned qualities. In particular, the generated questions look like an incomplete draft of the desired question with a clear scope for refinement. To alleviate this shortcoming, we propose a method which tries to mimic the human process of generating questions by first creating an initial draft and then refining it. More specifically, we propose Refine Network (RefNet) which contains two decoders. The second decoder uses a dual attention network which pays attention to both (i) the original passage and (ii) the question (initial draft) generated by the first decoder. In effect, it refines the question generated by the first decoder, thereby making it more correct and complete. We evaluate RefNet on three datasets, viz., SQuAD, HOTPOT-QA, and DROP, and show that it outperforms existing state-of-the-art methods by 7-16% on all of these datasets. Lastly, we show that we can improve the quality of the second decoder on specific metrics, such as fluency and answerability, by explicitly rewarding revisions that improve on the corresponding metric during training. The code has been made publicly available at https://github.com/PrekshaNema25/RefNet-QG |
Tasks | Question Generation |
Published | 2019-08-31 |
URL | https://arxiv.org/abs/1909.05355v1 |
https://arxiv.org/pdf/1909.05355v1.pdf | |
PWC | https://paperswithcode.com/paper/lets-ask-again-refine-network-for-automatic |
Repo | https://github.com/PrekshaNema25/RefNet-QG |
Framework | tf |
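The second decoder's dual attention can be sketched generically: at each refinement step the decoder state attends both over the encoded passage and over the hidden states of the first decoder's draft question, and the two context vectors are combined. The NumPy sketch below uses plain scaled dot-product attention and simple concatenation; RefNet's actual attention parameterization and how the contexts feed the decoder are not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys, values):
    """Scaled dot-product attention of one decoder state over a memory."""
    scores = keys @ query / np.sqrt(len(query))
    return softmax(scores) @ values

def dual_attention_context(dec_state, passage_states, draft_states):
    """Second-decoder context: attend over the passage encoding AND over the
    first decoder's draft question, then concatenate the two context vectors."""
    c_passage = attend(dec_state, passage_states, passage_states)
    c_draft = attend(dec_state, draft_states, draft_states)
    return np.concatenate([c_passage, c_draft])

rng = np.random.default_rng(0)
d = 16
passage = rng.normal(size=(50, d))    # encoded passage tokens
draft = rng.normal(size=(12, d))      # hidden states of the first decoder's draft question
state = rng.normal(size=d)            # current state of the refining (second) decoder
print("dual-attention context size:", dual_attention_context(state, passage, draft).shape)
```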
Acceleration of expensive computations in Bayesian statistics using vector operations
Title | Acceleration of expensive computations in Bayesian statistics using vector operations |
Authors | David J. Warne, Scott A. Sisson, Christopher Drovandi |
Abstract | Many applications in Bayesian statistics are extremely computationally intensive. However, they are also often inherently parallel, making them prime targets for modern massively parallel central processing unit (CPU) architectures. While the use of multi-core and distributed computing is widely applied in the Bayesian community, very little attention has been given to fine-grain parallelisation using single instruction multiple data (SIMD) operations that are available on most modern commodity CPUs. Rather, most fine-grain tuning in the literature has centred around general purpose graphics processing units (GPGPUs). Since the effective utilisation of GPGPUs typically requires specialised programming languages, such technologies are not ideal for the wider Bayesian community. In this work, we practically demonstrate, using standard programming libraries, the utility of the SIMD approach for several topical Bayesian applications. In particular, we consider sampling of the prior predictive distribution for approximate Bayesian computation (ABC), the computation of Bayesian $p$-values for testing prior weak informativeness, and inference on a computationally challenging econometrics model. Through minor code alterations, we show that SIMD operations can improve the floating point arithmetic performance, resulting in up to $6\times$ improvement in the overall serial algorithm performance. Furthermore, $4$-way parallel versions can lead to almost $19\times$ improvement over a naïve serial implementation. We illustrate the potential of SIMD operations for accelerating Bayesian computations and provide the reader with essential implementation techniques required to exploit modern massively parallel processing environments using standard software development tools. |
Tasks | |
Published | 2019-02-25 |
URL | https://arxiv.org/abs/1902.09046v2 |
https://arxiv.org/pdf/1902.09046v2.pdf | |
PWC | https://paperswithcode.com/paper/acceleration-of-expensive-computations-in |
Repo | https://github.com/davidwarne/Bayesian_SIMD_examples |
Framework | none |
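The paper's point is that restructuring scalar inner loops into data-parallel operations lets the CPU's SIMD units perform several floating-point operations per instruction. A NumPy analogue of that restructuring is shown below: the same per-sample summary is computed once with an explicit Python loop and once as whole-array arithmetic, which NumPy dispatches to compiled, vectorised kernels. This is only an illustration of the programming pattern; the paper works in compiled languages with standard libraries and compiler auto-vectorisation, and its measured speedups are not reproduced by this toy.

```python
import numpy as np
import time

rng = np.random.default_rng(0)
theta = rng.random((100_000, 3))            # e.g. prior-predictive parameter draws

def summary_loop(theta):
    """Scalar loop: one parameter vector at a time."""
    out = np.empty(len(theta))
    for i, (a, b, c) in enumerate(theta):
        out[i] = np.exp(-a) * np.sin(b) + c * c
    return out

def summary_vectorised(theta):
    """Same arithmetic expressed over whole arrays, letting NumPy dispatch to
    vectorised loops in compiled code."""
    a, b, c = theta[:, 0], theta[:, 1], theta[:, 2]
    return np.exp(-a) * np.sin(b) + c * c

t0 = time.perf_counter(); r1 = summary_loop(theta);        t1 = time.perf_counter()
r2 = summary_vectorised(theta);                            t2 = time.perf_counter()
assert np.allclose(r1, r2)
print(f"loop: {t1 - t0:.3f}s   vectorised: {t2 - t1:.3f}s")
```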
Contextual Recurrent Neural Networks
Title | Contextual Recurrent Neural Networks |
Authors | Sam Wenke, Jim Fleming |
Abstract | There is an implicit assumption that by unfolding recurrent neural networks (RNN) in finite time, the misspecification of choosing a zero value for the initial hidden state is mitigated by later time steps. This assumption has been shown to work in practice and alternative initialization may be suggested but often overlooked. In this paper, we propose a method of parameterizing the initial hidden state of an RNN. The resulting architecture, referred to as a Contextual RNN, can be trained end-to-end. The performance on an associative retrieval task is found to improve by conditioning the RNN initial hidden state on contextual information from the input sequence. Furthermore, we propose a novel method of conditionally generating sequences using the hidden state parameterization of Contextual RNN. |
Tasks | |
Published | 2019-02-09 |
URL | http://arxiv.org/abs/1902.03455v1 |
http://arxiv.org/pdf/1902.03455v1.pdf | |
PWC | https://paperswithcode.com/paper/contextual-recurrent-neural-networks |
Repo | https://github.com/fomorians/contextual_rnn |
Framework | tf |
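The core idea, replacing the all-zeros initial hidden state with one computed from contextual information in the input sequence, fits in a few lines. The NumPy sketch below runs the same vanilla RNN cell twice, once from a zero initial state and once from an initial state produced by a small context mapping; using the first element of the sequence as the context and a single tanh layer as the mapping are simplifying assumptions, and in the paper both are trained end-to-end.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 8, 16

# parameters of a vanilla RNN cell and of the (hypothetical) context network
W_xh = rng.normal(scale=0.1, size=(d_in, d_hid))
W_hh = rng.normal(scale=0.1, size=(d_hid, d_hid))
W_ctx = rng.normal(scale=0.1, size=(d_in, d_hid))   # maps the context to the initial state

def run_rnn(inputs, h0):
    h = h0
    for x in inputs:
        h = np.tanh(x @ W_xh + h @ W_hh)
    return h

sequence = rng.normal(size=(20, d_in))

# standard RNN: the initial hidden state is fixed at zero
h_zero = run_rnn(sequence, np.zeros(d_hid))

# Contextual RNN: the initial hidden state is produced from contextual information
# taken from the input sequence (here, simply its first element)
context = sequence[0]
h_ctx = run_rnn(sequence, np.tanh(context @ W_ctx))

print("final states differ:", not np.allclose(h_zero, h_ctx))
```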
The Weighted Tsetlin Machine: Compressed Representations with Weighted Clauses
Title | The Weighted Tsetlin Machine: Compressed Representations with Weighted Clauses |
Authors | Adrian Phoulady, Ole-Christoffer Granmo, Saeed Rahimi Gorji, Hady Ahmady Phoulady |
Abstract | The Tsetlin Machine (TM) is an interpretable mechanism for pattern recognition that constructs conjunctive clauses from data. The clauses capture frequent patterns with high discriminating power, providing increasing expression power with each additional clause. However, the resulting accuracy gain comes at the cost of linear growth in computation time and memory usage. In this paper, we present the Weighted Tsetlin Machine (WTM), which reduces computation time and memory usage by weighting the clauses. Real-valued weighting allows one clause to replace multiple clauses, and supports fine-tuning the impact of each clause. Our novel scheme simultaneously learns both the composition of the clauses and their weights. Furthermore, we increase training efficiency by replacing $k$ Bernoulli trials of success probability $p$ with a uniform sample of average size $pk$, the size drawn from a binomial distribution. In our empirical evaluation, the WTM achieved the same accuracy as the TM on MNIST, IMDb, and Connect-4, requiring only $1/4$, $1/3$, and $1/50$ of the clauses, respectively. With the same number of clauses, the WTM outperformed the TM, obtaining peak test accuracies of 98.63%, 90.37%, and 87.91%, respectively. Finally, our novel sampling scheme reduced sample generation time by a factor of $7$. |
Tasks | |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12607v4 |
https://arxiv.org/pdf/1911.12607v4.pdf | |
PWC | https://paperswithcode.com/paper/the-weighted-tsetlin-machine-compressed |
Repo | https://github.com/cair/pyTsetlinMachine |
Framework | none |
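The sampling trick in the abstract replaces k independent Bernoulli(p) trials with a single draw of the success count from Binomial(k, p) followed by a uniform choice of that many positions without replacement, which yields the same distribution over selected positions with an average sample size of pk. The NumPy sketch below shows both schemes side by side; the surrounding clause-update machinery of the Tsetlin Machine is not included.

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_trials(k, p):
    """Original scheme: k independent Bernoulli(p) trials, one per position."""
    return np.flatnonzero(rng.random(k) < p)

def binomial_sample(k, p):
    """WTM scheme: draw the number of successes from Binomial(k, p), then pick that many
    positions uniformly without replacement - the same distribution of selected positions,
    obtained with far fewer random draws when p is small (average sample size p*k)."""
    n = rng.binomial(k, p)
    return rng.choice(k, size=n, replace=False)

k, p = 10_000, 0.01
print("Bernoulli scheme selected:", bernoulli_trials(k, p).size)
print("Binomial  scheme selected:", binomial_sample(k, p).size)
```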