February 1, 2020

3474 words 17 mins read

Paper Group AWR 81



Bridging Adversarial Robustness and Gradient Interpretability

Title Bridging Adversarial Robustness and Gradient Interpretability
Authors Beomsu Kim, Junghoon Seo, Taegyun Jeon
Abstract Adversarial training is a training scheme designed to counter adversarial attacks by augmenting the training dataset with adversarial examples. Surprisingly, several studies have observed that loss gradients from adversarially trained DNNs are visually more interpretable than those from standard DNNs. Although this phenomenon is interesting, there are only a few works that have offered an explanation. In this paper, we attempted to bridge this gap between adversarial robustness and gradient interpretability. To this end, we identified that loss gradients from adversarially trained DNNs align better with human perception because adversarial training restricts gradients closer to the image manifold. We then demonstrated that adversarial training causes loss gradients to be quantitatively meaningful. Finally, we showed that under the adversarial training framework, there exists an empirical trade-off between test accuracy and loss gradient interpretability and proposed two potential approaches to resolving this trade-off.
Tasks
Published 2019-03-27
URL http://arxiv.org/abs/1903.11626v2
PDF http://arxiv.org/pdf/1903.11626v2.pdf
PWC https://paperswithcode.com/paper/bridging-adversarial-robustness-and-gradient
Repo https://github.com/1202kbs/Robustness-and-Interpretability
Framework tf
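
The paper's central object of study is the loss gradient with respect to the input image. A minimal sketch of how such a gradient saliency map can be computed in PyTorch is below; the untrained ResNet-18 and random input are stand-ins, not the models or data used in the paper.

```python
# Minimal sketch (not the authors' code): visualize the loss gradient w.r.t. the
# input image, the quantity whose interpretability the paper studies.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18().eval()  # stand-in network; the paper trains its own standard/adversarial DNNs
image = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder input
label = torch.tensor([0])

loss = F.cross_entropy(model(image), label)
loss.backward()

# Saliency map: absolute input gradient, max over colour channels.
saliency = image.grad.abs().max(dim=1)[0].squeeze()  # shape (224, 224)
# For an adversarially trained model this map tends to align better with the object.
```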

Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection

Title Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection
Authors Shifeng Zhang, Cheng Chi, Yongqiang Yao, Zhen Lei, Stan Z. Li
Abstract Object detection has been dominated by anchor-based detectors for several years. Recently, anchor-free detectors have become popular due to the proposal of FPN and Focal Loss. In this paper, we first point out that the essential difference between anchor-based and anchor-free detection is actually how to define positive and negative training samples, which leads to the performance gap between them. If they adopt the same definition of positive and negative samples during training, there is no obvious difference in the final performance, no matter regressing from a box or a point. This shows that how to select positive and negative training samples is important for current object detectors. Then, we propose an Adaptive Training Sample Selection (ATSS) to automatically select positive and negative samples according to statistical characteristics of objects. It significantly improves the performance of anchor-based and anchor-free detectors and bridges the gap between them. Finally, we discuss the necessity of tiling multiple anchors per location on the image to detect objects. Extensive experiments conducted on MS COCO support our aforementioned analysis and conclusions. With the newly introduced ATSS, we improve state-of-the-art detectors by a large margin to $50.7\%$ AP without introducing any overhead. The code is available at https://github.com/sfzhang15/ATSS
Tasks Object Detection
Published 2019-12-05
URL https://arxiv.org/abs/1912.02424v3
PDF https://arxiv.org/pdf/1912.02424v3.pdf
PWC https://paperswithcode.com/paper/bridging-the-gap-between-anchor-based-and
Repo https://github.com/sfzhang15/ATSS
Framework pytorch
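
The adaptive thresholding at the heart of ATSS is simple to state: for each ground-truth box, candidate anchors are scored by IoU and the positive/negative cutoff is the mean plus the standard deviation of those IoUs. The sketch below paraphrases that rule in NumPy; it omits the per-level candidate selection and center constraints of the released code.

```python
# Minimal sketch of the ATSS positive-sample rule (a paraphrase, not the released code):
# for one ground-truth box, candidates are scored by IoU and the positive threshold
# is the mean plus standard deviation of those IoUs.
import numpy as np

def atss_positive_mask(candidate_ious):
    """candidate_ious: IoU of each candidate anchor with one ground-truth box."""
    ious = np.asarray(candidate_ious, dtype=np.float64)
    threshold = ious.mean() + ious.std()          # adaptive, per-object threshold
    return ious >= threshold                      # True -> positive training sample

# Example: a well-aligned object yields a high threshold, a poorly covered one a low threshold.
print(atss_positive_mask([0.1, 0.2, 0.6, 0.7, 0.65]))
```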

MelNet: A Generative Model for Audio in the Frequency Domain

Title MelNet: A Generative Model for Audio in the Frequency Domain
Authors Sean Vasquez, Mike Lewis
Abstract Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps. While long-range dependencies are difficult to model directly in the time domain, we show that they can be more tractably modelled in two-dimensional time-frequency representations such as spectrograms. By leveraging this representational advantage, in conjunction with a highly expressive probabilistic model and a multiscale generation procedure, we design a model capable of generating high-fidelity audio samples which capture structure at timescales that time-domain models have yet to achieve. We apply our model to a variety of audio generation tasks, including unconditional speech generation, music generation, and text-to-speech synthesis—showing improvements over previous approaches in both density estimates and human judgments.
Tasks Audio Generation, Music Generation, Speech Synthesis, Text-To-Speech Synthesis
Published 2019-06-04
URL https://arxiv.org/abs/1906.01083v1
PDF https://arxiv.org/pdf/1906.01083v1.pdf
PWC https://paperswithcode.com/paper/melnet-a-generative-model-for-audio-in-the
Repo https://github.com/YuvalBecker/MelNet
Framework tf
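
MelNet models audio as a two-dimensional time-frequency grid rather than a raw waveform. The snippet below is a hedged illustration of producing such a representation with librosa; the synthetic tone and the spectrogram parameters are illustrative choices, not the paper's preprocessing.

```python
# Sketch of the two-dimensional time-frequency representation MelNet models
# (assumes librosa is installed; parameters are illustrative, not the paper's).
import numpy as np
import librosa

sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
y = np.sin(2 * np.pi * 440 * t).astype(np.float32)   # 1 s of a 440 Hz tone as a stand-in signal

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=256, n_mels=128)
log_mel = np.log(mel + 1e-6)                          # log-amplitude mel spectrogram

# MelNet factorises the distribution over this (n_mels, frames) grid autoregressively,
# rather than over tens of thousands of raw samples, which shortens long-range dependencies.
print(log_mel.shape)
```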

Edge-Guided Occlusion Fading Reduction for a Light-Weighted Self-Supervised Monocular Depth Estimation

Title Edge-Guided Occlusion Fading Reduction for a Light-Weighted Self-Supervised Monocular Depth Estimation
Authors Kuo-Shiuan Peng, Gregory Ditzler, Jerzy Rozenblit
Abstract Self-supervised monocular depth estimation methods generally suffer from the occlusion fading issue due to the lack of supervision by the per-pixel ground truth. Although a post-processing method was proposed by Godard et al. to reduce the occlusion fading, the compensated results have a severe halo effect. In this paper, we propose a novel Edge-Guided post-processing to reduce the occlusion fading issue for self-supervised monocular depth estimation. We further introduce Atrous Spatial Pyramid Pooling (ASPP) into the network to reduce the computational costs and improve the inference performance. The proposed ASPP-based network is lighter, faster, and better than current commonly used depth estimation networks. This light-weight network only needs 8.1 million parameters and can achieve up to 40 frames per second for $256\times512$ input in the inference stage using a single NVIDIA GTX 1080 GPU. The proposed network also outperforms the current state-of-the-art on the KITTI benchmarks. The ASPP-based network and Edge-Guided post-processing produce better results both quantitatively and qualitatively than the competitors.
Tasks Depth Estimation, Monocular Depth Estimation
Published 2019-11-26
URL https://arxiv.org/abs/1911.11705v1
PDF https://arxiv.org/pdf/1911.11705v1.pdf
PWC https://paperswithcode.com/paper/edge-guided-occlusion-fading-reduction-for-a
Repo https://github.com/kspeng/lw-eg-monodepth
Framework tf
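
ASPP applies several dilated convolutions in parallel to capture multi-scale context at low cost, which is why the paper uses it to lighten the depth network. Below is a generic ASPP sketch in PyTorch; the dilation rates and channel sizes are assumptions, not the paper's exact configuration.

```python
# Minimal ASPP (Atrous Spatial Pyramid Pooling) sketch in PyTorch; dilation rates
# are illustrative, not necessarily those used in the paper's network.
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Parallel dilated convolutions see increasingly large receptive fields.
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.project(feats)

x = torch.randn(1, 64, 32, 64)   # e.g. an encoder feature map for a 256x512 input
print(ASPP(64, 64)(x).shape)
```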

Fair Kernel Regression via Fair Feature Embedding in Kernel Space

Title Fair Kernel Regression via Fair Feature Embedding in Kernel Space
Authors Austin Okray, Hui Hu, Chao Lan
Abstract In recent years, there have been significant efforts on mitigating unethical demographic biases in machine learning methods. However, very little is done for kernel methods. In this paper, we propose a new fair kernel regression method via fair feature embedding (FKR-F$^2$E) in kernel space. Motivated by prior works on feature selection in kernel space and feature processing for fair machine learning, we propose to learn fair feature embedding functions that minimize demographic discrepancy of feature distributions in kernel space. Compared to the state-of-the-art fair kernel regression method and several baseline methods, we show FKR-F$^2$E achieves significantly lower prediction disparity across three real-world data sets.
Tasks Feature Selection
Published 2019-07-04
URL https://arxiv.org/abs/1907.02242v2
PDF https://arxiv.org/pdf/1907.02242v2.pdf
PWC https://paperswithcode.com/paper/fair-kernel-regression-via-fair-feature
Repo https://github.com/aokray/FKRFFE
Framework none
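
The fairness criterion in FKR-F$^2$E is a demographic discrepancy between the two groups' feature distributions in kernel space. One standard way to measure such a discrepancy is a kernel MMD, sketched below with NumPy; this illustrates the quantity being minimized, not the authors' embedding-learning algorithm.

```python
# Sketch of a demographic discrepancy in kernel space, measured here with a simple
# (biased) MMD^2 estimate under an RBF kernel. Illustration only, not the authors' code.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def demographic_discrepancy(X, group):
    """X: (n, d) features; group: boolean array marking one demographic group."""
    X0, X1 = X[~group], X[group]
    return (rbf_kernel(X0, X0).mean()
            + rbf_kernel(X1, X1).mean()
            - 2 * rbf_kernel(X0, X1).mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
g = rng.random(100) < 0.5
print(demographic_discrepancy(X, g))   # near zero when the two groups are distributed alike
```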

Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization

Title Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization
Authors Md Mahfuzur Rahman Siddiquee, Zongwei Zhou, Nima Tajbakhsh, Ruibin Feng, Michael B. Gotway, Yoshua Bengio, Jianming Liang
Abstract Generative adversarial networks (GANs) have ushered in a revolution in image-to-image translation. The development and proliferation of GANs raises an interesting question: can we train a GAN to remove an object, if present, from an image while otherwise preserving the image? Specifically, can a GAN “virtually heal” anyone by turning his medical image, with an unknown health status (diseased or healthy), into a healthy one, so that diseased regions could be revealed by subtracting those two images? Such a task requires a GAN to identify a minimal subset of target pixels for domain translation, an ability that we call fixed-point translation, which no GAN is equipped with yet. Therefore, we propose a new GAN, called Fixed-Point GAN, trained by (1) supervising same-domain translation through a conditional identity loss, and (2) regularizing cross-domain translation through revised adversarial, domain classification, and cycle consistency loss. Based on fixed-point translation, we further derive a novel framework for disease detection and localization using only image-level annotation. Qualitative and quantitative evaluations demonstrate that the proposed method outperforms the state of the art in multi-domain image-to-image translation and that it surpasses predominant weakly-supervised localization methods in both disease detection and localization. Implementation is available at https://github.com/jlianglab/Fixed-Point-GAN.
Tasks Image-to-Image Translation
Published 2019-08-16
URL https://arxiv.org/abs/1908.06965v2
PDF https://arxiv.org/pdf/1908.06965v2.pdf
PWC https://paperswithcode.com/paper/learning-fixed-points-in-generative
Repo https://github.com/jlianglab/Fixed-Point-GAN
Framework pytorch
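
The abstract's key ingredients are a conditional identity loss for same-domain translation and a cycle-consistency loss for cross-domain translation. The toy PyTorch sketch below shows only those two terms with a placeholder one-layer "generator"; the adversarial and domain-classification losses, and the real architecture, are omitted.

```python
# Hedged sketch of the fixed-point idea: translating an image to its own domain
# should leave it unchanged, and a cross-domain round trip should be reversible.
import torch
import torch.nn as nn

# Toy "generator": image plus a one-channel domain code in, image out.
G = nn.Conv2d(3 + 1, 3, kernel_size=3, padding=1)

def translate(x, domain):
    code = torch.full_like(x[:, :1], float(domain))   # broadcast the domain label spatially
    return G(torch.cat([x, code], dim=1))

x = torch.rand(4, 3, 64, 64)        # assume these images belong to domain 0 ("healthy")
SAME, OTHER = 0, 1

identity_loss = (translate(x, SAME) - x).abs().mean()                    # same-domain: change nothing
cycle_loss = (translate(translate(x, OTHER), SAME) - x).abs().mean()     # cross-domain: be reversible

loss = identity_loss + cycle_loss   # adversarial and domain-classification terms omitted
print(loss.item())
```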

Probabilistic Reconstruction Networks for 3D Shape Inference from a Single Image

Title Probabilistic Reconstruction Networks for 3D Shape Inference from a Single Image
Authors Roman Klokov, Jakob Verbeek, Edmond Boyer
Abstract We study end-to-end learning strategies for 3D shape inference from images, in particular from a single image. Several approaches in this direction have been investigated that explore different shape representations and suitable learning architectures. We focus instead on the underlying probabilistic mechanisms involved and contribute a more principled probabilistic inference-based reconstruction framework, which we coin Probabilistic Reconstruction Networks. This framework expresses image conditioned 3D shape inference through a family of latent variable models, and naturally decouples the choice of shape representations from the inference itself. Moreover, it suggests different options for the image conditioning and allows training in two regimes, using either Monte Carlo or variational approximation of the marginal likelihood. Using our Probabilistic Reconstruction Networks we obtain single image 3D reconstruction results that set a new state of the art on the ShapeNet dataset in terms of the intersection over union and earth mover’s distance evaluation metrics. Interestingly, we obtain these results using a basic voxel grid representation, improving over recent work based on finer point cloud or mesh based representations.
Tasks 3D Reconstruction, Latent Variable Models
Published 2019-08-20
URL https://arxiv.org/abs/1908.07475v1
PDF https://arxiv.org/pdf/1908.07475v1.pdf
PWC https://paperswithcode.com/paper/probabilistic-reconstruction-networks-for-3d
Repo https://github.com/Regenerator/prns
Framework pytorch
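
The two training regimes mentioned in the abstract can be written compactly. In the hedged notation below, where $s$ is a shape, $x$ an image, and $z$ a latent variable (symbols chosen here for illustration, not necessarily the paper's), the marginal likelihood is approximated either by Monte Carlo sampling or by a variational lower bound:

```latex
% Monte Carlo approximation of the marginal likelihood:
\log p_\theta(s \mid x) \approx \log \frac{1}{K} \sum_{k=1}^{K} p_\theta\left(s \mid z^{(k)}, x\right),
\qquad z^{(k)} \sim p_\theta(z \mid x)

% Variational (ELBO) alternative:
\log p_\theta(s \mid x) \ge
\mathbb{E}_{q_\phi(z \mid s, x)}\left[\log p_\theta(s \mid z, x)\right]
- \mathrm{KL}\left(q_\phi(z \mid s, x) \,\|\, p_\theta(z \mid x)\right)
```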

Learning Belief Representations for Imitation Learning in POMDPs

Title Learning Belief Representations for Imitation Learning in POMDPs
Authors Tanmay Gangwani, Joel Lehman, Qiang Liu, Jian Peng
Abstract We consider the problem of imitation learning from expert demonstrations in partially observable Markov decision processes (POMDPs). Belief representations, which characterize the distribution over the latent states in a POMDP, have been modeled using recurrent neural networks and probabilistic latent variable models, and shown to be effective for reinforcement learning in POMDPs. In this work, we investigate the belief representation learning problem for generative adversarial imitation learning in POMDPs. Instead of training the belief module and the policy separately as suggested in prior work, we learn the belief module jointly with the policy, using a task-aware imitation loss to ensure that the representation is more aligned with the policy’s objective. To improve robustness of representation, we introduce several informative belief regularization techniques, including multi-step prediction of dynamics and action-sequences. Evaluated on various partially observable continuous-control locomotion tasks, our belief-module imitation learning approach (BMIL) substantially outperforms several baselines, including the original GAIL algorithm and the task-agnostic belief learning algorithm. Extensive ablation analysis indicates the effectiveness of task-aware belief learning and belief regularization.
Tasks Continuous Control, Imitation Learning, Latent Variable Models, Representation Learning
Published 2019-06-22
URL https://arxiv.org/abs/1906.09510v1
PDF https://arxiv.org/pdf/1906.09510v1.pdf
PWC https://paperswithcode.com/paper/learning-belief-representations-for-imitation
Repo https://github.com/tgangwani/BMIL
Framework pytorch

Scraping Social Media Photos Posted in Kenya and Elsewhere to Detect and Analyze Food Types

Title Scraping Social Media Photos Posted in Kenya and Elsewhere to Detect and Analyze Food Types
Authors Kaihong Wang, Mona Jalal, Sankara Jefferson, Yi Zheng, Elaine O. Nsoesie, Margrit Betke
Abstract Monitoring population-level changes in diet could be useful for education and for implementing interventions to improve health. Research has shown that data from social media sources can be used for monitoring dietary behavior. We propose a scrape-by-location methodology to create food image datasets from Instagram posts. We used it to collect 3.56 million images over a period of 20 days in March 2019. We also propose a scrape-by-keywords methodology and used it to scrape ~30,000 images and their captions of 38 Kenyan food types. We publish two datasets of 104,000 and 8,174 image/caption pairs, respectively. With the first dataset, Kenya104K, we train a Kenyan Food Classifier, called KenyanFC, to distinguish Kenyan food from non-food images posted in Kenya. We used the second dataset, KenyanFood13, to train a classifier KenyanFTR, short for Kenyan Food Type Recognizer, to recognize 13 popular food types in Kenya. The KenyanFTR is a multimodal deep neural network that can identify 13 types of Kenyan foods using both images and their corresponding captions. Experiments show that the average top-1 accuracy of KenyanFC is 99% over 10,400 tested Instagram images and of KenyanFTR is 81% over 8,174 tested data points. Ablation studies show that three of the 13 food types are particularly difficult to categorize based on image content only and that adding analysis of captions to the image analysis yields a classifier that is 9 percentage points more accurate than a classifier that relies only on images. Our food trend analysis revealed that cakes and roasted meats were the most popular foods in photographs on Instagram in Kenya in March 2019.
Tasks
Published 2019-08-31
URL https://arxiv.org/abs/1909.00134v1
PDF https://arxiv.org/pdf/1909.00134v1.pdf
PWC https://paperswithcode.com/paper/scraping-social-media-photos-posted-in-kenya
Repo https://github.com/monajalal/Kenyan-Food
Framework pytorch
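
KenyanFTR is described as a multimodal network that uses both the image and its caption. A minimal late-fusion sketch in PyTorch is below; the encoders are replaced by dummy feature vectors and the dimensions are assumptions, so this only illustrates the fusion idea, not the actual model.

```python
# Hedged sketch of the multimodal idea: image features and caption features are
# combined before the food-type classifier. Encoders here are placeholders.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, img_dim=512, txt_dim=300, n_classes=13):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 256), nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, img_feat, txt_feat):
        # Concatenate the two modalities; captions help resolve dishes that look alike.
        return self.head(torch.cat([img_feat, txt_feat], dim=1))

model = LateFusionClassifier()
logits = model(torch.randn(8, 512), torch.randn(8, 300))  # dummy CNN and caption embeddings
print(logits.shape)   # (8, 13) -> one score per food type
```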

Large-Scale Answerer in Questioner’s Mind for Visual Dialog Question Generation

Title Large-Scale Answerer in Questioner’s Mind for Visual Dialog Question Generation
Authors Sang-Woo Lee, Tong Gao, Sohee Yang, Jaejun Yoo, Jung-Woo Ha
Abstract Answerer in Questioner’s Mind (AQM) is an information-theoretic framework that has been recently proposed for task-oriented dialog systems. AQM benefits from asking a question that would maximize the information gain when it is asked. However, due to its intrinsic nature of explicitly calculating the information gain, AQM has a limitation when the solution space is very large. To address this, we propose AQM+ that can deal with a large-scale problem and ask a question that is more coherent to the current context of the dialog. We evaluate our method on GuessWhich, a challenging task-oriented visual dialog problem, where the number of candidate classes is near 10K. Our experimental results and ablation studies show that AQM+ outperforms the state-of-the-art models by a remarkable margin with a reasonable approximation. In particular, the proposed AQM+ reduces more than 60% of error as the dialog proceeds, while the comparative algorithms diminish the error by less than 6%. Based on our results, we argue that AQM+ is a general task-oriented dialog algorithm that can be applied for non-yes-or-no responses.
Tasks Question Generation, Visual Dialog
Published 2019-02-22
URL http://arxiv.org/abs/1902.08355v1
PDF http://arxiv.org/pdf/1902.08355v1.pdf
PWC https://paperswithcode.com/paper/large-scale-answerer-in-questioners-mind-for
Repo https://github.com/naver/aqm-plus
Framework pytorch
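
AQM-style questioners pick the question with the largest information gain over the candidate set: the prior entropy minus the expected posterior entropy after hearing the answer. The NumPy sketch below computes that quantity from toy probability tables; in AQM+ these distributions come from learned answerer and guesser models and are approximated over large candidate sets, which this sketch does not attempt.

```python
# Hedged sketch of the information-gain criterion: pick the question whose expected
# answer most reduces uncertainty over the candidates. Probability tables are toy inputs.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def information_gain(prior, answer_given_class):
    """prior: p(c), shape (C,); answer_given_class: p(a | c, q), shape (C, A)."""
    p_answer = prior @ answer_given_class                          # p(a | q)
    posterior = (prior[:, None] * answer_given_class) / p_answer   # p(c | a, q), shape (C, A)
    expected_post_entropy = sum(p_answer[a] * entropy(posterior[:, a])
                                for a in range(len(p_answer)))
    return entropy(prior) - expected_post_entropy

prior = np.array([0.5, 0.3, 0.2])
p_a_given_c = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])   # a yes/no question
print(information_gain(prior, p_a_given_c))   # larger = more informative question
```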

Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering

Title Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering
Authors Shiyue Zhang, Mohit Bansal
Abstract Text-based Question Generation (QG) aims at generating natural and relevant questions that can be answered by a given answer in some context. Existing QG models suffer from a “semantic drift” problem, i.e., the semantics of the model-generated question drifts away from the given context and answer. In this paper, we first propose two semantics-enhanced rewards obtained from downstream question paraphrasing and question answering tasks to regularize the QG model to generate semantically valid questions. Second, since the traditional evaluation metrics (e.g., BLEU) often fall short in evaluating the quality of generated questions, we propose a QA-based evaluation method which measures the QG model’s ability to mimic human annotators in generating QA training data. Experiments show that our method achieves the new state-of-the-art performance w.r.t. traditional metrics, and also performs best on our QA-based evaluation metrics. Further, we investigate how to use our QG model to augment QA datasets and enable semi-supervised QA. We propose two ways to generate synthetic QA pairs: generate new questions from existing articles or collect QA pairs from new articles. We also propose two empirically effective strategies, a data filter and mixing mini-batch training, to properly use the QG-generated data for QA. Experiments show that our method improves over both BiDAF and BERT QA baselines, even without introducing new articles.
Tasks Question Answering, Question Generation
Published 2019-09-13
URL https://arxiv.org/abs/1909.06356v1
PDF https://arxiv.org/pdf/1909.06356v1.pdf
PWC https://paperswithcode.com/paper/addressing-semantic-drift-in-question
Repo https://github.com/ZhangShiyue/QGforQA
Framework tf

Let’s Ask Again: Refine Network for Automatic Question Generation

Title Let’s Ask Again: Refine Network for Automatic Question Generation
Authors Preksha Nema, Akash Kumar Mohankumar, Mitesh M. Khapra, Balaji Vasan Srinivasan, Balaraman Ravindran
Abstract In this work, we focus on the task of Automatic Question Generation (AQG) where given a passage and an answer the task is to generate the corresponding question. It is desired that the generated question should be (i) grammatically correct (ii) answerable from the passage and (iii) specific to the given answer. An analysis of existing AQG models shows that they produce questions which do not adhere to one or more of the above-mentioned qualities. In particular, the generated questions look like an incomplete draft of the desired question with a clear scope for refinement. To alleviate this shortcoming, we propose a method which tries to mimic the human process of generating questions by first creating an initial draft and then refining it. More specifically, we propose Refine Network (RefNet) which contains two decoders. The second decoder uses a dual attention network which pays attention to both (i) the original passage and (ii) the question (initial draft) generated by the first decoder. In effect, it refines the question generated by the first decoder, thereby making it more correct and complete. We evaluate RefNet on three datasets, viz., SQuAD, HOTPOT-QA, and DROP, and show that it outperforms existing state-of-the-art methods by 7-16% on all of these datasets. Lastly, we show that we can improve the quality of the second decoder on specific metrics, such as fluency and answerability, by explicitly rewarding revisions that improve on the corresponding metric during training. The code has been made publicly available (https://github.com/PrekshaNema25/RefNet-QG).
Tasks Question Generation
Published 2019-08-31
URL https://arxiv.org/abs/1909.05355v1
PDF https://arxiv.org/pdf/1909.05355v1.pdf
PWC https://paperswithcode.com/paper/lets-ask-again-refine-network-for-automatic
Repo https://github.com/PrekshaNema25/RefNet-QG
Framework tf

Acceleration of expensive computations in Bayesian statistics using vector operations

Title Acceleration of expensive computations in Bayesian statistics using vector operations
Authors David J. Warne, Scott A. Sisson, Christopher Drovandi
Abstract Many applications in Bayesian statistics are extremely computationally intensive. However, they are also often inherently parallel, making them prime targets for modern massively parallel central processing unit (CPU) architectures. While the use of multi-core and distributed computing is widely applied in the Bayesian community, very little attention has been given to fine-grain parallelisation using single instruction multiple data (SIMD) operations that are available on most modern commodity CPUs. Rather, most fine-grain tuning in the literature has centred around general purpose graphics processing units (GPGPUs). Since the effective utilisation of GPGPUs typically requires specialised programming languages, such technologies are not ideal for the wider Bayesian community. In this work, we practically demonstrate, using standard programming libraries, the utility of the SIMD approach for several topical Bayesian applications. In particular, we consider sampling of the prior predictive distribution for approximate Bayesian computation (ABC), the computation of Bayesian $p$-values for testing prior weak informativeness, and inference on a computationally challenging econometrics model. Through minor code alterations, we show that SIMD operations can improve the floating point arithmetic performance, resulting in up to $6\times$ improvement in the overall serial algorithm performance. Furthermore, $4$-way parallel versions can lead to almost $19\times$ improvement over a naïve serial implementation. We illustrate the potential of SIMD operations for accelerating Bayesian computations and provide the reader with essential implementation techniques required to exploit modern massively parallel processing environments using standard software development tools.
Tasks
Published 2019-02-25
URL https://arxiv.org/abs/1902.09046v2
PDF https://arxiv.org/pdf/1902.09046v2.pdf
PWC https://paperswithcode.com/paper/acceleration-of-expensive-computations-in
Repo https://github.com/davidwarne/Bayesian_SIMD_examples
Framework none
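
The paper's point is that fine-grain data parallelism (SIMD) speeds up the innermost loops of Bayesian computations such as prior-predictive sampling for ABC. As a loose Python analogy only (the paper itself works with C code and compiler vectorisation, not NumPy), the contrast between a scalar loop and a vectorised batch looks like this:

```python
# Illustration of the loop-vs-vector contrast behind SIMD acceleration, in NumPy terms.
# The toy exponential prior and Poisson "simulator" are stand-ins, not the paper's models.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Scalar-style loop: one prior draw and one simulated summary statistic at a time.
def prior_predictive_loop():
    out = np.empty(n)
    for i in range(n):
        theta = rng.exponential(1.0)
        out[i] = rng.poisson(theta)
    return out

# Vectorised version: the whole batch is handed to the library at once, letting
# vectorised (SIMD-friendly) kernels do the work.
def prior_predictive_vec():
    theta = rng.exponential(1.0, size=n)
    return rng.poisson(theta)

samples = prior_predictive_vec()
print(samples.mean())
```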

Contextual Recurrent Neural Networks

Title Contextual Recurrent Neural Networks
Authors Sam Wenke, Jim Fleming
Abstract There is an implicit assumption that by unfolding recurrent neural networks (RNN) in finite time, the misspecification of choosing a zero value for the initial hidden state is mitigated by later time steps. This assumption has been shown to work in practice and alternative initialization may be suggested but often overlooked. In this paper, we propose a method of parameterizing the initial hidden state of an RNN. The resulting architecture, referred to as a Contextual RNN, can be trained end-to-end. The performance on an associative retrieval task is found to improve by conditioning the RNN initial hidden state on contextual information from the input sequence. Furthermore, we propose a novel method of conditionally generating sequences using the hidden state parameterization of Contextual RNN.
Tasks
Published 2019-02-09
URL http://arxiv.org/abs/1902.03455v1
PDF http://arxiv.org/pdf/1902.03455v1.pdf
PWC https://paperswithcode.com/paper/contextual-recurrent-neural-networks
Repo https://github.com/fomorians/contextual_rnn
Framework tf
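
The core proposal is to parameterize the RNN's initial hidden state as a function of context from the input rather than fixing it at zero. A minimal PyTorch sketch of that idea follows; the GRU, the linear context encoder, and the mean-pooled context vector are illustrative choices, not the paper's exact architecture.

```python
# Hedged sketch of a "Contextual RNN": condition h_0 on context instead of using zeros.
import torch
import torch.nn as nn

class ContextualGRU(nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.context = nn.Linear(in_dim, hidden_dim)   # maps a context vector to h_0
        self.rnn = nn.GRU(in_dim, hidden_dim, batch_first=True)

    def forward(self, x, context_vec):
        h0 = torch.tanh(self.context(context_vec)).unsqueeze(0)  # (1, batch, hidden)
        out, _ = self.rnn(x, h0)
        return out

model = ContextualGRU(in_dim=16, hidden_dim=32)
x = torch.randn(4, 10, 16)          # (batch, time, features)
ctx = x.mean(dim=1)                 # e.g. summarise the input sequence as its context
print(model(x, ctx).shape)          # (4, 10, 32)
```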

The Weighted Tsetlin Machine: Compressed Representations with Weighted Clauses

Title The Weighted Tsetlin Machine: Compressed Representations with Weighted Clauses
Authors Adrian Phoulady, Ole-Christoffer Granmo, Saeed Rahimi Gorji, Hady Ahmady Phoulady
Abstract The Tsetlin Machine (TM) is an interpretable mechanism for pattern recognition that constructs conjunctive clauses from data. The clauses capture frequent patterns with high discriminating power, providing increasing expression power with each additional clause. However, the resulting accuracy gain comes at the cost of linear growth in computation time and memory usage. In this paper, we present the Weighted Tsetlin Machine (WTM), which reduces computation time and memory usage by weighting the clauses. Real-valued weighting allows one clause to replace multiple, and supports fine-tuning the impact of each clause. Our novel scheme simultaneously learns both the composition of the clauses and their weights. Furthermore, we increase training efficiency by replacing $k$ Bernoulli trials of success probability $p$ with a uniform sample of average size $p k$, the size drawn from a binomial distribution. In our empirical evaluation, the WTM achieved the same accuracy as the TM on MNIST, IMDb, and Connect-4, requiring only $1/4$, $1/3$, and $1/50$ of the clauses, respectively. With the same number of clauses, the WTM outperformed the TM, obtaining peak test accuracies of respectively $98.63\%$, $90.37\%$, and $87.91\%$. Finally, our novel sampling scheme reduced sample generation time by a factor of $7$.
Tasks
Published 2019-11-28
URL https://arxiv.org/abs/1911.12607v4
PDF https://arxiv.org/pdf/1911.12607v4.pdf
PWC https://paperswithcode.com/paper/the-weighted-tsetlin-machine-compressed
Repo https://github.com/cair/pyTsetlinMachine
Framework none
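
The sampling trick in the abstract, replacing $k$ Bernoulli trials with one binomial draw followed by a uniform subset, is easy to demonstrate. The NumPy sketch below shows both procedures; it is an illustration of the statistical equivalence, not code from the pyTsetlinMachine repository.

```python
# Instead of flipping k independent Bernoulli(p) coins, draw how many successes occur
# from Binomial(k, p) and pick that many positions uniformly without replacement.
# Both yield the same distribution of selected index sets; the second avoids k draws.
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_trials(k, p):
    return np.flatnonzero(rng.random(k) < p)            # k separate draws

def binomial_shortcut(k, p):
    m = rng.binomial(k, p)                               # one draw for the count
    return rng.choice(k, size=m, replace=False)          # then a uniform sample of that size

k, p = 10_000, 0.01
print(len(bernoulli_trials(k, p)), len(binomial_shortcut(k, p)))   # both ~ k*p = 100
```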