October 20, 2019

2885 words 14 mins read

Paper Group AWR 347

On Extensions of CLEVER: A Neural Network Robustness Evaluation Algorithm. Discrete Autoencoders for Sequence Models. Measuring the quality of Synthetic data for use in competitions. BayesGrad: Explaining Predictions of Graph Convolutional Networks. Guided Neural Language Generation for Abstractive Summarization using Abstract Meaning Representatio …

On Extensions of CLEVER: A Neural Network Robustness Evaluation Algorithm

Title On Extensions of CLEVER: A Neural Network Robustness Evaluation Algorithm
Authors Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Aurelie Lozano, Cho-Jui Hsieh, Luca Daniel
Abstract CLEVER (Cross-Lipschitz Extreme Value for nEtwork Robustness) is an Extreme Value Theory (EVT) based robustness score for large-scale deep neural networks (DNNs). In this paper, we propose two extensions to this robustness score. First, we provide a new formal robustness guarantee for classifier functions that are twice differentiable. We apply extreme value theory to the new formal robustness guarantee, and the resulting estimate is called the second-order CLEVER score. Second, we discuss how to handle gradient masking, a common defensive technique, using CLEVER with Backward Pass Differentiable Approximation (BPDA). With BPDA applied, CLEVER can evaluate the intrinsic robustness of a broader class of neural networks, namely networks with non-differentiable input transformations. We demonstrate the effectiveness of CLEVER with BPDA in experiments on a 121-layer DenseNet model trained on the ImageNet dataset.
Tasks
Published 2018-10-19
URL http://arxiv.org/abs/1810.08640v1
PDF http://arxiv.org/pdf/1810.08640v1.pdf
PWC https://paperswithcode.com/paper/on-extensions-of-clever-a-neural-network
Repo https://github.com/huanzhang12/CLEVER
Framework tf
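
The first-order CLEVER score is straightforward to sketch: sample gradient norms of the margin function around the input, fit a reverse Weibull distribution to the batch maxima to estimate the local cross-Lipschitz constant, and divide the classification margin by that estimate. Below is a minimal NumPy/SciPy sketch of that idea, not the reference implementation in the linked repo; `margin_fn` and `grad_norm_fn` are assumed callables, and the sampling ball and fitting details are illustrative.

```python
import numpy as np
from scipy.stats import weibull_max

def clever_score(x, margin_fn, grad_norm_fn, radius=0.5, n_batches=50, batch_size=100):
    """First-order CLEVER-style estimate (sketch).

    x               -- input as a NumPy array
    margin_fn(x)    -- f_c(x) - f_j(x) for the target/attacked class pair (assumed callable)
    grad_norm_fn(x) -- norm of the gradient of (f_c - f_j) at x           (assumed callable)
    """
    maxima = []
    for _ in range(n_batches):
        # sample points uniformly in an L_inf ball of the given radius around x
        pts = x + np.random.uniform(-radius, radius, size=(batch_size,) + x.shape)
        maxima.append(max(grad_norm_fn(p) for p in pts))
    # fit a reverse Weibull distribution to the batch maxima; its location parameter
    # estimates the maximum gradient norm, i.e. the local cross-Lipschitz constant
    _, loc, _ = weibull_max.fit(np.array(maxima))
    return margin_fn(x) / max(loc, 1e-12)
```

The second-order extension described in the abstract would replace the gradient norm with a bound derived from second-order information, and the BPDA variant replaces the gradient of a non-differentiable input transformation with the gradient of a differentiable approximation.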

Discrete Autoencoders for Sequence Models

Title Discrete Autoencoders for Sequence Models
Authors Łukasz Kaiser, Samy Bengio
Abstract Recurrent models for sequences have recently been successful at many tasks, especially language modeling and machine translation. Nevertheless, it remains challenging to extract good representations from these models. For instance, even though language has a clear hierarchical structure going from characters through words to sentences, it is not apparent in current language models. We propose to improve the representation in sequence models by augmenting current approaches with an autoencoder that is forced to compress the sequence through an intermediate discrete latent space. In order to propagate gradients through this discrete representation, we introduce an improved semantic hashing technique. We show that this technique performs well on a newly proposed quantitative efficiency measure. We also analyze latent codes produced by the model, showing how they correspond to words and phrases. Finally, we present an application of the autoencoder-augmented model to generating diverse translations.
Tasks Language Modelling, Machine Translation
Published 2018-01-29
URL http://arxiv.org/abs/1801.09797v1
PDF http://arxiv.org/pdf/1801.09797v1.pdf
PWC https://paperswithcode.com/paper/discrete-autoencoders-for-sequence-models
Repo https://github.com/tensorflow/tensor2tensor
Framework tf
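
The discretization bottleneck can be sketched compactly: a saturating sigmoid with added noise pushes values toward {0, 1}, and a straight-through estimator lets gradients flow past the hard thresholding. The PyTorch snippet below is a minimal sketch of that idea, not the exact improved-semantic-hashing formulation in the paper (which also mixes the dense and discrete paths during training).

```python
import torch

def saturating_sigmoid(x):
    # max(0, min(1, 1.2 * sigmoid(x) - 0.1)): saturates exactly at 0 and 1
    return torch.clamp(1.2 * torch.sigmoid(x) - 0.1, 0.0, 1.0)

def semantic_hash(logits, noise_std=1.0, training=True):
    """Discretize logits into bits while keeping a usable gradient (sketch)."""
    noise = torch.randn_like(logits) * noise_std if training else torch.zeros_like(logits)
    dense = saturating_sigmoid(logits + noise)   # near-binary values in [0, 1]
    bits = (dense > 0.5).float()                 # hard binary code
    # straight-through: the forward pass uses the bits, the backward pass uses the dense path
    return dense + (bits - dense).detach()
```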

Measuring the quality of Synthetic data for use in competitions

Title Measuring the quality of Synthetic data for use in competitions
Authors James Jordon, Jinsung Yoon, Mihaela van der Schaar
Abstract Machine learning has the potential to assist many communities in using the large datasets that are becoming more and more available. Unfortunately, much of that potential is not being realized because it would require sharing data in a way that compromises privacy. In order to overcome this hurdle, several methods have been proposed that generate synthetic data while preserving the privacy of the real data. In this paper we consider a key characteristic that synthetic data should have in order to be useful for machine learning researchers - the relative performance of two algorithms (trained and tested) on the synthetic dataset should be the same as their relative performance (when trained and tested) on the original dataset.
Tasks
Published 2018-06-29
URL http://arxiv.org/abs/1806.11345v1
PDF http://arxiv.org/pdf/1806.11345v1.pdf
PWC https://paperswithcode.com/paper/measuring-the-quality-of-synthetic-data-for
Repo https://github.com/jsyoon0823/SRA_TSTR
Framework none
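
The property the abstract asks for, that the relative ranking of algorithms should transfer from the synthetic dataset to the real one, reduces to a simple pairwise agreement count. A possible sketch of that measure (the function name and interface are mine, not necessarily the repo's):

```python
from itertools import combinations

def ranking_agreement(real_scores, synth_scores):
    """Fraction of algorithm pairs ordered the same way on real and synthetic data.

    real_scores[i] and synth_scores[i] are the evaluation scores of algorithm i when
    trained and tested on the real and synthetic datasets, respectively.
    """
    pairs = list(combinations(range(len(real_scores)), 2))
    agree = sum(
        1 for i, j in pairs
        if (real_scores[i] - real_scores[j]) * (synth_scores[i] - synth_scores[j]) > 0
    )
    return agree / max(len(pairs), 1)
```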

BayesGrad: Explaining Predictions of Graph Convolutional Networks

Title BayesGrad: Explaining Predictions of Graph Convolutional Networks
Authors Hirotaka Akita, Kosuke Nakago, Tomoki Komatsu, Yohei Sugawara, Shin-ichi Maeda, Yukino Baba, Hisashi Kashima
Abstract Recent advances in graph convolutional networks have significantly improved the performance of chemical predictions, raising a new research question: “how do we explain the predictions of graph convolutional networks?” A possible approach to answer this question is to visualize evidence substructures responsible for the predictions. For chemical property prediction tasks, the sample size of the training data is often small and/or a label imbalance problem occurs, where a few samples belong to a single class and the majority of samples belong to the other classes. This can lead to uncertainty related to the learned parameters of the machine learning model. To address this uncertainty, we propose BayesGrad, utilizing the Bayesian predictive distribution, to define the importance of each node in an input graph, which is computed efficiently using the dropout technique. We demonstrate that BayesGrad successfully visualizes the substructures responsible for the label prediction in the artificial experiment, even when the sample size is small. Furthermore, we use a real dataset to evaluate the effectiveness of the visualization. The basic idea of BayesGrad is not limited to graph-structured data and can be applied to other data types.
Tasks
Published 2018-07-04
URL http://arxiv.org/abs/1807.01985v1
PDF http://arxiv.org/pdf/1807.01985v1.pdf
PWC https://paperswithcode.com/paper/bayesgrad-explaining-predictions-of-graph
Repo https://github.com/pfnet-research/chainer-chemistry
Framework none
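
The core computation is an expectation of input gradients under the Bayesian predictive distribution, which dropout approximates by keeping the dropout masks active at inference time. A hedged PyTorch sketch follows; the model signature `model(features, adjacency)` and the reduction to a per-node score are assumptions for illustration.

```python
import torch

def bayesgrad_node_importance(model, node_features, adjacency, target_class, n_samples=30):
    """Average input gradients over stochastic (dropout-on) forward passes (sketch)."""
    model.train()  # keep dropout active so each pass samples from the approximate posterior
    accum = torch.zeros_like(node_features)
    for _ in range(n_samples):
        x = node_features.clone().requires_grad_(True)
        logits = model(x, adjacency)          # assumed signature: (nodes x features, adjacency)
        logits[target_class].backward()
        accum += x.grad
    # per-node importance: magnitude of the averaged gradient, summed over feature dimensions
    return (accum / n_samples).abs().sum(dim=-1)
```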

Guided Neural Language Generation for Abstractive Summarization using Abstract Meaning Representation

Title Guided Neural Language Generation for Abstractive Summarization using Abstract Meaning Representation
Authors Hardy, Andreas Vlachos
Abstract Recent work on abstractive summarization has made progress with neural encoder-decoder architectures. However, such models are often challenged due to their lack of explicit semantic modeling of the source document and its summary. In this paper, we extend previous work on abstractive summarization using Abstract Meaning Representation (AMR) with a neural language generation stage which we guide using the source document. We demonstrate that this guidance improves summarization results by 7.4 and 10.5 points in ROUGE-2 using gold standard AMR parses and parses obtained from an off-the-shelf parser respectively. We also find that the summarization performance using the latter is 2 ROUGE-2 points higher than that of a well-established neural encoder-decoder approach trained on a larger dataset. Code is available at \url{https://github.com/sheffieldnlp/AMR2Text-summ}
Tasks Abstractive Text Summarization, Text Generation
Published 2018-08-28
URL http://arxiv.org/abs/1808.09160v1
PDF http://arxiv.org/pdf/1808.09160v1.pdf
PWC https://paperswithcode.com/paper/guided-neural-language-generation-for
Repo https://github.com/sheffieldnlp/AMR2Text-summ
Framework pytorch
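
Since the improvements are reported in ROUGE-2 points, it may help to recall what that metric counts: bigram overlap between the generated summary and the reference. A minimal sketch of ROUGE-2 recall, the commonly reported variant:

```python
from collections import Counter

def rouge2_recall(candidate_tokens, reference_tokens):
    """Fraction of reference bigrams that also appear in the candidate summary."""
    def bigrams(tokens):
        return Counter(zip(tokens, tokens[1:]))
    cand, ref = bigrams(candidate_tokens), bigrams(reference_tokens)
    overlap = sum(min(count, ref[bg]) for bg, count in cand.items() if bg in ref)
    return overlap / max(sum(ref.values()), 1)
```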

Mining Novel Multivariate Relationships in Time Series Data Using Correlation Networks

Title Mining Novel Multivariate Relationships in Time Series Data Using Correlation Networks
Authors Saurabh Agrawal, Michael Steinbach, Daniel Boley, Snigdhansu Chatterjee, Gowtham Atluri, Anh The Dang, Stefan Liess, Vipin Kumar
Abstract In many domains, there is significant interest in capturing novel relationships between time series that represent activities recorded at different nodes of a highly complex system. In this paper, we introduce multipoles, a novel class of linear relationships between more than two time series. A multipole is a set of time series that have strong linear dependence among themselves, with the requirement that each time series makes a significant contribution to the linear dependence. We demonstrate that most interesting multipoles can be identified as cliques of negative correlations in a correlation network. Such cliques are typically rare in a real-world correlation network, which allows us to find almost all multipoles efficiently using a clique-enumeration approach. Using our proposed framework, we demonstrate the utility of multipoles in discovering new physical phenomena in two scientific domains: climate science and neuroscience. In particular, we discovered several multipole relationships that are reproducible in multiple other independent datasets and lead to novel domain insights.
Tasks Time Series
Published 2018-10-06
URL http://arxiv.org/abs/1810.02950v2
PDF http://arxiv.org/pdf/1810.02950v2.pdf
PWC https://paperswithcode.com/paper/mining-novel-multivariate-relationships-in
Repo https://github.com/15saurabh16/Multipoles
Framework none
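
The clique-enumeration idea is easy to sketch: threshold the correlation matrix to keep only strongly negative edges, enumerate cliques in the resulting graph, and score each candidate set by its linear dependence. The snippet below uses NumPy and networkx; the thresholds and the exact dependence and gain measures used in the paper are not reproduced here.

```python
import numpy as np
import networkx as nx

def find_negative_cliques(series, corr_threshold=-0.3, min_size=3):
    """Cliques in the graph whose edges are strongly negative correlations.

    series: array of shape (n_variables, n_timesteps).
    """
    corr = np.corrcoef(series)
    n = corr.shape[0]
    g = nx.Graph()
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] <= corr_threshold:
                g.add_edge(i, j)
    return [c for c in nx.find_cliques(g) if len(c) >= min_size]

def linear_dependence(series, members):
    """Strength of linear dependence: 1 minus the smallest eigenvalue of the correlation submatrix."""
    sub = np.corrcoef(series[members])
    return 1.0 - np.linalg.eigvalsh(sub)[0]
```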

Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning

Title Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning
Authors Baolin Peng, Xiujun Li, Jianfeng Gao, Jingjing Liu, Kam-Fai Wong, Shang-Yu Su
Abstract Training a task-completion dialogue agent via reinforcement learning (RL) is costly because it requires many interactions with real users. One common alternative is to use a user simulator. However, a user simulator usually lacks the language complexity of human interlocutors, and biases in its design may degrade the agent. To address these issues, we present Deep Dyna-Q, which to our knowledge is the first deep RL framework that integrates planning for task-completion dialogue policy learning. We incorporate into the dialogue agent a model of the environment, referred to as the world model, to mimic real user responses and generate simulated experience. During dialogue policy learning, the world model is constantly updated with real user experience to approach real user behavior, and in turn, the dialogue agent is optimized using both real experience and simulated experience. The effectiveness of our approach is demonstrated on a movie-ticket booking task in both simulated and human-in-the-loop settings.
Tasks Task-Completion Dialogue Policy Learning
Published 2018-01-18
URL http://arxiv.org/abs/1801.06176v3
PDF http://arxiv.org/pdf/1801.06176v3.pdf
PWC https://paperswithcode.com/paper/deep-dyna-q-integrating-planning-for-task
Repo https://github.com/bagequan/MS-BCS-DDQ
Framework none
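
The abstract describes a Dyna-style loop: learn directly from real dialogues, fit a world model to that experience, then generate simulated dialogues from the world model for additional policy updates. A minimal sketch of one such epoch is below; `run_dialogue` and the agent and world-model interfaces are hypothetical placeholders, not the repo's API.

```python
def deep_dyna_q_epoch(agent, world_model, real_user_env, real_buffer, sim_buffer, k_planning=5):
    """One epoch of the Deep Dyna-Q loop (sketch with assumed interfaces)."""
    # 1) direct reinforcement learning: collect a dialogue with the real user and learn from it
    real_buffer.extend(run_dialogue(agent, real_user_env))   # run_dialogue: hypothetical helper
    agent.update_policy(real_buffer.sample())

    # 2) world-model learning: fit the model to real experience so it mimics user responses
    world_model.fit(real_buffer.sample())

    # 3) planning: K simulated dialogues generated by the world model also train the policy
    for _ in range(k_planning):
        sim_buffer.extend(run_dialogue(agent, world_model))
        agent.update_policy(sim_buffer.sample())
```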

Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation

Title Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation
Authors Matteo Tomei, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Abstract The applicability of computer vision to real paintings and artworks has been rarely investigated, even though a vast heritage would greatly benefit from techniques which can understand and process data from the artistic domain. This is partially due to the small amount of annotated artistic data, which is not even comparable to that of natural images captured by cameras. In this paper, we propose a semantic-aware architecture which can translate artworks to photo-realistic visualizations, thus reducing the gap between visual features of artistic and realistic data. Our architecture can generate natural images by retrieving and learning details from real photos through a similarity matching strategy which leverages a weakly-supervised semantic understanding of the scene. Experimental results show that the proposed technique leads to increased realism and to a reduction in domain shift, which improves the performance of pre-trained architectures for classification, detection, and segmentation. Code is publicly available at: https://github.com/aimagelab/art2real.
Tasks Image-to-Image Translation
Published 2018-11-26
URL https://arxiv.org/abs/1811.10666v3
PDF https://arxiv.org/pdf/1811.10666v3.pdf
PWC https://paperswithcode.com/paper/art2real-unfolding-the-reality-of-artworks
Repo https://github.com/aimagelab/art2real
Framework pytorch

Introducing Neuromodulation in Deep Neural Networks to Learn Adaptive Behaviours

Title Introducing Neuromodulation in Deep Neural Networks to Learn Adaptive Behaviours
Authors Nicolas Vecoven, Damien Ernst, Antoine Wehenkel, Guillaume Drion
Abstract Animals excel at adapting their intentions, attention, and actions to the environment, making them remarkably efficient at interacting with a rich, unpredictable and ever-changing external world, a property that intelligent machines currently lack. Such an adaptation property relies heavily on cellular neuromodulation, the biological mechanism that dynamically controls intrinsic properties of neurons and their response to external stimuli in a context-dependent manner. In this paper, we take inspiration from cellular neuromodulation to construct a new deep neural network architecture that is specifically designed to learn adaptive behaviours. The network adaptation capabilities are tested on navigation benchmarks in a meta-reinforcement learning context and compared with state-of-the-art approaches. Results show that neuromodulation is capable of adapting an agent to different tasks and that neuromodulation-based approaches provide a promising way of improving adaptation of artificial systems.
Tasks
Published 2018-12-21
URL https://arxiv.org/abs/1812.09113v3
PDF https://arxiv.org/pdf/1812.09113v3.pdf
PWC https://paperswithcode.com/paper/introducing-neuromodulation-in-deep-neural
Repo https://github.com/nvecoven/nmd_net
Framework tf
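
One way to read "dynamically controls intrinsic properties of neurons" is a layer whose activation gain and threshold are produced by a separate, context-conditioned signal. The PyTorch sketch below illustrates that mechanism in its simplest form; the actual architecture in the paper and repo differs in detail.

```python
import torch
import torch.nn as nn

class NeuromodulatedLinear(nn.Module):
    """Linear layer whose activation slope and bias are set by a context-dependent signal (sketch)."""
    def __init__(self, in_dim, out_dim, context_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.scale = nn.Linear(context_dim, out_dim)  # per-neuron gain from the context
        self.shift = nn.Linear(context_dim, out_dim)  # per-neuron threshold from the context

    def forward(self, x, context):
        z = self.fc(x)
        # the context modulates how each neuron responds to its input
        return torch.relu(self.scale(context) * z + self.shift(context))
```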

Expert-augmented actor-critic for ViZDoom and Montezuma's Revenge

Title Expert-augmented actor-critic for ViZDoom and Montezuma's Revenge
Authors Michał Garmulewicz, Henryk Michalewski, Piotr Miłoś
Abstract We propose an expert-augmented actor-critic algorithm, which we evaluate on two environments with sparse rewards: Montezuma's Revenge and a demanding maze from the ViZDoom suite. In the case of Montezuma's Revenge, an agent trained with our method achieves very good results, consistently scoring above 27,000 points (in many experiments beating the first world). With an appropriate choice of hyperparameters, our algorithm surpasses the performance of the expert data. In a number of experiments, we have observed a previously unreported bug in Montezuma's Revenge which allowed the agent to score more than 800,000 points.
Tasks
Published 2018-09-10
URL http://arxiv.org/abs/1809.03447v1
PDF http://arxiv.org/pdf/1809.03447v1.pdf
PWC https://paperswithcode.com/paper/expert-augmented-actor-critic-for-vizdoom-and
Repo https://github.com/ghostFaceKillah/expert-visualisation
Framework tf
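
The "expert-augmented" part amounts to adding a supervised imitation term on expert transitions to the usual actor-critic objective. A hedged PyTorch sketch of such a combined loss follows; the coefficients and the exact form of the expert term are illustrative, not the paper's.

```python
import torch.nn.functional as F

def expert_augmented_loss(log_probs, advantages, values, returns,
                          expert_logits, expert_actions,
                          value_coef=0.5, expert_coef=0.1):
    """Actor-critic objective plus a supervised term on expert transitions (illustrative sketch)."""
    policy_loss = -(log_probs * advantages.detach()).mean()       # standard policy-gradient term
    value_loss = F.mse_loss(values, returns)                      # critic regression
    expert_loss = F.cross_entropy(expert_logits, expert_actions)  # imitate the expert's actions
    return policy_loss + value_coef * value_loss + expert_coef * expert_loss
```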

Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning

Title Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning
Authors Sanjit Bhat, David Lu, Albert Kwon, Srinivas Devadas
Abstract In recent years, there have been several works that use website fingerprinting techniques to enable a local adversary to determine which website a Tor user visits. While the current state-of-the-art attack, which uses deep learning, outperforms prior art with medium to large amounts of data, it attains marginal to no accuracy improvements when both use small amounts of training data. In this work, we propose Var-CNN, a website fingerprinting attack that leverages deep learning techniques along with novel insights specific to packet sequence classification. In open-world settings with large amounts of data, Var-CNN attains over 1% higher true positive rate (TPR) than state-of-the-art attacks while achieving 4x lower false positive rate (FPR). Var-CNN's improvements are especially notable in low-data scenarios, where it reduces the FPR of prior art by 3.12% while increasing the TPR by 13%. Overall, insights used to develop Var-CNN can be applied to future deep learning based attacks, and substantially reduce the amount of training data needed to perform a successful website fingerprinting attack. This shortens the time needed for data collection and lowers the likelihood of having data staleness issues.
Tasks
Published 2018-02-28
URL https://arxiv.org/abs/1802.10215v2
PDF https://arxiv.org/pdf/1802.10215v2.pdf
PWC https://paperswithcode.com/paper/var-cnn-and-dynaflow-improved-attacks-and
Repo https://github.com/sanjit-bhat/Var-CNN--DynaFlow
Framework tf

Systematic Weight Pruning of DNNs using Alternating Direction Method of Multipliers

Title Systematic Weight Pruning of DNNs using Alternating Direction Method of Multipliers
Authors Tianyun Zhang, Shaokai Ye, Yipeng Zhang, Yanzhi Wang, Makan Fardad
Abstract We present a systematic weight pruning framework of deep neural networks (DNNs) using the alternating direction method of multipliers (ADMM). We first formulate the weight pruning problem of DNNs as a constrained nonconvex optimization problem, and then adopt the ADMM framework for systematic weight pruning. We show that ADMM is highly suitable for weight pruning due to the computational efficiency it offers. We achieve a much higher compression ratio compared with prior work while maintaining the same test accuracy, together with a faster convergence rate. Our models are released at https://github.com/KaiqiZhang/admm-pruning
Tasks
Published 2018-02-15
URL http://arxiv.org/abs/1802.05747v2
PDF http://arxiv.org/pdf/1802.05747v2.pdf
PWC https://paperswithcode.com/paper/systematic-weight-pruning-of-dnns-using
Repo https://github.com/KaiqiZhang/admm-pruning
Framework tf
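
The constrained nonconvex problem is min f(W) subject to a cardinality constraint on W; ADMM splits it into a gradient-friendly W-update and a Z-update that is just a Euclidean projection onto the sparsity set (keep the k largest-magnitude weights). A NumPy sketch on a single flattened weight vector, with `loss_grad` supplied by the user, is below; learning rates and iteration counts are illustrative.

```python
import numpy as np

def project_sparsity(w, k):
    """Euclidean projection onto {at most k nonzeros}: keep the k largest magnitudes."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w).ravel())[-k:]
    out.flat[idx] = w.flat[idx]
    return out

def admm_prune(w, loss_grad, k, rho=1e-3, lr=1e-2, n_iters=100, inner_steps=20):
    """ADMM on: min f(W) s.t. W has at most k nonzeros (sketch).

    W is the trainable copy, Z the sparse copy, U the scaled dual variable;
    loss_grad(w) returns the gradient of the training loss at w (assumed callable).
    """
    z = project_sparsity(w, k)
    u = np.zeros_like(w)
    for _ in range(n_iters):
        for _ in range(inner_steps):        # approx. argmin_W f(W) + rho/2 ||W - Z + U||^2
            w = w - lr * (loss_grad(w) + rho * (w - z + u))
        z = project_sparsity(w + u, k)      # Z-update: projection onto the sparsity constraint
        u = u + w - z                       # dual update
    return project_sparsity(w, k)           # final hard pruning
```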

Meta-Reinforcement Learning of Structured Exploration Strategies

Title Meta-Reinforcement Learning of Structured Exploration Strategies
Authors Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine
Abstract Exploration is a fundamental challenge in reinforcement learning (RL). Many of the current exploration methods for deep RL use task-agnostic objectives, such as information gain or bonuses based on state visitation. However, many practical applications of RL involve learning more than a single task, and prior tasks can be used to inform how exploration should be performed in new tasks. In this work, we explore how prior tasks can inform an agent about how to explore effectively in new situations. We introduce a novel gradient-based fast adaptation algorithm – model agnostic exploration with structured noise (MAESN) – to learn exploration strategies from prior experience. The prior experience is used both to initialize a policy and to acquire a latent exploration space that can inject structured stochasticity into a policy, producing exploration strategies that are informed by prior knowledge and are more effective than random action-space noise. We show that MAESN is more effective at learning exploration strategies when compared to prior meta-RL methods, RL without learned exploration strategies, and task-agnostic exploration methods. We evaluate our method on a variety of simulated tasks: locomotion with a wheeled robot, locomotion with a quadrupedal walker, and object manipulation.
Tasks
Published 2018-02-20
URL http://arxiv.org/abs/1802.07245v1
PDF http://arxiv.org/pdf/1802.07245v1.pdf
PWC https://paperswithcode.com/paper/meta-reinforcement-learning-of-structured
Repo https://github.com/llan-ml/tesp
Framework tf
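
The latent exploration space can be sketched as per-task variational parameters over a latent variable that is sampled once per episode and fed to the policy alongside the observation, so the injected noise is structured and temporally coherent rather than independent per action. A simplified PyTorch sketch follows; the meta-training objective and the fast adaptation step are omitted.

```python
import torch
import torch.nn as nn

class LatentExplorationPolicy(nn.Module):
    """Policy conditioned on a per-task latent sampled once per episode (sketch of the MAESN idea)."""
    def __init__(self, obs_dim, act_dim, latent_dim, n_tasks, hidden=64):
        super().__init__()
        # per-task variational parameters of the latent exploration space
        self.mu = nn.Parameter(torch.zeros(n_tasks, latent_dim))
        self.log_sigma = nn.Parameter(torch.zeros(n_tasks, latent_dim))
        self.net = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def sample_latent(self, task_id):
        # one sample per episode injects structured, temporally coherent stochasticity
        return self.mu[task_id] + self.log_sigma[task_id].exp() * torch.randn_like(self.mu[task_id])

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))
```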

An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages

Title An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages
Authors Dmitry Ustalov, Denis Teslenko, Alexander Panchenko, Mikhail Chernoskutov, Chris Biemann, Simone Paolo Ponzetto
Abstract In this paper, we present Watasense, an unsupervised system for word sense disambiguation. Given a sentence, the system chooses the most relevant sense of each input word with respect to the semantic similarity between the given sentence and the synset constituting the sense of the target word. Watasense has two modes of operation. The sparse mode uses the traditional vector space model to estimate the most similar word sense corresponding to its context. The dense mode, instead, uses synset embeddings to cope with the sparsity problem. We describe the architecture of the present system and also conduct its evaluation on three different lexical semantic resources for Russian. We found that the dense mode substantially outperforms the sparse one on all datasets according to the adjusted Rand index.
Tasks Semantic Similarity, Semantic Textual Similarity, Word Sense Disambiguation
Published 2018-04-27
URL http://arxiv.org/abs/1804.10686v1
PDF http://arxiv.org/pdf/1804.10686v1.pdf
PWC https://paperswithcode.com/paper/an-unsupervised-word-sense-disambiguation
Repo https://github.com/nlpub/watasense
Framework none
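
The dense mode boils down to comparing an averaged context vector against averaged synset vectors and picking the most similar sense. A small NumPy sketch under that reading; the data structures are assumptions for illustration, not the Watasense API.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def disambiguate(context_words, senses, word_vectors):
    """Pick the sense whose synset words are most similar to the sentence context (dense-mode sketch).

    senses: dict mapping sense_id -> list of synset words; word_vectors: dict word -> vector.
    """
    ctx_vecs = [word_vectors[w] for w in context_words if w in word_vectors]
    if not ctx_vecs:
        return None
    ctx = np.mean(ctx_vecs, axis=0)
    scored = []
    for sense_id, synset_words in senses.items():
        vecs = [word_vectors[w] for w in synset_words if w in word_vectors]
        if vecs:
            scored.append((cosine(ctx, np.mean(vecs, axis=0)), sense_id))
    return max(scored, key=lambda t: t[0])[1] if scored else None
```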

Visualizing Convolutional Networks for MRI-based Diagnosis of Alzheimer’s Disease

Title Visualizing Convolutional Networks for MRI-based Diagnosis of Alzheimer’s Disease
Authors Johannes Rieke, Fabian Eitel, Martin Weygandt, John-Dylan Haynes, Kerstin Ritter
Abstract Visualizing and interpreting convolutional neural networks (CNNs) is an important task to increase trust in automatic medical decision making systems. In this study, we train a 3D CNN to detect Alzheimer’s disease based on structural MRI scans of the brain. Then, we apply four different gradient-based and occlusion-based visualization methods that explain the network’s classification decisions by highlighting relevant areas in the input image. We compare the methods qualitatively and quantitatively. We find that all four methods focus on brain regions known to be involved in Alzheimer’s disease, such as inferior and middle temporal gyrus. While the occlusion-based methods focus more on specific regions, the gradient-based methods pick up distributed relevance patterns. Additionally, we find that the distribution of relevance varies across patients, with some having a stronger focus on the temporal lobe, whereas for others more cortical areas are relevant. In summary, we show that applying different visualization methods is important to understand the decisions of a CNN, a step that is crucial to increase clinical impact and trust in computer-based decision support systems.
Tasks Decision Making
Published 2018-08-08
URL http://arxiv.org/abs/1808.02874v1
PDF http://arxiv.org/pdf/1808.02874v1.pdf
PWC https://paperswithcode.com/paper/visualizing-convolutional-networks-for-mri
Repo https://github.com/jrieke/cnn-interpretability
Framework pytorch
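
Of the four methods compared, the occlusion-based ones are the easiest to sketch: zero out a cube of the scan, measure how much the target-class probability drops, and accumulate that drop as relevance. A PyTorch sketch for a single-channel 3D volume; the patch size, stride, and normalization are illustrative rather than the paper's settings.

```python
import torch

def occlusion_map(model, volume, target_class, patch=10, stride=10):
    """Relevance = drop in target probability when a cube of the scan is zeroed out (sketch).

    volume: tensor of shape (1, D, H, W); model takes a batch of shape (N, 1, D, H, W).
    """
    model.eval()
    with torch.no_grad():
        base = torch.softmax(model(volume[None]), dim=1)[0, target_class]
        heat = torch.zeros_like(volume[0])
        D, H, W = volume.shape[1:]
        for z in range(0, D - patch + 1, stride):
            for y in range(0, H - patch + 1, stride):
                for x in range(0, W - patch + 1, stride):
                    occluded = volume.clone()
                    occluded[:, z:z+patch, y:y+patch, x:x+patch] = 0
                    p = torch.softmax(model(occluded[None]), dim=1)[0, target_class]
                    heat[z:z+patch, y:y+patch, x:x+patch] += (base - p)
    return heat
```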