Paper Group AWR 347
On Extensions of CLEVER: A Neural Network Robustness Evaluation Algorithm
Title | On Extensions of CLEVER: A Neural Network Robustness Evaluation Algorithm |
Authors | Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Aurelie Lozano, Cho-Jui Hsieh, Luca Daniel |
Abstract | CLEVER (Cross-Lipschitz Extreme Value for nEtwork Robustness) is an Extreme Value Theory (EVT) based robustness score for large-scale deep neural networks (DNNs). In this paper, we propose two extensions of this robustness score. First, we provide a new formal robustness guarantee for classifier functions that are twice differentiable. We apply extreme value theory to the new formal robustness guarantee, and the estimated robustness is called the second-order CLEVER score. Second, we discuss how to handle gradient masking, a common defensive technique, using CLEVER with Backward Pass Differentiable Approximation (BPDA). With BPDA applied, CLEVER can evaluate the intrinsic robustness of neural networks of a broader class – networks with non-differentiable input transformations. We demonstrate the effectiveness of CLEVER with BPDA in experiments on a 121-layer DenseNet model trained on the ImageNet dataset. |
Tasks | |
Published | 2018-10-19 |
URL | http://arxiv.org/abs/1810.08640v1 |
PDF | http://arxiv.org/pdf/1810.08640v1.pdf |
PWC | https://paperswithcode.com/paper/on-extensions-of-clever-a-neural-network |
Repo | https://github.com/huanzhang12/CLEVER |
Framework | tf |
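The idea behind a Lipschitz-based robustness score can be illustrated with a much simpler estimator than the one in the paper. The sketch below is an assumption-laden simplification: it takes the plain maximum of sampled gradient norms inside an L2 ball instead of fitting an extreme value distribution, so it yields only an estimate and no guarantee. The function name and the `margin_fn` / `grad_fn` arguments are hypothetical and do not come from the CLEVER repository.

```python
import math
import random


def sampled_lipschitz_bound(margin_fn, grad_fn, x0, radius, n_samples=200, seed=0):
    """Estimate a robustness radius as margin(x0) / (max sampled gradient norm).

    margin_fn(x): classification margin (score of true class minus another class).
    grad_fn(x):   gradient of that margin with respect to x, as a list of floats.
    Simplification (assumption): uses the raw sample maximum, not an EVT fit.
    """
    rng = random.Random(seed)
    dim = len(x0)
    max_grad_norm = 0.0
    for _ in range(n_samples):
        # Draw a point uniformly from the L2 ball of the given radius around x0.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        scale = radius * rng.random() ** (1.0 / dim) / norm
        point = [xi + scale * di for xi, di in zip(x0, direction)]
        grad = grad_fn(point)
        max_grad_norm = max(max_grad_norm, math.sqrt(sum(g * g for g in grad)))
    if max_grad_norm == 0.0:
        return radius  # the margin is locally flat; cap at the sampling radius
    # The score is capped at the radius, since nothing outside the ball was sampled.
    return min(margin_fn(x0) / max_grad_norm, radius)
```

For a linear margin the gradient is constant, so the estimate coincides with the exact distance to the decision boundary; for a real network the gradients would come from automatic differentiation.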
Discrete Autoencoders for Sequence Models
Title | Discrete Autoencoders for Sequence Models |
Authors | Łukasz Kaiser, Samy Bengio |
Abstract | Recurrent models for sequences have been recently successful at many tasks, especially for language modeling and machine translation. Nevertheless, it remains challenging to extract good representations from these models. For instance, even though language has a clear hierarchical structure going from characters through words to sentences, it is not apparent in current language models. We propose to improve the representation in sequence models by augmenting current approaches with an autoencoder that is forced to compress the sequence through an intermediate discrete latent space. In order to propagate gradients through this discrete representation we introduce an improved semantic hashing technique. We show that this technique performs well on a newly proposed quantitative efficiency measure. We also analyze latent codes produced by the model showing how they correspond to words and phrases. Finally, we present an application of the autoencoder-augmented model to generating diverse translations. |
Tasks | Language Modelling, Machine Translation |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09797v1 |
PDF | http://arxiv.org/pdf/1801.09797v1.pdf |
PWC | https://paperswithcode.com/paper/discrete-autoencoders-for-sequence-models |
Repo | https://github.com/tensorflow/tensor2tensor |
Framework | tf |
Measuring the quality of Synthetic data for use in competitions
Title | Measuring the quality of Synthetic data for use in competitions |
Authors | James Jordon, Jinsung Yoon, Mihaela van der Schaar |
Abstract | Machine learning has the potential to assist many communities in using the large datasets that are becoming more and more available. Unfortunately, much of that potential is not being realized because it would require sharing data in a way that compromises privacy. In order to overcome this hurdle, several methods have been proposed that generate synthetic data while preserving the privacy of the real data. In this paper we consider a key characteristic that synthetic data should have in order to be useful for machine learning researchers - the relative performance of two algorithms (trained and tested) on the synthetic dataset should be the same as their relative performance (when trained and tested) on the original dataset. |
Tasks | |
Published | 2018-06-29 |
URL | http://arxiv.org/abs/1806.11345v1 |
PDF | http://arxiv.org/pdf/1806.11345v1.pdf |
PWC | https://paperswithcode.com/paper/measuring-the-quality-of-synthetic-data-for |
Repo | https://github.com/jsyoon0823/SRA_TSTR |
Framework | none |
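The property described in the abstract, that two algorithms should compare the same way on synthetic data as on real data, can be checked with a simple pairwise agreement score. The following is a minimal sketch under the assumption that per-algorithm scores on both datasets are already available; the function name is hypothetical and the paper's own metric may be defined differently.

```python
from itertools import combinations


def pairwise_ranking_agreement(real_scores, synth_scores):
    """Fraction of algorithm pairs that are ordered the same way on both datasets.

    real_scores[i] and synth_scores[i] are the scores of algorithm i when
    trained and tested on the real and on the synthetic dataset respectively.
    """
    if len(real_scores) != len(synth_scores):
        raise ValueError("both score lists must cover the same algorithms")
    pairs = list(combinations(range(len(real_scores)), 2))
    if not pairs:
        raise ValueError("at least two algorithms are needed")
    agree = 0
    for i, j in pairs:
        real_diff = real_scores[i] - real_scores[j]
        synth_diff = synth_scores[i] - synth_scores[j]
        # Same strict ordering, or a tie on both datasets, counts as agreement.
        if real_diff * synth_diff > 0 or (real_diff == 0 and synth_diff == 0):
            agree += 1
    return agree / len(pairs)
```

A value of 1.0 means a competition run on the synthetic data would rank every pair of algorithms as the real data would.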
BayesGrad: Explaining Predictions of Graph Convolutional Networks
Title | BayesGrad: Explaining Predictions of Graph Convolutional Networks |
Authors | Hirotaka Akita, Kosuke Nakago, Tomoki Komatsu, Yohei Sugawara, Shin-ichi Maeda, Yukino Baba, Hisashi Kashima |
Abstract | Recent advances in graph convolutional networks have significantly improved the performance of chemical predictions, raising a new research question: “how do we explain the predictions of graph convolutional networks?” A possible approach to answer this question is to visualize evidence substructures responsible for the predictions. For chemical property prediction tasks, the sample size of the training data is often small and/or a label imbalance problem occurs, where a few samples belong to a single class and the majority of samples belong to the other classes. This can lead to uncertainty related to the learned parameters of the machine learning model. To address this uncertainty, we propose BayesGrad, utilizing the Bayesian predictive distribution, to define the importance of each node in an input graph, which is computed efficiently using the dropout technique. We demonstrate that BayesGrad successfully visualizes the substructures responsible for the label prediction in the artificial experiment, even when the sample size is small. Furthermore, we use a real dataset to evaluate the effectiveness of the visualization. The basic idea of BayesGrad is not limited to graph-structured data and can be applied to other data types. |
Tasks | |
Published | 2018-07-04 |
URL | http://arxiv.org/abs/1807.01985v1 |
PDF | http://arxiv.org/pdf/1807.01985v1.pdf |
PWC | https://paperswithcode.com/paper/bayesgrad-explaining-predictions-of-graph |
Repo | https://github.com/pfnet-research/chainer-chemistry |
Framework | none |
Guided Neural Language Generation for Abstractive Summarization using Abstract Meaning Representation
Title | Guided Neural Language Generation for Abstractive Summarization using Abstract Meaning Representation |
Authors | Hardy, Andreas Vlachos |
Abstract | Recent work on abstractive summarization has made progress with neural encoder-decoder architectures. However, such models are often challenged due to their lack of explicit semantic modeling of the source document and its summary. In this paper, we extend previous work on abstractive summarization using Abstract Meaning Representation (AMR) with a neural language generation stage which we guide using the source document. We demonstrate that this guidance improves summarization results by 7.4 and 10.5 points in ROUGE-2 using gold standard AMR parses and parses obtained from an off-the-shelf parser respectively. We also find that the summarization performance using the latter is 2 ROUGE-2 points higher than that of a well-established neural encoder-decoder approach trained on a larger dataset. Code is available at https://github.com/sheffieldnlp/AMR2Text-summ |
Tasks | Abstractive Text Summarization, Text Generation |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09160v1 |
PDF | http://arxiv.org/pdf/1808.09160v1.pdf |
PWC | https://paperswithcode.com/paper/guided-neural-language-generation-for |
Repo | https://github.com/sheffieldnlp/AMR2Text-summ |
Framework | pytorch |
Mining Novel Multivariate Relationships in Time Series Data Using Correlation Networks
Title | Mining Novel Multivariate Relationships in Time Series Data Using Correlation Networks |
Authors | Saurabh Agrawal, Michael Steinbach, Daniel Boley, Snigdhansu Chatterjee, Gowtham Atluri, Anh The Dang, Stefan Liess, Vipin Kumar |
Abstract | In many domains, there is significant interest in capturing novel relationships between time series that represent activities recorded at different nodes of a highly complex system. In this paper, we introduce multipoles, a novel class of linear relationships between more than two time series. A multipole is a set of time series that have strong linear dependence among themselves, with the requirement that each time series makes a significant contribution to the linear dependence. We demonstrate that most interesting multipoles can be identified as cliques of negative correlations in a correlation network. Such cliques are typically rare in a real-world correlation network, which allows us to find almost all multipoles efficiently using a clique-enumeration approach. Using our proposed framework, we demonstrate the utility of multipoles in discovering new physical phenomena in two scientific domains: climate science and neuroscience. In particular, we discovered several multipole relationships that are reproducible in multiple other independent datasets and lead to novel domain insights. |
Tasks | Time Series |
Published | 2018-10-06 |
URL | http://arxiv.org/abs/1810.02950v2 |
PDF | http://arxiv.org/pdf/1810.02950v2.pdf |
PWC | https://paperswithcode.com/paper/mining-novel-multivariate-relationships-in |
Repo | https://github.com/15saurabh16/Multipoles |
Framework | none |
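The abstract's observation that candidate multipoles show up as cliques of negative correlations can be illustrated by brute force on a handful of series. This is a sketch only: the function names, the threshold, and the exhaustive enumeration are assumptions for illustration, and it does not perform the paper's check that every series contributes significantly to the linear dependence.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def negative_cliques(series, size=3, threshold=-0.3):
    """Return all groups of `size` series whose pairwise correlations are <= threshold.

    `series` maps a name to a list of values. Exhaustive search, so only
    suitable for small inputs.
    """
    names = sorted(series)
    corr = {(a, b): pearson(series[a], series[b]) for a, b in combinations(names, 2)}
    return [
        group
        for group in combinations(names, size)
        if all(corr[pair] <= threshold for pair in combinations(group, 2))
    ]
```

Each returned group is only a candidate; whether it forms a multipole in the paper's sense would still have to be verified.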
Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning
Title | Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning |
Authors | Baolin Peng, Xiujun Li, Jianfeng Gao, Jingjing Liu, Kam-Fai Wong, Shang-Yu Su |
Abstract | Training a task-completion dialogue agent via reinforcement learning (RL) is costly because it requires many interactions with real users. One common alternative is to use a user simulator. However, a user simulator usually lacks the language complexity of human interlocutors and the biases in its design may tend to degrade the agent. To address these issues, we present Deep Dyna-Q, which to our knowledge is the first deep RL framework that integrates planning for task-completion dialogue policy learning. We incorporate into the dialogue agent a model of the environment, referred to as the world model, to mimic real user response and generate simulated experience. During dialogue policy learning, the world model is constantly updated with real user experience to approach real user behavior, and in turn, the dialogue agent is optimized using both real experience and simulated experience. The effectiveness of our approach is demonstrated on a movie-ticket booking task in both simulated and human-in-the-loop settings. |
Tasks | Task-Completion Dialogue Policy Learning |
Published | 2018-01-18 |
URL | http://arxiv.org/abs/1801.06176v3 |
PDF | http://arxiv.org/pdf/1801.06176v3.pdf |
PWC | https://paperswithcode.com/paper/deep-dyna-q-integrating-planning-for-task |
Repo | https://github.com/bagequan/MS-BCS-DDQ |
Framework | none |
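The interplay of real experience, a learned world model, and planning described in the abstract follows the classic Dyna-Q pattern. The sketch below shows that pattern in its tabular form on a toy chain environment. It is an assumption-heavy simplification: the paper uses neural networks for both the policy and the world model and a dialogue environment, whereas here the model is a lookup table and all names are hypothetical.

```python
import random


def dyna_q(n_states=4, episodes=200, planning_steps=10,
           alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Dyna-Q on a chain: start in state 0, reward 1 on reaching the last state."""
    rng = random.Random(seed)
    goal = n_states - 1
    actions = (0, 1)  # 0 = move left, 1 = move right
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    world_model = {}  # (state, action) -> (reward, next_state), learned from real steps

    def env_step(state, action):
        nxt = min(goal, state + 1) if action == 1 else max(0, state - 1)
        return (1.0 if nxt == goal else 0.0), nxt

    def update(s, a, r, s2):
        # One-step Q-learning update; the goal state is terminal.
        target = r if s2 == goal else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:  # greedy action, ties broken at random
                action = max(actions, key=lambda b: (q[(state, b)], rng.random()))
            reward, nxt = env_step(state, action)
            update(state, action, reward, nxt)            # direct RL on real experience
            world_model[(state, action)] = (reward, nxt)  # world model learning
            for _ in range(planning_steps):               # planning on simulated experience
                ps, pa = rng.choice(list(world_model))
                pr, pn = world_model[(ps, pa)]
                update(ps, pa, pr, pn)
            state = nxt
            if state == goal:
                break
    return q
```

Raising `planning_steps` trades cheap simulated updates for fewer real interactions, which is the motivation the abstract gives for adding a world model.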
Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation
Title | Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation |
Authors | Matteo Tomei, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara |
Abstract | The applicability of computer vision to real paintings and artworks has been rarely investigated, even though a vast heritage would greatly benefit from techniques which can understand and process data from the artistic domain. This is partially due to the small amount of annotated artistic data, which is not even comparable to that of natural images captured by cameras. In this paper, we propose a semantic-aware architecture which can translate artworks to photo-realistic visualizations, thus reducing the gap between visual features of artistic and realistic data. Our architecture can generate natural images by retrieving and learning details from real photos through a similarity matching strategy which leverages a weakly-supervised semantic understanding of the scene. Experimental results show that the proposed technique leads to increased realism and to a reduction in domain shift, which improves the performance of pre-trained architectures for classification, detection, and segmentation. Code is publicly available at: https://github.com/aimagelab/art2real. |
Tasks | Image-to-Image Translation |
Published | 2018-11-26 |
URL | https://arxiv.org/abs/1811.10666v3 |
PDF | https://arxiv.org/pdf/1811.10666v3.pdf |
PWC | https://paperswithcode.com/paper/art2real-unfolding-the-reality-of-artworks |
Repo | https://github.com/aimagelab/art2real |
Framework | pytorch |
Introducing Neuromodulation in Deep Neural Networks to Learn Adaptive Behaviours
Title | Introducing Neuromodulation in Deep Neural Networks to Learn Adaptive Behaviours |
Authors | Nicolas Vecoven, Damien Ernst, Antoine Wehenkel, Guillaume Drion |
Abstract | Animals excel at adapting their intentions, attention, and actions to the environment, making them remarkably efficient at interacting with a rich, unpredictable and ever-changing external world, a property that intelligent machines currently lack. Such an adaptation property relies heavily on cellular neuromodulation, the biological mechanism that dynamically controls intrinsic properties of neurons and their response to external stimuli in a context-dependent manner. In this paper, we take inspiration from cellular neuromodulation to construct a new deep neural network architecture that is specifically designed to learn adaptive behaviours. The network adaptation capabilities are tested on navigation benchmarks in a meta-reinforcement learning context and compared with state-of-the-art approaches. Results show that neuromodulation is capable of adapting an agent to different tasks and that neuromodulation-based approaches provide a promising way of improving adaptation of artificial systems. |
Tasks | |
Published | 2018-12-21 |
URL | https://arxiv.org/abs/1812.09113v3 |
PDF | https://arxiv.org/pdf/1812.09113v3.pdf |
PWC | https://paperswithcode.com/paper/introducing-neuromodulation-in-deep-neural |
Repo | https://github.com/nvecoven/nmd_net |
Framework | tf |
Expert-augmented actor-critic for ViZDoom and Montezumas Revenge
Title | Expert-augmented actor-critic for ViZDoom and Montezumas Revenge |
Authors | Michał Garmulewicz, Henryk Michalewski, Piotr Miłoś |
Abstract | We propose an expert-augmented actor-critic algorithm, which we evaluate on two environments with sparse rewards: Montezumas Revenge and a demanding maze from the ViZDoom suite. In the case of Montezumas Revenge, an agent trained with our method achieves very good results consistently scoring above 27,000 points (in many experiments beating the first world). With an appropriate choice of hyperparameters, our algorithm surpasses the performance of the expert data. In a number of experiments, we have observed an unreported bug in Montezumas Revenge which allowed the agent to score more than 800,000 points. |
Tasks | |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03447v1 |
PDF | http://arxiv.org/pdf/1809.03447v1.pdf |
PWC | https://paperswithcode.com/paper/expert-augmented-actor-critic-for-vizdoom-and |
Repo | https://github.com/ghostFaceKillah/expert-visualisation |
Framework | tf |
Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning
Title | Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning |
Authors | Sanjit Bhat, David Lu, Albert Kwon, Srinivas Devadas |
Abstract | In recent years, there have been several works that use website fingerprinting techniques to enable a local adversary to determine which website a Tor user visits. While the current state-of-the-art attack, which uses deep learning, outperforms prior art with medium to large amounts of data, it attains marginal to no accuracy improvements when both use small amounts of training data. In this work, we propose Var-CNN, a website fingerprinting attack that leverages deep learning techniques along with novel insights specific to packet sequence classification. In open-world settings with large amounts of data, Var-CNN attains over $1\%$ higher true positive rate (TPR) than state-of-the-art attacks while achieving $4\times$ lower false positive rate (FPR). Var-CNN’s improvements are especially notable in low-data scenarios, where it reduces the FPR of prior art by $3.12\%$ while increasing the TPR by $13\%$. Overall, insights used to develop Var-CNN can be applied to future deep learning based attacks, and substantially reduce the amount of training data needed to perform a successful website fingerprinting attack. This shortens the time needed for data collection and lowers the likelihood of having data staleness issues. |
Tasks | |
Published | 2018-02-28 |
URL | https://arxiv.org/abs/1802.10215v2 |
PDF | https://arxiv.org/pdf/1802.10215v2.pdf |
PWC | https://paperswithcode.com/paper/var-cnn-and-dynaflow-improved-attacks-and |
Repo | https://github.com/sanjit-bhat/Var-CNN--DynaFlow |
Framework | tf |
Systematic Weight Pruning of DNNs using Alternating Direction Method of Multipliers
Title | Systematic Weight Pruning of DNNs using Alternating Direction Method of Multipliers |
Authors | Tianyun Zhang, Shaokai Ye, Yipeng Zhang, Yanzhi Wang, Makan Fardad |
Abstract | We present a systematic weight pruning framework of deep neural networks (DNNs) using the alternating direction method of multipliers (ADMM). We first formulate the weight pruning problem of DNNs as a constrained nonconvex optimization problem, and then adopt the ADMM framework for systematic weight pruning. We show that ADMM is highly suitable for weight pruning due to the computational efficiency it offers. We achieve a much higher compression ratio compared with prior work while maintaining the same test accuracy, together with a faster convergence rate. Our models are released at https://github.com/KaiqiZhang/admm-pruning |
Tasks | |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05747v2 |
PDF | http://arxiv.org/pdf/1802.05747v2.pdf |
PWC | https://paperswithcode.com/paper/systematic-weight-pruning-of-dnns-using |
Repo | https://github.com/KaiqiZhang/admm-pruning |
Framework | tf |
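The ADMM formulation in the abstract alternates between minimizing the loss with a proximity term, projecting onto the sparsity constraint, and updating a dual variable. The sketch below shows those three steps on a toy problem. As an assumption for illustration, the loss is the quadratic 0.5 * ||w - target||^2, which gives the first step a closed form; a real DNN would need gradient-based training there. All function names are hypothetical.

```python
def project_top_k(values, k):
    """Keep the k largest-magnitude entries and zero the rest."""
    keep = set(sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def admm_prune(target, k, rho=1.0, iterations=100):
    """ADMM for: minimize 0.5 * ||w - target||^2  subject to  w having at most k nonzeros.

    w is the weight vector, z its sparse copy, u the scaled dual variable.
    """
    n = len(target)
    z = [0.0] * n
    u = [0.0] * n
    for _ in range(iterations):
        # w-step: minimize loss + (rho / 2) * ||w - z + u||^2 (closed form for this toy loss).
        w = [(t + rho * (zi - ui)) / (1.0 + rho) for t, zi, ui in zip(target, z, u)]
        # z-step: Euclidean projection of w + u onto the set of k-sparse vectors.
        z = project_top_k([wi + ui for wi, ui in zip(w, u)], k)
        # u-step: accumulate the remaining constraint violation w - z.
        u = [ui + wi - zi for ui, wi, zi in zip(u, w, z)]
    return z
```

On the toy input `[3.0, -0.1, 0.2, -2.0]` with `k=2`, the iteration keeps the two large entries and drives the others to zero.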
Meta-Reinforcement Learning of Structured Exploration Strategies
Title | Meta-Reinforcement Learning of Structured Exploration Strategies |
Authors | Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine |
Abstract | Exploration is a fundamental challenge in reinforcement learning (RL). Many of the current exploration methods for deep RL use task-agnostic objectives, such as information gain or bonuses based on state visitation. However, many practical applications of RL involve learning more than a single task, and prior tasks can be used to inform how exploration should be performed in new tasks. In this work, we explore how prior tasks can inform an agent about how to explore effectively in new situations. We introduce a novel gradient-based fast adaptation algorithm – model agnostic exploration with structured noise (MAESN) – to learn exploration strategies from prior experience. The prior experience is used both to initialize a policy and to acquire a latent exploration space that can inject structured stochasticity into a policy, producing exploration strategies that are informed by prior knowledge and are more effective than random action-space noise. We show that MAESN is more effective at learning exploration strategies when compared to prior meta-RL methods, RL without learned exploration strategies, and task-agnostic exploration methods. We evaluate our method on a variety of simulated tasks: locomotion with a wheeled robot, locomotion with a quadrupedal walker, and object manipulation. |
Tasks | |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.07245v1 |
PDF | http://arxiv.org/pdf/1802.07245v1.pdf |
PWC | https://paperswithcode.com/paper/meta-reinforcement-learning-of-structured |
Repo | https://github.com/llan-ml/tesp |
Framework | tf |
An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages
Title | An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages |
Authors | Dmitry Ustalov, Denis Teslenko, Alexander Panchenko, Mikhail Chernoskutov, Chris Biemann, Simone Paolo Ponzetto |
Abstract | In this paper, we present Watasense, an unsupervised system for word sense disambiguation. Given a sentence, the system chooses the most relevant sense of each input word with respect to the semantic similarity between the given sentence and the synset constituting the sense of the target word. Watasense has two modes of operation. The sparse mode uses the traditional vector space model to estimate the most similar word sense corresponding to its context. The dense mode, instead, uses synset embeddings to cope with the sparsity problem. We describe the architecture of the present system and also conduct its evaluation on three different lexical semantic resources for Russian. We found that the dense mode substantially outperforms the sparse one on all datasets according to the adjusted Rand index. |
Tasks | Semantic Similarity, Semantic Textual Similarity, Word Sense Disambiguation |
Published | 2018-04-27 |
URL | http://arxiv.org/abs/1804.10686v1 |
PDF | http://arxiv.org/pdf/1804.10686v1.pdf |
PWC | https://paperswithcode.com/paper/an-unsupervised-word-sense-disambiguation |
Repo | https://github.com/nlpub/watasense |
Framework | none |
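The sparse mode described in the abstract compares a sentence with each candidate synset in a vector space model. A minimal sketch of that comparison, assuming plain bag-of-words counts and cosine similarity, is shown here. The function names and the English toy senses are hypothetical; the actual system is evaluated on Russian resources and may weight terms differently.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(sentence_tokens, senses):
    """Pick the sense whose synset words are most similar to the sentence.

    `senses` maps a sense identifier to the list of words in its synset.
    """
    context = Counter(sentence_tokens)
    return max(senses, key=lambda sense_id: cosine(context, Counter(senses[sense_id])))
```

Because most sentence and synset vectors share few words, many similarities are zero; this is the sparsity problem that the dense mode addresses with synset embeddings.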
Visualizing Convolutional Networks for MRI-based Diagnosis of Alzheimer’s Disease
Title | Visualizing Convolutional Networks for MRI-based Diagnosis of Alzheimer’s Disease |
Authors | Johannes Rieke, Fabian Eitel, Martin Weygandt, John-Dylan Haynes, Kerstin Ritter |
Abstract | Visualizing and interpreting convolutional neural networks (CNNs) is an important task to increase trust in automatic medical decision making systems. In this study, we train a 3D CNN to detect Alzheimer’s disease based on structural MRI scans of the brain. Then, we apply four different gradient-based and occlusion-based visualization methods that explain the network’s classification decisions by highlighting relevant areas in the input image. We compare the methods qualitatively and quantitatively. We find that all four methods focus on brain regions known to be involved in Alzheimer’s disease, such as inferior and middle temporal gyrus. While the occlusion-based methods focus more on specific regions, the gradient-based methods pick up distributed relevance patterns. Additionally, we find that the distribution of relevance varies across patients, with some having a stronger focus on the temporal lobe, whereas for others more cortical areas are relevant. In summary, we show that applying different visualization methods is important to understand the decisions of a CNN, a step that is crucial to increase clinical impact and trust in computer-based decision support systems. |
Tasks | Decision Making |
Published | 2018-08-08 |
URL | http://arxiv.org/abs/1808.02874v1 |
PDF | http://arxiv.org/pdf/1808.02874v1.pdf |
PWC | https://paperswithcode.com/paper/visualizing-convolutional-networks-for-mri |
Repo | https://github.com/jrieke/cnn-interpretability |
Framework | pytorch |
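Occlusion-based visualization, one of the method families compared in the abstract, can be sketched in a few lines: mask one patch at a time and record how much the model's score drops. The version below works on a 2D image given as nested lists and takes an arbitrary `score_fn`. Both are assumptions for illustration; the paper applies the idea to 3D MRI volumes and a trained CNN, and the function name is hypothetical.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Return a heatmap of score drops caused by occluding each patch.

    image:    2D list of floats.
    score_fn: callable mapping an image to the model's score for the class of interest.
    """
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy so the input stays untouched
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = baseline - score_fn(occluded)  # large drop = relevant region
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat
```

Because each patch is scored in isolation, the resulting map tends to highlight specific regions, which is consistent with the abstract's contrast between occlusion-based and gradient-based methods.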