Paper Group ANR 105
Recovering Localized Adversarial Attacks
Title | Recovering Localized Adversarial Attacks |
Authors | Jan Philip Göpfert, Heiko Wersing, Barbara Hammer |
Abstract | Deep convolutional neural networks have achieved great successes over recent years, particularly in the domain of computer vision. They are fast, convenient, and – thanks to mature frameworks – relatively easy to implement and deploy. However, their reasoning is hidden inside a black box, in spite of a number of proposed approaches that try to provide human-understandable explanations for the predictions of neural networks. It is still a matter of debate which of these explainers are best suited for which situations, and how to quantitatively evaluate and compare them. In this contribution, we focus on the capabilities of explainers for convolutional deep neural networks in an extreme situation: a setting in which humans and networks fundamentally disagree. Deep neural networks are susceptible to adversarial attacks that deliberately modify input samples to mislead a neural network’s classification, without affecting how a human observer interprets the input. Our goal with this contribution is to evaluate explainers by investigating whether they can identify adversarially attacked regions of an image. In particular, we quantitatively and qualitatively investigate the capability of three popular explainers of classifications – classic salience, guided backpropagation, and LIME – with respect to their ability to identify regions of attack as the explanatory regions for the (incorrect) prediction in representative examples from image classification. We find that LIME outperforms the other explainers. |
Tasks | Image Classification |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09239v1 |
PDF | https://arxiv.org/pdf/1910.09239v1.pdf |
PWC | https://paperswithcode.com/paper/recovering-localized-adversarial-attacks |
Repo | |
Framework | |
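As a hedged illustration of the setup described in the abstract above: a perturbation is confined to a small image region, a vanilla-gradient saliency map is computed for the attacked image, and the fraction of saliency mass falling inside the attacked region is measured. This is a sketch only; the paper also evaluates guided backpropagation and LIME on trained classifiers, whereas a randomly initialised CNN stands in here.

```python
# Hedged sketch (not the authors' code): craft a perturbation confined to a
# small region, then check how much of a vanilla-gradient saliency map falls
# inside that region. A random CNN stands in for a trained classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
model.eval()

x = torch.rand(1, 3, 32, 32)                 # toy "image"
region = torch.zeros_like(x)
region[..., 8:16, 8:16] = 1.0                # localized attack mask (8x8 patch)
target = torch.tensor([3])                   # class the attack should induce

# Localized PGD-style attack: gradient steps restricted to the masked region.
delta = torch.zeros_like(x, requires_grad=True)
for _ in range(50):
    loss = F.cross_entropy(model(x + delta * region), target)
    loss.backward()
    with torch.no_grad():
        delta -= 0.05 * delta.grad.sign()
        delta.clamp_(-0.5, 0.5)
    delta.grad.zero_()
x_adv = (x + delta.detach() * region).clamp(0, 1)

# Vanilla-gradient saliency for the predicted class of the attacked image.
x_adv.requires_grad_(True)
logits = model(x_adv)
logits[0, logits.argmax()].backward()
saliency = x_adv.grad.abs().sum(dim=1, keepdim=True)   # aggregate over channels

# Fraction of saliency mass inside the attacked region: the quantity of interest.
inside = (saliency * region[:, :1]).sum() / saliency.sum()
print(f"saliency mass inside attacked region: {inside.item():.2%}")
```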
A 20-Year Community Roadmap for Artificial Intelligence Research in the US
Title | A 20-Year Community Roadmap for Artificial Intelligence Research in the US |
Authors | Yolanda Gil, Bart Selman |
Abstract | Decades of research in artificial intelligence (AI) have produced formidable technologies that are providing immense benefit to industry, government, and society. AI systems can now translate across multiple languages, identify objects in images and video, streamline manufacturing processes, and control cars. The deployment of AI systems has not only created a trillion-dollar industry that is projected to quadruple in three years, but has also exposed the need to make AI systems fair, explainable, trustworthy, and secure. Future AI systems will rightfully be expected to reason effectively about the world in which they (and people) operate, handling complex tasks and responsibilities effectively and ethically, engaging in meaningful communication, and improving their awareness through experience. Achieving the full potential of AI technologies poses research challenges that require a radical transformation of the AI research enterprise, facilitated by significant and sustained investment. These are the major recommendations of a recent community effort coordinated by the Computing Community Consortium and the Association for the Advancement of Artificial Intelligence to formulate a Roadmap for AI research and development over the next two decades. |
Tasks | |
Published | 2019-08-07 |
URL | https://arxiv.org/abs/1908.02624v1 |
PDF | https://arxiv.org/pdf/1908.02624v1.pdf |
PWC | https://paperswithcode.com/paper/a-20-year-community-roadmap-for-artificial |
Repo | |
Framework | |
Enhancing Clinical Concept Extraction with Contextual Embeddings
Title | Enhancing Clinical Concept Extraction with Contextual Embeddings |
Authors | Yuqi Si, Jingqi Wang, Hua Xu, Kirk Roberts |
Abstract | Neural network-based representations (“embeddings”) have dramatically advanced natural language processing (NLP) tasks, including clinical NLP tasks such as concept extraction. Recently, however, more advanced embedding methods and representations (e.g., ELMo, BERT) have further pushed the state-of-the-art in NLP, yet there are no common best practices for how to integrate these representations into clinical tasks. The purpose of this study, then, is to explore the space of possible options in utilizing these new models for clinical concept extraction, including comparing these to traditional word embedding methods (word2vec, GloVe, fastText). Both off-the-shelf open-domain embeddings and pre-trained clinical embeddings from MIMIC-III are evaluated. We explore a battery of embedding methods consisting of traditional word embeddings and contextual embeddings, and compare these on four concept extraction corpora: i2b2 2010, i2b2 2012, SemEval 2014, and SemEval 2015. We also analyze the impact of the pre-training time of a large language model like ELMo or BERT on the extraction performance. Lastly, we present an intuitive way to understand the semantic information encoded by contextual embeddings. Contextual embeddings pre-trained on a large clinical corpus achieve new state-of-the-art performance across all concept extraction tasks. The best-performing model outperforms all state-of-the-art methods with respective F1-measures of 90.25, 93.18 (partial), 80.74, and 81.65. We demonstrate the potential of contextual embeddings through the state-of-the-art performance these methods achieve on clinical concept extraction. Additionally, we demonstrate that contextual embeddings encode valuable semantic information not accounted for in traditional word representations. |
Tasks | Clinical Concept Extraction, Language Modelling, Word Embeddings |
Published | 2019-02-22 |
URL | https://arxiv.org/abs/1902.08691v4 |
PDF | https://arxiv.org/pdf/1902.08691v4.pdf |
PWC | https://paperswithcode.com/paper/enhancing-clinical-concept-extraction-with |
Repo | |
Framework | |
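A hedged sketch of the general recipe described above: contextual token embeddings from a BERT-style encoder feed a lightweight token-classification head that emits BIO concept tags. Here `bert-base-uncased` is a stand-in for the clinical (MIMIC-III pre-trained) checkpoint, the tag set is only i2b2-2010-like, and the head is untrained; the paper also evaluates ELMo and BiLSTM-CRF taggers.

```python
# Hedged sketch, not the paper's exact pipeline: contextual embeddings from a
# BERT-style encoder feed a small token-classification head for BIO tagging.
# "bert-base-uncased" stands in for a clinical checkpoint; the head is untrained.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

labels = ["O", "B-problem", "I-problem", "B-treatment", "I-treatment",
          "B-test", "I-test"]                       # i2b2-2010-style tag set
head = nn.Linear(encoder.config.hidden_size, len(labels))

sentence = "Patient was started on metformin for type 2 diabetes ."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state    # (1, seq_len, hidden_size)
    logits = head(hidden)                           # (1, seq_len, n_labels)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, pred in zip(tokens, logits[0].argmax(-1)):
    print(f"{tok:>12}  ->  {labels[pred]}")         # untrained head: random tags
```

In the study's setting the head (or a BiLSTM-CRF on top of the embeddings) would be trained on the annotated concept-extraction corpora; the snippet only shows how the contextual representations are produced and consumed.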
DeGraF-Flow: Extending DeGraF Features for accurate and efficient sparse-to-dense optical flow estimation
Title | DeGraF-Flow: Extending DeGraF Features for accurate and efficient sparse-to-dense optical flow estimation |
Authors | Felix Stephenson, Toby Breckon, Ioannis Katramados |
Abstract | Modern optical flow methods make use of salient scene feature points detected and matched within the scene as a basis for sparse-to-dense optical flow estimation. Current feature detectors, however, either give sparse, non-uniform point clouds (resulting in flow inaccuracies) or lack the efficiency for frame-rate real-time applications. In this work, we use the novel Dense Gradient Based Features (DeGraF) as the input to a sparse-to-dense optical flow scheme. This consists of three stages: 1) efficient detection of uniformly distributed Dense Gradient Based Features (DeGraF); 2) feature tracking via robust local optical flow; and 3) edge-preserving flow interpolation to recover overall dense optical flow. The tunable density and uniformity of DeGraF features yield superior dense optical flow estimation compared to other popular feature detectors within this three-stage pipeline. Furthermore, the comparable speed of feature detection also lends itself well to the aim of real-time optical flow recovery. Evaluation on established real-world benchmark datasets shows test performance in an autonomous vehicle setting, where DeGraF-Flow achieves promising results in terms of accuracy with competitive computational efficiency among non-GPU based methods, including a marked increase in speed over the conceptually similar EpicFlow approach. |
Tasks | Optical Flow Estimation |
Published | 2019-01-28 |
URL | https://arxiv.org/abs/1901.09971v2 |
PDF | https://arxiv.org/pdf/1901.09971v2.pdf |
PWC | https://paperswithcode.com/paper/degraf-flow-extending-degraf-features-for |
Repo | |
Framework | |
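A hedged sketch of the three-stage sparse-to-dense pipeline described in the abstract above, using standard stand-ins: Shi-Tomasi corners instead of DeGraF features, pyramidal Lucas-Kanade instead of robust local optical flow, and plain scattered interpolation instead of the edge-preserving (EpicFlow-style) interpolation.

```python
# Hedged sketch of the three-stage sparse-to-dense pipeline with OpenCV
# stand-ins for each stage (DeGraF, RLOF and edge-preserving interpolation
# are what the paper actually uses).
import cv2
import numpy as np
from scipy.interpolate import griddata

def sparse_to_dense_flow(frame0, frame1, max_corners=2000):
    g0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

    # 1) Feature detection (DeGraF in the paper; Shi-Tomasi here).
    pts0 = cv2.goodFeaturesToTrack(g0, max_corners, qualityLevel=0.01,
                                   minDistance=5)

    # 2) Sparse feature tracking (robust local optical flow in the paper).
    pts1, status, _ = cv2.calcOpticalFlowPyrLK(g0, g1, pts0, None)
    ok = status.ravel() == 1
    p0, p1 = pts0[ok].reshape(-1, 2), pts1[ok].reshape(-1, 2)

    # 3) Densification (edge-preserving interpolation in the paper; plain
    #    linear scattered interpolation here).
    h, w = g0.shape
    grid_y, grid_x = np.mgrid[0:h, 0:w]
    d = p1 - p0
    u = griddata(p0, d[:, 0], (grid_x, grid_y), method="linear", fill_value=0.0)
    v = griddata(p0, d[:, 1], (grid_x, grid_y), method="linear", fill_value=0.0)
    return np.dstack([u, v]).astype(np.float32)     # (h, w, 2) dense flow field

# Usage (file names are placeholders for two consecutive frames on disk):
# flow = sparse_to_dense_flow(cv2.imread("frame0.png"), cv2.imread("frame1.png"))
```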
Reinforcement Learning with Low-Complexity Liquid State Machines
Title | Reinforcement Learning with Low-Complexity Liquid State Machines |
Authors | Wachirawit Ponghiran, Gopalakrishnan Srinivasan, Kaushik Roy |
Abstract | We propose reinforcement learning on simple networks consisting of random connections of spiking neurons (both recurrent and feed-forward) that can learn complex tasks with very few trainable parameters. Such sparse and randomly interconnected recurrent spiking networks exhibit highly non-linear dynamics that transform the inputs into rich high-dimensional representations based on past context. The random input representations can be efficiently interpreted by an output (or readout) layer with trainable parameters. Systematic initialization of the random connections and training of the readout layer using the Q-learning algorithm enable such small random spiking networks to learn optimally and achieve the same learning efficiency as humans on complex reinforcement learning tasks like Atari games. The spike-based approach using small random recurrent networks provides a computationally efficient alternative to state-of-the-art deep reinforcement learning networks with several layers of trainable parameters. The low-complexity spiking networks can lead to improved energy efficiency in event-driven neuromorphic hardware for complex reinforcement learning tasks. |
Tasks | Atari Games, Q-Learning |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01695v1 |
PDF | https://arxiv.org/pdf/1906.01695v1.pdf |
PWC | https://paperswithcode.com/paper/reinforcement-learning-with-low-complexity |
Repo | |
Framework | |
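A hedged sketch of the mechanics described above (not the paper's Atari setup): a fixed, randomly connected spiking reservoir turns observations into a high-dimensional filtered-spike state, and only a linear Q readout is trained with the standard Q-learning update. Connectivity densities, neuron model details, and the toy rollout below are illustrative assumptions.

```python
# Hedged sketch: fixed random LIF-style reservoir + trainable linear Q readout.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, n_act = 8, 200, 4

W_in = rng.normal(0, 1.0, (n_res, n_in)) * (rng.random((n_res, n_in)) < 0.3)
W_res = rng.normal(0, 0.5, (n_res, n_res)) * (rng.random((n_res, n_res)) < 0.1)
W_out = np.zeros((n_act, n_res))            # the only trainable parameters

v = np.zeros(n_res)                         # membrane potentials
trace = np.zeros(n_res)                     # low-pass filtered spikes (readout state)
spikes = np.zeros(n_res)

def reservoir_step(obs, tau=0.9, v_th=1.0):
    """One LIF-style reservoir update; returns the readout state."""
    global v, trace, spikes
    v = tau * v + W_in @ obs + W_res @ spikes
    spikes = (v > v_th).astype(float)
    v = np.where(spikes > 0, 0.0, v)        # reset neurons that fired
    trace = 0.8 * trace + spikes
    return trace

def q_update(state, action, reward, next_state, alpha=0.01, gamma=0.99):
    """Standard Q-learning update applied to the linear readout only."""
    td_target = reward + gamma * np.max(W_out @ next_state)
    td_error = td_target - (W_out @ state)[action]
    W_out[action] += alpha * td_error * state

# Toy rollout on random observations/rewards, just to show the loop structure.
state = reservoir_step(rng.normal(size=n_in))
for _ in range(100):
    action = int(np.argmax(W_out @ state)) if rng.random() > 0.1 else int(rng.integers(n_act))
    obs, reward = rng.normal(size=n_in), rng.normal()
    next_state = reservoir_step(obs)
    q_update(state, action, reward, next_state)
    state = next_state
```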
Provably Efficient Q-Learning with Low Switching Cost
Title | Provably Efficient Q-Learning with Low Switching Cost |
Authors | Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang |
Abstract | We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change their exploration policy as infrequently as possible during regret minimization. This is motivated by the difficulty of running fully adaptive algorithms in real-world applications (such as medical domains), and we propose to quantify adaptivity using the notion of local switching cost. Our main contribution, Q-Learning with UCB2 exploration, is a model-free algorithm for the H-step episodic MDP that achieves sublinear regret whose local switching cost in K episodes is $O(H^3SA\log K)$, and we provide a lower bound of $\Omega(HSA)$ on the local switching cost for any no-regret algorithm. Our algorithm can be naturally adapted to the concurrent setting, which yields nontrivial results that improve upon prior work in certain aspects. |
Tasks | Q-Learning |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.12849v3 |
PDF | https://arxiv.org/pdf/1905.12849v3.pdf |
PWC | https://paperswithcode.com/paper/provably-efficient-q-learning-with-low |
Repo | |
Framework | |
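A hedged sketch of the algorithmic flavor (not the paper's exact UCB2 schedule, constants, or analysis): tabular episodic Q-learning with a count-based UCB bonus, in which the acting policy is a frozen copy of the Q table refreshed only at exponentially spaced episode indices, so the number of policy switches grows logarithmically in K. The environment is assumed to follow the classic Gym API with discrete states and actions.

```python
# Hedged sketch: episodic tabular Q-learning with a UCB bonus and a
# low-switching (exponentially spaced) policy-update schedule.
import numpy as np

def low_switching_q_learning(env, n_states, n_actions, horizon, n_episodes,
                             c_bonus=1.0):
    Q = np.full((horizon, n_states, n_actions), float(horizon))  # optimistic init
    N = np.zeros((horizon, n_states, n_actions))
    Q_act = Q.copy()                          # frozen copy used for acting
    next_switch, switches = 1, 0

    for k in range(1, n_episodes + 1):
        if k >= next_switch:                  # exponentially spaced switches
            Q_act, switches = Q.copy(), switches + 1
            next_switch *= 2
        s = env.reset()                       # classic Gym API assumed
        for h in range(horizon):
            a = int(np.argmax(Q_act[h, s]))   # act with the frozen policy
            s_next, r, done, _ = env.step(a)
            N[h, s, a] += 1
            alpha = (horizon + 1) / (horizon + N[h, s, a])
            bonus = c_bonus * np.sqrt(horizon ** 3 * np.log(n_episodes) / N[h, s, a])
            future = np.max(Q[h + 1, s_next]) if h + 1 < horizon else 0.0
            Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * min(r + bonus + future, horizon)
            s = s_next
            if done:
                break
    return Q, switches                        # switches grows like O(log n_episodes)
```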
Performance of Q-learning with Linear Function Approximation: Stability and Finite-Time Analysis
Title | Performance of Q-learning with Linear Function Approximation: Stability and Finite-Time Analysis |
Authors | Zaiwei Chen, Sheng Zhang, Thinh T. Doan, Siva Theja Maguluri, John-Paul Clarke |
Abstract | In this paper, we consider the model-free reinforcement learning problem and study the popular Q-learning algorithm with linear function approximation for estimating the optimal policy. Despite its popularity, it is known that Q-learning with linear function approximation may diverge in general due to off-policy sampling. Our main contribution is to provide a finite-time bound and the convergence rate on the performance of Q-learning with linear function approximation under an assumption on the behavior policy. Unlike some prior work in the literature, we do not need to make the unnatural assumption that the samples are i.i.d. (since they are Markovian), and do not require an additional projection step in the algorithm. To show this result, we first consider a more general nonlinear stochastic approximation algorithm with Markovian noise, and derive a finite-time bound on the mean-square error, which we believe is of independent interest. Our proof is based on Lyapunov drift arguments and exploits the geometric mixing of the underlying Markov chain. We also provide numerical simulations to illustrate the effectiveness of our assumption on the behavior policy, and demonstrate the rate of convergence of Q-learning with linear function approximation. |
Tasks | Q-Learning |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11425v3 |
PDF | https://arxiv.org/pdf/1905.11425v3.pdf |
PWC | https://paperswithcode.com/paper/finite-time-analysis-of-q-learning-with |
Repo | |
Framework | |
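A minimal sketch of the algorithm being analysed: semi-gradient Q-learning with a linear parameterisation Q(s, a) ≈ φ(s, a)ᵀw, run on a single Markovian trajectory under a fixed behavior policy (no i.i.d. assumption, no projection step), matching the setting studied. The environment, feature map, and behavior policy below are toy placeholders.

```python
# Hedged sketch of Q-learning with linear function approximation on one
# Markovian trajectory under a fixed behavior policy; all quantities are toys.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, d = 10, 3, 6
Phi = rng.normal(size=(n_states, n_actions, d)) / np.sqrt(d)       # feature map phi(s, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # transition kernel
R = rng.random((n_states, n_actions))                               # rewards

def behavior_policy(s):                      # fixed, sufficiently exploratory
    return int(rng.integers(n_actions))

w = np.zeros(d)
gamma, alpha = 0.9, 0.05
s = 0
for t in range(20000):                       # one long Markovian trajectory
    a = behavior_policy(s)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]
    q_next = Phi[s_next] @ w                 # Q(s', .) for all actions
    td_error = r + gamma * np.max(q_next) - Phi[s, a] @ w
    w += alpha * td_error * Phi[s, a]        # semi-gradient Q-learning update
    s = s_next

print("learned weights:", np.round(w, 3))
```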
Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation
Title | Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation |
Authors | Tianyu He, Xu Tan, Tao Qin |
Abstract | Neural machine translation (NMT) typically adopts the encoder-decoder framework. A good understanding of the characteristics and functionalities of the encoder and decoder can help to explain the pros and cons of the framework, and design better models for NMT. In this work, we conduct an empirical study on the encoder and the decoder in NMT, taking Transformer as an example. We find that 1) the decoder handles an easier task than the encoder in NMT, 2) the decoder is more sensitive to the input noise than the encoder, and 3) the preceding words/tokens in the decoder provide strong conditional information, which accounts for the two observations above. We hope those observations can shed light on the characteristics of the encoder and decoder and inspire future research on NMT. |
Tasks | Machine Translation |
Published | 2019-08-17 |
URL | https://arxiv.org/abs/1908.06259v1 |
PDF | https://arxiv.org/pdf/1908.06259v1.pdf |
PWC | https://paperswithcode.com/paper/hard-but-robust-easy-but-sensitive-how |
Repo | |
Framework | |
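A hedged sketch of the kind of probe behind the noise-sensitivity finding: inject identical noise into either the encoder input or the decoder input of a Transformer and compare how much the output shifts. The paper measures this on trained NMT models with translation quality; here an untrained `torch.nn.Transformer` on random embeddings only illustrates the protocol.

```python
# Hedged sketch of an encoder- vs decoder-side noise probe on an untrained
# Transformer; this shows the measurement idea, not the paper's results.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Transformer(d_model=64, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, dim_feedforward=128)
model.eval()

src = torch.randn(10, 8, 64)       # (src_len, batch, d_model)
tgt = torch.randn(12, 8, 64)       # (tgt_len, batch, d_model)
noise_src = 0.1 * torch.randn_like(src)
noise_tgt = 0.1 * torch.randn_like(tgt)

with torch.no_grad():
    clean = model(src, tgt)
    enc_noised = model(src + noise_src, tgt)          # perturb encoder side
    dec_noised = model(src, tgt + noise_tgt)          # perturb decoder side

print("output shift, encoder-side noise:", (enc_noised - clean).norm().item())
print("output shift, decoder-side noise:", (dec_noised - clean).norm().item())
```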
Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples
Title | Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples |
Authors | Kamil Nar, Orhan Ocal, S. Shankar Sastry, Kannan Ramchandran |
Abstract | State-of-the-art neural networks are vulnerable to adversarial examples; they can easily misclassify inputs that are imperceptibly different from their training and test data. In this work, we establish that the use of the cross-entropy loss function and the low-rank features of the training data have responsibility for the existence of these inputs. Based on this observation, we suggest that addressing adversarial examples requires rethinking the use of the cross-entropy loss function and looking for an alternative that is more suited for minimization with low-rank features. In this direction, we present a training scheme called differential training, which uses a loss function defined on the differences between the features of points from opposite classes. We show that differential training can ensure a large margin between the decision boundary of the neural network and the points in the training dataset. This larger margin increases the amount of perturbation needed to flip the prediction of the classifier and makes it harder to find an adversarial example with small perturbations. We test differential training on a binary classification task with the CIFAR-10 dataset and demonstrate that it radically reduces the ratio of images for which an adversarial example could be found – not only in the training dataset, but in the test dataset as well. |
Tasks | |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1901.08360v1 |
PDF | http://arxiv.org/pdf/1901.08360v1.pdf |
PWC | https://paperswithcode.com/paper/cross-entropy-loss-and-low-rank-features-have |
Repo | |
Framework | |
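A hedged sketch of the core idea described in the abstract above: the loss is applied to the difference between features of points from opposite classes rather than to per-sample logits. The architecture, pairing strategy, and loss details below are illustrative stand-ins, not the paper's exact recipe.

```python
# Hedged sketch of a "differential training" objective: logistic loss on
# w^T (f(x_pos) - f(x_neg)) for pairs of points from opposite classes.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
feature_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(),
                            nn.Linear(128, 64))
w = nn.Linear(64, 1, bias=False)             # linear classifier on feature differences
opt = torch.optim.SGD(list(feature_net.parameters()) + list(w.parameters()), lr=0.01)

def differential_loss(x_pos, x_neg):
    """Logistic loss on w^T (f(x_pos) - f(x_neg)); pushes for a class margin."""
    diff = feature_net(x_pos) - feature_net(x_neg)
    return F.softplus(-w(diff)).mean()       # = mean log(1 + exp(-w^T diff))

# Toy loop on random stand-in batches for the two classes of a binary task.
for step in range(100):
    x_pos = torch.randn(32, 3, 32, 32)       # placeholder for class-0 images
    x_neg = torch.randn(32, 3, 32, 32)       # placeholder for class-1 images
    loss = differential_loss(x_pos, x_neg)
    opt.zero_grad()
    loss.backward()
    opt.step()

# At test time one simple option is to classify by comparing w^T f(x) against a
# threshold chosen on the training set (the exact rule in the paper may differ).
```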
Evaluation Uncertainty in Data-Driven Self-Driving Testing
Title | Evaluation Uncertainty in Data-Driven Self-Driving Testing |
Authors | Zhiyuan Huang, Mansur Arief, Henry Lam, Ding Zhao |
Abstract | Safety evaluation of self-driving technologies has been extensively studied. One recent approach uses Monte Carlo based evaluation to estimate the occurrence probabilities of safety-critical events as safety measures. These Monte Carlo samples are generated from stochastic input models constructed based on real-world data. In this paper, we propose an approach to assess the impact on the probability estimates from the evaluation procedures due to the estimation error caused by data variability. Our proposed method merges the classical bootstrap method for estimating input uncertainty with a likelihood ratio based scheme to reuse experiment outputs. This approach is economical and efficient in terms of implementation costs in assessing input uncertainty for the evaluation of self-driving technology. We use an example in autonomous vehicle (AV) safety evaluation to demonstrate the proposed approach as a diagnostic tool for the quality of the fitted input model. |
Tasks | Autonomous Vehicles |
Published | 2019-04-19 |
URL | https://arxiv.org/abs/1904.09306v2 |
PDF | https://arxiv.org/pdf/1904.09306v2.pdf |
PWC | https://paperswithcode.com/paper/190409306 |
Repo | |
Framework | |
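A hedged numeric sketch of the idea above: estimate an event probability by Monte Carlo under a fitted input model, then assess input (data) uncertainty by bootstrapping the data, refitting the model, and reweighting the already-simulated outputs with likelihood ratios instead of rerunning the simulator. A 1-D Gaussian input model and a simple threshold event stand in for the paper's driving-scenario models.

```python
# Hedged sketch: bootstrap input uncertainty with likelihood-ratio reuse of
# one fixed batch of Monte Carlo simulations (toy 1-D Gaussian input model).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

data = rng.normal(1.0, 2.0, size=500)          # observed "real-world" input data
mu_hat, sigma_hat = data.mean(), data.std(ddof=1)

# One (expensive, reused) batch of simulations under the fitted input model.
n_sim = 50_000
x = rng.normal(mu_hat, sigma_hat, size=n_sim)
event = (x > 6.0).astype(float)                # stand-in safety-critical event
p_hat = event.mean()

# Bootstrap the input data; for each refit, reuse the simulations via
# likelihood ratios w_i = f_boot(x_i) / f_hat(x_i) instead of resimulating.
boot_estimates = []
for _ in range(500):
    resample = rng.choice(data, size=data.size, replace=True)
    mu_b, sigma_b = resample.mean(), resample.std(ddof=1)
    w = stats.norm.pdf(x, mu_b, sigma_b) / stats.norm.pdf(x, mu_hat, sigma_hat)
    boot_estimates.append(np.mean(w * event))

lo, hi = np.percentile(boot_estimates, [2.5, 97.5])
print(f"point estimate: {p_hat:.2e}, 95% input-uncertainty interval: [{lo:.2e}, {hi:.2e}]")
```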
Improving LSTM Neural Networks for Better Short-Term Wind Power Predictions
Title | Improving LSTM Neural Networks for Better Short-Term Wind Power Predictions |
Authors | Maximilian Du |
Abstract | This paper improves wind power prediction via weather forecast-contextualized Long Short-Term Memory Neural Network (LSTM) models. Initially, only wind power data was fed to a generic LSTM, but this model performed poorly, with erratic and naive behavior observed on even low-variance data sections. To address this issue, weather forecast data was added to better contextualize the power data, and LSTM modifications were made to address specific model shortcomings. These models were tested through both a Normalized Mean Absolute Error and the Naive Ratio (NR), which is a score introduced by this paper to quantify the unwanted presence of naive character in trained models. Results showed an increased accuracy with the addition of weather forecast data on the modified models, as well as a decrease in naive character. Key contributions include improved LSTM variants, the use of weather forecast data, and the introduction of a new model performance index. |
Tasks | |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00489v2 |
PDF | https://arxiv.org/pdf/1907.00489v2.pdf |
PWC | https://paperswithcode.com/paper/improving-lstm-neural-networks-for-better |
Repo | |
Framework | |
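A hedged sketch of the forecast-contextualised setup: an LSTM whose input at each time step concatenates the past power value with weather-forecast covariates. The paper's specific LSTM modifications and its Naive Ratio metric are not reproduced here (the abstract does not define them), and the feature set and window length below are illustrative assumptions.

```python
# Hedged sketch: LSTM fed with [past power, weather forecast] at every step.
import torch
import torch.nn as nn

class ForecastContextLSTM(nn.Module):
    def __init__(self, n_weather_features=4, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1 + n_weather_features,
                            hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, past_power, weather_forecast):
        # past_power:       (batch, seq_len, 1)
        # weather_forecast: (batch, seq_len, n_weather_features)
        x = torch.cat([past_power, weather_forecast], dim=-1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])         # next-step power prediction

model = ForecastContextLSTM()
power = torch.randn(8, 48, 1)                # e.g. 48 past hours of power
weather = torch.randn(8, 48, 4)              # e.g. wind speed/direction, temperature, pressure
pred = model(power, weather)
loss = nn.functional.l1_loss(pred, torch.randn(8, 1))   # MAE-style objective (the paper reports a normalized MAE)
```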
MoGlow: Probabilistic and controllable motion synthesis using normalising flows
Title | MoGlow: Probabilistic and controllable motion synthesis using normalising flows |
Authors | Gustav Eje Henter, Simon Alexanderson, Jonas Beskow |
Abstract | Data-driven modelling and synthesis of motion is an active research area with applications that include animation, games, and social robotics. This paper introduces a new class of probabilistic, generative, and controllable motion-data models based on normalising flows. Models of this kind can describe highly complex distributions, yet can be trained efficiently using exact maximum likelihood, unlike GANs or VAEs. Our proposed model is autoregressive and uses LSTMs to enable arbitrarily long time-dependencies. Importantly, it is also causal, meaning that each pose in the output sequence is generated without access to poses or control inputs from future time steps; this absence of algorithmic latency is important for interactive applications with real-time motion control. The approach can in principle be applied to any type of motion since it does not make restrictive assumptions such as the motion being cyclic in nature. We evaluate the models on motion-capture datasets of human and quadruped locomotion. Objective and subjective results show that randomly-sampled motion from the proposed method attains a motion quality close to recorded motion capture for both humans and animals. |
Tasks | Motion Capture, Normalising Flows |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06598v2 |
PDF | https://arxiv.org/pdf/1905.06598v2.pdf |
PWC | https://paperswithcode.com/paper/moglow-probabilistic-and-controllable-motion |
Repo | |
Framework | |
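A hedged, heavily simplified sketch of one ingredient of such a model: a single affine coupling layer whose scale and shift are conditioned on an LSTM summary of past poses and control inputs, which is what makes sampling autoregressive and causal. The full model stacks many such layers together with actnorm and invertible 1x1 convolutions (Glow); none of that is shown here, and all dimensions below are illustrative.

```python
# Hedged sketch: one LSTM-conditioned affine coupling layer (a single
# ingredient of a MoGlow-style flow, not the full model).
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    def __init__(self, pose_dim, context_dim, hidden=128):
        super().__init__()
        self.half = pose_dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (pose_dim - self.half)))

    def forward(self, x, context):
        """Map pose x to latent z given the LSTM context; return z and log|det J|."""
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(torch.cat([x1, context], dim=-1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)                  # keep scales well-behaved
        z2 = x2 * torch.exp(log_s) + t
        return torch.cat([x1, z2], dim=-1), log_s.sum(dim=-1)

    def inverse(self, z, context):
        """Used at synthesis time: sample z ~ N(0, I) and invert to a pose."""
        z1, z2 = z[:, :self.half], z[:, self.half:]
        log_s, t = self.net(torch.cat([z1, context], dim=-1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)
        x2 = (z2 - t) * torch.exp(-log_s)
        return torch.cat([z1, x2], dim=-1)

pose_dim, ctrl_dim = 63, 3                         # illustrative sizes
lstm = nn.LSTM(input_size=pose_dim + ctrl_dim, hidden_size=64, batch_first=True)
coupling = ConditionalAffineCoupling(pose_dim, context_dim=64)

history = torch.randn(8, 10, pose_dim + ctrl_dim)  # past poses + control signal
context = lstm(history)[0][:, -1]                  # causal summary of the past
z, logdet = coupling(torch.randn(8, pose_dim), context)          # placeholder "current pose" batch
next_pose = coupling.inverse(torch.randn(8, pose_dim), context)  # one sampled pose
```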
Evaluating Explainers via Perturbation
Title | Evaluating Explainers via Perturbation |
Authors | Minh N. Vu, Truc D. Nguyen, NhatHai Phan, Ralucca Gera, My T. Thai |
Abstract | Due to the high complexity of many modern machine learning models, such as deep convolutional networks, understanding the cause of a model’s prediction is critical. Many explainers have been designed to give us more insights on the decision of complex classifiers. However, there is no common ground on evaluating the quality of different explanation methods. Motivated by the need for comprehensive evaluation, we introduce the c-Eval metric and the corresponding framework to quantify the quality of feature-based explainers for machine learning image classifiers. Given a prediction and the corresponding explanation on that prediction, c-Eval is the minimum-power perturbation that successfully alters the prediction while keeping the explanation’s features unchanged. We also provide theoretical analysis linking the proposed parameter with the portion of the predicted object covered by the explanation. Using a heuristic approach, we introduce the c-Eval plot, which not only displays a strong connection between c-Eval and explainers’ quality, but also serves as a low-complexity approach to assessing explainers. We finally conduct extensive experiments of explainers on three different datasets in order to support the adoption of c-Eval in evaluating explainers’ performance. |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02032v1 |
PDF | https://arxiv.org/pdf/1906.02032v1.pdf |
PWC | https://paperswithcode.com/paper/evaluating-explainers-via-perturbation |
Repo | |
Framework | |
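A hedged sketch of the flavor of the quantity defined above: given a classifier, an input, and an explanation mask, search for a small perturbation restricted to the features outside the explanation that flips the prediction; the smallest such perturbation budget is a proxy for c-Eval. Simple projected gradient steps with a growing L2 budget stand in for the paper's exact formulation and heuristic, and the random CNN is a placeholder.

```python
# Hedged sketch: smallest perturbation *outside* the explanation that flips
# the prediction, as a proxy for the c-Eval quantity.
import torch
import torch.nn as nn
import torch.nn.functional as F

def c_eval_proxy(model, x, explanation_mask, budgets=(0.1, 0.2, 0.5, 1.0, 2.0),
                 steps=100, lr=0.05):
    """Return the smallest L2 budget at which the label flips, or inf."""
    model.eval()
    orig_label = model(x).argmax(dim=1)
    outside = 1.0 - explanation_mask               # only perturb non-explained features
    for eps in budgets:
        delta = torch.zeros_like(x, requires_grad=True)
        for _ in range(steps):
            logits = model(x + delta * outside)
            loss = -F.cross_entropy(logits, orig_label)   # push away from original label
            loss.backward()
            with torch.no_grad():
                delta -= lr * delta.grad
                norm = delta.norm()
                if norm > eps:                     # project back onto the L2 ball
                    delta *= eps / norm
            delta.grad.zero_()
        if model(x + delta.detach() * outside).argmax(dim=1) != orig_label:
            return eps
    return float("inf")                            # explanation "covers" the prediction

# Toy usage with a random CNN and an explanation mask covering the image centre.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
x = torch.rand(1, 3, 32, 32)
mask = torch.zeros_like(x); mask[..., 8:24, 8:24] = 1.0
print("flip budget outside the explanation:", c_eval_proxy(model, x, mask))
```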
Bayesian Hierarchical Mixture Clustering using Multilevel Hierarchical Dirichlet Processes
Title | Bayesian Hierarchical Mixture Clustering using Multilevel Hierarchical Dirichlet Processes |
Authors | Weipeng Huang, Nishma Laitonjam, Guangyuan Piao, Neil Hurley |
Abstract | This paper focuses on the problem of hierarchical non-overlapping clustering of a dataset. In such a clustering, each data item is associated with exactly one leaf node and each internal node is associated with all the data items stored in the sub-tree beneath it, so that each level of the hierarchy corresponds to a partition of the dataset. We develop a novel Bayesian nonparametric method combining the nested Chinese Restaurant Process (nCRP) and the Hierarchical Dirichlet Process (HDP). Compared with other existing Bayesian approaches, our solution tackles data with complex latent mixture features, a setting that has not been previously explored in the literature. We discuss the details of the model and the inference procedure. Furthermore, experiments on three datasets show that our method achieves solid empirical results in comparison with existing algorithms. |
Tasks | |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05022v3 |
PDF | https://arxiv.org/pdf/1905.05022v3.pdf |
PWC | https://paperswithcode.com/paper/bayesian-hierarchical-mixture-clustering |
Repo | |
Framework | |
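A hedged sketch of one building block named above, not the paper's full model: drawing root-to-leaf paths under the nested Chinese Restaurant Process (nCRP) prior, which is what assigns each data item to a leaf of the hierarchy. The paper couples this with a hierarchical Dirichlet Process over mixture components and a dedicated inference procedure, neither of which is shown here.

```python
# Hedged sketch: sampling root-to-leaf paths from an nCRP prior (one
# ingredient of the model, shown in isolation).
import numpy as np

rng = np.random.default_rng(0)

def ncrp_paths(n_items, depth, gamma=1.0):
    """Sample nCRP paths: each item makes a CRP table choice at every level."""
    paths = []
    children = {(): {}}                       # node -> {child label: customer count}
    for _ in range(n_items):
        node, path = (), []
        for _ in range(depth):
            counts = children.setdefault(node, {})
            labels = list(counts)
            weights = np.array([counts[c] for c in labels] + [gamma], dtype=float)
            choice = rng.choice(len(weights), p=weights / weights.sum())
            if choice == len(labels):         # sit at a new table -> open a new branch
                child = len(labels)
                counts[child] = 0
            else:
                child = labels[choice]
            counts[child] += 1
            node = node + (child,)
            path.append(child)
        paths.append(tuple(path))
    return paths

for p in ncrp_paths(n_items=10, depth=3):
    print(p)                                  # each tuple is a root-to-leaf path
```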
The asymptotic spectrum of the Hessian of DNN throughout training
Title | The asymptotic spectrum of the Hessian of DNN throughout training |
Authors | Arthur Jacot, Franck Gabriel, Clément Hongler |
Abstract | The dynamics of DNNs during gradient descent is described by the so-called Neural Tangent Kernel (NTK). In this article, we show that the NTK allows one to gain precise insight into the Hessian of the cost of DNNs. When the NTK is fixed during training, we obtain a full characterization of the asymptotics of the spectrum of the Hessian, at initialization and during training. In the so-called mean-field limit, where the NTK is not fixed during training, we describe the first two moments of the Hessian at initialization. |
Tasks | |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.02875v2 |
PDF | https://arxiv.org/pdf/1910.02875v2.pdf |
PWC | https://paperswithcode.com/paper/the-asymptotic-spectrum-of-the-hessian-of-dnn |
Repo | |
Framework | |
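A hedged numerical sketch of the link the paper exploits: for an MSE cost, the Gauss-Newton part of the Hessian is proportional to JᵀJ with J the parameter Jacobian of the network outputs, so its nonzero eigenvalues coincide (up to that constant) with those of the empirical NTK Gram matrix JJᵀ. The full Hessian contains an additional "functional" term analysed separately in the paper; the tiny MLP and data below are placeholders used only to check the Gauss-Newton/NTK correspondence numerically.

```python
# Hedged sketch: for C = (1/n) sum_i (f(x_i) - y_i)^2, the Gauss-Newton part
# of the Hessian is (2/n) J^T J, whose nonzero spectrum matches (2/n) times
# the spectrum of the empirical NTK Gram matrix J J^T.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(5, 32), nn.Tanh(), nn.Linear(32, 1))
params = list(net.parameters())
n = 8
X = torch.randn(n, 5)

# Per-sample Jacobian of the scalar output w.r.t. all parameters: rows of J.
rows = []
for i in range(n):
    grads = torch.autograd.grad(net(X[i:i + 1]).squeeze(), params)
    rows.append(torch.cat([g.reshape(-1) for g in grads]))
J = torch.stack(rows)                         # shape (n, n_params)

ntk_gram = J @ J.T                            # empirical NTK Gram matrix, (n, n)
gauss_newton = (2.0 / n) * J.T @ J            # Gauss-Newton part of the Hessian

eig_ntk = torch.linalg.eigvalsh(ntk_gram)
eig_gn = torch.linalg.eigvalsh(gauss_newton)

print("top NTK eigenvalues * 2/n  :", (2.0 / n) * eig_ntk.flip(0)[:5])
print("top Gauss-Newton eigenvalues:", eig_gn.flip(0)[:5])   # should match
```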