Paper Group ANR 105
Recovering Localized Adversarial Attacks
Title | Recovering Localized Adversarial Attacks |
Authors | Jan Philip Göpfert, Heiko Wersing, Barbara Hammer |
Abstract | Deep convolutional neural networks have achieved great successes over recent years, particularly in the domain of computer vision. They are fast, convenient, and – thanks to mature frameworks – relatively easy to implement and deploy. However, their reasoning is hidden inside a black box, in spite of a number of proposed approaches that try to provide human-understandable explanations for the predictions of neural networks. It is still a matter of debate which of these explainers are best suited for which situations, and how to quantitatively evaluate and compare them. In this contribution, we focus on the capabilities of explainers for convolutional deep neural networks in an extreme situation: a setting in which humans and networks fundamentally disagree. Deep neural networks are susceptible to adversarial attacks that deliberately modify input samples to mislead a neural network’s classification, without affecting how a human observer interprets the input. Our goal with this contribution is to evaluate explainers by investigating whether they can identify adversarially attacked regions of an image. In particular, we quantitatively and qualitatively investigate the capability of three popular explainers of classifications – classic salience, guided backpropagation, and LIME – with respect to their ability to identify regions of attack as the explanatory regions for the (incorrect) prediction in representative examples from image classification. We find that LIME outperforms the other explainers. |
Tasks | Image Classification |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09239v1 |
PDF | https://arxiv.org/pdf/1910.09239v1.pdf |
PWC | https://paperswithcode.com/paper/recovering-localized-adversarial-attacks |
Repo | |
Framework | |
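As a hedged illustration of the setup described in the abstract above: a perturbation is confined to a small image region, a vanilla-gradient saliency map is computed for the attacked image, and the fraction of saliency mass falling inside the attacked region is measured. This is a sketch only; the paper also evaluates guided backpropagation and LIME on trained classifiers, whereas a randomly initialised CNN stands in here.

```python
# Hedged sketch (not the authors' code): craft a perturbation confined to a
# small region, then check how much of a vanilla-gradient saliency map falls
# inside that region. A random CNN stands in for a trained classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
model.eval()

x = torch.rand(1, 3, 32, 32)                 # toy "image"
region = torch.zeros_like(x)
region[..., 8:16, 8:16] = 1.0                # localized attack mask (8x8 patch)
target = torch.tensor([3])                   # class the attack should induce

# Localized PGD-style attack: gradient steps restricted to the masked region.
delta = torch.zeros_like(x, requires_grad=True)
for _ in range(50):
    loss = F.cross_entropy(model(x + delta * region), target)
    loss.backward()
    with torch.no_grad():
        delta -= 0.05 * delta.grad.sign()
        delta.clamp_(-0.5, 0.5)
    delta.grad.zero_()
x_adv = (x + delta.detach() * region).clamp(0, 1)

# Vanilla-gradient saliency for the predicted class of the attacked image.
x_adv.requires_grad_(True)
logits = model(x_adv)
logits[0, logits.argmax()].backward()
saliency = x_adv.grad.abs().sum(dim=1, keepdim=True)   # aggregate over channels

# Fraction of saliency mass inside the attacked region: the quantity of interest.
inside = (saliency * region[:, :1]).sum() / saliency.sum()
print(f"saliency mass inside attacked region: {inside.item():.2%}")
```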
A 20-Year Community Roadmap for Artificial Intelligence Research in the US
Title | A 20-Year Community Roadmap for Artificial Intelligence Research in the US |
Authors | Yolanda Gil, Bart Selman |
Abstract | Decades of research in artificial intelligence (AI) have produced formidable technologies that are providing immense benefit to industry, government, and society. AI systems can now translate across multiple languages, identify objects in images and video, streamline manufacturing processes, and control cars. The deployment of AI systems has not only created a trillion-dollar industry that is projected to quadruple in three years, but has also exposed the need to make AI systems fair, explainable, trustworthy, and secure. Future AI systems will rightfully be expected to reason effectively about the world in which they (and people) operate, handling complex tasks and responsibilities effectively and ethically, engaging in meaningful communication, and improving their awareness through experience. Achieving the full potential of AI technologies poses research challenges that require a radical transformation of the AI research enterprise, facilitated by significant and sustained investment. These are the major recommendations of a recent community effort coordinated by the Computing Community Consortium and the Association for the Advancement of Artificial Intelligence to formulate a Roadmap for AI research and development over the next two decades. |
Tasks | |
Published | 2019-08-07 |
URL | https://arxiv.org/abs/1908.02624v1 |
PDF | https://arxiv.org/pdf/1908.02624v1.pdf |
PWC | https://paperswithcode.com/paper/a-20-year-community-roadmap-for-artificial |
Repo | |
Framework | |
Enhancing Clinical Concept Extraction with Contextual Embeddings
Title | Enhancing Clinical Concept Extraction with Contextual Embeddings |
Authors | Yuqi Si, Jingqi Wang, Hua Xu, Kirk Roberts |
Abstract | Neural network-based representations (“embeddings”) have dramatically advanced natural language processing (NLP) tasks, including clinical NLP tasks such as concept extraction. Recently, however, more advanced embedding methods and representations (e.g., ELMo, BERT) have further pushed the state-of-the-art in NLP, yet there are no common best practices for how to integrate these representations into clinical tasks. The purpose of this study, then, is to explore the space of possible options in utilizing these new models for clinical concept extraction, including comparing these to traditional word embedding methods (word2vec, GloVe, fastText). Both off-the-shelf open-domain embeddings and pre-trained clinical embeddings from MIMIC-III are evaluated. We explore a battery of embedding methods consisting of traditional word embeddings and contextual embeddings, and compare these on four concept extraction corpora: i2b2 2010, i2b2 2012, SemEval 2014, and SemEval 2015. We also analyze the impact of the pre-training time of a large language model like ELMo or BERT on the extraction performance. Lastly, we present an intuitive way to understand the semantic information encoded by contextual embeddings. Contextual embeddings pre-trained on a large clinical corpus achieve new state-of-the-art performance across all concept extraction tasks. The best-performing model outperforms all state-of-the-art methods with respective F1-measures of 90.25, 93.18 (partial), 80.74, and 81.65. We demonstrate the potential of contextual embeddings through the state-of-the-art performance these methods achieve on clinical concept extraction. Additionally, we demonstrate that contextual embeddings encode valuable semantic information not accounted for in traditional word representations. |
Tasks | Clinical Concept Extraction, Language Modelling, Word Embeddings |
Published | 2019-02-22 |
URL | https://arxiv.org/abs/1902.08691v4 |
PDF | https://arxiv.org/pdf/1902.08691v4.pdf |
PWC | https://paperswithcode.com/paper/enhancing-clinical-concept-extraction-with |
Repo | |
Framework | |
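A hedged sketch of the general recipe described above: contextual token embeddings from a BERT-style encoder feed a lightweight token-classification head that emits BIO concept tags. Here `bert-base-uncased` is a stand-in for the clinical (MIMIC-III pre-trained) checkpoint, the tag set is only i2b2-2010-like, and the head is untrained; the paper also evaluates ELMo and BiLSTM-CRF taggers.

```python
# Hedged sketch, not the paper's exact pipeline: contextual embeddings from a
# BERT-style encoder feed a small token-classification head for BIO tagging.
# "bert-base-uncased" stands in for a clinical checkpoint; the head is untrained.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

labels = ["O", "B-problem", "I-problem", "B-treatment", "I-treatment",
          "B-test", "I-test"]                       # i2b2-2010-style tag set
head = nn.Linear(encoder.config.hidden_size, len(labels))

sentence = "Patient was started on metformin for type 2 diabetes ."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state    # (1, seq_len, hidden_size)
    logits = head(hidden)                           # (1, seq_len, n_labels)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, pred in zip(tokens, logits[0].argmax(-1)):
    print(f"{tok:>12}  ->  {labels[pred]}")         # untrained head: random tags
```

In the study's setting the head (or a BiLSTM-CRF on top of the embeddings) would be trained on the annotated concept-extraction corpora; the snippet only shows how the contextual representations are produced and consumed.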
DeGraF-Flow: Extending DeGraF Features for accurate and efficient sparse-to-dense optical flow estimation
Title | DeGraF-Flow: Extending DeGraF Features for accurate and efficient sparse-to-dense optical flow estimation |
Authors | Felix Stephenson, Toby Breckon, Ioannis Katramados |
Abstract | Modern optical flow methods make use of salient scene feature points detected and matched within the scene as a basis for sparse-to-dense optical flow estimation. Current feature detectors, however, either give sparse, non-uniform point clouds (resulting in flow inaccuracies) or lack the efficiency for frame-rate real-time applications. In this work, we use the novel Dense Gradient Based Features (DeGraF) as the input to a sparse-to-dense optical flow scheme. This consists of three stages: 1) efficient detection of uniformly distributed Dense Gradient Based Features (DeGraF); 2) feature tracking via robust local optical flow; and 3) edge-preserving flow interpolation to recover overall dense optical flow. The tunable density and uniformity of DeGraF features yield superior dense optical flow estimation compared to other popular feature detectors within this three-stage pipeline. Furthermore, the comparable speed of feature detection also lends itself well to the aim of real-time optical flow recovery. Evaluation on established real-world benchmark datasets shows test performance in an autonomous vehicle setting, where DeGraF-Flow achieves promising results in terms of accuracy with competitive computational efficiency among non-GPU based methods, including a marked increase in speed over the conceptually similar EpicFlow approach. |
Tasks | Optical Flow Estimation |
Published | 2019-01-28 |
URL | https://arxiv.org/abs/1901.09971v2 |
PDF | https://arxiv.org/pdf/1901.09971v2.pdf |
PWC | https://paperswithcode.com/paper/degraf-flow-extending-degraf-features-for |
Repo | |
Framework | |
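A hedged sketch of the three-stage sparse-to-dense pipeline described in the abstract above, using standard stand-ins: Shi-Tomasi corners instead of DeGraF features, pyramidal Lucas-Kanade instead of robust local optical flow, and plain scattered interpolation instead of the edge-preserving (EpicFlow-style) interpolation.

```python
# Hedged sketch of the three-stage sparse-to-dense pipeline with OpenCV
# stand-ins for each stage (DeGraF, RLOF and edge-preserving interpolation
# are what the paper actually uses).
import cv2
import numpy as np
from scipy.interpolate import griddata

def sparse_to_dense_flow(frame0, frame1, max_corners=2000):
    g0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

    # 1) Feature detection (DeGraF in the paper; Shi-Tomasi here).
    pts0 = cv2.goodFeaturesToTrack(g0, max_corners, qualityLevel=0.01,
                                   minDistance=5)

    # 2) Sparse feature tracking (robust local optical flow in the paper).
    pts1, status, _ = cv2.calcOpticalFlowPyrLK(g0, g1, pts0, None)
    ok = status.ravel() == 1
    p0, p1 = pts0[ok].reshape(-1, 2), pts1[ok].reshape(-1, 2)

    # 3) Densification (edge-preserving interpolation in the paper; plain
    #    linear scattered interpolation here).
    h, w = g0.shape
    grid_y, grid_x = np.mgrid[0:h, 0:w]
    d = p1 - p0
    u = griddata(p0, d[:, 0], (grid_x, grid_y), method="linear", fill_value=0.0)
    v = griddata(p0, d[:, 1], (grid_x, grid_y), method="linear", fill_value=0.0)
    return np.dstack([u, v]).astype(np.float32)     # (h, w, 2) dense flow field

# Usage (file names are placeholders for two consecutive frames on disk):
# flow = sparse_to_dense_flow(cv2.imread("frame0.png"), cv2.imread("frame1.png"))
```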
Reinforcement Learning with Low-Complexity Liquid State Machines
Title | Reinforcement Learning with Low-Complexity Liquid State Machines |
Authors | Wachirawit Ponghiran, Gopalakrishnan Srinivasan, Kaushik Roy |
Abstract | We propose reinforcement learning on simple networks consisting of random connections of spiking neurons (both recurrent and feed-forward) that can learn complex tasks with very few trainable parameters. Such sparse and randomly interconnected recurrent spiking networks exhibit highly non-linear dynamics that transform the inputs into rich high-dimensional representations based on past context. The random input representations can be efficiently interpreted by an output (or readout) layer with trainable parameters. Systematic initialization of the random connections and training of the readout layer using the Q-learning algorithm enable such small random spiking networks to learn optimally and achieve the same learning efficiency as humans on complex reinforcement learning tasks like Atari games. The spike-based approach using small random recurrent networks provides a computationally efficient alternative to state-of-the-art deep reinforcement learning networks with several layers of trainable parameters. The low-complexity spiking networks can lead to improved energy efficiency in event-driven neuromorphic hardware for complex reinforcement learning tasks. |
Tasks | Atari Games, Q-Learning |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01695v1 |
PDF | https://arxiv.org/pdf/1906.01695v1.pdf |
PWC | https://paperswithcode.com/paper/reinforcement-learning-with-low-complexity |
Repo | |
Framework | |
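A hedged sketch of the mechanics described above (not the paper's Atari setup): a fixed, randomly connected spiking reservoir turns observations into a high-dimensional filtered-spike state, and only a linear Q readout is trained with the standard Q-learning update. Connectivity densities, neuron model details, and the toy rollout below are illustrative assumptions.

```python
# Hedged sketch: fixed random LIF-style reservoir + trainable linear Q readout.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, n_act = 8, 200, 4

W_in = rng.normal(0, 1.0, (n_res, n_in)) * (rng.random((n_res, n_in)) < 0.3)
W_res = rng.normal(0, 0.5, (n_res, n_res)) * (rng.random((n_res, n_res)) < 0.1)
W_out = np.zeros((n_act, n_res))            # the only trainable parameters

v = np.zeros(n_res)                         # membrane potentials
trace = np.zeros(n_res)                     # low-pass filtered spikes (readout state)
spikes = np.zeros(n_res)

def reservoir_step(obs, tau=0.9, v_th=1.0):
    """One LIF-style reservoir update; returns the readout state."""
    global v, trace, spikes
    v = tau * v + W_in @ obs + W_res @ spikes
    spikes = (v > v_th).astype(float)
    v = np.where(spikes > 0, 0.0, v)        # reset neurons that fired
    trace = 0.8 * trace + spikes
    return trace

def q_update(state, action, reward, next_state, alpha=0.01, gamma=0.99):
    """Standard Q-learning update applied to the linear readout only."""
    td_target = reward + gamma * np.max(W_out @ next_state)
    td_error = td_target - (W_out @ state)[action]
    W_out[action] += alpha * td_error * state

# Toy rollout on random observations/rewards, just to show the loop structure.
state = reservoir_step(rng.normal(size=n_in))
for _ in range(100):
    action = int(np.argmax(W_out @ state)) if rng.random() > 0.1 else int(rng.integers(n_act))
    obs, reward = rng.normal(size=n_in), rng.normal()
    next_state = reservoir_step(obs)
    q_update(state, action, reward, next_state)
    state = next_state
```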
Provably Efficient Q-Learning with Low Switching Cost
Title | Provably Efficient Q-Learning with Low Switching Cost |
Authors | Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang |
Abstract | We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change their exploration policy as infrequently as possible during regret minimization. This is motivated by the difficulty of running fully adaptive algorithms in real-world applications (such as medical domains), and we propose to quantify adaptivity using the notion of local switching cost. Our main contribution, Q-Learning with UCB2 exploration, is a model-free algorithm for the H-step episodic MDP that achieves sublinear regret whose local switching cost in K episodes is $O(H^3SA\log K)$, and we provide a lower bound of $\Omega(HSA)$ on the local switching cost for any no-regret algorithm. Our algorithm can be naturally adapted to the concurrent setting, which yields nontrivial results that improve upon prior work in certain aspects. |
Tasks | Q-Learning |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.12849v3 |
PDF | https://arxiv.org/pdf/1905.12849v3.pdf |
PWC | https://paperswithcode.com/paper/provably-efficient-q-learning-with-low |
Repo | |
Framework | |
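A hedged sketch of the algorithmic flavor (not the paper's exact UCB2 schedule, constants, or analysis): tabular episodic Q-learning with a count-based UCB bonus, in which the acting policy is a frozen copy of the Q table refreshed only at exponentially spaced episode indices, so the number of policy switches grows logarithmically in K. The environment is assumed to follow the classic Gym API with discrete states and actions.

```python
# Hedged sketch: episodic tabular Q-learning with a UCB bonus and a
# low-switching (exponentially spaced) policy-update schedule.
import numpy as np

def low_switching_q_learning(env, n_states, n_actions, horizon, n_episodes,
                             c_bonus=1.0):
    Q = np.full((horizon, n_states, n_actions), float(horizon))  # optimistic init
    N = np.zeros((horizon, n_states, n_actions))
    Q_act = Q.copy()                          # frozen copy used for acting
    next_switch, switches = 1, 0

    for k in range(1, n_episodes + 1):
        if k >= next_switch:                  # exponentially spaced switches
            Q_act, switches = Q.copy(), switches + 1
            next_switch *= 2
        s = env.reset()                       # classic Gym API assumed
        for h in range(horizon):
            a = int(np.argmax(Q_act[h, s]))   # act with the frozen policy
            s_next, r, done, _ = env.step(a)
            N[h, s, a] += 1
            alpha = (horizon + 1) / (horizon + N[h, s, a])
            bonus = c_bonus * np.sqrt(horizon ** 3 * np.log(n_episodes) / N[h, s, a])
            future = np.max(Q[h + 1, s_next]) if h + 1 < horizon else 0.0
            Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * min(r + bonus + future, horizon)
            s = s_next
            if done:
                break
    return Q, switches                        # switches grows like O(log n_episodes)
```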
Performance of Q-learning with Linear Function Approximation: Stability and Finite-Time Analysis
Title | Performance of Q-learning with Linear Function Approximation: Stability and Finite-Time Analysis |
Authors | Zaiwei Chen, Sheng Zhang, Thinh T. Doan, Siva Theja Maguluri, John-Paul Clarke |
Abstract | In this paper, we consider the model-free reinforcement learning problem and study the popular Q-learning algorithm with linear function approximation for estimating the optimal policy. Despite its popularity, it is known that Q-learning with linear function approximation may diverge in general due to off-policy sampling. Our main contribution is to provide a finite-time bound and the convergence rate on the performance of Q-learning with linear function approximation under an assumption on the behavior policy. Unlike some prior work in the literature, we do not need to make the unnatural assumption that the samples are i.i.d. (since they are Markovian), and do not require an additional projection step in the algorithm. To show this result, we first consider a more general nonlinear stochastic approximation algorithm with Markovian noise, and derive a finite-time bound on the mean-square error, which we believe is of independent interest. Our proof is based on Lyapunov drift arguments and exploits the geometric mixing of the underlying Markov chain. We also provide numerical simulations to illustrate the effectiveness of our assumption on the behavior policy, and demonstrate the rate of convergence of Q-learning with linear function approximation. |
Tasks | Q-Learning |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11425v3 |
PDF | https://arxiv.org/pdf/1905.11425v3.pdf |
PWC | https://paperswithcode.com/paper/finite-time-analysis-of-q-learning-with |
Repo | |
Framework | |
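A minimal sketch of the algorithm being analysed: semi-gradient Q-learning with a linear parameterisation Q(s, a) ≈ φ(s, a)ᵀw, run on a single Markovian trajectory under a fixed behavior policy (no i.i.d. assumption, no projection step), matching the setting studied. The environment, feature map, and behavior policy below are toy placeholders.

```python
# Hedged sketch of Q-learning with linear function approximation on one
# Markovian trajectory under a fixed behavior policy; all quantities are toys.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, d = 10, 3, 6
Phi = rng.normal(size=(n_states, n_actions, d)) / np.sqrt(d)       # feature map phi(s, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # transition kernel
R = rng.random((n_states, n_actions))                               # rewards

def behavior_policy(s):                      # fixed, sufficiently exploratory
    return int(rng.integers(n_actions))

w = np.zeros(d)
gamma, alpha = 0.9, 0.05
s = 0
for t in range(20000):                       # one long Markovian trajectory
    a = behavior_policy(s)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]
    q_next = Phi[s_next] @ w                 # Q(s', .) for all actions
    td_error = r + gamma * np.max(q_next) - Phi[s, a] @ w
    w += alpha * td_error * Phi[s, a]        # semi-gradient Q-learning update
    s = s_next

print("learned weights:", np.round(w, 3))
```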
Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation
Title | Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation |
Authors | Tianyu He, Xu Tan, Tao Qin |
Abstract | Neural machine translation (NMT) typically adopts the encoder-decoder framework. A good understanding of the characteristics and functionalities of the encoder and decoder can help to explain the pros and cons of the framework, and design better models for NMT. In this work, we conduct an empirical study on the encoder and the decoder in NMT, taking Transformer as an example. We find that 1) the decoder handles an easier task than the encoder in NMT, 2) the decoder is more sensitive to the input noise than the encoder, and 3) the preceding words/tokens in the decoder provide strong conditional information, which accounts for the two observations above. We hope those observations can shed light on the characteristics of the encoder and decoder and inspire future research on NMT. |
Tasks | Machine Translation |
Published | 2019-08-17 |
URL | https://arxiv.org/abs/1908.06259v1 |
PDF | https://arxiv.org/pdf/1908.06259v1.pdf |
PWC | https://paperswithcode.com/paper/hard-but-robust-easy-but-sensitive-how |
Repo | |
Framework | |
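A hedged sketch of the kind of probe behind the noise-sensitivity finding: inject identical noise into either the encoder input or the decoder input of a Transformer and compare how much the output shifts. The paper measures this on trained NMT models with translation quality; here an untrained `torch.nn.Transformer` on random embeddings only illustrates the protocol.

```python
# Hedged sketch of an encoder- vs decoder-side noise probe on an untrained
# Transformer; this shows the measurement idea, not the paper's results.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Transformer(d_model=64, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, dim_feedforward=128)
model.eval()

src = torch.randn(10, 8, 64)       # (src_len, batch, d_model)
tgt = torch.randn(12, 8, 64)       # (tgt_len, batch, d_model)
noise_src = 0.1 * torch.randn_like(src)
noise_tgt = 0.1 * torch.randn_like(tgt)

with torch.no_grad():
    clean = model(src, tgt)
    enc_noised = model(src + noise_src, tgt)          # perturb encoder side
    dec_noised = model(src, tgt + noise_tgt)          # perturb decoder side

print("output shift, encoder-side noise:", (enc_noised - clean).norm().item())
print("output shift, decoder-side noise:", (dec_noised - clean).norm().item())
```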
Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples
Title | Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples |
Authors | Kamil Nar, Orhan Ocal, S. Shankar Sastry, Kannan Ramchandran |
Abstract | State-of-the-art neural networks are vulnerable to adversarial examples; they can easily misclassify inputs that are imperceptibly different from their training and test data. In this work, we establish that the use of the cross-entropy loss function and the low-rank features of the training data have responsibility for the existence of these inputs. Based on this observation, we suggest that addressing adversarial examples requires rethinking the use of the cross-entropy loss function and looking for an alternative that is more suited for minimization with low-rank features. In this direction, we present a training scheme called differential training, which uses a loss function defined on the differences between the features of points from opposite classes. We show that differential training can ensure a large margin between the decision boundary of the neural network and the points in the training dataset. This larger margin increases the amount of perturbation needed to flip the prediction of the classifier and makes it harder to find an adversarial example with small perturbations. We test differential training on a binary classification task with the CIFAR-10 dataset and demonstrate that it radically reduces the ratio of images for which an adversarial example could be found – not only in the training dataset, but in the test dataset as well. |
Tasks | |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1901.08360v1 |
PDF | http://arxiv.org/pdf/1901.08360v1.pdf |
PWC | https://paperswithcode.com/paper/cross-entropy-loss-and-low-rank-features-have |
Repo | |
Framework | |
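A hedged sketch of the core idea described in the abstract above: the loss is applied to the difference between features of points from opposite classes rather than to per-sample logits. The architecture, pairing strategy, and loss details below are illustrative stand-ins, not the paper's exact recipe.

```python
# Hedged sketch of a "differential training" objective: logistic loss on
# w^T (f(x_pos) - f(x_neg)) for pairs of points from opposite classes.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
feature_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(),
                            nn.Linear(128, 64))
w = nn.Linear(64, 1, bias=False)             # linear classifier on feature differences
opt = torch.optim.SGD(list(feature_net.parameters()) + list(w.parameters()), lr=0.01)

def differential_loss(x_pos, x_neg):
    """Logistic loss on w^T (f(x_pos) - f(x_neg)); pushes for a class margin."""
    diff = feature_net(x_pos) - feature_net(x_neg)
    return F.softplus(-w(diff)).mean()       # = mean log(1 + exp(-w^T diff))

# Toy loop on random stand-in batches for the two classes of a binary task.
for step in range(100):
    x_pos = torch.randn(32, 3, 32, 32)       # placeholder for class-0 images
    x_neg = torch.randn(32, 3, 32, 32)       # placeholder for class-1 images
    loss = differential_loss(x_pos, x_neg)
    opt.zero_grad()
    loss.backward()
    opt.step()

# At test time one simple option is to classify by comparing w^T f(x) against a
# threshold chosen on the training set (the exact rule in the paper may differ).
```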
Evaluation Uncertainty in Data-Driven Self-Driving Testing
Title | Evaluation Uncertainty in Data-Driven Self-Driving Testing |
Authors | Zhiyuan Huang, Mansur Arief, Henry Lam, Ding Zhao |
Abstract | Safety evaluation of self-driving technologies has been extensively studied. One recent approach uses Monte Carlo based evaluation to estimate the occurrence probabilities of safety-critical events as safety measures. These Monte Carlo samples are generated from stochastic input models constructed based on real-world data. In this paper, we propose an approach to assess the impact on the probability estimates from the evaluation procedures due to the estimation error caused by data variability. Our proposed method merges the classical bootstrap method for estimating input uncertainty with a likelihood ratio based scheme to reuse experiment outputs. This approach is economical and efficient in terms of implementation costs in assessing input uncertainty for the evaluation of self-driving technology. We use an example in autonomous vehicle (AV) safety evaluation to demonstrate the proposed approach as a diagnostic tool for the quality of the fitted input model. |
Tasks | Autonomous Vehicles |
Published | 2019-04-19 |
URL | https://arxiv.org/abs/1904.09306v2 |
PDF | https://arxiv.org/pdf/1904.09306v2.pdf |
PWC | https://paperswithcode.com/paper/190409306 |
Repo | |
Framework | |
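A hedged numeric sketch of the idea above: estimate an event probability by Monte Carlo under a fitted input model, then assess input (data) uncertainty by bootstrapping the data, refitting the model, and reweighting the already-simulated outputs with likelihood ratios instead of rerunning the simulator. A 1-D Gaussian input model and a simple threshold event stand in for the paper's driving-scenario models.

```python
# Hedged sketch: bootstrap input uncertainty with likelihood-ratio reuse of
# one fixed batch of Monte Carlo simulations (toy 1-D Gaussian input model).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

data = rng.normal(1.0, 2.0, size=500)          # observed "real-world" input data
mu_hat, sigma_hat = data.mean(), data.std(ddof=1)

# One (expensive, reused) batch of simulations under the fitted input model.
n_sim = 50_000
x = rng.normal(mu_hat, sigma_hat, size=n_sim)
event = (x > 6.0).astype(float)                # stand-in safety-critical event
p_hat = event.mean()

# Bootstrap the input data; for each refit, reuse the simulations via
# likelihood ratios w_i = f_boot(x_i) / f_hat(x_i) instead of resimulating.
boot_estimates = []
for _ in range(500):
    resample = rng.choice(data, size=data.size, replace=True)
    mu_b, sigma_b = resample.mean(), resample.std(ddof=1)
    w = stats.norm.pdf(x, mu_b, sigma_b) / stats.norm.pdf(x, mu_hat, sigma_hat)
    boot_estimates.append(np.mean(w * event))

lo, hi = np.percentile(boot_estimates, [2.5, 97.5])
print(f"point estimate: {p_hat:.2e}, 95% input-uncertainty interval: [{lo:.2e}, {hi:.2e}]")
```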
Improving LSTM Neural Networks for Better Short-Term Wind Power Predictions
Title | Improving LSTM Neural Networks for Better Short-Term Wind Power Predictions |
Authors | Maximilian Du |
Abstract | This paper improves wind power prediction via weather forecast-contextualized Long Short-Term Memory Neural Network (LSTM) models. Initially, only wind power data was fed to a generic LSTM, but this model performed poorly, with erratic and naive behavior observed on even low-variance data sections. To address this issue, weather forecast data was added to better contextualize the power data, and LSTM modifications were made to address specific model shortcomings. These models were tested through both a Normalized Mean Absolute Error and the Naive Ratio (NR), which is a score introduced by this paper to quantify the unwanted presence of naive character in trained models. Results showed an increased accuracy with the addition of weather forecast data on the modified models, as well as a decrease in naive character. Key contributions include improved LSTM variants, the use of weather forecast data, and the introduction of a new model performance index. |
Tasks | |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00489v2 |
PDF | https://arxiv.org/pdf/1907.00489v2.pdf |
PWC | https://paperswithcode.com/paper/improving-lstm-neural-networks-for-better |
Repo | |
Framework | |
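A hedged sketch of the forecast-contextualised setup: an LSTM whose input at each time step concatenates the past power value with weather-forecast covariates. The paper's specific LSTM modifications and its Naive Ratio metric are not reproduced here (the abstract does not define them), and the feature set and window length below are illustrative assumptions.

```python
# Hedged sketch: LSTM fed with [past power, weather forecast] at every step.
import torch
import torch.nn as nn

class ForecastContextLSTM(nn.Module):
    def __init__(self, n_weather_features=4, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1 + n_weather_features,
                            hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, past_power, weather_forecast):
        # past_power:       (batch, seq_len, 1)
        # weather_forecast: (batch, seq_len, n_weather_features)
        x = torch.cat([past_power, weather_forecast], dim=-1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])         # next-step power prediction

model = ForecastContextLSTM()
power = torch.randn(8, 48, 1)                # e.g. 48 past hours of power
weather = torch.randn(8, 48, 4)              # e.g. wind speed/direction, temperature, pressure
pred = model(power, weather)
loss = nn.functional.l1_loss(pred, torch.randn(8, 1))   # MAE-style objective (the paper reports a normalized MAE)
```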
MoGlow: Probabilistic and controllable motion synthesis using normalising flows
Title | MoGlow: Probabilistic and controllable motion synthesis using normalising flows |
Authors | Gustav Eje Henter, Simon Alexanderson, Jonas Beskow |
Abstract | Data-driven modelling and synthesis of motion is an active research area with applications that include animation, games, and social robotics. This paper introduces a new class of probabilistic, generative, and controllable motion-data models based on normalising flows. Models of this kind can describe highly complex distributions, yet can be trained efficiently using exact maximum likelihood, unlike GANs or VAEs. Our proposed model is autoregressive and uses LSTMs to enable arbitrarily long time-dependencies. Importantly, it is also causal, meaning that each pose in the output sequence is generated without access to poses or control inputs from future time steps; this absence of algorithmic latency is important for interactive applications with real-time motion control. The approach can in principle be applied to any type of motion since it does not make restrictive assumptions such as the motion being cyclic in nature. We evaluate the models on motion-capture datasets of human and quadruped locomotion. Objective and subjective results show that randomly-sampled motion from the proposed method attains a motion quality close to recorded motion capture for both humans and animals. |
Tasks | Motion Capture, Normalising Flows |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06598v2 |
PDF | https://arxiv.org/pdf/1905.06598v2.pdf |
PWC | https://paperswithcode.com/paper/moglow-probabilistic-and-controllable-motion |
Repo | |
Framework | |
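A hedged, heavily simplified sketch of one ingredient of such a model: a single affine coupling layer whose scale and shift are conditioned on an LSTM summary of past poses and control inputs, which is what makes sampling autoregressive and causal. The full model stacks many such layers together with actnorm and invertible 1x1 convolutions (Glow); none of that is shown here, and all dimensions below are illustrative.

```python
# Hedged sketch: one LSTM-conditioned affine coupling layer (a single
# ingredient of a MoGlow-style flow, not the full model).
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    def __init__(self, pose_dim, context_dim, hidden=128):
        super().__init__()
        self.half = pose_dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (pose_dim - self.half)))

    def forward(self, x, context):
        """Map pose x to latent z given the LSTM context; return z and log|det J|."""
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(torch.cat([x1, context], dim=-1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)                  # keep scales well-behaved
        z2 = x2 * torch.exp(log_s) + t
        return torch.cat([x1, z2], dim=-1), log_s.sum(dim=-1)

    def inverse(self, z, context):
        """Used at synthesis time: sample z ~ N(0, I) and invert to a pose."""
        z1, z2 = z[:, :self.half], z[:, self.half:]
        log_s, t = self.net(torch.cat([z1, context], dim=-1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)
        x2 = (z2 - t) * torch.exp(-log_s)
        return torch.cat([z1, x2], dim=-1)

pose_dim, ctrl_dim = 63, 3                         # illustrative sizes
lstm = nn.LSTM(input_size=pose_dim + ctrl_dim, hidden_size=64, batch_first=True)
coupling = ConditionalAffineCoupling(pose_dim, context_dim=64)

history = torch.randn(8, 10, pose_dim + ctrl_dim)  # past poses + control signal
context = lstm(history)[0][:, -1]                  # causal summary of the past
z, logdet = coupling(torch.randn(8, pose_dim), context)          # placeholder "current pose" batch
next_pose = coupling.inverse(torch.randn(8, pose_dim), context)  # one sampled pose
```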
Evaluating Explainers via Perturbation
Title | Evaluating Explainers via Perturbation |
Authors | Minh N. Vu, Truc D. Nguyen, NhatHai Phan, Ralucca Gera, My T. Thai |
Abstract | Due to the high complexity of many modern machine learning models, such as deep convolutional networks, understanding the cause of a model’s prediction is critical. Many explainers have been designed to give us more insights on the decision of complex classifiers. However, there is no common ground on evaluating the quality of different explanation methods. Motivated by the need for comprehensive evaluation, we introduce the c-Eval metric and the corresponding framework to quantify the quality of feature-based explainers for machine learning image classifiers. Given a prediction and the corresponding explanation on that prediction, c-Eval is the minimum-power perturbation that successfully alters the prediction while keeping the explanation’s features unchanged. We also provide theoretical analysis linking the proposed parameter with the portion of the predicted object covered by the explanation. Using a heuristic approach, we introduce the c-Eval plot, which not only displays a strong connection between c-Eval and explainers’ quality, but also serves as a low-complexity approach to assessing explainers. We finally conduct extensive experiments of explainers on three different datasets in order to support the adoption of c-Eval in evaluating explainers’ performance. |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02032v1 |
PDF | https://arxiv.org/pdf/1906.02032v1.pdf |
PWC | https://paperswithcode.com/paper/evaluating-explainers-via-perturbation |
Repo | |
Framework | |
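A hedged sketch of the flavor of the quantity defined above: given a classifier, an input, and an explanation mask, search for a small perturbation restricted to the features outside the explanation that flips the prediction; the smallest such perturbation budget is a proxy for c-Eval. Simple projected gradient steps with a growing L2 budget stand in for the paper's exact formulation and heuristic, and the random CNN is a placeholder.

```python
# Hedged sketch: smallest perturbation *outside* the explanation that flips
# the prediction, as a proxy for the c-Eval quantity.
import torch
import torch.nn as nn
import torch.nn.functional as F

def c_eval_proxy(model, x, explanation_mask, budgets=(0.1, 0.2, 0.5, 1.0, 2.0),
                 steps=100, lr=0.05):
    """Return the smallest L2 budget at which the label flips, or inf."""
    model.eval()
    orig_label = model(x).argmax(dim=1)
    outside = 1.0 - explanation_mask               # only perturb non-explained features
    for eps in budgets:
        delta = torch.zeros_like(x, requires_grad=True)
        for _ in range(steps):
            logits = model(x + delta * outside)
            loss = -F.cross_entropy(logits, orig_label)   # push away from original label
            loss.backward()
            with torch.no_grad():
                delta -= lr * delta.grad
                norm = delta.norm()
                if norm > eps:                     # project back onto the L2 ball
                    delta *= eps / norm
            delta.grad.zero_()
        if model(x + delta.detach() * outside).argmax(dim=1) != orig_label:
            return eps
    return float("inf")                            # explanation "covers" the prediction

# Toy usage with a random CNN and an explanation mask covering the image centre.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
x = torch.rand(1, 3, 32, 32)
mask = torch.zeros_like(x); mask[..., 8:24, 8:24] = 1.0
print("flip budget outside the explanation:", c_eval_proxy(model, x, mask))
```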
Bayesian Hierarchical Mixture Clustering using Multilevel Hierarchical Dirichlet Processes
Title | Bayesian Hierarchical Mixture Clustering using Multilevel Hierarchical Dirichlet Processes |
Authors | Weipeng Huang, Nishma Laitonjam, Guangyuan Piao, Neil Hurley |
Abstract | This paper focuses on the problem of hierarchical non-overlapping clustering of a dataset. In such a clustering, each data item is associated with exactly one leaf node and each internal node is associated with all the data items stored in the sub-tree beneath it, so that each level of the hierarchy corresponds to a partition of the dataset. We develop a novel Bayesian nonparametric method combining the nested Chinese Restaurant Process (nCRP) and the Hierarchical Dirichlet Process (HDP). Compared with other existing Bayesian approaches, our solution tackles data with complex latent mixture features, a setting that has not been previously explored in the literature. We discuss the details of the model and the inference procedure. Furthermore, experiments on three datasets show that our method achieves solid empirical results in comparison with existing algorithms. |
Tasks | |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05022v3 |
PDF | https://arxiv.org/pdf/1905.05022v3.pdf |
PWC | https://paperswithcode.com/paper/bayesian-hierarchical-mixture-clustering |
Repo | |
Framework | |
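A hedged sketch of one building block named above, not the paper's full model: drawing root-to-leaf paths under the nested Chinese Restaurant Process (nCRP) prior, which is what assigns each data item to a leaf of the hierarchy. The paper couples this with a hierarchical Dirichlet Process over mixture components and a dedicated inference procedure, neither of which is shown here.

```python
# Hedged sketch: sampling root-to-leaf paths from an nCRP prior (one
# ingredient of the model, shown in isolation).
import numpy as np

rng = np.random.default_rng(0)

def ncrp_paths(n_items, depth, gamma=1.0):
    """Sample nCRP paths: each item makes a CRP table choice at every level."""
    paths = []
    children = {(): {}}                       # node -> {child label: customer count}
    for _ in range(n_items):
        node, path = (), []
        for _ in range(depth):
            counts = children.setdefault(node, {})
            labels = list(counts)
            weights = np.array([counts[c] for c in labels] + [gamma], dtype=float)
            choice = rng.choice(len(weights), p=weights / weights.sum())
            if choice == len(labels):         # sit at a new table -> open a new branch
                child = len(labels)
                counts[child] = 0
            else:
                child = labels[choice]
            counts[child] += 1
            node = node + (child,)
            path.append(child)
        paths.append(tuple(path))
    return paths

for p in ncrp_paths(n_items=10, depth=3):
    print(p)                                  # each tuple is a root-to-leaf path
```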
The asymptotic spectrum of the Hessian of DNN throughout training
Title | The asymptotic spectrum of the Hessian of DNN throughout training |
Authors | Arthur Jacot, Franck Gabriel, Clément Hongler |
Abstract | The dynamics of DNNs during gradient descent is described by the so-called Neural Tangent Kernel (NTK). In this article, we show that the NTK allows one to gain precise insight into the Hessian of the cost of DNNs. When the NTK is fixed during training, we obtain a full characterization of the asymptotics of the spectrum of the Hessian, at initialization and during training. In the so-called mean-field limit, where the NTK is not fixed during training, we describe the first two moments of the Hessian at initialization. |
Tasks | |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.02875v2 |
PDF | https://arxiv.org/pdf/1910.02875v2.pdf |
PWC | https://paperswithcode.com/paper/the-asymptotic-spectrum-of-the-hessian-of-dnn |
Repo | |
Framework | |
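A hedged numerical sketch of the link the paper exploits: for an MSE cost, the Gauss-Newton part of the Hessian is proportional to JᵀJ with J the parameter Jacobian of the network outputs, so its nonzero eigenvalues coincide (up to that constant) with those of the empirical NTK Gram matrix JJᵀ. The full Hessian contains an additional "functional" term analysed separately in the paper; the tiny MLP and data below are placeholders used only to check the Gauss-Newton/NTK correspondence numerically.

```python
# Hedged sketch: for C = (1/n) sum_i (f(x_i) - y_i)^2, the Gauss-Newton part
# of the Hessian is (2/n) J^T J, whose nonzero spectrum matches (2/n) times
# the spectrum of the empirical NTK Gram matrix J J^T.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(5, 32), nn.Tanh(), nn.Linear(32, 1))
params = list(net.parameters())
n = 8
X = torch.randn(n, 5)

# Per-sample Jacobian of the scalar output w.r.t. all parameters: rows of J.
rows = []
for i in range(n):
    grads = torch.autograd.grad(net(X[i:i + 1]).squeeze(), params)
    rows.append(torch.cat([g.reshape(-1) for g in grads]))
J = torch.stack(rows)                         # shape (n, n_params)

ntk_gram = J @ J.T                            # empirical NTK Gram matrix, (n, n)
gauss_newton = (2.0 / n) * J.T @ J            # Gauss-Newton part of the Hessian

eig_ntk = torch.linalg.eigvalsh(ntk_gram)
eig_gn = torch.linalg.eigvalsh(gauss_newton)

print("top NTK eigenvalues * 2/n  :", (2.0 / n) * eig_ntk.flip(0)[:5])
print("top Gauss-Newton eigenvalues:", eig_gn.flip(0)[:5])   # should match
```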