Paper Group ANR 191
Generating Counterfactual Explanations with Natural Language. Wireless Data Acquisition for Edge Learning: Data-Importance Aware Retransmission. Long Short-Term Attention. Relative concentration bounds for the spectrum of kernel matrices. Finite-time optimality of Bayesian predictors. You Only Search Once: Single Shot Neural Architecture Search via …
Generating Counterfactual Explanations with Natural Language
Title | Generating Counterfactual Explanations with Natural Language |
Authors | Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata |
Abstract | Natural language explanations of deep neural network decisions provide an intuitive way for an AI agent to articulate its reasoning process. Current textual explanations learn to discuss class-discriminative features in an image. However, it is also helpful to understand which attributes might change a classification decision if present in an image (e.g., “This is not a Scarlet Tanager because it does not have black wings.”). We call such textual explanations counterfactual explanations, and propose an intuitive method to generate them by inspecting which evidence in an input is missing, but might contribute to a different classification decision if present in the image. To demonstrate our method, we consider a fine-grained image classification task in which we take as input an image and a counterfactual class, and output text explaining why the image does not belong to the counterfactual class. We then analyze our generated counterfactual explanations both qualitatively and quantitatively using proposed automatic metrics. [A toy sketch follows this entry.] |
Tasks | Fine-Grained Image Classification, Image Classification |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.09809v1 |
http://arxiv.org/pdf/1806.09809v1.pdf | |
PWC | https://paperswithcode.com/paper/generating-counterfactual-explanations-with |
Repo | |
Framework | |
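The evidence-selection step can be pictured in a few lines. The following is a minimal sketch, not the authors' implementation: the attribute vocabulary, the class profile for “Scarlet Tanager”, and the scoring rule (class relevance times predicted absence) are all illustrative assumptions.

```python
import numpy as np

# Hypothetical attribute vocabulary and a per-class attribute profile.
ATTRS = ["black wings", "red body", "white belly", "long beak"]
CLASS_ATTRS = {
    "Scarlet Tanager": np.array([0.90, 0.95, 0.05, 0.10]),  # attribute relevance
}

def counterfactual_evidence(img_attr_probs, cf_class, k=1):
    """Rank attributes the counterfactual class needs but the image lacks:
    high class relevance combined with low predicted presence."""
    relevance = CLASS_ATTRS[cf_class]
    absence = 1.0 - img_attr_probs          # how strongly each attribute is missing
    top = np.argsort(relevance * absence)[::-1][:k]
    return [ATTRS[i] for i in top]

# Attribute-detector output for an image that clearly lacks black wings.
img_attr_probs = np.array([0.05, 0.90, 0.20, 0.30])
missing = counterfactual_evidence(img_attr_probs, "Scarlet Tanager")
print(f"This is not a Scarlet Tanager because it does not have {missing[0]}.")
```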
Wireless Data Acquisition for Edge Learning: Data-Importance Aware Retransmission
Title | Wireless Data Acquisition for Edge Learning: Data-Importance Aware Retransmission |
Authors | Dongzhu Liu, Guangxu Zhu, Jun Zhang, Kaibin Huang |
Abstract | By deploying machine-learning algorithms at the network edge, edge learning can leverage the enormous real-time data generated by billions of mobile devices to train AI models, which enable intelligent mobile applications. In this emerging research area, one key direction is to efficiently utilize radio resources for wireless data acquisition so as to minimize the latency of executing a learning task at an edge server. Along this direction, we consider the specific problem of making a retransmission decision in each communication round to ensure both the reliability and the quantity of the training data, thereby accelerating model convergence. To solve the problem, a new retransmission protocol called data-importance aware automatic-repeat-request (importance ARQ) is proposed. Unlike classic ARQ, which focuses merely on reliability, importance ARQ selectively retransmits a data sample based on its uncertainty, which reflects how much the sample helps learning and can be measured using the model under training. Underpinning the proposed protocol is an elegant communication-learning relation we derive between two corresponding metrics, namely signal-to-noise ratio (SNR) and data uncertainty. This relation facilitates the design of a simple threshold-based policy for importance ARQ. The policy is first derived for the classic support vector machine (SVM) classifier, where the uncertainty of a data sample is measured by its distance to the decision boundary. The policy is then extended to the more complex convolutional neural network (CNN) model, where data uncertainty is measured by entropy. Extensive experiments have been conducted for both the SVM and the CNN using real datasets with balanced and imbalanced distributions. The results demonstrate that importance ARQ effectively copes with channel fading and noise in wireless data acquisition, achieving faster model convergence than conventional channel-aware ARQ. [A toy sketch follows this entry.] |
Tasks | |
Published | 2018-12-05 |
URL | http://arxiv.org/abs/1812.02030v2 |
http://arxiv.org/pdf/1812.02030v2.pdf | |
PWC | https://paperswithcode.com/paper/wireless-data-acquisition-for-edge-learning |
Repo | |
Framework | |
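The CNN variant of the threshold policy can be sketched as follows. This is a toy reading of the protocol, not the paper's algorithm: the entropy threshold, the SNR target, and the +3 dB-per-retransmission channel model are invented for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a class-probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def importance_arq(class_probs, snr_db, snr_target_db=10.0,
                   entropy_threshold=0.5, max_retx=4):
    """Decide how many retransmissions to spend on one training sample:
    only uncertain samples (high entropy under the current model) are
    worth pushing toward the target SNR."""
    if entropy(class_probs) < entropy_threshold:
        return 0                       # confident sample: accept as received
    n = 0
    while snr_db < snr_target_db and n < max_retx:
        n += 1                         # retransmit and combine
        snr_db += 3.0                  # toy channel model: ~+3 dB per combining round
    return n

print(importance_arq(np.array([0.95, 0.03, 0.02]), snr_db=4.0))  # 0: sample is easy
print(importance_arq(np.array([0.40, 0.35, 0.25]), snr_db=4.0))  # 2: uncertain, boost SNR
```

The contrast with classic ARQ is the early return: a confidently classified sample is accepted regardless of channel quality, saving transmissions for the uncertain samples that drive convergence.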
Long Short-Term Attention
Title | Long Short-Term Attention |
Authors | Guoqiang Zhong, Xin Lin, Kang Chen, Qingyang Li, Kaizhu Huang |
Abstract | Attention is an important cognitive process in humans, helping them concentrate on critical information during perception and learning. However, although many machine learning models can remember information in data, they lack an attention mechanism. For example, the long short-term memory (LSTM) network can remember sequential information, but it cannot pay special attention to parts of a sequence. In this paper, we present a novel model called long short-term attention (LSTA), which seamlessly integrates the attention mechanism into the inner cell of LSTM. Beyond modeling long- and short-term dependencies, LSTA can focus on important information in the sequences via the attention mechanism. Extensive experiments demonstrate that LSTA outperforms LSTM and related models on sequence learning tasks. [A toy sketch follows this entry.] |
Tasks | |
Published | 2018-10-30 |
URL | https://arxiv.org/abs/1810.12752v2 |
https://arxiv.org/pdf/1810.12752v2.pdf | |
PWC | https://paperswithcode.com/paper/long-short-term-attention |
Repo | |
Framework | |
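One plausible way to fold an attention gate into an LSTM cell is sketched below. The exact LSTA equations are in the paper; treat the fifth gate `a` and its placement on the candidate input as assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lsta_cell(x, h, c, W, U, b):
    """One step of an LSTM-like cell with an extra attention gate 'a' that
    re-weights the candidate input before it is written to the cell state.
    W, U, b stack the parameters of the five gates (i, f, o, g, a)."""
    z = W @ x + U @ h + b
    i, f, o, g, a = np.split(z, 5)
    i, f, o, a = sigmoid(i), sigmoid(f), sigmoid(o), sigmoid(a)
    g = np.tanh(g)
    c_new = f * c + i * (a * g)        # attention modulates what gets written
    h_new = o * np.tanh(c_new)
    return h_new, c_new

d_in, d_h = 8, 16
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(5 * d_h, d_in))
U = rng.normal(scale=0.1, size=(5 * d_h, d_h))
h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lsta_cell(rng.normal(size=d_in), h, c, W, U, np.zeros(5 * d_h))
```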
Relative concentration bounds for the spectrum of kernel matrices
Title | Relative concentration bounds for the spectrum of kernel matrices |
Authors | Ernesto Araya Valdivia |
Abstract | In this paper, we study some concentration properties of the kernel matrix associated with a kernel function. More specifically, we derive concentration inequalities for the spectrum of a kernel matrix, quantifying its deviation from the spectrum of an associated integral operator. The main difference from most results in the literature is that we do not assume positive definiteness of the kernel. Instead, we introduce Sobolev-type hypotheses on the regularity of the kernel. We show that these assumptions are well suited to the study of kernels depending only on the distance between two points in a metric space, in which case the regularity depends only on the decay of the eigenvalues. This is connected with random geometric graphs, which we study further, giving explicit formulas for the spectrum and its fluctuations. [A toy sketch follows this entry.] |
Tasks | |
Published | 2018-12-05 |
URL | http://arxiv.org/abs/1812.02108v6 |
http://arxiv.org/pdf/1812.02108v6.pdf | |
PWC | https://paperswithcode.com/paper/relative-concentration-bounds-for-the |
Repo | |
Framework | |
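The phenomenon being quantified can be observed numerically: as n grows, the eigenvalues of the normalized kernel matrix K/n stabilize around those of the integral operator. A minimal sketch, assuming points drawn uniformly from the sphere and a distance-based (here Gaussian) kernel:

```python
import numpy as np

rng = np.random.default_rng(0)

def top_eigs(n, k=4):
    """Top-k eigenvalues of the normalized kernel matrix K/n for a Gaussian
    kernel of the distance between points drawn uniformly on the sphere S^2."""
    x = rng.normal(size=(n, 3))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    sq_dist = 2.0 - 2.0 * (x @ x.T)          # ||u - v||^2 for unit vectors
    K = np.exp(-sq_dist)                     # kernel depends only on the distance
    vals = np.linalg.eigvalsh(K / n)         # eigvalsh: K is symmetric
    return np.sort(vals)[::-1][:k]

for n in (100, 400, 1600):
    print(n, np.round(top_eigs(n), 4))       # rows stabilize as n grows
```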
Finite-time optimality of Bayesian predictors
Title | Finite-time optimality of Bayesian predictors |
Authors | Daniil Ryabko |
Abstract | The problem of sequential probability forecasting is considered in the most general setting: a model set C is given, and it is required to predict as well as possible if any of the measures (environments) in C is chosen to generate the data. No assumptions whatsoever are made on the model class C; in particular, there are no independence or mixing assumptions, C may not be measurable, there may be no predictor whose loss is sublinear, etc. It is shown that the cumulative loss of any possible predictor can be matched by that of a Bayesian predictor whose prior is discrete and concentrated on C, up to an additive term of order $\log n$, where $n$ is the time step. The bound holds for every $n$ and every measure in C. This is the first non-asymptotic result of this kind. In addition, a non-matching lower bound is established: it goes to infinity with $n$, but may do so arbitrarily slowly. [A toy sketch follows this entry.] |
Tasks | |
Published | 2018-12-20 |
URL | https://arxiv.org/abs/1812.08292v2 |
https://arxiv.org/pdf/1812.08292v2.pdf | |
PWC | https://paperswithcode.com/paper/finite-time-optimality-of-bayesian-predictors |
Repo | |
Framework | |
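A toy instance of the setting, assuming a finite set C of Bernoulli environments with a uniform discrete prior (the paper handles arbitrary C; everything here is a simplification):

```python
import numpy as np

thetas = np.array([0.1, 0.3, 0.5, 0.7, 0.9])     # toy model set C: Bernoulli biases
prior = np.full(len(thetas), 1.0 / len(thetas))  # discrete prior concentrated on C

def bayes_predict(history):
    """Bayesian mixture: posterior-weighted probability of the next symbol."""
    ones = int(np.sum(history)); zeros = len(history) - ones
    loglik = ones * np.log(thetas) + zeros * np.log(1.0 - thetas)
    post = prior * np.exp(loglik - loglik.max())
    post /= post.sum()
    return float(post @ thetas)

rng = np.random.default_rng(1)
data = (rng.random(500) < 0.7).astype(int)       # true environment: theta = 0.7

loss = 0.0
for t in range(len(data)):
    p = bayes_predict(data[:t])
    loss -= np.log(p if data[t] else 1.0 - p)

best = min(-np.sum(np.where(data == 1, np.log(th), np.log(1 - th))) for th in thetas)
print(f"Bayes mixture loss {loss:.2f} vs. best predictor in C {best:.2f}")
```

On a run of this length the mixture's cumulative log-loss stays within a small additive gap of the best predictor in C, which is the kind of behaviour the theorem bounds.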
You Only Search Once: Single Shot Neural Architecture Search via Direct Sparse Optimization
Title | You Only Search Once: Single Shot Neural Architecture Search via Direct Sparse Optimization |
Authors | Xinbang Zhang, Zehao Huang, Naiyan Wang |
Abstract | Recently, Neural Architecture Search (NAS) has aroused great interest in both academia and industry; however, it remains challenging because of its huge and non-continuous search space. Instead of applying evolutionary algorithms or reinforcement learning as in previous works, this paper proposes a Direct Sparse Optimization NAS (DSO-NAS) method. In DSO-NAS, we take a novel model-pruning view of the NAS problem. Specifically, we start from a completely connected block, and then introduce scaling factors to scale the information flow between operations. Next, we impose sparse regularization to prune useless connections in the architecture. Lastly, we derive an efficient and theoretically sound optimization method to solve the resulting problem. Our method enjoys the advantages of both differentiability and efficiency, and can therefore be directly applied to large datasets like ImageNet. On the CIFAR-10 dataset, DSO-NAS achieves an average test error of 2.84%, while on the ImageNet dataset it achieves 25.4% test error under 600M FLOPs with 8 GPUs in 18 hours. [A toy sketch follows this entry.] |
Tasks | Neural Architecture Search |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.01567v1 |
http://arxiv.org/pdf/1811.01567v1.pdf | |
PWC | https://paperswithcode.com/paper/you-only-search-once-single-shot-neural |
Repo | |
Framework | |
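The pruning mechanism can be illustrated with proximal gradient descent on the scaling factors; the quadratic surrogate loss below stands in for the real task loss and is purely illustrative:

```python
import numpy as np

def prox_l1(w, step, lam):
    """Soft-thresholding, the proximal operator of the L1 penalty: it drives
    scaling factors exactly to zero so the connection can be pruned."""
    return np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)

# Toy surrogate loss: pretend only connections 0 and 1 carry useful signal.
target = np.array([1.0, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
scales = np.random.default_rng(0).normal(size=8)  # one factor per candidate connection
step, lam = 0.1, 0.05
for _ in range(300):
    grad = scales - target                  # gradient of 0.5 * ||scales - target||^2
    scales = prox_l1(scales - step * grad, step, lam)

print(np.round(scales, 3))                  # useless connections land exactly at 0
```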
Measuring Issue Ownership using Word Embeddings
Title | Measuring Issue Ownership using Word Embeddings |
Authors | Amaru Cuba Gyllensten, Magnus Sahlgren |
Abstract | Sentiment and topic analysis are common methods used for social media monitoring. Essentially, these methods answer questions such as “what is being talked about, regarding X” and “what do people feel, regarding X”. In this paper, we investigate another avenue for social media monitoring, namely issue ownership and agenda setting, which are concepts from political science that have been used to explain voter choice and electoral outcomes. We argue that issue alignment and agenda setting can be seen as a kind of semantic source similarity of the form “how similar is source A to issue owner P, when talking about issue X”, and as such can be measured using word/document embedding techniques. We present work in progress towards measuring that kind of conditioned similarity, and introduce a new notion of similarity for predictive embeddings. We then test this method by measuring the similarity between politically aligned media and political parties, conditioned on bloc-specific issues. [A toy sketch follows this entry.] |
Tasks | Document Embedding, Word Embeddings |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1811.00127v1 |
http://arxiv.org/pdf/1811.00127v1.pdf | |
PWC | https://paperswithcode.com/paper/measuring-issue-ownership-using-word |
Repo | |
Framework | |
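One hypothetical reading of "conditioned similarity" in code. The paper's actual notion for predictive embeddings differs; the top-k conditioning rule and all data below are invented for illustration.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def conditioned_similarity(source_vecs, owner_vecs, issue_vec, top_k=50):
    """Hypothetical conditioned similarity: 'how similar is source A to issue
    owner P when talking about issue X' - compare the two, restricted to the
    source words that are most on-issue."""
    on_issue = np.argsort(source_vecs @ issue_vec)[::-1][:top_k]
    src = source_vecs[on_issue].mean(axis=0)     # issue-conditioned source vector
    own = owner_vecs.mean(axis=0)                # issue-owner centroid
    return float(unit(src) @ unit(own))

rng = np.random.default_rng(0)
media = unit(rng.normal(size=(1000, 50)))        # word vectors from a media source
party = unit(rng.normal(size=(200, 50)))         # word vectors from a party's texts
issue = unit(rng.normal(size=50))                # embedding of a bloc-specific issue
print(conditioned_similarity(media, party, issue))
```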
Improved Person Detection on Omnidirectional Images with Non-maxima Suppression
Title | Improved Person Detection on Omnidirectional Images with Non-maxima Suppression |
Authors | Roman Seidel, André Apitzsch, Gangolf Hirtz |
Abstract | We propose a person detector for omnidirectional images, an accurate method to generate minimal enclosing rectangles of persons. The basic idea is to adapt the qualitative detection performance of a convolutional neural network based method, namely YOLOv2, to fish-eye images. The design of our approach picks up the idea of a state-of-the-art object detector and uses highly overlapping image areas together with their regions of interest; this overlap reduces the number of false negatives. Based on the raw bounding boxes of the detector, we refined overlapping bounding boxes with three approaches: non-maximum suppression, soft non-maximum suppression, and soft non-maximum suppression with Gaussian smoothing. The evaluation was done on the PIROPO database and our own annotated Flat dataset, supplemented with bounding boxes on omnidirectional images. We achieve an average precision of 64.4% with YOLOv2 for the class person on PIROPO and 77.6% on Flat, using the fine-tuned soft non-maximum suppression with Gaussian smoothing. [A toy sketch follows this entry.] |
Tasks | Human Detection |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08503v4 |
http://arxiv.org/pdf/1805.08503v4.pdf | |
PWC | https://paperswithcode.com/paper/improved-person-detection-on-omnidirectional |
Repo | |
Framework | |
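Of the three refinement approaches, soft non-maximum suppression with Gaussian decay (Bodla et al., 2017) is sketched below; the sigma and score threshold are typical defaults, not the values tuned in the paper.

```python
import numpy as np

def iou(box, boxes):
    """IoU of one (x1, y1, x2, y2) box against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms_gaussian(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Soft-NMS with Gaussian decay: instead of discarding boxes that overlap
    the current best detection, decay their scores by exp(-iou^2 / sigma)."""
    scores = scores.astype(float).copy()
    keep, idx = [], list(range(len(boxes)))
    while idx:
        best = max(idx, key=lambda i: scores[i])
        keep.append(best)
        idx.remove(best)
        if idx:
            rest = np.array(idx)
            scores[rest] *= np.exp(-iou(boxes[best], boxes[rest]) ** 2 / sigma)
            idx = [i for i in idx if scores[i] > score_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
print(soft_nms_gaussian(boxes, np.array([0.9, 0.8, 0.7])))  # all kept; 2nd down-weighted
```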
Building Bayesian Neural Networks with Blocks: On Structure, Interpretability and Uncertainty
Title | Building Bayesian Neural Networks with Blocks: On Structure, Interpretability and Uncertainty |
Authors | Hao Henry Zhou, Yunyang Xiong, Vikas Singh |
Abstract | We provide simple schemes to build Bayesian Neural Networks (BNNs), block by block, inspired by a recent idea of computation skeletons. We show how, by adjusting the types of blocks used within the computation skeleton, we can identify interesting relationships with Deep Gaussian Processes (DGPs), deep kernel learning (DKL), random-features-type approximations, and other topics. We give strategies to approximate the posterior via doubly stochastic variational inference for such models, which yield uncertainty estimates. We give a detailed theoretical analysis and point out extensions that may be of independent interest. As a special case, we instantiate our procedure to define a Bayesian additive neural network, a promising strategy for identifying statistical interactions, with direct benefits for obtaining interpretable models. [A toy sketch follows this entry.] |
Tasks | Gaussian Processes |
Published | 2018-06-10 |
URL | http://arxiv.org/abs/1806.03563v1 |
http://arxiv.org/pdf/1806.03563v1.pdf | |
PWC | https://paperswithcode.com/paper/building-bayesian-neural-networks-with-blocks |
Repo | |
Framework | |
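A generic Bayesian building block with a factorized Gaussian weight posterior, sampled via the reparameterization trick, gives the flavour of how stacking blocks yields uncertainty estimates. This is not the paper's computation-skeleton construction, just a common baseline sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

class BayesianLinearBlock:
    """Minimal building block: a ReLU linear layer with a Gaussian weight
    posterior q(w) = N(mu, sigma^2), sampled via the reparameterization
    trick. Averaging over weight samples yields uncertainty estimates."""
    def __init__(self, d_in, d_out):
        self.mu = rng.normal(scale=0.1, size=(d_out, d_in))
        self.log_sigma = np.full((d_out, d_in), -3.0)

    def __call__(self, x):
        eps = rng.normal(size=self.mu.shape)
        w = self.mu + np.exp(self.log_sigma) * eps     # w ~ q(w)
        return np.maximum(w @ x, 0.0)

block = BayesianLinearBlock(4, 3)
x = rng.normal(size=4)
samples = np.stack([block(x) for _ in range(100)])     # Monte Carlo forward passes
print(samples.mean(0), samples.std(0))                 # predictive mean, uncertainty
```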
Towards a Spectrum of Graph Convolutional Networks
Title | Towards a Spectrum of Graph Convolutional Networks |
Authors | Mathias Niepert, Alberto Garcia-Duran |
Abstract | We present our ongoing work on understanding the limitations of graph convolutional networks (GCNs), as well as our work on generalizations of graph convolutions for representing more complex node attribute dependencies. Based on an analysis of GCNs with the help of the corresponding computation graphs, we propose a generalization of existing GCNs where the aggregation operations are (a) determined by structural properties of the local neighborhood graphs and (b) not restricted to weighted averages. We show that the proposed approach is strictly more expressive while requiring only a modest increase in the number of parameters and computations. We also show that the proposed generalization is identical to standard convolutional layers when applied to regular grid graphs. [A toy sketch follows this entry.] |
Tasks | |
Published | 2018-05-04 |
URL | http://arxiv.org/abs/1805.01837v1 |
http://arxiv.org/pdf/1805.01837v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-a-spectrum-of-graph-convolutional |
Repo | |
Framework | |
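The contrast between a weighted-average GCN layer and an aggregation that is not restricted to weighted averages can be sketched as follows; the mean-plus-max combination is one illustrative choice, not the paper's exact operator.

```python
import numpy as np

def gcn_layer(A, X, W):
    """Standard GCN layer: symmetric degree-normalized weighted average."""
    A_hat = A + np.eye(len(A))                     # add self-loops
    d = A_hat.sum(axis=1)
    return np.maximum((A_hat / np.sqrt(np.outer(d, d))) @ X @ W, 0.0)

def generalized_layer(A, X, W_mean, W_max):
    """Richer aggregation: weighted average plus a coordinate-wise max over
    the neighborhood, so the layer is no longer a weighted average alone."""
    A_hat = A + np.eye(len(A))
    mean_part = (A_hat / A_hat.sum(axis=1, keepdims=True)) @ X @ W_mean
    max_part = np.stack([X[A_hat[i] > 0].max(axis=0) for i in range(len(A))]) @ W_max
    return np.maximum(mean_part + max_part, 0.0)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path graph
X = np.random.default_rng(0).normal(size=(3, 4))
print(gcn_layer(A, X, np.eye(4)))
print(generalized_layer(A, X, np.eye(4), np.eye(4)))
```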
Capsule Networks for Brain Tumor Classification based on MRI Images and Course Tumor Boundaries
Title | Capsule Networks for Brain Tumor Classification based on MRI Images and Course Tumor Boundaries |
Authors | Parnian Afshar, Konstantinos N. Plataniotis, Arash Mohammadi |
Abstract | According to official statistics, cancer is considered the second leading cause of human fatalities. Among different types of cancer, brain tumors are seen as one of the deadliest forms due to their aggressive nature, heterogeneous characteristics, and low relative survival rate. Determining the type of brain tumor has a significant impact on the treatment choice and the patient’s survival. Human-centered diagnosis is typically error-prone and unreliable, resulting in a recent surge of interest in automating this process using convolutional neural networks (CNNs). CNNs, however, fail to fully utilize spatial relations, which is particularly harmful for tumor classification, as the relation between the tumor and its surrounding tissue is a critical indicator of the tumor’s type. In our recent work, we have incorporated the newly developed CapsNets to overcome this shortcoming. CapsNets are, however, highly sensitive to miscellaneous image background. This paper addresses that gap. The main contribution is to equip the CapsNet with access to the tumor’s surrounding tissues, without distracting it from the main target. A modified CapsNet architecture is, therefore, proposed for brain tumor classification, which takes the coarse tumor boundaries as extra inputs within its pipeline to increase the CapsNet’s focus. The proposed approach noticeably outperforms its counterparts. |
Tasks | |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00597v1 |
http://arxiv.org/pdf/1811.00597v1.pdf | |
PWC | https://paperswithcode.com/paper/capsule-networks-for-brain-tumor |
Repo | |
Framework | |
Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine
Title | Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine |
Authors | Renzo Andri, Lukas Cavigelli, Davide Rossi, Luca Benini |
Abstract | Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute- and memory-intensive, which makes them unsuitable for mW-devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces the computation and memory footprint. Binary-weight neural networks (BWNs) follow this trend, pushing weight quantization to the limit. Hardware accelerators for BWNs presented up to now have focused on core efficiency, disregarding I/O bandwidth and system-level efficiency, which are crucial for the deployment of accelerators in ultra-low power devices. We present Hyperdrive: a BWN accelerator that dramatically reduces I/O bandwidth by exploiting a novel binary-weight streaming approach. It supports arbitrarily sized convolutional neural network architectures and input resolutions by exploiting the natural scalability of its compute units at both the chip and the system level, arranging Hyperdrive chips systolically in a 2D mesh to process the entire feature map in parallel. Hyperdrive achieves 4.3 TOp/s/W system-level efficiency (i.e., including I/Os), 3.1x higher than state-of-the-art BWN accelerators, even though its core uses resource-intensive FP16 arithmetic for increased robustness. [A toy sketch follows this entry.] |
Tasks | Quantization |
Published | 2018-03-05 |
URL | http://arxiv.org/abs/1804.00623v3 |
http://arxiv.org/pdf/1804.00623v3.pdf | |
PWC | https://paperswithcode.com/paper/hyperdrive-a-systolically-scalable-binary |
Repo | |
Framework | |
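Why binary weights help is easy to see in code: a binary-weight convolution needs only additions and subtractions plus one scaling per filter, and the weights cost a single bit each to stream. A 1-D toy version (the accelerator's actual datapath is FP16 and 2-D):

```python
import numpy as np

def binary_weight_conv1d(x, w_sign, alpha):
    """Binary-weight convolution: weights are constrained to {-1, +1}
    (storable as 1 bit each) with one real scaling factor alpha per filter,
    so the inner loop needs only additions and subtractions, no multiplies."""
    k = len(w_sign)
    out = np.empty(len(x) - k + 1)
    for i in range(len(out)):
        window = x[i:i + k]
        out[i] = alpha * (window[w_sign > 0].sum() - window[w_sign < 0].sum())
    return out

x = np.arange(10, dtype=float)
w_sign = np.array([1, -1, 1])        # 3-tap binary filter
print(binary_weight_conv1d(x, w_sign, alpha=0.5))
```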
Neural Latent Relational Analysis to Capture Lexical Semantic Relations in a Vector Space
Title | Neural Latent Relational Analysis to Capture Lexical Semantic Relations in a Vector Space |
Authors | Koki Washio, Tsuneaki Kato |
Abstract | Capturing the semantic relations of words in a vector space contributes to many natural language processing tasks. One promising approach exploits lexico-syntactic patterns as features of word pairs. In this paper, we propose a novel model of this pattern-based approach, neural latent relational analysis (NLRA). NLRA can generalize co-occurrences of word pairs and lexico-syntactic patterns, and obtain embeddings of word pairs that do not co-occur. This overcomes the critical data-sparseness problem encountered in previous pattern-based models. Our experimental results on measuring relational similarity demonstrate that NLRA outperforms previous pattern-based models. In addition, when combined with a vector offset model, NLRA achieves performance comparable to that of the state-of-the-art model that exploits additional semantic relational data. [A toy sketch follows this entry.] |
Tasks | |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03401v1 |
http://arxiv.org/pdf/1809.03401v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-latent-relational-analysis-to-capture |
Repo | |
Framework | |
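The core trick, giving a representation to word pairs that never co-occur by composing their word embeddings, can be sketched as below; the small MLP encoder and the random stand-in for an encoded pattern vector are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 50

def pair_vec(w1, w2, emb, W):
    """Encode a word pair from its word embeddings with a small MLP, so pairs
    that never co-occur with any pattern still get a representation."""
    return np.tanh(W @ np.concatenate([emb[w1], emb[w2]]))

emb = {w: rng.normal(size=D) for w in ("dog", "animal", "car", "vehicle")}
W = rng.normal(scale=0.1, size=(D, 2 * D))
pattern_vec = rng.normal(size=D)     # stands in for an encoded pattern like "X is a Y"

score = pair_vec("dog", "animal", emb, W) @ pattern_vec   # pair-pattern score
print(score)
```

Training would push this score up for observed (pair, pattern) co-occurrences and down for sampled negatives, which is what lets the model generalize beyond the observed co-occurrence table.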
Understanding Recurrent Neural Architectures by Analyzing and Synthesizing Long Distance Dependencies in Benchmark Sequential Datasets
Title | Understanding Recurrent Neural Architectures by Analyzing and Synthesizing Long Distance Dependencies in Benchmark Sequential Datasets |
Authors | Abhijit Mahalunkar, John D. Kelleher |
Abstract | In order to build efficient deep recurrent neural architectures, it is essential to analyze the complexity of the long distance dependencies (LDDs) of the dataset being modeled. In this context, this paper presents a detailed analysis of the complexity and the degree of LDDs (or LDD characteristics) exhibited by various sequential benchmark datasets. We observe that datasets sampled from a similar process or task (e.g. natural language, or sequential MNIST, etc.) display similar LDD characteristics. Upon analysing the LDD characteristics, we were able to analyze the factors influencing them, such as (i) the number of unique symbols in a dataset, (ii) the size of the dataset, (iii) the number of interacting symbols within a given LDD, and (iv) the distance between the interacting symbols. We demonstrate that analysing LDD characteristics can inform the selection of optimal hyper-parameters for SOTA deep recurrent neural architectures. This analysis can directly contribute to the development of more accurate and efficient sequential models. We also introduce the use of Strictly k-Piecewise languages as a process to generate synthesized datasets for language modelling. The advantage of these synthesized datasets is that they enable targeted testing of deep recurrent neural architectures in terms of their ability to model LDDs with different characteristics. Moreover, using a variety of Strictly k-Piecewise languages, we generate a number of new benchmarking datasets and analyse the performance of a number of SOTA recurrent architectures on these new benchmarks. [A toy sketch follows this entry.] |
Tasks | Language Modelling |
Published | 2018-10-06 |
URL | https://arxiv.org/abs/1810.02966v3 |
https://arxiv.org/pdf/1810.02966v3.pdf | |
PWC | https://paperswithcode.com/paper/understanding-recurrent-neural-architectures |
Repo | |
Framework | |
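A Strictly k-Piecewise language is defined by forbidden scattered subsequences, which makes it easy to synthesize data with dependencies of unbounded distance. A minimal SP-2 sampler (the alphabet and the forbidden subsequence 'ab' are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sp2(length, forbidden=("a", "b"), alphabet="abcd"):
    """Sample a string from a Strictly 2-Piecewise language: the scattered
    subsequence 'ab' may never occur, i.e. once an 'a' appears, 'b' is banned
    for the rest of the string - a dependency of unbounded distance."""
    s, a_seen = [], False
    for _ in range(length):
        allowed = [c for c in alphabet if not (a_seen and c == forbidden[1])]
        c = rng.choice(allowed)
        a_seen = a_seen or (c == forbidden[0])
        s.append(c)
    return "".join(s)

data = [sample_sp2(20) for _ in range(5)]
print(data)           # no string contains an 'a' followed (at any distance) by 'b'
```

Varying k and the set of forbidden subsequences controls the number of interacting symbols and the dependency distance, which is exactly the kind of targeted benchmark the abstract describes.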
Solving for multi-class using orthogonal coding matrices
Title | Solving for multi-class using orthogonal coding matrices |
Authors | Peter Mills |
Abstract | A common method of generalizing binary to multi-class classification is the error-correcting code (ECC). ECCs may be optimized in a number of ways, for instance by making them orthogonal. Here we test two types of orthogonal ECCs on seven different datasets using three types of binary classifier, and compare them with three other multi-class methods: 1 vs. 1, one-versus-the-rest, and random ECCs. The first type of orthogonal ECC, in which the codes contain no zeros, admits a fast and simple method of solving for the probabilities. Orthogonal ECCs are always more accurate than random ECCs, as predicted by recent literature. Improvements in uncertainty coefficient (U.C.) range between 0.4% and 17.5% (0.004–0.139, absolute), while improvements in Brier score range between 0.7% and 10.7%. Unfortunately, orthogonal ECCs are rarely more accurate than 1 vs. 1. Disparities are worst when the methods are paired with logistic regression, with orthogonal ECCs never beating 1 vs. 1. When the methods are paired with SVM, the losses are less significant, peaking at 1.5% relative (0.011 absolute) in uncertainty coefficient and 6.5% in Brier score. Orthogonal ECCs are always the fastest of the five multi-class methods when paired with linear classifiers. When paired with a piecewise-linear classifier, whose classification speed does not depend on the number of training samples, classifications using orthogonal ECCs were always more accurate than the remaining three methods and also faster than 1 vs. 1. Losses against 1 vs. 1 here were higher, peaking at 1.9% (0.017, absolute) in U.C. and 39% in Brier score. Gains in speed ranged between 1.1% and over 100%. Whether the speed increase is worth the penalty in accuracy will depend on the application. [A toy sketch follows this entry.] |
Tasks | Calibration |
Published | 2018-01-27 |
URL | https://arxiv.org/abs/1801.09055v5 |
https://arxiv.org/pdf/1801.09055v5.pdf | |
PWC | https://paperswithcode.com/paper/solving-for-multi-class-using-orthogonal |
Repo | |
Framework | |
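A zero-free orthogonal coding matrix can be built from a Sylvester Hadamard matrix by dropping the all-ones row; decoding then reduces to picking the codeword with the largest projection onto the binary classifiers' outputs. A sketch of the coding/decoding side only (classifier training omitted; this is one plausible construction, not necessarily the paper's):

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while len(H) < n:
        H = np.block([[H, H], [H, -H]])
    return H

def make_code(n_classes):
    """Orthogonal coding matrix with no zeros: drop the all-ones first row of
    a Hadamard matrix; each remaining row is one class's codeword, and each
    column defines one binary classification problem."""
    H = hadamard(1 << n_classes.bit_length())
    return H[1:n_classes + 1]

def decode(code, binary_outputs):
    """Pick the class whose codeword has the largest projection onto the
    binary classifiers' outputs (values in [-1, 1])."""
    return int(np.argmax(code @ binary_outputs))

C = make_code(4)                        # 4 classes -> 8 binary column problems
noisy = 0.9 * C[2] + 0.1                # classifier outputs for a class-2 sample
print(decode(C, noisy))                 # 2: orthogonality absorbs the constant bias
```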