Paper Group ANR 1062
Cautious Deep Learning
Title | Cautious Deep Learning |
Authors | Yotam Hechtlinger, Barnabás Póczos, Larry Wasserman |
Abstract | Most classifiers operate by selecting the maximum of an estimate of the conditional distribution $p(y \mid x)$, where $x$ stands for the features of the instance to be classified and $y$ denotes its label. This often results in a {\em hubristic bias}: overconfidence in the assignment of a definite label. Usually, the observations are concentrated on a small volume but the classifier provides definite predictions for the entire space. We propose constructing conformal prediction sets which contain a set of labels rather than a single label. These conformal prediction sets contain the true label with probability $1-\alpha$. Our construction is based on $p(x \mid y)$ rather than $p(y \mid x)$, which results in a classifier that is very cautious: it outputs the null set — meaning “I don’t know” — when the object does not resemble the training examples. An important property of our approach is that adversarial attacks are likely to be predicted as the null set or to include the true label as well. We demonstrate the performance on the ImageNet ILSVRC dataset and the CelebA and IMDB-Wiki facial datasets using high-dimensional features obtained from state-of-the-art convolutional neural networks. |
Tasks | |
Published | 2018-05-24 |
URL | http://arxiv.org/abs/1805.09460v2 |
http://arxiv.org/pdf/1805.09460v2.pdf | |
PWC | https://paperswithcode.com/paper/cautious-deep-learning |
Repo | |
Framework | |
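A minimal sketch of the general recipe, not the authors' exact conformity score: estimate a class-conditional density $\hat p(x \mid y)$ on deep features and include label $y$ in the prediction set only if the test point's density under class $y$ clears the empirical $\alpha$-quantile of that class's training densities. The use of `KernelDensity` and the `bandwidth` value are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_conformal_sets(features, labels, alpha=0.05, bandwidth=1.0):
    """Per class: fit a density on training features and store the
    empirical alpha-quantile of the training log-densities."""
    models, thresholds = {}, {}
    for y in np.unique(labels):
        Xy = features[labels == y]
        kde = KernelDensity(bandwidth=bandwidth).fit(Xy)
        scores = kde.score_samples(Xy)        # log p_hat(x | y) on class y
        models[y] = kde
        thresholds[y] = np.quantile(scores, alpha)
    return models, thresholds

def predict_set(x, models, thresholds):
    """Return every label whose class-conditional density of x clears the
    class threshold; an empty set reads as "I don't know"."""
    x = np.atleast_2d(x)
    return {y for y, kde in models.items()
            if kde.score_samples(x)[0] >= thresholds[y]}
```

An input far from every class then falls below all thresholds and receives the empty set, which is the cautious behaviour the abstract emphasises.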
A Variational Dirichlet Framework for Out-of-Distribution Detection
Title | A Variational Dirichlet Framework for Out-of-Distribution Detection |
Authors | Wenhu Chen, Yilin Shen, Hongxia Jin, William Wang |
Abstract | With the recent rapid development of deep learning, deep neural networks have been widely adopted in many real-life applications. However, deep neural networks are also known to have very little control over their uncertainty for unseen examples, which can cause harmful consequences in practical scenarios. In this paper, we are particularly interested in designing a higher-order uncertainty metric for deep neural networks and investigate its effectiveness under the out-of-distribution detection task proposed by~\cite{hendrycks2016baseline}. Our method first assumes there exists an underlying higher-order distribution $\mathbb{P}(z)$, which controls the label-wise categorical distribution $\mathbb{P}(y)$ over classes on the $K$-dimensional simplex, then approximates this higher-order distribution via a parameterized posterior $p_{\theta}(z \mid x)$ within a variational inference framework, and finally uses the entropy of the learned posterior distribution $p_{\theta}(z \mid x)$ as the uncertainty measure to detect out-of-distribution examples. Further, we propose an auxiliary objective function that discriminates against synthesized adversarial examples to further increase the robustness of the proposed uncertainty measure. Through comprehensive experiments on various datasets, our proposed framework is demonstrated to consistently outperform competing algorithms. |
Tasks | Out-of-Distribution Detection |
Published | 2018-11-18 |
URL | http://arxiv.org/abs/1811.07308v4 |
http://arxiv.org/pdf/1811.07308v4.pdf | |
PWC | https://paperswithcode.com/paper/a-variational-dirichlet-framework-for-out-of |
Repo | |
Framework | |
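The uncertainty score here is the entropy of a learned Dirichlet posterior over the probability simplex. The sketch below computes only that quantity; how the network maps an input $x$ to the concentration parameters $\alpha$ is the paper's contribution and is not shown.

```python
import numpy as np
from scipy.special import digamma, gammaln

def dirichlet_entropy(alpha):
    """Entropy of Dir(alpha); higher entropy = more uncertain input."""
    alpha = np.asarray(alpha, dtype=float)
    a0, K = alpha.sum(), alpha.size
    log_B = gammaln(alpha).sum() - gammaln(a0)
    return log_B + (a0 - K) * digamma(a0) - ((alpha - 1.0) * digamma(alpha)).sum()

# Toy usage: a confident (peaked) posterior vs. a nearly flat one.
print(dirichlet_entropy([50.0, 1.0, 1.0]))   # peaked  -> low entropy
print(dirichlet_entropy([1.1, 1.1, 1.1]))    # flat    -> higher entropy, OOD-like
```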
Improving the Expressiveness of Deep Learning Frameworks with Recursion
Title | Improving the Expressiveness of Deep Learning Frameworks with Recursion |
Authors | Eunji Jeong, Joo Seong Jeong, Soojeong Kim, Gyeong-In Yu, Byung-Gon Chun |
Abstract | Recursive neural networks have widely been used by researchers to handle applications with recursively or hierarchically structured data. However, embedded control flow deep learning frameworks such as TensorFlow, Theano, Caffe2, and MXNet fail to efficiently represent and execute such neural networks, due to lack of support for recursion. In this paper, we add recursion to the programming model of existing frameworks by complementing their design with recursive execution of dataflow graphs as well as additional APIs for recursive definitions. Unlike iterative implementations, which can only understand the topological index of each node in recursive data structures, our recursive implementation is able to exploit the recursive relationships between nodes for efficient execution based on parallel computation. We present an implementation on TensorFlow and evaluation results with various recursive neural network models, showing that our recursive implementation not only conveys the recursive nature of recursive neural networks better than other implementations, but also uses given resources more effectively to reduce training and inference time. |
Tasks | |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.00832v1 |
http://arxiv.org/pdf/1809.00832v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-the-expressiveness-of-deep-learning |
Repo | |
Framework | |
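The contribution here is framework-level (recursive execution of dataflow graphs plus APIs for recursive definitions), so it cannot be reproduced in a few lines. For context, the sketch below shows the kind of recursively defined model such support targets, written as plain eager-mode PyTorch recursion; it is an illustrative stand-in, not the proposed TensorFlow API.

```python
import torch
import torch.nn as nn

class TreeRNN(nn.Module):
    """Recursively composes child representations into a parent vector."""
    def __init__(self, dim):
        super().__init__()
        self.leaf = nn.Linear(dim, dim)
        self.compose = nn.Linear(2 * dim, dim)

    def forward(self, node):
        # A node is either a leaf feature tensor or a (left, right) pair.
        if isinstance(node, torch.Tensor):
            return torch.tanh(self.leaf(node))
        left, right = node
        children = torch.cat([self.forward(left), self.forward(right)], dim=-1)
        return torch.tanh(self.compose(children))

model = TreeRNN(dim=4)
leaf = lambda: torch.randn(4)
print(model(((leaf(), leaf()), leaf())).shape)   # torch.Size([4])
```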
Deep Gaussian Processes with Convolutional Kernels
Title | Deep Gaussian Processes with Convolutional Kernels |
Authors | Vinayak Kumar, Vaibhav Singh, P. K. Srijith, Andreas Damianou |
Abstract | Deep Gaussian processes (DGPs) provide a Bayesian non-parametric alternative to standard parametric deep learning models. A DGP is formed by stacking multiple GPs resulting in a well-regularized composition of functions. The Bayesian framework equips the model with attractive properties, such as implicit capacity control and predictive uncertainty, but at the same time makes it challenging to combine with a convolutional structure. This has hindered the application of DGPs in computer vision tasks, an area where deep parametric models (i.e. CNNs) have made breakthroughs. Standard kernels used in DGPs such as radial basis functions (RBFs) are insufficient for handling pixel variability in raw images. In this paper, we build on the recent convolutional GP to develop Convolutional DGP (CDGP) models which effectively capture image-level features through the use of convolution kernels, therefore opening up the way for applying DGPs to computer vision tasks. Our model learns local spatial influence and outperforms strong GP-based baselines on multi-class image classification. We also consider various constructions of the convolution kernel over image patches, analyze the computational trade-offs and provide an efficient framework for convolutional DGP models. The experimental results on image data such as MNIST, rectangles-image, CIFAR10 and Caltech101 demonstrate the effectiveness of the proposed approaches. |
Tasks | Gaussian Processes, Image Classification |
Published | 2018-06-05 |
URL | http://arxiv.org/abs/1806.01655v1 |
http://arxiv.org/pdf/1806.01655v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-gaussian-processes-with-convolutional |
Repo | |
Framework | |
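The convolutional GP kernel this work builds on averages a base kernel over all pairs of image patches, so similarity between two images is the mean similarity of their local patches. A minimal numpy sketch; the RBF base kernel, patch size, and lengthscale are illustrative choices.

```python
import numpy as np

def patches(img, size=3):
    """All size x size patches of a 2-D image, flattened to vectors."""
    H, W = img.shape
    return np.array([img[i:i + size, j:j + size].ravel()
                     for i in range(H - size + 1)
                     for j in range(W - size + 1)])

def conv_kernel(x1, x2, size=3, lengthscale=1.0):
    """k(x1, x2) = average RBF kernel over all pairs of patches."""
    P1, P2 = patches(x1, size), patches(x2, size)
    sq = ((P1[:, None, :] - P2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale ** 2).mean()

a, b = np.random.rand(8, 8), np.random.rand(8, 8)
print(conv_kernel(a, a), conv_kernel(a, b))   # self-similarity vs. cross
```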
How game complexity affects the playing behavior of synthetic agents
Title | How game complexity affects the playing behavior of synthetic agents |
Authors | Chairi Kiourt, Dimitris Kalles, Panagiotis Kanellopoulos |
Abstract | Agent-based simulation of social organizations, via the investigation of agents’ training and learning tactics and strategies, has been inspired by the ability of humans to learn from social environments which are rich in agents, interactions and partial or hidden information. Such richness is a source of complexity that an effective learner has to be able to navigate. This paper focuses on investigating the impact of environmental complexity on the game playing-and-learning behavior of synthetic agents. We demonstrate our approach using two independent turn-based zero-sum games as the basis of forming social events which are characterized by both competition and cooperation. The paper’s key highlight is that as the complexity of a social environment changes, an effective player has to adapt its learning and playing profile to maintain a given performance profile. |
Tasks | |
Published | 2018-07-07 |
URL | http://arxiv.org/abs/1807.02648v1 |
http://arxiv.org/pdf/1807.02648v1.pdf | |
PWC | https://paperswithcode.com/paper/how-game-complexity-affects-the-playing |
Repo | |
Framework | |
ReMotENet: Efficient Relevant Motion Event Detection for Large-scale Home Surveillance Videos
Title | ReMotENet: Efficient Relevant Motion Event Detection for Large-scale Home Surveillance Videos |
Authors | Ruichi Yu, Hongcheng Wang, Larry S. Davis |
Abstract | This paper addresses the problem of detecting relevant motion caused by objects of interest (e.g., persons and vehicles) in large-scale home surveillance videos. The traditional method usually consists of two separate steps, i.e., detecting moving objects with background subtraction running on the camera, and filtering out nuisance motion events (e.g., trees, cloud, shadow, rain/snow, flag) with deep-learning-based object detection and tracking running in the cloud. The method is extremely slow and therefore not cost-effective, and does not fully leverage the spatial-temporal redundancies with a pre-trained off-the-shelf object detector. To dramatically speed up relevant motion event detection and improve its performance, we propose a novel network for relevant motion event detection, ReMotENet, which is a unified, end-to-end data-driven method using spatial-temporal attention-based 3D ConvNets to jointly model the appearance and motion of objects-of-interest in a video. ReMotENet parses an entire video clip in one forward pass of a neural network to achieve significant speedup. Meanwhile, it exploits the properties of home surveillance videos, e.g., relevant motion is sparse both spatially and temporally, and enhances 3D ConvNets with a spatial-temporal attention model and reference-frame subtraction to encourage the network to focus on the relevant moving objects. Experiments demonstrate that our method can achieve comparable or even better performance than the object-detection-based method but with three to four orders of magnitude speedup (up to 20k times) on GPU devices. Our network is efficient, compact and lightweight. It can detect relevant motion on a 15s surveillance video clip within 4-8 milliseconds on a GPU and a fraction of a second (0.17-0.39 s) on a CPU with a model size of less than 1MB. |
Tasks | Object Detection |
Published | 2018-01-06 |
URL | http://arxiv.org/abs/1801.02031v1 |
http://arxiv.org/pdf/1801.02031v1.pdf | |
PWC | https://paperswithcode.com/paper/remotenet-efficient-relevant-motion-event |
Repo | |
Framework | |
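Two ingredients named in the abstract, reference-frame subtraction and a small 3D ConvNet that scores a whole clip in one forward pass, can be sketched as follows. Layer sizes and depths are placeholders, not the published architecture, and the spatial-temporal attention module is omitted.

```python
import torch
import torch.nn as nn

def subtract_reference(clip):
    """clip: (N, C, T, H, W). Subtract the first frame along the time axis so
    static background cancels and moving regions carry most of the signal."""
    return clip - clip[:, :, :1]

class TinyMotionNet(nn.Module):
    """Placeholder 3D ConvNet that scores a whole clip in one forward pass."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1))
        self.head = nn.Linear(8, 1)          # relevant-motion score

    def forward(self, clip):                 # clip: (N, C, T, H, W)
        return torch.sigmoid(self.head(self.features(clip).flatten(1)))

clip = torch.randn(1, 3, 16, 64, 64)
print(TinyMotionNet()(subtract_reference(clip)).shape)   # torch.Size([1, 1])
```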
Quantum Structures in Human Decision-making: Towards Quantum Expected Utility
Title | Quantum Structures in Human Decision-making: Towards Quantum Expected Utility |
Authors | Sandro Sozzo |
Abstract | {\it Ellsberg thought experiments} and empirical confirmation of Ellsberg preferences pose serious challenges to {\it subjective expected utility theory} (SEUT). We have recently elaborated a quantum-theoretic framework for human decisions under uncertainty which satisfactorily copes with the Ellsberg paradox and other puzzles of SEUT. We apply here the quantum-theoretic framework to the {\it Ellsberg two-urn example}, showing that the paradox can be explained by assuming a state change of the conceptual entity that is the object of the decision ({\it decision-making}, or {\it DM}, {\it entity}) and representing subjective probabilities by quantum probabilities. We also model the empirical data we collected in a DM test on human participants within the theoretical framework above. The obtained results are relevant, as they provide a way to model real-life decisions, e.g., financial and medical ones, that show the same empirical patterns as the two-urn experiment. |
Tasks | Decision Making |
Published | 2018-10-29 |
URL | https://arxiv.org/abs/1811.00875v1 |
https://arxiv.org/pdf/1811.00875v1.pdf | |
PWC | https://paperswithcode.com/paper/quantum-structures-in-human-decision-making |
Repo | |
Framework | |
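The framework's core move, replacing classical subjective probabilities with quantum ones and letting the decision change the state of the DM entity, rests on two standard ingredients of quantum probability theory. The generic textbook forms are shown below; they are not the paper's specific two-urn representation.

```latex
% Born rule: probability of event A when the DM entity is in state \psi
\mu_{\psi}(A) = \langle \psi | P_A | \psi \rangle ,
\qquad P_A \ \text{an orthogonal projector}, \quad \|\psi\| = 1 .

% Luders rule: the state change induced by actualizing event A
\psi \;\longmapsto\; \frac{P_A \psi}{\| P_A \psi \|} .
```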
AdapterNet - learning input transformation for domain adaptation
Title | AdapterNet - learning input transformation for domain adaptation |
Authors | Alon Hazan, Yoel Shoshan, Daniel Khapun, Roy Aladjem, Vadim Ratner |
Abstract | Deep neural networks have demonstrated impressive performance in various machine learning tasks. However, they are notoriously sensitive to changes in data distribution. Often, even a slight change in the distribution can lead to drastic performance reduction. Artificially augmenting the data may help to some extent, but in most cases, fails to achieve model invariance to the data distribution. Some examples where this sub-class of domain adaptation can be valuable are various imaging modalities such as thermal imaging, X-ray, ultrasound, and MRI, where changes in acquisition parameters or acquisition device manufacturer will result in a different representation of the same input. Our work shows that standard fine-tuning fails to adapt the model in certain important cases. We propose a novel method of adapting to a new data source, and demonstrate near perfect adaptation on a customized ImageNet benchmark. Moreover, our method does not require any samples from the original data set, it is completely explainable and can be tailored to the task. |
Tasks | Domain Adaptation |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11601v2 |
http://arxiv.org/pdf/1805.11601v2.pdf | |
PWC | https://paperswithcode.com/paper/adapternet-learning-input-transformation-for |
Repo | |
Framework | |
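The idea of learning an input transformation while leaving the original network untouched can be sketched as a small residual adapter prepended to a frozen classifier. The adapter architecture below is a placeholder, and `frozen_model` stands in for whatever pretrained network is being adapted.

```python
import torch
import torch.nn as nn

class InputAdapter(nn.Module):
    """Learned image-to-image transform prepended to a frozen classifier."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1))

    def forward(self, x):
        return x + self.net(x)            # residual: starts close to identity

def build(frozen_model):
    for p in frozen_model.parameters():   # the original network stays fixed
        p.requires_grad = False
    adapter = InputAdapter()
    model = nn.Sequential(adapter, frozen_model)
    optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-3)
    return model, optimizer
```

Only the adapter's parameters receive gradients, so the pretrained classifier itself is never modified.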
Verb Argument Structure Alternations in Word and Sentence Embeddings
Title | Verb Argument Structure Alternations in Word and Sentence Embeddings |
Authors | Katharina Kann, Alex Warstadt, Adina Williams, Samuel R. Bowman |
Abstract | Verbs occur in different syntactic environments, or frames. We investigate whether artificial neural networks encode grammatical distinctions necessary for inferring the idiosyncratic frame-selectional properties of verbs. We introduce five datasets, collectively called FAVA, containing in aggregate nearly 10k sentences labeled for grammatical acceptability, illustrating different verbal argument structure alternations. We then test whether models can distinguish acceptable English verb-frame combinations from unacceptable ones using a sentence embedding alone. For converging evidence, we further construct LaVA, a corresponding word-level dataset, and investigate whether the same syntactic features can be extracted from word embeddings. Our models perform reliable classifications for some verbal alternations but not others, suggesting that while these representations do encode fine-grained lexical information, it is incomplete or can be hard to extract. Further, differences between the word- and sentence-level models show that some information present in word embeddings is not passed on to the down-stream sentence embeddings. |
Tasks | Sentence Embedding, Sentence Embeddings, Word Embeddings |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.10773v1 |
http://arxiv.org/pdf/1811.10773v1.pdf | |
PWC | https://paperswithcode.com/paper/verb-argument-structure-alternations-in-word |
Repo | |
Framework | |
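The sentence-level experiments amount to probing whether a fixed sentence embedding supports an acceptability classifier. A minimal sketch of such a probe; the random arrays stand in for embeddings produced by whichever encoder is under study, and the choice of a logistic-regression probe is an assumption of this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Placeholders: precomputed sentence embeddings and 0/1 acceptability labels.
X_train, y_train = np.random.randn(200, 512), np.random.randint(0, 2, 200)
X_test,  y_test  = np.random.randn(50, 512),  np.random.randint(0, 2, 50)

# Linear probe: can acceptability be read off the sentence embedding alone?
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, probe.predict(X_test)))
```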
Latent Geometry Inspired Graph Dissimilarities Enhance Affinity Propagation Community Detection in Complex Networks
Title | Latent Geometry Inspired Graph Dissimilarities Enhance Affinity Propagation Community Detection in Complex Networks |
Authors | Carlo Vittorio Cannistraci, Alessandro Muscoloni |
Abstract | Affinity propagation is one of the most effective unsupervised pattern recognition algorithms for data clustering in high-dimensional feature space. However, the numerous attempts to test its performance for community detection in complex networks have attained results far from those of state-of-the-art methods such as Infomap and Louvain. Yet, all these studies agreed that the crucial problem is to convert the unweighted network topology into a ‘smart-enough’ node dissimilarity matrix that can properly drive the message-passing procedure behind affinity propagation clustering. Here we introduce a conceptual innovation and we discuss how to leverage network latent geometry notions in order to design dissimilarity matrices for affinity propagation community detection. Our results demonstrate that the latent-geometry-inspired dissimilarity measures we design bring affinity propagation to equal or outperform current state-of-the-art methods for community detection. These findings are solidly supported on both synthetic ‘realistic’ networks (with known ground-truth communities) and real networks (with community metadata), even when the data structure is corrupted by artificially induced noise in the form of missing or spurious connectivity. |
Tasks | Community Detection |
Published | 2018-04-12 |
URL | http://arxiv.org/abs/1804.04566v2 |
http://arxiv.org/pdf/1804.04566v2.pdf | |
PWC | https://paperswithcode.com/paper/latent-geometry-inspired-graph |
Repo | |
Framework | |
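The pipeline described above, building a node dissimilarity matrix from the network topology and handing it to affinity propagation, can be sketched as follows. The plain shortest-path dissimilarity used here is a simple stand-in for the latent-geometry-inspired measures the paper designs.

```python
import numpy as np
import networkx as nx
from sklearn.cluster import AffinityPropagation

G = nx.karate_club_graph()
n = G.number_of_nodes()

# Stand-in dissimilarity: shortest-path distance between every pair of nodes.
D = np.zeros((n, n))
for i, lengths in nx.all_pairs_shortest_path_length(G):
    for j, d in lengths.items():
        D[i, j] = d

# Affinity propagation expects similarities, so negate the dissimilarity.
labels = AffinityPropagation(affinity="precomputed", random_state=0).fit(-D).labels_
print("communities found:", len(set(labels)))
```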
Physics-Informed CoKriging: A Gaussian-Process-Regression-Based Multifidelity Method for Data-Model Convergence
Title | Physics-Informed CoKriging: A Gaussian-Process-Regression-Based Multifidelity Method for Data-Model Convergence |
Authors | Xiu Yang, David Barajas-Solano, Guzel Tartakovsky, Alexandre Tartakovsky |
Abstract | In this work, we propose a new Gaussian process regression (GPR)-based multifidelity method: physics-informed CoKriging (CoPhIK). In CoKriging-based multifidelity methods, the quantities of interest are modeled as linear combinations of multiple parameterized stationary Gaussian processes (GPs), and the hyperparameters of these GPs are estimated from data via optimization. In CoPhIK, we construct a GP representing low-fidelity data using physics-informed Kriging (PhIK), and model the discrepancy between low- and high-fidelity data using a parameterized GP with hyperparameters identified via optimization. Our approach reduces the cost of optimization for inferring hyperparameters by incorporating partial physical knowledge. We prove that the physical constraints in the form of deterministic linear operators are satisfied up to an error bound. Furthermore, we combine CoPhIK with a greedy active learning algorithm for guiding the selection of additional observation locations. The efficiency and accuracy of CoPhIK are demonstrated for reconstructing the partially observed modified Branin function, reconstructing the sparsely observed state of a steady state heat transport problem, and learning a conservative tracer distribution from sparse tracer concentration measurements. |
Tasks | Active Learning, Gaussian Processes |
Published | 2018-11-24 |
URL | http://arxiv.org/abs/1811.09757v1 |
http://arxiv.org/pdf/1811.09757v1.pdf | |
PWC | https://paperswithcode.com/paper/physics-informed-cokriging-a-gaussian-process |
Repo | |
Framework | |
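The multifidelity structure, one GP for the low-fidelity data plus a second GP for the discrepancy with the high-fidelity data, can be sketched with standard GP regression. In CoPhIK the low-fidelity GP comes from physics-informed Kriging rather than being fit to data, so the version below is a generic residual-cokriging baseline on toy functions, not the paper's method.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

f_low  = lambda x: np.sin(8 * x)                   # cheap, biased model
f_high = lambda x: np.sin(8 * x) + 0.3 * x         # expensive "truth"

X_low  = np.linspace(0, 1, 30)[:, None]            # many cheap samples
X_high = np.linspace(0, 1, 6)[:, None]             # few expensive samples

gp_low = GaussianProcessRegressor(RBF(0.1)).fit(X_low, f_low(X_low).ravel())

# A second GP models the discrepancy between fidelities at the expensive points.
resid = f_high(X_high).ravel() - gp_low.predict(X_high)
gp_delta = GaussianProcessRegressor(RBF(0.2)).fit(X_high, resid)

X_test = np.linspace(0, 1, 5)[:, None]
pred = gp_low.predict(X_test) + gp_delta.predict(X_test)
print(np.abs(pred - f_high(X_test).ravel()))        # multifidelity error
```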
Modeling Semantics with Gated Graph Neural Networks for Knowledge Base Question Answering
Title | Modeling Semantics with Gated Graph Neural Networks for Knowledge Base Question Answering |
Authors | Daniil Sorokin, Iryna Gurevych |
Abstract | Most approaches to Knowledge Base Question Answering are based on semantic parsing. In this paper, we address the problem of learning vector representations for complex semantic parses that consist of multiple entities and relations. Previous work largely focused on selecting the correct semantic relations for a question and disregarded the structure of the semantic parse: the connections between entities and the directions of the relations. We propose to use Gated Graph Neural Networks to encode the graph structure of the semantic parse. We show on two data sets that the graph networks outperform all baseline models that do not explicitly model the structure. The error analysis confirms that our approach can successfully process complex semantic parses. |
Tasks | Knowledge Base Question Answering, Question Answering, Semantic Parsing |
Published | 2018-08-13 |
URL | http://arxiv.org/abs/1808.04126v1 |
http://arxiv.org/pdf/1808.04126v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-semantics-with-gated-graph-neural |
Repo | |
Framework | |
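At the heart of a gated graph neural network is a GRU-style node update driven by messages aggregated from neighbours. The sketch below shows that propagation step in isolation; the semantic-parse graph construction, edge-type weights, and question encoder are out of scope.

```python
import torch
import torch.nn as nn

class GGNNLayer(nn.Module):
    """One round of gated graph propagation: message passing + GRU update."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, h, adj):
        # h: (num_nodes, dim); adj: (num_nodes, num_nodes) adjacency matrix.
        m = adj @ self.msg(h)          # sum of transformed neighbour states
        return self.gru(m, h)          # gated update of every node state

layer = GGNNLayer(16)
h = torch.randn(5, 16)                 # entities/relations of a semantic parse
adj = (torch.rand(5, 5) > 0.6).float()
for _ in range(3):                     # a few propagation steps
    h = layer(h, adj)
print(h.shape)                         # torch.Size([5, 16])
```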
Learning Domain-Sensitive and Sentiment-Aware Word Embeddings
Title | Learning Domain-Sensitive and Sentiment-Aware Word Embeddings |
Authors | Bei Shi, Zihao Fu, Lidong Bing, Wai Lam |
Abstract | Word embeddings have been widely used in sentiment classification because of their efficacy for semantic representations of words. Given reviews from different domains, some existing methods for word embeddings exploit sentiment information, but they cannot produce domain-sensitive embeddings. On the other hand, some other existing methods can generate domain-sensitive word embeddings, but they cannot distinguish words with similar contexts but opposite sentiment polarity. We propose a new method for learning domain-sensitive and sentiment-aware embeddings that simultaneously capture sentiment semantics and the domain sensitivity of individual words. Our method can automatically determine and produce domain-common embeddings and domain-specific embeddings. The differentiation of domain-common and domain-specific words enables the model to pool common semantics across multiple domains, as a form of data augmentation, while capturing the varied semantics of domain-specific words in each domain. Experimental results show that our model provides an effective way to learn domain-sensitive and sentiment-aware word embeddings which benefit sentiment classification at both the sentence level and the lexicon term level. |
Tasks | Data Augmentation, Sentiment Analysis, Word Embeddings |
Published | 2018-05-10 |
URL | http://arxiv.org/abs/1805.03801v1 |
http://arxiv.org/pdf/1805.03801v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-domain-sensitive-and-sentiment-aware |
Repo | |
Framework | |
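One way to picture the output described above is a per-word gate that mixes a domain-common vector with a domain-specific one. This gating form is an illustrative reading of "automatically determine and produce domain-common embeddings and domain-specific embeddings", not the paper's exact model, and all vectors below are toy values.

```python
import numpy as np

def combined_embedding(word, domain, common, specific, gate):
    """Mix a domain-common vector with a domain-specific one.

    common:   {word: vector shared across domains}
    specific: {(word, domain): vector for that domain}
    gate:     {word: probability in [0, 1] that the word is domain-common}
    """
    g = gate[word]
    return g * common[word] + (1.0 - g) * specific[(word, domain)]

# Toy example: "lightweight" is positive for laptops, often negative for books.
common   = {"lightweight": np.array([0.1, 0.2])}
specific = {("lightweight", "laptop"): np.array([0.9, 0.1]),
            ("lightweight", "book"):   np.array([-0.8, 0.3])}
gate     = {"lightweight": 0.3}
print(combined_embedding("lightweight", "laptop", common, specific, gate))
```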
A Lifelong Learning Approach to Brain MR Segmentation Across Scanners and Protocols
Title | A Lifelong Learning Approach to Brain MR Segmentation Across Scanners and Protocols |
Authors | Neerav Karani, Krishna Chaitanya, Christian Baumgartner, Ender Konukoglu |
Abstract | Convolutional neural networks (CNNs) have shown promising results on several segmentation tasks in magnetic resonance (MR) images. However, the accuracy of CNNs may degrade severely when segmenting images acquired with different scanners and/or protocols as compared to the training data, thus limiting their practical utility. We address this shortcoming in a lifelong multi-domain learning setting by treating images acquired with different scanners or protocols as samples from different, but related domains. Our solution is a single CNN with shared convolutional filters and domain-specific batch normalization layers, which can be tuned to new domains with only a few ($\approx$ 4) labelled images. Importantly, this is achieved while retaining performance on the older domains whose training data may no longer be available. We evaluate the method for brain structure segmentation in MR images. Results demonstrate that the proposed method largely closes the gap to the benchmark, which is training a dedicated CNN for each scanner. |
Tasks | |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10170v1 |
http://arxiv.org/pdf/1805.10170v1.pdf | |
PWC | https://paperswithcode.com/paper/a-lifelong-learning-approach-to-brain-mr |
Repo | |
Framework | |
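The architectural idea, convolutional filters shared across domains with one set of batch-normalization parameters per scanner or protocol, can be sketched directly. The tiny two-layer network below is a placeholder for the actual segmentation CNN.

```python
import torch
import torch.nn as nn

class SharedConvDomainBN(nn.Module):
    """Convolutions shared across domains; BatchNorm chosen per domain."""
    def __init__(self, domains, channels=16):
        super().__init__()
        self.conv = nn.Conv2d(1, channels, 3, padding=1)       # shared
        self.bn = nn.ModuleDict({d: nn.BatchNorm2d(channels)   # per domain
                                 for d in domains})
        self.head = nn.Conv2d(channels, 4, 1)                  # 4 structures

    def forward(self, x, domain):
        return self.head(torch.relu(self.bn[domain](self.conv(x))))

net = SharedConvDomainBN(domains=["scanner_a", "scanner_b"])
x = torch.randn(2, 1, 64, 64)
print(net(x, "scanner_b").shape)    # torch.Size([2, 4, 64, 64])
```

Adapting to a new scanner then amounts to adding one more entry to the `ModuleDict` and fine-tuning only those normalization parameters on a handful of labelled images.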
Exploiting Sentence Embedding for Medical Question Answering
Title | Exploiting Sentence Embedding for Medical Question Answering |
Authors | Yu Hao, Xien Liu, Ji Wu, Ping Lv |
Abstract | Despite the great success of word embedding, sentence embedding remains far from solved. In this paper, we present a supervised learning framework to exploit sentence embedding for the medical question answering task. The learning framework consists of two main parts: 1) a sentence embedding module, and 2) a scoring module. The former is developed with contextual self-attention and multi-scale techniques to encode a sentence into an embedding tensor; this module is called Contextual self-Attention Multi-scale Sentence Embedding (CAMSE) for short. The latter employs two scoring strategies: Semantic Matching Scoring (SMS) and Semantic Association Scoring (SAS). SMS measures similarity while SAS captures association between sentence pairs: a medical question concatenated with a candidate choice, and a piece of corresponding supportive evidence. The proposed framework is evaluated on two Medical Question Answering (MedicalQA) datasets collected from real-world applications: medical exams and clinical diagnosis based on electronic medical records (EMR). The comparison results show that our proposed framework achieves significant improvements over competitive baseline approaches. Additionally, a series of controlled experiments illustrate that the multi-scale strategy and the contextual self-attention layer play important roles in producing effective sentence embeddings, and that the two scoring strategies are highly complementary for question answering problems. |
Tasks | Question Answering, Sentence Embedding |
Published | 2018-11-15 |
URL | http://arxiv.org/abs/1811.06156v1 |
http://arxiv.org/pdf/1811.06156v1.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-sentence-embedding-for-medical |
Repo | |
Framework | |
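The Semantic Matching Scoring (SMS) part compares the embedding of a question concatenated with a candidate choice against the embedding of a piece of evidence. A minimal sketch with cosine similarity, where `embed` is a random-vector placeholder for the CAMSE encoder.

```python
import torch
import torch.nn.functional as F

def embed(sentence: str, dim: int = 128) -> torch.Tensor:
    """Placeholder for the CAMSE sentence encoder (here: a random vector)."""
    g = torch.Generator().manual_seed(hash(sentence) % (2 ** 31))
    return torch.randn(dim, generator=g)

def semantic_matching_score(question: str, choice: str, evidence: str) -> float:
    """Similarity between (question + candidate choice) and the evidence."""
    q = embed(question + " " + choice)
    e = embed(evidence)
    return F.cosine_similarity(q, e, dim=0).item()

best = max(["choice A", "choice B"],
           key=lambda c: semantic_matching_score("question text", c, "evidence text"))
print(best)
```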