Paper Group ANR 394
Convolutional Dictionary Learning: A Comparative Review and New Algorithms
Title | Convolutional Dictionary Learning: A Comparative Review and New Algorithms |
Authors | Cristina Garcia-Cardona, Brendt Wohlberg |
Abstract | Convolutional sparse representations are a form of sparse representation with a dictionary whose structure is equivalent to convolution with a set of linear filters. While effective algorithms have recently been developed for the convolutional sparse coding problem, the corresponding dictionary learning problem is substantially more challenging. Furthermore, although a number of different approaches have been proposed, the absence of thorough comparisons between them makes it difficult to determine which of them represents the current state of the art. The present work both addresses this deficiency and proposes some new approaches that outperform existing ones in certain contexts. A thorough set of performance comparisons indicates a very wide range of performance differences among the existing and proposed methods, and clearly identifies those that are the most effective. |
Tasks | Dictionary Learning |
Published | 2017-09-09 |
URL | http://arxiv.org/abs/1709.02893v5 |
http://arxiv.org/pdf/1709.02893v5.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-dictionary-learning-a |
Repo | |
Framework | |
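As orientation for the entry above, here is a minimal numpy sketch of the convolutional sparse coding subproblem, min over X of 0.5*||sum_m d_m * x_m - s||^2 + lam * sum_m ||x_m||_1, which convolutional dictionary learning methods alternate with a filter-update step. The 1-D setting, plain ISTA solver, fixed step size, and odd-length filters with 'same' boundary handling are assumptions of this sketch, not choices from the paper.

```python
import numpy as np
from scipy.signal import fftconvolve

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def csc_ista(s, D, lam=0.1, step=1e-2, iters=200):
    """ISTA for convolutional sparse coding of a 1-D signal s with fixed
    filters D (shape: M x filter_len, odd filter_len assumed)."""
    M = D.shape[0]
    X = np.zeros((M, s.shape[0]))
    for _ in range(iters):
        residual = sum(fftconvolve(X[m], D[m], mode="same") for m in range(M)) - s
        for m in range(M):
            # Correlation with D[m] (reversed-kernel convolution) is the adjoint.
            grad = fftconvolve(residual, D[m][::-1], mode="same")
            X[m] = soft_threshold(X[m] - step * grad, step * lam)
    return X
```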
Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning
Title | Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning |
Authors | Sergio Valcarcel Macua, Aleksi Tukiainen, Daniel García-Ocaña Hernández, David Baldazo, Enrique Munoz de Cote, Santiago Zazo |
Abstract | We propose a fully distributed actor-critic algorithm approximated by deep neural networks, named Diff-DAC, with application to single-task and to average multitask reinforcement learning (MRL). Each agent has access to data from its local task only, but it aims to learn a policy that performs well on average for the whole set of tasks. During the learning process, agents communicate their value-policy parameters to their neighbors, diffusing the information across the network, so that they converge to a common policy, with no need for a central node. The method is scalable, since the computational and communication costs per agent grow with its number of neighbors. We derive Diff-DAC from duality theory and provide novel insights into the standard actor-critic framework, showing that it is actually an instance of the dual ascent method that approximates the solution of a linear program. Experiments suggest that Diff-DAC can outperform the only previous distributed MRL approach, Dist-MTLPS, and even the centralized architecture. |
Tasks | |
Published | 2017-10-28 |
URL | https://arxiv.org/abs/1710.10363v5 |
https://arxiv.org/pdf/1710.10363v5.pdf | |
PWC | https://paperswithcode.com/paper/diff-dac-distributed-actor-critic-for-average |
Repo | |
Framework | |
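A minimal sketch of the diffusion idea in the abstract above: each agent takes a local actor-critic gradient step and then averages parameters with its neighbors, so consensus emerges without a central node. The combination matrix W and the stacked-parameter layout are assumptions of this sketch.

```python
import numpy as np

def diffusion_combine(theta, W):
    """One 'combine' step of a diffusion strategy: every agent replaces its
    value-policy parameters with a convex combination of its neighbors' copies.
    theta: (n_agents, dim) stacked local parameters; W: row-stochastic matrix
    whose sparsity pattern matches the communication graph."""
    return W @ theta

# Hypothetical training loop: interleave local learning with combining.
# for t in range(T):
#     theta = local_actor_critic_step(theta)   # hypothetical per-agent update
#     theta = diffusion_combine(theta, W)
```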
The power of deeper networks for expressing natural functions
Title | The power of deeper networks for expressing natural functions |
Authors | David Rolnick, Max Tegmark |
Abstract | It is well-known that neural networks are universal approximators, but that deeper networks tend in practice to be more powerful than shallower ones. We shed light on this by proving that the total number of neurons $m$ required to approximate natural classes of multivariate polynomials of $n$ variables grows only linearly with $n$ for deep neural networks, but grows exponentially when merely a single hidden layer is allowed. We also provide evidence that when the number of hidden layers is increased from $1$ to $k$, the neuron requirement grows exponentially not with $n$ but with $n^{1/k}$, suggesting that the minimum number of layers required for practical expressibility grows only logarithmically with $n$. |
Tasks | |
Published | 2017-05-16 |
URL | http://arxiv.org/abs/1705.05502v2 |
http://arxiv.org/pdf/1705.05502v2.pdf | |
PWC | https://paperswithcode.com/paper/the-power-of-deeper-networks-for-expressing |
Repo | |
Framework | |
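The linear-in-n neuron counts quoted above rest on the fact that a constant number of neurons can approximate a single product. Below is a sketch of the standard four-neuron multiplication gadget that such constructions compose in a tree; softplus is chosen here because its second derivative at 0 is nonzero, and the scale a is an accuracy knob of this sketch, not a quantity from the paper.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def neural_mul(x, y, a=0.01):
    """Approximate x*y with four softplus neurons and one linear readout.
    Taylor expansion: f(a(x+y)) + f(-a(x+y)) - f(a(x-y)) - f(-a(x-y))
    = 4*f''(0)*a^2*x*y + O(a^4); with softplus, f''(0) = 1/4."""
    num = (softplus(a * (x + y)) + softplus(-a * (x + y))
           - softplus(a * (x - y)) - softplus(-a * (x - y)))
    return num / (a ** 2)

print(neural_mul(3.0, -2.0))  # approximately -6.0
```

Composing roughly n such gadgets in a binary tree of depth O(log n) multiplies n inputs with O(n) neurons, which is the deep-network side of the gap described in the abstract.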
A learning problem that is independent of the set theory ZFC axioms
Title | A learning problem that is independent of the set theory ZFC axioms |
Authors | Shai Ben-David, Pavel Hrubes, Shay Moran, Amir Shpilka, Amir Yehudayoff |
Abstract | We consider the following statistical estimation problem: given a family F of real valued functions over some domain X and an i.i.d. sample drawn from an unknown distribution P over X, find h in F such that the expectation of h w.r.t. P is probably approximately equal to the supremum over expectations of members of F. This Expectation Maximization (EMX) problem captures many well studied learning problems; in fact, it is equivalent to Vapnik's general setting of learning. Surprisingly, we show that EMX learnability, as well as the learning rates of some basic classes F, depends on the cardinality of the continuum and is therefore independent of the set theory ZFC axioms (which are widely accepted as a formalization of the notion of a mathematical proof). We focus on the case where the functions in F are Boolean, which generalizes classification problems. We study the interaction between the statistical sample complexity of F and its combinatorial structure. We introduce a new version of sample compression schemes and show that it characterizes EMX learnability for a wide family of classes. However, we show that for the class of finite subsets of the real line, the existence of such compression schemes is independent of set theory. We conclude that the learnability of that class with respect to the family of probability distributions of countable support is independent of the set theory ZFC axioms. We also explore the existence of a "VC-dimension-like" parameter that captures learnability in this setting. Our results imply that there exists no "finitary" combinatorial parameter that characterizes EMX learnability in a way similar to the VC-dimension based characterization of binary valued classification problems. |
Tasks | |
Published | 2017-11-14 |
URL | http://arxiv.org/abs/1711.05195v1 |
http://arxiv.org/pdf/1711.05195v1.pdf | |
PWC | https://paperswithcode.com/paper/a-learning-problem-that-is-independent-of-the |
Repo | |
Framework | |
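For concreteness, one standard way to read "probably approximately equal" in the EMX statement above is a PAC-style guarantee; the symbols m, epsilon, and delta below are the usual sample-size and accuracy parameters, supplied here as an editorial gloss rather than notation fixed by the paper.

```latex
\Pr_{S \sim P^{m}}\!\left[\; \mathbb{E}_{P}[h_S] \;\ge\; \sup_{f \in F} \mathbb{E}_{P}[f] - \epsilon \;\right] \;\ge\; 1 - \delta
```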
Comparing Rule-Based and Deep Learning Models for Patient Phenotyping
Title | Comparing Rule-Based and Deep Learning Models for Patient Phenotyping |
Authors | Sebastian Gehrmann, Franck Dernoncourt, Yeran Li, Eric T. Carlson, Joy T. Wu, Jonathan Welt, John Foote Jr., Edward T. Moseley, David W. Grant, Patrick D. Tyler, Leo Anthony Celi |
Abstract | Objective: We investigate whether deep learning techniques for natural language processing (NLP) can be used efficiently for patient phenotyping. Patient phenotyping is a classification task for determining whether a patient has a medical condition, and is a crucial part of secondary analysis of healthcare data. We assess the performance of deep learning algorithms and compare them with classical NLP approaches. Materials and Methods: We compare convolutional neural networks (CNNs), n-gram models, and approaches based on cTAKES that extract pre-defined medical concepts from clinical notes and use them to predict patient phenotypes. The performance is tested on 10 different phenotyping tasks using 1,610 discharge summaries extracted from the MIMIC-III database. Results: CNNs outperform the other phenotyping algorithms on all 10 tasks. The average F1-score of our model is 76 (PPV of 83 and sensitivity of 71), and its F1-score is up to 37 points higher than that of alternative approaches. We additionally assess the interpretability of our model by presenting a method that extracts the most salient phrases for a particular prediction. Conclusion: We show that NLP methods based on deep learning improve the performance of patient phenotyping. Our CNN-based algorithm automatically learns the phrases associated with each patient phenotype. As such, it reduces the annotation complexity for clinical domain experts, who are normally required to develop task-specific annotation rules and identify relevant phrases. Our method does well on both performance and interpretability, which indicates that deep learning is an effective approach to patient phenotyping based on clinicians' notes. |
Tasks | |
Published | 2017-03-25 |
URL | http://arxiv.org/abs/1703.08705v1 |
http://arxiv.org/pdf/1703.08705v1.pdf | |
PWC | https://paperswithcode.com/paper/comparing-rule-based-and-deep-learning-models |
Repo | |
Framework | |
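A minimal sketch of the CNN side of the comparison (a standard one-layer text CNN over token embeddings with max-pooling); the vocabulary size, filter widths, and filter counts below are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """One-layer CNN for note classification: embed tokens, convolve with several
    filter widths, max-pool over time, and classify from the pooled features."""
    def __init__(self, vocab=20000, emb=100, n_filters=100, widths=(2, 3, 4), n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.convs = nn.ModuleList(nn.Conv1d(emb, n_filters, w) for w in widths)
        self.fc = nn.Linear(n_filters * len(widths), n_classes)

    def forward(self, tokens):                  # tokens: (batch, seq_len) int ids
        x = self.emb(tokens).transpose(1, 2)    # (batch, emb, seq_len)
        feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(feats, dim=1))
```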
An Empirical Analysis of Approximation Algorithms for the Euclidean Traveling Salesman Problem
Title | An Empirical Analysis of Approximation Algorithms for the Euclidean Traveling Salesman Problem |
Authors | Yihui He, Ming Xiang |
Abstract | The traveling salesman problem (TSP) is a classical computer science optimization problem with applications to industrial engineering, theoretical computer science, bioinformatics, and several other disciplines. In recent years, there has been a plethora of novel approaches for approximate solutions, ranging from simplistic greedy heuristics to cooperative distributed algorithms derived from artificial intelligence. In this paper, we perform an evaluation and analysis of cornerstone algorithms for the Euclidean TSP. We evaluate greedy, 2-opt, and genetic algorithms. We use several datasets as input for the algorithms, including a small dataset, a medium-sized dataset representing cities in the United States, and a synthetic dataset consisting of 200 cities to test algorithm scalability. We find that the greedy and 2-opt algorithms efficiently calculate solutions for smaller datasets. The genetic algorithm achieves the best optimality for medium to large datasets, but generally has a longer runtime. Our implementation is publicly available. |
Tasks | |
Published | 2017-05-25 |
URL | http://arxiv.org/abs/1705.09058v1 |
http://arxiv.org/pdf/1705.09058v1.pdf | |
PWC | https://paperswithcode.com/paper/an-empirical-analysis-of-approximation |
Repo | |
Framework | |
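Two of the three evaluated algorithms are compact enough to sketch directly; the following didactic implementations (nearest-neighbor greedy construction and naive 2-opt improvement) assume points given as a numpy array and favor clarity over the efficiency a real benchmark would need.

```python
import numpy as np

def tour_length(pts, tour):
    """Total length of a closed tour over points pts (n x 2 array)."""
    return sum(np.linalg.norm(pts[tour[i]] - pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def greedy_tour(pts):
    """Greedy (nearest-neighbor) construction starting from city 0."""
    unvisited = set(range(1, len(pts)))
    tour = [0]
    while unvisited:
        last = pts[tour[-1]]
        nxt = min(unvisited, key=lambda j: np.linalg.norm(pts[j] - last))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def two_opt(pts, tour):
    """2-opt local search: reverse a segment whenever that shortens the tour."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 2, len(tour) + 1):
                candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(pts, candidate) < tour_length(pts, tour):
                    tour, improved = candidate, True
    return tour
```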
Fast amortized inference of neural activity from calcium imaging data with variational autoencoders
Title | Fast amortized inference of neural activity from calcium imaging data with variational autoencoders |
Authors | Artur Speiser, Jinyao Yan, Evan Archer, Lars Buesing, Srinivas C. Turaga, Jakob H. Macke |
Abstract | Calcium imaging permits optical measurement of neural activity. Since intracellular calcium concentration is an indirect measurement of neural activity, computational tools are necessary to infer the true underlying spiking activity from fluorescence measurements. Bayesian model inversion can be used to solve this problem, but typically requires either computationally expensive MCMC sampling, or faster but approximate maximum-a-posteriori optimization. Here, we introduce a flexible algorithmic framework for fast, efficient and accurate extraction of neural spikes from imaging data. Using the framework of variational autoencoders, we propose to amortize inference by training a deep neural network to perform model inversion efficiently. The recognition network is trained to produce samples from the posterior distribution over spike trains. Once trained, performing inference amounts to a fast single forward pass through the network, without the need for iterative optimization or sampling. We show that amortization can be applied flexibly to a wide range of nonlinear generative models and significantly improves upon the state of the art in computation time, while achieving competitive accuracy. Our framework is also able to represent posterior distributions over spike-trains. We demonstrate the generality of our method by proposing the first probabilistic approach for separating backpropagating action potentials from putative synaptic inputs in calcium imaging of dendritic spines. |
Tasks | |
Published | 2017-11-06 |
URL | http://arxiv.org/abs/1711.01846v1 |
http://arxiv.org/pdf/1711.01846v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-amortized-inference-of-neural-activity |
Repo | |
Framework | |
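A sketch of what "amortized inference" means operationally here: a recognition network maps a fluorescence trace directly to per-time-bin spike probabilities, so inference is a single forward pass. The convolutional architecture and sizes below are placeholder assumptions; the paper trains such a network as the encoder of a variational autoencoder against a generative model of calcium dynamics.

```python
import torch
import torch.nn as nn

class SpikeRecognition(nn.Module):
    """Recognition network q(spikes | fluorescence): a 1-D CNN that outputs a
    Bernoulli spike probability for every time bin of the input trace."""
    def __init__(self, hidden=64, width=21):
        super().__init__()
        pad = width // 2   # keep the output the same length as the input
        self.net = nn.Sequential(
            nn.Conv1d(1, hidden, width, padding=pad), nn.ReLU(),
            nn.Conv1d(hidden, hidden, width, padding=pad), nn.ReLU(),
            nn.Conv1d(hidden, 1, 1),
        )

    def forward(self, fluorescence):                  # (batch, 1, T)
        return torch.sigmoid(self.net(fluorescence))  # (batch, 1, T) spike probs
```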
Network Analysis for Explanation
Title | Network Analysis for Explanation |
Authors | Hiroshi Kuwajima, Masayuki Tanaka |
Abstract | Safety-critical systems strongly require the quality aspects of artificial intelligence, including explainability. In this paper, we analyzed a trained network to extract the features that contribute most to its inference. Based on the analysis, we developed a simple solution for generating explanations of the inference process. |
Tasks | |
Published | 2017-12-07 |
URL | http://arxiv.org/abs/1712.02890v1 |
http://arxiv.org/pdf/1712.02890v1.pdf | |
PWC | https://paperswithcode.com/paper/network-analysis-for-explanation |
Repo | |
Framework | |
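The abstract is terse about the analysis itself; as a point of reference, the simplest way to surface the features that contribute most to an inference is plain input-gradient saliency, sketched below. This is a generic technique, not necessarily the paper's procedure.

```python
import torch

def saliency(model, x, class_idx):
    """Input-gradient saliency: magnitude of d(class score)/d(input), computed
    for a single input x of shape (1, ...) and a trained classifier model."""
    x = x.clone().requires_grad_(True)
    score = model(x)[0, class_idx]
    score.backward()
    return x.grad.abs()
```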
Ethical Considerations in Artificial Intelligence Courses
Title | Ethical Considerations in Artificial Intelligence Courses |
Authors | Emanuelle Burton, Judy Goldsmith, Sven Koenig, Benjamin Kuipers, Nicholas Mattei, Toby Walsh |
Abstract | The recent surge in interest in ethics in artificial intelligence may leave many educators wondering how to address moral, ethical, and philosophical issues in their AI courses. As instructors we want to develop curriculum that not only prepares students to be artificial intelligence practitioners, but also to understand the moral, ethical, and philosophical impacts that artificial intelligence will have on society. In this article we provide practical case studies and links to resources for use by AI educators. We also provide concrete suggestions on how to integrate AI ethics into a general artificial intelligence course and how to teach a stand-alone artificial intelligence ethics course. |
Tasks | |
Published | 2017-01-26 |
URL | http://arxiv.org/abs/1701.07769v1 |
http://arxiv.org/pdf/1701.07769v1.pdf | |
PWC | https://paperswithcode.com/paper/ethical-considerations-in-artificial |
Repo | |
Framework | |
An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation
Title | An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation |
Authors | Makoto Morishita, Yusuke Oda, Graham Neubig, Koichiro Yoshino, Katsuhito Sudoh, Satoshi Nakamura |
Abstract | Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes. During the mini-batched training process, it is necessary to pad shorter sentences in a mini-batch to be equal in length to the longest sentence therein for efficient computation. Previous work has noted that sorting the corpus based on the sentence length before making mini-batches reduces the amount of padding and increases the processing speed. However, despite the fact that mini-batch creation is an essential step in NMT training, widely used NMT toolkits implement disparate strategies for doing so, which have not been empirically validated or compared. This work investigates mini-batch creation strategies with experiments over two different datasets. Our results suggest that the choice of a mini-batch creation strategy has a large effect on NMT training and some length-based sorting strategies do not always work well compared with simple shuffling. |
Tasks | Machine Translation |
Published | 2017-06-19 |
URL | http://arxiv.org/abs/1706.05765v1 |
http://arxiv.org/pdf/1706.05765v1.pdf | |
PWC | https://paperswithcode.com/paper/an-empirical-study-of-mini-batch-creation |
Repo | |
Framework | |
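One of the compared strategies is easy to state in a few lines: sort by length so the sentences in each batch are of similar size (minimizing padding), then shuffle at the batch level so the training order is still randomized. The (source, target) pair format below is an assumption of this sketch.

```python
import random

def length_sorted_batches(corpus, batch_size):
    """Build mini-batches from a list of (source_tokens, target_tokens) pairs:
    sort by target length to minimize padding, slice into batches, then shuffle
    the batch order so training does not follow a length curriculum."""
    order = sorted(range(len(corpus)), key=lambda i: len(corpus[i][1]))
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    random.shuffle(batches)
    return batches
```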
IOD-CNN: Integrating Object Detection Networks for Event Recognition
Title | IOD-CNN: Integrating Object Detection Networks for Event Recognition |
Authors | Sungmin Eum, Hyungtae Lee, Heesung Kwon, David Doermann |
Abstract | Many previous methods have shown the importance of considering semantically relevant objects for performing event recognition, yet none of these methods have exploited the power of deep convolutional neural networks to directly integrate relevant object information into a unified network. We present a novel unified deep CNN architecture which integrates architecturally different, yet semantically related, object detection networks to enhance the performance of the event recognition task. Our architecture allows the sharing of the convolutional layers and a fully connected layer, which effectively integrates event recognition with rigid and non-rigid object detection. |
Tasks | Object Detection |
Published | 2017-03-21 |
URL | http://arxiv.org/abs/1703.07431v1 |
http://arxiv.org/pdf/1703.07431v1.pdf | |
PWC | https://paperswithcode.com/paper/iod-cnn-integrating-object-detection-networks |
Repo | |
Framework | |
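A structural sketch of the integration idea: one shared convolutional trunk feeding an event-recognition head and two detection-style heads, trained jointly. Every layer size below is a placeholder, and the linear "detector" heads are stand-ins; the paper's heads are full object detection networks.

```python
import torch.nn as nn

class SharedTrunkThreeHeads(nn.Module):
    """Shared conv layers with three task heads: event recognition plus
    rigid and non-rigid object detection (heads simplified to linear layers)."""
    def __init__(self, n_events=10):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(7), nn.Flatten(),
        )
        self.event_head = nn.Linear(64 * 49, n_events)
        self.rigid_head = nn.Linear(64 * 49, 4)      # stand-in for a detector
        self.nonrigid_head = nn.Linear(64 * 49, 4)   # stand-in for a detector

    def forward(self, img):                          # img: (batch, 3, H, W)
        z = self.trunk(img)
        return self.event_head(z), self.rigid_head(z), self.nonrigid_head(z)
```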
Network Sketching: Exploiting Binary Structure in Deep CNNs
Title | Network Sketching: Exploiting Binary Structure in Deep CNNs |
Authors | Yiwen Guo, Anbang Yao, Hao Zhao, Yurong Chen |
Abstract | Convolutional neural networks (CNNs) with deep architectures have substantially advanced the state-of-the-art in computer vision tasks. However, deep networks are typically resource-intensive and thus difficult to deploy on mobile devices. Recently, CNNs with binary weights have shown compelling efficiency, whereas the accuracy of such models is usually unsatisfactory in practice. In this paper, we introduce network sketching as a novel technique for pursuing binary-weight CNNs, targeting more faithful inference and a better trade-off for practical applications. Our basic idea is to exploit binary structure directly in pre-trained filter banks and produce binary-weight models via tensor expansion. The whole process can be treated as a coarse-to-fine model approximation, akin to the pencil drawing steps of outlining and shading. To further speed up the generated models, namely the sketches, we also propose an associative implementation of binary tensor convolutions. Experimental results demonstrate that a proper sketch of AlexNet (or ResNet) outperforms the existing binary-weight models by large margins on the ImageNet large scale classification task, while committing only slightly more memory for network parameters. |
Tasks | |
Published | 2017-06-07 |
URL | http://arxiv.org/abs/1706.02021v1 |
http://arxiv.org/pdf/1706.02021v1.pdf | |
PWC | https://paperswithcode.com/paper/network-sketching-exploiting-binary-structure |
Repo | |
Framework | |
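A hedged reading of the tensor-expansion idea in numpy: greedily approximate a real-valued filter by a short sum of scaled binary tensors, each round fitting the sign pattern of the remaining residual. This captures the coarse-to-fine flavor described above but is not claimed to be the paper's exact procedure.

```python
import numpy as np

def sketch_filter(W, m=3):
    """Approximate filter W by sum_j a_j * B_j with B_j in {-1, +1}.
    For B = sign(residual), the least-squares scale a minimizing
    ||residual - a * B||^2 is the mean absolute residual."""
    residual = W.astype(float).copy()
    scales, binaries = [], []
    for _ in range(m):
        B = np.where(residual >= 0, 1.0, -1.0)
        a = np.abs(residual).mean()
        scales.append(a)
        binaries.append(B)
        residual -= a * B
    return scales, binaries
```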
An Information-Theoretic Optimality Principle for Deep Reinforcement Learning
Title | An Information-Theoretic Optimality Principle for Deep Reinforcement Learning |
Authors | Felix Leibfried, Jordi Grau-Moya, Haitham Bou-Ammar |
Abstract | We methodologically address the problem of Q-value overestimation in deep reinforcement learning to handle high-dimensional state spaces efficiently. By adapting concepts from information theory, we introduce an intrinsic penalty signal encouraging reduced Q-value estimates. The resultant algorithm encompasses a wide range of learning outcomes containing deep Q-networks as a special case. Different learning outcomes can be demonstrated by tuning a Lagrange multiplier accordingly. We furthermore propose a novel scheduling scheme for this Lagrange multiplier to ensure efficient and robust learning. In experiments on Atari, our algorithm outperforms other algorithms (e.g. deep and double deep Q-networks) in terms of both game-play performance and sample complexity. These results remain valid under the recently proposed dueling architecture. |
Tasks | |
Published | 2017-08-06 |
URL | http://arxiv.org/abs/1708.01867v5 |
http://arxiv.org/pdf/1708.01867v5.pdf | |
PWC | https://paperswithcode.com/paper/an-information-theoretic-optimality-principle |
Repo | |
Framework | |
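One concrete way an intrinsic penalty can shrink Q-value estimates is to replace the hard max in the Bellman target with a log-sum-exp (soft) value whose inverse temperature plays the role of the Lagrange multiplier; the hard max is recovered as beta grows. The sketch below assumes a uniform prior over actions and illustrates the mechanism, not the paper's full algorithm.

```python
import numpy as np

def soft_q_target(reward, q_next, beta, gamma=0.99):
    """Soft Bellman target: (1/beta) * log mean_a exp(beta * Q(s', a)) never
    exceeds max_a Q(s', a), discouraging overestimated values.
    Uses the standard max-shift trick for numerical stability."""
    z = beta * q_next                                # (..., n_actions)
    z_max = z.max(axis=-1)
    v = (z_max + np.log(np.mean(np.exp(z - z_max[..., None]), axis=-1))) / beta
    return reward + gamma * v
```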
Knowledge-Guided Recurrent Neural Network Learning for Task-Oriented Action Prediction
Title | Knowledge-Guided Recurrent Neural Network Learning for Task-Oriented Action Prediction |
Authors | Liang Lin, Lili Huang, Tianshui Chen, Yukang Gan, Hui Cheng |
Abstract | This paper aims at task-oriented action prediction, i.e., predicting a sequence of actions toward accomplishing a specific task under a certain scene, which is a new problem in computer vision research. The main challenges lie in how to model task-specific knowledge and integrate it into the learning procedure. In this work, we propose to train a recurrent long short-term memory (LSTM) network to handle this problem, i.e., taking a scene image (including pre-located objects) and the specified task as input and recurrently predicting action sequences. However, training such a network usually requires large amounts of annotated samples to cover the semantic space (e.g., diverse action decompositions and orderings). To alleviate this issue, we introduce a temporal And-Or graph (AOG) for task description, which hierarchically decomposes a task into atomic actions. With this AOG representation, we can produce many valid samples (i.e., action sequences consistent with common sense) by training another auxiliary LSTM network with a small set of annotated samples. These generated samples (i.e., task-oriented action sequences) effectively facilitate training the model for task-oriented action prediction. In the experiments, we create a new dataset containing diverse daily tasks and extensively evaluate the effectiveness of our approach. |
Tasks | Common Sense Reasoning |
Published | 2017-07-15 |
URL | http://arxiv.org/abs/1707.04677v1 |
http://arxiv.org/pdf/1707.04677v1.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-guided-recurrent-neural-network |
Repo | |
Framework | |
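The AOG mechanism above admits a compact sketch: AND nodes expand all children in temporal order, OR nodes choose one alternative, and leaves are atomic actions, so each traversal yields one valid action sequence. The tuple-based node encoding is an assumption of this sketch.

```python
import random

def sample_aog(node):
    """Sample one action sequence from a temporal And-Or graph.
    Node format (assumed): ('and', children), ('or', children), ('act', name)."""
    kind, payload = node
    if kind == 'act':
        return [payload]
    if kind == 'and':                 # concatenate children in temporal order
        return [a for child in payload for a in sample_aog(child)]
    return sample_aog(random.choice(payload))   # 'or': pick one alternative

task = ('and', [('act', 'open fridge'),
                ('or', [('act', 'take milk'), ('act', 'take juice')]),
                ('act', 'close fridge')])
print(sample_aog(task))   # e.g. ['open fridge', 'take juice', 'close fridge']
```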
Domain Adaptation for Resume Classification Using Convolutional Neural Networks
Title | Domain Adaptation for Resume Classification Using Convolutional Neural Networks |
Authors | Luiza Sayfullina, Eric Malmi, Yiping Liao, Alex Jung |
Abstract | We propose a novel method for classifying resume data of job applicants into 27 different job categories using convolutional neural networks. Since resume data is costly and hard to obtain due to its sensitive nature, we use domain adaptation. In particular, we train a classifier on a large number of freely available job description snippets and then use it to classify resume data. We empirically verify a reasonable classification performance of our approach despite having only a small amount of labeled resume data available. |
Tasks | Domain Adaptation |
Published | 2017-07-18 |
URL | http://arxiv.org/abs/1707.05576v1 |
http://arxiv.org/pdf/1707.05576v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-adaptation-for-resume-classification |
Repo | |
Framework | |
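The domain-adaptation setup above reduces to: fit on the plentiful source domain (job-description snippets), predict on the scarce target domain (resumes). The toy data below is invented for illustration, and the linear model is a compact stand-in for the paper's CNN classifier.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the two domains (illustrative only).
job_snippets = ["develops backend services in Java", "prepares financial statements"]
job_labels = ["software", "accounting"]
resume_texts = ["Five years building REST APIs in Java and Spring"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(job_snippets, job_labels)     # train on the source domain
print(clf.predict(resume_texts))      # classify target-domain resumes
```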