July 27, 2019

3213 words 16 mins read

Paper Group ANR 726

Modeling Information Flow Through Deep Neural Networks. Kernel Approximation Methods for Speech Recognition. Ultra-Fast Reactive Transport Simulations When Chemical Reactions Meet Machine Learning: Chemical Equilibrium. Multi-label Class-imbalanced Action Recognition in Hockey Videos via 3D Convolutional Neural Networks. Structure Optimization for …

Modeling Information Flow Through Deep Neural Networks

Title Modeling Information Flow Through Deep Neural Networks
Authors Ahmad Chaddad, Behnaz Naisiri, Marco Pedersoli, Eric Granger, Christian Desrosiers, Matthew Toews
Abstract This paper proposes a principled information-theoretic analysis of classification for deep neural network structures, e.g. convolutional neural networks (CNNs). The output of convolutional filters is modeled as a random variable Y conditioned on the object class C and network filter bank F. The conditional entropy (CENT) H(Y|C,F) is shown in theory and experiments to be a highly compact and class-informative code that can be computed from the filter outputs throughout an existing CNN and used to obtain higher classification accuracy than the original CNN itself. Experiments demonstrate the effectiveness of CENT feature analysis in two separate CNN classification contexts. 1) In the classification of neurodegeneration due to Alzheimer’s disease (AD) and natural aging from 3D magnetic resonance image (MRI) volumes, 3 CENT features result in an AUC of 94.6% for whole-brain AD classification, the highest reported accuracy on the public OASIS dataset used and 12% higher than the softmax output of the original CNN trained for the task. 2) In the context of visual object classification from 2D photographs, transfer learning based on a small set of CENT features identified throughout an existing CNN leads to AUC values comparable to the 1000-feature softmax output of the original network when classifying previously unseen object categories. The general information-theoretic analysis explains various recent CNN design successes, e.g. densely connected CNN architectures, and provides insights for future research directions in deep learning.
Tasks Object Classification, Transfer Learning
Published 2017-11-29
URL http://arxiv.org/abs/1712.00003v1
PDF http://arxiv.org/pdf/1712.00003v1.pdf
PWC https://paperswithcode.com/paper/modeling-information-flow-through-deep-neural
Repo
Framework
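
The CENT feature above is, at heart, a conditional entropy estimate over filter activations. Below is a minimal sketch of one plausible way to compute it, assuming a simple per-class histogram estimator; the binning scheme and the treatment of the filter bank F are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def cent_feature(activations, labels, n_bins=32):
    """Estimate the conditional entropy H(Y|C) of a filter's output Y
    given the class label C, via per-class histograms.

    activations: 1-D array of filter responses, one per example
    labels:      1-D array of integer class labels, same length
    """
    bins = np.histogram_bin_edges(activations, bins=n_bins)
    h = 0.0
    for c in np.unique(labels):
        y_c = activations[labels == c]
        p_c = len(y_c) / len(activations)           # P(C = c)
        counts, _ = np.histogram(y_c, bins=bins)
        p = counts / counts.sum()                   # P(Y | C = c)
        p = p[p > 0]
        h += p_c * -(p * np.log2(p)).sum()          # weighted H(Y | C = c)
    return h
```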

Kernel Approximation Methods for Speech Recognition

Title Kernel Approximation Methods for Speech Recognition
Authors Avner May, Alireza Bagheri Garakani, Zhiyun Lu, Dong Guo, Kuan Liu, Aurélien Bellet, Linxi Fan, Michael Collins, Daniel Hsu, Brian Kingsbury, Michael Picheny, Fei Sha
Abstract We study large-scale kernel methods for acoustic modeling in speech recognition and compare their performance to deep neural networks (DNNs). We perform experiments on four speech recognition datasets, including the TIMIT and Broadcast News benchmark tasks, and compare these two types of models on frame-level performance metrics (accuracy, cross-entropy), as well as on recognition metrics (word/character error rate). In order to scale kernel methods to these large datasets, we use the random Fourier feature method of Rahimi and Recht (2007). We propose two novel techniques for improving the performance of kernel acoustic models. First, in order to reduce the number of random features required by kernel models, we propose a simple but effective method for feature selection. The method is able to explore a large number of non-linear features while maintaining a compact model more efficiently than existing approaches. Second, we present a number of frame-level metrics which correlate very strongly with recognition performance when computed on the heldout set; we take advantage of these correlations by monitoring these metrics during training in order to decide when to stop learning. This technique can noticeably improve the recognition performance of both DNN and kernel models, while narrowing the gap between them. Additionally, we show that the linear bottleneck method of Sainath et al. (2013) improves the performance of our kernel models significantly, in addition to speeding up training and making the models more compact. Together, these three methods dramatically improve the performance of kernel acoustic models, making their performance comparable to DNNs on the tasks we explored.
Tasks Feature Selection, Speech Recognition
Published 2017-01-13
URL http://arxiv.org/abs/1701.03577v1
PDF http://arxiv.org/pdf/1701.03577v1.pdf
PWC https://paperswithcode.com/paper/kernel-approximation-methods-for-speech
Repo
Framework
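
The scaling trick the authors rely on is the random Fourier feature method of Rahimi and Recht (2007), which replaces an exact RBF kernel with an explicit finite-dimensional feature map. A minimal sketch follows; the gamma parameterization and dimensions are illustrative.

```python
import numpy as np

def random_fourier_features(X, n_features, gamma=1.0, seed=0):
    """Approximate the RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)
    with D random Fourier features (Rahimi & Recht, 2007), so that
    k(x, z) ~ phi(x) @ phi(z)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.randn(5, 3)
Phi = random_fourier_features(X, n_features=2000)
approx = Phi @ Phi.T                                      # ~ RBF Gram matrix
exact = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))    # gamma = 1
```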

Ultra-Fast Reactive Transport Simulations When Chemical Reactions Meet Machine Learning: Chemical Equilibrium

Title Ultra-Fast Reactive Transport Simulations When Chemical Reactions Meet Machine Learning: Chemical Equilibrium
Authors Allan M. M. Leal, Dmitrii A. Kulik, Martin O. Saar
Abstract During reactive transport modeling, the computational cost associated with chemical reaction calculations is often 10-100 times higher than that of transport calculations. Most of this cost results from chemical equilibrium calculations that are performed at least once in every mesh cell and at every time step of the simulation. Calculating chemical equilibrium is an iterative process, where each iteration is in general so computationally expensive that even if every calculation converged in a single iteration, the resulting speedup would not be significant. Thus, rather than proposing a fast-converging numerical method for solving chemical equilibrium equations, we present a machine learning method that enables new equilibrium states to be quickly and accurately estimated whenever a previous equilibrium calculation with similar input conditions has been performed. We demonstrate the use of this smart chemical equilibrium method in a reactive transport modeling example and show that, even at early simulation times, the majority of all equilibrium calculations are quickly predicted and, after some time steps, the machine-learning-accelerated chemical solver has been fully trained to rapidly perform all subsequent equilibrium calculations, resulting in speedups of almost two orders of magnitude. We remark that our new on-demand machine learning method can be applied to any case in which a massive number of sequential/parallel evaluations of a computationally expensive function $y=f(x)$ needs to be performed. In contrast to traditional machine learning algorithms, our on-demand training approach does not require a statistics-based training phase before the actual simulation of interest commences. The introduced on-demand training scheme does, however, require the first-order derivatives $\partial f/\partial x$ for later smart predictions.
Tasks
Published 2017-08-16
URL http://arxiv.org/abs/1708.04825v1
PDF http://arxiv.org/pdf/1708.04825v1.pdf
PWC https://paperswithcode.com/paper/ultra-fast-reactive-transport-simulations
Repo
Framework
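
A plausible reading of the on-demand scheme: cache each expensive equilibrium solve together with its first-order derivatives, and answer later queries with a Taylor extrapolation from a nearby cached record. The sketch below assumes a naive distance threshold as the "similar input conditions" test, which is a stand-in for the paper's actual acceptance criterion.

```python
import numpy as np

class OnDemandSurrogate:
    """Caches expensive evaluations y = f(x) together with the Jacobian
    J = df/dx, and answers later queries with a first-order Taylor
    prediction y ~ y0 + J0 (x - x0) from a nearby stored record.
    The distance-based acceptance test below is a placeholder, not the
    paper's own acceptance criterion."""
    def __init__(self, solver, jacobian, tol=1e-3):
        self.solver, self.jacobian, self.tol = solver, jacobian, tol
        self.records = []  # list of (x0, y0, J0)

    def __call__(self, x):
        for x0, y0, J0 in self.records:
            if np.linalg.norm(x - x0) < self.tol:     # "similar input" test
                return y0 + J0 @ (x - x0)             # smart prediction
        y = self.solver(x)                            # expensive full solve
        self.records.append((x, y, self.jacobian(x)))
        return y
```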

Multi-label Class-imbalanced Action Recognition in Hockey Videos via 3D Convolutional Neural Networks

Title Multi-label Class-imbalanced Action Recognition in Hockey Videos via 3D Convolutional Neural Networks
Authors Konstantin Sozykin, Stanislav Protasov, Adil Khan, Rasheed Hussain, Jooyoung Lee
Abstract Automatic analysis of video is one of the most complex problems in the fields of computer vision and machine learning. A significant part of this research deals with (human) activity recognition (HAR), since humans, and the activities that they perform, generate most of the video semantics. Video-based HAR has applications in various domains, but one of the most important and challenging is HAR in sports videos. Some of the major issues include high inter- and intra-class variations, large class imbalance, the presence of both group actions and single-player actions, and recognizing simultaneous actions, i.e., the multi-label learning problem. Keeping in mind these challenges and the recent success of CNNs in solving various computer vision problems, in this work we implement a 3D-CNN-based multi-label deep HAR system for multi-label class-imbalanced action recognition in hockey videos. We test our system in two different scenarios: an ensemble of $k$ binary networks vs. a single $k$-output network, on a publicly available dataset. We also compare our results with the system that was originally designed for the chosen dataset. Experimental results show that the proposed approach performs better than the existing solution.
Tasks Activity Recognition, Human Activity Recognition, Multi-Label Learning, Temporal Action Localization
Published 2017-09-05
URL http://arxiv.org/abs/1709.01421v2
PDF http://arxiv.org/pdf/1709.01421v2.pdf
PWC https://paperswithcode.com/paper/multi-label-class-imbalanced-action
Repo
Framework
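
The single $k$-output alternative amounts to a 3D CNN whose head emits one independent sigmoid per action, trained with binary cross-entropy, so simultaneous labels do not compete as they would under a softmax. Here is a hedged PyTorch sketch; layer sizes are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiLabel3DCNN(nn.Module):
    """A single k-output 3D CNN for multi-label clips: one logit per
    action, so simultaneous actions are predicted independently.
    Layer sizes are illustrative only."""
    def __init__(self, k_labels, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(64, k_labels)   # k logits, not a softmax

    def forward(self, clips):                 # clips: (batch, C, T, H, W)
        z = self.features(clips).flatten(1)
        return self.head(z)

model = MultiLabel3DCNN(k_labels=5)
logits = model(torch.randn(2, 3, 16, 32, 32))
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (2, 5)).float())
```

Note that `BCEWithLogitsLoss` also accepts a `pos_weight` argument, one common way to counter class imbalance in this setting.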

Structure Optimization for Deep Multimodal Fusion Networks using Graph-Induced Kernels

Title Structure Optimization for Deep Multimodal Fusion Networks using Graph-Induced Kernels
Authors Dhanesh Ramachandram, Michal Lisicki, Timothy J. Shields, Mohamed R. Amer, Graham W. Taylor
Abstract A popular testbed for deep learning has been multimodal recognition of human activity or gesture involving diverse inputs such as video, audio, skeletal pose and depth images. Deep learning architectures have excelled on such problems due to their ability to combine modality representations at different levels of nonlinear feature extraction. However, designing an optimal architecture in which to fuse such learned representations has largely been a non-trivial human engineering effort. We treat fusion structure optimization as a hyper-parameter search and cast it as a discrete optimization problem under the Bayesian optimization framework. We propose a novel graph-induced kernel to compute structural similarities in the search space of tree-structured multimodal architectures and demonstrate its effectiveness using two challenging multimodal human activity recognition datasets.
Tasks Activity Recognition, Human Activity Recognition
Published 2017-07-03
URL http://arxiv.org/abs/1707.00750v1
PDF http://arxiv.org/pdf/1707.00750v1.pdf
PWC https://paperswithcode.com/paper/structure-optimization-for-deep-multimodal
Repo
Framework
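
As a rough illustration of how a kernel over tree-structured architectures can work, the toy below measures cosine similarity between labeled-edge count vectors of two fusion trees. This is a deliberately crude stand-in, not the paper's graph-induced kernel.

```python
from collections import Counter
import math

def tree_edge_kernel(tree_a, tree_b):
    """Toy structural-similarity kernel between tree-structured
    architectures, encoded as lists of (parent_label, child_label)
    edges: the cosine similarity of their labeled-edge count vectors."""
    ca, cb = Counter(tree_a), Counter(tree_b)
    dot = sum(ca[e] * cb[e] for e in ca)                 # <phi(a), phi(b)>
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Two candidate fusion trees: fuse (video, audio) early vs. late
a = [("fuse", "video"), ("fuse", "audio"), ("out", "fuse")]
b = [("out", "video"), ("out", "audio")]
print(tree_edge_kernel(a, b))
```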

Pose-conditioned Spatio-Temporal Attention for Human Action Recognition

Title Pose-conditioned Spatio-Temporal Attention for Human Action Recognition
Authors Fabien Baradel, Christian Wolf, Julien Mille
Abstract We address human action recognition from multi-modal video data involving articulated pose and RGB frames and propose a two-stream approach. The pose stream is processed with a convolutional model taking as input a 3D tensor holding data from a sub-sequence. A specific joint ordering, which respects the topology of the human body, ensures that different convolutional layers correspond to meaningful levels of abstraction. The raw RGB stream is handled by a spatio-temporal soft-attention mechanism conditioned on features from the pose network. An LSTM network receives input from a set of image locations at each instant. A trainable glimpse sensor extracts features on a set of predefined locations specified by the pose stream, namely the 4 hands of the two people involved in the activity. Appearance features give important cues on hand motion and on objects held in each hand. We show that it is highly beneficial to shift the attention to different hands at different time steps depending on the activity itself. Finally, a temporal attention mechanism learns how to fuse LSTM features over time. We evaluate the method on 3 datasets. State-of-the-art results are achieved on the largest dataset for human activity recognition, namely NTU-RGB+D, as well as on the SBU Kinect Interaction dataset. Performance close to state-of-the-art is achieved on the smaller MSR Daily Activity 3D dataset.
Tasks Activity Recognition, Human Activity Recognition, Temporal Action Localization
Published 2017-03-29
URL http://arxiv.org/abs/1703.10106v2
PDF http://arxiv.org/pdf/1703.10106v2.pdf
PWC https://paperswithcode.com/paper/pose-conditioned-spatio-temporal-attention
Repo
Framework
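
The core mechanism, stripped down: attention scores over a handful of glimpse features (one per tracked hand) are conditioned on a pose feature vector. The numpy sketch below assumes a bilinear scoring form, with random weights standing in for learned ones; it is not the paper's exact parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pose_conditioned_attention(glimpse_feats, pose_feat, W):
    """Soft attention over a set of glimpse features (e.g., one per
    tracked hand), with scores conditioned on a pose feature vector.
    glimpse_feats: (n_glimpses, d_g), pose_feat: (d_p,), W: (d_p, d_g)
    """
    scores = glimpse_feats @ (W.T @ pose_feat)   # one score per glimpse
    weights = softmax(scores)                    # attention distribution
    return weights @ glimpse_feats, weights      # attended feature

rng = np.random.default_rng(0)
feats, w = pose_conditioned_attention(
    rng.normal(size=(4, 8)),     # 4 hand glimpses, 8-dim features
    rng.normal(size=6),          # 6-dim pose summary
    rng.normal(size=(6, 8)),     # bilinear scoring matrix (assumed form)
)
```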

Tensor SVD: Statistical and Computational Limits

Title Tensor SVD: Statistical and Computational Limits
Authors Anru Zhang, Dong Xia
Abstract In this paper, we propose a general framework for tensor singular value decomposition (tensor SVD), which focuses on the methodology and theory for extracting the hidden low-rank structure from high-dimensional tensor data. Comprehensive results are developed on both the statistical and computational limits for tensor SVD. This problem exhibits three different phases according to the signal-to-noise ratio (SNR). In particular, with strong SNR, we show that the classical higher-order orthogonal iteration achieves the minimax optimal rate of convergence in estimation; with weak SNR, the information-theoretic lower bound implies that it is impossible to have consistent estimation in general; with moderate SNR, we show that non-convex maximum likelihood estimation provides an optimal solution, but at an NP-hard computational cost; moreover, under the hardness hypothesis of hypergraphic planted clique detection, there are no polynomial-time algorithms that perform consistently in general.
Tasks
Published 2017-03-08
URL https://arxiv.org/abs/1703.02724v4
PDF https://arxiv.org/pdf/1703.02724v4.pdf
PWC https://paperswithcode.com/paper/tensor-svd-statistical-and-computational
Repo
Framework
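
The higher-order orthogonal iteration (HOOI) referenced in the strong-SNR regime alternates between projecting the tensor onto the current factor subspaces and refreshing each factor from an SVD of the projected unfolding. A numpy sketch for 3-way tensors:

```python
import numpy as np

def unfold(T, mode):
    """Mode-k matricization of a 3-way tensor."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hooi(T, ranks, n_iter=20):
    """Higher-order orthogonal iteration: estimates orthonormal factor
    matrices U_k spanning the top singular subspaces of each unfolding,
    refined jointly. Initialized with HOSVD."""
    U = [np.linalg.svd(unfold(T, k))[0][:, :r] for k, r in enumerate(ranks)]
    for _ in range(n_iter):
        for k in range(3):
            # Project the other two modes onto their current subspaces,
            # then refresh U_k from the projected mode-k unfolding.
            G = T
            for m in range(3):
                if m != k:
                    G = np.moveaxis(
                        np.tensordot(U[m].T, np.moveaxis(G, m, 0), axes=1),
                        0, m)
            U[k] = np.linalg.svd(unfold(G, k))[0][:, :ranks[k]]
    return U

T = np.random.randn(10, 12, 14)
U1, U2, U3 = hooi(T, ranks=(3, 4, 5))
```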

Efficient Attention using a Fixed-Size Memory Representation

Title Efficient Attention using a Fixed-Size Memory Representation
Authors Denny Britz, Melody Y. Guan, Minh-Thang Luong
Abstract The standard content-based attention mechanism typically used in sequence-to-sequence models is computationally expensive as it requires the comparison of large encoder and decoder states at each time step. In this work, we propose an alternative attention mechanism based on a fixed size memory representation that is more efficient. Our technique predicts a compact set of K attention contexts during encoding and lets the decoder compute an efficient lookup that does not need to consult the memory. We show that our approach performs on-par with the standard attention mechanism while yielding inference speedups of 20% for real-world translation tasks and more for tasks with longer sequences. By visualizing attention scores we demonstrate that our models learn distinct, meaningful alignments.
Tasks
Published 2017-07-01
URL http://arxiv.org/abs/1707.00110v1
PDF http://arxiv.org/pdf/1707.00110v1.pdf
PWC https://paperswithcode.com/paper/efficient-attention-using-a-fixed-size-memory
Repo
Framework
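
A hedged sketch of the fixed-size memory idea: the encoder produces K contexts once, and each decoder step scores only those K vectors rather than the full-length encoder memory. How the contexts are produced here (random projections of the mean encoder state) is purely illustrative; in the paper they are predicted during encoding.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
enc_states = rng.normal(size=(50, 128))        # 50 source steps, d=128

# Encoder side: build K fixed attention contexts once, up front
# (illustrative stand-in for the paper's learned context prediction).
K = 4
proj = rng.normal(size=(K, 128, 128))
contexts = np.stack([P @ enc_states.mean(0) for P in proj])   # (K, 128)

# Decoder side: each step scores only the K contexts, never the
# full length-50 encoder memory.
dec_state = rng.normal(size=128)
weights = softmax(contexts @ dec_state)        # (K,)
attended = weights @ contexts                  # fixed-size lookup
```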

Benford’s Law and First Letter of Word

Title Benford’s Law and First Letter of Word
Authors Xiaoyong Yan, Seong-Gyu Yang, Beom Jun Kim, Petter Minnhagen
Abstract A universal First-Letter Law (FLL) is derived and described. It predicts the percentages of first letters for words in novels. The FLL is akin to Benford’s law (BL) of first digits, which predicts the percentages of first digits in a data collection of numbers. Both are universal in the sense that FLL only depends on the number of letters in the alphabet, whereas BL only depends on the number of digits in the base of the number system. The existence of these types of universal laws appears counter-intuitive. Nonetheless both describe data very well. Relations to some earlier works are given. FLL predicts that an English author on average starts about 16 out of 100 words with the letter ‘t’. This is corroborated by data, yet an author can freely write anything. The fuller implications and applicability of FLL remain for future work.
Tasks
Published 2017-12-17
URL http://arxiv.org/abs/1712.06074v1
PDF http://arxiv.org/pdf/1712.06074v1.pdf
PWC https://paperswithcode.com/paper/benfords-law-and-first-letter-of-word
Repo
Framework
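
Checking the FLL against a text only requires counting first letters. A small sketch:

```python
from collections import Counter

def first_letter_percentages(text):
    """Empirical percentages of first letters across the words of a
    text, for comparison against the First-Letter Law's predictions."""
    words = [w for w in text.lower().split() if w[:1].isalpha()]
    counts = Counter(w[0] for w in words)
    total = sum(counts.values())
    return {c: 100.0 * n / total for c, n in sorted(counts.items())}

sample = "The tide turned twice that night, then the town slept."
print(first_letter_percentages(sample))   # 't' dominates in this toy line
```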

Introducing DeepBalance: Random Deep Belief Network Ensembles to Address Class Imbalance

Title Introducing DeepBalance: Random Deep Belief Network Ensembles to Address Class Imbalance
Authors Peter Xenopoulos
Abstract Class imbalance problems manifest in domains such as financial fraud detection or network intrusion analysis, where the prevalence of one class is much higher than another. Typically, practitioners are more interested in predicting the minority class than the majority class, as the minority class may carry a higher misclassification cost. However, classifier performance deteriorates in the face of class imbalance, as classifiers often simply predict every point as the majority class. Methods for dealing with class imbalance include cost-sensitive learning and resampling techniques. In this paper, we introduce DeepBalance, an ensemble of deep belief networks trained with balanced bootstraps and random feature selection. We demonstrate that our proposed method outperforms baseline resampling methods such as SMOTE and under- and over-sampling in metrics such as AUC and sensitivity when applied to highly imbalanced financial transaction data. Additionally, we explore the performance and training time implications of various model parameters. Furthermore, we show that our model is easily parallelizable, which can reduce training times. Finally, we present an implementation of DeepBalance in R.
Tasks Feature Selection, Fraud Detection
Published 2017-09-28
URL http://arxiv.org/abs/1709.10056v2
PDF http://arxiv.org/pdf/1709.10056v2.pdf
PWC https://paperswithcode.com/paper/introducing-deepbalance-random-deep-belief
Repo
Framework
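
The training recipe, stripped to its essentials, is balanced bootstrapping plus random feature selection per ensemble member. The sketch below substitutes logistic regression for the paper's deep belief networks, so it illustrates the resampling scheme rather than the model itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def balanced_bootstrap_ensemble(X, y, n_models=10, n_feats=5, seed=0):
    """Ensemble trained on balanced bootstraps with random feature
    selection, in the spirit of DeepBalance -- but with logistic
    regression standing in for deep belief networks."""
    rng = np.random.default_rng(seed)
    minority, majority = np.where(y == 1)[0], np.where(y == 0)[0]
    models = []
    for _ in range(n_models):
        maj = rng.choice(majority, size=len(minority), replace=True)
        mino = rng.choice(minority, size=len(minority), replace=True)
        idx = np.concatenate([maj, mino])               # balanced bootstrap
        feats = rng.choice(X.shape[1], size=n_feats, replace=False)
        clf = LogisticRegression().fit(X[np.ix_(idx, feats)], y[idx])
        models.append((clf, feats))
    return models

def ensemble_proba(models, X):
    """Average the per-model minority-class probabilities."""
    return np.mean([clf.predict_proba(X[:, f])[:, 1] for clf, f in models],
                   axis=0)
```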

Capturing Reliable Fine-Grained Sentiment Associations by Crowdsourcing and Best-Worst Scaling

Title Capturing Reliable Fine-Grained Sentiment Associations by Crowdsourcing and Best-Worst Scaling
Authors Svetlana Kiritchenko, Saif M. Mohammad
Abstract Access to word-sentiment associations is useful for many applications, including sentiment analysis, stance detection, and linguistic analysis. However, manually assigning fine-grained sentiment association scores to words has many challenges with respect to keeping annotations consistent. We apply the annotation technique of Best-Worst Scaling to obtain real-valued sentiment association scores for words and phrases in three different domains: general English, English Twitter, and Arabic Twitter. We show that on all three domains the ranking of words by sentiment remains remarkably consistent even when the annotation process is repeated with a different set of annotators. We also, for the first time, determine the minimum difference in sentiment association that is perceptible to native speakers of a language.
Tasks Sentiment Analysis, Stance Detection
Published 2017-12-05
URL http://arxiv.org/abs/1712.01741v1
PDF http://arxiv.org/pdf/1712.01741v1.pdf
PWC https://paperswithcode.com/paper/capturing-reliable-fine-grained-sentiment
Repo
Framework
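
The Best-Worst Scaling counting procedure itself is simple: an item's score is the fraction of times it was chosen best minus the fraction of times it was chosen worst, giving a real value in [-1, 1]. A sketch:

```python
from collections import Counter

def best_worst_scores(annotations):
    """Best-Worst Scaling counting: score(item) =
    %times chosen best - %times chosen worst.
    annotations: list of (tuple_of_items, best_item, worst_item)"""
    seen, best, worst = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        seen.update(items)
        best[b] += 1
        worst[w] += 1
    return {it: (best[it] - worst[it]) / seen[it] for it in seen}

anns = [
    (("joy", "dull", "fine"), "joy", "dull"),
    (("joy", "grim", "fine"), "joy", "grim"),
    (("fine", "dull", "grim"), "fine", "grim"),
]
print(best_worst_scores(anns))   # joy > fine > dull > grim
```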

Low-memory GEMM-based convolution algorithms for deep neural networks

Title Low-memory GEMM-based convolution algorithms for deep neural networks
Authors Andrew Anderson, Aravind Vasudevan, Cormac Keane, David Gregg
Abstract Deep neural networks (DNNs) require very large amounts of computation both for training and for inference when deployed in the field. A common approach to implementing DNNs is to recast the most computationally expensive operations as general matrix multiplication (GEMM). However, as we demonstrate in this paper, there are a great many different ways to express DNN convolution operations using GEMM. Although the different approaches all perform the same number of operations, the size of their temporary data structures differs significantly. Convolution of an input matrix with dimensions $C \times H \times W$ requires $O(K^2CHW)$ additional space using the classical im2col approach. More recently, memory-efficient approaches requiring just $O(KCHW)$ auxiliary space have been proposed. We present two novel GEMM-based algorithms that require just $O(MHW)$ and $O(KW)$ additional space respectively, where $M$ is the number of channels in the result of the convolution. These algorithms dramatically reduce the space overhead of DNN convolution, making it much more suitable for memory-limited embedded systems. Experimental evaluation shows that our low-memory algorithms are just as fast as the best patch-building approaches despite requiring just a fraction of the amount of additional memory. Our low-memory algorithms have excellent data locality, which gives them a further edge over patch-building algorithms when multiple cores are used. As a result, our low-memory algorithms often outperform the best patch-building algorithms using multiple threads.
Tasks
Published 2017-09-08
URL http://arxiv.org/abs/1709.03395v1
PDF http://arxiv.org/pdf/1709.03395v1.pdf
PWC https://paperswithcode.com/paper/low-memory-gemm-based-convolution-algorithms
Repo
Framework
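
For reference, here is the classical im2col baseline the paper improves on: every $K \times K$ patch is copied into a column of an $O(K^2CHW)$ temporary, after which the convolution is a single GEMM.

```python
import numpy as np

def im2col_conv(x, w):
    """Classical im2col convolution: materialize every KxK patch as a
    column (the O(K^2 CHW) temporary the paper improves on), then do a
    single GEMM. x: (C, H, W) input, w: (M, C, K, K) filters;
    stride 1, no padding."""
    C, H, W = x.shape
    M, _, K, _ = w.shape
    Ho, Wo = H - K + 1, W - K + 1
    cols = np.empty((C * K * K, Ho * Wo))                # the big temporary
    for i in range(Ho):
        for j in range(Wo):
            cols[:, i * Wo + j] = x[:, i:i + K, j:j + K].ravel()
    return (w.reshape(M, -1) @ cols).reshape(M, Ho, Wo)  # one GEMM

out = im2col_conv(np.random.randn(3, 8, 8), np.random.randn(4, 3, 3, 3))
```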

Rotting Bandits

Title Rotting Bandits
Authors Nir Levine, Koby Crammer, Shie Mannor
Abstract The Multi-Armed Bandits (MAB) framework highlights the tension between acquiring new knowledge (Exploration) and leveraging available knowledge (Exploitation). In the classical MAB problem, a decision maker must choose an arm at each time step, upon which she receives a reward. The decision maker’s objective is to maximize her cumulative expected reward over the time horizon. The MAB problem has been studied extensively, specifically under the assumption of the arms’ rewards distributions being stationary, or quasi-stationary, over time. We consider a variant of the MAB framework, which we termed Rotting Bandits, where each arm’s expected reward decays as a function of the number of times it has been pulled. We are motivated by many real-world scenarios such as online advertising, content recommendation, crowdsourcing, and more. We present algorithms, accompanied by simulations, and derive theoretical guarantees.
Tasks Multi-Armed Bandits
Published 2017-02-23
URL http://arxiv.org/abs/1702.07274v4
PDF http://arxiv.org/pdf/1702.07274v4.pdf
PWC https://paperswithcode.com/paper/rotting-bandits
Repo
Framework
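
A toy simulation of the rotting setting: each arm's mean reward decays in its own pull count, so a sliding-window average tracks an arm's current value better than its lifetime mean does. The window heuristic below is an illustration in the spirit of the paper's algorithms, not their exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, horizon = 3, 2000
pulls = np.zeros(n_arms, dtype=int)

def expected_reward(arm, n_pulls):
    """Each arm's mean reward rots with its own pull count."""
    base = np.array([1.0, 0.8, 0.6])
    return base[arm] * 0.99 ** n_pulls

window, history = 20, [[] for _ in range(n_arms)]
total = 0.0
for t in range(horizon):
    # Sliding-window estimate of each arm's *current* (decayed) value;
    # np.inf forces every arm to be tried at least once.
    est = [np.mean(h[-window:]) if h else np.inf for h in history]
    arm = int(np.argmax(est))
    r = expected_reward(arm, pulls[arm]) + rng.normal(scale=0.1)
    history[arm].append(r)
    pulls[arm] += 1
    total += r
```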

Scalable Exact Parent Sets Identification in Bayesian Networks Learning with Apache Spark

Title Scalable Exact Parent Sets Identification in Bayesian Networks Learning with Apache Spark
Authors Subhadeep Karan, Jaroslaw Zola
Abstract In Machine Learning, the parent set identification problem is to find a set of random variables that best explains a selected variable, given the data and some predefined scoring function. This problem is a critical component of structure learning for Bayesian networks and of Markov blanket discovery, and thus has many practical applications, ranging from fraud detection to clinical decision support. In this paper, we introduce a new distributed-memory approach to the exact parent set assignment problem. To achieve scalability, we derive theoretical bounds to constrain the search space when the MDL scoring function is used, and we reorganize the underlying dynamic programming such that the computational density is increased and fine-grain synchronization is eliminated. We then design an efficient realization of our approach on the Apache Spark platform. Through experimental results, we demonstrate that the method maintains strong scalability on a 500-core standalone Spark cluster, and that it can be used to efficiently process data sets with 70 variables, far beyond the reach of currently available solutions.
Tasks Fraud Detection
Published 2017-05-18
URL http://arxiv.org/abs/1705.06390v2
PDF http://arxiv.org/pdf/1705.06390v2.pdf
PWC https://paperswithcode.com/paper/scalable-exact-parent-sets-identification-in
Repo
Framework
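
The kernel being distributed is, at its core, an exhaustive scored search over candidate parent subsets. Below is a sketch with a stand-in penalized-likelihood scorer; the paper's MDL bounds, dynamic programming reorganization, and Spark realization are not reproduced.

```python
import numpy as np
from itertools import combinations

def best_parent_set(data, target, candidates, score, max_size=3):
    """Exhaustive exact parent-set search for one variable: score every
    candidate subset and keep the best. `score` is any decomposable
    criterion (e.g. an MDL score); lower is better."""
    best, best_score = (), score(data, target, ())
    for k in range(1, max_size + 1):
        for parents in combinations(candidates, k):
            s = score(data, target, parents)
            if s < best_score:
                best, best_score = parents, s
    return best, best_score

# Toy usage with a stand-in scorer: residual variance plus a size penalty
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 6))
data[:, 0] = 2 * data[:, 1] - data[:, 4] + 0.1 * rng.normal(size=500)

def toy_score(d, t, parents):
    n = len(d)
    if not parents:
        return n * np.log(d[:, t].var())
    P = d[:, list(parents)]
    coef, res, *_ = np.linalg.lstsq(P, d[:, t], rcond=None)
    rss = res[0] if res.size else ((d[:, t] - P @ coef) ** 2).sum()
    return n * np.log(rss / n) + len(parents) * np.log(n)

print(best_parent_set(data, 0, [1, 2, 3, 4, 5], toy_score))
```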

Strategic Coalitions with Perfect Recall

Title Strategic Coalitions with Perfect Recall
Authors Pavel Naumov, Jia Tao
Abstract The paper proposes a bimodal logic that describes an interplay between distributed knowledge modality and coalition know-how modality. Unlike other similar systems, the one proposed here assumes perfect recall by all agents. Perfect recall is captured in the system by a single axiom. The main technical results are the soundness and the completeness theorems for the proposed logical system.
Tasks
Published 2017-07-13
URL http://arxiv.org/abs/1707.04298v2
PDF http://arxiv.org/pdf/1707.04298v2.pdf
PWC https://paperswithcode.com/paper/strategic-coalitions-with-perfect-recall
Repo
Framework