Paper Group ANR 558
Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection
Title | Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection |
Authors | Tharindu Fernando, Simon Denman, Sridha Sridharan, Clinton Fookes |
Abstract | As humans we possess an intuitive ability for navigation which we master through years of practice; however, existing approaches to model this trait for diverse tasks including monitoring pedestrian flow and detecting abnormal events have been limited by their reliance on a variety of hand-crafted features. Recent research in deep learning has demonstrated the power of learning features directly from the data; and related research in recurrent neural networks has shown exemplary results in sequence-to-sequence problems such as neural machine translation and neural image caption generation. Motivated by these approaches, we propose a novel method to predict the future motion of a pedestrian given a short history of their own, and their neighbours', past behaviour. The novelty of the proposed method is the combined attention model which utilises both “soft attention” as well as “hard-wired” attention in order to map the trajectory information from the local neighbourhood to the future positions of the pedestrian of interest. We illustrate how a simple approximation of attention weights (i.e. hard-wired) can be merged together with soft attention weights in order to make our model applicable for challenging real-world scenarios with hundreds of neighbours. The navigational capability of the proposed method is tested on two challenging publicly available surveillance databases where our model outperforms the current state-of-the-art methods. Additionally, we illustrate how the proposed architecture can be directly applied for the task of abnormal event detection without handcrafting the features. |
Tasks | Machine Translation, Trajectory Prediction |
Published | 2017-02-18 |
URL | http://arxiv.org/abs/1702.05552v1 |
http://arxiv.org/pdf/1702.05552v1.pdf | |
PWC | https://paperswithcode.com/paper/soft-hardwired-attention-an-lstm-framework |
Repo | |
Framework | |
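A minimal sketch of the combined attention idea described in the abstract above, assuming a dot-product form for the soft weights and an inverse-distance form for the hard-wired weights; the function and parameter names are illustrative, not the authors' implementation.

```python
import numpy as np

def merged_attention_context(query, neighbour_feats, neighbour_dists, alpha=0.5):
    """Illustrative merge of soft and hard-wired attention (not the authors' code).

    query           : (d,)   hidden state of the pedestrian of interest
    neighbour_feats : (n, d) encoded trajectories of n neighbours
    neighbour_dists : (n,)   distances to the neighbours
    alpha           : mixing weight between soft and hard-wired scores (assumed)
    """
    # Soft attention: learned-style dot-product scores, normalised with softmax.
    scores = neighbour_feats @ query
    soft_w = np.exp(scores - scores.max())
    soft_w /= soft_w.sum()

    # Hard-wired attention: a simple 1/distance approximation, normalised.
    hard_w = 1.0 / (neighbour_dists + 1e-6)
    hard_w /= hard_w.sum()

    # Merge the two weight vectors and aggregate the neighbour features.
    weights = alpha * soft_w + (1.0 - alpha) * hard_w
    return weights @ neighbour_feats

# Toy usage with random data.
rng = np.random.default_rng(0)
ctx = merged_attention_context(rng.normal(size=16),
                               rng.normal(size=(5, 16)),
                               rng.uniform(1.0, 10.0, size=5))
print(ctx.shape)  # (16,)
```

Because the hard-wired weights need no learned parameters, they can cover arbitrarily many neighbours cheaply while the soft weights refine the nearby ones.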
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Title | Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments |
Authors | Zixing Zhang, Jürgen Geiger, Jouni Pohjalainen, Amr El-Desoky Mousa, Wenyu Jin, Björn Schuller |
Abstract | Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks. |
Tasks | Robust Speech Recognition, Speech Recognition |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10874v3 |
http://arxiv.org/pdf/1705.10874v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-environmentally-robust |
Repo | |
Framework | |
Deep Regression Bayesian Network and Its Applications
Title | Deep Regression Bayesian Network and Its Applications |
Authors | Siqi Nie, Meng Zheng, Qiang Ji |
Abstract | Deep directed generative models have attracted much attention recently due to their generative modeling nature and powerful data representation ability. In this paper, we review different structures of deep directed generative models and the learning and inference algorithms associated with the structures. We focus on a specific structure that consists of layers of Bayesian Networks due to the property of capturing inherent and rich dependencies among latent variables. The major difficulty of learning and inference with deep directed models with many latent variables is the intractable inference due to the dependencies among the latent variables and the exponential number of latent variable configurations. Current solutions use variational methods, often through an auxiliary network, to approximate the posterior probability inference. In contrast, inference can also be performed directly without using any auxiliary network to maximally preserve the dependencies among the latent variables. Specifically, by exploiting the sparse representation of the latent space, a max-max instead of max-sum operation can be used to overcome the exponential number of latent configurations. Furthermore, the max-max operation and augmented coordinate ascent are applied to both supervised and unsupervised learning as well as to various inference tasks. Quantitative evaluations of different models on benchmark datasets are given for both data representation and feature learning tasks. |
Tasks | |
Published | 2017-10-13 |
URL | http://arxiv.org/abs/1710.04809v1 |
http://arxiv.org/pdf/1710.04809v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-regression-bayesian-network-and-its |
Repo | |
Framework | |
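A schematic of the max-max approximation mentioned in the abstract above, written for a generic deep directed model with observation $\mathbf{x}$, output $\mathbf{y}$ and latent configuration $\mathbf{z}$; this is the standard form of the idea, not the paper's exact notation.

```latex
% Exact inference marginalises over all latent configurations, which is
% exponential in the number of latent variables:
\mathbf{y}^{*} \;=\; \arg\max_{\mathbf{y}} \sum_{\mathbf{z}} P(\mathbf{y}, \mathbf{z} \mid \mathbf{x}) .
% The max-max operation replaces the sum by its dominant term, which is a good
% approximation when the latent representation is sparse (few configurations carry mass):
\mathbf{y}^{*} \;\approx\; \arg\max_{\mathbf{y}} \max_{\mathbf{z}} P(\mathbf{y}, \mathbf{z} \mid \mathbf{x}) .
```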
Residual Convolutional CTC Networks for Automatic Speech Recognition
Title | Residual Convolutional CTC Networks for Automatic Speech Recognition |
Authors | Yisen Wang, Xuejiao Deng, Songbai Pu, Zhiheng Huang |
Abstract | Deep learning approaches have been widely used in Automatic Speech Recognition (ASR) and they have achieved a significant accuracy improvement. Especially, Convolutional Neural Networks (CNNs) have been revisited in ASR recently. However, most CNNs used in existing work have less than 10 layers, which may not be deep enough to capture all human speech signal information. In this paper, we propose a novel deep and wide CNN architecture denoted as RCNN-CTC, which has residual connections and a Connectionist Temporal Classification (CTC) loss function. RCNN-CTC is an end-to-end system which can exploit temporal and spectral structures of speech signals simultaneously. Furthermore, we introduce a CTC-based system combination, which is different from the conventional frame-wise senone-based one. The basic subsystems adopted in the combination are of different types and thus mutually complementary. Experimental results show that our proposed single system RCNN-CTC can achieve the lowest word error rate (WER) on WSJ and Tencent Chat data sets, compared to several widely used neural network systems in ASR. In addition, the proposed system combination can offer a further error reduction on these two data sets, resulting in relative WER reductions of $14.91\%$ and $6.52\%$ on WSJ dev93 and Tencent Chat data sets respectively. |
Tasks | Speech Recognition |
Published | 2017-02-24 |
URL | http://arxiv.org/abs/1702.07793v1 |
http://arxiv.org/pdf/1702.07793v1.pdf | |
PWC | https://paperswithcode.com/paper/residual-convolutional-ctc-networks-for |
Repo | |
Framework | |
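A toy sketch of a residual convolutional acoustic model trained with CTC, assuming PyTorch and a 29-symbol output vocabulary with the blank at index 0; the layer sizes and feature pipeline below are placeholders, not the paper's RCNN-CTC configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualConvBlock(nn.Module):
    """Illustrative residual 1-D conv block over spectral features (not the paper's exact layers)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(out + x)   # residual connection

class ToyRCNNCTC(nn.Module):
    def __init__(self, feat_dim=40, channels=64, num_blocks=4, vocab=29):
        super().__init__()
        self.inp = nn.Conv1d(feat_dim, channels, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(*[ResidualConvBlock(channels) for _ in range(num_blocks)])
        self.out = nn.Linear(channels, vocab)    # vocab includes the CTC blank at index 0

    def forward(self, feats):                    # feats: (batch, time, feat_dim)
        x = self.inp(feats.transpose(1, 2))
        x = self.blocks(x).transpose(1, 2)       # (batch, time, channels)
        return self.out(x).log_softmax(dim=-1)   # log-probs for CTC

# Toy training step with random data.
model, ctc = ToyRCNNCTC(), nn.CTCLoss(blank=0)
feats = torch.randn(2, 100, 40)
log_probs = model(feats).transpose(0, 1)         # CTCLoss expects (time, batch, vocab)
targets = torch.randint(1, 29, (2, 20))          # labels exclude the blank symbol
input_lengths = torch.full((2,), 100, dtype=torch.long)
target_lengths = torch.full((2,), 20, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```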
Energy Propagation in Deep Convolutional Neural Networks
Title | Energy Propagation in Deep Convolutional Neural Networks |
Authors | Thomas Wiatowski, Philipp Grohs, Helmut Bölcskei |
Abstract | Many practical machine learning tasks employ very deep convolutional neural networks. Such large depths pose formidable computational challenges in training and operating the network. It is therefore important to understand how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers. In addition, it is desirable that the feature extractor generated by the network be informative in the sense of the only signal mapping to the all-zeros feature vector being the zero input signal. This “trivial null-set” property can be accomplished by asking for “energy conservation” in the sense of the energy in the feature vector being proportional to that of the corresponding input signal. This paper establishes conditions for energy conservation (and thus for a trivial null-set) for a wide class of deep convolutional neural network-based feature extractors and characterizes corresponding feature map energy decay rates. Specifically, we consider general scattering networks employing the modulus non-linearity and we find that under mild analyticity and high-pass conditions on the filters (which encompass, inter alia, various constructions of Weyl-Heisenberg filters, wavelets, ridgelets, ($\alpha$)-curvelets, and shearlets) the feature map energy decays at least polynomially fast. For broad families of wavelets and Weyl-Heisenberg filters, the guaranteed decay rate is shown to be exponential. Moreover, we provide handy estimates of the number of layers needed to have at least $((1-\varepsilon)\cdot 100)\%$ of the input signal energy be contained in the feature vector. |
Tasks | |
Published | 2017-04-12 |
URL | http://arxiv.org/abs/1704.03636v3 |
http://arxiv.org/pdf/1704.03636v3.pdf | |
PWC | https://paperswithcode.com/paper/energy-propagation-in-deep-convolutional |
Repo | |
Framework | |
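A toy numerical experiment, not the paper's construction, illustrating what "feature map energy decay" means: a one-dimensional cascade with a Haar-like low-pass/high-pass filter pair and the modulus non-linearity, where the low-pass output is emitted as a feature map at each layer and the energy still propagating is printed.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1024)
x /= np.linalg.norm(x)                       # unit input energy

# Toy two-tap low-pass / high-pass pair (Haar-like, illustrative only).
low  = np.array([0.5,  0.5])
high = np.array([0.5, -0.5])

signal = x
for layer in range(8):
    # Low-pass output is "emitted" as a feature map; its energy leaves the cascade.
    emitted = np.sum(np.convolve(signal, low, mode="same") ** 2)
    # High-pass branch passes through the modulus non-linearity and propagates deeper.
    signal = np.abs(np.convolve(signal, high, mode="same"))
    propagating = np.sum(signal ** 2)
    print(f"layer {layer + 1}: emitted = {emitted:.4f}, still propagating = {propagating:.4f}")
```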
Medoids in almost linear time via multi-armed bandits
Title | Medoids in almost linear time via multi-armed bandits |
Authors | Vivek Bagaria, Govinda M. Kamath, Vasilis Ntranos, Martin J. Zhang, David Tse |
Abstract | Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit which uses O(n log n) distance evaluations to compute the medoid with high probability. Med-dit is based on a connection with the multi-armed bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and the single-cell RNA-Seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a 5-10x improvement in performance over the current state of the art. Med-dit is available at https://github.com/bagavi/Meddit |
Tasks | Multi-Armed Bandits |
Published | 2017-11-02 |
URL | http://arxiv.org/abs/1711.00817v3 |
http://arxiv.org/pdf/1711.00817v3.pdf | |
PWC | https://paperswithcode.com/paper/medoids-in-almost-linear-time-via-multi-armed |
Repo | |
Framework | |
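A UCB-style sketch of the bandit idea behind Med-dit (the released code is at the URL above; the confidence radius and elimination rule here are heuristic placeholders): each point is an arm, and pulling an arm measures its distance to one randomly chosen other point, so mean distances are estimated from a few samples instead of all $n$.

```python
import numpy as np

def meddit_style_medoid(points, budget_per_round=10, rounds=50, seed=0):
    """UCB-style medoid search illustrating the bandit idea (not the released Med-dit code)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    sums, counts = np.zeros(n), np.zeros(n)
    active = np.arange(n)

    for _ in range(rounds):
        for i in active:
            js = rng.integers(0, n, size=budget_per_round)
            sums[i] += np.linalg.norm(points[i] - points[js], axis=1).sum()
            counts[i] += budget_per_round
        means = sums[active] / counts[active]
        radius = np.sqrt(2.0 * np.log(n * counts[active]) / counts[active])  # heuristic confidence radius
        # Keep only arms whose lower bound does not exceed the best upper bound.
        best_ucb = np.min(means + radius)
        active = active[means - radius <= best_ucb]
        if len(active) == 1:
            break
    return active[np.argmin(sums[active] / counts[active])]

pts = np.random.default_rng(1).normal(size=(500, 20))
print("estimated medoid index:", meddit_style_medoid(pts))
```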
Action recognition by learning pose representations
Title | Action recognition by learning pose representations |
Authors | Alessia Saggese, Nicola Strisciuglio, Mario Vento, Nicolai Petkov |
Abstract | Pose detection is one of the fundamental steps for the recognition of human actions. In this paper we propose a novel trainable detector for recognizing human poses based on the analysis of the skeleton. The main idea is that a skeleton pose can be described by the spatial arrangements of its joints. Starting from this consideration, we propose a trainable pose detector, that can be configured on a prototype skeleton in an automatic configuration process. The result of the configuration is a model of the position of the joints in the concerned skeleton. In the application phase, the joint positions contained in the model are compared with the ones of their homologous joints in the skeleton under test. The similarity of two skeletons is computed as a combination of the position scores achieved by homologous joints. In this paper we describe an action classification method based on the use of the proposed trainable detectors to extract features from the skeletons. We performed experiments on the publicly available MSDRA data set and the achieved results confirm the effectiveness of the proposed approach. |
Tasks | Action Classification, Temporal Action Localization |
Published | 2017-08-02 |
URL | http://arxiv.org/abs/1708.00672v1 |
http://arxiv.org/pdf/1708.00672v1.pdf | |
PWC | https://paperswithcode.com/paper/action-recognition-by-learning-pose |
Repo | |
Framework | |
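A minimal sketch of the configure-then-compare idea from the abstract above: the model stores the joint positions of a prototype skeleton, and a test skeleton is scored by combining per-joint position scores of homologous joints. The Gaussian position score and the averaging combination are assumptions, not the authors' exact functions.

```python
import numpy as np

def configure_pose_model(prototype_joints):
    """Store the joint positions of a prototype skeleton (illustrative, not the authors' detector)."""
    return np.asarray(prototype_joints, dtype=float)

def pose_score(model_joints, test_joints, sigma=0.1):
    """Similarity of a test skeleton to the configured model.

    Each joint in the model is compared with its homologous joint in the test skeleton;
    the per-joint position score (a Gaussian of the displacement, an assumed choice)
    is combined across joints by averaging.
    """
    d = np.linalg.norm(model_joints - np.asarray(test_joints, dtype=float), axis=1)
    return float(np.mean(np.exp(-(d ** 2) / (2.0 * sigma ** 2))))

# Toy usage: a 15-joint prototype and a slightly perturbed test skeleton.
rng = np.random.default_rng(0)
proto = rng.uniform(size=(15, 3))
model = configure_pose_model(proto)
print(pose_score(model, proto + 0.02 * rng.normal(size=(15, 3))))
```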
Detection of irregular QRS complexes using Hermite Transform and Support Vector Machine
Title | Detection of irregular QRS complexes using Hermite Transform and Support Vector Machine |
Authors | Zoja Vulaj, Milos Brajovic, Andjela Draganic, Irena Orovic |
Abstract | Computer-based recognition and detection of abnormalities in ECG signals is proposed. For this purpose, Support Vector Machines (SVMs) are combined with the advantages of the Hermite transform representation. SVMs represent a special type of classification technique commonly used in medical applications. Automatic classification of ECG could make the work of cardiology departments faster and more efficient. It would also reduce the number of false diagnoses and, as a result, save lives. The working principle of the SVM is based on translating the data into a high-dimensional feature space and separating it using a linear classifier. In order to provide an optimal representation for SVM application, the Hermite transform domain is used. This domain is proved to be suitable because of the similarity of the QRS complex with the Hermite basis functions. The maximal signal information is obtained using a small set of features that are used for detection of irregular QRS complexes. The aim of the paper is to show that these features can be employed for automatic ECG signal analysis. |
Tasks | |
Published | 2017-05-12 |
URL | http://arxiv.org/abs/1705.04519v1 |
http://arxiv.org/pdf/1705.04519v1.pdf | |
PWC | https://paperswithcode.com/paper/detection-of-irregular-qrs-complexes-using |
Repo | |
Framework | |
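A small sketch of the pipeline the abstract describes: project each QRS segment onto a few Hermite basis functions and classify the resulting coefficients with an SVM. The number of basis functions, the time scale, and the synthetic "beats" below are placeholders, not the paper's settings or data.

```python
import numpy as np
from math import factorial, pi
from sklearn.svm import SVC

def hermite_function(n, x):
    """n-th Hermite function: physicists' Hermite polynomial with a Gaussian envelope."""
    coeffs = np.zeros(n + 1); coeffs[n] = 1.0
    norm = 1.0 / np.sqrt(2.0 ** n * factorial(n) * np.sqrt(pi))
    return norm * np.exp(-x ** 2 / 2.0) * np.polynomial.hermite.hermval(x, coeffs)

def hermite_features(beat, order=6, scale=3.0):
    """Project a QRS segment onto the first few Hermite functions (feature count is an assumption)."""
    x = np.linspace(-scale, scale, len(beat))
    dx = x[1] - x[0]
    return np.array([np.sum(beat * hermite_function(n, x)) * dx for n in range(order)])

# Toy usage: synthetic "beats" labelled regular/irregular, classified with an SVM.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
regular = [np.exp(-x ** 2) + 0.05 * rng.normal(size=200) for _ in range(40)]
irregular = [np.exp(-(x - 0.8) ** 2 / 0.3) + 0.05 * rng.normal(size=200) for _ in range(40)]
X = np.array([hermite_features(b) for b in regular + irregular])
y = np.array([0] * 40 + [1] * 40)
clf = SVC(kernel="linear").fit(X, y)
print("training accuracy:", clf.score(X, y))
```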
Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units
Title | Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units |
Authors | Shaohuai Shi, Xiaowen Chu |
Abstract | Rectifier neuron units (ReLUs) have been widely used in deep convolutional networks. A ReLU converts negative values to zeros and does not change positive values, which leads to a high sparsity of neurons. In this work, we first examine the sparsity of the outputs of ReLUs in some popular deep convolutional architectures, and then we use the sparsity property of ReLUs to accelerate the calculation of convolution by skipping calculations of zero-valued neurons. The proposed sparse convolution algorithm achieves some speedup on CPUs compared to the traditional matrix-matrix multiplication algorithm for convolution when the sparsity is not less than 0.9. |
Tasks | |
Published | 2017-04-25 |
URL | http://arxiv.org/abs/1704.07724v2 |
http://arxiv.org/pdf/1704.07724v2.pdf | |
PWC | https://paperswithcode.com/paper/speeding-up-convolutional-neural-networks-by |
Repo | |
Framework | |
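A sketch of the skipping idea on the simplest possible case, a 1x1 convolution: zero-valued post-ReLU neurons contribute nothing, so their multiplications can be skipped channel by channel. The paper targets general convolutions with CPU-oriented kernels; this snippet only shows that the sparse computation matches the dense one.

```python
import numpy as np

def sparse_conv1x1(activations, weights):
    """1x1 convolution that skips zero-valued (post-ReLU) input neurons.

    activations : (in_channels, height, width) feature map after ReLU
    weights     : (out_channels, in_channels) 1x1 kernels
    """
    c_in, h, w = activations.shape
    out = np.zeros((weights.shape[0], h, w))
    for c in range(c_in):
        plane = activations[c]
        nz = plane != 0.0
        if not nz.any():
            continue                       # whole channel is zero: skip all its multiplications
        out[:, nz] += np.outer(weights[:, c], plane[nz])
    return out

rng = np.random.default_rng(0)
act = np.maximum(rng.normal(size=(64, 16, 16)), 0.0)   # ReLU output, roughly half zeros
wts = rng.normal(size=(128, 64))
dense = np.tensordot(wts, act, axes=([1], [0]))         # standard matrix-style computation
print(np.allclose(dense, sparse_conv1x1(act, wts)))     # True
```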
Error Bounds for Piecewise Smooth and Switching Regression
Title | Error Bounds for Piecewise Smooth and Switching Regression |
Authors | Fabien Lauer |
Abstract | The paper deals with regression problems, in which the nonsmooth target is assumed to switch between different operating modes. Specifically, piecewise smooth (PWS) regression considers target functions switching deterministically via a partition of the input space, while switching regression considers arbitrary switching laws. The paper derives generalization error bounds in these two settings by following the approach based on Rademacher complexities. For PWS regression, our derivation involves a chaining argument and a decomposition of the covering numbers of PWS classes in terms of the ones of their component functions and the capacity of the classifier partitioning the input space. This yields error bounds with a radical dependency on the number of modes. For switching regression, the decomposition can be performed directly at the level of the Rademacher complexities, which yields bounds with a linear dependency on the number of modes. By using once more chaining and a decomposition at the level of covering numbers, we show how to recover a radical dependency. Examples of applications are given, in particular for PWS and switching regression with linear and kernel-based component functions. |
Tasks | |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.07938v2 |
http://arxiv.org/pdf/1707.07938v2.pdf | |
PWC | https://paperswithcode.com/paper/error-bounds-for-piecewise-smooth-and |
Repo | |
Framework | |
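For readers unfamiliar with the approach, the textbook Rademacher-complexity template that such analyses start from is sketched below (for a loss bounded in $[0,1]$ and sample size $n$, with probability at least $1-\delta$); this is the standard form, not the paper's specific PWS/switching bounds. The paper's contribution lies in how the complexity term scales with the number of modes: radically for PWS regression, and linearly (or, via chaining and covering numbers, again radically) for switching regression.

```latex
L(f) \;\le\; \widehat{L}_n(f) \;+\; 2\,\mathfrak{R}_n(\ell \circ \mathcal{F}) \;+\; \sqrt{\frac{\log(1/\delta)}{2n}}
\qquad \text{for all } f \in \mathcal{F}.
```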
Model based learning for accelerated, limited-view 3D photoacoustic tomography
Title | Model based learning for accelerated, limited-view 3D photoacoustic tomography |
Authors | Andreas Hauptmann, Felix Lucka, Marta Betcke, Nam Huynh, Jonas Adler, Ben Cox, Paul Beard, Sebastien Ourselin, Simon Arridge |
Abstract | Recent advances in deep learning for tomographic reconstructions have shown great potential to create accurate and high quality images with a considerable speed-up. In this work we present a deep neural network that is specifically designed to provide high resolution 3D images from restricted photoacoustic measurements. The network is designed to represent an iterative scheme and incorporates gradient information of the data fit to compensate for limited view artefacts. Due to the high complexity of the photoacoustic forward operator, we separate training and computation of the gradient information. A suitable prior for the desired image structures is learned as part of the training. The resulting network is trained and tested on a set of segmented vessels from lung CT scans and then applied to in-vivo photoacoustic measurement data. |
Tasks | Tomographic Reconstructions |
Published | 2017-08-31 |
URL | http://arxiv.org/abs/1708.09832v3 |
http://arxiv.org/pdf/1708.09832v3.pdf | |
PWC | https://paperswithcode.com/paper/model-based-learning-for-accelerated-limited |
Repo | |
Framework | |
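A generic learned-gradient sketch of the iterative scheme described above, written in 2D with a binary mask standing in for the limited-view photoacoustic forward operator; the real operator is far more expensive, which is why the paper separates gradient computation from training. The architecture and operator below are placeholders, not the authors' network.

```python
import torch
import torch.nn as nn

class LearnedUpdate(nn.Module):
    """One learned iteration x_{k+1} = x_k + Net(x_k, grad)."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, x, grad):
        return x + self.net(torch.cat([x, grad], dim=1))   # residual update of the image

# Toy limited-view forward operator: a binary mask that discards part of the image.
mask = torch.zeros(1, 1, 64, 64)
mask[..., :, :40] = 1.0
forward_op = lambda img: mask * img                         # A
adjoint_op = lambda meas: mask * meas                       # A^T (the mask is its own adjoint)

truth = torch.rand(1, 1, 64, 64)
y = forward_op(truth)                                       # simulated limited-view data

# A few unrolled iterations; the data-fit gradient is computed outside the network,
# mirroring the separation of gradient computation and training described above.
updates = nn.ModuleList([LearnedUpdate() for _ in range(3)])
x = adjoint_op(y)                                           # crude initial reconstruction
for step in updates:
    grad = adjoint_op(forward_op(x) - y)                    # gradient of 0.5 * ||A x - y||^2
    x = step(x, grad.detach())
loss = nn.functional.mse_loss(x, truth)
loss.backward()
```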
Resilient Autonomous Control of Distributed Multi-agent Systems in Contested Environments
Title | Resilient Autonomous Control of Distributed Multi-agent Systems in Contested Environments |
Authors | Rohollah Moghadam, Hamidreza Modares |
Abstract | An autonomous and resilient controller is proposed for leader-follower multi-agent systems under uncertainties and cyber-physical attacks. The leader is assumed non-autonomous with a nonzero control input, which allows changing the team behavior or mission in response to environmental changes. A resilient learning-based control protocol is presented to find optimal solutions to the synchronization problem in the presence of attacks and system dynamic uncertainties. An observer-based distributed $H_\infty$ controller is first designed to prevent propagating the effects of attacks on sensors and actuators throughout the network, as well as to attenuate the effect of these attacks on the compromised agent itself. Non-homogeneous game algebraic Riccati equations are derived to solve the $H_\infty$ optimal synchronization problem and off-policy reinforcement learning is utilized to learn their solution without requiring any knowledge of the agent’s dynamics. A trust-confidence based distributed control protocol is then proposed to mitigate attacks that hijack the entire node and attacks on communication links. A confidence value is defined for each agent based solely on its local evidence. The proposed resilient reinforcement learning algorithm employs the confidence value of each agent to indicate the trustworthiness of its own information and broadcast it to its neighbors to put weights on the data they receive from it during and after learning. If the confidence value of an agent is low, it employs a trust mechanism to identify compromised agents and remove the data it receives from them from the learning process. Simulation results are provided to show the effectiveness of the proposed approach. |
Tasks | |
Published | 2017-08-31 |
URL | http://arxiv.org/abs/1708.09630v4 |
http://arxiv.org/pdf/1708.09630v4.pdf | |
PWC | https://paperswithcode.com/paper/resilient-autonomous-control-of-distributed |
Repo | |
Framework | |
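The full design above (observer-based $H_\infty$ control, non-homogeneous game Riccati equations, off-policy reinforcement learning) is beyond a short snippet, but the trust/confidence weighting can be illustrated with a generic scalar consensus step in which each agent down-weights data from low-confidence neighbours. Everything below is a simplified illustration, not the paper's protocol.

```python
import numpy as np

def confidence_weighted_step(states, leader_state, adjacency, confidence, step=0.1):
    """One synchronization update where each agent weights neighbour data by the
    neighbour's broadcast confidence value (generic illustration only)."""
    n = len(states)
    new_states = states.copy()
    for i in range(n):
        err = 0.0
        for j in range(n):
            if adjacency[i, j]:
                err += confidence[j] * (states[j] - states[i])   # low-confidence data counts less
        err += leader_state - states[i]                          # pinning to the leader
        new_states[i] = states[i] + step * err
    return new_states

# Toy usage: 4 followers on a line graph; agent 2 broadcasts low confidence (e.g. under attack).
states = np.array([0.0, 1.0, 5.0, 2.0])
adjacency = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
confidence = np.array([1.0, 1.0, 0.1, 1.0])
for _ in range(50):
    states = confidence_weighted_step(states, leader_state=3.0,
                                      adjacency=adjacency, confidence=confidence)
print(states)   # followers approach the leader value 3.0
```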
WRPN: Wide Reduced-Precision Networks
Title | WRPN: Wide Reduced-Precision Networks |
Authors | Asit Mishra, Eriko Nurvitadhi, Jeffrey J Cook, Debbie Marr |
Abstract | For computer vision applications, prior works have shown the efficacy of reducing the numeric precision of model parameters (network weights) in deep neural networks. Activation maps, however, occupy a large memory footprint during both the training and inference steps when using mini-batches of inputs. One way to reduce this large memory footprint is to reduce the precision of activations. However, past works have shown that reducing the precision of activations hurts model accuracy. We study schemes to train networks from scratch using reduced-precision activations without hurting accuracy. We reduce the precision of activation maps (along with model parameters) and increase the number of filter maps in a layer, and find that this scheme matches or surpasses the accuracy of the baseline full-precision network. As a result, one can significantly improve the execution efficiency (e.g. reduce dynamic memory footprint, memory bandwidth and computational energy) and speed up the training and inference process with appropriate hardware support. We call our scheme WRPN - wide reduced-precision networks. We report results and show that the WRPN scheme surpasses previously reported accuracies on the ILSVRC-12 dataset while being computationally less expensive than previously reported reduced-precision networks. |
Tasks | |
Published | 2017-09-04 |
URL | http://arxiv.org/abs/1709.01134v1 |
http://arxiv.org/pdf/1709.01134v1.pdf | |
PWC | https://paperswithcode.com/paper/wrpn-wide-reduced-precision-networks |
Repo | |
Framework | |
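A sketch of the two ingredients the abstract combines: uniform k-bit quantization of activations (with a straight-through estimator so training still works) and a widened filter count to compensate. The quantizer and widening factor below are standard/assumed choices, not necessarily WRPN's exact ones.

```python
import torch
import torch.nn as nn

def quantize(x, bits):
    """Uniform k-bit quantization of activations clipped to [0, 1] (a standard scheme)."""
    levels = 2 ** bits - 1
    x = x.clamp(0.0, 1.0)
    return (x * levels).round() / levels

class WideReducedPrecisionBlock(nn.Module):
    """Conv -> ReLU -> activation quantization, with the filter count scaled up by a
    widening factor to compensate for the reduced precision."""
    def __init__(self, in_ch, base_out_ch, widen=2, act_bits=4):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, base_out_ch * widen, 3, padding=1)
        self.act_bits = act_bits

    def forward(self, x):
        x = torch.relu(self.conv(x))
        # Straight-through estimator: forward uses quantized values, backward sees identity.
        return x + (quantize(x, self.act_bits) - x).detach()

block = WideReducedPrecisionBlock(in_ch=3, base_out_ch=16, widen=2, act_bits=4)
out = block(torch.rand(1, 3, 32, 32))
print(out.shape)          # torch.Size([1, 32, 32, 32]) -- twice the base filter count
out.sum().backward()
```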
Towards Agent-Based Model Specification in Smart Grid: A Cognitive Agent-based Computing Approach
Title | Towards Agent-Based Model Specification in Smart Grid: A Cognitive Agent-based Computing Approach |
Authors | Waseem Akram, Muaz A. Niazi, Laszlo Barna Iantovics |
Abstract | A smart grid can be considered as a complex network in which each node represents a generation unit or a consumer, while links represent transmission lines. One way to study complex systems is by using the agent-based modeling (ABM) paradigm. An ABM is a way of representing a complex system of autonomous agents interacting with each other. Previously, a number of studies have been presented in the smart grid domain making use of the ABM paradigm. However, to the best of our knowledge, none of these studies have focused on the specification aspect of ABM. An ABM specification is important not only for understanding but also for replication of the model. In this study, we focus on development as well as specification of ABM for smart grid. We propose an ABM by using a combination of agent-based and complex network-based approaches. For ABM specification, we use the ODD and DREAM specification approaches. We analyze these two specification approaches qualitatively as well as quantitatively. Extensive experiments demonstrate that DREAM is a more useful approach than ODD, both for modeling and for replication of smart grid models. |
Tasks | |
Published | 2017-10-01 |
URL | http://arxiv.org/abs/1710.03189v2 |
http://arxiv.org/pdf/1710.03189v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-agent-based-model-specification-in |
Repo | |
Framework | |
Are distributional representations ready for the real world? Evaluating word vectors for grounded perceptual meaning
Title | Are distributional representations ready for the real world? Evaluating word vectors for grounded perceptual meaning |
Authors | Li Lucy, Jon Gauthier |
Abstract | Distributional word representation methods exploit word co-occurrences to build compact vector encodings of words. While these representations enjoy widespread use in modern natural language processing, it is unclear whether they accurately encode all necessary facets of conceptual meaning. In this paper, we evaluate how well these representations can predict perceptual and conceptual features of concrete concepts, drawing on two semantic norm datasets sourced from human participants. We find that several standard word representations fail to encode many salient perceptual features of concepts, and show that these deficits correlate with word-word similarity prediction errors. Our analyses provide motivation for grounded and embodied language learning approaches, which may help to remedy these deficits. |
Tasks | |
Published | 2017-05-31 |
URL | http://arxiv.org/abs/1705.11168v1 |
http://arxiv.org/pdf/1705.11168v1.pdf | |
PWC | https://paperswithcode.com/paper/are-distributional-representations-ready-for |
Repo | |
Framework | |
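A sketch of the evaluation style the abstract describes: train a per-feature linear probe that predicts a perceptual norm feature from word vectors and measure how well it does. Random arrays stand in for the embeddings and the human norm labels so the snippet is self-contained; the paper's exact predictive model may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data: in the paper the vectors come from standard distributional embeddings
# and the labels from human semantic-norm datasets; random arrays stand in here.
rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(300, 100))          # 300 concrete concepts, 100-d vectors
has_feature = rng.integers(0, 2, size=300)          # e.g. a binary norm feature such as "is_round"

# One linear probe per perceptual feature: how well does the embedding predict it?
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, word_vectors, has_feature, cv=5, scoring="f1")
print("mean F1 for this feature:", scores.mean())
```

Repeating this probe across many norm features and comparing scores against word-word similarity errors is one simple way to reproduce the kind of analysis the abstract reports.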