October 19, 2019

Paper Group ANR 145

Human-Guided Data Exploration

Title Human-Guided Data Exploration
Authors Andreas Henelius, Emilia Oikarinen, Kai Puolamäki
Abstract The outcome of the explorative data analysis (EDA) phase is vital for successful data analysis. EDA is more effective when the user interacts with the system used to carry out the exploration. In the recently proposed paradigm of iterative data mining the user controls the exploration by inputting knowledge in the form of patterns observed during the process. The system then shows the user views of the data that are maximally informative given the user’s current knowledge. Although this scheme is good at showing surprising views of the data to the user, there is a clear shortcoming: the user cannot steer the process. In many real cases we want to focus on investigating specific questions concerning the data. This paper presents the Human Guided Data Exploration framework, generalising previous research. This framework allows the user to incorporate existing knowledge into the exploration process, focus on exploring a subset of the data, and compare different complex hypotheses concerning relations in the data. The framework utilises a computationally efficient constrained randomisation scheme. To showcase the framework, we developed a free open-source tool, which we used to carry out the empirical evaluation on real-world datasets. Our evaluation shows that the ability to focus on particular subsets and the ability to compare hypotheses are important additions to the interactive iterative data mining process.
Tasks
Published 2018-04-09
URL http://arxiv.org/abs/1804.03194v1
PDF http://arxiv.org/pdf/1804.03194v1.pdf
PWC https://paperswithcode.com/paper/human-guided-data-exploration
Repo
Framework
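
The abstract mentions a constrained randomisation scheme without giving details. As a rough illustration only, the sketch below assumes one common form of it: permuting a column's values independently inside user-chosen groups of rows, so that within-group distributions are preserved while within-group ordering is destroyed. The function name `permute_within_groups` and the example data are hypothetical and are not taken from the paper or its tool.

```python
import random


def permute_within_groups(column, groups, rng):
    """Shuffle the values of `column` separately inside each row group.

    Assumption: the constraint is "keep each group's multiset of values";
    the paper's actual constraints may be richer than this.
    """
    result = list(column)
    rows_by_group = {}
    for row, group in enumerate(groups):
        rows_by_group.setdefault(group, []).append(row)
    for rows in rows_by_group.values():
        values = [column[r] for r in rows]
        rng.shuffle(values)
        for r, v in zip(rows, values):
            result[r] = v
    return result


column = [1, 2, 3, 10, 20, 30]
groups = ["a", "a", "a", "b", "b", "b"]
sample = permute_within_groups(column, groups, random.Random(0))
```

Comparing a statistic on the real data with the same statistic over many such samples indicates whether a pattern is already explained by the constraints.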

On Plans With Loops and Noise

Title On Plans With Loops and Noise
Authors Vaishak Belle
Abstract In an influential paper, Levesque proposed a formal specification for analysing the correctness of program-like plans, such as conditional plans, iterative plans, and knowledge-based plans. He motivated a logical characterisation within the situation calculus that included binary sensing actions. While the characterisation does not immediately yield a practical algorithm, the specification serves as a general skeleton to explore the synthesis of program-like plans for reasonable, tractable fragments. Increasingly, classical plan structures are being applied to stochastic environments such as robotics applications. This raises the question as to what the specification for correctness should look like, since Levesque’s account makes the assumption that sensing is exact and actions are deterministic. Building on a situation calculus theory for reasoning about degrees of belief and noise, we revisit the execution semantics of generalised plans. The specification is then used to analyse the correctness of example plans.
Tasks
Published 2018-09-14
URL http://arxiv.org/abs/1809.05309v1
PDF http://arxiv.org/pdf/1809.05309v1.pdf
PWC https://paperswithcode.com/paper/on-plans-with-loops-and-noise
Repo
Framework

Mad Max: Affine Spline Insights into Deep Learning

Title Mad Max: Affine Spline Insights into Deep Learning
Authors Randall Balestriero, Richard Baraniuk
Abstract We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs), which provide a powerful portal through which to view and analyze their inner workings. For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization. Going further, we propose a simple penalty term that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal to each other; this leads to significantly improved classification performance and reduced overfitting with no change to the DN architecture. The spline partition of the input signal space that is implicitly induced by a MASO directly links DNs to the theory of vector quantization (VQ) and $K$-means clustering, which opens up a new geometric avenue to study how DNs organize signals in a hierarchical fashion. To validate the utility of the VQ interpretation, we develop and validate a new distance metric for signals and images that quantifies the difference between their VQ encodings. (This paper is a significantly expanded version of A Spline Theory of Deep Learning from ICML 2018.)
Tasks Quantization
Published 2018-05-17
URL http://arxiv.org/abs/1805.06576v5
PDF http://arxiv.org/pdf/1805.06576v5.pdf
PWC https://paperswithcode.com/paper/mad-max-affine-spline-insights-into-deep
Repo
Framework
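
The claim that the output is an affine transformation of the input once the input is fixed can be checked on a toy network. The sketch below is an illustration under stated assumptions, not the authors' code: it uses a two-layer ReLU network with made-up weights (`W1`, `b1`, `W2`, `b2` are hypothetical), records which units are active for a given `x`, and builds the matrix `A` and offset `c` such that the network output equals `A x + c` for that input. The rows of `A` play the role of the signal-dependent templates.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


# Made-up weights for a 2 -> 3 -> 2 ReLU network.
W1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, 0.5]]
b1 = [0.0, -1.0, 0.5]
W2 = [[1.0, 2.0, -1.0], [0.0, -1.0, 3.0]]
b2 = [0.1, -0.2]


def forward(x):
    pre = [p + b for p, b in zip(matvec(W1, x), b1)]
    hidden = [max(0.0, p) for p in pre]
    return [o + b for o, b in zip(matvec(W2, hidden), b2)]


def affine_form(x):
    """Return (A, c) such that forward(x) == A x + c for this particular x."""
    pre = [p + b for p, b in zip(matvec(W1, x), b1)]
    mask = [1.0 if p > 0 else 0.0 for p in pre]  # active ReLU units
    masked_W1 = [[q * w for w in row] for q, row in zip(mask, W1)]
    masked_b1 = [q * b for q, b in zip(mask, b1)]
    A = matmul(W2, masked_W1)
    c = [o + b for o, b in zip(matvec(W2, masked_b1), b2)]
    return A, c


x = [0.3, 0.8]
A, c = affine_form(x)
affine_out = [a + ci for a, ci in zip(matvec(A, x), c)]
```

The pair `(A, c)` stays the same for all inputs that share the activation pattern of `x`, which is the partition of input space the abstract refers to.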

A Modified Sigma-Pi-Sigma Neural Network with Adaptive Choice of Multinomials

Title A Modified Sigma-Pi-Sigma Neural Network with Adaptive Choice of Multinomials
Authors Feng Li, Yan Liu, Khidir Shaib Mohamed, Wei Wu
Abstract Sigma-Pi-Sigma neural networks (SPSNNs), as a kind of high-order neural network, can provide more powerful mapping capability than the traditional feedforward neural networks (Sigma-Sigma neural networks). In the existing literature, in order to reduce the number of the Pi nodes in the Pi layer, a special multinomial P_s is used in SPSNNs. Each monomial in P_s is linear with respect to each particular variable sigma_i when the other variables are taken as constants. Therefore, monomials like sigma_i^n or sigma_i^n sigma_j with n>1 are not included. This choice may be somewhat intuitive, but is not necessarily the best. We propose in this paper a modified Sigma-Pi-Sigma neural network (MSPSNN) with an adaptive approach to find a better multinomial for a given problem. To elaborate, we start from a complete multinomial with a given order. Then we employ a regularization technique in the learning process for the given problem to reduce the number of monomials used in the multinomial, and end up with a new SPSNN involving the same number of monomials (= the number of nodes in the Pi-layer) as in P_s. Numerical experiments on some benchmark problems show that our MSPSNN behaves better than the traditional SPSNN with P_s.
Tasks
Published 2018-02-01
URL http://arxiv.org/abs/1802.00123v1
PDF http://arxiv.org/pdf/1802.00123v1.pdf
PWC https://paperswithcode.com/paper/a-modified-sigma-pi-sigma-neural-network-with
Repo
Framework
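
To make the difference between the restricted multinomial and a complete one concrete, the following sketch enumerates both sets of monomials for a small number of variables. It is only an illustration: a monomial is represented as a tuple of variable indices, the constant term is included, and bounding both sets by the same maximum degree is an assumption made here for comparison rather than a detail stated in the abstract.

```python
from itertools import combinations, combinations_with_replacement


def multilinear_monomials(n_vars, max_degree):
    """Monomials linear in each variable, e.g. sigma_0 * sigma_2 (P_s style)."""
    monomials = []
    for degree in range(max_degree + 1):
        monomials.extend(combinations(range(n_vars), degree))
    return monomials


def complete_monomials(n_vars, max_degree):
    """All monomials up to max_degree, including powers such as sigma_0 ** 2."""
    monomials = []
    for degree in range(max_degree + 1):
        monomials.extend(combinations_with_replacement(range(n_vars), degree))
    return monomials


def evaluate(monomial, sigma):
    value = 1.0
    for index in monomial:
        value *= sigma[index]
    return value


restricted = multilinear_monomials(3, 2)
complete = complete_monomials(3, 2)
extra = [m for m in complete if m not in restricted]
```

With three variables and degree at most two, `extra` holds exactly the squared terms, which are the candidates an adaptive selection could keep in place of some multilinear ones.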

Compression of Deep Convolutional Neural Networks under Joint Sparsity Constraints

Title Compression of Deep Convolutional Neural Networks under Joint Sparsity Constraints
Authors Yoojin Choi, Mostafa El-Khamy, Jungwon Lee
Abstract We consider the optimization of deep convolutional neural networks (CNNs) such that they provide good performance while having reduced complexity if deployed on either conventional systems utilizing spatial-domain convolution or lower complexity systems designed for Winograd convolution. Furthermore, we explore the universal quantization and compression of these networks. In particular, the proposed framework produces one compressed model whose convolutional filters can be made sparse either in the spatial domain or in the Winograd domain. Hence, one compressed model can be deployed universally on any platform, without need for re-training on the deployed platform, and the sparsity of its convolutional filters can be exploited for further complexity reduction in either domain. To get a better compression ratio, the sparse model is compressed in the spatial domain, which has fewer parameters. From our experiments, we obtain $24.2\times$, $47.7\times$ and $35.4\times$ compressed models for ResNet-18, AlexNet and CT-SRCNN, while their computational cost is also reduced by $4.5\times$, $5.1\times$ and $23.5\times$, respectively.
Tasks Quantization
Published 2018-05-21
URL http://arxiv.org/abs/1805.08303v2
PDF http://arxiv.org/pdf/1805.08303v2.pdf
PWC https://paperswithcode.com/paper/compression-of-deep-convolutional-neural-1
Repo
Framework
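
Why sparsity has to be considered in both domains can be seen on the smallest standard Winograd case. The sketch below is an illustration, not the paper's method: it implements the one-dimensional F(2,3) Winograd algorithm with its usual transform matrices, checks it against direct sliding-window filtering, and shows that a filter with a single nonzero spatial tap has three nonzero coefficients after the Winograd filter transform. The example signals are made up.

```python
# Standard transform matrices for 1-D Winograd F(2,3):
# 2 outputs from a 3-tap filter and a 4-sample input tile.
G = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0.0, 0.0, 1.0]]
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def winograd_f23(tile, taps):
    filter_w = matvec(G, taps)   # filter in the Winograd domain (4 values)
    input_w = matvec(BT, tile)   # input tile in the Winograd domain
    product = [f * d for f, d in zip(filter_w, input_w)]
    return matvec(AT, product)


def direct_f23(tile, taps):
    return [sum(tile[i + k] * taps[k] for k in range(3)) for i in range(2)]


def count_nonzero(values):
    return sum(1 for v in values if v != 0)


tile = [1.0, 2.0, 3.0, 4.0]
taps = [1.0, 0.0, -1.0]
sparse_taps = [1.0, 0.0, 0.0]
sparse_taps_w = matvec(G, sparse_taps)
```

Here `taps` happens to be sparse in both domains, while `sparse_taps` is sparse only in the spatial domain, so a filter pruned for one kind of platform is not automatically cheap on the other.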

Neural Language Codes for Multilingual Acoustic Models

Title Neural Language Codes for Multilingual Acoustic Models
Authors Markus Müller, Sebastian Stüker, Alex Waibel
Abstract Multilingual Speech Recognition is one of the most costly AI problems, because each language (7,000+) and even different accents require their own acoustic models to obtain best recognition performance. Even though they all use the same phoneme symbols, each language and accent imposes its own coloring or “twang”. Many adaptive approaches have been proposed, but they require further training and additional data, and are generally inferior to monolingually trained models. In this paper, we propose a different approach that uses a large multilingual model that is modulated by the codes generated by an ancillary network that learns to code useful differences between the “twangs” or human languages. We use Meta-Pi networks to have one network (the language code net) gate the activity of neurons in another (the acoustic model nets). Our results show that during recognition multilingual Meta-Pi networks quickly adapt to the proper language coloring without retraining or new data, and perform better than monolingually trained networks. The model was evaluated by training acoustic modeling nets and modulating language code nets jointly and optimizing them for best recognition performance.
Tasks Speech Recognition
Published 2018-07-05
URL http://arxiv.org/abs/1807.01956v1
PDF http://arxiv.org/pdf/1807.01956v1.pdf
PWC https://paperswithcode.com/paper/neural-language-codes-for-multilingual
Repo
Framework
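
The gating idea can be sketched as one network producing multiplicative gates for the hidden units of another. The code below is a loose illustration under assumptions not found in the abstract: a single ReLU hidden layer for the acoustic model, sigmoid gates, one-hot language inputs, and made-up weights (`W_am`, `W_code` and the other names are hypothetical). The real Meta-Pi architecture may differ.

```python
import math


def dense(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_hidden(acoustic_frame, language_input, params):
    """Return (ungated, gated) hidden activations of the acoustic model."""
    hidden = [max(0.0, h) for h in dense(params["W_am"], params["b_am"], acoustic_frame)]
    code = dense(params["W_code"], params["b_code"], language_input)
    gates = [sigmoid(c) for c in code]  # one gate per hidden unit, in (0, 1)
    return hidden, [h * g for h, g in zip(hidden, gates)]


params = {
    "W_am": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]],
    "b_am": [0.0, 0.1],
    "W_code": [[1.0, -1.0], [-2.0, 2.0]],
    "b_code": [0.0, 0.0],
}
frame = [1.0, 2.0, 3.0]
hidden_a, out_a = gated_hidden(frame, [1.0, 0.0], params)  # hypothetical language A
hidden_b, out_b = gated_hidden(frame, [0.0, 1.0], params)  # hypothetical language B
```

The same acoustic weights yield different effective activations per language, which is the sense in which one shared model is modulated rather than retrained.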

Robust Gesture-Based Communication for Underwater Human-Robot Interaction in the context of Search and Rescue Diver Missions

Title Robust Gesture-Based Communication for Underwater Human-Robot Interaction in the context of Search and Rescue Diver Missions
Authors Arturo Gomez Chavez, Christian A. Mueller, Tobias Doernbach, Davide Chiarella, Andreas Birk
Abstract We propose a robust gesture-based communication pipeline for divers to instruct an Autonomous Underwater Vehicle (AUV) to assist them in performing high-risk tasks and helping in case of emergency. A gesture communication language (CADDIAN) is developed, based on consolidated and standardized diver gestures, including an alphabet, syntax and semantics, ensuring a logical consistency. A hierarchical classification approach is introduced for hand gesture recognition based on stereo imagery and multi-descriptor aggregation to specifically cope with underwater image artifacts, e.g. light backscatter or color attenuation. Once the classification task is finished, a syntax check is performed to filter out invalid command sequences sent by the diver or generated by errors in the classifier. Throughout this process, the diver receives constant feedback from an underwater tablet to acknowledge or abort the mission at any time. The objective is to prevent the AUV from executing unnecessary, infeasible or potentially harmful motions. Experimental results under different environmental conditions in archaeological exploration and bridge inspection applications show that the system performs well in the field.
Tasks Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition
Published 2018-10-16
URL http://arxiv.org/abs/1810.07122v1
PDF http://arxiv.org/pdf/1810.07122v1.pdf
PWC https://paperswithcode.com/paper/robust-gesture-based-communication-for
Repo
Framework
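
The syntax check that filters invalid command sequences can be pictured as a small state machine run over the classifier's output. The sketch below is purely illustrative: the token names (`START`, `UP`, `DOWN`, `PHOTO`, `NUMBER`, `END`) and the grammar are hypothetical stand-ins, not the actual CADDIAN alphabet or syntax, which the abstract does not spell out.

```python
# Hypothetical grammar: START, one command, an optional NUMBER argument, END.
TRANSITIONS = {
    "begin": {"START": "need_command"},
    "need_command": {
        "UP": "have_command",
        "DOWN": "have_command",
        "PHOTO": "have_command",
    },
    "have_command": {"NUMBER": "have_argument", "END": "done"},
    "have_argument": {"END": "done"},
}


def is_valid_sequence(tokens):
    """Accept a token sequence only if it follows the grammar and is complete."""
    state = "begin"
    for token in tokens:
        state = TRANSITIONS.get(state, {}).get(token)
        if state is None:  # unexpected token, or tokens after the end
            return False
    return state == "done"
```

A sequence rejected here would not be passed on to the vehicle, whether the error came from the diver or from a misclassified gesture.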

Aesthetics Assessment of Images Containing Faces

Title Aesthetics Assessment of Images Containing Faces
Authors Simone Bianco, Luigi Celona, Raimondo Schettini
Abstract Recent research has widely explored the problem of aesthetics assessment of images with generic content. However, few approaches have been specifically designed to predict the aesthetic quality of images containing human faces, which make up a massive portion of photos on the web. This paper introduces a method for aesthetic quality assessment of images with faces. We exploit three different Convolutional Neural Networks to encode information regarding perceptual quality, global image aesthetics, and facial attributes; then, a model is trained to combine these features to explicitly predict the aesthetics of images containing faces. Experimental results show that our approach outperforms existing state-of-the-art methods for both binary (i.e. low/high) and continuous aesthetic score prediction on four different databases.
Tasks
Published 2018-05-22
URL http://arxiv.org/abs/1805.08685v1
PDF http://arxiv.org/pdf/1805.08685v1.pdf
PWC https://paperswithcode.com/paper/aesthetics-assessment-of-images-containing
Repo
Framework

Grassmannian Discriminant Maps (GDM) for Manifold Dimensionality Reduction with Application to Image Set Classification

Title Grassmannian Discriminant Maps (GDM) for Manifold Dimensionality Reduction with Application to Image Set Classification
Authors Rui Wang, Xiao-Jun Wu, Kai-Xuan Chen, Josef Kittler
Abstract In image set classification, considerable progress has been made by representing original image sets on Grassmann manifolds. In order to extend the advantages of Euclidean-based dimensionality reduction methods to the Grassmann manifold, several methods have been suggested recently which jointly perform dimensionality reduction and metric learning on the Grassmann manifold to improve performance. Nevertheless, when applied to complex datasets, the learned features do not exhibit enough discriminatory power. To overcome this problem, we propose a new method named Grassmannian Discriminant Maps (GDM) for manifold dimensionality reduction problems. The core of the method is a new discriminant function for metric learning and dimensionality reduction. For comparison and better understanding, we also study simple variations of GDM. The key difference between them is the discriminant function. We experiment on data sets corresponding to three tasks: face recognition, object categorization, and hand gesture recognition to evaluate the proposed method and its simple extensions. Compared with the state of the art, the results achieved show the effectiveness of the proposed algorithm.
Tasks Dimensionality Reduction, Face Recognition, Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition, Metric Learning
Published 2018-06-28
URL http://arxiv.org/abs/1806.10830v1
PDF http://arxiv.org/pdf/1806.10830v1.pdf
PWC https://paperswithcode.com/paper/grassmannian-discriminant-maps-gdm-for
Repo
Framework

Cardiac Arrhythmia Detection from ECG Combining Convolutional and Long Short-Term Memory Networks

Title Cardiac Arrhythmia Detection from ECG Combining Convolutional and Long Short-Term Memory Networks
Authors Philip Warrick, Masun Nabhan Homsi
Abstract Objectives: Atrial fibrillation (AF) is a common heart rhythm disorder associated with deadly and debilitating consequences including heart failure, stroke, poor mental health, reduced quality of life and death. Having an automatic system that diagnoses various types of cardiac arrhythmias would assist cardiologists to initiate appropriate preventive measures and to improve the analysis of cardiac disease. To this end, this paper introduces a new approach to automatically detect and classify cardiac arrhythmias in electrocardiogram (ECG) recordings. Methods: The proposed approach used a combination of Convolutional Neural Networks (CNNs) and a sequence of Long Short-Term Memory (LSTM) units, with pooling, dropout and normalization techniques to improve their accuracy. The network predicted a classification at every 18th input sample and we selected the final prediction for classification. Results were cross-validated on the Physionet Challenge 2017 training dataset, which contains 8,528 single lead ECG recordings lasting from 9s to just over 60s. Results: Using the proposed structure and no explicit feature selection, 10-fold stratified cross-validation gave an overall F-measure of 0.83 ± 0.015 on the held-out test data (mean ± standard deviation over all folds) and 0.80 on the hidden dataset of the Challenge entry server.
Tasks Arrhythmia Detection, Feature Selection
Published 2018-01-30
URL http://arxiv.org/abs/1801.10033v1
PDF http://arxiv.org/pdf/1801.10033v1.pdf
PWC https://paperswithcode.com/paper/cardiac-arrhythmia-detection-from-ecg
Repo
Framework
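
The evaluation protocol, stratified k-fold cross-validation summarised as a mean and standard deviation over folds, can be sketched in a few lines. This is an illustration only: the round-robin fold assignment, the class labels, and the per-fold scores below are made up, and the sample standard deviation is an assumption since the abstract does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def stratified_folds(labels, k):
    """Split sample indices into k folds, dealing each class out round-robin
    so that every fold keeps roughly the overall class proportions."""
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in indices_by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def summarize(fold_scores):
    """Mean and sample standard deviation of a per-fold metric."""
    return mean(fold_scores), stdev(fold_scores)


labels = ["normal"] * 20 + ["af"] * 10  # hypothetical class labels
folds = stratified_folds(labels, 5)
average, spread = summarize([0.80, 0.82, 0.84])  # made-up fold scores
```

In practice the indices would be shuffled within each class before being dealt out; that step is omitted here to keep the example deterministic.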

Video to Fully Automatic 3D Hair Model

Title Video to Fully Automatic 3D Hair Model
Authors Shu Liang, Xiufeng Huang, Xianyu Meng, Kunyao Chen, Linda G. Shapiro, Ira Kemelmacher-Shlizerman
Abstract Imagine taking a selfie video with your mobile phone and getting as output a 3D model of your head (face and 3D hair strands) that can be later used in VR, AR, and any other domain. State of the art hair reconstruction methods allow either a single photo (thus compromising 3D quality) or multiple views, but they require manual user interaction (manual hair segmentation and capture of fixed camera views that span the full 360 degrees). In this paper, we describe a system that can completely automatically create a reconstruction from any video (even a selfie video), and we don’t require specific views, since taking your -90 degree, 90 degree, and full back views is not feasible in a selfie capture. At the core of our system, in addition to the automation components, hair strands are estimated and deformed in 3D (rather than 2D as in the state of the art), thus enabling superior results. We provide qualitative, quantitative, and Mechanical Turk human studies that support the proposed system, and show results on a diverse variety of videos (8 different celebrity videos, 9 selfie mobile videos, spanning age, gender, hair length, type, and styling).
Tasks
Published 2018-09-13
URL http://arxiv.org/abs/1809.04765v1
PDF http://arxiv.org/pdf/1809.04765v1.pdf
PWC https://paperswithcode.com/paper/video-to-fully-automatic-3d-hair-model
Repo
Framework

Deep Reinforcement Learning for Time Optimal Velocity Control using Prior Knowledge

Title Deep Reinforcement Learning for Time Optimal Velocity Control using Prior Knowledge
Authors Gabriel Hartmann, Zvi Shiller, Amos Azaria
Abstract Autonomous navigation has recently gained great interest in the field of reinforcement learning. However, little attention was given to the time optimal velocity control problem, i.e. controlling a vehicle such that it travels at the maximal speed without becoming dynamically unstable (roll-over or sliding). Time optimal velocity control can be solved numerically using existing methods that are based on optimal control and vehicle dynamics. In this paper, we use deep reinforcement learning to generate the time optimal velocity control. Furthermore, we use the numerical solution to further improve the performance of the reinforcement learner. It is shown that the reinforcement learner outperforms the numerically derived solution, and that the hybrid approach (combining learning with the numerical solution) speeds up the training process.
Tasks Autonomous Navigation
Published 2018-11-28
URL https://arxiv.org/abs/1811.11615v3
PDF https://arxiv.org/pdf/1811.11615v3.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-for-time-optimal
Repo
Framework

Graph-based regularization for regression problems with alignment and highly-correlated designs

Title Graph-based regularization for regression problems with alignment and highly-correlated designs
Authors Yuan Li, Benjamin Mark, Garvesh Raskutti, Rebecca Willett, Hyebin Song, David Neiman
Abstract Sparse models for high-dimensional linear regression and machine learning have received substantial attention over the past two decades. Model selection, or determining which features or covariates are the best explanatory variables, is critical to the interpretability of a learned model. Much of the current literature assumes that covariates are only mildly correlated. However, in many modern applications covariates are highly correlated and do not exhibit key properties (such as the restricted eigenvalue condition, restricted isometry property, or other related assumptions). This work considers a high-dimensional regression setting in which a graph governs both correlations among the covariates and the similarity among regression coefficients, meaning there is alignment between the covariates and regression coefficients. Using side information about the strength of correlations among features, we form a graph with edge weights corresponding to pairwise covariances. This graph is used to define a graph total variation regularizer that promotes similar weights for correlated features. This work shows how the proposed graph-based regularization yields mean-squared error guarantees for a broad range of covariance graph structures. These guarantees are optimal for many specific covariance graphs, including block and lattice graphs. Our proposed approach outperforms other methods for highly-correlated design in a variety of experiments on synthetic data and real biochemistry data.
Tasks Model Selection
Published 2018-03-20
URL https://arxiv.org/abs/1803.07658v3
PDF https://arxiv.org/pdf/1803.07658v3.pdf
PWC https://paperswithcode.com/paper/graph-based-regularization-for-regression
Repo
Framework
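
A graph total variation regularizer penalises differences between coefficients that are joined by an edge. The sketch below shows the generic weighted form, the sum over edges of `w * |beta_i - beta_j|`, added to a least-squares loss. It is an assumption-laden illustration: the abstract does not give the paper's exact penalty (for example how covariance signs or weight scaling are handled), and the function names, the `strength` parameter, and the data are hypothetical.

```python
def graph_tv_penalty(beta, weighted_edges):
    """Generic graph total variation: sum of w * |beta_i - beta_j| over edges."""
    return sum(w * abs(beta[i] - beta[j]) for i, j, w in weighted_edges)


def penalized_loss(X, y, beta, weighted_edges, strength):
    """Mean squared error plus the graph total variation penalty."""
    residuals = [
        target - sum(x * b for x, b in zip(row, beta)) for row, target in zip(X, y)
    ]
    mse = sum(r * r for r in residuals) / len(y)
    return mse + strength * graph_tv_penalty(beta, weighted_edges)


# Features 0 and 1 are strongly linked; feature 2 is weakly linked to feature 1.
edges = [(0, 1, 0.9), (1, 2, 0.5)]
aligned = [1.0, 1.0, 3.0]
misaligned = [3.0, 1.0, 1.0]
```

Coefficient vectors that agree across heavily weighted edges pay less, which is how the penalty encourages similar weights for correlated features.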

Mining Automatically Estimated Poses from Video Recordings of Top Athletes

Title Mining Automatically Estimated Poses from Video Recordings of Top Athletes
Authors Rainer Lienhart, Moritz Einfalt, Dan Zecha
Abstract Human pose detection systems based on state-of-the-art DNNs are about to be extended, adapted and re-trained to fit the application domain of specific sports. Therefore, plenty of noisy pose data will soon be available from videos recorded on a regular and frequent basis. This work is among the first to develop mining algorithms that can mine the expected abundance of noisy and annotation-free pose data from video recordings in individual sports. Using swimming as an example of a sport with dominant cyclic motion, we show how to determine unsupervised time-continuous cycle speeds and temporally striking poses as well as measure unsupervised cycle stability over time. Additionally, we use long jump as an example of a sport with a rigid phase-based motion to present a technique to automatically partition the temporally estimated pose sequences into their respective phases. This enables the extraction of performance relevant, pose-based metrics currently used by national professional sports associations. Experimental results prove the effectiveness of our mining algorithms, which can also be applied to other cycle-based or phase-based types of sport.
Tasks
Published 2018-04-24
URL http://arxiv.org/abs/1804.08944v2
PDF http://arxiv.org/pdf/1804.08944v2.pdf
PWC https://paperswithcode.com/paper/mining-automatically-estimated-poses-from
Repo
Framework

Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

Title Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding
Authors Nan Rosemary Ke, Anirudh Goyal, Olexa Bilaniuk, Jonathan Binas, Michael C. Mozer, Chris Pal, Yoshua Bengio
Abstract Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years.) However, humans are often reminded of past memories or mental states which are associated with the current mental state. We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state. Based on this principle, we study a novel algorithm which only back-propagates through a few of these temporal skip connections, realized by a learned attention mechanism that associates current states with relevant past states. We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT in tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention.
Tasks
Published 2018-09-11
URL http://arxiv.org/abs/1809.03702v1
PDF http://arxiv.org/pdf/1809.03702v1.pdf
PWC https://paperswithcode.com/paper/sparse-attentive-backtracking-temporal
Repo
Framework
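
The idea of attending to only a few relevant past states can be sketched as top-k attention. The code below is a simplified illustration, not the authors' algorithm: dot-product scoring and the function name `sparse_attention` are assumptions, and it shows only the forward selection step. In the method described above, credit would then be back-propagated through just the selected connections.

```python
import math


def sparse_attention(current, past_states, k):
    """Attend to the k past states that score highest against `current`.

    Returns (weights, summary): softmax weights keyed by time step for the
    selected states, and the weighted sum of those states.
    """
    scores = [sum(c * p for c, p in zip(current, state)) for state in past_states]
    selected = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)[:k]
    peak = max(scores[t] for t in selected)
    exponentials = {t: math.exp(scores[t] - peak) for t in selected}
    total = sum(exponentials.values())
    weights = {t: e / total for t, e in exponentials.items()}
    summary = [
        sum(weights[t] * past_states[t][d] for t in selected)
        for d in range(len(current))
    ]
    return weights, summary


past = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, -1.0]]
weights, summary = sparse_attention([1.0, 0.0], past, k=2)
```

Because only `k` past states receive nonzero weight, the cost of assigning credit does not grow with the full length of the history.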