October 18, 2019


Paper Group ANR 649

Exploiting Recurrent Neural Networks and Leap Motion Controller for Sign Language and Semaphoric Gesture Recognition. Identifiability of Complete Dictionary Learning. Decentralized Dictionary Learning Over Time-Varying Digraphs. Overcoming the Curse of Dimensionality in Neural Networks. Cross-Paced Representation Learning with Partial Curricula for …

Exploiting Recurrent Neural Networks and Leap Motion Controller for Sign Language and Semaphoric Gesture Recognition


Title	Exploiting Recurrent Neural Networks and Leap Motion Controller for Sign Language and Semaphoric Gesture Recognition
Authors	Danilo Avola, Marco Bernardi, Luigi Cinque, Gian Luca Foresti, Cristiano Massaroni
Abstract	In human interactions, hands are a powerful way of expressing information that, in some cases, can be used as a valid substitute for voice, as happens in Sign Language. Hand gesture recognition has always been an interesting topic in the areas of computer vision and multimedia. These gestures can be represented as sets of feature vectors that change over time. Recurrent Neural Networks (RNNs) are suited to analysing this type of set thanks to their ability to model the long-term contextual information of temporal sequences. In this paper, an RNN is trained by using as features the angles formed by the finger bones of human hands. The selected features, acquired by a Leap Motion Controller (LMC) sensor, have been chosen because the majority of human gestures produce joint movements that generate truly characteristic corners. A challenging subset composed of a large number of gestures defined by the American Sign Language (ASL) is used to test the proposed solution and the effectiveness of the selected angles. Moreover, the proposed method has been compared to other state-of-the-art works on the SHREC dataset, thus demonstrating its superiority in hand gesture recognition accuracy.
Tasks	Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition
Published	2018-03-28
URL	http://arxiv.org/abs/1803.10435v1
PDF	http://arxiv.org/pdf/1803.10435v1.pdf
PWC	https://paperswithcode.com/paper/exploiting-recurrent-neural-networks-and-leap
Repo
Framework
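
The abstract above uses the angles formed by finger bones as per-frame features. As a rough illustration of what such a feature could look like, the sketch below computes the angle between consecutive bone vectors of one finger. The input layout (a list of 3D joint positions ordered from palm to fingertip) and the names `angle_between` and `finger_angles` are assumptions made for this example; they are not the Leap Motion SDK's data structures and not necessarily the authors' exact feature definition.

```python
import math

def angle_between(u, v):
    """Angle in radians between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)

def finger_angles(joints):
    """Angles between consecutive bones of one finger.

    `joints` is a hypothetical list of 3D joint positions, palm to fingertip.
    Each bone is the vector from one joint to the next.
    """
    bones = [tuple(q - p for p, q in zip(a, b)) for a, b in zip(joints, joints[1:])]
    return [angle_between(b1, b2) for b1, b2 in zip(bones, bones[1:])]

# One made-up frame: two collinear bones followed by a right-angle bend.
frame = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0)]
angles = finger_angles(frame)
```

Stacking such angle vectors frame by frame would give the kind of time-varying feature sequence that a recurrent model consumes.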

Identifiability of Complete Dictionary Learning


Title	Identifiability of Complete Dictionary Learning
Authors	Jérémy E. Cohen, Nicolas Gillis
Abstract	Sparse component analysis (SCA), also known as complete dictionary learning, is the following problem: Given an input matrix $M$ and an integer $r$, find a dictionary $D$ with $r$ columns and a matrix $B$ with $k$-sparse columns (that is, each column of $B$ has at most $k$ non-zero entries) such that $M \approx DB$. A key issue in SCA is identifiability, that is, characterizing the conditions under which $D$ and $B$ are essentially unique (that is, they are unique up to permutation and scaling of the columns of $D$ and rows of $B$). Although SCA has been vastly investigated in the last two decades, only a few works have tackled this issue in the deterministic scenario, and no work provides reasonable bounds in the minimum number of samples (that is, columns of $M$) that leads to identifiability. In this work, we provide new results in the deterministic scenario when the data has a low-rank structure, that is, when $D$ is (under)complete. While previous bounds feature a combinatorial term $r \choose k$, we exhibit a sufficient condition involving $\mathcal{O}(r^3/(r-k)^2)$ samples that yields an essentially unique decomposition, as long as these data points are well spread among the subspaces spanned by $r-1$ columns of $D$. We also exhibit a necessary lower bound on the number of samples that contradicts previous results in the literature when $k$ equals $r-1$. Our bounds provide a drastic improvement compared to the state of the art, and imply for example that for a fixed proportion of zeros (constant and independent of $r$, e.g., 10% of zero entries in $B$), one only requires $\mathcal{O}(r)$ data points to guarantee identifiability.
Tasks	Dictionary Learning
Published	2018-08-27
URL	http://arxiv.org/abs/1808.08765v2
PDF	http://arxiv.org/pdf/1808.08765v2.pdf
PWC	https://paperswithcode.com/paper/identifiability-of-low-rank-sparse-component
Repo
Framework
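
To get a feel for the two sample-count terms mentioned in the abstract above, the sketch below evaluates the combinatorial term $\binom{r}{k}$ and the polynomial term $r^3/(r-k)^2$ when 10% of the entries in each column of $B$ are zero, that is, $k = 0.9r$. The constants hidden in the $\mathcal{O}(\cdot)$ notation are ignored here, so the numbers only indicate growth rates, not the actual bounds. The function names are made up for this example.

```python
import math

def combinatorial_term(r, k):
    """The term C(r, k) that appears in earlier bounds."""
    return math.comb(r, k)

def polynomial_term(r, k):
    """The term r^3 / (r - k)^2, with the O(.) constant dropped."""
    return r**3 / (r - k) ** 2

rows = []
for r in (20, 40, 80):
    k = r - r // 10  # at most k non-zeros per column, so 10% zeros
    rows.append((r, k, combinatorial_term(r, k), polynomial_term(r, k)))

for r, k, comb_val, poly_val in rows:
    # With r - k = r / 10 the polynomial term equals 100 * r, i.e. linear in r.
    print(r, k, comb_val, poly_val)
```

With a fixed proportion of zeros, $r - k$ is proportional to $r$, so $r^3/(r-k)^2$ grows linearly in $r$, while the binomial term grows much faster.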

Decentralized Dictionary Learning Over Time-Varying Digraphs


Title	Decentralized Dictionary Learning Over Time-Varying Digraphs
Authors	Amir Daneshmand, Ying Sun, Gesualdo Scutari, Francisco Facchinei, Brian M. Sadler
Abstract	This paper studies Dictionary Learning problems wherein the learning task is distributed over a multi-agent network, modeled as a time-varying directed graph. This formulation is relevant, for instance, in Big Data scenarios where massive amounts of data are collected/stored in different locations (e.g., sensors, clouds) and aggregating and/or processing all data in a fusion center might be inefficient or unfeasible, due to resource limitations, communication overheads or privacy issues. We develop a unified decentralized algorithmic framework for this class of nonconvex problems, which is proved to converge to stationary solutions at a sublinear rate. The new method hinges on Successive Convex Approximation techniques, coupled with a decentralized tracking mechanism aiming at locally estimating the gradient of the smooth part of the sum-utility. To the best of our knowledge, this is the first provably convergent decentralized algorithm for Dictionary Learning and, more generally, bi-convex problems over (time-varying) (di)graphs.
Tasks	Dictionary Learning
Published	2018-08-17
URL	http://arxiv.org/abs/1808.05933v2
PDF	http://arxiv.org/pdf/1808.05933v2.pdf
PWC	https://paperswithcode.com/paper/decentralized-dictionary-learning-over-time
Repo
Framework
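
The abstract above mentions a decentralized tracking mechanism that lets each agent locally estimate the gradient of the sum-utility. The sketch below shows the basic gradient-tracking idea on a deliberately simple problem: scalar quadratics $f_i(x) = \frac{1}{2}(x - a_i)^2$ on a fixed network with a doubly stochastic weight matrix. This is a textbook simplification and not the paper's algorithm, which handles nonconvex dictionary learning, Successive Convex Approximation, and time-varying digraphs. All names and values are assumptions for illustration.

```python
def gradient_tracking(targets, weights, step=0.1, iterations=200):
    """Plain gradient tracking for f_i(x) = 0.5 * (x - targets[i])**2.

    Each agent i keeps an estimate x[i] and a tracker y[i] of the
    network-average gradient; `weights` must be doubly stochastic.
    """
    n = len(targets)

    def grad(i, x):
        return x - targets[i]

    x = [0.0] * n
    y = [grad(i, x[i]) for i in range(n)]  # trackers start at local gradients
    for _ in range(iterations):
        # Consensus step on the estimates, then move along the tracked gradient.
        x_new = [sum(weights[i][j] * x[j] for j in range(n)) - step * y[i]
                 for i in range(n)]
        # Consensus step on the trackers plus the change in local gradient.
        y = [sum(weights[i][j] * y[j] for j in range(n))
             + grad(i, x_new[i]) - grad(i, x[i])
             for i in range(n)]
        x = x_new
    return x

targets = [1.0, 2.0, 6.0]            # minimizer of the sum is the mean, 3.0
weights = [[0.50, 0.25, 0.25],
           [0.25, 0.50, 0.25],
           [0.25, 0.25, 0.50]]
estimates = gradient_tracking(targets, weights)
```

No agent ever sees the other agents' targets; agreement on the global minimizer emerges only through the weighted exchange of estimates and trackers.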

Overcoming the Curse of Dimensionality in Neural Networks


Title	Overcoming the Curse of Dimensionality in Neural Networks
Authors	Karen Yeressian
Abstract	Let $A$ be a set and $V$ a real Hilbert space. Let $H$ be a real Hilbert space of functions $f:A\to V$ and assume $H$ is continuously embedded in the Banach space of bounded functions. For $i=1,\cdots,n$, let $(x_i,y_i)\in A\times V$ comprise our dataset. Let $0<q<1$ and $f^*\in H$ be the unique global minimizer of the functional \begin{equation} u(f) = \frac{q}{2}\Vert f\Vert_{H}^{2} + \frac{1-q}{2n}\sum_{i=1}^{n}\Vert f(x_i)-y_i\Vert_{V}^{2}. \end{equation} In this paper we show that for each $k\in\mathbb{N}$ there exists a two layer network where the first layer has $k$ functions which are Riesz representations in the Hilbert space $H$ of point evaluation functionals and the second layer is a weighted sum of the first layer, such that the functions $f_k$ realized by these networks satisfy \begin{equation} \Vert f_{k}-f^*\Vert_{H}^{2} \leq \Bigl( o(1) + \frac{C}{q^2} E\bigl[ \Vert Du_{I}(f^*)\Vert_{H^{*}}^{2} \bigr] \Bigr)\frac{1}{k}. \end{equation} Let us note that $x_i$ do not need to be in a linear space and $y_i$ are in a possibly infinite dimensional Hilbert space $V$. The error estimate is independent of the data size $n$ and in the case $V$ is finite dimensional the error estimate is also independent of the dimension of $V$. By choosing the Hilbert space $H$ appropriately, the computational complexity of evaluating the Riesz representations of point evaluations might be small and thus the network has low computational complexity.
Tasks
Published	2018-09-02
URL	https://arxiv.org/abs/1809.00368v5
PDF	https://arxiv.org/pdf/1809.00368v5.pdf
PWC	https://paperswithcode.com/paper/on-overcoming-the-curse-of-dimensionality-in
Repo
Framework

Cross-Paced Representation Learning with Partial Curricula for Sketch-based Image Retrieval


Title	Cross-Paced Representation Learning with Partial Curricula for Sketch-based Image Retrieval
Authors	Dan Xu, Xavier Alameda-Pineda, Jingkuan Song, Elisa Ricci, Nicu Sebe
Abstract	In this paper we address the problem of learning robust cross-domain representations for sketch-based image retrieval (SBIR). While most SBIR approaches focus on extracting low- and mid-level descriptors for direct feature matching, recent works have shown the benefit of learning coupled feature representations to describe data from two related sources. However, cross-domain representation learning methods are typically cast into non-convex minimization problems that are difficult to optimize, leading to unsatisfactory performance. Inspired by self-paced learning, a learning methodology designed to overcome convergence issues related to local optima by exploiting the samples in a meaningful order (i.e. easy to hard), we introduce the cross-paced partial curriculum learning (CPPCL) framework. Compared with existing self-paced learning methods which only consider a single modality and cannot deal with prior knowledge, CPPCL is specifically designed to assess the learning pace by jointly handling data from dual sources and modality-specific prior information provided in the form of partial curricula. Additionally, thanks to the learned dictionaries, we demonstrate that the proposed CPPCL embeds robust coupled representations for SBIR. Our approach is extensively evaluated on four publicly available datasets (i.e. CUFS, Flickr15K, QueenMary SBIR and TU-Berlin Extension datasets), showing superior performance over competing SBIR methods.
Tasks	Image Retrieval, Representation Learning, Sketch-Based Image Retrieval
Published	2018-03-05
URL	http://arxiv.org/abs/1803.01504v1
PDF	http://arxiv.org/pdf/1803.01504v1.pdf
PWC	https://paperswithcode.com/paper/cross-paced-representation-learning-with
Repo
Framework
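
The abstract above builds on self-paced learning, where samples are introduced in an easy-to-hard order. The sketch below shows the classic hard-threshold self-paced rule (a sample is used when its current loss is below a pace threshold that grows over rounds). It is a generic illustration and not the CPPCL method, which additionally couples two modalities and uses partial curricula. In a real training loop the losses would be recomputed after every model update; here they are fixed, and all names and numbers are made up.

```python
def self_paced_weights(losses, threshold):
    """Binary sample weights: 1 for 'easy' samples (loss below threshold)."""
    return [1 if loss < threshold else 0 for loss in losses]

def pace_schedule(losses, start, growth, rounds):
    """Selection masks over several rounds as the threshold grows."""
    threshold = start
    history = []
    for _ in range(rounds):
        history.append(self_paced_weights(losses, threshold))
        threshold *= growth  # admit harder samples in the next round
    return history

losses = [0.1, 0.4, 0.9, 2.5]  # hypothetical per-sample losses
history = pace_schedule(losses, start=0.5, growth=2.0, rounds=4)
```

Early rounds train only on low-loss samples, and the growing threshold gradually admits the harder ones.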

Global Optimality in Separable Dictionary Learning with Applications to the Analysis of Diffusion MRI


Title	Global Optimality in Separable Dictionary Learning with Applications to the Analysis of Diffusion MRI
Authors	Evan Schwab, Benjamin D. Haeffele, René Vidal, Nicolas Charon
Abstract	Sparse dictionary learning is a popular method for representing signals as linear combinations of a few elements from a dictionary that is learned from the data. In the classical setting, signals are represented as vectors and the dictionary learning problem is posed as a matrix factorization problem where the data matrix is approximately factorized into a dictionary matrix and a sparse matrix of coefficients. However, in many applications in computer vision and medical imaging, signals are better represented as matrices or tensors (e.g. images or videos), where it may be beneficial to exploit the multi-dimensional structure of the data to learn a more compact representation. One such approach is separable dictionary learning, where one learns separate dictionaries for different dimensions of the data. However, typical formulations involve solving a non-convex optimization problem; thus guaranteeing global optimality remains a challenge. In this work, we propose a framework that builds upon recent developments in matrix factorization to provide theoretical and numerical guarantees of global optimality for separable dictionary learning. We propose an algorithm to find such a globally optimal solution, which alternates between following local descent steps and checking a certificate for global optimality. We illustrate our approach on diffusion magnetic resonance imaging (dMRI) data, a medical imaging modality that measures water diffusion along multiple angular directions in every voxel of an MRI volume. State-of-the-art methods in dMRI either learn dictionaries only for the angular domain of the signals or in some cases learn spatial and angular dictionaries independently. In this work, we apply the proposed separable dictionary learning framework to learn spatial and angular dMRI dictionaries jointly and provide preliminary validation on denoising phantom and real dMRI brain data.
Tasks	Denoising, Dictionary Learning
Published	2018-07-15
URL	https://arxiv.org/abs/1807.05595v2
PDF	https://arxiv.org/pdf/1807.05595v2.pdf
PWC	https://paperswithcode.com/paper/separable-dictionary-learning-with-global
Repo
Framework
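
Separable dictionary learning, as described in the abstract above, learns separate dictionaries for different dimensions of the data. One common way to write this for a matrix-valued signal is $Y \approx D_1 X D_2^\top$, which is equivalent to a single Kronecker-structured dictionary acting on the vectorized coefficients: $\mathrm{vec}(D_1 X D_2^\top) = (D_2 \otimes D_1)\,\mathrm{vec}(X)$. The sketch below checks that identity on small integer matrices. The two-dictionary form and all names are assumptions for illustration and may differ from the paper's exact formulation.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def kron(a, b):
    """Kronecker product a (x) b."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]

def vec(a):
    """Column-major vectorization."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]

D1 = [[1, 0], [2, 1], [0, 3]]   # dictionary for the first dimension (3 x 2)
D2 = [[1, 2], [0, 1]]           # dictionary for the second dimension (2 x 2)
X = [[1, 0], [0, 2]]            # sparse coefficient matrix

# Separable form: apply each dictionary along its own dimension.
lhs = vec(matmul(matmul(D1, X), transpose(D2)))

# Equivalent non-separable form: one large Kronecker dictionary.
column = [[value] for value in vec(X)]
rhs = [row[0] for row in matmul(kron(D2, D1), column)]
```

The separable form stores two small dictionaries instead of one large Kronecker-structured one, which is where the more compact representation comes from.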

That’s Mine! Learning Ownership Relations and Norms for Robots


Title	That’s Mine! Learning Ownership Relations and Norms for Robots
Authors	Zhi-Xuan Tan, Jake Brawer, Brian Scassellati
Abstract	The ability for autonomous agents to learn and conform to human norms is crucial for their safety and effectiveness in social environments. While recent work has led to frameworks for the representation and inference of simple social rules, research into norm learning remains at an exploratory stage. Here, we present a robotic system capable of representing, learning, and inferring ownership relations and norms. Ownership is represented as a graph of probabilistic relations between objects and their owners, along with a database of predicate-based norms that constrain the actions permissible on owned objects. To learn these norms and relations, our system integrates (i) a novel incremental norm learning algorithm capable of both one-shot learning and induction from specific examples, (ii) Bayesian inference of ownership relations in response to apparent rule violations, and (iii) percept-based prediction of an object’s likely owners. Through a series of simulated and real-world experiments, we demonstrate the competence and flexibility of the system in performing object manipulation tasks that require a variety of norms to be followed, laying the groundwork for future research into the acquisition and application of social norms.
Tasks	Bayesian Inference, One-Shot Learning
Published	2018-12-02
URL	http://arxiv.org/abs/1812.02576v2
PDF	http://arxiv.org/pdf/1812.02576v2.pdf
PWC	https://paperswithcode.com/paper/thats-mine-learning-ownership-relations-and
Repo
Framework
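
The abstract above describes Bayesian inference of ownership relations in response to apparent rule violations. As a minimal sketch of that kind of update, the code below applies Bayes' rule to a belief over candidate owners after one observation. The agents, the prior, and the likelihood values are invented for the example and are not taken from the paper.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of candidate owners.

    prior[o]      : belief that agent o owns the object
    likelihood[o] : probability of the observation if o were the owner
    """
    unnormalized = {o: prior[o] * likelihood[o] for o in prior}
    total = sum(unnormalized.values())
    return {o: p / total for o, p in unnormalized.items()}

# Hypothetical scenario: the robot is reprimanded after moving an object.
prior = {"alice": 0.5, "bob": 0.5}
# Assumed: a reprimand is far more likely if the object belongs to alice.
likelihood = {"alice": 0.9, "bob": 0.1}
belief = posterior(prior, likelihood)
```

Repeating the update with each new observation keeps the ownership relation probabilistic instead of committing to a single owner.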

Dual Swap Disentangling


Title	Dual Swap Disentangling
Authors	Zunlei Feng, Xinchao Wang, Chenglong Ke, Anxiang Zeng, Dacheng Tao, Mingli Song
Abstract	Learning interpretable disentangled representations is a crucial yet challenging task. In this paper, we propose a weakly semi-supervised method, termed Dual Swap Disentangling (DSD), for disentangling using both labeled and unlabeled data. Unlike conventional weakly supervised methods that rely on full annotations on the group of samples, we require only limited annotations on paired samples that indicate their shared attribute, such as the color. Our model takes the form of a dual autoencoder structure. To achieve disentangling using the labeled pairs, we follow an “encoding-swap-decoding” process, where we first swap the parts of their encodings corresponding to the shared attribute and then decode the obtained hybrid codes to reconstruct the original input pairs. For unlabeled pairs, we follow the “encoding-swap-decoding” process twice on designated encoding parts and enforce the final outputs to approximate the input pairs. By isolating parts of the encoding and swapping them back and forth, we impose the dimension-wise modularity and portability of the encodings of the unlabeled samples, which implicitly encourages disentangling under the guidance of labeled pairs. This dual swap mechanism, tailored for the semi-supervised setting, turns out to be very effective. Experiments on image datasets from a wide domain show that our model yields state-of-the-art disentangling performance.
Tasks
Published	2018-05-27
URL	https://arxiv.org/abs/1805.10583v3
PDF	https://arxiv.org/pdf/1805.10583v3.pdf
PWC	https://paperswithcode.com/paper/dual-swap-disentangling
Repo
Framework
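
The swap step in the "encoding-swap-decoding" process described above can be pictured on plain lists. The sketch below exchanges one designated segment between two encodings and shows that swapping the same segment a second time restores the originals. It covers only the swap itself: in the model described in the abstract, decoding and re-encoding happen between the two swaps, which is omitted here. The function name and the codes are hypothetical.

```python
def swap_segment(code_a, code_b, start, stop):
    """Exchange the slice [start:stop] between two equal-length encodings."""
    new_a = code_a[:start] + code_b[start:stop] + code_a[stop:]
    new_b = code_b[:start] + code_a[start:stop] + code_b[stop:]
    return new_a, new_b

code_a = [1, 2, 3, 4]
code_b = [5, 6, 7, 8]

# First swap: produce hybrid codes that share the designated part.
hybrid_a, hybrid_b = swap_segment(code_a, code_b, 1, 3)

# Second swap on the same part: the original codes come back.
restored_a, restored_b = swap_segment(hybrid_a, hybrid_b, 1, 3)
```

If the swapped part really isolates one attribute, the round trip should leave everything else about each sample untouched, which is the property the second swap is meant to enforce.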

Measuring the Effects of Data Parallelism on Neural Network Training


Title	Measuring the Effects of Data Parallelism on Neural Network Training
Authors	Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl
Abstract	Recent hardware developments have dramatically increased the scale of data parallelism available for neural network training. Among the simplest ways to harness next-generation hardware is to increase the batch size in standard mini-batch neural network training algorithms. In this work, we aim to experimentally characterize the effects of increasing the batch size on training time, as measured by the number of steps necessary to reach a goal out-of-sample error. We study how this relationship varies with the training algorithm, model, and data set, and find extremely large variation between workloads. Along the way, we show that disagreements in the literature on how batch size affects model quality can largely be explained by differences in metaparameter tuning and compute budgets at different batch sizes. We find no evidence that larger batch sizes degrade out-of-sample performance. Finally, we discuss the implications of our results on efforts to train neural networks much faster in the future. Our experimental data is publicly available as a database of 71,638,836 loss measurements taken over the course of training for 168,160 individual models across 35 workloads.
Tasks
Published	2018-11-08
URL	https://arxiv.org/abs/1811.03600v3
PDF	https://arxiv.org/pdf/1811.03600v3.pdf
PWC	https://paperswithcode.com/paper/measuring-the-effects-of-data-parallelism-on
Repo
Framework

On Human Robot Interaction using Multiple Modes


Title	On Human Robot Interaction using Multiple Modes
Authors	Neha Baranwal
Abstract	Humanoid robots have a body structure similar to that of human beings. Because of their technical design, they share the same workspace with humans. They are deployed to clean, to assist elderly people, to entertain us and, most importantly, to serve us. To be acceptable in the household, they must have a higher level of intelligence than industrial robots, and they must be social and capable of interacting with the people around them, who are not expected to be robot specialists. All of this falls under the field of human-robot interaction (HRI). There are various modes, such as speech, gesture, and behavior, through which humans can interact with robots. To address these challenges, a multimodal technique has been introduced in which gesture as well as speech is used as a mode of interaction.
Tasks
Published	2018-11-17
URL	http://arxiv.org/abs/1811.07206v1
PDF	http://arxiv.org/pdf/1811.07206v1.pdf
PWC	https://paperswithcode.com/paper/on-human-robot-interaction-using-multiple
Repo
Framework

Spatial-Temporal Digital Image Correlation: A Unified Framework


Title	Spatial-Temporal Digital Image Correlation: A Unified Framework
Authors	Yuxi Chi, Bing Pan
Abstract	A comprehensive and systematic framework for easily extending and implementing the subset-based spatial-temporal digital image correlation (DIC) algorithm is presented. The framework decouples the three main factors (i.e. shape function, correlation criterion, and optimization algorithm) involved in the implementation of DIC algorithms and represents different algorithms in a uniform form. Users can freely choose and combine the three factors to meet their own needs, or freely add more parameters to extract analytic results. Subpixel translation and a simulated image series with different velocity characteristics are analyzed using different algorithms based on the proposed framework, confirming the merit of noise suppression and velocity compatibility. An application of mitigating air disturbance due to heat haze using spatial-temporal DIC is given to demonstrate the applicability of the framework.
Tasks
Published	2018-12-12
URL	http://arxiv.org/abs/1812.04826v2
PDF	http://arxiv.org/pdf/1812.04826v2.pdf
PWC	https://paperswithcode.com/paper/spatial-temporal-digital-image-correlation-a
Repo
Framework
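
The correlation criterion is one of the three factors the abstract above says the framework decouples. A widely used criterion in DIC is the zero-normalized cross-correlation (ZNCC), which is insensitive to linear changes in brightness and contrast between the reference and deformed subsets. The sketch below implements ZNCC on flattened subsets as an illustration of one such criterion; whether the paper uses this exact form is not stated in the abstract, and the sample intensities are made up.

```python
import math

def zncc(reference, deformed):
    """Zero-normalized cross-correlation of two equal-length subsets.

    Returns a value in [-1, 1]; 1 means a perfect match up to a
    linear change of intensity. Subsets must not be constant.
    """
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_def = sum(deformed) / n
    d_ref = [value - mean_ref for value in reference]
    d_def = [value - mean_def for value in deformed]
    numerator = sum(a * b for a, b in zip(d_ref, d_def))
    denominator = math.sqrt(sum(a * a for a in d_ref) * sum(b * b for b in d_def))
    return numerator / denominator

subset = [10.0, 20.0, 30.0, 25.0]             # flattened reference subset
brighter = [2.0 * v + 3.0 for v in subset]    # same pattern, new gain and offset
score = zncc(subset, brighter)
```

Because the criterion is a standalone function of two subsets, it can be swapped for another one without touching the shape function or the optimizer, which is the kind of decoupling the abstract describes.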

MotherNets: Rapid Deep Ensemble Learning


Title	MotherNets: Rapid Deep Ensemble Learning
Authors	Abdul Wasay, Brian Hentschel, Yuze Liao, Sanyuan Chen, Stratos Idreos
Abstract	Ensembles of deep neural networks significantly improve generalization accuracy. However, training neural network ensembles requires a large amount of computational resources and time. State-of-the-art approaches either train all networks from scratch leading to prohibitive training cost that allows only very small ensemble sizes in practice, or generate ensembles by training a monolithic architecture, which results in lower model diversity and decreased prediction accuracy. We propose MotherNets to enable higher accuracy and practical training cost for large and diverse neural network ensembles: A MotherNet captures the structural similarity across some or all members of a deep neural network ensemble which allows us to share data movement and computation costs across these networks. We first train a single or a small set of MotherNets and, subsequently, we generate the target ensemble networks by transferring the function from the trained MotherNet(s). Then, we continue to train these ensemble networks, which now converge drastically faster compared to training from scratch. MotherNets handle ensembles with diverse architectures by clustering ensemble networks of similar architecture and training a separate MotherNet for every cluster. MotherNets also use clustering to control the accuracy vs. training cost tradeoff. We show that compared to state-of-the-art approaches such as Snapshot Ensembles, Knowledge Distillation, and TreeNets, MotherNets provide a new Pareto frontier for the accuracy-training cost tradeoff. Crucially, training cost and accuracy improvements continue to scale as we increase the ensemble size (2 to 3 percent reduced absolute test error rate and up to 35 percent faster training compared to Snapshot Ensembles). We verify these benefits over numerous neural network architectures and large data sets.
Tasks
Published	2018-09-12
URL	https://arxiv.org/abs/1809.04270v2
PDF	https://arxiv.org/pdf/1809.04270v2.pdf
PWC	https://paperswithcode.com/paper/rapid-training-of-very-large-ensembles-of
Repo
Framework

Predicting the Programming Language of Questions and Snippets of StackOverflow Using Natural Language Processing


Title	Predicting the Programming Language of Questions and Snippets of StackOverflow Using Natural Language Processing
Authors	Kamel Alreshedy, Dhanush Dharmaretnam, Daniel M. German, Venkatesh Srinivasan, T. Aaron Gulliver
Abstract	Stack Overflow is the most popular Q&A website among software developers. As a platform for knowledge sharing and acquisition, the questions posted in Stack Overflow usually contain a code snippet. Stack Overflow relies on users to properly tag the programming language of a question and it simply assumes that the programming language of the snippets inside a question is the same as the tag of the question itself. In this paper, we propose a classifier to predict the programming language of questions posted in Stack Overflow using Natural Language Processing (NLP) and Machine Learning (ML). The classifier achieves an accuracy of 91.1% in predicting the 24 most popular programming languages by combining features from the title, body and the code snippets of the question. We also propose a classifier that only uses the title and body of the question and has an accuracy of 81.1%. Finally, we propose a classifier of code snippets only that achieves an accuracy of 77.7%. These results show that deploying Machine Learning techniques on the combination of text and the code snippets of a question provides the best performance. These results also demonstrate that it is possible to identify the programming language of a snippet of a few lines of source code. We visualize the feature space of two programming languages, Java and SQL, in order to identify some special properties of the information inside the questions in Stack Overflow corresponding to these languages.
Tasks
Published	2018-09-21
URL	http://arxiv.org/abs/1809.07954v1
PDF	http://arxiv.org/pdf/1809.07954v1.pdf
PWC	https://paperswithcode.com/paper/predicting-the-programming-language-of
Repo
Framework

Contrastive Multivariate Singular Spectrum Analysis


Title	Contrastive Multivariate Singular Spectrum Analysis
Authors	Abdi-Hakin Dirie, Abubakar Abid, James Zou
Abstract	We introduce Contrastive Multivariate Singular Spectrum Analysis, a novel unsupervised method for dimensionality reduction and signal decomposition of time series data. By utilizing an appropriate background dataset, the method transforms a target time series dataset in a way that evinces the sub-signals that are enhanced in the target dataset, as opposed to only those that account for the greatest variance. This shifts the goal from finding signals that explain the most variance to signals that matter the most to the analyst. We demonstrate our method on an illustrative synthetic example, as well as show the utility of our method in the downstream clustering of electrocardiogram signals from the public MHEALTH dataset.
Tasks	Dimensionality Reduction, Time Series
Published	2018-10-31
URL	http://arxiv.org/abs/1810.13317v1
PDF	http://arxiv.org/pdf/1810.13317v1.pdf
PWC	https://paperswithcode.com/paper/contrastive-multivariate-singular-spectrum
Repo
Framework

Efficient Online Scalar Annotation with Bounded Support


Title	Efficient Online Scalar Annotation with Bounded Support
Authors	Keisuke Sakaguchi, Benjamin Van Durme
Abstract	We describe a novel method for efficiently eliciting scalar annotations for dataset construction and system quality estimation by human judgments. We contrast direct assessment (annotators assign scores to items directly), online pairwise ranking aggregation (scores derive from annotator comparison of items), and a hybrid approach (EASL: Efficient Annotation of Scalar Labels) proposed here. Our proposal leads to increased correlation with ground truth, at far greater annotator efficiency, suggesting this strategy as an improved mechanism for dataset creation and manual system evaluation.
Tasks
Published	2018-06-04
URL	http://arxiv.org/abs/1806.01170v1
PDF	http://arxiv.org/pdf/1806.01170v1.pdf
PWC	https://paperswithcode.com/paper/efficient-online-scalar-annotation-with
Repo
Framework