Paper Group ANR 649
Exploiting Recurrent Neural Networks and Leap Motion Controller for Sign Language and Semaphoric Gesture Recognition
Title | Exploiting Recurrent Neural Networks and Leap Motion Controller for Sign Language and Semaphoric Gesture Recognition |
Authors | Danilo Avola, Marco Bernardi, Luigi Cinque, Gian Luca Foresti, Cristiano Massaroni |
Abstract | In human interactions, hands are a powerful way of expressing information that, in some cases, can serve as a valid substitute for voice, as happens in Sign Language. Hand gesture recognition has long been an interesting topic in the areas of computer vision and multimedia. These gestures can be represented as sets of feature vectors that change over time. Recurrent Neural Networks (RNNs) are well suited to analysing such sequences thanks to their ability to model the long-term contextual information of temporal data. In this paper, an RNN is trained using as features the angles formed by the finger bones of human hands. The selected features, acquired by a Leap Motion Controller (LMC) sensor, have been chosen because the majority of human gestures produce joint movements that generate truly characteristic angles. A challenging subset composed of a large number of gestures defined in the American Sign Language (ASL) is used to test the proposed solution and the effectiveness of the selected angles. Moreover, the proposed method has been compared with other state-of-the-art works on the SHREC dataset, demonstrating its superiority in hand gesture recognition accuracy. |
Tasks | Gesture Recognition, Hand Gesture Recognition |
Published | 2018-03-28 |
URL | http://arxiv.org/abs/1803.10435v1 |
PDF | http://arxiv.org/pdf/1803.10435v1.pdf |
PWC | https://paperswithcode.com/paper/exploiting-recurrent-neural-networks-and-leap |
Repo | |
Framework | |
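The paper's core recipe lends itself to a compact illustration. Below is a minimal sketch, assuming a hypothetical Leap Motion frame layout (5 fingers × 4 bone direction vectors) and an arbitrary LSTM size: angles between adjacent finger bones are extracted per frame, and a small recurrent classifier consumes the resulting sequence. It is not the authors' code.

```python
# Sketch: angle features between adjacent finger bones, classified by an LSTM.
# Frame layout and network sizes are illustrative assumptions, not the paper's.
import numpy as np
import torch
import torch.nn as nn

def bone_angle(u: np.ndarray, v: np.ndarray) -> float:
    """Angle (radians) between two bone direction vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical LMC frame: 5 fingers x 4 bones, each a 3-D direction vector.
frame = np.random.randn(5, 4, 3)
features = [bone_angle(frame[f, b], frame[f, b + 1])
            for f in range(5) for b in range(3)]   # 15 joint angles per frame

class GestureLSTM(nn.Module):
    def __init__(self, n_feat=15, hidden=64, n_classes=30):
        super().__init__()
        self.lstm = nn.LSTM(n_feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):              # x: (batch, time, n_feat)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # classify from the last time step

seq = torch.tensor([features] * 40, dtype=torch.float32).unsqueeze(0)  # (1, 40, 15)
logits = GestureLSTM()(seq)            # one 40-frame gesture -> class scores
```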
Identifiability of Complete Dictionary Learning
Title | Identifiability of Complete Dictionary Learning |
Authors | Jérémy E. Cohen, Nicolas Gillis |
Abstract | Sparse component analysis (SCA), also known as complete dictionary learning, is the following problem: Given an input matrix $M$ and an integer $r$, find a dictionary $D$ with $r$ columns and a matrix $B$ with $k$-sparse columns (that is, each column of $B$ has at most $k$ non-zero entries) such that $M \approx DB$. A key issue in SCA is identifiability, that is, characterizing the conditions under which $D$ and $B$ are essentially unique (that is, unique up to permutation and scaling of the columns of $D$ and rows of $B$). Although SCA has been extensively investigated over the last two decades, only a few works have tackled this issue in the deterministic scenario, and no work provides reasonable bounds on the minimum number of samples (that is, columns of $M$) that leads to identifiability. In this work, we provide new results in the deterministic scenario when the data has a low-rank structure, that is, when $D$ is (under)complete. While previous bounds feature a combinatorial term $\binom{r}{k}$, we exhibit a sufficient condition involving $\mathcal{O}(r^3/(r-k)^2)$ samples that yields an essentially unique decomposition, as long as these data points are well spread among the subspaces spanned by $r-1$ columns of $D$. We also exhibit a necessary lower bound on the number of samples that contradicts previous results in the literature when $k$ equals $r-1$. Our bounds provide a drastic improvement over the state of the art and imply, for example, that for a fixed proportion of zeros (constant and independent of $r$, e.g., 10% of zero entries in $B$), one only requires $\mathcal{O}(r)$ data points to guarantee identifiability. |
Tasks | Dictionary Learning |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08765v2 |
PDF | http://arxiv.org/pdf/1808.08765v2.pdf |
PWC | https://paperswithcode.com/paper/identifiability-of-low-rank-sparse-component |
Repo | |
Framework | |
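To make the problem statement concrete, here is a small numpy sketch that generates an SCA instance $M = DB$ with $k$-sparse columns of $B$; all sizes are arbitrary choices, not values from the paper.

```python
# Sketch of the SCA / complete dictionary learning model M = D B,
# with each column of B at most k-sparse. Sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
m, r, k, n = 20, 10, 3, 200        # ambient dim, dictionary size, sparsity, samples

D = rng.standard_normal((m, r))    # dictionary with r columns
B = np.zeros((r, n))
for j in range(n):
    support = rng.choice(r, size=k, replace=False)  # at most k non-zeros per column
    B[support, j] = rng.standard_normal(k)

M = D @ B                          # data matrix to be factored
# Identifiability asks: when are D and B unique up to permutation and
# scaling of D's columns (and the matching rows of B)?
```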
Decentralized Dictionary Learning Over Time-Varying Digraphs
Title | Decentralized Dictionary Learning Over Time-Varying Digraphs |
Authors | Amir Daneshmand, Ying Sun, Gesualdo Scutari, Francisco Facchinei, Brian M. Sadler |
Abstract | This paper studies Dictionary Learning problems wherein the learning task is distributed over a multi-agent network, modeled as a time-varying directed graph. This formulation is relevant, for instance, in Big Data scenarios where massive amounts of data are collected/stored in different locations (e.g., sensors, clouds) and aggregating and/or processing all data in a fusion center might be inefficient or infeasible, due to resource limitations, communication overheads or privacy issues. We develop a unified decentralized algorithmic framework for this class of nonconvex problems, which is proved to converge to stationary solutions at a sublinear rate. The new method hinges on Successive Convex Approximation techniques, coupled with a decentralized tracking mechanism that locally estimates the gradient of the smooth part of the sum-utility. To the best of our knowledge, this is the first provably convergent decentralized algorithm for Dictionary Learning and, more generally, bi-convex problems over (time-varying) (di)graphs. |
Tasks | Dictionary Learning |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05933v2 |
PDF | http://arxiv.org/pdf/1808.05933v2.pdf |
PWC | https://paperswithcode.com/paper/decentralized-dictionary-learning-over-time |
Repo | |
Framework | |
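A minimal sketch of the gradient-tracking idea the abstract alludes to, on a toy consensus problem. Note the simplifications: a single, fixed, doubly stochastic mixing matrix stands in for the paper's time-varying digraphs, and plain gradient steps replace the Successive Convex Approximation machinery.

```python
# Sketch of decentralized gradient tracking on min_x sum_i f_i(x),
# with quadratic local costs. Not the paper's algorithm; a toy stand-in.
import numpy as np

rng = np.random.default_rng(1)
n_agents, dim = 4, 3
A = [rng.standard_normal((5, dim)) for _ in range(n_agents)]
b = [rng.standard_normal(5) for _ in range(n_agents)]

def grad(i, x):
    """Gradient of the local cost f_i(x) = 0.5 * ||A_i x - b_i||^2."""
    return A[i].T @ (A[i] @ x - b[i])

W = np.full((n_agents, n_agents), 1.0 / n_agents)       # doubly stochastic mixing
x = np.zeros((n_agents, dim))
y = np.array([grad(i, x[i]) for i in range(n_agents)])  # local gradient trackers

alpha = 0.02
for _ in range(1000):
    g_old = np.array([grad(i, x[i]) for i in range(n_agents)])
    x = W @ x - alpha * y              # mix, then step along the tracked gradient
    g_new = np.array([grad(i, x[i]) for i in range(n_agents)])
    y = W @ y + g_new - g_old          # track the network-average gradient
# All rows of x approach the common minimizer of the sum of the f_i.
```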
Overcoming the Curse of Dimensionality in Neural Networks
Title | Overcoming the Curse of Dimensionality in Neural Networks |
Authors | Karen Yeressian |
Abstract | Let $A$ be a set and $V$ a real Hilbert space. Let $H$ be a real Hilbert space of functions $f:A\to V$ and assume $H$ is continuously embedded in the Banach space of bounded functions. For $i=1,\ldots,n$, let $(x_i,y_i)\in A\times V$ comprise our dataset. Let $0<q<1$ and $f^*\in H$ be the unique global minimizer of the functional \begin{equation*} u(f) = \frac{q}{2}\Vert f\Vert_{H}^{2} + \frac{1-q}{2n}\sum_{i=1}^{n}\Vert f(x_i)-y_i\Vert_{V}^{2}. \end{equation*} In this paper we show that for each $k\in\mathbb{N}$ there exists a two layer network where the first layer has $k$ functions which are Riesz representations in the Hilbert space $H$ of point evaluation functionals and the second layer is a weighted sum of the first layer, such that the functions $f_k$ realized by these networks satisfy \begin{equation*} \Vert f_{k}-f^*\Vert_{H}^{2} \leq \Bigl( o(1) + \frac{C}{q^2} E\bigl[ \Vert Du_{I}(f^*)\Vert_{H^{*}}^{2} \bigr] \Bigr)\frac{1}{k}. \end{equation*} By choosing the Hilbert space $H$ appropriately, the computational complexity of evaluating the Riesz representations of point evaluations can be kept small, so that the network has low computational complexity. |
Tasks | |
Published | 2018-09-02 |
URL | https://arxiv.org/abs/1809.00368v5 |
PDF | https://arxiv.org/pdf/1809.00368v5.pdf |
PWC | https://paperswithcode.com/paper/on-overcoming-the-curse-of-dimensionality-in |
Repo | |
Framework | |
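When $H$ is a reproducing kernel Hilbert space, the Riesz representer of evaluation at $x$ is the kernel section $k(x,\cdot)$, and the global minimizer $f^*$ of $u(f)$ has the closed form of kernel ridge regression. The numpy sketch below computes $f^*$ exactly for a Gaussian kernel; the kernel choice and data are illustrative assumptions, and the paper's networks approximate $f^*$ with $k$ representers rather than all $n$.

```python
# Sketch: the minimizer f* of u(f) over an RKHS, via the representer theorem.
# Stationarity over span{k(x_i, .)} gives (K + (q n / (1 - q)) I) alpha = y.
import numpy as np

rng = np.random.default_rng(2)
n, q = 50, 0.05
x = rng.uniform(-3, 3, n)
y = np.sin(x) + 0.1 * rng.standard_normal(n)

kern = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)  # Gaussian kernel
K = kern(x, x)

alpha = np.linalg.solve(K + (q * n / (1 - q)) * np.eye(n), y)

f_star = lambda t: kern(np.atleast_1d(t), x) @ alpha   # evaluate f* pointwise
print(f_star(np.array([1.5])))                         # f* at a test point
```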
Cross-Paced Representation Learning with Partial Curricula for Sketch-based Image Retrieval
Title | Cross-Paced Representation Learning with Partial Curricula for Sketch-based Image Retrieval |
Authors | Dan Xu, Xavier Alameda-Pineda, Jingkuan Song, Elisa Ricci, Nicu Sebe |
Abstract | In this paper we address the problem of learning robust cross-domain representations for sketch-based image retrieval (SBIR). While most SBIR approaches focus on extracting low- and mid-level descriptors for direct feature matching, recent works have shown the benefit of learning coupled feature representations to describe data from two related sources. However, cross-domain representation learning methods are typically cast into non-convex minimization problems that are difficult to optimize, leading to unsatisfactory performance. Inspired by self-paced learning, a learning methodology designed to overcome convergence issues related to local optima by exploiting the samples in a meaningful order (i.e. easy to hard), we introduce the cross-paced partial curriculum learning (CPPCL) framework. Compared with existing self-paced learning methods which only consider a single modality and cannot deal with prior knowledge, CPPCL is specifically designed to assess the learning pace by jointly handling data from dual sources and modality-specific prior information provided in the form of partial curricula. Additionally, thanks to the learned dictionaries, we demonstrate that the proposed CPPCL embeds robust coupled representations for SBIR. Our approach is extensively evaluated on four publicly available datasets (i.e. CUFS, Flickr15K, QueenMary SBIR and TU-Berlin Extension datasets), showing superior performance over competing SBIR methods. |
Tasks | Image Retrieval, Representation Learning, Sketch-Based Image Retrieval |
Published | 2018-03-05 |
URL | http://arxiv.org/abs/1803.01504v1 |
PDF | http://arxiv.org/pdf/1803.01504v1.pdf |
PWC | https://paperswithcode.com/paper/cross-paced-representation-learning-with |
Repo | |
Framework | |
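For context, here is the vanilla self-paced learning schedule the abstract builds on: train on samples whose current loss falls below a threshold, then relax the threshold so harder samples enter the curriculum. CPPCL's cross-modal coupling and partial curricula are not shown; this is only the base ingredient.

```python
# Sketch of vanilla self-paced learning: easy samples (low loss) first.
import numpy as np

def self_paced_weights(losses: np.ndarray, threshold: float) -> np.ndarray:
    """Binary sample weights: include a sample iff its loss is below threshold."""
    return (losses < threshold).astype(float)

losses = np.random.rand(8)                 # current per-sample losses
for threshold in (0.3, 0.6, 1.0):          # easy -> hard pacing schedule
    v = self_paced_weights(losses, threshold)
    # weighted_loss = (v * losses).sum() would drive the model update here
    print(threshold, int(v.sum()), "samples included")
```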
Global Optimality in Separable Dictionary Learning with Applications to the Analysis of Diffusion MRI
Title | Global Optimality in Separable Dictionary Learning with Applications to the Analysis of Diffusion MRI |
Authors | Evan Schwab, Benjamin D. Haeffele, René Vidal, Nicolas Charon |
Abstract | Sparse dictionary learning is a popular method for representing signals as linear combinations of a few elements from a dictionary that is learned from the data. In the classical setting, signals are represented as vectors and the dictionary learning problem is posed as a matrix factorization problem where the data matrix is approximately factorized into a dictionary matrix and a sparse matrix of coefficients. However, in many applications in computer vision and medical imaging, signals are better represented as matrices or tensors (e.g. images or videos), where it may be beneficial to exploit the multi-dimensional structure of the data to learn a more compact representation. One such approach is separable dictionary learning, where one learns separate dictionaries for different dimensions of the data. However, typical formulations involve solving a non-convex optimization problem; thus guaranteeing global optimality remains a challenge. In this work, we propose a framework that builds upon recent developments in matrix factorization to provide theoretical and numerical guarantees of global optimality for separable dictionary learning. We propose an algorithm to find such a globally optimal solution, which alternates between following local descent steps and checking a certificate for global optimality. We illustrate our approach on diffusion magnetic resonance imaging (dMRI) data, a medical imaging modality that measures water diffusion along multiple angular directions in every voxel of an MRI volume. State-of-the-art methods in dMRI either learn dictionaries only for the angular domain of the signals or in some cases learn spatial and angular dictionaries independently. In this work, we apply the proposed separable dictionary learning framework to learn spatial and angular dMRI dictionaries jointly and provide preliminary validation on denoising phantom and real dMRI brain data. |
Tasks | Denoising, Dictionary Learning |
Published | 2018-07-15 |
URL | https://arxiv.org/abs/1807.05595v2 |
PDF | https://arxiv.org/pdf/1807.05595v2.pdf |
PWC | https://paperswithcode.com/paper/separable-dictionary-learning-with-global |
Repo | |
Framework | |
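The separable model itself is easy to state in code: a matrix-valued signal $X$ is approximated as $D_1 C D_2^\top$ with a sparse coefficient matrix $C$, so separate dictionaries act on the two dimensions (e.g. spatial and angular for dMRI). The numpy sketch below performs one illustrative coding step with hard thresholding; sizes and the thresholding rule are assumptions, and the paper's certificate-based global optimization is not reproduced.

```python
# Sketch of the separable dictionary model X ~ D1 @ C @ D2.T with sparse C.
import numpy as np

rng = np.random.default_rng(3)
h, w, r1, r2 = 16, 12, 8, 6
D1 = rng.standard_normal((h, r1))          # dictionary for rows (e.g. spatial)
D2 = rng.standard_normal((w, r2))          # dictionary for columns (e.g. angular)
X = rng.standard_normal((h, w))

# One least-squares coding step, then hard thresholding to sparsify C.
C = np.linalg.pinv(D1) @ X @ np.linalg.pinv(D2).T
C[np.abs(C) < np.quantile(np.abs(C), 0.8)] = 0.0    # keep the 20% largest entries
X_hat = D1 @ C @ D2.T
print("relative residual:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```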
That’s Mine! Learning Ownership Relations and Norms for Robots
Title | That’s Mine! Learning Ownership Relations and Norms for Robots |
Authors | Zhi-Xuan Tan, Jake Brawer, Brian Scassellati |
Abstract | The ability of autonomous agents to learn and conform to human norms is crucial for their safety and effectiveness in social environments. While recent work has led to frameworks for the representation and inference of simple social rules, research into norm learning remains at an exploratory stage. Here, we present a robotic system capable of representing, learning, and inferring ownership relations and norms. Ownership is represented as a graph of probabilistic relations between objects and their owners, along with a database of predicate-based norms that constrain the actions permissible on owned objects. To learn these norms and relations, our system integrates (i) a novel incremental norm learning algorithm capable of both one-shot learning and induction from specific examples, (ii) Bayesian inference of ownership relations in response to apparent rule violations, and (iii) percept-based prediction of an object’s likely owners. Through a series of simulated and real-world experiments, we demonstrate the competence and flexibility of the system in performing object manipulation tasks that require a variety of norms to be followed, laying the groundwork for future research into the acquisition and application of social norms. |
Tasks | Bayesian Inference, One-Shot Learning |
Published | 2018-12-02 |
URL | http://arxiv.org/abs/1812.02576v2 |
PDF | http://arxiv.org/pdf/1812.02576v2.pdf |
PWC | https://paperswithcode.com/paper/thats-mine-learning-ownership-relations-and |
Repo | |
Framework | |
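The Bayesian step in (ii) can be illustrated directly: an apparent rule violation involving an object updates the probability of each candidate owner via Bayes' rule. The priors and likelihoods below are made-up numbers, not values from the paper.

```python
# Sketch: Bayesian update of ownership beliefs after an apparent rule violation.
def update_ownership(prior: dict, lik_violation: dict) -> dict:
    """posterior(owner) proportional to P(observed violation | owner) * prior(owner)."""
    unnorm = {o: lik_violation[o] * p for o, p in prior.items()}
    z = sum(unnorm.values())
    return {o: v / z for o, v in unnorm.items()}

prior = {"alice": 0.5, "bob": 0.3, "unowned": 0.2}
# If Alice owned the cup, the robot's action would likely have violated a norm;
# less so for Bob; not at all if unowned (hypothetical numbers).
lik = {"alice": 0.9, "bob": 0.4, "unowned": 0.05}
print(update_ownership(prior, lik))   # probability mass shifts toward "alice"
```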
Dual Swap Disentangling
Title | Dual Swap Disentangling |
Authors | Zunlei Feng, Xinchao Wang, Chenglong Ke, Anxiang Zeng, Dacheng Tao, Mingli Song |
Abstract | Learning interpretable disentangled representations is a crucial yet challenging task. In this paper, we propose a weakly semi-supervised method, termed Dual Swap Disentangling (DSD), for disentangling using both labeled and unlabeled data. Unlike conventional weakly supervised methods that rely on full annotations of the group of samples, we require only limited annotations on paired samples that indicate a shared attribute such as color. Our model takes the form of a dual autoencoder structure. To achieve disentangling using the labeled pairs, we follow an “encoding-swap-decoding” process, where we first swap the parts of their encodings corresponding to the shared attribute and then decode the obtained hybrid codes to reconstruct the original input pairs. For unlabeled pairs, we follow the “encoding-swap-decoding” process twice on designated encoding parts and enforce the final outputs to approximate the input pairs. By isolating parts of the encoding and swapping them back and forth, we impose dimension-wise modularity and portability on the encodings of the unlabeled samples, which implicitly encourages disentangling under the guidance of the labeled pairs. This dual swap mechanism, tailored to the semi-supervised setting, turns out to be very effective. Experiments on image datasets from a wide range of domains show that our model yields state-of-the-art disentangling performance. |
Tasks | |
Published | 2018-05-27 |
URL | https://arxiv.org/abs/1805.10583v3 |
PDF | https://arxiv.org/pdf/1805.10583v3.pdf |
PWC | https://paperswithcode.com/paper/dual-swap-disentangling |
Repo | |
Framework | |
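A minimal sketch of one labeled-pair “encoding-swap-decoding” pass, assuming toy encoder/decoder sizes and an arbitrary slice layout for the shared attribute; the paper's dual (twice-swapped) pass for unlabeled pairs follows the same pattern and is not shown.

```python
# Sketch: encode a pair, swap the shared-attribute slice of the codes,
# decode the hybrids, and reconstruct the original inputs.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Flatten(), nn.Linear(784, 16))   # toy encoder
dec = nn.Sequential(nn.Linear(16, 784))                 # toy decoder
ATTR = slice(0, 4)                 # assumed dims encoding the shared attribute

def swap(za, zb, part):
    za2, zb2 = za.clone(), zb.clone()
    za2[:, part] = zb[:, part]
    zb2[:, part] = za[:, part]
    return za2, zb2

xa, xb = torch.rand(8, 1, 28, 28), torch.rand(8, 1, 28, 28)  # pair sharing an attribute
za, zb = enc(xa), enc(xb)
za_s, zb_s = swap(za, zb, ATTR)                 # swap the shared-attribute part
recon_a, recon_b = dec(za_s), dec(zb_s)
loss = ((recon_a - xa.flatten(1)) ** 2).mean() + ((recon_b - xb.flatten(1)) ** 2).mean()
loss.backward()   # for unlabeled pairs the paper swaps twice ("dual swap")
```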
Measuring the Effects of Data Parallelism on Neural Network Training
Title | Measuring the Effects of Data Parallelism on Neural Network Training |
Authors | Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl |
Abstract | Recent hardware developments have dramatically increased the scale of data parallelism available for neural network training. Among the simplest ways to harness next-generation hardware is to increase the batch size in standard mini-batch neural network training algorithms. In this work, we aim to experimentally characterize the effects of increasing the batch size on training time, as measured by the number of steps necessary to reach a goal out-of-sample error. We study how this relationship varies with the training algorithm, model, and data set, and find extremely large variation between workloads. Along the way, we show that disagreements in the literature on how batch size affects model quality can largely be explained by differences in metaparameter tuning and compute budgets at different batch sizes. We find no evidence that larger batch sizes degrade out-of-sample performance. Finally, we discuss the implications of our results on efforts to train neural networks much faster in the future. Our experimental data is publicly available as a database of 71,638,836 loss measurements taken over the course of training for 168,160 individual models across 35 workloads. |
Tasks | |
Published | 2018-11-08 |
URL | https://arxiv.org/abs/1811.03600v3 |
PDF | https://arxiv.org/pdf/1811.03600v3.pdf |
PWC | https://paperswithcode.com/paper/measuring-the-effects-of-data-parallelism-on |
Repo | |
Framework | |
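The measurement at the heart of the study is simple to state in code: for each batch size, train until the validation error first reaches a goal, and record the step count. The sketch below assumes placeholder `train_step` and `val_error` callables standing in for an actual workload.

```python
# Sketch of the steps-to-goal measurement the abstract describes.
def steps_to_goal(batch_size, goal_error, train_step, val_error, max_steps=10**6):
    """Number of optimizer steps needed to first reach goal out-of-sample error."""
    for step in range(1, max_steps + 1):
        train_step(batch_size)                 # one mini-batch update
        if val_error() <= goal_error:
            return step
    return None                                # goal not reached within budget

# curve = {bs: steps_to_goal(bs, 0.05, train_step, val_error)
#          for bs in (64, 256, 1024, 4096)}
# Plotting steps vs. batch size exposes the perfect-scaling and diminishing-
# returns regimes the paper characterizes (with metaparameters retuned per
# batch size, as the paper emphasizes).
```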
On Human Robot Interaction using Multiple Modes
Title | On Human Robot Interaction using Multiple Modes |
Authors | Neha Baranwal |
Abstract | Humanoid robots have a body structure similar to that of human beings and, by design, share the same workspace with humans. They are deployed to clean, to assist the elderly, to entertain us and, most importantly, to serve us. To be acceptable in the household, they must have a higher level of intelligence than industrial robots, and they must be social and capable of interacting with the people around them, who are not expected to be robot specialists. All of this falls under the field of human robot interaction (HRI). There are various modes, such as speech, gesture and behavior, through which humans can interact with robots. To address these challenges, a multimodal technique has been introduced in which both gesture and speech are used as modes of interaction. |
Tasks | |
Published | 2018-11-17 |
URL | http://arxiv.org/abs/1811.07206v1 |
PDF | http://arxiv.org/pdf/1811.07206v1.pdf |
PWC | https://paperswithcode.com/paper/on-human-robot-interaction-using-multiple |
Repo | |
Framework | |
Spatial-Temporal Digital Image Correlation: A Unified Framework
Title | Spatial-Temporal Digital Image Correlation: A Unified Framework |
Authors | Yuxi Chi, Bing Pan |
Abstract | A comprehensive and systematic framework for easily extending and implementing the subset-based spatial-temporal digital image correlation (DIC) algorithm is presented. The framework decouples the three main factors (i.e. shape function, correlation criterion, and optimization algorithm) involved in the algorithmic implementation of DIC and represents different algorithms in a uniform form. One can freely choose and combine the three factors to meet one's own needs, or freely add more parameters to extract analytic results. Subpixel translation and a simulated image series with different velocity characteristics are analyzed using different algorithms based on the proposed framework, confirming its merits in noise suppression and velocity compatibility. An application to mitigating air disturbance due to heat haze using spatial-temporal DIC is given to demonstrate the applicability of the framework. |
Tasks | |
Published | 2018-12-12 |
URL | http://arxiv.org/abs/1812.04826v2 |
PDF | http://arxiv.org/pdf/1812.04826v2.pdf |
PWC | https://paperswithcode.com/paper/spatial-temporal-digital-image-correlation-a |
Repo | |
Framework | |
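As an example of one of the three decoupled factors, here is the zero-mean normalized cross-correlation (ZNCC) criterion commonly used in subset-based DIC; the shape function and optimizer are left out, and this is a generic formulation rather than the paper's implementation.

```python
# Sketch: ZNCC correlation criterion between a reference subset f and a
# deformed subset g. ZNCC lies in [-1, 1]; 1 means a perfect match up to
# an affine intensity change.
import numpy as np

def zncc(f: np.ndarray, g: np.ndarray) -> float:
    fz, gz = f - f.mean(), g - g.mean()
    return float((fz * gz).sum() / (np.linalg.norm(fz) * np.linalg.norm(gz)))

ref = np.random.rand(21, 21)               # reference subset
deformed = 1.5 * ref + 0.2                 # brightness/contrast change only
print(zncc(ref, deformed))                 # ~1.0: ZNCC is invariant to it
```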
MotherNets: Rapid Deep Ensemble Learning
Title | MotherNets: Rapid Deep Ensemble Learning |
Authors | Abdul Wasay, Brian Hentschel, Yuze Liao, Sanyuan Chen, Stratos Idreos |
Abstract | Ensembles of deep neural networks significantly improve generalization accuracy. However, training neural network ensembles requires a large amount of computational resources and time. State-of-the-art approaches either train all networks from scratch leading to prohibitive training cost that allows only very small ensemble sizes in practice, or generate ensembles by training a monolithic architecture, which results in lower model diversity and decreased prediction accuracy. We propose MotherNets to enable higher accuracy and practical training cost for large and diverse neural network ensembles: A MotherNet captures the structural similarity across some or all members of a deep neural network ensemble which allows us to share data movement and computation costs across these networks. We first train a single or a small set of MotherNets and, subsequently, we generate the target ensemble networks by transferring the function from the trained MotherNet(s). Then, we continue to train these ensemble networks, which now converge drastically faster compared to training from scratch. MotherNets handle ensembles with diverse architectures by clustering ensemble networks of similar architecture and training a separate MotherNet for every cluster. MotherNets also use clustering to control the accuracy vs. training cost tradeoff. We show that compared to state-of-the-art approaches such as Snapshot Ensembles, Knowledge Distillation, and TreeNets, MotherNets provide a new Pareto frontier for the accuracy-training cost tradeoff. Crucially, training cost and accuracy improvements continue to scale as we increase the ensemble size (2 to 3 percent reduced absolute test error rate and up to 35 percent faster training compared to Snapshot Ensembles). We verify these benefits over numerous neural network architectures and large data sets. |
Tasks | |
Published | 2018-09-12 |
URL | https://arxiv.org/abs/1809.04270v2 |
PDF | https://arxiv.org/pdf/1809.04270v2.pdf |
PWC | https://paperswithcode.com/paper/rapid-training-of-very-large-ensembles-of |
Repo | |
Framework | |
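A simplified stand-in for the “transfer the function, then keep training” step: parameters of a trained MotherNet are copied into an ensemble member wherever names and shapes match, and the remaining parameters start fresh. The paper's function-preserving transfer across differing architectures is more involved than this sketch.

```python
# Sketch: copy a trained MotherNet's parameters into an ensemble member
# wherever parameter names and shapes match, then keep training the member.
import torch
import torch.nn as nn

def transfer(mother: nn.Module, child: nn.Module) -> None:
    c_state = child.state_dict()
    for name, w in mother.state_dict().items():
        if name in c_state and c_state[name].shape == w.shape:
            c_state[name] = w.clone()          # reuse the trained parameter
    child.load_state_dict(c_state)

mother = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
child = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
transfer(mother, child)   # child now starts from the MotherNet's function
# ...continue training `child`, which converges faster than from scratch.
```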
Predicting the Programming Language of Questions and Snippets of StackOverflow Using Natural Language Processing
Title | Predicting the Programming Language of Questions and Snippets of StackOverflow Using Natural Language Processing |
Authors | Kamel Alreshedy, Dhanush Dharmaretnam, Daniel M. German, Venkatesh Srinivasan, T. Aaron Gulliver |
Abstract | Stack Overflow is the most popular Q&A website among software developers. As a platform for knowledge sharing and acquisition, the questions posted in Stack Overflow usually contain a code snippet. Stack Overflow relies on users to properly tag the programming language of a question, and it simply assumes that the programming language of the snippets inside a question is the same as the tag of the question itself. In this paper, we propose a classifier to predict the programming language of questions posted in Stack Overflow using Natural Language Processing (NLP) and Machine Learning (ML). The classifier achieves an accuracy of 91.1% in predicting the 24 most popular programming languages by combining features from the title, body and code snippets of the question. We also propose a classifier that uses only the title and body of the question and achieves an accuracy of 81.1%. Finally, we propose a classifier of code snippets only that achieves an accuracy of 77.7%. These results show that applying Machine Learning techniques to the combination of text and code snippets of a question provides the best performance. They also demonstrate that it is possible to identify the programming language of a snippet of only a few lines of source code. We visualize the feature space of two programming languages, Java and SQL, in order to identify some special properties of the information inside Stack Overflow questions corresponding to these languages. |
Tasks | |
Published | 2018-09-21 |
URL | http://arxiv.org/abs/1809.07954v1 |
PDF | http://arxiv.org/pdf/1809.07954v1.pdf |
PWC | https://paperswithcode.com/paper/predicting-the-programming-language-of |
Repo | |
Framework | |
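A minimal sketch of the kind of text classifier the abstract evaluates: TF-IDF features over the question text with a linear model. The training examples are hypothetical, and the paper's exact features and model are not reproduced.

```python
# Sketch: programming-language classification from question text via
# TF-IDF features and logistic regression (toy training data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

questions = ["How do I reverse a list? lst[::-1]",
             "Segfault when dereferencing a null pointer",
             "SELECT rows where the date is in the last week"]
labels = ["python", "c", "sql"]                     # hypothetical training data

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(questions, labels)
print(clf.predict(["How to join two tables on a key?"]))   # likely 'sql'
```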
Contrastive Multivariate Singular Spectrum Analysis
Title | Contrastive Multivariate Singular Spectrum Analysis |
Authors | Abdi-Hakin Dirie, Abubakar Abid, James Zou |
Abstract | We introduce Contrastive Multivariate Singular Spectrum Analysis, a novel unsupervised method for dimensionality reduction and signal decomposition of time series data. By utilizing an appropriate background dataset, the method transforms a target time series dataset in a way that evinces the sub-signals that are enhanced in the target dataset, as opposed to only those that account for the greatest variance. This shifts the goal from finding signals that explain the most variance to signals that matter the most to the analyst. We demonstrate our method on an illustrative synthetic example, as well as show the utility of our method in the downstream clustering of electrocardiogram signals from the public MHEALTH dataset. |
Tasks | Dimensionality Reduction, Time Series |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1810.13317v1 |
PDF | http://arxiv.org/pdf/1810.13317v1.pdf |
PWC | https://paperswithcode.com/paper/contrastive-multivariate-singular-spectrum |
Repo | |
Framework | |
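A sketch of the contrastive idea applied to lag-embedded (Hankel) covariances, in the spirit of contrastive PCA: directions come from the eigenvectors of $C_{\text{target}} - \alpha C_{\text{background}}$, so sub-signals enhanced in the target stand out. The paper's full multivariate SSA machinery is not reproduced, and the signals below are synthetic.

```python
# Sketch: contrastive directions from lag-embedded covariance matrices.
import numpy as np

def lag_cov(x: np.ndarray, L: int) -> np.ndarray:
    """Covariance of the L-lag (Hankel) embedding of a 1-D series."""
    emb = np.stack([x[i:len(x) - L + i + 1] for i in range(L)], axis=1)
    emb -= emb.mean(0)
    return emb.T @ emb / len(emb)

t = np.linspace(0, 20, 2000)
background = np.sin(2 * t) + 0.3 * np.random.randn(t.size)
target = np.sin(2 * t) + 0.5 * np.sin(9 * t) + 0.3 * np.random.randn(t.size)

L, alpha = 40, 1.0
vals, vecs = np.linalg.eigh(lag_cov(target, L) - alpha * lag_cov(background, L))
top = vecs[:, -1]   # filter emphasizing the component unique to the target
```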
Efficient Online Scalar Annotation with Bounded Support
Title | Efficient Online Scalar Annotation with Bounded Support |
Authors | Keisuke Sakaguchi, Benjamin Van Durme |
Abstract | We describe a novel method for efficiently eliciting scalar annotations for dataset construction and system quality estimation by human judgments. We contrast direct assessment (annotators assign scores to items directly), online pairwise ranking aggregation (scores derive from annotator comparison of items), and a hybrid approach (EASL: Efficient Annotation of Scalar Labels) proposed here. Our proposal leads to increased correlation with ground truth, at far greater annotator efficiency, suggesting this strategy as an improved mechanism for dataset creation and manual system evaluation. |
Tasks | |
Published | 2018-06-04 |
URL | http://arxiv.org/abs/1806.01170v1 |
PDF | http://arxiv.org/pdf/1806.01170v1.pdf |
PWC | https://paperswithcode.com/paper/efficient-online-scalar-annotation-with |
Repo | |
Framework | |
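For contrast with EASL, here is the simplest form of the online pairwise ranking aggregation baseline the paper compares against, as a generic Elo-style update; EASL's bounded-support scalar model is a different method and is not shown.

```python
# Sketch: online pairwise ranking aggregation via a generic Elo-style update.
def elo_update(ra: float, rb: float, a_wins: bool, k: float = 16.0):
    """One online rating update from a single pairwise judgment."""
    expect_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    return ra + k * (score_a - expect_a), rb - k * (score_a - expect_a)

ratings = {"item1": 1500.0, "item2": 1500.0}
ratings["item1"], ratings["item2"] = elo_update(ratings["item1"],
                                                ratings["item2"], a_wins=True)
print(ratings)   # item1 rises, item2 falls after one comparison
```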