Paper Group ANR 68
Correlated and Individual Multi-Modal Deep Learning for RGB-D Object Recognition
Title | Correlated and Individual Multi-Modal Deep Learning for RGB-D Object Recognition |
Authors | Ziyan Wang, Jiwen Lu, Ruogu Lin, Jianjiang Feng, Jie Zhou |
Abstract | In this paper, we propose a new correlated and individual multi-modal deep learning (CIMDL) method for RGB-D object recognition. Unlike most conventional RGB-D object recognition methods which extract features from the RGB and depth channels individually, our CIMDL jointly learns feature representations from raw RGB-D data with a pair of deep neural networks, so that the sharable and modal-specific information can be simultaneously exploited. Specifically, we construct a pair of deep convolutional neural networks (CNNs) for the RGB and depth data, and concatenate them at the top layer of the network with a loss function which learns a new feature space where both the correlated part and the individual part of the RGB-D information are well modelled. The parameters of the whole network are updated using the back-propagation criterion. Experimental results on two widely used RGB-D object image benchmark datasets clearly show that our method outperforms state-of-the-art methods. |
Tasks | Object Recognition |
Published | 2016-04-06 |
URL | http://arxiv.org/abs/1604.01655v3 |
http://arxiv.org/pdf/1604.01655v3.pdf | |
PWC | https://paperswithcode.com/paper/correlated-and-individual-multi-modal-deep |
Repo | |
Framework | |
DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild
Title | DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild |
Authors | Rıza Alp Güler, George Trigeorgis, Epameinondas Antonakos, Patrick Snape, Stefanos Zafeiriou, Iasonas Kokkinos |
Abstract | In this paper we propose to learn a mapping from image pixels into a dense template grid through a fully convolutional network. We formulate this task as a regression problem and train our network by leveraging upon manually annotated facial landmarks “in-the-wild”. We use such landmarks to establish a dense correspondence field between a three-dimensional object template and the input image, which then serves as the ground-truth for training our regression system. We show that we can combine ideas from semantic segmentation with regression networks, yielding a highly-accurate “quantized regression” architecture. Our system, called DenseReg, allows us to estimate dense image-to-template correspondences in a fully convolutional manner. As such our network can provide useful correspondence information as a stand-alone system, while when used as an initialization for Statistical Deformable Models we obtain landmark localization results that largely outperform the current state-of-the-art on the challenging 300W benchmark. We thoroughly evaluate our method on a host of facial analysis tasks and also provide qualitative results for dense human body correspondence. We make our code available at http://alpguler.com/DenseReg.html along with supplementary materials. |
Tasks | Semantic Segmentation |
Published | 2016-12-04 |
URL | http://arxiv.org/abs/1612.01202v2 |
http://arxiv.org/pdf/1612.01202v2.pdf | |
PWC | https://paperswithcode.com/paper/densereg-fully-convolutional-dense-shape-1 |
Repo | |
Framework | |
Machine Learning Approach for Skill Evaluation in Robotic-Assisted Surgery
Title | Machine Learning Approach for Skill Evaluation in Robotic-Assisted Surgery |
Authors | Mahtab J. Fard, Sattar Ameri, Ratna B. Chinnam, Abhilash K. Pandya, Michael D. Klein, R. Darin Ellis |
Abstract | Evaluating surgeon skill has predominantly been a subjective task. The development of objective methods for surgical skill assessment is of increasing interest. Recently, with technological advances such as robotic-assisted minimally invasive surgery (RMIS), new opportunities for objective and automated assessment frameworks have arisen. In this paper, we applied machine learning methods to automatically evaluate the performance of the surgeon in RMIS. Six important movement features were used in the evaluation: completion time, path length, depth perception, speed, smoothness and curvature. Different classification methods were applied to discriminate expert and novice surgeons. We test our method on real surgical data for a suturing task and compare the classification result with the ground truth data (obtained by manual labeling). The experimental results show that the proposed framework can classify surgical skill level with a relatively high accuracy of 85.7%. This study demonstrates the ability of machine learning methods to automatically classify expert and novice surgeons using movement features for different RMIS tasks. Due to the simplicity and generalizability of the introduced classification method, it is easy to implement in existing trainers. |
Tasks | |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05136v1 |
http://arxiv.org/pdf/1611.05136v1.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-approach-for-skill |
Repo | |
Framework | |
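The feature-then-classify pipeline the abstract describes is straightforward to sketch. The snippet below is an illustrative toy, not the authors' implementation: it computes three of the six movement features named in the abstract (completion time, path length, mean speed) from a tool-tip trajectory, and uses a simple nearest-centroid rule in place of the paper's classifiers; all names and numbers are made up for the example.

```python
import math

def movement_features(traj, dt=1.0):
    """Three of the movement features named in the abstract, computed
    from a 2-D tool trajectory sampled every dt seconds."""
    n = len(traj)
    completion_time = (n - 1) * dt
    path_length = sum(math.dist(traj[i], traj[i + 1]) for i in range(n - 1))
    mean_speed = path_length / completion_time if completion_time else 0.0
    return [completion_time, path_length, mean_speed]

def nearest_centroid(train, labels, x):
    """Toy expert-vs-novice classifier: assign x the label of the closest
    class centroid in feature space (a stand-in for the paper's methods)."""
    cents = {}
    for lab in set(labels):
        rows = [f for f, l in zip(train, labels) if l == lab]
        cents[lab] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(cents, key=lambda lab: math.dist(cents[lab], x))
```

In this toy setup, novice trajectories would show up as longer paths and times for the same task, which is exactly the separation the centroids exploit.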
Generalizing diffuse interface methods on graphs: non-smooth potentials and hypergraphs
Title | Generalizing diffuse interface methods on graphs: non-smooth potentials and hypergraphs |
Authors | Jessica Bosch, Steffen Klamt, Martin Stoll |
Abstract | Diffuse interface methods have recently been introduced for the task of semi-supervised learning. The underlying model is well-known in materials science but was extended to graphs using a Ginzburg–Landau functional and the graph Laplacian. Here we generalize the previously proposed model using a non-smooth potential function. Additionally, we show that the diffuse interface method can be used for the segmentation of data coming from hypergraphs. For this we show that the graph Laplacian in almost all cases is derived from hypergraph information. Additionally, we show that the formerly introduced hypergraph Laplacian coming from a relaxed optimization problem is well suited for use within the diffuse interface method. We present computational experiments for graph and hypergraph Laplacians. |
Tasks | |
Published | 2016-11-18 |
URL | http://arxiv.org/abs/1611.06094v1 |
http://arxiv.org/pdf/1611.06094v1.pdf | |
PWC | https://paperswithcode.com/paper/generalizing-diffuse-interface-methods-on |
Repo | |
Framework | |
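For orientation, the baseline model the paper generalizes can be sketched in a few lines: graph Allen-Cahn dynamics with the *smooth* double-well potential W(u) = (u² − 1)²/4 and a fidelity term on labeled nodes. Note this shows only the standard scheme; the paper's contributions (the non-smooth potential and the hypergraph Laplacian) are not reproduced here, and the step sizes are illustrative.

```python
def graph_allen_cahn(adj, seeds, eps=1.0, dt=0.1, fid=5.0, steps=200):
    """Explicit-Euler sketch of the graph Ginzburg-Landau/Allen-Cahn flow
      u' = -eps*L u - W'(u)/eps - fid*(u - label) on labeled nodes,
    with W'(u) = u**3 - u (smooth double well).  adj is a dense symmetric
    adjacency matrix; seeds maps labeled node index -> +/-1 label."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    u = [seeds.get(i, 0.0) for i in range(n)]
    for _ in range(steps):
        # graph Laplacian applied to u: (L u)_i = deg_i*u_i - sum_j A_ij*u_j
        Lu = [deg[i] * u[i] - sum(adj[i][j] * u[j] for j in range(n))
              for i in range(n)]
        u = [u[i] - dt * (eps * Lu[i] + (u[i] ** 3 - u[i]) / eps
                          + (fid * (u[i] - seeds[i]) if i in seeds else 0.0))
             for i in range(n)]
    return [1 if v >= 0 else -1 for v in u]  # sign gives the segmentation
```

On a graph of two cliques joined by one weak edge, seeding one node per clique is enough for the phase field to settle near ±1 on each cluster.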
Learning Syntactic Program Transformations from Examples
Title | Learning Syntactic Program Transformations from Examples |
Authors | Reudismam Rolim, Gustavo Soares, Loris D’Antoni, Oleksandr Polozov, Sumit Gulwani, Rohit Gheyi, Ryo Suzuki, Bjoern Hartmann |
Abstract | IDEs, such as Visual Studio, automate common transformations, such as Rename and Extract Method refactorings. However, extending these catalogs of transformations is complex and time-consuming. A similar phenomenon appears in intelligent tutoring systems where instructors have to write cumbersome code transformations that describe “common faults” to fix similar student submissions to programming assignments. We present REFAZER, a technique for automatically generating program transformations. REFAZER builds on the observation that code edits performed by developers can be used as examples for learning transformations. Example edits may share the same structure but involve different variables and subexpressions, which must be generalized in a transformation at the right level of abstraction. To learn transformations, REFAZER leverages state-of-the-art programming-by-example methodology using the following key components: (a) a novel domain-specific language (DSL) for describing program transformations, (b) domain-specific deductive algorithms for synthesizing transformations in the DSL, and (c) functions for ranking the synthesized transformations. We instantiate and evaluate REFAZER in two domains. First, given examples of edits used by students to fix incorrect programming assignment submissions, we learn transformations that can fix other students’ submissions with similar faults. In our evaluation conducted on 4 programming tasks performed by 720 students, our technique helped to fix incorrect submissions for 87% of the students. In the second domain, we use repetitive edits applied by developers to the same project to synthesize a program transformation that applies these edits to other locations in the code. In our evaluation conducted on 59 scenarios of repetitive edits taken from 3 C# open-source projects, REFAZER learns the intended program transformation in 83% of the cases. |
Tasks | |
Published | 2016-08-31 |
URL | http://arxiv.org/abs/1608.09000v1 |
http://arxiv.org/pdf/1608.09000v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-syntactic-program-transformations |
Repo | |
Framework | |
Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition and Detection
Title | Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition and Detection |
Authors | Guillermo Garcia-Hernando, Tae-Kyun Kim |
Abstract | A human action can be seen as transitions between one’s body poses over time, where the transition depicts a temporal relation between two poses. Recognizing actions thus involves learning a classifier sensitive to these pose transitions as well as to static poses. In this paper, we introduce a novel method called transition forests, an ensemble of decision trees that learn to discriminate both static poses and transitions between pairs of independent frames. During training, node splitting is driven by alternating two criteria: the standard classification objective that maximizes the discrimination power in individual frames, and the proposed one in pairwise frame transitions. Growing the trees tends to group frames that have similar associated transitions and share the same action label, incorporating temporal information that was not available otherwise. Unlike conventional decision trees where the best split in a node is determined independently of other nodes, the transition forests try to find the best split of nodes jointly (within a layer) for incorporating distant node transitions. When inferring the class label of a new frame, it is passed down the trees and the prediction is made based on previous frame predictions and the current one in an efficient and online manner. We apply our method on varied skeleton action recognition and online detection datasets showing its suitability over several baselines and state-of-the-art approaches. |
Tasks | Action Detection, Spatio-Temporal Action Localization, Temporal Action Localization |
Published | 2016-07-10 |
URL | http://arxiv.org/abs/1607.02737v3 |
http://arxiv.org/pdf/1607.02737v3.pdf | |
PWC | https://paperswithcode.com/paper/transition-forests-learning-discriminative |
Repo | |
Framework | |
Mental Lexicon Growth Modelling Reveals the Multiplexity of the English Language
Title | Mental Lexicon Growth Modelling Reveals the Multiplexity of the English Language |
Authors | Massimo Stella, Markus Brede |
Abstract | In this work we extend previous analyses of linguistic networks by adopting a multi-layer network framework for modelling the human mental lexicon, i.e. an abstract mental repository where words and concepts are stored together with their linguistic patterns. Across a three-layer linguistic multiplex, we model English words as nodes and connect them according to (i) phonological similarities, (ii) synonym relationships and (iii) free word associations. Our main aim is to exploit this multi-layered structure to explore the influence of phonological and semantic relationships on lexicon assembly over time. We propose a model of lexicon growth which is driven by the phonological layer: words are suggested according to different orderings of insertion (e.g. shorter word length, highest frequency, semantic multiplex features) and accepted or rejected subject to constraints. We then measure times of network assembly and compare these to empirical data about the age of acquisition of words. In agreement with empirical studies in psycholinguistics, our results provide quantitative evidence for the hypothesis that word acquisition is driven by features at multiple levels of organisation within language. |
Tasks | |
Published | 2016-04-05 |
URL | http://arxiv.org/abs/1604.01243v1 |
http://arxiv.org/pdf/1604.01243v1.pdf | |
PWC | https://paperswithcode.com/paper/mental-lexicon-growth-modelling-reveals-the |
Repo | |
Framework | |
An Optimal Treatment Assignment Strategy to Evaluate Demand Response Effect
Title | An Optimal Treatment Assignment Strategy to Evaluate Demand Response Effect |
Authors | Pan Li, Baosen Zhang |
Abstract | Demand response is designed to motivate electricity customers to modify their loads at critical time periods. The accurate estimation of the impact of demand response signals on customers’ consumption is central to any successful program. In practice, learning these responses is nontrivial because operators can only send a limited number of signals. In addition, customer behavior also depends on a large number of exogenous covariates. These two features lead to a high-dimensional inference problem with a limited number of observations. In this paper, we formulate this problem by using a multivariate linear model and adopt an experimental design approach to estimate the impact of demand response signals. We show that randomized assignment, which is widely used to estimate the average treatment effect, is not efficient in reducing the variance of the estimator when a large number of covariates is present. In contrast, we present a tractable algorithm that strategically assigns demand response signals to customers. This algorithm achieves the optimal reduction in estimation variance, independent of the number of covariates. The results are validated from simulations on synthetic data. |
Tasks | |
Published | 2016-10-02 |
URL | http://arxiv.org/abs/1610.00362v2 |
http://arxiv.org/pdf/1610.00362v2.pdf | |
PWC | https://paperswithcode.com/paper/an-optimal-treatment-assignment-strategy-to |
Repo | |
Framework | |
A Bayesian Model of Multilingual Unsupervised Semantic Role Induction
Title | A Bayesian Model of Multilingual Unsupervised Semantic Role Induction |
Authors | Nikhil Garg, James Henderson |
Abstract | We propose a Bayesian model of unsupervised semantic role induction in multiple languages, and use it to explore the usefulness of parallel corpora for this task. Our joint Bayesian model consists of individual models for each language plus additional latent variables that capture alignments between roles across languages. Because it is a generative Bayesian model, we can do evaluations in a variety of scenarios just by varying the inference procedure, without changing the model, thereby comparing the scenarios directly. We compare using only monolingual data, using a parallel corpus, using a parallel corpus with annotations in the other language, and using small amounts of annotation in the target language. We find that the biggest impact of adding a parallel corpus to training is actually the increase in monolingual data, with the alignments to another language resulting in small improvements, even with labeled data for the other language. |
Tasks | |
Published | 2016-03-04 |
URL | http://arxiv.org/abs/1603.01514v1 |
http://arxiv.org/pdf/1603.01514v1.pdf | |
PWC | https://paperswithcode.com/paper/a-bayesian-model-of-multilingual-unsupervised |
Repo | |
Framework | |
On the Nyström and Column-Sampling Methods for the Approximate Principal Components Analysis of Large Data Sets
Title | On the Nyström and Column-Sampling Methods for the Approximate Principal Components Analysis of Large Data Sets |
Authors | Darren Homrighausen, Daniel J. McDonald |
Abstract | In this paper we analyze approximate methods for undertaking a principal components analysis (PCA) on large data sets. PCA is a classical dimension reduction method that involves the projection of the data onto the subspace spanned by the leading eigenvectors of the covariance matrix. This projection can be used either for exploratory purposes or as an input for further analysis, e.g. regression. If the data have billions of entries or more, the computational and storage requirements for saving and manipulating the design matrix in fast memory are prohibitive. Recently, the Nyström and column-sampling methods have appeared in the numerical linear algebra community for the randomized approximation of the singular value decomposition of large matrices. However, their utility for statistical applications remains unclear. We compare these approximations theoretically by bounding the distance between the induced subspaces and the desired, but computationally infeasible, PCA subspace. Additionally we show empirically, through simulations and a real data example involving a corpus of emails, the trade-off of approximation accuracy and computational complexity. |
Tasks | Dimensionality Reduction |
Published | 2016-02-02 |
URL | http://arxiv.org/abs/1602.01120v1 |
http://arxiv.org/pdf/1602.01120v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-nystrom-and-column-sampling-methods |
Repo | |
Framework | |
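The Nyström idea the abstract refers to can be shown on a small example: approximate a Gram/covariance matrix as G ≈ C W⁻¹ Cᵀ from a few sampled columns C (with W the sampled-row/column intersection), then take the leading eigenvector of the approximation instead of that of the full matrix. This is a minimal sketch, not the paper's analysis: it handles only two sampled columns so W can be inverted with the 2×2 formula, whereas a real implementation would use a pseudo-inverse.

```python
def power_iter(mat, iters=200):
    """Leading eigenvector of a small symmetric matrix by power iteration."""
    n = len(mat)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def nystrom(G, idx):
    """Nyström approximation G ~ C W^{-1} C^T from two sampled columns."""
    i0, i1 = idx
    n = len(G)
    C = [[G[r][i0], G[r][i1]] for r in range(n)]
    a, b, c, d = G[i0][i0], G[i0][i1], G[i1][i0], G[i1][i1]
    det = a * d - b * c                      # W must be invertible
    Winv = [[d / det, -b / det], [-c / det, a / det]]
    CW = [[C[r][0] * Winv[0][l] + C[r][1] * Winv[1][l] for l in range(2)]
          for r in range(n)]
    return [[CW[r][0] * C[s][0] + CW[r][1] * C[s][1] for s in range(n)]
            for r in range(n)]
```

When G is (numerically) low rank and the sampled columns span its range, the Nyström reconstruction is exact, so the approximate leading principal direction lines up with the true one.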
Multi30K: Multilingual English-German Image Descriptions
Title | Multi30K: Multilingual English-German Image Descriptions |
Authors | Desmond Elliott, Stella Frank, Khalil Sima’an, Lucia Specia |
Abstract | We introduce the Multi30K dataset to stimulate multilingual multimodal research. Recent advances in image description have been demonstrated on English-language datasets almost exclusively, but image description should not be limited to English. This dataset extends the Flickr30K dataset with i) German translations created by professional translators over a subset of the English descriptions, and ii) descriptions crowdsourced independently of the original English descriptions. We outline how the data can be used for multilingual image description and multimodal machine translation, but we anticipate the data will be useful for a broader range of tasks. |
Tasks | Machine Translation, Multimodal Machine Translation |
Published | 2016-05-02 |
URL | http://arxiv.org/abs/1605.00459v1 |
http://arxiv.org/pdf/1605.00459v1.pdf | |
PWC | https://paperswithcode.com/paper/multi30k-multilingual-english-german-image |
Repo | |
Framework | |
Deep counter networks for asynchronous event-based processing
Title | Deep counter networks for asynchronous event-based processing |
Authors | Jonathan Binas, Giacomo Indiveri, Michael Pfeiffer |
Abstract | Despite their advantages in terms of computational resources, latency, and power consumption, event-based implementations of neural networks have not been able to achieve the same performance figures as their equivalent state-of-the-art deep network models. We propose counter neurons as minimal spiking neuron models which only require addition and comparison operations, thus avoiding costly multiplications. We show how inference carried out in deep counter networks converges to the same accuracy levels as are achieved with state-of-the-art conventional networks. As their event-based style of computation leads to reduced latency and sparse updates, counter networks are ideally suited for efficient compact and low-power hardware implementation. We present theory and training methods for counter networks, and demonstrate on the MNIST benchmark that counter networks converge quickly, both in terms of time and number of operations required, to state-of-the-art classification accuracy. |
Tasks | |
Published | 2016-11-02 |
URL | http://arxiv.org/abs/1611.00710v1 |
http://arxiv.org/pdf/1611.00710v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-counter-networks-for-asynchronous-event |
Repo | |
Framework | |
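The "addition and comparison only" claim is easy to illustrate. Below is a hypothetical single layer of counter neurons, not the authors' exact model: because inputs are binary events, each event just adds a weight column to integer counters (no multiplications), and a neuron emits an output event whenever its counter crosses threshold, after which the threshold is subtracted.

```python
def counter_layer(weights, thresholds, events):
    """Sketch of a layer of counter neurons.
    weights[i][j]: contribution of input i to neuron j (integers);
    thresholds[j]: firing threshold of neuron j;
    events: sequence of input indices, one spiking input per time step.
    Returns (time, neuron) pairs for emitted output events."""
    n_out = len(thresholds)
    counters = [0] * n_out
    out_events = []
    for t, i in enumerate(events):
        for j in range(n_out):
            counters[j] += weights[i][j]        # addition only
            if counters[j] >= thresholds[j]:    # comparison only
                counters[j] -= thresholds[j]
                out_events.append((t, j))
    return out_events
```

Stacking such layers (feeding output events into the next layer's event stream) gives the deep, sparsely updated networks the abstract describes.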
Compact-Table: Efficiently Filtering Table Constraints with Reversible Sparse Bit-Sets
Title | Compact-Table: Efficiently Filtering Table Constraints with Reversible Sparse Bit-Sets |
Authors | Jordan Demeulenaere, Renaud Hartert, Christophe Lecoutre, Guillaume Perez, Laurent Perron, Jean-Charles Régin, Pierre Schaus |
Abstract | In this paper, we describe Compact-Table (CT), a bitwise algorithm to enforce Generalized Arc Consistency (GAC) on table constraints. Although this algorithm is the default propagator for table constraints in or-tools and OscaR, two publicly available CP solvers, it has never been described so far. Importantly, CT has been recently improved further with the introduction of residues, resetting operations and a data structure called a reversible sparse bit-set, used to maintain tables of supports (following the idea of tabular reduction): tuples are invalidated incrementally on value removals by means of bit-set operations. The experimentation that we have conducted with OscaR shows that CT outperforms state-of-the-art algorithms STR2, STR3, GAC4R, MDD4R and AC5-TC on standard benchmarks. |
Tasks | |
Published | 2016-04-22 |
URL | http://arxiv.org/abs/1604.06641v1 |
http://arxiv.org/pdf/1604.06641v1.pdf | |
PWC | https://paperswithcode.com/paper/compact-table-efficiently-filtering-table |
Repo | |
Framework | |
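The core bit-set idea behind CT can be sketched compactly using Python's arbitrary-precision integers as bit-sets. This is a toy illustration of the support-maintenance step only: it omits residues, resetting operations, and the reversibility machinery (trailing) that make the real data structure efficient inside a solver.

```python
class CompactTableSketch:
    """Toy sketch of CT's support maintenance.
    Bit t of `valid` marks tuple t of the table as still valid;
    supports[(var, val)] marks the tuples in which var takes val.
    Removing a value invalidates its tuples with one AND-NOT, and a value
    is GAC-supported iff its mask still intersects `valid`."""

    def __init__(self, table):
        self.valid = (1 << len(table)) - 1          # all tuples valid
        self.supports = {}
        for t, tup in enumerate(table):
            for var, val in enumerate(tup):
                self.supports[(var, val)] = (
                    self.supports.get((var, val), 0) | (1 << t)
                )

    def remove_value(self, var, val):
        # invalidate every tuple where var = val (single bitwise operation)
        self.valid &= ~self.supports.get((var, val), 0)

    def has_support(self, var, val):
        return self.supports.get((var, val), 0) & self.valid != 0
```

For example, on the table {(0,0), (0,1), (1,0)} over variables x0, x1, removing value 0 from x0 leaves only tuple (1,0) valid, so value 1 for x1 loses its support.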
Learning when to trust distant supervision: An application to low-resource POS tagging using cross-lingual projection
Title | Learning when to trust distant supervision: An application to low-resource POS tagging using cross-lingual projection |
Authors | Meng Fang, Trevor Cohn |
Abstract | Cross-lingual projection of linguistic annotation suffers from many sources of bias and noise, leading to unreliable annotations that cannot be used directly. In this paper, we introduce a novel approach to sequence tagging that learns to correct the errors from cross-lingual projection using an explicit debiasing layer. This is framed as joint learning over two corpora, one tagged with gold standard and the other with projected tags. We evaluated with only 1,000 tokens tagged with gold standard tags, along with more plentiful parallel data. Our system equals or exceeds the state-of-the-art on eight simulated low-resource settings, as well as two real low-resource languages, Malagasy and Kinyarwanda. |
Tasks | |
Published | 2016-07-05 |
URL | http://arxiv.org/abs/1607.01133v1 |
http://arxiv.org/pdf/1607.01133v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-when-to-trust-distant-supervision-an |
Repo | |
Framework | |
Graph-based semi-supervised learning for relational networks
Title | Graph-based semi-supervised learning for relational networks |
Authors | Leto Peel |
Abstract | We address the problem of semi-supervised learning in relational networks, networks in which nodes are entities and links are the relationships or interactions between them. Typically this problem is confounded with the problem of graph-based semi-supervised learning (GSSL), because both problems represent the data as a graph and predict the missing class labels of nodes. However, not all graphs are created equally. In GSSL a graph is constructed, often from independent data, based on similarity. As such, edges tend to connect instances with the same class label. Relational networks, however, can be more heterogeneous and edges do not always indicate similarity. For instance, instead of links being more likely to connect nodes with the same class label, they may occur more frequently between nodes with different class labels (link-heterogeneity). Or nodes with the same class label do not necessarily have the same type of connectivity across the whole network (class-heterogeneity), e.g. in a network of sexual interactions we may observe links between opposite genders in some parts of the graph and links between the same genders in others. Performing classification in networks with different types of heterogeneity is a hard problem that is made harder still when we do not know a priori the type or level of heterogeneity. Here we present two scalable approaches for graph-based semi-supervised learning for the more general case of relational networks. We demonstrate these approaches on synthetic and real-world networks that display different link patterns within and between classes. Compared to state-of-the-art approaches, ours give better classification performance without prior knowledge of how classes interact. In particular, our two-step label propagation algorithm gives consistently good accuracy and runs on networks of over 1.6 million nodes and 30 million edges in around 12 seconds. |
Tasks | |
Published | 2016-12-15 |
URL | http://arxiv.org/abs/1612.05001v1 |
http://arxiv.org/pdf/1612.05001v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-based-semi-supervised-learning-for |
Repo | |
Framework | |
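For contrast with the paper's contribution, here is the plain homophily-assuming GSSL baseline the abstract argues against: iterative label propagation in which each unlabeled node averages its neighbours' scores while seed nodes stay clamped. This is a generic baseline sketch, not the paper's two-step algorithm, and it works only when links connect same-class nodes (exactly the assumption the paper relaxes).

```python
def label_propagation(adj, seeds, iters=50):
    """Homophily-based label propagation on a dense adjacency matrix.
    seeds maps labeled node index -> +/-1; unlabeled nodes repeatedly
    take the mean score of their neighbours (the harmonic solution)."""
    n = len(adj)
    score = [seeds.get(i, 0.0) for i in range(n)]
    for _ in range(iters):
        new = []
        for i in range(n):
            if i in seeds:                      # clamp labeled nodes
                new.append(seeds[i])
                continue
            nbrs = [j for j in range(n) if adj[i][j]]
            new.append(sum(score[j] for j in nbrs) / len(nbrs) if nbrs else 0.0)
        score = new
    return [1 if s >= 0 else -1 for s in score]  # sign = predicted class
```

On a network with link-heterogeneity (edges mostly *between* classes), this baseline propagates the wrong labels, which is why the paper's methods estimate how classes interact instead of assuming similarity.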