October 18, 2019

Paper Group ANR 477

Learning-based Regularization for Cardiac Strain Analysis with Ability for Domain Adaptation

Title Learning-based Regularization for Cardiac Strain Analysis with Ability for Domain Adaptation
Authors Allen Lu, Nripesh Parajuli, Maria Zontak, John Stendahl, Kevinminh Ta, Zhao Liu, Nabil Boutagy, Geng-Shi Jeng, Imran Alkhalil, Lawrence H. Staib, Matthew O’Donnell, Albert J. Sinusas, James S. Duncan
Abstract Reliable motion estimation and strain analysis using 3D+time echocardiography (4DE) for localization and characterization of myocardial injury is valuable for early detection and targeted interventions. However, motion estimation is difficult due to the low SNR that stems from the inherent image properties of 4DE, and intelligent regularization is critical for producing reliable motion estimates. In this work, we incorporated the notion of domain adaptation into a supervised neural network regularization framework. We first propose an unsupervised autoencoder network with biomechanical constraints for learning a latent representation that is shown to have more physiologically plausible displacements. We extended this framework to include a supervised loss term on synthetic data and showed the effects of biomechanical constraints on the network’s ability for domain adaptation. We validated both the autoencoder and semi-supervised regularization method on in vivo data with implanted sonomicrometers. Finally, we showed the ability of our semi-supervised learning regularization approach to identify infarcted regions using estimated regional strain maps with good agreement to manually traced infarct regions from postmortem excised hearts.
Tasks Domain Adaptation, Motion Estimation
Published 2018-07-12
URL http://arxiv.org/abs/1807.04807v1
PDF http://arxiv.org/pdf/1807.04807v1.pdf
PWC https://paperswithcode.com/paper/learning-based-regularization-for-cardiac
Repo
Framework

Object segmentation in depth maps with one user click and a synthetically trained fully convolutional network

Title Object segmentation in depth maps with one user click and a synthetically trained fully convolutional network
Authors Matthieu Grard, Romain Brégier, Florian Sella, Emmanuel Dellandréa, Liming Chen
Abstract With more and more household objects built on planned obsolescence and consumed by a fast-growing population, hazardous waste recycling has become a critical challenge. Given the large variability of household waste, current recycling platforms mostly rely on human operators to analyze the scene, typically composed of many object instances piled up in bulk. Helping them by robotizing the unitary extraction is a key challenge to speed up this tedious process. Whereas supervised deep learning has proven very efficient for such object-level scene understanding, e.g., generic object detection and segmentation in everyday scenes, it requires large sets of per-pixel labeled images, which are hardly available for numerous application contexts, including industrial robotics. We thus propose a step towards a practical interactive application for generating an object-oriented robotic grasp, requiring as inputs only one depth map of the scene and one user click on the next object to extract. More precisely, we address in this paper the middle issue of object segmentation in top views of piles of bulk objects given a pixel location, namely seed, provided interactively by a human operator. We propose a twofold framework for generating edge-driven instance segments. First, we repurpose a state-of-the-art fully convolutional object contour detector for seed-based instance segmentation by introducing the notion of edge-mask duality with a novel patch-free and contour-oriented loss function. Second, we train one model using only synthetic scenes, instead of manually labeled training data. Our experimental results show that considering edge-mask duality for training an encoder-decoder network, as we suggest, outperforms a state-of-the-art patch-based network in the present application context.
Tasks Instance Segmentation, Object Detection, Scene Understanding, Semantic Segmentation
Published 2018-01-04
URL http://arxiv.org/abs/1801.01281v2
PDF http://arxiv.org/pdf/1801.01281v2.pdf
PWC https://paperswithcode.com/paper/object-segmentation-in-depth-maps-with-one
Repo
Framework

Offline Signature Verification by Combining Graph Edit Distance and Triplet Networks

Title Offline Signature Verification by Combining Graph Edit Distance and Triplet Networks
Authors Paul Maergner, Vinaychandran Pondenkandath, Michele Alberti, Marcus Liwicki, Kaspar Riesen, Rolf Ingold, Andreas Fischer
Abstract Biometric authentication by means of handwritten signatures is a challenging pattern recognition task, which aims to infer a writer model from only a handful of genuine signatures. In order to make it more difficult for a forger to attack the verification system, a promising strategy is to combine different writer models. In this work, we propose to complement a recent structural approach to offline signature verification based on graph edit distance with a statistical approach based on metric learning with deep neural networks. On the MCYT and GPDS benchmark datasets, we demonstrate that combining the structural and statistical models leads to significant improvements in performance, profiting from their complementary properties.
Tasks Metric Learning
Published 2018-10-17
URL http://arxiv.org/abs/1810.07491v1
PDF http://arxiv.org/pdf/1810.07491v1.pdf
PWC https://paperswithcode.com/paper/offline-signature-verification-by-combining
Repo
Framework
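
The abstract above pairs a structural dissimilarity (graph edit distance) with a statistical one learned by a triplet network. As a rough illustration only, the sketch below shows a standard hinge-style triplet loss and one hypothetical way to blend two dissimilarity scores. The function names, the margin, and the weighting scheme are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on Euclidean distances between embeddings."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)  # anchor vs. genuine sample
    d_neg = np.linalg.norm(anchor - negative, axis=-1)  # anchor vs. forgery
    # Zero loss once the negative is at least `margin` farther than the positive.
    return np.maximum(0.0, d_pos - d_neg + margin)

def combined_dissimilarity(ged_score, embedding_distance, weight=0.5):
    """Hypothetical convex combination of a structural and a statistical score."""
    return weight * ged_score + (1.0 - weight) * embedding_distance
```

In practice the two scores would need to be brought to a comparable scale before being combined; how that is done here is left open.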

Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only

Title Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only
Authors Yi-Chen Chen, Chia-Hao Shen, Sung-Feng Huang, Hung-yi Lee
Abstract Automatic speech recognition (ASR) has been widely researched with supervised approaches, while many low-resourced languages lack audio-text aligned data, and supervised methods cannot be applied to them. In this work, we propose a framework to achieve unsupervised ASR on a read English speech dataset, where audio and text are unaligned. In the first stage, each word-level audio segment in the utterances is represented by a vector representation extracted by a sequence-of-sequence autoencoder, in which phonetic information and speaker information are disentangled. Secondly, semantic embeddings of audio segments are trained from the vector representations using a skip-gram model. Last but not least, an unsupervised method is utilized to transform semantic embeddings of audio segments to text embedding space, and finally the transformed embeddings are mapped to words. With the above framework, we move towards unsupervised ASR trained by unaligned text and speech only.
Tasks Speech Recognition
Published 2018-03-29
URL http://arxiv.org/abs/1803.10952v3
PDF http://arxiv.org/pdf/1803.10952v3.pdf
PWC https://paperswithcode.com/paper/towards-unsupervised-automatic-speech
Repo
Framework

Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview

Title Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
Authors Yuejie Chi, Yue M. Lu, Yuxin Chen
Abstract Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings.
Tasks Matrix Completion
Published 2018-09-25
URL https://arxiv.org/abs/1809.09573v3
PDF https://arxiv.org/pdf/1809.09573v3.pdf
PWC https://paperswithcode.com/paper/nonconvex-optimization-meets-low-rank-matrix
Repo
Framework
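
To make the idea of plain gradient descent on a nonconvex factorization objective concrete, here is a minimal sketch for the simplest case: recovering a positive semidefinite low-rank matrix M from a small random starting point. The objective f(X) = 1/4 ||X X^T - M||_F^2, the step-size rule, the problem sizes, and all names are assumptions chosen for illustration, not an algorithm taken from the overview.

```python
import numpy as np

def factorize_psd(M, rank, steps=2000, seed=0):
    """Gradient descent on f(X) = 1/4 * ||X X^T - M||_F^2 from a small random start."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    X = 0.01 * rng.standard_normal((n, rank))  # small random initialization
    step = 0.2 / np.linalg.norm(M, 2)          # step scaled by the spectral norm of M
    for _ in range(steps):
        X = X - step * (X @ X.T - M) @ X       # gradient of f is (X X^T - M) X
    return X

# Synthetic rank-2 ground truth (illustrative data only).
rng = np.random.default_rng(1)
X_true = rng.standard_normal((20, 2))
M = X_true @ X_true.T
X_hat = factorize_psd(M, rank=2)
rel_err = np.linalg.norm(X_hat @ X_hat.T - M) / np.linalg.norm(M)
```

Note that X_hat is only determined up to an orthogonal rotation, so the comparison is made on X_hat X_hat^T rather than on X_hat itself.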

Multiclass Common Spatial Pattern for EEG based Brain Computer Interface with Adaptive Learning Classifier

Title Multiclass Common Spatial Pattern for EEG based Brain Computer Interface with Adaptive Learning Classifier
Authors Hardik Meisheri, Nagraj Ramrao, Suman Mitra
Abstract In Brain Computer Interface (BCI), data generated from Electroencephalogram (EEG) is non-stationary with low signal to noise ratio and contaminated with artifacts. The Common Spatial Pattern (CSP) algorithm has been proven to be effective in BCI for extracting features in motor imagery tasks, but it is prone to overfitting. Many algorithms have been devised to regularize CSP for the two-class problem; however, they have not been effective when applied to multiclass CSP. Outliers present in the data affect the extracted CSP features and reduce the performance of the system. In addition, the non-stationarity present in the features extracted by CSP presents a challenge in classification. We propose a method to identify and remove artifacts present in the data during the pre-processing stage; this helps in calculating eigenvectors, which in turn generate better CSP features. To handle the non-stationarity, the Self-Regulated Interval Type-2 Neuro-Fuzzy Inference System (SRIT2NFIS) was proposed in the literature for the two-class EEG classification problem. This paper extends the SRIT2NFIS to multiclass using Joint Approximate Diagonalization (JAD). The results on a standard data set from BCI competition IV show a significant increase in accuracy over the current state-of-the-art methods for multiclass classification.
Tasks EEG
Published 2018-02-25
URL http://arxiv.org/abs/1802.09046v1
PDF http://arxiv.org/pdf/1802.09046v1.pdf
PWC https://paperswithcode.com/paper/multiclass-common-spatial-pattern-for-eeg
Repo
Framework
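
For readers unfamiliar with CSP, the sketch below shows the textbook two-class version, which jointly diagonalizes two class covariance matrices by whitening followed by an eigendecomposition. It is not the multiclass JAD-based extension described in the abstract, and the function name and the ordering convention are assumptions made for this example.

```python
import numpy as np

def csp_filters(cov_a, cov_b):
    """Two-class CSP: jointly diagonalize cov_a and cov_b via whitening."""
    # Whiten the composite covariance cov_a + cov_b.
    evals, evecs = np.linalg.eigh(cov_a + cov_b)
    whitener = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # Eigenvectors of the whitened class-A covariance give the final rotation.
    d, rot = np.linalg.eigh(whitener @ cov_a @ whitener.T)
    order = np.argsort(d)[::-1]  # largest class-A variance ratio first
    # Columns of the returned matrix are the spatial filters.
    return whitener.T @ rot[:, order], d[order]
```

Each returned value in d is the share of variance that the corresponding filter assigns to class A, so filters at both ends of the ordering are the most discriminative.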

Indicatements that character language models learn English morpho-syntactic units and regularities

Title Indicatements that character language models learn English morpho-syntactic units and regularities
Authors Yova Kementchedjhieva, Adam Lopez
Abstract Character language models have access to surface morphological patterns, but it is not clear whether or how they learn abstract morphological regularities. We instrument a character language model with several probes, finding that it can develop a specific unit to identify word boundaries and, by extension, morpheme boundaries, which allows it to capture linguistic properties and regularities of these units. Our language model proves surprisingly good at identifying the selectional restrictions of English derivational morphemes, a task that requires both morphological and syntactic awareness. Thus we conclude that, when morphemes overlap extensively with the words of a language, a character language model can perform morphological abstraction.
Tasks Language Modelling
Published 2018-08-31
URL http://arxiv.org/abs/1809.00066v1
PDF http://arxiv.org/pdf/1809.00066v1.pdf
PWC https://paperswithcode.com/paper/indicatements-that-character-language-models
Repo
Framework

Food recognition and recipe analysis: integrating visual content, context and external knowledge

Title Food recognition and recipe analysis: integrating visual content, context and external knowledge
Authors Luis Herranz, Weiqing Min, Shuqiang Jiang
Abstract The central role of food in our individual and social life, combined with recent technological advances, has motivated a growing interest in applications that help to better monitor dietary habits as well as the exploration and retrieval of food-related information. We review how visual content, context and external knowledge can be integrated effectively into food-oriented applications, with special focus on recipe analysis and retrieval, food recommendation, and the restaurant context as emerging directions.
Tasks Food Recognition
Published 2018-01-22
URL http://arxiv.org/abs/1801.07239v1
PDF http://arxiv.org/pdf/1801.07239v1.pdf
PWC https://paperswithcode.com/paper/food-recognition-and-recipe-analysis
Repo
Framework

Learning at the Ends: From Hand to Tool Affordances in Humanoid Robots

Title Learning at the Ends: From Hand to Tool Affordances in Humanoid Robots
Authors Giovanni Saponaro, Pedro Vicente, Atabak Dehban, Lorenzo Jamone, Alexandre Bernardino, José Santos-Victor
Abstract One of the open challenges in designing robots that operate successfully in the unpredictable human environment is how to make them able to predict what actions they can perform on objects, and what their effects will be, i.e., the ability to perceive object affordances. Since modeling all the possible world interactions is unfeasible, learning from experience is required, posing the challenge of collecting a large amount of experiences (i.e., training data). Typically, a manipulative robot operates on external objects by using its own hands (or similar end-effectors), but in some cases the use of tools may be desirable. Nevertheless, it is reasonable to assume that while a robot can collect many sensorimotor experiences using its own hands, this cannot happen for all possible human-made tools. Therefore, in this paper we investigate the developmental transition from hand to tool affordances: what sensorimotor skills that a robot has acquired with its bare hands can be employed for tool use? By employing a visual and motor imagination mechanism to represent different hand postures compactly, we propose a probabilistic model to learn hand affordances, and we show how this model can generalize to estimate the affordances of previously unseen tools, ultimately supporting planning, decision-making and tool selection tasks in humanoid robots. We present experimental results with the iCub humanoid robot, and we publicly release the collected sensorimotor data in the form of a hand posture affordances dataset.
Tasks Decision Making
Published 2018-04-09
URL http://arxiv.org/abs/1804.03022v1
PDF http://arxiv.org/pdf/1804.03022v1.pdf
PWC https://paperswithcode.com/paper/learning-at-the-ends-from-hand-to-tool
Repo
Framework

Particle Probability Hypothesis Density Filter based on Pairwise Markov Chains

Title Particle Probability Hypothesis Density Filter based on Pairwise Markov Chains
Authors Jiangyi Liu, Chunping Wang, Wei Wang
Abstract Most multi-target tracking filters assume that one target and its observation follow a Hidden Markov Chain (HMC) model, but the implicit independence assumption of HMC model is invalid in many practical applications, and a Pairwise Markov Chain (PMC) model is more universally suitable than traditional HMC model. A particle probability hypothesis density filter based on PMC model (PF-PMC-PHD) is proposed for the nonlinear multi-target tracking system. Simulation results show the effectiveness of PF-PMC-PHD filter, and that the tracking performance of PF-PMC-PHD filter is superior to the particle PHD filter based on HMC model in a scenario where we kept the local physical properties of nonlinear and Gaussian HMC models while relaxing their independence assumption.
Tasks
Published 2018-11-28
URL http://arxiv.org/abs/1811.12211v1
PDF http://arxiv.org/pdf/1811.12211v1.pdf
PWC https://paperswithcode.com/paper/particle-probability-hypothesis-density
Repo
Framework

Sample Efficient Adaptive Text-to-Speech

Title Sample Efficient Adaptive Text-to-Speech
Authors Yutian Chen, Yannis Assael, Brendan Shillingford, David Budden, Scott Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Caglar Gulcehre, Aäron van den Oord, Oriol Vinyals, Nando de Freitas
Abstract We present a meta-learning approach for adaptive text-to-speech (TTS) with few data. During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. The aim of training is not to produce a neural network with fixed weights, which is then deployed as a TTS system. Instead, the aim is to produce a network that requires few data at deployment time to rapidly adapt to new speakers. We introduce and benchmark three strategies: (i) learning the speaker embedding while keeping the WaveNet core fixed, (ii) fine-tuning the entire architecture with stochastic gradient descent, and (iii) predicting the speaker embedding with a trained neural network encoder. The experiments show that these approaches are successful at adapting the multi-speaker neural network to new speakers, obtaining state-of-the-art results in both sample naturalness and voice similarity with merely a few minutes of audio data from new speakers.
Tasks Meta-Learning
Published 2018-09-27
URL http://arxiv.org/abs/1809.10460v3
PDF http://arxiv.org/pdf/1809.10460v3.pdf
PWC https://paperswithcode.com/paper/sample-efficient-adaptive-text-to-speech
Repo
Framework

CT Image Registration in Acute Stroke Monitoring

Title CT Image Registration in Acute Stroke Monitoring
Authors Lucio Amelio, Alessia Amelio
Abstract We present a new system based on tracking the temporal evolution of stroke lesions using an image registration technique on CT exams of the patient’s brain. The system is able to compare past CT exams with the most recent one related to the stroke event in order to evaluate past lesions which are not related to stroke. Then, it can compare recent CT exams related to the current stroke for assessing the evolution of the lesion over time. A new similarity measure is also introduced for the comparison of the source and target images during image registration. It will result in a cheaper, faster and more accessible evaluation of the acute phase of the stroke, overcoming the current limitations of the systems proposed in the state of the art.
Tasks Image Registration
Published 2018-06-28
URL http://arxiv.org/abs/1806.10836v1
PDF http://arxiv.org/pdf/1806.10836v1.pdf
PWC https://paperswithcode.com/paper/ct-image-registration-in-acute-stroke
Repo
Framework

Loop corrections in spin models through density consistency

Title Loop corrections in spin models through density consistency
Authors Alfredo Braunstein, Giovanni Catania, Luca Dall’Asta
Abstract Computing marginal distributions of discrete or semidiscrete Markov random fields (MRFs) is a fundamental, generally intractable problem with a vast number of applications in virtually all fields of science. We present a new family of computational schemes to approximately calculate the marginals of discrete MRFs. This method shares some desirable properties with belief propagation, in particular, providing exact marginals on acyclic graphs, but it differs from the latter in that it includes some loop corrections; i.e., it takes into account correlations coming from all cycles in the factor graph. It is also similar to the adaptive Thouless-Anderson-Palmer method, but it differs from the latter in that the consistency is not on the first two moments of the distribution but rather on the value of its density on a subset of values. The results on finite-dimensional Ising-like models show a significant improvement with respect to the Bethe-Peierls (tree) approximation in all cases and with respect to the plaquette cluster variational method approximation in many cases. In particular, for the critical inverse temperature $\beta_{c}$ of the homogeneous hypercubic lattice, the expansion of $\left(d\beta_{c}\right)^{-1}$ around $d=\infty$ of the proposed scheme is exact up to the $d^{-4}$ order, whereas the two latter are exact only up to the $d^{-2}$ order.
Tasks
Published 2018-10-24
URL https://arxiv.org/abs/1810.10602v4
PDF https://arxiv.org/pdf/1810.10602v4.pdf
PWC https://paperswithcode.com/paper/loop-corrections-in-spin-models-through
Repo
Framework
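
The abstract above relies on the fact that belief propagation yields exact marginals on acyclic graphs. The sketch below checks this on the smallest interesting case, an open Ising chain, by comparing sum-product messages against brute-force enumeration. It illustrates ordinary belief propagation only, not the density-consistency scheme of the paper; the energy convention E(s) = -sum_i J_i s_i s_{i+1} - sum_i h_i s_i and all names are assumptions made for this example.

```python
import itertools
import numpy as np

def chain_marginals_bp(J, h, beta=1.0):
    """Sum-product on an open Ising chain; returns P(s_i = +1) for each spin."""
    n = len(h)
    spins = np.array([-1.0, 1.0])
    fwd = np.ones((n, 2))  # message arriving at spin i from the left
    bwd = np.ones((n, 2))  # message arriving at spin i from the right
    for i in range(1, n):
        local = fwd[i - 1] * np.exp(beta * h[i - 1] * spins)
        pair = np.exp(beta * J[i - 1] * np.outer(spins, spins))
        fwd[i] = local @ pair          # sum over the left neighbour's state
        fwd[i] /= fwd[i].sum()
    for i in range(n - 2, -1, -1):
        local = bwd[i + 1] * np.exp(beta * h[i + 1] * spins)
        pair = np.exp(beta * J[i] * np.outer(spins, spins))
        bwd[i] = pair @ local          # sum over the right neighbour's state
        bwd[i] /= bwd[i].sum()
    belief = fwd * bwd * np.exp(beta * np.outer(h, spins))
    return belief[:, 1] / belief.sum(axis=1)

def chain_marginals_exact(J, h, beta=1.0):
    """Brute-force enumeration of all 2^n configurations, for comparison."""
    n = len(h)
    weights = np.zeros(n)
    Z = 0.0
    for s in itertools.product([-1.0, 1.0], repeat=n):
        s = np.array(s)
        energy = -(np.dot(J, s[:-1] * s[1:]) + np.dot(h, s))
        w = np.exp(-beta * energy)
        Z += w
        weights += w * (s == 1.0)
    return weights / Z
```

On a graph with cycles the two functions would generally disagree, which is the gap that loop-correction schemes aim to narrow.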

Game-Theoretic Interpretability for Temporal Modeling

Title Game-Theoretic Interpretability for Temporal Modeling
Authors Guang-He Lee, David Alvarez-Melis, Tommi S. Jaakkola
Abstract Interpretability has arisen as a key desideratum of machine learning models alongside performance. Approaches so far have been primarily concerned with fixed dimensional inputs emphasizing feature relevance or selection. In contrast, we focus on temporal modeling and the problem of tailoring the predictor, functionally, towards an interpretable family. To this end, we propose a co-operative game between the predictor and an explainer without any a priori restrictions on the functional class of the predictor. The goal of the explainer is to highlight, locally, how well the predictor conforms to the chosen interpretable family of temporal models. Our co-operative game is set up asymmetrically in terms of information sets for efficiency reasons. We develop and illustrate the framework in the context of temporal sequence models with examples.
Tasks
Published 2018-06-30
URL http://arxiv.org/abs/1807.00130v1
PDF http://arxiv.org/pdf/1807.00130v1.pdf
PWC https://paperswithcode.com/paper/game-theoretic-interpretability-for-temporal
Repo
Framework

Modularity Matters: Learning Invariant Relational Reasoning Tasks

Title Modularity Matters: Learning Invariant Relational Reasoning Tasks
Authors Jason Jo, Vikas Verma, Yoshua Bengio
Abstract We focus on two supervised visual reasoning tasks whose labels encode a semantic relational rule between two or more objects in an image: the MNIST Parity task and the colorized Pentomino task. The objects in the images undergo random translation, scaling, rotation and coloring transformations. Thus these tasks involve invariant relational reasoning. We report uneven performance of various deep CNN models on these two tasks. For the MNIST Parity task, we report that the VGG19 model soundly outperforms a family of ResNet models. Moreover, the family of ResNet models exhibits a general sensitivity to random initialization for the MNIST Parity task. For the colorized Pentomino task, both the VGG19 and ResNet models exhibit sluggish optimization and very poor test generalization, hovering around 30% test error. The CNNs we tested all learn hierarchies of fully distributed features and thus encode the distributed representation prior. We are motivated by a hypothesis from cognitive neuroscience which posits that the human visual cortex is modularized, and this allows the visual cortex to learn higher order invariances. To this end, we consider a modularized variant of the ResNet model, referred to as a Residual Mixture Network (ResMixNet), which employs a mixture-of-experts architecture to interleave distributed representations with more specialized, modular representations. We show that very shallow ResMixNets are capable of learning each of the two tasks well, attaining less than 2% and 1% test error on the MNIST Parity and the colorized Pentomino tasks respectively. Most importantly, the ResMixNet models are extremely parameter efficient: generalizing better than various non-modular CNNs that have over 10x the number of parameters. These experimental results support the hypothesis that modularity is a robust prior for learning invariant relational reasoning.
Tasks Relational Reasoning, Visual Reasoning
Published 2018-06-18
URL http://arxiv.org/abs/1806.06765v1
PDF http://arxiv.org/pdf/1806.06765v1.pdf
PWC https://paperswithcode.com/paper/modularity-matters-learning-invariant
Repo
Framework
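
The mixture-of-experts idea mentioned in the abstract above can be illustrated with a very small forward pass: a softmax gate assigns each input a weight per expert, and the output is the gate-weighted sum of the expert outputs. This is a generic sketch with linear experts, not the ResMixNet architecture; the shapes, names, and the use of linear maps are assumptions made for this example.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mixture_of_experts(x, gate_w, expert_ws):
    """Gate-weighted sum of linear experts; x has shape (batch, d_in)."""
    gates = softmax(x @ gate_w)                              # (batch, n_experts)
    outputs = np.stack([x @ w for w in expert_ws], axis=1)   # (batch, n_experts, d_out)
    mixed = np.einsum("be,beo->bo", gates, outputs)          # (batch, d_out)
    return mixed, gates
```

Because the gate weights for each input sum to one, the mixed output reduces to a single expert's output whenever all experts share the same weights.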