July 29, 2019


Paper Group ANR 87

Cross-Domain Image Retrieval with Attention Modeling

Title Cross-Domain Image Retrieval with Attention Modeling
Authors Xin Ji, Wei Wang, Meihui Zhang, Yang Yang
Abstract With the proliferation of e-commerce websites and the ubiquitousness of smart phones, cross-domain image retrieval using images taken by smart phones as queries to search products on e-commerce websites is emerging as a popular application. One challenge of this task is to locate the attention of both the query and database images. In particular, database images, e.g. of fashion products, on e-commerce websites are typically displayed with other accessories, and the images taken by users contain noisy background and large variations in orientation and lighting. Consequently, their attention is difficult to locate. In this paper, we exploit the rich tag information available on the e-commerce websites to locate the attention of database images. For query images, we use each candidate image in the database as the context to locate the query attention. Novel deep convolutional neural network architectures, namely TagYNet and CtxYNet, are proposed to learn the attention weights and then extract effective representations of the images. Experimental results on public datasets confirm that our approaches have significant improvement over the existing methods in terms of the retrieval accuracy and efficiency.
Tasks Image Retrieval
Published 2017-09-06
URL http://arxiv.org/abs/1709.01784v1
PDF http://arxiv.org/pdf/1709.01784v1.pdf
PWC https://paperswithcode.com/paper/cross-domain-image-retrieval-with-attention
Repo
Framework

Parallel transport in shape analysis: a scalable numerical scheme

Title Parallel transport in shape analysis: a scalable numerical scheme
Authors Maxime Louis, Alexandre Bône, Benjamin Charlier, Stanley Durrleman
Abstract The analysis of manifold-valued data requires efficient tools from Riemannian geometry to cope with the computational complexity at stake. This complexity arises from the always-increasing dimension of the data, and the absence of closed-form expressions to basic operations such as the Riemannian logarithm. In this paper, we adapt a generic numerical scheme recently introduced for computing parallel transport along geodesics in a Riemannian manifold to finite-dimensional manifolds of diffeomorphisms. We provide a qualitative and quantitative analysis of its behavior on high-dimensional manifolds, and investigate an application with the prediction of brain structures progression.
Tasks
Published 2017-11-23
URL http://arxiv.org/abs/1711.08725v1
PDF http://arxiv.org/pdf/1711.08725v1.pdf
PWC https://paperswithcode.com/paper/parallel-transport-in-shape-analysis-a
Repo
Framework

VAMPnets: Deep learning of molecular kinetics

Title VAMPnets: Deep learning of molecular kinetics
Authors Andreas Mardt, Luca Pasquali, Hao Wu, Frank Noé
Abstract There is an increasing demand for computing the relevant structures, equilibria and long-timescale kinetics of biomolecular processes, such as protein-drug binding, from high-throughput molecular dynamics simulations. Current methods employ transformation of simulated coordinates into structural features, dimension reduction, clustering the dimension-reduced data, and estimation of a Markov state model or related model of the interconversion rates between molecular structures. This handcrafted approach demands a substantial amount of modeling expertise, as poor decisions at any step will lead to large modeling errors. Here we employ the variational approach for Markov processes (VAMP) to develop a deep learning framework for molecular kinetics using neural networks, dubbed VAMPnets. A VAMPnet encodes the entire mapping from molecular coordinates to Markov states, thus combining the whole data processing pipeline in a single end-to-end framework. Our method performs as well as or better than state-of-the-art Markov modeling methods and provides easily interpretable few-state kinetic models.
Tasks Dimensionality Reduction
Published 2017-10-16
URL http://arxiv.org/abs/1710.06012v2
PDF http://arxiv.org/pdf/1710.06012v2.pdf
PWC https://paperswithcode.com/paper/vampnets-deep-learning-of-molecular-kinetics
Repo
Framework

Synthetic Data for Neural Machine Translation of Spoken-Dialects

Title Synthetic Data for Neural Machine Translation of Spoken-Dialects
Authors Hany Hassan, Mostafa Elaraby, Ahmed Tawfik
Abstract In this paper, we introduce a novel approach to generate synthetic data for training Neural Machine Translation systems. The proposed approach transforms a given parallel corpus between a written language and a target language to a parallel corpus between a spoken dialect variant and the target language. Our approach is language independent and can be used to generate data for any variant of the source language, such as slang or spoken dialect, or even for a different language that is closely related to the source language. The proposed approach is based on local embedding projection of distributed representations, which utilizes monolingual embeddings to transform parallel data across language variants. We report experimental results on Levantine to English translation using Neural Machine Translation. We show that the generated synthetic spoken data can improve a very large scale system by more than 2.8 BLEU points, which shows that it can be used to provide a reliable translation system for a spoken dialect that does not have sufficient parallel data.
Tasks Machine Translation
Published 2017-07-01
URL http://arxiv.org/abs/1707.00079v2
PDF http://arxiv.org/pdf/1707.00079v2.pdf
PWC https://paperswithcode.com/paper/synthetic-data-for-neural-machine-translation
Repo
Framework

Towards a Deep Improviser: a prototype deep learning post-tonal free music generator

Title Towards a Deep Improviser: a prototype deep learning post-tonal free music generator
Authors Roger T. Dean, Jamie Forth
Abstract Two modest-sized symbolic corpora of post-tonal and post-metric keyboard music have been constructed, one algorithmic, the other improvised. Deep learning models of each have been trained and largely optimised. Our purpose is to obtain a model with sufficient generalisation capacity that in response to a small quantity of separate fresh input seed material, it can generate outputs that are distinctive, rather than recreative of the learned corpora or the seed material. This objective has been first assessed statistically, and as judged by k-sample Anderson-Darling and Cramer tests, has been achieved. Music has been generated using the approach, and informal judgements place it roughly on a par with algorithmic and composed music in related forms. Future work will aim to enhance the model such that it can be evaluated in relation to expression, meaning and utility in real-time performance.
Tasks
Published 2017-12-21
URL http://arxiv.org/abs/1712.07799v1
PDF http://arxiv.org/pdf/1712.07799v1.pdf
PWC https://paperswithcode.com/paper/towards-a-deep-improviser-a-prototype-deep
Repo
Framework

Crowdsourcing Universal Part-Of-Speech Tags for Code-Switching

Title Crowdsourcing Universal Part-Of-Speech Tags for Code-Switching
Authors Victor Soto, Julia Hirschberg
Abstract Code-switching is the phenomenon by which bilingual speakers switch between multiple languages during communication. The importance of developing language technologies for code-switching data is immense, given the large populations that routinely code-switch. High-quality linguistic annotations are extremely valuable for any NLP task, and performance is often limited by the amount of high-quality labeled data. However, little such data exists for code-switching. In this paper, we describe crowd-sourcing universal part-of-speech tags for the Miami Bangor Corpus of Spanish-English code-switched speech. We split the annotation task into three subtasks: one in which a subset of tokens is labeled automatically, one in which questions are specifically designed to disambiguate a subset of high-frequency words, and a more general cascaded approach for the remaining data in which questions are displayed to the worker following a decision tree structure. Each subtask is extended and adapted for a multilingual setting and the universal tagset. The quality of the annotation process is measured using hidden check questions annotated with gold labels. The overall agreement between gold standard labels and the majority vote is between 0.95 and 0.96 for just three labels and the average recall across part-of-speech tags is between 0.87 and 0.99, depending on the task.
Tasks
Published 2017-03-24
URL http://arxiv.org/abs/1703.08537v1
PDF http://arxiv.org/pdf/1703.08537v1.pdf
PWC https://paperswithcode.com/paper/crowdsourcing-universal-part-of-speech-tags
Repo
Framework
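
The agreement figure quoted in the abstract compares a majority vote over worker labels against gold labels. The following is a minimal sketch of that kind of computation, not the authors' code: the function names, the toy tags and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def agreement_with_gold(worker_labels, gold):
    """Fraction of tokens whose majority-vote label equals the gold label."""
    votes = [majority_vote(labels) for labels in worker_labels]
    matches = sum(vote == g for vote, g in zip(votes, gold))
    return matches / len(gold)


# Hypothetical toy data: three worker labels per token, four tokens.
worker_labels = [
    ["NOUN", "NOUN", "VERB"],
    ["DET", "DET", "DET"],
    ["ADJ", "NOUN", "NOUN"],
    ["VERB", "VERB", "AUX"],
]
gold = ["NOUN", "DET", "ADJ", "VERB"]

score = agreement_with_gold(worker_labels, gold)
print(score)
```

With three labels per token a two-to-one split already decides the vote, which is why a small number of labels per token can still give a usable aggregate.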

Assessing Uncertainties in X-ray Single-particle Three-dimensional reconstructions

Title Assessing Uncertainties in X-ray Single-particle Three-dimensional reconstructions
Authors Stefan Engblom, Carl Nettelblad, Jing Liu
Abstract Modern technology for producing extremely bright and coherent X-ray laser pulses provides the possibility to acquire a large number of diffraction patterns from individual biological nanoparticles, including proteins, viruses, and DNA. These two-dimensional diffraction patterns can be practically reconstructed and retrieved down to a resolution of a few Å. In principle, a sufficiently large collection of diffraction patterns will contain the required information for a full three-dimensional reconstruction of the biomolecule. The computational methodology for this reconstruction task is still under development and highly resolved reconstructions have not yet been produced. We analyze the Expansion-Maximization-Compression scheme, the current state-of-the-art approach for this very challenging application, by isolating different sources of uncertainty. Through numerical experiments on synthetic data we evaluate their respective impact. We reach conclusions of relevance for handling actual experimental data, as well as pointing out certain improvements to the underlying estimation algorithm. We also introduce a practically applicable computational methodology in the form of bootstrap procedures for assessing reconstruction uncertainty in the real data case. We evaluate the sharpness of this approach and argue that this type of procedure will be critical in the near future when handling the increasing amount of data.
Tasks
Published 2017-01-02
URL http://arxiv.org/abs/1701.00338v1
PDF http://arxiv.org/pdf/1701.00338v1.pdf
PWC https://paperswithcode.com/paper/assessing-uncertainties-in-x-ray-single
Repo
Framework

Generalized Zero-Shot Learning via Synthesized Examples

Title Generalized Zero-Shot Learning via Synthesized Examples
Authors Vinay Kumar Verma, Gundeep Arora, Ashish Mishra, Piyush Rai
Abstract We present a generative framework for generalized zero-shot learning where the training and test classes are not necessarily disjoint. Built upon a variational autoencoder based architecture, consisting of a probabilistic encoder and a probabilistic conditional decoder, our model can generate novel exemplars from seen/unseen classes, given their respective class attributes. These exemplars can subsequently be used to train any off-the-shelf classification model. One of the key aspects of our encoder-decoder architecture is a feedback-driven mechanism in which a discriminator (a multivariate regressor) learns to map the generated exemplars to the corresponding class attribute vectors, leading to an improved generator. Our model’s ability to generate and leverage examples from unseen classes to train the classification model naturally helps to mitigate the bias towards predicting seen classes in generalized zero-shot learning settings. Through a comprehensive set of experiments, we show that our model outperforms several state-of-the-art methods, on several benchmark datasets, for both standard as well as generalized zero-shot learning.
Tasks Zero-Shot Learning
Published 2017-12-11
URL http://arxiv.org/abs/1712.03878v5
PDF http://arxiv.org/pdf/1712.03878v5.pdf
PWC https://paperswithcode.com/paper/generalized-zero-shot-learning-via
Repo
Framework

Proximodistal Exploration in Motor Learning as an Emergent Property of Optimization

Title Proximodistal Exploration in Motor Learning as an Emergent Property of Optimization
Authors Freek Stulp, Pierre-Yves Oudeyer
Abstract To harness the complexity of their high-dimensional bodies during sensorimotor development, infants are guided by patterns of freezing and freeing of degrees of freedom. For instance, when learning to reach, infants free the degrees of freedom in their arm proximodistally, i.e. from joints that are closer to the body to those that are more distant. Here, we formulate and study computationally the hypothesis that such patterns can emerge spontaneously as the result of a family of stochastic optimization processes (evolution strategies with covariance-matrix adaptation), without an innate encoding of a maturational schedule. In particular, we present simulated experiments with an arm where a computational learner progressively acquires reaching skills through adaptive exploration, and we show that a proximodistal organization appears spontaneously, which we denote PDFF (ProximoDistal Freezing and Freeing of degrees of freedom). We also compare this emergent organization between different arm morphologies – from human-like to quite unnatural ones – to study the effect of different kinematic structures on the emergence of PDFF. Keywords: human motor learning; proximo-distal exploration; stochastic optimization; modelling; evolution strategies; cross-entropy methods; policy search; morphology.
Tasks Stochastic Optimization
Published 2017-12-14
URL http://arxiv.org/abs/1712.05249v1
PDF http://arxiv.org/pdf/1712.05249v1.pdf
PWC https://paperswithcode.com/paper/proximodistal-exploration-in-motor-learning
Repo
Framework

A spatiotemporal model with visual attention for video classification

Title A spatiotemporal model with visual attention for video classification
Authors Mo Shan, Nikolay Atanasov
Abstract High level understanding of sequential visual input is important for safe and stable autonomy, especially in localization and object detection. While traditional object classification and tracking approaches are specifically designed to handle variations in rotation and scale, current state-of-the-art approaches based on deep learning achieve better performance. This paper focuses on developing a spatiotemporal model to handle videos containing moving objects with rotation and scale changes. Built on models that combine Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to classify sequential data, this work investigates the effectiveness of incorporating attention modules in the CNN stage for video classification. The superiority of the proposed spatiotemporal model is demonstrated on the Moving MNIST dataset augmented with rotation and scaling.
Tasks Object Classification, Object Detection, Video Classification
Published 2017-07-07
URL http://arxiv.org/abs/1707.02069v2
PDF http://arxiv.org/pdf/1707.02069v2.pdf
PWC https://paperswithcode.com/paper/a-spatiotemporal-model-with-visual-attention
Repo
Framework

Joint Dictionaries for Zero-Shot Learning

Title Joint Dictionaries for Zero-Shot Learning
Authors Soheil Kolouri, Mohammad Rostami, Yuri Owechko, Kyungnam Kim
Abstract A classic approach toward zero-shot learning (ZSL) is to map the input domain to a set of semantically meaningful attributes that could be used later on to classify unseen classes of data (e.g. visual data). In this paper, we propose to learn a visual feature dictionary that has semantically meaningful atoms. Such a dictionary is learned via joint dictionary learning for the visual domain and the attribute domain, while enforcing the same sparse coding for both dictionaries. Our novel attribute-aware formulation provides an algorithmic solution to the domain shift/hubness problem in ZSL. Upon learning the joint dictionaries, images from unseen classes can be mapped into the attribute space by finding the attribute-aware joint sparse representation using solely the visual data. We demonstrate that our approach provides superior or comparable performance to that of the state of the art on benchmark datasets.
Tasks Dictionary Learning, Zero-Shot Learning
Published 2017-09-12
URL http://arxiv.org/abs/1709.03688v1
PDF http://arxiv.org/pdf/1709.03688v1.pdf
PWC https://paperswithcode.com/paper/joint-dictionaries-for-zero-shot-learning
Repo
Framework

The Microsoft 2017 Conversational Speech Recognition System

Title The Microsoft 2017 Conversational Speech Recognition System
Authors W. Xiong, L. Wu, F. Alleva, J. Droppo, X. Huang, A. Stolcke
Abstract We describe the 2017 version of Microsoft’s conversational speech recognition system, in which we update our 2016 system with recent developments in neural-network-based acoustic and language modeling to further advance the state of the art on the Switchboard speech recognition task. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog session aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby subsets of acoustic models are first combined at the senone/frame level, followed by a word-level voting via confusion networks. We also added a confusion network rescoring step after system combination. The resulting system yields a 5.1% word error rate on the 2000 Switchboard evaluation set.
Tasks Language Modelling, Speech Recognition
Published 2017-08-21
URL http://arxiv.org/abs/1708.06073v2
PDF http://arxiv.org/pdf/1708.06073v2.pdf
PWC https://paperswithcode.com/paper/the-microsoft-2017-conversational-speech
Repo
Framework
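
The headline result above is a word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the example sentences are illustrative assumptions; real evaluations, including the one in the paper, apply their own text normalization and scoring tools.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(wer, 3))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.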

The Shattered Gradients Problem: If resnets are the answer, then what is the question?

Title The Shattered Gradients Problem: If resnets are the answer, then what is the question?
Authors David Balduzzi, Marcus Frean, Lennox Leary, JP Lewis, Kurt Wan-Duo Ma, Brian McWilliams
Abstract A long-standing obstacle to progress in deep learning is the problem of vanishing and exploding gradients. Although the problem has largely been overcome via carefully constructed initializations and batch normalization, architectures incorporating skip-connections such as highway networks and resnets perform much better than standard feedforward architectures despite well-chosen initialization and batch normalization. In this paper, we identify the shattered gradients problem. Specifically, we show that the correlation between gradients in standard feedforward networks decays exponentially with depth, resulting in gradients that resemble white noise, whereas the gradients in architectures with skip-connections are far more resistant to shattering, decaying sublinearly. Detailed empirical evidence is presented in support of the analysis, on both fully-connected networks and convnets. Finally, we present a new “looks linear” (LL) initialization that prevents shattering, with preliminary experiments showing that the new initialization allows training very deep networks without the addition of skip-connections.
Tasks
Published 2017-02-28
URL http://arxiv.org/abs/1702.08591v2
PDF http://arxiv.org/pdf/1702.08591v2.pdf
PWC https://paperswithcode.com/paper/the-shattered-gradients-problem-if-resnets
Repo
Framework
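
The abstract does not spell out how the "looks linear" initialization works. One common reading, assumed here and not confirmed by the text above, is a concatenated rectifier (CReLU) followed by mirrored weights `[W, -W]`, so that the layer computes a purely linear map at initialization. The sketch below only checks that identity numerically; the layer sizes, the weight scale and the function names are arbitrary choices for illustration.

```python
import numpy as np


def crelu(x):
    """Concatenated rectifier: stacks relu(x) and relu(-x)."""
    return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])


def mirrored_weights(rng, n_out, n_in):
    """Return the mirrored matrix [W, -W] together with W itself."""
    w = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)  # arbitrary scale
    return np.concatenate([w, -w], axis=1), w


rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w_mirrored, w = mirrored_weights(rng, n_out=4, n_in=8)

# [W, -W] @ [relu(x); relu(-x)] = W @ (relu(x) - relu(-x)) = W @ x
layer_out = w_mirrored @ crelu(x)
print(np.allclose(layer_out, w @ x))
```

Under this assumption the network is nonlinear in its parameters once training moves the two halves of the weight matrix apart, but it starts from a linear function whose gradients are not shattered.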

Retinal Microaneurysms Detection using Local Convergence Index Features

Title Retinal Microaneurysms Detection using Local Convergence Index Features
Authors Behdad Dashtbozorg, Jiong Zhang, Bart M. ter Haar Romeny
Abstract Retinal microaneurysms are the earliest clinical sign of diabetic retinopathy disease. Detection of microaneurysms is crucial for the early diagnosis of diabetic retinopathy and prevention of blindness. In this paper, a novel and reliable method for automatic detection of microaneurysms in retinal images is proposed. In the first stage of the proposed method, several preliminary microaneurysm candidates are extracted using a gradient weighting technique and an iterative thresholding approach. In the next stage, in addition to intensity and shape descriptors, a new set of features based on local convergence index filters is extracted for each candidate. Finally, the collective set of features is fed to a hybrid sampling/boosting classifier to discriminate the MAs from non-MAs candidates. The method is evaluated on images with different resolutions and modalities (RGB and SLO) using five publicly available datasets including the Retinopathy Online Challenge’s dataset. The proposed method achieves an average sensitivity score of 0.471 on the ROC dataset outperforming state-of-the-art approaches in an extensive comparison. The experimental results on the other four datasets demonstrate the effectiveness and robustness of the proposed microaneurysms detection method regardless of different image resolutions and modalities.
Tasks
Published 2017-07-21
URL http://arxiv.org/abs/1707.06865v1
PDF http://arxiv.org/pdf/1707.06865v1.pdf
PWC https://paperswithcode.com/paper/retinal-microaneurysms-detection-using-local
Repo
Framework

An Analysis of Scale Invariance in Object Detection - SNIP

Title An Analysis of Scale Invariance in Object Detection - SNIP
Authors Bharat Singh, Larry S. Davis
Abstract An analysis of different techniques for recognizing and detecting objects under extreme scale variation is presented. Scale specific and scale invariant design of detectors are compared by training them with different configurations of input data. By evaluating the performance of different network architectures for classifying small objects on ImageNet, we show that CNNs are not robust to changes in scale. Based on this analysis, we propose to train and test detectors on the same scales of an image-pyramid. Since small and large objects are difficult to recognize at smaller and larger scales respectively, we present a novel training scheme called Scale Normalization for Image Pyramids (SNIP) which selectively back-propagates the gradients of object instances of different sizes as a function of the image scale. On the COCO dataset, our single model performance is 45.7% and an ensemble of 3 networks obtains an mAP of 48.3%. We use off-the-shelf ImageNet-1000 pre-trained models and only train with bounding box supervision. Our submission won the Best Student Entry in the COCO 2017 challenge. Code will be made available at http://bit.ly/2yXVg4c.
Tasks Object Detection
Published 2017-11-22
URL http://arxiv.org/abs/1711.08189v2
PDF http://arxiv.org/pdf/1711.08189v2.pdf
PWC https://paperswithcode.com/paper/an-analysis-of-scale-invariance-in-object-1
Repo
Framework
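
The selective back-propagation described in the abstract can be pictured as a per-scale mask over ground-truth boxes: at each pyramid scale, only instances whose resized size falls inside a valid range contribute to the loss. The sketch below illustrates that idea only. The function name, the size measure (square root of box area) and the range `(32, 160)` are hypothetical and are not taken from the paper.

```python
import numpy as np


def snip_valid_mask(boxes, scale, valid_range):
    """Mark boxes whose size after resizing lies inside valid_range.

    boxes: rows of (x1, y1, x2, y2) in original image coordinates.
    scale: resize factor of the current pyramid level.
    valid_range: (low, high) bounds on sqrt(area), in resized pixels.
    """
    boxes = np.asarray(boxes, dtype=float)
    widths = (boxes[:, 2] - boxes[:, 0]) * scale
    heights = (boxes[:, 3] - boxes[:, 1]) * scale
    sizes = np.sqrt(widths * heights)
    low, high = valid_range
    return (sizes >= low) & (sizes <= high)


# Hypothetical boxes with side lengths 20, 100 and 400 pixels.
boxes = [[0, 0, 20, 20], [0, 0, 100, 100], [0, 0, 400, 400]]

mask_low_res = snip_valid_mask(boxes, scale=0.5, valid_range=(32, 160))
mask_high_res = snip_valid_mask(boxes, scale=2.0, valid_range=(32, 160))
print(mask_low_res, mask_high_res)
```

In a training loop, the loss terms for masked-out instances would be dropped at that scale, so each object is learned only at resolutions where it appears at a reasonable size. In this toy setup the 400-pixel box is valid at neither scale and would need a coarser pyramid level.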