July 29, 2019

Paper Group ANR 137

A practical approach to dialogue response generation in closed domains

Title A practical approach to dialogue response generation in closed domains
Authors Yichao Lu, Phillip Keung, Shaonan Zhang, Jason Sun, Vikas Bhardwaj
Abstract We describe a prototype dialogue response generation model for the customer service domain at Amazon. The model, which is trained in a weakly supervised fashion, measures the similarity between customer questions and agent answers using a dual encoder network, a Siamese-like neural network architecture. Answer templates are extracted from embeddings derived from past agent answers, without turn-by-turn annotations. Responses to customer inquiries are generated by selecting the best template from the final set of templates. We show that, in a closed domain like customer service, the selected templates cover more than 70% of past customer inquiries. Furthermore, the relevance of the model-selected templates is significantly higher than that of templates selected by a standard tf-idf baseline.
Tasks
Published 2017-03-28
URL http://arxiv.org/abs/1703.09439v1
PDF http://arxiv.org/pdf/1703.09439v1.pdf
PWC https://paperswithcode.com/paper/a-practical-approach-to-dialogue-response
Repo
Framework
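
The selection step can be pictured as a nearest-neighbour search in embedding space. A minimal sketch, assuming the dual encoder has already mapped the customer question and each answer template to vectors; the toy embeddings, template strings, and function names are hypothetical stand-ins, not the paper's model:

```python
import numpy as np


def cosine_scores(query_vec, template_matrix):
    """Cosine similarity between one query embedding and each row of template_matrix."""
    q = query_vec / np.linalg.norm(query_vec)
    t = template_matrix / np.linalg.norm(template_matrix, axis=1, keepdims=True)
    return t @ q


def select_template(query_vec, template_matrix, templates):
    """Return the template whose embedding is most similar to the query embedding."""
    scores = cosine_scores(query_vec, template_matrix)
    best = int(np.argmax(scores))
    return templates[best], float(scores[best])


# Hypothetical toy embeddings standing in for dual-encoder outputs.
templates = ["Your refund has been issued.",
             "Your package is on its way.",
             "Please reset your password."]
template_matrix = np.array([[0.9, 0.1, 0.0],
                            [0.1, 0.9, 0.1],
                            [0.0, 0.1, 0.9]])
query_vec = np.array([0.2, 0.8, 0.0])  # e.g. an embedding of "Where is my order?"
best, score = select_template(query_vec, template_matrix, templates)
print(best)  # -> Your package is on its way.
```

A tf-idf baseline would use the same selection rule, only with sparse term-weight vectors in place of learned embeddings.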

Classification Accuracy Improvement for Neuromorphic Computing Systems with One-level Precision Synapses

Title Classification Accuracy Improvement for Neuromorphic Computing Systems with One-level Precision Synapses
Authors Yandan Wang, Wei Wen, Linghao Song, Hai Li
Abstract Brain-inspired neuromorphic computing has demonstrated remarkable advantages over the traditional von Neumann architecture owing to its high energy efficiency and parallel data processing. However, the limited resolution of synaptic weights degrades system accuracy and thus impedes the use of neuromorphic systems. In this work, we propose three orthogonal methods to learn synapses with one-level precision, namely distribution-aware quantization, quantization regularization, and bias tuning, which make image classification accuracy comparable to the state of the art. Experiments on both multi-layer perceptrons and convolutional neural networks show that the accuracy drop can be kept within 0.19% (5.53%) for the MNIST (CIFAR-10) dataset, compared to an ideal system without quantization.
Tasks Image Classification, Quantization
Published 2017-01-07
URL http://arxiv.org/abs/1701.01791v1
PDF http://arxiv.org/pdf/1701.01791v1.pdf
PWC https://paperswithcode.com/paper/classification-accuracy-improvement-for
Repo
Framework
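
To make "one-level precision" concrete, here is a minimal sketch that assumes it means each synapse takes one of two values, +alpha or -alpha. The scaling rule alpha = mean(|w|) and the penalty form are assumptions chosen for illustration (similar rules appear in binary-weight networks); they are not claimed to be the paper's distribution-aware quantization or quantization regularization:

```python
import numpy as np


def binarize(weights):
    """Map each weight to +alpha or -alpha, with alpha = mean(|w|) (assumed scaling rule)."""
    alpha = np.mean(np.abs(weights))
    return np.where(weights >= 0, alpha, -alpha), alpha


def quantization_penalty(weights, alpha, lam=1e-3):
    """Illustrative regularizer: penalise the gap between |w| and alpha during training,
    so that the final binarization changes the weights less."""
    return lam * np.sum((np.abs(weights) - alpha) ** 2)


w = np.array([0.4, -0.2, 0.1, -0.3])
w_q, alpha = binarize(w)
print(alpha, w_q)  # alpha is 0.25 (up to floating-point rounding); w_q alternates in sign
print(quantization_penalty(w, alpha))
```

Adding such a penalty to the training loss nudges weights toward the two allowed levels before they are actually quantized.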

Deep image representations using caption generators

Title Deep image representations using caption generators
Authors Konda Reddy Mopuri, Vishal B. Athreya, R. Venkatesh Babu
Abstract Deep learning exploits large volumes of labeled data to learn powerful models. When the target dataset is small, it is common practice to perform transfer learning using pre-trained models to learn new task-specific representations. However, pre-trained CNNs for image recognition receive limited information about the image during training, namely the label alone. Tasks such as scene retrieval suffer from features learned under this weak supervision and require stronger supervision to better understand the contents of the image. In this paper, we exploit the features learned by caption-generating models to learn novel task-specific image representations. In particular, we consider the state-of-the-art captioning system Show and Tell [SnT-pami-2016] and the dense region description model DenseCap [densecap-cvpr-2016]. We demonstrate that, owing to the richer supervision provided during training, the features learned by the captioning system perform better than those of CNNs. Further, we train a siamese network with a modified pair-wise loss to fuse the features learned by Show and Tell and DenseCap and learn image representations suitable for retrieval. Experiments show that the proposed fusion exploits the complementary nature of the individual features and yields state-of-the-art retrieval results on benchmark datasets.
Tasks Transfer Learning
Published 2017-05-25
URL http://arxiv.org/abs/1705.09142v1
PDF http://arxiv.org/pdf/1705.09142v1.pdf
PWC https://paperswithcode.com/paper/deep-image-representations-using-caption
Repo
Framework
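
The abstract does not spell out the modified pair-wise loss, so the sketch shows the standard contrastive loss that siamese retrieval networks are commonly trained with: relevant pairs are pulled together, irrelevant pairs are pushed at least a margin apart. The margin value and the toy embeddings are assumptions:

```python
import numpy as np


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Standard pairwise (contrastive) loss over a batch of embedding pairs.

    same[i] = 1 if pair i should be close (relevant), 0 otherwise.
    """
    d = np.linalg.norm(emb_a - emb_b, axis=1)                    # Euclidean distance per pair
    pos = same * d ** 2                                          # pull relevant pairs together
    neg = (1 - same) * np.maximum(0.0, margin - d) ** 2          # push others beyond the margin
    return 0.5 * np.mean(pos + neg)


a = np.array([[0.0, 0.0], [0.0, 0.0]])
b = np.array([[0.3, 0.4], [0.3, 0.4]])  # both pairs are at distance 0.5
same = np.array([1, 0])
print(contrastive_loss(a, b, same))  # approximately 0.125
```

In a siamese setup, emb_a and emb_b come from the same network (shared weights) applied to the two images of each pair.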

Nonparametric Inference for Auto-Encoding Variational Bayes

Title Nonparametric Inference for Auto-Encoding Variational Bayes
Authors Erik Bodin, Iman Malik, Carl Henrik Ek, Neill D. F. Campbell
Abstract We would like to learn latent representations that are low-dimensional and highly interpretable. A model that has these characteristics is the Gaussian Process Latent Variable Model (GP-LVM). The strengths and weaknesses of the GP-LVM are complementary to those of the Variational Autoencoder (VAE): the former provides interpretable low-dimensional latent representations, while the latter can handle large amounts of data and can use non-Gaussian likelihoods. Our goal in this paper is to marry these two approaches and reap the benefits of both. To do so, we introduce a novel approximate inference scheme inspired by the GP-LVM and the VAE. We show experimentally that the approximation allows the capacity of the generative bottleneck (Z) of the VAE to be arbitrarily large without losing a highly interpretable representation, so that reconstruction quality is not limited by Z, while a low-dimensional space can still be used for ancestral sampling and as a means to reason about the embedded data.
Tasks
Published 2017-12-18
URL http://arxiv.org/abs/1712.06536v1
PDF http://arxiv.org/pdf/1712.06536v1.pdf
PWC https://paperswithcode.com/paper/nonparametric-inference-for-auto-encoding
Repo
Framework

Densely tracking sequences of 3D face scans

Title Densely tracking sequences of 3D face scans
Authors Huaxiong Ding, Liming Chen
Abstract 3D face dense tracking aims to find dense inter-frame correspondences in a sequence of 3D face scans and constitutes a powerful tool for many face analysis tasks, e.g., 3D dynamic facial expression analysis. Most existing methods simply fit a 3D face surface or model to a 3D target surface without considering temporal information between frames. In this paper, we propose a novel method for densely tracking sequences of 3D face scans, which extends the non-rigid ICP algorithm by adding a novel criterion specific to temporal information. A novel fitting framework is presented for automatically tracking a full sequence of 3D face scans. The results of experiments carried out on the BU4D-FE database are promising, showing that the proposed algorithm outperforms state-of-the-art algorithms for 3D face dense tracking.
Tasks
Published 2017-09-13
URL http://arxiv.org/abs/1709.04295v1
PDF http://arxiv.org/pdf/1709.04295v1.pdf
PWC https://paperswithcode.com/paper/densely-tracking-sequences-of-3d-face-scans
Repo
Framework
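
The idea of adding a temporal criterion to ICP-style fitting can be illustrated with a toy energy: a data term pulling each template vertex to its closest scan point, plus a term keeping it near its position in the previous frame. This is a minimal sketch under assumptions; the quadratic temporal term, the weight lam, and the omission of the stiffness term that real non-rigid ICP uses are all simplifications, not the paper's criterion:

```python
import numpy as np


def closest_points(src, target):
    """Brute-force nearest neighbour in `target` for every point of `src`."""
    d2 = ((src[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    return target[np.argmin(d2, axis=1)]


def tracking_energy(template_t, template_prev, scan_t, lam=0.5):
    """Data term (distance to closest points on the current scan) plus an assumed
    temporal term (distance to the template fitted in the previous frame)."""
    matches = closest_points(template_t, scan_t)
    data = np.sum((template_t - matches) ** 2)
    temporal = np.sum((template_t - template_prev) ** 2)
    return data + lam * temporal


def temporal_step(template_t, template_prev, scan_t, lam=0.5):
    """One ICP-style step: fix correspondences M, then minimise
    ||X - M||^2 + lam * ||X - P||^2 per point, whose closed form is (M + lam * P) / (1 + lam).
    A real non-rigid ICP would also include a stiffness (smoothness) term, omitted here."""
    matches = closest_points(template_t, scan_t)
    return (matches + lam * template_prev) / (1.0 + lam)


rng = np.random.default_rng(0)
scan_t = rng.normal(size=(50, 3))
template_prev = scan_t + 0.1 * rng.normal(size=(50, 3))
template_t = template_prev.copy()
e0 = tracking_energy(template_t, template_prev, scan_t)
template_t = temporal_step(template_t, template_prev, scan_t)
e1 = tracking_energy(template_t, template_prev, scan_t)
print(e0, e1)  # the step does not increase the energy
```

As in ordinary ICP, alternating correspondence search and closed-form updates can only lower (or keep) this energy.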

Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters

Title Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
Authors Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Eric P. Xing
Abstract Deep learning models can take weeks to train on a single GPU-equipped machine, necessitating scaling out DL training to a GPU cluster. However, current distributed DL implementations can scale poorly due to substantial parameter synchronization over the network, because the high throughput of GPUs allows more data batches to be processed per unit time than CPUs, leading to more frequent network synchronization. We present Poseidon, an efficient communication architecture for distributed DL on GPUs. Poseidon exploits the layered model structures in DL programs to overlap communication and computation, reducing bursty network communication. Moreover, Poseidon uses a hybrid communication scheme that optimizes the number of bytes required to synchronize each layer, according to layer properties and the number of machines. We show that Poseidon is applicable to different DL frameworks by plugging Poseidon into Caffe and TensorFlow. We show that Poseidon enables Caffe and TensorFlow to achieve 15.5x speed-up on 16 single-GPU machines, even with limited bandwidth (10GbE) and the challenging VGG19-22K network for image classification. Moreover, Poseidon-enabled TensorFlow achieves 31.5x speed-up with 32 single-GPU machines on Inception-V3, a 50% improvement over the open-source TensorFlow (20x speed-up).
Tasks Image Classification
Published 2017-06-11
URL http://arxiv.org/abs/1706.03292v1
PDF http://arxiv.org/pdf/1706.03292v1.pdf
PWC https://paperswithcode.com/paper/poseidon-an-efficient-communication
Repo
Framework
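
The hybrid scheme picks, per layer, whichever synchronization strategy moves fewer bytes. As we understand the paper, the two candidates for a fully connected layer are a parameter server (PS) and sufficient factor broadcasting (SFB). The sketch uses a deliberately simplified per-worker byte model that we assume for illustration; the paper's exact cost formula may differ, and the function names are hypothetical:

```python
def ps_bytes(m, n, bytes_per_float=4):
    """Assumed PS cost per worker per iteration: push and pull a full m x n gradient."""
    return 2 * m * n * bytes_per_float


def sfb_bytes(m, n, batch, workers, bytes_per_float=4):
    """Assumed SFB cost per worker per iteration: send batch * (m + n) factor values
    to every peer and receive the same amount from every peer."""
    return 2 * batch * (m + n) * (workers - 1) * bytes_per_float


def choose_scheme(m, n, batch, workers):
    """Pick the cheaper scheme for one fully connected layer under the assumed model."""
    return "SFB" if sfb_bytes(m, n, batch, workers) < ps_bytes(m, n) else "PS"


# A 4096 x 4096 fully connected layer with a per-worker batch of 32.
print(choose_scheme(4096, 4096, 32, 8))    # -> SFB
print(choose_scheme(4096, 4096, 32, 128))  # -> PS
```

The crossover shows why the choice depends on both layer shape and cluster size: SFB traffic grows with the number of peers, while PS traffic per worker does not.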

Probabilistic learning of nonlinear dynamical systems using sequential Monte Carlo

Title Probabilistic learning of nonlinear dynamical systems using sequential Monte Carlo
Authors Thomas B. Schön, Andreas Svensson, Lawrence Murray, Fredrik Lindsten
Abstract Probabilistic modeling provides the capability to represent and manipulate uncertainty in data, models, predictions and decisions. We are concerned with the problem of learning probabilistic models of dynamical systems from measured data. Specifically, we consider learning of probabilistic nonlinear state-space models. There is no closed-form solution available for this problem, implying that we are forced to use approximations. In this tutorial we will provide a self-contained introduction to one of the state-of-the-art methods—the particle Metropolis–Hastings algorithm—which has proven to offer a practical approximation. This is a Monte Carlo based method, where the particle filter is used to guide a Markov chain Monte Carlo method through the parameter space. One of the key merits of the particle Metropolis–Hastings algorithm is that it is guaranteed to converge to the “true solution” under mild assumptions, despite being based on a particle filter with only a finite number of particles. We will also provide a motivating numerical example illustrating the method using a modeling language tailored for sequential Monte Carlo methods. The intention of modeling languages of this kind is to open up the power of sophisticated Monte Carlo methods—including particle Metropolis–Hastings—to a large group of users without requiring them to know all the underlying mathematical details.
Tasks
Published 2017-03-07
URL http://arxiv.org/abs/1703.02419v2
PDF http://arxiv.org/pdf/1703.02419v2.pdf
PWC https://paperswithcode.com/paper/probabilistic-learning-of-nonlinear-dynamical
Repo
Framework
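
A compact illustration of particle Metropolis-Hastings, written in plain NumPy rather than the modeling language the tutorial uses. The toy model x_t = theta * tanh(x_{t-1}) + v_t, y_t = x_t + e_t, its noise levels, the uniform prior on [-3, 3], and all tuning constants are assumptions for the example; a bootstrap particle filter supplies the likelihood estimate inside a random-walk Metropolis-Hastings chain over theta:

```python
import numpy as np

Q_STD, R_STD = 0.5, 0.3  # assumed process and measurement noise standard deviations


def simulate(theta, T, rng):
    """Simulate x_t = theta * tanh(x_{t-1}) + v_t,  y_t = x_t + e_t (toy model)."""
    x, ys = rng.normal(), []
    for _ in range(T):
        x = theta * np.tanh(x) + Q_STD * rng.normal()
        ys.append(x + R_STD * rng.normal())
    return np.array(ys)


def pf_loglik(theta, y, n_particles, rng):
    """Bootstrap particle filter estimate of log p(y | theta)."""
    x = rng.normal(size=n_particles)  # x_0 ~ N(0, 1)
    loglik = 0.0
    for yt in y:
        x = theta * np.tanh(x) + Q_STD * rng.normal(size=n_particles)  # propagate
        logw = -0.5 * ((yt - x) / R_STD) ** 2 - np.log(R_STD * np.sqrt(2 * np.pi))
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())  # log of the average unnormalised weight
        # Systematic resampling.
        u = (rng.uniform() + np.arange(n_particles)) / n_particles
        idx = np.searchsorted(np.cumsum(w / w.sum()), u)
        x = x[np.minimum(idx, n_particles - 1)]
    return loglik


def pmmh(y, n_iter=1000, n_particles=100, step=0.1, seed=1):
    """Particle marginal Metropolis-Hastings with a Gaussian random walk on theta
    and a uniform prior on [-3, 3]."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    ll = pf_loglik(theta, y, n_particles, rng)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.normal()
        if abs(prop) <= 3.0:  # proposals outside the prior support are rejected
            ll_prop = pf_loglik(prop, y, n_particles, rng)
            # The current estimate ll is reused, not recomputed: this keeps the chain
            # targeting the exact posterior even though the likelihood is only estimated.
            if np.log(rng.uniform()) < ll_prop - ll:
                theta, ll = prop, ll_prop
        chain[i] = theta
    return chain


data = simulate(theta=1.5, T=100, rng=np.random.default_rng(0))
chain = pmmh(data)
print("posterior mean of theta:", chain[300:].mean())
```

Only a finite number of particles is used, yet because the particle filter's likelihood estimate is unbiased, the chain still targets the true posterior, which is the convergence property the abstract refers to.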

Real-time 3D Shape Instantiation from Single Fluoroscopy Projection for Fenestrated Stent Graft Deployment

Title Real-time 3D Shape Instantiation from Single Fluoroscopy Projection for Fenestrated Stent Graft Deployment
Authors Xiao-Yun Zhou, Jianyu Lin, Celia Riga, Guang-Zhong Yang, Su-Lin Lee
Abstract Robot-assisted deployment of fenestrated stent grafts in Fenestrated Endovascular Aortic Repair (FEVAR) requires accurate geometrical alignment. Currently, this process is guided by 2D fluoroscopy, which is uninformative and error-prone. In this paper, a real-time framework is proposed to instantiate the 3D shape of a fenestrated stent graft based on only a single low-dose 2D fluoroscopic image. First, markers were placed on the fenestrated stent graft. Second, the 3D pose of each stent segment was instantiated by the RPnP (Robust Perspective-n-Point) method. Third, the 3D shape of the whole stent graft was instantiated via graft gap interpolation. Focal-Unet was proposed to segment the markers from 2D fluoroscopic images to achieve semi-automatic marker detection. The proposed framework was validated on five patient-specific 3D-printed phantoms of aortic aneurysms and three stent grafts with new marker placements, showing an average distance error of 1-3 mm and an average angle error of 4 degrees.
Tasks
Published 2017-09-22
URL http://arxiv.org/abs/1709.07689v2
PDF http://arxiv.org/pdf/1709.07689v2.pdf
PWC https://paperswithcode.com/paper/real-time-3d-shape-instantiation-from-single
Repo
Framework
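
Perspective-n-Point methods such as RPnP invert a pinhole projection: given the known 3D marker layout on a stent segment and the detected 2D marker positions, they recover the rotation R and translation t. The sketch shows only the forward model and the reprojection error such solvers drive down; the intrinsics K, the marker coordinates, and the pose are made-up values, and this is not the RPnP algorithm itself:

```python
import numpy as np


def project(points_3d, R, t, K):
    """Pinhole projection of N x 3 points with rotation R, translation t, intrinsics K."""
    cam = points_3d @ R.T + t        # world -> camera coordinates
    uvw = cam @ K.T                  # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]  # perspective division


def reprojection_error(points_3d, detected_2d, R, t, K):
    """Mean pixel distance between projected and detected markers."""
    return np.mean(np.linalg.norm(project(points_3d, R, t, K) - detected_2d, axis=1))


K = np.array([[1000.0, 0.0, 256.0],
              [0.0, 1000.0, 256.0],
              [0.0, 0.0, 1.0]])       # assumed focal length and principal point (pixels)
R, t = np.eye(3), np.array([0.0, 0.0, 500.0])
markers = np.array([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [-10.0, 0.0, 5.0]])
detected = project(markers, R, t, K)
print(detected[0])                                     # -> [276. 256.]
print(reprojection_error(markers, detected, R, t, K))  # -> 0.0 for the true pose
```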

Derivation of the Asymptotic Eigenvalue Distribution for Causal 2D-AR Models under Upscaling

Title Derivation of the Asymptotic Eigenvalue Distribution for Causal 2D-AR Models under Upscaling
Authors David Vázquez-Padín, Fernando Pérez-González, Pedro Comesaña-Alfaro
Abstract This technical report describes the derivation of the asymptotic eigenvalue distribution for causal 2D-AR models under an upscaling scenario. Specifically, it tackles the analytical derivation of the asymptotic eigenvalue distribution of the sample autocorrelation matrix corresponding to genuine and upscaled images. It also includes the pseudocode of the derived approaches for resampling detection and resampling factor estimation that are based on this analysis.
Tasks
Published 2017-04-19
URL http://arxiv.org/abs/1704.05773v1
PDF http://arxiv.org/pdf/1704.05773v1.pdf
PWC https://paperswithcode.com/paper/derivation-of-the-asymptotic-eigenvalue
Repo
Framework
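
The intuition behind eigenvalue-based resampling detection can be seen empirically: interpolation makes neighbouring pixels linearly dependent, which collapses the smallest eigenvalues of the patch autocorrelation matrix. The sketch is illustrative only, not the report's asymptotic derivation: it assumes a zero-mean white-noise "genuine" image, 2x bilinear upscaling, and 3 x 3 patches, and it compares the ratio of smallest to largest eigenvalue:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def upsample2_axis(x, axis):
    """2x linear interpolation along one axis (output length 2n - 1)."""
    x = np.moveaxis(x, axis, 0)
    out = np.empty((2 * x.shape[0] - 1,) + x.shape[1:])
    out[0::2] = x                        # keep the original samples
    out[1::2] = 0.5 * (x[:-1] + x[1:])   # insert midpoints
    return np.moveaxis(out, 0, axis)


def autocorr_eigenvalues(img, win=3):
    """Eigenvalues (ascending) of the sample autocorrelation matrix of win x win patches."""
    patches = sliding_window_view(img, (win, win)).reshape(-1, win * win)
    r = patches.T @ patches / patches.shape[0]
    return np.linalg.eigvalsh(r)


rng = np.random.default_rng(0)
genuine = rng.normal(size=(128, 128))
upscaled = upsample2_axis(upsample2_axis(genuine, 0), 1)
ratios = {}
for name, im in [("genuine", genuine), ("upscaled", upscaled)]:
    ev = autocorr_eigenvalues(im)
    ratios[name] = ev.min() / ev.max()
    print(name, ratios[name])  # the upscaled image has a much smaller ratio
```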

Sentiment Recognition in Egocentric Photostreams

Title Sentiment Recognition in Egocentric Photostreams
Authors Estefania Talavera, Nicola Strisciuglio, Nicolai Petkov, Petia Radeva
Abstract Lifelogging is the process of collecting a rich source of information about people's daily lives. In this paper, we introduce the problem of sentiment analysis in egocentric events, focusing on the moments captured in images that evoke positive, neutral, or negative feelings in the observer. We propose a method for classifying the sentiment of egocentric pictures based on global and semantic image features extracted by Convolutional Neural Networks. We carried out experiments on an egocentric dataset, which we organized into 3 classes according to the sentiment the images evoke in the user (positive, negative, or neutral).
Tasks Sentiment Analysis
Published 2017-03-29
URL http://arxiv.org/abs/1703.09933v1
PDF http://arxiv.org/pdf/1703.09933v1.pdf
PWC https://paperswithcode.com/paper/sentiment-recognition-in-egocentric
Repo
Framework

Long-Term Memory Networks for Question Answering

Title Long-Term Memory Networks for Question Answering
Authors Fenglong Ma, Radha Chitta, Saurabh Kataria, Jing Zhou, Palghat Ramesh, Tong Sun, Jing Gao
Abstract Question answering is an important and difficult task in natural language processing, because many basic natural language processing tasks can be cast as question answering. Several deep neural network architectures have recently been developed that employ memory and inference components to memorize and reason over text and to generate answers to questions. However, a major drawback of many such models is that they can generate only single-word answers. In addition, they require a large amount of training data to generate accurate answers. In this paper, we introduce the Long-Term Memory Network (LTMN), which incorporates both an external memory module and a Long Short-Term Memory (LSTM) module to comprehend the input data and generate multi-word answers. The LTMN model can be trained end-to-end using back-propagation and requires minimal supervision. We test our model on two synthetic datasets (based on Facebook’s bAbI dataset) and the real-world Stanford question answering dataset, and show that it can achieve state-of-the-art performance.
Tasks Question Answering
Published 2017-07-06
URL http://arxiv.org/abs/1707.01961v1
PDF http://arxiv.org/pdf/1707.01961v1.pdf
PWC https://paperswithcode.com/paper/long-term-memory-networks-for-question
Repo
Framework
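
The external memory component in models of this family is typically read by attention. A minimal sketch of one read "hop" in the style of End-to-End Memory Networks, assuming the question and memory slots are already embedded; the LTMN's exact memory layout and the LSTM decoder that turns the read vector into a multi-word answer are not shown, and the toy vectors are made up:

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def memory_read(question_vec, memory_keys, memory_values):
    """One attention hop: weight each memory slot by its match with the question,
    then return the weighted sum of the value vectors."""
    p = softmax(memory_keys @ question_vec)
    return p @ memory_values, p


q = np.array([0.0, 1.0, 0.0])
keys = np.array([[1.0, 0.0, 0.0],
                 [0.0, 5.0, 0.0],
                 [0.0, 0.0, 1.0]])
values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
out, p = memory_read(q, keys, values)
print(p.round(3))  # attention concentrates on the second memory slot
```

Because every operation is differentiable, such a read can be trained end-to-end with back-propagation, as the abstract states for the LTMN.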

How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval

Title How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval
Authors Rodrigo Toro Icarte, Jorge A. Baier, Cristian Ruz, Alvaro Soto
Abstract The knowledge representation community has built general-purpose ontologies which contain large amounts of commonsense knowledge over relevant aspects of the world, including useful visual information, e.g.: “a ball is used by a football player”, “a tennis player is located at a tennis court”. Current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. Instead, they learn recognition models directly from training examples. In this paper, we study how general-purpose ontologies—specifically, MIT’s ConceptNet ontology—can improve the performance of state-of-the-art vision systems. As a testbed, we tackle the problem of sentence-based image retrieval. Our retrieval approach incorporates knowledge from ConceptNet on top of a large pool of object detectors derived from a deep learning technique. In our experiments, we show that ConceptNet can improve performance on a common benchmark dataset. Key to our performance is the use of the ESPGAME dataset to select visually relevant relations from ConceptNet. Consequently, a main conclusion of this work is that general-purpose commonsense ontologies improve performance on visual reasoning tasks when properly filtered to select meaningful visual relations.
Tasks Image Retrieval, Visual Reasoning
Published 2017-05-24
URL http://arxiv.org/abs/1705.08844v1
PDF http://arxiv.org/pdf/1705.08844v1.pdf
PWC https://paperswithcode.com/paper/how-a-general-purpose-commonsense-ontology
Repo
Framework
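
The retrieval idea (expand the query with commonsense relations, keep only visually meaningful ones, then score images by object-detector outputs) can be sketched with toy data. Everything here is hypothetical: the relation table stands in for ConceptNet, the VISUAL whitelist stands in for the ESPGAME-based filter, and the detector scores are invented; no real ConceptNet API is used:

```python
# Hypothetical commonsense relations (concept -> related concepts), standing in for ConceptNet.
RELATIONS = {
    "tennis player": ["tennis court", "racket", "ball"],
    "football player": ["ball", "stadium", "opinion"],
}
# Hypothetical set of visually grounded concepts, standing in for the ESPGAME-based filter.
VISUAL = {"tennis court", "racket", "ball", "stadium", "tennis player", "football player"}


def expand_query(concepts):
    """Add related concepts, keeping only those that pass the visual filter."""
    expanded = set(concepts)
    for c in concepts:
        expanded.update(r for r in RELATIONS.get(c, []) if r in VISUAL)
    return expanded


def score_image(detector_scores, query_concepts):
    """Sum detector confidences for every concept in the (expanded) query."""
    return sum(detector_scores.get(c, 0.0) for c in query_concepts)


images = {
    "img1": {"racket": 0.9, "tennis court": 0.8},
    "img2": {"stadium": 0.7, "ball": 0.4},
}
query = expand_query(["tennis player"])
ranking = sorted(images, key=lambda k: score_image(images[k], query), reverse=True)
print(ranking)  # -> ['img1', 'img2']
```

Without expansion, neither image has a "tennis player" detection and both would score zero; the filter is what keeps non-visual relations such as "opinion" out of the query.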

An Iterative Regression Approach for Face Pose Estimation from RGB Images

Title An Iterative Regression Approach for Face Pose Estimation from RGB Images
Authors Wenye He
Abstract This paper presents an iterative optimization method, explicit shape regression (ESR), for face pose detection and localization. The regression function is learned to predict the entire facial shape and minimize the alignment error. A cascaded learning framework is employed to strengthen the shape constraint during detection. A combination of two-level boosted regression, shape-indexed features, and a correlation-based feature selection method is used to improve performance. In this paper, we explain the advantages of ESR for deformable objects, as in face pose estimation, and discuss more general applications of the method. In the experiments, we compare our results with those of other methods and demonstrate accuracy and robustness in different scenarios.
Tasks Feature Selection, Pose Estimation
Published 2017-09-10
URL http://arxiv.org/abs/1709.03170v1
PDF http://arxiv.org/pdf/1709.03170v1.pdf
PWC https://paperswithcode.com/paper/an-iterative-regression-approach-for-face
Repo
Framework
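
Correlation-based feature selection, as we understand it from the original ESR formulation, projects the regression targets onto a random direction and picks the pixel-difference feature most correlated with that projection. A minimal brute-force sketch under that reading; the use of absolute correlation, the toy data, and the function name are assumptions, and the fern regressors of the two-level boosted cascade are not shown:

```python
import numpy as np


def select_feature_pair(pixel_feats, targets, rng):
    """Return the pixel pair (i, j) whose intensity difference is most correlated (in absolute
    value) with a random 1-D projection of the regression targets.

    pixel_feats: (n_samples, n_pixels) intensities sampled at shape-indexed locations.
    targets:     (n_samples, d) shape residuals that the next regressor should predict.
    """
    y = targets @ rng.normal(size=targets.shape[1])  # random projection of the targets
    best, best_corr = None, -1.0
    n_pix = pixel_feats.shape[1]
    for i in range(n_pix):
        for j in range(i + 1, n_pix):
            diff = pixel_feats[:, i] - pixel_feats[:, j]
            if diff.std() == 0:
                continue  # constant feature, correlation undefined
            corr = abs(np.corrcoef(diff, y)[0, 1])
            if corr > best_corr:
                best, best_corr = (i, j), corr
    return best, best_corr


rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 5))
targets = (feats[:, 0] - feats[:, 3])[:, None]  # toy residual driven by pixels 0 and 3
pair, corr = select_feature_pair(feats, targets, rng)
print(pair, round(corr, 3))  # expected: (0, 3) 1.0
```

Pixel differences are used because they are cheap to compute and fairly robust to global illumination changes.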

TraX: The visual Tracking eXchange Protocol and Library

Title TraX: The visual Tracking eXchange Protocol and Library
Authors Luka Čehovin
Abstract In this paper we address the problem of developing online visual tracking algorithms. We present a specialized communication protocol that serves as a bridge between a tracker implementation and the application that uses it. It decouples the development of algorithms and applications, encouraging reusability. The primary use case is algorithm evaluation, where the protocol facilitates more complex evaluation scenarios than those used nowadays, thus pushing forward the field of visual tracking. We present a reference implementation of the protocol that makes it easy to use in several popular programming languages, and discuss where the protocol is already used as well as some usage scenarios that we envision for the future.
Tasks Visual Tracking
Published 2017-05-12
URL http://arxiv.org/abs/1705.04469v1
PDF http://arxiv.org/pdf/1705.04469v1.pdf
PWC https://paperswithcode.com/paper/trax-the-visual-tracking-exchange-protocol
Repo
Framework
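
The decoupling the protocol provides can be illustrated with a minimal in-process sketch: the evaluation client only exchanges messages and never touches the tracker's internals. The message kinds ("initialize", "frame", "state", "quit"), the Message class, and the toy tracker are hypothetical and do not reproduce the TraX wire format or its reference library API; in a real setup the tracker and client would typically run as separate processes:

```python
from dataclasses import dataclass


@dataclass
class Message:
    kind: str              # hypothetical kinds: "initialize", "frame", "state", "quit"
    payload: object = None


class StaticTracker:
    """Toy tracker that always reports the initial region; a real tracker would update it."""

    def __init__(self):
        self.region = None

    def handle(self, msg):
        if msg.kind == "initialize":
            self.region = msg.payload["region"]
            return Message("state", self.region)
        if msg.kind == "frame":
            return Message("state", self.region)
        if msg.kind == "quit":
            return None
        raise ValueError(f"unknown message kind: {msg.kind}")


def run_client(tracker, frames, init_region):
    """Evaluation-side loop: it knows only the message protocol, not the tracker's code."""
    first = Message("initialize", {"image": frames[0], "region": init_region})
    states = [tracker.handle(first).payload]
    for frame in frames[1:]:
        states.append(tracker.handle(Message("frame", {"image": frame})).payload)
    tracker.handle(Message("quit"))
    return states


states = run_client(StaticTracker(), ["f0.jpg", "f1.jpg", "f2.jpg"], (10, 20, 50, 40))
print(states)  # one reported region per frame
```

Any tracker that answers the same messages can be swapped in without changing run_client, which is the reusability argument made in the abstract.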

Ω-Net (Omega-Net): Fully Automatic, Multi-View Cardiac MR Detection, Orientation, and Segmentation with Deep Neural Networks

Title Ω-Net (Omega-Net): Fully Automatic, Multi-View Cardiac MR Detection, Orientation, and Segmentation with Deep Neural Networks
Authors Davis M. Vigneault, Weidi Xie, Carolyn Y. Ho, David A. Bluemke, J. Alison Noble
Abstract Pixelwise segmentation of the left ventricular (LV) myocardium and the four cardiac chambers in 2-D steady state free precession (SSFP) cine sequences is an essential preprocessing step for a wide range of analyses. Variability in contrast, appearance, orientation, and placement of the heart between patients, clinical views, scanners, and protocols makes fully automatic semantic segmentation a notoriously difficult problem. Here, we present Ω-Net (Omega-Net): a novel convolutional neural network (CNN) architecture for simultaneous localization, transformation into a canonical orientation, and semantic segmentation. First, an initial segmentation is performed on the input image; second, the features learned during this initial segmentation are used to predict the parameters needed to transform the input image into a canonical orientation; and third, a final segmentation is performed on the transformed image. In this work, Ω-Nets of varying depths were trained to detect five foreground classes in any of three clinical views (short axis (SA), four-chamber (4C), and two-chamber (2C)), without prior knowledge of the view being segmented. The architecture was trained on a cohort of patients with hypertrophic cardiomyopathy and healthy control subjects. Network performance, as measured by weighted foreground intersection-over-union (IoU), was substantially improved in the best-performing Ω-Net compared with U-Net segmentation without localization or orientation. In addition, Ω-Net was retrained from scratch on the 2017 MICCAI ACDC dataset, achieving state-of-the-art results on the LV and RV bloodpools and performing slightly worse in segmentation of the LV myocardium. We conclude that this architecture represents a substantive advancement over prior approaches, with implications for biomedical image segmentation more generally.
Tasks Semantic Segmentation
Published 2017-11-03
URL http://arxiv.org/abs/1711.01094v3
PDF http://arxiv.org/pdf/1711.01094v3.pdf
PWC https://paperswithcode.com/paper/-net-omega-net-fully-automatic-multi-view
Repo
Framework
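
The abstract reports weighted foreground IoU without defining the weights. A minimal sketch, assuming label 0 is background and each foreground class's IoU is weighted by its number of ground-truth pixels; the toy label maps are made up:

```python
import numpy as np


def weighted_foreground_iou(pred, gt, n_classes):
    """IoU per foreground class (labels 1..n_classes-1), averaged with weights
    proportional to each class's ground-truth pixel count. Background (0) is excluded."""
    ious, weights = [], []
    for c in range(1, n_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both maps
        ious.append(np.logical_and(p, g).sum() / union)
        weights.append(g.sum())
    return float(np.average(ious, weights=weights))


gt = np.array([[0, 1, 1],
               [0, 2, 2]])
pred = np.array([[0, 1, 0],
                 [0, 2, 2]])
print(weighted_foreground_iou(pred, gt, n_classes=3))  # -> 0.75 (IoUs 0.5 and 1.0, equal weights)
```

Weighting by pixel count keeps a tiny structure from dominating the average, at the cost of under-representing small classes.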