July 29, 2019

Paper Group ANR 137


A practical approach to dialogue response generation in closed domains


Title	A practical approach to dialogue response generation in closed domains
Authors	Yichao Lu, Phillip Keung, Shaonan Zhang, Jason Sun, Vikas Bhardwaj
Abstract	We describe a prototype dialogue response generation model for the customer service domain at Amazon. The model, which is trained in a weakly supervised fashion, measures the similarity between customer questions and agent answers using a dual encoder network, a Siamese-like neural network architecture. Answer templates are extracted from embeddings derived from past agent answers, without turn-by-turn annotations. Responses to customer inquiries are generated by selecting the best template from the final set of templates. We show that, in a closed domain like customer service, the selected templates cover >70% of past customer inquiries. Furthermore, the relevance of the model-selected templates is significantly higher than that of templates selected by a standard tf-idf baseline.
Tasks
Published	2017-03-28
URL	http://arxiv.org/abs/1703.09439v1
PDF	http://arxiv.org/pdf/1703.09439v1.pdf
PWC	https://paperswithcode.com/paper/a-practical-approach-to-dialogue-response
Repo
Framework
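
The selection step described in the abstract above (pick the best template for a customer question) can be illustrated with a minimal sketch. It assumes that questions and templates have already been embedded as vectors and that cosine similarity is the scoring function; the abstract only says the dual encoder "measures the similarity", so the scoring function, the function names, and the toy 3-dimensional vectors are all hypothetical and are not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_template(question_vec, template_vecs):
    """Return (index, score) of the template most similar to the question."""
    scores = [cosine(question_vec, t) for t in template_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical embeddings, for illustration only; a trained encoder
# would produce these vectors in a real system.
templates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
question = [0.5, 0.9, 0.1]
index, score = select_template(question, templates)
print(index, round(score, 3))
```

In a dual encoder setup the question and the answer templates would typically be embedded by two separate encoders; only the final nearest-template lookup is shown here.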

Classification Accuracy Improvement for Neuromorphic Computing Systems with One-level Precision Synapses


Title	Classification Accuracy Improvement for Neuromorphic Computing Systems with One-level Precision Synapses
Authors	Yandan Wang, Wei Wen, Linghao Song, Hai Li
Abstract	Brain-inspired neuromorphic computing has demonstrated remarkable advantages over traditional von Neumann architecture for its high energy efficiency and parallel data processing. However, the limited resolution of synaptic weights degrades system accuracy and thus impedes the use of neuromorphic systems. In this work, we propose three orthogonal methods to learn synapses with one-level precision, namely, distribution-aware quantization, quantization regularization and bias tuning, to make image classification accuracy comparable to the state-of-the-art. Experiments on both multi-layer perceptron and convolutional neural networks show that the accuracy drop can be well controlled within 0.19% (5.53%) for the MNIST (CIFAR-10) database, compared to an ideal system without quantization.
Tasks	Image Classification, Quantization
Published	2017-01-07
URL	http://arxiv.org/abs/1701.01791v1
PDF	http://arxiv.org/pdf/1701.01791v1.pdf
PWC	https://paperswithcode.com/paper/classification-accuracy-improvement-for
Repo
Framework

Deep image representations using caption generators


Title	Deep image representations using caption generators
Authors	Konda Reddy Mopuri, Vishal B. Athreya, R. Venkatesh Babu
Abstract	Deep learning exploits large volumes of labeled data to learn powerful models. When the target dataset is small, it is a common practice to perform transfer learning using pre-trained models to learn new task specific representations. However, pre-trained CNNs for image recognition are provided with limited information about the image during training, namely the label alone. Tasks such as scene retrieval suffer from features learned from this weak supervision and require stronger supervision to better understand the contents of the image. In this paper, we exploit the features learned from caption generating models to learn novel task specific image representations. In particular, we consider the state-of-the-art captioning system Show and Tell and the dense region description model DenseCap. We demonstrate that, owing to richer supervision provided during the process of training, the features learned by the captioning system perform better than those of CNNs. Further, we train a siamese network with a modified pair-wise loss to fuse the features learned by Show and Tell and DenseCap and learn image representations suitable for retrieval. Experiments show that the proposed fusion exploits the complementary nature of the individual features and yields state-of-the-art retrieval results on benchmark datasets.
Tasks	Transfer Learning
Published	2017-05-25
URL	http://arxiv.org/abs/1705.09142v1
PDF	http://arxiv.org/pdf/1705.09142v1.pdf
PWC	https://paperswithcode.com/paper/deep-image-representations-using-caption
Repo
Framework

Nonparametric Inference for Auto-Encoding Variational Bayes


Title	Nonparametric Inference for Auto-Encoding Variational Bayes
Authors	Erik Bodin, Iman Malik, Carl Henrik Ek, Neill D. F. Campbell
Abstract	We would like to learn latent representations that are low-dimensional and highly interpretable. A model that has these characteristics is the Gaussian Process Latent Variable Model. The strengths and weaknesses of the GP-LVM are complementary to those of the Variational Autoencoder: the former provides interpretable low-dimensional latent representations, while the latter is able to handle large amounts of data and can use non-Gaussian likelihoods. Our inspiration for this paper is to marry these two approaches and reap the benefits of both. In order to do so we will introduce a novel approximate inference scheme inspired by the GP-LVM and the VAE. We show experimentally that the approximation allows the capacity of the generative bottleneck (Z) of the VAE to be arbitrarily large without losing a highly interpretable representation, so that reconstruction quality is not limited by Z, while a low-dimensional space can still be used for ancestral sampling and as a means to reason about the embedded data.
Tasks
Published	2017-12-18
URL	http://arxiv.org/abs/1712.06536v1
PDF	http://arxiv.org/pdf/1712.06536v1.pdf
PWC	https://paperswithcode.com/paper/nonparametric-inference-for-auto-encoding
Repo
Framework

Densely tracking sequences of 3D face scans


Title	Densely tracking sequences of 3D face scans
Authors	Huaxiong Ding, Liming Chen
Abstract	3D face dense tracking aims to find dense inter-frame correspondences in a sequence of 3D face scans and constitutes a powerful tool for many face analysis tasks, e.g., 3D dynamic facial expression analysis. The majority of the existing methods just fit a 3D face surface or model to a 3D target surface without considering temporal information between frames. In this paper, we propose a novel method for densely tracking sequences of 3D face scans, which extends the non-rigid ICP algorithm by adding a novel specific criterion for temporal information. A novel fitting framework is presented for automatically tracking a full sequence of 3D face scans. The results of experiments carried out on the BU4D-FE database are promising, showing that the proposed algorithm outperforms state-of-the-art algorithms for 3D face dense tracking.
Tasks
Published	2017-09-13
URL	http://arxiv.org/abs/1709.04295v1
PDF	http://arxiv.org/pdf/1709.04295v1.pdf
PWC	https://paperswithcode.com/paper/densely-tracking-sequences-of-3d-face-scans
Repo
Framework

Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters


Title	Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
Authors	Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Eric P. Xing
Abstract	Deep learning models can take weeks to train on a single GPU-equipped machine, necessitating scaling out DL training to a GPU-cluster. However, current distributed DL implementations can scale poorly due to substantial parameter synchronization over the network, because the high throughput of GPUs allows more data batches to be processed per unit time than CPUs, leading to more frequent network synchronization. We present Poseidon, an efficient communication architecture for distributed DL on GPUs. Poseidon exploits the layered model structures in DL programs to overlap communication and computation, reducing bursty network communication. Moreover, Poseidon uses a hybrid communication scheme that optimizes the number of bytes required to synchronize each layer, according to layer properties and the number of machines. We show that Poseidon is applicable to different DL frameworks by plugging Poseidon into Caffe and TensorFlow. We show that Poseidon enables Caffe and TensorFlow to achieve 15.5x speed-up on 16 single-GPU machines, even with limited bandwidth (10GbE) and the challenging VGG19-22K network for image classification. Moreover, Poseidon-enabled TensorFlow achieves 31.5x speed-up with 32 single-GPU machines on Inception-V3, a 50% improvement over the open-source TensorFlow (20x speed-up).
Tasks	Image Classification
Published	2017-06-11
URL	http://arxiv.org/abs/1706.03292v1
PDF	http://arxiv.org/pdf/1706.03292v1.pdf
PWC	https://paperswithcode.com/paper/poseidon-an-efficient-communication
Repo
Framework
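
The speed-up figures quoted in the abstract above can be read as scaling efficiencies (speed-up divided by the number of machines). The short sketch below only does that arithmetic on the numbers stated in the abstract; the function name and the dictionary labels are illustrative, not taken from the paper.

```python
def scaling_efficiency(speedup, machines):
    """Fraction of ideal linear scaling achieved (1.0 means perfectly linear)."""
    return speedup / machines


# (speed-up, number of single-GPU machines) as quoted in the abstract.
reported = {
    "Poseidon, 16 machines, VGG19-22K": (15.5, 16),
    "Poseidon TensorFlow, 32 machines, Inception-V3": (31.5, 32),
    "Open-source TensorFlow, 32 machines, Inception-V3": (20.0, 32),
}

efficiencies = {name: scaling_efficiency(s, m) for name, (s, m) in reported.items()}
for name, eff in efficiencies.items():
    print(f"{name}: {eff:.1%} of linear scaling")
```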

Probabilistic learning of nonlinear dynamical systems using sequential Monte Carlo


Title	Probabilistic learning of nonlinear dynamical systems using sequential Monte Carlo
Authors	Thomas B. Schön, Andreas Svensson, Lawrence Murray, Fredrik Lindsten
Abstract	Probabilistic modeling provides the capability to represent and manipulate uncertainty in data, models, predictions and decisions. We are concerned with the problem of learning probabilistic models of dynamical systems from measured data. Specifically, we consider learning of probabilistic nonlinear state-space models. There is no closed-form solution available for this problem, implying that we are forced to use approximations. In this tutorial we will provide a self-contained introduction to one of the state-of-the-art methods—the particle Metropolis–Hastings algorithm—which has proven to offer a practical approximation. This is a Monte Carlo based method, where the particle filter is used to guide a Markov chain Monte Carlo method through the parameter space. One of the key merits of the particle Metropolis–Hastings algorithm is that it is guaranteed to converge to the “true solution” under mild assumptions, despite being based on a particle filter with only a finite number of particles. We will also provide a motivating numerical example illustrating the method using a modeling language tailored for sequential Monte Carlo methods. The intention of modeling languages of this kind is to open up the power of sophisticated Monte Carlo methods—including particle Metropolis–Hastings—to a large group of users without requiring them to know all the underlying mathematical details.
Tasks
Published	2017-03-07
URL	http://arxiv.org/abs/1703.02419v2
PDF	http://arxiv.org/pdf/1703.02419v2.pdf
PWC	https://paperswithcode.com/paper/probabilistic-learning-of-nonlinear-dynamical
Repo
Framework
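
To make the idea in the abstract above concrete, here is a minimal sketch of particle Metropolis-Hastings in plain Python. It is not the tutorial's code and does not use the modeling language the abstract mentions. For brevity it assumes a toy linear Gaussian state-space model with a single unknown parameter `theta`, a bootstrap particle filter, a flat prior on (-1, 1), and a Gaussian random-walk proposal; all of these choices and all names are assumptions made for illustration.

```python
import math
import random


def simulate(theta, T, rng):
    """Simulate x_t = theta * x_{t-1} + v_t, y_t = x_t + e_t (unit-variance noise)."""
    x, ys = 0.0, []
    for _ in range(T):
        x = theta * x + rng.gauss(0.0, 1.0)
        ys.append(x + rng.gauss(0.0, 1.0))
    return ys


def log_normal_pdf(y, mean, std):
    return -0.5 * math.log(2.0 * math.pi * std * std) - 0.5 * ((y - mean) / std) ** 2


def particle_loglik(theta, ys, n_particles, rng):
    """Bootstrap particle filter estimate of log p(y_1..y_T | theta)."""
    particles = [0.0] * n_particles
    loglik = 0.0
    for y in ys:
        # Propagate each particle through the state dynamics.
        particles = [theta * x + rng.gauss(0.0, 1.0) for x in particles]
        # Weight by the measurement likelihood (shifted by the max for stability).
        logw = [log_normal_pdf(y, x, 1.0) for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        loglik += m + math.log(sum(w) / n_particles)
        # Multinomial resampling according to the weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik


def particle_mh(ys, n_iters, n_particles, step, rng, theta0=0.0):
    """Random-walk particle Metropolis-Hastings with a flat prior on (-1, 1)."""
    theta = theta0
    ll = particle_loglik(theta, ys, n_particles, rng)
    chain = [theta]
    for _ in range(n_iters):
        proposal = theta + rng.gauss(0.0, step)
        if abs(proposal) < 1.0:  # proposals outside the prior support are rejected
            ll_prop = particle_loglik(proposal, ys, n_particles, rng)
            # Accept with probability min(1, exp(ll_prop - ll)).
            if math.log(1.0 - rng.random()) < ll_prop - ll:
                theta, ll = proposal, ll_prop
        chain.append(theta)
    return chain


rng = random.Random(0)
observations = simulate(theta=0.7, T=30, rng=rng)
chain = particle_mh(observations, n_iters=200, n_particles=50, step=0.1, rng=rng)
posterior_mean = sum(chain[100:]) / len(chain[100:])
print(f"posterior mean of theta after burn-in: {posterior_mean:.2f}")
```

The key point the sketch tries to show is that the noisy likelihood estimate from the particle filter is plugged into an otherwise standard Metropolis-Hastings acceptance step, and the estimate for the current state is reused rather than recomputed.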

Real-time 3D Shape Instantiation from Single Fluoroscopy Projection for Fenestrated Stent Graft Deployment


Title	Real-time 3D Shape Instantiation from Single Fluoroscopy Projection for Fenestrated Stent Graft Deployment
Authors	Xiao-Yun Zhou, Jianyu Lin, Celia Riga, Guang-Zhong Yang, Su-Lin Lee
Abstract	Robot-assisted deployment of fenestrated stent grafts in Fenestrated Endovascular Aortic Repair (FEVAR) requires accurate geometrical alignment. Currently, this process is guided by 2D fluoroscopy, which is uninformative and error prone. In this paper, a real-time framework is proposed to instantiate the 3D shape of a fenestrated stent graft based on only a single low-dose 2D fluoroscopic image. Firstly, the fenestrated stent graft was placed with markers. Secondly, the 3D pose of each stent segment was instantiated by the RPnP (Robust Perspective-n-Point) method. Thirdly, the 3D shape of the whole stent graft was instantiated via graft gap interpolation. Focal-Unet was proposed to segment the markers from 2D fluoroscopic images to achieve semi-automatic marker detection. The proposed framework was validated on five patient-specific 3D printed phantoms of aortic aneurysms and three stent grafts with new marker placements, showing an average distance error of 1-3 mm and an average angle error of 4 degrees.
Tasks
Published	2017-09-22
URL	http://arxiv.org/abs/1709.07689v2
PDF	http://arxiv.org/pdf/1709.07689v2.pdf
PWC	https://paperswithcode.com/paper/real-time-3d-shape-instantiation-from-single
Repo
Framework

Derivation of the Asymptotic Eigenvalue Distribution for Causal 2D-AR Models under Upscaling


Title	Derivation of the Asymptotic Eigenvalue Distribution for Causal 2D-AR Models under Upscaling
Authors	David Vázquez-Padín, Fernando Pérez-González, Pedro Comesaña-Alfaro
Abstract	This technical report describes the derivation of the asymptotic eigenvalue distribution for causal 2D-AR models under an upscaling scenario. Specifically, it tackles the analytical derivation of the asymptotic eigenvalue distribution of the sample autocorrelation matrix corresponding to genuine and upscaled images. It also includes the pseudocode of the derived approaches for resampling detection and resampling factor estimation that are based on this analysis.
Tasks
Published	2017-04-19
URL	http://arxiv.org/abs/1704.05773v1
PDF	http://arxiv.org/pdf/1704.05773v1.pdf
PWC	https://paperswithcode.com/paper/derivation-of-the-asymptotic-eigenvalue
Repo
Framework

Sentiment Recognition in Egocentric Photostreams


Title	Sentiment Recognition in Egocentric Photostreams
Authors	Estefania Talavera, Nicola Strisciuglio, Nicolai Petkov, Petia Radeva
Abstract	Lifelogging is a process of collecting a rich source of information about people's daily lives. In this paper, we introduce the problem of sentiment analysis in egocentric events, focusing on the moments that compose the images recalling positive, neutral or negative feelings to the observer. We propose a method for the classification of the sentiments in egocentric pictures based on global and semantic image features extracted by Convolutional Neural Networks. We carried out experiments on an egocentric dataset, which we organized in 3 classes on the basis of the sentiment that is recalled to the user (positive, negative or neutral).
Tasks	Sentiment Analysis
Published	2017-03-29
URL	http://arxiv.org/abs/1703.09933v1
PDF	http://arxiv.org/pdf/1703.09933v1.pdf
PWC	https://paperswithcode.com/paper/sentiment-recognition-in-egocentric
Repo
Framework

Long-Term Memory Networks for Question Answering


Title	Long-Term Memory Networks for Question Answering
Authors	Fenglong Ma, Radha Chitta, Saurabh Kataria, Jing Zhou, Palghat Ramesh, Tong Sun, Jing Gao
Abstract	Question answering is an important and difficult task in the natural language processing domain, because many basic natural language processing tasks can be cast into a question answering task. Several deep neural network architectures have been developed recently, which employ memory and inference components to memorize and reason over text information, and generate answers to questions. However, a major drawback of many such models is that they are capable of only generating single-word answers. In addition, they require a large amount of training data to generate accurate answers. In this paper, we introduce the Long-Term Memory Network (LTMN), which incorporates both an external memory module and a Long Short-Term Memory (LSTM) module to comprehend the input data and generate multi-word answers. The LTMN model can be trained end-to-end using back-propagation and requires minimal supervision. We test our model on two synthetic data sets (based on Facebook’s bAbI data set) and the real-world Stanford question answering data set, and show that it can achieve state-of-the-art performance.
Tasks	Question Answering
Published	2017-07-06
URL	http://arxiv.org/abs/1707.01961v1
PDF	http://arxiv.org/pdf/1707.01961v1.pdf
PWC	https://paperswithcode.com/paper/long-term-memory-networks-for-question
Repo
Framework

How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval


Title	How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval
Authors	Rodrigo Toro Icarte, Jorge A. Baier, Cristian Ruz, Alvaro Soto
Abstract	The knowledge representation community has built general-purpose ontologies which contain large amounts of commonsense knowledge over relevant aspects of the world, including useful visual information, e.g.: “a ball is used by a football player”, “a tennis player is located at a tennis court”. Current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. Instead, they learn recognition models directly from training examples. In this paper, we study how general-purpose ontologies—specifically, MIT’s ConceptNet ontology—can improve the performance of state-of-the-art vision systems. As a testbed, we tackle the problem of sentence-based image retrieval. Our retrieval approach incorporates knowledge from ConceptNet on top of a large pool of object detectors derived from a deep learning technique. In our experiments, we show that ConceptNet can improve performance on a common benchmark dataset. Key to our performance is the use of the ESPGAME dataset to select visually relevant relations from ConceptNet. Consequently, a main conclusion of this work is that general-purpose commonsense ontologies improve performance on visual reasoning tasks when properly filtered to select meaningful visual relations.
Tasks	Image Retrieval, Visual Reasoning
Published	2017-05-24
URL	http://arxiv.org/abs/1705.08844v1
PDF	http://arxiv.org/pdf/1705.08844v1.pdf
PWC	https://paperswithcode.com/paper/how-a-general-purpose-commonsense-ontology
Repo
Framework

An Iterative Regression Approach for Face Pose Estimation from RGB Images


Title	An Iterative Regression Approach for Face Pose Estimation from RGB Images
Authors	Wenye He
Abstract	This paper presents an iterative optimization method, explicit shape regression (ESR), for face pose detection and localization. The regression function is learnt to find the entire facial shape and minimize the alignment errors. A cascaded learning framework is employed to enhance the shape constraint during detection. A combination of a two-level boosted regression, shape indexed features and a correlation-based feature selection method is used to improve the performance. In this paper, we explain the advantage of ESR for deformable objects, as in face pose estimation, and show the generic applications of the method. In the experiment, we compare the results with other work and demonstrate the accuracy and robustness in different scenarios.
Tasks	Feature Selection, Pose Estimation
Published	2017-09-10
URL	http://arxiv.org/abs/1709.03170v1
PDF	http://arxiv.org/pdf/1709.03170v1.pdf
PWC	https://paperswithcode.com/paper/an-iterative-regression-approach-for-face
Repo
Framework

TraX: The visual Tracking eXchange Protocol and Library


Title	TraX: The visual Tracking eXchange Protocol and Library
Authors	Luka Čehovin
Abstract	In this paper we address the problem of developing on-line visual tracking algorithms. We present a specialized communication protocol that serves as a bridge between a tracker implementation and the application that uses it. It decouples development of algorithms and application, encouraging re-usability. The primary use case is algorithm evaluation, where the protocol facilitates more complex evaluation scenarios than are used nowadays, thus pushing forward the field of visual tracking. We present a reference implementation of the protocol that makes it easy to use in several popular programming languages and discuss where the protocol is already used and some usage scenarios that we envision for the future.
Tasks	Visual Tracking
Published	2017-05-12
URL	http://arxiv.org/abs/1705.04469v1
PDF	http://arxiv.org/pdf/1705.04469v1.pdf
PWC	https://paperswithcode.com/paper/trax-the-visual-tracking-exchange-protocol
Repo
Framework

Ω-Net (Omega-Net): Fully Automatic, Multi-View Cardiac MR Detection, Orientation, and Segmentation with Deep Neural Networks


Title	Ω-Net (Omega-Net): Fully Automatic, Multi-View Cardiac MR Detection, Orientation, and Segmentation with Deep Neural Networks
Authors	Davis M. Vigneault, Weidi Xie, Carolyn Y. Ho, David A. Bluemke, J. Alison Noble
Abstract	Pixelwise segmentation of the left ventricular (LV) myocardium and the four cardiac chambers in 2-D steady state free precession (SSFP) cine sequences is an essential preprocessing step for a wide range of analyses. Variability in contrast, appearance, orientation, and placement of the heart between patients, clinical views, scanners, and protocols makes fully automatic semantic segmentation a notoriously difficult problem. Here, we present Ω-Net (Omega-Net): a novel convolutional neural network (CNN) architecture for simultaneous localization, transformation into a canonical orientation, and semantic segmentation. First, an initial segmentation is performed on the input image; second, the features learned during this initial segmentation are used to predict the parameters needed to transform the input image into a canonical orientation; and third, a final segmentation is performed on the transformed image. In this work, Ω-Nets of varying depths were trained to detect five foreground classes in any of three clinical views (short axis, SA; four-chamber, 4C; two-chamber, 2C), without prior knowledge of the view being segmented. The architecture was trained on a cohort of patients with hypertrophic cardiomyopathy and healthy control subjects. Network performance as measured by weighted foreground intersection-over-union (IoU) was substantially improved in the best-performing Ω-Net compared with U-Net segmentation without localization or orientation. In addition, Ω-Net was retrained from scratch on the 2017 MICCAI ACDC dataset, and achieved state-of-the-art results on the LV and RV bloodpools, while performing slightly worse in segmentation of the LV myocardium. We conclude this architecture represents a substantive advancement over prior approaches, with implications for biomedical image segmentation more generally.
Tasks	Semantic Segmentation
Published	2017-11-03
URL	http://arxiv.org/abs/1711.01094v3
PDF	http://arxiv.org/pdf/1711.01094v3.pdf
PWC	https://paperswithcode.com/paper/-net-omega-net-fully-automatic-multi-view
Repo
Framework
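
The abstract above reports performance as weighted foreground intersection-over-union (IoU) but does not say how the classes are weighted. The sketch below is one plausible reading, not the authors' definition: it computes per-class IoU on flattened label maps and averages the foreground classes, weighting each by its ground-truth pixel count. The weighting scheme, the function names, and the toy label arrays are assumptions.

```python
def class_iou(pred, truth, label):
    """IoU for one class label over two flat, equal-length label sequences."""
    inter = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    union = sum(1 for p, t in zip(pred, truth) if p == label or t == label)
    return inter / union if union else float("nan")


def weighted_foreground_iou(pred, truth, foreground_labels):
    """Average per-class IoU, weighted by each class's ground-truth pixel count."""
    total_weight, weighted_sum = 0, 0.0
    for label in foreground_labels:
        weight = sum(1 for t in truth if t == label)
        if weight == 0:
            continue  # class absent from the ground truth: skip it
        weighted_sum += weight * class_iou(pred, truth, label)
        total_weight += weight
    return weighted_sum / total_weight if total_weight else float("nan")


# Toy flattened label maps: 0 is background, 1 and 2 are foreground classes.
truth = [0, 0, 1, 1, 1, 2, 2, 0]
pred = [0, 1, 1, 1, 0, 2, 2, 2]
score = weighted_foreground_iou(pred, truth, foreground_labels=[1, 2])
print(round(score, 4))
```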