Paper Group ANR 137
A practical approach to dialogue response generation in closed domains
Title | A practical approach to dialogue response generation in closed domains |
Authors | Yichao Lu, Phillip Keung, Shaonan Zhang, Jason Sun, Vikas Bhardwaj |
Abstract | We describe a prototype dialogue response generation model for the customer service domain at Amazon. The model, which is trained in a weakly supervised fashion, measures the similarity between customer questions and agent answers using a dual encoder network, a Siamese-like neural network architecture. Answer templates are extracted from embeddings derived from past agent answers, without turn-by-turn annotations. Responses to customer inquiries are generated by selecting the best template from the final set of templates. We show that, in a closed domain like customer service, the selected templates cover >70% of past customer inquiries. Furthermore, the relevance of the model-selected templates is significantly higher than that of templates selected by a standard tf-idf baseline. |
Tasks | |
Published | 2017-03-28 |
URL | http://arxiv.org/abs/1703.09439v1 |
http://arxiv.org/pdf/1703.09439v1.pdf | |
PWC | https://paperswithcode.com/paper/a-practical-approach-to-dialogue-response |
Repo | |
Framework | |
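The abstract above compares against a standard tf-idf baseline for template selection. A minimal sketch of such a baseline follows, assuming whitespace tokenization, a smoothed idf, and cosine similarity; the `TEMPLATES` list and all function names are hypothetical illustrations, and this is not the authors' dual encoder model.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for a real tokenizer."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (a dict) per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed idf (an assumption): terms found in every document keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_template(question, templates):
    """Pick the template whose tf-idf vector is closest to the question."""
    vectors, idf = tfidf_vectors(templates)
    # Question terms that never occur in any template carry no weight.
    q_vec = {t: c * idf[t] for t, c in Counter(tokenize(question)).items() if t in idf}
    scores = [cosine(q_vec, v) for v in vectors]
    best = max(range(len(templates)), key=scores.__getitem__)
    return templates[best], scores[best]


# Hypothetical answer templates, for illustration only.
TEMPLATES = [
    "your refund will be issued within five business days",
    "your package is delayed and will arrive tomorrow",
    "please reset your password from the account settings page",
]

if __name__ == "__main__":
    print(select_template("where is my package", TEMPLATES))
```

A lexical baseline like this can only match on shared words, which is presumably why a learned similarity between questions and answers can select more relevant templates.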
Classification Accuracy Improvement for Neuromorphic Computing Systems with One-level Precision Synapses
Title | Classification Accuracy Improvement for Neuromorphic Computing Systems with One-level Precision Synapses |
Authors | Yandan Wang, Wei Wen, Linghao Song, Hai Li |
Abstract | Brain-inspired neuromorphic computing has demonstrated remarkable advantages over traditional von Neumann architecture for its high energy efficiency and parallel data processing. However, the limited resolution of synaptic weights degrades system accuracy and thus impedes the use of neuromorphic systems. In this work, we propose three orthogonal methods to learn synapses with one-level precision, namely, distribution-aware quantization, quantization regularization and bias tuning, to make image classification accuracy comparable to the state-of-the-art. Experiments on both multi-layer perceptron and convolutional neural networks show that the accuracy drop can be well controlled within 0.19% (5.53%) for the MNIST (CIFAR-10) database, compared to an ideal system without quantization. |
Tasks | Image Classification, Quantization |
Published | 2017-01-07 |
URL | http://arxiv.org/abs/1701.01791v1 |
http://arxiv.org/pdf/1701.01791v1.pdf | |
PWC | https://paperswithcode.com/paper/classification-accuracy-improvement-for |
Repo | |
Framework | |
Deep image representations using caption generators
Title | Deep image representations using caption generators |
Authors | Konda Reddy Mopuri, Vishal B. Athreya, R. Venkatesh Babu |
Abstract | Deep learning exploits large volumes of labeled data to learn powerful models. When the target dataset is small, it is a common practice to perform transfer learning using pre-trained models to learn new task specific representations. However, pre-trained CNNs for image recognition are provided with limited information about the image during training, namely the label alone. Tasks such as scene retrieval suffer from features learned from this weak supervision and require stronger supervision to better understand the contents of the image. In this paper, we exploit the features learned from caption generating models to learn novel task specific image representations. In particular, we consider the state-of-the-art captioning system Show and Tell and the dense region description model DenseCap. We demonstrate that, owing to richer supervision provided during the process of training, the features learned by the captioning system perform better than those of CNNs. Further, we train a siamese network with a modified pair-wise loss to fuse the features learned by Show and Tell and DenseCap and learn image representations suitable for retrieval. Experiments show that the proposed fusion exploits the complementary nature of the individual features and yields state-of-the-art retrieval results on benchmark datasets. |
Tasks | Transfer Learning |
Published | 2017-05-25 |
URL | http://arxiv.org/abs/1705.09142v1 |
http://arxiv.org/pdf/1705.09142v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-image-representations-using-caption |
Repo | |
Framework | |
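The abstract above mentions a siamese network trained with a modified pair-wise loss, but does not say how the loss is modified. As a point of reference only, here is a minimal sketch of one common unmodified form, the contrastive loss on a pair of embeddings; the margin value and function names are assumptions, not taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, similar, margin=1.0):
    """Standard contrastive loss for one pair of embeddings.

    Similar pairs are pulled together (loss d^2); dissimilar pairs are
    pushed apart until they are at least `margin` away (loss max(0, m - d)^2).
    """
    d = euclidean(u, v)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    # A dissimilar pair that is already farther apart than the margin costs nothing.
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=False, margin=1.0))
```

In a siamese setup both inputs pass through the same network, so a loss of this kind shapes a single embedding space in which distance reflects relevance for retrieval.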
Nonparametric Inference for Auto-Encoding Variational Bayes
Title | Nonparametric Inference for Auto-Encoding Variational Bayes |
Authors | Erik Bodin, Iman Malik, Carl Henrik Ek, Neill D. F. Campbell |
Abstract | We would like to learn latent representations that are low-dimensional and highly interpretable. A model that has these characteristics is the Gaussian Process Latent Variable Model. The benefits and drawbacks of the GP-LVM are complementary to those of the Variational Autoencoder: the former provides interpretable low-dimensional latent representations while the latter is able to handle large amounts of data and can use non-Gaussian likelihoods. Our inspiration for this paper is to marry these two approaches and reap the benefits of both. In order to do so we will introduce a novel approximate inference scheme inspired by the GP-LVM and the VAE. We show experimentally that the approximation allows the capacity of the generative bottleneck (Z) of the VAE to be arbitrarily large without losing a highly interpretable representation, so that reconstruction quality is not limited by Z, while a low-dimensional space can still be used for ancestral sampling and as a means to reason about the embedded data. |
Tasks | |
Published | 2017-12-18 |
URL | http://arxiv.org/abs/1712.06536v1 |
http://arxiv.org/pdf/1712.06536v1.pdf | |
PWC | https://paperswithcode.com/paper/nonparametric-inference-for-auto-encoding |
Repo | |
Framework | |
Densely tracking sequences of 3D face scans
Title | Densely tracking sequences of 3D face scans |
Authors | Huaxiong Ding, Liming Chen |
Abstract | 3D face dense tracking aims to find dense inter-frame correspondences in a sequence of 3D face scans and constitutes a powerful tool for many face analysis tasks, e.g., 3D dynamic facial expression analysis. The majority of the existing methods just fit a 3D face surface or model to a 3D target surface without considering temporal information between frames. In this paper, we propose a novel method for densely tracking sequences of 3D face scans, which extends the non-rigid ICP algorithm by adding a novel specific criterion for temporal information. A novel fitting framework is presented for automatically tracking a full sequence of 3D face scans. The results of experiments carried out on the BU4D-FE database are promising, showing that the proposed algorithm outperforms state-of-the-art algorithms for 3D face dense tracking. |
Tasks | |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04295v1 |
http://arxiv.org/pdf/1709.04295v1.pdf | |
PWC | https://paperswithcode.com/paper/densely-tracking-sequences-of-3d-face-scans |
Repo | |
Framework | |
Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
Title | Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters |
Authors | Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Eric P. Xing |
Abstract | Deep learning models can take weeks to train on a single GPU-equipped machine, necessitating scaling out DL training to a GPU-cluster. However, current distributed DL implementations can scale poorly due to substantial parameter synchronization over the network, because the high throughput of GPUs allows more data batches to be processed per unit time than CPUs, leading to more frequent network synchronization. We present Poseidon, an efficient communication architecture for distributed DL on GPUs. Poseidon exploits the layered model structures in DL programs to overlap communication and computation, reducing bursty network communication. Moreover, Poseidon uses a hybrid communication scheme that optimizes the number of bytes required to synchronize each layer, according to layer properties and the number of machines. We show that Poseidon is applicable to different DL frameworks by plugging Poseidon into Caffe and TensorFlow. We show that Poseidon enables Caffe and TensorFlow to achieve 15.5x speed-up on 16 single-GPU machines, even with limited bandwidth (10GbE) and the challenging VGG19-22K network for image classification. Moreover, Poseidon-enabled TensorFlow achieves 31.5x speed-up with 32 single-GPU machines on Inception-V3, a 50% improvement over the open-source TensorFlow (20x speed-up). |
Tasks | Image Classification |
Published | 2017-06-11 |
URL | http://arxiv.org/abs/1706.03292v1 |
http://arxiv.org/pdf/1706.03292v1.pdf | |
PWC | https://paperswithcode.com/paper/poseidon-an-efficient-communication |
Repo | |
Framework | |
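The speed-up figures quoted in the abstract above can be turned into scaling efficiency, the fraction of ideal linear speed-up actually achieved. The short sketch below does only that arithmetic; the function name and dictionary labels are illustrative, and the numbers are the ones reported in the abstract.

```python
def scaling_efficiency(speedup, machines):
    """Fraction of ideal linear speed-up achieved on `machines` nodes."""
    return speedup / machines


# (speed-up, number of single-GPU machines) as reported in the abstract.
REPORTED = {
    "Poseidon, 16 machines (VGG19-22K)": (15.5, 16),
    "Poseidon TensorFlow, 32 machines (Inception-V3)": (31.5, 32),
    "Open-source TensorFlow, 32 machines (Inception-V3)": (20.0, 32),
}

if __name__ == "__main__":
    for name, (speedup, machines) in REPORTED.items():
        print(f"{name}: {scaling_efficiency(speedup, machines):.1%} of linear")
```

Read this way, the Poseidon configurations stay close to linear scaling, while the open-source TensorFlow figure corresponds to 62.5% of linear on 32 machines.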
Probabilistic learning of nonlinear dynamical systems using sequential Monte Carlo
Title | Probabilistic learning of nonlinear dynamical systems using sequential Monte Carlo |
Authors | Thomas B. Schön, Andreas Svensson, Lawrence Murray, Fredrik Lindsten |
Abstract | Probabilistic modeling provides the capability to represent and manipulate uncertainty in data, models, predictions and decisions. We are concerned with the problem of learning probabilistic models of dynamical systems from measured data. Specifically, we consider learning of probabilistic nonlinear state-space models. There is no closed-form solution available for this problem, implying that we are forced to use approximations. In this tutorial we will provide a self-contained introduction to one of the state-of-the-art methods—the particle Metropolis–Hastings algorithm—which has proven to offer a practical approximation. This is a Monte Carlo based method, where the particle filter is used to guide a Markov chain Monte Carlo method through the parameter space. One of the key merits of the particle Metropolis–Hastings algorithm is that it is guaranteed to converge to the “true solution” under mild assumptions, despite being based on a particle filter with only a finite number of particles. We will also provide a motivating numerical example illustrating the method using a modeling language tailored for sequential Monte Carlo methods. The intention of modeling languages of this kind is to open up the power of sophisticated Monte Carlo methods—including particle Metropolis–Hastings—to a large group of users without requiring them to know all the underlying mathematical details. |
Tasks | |
Published | 2017-03-07 |
URL | http://arxiv.org/abs/1703.02419v2 |
http://arxiv.org/pdf/1703.02419v2.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-learning-of-nonlinear-dynamical |
Repo | |
Framework | |
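The particle Metropolis-Hastings idea described in the abstract above can be sketched in a few dozen lines: a bootstrap particle filter supplies a noisy estimate of the likelihood, and that estimate is used inside an ordinary Metropolis-Hastings accept/reject step. The sketch below makes several assumptions not found in the abstract: a scalar linear Gaussian model x_t = phi * x_{t-1} + v_t, y_t = x_t + e_t, a uniform prior on phi over (-1, 1), a Gaussian random-walk proposal, and multinomial resampling. All names are hypothetical, and this is not the modeling language the tutorial uses.

```python
import math
import random


def simulate(phi, sigma_v, sigma_e, n_steps, rng):
    """Simulate observations from x_t = phi*x_{t-1} + v_t, y_t = x_t + e_t."""
    x, ys = 0.0, []
    for _ in range(n_steps):
        x = phi * x + rng.gauss(0.0, sigma_v)
        ys.append(x + rng.gauss(0.0, sigma_e))
    return ys


def log_norm_pdf(y, mean, sigma):
    """Log density of N(mean, sigma^2) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((y - mean) / sigma) ** 2


def particle_filter_loglik(ys, phi, sigma_v, sigma_e, n_particles, rng):
    """Bootstrap particle filter estimate of log p(y_1..y_T | phi)."""
    particles = [0.0] * n_particles
    loglik = 0.0
    for y in ys:
        # Propagate every particle through the state dynamics.
        particles = [phi * x + rng.gauss(0.0, sigma_v) for x in particles]
        logw = [log_norm_pdf(y, x, sigma_e) for x in particles]
        max_logw = max(logw)  # subtract the maximum for numerical stability
        weights = [math.exp(lw - max_logw) for lw in logw]
        # Add the log of the average unnormalised weight for this time step.
        loglik += max_logw + math.log(sum(weights) / n_particles)
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return loglik


def particle_metropolis_hastings(ys, n_iters, n_particles, step, rng,
                                 sigma_v=1.0, sigma_e=1.0, phi_init=0.0):
    """Random-walk PMH chain for phi under a uniform prior on (-1, 1)."""
    phi = phi_init
    loglik = particle_filter_loglik(ys, phi, sigma_v, sigma_e, n_particles, rng)
    chain = []
    for _ in range(n_iters):
        proposal = phi + rng.gauss(0.0, step)
        if -1.0 < proposal < 1.0:  # outside the prior support: reject outright
            prop_loglik = particle_filter_loglik(
                ys, proposal, sigma_v, sigma_e, n_particles, rng)
            # Symmetric proposal and flat prior: compare likelihood estimates only.
            # The stored estimate for the current state is reused, not recomputed.
            if math.log(1.0 - rng.random()) < prop_loglik - loglik:
                phi, loglik = proposal, prop_loglik
        chain.append(phi)
    return chain


if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate(phi=0.7, sigma_v=1.0, sigma_e=1.0, n_steps=50, rng=rng)
    samples = particle_metropolis_hastings(data, n_iters=200, n_particles=50,
                                           step=0.1, rng=rng)
    print(sum(samples[100:]) / len(samples[100:]))
```

The convergence property the abstract refers to rests on the particle filter giving an unbiased estimate of the likelihood (not of its logarithm), which is why the estimate attached to the current state must be kept rather than refreshed at every iteration.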
Real-time 3D Shape Instantiation from Single Fluoroscopy Projection for Fenestrated Stent Graft Deployment
Title | Real-time 3D Shape Instantiation from Single Fluoroscopy Projection for Fenestrated Stent Graft Deployment |
Authors | Xiao-Yun Zhou, Jianyu Lin, Celia Riga, Guang-Zhong Yang, Su-Lin Lee |
Abstract | Robot-assisted deployment of fenestrated stent grafts in Fenestrated Endovascular Aortic Repair (FEVAR) requires accurate geometrical alignment. Currently, this process is guided by 2D fluoroscopy, which is uninformative and error-prone. In this paper, a real-time framework is proposed to instantiate the 3D shape of a fenestrated stent graft based on only a single low-dose 2D fluoroscopic image. Firstly, the fenestrated stent graft was placed with markers. Secondly, the 3D pose of each stent segment was instantiated by the RPnP (Robust Perspective-n-Point) method. Thirdly, the 3D shape of the whole stent graft was instantiated via graft gap interpolation. Focal-Unet was proposed to segment the markers from 2D fluoroscopic images to achieve semi-automatic marker detection. The proposed framework was validated on five patient-specific 3D printed phantoms of aortic aneurysms and three stent grafts with new marker placements, showing an average distance error of 1-3 mm and an average angle error of 4 degrees. |
Tasks | |
Published | 2017-09-22 |
URL | http://arxiv.org/abs/1709.07689v2 |
http://arxiv.org/pdf/1709.07689v2.pdf | |
PWC | https://paperswithcode.com/paper/real-time-3d-shape-instantiation-from-single |
Repo | |
Framework | |
Derivation of the Asymptotic Eigenvalue Distribution for Causal 2D-AR Models under Upscaling
Title | Derivation of the Asymptotic Eigenvalue Distribution for Causal 2D-AR Models under Upscaling |
Authors | David Vázquez-Padín, Fernando Pérez-González, Pedro Comesaña-Alfaro |
Abstract | This technical report describes the derivation of the asymptotic eigenvalue distribution for causal 2D-AR models under an upscaling scenario. Specifically, it tackles the analytical derivation of the asymptotic eigenvalue distribution of the sample autocorrelation matrix corresponding to genuine and upscaled images. It also includes the pseudocode of the derived approaches for resampling detection and resampling factor estimation that are based on this analysis. |
Tasks | |
Published | 2017-04-19 |
URL | http://arxiv.org/abs/1704.05773v1 |
http://arxiv.org/pdf/1704.05773v1.pdf | |
PWC | https://paperswithcode.com/paper/derivation-of-the-asymptotic-eigenvalue |
Repo | |
Framework | |
Sentiment Recognition in Egocentric Photostreams
Title | Sentiment Recognition in Egocentric Photostreams |
Authors | Estefania Talavera, Nicola Strisciuglio, Nicolai Petkov, Petia Radeva |
Abstract | Lifelogging is a process of collecting a rich source of information about people's daily lives. In this paper, we introduce the problem of sentiment analysis in egocentric events, focusing on the moments whose images recall positive, neutral or negative feelings to the observer. We propose a method for the classification of the sentiments in egocentric pictures based on global and semantic image features extracted by Convolutional Neural Networks. We carried out experiments on an egocentric dataset, which we organized in 3 classes on the basis of the sentiment that is recalled to the user (positive, negative or neutral). |
Tasks | Sentiment Analysis |
Published | 2017-03-29 |
URL | http://arxiv.org/abs/1703.09933v1 |
http://arxiv.org/pdf/1703.09933v1.pdf | |
PWC | https://paperswithcode.com/paper/sentiment-recognition-in-egocentric |
Repo | |
Framework | |
Long-Term Memory Networks for Question Answering
Title | Long-Term Memory Networks for Question Answering |
Authors | Fenglong Ma, Radha Chitta, Saurabh Kataria, Jing Zhou, Palghat Ramesh, Tong Sun, Jing Gao |
Abstract | Question answering is an important and difficult task in the natural language processing domain, because many basic natural language processing tasks can be cast into a question answering task. Several deep neural network architectures have been developed recently, which employ memory and inference components to memorize and reason over text information, and generate answers to questions. However, a major drawback of many such models is that they are capable of generating only single-word answers. In addition, they require large amounts of training data to generate accurate answers. In this paper, we introduce the Long-Term Memory Network (LTMN), which incorporates both an external memory module and a Long Short-Term Memory (LSTM) module to comprehend the input data and generate multi-word answers. The LTMN model can be trained end-to-end using back-propagation and requires minimal supervision. We test our model on two synthetic data sets (based on Facebook’s bAbI data set) and the real-world Stanford question answering data set, and show that it can achieve state-of-the-art performance. |
Tasks | Question Answering |
Published | 2017-07-06 |
URL | http://arxiv.org/abs/1707.01961v1 |
http://arxiv.org/pdf/1707.01961v1.pdf | |
PWC | https://paperswithcode.com/paper/long-term-memory-networks-for-question |
Repo | |
Framework | |
How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval
Title | How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval |
Authors | Rodrigo Toro Icarte, Jorge A. Baier, Cristian Ruz, Alvaro Soto |
Abstract | The knowledge representation community has built general-purpose ontologies which contain large amounts of commonsense knowledge over relevant aspects of the world, including useful visual information, e.g.: “a ball is used by a football player”, “a tennis player is located at a tennis court”. Current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. Instead, they learn recognition models directly from training examples. In this paper, we study how general-purpose ontologies—specifically, MIT’s ConceptNet ontology—can improve the performance of state-of-the-art vision systems. As a testbed, we tackle the problem of sentence-based image retrieval. Our retrieval approach incorporates knowledge from ConceptNet on top of a large pool of object detectors derived from a deep learning technique. In our experiments, we show that ConceptNet can improve performance on a common benchmark dataset. Key to our performance is the use of the ESPGAME dataset to select visually relevant relations from ConceptNet. Consequently, a main conclusion of this work is that general-purpose commonsense ontologies improve performance on visual reasoning tasks when properly filtered to select meaningful visual relations. |
Tasks | Image Retrieval, Visual Reasoning |
Published | 2017-05-24 |
URL | http://arxiv.org/abs/1705.08844v1 |
http://arxiv.org/pdf/1705.08844v1.pdf | |
PWC | https://paperswithcode.com/paper/how-a-general-purpose-commonsense-ontology |
Repo | |
Framework | |
An Iterative Regression Approach for Face Pose Estimation from RGB Images
Title | An Iterative Regression Approach for Face Pose Estimation from RGB Images |
Authors | Wenye He |
Abstract | This paper presents an iterative optimization method, explicit shape regression (ESR), for face pose detection and localization. The regression function is learnt to find the entire facial shape and minimize the alignment errors. A cascaded learning framework is employed to enhance the shape constraint during detection. A combination of a two-level boosted regression, shape indexed features and a correlation-based feature selection method is used to improve the performance. In this paper, we explain the advantage of ESR for deformable objects such as faces and show the generic applicability of the method. In the experiment, we compare the results with those of other work and demonstrate the accuracy and robustness in different scenarios. |
Tasks | Feature Selection, Pose Estimation |
Published | 2017-09-10 |
URL | http://arxiv.org/abs/1709.03170v1 |
http://arxiv.org/pdf/1709.03170v1.pdf | |
PWC | https://paperswithcode.com/paper/an-iterative-regression-approach-for-face |
Repo | |
Framework | |
TraX: The visual Tracking eXchange Protocol and Library
Title | TraX: The visual Tracking eXchange Protocol and Library |
Authors | Luka Čehovin |
Abstract | In this paper we address the problem of developing on-line visual tracking algorithms. We present a specialized communication protocol that serves as a bridge between a tracker implementation and the application that uses it. It decouples development of algorithms and application, encouraging re-usability. The primary use case is algorithm evaluation, where the protocol facilitates more complex evaluation scenarios than are used nowadays, thus pushing forward the field of visual tracking. We present a reference implementation of the protocol that makes it easy to use in several popular programming languages and discuss where the protocol is already used and some usage scenarios that we envision for the future. |
Tasks | Visual Tracking |
Published | 2017-05-12 |
URL | http://arxiv.org/abs/1705.04469v1 |
http://arxiv.org/pdf/1705.04469v1.pdf | |
PWC | https://paperswithcode.com/paper/trax-the-visual-tracking-exchange-protocol |
Repo | |
Framework | |
Ω-Net (Omega-Net): Fully Automatic, Multi-View Cardiac MR Detection, Orientation, and Segmentation with Deep Neural Networks
Title | Ω-Net (Omega-Net): Fully Automatic, Multi-View Cardiac MR Detection, Orientation, and Segmentation with Deep Neural Networks |
Authors | Davis M. Vigneault, Weidi Xie, Carolyn Y. Ho, David A. Bluemke, J. Alison Noble |
Abstract | Pixelwise segmentation of the left ventricular (LV) myocardium and the four cardiac chambers in 2-D steady state free precession (SSFP) cine sequences is an essential preprocessing step for a wide range of analyses. Variability in contrast, appearance, orientation, and placement of the heart between patients, clinical views, scanners, and protocols makes fully automatic semantic segmentation a notoriously difficult problem. Here, we present Ω-Net (Omega-Net): a novel convolutional neural network (CNN) architecture for simultaneous localization, transformation into a canonical orientation, and semantic segmentation. First, an initial segmentation is performed on the input image; second, the features learned during this initial segmentation are used to predict the parameters needed to transform the input image into a canonical orientation; and third, a final segmentation is performed on the transformed image. In this work, Ω-Nets of varying depths were trained to detect five foreground classes in any of three clinical views (short axis, SA; four-chamber, 4C; two-chamber, 2C), without prior knowledge of the view being segmented. The architecture was trained on a cohort of patients with hypertrophic cardiomyopathy and healthy control subjects. Network performance as measured by weighted foreground intersection-over-union (IoU) was substantially improved in the best-performing Ω-Net compared with U-Net segmentation without localization or orientation. In addition, Ω-Net was retrained from scratch on the 2017 MICCAI ACDC dataset, and achieved state-of-the-art results on the LV and RV bloodpools, and performed slightly worse in segmentation of the LV myocardium. We conclude this architecture represents a substantive advancement over prior approaches, with implications for biomedical image segmentation more generally. |
Tasks | Semantic Segmentation |
Published | 2017-11-03 |
URL | http://arxiv.org/abs/1711.01094v3 |
http://arxiv.org/pdf/1711.01094v3.pdf | |
PWC | https://paperswithcode.com/paper/-net-omega-net-fully-automatic-multi-view |
Repo | |
Framework | |
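The abstract above reports performance as weighted foreground intersection-over-union (IoU) without defining the weighting. The sketch below shows one plausible reading, assuming per-class IoU over the foreground classes weighted by each class's pixel count in the ground truth; that weighting, the background label 0, and the function names are all assumptions made here for illustration.

```python
from collections import Counter


def per_class_iou(pred, target, classes):
    """IoU per class for two equal-length sequences of integer pixel labels."""
    ious = {}
    for c in classes:
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious[c] = inter / union
    return ious


def weighted_foreground_iou(pred, target, background=0):
    """Mean IoU over foreground classes, weighted by ground-truth pixel count."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    counts = Counter(t for t in target if t != background)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no foreground pixels in the ground truth
    ious = per_class_iou(pred, target, counts.keys())
    return sum(counts[c] * ious[c] for c in counts) / total


if __name__ == "__main__":
    # Flattened toy masks with two foreground classes (1 and 2).
    target = [0, 1, 1, 2, 2, 2]
    pred = [0, 1, 2, 2, 2, 0]
    print(weighted_foreground_iou(pred, target))
```

Excluding the background class keeps the score from being dominated by the large area outside the heart, which would otherwise make almost any segmentation look good.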