July 29, 2019

Paper Group ANR 101

A Review of Convolutional Neural Networks for Inverse Problems in Imaging

Title A Review of Convolutional Neural Networks for Inverse Problems in Imaging
Authors Michael T. McCann, Kyong Hwan Jin, Michael Unser
Abstract In this survey paper, we review recent uses of convolutional neural networks (CNNs) to solve inverse problems in imaging. It has recently become feasible to train deep CNNs on large databases of images, and they have shown outstanding performance on object classification and segmentation tasks. Motivated by these successes, researchers have begun to apply CNNs to the resolution of inverse problems such as denoising, deconvolution, super-resolution, and medical image reconstruction, and they have started to report improvements over state-of-the-art methods, including sparsity-based techniques such as compressed sensing. Here, we review the recent experimental work in these areas, with a focus on the critical design decisions: Where does the training data come from? What is the architecture of the CNN? How is the learning problem formulated and solved? We also bring together a few key theoretical papers that offer perspective on why CNNs are appropriate for inverse problems and point to some next steps in the field.
Tasks Denoising, Image Reconstruction, Object Classification, Super-Resolution
Published 2017-10-11
URL http://arxiv.org/abs/1710.04011v1
PDF http://arxiv.org/pdf/1710.04011v1.pdf
PWC https://paperswithcode.com/paper/a-review-of-convolutional-neural-networks-for
Repo
Framework

What does 2D geometric information really tell us about 3D face shape?

Title What does 2D geometric information really tell us about 3D face shape?
Authors Anil Bas, William A. P. Smith
Abstract A face image contains geometric cues in the form of configurational information and contours that can be used to estimate 3D face shape. While it is clear that 3D reconstruction from 2D points is highly ambiguous if no further constraints are enforced, one might expect that the face-space constraint solves this problem. We show that this is not the case and that geometric information is an ambiguous cue. There are two sources for this ambiguity. The first is that, within the space of 3D face shapes, there are flexibility modes that remain when some parts of the face are fixed. The second occurs only under perspective projection and is a result of perspective transformation as camera distance varies. Two different faces, when viewed at different distances, can give rise to the same 2D geometry. To demonstrate these ambiguities, we develop new algorithms for fitting a 3D morphable model to 2D landmarks or contours under either orthographic or perspective projection and show how to compute flexibility modes for both cases. We show that both fitting problems can be posed as a separable nonlinear least squares problem and solved efficiently. We demonstrate both quantitatively and qualitatively that the ambiguity is present in reconstructions from geometric information alone but also in reconstructions from a state-of-the-art CNN-based method.
Tasks 3D Reconstruction
Published 2017-08-22
URL https://arxiv.org/abs/1708.06703v3
PDF https://arxiv.org/pdf/1708.06703v3.pdf
PWC https://paperswithcode.com/paper/what-does-2d-geometric-information-really
Repo
Framework
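
The perspective ambiguity described in the abstract above can be illustrated with its simplest special case. The following is a minimal sketch, assuming a pinhole camera at the origin; the function names and the focal length are hypothetical, and the code does not reproduce the paper's morphable-model fitting. It only shows that a shape scaled about the camera center, and therefore seen from a proportionally larger distance, projects to the same 2D points.

```python
def project_perspective(points, focal_length=1.0):
    """Pinhole projection of 3D points (x, y, z) with z > 0 onto the image plane."""
    return [(focal_length * x / z, focal_length * y / z) for x, y, z in points]


def scale_about_camera(points, s):
    """Scale a 3D shape about the camera center: a larger object placed farther away."""
    return [(s * x, s * y, s * z) for x, y, z in points]


# Two hypothetical landmark sets: a small face nearby and a face twice as large, twice as far.
near_face = [(0.10, 0.20, 1.0), (-0.30, 0.10, 1.2), (0.05, -0.15, 1.1)]
far_face = scale_about_camera(near_face, 2.0)

near_2d = project_perspective(near_face)
far_2d = project_perspective(far_face)
```

Here near_2d and far_2d coincide even though the 3D shapes differ. The ambiguity studied in the paper is richer than this uniform scaling, since it concerns genuinely different face shapes within the face space.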

Investigating Recurrence and Eligibility Traces in Deep Q-Networks

Title Investigating Recurrence and Eligibility Traces in Deep Q-Networks
Authors Jean Harb, Doina Precup
Abstract Eligibility traces in reinforcement learning are used as a bias-variance trade-off and can often speed up training time by propagating knowledge back over time-steps in a single update. We investigate the use of eligibility traces in combination with recurrent networks in the Atari domain. We illustrate the benefits of both recurrent nets and eligibility traces in some Atari games, and also highlight the importance of the optimization used in the training.
Tasks Atari Games
Published 2017-04-18
URL http://arxiv.org/abs/1704.05495v1
PDF http://arxiv.org/pdf/1704.05495v1.pdf
PWC https://paperswithcode.com/paper/investigating-recurrence-and-eligibility
Repo
Framework
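
The bias-variance trade-off mentioned in the abstract above is usually expressed through the lambda-return. The following is a minimal sketch of the standard backward recursion, not the authors' implementation; the function name, argument layout, and default values are assumptions made for illustration.

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.8):
    """Compute lambda-returns for one trajectory segment.

    rewards[t] is the reward received at step t, and next_values[t] is the
    value estimate of the state reached after step t (0.0 if it is terminal).
    lam = 0 gives one-step TD targets (more bias, less variance);
    lam = 1 gives the full return, bootstrapped only at the end of the segment.
    """
    returns = [0.0] * len(rewards)
    g = next_values[-1]  # bootstrap from the value after the last step
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns
```

A single backward pass propagates information from late rewards to earlier time-steps, which is the speed-up the abstract refers to.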

How to Fool Radiologists with Generative Adversarial Networks? A Visual Turing Test for Lung Cancer Diagnosis

Title How to Fool Radiologists with Generative Adversarial Networks? A Visual Turing Test for Lung Cancer Diagnosis
Authors Maria J. M. Chuquicusma, Sarfaraz Hussein, Jeremy Burt, Ulas Bagci
Abstract Discriminating lung nodules as malignant or benign is still an underlying challenge. To address this challenge, radiologists need computer aided diagnosis (CAD) systems which can assist in learning discriminative imaging features corresponding to malignant and benign nodules. However, learning highly discriminative imaging features is an open problem. In this paper, our aim is to learn the most discriminative features pertaining to lung nodules by using an adversarial learning methodology. Specifically, we propose to use unsupervised learning with Deep Convolutional-Generative Adversarial Networks (DC-GANs) to generate realistic lung nodule samples. We hypothesize that imaging features of lung nodules will be discriminative if it is hard to differentiate them (fake) from real (true) nodules. To test this hypothesis, we present Visual Turing tests to two radiologists in order to evaluate the quality of the generated (fake) nodules. Extensive comparisons are performed in discerning real, generated, benign, and malignant nodules. This experimental setup allows us to validate the overall quality of the generated nodules, which can then be used to (1) improve diagnostic decisions by mining highly discriminative imaging features, (2) train radiologists for educational purposes, and (3) generate realistic samples to train deep networks with big data.
Tasks Lung Cancer Diagnosis
Published 2017-10-26
URL http://arxiv.org/abs/1710.09762v2
PDF http://arxiv.org/pdf/1710.09762v2.pdf
PWC https://paperswithcode.com/paper/how-to-fool-radiologists-with-generative
Repo
Framework

Forecasting Human Dynamics from Static Images

Title Forecasting Human Dynamics from Static Images
Authors Yu-Wei Chao, Jimei Yang, Brian Price, Scott Cohen, Jia Deng
Abstract This paper presents the first study on forecasting human dynamics from static images. The problem is to input a single RGB image and generate a sequence of upcoming human body poses in 3D. To address the problem, we propose the 3D Pose Forecasting Network (3D-PFNet). Our 3D-PFNet integrates recent advances on single-image human pose estimation and sequence prediction, and converts the 2D predictions into 3D space. We train our 3D-PFNet using a three-step training strategy to leverage a diverse source of training data, including image and video based human pose datasets and 3D motion capture (MoCap) data. We demonstrate competitive performance of our 3D-PFNet on 2D pose forecasting and 3D pose recovery through quantitative and qualitative results.
Tasks Human Dynamics, Motion Capture, Pose Estimation
Published 2017-04-11
URL http://arxiv.org/abs/1704.03432v1
PDF http://arxiv.org/pdf/1704.03432v1.pdf
PWC https://paperswithcode.com/paper/forecasting-human-dynamics-from-static-images
Repo
Framework

Filtering Tweets for Social Unrest

Title Filtering Tweets for Social Unrest
Authors Alan Mishler, Kevin Wonus, Wendy Chambers, Michael Bloodgood
Abstract Since the events of the Arab Spring, there has been increased interest in using social media to anticipate social unrest. While efforts have been made toward automated unrest prediction, we focus on filtering the vast volume of tweets to identify tweets relevant to unrest, which can be provided to downstream users for further analysis. We train a supervised classifier that is able to label Arabic language tweets as relevant to unrest with high reliability. We examine the relationship between training data size and performance and investigate ways to optimize the model building process while minimizing cost. We also explore how confidence thresholds can be set to achieve desired levels of performance.
Tasks
Published 2017-02-20
URL http://arxiv.org/abs/1702.06216v2
PDF http://arxiv.org/pdf/1702.06216v2.pdf
PWC https://paperswithcode.com/paper/filtering-tweets-for-social-unrest
Repo
Framework
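
The abstract above mentions setting confidence thresholds to reach a desired level of performance. The following is a minimal sketch of one common way to do that on held-out data, assuming a classifier that outputs a relevance score per tweet; the function name and the choice of precision as the target metric are assumptions, not details from the paper.

```python
def pick_threshold(scores, labels, target_precision):
    """Return (threshold, precision, recall) for the lowest score threshold whose
    precision on the held-out data meets the target, or None if no threshold does.

    scores are classifier confidences; labels are 1 (relevant) or 0 (not relevant).
    Items with score >= threshold are predicted relevant.
    """
    total_positive = sum(labels)
    pairs = sorted(zip(scores, labels), reverse=True)
    best = None
    true_positive = 0
    for i, (score, label) in enumerate(pairs, start=1):
        true_positive += label
        # Only evaluate at the last item of a run of equal scores.
        if i < len(pairs) and pairs[i][0] == score:
            continue
        precision = true_positive / i
        if precision >= target_precision:
            recall = true_positive / total_positive if total_positive else 0.0
            best = (score, precision, recall)
    return best
```

Choosing the lowest qualifying threshold keeps recall as high as possible while still meeting the precision target, which matters when the filtered tweets are passed on to downstream analysts.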

Reducing Bias in Production Speech Models

Title Reducing Bias in Production Speech Models
Authors Eric Battenberg, Rewon Child, Adam Coates, Christopher Fougner, Yashesh Gaur, Jiaji Huang, Heewoo Jun, Ajay Kannan, Markus Kliegl, Atul Kumar, Hairong Liu, Vinay Rao, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu
Abstract Replacing hand-engineered pipelines with end-to-end deep learning systems has enabled strong results in applications like speech and object recognition. However, the causality and latency constraints of production systems put end-to-end speech models back into the underfitting regime and expose biases in the model that we show cannot be overcome by “scaling up”, i.e., training bigger models on more data. In this work we systematically identify and address sources of bias, reducing error rates by up to 20% while remaining practical for deployment. We achieve this by utilizing improved neural architectures for streaming inference, solving optimization issues, and employing strategies that increase audio and label modelling versatility.
Tasks Object Recognition
Published 2017-05-11
URL http://arxiv.org/abs/1705.04400v1
PDF http://arxiv.org/pdf/1705.04400v1.pdf
PWC https://paperswithcode.com/paper/reducing-bias-in-production-speech-models
Repo
Framework

Deep Generative Adversarial Neural Networks for Realistic Prostate Lesion MRI Synthesis

Title Deep Generative Adversarial Neural Networks for Realistic Prostate Lesion MRI Synthesis
Authors Andy Kitchen, Jarrel Seah
Abstract Generative Adversarial Neural Networks (GANs) are applied to the synthetic generation of prostate lesion MRI images. GANs have been applied to a variety of natural images, and it is shown that the same techniques can be used in the medical domain to create realistic looking synthetic lesion images. 16mm x 16mm patches are extracted from 330 MRI scans from the SPIE ProstateX Challenge 2016 and used to train a Deep Convolutional Generative Adversarial Neural Network (DCGAN) utilizing cutting edge techniques. Synthetic outputs are compared to real images and the implicit latent representations induced by the GAN are explored. Training techniques and successful neural network architectures are explained in detail.
Tasks
Published 2017-08-01
URL http://arxiv.org/abs/1708.00129v1
PDF http://arxiv.org/pdf/1708.00129v1.pdf
PWC https://paperswithcode.com/paper/deep-generative-adversarial-neural-networks
Repo
Framework

Layerwise Systematic Scan: Deep Boltzmann Machines and Beyond

Title Layerwise Systematic Scan: Deep Boltzmann Machines and Beyond
Authors Heng Guo, Kaan Kara, Ce Zhang
Abstract For Markov chain Monte Carlo methods, one of the greatest discrepancies between theory and system is the scan order - while most theoretical development on the mixing time analysis deals with random updates, real-world systems are implemented with systematic scans. We bridge this gap for models that exhibit a bipartite structure, including, most notably, the Restricted/Deep Boltzmann Machine. The de facto implementation for these models scans variables in a layerwise fashion. We show that the Gibbs sampler with a layerwise alternating scan order has its relaxation time (in terms of epochs) no larger than that of a random-update Gibbs sampler (in terms of variable updates). We also construct examples to show that this bound is asymptotically tight. Through standard inequalities, our result also implies a comparison on the mixing times.
Tasks
Published 2017-05-15
URL http://arxiv.org/abs/1705.05154v2
PDF http://arxiv.org/pdf/1705.05154v2.pdf
PWC https://paperswithcode.com/paper/layerwise-systematic-scan-deep-boltzmann
Repo
Framework
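
The two scan orders compared in the abstract above can be sketched for a small Restricted Boltzmann Machine. This is a minimal illustration in plain Python, assuming binary units, a weight matrix W indexed as W[visible][hidden], and visible and hidden biases b and c; all names are hypothetical and the code makes no claim about the paper's analysis.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def layerwise_epoch(v, h, W, b, c, rng):
    """One epoch of the layerwise alternating scan: all hidden units, then all visible units."""
    for j in range(len(h)):
        act = c[j] + sum(W[i][j] * v[i] for i in range(len(v)))
        h[j] = 1 if rng.random() < sigmoid(act) else 0
    for i in range(len(v)):
        act = b[i] + sum(W[i][j] * h[j] for j in range(len(h)))
        v[i] = 1 if rng.random() < sigmoid(act) else 0
    return v, h


def random_update(v, h, W, b, c, rng):
    """One step of the random-update sampler: resample a single uniformly chosen unit."""
    k = rng.randrange(len(v) + len(h))
    if k < len(v):
        act = b[k] + sum(W[k][j] * h[j] for j in range(len(h)))
        v[k] = 1 if rng.random() < sigmoid(act) else 0
    else:
        j = k - len(v)
        act = c[j] + sum(W[i][j] * v[i] for i in range(len(v)))
        h[j] = 1 if rng.random() < sigmoid(act) else 0
    return v, h
```

Because the model is bipartite, the units within a layer are conditionally independent given the other layer, so updating them one after another in layerwise_epoch is equivalent to a block update. One epoch touches every variable once, whereas random_update touches one variable per call, which is why the abstract compares epochs against variable updates.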

Multiagent Simple Temporal Problem: The Arc-Consistency Approach

Title Multiagent Simple Temporal Problem: The Arc-Consistency Approach
Authors Shufeng Kong, Jae Hee Lee, Sanjiang Li
Abstract The Simple Temporal Problem (STP) is a fundamental temporal reasoning problem and has recently been extended to the Multiagent Simple Temporal Problem (MaSTP). In this paper we present a novel approach that is based on enforcing arc-consistency (AC) on the input (multiagent) simple temporal network. We show that the AC-based approach is sufficient for solving both the STP and MaSTP and provide efficient algorithms for them. As our AC-based approach does not impose new constraints between agents, it does not violate the privacy of the agents and is superior to the state-of-the-art approach to MaSTP. Empirical evaluations on diverse benchmark datasets also show that our AC-based algorithms for STP and MaSTP are significantly more efficient than existing approaches.
Tasks
Published 2017-11-22
URL http://arxiv.org/abs/1711.08151v1
PDF http://arxiv.org/pdf/1711.08151v1.pdf
PWC https://paperswithcode.com/paper/multiagent-simple-temporal-problem-the-arc
Repo
Framework
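
For readers unfamiliar with the Simple Temporal Problem named in the abstract above, the following sketch shows what an STP instance looks like and how consistency can be checked. It uses the classical shortest-path formulation (Bellman-Ford negative-cycle detection on the distance graph), not the arc-consistency algorithm the paper proposes; the function name and the constraint encoding are assumptions made for illustration.

```python
def solve_stp(num_vars, constraints):
    """Check an STP for consistency and return one feasible schedule, or None.

    Each constraint (i, j, lo, hi) means lo <= x_j - x_i <= hi.
    The STP is consistent exactly when its distance graph has no negative cycle.
    """
    edges = []
    for i, j, lo, hi in constraints:
        edges.append((i, j, hi))    # x_j - x_i <= hi
        edges.append((j, i, -lo))   # x_i - x_j <= -lo
    # Starting every distance at 0 acts like a virtual source connected to all nodes.
    dist = [0] * num_vars
    for _ in range(num_vars):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return dist  # distances satisfy every constraint
    return None  # still relaxing after num_vars passes: negative cycle


# Hypothetical example with three time points.
schedule = solve_stp(3, [(0, 1, 10, 20), (1, 2, 30, 40), (0, 2, 0, 45)])
infeasible = solve_stp(3, [(0, 1, 10, 20), (1, 2, 30, 40), (0, 2, 0, 35)])
```

In the second instance the first two constraints force x_2 - x_0 to be at least 40, which contradicts the upper bound of 35, so no schedule exists.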

Ternary Residual Networks

Title Ternary Residual Networks
Authors Abhisek Kundu, Kunal Banerjee, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey
Abstract Sub-8-bit representations of DNNs incur some discernible loss of accuracy despite rigorous (re)training at low-precision. Such loss of accuracy essentially makes them equivalent to a much shallower counterpart, diminishing the power of being deep networks. To address this problem of accuracy drop we introduce the notion of residual networks where we add more low-precision edges to sensitive branches of the sub-8-bit network to compensate for the lost accuracy. Further, we present a perturbation theory to identify such sensitive edges. Aided by such an elegant trade-off between accuracy and compute, the 8-2 model (8-bit activations, ternary weights), enhanced by ternary residual edges, turns out to be sophisticated enough to achieve very high accuracy ($\sim 1\%$ drop from our FP-32 baseline), despite $\sim 1.6\times$ reduction in model size, $\sim 26\times$ reduction in number of multiplications, and potentially $\sim 2\times$ power-performance gain compared to 8-8 representation, on the state-of-the-art deep network ResNet-101 pre-trained on the ImageNet dataset. Moreover, depending on the varying accuracy requirements in a dynamic environment, the deployed low-precision model can be upgraded/downgraded on-the-fly by partially enabling/disabling residual connections. For example, disabling the least important residual connections in the above enhanced network, the accuracy drop is $\sim 2\%$ (from FP32), despite $\sim 1.9\times$ reduction in model size, $\sim 32\times$ reduction in number of multiplications, and potentially $\sim 2.3\times$ power-performance gain compared to 8-8 representation. Finally, all the ternary connections are sparse in nature, and the ternary residual conversion can be done in a resource-constrained setting with no low-precision (re)training.
Tasks
Published 2017-07-15
URL http://arxiv.org/abs/1707.04679v2
PDF http://arxiv.org/pdf/1707.04679v2.pdf
PWC https://paperswithcode.com/paper/ternary-residual-networks
Repo
Framework
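
The idea of compensating a ternary approximation with additional ternary residual terms, as described in the abstract above, can be sketched on a single weight vector. This is an illustration under stated assumptions: the threshold rule (0.7 times the mean absolute value) is a commonly used heuristic and is not taken from the paper, and the function names are hypothetical.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Approximate weights by alpha * codes, with codes in {-1, 0, +1}."""
    if not weights:
        return 0.0, []
    delta = threshold_ratio * sum(abs(w) for w in weights) / len(weights)
    codes = [(1 if w > 0 else -1) if abs(w) > delta else 0 for w in weights]
    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return alpha, codes


def ternary_with_residuals(weights, num_residuals):
    """Sum of one ternary term plus num_residuals ternary terms fitted to the remaining error."""
    approx = [0.0] * len(weights)
    terms = []
    for _ in range(num_residuals + 1):
        remainder = [w - a for w, a in zip(weights, approx)]
        alpha, codes = ternarize(remainder)
        terms.append((alpha, codes))
        approx = [a + alpha * c for a, c in zip(approx, codes)]
    return approx, terms


def squared_error(weights, approx):
    return sum((w - a) ** 2 for w, a in zip(weights, approx))
```

Each extra term is itself ternary, so it can be enabled or disabled independently, which mirrors the on-the-fly upgrade and downgrade the abstract describes. For a fixed set of codes the chosen alpha minimizes the squared error, so adding a residual term never increases the error in this sketch.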

On the Synthesis of Guaranteed-Quality Plans for Robot Fleets in Logistics Scenarios via Optimization Modulo Theories

Title On the Synthesis of Guaranteed-Quality Plans for Robot Fleets in Logistics Scenarios via Optimization Modulo Theories
Authors Francesco Leofante, Erika Ábrahám, Tim Niemueller, Gerhard Lakemeyer, Armando Tacchella
Abstract In manufacturing, the increasing involvement of autonomous robots in production processes poses new challenges for production management. In this paper we report on the usage of Optimization Modulo Theories (OMT) to solve certain multi-robot scheduling problems in this area. Whereas currently existing methods are heuristic, our approach guarantees optimality for the computed solution. We present not only our final method but also its chronological development, and draw some general observations for the development of OMT-based approaches.
Tasks
Published 2017-11-12
URL http://arxiv.org/abs/1711.04259v1
PDF http://arxiv.org/pdf/1711.04259v1.pdf
PWC https://paperswithcode.com/paper/on-the-synthesis-of-guaranteed-quality-plans
Repo
Framework

Fast multi-output relevance vector regression

Title Fast multi-output relevance vector regression
Authors Youngmin Ha
Abstract This paper aims to decrease the time complexity of multi-output relevance vector regression from O(VM^3) to O(V^3+M^3), where V is the number of output dimensions, M is the number of basis functions, and V<M. The experimental results demonstrate that the proposed method is more competitive than the existing method, with regard to computation time. MATLAB codes are available at http://www.mathworks.com/matlabcentral/fileexchange/49131.
Tasks
Published 2017-04-17
URL http://arxiv.org/abs/1704.05041v1
PDF http://arxiv.org/pdf/1704.05041v1.pdf
PWC https://paperswithcode.com/paper/fast-multi-output-relevance-vector-regression
Repo
Framework
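
To get a feel for the complexity reduction stated in the abstract above, the two leading terms can be compared numerically. This is a rough sketch that ignores constant factors and lower-order terms, so it indicates scaling only, not actual run time; the function names and the example sizes are hypothetical.

```python
def cost_existing(V, M):
    """Leading-order operation count O(V * M^3) of the existing method."""
    return V * M ** 3


def cost_proposed(V, M):
    """Leading-order operation count O(V^3 + M^3) of the proposed method."""
    return V ** 3 + M ** 3


def speedup(V, M):
    """Ratio of the two leading terms; approaches V when V is much smaller than M."""
    return cost_existing(V, M) / cost_proposed(V, M)


# Hypothetical sizes: 10 output dimensions, 1000 basis functions.
ratio = speedup(10, 1000)
```

With V much smaller than M, the M^3 term dominates the proposed cost, so the ratio is close to V: the saving grows with the number of output dimensions.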

Visual Tracking via Dynamic Graph Learning

Title Visual Tracking via Dynamic Graph Learning
Authors Chenglong Li, Liang Lin, Wangmeng Zuo, Jin Tang, Ming-Hsuan Yang
Abstract Existing visual tracking methods usually localize a target object with a bounding box, in which the performance of the foreground object trackers or detectors is often affected by the inclusion of background clutter. To handle this problem, we learn a patch-based graph representation for visual tracking. The tracked object is modeled with a graph by taking a set of non-overlapping image patches as nodes, in which the weight of each node indicates how likely it belongs to the foreground and edges are weighted to indicate the appearance compatibility of two neighboring nodes. This graph is dynamically learned and applied in object tracking and model updating. During the tracking process, the proposed algorithm performs three main steps in each frame. First, the graph is initialized by assigning binary weights of some image patches to indicate the object and background patches according to the predicted bounding box. Second, the graph is optimized to refine the patch weights by using a novel alternating direction method of multipliers. Third, the object feature representation is updated by imposing the weights of patches on the extracted image features. The object location is predicted by maximizing the classification score in the structured support vector machine. Extensive experiments show that the proposed tracking algorithm performs well against the state-of-the-art methods on large-scale benchmark datasets.
Tasks Object Tracking, Visual Tracking
Published 2017-10-04
URL http://arxiv.org/abs/1710.01444v2
PDF http://arxiv.org/pdf/1710.01444v2.pdf
PWC https://paperswithcode.com/paper/visual-tracking-via-dynamic-graph-learning
Repo
Framework
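
The first and third per-frame steps described in the abstract above can be sketched in a few lines. This is a simplified illustration, assuming a regular grid of square non-overlapping patches and a patch counted as foreground when its center lies inside the bounding box; the function names and that membership rule are assumptions, and the second step (the ADMM-based weight refinement) is deliberately omitted.

```python
def init_patch_weights(grid_rows, grid_cols, patch_size, box):
    """Step 1 (simplified): binary weights, 1.0 for patches whose center is inside the box.

    box is (x0, y0, x1, y1) in pixels; patches are listed row by row.
    """
    x0, y0, x1, y1 = box
    weights = []
    for r in range(grid_rows):
        for c in range(grid_cols):
            cx = (c + 0.5) * patch_size
            cy = (r + 0.5) * patch_size
            inside = x0 <= cx < x1 and y0 <= cy < y1
            weights.append(1.0 if inside else 0.0)
    return weights


def weight_features(patch_features, weights):
    """Step 3 (simplified): scale each patch's feature vector by its foreground weight."""
    return [[w * f for f in feats] for feats, w in zip(patch_features, weights)]


# Hypothetical 2x2 grid of 10-pixel patches with a box covering the left column.
weights = init_patch_weights(2, 2, 10, (0, 0, 10, 20))
features = weight_features([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]], weights)
```

In the method described by the abstract, the refined weights would be continuous rather than binary, so background patches inside the box are down-weighted instead of being removed outright.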

Creatism: A deep-learning photographer capable of creating professional work

Title Creatism: A deep-learning photographer capable of creating professional work
Authors Hui Fang, Meng Zhang
Abstract Machine-learning excels in many areas with well-defined goals. However, a clear goal is usually not available in art forms, such as photography. The success of a photograph is measured by its aesthetic value, a very subjective concept. This adds to the challenge for a machine learning approach. We introduce Creatism, a deep-learning system for artistic content creation. In our system, we break down aesthetics into multiple aspects, each of which can be learned individually from a shared dataset of professional examples. Each aspect corresponds to an image operation that can be optimized efficiently. A novel editing tool, dramatic mask, is introduced as one operation that improves dramatic lighting for a photo. Our training does not require a dataset with before/after image pairs, or any additional labels to indicate different aspects in aesthetics. Using our system, we mimic the workflow of a landscape photographer, from framing for the best composition to carrying out various post-processing operations. The environment for our virtual photographer is simulated by a collection of panorama images from Google Street View. We design a “Turing-test”-like experiment to objectively measure the quality of its creations, where professional photographers rate a mixture of photographs from different sources blindly. Experiments show that a portion of our robot’s creation can be confused with professional work.
Tasks
Published 2017-07-11
URL http://arxiv.org/abs/1707.03491v1
PDF http://arxiv.org/pdf/1707.03491v1.pdf
PWC https://paperswithcode.com/paper/creatism-a-deep-learning-photographer-capable
Repo
Framework