Paper Group ANR 1379
Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping
Title | Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping |
Authors | Cristian Bodnar, Adrian Li, Karol Hausman, Peter Pastor, Mrinal Kalakrishnan |
Abstract | The distributional perspective on reinforcement learning (RL) has given rise to a series of successful Q-learning algorithms, resulting in state-of-the-art performance in arcade game environments. However, it has not yet been analyzed how these findings from a discrete setting translate to complex practical applications characterized by noisy, high dimensional and continuous state-action spaces. In this work, we propose Quantile QT-Opt (Q2-Opt), a distributional variant of the recently introduced distributed Q-learning algorithm for continuous domains, and examine its behaviour in a series of simulated and real vision-based robotic grasping tasks. The absence of an actor in Q2-Opt allows us to directly draw a parallel to the previous discrete experiments in the literature without the additional complexities induced by an actor-critic architecture. We demonstrate that Q2-Opt achieves a superior vision-based object grasping success rate, while also being more sample efficient. The distributional formulation also allows us to experiment with various risk distortion metrics that give us an indication of how robots can concretely manage risk in practice using a Deep RL control policy. As an additional contribution, we perform batch RL experiments in our virtual environment and compare them with the latest findings from discrete settings. Surprisingly, we find that the previous batch RL findings from the literature obtained on arcade game environments do not generalise to our setup. |
Tasks | Q-Learning, Robotic Grasping |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.02787v2 |
PDF | https://arxiv.org/pdf/1910.02787v2.pdf |
PWC | https://paperswithcode.com/paper/quantile-qt-opt-for-risk-aware-vision-based |
Repo | |
Framework | |
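To make the distributional machinery concrete, here is a minimal numpy sketch of the quantile regression Huber loss used by quantile-based Q-learning methods such as Q2-Opt. The function name, the default `kappa=1.0`, and the toy inputs are illustrative assumptions, not the authors' code.

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile regression Huber loss (sketch; kappa=1.0 is an assumed default).

    pred_quantiles: (N,) predicted quantiles of the return distribution.
    target_samples: (M,) samples from the target (Bellman) distribution.
    """
    n = len(pred_quantiles)
    taus = (np.arange(n) + 0.5) / n                        # quantile midpoints
    u = target_samples[None, :] - pred_quantiles[:, None]  # pairwise TD errors (N, M)
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric weights pull each output toward its own quantile level.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return (weight * huber / kappa).mean()

# Toy usage: 8 predicted quantiles against 32 noisy targets.
loss = quantile_huber_loss(np.linspace(-1.0, 1.0, 8), np.random.randn(32))
```

Risk-sensitive behaviour then comes from distorting the quantile levels before acting, rather than from changing this loss.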
Generating Long Sequences with Sparse Transformers
Title | Generating Long Sequences with Sparse Transformers |
Authors | Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever |
Abstract | Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length. In this paper we introduce sparse factorizations of the attention matrix which reduce this to $O(n \sqrt{n})$. We also introduce a) a variation on architecture and initialization to train deeper networks, b) the recomputation of attention matrices to save memory, and c) fast attention kernels for training. We call networks with these changes Sparse Transformers, and show they can model sequences tens of thousands of timesteps long using hundreds of layers. We use the same architecture to model images, audio, and text from raw bytes, setting a new state of the art for density modeling of Enwik8, CIFAR-10, and ImageNet-64. We generate unconditional samples that demonstrate global coherence and great diversity, and show it is possible in principle to use self-attention to model sequences of length one million or more. |
Tasks | |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.10509v1 |
PDF | http://arxiv.org/pdf/1904.10509v1.pdf |
PWC | https://paperswithcode.com/paper/190410509 |
Repo | |
Framework | |
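The strided factorization is easy to visualize as an attention mask. Below is a toy numpy sketch (the function name and stride choice are assumptions): with `stride` close to the square root of `n`, each query attends to roughly 2 sqrt(n) positions, which is where the $O(n \sqrt{n})$ cost in the abstract comes from.

```python
import numpy as np

def strided_sparse_mask(n, stride):
    """Boolean (n, n) mask combining the two factorized heads: a local head
    over the previous `stride` positions and a strided head over every
    stride-th earlier position (a sketch of the strided pattern)."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    causal = j <= i
    local = causal & (i - j < stride)
    strided = causal & ((i - j) % stride == 0)
    return local | strided

mask = strided_sparse_mask(n=64, stride=8)        # stride ~ sqrt(n)
print(mask.sum(), "of", 64 * 64, "entries kept")  # ~O(n*sqrt(n)) nonzeros
```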
Elements of Sequential Monte Carlo
Title | Elements of Sequential Monte Carlo |
Authors | Christian A. Naesseth, Fredrik Lindsten, Thomas B. Schön |
Abstract | A core problem in statistics and probabilistic machine learning is to compute probability distributions and expectations. This is the fundamental problem of Bayesian statistics and machine learning, which frames all inference as expectations with respect to the posterior distribution. The key challenge is to approximate these intractable expectations. In this tutorial, we review sequential Monte Carlo (SMC), a random-sampling-based class of methods for approximate inference. First, we explain the basics of SMC, discuss practical issues, and review theoretical results. We then examine two of the main user design choices: the proposal distributions and the so-called intermediate target distributions. We review recent results on how variational inference and amortization can be used to learn efficient proposals and target distributions. Next, we discuss the SMC estimate of the normalizing constant and how it can be used for pseudo-marginal inference and inference evaluation. Throughout the tutorial we illustrate the use of SMC on various models commonly used in machine learning, such as stochastic recurrent neural networks, probabilistic graphical models, and probabilistic programs. |
Tasks | Bayesian Inference |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.04797v1 |
PDF | http://arxiv.org/pdf/1903.04797v1.pdf |
PWC | https://paperswithcode.com/paper/elements-of-sequential-monte-carlo |
Repo | |
Framework | |
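As a concrete companion to the tutorial's main topics (proposals, resampling, and the normalizing-constant estimate), here is a minimal bootstrap particle filter on a toy linear-Gaussian model. The model and all parameter values are assumptions chosen for illustration, not code from the tutorial.

```python
import numpy as np

def bootstrap_pf(ys, n_particles=500, a=0.9, sigma_x=1.0, sigma_y=1.0, seed=0):
    """Bootstrap SMC for x_t = a*x_{t-1} + N(0, sigma_x^2), y_t = x_t + N(0, sigma_y^2).
    Returns filtering means and the log normalizing-constant estimate."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)
    log_z, means = 0.0, []
    for y in ys:
        x = a * x + rng.normal(0.0, sigma_x, n_particles)   # propose from the prior
        logw = (-0.5 * ((y - x) / sigma_y) ** 2
                - 0.5 * np.log(2 * np.pi * sigma_y ** 2))   # likelihood weights
        m = logw.max()
        log_z += m + np.log(np.mean(np.exp(logw - m)))      # running log Z-hat
        w = np.exp(logw - m); w /= w.sum()
        means.append(np.sum(w * x))
        x = x[rng.choice(n_particles, n_particles, p=w)]    # multinomial resampling
    return np.array(means), log_z

means, log_z = bootstrap_pf(np.random.default_rng(1).normal(size=50))
```

Learned proposals, as discussed in the tutorial, would replace the prior-sampling line with a data-dependent distribution.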
Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations
Title | Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations |
Authors | Bohan Zhuang, Jing Liu, Mingkui Tan, Lingqiao Liu, Ian Reid, Chunhua Shen |
Abstract | This paper tackles the problem of training a deep convolutional neural network with both low-bitwidth weights and activations. Optimizing a low-precision network is very challenging due to the non-differentiability of the quantizer, which may result in substantial accuracy loss. To address this, we propose three practical approaches, including (i) progressive quantization; (ii) stochastic precision; and (iii) joint knowledge distillation to improve the network training. First, for progressive quantization, we propose two schemes to progressively find good local minima. Specifically, we propose to first optimize a network with quantized weights and subsequently quantize activations. This is in contrast to the traditional methods which optimize them simultaneously. Furthermore, we propose a second progressive quantization scheme which gradually decreases the bit-width from high precision to low precision during training. Second, to alleviate the excessive training burden of the multi-round training stages, we further propose a one-stage stochastic precision strategy to randomly sample and quantize sub-networks while keeping the other parts in full precision. Finally, we adopt a novel learning scheme to jointly train a full-precision model alongside the low-precision one. By doing so, the full-precision model provides hints to guide the low-precision model training and significantly improves the performance of the low-precision network. Extensive experiments on various datasets (e.g., CIFAR-100, ImageNet) show the effectiveness of the proposed methods. |
Tasks | Quantization |
Published | 2019-08-10 |
URL | https://arxiv.org/abs/1908.04680v1 |
PDF | https://arxiv.org/pdf/1908.04680v1.pdf |
PWC | https://paperswithcode.com/paper/effective-training-of-convolutional-neural |
Repo | |
Framework | |
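A toy numpy sketch of the two ingredients the abstract names: a uniform quantizer (the non-differentiable step the paper works around) and an assumed bit-width schedule in the spirit of the second progressive scheme. Gradient handling (e.g., a straight-through estimator) is deliberately omitted here.

```python
import numpy as np

def uniform_quantize(x, bits):
    """Quantize values in [0, 1] to 2^bits uniform levels (non-differentiable)."""
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

w = np.random.rand(5)
# Assumed schedule for illustration: gradually decrease precision during training.
for bits in (8, 6, 4, 2):
    print(bits, "bits:", uniform_quantize(w, bits))
```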
Deep Learning Methods for Parallel Magnetic Resonance Image Reconstruction
Title | Deep Learning Methods for Parallel Magnetic Resonance Image Reconstruction |
Authors | Florian Knoll, Kerstin Hammernik, Chi Zhang, Steen Moeller, Thomas Pock, Daniel K. Sodickson, Mehmet Akcakaya |
Abstract | Following the success of deep learning in a wide range of applications, neural network-based machine learning techniques have received interest as a means of accelerating magnetic resonance imaging (MRI). A number of ideas inspired by deep learning techniques from computer vision and image processing have been successfully applied to non-linear image reconstruction in the spirit of compressed sensing for both low-dose computed tomography and accelerated MRI. The additional integration of multi-coil information to recover missing k-space lines in the MRI reconstruction process is still studied less frequently, even though it is the de-facto standard for currently used accelerated MR acquisitions. This manuscript provides an overview of the recent machine learning approaches that have been proposed specifically for improving parallel imaging. A general background introduction to parallel MRI is given that is structured around the classical view of image-space and k-space based methods. Both linear and non-linear methods are covered, followed by a discussion of recent efforts to further improve parallel imaging using machine learning, and specifically using artificial neural networks. Image-domain based techniques that introduce improved regularizers are covered as well as k-space based methods, where the focus is on better interpolation strategies using neural networks. Issues and open problems are discussed, as well as recent efforts for producing open datasets and benchmarks for the community. |
Tasks | Image Reconstruction |
Published | 2019-04-01 |
URL | http://arxiv.org/abs/1904.01112v1 |
PDF | http://arxiv.org/pdf/1904.01112v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-methods-for-parallel-magnetic |
Repo | |
Framework | |
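For readers new to parallel imaging, here is a toy numpy sketch of the multi-coil forward model that all the reviewed reconstruction methods invert: coil-sensitivity weighting, a 2D FFT to k-space, and undersampling. The uniform coil maps and every-other-line mask are illustrative assumptions, not anything from the article.

```python
import numpy as np

def sense_forward(image, coil_sens, mask):
    """Multi-coil acquisition: y_c = mask * FFT(S_c * image) for each coil c.

    image: (H, W) complex; coil_sens: (C, H, W); mask: (H, W) binary."""
    kspace = np.fft.fft2(coil_sens * image[None], axes=(-2, -1))
    return mask[None] * kspace            # unacquired k-space lines are zeroed

H = W = 64
sens = np.ones((4, H, W), dtype=complex) / 2.0            # toy uniform coils
mask = (np.arange(H) % 2 == 0)[:, None] * np.ones(W)      # acceleration R = 2
y = sense_forward(np.random.randn(H, W).astype(complex), sens, mask)
```

The image-domain methods surveyed regularize the inversion of this operator, while the k-space methods learn to interpolate the zeroed lines directly.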
Transform Learning for Magnetic Resonance Image Reconstruction: From Model-based Learning to Building Neural Networks
Title | Transform Learning for Magnetic Resonance Image Reconstruction: From Model-based Learning to Building Neural Networks |
Authors | Bihan Wen, Saiprasad Ravishankar, Luke Pfister, Yoram Bresler |
Abstract | Magnetic resonance imaging (MRI) is widely used in clinical practice, but it has been traditionally limited by its slow data acquisition. Recent advances in compressed sensing (CS) techniques for MRI reduce acquisition time while maintaining high image quality. Whereas classical CS assumes the images are sparse in known analytical dictionaries or transform domains, methods using learned image models for reconstruction have become popular. The model could be pre-learned from datasets, or learned simultaneously with the reconstruction, i.e., blind CS (BCS). Besides the well-known synthesis dictionary model, recent advances in transform learning (TL) provide an efficient alternative framework for sparse modeling in MRI. TL-based methods enjoy numerous advantages, including exact solutions for sparse coding, transform update, and clustering; cheap computation; and convergence guarantees, and they provide high-quality results in MRI compared to popular competing methods. This paper provides a review of some recent works in MRI reconstruction from limited data, with a focus on the recent TL-based methods. A unified framework for incorporating various TL-based models is presented. We discuss the connections between transform learning and convolutional or filter bank models and corresponding multi-layer extensions, with connections to deep learning. Finally, we discuss recent trends in MRI, open problems, and future directions for the field. |
Tasks | Image Reconstruction |
Published | 2019-03-25 |
URL | https://arxiv.org/abs/1903.11431v2 |
PDF | https://arxiv.org/pdf/1903.11431v2.pdf |
PWC | https://paperswithcode.com/paper/transform-learning-for-magnetic-resonance |
Repo | |
Framework | |
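One of the advantages listed above, exact sparse coding, reduces to a one-liner under a transform model: the l0-penalized sparse code of Wx is obtained by hard thresholding. A minimal numpy sketch with an assumed random transform and threshold:

```python
import numpy as np

def transform_sparse_code(x, W, threshold):
    """Exact solution of min_z ||W x - z||_2^2 + threshold^2 * ||z||_0,
    i.e., hard thresholding of the transformed signal."""
    z = W @ x
    z[np.abs(z) < threshold] = 0.0
    return z

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))     # toy sparsifying transform (assumed)
z = transform_sparse_code(rng.standard_normal(16), W, threshold=1.0)
```

This closed form, versus the NP-hard synthesis sparse coding problem, is why TL methods are computationally cheap.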
Depth Coefficients for Depth Completion
Title | Depth Coefficients for Depth Completion |
Authors | Saif Imran, Yunfei Long, Xiaoming Liu, Daniel Morris |
Abstract | Depth completion involves estimating a dense depth image from sparse depth measurements, often guided by a color image. While linear upsampling is straightforward, it results in artifacts, including depth pixels being interpolated in empty space across discontinuities between objects. Current methods use deep networks to upsample and “complete” the missing depth pixels. Nevertheless, depth smearing between objects remains a challenge. We propose a new representation for depth called Depth Coefficients (DC) to address this problem. It enables convolutions to more easily avoid inter-object depth mixing. We also show that the standard Mean Squared Error (MSE) loss function can promote depth mixing, and thus propose instead to use cross-entropy loss for DC. With quantitative and qualitative evaluation on benchmarks, we show that replacing the sparse depth input and MSE loss with our DC representation and cross-entropy loss is a simple way to improve depth completion performance and reduce pixel depth mixing, which leads to improved depth-based object detection. |
Tasks | Depth Completion, Object Detection |
Published | 2019-03-13 |
URL | http://arxiv.org/abs/1903.05421v1 |
PDF | http://arxiv.org/pdf/1903.05421v1.pdf |
PWC | https://paperswithcode.com/paper/depth-coefficients-for-depth-completion |
Repo | |
Framework | |
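A simplified numpy sketch of the DC idea: represent each depth value as a coefficient vector over fixed depth bins, here by linear interpolation between the two nearest bin centers. The bin range and spacing are assumptions of this sketch; the point is that per-pixel depth becomes a distribution suited to a cross-entropy loss.

```python
import numpy as np

def depth_to_coefficients(depth, bins):
    """Spread each scalar depth over its two nearest bins (weights sum to 1)."""
    n = len(bins)
    idx = np.clip(np.searchsorted(bins, depth) - 1, 0, n - 2)
    frac = np.clip((depth - bins[idx]) / (bins[idx + 1] - bins[idx]), 0.0, 1.0)
    coeff = np.zeros((len(depth), n))
    rows = np.arange(len(depth))
    coeff[rows, idx] = 1.0 - frac
    coeff[rows, idx + 1] = frac
    return coeff

bins = np.linspace(0.0, 80.0, 81)                      # 1 m spacing (assumed)
c = depth_to_coefficients(np.array([12.3, 40.0]), bins)
```

Because each pixel commits its mass to adjacent bins rather than a single regressed value, averaging across an object boundary no longer produces depths in empty space.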
Great Ape Detection in Challenging Jungle Camera Trap Footage via Attention-Based Spatial and Temporal Feature Blending
Title | Great Ape Detection in Challenging Jungle Camera Trap Footage via Attention-Based Spatial and Temporal Feature Blending |
Authors | Xinyu Yang, Majid Mirmehdi, Tilo Burghardt |
Abstract | We propose the first multi-frame video object detection framework trained to detect great apes. It is applicable to challenging camera trap footage in complex jungle environments and extends a traditional feature pyramid architecture by adding self-attention driven feature blending in both the spatial as well as the temporal domain. We demonstrate that this extension can detect distinctive species appearance and motion signatures despite significant partial occlusion. We evaluate the framework using 500 camera trap videos of great apes from the Pan African Programme containing 180K frames, which we manually annotated with accurate per-frame animal bounding boxes. These clips contain significant partial occlusions, challenging lighting, dynamic backgrounds, and natural camouflage effects. We show that our approach performs highly robustly and significantly outperforms frame-based detectors. We also perform detailed ablation studies and validation on the full ILSVRC 2015 VID data corpus to demonstrate wider applicability at adequate performance levels. We conclude that the framework is ready to assist human camera trap inspection efforts. We publish code, weights, and ground truth annotations with this paper. |
Tasks | Object Detection, Video Object Detection |
Published | 2019-08-29 |
URL | https://arxiv.org/abs/1908.11240v1 |
PDF | https://arxiv.org/pdf/1908.11240v1.pdf |
PWC | https://paperswithcode.com/paper/great-ape-detection-in-challenging-jungle |
Repo | |
Framework | |
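A bare-bones numpy sketch of the temporal half of the idea: blending per-frame feature vectors with scaled dot-product self-attention over time. The paper's full architecture (feature pyramid, spatial attention, detection heads) is far richer; everything here is an illustrative reduction.

```python
import numpy as np

def temporal_attention_blend(feats):
    """Self-attention over frames; feats has shape (T, D), one row per frame."""
    d = feats.shape[1]
    scores = feats @ feats.T / np.sqrt(d)          # (T, T) frame affinities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ feats                            # each frame borrows context

blended = temporal_attention_blend(np.random.randn(8, 256))
```

Intuitively, a frame where the animal is partially occluded can borrow evidence from neighbouring frames where it is visible.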
Color Filter Arrays for Quanta Image Sensors
Title | Color Filter Arrays for Quanta Image Sensors |
Authors | Omar A. Elgendy, Stanley H. Chan |
Abstract | The quanta image sensor (QIS) is envisioned to be the next-generation image sensor after CCD and CMOS. In this paper, we discuss how to design color filter arrays for QIS and other small pixels. Designing color filter arrays for small pixels is challenging because maximizing light efficiency and suppressing aliasing and crosstalk are conflicting tasks. We present an optimization-based framework which unifies several mainstream color filter array design methodologies. Our method offers greater generality and flexibility. Compared to existing methods, the new framework can simultaneously handle luminance sensitivity, chrominance sensitivity, crosstalk, anti-aliasing, manufacturability and orthogonality. Extensive experimental comparisons demonstrate the effectiveness of the framework. |
Tasks | Image Reconstruction |
Published | 2019-03-23 |
URL | https://arxiv.org/abs/1903.09823v4 |
PDF | https://arxiv.org/pdf/1903.09823v4.pdf |
PWC | https://paperswithcode.com/paper/color-filter-arrays-for-quanta-image-sensors |
Repo | |
Framework | |
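As a toy illustration of one term in the trade-off the abstract describes, the snippet below computes the light efficiency of a periodic CFA tile for the classic Bayer pattern. The tile representation and the one-third-per-channel convention are assumptions of this sketch, not the paper's actual objective.

```python
import numpy as np

def cfa_light_efficiency(tile):
    """Fraction of incident white light transmitted, averaged over the tile;
    each of the C color planes is taken to carry 1/C of white light."""
    return tile.sum(axis=0).mean() / tile.shape[0]

bayer = np.array([[[0, 1], [0, 0]],                # R plane of the 2x2 tile
                  [[1, 0], [0, 1]],                # G plane
                  [[0, 0], [1, 0]]], dtype=float)  # B plane
print(cfa_light_efficiency(bayer))  # 1/3: each pixel passes a single channel
```

Raising this number (e.g., with panchromatic "white" pixels) is exactly what worsens aliasing and crosstalk, which is why the paper poses the design as a joint optimization.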
Let’s Make It Personal, A Challenge in Personalizing Medical Inter-Human Communication
Title | Let’s Make It Personal, A Challenge in Personalizing Medical Inter-Human Communication |
Authors | Mor Vered, Frank Dignum, Tim Miller |
Abstract | Current AI approaches have frequently been used to help personalize many aspects of medical experiences and tailor them to a specific individual's needs. However, while such systems consider medically-relevant information, they ignore socially-relevant information about how a diagnosis should be communicated and discussed with the patient. The lack of this capability may lead to miscommunication, with serious implications, such as patients opting out of the best treatment. Consider a case in which the same treatment is proposed to two different individuals. The manner in which this treatment is mediated to each should be different, depending on the individual patient's history, knowledge, and mental state. While it is clear that this communication should be conveyed via a human medical expert and not a software-based system, humans are not always capable of considering all of the relevant aspects and traversing all available information. We pose the challenge of creating Intelligent Agents (IAs) to assist medical service providers (MSPs) and consumers in establishing a more personalized human-to-human dialogue. Personalizing conversations will enable patients and MSPs to reach a solution that is best for their particular situation, such that a relation of trust can be built and commitment to the outcome of the interaction is assured. We propose a four-part conceptual framework for personalized social interactions, expand on which techniques are available within current AI research, and discuss what has yet to be achieved. |
Tasks | |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12687v1 |
PDF | https://arxiv.org/pdf/1907.12687v1.pdf |
PWC | https://paperswithcode.com/paper/lets-make-it-personal-a-challenge-in |
Repo | |
Framework | |
A Tight Runtime Analysis for the cGA on Jump Functions—EDAs Can Cross Fitness Valleys at No Extra Cost
Title | A Tight Runtime Analysis for the cGA on Jump Functions—EDAs Can Cross Fitness Valleys at No Extra Cost |
Authors | Benjamin Doerr |
Abstract | We prove that the compact genetic algorithm (cGA) with hypothetical population size $\mu = \Omega(\sqrt n \log n) \cap \text{poly}(n)$ with high probability finds the optimum of any $n$-dimensional jump function with jump size $k < \frac 1 {20} \ln n$ in $O(\mu \sqrt n)$ iterations. Since it is known that the cGA with high probability needs at least $\Omega(\mu \sqrt n + n \log n)$ iterations to optimize the unimodal OneMax function, our result shows that the cGA, in contrast to most classic evolutionary algorithms, is here able to cross moderate-sized valleys of low fitness at no extra cost. Our runtime guarantee improves over the recent upper bound $O(\mu n^{1.5} \log n)$ valid for $\mu = \Omega(n^{3.5+\varepsilon})$ of Hasenöhrl and Sutton (GECCO 2018). For the best choice of the hypothetical population size, their result gives a runtime guarantee of $O(n^{5+\varepsilon})$, whereas ours gives $O(n \log n)$. We also provide a simple general method based on parallel runs that, under mild conditions, (i) overcomes the need to specify a suitable population size, yet gives a performance close to the one stemming from the best-possible population size, and (ii) transforms EDAs with high-probability performance guarantees into EDAs with similar bounds on the expected runtime. |
Tasks | |
Published | 2019-03-26 |
URL | http://arxiv.org/abs/1903.10983v1 |
PDF | http://arxiv.org/pdf/1903.10983v1.pdf |
PWC | https://paperswithcode.com/paper/a-tight-runtime-analysis-for-the-cga-on-jump |
Repo | |
Framework | |
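To ground the statement, here is a plain textbook implementation of the compact GA on Jump_k — an illustration of the algorithm under analysis, not the paper's proof framework; the parameter choices in the usage line are arbitrary.

```python
import numpy as np

def jump(x, k):
    """Jump_k: OneMax shifted by k, with a width-k fitness valley at the top."""
    n, m = len(x), int(x.sum())
    return k + m if (m <= n - k or m == n) else n - m

def cga(n, k, mu, max_iters=200_000, seed=0):
    """Compact GA with hypothetical population size mu (textbook version)."""
    rng = np.random.default_rng(seed)
    p = np.full(n, 0.5)                          # per-bit frequency vector
    for t in range(max_iters):
        x = (rng.random(n) < p).astype(int)
        y = (rng.random(n) < p).astype(int)
        if jump(x, k) < jump(y, k):
            x, y = y, x                          # x is the fitter sample
        p = np.clip(p + (x - y) / mu,            # shift frequencies to winner
                    1.0 / n, 1.0 - 1.0 / n)      # standard margins
        if jump(x, k) == n + k:
            return t                             # optimum (all-ones) sampled
    return max_iters

print(cga(n=50, k=3, mu=200))
```

Because the cGA samples from a product distribution rather than mutating a single parent, it can hit the all-ones point directly without ever sitting inside the valley — the informal reason behind the "no extra cost" result.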
SeGMA: Semi-Supervised Gaussian Mixture Auto-Encoder
Title | SeGMA: Semi-Supervised Gaussian Mixture Auto-Encoder |
Authors | Marek Śmieja, Maciej Wołczyk, Jacek Tabor, Bernhard C. Geiger |
Abstract | We propose a semi-supervised generative model, SeGMA, which learns a joint probability distribution of data and their classes and which is implemented in a typical Wasserstein auto-encoder framework. We choose a mixture of Gaussians as a target distribution in latent space, which provides a natural splitting of data into clusters. To connect Gaussian components with correct classes, we use a small amount of labeled data and a Gaussian classifier induced by the target distribution. SeGMA is optimized efficiently due to the use of the Cramer-Wold distance as a maximum mean discrepancy penalty, which yields a closed-form expression for a mixture of spherical Gaussian components and thus obviates the need for sampling. While SeGMA preserves all properties of its semi-supervised predecessors and achieves at least as good generative performance on standard benchmark data sets, it presents additional features: (a) interpolation between any pair of points in the latent space produces realistic-looking samples; (b) combining the interpolation property with disentangled class and style variables, SeGMA is able to perform a continuous style transfer from one class to another; (c) it is possible to change the intensity of class characteristics in a data point by moving the latent representation of the data point away from specific Gaussian components. |
Tasks | Style Transfer |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.09333v1 |
PDF | https://arxiv.org/pdf/1906.09333v1.pdf |
PWC | https://paperswithcode.com/paper/segma-semi-supervised-gaussian-mixture-auto |
Repo | |
Framework | |
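A toy numpy sketch of property (a): sampling latent codes from class components of a Gaussian-mixture prior and linearly interpolating between classes. The means, scale, and dimensionality are invented for illustration; decoding the interpolated codes is what SeGMA's decoder would do.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(cls, means, sigma=1.0):
    """Draw a code from the spherical Gaussian component of class `cls`."""
    return means[cls] + sigma * rng.standard_normal(means.shape[1])

means = 4.0 * np.eye(3)                      # 3 well-separated class means
z0, z1 = sample_latent(0, means), sample_latent(1, means)
# Interpolating in latent space; decoding these codes would realize the
# continuous class-to-class transition described in the abstract.
path = np.array([(1 - t) * z0 + t * z1 for t in np.linspace(0.0, 1.0, 5)])
```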
Autonomy, Authenticity, Authorship and Intention in computer generated art
Title | Autonomy, Authenticity, Authorship and Intention in computer generated art |
Authors | Jon McCormack, Toby Gifford, Patrick Hutchings |
Abstract | This paper examines five key questions surrounding computer generated art. Driven by the recent public auction of a work of 'AI Art', we selectively summarise many decades of research and commentary around topics of autonomy, authenticity, authorship and intention in computer generated art, and use this research to answer contemporary questions often asked about art made by computers that concern these topics. We additionally reflect on whether current techniques in deep learning and Generative Adversarial Networks significantly change the answers provided by many decades of prior research. |
Tasks | |
Published | 2019-03-06 |
URL | http://arxiv.org/abs/1903.02166v1 |
PDF | http://arxiv.org/pdf/1903.02166v1.pdf |
PWC | https://paperswithcode.com/paper/autonomy-authenticity-authorship-and |
Repo | |
Framework | |
Plug and play methods for magnetic resonance imaging (long version)
Title | Plug and play methods for magnetic resonance imaging (long version) |
Authors | Rizwan Ahmad, Charles A. Bouman, Gregery T. Buzzard, Stanley Chan, Sizhou Liu, Edward T. Reehorst, Philip Schniter |
Abstract | Magnetic Resonance Imaging (MRI) is a non-invasive diagnostic tool that provides excellent soft-tissue contrast without the use of ionizing radiation. Compared to other clinical imaging modalities (e.g., CT or ultrasound), however, the data acquisition process for MRI is inherently slow, which motivates undersampling and thus drives the need for accurate, efficient reconstruction methods from undersampled datasets. In this article, we describe the use of “plug-and-play” (PnP) algorithms for MRI image recovery. We first describe the linearly approximated inverse problem encountered in MRI. Then we review several PnP methods, where the unifying commonality is to iteratively call a denoising subroutine as one step of a larger optimization-inspired algorithm. Next, we describe how the result of the PnP method can be interpreted as a solution to an equilibrium equation, allowing convergence analysis from the equilibrium perspective. Finally, we present illustrative examples of PnP methods applied to MRI image recovery. |
Tasks | Denoising, Image Denoising, Image Reconstruction |
Published | 2019-03-20 |
URL | https://arxiv.org/abs/1903.08616v5 |
PDF | https://arxiv.org/pdf/1903.08616v5.pdf |
PWC | https://paperswithcode.com/paper/plug-and-play-methods-for-magnetic-resonance |
Repo | |
Framework | |
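The PnP pattern reduces to a few lines once the acquisition operator and the denoiser are treated as black boxes. Below is a generic sketch of one such scheme (a proximal-gradient variant); the operator and denoiser are toy stand-ins, not the article's MRI examples.

```python
import numpy as np

def pnp_pgm(y, A, At, denoise, step, iters=50):
    """Plug-and-play proximal gradient for min_x 0.5*||A x - y||^2 + R(x):
    a gradient step on the data-fit term, then a denoiser in place of prox_R."""
    x = At(y)
    for _ in range(iters):
        x = x - step * At(A(x) - y)   # enforce consistency with measurements
        x = denoise(x)                # any off-the-shelf denoiser plugs in here
    return x

# Toy 1D instance: identity 'acquisition' and a moving-average 'denoiser'.
A = At = lambda v: v
smooth = lambda v: np.convolve(v, np.ones(3) / 3.0, mode="same")
x_hat = pnp_pgm(np.random.randn(128), A, At, smooth, step=0.5)
```

The equilibrium-based analysis described in the abstract studies the fixed points of exactly this kind of iteration, since the denoiser need not be the proximal operator of any explicit regularizer.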
Table-Of-Contents generation on contemporary documents
Title | Table-Of-Contents generation on contemporary documents |
Authors | Najah-Imane Bentabet, Rémi Juge, Sira Ferradans |
Abstract | The generation of precise and detailed Table-Of-Contents (TOC) from a document is a problem of major importance for document understanding and information extraction. Despite its importance, it is still a challenging task, especially for non-standardized documents with rich layout information such as commercial documents. In this paper, we present a new neural-based pipeline for TOC generation applicable to any searchable document. Unlike previous methods, we do not use semantic labeling nor assume the presence of parsable TOC pages in the document. Moreover, we analyze the influence of using external knowledge encoded as a template. We empirically show that this approach is only useful in a very low resource environment. Finally, we propose a new domain-specific data set that sheds some light on the difficulties of TOC generation in real-world documents. The proposed method shows better performance than the state-of-the-art on a public data set and on the newly released data set. |
Tasks | |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08836v1 |
PDF | https://arxiv.org/pdf/1911.08836v1.pdf |
PWC | https://paperswithcode.com/paper/table-of-contents-generation-on-contemporary |
Repo | |
Framework | |