Paper Group ANR 1379
Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping
Title | Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping |
Authors | Cristian Bodnar, Adrian Li, Karol Hausman, Peter Pastor, Mrinal Kalakrishnan |
Abstract | The distributional perspective on reinforcement learning (RL) has given rise to a series of successful Q-learning algorithms, resulting in state-of-the-art performance in arcade game environments. However, it has not yet been analyzed how these findings from a discrete setting translate to complex practical applications characterized by noisy, high dimensional and continuous state-action spaces. In this work, we propose Quantile QT-Opt (Q2-Opt), a distributional variant of the recently introduced distributed Q-learning algorithm for continuous domains, and examine its behaviour in a series of simulated and real vision-based robotic grasping tasks. The absence of an actor in Q2-Opt allows us to directly draw a parallel to the previous discrete experiments in the literature without the additional complexities induced by an actor-critic architecture. We demonstrate that Q2-Opt achieves a superior vision-based object grasping success rate, while also being more sample efficient. The distributional formulation also allows us to experiment with various risk distortion metrics that give us an indication of how robots can concretely manage risk in practice using a Deep RL control policy. As an additional contribution, we perform batch RL experiments in our virtual environment and compare them with the latest findings from discrete settings. Surprisingly, we find that the previous batch RL findings from the literature obtained on arcade game environments do not generalise to our setup. |
Tasks | Q-Learning, Robotic Grasping |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.02787v2 |
PDF | https://arxiv.org/pdf/1910.02787v2.pdf |
PWC | https://paperswithcode.com/paper/quantile-qt-opt-for-risk-aware-vision-based |
Repo | |
Framework | |
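To make the distributional machinery concrete, here is a minimal numpy sketch of the quantile regression Huber loss used by quantile-based Q-learning methods such as Q2-Opt. The function name, the default `kappa=1.0`, and the toy inputs are illustrative assumptions, not the authors' code.

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile regression Huber loss (sketch; kappa=1.0 is an assumed default).

    pred_quantiles: (N,) predicted quantiles of the return distribution.
    target_samples: (M,) samples from the target (Bellman) distribution.
    """
    n = len(pred_quantiles)
    taus = (np.arange(n) + 0.5) / n                        # quantile midpoints
    u = target_samples[None, :] - pred_quantiles[:, None]  # pairwise TD errors (N, M)
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric weights pull each output toward its own quantile level.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return (weight * huber / kappa).mean()

# Toy usage: 8 predicted quantiles against 32 noisy targets.
loss = quantile_huber_loss(np.linspace(-1.0, 1.0, 8), np.random.randn(32))
```

Risk-sensitive behaviour then comes from distorting the quantile levels before acting, rather than from changing this loss.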
Generating Long Sequences with Sparse Transformers
Title | Generating Long Sequences with Sparse Transformers |
Authors | Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever |
Abstract | Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length. In this paper we introduce sparse factorizations of the attention matrix which reduce this to $O(n \sqrt{n})$. We also introduce a) a variation on architecture and initialization to train deeper networks, b) the recomputation of attention matrices to save memory, and c) fast attention kernels for training. We call networks with these changes Sparse Transformers, and show they can model sequences tens of thousands of timesteps long using hundreds of layers. We use the same architecture to model images, audio, and text from raw bytes, setting a new state of the art for density modeling of Enwik8, CIFAR-10, and ImageNet-64. We generate unconditional samples that demonstrate global coherence and great diversity, and show it is possible in principle to use self-attention to model sequences of length one million or more. |
Tasks | |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.10509v1 |
PDF | http://arxiv.org/pdf/1904.10509v1.pdf |
PWC | https://paperswithcode.com/paper/190410509 |
Repo | |
Framework | |
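The strided factorization is easy to visualize as an attention mask. Below is a toy numpy sketch (the function name and stride choice are assumptions): with `stride` close to the square root of `n`, each query attends to roughly 2 sqrt(n) positions, which is where the $O(n \sqrt{n})$ cost in the abstract comes from.

```python
import numpy as np

def strided_sparse_mask(n, stride):
    """Boolean (n, n) mask combining the two factorized heads: a local head
    over the previous `stride` positions and a strided head over every
    stride-th earlier position (a sketch of the strided pattern)."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    causal = j <= i
    local = causal & (i - j < stride)
    strided = causal & ((i - j) % stride == 0)
    return local | strided

mask = strided_sparse_mask(n=64, stride=8)        # stride ~ sqrt(n)
print(mask.sum(), "of", 64 * 64, "entries kept")  # ~O(n*sqrt(n)) nonzeros
```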
Elements of Sequential Monte Carlo
Title | Elements of Sequential Monte Carlo |
Authors | Christian A. Naesseth, Fredrik Lindsten, Thomas B. Schön |
Abstract | A core problem in statistics and probabilistic machine learning is to compute probability distributions and expectations. This is the fundamental problem of Bayesian statistics and machine learning, which frames all inference as expectations with respect to the posterior distribution. The key challenge is to approximate these intractable expectations. In this tutorial, we review sequential Monte Carlo (SMC), a random-sampling-based class of methods for approximate inference. First, we explain the basics of SMC, discuss practical issues, and review theoretical results. We then examine two of the main user design choices: the proposal distributions and the so-called intermediate target distributions. We review recent results on how variational inference and amortization can be used to learn efficient proposals and target distributions. Next, we discuss the SMC estimate of the normalizing constant and how it can be used for pseudo-marginal inference and inference evaluation. Throughout the tutorial we illustrate the use of SMC on various models commonly used in machine learning, such as stochastic recurrent neural networks, probabilistic graphical models, and probabilistic programs. |
Tasks | Bayesian Inference |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.04797v1 |
PDF | http://arxiv.org/pdf/1903.04797v1.pdf |
PWC | https://paperswithcode.com/paper/elements-of-sequential-monte-carlo |
Repo | |
Framework | |
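As a concrete companion to the tutorial's main topics (proposals, resampling, and the normalizing-constant estimate), here is a minimal bootstrap particle filter on a toy linear-Gaussian model. The model and all parameter values are assumptions chosen for illustration, not code from the tutorial.

```python
import numpy as np

def bootstrap_pf(ys, n_particles=500, a=0.9, sigma_x=1.0, sigma_y=1.0, seed=0):
    """Bootstrap SMC for x_t = a*x_{t-1} + N(0, sigma_x^2), y_t = x_t + N(0, sigma_y^2).
    Returns filtering means and the log normalizing-constant estimate."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)
    log_z, means = 0.0, []
    for y in ys:
        x = a * x + rng.normal(0.0, sigma_x, n_particles)   # propose from the prior
        logw = (-0.5 * ((y - x) / sigma_y) ** 2
                - 0.5 * np.log(2 * np.pi * sigma_y ** 2))   # likelihood weights
        m = logw.max()
        log_z += m + np.log(np.mean(np.exp(logw - m)))      # running log Z-hat
        w = np.exp(logw - m); w /= w.sum()
        means.append(np.sum(w * x))
        x = x[rng.choice(n_particles, n_particles, p=w)]    # multinomial resampling
    return np.array(means), log_z

means, log_z = bootstrap_pf(np.random.default_rng(1).normal(size=50))
```

Learned proposals, as discussed in the tutorial, would replace the prior-sampling line with a data-dependent distribution.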
Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations
Title | Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations |
Authors | Bohan Zhuang, Jing Liu, Mingkui Tan, Lingqiao Liu, Ian Reid, Chunhua Shen |
Abstract | This paper tackles the problem of training a deep convolutional neural network with both low-bitwidth weights and activations. Optimizing a low-precision network is very challenging due to the non-differentiability of the quantizer, which may result in substantial accuracy loss. To address this, we propose three practical approaches, including (i) progressive quantization; (ii) stochastic precision; and (iii) joint knowledge distillation to improve the network training. First, for progressive quantization, we propose two schemes to progressively find good local minima. Specifically, we propose to first optimize a network with quantized weights and subsequently quantize activations. This is in contrast to the traditional methods which optimize them simultaneously. Furthermore, we propose a second progressive quantization scheme which gradually decreases the bit-width from high precision to low precision during training. Second, to alleviate the excessive training burden of the multi-round training stages, we further propose a one-stage stochastic precision strategy to randomly sample and quantize sub-networks while keeping the other parts in full precision. Finally, we adopt a novel learning scheme to jointly train a full-precision model alongside the low-precision one. By doing so, the full-precision model provides hints to guide the low-precision model training and significantly improves the performance of the low-precision network. Extensive experiments on various datasets (e.g., CIFAR-100, ImageNet) show the effectiveness of the proposed methods. |
Tasks | Quantization |
Published | 2019-08-10 |
URL | https://arxiv.org/abs/1908.04680v1 |
PDF | https://arxiv.org/pdf/1908.04680v1.pdf |
PWC | https://paperswithcode.com/paper/effective-training-of-convolutional-neural |
Repo | |
Framework | |
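A toy numpy sketch of the two ingredients the abstract names: a uniform quantizer (the non-differentiable step the paper works around) and an assumed bit-width schedule in the spirit of the second progressive scheme. Gradient handling (e.g., a straight-through estimator) is deliberately omitted here.

```python
import numpy as np

def uniform_quantize(x, bits):
    """Quantize values in [0, 1] to 2^bits uniform levels (non-differentiable)."""
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

w = np.random.rand(5)
# Assumed schedule for illustration: gradually decrease precision during training.
for bits in (8, 6, 4, 2):
    print(bits, "bits:", uniform_quantize(w, bits))
```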
Deep Learning Methods for Parallel Magnetic Resonance Image Reconstruction
Title | Deep Learning Methods for Parallel Magnetic Resonance Image Reconstruction |
Authors | Florian Knoll, Kerstin Hammernik, Chi Zhang, Steen Moeller, Thomas Pock, Daniel K. Sodickson, Mehmet Akcakaya |
Abstract | Following the success of deep learning in a wide range of applications, neural network-based machine learning techniques have received interest as a means of accelerating magnetic resonance imaging (MRI). A number of ideas inspired by deep learning techniques from computer vision and image processing have been successfully applied to non-linear image reconstruction in the spirit of compressed sensing for both low-dose computed tomography and accelerated MRI. The additional integration of multi-coil information to recover missing k-space lines in the MRI reconstruction process is still studied less frequently, even though it is the de-facto standard for currently used accelerated MR acquisitions. This manuscript provides an overview of the recent machine learning approaches that have been proposed specifically for improving parallel imaging. A general background introduction to parallel MRI is given that is structured around the classical view of image-space and k-space based methods. Both linear and non-linear methods are covered, followed by a discussion of recent efforts to further improve parallel imaging using machine learning, and specifically using artificial neural networks. Image-domain based techniques that introduce improved regularizers are covered as well as k-space based methods, where the focus is on better interpolation strategies using neural networks. Issues and open problems are discussed, as well as recent efforts for producing open datasets and benchmarks for the community. |
Tasks | Image Reconstruction |
Published | 2019-04-01 |
URL | http://arxiv.org/abs/1904.01112v1 |
PDF | http://arxiv.org/pdf/1904.01112v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-methods-for-parallel-magnetic |
Repo | |
Framework | |
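For readers new to parallel imaging, here is a toy numpy sketch of the multi-coil forward model that all the reviewed reconstruction methods invert: coil-sensitivity weighting, a 2D FFT to k-space, and undersampling. The uniform coil maps and every-other-line mask are illustrative assumptions, not anything from the article.

```python
import numpy as np

def sense_forward(image, coil_sens, mask):
    """Multi-coil acquisition: y_c = mask * FFT(S_c * image) for each coil c.

    image: (H, W) complex; coil_sens: (C, H, W); mask: (H, W) binary."""
    kspace = np.fft.fft2(coil_sens * image[None], axes=(-2, -1))
    return mask[None] * kspace            # unacquired k-space lines are zeroed

H = W = 64
sens = np.ones((4, H, W), dtype=complex) / 2.0            # toy uniform coils
mask = (np.arange(H) % 2 == 0)[:, None] * np.ones(W)      # acceleration R = 2
y = sense_forward(np.random.randn(H, W).astype(complex), sens, mask)
```

The image-domain methods surveyed regularize the inversion of this operator, while the k-space methods learn to interpolate the zeroed lines directly.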
Transform Learning for Magnetic Resonance Image Reconstruction: From Model-based Learning to Building Neural Networks
Title | Transform Learning for Magnetic Resonance Image Reconstruction: From Model-based Learning to Building Neural Networks |
Authors | Bihan Wen, Saiprasad Ravishankar, Luke Pfister, Yoram Bresler |
Abstract | Magnetic resonance imaging (MRI) is widely used in clinical practice, but it has been traditionally limited by its slow data acquisition. Recent advances in compressed sensing (CS) techniques for MRI reduce acquisition time while maintaining high image quality. Whereas classical CS assumes the images are sparse in known analytical dictionaries or transform domains, methods using learned image models for reconstruction have become popular. The model could be pre-learned from datasets, or learned simultaneously with the reconstruction, i.e., blind CS (BCS). Besides the well-known synthesis dictionary model, recent advances in transform learning (TL) provide an efficient alternative framework for sparse modeling in MRI. TL-based methods enjoy numerous advantages, including exact solutions for sparse coding, transform update, and clustering; cheap computation; and convergence guarantees, and they provide high-quality results in MRI compared to popular competing methods. This paper provides a review of some recent works in MRI reconstruction from limited data, with a focus on the recent TL-based methods. A unified framework for incorporating various TL-based models is presented. We discuss the connections between transform learning and convolutional or filter bank models and corresponding multi-layer extensions, with connections to deep learning. Finally, we discuss recent trends in MRI, open problems, and future directions for the field. |
Tasks | Image Reconstruction |
Published | 2019-03-25 |
URL | https://arxiv.org/abs/1903.11431v2 |
PDF | https://arxiv.org/pdf/1903.11431v2.pdf |
PWC | https://paperswithcode.com/paper/transform-learning-for-magnetic-resonance |
Repo | |
Framework | |
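One of the advantages listed above, exact sparse coding, reduces to a one-liner under a transform model: the l0-penalized sparse code of Wx is obtained by hard thresholding. A minimal numpy sketch with an assumed random transform and threshold:

```python
import numpy as np

def transform_sparse_code(x, W, threshold):
    """Exact solution of min_z ||W x - z||_2^2 + threshold^2 * ||z||_0,
    i.e., hard thresholding of the transformed signal."""
    z = W @ x
    z[np.abs(z) < threshold] = 0.0
    return z

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))     # toy sparsifying transform (assumed)
z = transform_sparse_code(rng.standard_normal(16), W, threshold=1.0)
```

This closed form, versus the NP-hard synthesis sparse coding problem, is why TL methods are computationally cheap.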
Depth Coefficients for Depth Completion
Title | Depth Coefficients for Depth Completion |
Authors | Saif Imran, Yunfei Long, Xiaoming Liu, Daniel Morris |
Abstract | Depth completion involves estimating a dense depth image from sparse depth measurements, often guided by a color image. While linear upsampling is straightforward, it results in artifacts, including depth pixels being interpolated in empty space across discontinuities between objects. Current methods use deep networks to upsample and “complete” the missing depth pixels. Nevertheless, depth smearing between objects remains a challenge. We propose a new representation for depth called Depth Coefficients (DC) to address this problem. It enables convolutions to more easily avoid inter-object depth mixing. We also show that the standard Mean Squared Error (MSE) loss function can promote depth mixing, and thus propose instead to use cross-entropy loss for DC. With quantitative and qualitative evaluation on benchmarks, we show that replacing the sparse depth input and MSE loss with our DC representation and cross-entropy loss is a simple way to improve depth completion performance and reduce pixel depth mixing, which leads to improved depth-based object detection. |
Tasks | Depth Completion, Object Detection |
Published | 2019-03-13 |
URL | http://arxiv.org/abs/1903.05421v1 |
PDF | http://arxiv.org/pdf/1903.05421v1.pdf |
PWC | https://paperswithcode.com/paper/depth-coefficients-for-depth-completion |
Repo | |
Framework | |
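A simplified numpy sketch of the DC idea: represent each depth value as a coefficient vector over fixed depth bins, here by linear interpolation between the two nearest bin centers. The bin range and spacing are assumptions of this sketch; the point is that per-pixel depth becomes a distribution suited to a cross-entropy loss.

```python
import numpy as np

def depth_to_coefficients(depth, bins):
    """Spread each scalar depth over its two nearest bins (weights sum to 1)."""
    n = len(bins)
    idx = np.clip(np.searchsorted(bins, depth) - 1, 0, n - 2)
    frac = np.clip((depth - bins[idx]) / (bins[idx + 1] - bins[idx]), 0.0, 1.0)
    coeff = np.zeros((len(depth), n))
    rows = np.arange(len(depth))
    coeff[rows, idx] = 1.0 - frac
    coeff[rows, idx + 1] = frac
    return coeff

bins = np.linspace(0.0, 80.0, 81)                      # 1 m spacing (assumed)
c = depth_to_coefficients(np.array([12.3, 40.0]), bins)
```

Because each pixel commits its mass to adjacent bins rather than a single regressed value, averaging across an object boundary no longer produces depths in empty space.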
Great Ape Detection in Challenging Jungle Camera Trap Footage via Attention-Based Spatial and Temporal Feature Blending
Title | Great Ape Detection in Challenging Jungle Camera Trap Footage via Attention-Based Spatial and Temporal Feature Blending |
Authors | Xinyu Yang, Majid Mirmehdi, Tilo Burghardt |
Abstract | We propose the first multi-frame video object detection framework trained to detect great apes. It is applicable to challenging camera trap footage in complex jungle environments and extends a traditional feature pyramid architecture by adding self-attention driven feature blending in both the spatial as well as the temporal domain. We demonstrate that this extension can detect distinctive species appearance and motion signatures despite significant partial occlusion. We evaluate the framework using 500 camera trap videos of great apes from the Pan African Programme containing 180K frames, which we manually annotated with accurate per-frame animal bounding boxes. These clips contain significant partial occlusions, challenging lighting, dynamic backgrounds, and natural camouflage effects. We show that our approach performs highly robustly and significantly outperforms frame-based detectors. We also perform detailed ablation studies and validation on the full ILSVRC 2015 VID data corpus to demonstrate wider applicability at adequate performance levels. We conclude that the framework is ready to assist human camera trap inspection efforts. We publish code, weights, and ground truth annotations with this paper. |
Tasks | Object Detection, Video Object Detection |
Published | 2019-08-29 |
URL | https://arxiv.org/abs/1908.11240v1 |
PDF | https://arxiv.org/pdf/1908.11240v1.pdf |
PWC | https://paperswithcode.com/paper/great-ape-detection-in-challenging-jungle |
Repo | |
Framework | |
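A bare-bones numpy sketch of the temporal half of the idea: blending per-frame feature vectors with scaled dot-product self-attention over time. The paper's full architecture (feature pyramid, spatial attention, detection heads) is far richer; everything here is an illustrative reduction.

```python
import numpy as np

def temporal_attention_blend(feats):
    """Self-attention over frames; feats has shape (T, D), one row per frame."""
    d = feats.shape[1]
    scores = feats @ feats.T / np.sqrt(d)          # (T, T) frame affinities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ feats                            # each frame borrows context

blended = temporal_attention_blend(np.random.randn(8, 256))
```

Intuitively, a frame where the animal is partially occluded can borrow evidence from neighbouring frames where it is visible.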
Color Filter Arrays for Quanta Image Sensors
Title | Color Filter Arrays for Quanta Image Sensors |
Authors | Omar A. Elgendy, Stanley H. Chan |
Abstract | The quanta image sensor (QIS) is envisioned to be the next-generation image sensor after CCD and CMOS. In this paper, we discuss how to design color filter arrays for QIS and other small pixels. Designing color filter arrays for small pixels is challenging because maximizing light efficiency and suppressing aliasing and crosstalk are conflicting tasks. We present an optimization-based framework which unifies several mainstream color filter array design methodologies. Our method offers greater generality and flexibility. Compared to existing methods, the new framework can simultaneously handle luminance sensitivity, chrominance sensitivity, crosstalk, anti-aliasing, manufacturability and orthogonality. Extensive experimental comparisons demonstrate the effectiveness of the framework. |
Tasks | Image Reconstruction |
Published | 2019-03-23 |
URL | https://arxiv.org/abs/1903.09823v4 |
PDF | https://arxiv.org/pdf/1903.09823v4.pdf |
PWC | https://paperswithcode.com/paper/color-filter-arrays-for-quanta-image-sensors |
Repo | |
Framework | |
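As a toy illustration of one term in the trade-off the abstract describes, the snippet below computes the light efficiency of a periodic CFA tile for the classic Bayer pattern. The tile representation and the one-third-per-channel convention are assumptions of this sketch, not the paper's actual objective.

```python
import numpy as np

def cfa_light_efficiency(tile):
    """Fraction of incident white light transmitted, averaged over the tile;
    each of the C color planes is taken to carry 1/C of white light."""
    return tile.sum(axis=0).mean() / tile.shape[0]

bayer = np.array([[[0, 1], [0, 0]],                # R plane of the 2x2 tile
                  [[1, 0], [0, 1]],                # G plane
                  [[0, 0], [1, 0]]], dtype=float)  # B plane
print(cfa_light_efficiency(bayer))  # 1/3: each pixel passes a single channel
```

Raising this number (e.g., with panchromatic "white" pixels) is exactly what worsens aliasing and crosstalk, which is why the paper poses the design as a joint optimization.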
Let’s Make It Personal, A Challenge in Personalizing Medical Inter-Human Communication
Title | Let’s Make It Personal, A Challenge in Personalizing Medical Inter-Human Communication |
Authors | Mor Vered, Frank Dignum, Tim Miller |
Abstract | Current AI approaches have frequently been used to help personalize many aspects of medical experiences and tailor them to a specific individual's needs. However, while such systems consider medically-relevant information, they ignore socially-relevant information about how a diagnosis should be communicated and discussed with the patient. The lack of this capability may lead to miscommunication, with serious implications, such as patients opting out of the best treatment. Consider a case in which the same treatment is proposed to two different individuals. The manner in which this treatment is mediated to each should be different, depending on the individual patient's history, knowledge, and mental state. While it is clear that this communication should be conveyed via a human medical expert and not a software-based system, humans are not always capable of considering all of the relevant aspects and traversing all available information. We pose the challenge of creating Intelligent Agents (IAs) to assist medical service providers (MSPs) and consumers in establishing a more personalized human-to-human dialogue. Personalizing conversations will enable patients and MSPs to reach a solution that is best for their particular situation, such that a relation of trust can be built and commitment to the outcome of the interaction is assured. We propose a four-part conceptual framework for personalized social interactions, expand on which techniques are available within current AI research, and discuss what has yet to be achieved. |
Tasks | |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12687v1 |
PDF | https://arxiv.org/pdf/1907.12687v1.pdf |
PWC | https://paperswithcode.com/paper/lets-make-it-personal-a-challenge-in |
Repo | |
Framework | |
A Tight Runtime Analysis for the cGA on Jump Functions—EDAs Can Cross Fitness Valleys at No Extra Cost
Title | A Tight Runtime Analysis for the cGA on Jump Functions—EDAs Can Cross Fitness Valleys at No Extra Cost |
Authors | Benjamin Doerr |
Abstract | We prove that the compact genetic algorithm (cGA) with hypothetical population size $\mu = \Omega(\sqrt n \log n) \cap \text{poly}(n)$ with high probability finds the optimum of any $n$-dimensional jump function with jump size $k < \frac 1 {20} \ln n$ in $O(\mu \sqrt n)$ iterations. Since it is known that the cGA with high probability needs at least $\Omega(\mu \sqrt n + n \log n)$ iterations to optimize the unimodal OneMax function, our result shows that the cGA, in contrast to most classic evolutionary algorithms, is here able to cross moderate-sized valleys of low fitness at no extra cost. Our runtime guarantee improves over the recent upper bound $O(\mu n^{1.5} \log n)$ valid for $\mu = \Omega(n^{3.5+\varepsilon})$ of Hasenöhrl and Sutton (GECCO 2018). For the best choice of the hypothetical population size, their result gives a runtime guarantee of $O(n^{5+\varepsilon})$, whereas ours gives $O(n \log n)$. We also provide a simple general method based on parallel runs that, under mild conditions, (i) overcomes the need to specify a suitable population size, yet gives a performance close to the one stemming from the best-possible population size, and (ii) transforms EDAs with high-probability performance guarantees into EDAs with similar bounds on the expected runtime. |
Tasks | |
Published | 2019-03-26 |
URL | http://arxiv.org/abs/1903.10983v1 |
PDF | http://arxiv.org/pdf/1903.10983v1.pdf |
PWC | https://paperswithcode.com/paper/a-tight-runtime-analysis-for-the-cga-on-jump |
Repo | |
Framework | |
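To ground the statement, here is a plain textbook implementation of the compact GA on Jump_k — an illustration of the algorithm under analysis, not the paper's proof framework; the parameter choices in the usage line are arbitrary.

```python
import numpy as np

def jump(x, k):
    """Jump_k: OneMax shifted by k, with a width-k fitness valley at the top."""
    n, m = len(x), int(x.sum())
    return k + m if (m <= n - k or m == n) else n - m

def cga(n, k, mu, max_iters=200_000, seed=0):
    """Compact GA with hypothetical population size mu (textbook version)."""
    rng = np.random.default_rng(seed)
    p = np.full(n, 0.5)                          # per-bit frequency vector
    for t in range(max_iters):
        x = (rng.random(n) < p).astype(int)
        y = (rng.random(n) < p).astype(int)
        if jump(x, k) < jump(y, k):
            x, y = y, x                          # x is the fitter sample
        p = np.clip(p + (x - y) / mu,            # shift frequencies to winner
                    1.0 / n, 1.0 - 1.0 / n)      # standard margins
        if jump(x, k) == n + k:
            return t                             # optimum (all-ones) sampled
    return max_iters

print(cga(n=50, k=3, mu=200))
```

Because the cGA samples from a product distribution rather than mutating a single parent, it can hit the all-ones point directly without ever sitting inside the valley — the informal reason behind the "no extra cost" result.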
SeGMA: Semi-Supervised Gaussian Mixture Auto-Encoder
Title | SeGMA: Semi-Supervised Gaussian Mixture Auto-Encoder |
Authors | Marek Śmieja, Maciej Wołczyk, Jacek Tabor, Bernhard C. Geiger |
Abstract | We propose a semi-supervised generative model, SeGMA, which learns a joint probability distribution of data and their classes and which is implemented in a typical Wasserstein auto-encoder framework. We choose a mixture of Gaussians as a target distribution in latent space, which provides a natural splitting of data into clusters. To connect Gaussian components with correct classes, we use a small amount of labeled data and a Gaussian classifier induced by the target distribution. SeGMA is optimized efficiently due to the use of the Cramer-Wold distance as a maximum mean discrepancy penalty, which yields a closed-form expression for a mixture of spherical Gaussian components and thus obviates the need for sampling. While SeGMA preserves all properties of its semi-supervised predecessors and achieves at least as good generative performance on standard benchmark data sets, it presents additional features: (a) interpolation between any pair of points in the latent space produces realistic-looking samples; (b) combining the interpolation property with disentangled class and style variables, SeGMA is able to perform a continuous style transfer from one class to another; (c) it is possible to change the intensity of class characteristics in a data point by moving the latent representation of the data point away from specific Gaussian components. |
Tasks | Style Transfer |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.09333v1 |
PDF | https://arxiv.org/pdf/1906.09333v1.pdf |
PWC | https://paperswithcode.com/paper/segma-semi-supervised-gaussian-mixture-auto |
Repo | |
Framework | |
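A toy numpy sketch of property (a): sampling latent codes from class components of a Gaussian-mixture prior and linearly interpolating between classes. The means, scale, and dimensionality are invented for illustration; decoding the interpolated codes is what SeGMA's decoder would do.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(cls, means, sigma=1.0):
    """Draw a code from the spherical Gaussian component of class `cls`."""
    return means[cls] + sigma * rng.standard_normal(means.shape[1])

means = 4.0 * np.eye(3)                      # 3 well-separated class means
z0, z1 = sample_latent(0, means), sample_latent(1, means)
# Interpolating in latent space; decoding these codes would realize the
# continuous class-to-class transition described in the abstract.
path = np.array([(1 - t) * z0 + t * z1 for t in np.linspace(0.0, 1.0, 5)])
```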
Autonomy, Authenticity, Authorship and Intention in computer generated art
Title | Autonomy, Authenticity, Authorship and Intention in computer generated art |
Authors | Jon McCormack, Toby Gifford, Patrick Hutchings |
Abstract | This paper examines five key questions surrounding computer generated art. Driven by the recent public auction of a work of 'AI Art', we selectively summarise many decades of research and commentary around topics of autonomy, authenticity, authorship and intention in computer generated art, and use this research to answer contemporary questions often asked about art made by computers that concern these topics. We additionally reflect on whether current techniques in deep learning and Generative Adversarial Networks significantly change the answers provided by many decades of prior research. |
Tasks | |
Published | 2019-03-06 |
URL | http://arxiv.org/abs/1903.02166v1 |
PDF | http://arxiv.org/pdf/1903.02166v1.pdf |
PWC | https://paperswithcode.com/paper/autonomy-authenticity-authorship-and |
Repo | |
Framework | |
Plug and play methods for magnetic resonance imaging (long version)
Title | Plug and play methods for magnetic resonance imaging (long version) |
Authors | Rizwan Ahmad, Charles A. Bouman, Gregery T. Buzzard, Stanley Chan, Sizhou Liu, Edward T. Reehorst, Philip Schniter |
Abstract | Magnetic Resonance Imaging (MRI) is a non-invasive diagnostic tool that provides excellent soft-tissue contrast without the use of ionizing radiation. Compared to other clinical imaging modalities (e.g., CT or ultrasound), however, the data acquisition process for MRI is inherently slow, which motivates undersampling and thus drives the need for accurate, efficient reconstruction methods from undersampled datasets. In this article, we describe the use of “plug-and-play” (PnP) algorithms for MRI image recovery. We first describe the linearly approximated inverse problem encountered in MRI. Then we review several PnP methods, where the unifying commonality is to iteratively call a denoising subroutine as one step of a larger optimization-inspired algorithm. Next, we describe how the result of the PnP method can be interpreted as a solution to an equilibrium equation, allowing convergence analysis from the equilibrium perspective. Finally, we present illustrative examples of PnP methods applied to MRI image recovery. |
Tasks | Denoising, Image Denoising, Image Reconstruction |
Published | 2019-03-20 |
URL | https://arxiv.org/abs/1903.08616v5 |
PDF | https://arxiv.org/pdf/1903.08616v5.pdf |
PWC | https://paperswithcode.com/paper/plug-and-play-methods-for-magnetic-resonance |
Repo | |
Framework | |
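The PnP pattern reduces to a few lines once the acquisition operator and the denoiser are treated as black boxes. Below is a generic sketch of one such scheme (a proximal-gradient variant); the operator and denoiser are toy stand-ins, not the article's MRI examples.

```python
import numpy as np

def pnp_pgm(y, A, At, denoise, step, iters=50):
    """Plug-and-play proximal gradient for min_x 0.5*||A x - y||^2 + R(x):
    a gradient step on the data-fit term, then a denoiser in place of prox_R."""
    x = At(y)
    for _ in range(iters):
        x = x - step * At(A(x) - y)   # enforce consistency with measurements
        x = denoise(x)                # any off-the-shelf denoiser plugs in here
    return x

# Toy 1D instance: identity 'acquisition' and a moving-average 'denoiser'.
A = At = lambda v: v
smooth = lambda v: np.convolve(v, np.ones(3) / 3.0, mode="same")
x_hat = pnp_pgm(np.random.randn(128), A, At, smooth, step=0.5)
```

The equilibrium-based analysis described in the abstract studies the fixed points of exactly this kind of iteration, since the denoiser need not be the proximal operator of any explicit regularizer.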
Table-Of-Contents generation on contemporary documents
Title | Table-Of-Contents generation on contemporary documents |
Authors | Najah-Imane Bentabet, Rémi Juge, Sira Ferradans |
Abstract | The generation of precise and detailed Table-Of-Contents (TOC) from a document is a problem of major importance for document understanding and information extraction. Despite its importance, it is still a challenging task, especially for non-standardized documents with rich layout information such as commercial documents. In this paper, we present a new neural-based pipeline for TOC generation applicable to any searchable document. Unlike previous methods, we do not use semantic labeling nor assume the presence of parsable TOC pages in the document. Moreover, we analyze the influence of using external knowledge encoded as a template. We empirically show that this approach is only useful in a very low resource environment. Finally, we propose a new domain-specific data set that sheds some light on the difficulties of TOC generation in real-world documents. The proposed method shows better performance than the state-of-the-art on a public data set and on the newly released data set. |
Tasks | |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08836v1 |
PDF | https://arxiv.org/pdf/1911.08836v1.pdf |
PWC | https://paperswithcode.com/paper/table-of-contents-generation-on-contemporary |
Repo | |
Framework | |