July 27, 2019

Paper Group ANR 553

No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis. Learning Image-Conditioned Dynamics Models for Control of Under-actuated Legged Millirobots. Bi-class classification of humpback whale sound units against complex background noise with Deep Convolution Neural Network. Continuous-Time Flows for Efficient Inference …

No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis

Title No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis
Authors Rong Ge, Chi Jin, Yi Zheng
Abstract In this paper we develop a new framework that captures the landscape shared by common non-convex low-rank matrix problems, including matrix sensing, matrix completion and robust PCA. In particular, we show for all of the above problems (including asymmetric cases): 1) all local minima are also globally optimal; 2) no high-order saddle points exist. These results explain why simple algorithms such as stochastic gradient descent converge globally and efficiently optimize these non-convex objective functions in practice. Our framework connects and simplifies the existing analyses on optimization landscapes for matrix sensing and symmetric matrix completion. The framework naturally leads to new results for asymmetric matrix completion and robust PCA.
Tasks Matrix Completion
Published 2017-04-03
URL http://arxiv.org/abs/1704.00708v1
PDF http://arxiv.org/pdf/1704.00708v1.pdf
PWC https://paperswithcode.com/paper/no-spurious-local-minima-in-nonconvex-low
Repo
Framework
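
To make the "all local minima are globally optimal" claim concrete, here is a minimal sketch on a much simpler problem than the ones the paper analyses: a fully observed, rank-one, symmetric factorization. The function name `factorize_rank_one`, the step size, and the initial point are illustrative assumptions, not taken from the paper. Plain gradient descent started near the saddle at the origin still reaches a global minimum.

```python
def factorize_rank_one(x, steps=2000, lr=0.02):
    """Gradient descent on f(u) = 1/4 * ||u u^T - x x^T||_F^2 (toy problem)."""
    # Small starting point with nonzero overlap with x (an assumption of this sketch;
    # a start exactly orthogonal to x would stay on a saddle direction).
    u = [0.01] + [0.0] * (len(x) - 1)
    for _ in range(steps):
        uu = sum(a * a for a in u)                 # u . u
        xu = sum(a * b for a, b in zip(x, u))      # x . u
        # grad f(u) = (u u^T - x x^T) u = (u.u) u - (x.u) x
        u = [a - lr * (uu * a - xu * b) for a, b in zip(u, x)]
    return u


x_true = [1.0, 2.0, -1.0]
u_hat = factorize_rank_one(x_true)
```

In this toy setting the only minimizers are `x_true` and its negation, and the iterate keeps the sign of its initial overlap with `x_true`, so `u_hat` ends up at `x_true` itself. Matrix sensing, matrix completion and robust PCA add partial observations and regularizers that this sketch does not model.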

Learning Image-Conditioned Dynamics Models for Control of Under-actuated Legged Millirobots

Title Learning Image-Conditioned Dynamics Models for Control of Under-actuated Legged Millirobots
Authors Anusha Nagabandi, Guangzhao Yang, Thomas Asmar, Ravi Pandya, Gregory Kahn, Sergey Levine, Ronald S. Fearing
Abstract Millirobots are a promising robotic platform for many applications due to their small size and low manufacturing costs. Legged millirobots, in particular, can provide increased mobility in complex environments and improved scaling of obstacles. However, controlling these small, highly dynamic, and underactuated legged systems is difficult. Hand-engineered controllers can sometimes control these legged millirobots, but they have difficulties with dynamic maneuvers and complex terrains. We present an approach for controlling a real-world legged millirobot that is based on learned neural network models. Using less than 17 minutes of data, our method can learn a predictive model of the robot’s dynamics that can enable effective gaits to be synthesized on the fly for following user-specified waypoints on a given terrain. Furthermore, by leveraging expressive, high-capacity neural network models, our approach allows for these predictions to be directly conditioned on camera images, endowing the robot with the ability to predict how different terrains might affect its dynamics. This enables sample-efficient and effective learning for locomotion of a dynamic legged millirobot on various terrains, including gravel, turf, carpet, and styrofoam. Experiment videos can be found at https://sites.google.com/view/imageconddyn
Tasks
Published 2017-11-14
URL http://arxiv.org/abs/1711.05253v3
PDF http://arxiv.org/pdf/1711.05253v3.pdf
PWC https://paperswithcode.com/paper/learning-image-conditioned-dynamics-models
Repo
Framework

Bi-class classification of humpback whale sound units against complex background noise with Deep Convolution Neural Network

Title Bi-class classification of humpback whale sound units against complex background noise with Deep Convolution Neural Network
Authors Cazau Dorian, Riwal Lefort, Julien Bonnel, Jean-Luc Zarader, Olivier Adam
Abstract Automatically detecting sound units of humpback whales in complex time-varying background noise is a current challenge for scientists. In this paper, we explore the applicability of a Convolution Neural Network (CNN) method for this task. In the evaluation stage, we present six bi-class classification experiments on whale sound detection against different background noise types (e.g., rain, wind). In comparison to classical FFT-based representations such as spectrograms, we show that image-based pretrained CNN features yield higher performance in classifying whale sounds and background noise.
Tasks
Published 2017-03-31
URL http://arxiv.org/abs/1703.10887v1
PDF http://arxiv.org/pdf/1703.10887v1.pdf
PWC https://paperswithcode.com/paper/bi-class-classification-of-humpback-whale
Repo
Framework

Continuous-Time Flows for Efficient Inference and Density Estimation

Title Continuous-Time Flows for Efficient Inference and Density Estimation
Authors Changyou Chen, Chunyuan Li, Liqun Chen, Wenlin Wang, Yunchen Pu, Lawrence Carin
Abstract Two fundamental problems in unsupervised learning are efficient inference for latent-variable models and robust density estimation based on large amounts of unlabeled data. Algorithms for the two tasks, such as normalizing flows and generative adversarial networks (GANs), are often developed independently. In this paper, we propose the concept of continuous-time flows (CTFs), a family of diffusion-based methods that are able to asymptotically approach a target distribution. Distinct from normalizing flows and GANs, CTFs can be adopted to achieve the above two goals in one framework, with theoretical guarantees. Our framework includes distilling knowledge from a CTF for efficient inference, and learning an explicit energy-based distribution with CTFs for density estimation. Both tasks rely on a new technique for distribution matching within amortized learning. Experiments on various tasks demonstrate promising performance of the proposed CTF framework, compared to related techniques.
Tasks Density Estimation, Latent Variable Models
Published 2017-09-04
URL http://arxiv.org/abs/1709.01179v4
PDF http://arxiv.org/pdf/1709.01179v4.pdf
PWC https://paperswithcode.com/paper/continuous-time-flows-for-efficient-inference
Repo
Framework
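
As a rough illustration of a diffusion that asymptotically approaches a target distribution, the sketch below runs discretized (unadjusted) Langevin dynamics toward a one-dimensional Gaussian. This is a standard textbook diffusion swapped in for illustration, not the paper's CTF construction; the distillation and amortized-learning steps are not modeled. The function name `langevin_samples`, the target N(2, 1), and all hyperparameters are assumptions.

```python
import math
import random


def langevin_samples(grad_energy, n_particles=2000, n_steps=200, step=0.05, seed=0):
    """Euler discretization of dX = -grad U(X) dt + sqrt(2) dW for many particles."""
    rng = random.Random(seed)
    xs = [0.0] * n_particles                      # all particles start at 0
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        xs = [x - step * grad_energy(x) + noise_scale * rng.gauss(0.0, 1.0)
              for x in xs]
    return xs


# Target density proportional to exp(-U(x)) with U(x) = (x - 2)^2 / 2, i.e. N(2, 1).
samples = langevin_samples(lambda x: x - 2.0)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With a finite step size the particle cloud settles on a distribution close to, but not exactly, the target (the stationary variance here is slightly above 1), which is one reason such flows are described as approaching the target asymptotically.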

Combining LSTM and Latent Topic Modeling for Mortality Prediction

Title Combining LSTM and Latent Topic Modeling for Mortality Prediction
Authors Yohan Jo, Lisa Lee, Shruti Palaskar
Abstract There is a great need for technologies that can predict the mortality of patients in intensive care units with both high accuracy and accountability. We present joint end-to-end neural network architectures that combine long short-term memory (LSTM) and a latent topic model to simultaneously train a classifier for mortality prediction and learn latent topics indicative of mortality from textual clinical notes. For topic interpretability, the topic modeling layer has been carefully designed as a single-layer network with constraints inspired by LDA. Experiments on the MIMIC-III dataset show that our models significantly outperform prior models that are based on LDA topics in mortality prediction. However, we achieve limited success with our method for interpreting topics from the trained models by looking at the neural network weights.
Tasks Mortality Prediction
Published 2017-09-08
URL http://arxiv.org/abs/1709.02842v1
PDF http://arxiv.org/pdf/1709.02842v1.pdf
PWC https://paperswithcode.com/paper/combining-lstm-and-latent-topic-modeling-for
Repo
Framework

Higher-order Pooling of CNN Features via Kernel Linearization for Action Recognition

Title Higher-order Pooling of CNN Features via Kernel Linearization for Action Recognition
Authors Anoop Cherian, Piotr Koniusz, Stephen Gould
Abstract Most successful deep learning algorithms for action recognition extend models designed for image-based tasks such as object recognition to video. Such extensions are typically trained for actions on single video frames or very short clips, and then their predictions from sliding-windows over the video sequence are pooled for recognizing the action at the sequence level. Usually this pooling step uses the first-order statistics of frame-level action predictions. In this paper, we explore the advantages of using higher-order correlations; specifically, we introduce Higher-order Kernel (HOK) descriptors generated from the late fusion of CNN classifier scores from all the frames in a sequence. To generate these descriptors, we use the idea of kernel linearization. Specifically, a similarity kernel matrix, which captures the temporal evolution of deep classifier scores, is first linearized into kernel feature maps. The HOK descriptors are then generated from the higher-order co-occurrences of these feature maps, and are then used as input to a video-level classifier. We provide experiments on two fine-grained action recognition datasets and show that our scheme leads to state-of-the-art results.
Tasks Object Recognition, Temporal Action Localization
Published 2017-01-19
URL http://arxiv.org/abs/1701.05432v1
PDF http://arxiv.org/pdf/1701.05432v1.pdf
PWC https://paperswithcode.com/paper/higher-order-pooling-of-cnn-features-via
Repo
Framework
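
The difference between first-order and higher-order pooling of frame-level scores can be seen in a small sketch. It uses plain second-order co-occurrence (averaged outer products of per-frame class scores) and omits the kernel linearization and temporal kernel that define the paper's HOK descriptors; the function names and the two toy sequences are assumptions.

```python
def first_order_pool(scores):
    """Average the per-frame class scores (first-order statistics)."""
    n = len(scores)
    return [sum(frame[k] for frame in scores) / n for k in range(len(scores[0]))]


def second_order_pool(scores):
    """Average the outer products of per-frame scores (class co-occurrences)."""
    n, d = len(scores), len(scores[0])
    return [[sum(f[i] * f[j] for f in scores) / n for j in range(d)]
            for i in range(d)]


# Two toy sequences of two frames each, with scores for two classes.
seq_a = [[1.0, 0.0], [0.0, 1.0]]   # confident, but the top class alternates
seq_b = [[0.5, 0.5], [0.5, 0.5]]   # uncertain in every frame

mean_a, mean_b = first_order_pool(seq_a), first_order_pool(seq_b)
cooc_a, cooc_b = second_order_pool(seq_a), second_order_pool(seq_b)
```

Both sequences have the same first-order descriptor, `[0.5, 0.5]`, while their second-order descriptors differ, so a video-level classifier fed the second-order statistics can separate sequences that mean pooling cannot.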

Sparse Photometric 3D Face Reconstruction Guided by Morphable Models

Title Sparse Photometric 3D Face Reconstruction Guided by Morphable Models
Authors Xuan Cao, Zhang Chen, Anpei Chen, Xin Chen, Cen Wang, Jingyi Yu
Abstract We present a novel 3D face reconstruction technique that leverages sparse photometric stereo (PS) and the latest advances in face registration/modeling from a single image. We observe that the 3D morphable face approach provides a reasonable geometry proxy for light position calibration. Specifically, we develop a robust optimization technique that can calibrate per-pixel lighting direction and illumination at very high precision without assuming uniform surface albedos. Next, we apply semantic segmentation on input images and the geometry proxy to refine hairy vs. bare skin regions using tailored filters. Experiments on synthetic and real data show that by using a very small set of images, our technique is able to reconstruct fine geometric details such as wrinkles, eyebrows, whelks, pores, etc., comparable to and sometimes surpassing movie-quality productions.
Tasks 3D Face Reconstruction, Calibration, Face Reconstruction, Semantic Segmentation
Published 2017-11-29
URL http://arxiv.org/abs/1711.10870v1
PDF http://arxiv.org/pdf/1711.10870v1.pdf
PWC https://paperswithcode.com/paper/sparse-photometric-3d-face-reconstruction
Repo
Framework

An Adversarial Neuro-Tensorial Approach For Learning Disentangled Representations

Title An Adversarial Neuro-Tensorial Approach For Learning Disentangled Representations
Authors Mengjiao Wang, Zhixin Shu, Shiyang Cheng, Yannis Panagakis, Dimitris Samaras, Stefanos Zafeiriou
Abstract Several factors contribute to the appearance of an object in a visual scene, including pose, illumination, and deformation, among others. Each factor accounts for a source of variability in the data, while the multiplicative interactions of these factors emulate the entangled variability, giving rise to the rich structure of visual object appearance. Disentangling such unobserved factors from visual data is a challenging task, especially when the data have been captured in uncontrolled recording conditions (also referred to as “in-the-wild”) and label information is not available. In this paper, we propose the first unsupervised deep learning method (with pseudo-supervision) for disentangling multiple latent factors of variation in face images captured in-the-wild. To this end, we propose a deep latent variable model, where the multiplicative interactions of multiple latent factors of variation are explicitly modelled by means of multilinear (tensor) structure. We demonstrate that the proposed approach indeed learns disentangled representations of facial expressions and pose, which can be used in various applications, including face editing, as well as 3D face reconstruction and classification of facial expression, identity and pose.
Tasks 3D Face Reconstruction, Face Reconstruction
Published 2017-11-28
URL http://arxiv.org/abs/1711.10402v2
PDF http://arxiv.org/pdf/1711.10402v2.pdf
PWC https://paperswithcode.com/paper/an-adversarial-neuro-tensorial-approach-for
Repo
Framework

Multiscale Strategies for Computing Optimal Transport

Title Multiscale Strategies for Computing Optimal Transport
Authors Samuel Gerber, Mauro Maggioni
Abstract This paper presents a multiscale approach to efficiently compute approximate optimal transport plans between point sets. It is particularly well-suited for point sets that are in high dimensions, but are close to being intrinsically low-dimensional. The approach is based on an adaptive multiscale decomposition of the point sets. The multiscale decomposition yields a sequence of optimal transport problems that are solved in a top-to-bottom fashion from the coarsest to the finest scale. We provide numerical evidence that this multiscale approach scales approximately linearly, in time and memory, in the number of nodes, instead of quadratically or worse for a direct solution. Empirically, the multiscale approach results in less than one percent relative error in the objective function. Furthermore, the multiscale plans constructed are of interest by themselves as they may be used to introduce novel features and notions of distances between point sets. An analysis of sets of brain MRI based on optimal transport distances illustrates the effectiveness of the proposed method on a real-world data set. The application demonstrates that multiscale optimal transport distances have the potential to improve on state-of-the-art metrics currently used in computational anatomy.
Tasks
Published 2017-08-08
URL http://arxiv.org/abs/1708.02469v1
PDF http://arxiv.org/pdf/1708.02469v1.pdf
PWC https://paperswithcode.com/paper/multiscale-strategies-for-computing-optimal
Repo
Framework

The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning

Title The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
Authors Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc Bellemare, Remi Munos
Abstract In this work we present a new agent architecture, called Reactor, which combines multiple algorithmic and architectural contributions to produce an agent with higher sample-efficiency than Prioritized Dueling DQN (Wang et al., 2016) and Categorical DQN (Bellemare et al., 2017), while giving better run-time performance than A3C (Mnih et al., 2016). Our first contribution is a new policy evaluation algorithm called Distributional Retrace, which brings multi-step off-policy updates to the distributional reinforcement learning setting. The same approach can be used to convert several classes of multi-step policy evaluation algorithms designed for expected value evaluation into distributional ones. Next, we introduce the β-leave-one-out policy gradient algorithm, which improves the trade-off between variance and bias by using action values as a baseline. Our final algorithmic contribution is a new prioritized replay algorithm for sequences, which exploits the temporal locality of neighboring observations for more efficient replay prioritization. Using the Atari 2600 benchmarks, we show that each of these innovations contributes to both the sample efficiency and final agent performance. Finally, we demonstrate that Reactor reaches state-of-the-art performance after 200 million frames and less than a day of training.
Tasks Atari Games, Distributional Reinforcement Learning
Published 2017-04-15
URL http://arxiv.org/abs/1704.04651v2
PDF http://arxiv.org/pdf/1704.04651v2.pdf
PWC https://paperswithcode.com/paper/the-reactor-a-fast-and-sample-efficient-actor
Repo
Framework

A Novel Document Generation Process for Topic Detection based on Hierarchical Latent Tree Models

Title A Novel Document Generation Process for Topic Detection based on Hierarchical Latent Tree Models
Authors Peixian Chen, Zhourong Chen, Nevin L. Zhang
Abstract We propose a novel document generation process based on hierarchical latent tree models (HLTMs) learned from data. An HLTM has a layer of observed word variables at the bottom and multiple layers of latent variables on top. For each document, we first sample values for the latent variables layer by layer via logic sampling, then draw relative frequencies for the words conditioned on the values of the latent variables, and finally generate words for the document using the relative word frequencies. The motivation for the work is to take word counts into consideration with HLTMs. In comparison with LDA-based hierarchical document generation processes, the new process achieves drastically better model fit with far fewer parameters. It also yields more meaningful topics and topic hierarchies. It is the new state of the art for hierarchical topic detection.
Tasks Topic Models
Published 2017-12-12
URL https://arxiv.org/abs/1712.04116v3
PDF https://arxiv.org/pdf/1712.04116v3.pdf
PWC https://paperswithcode.com/paper/document-generation-with-hierarchical-latent
Repo
Framework
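
The three-stage generation process described in the abstract (sample latent variables layer by layer, draw relative word frequencies, then generate words) can be sketched on a toy two-layer tree. Everything specific here is a hypothetical stand-in: the vocabulary, the tree shape, the conditional probabilities, and in particular the use of a Dirichlet draw (via normalized gamma variates) for the relative frequencies, which the abstract does not specify.

```python
import random

VOCAB = ["goal", "match", "stock", "bank"]  # hypothetical vocabulary


def generate_document(n_words=20, seed=0):
    rng = random.Random(seed)

    # 1) Sample binary latent variables top-down, one layer at a time.
    z_top = rng.random() < 0.5
    p_sports = 0.8 if z_top else 0.2          # hypothetical conditional probabilities
    z_sports = rng.random() < p_sports
    z_finance = rng.random() < (1.0 - p_sports)

    # 2) Draw relative word frequencies conditioned on the latent states.
    #    Assumption of this sketch: a Dirichlet draw whose concentration is
    #    large for words attached to an "on" latent variable.
    alpha = [5.0 if z_sports else 0.5, 5.0 if z_sports else 0.5,
             5.0 if z_finance else 0.5, 5.0 if z_finance else 0.5]
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    freqs = [g / total for g in gammas]

    # 3) Generate the document's words from the relative frequencies.
    words = rng.choices(VOCAB, weights=freqs, k=n_words)
    return words, freqs


words, freqs = generate_document()
```

Because words are drawn repeatedly from per-document frequencies, the same word can occur many times in one document, which is how a process of this shape takes word counts into account.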

Improving Language Modeling using Densely Connected Recurrent Neural Networks

Title Improving Language Modeling using Densely Connected Recurrent Neural Networks
Authors Fréderic Godin, Joni Dambre, Wesley De Neve
Abstract In this paper, we introduce the novel concept of densely connected layers into recurrent neural networks. We evaluate our proposed architecture on the Penn Treebank language modeling task. We show that we can obtain similar perplexity scores with six times fewer parameters compared to a standard stacked 2-layer LSTM model trained with dropout (Zaremba et al. 2014). In contrast with the current usage of skip connections, we show that densely connecting only a few stacked layers with skip connections already yields significant perplexity reductions.
Tasks Language Modelling
Published 2017-07-19
URL http://arxiv.org/abs/1707.06130v1
PDF http://arxiv.org/pdf/1707.06130v1.pdf
PWC https://paperswithcode.com/paper/improving-language-modeling-using-densely
Repo
Framework

Interactive Medical Image Segmentation using Deep Learning with Image-specific Fine-tuning

Title Interactive Medical Image Segmentation using Deep Learning with Image-specific Fine-tuning
Authors Guotai Wang, Wenqi Li, Maria A. Zuluaga, Rosalind Pratt, Premal A. Patel, Michael Aertsen, Tom Doel, Anna L. David, Jan Deprest, Sebastien Ourselin, Tom Vercauteren
Abstract Convolutional neural networks (CNNs) have achieved state-of-the-art performance for automatic medical image segmentation. However, they have not demonstrated sufficiently accurate and robust results for clinical use. In addition, they are limited by the lack of image-specific adaptation and the lack of generalizability to previously unseen object classes. To address these problems, we propose a novel deep learning-based framework for interactive segmentation by incorporating CNNs into a bounding box and scribble-based segmentation pipeline. We propose image-specific fine-tuning to make a CNN model adaptive to a specific test image, which can be either unsupervised (without additional user interactions) or supervised (with additional scribbles). We also propose a weighted loss function considering network and interaction-based uncertainty for the fine-tuning. We applied this framework to two applications: 2D segmentation of multiple organs from fetal MR slices, where only two types of these organs were annotated for training; and 3D segmentation of brain tumor core (excluding edema) and whole brain tumor (including edema) from different MR sequences, where only tumor cores in one MR sequence were annotated for training. Experimental results show that 1) our model is more robust to segment previously unseen objects than state-of-the-art CNNs; 2) image-specific fine-tuning with the proposed weighted loss function significantly improves segmentation accuracy; and 3) our method leads to accurate results with fewer user interactions and less user time than traditional interactive segmentation methods.
Tasks Interactive Segmentation, Medical Image Segmentation, Semantic Segmentation
Published 2017-10-11
URL http://arxiv.org/abs/1710.04043v1
PDF http://arxiv.org/pdf/1710.04043v1.pdf
PWC https://paperswithcode.com/paper/interactive-medical-image-segmentation-using
Repo
Framework

Detecting Parts for Action Localization

Title Detecting Parts for Action Localization
Authors Nicolas Chesneau, Grégory Rogez, Karteek Alahari, Cordelia Schmid
Abstract In this paper, we propose a new framework for action localization that tracks people in videos and extracts full-body human tubes, i.e., spatio-temporal regions localizing actions, even in the case of occlusions or truncations. This is achieved by training a novel human part detector that scores visible parts while regressing full-body bounding boxes. The core of our method is a convolutional neural network which learns part proposals specific to certain body parts. These are then combined to detect people robustly in each frame. Our tracking algorithm connects the image detections temporally to extract full-body human tubes. We apply our new tube extraction method on the problem of human action localization, on the popular JHMDB dataset, and a very recent challenging dataset DALY (Daily Action Localization in YouTube), showing state-of-the-art results.
Tasks Action Localization
Published 2017-07-19
URL http://arxiv.org/abs/1707.06005v2
PDF http://arxiv.org/pdf/1707.06005v2.pdf
PWC https://paperswithcode.com/paper/detecting-parts-for-action-localization
Repo
Framework

Adaptive multi-penalty regularization based on a generalized Lasso path

Title Adaptive multi-penalty regularization based on a generalized Lasso path
Authors Markus Grasmair, Timo Klock, Valeriya Naumova
Abstract For many algorithms, parameter tuning remains a challenging and critical task, which becomes tedious and infeasible in a multi-parameter setting. Multi-penalty regularization, successfully used for solving undetermined sparse regression problems of unmixing type where signal and noise are additively mixed, is one such example. In this paper, we propose a novel algorithmic framework for an adaptive parameter choice in multi-penalty regularization with a focus on correct support recovery. Building upon the theory of regularization paths and algorithms for single-penalty functionals, we extend these ideas to a multi-penalty framework by providing an efficient procedure for the construction of regions containing structurally similar solutions, i.e., solutions with the same sparsity and sign pattern, over the whole range of parameters. Combining this with a model selection criterion, we can choose regularization parameters in a data-adaptive manner. Another advantage of our algorithm is that it provides an overview of the solution stability over the whole range of parameters. This can be further exploited to obtain additional insights into the problem of interest. We provide a numerical analysis of our method and compare it to the state-of-the-art single-penalty algorithms for compressed sensing problems in order to demonstrate the robustness and power of the proposed algorithm.
Tasks Model Selection
Published 2017-10-11
URL http://arxiv.org/abs/1710.03971v1
PDF http://arxiv.org/pdf/1710.03971v1.pdf
PWC https://paperswithcode.com/paper/adaptive-multi-penalty-regularization-based
Repo
Framework
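
The idea of "regions containing structurally similar solutions" is easiest to see in the single-penalty case that the paper builds on. The sketch below assumes an identity design matrix, so the Lasso solution of min 1/2 ||y - x||^2 + lam ||x||_1 is given by soft-thresholding, and the sign pattern is constant between breakpoints of lam. The function names and the data vector are assumptions; the multi-penalty construction in the paper works over a higher-dimensional parameter range and is not reproduced here.

```python
def soft_threshold(y, lam):
    """Lasso solution for an identity design: shrink each entry toward zero by lam."""
    return [max(abs(v) - lam, 0.0) * (1 if v > 0 else -1) for v in y]


def sign_pattern(x):
    """Sign of each entry as -1, 0 or +1 (this also encodes the support)."""
    return tuple((v > 0) - (v < 0) for v in x)


def path_regions(y):
    """Split lam >= 0 into intervals on which the solution's sign pattern is constant."""
    breakpoints = sorted({abs(v) for v in y if v != 0})
    edges = [0.0] + breakpoints
    regions = []
    for lo, hi in zip(edges, edges[1:] + [float("inf")]):
        # Probe one lam strictly inside the interval.
        probe = lo + 1.0 if hi == float("inf") else (lo + hi) / 2
        regions.append(((lo, hi), sign_pattern(soft_threshold(y, probe))))
    return regions


y = [3.0, -1.0, 0.5]
regions = path_regions(y)
```

Each entry of `regions` pairs an interval of the regularization parameter with the sign pattern shared by every solution in that interval; a model selection criterion can then be evaluated once per region rather than on a dense grid of parameter values.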