January 30, 2020


Paper Group ANR 222


Fast and Provable ADMM for Learning with Generative Priors


Title	Fast and Provable ADMM for Learning with Generative Priors
Authors	Fabian Latorre Gómez, Armin Eftekhari, Volkan Cevher
Abstract	In this work, we propose a (linearized) Alternating Direction Method-of-Multipliers (ADMM) algorithm for minimizing a convex function subject to a nonconvex constraint. We focus on the special case where such constraint arises from the specification that a variable should lie in the range of a neural network. This is motivated by recent successful applications of Generative Adversarial Networks (GANs) in tasks like compressive sensing, denoising and robustness against adversarial examples. The derived rates for our algorithm are characterized in terms of certain geometric properties of the generator network, which we show hold for feedforward architectures, under mild assumptions. Unlike gradient descent (GD), it can efficiently handle non-smooth objectives as well as exploit efficient partial minimization procedures, thus being faster in many practical scenarios.
Tasks	Compressive Sensing, Denoising
Published	2019-07-07
URL	https://arxiv.org/abs/1907.03343v1
PDF	https://arxiv.org/pdf/1907.03343v1.pdf
PWC	https://paperswithcode.com/paper/fast-and-provable-admm-for-learning-with
Repo
Framework

Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games


Title	Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games
Authors	Kaiqing Zhang, Zhuoran Yang, Tamer Başar
Abstract	We study the global convergence of policy optimization for finding the Nash equilibria (NE) in zero-sum linear quadratic (LQ) games. To this end, we first investigate the landscape of LQ games, viewing it as a nonconvex-nonconcave saddle-point problem in the policy space. Specifically, we show that despite its nonconvexity and nonconcavity, zero-sum LQ games have the property that the stationary point of the objective function with respect to the linear feedback control policies constitutes the NE of the game. Building upon this, we develop three projected nested-gradient methods that are guaranteed to converge to the NE of the game. Moreover, we show that all of these algorithms enjoy both globally sublinear and locally linear convergence rates. Simulation results are also provided to illustrate the satisfactory convergence properties of the algorithms. To the best of our knowledge, this work appears to be the first one to investigate the optimization landscape of LQ games, and provably show the convergence of policy optimization methods to the Nash equilibria. Our work serves as an initial step toward understanding the theoretical aspects of policy-based reinforcement learning algorithms for zero-sum Markov games in general.
Tasks
Published	2019-05-31
URL	https://arxiv.org/abs/1906.00729v2
PDF	https://arxiv.org/pdf/1906.00729v2.pdf
PWC	https://paperswithcode.com/paper/190600729
Repo
Framework

On-Policy Robot Imitation Learning from a Converging Supervisor


Title	On-Policy Robot Imitation Learning from a Converging Supervisor
Authors	Ashwin Balakrishna, Brijen Thananjeyan, Jonathan Lee, Felix Li, Arsh Zahed, Joseph E. Gonzalez, Ken Goldberg
Abstract	Existing on-policy imitation learning algorithms, such as DAgger, assume access to a fixed supervisor. However, there are many settings where the supervisor may evolve during policy learning, such as a human performing a novel task or an improving algorithmic controller. We formalize imitation learning from a “converging supervisor” and provide sublinear static and dynamic regret guarantees against the best policy in hindsight with labels from the converged supervisor, even when labels during learning are only from intermediate supervisors. We then show that this framework is closely connected to a class of reinforcement learning (RL) algorithms known as dual policy iteration (DPI), which alternate between training a reactive learner with imitation learning and a model-based supervisor with data from the learner. Experiments suggest that when this framework is applied with the state-of-the-art deep model-based RL algorithm PETS as an improving supervisor, it outperforms deep RL baselines on continuous control tasks and provides up to an 80-fold speedup in policy evaluation.
Tasks	Continuous Control, Imitation Learning
Published	2019-07-08
URL	https://arxiv.org/abs/1907.03423v6
PDF	https://arxiv.org/pdf/1907.03423v6.pdf
PWC	https://paperswithcode.com/paper/on-policy-robot-imitation-learning-from-a
Repo
Framework

Source Generator Attribution via Inversion


Title	Source Generator Attribution via Inversion
Authors	Michael Albright, Scott McCloskey
Abstract	With advances in Generative Adversarial Networks (GANs) leading to dramatically-improved synthetic images and video, there is an increased need for algorithms which extend traditional forensics to this new category of imagery. While GANs have been shown to be helpful in a number of computer vision applications, there are other problematic uses such as ‘deep fakes’ which necessitate such forensics. Source camera attribution algorithms using various cues have addressed this need for imagery captured by a camera, but there are fewer options for synthetic imagery. We address the problem of attributing a synthetic image to a specific generator in a white box setting, by inverting the process of generation. This enables us to simultaneously determine whether the generator produced the image and recover an input which produces a close match to the synthetic image.
Tasks
Published	2019-05-06
URL	https://arxiv.org/abs/1905.02259v2
PDF	https://arxiv.org/pdf/1905.02259v2.pdf
PWC	https://paperswithcode.com/paper/source-generator-attribution-via-inversion
Repo
Framework

Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems


Title	Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems
Authors	Andrew J. Taylor, Victor D. Dorobantu, Hoang M. Le, Yisong Yue, Aaron D. Ames
Abstract	Many modern nonlinear control methods aim to endow systems with guaranteed properties, such as stability or safety, and have been successfully applied to the domain of robotics. However, model uncertainty remains a persistent challenge, weakening theoretical guarantees and causing implementation failures on physical systems. This paper develops a machine learning framework centered around Control Lyapunov Functions (CLFs) to adapt to parametric uncertainty and unmodeled dynamics in general robotic systems. Our proposed method proceeds by iteratively updating estimates of Lyapunov function derivatives and improving controllers, ultimately yielding a stabilizing quadratic program model-based controller. We validate our approach on a planar Segway simulation, demonstrating substantial performance improvements by iteratively refining on a base model-free controller.
Tasks
Published	2019-03-04
URL	http://arxiv.org/abs/1903.01577v1
PDF	http://arxiv.org/pdf/1903.01577v1.pdf
PWC	https://paperswithcode.com/paper/episodic-learning-with-control-lyapunov
Repo
Framework

Motion Perception in Reinforcement Learning with Dynamic Objects


Title	Motion Perception in Reinforcement Learning with Dynamic Objects
Authors	Artemij Amiranashvili, Alexey Dosovitskiy, Vladlen Koltun, Thomas Brox
Abstract	In dynamic environments, learned controllers are supposed to take motion into account when selecting the action to be taken. However, in existing reinforcement learning works motion is rarely treated explicitly; it is rather assumed that the controller learns the necessary motion representation from temporal stacks of frames implicitly. In this paper, we show that for continuous control tasks learning an explicit representation of motion improves the quality of the learned controller in dynamic scenarios. We demonstrate this on common benchmark tasks (Walker, Swimmer, Hopper), on target reaching and ball catching tasks with simulated robotic arms, and on a dynamic single ball juggling task. Moreover, we find that when equipped with an appropriate network architecture, the agent can, on some tasks, learn motion features also with pure reinforcement learning, without additional supervision. Further we find that using an image difference between the current and the previous frame as an additional input leads to better results than a temporal stack of frames.
Tasks	Continuous Control
Published	2019-01-10
URL	http://arxiv.org/abs/1901.03162v2
PDF	http://arxiv.org/pdf/1901.03162v2.pdf
PWC	https://paperswithcode.com/paper/motion-perception-in-reinforcement-learning
Repo
Framework
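
The abstract's last finding, that feeding the difference between the current and previous frame as an extra input works better than a temporal stack of frames, can be illustrated with a minimal sketch. The function name `with_frame_difference` and the nested-list frame layout are assumptions made for illustration; they are not taken from the paper.

```python
def with_frame_difference(prev_frame, curr_frame):
    """Return a two-channel observation: [current frame, current - previous].

    Frames are hypothetical grayscale images stored as lists of rows.
    """
    diff = [
        [c - p for c, p in zip(curr_row, prev_row)]
        for curr_row, prev_row in zip(curr_frame, prev_frame)
    ]
    return [curr_frame, diff]


# A single bright pixel moves one column to the right between frames.
prev_frame = [[0, 0, 0], [0, 1, 0]]
curr_frame = [[0, 0, 0], [0, 0, 1]]
obs = with_frame_difference(prev_frame, curr_frame)
```

In the difference channel the static background cancels to zero and only the moving pixel leaves a signed trace, which is the motion cue the extra input is meant to expose.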

An Active Learning Framework for Efficient Robust Policy Search


Title	An Active Learning Framework for Efficient Robust Policy Search
Authors	Sai Kiran Narayanaswami, Nandan Sudarsanam, Balaraman Ravindran
Abstract	Robust Policy Search is the problem of learning policies that do not degrade in performance when subject to unseen environment model parameters. It is particularly relevant for transferring policies learned in a simulation environment to the real world. Several existing approaches involve sampling large batches of trajectories which reflect the differences in various possible environments, and then selecting some subset of these to learn robust policies, such as the ones that result in the worst performance. We propose an active learning based framework, EffAcTS, to selectively choose model parameters for this purpose so as to collect only as much data as necessary to select such a subset. We apply this framework to an existing method, namely EPOpt, and experimentally validate the gains in sample efficiency and the performance of our approach on standard continuous control tasks. We also present a Multi-Task Learning perspective to the problem of Robust Policy Search, and draw connections from our proposed framework to existing work on Multi-Task Learning.
Tasks	Active Learning, Continuous Control, Multi-Task Learning
Published	2019-01-01
URL	http://arxiv.org/abs/1901.00117v1
PDF	http://arxiv.org/pdf/1901.00117v1.pdf
PWC	https://paperswithcode.com/paper/an-active-learning-framework-for-efficient
Repo
Framework
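
The subset selection that the abstract attributes to existing approaches (keeping the trajectories with the worst performance) can be sketched as below. This is not the EffAcTS active learning procedure itself, and the name `worst_fraction` and the parameter `epsilon` are hypothetical.

```python
import math


def worst_fraction(returns, epsilon):
    """Indices of the ceil(epsilon * n) lowest-return trajectories."""
    if not 0 < epsilon <= 1:
        raise ValueError("epsilon must be in (0, 1]")
    k = math.ceil(epsilon * len(returns))
    # Sort trajectory indices by their return, lowest first.
    order = sorted(range(len(returns)), key=lambda i: returns[i])
    return order[:k]


# Made-up returns for five sampled environment parameter settings.
returns = [12.0, -3.0, 7.5, 0.5, 9.0]
selected = worst_fraction(returns, 0.4)
```

Note that every trajectory must be rolled out before the worst ones can be ranked; the abstract's stated goal is to collect only as much data as is needed to pick such a subset.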

VAIS ASR: Building a conversational speech recognition system using language model combination


Title	VAIS ASR: Building a conversational speech recognition system using language model combination
Authors	Quang Minh Nguyen, Thai Binh Nguyen, Ngoc Phuong Pham, The Loc Nguyen
Abstract	Automatic Speech Recognition (ASR) systems have been evolving quickly and reaching human parity in certain cases. These systems usually perform well on read-style, clean speech; however, most available systems struggle when the speaking style is conversational and the environment is noisy. It is not straightforward to tackle such problems due to difficulties in data collection for both speech and text. In this paper, we attempt to mitigate the problems using language model combination techniques that allow us to utilize both a large amount of written-style text and a small amount of conversational text data. Evaluation on the VLSP 2019 ASR challenges showed that our system achieved 4.85% WER on the VLSP 2018 and 15.09% WER on the VLSP 2019 data sets.
Tasks	Language Modelling, Speech Recognition
Published	2019-10-12
URL	https://arxiv.org/abs/1910.05603v1
PDF	https://arxiv.org/pdf/1910.05603v1.pdf
PWC	https://paperswithcode.com/paper/vais-asr-building-a-conversational-speech
Repo
Framework
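
The abstract does not say which combination technique is used. One common option is linear interpolation of two language models, sketched here with toy unigram distributions. The function name, the weight, and the probabilities are all assumptions made for illustration.

```python
def interpolate(p_written, p_conversational, weight):
    """Linear interpolation: weight * p_conv(w) + (1 - weight) * p_written(w)."""
    vocab = set(p_written) | set(p_conversational)
    return {
        w: weight * p_conversational.get(w, 0.0)
        + (1.0 - weight) * p_written.get(w, 0.0)
        for w in vocab
    }


# Toy unigram models: one from written text, one from conversational text.
p_written = {"hello": 0.2, "therefore": 0.5, "yeah": 0.3}
p_conv = {"hello": 0.4, "therefore": 0.1, "yeah": 0.5}
mixed = interpolate(p_written, p_conv, 0.5)
```

Because both inputs are probability distributions over the same vocabulary and the weights sum to one, the interpolated model is also a valid distribution. In practice the weight would typically be tuned on held-out conversational text.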

Accurate and Robust Pulmonary Nodule Detection by 3D Feature Pyramid Network with Self-supervised Feature Learning


Title	Accurate and Robust Pulmonary Nodule Detection by 3D Feature Pyramid Network with Self-supervised Feature Learning
Authors	Jingya Liu, Liangliang Cao, Oguz Akin, Yingli Tian
Abstract	Accurate detection of pulmonary nodules with high sensitivity and specificity is essential for automatic lung cancer diagnosis from CT scans. Although many deep learning-based algorithms have made great progress in improving the accuracy of nodule detection, the high false positive rate is still a challenging problem which limits automatic diagnosis in routine clinical practice. Moreover, the CT scans collected from multiple manufacturers may affect the robustness of Computer-aided diagnosis (CAD) due to the differences in intensity scales and machine noises. In this paper, we propose a novel self-supervised learning assisted pulmonary nodule detection framework based on a 3D Feature Pyramid Network (3DFPN) to improve the sensitivity of nodule detection by employing multi-scale features to increase the resolution of nodules, as well as a parallel top-down path to transit the high-level semantic features to complement low-level general features. Furthermore, a High Sensitivity and Specificity (HS2) network is introduced to eliminate the false positive nodule candidates by tracking the appearance changes in continuous CT slices of each nodule candidate on Location History Images (LHI). In addition, in order to improve the performance consistency of the proposed framework across data captured by different CT scanners without using additional annotations, an effective self-supervised learning schema is applied to learn spatiotemporal features of CT scans from large-scale unlabeled data. The performance and robustness of our method are evaluated on several publicly available datasets with significant performance improvements. The proposed framework is able to accurately detect pulmonary nodules with high sensitivity and specificity and achieves 90.6% sensitivity with 1/8 false positive per scan, which outperforms the state-of-the-art results by 15.8% on the LUNA16 dataset.
Tasks	Lung Cancer Diagnosis
Published	2019-07-25
URL	https://arxiv.org/abs/1907.11704v1
PDF	https://arxiv.org/pdf/1907.11704v1.pdf
PWC	https://paperswithcode.com/paper/accurate-and-robust-pulmonary-nodule
Repo
Framework

Random Sum-Product Forests with Residual Links


Title	Random Sum-Product Forests with Residual Links
Authors	Fabrizio Ventola, Karl Stelzner, Alejandro Molina, Kristian Kersting
Abstract	Tractable yet expressive density estimators are a key building block of probabilistic machine learning. While sum-product networks (SPNs) offer attractive inference capabilities, obtaining structures large enough to fit complex, high-dimensional data has proven challenging. In this paper, we present random sum-product forests (RSPFs), an ensemble approach for mixing multiple randomly generated SPNs. We also introduce residual links, which reference specialized substructures of other component SPNs in order to leverage the context-specific knowledge encoded within them. Our empirical evidence demonstrates that RSPFs provide better performance than their individual components. Adding residual links improves the models further, allowing the resulting ResSPNs to be competitive with commonly used structure learning methods.
Tasks
Published	2019-08-08
URL	https://arxiv.org/abs/1908.03250v1
PDF	https://arxiv.org/pdf/1908.03250v1.pdf
PWC	https://paperswithcode.com/paper/random-sum-product-forests-with-residual
Repo
Framework
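
Mixing several density estimators, as the abstract describes for RSPFs, amounts to a weighted sum of component densities. The sketch below evaluates such a mixture in log space. The component log-densities are plain numbers standing in for the outputs of individual SPNs, and the uniform weights are an assumption; the abstract does not state how the weights are chosen.

```python
import math


def mixture_log_density(component_log_densities, weights):
    """log sum_i w_i * p_i(x), computed with the log-sum-exp trick."""
    terms = [math.log(w) + lp for w, lp in zip(weights, component_log_densities)]
    m = max(terms)
    # Subtracting the maximum keeps the exponentials numerically stable.
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Hypothetical log-densities that two component models assign to one sample.
log_ps = [math.log(0.2), math.log(0.6)]
log_p = mixture_log_density(log_ps, [0.5, 0.5])
```

With equal weights the mixture density is the average of the component densities, here 0.4.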

Glyph: Fast and Accurately Training Deep Neural Networks on Encrypted Data


Title	Glyph: Fast and Accurately Training Deep Neural Networks on Encrypted Data
Authors	Qian Lou, Bo Feng, Geoffrey C. Fox, Lei Jiang
Abstract	Big data is one of the cornerstones to enabling and training deep neural networks (DNNs). Because of the lack of expertise, to gain benefits from their data, average users have to rely on and upload their private data to big data companies they may not trust. Due to compliance, legal, or privacy constraints, most users are willing to contribute only their encrypted data, and lack the interest or resources to join the training of DNNs in the cloud. To train a DNN on encrypted data in a completely non-interactive way, a recent work proposes a fully homomorphic encryption (FHE)-based technique implementing all activations in the neural network by Brakerski-Gentry-Vaikuntanathan (BGV)-based lookup tables. However, such inefficient lookup-table-based activations significantly prolong the training latency of privacy-preserving DNNs. In this paper, we propose Glyph, an FHE-based scheme to quickly and accurately train DNNs on encrypted data by switching between the TFHE (Fast Fully Homomorphic Encryption over the Torus) and BGV cryptosystems. Glyph uses logic-operation-friendly TFHE to implement nonlinear activations, while adopting vectorial-arithmetic-friendly BGV to perform multiply-accumulate (MAC) operations. Glyph further applies transfer learning to the training of DNNs to improve the test accuracy and reduce the number of MAC operations between ciphertext and ciphertext in convolutional layers. Our experimental results show that Glyph obtains state-of-the-art test accuracy while reducing the training latency by 99% over the prior FHE-based technique on various encrypted datasets.
Tasks	Transfer Learning
Published	2019-11-16
URL	https://arxiv.org/abs/1911.07101v2
PDF	https://arxiv.org/pdf/1911.07101v2.pdf
PWC	https://paperswithcode.com/paper/glyph-fast-and-accurately-training-deep
Repo
Framework

Unsupervised Representations of Pollen in Bright-Field Microscopy


Title	Unsupervised Representations of Pollen in Bright-Field Microscopy
Authors	Peter He, Gerard Glowacki, Alexis Gkantiragas
Abstract	We present the first unsupervised deep learning method for pollen analysis using bright-field microscopy. Using a modest dataset of 650 images of pollen grains collected from honey, we achieve family level identification of pollen. We embed images of pollen grains into a low-dimensional latent space and compare Euclidean and Riemannian metrics on these spaces for clustering. We propose this system for automated analysis of pollen and other microscopic biological structures which have only small or unlabelled datasets available.
Tasks
Published	2019-08-05
URL	https://arxiv.org/abs/1908.01866v1
PDF	https://arxiv.org/pdf/1908.01866v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-representations-of-pollen-in
Repo
Framework

Location Forensics of Media Recordings Utilizing Cascaded SVM and Pole-matching Classifiers


Title	Location Forensics of Media Recordings Utilizing Cascaded SVM and Pole-matching Classifiers
Authors	Jayanta Dey, Mohammad Ariful Haque
Abstract	Information regarding the location of the power distribution grid can be extracted from the power signature embedded in the multimedia signals (e.g., audio, video data) recorded near electrical activities. This implicit mechanism of identifying the origin-of-recording can be a very promising tool for multimedia forensics and security applications. In this work, we have developed a novel grid-of-origin identification system for media recordings that consists of a number of support vector machine (SVM) classifiers followed by pole-matching (PM) classifiers. First, we determine the nominal frequency of the grid (50 or 60 Hz) based on the spectral observation. Then an SVM classifier, trained for the detection of a grid with a particular nominal frequency, narrows down the list of possible grids on the basis of different discriminating features extracted from the electric network frequency (ENF) signal. The decision of the SVM classifier is then passed to the PM classifier, which detects the final grid based on the minimum distance between the estimated poles of test and training grids. Thus, we start from the problem of classifying grids with different nominal frequencies and simplify the classification problem in three stages: nominal frequency, SVM, and finally the PM classifier. This cascaded system of classification ensures better accuracy (15.57% higher) compared to traditional ENF-based SVM classifiers described in the literature.
Tasks
Published	2019-12-01
URL	https://arxiv.org/abs/1912.00519v1
PDF	https://arxiv.org/pdf/1912.00519v1.pdf
PWC	https://paperswithcode.com/paper/location-forensics-of-media-recordings
Repo
Framework
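
The first stage of the cascade, deciding between a 50 Hz and a 60 Hz nominal grid frequency from the spectrum, can be sketched by comparing the signal power at the two candidate frequencies. This is only an illustration of the idea, not the authors' implementation. The function names, the sample rate, and the synthetic test signal are assumptions.

```python
import math


def tone_power(samples, sample_rate, freq):
    """Normalized power of `samples` at `freq` (a single DFT-style correlation)."""
    re = sum(s * math.cos(2 * math.pi * freq * n / sample_rate)
             for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * n / sample_rate)
             for n, s in enumerate(samples))
    return (re * re + im * im) / len(samples) ** 2


def nominal_frequency(samples, sample_rate):
    """Return 50.0 or 60.0, whichever candidate carries more power."""
    return max((50.0, 60.0), key=lambda f: tone_power(samples, sample_rate, f))


# Synthetic one-second recording: a strong 60 Hz hum plus a weak 50 Hz component.
sample_rate = 1000
samples = [
    math.sin(2 * math.pi * 60.0 * n / sample_rate)
    + 0.1 * math.sin(2 * math.pi * 50.0 * n / sample_rate)
    for n in range(sample_rate)
]
detected = nominal_frequency(samples, sample_rate)
```

Only the two candidate frequencies are evaluated, so no full FFT is needed. A real recording would likely need band-pass filtering and longer windows before this comparison becomes reliable.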

Inverse Reinforcement Learning with Missing Data


Title	Inverse Reinforcement Learning with Missing Data
Authors	Tien Mai, Quoc Phong Nguyen, Kian Hsiang Low, Patrick Jaillet
Abstract	We consider the problem of recovering an expert’s reward function with inverse reinforcement learning (IRL) when there are missing/incomplete state-action pairs or observations in the demonstrated trajectories. This issue of missing trajectory data or information occurs in many situations, e.g., GPS signals from vehicles moving on a road network are intermittent. In this paper, we propose a tractable approach to directly compute the log-likelihood of demonstrated trajectories with incomplete/missing data. Our algorithm is efficient in handling a large number of missing segments in the demonstrated trajectories, as it performs the training with incomplete data by solving a sequence of systems of linear equations, and the number of such systems to be solved does not depend on the number of missing segments. Empirical evaluation on a real-world dataset shows that our training algorithm outperforms other conventional techniques.
Tasks
Published	2019-11-16
URL	https://arxiv.org/abs/1911.06930v1
PDF	https://arxiv.org/pdf/1911.06930v1.pdf
PWC	https://paperswithcode.com/paper/inverse-reinforcement-learning-with-missing
Repo
Framework

Outliagnostics: Visualizing Temporal Discrepancy in Outlying Signatures of Data Entries


Title	Outliagnostics: Visualizing Temporal Discrepancy in Outlying Signatures of Data Entries
Authors	Vung Pham, Tommy Dang
Abstract	This paper presents an approach to analyzing two-dimensional temporal datasets focusing on identifying observations that are significant in calculating the outliers of a scatterplot. We also propose a prototype, called Outliagnostics, to guide users when interactively exploring abnormalities in large time series. Instead of focusing on detecting outliers at each time point, we monitor and display the discrepant temporal signatures of each data entry concerning the overall distributions. Our prototype is designed to handle these tasks in parallel to improve performance. To highlight the benefits and performance of our approach, we illustrate and validate the use of Outliagnostics on real-world datasets of various sizes in different parallelism configurations. This work also discusses how to extend these ideas to handle time series with a higher number of dimensions and provides a prototype for this type of datasets.
Tasks	Time Series
Published	2019-10-30
URL	https://arxiv.org/abs/1910.13656v1
PDF	https://arxiv.org/pdf/1910.13656v1.pdf
PWC	https://paperswithcode.com/paper/outliagnostics-visualizing-temporal
Repo
Framework