Paper Group ANR 222
Fast and Provable ADMM for Learning with Generative Priors
Title | Fast and Provable ADMM for Learning with Generative Priors |
Authors | Fabian Latorre Gómez, Armin Eftekhari, Volkan Cevher |
Abstract | In this work, we propose a (linearized) Alternating Direction Method-of-Multipliers (ADMM) algorithm for minimizing a convex function subject to a nonconvex constraint. We focus on the special case where such constraint arises from the specification that a variable should lie in the range of a neural network. This is motivated by recent successful applications of Generative Adversarial Networks (GANs) in tasks like compressive sensing, denoising and robustness against adversarial examples. The derived rates for our algorithm are characterized in terms of certain geometric properties of the generator network, which we show hold for feedforward architectures, under mild assumptions. Unlike gradient descent (GD), it can efficiently handle non-smooth objectives as well as exploit efficient partial minimization procedures, thus being faster in many practical scenarios. |
Tasks | Compressive Sensing, Denoising |
Published | 2019-07-07 |
URL | https://arxiv.org/abs/1907.03343v1 |
https://arxiv.org/pdf/1907.03343v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-and-provable-admm-for-learning-with |
Repo | |
Framework | |
Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games
Title | Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games |
Authors | Kaiqing Zhang, Zhuoran Yang, Tamer Başar |
Abstract | We study the global convergence of policy optimization for finding the Nash equilibria (NE) in zero-sum linear quadratic (LQ) games. To this end, we first investigate the landscape of LQ games, viewing it as a nonconvex-nonconcave saddle-point problem in the policy space. Specifically, we show that despite its nonconvexity and nonconcavity, zero-sum LQ games have the property that the stationary point of the objective function with respect to the linear feedback control policies constitutes the NE of the game. Building upon this, we develop three projected nested-gradient methods that are guaranteed to converge to the NE of the game. Moreover, we show that all of these algorithms enjoy both globally sublinear and locally linear convergence rates. Simulation results are also provided to illustrate the satisfactory convergence properties of the algorithms. To the best of our knowledge, this work appears to be the first one to investigate the optimization landscape of LQ games, and provably show the convergence of policy optimization methods to the Nash equilibria. Our work serves as an initial step toward understanding the theoretical aspects of policy-based reinforcement learning algorithms for zero-sum Markov games in general. |
Tasks | |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1906.00729v2 |
https://arxiv.org/pdf/1906.00729v2.pdf | |
PWC | https://paperswithcode.com/paper/190600729 |
Repo | |
Framework | |
On-Policy Robot Imitation Learning from a Converging Supervisor
Title | On-Policy Robot Imitation Learning from a Converging Supervisor |
Authors | Ashwin Balakrishna, Brijen Thananjeyan, Jonathan Lee, Felix Li, Arsh Zahed, Joseph E. Gonzalez, Ken Goldberg |
Abstract | Existing on-policy imitation learning algorithms, such as DAgger, assume access to a fixed supervisor. However, there are many settings where the supervisor may evolve during policy learning, such as a human performing a novel task or an improving algorithmic controller. We formalize imitation learning from a “converging supervisor” and provide sublinear static and dynamic regret guarantees against the best policy in hindsight with labels from the converged supervisor, even when labels during learning are only from intermediate supervisors. We then show that this framework is closely connected to a class of reinforcement learning (RL) algorithms known as dual policy iteration (DPI), which alternate between training a reactive learner with imitation learning and a model-based supervisor with data from the learner. Experiments suggest that when this framework is applied with the state-of-the-art deep model-based RL algorithm PETS as an improving supervisor, it outperforms deep RL baselines on continuous control tasks and provides up to an 80-fold speedup in policy evaluation. |
Tasks | Continuous Control, Imitation Learning |
Published | 2019-07-08 |
URL | https://arxiv.org/abs/1907.03423v6 |
https://arxiv.org/pdf/1907.03423v6.pdf | |
PWC | https://paperswithcode.com/paper/on-policy-robot-imitation-learning-from-a |
Repo | |
Framework | |
Source Generator Attribution via Inversion
Title | Source Generator Attribution via Inversion |
Authors | Michael Albright, Scott McCloskey |
Abstract | With advances in Generative Adversarial Networks (GANs) leading to dramatically improved synthetic images and video, there is an increased need for algorithms which extend traditional forensics to this new category of imagery. While GANs have been shown to be helpful in a number of computer vision applications, there are other problematic uses such as 'deep fakes' which necessitate such forensics. Source camera attribution algorithms using various cues have addressed this need for imagery captured by a camera, but there are fewer options for synthetic imagery. We address the problem of attributing a synthetic image to a specific generator in a white box setting, by inverting the process of generation. This enables us to simultaneously determine whether the generator produced the image and recover an input which produces a close match to the synthetic image. |
Tasks | |
Published | 2019-05-06 |
URL | https://arxiv.org/abs/1905.02259v2 |
https://arxiv.org/pdf/1905.02259v2.pdf | |
PWC | https://paperswithcode.com/paper/source-generator-attribution-via-inversion |
Repo | |
Framework | |
Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems
Title | Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems |
Authors | Andrew J. Taylor, Victor D. Dorobantu, Hoang M. Le, Yisong Yue, Aaron D. Ames |
Abstract | Many modern nonlinear control methods aim to endow systems with guaranteed properties, such as stability or safety, and have been successfully applied to the domain of robotics. However, model uncertainty remains a persistent challenge, weakening theoretical guarantees and causing implementation failures on physical systems. This paper develops a machine learning framework centered around Control Lyapunov Functions (CLFs) to adapt to parametric uncertainty and unmodeled dynamics in general robotic systems. Our proposed method proceeds by iteratively updating estimates of Lyapunov function derivatives and improving controllers, ultimately yielding a stabilizing quadratic program model-based controller. We validate our approach on a planar Segway simulation, demonstrating substantial performance improvements by iteratively refining on a base model-free controller. |
Tasks | |
Published | 2019-03-04 |
URL | http://arxiv.org/abs/1903.01577v1 |
http://arxiv.org/pdf/1903.01577v1.pdf | |
PWC | https://paperswithcode.com/paper/episodic-learning-with-control-lyapunov |
Repo | |
Framework | |
Motion Perception in Reinforcement Learning with Dynamic Objects
Title | Motion Perception in Reinforcement Learning with Dynamic Objects |
Authors | Artemij Amiranashvili, Alexey Dosovitskiy, Vladlen Koltun, Thomas Brox |
Abstract | In dynamic environments, learned controllers are supposed to take motion into account when selecting the action to be taken. However, in existing reinforcement learning works motion is rarely treated explicitly; it is rather assumed that the controller learns the necessary motion representation from temporal stacks of frames implicitly. In this paper, we show that for continuous control tasks learning an explicit representation of motion improves the quality of the learned controller in dynamic scenarios. We demonstrate this on common benchmark tasks (Walker, Swimmer, Hopper), on target reaching and ball catching tasks with simulated robotic arms, and on a dynamic single ball juggling task. Moreover, we find that when equipped with an appropriate network architecture, the agent can, on some tasks, learn motion features also with pure reinforcement learning, without additional supervision. Further we find that using an image difference between the current and the previous frame as an additional input leads to better results than a temporal stack of frames. |
Tasks | Continuous Control |
Published | 2019-01-10 |
URL | http://arxiv.org/abs/1901.03162v2 |
http://arxiv.org/pdf/1901.03162v2.pdf | |
PWC | https://paperswithcode.com/paper/motion-perception-in-reinforcement-learning |
Repo | |
Framework | |
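The abstract above reports that feeding the image difference between the current and the previous frame as an additional input works better than a temporal stack of frames. A minimal sketch of that input construction follows, assuming grayscale frames stored as nested lists of numbers; the function name `add_difference_channel` is hypothetical and not taken from the paper.

```python
def add_difference_channel(current, previous):
    """Pair every pixel of `current` with its change since `previous`.

    Both frames are lists of rows of numbers (an assumed grayscale layout).
    Returns a frame of the same size whose pixels are (value, difference) tuples,
    i.e. the current image plus one extra "motion" channel.
    """
    same_shape = len(current) == len(previous) and all(
        len(row_c) == len(row_p) for row_c, row_p in zip(current, previous)
    )
    if not same_shape:
        raise ValueError("frames must have the same shape")
    return [
        [(c, c - p) for c, p in zip(row_c, row_p)]
        for row_c, row_p in zip(current, previous)
    ]


# A bright pixel moves from the bottom-right to the top-right corner.
previous_frame = [[0, 0], [0, 1]]
current_frame = [[0, 1], [0, 0]]
observation = add_difference_channel(current_frame, previous_frame)
```

The difference channel is nonzero only where something moved, which is the motion cue the abstract refers to; how the paper's network consumes this input is not specified here.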
An Active Learning Framework for Efficient Robust Policy Search
Title | An Active Learning Framework for Efficient Robust Policy Search |
Authors | Sai Kiran Narayanaswami, Nandan Sudarsanam, Balaraman Ravindran |
Abstract | Robust Policy Search is the problem of learning policies that do not degrade in performance when subject to unseen environment model parameters. It is particularly relevant for transferring policies learned in a simulation environment to the real world. Several existing approaches involve sampling large batches of trajectories which reflect the differences in various possible environments, and then selecting some subset of these to learn robust policies, such as the ones that result in the worst performance. We propose an active learning based framework, EffAcTS, to selectively choose model parameters for this purpose so as to collect only as much data as necessary to select such a subset. We apply this framework to an existing method, namely EPOpt, and experimentally validate the gains in sample efficiency and the performance of our approach on standard continuous control tasks. We also present a Multi-Task Learning perspective to the problem of Robust Policy Search, and draw connections from our proposed framework to existing work on Multi-Task Learning. |
Tasks | Active Learning, Continuous Control, Multi-Task Learning |
Published | 2019-01-01 |
URL | http://arxiv.org/abs/1901.00117v1 |
http://arxiv.org/pdf/1901.00117v1.pdf | |
PWC | https://paperswithcode.com/paper/an-active-learning-framework-for-efficient |
Repo | |
Framework | |
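The abstract above mentions existing approaches that sample a batch of trajectories and keep only a subset, such as the worst-performing ones, for the policy update. A minimal sketch of that selection step follows; the function name `worst_fraction` and the parameter `epsilon` are illustrative assumptions, and this is not the EffAcTS active-learning procedure itself.

```python
def worst_fraction(returns, epsilon):
    """Return indices of the lowest-return trajectories.

    `returns` holds one total return per sampled trajectory and `epsilon`
    is the fraction (0 < epsilon <= 1) of trajectories to keep.
    At least one trajectory is always selected.
    """
    if not 0 < epsilon <= 1:
        raise ValueError("epsilon must be in (0, 1]")
    keep = max(1, int(len(returns) * epsilon))
    # Sort trajectory indices from worst to best return.
    order = sorted(range(len(returns)), key=lambda i: returns[i])
    return order[:keep]


# Four trajectories sampled under different environment parameters.
selected = worst_fraction([10.0, 2.0, 7.0, 1.0], epsilon=0.5)
```

A robust update would then use only the selected trajectories, so the policy is trained on the environments where it currently performs worst.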
VAIS ASR: Building a conversational speech recognition system using language model combination
Title | VAIS ASR: Building a conversational speech recognition system using language model combination |
Authors | Quang Minh Nguyen, Thai Binh Nguyen, Ngoc Phuong Pham, The Loc Nguyen |
Abstract | Automatic Speech Recognition (ASR) systems have been evolving quickly and reaching human parity in certain cases. These systems usually perform well on read-style, clean speech; however, most available systems struggle when the speaking style is conversational and the environment is noisy. Such problems are not straightforward to tackle because of difficulties in collecting both speech and text data. In this paper, we attempt to mitigate the problems using language model combination techniques that allow us to utilize both a large amount of written-style text and a small amount of conversational text data. Evaluation on the VLSP 2019 ASR challenges showed that our system achieved 4.85% WER on the VLSP 2018 and 15.09% WER on the VLSP 2019 data sets. |
Tasks | Language Modelling, Speech Recognition |
Published | 2019-10-12 |
URL | https://arxiv.org/abs/1910.05603v1 |
https://arxiv.org/pdf/1910.05603v1.pdf | |
PWC | https://paperswithcode.com/paper/vais-asr-building-a-conversational-speech |
Repo | |
Framework | |
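The abstract above does not say which language model combination technique is used. Linear interpolation is one common option and is assumed here purely for illustration; the function name, the unigram dictionaries, and the weight value are all hypothetical.

```python
def interpolate_lms(p_written, p_conversational, weight):
    """Linearly interpolate two word distributions given as dicts.

    `weight` is the share given to the conversational model; the written-style
    model receives 1 - weight. Words missing from a model get probability 0.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    vocabulary = set(p_written) | set(p_conversational)
    return {
        word: weight * p_conversational.get(word, 0.0)
        + (1.0 - weight) * p_written.get(word, 0.0)
        for word in vocabulary
    }


# Toy unigram models: a large written-style one and a small conversational one.
written = {"the": 0.6, "report": 0.4}
conversational = {"the": 0.5, "yeah": 0.5}
combined = interpolate_lms(written, conversational, weight=0.25)
```

Because both inputs sum to one and the weights sum to one, the combined distribution also sums to one. In practice the weight would typically be tuned on held-out conversational text.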
Accurate and Robust Pulmonary Nodule Detection by 3D Feature Pyramid Network with Self-supervised Feature Learning
Title | Accurate and Robust Pulmonary Nodule Detection by 3D Feature Pyramid Network with Self-supervised Feature Learning |
Authors | Jingya Liu, Liangliang Cao, Oguz Akin, Yingli Tian |
Abstract | Accurate detection of pulmonary nodules with high sensitivity and specificity is essential for automatic lung cancer diagnosis from CT scans. Although many deep learning-based algorithms have made great progress in improving the accuracy of nodule detection, the high false positive rate is still a challenging problem which limits automatic diagnosis in routine clinical practice. Moreover, the CT scans collected from multiple manufacturers may affect the robustness of Computer-aided diagnosis (CAD) due to the differences in intensity scales and machine noises. In this paper, we propose a novel self-supervised learning assisted pulmonary nodule detection framework based on a 3D Feature Pyramid Network (3DFPN) to improve the sensitivity of nodule detection by employing multi-scale features to increase the resolution of nodules, as well as a parallel top-down path to transit the high-level semantic features to complement low-level general features. Furthermore, a High Sensitivity and Specificity (HS2) network is introduced to eliminate the false positive nodule candidates by tracking the appearance changes in continuous CT slices of each nodule candidate on Location History Images (LHI). In addition, in order to improve the performance consistency of the proposed framework across data captured by different CT scanners without using additional annotations, an effective self-supervised learning schema is applied to learn spatiotemporal features of CT scans from large-scale unlabeled data. The performance and robustness of our method are evaluated on several publicly available datasets with significant performance improvements. The proposed framework is able to accurately detect pulmonary nodules with high sensitivity and specificity, achieving 90.6% sensitivity at 1/8 false positives per scan, which outperforms the state-of-the-art results by 15.8% on the LUNA16 dataset. |
Tasks | Lung Cancer Diagnosis |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.11704v1 |
https://arxiv.org/pdf/1907.11704v1.pdf | |
PWC | https://paperswithcode.com/paper/accurate-and-robust-pulmonary-nodule |
Repo | |
Framework | |
Random Sum-Product Forests with Residual Links
Title | Random Sum-Product Forests with Residual Links |
Authors | Fabrizio Ventola, Karl Stelzner, Alejandro Molina, Kristian Kersting |
Abstract | Tractable yet expressive density estimators are a key building block of probabilistic machine learning. While sum-product networks (SPNs) offer attractive inference capabilities, obtaining structures large enough to fit complex, high-dimensional data has proven challenging. In this paper, we present random sum-product forests (RSPFs), an ensemble approach for mixing multiple randomly generated SPNs. We also introduce residual links, which reference specialized substructures of other component SPNs in order to leverage the context-specific knowledge encoded within them. Our empirical evidence demonstrates that RSPFs provide better performance than their individual components. Adding residual links improves the models further, allowing the resulting ResSPNs to be competitive with commonly used structure learning methods. |
Tasks | |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.03250v1 |
https://arxiv.org/pdf/1908.03250v1.pdf | |
PWC | https://paperswithcode.com/paper/random-sum-product-forests-with-residual |
Repo | |
Framework | |
Glyph: Fast and Accurately Training Deep Neural Networks on Encrypted Data
Title | Glyph: Fast and Accurately Training Deep Neural Networks on Encrypted Data |
Authors | Qian Lou, Bo Feng, Geoffrey C. Fox, Lei Jiang |
Abstract | Big data is one of the cornerstones of enabling and training deep neural networks (DNNs). Because they lack expertise, average users who want to benefit from their data have to rely on, and upload their private data to, big data companies they may not trust. Due to compliance, legal, or privacy constraints, most users are willing to contribute only their encrypted data, and lack the interest or resources to join the training of DNNs in the cloud. To train a DNN on encrypted data in a completely non-interactive way, a recent work proposes a fully homomorphic encryption (FHE)-based technique implementing all activations in the neural network by Brakerski-Gentry-Vaikuntanathan (BGV)-based lookup tables. However, such inefficient lookup-table-based activations significantly prolong the training latency of privacy-preserving DNNs. In this paper, we propose Glyph, an FHE-based scheme to quickly and accurately train DNNs on encrypted data by switching between the TFHE (Fast Fully Homomorphic Encryption over the Torus) and BGV cryptosystems. Glyph uses logic-operation-friendly TFHE to implement nonlinear activations, while adopting vectorial-arithmetic-friendly BGV to perform multiply-accumulate (MAC) operations. Glyph further applies transfer learning to the training of DNNs to improve the test accuracy and reduce the number of MAC operations between ciphertext and ciphertext in convolutional layers. Our experimental results show that Glyph obtains state-of-the-art test accuracy while reducing the training latency by 99% over the prior FHE-based technique on various encrypted datasets. |
Tasks | Transfer Learning |
Published | 2019-11-16 |
URL | https://arxiv.org/abs/1911.07101v2 |
https://arxiv.org/pdf/1911.07101v2.pdf | |
PWC | https://paperswithcode.com/paper/glyph-fast-and-accurately-training-deep |
Repo | |
Framework | |
Unsupervised Representations of Pollen in Bright-Field Microscopy
Title | Unsupervised Representations of Pollen in Bright-Field Microscopy |
Authors | Peter He, Gerard Glowacki, Alexis Gkantiragas |
Abstract | We present the first unsupervised deep learning method for pollen analysis using bright-field microscopy. Using a modest dataset of 650 images of pollen grains collected from honey, we achieve family level identification of pollen. We embed images of pollen grains into a low-dimensional latent space and compare Euclidean and Riemannian metrics on these spaces for clustering. We propose this system for automated analysis of pollen and other microscopic biological structures which have only small or unlabelled datasets available. |
Tasks | |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01866v1 |
https://arxiv.org/pdf/1908.01866v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-representations-of-pollen-in |
Repo | |
Framework | |
Location Forensics of Media Recordings Utilizing Cascaded SVM and Pole-matching Classifiers
Title | Location Forensics of Media Recordings Utilizing Cascaded SVM and Pole-matching Classifiers |
Authors | Jayanta Dey, Mohammad Ariful Haque |
Abstract | Information regarding the location of a power distribution grid can be extracted from the power signature embedded in multimedia signals (e.g., audio, video data) recorded near electrical activities. This implicit mechanism of identifying the origin of a recording can be a very promising tool for multimedia forensics and security applications. In this work, we have developed a novel grid-of-origin identification system for media recordings that consists of a number of support vector machine (SVM) classifiers followed by pole-matching (PM) classifiers. First, we determine the nominal frequency of the grid (50 or 60 Hz) based on the spectral observation. Then an SVM classifier, trained for the detection of a grid with a particular nominal frequency, narrows down the list of possible grids on the basis of different discriminating features extracted from the electric network frequency (ENF) signal. The decision of the SVM classifier is then passed to the PM classifier that detects the final grid based on the minimum distance between the estimated poles of test and training grids. Thus, we start from the problem of classifying grids with different nominal frequencies and simplify the problem of classification in three stages based on nominal frequency, SVM and finally the PM classifier. This cascaded system of classification ensures better accuracy (15.57% higher) compared to traditional ENF-based SVM classifiers described in the literature. |
Tasks | |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00519v1 |
https://arxiv.org/pdf/1912.00519v1.pdf | |
PWC | https://paperswithcode.com/paper/location-forensics-of-media-recordings |
Repo | |
Framework | |
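The first stage described in the abstract above decides whether a recording comes from a 50 Hz or a 60 Hz grid based on spectral observation. A minimal sketch of such a decision follows, assuming it is enough to compare the signal power at the two candidate frequencies; the function names and the single-bin correlation approach are illustrative assumptions, not the authors' implementation.

```python
import math


def power_at(signal, sample_rate, freq):
    """Estimate the power of `signal` at `freq` (Hz) with a single-bin DFT."""
    step = 2.0 * math.pi * freq / sample_rate
    real = sum(x * math.cos(step * n) for n, x in enumerate(signal))
    imag = sum(x * math.sin(step * n) for n, x in enumerate(signal))
    return (real * real + imag * imag) / len(signal)


def nominal_frequency(signal, sample_rate, candidates=(50.0, 60.0)):
    """Pick the candidate grid frequency that carries the most power."""
    return max(candidates, key=lambda f: power_at(signal, sample_rate, f))


# One second of a synthetic hum drifting slightly below 60 Hz, sampled at 1 kHz.
rate = 1000
hum = [math.sin(2.0 * math.pi * 59.98 * n / rate) for n in range(rate)]
detected = nominal_frequency(hum, rate)
```

The later SVM and pole-matching stages would then only consider grids sharing the detected nominal frequency; those stages depend on ENF features and pole estimates that the abstract does not detail, so they are not sketched here.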
Inverse Reinforcement Learning with Missing Data
Title | Inverse Reinforcement Learning with Missing Data |
Authors | Tien Mai, Quoc Phong Nguyen, Kian Hsiang Low, Patrick Jaillet |
Abstract | We consider the problem of recovering an expert’s reward function with inverse reinforcement learning (IRL) when there are missing/incomplete state-action pairs or observations in the demonstrated trajectories. This issue of missing trajectory data or information occurs in many situations, e.g., GPS signals from vehicles moving on a road network are intermittent. In this paper, we propose a tractable approach to directly compute the log-likelihood of demonstrated trajectories with incomplete/missing data. Our algorithm is efficient in handling a large number of missing segments in the demonstrated trajectories, as it performs the training with incomplete data by solving a sequence of systems of linear equations, and the number of such systems to be solved does not depend on the number of missing segments. Empirical evaluation on a real-world dataset shows that our training algorithm outperforms other conventional techniques. |
Tasks | |
Published | 2019-11-16 |
URL | https://arxiv.org/abs/1911.06930v1 |
https://arxiv.org/pdf/1911.06930v1.pdf | |
PWC | https://paperswithcode.com/paper/inverse-reinforcement-learning-with-missing |
Repo | |
Framework | |
Outliagnostics: Visualizing Temporal Discrepancy in Outlying Signatures of Data Entries
Title | Outliagnostics: Visualizing Temporal Discrepancy in Outlying Signatures of Data Entries |
Authors | Vung Pham, Tommy Dang |
Abstract | This paper presents an approach to analyzing two-dimensional temporal datasets focusing on identifying observations that are significant in calculating the outliers of a scatterplot. We also propose a prototype, called Outliagnostics, to guide users when interactively exploring abnormalities in large time series. Instead of focusing on detecting outliers at each time point, we monitor and display the discrepant temporal signatures of each data entry concerning the overall distributions. Our prototype is designed to handle these tasks in parallel to improve performance. To highlight the benefits and performance of our approach, we illustrate and validate the use of Outliagnostics on real-world datasets of various sizes in different parallelism configurations. This work also discusses how to extend these ideas to handle time series with a higher number of dimensions and provides a prototype for this type of datasets. |
Tasks | Time Series |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.13656v1 |
https://arxiv.org/pdf/1910.13656v1.pdf | |
PWC | https://paperswithcode.com/paper/outliagnostics-visualizing-temporal |
Repo | |
Framework | |