January 30, 2020

Paper Group ANR 203

Environment reconstruction on depth images using Generative Adversarial Networks

Title Environment reconstruction on depth images using Generative Adversarial Networks
Authors Lucas P. N. Matias, Jefferson R. Souza, Denis F. Wolf
Abstract Robust perception systems are essential for autonomous vehicle safety. To navigate in a complex urban environment, precise sensors with reliable data are necessary. The task of understanding the surroundings is hard by itself; for intelligent vehicles, it is even more critical due to the high speed at which the vehicle navigates. To successfully navigate in an urban environment, the perception system must quickly receive, process, and execute an action to guarantee both passenger and pedestrian safety. Stereo cameras collect environment information at many levels, e.g., depth, color, texture, shape, which guarantees ample knowledge about the surroundings. Even so, when compared to humans, computational methods lack the ability to deal with missing information, i.e., occlusions. For many perception tasks, this lack of data can be a hindrance because the information about the environment is incomplete. In this paper, we address this problem and discuss recent methods to deal with the inference of occluded areas. We then introduce a loss function focused on disparity and environment depth data reconstruction, and a Generative Adversarial Network (GAN) architecture able to deal with occluded information inference. Our results present a coherent reconstruction on depth maps, estimating regions occluded by different obstacles. Our final contribution is a loss function focused on disparity data and a GAN able to extract depth features and estimate depth data by inpainting disparity images.
Tasks
Published 2019-12-09
URL https://arxiv.org/abs/1912.03992v1
PDF https://arxiv.org/pdf/1912.03992v1.pdf
PWC https://paperswithcode.com/paper/environment-reconstruction-on-depth-images
Repo
Framework

Joint Subspace Recovery and Enhanced Locality Driven Robust Flexible Discriminative Dictionary Learning

Title Joint Subspace Recovery and Enhanced Locality Driven Robust Flexible Discriminative Dictionary Learning
Authors Zhao Zhang, Jiahuan Ren, Weiming Jiang, Zheng Zhang, Richang Hong, Shuicheng Yan, Meng Wang
Abstract We propose a joint subspace recovery and enhanced locality based robust flexible label consistent dictionary learning method called Robust Flexible Discriminative Dictionary Learning (RFDDL). RFDDL mainly improves the data representation and classification abilities by enhancing the robust property to sparse errors and encoding the locality, reconstruction error and label consistency more accurately. First, for the robustness to noise and sparse errors in data and atoms, RFDDL aims at recovering the underlying clean data and clean atom subspaces jointly, and then performs DL and encodes the locality in the recovered subspaces. Second, to enable the data sampled from a nonlinear manifold to be handled potentially and obtain the accurate reconstruction by avoiding the overfitting, RFDDL minimizes the reconstruction error in a flexible manner. Third, to encode the label consistency accurately, RFDDL involves a discriminative flexible sparse code error to encourage the coefficients to be soft. Fourth, to encode the locality well, RFDDL defines the Laplacian matrix over recovered atoms, includes label information of atoms in terms of intra-class compactness and inter-class separation, and associates with group sparse codes and classifier to obtain the accurate discriminative locality-constrained coefficients and classifier. Extensive results on public databases show the effectiveness of our RFDDL.
Tasks Dictionary Learning
Published 2019-06-11
URL https://arxiv.org/abs/1906.04598v1
PDF https://arxiv.org/pdf/1906.04598v1.pdf
PWC https://paperswithcode.com/paper/joint-subspace-recovery-and-enhanced-locality
Repo
Framework

Constructing Gradient Controllable Recurrent Neural Networks Using Hamiltonian Dynamics

Title Constructing Gradient Controllable Recurrent Neural Networks Using Hamiltonian Dynamics
Authors Konstantin Rusch, John W. Pearson, Konstantinos C. Zygalakis
Abstract Recurrent neural networks (RNNs) have gained a great deal of attention in solving sequential learning problems. The learning of long-term dependencies, however, remains challenging due to the problem of a vanishing or exploding hidden states gradient. By exploring further the recently established connections between RNNs and dynamical systems we propose a novel RNN architecture, which we call a Hamiltonian recurrent neural network (Hamiltonian RNN), based on a symplectic discretization of an appropriately chosen Hamiltonian system. The key benefit of this approach is that the corresponding RNN inherits the favorable long time properties of the Hamiltonian system, which in turn allows us to control the hidden states gradient with a hyperparameter of the Hamiltonian RNN architecture. This enables us to handle sequential learning problems with arbitrary sequence lengths, since for a range of values of this hyperparameter the gradient neither vanishes nor explodes. Additionally, we provide a heuristic for the optimal choice of the hyperparameter, which we use in our numerical simulations to illustrate that the Hamiltonian RNN is able to outperform other state-of-the-art RNNs without the need of computationally intensive hyperparameter optimization.
Tasks Hyperparameter Optimization
Published 2019-11-11
URL https://arxiv.org/abs/1911.05035v2
PDF https://arxiv.org/pdf/1911.05035v2.pdf
PWC https://paperswithcode.com/paper/constructing-gradient-controllable-recurrent
Repo
Framework
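
The abstract above relies on a symplectic discretization inheriting the long-time behaviour of a Hamiltonian system. The following is a minimal sketch of that general idea only, not of the paper's architecture: it assumes a toy harmonic oscillator with H(q, p) = (q^2 + p^2) / 2 and compares the symplectic Euler scheme against the explicit Euler scheme. The function names and the choice of Hamiltonian are illustrative assumptions.

```python
import numpy as np


def symplectic_euler(q, p, step, n_steps):
    """Integrate the toy system H(q, p) = (q**2 + p**2) / 2 (an assumption)."""
    energies = []
    for _ in range(n_steps):
        p = p - step * q  # momentum update uses dH/dq = q
        q = q + step * p  # position update uses the *updated* momentum
        energies.append(0.5 * (q * q + p * p))
    return q, p, np.array(energies)


def explicit_euler(q, p, step, n_steps):
    """Same system, but both updates use the old state (not symplectic)."""
    energies = []
    for _ in range(n_steps):
        q, p = q + step * p, p - step * q
        energies.append(0.5 * (q * q + p * p))
    return q, p, np.array(energies)


# Start at q = 1, p = 0, so the exact energy is 0.5 for all time.
_, _, symplectic_energy = symplectic_euler(1.0, 0.0, step=0.1, n_steps=1000)
_, _, explicit_energy = explicit_euler(1.0, 0.0, step=0.1, n_steps=1000)
print("symplectic max energy error:", np.abs(symplectic_energy - 0.5).max())
print("explicit final energy:", explicit_energy[-1])
```

In this toy setting the symplectic scheme keeps the energy in a narrow band around its initial value, while the explicit scheme lets it grow geometrically. This is the kind of long-time stability the abstract refers to when it speaks of gradients that neither vanish nor explode.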

Optimizing Millions of Hyperparameters by Implicit Differentiation

Title Optimizing Millions of Hyperparameters by Implicit Differentiation
Authors Jonathan Lorraine, Paul Vicol, David Duvenaud
Abstract We propose an algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations. We present results about the relationship between the IFT and differentiating through optimization, motivating our algorithm. We use the proposed approach to train modern network architectures with millions of weights and millions of hyper-parameters. For example, we learn a data-augmentation network - where every weight is a hyperparameter tuned for validation performance - outputting augmented training examples. Jointly tuning weights and hyperparameters with our approach is only a few times more costly in memory and compute than standard training.
Tasks Data Augmentation, Hyperparameter Optimization
Published 2019-11-06
URL https://arxiv.org/abs/1911.02590v1
PDF https://arxiv.org/pdf/1911.02590v1.pdf
PWC https://paperswithcode.com/paper/optimizing-millions-of-hyperparameters-by
Repo
Framework
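
To make the implicit function theorem step concrete, here is a hedged sketch on ridge regression, where every quantity is available in closed form. The hypergradient of the validation loss with respect to the regularisation strength `lam` is `-grad_val^T H^{-1} w`, with `H` the training Hessian. The inverse Hessian product is approximated with a truncated Neumann series, which is one possible choice assumed here for illustration; the abstract only says "efficient inverse Hessian approximations". All function names are hypothetical.

```python
import numpy as np


def neumann_inverse_hvp(hessian, vector, alpha, n_terms):
    """Approximate inv(hessian) @ vector by alpha * sum_j (I - alpha*H)^j @ vector."""
    term = vector.copy()
    total = vector.copy()
    for _ in range(n_terms):
        term = term - alpha * (hessian @ term)  # multiply by (I - alpha*H)
        total = total + term
    return alpha * total


def ridge_solution(X, y, lam):
    """Minimiser of 0.5*||Xw - y||^2 + 0.5*lam*||w||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def val_loss(w, Xv, yv):
    residual = Xv @ w - yv
    return 0.5 * residual @ residual


def ift_hypergradient(X, y, Xv, yv, lam, alpha, n_terms):
    """d(val_loss)/d(lam) at the training optimum, via the IFT."""
    w = ridge_solution(X, y, lam)
    hessian = X.T @ X + lam * np.eye(X.shape[1])  # d^2 L_train / dw^2
    grad_val = Xv.T @ (Xv @ w - yv)               # d L_val / dw
    mixed = w                                     # d/dlam of grad_w L_train
    return -neumann_inverse_hvp(hessian, grad_val, alpha, n_terms) @ mixed


rng = np.random.default_rng(0)
X, Xv = rng.normal(size=(50, 5)), rng.normal(size=(30, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=50)
yv = Xv @ w_true + 0.1 * rng.normal(size=30)
lam = 1.0
# The series converges when alpha is below 2 / (largest Hessian eigenvalue).
alpha = 1.0 / np.linalg.eigvalsh(X.T @ X + lam * np.eye(5)).max()
hypergrad = ift_hypergradient(X, y, Xv, yv, lam, alpha, n_terms=500)
print("IFT hypergradient:", hypergrad)
```

The sketch only needs Hessian-vector products inside the Neumann loop, which is what would let the same pattern scale to large models where the Hessian is never formed explicitly.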

Two-Stream Action Recognition-Oriented Video Super-Resolution

Title Two-Stream Action Recognition-Oriented Video Super-Resolution
Authors Haochen Zhang, Dong Liu, Zhiwei Xiong
Abstract We study the video super-resolution (SR) problem for facilitating video analytics tasks, e.g. action recognition, instead of for visual quality. The popular action recognition methods based on convolutional networks, exemplified by two-stream networks, are not directly applicable on video of low spatial resolution. This can be remedied by performing video SR prior to recognition, which motivates us to improve the SR procedure for recognition accuracy. Tailored for two-stream action recognition networks, we propose two video SR methods for the spatial and temporal streams respectively. On the one hand, we observe that regions with action are more important to recognition, and we propose an optical-flow guided weighted mean-squared-error loss for our spatial-oriented SR (SoSR) network to emphasize the reconstruction of moving objects. On the other hand, we observe that existing video SR methods incur temporal discontinuity between frames, which also worsens the recognition accuracy, and we propose a siamese network for our temporal-oriented SR (ToSR) training that emphasizes the temporal continuity between consecutive frames. We perform experiments using two state-of-the-art action recognition networks and two well-known datasets–UCF101 and HMDB51. Results demonstrate the effectiveness of our proposed SoSR and ToSR in improving recognition accuracy.
Tasks Optical Flow Estimation, Super-Resolution, Temporal Action Localization, Video Super-Resolution
Published 2019-03-13
URL https://arxiv.org/abs/1903.05577v2
PDF https://arxiv.org/pdf/1903.05577v2.pdf
PWC https://paperswithcode.com/paper/two-stream-oriented-video-super-resolution
Repo
Framework
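
The optical-flow guided weighted mean-squared-error loss mentioned in the abstract can be sketched as follows. The abstract does not give the weighting formula, so the choice below (weight = 1 + normalised flow magnitude, so that static pixels still count) is an assumption made purely for illustration, as is the function name.

```python
import numpy as np


def flow_weighted_mse(sr, hr, flow, eps=1e-8):
    """Weighted MSE between a super-resolved frame and its ground truth.

    sr, hr: arrays of shape (H, W, C); flow: array of shape (H, W, 2).
    The weighting scheme is a hypothetical choice, not the paper's formula.
    """
    magnitude = np.linalg.norm(flow, axis=-1)             # (H, W) motion strength
    weights = 1.0 + magnitude / (magnitude.max() + eps)   # values in [1, 2]
    per_pixel_error = ((sr - hr) ** 2).mean(axis=-1)      # (H, W)
    return float((weights * per_pixel_error).sum() / weights.sum())


# Usage: an error on a moving pixel costs more than the same error on a static one.
hr = np.zeros((4, 4, 3))
flow = np.zeros((4, 4, 2))
flow[0, 0] = [3.0, 4.0]                 # only the top-left pixel moves
sr_moving_error = hr.copy()
sr_moving_error[0, 0] = 1.0
sr_static_error = hr.copy()
sr_static_error[3, 3] = 1.0
print(flow_weighted_mse(sr_moving_error, hr, flow))
print(flow_weighted_mse(sr_static_error, hr, flow))
```

With zero flow everywhere the weights are uniform and the function reduces to the ordinary MSE.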

Near-Optimal Online Egalitarian learning in General Sum Repeated Matrix Games

Title Near-Optimal Online Egalitarian learning in General Sum Repeated Matrix Games
Authors Aristide Tossou, Christos Dimitrakakis, Jaroslaw Rzepecki, Katja Hofmann
Abstract We study two-player general sum repeated finite games where the rewards of each player are generated from an unknown distribution. Our aim is to find the egalitarian bargaining solution (EBS) for the repeated game, which can lead to much higher rewards than the maximin value of both players. Our most important contribution is the derivation of an algorithm that achieves simultaneously, for both players, a high-probability regret bound of order $\mathcal{O}(\sqrt[3]{\ln T}\cdot T^{2/3})$ after any $T$ rounds of play. We demonstrate that our upper bound is nearly optimal by proving a lower bound of $\Omega(T^{2/3})$ for any algorithm.
Tasks
Published 2019-06-04
URL https://arxiv.org/abs/1906.01609v1
PDF https://arxiv.org/pdf/1906.01609v1.pdf
PWC https://paperswithcode.com/paper/near-optimal-online-egalitarian-learning-in
Repo
Framework

Towards Understanding Gender Bias in Relation Extraction

Title Towards Understanding Gender Bias in Relation Extraction
Authors Andrew Gaut, Tony Sun, Shirlyn Tang, Yuxin Huang, Jing Qian, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, William Yang Wang
Abstract Recent developments in Neural Relation Extraction (NRE) have made significant strides towards Automated Knowledge Base Construction (AKBC). While much attention has been dedicated towards improvements in accuracy, there have been no attempts in the literature to our knowledge to evaluate social biases in NRE systems. We create WikiGenderBias, a distantly supervised dataset with a human annotated test set. WikiGenderBias has sentences specifically curated to analyze gender bias in relation extraction systems. We use WikiGenderBias to evaluate systems for bias and find that NRE systems exhibit gender biased predictions and lay groundwork for future evaluation of bias in NRE. We also analyze how name anonymization, hard debiasing for word embeddings, and counterfactual data augmentation affect gender bias in predictions and performance.
Tasks Data Augmentation, Relation Extraction, Word Embeddings
Published 2019-11-09
URL https://arxiv.org/abs/1911.03642v1
PDF https://arxiv.org/pdf/1911.03642v1.pdf
PWC https://paperswithcode.com/paper/towards-understanding-gender-bias-in-relation
Repo
Framework
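
Counterfactual data augmentation, one of the mitigation strategies named in the abstract, is commonly implemented by adding a gender-swapped copy of each training sentence. The sketch below assumes a tiny hand-written word list (hypothetical, far smaller than anything used in practice) and deliberately leaves out possessive pronouns, because "her" maps ambiguously to "his" or "him".

```python
import re

# Hypothetical, deliberately small word list for illustration only.
PAIRS = [
    ("he", "she"),
    ("himself", "herself"),
    ("husband", "wife"),
    ("father", "mother"),
    ("son", "daughter"),
    ("brother", "sister"),
]
SWAPS = {a: b for a, b in PAIRS}
SWAPS.update({b: a for a, b in PAIRS})

# Longest words first so that whole words are always preferred.
_WORDS = sorted(SWAPS, key=len, reverse=True)
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, _WORDS)) + r")\b", re.IGNORECASE)


def swap_gender(sentence):
    """Return the sentence with every listed gendered word replaced by its pair."""
    def replace(match):
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return PATTERN.sub(replace, sentence)


def augment(sentences):
    """Original sentences followed by their gender-swapped counterfactuals."""
    return list(sentences) + [swap_gender(s) for s in sentences]


print(augment(["He is the husband of Marie."]))
```

A real pipeline would also need to handle names, possessives, and relation labels that are themselves gendered; none of that is attempted here.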

A Generalized Markov Chain Model to Capture Dynamic Preferences and Choice Overload

Title A Generalized Markov Chain Model to Capture Dynamic Preferences and Choice Overload
Authors Kumar Goutam, Vineet Goyal, Agathe Soret
Abstract Assortment optimization is an important problem that arises in many practical applications such as retailing and online advertising where the goal is to find a subset of products from a universe of substitutable products that maximize a seller’s expected revenue. The demand and the revenue depend on the substitution behavior of the customers that is captured by a choice model. One of the key challenges is to find the right model for the customer substitution behavior. Many parametric random utility based models have been considered in the literature to capture substitution. However, in all these models, the probability of purchase increases as we add more options to the assortment. This is not true in general and in many settings, the probability of purchase may decrease if we add more products to the assortment, referred to as the choice overload. In this paper we attempt to address these serious limitations and propose a generalization of the Markov chain based choice model considered in Blanchet et al. In particular, we handle dynamic preferences and the choice overload phenomenon using a Markovian comparison model that is a generalization of the Markovian substitution framework of Blanchet et al. The Markovian comparison framework allows us to implicitly model the search cost in the choice process and thereby, modeling both dynamic preferences as well as the choice overload phenomenon. We consider the assortment optimization problem for the special case of our generalized Markov chain model where the underlying Markov chain is rank-1 (this is a generalization of the Multinomial Logit model). We show that the assortment optimization problem under this model is NP-hard and present a fully polynomial-time approximation scheme (FPTAS) for this problem.
Tasks
Published 2019-11-15
URL https://arxiv.org/abs/1911.06716v2
PDF https://arxiv.org/pdf/1911.06716v2.pdf
PWC https://paperswithcode.com/paper/a-generalized-markov-chain-model-to-capture
Repo
Framework
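
The limitation the abstract points to, that purchase probability can only grow as products are added, is easy to see in the Multinomial Logit (MNL) model, which the abstract names as a special case. The sketch below uses the standard MNL formulas with hypothetical preference weights and prices; it illustrates the baseline model only, not the generalized Markov chain model of the paper.

```python
from itertools import combinations


def purchase_probability(weights, assortment, no_purchase_weight=1.0):
    """MNL probability that the customer buys anything from the assortment."""
    offered = sum(weights[i] for i in assortment)
    return offered / (no_purchase_weight + offered)


def expected_revenue(weights, prices, assortment, no_purchase_weight=1.0):
    """MNL expected revenue of an assortment."""
    offered = sum(weights[i] for i in assortment)
    revenue = sum(prices[i] * weights[i] for i in assortment)
    return revenue / (no_purchase_weight + offered)


def best_assortment(weights, prices, no_purchase_weight=1.0):
    """Brute-force search over all non-empty assortments (small instances only)."""
    products = range(len(weights))
    candidates = (
        subset
        for size in range(1, len(weights) + 1)
        for subset in combinations(products, size)
    )
    return max(
        candidates,
        key=lambda s: expected_revenue(weights, prices, s, no_purchase_weight),
    )


weights = [1.0, 1.0, 2.0]   # hypothetical preference weights
prices = [10.0, 1.0, 4.0]   # hypothetical prices
for assortment in [(0,), (0, 1), (0, 1, 2)]:
    print(assortment, purchase_probability(weights, assortment))
print("best assortment:", best_assortment(weights, prices))
```

Because every added product increases the numerator and denominator by the same positive weight, the purchase probability under MNL is monotone in the assortment, so choice overload cannot be represented.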

Constrained Bayesian Optimization with Max-Value Entropy Search

Title Constrained Bayesian Optimization with Max-Value Entropy Search
Authors Valerio Perrone, Iaroslav Shcherbatyi, Rodolphe Jenatton, Cedric Archambeau, Matthias Seeger
Abstract Bayesian optimization (BO) is a model-based approach to sequentially optimize expensive black-box functions, such as the validation error of a deep neural network with respect to its hyperparameters. In many real-world scenarios, the optimization is further subject to a priori unknown constraints. For example, training a deep network configuration may fail with an out-of-memory error when the model is too large. In this work, we focus on a general formulation of Gaussian process-based BO with continuous or binary constraints. We propose constrained Max-value Entropy Search (cMES), a novel information theoretic-based acquisition function implementing this formulation. We also revisit the validity of the factorized approximation adopted for rapid computation of the MES acquisition function, showing empirically that this leads to inaccurate results. On an extensive set of real-world constrained hyperparameter optimization problems we show that cMES compares favourably to prior work, while being simpler to implement and faster than other constrained extensions of Entropy Search.
Tasks Hyperparameter Optimization
Published 2019-10-15
URL https://arxiv.org/abs/1910.07003v1
PDF https://arxiv.org/pdf/1910.07003v1.pdf
PWC https://paperswithcode.com/paper/constrained-bayesian-optimization-with-max
Repo
Framework

Learning Multi-Robot Decentralized Macro-Action-Based Policies via a Centralized Q-Net

Title Learning Multi-Robot Decentralized Macro-Action-Based Policies via a Centralized Q-Net
Authors Yuchen Xiao, Joshua Hoffman, Tian Xia, Christopher Amato
Abstract In many real-world multi-robot tasks, high-quality solutions often require a team of robots to perform asynchronous actions under decentralized control. Decentralized multi-agent reinforcement learning methods have difficulty learning decentralized policies because of the environment appearing to be non-stationary due to other agents also learning at the same time. In this paper, we address this challenge by proposing a macro-action-based decentralized multi-agent double deep recurrent Q-net (MacDec-MADDRQN) which trains each decentralized Q-net using a centralized Q-net for action selection. A generalized version of MacDec-MADDRQN with two separate training environments, called Parallel-MacDec-MADDRQN, is also presented to leverage either centralized or decentralized exploration. The advantages and the practical nature of our methods are demonstrated by achieving near-centralized results in simulation and having real robots accomplish a warehouse tool delivery task in an efficient way.
Tasks Multi-agent Reinforcement Learning
Published 2019-09-19
URL https://arxiv.org/abs/1909.08776v2
PDF https://arxiv.org/pdf/1909.08776v2.pdf
PWC https://paperswithcode.com/paper/multi-robot-deep-reinforcement-learning-with
Repo
Framework
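
One way to read "trains each decentralized Q-net using a centralized Q-net for action selection" is as a double-Q style target: the centralized network picks the best joint action at the next step, and each agent's target network evaluates its own component of that joint action. The sketch below follows that reading under strong simplifying assumptions (tabular values instead of recurrent networks, primitive actions instead of macro-actions, a shared reward). All names are hypothetical.

```python
import numpy as np


def decentralized_targets(reward, gamma, central_q_next, agent_target_q_next):
    """Double-Q style targets with centralized action selection (a sketch).

    central_q_next: array of shape (A_1, ..., A_n) with joint action values
        at the next step, from the centralized Q-net.
    agent_target_q_next: list of n 1-D arrays, one per agent, from each
        agent's own target Q-net.
    """
    flat_best = np.argmax(central_q_next)
    joint_best = tuple(int(a) for a in np.unravel_index(flat_best, central_q_next.shape))
    targets = [
        float(reward + gamma * q_values[action])
        for q_values, action in zip(agent_target_q_next, joint_best)
    ]
    return targets, joint_best


central_q_next = np.array([[0.0, 5.0],
                           [1.0, 2.0]])                  # best joint action is (0, 1)
agent_q_next = [np.array([2.0, 3.0]), np.array([4.0, 1.0])]
targets, joint_best = decentralized_targets(1.0, 0.5, central_q_next, agent_q_next)
print(joint_best, targets)
```

Note that the second agent, acting greedily on its own values, would have picked action 0. Using the centralized selection instead ties each agent's target to a jointly consistent action.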

Translation, Sentiment and Voices: A Computational Model to Translate and Analyse Voices from Real-Time Video Calling

Title Translation, Sentiment and Voices: A Computational Model to Translate and Analyse Voices from Real-Time Video Calling
Authors Aneek Barman Roy
Abstract With the internet quickly becoming easily accessible to many, voice calling over the internet is slowly gaining momentum. Individuals have been engaging in video communication across the world in different languages. The decade also saw the emergence of language translation using neural networks. With more data being generated in audio and visual forms, analysing such information has become both a need and a challenge for many researchers in academia and industry. The availability of video chat corpora is limited as organizations protect user privacy and ensure data security. For this reason, an audio-visual communication system (VidALL) has been developed and audio speech was extracted. To understand human nature while answering a video call, an analysis was conducted where polarity and vocal intensity were considered as parameters. Simultaneously, a translation model using a neural approach was developed to translate English sentences to French. Simple RNN-based and Embedded-RNN based models were used for the translation model. BLEU score and target sentence comparators were used to check sentence correctness. Embedded-RNN showed an accuracy of 88.71 percent and predicted correct sentences. A key finding suggests that polarity is a good estimator to understand human emotion.
Tasks
Published 2019-09-28
URL https://arxiv.org/abs/1909.13162v1
PDF https://arxiv.org/pdf/1909.13162v1.pdf
PWC https://paperswithcode.com/paper/translation-sentiment-and-voices-a
Repo
Framework

AutoRemover: Automatic Object Removal for Autonomous Driving Videos

Title AutoRemover: Automatic Object Removal for Autonomous Driving Videos
Authors Rong Zhang, Wei Li, Peng Wang, Chenye Guan, Jin Fang, Yuhang Song, Jinhui Yu, Baoquan Chen, Weiwei Xu, Ruigang Yang
Abstract Motivated by the need for photo-realistic simulation in autonomous driving, in this paper we present a video inpainting algorithm, AutoRemover, designed specifically for generating street-view videos without any moving objects. In our setup we have two challenges. The first is shadows, which are usually unlabeled but tightly coupled with the moving objects. The second is the large ego-motion in the videos. To deal with shadows, we build up an autonomous driving shadow dataset and design a deep neural network to detect shadows automatically. To deal with large ego-motion, we take advantage of the multi-source data, in particular the 3D data, in autonomous driving. More specifically, the geometric relationship between frames is incorporated into an inpainting deep neural network to produce high-quality structurally consistent video output. Experiments show that our method outperforms other state-of-the-art (SOTA) object removal algorithms, reducing the RMSE by over 19%.
Tasks Autonomous Driving, Video Inpainting
Published 2019-11-28
URL https://arxiv.org/abs/1911.12588v1
PDF https://arxiv.org/pdf/1911.12588v1.pdf
PWC https://paperswithcode.com/paper/autoremover-automatic-object-removal-for
Repo
Framework

The Price of Interpretability

Title The Price of Interpretability
Authors Dimitris Bertsimas, Arthur Delarue, Patrick Jaillet, Sebastien Martin
Abstract When quantitative models are used to support decision-making on complex and important topics, understanding a model’s “reasoning” can increase trust in its predictions, expose hidden biases, or reduce vulnerability to adversarial attacks. However, the concept of interpretability remains loosely defined and application-specific. In this paper, we introduce a mathematical framework in which machine learning models are constructed in a sequence of interpretable steps. We show that for a variety of models, a natural choice of interpretable steps recovers standard interpretability proxies (e.g., sparsity in linear models). We then generalize these proxies to yield a parametrized family of consistent measures of model interpretability. This formal definition allows us to quantify the “price” of interpretability, i.e., the tradeoff with predictive accuracy. We demonstrate practical algorithms to apply our framework on real and synthetic datasets.
Tasks Decision Making
Published 2019-07-08
URL https://arxiv.org/abs/1907.03419v1
PDF https://arxiv.org/pdf/1907.03419v1.pdf
PWC https://paperswithcode.com/paper/the-price-of-interpretability
Repo
Framework
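
The tradeoff described in the abstract can be illustrated with the simplest proxy it mentions, sparsity in linear models. The sketch below is not the paper's framework: it assumes brute-force best-subset least squares on synthetic data and reports in-sample error as a function of the number of features allowed. Function names and data are hypothetical.

```python
import itertools

import numpy as np


def best_subset_error(X, y, k):
    """Lowest in-sample MSE of a least-squares fit using exactly k features."""
    if k == 0:
        return float(np.mean(y ** 2)), ()
    best_error, best_subset = np.inf, ()
    for subset in itertools.combinations(range(X.shape[1]), k):
        columns = X[:, list(subset)]
        coef, *_ = np.linalg.lstsq(columns, y, rcond=None)
        error = float(np.mean((columns @ coef - y) ** 2))
        if error < best_error:
            best_error, best_subset = error, subset
    return best_error, best_subset


def sparsity_frontier(X, y):
    """Best achievable error for each sparsity level 0..d."""
    return [best_subset_error(X, y, k)[0] for k in range(X.shape[1] + 1)]


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = X @ np.array([3.0, 0.0, 1.0, 0.2]) + 0.1 * rng.normal(size=40)
frontier = sparsity_frontier(X, y)
for k, error in enumerate(frontier):
    print(f"{k} features: MSE = {error:.4f}")
```

Reading the frontier from right to left gives one crude notion of a "price": the error paid for each feature removed. The enumeration is exponential in the number of features, so this is only usable on toy problems.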

Risk-Aware Planning by Confidence Estimation using Deep Learning-Based Perception

Title Risk-Aware Planning by Confidence Estimation using Deep Learning-Based Perception
Authors Maymoonah Toubeh, Pratap Tokekar
Abstract This work proposes the use of Bayesian approximations of uncertainty from deep learning in a robot planner, showing that this produces more cautious actions in safety-critical scenarios. The case study investigated is motivated by a setup where an aerial robot acts as a “scout” for a ground robot. This is useful when the area below is unknown or dangerous, with applications in space exploration, military, or search-and-rescue. Images taken from the aerial view are used to provide a less obstructed map to guide the navigation of the robot on the ground. Experiments are conducted using deep learning semantic image segmentation, followed by a path planner based on the resulting cost map, to provide an empirical analysis of the proposed method. A comparison with similar approaches is presented to portray the usefulness of certain techniques, or variations within a technique, in similar experimental settings. The method is analyzed to assess the impact of variations in the uncertainty extraction, as well as the absence of an uncertainty metric, on the overall system with the use of a defined metric which measures surprise to the planner. The analysis is performed on multiple datasets, showing a similar trend of lower surprise when uncertainty information is incorporated in the planning, given threshold values of the hyperparameters in the uncertainty extraction have been met. We find that taking uncertainty into account leads to paths that could be 18% less risky on average.
Tasks Semantic Segmentation
Published 2019-09-13
URL https://arxiv.org/abs/1910.00101v1
PDF https://arxiv.org/pdf/1910.00101v1.pdf
PWC https://paperswithcode.com/paper/risk-aware-planning-by-confidence-estimation
Repo
Framework
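
A rough sketch of how segmentation uncertainty might feed a planner's cost map is given below. The abstract does not say how the uncertainty is extracted or combined with the cost, so everything here is assumed for illustration: the uncertainty is the entropy of the mean softmax output over several stochastic forward passes, and it is added to the expected per-class traversal cost with a hypothetical weight.

```python
import numpy as np


def predictive_entropy(prob_samples, eps=1e-12):
    """Mean class probabilities and their entropy.

    prob_samples: array of shape (T, H, W, K) holding softmax outputs from
    T stochastic forward passes (an assumed uncertainty-extraction scheme).
    """
    mean_probs = prob_samples.mean(axis=0)                            # (H, W, K)
    entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=-1)   # (H, W)
    return mean_probs, entropy


def risk_aware_cost_map(prob_samples, class_costs, uncertainty_weight=1.0):
    """Expected traversal cost plus an uncertainty penalty, per pixel."""
    mean_probs, entropy = predictive_entropy(prob_samples)
    expected_cost = mean_probs @ class_costs                          # (H, W)
    return expected_cost + uncertainty_weight * entropy


# Two passes over a 1x2 image with two classes (0 = free, 1 = obstacle).
samples = np.zeros((2, 1, 2, 2))
samples[:, 0, 0] = [1.0, 0.0]     # pixel 0: both passes agree on "free"
samples[0, 0, 1] = [1.0, 0.0]     # pixel 1: the passes disagree
samples[1, 0, 1] = [0.0, 1.0]
cost_map = risk_aware_cost_map(samples, class_costs=np.array([0.0, 10.0]))
print(cost_map)
```

A planner that minimises accumulated cost over such a map would avoid pixels that are likely obstacles and also those where the model's predictions disagree, which is one way to obtain the more cautious behaviour the abstract describes.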

Analysis by Adversarial Synthesis – A Novel Approach for Speech Vocoding

Title Analysis by Adversarial Synthesis – A Novel Approach for Speech Vocoding
Authors Ahmed Mustafa, Arijit Biswas, Christian Bergler, Julia Schottenhamml, Andreas Maier
Abstract Classical parametric speech coding techniques provide a compact representation for speech signals. This affords a very low transmission rate but with a reduced perceptual quality of the reconstructed signals. Recently, autoregressive deep generative models such as WaveNet and SampleRNN have been used as speech vocoders to scale up the perceptual quality of the reconstructed signals without increasing the coding rate. However, such models suffer from a very slow signal generation mechanism due to their sample-by-sample modelling approach. In this work, we introduce a new methodology for neural speech vocoding based on generative adversarial networks (GANs). A fake speech signal is generated from a very compressed representation of the glottal excitation using conditional GANs as a deep generative model. This fake speech is then refined using the LPC parameters of the original speech signal to obtain a natural reconstruction. The reconstructed speech waveforms based on this approach show a higher perceptual quality than the classical vocoder counterparts according to subjective and objective evaluation scores for a dataset of 30 male and female speakers. Moreover, the use of GANs enables signals to be generated in one shot, in contrast to autoregressive generative models. This makes GANs promising for exploration to implement high-quality neural vocoders.
Tasks
Published 2019-07-01
URL https://arxiv.org/abs/1907.00772v1
PDF https://arxiv.org/pdf/1907.00772v1.pdf
PWC https://paperswithcode.com/paper/analysis-by-adversarial-synthesis-a-novel
Repo
Framework