Paper Group ANR 203
Environment reconstruction on depth images using Generative Adversarial Networks
Title | Environment reconstruction on depth images using Generative Adversarial Networks |
Authors | Lucas P. N. Matias, Jefferson R. Souza, Denis F. Wolf |
Abstract | Robust perception systems are essential for autonomous vehicle safety. To navigate a complex urban environment, precise sensors with reliable data are necessary. The task of understanding the surroundings is hard by itself; for intelligent vehicles, it is even more critical due to the high speed at which the vehicle navigates. To successfully navigate in an urban environment, the perception system must quickly receive and process data and execute an action to guarantee both passenger and pedestrian safety. Stereo cameras collect environment information at many levels, e.g., depth, color, texture, and shape, which guarantees ample knowledge about the surroundings. Even so, when compared to humans, computational methods lack the ability to deal with missing information, i.e., occlusions. For many perception tasks, this lack of data can be a hindrance because the information about the environment is incomplete. In this paper, we address this problem and discuss recent methods for inferring occluded areas. We then introduce a loss function focused on disparity and environment depth data reconstruction, and a Generative Adversarial Network (GAN) architecture able to infer occluded information. Our results present a coherent reconstruction on depth maps, estimating regions occluded by different obstacles. Our final contribution is a loss function focused on disparity data and a GAN able to extract depth features and estimate depth data by inpainting disparity images. |
Tasks | |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.03992v1 |
https://arxiv.org/pdf/1912.03992v1.pdf | |
PWC | https://paperswithcode.com/paper/environment-reconstruction-on-depth-images |
Repo | |
Framework | |
Joint Subspace Recovery and Enhanced Locality Driven Robust Flexible Discriminative Dictionary Learning
Title | Joint Subspace Recovery and Enhanced Locality Driven Robust Flexible Discriminative Dictionary Learning |
Authors | Zhao Zhang, Jiahuan Ren, Weiming Jiang, Zheng Zhang, Richang Hong, Shuicheng Yan, Meng Wang |
Abstract | We propose a joint subspace recovery and enhanced locality based robust flexible label consistent dictionary learning method called Robust Flexible Discriminative Dictionary Learning (RFDDL). RFDDL mainly improves data representation and classification by enhancing robustness to sparse errors and by encoding the locality, reconstruction error and label consistency more accurately. First, for robustness to noise and sparse errors in data and atoms, RFDDL aims at jointly recovering the underlying clean data and clean atom subspaces, and then performs dictionary learning (DL) and encodes the locality in the recovered subspaces. Second, to handle data sampled from a nonlinear manifold and to obtain an accurate reconstruction while avoiding overfitting, RFDDL minimizes the reconstruction error in a flexible manner. Third, to encode the label consistency accurately, RFDDL involves a discriminative flexible sparse code error to encourage the coefficients to be soft. Fourth, to encode the locality well, RFDDL defines the Laplacian matrix over recovered atoms, includes label information of atoms in terms of intra-class compactness and inter-class separation, and associates it with group sparse codes and a classifier to obtain accurate discriminative locality-constrained coefficients and an accurate classifier. Extensive results on public databases show the effectiveness of RFDDL. |
Tasks | Dictionary Learning |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04598v1 |
https://arxiv.org/pdf/1906.04598v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-subspace-recovery-and-enhanced-locality |
Repo | |
Framework | |
Constructing Gradient Controllable Recurrent Neural Networks Using Hamiltonian Dynamics
Title | Constructing Gradient Controllable Recurrent Neural Networks Using Hamiltonian Dynamics |
Authors | Konstantin Rusch, John W. Pearson, Konstantinos C. Zygalakis |
Abstract | Recurrent neural networks (RNNs) have gained a great deal of attention in solving sequential learning problems. The learning of long-term dependencies, however, remains challenging due to the problem of a vanishing or exploding hidden states gradient. By exploring further the recently established connections between RNNs and dynamical systems we propose a novel RNN architecture, which we call a Hamiltonian recurrent neural network (Hamiltonian RNN), based on a symplectic discretization of an appropriately chosen Hamiltonian system. The key benefit of this approach is that the corresponding RNN inherits the favorable long time properties of the Hamiltonian system, which in turn allows us to control the hidden states gradient with a hyperparameter of the Hamiltonian RNN architecture. This enables us to handle sequential learning problems with arbitrary sequence lengths, since for a range of values of this hyperparameter the gradient neither vanishes nor explodes. Additionally, we provide a heuristic for the optimal choice of the hyperparameter, which we use in our numerical simulations to illustrate that the Hamiltonian RNN is able to outperform other state-of-the-art RNNs without the need of computationally intensive hyperparameter optimization. |
Tasks | Hyperparameter Optimization |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.05035v2 |
https://arxiv.org/pdf/1911.05035v2.pdf | |
PWC | https://paperswithcode.com/paper/constructing-gradient-controllable-recurrent |
Repo | |
Framework | |
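The abstract's key idea, that a symplectic discretization inherits the long-time behaviour of a Hamiltonian system, can be illustrated on a much simpler system than the paper's. The sketch below is not the Hamiltonian RNN: it assumes a harmonic oscillator with H(q, p) = (q^2 + p^2) / 2 and compares explicit Euler with symplectic Euler. The function names, step size, and step count are all illustrative choices.

```python
# Illustrative only: a harmonic oscillator H(q, p) = (q**2 + p**2) / 2,
# not the Hamiltonian system or the network architecture used in the paper.

def energy(q, p):
    return 0.5 * (q * q + p * p)

def explicit_euler(q, p, h):
    # Both updates use the old state; the energy grows by (1 + h**2) per step.
    return q + h * p, p - h * q

def symplectic_euler(q, p, h):
    # Update p first, then use the new p to update q.
    p_new = p - h * q
    q_new = q + h * p_new
    return q_new, p_new

def simulate(step, steps=1000, h=0.1):
    # Start at q = 1, p = 0 and record the energy after every step.
    q, p = 1.0, 0.0
    energies = [energy(q, p)]
    for _ in range(steps):
        q, p = step(q, p, h)
        energies.append(energy(q, p))
    return energies

explicit_energies = simulate(explicit_euler)
symplectic_energies = simulate(symplectic_euler)
```

With these settings the explicit scheme's energy blows up over the 1000 steps, while the symplectic scheme's energy stays in a narrow band around its initial value of 0.5. This bounded long-time behaviour is the kind of property the abstract says the Hamiltonian RNN uses to keep the hidden state gradient from vanishing or exploding.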
Optimizing Millions of Hyperparameters by Implicit Differentiation
Title | Optimizing Millions of Hyperparameters by Implicit Differentiation |
Authors | Jonathan Lorraine, Paul Vicol, David Duvenaud |
Abstract | We propose an algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations. We present results about the relationship between the IFT and differentiating through optimization, motivating our algorithm. We use the proposed approach to train modern network architectures with millions of weights and millions of hyper-parameters. For example, we learn a data-augmentation network - where every weight is a hyperparameter tuned for validation performance - outputting augmented training examples. Jointly tuning weights and hyperparameters with our approach is only a few times more costly in memory and compute than standard training. |
Tasks | Data Augmentation, Hyperparameter Optimization |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.02590v1 |
https://arxiv.org/pdf/1911.02590v1.pdf | |
PWC | https://paperswithcode.com/paper/optimizing-millions-of-hyperparameters-by |
Repo | |
Framework | |
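To make the combination of the implicit function theorem with an approximate inverse Hessian concrete, here is a minimal sketch on a one-dimensional toy problem. The losses, the names `a`, `b`, `alpha`, and `terms`, and the choice of a truncated Neumann series as the inverse approximation are assumptions made for illustration; the abstract itself only says "efficient inverse Hessian approximations".

```python
# Toy scalar problem (an assumption for illustration, not the paper's setup):
#   training loss   L_T(w, lam) = 0.5 * (w - a)**2 + 0.5 * lam * w**2
#   validation loss L_V(w)      = 0.5 * (w - b)**2

def best_response(lam, a):
    # Minimizer of L_T in w: w*(lam) = a / (1 + lam).
    return a / (1.0 + lam)

def neumann_inverse(hessian, alpha, terms):
    # Truncated Neumann series:
    #   H^{-1} ~= alpha * sum_{k=0}^{terms} (1 - alpha * H)**k,
    # which converges when |1 - alpha * H| < 1.
    total, power = 0.0, 1.0
    for _ in range(terms + 1):
        total += power
        power *= 1.0 - alpha * hessian
    return alpha * total

def ift_hypergradient(lam, a, b, alpha=0.5, terms=50):
    w = best_response(lam, a)
    dlv_dw = w - b       # dL_V/dw evaluated at w*
    hessian = 1.0 + lam  # d^2 L_T / dw^2
    mixed = w            # d^2 L_T / (dw dlam)
    inv_h = neumann_inverse(hessian, alpha, terms)
    # IFT: dw*/dlam = -H^{-1} * mixed; L_V has no direct dependence on lam.
    return -dlv_dw * inv_h * mixed

def exact_hypergradient(lam, a, b):
    # Closed form obtained by differentiating L_V(w*(lam)) directly.
    w = best_response(lam, a)
    return (w - b) * (-a / (1.0 + lam) ** 2)

approx = ift_hypergradient(0.5, a=2.0, b=1.0)
exact = exact_hypergradient(0.5, a=2.0, b=1.0)
```

In this scalar case `approx` and `exact` agree to within floating-point error. In a real network the Hessian is a matrix and the series would be evaluated with Hessian-vector products rather than by forming the inverse.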
Two-Stream Action Recognition-Oriented Video Super-Resolution
Title | Two-Stream Action Recognition-Oriented Video Super-Resolution |
Authors | Haochen Zhang, Dong Liu, Zhiwei Xiong |
Abstract | We study the video super-resolution (SR) problem for facilitating video analytics tasks, e.g. action recognition, instead of for visual quality. The popular action recognition methods based on convolutional networks, exemplified by two-stream networks, are not directly applicable to video of low spatial resolution. This can be remedied by performing video SR prior to recognition, which motivates us to improve the SR procedure for recognition accuracy. Tailored for two-stream action recognition networks, we propose two video SR methods for the spatial and temporal streams respectively. On the one hand, we observe that regions with action are more important to recognition, and we propose an optical-flow guided weighted mean-squared-error loss for our spatial-oriented SR (SoSR) network to emphasize the reconstruction of moving objects. On the other hand, we observe that existing video SR methods incur temporal discontinuity between frames, which also worsens the recognition accuracy, and we propose a siamese network for our temporal-oriented SR (ToSR) training that emphasizes the temporal continuity between consecutive frames. We perform experiments using two state-of-the-art action recognition networks and two well-known datasets, UCF101 and HMDB51. Results demonstrate the effectiveness of our proposed SoSR and ToSR in improving recognition accuracy. |
Tasks | Optical Flow Estimation, Super-Resolution, Temporal Action Localization, Video Super-Resolution |
Published | 2019-03-13 |
URL | https://arxiv.org/abs/1903.05577v2 |
https://arxiv.org/pdf/1903.05577v2.pdf | |
PWC | https://paperswithcode.com/paper/two-stream-oriented-video-super-resolution |
Repo | |
Framework | |
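One plausible reading of the "optical-flow guided weighted mean-squared-error loss" is a per-pixel MSE in which pixels with larger flow magnitude receive larger weights. The sketch below follows that reading only; the weighting formula and the `base_weight` and `gain` parameters are hypothetical and are not taken from the paper.

```python
import math

def flow_weighted_mse(pred, target, flow, base_weight=1.0, gain=1.0):
    """Weighted MSE over one frame; pixels with larger flow magnitude count more.

    pred, target: 2D lists of pixel intensities with the same shape.
    flow: 2D list of (dx, dy) optical-flow vectors with the same shape.
    base_weight and gain are hypothetical knobs, not taken from the paper.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for pred_row, target_row, flow_row in zip(pred, target, flow):
        for p, t, (dx, dy) in zip(pred_row, target_row, flow_row):
            weight = base_weight + gain * math.hypot(dx, dy)
            weighted_sum += weight * (p - t) ** 2
            weight_total += weight
    return weighted_sum / weight_total

# One moving pixel (flow magnitude 5) next to one static pixel.
target = [[0.0, 0.0]]
flow = [[(3.0, 4.0), (0.0, 0.0)]]
error_on_moving = flow_weighted_mse([[1.0, 0.0]], target, flow)
error_on_static = flow_weighted_mse([[0.0, 1.0]], target, flow)
```

The same unit error costs more when it falls on the moving pixel than on the static one, which is the emphasis on moving objects that the abstract describes.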
Near-Optimal Online Egalitarian learning in General Sum Repeated Matrix Games
Title | Near-Optimal Online Egalitarian learning in General Sum Repeated Matrix Games |
Authors | Aristide Tossou, Christos Dimitrakakis, Jaroslaw Rzepecki, Katja Hofmann |
Abstract | We study two-player general sum repeated finite games where the rewards of each player are generated from an unknown distribution. Our aim is to find the egalitarian bargaining solution (EBS) for the repeated game, which can lead to much higher rewards than the maximin value of both players. Our most important contribution is the derivation of an algorithm that achieves simultaneously, for both players, a high-probability regret bound of order $\mathcal{O}(\sqrt[3]{\ln T}\cdot T^{2/3})$ after any $T$ rounds of play. We demonstrate that our upper bound is nearly optimal by proving a lower bound of $\Omega(T^{2/3})$ for any algorithm. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01609v1 |
https://arxiv.org/pdf/1906.01609v1.pdf | |
PWC | https://paperswithcode.com/paper/near-optimal-online-egalitarian-learning-in |
Repo | |
Framework | |
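To see why the abstract calls the upper bound nearly optimal, it helps to compare the two rates directly. Ignoring constants, dividing the upper bound by the lower bound and by the horizon $T$ gives:

```latex
\[
\frac{\sqrt[3]{\ln T}\cdot T^{2/3}}{T^{2/3}} = (\ln T)^{1/3},
\qquad
\frac{\sqrt[3]{\ln T}\cdot T^{2/3}}{T} = \left(\frac{\ln T}{T}\right)^{1/3}
\xrightarrow[T\to\infty]{} 0 .
\]
```

So the gap between the upper and lower bounds is only a factor of $(\ln T)^{1/3}$, and the average regret per round vanishes as $T$ grows. As a rough illustration, for $T = 10^6$ we have $\ln T \approx 13.8$, so the gap factor is about $2.4$.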
Towards Understanding Gender Bias in Relation Extraction
Title | Towards Understanding Gender Bias in Relation Extraction |
Authors | Andrew Gaut, Tony Sun, Shirlyn Tang, Yuxin Huang, Jing Qian, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, William Yang Wang |
Abstract | Recent developments in Neural Relation Extraction (NRE) have made significant strides towards Automated Knowledge Base Construction (AKBC). While much attention has been dedicated towards improvements in accuracy, there have been no attempts in the literature to our knowledge to evaluate social biases in NRE systems. We create WikiGenderBias, a distantly supervised dataset with a human annotated test set. WikiGenderBias has sentences specifically curated to analyze gender bias in relation extraction systems. We use WikiGenderBias to evaluate systems for bias and find that NRE systems exhibit gender biased predictions and lay groundwork for future evaluation of bias in NRE. We also analyze how name anonymization, hard debiasing for word embeddings, and counterfactual data augmentation affect gender bias in predictions and performance. |
Tasks | Data Augmentation, Relation Extraction, Word Embeddings |
Published | 2019-11-09 |
URL | https://arxiv.org/abs/1911.03642v1 |
https://arxiv.org/pdf/1911.03642v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-understanding-gender-bias-in-relation |
Repo | |
Framework | |
A Generalized Markov Chain Model to Capture Dynamic Preferences and Choice Overload
Title | A Generalized Markov Chain Model to Capture Dynamic Preferences and Choice Overload |
Authors | Kumar Goutam, Vineet Goyal, Agathe Soret |
Abstract | Assortment optimization is an important problem that arises in many practical applications such as retailing and online advertising where the goal is to find a subset of products from a universe of substitutable products that maximizes a seller’s expected revenue. The demand and the revenue depend on the substitution behavior of the customers that is captured by a choice model. One of the key challenges is to find the right model for the customer substitution behavior. Many parametric random utility based models have been considered in the literature to capture substitution. However, in all these models, the probability of purchase increases as we add more options to the assortment. This is not true in general and in many settings, the probability of purchase may decrease if we add more products to the assortment, referred to as the choice overload. In this paper we attempt to address these serious limitations and propose a generalization of the Markov chain based choice model considered in Blanchet et al. In particular, we handle dynamic preferences and the choice overload phenomenon using a Markovian comparison model that is a generalization of the Markovian substitution framework of Blanchet et al. The Markovian comparison framework allows us to implicitly model the search cost in the choice process and thereby to model both dynamic preferences and the choice overload phenomenon. We consider the assortment optimization problem for the special case of our generalized Markov chain model where the underlying Markov chain is rank-1 (this is a generalization of the Multinomial Logit model). We show that the assortment optimization problem under this model is NP-hard and present a fully polynomial-time approximation scheme (FPTAS) for this problem. |
Tasks | |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06716v2 |
https://arxiv.org/pdf/1911.06716v2.pdf | |
PWC | https://paperswithcode.com/paper/a-generalized-markov-chain-model-to-capture |
Repo | |
Framework | |
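The limitation the abstract attributes to random utility models can be checked on the simplest of them, the Multinomial Logit (MNL) model, where the purchase probability is the sum of the offered products' preference weights divided by that sum plus the no-purchase weight. The sketch below covers only this baseline behaviour, not the paper's generalized Markov chain model; the weights are hypothetical.

```python
def mnl_purchase_probability(weights, no_purchase_weight=1.0):
    """Probability that a customer buys anything from the offered assortment
    under a Multinomial Logit model with positive preference weights."""
    total = sum(weights)
    return total / (no_purchase_weight + total)

# Hypothetical preference weights for three products.
assortment_weights = [0.5, 1.0, 2.0]

# Purchase probability as products are added one at a time (0 to 3 products).
probabilities = [
    mnl_purchase_probability(assortment_weights[:k])
    for k in range(len(assortment_weights) + 1)
]
```

Each added product increases the purchase probability, so an MNL model cannot produce choice overload. That is the gap the Markovian comparison model is meant to fill.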
Constrained Bayesian Optimization with Max-Value Entropy Search
Title | Constrained Bayesian Optimization with Max-Value Entropy Search |
Authors | Valerio Perrone, Iaroslav Shcherbatyi, Rodolphe Jenatton, Cedric Archambeau, Matthias Seeger |
Abstract | Bayesian optimization (BO) is a model-based approach to sequentially optimize expensive black-box functions, such as the validation error of a deep neural network with respect to its hyperparameters. In many real-world scenarios, the optimization is further subject to a priori unknown constraints. For example, training a deep network configuration may fail with an out-of-memory error when the model is too large. In this work, we focus on a general formulation of Gaussian process-based BO with continuous or binary constraints. We propose constrained Max-value Entropy Search (cMES), a novel information theoretic-based acquisition function implementing this formulation. We also revisit the validity of the factorized approximation adopted for rapid computation of the MES acquisition function, showing empirically that this leads to inaccurate results. On an extensive set of real-world constrained hyperparameter optimization problems we show that cMES compares favourably to prior work, while being simpler to implement and faster than other constrained extensions of Entropy Search. |
Tasks | Hyperparameter Optimization |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.07003v1 |
https://arxiv.org/pdf/1910.07003v1.pdf | |
PWC | https://paperswithcode.com/paper/constrained-bayesian-optimization-with-max |
Repo | |
Framework | |
Learning Multi-Robot Decentralized Macro-Action-Based Policies via a Centralized Q-Net
Title | Learning Multi-Robot Decentralized Macro-Action-Based Policies via a Centralized Q-Net |
Authors | Yuchen Xiao, Joshua Hoffman, Tian Xia, Christopher Amato |
Abstract | In many real-world multi-robot tasks, high-quality solutions often require a team of robots to perform asynchronous actions under decentralized control. Decentralized multi-agent reinforcement learning methods have difficulty learning decentralized policies because of the environment appearing to be non-stationary due to other agents also learning at the same time. In this paper, we address this challenge by proposing a macro-action-based decentralized multi-agent double deep recurrent Q-net (MacDec-MADDRQN) which trains each decentralized Q-net using a centralized Q-net for action selection. A generalized version of MacDec-MADDRQN with two separate training environments, called Parallel-MacDec-MADDRQN, is also presented to leverage either centralized or decentralized exploration. The advantages and the practical nature of our methods are demonstrated by achieving near-centralized results in simulation and having real robots accomplish a warehouse tool delivery task in an efficient way. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.08776v2 |
https://arxiv.org/pdf/1909.08776v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-robot-deep-reinforcement-learning-with |
Repo | |
Framework | |
Translation, Sentiment and Voices: A Computational Model to Translate and Analyse Voices from Real-Time Video Calling
Title | Translation, Sentiment and Voices: A Computational Model to Translate and Analyse Voices from Real-Time Video Calling |
Authors | Aneek Barman Roy |
Abstract | With the internet quickly becoming easily accessible to many, voice calling over the internet is slowly gaining momentum. Individuals have been engaging in video communication across the world in different languages. The decade also saw the emergence of language translation using neural networks. With more data being generated in audio and visual forms, analysing such information has become both a need and a challenge for many researchers from academia and industry. The availability of video chat corpora is limited because organizations protect user privacy and ensure data security. For this reason, an audio-visual communication system (VidALL) was developed and audio speech was extracted. To understand human nature while answering a video call, an analysis was conducted in which polarity and vocal intensity were considered as parameters. Simultaneously, a translation model using a neural approach was developed to translate English sentences to French. Simple RNN-based and Embedded-RNN-based models were used for the translation model. BLEU score and target sentence comparators were used to check sentence correctness. Embedded-RNN showed an accuracy of 88.71 percent and predicted correct sentences. A key finding suggests that polarity is a good estimator for understanding human emotion. |
Tasks | |
Published | 2019-09-28 |
URL | https://arxiv.org/abs/1909.13162v1 |
https://arxiv.org/pdf/1909.13162v1.pdf | |
PWC | https://paperswithcode.com/paper/translation-sentiment-and-voices-a |
Repo | |
Framework | |
AutoRemover: Automatic Object Removal for Autonomous Driving Videos
Title | AutoRemover: Automatic Object Removal for Autonomous Driving Videos |
Authors | Rong Zhang, Wei Li, Peng Wang, Chenye Guan, Jin Fang, Yuhang Song, Jinhui Yu, Baoquan Chen, Weiwei Xu, Ruigang Yang |
Abstract | Motivated by the need for photo-realistic simulation in autonomous driving, in this paper we present a video inpainting algorithm, AutoRemover, designed specifically for generating street-view videos without any moving objects. In our setup we have two challenges. The first is shadows, which are usually unlabeled but tightly coupled with the moving objects. The second is the large ego-motion in the videos. To deal with shadows, we build up an autonomous driving shadow dataset and design a deep neural network to detect shadows automatically. To deal with large ego-motion, we take advantage of the multi-source data, in particular the 3D data, in autonomous driving. More specifically, the geometric relationship between frames is incorporated into an inpainting deep neural network to produce high-quality structurally consistent video output. Experiments show that our method outperforms other state-of-the-art (SOTA) object removal algorithms, reducing the RMSE by over 19%. |
Tasks | Autonomous Driving, Video Inpainting |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12588v1 |
https://arxiv.org/pdf/1911.12588v1.pdf | |
PWC | https://paperswithcode.com/paper/autoremover-automatic-object-removal-for |
Repo | |
Framework | |
The Price of Interpretability
Title | The Price of Interpretability |
Authors | Dimitris Bertsimas, Arthur Delarue, Patrick Jaillet, Sebastien Martin |
Abstract | When quantitative models are used to support decision-making on complex and important topics, understanding a model’s “reasoning” can increase trust in its predictions, expose hidden biases, or reduce vulnerability to adversarial attacks. However, the concept of interpretability remains loosely defined and application-specific. In this paper, we introduce a mathematical framework in which machine learning models are constructed in a sequence of interpretable steps. We show that for a variety of models, a natural choice of interpretable steps recovers standard interpretability proxies (e.g., sparsity in linear models). We then generalize these proxies to yield a parametrized family of consistent measures of model interpretability. This formal definition allows us to quantify the “price” of interpretability, i.e., the tradeoff with predictive accuracy. We demonstrate practical algorithms to apply our framework on real and synthetic datasets. |
Tasks | Decision Making |
Published | 2019-07-08 |
URL | https://arxiv.org/abs/1907.03419v1 |
https://arxiv.org/pdf/1907.03419v1.pdf | |
PWC | https://paperswithcode.com/paper/the-price-of-interpretability |
Repo | |
Framework | |
Risk-Aware Planning by Confidence Estimation using Deep Learning-Based Perception
Title | Risk-Aware Planning by Confidence Estimation using Deep Learning-Based Perception |
Authors | Maymoonah Toubeh, Pratap Tokekar |
Abstract | This work proposes the use of Bayesian approximations of uncertainty from deep learning in a robot planner, showing that this produces more cautious actions in safety-critical scenarios. The case study investigated is motivated by a setup where an aerial robot acts as a “scout” for a ground robot. This is useful when the area below is unknown or dangerous, with applications in space exploration, military, or search-and-rescue. Images taken from the aerial view are used to provide a less obstructed map to guide the navigation of the robot on the ground. Experiments are conducted using deep learning semantic image segmentation, followed by a path planner based on the resulting cost map, to provide an empirical analysis of the proposed method. A comparison with similar approaches is presented to portray the usefulness of certain techniques, or variations within a technique, in similar experimental settings. The method is analyzed to assess the impact of variations in the uncertainty extraction, as well as the absence of an uncertainty metric, on the overall system with the use of a defined metric which measures surprise to the planner. The analysis is performed on multiple datasets, showing a similar trend of lower surprise when uncertainty information is incorporated in the planning, given that threshold values of the hyperparameters in the uncertainty extraction have been met. We find that taking uncertainty into account leads to paths that could be 18% less risky on average. |
Tasks | Semantic Segmentation |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1910.00101v1 |
https://arxiv.org/pdf/1910.00101v1.pdf | |
PWC | https://paperswithcode.com/paper/risk-aware-planning-by-confidence-estimation |
Repo | |
Framework | |
Analysis by Adversarial Synthesis – A Novel Approach for Speech Vocoding
Title | Analysis by Adversarial Synthesis – A Novel Approach for Speech Vocoding |
Authors | Ahmed Mustafa, Arijit Biswas, Christian Bergler, Julia Schottenhamml, Andreas Maier |
Abstract | Classical parametric speech coding techniques provide a compact representation for speech signals. This affords a very low transmission rate but with a reduced perceptual quality of the reconstructed signals. Recently, autoregressive deep generative models such as WaveNet and SampleRNN have been used as speech vocoders to scale up the perceptual quality of the reconstructed signals without increasing the coding rate. However, such models suffer from a very slow signal generation mechanism due to their sample-by-sample modelling approach. In this work, we introduce a new methodology for neural speech vocoding based on generative adversarial networks (GANs). A fake speech signal is generated from a very compressed representation of the glottal excitation using conditional GANs as a deep generative model. This fake speech is then refined using the LPC parameters of the original speech signal to obtain a natural reconstruction. The reconstructed speech waveforms based on this approach show a higher perceptual quality than the classical vocoder counterparts according to subjective and objective evaluation scores for a dataset of 30 male and female speakers. Moreover, the use of GANs enables signals to be generated in one shot, in contrast to autoregressive generative models. This makes GANs promising candidates for implementing high-quality neural vocoders. |
Tasks | |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00772v1 |
https://arxiv.org/pdf/1907.00772v1.pdf | |
PWC | https://paperswithcode.com/paper/analysis-by-adversarial-synthesis-a-novel |
Repo | |
Framework | |