April 2, 2020

Paper Group ANR 248

Bayesian Neural Architecture Search using A Training-Free Performance Metric

Title Bayesian Neural Architecture Search using A Training-Free Performance Metric
Authors Andrés Camero, Hao Wang, Enrique Alba, Thomas Bäck
Abstract Recurrent neural networks (RNNs) are a powerful approach for time series prediction. However, their performance is strongly affected by their architecture and hyperparameter settings. The architecture optimization of RNNs is a time-consuming task, where the search space is typically a mixture of real, integer and categorical values. To allow for shrinking and expanding the size of the network, the representation of architectures often has a variable length. In this paper, we propose to tackle the architecture optimization problem with a variant of the Bayesian Optimization (BO) algorithm. To reduce the evaluation time of candidate architectures the Mean Absolute Error Random Sampling (MRS), a training-free method to estimate the network performance, is adopted as the objective function for BO. Also, we propose three fixed-length encoding schemes to cope with the variable-length architecture representation. The result is a new perspective on accurate and efficient design of RNNs, that we validate on three problems. Our findings show that 1) the BO algorithm can explore different network architectures using the proposed encoding schemes and successfully designs well-performing architectures, and 2) the optimization time is significantly reduced by using MRS, without compromising the performance as compared to the architectures obtained from the actual training procedure.
Tasks Neural Architecture Search, Time Series, Time Series Prediction
Published 2020-01-29
URL https://arxiv.org/abs/2001.10726v1
PDF https://arxiv.org/pdf/2001.10726v1.pdf
PWC https://paperswithcode.com/paper/bayesian-neural-architecture-search-using-a
Repo
Framework

On the Sample Complexity of Adversarial Multi-Source PAC Learning

Title On the Sample Complexity of Adversarial Multi-Source PAC Learning
Authors Nikola Konstantinov, Elias Frantar, Dan Alistarh, Christoph H. Lampert
Abstract We study the problem of learning from multiple untrusted data sources, a scenario of increasing practical relevance given the recent emergence of crowdsourcing and collaborative learning paradigms. Specifically, we analyze the situation in which a learning system obtains datasets from multiple sources, some of which might be biased or even adversarially perturbed. It is known that in the single-source case, an adversary with the power to corrupt a fixed fraction of the training data can prevent PAC-learnability, that is, even in the limit of infinitely much training data, no learning system can approach the optimal test error. In this work we show that, surprisingly, the same is not true in the multi-source setting, where the adversary can arbitrarily corrupt a fixed fraction of the data sources. Our main results are a generalization bound that provides finite-sample guarantees for this learning setting, as well as corresponding lower bounds. Besides establishing PAC-learnability our results also show that in a cooperative learning setting sharing data with other parties has provable benefits, even if some participants are malicious.
Tasks
Published 2020-02-24
URL https://arxiv.org/abs/2002.10384v1
PDF https://arxiv.org/pdf/2002.10384v1.pdf
PWC https://paperswithcode.com/paper/on-the-sample-complexity-of-adversarial-multi
Repo
Framework

Efficiently Learning and Sampling Interventional Distributions from Observations

Title Efficiently Learning and Sampling Interventional Distributions from Observations
Authors Arnab Bhattacharyya, Sutanu Gayen, Saravanan Kandasamy, Ashwin Maran, N. V. Vinodchandran
Abstract We study the problem of efficiently estimating the effect of an intervention on a single variable using observational samples in a causal Bayesian network. Our goal is to give algorithms that are efficient in both time and sample complexity in a non-parametric setting. Tian and Pearl (AAAI '02) have exactly characterized the class of causal graphs for which causal effects of atomic interventions can be identified from observational data. We make their result quantitative. Suppose P is a causal model on a set V of n observable variables with respect to a given causal graph G with observable distribution $P$. Let $P_x$ denote the interventional distribution over the observables with respect to an intervention of a designated variable X with x. We show that assuming that G has bounded in-degree, bounded c-components, and that the observational distribution is identifiable and satisfies a certain strong positivity condition: 1. [Evaluation] There is an algorithm that outputs with probability $2/3$ an evaluator for a distribution $P'$ that satisfies $d_{tv}(P_x, P') \leq \epsilon$ using $m=\tilde{O}(n\epsilon^{-2})$ samples from $P$ and $O(mn)$ time. The evaluator can return in $O(n)$ time the probability $P'(v)$ for any assignment $v$ to $V$. 2. [Generation] There is an algorithm that outputs with probability $2/3$ a sampler for a distribution $\hat{P}$ that satisfies $d_{tv}(P_x, \hat{P}) \leq \epsilon$ using $m=\tilde{O}(n\epsilon^{-2})$ samples from $P$ and $O(mn)$ time. The sampler returns an iid sample from $\hat{P}$ with probability $1-\delta$ in $O(n\epsilon^{-1} \log\delta^{-1})$ time. We extend our techniques to estimate marginals $P_x|_Y$ over a given $Y \subset V$ of interest. We also show lower bounds for the sample complexity showing that our sample complexity has optimal dependence on the parameters $n$ and $\epsilon$ as well as the strong positivity parameter.
Tasks
Published 2020-02-11
URL https://arxiv.org/abs/2002.04232v1
PDF https://arxiv.org/pdf/2002.04232v1.pdf
PWC https://paperswithcode.com/paper/efficiently-learning-and-sampling
Repo
Framework

From Kinematics To Dynamics: Estimating Center of Pressure and Base of Support from Video Frames of Human Motion

Title From Kinematics To Dynamics: Estimating Center of Pressure and Base of Support from Video Frames of Human Motion
Authors Jesse Scott, Christopher Funk, Bharadwaj Ravichandran, John H. Challis, Robert T. Collins, Yanxi Liu
Abstract To gain an understanding of the relation between a given human pose image and the corresponding physical foot pressure of the human subject, we propose and validate two end-to-end deep learning architectures, PressNet and PressNet-Simple, to regress foot pressure heatmaps (dynamics) from 2D human pose (kinematics) derived from a video frame. A unique video and foot pressure data set of 813,050 synchronized pairs, composed of 5-minute long choreographed Taiji movement sequences of 6 subjects, is collected and used for leaving-one-subject-out cross validation. Our initial experimental results demonstrate reliable and repeatable foot pressure prediction from a single image, setting the first baseline for such a complex cross modality mapping problem in computer vision. Furthermore, we compute and quantitatively validate the Center of Pressure (CoP) and Base of Support (BoS) from predicted foot pressure distribution, obtaining key components in pose stability analysis from images with potential applications in kinesiology, medicine, sports and robotics.
Tasks
Published 2020-01-02
URL https://arxiv.org/abs/2001.00657v1
PDF https://arxiv.org/pdf/2001.00657v1.pdf
PWC https://paperswithcode.com/paper/from-kinematics-to-dynamics-estimating-center
Repo
Framework
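The Center of Pressure mentioned in the abstract is conventionally defined as the pressure-weighted centroid of the foot pressure distribution. The following is a minimal sketch of that definition only, not the authors' PressNet code; the function name `center_of_pressure`, the toy `heatmap`, and the use of grid-cell indices as coordinates are assumptions made for illustration.

```python
from typing import List, Tuple


def center_of_pressure(pressure: List[List[float]]) -> Tuple[float, float]:
    """Return the pressure-weighted centroid (row, col) of a 2D pressure map.

    Coordinates are in grid-cell indices (an assumption; a real sensor mat
    would need a conversion to physical units).
    """
    total = sum(sum(row) for row in pressure)
    if total <= 0:
        raise ValueError("pressure map has no positive load")
    # Weight each row index and column index by the pressure in that cell.
    row_centroid = sum(i * p for i, row in enumerate(pressure) for p in row) / total
    col_centroid = sum(j * p for row in pressure for j, p in enumerate(row)) / total
    return row_centroid, col_centroid


# Hypothetical 3x3 heatmap with equal load on two neighbouring cells.
heatmap = [
    [0.0, 0.0, 0.0],
    [0.0, 2.0, 2.0],
    [0.0, 0.0, 0.0],
]
cop = center_of_pressure(heatmap)  # midway between the two loaded cells
```

The same function could be applied to either a measured or a predicted heatmap, which is how a predicted CoP might be compared against ground truth.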

Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse Gradients and Adaptive Sampling

Title Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse Gradients and Adaptive Sampling
Authors HanQin Cai, Daniel Mckenzie, Wotao Yin, Zhenliang Zhang
Abstract We consider the problem of minimizing a high-dimensional objective function, which may include a regularization term, using (possibly noisy) evaluations of the function. Such optimization is also called derivative-free, zeroth-order, or black-box optimization. We propose a new $\textbf{Z}$eroth-$\textbf{O}$rder $\textbf{R}$egularized $\textbf{O}$ptimization method, dubbed ZORO. When the underlying gradient is approximately sparse at an iterate, ZORO needs very few objective function evaluations to obtain a new iterate that decreases the objective function. We achieve this with an adaptive, randomized gradient estimator, followed by an inexact proximal-gradient scheme. Under a novel approximately sparse gradient assumption and various different convex settings, we show the (theoretical and empirical) convergence rate of ZORO is only logarithmically dependent on the problem dimension. Numerical experiments show that ZORO outperforms the existing methods with similar assumptions, on both synthetic and real datasets.
Tasks
Published 2020-03-29
URL https://arxiv.org/abs/2003.13001v1
PDF https://arxiv.org/pdf/2003.13001v1.pdf
PWC https://paperswithcode.com/paper/zeroth-order-regularized-optimization-zoro
Repo
Framework
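The general shape of a zeroth-order proximal-gradient loop can be sketched as below. This is not ZORO itself: it substitutes a plain coordinate-wise forward-difference gradient estimate (costing one function evaluation per dimension) for the paper's adaptive, randomized estimator, and it assumes an $\ell_1$ regularizer so that the proximal step is soft-thresholding. All function names, the test objective, and the parameter values are hypothetical.

```python
from typing import Callable, List


def fd_gradient(f: Callable[[List[float]], float], x: List[float], h: float = 1e-6) -> List[float]:
    """Forward-difference gradient estimate using len(x) + 1 evaluations of f."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h
        grad.append((f(shifted) - fx) / h)
    return grad


def soft_threshold(v: List[float], t: float) -> List[float]:
    """Proximal operator of t * ||.||_1 (assumed regularizer)."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def zo_prox_gradient(f, x0: List[float], step: float, lam: float, iters: int) -> List[float]:
    """Minimize f(x) + lam * ||x||_1 using only evaluations of the smooth part f."""
    x = list(x0)
    for _ in range(iters):
        g = fd_gradient(f, x)
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)], step * lam)
    return x


def smooth_part(x: List[float]) -> float:
    # Toy objective whose gradient is non-zero in only one coordinate near the optimum.
    return 0.5 * ((x[0] - 3.0) ** 2 + x[1] ** 2 + x[2] ** 2)


x_hat = zo_prox_gradient(smooth_part, [0.0, 0.0, 0.0], step=0.5, lam=0.1, iters=100)
```

The coordinate-wise estimator scales linearly with the dimension, which is exactly the cost the abstract says ZORO avoids when the gradient is approximately sparse.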

DriverMHG: A Multi-Modal Dataset for Dynamic Recognition of Driver Micro Hand Gestures and a Real-Time Recognition Framework

Title DriverMHG: A Multi-Modal Dataset for Dynamic Recognition of Driver Micro Hand Gestures and a Real-Time Recognition Framework
Authors Okan Köpüklü, Thomas Ledwon, Yao Rong, Neslihan Kose, Gerhard Rigoll
Abstract The use of hand gestures provides a natural alternative to cumbersome interface devices for Human-Computer Interaction (HCI) systems. However, real-time recognition of dynamic micro hand gestures from video streams is challenging for in-vehicle scenarios since (i) the gestures should be performed naturally without distracting the driver, (ii) micro hand gestures occur within very short time intervals at spatially constrained areas, (iii) the performed gesture should be recognized only once, and (iv) the entire architecture should be lightweight as it will be deployed to an embedded system. In this work, we propose an HCI system for dynamic recognition of driver micro hand gestures, which can have a crucial impact in the automotive sector, especially for safety-related issues. For this purpose, we initially collected a dataset named Driver Micro Hand Gestures (DriverMHG), which consists of RGB, depth and infrared modalities. The challenges for dynamic recognition of micro hand gestures have been addressed by proposing a lightweight convolutional neural network (CNN) based architecture which operates online efficiently with a sliding window approach. For the CNN model, several 3-dimensional resource-efficient networks are applied and their performances are analyzed. Online recognition of gestures has been performed with 3D-MobileNetV2, which provided the best offline accuracy among the applied networks with similar computational complexities. The final architecture is deployed on a driver simulator operating in real-time. We make the DriverMHG dataset and our source code publicly available.
Tasks
Published 2020-03-02
URL https://arxiv.org/abs/2003.00951v1
PDF https://arxiv.org/pdf/2003.00951v1.pdf
PWC https://paperswithcode.com/paper/drivermhg-a-multi-modal-dataset-for-dynamic
Repo
Framework

Hierarchical Modeling of Multidimensional Data in Regularly Decomposed Spaces: Synthesis and Perspective

Title Hierarchical Modeling of Multidimensional Data in Regularly Decomposed Spaces: Synthesis and Perspective
Authors Olivier Guye
Abstract This fourth and last tome focuses on describing the envisioned work for a project that was presented in the preceding tome. It concerns a new approach dedicated to the coding of still and moving pictures, trying to bridge the MPEG-4 and MPEG-7 standard bodies. The aim of this project is to define the principles of self-descriptive video coding. In order to establish them, the document is organized into five chapters that describe the various techniques envisioned for developing such a new approach in visual coding: image segmentation, computation of visual descriptors, computation of perceptual groupings, building of visual dictionaries, and picture and video coding. Based on the techniques of multiresolution computing, it is proposed to develop an image segmentation made from piecewise regular components, to compute attributes on the frame and the rendering of the shapes so produced, independently of the geometric transforms that can occur in the image plane, and to gather them into perceptual groupings so as to be able to recognize partially hidden patterns. Due to vector quantization of shape frame and rendering, it will appear that simple shapes may be compared to a visual alphabet and that complex shapes then become words written using this alphabet and recorded in a dictionary. With the help of a nearest-neighbour scan applied to the picture shapes, the self-descriptive coding will then generate a sentence made from words written using the simple shape alphabet.
Tasks Quantization, Semantic Segmentation
Published 2020-01-13
URL https://arxiv.org/abs/2001.04322v1
PDF https://arxiv.org/pdf/2001.04322v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-modeling-of-multidimensional-2
Repo
Framework

Capsule GAN Using Capsule Network for Generator Architecture

Title Capsule GAN Using Capsule Network for Generator Architecture
Authors Kanako Marusaki, Hiroshi Watanabe
Abstract This paper presents Capsule GAN, a Generative adversarial network using Capsule Network not only in the discriminator but also in the generator. Recently, Generative adversarial networks (GANs) have been intensively studied. However, generating images by GANs is difficult. Therefore, GANs sometimes generate poor quality images. These GANs use convolutional neural networks (CNNs). However, CNNs have the defect that the relational information between features of the image may be lost. Capsule Network, proposed by Hinton in 2017, overcomes the defect of CNNs. Capsule GAN reported previously uses Capsule Network in the discriminator. However, instead of using Capsule Network, Capsule GAN reported in previous studies uses CNNs in generator architecture like DCGAN. This paper introduces two approaches to use Capsule Network in the generator. One is to use DigitCaps layer from the discriminator as the input to the generator. DigitCaps layer is the output layer of Capsule Network. It has the features of the input images of the discriminator. The other is to use the reverse operation of recognition process in Capsule Network in the generator. We compare Capsule GAN proposed in this paper with conventional GAN using CNN and Capsule GAN which uses Capsule Network in the discriminator only. The datasets are MNIST, Fashion-MNIST and color images. We show that Capsule GAN outperforms the GAN using CNN and the GAN using Capsule Network in the discriminator only. The architecture of Capsule GAN proposed in this paper is a basic architecture using Capsule Network. Therefore, we can apply the existing improvement techniques for GANs to Capsule GAN.
Tasks
Published 2020-03-18
URL https://arxiv.org/abs/2003.08047v1
PDF https://arxiv.org/pdf/2003.08047v1.pdf
PWC https://paperswithcode.com/paper/capsule-gan-using-capsule-network-for
Repo
Framework

Looking Enhances Listening: Recovering Missing Speech Using Images

Title Looking Enhances Listening: Recovering Missing Speech Using Images
Authors Tejas Srinivasan, Ramon Sanabria, Florian Metze
Abstract Speech is understood better by using visual context; for this reason, there have been many attempts to use images to adapt automatic speech recognition (ASR) systems. Current work, however, has shown that visually adapted ASR models only use images as a regularization signal, while completely ignoring their semantic content. In this paper, we present a set of experiments where we show the utility of the visual modality under noisy conditions. Our results show that multimodal ASR models can recover words which are masked in the input acoustic signal, by grounding its transcriptions using the visual representations. We observe that integrating visual context can result in up to 35% relative improvement in masked word recovery. These results demonstrate that end-to-end multimodal ASR systems can become more robust to noise by leveraging the visual context.
Tasks Speech Recognition
Published 2020-02-13
URL https://arxiv.org/abs/2002.05639v1
PDF https://arxiv.org/pdf/2002.05639v1.pdf
PWC https://paperswithcode.com/paper/looking-enhances-listening-recovering-missing
Repo
Framework

Training Efficient Network Architecture and Weights via Direct Sparsity Control

Title Training Efficient Network Architecture and Weights via Direct Sparsity Control
Authors Yangzi Guo, Yiyuan She, Adrian Barbu
Abstract Artificial neural networks (ANNs), especially deep convolutional networks, are very popular these days and have been proved to successfully offer quite reliable solutions to many vision problems. However, the use of deep neural networks is widely impeded by their intensive computational and memory cost. In this paper, we propose a novel efficient network pruning method that is suitable for both non-structured and structured channel-level pruning. Our proposed method tightens a sparsity constraint by gradually removing network parameters or filter channels based on a criterion and a schedule. The attractive fact that the network size keeps dropping throughout the iterations makes it suitable for the pruning of any untrained or pre-trained network. Because our method uses an L0 constraint instead of the L1 penalty, it does not introduce any bias in the training parameters or filter channels. Furthermore, the L0 constraint makes it easy to directly specify the desired sparsity level during the network pruning process. Finally, experimental validation on both synthetic and real datasets shows that the proposed method obtains better or competitive performance compared to other state-of-the-art network pruning methods.
Tasks Network Pruning
Published 2020-02-11
URL https://arxiv.org/abs/2002.04301v1
PDF https://arxiv.org/pdf/2002.04301v1.pdf
PWC https://paperswithcode.com/paper/training-efficient-network-architecture-and
Repo
Framework
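The idea of tightening an L0 constraint on a schedule can be illustrated with a small sketch. The abstract does not specify the criterion or the schedule, so both are assumptions here: the criterion is weight magnitude and the schedule is linear. The names `keep_count` and `prune_to_k` and the toy weight vector are hypothetical, and the training step between prunings is only indicated by a comment.

```python
from typing import List


def keep_count(total: int, target: int, step: int, num_steps: int) -> int:
    """Linearly shrink the number of kept weights from `total` to `target` (assumed schedule)."""
    frac = min(step / num_steps, 1.0)
    return round(total - (total - target) * frac)


def prune_to_k(weights: List[float], k: int) -> List[float]:
    """Zero all but the k largest-magnitude entries, i.e. enforce ||w||_0 <= k."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
for step in range(1, 4):
    k = keep_count(len(weights), target=2, step=step, num_steps=3)
    # A real training loop would take gradient steps on the surviving weights here.
    weights = prune_to_k(weights, k)
```

Because the surviving weights are kept unchanged rather than shrunk, the constraint itself adds no penalty term to the loss, which matches the abstract's contrast with an L1 penalty; the final sparsity level is set directly through `target`.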

Investigating the Importance of Shape Features, Color Constancy, Color Spaces and Similarity Measures in Open-Ended 3D Object Recognition

Title Investigating the Importance of Shape Features, Color Constancy, Color Spaces and Similarity Measures in Open-Ended 3D Object Recognition
Authors S. Hamidreza Kasaei, Maryam Ghorbani, Jits Schilperoort, Wessel van der Rest
Abstract Despite the recent success of state-of-the-art 3D object recognition approaches, service robots frequently fail to recognize many objects in real human-centric environments. For these robots, object recognition is a challenging task due to the high demand for accurate and real-time response under changing and unpredictable environmental conditions. Most of the recent approaches use either the shape information only and ignore the role of color information or vice versa. Furthermore, they mainly utilize the $L_n$ Minkowski family functions to measure the similarity of two object views, while there are various distance measures that are applicable to compare two object views. In this paper, we explore the importance of shape information, color constancy, color spaces, and various similarity measures in open-ended 3D object recognition. Towards this goal, we extensively evaluate the performance of object recognition approaches in three different configurations, including \textit{color-only}, \textit{shape-only}, and \textit{combinations of color and shape}, in both offline and online settings. Experimental results concerning scalability, memory usage, and object recognition performance show that all of the \textit{combinations of color and shape} yield significant improvements over the \textit{shape-only} and \textit{color-only} approaches. The underlying reason is that color information is an important feature to distinguish objects that have very similar geometric properties with different colors and vice versa. Moreover, by combining color and shape information, we demonstrate that the robot can learn new object categories from very few training examples in a real-world setting.
Tasks 3D Object Recognition, Color Constancy, Object Recognition
Published 2020-02-10
URL https://arxiv.org/abs/2002.03779v1
PDF https://arxiv.org/pdf/2002.03779v1.pdf
PWC https://paperswithcode.com/paper/investigating-the-importance-of-shape
Repo
Framework
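For reference, the $L_n$ Minkowski family named in the abstract is $d_n(a, b) = \left(\sum_i |a_i - b_i|^n\right)^{1/n}$, with $n=1$ giving the Manhattan distance and $n=2$ the Euclidean distance. A minimal sketch follows; the function name and the two-element vectors standing in for object-view descriptors are assumptions for illustration, not the paper's features.

```python
from typing import Sequence


def minkowski_distance(a: Sequence[float], b: Sequence[float], n: float) -> float:
    """L_n Minkowski distance between two equal-length feature vectors (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1 for a valid metric")
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** n for x, y in zip(a, b)) ** (1.0 / n)


# Hypothetical descriptors of two object views.
view_a = [0.0, 0.0]
view_b = [3.0, 4.0]
d1 = minkowski_distance(view_a, view_b, 1)  # Manhattan distance
d2 = minkowski_distance(view_a, view_b, 2)  # Euclidean distance
```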

Using Counterfactual Reasoning and Reinforcement Learning for Decision-Making in Autonomous Driving

Title Using Counterfactual Reasoning and Reinforcement Learning for Decision-Making in Autonomous Driving
Authors Patrick Hart, Alois Knoll
Abstract In decision-making for autonomous vehicles, we need to predict other vehicles' behaviors or learn their behavior implicitly using machine learning. However, often the predictions and learned models have errors or might be wrong altogether, which can lead to dangerous situations. Therefore, decision-making algorithms should consider counterfactual reasoning such as: what would happen if the other agents behaved in a certain way? The approach we present in this paper is two-fold: First, during training, we randomly select behavior models from a behavior model pool and assign them to the other vehicles in the scenario, such as more passive or aggressive behavior models. Second, during the application, we derive several virtual worlds from the actual world that have the same initial state. In each of these worlds, we also assign behavior models from the behavior model pool to others. We then evolve these virtual worlds for a defined time-horizon. This enables us to apply counterfactual reasoning by asking what would happen if the actual world evolves as in the virtual world. In uncertain environments, this makes it possible to generate more probable risk estimates and, thus, to enable safer decision-making. We conduct studies using a lane-change scenario that shows the advantages of counterfactual reasoning using learned policies and virtual worlds to estimate their risk and performance.
Tasks Autonomous Driving, Autonomous Vehicles, Decision Making
Published 2020-03-20
URL https://arxiv.org/abs/2003.11919v1
PDF https://arxiv.org/pdf/2003.11919v1.pdf
PWC https://paperswithcode.com/paper/using-counterfactual-reasoning-and
Repo
Framework

Stable Sparse Subspace Embedding for Dimensionality Reduction

Title Stable Sparse Subspace Embedding for Dimensionality Reduction
Authors Li Chen, Shuizheng Zhou, Jiajun Ma
Abstract Sparse random projection (RP) is a popular tool for dimensionality reduction that shows promising performance with low computational complexity. However, in the existing sparse RP matrices, the positions of non-zero entries are usually randomly selected. Although they adopt uniform sampling with replacement, due to large sampling variance, the number of non-zeros is uneven among rows of the projection matrix which is generated in one trial, and more data information may be lost after dimension reduction. To break this bottleneck, based on random sampling without replacement in statistics, this paper builds a stable sparse subspace embedded matrix (S-SSE), in which non-zeros are uniformly distributed. It is proved that the S-SSE is stabler than the existing matrix, and it can maintain Euclidean distance between points well after dimension reduction. Our empirical studies corroborate our theoretical findings and demonstrate that our approach can indeed achieve satisfactory performance.
Tasks Dimensionality Reduction
Published 2020-02-07
URL https://arxiv.org/abs/2002.02844v1
PDF https://arxiv.org/pdf/2002.02844v1.pdf
PWC https://paperswithcode.com/paper/stable-sparse-subspace-embedding-for
Repo
Framework

Application and Assessment of Deep Learning for the Generation of Potential NMDA Receptor Antagonists

Title Application and Assessment of Deep Learning for the Generation of Potential NMDA Receptor Antagonists
Authors Katherine J. Schultz, Sean M. Colby, Yasemin Yesiltepe, Jamie R. Nuñez, Monee Y. McGrady, Ryan R. Renslow
Abstract Uncompetitive antagonists of the N-methyl D-aspartate receptor (NMDAR) have demonstrated therapeutic benefit in the treatment of neurological diseases such as Parkinson’s and Alzheimer’s, but some also cause dissociative effects that have led to the synthesis of illicit drugs. The ability to generate NMDAR antagonists in silico is therefore desirable both for new medication development and for preempting and identifying new designer drugs. Recently, generative deep learning models have been applied to de novo drug design as a means to expand the amount of chemical space that can be explored for potential drug-like compounds. In this study, we assess the application of a generative model to the NMDAR to achieve two primary objectives: (i) the creation and release of a comprehensive library of experimentally validated NMDAR phencyclidine (PCP) site antagonists to assist the drug discovery community and (ii) an analysis of both the advantages conferred by applying such generative artificial intelligence models to drug design and the current limitations of the approach. We apply, and provide source code for, a variety of ligand- and structure-based assessment techniques used in standard drug discovery analyses to the deep learning-generated compounds. We present twelve candidate antagonists that are not available in existing chemical databases to provide an example of what this type of workflow can achieve, though synthesis and experimental validation of these compounds is still required.
Tasks Drug Discovery
Published 2020-03-31
URL https://arxiv.org/abs/2003.14360v1
PDF https://arxiv.org/pdf/2003.14360v1.pdf
PWC https://paperswithcode.com/paper/application-and-assessment-of-deep-learning
Repo
Framework

Baryon acoustic oscillations reconstruction using convolutional neural networks

Title Baryon acoustic oscillations reconstruction using convolutional neural networks
Authors Tian-Xiang Mao, Jie Wang, Baojiu Li, Yan-Chuan Cai, Bridget Falck, Mark Neyrinck, Alex Szalay
Abstract Here we propose a new scheme to reconstruct the baryon acoustic oscillations (BAO) signal, with key cosmological information, based on deep convolutional neural networks. After training the network with almost no fine-tuning, in the test set, the network recovers large-scale modes accurately: the correlation coefficient between the ground truth and recovered initial conditions still reaches $90\%$ at $k \leq 0.2~ h\mathrm{Mpc}^{-1}$, which significantly improves the BAO signal-to-noise ratio until the scale $k=0.4~ h\mathrm{Mpc}^{-1}$. Furthermore, our scheme is independent of the survey boundary since it reconstructs initial condition based on local density distribution in configuration space, which means that we can gain more information from the whole survey space. Finally, we found our trained network is not sensitive to the cosmological parameters and works very well in those cosmologies close to that of our training set. This new scheme will possibly help us dig out more information from the current, on-going and future galaxy surveys.
Tasks
Published 2020-02-24
URL https://arxiv.org/abs/2002.10218v1
PDF https://arxiv.org/pdf/2002.10218v1.pdf
PWC https://paperswithcode.com/paper/baryon-acoustic-oscillations-reconstruction
Repo
Framework