January 30, 2020

2996 words 15 mins read

Paper Group ANR 350

Evaluating the Representational Hub of Language and Vision Models. Gradient Flows and Accelerated Proximal Splitting Methods. Improving Action Localization by Progressive Cross-stream Cooperation. Embedded Constrained Feature Construction for High-Energy Physics Data Classification. Predictive Coding as Stimulus Avoidance in Spiking Neural Networks …

Evaluating the Representational Hub of Language and Vision Models

Title Evaluating the Representational Hub of Language and Vision Models
Authors Ravi Shekhar, Ece Takmaz, Raquel Fernández, Raffaella Bernardi
Abstract The multimodal models used in the emerging field at the intersection of computational linguistics and computer vision implement the bottom-up processing of the ‘Hub and Spoke’ architecture proposed in cognitive science to represent how the brain processes and combines multi-sensory inputs. In particular, the Hub is implemented as a neural network encoder. We investigate the effect on this encoder of various vision-and-language tasks proposed in the literature: visual question answering, visual reference resolution, and visually grounded dialogue. To measure the quality of the representations learned by the encoder, we use two kinds of analyses. First, we evaluate the encoder pre-trained on the different vision-and-language tasks on an existing diagnostic task designed to assess multimodal semantic understanding. Second, we carry out a battery of analyses aimed at studying how the encoder merges and exploits the two modalities.
Tasks Question Answering, Visual Question Answering
Published 2019-04-12
URL http://arxiv.org/abs/1904.06038v1
PDF http://arxiv.org/pdf/1904.06038v1.pdf
PWC https://paperswithcode.com/paper/evaluating-the-representational-hub-of
Repo
Framework
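In the paper above, the Hub is a neural network encoder that fuses visual and linguistic input. Below is a minimal, hypothetical sketch of such a fusion encoder in PyTorch, intended only to illustrate the idea of a shared multimodal representation; the dimensions, layer choices, and class names are assumptions, not the authors' architecture.

```python
# Hypothetical sketch of a "Hub"-style multimodal encoder: an LSTM encodes the question,
# image features come from a pretrained CNN, and a joint projection fuses the two.
# This is NOT the authors' exact architecture, only an illustration of the fusion idea.
import torch
import torch.nn as nn

class HubEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fuse = nn.Linear(hidden_dim + img_dim, hidden_dim)

    def forward(self, question_tokens, image_features):
        _, (h, _) = self.lstm(self.embed(question_tokens))   # last hidden state of the question encoder
        joint = torch.cat([h[-1], image_features], dim=-1)    # concatenate the two modalities
        return torch.tanh(self.fuse(joint))                   # shared multimodal representation

encoder = HubEncoder(vocab_size=10000)
rep = encoder(torch.randint(0, 10000, (4, 12)), torch.randn(4, 2048))  # (4, 512) joint representation
```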

Gradient Flows and Accelerated Proximal Splitting Methods

Title Gradient Flows and Accelerated Proximal Splitting Methods
Authors Guilherme França, Daniel P. Robinson, René Vidal
Abstract Proximal-based algorithms are well-suited to nonsmooth optimization problems with important applications in signal processing, control theory, statistics and machine learning. There are essentially four basic types of proximal algorithms based on fixed-point iteration currently known: forward-backward splitting, forward-backward-forward or Tseng splitting, Douglas-Rachford, and the very recent Davis-Yin three-operator splitting. In addition, the alternating direction method of multipliers (ADMM) is also closely related. In this paper, we show that all these different methods can be derived from the gradient flow by using splitting methods for ordinary differential equations. Furthermore, applying a similar discretization scheme to a particular second-order differential equation results in accelerated variants of the respective algorithm, which can be of Nesterov or heavy-ball type; we treat both simultaneously. Many of the optimization algorithms we derive are new. For instance, we propose accelerated variants of Davis-Yin and two extensions of ADMM together with their accelerated variants. Interestingly, we show that (accelerated) ADMM corresponds to a rebalanced splitting, a recent technique designed to preserve steady states of the differential equation. Overall, our results strengthen the connections between optimization and continuous dynamical systems and offer a more unified perspective on accelerated methods.
Tasks
Published 2019-08-02
URL https://arxiv.org/abs/1908.00865v2
PDF https://arxiv.org/pdf/1908.00865v2.pdf
PWC https://paperswithcode.com/paper/gradient-flows-and-accelerated-proximal
Repo
Framework
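As a concrete illustration of the connection described above, forward-backward splitting can be read as an explicit step on the smooth part of the objective followed by an implicit (proximal) step on the nonsmooth part, i.e. a splitting discretization of the gradient flow $\dot{x} = -\nabla f(x) - \nabla g(x)$. The sketch below applies the basic, non-accelerated scheme to a toy lasso problem; the data and step size are illustrative only.

```python
# Illustrative sketch: forward-backward (proximal gradient) splitting applied to the lasso
# problem min_x 0.5*||Ax - b||^2 + lam*||x||_1, viewed as an explicit/implicit
# discretization of the gradient flow. Toy data and step size.
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (the backward/implicit step)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward(A, b, lam, step, iters=500):
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                              # forward step on the smooth part
        x = soft_threshold(x - step * grad, step * lam)       # backward step on the nonsmooth part
    return x

rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
x_hat = forward_backward(A, b, lam=0.1, step=1.0 / np.linalg.norm(A, 2) ** 2)
```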

Improving Action Localization by Progressive Cross-stream Cooperation

Title Improving Action Localization by Progressive Cross-stream Cooperation
Authors Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu
Abstract Spatio-temporal action localization consists of three levels of tasks: spatial localization, action classification, and temporal segmentation. In this work, we propose a new Progressive Cross-stream Cooperation (PCSC) framework in which both region proposals and features from one stream (i.e., Flow/RGB) help the other stream (i.e., RGB/Flow) iteratively improve action localization results and generate better bounding boxes. Specifically, we first generate a larger set of region proposals by combining the latest region proposals from both streams, from which we can readily obtain a larger set of labelled training samples to help learn better action detection models. Second, we propose a new message passing approach to pass information from one stream to the other in order to learn better representations, which also leads to better action detection models. As a result, our iterative framework progressively improves action localization results at the frame level. To improve action localization results at the video level, we additionally propose a new strategy to train class-specific actionness detectors for better temporal segmentation; these detectors can be readily learnt by focusing on “confusing” samples from the same action class. Comprehensive experiments on two benchmark datasets, UCF-101-24 and J-HMDB, demonstrate the effectiveness of our newly proposed approaches for spatio-temporal action localization in realistic scenarios.
Tasks Action Classification, Action Detection, Action Localization, Spatio-Temporal Action Localization, Temporal Action Localization
Published 2019-05-28
URL https://arxiv.org/abs/1905.11575v1
PDF https://arxiv.org/pdf/1905.11575v1.pdf
PWC https://paperswithcode.com/paper/improving-action-localization-by-progressive-1
Repo
Framework
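The first step of the PCSC framework, combining the latest region proposals from the RGB and Flow streams, can be illustrated with a simplified sketch: pool the two proposal sets and de-duplicate them with IoU-based non-maximum suppression. The message-passing and actionness components of the paper are not reproduced, and the box format and threshold below are assumptions.

```python
# Simplified sketch: pooling region proposals from the RGB and Flow streams and
# de-duplicating with IoU-based non-maximum suppression. Boxes are (x1, y1, x2, y2).
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def merge_stream_proposals(rgb_boxes, rgb_scores, flow_boxes, flow_scores, iou_thr=0.7):
    boxes = np.vstack([rgb_boxes, flow_boxes])
    scores = np.concatenate([rgb_scores, flow_scores])
    keep = []
    order = np.argsort(-scores)                       # highest-scoring proposals first
    while order.size:
        i = order[0]
        keep.append(i)
        order = order[1:][iou(boxes[i], boxes[order[1:]]) < iou_thr]  # drop near-duplicates
    return boxes[keep], scores[keep]
```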

Embedded Constrained Feature Construction for High-Energy Physics Data Classification

Title Embedded Constrained Feature Construction for High-Energy Physics Data Classification
Authors Noëlie Cherrier, Maxime Defurne, Jean-Philippe Poli, Franck Sabatié
Abstract Before any publication, data analysis of high-energy physics experiments must be validated. This validation is granted only if a perfect understanding of the data and the analysis process is demonstrated. Therefore, physicists prefer using transparent machine learning algorithms, whose performance relies heavily on the suitability of the provided input features. To transform the feature space, feature construction aims at automatically generating new relevant features. Whereas most previous works in this area perform feature construction prior to model training, we propose here a general framework to embed a feature construction technique, adapted to the constraints of high-energy physics, in the induction of tree-based models. Experiments on two high-energy physics datasets confirm that a significant gain is obtained on the classification scores, while limiting the number of built features. Since the features are built to be interpretable, the whole model is transparent and readable.
Tasks
Published 2019-12-17
URL https://arxiv.org/abs/1912.07999v1
PDF https://arxiv.org/pdf/1912.07999v1.pdf
PWC https://paperswithcode.com/paper/embedded-constrained-feature-construction-for
Repo
Framework
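To make the feature-construction idea concrete, here is a toy, hypothetical sketch: candidate features are simple arithmetic combinations of input variables, each scored by how well a shallow decision tree separates the classes using it alone. The paper embeds a constrained construction grammar inside tree induction; the combination rules and scoring below are illustrative assumptions only.

```python
# Toy sketch of feature construction: build simple arithmetic combinations of variables
# and keep the one that best separates the classes with a decision stump. Not the paper's
# constrained grammar; the combination rules here are made up for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def candidate_features(X):
    """Yield (name, column) pairs for a few interpretable combinations."""
    n = X.shape[1]
    for i in range(n):
        for j in range(i + 1, n):
            yield f"x{i}*x{j}", X[:, i] * X[:, j]
            yield f"x{i}/x{j}", X[:, i] / (X[:, j] + 1e-9)

def best_constructed_feature(X, y):
    stump = DecisionTreeClassifier(max_depth=1)
    scored = [(cross_val_score(stump, col.reshape(-1, 1), y, cv=3).mean(), name)
              for name, col in candidate_features(X)]
    return max(scored)                                # (cross-validated score, feature name)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)               # target depends on a product of features
print(best_constructed_feature(X, y))                  # expected to prefer "x0*x1"
```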

Predictive Coding as Stimulus Avoidance in Spiking Neural Networks

Title Predictive Coding as Stimulus Avoidance in Spiking Neural Networks
Authors Atsushi Masumori, Lana Sinapayen, Takashi Ikegami
Abstract Predictive coding can be regarded as a function which reduces the error between an input signal and a top-down prediction. If reducing the error is equivalent to reducing the influence of stimuli from the environment, predictive coding can be regarded as stimulation avoidance by prediction. Our previous studies showed that action and selection for stimulation avoidance emerge in spiking neural networks through spike-timing dependent plasticity (STDP). In this study, we demonstrate that spiking neural networks with random structure spontaneously learn to predict temporal sequences of stimuli based solely on STDP.
Tasks
Published 2019-11-21
URL https://arxiv.org/abs/1911.09230v1
PDF https://arxiv.org/pdf/1911.09230v1.pdf
PWC https://paperswithcode.com/paper/predictive-coding-as-stimulus-avoidance-in
Repo
Framework
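The learning mechanism referenced above, spike-timing dependent plasticity, strengthens a synapse when a presynaptic spike precedes the postsynaptic spike and weakens it otherwise. Below is a minimal sketch of the standard pair-based exponential STDP update; the parameter values are illustrative and not taken from the paper.

```python
# Minimal sketch of a pair-based STDP weight update with exponential windows: weights are
# potentiated when the presynaptic spike precedes the postsynaptic spike and depressed
# otherwise. Parameter values are illustrative only.
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0,
                w_min=0.0, w_max=1.0):
    """Update synaptic weight w given the latest pre/post spike times (in ms)."""
    dt = t_post - t_pre
    if dt >= 0:                               # pre before post -> potentiation
        w += a_plus * np.exp(-dt / tau)
    else:                                     # post before pre -> depression
        w -= a_minus * np.exp(dt / tau)
    return float(np.clip(w, w_min, w_max))

print(stdp_update(0.5, t_pre=10.0, t_post=15.0))   # strengthened
print(stdp_update(0.5, t_pre=15.0, t_post=10.0))   # weakened
```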

A New Approach for Distributed Hypothesis Testing with Extensions to Byzantine-Resilience

Title A New Approach for Distributed Hypothesis Testing with Extensions to Byzantine-Resilience
Authors Aritra Mitra, John A. Richards, Shreyas Sundaram
Abstract We study a setting where a group of agents, each receiving partially informative private observations, seek to collaboratively learn the true state (among a set of hypotheses) that explains their joint observation profiles over time. To solve this problem, we propose a distributed learning rule that differs fundamentally from existing approaches in the sense that it does not employ any form of “belief-averaging”. Specifically, every agent maintains a local belief (on each hypothesis) that is updated in a Bayesian manner without any network influence, and an actual belief that is updated (up to normalization) as the minimum of its own local belief and the actual beliefs of its neighbors. Under minimal requirements on the signal structures of the agents and the underlying communication graph, we establish consistency of the proposed belief update rule, i.e., we show that the actual beliefs of the agents asymptotically concentrate on the true state almost surely. As one of the key benefits of our approach, we show that our learning rule can be extended to scenarios that capture misbehavior on the part of certain agents in the network, modeled via the Byzantine adversary model. In particular, we prove that each non-adversarial agent can asymptotically learn the true state of the world almost surely, under appropriate conditions on the observation model and the network topology.
Tasks
Published 2019-03-14
URL http://arxiv.org/abs/1903.05817v1
PDF http://arxiv.org/pdf/1903.05817v1.pdf
PWC https://paperswithcode.com/paper/a-new-approach-for-distributed-hypothesis
Repo
Framework
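The update rule is described explicitly in the abstract: a private Bayesian update of the local belief, followed by setting the actual belief to the normalized elementwise minimum of the agent's local belief and its neighbors' actual beliefs. A minimal sketch of one such update step, with toy likelihood values, follows.

```python
# Sketch of the described belief update: a Bayesian update of the agent's private (local)
# belief, then an actual belief set to the normalized elementwise minimum of the local
# belief and the neighbors' actual beliefs. Likelihood values here are toy numbers.
import numpy as np

def local_bayes_update(local_belief, likelihoods):
    """Bayesian update of the agent's private belief over the hypotheses."""
    post = local_belief * likelihoods
    return post / post.sum()

def min_rule_update(local_belief, neighbor_actual_beliefs):
    """Actual belief: normalized min of own local belief and neighbors' actual beliefs."""
    stacked = np.vstack([local_belief] + list(neighbor_actual_beliefs))
    fused = stacked.min(axis=0)
    return fused / fused.sum()

local = local_bayes_update(np.array([0.5, 0.5]), likelihoods=np.array([0.8, 0.3]))
actual = min_rule_update(local, [np.array([0.6, 0.4]), np.array([0.7, 0.3])])
```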

Deep Neural Network Approximation for Custom Hardware: Where We’ve Been, Where We’re Going

Title Deep Neural Network Approximation for Custom Hardware: Where We’ve Been, Where We’re Going
Authors Erwei Wang, James J. Davis, Ruizhe Zhao, Ho-Cheung Ng, Xinyu Niu, Wayne Luk, Peter Y. K. Cheung, George A. Constantinides
Abstract Deep neural networks have proven to be particularly effective in visual and audio recognition tasks. Existing models tend to be computationally expensive and memory intensive, however, and so methods for hardware-oriented approximation have become a hot topic. Research has shown that custom hardware-based neural network accelerators can surpass their general-purpose processor equivalents in terms of both throughput and energy efficiency. Application-tailored accelerators, when co-designed with approximation-based network training methods, transform large, dense and computationally expensive networks into small, sparse and hardware-efficient alternatives, increasing the feasibility of network deployment. In this article, we provide a comprehensive evaluation of approximation methods for high-performance network inference along with in-depth discussion of their effectiveness for custom hardware implementation. We also include proposals for future research based on a thorough analysis of current trends. This article represents the first survey providing detailed comparisons of custom hardware accelerators featuring approximation for both convolutional and recurrent neural networks, through which we hope to inspire exciting new developments in the field.
Tasks
Published 2019-01-21
URL https://arxiv.org/abs/1901.06955v4
PDF https://arxiv.org/pdf/1901.06955v4.pdf
PWC https://paperswithcode.com/paper/deep-neural-network-approximation-for-custom
Repo
Framework

HR-SAR-Net: A Deep Neural Network for Urban Scene Segmentation from High-Resolution SAR Data

Title HR-SAR-Net: A Deep Neural Network for Urban Scene Segmentation from High-Resolution SAR Data
Authors Xiaying Wang, Lukas Cavigelli, Manuel Eggimann, Michele Magno, Luca Benini
Abstract Synthetic aperture radar (SAR) data is becoming increasingly available to a wide range of users through commercial service providers, with resolutions reaching 0.5 m/px. Segmenting SAR data still requires skilled personnel, limiting the potential for large-scale use. We show that it is possible to automatically and reliably perform urban scene segmentation from next-gen resolution SAR data (0.15 m/px) using deep neural networks (DNNs), achieving a pixel accuracy of 95.19% and a mean IoU of 74.67% with data collected over a region of merely 2.2 km$^2$. The presented DNN is not only effective, but is very small with only 63k parameters and computationally simple enough to achieve a throughput of around 500 Mpx/s using a single GPU. We further identify that additional SAR receive antennas and data from multiple flights massively improve the segmentation accuracy. We describe a procedure for generating a high-quality segmentation ground truth from multiple inaccurate building and road annotations, which has been crucial to achieving these segmentation results.
Tasks Scene Segmentation
Published 2019-12-10
URL https://arxiv.org/abs/1912.04441v1
PDF https://arxiv.org/pdf/1912.04441v1.pdf
PWC https://paperswithcode.com/paper/hr-sar-net-a-deep-neural-network-for-urban
Repo
Framework
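For reference, the two reported metrics, pixel accuracy and mean IoU, have standard definitions in terms of the confusion matrix between predicted and ground-truth labels. The sketch below is generic evaluation code, not code from the paper.

```python
# Standard definitions of the two reported segmentation metrics, pixel accuracy and
# mean IoU, computed from a confusion matrix. Generic evaluation code, not the paper's.
import numpy as np

def segmentation_metrics(pred, target, num_classes):
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(target.ravel(), pred.ravel()):
        cm[t, p] += 1
    pixel_acc = np.diag(cm).sum() / cm.sum()
    with np.errstate(invalid="ignore", divide="ignore"):
        iou = np.diag(cm) / (cm.sum(0) + cm.sum(1) - np.diag(cm))  # per-class IoU
    return pixel_acc, np.nanmean(iou)                              # ignore absent classes

pred = np.random.default_rng(0).integers(0, 3, size=(64, 64))
acc, miou = segmentation_metrics(pred, pred, num_classes=3)         # perfect prediction -> 1.0, 1.0
```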

Practical Solutions for Machine Learning Safety in Autonomous Vehicles

Title Practical Solutions for Machine Learning Safety in Autonomous Vehicles
Authors Sina Mohseni, Mandar Pitale, Vasu Singh, Zhangyang Wang
Abstract Autonomous vehicles rely on machine learning to solve challenging tasks in perception and motion planning. However, automotive software safety standards have not fully evolved to address the challenges of machine learning safety such as interpretability, verification, and performance limitations. In this paper, we review and organize practical machine learning safety techniques that can complement engineering safety for machine learning based software in autonomous vehicles. Our organization maps safety strategies to state-of-the-art machine learning techniques in order to enhance dependability and safety of machine learning algorithms. We also discuss security limitations and user experience aspects of machine learning components in autonomous vehicles.
Tasks Autonomous Vehicles, Motion Planning
Published 2019-12-20
URL https://arxiv.org/abs/1912.09630v1
PDF https://arxiv.org/pdf/1912.09630v1.pdf
PWC https://paperswithcode.com/paper/practical-solutions-for-machine-learning
Repo
Framework

A Configuration-Space Decomposition Scheme for Learning-based Collision Checking

Title A Configuration-Space Decomposition Scheme for Learning-based Collision Checking
Authors Yiheng Han, Wang Zhao, Jia Pan, Zipeng Ye, Ran Yi, Yong-Jin Liu
Abstract Motion planning for robots with high degrees of freedom (DOFs) is an important problem in robotics, with sampling-based methods in the configuration space C as one popular solution. Recently, machine learning methods have been introduced into sampling-based motion planning, training a classifier to distinguish the collision-free subspace from the in-collision subspace in C. In this paper, we propose a novel configuration space decomposition method and show two nice properties resulting from this decomposition. Using these two properties, we build a composite classifier that works compatibly with previous machine learning methods by using them as the elementary classifiers. Experimental results are presented, showing that our composite classifier outperforms state-of-the-art single-classifier methods by a large margin. A real application of motion planning in a multi-robot system for plant phenotyping, using three UR5 robotic arms, is also presented.
Tasks Motion Planning
Published 2019-11-17
URL https://arxiv.org/abs/1911.08581v1
PDF https://arxiv.org/pdf/1911.08581v1.pdf
PWC https://paperswithcode.com/paper/a-configuration-space-decomposition-scheme
Repo
Framework
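A hypothetical sketch of the composite-classifier idea: partition the configuration space into cells, train an elementary collision classifier per cell, and route each query to its cell's classifier. The naive partition by the first joint's value and the choice of elementary classifier below are assumptions for illustration; the paper's actual decomposition and its two properties are not reproduced.

```python
# Hypothetical sketch of a composite collision checker: the configuration space is
# partitioned into cells (here, naively by the first joint's value), an elementary
# classifier is trained per cell, and each query is routed to its cell's classifier.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

class CompositeCollisionChecker:
    def __init__(self, bins):
        self.bins = bins                      # cell boundaries along the first joint (an assumption)
        self.models = {}

    def fit(self, q, in_collision):
        cells = np.digitize(q[:, 0], self.bins)
        for c in np.unique(cells):
            self.models[c] = KNeighborsClassifier(n_neighbors=5).fit(
                q[cells == c], in_collision[cells == c])
        return self

    def predict(self, q):
        cells = np.digitize(q[:, 0], self.bins)
        out = np.zeros(len(q), dtype=int)
        for c in np.unique(cells):
            out[cells == c] = self.models[c].predict(q[cells == c])  # assumes the cell was seen in training
        return out

rng = np.random.default_rng(0)
q_train = rng.uniform(-np.pi, np.pi, size=(500, 6))             # 6-DOF configurations
labels = (np.linalg.norm(q_train, axis=1) > 4.0).astype(int)     # toy collision labels
checker = CompositeCollisionChecker(bins=np.linspace(-np.pi, np.pi, 5)[1:-1]).fit(q_train, labels)
pred = checker.predict(rng.uniform(-np.pi, np.pi, size=(10, 6)))
```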

A Topological “Reading” Lesson: Classification of MNIST using TDA

Title A Topological “Reading” Lesson: Classification of MNIST using TDA
Authors Adélie Garin, Guillaume Tauzin
Abstract We present a way to use Topological Data Analysis (TDA) for machine learning tasks on grayscale images. We apply persistent homology to generate a wide range of topological features using a point cloud obtained from an image, its natural grayscale filtration, and different filtrations defined on the binarized image. We show that this topological machine learning pipeline can be used as a highly relevant dimensionality reduction by applying it to the MNIST digits dataset. We conduct feature selection and study the correlations among the selected features, while providing an intuitive interpretation of their importance, which is relevant to both machine learning and TDA. Finally, we show that we can classify digit images while reducing the size of the feature set by a factor of 5 compared to the grayscale pixel-value features, while maintaining similar accuracy.
Tasks Dimensionality Reduction, Feature Selection, Topological Data Analysis
Published 2019-10-18
URL https://arxiv.org/abs/1910.08345v2
PDF https://arxiv.org/pdf/1910.08345v2.pdf
PWC https://paperswithcode.com/paper/a-topological-reading-lesson-classification
Repo
Framework
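A drastically simplified stand-in for the grayscale-filtration idea conveys the flavour of the pipeline: sweep a threshold over the image and count connected foreground components at each level, yielding a short topological feature vector (a Betti-0 curve). The paper uses full persistent homology over several filtrations; the sketch below is only an approximation of that idea.

```python
# Drastically simplified stand-in for the paper's pipeline: instead of full persistent
# homology, count connected components (Betti-0) of the thresholded image at a sweep of
# grayscale levels, giving a small topological feature vector per digit image.
import numpy as np
from scipy import ndimage

def betti0_curve(image, thresholds=np.linspace(0.1, 0.9, 8)):
    """Number of connected foreground components at each grayscale threshold."""
    features = []
    for t in thresholds:
        _, n_components = ndimage.label(image >= t)   # label returns (labels, count)
        features.append(n_components)
    return np.array(features)

# Usage with any 28x28 grayscale digit scaled to [0, 1]:
# feats = betti0_curve(x.reshape(28, 28))
# These 8 features can then feed a standard classifier in place of 784 pixel values.
```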

Multivariate extensions of isotonic regression and total variation denoising via entire monotonicity and Hardy-Krause variation

Title Multivariate extensions of isotonic regression and total variation denoising via entire monotonicity and Hardy-Krause variation
Authors Billy Fang, Adityanand Guntuboyina, Bodhisattva Sen
Abstract We consider the problem of nonparametric regression when the covariate is $d$-dimensional, where $d \geq 1$. In this paper we introduce and study two nonparametric least squares estimators (LSEs) in this setting: the entirely monotonic LSE and the constrained Hardy-Krause variation LSE. We show that these two LSEs are natural generalizations of univariate isotonic regression and univariate total variation denoising, respectively, to multiple dimensions. We discuss the characterization and computation of these two LSEs obtained from $n$ data points. We provide a detailed study of their risk properties under the squared error loss and a fixed uniform lattice design. We show that the finite sample risk of these LSEs is always bounded from above by $n^{-2/3}$ modulo logarithmic factors depending on $d$; thus these nonparametric LSEs avoid the curse of dimensionality to some extent. We also prove nearly matching minimax lower bounds. Further, we illustrate that these LSEs are particularly useful in fitting rectangular piecewise constant functions. Specifically, we show that the risk of the entirely monotonic LSE is almost parametric (at most $1/n$ up to logarithmic factors) when the true function is well-approximable by a rectangular piecewise constant entirely monotone function with not too many constant pieces. A similar result is also shown to hold for the constrained Hardy-Krause variation LSE for a simple subclass of rectangular piecewise constant functions. We believe that the proposed LSEs yield a novel approach to estimating multivariate functions using convex optimization that avoids the curse of dimensionality to some extent.
Tasks Denoising
Published 2019-03-04
URL https://arxiv.org/abs/1903.01395v2
PDF https://arxiv.org/pdf/1903.01395v2.pdf
PWC https://paperswithcode.com/paper/multivariate-extensions-of-isotonic
Repo
Framework
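For intuition, the univariate ($d = 1$) special case of the entirely monotonic LSE is ordinary isotonic regression, which can be fitted directly with scikit-learn. The multivariate estimators studied in the paper require their own convex programs and are not shown here.

```python
# The univariate special case for intuition: isotonic regression (the d = 1 case of the
# entirely monotonic LSE) fitted with scikit-learn on a noisy piecewise constant signal.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.floor(3 * x) + rng.normal(scale=0.3, size=100)   # piecewise constant signal + noise

fit = IsotonicRegression().fit(x, y)
y_hat = fit.predict(x)                                   # monotone least-squares fit
```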

FA-Harris: A Fast and Asynchronous Corner Detector for Event Cameras

Title FA-Harris: A Fast and Asynchronous Corner Detector for Event Cameras
Authors Ruoxiang Li, Dianxi Shi, Yongjun Zhang, Kaiyue Li, Ruihao Li
Abstract Recently, the emerging bio-inspired event cameras have demonstrated potential for a wide range of robotic applications in dynamic environments. In this paper, we propose a novel fast and asynchronous event-based corner detection method called FA-Harris. FA-Harris consists of several components, including an event filter, a Global Surface of Active Events (G-SAE) maintaining unit, a corner candidate selecting unit, and a corner candidate refining unit. The proposed G-SAE maintenance algorithm and corner candidate selection algorithm greatly enhance the real-time performance of corner detection, while the corner candidate refinement algorithm maintains accuracy by using an improved event-based Harris detector. Additionally, FA-Harris does not require artificially synthesized event frames and can operate on asynchronous events directly. We implement the proposed method in C++ and evaluate it on public Event Camera Datasets. The results show that our method achieves an approximately 8x speed-up compared with the previously reported event-based Harris detector, with no compromise in accuracy.
Tasks
Published 2019-06-26
URL https://arxiv.org/abs/1906.10925v4
PDF https://arxiv.org/pdf/1906.10925v4.pdf
PWC https://paperswithcode.com/paper/fa-harris-a-fast-and-asynchronous-corner
Repo
Framework
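The central data structure in event-based corner detection is the Surface of Active Events: a per-pixel map of the most recent event timestamp, updated asynchronously as events arrive. The minimal sketch below maintains such a map and reads out a local patch around an incoming event; the filtering, candidate selection, and Harris refinement stages of FA-Harris are not reproduced, and the class and method names are assumptions.

```python
# Minimal sketch of a Surface of Active Events (SAE): a per-pixel map holding the
# timestamp of the most recent event, updated asynchronously as events arrive.
import numpy as np

class SurfaceOfActiveEvents:
    def __init__(self, height, width):
        self.sae = np.full((height, width), -np.inf)    # no events seen yet

    def update(self, x, y, t):
        """Register an event at pixel (x, y) with timestamp t."""
        self.sae[y, x] = t

    def local_patch(self, x, y, radius=4):
        """Neighbourhood of latest timestamps around (x, y), e.g. for corner scoring."""
        return self.sae[max(y - radius, 0): y + radius + 1,
                        max(x - radius, 0): x + radius + 1]

sae = SurfaceOfActiveEvents(180, 240)
sae.update(x=120, y=90, t=0.0031)
patch = sae.local_patch(120, 90)
```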

Deep Autoencoders with Value-at-Risk Thresholding for Unsupervised Anomaly Detection

Title Deep Autoencoders with Value-at-Risk Thresholding for Unsupervised Anomaly Detection
Authors Albert Akhriev, Jakub Marecek
Abstract Many real-world monitoring and surveillance applications require non-trivial anomaly detection to be run in the streaming model. We consider an incremental-learning approach wherein a deep-autoencoding (DAE) model of what is normal is trained and used to detect anomalies at the same time. To detect anomalies, we utilise a novel thresholding mechanism based on value at risk (VaR). We compare the resulting convolutional neural network (CNN) against a number of subspace methods, and present results on changedetection.net.
Tasks Anomaly Detection, Unsupervised Anomaly Detection
Published 2019-12-09
URL https://arxiv.org/abs/1912.04418v1
PDF https://arxiv.org/pdf/1912.04418v1.pdf
PWC https://paperswithcode.com/paper/deep-autoencoders-with-value-at-risk
Repo
Framework
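Value-at-risk thresholding, as described above, amounts to placing the anomaly threshold at an upper quantile of the distribution of reconstruction errors. The sketch below shows only this thresholding step with a plain empirical quantile; the autoencoder itself and the paper's exact VaR estimator are not reproduced.

```python
# Sketch of the thresholding step only: treat the value at risk at level alpha as the
# empirical (1 - alpha) quantile of recent reconstruction errors and flag samples whose
# error exceeds it. The autoencoder and the paper's exact VaR estimator are not shown.
import numpy as np

def var_threshold(errors, alpha=0.05):
    """Empirical value at risk: the (1 - alpha) quantile of the error distribution."""
    return np.quantile(errors, 1.0 - alpha)

def detect_anomalies(new_errors, recent_errors, alpha=0.05):
    thr = var_threshold(recent_errors, alpha)
    return new_errors > thr                               # boolean anomaly mask

rng = np.random.default_rng(0)
recent = rng.exponential(scale=1.0, size=10_000)          # stand-in for reconstruction errors
mask = detect_anomalies(rng.exponential(scale=1.0, size=100), recent)
```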

Fast Spatially-Varying Indoor Lighting Estimation

Title Fast Spatially-Varying Indoor Lighting Estimation
Authors Mathieu Garon, Kalyan Sunkavalli, Sunil Hadap, Nathan Carr, Jean-François Lalonde
Abstract We propose a real-time method to estimate spatially-varying indoor lighting from a single RGB image. Given an image and a 2D location in that image, our CNN estimates a 5th-order spherical harmonic representation of the lighting at the given location in less than 20ms on a laptop mobile graphics card. While existing approaches estimate a single, global lighting representation or require depth as input, our method reasons about local lighting without requiring any geometry information. We demonstrate, through quantitative experiments including a user study, that our results achieve lower lighting estimation errors and are preferred by users over the state-of-the-art. Our approach can be used directly for augmented reality applications, where a virtual object is relit realistically at any position in the scene in real-time.
Tasks
Published 2019-06-10
URL https://arxiv.org/abs/1906.03799v1
PDF https://arxiv.org/pdf/1906.03799v1.pdf
PWC https://paperswithcode.com/paper/fast-spatially-varying-indoor-lighting-1
Repo
Framework
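A 5th-order spherical harmonic lighting representation stores $(5+1)^2 = 36$ coefficients per color channel, from which radiance in any direction can be reconstructed. The sketch below evaluates such a reconstruction using real spherical harmonics built from scipy's complex ones; the coefficients are random stand-ins for what the paper's CNN would predict, and the angle conventions follow scipy rather than the paper.

```python
# Illustration of what a 5th-order spherical-harmonic lighting representation encodes:
# 36 coefficients per channel, from which radiance in any direction can be reconstructed.
# The coefficients here are random stand-ins; a trained network would predict them.
import numpy as np
from scipy.special import sph_harm

def real_sh(l, m, theta, phi):
    """Real spherical harmonic from scipy's complex Y_l^m (theta: azimuth, phi: polar)."""
    if m > 0:
        return np.sqrt(2) * (-1) ** m * sph_harm(m, l, theta, phi).real
    if m < 0:
        return np.sqrt(2) * (-1) ** m * sph_harm(-m, l, theta, phi).imag
    return sph_harm(0, l, theta, phi).real

def radiance(coeffs, theta, phi, order=5):
    """Reconstruct radiance in direction (theta, phi) from SH coefficients."""
    idx, value = 0, 0.0
    for l in range(order + 1):
        for m in range(-l, l + 1):
            value += coeffs[idx] * real_sh(l, m, theta, phi)
            idx += 1
    return value

coeffs = np.random.default_rng(0).normal(size=36)        # one channel's predicted coefficients
L = radiance(coeffs, theta=0.3, phi=1.2)
```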