Paper Group ANR 350
Evaluating the Representational Hub of Language and Vision Models
Title | Evaluating the Representational Hub of Language and Vision Models |
Authors | Ravi Shekhar, Ece Takmaz, Raquel Fernández, Raffaella Bernardi |
Abstract | The multimodal models used in the emerging field at the intersection of computational linguistics and computer vision implement the bottom-up processing of the ‘Hub and Spoke’ architecture proposed in cognitive science to represent how the brain processes and combines multi-sensory inputs. In particular, the Hub is implemented as a neural network encoder. We investigate the effect on this encoder of various vision-and-language tasks proposed in the literature: visual question answering, visual reference resolution, and visually grounded dialogue. To measure the quality of the representations learned by the encoder, we use two kinds of analyses. First, we evaluate the encoder pre-trained on the different vision-and-language tasks on an existing diagnostic task designed to assess multimodal semantic understanding. Second, we carry out a battery of analyses aimed at studying how the encoder merges and exploits the two modalities. |
Tasks | Question Answering, Visual Question Answering |
Published | 2019-04-12 |
URL | http://arxiv.org/abs/1904.06038v1 |
PDF | http://arxiv.org/pdf/1904.06038v1.pdf |
PWC | https://paperswithcode.com/paper/evaluating-the-representational-hub-of |
Repo | |
Framework | |
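As a rough illustration of the kind of encoder under evaluation, here is a minimal multimodal "hub" that fuses precomputed image features with an LSTM question encoding into a single representation that task heads (e.g. VQA) could consume. The layer sizes, the fusion choice, and all names are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of a multimodal "hub" encoder: an LSTM encodes the question,
# precomputed image features are projected, and the two are fused into one
# hidden representation for downstream task heads.  All sizes are illustrative.
import torch
import torch.nn as nn

class HubEncoder(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=512, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, question_tokens, image_features):
        # question_tokens: (batch, seq_len) token ids; image_features: (batch, img_dim)
        _, (h_n, _) = self.lstm(self.embed(question_tokens))
        q = h_n[-1]                                    # final LSTM hidden state
        v = torch.relu(self.img_proj(image_features))  # projected visual features
        return torch.tanh(self.fuse(torch.cat([q, v], dim=-1)))  # shared "hub" code

encoder = HubEncoder()
hub = encoder(torch.randint(0, 10000, (2, 12)), torch.randn(2, 2048))
print(hub.shape)  # torch.Size([2, 512])
```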
Gradient Flows and Accelerated Proximal Splitting Methods
Title | Gradient Flows and Accelerated Proximal Splitting Methods |
Authors | Guilherme França, Daniel P. Robinson, René Vidal |
Abstract | Proximal-based algorithms are well-suited to nonsmooth optimization problems with important applications in signal processing, control theory, statistics and machine learning. There are essentially four basic types of proximal algorithms based on fixed-point iteration currently known: forward-backward splitting, forward-backward-forward or Tseng splitting, Douglas-Rachford, and the very recent Davis-Yin three-operator splitting. In addition, the alternating direction method of multipliers (ADMM) is also closely related. In this paper, we show that all these different methods can be derived from the gradient flow by using splitting methods for ordinary differential equations. Furthermore, applying a similar discretization scheme to a particular second-order differential equation results in accelerated variants of the respective algorithm, which can be of Nesterov or heavy-ball type; we treat both types simultaneously. Many of the optimization algorithms we derive are new. For instance, we propose accelerated variants of Davis-Yin and two extensions of ADMM together with their accelerated variants. Interestingly, we show that (accelerated) ADMM corresponds to a rebalanced splitting, which is a recent technique designed to preserve steady states of the differential equation. Overall, our results strengthen the connections between optimization and continuous dynamical systems and offer a more unified perspective on accelerated methods. |
Tasks | |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.00865v2 |
PDF | https://arxiv.org/pdf/1908.00865v2.pdf |
PWC | https://paperswithcode.com/paper/gradient-flows-and-accelerated-proximal |
Repo | |
Framework | |
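For readers unfamiliar with the base algorithm, the sketch below implements forward-backward splitting for the lasso, read as an explicit gradient step on the smooth term followed by a proximal step on the nonsmooth term, which is the simplest instance of the splitting-of-the-gradient-flow viewpoint. Problem sizes and the step size are illustrative; this is not the paper's derivation, only the starting point it generalizes.

```python
# Forward-backward splitting for min_x 0.5*||Ax - b||^2 + lam*||x||_1, viewed as an
# explicit (gradient) step on the smooth part followed by an implicit (proximal)
# step on the nonsmooth part, i.e. a splitting discretization of the gradient flow
# x'(t) in -grad f(x) - dg(x).  Sizes and step length are illustrative.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def forward_backward(A, b, lam, step, iters=500):
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                          # forward step on the smooth term
        x = soft_threshold(x - step * grad, step * lam)   # backward (prox) step on ||.||_1
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(50)
step = 1.0 / np.linalg.norm(A, 2) ** 2                    # 1/L, L the Lipschitz constant of grad f
x_hat = forward_backward(A, b, lam=0.1, step=step)
print(np.round(x_hat[:8], 3))                             # first entries recover the sparse signal
```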
Improving Action Localization by Progressive Cross-stream Cooperation
Title | Improving Action Localization by Progressive Cross-stream Cooperation |
Authors | Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu |
Abstract | Spatio-temporal action localization consists of three levels of tasks: spatial localization, action classification, and temporal segmentation. In this work, we propose a new Progressive Cross-stream Cooperation (PCSC) framework that uses both region proposals and features from one stream (i.e., Flow/RGB) to help the other stream (i.e., RGB/Flow) iteratively improve action localization results and generate better bounding boxes. Specifically, we first generate a larger set of region proposals by combining the latest region proposals from both streams, from which we can readily obtain a larger set of labelled training samples to help learn better action detection models. Second, we propose a new message passing approach that passes information from one stream to the other in order to learn better representations, which also leads to better action detection models. As a result, our iterative framework progressively improves action localization results at the frame level. To improve action localization results at the video level, we additionally propose a new strategy to train class-specific actionness detectors for better temporal segmentation, which can be readily learnt by focusing on “confusing” samples from the same action class. Comprehensive experiments on two benchmark datasets, UCF-101-24 and J-HMDB, demonstrate the effectiveness of our newly proposed approaches for spatio-temporal action localization in realistic scenarios. |
Tasks | Action Classification, Action Detection, Action Localization, Spatio-Temporal Action Localization, Temporal Action Localization |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11575v1 |
PDF | https://arxiv.org/pdf/1905.11575v1.pdf |
PWC | https://paperswithcode.com/paper/improving-action-localization-by-progressive-1 |
Repo | |
Framework | |
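Below is a crude sketch of the first PCSC step described in the abstract: pooling the latest region proposals from the RGB and Flow streams into one candidate set and de-duplicating them with non-maximum suppression. Box coordinates and the IoU threshold are toy assumptions, and the message-passing refinement and actionness detectors are not shown.

```python
# Pool region proposals from the RGB and Flow streams into one larger candidate
# set and de-duplicate with non-maximum suppression.  Boxes are [x1, y1, x2, y2].
import numpy as np

def iou(box, boxes):
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter + 1e-9)

def nms(boxes, scores, thr=0.5):
    order, keep = np.argsort(scores)[::-1], []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < thr]
    return keep

rgb_boxes, rgb_scores = np.array([[10, 10, 50, 50], [12, 12, 52, 52]]), np.array([0.9, 0.6])
flow_boxes, flow_scores = np.array([[11, 9, 51, 49], [80, 80, 120, 120]]), np.array([0.8, 0.7])
boxes = np.vstack([rgb_boxes, flow_boxes]).astype(float)
scores = np.concatenate([rgb_scores, flow_scores])
print(boxes[nms(boxes, scores)])  # merged cross-stream proposal set
```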
Embedded Constrained Feature Construction for High-Energy Physics Data Classification
Title | Embedded Constrained Feature Construction for High-Energy Physics Data Classification |
Authors | Noëlie Cherrier, Maxime Defurne, Jean-Philippe Poli, Franck Sabatié |
Abstract | Before any publication, data analysis of high-energy physics experiments must be validated. This validation is granted only if a perfect understanding of the data and the analysis process is demonstrated. Therefore, physicists prefer using transparent machine learning algorithms whose performance relies heavily on the suitability of the provided input features. To transform the feature space, feature construction aims at automatically generating new relevant features. Whereas most previous work in this area performs feature construction prior to model training, we propose here a general framework to embed a feature construction technique adapted to the constraints of high-energy physics in the induction of tree-based models. Experiments on two high-energy physics datasets confirm that a significant gain is obtained on the classification scores, while limiting the number of built features. Since the features are built to be interpretable, the whole model is transparent and readable. |
Tasks | |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.07999v1 |
PDF | https://arxiv.org/pdf/1912.07999v1.pdf |
PWC | https://paperswithcode.com/paper/embedded-constrained-feature-construction-for |
Repo | |
Framework | |
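A simplified sketch of the general idea: construct interpretable ratio features from the raw variables and let a shallow decision tree choose among original and constructed features. The paper embeds construction inside tree induction under physics constraints; the plain pre-construction pass below does not reproduce that and is purely illustrative.

```python
# Build candidate ratio features from raw "physics" variables, then let a shallow
# decision tree select among originals and constructed features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3)) + 2.0          # three raw variables
y = (X[:, 0] / X[:, 1] > 1.1).astype(int)        # label depends on a ratio

names = ["v0", "v1", "v2"]
feats, feat_names = [X], list(names)
for i in range(3):
    for j in range(3):
        if i != j:
            feats.append((X[:, i] / X[:, j])[:, None])
            feat_names.append(f"{names[i]}/{names[j]}")
X_aug = np.hstack(feats)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_aug, y)
best = int(np.argmax(tree.feature_importances_))
print("most useful feature:", feat_names[best])   # expected to be a constructed ratio
```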
Predictive Coding as Stimulus Avoidance in Spiking Neural Networks
Title | Predictive Coding as Stimulus Avoidance in Spiking Neural Networks |
Authors | Atsushi Masumori, Lana Sinapayen, Takashi Ikegami |
Abstract | Predictive coding can be regarded as a function which reduces the error between an input signal and a top-down prediction. If reducing the error is equivalent to reducing the influence of stimuli from the environment, predictive coding can be regarded as stimulation avoidance by prediction. Our previous studies showed that action and selection for stimulation avoidance emerge in spiking neural networks through spike-timing dependent plasticity (STDP). In this study, we demonstrate that spiking neural networks with random structure spontaneously learn to predict temporal sequences of stimuli based solely on STDP. |
Tasks | |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09230v1 |
PDF | https://arxiv.org/pdf/1911.09230v1.pdf |
PWC | https://paperswithcode.com/paper/predictive-coding-as-stimulus-avoidance-in |
Repo | |
Framework | |
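For reference, a minimal pair-based STDP rule: a presynaptic spike that precedes a postsynaptic spike potentiates the synapse, the reverse ordering depresses it, with an exponential dependence on the spike-time difference. Time constants and learning rates below are illustrative and not those of the paper's network.

```python
# Pair-based STDP: potentiation when pre precedes post, depression otherwise,
# with exponential dependence on the spike-time difference.
import numpy as np

def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for delta_t = t_post - t_pre (in ms)."""
    if delta_t >= 0:                                   # pre before post -> potentiation
        return a_plus * np.exp(-delta_t / tau_plus)
    return -a_minus * np.exp(delta_t / tau_minus)      # post before pre -> depression

w = 0.5
for t_pre, t_post in [(10.0, 15.0), (40.0, 38.0), (60.0, 61.0)]:
    w = float(np.clip(w + stdp_dw(t_post - t_pre), 0.0, 1.0))
    print(f"dt={t_post - t_pre:+.1f} ms -> w={w:.4f}")
```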
A New Approach for Distributed Hypothesis Testing with Extensions to Byzantine-Resilience
Title | A New Approach for Distributed Hypothesis Testing with Extensions to Byzantine-Resilience |
Authors | Aritra Mitra, John A. Richards, Shreyas Sundaram |
Abstract | We study a setting where a group of agents, each receiving partially informative private observations, seek to collaboratively learn the true state (among a set of hypotheses) that explains their joint observation profiles over time. To solve this problem, we propose a distributed learning rule that differs fundamentally from existing approaches in the sense that it does not employ any form of “belief-averaging”. Specifically, every agent maintains a local belief (on each hypothesis) that is updated in a Bayesian manner without any network influence, and an actual belief that is updated (up to normalization) as the minimum of its own local belief and the actual beliefs of its neighbors. Under minimal requirements on the signal structures of the agents and the underlying communication graph, we establish consistency of the proposed belief update rule, i.e., we show that the actual beliefs of the agents asymptotically concentrate on the true state almost surely. As one of the key benefits of our approach, we show that our learning rule can be extended to scenarios that capture misbehavior on the part of certain agents in the network, modeled via the Byzantine adversary model. In particular, we prove that each non-adversarial agent can asymptotically learn the true state of the world almost surely, under appropriate conditions on the observation model and the network topology. |
Tasks | |
Published | 2019-03-14 |
URL | http://arxiv.org/abs/1903.05817v1 |
PDF | http://arxiv.org/pdf/1903.05817v1.pdf |
PWC | https://paperswithcode.com/paper/a-new-approach-for-distributed-hypothesis |
Repo | |
Framework | |
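A toy sketch of the belief update rule described in the abstract: each agent updates a private local belief via Bayes' rule with no network influence, then sets its actual belief to the normalized elementwise minimum of its own local belief and its neighbors' actual beliefs. The two-agent observation models and the communication graph below are toy assumptions chosen so that neither agent can identify the true state alone.

```python
# Min-rule distributed hypothesis testing on two agents and three hypotheses.
# Agent 0 can rule out theta=1, agent 1 can rule out theta=2; only jointly is
# theta=0 identifiable.  No belief averaging is used anywhere.
import numpy as np

hypotheses = 3
# likelihoods[a][theta] = distribution over agent a's binary observation under theta
likelihoods = [
    np.array([[0.8, 0.2], [0.2, 0.8], [0.8, 0.2]]),  # agent 0 rules out theta = 1
    np.array([[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]]),  # agent 1 rules out theta = 2
]
neighbors = {0: [1], 1: [0]}
local = [np.full(hypotheses, 1.0 / hypotheses) for _ in range(2)]
actual = [b.copy() for b in local]

rng = np.random.default_rng(1)
true_theta = 0
for _ in range(200):
    for a in range(2):
        obs = rng.choice(2, p=likelihoods[a][true_theta])
        local[a] = local[a] * likelihoods[a][:, obs]          # Bayesian update, no averaging
        local[a] /= local[a].sum()
    new_actual = []
    for a in range(2):
        stacked = np.vstack([local[a]] + [actual[j] for j in neighbors[a]])
        m = stacked.min(axis=0)                               # elementwise min with neighbors
        new_actual.append(m / m.sum())
    actual = new_actual

print(np.round(actual[0], 3), np.round(actual[1], 3))         # both concentrate on theta = 0
```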
Deep Neural Network Approximation for Custom Hardware: Where We’ve Been, Where We’re Going
Title | Deep Neural Network Approximation for Custom Hardware: Where We’ve Been, Where We’re Going |
Authors | Erwei Wang, James J. Davis, Ruizhe Zhao, Ho-Cheung Ng, Xinyu Niu, Wayne Luk, Peter Y. K. Cheung, George A. Constantinides |
Abstract | Deep neural networks have proven to be particularly effective in visual and audio recognition tasks. Existing models tend to be computationally expensive and memory intensive, however, and so methods for hardware-oriented approximation have become a hot topic. Research has shown that custom hardware-based neural network accelerators can surpass their general-purpose processor equivalents in terms of both throughput and energy efficiency. Application-tailored accelerators, when co-designed with approximation-based network training methods, transform large, dense and computationally expensive networks into small, sparse and hardware-efficient alternatives, increasing the feasibility of network deployment. In this article, we provide a comprehensive evaluation of approximation methods for high-performance network inference along with in-depth discussion of their effectiveness for custom hardware implementation. We also include proposals for future research based on a thorough analysis of current trends. This article represents the first survey providing detailed comparisons of custom hardware accelerators featuring approximation for both convolutional and recurrent neural networks, through which we hope to inspire exciting new developments in the field. |
Tasks | |
Published | 2019-01-21 |
URL | https://arxiv.org/abs/1901.06955v4 |
PDF | https://arxiv.org/pdf/1901.06955v4.pdf |
PWC | https://paperswithcode.com/paper/deep-neural-network-approximation-for-custom |
Repo | |
Framework | |
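Two of the approximation families the survey covers, shown in their simplest form: magnitude pruning and symmetric uniform post-training quantization of a weight tensor. Bit width and sparsity level are illustrative choices, not recommendations from the article.

```python
# Magnitude pruning and symmetric uniform post-training quantization of weights.
import numpy as np

def prune_by_magnitude(w, sparsity=0.5):
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < thresh, 0.0, w)

def quantize_symmetric(w, bits=8):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)    # integer codes a fixed-point datapath would store
    return q * scale                                 # dequantized values, for error measurement

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
w_sparse = prune_by_magnitude(w, sparsity=0.7)
w_q = quantize_symmetric(w_sparse, bits=8)
print("sparsity:", float((w_sparse == 0).mean()))
print("max quantization error:", float(np.abs(w_sparse - w_q).max()))
```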
HR-SAR-Net: A Deep Neural Network for Urban Scene Segmentation from High-Resolution SAR Data
Title | HR-SAR-Net: A Deep Neural Network for Urban Scene Segmentation from High-Resolution SAR Data |
Authors | Xiaying Wang, Lukas Cavigelli, Manuel Eggimann, Michele Magno, Luca Benini |
Abstract | Synthetic aperture radar (SAR) data is becoming increasingly available to a wide range of users through commercial service providers with resolutions reaching 0.5m/px. Segmenting SAR data still requires skilled personnel, limiting the potential for large-scale use. We show that it is possible to automatically and reliably perform urban scene segmentation from next-gen resolution SAR data (0.15m/px) using deep neural networks (DNNs), achieving a pixel accuracy of 95.19% and a mean IoU of 74.67% with data collected over a region of merely 2.2km${}^2$. The presented DNN is not only effective, but is very small with only 63k parameters and computationally simple enough to achieve a throughput of around 500Mpx/s using a single GPU. We further identify that additional SAR receive antennas and data from multiple flights massively improve the segmentation accuracy. We describe a procedure for generating a high-quality segmentation ground truth from multiple inaccurate building and road annotations, which has been crucial to achieving these segmentation results. |
Tasks | Scene Segmentation |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.04441v1 |
PDF | https://arxiv.org/pdf/1912.04441v1.pdf |
PWC | https://paperswithcode.com/paper/hr-sar-net-a-deep-neural-network-for-urban |
Repo | |
Framework | |
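For clarity on the two metrics quoted above, here is how pixel accuracy and mean IoU are computed from a confusion matrix; the label maps are toy data, not SAR segmentations.

```python
# Pixel accuracy and mean IoU from a confusion matrix over toy label maps.
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, g in zip(pred.ravel(), gt.ravel()):
        cm[g, p] += 1
    pixel_acc = np.diag(cm).sum() / cm.sum()
    inter = np.diag(cm).astype(float)
    union = cm.sum(0) + cm.sum(1) - inter
    iou = inter / np.maximum(union, 1)
    return pixel_acc, iou.mean()

rng = np.random.default_rng(0)
gt = rng.integers(0, 3, size=(64, 64))
pred = gt.copy()
pred[rng.random((64, 64)) < 0.1] = rng.integers(0, 3)   # corrupt ~10% of pixels
acc, miou = segmentation_metrics(pred, gt, num_classes=3)
print(f"pixel accuracy {acc:.3f}, mean IoU {miou:.3f}")
```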
Practical Solutions for Machine Learning Safety in Autonomous Vehicles
Title | Practical Solutions for Machine Learning Safety in Autonomous Vehicles |
Authors | Sina Mohseni, Mandar Pitale, Vasu Singh, Zhangyang Wang |
Abstract | Autonomous vehicles rely on machine learning to solve challenging tasks in perception and motion planning. However, automotive software safety standards have not fully evolved to address the challenges of machine learning safety such as interpretability, verification, and performance limitations. In this paper, we review and organize practical machine learning safety techniques that can complement engineering safety for machine learning based software in autonomous vehicles. Our organization maps safety strategies to state-of-the-art machine learning techniques in order to enhance dependability and safety of machine learning algorithms. We also discuss security limitations and user experience aspects of machine learning components in autonomous vehicles. |
Tasks | Autonomous Vehicles, Motion Planning |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09630v1 |
PDF | https://arxiv.org/pdf/1912.09630v1.pdf |
PWC | https://paperswithcode.com/paper/practical-solutions-for-machine-learning |
Repo | |
Framework | |
A Configuration-Space Decomposition Scheme for Learning-based Collision Checking
Title | A Configuration-Space Decomposition Scheme for Learning-based Collision Checking |
Authors | Yiheng Han, Wang Zhao, Jia Pan, Zipeng Ye, Ran Yi, Yong-Jin Liu |
Abstract | Motion planning for robots with high degrees of freedom (DOFs) is an important problem in robotics, with sampling-based methods in the configuration space C as one popular solution. Recently, machine learning methods have been introduced into sampling-based motion planning methods, which train a classifier to distinguish the collision-free subspace from the in-collision subspace in C. In this paper, we propose a novel configuration space decomposition method and show two nice properties resulting from this decomposition. Using these two properties, we build a composite classifier that works compatibly with previous machine learning methods by using them as the elementary classifiers. Experimental results are presented, showing that our composite classifier outperforms state-of-the-art single-classifier methods by a large margin. A real application of motion planning in a multi-robot plant-phenotyping system using three UR5 robotic arms is also presented. |
Tasks | Motion Planning |
Published | 2019-11-17 |
URL | https://arxiv.org/abs/1911.08581v1 |
PDF | https://arxiv.org/pdf/1911.08581v1.pdf |
PWC | https://paperswithcode.com/paper/a-configuration-space-decomposition-scheme |
Repo | |
Framework | |
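A simplified sketch of the composite-classifier idea: decompose the configuration space into cells, train one elementary collision classifier per cell, and route each query to the classifier that owns its cell. The uniform grid over the first joint and the k-NN base classifier are placeholders, not the paper's decomposition or properties.

```python
# Composite collision classifier over a decomposed configuration space (toy version).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
q = rng.uniform(-np.pi, np.pi, size=(2000, 6))                         # 6-DOF configurations
in_collision = (np.abs(q[:, 0]) + np.abs(q[:, 1]) < 1.0).astype(int)   # toy obstacle region

n_cells = 4
edges = np.linspace(-np.pi, np.pi, n_cells + 1)                        # grid over the first joint
cell_of = lambda conf: int(np.clip(np.searchsorted(edges, conf[0], side="right") - 1, 0, n_cells - 1))

classifiers = []
for c in range(n_cells):
    mask = np.array([cell_of(x) == c for x in q])
    clf = KNeighborsClassifier(n_neighbors=5).fit(q[mask], in_collision[mask])
    classifiers.append(clf)                                            # one elementary classifier per cell

query = np.zeros((1, 6))
print("predicted collision:", classifiers[cell_of(query[0])].predict(query)[0])
```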
A Topological “Reading” Lesson: Classification of MNIST using TDA
Title | A Topological “Reading” Lesson: Classification of MNIST using TDA |
Authors | Adélie Garin, Guillaume Tauzin |
Abstract | We present a way to use Topological Data Analysis (TDA) for machine learning tasks on grayscale images. We apply persistent homology to generate a wide range of topological features using a point cloud obtained from an image, its natural grayscale filtration, and different filtrations defined on the binarized image. We show that this topological machine learning pipeline can be used as a highly relevant dimensionality reduction by applying it to the MNIST digits dataset. We conduct feature selection, study the correlations among the selected features, and provide an intuitive interpretation of their importance, which is relevant in both machine learning and TDA. Finally, we show that we can classify digit images while reducing the size of the feature set by a factor of 5 compared to the grayscale pixel-value features and maintaining similar accuracy. |
Tasks | Dimensionality Reduction, Feature Selection, Topological Data Analysis |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08345v2 |
PDF | https://arxiv.org/pdf/1910.08345v2.pdf |
PWC | https://paperswithcode.com/paper/a-topological-reading-lesson-classification |
Repo | |
Framework | |
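As a crude stand-in for the topological features: sweeping binarization thresholds over a grayscale image and counting connected components at each threshold yields a Betti-0-style curve that can feed a standard classifier. The paper's pipeline uses full persistent homology over several filtrations, which this simplification does not capture; the toy "digit" below is synthetic.

```python
# Count connected components of the image binarized at increasing thresholds,
# giving a simple Betti-0-style topological feature vector per image.
import numpy as np
from scipy import ndimage

def betti0_curve(image, thresholds):
    # image: 2-D array of grayscale values in [0, 1]
    counts = []
    for t in thresholds:
        _, n_components = ndimage.label(image >= t)
        counts.append(n_components)
    return np.array(counts)

# Toy "digit": a bright ring, which stays a single component across most thresholds.
yy, xx = np.mgrid[0:28, 0:28]
r = np.sqrt((yy - 14) ** 2 + (xx - 14) ** 2)
digit = np.exp(-((r - 8) ** 2) / 8.0)

features = betti0_curve(digit, thresholds=np.linspace(0.1, 0.9, 9))
print(features)   # topological feature vector for one image
```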
Multivariate extensions of isotonic regression and total variation denoising via entire monotonicity and Hardy-Krause variation
Title | Multivariate extensions of isotonic regression and total variation denoising via entire monotonicity and Hardy-Krause variation |
Authors | Billy Fang, Adityanand Guntuboyina, Bodhisattva Sen |
Abstract | We consider the problem of nonparametric regression when the covariate is $d$-dimensional, where $d \geq 1$. In this paper we introduce and study two nonparametric least squares estimators (LSEs) in this setting—the entirely monotonic LSE and the constrained Hardy-Krause variation LSE. We show that these two LSEs are natural generalizations of univariate isotonic regression and univariate total variation denoising, respectively, to multiple dimensions. We discuss the characterization and computation of these two LSEs obtained from $n$ data points. We provide a detailed study of their risk properties under the squared error loss and a fixed uniform lattice design. We show that the finite sample risk of these LSEs is always bounded from above by $n^{-2/3}$ modulo logarithmic factors depending on $d$; thus these nonparametric LSEs avoid the curse of dimensionality to some extent. We also prove nearly matching minimax lower bounds. Further, we illustrate that these LSEs are particularly useful in fitting rectangular piecewise constant functions. Specifically, we show that the risk of the entirely monotonic LSE is almost parametric (at most $1/n$ up to logarithmic factors) when the true function is well-approximable by a rectangular piecewise constant entirely monotone function with not too many constant pieces. A similar result is also shown to hold for the constrained Hardy-Krause variation LSE for a simple subclass of rectangular piecewise constant functions. We believe that the proposed LSEs yield a novel approach to estimating multivariate functions using convex optimization that avoids the curse of dimensionality to some extent. |
Tasks | Denoising |
Published | 2019-03-04 |
URL | https://arxiv.org/abs/1903.01395v2 |
PDF | https://arxiv.org/pdf/1903.01395v2.pdf |
PWC | https://paperswithcode.com/paper/multivariate-extensions-of-isotonic |
Repo | |
Framework | |
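For concreteness, the two estimators described in the abstract can be written as constrained least squares problems. The display below is a paraphrase of the abstract's description, not the paper's exact statement; the abstract's risk bound of order $n^{-2/3}$ (up to logarithmic factors) applies to both.

```latex
% The two LSEs, paraphrased from the abstract: F_EM denotes entirely monotone
% functions on [0,1]^d, V_HK(f) the Hardy-Krause variation of f, and V a tuning
% parameter for the constrained estimator.
\hat{f}_{\mathrm{EM}} \in \operatorname*{arg\,min}_{f \in \mathcal{F}_{\mathrm{EM}}}
  \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2,
\qquad
\hat{f}_{\mathrm{HK}} \in \operatorname*{arg\,min}_{f \,:\, V_{\mathrm{HK}}(f) \le V}
  \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 .
```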
FA-Harris: A Fast and Asynchronous Corner Detector for Event Cameras
Title | FA-Harris: A Fast and Asynchronous Corner Detector for Event Cameras |
Authors | Ruoxiang Li, Dianxi Shi, Yongjun Zhang, Kaiyue Li, Ruihao Li |
Abstract | Recently, the emerging bio-inspired event cameras have demonstrated potential for a wide range of robotic applications in dynamic environments. In this paper, we propose a novel fast and asynchronous event-based corner detection method called FA-Harris. FA-Harris consists of several components, including an event filter, a Global Surface of Active Events (G-SAE) maintaining unit, a corner candidate selecting unit, and a corner candidate refining unit. The proposed G-SAE maintenance algorithm and corner candidate selection algorithm greatly enhance the real-time performance of corner detection, while the corner candidate refinement algorithm maintains accuracy by using an improved event-based Harris detector. Additionally, FA-Harris does not require artificially synthesized event-frames and can operate on asynchronous events directly. We implement the proposed method in C++ and evaluate it on public Event Camera Datasets. The results show that our method achieves an approximately 8x speed-up compared with the previously reported event-based Harris detector, with no compromise in accuracy. |
Tasks | |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.10925v4 |
PDF | https://arxiv.org/pdf/1906.10925v4.pdf |
PWC | https://paperswithcode.com/paper/fa-harris-a-fast-and-asynchronous-corner |
Repo | |
Framework | |
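A rough sketch of the generic ingredients behind event-based Harris detection: a surface of active events storing the latest timestamp per pixel, updated per event, and a Harris corner response computed on a binarized local patch of that surface. The event filter, G-SAE maintenance, and candidate refinement specific to FA-Harris are not reproduced; all sizes and the toy event stream are assumptions.

```python
# Surface of Active Events (latest timestamp per pixel) plus a Harris response on
# a binarized local patch around each incoming event.
import numpy as np

H, W = 64, 64
sae = np.zeros((H, W))            # latest event timestamp per pixel

def harris_response(patch, k=0.04):
    dy, dx = np.gradient(patch.astype(float))
    sxx, syy, sxy = (dx * dx).sum(), (dy * dy).sum(), (dx * dy).sum()
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2

def on_event(x, y, t, radius=4, newest=25):
    sae[y, x] = t
    patch = sae[y - radius:y + radius + 1, x - radius:x + radius + 1]
    # keep only the most recent events in the patch as a crude binary local surface
    binary = (patch >= np.sort(patch.ravel())[-newest]).astype(float)
    return harris_response(binary)

rng = np.random.default_rng(0)
for i in range(500):                           # a toy stream of random events
    score = on_event(int(rng.integers(8, 56)), int(rng.integers(8, 56)), t=float(i))
print("last event Harris score:", score)
```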
Deep Autoencoders with Value-at-Risk Thresholding for Unsupervised Anomaly Detection
Title | Deep Autoencoders with Value-at-Risk Thresholding for Unsupervised Anomaly Detection |
Authors | Albert Akhriev, Jakub Marecek |
Abstract | Many real-world monitoring and surveillance applications require non-trivial anomaly detection to be run in the streaming model. We consider an incremental-learning approach, wherein a deep-autoencoding (DAE) model of what is normal is trained and used to detect anomalies at the same time. In the detection of anomalies, we utilise a novel thresholding mechanism based on value at risk (VaR). We compare the resulting convolutional neural network (CNN) against a number of subspace methods, and present results on changedetection.net. |
Tasks | Anomaly Detection, Unsupervised Anomaly Detection |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.04418v1 |
PDF | https://arxiv.org/pdf/1912.04418v1.pdf |
PWC | https://paperswithcode.com/paper/deep-autoencoders-with-value-at-risk |
Repo | |
Framework | |
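A minimal sketch of the VaR-style thresholding idea described above: score each frame by its reconstruction error and flag it as anomalous when the error exceeds the empirical value-at-risk (an upper quantile) of a rolling window of recent errors. The autoencoder is stubbed out with a placeholder scorer, and the window size and quantile level are illustrative assumptions.

```python
# Streaming anomaly detection with an empirical VaR threshold on reconstruction errors.
from collections import deque
import numpy as np

def reconstruction_error(frame):
    # Placeholder for ||frame - DAE(frame)||; here just a synthetic score.
    return float(np.mean(frame))

def detect_stream(frames, alpha=0.95, window=200):
    history, flags = deque(maxlen=window), []
    for frame in frames:
        err = reconstruction_error(frame)
        if len(history) >= 30:                            # warm-up before thresholding
            var_threshold = np.quantile(history, alpha)   # empirical VaR at level alpha
            flags.append(err > var_threshold)
        else:
            flags.append(False)
        history.append(err)                               # incremental update of the window
    return flags

rng = np.random.default_rng(0)
frames = [rng.random((8, 8)) for _ in range(300)]
frames[250] = frames[250] + 5.0                           # inject one obvious anomaly
print([i for i, f in enumerate(detect_stream(frames)) if f])
```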
Fast Spatially-Varying Indoor Lighting Estimation
Title | Fast Spatially-Varying Indoor Lighting Estimation |
Authors | Mathieu Garon, Kalyan Sunkavalli, Sunil Hadap, Nathan Carr, Jean-François Lalonde |
Abstract | We propose a real-time method to estimate spatially-varying indoor lighting from a single RGB image. Given an image and a 2D location in that image, our CNN estimates a 5th order spherical harmonic representation of the lighting at the given location in less than 20ms on a laptop mobile graphics card. While existing approaches estimate a single, global lighting representation or require depth as input, our method reasons about local lighting without requiring any geometry information. We demonstrate, through quantitative experiments including a user study, that our results achieve lower lighting estimation errors and are preferred by users over the state-of-the-art. Our approach can be used directly for augmented reality applications, where a virtual object is relit realistically at any position in the scene in real-time. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03799v1 |
PDF | https://arxiv.org/pdf/1906.03799v1.pdf |
PWC | https://paperswithcode.com/paper/fast-spatially-varying-indoor-lighting-1 |
Repo | |
Framework | |
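A sketch of the output side of the method above: a 5th-order spherical harmonic representation has (5 + 1)^2 = 36 coefficients per color channel, so predicting RGB lighting at a queried image location means regressing 108 numbers from the image plus a 2D position. The tiny backbone below is a placeholder, not the paper's CNN.

```python
# Regress 3 x 36 spherical harmonic coefficients from an image crop and a 2D query
# location.  Backbone and layer sizes are placeholders.
import torch
import torch.nn as nn

SH_ORDER = 5
N_COEFFS = (SH_ORDER + 1) ** 2          # 36 coefficients per color channel

class LocalSHRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 + 2, 128), nn.ReLU(),   # image code + normalized (x, y) query
            nn.Linear(128, 3 * N_COEFFS),
        )

    def forward(self, image, location):
        code = self.backbone(image)
        sh = self.head(torch.cat([code, location], dim=-1))
        return sh.view(-1, 3, N_COEFFS)          # (batch, RGB, 36) SH coefficients

model = LocalSHRegressor()
out = model(torch.randn(2, 3, 128, 128), torch.rand(2, 2))
print(out.shape)  # torch.Size([2, 3, 36])
```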