April 3, 2020


Paper Group ANR 45

Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms. Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning. Closing the convergence gap of SGD without replacement. BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models. Image-to-image Neural Network for Additi …

Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms


Title	Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms
Authors	Ping Ma, Xinlian Zhang, Xin Xing, Jingyi Ma, Michael W. Mahoney
Abstract	The statistical analysis of Randomized Numerical Linear Algebra (RandNLA) algorithms within the past few years has mostly focused on their performance as point estimators. However, this is insufficient for conducting statistical inference, e.g., constructing confidence intervals and hypothesis testing, since the distribution of the estimator is lacking. In this article, we develop an asymptotic analysis to derive the distribution of RandNLA sampling estimators for the least-squares problem. In particular, we derive the asymptotic distribution of a general sampling estimator with arbitrary sampling probabilities. The analysis is conducted in two complementary settings, i.e., when the objective of interest is to approximate the full sample estimator or is to infer the underlying ground truth model parameters. For each setting, we show that the sampling estimator is asymptotically normally distributed under mild regularity conditions. Moreover, the sampling estimator is asymptotically unbiased in both settings. Based on our asymptotic analysis, we use two criteria, the Asymptotic Mean Squared Error (AMSE) and the Expected Asymptotic Mean Squared Error (EAMSE), to identify optimal sampling probabilities. Several of these optimal sampling probability distributions are new to the literature, e.g., the root leverage sampling estimator and the predictor length sampling estimator. Our theoretical results clarify the role of leverage in the sampling process, and our empirical results demonstrate improvements over existing methods.
Tasks
Published	2020-02-24
URL	https://arxiv.org/abs/2002.10526v1
PDF	https://arxiv.org/pdf/2002.10526v1.pdf
PWC	https://paperswithcode.com/paper/asymptotic-analysis-of-sampling-estimators
Repo
Framework
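
The general sampling estimator described in the abstract above can be illustrated with a toy case. The following is a minimal sketch, not the authors' code: it assumes a one-dimensional predictor with no intercept, for which the leverage score of row i reduces to x_i^2 / sum_j x_j^2. The function name `sampling_estimator`, the synthetic data, and the subsample size are all hypothetical choices made for illustration.

```python
import random

def sampling_estimator(xs, ys, probs, r, rng):
    """Weighted least-squares slope from r rows drawn with replacement.

    Row i is drawn with probability probs[i] and reweighted by 1 / probs[i].
    """
    idx = rng.choices(range(len(xs)), weights=probs, k=r)
    num = sum(xs[i] * ys[i] / probs[i] for i in idx)
    den = sum(xs[i] * xs[i] / probs[i] for i in idx)
    return num / den

rng = random.Random(0)
n = 1000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]  # assumed true slope: 2.0

# Full-sample least-squares slope, the target in the first setting of the abstract.
beta_full = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Two choices of sampling probabilities: uniform and leverage-based.
uniform = [1.0 / n] * n
total = sum(x * x for x in xs)
leverage = [x * x / total for x in xs]

beta_uniform = sampling_estimator(xs, ys, uniform, 200, rng)
beta_leverage = sampling_estimator(xs, ys, leverage, 200, rng)
print(beta_full, beta_uniform, beta_leverage)
```

Repeating the draw many times and inspecting the spread of `beta_uniform` and `beta_leverage` around `beta_full` gives an empirical view of the sampling distribution that the paper analyzes asymptotically.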

Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning


Title	Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
Authors	Elad Amrani, Rami Ben-Ari, Daniel Rotman, Alex Bronstein
Abstract	One of the key factors in enabling machine learning models to comprehend and solve real-world tasks is to leverage multimodal data. Unfortunately, annotation of multimodal data is challenging and expensive. Recently, self-supervised multimodal methods that combine vision and language were proposed to learn multimodal representations without annotation. However, these methods choose to ignore the presence of high levels of noise and thus yield sub-optimal results. In this work, we show that the problem of noise estimation for multimodal data can be reduced to a multimodal density estimation task. Using multimodal density estimation, we propose a noise estimation building block for multimodal representation learning that is based strictly on the inherent correlation between different modalities. We demonstrate how our noise estimation can be broadly integrated and achieves comparable results to state-of-the-art performance on five different benchmark datasets for two challenging multimodal tasks: Video Question Answering and Text-To-Video Retrieval.
Tasks	Density Estimation, Question Answering, Representation Learning, Video Question Answering, Video Retrieval, Visual Question Answering
Published	2020-03-06
URL	https://arxiv.org/abs/2003.03186v1
PDF	https://arxiv.org/pdf/2003.03186v1.pdf
PWC	https://paperswithcode.com/paper/noise-estimation-using-density-estimation-for
Repo
Framework

Closing the convergence gap of SGD without replacement


Title	Closing the convergence gap of SGD without replacement
Authors	Shashank Rajput, Anant Gupta, Dimitris Papailiopoulos
Abstract	Stochastic gradient descent without replacement sampling is widely used in practice for model training. However, the vast majority of SGD analyses assumes data sampled with replacement, and when the function minimized is strongly convex, an $\mathcal{O}\left(\frac{1}{T}\right)$ rate can be established when SGD is run for $T$ iterations. A recent line of breakthrough work on SGD without replacement (SGDo) established an $\mathcal{O}\left(\frac{n}{T^2}\right)$ convergence rate when the function minimized is strongly convex and is a sum of $n$ smooth functions, and an $\mathcal{O}\left(\frac{1}{T^2}+\frac{n^3}{T^3}\right)$ rate for sums of quadratics. On the other hand, the tightest known lower bound postulates an $\Omega\left(\frac{1}{T^2}+\frac{n^2}{T^3}\right)$ rate, leaving open the possibility of better SGDo convergence rates in the general case. In this paper, we close this gap and show that SGD without replacement achieves a rate of $\mathcal{O}\left(\frac{1}{T^2}+\frac{n^2}{T^3}\right)$ when the sum of the functions is a quadratic, and offer a new lower bound of $\Omega\left(\frac{n}{T^2}\right)$ for strongly convex functions that are sums of smooth functions.
Tasks
Published	2020-02-24
URL	https://arxiv.org/abs/2002.10400v2
PDF	https://arxiv.org/pdf/2002.10400v2.pdf
PWC	https://paperswithcode.com/paper/closing-the-convergence-gap-of-sgd-without
Repo
Framework
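
The difference between the two sampling schemes in the abstract above can be seen on a toy problem. This is a minimal sketch under stated assumptions, not the paper's analysis: the objective is a sum of one-dimensional quadratics 0.5 * (x - a_i)^2, the step size is 1/t, and the function name `sgd_quadratics` and the data are hypothetical. With this step size the iterate equals the running average of the sampled a_i, so after full epochs without replacement it matches the exact minimizer, while sampling with replacement leaves a random error.

```python
import random

def sgd_quadratics(targets, epochs, with_replacement, seed=0):
    """Run SGD on F(x) = (1/n) * sum_i 0.5 * (x - a_i)^2 with step size 1/t."""
    rng = random.Random(seed)
    n = len(targets)
    x, t = 0.0, 0
    for _ in range(epochs):
        if with_replacement:
            # Draw n indices independently; some may repeat, some may be missed.
            order = [rng.randrange(n) for _ in range(n)]
        else:
            # Visit every index exactly once per epoch, in random order.
            order = list(range(n))
            rng.shuffle(order)
        for i in order:
            t += 1
            grad = x - targets[i]  # gradient of 0.5 * (x - a_i)^2
            x -= grad / t
    return x

targets = [float(v) for v in range(10)]
optimum = sum(targets) / len(targets)  # minimizer of F is the mean of the a_i

err_without = abs(sgd_quadratics(targets, 5, with_replacement=False) - optimum)
err_with = abs(sgd_quadratics(targets, 5, with_replacement=True) - optimum)
print(err_without, err_with)
```

The exact recovery here is specific to this toy objective and step size; the rates quoted in the abstract concern general strongly convex sums and are not reproduced by this example.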

BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models


Title	BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
Authors	Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Ruoming Pang, Quoc Le
Abstract	Neural architecture search (NAS) has shown promising results discovering models that are both accurate and fast. For NAS, training a one-shot model has become a popular strategy to rank the relative quality of different architectures (child models) using a single set of shared weights. However, while one-shot model weights can effectively rank different network architectures, the absolute accuracies from these shared weights are typically far below those obtained from stand-alone training. To compensate, existing methods assume that the weights must be retrained, finetuned, or otherwise post-processed after the search is completed. These steps significantly increase the compute requirements and complexity of the architecture search and model deployment. In this work, we propose BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies. Without extra retraining or post-processing steps, we are able to train a single set of shared weights on ImageNet and use these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs. Our discovered model family, BigNASModels, achieves top-1 accuracies ranging from 76.5% to 80.9%, surpassing state-of-the-art models in this range including EfficientNets and Once-for-All networks without extra retraining or post-processing. We present an ablative study and analysis to further understand the proposed BigNASModels.
Tasks	Neural Architecture Search
Published	2020-03-24
URL	https://arxiv.org/abs/2003.11142v1
PDF	https://arxiv.org/pdf/2003.11142v1.pdf
PWC	https://paperswithcode.com/paper/bignas-scaling-up-neural-architecture-search
Repo
Framework

Image-to-image Neural Network for Addition and Subtraction of a Pair of Not Very Large Numbers


Title	Image-to-image Neural Network for Addition and Subtraction of a Pair of Not Very Large Numbers
Authors	Vladimir Ivashkin
Abstract	Looking back at the history of calculators, one can see that they become less functional and more computationally expensive over time. A modern calculator runs on a personal computer and is drawn at 60 fps only to help us click a few digits with a mouse pointer. A search engine is often used as a calculator, which means that nowadays we need the Internet just to add two numbers. In this paper, we propose to go further and train a convolutional neural network that takes an image of a simple mathematical expression and generates an image of an answer. This neural calculator works only with pairs of double-digit numbers and supports only addition and subtraction. Also, sometimes it makes mistakes. We promise that the proposed calculator is a small step for man, but one giant leap for mankind.
Tasks
Published	2020-03-14
URL	https://arxiv.org/abs/2003.06592v1
PDF	https://arxiv.org/pdf/2003.06592v1.pdf
PWC	https://paperswithcode.com/paper/image-to-image-neural-network-for-addition
Repo
Framework

Bayesian Optimization for Policy Search in High-Dimensional Systems via Automatic Domain Selection


Title	Bayesian Optimization for Policy Search in High-Dimensional Systems via Automatic Domain Selection
Authors	Lukas P. Fröhlich, Edgar D. Klenske, Christian G. Daniel, Melanie N. Zeilinger
Abstract	Bayesian Optimization (BO) is an effective method for optimizing expensive-to-evaluate black-box functions with a wide range of applications, for example in robotics, system design and parameter optimization. However, scaling BO to problems with large input dimensions (>10) remains an open challenge. In this paper, we propose to leverage results from optimal control to scale BO to higher dimensional control tasks and to reduce the need for manually selecting the optimization domain. The contributions of this paper are twofold: 1) We show how we can make use of a learned dynamics model in combination with a model-based controller to simplify the BO problem by focusing on the most relevant regions of the optimization domain. 2) Based on (1) we present a method to find an embedding in parameter space that reduces the effective dimensionality of the optimization problem. To evaluate the effectiveness of the proposed approach, we present an experimental evaluation on real hardware, as well as simulated tasks including a 48-dimensional policy for a quadcopter.
Tasks
Published	2020-01-21
URL	https://arxiv.org/abs/2001.07394v1
PDF	https://arxiv.org/pdf/2001.07394v1.pdf
PWC	https://paperswithcode.com/paper/bayesian-optimization-for-policy-search-in
Repo
Framework

Rotation-invariant Mixed Graphical Model Network for 2D Hand Pose Estimation


Title	Rotation-invariant Mixed Graphical Model Network for 2D Hand Pose Estimation
Authors	Deying Kong, Haoyu Ma, Yifei Chen, Xiaohui Xie
Abstract	In this paper, we propose a new architecture named Rotation-invariant Mixed Graphical Model Network (R-MGMN) to solve the problem of 2D hand pose estimation from a monocular RGB image. By integrating a rotation net, the R-MGMN is invariant to rotations of the hand in the image. It also has a pool of graphical models, from which a combination of graphical models could be selected, conditioning on the input image. Belief propagation is performed on each graphical model separately, generating a set of marginal distributions, which are taken as the confidence maps of hand keypoint positions. Final confidence maps are obtained by aggregating these confidence maps together. We evaluate the R-MGMN on two public hand pose datasets. Experiment results show our model outperforms the state-of-the-art algorithm which is widely used in 2D hand pose estimation by a noticeable margin.
Tasks	Hand Pose Estimation, Pose Estimation
Published	2020-02-05
URL	https://arxiv.org/abs/2002.02033v1
PDF	https://arxiv.org/pdf/2002.02033v1.pdf
PWC	https://paperswithcode.com/paper/rotation-invariant-mixed-graphical-model
Repo
Framework

Deceptive AI Explanations: Creation and Detection


Title	Deceptive AI Explanations: Creation and Detection
Authors	Johannes Schneider, Joshua Handali, Michalis Vlachos, Christian Meske
Abstract	Artificial intelligence comes with great opportunities but also great risks. We investigate to what extent deep learning can be used to create and detect deceptive explanations that either aim to lure a human into believing a decision that is not truthful to the model or provide reasoning that is non-faithful to the decision. Our theoretical insights show some limits of deception and detection in the absence of domain knowledge. For empirical evaluation, we focus on text classification. To create deceptive explanations, we alter explanations originating from GradCAM, a state-of-the-art technique for creating explanations in neural networks. We evaluate the effectiveness of deceptive explanations on 200 participants. Our findings indicate that deceptive explanations can indeed fool humans. Our classifier can detect even seemingly minor attempts of deception with accuracy that exceeds 80% given sufficient domain knowledge encoded in the form of training data.
Tasks	Text Classification
Published	2020-01-21
URL	https://arxiv.org/abs/2001.07641v1
PDF	https://arxiv.org/pdf/2001.07641v1.pdf
PWC	https://paperswithcode.com/paper/deceptive-ai-explanations-creation-and
Repo
Framework

Multi-Source Domain Adaptation for Text Classification via DistanceNet-Bandits


Title	Multi-Source Domain Adaptation for Text Classification via DistanceNet-Bandits
Authors	Han Guo, Ramakanth Pasunuru, Mohit Bansal
Abstract	Domain adaptation performance of a learning algorithm on a target domain is a function of its source domain error and a divergence measure between the data distribution of these two domains. We present a study of various distance-based measures in the context of NLP tasks, that characterize the dissimilarity between domains based on sample estimates. We first conduct analysis experiments to show which of these distance measures can best differentiate samples from same versus different domains, and are correlated with empirical results. Next, we develop a DistanceNet model which uses these distance measures, or a mixture of these distance measures, as an additional loss function to be minimized jointly with the task’s loss function, so as to achieve better unsupervised domain adaptation. Finally, we extend this model to a novel DistanceNet-Bandit model, which employs a multi-armed bandit controller to dynamically switch between multiple source domains and allow the model to learn an optimal trajectory and mixture of domains for transfer to the low-resource target domain. We conduct experiments on popular sentiment analysis datasets with several diverse domains and show that our DistanceNet model, as well as its dynamic bandit variant, can outperform competitive baselines in the context of unsupervised domain adaptation.
Tasks	Domain Adaptation, Sentiment Analysis, Text Classification, Unsupervised Domain Adaptation
Published	2020-01-13
URL	https://arxiv.org/abs/2001.04362v3
PDF	https://arxiv.org/pdf/2001.04362v3.pdf
PWC	https://paperswithcode.com/paper/multi-source-domain-adaptation-for-text
Repo
Framework

Visual Simplified Characters’ Emotion Emulator Implementing OCC Model


Title	Visual Simplified Characters’ Emotion Emulator Implementing OCC Model
Authors	Ana Lilia Laureano-Cruces, Laura Hernández-Domínguez, Martha Mora-Torres, Juan-Manuel Torres-Moreno, Jaime Enrique Cabrera-López
Abstract	In this paper, we present a visual emulator of the emotions seen in characters in stories. This system is based on a simplified view of the cognitive structure of emotions proposed by Ortony, Clore and Collins (OCC Model). The goal of this paper is to provide a visual platform that allows us to observe changes in the characters’ different emotions, and the intricate interrelationships between: 1) each character’s emotions, 2) their affective relationships and actions, 3) the events that take place in the development of a plot, and 4) the objects of desire that make up the emotional map of any story. This tool was tested on stories with a contrasting variety of emotional and affective environments: Othello, Twilight, and Harry Potter, behaving sensibly and in keeping with the atmosphere in which the characters were immersed.
Tasks
Published	2020-01-17
URL	https://arxiv.org/abs/2001.06190v1
PDF	https://arxiv.org/pdf/2001.06190v1.pdf
PWC	https://paperswithcode.com/paper/visual-simplified-characters-emotion-emulator
Repo
Framework

Introduction of Quantification in Frame Semantics


Title	Introduction of Quantification in Frame Semantics
Authors	Valentin D. Richard
Abstract	Feature Structures (FSs) are a widespread tool used for decompositional frameworks of Attribute-Value associations. Even though they thrive in simple systems, they lack a way of representing higher-order entities and relations. This is however needed in Frame Semantics, where semantic dependencies should be able to connect groups of individuals and their properties, especially to model quantification. To answer this issue, this master report introduces wrappings as a way to envelop a sub-FS and treat it as a node. Following the work of [Kallmeyer, Osswald 2013], we extend its syntax, semantics and some properties (translation to FOL, subsumption, unification). We can then expand the proposed pipeline. Lexical minimal model sets are generated from formulas. They unify by FS value equations obtained by LTAG parsing to an underspecified sentence representation. The syntactic approach of quantifiers allows us to use existing methods to produce any possible reading. Finally, we give a transcription to type-logical formulas to interact with the context in the view of dynamic semantics. Supported by ideas of Frame Types, this system provides a workable and tractable tool for higher-order relations with FS.
Tasks
Published	2020-01-25
URL	https://arxiv.org/abs/2002.00720v1
PDF	https://arxiv.org/pdf/2002.00720v1.pdf
PWC	https://paperswithcode.com/paper/introduction-of-quantification-in-frame
Repo
Framework

Differentially Private and Fair Classification via Calibrated Functional Mechanism


Title	Differentially Private and Fair Classification via Calibrated Functional Mechanism
Authors	Jiahao Ding, Xinyue Zhang, Xiaohuan Li, Junyi Wang, Rong Yu, Miao Pan
Abstract	Machine learning is increasingly becoming a powerful tool to make decisions in a wide variety of applications, such as medical diagnosis and autonomous driving. Privacy concerns related to the training data and unfair behaviors of some decisions with regard to certain attributes (e.g., sex, race) are becoming more critical. Thus, constructing a fair machine learning model while simultaneously providing privacy protection becomes a challenging problem. In this paper, we focus on the design of a classification model with fairness and differential privacy guarantees by jointly combining the functional mechanism and decision boundary fairness. In order to enforce $\epsilon$-differential privacy and fairness, we leverage the functional mechanism to add different amounts of Laplace noise regarding different attributes to the polynomial coefficients of the objective function in consideration of the fairness constraint. We further propose a utility-enhancement scheme, called the relaxed functional mechanism, which adds Gaussian noise instead of Laplace noise, hence achieving $(\epsilon,\delta)$-differential privacy. Based on the relaxed functional mechanism, we can design an $(\epsilon,\delta)$-differentially private and fair classification model. Moreover, our theoretical analysis and empirical results demonstrate that our two approaches achieve both fairness and differential privacy while preserving good utility and outperform the state-of-the-art algorithms.
Tasks	Autonomous Driving, Medical Diagnosis
Published	2020-01-14
URL	https://arxiv.org/abs/2001.04958v2
PDF	https://arxiv.org/pdf/2001.04958v2.pdf
PWC	https://paperswithcode.com/paper/differentially-private-and-fair
Repo
Framework
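
The core step named in the abstract above, adding Laplace noise to the polynomial coefficients of an objective function, can be sketched as follows. This is a minimal illustration and not the authors' calibrated mechanism: it applies one uniform noise scale to every coefficient, whereas the paper describes different amounts of noise for different attributes. The function names, the coefficient values, and the sensitivity value are hypothetical; the noise scale `sensitivity / epsilon` is the standard Laplace-mechanism choice.

```python
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturb_coefficients(coeffs, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity / epsilon to each coefficient."""
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in coeffs]

rng = random.Random(0)
coeffs = [0.5, -1.25, 3.0]  # hypothetical polynomial coefficients of an objective
noisy = perturb_coefficients(coeffs, sensitivity=2.0, epsilon=1.0, rng=rng)
print(noisy)
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of utility; the model would then be fit by minimizing the objective built from the perturbed coefficients.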

RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving


Title	RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving
Authors	Peixuan Li, Huaici Zhao, Pengfei Liu, Feidao Cao
Abstract	In this work, we propose an efficient and accurate monocular 3D detection framework in a single shot. Most successful 3D detectors take the projection constraint from the 3D bounding box to the 2D box as an important component. Four edges of a 2D box provide only four constraints and the performance deteriorates dramatically with a small error of the 2D detector. Different from these approaches, our method predicts the nine perspective keypoints of a 3D bounding box in image space, and then utilizes the geometric relationship of 3D and 2D perspectives to recover the dimension, location, and orientation in 3D space. In this method, the properties of the object can be predicted stably even when the estimation of keypoints is very noisy, which enables us to obtain fast detection speed with a small architecture. Training our method only uses the 3D properties of the object without the need for external networks or supervision data. Our method is the first real-time system for monocular image 3D detection while achieving state-of-the-art performance on the KITTI benchmark. Code will be released at https://github.com/Banconxuan/RTM3D.
Tasks	Autonomous Driving
Published	2020-01-10
URL	https://arxiv.org/abs/2001.03343v1
PDF	https://arxiv.org/pdf/2001.03343v1.pdf
PWC	https://paperswithcode.com/paper/rtm3d-real-time-monocular-3d-detection-from
Repo
Framework

VisionNet: A Drivable-space-based Interactive Motion Prediction Network for Autonomous Driving


Title	VisionNet: A Drivable-space-based Interactive Motion Prediction Network for Autonomous Driving
Authors	Yanliang Zhu, Deheng Qian, Dongchun Ren, Huaxia Xia
Abstract	The comprehension of the environmental traffic situation largely ensures the driving safety of autonomous vehicles. Recently, the problem has been investigated by many studies, yet it remains hard to address well due to the limitation of collective influence in complex scenarios. These approaches model the interactions through the spatial relations between the target obstacle and its neighbors. However, they oversimplify the challenge since the training stage of the interactions lacks effective supervision. As a result, these models are far from promising. More intuitively, we transform the problem into calculating the interaction-aware drivable spaces and propose the CNN-based VisionNet for trajectory prediction. The VisionNet accepts a sequence of motion states, i.e., location, velocity, and acceleration, to estimate the future drivable spaces. The reified interactions significantly increase the interpretation ability of the VisionNet and refine the prediction. To further advance the performance, we propose an interactive loss to guide the generation of the drivable spaces. Experiments on multiple public datasets demonstrate the effectiveness of the proposed VisionNet.
Tasks	Autonomous Driving, Autonomous Vehicles, Motion Prediction, Trajectory Prediction
Published	2020-01-08
URL	https://arxiv.org/abs/2001.02354v1
PDF	https://arxiv.org/pdf/2001.02354v1.pdf
PWC	https://paperswithcode.com/paper/visionnet-a-drivable-space-based-interactive
Repo
Framework

Intelligent Roundabout Insertion using Deep Reinforcement Learning


Title	Intelligent Roundabout Insertion using Deep Reinforcement Learning
Authors	Alessandro Paolo Capasso, Giulio Bacchiani, Daniele Molinari
Abstract	An important topic in autonomous driving research is the development of maneuver planning systems. Vehicles have to interact and negotiate with each other so that optimal choices, in terms of time and safety, are taken. For this purpose, we present a maneuver planning module able to negotiate entry into busy roundabouts. The proposed module is based on a neural network trained to predict when and how to enter the roundabout throughout the whole duration of the maneuver. Our model is trained with a novel implementation of A3C, which we will call Delayed A3C (D-A3C), in a synthetic environment where vehicles move in a realistic manner with interaction capabilities. In addition, the system is trained such that agents feature a unique tunable behavior, emulating real world scenarios where drivers have their own driving styles. Similarly, the maneuver can be performed using different aggressiveness levels, which is particularly useful to manage busy scenarios where conservative rule-based policies would result in undefined waits.
Tasks	Autonomous Driving
Published	2020-01-03
URL	https://arxiv.org/abs/2001.00786v1
PDF	https://arxiv.org/pdf/2001.00786v1.pdf
PWC	https://paperswithcode.com/paper/intelligent-roundabout-insertion-using-deep
Repo
Framework