Paper Group ANR 1255
Provable Smoothness Guarantees for Black-Box Variational Inference. An Adaptive and Fast Convergent Approach to Differentially Private Deep Learning. A Kernel Loss for Solving the Bellman Equation. Incorporating External Knowledge into Machine Reading for Generative Question Answering. Financial Market Directional Forecasting With Stacked Denoising …
Provable Smoothness Guarantees for Black-Box Variational Inference
Title | Provable Smoothness Guarantees for Black-Box Variational Inference |
Authors | Justin Domke |
Abstract | Black-box variational inference tries to approximate a complex target distribution through a gradient-based optimization of the parameters of a simpler distribution. Provable convergence guarantees require structural properties of the objective. This paper shows that for location-scale family approximations, if the target is M-Lipschitz smooth, then so is the objective, once the entropy term is excluded. The key proof idea is to describe gradients in a certain inner-product space, thus permitting use of Bessel's inequality. This result gives insight into how to parameterize distributions, gives bounds on the location of the optimal parameters, and is a key ingredient for convergence guarantees. |
Tasks | |
Published | 2019-01-24 |
URL | https://arxiv.org/abs/1901.08431v2 |
https://arxiv.org/pdf/1901.08431v2.pdf | |
PWC | https://paperswithcode.com/paper/provable-smoothness-guarantees-for-black-box |
Repo | |
Framework | |
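The location-scale setup in the abstract lends itself to a small illustration. Below is a minimal sketch, not the paper's code, of the reparameterization gradient for a location-scale family q(z) = m + Cu with a standard Gaussian base; the target's log-density gradient is a stand-in assumption.

```python
import numpy as np

def grad_log_target(z):
    # Hypothetical smooth target: a standard Gaussian, so grad log p(z) = -z.
    return -z

def reparam_gradients(m, C, n_samples=1000, seed=0):
    """Monte Carlo gradients of E_q[log p(z)] w.r.t. location m and scale C."""
    rng = np.random.default_rng(seed)
    d = m.shape[0]
    grad_m = np.zeros(d)
    grad_C = np.zeros((d, d))
    for _ in range(n_samples):
        u = rng.standard_normal(d)
        z = m + C @ u            # location-scale reparameterization
        g = grad_log_target(z)   # chain rule through z = m + C u
        grad_m += g
        grad_C += np.outer(g, u)
    return grad_m / n_samples, grad_C / n_samples

gm, gC = reparam_gradients(np.zeros(2), np.eye(2))
```

Smoothness of log p then transfers to the objective in (m, C), which is the structural property the paper establishes (entropy term aside).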
An Adaptive and Fast Convergent Approach to Differentially Private Deep Learning
Title | An Adaptive and Fast Convergent Approach to Differentially Private Deep Learning |
Authors | Zhiying Xu, Shuyu Shi, Alex X. Liu, Jun Zhao, Lin Chen |
Abstract | With the advent of the era of big data, deep learning has become a prevalent building block in a variety of machine learning and data mining tasks, such as signal processing, network modeling and traffic analysis, to name a few. The massive amount of crowdsourced user data plays a crucial role in the success of deep learning models. However, it has been shown that user data may be inferred from trained neural models and thereby exposed to potential adversaries, which raises information security and privacy concerns. To address this issue, recent studies leverage the technique of differential privacy to design privacy-preserving deep learning algorithms. Albeit successful at privacy protection, differential privacy degrades the performance of neural models. In this paper, we develop ADADP, an adaptive and fast convergent learning algorithm with a provable privacy guarantee. ADADP significantly reduces the privacy cost by improving the convergence speed with an adaptive learning rate, and mitigates the negative effect of differential privacy upon the model accuracy by introducing adaptive noise. The performance of ADADP is evaluated on real-world datasets. Experimental results show that it outperforms state-of-the-art differentially private approaches in terms of both privacy cost and model accuracy. |
Tasks | |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09150v1 |
https://arxiv.org/pdf/1912.09150v1.pdf | |
PWC | https://paperswithcode.com/paper/an-adaptive-and-fast-convergent-approach-to |
Repo | |
Framework | |
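For context, here is a minimal sketch of the differentially private gradient step that methods like ADADP build on: per-example clipping followed by Gaussian noise. The paper's adaptive learning-rate and adaptive-noise rules are not reproduced here; all constants are illustrative.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, seed=0):
    rng = np.random.default_rng(seed)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # bound sensitivity
    mean_grad = np.mean(clipped, axis=0)
    # Gaussian mechanism: noise scaled to the clipping bound and batch size.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)

rng = np.random.default_rng(1)
grads = [rng.standard_normal(5) for _ in range(32)]
params = dp_sgd_step(np.zeros(5), grads)
```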
A Kernel Loss for Solving the Bellman Equation
Title | A Kernel Loss for Solving the Bellman Equation |
Authors | Yihao Feng, Lihong Li, Qiang Liu |
Abstract | Value function learning plays a central role in many state-of-the-art reinforcement-learning algorithms. Many popular algorithms like Q-learning do not optimize any objective function, but are fixed-point iterations of some variant of the Bellman operator that is not necessarily a contraction. As a result, they may easily lose convergence guarantees, as can be observed in practice. In this paper, we propose a novel loss function that can be optimized using standard gradient-based methods without risking divergence. The key advantage is that its gradient can be easily approximated using sampled transitions, avoiding the need for the double samples required by prior algorithms like residual gradient. Our approach may be combined with general function classes such as neural networks, using either on- or off-policy data, and is shown to work reliably and effectively in several benchmarks. |
Tasks | Q-Learning |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10506v3 |
https://arxiv.org/pdf/1905.10506v3.pdf | |
PWC | https://paperswithcode.com/paper/a-kernel-loss-for-solving-the-bellman |
Repo | |
Framework | |
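A small sketch of a kernelized Bellman-residual loss in the spirit of the abstract, estimated as a U-statistic over sampled transitions so that no double samples are needed. Tabular values and an RBF kernel are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def rbf(x, y, bandwidth=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * bandwidth ** 2))

def kernel_bellman_loss(V, transitions, gamma=0.99):
    """transitions: list of (state_features, reward, next_state_index, state_index)."""
    deltas, feats = [], []
    for phi, r, s_next, s in transitions:
        deltas.append(r + gamma * V[s_next] - V[s])  # single-sample TD residual
        feats.append(phi)
    n = len(deltas)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:  # off-diagonal terms avoid the double-sample bias
                loss += deltas[i] * rbf(feats[i], feats[j]) * deltas[j]
    return loss / (n * (n - 1))

V = np.zeros(3)
transitions = [(np.array([0.0]), 1.0, 1, 0),
               (np.array([0.5]), 0.0, 2, 1),
               (np.array([1.0]), 1.0, 0, 2)]
print(kernel_bellman_loss(V, transitions))
```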
Incorporating External Knowledge into Machine Reading for Generative Question Answering
Title | Incorporating External Knowledge into Machine Reading for Generative Question Answering |
Authors | Bin Bi, Chen Wu, Ming Yan, Wei Wang, Jiangnan Xia, Chenliang Li |
Abstract | Commonsense and background knowledge are required for a QA model to answer many nontrivial questions. Unlike existing work on knowledge-aware QA, we focus on the more challenging task of leveraging external knowledge to generate answers in natural language for a given question with context. In this paper, we propose a new neural model, the Knowledge-Enriched Answer Generator (KEAG), which is able to compose a natural answer by exploiting and aggregating evidence from all four available information sources: question, passage, vocabulary and knowledge. During the process of answer generation, KEAG adaptively determines when to utilize symbolic knowledge and which fact from the knowledge is useful. This allows the model to exploit external knowledge that is not explicitly stated in the given text but is relevant for generating the answer. An empirical study on a public answer-generation benchmark demonstrates that KEAG improves answer quality over models without knowledge and over existing knowledge-aware models, confirming its effectiveness in leveraging knowledge. |
Tasks | Question Answering, Reading Comprehension |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.02745v1 |
https://arxiv.org/pdf/1909.02745v1.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-external-knowledge-into-machine |
Repo | |
Framework | |
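As a rough illustration of aggregating evidence from several sources, here is a sketch of a learned gate that mixes per-source output distributions at each decoding step. The gate, the four-source split, and all sizes are hypothetical placeholders, not KEAG itself.

```python
import torch
import torch.nn as nn

class SourceGate(nn.Module):
    def __init__(self, hidden=128, n_sources=4):  # question, passage, vocab, knowledge
        super().__init__()
        self.gate = nn.Linear(hidden, n_sources)

    def forward(self, decoder_state, source_dists):
        # source_dists: (batch, n_sources, vocab) distributions from each source
        weights = torch.softmax(self.gate(decoder_state), dim=-1)
        return (weights.unsqueeze(-1) * source_dists).sum(dim=1)  # mixture over sources

gate = SourceGate()
out = gate(torch.randn(2, 128), torch.softmax(torch.randn(2, 4, 1000), dim=-1))
```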
Financial Market Directional Forecasting With Stacked Denoising Autoencoder
Title | Financial Market Directional Forecasting With Stacked Denoising Autoencoder |
Authors | Shaogao Lv, Yongchao Hou, Hongwei Zhou |
Abstract | Forecasting stock market direction is a fascinating but challenging problem in finance. Although many popular shallow computational methods (such as the backpropagation network and the support vector machine) have been extensively studied, most algorithms have not yet attained a desirable level of applicability. In this paper, we present a deep learning model with a strong ability to generate high-level feature representations for accurate financial prediction. Specifically, a stacked denoising autoencoder (SDAE) from deep learning is applied to predict the daily CSI 300 index, from the Shanghai and Shenzhen Stock Exchanges in China. We use six evaluation criteria to evaluate its performance compared with the backpropagation network and the support vector machine. The experiments show that the deep-learning-based financial model has a significant advantage for the prediction of the CSI 300 index. |
Tasks | Denoising |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00712v1 |
https://arxiv.org/pdf/1912.00712v1.pdf | |
PWC | https://paperswithcode.com/paper/financial-market-directional-forecasting-with |
Repo | |
Framework | |
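A minimal sketch of one denoising-autoencoder layer of the kind stacked in such models; the layer sizes, noise level, and loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, n_in=32, n_hidden=16, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        corrupted = x + self.noise_std * torch.randn_like(x)  # corrupt the input
        return self.decoder(self.encoder(corrupted))          # reconstruct clean x

dae = DenoisingAutoencoder()
x = torch.randn(8, 32)
loss = nn.functional.mse_loss(dae(x), x)  # train to reconstruct the clean input
```

Stacking means pretraining one such layer, freezing its encoder, and training the next layer on the hidden codes before fine-tuning the whole network on the prediction task.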
Riemannian batch normalization for SPD neural networks
Title | Riemannian batch normalization for SPD neural networks |
Authors | Daniel Brooks, Olivier Schwander, Frederic Barbaresco, Jean-Yves Schneider, Matthieu Cord |
Abstract | Covariance matrices have attracted attention for machine learning applications due to their capacity to capture interesting structure in the data. The main challenge is that one needs to take into account the particular geometry of the Riemannian manifold of symmetric positive definite (SPD) matrices they belong to. In the context of deep networks, several architectures for these matrices have recently been proposed. In our article, we introduce a Riemannian batch normalization (batchnorm) algorithm, which generalizes the one used in Euclidean nets. This novel layer makes use of geometric operations on the manifold, notably the Riemannian barycenter, parallel transport and non-linear structured matrix transformations. We derive a new manifold-constrained gradient descent algorithm working in the space of SPD matrices, allowing us to learn the batchnorm layer. We validate our proposed approach with experiments in three different contexts on diverse data types: a drone recognition dataset from radar observations, and emotion and action recognition datasets from video and motion capture data. Experiments show that the Riemannian batchnorm systematically gives better classification performance than leading methods, along with remarkable robustness to a lack of data. |
Tasks | Motion Capture |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.02414v2 |
https://arxiv.org/pdf/1909.02414v2.pdf | |
PWC | https://paperswithcode.com/paper/riemannian-batch-normalization-for-spd-neural |
Repo | |
Framework | |
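To make the geometric operations concrete, below is a sketch of centering a batch of SPD matrices around a barycenter. For simplicity it uses the log-Euclidean mean and a congruence transformation rather than the affine-invariant barycenter and parallel transport used in the paper.

```python
import numpy as np
from scipy.linalg import logm, expm, sqrtm, inv

def log_euclidean_mean(mats):
    # Fréchet mean under the log-Euclidean metric: average in the log domain.
    return expm(np.mean([logm(M) for M in mats], axis=0))

def center_batch(mats):
    G = log_euclidean_mean(mats)
    G_inv_sqrt = inv(sqrtm(G))
    # Congruence by G^{-1/2} maps the barycenter to the identity matrix,
    # the SPD analogue of subtracting the batch mean.
    return [G_inv_sqrt @ M @ G_inv_sqrt for M in mats]

rng = np.random.default_rng(0)
batch = []
for _ in range(4):
    A = rng.standard_normal((3, 3))
    batch.append(A @ A.T + 3 * np.eye(3))  # random SPD matrices
centered = center_batch(batch)
```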
Did You Miss the Sign? A False Negative Alarm System for Traffic Sign Detectors
Title | Did You Miss the Sign? A False Negative Alarm System for Traffic Sign Detectors |
Authors | Quazi Marufur Rahman, Niko Sünderhauf, Feras Dayoub |
Abstract | Object detection is an integral part of an autonomous vehicle for its safety-critical and navigational purposes. Traffic signs as objects play a vital role in guiding such systems. However, if the vehicle fails to locate any critical sign, it might suffer a catastrophic failure. In this paper, we propose an approach to identify traffic signs that have been mistakenly discarded by the object detector. The proposed method raises an alarm when it discovers a failure by the object detector to detect a traffic sign. This approach can be useful for evaluating the performance of the detector during the deployment phase. We trained a single shot multi-box object detector to detect traffic signs and used its internal features to train a separate false negative detector (FND). During deployment, FND decides whether the traffic sign detector (TSD) has missed a sign or not. We use precision and recall to measure the accuracy of FND on two different datasets. At 80% recall, FND achieves 89.9% precision on the Belgian Traffic Sign Detection dataset and 90.8% precision on the German Traffic Sign Recognition Benchmark dataset, respectively. To the best of our knowledge, our method is the first to tackle this critical aspect of false negative detection in robotic vision. Such a fail-safe mechanism for object detection can improve the engagement of robotic vision systems in our daily life. |
Tasks | Object Detection, Traffic Sign Recognition |
Published | 2019-03-15 |
URL | http://arxiv.org/abs/1903.06391v1 |
http://arxiv.org/pdf/1903.06391v1.pdf | |
PWC | https://paperswithcode.com/paper/did-you-miss-the-sign-a-false-negative-alarm |
Repo | |
Framework | |
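The reported numbers correspond to reading precision off the precision-recall curve at a fixed recall. A minimal sketch with synthetic scores and labels:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)  # 1 = the detector actually missed a sign
# Synthetic alarm scores that are loosely correlated with the labels.
scores = (labels * rng.uniform(0.3, 1.0, 1000)
          + (1 - labels) * rng.uniform(0.0, 0.7, 1000))

precision, recall, thresholds = precision_recall_curve(labels, scores)
idx = np.argmin(np.abs(recall - 0.80))  # operating point closest to 80% recall
print(f"precision at ~80% recall: {precision[idx]:.3f}")
```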
On Cycling Risk and Discomfort: Urban Safety Mapping and Bike Route Recommendations
Title | On Cycling Risk and Discomfort: Urban Safety Mapping and Bike Route Recommendations |
Authors | David Castells-Graells, Christopher Salahub, Evangelos Pournaras |
Abstract | Bike usage in Smart Cities is becoming paramount for sustainable urban development. Cycling provides tremendous opportunities for a healthier lifestyle, lower energy consumption and carbon emissions, as well as a reduction of traffic jams. While the number of cyclists increases along with the expansion of bike sharing initiatives and infrastructures, the number of bike accidents rises drastically, threatening to jeopardize the urban bike movement. This paper studies cycling risk and discomfort using a diverse spectrum of data sources about geolocated bike accidents and their severity. Empirical continuous spatial risk estimates are calculated via kernel density contours that map safety in a case study of the city of Zurich. The role of weather, time, accident type and severity is illustrated. Given the predominance of self-caused accidents, an open-source software artifact for personalized route recommendations is introduced. The software is also used to collect open baseline route data that are compared with alternative routes that minimize risk or discomfort. These contributions can provide invaluable insights for urban planners to improve infrastructure. They can also improve the risk awareness of existing cyclists and support new cyclists, such as tourists, in safely exploring a new urban environment by bike. |
Tasks | |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.08775v1 |
https://arxiv.org/pdf/1905.08775v1.pdf | |
PWC | https://paperswithcode.com/paper/190508775 |
Repo | |
Framework | |
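A minimal sketch of the kernel-density risk surface described in the abstract, using synthetic accident coordinates in place of the Zurich data:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic (longitude, latitude) accident locations clustered near a city center.
accidents = rng.normal(loc=[8.54, 47.37], scale=0.02, size=(500, 2))

kde = gaussian_kde(accidents.T)  # empirical continuous spatial risk estimate
lon, lat = np.meshgrid(np.linspace(8.48, 8.60, 100),
                       np.linspace(47.32, 47.42, 100))
risk = kde(np.vstack([lon.ravel(), lat.ravel()])).reshape(lon.shape)
# Contours of `risk` give the safety map, e.g. plt.contour(lon, lat, risk).
```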
Improving BERT Fine-tuning with Embedding Normalization
Title | Improving BERT Fine-tuning with Embedding Normalization |
Authors | Wenxuan Zhou, Junyi Du, Xiang Ren |
Abstract | Large pre-trained sentence encoders like BERT have opened a new chapter in natural language processing. A common practice for applying pre-trained BERT to sequence classification tasks (e.g., classification of sentences or sentence pairs) is to feed the embedding of the [CLS] token (in the last layer) to a task-specific classification layer, and then fine-tune the model parameters of BERT and the classifier jointly. In this paper, we conduct a systematic analysis over several sequence classification datasets to examine the embedding values of the [CLS] token before the fine-tuning phase, and present the biased embedding distribution issue: embedding values of [CLS] concentrate on a few dimensions and are non-zero centered. Such a biased embedding poses a challenge to the optimization process during fine-tuning, as gradients of the [CLS] embedding may explode and result in degraded model performance. We further propose several simple yet effective normalization methods to modify the [CLS] embedding during fine-tuning. Compared with the previous practice, a neural classification model with the normalized embedding shows improvements on several text classification tasks, demonstrating the effectiveness of our method. |
Tasks | Text Classification |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03918v2 |
https://arxiv.org/pdf/1911.03918v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-bert-fine-tuning-with-embedding |
Repo | |
Framework | |
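A minimal sketch of the normalization idea: zero-center and rescale the [CLS] embedding before the task classifier. The paper proposes several variants; the layer-norm-style standardization below is just one illustrative choice.

```python
import torch
import torch.nn as nn

class NormalizedClsClassifier(nn.Module):
    def __init__(self, hidden=768, n_labels=2):
        super().__init__()
        self.classifier = nn.Linear(hidden, n_labels)

    def forward(self, cls_embedding):                       # (batch, hidden)
        mu = cls_embedding.mean(dim=-1, keepdim=True)
        sigma = cls_embedding.std(dim=-1, keepdim=True)
        normalized = (cls_embedding - mu) / (sigma + 1e-6)  # remove the bias
        return self.classifier(normalized)

logits = NormalizedClsClassifier()(torch.randn(4, 768))
```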
Gaze360: Physically Unconstrained Gaze Estimation in the Wild
Title | Gaze360: Physically Unconstrained Gaze Estimation in the Wild |
Authors | Petr Kellnhofer, Adria Recasens, Simon Stent, Wojciech Matusik, Antonio Torralba |
Abstract | Understanding where people are looking is an informative social cue. In this work, we present Gaze360, a large-scale gaze-tracking dataset and method for robust 3D gaze estimation in unconstrained images. Our dataset consists of 238 subjects in indoor and outdoor environments with labelled 3D gaze across a wide range of head poses and distances. It is the largest publicly available dataset of its kind by both subject count and variety, made possible by a simple and efficient collection method. Our proposed 3D gaze model extends existing models to include temporal information and to directly output an estimate of gaze uncertainty. We demonstrate the benefits of our model via an ablation study, and show its generalization performance via a cross-dataset evaluation against other recent gaze benchmark datasets. We furthermore propose a simple self-supervised approach to improve cross-dataset domain adaptation. Finally, we demonstrate an application of our model for estimating customer attention in a supermarket setting. Our dataset and models are available at http://gaze360.csail.mit.edu . |
Tasks | Domain Adaptation, Gaze Estimation |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.10088v1 |
https://arxiv.org/pdf/1910.10088v1.pdf | |
PWC | https://paperswithcode.com/paper/gaze360-physically-unconstrained-gaze |
Repo | |
Framework | |
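A sketch of a regression head that outputs both a gaze direction and an uncertainty estimate, trained here with a Gaussian negative log-likelihood; this only illustrates the "gaze plus uncertainty" output and does not reproduce the paper's exact loss.

```python
import torch
import torch.nn as nn

class GazeHead(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mean = nn.Linear(feat_dim, 3)     # 3D gaze direction
        self.log_var = nn.Linear(feat_dim, 1)  # per-sample uncertainty

    def forward(self, features):
        return self.mean(features), self.log_var(features)

def gaze_nll(pred, log_var, target):
    # Larger predicted variance down-weights the squared error but is penalized.
    return (torch.exp(-log_var) * (pred - target).pow(2).sum(-1, keepdim=True)
            + log_var).mean()

head = GazeHead()
feats, target = torch.randn(8, 256), torch.randn(8, 3)
mu, lv = head(feats)
loss = gaze_nll(mu, lv, target)
```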
Design Smell Analysis for Developing and Established Open Source Java Software
Title | Design Smell Analysis for Developing and Established Open Source Java Software |
Authors | Asif Imran, Tevfik Kosar |
Abstract | Software design smells are design attributes that violate fundamental design principles. Design smells are a key cause of design debt. Although the activities of design smell identification and measurement are predominantly considered in the current literature, comparatively little work has examined which design smells occur more frequently in newly developing software and which are more dominant in established software. This research describes a mechanism for identifying the design smells that are more prevalent in developing and established software, respectively. A tool is provided for design smell detection by analyzing large volumes of source code. More specifically, 164,609 Lines of Code (LoC) and 5,712 class files of six developing and 244,930 LoC and 12,048 class files of five established open-source Java software projects are analyzed. The obtained results show that out of the 4,020 occurrences of smells detected for nine preselected types of design smells, 1,643 design smells were found in developing software, mainly consisting of four specific types of smells. For established software, 2,397 design smells were observed, predominantly consisting of four other types of smells. The remaining design smell was equally prevalent in both developing and established software. The tool achieved precision values ranging from 72.9% to 84.1%. |
Tasks | |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.05428v1 |
https://arxiv.org/pdf/1910.05428v1.pdf | |
PWC | https://paperswithcode.com/paper/design-smell-analysis-for-developing-and |
Repo | |
Framework | |
Adaptive Deep Learning for High-Dimensional Hamilton-Jacobi-Bellman Equations
Title | Adaptive Deep Learning for High-Dimensional Hamilton-Jacobi-Bellman Equations |
Authors | Tenavi Nakamura-Zimmerer, Qi Gong, Wei Kang |
Abstract | Computing optimal feedback controls for nonlinear systems generally requires solving Hamilton-Jacobi-Bellman (HJB) equations, which are notoriously difficult when the state dimension is large. Existing strategies for high-dimensional problems generally rely on specific, restrictive problem structures, or are valid only locally around some nominal trajectory. In this paper, we propose a data-driven method to approximate semi-global solutions to HJB equations for general high-dimensional nonlinear systems and compute optimal feedback controls in real-time. To accomplish this, we model solutions to HJB equations with neural networks (NNs) trained on data generated without discretizing the state space. Training is made more effective and data-efficient by leveraging the known physics of the problem and using the partially-trained NN to aid in adaptive data generation. We demonstrate the effectiveness of our method by learning solutions to HJB equations corresponding to the attitude control of a six-dimensional nonlinear rigid body, and nonlinear systems of dimension up to 30 arising from the stabilization of a Burgers’-type partial differential equation. The trained NNs are then used for real-time optimal feedback control of these systems. |
Tasks | |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05317v3 |
https://arxiv.org/pdf/1907.05317v3.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-deep-learning-for-high-dimensional |
Repo | |
Framework | |
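A minimal sketch of physics-informed supervision in this spirit: fit a network V(x) to value data and, via automatic differentiation, to gradient (costate) data. The data below are synthetic placeholders for the paper's BVP-generated training set, and the loss weighting is an assumption.

```python
import torch
import torch.nn as nn

# Small value-function network for a 6-dimensional state, as in attitude control.
value_net = nn.Sequential(nn.Linear(6, 64), nn.Tanh(), nn.Linear(64, 1))

def hjb_fit_loss(x, v_data, costate_data, mu=0.1):
    x = x.requires_grad_(True)
    v = value_net(x)
    grad_v, = torch.autograd.grad(v.sum(), x, create_graph=True)
    value_loss = (v - v_data).pow(2).mean()
    gradient_loss = (grad_v - costate_data).pow(2).sum(-1).mean()
    return value_loss + mu * gradient_loss  # gradient term encodes the physics

x = torch.randn(32, 6)
loss = hjb_fit_loss(x, torch.randn(32, 1), torch.randn(32, 6))
loss.backward()
```

Once trained, the optimal feedback control is recovered in real time from the network's gradient at the current state, avoiding any state-space discretization.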
Optimal Average-Case Reductions to Sparse PCA: From Weak Assumptions to Strong Hardness
Title | Optimal Average-Case Reductions to Sparse PCA: From Weak Assumptions to Strong Hardness |
Authors | Matthew Brennan, Guy Bresler |
Abstract | In the past decade, sparse principal component analysis has emerged as an archetypal problem for illustrating statistical-computational tradeoffs. This trend has largely been driven by a line of research aiming to characterize the average-case complexity of sparse PCA through reductions from the planted clique (PC) conjecture - which conjectures that there is no polynomial-time algorithm to detect a planted clique of size $K = o(N^{1/2})$ in $\mathcal{G}(N, \frac{1}{2})$. All previous reductions to sparse PCA either fail to show tight computational lower bounds matching existing algorithms or show lower bounds for formulations of sparse PCA other than its canonical generative model, the spiked covariance model. Also, these lower bounds all quickly degrade with the exponent in the PC conjecture. Specifically, when only given the PC conjecture up to $K = o(N^\alpha)$ where $\alpha < 1/2$, there is no sparsity level $k$ at which these lower bounds remain tight. If $\alpha \le 1/3$ these reductions fail to even show the existence of a statistical-computational tradeoff at any sparsity $k$. We give a reduction from PC that yields the first full characterization of the computational barrier in the spiked covariance model, providing tight lower bounds at all sparsities $k$. We also show the surprising result that weaker forms of the PC conjecture up to clique size $K = o(N^\alpha)$ for any given $\alpha \in (0, 1/2]$ imply tight computational lower bounds for sparse PCA at sparsities $k = o(n^{\alpha/3})$. This shows that even a mild improvement in the signal strength needed by the best known polynomial-time sparse PCA algorithms would imply that the hardness threshold for PC is subpolynomial. This is the first instance of a suboptimal hardness assumption implying optimal lower bounds for another problem in unsupervised learning. |
Tasks | |
Published | 2019-02-20 |
URL | http://arxiv.org/abs/1902.07380v1 |
http://arxiv.org/pdf/1902.07380v1.pdf | |
PWC | https://paperswithcode.com/paper/optimal-average-case-reductions-to-sparse-pca |
Repo | |
Framework | |
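For reference, a sketch of sampling from the spiked covariance model discussed above: n observations from N(0, I + theta * v v^T) with a k-sparse unit spike v; all parameter values are arbitrary.

```python
import numpy as np

def sample_spiked_covariance(n=200, d=100, k=10, theta=1.5, seed=0):
    rng = np.random.default_rng(seed)
    v = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)
    v[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)  # unit k-sparse spike
    cov = np.eye(d) + theta * np.outer(v, v)
    return rng.multivariate_normal(np.zeros(d), cov, size=n), v

X, spike = sample_spiked_covariance()
```

The detection problem is to distinguish such samples from pure N(0, I) noise; the paper's reductions map planted clique instances to exactly this distribution family.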
A Two-stream End-to-End Deep Learning Network for Recognizing Atypical Visual Attention in Autism Spectrum Disorder
Title | A Two-stream End-to-End Deep Learning Network for Recognizing Atypical Visual Attention in Autism Spectrum Disorder |
Authors | Jin Xie, Longfei Wang, Paula Webster, Yang Yao, Jiayao Sun, Shuo Wang, Huihui Zhou |
Abstract | Eye movements have been widely investigated to study atypical visual attention in Autism Spectrum Disorder (ASD). The majority of these studies have focused on a limited set of eye movement features via statistical comparisons between ASD and Typically Developing (TD) groups, which makes it difficult to accurately separate ASD from TD at the individual level. Deep learning has been highly successful in overcoming this issue by automatically extracting features important for classification through a data-driven learning process. However, there is still a lack of end-to-end deep learning frameworks for recognizing abnormal attention in ASD. In this study, we developed a novel two-stream deep learning network for this recognition based on 700 images and the corresponding eye movement patterns of ASD and TD, and obtained an accuracy of 0.95, which is higher than the previous state of the art. We next characterized contributions to the classification at the single-image level and the non-linear integration of this single-image-level information during classification. Moreover, we identified a group of pixel-level visual features within these images with greater impact on the classification. Together, this two-stream deep learning network provides a novel and powerful tool to recognize and understand abnormal visual attention in ASD. |
Tasks | |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11393v1 |
https://arxiv.org/pdf/1911.11393v1.pdf | |
PWC | https://paperswithcode.com/paper/a-two-stream-end-to-end-deep-learning-network |
Repo | |
Framework | |
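A rough sketch of a two-stream classifier that fuses an image stream with an eye-movement stream; the encoders, feature sizes, and fusion by concatenation are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TwoStreamNet(nn.Module):
    def __init__(self, img_dim=512, gaze_dim=64):
        super().__init__()
        self.img_stream = nn.Sequential(nn.Linear(img_dim, 128), nn.ReLU())
        self.gaze_stream = nn.Sequential(nn.Linear(gaze_dim, 128), nn.ReLU())
        self.classifier = nn.Linear(256, 2)  # ASD vs. typically developing

    def forward(self, img_feat, gaze_feat):
        fused = torch.cat([self.img_stream(img_feat),
                           self.gaze_stream(gaze_feat)], dim=-1)
        return self.classifier(fused)

logits = TwoStreamNet()(torch.randn(4, 512), torch.randn(4, 64))
```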
On the Turing Completeness of Modern Neural Network Architectures
Title | On the Turing Completeness of Modern Neural Network Architectures |
Authors | Jorge Pérez, Javier Marinković, Pablo Barceló |
Abstract | Alternatives to recurrent neural networks, in particular, architectures based on attention or convolutions, have been gaining momentum for processing input sequences. In spite of their relevance, the computational properties of these alternatives have not yet been fully explored. We study the computational power of two of the most paradigmatic architectures exemplifying these mechanisms: the Transformer (Vaswani et al., 2017) and the Neural GPU (Kaiser & Sutskever, 2016). We show both models to be Turing complete exclusively based on their capacity to compute and access internal dense representations of the data. In particular, neither the Transformer nor the Neural GPU requires access to an external memory to become Turing complete. Our study also reveals some minimal sets of elements needed to obtain these completeness results. |
Tasks | |
Published | 2019-01-10 |
URL | http://arxiv.org/abs/1901.03429v1 |
http://arxiv.org/pdf/1901.03429v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-turing-completeness-of-modern-neural |
Repo | |
Framework | |