Paper Group ANR 1255
Provable Smoothness Guarantees for Black-Box Variational Inference. An Adaptive and Fast Convergent Approach to Differentially Private Deep Learning. A Kernel Loss for Solving the Bellman Equation. Incorporating External Knowledge into Machine Reading for Generative Question Answering. Financial Market Directional Forecasting With Stacked Denoising …
Provable Smoothness Guarantees for Black-Box Variational Inference
Title | Provable Smoothness Guarantees for Black-Box Variational Inference |
Authors | Justin Domke |
Abstract | Black-box variational inference tries to approximate a complex target distribution through a gradient-based optimization of the parameters of a simpler distribution. Provable convergence guarantees require structural properties of the objective. This paper shows that for location-scale family approximations, if the target is M-Lipschitz smooth, then so is the objective, once the entropy term is excluded. The key proof idea is to describe gradients in a certain inner-product space, thus permitting use of Bessel's inequality. This result gives insight into how to parameterize distributions, gives bounds on the location of the optimal parameters, and is a key ingredient for convergence guarantees. |
Tasks | |
Published | 2019-01-24 |
URL | https://arxiv.org/abs/1901.08431v2 |
https://arxiv.org/pdf/1901.08431v2.pdf | |
PWC | https://paperswithcode.com/paper/provable-smoothness-guarantees-for-black-box |
Repo | |
Framework | |
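The location-scale setup in the abstract lends itself to a small illustration. Below is a minimal sketch, not the paper's code, of the reparameterization gradient for a location-scale family q(z) = m + Cu with a standard Gaussian base; the target's log-density gradient is a stand-in assumption.

```python
import numpy as np

def grad_log_target(z):
    # Hypothetical smooth target: a standard Gaussian, so grad log p(z) = -z.
    return -z

def reparam_gradients(m, C, n_samples=1000, seed=0):
    """Monte Carlo gradients of E_q[log p(z)] w.r.t. location m and scale C."""
    rng = np.random.default_rng(seed)
    d = m.shape[0]
    grad_m = np.zeros(d)
    grad_C = np.zeros((d, d))
    for _ in range(n_samples):
        u = rng.standard_normal(d)
        z = m + C @ u            # location-scale reparameterization
        g = grad_log_target(z)   # chain rule through z = m + C u
        grad_m += g
        grad_C += np.outer(g, u)
    return grad_m / n_samples, grad_C / n_samples

gm, gC = reparam_gradients(np.zeros(2), np.eye(2))
```

Smoothness of log p then transfers to the objective in (m, C), which is the structural property the paper establishes (entropy term aside).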
An Adaptive and Fast Convergent Approach to Differentially Private Deep Learning
Title | An Adaptive and Fast Convergent Approach to Differentially Private Deep Learning |
Authors | Zhiying Xu, Shuyu Shi, Alex X. Liu, Jun Zhao, Lin Chen |
Abstract | With the advent of the era of big data, deep learning has become a prevalent building block in a variety of machine learning and data mining tasks, such as signal processing, network modeling and traffic analysis, to name a few. The massive amount of crowdsourced user data plays a crucial role in the success of deep learning models. However, it has been shown that user data may be inferred from trained neural models and thereby exposed to potential adversaries, which raises information security and privacy concerns. To address this issue, recent studies leverage the technique of differential privacy to design privacy-preserving deep learning algorithms. Albeit successful at privacy protection, differential privacy degrades the performance of neural models. In this paper, we develop ADADP, an adaptive and fast convergent learning algorithm with a provable privacy guarantee. ADADP significantly reduces the privacy cost by improving the convergence speed with an adaptive learning rate, and mitigates the negative effect of differential privacy upon the model accuracy by introducing adaptive noise. The performance of ADADP is evaluated on real-world datasets. Experimental results show that it outperforms state-of-the-art differentially private approaches in terms of both privacy cost and model accuracy. |
Tasks | |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09150v1 |
https://arxiv.org/pdf/1912.09150v1.pdf | |
PWC | https://paperswithcode.com/paper/an-adaptive-and-fast-convergent-approach-to |
Repo | |
Framework | |
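For context, here is a minimal sketch of the differentially private gradient step that methods like ADADP build on: per-example clipping followed by Gaussian noise. The paper's adaptive learning-rate and adaptive-noise rules are not reproduced here; all constants are illustrative.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, seed=0):
    rng = np.random.default_rng(seed)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # bound sensitivity
    mean_grad = np.mean(clipped, axis=0)
    # Gaussian mechanism: noise scaled to the clipping bound and batch size.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)

rng = np.random.default_rng(1)
grads = [rng.standard_normal(5) for _ in range(32)]
params = dp_sgd_step(np.zeros(5), grads)
```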
A Kernel Loss for Solving the Bellman Equation
Title | A Kernel Loss for Solving the Bellman Equation |
Authors | Yihao Feng, Lihong Li, Qiang Liu |
Abstract | Value function learning plays a central role in many state-of-the-art reinforcement-learning algorithms. Many popular algorithms like Q-learning do not optimize any objective function, but are fixed-point iterations of some variant of the Bellman operator that is not necessarily a contraction. As a result, they may easily lose convergence guarantees, as can be observed in practice. In this paper, we propose a novel loss function that can be optimized using standard gradient-based methods without risking divergence. The key advantage is that its gradient can be easily approximated using sampled transitions, avoiding the need for the double samples required by prior algorithms like residual gradient. Our approach may be combined with general function classes such as neural networks, using either on- or off-policy data, and is shown to work reliably and effectively in several benchmarks. |
Tasks | Q-Learning |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10506v3 |
https://arxiv.org/pdf/1905.10506v3.pdf | |
PWC | https://paperswithcode.com/paper/a-kernel-loss-for-solving-the-bellman |
Repo | |
Framework | |
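A small sketch of a kernelized Bellman-residual loss in the spirit of the abstract, estimated as a U-statistic over sampled transitions so that no double samples are needed. Tabular values and an RBF kernel are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def rbf(x, y, bandwidth=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * bandwidth ** 2))

def kernel_bellman_loss(V, transitions, gamma=0.99):
    """transitions: list of (state_features, reward, next_state_index, state_index)."""
    deltas, feats = [], []
    for phi, r, s_next, s in transitions:
        deltas.append(r + gamma * V[s_next] - V[s])  # single-sample TD residual
        feats.append(phi)
    n = len(deltas)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:  # off-diagonal terms avoid the double-sample bias
                loss += deltas[i] * rbf(feats[i], feats[j]) * deltas[j]
    return loss / (n * (n - 1))

V = np.zeros(3)
transitions = [(np.array([0.0]), 1.0, 1, 0),
               (np.array([0.5]), 0.0, 2, 1),
               (np.array([1.0]), 1.0, 0, 2)]
print(kernel_bellman_loss(V, transitions))
```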
Incorporating External Knowledge into Machine Reading for Generative Question Answering
Title | Incorporating External Knowledge into Machine Reading for Generative Question Answering |
Authors | Bin Bi, Chen Wu, Ming Yan, Wei Wang, Jiangnan Xia, Chenliang Li |
Abstract | Commonsense and background knowledge are required for a QA model to answer many nontrivial questions. Unlike existing work on knowledge-aware QA, we focus on the more challenging task of leveraging external knowledge to generate answers in natural language for a given question with context. In this paper, we propose a new neural model, the Knowledge-Enriched Answer Generator (KEAG), which is able to compose a natural answer by exploiting and aggregating evidence from all four available information sources: question, passage, vocabulary and knowledge. During the process of answer generation, KEAG adaptively determines when to utilize symbolic knowledge and which fact from the knowledge is useful. This allows the model to exploit external knowledge that is not explicitly stated in the given text but is relevant for generating the answer. An empirical study on a public answer-generation benchmark demonstrates that KEAG improves answer quality over models without knowledge and over existing knowledge-aware models, confirming its effectiveness in leveraging knowledge. |
Tasks | Question Answering, Reading Comprehension |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.02745v1 |
https://arxiv.org/pdf/1909.02745v1.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-external-knowledge-into-machine |
Repo | |
Framework | |
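As a rough illustration of aggregating evidence from several sources, here is a sketch of a learned gate that mixes per-source output distributions at each decoding step. The gate, the four-source split, and all sizes are hypothetical placeholders, not KEAG itself.

```python
import torch
import torch.nn as nn

class SourceGate(nn.Module):
    def __init__(self, hidden=128, n_sources=4):  # question, passage, vocab, knowledge
        super().__init__()
        self.gate = nn.Linear(hidden, n_sources)

    def forward(self, decoder_state, source_dists):
        # source_dists: (batch, n_sources, vocab) distributions from each source
        weights = torch.softmax(self.gate(decoder_state), dim=-1)
        return (weights.unsqueeze(-1) * source_dists).sum(dim=1)  # mixture over sources

gate = SourceGate()
out = gate(torch.randn(2, 128), torch.softmax(torch.randn(2, 4, 1000), dim=-1))
```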
Financial Market Directional Forecasting With Stacked Denoising Autoencoder
Title | Financial Market Directional Forecasting With Stacked Denoising Autoencoder |
Authors | Shaogao Lv, Yongchao Hou, Hongwei Zhou |
Abstract | Forecasting stock market direction is a fascinating but challenging problem in finance. Although many popular shallow computational methods (such as the backpropagation network and the support vector machine) have been extensively studied, most algorithms have not yet attained a desirable level of applicability. In this paper, we present a deep learning model with a strong ability to generate high-level feature representations for accurate financial prediction. Specifically, a stacked denoising autoencoder (SDAE) from deep learning is applied to predict the daily CSI 300 index, from the Shanghai and Shenzhen Stock Exchanges in China. We use six evaluation criteria to evaluate its performance compared with the backpropagation network and the support vector machine. The experiments show that the deep-learning-based financial model has a significant advantage for the prediction of the CSI 300 index. |
Tasks | Denoising |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00712v1 |
https://arxiv.org/pdf/1912.00712v1.pdf | |
PWC | https://paperswithcode.com/paper/financial-market-directional-forecasting-with |
Repo | |
Framework | |
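A minimal sketch of one denoising-autoencoder layer of the kind stacked in such models; the layer sizes, noise level, and loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, n_in=32, n_hidden=16, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        corrupted = x + self.noise_std * torch.randn_like(x)  # corrupt the input
        return self.decoder(self.encoder(corrupted))          # reconstruct clean x

dae = DenoisingAutoencoder()
x = torch.randn(8, 32)
loss = nn.functional.mse_loss(dae(x), x)  # train to reconstruct the clean input
```

Stacking means pretraining one such layer, freezing its encoder, and training the next layer on the hidden codes before fine-tuning the whole network on the prediction task.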
Riemannian batch normalization for SPD neural networks
Title | Riemannian batch normalization for SPD neural networks |
Authors | Daniel Brooks, Olivier Schwander, Frederic Barbaresco, Jean-Yves Schneider, Matthieu Cord |
Abstract | Covariance matrices have attracted attention for machine learning applications due to their capacity to capture interesting structure in the data. The main challenge is that one needs to take into account the particular geometry of the Riemannian manifold of symmetric positive definite (SPD) matrices they belong to. In the context of deep networks, several architectures for these matrices have recently been proposed. In our article, we introduce a Riemannian batch normalization (batchnorm) algorithm, which generalizes the one used in Euclidean nets. This novel layer makes use of geometric operations on the manifold, notably the Riemannian barycenter, parallel transport and non-linear structured matrix transformations. We derive a new manifold-constrained gradient descent algorithm working in the space of SPD matrices, allowing us to learn the batchnorm layer. We validate our proposed approach with experiments in three different contexts on diverse data types: a drone recognition dataset from radar observations, and emotion and action recognition datasets from video and motion capture data. Experiments show that the Riemannian batchnorm systematically gives better classification performance than leading methods, along with remarkable robustness to a lack of data. |
Tasks | Motion Capture |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.02414v2 |
https://arxiv.org/pdf/1909.02414v2.pdf | |
PWC | https://paperswithcode.com/paper/riemannian-batch-normalization-for-spd-neural |
Repo | |
Framework | |
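To make the geometric operations concrete, below is a sketch of centering a batch of SPD matrices around a barycenter. For simplicity it uses the log-Euclidean mean and a congruence transformation rather than the affine-invariant barycenter and parallel transport used in the paper.

```python
import numpy as np
from scipy.linalg import logm, expm, sqrtm, inv

def log_euclidean_mean(mats):
    # Fréchet mean under the log-Euclidean metric: average in the log domain.
    return expm(np.mean([logm(M) for M in mats], axis=0))

def center_batch(mats):
    G = log_euclidean_mean(mats)
    G_inv_sqrt = inv(sqrtm(G))
    # Congruence by G^{-1/2} maps the barycenter to the identity matrix,
    # the SPD analogue of subtracting the batch mean.
    return [G_inv_sqrt @ M @ G_inv_sqrt for M in mats]

rng = np.random.default_rng(0)
batch = []
for _ in range(4):
    A = rng.standard_normal((3, 3))
    batch.append(A @ A.T + 3 * np.eye(3))  # random SPD matrices
centered = center_batch(batch)
```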
Did You Miss the Sign? A False Negative Alarm System for Traffic Sign Detectors
Title | Did You Miss the Sign? A False Negative Alarm System for Traffic Sign Detectors |
Authors | Quazi Marufur Rahman, Niko Sünderhauf, Feras Dayoub |
Abstract | Object detection is an integral part of an autonomous vehicle for its safety-critical and navigational purposes. Traffic signs as objects play a vital role in guiding such systems. However, if the vehicle fails to locate any critical sign, it might suffer a catastrophic failure. In this paper, we propose an approach to identify traffic signs that have been mistakenly discarded by the object detector. The proposed method raises an alarm when it discovers a failure by the object detector to detect a traffic sign. This approach can be useful for evaluating the performance of the detector during the deployment phase. We trained a single shot multi-box object detector to detect traffic signs and used its internal features to train a separate false negative detector (FND). During deployment, FND decides whether the traffic sign detector (TSD) has missed a sign or not. We use precision and recall to measure the accuracy of FND on two different datasets. At 80% recall, FND achieves 89.9% precision on the Belgian Traffic Sign Detection dataset and 90.8% precision on the German Traffic Sign Recognition Benchmark dataset, respectively. To the best of our knowledge, our method is the first to tackle this critical aspect of false negative detection in robotic vision. Such a fail-safe mechanism for object detection can improve the engagement of robotic vision systems in our daily life. |
Tasks | Object Detection, Traffic Sign Recognition |
Published | 2019-03-15 |
URL | http://arxiv.org/abs/1903.06391v1 |
http://arxiv.org/pdf/1903.06391v1.pdf | |
PWC | https://paperswithcode.com/paper/did-you-miss-the-sign-a-false-negative-alarm |
Repo | |
Framework | |
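The reported numbers correspond to reading precision off the precision-recall curve at a fixed recall. A minimal sketch with synthetic scores and labels:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)  # 1 = the detector actually missed a sign
# Synthetic alarm scores that are loosely correlated with the labels.
scores = (labels * rng.uniform(0.3, 1.0, 1000)
          + (1 - labels) * rng.uniform(0.0, 0.7, 1000))

precision, recall, thresholds = precision_recall_curve(labels, scores)
idx = np.argmin(np.abs(recall - 0.80))  # operating point closest to 80% recall
print(f"precision at ~80% recall: {precision[idx]:.3f}")
```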
On Cycling Risk and Discomfort: Urban Safety Mapping and Bike Route Recommendations
Title | On Cycling Risk and Discomfort: Urban Safety Mapping and Bike Route Recommendations |
Authors | David Castells-Graells, Christopher Salahub, Evangelos Pournaras |
Abstract | Bike usage in Smart Cities is becoming paramount for sustainable urban development. Cycling provides tremendous opportunities for a healthier lifestyle, lower energy consumption and carbon emissions, as well as a reduction of traffic jams. While the number of cyclists increases along with the expansion of bike sharing initiatives and infrastructures, the number of bike accidents rises drastically, threatening to jeopardize the urban bike movement. This paper studies cycling risk and discomfort using a diverse spectrum of data sources about geolocated bike accidents and their severity. Empirical continuous spatial risk estimates are calculated via kernel density contours that map safety in a case study of the city of Zurich. The role of weather, time, accident type and severity is illustrated. Given the predominance of self-caused accidents, an open-source software artifact for personalized route recommendations is introduced. The software is also used to collect open baseline route data that are compared with alternative routes that minimize risk or discomfort. These contributions can provide invaluable insights for urban planners to improve infrastructure. They can also improve the risk awareness of existing cyclists and support new cyclists, such as tourists, in safely exploring a new urban environment by bike. |
Tasks | |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.08775v1 |
https://arxiv.org/pdf/1905.08775v1.pdf | |
PWC | https://paperswithcode.com/paper/190508775 |
Repo | |
Framework | |
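A minimal sketch of the kernel-density risk surface described in the abstract, using synthetic accident coordinates in place of the Zurich data:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic (longitude, latitude) accident locations clustered near a city center.
accidents = rng.normal(loc=[8.54, 47.37], scale=0.02, size=(500, 2))

kde = gaussian_kde(accidents.T)  # empirical continuous spatial risk estimate
lon, lat = np.meshgrid(np.linspace(8.48, 8.60, 100),
                       np.linspace(47.32, 47.42, 100))
risk = kde(np.vstack([lon.ravel(), lat.ravel()])).reshape(lon.shape)
# Contours of `risk` give the safety map, e.g. plt.contour(lon, lat, risk).
```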
Improving BERT Fine-tuning with Embedding Normalization
Title | Improving BERT Fine-tuning with Embedding Normalization |
Authors | Wenxuan Zhou, Junyi Du, Xiang Ren |
Abstract | Large pre-trained sentence encoders like BERT have opened a new chapter in natural language processing. A common practice for applying pre-trained BERT to sequence classification tasks (e.g., classification of sentences or sentence pairs) is to feed the embedding of the [CLS] token (in the last layer) to a task-specific classification layer, and then fine-tune the model parameters of BERT and the classifier jointly. In this paper, we conduct a systematic analysis over several sequence classification datasets to examine the embedding values of the [CLS] token before the fine-tuning phase, and present the biased embedding distribution issue: embedding values of [CLS] concentrate on a few dimensions and are non-zero centered. Such a biased embedding poses a challenge to the optimization process during fine-tuning, as gradients of the [CLS] embedding may explode and result in degraded model performance. We further propose several simple yet effective normalization methods to modify the [CLS] embedding during fine-tuning. Compared with the previous practice, a neural classification model with the normalized embedding shows improvements on several text classification tasks, demonstrating the effectiveness of our method. |
Tasks | Text Classification |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03918v2 |
https://arxiv.org/pdf/1911.03918v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-bert-fine-tuning-with-embedding |
Repo | |
Framework | |
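A minimal sketch of the normalization idea: zero-center and rescale the [CLS] embedding before the task classifier. The paper proposes several variants; the layer-norm-style standardization below is just one illustrative choice.

```python
import torch
import torch.nn as nn

class NormalizedClsClassifier(nn.Module):
    def __init__(self, hidden=768, n_labels=2):
        super().__init__()
        self.classifier = nn.Linear(hidden, n_labels)

    def forward(self, cls_embedding):                       # (batch, hidden)
        mu = cls_embedding.mean(dim=-1, keepdim=True)
        sigma = cls_embedding.std(dim=-1, keepdim=True)
        normalized = (cls_embedding - mu) / (sigma + 1e-6)  # remove the bias
        return self.classifier(normalized)

logits = NormalizedClsClassifier()(torch.randn(4, 768))
```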
Gaze360: Physically Unconstrained Gaze Estimation in the Wild
Title | Gaze360: Physically Unconstrained Gaze Estimation in the Wild |
Authors | Petr Kellnhofer, Adria Recasens, Simon Stent, Wojciech Matusik, Antonio Torralba |
Abstract | Understanding where people are looking is an informative social cue. In this work, we present Gaze360, a large-scale gaze-tracking dataset and method for robust 3D gaze estimation in unconstrained images. Our dataset consists of 238 subjects in indoor and outdoor environments with labelled 3D gaze across a wide range of head poses and distances. It is the largest publicly available dataset of its kind by both subject count and variety, made possible by a simple and efficient collection method. Our proposed 3D gaze model extends existing models to include temporal information and to directly output an estimate of gaze uncertainty. We demonstrate the benefits of our model via an ablation study, and show its generalization performance via a cross-dataset evaluation against other recent gaze benchmark datasets. We furthermore propose a simple self-supervised approach to improve cross-dataset domain adaptation. Finally, we demonstrate an application of our model for estimating customer attention in a supermarket setting. Our dataset and models are available at http://gaze360.csail.mit.edu . |
Tasks | Domain Adaptation, Gaze Estimation |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.10088v1 |
https://arxiv.org/pdf/1910.10088v1.pdf | |
PWC | https://paperswithcode.com/paper/gaze360-physically-unconstrained-gaze |
Repo | |
Framework | |
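A sketch of a regression head that outputs both a gaze direction and an uncertainty estimate, trained here with a Gaussian negative log-likelihood; this only illustrates the "gaze plus uncertainty" output and does not reproduce the paper's exact loss.

```python
import torch
import torch.nn as nn

class GazeHead(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mean = nn.Linear(feat_dim, 3)     # 3D gaze direction
        self.log_var = nn.Linear(feat_dim, 1)  # per-sample uncertainty

    def forward(self, features):
        return self.mean(features), self.log_var(features)

def gaze_nll(pred, log_var, target):
    # Larger predicted variance down-weights the squared error but is penalized.
    return (torch.exp(-log_var) * (pred - target).pow(2).sum(-1, keepdim=True)
            + log_var).mean()

head = GazeHead()
feats, target = torch.randn(8, 256), torch.randn(8, 3)
mu, lv = head(feats)
loss = gaze_nll(mu, lv, target)
```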
Design Smell Analysis for Developing and Established Open Source Java Software
Title | Design Smell Analysis for Developing and Established Open Source Java Software |
Authors | Asif Imran, Tevfik Kosar |
Abstract | Software design smells are design attributes that violate fundamental design principles. Design smells are a key cause of design debt. Although the activities of design smell identification and measurement are predominantly considered in the current literature, comparatively little work has examined which design smells occur more frequently in newly developing software and which are more dominant in established software. This research describes a mechanism for identifying the design smells that are more prevalent in developing and established software, respectively. A tool is provided for design smell detection by analyzing large volumes of source code. More specifically, 164,609 Lines of Code (LoC) and 5,712 class files of six developing and 244,930 LoC and 12,048 class files of five established open-source Java software projects are analyzed. The obtained results show that out of the 4,020 occurrences of smells detected for nine preselected types of design smells, 1,643 design smells were found in developing software, mainly consisting of four specific types of smells. For established software, 2,397 design smells were observed, predominantly consisting of four other types of smells. The remaining design smell was equally prevalent in both developing and established software. The tool achieved precision values ranging from 72.9% to 84.1%. |
Tasks | |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.05428v1 |
https://arxiv.org/pdf/1910.05428v1.pdf | |
PWC | https://paperswithcode.com/paper/design-smell-analysis-for-developing-and |
Repo | |
Framework | |
Adaptive Deep Learning for High-Dimensional Hamilton-Jacobi-Bellman Equations
Title | Adaptive Deep Learning for High-Dimensional Hamilton-Jacobi-Bellman Equations |
Authors | Tenavi Nakamura-Zimmerer, Qi Gong, Wei Kang |
Abstract | Computing optimal feedback controls for nonlinear systems generally requires solving Hamilton-Jacobi-Bellman (HJB) equations, which are notoriously difficult when the state dimension is large. Existing strategies for high-dimensional problems generally rely on specific, restrictive problem structures, or are valid only locally around some nominal trajectory. In this paper, we propose a data-driven method to approximate semi-global solutions to HJB equations for general high-dimensional nonlinear systems and compute optimal feedback controls in real-time. To accomplish this, we model solutions to HJB equations with neural networks (NNs) trained on data generated without discretizing the state space. Training is made more effective and data-efficient by leveraging the known physics of the problem and using the partially-trained NN to aid in adaptive data generation. We demonstrate the effectiveness of our method by learning solutions to HJB equations corresponding to the attitude control of a six-dimensional nonlinear rigid body, and nonlinear systems of dimension up to 30 arising from the stabilization of a Burgers’-type partial differential equation. The trained NNs are then used for real-time optimal feedback control of these systems. |
Tasks | |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05317v3 |
https://arxiv.org/pdf/1907.05317v3.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-deep-learning-for-high-dimensional |
Repo | |
Framework | |
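A minimal sketch of physics-informed supervision in this spirit: fit a network V(x) to value data and, via automatic differentiation, to gradient (costate) data. The data below are synthetic placeholders for the paper's BVP-generated training set, and the loss weighting is an assumption.

```python
import torch
import torch.nn as nn

# Small value-function network for a 6-dimensional state, as in attitude control.
value_net = nn.Sequential(nn.Linear(6, 64), nn.Tanh(), nn.Linear(64, 1))

def hjb_fit_loss(x, v_data, costate_data, mu=0.1):
    x = x.requires_grad_(True)
    v = value_net(x)
    grad_v, = torch.autograd.grad(v.sum(), x, create_graph=True)
    value_loss = (v - v_data).pow(2).mean()
    gradient_loss = (grad_v - costate_data).pow(2).sum(-1).mean()
    return value_loss + mu * gradient_loss  # gradient term encodes the physics

x = torch.randn(32, 6)
loss = hjb_fit_loss(x, torch.randn(32, 1), torch.randn(32, 6))
loss.backward()
```

Once trained, the optimal feedback control is recovered in real time from the network's gradient at the current state, avoiding any state-space discretization.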
Optimal Average-Case Reductions to Sparse PCA: From Weak Assumptions to Strong Hardness
Title | Optimal Average-Case Reductions to Sparse PCA: From Weak Assumptions to Strong Hardness |
Authors | Matthew Brennan, Guy Bresler |
Abstract | In the past decade, sparse principal component analysis has emerged as an archetypal problem for illustrating statistical-computational tradeoffs. This trend has largely been driven by a line of research aiming to characterize the average-case complexity of sparse PCA through reductions from the planted clique (PC) conjecture - which conjectures that there is no polynomial-time algorithm to detect a planted clique of size $K = o(N^{1/2})$ in $\mathcal{G}(N, \frac{1}{2})$. All previous reductions to sparse PCA either fail to show tight computational lower bounds matching existing algorithms or show lower bounds for formulations of sparse PCA other than its canonical generative model, the spiked covariance model. Also, these lower bounds all quickly degrade with the exponent in the PC conjecture. Specifically, when only given the PC conjecture up to $K = o(N^\alpha)$ where $\alpha < 1/2$, there is no sparsity level $k$ at which these lower bounds remain tight. If $\alpha \le 1/3$ these reductions fail to even show the existence of a statistical-computational tradeoff at any sparsity $k$. We give a reduction from PC that yields the first full characterization of the computational barrier in the spiked covariance model, providing tight lower bounds at all sparsities $k$. We also show the surprising result that weaker forms of the PC conjecture up to clique size $K = o(N^\alpha)$ for any given $\alpha \in (0, 1/2]$ imply tight computational lower bounds for sparse PCA at sparsities $k = o(n^{\alpha/3})$. This shows that even a mild improvement in the signal strength needed by the best known polynomial-time sparse PCA algorithms would imply that the hardness threshold for PC is subpolynomial. This is the first instance of a suboptimal hardness assumption implying optimal lower bounds for another problem in unsupervised learning. |
Tasks | |
Published | 2019-02-20 |
URL | http://arxiv.org/abs/1902.07380v1 |
http://arxiv.org/pdf/1902.07380v1.pdf | |
PWC | https://paperswithcode.com/paper/optimal-average-case-reductions-to-sparse-pca |
Repo | |
Framework | |
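For reference, a sketch of sampling from the spiked covariance model discussed above: n observations from N(0, I + theta * v v^T) with a k-sparse unit spike v; all parameter values are arbitrary.

```python
import numpy as np

def sample_spiked_covariance(n=200, d=100, k=10, theta=1.5, seed=0):
    rng = np.random.default_rng(seed)
    v = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)
    v[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)  # unit k-sparse spike
    cov = np.eye(d) + theta * np.outer(v, v)
    return rng.multivariate_normal(np.zeros(d), cov, size=n), v

X, spike = sample_spiked_covariance()
```

The detection problem is to distinguish such samples from pure N(0, I) noise; the paper's reductions map planted clique instances to exactly this distribution family.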
A Two-stream End-to-End Deep Learning Network for Recognizing Atypical Visual Attention in Autism Spectrum Disorder
Title | A Two-stream End-to-End Deep Learning Network for Recognizing Atypical Visual Attention in Autism Spectrum Disorder |
Authors | Jin Xie, Longfei Wang, Paula Webster, Yang Yao, Jiayao Sun, Shuo Wang, Huihui Zhou |
Abstract | Eye movements have been widely investigated to study atypical visual attention in Autism Spectrum Disorder (ASD). The majority of these studies have focused on a limited set of eye movement features via statistical comparisons between ASD and Typically Developing (TD) groups, which makes it difficult to accurately separate ASD from TD at the individual level. Deep learning has been highly successful in overcoming this issue by automatically extracting features important for classification through a data-driven learning process. However, there is still a lack of end-to-end deep learning frameworks for recognizing abnormal attention in ASD. In this study, we developed a novel two-stream deep learning network for this recognition based on 700 images and the corresponding eye movement patterns of ASD and TD, and obtained an accuracy of 0.95, which is higher than the previous state of the art. We next characterized contributions to the classification at the single-image level and the non-linear integration of this single-image-level information during classification. Moreover, we identified a group of pixel-level visual features within these images with greater impact on the classification. Together, this two-stream deep learning network provides a novel and powerful tool to recognize and understand abnormal visual attention in ASD. |
Tasks | |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11393v1 |
https://arxiv.org/pdf/1911.11393v1.pdf | |
PWC | https://paperswithcode.com/paper/a-two-stream-end-to-end-deep-learning-network |
Repo | |
Framework | |
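A rough sketch of a two-stream classifier that fuses an image stream with an eye-movement stream; the encoders, feature sizes, and fusion by concatenation are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TwoStreamNet(nn.Module):
    def __init__(self, img_dim=512, gaze_dim=64):
        super().__init__()
        self.img_stream = nn.Sequential(nn.Linear(img_dim, 128), nn.ReLU())
        self.gaze_stream = nn.Sequential(nn.Linear(gaze_dim, 128), nn.ReLU())
        self.classifier = nn.Linear(256, 2)  # ASD vs. typically developing

    def forward(self, img_feat, gaze_feat):
        fused = torch.cat([self.img_stream(img_feat),
                           self.gaze_stream(gaze_feat)], dim=-1)
        return self.classifier(fused)

logits = TwoStreamNet()(torch.randn(4, 512), torch.randn(4, 64))
```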
On the Turing Completeness of Modern Neural Network Architectures
Title | On the Turing Completeness of Modern Neural Network Architectures |
Authors | Jorge Pérez, Javier Marinković, Pablo Barceló |
Abstract | Alternatives to recurrent neural networks, in particular, architectures based on attention or convolutions, have been gaining momentum for processing input sequences. In spite of their relevance, the computational properties of these alternatives have not yet been fully explored. We study the computational power of two of the most paradigmatic architectures exemplifying these mechanisms: the Transformer (Vaswani et al., 2017) and the Neural GPU (Kaiser & Sutskever, 2016). We show both models to be Turing complete exclusively based on their capacity to compute and access internal dense representations of the data. In particular, neither the Transformer nor the Neural GPU requires access to an external memory to become Turing complete. Our study also reveals some minimal sets of elements needed to obtain these completeness results. |
Tasks | |
Published | 2019-01-10 |
URL | http://arxiv.org/abs/1901.03429v1 |
http://arxiv.org/pdf/1901.03429v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-turing-completeness-of-modern-neural |
Repo | |
Framework | |