October 19, 2019


Paper Group ANR 291


VIBNN: Hardware Acceleration of Bayesian Neural Networks

Title VIBNN: Hardware Acceleration of Bayesian Neural Networks
Authors Ruizhe Cai, Ao Ren, Ning Liu, Caiwen Ding, Luhao Wang, Xuehai Qian, Massoud Pedram, Yanzhi Wang
Abstract Bayesian Neural Networks (BNNs) have been proposed to address the problem of model uncertainty in training and inference. By introducing weights associated with conditional probability distributions, BNNs are capable of resolving the overfitting issue commonly seen in conventional neural networks and allow for small-data training, through the variational inference process. Frequent usage of Gaussian random variables in this process requires a properly optimized Gaussian Random Number Generator (GRNG). The high hardware cost of conventional GRNGs makes the hardware implementation of BNNs challenging. In this paper, we propose VIBNN, an FPGA-based hardware accelerator design for variational inference on BNNs. We explore the design space for the massive amounts of Gaussian variable sampling tasks in BNNs. Specifically, we introduce two high-performance Gaussian (pseudo) random number generators: the RAM-based Linear Feedback Gaussian Random Number Generator (RLF-GRNG), which is inspired by the properties of the binomial distribution and linear feedback logic; and the Bayesian Neural Network-oriented Wallace Gaussian Random Number Generator. To achieve high scalability and efficient memory access, we propose a deep pipelined accelerator architecture with fast execution and good hardware utilization. Experimental results demonstrate that the proposed VIBNN implementations on an FPGA can achieve a throughput of 321,543.4 Images/s and an energy efficiency of up to 52,694.8 Images/J while maintaining accuracy similar to their software counterpart.
Tasks
Published 2018-02-02
URL http://arxiv.org/abs/1802.00822v1
PDF http://arxiv.org/pdf/1802.00822v1.pdf
PWC https://paperswithcode.com/paper/vibnn-hardware-acceleration-of-bayesian
Repo
Framework
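The binomial intuition behind the RLF-GRNG can be sketched in software: summing many pseudo-random bits yields a Binomial(n, 1/2) count, which by the central limit theorem is approximately Gaussian after normalization. A minimal pure-Python sketch of that idea (not the paper's LFSR-based hardware design; `n_bits=128` is an illustrative choice):

```python
import random

def clt_gaussian(n_bits=128):
    """Approximate a standard Gaussian draw by summing n_bits
    Bernoulli(0.5) bits: Binomial(n, 0.5) has mean n/2 and variance
    n/4, so the normalized count tends to N(0, 1)."""
    count = sum(1 for _ in range(n_bits) if random.random() < 0.5)
    return (count - n_bits / 2) / (n_bits / 4) ** 0.5

random.seed(0)
samples = [clt_gaussian() for _ in range(10000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(round(mean, 3), round(var, 3))
```

In hardware the appeal is that the bit sum needs only counters and cheap pseudo-random bit sources, avoiding the multipliers and transcendental functions of Box-Muller-style generators.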

Decision-Making with Belief Functions: a Review

Title Decision-Making with Belief Functions: a Review
Authors Thierry Denoeux
Abstract Approaches to decision-making under uncertainty in the belief function framework are reviewed. Most methods are shown to blend criteria for decision under ignorance with the maximum expected utility principle of Bayesian decision theory. A distinction is made between methods that construct a complete preference relation among acts, and those that allow incomparability of some acts due to lack of information. Methods developed in the imprecise probability framework are applicable in the Dempster-Shafer context and are also reviewed. Shafer’s constructive decision theory, which substitutes the notion of goal for that of utility, is described and contrasted with other approaches. The paper ends by pointing out the need to carry out deeper investigation of fundamental issues related to decision-making with belief functions and to assess the descriptive, normative and prescriptive values of the different approaches.
Tasks Decision Making, Decision Making Under Uncertainty
Published 2018-08-16
URL https://arxiv.org/abs/1808.05322v2
PDF https://arxiv.org/pdf/1808.05322v2.pdf
PWC https://paperswithcode.com/paper/decision-making-with-belief-functions-a
Repo
Framework
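A small worked example of the kind of criterion the review covers: given a Dempster-Shafer mass function, the lower (pessimistic) and upper (optimistic) expected utilities of an act average the worst and best outcomes within each focal set. The acts, utilities, and masses below are invented for illustration:

```python
# Toy decision problem: states {good, bad}, two acts.
utility = {
    "invest": {"good": 100, "bad": -50},
    "hold":   {"good": 20,  "bad": 10},
}
# Dempster-Shafer mass function over sets of states; partial
# ignorance puts mass 0.3 on the whole frame {good, bad}.
mass = {
    frozenset({"good"}): 0.5,
    frozenset({"bad"}): 0.2,
    frozenset({"good", "bad"}): 0.3,
}

def lower_eu(act):
    # pessimistic: worst outcome within each focal set
    return sum(m * min(utility[act][s] for s in A) for A, m in mass.items())

def upper_eu(act):
    # optimistic: best outcome within each focal set
    return sum(m * max(utility[act][s] for s in A) for A, m in mass.items())

for act in utility:
    print(act, lower_eu(act), upper_eu(act))
```

Blending these two bounds with a pessimism index (a Hurwicz-style criterion) is one of the complete preference relations discussed in the review; acts whose intervals overlap may remain incomparable under interval-dominance rules.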

Finite Sample Analysis of the GTD Policy Evaluation Algorithms in Markov Setting

Title Finite Sample Analysis of the GTD Policy Evaluation Algorithms in Markov Setting
Authors Yue Wang, Wei Chen, Yuting Liu, Zhi-Ming Ma, Tie-Yan Liu
Abstract In reinforcement learning (RL), one of the key components is policy evaluation, which aims to estimate the value function (i.e., the expected long-term accumulated reward) of a policy. With a good policy evaluation method, RL algorithms can estimate the value function more accurately and find a better policy. When the state space is large or continuous, Gradient-based Temporal Difference (GTD) policy evaluation algorithms with linear function approximation are widely used. Considering that collecting evaluation data consumes both time and reward, a clear understanding of the finite sample performance of policy evaluation algorithms is very important to reinforcement learning. Under the assumption that the data are generated i.i.d., previous work provided a finite sample analysis of the GTD algorithms with constant step size by converting them into convex-concave saddle point problems. However, it is well known that in RL problems the data are generated from Markov processes rather than i.i.d. In this paper, in the realistic Markov setting, we derive finite sample bounds for general convex-concave saddle point problems, and hence for the GTD algorithms. We draw the following conclusions from our bounds. (1) With appropriate variants of the step size, GTD algorithms converge. (2) The convergence rate is determined by the step size, with the mixing time of the Markov process as a coefficient: the faster the Markov process mixes, the faster the convergence. (3) The experience replay trick is effective because it improves the mixing property of the Markov process. To the best of our knowledge, our analysis is the first to provide finite sample bounds for the GTD algorithms in the Markov setting.
Tasks
Published 2018-09-21
URL http://arxiv.org/abs/1809.08926v1
PDF http://arxiv.org/pdf/1809.08926v1.pdf
PWC https://paperswithcode.com/paper/finite-sample-analysis-of-the-gtd-policy
Repo
Framework
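To make the object of the analysis concrete, here is a toy run of the standard GTD2 update with linear (one-hot) features on a two-state chain with uniform transitions. This illustrates the algorithm being analyzed, not the paper's bounds; all constants are illustrative. With reward 1 in state 0, reward 0 in state 1, and discount 0.9, the true values are 5.5 and 4.5:

```python
import random

# GTD2 with linear features:
#   delta  = r + gamma * theta.phi' - theta.phi
#   theta += alpha * (phi - gamma * phi') * (w . phi)
#   w     += beta  * (delta - w . phi) * phi
phi = {0: [1.0, 0.0], 1: [0.0, 1.0]}    # one-hot features
reward = {0: 1.0, 1: 0.0}
gamma, alpha, beta = 0.9, 0.05, 0.05

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

random.seed(0)
theta, w = [0.0, 0.0], [0.0, 0.0]
s = 0
for _ in range(50000):
    s2 = random.choice([0, 1])          # uniform transition kernel
    f, f2, r = phi[s], phi[s2], reward[s]
    delta = r + gamma * dot(theta, f2) - dot(theta, f)
    wf = dot(w, f)
    theta = [t + alpha * (f[i] - gamma * f2[i]) * wf for i, t in enumerate(theta)]
    w = [wi + beta * (delta - wf) * f[i] for i, wi in enumerate(w)]
    s = s2
print([round(t, 2) for t in theta])     # true values: [5.5, 4.5]
```

Note that the samples here are generated by following the chain, exactly the Markov (non-i.i.d.) regime the paper's bounds address.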

On How Well Generative Adversarial Networks Learn Densities: Nonparametric and Parametric Results

Title On How Well Generative Adversarial Networks Learn Densities: Nonparametric and Parametric Results
Authors Tengyuan Liang
Abstract We study in this paper the rate of convergence for learning distributions with the adversarial framework and Generative Adversarial Networks (GANs), which subsumes Wasserstein, Sobolev and MMD GANs as special cases. We study a wide range of parametric and nonparametric target distributions under a collection of objective evaluation metrics. On the nonparametric end, we investigate the minimax optimal rates and fundamental difficulty of density estimation under the adversarial framework. On the parametric end, we establish a theory for general neural network classes (including deep leaky ReLU networks as a special case) that characterizes the interplay between the choice of generator and discriminator. We investigate how to obtain a good statistical guarantee for GANs through the lens of regularization. We discover and isolate a new notion of regularization, called generator/discriminator pair regularization, that sheds light on the advantage of GANs compared to classical parametric and nonparametric approaches for density estimation. We develop novel oracle inequalities as the main tools for analyzing GANs, which are of independent theoretical interest.
Tasks Density Estimation
Published 2018-11-07
URL https://arxiv.org/abs/1811.03179v2
PDF https://arxiv.org/pdf/1811.03179v2.pdf
PWC https://paperswithcode.com/paper/on-how-well-generative-adversarial-networks
Repo
Framework
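Of the GAN variants subsumed by the framework, the MMD criterion is the easiest to illustrate directly: the squared maximum mean discrepancy between two samples under an RBF kernel. A pure-Python sketch on 1-D data using the biased V-statistic, with an illustrative bandwidth:

```python
import math
import random

def rbf(x, y, sigma=1.0):
    """Gaussian RBF kernel on scalars."""
    return math.exp(-(x - y) ** 2 / (2 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples:
    E k(x,x') + E k(y,y') - 2 E k(x,y)."""
    kxx = sum(rbf(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

random.seed(1)
same = [random.gauss(0, 1) for _ in range(200)]
also_same = [random.gauss(0, 1) for _ in range(200)]
shifted = [random.gauss(3, 1) for _ in range(200)]
print(mmd2(same, also_same), mmd2(same, shifted))
```

The matched samples give a near-zero discrepancy while the shifted distribution does not, which is the population quantity whose estimation rates the paper studies.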

Towards a Neural Network Approach to Abstractive Multi-Document Summarization

Title Towards a Neural Network Approach to Abstractive Multi-Document Summarization
Authors Jianmin Zhang, Jiwei Tan, Xiaojun Wan
Abstract To date, neural abstractive summarization methods have achieved great success for single-document summarization (SDS). However, due to the lack of large-scale multi-document summaries, such methods can hardly be applied to multi-document summarization (MDS). In this paper, we investigate neural abstractive methods for MDS by adapting a state-of-the-art neural abstractive summarization model for SDS. We propose an approach to extend a neural abstractive model trained on large-scale SDS data to the MDS task. Our approach makes use of only a small number of multi-document summaries for fine-tuning. Experimental results on two benchmark DUC datasets demonstrate that our approach can outperform a variety of baseline neural models.
Tasks Abstractive Text Summarization, Document Summarization, Multi-Document Summarization
Published 2018-04-24
URL http://arxiv.org/abs/1804.09010v1
PDF http://arxiv.org/pdf/1804.09010v1.pdf
PWC https://paperswithcode.com/paper/towards-a-neural-network-approach-to
Repo
Framework

Language Modeling at Scale

Title Language Modeling at Scale
Authors Mostofa Patwary, Milind Chabbi, Heewoo Jun, Jiaji Huang, Gregory Diamos, Kenneth Church
Abstract We show how Zipf’s Law can be used to scale up language modeling (LM) to take advantage of more training data and more GPUs. LM plays a key role in many important natural language applications such as speech recognition and machine translation. Scaling up LM is important since it is widely accepted by the community that there is no data like more data. Eventually, we would like to train on terabytes (TBs) of text (trillions of words). Modern training methods are far from this goal, because of various bottlenecks, especially memory (within GPUs) and communication (across GPUs). This paper shows how Zipf’s Law can address these bottlenecks by grouping parameters for common words and character sequences, because $U \ll N$, where $U$ is the number of unique words (types) and $N$ is the size of the training set (tokens). For a local batch size $K$ with $G$ GPUs and a $D$-dimension embedding matrix, we reduce the original per-GPU memory and communication asymptotic complexity from $\Theta(GKD)$ to $\Theta(GK + UD)$. Empirically, we find $U \propto (GK)^{0.64}$ on four publicly available large datasets. When we scale up the number of GPUs to 64, a factor of 8, training time speeds up by factors up to 6.7$\times$ (for character LMs) and 6.3$\times$ (for word LMs) with negligible loss of accuracy. Our weak scaling on 192 GPUs on the Tieba dataset shows a 35% improvement in LM prediction accuracy by training on 93 GB of data (2.5$\times$ larger than publicly available SOTA dataset), but taking only 1.25$\times$ increase in training time, compared to 3 GB of the same dataset running on 6 GPUs.
Tasks Language Modelling, Machine Translation, Speech Recognition
Published 2018-10-23
URL http://arxiv.org/abs/1810.10045v1
PDF http://arxiv.org/pdf/1810.10045v1.pdf
PWC https://paperswithcode.com/paper/language-modeling-at-scale
Repo
Framework
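The claimed memory reduction is easy to work through numerically. A sketch of the asymptotic per-GPU element counts, with hypothetical sizes and the paper's empirical exponent used with an assumed constant of proportionality of 1:

```python
def per_gpu_elements(G, K, D, U, grouped):
    """Asymptotic per-GPU element counts from the paper's analysis:
    the baseline costs on the order of G*K*D elements, while grouping
    parameters for common words costs on the order of G*K + U*D."""
    return G * K + U * D if grouped else G * K * D

# Hypothetical sizes: 64 GPUs, local batch 128, 1024-dim embeddings.
G, K, D = 64, 128, 1024
U = int((G * K) ** 0.64)      # empirical U ~ (GK)^0.64; constant assumed 1
baseline = per_gpu_elements(G, K, D, U, grouped=False)
grouped = per_gpu_elements(G, K, D, U, grouped=True)
print(baseline, grouped, round(baseline / grouped, 1))
```

Because U grows sublinearly in the global batch GK (Zipf's Law at work), the grouped cost falls further behind the baseline as more GPUs are added, which is what makes the weak-scaling results possible.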

Incremental Classifier Learning with Generative Adversarial Networks

Title Incremental Classifier Learning with Generative Adversarial Networks
Authors Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, Zhengyou Zhang, Yun Fu
Abstract In this paper, we address the incremental classifier learning problem, which suffers from catastrophic forgetting. The main reason for catastrophic forgetting is that past data are not available during learning. Typical approaches keep some exemplars for the past classes and use distillation regularization to retain the classification capability on the past classes and balance the past and new classes. However, there are four main problems with these approaches. First, the loss function is not efficient for classification. Second, there is an imbalance between the past and new classes. Third, the size of the pre-decided exemplar set is usually limited, and the exemplars might not be distinguishable from unseen new classes. Fourth, the exemplars may not be allowed to be kept for a long time due to privacy regulations. To address these problems, we propose (a) a new loss function combining the cross-entropy loss and distillation loss, (b) a simple way to estimate and remove the imbalance between the old and new classes, and (c) using Generative Adversarial Networks (GANs) to generate historical data and select representative exemplars during generation. We believe that data generated by GANs raise far fewer privacy issues than real images because GANs do not directly copy any real image patches. We evaluate the proposed method on the CIFAR-100, Flower-102, and MS-Celeb-1M-Base datasets, and extensive experiments demonstrate the effectiveness of our method.
Tasks
Published 2018-02-02
URL http://arxiv.org/abs/1802.00853v1
PDF http://arxiv.org/pdf/1802.00853v1.pdf
PWC https://paperswithcode.com/paper/incremental-classifier-learning-with
Repo
Framework
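The general shape of loss (a), combining cross-entropy on the new label with distillation against the old model's softened outputs on the old classes, can be sketched as follows. This is the common distillation recipe, not necessarily the paper's exact formulation; the logits, temperature `T`, and weight `lam` are illustrative:

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T (T > 1 softens the distribution)."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combined_loss(logits, label, old_logits, n_old, lam=0.5, T=2.0):
    """Cross-entropy on the new-task label plus distillation of the
    old model's temperature-softened outputs on the first n_old
    (old) classes."""
    ce = -math.log(softmax(logits)[label])
    teacher = softmax(old_logits[:n_old], T)   # old model's soft targets
    student = softmax(logits[:n_old], T)       # new model on old classes
    distill = -sum(q * math.log(p) for q, p in zip(teacher, student))
    return (1 - lam) * ce + lam * distill

loss = combined_loss(logits=[2.0, 0.5, 1.0], label=2,
                     old_logits=[1.8, 0.6, 0.0], n_old=2)
print(round(loss, 3))
```

The distillation term is minimized when the new model reproduces the old model's relative preferences among old classes, which is what preserves past knowledge when old data (real or GAN-generated) are replayed.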

Estimating the Rating of Reviewers Based on the Text

Title Estimating the Rating of Reviewers Based on the Text
Authors Mohammadamir Kavousi, Sepehr Saadatmand
Abstract User-generated texts such as reviews and social media posts are valuable sources of information. Online reviews are important assets for users deciding to buy a product, see a movie, or make some other decision. Therefore, the rating of a review is one of the reliable factors that lead users to read and trust reviews. This paper analyzes the texts of reviews to evaluate and predict their ratings. Moreover, we study the effect of lexical features generated from text, as well as sentiment words, on the accuracy of rating prediction. Our analysis shows that words with high information gain scores are more effective features than words with high TF-IDF values. In addition, we explore the best number of features for predicting the ratings of reviews.
Tasks
Published 2018-05-22
URL http://arxiv.org/abs/1805.08415v1
PDF http://arxiv.org/pdf/1805.08415v1.pdf
PWC https://paperswithcode.com/paper/estimating-the-rating-of-reviewers-based-on
Repo
Framework
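Information gain for rating prediction can be illustrated on a toy corpus: a sentiment-bearing word perfectly separates high from low ratings (IG = 1 bit), while a neutral word that appears in both classes carries none. The corpus below is invented for illustration:

```python
import math

# Toy corpus: (review tokens, rating label: 1 = high, 0 = low)
reviews = [
    ("great movie loved it".split(), 1),
    ("great acting great plot".split(), 1),
    ("terrible boring movie".split(), 0),
    ("boring plot hated it".split(), 0),
]

def entropy(labels):
    """Shannon entropy (bits) of a label list."""
    n = len(labels)
    out = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        out -= p * math.log2(p)
    return out

def info_gain(word):
    """IG(word) = H(rating) - H(rating | word present/absent)."""
    labels = [y for _, y in reviews]
    with_w = [y for toks, y in reviews if word in toks]
    without = [y for toks, y in reviews if word not in toks]
    cond = (len(with_w) * entropy(with_w)
            + len(without) * entropy(without)) / len(reviews)
    return entropy(labels) - cond

print(info_gain("great"), info_gain("movie"))
```

Here "great" gets IG of 1 bit while "movie" gets 0, even though "movie" may have a respectable TF-IDF weight, which is the intuition behind the paper's finding.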

Towards Automatic Personality Prediction Using Facebook Like Categories

Title Towards Automatic Personality Prediction Using Facebook Like Categories
Authors Raad Bin Tareaf, Philipp Berger, Patrick Hennig, Christoph Meinel
Abstract We demonstrate that effortlessly accessible digital records of behavior such as Facebook Likes can be obtained and utilized to automatically distinguish a wide range of highly delicate personal traits, including life satisfaction, cultural ethnicity, political views, age, gender and personality traits. The analysis presented is based on a dataset of over 738,000 users who volunteered their Facebook Likes, social network activities, egocentric network, demographic characteristics, and the results of various psychometric tests for our extended personality analysis. The proposed model uses a unique mapping technique between each Facebook Like object and the corresponding Facebook page category/sub-category object, which is then evaluated as a feature set for machine learning algorithms to predict individual psycho-demographic profiles from Likes. The model distinguishes between religious and non-religious individuals in 83% of cases, Asian and European in 87% of cases, and between emotionally stable and emotionally unstable individuals in 81% of cases. We provide examples of correlations between attributes and Likes and present suggestions for future directions.
Tasks
Published 2018-12-11
URL http://arxiv.org/abs/1812.04346v1
PDF http://arxiv.org/pdf/1812.04346v1.pdf
PWC https://paperswithcode.com/paper/towards-automatic-personality-prediction
Repo
Framework

Correlated Components Analysis - Extracting Reliable Dimensions in Multivariate Data

Title Correlated Components Analysis - Extracting Reliable Dimensions in Multivariate Data
Authors Lucas C. Parra, Stefan Haufe, Jacek P. Dmochowski
Abstract How does one find dimensions in multivariate data that are reliably expressed across repetitions? For example, in a brain imaging study one may want to identify combinations of neural signals that are reliably expressed across multiple trials or subjects. For a behavioral assessment with multiple ratings, one may want to identify an aggregate score that is reliably reproduced across raters. Correlated Components Analysis (CorrCA) addresses this problem by identifying components that are maximally correlated between repetitions (e.g. trials, subjects, raters). Here we formalize this as the maximization of the ratio of between-repetition to within-repetition covariance. We show that this criterion maximizes repeat-reliability, defined as mean over variance across repeats, and that it leads to CorrCA or to multi-set Canonical Correlation Analysis, depending on the constraints. Surprisingly, we also find that CorrCA is equivalent to Linear Discriminant Analysis for zero-mean signals, which provides an unexpected link between classic concepts of multivariate analysis. We present an exact parametric test of statistical significance based on the F-statistic for normally distributed independent samples, and present and validate shuffle statistics for the case of dependent samples. Regularization and extension to non-linear mappings using kernels are also presented. The algorithms are demonstrated on a series of data analysis applications, and we provide all code and data required to reproduce the results.
Tasks
Published 2018-01-26
URL http://arxiv.org/abs/1801.08881v5
PDF http://arxiv.org/pdf/1801.08881v5.pdf
PWC https://paperswithcode.com/paper/correlated-components-analysis-extracting
Repo
Framework
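For two repetitions and a one-dimensional projection, the CorrCA objective reduces to maximizing the between-repetition correlation of the projected data. A brute-force sketch via grid search over projection angles; the paper solves this properly as an eigenvalue problem, and the data and noise levels here are illustrative:

```python
import math
import random

random.seed(0)
# Two repetitions of 2-D data sharing a reliable component along x:
# the x coordinate repeats across reps (plus noise), y is fresh noise.
n = 500
shared = [random.gauss(0, 1) for _ in range(n)]
rep1 = [(s + random.gauss(0, 0.3), random.gauss(0, 1)) for s in shared]
rep2 = [(s + random.gauss(0, 0.3), random.gauss(0, 1)) for s in shared]

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def project(data, angle):
    c, s = math.cos(angle), math.sin(angle)
    return [c * x + s * y for x, y in data]

# Grid-search the projection maximizing between-repetition correlation.
best = max((corr(project(rep1, a), project(rep2, a)), a)
           for a in [i * math.pi / 180 for i in range(180)])
print(round(best[0], 2), round(math.degrees(best[1])))
```

The search recovers a direction close to the x-axis, the component that is reliably expressed across repetitions, while the unreliable y-noise direction scores near zero.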

What evidence does deep learning model use to classify Skin Lesions?

Title What evidence does deep learning model use to classify Skin Lesions?
Authors Xiaoxiao Li, Junyan Wu, Eric Z. Chen, Hongda Jiang
Abstract Melanoma is a type of skin cancer with the most rapidly increasing incidence. Early detection of melanoma using dermoscopy images significantly increases patients’ survival rate. However, accurately classifying skin lesions by eye, especially in the early stage of melanoma, is extremely challenging for dermatologists. Hence, the discovery of reliable biomarkers would be meaningful for melanoma diagnosis. In recent years, deep learning empowered computer-assisted diagnosis has shown its value in biomedical imaging-based decision making. However, much research focuses on improving disease detection accuracy rather than exploring the evidence of pathology. In this paper, we propose a method to interpret the deep learning classification findings. First, we propose an accurate neural network architecture to classify skin lesions. Second, we utilize a prediction difference analysis method that examines each patch of the image through patch-wise corruption to detect the biomarkers. Last, we validate that our biomarker findings correspond to the patterns in the literature. The findings can be significant and useful to guide clinical diagnosis.
Tasks Decision Making
Published 2018-11-02
URL http://arxiv.org/abs/1811.01051v3
PDF http://arxiv.org/pdf/1811.01051v3.pdf
PWC https://paperswithcode.com/paper/what-evidence-does-deep-learning-model-use-to
Repo
Framework
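The patch-wise corruption step can be sketched with a stand-in classifier: occlude each patch, re-score the image, and record the drop as the patch's evidence. The toy `score` function below replaces a trained network and is purely illustrative:

```python
def score(img):
    """Toy 'melanoma score': mean intensity of the central 2x2 region,
    standing in for a trained classifier's output."""
    return sum(img[r][c] for r in (2, 3) for c in (2, 3)) / 4.0

def prediction_difference_map(img, patch=2, baseline=0):
    """Occlude each patch in turn and record the drop in score."""
    n = len(img)
    heat = [[0.0] * n for _ in range(n)]
    base = score(img)
    for r0 in range(0, n, patch):
        for c0 in range(0, n, patch):
            corrupted = [row[:] for row in img]
            for r in range(r0, r0 + patch):
                for c in range(c0, c0 + patch):
                    corrupted[r][c] = baseline     # occlude this patch
            drop = base - score(corrupted)         # evidence in the patch
            for r in range(r0, r0 + patch):
                for c in range(c0, c0 + patch):
                    heat[r][c] = drop
    return heat

img = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        img[r][c] = 9                  # a bright 'lesion' at the centre
heat = prediction_difference_map(img)
print(heat[2][2], heat[0][0])
```

The heat map is high exactly over the region the classifier relies on, which is how the paper localizes candidate biomarkers; the paper's variant marginalizes over patch contents rather than zeroing them.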

On the role of neurogenesis in overcoming catastrophic forgetting

Title On the role of neurogenesis in overcoming catastrophic forgetting
Authors German I. Parisi, Xu Ji, Stefan Wermter
Abstract Lifelong learning capabilities are crucial for artificial autonomous agents operating on real-world data, which is typically non-stationary and temporally correlated. In this work, we demonstrate that dynamically grown networks outperform static networks in incremental learning scenarios, even when bounded by the same amount of memory in both cases. Learning is unsupervised in our models, a condition that additionally makes training more challenging whilst increasing the realism of the study, since humans are able to learn without dense manual annotation. Our results on artificial neural networks reinforce that structural plasticity constitutes effective prevention against catastrophic forgetting in non-stationary environments, as well as empirically supporting the importance of neurogenesis in the mammalian brain.
Tasks
Published 2018-11-06
URL http://arxiv.org/abs/1811.02113v2
PDF http://arxiv.org/pdf/1811.02113v2.pdf
PWC https://paperswithcode.com/paper/on-the-role-of-neurogenesis-in-overcoming
Repo
Framework

On the Effect of Suboptimal Estimation of Mutual Information in Feature Selection and Classification

Title On the Effect of Suboptimal Estimation of Mutual Information in Feature Selection and Classification
Authors Kiran Karra, Lamine Mili
Abstract This paper introduces a new property of estimators of the strength of statistical association, which helps characterize how well an estimator will perform in scenarios where dependencies between continuous and discrete random variables need to be rank ordered. The new property, termed the estimator response curve, is easily computable and provides a marginal distribution agnostic way to assess an estimator’s performance. It overcomes notable drawbacks of current metrics of assessment, including statistical power, bias, and consistency. We utilize the estimator response curve to test various measures of the strength of association that satisfy the data processing inequality (DPI), and show that the CIM estimator’s performance compares favorably to kNN, vME, AP, and H_{MI} estimators of mutual information. The estimators which were identified to be suboptimal, according to the estimator response curve, perform worse than the more optimal estimators when tested with real-world data from four different areas of science, all with varying dimensionalities and sizes.
Tasks Feature Selection
Published 2018-04-30
URL https://arxiv.org/abs/1804.11021v2
PDF https://arxiv.org/pdf/1804.11021v2.pdf
PWC https://paperswithcode.com/paper/on-the-effect-of-suboptimal-estimation-of
Repo
Framework

Effect of Depth and Width on Local Minima in Deep Learning

Title Effect of Depth and Width on Local Minima in Deep Learning
Authors Kenji Kawaguchi, Jiaoyang Huang, Leslie Pack Kaelbling
Abstract In this paper, we analyze the effects of depth and width on the quality of local minima, without strong over-parameterization and simplification assumptions in the literature. Without any simplification assumption, for deep nonlinear neural networks with the squared loss, we theoretically show that the quality of local minima tends to improve towards the global minimum value as depth and width increase. Furthermore, with a locally-induced structure on deep nonlinear neural networks, the values of local minima of neural networks are theoretically proven to be no worse than the globally optimal values of corresponding classical machine learning models. We empirically support our theoretical observation with a synthetic dataset as well as MNIST, CIFAR-10 and SVHN datasets. When compared to previous studies with strong over-parameterization assumptions, the results in this paper do not require over-parameterization, and instead show the gradual effects of over-parameterization as consequences of general results.
Tasks
Published 2018-11-20
URL https://arxiv.org/abs/1811.08150v4
PDF https://arxiv.org/pdf/1811.08150v4.pdf
PWC https://paperswithcode.com/paper/effect-of-depth-and-width-on-local-minima-in
Repo
Framework

Fighting Accounting Fraud Through Forensic Data Analytics

Title Fighting Accounting Fraud Through Forensic Data Analytics
Authors Maria Jofre, Richard Gerlach
Abstract Accounting fraud is a global concern representing a significant threat to financial system stability, due to the resulting erosion of market confidence and of the trust of regulatory authorities. Many tricks can be used to commit accounting fraud, hence the need for non-static regulatory interventions that take into account different fraudulent patterns. Accordingly, this study aims to improve the detection of accounting fraud via the implementation of several machine learning methods to better differentiate between fraudulent and non-fraudulent companies, and to further assist the task of examination within the riskier firms by evaluating relevant financial indicators. Out-of-sample results suggest there is great potential in detecting falsified financial statements through statistical modelling and analysis of publicly available accounting information. The proposed methodology can be of assistance to public auditors and regulatory agencies as it facilitates auditing processes and supports more targeted and effective examination of accounting reports.
Tasks
Published 2018-05-08
URL http://arxiv.org/abs/1805.02840v1
PDF http://arxiv.org/pdf/1805.02840v1.pdf
PWC https://paperswithcode.com/paper/fighting-accounting-fraud-through-forensic
Repo
Framework