Paper Group AWR 156
Bayesian Neural Networks at Finite Temperature
Title | Bayesian Neural Networks at Finite Temperature |
Authors | Robert J. N. Baldock, Nicola Marzari |
Abstract | We recapitulate the Bayesian formulation of neural network based classifiers and show that, while sampling from the posterior does indeed lead to better generalisation than is obtained by standard optimisation of the cost function, even better performance can in general be achieved by sampling finite temperature ($T$) distributions derived from the posterior. Taking the example of two different deep (3 hidden layers) classifiers for MNIST data, we find quite different $T$ values to be appropriate in each case. In particular, for a typical neural network classifier a clear minimum of the test error is observed at $T>0$. This suggests an early stopping criterion for full batch simulated annealing: cool until the average validation error starts to increase, then revert to the parameters with the lowest validation error. As $T$ is increased classifiers transition from accurate classifiers to classifiers that have higher training error than assigning equal probability to each class. Efficient studies of these temperature-induced effects are enabled using a replica-exchange Hamiltonian Monte Carlo simulation technique. Finally, we show how thermodynamic integration can be used to perform model selection for deep neural networks. Similar to the Laplace approximation, this approach assumes that the posterior is dominated by a single mode. Crucially, however, no assumption is made about the shape of that mode and it is not required to precisely compute and invert the Hessian. |
Tasks | Bayesian Inference, Model Selection |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.04154v1 |
PDF | http://arxiv.org/pdf/1904.04154v1.pdf |
PWC | https://paperswithcode.com/paper/bayesian-neural-networks-at-finite |
Repo | https://github.com/rjnbaldock/nn_sample |
Framework | pytorch |
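For readers who want to experiment with the finite-temperature idea from the entry above, the sketch below samples a toy classifier from the tempered posterior $p(\theta\,|\,D)^{1/T}$ using plain unadjusted Langevin dynamics rather than the replica-exchange Hamiltonian Monte Carlo the paper actually employs; the model, data, step size, and temperature are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch (not the paper's replica-exchange HMC): unadjusted Langevin
# dynamics targeting the tempered posterior p(theta | D)^(1/T), i.e. the
# density proportional to exp(-U(theta)/T), where U is the negative log
# posterior. Model, data, and hyperparameters are toy placeholders.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(256, 10)                       # toy inputs
y = (X[:, 0] > 0).long()                       # toy binary labels
w = torch.zeros(10, 2, requires_grad=True)     # "network": a linear classifier

def neg_log_posterior(w):
    nll = F.cross_entropy(X @ w, y, reduction="sum")
    neg_log_prior = 0.5 * (w ** 2).sum()       # standard normal prior on weights
    return nll + neg_log_prior

def langevin_step(w, T, eps=1e-3):
    """One unadjusted Langevin step on the density proportional to exp(-U/T)."""
    grad, = torch.autograd.grad(neg_log_posterior(w), w)
    with torch.no_grad():
        # drift follows -grad U; injected noise is scaled by the temperature T
        w += -0.5 * eps * grad + (eps * T) ** 0.5 * torch.randn_like(w)
    return w

for _ in range(2000):                          # correlated samples at T = 0.5
    w = langevin_step(w, T=0.5)
```

Setting T = 1 recovers ordinary posterior sampling, while T → 0 approaches cost-function minimisation, which is the trade-off the paper studies.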
Investigating an Effective Character-level Embedding in Korean Sentence Classification
Title | Investigating an Effective Character-level Embedding in Korean Sentence Classification |
Authors | Won Ik Cho, Seok Min Kim, Nam Soo Kim |
Abstract | Different from the writing systems of many Romance and Germanic languages, some languages or language families show complex conjunct forms in character composition. For such cases where the conjuncts consist of the components representing consonant(s) and vowel, various character encoding schemes can be adopted beyond merely making up a one-hot vector. However, there has been little work done on intra-language comparison regarding the performance of each representation. In this study, utilizing the Korean language, which is character-rich and agglutinative, we investigate which encoding scheme is the most effective among Jamo-level one-hot, character-level one-hot, character-level dense, and character-level multi-hot. Classification performance with each scheme is evaluated on two corpora: one on binary sentiment analysis of movie reviews, and the other on multi-class identification of intention types. The results show that character-level features perform better in general, although Jamo-level features can be competitive with attention-based models if an adequate parameter set size is guaranteed. |
Tasks | Sentence Classification, Sentiment Analysis |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1905.13656v3 |
PDF | https://arxiv.org/pdf/1905.13656v3.pdf |
PWC | https://paperswithcode.com/paper/investigating-an-effective-character-level |
Repo | https://github.com/warnikchow/coaudiotext |
Framework | tf |
Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities
Title | Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities |
Authors | Geewook Kim, Akifumi Okuno, Kazuki Fukui, Hidetoshi Shimodaira |
Abstract | We propose $\textit{weighted inner product similarity}$ (WIPS) for neural network-based graph embedding. In addition to the parameters of neural networks, we optimize the weights of the inner product by allowing positive and negative values. Despite its simplicity, WIPS can approximate arbitrary general similarities including positive definite, conditionally positive definite, and indefinite kernels. WIPS is free from similarity model selection, since it can learn any similarity models such as cosine similarity, negative Poincaré distance and negative Wasserstein distance. Our experiments show that the proposed method can learn high-quality distributed representations of nodes from real datasets, leading to an accurate approximation of similarities as well as high performance in inductive tasks. |
Tasks | Graph Embedding, Model Selection, Representation Learning |
Published | 2019-02-27 |
URL | https://arxiv.org/abs/1902.10409v2 |
PDF | https://arxiv.org/pdf/1902.10409v2.pdf |
PWC | https://paperswithcode.com/paper/representation-learning-with-weighted-inner |
Repo | https://github.com/kdrl/WIPS |
Framework | pytorch |
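As a rough illustration of the WIPS idea described above, the sketch below scores a pair of inputs with a learnable, sign-unconstrained weight on each embedding dimension, so the similarity $\sum_i \lambda_i x_i y_i$ can express indefinite as well as positive-definite kernels; the encoder architecture and dimensions are arbitrary assumptions, not the authors' implementation.

```python
# Minimal sketch of a weighted inner product similarity (WIPS) layer.
# The encoder and dimensions are placeholders chosen for illustration.
import torch
import torch.nn as nn

class WIPS(nn.Module):
    def __init__(self, in_dim, emb_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, emb_dim))
        # inner-product weights; positive and negative values are both allowed
        self.lam = nn.Parameter(torch.ones(emb_dim))

    def forward(self, a, b):
        za, zb = self.encoder(a), self.encoder(b)
        return (self.lam * za * zb).sum(dim=-1)   # sum_i lambda_i * za_i * zb_i

model = WIPS(in_dim=128)
a, b = torch.randn(8, 128), torch.randn(8, 128)
print(model(a, b).shape)   # torch.Size([8]): one similarity score per pair
```

Because the weights lambda are learned jointly with the encoder, the layer itself decides whether the similarity behaves like a plain inner product, a distance-like (indefinite) kernel, or something in between.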
Separating common (global and local) and distinct variation in multiple mixed types data sets
Title | Separating common (global and local) and distinct variation in multiple mixed types data sets |
Authors | Yipeng Song, Johan A. Westerhuis, Age K. Smilde |
Abstract | Multiple sets of measurements on the same objects obtained from different platforms may reflect partially complementary information of the studied system. The integrative analysis of such data sets not only provides us with the opportunity of a deeper understanding of the studied system, but also introduces some new statistical challenges. First, the separation of information that is common across all or some of the data sets, and the information that is specific to each data set is problematic. Furthermore, these data sets are often a mix of quantitative and discrete (binary or categorical) data types, while commonly used data fusion methods require all data sets to be quantitative. In this paper, we propose an exponential family simultaneous component analysis (ESCA) model to tackle the potential mixed data types problem of multiple data sets. In addition, a structured sparse pattern of the loading matrix is induced through a nearly unbiased group concave penalty to disentangle the global, local common and distinct information of the multiple data sets. A Majorization-Minimization based algorithm is derived to fit the proposed model. Analytic solutions are derived for updating all the parameters of the model in each iteration, and the algorithm will decrease the objective function in each iteration monotonically. For model selection, a missing value based cross validation procedure is implemented. The advantages of the proposed method in comparison with other approaches are assessed using comprehensive simulations as well as the analysis of real data from a chronic lymphocytic leukaemia (CLL) study. Availability: the codes to reproduce the results in this article are available at https://gitlab.com/uvabda. |
Tasks | Model Selection |
Published | 2019-02-17 |
URL | https://arxiv.org/abs/1902.06241v2 |
PDF | https://arxiv.org/pdf/1902.06241v2.pdf |
PWC | https://paperswithcode.com/paper/separating-common-global-and-local-and |
Repo | https://github.com/YipengUva/RpESCA |
Framework | none |
Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim
Title | Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim |
Authors | Farzad Farshchi, Qijing Huang, Heechul Yun |
Abstract | NVDLA is an open-source deep neural network (DNN) accelerator which has received a lot of attention from the community since its introduction by Nvidia. It is a full-featured hardware IP and can serve as a good reference for conducting research and development of SoCs with integrated accelerators. However, an expensive FPGA board is required to do experiments with this IP in a real SoC. Moreover, since NVDLA is clocked at a lower frequency on an FPGA, it would be hard to do accurate performance analysis with such a setup. To overcome these limitations, we integrate NVDLA into a real RISC-V SoC on the Amazon cloud FPGA using FireSim, a cycle-exact FPGA-accelerated simulator. We then evaluate the performance of NVDLA by running the YOLOv3 object-detection algorithm. Our results show that NVDLA can sustain 7.5 fps when running YOLOv3. We further analyze the performance by showing that sharing the last-level cache with NVDLA can result in up to 1.56x speedup. We then identify that sharing the memory system with the accelerator can result in unpredictable execution time for the real-time tasks running on this platform. We believe this is an important issue that must be addressed in order for on-chip DNN accelerators to be incorporated in real-time embedded systems. |
Tasks | Object Detection |
Published | 2019-03-05 |
URL | https://arxiv.org/abs/1903.06495v2 |
PDF | https://arxiv.org/pdf/1903.06495v2.pdf |
PWC | https://paperswithcode.com/paper/integrating-nvidia-deep-learning-accelerator |
Repo | https://github.com/CSL-KU/firesim-nvdla |
Framework | none |
Benchmarking TPU, GPU, and CPU Platforms for Deep Learning
Title | Benchmarking TPU, GPU, and CPU Platforms for Deep Learning |
Authors | Yu Emma Wang, Gu-Yeon Wei, David Brooks |
Abstract | Training deep learning models is compute-intensive and there is an industry-wide trend towards hardware specialization to improve performance. To systematically benchmark deep learning platforms, we introduce ParaDnn, a parameterized benchmark suite for deep learning that generates end-to-end models for fully connected (FC), convolutional (CNN), and recurrent (RNN) neural networks. Along with six real-world models, we benchmark Google’s Cloud TPU v2/v3, NVIDIA’s V100 GPU, and an Intel Skylake CPU platform. We take a deep dive into TPU architecture, reveal its bottlenecks, and highlight valuable lessons learned for future specialized system design. We also provide a thorough comparison of the platforms and find that each has unique strengths for some types of models. Finally, we quantify the rapid performance improvements that specialized software stacks provide for the TPU and GPU platforms. |
Tasks | |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10701v4 |
PDF | https://arxiv.org/pdf/1907.10701v4.pdf |
PWC | https://paperswithcode.com/paper/benchmarking-tpu-gpu-and-cpu-platforms-for |
Repo | https://github.com/Emma926/paradnn |
Framework | tf |
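The sketch below illustrates the parameterized-benchmark idea in the spirit of ParaDnn: end-to-end models are generated by sweeping hyperparameters such as depth and width, so the same training loop can be timed across many model sizes on each platform. The function name, grid values, and use of PyTorch are assumptions of this sketch, not ParaDnn's actual (TensorFlow-based) API.

```python
# Minimal sketch of a parameterized model generator for benchmarking:
# sweep depth and width of fully connected networks and report their sizes.
# Names and grid values are illustrative, not taken from ParaDnn.
import itertools
import torch.nn as nn

def make_fc_model(n_layers, width, in_dim=784, n_classes=10):
    layers, d = [], in_dim
    for _ in range(n_layers):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, n_classes))
    return nn.Sequential(*layers)

# sweep a small grid of fully connected models; each could then be trained
# and timed on the target platform (TPU, GPU, or CPU)
for n_layers, width in itertools.product([2, 4, 8], [256, 1024, 2048]):
    model = make_fc_model(n_layers, width)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"layers={n_layers:2d} width={width:5d} params={n_params:,}")
```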
Efficient Neural Architecture Search via Proximal Iterations
Title | Efficient Neural Architecture Search via Proximal Iterations |
Authors | Quanming Yao, Ju Xu, Wei-Wei Tu, Zhanxing Zhu |
Abstract | Neural architecture search (NAS) has recently attracted much research attention because of its ability to identify better architectures than handcrafted ones. However, many NAS methods, which optimize the search process in a discrete search space, need many GPU days for convergence. Recently, DARTS, which constructs a differentiable search space and then optimizes it by gradient descent, has obtained high-performance architectures and reduced the search time to several days. However, DARTS is still slow, as it updates an ensemble of all operations and keeps only one after convergence. Besides, DARTS can converge to inferior architectures due to the strong correlation among operations. In this paper, we propose a new differentiable Neural Architecture Search method based on Proximal gradient descent (denoted as NASP). Different from DARTS, NASP reformulates the search process as an optimization problem with a constraint that only one operation is allowed to be updated during forward and backward propagation. Since the constraint is hard to deal with, we propose a new algorithm inspired by proximal iterations to solve it. Experiments on various tasks demonstrate that NASP can obtain high-performance architectures with a tenfold speedup in computation time over DARTS. |
Tasks | Neural Architecture Search |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13577v3 |
PDF | https://arxiv.org/pdf/1905.13577v3.pdf |
PWC | https://paperswithcode.com/paper/differentiable-neural-architecture-search-via |
Repo | https://github.com/xiangning-chen/SIF |
Framework | pytorch |
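The sketch below illustrates the proximal step at the heart of NASP as described in the abstract: the continuous operation scores on an edge are projected onto the constraint that only one operation is active, so only that operation participates in forward and backward propagation. The function and values are illustrative placeholders, not the authors' code.

```python
# Minimal sketch of the proximal projection used to enforce the
# "one active operation per edge" constraint. Illustrative only.
import numpy as np

def prox_one_hot(alpha):
    """Project operation scores onto {a : exactly one nonzero entry}
    by keeping the largest score and zeroing the rest."""
    out = np.zeros_like(alpha)
    k = np.argmax(alpha)
    out[k] = alpha[k]
    return out

alpha = np.array([0.2, -0.1, 0.7, 0.05])   # continuous scores for 4 candidate ops
print(prox_one_hot(alpha))                  # [0. 0. 0.7 0.]: only op 2 is updated
```

Keeping the scores continuous between projections lets gradient descent still move all of them, while each forward/backward pass only ever touches a single operation, which is where the speedup over DARTS comes from.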
Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables
Title | Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables |
Authors | Friso H. Kingma, Pieter Abbeel, Jonathan Ho |
Abstract | The bits-back argument suggests that latent variable models can be turned into lossless compression schemes. Translating the bits-back argument into efficient and practical lossless compression schemes for general latent variable models, however, is still an open problem. Bits-Back with Asymmetric Numeral Systems (BB-ANS), recently proposed by Townsend et al. (2019), makes bits-back coding practically feasible for latent variable models with one latent layer, but it is inefficient for hierarchical latent variable models. In this paper we propose Bit-Swap, a new compression scheme that generalizes BB-ANS and achieves strictly better compression rates for hierarchical latent variable models with Markov chain structure. Through experiments we verify that Bit-Swap results in lossless compression rates that are empirically superior to existing techniques. Our implementation is available at https://github.com/fhkingma/bitswap. |
Tasks | Latent Variable Models |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06845v4 |
PDF | https://arxiv.org/pdf/1905.06845v4.pdf |
PWC | https://paperswithcode.com/paper/bit-swap-recursive-bits-back-coding-for |
Repo | https://github.com/fhkingma/bitswap |
Framework | pytorch |
Convolutional Poisson Gamma Belief Network
Title | Convolutional Poisson Gamma Belief Network |
Authors | Chaojie Wang, Bo Chen, Sucheng Xiao, Mingyuan Zhou |
Abstract | For text analysis, one often resorts to a lossy representation that either completely ignores word order or embeds each word as a low-dimensional dense feature vector. In this paper, we propose convolutional Poisson factor analysis (CPFA) that directly operates on a lossless representation that processes the words in each document as a sequence of high-dimensional one-hot vectors. To boost its performance, we further propose the convolutional Poisson gamma belief network (CPGBN) that couples CPFA with the gamma belief network via a novel probabilistic pooling layer. CPFA forms words into phrases and captures very specific phrase-level topics, and CPGBN further builds a hierarchy of increasingly more general phrase-level topics. For efficient inference, we develop both a Gibbs sampler and a Weibull distribution based convolutional variational auto-encoder. Experimental results demonstrate that CPGBN can extract high-quality text latent representations that capture the word order information, and hence can be leveraged as a building block to enrich a wide variety of existing latent variable models that ignore word order. |
Tasks | Latent Variable Models |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05394v1 |
PDF | https://arxiv.org/pdf/1905.05394v1.pdf |
PWC | https://paperswithcode.com/paper/convolutional-poisson-gamma-belief-network |
Repo | https://github.com/BoChenGroup/CPGBN |
Framework | tf |
A Contrastive Divergence for Combining Variational Inference and MCMC
Title | A Contrastive Divergence for Combining Variational Inference and MCMC |
Authors | Francisco J. R. Ruiz, Michalis K. Titsias |
Abstract | We develop a method to combine Markov chain Monte Carlo (MCMC) and variational inference (VI), leveraging the advantages of both inference approaches. Specifically, we improve the variational distribution by running a few MCMC steps. To make inference tractable, we introduce the variational contrastive divergence (VCD), a new divergence that replaces the standard Kullback-Leibler (KL) divergence used in VI. The VCD captures a notion of discrepancy between the initial variational distribution and its improved version (obtained after running the MCMC steps), and it converges asymptotically to the symmetrized KL divergence between the variational distribution and the posterior of interest. The VCD objective can be optimized efficiently with respect to the variational parameters via stochastic optimization. We show experimentally that optimizing the VCD leads to better predictive performance on two latent variable models: logistic matrix factorization and variational autoencoders (VAEs). |
Tasks | Latent Variable Models, Stochastic Optimization |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04062v2 |
PDF | https://arxiv.org/pdf/1905.04062v2.pdf |
PWC | https://paperswithcode.com/paper/a-contrastive-divergence-for-combining |
Repo | https://github.com/franrruiz/vcd_divergence |
Framework | none |
DAVANet: Stereo Deblurring with View Aggregation
Title | DAVANet: Stereo Deblurring with View Aggregation |
Authors | Shangchen Zhou, Jiawei Zhang, Wangmeng Zuo, Haozhe Xie, Jinshan Pan, Jimmy Ren |
Abstract | Nowadays stereo cameras are more commonly adopted in emerging devices such as dual-lens smartphones and unmanned aerial vehicles. However, they also suffer from blurry images in dynamic scenes which leads to visual discomfort and hampers further image processing. Previous works have succeeded in monocular deblurring, yet there are few studies on deblurring for stereoscopic images. By exploiting the two-view nature of stereo images, we propose a novel stereo image deblurring network with Depth Awareness and View Aggregation, named DAVANet. In our proposed network, 3D scene cues from the depth and varying information from two views are incorporated, which help to remove complex spatially-varying blur in dynamic scenes. Specifically, with our proposed fusion network, we integrate the bidirectional disparities estimation and deblurring into a unified framework. Moreover, we present a large-scale multi-scene dataset for stereo deblurring, containing 20,637 blurry-sharp stereo image pairs from 135 diverse sequences and their corresponding bidirectional disparities. The experimental results on our dataset demonstrate that DAVANet outperforms state-of-the-art methods in terms of accuracy, speed, and model size. |
Tasks | Deblurring |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05065v1 |
PDF | http://arxiv.org/pdf/1904.05065v1.pdf |
PWC | https://paperswithcode.com/paper/davanet-stereo-deblurring-with-view |
Repo | https://github.com/sczhou/DAVANet |
Framework | pytorch |
Tutorial: Complexity analysis of Singular Value Decomposition and its variants
Title | Tutorial: Complexity analysis of Singular Value Decomposition and its variants |
Authors | Xiaocan Li, Shuo Wang, Yinghao Cai |
Abstract | We compared the regular Singular Value Decomposition (SVD), truncated SVD, the Krylov method and Randomized PCA in terms of time and space complexity. It is well known that the Krylov method and Randomized PCA only perform well when $k \ll n$, i.e., when the number of eigenpairs needed is far smaller than the matrix size. We compared them for calculating all the eigenpairs. We also discussed the relationship between Principal Component Analysis and SVD. |
Tasks | |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1906.12085v3 |
PDF | https://arxiv.org/pdf/1906.12085v3.pdf |
PWC | https://paperswithcode.com/paper/famesvd-fast-and-memory-efficient-singular |
Repo | https://github.com/UnofficialJuliaMirrorSnapshots/FameSVD.jl-9ba2d756-9ce3-11e9-1a71-0ffcb019784d |
Framework | none |
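In the spirit of the tutorial's comparison, the short sketch below times a full SVD against a truncated solver when only $k \ll n$ singular pairs are needed; the matrix size and choice of $k$ are arbitrary and the exact timings will depend on hardware and BLAS backend.

```python
# Small timing sketch: full SVD versus a truncated solver that computes
# only the k largest singular pairs. Sizes are arbitrary illustrations.
import time
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 500))

t0 = time.perf_counter()
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # all 500 singular values
t_full = time.perf_counter() - t0

t0 = time.perf_counter()
Uk, sk, Vtk = svds(A, k=10)                        # only the 10 largest
t_trunc = time.perf_counter() - t0

print(f"full SVD: {t_full:.3f}s, truncated (k=10): {t_trunc:.3f}s")
```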
A framework for information extraction from tables in biomedical literature
Title | A framework for information extraction from tables in biomedical literature |
Authors | Nikola Milosevic, Cassie Gregson, Robert Hernandez, Goran Nenadic |
Abstract | The scientific literature is growing exponentially, and professionals are no longer able to cope with the current volume of publications. Text mining has in the past provided methods to retrieve and extract information from text; however, most of these approaches ignored tables and figures. Research on mining table data still lacks an integrated approach that considers all the complexities and challenges of a table. Our research examines methods for extracting numerical (number of patients, age, gender distribution) and textual (adverse reactions) information from tables in the clinical literature. We present a requirement analysis template and an integral methodology for information extraction from tables in the clinical domain that contains 7 steps: (1) table detection, (2) functional processing, (3) structural processing, (4) semantic tagging, (5) pragmatic processing, (6) cell selection and (7) syntactic processing and extraction. Our approach achieved F-measures between 82% and 92%, depending on the variable, task and its complexity. |
Tasks | Table Detection |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.10031v1 |
PDF | http://arxiv.org/pdf/1902.10031v1.pdf |
PWC | https://paperswithcode.com/paper/a-framework-for-information-extraction-from |
Repo | https://github.com/nikolamilosevic86/TabInOut |
Framework | none |
Identification, Interpretability, and Bayesian Word Embeddings
Title | Identification, Interpretability, and Bayesian Word Embeddings |
Authors | Adam M. Lauretig |
Abstract | Social scientists have recently turned to analyzing text using tools from natural language processing like word embeddings to measure concepts like ideology, bias, and affinity. However, word embeddings are difficult to use in the regression framework familiar to social scientists: embeddings are neither identified, nor directly interpretable. I offer two advances on standard embedding models to remedy these problems. First, I develop Bayesian Word Embeddings with Automatic Relevance Determination priors, relaxing the assumption that all embedding dimensions have equal weight. Second, I apply work identifying latent variable models to anchor the dimensions of the resulting embeddings, identifying them, and making them interpretable and usable in a regression. I then apply this model and anchoring approach to two cases: the shift in internationalist rhetoric in the American presidents’ inaugural addresses, and the relationship between bellicosity in American foreign policy decision-makers’ deliberations and hostile actions by the United States. I find that inaugural addresses became less internationalist after 1945, which goes against the conventional wisdom, and that an increase in bellicosity is associated with an increase in hostile actions by the United States, showing that elite deliberations are not cheap talk, and helping confirm the validity of the model. |
Tasks | Latent Variable Models, Word Embeddings |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.01628v1 |
PDF | http://arxiv.org/pdf/1904.01628v1.pdf |
PWC | https://paperswithcode.com/paper/identification-interpretability-and-bayesian |
Repo | https://github.com/adamlauretig/bwe |
Framework | none |
The Variational Predictive Natural Gradient
Title | The Variational Predictive Natural Gradient |
Authors | Da Tang, Rajesh Ranganath |
Abstract | Variational inference transforms posterior inference into parametric optimization thereby enabling the use of latent variable models where otherwise impractical. However, variational inference can be finicky when different variational parameters control variables that are strongly correlated under the model. Traditional natural gradients based on the variational approximation fail to correct for correlations when the approximation is not the true posterior. To address this, we construct a new natural gradient called the Variational Predictive Natural Gradient (VPNG). Unlike traditional natural gradients for variational inference, this natural gradient accounts for the relationship between model parameters and variational parameters. We demonstrate the insight with a simple example as well as the empirical value on a classification task, a deep generative model of images, and probabilistic matrix factorization for recommendation. |
Tasks | Latent Variable Models |
Published | 2019-03-07 |
URL | https://arxiv.org/abs/1903.02984v3 |
PDF | https://arxiv.org/pdf/1903.02984v3.pdf |
PWC | https://paperswithcode.com/paper/the-variational-predictive-natural-gradient |
Repo | https://github.com/datang1992/VPNG |
Framework | tf |