Paper Group AWR 156
Bayesian Neural Networks at Finite Temperature
Title | Bayesian Neural Networks at Finite Temperature |
Authors | Robert J. N. Baldock, Nicola Marzari |
Abstract | We recapitulate the Bayesian formulation of neural network based classifiers and show that, while sampling from the posterior does indeed lead to better generalisation than is obtained by standard optimisation of the cost function, even better performance can in general be achieved by sampling finite temperature ($T$) distributions derived from the posterior. Taking the example of two different deep (3 hidden layers) classifiers for MNIST data, we find quite different $T$ values to be appropriate in each case. In particular, for a typical neural network classifier a clear minimum of the test error is observed at $T>0$. This suggests an early stopping criterion for full batch simulated annealing: cool until the average validation error starts to increase, then revert to the parameters with the lowest validation error. As $T$ is increased classifiers transition from accurate classifiers to classifiers that have higher training error than assigning equal probability to each class. Efficient studies of these temperature-induced effects are enabled using a replica-exchange Hamiltonian Monte Carlo simulation technique. Finally, we show how thermodynamic integration can be used to perform model selection for deep neural networks. Similar to the Laplace approximation, this approach assumes that the posterior is dominated by a single mode. Crucially, however, no assumption is made about the shape of that mode and it is not required to precisely compute and invert the Hessian. |
Tasks | Bayesian Inference, Model Selection |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.04154v1 |
PDF | http://arxiv.org/pdf/1904.04154v1.pdf |
PWC | https://paperswithcode.com/paper/bayesian-neural-networks-at-finite |
Repo | https://github.com/rjnbaldock/nn_sample |
Framework | pytorch |
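For readers who want to experiment with the finite-temperature idea from the entry above, the sketch below samples a toy classifier from the tempered posterior $p(\theta\,|\,D)^{1/T}$ using plain unadjusted Langevin dynamics rather than the replica-exchange Hamiltonian Monte Carlo the paper actually employs; the model, data, step size, and temperature are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch (not the paper's replica-exchange HMC): unadjusted Langevin
# dynamics targeting the tempered posterior p(theta | D)^(1/T), i.e. the
# density proportional to exp(-U(theta)/T), where U is the negative log
# posterior. Model, data, and hyperparameters are toy placeholders.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(256, 10)                       # toy inputs
y = (X[:, 0] > 0).long()                       # toy binary labels
w = torch.zeros(10, 2, requires_grad=True)     # "network": a linear classifier

def neg_log_posterior(w):
    nll = F.cross_entropy(X @ w, y, reduction="sum")
    neg_log_prior = 0.5 * (w ** 2).sum()       # standard normal prior on weights
    return nll + neg_log_prior

def langevin_step(w, T, eps=1e-3):
    """One unadjusted Langevin step on the density proportional to exp(-U/T)."""
    grad, = torch.autograd.grad(neg_log_posterior(w), w)
    with torch.no_grad():
        # drift follows -grad U; injected noise is scaled by the temperature T
        w += -0.5 * eps * grad + (eps * T) ** 0.5 * torch.randn_like(w)
    return w

for _ in range(2000):                          # correlated samples at T = 0.5
    w = langevin_step(w, T=0.5)
```

Setting T = 1 recovers ordinary posterior sampling, while T → 0 approaches cost-function minimisation, which is the trade-off the paper studies.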
Investigating an Effective Character-level Embedding in Korean Sentence Classification
Title | Investigating an Effective Character-level Embedding in Korean Sentence Classification |
Authors | Won Ik Cho, Seok Min Kim, Nam Soo Kim |
Abstract | Different from the writing systems of many Romance and Germanic languages, some languages or language families show complex conjunct forms in character composition. For such cases where the conjuncts consist of the components representing consonant(s) and vowel, various character encoding schemes can be adopted beyond merely making up a one-hot vector. However, there has been little work done on intra-language comparison regarding the performance of each representation. In this study, utilizing the Korean language, which is character-rich and agglutinative, we investigate which encoding scheme is the most effective among Jamo-level one-hot, character-level one-hot, character-level dense, and character-level multi-hot. Classification performance with each scheme is evaluated on two corpora: one on binary sentiment analysis of movie reviews, and the other on multi-class identification of intention types. The results show that character-level features perform better in general, although Jamo-level features can be competitive with attention-based models if an adequate parameter set size is guaranteed. |
Tasks | Sentence Classification, Sentiment Analysis |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1905.13656v3 |
PDF | https://arxiv.org/pdf/1905.13656v3.pdf |
PWC | https://paperswithcode.com/paper/investigating-an-effective-character-level |
Repo | https://github.com/warnikchow/coaudiotext |
Framework | tf |
Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities
Title | Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities |
Authors | Geewook Kim, Akifumi Okuno, Kazuki Fukui, Hidetoshi Shimodaira |
Abstract | We propose $\textit{weighted inner product similarity}$ (WIPS) for neural network-based graph embedding. In addition to the parameters of neural networks, we optimize the weights of the inner product by allowing positive and negative values. Despite its simplicity, WIPS can approximate arbitrary general similarities including positive definite, conditionally positive definite, and indefinite kernels. WIPS is free from similarity model selection, since it can learn any similarity models such as cosine similarity, negative Poincaré distance and negative Wasserstein distance. Our experiments show that the proposed method can learn high-quality distributed representations of nodes from real datasets, leading to an accurate approximation of similarities as well as high performance in inductive tasks. |
Tasks | Graph Embedding, Model Selection, Representation Learning |
Published | 2019-02-27 |
URL | https://arxiv.org/abs/1902.10409v2 |
PDF | https://arxiv.org/pdf/1902.10409v2.pdf |
PWC | https://paperswithcode.com/paper/representation-learning-with-weighted-inner |
Repo | https://github.com/kdrl/WIPS |
Framework | pytorch |
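As a rough illustration of the WIPS idea described above, the sketch below scores a pair of inputs with a learnable, sign-unconstrained weight on each embedding dimension, so the similarity $\sum_i \lambda_i x_i y_i$ can express indefinite as well as positive-definite kernels; the encoder architecture and dimensions are arbitrary assumptions, not the authors' implementation.

```python
# Minimal sketch of a weighted inner product similarity (WIPS) layer.
# The encoder and dimensions are placeholders chosen for illustration.
import torch
import torch.nn as nn

class WIPS(nn.Module):
    def __init__(self, in_dim, emb_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, emb_dim))
        # inner-product weights; positive and negative values are both allowed
        self.lam = nn.Parameter(torch.ones(emb_dim))

    def forward(self, a, b):
        za, zb = self.encoder(a), self.encoder(b)
        return (self.lam * za * zb).sum(dim=-1)   # sum_i lambda_i * za_i * zb_i

model = WIPS(in_dim=128)
a, b = torch.randn(8, 128), torch.randn(8, 128)
print(model(a, b).shape)   # torch.Size([8]): one similarity score per pair
```

Because the weights lambda are learned jointly with the encoder, the layer itself decides whether the similarity behaves like a plain inner product, a distance-like (indefinite) kernel, or something in between.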
Separating common (global and local) and distinct variation in multiple mixed types data sets
Title | Separating common (global and local) and distinct variation in multiple mixed types data sets |
Authors | Yipeng Song, Johan A. Westerhuis, Age K. Smilde |
Abstract | Multiple sets of measurements on the same objects obtained from different platforms may reflect partially complementary information of the studied system. The integrative analysis of such data sets not only provides us with the opportunity of a deeper understanding of the studied system, but also introduces some new statistical challenges. First, the separation of information that is common across all or some of the data sets, and the information that is specific to each data set is problematic. Furthermore, these data sets are often a mix of quantitative and discrete (binary or categorical) data types, while commonly used data fusion methods require all data sets to be quantitative. In this paper, we propose an exponential family simultaneous component analysis (ESCA) model to tackle the potential mixed data types problem of multiple data sets. In addition, a structured sparse pattern of the loading matrix is induced through a nearly unbiased group concave penalty to disentangle the global, local common and distinct information of the multiple data sets. A Majorization-Minimization based algorithm is derived to fit the proposed model. Analytic solutions are derived for updating all the parameters of the model in each iteration, and the algorithm will decrease the objective function in each iteration monotonically. For model selection, a missing value based cross validation procedure is implemented. The advantages of the proposed method in comparison with other approaches are assessed using comprehensive simulations as well as the analysis of real data from a chronic lymphocytic leukaemia (CLL) study. Availability: the codes to reproduce the results in this article are available at https://gitlab.com/uvabda. |
Tasks | Model Selection |
Published | 2019-02-17 |
URL | https://arxiv.org/abs/1902.06241v2 |
PDF | https://arxiv.org/pdf/1902.06241v2.pdf |
PWC | https://paperswithcode.com/paper/separating-common-global-and-local-and |
Repo | https://github.com/YipengUva/RpESCA |
Framework | none |
Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim
Title | Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim |
Authors | Farzad Farshchi, Qijing Huang, Heechul Yun |
Abstract | NVDLA is an open-source deep neural network (DNN) accelerator which has received a lot of attention from the community since its introduction by Nvidia. It is a full-featured hardware IP and can serve as a good reference for conducting research and development of SoCs with integrated accelerators. However, an expensive FPGA board is required to do experiments with this IP in a real SoC. Moreover, since NVDLA is clocked at a lower frequency on an FPGA, it would be hard to do accurate performance analysis with such a setup. To overcome these limitations, we integrate NVDLA into a real RISC-V SoC on the Amazon cloud FPGA using FireSim, a cycle-exact FPGA-accelerated simulator. We then evaluate the performance of NVDLA by running the YOLOv3 object-detection algorithm. Our results show that NVDLA can sustain 7.5 fps when running YOLOv3. We further analyze the performance by showing that sharing the last-level cache with NVDLA can result in up to 1.56x speedup. We then identify that sharing the memory system with the accelerator can result in unpredictable execution time for the real-time tasks running on this platform. We believe this is an important issue that must be addressed in order for on-chip DNN accelerators to be incorporated in real-time embedded systems. |
Tasks | Object Detection |
Published | 2019-03-05 |
URL | https://arxiv.org/abs/1903.06495v2 |
PDF | https://arxiv.org/pdf/1903.06495v2.pdf |
PWC | https://paperswithcode.com/paper/integrating-nvidia-deep-learning-accelerator |
Repo | https://github.com/CSL-KU/firesim-nvdla |
Framework | none |
Benchmarking TPU, GPU, and CPU Platforms for Deep Learning
Title | Benchmarking TPU, GPU, and CPU Platforms for Deep Learning |
Authors | Yu Emma Wang, Gu-Yeon Wei, David Brooks |
Abstract | Training deep learning models is compute-intensive and there is an industry-wide trend towards hardware specialization to improve performance. To systematically benchmark deep learning platforms, we introduce ParaDnn, a parameterized benchmark suite for deep learning that generates end-to-end models for fully connected (FC), convolutional (CNN), and recurrent (RNN) neural networks. Along with six real-world models, we benchmark Google’s Cloud TPU v2/v3, NVIDIA’s V100 GPU, and an Intel Skylake CPU platform. We take a deep dive into TPU architecture, reveal its bottlenecks, and highlight valuable lessons learned for future specialized system design. We also provide a thorough comparison of the platforms and find that each has unique strengths for some types of models. Finally, we quantify the rapid performance improvements that specialized software stacks provide for the TPU and GPU platforms. |
Tasks | |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10701v4 |
PDF | https://arxiv.org/pdf/1907.10701v4.pdf |
PWC | https://paperswithcode.com/paper/benchmarking-tpu-gpu-and-cpu-platforms-for |
Repo | https://github.com/Emma926/paradnn |
Framework | tf |
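The sketch below illustrates the parameterized-benchmark idea in the spirit of ParaDnn: end-to-end models are generated by sweeping hyperparameters such as depth and width, so the same training loop can be timed across many model sizes on each platform. The function name, grid values, and use of PyTorch are assumptions of this sketch, not ParaDnn's actual (TensorFlow-based) API.

```python
# Minimal sketch of a parameterized model generator for benchmarking:
# sweep depth and width of fully connected networks and report their sizes.
# Names and grid values are illustrative, not taken from ParaDnn.
import itertools
import torch.nn as nn

def make_fc_model(n_layers, width, in_dim=784, n_classes=10):
    layers, d = [], in_dim
    for _ in range(n_layers):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, n_classes))
    return nn.Sequential(*layers)

# sweep a small grid of fully connected models; each could then be trained
# and timed on the target platform (TPU, GPU, or CPU)
for n_layers, width in itertools.product([2, 4, 8], [256, 1024, 2048]):
    model = make_fc_model(n_layers, width)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"layers={n_layers:2d} width={width:5d} params={n_params:,}")
```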
Efficient Neural Architecture Search via Proximal Iterations
Title | Efficient Neural Architecture Search via Proximal Iterations |
Authors | Quanming Yao, Ju Xu, Wei-Wei Tu, Zhanxing Zhu |
Abstract | Neural architecture search (NAS) has recently attracted much research attention because of its ability to identify better architectures than handcrafted ones. However, many NAS methods, which optimize the search process in a discrete search space, need many GPU days for convergence. Recently, DARTS, which constructs a differentiable search space and then optimizes it by gradient descent, has obtained high-performance architectures and reduced the search time to several days. However, DARTS is still slow, as it updates an ensemble of all operations and keeps only one after convergence. Besides, DARTS can converge to inferior architectures due to the strong correlation among operations. In this paper, we propose a new differentiable Neural Architecture Search method based on Proximal gradient descent (denoted as NASP). Different from DARTS, NASP reformulates the search process as an optimization problem with a constraint that only one operation is allowed to be updated during forward and backward propagation. Since the constraint is hard to deal with, we propose a new algorithm inspired by proximal iterations to solve it. Experiments on various tasks demonstrate that NASP can obtain high-performance architectures with a tenfold speedup in computation time over DARTS. |
Tasks | Neural Architecture Search |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13577v3 |
PDF | https://arxiv.org/pdf/1905.13577v3.pdf |
PWC | https://paperswithcode.com/paper/differentiable-neural-architecture-search-via |
Repo | https://github.com/xiangning-chen/SIF |
Framework | pytorch |
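The sketch below illustrates the proximal step at the heart of NASP as described in the abstract: the continuous operation scores on an edge are projected onto the constraint that only one operation is active, so only that operation participates in forward and backward propagation. The function and values are illustrative placeholders, not the authors' code.

```python
# Minimal sketch of the proximal projection used to enforce the
# "one active operation per edge" constraint. Illustrative only.
import numpy as np

def prox_one_hot(alpha):
    """Project operation scores onto {a : exactly one nonzero entry}
    by keeping the largest score and zeroing the rest."""
    out = np.zeros_like(alpha)
    k = np.argmax(alpha)
    out[k] = alpha[k]
    return out

alpha = np.array([0.2, -0.1, 0.7, 0.05])   # continuous scores for 4 candidate ops
print(prox_one_hot(alpha))                  # [0. 0. 0.7 0.]: only op 2 is updated
```

Keeping the scores continuous between projections lets gradient descent still move all of them, while each forward/backward pass only ever touches a single operation, which is where the speedup over DARTS comes from.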
Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables
Title | Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables |
Authors | Friso H. Kingma, Pieter Abbeel, Jonathan Ho |
Abstract | The bits-back argument suggests that latent variable models can be turned into lossless compression schemes. Translating the bits-back argument into efficient and practical lossless compression schemes for general latent variable models, however, is still an open problem. Bits-Back with Asymmetric Numeral Systems (BB-ANS), recently proposed by Townsend et al. (2019), makes bits-back coding practically feasible for latent variable models with one latent layer, but it is inefficient for hierarchical latent variable models. In this paper we propose Bit-Swap, a new compression scheme that generalizes BB-ANS and achieves strictly better compression rates for hierarchical latent variable models with Markov chain structure. Through experiments we verify that Bit-Swap results in lossless compression rates that are empirically superior to existing techniques. Our implementation is available at https://github.com/fhkingma/bitswap. |
Tasks | Latent Variable Models |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06845v4 |
PDF | https://arxiv.org/pdf/1905.06845v4.pdf |
PWC | https://paperswithcode.com/paper/bit-swap-recursive-bits-back-coding-for |
Repo | https://github.com/fhkingma/bitswap |
Framework | pytorch |
Convolutional Poisson Gamma Belief Network
Title | Convolutional Poisson Gamma Belief Network |
Authors | Chaojie Wang, Bo Chen, Sucheng Xiao, Mingyuan Zhou |
Abstract | For text analysis, one often resorts to a lossy representation that either completely ignores word order or embeds each word as a low-dimensional dense feature vector. In this paper, we propose convolutional Poisson factor analysis (CPFA) that directly operates on a lossless representation that processes the words in each document as a sequence of high-dimensional one-hot vectors. To boost its performance, we further propose the convolutional Poisson gamma belief network (CPGBN) that couples CPFA with the gamma belief network via a novel probabilistic pooling layer. CPFA forms words into phrases and captures very specific phrase-level topics, and CPGBN further builds a hierarchy of increasingly more general phrase-level topics. For efficient inference, we develop both a Gibbs sampler and a Weibull distribution based convolutional variational auto-encoder. Experimental results demonstrate that CPGBN can extract high-quality text latent representations that capture the word order information, and hence can be leveraged as a building block to enrich a wide variety of existing latent variable models that ignore word order. |
Tasks | Latent Variable Models |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05394v1 |
PDF | https://arxiv.org/pdf/1905.05394v1.pdf |
PWC | https://paperswithcode.com/paper/convolutional-poisson-gamma-belief-network |
Repo | https://github.com/BoChenGroup/CPGBN |
Framework | tf |
A Contrastive Divergence for Combining Variational Inference and MCMC
Title | A Contrastive Divergence for Combining Variational Inference and MCMC |
Authors | Francisco J. R. Ruiz, Michalis K. Titsias |
Abstract | We develop a method to combine Markov chain Monte Carlo (MCMC) and variational inference (VI), leveraging the advantages of both inference approaches. Specifically, we improve the variational distribution by running a few MCMC steps. To make inference tractable, we introduce the variational contrastive divergence (VCD), a new divergence that replaces the standard Kullback-Leibler (KL) divergence used in VI. The VCD captures a notion of discrepancy between the initial variational distribution and its improved version (obtained after running the MCMC steps), and it converges asymptotically to the symmetrized KL divergence between the variational distribution and the posterior of interest. The VCD objective can be optimized efficiently with respect to the variational parameters via stochastic optimization. We show experimentally that optimizing the VCD leads to better predictive performance on two latent variable models: logistic matrix factorization and variational autoencoders (VAEs). |
Tasks | Latent Variable Models, Stochastic Optimization |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04062v2 |
PDF | https://arxiv.org/pdf/1905.04062v2.pdf |
PWC | https://paperswithcode.com/paper/a-contrastive-divergence-for-combining |
Repo | https://github.com/franrruiz/vcd_divergence |
Framework | none |
DAVANet: Stereo Deblurring with View Aggregation
Title | DAVANet: Stereo Deblurring with View Aggregation |
Authors | Shangchen Zhou, Jiawei Zhang, Wangmeng Zuo, Haozhe Xie, Jinshan Pan, Jimmy Ren |
Abstract | Nowadays stereo cameras are more commonly adopted in emerging devices such as dual-lens smartphones and unmanned aerial vehicles. However, they also suffer from blurry images in dynamic scenes which leads to visual discomfort and hampers further image processing. Previous works have succeeded in monocular deblurring, yet there are few studies on deblurring for stereoscopic images. By exploiting the two-view nature of stereo images, we propose a novel stereo image deblurring network with Depth Awareness and View Aggregation, named DAVANet. In our proposed network, 3D scene cues from the depth and varying information from two views are incorporated, which help to remove complex spatially-varying blur in dynamic scenes. Specifically, with our proposed fusion network, we integrate the bidirectional disparities estimation and deblurring into a unified framework. Moreover, we present a large-scale multi-scene dataset for stereo deblurring, containing 20,637 blurry-sharp stereo image pairs from 135 diverse sequences and their corresponding bidirectional disparities. The experimental results on our dataset demonstrate that DAVANet outperforms state-of-the-art methods in terms of accuracy, speed, and model size. |
Tasks | Deblurring |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05065v1 |
PDF | http://arxiv.org/pdf/1904.05065v1.pdf |
PWC | https://paperswithcode.com/paper/davanet-stereo-deblurring-with-view |
Repo | https://github.com/sczhou/DAVANet |
Framework | pytorch |
Tutorial: Complexity analysis of Singular Value Decomposition and its variants
Title | Tutorial: Complexity analysis of Singular Value Decomposition and its variants |
Authors | Xiaocan Li, Shuo Wang, Yinghao Cai |
Abstract | We compared the regular Singular Value Decomposition (SVD), truncated SVD, the Krylov method and Randomized PCA in terms of time and space complexity. It is well known that the Krylov method and Randomized PCA only perform well when $k \ll n$, i.e., when the number of eigenpairs needed is far smaller than the matrix size. We compared them for calculating all the eigenpairs. We also discussed the relationship between Principal Component Analysis and SVD. |
Tasks | |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1906.12085v3 |
PDF | https://arxiv.org/pdf/1906.12085v3.pdf |
PWC | https://paperswithcode.com/paper/famesvd-fast-and-memory-efficient-singular |
Repo | https://github.com/UnofficialJuliaMirrorSnapshots/FameSVD.jl-9ba2d756-9ce3-11e9-1a71-0ffcb019784d |
Framework | none |
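In the spirit of the tutorial's comparison, the short sketch below times a full SVD against a truncated solver when only $k \ll n$ singular pairs are needed; the matrix size and choice of $k$ are arbitrary and the exact timings will depend on hardware and BLAS backend.

```python
# Small timing sketch: full SVD versus a truncated solver that computes
# only the k largest singular pairs. Sizes are arbitrary illustrations.
import time
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 500))

t0 = time.perf_counter()
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # all 500 singular values
t_full = time.perf_counter() - t0

t0 = time.perf_counter()
Uk, sk, Vtk = svds(A, k=10)                        # only the 10 largest
t_trunc = time.perf_counter() - t0

print(f"full SVD: {t_full:.3f}s, truncated (k=10): {t_trunc:.3f}s")
```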
A framework for information extraction from tables in biomedical literature
Title | A framework for information extraction from tables in biomedical literature |
Authors | Nikola Milosevic, Cassie Gregson, Robert Hernandez, Goran Nenadic |
Abstract | The scientific literature is growing exponentially, and professionals are no longer able to cope with the current volume of publications. Text mining has in the past provided methods to retrieve and extract information from text; however, most of these approaches ignored tables and figures. Research on mining table data still lacks an integrated approach that considers all the complexities and challenges of a table. Our research examines methods for extracting numerical (number of patients, age, gender distribution) and textual (adverse reactions) information from tables in the clinical literature. We present a requirement analysis template and an integral methodology for information extraction from tables in the clinical domain that contains 7 steps: (1) table detection, (2) functional processing, (3) structural processing, (4) semantic tagging, (5) pragmatic processing, (6) cell selection and (7) syntactic processing and extraction. Our approach achieved F-measures between 82% and 92%, depending on the variable, task and its complexity. |
Tasks | Table Detection |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.10031v1 |
PDF | http://arxiv.org/pdf/1902.10031v1.pdf |
PWC | https://paperswithcode.com/paper/a-framework-for-information-extraction-from |
Repo | https://github.com/nikolamilosevic86/TabInOut |
Framework | none |
Identification, Interpretability, and Bayesian Word Embeddings
Title | Identification, Interpretability, and Bayesian Word Embeddings |
Authors | Adam M. Lauretig |
Abstract | Social scientists have recently turned to analyzing text using tools from natural language processing like word embeddings to measure concepts like ideology, bias, and affinity. However, word embeddings are difficult to use in the regression framework familiar to social scientists: embeddings are neither identified, nor directly interpretable. I offer two advances on standard embedding models to remedy these problems. First, I develop Bayesian Word Embeddings with Automatic Relevance Determination priors, relaxing the assumption that all embedding dimensions have equal weight. Second, I apply work identifying latent variable models to anchor the dimensions of the resulting embeddings, identifying them, and making them interpretable and usable in a regression. I then apply this model and anchoring approach to two cases: the shift in internationalist rhetoric in the American presidents’ inaugural addresses, and the relationship between bellicosity in American foreign policy decision-makers’ deliberations and hostile actions by the United States. I find that inaugural addresses became less internationalist after 1945, which goes against the conventional wisdom, and that an increase in bellicosity is associated with an increase in hostile actions by the United States, showing that elite deliberations are not cheap talk, and helping confirm the validity of the model. |
Tasks | Latent Variable Models, Word Embeddings |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.01628v1 |
PDF | http://arxiv.org/pdf/1904.01628v1.pdf |
PWC | https://paperswithcode.com/paper/identification-interpretability-and-bayesian |
Repo | https://github.com/adamlauretig/bwe |
Framework | none |
The Variational Predictive Natural Gradient
Title | The Variational Predictive Natural Gradient |
Authors | Da Tang, Rajesh Ranganath |
Abstract | Variational inference transforms posterior inference into parametric optimization thereby enabling the use of latent variable models where otherwise impractical. However, variational inference can be finicky when different variational parameters control variables that are strongly correlated under the model. Traditional natural gradients based on the variational approximation fail to correct for correlations when the approximation is not the true posterior. To address this, we construct a new natural gradient called the Variational Predictive Natural Gradient (VPNG). Unlike traditional natural gradients for variational inference, this natural gradient accounts for the relationship between model parameters and variational parameters. We demonstrate the insight with a simple example as well as the empirical value on a classification task, a deep generative model of images, and probabilistic matrix factorization for recommendation. |
Tasks | Latent Variable Models |
Published | 2019-03-07 |
URL | https://arxiv.org/abs/1903.02984v3 |
PDF | https://arxiv.org/pdf/1903.02984v3.pdf |
PWC | https://paperswithcode.com/paper/the-variational-predictive-natural-gradient |
Repo | https://github.com/datang1992/VPNG |
Framework | tf |