Paper Group AWR 173
Residual Dense Network for Image Restoration
Title | Residual Dense Network for Image Restoration |
Authors | Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, Yun Fu |
Abstract | Convolutional neural networks have recently achieved great success in image restoration (IR) and also offer hierarchical features. However, most deep CNN-based IR models do not make full use of the hierarchical features from the original low-quality images, thereby achieving relatively low performance. In this paper, we propose a novel residual dense network (RDN) to address this problem in IR. We fully exploit the hierarchical features from all the convolutional layers. Specifically, we propose the residual dense block (RDB) to extract abundant local features via densely connected convolutional layers. RDB further allows direct connections from the state of the preceding RDB to all the layers of the current RDB, leading to a contiguous memory mechanism. To adaptively learn more effective features from preceding and current local features and to stabilize the training of wider networks, we propose local feature fusion in RDB. After fully obtaining dense local features, we use global feature fusion to jointly and adaptively learn global hierarchical features in a holistic way. We demonstrate the effectiveness of RDN with several representative IR applications: single image super-resolution, Gaussian image denoising, image compression artifact reduction, and image deblurring. Experiments on benchmark and real-world datasets show that our RDN achieves favorable performance against state-of-the-art methods for each IR task, both quantitatively and visually. |
Tasks | Deblurring, Denoising, Image Compression, Image Denoising, Image Restoration, Image Super-Resolution, Super-Resolution |
Published | 2018-12-25 |
URL | https://arxiv.org/abs/1812.10477v2 |
https://arxiv.org/pdf/1812.10477v2.pdf | |
PWC | https://paperswithcode.com/paper/residual-dense-network-for-image-restoration |
Repo | https://github.com/yulunzhang/RDN |
Framework | pytorch |
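
A minimal PyTorch sketch of the residual dense block (RDB) the abstract describes: densely connected convolutions, local feature fusion via a 1x1 convolution, and a local residual connection. The layer count, growth rate, and channel widths below are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Sketch of an RDB: densely connected convs + local feature fusion + local residual."""
    def __init__(self, channels=64, growth=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True)))
            in_ch += growth  # dense connections: each layer sees all preceding feature maps
        # local feature fusion: 1x1 conv back to the block's input width
        self.fusion = nn.Conv2d(in_ch, channels, kernel_size=1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        fused = self.fusion(torch.cat(feats, dim=1))
        return x + fused  # local residual learning, passed on to the next RDB

rdb = ResidualDenseBlock()
out = rdb(torch.randn(1, 64, 32, 32))  # -> torch.Size([1, 64, 32, 32])
```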
Change Detection in Graph Streams by Learning Graph Embeddings on Constant-Curvature Manifolds
Title | Change Detection in Graph Streams by Learning Graph Embeddings on Constant-Curvature Manifolds |
Authors | Daniele Grattarola, Daniele Zambon, Cesare Alippi, Lorenzo Livi |
Abstract | The space of graphs is often characterised by a non-trivial geometry, which complicates learning and inference in practical applications. A common approach is to use embedding techniques to represent graphs as points in a conventional Euclidean space, but non-Euclidean spaces have often been shown to be better suited for embedding graphs. Among these, constant-curvature Riemannian manifolds (CCMs) offer embedding spaces suitable for studying the statistical properties of a graph distribution, as they provide ways to easily compute metric geodesic distances. In this paper, we focus on the problem of detecting changes in stationarity in a stream of attributed graphs. To this end, we introduce a novel change detection framework based on neural networks and CCMs, that takes into account the non-Euclidean nature of graphs. Our contribution in this work is twofold. First, via a novel approach based on adversarial learning, we compute graph embeddings by training an autoencoder to represent graphs on CCMs. Second, we introduce two novel change detection tests operating on CCMs. We perform experiments on synthetic data, as well as two real-world application scenarios: the detection of epileptic seizures using functional connectivity brain networks, and the detection of hostility between two subjects, using human skeletal graphs. Results show that the proposed methods are able to detect even small changes in a graph-generating process, consistently outperforming approaches based on Euclidean embeddings. |
Tasks | Seizure Detection |
Published | 2018-05-16 |
URL | http://arxiv.org/abs/1805.06299v3 |
http://arxiv.org/pdf/1805.06299v3.pdf | |
PWC | https://paperswithcode.com/paper/change-detection-in-graph-streams-by-learning |
Repo | https://github.com/dan-zam/cdg |
Framework | none |
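
The appeal of constant-curvature manifolds in this paper is that geodesic distances have closed forms. Below is a NumPy sketch, for illustration only, of the geodesic distance on a hypersphere of radius `r` (constant positive curvature 1/r²); the hyperbolic case is analogous with the Minkowski inner product, and the radius here is an assumed parameter.

```python
import numpy as np

def spherical_geodesic(x, y, radius=1.0):
    """Geodesic distance between two points lying on a hypersphere of the given radius."""
    cos_angle = np.dot(x, y) / radius**2
    cos_angle = np.clip(cos_angle, -1.0, 1.0)   # guard against rounding error
    return radius * np.arccos(cos_angle)

# toy usage: two unit vectors at 90 degrees are pi/2 apart on the unit sphere
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
print(spherical_geodesic(x, y))  # ~1.5708
```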
textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior
Title | textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior |
Authors | Pankaj Gupta, Yatin Chaudhary, Florian Buettner, Hinrich Schütze |
Abstract | We address two challenges of probabilistic topic modelling in order to better estimate the probability of a word in a given context, i.e., P(word\|context): (1) No Language Structure in Context: Probabilistic topic models ignore word order by summarizing a given context as a “bag-of-words” and consequently the semantics of words in the context is lost. The LSTM-LM learns a vector-space representation of each word by accounting for word order in local collocation patterns and models complex characteristics of language (e.g., syntax and semantics), while the TM simultaneously learns a latent representation from the entire document and discovers the underlying thematic structure. We unite two complementary paradigms of learning the meaning of word occurrences by combining a TM (e.g., DocNADE) and a LM in a unified probabilistic framework, named ctx-DocNADE. (2) Limited Context and/or Smaller Training Corpus of Documents: In settings with a small number of word occurrences (i.e., lack of context) in short text or data sparsity in a corpus of few documents, the application of TMs is challenging. We address this challenge by incorporating external knowledge into neural autoregressive topic models via a language modelling approach: we use word embeddings as input to an LSTM-LM with the aim of improving the word-topic mapping on a smaller and/or short-text corpus. The proposed DocNADE extension is named ctx-DocNADEe. We present novel neural autoregressive topic model variants coupled with neural LMs and embedding priors that consistently outperform state-of-the-art generative TMs in terms of generalization (perplexity), interpretability (topic coherence) and applicability (retrieval and classification) over 6 long-text and 8 short-text datasets from diverse domains. |
Tasks | Information Extraction, Information Retrieval, Language Modelling, Topic Models, Unsupervised Representation Learning, Word Embeddings |
Published | 2018-10-09 |
URL | http://arxiv.org/abs/1810.03947v4 |
http://arxiv.org/pdf/1810.03947v4.pdf | |
PWC | https://paperswithcode.com/paper/texttovec-deep-contextualized-neural |
Repo | https://github.com/pgcool/textTOvec |
Framework | tf |
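
A sketch, under stated assumptions, of the kind of composition ctx-DocNADE performs: the autoregressive topic context (a sum over preceding words' topic columns) is mixed with an LSTM language-model state before the nonlinearity. The dimensions and the mixing weight `lam` are illustrative, and for brevity the LSTM state is not shifted by one position as a strictly autoregressive predictor would require.

```python
import torch
import torch.nn as nn

vocab, n_topics, emb_dim = 1000, 50, 100   # illustrative sizes, not the paper's settings
W = nn.Embedding(vocab, n_topics)          # DocNADE topic matrix, one column per vocabulary word
lstm_emb = nn.Embedding(vocab, emb_dim)
lstm = nn.LSTM(emb_dim, n_topics, batch_first=True)
lam = 0.5                                  # mixing weight between topic and language-model context

def ctx_hidden(doc):
    """Per-position hidden states mixing bag-of-words topic context with LSTM context."""
    tokens = torch.tensor(doc).unsqueeze(0)      # (1, T)
    emb = W(tokens)                              # (1, T, n_topics)
    topic_ctx = torch.cumsum(emb, dim=1) - emb   # sum over strictly preceding words (DocNADE-style)
    lm_ctx, _ = lstm(lstm_emb(tokens))           # order-aware states from the language model
    return torch.sigmoid(topic_ctx + lam * lm_ctx)

h = ctx_hidden([3, 17, 42, 7])                   # (1, 4, n_topics): one hidden state per position
```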
Document Informed Neural Autoregressive Topic Models with Distributional Prior
Title | Document Informed Neural Autoregressive Topic Models with Distributional Prior |
Authors | Pankaj Gupta, Yatin Chaudhary, Florian Buettner, Hinrich Schütze |
Abstract | We address two challenges in topic models: (1) Context information around words helps in determining their actual meaning, e.g., “networks” used in the contexts “artificial neural networks” vs. “biological neuron networks”. Generative topic models infer topic-word distributions, taking no or only little context into account. Here, we extend a neural autoregressive topic model to exploit the full context information around words in a document in a language modeling fashion. The proposed model is named iDocNADE. (2) Due to the small number of word occurrences (i.e., lack of context) in short text and data sparsity in a corpus of few documents, the application of topic models is challenging on such texts. Therefore, we propose a simple and efficient way of incorporating external knowledge into neural autoregressive topic models: we use embeddings as a distributional prior. The proposed variants are named DocNADEe and iDocNADEe. We present novel neural autoregressive topic model variants that consistently outperform state-of-the-art generative topic models in terms of generalization, interpretability (topic coherence) and applicability (retrieval and classification) over 7 long-text and 8 short-text datasets from diverse domains. |
Tasks | Language Modelling, Topic Models |
Published | 2018-09-15 |
URL | http://arxiv.org/abs/1809.06709v2 |
http://arxiv.org/pdf/1809.06709v2.pdf | |
PWC | https://paperswithcode.com/paper/document-informed-neural-autoregressive-topic |
Repo | https://github.com/pgcool/iDocNADEe |
Framework | tf |
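
A minimal NumPy sketch of the embedding-as-prior idea described above: the trainable autoregressive context is augmented with a contribution from a fixed, pretrained embedding matrix scaled by a prior strength. All sizes, the random matrices, and the weight `lam` are placeholders; in practice `E` would hold pretrained embeddings rather than random values.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 1000, 50
W = rng.normal(size=(hidden, vocab))         # trainable topic matrix (DocNADE-style)
E = rng.normal(size=(hidden, vocab))         # stand-in for a fixed pretrained embedding matrix
c = np.zeros(hidden)
lam = 0.1                                    # strength of the distributional prior

def hidden_state(preceding_words):
    """DocNADEe-style pre-activation: trainable context plus embedding-prior context."""
    ctx = sum(W[:, v] + lam * E[:, v] for v in preceding_words)
    return 1.0 / (1.0 + np.exp(-(c + ctx)))  # sigmoid hidden state

h = hidden_state([3, 17, 42])                # hidden vector informing the next word's topic mixture
```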
Deep Neural Networks with Multi-Branch Architectures Are Less Non-Convex
Title | Deep Neural Networks with Multi-Branch Architectures Are Less Non-Convex |
Authors | Hongyang Zhang, Junru Shao, Ruslan Salakhutdinov |
Abstract | Several recently proposed architectures of neural networks such as ResNeXt, Inception, Xception, SqueezeNet and Wide ResNet are based on the design idea of having multiple branches and have demonstrated improved performance in many applications. We show that one cause of this success is that the multi-branch architecture is less non-convex in terms of the duality gap. The duality gap measures the degree of intrinsic non-convexity of an optimization problem: a smaller gap in relative value implies a lower degree of intrinsic non-convexity. The challenge is to quantitatively measure the duality gap of highly non-convex problems such as deep neural networks. In this work, we provide strong guarantees of this quantity for two classes of network architectures. For the neural networks with arbitrary activation functions, multi-branch architecture and a variant of hinge loss, we show that the duality gap of both population and empirical risks shrinks to zero as the number of branches increases. This result sheds light on better understanding the power of over-parametrization where increasing the network width tends to make the loss surface less non-convex. For the neural networks with linear activation function and $\ell_2$ loss, we show that the duality gap of empirical risk is zero. Our two results work for arbitrary depths and adversarial data, while the analytical techniques might be of independent interest to non-convex optimization more broadly. Experiments on both synthetic and real-world datasets validate our results. |
Tasks | |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.01845v2 |
http://arxiv.org/pdf/1806.01845v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-neural-networks-with-multi-branch |
Repo | https://github.com/hongyanz/multibranch |
Framework | pytorch |
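
For reference, the central quantity can be stated compactly. This is the standard definition of the duality gap via the Lagrangian, not a result specific to the paper; the paper's claim is that this gap (relative to the primal value) shrinks to zero as the number of branches grows.

```latex
% Primal value, dual value, and duality gap for a constrained problem with Lagrangian L
p^* = \min_{\theta} \max_{\lambda \ge 0} L(\theta, \lambda), \qquad
d^* = \max_{\lambda \ge 0} \min_{\theta} L(\theta, \lambda), \qquad
\Delta = p^* - d^* \ge 0 .
```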
DNN Feature Map Compression using Learned Representation over GF(2)
Title | DNN Feature Map Compression using Learned Representation over GF(2) |
Authors | Denis A. Gudovskiy, Alec Hodgkinson, Luca Rigazio |
Abstract | In this paper, we introduce a method to compress intermediate feature maps of deep neural networks (DNNs) to decrease memory storage and bandwidth requirements during inference. Unlike previous works, the proposed method is based on converting fixed-point activations into vectors over the smallest GF(2) finite field followed by nonlinear dimensionality reduction (NDR) layers embedded into a DNN. Such an end-to-end learned representation finds more compact feature maps by exploiting quantization redundancies within the fixed-point activations along the channel or spatial dimensions. We apply the proposed network architectures derived from modified SqueezeNet and MobileNetV2 to the tasks of ImageNet classification and PASCAL VOC object detection. Compared to prior approaches, the conducted experiments show a factor of 2 decrease in memory requirements with minor degradation in accuracy while adding only bitwise computations. |
Tasks | Dimensionality Reduction, Object Detection, Quantization |
Published | 2018-08-15 |
URL | http://arxiv.org/abs/1808.05285v1 |
http://arxiv.org/pdf/1808.05285v1.pdf | |
PWC | https://paperswithcode.com/paper/dnn-feature-map-compression-using-learned |
Repo | https://github.com/gudovskiy/fmap_compression |
Framework | none |
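
A NumPy sketch of the first step the abstract describes: unpacking fixed-point (here uint8) activations into binary planes over GF(2) along the channel dimension, which the learned nonlinear-dimensionality-reduction layers would then compress. The bit width and tensor layout are illustrative assumptions.

```python
import numpy as np

def to_gf2(feature_map, bits=8):
    """Unpack fixed-point activations into binary {0,1} planes over GF(2).

    feature_map: uint8 array of shape (C, H, W); returns shape (C*bits, H, W),
    replacing each channel by its binary planes (most significant bit first).
    """
    c, h, w = feature_map.shape
    shifts = np.arange(bits - 1, -1, -1, dtype=np.uint8)
    planes = (feature_map[:, None] >> shifts[None, :, None, None]) & 1
    return planes.reshape(c * bits, h, w).astype(np.float32)

fmap = np.random.randint(0, 256, size=(4, 8, 8), dtype=np.uint8)
binary = to_gf2(fmap)        # (32, 8, 8) binary tensor fed to the NDR layers
```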
Understanding the impact of entropy on policy optimization
Title | Understanding the impact of entropy on policy optimization |
Authors | Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans |
Abstract | Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with \emph{exploration} by encouraging the selection of more stochastic policies. In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function. We first show that even with access to the exact gradient, policy optimization is difficult due to the geometry of the objective function. Then, we qualitatively show that in some environments, a policy with higher entropy can make the optimization landscape smoother, thereby connecting local optima and enabling the use of larger learning rates. This paper presents new tools for understanding the optimization landscape, shows that policy entropy serves as a regularizer, and highlights the challenge of designing general-purpose policy optimization algorithms. |
Tasks | |
Published | 2018-11-27 |
URL | https://arxiv.org/abs/1811.11214v5 |
https://arxiv.org/pdf/1811.11214v5.pdf | |
PWC | https://paperswithcode.com/paper/understanding-the-impact-of-entropy-on-policy |
Repo | https://github.com/zafarali/emdp |
Framework | none |
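
The entropy regularization analyzed here is the familiar bonus added to the policy-gradient objective. A short PyTorch sketch with toy network sizes and a REINFORCE-style loss; the coefficient `tau` and the tiny network are illustrative only.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))  # toy CartPole-sized net
tau = 0.01                                                             # entropy coefficient

def pg_loss(states, actions, returns):
    """REINFORCE-style loss with an entropy bonus (the regularizer studied in the paper)."""
    dist = torch.distributions.Categorical(logits=policy(states))
    log_probs = dist.log_prob(actions)
    entropy = dist.entropy().mean()
    return -(log_probs * returns).mean() - tau * entropy  # larger tau favors more stochastic policies

states = torch.randn(8, 4)
actions = torch.randint(0, 2, (8,))
returns = torch.randn(8)
loss = pg_loss(states, actions, returns)
loss.backward()
```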
The Description Length of Deep Learning Models
Title | The Description Length of Deep Learning Models |
Authors | Léonard Blier, Yann Ollivier |
Abstract | Solomonoff’s general theory of inference and the Minimum Description Length principle formalize Occam’s razor, and hold that a good model of data is a model that is good at losslessly compressing the data, including the cost of describing the model itself. Deep neural networks might seem to go against this principle given the large number of parameters to be encoded. We demonstrate experimentally the ability of deep neural networks to compress the training data even when accounting for parameter encoding. The compression viewpoint originally motivated the use of variational methods in neural networks. Unexpectedly, we found that these variational methods provide surprisingly poor compression bounds, despite being explicitly built to minimize such bounds. This might explain the relatively poor practical performance of variational methods in deep learning. On the other hand, simple incremental encoding methods yield excellent compression values on deep networks, vindicating Solomonoff’s approach. |
Tasks | |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.07044v5 |
http://arxiv.org/pdf/1802.07044v5.pdf | |
PWC | https://paperswithcode.com/paper/the-description-length-of-deep-learning |
Repo | https://github.com/leonardblier/descriptionlengthdeeplearning |
Framework | pytorch |
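
The "simple incremental encoding methods" praised in the abstract are prequential codes: encode each data chunk with a model trained only on the chunks seen so far, and sum the resulting log-losses in bits. A minimal sketch assuming placeholder `make_model`, `fit`, and `nll_bits` callables, followed by a toy Bernoulli example.

```python
import math

def prequential_codelength(chunks, make_model, fit, nll_bits):
    """Incremental (prequential) description length of a dataset, in bits.

    chunks     : list of data chunks [D_1, ..., D_k], encoded in order
    make_model : () -> fresh/default model (placeholder)
    fit        : (model, data_so_far) -> trained model (placeholder)
    nll_bits   : (model, chunk) -> negative log-likelihood of the chunk, in bits
    """
    total_bits = 0.0
    seen = []
    model = make_model()                      # the first chunk is paid for with the default model
    for chunk in chunks:
        total_bits += nll_bits(model, chunk)  # cost of transmitting this chunk with the current model
        seen.extend(chunk)
        model = fit(make_model(), seen)       # retrain on everything seen so far
    return total_bits

# toy usage: prequentially encode coin flips with a Laplace-smoothed Bernoulli model
flips = [[1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 1, 0]]
make = lambda: 0.5
fit = lambda _, seen: (sum(seen) + 1) / (len(seen) + 2)
nll = lambda p, chunk: sum(-math.log2(p if x else 1 - p) for x in chunk)
print(prequential_codelength(flips, make, fit, nll))  # total bits to encode all flips
```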
Data-to-Text Generation with Content Selection and Planning
Title | Data-to-Text Generation with Content Selection and Planning |
Authors | Ratish Puduppully, Li Dong, Mirella Lapata |
Abstract | Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order. In this work, we present a neural network architecture which incorporates content selection and planning without sacrificing end-to-end training. We decompose the generation task into two stages. Given a corpus of data records (paired with descriptive documents), we first generate a content plan highlighting which information should be mentioned and in which order, and then generate the document while taking the content plan into account. Automatic and human-based evaluation experiments show that our model outperforms strong baselines, improving the state of the art on the recently released RotoWire dataset. |
Tasks | Data-to-Text Generation, Text Generation |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00582v2 |
http://arxiv.org/pdf/1809.00582v2.pdf | |
PWC | https://paperswithcode.com/paper/data-to-text-generation-with-content |
Repo | https://github.com/jugalw13/Red-Hat-Hack |
Framework | none |
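
The two-stage decomposition described above can be written as a factorization through a latent content plan z. This is a schematic rendering of the idea, not the paper's exact notation.

```latex
% r: data records, z: content plan, y: generated document
p(y \mid r) \;=\; \sum_{z} p(z \mid r)\, p(y \mid r, z)
\;\approx\; p(\hat{z} \mid r)\, p(y \mid r, \hat{z}),
\qquad \hat{z} = \arg\max_{z} p(z \mid r).
```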
A Probabilistic Model of the Bitcoin Blockchain
Title | A Probabilistic Model of the Bitcoin Blockchain |
Authors | Marc Jourdan, Sebastien Blandin, Laura Wynter, Pralhad Deshpande |
Abstract | The Bitcoin transaction graph is a public data structure organized as transactions between addresses, each associated with a logical entity. In this work, we introduce a complete probabilistic model of the Bitcoin Blockchain. We first formulate a set of conditional dependencies induced by the Bitcoin protocol at the block level and derive a corresponding fully observed graphical model of a Bitcoin block. We then extend the model to include hidden entity attributes such as the functional category of the associated logical agent and derive asymptotic bounds on the privacy properties implied by this model. At the network level, we show evidence of complex transaction-to-transaction behavior and present a relevant discriminative model of the agent categories. Performance of both the block-based graphical model and the network-level discriminative model is evaluated on a subset of the public Bitcoin Blockchain. |
Tasks | |
Published | 2018-11-07 |
URL | http://arxiv.org/abs/1812.05451v1 |
http://arxiv.org/pdf/1812.05451v1.pdf | |
PWC | https://paperswithcode.com/paper/a-probabilistic-model-of-the-bitcoin |
Repo | https://github.com/Maru92/EntityAddressBitcoin |
Framework | none |
Neural Body Fitting: Unifying Deep Learning and Model-Based Human Pose and Shape Estimation
Title | Neural Body Fitting: Unifying Deep Learning and Model-Based Human Pose and Shape Estimation |
Authors | Mohamed Omran, Christoph Lassner, Gerard Pons-Moll, Peter V. Gehler, Bernt Schiele |
Abstract | Direct prediction of 3D body pose and shape remains a challenge even for highly parameterized deep learning models. Mapping from the 2D image space to the prediction space is difficult: perspective ambiguities make the loss function noisy and training data is scarce. In this paper, we propose a novel approach (Neural Body Fitting (NBF)). It integrates a statistical body model within a CNN, leveraging reliable bottom-up semantic body part segmentation and robust top-down body model constraints. NBF is fully differentiable and can be trained using 2D and 3D annotations. In detailed experiments, we analyze how the components of our model affect performance, especially the use of part segmentations as an explicit intermediate representation, and present a robust, efficiently trainable framework for 3D human pose estimation from 2D images with competitive results on standard benchmarks. Code will be made available at http://github.com/mohomran/neural_body_fitting |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05942v1 |
http://arxiv.org/pdf/1808.05942v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-body-fitting-unifying-deep-learning |
Repo | https://github.com/mohomran/neural_body_fitting |
Framework | tf |
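
A heavily simplified PyTorch sketch of the NBF pipeline: a segmentation network produces part maps, a CNN regresses body-model parameters from them, and a differentiable body-model layer maps parameters to joints so that 2D/3D losses can flow end to end. The `BodyModel` module and all layer sizes are placeholder stand-ins, not the statistical body model (SMPL) used in the paper.

```python
import torch
import torch.nn as nn

N_PARTS, N_PARAMS, N_JOINTS = 12, 82, 24   # illustrative sizes only

segnet = nn.Conv2d(3, N_PARTS, kernel_size=3, padding=1)          # stand-in part-segmentation CNN
regressor = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(N_PARTS, N_PARAMS))           # stand-in parameter regressor

class BodyModel(nn.Module):
    """Placeholder for a differentiable statistical body model (pose/shape params -> 3D joints)."""
    def __init__(self):
        super().__init__()
        self.to_joints = nn.Linear(N_PARAMS, N_JOINTS * 3)
    def forward(self, params):
        return self.to_joints(params).view(-1, N_JOINTS, 3)

body = BodyModel()

def nbf_forward(image):
    parts = segnet(image).softmax(dim=1)   # explicit intermediate part-segmentation representation
    params = regressor(parts)              # body-model parameters predicted from the segmentation
    return body(params)                    # 3D joints; compare against 2D/3D annotations in the loss

joints3d = nbf_forward(torch.randn(2, 3, 64, 64))   # -> (2, 24, 3)
```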
Boltzmann Generators – Sampling Equilibrium States of Many-Body Systems with Deep Learning
Title | Boltzmann Generators – Sampling Equilibrium States of Many-Body Systems with Deep Learning |
Authors | Frank Noé, Simon Olsson, Jonas Köhler, Hao Wu |
Abstract | Computing equilibrium states in condensed-matter many-body systems, such as solvated proteins, is a long-standing challenge. Lacking methods for generating statistically independent equilibrium samples in “one shot”, vast computational effort is invested in simulating these systems in small steps, e.g., using Molecular Dynamics. Combining deep learning and statistical mechanics, we here develop Boltzmann Generators, which are shown to generate unbiased one-shot equilibrium samples of representative condensed matter systems and proteins. Boltzmann Generators use neural networks to learn a coordinate transformation of the complex configurational equilibrium distribution to a distribution that can be easily sampled. Accurate computation of free energy differences and discovery of new configurations are demonstrated, providing a statistical mechanics tool that can avoid rare events during sampling without prior knowledge of reaction coordinates. |
Tasks | |
Published | 2018-12-04 |
URL | https://arxiv.org/abs/1812.01729v2 |
https://arxiv.org/pdf/1812.01729v2.pdf | |
PWC | https://paperswithcode.com/paper/boltzmann-generators-sampling-equilibrium |
Repo | https://github.com/noegroup/project_boltzmann_generators |
Framework | tf |
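
The "unbiased one-shot" property rests on reweighting: samples drawn from a tractable generator density q(x) are weighted by w(x) ∝ exp(-u(x))/q(x) before estimating equilibrium observables. A NumPy sketch with a toy double-well energy and a Gaussian stand-in for the trained invertible network; everything here is illustrative, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(x):
    """Toy double-well energy in place of a real molecular energy u(x)."""
    return (x**2 - 1.0)**2

def sample_generator(n, sigma=1.5):
    """Stand-in generator: a Gaussian whose density we can evaluate exactly."""
    x = rng.normal(0.0, sigma, size=n)
    log_q = -0.5 * (x / sigma)**2 - np.log(sigma * np.sqrt(2 * np.pi))
    return x, log_q

x, log_q = sample_generator(100_000)
log_w = -energy(x) - log_q                 # log importance weights: exp(-u(x)) / q(x)
w = np.exp(log_w - log_w.max())
w /= w.sum()

# reweighted one-shot estimate of an equilibrium observable, here <x^2>
print(np.sum(w * x**2))
```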
Learning Human Optical Flow
Title | Learning Human Optical Flow |
Authors | Anurag Ranjan, Javier Romero, Michael J. Black |
Abstract | The optical flow of humans is well known to be useful for the analysis of human action. Given this, we devise an optical flow algorithm specifically for human motion and show that it is superior to generic flow methods. Designing a method by hand is impractical, so we develop a new training database of image sequences with ground truth optical flow. For this we use a 3D model of the human body and motion capture data to synthesize realistic flow fields. We then train a convolutional neural network to estimate human flow fields from pairs of images. Since many applications in human motion analysis depend on speed, and we anticipate mobile applications, we base our method on SpyNet with several modifications. We demonstrate that our trained network is more accurate than a wide range of top methods on held-out test data and that it generalizes well to real image sequences. When combined with a person detector/tracker, the approach provides a full solution to the problem of 2D human flow estimation. Both the code and the dataset are available for research. |
Tasks | Motion Capture, Optical Flow Estimation |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05666v2 |
http://arxiv.org/pdf/1806.05666v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-human-optical-flow |
Repo | https://github.com/anuragranj/humanflow |
Framework | pytorch |
Natural Gradient Deep Q-learning
Title | Natural Gradient Deep Q-learning |
Authors | Ethan Knight, Osher Lerner |
Abstract | We present a novel algorithm to train a deep Q-learning agent using natural-gradient techniques. We compare the original deep Q-network (DQN) algorithm to its natural-gradient counterpart, which we refer to as NGDQN, on a collection of classic control domains. Without employing target networks, NGDQN significantly outperforms DQN without target networks, and performs no worse than DQN with target networks, suggesting that NGDQN stabilizes training and can help reduce the need for additional hyperparameter tuning. We also find that NGDQN is less sensitive to hyperparameter optimization relative to DQN. Together these results suggest that natural-gradient techniques can improve value-function optimization in deep reinforcement learning. |
Tasks | Hyperparameter Optimization, Q-Learning |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07482v2 |
http://arxiv.org/pdf/1803.07482v2.pdf | |
PWC | https://paperswithcode.com/paper/natural-gradient-deep-q-learning |
Repo | https://github.com/hyperdo/natural-gradient-deep-q-learning |
Framework | tf |
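
A generic sketch of the kind of natural-gradient update NGDQN builds on: the ordinary gradient is preconditioned by the inverse of a damped, empirically estimated Fisher matrix. How the Fisher is defined for the Q-network's outputs follows the paper, not this toy; the damping value, learning rate, and random inputs below are illustrative.

```python
import numpy as np

def natural_gradient_step(theta, grad, per_sample_score, lr=0.1, damping=1e-3):
    """theta <- theta - lr * F^{-1} grad, with F an empirical Fisher matrix.

    per_sample_score: (N, d) array of per-sample log-likelihood gradients,
    giving F = (1/N) * S^T S plus damping for numerical stability.
    """
    n, d = per_sample_score.shape
    fisher = per_sample_score.T @ per_sample_score / n + damping * np.eye(d)
    return theta - lr * np.linalg.solve(fisher, grad)

# toy usage with random scores and gradient
rng = np.random.default_rng(0)
theta = np.zeros(5)
scores = rng.normal(size=(64, 5))
grad = rng.normal(size=5)
theta = natural_gradient_step(theta, grad, scores)
```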
Bayesian Layers: A Module for Neural Network Uncertainty
Title | Bayesian Layers: A Module for Neural Network Uncertainty |
Authors | Dustin Tran, Michael W. Dusenberry, Mark van der Wilk, Danijar Hafner |
Abstract | We describe Bayesian Layers, a module designed for fast experimentation with neural network uncertainty. It extends neural network libraries with drop-in replacements for common layers. This enables composition via a unified abstraction over deterministic and stochastic functions and allows for scalability via the underlying system. These layers capture uncertainty over weights (Bayesian neural nets), pre-activation units (dropout), activations (“stochastic output layers”), or the function itself (Gaussian processes). They can also be reversible to propagate uncertainty from input to output. We include code examples for common architectures such as Bayesian LSTMs, deep GPs, and flow-based models. As demonstration, we fit a 5-billion parameter “Bayesian Transformer” on 512 TPUv2 cores for uncertainty in machine translation and a Bayesian dynamics model for model-based planning. Finally, we show how Bayesian Layers can be used within the Edward2 probabilistic programming language for probabilistic programs with stochastic processes. |
Tasks | Gaussian Processes, Machine Translation, Probabilistic Programming |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.03973v3 |
http://arxiv.org/pdf/1812.03973v3.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-layers-a-module-for-neural-network |
Repo | https://github.com/google/edward2 |
Framework | tf |
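
A minimal Keras-style sketch of the kind of drop-in layer the module provides: a dense layer whose kernel is a factorized Gaussian sampled with the reparameterization trick, so uncertainty over weights propagates to the outputs. This is an illustrative stand-in written against plain tf.keras, not the Bayesian Layers / Edward2 API itself, and it omits the KL regularization term a full variational layer would add.

```python
import tensorflow as tf

class DenseReparam(tf.keras.layers.Layer):
    """Dense layer with a factorized Gaussian posterior over its kernel (sketch only)."""
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        d = int(input_shape[-1])
        self.loc = self.add_weight(name="loc", shape=(d, self.units),
                                   initializer="glorot_uniform")
        self.rho = self.add_weight(name="rho", shape=(d, self.units),
                                   initializer=tf.keras.initializers.Constant(-5.0))
        self.bias = self.add_weight(name="bias", shape=(self.units,), initializer="zeros")

    def call(self, x):
        scale = tf.nn.softplus(self.rho)                                  # positive std-dev
        kernel = self.loc + scale * tf.random.normal(tf.shape(self.loc))  # reparameterized sample
        return tf.matmul(x, kernel) + self.bias                           # new weight sample per call

layer = DenseReparam(8)
out = layer(tf.random.normal([4, 16]))   # (4, 8); repeated calls give different outputs
```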