Paper Group AWR 173
Residual Dense Network for Image Restoration
Title | Residual Dense Network for Image Restoration |
Authors | Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, Yun Fu |
Abstract | Convolutional neural networks have recently achieved great success in image restoration (IR) and also offer hierarchical features. However, most deep CNN-based IR models do not make full use of the hierarchical features from the original low-quality images, thereby achieving relatively low performance. In this paper, we propose a novel residual dense network (RDN) to address this problem in IR. We fully exploit the hierarchical features from all the convolutional layers. Specifically, we propose the residual dense block (RDB) to extract abundant local features via densely connected convolutional layers. RDB further allows direct connections from the state of the preceding RDB to all the layers of the current RDB, leading to a contiguous memory mechanism. To adaptively learn more effective features from preceding and current local features and to stabilize the training of wider networks, we propose local feature fusion in RDB. After fully obtaining dense local features, we use global feature fusion to jointly and adaptively learn global hierarchical features in a holistic way. We demonstrate the effectiveness of RDN with several representative IR applications: single image super-resolution, Gaussian image denoising, image compression artifact reduction, and image deblurring. Experiments on benchmark and real-world datasets show that our RDN achieves favorable performance against state-of-the-art methods for each IR task, both quantitatively and visually. |
Tasks | Deblurring, Denoising, Image Compression, Image Denoising, Image Restoration, Image Super-Resolution, Super-Resolution |
Published | 2018-12-25 |
URL | https://arxiv.org/abs/1812.10477v2 |
https://arxiv.org/pdf/1812.10477v2.pdf | |
PWC | https://paperswithcode.com/paper/residual-dense-network-for-image-restoration |
Repo | https://github.com/yulunzhang/RDN |
Framework | pytorch |
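
A minimal PyTorch sketch of the residual dense block (RDB) the abstract describes: densely connected convolutions, local feature fusion via a 1x1 convolution, and a local residual connection. The layer count, growth rate, and channel widths below are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Sketch of an RDB: densely connected convs + local feature fusion + local residual."""
    def __init__(self, channels=64, growth=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True)))
            in_ch += growth  # dense connections: each layer sees all preceding feature maps
        # local feature fusion: 1x1 conv back to the block's input width
        self.fusion = nn.Conv2d(in_ch, channels, kernel_size=1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        fused = self.fusion(torch.cat(feats, dim=1))
        return x + fused  # local residual learning, passed on to the next RDB

rdb = ResidualDenseBlock()
out = rdb(torch.randn(1, 64, 32, 32))  # -> torch.Size([1, 64, 32, 32])
```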
Change Detection in Graph Streams by Learning Graph Embeddings on Constant-Curvature Manifolds
Title | Change Detection in Graph Streams by Learning Graph Embeddings on Constant-Curvature Manifolds |
Authors | Daniele Grattarola, Daniele Zambon, Cesare Alippi, Lorenzo Livi |
Abstract | The space of graphs is often characterised by a non-trivial geometry, which complicates learning and inference in practical applications. A common approach is to use embedding techniques to represent graphs as points in a conventional Euclidean space, but non-Euclidean spaces have often been shown to be better suited for embedding graphs. Among these, constant-curvature Riemannian manifolds (CCMs) offer embedding spaces suitable for studying the statistical properties of a graph distribution, as they provide ways to easily compute metric geodesic distances. In this paper, we focus on the problem of detecting changes in stationarity in a stream of attributed graphs. To this end, we introduce a novel change detection framework based on neural networks and CCMs, that takes into account the non-Euclidean nature of graphs. Our contribution in this work is twofold. First, via a novel approach based on adversarial learning, we compute graph embeddings by training an autoencoder to represent graphs on CCMs. Second, we introduce two novel change detection tests operating on CCMs. We perform experiments on synthetic data, as well as two real-world application scenarios: the detection of epileptic seizures using functional connectivity brain networks, and the detection of hostility between two subjects, using human skeletal graphs. Results show that the proposed methods are able to detect even small changes in a graph-generating process, consistently outperforming approaches based on Euclidean embeddings. |
Tasks | Seizure Detection |
Published | 2018-05-16 |
URL | http://arxiv.org/abs/1805.06299v3 |
http://arxiv.org/pdf/1805.06299v3.pdf | |
PWC | https://paperswithcode.com/paper/change-detection-in-graph-streams-by-learning |
Repo | https://github.com/dan-zam/cdg |
Framework | none |
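
The appeal of constant-curvature manifolds in this paper is that geodesic distances have closed forms. Below is a NumPy sketch, for illustration only, of the geodesic distance on a hypersphere of radius `r` (constant positive curvature 1/r²); the hyperbolic case is analogous with the Minkowski inner product, and the radius here is an assumed parameter.

```python
import numpy as np

def spherical_geodesic(x, y, radius=1.0):
    """Geodesic distance between two points lying on a hypersphere of the given radius."""
    cos_angle = np.dot(x, y) / radius**2
    cos_angle = np.clip(cos_angle, -1.0, 1.0)   # guard against rounding error
    return radius * np.arccos(cos_angle)

# toy usage: two unit vectors at 90 degrees are pi/2 apart on the unit sphere
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
print(spherical_geodesic(x, y))  # ~1.5708
```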
textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior
Title | textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior |
Authors | Pankaj Gupta, Yatin Chaudhary, Florian Buettner, Hinrich Schütze |
Abstract | We address two challenges of probabilistic topic modelling in order to better estimate the probability of a word in a given context, i.e., P(word\|context): (1) No Language Structure in Context: Probabilistic topic models ignore word order by summarizing a given context as a “bag-of-words” and consequently the semantics of words in the context is lost. The LSTM-LM learns a vector-space representation of each word by accounting for word order in local collocation patterns and models complex characteristics of language (e.g., syntax and semantics), while the TM simultaneously learns a latent representation from the entire document and discovers the underlying thematic structure. We unite two complementary paradigms of learning the meaning of word occurrences by combining a TM (e.g., DocNADE) and a LM in a unified probabilistic framework, named ctx-DocNADE. (2) Limited Context and/or Smaller Training Corpus of Documents: In settings with a small number of word occurrences (i.e., lack of context) in short text or data sparsity in a corpus of few documents, the application of TMs is challenging. We address this challenge by incorporating external knowledge into neural autoregressive topic models via a language modelling approach: we use word embeddings as input to an LSTM-LM with the aim of improving the word-topic mapping on a smaller and/or short-text corpus. The proposed DocNADE extension is named ctx-DocNADEe. We present novel neural autoregressive topic model variants coupled with neural LMs and embedding priors that consistently outperform state-of-the-art generative TMs in terms of generalization (perplexity), interpretability (topic coherence) and applicability (retrieval and classification) over 6 long-text and 8 short-text datasets from diverse domains. |
Tasks | Information Extraction, Information Retrieval, Language Modelling, Topic Models, Unsupervised Representation Learning, Word Embeddings |
Published | 2018-10-09 |
URL | http://arxiv.org/abs/1810.03947v4 |
http://arxiv.org/pdf/1810.03947v4.pdf | |
PWC | https://paperswithcode.com/paper/texttovec-deep-contextualized-neural |
Repo | https://github.com/pgcool/textTOvec |
Framework | tf |
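
A sketch, under stated assumptions, of the kind of composition ctx-DocNADE performs: the autoregressive topic context (a sum over preceding words' topic columns) is mixed with an LSTM language-model state before the nonlinearity. The dimensions and the mixing weight `lam` are illustrative, and for brevity the LSTM state is not shifted by one position as a strictly autoregressive predictor would require.

```python
import torch
import torch.nn as nn

vocab, n_topics, emb_dim = 1000, 50, 100   # illustrative sizes, not the paper's settings
W = nn.Embedding(vocab, n_topics)          # DocNADE topic matrix, one column per vocabulary word
lstm_emb = nn.Embedding(vocab, emb_dim)
lstm = nn.LSTM(emb_dim, n_topics, batch_first=True)
lam = 0.5                                  # mixing weight between topic and language-model context

def ctx_hidden(doc):
    """Per-position hidden states mixing bag-of-words topic context with LSTM context."""
    tokens = torch.tensor(doc).unsqueeze(0)      # (1, T)
    emb = W(tokens)                              # (1, T, n_topics)
    topic_ctx = torch.cumsum(emb, dim=1) - emb   # sum over strictly preceding words (DocNADE-style)
    lm_ctx, _ = lstm(lstm_emb(tokens))           # order-aware states from the language model
    return torch.sigmoid(topic_ctx + lam * lm_ctx)

h = ctx_hidden([3, 17, 42, 7])                   # (1, 4, n_topics): one hidden state per position
```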
Document Informed Neural Autoregressive Topic Models with Distributional Prior
Title | Document Informed Neural Autoregressive Topic Models with Distributional Prior |
Authors | Pankaj Gupta, Yatin Chaudhary, Florian Buettner, Hinrich Schütze |
Abstract | We address two challenges in topic models: (1) Context information around words helps in determining their actual meaning, e.g., “networks” used in the contexts “artificial neural networks” vs. “biological neuron networks”. Generative topic models infer topic-word distributions, taking no or only little context into account. Here, we extend a neural autoregressive topic model to exploit the full context information around words in a document in a language modeling fashion. The proposed model is named iDocNADE. (2) Due to the small number of word occurrences (i.e., lack of context) in short text and data sparsity in a corpus of few documents, the application of topic models is challenging on such texts. Therefore, we propose a simple and efficient way of incorporating external knowledge into neural autoregressive topic models: we use embeddings as a distributional prior. The proposed variants are named DocNADEe and iDocNADEe. We present novel neural autoregressive topic model variants that consistently outperform state-of-the-art generative topic models in terms of generalization, interpretability (topic coherence) and applicability (retrieval and classification) over 7 long-text and 8 short-text datasets from diverse domains. |
Tasks | Language Modelling, Topic Models |
Published | 2018-09-15 |
URL | http://arxiv.org/abs/1809.06709v2 |
http://arxiv.org/pdf/1809.06709v2.pdf | |
PWC | https://paperswithcode.com/paper/document-informed-neural-autoregressive-topic |
Repo | https://github.com/pgcool/iDocNADEe |
Framework | tf |
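
A minimal NumPy sketch of the embedding-as-prior idea described above: the trainable autoregressive context is augmented with a contribution from a fixed, pretrained embedding matrix scaled by a prior strength. All sizes, the random matrices, and the weight `lam` are placeholders; in practice `E` would hold pretrained embeddings rather than random values.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 1000, 50
W = rng.normal(size=(hidden, vocab))         # trainable topic matrix (DocNADE-style)
E = rng.normal(size=(hidden, vocab))         # stand-in for a fixed pretrained embedding matrix
c = np.zeros(hidden)
lam = 0.1                                    # strength of the distributional prior

def hidden_state(preceding_words):
    """DocNADEe-style pre-activation: trainable context plus embedding-prior context."""
    ctx = sum(W[:, v] + lam * E[:, v] for v in preceding_words)
    return 1.0 / (1.0 + np.exp(-(c + ctx)))  # sigmoid hidden state

h = hidden_state([3, 17, 42])                # hidden vector informing the next word's topic mixture
```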
Deep Neural Networks with Multi-Branch Architectures Are Less Non-Convex
Title | Deep Neural Networks with Multi-Branch Architectures Are Less Non-Convex |
Authors | Hongyang Zhang, Junru Shao, Ruslan Salakhutdinov |
Abstract | Several recently proposed architectures of neural networks such as ResNeXt, Inception, Xception, SqueezeNet and Wide ResNet are based on the design idea of having multiple branches and have demonstrated improved performance in many applications. We show that one cause of this success is that the multi-branch architecture is less non-convex in terms of the duality gap. The duality gap measures the degree of intrinsic non-convexity of an optimization problem: a smaller gap in relative value implies a lower degree of intrinsic non-convexity. The challenge is to quantitatively measure the duality gap of highly non-convex problems such as deep neural networks. In this work, we provide strong guarantees of this quantity for two classes of network architectures. For the neural networks with arbitrary activation functions, multi-branch architecture and a variant of hinge loss, we show that the duality gap of both population and empirical risks shrinks to zero as the number of branches increases. This result sheds light on better understanding the power of over-parametrization where increasing the network width tends to make the loss surface less non-convex. For the neural networks with linear activation function and $\ell_2$ loss, we show that the duality gap of empirical risk is zero. Our two results work for arbitrary depths and adversarial data, while the analytical techniques might be of independent interest to non-convex optimization more broadly. Experiments on both synthetic and real-world datasets validate our results. |
Tasks | |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.01845v2 |
http://arxiv.org/pdf/1806.01845v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-neural-networks-with-multi-branch |
Repo | https://github.com/hongyanz/multibranch |
Framework | pytorch |
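
For reference, the central quantity can be stated compactly. This is the standard definition of the duality gap via the Lagrangian, not a result specific to the paper; the paper's claim is that this gap (relative to the primal value) shrinks to zero as the number of branches grows.

```latex
% Primal value, dual value, and duality gap for a constrained problem with Lagrangian L
p^* = \min_{\theta} \max_{\lambda \ge 0} L(\theta, \lambda), \qquad
d^* = \max_{\lambda \ge 0} \min_{\theta} L(\theta, \lambda), \qquad
\Delta = p^* - d^* \ge 0 .
```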
DNN Feature Map Compression using Learned Representation over GF(2)
Title | DNN Feature Map Compression using Learned Representation over GF(2) |
Authors | Denis A. Gudovskiy, Alec Hodgkinson, Luca Rigazio |
Abstract | In this paper, we introduce a method to compress intermediate feature maps of deep neural networks (DNNs) to decrease memory storage and bandwidth requirements during inference. Unlike previous works, the proposed method is based on converting fixed-point activations into vectors over the smallest GF(2) finite field followed by nonlinear dimensionality reduction (NDR) layers embedded into a DNN. Such an end-to-end learned representation finds more compact feature maps by exploiting quantization redundancies within the fixed-point activations along the channel or spatial dimensions. We apply the proposed network architectures derived from modified SqueezeNet and MobileNetV2 to the tasks of ImageNet classification and PASCAL VOC object detection. Compared to prior approaches, the conducted experiments show a factor of 2 decrease in memory requirements with minor degradation in accuracy while adding only bitwise computations. |
Tasks | Dimensionality Reduction, Object Detection, Quantization |
Published | 2018-08-15 |
URL | http://arxiv.org/abs/1808.05285v1 |
http://arxiv.org/pdf/1808.05285v1.pdf | |
PWC | https://paperswithcode.com/paper/dnn-feature-map-compression-using-learned |
Repo | https://github.com/gudovskiy/fmap_compression |
Framework | none |
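
A NumPy sketch of the first step the abstract describes: unpacking fixed-point (here uint8) activations into binary planes over GF(2) along the channel dimension, which the learned nonlinear-dimensionality-reduction layers would then compress. The bit width and tensor layout are illustrative assumptions.

```python
import numpy as np

def to_gf2(feature_map, bits=8):
    """Unpack fixed-point activations into binary {0,1} planes over GF(2).

    feature_map: uint8 array of shape (C, H, W); returns shape (C*bits, H, W),
    replacing each channel by its binary planes (most significant bit first).
    """
    c, h, w = feature_map.shape
    shifts = np.arange(bits - 1, -1, -1, dtype=np.uint8)
    planes = (feature_map[:, None] >> shifts[None, :, None, None]) & 1
    return planes.reshape(c * bits, h, w).astype(np.float32)

fmap = np.random.randint(0, 256, size=(4, 8, 8), dtype=np.uint8)
binary = to_gf2(fmap)        # (32, 8, 8) binary tensor fed to the NDR layers
```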
Understanding the impact of entropy on policy optimization
Title | Understanding the impact of entropy on policy optimization |
Authors | Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans |
Abstract | Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with \emph{exploration} by encouraging the selection of more stochastic policies. In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function. We first show that even with access to the exact gradient, policy optimization is difficult due to the geometry of the objective function. Then, we qualitatively show that in some environments, a policy with higher entropy can make the optimization landscape smoother, thereby connecting local optima and enabling the use of larger learning rates. This paper presents new tools for understanding the optimization landscape, shows that policy entropy serves as a regularizer, and highlights the challenge of designing general-purpose policy optimization algorithms. |
Tasks | |
Published | 2018-11-27 |
URL | https://arxiv.org/abs/1811.11214v5 |
https://arxiv.org/pdf/1811.11214v5.pdf | |
PWC | https://paperswithcode.com/paper/understanding-the-impact-of-entropy-on-policy |
Repo | https://github.com/zafarali/emdp |
Framework | none |
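
The entropy regularization analyzed here is the familiar bonus added to the policy-gradient objective. A short PyTorch sketch with toy network sizes and a REINFORCE-style loss; the coefficient `tau` and the tiny network are illustrative only.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))  # toy CartPole-sized net
tau = 0.01                                                             # entropy coefficient

def pg_loss(states, actions, returns):
    """REINFORCE-style loss with an entropy bonus (the regularizer studied in the paper)."""
    dist = torch.distributions.Categorical(logits=policy(states))
    log_probs = dist.log_prob(actions)
    entropy = dist.entropy().mean()
    return -(log_probs * returns).mean() - tau * entropy  # larger tau favors more stochastic policies

states = torch.randn(8, 4)
actions = torch.randint(0, 2, (8,))
returns = torch.randn(8)
loss = pg_loss(states, actions, returns)
loss.backward()
```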
The Description Length of Deep Learning Models
Title | The Description Length of Deep Learning Models |
Authors | Léonard Blier, Yann Ollivier |
Abstract | Solomonoff’s general theory of inference and the Minimum Description Length principle formalize Occam’s razor, and hold that a good model of data is a model that is good at losslessly compressing the data, including the cost of describing the model itself. Deep neural networks might seem to go against this principle given the large number of parameters to be encoded. We demonstrate experimentally the ability of deep neural networks to compress the training data even when accounting for parameter encoding. The compression viewpoint originally motivated the use of variational methods in neural networks. Unexpectedly, we found that these variational methods provide surprisingly poor compression bounds, despite being explicitly built to minimize such bounds. This might explain the relatively poor practical performance of variational methods in deep learning. On the other hand, simple incremental encoding methods yield excellent compression values on deep networks, vindicating Solomonoff’s approach. |
Tasks | |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.07044v5 |
http://arxiv.org/pdf/1802.07044v5.pdf | |
PWC | https://paperswithcode.com/paper/the-description-length-of-deep-learning |
Repo | https://github.com/leonardblier/descriptionlengthdeeplearning |
Framework | pytorch |
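
The "simple incremental encoding methods" praised in the abstract are prequential codes: encode each data chunk with a model trained only on the chunks seen so far, and sum the resulting log-losses in bits. A minimal sketch assuming placeholder `make_model`, `fit`, and `nll_bits` callables, followed by a toy Bernoulli example.

```python
import math

def prequential_codelength(chunks, make_model, fit, nll_bits):
    """Incremental (prequential) description length of a dataset, in bits.

    chunks     : list of data chunks [D_1, ..., D_k], encoded in order
    make_model : () -> fresh/default model (placeholder)
    fit        : (model, data_so_far) -> trained model (placeholder)
    nll_bits   : (model, chunk) -> negative log-likelihood of the chunk, in bits
    """
    total_bits = 0.0
    seen = []
    model = make_model()                      # the first chunk is paid for with the default model
    for chunk in chunks:
        total_bits += nll_bits(model, chunk)  # cost of transmitting this chunk with the current model
        seen.extend(chunk)
        model = fit(make_model(), seen)       # retrain on everything seen so far
    return total_bits

# toy usage: prequentially encode coin flips with a Laplace-smoothed Bernoulli model
flips = [[1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 1, 0]]
make = lambda: 0.5
fit = lambda _, seen: (sum(seen) + 1) / (len(seen) + 2)
nll = lambda p, chunk: sum(-math.log2(p if x else 1 - p) for x in chunk)
print(prequential_codelength(flips, make, fit, nll))  # total bits to encode all flips
```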
Data-to-Text Generation with Content Selection and Planning
Title | Data-to-Text Generation with Content Selection and Planning |
Authors | Ratish Puduppully, Li Dong, Mirella Lapata |
Abstract | Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order. In this work, we present a neural network architecture which incorporates content selection and planning without sacrificing end-to-end training. We decompose the generation task into two stages. Given a corpus of data records (paired with descriptive documents), we first generate a content plan highlighting which information should be mentioned and in which order, and then generate the document while taking the content plan into account. Automatic and human-based evaluation experiments show that our model outperforms strong baselines, improving the state of the art on the recently released RotoWire dataset. |
Tasks | Data-to-Text Generation, Text Generation |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00582v2 |
http://arxiv.org/pdf/1809.00582v2.pdf | |
PWC | https://paperswithcode.com/paper/data-to-text-generation-with-content |
Repo | https://github.com/jugalw13/Red-Hat-Hack |
Framework | none |
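
The two-stage decomposition described above can be written as a factorization through a latent content plan z. This is a schematic rendering of the idea, not the paper's exact notation.

```latex
% r: data records, z: content plan, y: generated document
p(y \mid r) \;=\; \sum_{z} p(z \mid r)\, p(y \mid r, z)
\;\approx\; p(\hat{z} \mid r)\, p(y \mid r, \hat{z}),
\qquad \hat{z} = \arg\max_{z} p(z \mid r).
```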
A Probabilistic Model of the Bitcoin Blockchain
Title | A Probabilistic Model of the Bitcoin Blockchain |
Authors | Marc Jourdan, Sebastien Blandin, Laura Wynter, Pralhad Deshpande |
Abstract | The Bitcoin transaction graph is a public data structure organized as transactions between addresses, each associated with a logical entity. In this work, we introduce a complete probabilistic model of the Bitcoin Blockchain. We first formulate a set of conditional dependencies induced by the Bitcoin protocol at the block level and derive a corresponding fully observed graphical model of a Bitcoin block. We then extend the model to include hidden entity attributes such as the functional category of the associated logical agent and derive asymptotic bounds on the privacy properties implied by this model. At the network level, we show evidence of complex transaction-to-transaction behavior and present a relevant discriminative model of the agent categories. Performance of both the block-based graphical model and the network-level discriminative model is evaluated on a subset of the public Bitcoin Blockchain. |
Tasks | |
Published | 2018-11-07 |
URL | http://arxiv.org/abs/1812.05451v1 |
http://arxiv.org/pdf/1812.05451v1.pdf | |
PWC | https://paperswithcode.com/paper/a-probabilistic-model-of-the-bitcoin |
Repo | https://github.com/Maru92/EntityAddressBitcoin |
Framework | none |
Neural Body Fitting: Unifying Deep Learning and Model-Based Human Pose and Shape Estimation
Title | Neural Body Fitting: Unifying Deep Learning and Model-Based Human Pose and Shape Estimation |
Authors | Mohamed Omran, Christoph Lassner, Gerard Pons-Moll, Peter V. Gehler, Bernt Schiele |
Abstract | Direct prediction of 3D body pose and shape remains a challenge even for highly parameterized deep learning models. Mapping from the 2D image space to the prediction space is difficult: perspective ambiguities make the loss function noisy and training data is scarce. In this paper, we propose a novel approach (Neural Body Fitting (NBF)). It integrates a statistical body model within a CNN, leveraging reliable bottom-up semantic body part segmentation and robust top-down body model constraints. NBF is fully differentiable and can be trained using 2D and 3D annotations. In detailed experiments, we analyze how the components of our model affect performance, especially the use of part segmentations as an explicit intermediate representation, and present a robust, efficiently trainable framework for 3D human pose estimation from 2D images with competitive results on standard benchmarks. Code will be made available at http://github.com/mohomran/neural_body_fitting |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05942v1 |
http://arxiv.org/pdf/1808.05942v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-body-fitting-unifying-deep-learning |
Repo | https://github.com/mohomran/neural_body_fitting |
Framework | tf |
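
A heavily simplified PyTorch sketch of the NBF pipeline: a segmentation network produces part maps, a CNN regresses body-model parameters from them, and a differentiable body-model layer maps parameters to joints so that 2D/3D losses can flow end to end. The `BodyModel` module and all layer sizes are placeholder stand-ins, not the statistical body model (SMPL) used in the paper.

```python
import torch
import torch.nn as nn

N_PARTS, N_PARAMS, N_JOINTS = 12, 82, 24   # illustrative sizes only

segnet = nn.Conv2d(3, N_PARTS, kernel_size=3, padding=1)          # stand-in part-segmentation CNN
regressor = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(N_PARTS, N_PARAMS))           # stand-in parameter regressor

class BodyModel(nn.Module):
    """Placeholder for a differentiable statistical body model (pose/shape params -> 3D joints)."""
    def __init__(self):
        super().__init__()
        self.to_joints = nn.Linear(N_PARAMS, N_JOINTS * 3)
    def forward(self, params):
        return self.to_joints(params).view(-1, N_JOINTS, 3)

body = BodyModel()

def nbf_forward(image):
    parts = segnet(image).softmax(dim=1)   # explicit intermediate part-segmentation representation
    params = regressor(parts)              # body-model parameters predicted from the segmentation
    return body(params)                    # 3D joints; compare against 2D/3D annotations in the loss

joints3d = nbf_forward(torch.randn(2, 3, 64, 64))   # -> (2, 24, 3)
```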
Boltzmann Generators – Sampling Equilibrium States of Many-Body Systems with Deep Learning
Title | Boltzmann Generators – Sampling Equilibrium States of Many-Body Systems with Deep Learning |
Authors | Frank Noé, Simon Olsson, Jonas Köhler, Hao Wu |
Abstract | Computing equilibrium states in condensed-matter many-body systems, such as solvated proteins, is a long-standing challenge. Lacking methods for generating statistically independent equilibrium samples in “one shot”, vast computational effort is invested in simulating these systems in small steps, e.g., using Molecular Dynamics. Combining deep learning and statistical mechanics, we here develop Boltzmann Generators, which are shown to generate unbiased one-shot equilibrium samples of representative condensed matter systems and proteins. Boltzmann Generators use neural networks to learn a coordinate transformation of the complex configurational equilibrium distribution to a distribution that can be easily sampled. Accurate computation of free energy differences and discovery of new configurations are demonstrated, providing a statistical mechanics tool that can avoid rare events during sampling without prior knowledge of reaction coordinates. |
Tasks | |
Published | 2018-12-04 |
URL | https://arxiv.org/abs/1812.01729v2 |
https://arxiv.org/pdf/1812.01729v2.pdf | |
PWC | https://paperswithcode.com/paper/boltzmann-generators-sampling-equilibrium |
Repo | https://github.com/noegroup/project_boltzmann_generators |
Framework | tf |
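
The "unbiased one-shot" property rests on reweighting: samples drawn from a tractable generator density q(x) are weighted by w(x) ∝ exp(-u(x))/q(x) before estimating equilibrium observables. A NumPy sketch with a toy double-well energy and a Gaussian stand-in for the trained invertible network; everything here is illustrative, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(x):
    """Toy double-well energy in place of a real molecular energy u(x)."""
    return (x**2 - 1.0)**2

def sample_generator(n, sigma=1.5):
    """Stand-in generator: a Gaussian whose density we can evaluate exactly."""
    x = rng.normal(0.0, sigma, size=n)
    log_q = -0.5 * (x / sigma)**2 - np.log(sigma * np.sqrt(2 * np.pi))
    return x, log_q

x, log_q = sample_generator(100_000)
log_w = -energy(x) - log_q                 # log importance weights: exp(-u(x)) / q(x)
w = np.exp(log_w - log_w.max())
w /= w.sum()

# reweighted one-shot estimate of an equilibrium observable, here <x^2>
print(np.sum(w * x**2))
```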
Learning Human Optical Flow
Title | Learning Human Optical Flow |
Authors | Anurag Ranjan, Javier Romero, Michael J. Black |
Abstract | The optical flow of humans is well known to be useful for the analysis of human action. Given this, we devise an optical flow algorithm specifically for human motion and show that it is superior to generic flow methods. Designing a method by hand is impractical, so we develop a new training database of image sequences with ground truth optical flow. For this we use a 3D model of the human body and motion capture data to synthesize realistic flow fields. We then train a convolutional neural network to estimate human flow fields from pairs of images. Since many applications in human motion analysis depend on speed, and we anticipate mobile applications, we base our method on SpyNet with several modifications. We demonstrate that our trained network is more accurate than a wide range of top methods on held-out test data and that it generalizes well to real image sequences. When combined with a person detector/tracker, the approach provides a full solution to the problem of 2D human flow estimation. Both the code and the dataset are available for research. |
Tasks | Motion Capture, Optical Flow Estimation |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05666v2 |
http://arxiv.org/pdf/1806.05666v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-human-optical-flow |
Repo | https://github.com/anuragranj/humanflow |
Framework | pytorch |
Natural Gradient Deep Q-learning
Title | Natural Gradient Deep Q-learning |
Authors | Ethan Knight, Osher Lerner |
Abstract | We present a novel algorithm to train a deep Q-learning agent using natural-gradient techniques. We compare the original deep Q-network (DQN) algorithm to its natural-gradient counterpart, which we refer to as NGDQN, on a collection of classic control domains. Without employing target networks, NGDQN significantly outperforms DQN without target networks, and performs no worse than DQN with target networks, suggesting that NGDQN stabilizes training and can help reduce the need for additional hyperparameter tuning. We also find that NGDQN is less sensitive to hyperparameter optimization relative to DQN. Together these results suggest that natural-gradient techniques can improve value-function optimization in deep reinforcement learning. |
Tasks | Hyperparameter Optimization, Q-Learning |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07482v2 |
http://arxiv.org/pdf/1803.07482v2.pdf | |
PWC | https://paperswithcode.com/paper/natural-gradient-deep-q-learning |
Repo | https://github.com/hyperdo/natural-gradient-deep-q-learning |
Framework | tf |
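
A generic sketch of the kind of natural-gradient update NGDQN builds on: the ordinary gradient is preconditioned by the inverse of a damped, empirically estimated Fisher matrix. How the Fisher is defined for the Q-network's outputs follows the paper, not this toy; the damping value, learning rate, and random inputs below are illustrative.

```python
import numpy as np

def natural_gradient_step(theta, grad, per_sample_score, lr=0.1, damping=1e-3):
    """theta <- theta - lr * F^{-1} grad, with F an empirical Fisher matrix.

    per_sample_score: (N, d) array of per-sample log-likelihood gradients,
    giving F = (1/N) * S^T S plus damping for numerical stability.
    """
    n, d = per_sample_score.shape
    fisher = per_sample_score.T @ per_sample_score / n + damping * np.eye(d)
    return theta - lr * np.linalg.solve(fisher, grad)

# toy usage with random scores and gradient
rng = np.random.default_rng(0)
theta = np.zeros(5)
scores = rng.normal(size=(64, 5))
grad = rng.normal(size=5)
theta = natural_gradient_step(theta, grad, scores)
```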
Bayesian Layers: A Module for Neural Network Uncertainty
Title | Bayesian Layers: A Module for Neural Network Uncertainty |
Authors | Dustin Tran, Michael W. Dusenberry, Mark van der Wilk, Danijar Hafner |
Abstract | We describe Bayesian Layers, a module designed for fast experimentation with neural network uncertainty. It extends neural network libraries with drop-in replacements for common layers. This enables composition via a unified abstraction over deterministic and stochastic functions and allows for scalability via the underlying system. These layers capture uncertainty over weights (Bayesian neural nets), pre-activation units (dropout), activations (“stochastic output layers”), or the function itself (Gaussian processes). They can also be reversible to propagate uncertainty from input to output. We include code examples for common architectures such as Bayesian LSTMs, deep GPs, and flow-based models. As demonstration, we fit a 5-billion parameter “Bayesian Transformer” on 512 TPUv2 cores for uncertainty in machine translation and a Bayesian dynamics model for model-based planning. Finally, we show how Bayesian Layers can be used within the Edward2 probabilistic programming language for probabilistic programs with stochastic processes. |
Tasks | Gaussian Processes, Machine Translation, Probabilistic Programming |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.03973v3 |
http://arxiv.org/pdf/1812.03973v3.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-layers-a-module-for-neural-network |
Repo | https://github.com/google/edward2 |
Framework | tf |
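
A minimal Keras-style sketch of the kind of drop-in layer the module provides: a dense layer whose kernel is a factorized Gaussian sampled with the reparameterization trick, so uncertainty over weights propagates to the outputs. This is an illustrative stand-in written against plain tf.keras, not the Bayesian Layers / Edward2 API itself, and it omits the KL regularization term a full variational layer would add.

```python
import tensorflow as tf

class DenseReparam(tf.keras.layers.Layer):
    """Dense layer with a factorized Gaussian posterior over its kernel (sketch only)."""
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        d = int(input_shape[-1])
        self.loc = self.add_weight(name="loc", shape=(d, self.units),
                                   initializer="glorot_uniform")
        self.rho = self.add_weight(name="rho", shape=(d, self.units),
                                   initializer=tf.keras.initializers.Constant(-5.0))
        self.bias = self.add_weight(name="bias", shape=(self.units,), initializer="zeros")

    def call(self, x):
        scale = tf.nn.softplus(self.rho)                                  # positive std-dev
        kernel = self.loc + scale * tf.random.normal(tf.shape(self.loc))  # reparameterized sample
        return tf.matmul(x, kernel) + self.bias                           # new weight sample per call

layer = DenseReparam(8)
out = layer(tf.random.normal([4, 16]))   # (4, 8); repeated calls give different outputs
```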