October 20, 2019

Paper Group AWR 345

Deep Generative Markov State Models. Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization. Quasi-hyperbolic momentum and Adam for deep learning. TACO: Learning Task Decomposition via Temporal Alignment for Control. Learning, Planning, and Control in a Monolithic Neural Event Inference Architecture. Semi-Blind Sp …

Deep Generative Markov State Models


Title	Deep Generative Markov State Models
Authors	Hao Wu, Andreas Mardt, Luca Pasquali, Frank Noe
Abstract	We propose a deep generative Markov State Model (DeepGenMSM) learning framework for inference of metastable dynamical systems and prediction of trajectories. After unsupervised training on time series data, the model contains (i) a probabilistic encoder that maps from high-dimensional configuration space to a small-sized vector indicating the membership to metastable (long-lived) states, (ii) a Markov chain that governs the transitions between metastable states and facilitates analysis of the long-time dynamics, and (iii) a generative part that samples the conditional distribution of configurations in the next time step. The model can be operated in a recursive fashion to generate trajectories to predict the system evolution from a defined starting state and propose new configurations. The DeepGenMSM is demonstrated to provide accurate estimates of the long-time kinetics and generate valid distributions for molecular dynamics (MD) benchmark systems. Remarkably, we show that DeepGenMSMs are able to make long time-steps in molecular configuration space and generate physically realistic structures in regions that were not seen in training data.
Tasks	Time Series
Published	2018-05-19
URL	http://arxiv.org/abs/1805.07601v2
PDF	http://arxiv.org/pdf/1805.07601v2.pdf
PWC	https://paperswithcode.com/paper/deep-generative-markov-state-models
Repo	https://github.com/markovmodel/deep_gen_msm
Framework	pytorch
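
Component (ii) of the abstract above, the Markov chain over metastable states, can be illustrated with a small sketch. The two-state transition matrix `K` and the helper `propagate` are hypothetical and chosen only for illustration; in the actual model the transition probabilities are learned together with the encoder and the generative part, neither of which is shown here.

```python
def propagate(p, K, steps=1):
    """Advance a probability vector p over metastable states by `steps` lag times.

    K is a row-stochastic transition matrix: K[i][j] = P(next = j | now = i).
    """
    n = len(K)
    for _ in range(steps):
        p = [sum(p[i] * K[i][j] for i in range(n)) for j in range(n)]
    return p


# Hypothetical two-state system; the numbers are illustrative, not from the paper.
K = [[0.9, 0.1],
     [0.2, 0.8]]
p0 = [1.0, 0.0]                        # start entirely in state 0

p_one = propagate(p0, K)               # distribution after one lag time
p_long = propagate(p0, K, steps=200)   # approaches the stationary distribution
```

Repeated propagation is what makes long-time kinetics accessible: after many steps the distribution no longer depends on the starting state.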

Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization


Title	Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization
Authors	Logan Lebanoff, Kaiqiang Song, Fei Liu
Abstract	Generating a text abstract from a set of documents remains a challenging task. The neural encoder-decoder framework has recently been exploited to summarize single documents, but its success can in part be attributed to the availability of large parallel data automatically acquired from the Web. In contrast, parallel data for multi-document summarization are scarce and costly to obtain. There is a pressing need to adapt an encoder-decoder model trained on single-document summarization data to work with multiple-document input. In this paper, we present an initial investigation into a novel adaptation method. It exploits the maximal marginal relevance method to select representative sentences from multi-document input, and leverages an abstractive encoder-decoder model to fuse disparate sentences to an abstractive summary. The adaptation method is robust and itself requires no training data. Our system compares favorably to state-of-the-art extractive and abstractive approaches judged by automatic metrics and human assessors.
Tasks	Document Summarization, Multi-Document Summarization
Published	2018-08-19
URL	http://arxiv.org/abs/1808.06218v2
PDF	http://arxiv.org/pdf/1808.06218v2.pdf
PWC	https://paperswithcode.com/paper/adapting-the-neural-encoder-decoder-framework
Repo	https://github.com/ucfnlp/multidoc_summarization
Framework	tf
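
The sentence-selection step mentioned in the abstract above, maximal marginal relevance (MMR), can be sketched as a greedy loop that trades relevance against redundancy. This is a minimal sketch under assumptions: Jaccard word overlap stands in for whatever similarity measure the authors use, and `mmr_select`, `query`, and the weight `lam` are hypothetical names and values.

```python
def jaccard(a, b):
    """Word-overlap similarity; a stand-in for a real sentence similarity."""
    a, b = set(a.lower().split()), set(b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(sentences, query, k=2, lam=0.6):
    """Greedily pick k sentences that are relevant to `query` but not redundant."""
    selected = []                        # indices already chosen
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = jaccard(sentences[i], query)
            redundancy = max(
                (jaccard(sentences[i], sentences[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


docs = [
    "the storm hit the coast on monday",
    "the storm hit the coast on monday night",
    "rescue teams evacuated two thousand residents",
]
picked = mmr_select(docs, query="storm coast rescue residents evacuated", k=2)
```

The near-duplicate second sentence is penalized once the first one is selected, which is the behaviour that makes MMR useful for multi-document input.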

Quasi-hyperbolic momentum and Adam for deep learning


Title	Quasi-hyperbolic momentum and Adam for deep learning
Authors	Jerry Ma, Denis Yarats
Abstract	Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover. Finally, we propose a QH variant of Adam called QHAdam, and we empirically demonstrate that our algorithms lead to significantly improved training in a variety of settings, including a new state-of-the-art result on WMT16 EN-DE. We hope that these empirical results, combined with the conceptual and practical simplicity of QHM and QHAdam, will spur interest from both practitioners and researchers. Code is immediately available.
Tasks	Stochastic Optimization
Published	2018-10-16
URL	http://arxiv.org/abs/1810.06801v4
PDF	http://arxiv.org/pdf/1810.06801v4.pdf
PWC	https://paperswithcode.com/paper/quasi-hyperbolic-momentum-and-adam-for-deep
Repo	https://github.com/sajadn/QHAdam
Framework	none
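
The abstract above describes QHM as averaging a plain SGD step with a momentum step. A minimal sketch of one such update follows; the function name `qhm_step`, the toy objective, and the hyperparameter values are illustrative assumptions rather than the authors' reference implementation. Here `nu` weights the momentum buffer against the raw gradient.

```python
def qhm_step(theta, grad, buf, lr=0.1, beta=0.9, nu=0.7):
    """One quasi-hyperbolic momentum update on lists of floats.

    buf is an exponential moving average of gradients (the momentum buffer).
    The step direction is a weighted average of the raw gradient (plain SGD)
    and the buffer (momentum), with weight nu on the buffer.
    """
    buf = [beta * b + (1 - beta) * g for b, g in zip(buf, grad)]
    theta = [t - lr * ((1 - nu) * g + nu * b)
             for t, g, b in zip(theta, grad, buf)]
    return theta, buf


def minimise_quadratic(x0=5.0, steps=200, **kwargs):
    """Toy driver: minimise f(x) = x^2, whose gradient is 2x."""
    theta, buf = [x0], [0.0]
    for _ in range(steps):
        grad = [2.0 * theta[0]]
        theta, buf = qhm_step(theta, grad, buf, **kwargs)
    return theta[0]


x_final = minimise_quadratic()
```

Setting `nu=0` reduces the update to plain SGD, and `nu=1` reduces it to momentum SGD with a dampened buffer, which is why the method is described as a simple alteration of momentum.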

TACO: Learning Task Decomposition via Temporal Alignment for Control


Title	TACO: Learning Task Decomposition via Temporal Alignment for Control
Authors	Kyriacos Shiarlis, Markus Wulfmeier, Sasha Salter, Shimon Whiteson, Ingmar Posner
Abstract	Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks. By reusing the corresponding sub-policies within and between tasks, they provide training data for each policy from different high-level tasks and compose them to perform novel ones. Existing approaches to modular LfD focus either on learning a single high-level task or depend on domain knowledge and temporal segmentation. In contrast, we propose a weakly supervised, domain-agnostic approach based on task sketches, which include only the sequence of sub-tasks performed in each demonstration. Our approach simultaneously aligns the sketches with the observed demonstrations and learns the required sub-policies. This improves generalisation in comparison to separate optimisation procedures. We evaluate the approach on multiple domains, including a simulated 3D robot arm control task using purely image-based observations. The results show that our approach performs commensurately with fully supervised approaches, while requiring significantly less annotation effort.
Tasks
Published	2018-03-02
URL	http://arxiv.org/abs/1803.01840v2
PDF	http://arxiv.org/pdf/1803.01840v2.pdf
PWC	https://paperswithcode.com/paper/taco-learning-task-decomposition-via-temporal
Repo	https://github.com/KyriacosShiarli/taco
Framework	tf

Learning, Planning, and Control in a Monolithic Neural Event Inference Architecture


Title	Learning, Planning, and Control in a Monolithic Neural Event Inference Architecture
Authors	Martin V. Butz, David Bilkey, Dania Humaidan, Alistair Knott, Sebastian Otte
Abstract	We introduce REPRISE, a REtrospective and PRospective Inference SchEme, which learns temporal event-predictive models of dynamical systems. REPRISE infers the unobservable contextual event state and accompanying temporal predictive models that best explain the recently encountered sensorimotor experiences retrospectively. Meanwhile, it optimizes upcoming motor activities prospectively in a goal-directed manner. Here, REPRISE is implemented by a recurrent neural network (RNN), which learns temporal forward models of the sensorimotor contingencies generated by different simulated dynamic vehicles. The RNN is augmented with contextual neurons, which enable the encoding of distinct, but related, sensorimotor dynamics as compact event codes. We show that REPRISE concurrently learns to separate and approximate the encountered sensorimotor dynamics: it analyzes sensorimotor error signals adapting both internal contextual neural activities and connection weight values. Moreover, we show that REPRISE can exploit the learned model to induce goal-directed, model-predictive control, that is, approximate active inference: Given a goal state, the system imagines a motor command sequence optimizing it with the prospective objective to minimize the distance to the goal. The RNN activities thus continuously imagine the upcoming future and reflect on the recent past, optimizing the predictive model, the hidden neural state activities, and the upcoming motor activities. As a result, event-predictive neural encodings develop, which allow the invocation of highly effective and adaptive goal-directed sensorimotor control.
Tasks
Published	2018-09-19
URL	https://arxiv.org/abs/1809.07412v2
PDF	https://arxiv.org/pdf/1809.07412v2.pdf
PWC	https://paperswithcode.com/paper/learning-planning-and-control-in-a-monolithic
Repo	https://github.com/CognitiveModeling/2019-ModeInferencePaperCode
Framework	none


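Semi-Blind Spatially-Variant Deconvolution in Optical Microscopy with Local Point Spread Function Estimation By Use Of Convolutional Neural Networks


Title	Semi-Blind Spatially-Variant Deconvolution in Optical Microscopy with Local Point Spread Function Estimation By Use Of Convolutional Neural Networks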
Authors	Adrian Shajkofci, Michael Liebling
Abstract	We present a semi-blind, spatially-variant deconvolution technique aimed at optical microscopy that combines a local estimation step of the point spread function (PSF) and deconvolution using a spatially variant, regularized Richardson-Lucy algorithm. To find the local PSF map in a computationally tractable way, we train a convolutional neural network to perform regression of an optical parametric model on synthetically blurred image patches. We deconvolved both synthetic and experimentally-acquired data, and achieved an improvement of image SNR of 1.00 dB on average, compared to other deconvolution algorithms.
Tasks
Published	2018-03-20
URL	https://arxiv.org/abs/1803.07452v4
PDF	https://arxiv.org/pdf/1803.07452v4.pdf
PWC	https://paperswithcode.com/paper/semi-blind-spatially-variant-deconvolution-in
Repo	https://github.com/idiap/semiblindpsfdeconv
Framework	pytorch
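
The deconvolution step named in the abstract above builds on the Richardson-Lucy algorithm. The sketch below shows only the basic multiplicative update in one dimension with a single, known PSF. It is a simplification: the paper's version is spatially variant and regularized, and its PSF comes from a CNN-estimated local map, none of which is reproduced here. The helper names are hypothetical.

```python
def convolve_same(signal, kernel):
    """1D convolution with zero padding, output the same length as `signal`."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += signal[j] * w
        out.append(acc)
    return out


def richardson_lucy(observed, psf, iterations=50, eps=1e-12):
    """Basic Richardson-Lucy iteration: estimate *= flip(psf) * (observed / (psf * estimate))."""
    start = max(sum(observed) / len(observed), eps)
    estimate = [start] * len(observed)       # flat, positive initial guess
    psf_flipped = psf[::-1]
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        ratio = [o / (b + eps) for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, psf_flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# Synthetic example: a single bright point blurred by a small symmetric PSF.
truth = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
psf = [0.25, 0.5, 0.25]
observed = convolve_same(truth, psf)
restored = richardson_lucy(observed, psf)
```

On this noise-free toy signal the restored peak is sharper than the blurred observation; real microscopy data would additionally need the regularization the abstract mentions.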

A Contextual Bandit Bake-off


Title	A Contextual Bandit Bake-off
Authors	Alberto Bietti, Alekh Agarwal, John Langford
Abstract	Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems. Despite multiple recent successes on statistically and computationally efficient methods, the practical behavior of these algorithms is still poorly understood. We leverage the availability of large numbers of supervised learning datasets to empirically evaluate contextual bandit algorithms, focusing on practical methods that learn by relying on optimization oracles from supervised learning. We find that a recent method (Foster et al., 2018) using optimism under uncertainty works the best overall. A surprisingly close second is a simple greedy baseline that only explores implicitly through the diversity of contexts, followed by a variant of Online Cover (Agarwal et al., 2014) which tends to be more conservative but robust to problem specification by design. Along the way, we also evaluate various components of contextual bandit algorithm design such as loss estimators. Overall, this is a thorough study and review of contextual bandit methodology.
Tasks
Published	2018-02-12
URL	https://arxiv.org/abs/1802.04064v4
PDF	https://arxiv.org/pdf/1802.04064v4.pdf
PWC	https://paperswithcode.com/paper/a-contextual-bandit-bake-off
Repo	https://github.com/albietz/cb_bakeoff
Framework	none

Interpretable Convolutional Neural Networks via Feedforward Design


Title	Interpretable Convolutional Neural Networks via Feedforward Design
Authors	C.-C. Jay Kuo, Min Zhang, Siyang Li, Jiali Duan, Yueru Chen
Abstract	The model parameters of convolutional neural networks (CNNs) are determined by backpropagation (BP). In this work, we propose an interpretable feedforward (FF) design without any BP as a reference. The FF design adopts a data-centric approach. It derives network parameters of the current layer based on data statistics from the output of the previous layer in a one-pass manner. To construct convolutional layers, we develop a new signal transform, called the Saab (Subspace Approximation with Adjusted Bias) transform. It is a variant of the principal component analysis (PCA) with an added bias vector to annihilate activation’s nonlinearity. Multiple Saab transforms in cascade yield multiple convolutional layers. As to fully-connected (FC) layers, we construct them using a cascade of multi-stage linear least squared regressors (LSRs). The classification and robustness (against adversarial attacks) performances of BP- and FF-designed CNNs applied to the MNIST and the CIFAR-10 datasets are compared. Finally, we comment on the relationship between BP and FF designs.
Tasks
Published	2018-10-05
URL	http://arxiv.org/abs/1810.02786v2
PDF	http://arxiv.org/pdf/1810.02786v2.pdf
PWC	https://paperswithcode.com/paper/interpretable-convolutional-neural-networks-1
Repo	https://github.com/yifan-fanyi/Pixelhop
Framework	none

DeepMind Control Suite


Title	DeepMind Control Suite
Authors	Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, Timothy Lillicrap, Martin Riedmiller
Abstract	The DeepMind Control Suite is a set of continuous control tasks with a standardised structure and interpretable rewards, intended to serve as performance benchmarks for reinforcement learning agents. The tasks are written in Python and powered by the MuJoCo physics engine, making them easy to use and modify. We include benchmarks for several learning algorithms. The Control Suite is publicly available at https://www.github.com/deepmind/dm_control . A video summary of all tasks is available at http://youtu.be/rAai4QzcYbs .
Tasks	Continuous Control
Published	2018-01-02
URL	http://arxiv.org/abs/1801.00690v1
PDF	http://arxiv.org/pdf/1801.00690v1.pdf
PWC	https://paperswithcode.com/paper/deepmind-control-suite
Repo	https://github.com/deepmind/dm_control
Framework	none

MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling


Title	MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling
Authors	Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan, Milica Gašić
Abstract	Even though machine learning has become the dominant approach in the dialogue research community, real breakthroughs have been blocked by the scale of the available data. To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning multiple domains and topics. At a size of 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora. The contribution of this work, apart from the open-sourced dataset labelled with dialogue belief states and dialogue actions, is two-fold. Firstly, a detailed description of the data collection procedure is provided, along with a summary of the data structure and analysis; the proposed data-collection pipeline is entirely based on crowd-sourcing, without the need to hire professional annotators. Secondly, a set of benchmark results for belief tracking, dialogue act and response generation is reported, which shows the usability of the data and sets a baseline for future studies.
Tasks
Published	2018-09-29
URL	https://arxiv.org/abs/1810.00278v2
PDF	https://arxiv.org/pdf/1810.00278v2.pdf
PWC	https://paperswithcode.com/paper/multiwoz-a-large-scale-multi-domain-wizard-of
Repo	https://github.com/jojonki/MultiWOZ-Parser
Framework	none

BNN+: Improved Binary Network Training


Title	BNN+: Improved Binary Network Training
Authors	Sajad Darabi, Mouloud Belbahri, Matthieu Courbariaux, Vahid Partovi Nia
Abstract	The deployment of deep neural networks (DNNs) on edge devices has been difficult because they are resource hungry. Binary neural networks (BNNs) help to alleviate the prohibitive resource requirements of DNNs by limiting both activations and weights to 1 bit. There is, however, a significant performance gap between BNNs and floating-point DNNs. To reduce this gap, we propose an improved binary training method that introduces a new regularization function encouraging training weights around binary values. In addition, we add trainable scaling factors to our regularization functions. We also introduce an improved approximation of the derivative of the sign activation function in the backward computation. These modifications are based on linear operations that are easily implementable in the binary training framework. We show experimental results on CIFAR-10, obtaining an accuracy of 87.4% on AlexNet and 83.9% with the DoReFa network. On ImageNet, our method also outperforms the traditional BNN method and XNOR-net, using AlexNet, by a margin of 4% and 2% top-1 accuracy, respectively. In other words, we significantly reduce the gap between BNNs and floating-point DNNs.
Tasks
Published	2018-12-31
URL	http://arxiv.org/abs/1812.11800v2
PDF	http://arxiv.org/pdf/1812.11800v2.pdf
PWC	https://paperswithcode.com/paper/bnn-improved-binary-network-training
Repo	https://github.com/sajaddarabi/BiRealNet
Framework	pytorch
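
The abstract above mentions a regularization function that encourages weights to sit near binary values, with a scaling factor. One plausible form, assumed here and not confirmed by the abstract, penalizes the distance between each weight's magnitude and a scale `alpha`. The function name and the exponent `p` are hypothetical; in training, `alpha` would be a learnable parameter rather than a constant.

```python
def binary_regularizer(weights, alpha=1.0, p=1):
    """Penalty that is zero when every weight equals +alpha or -alpha.

    p=1 gives an absolute-value penalty, p=2 a squared penalty.
    This is an assumed form for illustration only.
    """
    return sum(abs(alpha - abs(w)) ** p for w in weights)


already_binary = binary_regularizer([1.0, -1.0])     # no penalty
far_from_binary = binary_regularizer([0.0, 0.5])     # penalized
```

Adding such a term to the task loss pulls full-precision weights toward the two values they will take after binarization.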

Extreme View Synthesis


Title	Extreme View Synthesis
Authors	Inchang Choi, Orazio Gallo, Alejandro Troccoli, Min H. Kim, Jan Kautz
Abstract	We present Extreme View Synthesis, a solution for novel view extrapolation that works even when the number of input images is small, as few as two. In this context, occlusions and depth uncertainty are two of the most pressing issues, and worsen as the degree of extrapolation increases. We follow the traditional paradigm of performing depth-based warping and refinement, with a few key improvements. First, we estimate a depth probability volume, rather than just a single depth value for each pixel of the novel view. This allows us to leverage depth uncertainty in challenging regions, such as depth discontinuities. After using it to get an initial estimate of the novel view, we explicitly combine learned image priors and the depth uncertainty to synthesize a refined image with fewer artifacts. Our method is the first to show visually pleasing results for baseline magnifications of up to 30X.
Tasks
Published	2018-12-12
URL	https://arxiv.org/abs/1812.04777v2
PDF	https://arxiv.org/pdf/1812.04777v2.pdf
PWC	https://paperswithcode.com/paper/extreme-view-synthesis
Repo	https://github.com/NVlabs/extreme-view-synth
Framework	pytorch

Norm matters: efficient and accurate normalization schemes in deep networks


Title	Norm matters: efficient and accurate normalization schemes in deep networks
Authors	Elad Hoffer, Ron Banner, Itay Golan, Daniel Soudry
Abstract	Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits remained unanswered, with several shortcomings that hindered its use for certain tasks. In this work, we present a novel view on the purpose and function of normalization methods and weight-decay, as tools to decouple weights’ norm from the underlying optimized objective. This property highlights the connection between practices such as normalization, weight decay and learning-rate adjustments. We suggest several alternatives to the widely used $L^2$ batch-norm, using normalization in $L^1$ and $L^\infty$ spaces that can substantially improve numerical stability in low-precision implementations as well as provide computational and memory benefits. We demonstrate that such methods enable the first batch-norm alternative to work for half-precision implementations. Finally, we suggest a modification to weight-normalization, which improves its performance on large-scale tasks.
Tasks
Published	2018-03-05
URL	http://arxiv.org/abs/1803.01814v3
PDF	http://arxiv.org/pdf/1803.01814v3.pdf
PWC	https://paperswithcode.com/paper/norm-matters-efficient-and-accurate
Repo	https://github.com/vaapopescu/gradient-pruning
Framework	pytorch
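
The abstract above proposes replacing the usual $L^2$ statistics in batch normalization with $L^1$ ones. A minimal sketch of that idea for a single feature follows: the standard deviation is replaced by the mean absolute deviation, rescaled by $\sqrt{\pi/2}$ so that the two agree for Gaussian inputs. The function name is hypothetical, and the learnable scale and shift of a full batch-norm layer are omitted.

```python
import math


def l1_batch_norm(x, eps=1e-5):
    """Normalize a batch of scalars using the mean absolute deviation.

    For Gaussian data, E|x - mean| = std * sqrt(2 / pi), so multiplying the
    mean absolute deviation by sqrt(pi / 2) gives an estimate of the std
    without any squaring or square roots over the batch.
    """
    n = len(x)
    mean = sum(x) / n
    mad = sum(abs(v - mean) for v in x) / n
    scale = math.sqrt(math.pi / 2) * mad
    return [(v - mean) / (scale + eps) for v in x]


normalized = l1_batch_norm([1.0, 2.0, 3.0, 4.0, 5.0])
```

Avoiding the sum of squares is what helps in low-precision arithmetic, where squared activations can overflow or lose precision.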

Anomaly Detection using Autoencoders in High Performance Computing Systems


Title	Anomaly Detection using Autoencoders in High Performance Computing Systems
Authors	Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini
Abstract	Anomaly detection in supercomputers is a very difficult problem due to the large scale of the systems and the high number of components. The current state of the art for automated anomaly detection employs Machine Learning methods or statistical regression models in a supervised fashion, meaning that the detection tool is trained to distinguish among a fixed set of behaviour classes (healthy and unhealthy states). We propose a novel approach for anomaly detection in High Performance Computing systems based on a Machine (Deep) Learning technique, namely a type of neural network called an autoencoder. The key idea is to train a set of autoencoders to learn the normal (healthy) behaviour of the supercomputer nodes and, after training, use them to identify abnormal conditions. This is different from previous approaches, which were based on learning the abnormal condition, for which there are much smaller datasets (since it is very hard to identify them to begin with). We test our approach on a real supercomputer equipped with a fine-grained, scalable monitoring infrastructure that can provide large amounts of data to characterize the system behaviour. The results are extremely promising: after the training phase to learn the normal system behaviour, our method is capable of detecting anomalies that have never been seen before with very good accuracy (values ranging between 88% and 96%).
Tasks	Anomaly Detection
Published	2018-11-13
URL	http://arxiv.org/abs/1811.05269v1
PDF	http://arxiv.org/pdf/1811.05269v1.pdf
PWC	https://paperswithcode.com/paper/anomaly-detection-using-autoencoders-in-high
Repo	https://github.com/Young-in/ANM-Assignment2-loglizer
Framework	none

ListOps: A Diagnostic Dataset for Latent Tree Learning


Title	ListOps: A Diagnostic Dataset for Latent Tree Learning
Authors	Nikita Nangia, Samuel R. Bowman
Abstract	Latent tree learning models learn to parse a sentence without syntactic supervision, and use that parse to build the sentence representation. Existing work on such models has shown that, while they perform well on tasks like sentence classification, they do not learn grammars that conform to any plausible semantic or syntactic formalism (Williams et al., 2018a). Studying the parsing ability of such models in natural language can be challenging due to the inherent complexities of natural language, like having several valid parses for a single sentence. In this paper we introduce ListOps, a toy dataset created to study the parsing ability of latent tree models. ListOps sequences are in the style of prefix arithmetic. The dataset is designed to have a single correct parsing strategy that a system needs to learn to succeed at the task. We show that the current leading latent tree models are unable to learn to parse and succeed at ListOps. These models achieve accuracies worse than purely sequential RNNs.
Tasks	Sentence Classification
Published	2018-04-17
URL	http://arxiv.org/abs/1804.06028v1
PDF	http://arxiv.org/pdf/1804.06028v1.pdf
PWC	https://paperswithcode.com/paper/listops-a-diagnostic-dataset-for-latent-tree
Repo	https://github.com/yikangshen/Ordered-Memory
Framework	pytorch
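
The abstract above describes ListOps sequences as prefix arithmetic with a single correct parse. A small evaluator makes that concrete. The token format (`[MAX`, `]`, single digits) and the operator set (`MAX`, `MIN`, `MED`, and `SM` for sum modulo 10) are assumptions based on the published dataset, and truncating the median to an integer is a further assumption.

```python
from statistics import median

# Assumed operator set; SM is sum modulo 10 so results stay single digits.
OPS = {
    "MAX": max,
    "MIN": min,
    "MED": lambda xs: int(median(xs)),
    "SM": lambda xs: sum(xs) % 10,
}


def evaluate(tokens):
    """Evaluate a well-formed ListOps token sequence with an explicit stack."""
    stack = []
    for tok in tokens:
        if tok.startswith("["):
            stack.append([tok[1:]])          # new frame: [operator, args...]
        elif tok == "]":
            op, *args = stack.pop()
            value = OPS[op](args)
            if not stack:                    # closed the outermost list
                return value
            stack[-1].append(value)
        else:
            stack[-1].append(int(tok))
    raise ValueError("unbalanced expression")


result = evaluate("[MAX 2 9 [MIN 4 7 ] 0 ]".split())
```

Because each closing bracket resolves exactly one operator, the nesting structure fully determines the answer, which is what lets the dataset test whether a model has learned the correct parse.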