July 30, 2019

Paper Group AWR 11

Shakespearizing Modern Language Using Copy-Enriched Sequence-to-Sequence Models


Title	Shakespearizing Modern Language Using Copy-Enriched Sequence-to-Sequence Models
Authors	Harsh Jhamtani, Varun Gangal, Eduard Hovy, Eric Nyberg
Abstract	Variations in writing styles are commonly used to adapt the content to a specific context, audience, or purpose. However, applying stylistic variations is still by and large a manual process, and there has been little effort towards automating it. In this paper we explore automated methods to transform text from modern English to Shakespearean English using an end-to-end trainable neural model with pointers to enable copy action. To tackle the limited amount of parallel data, we pre-train embeddings of words by leveraging external dictionaries mapping Shakespearean words to modern English words as well as additional text. Our methods are able to get a BLEU score of 31+, an improvement of ~6 points above the strongest baseline. We publicly release our code to foster further research in this area.
Tasks
Published	2017-07-04
URL	http://arxiv.org/abs/1707.01161v2
PDF	http://arxiv.org/pdf/1707.01161v2.pdf
PWC	https://paperswithcode.com/paper/shakespearizing-modern-language-using-copy-1
Repo	https://github.com/harsh19/Shakespearizing-Modern-English
Framework	tf
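
The "pointers to enable copy action" in the abstract above can be illustrated with a generic copy-augmented output distribution. This is a minimal sketch, not the authors' exact formulation: the function name `copy_mixture`, the scalar `copy_gate`, and the toy probabilities are all assumptions made for illustration. The idea shown is that probability mass from attention over the source sentence is added to the decoder's vocabulary distribution, so rare source words such as names can be emitted even when the vocabulary model gives them no mass.

```python
from collections import defaultdict


def copy_mixture(vocab_probs, attention, source_tokens, copy_gate):
    """Mix a generation distribution with a copy distribution.

    vocab_probs:   dict word -> probability from the decoder softmax
    attention:     list of attention weights over source positions (sums to 1)
    source_tokens: list of source words, aligned with `attention`
    copy_gate:     hypothetical scalar in [0, 1]; the share of mass given to copying
    """
    if not 0.0 <= copy_gate <= 1.0:
        raise ValueError("copy_gate must lie in [0, 1]")
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")

    mixed = defaultdict(float)
    # Generation branch: scale the vocabulary distribution.
    for word, prob in vocab_probs.items():
        mixed[word] += (1.0 - copy_gate) * prob
    # Copy branch: each source position contributes its attention weight.
    for token, weight in zip(source_tokens, attention):
        mixed[token] += copy_gate * weight
    return dict(mixed)


# Toy example: "romeo" is outside the decoder vocabulary but can still be copied.
vocab_probs = {"thou": 0.5, "art": 0.3, "you": 0.2}
source_tokens = ["you", "are", "romeo"]
attention = [0.1, 0.1, 0.8]
mixed = copy_mixture(vocab_probs, attention, source_tokens, copy_gate=0.5)
```

In a trained model the gate and the attention weights would be predicted at every decoding step; here they are fixed numbers so that the mixing step can be read in isolation.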

Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step


Title	Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step
Authors	William Fedus, Mihaela Rosca, Balaji Lakshminarayanan, Andrew M. Dai, Shakir Mohamed, Ian Goodfellow
Abstract	Generative adversarial networks (GANs) are a family of generative models that do not minimize a single training criterion. Unlike other generative models, the data distribution is learned via a game between a generator (the generative model) and a discriminator (a teacher providing training signal) that each minimize their own cost. GANs are designed to reach a Nash equilibrium at which each player cannot reduce their cost without changing the other players’ parameters. One useful approach for the theory of GANs is to show that a divergence between the training distribution and the model distribution obtains its minimum value at equilibrium. Several recent research directions have been motivated by the idea that this divergence is the primary guide for the learning process and that every step of learning should decrease the divergence. We show that this view is overly restrictive. During GAN training, the discriminator provides learning signal in situations where the gradients of the divergences between distributions would not be useful. We provide empirical counterexamples to the view of GAN training as divergence minimization. Specifically, we demonstrate that GANs are able to learn distributions in situations where the divergence minimization point of view predicts they would fail. We also show that gradient penalties motivated from the divergence minimization perspective are equally helpful when applied in other contexts in which the divergence minimization perspective does not predict they would be helpful. This contributes to a growing body of evidence that GAN training may be more usefully viewed as approaching Nash equilibria via trajectories that do not necessarily minimize a specific divergence at each step.
Tasks
Published	2017-10-23
URL	http://arxiv.org/abs/1710.08446v3
PDF	http://arxiv.org/pdf/1710.08446v3.pdf
PWC	https://paperswithcode.com/paper/many-paths-to-equilibrium-gans-do-not-need-to
Repo	https://github.com/kodalinaveen3/DRAGAN
Framework	pytorch

Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks


Title	Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks
Authors	Aditya Chattopadhyay, Anirban Sarkar, Prantik Howlader, Vineeth N Balasubramanian
Abstract	Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision problems. However, these deep models are perceived as “black box” methods considering the lack of understanding of their internal functioning. There has been significant recent interest in developing explainable deep learning models, and this paper is an effort in this direction. Building on a recently proposed method called Grad-CAM, we propose a generalized method called Grad-CAM++ that can provide better visual explanations of CNN model predictions, in terms of better object localization as well as explaining occurrences of multiple object instances in a single image, when compared to the state of the art. We provide a mathematical derivation for the proposed method, which uses a weighted combination of the positive partial derivatives of the last convolutional layer feature maps with respect to a specific class score as weights to generate a visual explanation for the corresponding class label. Our extensive experiments and evaluations, both subjective and objective, on standard datasets showed that Grad-CAM++ provides promising human-interpretable visual explanations for a given CNN architecture across multiple tasks including classification, image caption generation and 3D action recognition; as well as in new settings such as knowledge distillation.
Tasks	3D Human Action Recognition, Object Localization, Temporal Action Localization
Published	2017-10-30
URL	http://arxiv.org/abs/1710.11063v3
PDF	http://arxiv.org/pdf/1710.11063v3.pdf
PWC	https://paperswithcode.com/paper/grad-cam-improved-visual-explanations-for
Repo	https://github.com/totti0223/gradcamplusplus
Framework	none
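
The weighting scheme described in the abstract above (channel weights built from the positive partial derivatives of the class score with respect to the last convolutional feature maps) can be sketched in plain Python. This is a simplified illustration under stated assumptions: the per-pixel coefficients `alphas` are taken as a given input rather than derived as in the paper, and the function name `gradcampp_map` and the toy arrays are hypothetical.

```python
def relu(x):
    """Keep positive values, zero out the rest."""
    return x if x > 0.0 else 0.0


def gradcampp_map(feature_maps, gradients, alphas):
    """Combine K feature maps (each H x W, as nested lists) into one saliency map.

    gradients[k][i][j] is the partial derivative of the class score with
    respect to feature_maps[k][i][j]; alphas[k][i][j] is an assumed per-pixel
    weighting coefficient supplied by the caller.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    # One weight per channel: weighted sum of the positive gradients only.
    weights = [
        sum(
            alphas[k][i][j] * relu(gradients[k][i][j])
            for i in range(height)
            for j in range(width)
        )
        for k in range(len(feature_maps))
    ]
    # Saliency map: ReLU of the weighted combination of feature maps.
    return [
        [
            relu(sum(w * fmap[i][j] for w, fmap in zip(weights, feature_maps)))
            for j in range(width)
        ]
        for i in range(height)
    ]


# Toy example with two 2x2 channels and uniform coefficients.
feature_maps = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 1.0], [1.0, 0.0]]]
gradients = [[[1.0, -1.0], [0.0, 2.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
alphas = [[[0.25, 0.25], [0.25, 0.25]], [[0.25, 0.25], [0.25, 0.25]]]
saliency = gradcampp_map(feature_maps, gradients, alphas)
```

The second channel has only negative gradients, so its weight is zero and it does not contribute to the map; this is the effect of restricting the weights to positive partial derivatives.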

End-to-End Task-Completion Neural Dialogue Systems


Title	End-to-End Task-Completion Neural Dialogue Systems
Authors	Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz
Abstract	One of the major drawbacks of modularized task-completion dialogue systems is that each module is trained individually, which presents several challenges. For example, downstream modules are affected by earlier modules, and the performance of the entire system is not robust to the accumulated errors. This paper presents a novel end-to-end learning framework for task-completion dialogue systems to tackle such issues. Our neural dialogue system can directly interact with a structured database to assist users in accessing information and accomplishing certain tasks. The reinforcement learning based dialogue manager offers robust capabilities to handle noise caused by other components of the dialogue system. Our experiments in a movie-ticket booking domain show that our end-to-end system not only outperforms modularized dialogue system baselines for both objective and subjective evaluation, but also is robust to noise, as demonstrated by several systematic experiments with different error granularity and rates specific to the language understanding module.
Tasks	Chatbot
Published	2017-03-03
URL	http://arxiv.org/abs/1703.01008v4
PDF	http://arxiv.org/pdf/1703.01008v4.pdf
PWC	https://paperswithcode.com/paper/end-to-end-task-completion-neural-dialogue
Repo	https://github.com/bagequan/MS-BCS-DDQ
Framework	none

Domain Adaptation by Using Causal Inference to Predict Invariant Conditional Distributions


Title	Domain Adaptation by Using Causal Inference to Predict Invariant Conditional Distributions
Authors	Sara Magliacane, Thijs van Ommen, Tom Claassen, Stephan Bongers, Philip Versteeg, Joris M. Mooij
Abstract	An important goal common to domain adaptation and causal inference is to make accurate predictions when the distributions for the source (or training) domain(s) and target (or test) domain(s) differ. In many cases, these different distributions can be modeled as different contexts of a single underlying system, in which each distribution corresponds to a different perturbation of the system, or in causal terms, an intervention. We focus on a class of such causal domain adaptation problems, where data for one or more source domains are given, and the task is to predict the distribution of a certain target variable from measurements of other variables in one or more target domains. We propose an approach for solving these problems that exploits causal inference and does not rely on prior knowledge of the causal graph, the type of interventions or the intervention targets. We demonstrate our approach by evaluating a possible implementation on simulated and real world data.
Tasks	Causal Inference, Domain Adaptation
Published	2017-07-20
URL	http://arxiv.org/abs/1707.06422v3
PDF	http://arxiv.org/pdf/1707.06422v3.pdf
PWC	https://paperswithcode.com/paper/domain-adaptation-by-using-causal-inference
Repo	https://github.com/caus-am/dom_adapt
Framework	none

On Adaptive Propensity Score Truncation in Causal Inference


Title	On Adaptive Propensity Score Truncation in Causal Inference
Authors	Cheng Ju, Joshua Schwab, Mark J. van der Laan
Abstract	The positivity assumption, or the experimental treatment assignment (ETA) assumption, is important for identifiability in causal inference. Even if the positivity assumption holds, practical violations of this assumption may jeopardize the finite sample performance of the causal estimator. One of the consequences of practical violations of the positivity assumption is extreme values in the estimated propensity score (PS). A common practice to address this issue is truncating the PS estimate when constructing PS-based estimators. In this study, we propose a novel adaptive truncation method, Positivity-C-TMLE, based on the collaborative targeted maximum likelihood estimation (C-TMLE) methodology. We demonstrate the outstanding performance of our novel approach in a variety of simulations by comparing it with other commonly studied estimators. Results show that by adaptively truncating the estimated PS with a more targeted objective function, the Positivity-C-TMLE estimator achieves the best performance for both point estimation and confidence interval coverage among all estimators considered.
Tasks	Causal Inference
Published	2017-07-18
URL	http://arxiv.org/abs/1707.05861v1
PDF	http://arxiv.org/pdf/1707.05861v1.pdf
PWC	https://paperswithcode.com/paper/on-adaptive-propensity-score-truncation-in
Repo	https://github.com/jucheng1992/ctmle
Framework	none
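
The common practice mentioned in the abstract above, truncating the estimated propensity score before building a PS-based estimator, can be sketched as follows. This example uses a fixed truncation bound inside a plain inverse-probability-weighted (IPW) estimate of the average treatment effect; it does not implement the adaptive Positivity-C-TMLE procedure. The function names, the bounds 0.05 and 0.95, and the toy data are assumptions for illustration.

```python
def truncate(ps, lower=0.05, upper=0.95):
    """Clip an estimated propensity score into [lower, upper]."""
    return min(max(ps, lower), upper)


def ipw_ate(outcomes, treatments, propensities, lower=0.05, upper=0.95):
    """IPW estimate of the average treatment effect with truncated PS.

    outcomes:     observed outcomes Y
    treatments:   binary treatment indicators A (0 or 1)
    propensities: estimated P(A = 1 | covariates)
    """
    total = 0.0
    for y, a, g in zip(outcomes, treatments, propensities):
        g = truncate(g, lower, upper)
        # Treated units are weighted by 1/g, controls by 1/(1 - g).
        total += a * y / g - (1 - a) * y / (1.0 - g)
    return total / len(outcomes)


# Toy data with one extreme propensity score (0.001) on a treated unit.
outcomes = [1.0, 0.0, 1.0, 0.0]
treatments = [1, 0, 1, 0]
propensities = [0.001, 0.5, 0.5, 0.999]

untruncated = ipw_ate(outcomes, treatments, propensities, lower=0.0, upper=1.0)
truncated = ipw_ate(outcomes, treatments, propensities)
```

A single treated unit with a propensity score of 0.001 receives a weight of 1000 and dominates the untruncated estimate; clipping at 0.05 caps that weight at 20. The choice of bound trades bias against variance, which is the quantity an adaptive method would select from the data.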

LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation


Title	LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation
Authors	Abhishek Chaurasia, Eugenio Culurciello
Abstract	Pixel-wise semantic segmentation for visual scene understanding needs to be not only accurate but also efficient in order to find any use in real-time applications. Existing algorithms, even though they are accurate, do not focus on utilizing the parameters of the neural network efficiently. As a result, they are huge in terms of parameters and number of operations, and hence slow too. In this paper, we propose a novel deep neural network architecture which allows it to learn without any significant increase in the number of parameters. Our network uses only 11.5 million parameters and 21.2 GFLOPs for processing an image of resolution 3x640x360. It gives state-of-the-art performance on CamVid and comparable results on the Cityscapes dataset. We also compare our network's processing time on an NVIDIA GPU and an embedded system device with existing state-of-the-art architectures for different image resolutions.
Tasks	Scene Understanding, Semantic Segmentation
Published	2017-06-14
URL	http://arxiv.org/abs/1707.03718v1
PDF	http://arxiv.org/pdf/1707.03718v1.pdf
PWC	https://paperswithcode.com/paper/linknet-exploiting-encoder-representations
Repo	https://github.com/ternaus/angiodysplasia-segmentation
Framework	pytorch

From optimal transport to generative modeling: the VEGAN cookbook


Title	From optimal transport to generative modeling: the VEGAN cookbook
Authors	Olivier Bousquet, Sylvain Gelly, Ilya Tolstikhin, Carl-Johann Simon-Gabriel, Bernhard Schoelkopf
Abstract	We study unsupervised generative modeling in terms of the optimal transport (OT) problem between true (but unknown) data distribution $P_X$ and the latent variable model distribution $P_G$. We show that the OT problem can be equivalently written in terms of probabilistic encoders, which are constrained to match the posterior and prior distributions over the latent space. When relaxed, this constrained optimization problem leads to a penalized optimal transport (POT) objective, which can be efficiently minimized using stochastic gradient descent by sampling from $P_X$ and $P_G$. We show that POT for the 2-Wasserstein distance coincides with the objective heuristically employed in adversarial auto-encoders (AAE) (Makhzani et al., 2016), which provides the first theoretical justification for AAEs known to the authors. We also compare POT to other popular techniques like variational auto-encoders (VAE) (Kingma and Welling, 2014). Our theoretical results include (a) a better understanding of the commonly observed blurriness of images generated by VAEs, and (b) establishing duality between Wasserstein GAN (Arjovsky and Bottou, 2017) and POT for the 1-Wasserstein distance.
Tasks
Published	2017-05-22
URL	http://arxiv.org/abs/1705.07642v1
PDF	http://arxiv.org/pdf/1705.07642v1.pdf
PWC	https://paperswithcode.com/paper/from-optimal-transport-to-generative-modeling
Repo	https://github.com/zhenxuan00/graphical-gan
Framework	tf

DeepIoT: Compressing Deep Neural Network Structures for Sensing Systems with a Compressor-Critic Framework


Title	DeepIoT: Compressing Deep Neural Network Structures for Sensing Systems with a Compressor-Critic Framework
Authors	Shuochao Yao, Yiran Zhao, Aston Zhang, Lu Su, Tarek Abdelzaher
Abstract	Recent advances in deep learning motivate the use of deep neural networks in sensing applications, but their excessive resource needs on constrained embedded devices remain an important impediment. A recently explored solution space lies in compressing (approximating or simplifying) deep neural networks in some manner before use on the device. We propose a new compression solution, called DeepIoT, that makes two key contributions in that space. First, unlike current solutions geared for compressing specific types of neural networks, DeepIoT presents a unified approach that compresses all commonly used deep learning structures for sensing applications, including fully-connected, convolutional, and recurrent neural networks, as well as their combinations. Second, unlike solutions that either sparsify weight matrices or assume linear structure within weight matrices, DeepIoT compresses neural network structures into smaller dense matrices by finding the minimum number of non-redundant hidden elements, such as filters and dimensions required by each layer, while keeping the performance of sensing applications the same. Importantly, it does so using an approach that obtains a global view of parameter redundancies, which is shown to produce superior compression. We conduct experiments with five different sensing-related tasks on Intel Edison devices. DeepIoT outperforms all compared baseline algorithms with respect to execution time and energy consumption by a significant margin. It reduces the size of deep neural networks by 90% to 98.9%. It is thus able to shorten execution time by 71.4% to 94.5%, and decrease energy consumption by 72.2% to 95.7%. These improvements are achieved without loss of accuracy. The results underscore the potential of DeepIoT for advancing the exploitation of deep neural networks on resource-constrained embedded devices.
Tasks
Published	2017-06-05
URL	http://arxiv.org/abs/1706.01215v3
PDF	http://arxiv.org/pdf/1706.01215v3.pdf
PWC	https://paperswithcode.com/paper/deepiot-compressing-deep-neural-network
Repo	https://github.com/AtenaKid/Reproducible-Deep-Learning-in-Communication
Framework	tf

Dilated Residual Networks


Title	Dilated Residual Networks
Authors	Fisher Yu, Vladlen Koltun, Thomas Funkhouser
Abstract	Convolutional networks for image classification progressively reduce resolution until the image is represented by tiny feature maps in which the spatial structure of the scene is no longer discernible. Such loss of spatial acuity can limit image classification accuracy and complicate the transfer of the model to downstream applications that require detailed scene understanding. These problems can be alleviated by dilation, which increases the resolution of output feature maps without reducing the receptive field of individual neurons. We show that dilated residual networks (DRNs) outperform their non-dilated counterparts in image classification without increasing the model’s depth or complexity. We then study gridding artifacts introduced by dilation, develop an approach to removing these artifacts (“degridding”), and show that this further increases the performance of DRNs. In addition, we show that the accuracy advantage of DRNs is further magnified in downstream applications such as object localization and semantic segmentation.
Tasks	Image Classification, Object Localization, Scene Understanding, Semantic Segmentation
Published	2017-05-28
URL	http://arxiv.org/abs/1705.09914v1
PDF	http://arxiv.org/pdf/1705.09914v1.pdf
PWC	https://paperswithcode.com/paper/dilated-residual-networks
Repo	https://github.com/osmr/imgclsmob
Framework	mxnet
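
The claim in the abstract above, that dilation raises output resolution without shrinking the receptive field, can be checked with standard convolution arithmetic. The sketch below compares a two-layer stack that downsamples with stride 2 against one that keeps stride 1 and dilates the second layer instead. The helper names and the example layer configurations are assumptions for illustration, not the DRN architecture itself.

```python
def effective_kernel(kernel, dilation):
    """Span covered by a dilated kernel along one axis."""
    return dilation * (kernel - 1) + 1


def conv_output_size(size, kernel, stride=1, dilation=1, padding=0):
    """Output length of a 1-D convolution (standard floor formula)."""
    return (size + 2 * padding - effective_kernel(kernel, dilation)) // stride + 1


def receptive_field(layers):
    """Receptive field of a stack of (kernel, stride, dilation) layers."""
    field, jump = 1, 1
    for kernel, stride, dilation in layers:
        field += (effective_kernel(kernel, dilation) - 1) * jump
        jump *= stride  # spacing of output positions, measured in input pixels
    return field


# Strided stack: 3x3 conv with stride 2, then a plain 3x3 conv.
strided = [(3, 2, 1), (3, 1, 1)]
# Dilated stack: stride removed, second conv dilated by 2 instead.
dilated = [(3, 1, 1), (3, 1, 2)]

rf_strided = receptive_field(strided)
rf_dilated = receptive_field(dilated)

# Output resolution for a 32-pixel input, padding chosen per layer.
out_strided = conv_output_size(
    conv_output_size(32, 3, stride=2, padding=1), 3, padding=1
)
out_dilated = conv_output_size(
    conv_output_size(32, 3, padding=1), 3, dilation=2, padding=2
)
```

Both stacks see 7 input pixels per output position, but the dilated stack keeps all 32 output positions while the strided one keeps 16.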

Spatio-Temporal Data Mining: A Survey of Problems and Methods


Title	Spatio-Temporal Data Mining: A Survey of Problems and Methods
Authors	Gowtham Atluri, Anuj Karpatne, Vipin Kumar
Abstract	Large volumes of spatio-temporal data are increasingly collected and studied in diverse domains including climate science, social sciences, neuroscience, epidemiology, transportation, mobile health, and Earth sciences. Spatio-temporal data differs from relational data, for which computational approaches have been developed in the data mining community for multiple decades, in that both spatial and temporal attributes are available in addition to the actual measurements/attributes. The presence of these attributes introduces additional challenges that need to be dealt with. Approaches for mining spatio-temporal data have been studied for over a decade in the data mining community. In this article we present a broad survey of this relatively young field of spatio-temporal data mining. We discuss different types of spatio-temporal data and the relevant data mining questions that arise in the context of analyzing each of these datasets. Based on the nature of the data mining problem studied, we classify literature on spatio-temporal data mining into six major categories: clustering, predictive learning, change detection, frequent pattern mining, anomaly detection, and relationship mining. We discuss the various forms of spatio-temporal data mining problems in each of these categories.
Tasks	Anomaly Detection, Epidemiology
Published	2017-11-13
URL	http://arxiv.org/abs/1711.04710v2
PDF	http://arxiv.org/pdf/1711.04710v2.pdf
PWC	https://paperswithcode.com/paper/spatio-temporal-data-mining-a-survey-of
Repo	https://github.com/devbas/ovassistant-alpha
Framework	none

Accelerated Hierarchical Density Clustering


Title	Accelerated Hierarchical Density Clustering
Authors	Leland McInnes, John Healy
Abstract	We present an accelerated algorithm for hierarchical density based clustering. Our new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN, while supporting variable density clusters, and eliminating the need for the difficult-to-tune distance scale parameter. This makes accelerated HDBSCAN* the default choice for density based clustering. Library available at: https://github.com/scikit-learn-contrib/hdbscan
Tasks
Published	2017-05-20
URL	http://arxiv.org/abs/1705.07321v2
PDF	http://arxiv.org/pdf/1705.07321v2.pdf
PWC	https://paperswithcode.com/paper/accelerated-hierarchical-density-clustering
Repo	https://github.com/scikit-learn-contrib/hdbscan
Framework	none
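
One way to see how HDBSCAN*-style clustering copes with variable density without a global distance scale is the mutual reachability distance, which inflates distances around sparse points using each point's own core distance. The brute-force sketch below is for illustration only: it is not the accelerated algorithm from the paper, the function names are hypothetical, and conventions for the core distance (for example, whether a point counts as its own neighbour) differ between implementations.

```python
import math


def core_distance(points, index, k):
    """Distance from points[index] to its k-th nearest other point."""
    distances = sorted(
        math.dist(points[index], other)
        for j, other in enumerate(points)
        if j != index
    )
    return distances[k - 1]


def mutual_reachability(points, a, b, k):
    """Largest of the two core distances and the plain distance."""
    return max(
        core_distance(points, a, k),
        core_distance(points, b, k),
        math.dist(points[a], points[b]),
    )


# Three nearby points and one far-away outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
near_pair = mutual_reachability(points, 0, 1, k=2)
far_pair = mutual_reachability(points, 0, 3, k=2)
```

Points in dense regions keep roughly their original distances, while any pair involving the outlier is pushed apart, so local density is encoded per point rather than through a single radius parameter as in DBSCAN.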

Fast structure learning with modular regularization


Title	Fast structure learning with modular regularization
Authors	Greg Ver Steeg, Hrayr Harutyunyan, Daniel Moyer, Aram Galstyan
Abstract	Estimating graphical model structure from high-dimensional and undersampled data is a fundamental problem in many scientific fields. Existing approaches, such as GLASSO, latent variable GLASSO, and latent tree models, suffer from high computational complexity and may impose unrealistic sparsity priors in some cases. We introduce a novel method that leverages a newly discovered connection between information-theoretic measures and structured latent factor models to derive an optimization objective which encourages modular structures where each observed variable has a single latent parent. The proposed method has linear stepwise computational complexity w.r.t. the number of observed variables. Our experiments on synthetic data demonstrate that our approach is the only method that recovers modular structure better as the dimensionality increases. We also use our approach for estimating covariance structure for a number of real-world datasets and show that it consistently outperforms state-of-the-art estimators at a fraction of the computational cost. Finally, we apply the proposed method to high-resolution fMRI data (with more than 10^5 voxels) and show that it is capable of extracting meaningful patterns.
Tasks
Published	2017-06-11
URL	https://arxiv.org/abs/1706.03353v3
PDF	https://arxiv.org/pdf/1706.03353v3.pdf
PWC	https://paperswithcode.com/paper/low-complexity-gaussian-latent-factor-models
Repo	https://github.com/gregversteeg/LinearSieve
Framework	tf

Neuromorphic Hardware In The Loop: Training a Deep Spiking Network on the BrainScaleS Wafer-Scale System


Title	Neuromorphic Hardware In The Loop: Training a Deep Spiking Network on the BrainScaleS Wafer-Scale System
Authors	Sebastian Schmitt, Johann Klaehn, Guillaume Bellec, Andreas Gruebl, Maurice Guettler, Andreas Hartel, Stephan Hartmann, Dan Husmann, Kai Husmann, Vitali Karasenko, Mitja Kleider, Christoph Koke, Christian Mauch, Eric Mueller, Paul Mueller, Johannes Partzsch, Mihai A. Petrovici, Stefan Schiefer, Stefan Scholze, Bernhard Vogginger, Robert Legenstein, Wolfgang Maass, Christian Mayr, Johannes Schemmel, Karlheinz Meier
Abstract	Emulating spiking neural networks on analog neuromorphic hardware offers several advantages over simulating them on conventional computers, particularly in terms of speed and energy consumption. However, this usually comes at the cost of reduced control over the dynamics of the emulated networks. In this paper, we demonstrate how iterative training of a hardware-emulated network can compensate for anomalies induced by the analog substrate. We first convert a deep neural network trained in software to a spiking network on the BrainScaleS wafer-scale neuromorphic system, thereby enabling an acceleration factor of 10 000 compared to the biological time domain. This mapping is followed by the in-the-loop training, where in each training step, the network activity is first recorded in hardware and then used to compute the parameter updates in software via backpropagation. An essential finding is that the parameter updates do not have to be precise, but only need to approximately follow the correct gradient, which simplifies the computation of updates. Using this approach, after only several tens of iterations, the spiking network shows an accuracy close to the ideal software-emulated prototype. The presented techniques show that deep spiking networks emulated on analog neuromorphic devices can attain good computational performance despite the inherent variations of the analog substrate.
Tasks
Published	2017-03-06
URL	http://arxiv.org/abs/1703.01909v1
PDF	http://arxiv.org/pdf/1703.01909v1.pdf
PWC	https://paperswithcode.com/paper/neuromorphic-hardware-in-the-loop-training-a
Repo	https://github.com/hbp-unibi/BS2Cypress
Framework	none

Automated Problem Identification: Regression vs Classification via Evolutionary Deep Networks


Title	Automated Problem Identification: Regression vs Classification via Evolutionary Deep Networks
Authors	Emmanuel Dufourq, Bruce A. Bassett
Abstract	Regression or classification? This is perhaps the most basic question faced when tackling a new supervised learning problem. We present an Evolutionary Deep Learning (EDL) algorithm that automatically solves this by identifying the question type with high accuracy, along with a proposed deep architecture. Typically, a significant amount of human insight and preparation is required prior to executing machine learning algorithms. For example, when creating deep neural networks, the number of parameters must be selected in advance and furthermore, a lot of these choices are made based upon pre-existing knowledge of the data such as the use of a categorical cross entropy loss function. Humans are able to study a dataset and decide whether it represents a classification or a regression problem, and consequently make decisions which will be applied to the execution of the neural network. We propose the Automated Problem Identification (API) algorithm, which uses an evolutionary algorithm interface to TensorFlow to manipulate a deep neural network to decide if a dataset represents a classification or a regression problem. We test API on 16 different classification, regression and sentiment analysis datasets with up to 10,000 features and up to 17,000 unique target values. API achieves an average accuracy of 96.3% in identifying the problem type without hardcoding any insights about the general characteristics of regression or classification problems. For example, API successfully identifies classification problems even with 1000 target values. Furthermore, the algorithm recommends which loss function to use and also recommends a neural network architecture. Our work is therefore a step towards fully automated machine learning.
Tasks	Sentiment Analysis
Published	2017-07-03
URL	http://arxiv.org/abs/1707.00703v1
PDF	http://arxiv.org/pdf/1707.00703v1.pdf
PWC	https://paperswithcode.com/paper/automated-problem-identification-regression
Repo	https://github.com/yicli/API
Framework	tf