January 31, 2020

3073 words 15 mins read

Paper Group AWR 414

DeepAlign: Alignment-based Process Anomaly Correction using Recurrent Neural Networks. Implicit Semantic Data Augmentation for Deep Networks. Prose for a Painting. Topics to Avoid: Demoting Latent Confounds in Text Classification. Ranking Policy Gradient. Zero-shot transfer for implicit discourse relation classification. Urban Driving with Conditio …

DeepAlign: Alignment-based Process Anomaly Correction using Recurrent Neural Networks

Title DeepAlign: Alignment-based Process Anomaly Correction using Recurrent Neural Networks
Authors Timo Nolle, Alexander Seeliger, Nils Thoma, Max Mühlhäuser
Abstract In this paper, we propose DeepAlign, a novel approach to multi-perspective process anomaly correction, based on recurrent neural networks and bidirectional beam search. At the core of the DeepAlign algorithm are two recurrent neural networks trained to predict the next event. One reads sequences of process executions from left to right, while the other reads the sequences from right to left. By combining the predictive capabilities of both neural networks, we show that it is possible to calculate sequence alignments, which are used to detect and correct anomalies. DeepAlign utilizes the case-level and event-level attributes to closely model the decisions within a process. We evaluate the performance of our approach on an elaborate data corpus of 252 realistic synthetic event logs and compare it to three state-of-the-art conformance checking methods. DeepAlign produces better corrections than the rest of the field, reaching an overall $F_1$ score of $0.9572$ across all datasets, whereas the best comparable state-of-the-art method reaches $0.6411$.
Tasks
Published 2019-11-29
URL https://arxiv.org/abs/1911.13229v2
PDF https://arxiv.org/pdf/1911.13229v2.pdf
PWC https://paperswithcode.com/paper/deepalign-alignment-based-process-anomaly
Repo https://github.com/tnolle/deepalign
Framework tf
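
A minimal sketch of the core idea above, two next-event predictors reading a case in opposite directions whose predictions are combined to score candidate corrections, is shown below. This is not the authors' implementation (the linked repo uses TensorFlow); the GRU architecture, the layer sizes, and the `score_candidates` helper are illustrative assumptions, and the full method additionally runs a bidirectional beam search and conditions on case-level and event-level attributes.

```python
import torch
import torch.nn as nn

class NextEventModel(nn.Module):
    """Predicts the next event ID from a prefix of event IDs."""
    def __init__(self, num_events, emb_dim=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(num_events, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_events)

    def forward(self, prefix):                      # prefix: (batch, T)
        h, _ = self.rnn(self.emb(prefix))
        return self.out(h[:, -1])                   # logits over the next event

def score_candidates(fwd_model, bwd_model, prefix, suffix):
    """Score every event ID as a candidate insertion between prefix and suffix.

    The forward model reads the prefix left-to-right, the backward model reads
    the suffix right-to-left; the two next-event distributions are combined
    (here by summing log-probabilities) to rank candidate corrections.
    """
    with torch.no_grad():
        fwd_logp = torch.log_softmax(fwd_model(prefix), dim=-1)
        bwd_logp = torch.log_softmax(bwd_model(torch.flip(suffix, dims=[1])), dim=-1)
    return fwd_logp + bwd_logp                      # (batch, num_events)

# Toy usage: a log with 10 distinct activities, one case split around a gap.
fwd, bwd = NextEventModel(10), NextEventModel(10)
prefix = torch.tensor([[1, 2, 3]])                  # events before the gap
suffix = torch.tensor([[5, 6]])                     # events after the gap
scores = score_candidates(fwd, bwd, prefix, suffix)
print(scores.argmax(dim=-1))                        # most plausible missing event
```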

Implicit Semantic Data Augmentation for Deep Networks

Title Implicit Semantic Data Augmentation for Deep Networks
Authors Yulin Wang, Xuran Pan, Shiji Song, Hong Zhang, Cheng Wu, Gao Huang
Abstract In this paper, we propose a novel implicit semantic data augmentation (ISDA) approach to complement traditional augmentation techniques like flipping, translation or rotation. Our work is motivated by the intriguing property that deep networks are surprisingly good at linearizing features, such that certain directions in the deep feature space correspond to meaningful semantic transformations, e.g., adding sunglasses or changing backgrounds. As a consequence, translating training samples along many semantic directions in the feature space can effectively augment the dataset to improve generalization. To implement this idea effectively and efficiently, we first perform an online estimate of the covariance matrix of deep features for each class, which captures the intra-class semantic variations. Then random vectors are drawn from a zero-mean normal distribution with the estimated covariance to augment the training data in that class. Importantly, instead of augmenting the samples explicitly, we can directly minimize an upper bound of the expected cross-entropy (CE) loss on the augmented training set, leading to a highly efficient algorithm. In fact, we show that the proposed ISDA amounts to minimizing a novel robust CE loss, which adds negligible extra computational cost to a normal training procedure. Despite its simplicity, ISDA consistently improves the generalization performance of popular deep models (ResNets and DenseNets) on a variety of datasets, e.g., CIFAR-10, CIFAR-100 and ImageNet. Code for reproducing our results is available at https://github.com/blackfeather-wang/ISDA-for-Deep-Networks.
Tasks Data Augmentation, Image Augmentation
Published 2019-09-26
URL https://arxiv.org/abs/1909.12220v4
PDF https://arxiv.org/pdf/1909.12220v4.pdf
PWC https://paperswithcode.com/paper/implicit-semantic-data-augmentation-for-deep
Repo https://github.com/blackfeather-wang/ISDA-for-Deep-Networks
Framework pytorch
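
The central trick in the abstract, minimizing an upper bound of the expected cross-entropy instead of sampling augmented features, has a closed form: the logits are inflated by a quadratic term involving the per-class feature covariance. The sketch below implements that bound; using a fixed per-class covariance (the paper estimates it online during training) and toy tensors are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def isda_loss(features, labels, fc_weight, fc_bias, class_cov, lam):
    """Upper bound of the expected CE loss under Gaussian semantic augmentation.

    features : (N, D) deep features from the backbone
    labels   : (N,) integer class labels
    fc_weight: (C, D) weights of the final linear classifier
    fc_bias  : (C,) biases of the final linear classifier
    class_cov: (C, D, D) feature covariance estimated for each class
    lam      : augmentation strength (grows during training in the paper)
    """
    logits = features @ fc_weight.t() + fc_bias                        # (N, C)
    w_diff = fc_weight.unsqueeze(0) - fc_weight[labels].unsqueeze(1)   # (N, C, D)
    sigma = class_cov[labels]                                          # (N, D, D)
    # Quadratic form (w_j - w_y)^T Sigma_y (w_j - w_y) for every class j.
    quad = torch.einsum('ncd,nde,nce->nc', w_diff, sigma, w_diff)
    aug_logits = logits + 0.5 * lam * quad
    return F.cross_entropy(aug_logits, labels)

# Toy usage with random tensors (3 classes, 8-dim features).
feats = torch.randn(4, 8)
labels = torch.tensor([0, 2, 1, 0])
W, b = torch.randn(3, 8), torch.zeros(3)
cov = torch.stack([torch.eye(8) * 0.1 for _ in range(3)])
print(isda_loss(feats, labels, W, b, cov, lam=0.5))
```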

Prose for a Painting

Title Prose for a Painting
Authors Prerna Kashyap, Samrat Phatale, Iddo Drori
Abstract Painting captions are often dry and simplistic, which motivates us to describe a painting creatively in the style of Shakespearean prose. This is a difficult problem, since there does not exist a large supervised dataset from paintings to Shakespearean prose. Our solution is to use an intermediate English poem description of the painting and then apply language style transfer, which results in Shakespearean prose describing the painting. We rate our results by human evaluation on a Likert scale, and evaluate the quality of language style transfer using BLEU score as a function of prose length. We demonstrate the applicability and limitations of our approach by generating Shakespearean prose for famous paintings. We make our models and code publicly available.
Tasks Style Transfer
Published 2019-10-08
URL https://arxiv.org/abs/1910.03634v1
PDF https://arxiv.org/pdf/1910.03634v1.pdf
PWC https://paperswithcode.com/paper/prose-for-a-painting
Repo https://github.com/prerna135/prose-for-a-painting
Framework none
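
The evaluation described above, BLEU as a function of prose length, can be outlined as follows. The binning scheme, whitespace tokenization, and NLTK's smoothed sentence-level BLEU are illustrative choices rather than the authors' exact setup; the (reference, generated) pairs are assumed to come from the painting-to-poem and style-transfer stages.

```python
from collections import defaultdict
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu_by_length(pairs, bin_size=10):
    """Average BLEU of generated prose, grouped by reference length in tokens.

    pairs: iterable of (reference_prose, generated_prose) strings.
    """
    smooth = SmoothingFunction().method1
    bins = defaultdict(list)
    for ref, hyp in pairs:
        ref_toks, hyp_toks = ref.split(), hyp.split()
        score = sentence_bleu([ref_toks], hyp_toks, smoothing_function=smooth)
        bins[len(ref_toks) // bin_size * bin_size].append(score)
    return {length: sum(s) / len(s) for length, s in sorted(bins.items())}

# Toy usage with two hand-written pairs.
pairs = [
    ("the gentle maiden gazes upon yon golden field",
     "a maiden doth gaze upon the golden field"),
    ("two figures walk beneath a stormy sky",
     "beneath a tempest sky two figures wander"),
]
print(bleu_by_length(pairs))
```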

Topics to Avoid: Demoting Latent Confounds in Text Classification

Title Topics to Avoid: Demoting Latent Confounds in Text Classification
Authors Sachin Kumar, Shuly Wintner, Noah A. Smith, Yulia Tsvetkov
Abstract Despite impressive performance on many text classification tasks, deep neural networks tend to learn frequent superficial patterns that are specific to the training data and do not always generalize well. In this work, we observe this limitation with respect to the task of native language identification. We find that standard text classifiers which perform well on the test set end up learning topical features which are confounds of the prediction task (e.g., if the input text mentions Sweden, the classifier predicts that the author’s native language is Swedish). We propose a method that represents the latent topical confounds and a model which “unlearns” confounding features by predicting both the label of the input text and the confound; but we train the two predictors adversarially in an alternating fashion to learn a text representation that predicts the correct label but is less prone to using information about the confound. We show that this model generalizes better and learns features that are indicative of the writing style rather than the content.
Tasks Language Identification, Native Language Identification, Text Classification
Published 2019-09-01
URL https://arxiv.org/abs/1909.00453v1
PDF https://arxiv.org/pdf/1909.00453v1.pdf
PWC https://paperswithcode.com/paper/topics-to-avoid-demoting-latent-confounds-in
Repo https://github.com/Sachin19/adversarial-classify
Framework pytorch
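
A minimal sketch of the adversarial alternation described above: a confound predictor is first fit to the current text representation, then the encoder and label predictor are updated to classify correctly while making the representation uninformative for the confound. The architecture, optimizers, and the way the confound labels are obtained (the paper induces latent topical confounds) are all simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Sequential(nn.Linear(300, 128), nn.ReLU())   # toy text encoder (bag-of-vectors in)
label_head = nn.Linear(128, 2)                         # e.g. native-language label
confound_head = nn.Linear(128, 50)                     # latent topic (confound) id

opt_main = torch.optim.Adam(list(enc.parameters()) + list(label_head.parameters()), lr=1e-3)
opt_conf = torch.optim.Adam(confound_head.parameters(), lr=1e-3)

def train_step(x, y_label, y_confound, adv_weight=1.0):
    # Step 1: fit the confound predictor on the current (frozen) representation.
    with torch.no_grad():
        h = enc(x)
    opt_conf.zero_grad()
    F.cross_entropy(confound_head(h), y_confound).backward()
    opt_conf.step()

    # Step 2: update encoder + label head to predict the label while making the
    # representation uninformative for the confound (adversarial term).
    opt_main.zero_grad()
    h = enc(x)
    loss = F.cross_entropy(label_head(h), y_label) \
         - adv_weight * F.cross_entropy(confound_head(h), y_confound)
    loss.backward()
    opt_main.step()
    return loss.item()

# Toy batch: 8 documents as 300-d features, binary label, 50 latent topics.
x = torch.randn(8, 300)
print(train_step(x, torch.randint(0, 2, (8,)), torch.randint(0, 50, (8,))))
```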

Ranking Policy Gradient

Title Ranking Policy Gradient
Authors Kaixiang Lin, Jiayu Zhou
Abstract Sample inefficiency is a long-lasting problem in reinforcement learning (RL). The state-of-the-art estimates the optimal action values, which usually involves an extensive search over the state-action space and unstable optimization. Towards sample-efficient RL, we propose ranking policy gradient (RPG), a policy gradient method that learns the optimal rank of a set of discrete actions. To accelerate the learning of policy gradient methods, we establish the equivalence between maximizing the lower bound of return and imitating a near-optimal policy without accessing any oracles. These results lead to a general off-policy learning framework, which preserves the optimality, reduces variance, and improves the sample-efficiency. Furthermore, the sample complexity of RPG does not depend on the dimension of state space, which enables RPG for large-scale problems. We conduct extensive experiments showing that when consolidated with the off-policy learning framework, RPG substantially reduces the sample complexity compared to the state-of-the-art.
Tasks Policy Gradient Methods
Published 2019-06-24
URL https://arxiv.org/abs/1906.09674v3
PDF https://arxiv.org/pdf/1906.09674v3.pdf
PWC https://paperswithcode.com/paper/ranking-policy-gradient
Repo https://github.com/illidanlab/rpg
Framework tf
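
A toy sketch of the off-policy view stated in the abstract, maximizing a lower bound of the return by imitating near-optimal trajectories, follows. It uses a deliberately simple chain environment and a plain cross-entropy imitation update; RPG's pairwise ranking parameterization and its variance-reduction details are omitted, so this illustrates the idea rather than the authors' algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyChain:
    """Tiny deterministic environment: move right (action 1) to reach the goal."""
    def __init__(self, length=6):
        self.length = length
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):
        self.pos = max(self.pos + (1 if action == 1 else -1), 0)
        done = self.pos >= self.length
        return self.pos, (1.0 if done else 0.0), done

# Action-scoring network: RPG learns relative scores ("ranks") of discrete actions.
net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
env = ToyChain()

def rollout(eps=0.5, max_steps=20):
    s, traj, ret, done, t = env.reset(), [], 0.0, False, 0
    while not done and t < max_steps:
        x = torch.tensor([[float(s)]])
        a = int(torch.randint(0, 2, ())) if torch.rand(()) < eps else int(net(x).argmax())
        traj.append((float(s), a))
        s, r, done = env.step(a)
        ret += r
        t += 1
    return traj, ret

for _ in range(200):
    traj, ret = rollout()
    if ret < 1.0:                      # keep only near-optimal (goal-reaching) episodes
        continue
    states = torch.tensor([[s] for s, _ in traj])
    actions = torch.tensor([a for _, a in traj])
    opt.zero_grad()
    F.cross_entropy(net(states), actions).backward()   # imitate the near-optimal trajectory
    opt.step()

print(rollout(eps=0.0)[1])             # greedy return after training
```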

Zero-shot transfer for implicit discourse relation classification

Title Zero-shot transfer for implicit discourse relation classification
Authors Murathan Kurfalı, Robert Östling
Abstract Automatically classifying the relation between sentences in a discourse is a challenging task, in particular when there is no overt expression of the relation. It is made even more challenging by the fact that annotated training data exists only for a small number of languages, such as English and Chinese. We present a new system using zero-shot transfer learning for implicit discourse relation classification, where the only resource used for the target language is unannotated parallel text. This system is evaluated on the discourse-annotated TED-MDB parallel corpus, where it obtains good results for all seven languages using only English training data.
Tasks Implicit Discourse Relation Classification, Relation Classification, Transfer Learning
Published 2019-07-30
URL https://arxiv.org/abs/1907.12885v1
PDF https://arxiv.org/pdf/1907.12885v1.pdf
PWC https://paperswithcode.com/paper/zero-shot-transfer-for-implicit-discourse
Repo https://github.com/MurathanKurfali/multilingual_IDRC
Framework pytorch
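
In outline, the zero-shot setup above amounts to training a relation classifier on English sentence-pair representations from a language-agnostic encoder and applying it unchanged to other languages. In the sketch below the encoder is a placeholder returning random vectors (the paper builds its cross-lingual representations from unannotated parallel text), and the example data and logistic-regression classifier are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def encode(sentence_pairs):
    """Placeholder for a language-agnostic sentence-pair encoder (e.g. one trained
    on parallel text). Here it returns random vectors so the sketch runs."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(sentence_pairs), 512))

# English training data: (arg1, arg2) pairs with implicit discourse relation labels.
train_pairs = [("I was late.", "The traffic was terrible."),
               ("She studied hard.", "She passed the exam."),
               ("He opened the window.", "He also turned on the fan.")]
train_labels = ["Contingency.Cause", "Contingency.Cause", "Expansion.Conjunction"]

# Target-language evaluation data (e.g. from TED-MDB), never seen during training.
test_pairs = [("Je suis arrivé en retard.", "Il y avait des embouteillages.")]

clf = LogisticRegression(max_iter=1000).fit(encode(train_pairs), train_labels)
print(clf.predict(encode(test_pairs)))   # zero-shot prediction for the target language
```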

Urban Driving with Conditional Imitation Learning

Title Urban Driving with Conditional Imitation Learning
Authors Jeffrey Hawke, Richard Shen, Corina Gurau, Siddharth Sharma, Daniele Reda, Nikolay Nikolov, Przemyslaw Mazur, Sean Micklethwaite, Nicolas Griffiths, Amar Shah, Alex Kendall
Abstract Hand-crafting generalised decision-making rules for real-world urban autonomous driving is hard. Alternatively, learning behaviour from easy-to-collect human driving demonstrations is appealing. Prior work has studied imitation learning (IL) for autonomous driving with a number of limitations. Examples include only performing lane-following rather than following a user-defined route, only using a single camera view or heavily cropped frames lacking state observability, only lateral (steering) control but not longitudinal (speed) control, and a lack of interaction with traffic. Importantly, the majority of such systems have been primarily evaluated in simulation, a simple domain which lacks real-world complexities. Motivated by these challenges, we focus on learning representations of semantics, geometry and motion with computer vision for IL from human driving demonstrations. As our main contribution, we present an end-to-end conditional imitation learning approach, combining both lateral and longitudinal control on a real vehicle for following urban routes with simple traffic. We address inherent dataset bias by data balancing, training our final policy on approximately 30 hours of demonstrations gathered over six months. We evaluate our method on an autonomous vehicle by driving 35km of novel routes in European urban streets.
Tasks Autonomous Driving, Decision Making, Imitation Learning, Steering Control
Published 2019-11-30
URL https://arxiv.org/abs/1912.00177v2
PDF https://arxiv.org/pdf/1912.00177v2.pdf
PWC https://paperswithcode.com/paper/urban-driving-with-conditional-imitation
Repo https://github.com/shunchan0677/deepware
Framework tf
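
A compact sketch of a command-conditioned imitation policy in the spirit of the abstract: a shared perception encoder feeds one control branch per route command, and the selected branch outputs both steering and speed, trained to match human demonstrations. The tiny CNN encoder, branch sizes, and L1 loss are illustrative stand-ins for the paper's learned semantics, geometry and motion representations.

```python
import torch
import torch.nn as nn

NUM_COMMANDS = 3   # e.g. turn-left / go-straight / turn-right at the next junction

class ConditionalImitationNet(nn.Module):
    """Shared perception encoder with one control branch per route command."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                      # stands in for the paper's
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),      # semantics / geometry / motion
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),     # representation learning
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))  # steer, speed
            for _ in range(NUM_COMMANDS)
        )

    def forward(self, image, command):
        h = self.encoder(image)
        out = torch.stack([branch(h) for branch in self.branches], dim=1)  # (B, K, 2)
        return out[torch.arange(image.size(0)), command]                   # select branch

model = ConditionalImitationNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# One toy batch of demonstrations: camera frames, route commands, expert controls.
images = torch.randn(4, 3, 96, 96)
commands = torch.tensor([0, 1, 2, 1])
expert = torch.randn(4, 2)          # (steering, speed) from the human driver

opt.zero_grad()
pred = model(images, commands)
loss = nn.functional.l1_loss(pred, expert)   # imitate both lateral and longitudinal control
loss.backward(); opt.step()
```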

Learning Latent Process from High-Dimensional Event Sequences via Efficient Sampling

Title Learning Latent Process from High-Dimensional Event Sequences via Efficient Sampling
Authors Qitian Wu, Zixuan Zhang, Xiaofeng Gao, Junchi Yan, Guihai Chen
Abstract We target modeling latent dynamics in high-dimensional marked event sequences without any prior knowledge about marker relations. Such a problem has rarely been studied in previous work, which would have fundamental difficulty handling the challenges that arise: 1) the high-dimensional markers and the unknown relation network among them pose intractable obstacles for modeling the latent dynamic process; 2) one observed event sequence may concurrently contain several different chains of interdependent events; 3) it is hard to well define the distance between two high-dimensional event sequences. To these ends, in this paper, we propose an adversarial imitation learning framework for high-dimensional event sequence generation which can be decomposed into: 1) a latent structural intensity model that estimates the adjacent nodes without explicit networks and learns to capture the temporal dynamics in the latent space of markers over the observed sequence; 2) an efficient random-walk-based generation model that aims at imitating the generation process of high-dimensional event sequences from a bottom-up view; 3) a discriminator specified as a seq2seq network optimizing the rewards to help the generator output event sequences that are as real as possible. Experimental results on both synthetic and real-world datasets demonstrate that the proposed method can effectively detect the hidden network among markers and make decent predictions for future marked events, even when the number of markers scales to the millions.
Tasks Imitation Learning
Published 2019-10-28
URL https://arxiv.org/abs/1910.12469v1
PDF https://arxiv.org/pdf/1910.12469v1.pdf
PWC https://paperswithcode.com/paper/learning-latent-process-from-high-dimensional
Repo https://github.com/zhangzx-sjtu/LANTERN-NeurIPS-2019
Framework pytorch

Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning

Title Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning
Authors Abhishek Gupta, Vikash Kumar, Corey Lynch, Sergey Levine, Karol Hausman
Abstract We present relay policy learning, a method for imitation and reinforcement learning that can solve multi-stage, long-horizon robotic tasks. This general and universally-applicable, two-phase approach consists of an imitation learning stage that produces goal-conditioned hierarchical policies, and a reinforcement learning phase that finetunes these policies for task performance. Our method, while not necessarily perfect at imitation learning, is very amenable to further improvement via environment interaction, allowing it to scale to challenging long-horizon tasks. We simplify the long-horizon policy learning problem by using a novel data-relabeling algorithm for learning goal-conditioned hierarchical policies, where the low-level only acts for a fixed number of steps, regardless of the goal achieved. While we rely on demonstration data to bootstrap policy learning, we do not assume access to demonstrations of every specific task that is being solved, and instead leverage unstructured and unsegmented demonstrations of semantically meaningful behaviors that are not only less burdensome to provide, but also can greatly facilitate further improvement using reinforcement learning. We demonstrate the effectiveness of our method on a number of multi-stage, long-horizon manipulation tasks in a challenging kitchen simulation environment. Videos are available at https://relay-policy-learning.github.io/
Tasks Imitation Learning
Published 2019-10-25
URL https://arxiv.org/abs/1910.11956v1
PDF https://arxiv.org/pdf/1910.11956v1.pdf
PWC https://paperswithcode.com/paper/relay-policy-learning-solving-long-horizon
Repo https://github.com/sholtodouglas/pointMass
Framework none
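
The data-relabeling idea mentioned above can be sketched as slicing a long, unsegmented demonstration into goal-conditioned training pairs at two horizons, one for the low-level policy and one for the high-level policy. The window lengths and the exact relabeling rule here are assumptions; the full method also fine-tunes the resulting policies with reinforcement learning.

```python
import numpy as np

def relay_relabel(demo_states, demo_actions, low_horizon=30, high_horizon=300):
    """Slice one long, unsegmented demonstration into goal-relabeled training pairs.

    Low-level pairs:  ((state, goal), action) where the goal is the state reached
                      `low_horizon` steps later.
    High-level pairs: ((state, goal), subgoal) where the goal is the state reached
                      `high_horizon` steps later and the subgoal is `low_horizon` ahead.
    """
    low, high = [], []
    T = len(demo_actions)
    for t in range(T):
        subgoal = demo_states[min(t + low_horizon, T)]
        low.append(((demo_states[t], subgoal), demo_actions[t]))
        high_goal = demo_states[min(t + high_horizon, T)]
        high.append(((demo_states[t], high_goal), subgoal))
    return low, high

# Toy demonstration: 500 steps of 10-d states and 4-d actions.
states = np.random.randn(501, 10)
actions = np.random.randn(500, 4)
low_data, high_data = relay_relabel(states, actions)
print(len(low_data), len(high_data))   # goal-conditioned datasets for the two levels
```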

EntEval: A Holistic Evaluation Benchmark for Entity Representations

Title EntEval: A Holistic Evaluation Benchmark for Entity Representations
Authors Mingda Chen, Zewei Chu, Yang Chen, Karl Stratos, Kevin Gimpel
Abstract Rich entity representations are useful for a wide class of problems involving entities. Despite their importance, there is no standardized benchmark that evaluates the overall quality of entity representations. In this work, we propose EntEval: a test suite of diverse tasks that require nontrivial understanding of entities including entity typing, entity similarity, entity relation prediction, and entity disambiguation. In addition, we develop training techniques for learning better entity representations by using natural hyperlink annotations in Wikipedia. We identify effective objectives for incorporating the contextual information in hyperlinks into state-of-the-art pretrained language models and show that they improve strong baselines on multiple EntEval tasks.
Tasks Entity Disambiguation, Entity Typing
Published 2019-08-31
URL https://arxiv.org/abs/1909.00137v2
PDF https://arxiv.org/pdf/1909.00137v2.pdf
PWC https://paperswithcode.com/paper/enteval-a-holistic-evaluation-benchmark-for
Repo https://github.com/ZeweiChu/EntEval
Framework pytorch

Knowledge Transfer Graph for Deep Collaborative Learning

Title Knowledge Transfer Graph for Deep Collaborative Learning
Authors Soma Minami, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi
Abstract Knowledge transfer among multiple networks using their outputs or intermediate activations has evolved through extensive manual design, from a simple teacher-student approach (knowledge distillation) to a bidirectional cohort one (deep mutual learning). The key factors of such knowledge transfer involve the network size, the number of networks, the transfer direction, and the design of the loss function. However, because these factors are enormous when combined and become intricately entangled, conventional methods of knowledge transfer have explored only limited combinations. In this paper, we propose a new graph-based approach for more flexible and diverse combinations of knowledge transfer. To achieve this, we propose a novel graph representation called the knowledge transfer graph that provides a unified view of knowledge transfer and has the potential to represent diverse knowledge transfer patterns. We also propose four gate functions that are introduced into the loss functions. The four gates, which control the gradient, can deliver diverse combinations of knowledge transfer. Searching the graph structure enables us to discover more effective knowledge transfer methods than manually designed ones. Experimental results on the CIFAR-10, -100, and Tiny-ImageNet datasets show that the proposed method achieved significant performance improvements and was able to find remarkable graph structures.
Tasks Transfer Learning
Published 2019-09-10
URL https://arxiv.org/abs/1909.04286v2
PDF https://arxiv.org/pdf/1909.04286v2.pdf
PWC https://paperswithcode.com/paper/knowledge-transfer-graph-for-deep
Repo https://github.com/somaminami/DCL
Framework pytorch
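
A sketch of a gated, per-sample distillation loss of the kind that would sit on one edge of a knowledge transfer graph is given below. The KL form of the loss and the behavior of the four gates follow one reading of the abstract (pass-through, cut-off, a linearly ramped weight, and a gate that passes only samples the source network classifies correctly); treat the details as assumptions rather than the authors' exact definitions.

```python
import torch
import torch.nn.functional as F

def gated_kl(student_logits, teacher_logits, labels, gate, step=0, total_steps=1):
    """Per-sample KL distillation loss with a gate on each sample.

    gate: 'through'  - pass every sample
          'cutoff'   - block all samples (edge effectively disabled)
          'linear'   - weight that ramps up linearly over training
          'correct'  - only samples the source network classifies correctly
    """
    kl = F.kl_div(F.log_softmax(student_logits, dim=1),
                  F.softmax(teacher_logits, dim=1),
                  reduction='none').sum(dim=1)           # per-sample KL
    if gate == 'through':
        w = torch.ones_like(kl)
    elif gate == 'cutoff':
        w = torch.zeros_like(kl)
    elif gate == 'linear':
        w = torch.full_like(kl, step / max(total_steps, 1))
    elif gate == 'correct':
        w = (teacher_logits.argmax(dim=1) == labels).float()
    else:
        raise ValueError(gate)
    return (w * kl).mean()

# Toy usage: two networks' logits on a batch of 4 samples, 10 classes.
s, t = torch.randn(4, 10, requires_grad=True), torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
print(gated_kl(s, t, y, gate='correct'))
```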

MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing

Title MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing
Authors Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Nazanin Alipourfard, Kristina Lerman, Hrayr Harutyunyan, Greg Ver Steeg, Aram Galstyan
Abstract Existing popular methods for semi-supervised learning with Graph Neural Networks (such as the Graph Convolutional Network) provably cannot learn a general class of neighborhood mixing relationships. To address this weakness, we propose a new model, MixHop, that can learn these relationships, including difference operators, by repeatedly mixing feature representations of neighbors at various distances. MixHop requires no additional memory or computational complexity, and outperforms challenging baselines. In addition, we propose sparsity regularization that allows us to visualize how the network prioritizes neighborhood information across different graph datasets. Our analysis of the learned architectures reveals that neighborhood mixing varies per dataset.
Tasks Node Classification
Published 2019-04-30
URL https://arxiv.org/abs/1905.00067v3
PDF https://arxiv.org/pdf/1905.00067v3.pdf
PWC https://paperswithcode.com/paper/mixhop-higher-order-graph-convolution
Repo https://github.com/samihaija/mixhop
Framework tf
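
The neighborhood-mixing operation itself is compact: apply several powers of the normalized adjacency matrix to the node features, transform each result with its own weights, and concatenate. The sketch below is a minimal PyTorch rendering of that idea (the official repo is TensorFlow); the layer widths, the chosen powers, and the dense adjacency are illustrative.

```python
import torch
import torch.nn as nn

class MixHopLayer(nn.Module):
    """Mixes neighborhoods at several distances by applying powers of the
    normalized adjacency matrix and concatenating the results."""
    def __init__(self, in_dim, out_dim, powers=(0, 1, 2)):
        super().__init__()
        self.powers = powers
        self.linears = nn.ModuleList(nn.Linear(in_dim, out_dim) for _ in powers)

    def forward(self, adj_norm, x):
        outputs = []
        for power, lin in zip(self.powers, self.linears):
            h = x
            for _ in range(power):          # \hat{A}^j X  (j = 0 leaves X untouched)
                h = adj_norm @ h
            outputs.append(lin(h))
        return torch.cat(outputs, dim=1)    # concatenate the per-power features

# Toy graph: 5 nodes, symmetric normalized adjacency with self-loops, 8-d features.
A = torch.tensor([[0, 1, 0, 0, 1],
                  [1, 0, 1, 0, 0],
                  [0, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [1, 0, 0, 1, 0]], dtype=torch.float)
A_hat = A + torch.eye(5)
d = A_hat.sum(1)
adj_norm = A_hat / torch.sqrt(d.unsqueeze(0) * d.unsqueeze(1))   # D^{-1/2} (A+I) D^{-1/2}
x = torch.randn(5, 8)
layer = MixHopLayer(8, 16, powers=(0, 1, 2))
print(layer(adj_norm, x).shape)     # torch.Size([5, 48])
```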

Exploiting Uncertainty of Loss Landscape for Stochastic Optimization

Title Exploiting Uncertainty of Loss Landscape for Stochastic Optimization
Authors Vineeth S. Bhaskara, Sneha Desai
Abstract We introduce novel variants of momentum by incorporating the variance of the stochastic loss function. The variance characterizes the confidence or uncertainty of the local features of the averaged loss surface across the i.i.d. subsets of the training data defined by the mini-batches. We show two applications of the gradient of the variance of the loss function. First, as a bias to the conventional momentum update to encourage conformity of the local features of the loss function (e.g. local minima) across mini-batches to improve generalization and the cumulative training progress made per epoch. Second, as an alternative direction for “exploration” in the parameter space, especially, for non-convex objectives, that exploits both the optimistic and pessimistic views of the loss function in the face of uncertainty. We also introduce a novel data-driven stochastic regularization technique through the parameter update rule that is model-agnostic and compatible with arbitrary architectures. We further establish connections to probability distributions over loss functions and the REINFORCE policy gradient update with baseline in RL. Finally, we incorporate the new variants of momentum proposed into Adam, and empirically show that our methods improve the rate of convergence of training based on our experiments on the MNIST and CIFAR-10 datasets.
Tasks Stochastic Optimization
Published 2019-05-30
URL https://arxiv.org/abs/1905.13200v1
PDF https://arxiv.org/pdf/1905.13200v1.pdf
PWC https://paperswithcode.com/paper/exploiting-uncertainty-of-loss-landscape-for
Repo https://github.com/bsvineethiitg/adams
Framework pytorch
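
A heavily simplified sketch of the first application described above, biasing the momentum update with the gradient of a variance estimate of the loss, is shown below. The within-batch variance of per-sample losses is used here as a crude proxy for the paper's variance across mini-batches, and the weighting constants are arbitrary, so this only illustrates the shape of the update rather than the proposed optimizer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(20, 2)
momentum = [torch.zeros_like(p) for p in model.parameters()]
lr, mu, var_weight = 0.1, 0.9, 0.01

def step(x, y):
    per_sample = F.cross_entropy(model(x), y, reduction='none')
    mean_loss = per_sample.mean()
    loss_var = per_sample.var()          # crude within-batch proxy for the variance
                                         # of the loss across mini-batches
    grads_mean = torch.autograd.grad(mean_loss, model.parameters(), retain_graph=True)
    grads_var = torch.autograd.grad(loss_var, model.parameters())
    with torch.no_grad():
        for p, m, gm, gv in zip(model.parameters(), momentum, grads_mean, grads_var):
            # Conventional momentum, biased by the gradient of the variance estimate.
            m.mul_(mu).add_(gm + var_weight * gv)
            p.sub_(lr * m)
    return mean_loss.item()

x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))
print(step(x, y))
```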

Pixel-wise Conditioning of Generative Adversarial Networks

Title Pixel-wise Conditioning of Generative Adversarial Networks
Authors Cyprien Ruffino, Romain Hérault, Eric Laloy, Gilles Gasso
Abstract Generative Adversarial Networks (GANs) have proven successful for unsupervised image generation. Several works extended GANs to image inpainting by conditioning the generation with parts of the image one wants to reconstruct. However, these methods have limitations in settings where only a small subset of the image pixels is known beforehand. In this paper, we study the effectiveness of conditioning GANs by adding an explicit regularization term to enforce pixel-wise conditions when very few pixel values are provided. In addition, we also investigate the influence of this regularization term on the quality of the generated images and the satisfaction of the conditions. Conducted experiments on MNIST and FashionMNIST show evidence that this regularization term allows for controlling the trade-off between quality of the generated images and constraint satisfaction.
Tasks Image Generation, Image Inpainting
Published 2019-11-02
URL https://arxiv.org/abs/1911.00689v1
PDF https://arxiv.org/pdf/1911.00689v1.pdf
PWC https://paperswithcode.com/paper/pixel-wise-conditioning-of-generative
Repo https://github.com/cyprienruffino/pixelwise
Framework tf
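
The regularization studied above can be written as a single extra term in the generator objective: a squared error between the generated image and the given pixel values, restricted by a mask to the few known pixels. The sketch below shows one way to assemble that loss; the non-saturating adversarial term and the normalization by the number of known pixels are illustrative choices, not necessarily the paper's exact formulation.

```python
import torch

def generator_loss(fake_images, disc_scores_fake, known_values, known_mask, reg_weight):
    """Adversarial generator loss plus an explicit pixel-wise conditioning term.

    fake_images      : (N, C, H, W) generator outputs
    disc_scores_fake : (N,) raw discriminator scores on the fake images
    known_values     : (N, C, H, W) image with the few known pixel values filled in
    known_mask       : (N, 1, H, W) 1 where a pixel value is given, 0 elsewhere
    reg_weight       : trade-off between image quality and constraint satisfaction
    """
    # Non-saturating adversarial loss (one of several common choices).
    adv = torch.nn.functional.binary_cross_entropy_with_logits(
        disc_scores_fake, torch.ones_like(disc_scores_fake))
    # Penalize disagreement with the known pixels only.
    pixel = ((known_mask * (fake_images - known_values)) ** 2).sum() / known_mask.sum().clamp(min=1)
    return adv + reg_weight * pixel

# Toy usage on random tensors (batch of 4, 1-channel 28x28 images, ~2% known pixels).
fake = torch.rand(4, 1, 28, 28)
scores = torch.randn(4)
mask = (torch.rand(4, 1, 28, 28) < 0.02).float()
known = torch.rand(4, 1, 28, 28) * mask
print(generator_loss(fake, scores, known, mask, reg_weight=10.0))
```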

The importance of better models in stochastic optimization

Title The importance of better models in stochastic optimization
Authors Hilal Asi, John C. Duchi
Abstract Standard stochastic optimization methods are brittle, sensitive to stepsize choices and other algorithmic parameters, and they exhibit instability outside of well-behaved families of objectives. To address these challenges, we investigate models for stochastic minimization and learning problems that exhibit better robustness to problem families and algorithmic parameters. With appropriately accurate models—which we call the aProx family—stochastic methods can be made stable, provably convergent and asymptotically optimal; even modeling that the objective is nonnegative is sufficient for this stability. We extend these results beyond convexity to weakly convex objectives, which include compositions of convex losses with smooth functions common in modern machine learning applications. We highlight the importance of robustness and accurate modeling with a careful experimental evaluation of convergence time and algorithm sensitivity.
Tasks Stochastic Optimization
Published 2019-03-20
URL http://arxiv.org/abs/1903.08619v1
PDF http://arxiv.org/pdf/1903.08619v1.pdf
PWC https://paperswithcode.com/paper/the-importance-of-better-models-in-stochastic
Repo https://github.com/HilalAsi/APROX-Robust-Stochastic-Optimization-Algorithms
Framework pytorch
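
The simplest member of the aProx family beyond plain SGD is the truncated model, which exploits nonnegativity of the loss by capping the step length so the linearized loss is never driven below zero. The sketch below applies it to a toy least-absolute-deviation problem; the problem, stepsizes, and data are illustrative, but the update rule follows the truncated-model form.

```python
import numpy as np

def truncated_model_sgd(grad_f, f, x0, stepsizes, samples):
    """aProx-style "truncated model" update for nonnegative losses f(x; s) >= 0.

    Instead of a raw SGD step x - alpha * g, the step length is capped so that the
    linear model of the loss is never allowed to go below zero:
        x_{k+1} = x_k - min(alpha_k, f(x_k; s_k) / ||g_k||^2) * g_k
    This makes the method far less sensitive to the stepsize choice.
    """
    x = np.array(x0, dtype=float)
    for alpha, s in zip(stepsizes, samples):
        g = grad_f(x, s)
        gnorm2 = float(g @ g)
        if gnorm2 == 0.0:
            continue
        step = min(alpha, f(x, s) / gnorm2)
        x = x - step * g
    return x

# Toy problem: least absolute deviation regression, f(x; (a, b)) = |a.x - b| >= 0.
rng = np.random.default_rng(0)
true_x = np.array([1.0, -2.0])
A = rng.normal(size=(1000, 2))
b = A @ true_x
f = lambda x, s: abs(s[0] @ x - s[1])
grad_f = lambda x, s: np.sign(s[0] @ x - s[1]) * s[0]
alphas = [10.0 / np.sqrt(k + 1) for k in range(1000)]        # deliberately large stepsizes
x_hat = truncated_model_sgd(grad_f, f, np.zeros(2), alphas, zip(A, b))
print(x_hat)   # stays stable and lands near [1, -2] despite the aggressive stepsizes
```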