October 21, 2019


Paper Group AWR 143



D3D: Distilled 3D Networks for Video Action Recognition

Title D3D: Distilled 3D Networks for Video Action Recognition
Authors Jonathan C. Stroud, David A. Ross, Chen Sun, Jia Deng, Rahul Sukthankar
Abstract State-of-the-art methods for video action recognition commonly use an ensemble of two networks: the spatial stream, which takes RGB frames as input, and the temporal stream, which takes optical flow as input. In recent work, both of these streams consist of 3D Convolutional Neural Networks, which apply spatiotemporal filters to the video clip before performing classification. Conceptually, the temporal filters should allow the spatial stream to learn motion representations, making the temporal stream redundant. However, we still see significant benefits in action recognition performance by including an entirely separate temporal stream, indicating that the spatial stream is “missing” some of the signal captured by the temporal stream. In this work, we first investigate whether motion representations are indeed missing in the spatial stream of 3D CNNs. Second, we demonstrate that these motion representations can be improved by distillation, by tuning the spatial stream to predict the outputs of the temporal stream, effectively combining both models into a single stream. Finally, we show that our Distilled 3D Network (D3D) achieves performance on par with two-stream approaches, using only a single model and with no need to compute optical flow.
Tasks Optical Flow Estimation, Temporal Action Localization
Published 2018-12-19
URL http://arxiv.org/abs/1812.08249v2
PDF http://arxiv.org/pdf/1812.08249v2.pdf
PWC https://paperswithcode.com/paper/d3d-distilled-3d-networks-for-video-action
Repo https://github.com/princeton-vl/d3dhelper
Framework tf
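
The distillation idea in the abstract (tuning the spatial stream to predict the outputs of the temporal stream) can be sketched as a combined loss. This is a minimal illustration, not the authors' implementation: the mean-squared-error form, the `weight` parameter, and all function names are assumptions made for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Classification loss of the student (spatial stream) on the true label."""
    return -math.log(softmax(logits)[label])

def mse(a, b):
    """Mean squared error between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(student_logits, label, student_out, teacher_out, weight=1.0):
    """Action-classification loss plus a term pulling the student's output
    toward the temporal-stream (teacher) output. The MSE form and the
    weighting are assumptions, not taken from the paper."""
    return cross_entropy(student_logits, label) + weight * mse(student_out, teacher_out)
```

When the student already reproduces the teacher output, the second term vanishes and only the classification loss remains.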

DUGMA: Dynamic Uncertainty-Based Gaussian Mixture Alignment

Title DUGMA: Dynamic Uncertainty-Based Gaussian Mixture Alignment
Authors Can Pu, Nanbo Li, Radim Tylecek, Robert B Fisher
Abstract Accurately registering point clouds from a cheap low-resolution sensor is a challenging task. Existing rigid registration methods fail to use the physical 3D uncertainty distribution of each point from a real sensor in the dynamic alignment process, mainly because the uncertainty model for a point is static and invariant and it is hard to describe how these physical uncertainty models change during registration. Additionally, the existing Gaussian mixture alignment architecture cannot efficiently implement these dynamic changes. This paper proposes a simple architecture combining error estimation from sample covariances and dual dynamic global probability alignment using the convolution of uncertainty-based Gaussian Mixture Models (GMM) from point clouds. Firstly, we propose an efficient way to describe the change of each 3D uncertainty model, which represents the structure of the point cloud much better. Unlike the invariant GMM (representing a fixed point cloud) in traditional Gaussian mixture alignment, we use two uncertainty-based GMMs that change and interact with each other in each iteration. In order to have a wider basin of convergence than other local algorithms, we design a more robust energy function by efficiently convolving the two GMMs over the whole 3D space. Tens of thousands of trials have been conducted on hundreds of models from multiple datasets to demonstrate the proposed method’s superior performance compared with the current state-of-the-art methods. The new dataset and code are available from https://github.com/Canpu999
Tasks
Published 2018-03-18
URL http://arxiv.org/abs/1803.07426v2
PDF http://arxiv.org/pdf/1803.07426v2.pdf
PWC https://paperswithcode.com/paper/dugma-dynamic-uncertainty-based-gaussian
Repo https://github.com/Canpu999/DUGMA
Framework none
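
To make the "convolving the two GMMs" idea concrete, the sketch below uses the standard identity that the integral of the product of two Gaussians equals a Gaussian evaluated at the difference of their means, with the covariances added. It is a simplified illustration only: it assumes isotropic per-point variances and equal mixture weights, and the function names are hypothetical rather than taken from the DUGMA code.

```python
import math

def gaussian_overlap(a, b, var_a, var_b):
    """Integral over space of N(x; a, var_a*I) * N(x; b, var_b*I).
    Equals N(a - b; 0, (var_a + var_b) * I)."""
    var = var_a + var_b
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    dim = len(a)
    return math.exp(-0.5 * sq_dist / var) / ((2.0 * math.pi * var) ** (dim / 2.0))

def alignment_energy(points_a, vars_a, points_b, vars_b):
    """Negative overlap of two equally weighted isotropic GMMs.
    Lower energy means the two point clouds agree better."""
    total = 0.0
    for a, va in zip(points_a, vars_a):
        for b, vb in zip(points_b, vars_b):
            total += gaussian_overlap(a, b, va, vb)
    return -total / (len(points_a) * len(points_b))

# Usage: an aligned pair of clouds has lower energy than a shifted pair.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]
variances = [0.01, 0.01]
aligned_energy = alignment_energy(cloud, variances, cloud, variances)
shifted_energy = alignment_energy(cloud, variances, shifted, variances)
```

A registration method would minimize such an energy over the rigid transform applied to one cloud; that optimization loop is omitted here.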

Detecting Offensive Language in Tweets Using Deep Learning

Title Detecting Offensive Language in Tweets Using Deep Learning
Authors Georgios K. Pitsilis, Heri Ramampiaro, Helge Langseth
Abstract This paper addresses the important problem of discerning hateful content in social media. We propose a detection scheme that is an ensemble of Recurrent Neural Network (RNN) classifiers, and it incorporates various features associated with user-related information, such as the users’ tendency towards racism or sexism. These data are fed as input to the above classifiers along with the word frequency vectors derived from the textual content. Our approach has been evaluated on a publicly available corpus of 16k tweets, and the results demonstrate its effectiveness in comparison to existing state-of-the-art solutions. More specifically, our scheme can successfully distinguish racist and sexist messages from normal text, and achieve higher classification quality than current state-of-the-art algorithms.
Tasks
Published 2018-01-13
URL http://arxiv.org/abs/1801.04433v1
PDF http://arxiv.org/pdf/1801.04433v1.pdf
PWC https://paperswithcode.com/paper/detecting-offensive-language-in-tweets-using
Repo https://github.com/gpitsilis/hate-speech
Framework none

Adversarial Perturbations Against Real-Time Video Classification Systems

Title Adversarial Perturbations Against Real-Time Video Classification Systems
Authors Shasha Li, Ajaya Neupane, Sujoy Paul, Chengyu Song, Srikanth V. Krishnamurthy, Amit K. Roy Chowdhury, Ananthram Swami
Abstract Recent research has demonstrated the brittleness of machine learning systems to adversarial perturbations. However, the studies have been mostly limited to perturbations on images and, more generally, classification that does not deal with temporally varying inputs. In this paper we ask “Are adversarial perturbations possible in real-time video classification systems and if so, what properties must they satisfy?” Such systems are used in surveillance, smart vehicles, and smart elderly care, and thus misclassification could be particularly harmful (e.g., a mishap at an elderly care facility may be missed). We show that accounting for temporal structure is key to generating adversarial examples in such systems. We exploit recent advances in generative adversarial network (GAN) architectures to account for temporal correlations and generate adversarial samples that can cause misclassification rates of over 80% for targeted activities. More importantly, the samples also leave other activities largely unaffected, making them extremely stealthy. Finally, we also surprisingly find that in many scenarios, the same perturbation can be applied to every frame in a video clip, which makes it relatively easy for the adversary to achieve misclassification.
Tasks Video Classification
Published 2018-07-02
URL http://arxiv.org/abs/1807.00458v1
PDF http://arxiv.org/pdf/1807.00458v1.pdf
PWC https://paperswithcode.com/paper/adversarial-perturbations-against-real-time
Repo https://github.com/sli057/Video-Perturbation
Framework tf
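
The last finding in the abstract, that one perturbation can be reused on every frame of a clip, can be illustrated as follows. This sketch only shows how a fixed perturbation is applied; it does not generate one (the paper uses a GAN-based approach for that). Frames are assumed to be flat lists of pixel intensities in [0, 1], and the function name is hypothetical.

```python
def apply_universal_perturbation(frames, perturbation, lo=0.0, hi=1.0):
    """Add the same per-pixel perturbation to every frame of a clip,
    clamping the result to the valid intensity range [lo, hi]."""
    return [
        [min(hi, max(lo, pixel + delta)) for pixel, delta in zip(frame, perturbation)]
        for frame in frames
    ]

# Usage: a two-frame clip with two pixels per frame.
clip = [[0.5, 1.0], [0.0, 0.25]]
perturbed = apply_universal_perturbation(clip, [0.25, 0.25])
```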

Modeling Mistrust in End-of-Life Care

Title Modeling Mistrust in End-of-Life Care
Authors Willie Boag, Harini Suresh, Leo Anthony Celi, Peter Szolovits, Marzyeh Ghassemi
Abstract In this work, we characterize the doctor-patient relationship using a machine learning-derived trust score. We show that this score has statistically significant racial associations, and that by modeling trust directly we find stronger disparities in care than by stratifying on race. We further demonstrate that mistrust is indicative of worse outcomes, but is only weakly associated with physiologically-created severity scores. Finally, we describe sentiment analysis experiments indicating patients with higher levels of mistrust have worse experiences and interactions with their caregivers. This work is a step towards measuring fairer machine learning in the healthcare domain.
Tasks Sentiment Analysis
Published 2018-06-30
URL https://arxiv.org/abs/1807.00124v2
PDF https://arxiv.org/pdf/1807.00124v2.pdf
PWC https://paperswithcode.com/paper/modeling-mistrust-in-end-of-life-care
Repo https://github.com/wboag/eol-mistrust
Framework none

GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders

Title GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders
Authors Martin Simonovsky, Nikos Komodakis
Abstract Deep learning on graphs has become a popular research topic with many applications. However, past work has concentrated on learning graph embedding tasks, which is in contrast with advances in generative models for images and text. Is it possible to transfer this progress to the domain of graphs? We propose to sidestep hurdles associated with linearization of such discrete structures by having a decoder output a probabilistic fully-connected graph of a predefined maximum size directly at once. Our method is formulated as a variational autoencoder. We evaluate on the challenging task of molecule generation.
Tasks Graph Embedding
Published 2018-02-09
URL http://arxiv.org/abs/1802.03480v1
PDF http://arxiv.org/pdf/1802.03480v1.pdf
PWC https://paperswithcode.com/paper/graphvae-towards-generation-of-small-graphs
Repo https://github.com/snap-stanford/GraphRNN
Framework pytorch
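
The abstract describes a decoder that outputs a probabilistic fully-connected graph of a predefined maximum size. One simple way to turn such an output into a discrete graph is thresholding, sketched below. The layout is an assumption for illustration: a symmetric k-by-k matrix whose diagonal holds node-existence probabilities and whose off-diagonal entries hold edge probabilities. The function name and the 0.5 threshold are hypothetical.

```python
def discretize_graph(prob, threshold=0.5):
    """Turn a k x k probability matrix into (nodes, edges).
    prob[i][i] is read as the probability that node i exists;
    prob[i][j] (i != j) as the probability of an edge between i and j.
    Edges are only kept between nodes that were themselves kept."""
    k = len(prob)
    nodes = [i for i in range(k) if prob[i][i] >= threshold]
    edges = [
        (i, j)
        for i in nodes
        for j in nodes
        if i < j and prob[i][j] >= threshold
    ]
    return nodes, edges

# Usage: node 2 is unlikely to exist, so its edge to node 0 is dropped too.
decoder_output = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.1],
    [0.7, 0.1, 0.2],
]
nodes, edges = discretize_graph(decoder_output)
```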

SOSELETO: A Unified Approach to Transfer Learning and Training with Noisy Labels

Title SOSELETO: A Unified Approach to Transfer Learning and Training with Noisy Labels
Authors Or Litany, Daniel Freedman
Abstract We present SOSELETO (SOurce SELEction for Target Optimization), a new method for exploiting a source dataset to solve a classification problem on a target dataset. SOSELETO is based on the following simple intuition: some source examples are more informative than others for the target problem. To capture this intuition, source samples are each given weights; these weights are solved for jointly with the source and target classification problems via a bilevel optimization scheme. The target therefore gets to choose the source samples which are most informative for its own classification task. Furthermore, the bilevel nature of the optimization acts as a kind of regularization on the target, mitigating overfitting. SOSELETO may be applied to both classic transfer learning, as well as the problem of training on datasets with noisy labels; we show state of the art results on both of these problems.
Tasks Bilevel Optimization, Transfer Learning
Published 2018-05-24
URL https://arxiv.org/abs/1805.09622v2
PDF https://arxiv.org/pdf/1805.09622v2.pdf
PWC https://paperswithcode.com/paper/soseleto-a-unified-approach-to-transfer
Repo https://github.com/orlitany/SOSELETO
Framework pytorch
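
The per-sample weighting described in the abstract can be sketched as a weighted source loss. This shows only the inner weighted objective; the bilevel update that learns the weights from the target problem is not reproduced here. The function name and the averaging convention are assumptions for the example.

```python
def weighted_source_loss(per_sample_losses, weights):
    """Average of source-sample losses, each scaled by its weight.
    A weight near 0 effectively removes a source sample from training."""
    if len(per_sample_losses) != len(weights):
        raise ValueError("need exactly one weight per source sample")
    total = sum(w * loss for w, loss in zip(weights, per_sample_losses))
    return total / len(per_sample_losses)
```

For example, with losses `[2.0, 4.0]`, setting the second weight to 0 leaves only the first sample's contribution.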

Universal Successor Features Approximators

Title Universal Successor Features Approximators
Authors Diana Borsa, André Barreto, John Quan, Daniel Mankowitz, Rémi Munos, Hado van Hasselt, David Silver, Tom Schaul
Abstract The ability of a reinforcement learning (RL) agent to learn about many reward functions at the same time has many potential benefits, such as the decomposition of complex tasks into simpler ones, the exchange of information between tasks, and the reuse of skills. We focus on one aspect in particular, namely the ability to generalise to unseen tasks. Parametric generalisation relies on the interpolation power of a function approximator that is given the task description as input; one of its most common forms is universal value function approximators (UVFAs). Another way to generalise to new tasks is to exploit structure in the RL problem itself. Generalised policy improvement (GPI) combines solutions of previous tasks into a policy for the unseen task; this relies on instantaneous policy evaluation of old policies under the new reward function, which is made possible through successor features (SFs). Our proposed universal successor features approximators (USFAs) combine the advantages of all of these, namely the scalability of UVFAs, the instant inference of SFs, and the strong generalisation of GPI. We discuss the challenges involved in training a USFA and its generalisation properties, and demonstrate its practical benefits and transfer abilities on a large-scale domain in which the agent has to navigate in a first-person perspective three-dimensional environment.
Tasks
Published 2018-12-18
URL http://arxiv.org/abs/1812.07626v1
PDF http://arxiv.org/pdf/1812.07626v1.pdf
PWC https://paperswithcode.com/paper/universal-successor-features-approximators
Repo https://github.com/Wanqianxn/usfa
Framework pytorch
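
Generalised policy improvement with successor features, as described in the abstract, can be sketched for a single state. With successor features ψ and a task vector w, each old policy's action value under the new reward is the dot product ψ·w, and GPI picks the action with the best value across all old policies. The data layout and function name below are assumptions for illustration; a USFA would produce ψ from a learned network rather than a lookup table.

```python
def gpi_action(successor_features, w):
    """successor_features[z][a] is the successor-feature vector of action a
    under old policy z, all for one fixed state. Returns the action that
    maximizes psi . w over every policy, together with that value."""
    best_action, best_value = None, float("-inf")
    for psi_per_action in successor_features:
        for action, psi in enumerate(psi_per_action):
            q = sum(p * wi for p, wi in zip(psi, w))
            if q > best_value:
                best_action, best_value = action, q
    return best_action, best_value

# Usage: two old policies, two actions, two-dimensional features.
sf = [
    [[1.0, 0.0], [0.0, 1.0]],  # policy 0
    [[0.5, 0.5], [2.0, 0.0]],  # policy 1
]
action, value = gpi_action(sf, [1.0, 0.0])
```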

TicTac: Accelerating Distributed Deep Learning with Communication Scheduling

Title TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
Authors Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, Roy H. Campbell
Abstract State-of-the-art deep learning systems rely on iterative distributed training to tackle the increasing complexity of models and input data. The iteration time in these communication-heavy systems depends on the computation time, communication time and the extent of overlap of computation and communication. In this work, we identify a shortcoming in systems with graph representation for computation, such as TensorFlow and PyTorch, that results in high variance in iteration time: the random order of received parameters across workers. We develop a system, TicTac, to improve the iteration time by fixing this issue in distributed deep learning with Parameter Servers while guaranteeing near-optimal overlap of communication and computation. TicTac identifies and enforces an order of network transfers which improves the iteration time using prioritization. Our system is implemented over TensorFlow and requires no changes to the model or developer inputs. TicTac improves the throughput by up to 37.7% in inference and 19.2% in training, while also reducing straggler effect by up to 2.3×. Our code is publicly available.
Tasks
Published 2018-03-08
URL http://arxiv.org/abs/1803.03288v2
PDF http://arxiv.org/pdf/1803.03288v2.pdf
PWC https://paperswithcode.com/paper/tictac-accelerating-distributed-deep-learning
Repo https://github.com/xldrx/tictac
Framework tf
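
Enforcing a fixed order of network transfers by priority, as the abstract describes, can be sketched with a priority queue. How TicTac derives the priorities from the computation graph is not shown; here they are simply given as input, and the parameter names and function name are hypothetical.

```python
import heapq

def schedule_transfers(priorities):
    """Return parameter names in the order they should be transferred.
    `priorities` maps a parameter name to a number; lower means earlier.
    Every worker that runs this gets the same deterministic order."""
    heap = [(priority, name) for name, priority in priorities.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order

# Usage: parameters needed first by the forward pass are sent first.
order = schedule_transfers({"layer3/w": 2, "layer1/w": 0, "layer2/w": 1})
```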

Generating Handwritten Chinese Characters using CycleGAN

Title Generating Handwritten Chinese Characters using CycleGAN
Authors Bo Chang, Qiong Zhang, Shenyi Pan, Lili Meng
Abstract Handwriting of Chinese has long been an important skill in East Asia. However, automatic generation of handwritten Chinese characters poses a great challenge due to the large number of characters. Various machine learning techniques have been used to recognize Chinese characters, but few works have studied the handwritten Chinese character generation problem, especially with unpaired training data. In this work, we formulate the Chinese handwritten character generation as a problem that learns a mapping from an existing printed font to a personalized handwritten style. We further propose DenseNet CycleGAN to generate Chinese handwritten characters. Our method is applied not only to commonly used Chinese characters but also to calligraphy work with aesthetic values. Furthermore, we propose content accuracy and style discrepancy as the evaluation metrics to assess the quality of the handwritten characters generated. We then use our proposed metrics to evaluate the generated characters from CASIA dataset as well as our newly introduced Lanting calligraphy dataset.
Tasks
Published 2018-01-25
URL http://arxiv.org/abs/1801.08624v1
PDF http://arxiv.org/pdf/1801.08624v1.pdf
PWC https://paperswithcode.com/paper/generating-handwritten-chinese-characters
Repo https://github.com/SamuelNguyen1998/Vietnamese_Handwriting_Recognition
Framework tf

Triplet-based Deep Similarity Learning for Person Re-Identification

Title Triplet-based Deep Similarity Learning for Person Re-Identification
Authors Wentong Liao, Michael Ying Yang, Ni Zhan, Bodo Rosenhahn
Abstract In recent years, person re-identification (re-id) has attracted great attention in both the computer vision community and industry. In this paper, we propose a new framework for person re-identification with triplet-based deep similarity learning using convolutional neural networks (CNNs). The network is trained with triplet input: two of them have the same class labels and the other one is different. It aims to learn the deep feature representation, with which the distance within the same class is decreased, while the distance between the different classes is increased as much as possible. Moreover, we trained the model jointly on six different datasets, which differs from the common practice of training and testing one model on a single dataset. However, the enormous number of possible triplets among the large number of training samples makes the training impossible. To address this challenge, a double-sampling scheme is proposed to generate triplets of images that are as effective as possible. The proposed framework is evaluated on several benchmark datasets. The experimental results show that our method is effective for the task of person re-identification and is comparable to or even outperforms the state-of-the-art methods.
Tasks Person Re-Identification
Published 2018-02-09
URL http://arxiv.org/abs/1802.03254v1
PDF http://arxiv.org/pdf/1802.03254v1.pdf
PWC https://paperswithcode.com/paper/triplet-based-deep-similarity-learning-for
Repo https://github.com/ssahn3087/pedestrian_detection
Framework pytorch
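
The triplet objective described in the abstract (pull same-class samples together, push different-class samples apart) is commonly written as a hinge loss on distances. The sketch below uses that common form with Euclidean distance; the margin value and the exact loss used in the paper are assumptions here.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor
    than the positive is; otherwise the remaining gap."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Usage: an easy triplet (no loss) and a hard one (positive farther than negative).
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

Triplets like `easy` contribute no gradient, which is one reason a sampling scheme that selects informative triplets matters.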

Efficient Dialog Policy Learning via Positive Memory Retention

Title Efficient Dialog Policy Learning via Positive Memory Retention
Authors Rui Zhao, Volker Tresp
Abstract This paper is concerned with the training of recurrent neural networks as goal-oriented dialog agents using reinforcement learning. Training such agents with policy gradients typically requires a large number of samples. However, the collection of the required data in the form of conversations between chat-bots and human agents is time-consuming and expensive. To mitigate this problem, we describe an efficient policy gradient method using positive memory retention, which significantly increases the sample-efficiency. We show that our method is 10 times more sample-efficient than policy gradients in extensive experiments on a new synthetic number guessing game. Moreover, in a real-world visual object discovery game, the proposed method is twice as sample-efficient as policy gradients and shows state-of-the-art performance.
Tasks Goal-Oriented Dialog
Published 2018-10-02
URL http://arxiv.org/abs/1810.01371v2
PDF http://arxiv.org/pdf/1810.01371v2.pdf
PWC https://paperswithcode.com/paper/efficient-dialog-policy-learning-via-positive
Repo https://github.com/ruizhaogit/PositiveMemoryRetention
Framework pytorch

Projective Splitting with Forward Steps: Asynchronous and Block-Iterative Operator Splitting

Title Projective Splitting with Forward Steps: Asynchronous and Block-Iterative Operator Splitting
Authors Patrick R. Johnstone, Jonathan Eckstein
Abstract This work is concerned with the classical problem of finding a zero of a sum of maximal monotone operators. For the projective splitting framework recently proposed by Combettes and Eckstein, we show how to replace the fundamental subproblem calculation using a backward step with one based on two forward steps. The resulting algorithms have the same kind of coordination procedure and can be implemented in the same block-iterative and potentially distributed and asynchronous manner, but may perform backward steps on some operators and forward steps on others. Prior algorithms in the projective splitting family have used only backward steps. Forward steps can be used for any Lipschitz-continuous operators provided the stepsize is bounded by the inverse of the Lipschitz constant. If the Lipschitz constant is unknown, a simple backtracking linesearch procedure may be used. For affine operators, the stepsize can be chosen adaptively without knowledge of the Lipschitz constant and without any additional forward steps. We close the paper by empirically studying the performance of several kinds of splitting algorithms on the lasso problem.
Tasks
Published 2018-03-19
URL http://arxiv.org/abs/1803.07043v6
PDF http://arxiv.org/pdf/1803.07043v6.pdf
PWC https://paperswithcode.com/paper/projective-splitting-with-forward-steps-1
Repo https://github.com/1austrartsua1/proj_split_pub
Framework none
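
The two kinds of operator steps mentioned in the abstract can be illustrated on the lasso problem, where the least-squares term has a Lipschitz-continuous gradient (suited to forward steps) and the l1 term has a cheap backward (proximal) step. The sketch shows only these generic building blocks, not the projective splitting coordination procedure or the paper's two-forward-step construction; function names are hypothetical.

```python
def forward_step(x, operator_value, stepsize):
    """Forward (explicit) step: x - stepsize * T(x), where operator_value
    is T(x), for example the gradient of the least-squares term at x."""
    return [xi - stepsize * ti for xi, ti in zip(x, operator_value)]

def backward_step_l1(x, threshold):
    """Backward (proximal) step for the l1 norm: soft-thresholding.
    `threshold` is the stepsize times the l1 regularization weight."""
    out = []
    for v in x:
        if v > threshold:
            out.append(v - threshold)
        elif v < -threshold:
            out.append(v + threshold)
        else:
            out.append(0.0)
    return out
```

A forward step needs only one evaluation of the operator, whereas a backward step requires solving a small subproblem, which for the l1 norm happens to have the closed form above.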

MD-GAN: Multi-Discriminator Generative Adversarial Networks for Distributed Datasets

Title MD-GAN: Multi-Discriminator Generative Adversarial Networks for Distributed Datasets
Authors Corentin Hardy, Erwan Le Merrer, Bruno Sericola
Abstract A recent technical breakthrough in the domain of machine learning is the discovery and the multiple applications of Generative Adversarial Networks (GANs). Those generative models are computationally demanding, as a GAN is composed of two deep neural networks, and because it trains on large datasets. A GAN is generally trained on a single server. In this paper, we address the problem of distributing GANs so that they are able to train over datasets that are spread across multiple workers. MD-GAN is presented as the first solution for this problem: we propose a novel learning procedure for GANs so that they fit this distributed setup. We then compare the performance of MD-GAN to a version of Federated Learning adapted to GANs, using the MNIST and CIFAR10 datasets. MD-GAN exhibits a reduction by a factor of two of the learning complexity on each worker node, while providing better performance than federated learning on both datasets. We finally discuss the practical implications of distributing GANs.
Tasks
Published 2018-11-09
URL http://arxiv.org/abs/1811.03850v2
PDF http://arxiv.org/pdf/1811.03850v2.pdf
PWC https://paperswithcode.com/paper/md-gan-multi-discriminator-generative
Repo https://github.com/bbondd/DistributedGAN
Framework tf

Self-similarity Grouping: A Simple Unsupervised Cross Domain Adaptation Approach for Person Re-identification

Title Self-similarity Grouping: A Simple Unsupervised Cross Domain Adaptation Approach for Person Re-identification
Authors Yang Fu, Yunchao Wei, Guanshuo Wang, Yuqian Zhou, Honghui Shi, Thomas Huang
Abstract Domain adaptation in person re-identification (re-ID) has always been a challenging task. In this work, we explore how to harness the natural similar characteristics existing in the samples from the target domain for learning to conduct person re-ID in an unsupervised manner. Concretely, we propose a Self-similarity Grouping (SSG) approach, which exploits the potential similarity (from global body to local parts) of unlabeled samples to automatically build multiple clusters from different views. These independent clusters are then assigned labels, which serve as the pseudo identities to supervise the training process. We repeatedly and alternately conduct such a grouping and training process until the model is stable. Despite the apparent simplicity, our SSG outperforms the state of the art by more than 4.6% (DukeMTMC to Market1501) and 4.4% (Market1501 to DukeMTMC) in mAP, respectively. Upon our SSG, we further introduce a clustering-guided semi-supervised approach named SSG ++ to conduct the one-shot domain adaptation in an open set setting (i.e. the number of independent identities from the target domain is unknown). Without spending much effort on labeling, our SSG ++ can further promote the mAP upon SSG by 10.7% and 6.9%, respectively. Our code is available at: https://github.com/OasisYang/SSG .
Tasks Domain Adaptation, One-Shot Learning, Person Re-Identification, Unsupervised Domain Adaptation
Published 2018-11-26
URL https://arxiv.org/abs/1811.10144v3
PDF https://arxiv.org/pdf/1811.10144v3.pdf
PWC https://paperswithcode.com/paper/one-shot-domain-adaptation-for-person-re
Repo https://github.com/OasisYang/SSG
Framework pytorch