October 21, 2019

Paper Group AWR 143

D3D: Distilled 3D Networks for Video Action Recognition


Title	D3D: Distilled 3D Networks for Video Action Recognition
Authors	Jonathan C. Stroud, David A. Ross, Chen Sun, Jia Deng, Rahul Sukthankar
Abstract	State-of-the-art methods for video action recognition commonly use an ensemble of two networks: the spatial stream, which takes RGB frames as input, and the temporal stream, which takes optical flow as input. In recent work, both of these streams consist of 3D Convolutional Neural Networks, which apply spatiotemporal filters to the video clip before performing classification. Conceptually, the temporal filters should allow the spatial stream to learn motion representations, making the temporal stream redundant. However, we still see significant benefits in action recognition performance by including an entirely separate temporal stream, indicating that the spatial stream is “missing” some of the signal captured by the temporal stream. In this work, we first investigate whether motion representations are indeed missing in the spatial stream of 3D CNNs. Second, we demonstrate that these motion representations can be improved by distillation, by tuning the spatial stream to predict the outputs of the temporal stream, effectively combining both models into a single stream. Finally, we show that our Distilled 3D Network (D3D) achieves performance on par with two-stream approaches, using only a single model and with no need to compute optical flow.
Tasks	Optical Flow Estimation, Temporal Action Localization
Published	2018-12-19
URL	http://arxiv.org/abs/1812.08249v2
PDF	http://arxiv.org/pdf/1812.08249v2.pdf
PWC	https://paperswithcode.com/paper/d3d-distilled-3d-networks-for-video-action
Repo	https://github.com/princeton-vl/d3dhelper
Framework	tf
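
The distillation step described in the abstract (tuning the spatial stream to predict the outputs of the temporal stream) can be illustrated with a combined objective. This is a minimal sketch, not the authors' implementation: the mean-squared-error matching term, the `weight` parameter, and all function names are assumptions chosen for illustration, and the abstract does not specify the exact loss.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy for one example, computed from raw class logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def mean_squared_error(pred, target):
    """Average squared difference between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def distillation_loss(student_logits, label, student_features,
                      teacher_features, weight=1.0):
    """Hypothetical combined objective (an assumption, not the paper's loss):
    the action classification loss of the spatial stream plus a term that
    pulls its features toward the outputs of a fixed temporal stream."""
    classification = softmax_cross_entropy(student_logits, label)
    matching = mean_squared_error(student_features, teacher_features)
    return classification + weight * matching
```

When the spatial stream already reproduces the temporal stream's outputs, the matching term vanishes and only the classification loss remains.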

DUGMA: Dynamic Uncertainty-Based Gaussian Mixture Alignment


Title	DUGMA: Dynamic Uncertainty-Based Gaussian Mixture Alignment
Authors	Can Pu, Nanbo Li, Radim Tylecek, Robert B Fisher
Abstract	Accurately registering point clouds from a cheap low-resolution sensor is a challenging task. Existing rigid registration methods fail to use the physical 3D uncertainty distribution of each point from a real sensor in the dynamic alignment process, mainly because the uncertainty model for a point is static and invariant, and it is hard to describe the change of these physical uncertainty models in the registration process. Additionally, the existing Gaussian mixture alignment architecture cannot efficiently implement these dynamic changes. This paper proposes a simple architecture combining error estimation from sample covariances and dual dynamic global probability alignment using the convolution of uncertainty-based Gaussian Mixture Models (GMM) from point clouds. Firstly, we propose an efficient way to describe the change of each 3D uncertainty model, which represents the structure of the point cloud much better. Unlike the invariant GMM (representing a fixed point cloud) in traditional Gaussian mixture alignment, we use two uncertainty-based GMMs that change and interact with each other in each iteration. In order to have a wider basin of convergence than other local algorithms, we design a more robust energy function by efficiently convolving the two GMMs over the whole 3D space. Tens of thousands of trials have been conducted on hundreds of models from multiple datasets to demonstrate the proposed method’s superior performance compared with the current state-of-the-art methods. The new dataset and code are available from https://github.com/Canpu999
Tasks
Published	2018-03-18
URL	http://arxiv.org/abs/1803.07426v2
PDF	http://arxiv.org/pdf/1803.07426v2.pdf
PWC	https://paperswithcode.com/paper/dugma-dynamic-uncertainty-based-gaussian
Repo	https://github.com/Canpu999/DUGMA
Framework	none

Detecting Offensive Language in Tweets Using Deep Learning


Title	Detecting Offensive Language in Tweets Using Deep Learning
Authors	Georgios K. Pitsilis, Heri Ramampiaro, Helge Langseth
Abstract	This paper addresses the important problem of discerning hateful content in social media. We propose a detection scheme that is an ensemble of Recurrent Neural Network (RNN) classifiers, and it incorporates various features associated with user-related information, such as the users’ tendency towards racism or sexism. These data are fed as input to the above classifiers along with the word frequency vectors derived from the textual content. Our approach has been evaluated on a publicly available corpus of 16k tweets, and the results demonstrate its effectiveness in comparison to existing state-of-the-art solutions. More specifically, our scheme can successfully distinguish racist and sexist messages from normal text, and achieve higher classification quality than current state-of-the-art algorithms.
Tasks
Published	2018-01-13
URL	http://arxiv.org/abs/1801.04433v1
PDF	http://arxiv.org/pdf/1801.04433v1.pdf
PWC	https://paperswithcode.com/paper/detecting-offensive-language-in-tweets-using
Repo	https://github.com/gpitsilis/hate-speech
Framework	none

Adversarial Perturbations Against Real-Time Video Classification Systems


Title	Adversarial Perturbations Against Real-Time Video Classification Systems
Authors	Shasha Li, Ajaya Neupane, Sujoy Paul, Chengyu Song, Srikanth V. Krishnamurthy, Amit K. Roy Chowdhury, Ananthram Swami
Abstract	Recent research has demonstrated the brittleness of machine learning systems to adversarial perturbations. However, the studies have been mostly limited to perturbations on images and, more generally, classification that does not deal with temporally varying inputs. In this paper we ask “Are adversarial perturbations possible in real-time video classification systems and if so, what properties must they satisfy?” Such systems are used in surveillance, smart vehicles, and smart elderly care, and thus misclassification could be particularly harmful (e.g., a mishap at an elderly care facility may be missed). We show that accounting for temporal structure is key to generating adversarial examples in such systems. We exploit recent advances in generative adversarial network (GAN) architectures to account for temporal correlations and generate adversarial samples that can cause misclassification rates of over 80% for targeted activities. More importantly, the samples also leave other activities largely unaffected, making them extremely stealthy. Finally, we also find, surprisingly, that in many scenarios the same perturbation can be applied to every frame in a video clip, which makes it relatively easy for the adversary to achieve misclassification.
Tasks	Video Classification
Published	2018-07-02
URL	http://arxiv.org/abs/1807.00458v1
PDF	http://arxiv.org/pdf/1807.00458v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-perturbations-against-real-time
Repo	https://github.com/sli057/Video-Perturbation
Framework	tf

Modeling Mistrust in End-of-Life Care


Title	Modeling Mistrust in End-of-Life Care
Authors	Willie Boag, Harini Suresh, Leo Anthony Celi, Peter Szolovits, Marzyeh Ghassemi
Abstract	In this work, we characterize the doctor-patient relationship using a machine learning-derived trust score. We show that this score has statistically significant racial associations, and that by modeling trust directly we find stronger disparities in care than by stratifying on race. We further demonstrate that mistrust is indicative of worse outcomes, but is only weakly associated with physiologically-created severity scores. Finally, we describe sentiment analysis experiments indicating patients with higher levels of mistrust have worse experiences and interactions with their caregivers. This work is a step towards measuring fairer machine learning in the healthcare domain.
Tasks	Sentiment Analysis
Published	2018-06-30
URL	https://arxiv.org/abs/1807.00124v2
PDF	https://arxiv.org/pdf/1807.00124v2.pdf
PWC	https://paperswithcode.com/paper/modeling-mistrust-in-end-of-life-care
Repo	https://github.com/wboag/eol-mistrust
Framework	none

GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders


Title	GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders
Authors	Martin Simonovsky, Nikos Komodakis
Abstract	Deep learning on graphs has become a popular research topic with many applications. However, past work has concentrated on learning graph embedding tasks, which is in contrast with advances in generative models for images and text. Is it possible to transfer this progress to the domain of graphs? We propose to sidestep hurdles associated with linearization of such discrete structures by having a decoder output a probabilistic fully-connected graph of a predefined maximum size directly at once. Our method is formulated as a variational autoencoder. We evaluate on the challenging task of molecule generation.
Tasks	Graph Embedding
Published	2018-02-09
URL	http://arxiv.org/abs/1802.03480v1
PDF	http://arxiv.org/pdf/1802.03480v1.pdf
PWC	https://paperswithcode.com/paper/graphvae-towards-generation-of-small-graphs
Repo	https://github.com/snap-stanford/GraphRNN
Framework	pytorch

SOSELETO: A Unified Approach to Transfer Learning and Training with Noisy Labels


Title	SOSELETO: A Unified Approach to Transfer Learning and Training with Noisy Labels
Authors	Or Litany, Daniel Freedman
Abstract	We present SOSELETO (SOurce SELEction for Target Optimization), a new method for exploiting a source dataset to solve a classification problem on a target dataset. SOSELETO is based on the following simple intuition: some source examples are more informative than others for the target problem. To capture this intuition, source samples are each given weights; these weights are solved for jointly with the source and target classification problems via a bilevel optimization scheme. The target therefore gets to choose the source samples which are most informative for its own classification task. Furthermore, the bilevel nature of the optimization acts as a kind of regularization on the target, mitigating overfitting. SOSELETO may be applied both to classic transfer learning and to the problem of training on datasets with noisy labels; we show state-of-the-art results on both of these problems.
Tasks	bilevel optimization, Transfer Learning
Published	2018-05-24
URL	https://arxiv.org/abs/1805.09622v2
PDF	https://arxiv.org/pdf/1805.09622v2.pdf
PWC	https://paperswithcode.com/paper/soseleto-a-unified-approach-to-transfer
Repo	https://github.com/orlitany/SOSELETO
Framework	pytorch

Universal Successor Features Approximators


Title	Universal Successor Features Approximators
Authors	Diana Borsa, André Barreto, John Quan, Daniel Mankowitz, Rémi Munos, Hado van Hasselt, David Silver, Tom Schaul
Abstract	The ability of a reinforcement learning (RL) agent to learn about many reward functions at the same time has many potential benefits, such as the decomposition of complex tasks into simpler ones, the exchange of information between tasks, and the reuse of skills. We focus on one aspect in particular, namely the ability to generalise to unseen tasks. Parametric generalisation relies on the interpolation power of a function approximator that is given the task description as input; one of its most common forms is universal value function approximators (UVFAs). Another way to generalise to new tasks is to exploit structure in the RL problem itself. Generalised policy improvement (GPI) combines solutions of previous tasks into a policy for the unseen task; this relies on instantaneous policy evaluation of old policies under the new reward function, which is made possible through successor features (SFs). Our proposed universal successor features approximators (USFAs) combine the advantages of all of these, namely the scalability of UVFAs, the instant inference of SFs, and the strong generalisation of GPI. We discuss the challenges involved in training a USFA and its generalisation properties, and demonstrate its practical benefits and transfer abilities on a large-scale domain in which the agent has to navigate in a first-person perspective three-dimensional environment.
Tasks
Published	2018-12-18
URL	http://arxiv.org/abs/1812.07626v1
PDF	http://arxiv.org/pdf/1812.07626v1.pdf
PWC	https://paperswithcode.com/paper/universal-successor-features-approximators
Repo	https://github.com/Wanqianxn/usfa
Framework	pytorch
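
The GPI step mentioned in the abstract (evaluating old policies under a new reward function through successor features, then acting greedily over all of them) can be sketched as follows. This is an illustrative sketch only: it assumes the usual successor-feature setting in which rewards are linear in some features with task vector `w`, so that each old policy's value is the dot product of its successor features with `w`. The function names and data layout are hypothetical, and the sketch does not cover how a USFA produces the successor features.

```python
def q_value(psi, w):
    """Value of one (policy, action) pair: dot product of its successor
    features psi with the task vector w (assumes linear rewards)."""
    return sum(p * wi for p, wi in zip(psi, w))


def gpi_action(successor_features, w):
    """Generalised policy improvement for a single state.

    successor_features[i][a] holds the successor features of old policy i
    for action a (hypothetical layout). The chosen action maximises, over
    actions, the best value that any old policy achieves under task w.
    """
    num_actions = len(successor_features[0])
    best_per_action = [
        max(q_value(policy_sf[a], w) for policy_sf in successor_features)
        for a in range(num_actions)
    ]
    return max(range(num_actions), key=best_per_action.__getitem__)


# Two old policies, two actions, two-dimensional features (made-up numbers).
sf = [
    [[1.0, 0.0], [0.0, 1.0]],  # policy 0
    [[0.5, 0.5], [0.0, 2.0]],  # policy 1
]
action_for_task_a = gpi_action(sf, [1.0, 0.0])
action_for_task_b = gpi_action(sf, [0.0, 1.0])
```

Changing only `w` re-evaluates every stored policy without further learning, which is the "instantaneous policy evaluation" the abstract refers to.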

TicTac: Accelerating Distributed Deep Learning with Communication Scheduling


Title	TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
Authors	Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, Roy H. Campbell
Abstract	State-of-the-art deep learning systems rely on iterative distributed training to tackle the increasing complexity of models and input data. The iteration time in these communication-heavy systems depends on the computation time, communication time and the extent of overlap of computation and communication. In this work, we identify a shortcoming in systems with graph representation for computation, such as TensorFlow and PyTorch, that results in high variance in iteration time: the random order in which parameters are received across workers. We develop a system, TicTac, to improve the iteration time by fixing this issue in distributed deep learning with Parameter Servers while guaranteeing near-optimal overlap of communication and computation. TicTac identifies and enforces an order of network transfers which improves the iteration time using prioritization. Our system is implemented over TensorFlow and requires no changes to the model or developer inputs. TicTac improves the throughput by up to 37.7% in inference and 19.2% in training, while also reducing straggler effect by up to 2.3×. Our code is publicly available.
Tasks
Published	2018-03-08
URL	http://arxiv.org/abs/1803.03288v2
PDF	http://arxiv.org/pdf/1803.03288v2.pdf
PWC	https://paperswithcode.com/paper/tictac-accelerating-distributed-deep-learning
Repo	https://github.com/xldrx/tictac
Framework	tf

Generating Handwritten Chinese Characters using CycleGAN


Title	Generating Handwritten Chinese Characters using CycleGAN
Authors	Bo Chang, Qiong Zhang, Shenyi Pan, Lili Meng
Abstract	Handwriting of Chinese has long been an important skill in East Asia. However, automatic generation of handwritten Chinese characters poses a great challenge due to the large number of characters. Various machine learning techniques have been used to recognize Chinese characters, but few works have studied the handwritten Chinese character generation problem, especially with unpaired training data. In this work, we formulate Chinese handwritten character generation as a problem of learning a mapping from an existing printed font to a personalized handwritten style. We further propose DenseNet CycleGAN to generate Chinese handwritten characters. Our method is applied not only to commonly used Chinese characters but also to calligraphy work with aesthetic values. Furthermore, we propose content accuracy and style discrepancy as the evaluation metrics to assess the quality of the handwritten characters generated. We then use our proposed metrics to evaluate the generated characters from the CASIA dataset as well as our newly introduced Lanting calligraphy dataset.
Tasks
Published	2018-01-25
URL	http://arxiv.org/abs/1801.08624v1
PDF	http://arxiv.org/pdf/1801.08624v1.pdf
PWC	https://paperswithcode.com/paper/generating-handwritten-chinese-characters
Repo	https://github.com/SamuelNguyen1998/Vietnamese_Handwriting_Recognition
Framework	tf

Triplet-based Deep Similarity Learning for Person Re-Identification


Title	Triplet-based Deep Similarity Learning for Person Re-Identification
Authors	Wentong Liao, Michael Ying Yang, Ni Zhan, Bodo Rosenhahn
Abstract	In recent years, person re-identification (re-id) has attracted great attention in both the computer vision community and industry. In this paper, we propose a new framework for person re-identification with triplet-based deep similarity learning using convolutional neural networks (CNNs). The network is trained with triplet input: two of the inputs have the same class label and the third has a different one. It aims to learn a deep feature representation with which the distance within the same class is decreased, while the distance between different classes is increased as much as possible. Moreover, we trained the model jointly on six different datasets, which differs from common practice, where one model is trained on one dataset and tested on the same one. However, the enormous number of possible triplets among the large number of training samples makes the training impossible. To address this challenge, a double-sampling scheme is proposed to generate triplets of images that are as effective as possible. The proposed framework is evaluated on several benchmark datasets. The experimental results show that our method is effective for the task of person re-identification and that it is comparable to or even outperforms the state-of-the-art methods.
Tasks	Person Re-Identification
Published	2018-02-09
URL	http://arxiv.org/abs/1802.03254v1
PDF	http://arxiv.org/pdf/1802.03254v1.pdf
PWC	https://paperswithcode.com/paper/triplet-based-deep-similarity-learning-for
Repo	https://github.com/ssahn3087/pedestrian_detection
Framework	pytorch
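
The training objective described in the abstract (pull same-class features together, push different-class features apart) is commonly written as a hinge-style triplet loss. The sketch below shows that standard form; the abstract does not state the paper's exact loss, so the margin value, the Euclidean distance, and the function names are assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge triplet loss (assumed form, not taken from the paper).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`; otherwise it grows with the violation.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Anchor and positive share an identity; the negative is a different person.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # already well separated
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])  # negative is closer
```

Triplets such as `easy` contribute no gradient, which is one reason a sampling scheme for choosing informative triplets matters.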

Efficient Dialog Policy Learning via Positive Memory Retention


Title	Efficient Dialog Policy Learning via Positive Memory Retention
Authors	Rui Zhao, Volker Tresp
Abstract	This paper is concerned with the training of recurrent neural networks as goal-oriented dialog agents using reinforcement learning. Training such agents with policy gradients typically requires a large number of samples. However, the collection of the required data in the form of conversations between chat-bots and human agents is time-consuming and expensive. To mitigate this problem, we describe an efficient policy gradient method using positive memory retention, which significantly increases the sample-efficiency. We show that our method is 10 times more sample-efficient than policy gradients in extensive experiments on a new synthetic number guessing game. Moreover, in a real-world visual object discovery game, the proposed method is twice as sample-efficient as policy gradients and shows state-of-the-art performance.
Tasks	Goal-Oriented Dialog
Published	2018-10-02
URL	http://arxiv.org/abs/1810.01371v2
PDF	http://arxiv.org/pdf/1810.01371v2.pdf
PWC	https://paperswithcode.com/paper/efficient-dialog-policy-learning-via-positive
Repo	https://github.com/ruizhaogit/PositiveMemoryRetention
Framework	pytorch

Projective Splitting with Forward Steps: Asynchronous and Block-Iterative Operator Splitting


Title	Projective Splitting with Forward Steps: Asynchronous and Block-Iterative Operator Splitting
Authors	Patrick R. Johnstone, Jonathan Eckstein
Abstract	This work is concerned with the classical problem of finding a zero of a sum of maximal monotone operators. For the projective splitting framework recently proposed by Combettes and Eckstein, we show how to replace the fundamental subproblem calculation using a backward step with one based on two forward steps. The resulting algorithms have the same kind of coordination procedure and can be implemented in the same block-iterative and potentially distributed and asynchronous manner, but may perform backward steps on some operators and forward steps on others. Prior algorithms in the projective splitting family have used only backward steps. Forward steps can be used for any Lipschitz-continuous operators provided the stepsize is bounded by the inverse of the Lipschitz constant. If the Lipschitz constant is unknown, a simple backtracking linesearch procedure may be used. For affine operators, the stepsize can be chosen adaptively without knowledge of the Lipschitz constant and without any additional forward steps. We close the paper by empirically studying the performance of several kinds of splitting algorithms on the lasso problem.
Tasks
Published	2018-03-19
URL	http://arxiv.org/abs/1803.07043v6
PDF	http://arxiv.org/pdf/1803.07043v6.pdf
PWC	https://paperswithcode.com/paper/projective-splitting-with-forward-steps-1
Repo	https://github.com/1austrartsua1/proj_split_pub
Framework	none

MD-GAN: Multi-Discriminator Generative Adversarial Networks for Distributed Datasets


Title	MD-GAN: Multi-Discriminator Generative Adversarial Networks for Distributed Datasets
Authors	Corentin Hardy, Erwan Le Merrer, Bruno Sericola
Abstract	A recent technical breakthrough in the domain of machine learning is the discovery and the multiple applications of Generative Adversarial Networks (GANs). Those generative models are computationally demanding, as a GAN is composed of two deep neural networks, and because it trains on large datasets. A GAN is generally trained on a single server. In this paper, we address the problem of distributing GANs so that they are able to train over datasets that are spread on multiple workers. MD-GAN is presented as the first solution for this problem: we propose a novel learning procedure for GANs so that they fit this distributed setup. We then compare the performance of MD-GAN to a version of federated learning adapted to GANs, using the MNIST and CIFAR10 datasets. MD-GAN exhibits a reduction by a factor of two of the learning complexity on each worker node, while providing better performance than federated learning on both datasets. We finally discuss the practical implications of distributing GANs.
Tasks
Published	2018-11-09
URL	http://arxiv.org/abs/1811.03850v2
PDF	http://arxiv.org/pdf/1811.03850v2.pdf
PWC	https://paperswithcode.com/paper/md-gan-multi-discriminator-generative
Repo	https://github.com/bbondd/DistributedGAN
Framework	tf

Self-similarity Grouping: A Simple Unsupervised Cross Domain Adaptation Approach for Person Re-identification


Title	Self-similarity Grouping: A Simple Unsupervised Cross Domain Adaptation Approach for Person Re-identification
Authors	Yang Fu, Yunchao Wei, Guanshuo Wang, Yuqian Zhou, Honghui Shi, Thomas Huang
Abstract	Domain adaptation in person re-identification (re-ID) has always been a challenging task. In this work, we explore how to harness the natural similar characteristics existing in the samples from the target domain for learning to conduct person re-ID in an unsupervised manner. Concretely, we propose a Self-similarity Grouping (SSG) approach, which exploits the potential similarity (from global body to local parts) of unlabeled samples to automatically build multiple clusters from different views. These independent clusters are then assigned labels, which serve as the pseudo identities to supervise the training process. We repeatedly and alternately conduct such a grouping and training process until the model is stable. Despite its apparent simplicity, our SSG outperforms the state of the art by more than 4.6% (DukeMTMC to Market1501) and 4.4% (Market1501 to DukeMTMC) in mAP, respectively. Building on SSG, we further introduce a clustering-guided semi-supervised approach named SSG++ to conduct one-shot domain adaptation in an open-set setting (i.e., the number of independent identities from the target domain is unknown). Without spending much effort on labeling, our SSG++ can further improve the mAP over SSG by 10.7% and 6.9%, respectively. Our code is available at: https://github.com/OasisYang/SSG .
Tasks	Domain Adaptation, One-Shot Learning, Person Re-Identification, Unsupervised Domain Adaptation
Published	2018-11-26
URL	https://arxiv.org/abs/1811.10144v3
PDF	https://arxiv.org/pdf/1811.10144v3.pdf
PWC	https://paperswithcode.com/paper/one-shot-domain-adaptation-for-person-re
Repo	https://github.com/OasisYang/SSG
Framework	pytorch