April 3, 2020


Paper Group AWR 23

Multi-Agent Interactions Modeling with Correlated Policies. Efficient Facial Feature Learning with Wide Ensemble-based Convolutional Neural Networks. MarginDistillation: distillation for margin-based softmax. GeoDA: a geometric framework for black-box adversarial attacks. Data Augmentation using Pre-trained Transformer Models. Interpretable End-to- …

Multi-Agent Interactions Modeling with Correlated Policies


Title	Multi-Agent Interactions Modeling with Correlated Policies
Authors	Minghuan Liu, Ming Zhou, Weinan Zhang, Yuzheng Zhuang, Jun Wang, Wulong Liu, Yong Yu
Abstract	In multi-agent systems, complex interacting behaviors arise due to the high correlations among agents. However, previous work on modeling multi-agent interactions from demonstrations is primarily constrained by assuming the independence among policies and their reward structures. In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework with explicit modeling of correlated policies by approximating opponents’ policies, which can recover agents’ policies that can regenerate similar interactions. Consequently, we develop a Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL), which allows for decentralized training and execution. Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators and outperforms state-of-the-art multi-agent imitation learning methods. Our code is available at https://github.com/apexrl/CoDAIL.
Tasks	Imitation Learning
Published	2020-01-04
URL	https://arxiv.org/abs/2001.03415v2
PDF	https://arxiv.org/pdf/2001.03415v2.pdf
PWC	https://paperswithcode.com/paper/multi-agent-interactions-modeling-with-1
Repo	https://github.com/apexrl/CoDAIL
Framework	none

Efficient Facial Feature Learning with Wide Ensemble-based Convolutional Neural Networks


Title	Efficient Facial Feature Learning with Wide Ensemble-based Convolutional Neural Networks
Authors	Henrique Siqueira, Sven Magg, Stefan Wermter
Abstract	Ensemble methods, traditionally built with independently trained de-correlated models, have proven to be efficient methods for reducing the remaining residual generalization error, which results in robust and accurate methods for real-world applications. In the context of deep learning, however, training an ensemble of deep networks is costly and generates high redundancy which is inefficient. In this paper, we present experiments on Ensembles with Shared Representations (ESRs) based on convolutional networks to demonstrate, quantitatively and qualitatively, their data processing efficiency and scalability to large-scale datasets of facial expressions. We show that redundancy and computational load can be dramatically reduced by varying the branching level of the ESR without loss of diversity and generalization power, which are both important for ensemble performance. Experiments on large-scale datasets suggest that ESRs reduce the remaining residual generalization error on the AffectNet and FER+ datasets, reach human-level performance, and outperform state-of-the-art methods on facial expression recognition in the wild using emotion and affect concepts.
Tasks	Facial Expression Recognition
Published	2020-01-17
URL	https://arxiv.org/abs/2001.06338v1
PDF	https://arxiv.org/pdf/2001.06338v1.pdf
PWC	https://paperswithcode.com/paper/efficient-facial-feature-learning-with-wide
Repo	https://github.com/siqueira-hc/Efficient-Facial-Feature-Learning-with-Wide-Ensemble-based-Convolutional-Neural-Networks
Framework	pytorch
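
The effect of the branching level on redundancy can be illustrated with a small parameter count. This is only a sketch: the function name `ensemble_param_count`, the per-layer sizes, and the ensemble size are hypothetical and are not taken from the paper or its repository.

```python
def ensemble_param_count(layer_params, branching_level, num_branches):
    """Count parameters when the first `branching_level` layers are shared.

    layer_params: parameters per layer of a single network (hypothetical values).
    branching_level: number of leading layers shared by every ensemble member.
    num_branches: number of ensemble members.
    """
    if not 0 <= branching_level <= len(layer_params):
        raise ValueError("branching_level must lie between 0 and the layer count")
    shared = sum(layer_params[:branching_level])       # counted once
    per_branch = sum(layer_params[branching_level:])   # counted per member
    return shared + num_branches * per_branch


layers = [1_000, 5_000, 20_000, 50_000]            # illustrative sizes only
independent = ensemble_param_count(layers, 0, 4)   # no sharing at all
shared_two = ensemble_param_count(layers, 2, 4)    # first two layers shared
print(independent, shared_two)
```

With these made-up sizes, sharing the first two layers lowers the count from 304000 to 286000; moving the branching point deeper shrinks it further, at the possible cost of diversity among branches.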

MarginDistillation: distillation for margin-based softmax


Title	MarginDistillation: distillation for margin-based softmax
Authors	David Svitov, Sergey Alyamkin
Abstract	The usage of convolutional neural networks (CNNs) in conjunction with a margin-based softmax approach demonstrates a state-of-the-art performance for the face recognition problem. Recently, lightweight neural network models trained with the margin-based softmax have been introduced for the face identification task for edge devices. In this paper, we propose a novel distillation method for lightweight neural network architectures that outperforms other known methods for the face recognition task on LFW, AgeDB-30 and Megaface datasets. The idea of the proposed method is to use class centers from the teacher network for the student network. Then the student network is trained to get the same angles between the class centers and the face embeddings, predicted by the teacher network.
Tasks	Face Identification, Face Recognition
Published	2020-03-05
URL	https://arxiv.org/abs/2003.02586v1
PDF	https://arxiv.org/pdf/2003.02586v1.pdf
PWC	https://paperswithcode.com/paper/margindistillation-distillation-for-margin
Repo	https://github.com/david-svitov/margindistillation
Framework	none
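
The idea of matching angles between face embeddings and shared class centers can be sketched as follows. This is an assumption-laden illustration: the squared-difference loss and the names `angle` and `angle_matching_loss` are hypothetical, and the paper's actual objective may differ.

```python
import math


def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)


def angle_matching_loss(student_embs, teacher_embs, labels, class_centers):
    """Mean squared gap between student and teacher angles to the class center.

    The class centers are assumed to come from the teacher network and to be
    reused unchanged for the student (hypothetical loss, for illustration).
    """
    total = 0.0
    for s, t, y in zip(student_embs, teacher_embs, labels):
        center = class_centers[y]
        total += (angle(s, center) - angle(t, center)) ** 2
    return total / len(labels)


centers = [[1.0, 0.0], [0.0, 1.0]]   # toy teacher class centers
teacher = [[1.0, 0.1], [0.2, 1.0]]   # toy teacher embeddings
student = [[1.0, 0.5], [0.2, 1.0]]   # toy student embeddings
print(angle_matching_loss(student, teacher, [0, 1], centers))
```

The loss is zero exactly when every student embedding forms the same angle with its class center as the corresponding teacher embedding.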

GeoDA: a geometric framework for black-box adversarial attacks


Title	GeoDA: a geometric framework for black-box adversarial attacks
Authors	Ali Rahmati, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard, Huaiyu Dai
Abstract	Adversarial examples are known as carefully perturbed images fooling image classifiers. We propose a geometric framework to generate adversarial examples in one of the most challenging black-box settings where the adversary can only generate a small number of queries, each of them returning the top-$1$ label of the classifier. Our framework is based on the observation that the decision boundary of deep networks usually has a small mean curvature in the vicinity of data samples. We propose an effective iterative algorithm to generate query-efficient black-box perturbations with small $\ell_p$ norms for $p \ge 1$, which is confirmed via experimental evaluations on state-of-the-art natural image classifiers. Moreover, for $p=2$, we theoretically show that our algorithm actually converges to the minimal $\ell_2$-perturbation when the curvature of the decision boundary is bounded. We also obtain the optimal distribution of the queries over the iterations of the algorithm. Finally, experimental results confirm that our principled black-box attack algorithm performs better than state-of-the-art algorithms as it generates smaller perturbations with a reduced number of queries.
Tasks
Published	2020-03-13
URL	https://arxiv.org/abs/2003.06468v1
PDF	https://arxiv.org/pdf/2003.06468v1.pdf
PWC	https://paperswithcode.com/paper/geoda-a-geometric-framework-for-black-box
Repo	https://github.com/thisisalirah/GeoDA
Framework	pytorch
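
When the decision boundary is nearly flat around a data point, its normal direction can be estimated from label-only queries. The sketch below shows one way to do this on a toy linear classifier; the estimator (summing Gaussian probes signed by the returned label), the function names, and all constants are assumptions for illustration and are not taken from the GeoDA repository.

```python
import random


def top1_label(x, w):
    """Toy black-box classifier: reveals only a label, the side of w.x = 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def estimate_normal(boundary_point, query, num_queries, sigma, rng):
    """Estimate the boundary normal from label-only queries near the boundary."""
    dim = len(boundary_point)
    estimate = [0.0] * dim
    for _ in range(num_queries):
        eta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        probe = [b + sigma * e for b, e in zip(boundary_point, eta)]
        # Probes landing on the label-1 side vote for +eta, the others for -eta.
        sign = 1.0 if query(probe) == 1 else -1.0
        estimate = [acc + sign * e for acc, e in zip(estimate, eta)]
    norm = sum(v * v for v in estimate) ** 0.5
    return [v / norm for v in estimate]


rng = random.Random(0)
w_true = [3.0, -1.0, 2.0, 0.5, -2.0]   # hidden from the attacker in practice
normal = estimate_normal([0.0] * 5, lambda x: top1_label(x, w_true), 2000, 0.1, rng)
print(normal)
```

For a truly linear boundary the estimate aligns with `w_true` up to scale as the number of queries grows, which is why a small mean curvature makes this kind of estimate useful.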

Data Augmentation using Pre-trained Transformer Models


Title	Data Augmentation using Pre-trained Transformer Models
Authors	Varun Kumar, Ashutosh Choudhary, Eunah Cho
Abstract	Language model based pre-trained models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of pre-trained transformer based models such as auto-regressive models (GPT-2), auto-encoder models (BERT), and seq2seq models (BART) for conditional data augmentation. We show that prepending the class labels to text sequences provides a simple yet effective way to condition the pre-trained models for data augmentation. On three classification benchmarks, the pre-trained Seq2Seq model outperforms the other models. Further, we explore how different pre-trained model based data augmentation differs in terms of data diversity, and how well such methods preserve the class-label information.
Tasks	Data Augmentation, Language Modelling
Published	2020-03-04
URL	https://arxiv.org/abs/2003.02245v1
PDF	https://arxiv.org/pdf/2003.02245v1.pdf
PWC	https://paperswithcode.com/paper/data-augmentation-using-pre-trained
Repo	https://github.com/varinf/TransformersDataAugmentation
Framework	none
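
Prepending the class label to each text sequence can be sketched in a few lines. The separator string `SEP` and the function names are hypothetical choices for illustration; the paper's repository may format its conditioning strings differently.

```python
def prepend_label(label, text, sep="SEP"):
    """Build a conditioning string of the form '<label> <sep> <text>'."""
    return f"{label} {sep} {text}"


def build_conditioned_corpus(examples, sep="SEP"):
    """Turn (label, text) pairs into label-conditioned training strings."""
    return [prepend_label(label, text, sep) for label, text in examples]


corpus = build_conditioned_corpus([
    ("positive", "great movie"),
    ("negative", "dull plot"),
])
print(corpus)
```

A model fine-tuned on such strings could then, under this scheme, be prompted with a label and the separator to generate new text for that class.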

Interpretable End-to-end Urban Autonomous Driving with Latent Deep Reinforcement Learning


Title	Interpretable End-to-end Urban Autonomous Driving with Latent Deep Reinforcement Learning
Authors	Jianyu Chen, Shengbo Eben Li, Masayoshi Tomizuka
Abstract	Unlike the popular modularized framework, end-to-end autonomous driving seeks to solve the perception, decision and control problems in an integrated way, which can be more adaptable to new scenarios and easier to generalize at scale. However, existing end-to-end approaches often lack interpretability, and can only deal with simple driving tasks like lane keeping. In this paper, we propose an interpretable deep reinforcement learning method for end-to-end autonomous driving, which is able to handle complex urban scenarios. A sequential latent environment model is introduced and learned jointly with the reinforcement learning process. With this latent model, a semantic bird's-eye mask can be generated, which is enforced to connect with a certain intermediate property in today’s modularized framework for the purpose of explaining the behaviors of the learned policy. The latent space also significantly reduces the sample complexity of reinforcement learning. Comparison tests with a simulated autonomous car in CARLA show that the performance of our method in urban scenarios with crowded surrounding vehicles dominates many baselines including DQN, DDPG, TD3 and SAC. Moreover, through masked outputs, the learned policy is able to provide a better explanation of how the car reasons about the driving environment. The code and videos of this work are available at our GitHub repo and project website.
Tasks	Autonomous Driving
Published	2020-01-23
URL	https://arxiv.org/abs/2001.08726v2
PDF	https://arxiv.org/pdf/2001.08726v2.pdf
PWC	https://paperswithcode.com/paper/interpretable-end-to-end-urban-autonomous
Repo	https://github.com/cjy1992/interp-e2e-driving
Framework	tf

Implementation of the VBM3D Video Denoising Method and Some Variants


Title	Implementation of the VBM3D Video Denoising Method and Some Variants
Authors	Thibaud Ehret, Pablo Arias
Abstract	VBM3D is an extension to video of the well known image denoising algorithm BM3D, which takes advantage of the sparse representation of stacks of similar patches in a transform domain. The extension is rather straightforward: the similar 2D patches are taken from a spatio-temporal neighborhood which includes neighboring frames. In spite of its simplicity, the algorithm offers a good trade-off between denoising performance and computational complexity. In this work we revisit this method, providing an open-source C++ implementation reproducing the results. A detailed description is given and the choice of parameters is thoroughly discussed. Furthermore, we discuss several extensions of the original algorithm: (1) a multi-scale implementation, (2) the use of 3D patches, (3) the use of optical flow to guide the patch search. These extensions yield results that are competitive with even the most recent state of the art.
Tasks	Denoising, Image Denoising, Optical Flow Estimation, Video Denoising
Published	2020-01-06
URL	https://arxiv.org/abs/2001.01802v1
PDF	https://arxiv.org/pdf/2001.01802v1.pdf
PWC	https://paperswithcode.com/paper/implementation-of-the-vbm3d-video-denoising
Repo	https://github.com/tehret/vbm3d
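Framework	none

RP-DNN: A Tweet level propagation context based deep neural networks for early rumor detection in Social Media

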
Title	RP-DNN: A Tweet level propagation context based deep neural networks for early rumor detection in Social Media
Authors	Jie Gao, Sooji Han, Xingyi Song, Fabio Ciravegna
Abstract	Early rumor detection (ERD) on social media platforms is very challenging when limited, incomplete and noisy information is available. Most of the existing methods have largely worked on event-level detection that requires the collection of posts relevant to a specific event and relied only on user-generated content. They are not appropriate to detect rumor sources in the very early stages, before an event unfolds and becomes widespread. In this paper, we address the task of ERD at the message level. We present a novel hybrid neural network architecture, which combines a task-specific character-based bidirectional language model and stacked Long Short-Term Memory (LSTM) networks to represent textual contents and social-temporal contexts of input source tweets, for modelling propagation patterns of rumors in the early stages of their development. We apply multi-layered attention models to jointly learn attentive context embeddings over multiple context inputs. Our experiments employ a stringent leave-one-out cross-validation (LOO-CV) evaluation setup on seven publicly available real-life rumor event data sets. Our models achieve state-of-the-art (SoA) performance for detecting unseen rumors on large augmented data which covers more than 12 events and 2,967 rumors. An ablation study is conducted to understand the relative contribution of each component of our proposed model.
Tasks	Language Modelling
Published	2020-02-28
URL	https://arxiv.org/abs/2002.12683v2
PDF	https://arxiv.org/pdf/2002.12683v2.pdf
PWC	https://paperswithcode.com/paper/rp-dnn-a-tweet-level-propagation-context
Repo	https://github.com/jerrygaoLondon/RPDNN
Framework	none
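
A leave-one-out setup over rumor events can be sketched as below. Holding out one whole event per fold is an assumption about how the splits are formed, and the function name and toy data are hypothetical.

```python
def leave_one_event_out(samples):
    """Yield (held_out_event, train, test) splits, one per distinct event.

    samples: list of (event_id, item) pairs. Every item of the held-out
    event goes to the test split, so the model never sees that event.
    """
    events = sorted({event for event, _ in samples})
    for held_out in events:
        train = [item for event, item in samples if event != held_out]
        test = [item for event, item in samples if event == held_out]
        yield held_out, train, test


toy = [("event_a", "t1"), ("event_a", "t2"), ("event_b", "t3"), ("event_c", "t4")]
for held_out, train, test in leave_one_event_out(toy):
    print(held_out, train, test)
```

Splitting by event rather than by tweet keeps tweets of the same event from appearing on both sides of a fold, which matches the goal of detecting unseen rumors.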

Head and Tail Localization of C. elegans


Title	Head and Tail Localization of C. elegans
Authors	Mansi Ranjit Mane, Aniket Anand Deshmukh, Adam J. Iliff
Abstract	C. elegans is commonly used in neuroscience for behaviour analysis because of its compact nervous system with well-described connectivity. Localizing the animal and distinguishing between its head and tail are important tasks to track the worm during behavioural assays and to perform quantitative analyses. We demonstrate a neural network based approach to localize both the head and the tail of the worm in an image. To make empirical results in the paper reproducible and promote open source machine learning based solutions for C. elegans behavioural analysis, we also make our code publicly available.
Tasks
Published	2020-01-12
URL	https://arxiv.org/abs/2001.03981v1
PDF	https://arxiv.org/pdf/2001.03981v1.pdf
PWC	https://paperswithcode.com/paper/head-and-tail-localization-of-c-elegans
Repo	https://github.com/mansimane/WormML
Framework	tf

Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs


Title	Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs
Authors	Leonardo F. R. Ribeiro, Yue Zhang, Claire Gardent, Iryna Gurevych
Abstract	Recent graph-to-text models generate text from graph-based data using either global or local aggregation to learn node representations. Global node encoding allows explicit communication between two distant nodes, thereby neglecting graph topology as all nodes are connected. In contrast, local node encoding considers the relations between directly connected nodes capturing the graph structure, but it can fail to capture long-range relations. In this work, we gather the best of both encoding strategies, proposing novel models that encode an input graph combining both global and local node contexts. Our approaches are able to learn better contextualized node embeddings for text generation. In our experiments, we demonstrate that our models lead to significant improvements in KG-to-text generation, achieving BLEU scores of 17.81 on the AGENDA dataset, and 63.10 on the WebNLG dataset for seen categories, outperforming the state of the art by 3.51 and 2.51 points, respectively.
Tasks	Knowledge Graphs, Text Generation
Published	2020-01-29
URL	https://arxiv.org/abs/2001.11003v1
PDF	https://arxiv.org/pdf/2001.11003v1.pdf
PWC	https://paperswithcode.com/paper/modeling-global-and-local-node-contexts-for
Repo	https://github.com/UKPLab/kg2text
Framework	none
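
The contrast between global and local node encoding can be shown on a toy graph. Mean pooling and concatenation are simplifying assumptions chosen for brevity, and every name below is hypothetical; the paper's models are not reproduced here.

```python
def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def global_context(features):
    """Every node aggregates over all nodes, ignoring the edges."""
    pooled = mean(list(features.values()))
    return {node: pooled for node in features}


def local_context(features, neighbors):
    """Each node aggregates over itself and its direct neighbors only."""
    return {
        node: mean([features[node]] + [features[m] for m in neighbors[node]])
        for node in features
    }


def combined_context(features, neighbors):
    """Concatenate the global and local views of each node."""
    g = global_context(features)
    loc = local_context(features, neighbors)
    return {node: g[node] + loc[node] for node in features}


features = {"a": [0.0], "b": [3.0], "c": [6.0]}       # path graph a - b - c
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(combined_context(features, neighbors))
```

In the global view all three nodes receive the same vector, so the topology is lost; in the local view node a never sees node c. Combining the two keeps both kinds of information.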

University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization


Title	University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization
Authors	Zhedong Zheng, Yunchao Wei, Yi Yang
Abstract	We consider the problem of cross-view geo-localization. The primary challenge of this task is to learn robust features against large viewpoint changes. Existing benchmarks can help, but are limited in the number of viewpoints. Image pairs, containing two viewpoints, e.g., satellite and ground, are usually provided, which may compromise the feature learning. Besides phone cameras and satellites, in this paper, we argue that drones could serve as the third platform to deal with the geo-localization problem. In contrast to the traditional ground-view images, drone-view images meet fewer obstacles, e.g., trees, and could provide a comprehensive view when flying around the target place. To verify the effectiveness of the drone platform, we introduce a new multi-view multi-source benchmark for drone-based geo-localization, named University-1652. University-1652 contains data from three platforms, i.e., synthetic drones, satellites and ground cameras of 1,652 university buildings around the world. To our knowledge, University-1652 is the first drone-based geo-localization dataset and enables two new tasks, i.e., drone-view target localization and drone navigation. As the name implies, drone-view target localization intends to predict the location of the target place via drone-view images. On the other hand, given a satellite-view query image, drone navigation aims to drive the drone to the area of interest in the query. We use this dataset to analyze a variety of off-the-shelf CNN features and propose a strong CNN baseline on this challenging dataset. The experiments show that University-1652 helps the model to learn the viewpoint-invariant features and also has good generalization ability in the real-world scenario.
Tasks	Drone navigation, Drone-view target localization, Image-Based Localization
Published	2020-02-27
URL	https://arxiv.org/abs/2002.12186v1
PDF	https://arxiv.org/pdf/2002.12186v1.pdf
PWC	https://paperswithcode.com/paper/university-1652-a-multi-view-multi-source
Repo	https://github.com/layumi/University1652-Baseline
Framework	pytorch

NeuCrowd: Neural Sampling Network for Representation Learning with Crowdsourced Labels


Title	NeuCrowd: Neural Sampling Network for Representation Learning with Crowdsourced Labels
Authors	Yang Hao, Wenbiao Ding, Zitao Liu
Abstract	Representation learning approaches require a massive amount of discriminative training data, which is unavailable in many scenarios, such as healthcare, small cities, and education. In practice, people resort to crowdsourcing to get annotated labels. However, due to issues like data privacy, budget limitations, and a shortage of domain-specific annotators, the number of crowdsourced labels is still very limited. Moreover, because of annotators’ diverse expertise, crowdsourced labels are often inconsistent. Thus, directly applying existing representation learning algorithms may easily overfit and yield suboptimal solutions. In this paper, we propose NeuCrowd, a unified framework for representation learning from crowdsourced labels. The proposed framework (1) creates a sufficient number of high-quality n-tuplet training samples by utilizing safety-aware sampling and robust anchor generation; and (2) automatically learns a neural sampling network that adaptively learns to select effective samples for the representation learning network. The proposed framework is evaluated on both synthetic and real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible results, we make our code public in a GitHub repository: https://github.com/crowd-data-mining/NeuCrowd.
Tasks	Representation Learning
Published	2020-03-21
URL	https://arxiv.org/abs/2003.09660v1
PDF	https://arxiv.org/pdf/2003.09660v1.pdf
PWC	https://paperswithcode.com/paper/neucrowd-neural-sampling-network-for
Repo	https://github.com/crowd-data-mining/NeuCrowd
Framework	tf

Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0


Title	Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0
Authors	Zack Hodari, Catherine Lai, Simon King
Abstract	In English, prosody adds a broad range of information to segment sequences, from information structure (e.g. contrast) to stylistic variation (e.g. expression of emotion). However, when learning to control prosody in text-to-speech voices, it is not clear what exactly the control is modifying. Existing research on discrete representation learning for prosody has demonstrated high naturalness, but no analysis has been performed on what these representations capture, or if they can generate meaningfully-distinct variants of an utterance. We present a phrase-level variational autoencoder with a multi-modal prior, using the mode centres as “intonation codes”. Our evaluation establishes which intonation codes are perceptually distinct, finding that the intonation codes from our multi-modal latent model were significantly more distinct than a baseline using k-means clustering. We carry out a follow-up qualitative study to determine what information the codes are carrying. Most commonly, listeners commented on the intonation codes having a statement or question style. However, many other affect-related styles were also reported, including: emotional, uncertain, surprised, sarcastic, passive aggressive, and upset.
Tasks	Representation Learning, Speech Synthesis
Published	2020-03-14
URL	https://arxiv.org/abs/2003.06686v1
PDF	https://arxiv.org/pdf/2003.06686v1.pdf
PWC	https://paperswithcode.com/paper/perception-of-prosodic-variation-for-speech
Repo	https://github.com/ZackHodari/discrete_intonation
Framework	pytorch

WICA: nonlinear weighted ICA


Title	WICA: nonlinear weighted ICA
Authors	Andrzej Bedychaj, Przemysław Spurek, Aleksandra Nowak, Jacek Tabor
Abstract	Independent Component Analysis (ICA) aims to find a coordinate system in which the components of the data are independent. In this paper we construct a new nonlinear ICA model, called WICA, which obtains better and more stable results than other algorithms. A crucial tool is given by a new efficient method of verifying nonlinear dependence with the use of computation of correlation coefficients for normally weighted data.
Tasks
Published	2020-01-13
URL	https://arxiv.org/abs/2001.04147v1
PDF	https://arxiv.org/pdf/2001.04147v1.pdf
PWC	https://paperswithcode.com/paper/wica-nonlinear-weighted-ica
Repo	https://github.com/gmum/wica
Framework	none
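
Correlation coefficients for normally weighted data can be computed as a weighted Pearson correlation with Gaussian weights. The sketch below covers only that building block; the function names, the bandwidth parameter, and the toy data are assumptions, and how WICA combines such coefficients into a dependence measure is not reproduced.

```python
import math


def gaussian_weights(points, center, bandwidth):
    """Weight each point by an isotropic Gaussian centred at `center`."""
    return [
        math.exp(-sum((p - c) ** 2 for p, c in zip(point, center)) / (2 * bandwidth ** 2))
        for point in points
    ]


def weighted_correlation(xs, ys, weights):
    """Pearson correlation of xs and ys under non-negative sample weights."""
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    cov = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    var_y = sum(w * (y - mean_y) ** 2 for w, y in zip(weights, ys))
    return cov / math.sqrt(var_x * var_y)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]                       # exactly linear in xs
weights = gaussian_weights(list(zip(xs, ys)), (1.5, 4.0), 2.0)
print(weighted_correlation(xs, ys, weights))
```

Moving the Gaussian centre changes which region of the data dominates the coefficient, so dependence can be probed locally rather than only globally.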

Conjoined Dirichlet Process


Title	Conjoined Dirichlet Process
Authors	Michelle N. Ngo, Dustin S. Pluta, Alexander N. Ngo, Babak Shahbaba
Abstract	Biclustering is a class of techniques that simultaneously clusters the rows and columns of a matrix to sort heterogeneous data into homogeneous blocks. Although many algorithms have been proposed to find biclusters, existing methods suffer from the pre-specification of the number of biclusters or place constraints on the model structure. To address these issues, we develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns. The proposed method utilizes dual Dirichlet process mixture models to learn row and column clusters, with the number of resulting clusters determined by the data rather than pre-specified. Probabilistic biclusters are identified by modeling the mutual dependence between the row and column clusters. We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
Tasks
Published	2020-02-08
URL	https://arxiv.org/abs/2002.03223v1
PDF	https://arxiv.org/pdf/2002.03223v1.pdf
PWC	https://paperswithcode.com/paper/conjoined-dirichlet-process
Repo	https://github.com/micnngo/CDP
Framework	none
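
The property that the number of clusters is determined by the data rather than fixed in advance can be illustrated with the Chinese restaurant process, a standard sequential view of the Dirichlet process prior. This sketch is a generic illustration with hypothetical names and parameters; it is not the paper's dual Dirichlet process model.

```python
import random


def chinese_restaurant_process(num_items, alpha, rng):
    """Sample cluster assignments from a CRP with concentration `alpha`."""
    assignments = []
    cluster_sizes = []
    for i in range(num_items):
        # Existing cluster k is chosen with probability size_k / (i + alpha),
        # a new cluster with probability alpha / (i + alpha).
        threshold = rng.random() * (i + alpha)
        cumulative = 0.0
        chosen = len(cluster_sizes)   # default: open a new cluster
        for k, size in enumerate(cluster_sizes):
            cumulative += size
            if threshold < cumulative:
                chosen = k
                break
        if chosen == len(cluster_sizes):
            cluster_sizes.append(0)
        cluster_sizes[chosen] += 1
        assignments.append(chosen)
    return assignments, cluster_sizes


assignments, sizes = chinese_restaurant_process(50, 1.0, random.Random(1))
print(len(sizes), sizes)
```

No cluster count is passed in: new clusters open as items arrive, and a larger `alpha` tends to produce more of them. A biclustering model in this spirit would place one such prior on rows and another on columns.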