Paper Group AWR 23
Multi-Agent Interactions Modeling with Correlated Policies
Title | Multi-Agent Interactions Modeling with Correlated Policies |
Authors | Minghuan Liu, Ming Zhou, Weinan Zhang, Yuzheng Zhuang, Jun Wang, Wulong Liu, Yong Yu |
Abstract | In multi-agent systems, complex interacting behaviors arise due to the high correlations among agents. However, previous work on modeling multi-agent interactions from demonstrations is primarily constrained by assuming the independence among policies and their reward structures. In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework with explicit modeling of correlated policies by approximating opponents’ policies, which can recover agents’ policies that can regenerate similar interactions. Consequently, we develop a Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL), which allows for decentralized training and execution. Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators and outperforms state-of-the-art multi-agent imitation learning methods. Our code is available at https://github.com/apexrl/CoDAIL. |
Tasks | Imitation Learning |
Published | 2020-01-04 |
URL | https://arxiv.org/abs/2001.03415v2 |
https://arxiv.org/pdf/2001.03415v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-agent-interactions-modeling-with-1 |
Repo | https://github.com/apexrl/CoDAIL |
Framework | none |
Efficient Facial Feature Learning with Wide Ensemble-based Convolutional Neural Networks
Title | Efficient Facial Feature Learning with Wide Ensemble-based Convolutional Neural Networks |
Authors | Henrique Siqueira, Sven Magg, Stefan Wermter |
Abstract | Ensemble methods, traditionally built with independently trained de-correlated models, have proven to be efficient methods for reducing the residual generalization error, which results in robust and accurate methods for real-world applications. In the context of deep learning, however, training an ensemble of deep networks is costly and generates high redundancy, which is inefficient. In this paper, we present experiments on Ensembles with Shared Representations (ESRs) based on convolutional networks to demonstrate, quantitatively and qualitatively, their data processing efficiency and scalability to large-scale datasets of facial expressions. We show that redundancy and computational load can be dramatically reduced by varying the branching level of the ESR without loss of diversity and generalization power, which are both important for ensemble performance. Experiments on large-scale datasets suggest that ESRs reduce the residual generalization error on the AffectNet and FER+ datasets, reach human-level performance, and outperform state-of-the-art methods on facial expression recognition in the wild using emotion and affect concepts. |
Tasks | Facial Expression Recognition |
Published | 2020-01-17 |
URL | https://arxiv.org/abs/2001.06338v1 |
https://arxiv.org/pdf/2001.06338v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-facial-feature-learning-with-wide |
Repo | https://github.com/siqueira-hc/Efficient-Facial-Feature-Learning-with-Wide-Ensemble-based-Convolutional-Neural-Networks |
Framework | pytorch |
MarginDistillation: distillation for margin-based softmax
Title | MarginDistillation: distillation for margin-based softmax |
Authors | David Svitov, Sergey Alyamkin |
Abstract | The use of convolutional neural networks (CNNs) in conjunction with a margin-based softmax approach demonstrates state-of-the-art performance for the face recognition problem. Recently, lightweight neural network models trained with the margin-based softmax have been introduced for the face identification task for edge devices. In this paper, we propose a novel distillation method for lightweight neural network architectures that outperforms other known methods for the face recognition task on LFW, AgeDB-30 and Megaface datasets. The idea of the proposed method is to use class centers from the teacher network for the student network. The student network is then trained to reproduce the angles between the class centers and the face embeddings that the teacher network predicts. |
Tasks | Face Identification, Face Recognition |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02586v1 |
https://arxiv.org/pdf/2003.02586v1.pdf | |
PWC | https://paperswithcode.com/paper/margindistillation-distillation-for-margin |
Repo | https://github.com/david-svitov/margindistillation |
Framework | none |
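The angle-matching idea in the MarginDistillation abstract can be illustrated with a small sketch. The abstract does not give the loss, so the squared-difference form below and the names `angle` and `angle_matching_loss` are assumptions made for illustration only, not the authors' implementation. All vectors are assumed to be non-zero.

```python
import math


def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)


def angle_matching_loss(student_emb, teacher_emb, class_centers):
    """Hypothetical loss: mean squared difference between the student's and
    the teacher's angles to each (teacher-provided) class center."""
    diffs = [
        (angle(student_emb, c) - angle(teacher_emb, c)) ** 2
        for c in class_centers
    ]
    return sum(diffs) / len(diffs)
```

The loss is zero when the student embedding forms the same angles with the class centers as the teacher embedding does, and grows as those angles diverge.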
GeoDA: a geometric framework for black-box adversarial attacks
Title | GeoDA: a geometric framework for black-box adversarial attacks |
Authors | Ali Rahmati, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard, Huaiyu Dai |
Abstract | Adversarial examples are carefully perturbed images that fool image classifiers. We propose a geometric framework to generate adversarial examples in one of the most challenging black-box settings where the adversary can only generate a small number of queries, each of them returning the top-$1$ label of the classifier. Our framework is based on the observation that the decision boundary of deep networks usually has a small mean curvature in the vicinity of data samples. We propose an effective iterative algorithm to generate query-efficient black-box perturbations with small $\ell_p$ norms for $p \ge 1$, which is confirmed via experimental evaluations on state-of-the-art natural image classifiers. Moreover, for $p=2$, we theoretically show that our algorithm actually converges to the minimal $\ell_2$-perturbation when the curvature of the decision boundary is bounded. We also obtain the optimal distribution of the queries over the iterations of the algorithm. Finally, experimental results confirm that our principled black-box attack algorithm performs better than state-of-the-art algorithms as it generates smaller perturbations with a reduced number of queries. |
Tasks | |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.06468v1 |
https://arxiv.org/pdf/2003.06468v1.pdf | |
PWC | https://paperswithcode.com/paper/geoda-a-geometric-framework-for-black-box |
Repo | https://github.com/thisisalirah/GeoDA |
Framework | pytorch |
Data Augmentation using Pre-trained Transformer Models
Title | Data Augmentation using Pre-trained Transformer Models |
Authors | Varun Kumar, Ashutosh Choudhary, Eunah Cho |
Abstract | Language model based pre-trained models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of pre-trained transformer based models such as auto-regressive models (GPT-2), auto-encoder models (BERT), and seq2seq models (BART) for conditional data augmentation. We show that prepending the class labels to text sequences provides a simple yet effective way to condition the pre-trained models for data augmentation. On three classification benchmarks, the pre-trained Seq2Seq model outperforms the other models. Further, we explore how data augmentation based on different pre-trained models differs in terms of data diversity, and how well such methods preserve the class-label information. |
Tasks | Data Augmentation, Language Modelling |
Published | 2020-03-04 |
URL | https://arxiv.org/abs/2003.02245v1 |
https://arxiv.org/pdf/2003.02245v1.pdf | |
PWC | https://paperswithcode.com/paper/data-augmentation-using-pre-trained |
Repo | https://github.com/varinf/TransformersDataAugmentation |
Framework | none |
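The label-prepending step described in the abstract above can be sketched as follows. This is a minimal illustration only: the function names `prepend_label` and `build_conditioning_corpus` and the single-space separator are assumptions, and the abstract does not specify the exact format the authors feed to each model.

```python
def prepend_label(label, text, sep=" "):
    """Return the text with its class label prepended (hypothetical format)."""
    return f"{label}{sep}{text}"


def build_conditioning_corpus(examples, sep=" "):
    """Turn (label, text) pairs into label-conditioned training sequences."""
    return [prepend_label(label, text, sep) for label, text in examples]


if __name__ == "__main__":
    data = [("positive", "great movie"), ("negative", "dull plot")]
    for line in build_conditioning_corpus(data):
        print(line)
```

A pre-trained model fine-tuned on such sequences can then be prompted with a label alone to generate new text for that class.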
Interpretable End-to-end Urban Autonomous Driving with Latent Deep Reinforcement Learning
Title | Interpretable End-to-end Urban Autonomous Driving with Latent Deep Reinforcement Learning |
Authors | Jianyu Chen, Shengbo Eben Li, Masayoshi Tomizuka |
Abstract | Unlike the popular modularized framework, end-to-end autonomous driving seeks to solve the perception, decision and control problems in an integrated way, which can adapt better to new scenarios and generalize more easily at scale. However, existing end-to-end approaches often lack interpretability, and can only deal with simple driving tasks like lane keeping. In this paper, we propose an interpretable deep reinforcement learning method for end-to-end autonomous driving, which is able to handle complex urban scenarios. A sequential latent environment model is introduced and learned jointly with the reinforcement learning process. With this latent model, a semantic bird’s-eye mask can be generated, which is enforced to connect with a certain intermediate property in today’s modularized framework for the purpose of explaining the behaviors of the learned policy. The latent space also significantly reduces the sample complexity of reinforcement learning. Comparison tests with a simulated autonomous car in CARLA show that the performance of our method in urban scenarios with crowded surrounding vehicles dominates many baselines including DQN, DDPG, TD3 and SAC. Moreover, through masked outputs, the learned policy is able to provide a better explanation of how the car reasons about the driving environment. The code and videos of this work are available at our GitHub repo and project website. |
Tasks | Autonomous Driving |
Published | 2020-01-23 |
URL | https://arxiv.org/abs/2001.08726v2 |
https://arxiv.org/pdf/2001.08726v2.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-end-to-end-urban-autonomous |
Repo | https://github.com/cjy1992/interp-e2e-driving |
Framework | tf |
Implementation of the VBM3D Video Denoising Method and Some Variants
Title | Implementation of the VBM3D Video Denoising Method and Some Variants |
Authors | Thibaud Ehret, Pablo Arias |
Abstract | VBM3D is an extension to video of the well-known image denoising algorithm BM3D, which takes advantage of the sparse representation of stacks of similar patches in a transform domain. The extension is rather straightforward: the similar 2D patches are taken from a spatio-temporal neighborhood which includes neighboring frames. In spite of its simplicity, the algorithm offers a good trade-off between denoising performance and computational complexity. In this work we revisit this method, providing an open-source C++ implementation reproducing the results. A detailed description is given and the choice of parameters is thoroughly discussed. Furthermore, we discuss several extensions of the original algorithm: (1) a multi-scale implementation, (2) the use of 3D patches, (3) the use of optical flow to guide the patch search. These extensions make it possible to obtain results which are competitive with even the most recent state of the art. |
Tasks | Denoising, Image Denoising, Optical Flow Estimation, Video Denoising |
Published | 2020-01-06 |
URL | https://arxiv.org/abs/2001.01802v1 |
https://arxiv.org/pdf/2001.01802v1.pdf | |
PWC | https://paperswithcode.com/paper/implementation-of-the-vbm3d-video-denoising |
Repo | https://github.com/tehret/vbm3d |
Framework | none |
RP-DNN: A Tweet level propagation context based deep neural networks for early rumor detection in Social Media
Title | RP-DNN: A Tweet level propagation context based deep neural networks for early rumor detection in Social Media |
Authors | Jie Gao, Sooji Han, Xingyi Song, Fabio Ciravegna |
Abstract | Early rumor detection (ERD) on social media platforms is very challenging when limited, incomplete and noisy information is available. Most of the existing methods have largely worked on event-level detection that requires the collection of posts relevant to a specific event and relied only on user-generated content. They are not appropriate for detecting rumor sources in the very early stages, before an event unfolds and becomes widespread. In this paper, we address the task of ERD at the message level. We present a novel hybrid neural network architecture, which combines a task-specific character-based bidirectional language model and stacked Long Short-Term Memory (LSTM) networks to represent textual contents and social-temporal contexts of input source tweets, for modelling propagation patterns of rumors in the early stages of their development. We apply multi-layered attention models to jointly learn attentive context embeddings over multiple context inputs. Our experiments employ a stringent leave-one-out cross-validation (LOO-CV) evaluation setup on seven publicly available real-life rumor event data sets. Our models achieve state-of-the-art (SoA) performance for detecting unseen rumors on large augmented data which covers more than 12 events and 2,967 rumors. An ablation study is conducted to understand the relative contribution of each component of our proposed model. |
Tasks | Language Modelling |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12683v2 |
https://arxiv.org/pdf/2002.12683v2.pdf | |
PWC | https://paperswithcode.com/paper/rp-dnn-a-tweet-level-propagation-context |
Repo | https://github.com/jerrygaoLondon/RPDNN |
Framework | none |
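The leave-one-out evaluation mentioned in the RP-DNN abstract can be pictured as holding out one whole event at a time, so that every test rumor comes from an event unseen in training. The sketch below assumes that reading of the setup; the helper name `leave_one_group_out` and the `(event, item)` data layout are hypothetical and do not come from the paper or its repository.

```python
def leave_one_group_out(samples):
    """Yield (held_out_group, train_items, test_items), one split per group.

    `samples` is a list of (group, item) pairs, e.g. (event_name, tweet).
    """
    groups = sorted({group for group, _ in samples})
    for held_out in groups:
        train = [item for group, item in samples if group != held_out]
        test = [item for group, item in samples if group == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Placeholder events and tweets, for illustration only.
    data = [("event_a", "t1"), ("event_b", "t2"), ("event_a", "t3")]
    for held_out, train, test in leave_one_group_out(data):
        print(held_out, train, test)
```

Splitting by event rather than by individual tweet avoids leaking event-specific vocabulary from the training folds into the test fold.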
Head and Tail Localization of C. elegans
Title | Head and Tail Localization of C. elegans |
Authors | Mansi Ranjit Mane, Aniket Anand Deshmukh, Adam J. Iliff |
Abstract | C. elegans is commonly used in neuroscience for behaviour analysis because of its compact nervous system with well-described connectivity. Localizing the animal and distinguishing between its head and tail are important tasks to track the worm during behavioural assays and to perform quantitative analyses. We demonstrate a neural network based approach to localize both the head and the tail of the worm in an image. To make empirical results in the paper reproducible and promote open source machine learning based solutions for C. elegans behavioural analysis, we also make our code publicly available. |
Tasks | |
Published | 2020-01-12 |
URL | https://arxiv.org/abs/2001.03981v1 |
https://arxiv.org/pdf/2001.03981v1.pdf | |
PWC | https://paperswithcode.com/paper/head-and-tail-localization-of-c-elegans |
Repo | https://github.com/mansimane/WormML |
Framework | tf |
Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs
Title | Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs |
Authors | Leonardo F. R. Ribeiro, Yue Zhang, Claire Gardent, Iryna Gurevych |
Abstract | Recent graph-to-text models generate text from graph-based data using either global or local aggregation to learn node representations. Global node encoding allows explicit communication between two distant nodes, thereby neglecting graph topology as all nodes are connected. In contrast, local node encoding considers the relations between directly connected nodes capturing the graph structure, but it can fail to capture long-range relations. In this work, we gather the best of both encoding strategies, proposing novel models that encode an input graph combining both global and local node contexts. Our approaches are able to learn better contextualized node embeddings for text generation. In our experiments, we demonstrate that our models lead to significant improvements in KG-to-text generation, achieving BLEU scores of 17.81 on the AGENDA dataset, and 63.10 on the WebNLG dataset for seen categories, outperforming the state of the art by 3.51 and 2.51 points, respectively. |
Tasks | Knowledge Graphs, Text Generation |
Published | 2020-01-29 |
URL | https://arxiv.org/abs/2001.11003v1 |
https://arxiv.org/pdf/2001.11003v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-global-and-local-node-contexts-for |
Repo | https://github.com/UKPLab/kg2text |
Framework | none |
University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization
Title | University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization |
Authors | Zhedong Zheng, Yunchao Wei, Yi Yang |
Abstract | We consider the problem of cross-view geo-localization. The primary challenge of this task is to learn features that are robust to large viewpoint changes. Existing benchmarks can help, but are limited in the number of viewpoints. Image pairs, containing two viewpoints, e.g., satellite and ground, are usually provided, which may compromise the feature learning. Besides phone cameras and satellites, in this paper, we argue that drones could serve as the third platform to deal with the geo-localization problem. In contrast to the traditional ground-view images, drone-view images meet fewer obstacles, e.g., trees, and could provide a comprehensive view when flying around the target place. To verify the effectiveness of the drone platform, we introduce a new multi-view multi-source benchmark for drone-based geo-localization, named University-1652. University-1652 contains data from three platforms, i.e., synthetic drones, satellites and ground cameras of 1,652 university buildings around the world. To our knowledge, University-1652 is the first drone-based geo-localization dataset and enables two new tasks, i.e., drone-view target localization and drone navigation. As the name implies, drone-view target localization intends to predict the location of the target place via drone-view images. On the other hand, given a satellite-view query image, drone navigation is to drive the drone to the area of interest in the query. We use this dataset to analyze a variety of off-the-shelf CNN features and propose a strong CNN baseline on this challenging dataset. The experiments show that University-1652 helps the model to learn viewpoint-invariant features and also has good generalization ability in the real-world scenario. |
Tasks | Drone navigation, Drone-view target localization, Image-Based Localization |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12186v1 |
https://arxiv.org/pdf/2002.12186v1.pdf | |
PWC | https://paperswithcode.com/paper/university-1652-a-multi-view-multi-source |
Repo | https://github.com/layumi/University1652-Baseline |
Framework | pytorch |
NeuCrowd: Neural Sampling Network for Representation Learning with Crowdsourced Labels
Title | NeuCrowd: Neural Sampling Network for Representation Learning with Crowdsourced Labels |
Authors | Yang Hao, Wenbiao Ding, Zitao Liu |
Abstract | Representation learning approaches require a massive amount of discriminative training data, which is unavailable in many scenarios, such as healthcare, small cities, and education. In practice, people resort to crowdsourcing to get annotated labels. However, due to issues like data privacy, budget limitation, and shortage of domain-specific annotators, the number of crowdsourced labels is still very limited. Moreover, because of annotators’ diverse expertise, crowdsourced labels are often inconsistent. Thus, directly applying existing representation learning algorithms may easily overfit and yield suboptimal solutions. In this paper, we propose NeuCrowd, a unified framework for representation learning from crowdsourced labels. The proposed framework (1) creates a sufficient number of high-quality n-tuplet training samples by utilizing safety-aware sampling and robust anchor generation; and (2) automatically learns a neural sampling network that adaptively learns to select effective samples for the representation learning network. The proposed framework is evaluated on both synthetic and real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible results, we make our code public in a GitHub repository: https://github.com/crowd-data-mining/NeuCrowd. |
Tasks | Representation Learning |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.09660v1 |
https://arxiv.org/pdf/2003.09660v1.pdf | |
PWC | https://paperswithcode.com/paper/neucrowd-neural-sampling-network-for |
Repo | https://github.com/crowd-data-mining/NeuCrowd |
Framework | tf |
Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0
Title | Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0 |
Authors | Zack Hodari, Catherine Lai, Simon King |
Abstract | In English, prosody adds a broad range of information to segment sequences, from information structure (e.g. contrast) to stylistic variation (e.g. expression of emotion). However, when learning to control prosody in text-to-speech voices, it is not clear what exactly the control is modifying. Existing research on discrete representation learning for prosody has demonstrated high naturalness, but no analysis has been performed on what these representations capture, or if they can generate meaningfully-distinct variants of an utterance. We present a phrase-level variational autoencoder with a multi-modal prior, using the mode centres as “intonation codes”. Our evaluation establishes which intonation codes are perceptually distinct, finding that the intonation codes from our multi-modal latent model were significantly more distinct than a baseline using k-means clustering. We carry out a follow-up qualitative study to determine what information the codes are carrying. Most commonly, listeners commented on the intonation codes having a statement or question style. However, many other affect-related styles were also reported, including: emotional, uncertain, surprised, sarcastic, passive aggressive, and upset. |
Tasks | Representation Learning, Speech Synthesis |
Published | 2020-03-14 |
URL | https://arxiv.org/abs/2003.06686v1 |
https://arxiv.org/pdf/2003.06686v1.pdf | |
PWC | https://paperswithcode.com/paper/perception-of-prosodic-variation-for-speech |
Repo | https://github.com/ZackHodari/discrete_intonation |
Framework | pytorch |
WICA: nonlinear weighted ICA
Title | WICA: nonlinear weighted ICA |
Authors | Andrzej Bedychaj, Przemysław Spurek, Aleksandra Nowak, Jacek Tabor |
Abstract | Independent Component Analysis (ICA) aims to find a coordinate system in which the components of the data are independent. In this paper we construct a new nonlinear ICA model, called WICA, which obtains better and more stable results than other algorithms. A crucial tool is a new efficient method of verifying nonlinear dependence by computing correlation coefficients for normally weighted data. |
Tasks | |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04147v1 |
https://arxiv.org/pdf/2001.04147v1.pdf | |
PWC | https://paperswithcode.com/paper/wica-nonlinear-weighted-ica |
Repo | https://github.com/gmum/wica |
Framework | none |
Conjoined Dirichlet Process
Title | Conjoined Dirichlet Process |
Authors | Michelle N. Ngo, Dustin S. Pluta, Alexander N. Ngo, Babak Shahbaba |
Abstract | Biclustering is a class of techniques that simultaneously clusters the rows and columns of a matrix to sort heterogeneous data into homogeneous blocks. Although many algorithms have been proposed to find biclusters, existing methods require the number of biclusters to be pre-specified or place constraints on the model structure. To address these issues, we develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns. The proposed method utilizes dual Dirichlet process mixture models to learn row and column clusters, with the number of resulting clusters determined by the data rather than pre-specified. Probabilistic biclusters are identified by modeling the mutual dependence between the row and column clusters. We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches. |
Tasks | |
Published | 2020-02-08 |
URL | https://arxiv.org/abs/2002.03223v1 |
https://arxiv.org/pdf/2002.03223v1.pdf | |
PWC | https://paperswithcode.com/paper/conjoined-dirichlet-process |
Repo | https://github.com/micnngo/CDP |
Framework | none |