April 3, 2020

# Paper Group AWR 16

A Deep Learning Approach for the Computation of Curvature in the Level-Set Method. Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog. Probabilistic 3D Multi-Object Tracking for Autonomous Driving. Interpretable Rumor Detection in Microblogs by Attending to User Interactions. Multi-task self-supervised le …

#### A Deep Learning Approach for the Computation of Curvature in the Level-Set Method

Title A Deep Learning Approach for the Computation of Curvature in the Level-Set Method
Authors Luis Ángel Larios Cárdenas, Frederic Gibou
Abstract We propose a deep learning strategy to compute the mean curvature of an implicit level-set representation of an interface. Our approach is based on fitting neural networks to synthetic datasets of pairs of nodal $\phi$ values and curvatures obtained from circular interfaces immersed in different uniform resolutions. These neural networks are multilayer perceptrons that ingest sample level-set values of grid points along a free boundary and output the dimensionless curvature at the center vertices of each sampled neighborhood. Evaluations with irregular (smooth and sharp) interfaces, in both uniform and adaptive meshes, show that our deep learning approach is systematically superior to conventional numerical approximation in the $L^2$ and $L^\infty$ norms. Our methodology is also less sensitive to steep curvatures and approximates them well with samples collected with fewer iterations of the reinitialization equation, often needed to regularize the underlying implicit function. Additionally, we show that an application-dependent map of local resolutions to neural networks can be constructed and employed to estimate interface curvatures more efficiently than using typically expensive numerical schemes while still attaining comparable or higher precision.
Published 2020-02-04
URL https://arxiv.org/abs/2002.02804v1
PDF https://arxiv.org/pdf/2002.02804v1.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-approach-for-the-computation
Repo https://github.com/UCSB-CASL/LSCurvatureDL
Framework none
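
For context on what the learned model is compared against: the mean curvature of the interface is $\kappa = \nabla \cdot (\nabla\phi / |\nabla\phi|)$, and a conventional numerical scheme evaluates it with finite differences of the nodal $\phi$ values. The following is a minimal sketch of that baseline only, not the paper's neural network. The function names, the signed-distance circle, and the grid spacing `h = 0.01` are illustrative assumptions.

```python
import math


def curvature_fd(phi, x, y, h):
    """Curvature of the level set of phi at (x, y) using central differences."""
    # First derivatives of phi.
    px = (phi(x + h, y) - phi(x - h, y)) / (2 * h)
    py = (phi(x, y + h) - phi(x, y - h)) / (2 * h)
    # Second derivatives of phi.
    pxx = (phi(x + h, y) - 2 * phi(x, y) + phi(x - h, y)) / h**2
    pyy = (phi(x, y + h) - 2 * phi(x, y) + phi(x, y - h)) / h**2
    pxy = (phi(x + h, y + h) - phi(x + h, y - h)
           - phi(x - h, y + h) + phi(x - h, y - h)) / (4 * h**2)
    grad_sq = px**2 + py**2
    # kappa = div(grad phi / |grad phi|), expanded for two dimensions.
    return (pxx * py**2 - 2 * px * py * pxy + pyy * px**2) / grad_sq**1.5


def circle_phi(radius):
    """Signed distance to a circle centered at the origin (hypothetical test case)."""
    return lambda x, y: math.hypot(x, y) - radius


h = 0.01                                   # assumed uniform grid spacing
kappa = curvature_fd(circle_phi(0.5), 0.5, 0.0, h)
h_kappa = h * kappa                        # dimensionless curvature
print(kappa, h_kappa)
```

For a circle of radius 0.5 the exact curvature is 2, so the printed value can be checked against the analytic answer. The dimensionless quantity `h * kappa` is the kind of target the abstract describes the networks producing.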

#### Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog

Title Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog
Authors Shen Gao, Xiuying Chen, Chang Liu, Li Liu, Dongyan Zhao, Rui Yan
Abstract Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps, and some works are dedicated to automatically selecting sticker responses by matching text labels of stickers with previous utterances. However, due to their large quantities, it is impractical to require text labels for all stickers. Hence, in this paper, we propose to recommend an appropriate sticker to the user based on multi-turn dialog context history without any external labels. Two main challenges are confronted in this task. One is to learn the semantic meaning of stickers without corresponding text labels. Another challenge is to jointly model the candidate sticker with the multi-turn dialog context. To tackle these challenges, we propose a sticker response selector (SRS) model. Specifically, SRS first employs a convolution-based sticker image encoder and a self-attention-based multi-turn dialog encoder to obtain the representation of stickers and utterances. Next, a deep interaction network is proposed to conduct deep matching between the sticker and each utterance in the dialog history. SRS then learns the short-term and long-term dependency between all interaction results by a fusion network to output the final matching score. To evaluate our proposed method, we collect a large-scale real-world dialog dataset with stickers from one of the most popular online chatting platforms. Extensive experiments conducted on this dataset show that our model achieves state-of-the-art performance on all commonly used metrics. Experiments also verify the effectiveness of each component of SRS. To facilitate further research in the sticker selection field, we release this dataset of 340K multi-turn dialog and sticker pairs.
Published 2020-03-10
URL https://arxiv.org/abs/2003.04679v1
PDF https://arxiv.org/pdf/2003.04679v1.pdf
PWC https://paperswithcode.com/paper/learning-to-respond-with-stickers-a-framework
Repo https://github.com/gsh199449/stickerchat
Framework tf

#### Probabilistic 3D Multi-Object Tracking for Autonomous Driving

Title Probabilistic 3D Multi-Object Tracking for Autonomous Driving
Authors Hsu-kuang Chiu, Antonio Prioletti, Jie Li, Jeannette Bohg
Abstract 3D multi-object tracking is a key module in autonomous driving applications that provides a reliable dynamic representation of the world to the planning module. In this paper, we present our on-line tracking method, which won first place in the NuScenes Tracking Challenge, held at the AI Driving Olympics Workshop at NeurIPS 2019. Our method estimates the object states by adopting a Kalman Filter. We initialize the state covariance as well as the process and observation noise covariance with statistics from the training set. We also use the stochastic information from the Kalman Filter in the data association step by measuring the Mahalanobis distance between the predicted object states and current object detections. Our experimental results on the NuScenes validation and test sets show that our method outperforms the AB3DMOT baseline method by a large margin in the Average Multi-Object Tracking Accuracy (AMOTA) metric.
Tasks 3D Multi-Object Tracking, Autonomous Driving, Multi-Object Tracking, Object Tracking
Published 2020-01-16
URL https://arxiv.org/abs/2001.05673v1
PDF https://arxiv.org/pdf/2001.05673v1.pdf
PWC https://paperswithcode.com/paper/probabilistic-3d-multi-object-tracking-for
Repo https://github.com/eddyhkchiu/mahalanobis_3d_multi_object_tracking
Framework none
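
The data association step described in the abstract can be illustrated with a small sketch. It computes the Mahalanobis distance $d = \sqrt{y^\top S^{-1} y}$ between a predicted state and a detection, where $y$ is the innovation and $S$ is the innovation covariance. Several simplifications here are assumptions made for brevity and are not taken from the paper: the state is a 2D position, the observation matrix is the identity (so $S = P + R$), all tracks share one covariance, and matching is a simple greedy pass with a fixed threshold.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular covariance")
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis(z, x_pred, P, R):
    """Distance between detection z and predicted position x_pred."""
    # Innovation covariance S = P + R (identity observation matrix assumed).
    S = [[P[i][j] + R[i][j] for j in range(2)] for i in range(2)]
    S_inv = inv2(S)
    y = [z[0] - x_pred[0], z[1] - x_pred[1]]       # innovation
    q = sum(y[i] * S_inv[i][j] * y[j] for i in range(2) for j in range(2))
    return math.sqrt(q)


def greedy_associate(tracks, detections, P, R, threshold):
    """Match tracks to detections in order of increasing distance."""
    pairs = sorted(
        (mahalanobis(z, x, P, R), ti, di)
        for ti, x in enumerate(tracks)
        for di, z in enumerate(detections)
    )
    matches, used = {}, set()
    for dist, ti, di in pairs:
        if dist > threshold:
            break                                   # remaining pairs are farther
        if ti in matches or di in used:
            continue
        matches[ti] = di
        used.add(di)
    return matches


# Hypothetical covariances: more uncertainty along x than along y.
P = [[3.0, 0.0], [0.0, 0.5]]
R = [[1.0, 0.0], [0.0, 0.5]]
tracks = [(0.0, 0.0), (10.0, 0.0)]
detections = [(9.5, 0.2), (0.5, 0.0)]
print(greedy_associate(tracks, detections, P, R, threshold=3.0))
```

With these covariances, an offset of 2 along x gives a distance of 1 while the same offset along y gives 2, which is the point of using the filter's uncertainty rather than a plain Euclidean distance.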

#### Interpretable Rumor Detection in Microblogs by Attending to User Interactions

Title Interpretable Rumor Detection in Microblogs by Attending to User Interactions
Authors Ling Min Serena Khoo, Hai Leong Chieu, Zhong Qian, Jing Jiang
Abstract We address rumor detection by learning to differentiate between the community’s response to real and fake claims in microblogs. Existing state-of-the-art models are based on tree models that model conversational trees. However, in social media, a user posting a reply might be replying to the entire thread rather than to a specific user. We propose a post-level attention model (PLAN) to model long distance interactions between tweets with the multi-head attention mechanism in a transformer network. We investigated variants of this model: (1) a structure aware self-attention model (StA-PLAN) that incorporates tree structure information in the transformer network, and (2) a hierarchical token and post-level attention model (StA-HiTPLAN) that learns a sentence representation with token-level self-attention. To the best of our knowledge, we are the first to evaluate our models on two rumor detection data sets: the PHEME data set as well as the Twitter15 and Twitter16 data sets. We show that our best models outperform current state-of-the-art models for both data sets. Moreover, the attention mechanism allows us to explain rumor detection predictions at both token-level and post-level.
Published 2020-01-29
URL https://arxiv.org/abs/2001.10667v1
PDF https://arxiv.org/pdf/2001.10667v1.pdf
PWC https://paperswithcode.com/paper/interpretable-rumor-detection-in-microblogs
Repo https://github.com/serenaklm/rumor_detection
Framework pytorch

#### Multi-task self-supervised learning for Robust Speech Recognition

Title Multi-task self-supervised learning for Robust Speech Recognition
Authors Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio
Abstract Despite the growing interest in unsupervised learning, extracting meaningful knowledge from unlabelled audio remains an open challenge. To take a step in this direction, we recently proposed a problem-agnostic speech encoder (PASE), that combines a convolutional encoder followed by multiple neural networks, called workers, tasked to solve self-supervised problems (i.e., ones that do not require manual annotations as ground truth). PASE was shown to capture relevant speech information, including speaker voice-print and phonemes. This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments. To this end, we employ an online speech distortion module, that contaminates the input signals with a variety of random disturbances. We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks. Finally, we refine the set of workers used in self-supervision to encourage better cooperation. Results on TIMIT, DIRHA and CHiME-5 show that PASE+ significantly outperforms both the previous version of PASE as well as common acoustic features. Interestingly, PASE+ learns transferable representations suitable for highly mismatched acoustic conditions.
Tasks Robust Speech Recognition, Speech Recognition
Published 2020-01-25
URL https://arxiv.org/abs/2001.09239v1
PDF https://arxiv.org/pdf/2001.09239v1.pdf
Repo https://github.com/santi-pdp/pase
Framework pytorch

#### On Isometry Robustness of Deep 3D Point Cloud Models under Adversarial Attacks

Title On Isometry Robustness of Deep 3D Point Cloud Models under Adversarial Attacks
Authors Yue Zhao, Yuwei Wu, Caihua Chen, Andrew Lim
Abstract While deep learning in the 3D domain has achieved revolutionary performance in many tasks, the robustness of these models has not been sufficiently studied or explored. Regarding the 3D adversarial samples, most existing works focus on manipulation of local points, which may fail to invoke the global geometry properties, like robustness under linear projection that preserves the Euclidean distance, i.e., isometry. In this work, we show that existing state-of-the-art deep 3D models are extremely vulnerable to isometry transformations. Armed with Thompson Sampling, we develop a black-box attack with a success rate over 95% on the ModelNet40 data set. Incorporating the Restricted Isometry Property, we propose a novel framework of white-box attack on top of spectral norm based perturbation. In contrast to previous works, our adversarial samples are experimentally shown to be strongly transferable. Evaluated on a sequence of prevailing 3D models, our white-box attack achieves success rates from 98.88% to 100%. It maintains a successful attack rate over 95% even within an imperceptible rotation range $[\pm 2.81^{\circ}]$.
Published 2020-02-27
URL https://arxiv.org/abs/2002.12222v2
PDF https://arxiv.org/pdf/2002.12222v2.pdf
PWC https://paperswithcode.com/paper/on-isometry-robustness-of-deep-3d-point-cloud
Repo https://github.com/skywalker6174/3d-isometry-robust
Framework pytorch
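
To make the notion of isometry concrete: a rotation changes every point's coordinates but leaves all pairwise Euclidean distances unchanged. The sketch below checks this for a rotation about the z-axis by 2.81 degrees, the bound quoted in the abstract. The four-point cloud and the choice of axis are illustrative assumptions, and the code demonstrates only the distance-preserving property, not the paper's attack.

```python
import math
from itertools import combinations


def rotate_z(points, degrees):
    """Rotate 3D points about the z-axis by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs of points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


# Hypothetical toy point cloud.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.3, 0.8, -0.5), (-0.7, 0.2, 0.9)]
rotated = rotate_z(cloud, 2.81)

# Largest change in any pairwise distance; zero up to floating-point rounding.
max_change = max(
    abs(a - b)
    for a, b in zip(pairwise_distances(cloud), pairwise_distances(rotated))
)
print(max_change)
```

Because the geometry of the shape is untouched, a model whose prediction changes under such a transformation is reacting to pose rather than shape, which is the vulnerability the abstract describes.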

#### W2S: A Joint Denoising and Super-Resolution Dataset

Title W2S: A Joint Denoising and Super-Resolution Dataset
Authors Ruofan Zhou, Majed El Helou, Daniel Sage, Thierry Laroche, Arne Seitz, Sabine Süsstrunk
Abstract Denoising and super-resolution (SR) are fundamental tasks in imaging. These two restoration tasks are well covered in the literature, but only separately. Given a noisy low-resolution (LR) input image, it is yet unclear what the best approach would be in order to obtain a noise-free high-resolution (HR) image. In order to study joint denoising and super-resolution (JDSR), a dataset containing pairs of noisy LR images and the corresponding HR images is fundamental. We propose such a novel JDSR dataset, Widefield2SIM (W2S), acquired using microscopy equipment and techniques. W2S is comprised of 144,000 real fluorescence microscopy images, used to form a total of 360 sets of images. A set is comprised of noisy LR images with different noise levels, a noise-free LR image, and a corresponding high-quality HR image. W2S allows us to benchmark the combinations of 6 denoising methods and 6 SR methods. We show that state-of-the-art SR networks perform very poorly on noisy inputs, with a loss reaching 14dB relative to noise-free inputs. Our evaluation also shows that applying the best denoiser in terms of reconstruction error followed by the best SR method does not yield the best result. The best denoising PSNR can, for instance, come at the expense of a loss in high frequencies, which is detrimental for SR methods. We lastly demonstrate that a light-weight SR network with a novel texture loss, trained specifically for JDSR, outperforms any combination of state-of-the-art deep denoising and SR networks.
Published 2020-03-12
URL https://arxiv.org/abs/2003.05961v1
PDF https://arxiv.org/pdf/2003.05961v1.pdf
PWC https://paperswithcode.com/paper/w2s-a-joint-denoising-and-super-resolution
Repo https://github.com/widefield2sim/w2s
Framework none
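
As a reminder of what the decibel figures mean, PSNR is defined as $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. Assuming the 14dB loss quoted in the abstract is a PSNR difference, it corresponds to a mean squared error roughly 25 times larger. The sketch below works through that arithmetic. The flat-list image representation, the peak value of 1.0, and the function names are illustrative assumptions.

```python
import math


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf                      # identical images
    return 10 * math.log10(peak**2 / mse)


def mse_ratio_for_db_loss(db_loss):
    """Factor by which MSE grows when PSNR drops by db_loss decibels."""
    return 10 ** (db_loss / 10)


# Hypothetical 4-pixel images with a uniform error of 0.1 (MSE = 0.01).
reference = [0.0, 0.0, 0.0, 0.0]
estimate = [0.1, 0.1, 0.1, 0.1]
print(psnr(reference, estimate))             # about 20 dB
print(mse_ratio_for_db_loss(14))             # about 25.1
```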

#### The Chef’s Hat Simulation Environment for Reinforcement-Learning-Based Agents

Title The Chef’s Hat Simulation Environment for Reinforcement-Learning-Based Agents
Authors Pablo Barros, Anne C. Bloem, Inge M. Hootsmans, Lena M. Opheij, Romain H. A. Toebosch, Emilia Barakova, Alessandra Sciutti
Abstract Achieving social interactions within Human-Robot Interaction (HRI) environments is a very challenging task. Most of the current research focuses on Wizard-of-Oz approaches, which neglect the recent development of intelligent robots. On the other hand, real-world scenarios usually do not provide the necessary control and reproducibility which are needed for learning algorithms. In this paper, we propose a virtual simulation environment that implements the Chef’s Hat card game, designed to be used in HRI scenarios, to provide a controllable and reproducible scenario for reinforcement-learning algorithms.
Published 2020-03-12
URL https://arxiv.org/abs/2003.05861v1
PDF https://arxiv.org/pdf/2003.05861v1.pdf
PWC https://paperswithcode.com/paper/the-chefs-hat-simulation-environment-for
Repo https://github.com/pablovin/ChefsHatGYM
Framework none

#### Pre-trained Models for Natural Language Processing: A Survey

Title Pre-trained Models for Natural Language Processing: A Survey
Authors Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang
Abstract Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs based on a taxonomy with four perspectives. Next, we describe how to adapt the knowledge of PTMs to the downstream tasks. Finally, we outline some potential directions of PTMs for future research. This survey is intended to be a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.
Published 2020-03-18
URL https://arxiv.org/abs/2003.08271v2
PDF https://arxiv.org/pdf/2003.08271v2.pdf
PWC https://paperswithcode.com/paper/pre-trained-models-for-natural-language
Repo https://github.com/tomohideshibata/BERT-related-papers
Framework pytorch

#### Irony Detection in a Multilingual Context

Title Irony Detection in a Multilingual Context
Authors Bilal Ghanem, Jihen Karoui, Farah Benamara, Paolo Rosso, Véronique Moriceau
Abstract This paper proposes the first multilingual (French, English and Arabic) and multicultural (Indo-European languages vs. less culturally close languages) irony detection system. We employ both feature-based models and neural architectures using monolingual word representation. We compare the performance of these systems with state-of-the-art systems to identify their capabilities. We show that these monolingual models trained separately on different languages using multilingual word representation or text-based features can open the door to irony detection in languages that lack annotated data for irony.
Published 2020-02-06
URL https://arxiv.org/abs/2002.02427v1
PDF https://arxiv.org/pdf/2002.02427v1.pdf
PWC https://paperswithcode.com/paper/irony-detection-in-a-multilingual-context
Repo https://github.com/bilalghanem/multilingual_irony
Framework none

#### Adversarial Disentanglement with Grouped Observations

Title Adversarial Disentanglement with Grouped Observations
Authors Jozsef Nemeth
Abstract We consider the disentanglement of the representations of the relevant attributes of the data (content) from all other factors of variations (style) using Variational Autoencoders. Some recent works addressed this problem by utilizing grouped observations, where the content attributes are assumed to be common within each group, while there is no supervised information on the style factors. In many cases, however, these methods fail to prevent the models from using the style variables to encode content related features as well. This work supplements these algorithms with a method that eliminates the content information in the style representations. For that purpose the training objective is augmented to minimize an appropriately defined mutual information term in an adversarial way. Experimental results and comparisons on image datasets show that the resulting method can efficiently separate the content and style related attributes and generalizes to unseen data.
Published 2020-01-14
URL https://arxiv.org/abs/2001.04761v1
PDF https://arxiv.org/pdf/2001.04761v1.pdf
Repo https://github.com/jonemeth/aaai20
Framework tf

#### Efficient Policy Learning from Surrogate-Loss Classification Reductions

Title Efficient Policy Learning from Surrogate-Loss Classification Reductions
Authors Andrew Bennett, Nathan Kallus
Abstract Recent work on policy learning from observational data has highlighted the importance of efficient policy evaluation and has proposed reductions to weighted (cost-sensitive) classification. But, efficient policy evaluation need not yield efficient estimation of policy parameters. We consider the estimation problem given by a weighted surrogate-loss classification reduction of policy learning with any score function, either direct, inverse-propensity weighted, or doubly robust. We show that, under a correct specification assumption, the weighted classification formulation need not be efficient for policy parameters. We draw a contrast to actual (possibly weighted) binary classification, where correct specification implies a parametric model, while for policy learning it only implies a semiparametric model. In light of this, we instead propose an estimation approach based on generalized method of moments, which is efficient for the policy parameters. We propose a particular method based on recent developments on solving moment problems using neural networks and demonstrate the efficiency and regret benefits of this method empirically.
Published 2020-02-12
URL https://arxiv.org/abs/2002.05153v1
PDF https://arxiv.org/pdf/2002.05153v1.pdf
PWC https://paperswithcode.com/paper/efficient-policy-learning-from-surrogate-loss
Repo https://github.com/CausalML/ESPRM
Framework none

#### Diagnosing Colorectal Polyps in the Wild with Capsule Networks

Title Diagnosing Colorectal Polyps in the Wild with Capsule Networks
Authors Rodney LaLonde, Pujan Kandel, Concetto Spampinato, Michael B. Wallace, Ulas Bagci
Abstract Colorectal cancer, largely arising from precursor lesions called polyps, remains one of the leading causes of cancer-related death worldwide. Current clinical standards require the resection and histopathological analysis of polyps due to test accuracy and sensitivity of optical biopsy methods falling substantially below recommended levels. In this study, we design a novel capsule network architecture (D-Caps) to improve the viability of optical biopsy of colorectal polyps. Our proposed method introduces several technical novelties including a novel capsule architecture with a capsule-average pooling (CAP) method to improve efficiency in large-scale image classification. We demonstrate improved results over the previous state-of-the-art convolutional neural network (CNN) approach by as much as 43%. This work provides an important benchmark on the new Mayo Polyp dataset, a significantly more challenging and larger dataset than previous polyp studies, with results stratified across all available categories, imaging devices and modalities, and focus modes to promote future direction into AI-driven colorectal cancer screening systems. Code is publicly available at https://github.com/lalonderodney/D-Caps .
Published 2020-01-10
URL https://arxiv.org/abs/2001.03305v1
PDF https://arxiv.org/pdf/2001.03305v1.pdf
PWC https://paperswithcode.com/paper/diagnosing-colorectal-polyps-in-the-wild-with
Repo https://github.com/lalonderodney/D-Caps
Framework none

#### Discriminator Soft Actor Critic without Extrinsic Rewards

Title Discriminator Soft Actor Critic without Extrinsic Rewards
Authors Daichi Nishio, Daiki Kuyoshi, Toi Tsuneda, Satoshi Yamane
Abstract It is difficult to imitate well in unknown states from a small amount of expert data and sampling data. Supervised learning methods such as Behavioral Cloning do not require sampling data, but usually suffer from distribution shift. The methods based on reinforcement learning, such as inverse reinforcement learning and generative adversarial imitation learning (GAIL), can learn from only a few expert data. However, they often need to interact with the environment. Soft Q imitation learning addressed the problems, and it was shown that it could learn efficiently by combining Behavioral Cloning and soft Q-learning with constant rewards. In order to make this algorithm more robust to distribution shift, we propose Discriminator Soft Actor Critic (DSAC). It uses a reward function based on adversarial inverse reinforcement learning instead of constant rewards. We evaluated it on PyBullet environments with only four expert trajectories.
Published 2020-01-19
URL https://arxiv.org/abs/2001.06808v3
PDF https://arxiv.org/pdf/2001.06808v3.pdf
PWC https://paperswithcode.com/paper/discriminator-soft-actor-critic-without
Repo https://github.com/dnishio/DSAC
Framework none

#### Towards Automatic Face-to-Face Translation

Title Towards Automatic Face-to-Face Translation
Authors Prajwal K R, Rudrabha Mukhopadhyay, Jerin Philip, Abhishek Jha, Vinay Namboodiri, C. V. Jawahar
Abstract In light of the recent breakthroughs in automatic machine translation systems, we propose a novel approach that we term “Face-to-Face Translation”. As today’s digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization. In this work, we create an automatic pipeline for this problem and demonstrate its impact on multiple real-world applications. First, we build a working speech-to-speech translation system by bringing together multiple existing modules from speech and language. We then move towards “Face-to-Face Translation” by incorporating a novel visual module, LipGAN, for generating realistic talking faces from the translated audio. Quantitative evaluation of LipGAN on the standard LRW test set shows that it significantly outperforms existing approaches across all standard metrics. We also subject our Face-to-Face Translation pipeline to multiple human evaluations and show that it can significantly improve the overall user experience for consuming and interacting with multimodal content across languages. Code, models and demo video are made publicly available. Demo video: https://www.youtube.com/watch?v=aHG6Oei8jF0 Code and models: https://github.com/Rudrabha/LipGAN
Tasks Face to Face Translation, Machine Translation
Published 2020-03-01
URL https://arxiv.org/abs/2003.00418v1
PDF https://arxiv.org/pdf/2003.00418v1.pdf
PWC https://paperswithcode.com/paper/towards-automatic-face-to-face-translation-1
Repo https://github.com/Rudrabha/LipGAN
Framework none