April 3, 2020


Paper Group AWR 16

A Deep Learning Approach for the Computation of Curvature in the Level-Set Method. Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog. Probabilistic 3D Multi-Object Tracking for Autonomous Driving. Interpretable Rumor Detection in Microblogs by Attending to User Interactions. Multi-task self-supervised le …

A Deep Learning Approach for the Computation of Curvature in the Level-Set Method


Title	A Deep Learning Approach for the Computation of Curvature in the Level-Set Method
Authors	Luis Ángel Larios Cárdenas, Frederic Gibou
Abstract	We propose a deep learning strategy to compute the mean curvature of an implicit level-set representation of an interface. Our approach is based on fitting neural networks to synthetic datasets of pairs of nodal $\phi$ values and curvatures obtained from circular interfaces immersed in different uniform resolutions. These neural networks are multilayer perceptrons that ingest sample level-set values of grid points along a free boundary and output the dimensionless curvature at the center vertices of each sampled neighborhood. Evaluations with irregular (smooth and sharp) interfaces, in both uniform and adaptive meshes, show that our deep learning approach is systematically superior to conventional numerical approximation in the $L^2$ and $L^\infty$ norms. Our methodology is also less sensitive to steep curvatures and approximates them well with samples collected with fewer iterations of the reinitialization equation, often needed to regularize the underlying implicit function. Additionally, we show that an application-dependent map of local resolutions to neural networks can be constructed and employed to estimate interface curvatures more efficiently than using typically expensive numerical schemes while still attaining comparable or higher precision.
Tasks
Published	2020-02-04
URL	https://arxiv.org/abs/2002.02804v1
PDF	https://arxiv.org/pdf/2002.02804v1.pdf
PWC	https://paperswithcode.com/paper/a-deep-learning-approach-for-the-computation
Repo	https://github.com/UCSB-CASL/LSCurvatureDL
Framework	none
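
The "conventional numerical approximation" that the networks are compared against is typically a finite-difference discretization of the level-set mean curvature, $\kappa = \nabla \cdot (\nabla\phi / |\nabla\phi|)$. The sketch below is not the authors' code; their repository is the reference for that. It makes these assumptions:
- a 2D uniform grid with spacing h;
- NumPy central differences;
- a check on the signed distance function of a circle of radius r, where the exact curvature on the interface is 1/r.

The product h·κ is printed as one plausible reading of the "dimensionless curvature" the networks output. That interpretation is our assumption.

```python
import numpy as np

def mean_curvature_2d(phi: np.ndarray, h: float) -> np.ndarray:
    """Level-set curvature kappa = div(grad(phi) / |grad(phi)|) on a uniform grid.

    Uses the expanded form
        (phi_xx*phi_y^2 - 2*phi_x*phi_y*phi_xy + phi_yy*phi_x^2) / |grad phi|^3
    with central differences from np.gradient (one-sided at the boundary).
    Axis 0 is x and axis 1 is y (meshgrid with indexing="ij").
    """
    phi_x, phi_y = np.gradient(phi, h)
    phi_xx, phi_xy = np.gradient(phi_x, h)
    _, phi_yy = np.gradient(phi_y, h)
    eps = 1e-12  # guards against division by zero where |grad phi| vanishes
    num = phi_xx * phi_y**2 - 2.0 * phi_x * phi_y * phi_xy + phi_yy * phi_x**2
    den = (phi_x**2 + phi_y**2) ** 1.5 + eps
    return num / den

# Signed distance to a circle of radius r on [-1, 1]^2 with h = 1/128.
x = np.linspace(-1.0, 1.0, 257)
h = x[1] - x[0]
X, Y = np.meshgrid(x, x, indexing="ij")
r = 0.5
phi = np.sqrt(X**2 + Y**2) - r
kappa = mean_curvature_2d(phi, h)

# Grid node on the interface at (r, 0); the exact curvature there is 1/r = 2.
i = int(np.argmin(np.abs(x - r)))
j = int(np.argmin(np.abs(x - 0.0)))
print(kappa[i, j], h * kappa[i, j])  # numerical kappa and dimensionless h*kappa
```

On a smooth circle this baseline is already accurate. The abstract's claims concern irregular interfaces, sharp features, and under-resolved regions, where such stencils degrade.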

Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog


Title	Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog
Authors	Shen Gao, Xiuying Chen, Chang Liu, Li Liu, Dongyan Zhao, Rui Yan
Abstract	Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps, and some works are dedicated to automatically selecting sticker responses by matching the text labels of stickers with previous utterances. However, because stickers are so numerous, it is impractical to require text labels for all of them. Hence, in this paper, we propose to recommend an appropriate sticker to the user based on the multi-turn dialog context history, without any external labels. This task poses two main challenges. The first is to learn the semantic meaning of stickers without corresponding text labels. The second is to jointly model the candidate sticker with the multi-turn dialog context. To tackle these challenges, we propose a sticker response selector (SRS) model. Specifically, SRS first employs a convolution-based sticker image encoder and a self-attention-based multi-turn dialog encoder to obtain representations of stickers and utterances. Next, a deep interaction network is proposed to conduct deep matching between the sticker and each utterance in the dialog history. SRS then learns the short-term and long-term dependencies between all interaction results with a fusion network to output the final matching score. To evaluate our proposed method, we collect a large-scale real-world dialog dataset with stickers from one of the most popular online chatting platforms. Extensive experiments on this dataset show that our model achieves state-of-the-art performance on all commonly used metrics. Experiments also verify the effectiveness of each component of SRS. To facilitate further research in sticker selection, we release this dataset of 340K multi-turn dialog and sticker pairs.
Tasks
Published	2020-03-10
URL	https://arxiv.org/abs/2003.04679v1
PDF	https://arxiv.org/pdf/2003.04679v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-respond-with-stickers-a-framework
Repo	https://github.com/gsh199449/stickerchat
Framework	tf
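
The matching step can be illustrated with a generic interaction-based scorer. This is a simplified sketch, not SRS itself, and the following are assumptions:
- A sticker is represented by a set of region vectors (for example a flattened CNN feature map).
- Each utterance is represented by token vectors.
- The hypothetical helpers `interaction_score` and `matching_score` build a cosine-similarity interaction matrix per utterance.
- A plain mean over utterances stands in for the learned fusion network.

```python
import numpy as np

def interaction_score(sticker_regions: np.ndarray, utterance_tokens: np.ndarray) -> float:
    """Match one sticker (R, d) against one utterance (T, d).

    Builds the R x T cosine-similarity interaction matrix, keeps the best-matching
    region for every token, and averages over tokens.
    """
    s = sticker_regions / np.linalg.norm(sticker_regions, axis=1, keepdims=True)
    u = utterance_tokens / np.linalg.norm(utterance_tokens, axis=1, keepdims=True)
    interaction = s @ u.T  # (R, T) cosine similarities
    return float(interaction.max(axis=0).mean())

def matching_score(sticker_regions: np.ndarray, dialog: list) -> float:
    """Score a candidate sticker against a multi-turn context (list of (T_i, d) arrays).

    Hypothetical fusion: the mean of per-utterance scores replaces the learned
    short/long-term fusion network described in the abstract.
    """
    return float(np.mean([interaction_score(sticker_regions, utt) for utt in dialog]))

# Toy example in a 4-dimensional feature space.
dialog = [np.eye(4)[:2], np.eye(4)[:1]]
candidates = {"a": np.eye(4)[:2], "b": np.eye(4)[2:]}
scores = {name: matching_score(regions, dialog) for name, regions in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Selecting a response then reduces to ranking candidate stickers by their score. In SRS every stage shown here is learned rather than fixed.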

Probabilistic 3D Multi-Object Tracking for Autonomous Driving


Title	Probabilistic 3D Multi-Object Tracking for Autonomous Driving
Authors	Hsu-kuang Chiu, Antonio Prioletti, Jie Li, Jeannette Bohg
Abstract	3D multi-object tracking is a key module in autonomous driving applications that provides a reliable dynamic representation of the world to the planning module. In this paper, we present our online tracking method, which took first place in the NuScenes Tracking Challenge held at the AI Driving Olympics Workshop at NeurIPS 2019. Our method estimates object states with a Kalman Filter. We initialize the state covariance, as well as the process and observation noise covariances, with statistics from the training set. We also use the stochastic information from the Kalman Filter in the data association step by measuring the Mahalanobis distance between the predicted object states and the current object detections. Our experimental results on the NuScenes validation and test sets show that our method outperforms the AB3DMOT baseline by a large margin in the Average Multi-Object Tracking Accuracy (AMOTA) metric.
Tasks	3D Multi-Object Tracking, Autonomous Driving, Multi-Object Tracking, Object Tracking
Published	2020-01-16
URL	https://arxiv.org/abs/2001.05673v1
PDF	https://arxiv.org/pdf/2001.05673v1.pdf
PWC	https://paperswithcode.com/paper/probabilistic-3d-multi-object-tracking-for
Repo	https://github.com/eddyhkchiu/mahalanobis_3d_multi_object_tracking
Framework	none
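
The predict, associate, and update loop described above can be sketched with a plain Kalman filter. This is a simplified illustration, not the authors' tracker, and it assumes:
- a 2D constant-velocity state [x, y, vx, vy] observed through position only, whereas the paper tracks full 3D boxes;
- placeholder noise matrices, whereas the paper estimates them from training-set statistics;
- a hypothetical nearest-neighbour gate on the Mahalanobis distance for association.

```python
import numpy as np

dt = 0.5  # time step between frames (placeholder)
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)  # constant-velocity motion model
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # we observe position only
Q = np.diag([0.1, 0.1, 0.5, 0.5])  # process noise (placeholder values)
R = np.diag([0.2, 0.2])            # observation noise (placeholder values)

def predict(s, P):
    """Propagate state mean and covariance one step forward."""
    return F @ s, F @ P @ F.T + Q

def mahalanobis(s_pred, P_pred, z):
    """Distance between detection z and the predicted track, using S = H P H^T + R."""
    S = H @ P_pred @ H.T + R
    y = z - H @ s_pred  # innovation
    return float(np.sqrt(y @ np.linalg.solve(S, y)))

def update(s_pred, P_pred, z):
    """Standard Kalman correction with detection z."""
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    s = s_pred + K @ (z - H @ s_pred)
    P = (np.eye(len(s)) - K @ H) @ P_pred
    return s, P

# One track at the origin moving at 2 m/s along x, and two candidate detections.
s, P = np.array([0.0, 0.0, 2.0, 0.0]), np.eye(4)
s_pred, P_pred = predict(s, P)
detections = [np.array([1.1, 0.0]), np.array([5.0, 5.0])]
dists = [mahalanobis(s_pred, P_pred, z) for z in detections]
GATE = 3.0  # hypothetical gating threshold
best = int(np.argmin(dists))
if dists[best] < GATE:
    s, P = update(s_pred, P_pred, detections[best])
print(dists, s)
```

Unlike a Euclidean distance, the Mahalanobis distance scales each residual by the predicted uncertainty. This is the "stochastic information" the abstract refers to.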

Interpretable Rumor Detection in Microblogs by Attending to User Interactions


Title	Interpretable Rumor Detection in Microblogs by Attending to User Interactions
Authors	Ling Min Serena Khoo, Hai Leong Chieu, Zhong Qian, Jing Jiang
Abstract	We address rumor detection by learning to differentiate between the community’s response to real and fake claims in microblogs. Existing state-of-the-art models are tree models that model conversational trees. However, in social media, a user posting a reply might be replying to the entire thread rather than to a specific user. We propose a post-level attention model (PLAN) that models long-distance interactions between tweets with the multi-head attention mechanism of a transformer network. We investigate two variants of this model: (1) a structure-aware self-attention model (StA-PLAN) that incorporates tree structure information in the transformer network, and (2) a hierarchical token and post-level attention model (StA-HiTPLAN) that learns a sentence representation with token-level self-attention. To the best of our knowledge, we are the first to evaluate our models on two rumor detection data sets: the PHEME data set and the Twitter15 and Twitter16 data sets. We show that our best models outperform current state-of-the-art models on both data sets. Moreover, the attention mechanism allows us to explain rumor detection predictions at both the token level and the post level.
Tasks
Published	2020-01-29
URL	https://arxiv.org/abs/2001.10667v1
PDF	https://arxiv.org/pdf/2001.10667v1.pdf
PWC	https://paperswithcode.com/paper/interpretable-rumor-detection-in-microblogs
Repo	https://github.com/serenaklm/rumor_detection
Framework	pytorch
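
Post-level attention lets every tweet in a thread attend to every other tweet, whatever the reply tree looks like. The sketch below shows standard multi-head scaled dot-product self-attention over one embedding per post. Assumptions:
- The projection matrices are random and untrained.
- The output projection and feed-forward layers of a full transformer block are omitted.
- This is not the PLAN implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(posts, Wq, Wk, Wv, n_heads):
    """posts: (N, d), one embedding per tweet in the thread.

    Returns the attended outputs (N, d) and per-head attention weights
    (n_heads, N, N); row i of a head's matrix says how much post i attends to each post.
    """
    N, d = posts.shape
    dh = d // n_heads

    def split(x):  # (N, d) -> (n_heads, N, dh)
        return x.reshape(N, n_heads, dh).transpose(1, 0, 2)

    q, k, v = split(posts @ Wq), split(posts @ Wk), split(posts @ Wv)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (n_heads, N, N)
    out = (weights @ v).transpose(1, 0, 2).reshape(N, d)
    return out, weights

rng = np.random.default_rng(0)
N, d, n_heads = 5, 8, 2
posts = rng.standard_normal((N, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out, weights = multi_head_self_attention(posts, Wq, Wk, Wv, n_heads)

# Post-level explanation: which post does the source tweet (row 0) attend to most?
source_focus = weights.mean(axis=0)[0]
most_attended = int(source_focus.argmax())
print(source_focus.round(3), most_attended)
```

Inspecting these weights is what makes post-level explanations possible.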

Multi-task self-supervised learning for Robust Speech Recognition


Title	Multi-task self-supervised learning for Robust Speech Recognition
Authors	Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio
Abstract	Despite the growing interest in unsupervised learning, extracting meaningful knowledge from unlabelled audio remains an open challenge. To take a step in this direction, we recently proposed a problem-agnostic speech encoder (PASE) that combines a convolutional encoder with multiple neural networks, called workers, tasked with solving self-supervised problems (i.e., ones that do not require manual annotations as ground truth). PASE was shown to capture relevant speech information, including speaker voice-print and phonemes. This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments. To this end, we employ an online speech distortion module that contaminates the input signals with a variety of random disturbances. We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks. Finally, we refine the set of workers used in self-supervision to encourage better cooperation. Results on TIMIT, DIRHA and CHiME-5 show that PASE+ significantly outperforms both the previous version of PASE and common acoustic features. Interestingly, PASE+ learns transferable representations suitable for highly mismatched acoustic conditions.
Tasks	Robust Speech Recognition, Speech Recognition
Published	2020-01-25
URL	https://arxiv.org/abs/2001.09239v1
PDF	https://arxiv.org/pdf/2001.09239v1.pdf
PWC	https://paperswithcode.com/paper/multi-task-self-supervised-learning-for-1
Repo	https://github.com/santi-pdp/pase
Framework	pytorch
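
An online distortion module corrupts each training example on the fly, so the encoder never sees exactly the same input twice. The sketch covers two common disturbance types, reverberation and additive noise at a target signal-to-noise ratio. Assumptions:
- The probabilities, SNR range, and toy impulse response are illustrative.
- The helper `distort` is hypothetical, not the PASE+ code.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale noise so that 10*log10(P_clean / P_noise) equals snr_db, then add it."""
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

def reverberate(clean, rir):
    """Convolve with a room impulse response and trim to the input length."""
    return np.convolve(clean, rir)[: len(clean)]

def distort(clean, rng, noise_bank, rir_bank, p=0.5, snr_range=(0.0, 20.0)):
    """Hypothetical online distortion: each transform is applied with probability p."""
    x = clean
    if rng.random() < p:
        x = reverberate(x, rir_bank[rng.integers(len(rir_bank))])
    if rng.random() < p:
        noise = noise_bank[rng.integers(len(noise_bank))][: len(x)]
        x = add_noise_at_snr(x, noise, rng.uniform(*snr_range))
    return x

rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic tone standing in for speech
noise_bank = [rng.standard_normal(sr)]
rir_bank = [np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)]  # toy decaying RIR

noisy = add_noise_at_snr(clean, noise_bank[0], 10.0)
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
distorted = distort(clean, rng, noise_bank, rir_bank)
print(measured_snr, distorted.shape)
```

Because the distortion is sampled per example, the encoder is pushed toward representations that are invariant to these disturbances.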

On Isometry Robustness of Deep 3D Point Cloud Models under Adversarial Attacks


Title	On Isometry Robustness of Deep 3D Point Cloud Models under Adversarial Attacks
Authors	Yue Zhao, Yuwei Wu, Caihua Chen, Andrew Lim
Abstract	While deep learning in the 3D domain has achieved revolutionary performance in many tasks, the robustness of these models has not been sufficiently studied. Regarding 3D adversarial samples, most existing works focus on manipulating local points, which may fail to invoke global geometric properties such as robustness under linear transformations that preserve Euclidean distance, i.e., isometries. In this work, we show that existing state-of-the-art deep 3D models are extremely vulnerable to isometry transformations. Using Thompson Sampling, we develop a black-box attack with a success rate over 95% on the ModelNet40 data set. Building on the Restricted Isometry Property, we propose a novel white-box attack framework based on spectral-norm perturbation. In contrast to previous works, our adversarial samples are experimentally shown to be strongly transferable. Evaluated on a sequence of prevailing 3D models, our white-box attack achieves success rates from 98.88% to 100%. It maintains a success rate over 95% even within an imperceptible rotation range of $[\pm 2.81^{\circ}]$.
Tasks
Published	2020-02-27
URL	https://arxiv.org/abs/2002.12222v2
PDF	https://arxiv.org/pdf/2002.12222v2.pdf
PWC	https://paperswithcode.com/paper/on-isometry-robustness-of-deep-3d-point-cloud
Repo	https://github.com/skywalker6174/3d-isometry-robust
Framework	pytorch
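
An isometry attack only changes a point cloud's pose, not its shape. The sketch samples a small 3D rotation and checks that all pairwise distances survive. Assumptions:
- Per-axis angles are drawn uniformly from ±2.81°, which is our own parametrization of the "rotation range".
- The Thompson Sampling search over rotations, which would query the target classifier, is not shown.

```python
import numpy as np

def rotation_matrix(ax, ay, az):
    """R = Rz @ Ry @ Rx for angles in radians; orthogonal with det(R) = 1."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def random_small_rotation(rng, max_deg=2.81):
    """Sample per-axis angles uniformly in [-max_deg, max_deg] degrees."""
    angles = np.deg2rad(rng.uniform(-max_deg, max_deg, size=3))
    return rotation_matrix(*angles)

def pairwise_distances(points):
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

rng = np.random.default_rng(0)
cloud = rng.standard_normal((256, 3))  # stand-in for a ModelNet40 sample
R = random_small_rotation(rng)
rotated = cloud @ R.T                  # rotate every point
print(np.abs(pairwise_distances(cloud) - pairwise_distances(rotated)).max())
```

The geometry is unchanged up to floating-point error. Any change in a model's prediction on `rotated` therefore reflects sensitivity to pose rather than to shape.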

W2S: A Joint Denoising and Super-Resolution Dataset


Title	W2S: A Joint Denoising and Super-Resolution Dataset
Authors	Ruofan Zhou, Majed El Helou, Daniel Sage, Thierry Laroche, Arne Seitz, Sabine Süsstrunk
Abstract	Denoising and super-resolution (SR) are fundamental tasks in imaging. These two restoration tasks are well covered in the literature, but only separately. Given a noisy low-resolution (LR) input image, it is still unclear what the best approach would be to obtain a noise-free high-resolution (HR) image. To study joint denoising and super-resolution (JDSR), a dataset containing pairs of noisy LR images and the corresponding HR images is fundamental. We propose such a novel JDSR dataset, Widefield2SIM (W2S), acquired using microscopy equipment and techniques. W2S consists of 144,000 real fluorescence microscopy images, used to form a total of 360 sets of images. Each set consists of noisy LR images with different noise levels, a noise-free LR image, and a corresponding high-quality HR image. W2S allows us to benchmark the combinations of 6 denoising methods and 6 SR methods. We show that state-of-the-art SR networks perform very poorly on noisy inputs, with a loss reaching 14 dB relative to noise-free inputs. Our evaluation also shows that applying the best denoiser in terms of reconstruction error followed by the best SR method does not yield the best result. The best denoising PSNR can, for instance, come at the expense of a loss in high frequencies, which is detrimental for SR methods. We lastly demonstrate that a lightweight SR network with a novel texture loss, trained specifically for JDSR, outperforms any combination of state-of-the-art deep denoising and SR networks.
Tasks	Denoising, Super-Resolution
Published	2020-03-12
URL	https://arxiv.org/abs/2003.05961v1
PDF	https://arxiv.org/pdf/2003.05961v1.pdf
PWC	https://paperswithcode.com/paper/w2s-a-joint-denoising-and-super-resolution
Repo	https://github.com/widefield2sim/w2s
Framework	none
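
Peak signal-to-noise ratio, PSNR = 10·log10(MAX² / MSE), is the standard reconstruction metric behind comparisons like these. The sketch assumes images scaled to [0, 1], so MAX = 1. It also shows the arithmetic behind a dB gap: if the 14 dB loss is read as a PSNR difference, it corresponds to a mean squared error roughly 25 times larger.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))

ref = np.full((4, 4), 0.5)
p_small = psnr(ref, ref + 0.1)  # constant error 0.1 -> MSE 0.01 -> 20 dB

# A PSNR drop of D dB multiplies the MSE by 10 ** (D / 10).
mse_ratio_for_14db = 10 ** (14 / 10)
print(p_small, mse_ratio_for_14db)
```

Because PSNR averages over all pixels, a denoiser can score well while smoothing away high-frequency texture. This is the trade-off the abstract points out.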

The Chef’s Hat Simulation Environment for Reinforcement-Learning-Based Agents


Title	The Chef’s Hat Simulation Environment for Reinforcement-Learning-Based Agents
Authors	Pablo Barros, Anne C. Bloem, Inge M. Hootsmans, Lena M. Opheij, Romain H. A. Toebosch, Emilia Barakova, Alessandra Sciutti
Abstract	Achieving social interaction within Human-Robot Interaction (HRI) environments is a very challenging task. Most current research focuses on Wizard-of-Oz approaches, which neglect recent developments in intelligent robots. On the other hand, real-world scenarios usually do not provide the control and reproducibility that learning algorithms need. In this paper, we propose a virtual simulation environment that implements the Chef’s Hat card game, designed to be used in HRI scenarios, to provide a controllable and reproducible scenario for reinforcement-learning algorithms.
Tasks
Published	2020-03-12
URL	https://arxiv.org/abs/2003.05861v1
PDF	https://arxiv.org/pdf/2003.05861v1.pdf
PWC	https://paperswithcode.com/paper/the-chefs-hat-simulation-environment-for
Repo	https://github.com/pablovin/ChefsHatGYM
Framework	none

Pre-trained Models for Natural Language Processing: A Survey


Title	Pre-trained Models for Natural Language Processing: A Survey
Authors	Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang
Abstract	Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) into a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs based on a taxonomy with four perspectives. Next, we describe how to adapt the knowledge of PTMs to downstream tasks. Finally, we outline some potential directions for future PTM research. This survey is intended as a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.
Tasks	Representation Learning
Published	2020-03-18
URL	https://arxiv.org/abs/2003.08271v2
PDF	https://arxiv.org/pdf/2003.08271v2.pdf
PWC	https://paperswithcode.com/paper/pre-trained-models-for-natural-language
Repo	https://github.com/tomohideshibata/BERT-related-papers
Framework	pytorch

Irony Detection in a Multilingual Context


Title	Irony Detection in a Multilingual Context
Authors	Bilal Ghanem, Jihen Karoui, Farah Benamara, Paolo Rosso, Véronique Moriceau
Abstract	This paper proposes the first multilingual (French, English and Arabic) and multicultural (Indo-European languages vs. less culturally close languages) irony detection system. We employ both feature-based models and neural architectures using monolingual word representations. We compare the performance of these systems with state-of-the-art systems to identify their capabilities. We show that these monolingual models, trained separately on different languages using multilingual word representations or text-based features, can open the door to irony detection in languages that lack annotated irony data.
Tasks
Published	2020-02-06
URL	https://arxiv.org/abs/2002.02427v1
PDF	https://arxiv.org/pdf/2002.02427v1.pdf
PWC	https://paperswithcode.com/paper/irony-detection-in-a-multilingual-context
Repo	https://github.com/bilalghanem/multilingual_irony
Framework	none

Adversarial Disentanglement with Grouped Observations


Title	Adversarial Disentanglement with Grouped Observations
Authors	Jozsef Nemeth
Abstract	We consider disentangling the representations of the relevant attributes of the data (content) from all other factors of variation (style) using Variational Autoencoders. Some recent works addressed this problem by utilizing grouped observations, where the content attributes are assumed to be common within each group, while no supervised information is available on the style factors. In many cases, however, these methods fail to prevent the models from also using the style variables to encode content-related features. This work supplements these algorithms with a method that eliminates the content information in the style representations. To this end, the training objective is augmented to minimize an appropriately defined mutual information term in an adversarial way. Experimental results and comparisons on image datasets show that the resulting method can efficiently separate the content- and style-related attributes and generalizes to unseen data.
Tasks
Published	2020-01-14
URL	https://arxiv.org/abs/2001.04761v1
PDF	https://arxiv.org/pdf/2001.04761v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-disentanglement-with-grouped
Repo	https://github.com/jonemeth/aaai20
Framework	tf

Efficient Policy Learning from Surrogate-Loss Classification Reductions


Title	Efficient Policy Learning from Surrogate-Loss Classification Reductions
Authors	Andrew Bennett, Nathan Kallus
Abstract	Recent work on policy learning from observational data has highlighted the importance of efficient policy evaluation and has proposed reductions to weighted (cost-sensitive) classification. However, efficient policy evaluation need not yield efficient estimation of policy parameters. We consider the estimation problem given by a weighted surrogate-loss classification reduction of policy learning with any score function, either direct, inverse-propensity weighted, or doubly robust. We show that, under a correct specification assumption, the weighted classification formulation need not be efficient for policy parameters. We draw a contrast to actual (possibly weighted) binary classification, where correct specification implies a parametric model, while for policy learning it only implies a semiparametric model. In light of this, we instead propose an estimation approach based on the generalized method of moments, which is efficient for the policy parameters. We propose a particular method based on recent developments in solving moment problems using neural networks and demonstrate the efficiency and regret benefits of this method empirically.
Tasks
Published	2020-02-12
URL	https://arxiv.org/abs/2002.05153v1
PDF	https://arxiv.org/pdf/2002.05153v1.pdf
PWC	https://paperswithcode.com/paper/efficient-policy-learning-from-surrogate-loss
Repo	https://github.com/CausalML/ESPRM
Framework	none
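
The weighted-classification reduction that the paper analyzes can be sketched for a binary action. The sketch has three parts:
1. Compute a score Γ_a per action.
2. Label each unit with the action whose score is higher, weighted by the score gap.
3. Fit a policy by minimizing a weighted surrogate loss.

This is the standard construction from the policy-learning literature, shown for illustration with a logistic surrogate and a linear policy as assumptions. It is not the paper's proposed generalized-method-of-moments estimator, which is not shown.

```python
import numpy as np

def dr_scores(y, a, e1, mu0, mu1):
    """Doubly robust scores (Gamma_0, Gamma_1) for a binary action a in {0, 1}.

    e1: estimated propensity P(A = 1 | X); mu0, mu1: estimated outcome regressions.
    With mu0 = mu1 = 0 this reduces to inverse-propensity weighting; dropping the
    correction terms gives the direct method.
    """
    g1 = mu1 + a * (y - mu1) / e1
    g0 = mu0 + (1 - a) * (y - mu0) / (1 - e1)
    return g0, g1

def weighted_classification_data(g0, g1):
    """Label = action with the higher score (+1 for action 1), weight = |score gap|."""
    diff = g1 - g0
    labels = np.where(diff > 0, 1, -1)
    return labels, np.abs(diff)

def weighted_logistic_loss(theta, X, labels, weights):
    """Logistic surrogate for a linear policy pi(x) = sign(x @ theta)."""
    margins = labels * (X @ theta)
    return float(np.mean(weights * np.logaddexp(0.0, -margins)))

# Two units, randomized with propensity 0.5 and no outcome model (pure IPW).
y = np.array([1.0, 2.0])
a = np.array([1, 0])
e1 = np.array([0.5, 0.5])
mu0 = mu1 = np.zeros(2)
g0, g1 = dr_scores(y, a, e1, mu0, mu1)
labels, weights = weighted_classification_data(g0, g1)
loss = weighted_logistic_loss(np.zeros(1), np.ones((2, 1)), labels, weights)
print(g0, g1, labels, weights, loss)
```

Minimizing `weighted_logistic_loss` over `theta` gives a policy. The abstract argues that, even when this model is correctly specified, the resulting parameter estimates need not be efficient.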

Diagnosing Colorectal Polyps in the Wild with Capsule Networks


Title	Diagnosing Colorectal Polyps in the Wild with Capsule Networks
Authors	Rodney LaLonde, Pujan Kandel, Concetto Spampinato, Michael B. Wallace, Ulas Bagci
Abstract	Colorectal cancer, largely arising from precursor lesions called polyps, remains one of the leading causes of cancer-related death worldwide. Current clinical standards require the resection and histopathological analysis of polyps because the accuracy and sensitivity of optical biopsy methods fall substantially below recommended levels. In this study, we design a novel capsule network architecture (D-Caps) to improve the viability of optical biopsy of colorectal polyps. Our proposed method introduces several technical novelties, including a novel capsule architecture with a capsule-average pooling (CAP) method to improve efficiency in large-scale image classification. We demonstrate improvements of as much as 43% over the previous state-of-the-art convolutional neural network (CNN) approach. This work provides an important benchmark on the new Mayo Polyp dataset, a significantly larger and more challenging dataset than those of previous polyp studies, with results stratified across all available categories, imaging devices and modalities, and focus modes to promote future direction into AI-driven colorectal cancer screening systems. Code is publicly available at https://github.com/lalonderodney/D-Caps.
Tasks	Image Classification
Published	2020-01-10
URL	https://arxiv.org/abs/2001.03305v1
PDF	https://arxiv.org/pdf/2001.03305v1.pdf
PWC	https://paperswithcode.com/paper/diagnosing-colorectal-polyps-in-the-wild-with
Repo	https://github.com/lalonderodney/D-Caps
Framework	none

Discriminator Soft Actor Critic without Extrinsic Rewards


Title	Discriminator Soft Actor Critic without Extrinsic Rewards
Authors	Daichi Nishio, Daiki Kuyoshi, Toi Tsuneda, Satoshi Yamane
Abstract	It is difficult to imitate well in unknown states from a small amount of expert data and sampling data. Supervised learning methods such as Behavioral Cloning do not require sampling data, but usually suffer from distribution shift. Methods based on reinforcement learning, such as inverse reinforcement learning and generative adversarial imitation learning (GAIL), can learn from only a small amount of expert data. However, they often need to interact with the environment. Soft Q imitation learning (SQIL) addressed these problems and was shown to learn efficiently by combining Behavioral Cloning and soft Q-learning with constant rewards. To make this algorithm more robust to distribution shift, we propose Discriminator Soft Actor Critic (DSAC). It uses a reward function based on adversarial inverse reinforcement learning instead of constant rewards. We evaluated it on PyBullet environments with only four expert trajectories.
Tasks	Imitation Learning, Q-Learning
Published	2020-01-19
URL	https://arxiv.org/abs/2001.06808v3
PDF	https://arxiv.org/pdf/2001.06808v3.pdf
PWC	https://paperswithcode.com/paper/discriminator-soft-actor-critic-without
Repo	https://github.com/dnishio/DSAC
Framework	none
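
The change DSAC makes can be pictured by contrasting SQIL-style constant rewards with a reward derived from a discriminator. Assumptions:
- The discriminator outputs D(s, a) in (0, 1), the probability that a state-action pair came from the expert.
- The reward takes the common adversarial-IRL form log D − log(1 − D), i.e. the discriminator's logit.
- The exact reward used in DSAC may differ; see the authors' repository.

```python
import numpy as np

def sqil_rewards(is_expert):
    """Constant rewards: 1 for demonstration transitions, 0 for agent-collected ones."""
    return np.where(is_expert, 1.0, 0.0)

def discriminator_rewards(d, eps=1e-8):
    """Reward log D - log(1 - D), i.e. the logit of the discriminator output.

    d: discriminator probabilities that each (s, a) pair came from the expert.
    Clipping keeps both logarithms finite.
    """
    d = np.clip(d, eps, 1.0 - eps)
    return np.log(d) - np.log1p(-d)

d = np.array([0.5, 0.9, 0.1])
r = discriminator_rewards(d)                                 # [0, log 9, -log 9]
r_const = sqil_rewards(np.array([True, False, False]))
print(r, r_const)
```

The learned reward varies smoothly with how expert-like a pair looks. A constant reward instead depends only on where the transition came from, which is the gap DSAC targets for robustness to distribution shift.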

Towards Automatic Face-to-Face Translation


Title	Towards Automatic Face-to-Face Translation
Authors	Prajwal K R, Rudrabha Mukhopadhyay, Jerin Philip, Abhishek Jha, Vinay Namboodiri, C. V. Jawahar
Abstract	In light of the recent breakthroughs in automatic machine translation systems, we propose a novel approach that we term “Face-to-Face Translation”. As today’s digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization. In this work, we create an automatic pipeline for this problem and demonstrate its impact on multiple real-world applications. First, we build a working speech-to-speech translation system by bringing together multiple existing modules from speech and language. We then move towards “Face-to-Face Translation” by incorporating a novel visual module, LipGAN, for generating realistic talking faces from the translated audio. Quantitative evaluation of LipGAN on the standard LRW test set shows that it significantly outperforms existing approaches across all standard metrics. We also subject our Face-to-Face Translation pipeline to multiple human evaluations and show that it can significantly improve the overall user experience for consuming and interacting with multimodal content across languages. Code, models and a demo video are publicly available. Demo video: https://www.youtube.com/watch?v=aHG6Oei8jF0 Code and models: https://github.com/Rudrabha/LipGAN
Tasks	Face to Face Translation, Machine Translation
Published	2020-03-01
URL	https://arxiv.org/abs/2003.00418v1
PDF	https://arxiv.org/pdf/2003.00418v1.pdf
PWC	https://paperswithcode.com/paper/towards-automatic-face-to-face-translation-1
Repo	https://github.com/Rudrabha/LipGAN
Framework	none