January 31, 2020


Paper Group AWR 396


Power up! Robust Graph Convolutional Network against Evasion Attacks based on Graph Powering. Jointly Learning Semantic Parser and Natural Language Generator via Dual Information Maximization. The State of Knowledge Distillation for Classification. Learning an Urban Air Mobility Encounter Model from Expert Preferences. A Similarity Measure for Mate …

Power up! Robust Graph Convolutional Network against Evasion Attacks based on Graph Powering

Title Power up! Robust Graph Convolutional Network against Evasion Attacks based on Graph Powering
Authors Ming Jin, Heng Chang, Wenwu Zhu, Somayeh Sojoudi
Abstract Graph convolutional networks (GCNs) are powerful tools for graph-structured data. However, they have recently been shown to be prone to topological attacks. Despite substantial efforts to search for new architectures, it remains a challenge to improve performance in both benign and adversarial situations simultaneously. In this paper, we re-examine the fundamental building block of GCN, the Laplacian operator, and highlight some basic flaws in the spatial and spectral domains. As an alternative, we propose an operator based on graph powering, and prove that it enjoys a desirable property of “spectral separation.” Based on the operator, we propose a robust learning paradigm, where the network is trained on a family of “smoothed” graphs that span a spatial and spectral range for generalizability. We also use the new operator in place of the classical Laplacian to construct an architecture with improved spectral robustness, expressivity and interpretability. The enhanced performance and robustness are demonstrated in extensive experiments.
Tasks
Published 2019-05-24
URL https://arxiv.org/abs/1905.10029v1
PDF https://arxiv.org/pdf/1905.10029v1.pdf
PWC https://paperswithcode.com/paper/power-up-robust-graph-convolutional-network
Repo https://github.com/GraphReshape/GraphReshape
Framework pytorch
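
The graph-powering idea described above can be illustrated with a small stand-alone sketch: build the k-step reachability graph from the adjacency matrix, normalize it, and use it in place of the usual normalized adjacency in a GCN propagation step. This is a minimal illustration with made-up dimensions, not the paper's actual operator or training scheme.

```python
import torch

def powered_adjacency(adj: torch.Tensor, k: int = 2) -> torch.Tensor:
    """k-step reachability graph: connect every pair of nodes within k hops."""
    a = adj + torch.eye(adj.size(0))                      # add self-loops
    return (torch.linalg.matrix_power(a, k) > 0).float()

def sym_normalize(adj: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
    d_inv_sqrt[torch.isinf(d_inv_sqrt)] = 0.0
    return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)

# One GCN-style propagation step on the powered graph: H' = ReLU(A_hat H W)
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [0., 0., 1., 0.]])
features, weight = torch.randn(4, 8), torch.randn(8, 16)
a_hat = sym_normalize(powered_adjacency(adj, k=2))
hidden = torch.relu(a_hat @ features @ weight)
print(hidden.shape)  # torch.Size([4, 16])
```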

Jointly Learning Semantic Parser and Natural Language Generator via Dual Information Maximization

Title Jointly Learning Semantic Parser and Natural Language Generator via Dual Information Maximization
Authors Hai Ye, Wenjie Li, Lu Wang
Abstract Semantic parsing aims to transform natural language (NL) utterances into formal meaning representations (MRs), whereas an NL generator achieves the reverse: producing an NL description for some given MRs. Despite this intrinsic connection, the two tasks are often studied separately in prior work. In this paper, we model the duality of these two tasks via a joint learning framework, and demonstrate its effectiveness in boosting the performance on both tasks. Concretely, we propose a novel method of dual information maximization (DIM) to regularize the learning process, where DIM empirically maximizes the variational lower bounds of expected joint distributions of NL and MRs. We further extend DIM to a semi-supervised setup (SemiDIM), which leverages unlabeled data of both tasks. Experiments on three datasets of dialogue management and code generation (and summarization) show that performance on both semantic parsing and NL generation can be consistently improved by DIM, in both supervised and semi-supervised setups.
Tasks Code Generation, Dialogue Management, Semantic Parsing
Published 2019-06-03
URL https://arxiv.org/abs/1906.00575v3
PDF https://arxiv.org/pdf/1906.00575v3.pdf
PWC https://paperswithcode.com/paper/190600575
Repo https://github.com/oceanypt/DIM
Framework none

The State of Knowledge Distillation for Classification

Title The State of Knowledge Distillation for Classification
Authors Fabian Ruffy, Karanbir Chahal
Abstract We survey various knowledge distillation (KD) strategies for simple classification tasks and implement a set of techniques that claim state-of-the-art accuracy. Our experiments using standardized model architectures, fixed compute budgets, and consistent training schedules indicate that many of these distillation results are hard to reproduce. This is especially apparent with methods using some form of feature distillation. Further examination reveals a lack of generalizability where these techniques may only succeed for specific architectures and training settings. We observe that appropriately tuned classical distillation in combination with a data augmentation training scheme gives an orthogonal improvement over other techniques. We validate this approach and open-source our code.
Tasks Data Augmentation
Published 2019-12-20
URL https://arxiv.org/abs/1912.10850v1
PDF https://arxiv.org/pdf/1912.10850v1.pdf
PWC https://paperswithcode.com/paper/the-state-of-knowledge-distillation-for
Repo https://github.com/karanchahal/distiller
Framework pytorch
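
For reference, the classical distillation objective the survey revisits combines a temperature-softened KL term with ordinary cross-entropy. Below is a minimal PyTorch sketch; the temperature and weighting are illustrative defaults, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Classical KD: alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE."""
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# toy usage with random logits
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels).item())
```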

Learning an Urban Air Mobility Encounter Model from Expert Preferences

Title Learning an Urban Air Mobility Encounter Model from Expert Preferences
Authors Sydney M. Katz, Anne-Claire Le Bihan, Mykel J. Kochenderfer
Abstract Airspace models have played an important role in the development and evaluation of aircraft collision avoidance systems for both manned and unmanned aircraft. As Urban Air Mobility (UAM) systems are being developed, we need new encounter models that are representative of their operational environment. Developing such models is challenging due to the lack of data on UAM behavior in the airspace. While previous encounter models for other aircraft types rely on large datasets to produce realistic trajectories, this paper presents an approach to encounter modeling that instead relies on expert knowledge. In particular, recent advances in preference-based learning are extended to tune an encounter model from expert preferences. The model takes the form of a stochastic policy for a Markov decision process (MDP) in which the reward function is learned from pairwise queries of a domain expert. We evaluate the performance of two querying methods that seek to maximize the information obtained from each query. Ultimately, we demonstrate a method for generating realistic encounter trajectories with only a few minutes of an expert’s time.
Tasks
Published 2019-07-12
URL https://arxiv.org/abs/1907.05575v1
PDF https://arxiv.org/pdf/1907.05575v1.pdf
PWC https://paperswithcode.com/paper/learning-an-urban-air-mobility-encounter
Repo https://github.com/sisl/UAMPreferences
Framework none
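
The reward-learning step described above can be illustrated with a Bradley-Terry-style model fit to pairwise preferences. The sketch below uses made-up trajectory features and plain gradient ascent; it is not the paper's exact formulation or querying strategy.

```python
import numpy as np

def fit_reward_from_preferences(pref_feats, rej_feats, lr=0.1, epochs=500):
    """Fit w so that r(traj) = w . phi(traj) explains expert preferences
    under a Bradley-Terry model: P(A preferred over B) = sigmoid(r(A) - r(B))."""
    w = np.zeros(pref_feats.shape[1])
    for _ in range(epochs):
        diff = (pref_feats - rej_feats) @ w          # r(preferred) - r(rejected) per query
        p = 1.0 / (1.0 + np.exp(-diff))              # predicted preference probability
        grad = ((1.0 - p)[:, None] * (pref_feats - rej_feats)).mean(axis=0)
        w += lr * grad                               # gradient ascent on the log-likelihood
    return w

# toy data: 3-dim trajectory features for 20 pairwise expert queries
rng = np.random.default_rng(0)
pref = rng.normal(size=(20, 3)) + np.array([1.0, 0.0, 0.0])   # preferred trajectories
rej = rng.normal(size=(20, 3))                                # rejected trajectories
print(fit_reward_from_preferences(pref, rej))
```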

A Similarity Measure for Material Appearance

Title A Similarity Measure for Material Appearance
Authors Manuel Lagunas, Sandra Malpica, Ana Serrano, Elena Garces, Diego Gutierrez, Belen Masia
Abstract We present a model to measure the similarity in appearance between different materials, which correlates with human similarity judgments. We first create a database of 9,000 rendered images depicting objects with varying materials, shapes and illumination. We then gather data on perceived similarity from crowdsourced experiments; our analysis of over 114,840 answers suggests that indeed a shared perception of appearance similarity exists. We feed this data to a deep learning architecture with a novel loss function, which learns a feature space for materials that correlates with such perceived appearance similarity. Our evaluation shows that our model outperforms existing metrics. Lastly, we demonstrate several applications enabled by our metric, including appearance-based search for material suggestions, database visualization, clustering and summarization, and gamut mapping.
Tasks Image Similarity Search
Published 2019-05-04
URL https://arxiv.org/abs/1905.01562v1
PDF https://arxiv.org/pdf/1905.01562v1.pdf
PWC https://paperswithcode.com/paper/a-similarity-measure-for-material-appearance
Repo https://github.com/mlagunas/material-appearance-similarity
Framework pytorch
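
A generic way to learn such a perceptual feature space is metric learning on crowdsourced similarity triplets. The sketch below uses a standard triplet margin loss and made-up feature dimensions rather than the paper's specific loss function.

```python
import torch
import torch.nn as nn

# A small embedding network over precomputed image features (dimensions are illustrative).
embed = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
triplet = nn.TripletMarginLoss(margin=0.2)

anchor = embed(torch.randn(16, 512))    # reference material
positive = embed(torch.randn(16, 512))  # judged similar by crowd workers
negative = embed(torch.randn(16, 512))  # judged dissimilar
loss = triplet(anchor, positive, negative)
loss.backward()
print(loss.item())
```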

Audio Caption: Listen and Tell

Title Audio Caption: Listen and Tell
Authors Mengyue Wu, Heinrich Dinkel, Kai Yu
Abstract An increasing amount of research has shed light on machine perception of audio events, most of which concerns detection and classification tasks. However, human-like perception of audio scenes involves not only detecting and classifying audio sounds, but also summarizing the relationship between different audio events. Comparable research, such as image captioning, has been conducted, yet the audio field is still quite barren. This paper introduces a manually annotated dataset for audio captioning. The purpose is to automatically generate natural sentences for audio scene description and to bridge the gap between machine perception of audio and images. The whole dataset is labelled in Mandarin, and we also include translated English annotations. A baseline encoder-decoder model is provided for both English and Mandarin. Similar BLEU scores are derived for both languages: our model can generate understandable and data-related captions based on the dataset.
Tasks
Published 2019-02-25
URL https://arxiv.org/abs/1902.09254v4
PDF https://arxiv.org/pdf/1902.09254v4.pdf
PWC https://paperswithcode.com/paper/audio-caption-listen-and-tell
Repo https://github.com/richermans/AudioCaption
Framework pytorch
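
A minimal encoder-decoder baseline in the spirit of the one described above can be sketched as a GRU over audio feature frames feeding a GRU caption decoder. All dimensions and the vocabulary size below are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    """Toy encoder-decoder: GRU over audio frames, GRU decoder over caption tokens."""
    def __init__(self, feat_dim=64, hidden=128, vocab=1000):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, audio_feats, caption_tokens):
        _, h = self.encoder(audio_feats)        # summarize the audio clip
        emb = self.embed(caption_tokens)        # teacher-forced caption inputs
        dec_out, _ = self.decoder(emb, h)       # condition the decoder on the audio state
        return self.out(dec_out)                # per-step vocabulary logits

model = CaptionModel()
logits = model(torch.randn(4, 200, 64), torch.randint(0, 1000, (4, 12)))
print(logits.shape)  # torch.Size([4, 12, 1000])
```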

Data augmentation using learned transformations for one-shot medical image segmentation

Title Data augmentation using learned transformations for one-shot medical image segmentation
Authors Amy Zhao, Guha Balakrishnan, Frédo Durand, John V. Guttag, Adrian V. Dalca
Abstract Image segmentation is an important task in many medical applications. Methods based on convolutional neural networks attain state-of-the-art accuracy; however, they typically rely on supervised training with large labeled datasets. Labeling medical images requires significant expertise and time, and typical hand-tuned approaches for data augmentation fail to capture the complex variations in such images. We present an automated data augmentation method for synthesizing labeled medical images. We demonstrate our method on the task of segmenting magnetic resonance imaging (MRI) brain scans. Our method requires only a single segmented scan, and leverages other unlabeled scans in a semi-supervised approach. We learn a model of transformations from the images, and use the model along with the labeled example to synthesize additional labeled examples. Each transformation consists of a spatial deformation field and an intensity change, enabling the synthesis of complex effects such as variations in anatomy and image acquisition procedures. We show that training a supervised segmenter with these new examples provides significant improvements over state-of-the-art methods for one-shot biomedical image segmentation. Our code is available at https://github.com/xamyzhao/brainstorm.
Tasks Data Augmentation, Medical Image Segmentation, Semantic Segmentation
Published 2019-02-25
URL http://arxiv.org/abs/1902.09383v2
PDF http://arxiv.org/pdf/1902.09383v2.pdf
PWC https://paperswithcode.com/paper/data-augmentation-using-learned-transforms
Repo https://github.com/xamyzhao/brainstorm
Framework tf
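
The key mechanics, applying one spatial deformation field to both an image and its label map so the synthesized pair stays aligned, can be sketched in a few lines. The reference implementation is in TensorFlow; the toy version below uses PyTorch's grid_sample with a random flow as a stand-in for a learned deformation.

```python
import torch
import torch.nn.functional as F

def warp(volume, flow, mode="bilinear"):
    """Warp a batch of 2D images (N, C, H, W) with a dense flow field (N, H, W, 2) in pixels."""
    n, _, h, w = volume.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float().unsqueeze(0) + flow   # identity grid + flow
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0                  # normalize to [-1, 1]
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0
    return F.grid_sample(volume, grid, mode=mode, align_corners=True)

image = torch.rand(1, 1, 64, 64)
labels = torch.randint(0, 4, (1, 1, 64, 64)).float()
flow = torch.randn(1, 64, 64, 2) * 2.0               # stand-in for a learned deformation field
aug_image = warp(image, flow) * 1.1 + 0.05           # spatial warp plus a simple intensity change
aug_labels = warp(labels, flow, mode="nearest")      # same warp keeps the label map aligned
print(aug_image.shape, aug_labels.shape)
```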

DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation

Title DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation
Authors Deepanway Ghosal, Navonil Majumder, Soujanya Poria, Niyati Chhaya, Alexander Gelbukh
Abstract Emotion recognition in conversation (ERC) has recently received much attention from researchers due to its potential widespread applications in diverse areas, such as health-care, education, and human resources. In this paper, we present Dialogue Graph Convolutional Network (DialogueGCN), a graph neural network based approach to ERC. We leverage self- and inter-speaker dependency of the interlocutors to model conversational context for emotion recognition. Through the graph network, DialogueGCN addresses context propagation issues present in the current RNN-based methods. We empirically show that this method alleviates such issues, while outperforming the current state of the art on a number of benchmark emotion classification datasets.
Tasks Emotion Classification, Emotion Recognition, Emotion Recognition in Conversation
Published 2019-08-30
URL https://arxiv.org/abs/1908.11540v1
PDF https://arxiv.org/pdf/1908.11540v1.pdf
PWC https://paperswithcode.com/paper/dialoguegcn-a-graph-convolutional-neural
Repo https://github.com/SenticNet/conv-emotion
Framework pytorch
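
The speaker-aware graph convolution can be caricatured as a relational GCN layer with one transform per edge type (for example same-speaker vs. inter-speaker edges between utterances). The adjacency matrices and dimensions below are placeholders, not the model's actual graph construction.

```python
import torch
import torch.nn as nn

class RelationalGraphLayer(nn.Module):
    """Toy relational graph convolution: one linear transform per edge type."""
    def __init__(self, dim, num_relations=2):
        super().__init__()
        self.rel = nn.ModuleList([nn.Linear(dim, dim, bias=False) for _ in range(num_relations)])
        self.self_loop = nn.Linear(dim, dim)

    def forward(self, x, adjs):
        # x: (num_utterances, dim); adjs: one row-normalized (N, N) matrix per relation
        out = self.self_loop(x)
        for adj, lin in zip(adjs, self.rel):
            out = out + adj @ lin(x)
        return torch.relu(out)

utterances = torch.randn(6, 32)                  # context-encoded utterance features
same_speaker = torch.eye(6)                      # placeholder same-speaker edges
inter_speaker = torch.ones(6, 6) / 6             # placeholder inter-speaker edges
layer = RelationalGraphLayer(32)
print(layer(utterances, [same_speaker, inter_speaker]).shape)  # torch.Size([6, 32])
```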

Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud

Title Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud
Authors Xinshuo Weng, Kris Kitani
Abstract Monocular 3D scene understanding tasks, such as object size estimation, heading angle estimation and 3D localization, are challenging. Successful modern-day methods for 3D scene understanding require the use of a 3D sensor. On the other hand, single-image-based methods have significantly worse performance. In this work, we aim at bridging the performance gap between 3D sensing and 2D sensing for 3D object detection by enhancing LiDAR-based algorithms to work with single image input. Specifically, we perform monocular depth estimation and lift the input image to a point cloud representation, which we call a pseudo-LiDAR point cloud. Then we can train a LiDAR-based 3D detection network with our pseudo-LiDAR end-to-end. Following the pipeline of two-stage 3D detection algorithms, we detect 2D object proposals in the input image and extract a point cloud frustum from the pseudo-LiDAR for each proposal. Then an oriented 3D bounding box is detected for each frustum. To handle the large amount of noise in the pseudo-LiDAR, we propose two innovations: (1) use a 2D-3D bounding box consistency constraint, adjusting the predicted 3D bounding box to have a high overlap with its corresponding 2D proposal after projecting onto the image; (2) use the instance mask instead of the bounding box as the representation of 2D proposals, in order to reduce the number of points not belonging to the object in the point cloud frustum. Through our evaluation on the KITTI benchmark, we achieve the top-ranked performance on both bird’s eye view and 3D object detection among all monocular methods, effectively quadrupling the performance over previous state-of-the-art. Our code is available at https://github.com/xinshuoweng/Mono3D_PLiDAR.
Tasks 3D Object Detection, Depth Estimation, Monocular Depth Estimation, Object Detection, Scene Understanding
Published 2019-03-23
URL https://arxiv.org/abs/1903.09847v4
PDF https://arxiv.org/pdf/1903.09847v4.pdf
PWC https://paperswithcode.com/paper/monocular-3d-object-detection-with-pseudo
Repo https://github.com/xinshuoweng/mono3D_PLiDAR
Framework pytorch
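
The "lifting" step from a predicted depth map to a pseudo-LiDAR point cloud is plain pinhole back-projection. The sketch below uses illustrative KITTI-like intrinsics and a random depth map in place of an actual monocular depth estimate.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) into an (H*W, 3) point cloud in camera coordinates."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    x = (us - cx) * depth / fx
    y = (vs - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# illustrative intrinsics roughly in the KITTI ballpark
depth = np.random.uniform(2.0, 60.0, size=(375, 1242))   # stand-in for predicted depth
cloud = depth_to_pseudo_lidar(depth, fx=721.5, fy=721.5, cx=621.0, cy=187.5)
print(cloud.shape)  # (465750, 3)
```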

Dual Student: Breaking the Limits of the Teacher in Semi-supervised Learning

Title Dual Student: Breaking the Limits of the Teacher in Semi-supervised Learning
Authors Zhanghan Ke, Daoye Wang, Qiong Yan, Jimmy Ren, Rynson W. H. Lau
Abstract Recently, consistency-based methods have achieved state-of-the-art results in semi-supervised learning (SSL). These methods always involve two roles, an explicit or implicit teacher model and a student model, and penalize predictions under different perturbations by a consistency constraint. However, the weights of these two roles are tightly coupled, since the teacher is essentially an exponential moving average (EMA) of the student. In this work, we show that the coupled EMA teacher causes a performance bottleneck. To address this problem, we introduce Dual Student, which replaces the teacher with another student. We also define a novel concept, the stable sample, and design a stabilization constraint based on it so that our structure can be trained. Further, we discuss two variants of our method, which produce even higher performance. Extensive experiments show that our method improves the classification performance significantly on several main SSL benchmarks. Specifically, it reduces the error rate of the 13-layer CNN from 16.84% to 12.39% on CIFAR-10 with 1k labels and from 34.10% to 31.56% on CIFAR-100 with 10k labels. In addition, our method also achieves a clear improvement in domain adaptation.
Tasks Semi-Supervised Image Classification, Unsupervised Domain Adaptation
Published 2019-09-03
URL https://arxiv.org/abs/1909.01804v1
PDF https://arxiv.org/pdf/1909.01804v1.pdf
PWC https://paperswithcode.com/paper/dual-student-breaking-the-limits-of-the
Repo https://github.com/ZHKKKe/DualStudent
Framework pytorch
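
The stable-sample idea can be loosely illustrated: a sample counts as "stable" for a student when its prediction is confident and survives input perturbation, and the two students are aligned only on such samples. This is a simplified reading with made-up tensors, not the paper's exact stabilization constraint.

```python
import torch

def stable_mask(probs_clean, probs_noisy, conf=0.5):
    """A sample is 'stable' for a student if its prediction survives perturbation
    and is confident (a loose reading of the paper's stable-sample notion)."""
    same = probs_clean.argmax(1) == probs_noisy.argmax(1)
    confident = probs_clean.max(1).values > conf
    return same & confident

# toy predictions from two independent students on clean / perturbed inputs
pa, pa_noisy = torch.rand(8, 10).softmax(1), torch.rand(8, 10).softmax(1)
pb, pb_noisy = torch.rand(8, 10).softmax(1), torch.rand(8, 10).softmax(1)

mask_a, mask_b = stable_mask(pa, pa_noisy), stable_mask(pb, pb_noisy)
# pull student A towards student B only where B is stable, and vice versa
align = (mask_b.float() * ((pa - pb.detach()) ** 2).mean(1)).mean() \
      + (mask_a.float() * ((pb - pa.detach()) ** 2).mean(1)).mean()
print(align.item())
```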

Semi-Supervised Learning with Normalizing Flows

Title Semi-Supervised Learning with Normalizing Flows
Authors Pavel Izmailov, Polina Kirichenko, Marc Finzi, Andrew Gordon Wilson
Abstract Normalizing flows transform a latent distribution through an invertible neural network for a flexible and pleasingly simple approach to generative modelling, while preserving an exact likelihood. We propose FlowGMM, an end-to-end approach to generative semi-supervised learning with normalizing flows, using a latent Gaussian mixture model. FlowGMM is distinct in its simplicity, unified treatment of labelled and unlabelled data with an exact likelihood, interpretability, and broad applicability beyond image data. We show promising results on a wide range of applications, including AG-News and Yahoo Answers text data, tabular data, and semi-supervised image classification. We also show that FlowGMM can discover interpretable structure, provide real-time optimization-free feature visualizations, and specify well-calibrated predictive distributions.
Tasks Image Classification, Semi-Supervised Image Classification
Published 2019-12-30
URL https://arxiv.org/abs/1912.13025v1
PDF https://arxiv.org/pdf/1912.13025v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-learning-with-normalizing-1
Repo https://github.com/izmailovpavel/flowgmm
Framework pytorch
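
The latent-space likelihoods behind this approach can be sketched numerically: labeled points are scored under their class's latent Gaussian, unlabeled points under the full mixture, with the flow's log-determinant added in both cases. Below, z and the log-det are random stand-ins for an actual invertible flow, and the component parameters are made up.

```python
import torch
from torch.distributions import MultivariateNormal

dim, n_classes = 2, 3
means = torch.randn(n_classes, dim) * 3.0                                 # made-up class means
comps = [MultivariateNormal(means[k], torch.eye(dim)) for k in range(n_classes)]

def labeled_loglik(z, y, log_det):
    """log p(x, y): score z under its class's Gaussian (uniform class prior omitted)."""
    lp = torch.stack([comps[int(k)].log_prob(zi) for zi, k in zip(z, y)])
    return lp + log_det

def unlabeled_loglik(z, log_det):
    """log p(x): score z under the full mixture with equal component weights."""
    all_lp = torch.stack([c.log_prob(z) for c in comps], dim=1)           # (N, K)
    mix = torch.logsumexp(all_lp, dim=1) - torch.log(torch.tensor(float(n_classes)))
    return mix + log_det

z = torch.randn(5, dim)      # stand-in for f(x), the flow's latent representation
log_det = torch.zeros(5)     # stand-in for log |det df/dx|
print(labeled_loglik(z, torch.tensor([0, 1, 2, 0, 1]), log_det))
print(unlabeled_loglik(z, log_det))
```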

Exploring Self-Supervised Regularization for Supervised and Semi-Supervised Learning

Title Exploring Self-Supervised Regularization for Supervised and Semi-Supervised Learning
Authors Phi Vu Tran
Abstract Recent advances in semi-supervised learning have shown tremendous potential in overcoming a major barrier to the success of modern machine learning algorithms: access to vast amounts of human-labeled training data. Previous algorithms based on consistency regularization can harness the abundance of unlabeled data to produce impressive results on a number of semi-supervised benchmarks, approaching the performance of strong supervised baselines using only a fraction of the available labeled data. In this work, we challenge the long-standing success of consistency regularization by introducing self-supervised regularization as the basis for combining semantic feature representations from unlabeled data. We perform extensive comparative experiments to demonstrate the effectiveness of self-supervised regularization for supervised and semi-supervised image classification on SVHN, CIFAR-10, and CIFAR-100 benchmark datasets. We present two main results: (1) models augmented with self-supervised regularization significantly improve upon traditional supervised classifiers without the need for unlabeled data; (2) together with unlabeled data, our models yield semi-supervised performance competitive with, and in many cases exceeding, prior state-of-the-art consistency baselines. Lastly, our models have the practical utility of being efficiently trained end-to-end and require no additional hyper-parameters to tune for optimal performance beyond the standard set for training neural networks. Reference code and data are available at https://github.com/vuptran/sesemi
Tasks Image Classification, Multi-Task Learning, Semi-Supervised Image Classification
Published 2019-06-25
URL https://arxiv.org/abs/1906.10343v2
PDF https://arxiv.org/pdf/1906.10343v2.pdf
PWC https://paperswithcode.com/paper/semi-supervised-learning-with-self-supervised
Repo https://github.com/vuptran/sesemi
Framework tf
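
The self-supervised regularizer described above amounts to adding an auxiliary head on the shared backbone and training it on a proxy task over unlabeled images. The sketch below uses 90-degree rotation prediction as the proxy task and tiny made-up networks; the paper's exact transformation set and architecture differ.

```python
import torch
import torch.nn as nn

# A shared backbone with two heads: the supervised classifier and an auxiliary
# self-supervised head that predicts which rotation was applied.
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
cls_head = nn.Linear(16, 10)
rot_head = nn.Linear(16, 4)
ce = nn.CrossEntropyLoss()

def rotate_batch(x):
    """Rotate each image by a random multiple of 90 degrees; return images and rotation labels."""
    k = torch.randint(0, 4, (x.size(0),))
    rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2)) for img, r in zip(x, k)])
    return rotated, k

labeled_x, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
unlabeled_x = torch.randn(16, 3, 32, 32)

sup_loss = ce(cls_head(backbone(labeled_x)), labels)
rot_x, rot_y = rotate_batch(unlabeled_x)
selfsup_loss = ce(rot_head(backbone(rot_x)), rot_y)
loss = sup_loss + selfsup_loss        # the self-supervised term acts as the regularizer
loss.backward()
```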

Addressing Overfitting on Pointcloud Classification using Atrous XCRF

Title Addressing Overfitting on Pointcloud Classification using Atrous XCRF
Authors Hasan Asyari Arief, Ulf Geir Indahl, Geir-Harald Strand, Håvard Tveite
Abstract Advances in techniques for automated classification of pointcloud data introduce great opportunities for many new and existing applications. However, with a limited number of labeled points, automated classification by a machine learning model is prone to overfitting and poor generalization. The present paper addresses this problem by inducing controlled noise (on a trained model) generated by invoking conditional random field similarity penalties using nearby features. The method is called Atrous XCRF and works by forcing a trained model to respect the similarity penalties provided by unlabeled data. In a benchmark study carried out using the ISPRS 3D labeling dataset, our technique achieves 84.97% overall accuracy and an F1 score of 71.05%. The result is on par with the current best model for the benchmark dataset and has the highest F1 score.
Tasks
Published 2019-02-08
URL http://arxiv.org/abs/1902.03088v1
PDF http://arxiv.org/pdf/1902.03088v1.pdf
PWC https://paperswithcode.com/paper/addressing-overfitting-on-pointcloud
Repo https://github.com/hasanari/A-XCRF
Framework tf

A Generative Map for Image-based Camera Localization

Title A Generative Map for Image-based Camera Localization
Authors Mingpan Guo, Stefan Matthes, Jiaojiao Ye, Hao Shen
Abstract In image-based camera localization systems, information about the environment is usually stored in some representation, which can be referred to as a map. Conventionally, most maps are built upon hand-crafted features. Recently, neural networks have attracted attention as a data-driven map representation, and have shown promising results in visual localization. However, these neural network maps are generally hard for humans to interpret. A readable map is not only accessible to humans, but also provides a way to verify the map when the ground-truth pose is unavailable. To tackle this problem, we propose Generative Map, a new framework for learning human-readable neural network maps, by combining a generative model with the Kalman filter, which also allows it to incorporate additional sensor information such as stereo visual odometry. For evaluation, we use real-world images from the 7-Scenes and Oxford RobotCar datasets. We demonstrate that our Generative Map can be queried with a pose of interest from the test sequence to predict an image which closely resembles the true scene. For localization, we show that Generative Map achieves comparable performance with current regression models. Moreover, our framework is trained completely from scratch, unlike regression models which rely on large ImageNet-pretrained networks.
Tasks Camera Localization, Visual Localization, Visual Odometry
Published 2019-02-18
URL http://arxiv.org/abs/1902.11124v4
PDF http://arxiv.org/pdf/1902.11124v4.pdf
PWC https://paperswithcode.com/paper/a-generative-map-for-image-based-camera
Repo https://github.com/Mingpan/generative_map
Framework tf
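
The filtering side of the framework can be illustrated with a textbook Kalman filter over a pose state. The constant-velocity model, noise levels, and observations below are made up and only stand in for fusing pose estimates with odometry-style motion updates.

```python
import numpy as np

# Minimal constant-velocity Kalman filter over a 2D pose (x, y) with observed position.
dt = 1.0
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)   # we observe position only
Q, R = np.eye(4) * 1e-2, np.eye(2) * 1e-1                  # process / observation noise

x, P = np.zeros(4), np.eye(4)
for z in [np.array([1.0, 0.5]), np.array([2.1, 0.9]), np.array([2.9, 1.6])]:
    # predict with the motion model
    x, P = F @ x, F @ P @ F.T + Q
    # update with a (noisy) pose observation z
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
print(x[:2])   # filtered position estimate
```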

Knowledge-Enriched Transformer for Emotion Detection in Textual Conversations

Title Knowledge-Enriched Transformer for Emotion Detection in Textual Conversations
Authors Peixiang Zhong, Di Wang, Chunyan Miao
Abstract Messages in human conversations inherently convey emotions. The task of detecting emotions in textual conversations leads to a wide range of applications such as opinion mining in social networks. However, enabling machines to analyze emotions in conversations is challenging, partly because humans often rely on the context and commonsense knowledge to express emotions. In this paper, we address these challenges by proposing a Knowledge-Enriched Transformer (KET), where contextual utterances are interpreted using hierarchical self-attention and external commonsense knowledge is dynamically leveraged using a context-aware affective graph attention mechanism. Experiments on multiple textual conversation datasets demonstrate that both context and commonsense knowledge are consistently beneficial to the emotion detection performance. In addition, the experimental results show that our KET model outperforms the state-of-the-art models on most of the tested datasets in F1 score.
Tasks Emotion Recognition in Conversation, Opinion Mining
Published 2019-09-24
URL https://arxiv.org/abs/1909.10681v2
PDF https://arxiv.org/pdf/1909.10681v2.pdf
PWC https://paperswithcode.com/paper/knowledge-enriched-transformer-for-emotion
Repo https://github.com/zhongpeixiang/KET
Framework pytorch