January 27, 2020

Paper Group ANR 1079

Reachability and Coverage Planning for Connected Agents: Extended Version


Title	Reachability and Coverage Planning for Connected Agents: Extended Version
Authors	Tristan Charrier, Arthur Queffelec, Ocan Sankur, François Schwarzentruber
Abstract	Motivated by the increasing appeal of robots in information-gathering missions, we study multi-agent path planning problems in which the agents must remain interconnected. We model an area by a topological graph specifying the movement and the connectivity constraints of the agents. We study the theoretical complexity of the reachability and the coverage problems of a fleet of connected agents on various classes of topological graphs. We establish the complexity of these problems on known classes, and introduce a new class called sight-moveable graphs which admit efficient algorithms.
Tasks
Published	2019-03-11
URL	http://arxiv.org/abs/1903.04300v1
PDF	http://arxiv.org/pdf/1903.04300v1.pdf
PWC	https://paperswithcode.com/paper/reachability-and-coverage-planning-for
Repo
Framework
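
The connectivity constraint described in this abstract can be illustrated with a small sketch. It assumes a hypothetical encoding that is not taken from the paper: `move` and `comm` are adjacency dictionaries for the movement and communication edges of the topological graph, a configuration is a tuple of agent positions, and agents sharing a vertex are treated as connected. All function names are illustrative.

```python
from collections import deque

def is_connected(config, comm):
    """True if the occupied vertices form one connected set under `comm` edges."""
    occupied = set(config)
    start = config[0]
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in comm.get(v, ()):
            if w in occupied and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == occupied

def is_valid_step(prev, nxt, move):
    """Each agent either stays in place or follows one movement edge."""
    return all(b == a or b in move.get(a, ()) for a, b in zip(prev, nxt))

def is_valid_execution(configs, move, comm):
    """Every configuration is connected and consecutive ones differ by legal moves."""
    if not all(is_connected(c, comm) for c in configs):
        return False
    return all(is_valid_step(p, n, move) for p, n in zip(configs, configs[1:]))

# Toy path graph 0 - 1 - 2 - 3 where communication is possible only between neighbours.
move = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
comm = move
ok = is_valid_execution([(0, 1), (1, 2), (2, 3)], move, comm)   # agents advance together
bad = is_valid_execution([(0, 1), (0, 2)], move, comm)          # second agent loses contact
```

A reachability query then asks whether some valid execution leads from a start configuration to a target one; the check above only validates a given candidate execution and does not search for one.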

Amora: Black-box Adversarial Morphing Attack


Title	Amora: Black-box Adversarial Morphing Attack
Authors	Run Wang, Felix Juefei-Xu, Xiaofei Xie, Lei Ma, Yihao Huang, Yang Liu
Abstract	Nowadays, digital facial content manipulation has become ubiquitous and realistic with the unprecedented success of generative adversarial networks (GANs) in image synthesis. Unfortunately, face recognition (FR) systems suffer from severe security concerns due to facial image manipulations. In this paper, we investigate and introduce a new type of adversarial attack to evade FR systems by manipulating facial content, called adversarial morphing attack (a.k.a. Amora). In contrast to adversarial noise attack that perturbs pixel intensity values by adding human-imperceptible noise, our proposed adversarial morphing attack is a semantic attack that perturbs pixels spatially in a coherent manner. To tackle the black-box attack problem, we have devised a simple yet effective joint dictionary learning pipeline to obtain a proprietary optical flow field for each attack. We have quantitatively and qualitatively demonstrated the effectiveness of our adversarial morphing attack at various levels of morphing intensity on two popular FR systems with smiling facial expression manipulations. Both open-set and closed-set experimental results indicate that a novel black-box adversarial attack based on local deformation is possible, which is vastly different from additive noise based attacks. The findings of this work may pave a new research direction towards a more thorough understanding and investigation of image-based adversarial attacks and defenses.
Tasks	Adversarial Attack, Dictionary Learning, Face Recognition, Image Generation, Optical Flow Estimation
Published	2019-12-09
URL	https://arxiv.org/abs/1912.03829v2
PDF	https://arxiv.org/pdf/1912.03829v2.pdf
PWC	https://paperswithcode.com/paper/amora-black-box-adversarial-morphing-attack
Repo
Framework

Dynamic Graph Attention for Referring Expression Comprehension


Title	Dynamic Graph Attention for Referring Expression Comprehension
Authors	Sibei Yang, Guanbin Li, Yizhou Yu
Abstract	Referring expression comprehension aims to locate the object instance described by a natural language referring expression in an image. This task is compositional and inherently requires visual reasoning on top of the relationships among the objects in the image. Meanwhile, the visual reasoning process is guided by the linguistic structure of the referring expression. However, existing approaches treat the objects in isolation or only explore the first-order relationships between objects without being aligned with the potential complexity of the expression. Thus it is hard for them to adapt to the grounding of complex referring expressions. In this paper, we explore the problem of referring expression comprehension from the perspective of language-driven visual reasoning, and propose a dynamic graph attention network to perform multi-step reasoning by modeling both the relationships among the objects in the image and the linguistic structure of the expression. In particular, we construct a graph for the image with the nodes and edges corresponding to the objects and their relationships respectively, propose a differential analyzer to predict a language-guided visual reasoning process, and perform stepwise reasoning on top of the graph to update the compound object representation at every node. Experimental results demonstrate that the proposed method can not only significantly surpass all existing state-of-the-art algorithms across three common benchmark datasets, but also generate interpretable visual evidences for stepwisely locating the objects referred to in complex language descriptions.
Tasks	Visual Reasoning
Published	2019-09-18
URL	https://arxiv.org/abs/1909.08164v1
PDF	https://arxiv.org/pdf/1909.08164v1.pdf
PWC	https://paperswithcode.com/paper/dynamic-graph-attention-for-referring
Repo
Framework

Learning More with Less: Conditional PGGAN-based Data Augmentation for Brain Metastases Detection Using Highly-Rough Annotation on MR Images


Title	Learning More with Less: Conditional PGGAN-based Data Augmentation for Brain Metastases Detection Using Highly-Rough Annotation on MR Images
Authors	Changhee Han, Kohei Murao, Tomoyuki Noguchi, Yusuke Kawata, Fumiya Uchiyama, Leonardo Rundo, Hideki Nakayama, Shin’ichi Satoh
Abstract	Accurate Computer-Assisted Diagnosis, associated with proper data wrangling, can alleviate the risk of overlooking the diagnosis in a clinical environment. Towards this, as a Data Augmentation (DA) technique, Generative Adversarial Networks (GANs) can synthesize additional training data to handle the small/fragmented medical imaging datasets collected from various scanners; those images are realistic but completely different from the original ones, filling the data lack in the real image distribution. However, we cannot easily use them to locate disease areas, considering expert physicians’ expensive annotation cost. Therefore, this paper proposes Conditional Progressive Growing of GANs (CPGGANs), incorporating highly-rough bounding box conditions incrementally into PGGANs to place brain metastases at desired positions/sizes on 256 X 256 Magnetic Resonance (MR) images, for Convolutional Neural Network-based tumor detection; this first GAN-based medical DA using automatic bounding box annotation improves the training robustness. The results show that CPGGAN-based DA can boost 10% sensitivity in diagnosis with clinically acceptable additional False Positives. Surprisingly, further tumor realism, achieved with additional normal brain MR images for CPGGAN training, does not contribute to detection performance, while even three physicians cannot accurately distinguish them from the real ones in Visual Turing Test.
Tasks	Data Augmentation
Published	2019-02-26
URL	https://arxiv.org/abs/1902.09856v5
PDF	https://arxiv.org/pdf/1902.09856v5.pdf
PWC	https://paperswithcode.com/paper/learning-more-with-less-conditional-pggan
Repo
Framework

Detection of LDDoS Attacks Based on TCP Connection Parameters


Title	Detection of LDDoS Attacks Based on TCP Connection Parameters
Authors	Michael Siracusano, Stavros Shiaeles, Bogdan Ghita
Abstract	Low-rate application layer distributed denial of service (LDDoS) attacks are both powerful and stealthy. They force vulnerable webservers to open all available connections to the adversary, denying resources to real users. Mitigation advice focuses on solutions that potentially degrade quality of service for legitimate connections. Furthermore, without accurate detection mechanisms, distributed attacks can bypass these defences. A methodology for detection of LDDoS attacks, based on characteristics of malicious TCP flows, is proposed within this paper. Research will be conducted using combinations of two datasets: one generated from a simulated network, the other from the publicly available CIC DoS dataset. Both contain the attacks slowread, slowheaders and slowbody, alongside legitimate web browsing. TCP flow features are extracted from all connections. Experimentation was carried out using six supervised AI algorithms to categorise attack from legitimate flows. Decision trees and k-NN accurately classified up to 99.99% of flows, with exceptionally low false positive and false negative rates, demonstrating the potential of AI in LDDoS detection.
Tasks
Published	2019-03-12
URL	http://arxiv.org/abs/1904.01508v1
PDF	http://arxiv.org/pdf/1904.01508v1.pdf
PWC	https://paperswithcode.com/paper/detection-of-lddos-attacks-based-on-tcp
Repo
Framework

Paying More Attention to Motion: Attention Distillation for Learning Video Representations


Title	Paying More Attention to Motion: Attention Distillation for Learning Video Representations
Authors	Miao Liu, Xin Chen, Yun Zhang, Yin Li, James M. Rehg
Abstract	We address the challenging problem of learning motion representations using deep models for video recognition. To this end, we make use of attention modules that learn to highlight regions in the video and aggregate features for recognition. Specifically, we propose to leverage output attention maps as a vehicle to transfer the learned representation from a motion (flow) network to an RGB network. We systematically study the design of attention modules, and develop a novel method for attention distillation. Our method is evaluated on major action benchmarks, and consistently improves the performance of the baseline RGB network by a significant margin. Moreover, we demonstrate that our attention maps can leverage motion cues in learning to identify the location of actions in video frames. We believe our method provides a step towards learning motion-aware representations in deep models.
Tasks	Action Recognition In Videos, Video Recognition
Published	2019-04-05
URL	http://arxiv.org/abs/1904.03249v1
PDF	http://arxiv.org/pdf/1904.03249v1.pdf
PWC	https://paperswithcode.com/paper/paying-more-attention-to-motion-attention
Repo
Framework

Super-Resolution for Practical Automated Plant Disease Diagnosis System


Title	Super-Resolution for Practical Automated Plant Disease Diagnosis System
Authors	Quan Huu Cap, Hiroki Tani, Hiroyuki Uga, Satoshi Kagiwada, Hitoshi Iyatomi
Abstract	Automated plant diagnosis using images taken from a distance is often insufficient in resolution and degrades diagnostic accuracy since the important external characteristics of symptoms are lost. In this paper, we first propose an effective pre-processing method for improving the performance of automated plant disease diagnosis systems using super-resolution techniques. We investigate the efficiency of two different super-resolution methods by comparing the disease diagnostic performance on the practical original high-resolution, low-resolution, and super-resolved cucumber images. Our method generates super-resolved images that look very close to natural images with a 4$\times$ upscaling factor and is capable of recovering the lost detailed symptoms, largely boosting the diagnostic performance. Our model improves the disease classification accuracy by 26.9% over the bicubic interpolation method of 65.6% and shows a small gap (3% lower) from the original result of 95.5%.
Tasks	Super-Resolution
Published	2019-11-26
URL	https://arxiv.org/abs/1911.11341v1
PDF	https://arxiv.org/pdf/1911.11341v1.pdf
PWC	https://paperswithcode.com/paper/super-resolution-for-practical-automated
Repo
Framework
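
The three accuracy figures quoted in this abstract can be checked against each other. The short calculation below assumes that the reported 26.9% improvement is an absolute gain in percentage points; the abstract does not state this explicitly, and the variable names are illustrative.

```python
bicubic_acc = 65.6    # accuracy (%) with bicubic interpolation, from the abstract
gain_points = 26.9    # reported improvement, read here as percentage points (assumption)
original_acc = 95.5   # accuracy (%) on the original high-resolution images

# Implied accuracy on super-resolved images and its distance from the original result.
super_resolved_acc = round(bicubic_acc + gain_points, 1)
gap_to_original = round(original_acc - super_resolved_acc, 1)
print(super_resolved_acc, gap_to_original)
```

Under that reading the super-resolved accuracy comes out at 92.5%, which matches the stated gap of 3 points below the 95.5% obtained on original images.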

Multimodal Learning For Classroom Activity Detection


Title	Multimodal Learning For Classroom Activity Detection
Authors	Hang Li, Yu Kang, Wenbiao Ding, Song Yang, Songfan Yang, Gale Yan Huang, Zitao Liu
Abstract	Classroom activity detection (CAD) focuses on accurately classifying whether the teacher or student is speaking and recording the length of individual utterances during a class. A CAD solution helps teachers get instant feedback on their pedagogical instructions. This greatly improves educators’ teaching skills and hence leads to students’ achievement. However, CAD is very challenging because (1) the CAD model needs to be generalized well enough for different teachers and students; (2) data from both vocal and language modalities has to be wisely fused so that they can be complementary; and (3) the solution shouldn’t heavily rely on additional recording devices. In this paper, we address the above challenges by using a novel attention based neural framework. Our framework not only extracts both speech and language information, but also utilizes an attention mechanism to capture long-term semantic dependence. Our framework is device-free and is able to take any classroom recording as input. The proposed CAD learning framework is evaluated in two real-world education applications. The experimental results demonstrate the benefits of our approach on learning attention based neural network from classroom data with different modalities, and show our approach is able to outperform state-of-the-art baselines in terms of various evaluation metrics.
Tasks	Action Detection, Activity Detection
Published	2019-10-22
URL	https://arxiv.org/abs/1910.13799v3
PDF	https://arxiv.org/pdf/1910.13799v3.pdf
PWC	https://paperswithcode.com/paper/multimodal-learning-for-classroom-activity
Repo
Framework

Semantic Nearest Neighbor Fields Monocular Edge Visual-Odometry


Title	Semantic Nearest Neighbor Fields Monocular Edge Visual-Odometry
Authors	Xiaolong Wu, Assia Benbihi, Antoine Richard, Cedric Pradalier
Abstract	Recent advances in deep learning for edge detection and segmentation open up a new path for semantic-edge-based ego-motion estimation. In this work, we propose a robust monocular visual odometry (VO) framework using category-aware semantic edges. It can reconstruct large-scale semantic maps in challenging outdoor environments. The core of our approach is a semantic nearest neighbor field that facilitates a robust data association of edges across frames using semantics. This significantly enlarges the convergence radius during tracking phases. The proposed edge registration method can be easily integrated into direct VO frameworks to estimate photometrically, geometrically, and semantically consistent camera motions. Different types of edges are evaluated and extensive experiments demonstrate that our proposed system outperforms state-of-the-art indirect, direct, and semantic monocular VO systems.
Tasks	Edge Detection, Monocular Visual Odometry, Motion Estimation, Visual Odometry
Published	2019-04-01
URL	http://arxiv.org/abs/1904.00738v1
PDF	http://arxiv.org/pdf/1904.00738v1.pdf
PWC	https://paperswithcode.com/paper/semantic-nearest-neighbor-fields-monocular
Repo
Framework

A Multi-cascaded Model with Data Augmentation for Enhanced Paraphrase Detection in Short Texts


Title	A Multi-cascaded Model with Data Augmentation for Enhanced Paraphrase Detection in Short Texts
Authors	Muhammad Haroon Shakeel, Asim Karim, Imdadullah Khan
Abstract	Paraphrase detection is an important task in text analytics with numerous applications such as plagiarism detection, duplicate question identification, and enhanced customer support helpdesks. Deep models have been proposed for representing and classifying paraphrases. These models, however, require large quantities of human-labeled data, which is expensive to obtain. In this work, we present a data augmentation strategy and a multi-cascaded model for improved paraphrase detection in short texts. Our data augmentation strategy considers the notions of paraphrases and non-paraphrases as binary relations over the set of texts. Subsequently, it uses graph theoretic concepts to efficiently generate additional paraphrase and non-paraphrase pairs in a sound manner. Our multi-cascaded model employs three supervised feature learners (cascades) based on CNN and LSTM networks with and without soft-attention. The learned features, together with hand-crafted linguistic features, are then forwarded to a discriminator network for final classification. Our model is both wide and deep and provides greater robustness across clean and noisy short texts. We evaluate our approach on three benchmark datasets and show that it produces a comparable or state-of-the-art performance on all three.
Tasks	Data Augmentation
Published	2019-12-27
URL	https://arxiv.org/abs/1912.12068v1
PDF	https://arxiv.org/pdf/1912.12068v1.pdf
PWC	https://paperswithcode.com/paper/a-multi-cascaded-model-with-data-augmentation
Repo
Framework
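
One possible reading of the graph-based augmentation in this abstract is sketched below. It assumes that "paraphrase" is treated as an equivalence relation, so every pair inside a connected component is a paraphrase, and that a non-paraphrase label between two texts extends to all pairs across their components. These rules and the `augment` helper are assumptions made for illustration; the abstract does not spell out the exact procedure.

```python
from itertools import combinations, product

def augment(paraphrases, non_paraphrases):
    """Expand labelled pairs, assuming paraphrase is an equivalence relation."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of every labelled paraphrase pair.
    for a, b in paraphrases:
        parent[find(a)] = find(b)

    # Group texts by component representative.
    groups = {}
    for text in list(parent):
        groups.setdefault(find(text), set()).add(text)

    # Every pair inside a component is a paraphrase pair.
    positives = {frozenset(p) for g in groups.values()
                 for p in combinations(sorted(g), 2)}

    # A non-paraphrase edge separates two whole components.
    negatives = set()
    for a, b in non_paraphrases:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # label contradicts the paraphrase closure; skipped here
        ga, gb = groups.get(ra, {a}), groups.get(rb, {b})
        negatives.update(frozenset(p) for p in product(ga, gb))
    return positives, negatives

pos, neg = augment([("a", "b"), ("b", "c")], [("c", "d")])
```

In this toy call the two labelled paraphrase pairs yield the extra pair (a, c), and the single non-paraphrase pair (c, d) yields (a, d) and (b, d) as well.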

A Deep Learning Based Chatbot for Campus Psychological Therapy


Title	A Deep Learning Based Chatbot for Campus Psychological Therapy
Authors	Junjie Yin, Zixun Chen, Kelai Zhou, Chongyuan Yu
Abstract	In this paper, we propose Evebot, an innovative, sequence to sequence (Seq2seq) based, fully generative conversational system for the diagnosis of negative emotions and prevention of depression through positively suggestive responses. The system consists of an assembly of deep-learning based models, including a Bi-LSTM based model for detecting negative emotions of users and obtaining psychological counselling related corpus for training the chatbot, an anti-language sequence to sequence neural network, and a maximum mutual information (MMI) model. As adolescents are reluctant to show their negative emotions in physical interaction, traditional methods of emotion analysis and comforting methods may not work. Therefore, this system puts emphasis on using a virtual platform to detect signs of depression or anxiety, channel adolescents’ stress and mood, and thus prevent the emergence of mental illness. We launched the integrated chatbot system onto an online platform for real-world campus applications. Through a one-month user study, we observe better results in the increase in positivity than other public chatbots in the control group.
Tasks	Chatbot, Emotion Recognition
Published	2019-10-09
URL	https://arxiv.org/abs/1910.06707v1
PDF	https://arxiv.org/pdf/1910.06707v1.pdf
PWC	https://paperswithcode.com/paper/a-deep-learning-based-chatbot-for-campus
Repo
Framework

Weakly Convex Optimization over Stiefel Manifold Using Riemannian Subgradient-Type Methods


Title	Weakly Convex Optimization over Stiefel Manifold Using Riemannian Subgradient-Type Methods
Authors	Xiao Li, Shixiang Chen, Zengde Deng, Qing Qu, Zhihui Zhu, Anthony Man Cho So
Abstract	We consider a class of nonsmooth optimization problems over the Stiefel manifold, in which the objective function is weakly convex in the ambient Euclidean space. Such problems are ubiquitous in engineering applications but still largely unexplored. We present a family of Riemannian subgradient-type methods (namely Riemannian subgradient, incremental subgradient, and stochastic subgradient methods) to solve these problems and show that they all have an iteration complexity of ${\cal O}(\varepsilon^{-4})$ for driving a natural stationarity measure below $\varepsilon$. In addition, we establish the local linear convergence of the Riemannian subgradient and incremental subgradient methods when the problem at hand further satisfies a sharpness property and the algorithms are properly initialized and use geometrically diminishing stepsizes. To the best of our knowledge, these are the first convergence guarantees for using Riemannian subgradient-type methods to optimize a class of nonconvex nonsmooth functions over the Stiefel manifold. The fundamental ingredient in the proof of the aforementioned convergence results is a new Riemannian subgradient inequality for restrictions of weakly convex functions on the Stiefel manifold, which could be of independent interest. We also show that our convergence results can be extended to handle a class of compact embedded submanifolds of the Euclidean space. Finally, we discuss the sharpness properties of various formulations of the robust subspace recovery and orthogonal dictionary learning problems and demonstrate the convergence performance of the algorithms on both problems via numerical simulations.
Tasks	Dictionary Learning
Published	2019-11-12
URL	https://arxiv.org/abs/1911.05047v3
PDF	https://arxiv.org/pdf/1911.05047v3.pdf
PWC	https://paperswithcode.com/paper/nonsmooth-optimization-over-stiefel-manifold
Repo
Framework
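
For orientation, a generic Riemannian subgradient iteration on the Stiefel manifold can be written as follows. The notation is standard but is not given in the abstract, and the polar retraction and the stepsize constants $\gamma_0$, $\beta$ are assumptions chosen for illustration rather than details confirmed by the source.

```latex
\begin{aligned}
&\text{Stiefel manifold:} && \mathrm{St}(n,r) = \{ X \in \mathbb{R}^{n \times r} : X^\top X = I_r \},\\
&\text{tangent projection:} && \mathcal{P}_{T_X}(\xi) = \xi - \tfrac{1}{2}\, X \left( X^\top \xi + \xi^\top X \right),\\
&\text{Riemannian subgradient:} && \widetilde{\nabla}_{\mathcal{R}} f(X_k) = \mathcal{P}_{T_{X_k}}\!\big( \widetilde{\nabla} f(X_k) \big), \quad \widetilde{\nabla} f(X_k) \in \partial f(X_k),\\
&\text{update:} && X_{k+1} = \mathrm{Retr}_{X_k}\!\big( -\gamma_k\, \widetilde{\nabla}_{\mathcal{R}} f(X_k) \big),\\
&\text{polar retraction:} && \mathrm{Retr}_X(\xi) = (X + \xi)\big( I_r + \xi^\top \xi \big)^{-1/2},\\
&\text{geometric stepsizes:} && \gamma_k = \gamma_0\, \beta^{k}, \quad 0 < \beta < 1 .
\end{aligned}
```

The polar retraction maps back onto the manifold because $X^\top \xi$ is skew-symmetric for any tangent vector $\xi$, so $(X+\xi)^\top (X+\xi) = I_r + \xi^\top \xi$.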

Exploiting multi-CNN features in CNN-RNN based Dimensional Emotion Recognition on the OMG in-the-wild Dataset


Title	Exploiting multi-CNN features in CNN-RNN based Dimensional Emotion Recognition on the OMG in-the-wild Dataset
Authors	Dimitrios Kollias, Stefanos Zafeiriou
Abstract	This paper presents a novel CNN-RNN based approach, which exploits multiple CNN features for dimensional emotion recognition in-the-wild, utilizing the One-Minute Gradual-Emotion (OMG-Emotion) dataset. Our approach includes first pre-training with the relevant and large in size, Aff-Wild and Aff-Wild2 emotion databases. Low-, mid- and high-level features are extracted from the trained CNN component and are exploited by RNN subnets in a multi-task framework. Their outputs constitute an intermediate level prediction; final estimates are obtained as the mean or median values of these predictions. Fusion of the networks is also examined for boosting the obtained performance, at Decision-, or at Model-level; in the latter case a RNN was used for the fusion. Our approach, although using only the visual modality, outperformed state-of-the-art methods that utilized audio and visual modalities. Some of our developments have been submitted to the OMG-Emotion Challenge, ranking second among the technologies which used only visual information for valence estimation; ranking third overall. Through extensive experimentation, we further show that arousal estimation is greatly improved when low-level features are combined with high-level ones.
Tasks	Emotion Recognition
Published	2019-10-03
URL	https://arxiv.org/abs/1910.01417v1
PDF	https://arxiv.org/pdf/1910.01417v1.pdf
PWC	https://paperswithcode.com/paper/exploiting-multi-cnn-features-in-cnn-rnn
Repo
Framework

Automatic Group Cohesiveness Detection With Multi-modal Features


Title	Automatic Group Cohesiveness Detection With Multi-modal Features
Authors	Bin Zhu, Xin Guo, Kenneth Barner, Charles Boncelet
Abstract	Group cohesiveness is a compelling and often studied composition in group dynamics and group performance. The enormous number of web images of groups of people can be used to develop an effective method to detect group cohesiveness. This paper introduces an automatic group cohesiveness prediction method for the 7th Emotion Recognition in the Wild (EmotiW 2019) Grand Challenge in the category of Group-based Cohesion Prediction. The task is to predict the cohesive level for a group of people in images. To tackle this problem, a hybrid network including regression models which are separately trained on face features, skeleton features, and scene features is proposed. Predicted regression values, corresponding to each feature, are fused for the final cohesive intensity. Experimental results demonstrate that the proposed hybrid network is effective and makes promising improvements. A mean squared error (MSE) of 0.444 is achieved on the testing sets which outperforms the baseline MSE of 0.5.
Tasks	Emotion Recognition
Published	2019-10-02
URL	https://arxiv.org/abs/1910.01197v1
PDF	https://arxiv.org/pdf/1910.01197v1.pdf
PWC	https://paperswithcode.com/paper/automatic-group-cohesiveness-detection-with
Repo
Framework
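
The fusion step and the reported error figures can be sketched as follows. The abstract says only that the per-feature regression values are fused, so the weighted average used here is an assumption, and `fuse` and `mse` are illustrative helper names.

```python
def fuse(predictions, weights=None):
    """Weighted average of per-feature regression outputs for one image (assumed rule)."""
    if weights is None:
        weights = [1.0] * len(predictions)
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)

def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Figures quoted in the abstract.
baseline_mse, reported_mse = 0.5, 0.444
relative_reduction = round((baseline_mse - reported_mse) / baseline_mse, 3)
print(relative_reduction)
```

With the quoted numbers, the reduction relative to the baseline is about 11%.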

Sparse Coding on Cascaded Residuals


Title	Sparse Coding on Cascaded Residuals
Authors	Tong Zhang, Fatih Porikli
Abstract	This paper seeks to combine dictionary learning and hierarchical image representation in a principled way. To make dictionary atoms capture additional information from extended receptive fields and attain improved descriptive capacity, we present a two-pass multi-resolution cascade framework for dictionary learning and sparse coding. The cascade allows collaborative reconstructions at different resolutions using the same dimensional dictionary atoms. Our jointly learned dictionary comprises atoms that adapt to the information available at the coarsest layer where the support of atoms reaches their maximum range and the residual images where the supplementary details progressively refine the reconstruction objective. The residual at a layer is computed by the difference between the aggregated reconstructions of the previous layers and the downsampled original image at that layer. Our method generates more flexible and accurate representations using far fewer coefficients. Its computational efficiency stems from encoding at the coarsest resolution, which is minuscule, and encoding the residuals, which are comparatively much sparser. Our extensive experiments on multiple datasets demonstrate that this new method is powerful in image coding, denoising, inpainting and artifact removal tasks, outperforming the state-of-the-art techniques.
Tasks	Denoising, Dictionary Learning
Published	2019-11-07
URL	https://arxiv.org/abs/1911.02749v1
PDF	https://arxiv.org/pdf/1911.02749v1.pdf
PWC	https://paperswithcode.com/paper/sparse-coding-on-cascaded-residuals
Repo
Framework
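
The residual rule stated in this abstract (residual at a layer = downsampled original minus the aggregated reconstructions of previous layers) can be sketched on a one-dimensional signal. This is a toy illustration under stated assumptions: block averaging stands in for downsampling, sample repetition for upsampling, and `keep_largest` is a placeholder for a real sparse coder over a learned dictionary. None of these choices come from the paper, and the signal length is assumed to be divisible by every factor.

```python
def downsample(x, factor):
    """Average non-overlapping blocks of length `factor`."""
    return [sum(x[i:i + factor]) / factor for i in range(0, len(x), factor)]

def upsample(x, factor):
    """Repeat each sample `factor` times (nearest-neighbour)."""
    return [v for v in x for _ in range(factor)]

def keep_largest(x, k):
    """Placeholder encoder: keep the k largest-magnitude entries, zero the rest."""
    keep = set(sorted(range(len(x)), key=lambda i: -abs(x[i]))[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]

def cascade(signal, factors=(4, 2, 1), k=2):
    """Encode residuals from the coarsest to the finest layer."""
    aggregated, prev_factor, codes = None, None, []
    for f in factors:
        target = downsample(signal, f)             # original at this resolution
        if aggregated is None:
            aggregated = [0.0] * len(target)       # nothing reconstructed yet
        else:
            aggregated = upsample(aggregated, prev_factor // f)
        residual = [t - a for t, a in zip(target, aggregated)]
        code = keep_largest(residual, k)           # sparse code of the residual
        codes.append(code)
        aggregated = [a + c for a, c in zip(aggregated, code)]
        prev_factor = f
    return codes, aggregated

codes, recon = cascade([1.0] * 8)
```

For a constant signal the coarsest layer already captures everything, so the residuals at the finer layers are all zero, which is the sparsity effect the abstract attributes to residual encoding.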