Paper Group AWR 31
Adversarial Continual Learning
Title | Adversarial Continual Learning |
Authors | Sayna Ebrahimi, Franziska Meier, Roberto Calandra, Trevor Darrell, Marcus Rohrbach |
Abstract | Continual learning aims to learn new tasks without forgetting previously learned ones. We hypothesize that representations learned to solve each task in a sequence have a shared structure while containing some task-specific properties. We show that shared features are significantly less prone to forgetting and propose a novel hybrid continual learning framework that learns disjoint representations for the task-invariant and task-specific features required to solve a sequence of tasks. Our model combines architecture growth to prevent forgetting of task-specific skills and an experience replay approach to preserve shared skills. We demonstrate our hybrid approach is effective in avoiding forgetting and show it is superior to both architecture-based and memory-based approaches on class-incremental learning of a single dataset as well as on a sequence of multiple datasets in image classification. Our code is available at \url{https://github.com/facebookresearch/Adversarial-Continual-Learning}. |
Tasks | Continual Learning, Image Classification |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.09553v1 |
https://arxiv.org/pdf/2003.09553v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-continual-learning |
Repo | https://github.com/facebookresearch/Adversarial-Continual-Learning |
Framework | pytorch |
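The hybrid scheme described above (a task-invariant shared representation trained adversarially against a task discriminator, plus task-specific private modules) can be sketched compactly. The following is an illustrative PyTorch sketch, not the authors' implementation; the layer sizes, the 5-task setup, and the use of gradient reversal to train the shared encoder adversarially are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity on the forward pass; negated gradient on the backward pass,
    # so the shared encoder is trained to *fool* the task discriminator.
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

shared = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())   # task-invariant
private = nn.ModuleList([nn.Sequential(nn.Flatten(), nn.Linear(784, 32), nn.ReLU())
                         for _ in range(5)])                            # grows per task
discriminator = nn.Linear(128, 5)       # guesses which task produced a shared feature
head = nn.Linear(128 + 32, 10)          # classifier over the joint representation

def forward_task(x, task_id):
    s = shared(x)
    p = private[task_id](x)
    task_logits = discriminator(GradReverse.apply(s))   # adversarial branch
    class_logits = head(torch.cat([s, p], dim=1))
    return class_logits, task_logits
```

During training, the discriminator learns to identify the originating task from shared features, while the reversed gradient pushes the shared encoder toward task-invariance; experience replay would then preserve these shared skills across tasks.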
State-Aware Tracker for Real-Time Video Object Segmentation
Title | State-Aware Tracker for Real-Time Video Object Segmentation |
Authors | Xi Chen, Zuoxin Li, Ye Yuan, Gang Yu, Jianxin Shen, Donglian Qi |
Abstract | In this work, we address the task of semi-supervised video object segmentation (VOS) and explore how to make efficient use of video properties to tackle the challenge of semi-supervision. We propose a novel pipeline called State-Aware Tracker (SAT), which can produce accurate segmentation results at real-time speed. For higher efficiency, SAT takes advantage of inter-frame consistency and deals with each target object as a tracklet. For more stable and robust performance over video sequences, SAT becomes aware of its current state and adapts itself via two feedback loops. One loop assists SAT in generating more stable tracklets. The other helps to construct a more robust and holistic target representation. SAT achieves a promising result of 72.3% J&F mean at 39 FPS on the DAVIS2017-Val dataset, which shows a decent trade-off between efficiency and accuracy. Code will be released at github.com/MegviiDetection/video_analyst. |
Tasks | Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2020-03-01 |
URL | https://arxiv.org/abs/2003.00482v1 |
https://arxiv.org/pdf/2003.00482v1.pdf | |
PWC | https://paperswithcode.com/paper/state-aware-tracker-for-real-time-video |
Repo | https://github.com/MegviiDetection/video_analyst |
Framework | none |
FedDANE: A Federated Newton-Type Method
Title | FedDANE: A Federated Newton-Type Method |
Authors | Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith |
Abstract | Federated learning aims to jointly learn statistical models over massively distributed remote devices. In this work, we propose FedDANE, an optimization method that we adapt from DANE, a method for classical distributed optimization, to handle the practical constraints of federated learning. We provide convergence guarantees for this method when learning over both convex and non-convex functions. Despite encouraging theoretical results, we find that the method has underwhelming performance in practice. In particular, in empirical simulations on both synthetic and real-world datasets, FedDANE consistently underperforms the FedAvg and FedProx baselines in realistic federated settings. We identify low device participation and statistical device heterogeneity as two underlying causes of this underwhelming performance, and conclude by suggesting several directions of future work. |
Tasks | Distributed Optimization |
Published | 2020-01-07 |
URL | https://arxiv.org/abs/2001.01920v1 |
https://arxiv.org/pdf/2001.01920v1.pdf | |
PWC | https://paperswithcode.com/paper/feddane-a-federated-newton-type-method |
Repo | https://github.com/litian96/FedDANE |
Framework | none |
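FedDANE inherits DANE's gradient-corrected local subproblem: each sampled device receives the current iterate w_t and an aggregated gradient g_t, then approximately minimizes F_k(w) - (∇F_k(w_t) - g_t)ᵀw + (μ/2)||w - w_t||². A minimal NumPy sketch of one device's inexact solve is below; the step count, learning rate, and μ are illustrative assumptions.

```python
import numpy as np

def feddane_local_update(w_t, local_grad_fn, grad_k_at_wt, g_t,
                         mu=1e-2, lr=0.1, steps=20):
    # Inexactly solve the DANE-style subproblem
    #   min_w  F_k(w) - (grad F_k(w_t) - g_t)^T w + (mu/2) ||w - w_t||^2
    # with plain gradient descent; the subproblem's gradient at w is
    #   grad F_k(w) - grad F_k(w_t) + g_t + mu * (w - w_t).
    w = w_t.copy()
    for _ in range(steps):
        g = local_grad_fn(w) - grad_k_at_wt + g_t + mu * (w - w_t)
        w -= lr * g
    return w

# Toy usage with a quadratic local loss F_k(w) = 0.5 * ||A w - b||^2.
# In federated training, g_t is the average gradient over the sampled
# devices; with a single device it reduces to a proximal local step.
A, b = np.array([[1.0, 0.0], [0.0, 2.0]]), np.array([1.0, 1.0])
grad_fn = lambda w: A.T @ (A @ w - b)
w_t = np.zeros(2)
w_next = feddane_local_update(w_t, grad_fn, grad_fn(w_t), grad_fn(w_t))
```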
Gimme Signals: Discriminative signal encoding for multimodal activity recognition
Title | Gimme Signals: Discriminative signal encoding for multimodal activity recognition |
Authors | Raphael Memmesheimer, Nick Theisen, Dietrich Paulus |
Abstract | We present a simple yet effective and flexible method for action recognition that supports multiple sensor modalities. Multivariate signal sequences are encoded in an image and then classified using a recently proposed EfficientNet CNN architecture. Our focus was to find an approach that generalizes well across different sensor modalities without modality-specific adaptations while still achieving good results. We apply our method to 4 action recognition datasets containing skeleton sequences, inertial and motion capturing measurements, as well as Wi-Fi fingerprints, with up to 120 action classes. Our method sets the current best CNN-based result on the NTU RGB+D 120 dataset, lifts the state of the art on the ARIL Wi-Fi dataset by +6.78%, improves the UTD-MHAD inertial baseline by +14.4% and the UTD-MHAD skeleton baseline by +1.13%, and achieves 96.11% on the Simitate motion capturing data (80/20 split). We further demonstrate experiments on both modality fusion at the signal level and signal reduction to prevent the representation from becoming overloaded. |
Tasks | Action Recognition In Videos, Activity Recognition, Multimodal Activity Recognition |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.06156v1 |
https://arxiv.org/pdf/2003.06156v1.pdf | |
PWC | https://paperswithcode.com/paper/gimme-signals-discriminative-signal-encoding |
Repo | https://github.com/airglow/gimme_signals_action_recognition |
Framework | pytorch |
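The central move, rendering a multivariate signal sequence as an image and classifying it with an off-the-shelf CNN, can be approximated in a few lines. The paper plots signals with matplotlib colormaps; the sketch below substitutes plain bilinear interpolation of the raw signal matrix, so the normalization and shapes are assumptions rather than the authors' exact encoding.

```python
import torch
import torch.nn.functional as F
from torchvision.models import efficientnet_b0

def signal_to_image(sig, size=224):
    # sig: (channels, timesteps) multivariate sequence, e.g. skeleton joints.
    sig = (sig - sig.min()) / (sig.max() - sig.min() + 1e-8)   # scale to [0, 1]
    img = F.interpolate(sig[None, None], size=(size, size),
                        mode="bilinear", align_corners=False)  # stretch to a square
    return img.repeat(1, 3, 1, 1)                              # fake RGB for the CNN

model = efficientnet_b0(num_classes=120)       # e.g. NTU RGB+D 120 action classes
x = signal_to_image(torch.randn(75, 300))      # 25 joints x 3 coords, 300 frames
logits = model(x)
```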
On the Road with 16 Neurons: Mental Imagery with Bio-inspired Deep Neural Networks
Title | On the Road with 16 Neurons: Mental Imagery with Bio-inspired Deep Neural Networks |
Authors | Alice Plebe, Mauro Da Lio |
Abstract | This paper proposes a strategy for visual prediction in the context of autonomous driving. Humans, when not distracted or drunk, are still the best drivers you can currently find. For this reason we take inspiration from two theoretical ideas about the human mind and its neural organization. The first idea concerns how the brain uses a hierarchical structure of neuron ensembles to extract abstract concepts from visual experience and code them into compact representations. The second idea suggests that these neural perceptual representations are not neutral but functional to the prediction of the future state of affairs in the environment. Similarly, the prediction mechanism is not neutral but oriented to the current planning of a future action. We identify within the deep learning framework two artificial counterparts of the aforementioned neurocognitive theories. We find a correspondence between the first theoretical idea and the architecture of convolutional autoencoders, while we translate the second theory into a training procedure that learns compact representations which are not neutral but oriented to driving tasks, from two distinct perspectives. From a static perspective, we force groups of neural units in the compact representations to distinctly represent specific concepts crucial to the driving task. From a dynamic perspective, we encourage the compact representations to be predictive of how the current road scenario will change in the future. We successfully learn compact representations that use as few as 16 neural units for each of the two basic driving concepts we consider: car and lane. We demonstrate the efficiency of our proposed perceptual representations on the SYNTHIA dataset. Our source code is available at https://github.com/3lis/rnn_vae. |
Tasks | Autonomous Driving |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.08745v1 |
https://arxiv.org/pdf/2003.08745v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-road-with-16-neurons-mental-imagery |
Repo | https://github.com/3lis/rnn_vae |
Framework | tf |
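A convolutional autoencoder with a 16-unit bottleneck, the paper's artificial counterpart of a compact neural representation, looks roughly like this. This is a toy PyTorch sketch (the released code is TensorFlow); the input resolution, channel counts, and single-concept setup are assumptions.

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    # Encode a 64x64 RGB frame into a 16-number code and decode it back.
    def __init__(self, code=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(), nn.Linear(32 * 16 * 16, code))
        self.dec = nn.Sequential(
            nn.Linear(code, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        z = self.enc(x)            # the 16-unit "mental image" of the scene
        return self.dec(z), z

model = TinyAE()
recon, code = model(torch.rand(1, 3, 64, 64))
```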
Adapting Deep Learning for Sentiment Classification of Code-Switched Informal Short Text
Title | Adapting Deep Learning for Sentiment Classification of Code-Switched Informal Short Text |
Authors | Muhammad Haroon Shakeel, Asim Karim |
Abstract | Nowadays, an abundance of short text is being generated that uses nonstandard writing styles influenced by regional languages. Such informal and code-switched content is under-resourced in terms of labeled datasets and language models, even for popular tasks like sentiment classification. In this work, we (1) present a labeled dataset called MultiSenti for sentiment classification of code-switched informal short text, (2) explore the feasibility of adapting resources from a resource-rich language for an informal one, and (3) propose a deep learning-based model for sentiment classification of code-switched informal short text. We aim to achieve this without any lexical normalization, language translation, or code-switching indication. The performance of the proposed model is compared with three existing multilingual sentiment classification models. The results show that the proposed model performs better in general, and that adapting character-based embeddings yields performance equivalent to, while being computationally more efficient than, training word-based domain-specific embeddings. |
Tasks | Lexical Normalization, Sentiment Analysis |
Published | 2020-01-04 |
URL | https://arxiv.org/abs/2001.01047v1 |
https://arxiv.org/pdf/2001.01047v1.pdf | |
PWC | https://paperswithcode.com/paper/adapting-deep-learning-for-sentiment |
Repo | https://github.com/haroonshakeel/multisenti |
Framework | none |
CNN-based Density Estimation and Crowd Counting: A Survey
Title | CNN-based Density Estimation and Crowd Counting: A Survey |
Authors | Guangshuai Gao, Junyu Gao, Qingjie Liu, Qi Wang, Yunhong Wang |
Abstract | Accurately estimating the number of objects in a single image is a challenging yet meaningful task that has been applied in many areas such as urban planning and public safety. Among the various object counting tasks, crowd counting is particularly prominent due to its significance to social security and development. Fortunately, techniques developed for crowd counting can often be generalized to related fields such as vehicle counting and environmental surveying. Consequently, many researchers have devoted themselves to crowd counting, and a large body of excellent work has emerged. The question worth considering is why these methods are effective for the task; since analyzing every algorithm is infeasible, in this paper we survey over 220 works to comprehensively and systematically study crowd counting models, mainly CNN-based density map estimation methods. Finally, according to the evaluation metrics, we select the top three performers on the main crowd counting datasets and analyze their merits and drawbacks. Through our analysis, we aim to draw reasonable inferences and predictions about the future development of crowd counting and, at the same time, to provide feasible solutions for the problem of object counting in other fields. We provide the density maps and prediction results of some mainstream algorithms on the validation set of the NWPU dataset for comparison and testing. Density map generation and evaluation tools are also provided. All the code and evaluation results are made publicly available at https://github.com/gaoguangshuai/survey-for-crowd-counting. |
Tasks | Crowd Counting, Density Estimation, Object Counting |
Published | 2020-03-28 |
URL | https://arxiv.org/abs/2003.12783v1 |
https://arxiv.org/pdf/2003.12783v1.pdf | |
PWC | https://paperswithcode.com/paper/cnn-based-density-estimation-and-crowd |
Repo | https://github.com/gjy3035/Awesome-Crowd-Counting |
Framework | pytorch |
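The density maps these models regress are conventionally built by placing a Gaussian at each annotated head position, so that the map integrates to the person count. A minimal sketch with a fixed kernel width (many methods instead use geometry-adaptive kernels):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(points, shape, sigma=4.0):
    # points: [(row, col), ...] head annotations; the resulting map sums
    # (approximately) to the person count, the regression target for the CNN.
    dm = np.zeros(shape, dtype=np.float32)
    for r, c in points:
        if 0 <= int(r) < shape[0] and 0 <= int(c) < shape[1]:
            dm[int(r), int(c)] = 1.0
    return gaussian_filter(dm, sigma)

heads = [(40, 60), (42, 65), (100, 200)]
dm = density_map(heads, (240, 320))
print(dm.sum())   # ~3.0: the count is the integral of the map
```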
Lightweight Photometric Stereo for Facial Details Recovery
Title | Lightweight Photometric Stereo for Facial Details Recovery |
Authors | Xueying Wang, Yudong Guo, Bailin Deng, Juyong Zhang |
Abstract | Recently, 3D face reconstruction from a single image has achieved great success with the help of deep learning and shape prior knowledge, but such methods often fail to produce accurate geometric details. On the other hand, photometric stereo methods can recover reliable geometric details, but they require dense inputs and need to solve a complex optimization problem. In this paper, we present a lightweight strategy that requires only sparse inputs, or even a single image, to recover high-fidelity face shapes from images captured under near-field lights. To this end, we construct a dataset containing 84 different subjects with 29 expressions under 3 different lights. Data augmentation is applied to enrich the data in terms of diversity in identity, lighting, expression, etc. With this constructed dataset, we propose a novel neural network specially designed for photometric stereo based 3D face reconstruction. Extensive experiments and comparisons demonstrate that our method can generate high-quality reconstruction results with one to three facial images captured under near-field lights. Our full framework is available at https://github.com/Juyong/FacePSNet. |
Tasks | 3D Face Reconstruction, Data Augmentation, Face Reconstruction |
Published | 2020-03-27 |
URL | https://arxiv.org/abs/2003.12307v1 |
https://arxiv.org/pdf/2003.12307v1.pdf | |
PWC | https://paperswithcode.com/paper/lightweight-photometric-stereo-for-facial |
Repo | https://github.com/Juyong/FacePSNet |
Framework | pytorch |
3D Quasi-Recurrent Neural Network for Hyperspectral Image Denoising
Title | 3D Quasi-Recurrent Neural Network for Hyperspectral Image Denoising |
Authors | Kaixuan Wei, Ying Fu, Hua Huang |
Abstract | In this paper, we propose an alternating directional 3D quasi-recurrent neural network for hyperspectral image (HSI) denoising, which can effectively embed the domain knowledge of structural spatio-spectral correlation and global correlation along the spectrum. Specifically, 3D convolution is utilized to extract structural spatio-spectral correlation in an HSI, while a quasi-recurrent pooling function is employed to capture the global correlation along the spectrum. Moreover, an alternating directional structure is introduced to eliminate the causal dependency at no additional computational cost. The proposed model is capable of modeling spatio-spectral dependency while preserving flexibility towards HSIs with an arbitrary number of bands. Extensive experiments on HSI denoising demonstrate significant improvement over state-of-the-art methods under various noise settings, in terms of both restoration accuracy and computation time. Our code is available at https://github.com/Vandermode/QRNN3D. |
Tasks | Denoising, Image Denoising |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04547v1 |
https://arxiv.org/pdf/2003.04547v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-quasi-recurrent-neural-network-for |
Repo | https://github.com/Vandermode/QRNN3D |
Framework | pytorch |
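The quasi-recurrent pooling function replaces a full recurrence with a gated running average along the spectral axis, h_t = f_t ⊙ h_{t-1} + (1 - f_t) ⊙ z_t. Below is a sketch of one directional pass; the gate convolutions are omitted and the tensor shapes are assumptions.

```python
import torch

def quasi_recurrent_pool(z, f):
    # f-pooling along the band axis: h_t = f_t * h_{t-1} + (1 - f_t) * z_t.
    # z, f: (batch, channels, bands, H, W); f is a sigmoid forget gate.
    # Running this once per direction and merging the results gives the
    # paper's alternating-directional, non-causal behaviour.
    h, out = None, []
    for t in range(z.shape[2]):
        zt, ft = z[:, :, t], f[:, :, t]
        h = (1 - ft) * zt if h is None else ft * h + (1 - ft) * zt
        out.append(h)
    return torch.stack(out, dim=2)

z = torch.tanh(torch.randn(1, 8, 31, 16, 16))     # candidate features per band
f = torch.sigmoid(torch.randn(1, 8, 31, 16, 16))  # forget gate per band
h = quasi_recurrent_pool(z, f)
```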
Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization
Title | Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization |
Authors | Bruno Taillé, Vincent Guigue, Patrick Gallinari |
Abstract | Contextualized embeddings use unsupervised language model pretraining to compute word representations that depend on their context. This is intuitively useful for generalization, especially in Named-Entity Recognition, where it is crucial to detect mentions never seen during training. However, standard English benchmarks overestimate the importance of lexical over contextual features because of an unrealistic lexical overlap between train and test mentions. In this paper, we perform an empirical analysis of the generalization capabilities of state-of-the-art contextualized embeddings by separating mentions by novelty and with out-of-domain evaluation. We show that they are particularly beneficial for detecting unseen mentions, especially out-of-domain. For models trained on CoNLL03, language model contextualization leads to a maximal relative micro-F1 score increase of +1.2% in-domain, versus +13% out-of-domain on the WNUT dataset. |
Tasks | Language Modelling, Named Entity Recognition |
Published | 2020-01-22 |
URL | https://arxiv.org/abs/2001.08053v1 |
https://arxiv.org/pdf/2001.08053v1.pdf | |
PWC | https://paperswithcode.com/paper/contextualized-embeddings-in-named-entity |
Repo | https://github.com/btaille/contener |
Framework | pytorch |
Deep Learning-Based Feature Extraction in Iris Recognition: Use Existing Models, Fine-tune or Train From Scratch?
Title | Deep Learning-Based Feature Extraction in Iris Recognition: Use Existing Models, Fine-tune or Train From Scratch? |
Authors | Aidan Boyd, Adam Czajka, Kevin Bowyer |
Abstract | Modern deep learning techniques can be employed to generate effective feature extractors for the task of iris recognition. The question arises: should we train such structures from scratch on a relatively large iris image dataset, or is it better to fine-tune existing models to adapt them to the new domain? In this work we explore five different sets of weights for the popular ResNet-50 architecture to find out whether iris-specific feature extractors perform better than models trained for non-iris tasks. Features are extracted from each convolutional layer, and the classification accuracy achieved by a Support Vector Machine is measured on a dataset that is disjoint from the samples used in training of the ResNet-50 model. We show that the optimal training strategy is to fine-tune an off-the-shelf set of weights to the iris recognition domain. This approach results in greater accuracy than both off-the-shelf weights and a model trained from scratch. The winning, fine-tuned approach also shows an increase in performance over previous work, in which only off-the-shelf (not fine-tuned) models were used for iris feature extraction. We make the best-performing ResNet-50 model, fine-tuned with more than 360,000 iris images, publicly available along with this paper. |
Tasks | Iris Recognition |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08916v1 |
https://arxiv.org/pdf/2002.08916v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-based-feature-extraction-in |
Repo | https://github.com/BoydAidan/BTAS2019DeepFeatureExtraction |
Framework | none |
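The evaluation protocol, reading features off an intermediate convolutional layer and classifying them with an SVM, can be sketched as follows. The layer choice (layer3), the global average pooling, and the toy labels are placeholders, not the paper's exact configuration.

```python
import torch
from torchvision.models import resnet50
from sklearn.svm import SVC

# Off-the-shelf ImageNet weights stand in for the paper's five weight sets.
model = resnet50(weights="IMAGENET1K_V1").eval()

feats = {}
model.layer3.register_forward_hook(
    lambda m, i, o: feats.__setitem__("layer3", o))

def extract(batch):                       # batch: (N, 3, 224, 224) iris crops
    with torch.no_grad():
        model(batch)
    return feats["layer3"].mean(dim=(2, 3)).numpy()   # global average pool

X_train = extract(torch.rand(8, 3, 224, 224))         # stand-in images
svm = SVC(kernel="linear").fit(X_train, [0, 0, 1, 1, 2, 2, 3, 3])
```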
Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?
Title | Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need? |
Authors | Yonglong Tian, Yue Wang, Dilip Krishnan, Joshua B. Tenenbaum, Phillip Isola |
Abstract | The focus of recent meta-learning research has been on the development of learning algorithms that can quickly adapt to test-time tasks with limited data and low computational cost. Few-shot learning is widely used as one of the standard benchmarks in meta-learning. In this work, we show that a simple baseline, which learns a supervised or self-supervised representation on the meta-training set and then trains a linear classifier on top of this representation, outperforms state-of-the-art few-shot learning methods. An additional boost can be achieved through the use of self-distillation. This demonstrates that using a good learned embedding model can be more effective than sophisticated meta-learning algorithms. We believe that our findings motivate a rethinking of few-shot image classification benchmarks and the associated role of meta-learning algorithms. Code is available at: http://github.com/WangYueFt/rfs/. |
Tasks | Few-Shot Image Classification, Few-Shot Learning, Image Classification, Meta-Learning |
Published | 2020-03-25 |
URL | https://arxiv.org/abs/2003.11539v1 |
https://arxiv.org/pdf/2003.11539v1.pdf | |
PWC | https://paperswithcode.com/paper/rethinking-few-shot-image-classification-a |
Repo | https://github.com/WangYueFt/rfs |
Framework | pytorch |
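The baseline itself fits in a few lines: embed support and query images with the frozen meta-trained backbone, then fit a linear classifier on the support embeddings. In this sketch, `backbone` is a placeholder for any pretrained feature extractor, and the omission of feature normalization is a simplification of the paper's setup.

```python
import torch
from sklearn.linear_model import LogisticRegression

def few_shot_predict(backbone, support_x, support_y, query_x):
    # Freeze the embedding trained on the meta-train set, then fit a
    # linear classifier on the few labelled support examples.
    backbone.eval()
    with torch.no_grad():
        zs = backbone(support_x).numpy()   # e.g. 5-way 1-shot: 5 embeddings
        zq = backbone(query_x).numpy()
    clf = LogisticRegression(max_iter=1000).fit(zs, support_y)
    return clf.predict(zq)
```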
Cyber Attack Detection thanks to Machine Learning Algorithms
Title | Cyber Attack Detection thanks to Machine Learning Algorithms |
Authors | Antoine Delplace, Sheryl Hermoso, Kristofer Anandita |
Abstract | Cybersecurity attacks are growing in both frequency and sophistication over the years. This increasing sophistication and complexity call for more advancement and continuous innovation in defensive strategies. Traditional methods of intrusion detection and deep packet inspection, while still largely used and recommended, are no longer sufficient to meet the demands of growing security threats. As computing power increases and costs drop, Machine Learning is seen as an alternative method, or an additional mechanism, to defend against malware, botnets, and other attacks. This paper explores Machine Learning as a viable solution by examining its capability to classify malicious traffic in a network. First, a thorough data analysis is performed, resulting in 22 features extracted from the initial NetFlow datasets. All these features are then compared with one another through a feature selection process. Our approach then analyzes five different machine learning algorithms against a NetFlow dataset containing common botnets. The Random Forest classifier succeeds in detecting more than 95% of the botnets in 8 out of 13 scenarios, and more than 55% in the most difficult datasets. Finally, insight is given on how to improve and generalize the results, especially through a bootstrapping technique. |
Tasks | Cyber Attack Detection, Feature Selection, Intrusion Detection |
Published | 2020-01-17 |
URL | https://arxiv.org/abs/2001.06309v1 |
https://arxiv.org/pdf/2001.06309v1.pdf | |
PWC | https://paperswithcode.com/paper/cyber-attack-detection-thanks-to-machine |
Repo | https://github.com/antoinedelplace/Cyberattack-Detection |
Framework | tf |
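Once the 22 flow features are extracted, the classification stage reduces to standard scikit-learn usage. In the sketch below, `flows.csv`, its column names, and the forest hyperparameters are placeholders, not the paper's exact pipeline.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load pre-extracted flow features; "label" marks botnet vs. normal traffic.
df = pd.read_csv("flows.csv")
X, y = df.drop(columns=["label"]), df["label"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y)

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced")
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```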
PointHop++: A Lightweight Learning Model on Point Sets for 3D Classification
Title | PointHop++: A Lightweight Learning Model on Point Sets for 3D Classification |
Authors | Min Zhang, Yifan Wang, Pranav Kadam, Shan Liu, C. -C. Jay Kuo |
Abstract | The PointHop method was recently proposed by Zhang et al. for 3D point cloud classification with unsupervised feature extraction. It has an extremely low training complexity while achieving state-of-the-art classification performance. In this work, we further improve the PointHop method in two respects: 1) reducing its model complexity in terms of the number of model parameters, and 2) ordering discriminant features automatically based on the cross-entropy criterion. The resulting method is called PointHop++. The first improvement is essential for wearable and mobile computing, while the second bridges statistics-based and optimization-based machine learning methodologies. With experiments conducted on the ModelNet40 benchmark dataset, we show that the PointHop++ method performs on par with deep neural network (DNN) solutions and surpasses other unsupervised feature extraction methods. |
Tasks | |
Published | 2020-02-09 |
URL | https://arxiv.org/abs/2002.03281v1 |
https://arxiv.org/pdf/2002.03281v1.pdf | |
PWC | https://paperswithcode.com/paper/pointhop-a-lightweight-learning-model-on |
Repo | https://github.com/minzhang-1/PointHop2 |
Framework | pytorch |
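The feature-ordering idea can be illustrated with a simple per-dimension estimate: bin each feature, measure how mixed the class labels are within the bins, and rank dimensions so the most discriminative (lowest conditional entropy) come first. The histogram binning below is an assumption for illustration, not the exact PointHop++ cross-entropy procedure.

```python
import numpy as np

def feature_score(x, y, n_bins=32, n_classes=None):
    # Conditional entropy of the labels given the binned feature value;
    # lower means the feature separates the classes better.
    if n_classes is None:
        n_classes = y.max() + 1
    edges = np.histogram_bin_edges(x, bins=n_bins)
    bins = np.digitize(x, edges[1:-1])
    score = 0.0
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        p = np.bincount(y[mask], minlength=n_classes) / mask.sum()
        score -= mask.mean() * np.sum(p * np.log(p + 1e-12))
    return score

def rank_features(X, y):
    scores = [feature_score(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)      # most discriminative dimensions first

X = np.random.randn(200, 64)       # placeholder features
y = np.random.randint(0, 4, 200)
order = rank_features(X, y)
```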
PO-EMO: Conceptualization, Annotation, and Modeling of Aesthetic Emotions in German and English Poetry
Title | PO-EMO: Conceptualization, Annotation, and Modeling of Aesthetic Emotions in German and English Poetry |
Authors | Thomas Haider, Steffen Eger, Evgeny Kim, Roman Klinger, Winfried Menninghaus |
Abstract | Most approaches to emotion analysis of social media, literature, news, and other domains focus exclusively on basic emotion categories as defined by Ekman or Plutchik. However, art (such as literature) enables engagement with a broader range of more complex and subtle emotions, which have been shown to also include mixed emotional responses. We consider emotions as they are elicited in the reader, rather than what is expressed in the text or intended by the author. Thus, we conceptualize a set of aesthetic emotions that are predictive of aesthetic appreciation in the reader, and allow the annotation of multiple labels per line to capture mixed emotions within context. We evaluate this novel setting in an annotation experiment both with carefully trained experts and via crowdsourcing. Our expert annotation leads to an acceptable agreement of kappa=.70, resulting in a consistent dataset for future large-scale analysis. Finally, we conduct first emotion classification experiments based on BERT, showing that identifying aesthetic emotions is challenging in our data, with an F1-micro score of up to .52 on the German subset. Data and resources are available at https://github.com/tnhaider/poetry-emotion. |
Tasks | Emotion Classification, Emotion Recognition |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07723v1 |
https://arxiv.org/pdf/2003.07723v1.pdf | |
PWC | https://paperswithcode.com/paper/po-emo-conceptualization-annotation-and |
Repo | https://github.com/tnhaider/poetry-emotion |
Framework | none |
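Per-line multi-label emotion classification, as in the paper's BERT experiments, maps onto the standard multi-label setup with sigmoid outputs and a BCE loss. In this sketch the German checkpoint and the four-label subset are illustrative assumptions; PO-EMO's aesthetic-emotion inventory is larger.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative subset of the aesthetic-emotion labels.
labels = ["beauty/joy", "sadness", "uneasiness", "awe/sublime"]
tok = AutoTokenizer.from_pretrained("bert-base-german-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-german-cased", num_labels=len(labels),
    problem_type="multi_label_classification")   # sigmoid + BCE loss

batch = tok(["Der Mond ist aufgegangen"], return_tensors="pt")
with torch.no_grad():
    probs = torch.sigmoid(model(**batch).logits)  # one probability per emotion
```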