Paper Group AWR 31
Adversarial Continual Learning
Title | Adversarial Continual Learning |
Authors | Sayna Ebrahimi, Franziska Meier, Roberto Calandra, Trevor Darrell, Marcus Rohrbach |
Abstract | Continual learning aims to learn new tasks without forgetting previously learned ones. We hypothesize that representations learned to solve each task in a sequence have a shared structure while containing some task-specific properties. We show that shared features are significantly less prone to forgetting and propose a novel hybrid continual learning framework that learns disjoint representations for the task-invariant and task-specific features required to solve a sequence of tasks. Our model combines architecture growth to prevent forgetting of task-specific skills and an experience replay approach to preserve shared skills. We demonstrate our hybrid approach is effective in avoiding forgetting and show it is superior to both architecture-based and memory-based approaches on class-incremental learning of a single dataset as well as on a sequence of multiple datasets in image classification. Our code is available at \url{https://github.com/facebookresearch/Adversarial-Continual-Learning}. |
Tasks | Continual Learning, Image Classification |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.09553v1 |
https://arxiv.org/pdf/2003.09553v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-continual-learning |
Repo | https://github.com/facebookresearch/Adversarial-Continual-Learning |
Framework | pytorch |
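The hybrid scheme described above (a task-invariant shared representation trained adversarially against a task discriminator, plus task-specific private modules) can be sketched compactly. The following is an illustrative PyTorch sketch, not the authors' implementation; the layer sizes, the 5-task setup, and the use of gradient reversal to train the shared encoder adversarially are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity on the forward pass; negated gradient on the backward pass,
    # so the shared encoder is trained to *fool* the task discriminator.
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

shared = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())   # task-invariant
private = nn.ModuleList([nn.Sequential(nn.Flatten(), nn.Linear(784, 32), nn.ReLU())
                         for _ in range(5)])                            # grows per task
discriminator = nn.Linear(128, 5)       # guesses which task produced a shared feature
head = nn.Linear(128 + 32, 10)          # classifier over the joint representation

def forward_task(x, task_id):
    s = shared(x)
    p = private[task_id](x)
    task_logits = discriminator(GradReverse.apply(s))   # adversarial branch
    class_logits = head(torch.cat([s, p], dim=1))
    return class_logits, task_logits
```

During training, the discriminator learns to identify the originating task from shared features, while the reversed gradient pushes the shared encoder toward task-invariance; experience replay would then preserve these shared skills across tasks.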
State-Aware Tracker for Real-Time Video Object Segmentation
Title | State-Aware Tracker for Real-Time Video Object Segmentation |
Authors | Xi Chen, Zuoxin Li, Ye Yuan, Gang Yu, Jianxin Shen, Donglian Qi |
Abstract | In this work, we address the task of semi-supervised video object segmentation (VOS) and explore how to make efficient use of video properties to tackle the challenge of semi-supervision. We propose a novel pipeline called State-Aware Tracker (SAT), which can produce accurate segmentation results at real-time speed. For higher efficiency, SAT takes advantage of inter-frame consistency and deals with each target object as a tracklet. For more stable and robust performance over video sequences, SAT becomes aware of its current state and adapts itself via two feedback loops. One loop assists SAT in generating more stable tracklets. The other helps to construct a more robust and holistic target representation. SAT achieves a promising result of 72.3% J&F mean at 39 FPS on the DAVIS2017-Val dataset, which shows a decent trade-off between efficiency and accuracy. Code will be released at github.com/MegviiDetection/video_analyst. |
Tasks | Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2020-03-01 |
URL | https://arxiv.org/abs/2003.00482v1 |
https://arxiv.org/pdf/2003.00482v1.pdf | |
PWC | https://paperswithcode.com/paper/state-aware-tracker-for-real-time-video |
Repo | https://github.com/MegviiDetection/video_analyst |
Framework | none |
FedDANE: A Federated Newton-Type Method
Title | FedDANE: A Federated Newton-Type Method |
Authors | Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith |
Abstract | Federated learning aims to jointly learn statistical models over massively distributed remote devices. In this work, we propose FedDANE, an optimization method that we adapt from DANE, a method for classical distributed optimization, to handle the practical constraints of federated learning. We provide convergence guarantees for this method when learning over both convex and non-convex functions. Despite encouraging theoretical results, we find that the method has underwhelming performance in practice. In particular, in empirical simulations on both synthetic and real-world datasets, FedDANE consistently underperforms the FedAvg and FedProx baselines in realistic federated settings. We identify low device participation and statistical device heterogeneity as two underlying causes of this underwhelming performance, and conclude by suggesting several directions of future work. |
Tasks | Distributed Optimization |
Published | 2020-01-07 |
URL | https://arxiv.org/abs/2001.01920v1 |
https://arxiv.org/pdf/2001.01920v1.pdf | |
PWC | https://paperswithcode.com/paper/feddane-a-federated-newton-type-method |
Repo | https://github.com/litian96/FedDANE |
Framework | none |
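FedDANE inherits DANE's gradient-corrected local subproblem: each sampled device receives the current iterate w_t and an aggregated gradient g_t, then approximately minimizes F_k(w) - (∇F_k(w_t) - g_t)ᵀw + (μ/2)||w - w_t||². A minimal NumPy sketch of one device's inexact solve is below; the step count, learning rate, and μ are illustrative assumptions.

```python
import numpy as np

def feddane_local_update(w_t, local_grad_fn, grad_k_at_wt, g_t,
                         mu=1e-2, lr=0.1, steps=20):
    # Inexactly solve the DANE-style subproblem
    #   min_w  F_k(w) - (grad F_k(w_t) - g_t)^T w + (mu/2) ||w - w_t||^2
    # with plain gradient descent; the subproblem's gradient at w is
    #   grad F_k(w) - grad F_k(w_t) + g_t + mu * (w - w_t).
    w = w_t.copy()
    for _ in range(steps):
        g = local_grad_fn(w) - grad_k_at_wt + g_t + mu * (w - w_t)
        w -= lr * g
    return w

# Toy usage with a quadratic local loss F_k(w) = 0.5 * ||A w - b||^2.
# In federated training, g_t is the average gradient over the sampled
# devices; with a single device it reduces to a proximal local step.
A, b = np.array([[1.0, 0.0], [0.0, 2.0]]), np.array([1.0, 1.0])
grad_fn = lambda w: A.T @ (A @ w - b)
w_t = np.zeros(2)
w_next = feddane_local_update(w_t, grad_fn, grad_fn(w_t), grad_fn(w_t))
```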
Gimme Signals: Discriminative signal encoding for multimodal activity recognition
Title | Gimme Signals: Discriminative signal encoding for multimodal activity recognition |
Authors | Raphael Memmesheimer, Nick Theisen, Dietrich Paulus |
Abstract | We present a simple yet effective and flexible method for action recognition that supports multiple sensor modalities. Multivariate signal sequences are encoded in an image and then classified using a recently proposed EfficientNet CNN architecture. Our focus was to find an approach that generalizes well across different sensor modalities without modality-specific adaptations while still achieving good results. We apply our method to 4 action recognition datasets containing skeleton sequences, inertial and motion capturing measurements, as well as Wi-Fi fingerprints, with up to 120 action classes. Our method sets the current best CNN-based result on the NTU RGB+D 120 dataset, lifts the state of the art on the ARIL Wi-Fi dataset by +6.78%, improves the UTD-MHAD inertial baseline by +14.4% and the UTD-MHAD skeleton baseline by +1.13%, and achieves 96.11% on the Simitate motion capturing data (80/20 split). We further demonstrate experiments on both modality fusion at the signal level and signal reduction to prevent the representation from becoming overloaded. |
Tasks | Action Recognition In Videos, Activity Recognition, Multimodal Activity Recognition |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.06156v1 |
https://arxiv.org/pdf/2003.06156v1.pdf | |
PWC | https://paperswithcode.com/paper/gimme-signals-discriminative-signal-encoding |
Repo | https://github.com/airglow/gimme_signals_action_recognition |
Framework | pytorch |
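The central move, rendering a multivariate signal sequence as an image and classifying it with an off-the-shelf CNN, can be approximated in a few lines. The paper plots signals with matplotlib colormaps; the sketch below substitutes plain bilinear interpolation of the raw signal matrix, so the normalization and shapes are assumptions rather than the authors' exact encoding.

```python
import torch
import torch.nn.functional as F
from torchvision.models import efficientnet_b0

def signal_to_image(sig, size=224):
    # sig: (channels, timesteps) multivariate sequence, e.g. skeleton joints.
    sig = (sig - sig.min()) / (sig.max() - sig.min() + 1e-8)   # scale to [0, 1]
    img = F.interpolate(sig[None, None], size=(size, size),
                        mode="bilinear", align_corners=False)  # stretch to a square
    return img.repeat(1, 3, 1, 1)                              # fake RGB for the CNN

model = efficientnet_b0(num_classes=120)       # e.g. NTU RGB+D 120 action classes
x = signal_to_image(torch.randn(75, 300))      # 25 joints x 3 coords, 300 frames
logits = model(x)
```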
On the Road with 16 Neurons: Mental Imagery with Bio-inspired Deep Neural Networks
Title | On the Road with 16 Neurons: Mental Imagery with Bio-inspired Deep Neural Networks |
Authors | Alice Plebe, Mauro Da Lio |
Abstract | This paper proposes a strategy for visual prediction in the context of autonomous driving. Humans, when not distracted or drunk, are still the best drivers you can currently find. For this reason we take inspiration from two theoretical ideas about the human mind and its neural organization. The first idea concerns how the brain uses a hierarchical structure of neuron ensembles to extract abstract concepts from visual experience and code them into compact representations. The second idea suggests that these neural perceptual representations are not neutral but functional to the prediction of the future state of affairs in the environment. Similarly, the prediction mechanism is not neutral but oriented to the current planning of a future action. We identify within the deep learning framework two artificial counterparts of the aforementioned neurocognitive theories. We find a correspondence between the first theoretical idea and the architecture of convolutional autoencoders, while we translate the second theory into a training procedure that learns compact representations which are not neutral but oriented to driving tasks, from two distinct perspectives. From a static perspective, we force groups of neural units in the compact representations to distinctly represent specific concepts crucial to the driving task. From a dynamic perspective, we encourage the compact representations to be predictive of how the current road scenario will change in the future. We successfully learn compact representations that use as few as 16 neural units for each of the two basic driving concepts we consider: car and lane. We demonstrate the efficiency of our proposed perceptual representations on the SYNTHIA dataset. Our source code is available at https://github.com/3lis/rnn_vae. |
Tasks | Autonomous Driving |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.08745v1 |
https://arxiv.org/pdf/2003.08745v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-road-with-16-neurons-mental-imagery |
Repo | https://github.com/3lis/rnn_vae |
Framework | tf |
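A convolutional autoencoder with a 16-unit bottleneck, the paper's artificial counterpart of a compact neural representation, looks roughly like this. This is a toy PyTorch sketch (the released code is TensorFlow); the input resolution, channel counts, and single-concept setup are assumptions.

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    # Encode a 64x64 RGB frame into a 16-number code and decode it back.
    def __init__(self, code=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(), nn.Linear(32 * 16 * 16, code))
        self.dec = nn.Sequential(
            nn.Linear(code, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        z = self.enc(x)            # the 16-unit "mental image" of the scene
        return self.dec(z), z

model = TinyAE()
recon, code = model(torch.rand(1, 3, 64, 64))
```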
Adapting Deep Learning for Sentiment Classification of Code-Switched Informal Short Text
Title | Adapting Deep Learning for Sentiment Classification of Code-Switched Informal Short Text |
Authors | Muhammad Haroon Shakeel, Asim Karim |
Abstract | Nowadays, an abundance of short text is being generated that uses nonstandard writing styles influenced by regional languages. Such informal and code-switched content is under-resourced in terms of labeled datasets and language models, even for popular tasks like sentiment classification. In this work, we (1) present a labeled dataset called MultiSenti for sentiment classification of code-switched informal short text, (2) explore the feasibility of adapting resources from a resource-rich language for an informal one, and (3) propose a deep learning-based model for sentiment classification of code-switched informal short text. We aim to achieve this without any lexical normalization, language translation, or code-switching indication. The performance of the proposed model is compared with three existing multilingual sentiment classification models. The results show that the proposed model performs better in general, and that adapting character-based embeddings yields performance equivalent to, while being computationally more efficient than, training word-based domain-specific embeddings. |
Tasks | Lexical Normalization, Sentiment Analysis |
Published | 2020-01-04 |
URL | https://arxiv.org/abs/2001.01047v1 |
https://arxiv.org/pdf/2001.01047v1.pdf | |
PWC | https://paperswithcode.com/paper/adapting-deep-learning-for-sentiment |
Repo | https://github.com/haroonshakeel/multisenti |
Framework | none |
CNN-based Density Estimation and Crowd Counting: A Survey
Title | CNN-based Density Estimation and Crowd Counting: A Survey |
Authors | Guangshuai Gao, Junyu Gao, Qingjie Liu, Qi Wang, Yunhong Wang |
Abstract | Accurately estimating the number of objects in a single image is a challenging yet meaningful task that has been applied in many areas such as urban planning and public safety. Among the various object counting tasks, crowd counting is particularly prominent due to its significance to social security and development. Fortunately, techniques developed for crowd counting can often be generalized to related fields such as vehicle counting and environmental surveying. Consequently, many researchers have devoted themselves to crowd counting, and a large body of excellent work has emerged. The question worth considering is why these methods are effective for the task; since analyzing every algorithm is infeasible, in this paper we survey over 220 works to comprehensively and systematically study crowd counting models, mainly CNN-based density map estimation methods. Finally, according to the evaluation metrics, we select the top three performers on the main crowd counting datasets and analyze their merits and drawbacks. Through our analysis, we aim to draw reasonable inferences and predictions about the future development of crowd counting and, at the same time, to provide feasible solutions for the problem of object counting in other fields. We provide the density maps and prediction results of some mainstream algorithms on the validation set of the NWPU dataset for comparison and testing. Density map generation and evaluation tools are also provided. All the code and evaluation results are made publicly available at https://github.com/gaoguangshuai/survey-for-crowd-counting. |
Tasks | Crowd Counting, Density Estimation, Object Counting |
Published | 2020-03-28 |
URL | https://arxiv.org/abs/2003.12783v1 |
https://arxiv.org/pdf/2003.12783v1.pdf | |
PWC | https://paperswithcode.com/paper/cnn-based-density-estimation-and-crowd |
Repo | https://github.com/gjy3035/Awesome-Crowd-Counting |
Framework | pytorch |
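The density maps these models regress are conventionally built by placing a Gaussian at each annotated head position, so that the map integrates to the person count. A minimal sketch with a fixed kernel width (many methods instead use geometry-adaptive kernels):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(points, shape, sigma=4.0):
    # points: [(row, col), ...] head annotations; the resulting map sums
    # (approximately) to the person count, the regression target for the CNN.
    dm = np.zeros(shape, dtype=np.float32)
    for r, c in points:
        if 0 <= int(r) < shape[0] and 0 <= int(c) < shape[1]:
            dm[int(r), int(c)] = 1.0
    return gaussian_filter(dm, sigma)

heads = [(40, 60), (42, 65), (100, 200)]
dm = density_map(heads, (240, 320))
print(dm.sum())   # ~3.0: the count is the integral of the map
```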
Lightweight Photometric Stereo for Facial Details Recovery
Title | Lightweight Photometric Stereo for Facial Details Recovery |
Authors | Xueying Wang, Yudong Guo, Bailin Deng, Juyong Zhang |
Abstract | Recently, 3D face reconstruction from a single image has achieved great success with the help of deep learning and shape prior knowledge, but such methods often fail to produce accurate geometric details. On the other hand, photometric stereo methods can recover reliable geometric details, but they require dense inputs and need to solve a complex optimization problem. In this paper, we present a lightweight strategy that requires only sparse inputs, or even a single image, to recover high-fidelity face shapes from images captured under near-field lights. To this end, we construct a dataset containing 84 different subjects with 29 expressions under 3 different lights. Data augmentation is applied to enrich the data in terms of diversity in identity, lighting, expression, etc. With this constructed dataset, we propose a novel neural network specially designed for photometric stereo based 3D face reconstruction. Extensive experiments and comparisons demonstrate that our method can generate high-quality reconstruction results with one to three facial images captured under near-field lights. Our full framework is available at https://github.com/Juyong/FacePSNet. |
Tasks | 3D Face Reconstruction, Data Augmentation, Face Reconstruction |
Published | 2020-03-27 |
URL | https://arxiv.org/abs/2003.12307v1 |
https://arxiv.org/pdf/2003.12307v1.pdf | |
PWC | https://paperswithcode.com/paper/lightweight-photometric-stereo-for-facial |
Repo | https://github.com/Juyong/FacePSNet |
Framework | pytorch |
3D Quasi-Recurrent Neural Network for Hyperspectral Image Denoising
Title | 3D Quasi-Recurrent Neural Network for Hyperspectral Image Denoising |
Authors | Kaixuan Wei, Ying Fu, Hua Huang |
Abstract | In this paper, we propose an alternating directional 3D quasi-recurrent neural network for hyperspectral image (HSI) denoising, which can effectively embed the domain knowledge of structural spatio-spectral correlation and global correlation along the spectrum. Specifically, 3D convolution is utilized to extract structural spatio-spectral correlation in an HSI, while a quasi-recurrent pooling function is employed to capture the global correlation along the spectrum. Moreover, an alternating directional structure is introduced to eliminate the causal dependency at no additional computational cost. The proposed model is capable of modeling spatio-spectral dependency while preserving flexibility towards HSIs with an arbitrary number of bands. Extensive experiments on HSI denoising demonstrate significant improvement over state-of-the-art methods under various noise settings, in terms of both restoration accuracy and computation time. Our code is available at https://github.com/Vandermode/QRNN3D. |
Tasks | Denoising, Image Denoising |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04547v1 |
https://arxiv.org/pdf/2003.04547v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-quasi-recurrent-neural-network-for |
Repo | https://github.com/Vandermode/QRNN3D |
Framework | pytorch |
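The quasi-recurrent pooling function replaces a full recurrence with a gated running average along the spectral axis, h_t = f_t ⊙ h_{t-1} + (1 - f_t) ⊙ z_t. Below is a sketch of one directional pass; the gate convolutions are omitted and the tensor shapes are assumptions.

```python
import torch

def quasi_recurrent_pool(z, f):
    # f-pooling along the band axis: h_t = f_t * h_{t-1} + (1 - f_t) * z_t.
    # z, f: (batch, channels, bands, H, W); f is a sigmoid forget gate.
    # Running this once per direction and merging the results gives the
    # paper's alternating-directional, non-causal behaviour.
    h, out = None, []
    for t in range(z.shape[2]):
        zt, ft = z[:, :, t], f[:, :, t]
        h = (1 - ft) * zt if h is None else ft * h + (1 - ft) * zt
        out.append(h)
    return torch.stack(out, dim=2)

z = torch.tanh(torch.randn(1, 8, 31, 16, 16))     # candidate features per band
f = torch.sigmoid(torch.randn(1, 8, 31, 16, 16))  # forget gate per band
h = quasi_recurrent_pool(z, f)
```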
Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization
Title | Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization |
Authors | Bruno Taillé, Vincent Guigue, Patrick Gallinari |
Abstract | Contextualized embeddings use unsupervised language model pretraining to compute word representations that depend on their context. This is intuitively useful for generalization, especially in Named-Entity Recognition, where it is crucial to detect mentions never seen during training. However, standard English benchmarks overestimate the importance of lexical over contextual features because of an unrealistic lexical overlap between train and test mentions. In this paper, we perform an empirical analysis of the generalization capabilities of state-of-the-art contextualized embeddings by separating mentions by novelty and with out-of-domain evaluation. We show that they are particularly beneficial for detecting unseen mentions, especially out-of-domain. For models trained on CoNLL03, language model contextualization leads to a maximal relative micro-F1 score increase of +1.2% in-domain, versus +13% out-of-domain on the WNUT dataset. |
Tasks | Language Modelling, Named Entity Recognition |
Published | 2020-01-22 |
URL | https://arxiv.org/abs/2001.08053v1 |
https://arxiv.org/pdf/2001.08053v1.pdf | |
PWC | https://paperswithcode.com/paper/contextualized-embeddings-in-named-entity |
Repo | https://github.com/btaille/contener |
Framework | pytorch |
Deep Learning-Based Feature Extraction in Iris Recognition: Use Existing Models, Fine-tune or Train From Scratch?
Title | Deep Learning-Based Feature Extraction in Iris Recognition: Use Existing Models, Fine-tune or Train From Scratch? |
Authors | Aidan Boyd, Adam Czajka, Kevin Bowyer |
Abstract | Modern deep learning techniques can be employed to generate effective feature extractors for the task of iris recognition. The question arises: should we train such structures from scratch on a relatively large iris image dataset, or is it better to fine-tune existing models to adapt them to the new domain? In this work we explore five different sets of weights for the popular ResNet-50 architecture to find out whether iris-specific feature extractors perform better than models trained for non-iris tasks. Features are extracted from each convolutional layer, and the classification accuracy achieved by a Support Vector Machine is measured on a dataset that is disjoint from the samples used in training of the ResNet-50 model. We show that the optimal training strategy is to fine-tune an off-the-shelf set of weights to the iris recognition domain. This approach results in greater accuracy than both off-the-shelf weights and a model trained from scratch. The winning, fine-tuned approach also shows an increase in performance over previous work, in which only off-the-shelf (not fine-tuned) models were used for iris feature extraction. We make the best-performing ResNet-50 model, fine-tuned with more than 360,000 iris images, publicly available along with this paper. |
Tasks | Iris Recognition |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08916v1 |
https://arxiv.org/pdf/2002.08916v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-based-feature-extraction-in |
Repo | https://github.com/BoydAidan/BTAS2019DeepFeatureExtraction |
Framework | none |
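The evaluation protocol, reading features off an intermediate convolutional layer and classifying them with an SVM, can be sketched as follows. The layer choice (layer3), the global average pooling, and the toy labels are placeholders, not the paper's exact configuration.

```python
import torch
from torchvision.models import resnet50
from sklearn.svm import SVC

# Off-the-shelf ImageNet weights stand in for the paper's five weight sets.
model = resnet50(weights="IMAGENET1K_V1").eval()

feats = {}
model.layer3.register_forward_hook(
    lambda m, i, o: feats.__setitem__("layer3", o))

def extract(batch):                       # batch: (N, 3, 224, 224) iris crops
    with torch.no_grad():
        model(batch)
    return feats["layer3"].mean(dim=(2, 3)).numpy()   # global average pool

X_train = extract(torch.rand(8, 3, 224, 224))         # stand-in images
svm = SVC(kernel="linear").fit(X_train, [0, 0, 1, 1, 2, 2, 3, 3])
```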
Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?
Title | Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need? |
Authors | Yonglong Tian, Yue Wang, Dilip Krishnan, Joshua B. Tenenbaum, Phillip Isola |
Abstract | The focus of recent meta-learning research has been on the development of learning algorithms that can quickly adapt to test-time tasks with limited data and low computational cost. Few-shot learning is widely used as one of the standard benchmarks in meta-learning. In this work, we show that a simple baseline, which learns a supervised or self-supervised representation on the meta-training set and then trains a linear classifier on top of this representation, outperforms state-of-the-art few-shot learning methods. An additional boost can be achieved through the use of self-distillation. This demonstrates that using a good learned embedding model can be more effective than sophisticated meta-learning algorithms. We believe that our findings motivate a rethinking of few-shot image classification benchmarks and the associated role of meta-learning algorithms. Code is available at: http://github.com/WangYueFt/rfs/. |
Tasks | Few-Shot Image Classification, Few-Shot Learning, Image Classification, Meta-Learning |
Published | 2020-03-25 |
URL | https://arxiv.org/abs/2003.11539v1 |
https://arxiv.org/pdf/2003.11539v1.pdf | |
PWC | https://paperswithcode.com/paper/rethinking-few-shot-image-classification-a |
Repo | https://github.com/WangYueFt/rfs |
Framework | pytorch |
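The baseline itself fits in a few lines: embed support and query images with the frozen meta-trained backbone, then fit a linear classifier on the support embeddings. In this sketch, `backbone` is a placeholder for any pretrained feature extractor, and the omission of feature normalization is a simplification of the paper's setup.

```python
import torch
from sklearn.linear_model import LogisticRegression

def few_shot_predict(backbone, support_x, support_y, query_x):
    # Freeze the embedding trained on the meta-train set, then fit a
    # linear classifier on the few labelled support examples.
    backbone.eval()
    with torch.no_grad():
        zs = backbone(support_x).numpy()   # e.g. 5-way 1-shot: 5 embeddings
        zq = backbone(query_x).numpy()
    clf = LogisticRegression(max_iter=1000).fit(zs, support_y)
    return clf.predict(zq)
```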
Cyber Attack Detection thanks to Machine Learning Algorithms
Title | Cyber Attack Detection thanks to Machine Learning Algorithms |
Authors | Antoine Delplace, Sheryl Hermoso, Kristofer Anandita |
Abstract | Cybersecurity attacks are growing in both frequency and sophistication over the years. This increasing sophistication and complexity call for more advancement and continuous innovation in defensive strategies. Traditional methods of intrusion detection and deep packet inspection, while still largely used and recommended, are no longer sufficient to meet the demands of growing security threats. As computing power increases and costs drop, Machine Learning is seen as an alternative method, or an additional mechanism, to defend against malware, botnets, and other attacks. This paper explores Machine Learning as a viable solution by examining its capability to classify malicious traffic in a network. First, a thorough data analysis is performed, resulting in 22 features extracted from the initial NetFlow datasets. All these features are then compared with one another through a feature selection process. Our approach then analyzes five different machine learning algorithms against a NetFlow dataset containing common botnets. The Random Forest classifier succeeds in detecting more than 95% of the botnets in 8 out of 13 scenarios, and more than 55% in the most difficult datasets. Finally, insight is given on how to improve and generalize the results, especially through a bootstrapping technique. |
Tasks | Cyber Attack Detection, Feature Selection, Intrusion Detection |
Published | 2020-01-17 |
URL | https://arxiv.org/abs/2001.06309v1 |
https://arxiv.org/pdf/2001.06309v1.pdf | |
PWC | https://paperswithcode.com/paper/cyber-attack-detection-thanks-to-machine |
Repo | https://github.com/antoinedelplace/Cyberattack-Detection |
Framework | tf |
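Once the 22 flow features are extracted, the classification stage reduces to standard scikit-learn usage. In the sketch below, `flows.csv`, its column names, and the forest hyperparameters are placeholders, not the paper's exact pipeline.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load pre-extracted flow features; "label" marks botnet vs. normal traffic.
df = pd.read_csv("flows.csv")
X, y = df.drop(columns=["label"]), df["label"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y)

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced")
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```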
PointHop++: A Lightweight Learning Model on Point Sets for 3D Classification
Title | PointHop++: A Lightweight Learning Model on Point Sets for 3D Classification |
Authors | Min Zhang, Yifan Wang, Pranav Kadam, Shan Liu, C. -C. Jay Kuo |
Abstract | The PointHop method was recently proposed by Zhang et al. for 3D point cloud classification with unsupervised feature extraction. It has an extremely low training complexity while achieving state-of-the-art classification performance. In this work, we further improve the PointHop method in two respects: 1) reducing its model complexity in terms of the number of model parameters, and 2) ordering discriminant features automatically based on the cross-entropy criterion. The resulting method is called PointHop++. The first improvement is essential for wearable and mobile computing, while the second bridges statistics-based and optimization-based machine learning methodologies. With experiments conducted on the ModelNet40 benchmark dataset, we show that the PointHop++ method performs on par with deep neural network (DNN) solutions and surpasses other unsupervised feature extraction methods. |
Tasks | |
Published | 2020-02-09 |
URL | https://arxiv.org/abs/2002.03281v1 |
https://arxiv.org/pdf/2002.03281v1.pdf | |
PWC | https://paperswithcode.com/paper/pointhop-a-lightweight-learning-model-on |
Repo | https://github.com/minzhang-1/PointHop2 |
Framework | pytorch |
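The feature-ordering idea can be illustrated with a simple per-dimension estimate: bin each feature, measure how mixed the class labels are within the bins, and rank dimensions so the most discriminative (lowest conditional entropy) come first. The histogram binning below is an assumption for illustration, not the exact PointHop++ cross-entropy procedure.

```python
import numpy as np

def feature_score(x, y, n_bins=32, n_classes=None):
    # Conditional entropy of the labels given the binned feature value;
    # lower means the feature separates the classes better.
    if n_classes is None:
        n_classes = y.max() + 1
    edges = np.histogram_bin_edges(x, bins=n_bins)
    bins = np.digitize(x, edges[1:-1])
    score = 0.0
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        p = np.bincount(y[mask], minlength=n_classes) / mask.sum()
        score -= mask.mean() * np.sum(p * np.log(p + 1e-12))
    return score

def rank_features(X, y):
    scores = [feature_score(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)      # most discriminative dimensions first

X = np.random.randn(200, 64)       # placeholder features
y = np.random.randint(0, 4, 200)
order = rank_features(X, y)
```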
PO-EMO: Conceptualization, Annotation, and Modeling of Aesthetic Emotions in German and English Poetry
Title | PO-EMO: Conceptualization, Annotation, and Modeling of Aesthetic Emotions in German and English Poetry |
Authors | Thomas Haider, Steffen Eger, Evgeny Kim, Roman Klinger, Winfried Menninghaus |
Abstract | Most approaches to emotion analysis of social media, literature, news, and other domains focus exclusively on basic emotion categories as defined by Ekman or Plutchik. However, art (such as literature) enables engagement with a broader range of more complex and subtle emotions, which have been shown to also include mixed emotional responses. We consider emotions as they are elicited in the reader, rather than what is expressed in the text or intended by the author. Thus, we conceptualize a set of aesthetic emotions that are predictive of aesthetic appreciation in the reader, and allow the annotation of multiple labels per line to capture mixed emotions within context. We evaluate this novel setting in an annotation experiment both with carefully trained experts and via crowdsourcing. Our expert annotation leads to an acceptable agreement of kappa=.70, resulting in a consistent dataset for future large-scale analysis. Finally, we conduct first emotion classification experiments based on BERT, showing that identifying aesthetic emotions is challenging in our data, with an F1-micro score of up to .52 on the German subset. Data and resources are available at https://github.com/tnhaider/poetry-emotion. |
Tasks | Emotion Classification, Emotion Recognition |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07723v1 |
https://arxiv.org/pdf/2003.07723v1.pdf | |
PWC | https://paperswithcode.com/paper/po-emo-conceptualization-annotation-and |
Repo | https://github.com/tnhaider/poetry-emotion |
Framework | none |
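Per-line multi-label emotion classification, as in the paper's BERT experiments, maps onto the standard multi-label setup with sigmoid outputs and a BCE loss. In this sketch the German checkpoint and the four-label subset are illustrative assumptions; PO-EMO's aesthetic-emotion inventory is larger.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative subset of the aesthetic-emotion labels.
labels = ["beauty/joy", "sadness", "uneasiness", "awe/sublime"]
tok = AutoTokenizer.from_pretrained("bert-base-german-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-german-cased", num_labels=len(labels),
    problem_type="multi_label_classification")   # sigmoid + BCE loss

batch = tok(["Der Mond ist aufgegangen"], return_tensors="pt")
with torch.no_grad():
    probs = torch.sigmoid(model(**batch).logits)  # one probability per emotion
```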