Paper Group AWR 23
Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction
Title | Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction |
Authors | Maha Elbayad, Laurent Besacier, Jakob Verbeek |
Abstract | Current state-of-the-art machine translation systems are based on encoder-decoder architectures that first encode the input sequence and then generate an output sequence based on the input encoding. Both are interfaced with an attention mechanism that recombines a fixed encoding of the source tokens based on the decoder state. We propose an alternative approach that instead relies on a single 2D convolutional neural network across both sequences. Each layer of our network re-codes source tokens on the basis of the output sequence produced so far. Attention-like properties are therefore pervasive throughout the network. Our model yields excellent results, outperforming state-of-the-art encoder-decoder systems, while being conceptually simpler and having fewer parameters. |
Tasks | Machine Translation |
Published | 2018-08-11 |
URL | http://arxiv.org/abs/1808.03867v3 |
http://arxiv.org/pdf/1808.03867v3.pdf | |
PWC | https://paperswithcode.com/paper/pervasive-attention-2d-convolutional-neural-1 |
Repo | https://github.com/elbayadm/attn2d
Framework | pytorch
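The core idea above translates into very little code. Below is a minimal PyTorch sketch of the 2D-convolutional decoding: source and target embeddings are tiled into a (target x source) grid, masked 2D convolutions keep the target axis causal, and max-pooling over the source axis produces next-token logits. All module names and sizes are illustrative; the paper's actual network is a DenseNet-style stack, not these plain residual blocks.

```python
import torch
import torch.nn as nn

class Pervasive2DBlock(nn.Module):
    """One masked 2D conv block over the (target x source) grid.
    Asymmetric padding makes the conv causal along the target axis,
    so position t only sees target tokens <= t."""
    def __init__(self, channels, k=3):
        super().__init__()
        # (left, right, top, bottom): full padding on the source axis,
        # "past-only" padding on the target axis
        self.pad = nn.ZeroPad2d((k // 2, k // 2, k - 1, 0))
        self.conv = nn.Conv2d(channels, channels, k)
        self.act = nn.ReLU()

    def forward(self, x):                       # x: (B, C, T_tgt, T_src)
        return self.act(self.conv(self.pad(x))) + x

class TinyPervasiveAttention(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64, blocks=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.blocks = nn.Sequential(*[Pervasive2DBlock(2 * dim) for _ in range(blocks)])
        self.out = nn.Linear(2 * dim, tgt_vocab)

    def forward(self, src, tgt):                # src: (B, T_src), tgt: (B, T_tgt)
        s = self.src_emb(src)                   # (B, T_src, D)
        t = self.tgt_emb(tgt)                   # (B, T_tgt, D)
        # every (target, source) grid cell holds both embeddings
        grid = torch.cat([
            t.unsqueeze(2).expand(-1, -1, s.size(1), -1),
            s.unsqueeze(1).expand(-1, t.size(1), -1, -1),
        ], dim=-1).permute(0, 3, 1, 2)          # (B, 2D, T_tgt, T_src)
        h = self.blocks(grid)
        h = h.max(dim=3).values                 # pool over the source axis
        return self.out(h.permute(0, 2, 1))     # (B, T_tgt, tgt_vocab)

model = TinyPervasiveAttention(src_vocab=1000, tgt_vocab=1000)
logits = model(torch.randint(0, 1000, (2, 7)),
               torch.randint(0, 1000, (2, 5)))  # (2, 5, 1000)
```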
Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition
Title | Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition |
Authors | Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu |
Abstract | In skeleton-based action recognition, graph convolutional networks (GCNs), which model the human body skeletons as spatiotemporal graphs, have achieved remarkable performance. However, in existing GCN-based methods, the topology of the graph is set manually, and it is fixed over all layers and input samples. This may not be optimal for the hierarchical GCN and diverse samples in action recognition tasks. In addition, the second-order information (the lengths and directions of bones) of the skeleton data, which is naturally more informative and discriminative for action recognition, is rarely investigated in existing methods. In this work, we propose a novel two-stream adaptive graph convolutional network (2s-AGCN) for skeleton-based action recognition. The topology of the graph in our model can be either uniformly or individually learned by the BP algorithm in an end-to-end manner. This data-driven method increases the flexibility of the model for graph construction and brings more generality to adapt to various data samples. Moreover, a two-stream framework is proposed to model both the first-order and the second-order information simultaneously, which brings a notable improvement in recognition accuracy. Extensive experiments on two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state-of-the-art by a significant margin. |
Tasks | graph construction, Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2018-05-20 |
URL | https://arxiv.org/abs/1805.07694v3 |
https://arxiv.org/pdf/1805.07694v3.pdf | |
PWC | https://paperswithcode.com/paper/non-local-graph-convolutional-networks-for |
Repo | https://github.com/lshiwjx/2s-AGCN |
Framework | pytorch |
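A hedged PyTorch sketch of the adaptive graph convolution described above: the adjacency is the sum of a fixed skeleton graph A, a freely learned matrix B shared across samples, and a data-dependent matrix C computed from embedded joint similarities. Time-averaging the embeddings and using a single adjacency (rather than the paper's per-subset partitions) are simplifications; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphConv(nn.Module):
    """Graph conv with adjacency A (fixed skeleton) + B (learned, shared
    across samples) + C (data-dependent joint-embedding similarity).
    Input x: (B, C, T, V) = batch, channels, frames, joints."""
    def __init__(self, in_c, out_c, A, embed_c=16):
        super().__init__()
        self.register_buffer('A', torch.as_tensor(A, dtype=torch.float32))
        self.B = nn.Parameter(torch.zeros_like(self.A))   # learned offset
        self.theta = nn.Conv2d(in_c, embed_c, 1)          # query embedding
        self.phi = nn.Conv2d(in_c, embed_c, 1)            # key embedding
        self.g = nn.Conv2d(in_c, out_c, 1)                # feature transform

    def forward(self, x):
        q = self.theta(x).mean(dim=2)                     # (B, E, V), time-averaged
        k = self.phi(x).mean(dim=2)
        C = F.softmax(torch.einsum('bev,bew->bvw', q, k), dim=-1)
        adj = self.A + self.B + C                         # broadcasts to (B, V, V)
        h = self.g(x)                                     # (B, out_c, T, V)
        return torch.einsum('bctv,bvw->bctw', h, adj)

# Joint-coordinate stream shown; the second ("bone") stream would take
# joint-coordinate differences as input, with the two scores summed at test time.
A = torch.eye(25)                                         # stand-in skeleton graph
layer = AdaptiveGraphConv(3, 64, A)
out = layer(torch.randn(8, 3, 300, 25))                   # (8, 64, 300, 25)
```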
Is preprocessing of text really worth your time for online comment classification?
Title | Is preprocessing of text really worth your time for online comment classification? |
Authors | Fahim Mohammad |
Abstract | A large proportion of online comments present on public domains are constructive; however, a significant proportion are toxic in nature. The comments contain many typos, which increases the number of features manifold and makes the ML model difficult to train. Considering the fact that data scientists spend approximately 80% of their time collecting, cleaning and organizing their data [1], we explored how much effort we should invest in the preprocessing (transformation) of raw comments before feeding them to state-of-the-art classification models. With the help of four models on the Jigsaw toxic comment classification data, we demonstrate that training a model without any transformation produces a relatively decent model. Applying even basic transformations can, in some cases, lead to worse performance, so they should be applied with caution. |
Tasks | |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02908v2 |
http://arxiv.org/pdf/1806.02908v2.pdf | |
PWC | https://paperswithcode.com/paper/is-preprocessing-of-text-really-worth-your |
Repo | https://github.com/ifahim/toxic-preprocess |
Framework | none |
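The paper's comparison is easy to reproduce in spirit with scikit-learn: train the same TF-IDF plus logistic-regression pipeline on raw and on minimally cleaned text, then compare cross-validated F1. The toy comments below merely stand in for the Jigsaw data, and the cleaning function is one plausible "basic transformation", not the paper's exact pipeline.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy stand-in for the Jigsaw comments (label 1 = toxic).
comments = ["you are awesome", "idiot!!! go away", "nice point, thanks",
            "SHUT UP you fool", "interesting read", "total garbage, moron"] * 50
labels = [0, 1, 0, 1, 0, 1] * 50

def basic_clean(text):
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)        # drop punctuation and digits
    return re.sub(r"\s+", " ", text).strip()

for name, docs in [("raw", comments),
                   ("cleaned", [basic_clean(c) for c in comments])]:
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    f1 = cross_val_score(model, docs, labels, cv=5, scoring="f1").mean()
    print(f"{name:8s} mean F1: {f1:.3f}")
```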
View Adaptive Neural Networks for High Performance Skeleton-based Human Action Recognition
Title | View Adaptive Neural Networks for High Performance Skeleton-based Human Action Recognition |
Authors | Pengfei Zhang, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jianru Xue, Nanning Zheng |
Abstract | Skeleton-based human action recognition has recently attracted increasing attention thanks to the accessibility and the popularity of 3D skeleton data. One of the key challenges in skeleton-based action recognition lies in the large view variations when capturing data. In order to alleviate the effects of view variations, this paper introduces a novel view adaptation scheme, which automatically determines the virtual observation viewpoints in a learning-based, data-driven manner. We design two view adaptive neural networks, i.e., VA-RNN based on RNN, and VA-CNN based on CNN. For each network, a novel view adaptation module learns and determines the most suitable observation viewpoints, and transforms the skeletons to those viewpoints for end-to-end recognition with a main classification network. Ablation studies find that the proposed view adaptive models are capable of transforming the skeletons of various viewpoints to much more consistent virtual viewpoints, which largely eliminates the influence of viewpoint. In addition, we design a two-stream scheme (referred to as VA-fusion) that fuses the scores of the two networks to provide the fused prediction. Extensive experimental evaluations on five challenging benchmarks demonstrate the effectiveness of the proposed view-adaptive networks and their superior performance over state-of-the-art approaches. The source code is available at https://github.com/microsoft/View-Adaptive-Neural-Networks-for-Skeleton-based-Human-Action-Recognition. |
Tasks | Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2018-04-20 |
URL | https://arxiv.org/abs/1804.07453v3 |
https://arxiv.org/pdf/1804.07453v3.pdf | |
PWC | https://paperswithcode.com/paper/view-adaptive-neural-networks-for-high |
Repo | https://github.com/microsoft/View-Adaptive-Neural-Networks-for-Skeleton-based-Human-Action-Recognition |
Framework | pytorch |
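A minimal PyTorch sketch of the view-adaptation idea: a small subnetwork regresses rotation angles and a translation from the skeleton sequence, the skeletons are transformed to that virtual viewpoint, and a main classifier runs on the transformed data. For brevity this predicts one viewpoint per sequence, whereas the paper regresses per-frame parameters; all sizes are illustrative.

```python
import torch
import torch.nn as nn

def rotation_matrix(angles):                     # angles: (B, 3) -> (B, 3, 3)
    a, b, c = angles[:, 0], angles[:, 1], angles[:, 2]
    zeros, ones = torch.zeros_like(a), torch.ones_like(a)
    Rx = torch.stack([ones, zeros, zeros,
                      zeros, a.cos(), -a.sin(),
                      zeros, a.sin(),  a.cos()], 1).view(-1, 3, 3)
    Ry = torch.stack([b.cos(), zeros, b.sin(),
                      zeros, ones, zeros,
                      -b.sin(), zeros, b.cos()], 1).view(-1, 3, 3)
    Rz = torch.stack([c.cos(), -c.sin(), zeros,
                      c.sin(),  c.cos(), zeros,
                      zeros, zeros, ones], 1).view(-1, 3, 3)
    return Rz @ Ry @ Rx

class ViewAdaptive(nn.Module):
    def __init__(self, joints=25, classes=60, hidden=128):
        super().__init__()
        self.view = nn.LSTM(joints * 3, 64, batch_first=True)
        self.angles = nn.Linear(64, 3)           # rotation parameters
        self.trans = nn.Linear(64, 3)            # translation of the origin
        self.main = nn.LSTM(joints * 3, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, classes)

    def forward(self, x):                        # x: (B, T, V, 3)
        B, T, V, _ = x.shape
        h, _ = self.view(x.view(B, T, -1))
        R = rotation_matrix(self.angles(h[:, -1]))   # one viewpoint per sequence
        t = self.trans(h[:, -1])
        x = torch.einsum('bij,btvj->btvi', R, x - t[:, None, None, :])
        out, _ = self.main(x.view(B, T, -1))
        return self.cls(out[:, -1])

va = ViewAdaptive()
scores = va(torch.randn(4, 30, 25, 3))           # (4, 60)
```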
Global overview of Imitation Learning
Title | Global overview of Imitation Learning |
Authors | Alexandre Attia, Sharone Dayan |
Abstract | Imitation Learning is a sequential task where the learner tries to mimic an expert’s actions in order to achieve the best performance. Several algorithms have been proposed recently for this task. In this project, we aim to provide a broad review of these algorithms, presenting their main features and comparing them on their performance and their regret bounds. |
Tasks | Imitation Learning |
Published | 2018-01-19 |
URL | http://arxiv.org/abs/1801.06503v1 |
http://arxiv.org/pdf/1801.06503v1.pdf | |
PWC | https://paperswithcode.com/paper/global-overview-of-imitation-learning |
Repo | https://github.com/hbzhang/awesomeimitationlearning |
Framework | none |
Faster PET Reconstruction with Non-Smooth Priors by Randomization and Preconditioning
Title | Faster PET Reconstruction with Non-Smooth Priors by Randomization and Preconditioning |
Authors | Matthias J. Ehrhardt, Pawel Markiewicz, Carola-Bibiane Schönlieb |
Abstract | Uncompressed clinical data from modern positron emission tomography (PET) scanners are very large, exceeding 350 million data points (projection bins). The last decades have seen tremendous advancements in mathematical imaging tools, many of which lead to non-smooth (i.e., non-differentiable) optimization problems that are much harder to solve than smooth optimization problems. Most of these tools have not been translated to clinical PET data, as the state-of-the-art algorithms for non-smooth problems do not scale well to large data. In this work, inspired by big data machine learning applications, we use advanced randomized optimization algorithms to solve the PET reconstruction problem for a very large class of non-smooth priors, which includes, for example, total variation, total generalized variation, directional total variation and various physical constraints. The proposed algorithm randomly uses subsets of the data and only updates the variables associated with these. While this idea often leads to divergent algorithms, we show that the proposed algorithm does indeed converge for any proper subset selection. Numerically, we show on real PET data (FDG and florbetapir) from a Siemens Biograph mMR that about ten projections and backprojections are sufficient to solve the MAP optimisation problem related to many popular non-smooth priors; thus showing that the proposed algorithm is fast enough to bring these models into routine clinical practice. |
Tasks | |
Published | 2018-08-21 |
URL | https://arxiv.org/abs/1808.07150v5 |
https://arxiv.org/pdf/1808.07150v5.pdf | |
PWC | https://paperswithcode.com/paper/faster-pet-reconstruction-with-non-smooth |
Repo | https://github.com/mehrhardt/spdhg |
Framework | none |
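The algorithmic core, stochastic PDHG with random data subsets, fits in a few lines of NumPy. The sketch below applies it to a toy nonnegative least-squares problem rather than the paper's TV-regularised PET objective, but it follows the SPDHG update structure: primal prox step, random subset, dual prox on that subset only, and an incrementally maintained A^T y with extrapolation.

```python
import numpy as np

# Toy problem: min_{x >= 0} 0.5 * ||A x - b||^2, with the rows of A split
# into n_sub subsets sampled uniformly at random (p_i = 1/n_sub).
rng = np.random.default_rng(0)
m, n, n_sub = 200, 50, 10
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.abs(rng.standard_normal(n))
b = A @ x_true

rows = np.array_split(np.arange(m), n_sub)
p = 1.0 / n_sub
# step sizes chosen so that sigma_i * tau * ||A_i||^2 < p_i for each subset
Ls = [np.linalg.norm(A[r], 2) for r in rows]
tau = 0.9 * p / max(Ls)
sigmas = [0.9 * p / (tau * L ** 2) for L in Ls]

x = np.zeros(n)
y = np.zeros(m)
z = A.T @ y                  # z = A^T y, kept up to date incrementally
zbar = z.copy()
for k in range(2000):
    x = np.maximum(x - tau * zbar, 0.0)         # prox of the nonnegativity constraint
    i = rng.integers(n_sub)
    r, s = rows[i], sigmas[i]
    y_old = y[r].copy()
    # prox of sigma * f_i^* for f_i(v) = 0.5 * ||v - b_i||^2
    y[r] = (y[r] + s * (A[r] @ x) - s * b[r]) / (1.0 + s)
    dz = A[r].T @ (y[r] - y_old)
    z += dz
    zbar = z + dz / p                           # extrapolation step
print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```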
Competency Questions and SPARQL-OWL Queries Dataset and Analysis
Title | Competency Questions and SPARQL-OWL Queries Dataset and Analysis |
Authors | Dawid Wisniewski, Jedrzej Potoniec, Agnieszka Lawrynowicz, C. Maria Keet |
Abstract | Competency Questions (CQs) are natural language questions outlining and constraining the scope of knowledge represented by an ontology. Although CQs are part of several ontology engineering methodologies, we have observed that the actual publication of CQs for the available ontologies is very limited, and the publication of their respective formalisations in terms of, e.g., SPARQL queries is scarcer still. This paper aims to contribute to addressing the engineering shortcomings of using CQs in ontology development, to facilitate wider use of CQs. In order to understand the relation between CQs and the queries used to test them on an ontology, we gather, analyse, and publicly release a set of 234 CQs and their translations to SPARQL-OWL for several ontologies in different domains developed by different groups. We analysed the CQs in two principal ways. The first stage focused on a linguistic analysis of the natural language text itself, i.e., a lexico-syntactic analysis without any presuppositions of ontology elements, and a subsequent step of semantic analysis in order to find patterns. This increased diversity of CQ sources resulted in a 5-fold increase over hitherto published patterns, to 106 distinct CQ patterns, of which only a small subset is shared across the CQ sets from the different ontologies. Next, we analysed the relation between the found CQ patterns and the 46 SPARQL-OWL query signatures, which revealed that one CQ pattern may be realised by more than one SPARQL-OWL query signature, and vice versa. We hope that our work will contribute to establishing common practices, templates, automation, and user tools that will support CQ formulation, formalisation, execution, and general management. |
Tasks | |
Published | 2018-11-23 |
URL | http://arxiv.org/abs/1811.09529v1 |
http://arxiv.org/pdf/1811.09529v1.pdf | |
PWC | https://paperswithcode.com/paper/competency-questions-and-sparql-owl-queries |
Repo | https://github.com/CQ2SPARQLOWL/Dataset |
Framework | none |
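To make the CQ-to-query relationship concrete, here is a hedged Python example using rdflib: a hypothetical competency question formalised as a SPARQL query over a toy graph. Note that the dataset's queries are SPARQL-OWL, i.e., they use OWL vocabulary and entailment, which plain rdflib querying does not provide; the ontology, names, and question below are invented for illustration.

```python
from rdflib import Graph, Namespace, RDF

# Hypothetical mini-ontology: processes and the materials they use.
EX = Namespace("http://example.org/onto#")
g = Graph()
g.add((EX.Fermentation, RDF.type, EX.Process))
g.add((EX.Fermentation, EX.usesMaterial, EX.Yeast))
g.add((EX.Baking, RDF.type, EX.Process))
g.add((EX.Baking, EX.usesMaterial, EX.Flour))

# CQ: "Which processes use yeast?" formalised as a SPARQL query.
cq = """
PREFIX ex: <http://example.org/onto#>
SELECT ?process WHERE {
  ?process a ex:Process ;
           ex:usesMaterial ex:Yeast .
}
"""
for row in g.query(cq):
    print(row.process)     # -> http://example.org/onto#Fermentation
```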
Look at Boundary: A Boundary-Aware Face Alignment Algorithm
Title | Look at Boundary: A Boundary-Aware Face Alignment Algorithm |
Authors | Wayne Wu, Chen Qian, Shuo Yang, Quan Wang, Yici Cai, Qiang Zhou |
Abstract | We present a novel boundary-aware face alignment algorithm that utilises boundary lines as the geometric structure of a human face to help facial landmark localisation. Unlike conventional heatmap-based and regression-based methods, our approach derives face landmarks from boundary lines, which removes the ambiguities in landmark definition. Three questions are explored and answered by this work: 1. Why use boundaries? 2. How can boundaries be used? 3. What is the relationship between boundary estimation and landmark localisation? Our boundary-aware face alignment algorithm achieves 3.49% mean error on 300-W Fullset, which outperforms state-of-the-art methods by a large margin. Our method can also easily integrate information from other datasets. By utilising boundary information of the 300-W dataset, our method achieves 3.92% mean error with 0.39% failure rate on the COFW dataset, and 1.25% mean error on the AFLW-Full dataset. Moreover, we propose a new dataset, WFLW, to unify training and testing across different factors, including poses, expressions, illuminations, makeups, occlusions, and blurriness. Dataset and model will be publicly available at https://wywu.github.io/projects/LAB/LAB.html |
Tasks | Face Alignment |
Published | 2018-05-26 |
URL | http://arxiv.org/abs/1805.10483v1 |
http://arxiv.org/pdf/1805.10483v1.pdf | |
PWC | https://paperswithcode.com/paper/look-at-boundary-a-boundary-aware-face |
Repo | https://github.com/wywu/LAB
Framework | none |
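A hedged PyTorch sketch of the boundary-aware structure: a boundary estimator produces K boundary heatmaps, which are fused with image features before landmark coordinates are regressed. The paper uses stacked hourglass networks, message passing, and adversarial learning; the tiny convolutions below only illustrate the fusion interface, and all layer names and sizes are invented.

```python
import torch
import torch.nn as nn

class BoundaryAwareRegressor(nn.Module):
    """Minimal sketch: fuse predicted boundary heatmaps with image
    features, then regress landmark coordinates from the fused map."""
    def __init__(self, boundaries=13, landmarks=98):
        super().__init__()
        # stand-in boundary estimator (the paper uses stacked hourglasses)
        self.boundary = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, boundaries, 1), nn.Sigmoid())
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Sequential(
            nn.Conv2d(32 + boundaries, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, landmarks * 2))

    def forward(self, img):                        # img: (B, 3, H, W)
        heat = self.boundary(img)                  # (B, K, H, W) boundary maps
        fused = torch.cat([self.features(img), heat], dim=1)
        coords = self.head(fused).view(img.size(0), -1, 2)
        return coords, heat

model = BoundaryAwareRegressor()
coords, heatmaps = model(torch.randn(2, 3, 64, 64))   # coords: (2, 98, 2)
```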
Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation
Title | Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation |
Authors | Shao-Yuan Lo, Hsueh-Ming Hang, Sheng-Wei Chan, Jing-Jhih Lin |
Abstract | Real-time semantic segmentation plays an important role in practical applications such as self-driving and robots. Most semantic segmentation research focuses on improving estimation accuracy with little consideration of efficiency. Several previous studies that emphasize high-speed inference often fail to produce high-accuracy segmentation results. In this paper, we propose a novel convolutional network named Efficient Dense modules with Asymmetric convolution (EDANet), which employs an asymmetric convolution structure and incorporates dilated convolution and dense connectivity to achieve high efficiency at low computational cost and model size. EDANet is 2.7 times faster than the existing fast segmentation network ICNet, while it achieves a similar mIoU score without any additional context module, post-processing scheme, or pretrained model. We evaluate EDANet on the Cityscapes and CamVid datasets and compare it with other state-of-the-art systems. Our network runs on high-resolution inputs at a speed of 108 FPS on one GTX 1080Ti. |
Tasks | Real-Time Semantic Segmentation, Semantic Segmentation |
Published | 2018-09-17 |
URL | https://arxiv.org/abs/1809.06323v3 |
https://arxiv.org/pdf/1809.06323v3.pdf | |
PWC | https://paperswithcode.com/paper/efficient-dense-modules-of-asymmetric |
Repo | https://github.com/shaoyuanlo/EDANet |
Framework | pytorch |
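The EDA module is simple to write down. The PyTorch sketch below shows the two asymmetric (3x1 then 1x3) convolution pairs, the second pair dilated, with the module's output concatenated onto its input for dense connectivity. Channel counts, normalisation placement, and the growth rate are illustrative rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class EDAModule(nn.Module):
    """Asymmetric (3x1 + 1x3) convolutions with dilation; the module's
    output is concatenated with its input (dense connectivity), so each
    module adds `growth` channels."""
    def __init__(self, in_c, growth=40, dilation=2):
        super().__init__()
        d = dilation
        self.body = nn.Sequential(
            nn.Conv2d(in_c, growth, 1, bias=False),                  # reduce
            nn.Conv2d(growth, growth, (3, 1), padding=(1, 0)), nn.ReLU(),
            nn.Conv2d(growth, growth, (1, 3), padding=(0, 1)),
            nn.BatchNorm2d(growth), nn.ReLU(),
            nn.Conv2d(growth, growth, (3, 1), padding=(d, 0), dilation=(d, 1)), nn.ReLU(),
            nn.Conv2d(growth, growth, (1, 3), padding=(0, d), dilation=(1, d)),
            nn.BatchNorm2d(growth), nn.ReLU())

    def forward(self, x):
        return torch.cat([x, self.body(x)], dim=1)   # dense connection

# The paper grows the dilation rate over depth; fixed here for brevity.
blocks = nn.Sequential(*[EDAModule(60 + 40 * i) for i in range(4)])
out = blocks(torch.randn(1, 60, 64, 128))            # (1, 220, 64, 128)
```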
Effective Ways to Build and Evaluate Individual Survival Distributions
Title | Effective Ways to Build and Evaluate Individual Survival Distributions |
Authors | Humza Haider, Bret Hoehn, Sarah Davis, Russell Greiner |
Abstract | An accurate model of a patient’s individual survival distribution can help determine the appropriate treatment for terminal patients. Unfortunately, risk scores (e.g., from Cox Proportional Hazard models) do not provide survival probabilities, single-time probability models (e.g., the Gail model, predicting 5 year probability) provide a probability for only a single time point, and standard Kaplan-Meier survival curves provide only population averages for a large class of patients, meaning they are not specific to individual patients. This motivates an alternative class of tools that can learn a model which provides an individual survival distribution giving survival probabilities across all times - such as extensions to the Cox model, Accelerated Failure Time, an extension to Random Survival Forests, and Multi-Task Logistic Regression. This paper first motivates such “individual survival distribution” (ISD) models and explains how they differ from standard models. It then discusses ways to evaluate such models - namely Concordance, 1-Calibration, Brier score, and various versions of L1-loss - and then motivates and defines a novel approach, “D-Calibration”, which determines whether a model’s probability estimates are meaningful. We also discuss how these measures differ, and use them to evaluate several ISD prediction tools over a range of survival datasets. |
Tasks | Calibration |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11347v1 |
http://arxiv.org/pdf/1811.11347v1.pdf | |
PWC | https://paperswithcode.com/paper/effective-ways-to-build-and-evaluate |
Repo | https://github.com/haiderstats/ISDEvaluation |
Framework | none |
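D-Calibration has a compact operational form. In the uncensored case, a model is D-calibrated if the predicted survival probabilities S_i(t_i), each evaluated at that subject's observed event time, are uniform on [0, 1]; binning them and running a chi-square test checks this. The NumPy sketch below uses a simulated exponential ISD as a stand-in model and omits the paper's handling of censored subjects.

```python
import numpy as np
from scipy.stats import chisquare

# If the model is well calibrated, S_i(t_i) evaluated at each subject's
# actual event time t_i should be uniform on [0, 1].
rng = np.random.default_rng(1)
n = 1000
scale = rng.uniform(0.5, 2.0, n)               # hypothetical per-subject scale
event_times = rng.exponential(scale)           # simulated event times

def predicted_survival(t, scale):              # stand-in ISD: S(t) = exp(-t/scale)
    return np.exp(-t / scale)

s = predicted_survival(event_times, scale)     # one S_i(t_i) per subject
counts, _ = np.histogram(s, bins=10, range=(0.0, 1.0))
stat, pvalue = chisquare(counts)               # expected counts are uniform
print(f"chi-square p-value: {pvalue:.3f} (large p = no evidence of miscalibration)")
```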
Dynamic Few-Shot Visual Learning without Forgetting
Title | Dynamic Few-Shot Visual Learning without Forgetting |
Authors | Spyros Gidaris, Nikos Komodakis |
Abstract | The human visual system has the remarkable ability to effortlessly learn novel concepts from only a few examples. Mimicking the same behavior in machine learning vision systems is an interesting and very challenging research problem with many practical advantages for real-world vision applications. In this context, the goal of our work is to devise a few-shot visual learning system that, at test time, can efficiently learn novel categories from only a few training examples while not forgetting the initial categories on which it was trained (here called base categories). To achieve that goal we propose (a) to extend an object recognition system with an attention-based few-shot classification weight generator, and (b) to redesign the classifier of a ConvNet model as the cosine similarity function between feature representations and classification weight vectors. The latter, apart from unifying the recognition of both novel and base categories, also leads to feature representations that generalize better on “unseen” categories. We extensively evaluate our approach on Mini-ImageNet, where we improve the prior state-of-the-art on few-shot recognition (i.e., we achieve 56.20% and 73.00% on the 1-shot and 5-shot settings respectively) while not sacrificing any accuracy on the base categories, a characteristic that most prior approaches lack. Finally, we apply our approach to the recently introduced few-shot benchmark of Bharath and Girshick [4], where we also achieve state-of-the-art results. The code and models of our paper will be published on: https://github.com/gidariss/FewShotWithoutForgetting |
Tasks | Few-Shot Image Classification, Few-Shot Learning, Object Recognition, One-Shot Learning |
Published | 2018-04-25 |
URL | http://arxiv.org/abs/1804.09458v1 |
http://arxiv.org/pdf/1804.09458v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-few-shot-visual-learning-without |
Repo | https://github.com/gidariss/FewShotWithoutForgetting |
Framework | pytorch |
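Component (b), the cosine-similarity classifier, is a few lines of PyTorch. Features and class weight vectors are L2-normalised, and their scaled dot product gives the logits, which puts base-class and generated novel-class weights on the same scale. The attention-based weight generator of component (a) is omitted here; conceptually it would append generated rows to `weight`. Sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Scores are the scaled cosine similarity between L2-normalised
    features and L2-normalised class weight vectors."""
    def __init__(self, dim, classes, scale=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(classes, dim) * 0.01)
        self.scale = nn.Parameter(torch.tensor(scale))   # learnable temperature

    def forward(self, feats):                            # feats: (B, dim)
        return self.scale * F.linear(F.normalize(feats, dim=1),
                                     F.normalize(self.weight, dim=1))

clf = CosineClassifier(dim=128, classes=64)
logits = clf(torch.randn(8, 128))                        # (8, 64)
```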
Memory Attention Networks for Skeleton-based Action Recognition
Title | Memory Attention Networks for Skeleton-based Action Recognition |
Authors | Chunyu Xie, Ce Li, Baochang Zhang, Chen Chen, Jungong Han, Changqing Zou, Jianzhuang Liu |
Abstract | The skeleton-based action recognition task is entangled with complex spatio-temporal variations of skeleton joints and remains challenging for Recurrent Neural Networks (RNNs). In this work, we propose a temporal-then-spatial recalibration scheme to alleviate such complex variations, resulting in end-to-end Memory Attention Networks (MANs), which consist of a Temporal Attention Recalibration Module (TARM) and a Spatio-Temporal Convolution Module (STCM). Specifically, the TARM is deployed in a residual learning module that employs a novel attention learning network to recalibrate the temporal attention of frames in a skeleton sequence. The STCM treats the attention-calibrated skeleton joint sequences as images and leverages Convolutional Neural Networks (CNNs) to further model the spatial and temporal information of skeleton data. These two modules (TARM and STCM) seamlessly form a single network architecture that can be trained in an end-to-end fashion. MANs significantly boost the performance of skeleton-based action recognition and achieve the best results on four challenging benchmark datasets: NTU RGB+D, HDM05, SYSU-3D and UT-Kinect. |
Tasks | Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08254v2 |
http://arxiv.org/pdf/1804.08254v2.pdf | |
PWC | https://paperswithcode.com/paper/memory-attention-networks-for-skeleton-based |
Repo | https://github.com/memory-attention-networks/MANs |
Framework | tf |
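A hedged PyTorch sketch of the TARM's recalibration step (the released code is TensorFlow): each frame of a skeleton sequence gets an attention score, and frames are re-weighted residually before being handed to the convolutional STCM stage. The exact attention network and normalisation in the paper differ; this only illustrates the temporal-then-spatial interface, with invented sizes.

```python
import torch
import torch.nn as nn

class TARM(nn.Module):
    """Temporal attention recalibration (sketch): score every frame,
    then residually re-weight frames before the CNN stage."""
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1))

    def forward(self, x):                    # x: (B, T, D) flattened joints
        a = torch.sigmoid(self.score(x))     # (B, T, 1) per-frame attention
        return x * (1.0 + a)                 # residual recalibration

x = torch.randn(4, 30, 75)                   # 30 frames, 25 joints x 3 coords
recalibrated = TARM(75)(x)                   # same shape; fed to the STCM
```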
Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals
Title | Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals |
Authors | Sören Becker, Marcel Ackermann, Sebastian Lapuschkin, Klaus-Robert Müller, Wojciech Samek |
Abstract | Interpretability of deep neural networks is a recently emerging area of machine learning research targeting a better understanding of how models perform feature selection and derive their classification decisions. This paper explores the interpretability of neural networks in the audio domain by using the previously proposed technique of layer-wise relevance propagation (LRP). We present a novel audio dataset of English spoken digits which we use for classification tasks on spoken digits and speaker’s gender. We use LRP to identify relevant features for two neural network architectures that process either waveform or spectrogram representations of the data. Based on the relevance scores obtained from LRP, hypotheses about the neural networks’ feature selection are derived and subsequently tested through systematic manipulations of the input data. The results confirm that the networks are highly reliant on features marked as relevant by LRP. |
Tasks | Audio Classification, Decision Making, Feature Selection |
Published | 2018-07-09 |
URL | https://arxiv.org/abs/1807.03418v2 |
https://arxiv.org/pdf/1807.03418v2.pdf | |
PWC | https://paperswithcode.com/paper/interpreting-and-explaining-deep-neural |
Repo | https://github.com/soerenab/AudioMNIST |
Framework | none |
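LRP itself is easy to state: relevance at a layer's output is redistributed to its inputs in proportion to each input's contribution to the pre-activations. Below is a NumPy sketch of the epsilon rule on a tiny random ReLU MLP, which merely stands in for the paper's trained audio networks; biases are zero here, so relevance is approximately conserved across layers.

```python
import numpy as np

rng = np.random.default_rng(0)
# Tiny stand-in ReLU MLP: x -> h -> logits
W1, b1 = rng.standard_normal((20, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 3)), np.zeros(3)

x = rng.standard_normal(20)
h = np.maximum(x @ W1 + b1, 0.0)
logits = h @ W2 + b2

def lrp_epsilon(a, W, R_out, eps=1e-6):
    """Redistribute relevance R_out from a layer's outputs to its inputs a,
    proportionally to the contributions z_ij = a_i * W_ij (epsilon rule)."""
    z = a @ W                                 # pre-activations
    s = R_out / (z + eps * np.sign(z))        # stabilised ratio
    return a * (W @ s)                        # R_in_i = a_i * sum_j W_ij * s_j

# Start from the predicted class: all relevance on its logit.
R = np.zeros(3); R[np.argmax(logits)] = logits.max()
R = lrp_epsilon(h, W2, R)        # logits -> hidden relevances
R = lrp_epsilon(x, W1, R)        # hidden -> input relevances
print(R.round(3), "sum:", R.sum().round(3))
```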
A Closer Look at Weak Label Learning for Audio Events
Title | A Closer Look at Weak Label Learning for Audio Events |
Authors | Ankit Shah, Anurag Kumar, Alexander G. Hauptmann, Bhiksha Raj |
Abstract | Audio content analysis in terms of sound events is an important research problem for a variety of applications. Recently, the development of weak labeling approaches for audio or sound event detection (AED) and the availability of large-scale weakly labeled datasets have finally opened up the possibility of large-scale AED. However, a deeper understanding of how weak labels affect the learning of sound events is still missing from the literature. In this work, we first describe a CNN-based approach for weakly supervised training of audio events. The approach follows some basic design principles desirable in a learning method relying on weakly labeled audio. We then describe important characteristics which naturally arise in weakly supervised learning of sound events, and show how these aspects of weak labels affect the generalization of models. More specifically, we study how characteristics such as label density and label corruption affect weakly supervised training for audio events. We also study the feasibility of directly obtaining weakly labeled data from the web without any manual labeling and compare it with a dataset which has been manually labeled. The analysis and understanding of these factors should be taken into account in the development of future weak label learning methods. Audioset, a large-scale weakly labeled dataset for sound events, is used in our experiments. |
Tasks | Audio Classification, Sound Event Detection |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1804.09288v1 |
http://arxiv.org/pdf/1804.09288v1.pdf | |
PWC | https://paperswithcode.com/paper/a-closer-look-at-weak-label-learning-for |
Repo | https://github.com/ankitshah009/WALNet-Weak_Label_Analysis |
Framework | tf |
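A hedged PyTorch sketch of the weak-label training setup (the paper's code is TensorFlow): the CNN emits segment-level event scores over time, and pooling those into a single clip-level prediction lets the weak recording label supervise training. Max pooling is used here as one simple choice; the architecture and pooling are illustrative, not the paper's exact network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeakLabelCNN(nn.Module):
    """Segment-level predictions over time, pooled to one recording-level
    prediction that the weak (clip) label supervises."""
    def __init__(self, n_mels=64, classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((4, 1)),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((4, 1)))
        self.segment_head = nn.Conv2d(64, classes, (n_mels // 16, 1))

    def forward(self, spec):                      # spec: (B, 1, n_mels, T)
        seg = self.segment_head(self.conv(spec))  # (B, classes, 1, T)
        seg = torch.sigmoid(seg.squeeze(2))       # (B, classes, T) segment scores
        clip = seg.max(dim=2).values              # pool segments -> clip prediction
        return clip, seg

model = WeakLabelCNN()
clip_pred, seg_pred = model(torch.randn(2, 1, 64, 400))
targets = torch.randint(0, 2, (2, 10)).float()    # weak clip-level labels
loss = F.binary_cross_entropy(clip_pred, targets)
```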
Hyperspectral Image Classification in the Presence of Noisy Labels
Title | Hyperspectral Image Classification in the Presence of Noisy Labels |
Authors | Junjun Jiang, Jiayi Ma, Zheng Wang, Chen Chen, Xianming Liu |
Abstract | Label information plays an important role in the supervised hyperspectral image classification problem. However, current classification methods all ignore an important and inevitable problem: labels may be corrupted, and collecting clean labels for training samples is difficult and often impractical. Therefore, how to learn from a database with noisy labels is a problem of great practical importance. In this paper, we study the influence of label noise on hyperspectral image classification and develop a random label propagation algorithm (RLPA) to cleanse the label noise. The key idea of RLPA is to exploit knowledge (e.g., the superpixel-based spectral-spatial constraints) from the observed hyperspectral images and apply it to the process of label propagation. Specifically, RLPA first constructs a spectral-spatial probability transfer matrix (SSPTM) that simultaneously considers the spectral similarity and superpixel-based spatial information. It then randomly chooses some training samples as “clean” samples, sets the rest as unlabeled samples, and propagates the label information from the “clean” samples to the remaining unlabeled samples with the SSPTM. By repeating the random assignment (of “clean” labeled samples and unlabeled samples) and propagation, we can obtain multiple labels for each training sample, so the final propagated label can be calculated by a majority vote. Experimental studies show that RLPA can reduce the level of label noise and demonstrate the advantages of our proposed method over four major classifiers by a significant margin: the gains in terms of average OA, AA, and Kappa are impressive, e.g., 9.18%, 9.58%, and 0.1043. The Matlab source code is available at https://github.com/junjun-jiang/RLPA |
Tasks | Hyperspectral Image Classification, Image Classification |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04212v2 |
http://arxiv.org/pdf/1809.04212v2.pdf | |
PWC | https://paperswithcode.com/paper/hyperspectral-image-classification-in-the |
Repo | https://github.com/junjun-jiang/RLPA |
Framework | none |
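The RLPA loop is short enough to sketch in NumPy: repeatedly pick a random "clean" subset, propagate its labels through a probability transfer matrix with the seeds clamped, and majority-vote the propagated labels. A Gaussian feature similarity stands in for the paper's superpixel-based spectral-spatial transfer matrix (SSPTM); all sizes and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def propagate(P, labels, clean_idx, n_classes, iters=20, alpha=0.9):
    """Propagate labels through transition matrix P, clamping the 'clean' seeds."""
    n = len(labels)
    Y = np.zeros((n, n_classes))
    Y[clean_idx, labels[clean_idx]] = 1.0
    F = Y.copy()
    for _ in range(iters):
        F = alpha * P @ F + (1 - alpha) * Y
    return F.argmax(1)

# Toy stand-in for the SSPTM: row-normalised Gaussian similarity.
X = rng.standard_normal((100, 5))
S = np.exp(-np.square(X[:, None] - X[None]).sum(-1))
P = S / S.sum(1, keepdims=True)

noisy = rng.integers(0, 3, 100)                  # noisy training labels
votes = np.zeros((100, 3))
for _ in range(30):                              # random splits + majority vote
    clean = rng.choice(100, size=40, replace=False)
    pred = propagate(P, noisy, clean, 3)
    votes[np.arange(100), pred] += 1
cleansed = votes.argmax(1)                       # majority-voted labels
print("labels changed:", (cleansed != noisy).mean())
```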