Paper Group AWR 23
Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction
Title | Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction |
Authors | Maha Elbayad, Laurent Besacier, Jakob Verbeek |
Abstract | Current state-of-the-art machine translation systems are based on encoder-decoder architectures that first encode the input sequence and then generate an output sequence based on the input encoding. Both are interfaced with an attention mechanism that recombines a fixed encoding of the source tokens based on the decoder state. We propose an alternative approach that instead relies on a single 2D convolutional neural network across both sequences. Each layer of our network re-codes source tokens on the basis of the output sequence produced so far. Attention-like properties are therefore pervasive throughout the network. Our model yields excellent results, outperforming state-of-the-art encoder-decoder systems, while being conceptually simpler and having fewer parameters. |
Tasks | Machine Translation |
Published | 2018-08-11 |
URL | http://arxiv.org/abs/1808.03867v3 |
http://arxiv.org/pdf/1808.03867v3.pdf | |
PWC | https://paperswithcode.com/paper/pervasive-attention-2d-convolutional-neural-1 |
Repo | https://github.com/elbayadm/attn2d
Framework | pytorch
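The core idea above translates into very little code. Below is a minimal PyTorch sketch of the 2D-convolutional decoding: source and target embeddings are tiled into a (target x source) grid, masked 2D convolutions keep the target axis causal, and max-pooling over the source axis produces next-token logits. All module names and sizes are illustrative; the paper's actual network is a DenseNet-style stack, not these plain residual blocks.

```python
import torch
import torch.nn as nn

class Pervasive2DBlock(nn.Module):
    """One masked 2D conv block over the (target x source) grid.
    Asymmetric padding makes the conv causal along the target axis,
    so position t only sees target tokens <= t."""
    def __init__(self, channels, k=3):
        super().__init__()
        # (left, right, top, bottom): full padding on the source axis,
        # "past-only" padding on the target axis
        self.pad = nn.ZeroPad2d((k // 2, k // 2, k - 1, 0))
        self.conv = nn.Conv2d(channels, channels, k)
        self.act = nn.ReLU()

    def forward(self, x):                       # x: (B, C, T_tgt, T_src)
        return self.act(self.conv(self.pad(x))) + x

class TinyPervasiveAttention(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64, blocks=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.blocks = nn.Sequential(*[Pervasive2DBlock(2 * dim) for _ in range(blocks)])
        self.out = nn.Linear(2 * dim, tgt_vocab)

    def forward(self, src, tgt):                # src: (B, T_src), tgt: (B, T_tgt)
        s = self.src_emb(src)                   # (B, T_src, D)
        t = self.tgt_emb(tgt)                   # (B, T_tgt, D)
        # every (target, source) grid cell holds both embeddings
        grid = torch.cat([
            t.unsqueeze(2).expand(-1, -1, s.size(1), -1),
            s.unsqueeze(1).expand(-1, t.size(1), -1, -1),
        ], dim=-1).permute(0, 3, 1, 2)          # (B, 2D, T_tgt, T_src)
        h = self.blocks(grid)
        h = h.max(dim=3).values                 # pool over the source axis
        return self.out(h.permute(0, 2, 1))     # (B, T_tgt, tgt_vocab)

model = TinyPervasiveAttention(src_vocab=1000, tgt_vocab=1000)
logits = model(torch.randint(0, 1000, (2, 7)),
               torch.randint(0, 1000, (2, 5)))  # (2, 5, 1000)
```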
Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition
Title | Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition |
Authors | Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu |
Abstract | In skeleton-based action recognition, graph convolutional networks (GCNs), which model the human body skeletons as spatiotemporal graphs, have achieved remarkable performance. However, in existing GCN-based methods, the topology of the graph is set manually, and it is fixed over all layers and input samples. This may not be optimal for the hierarchical GCN and diverse samples in action recognition tasks. In addition, the second-order information (the lengths and directions of bones) of the skeleton data, which is naturally more informative and discriminative for action recognition, is rarely investigated in existing methods. In this work, we propose a novel two-stream adaptive graph convolutional network (2s-AGCN) for skeleton-based action recognition. The topology of the graph in our model can be either uniformly or individually learned by the BP algorithm in an end-to-end manner. This data-driven method increases the flexibility of the model for graph construction and brings more generality to adapt to various data samples. Moreover, a two-stream framework is proposed to model both the first-order and the second-order information simultaneously, which brings a notable improvement in recognition accuracy. Extensive experiments on two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state-of-the-art by a significant margin. |
Tasks | graph construction, Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2018-05-20 |
URL | https://arxiv.org/abs/1805.07694v3 |
https://arxiv.org/pdf/1805.07694v3.pdf | |
PWC | https://paperswithcode.com/paper/non-local-graph-convolutional-networks-for |
Repo | https://github.com/lshiwjx/2s-AGCN |
Framework | pytorch |
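A hedged PyTorch sketch of the adaptive graph convolution described above: the adjacency is the sum of a fixed skeleton graph A, a freely learned matrix B shared across samples, and a data-dependent matrix C computed from embedded joint similarities. Time-averaging the embeddings and using a single adjacency (rather than the paper's per-subset partitions) are simplifications; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphConv(nn.Module):
    """Graph conv with adjacency A (fixed skeleton) + B (learned, shared
    across samples) + C (data-dependent joint-embedding similarity).
    Input x: (B, C, T, V) = batch, channels, frames, joints."""
    def __init__(self, in_c, out_c, A, embed_c=16):
        super().__init__()
        self.register_buffer('A', torch.as_tensor(A, dtype=torch.float32))
        self.B = nn.Parameter(torch.zeros_like(self.A))   # learned offset
        self.theta = nn.Conv2d(in_c, embed_c, 1)          # query embedding
        self.phi = nn.Conv2d(in_c, embed_c, 1)            # key embedding
        self.g = nn.Conv2d(in_c, out_c, 1)                # feature transform

    def forward(self, x):
        q = self.theta(x).mean(dim=2)                     # (B, E, V), time-averaged
        k = self.phi(x).mean(dim=2)
        C = F.softmax(torch.einsum('bev,bew->bvw', q, k), dim=-1)
        adj = self.A + self.B + C                         # broadcasts to (B, V, V)
        h = self.g(x)                                     # (B, out_c, T, V)
        return torch.einsum('bctv,bvw->bctw', h, adj)

# Joint-coordinate stream shown; the second ("bone") stream would take
# joint-coordinate differences as input, with the two scores summed at test time.
A = torch.eye(25)                                         # stand-in skeleton graph
layer = AdaptiveGraphConv(3, 64, A)
out = layer(torch.randn(8, 3, 300, 25))                   # (8, 64, 300, 25)
```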
Is preprocessing of text really worth your time for online comment classification?
Title | Is preprocessing of text really worth your time for online comment classification? |
Authors | Fahim Mohammad |
Abstract | A large proportion of online comments present on public domains are constructive; however, a significant proportion are toxic in nature. The comments contain many typos, which increases the number of features manifold and makes the ML model difficult to train. Considering the fact that data scientists spend approximately 80% of their time collecting, cleaning and organizing their data [1], we explored how much effort we should invest in the preprocessing (transformation) of raw comments before feeding them to state-of-the-art classification models. With the help of four models on the Jigsaw toxic comment classification data, we demonstrate that training a model without any transformation produces a relatively decent model. Applying even basic transformations can, in some cases, lead to worse performance, so they should be applied with caution. |
Tasks | |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02908v2 |
http://arxiv.org/pdf/1806.02908v2.pdf | |
PWC | https://paperswithcode.com/paper/is-preprocessing-of-text-really-worth-your |
Repo | https://github.com/ifahim/toxic-preprocess |
Framework | none |
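The paper's comparison is easy to reproduce in spirit with scikit-learn: train the same TF-IDF plus logistic-regression pipeline on raw and on minimally cleaned text, then compare cross-validated F1. The toy comments below merely stand in for the Jigsaw data, and the cleaning function is one plausible "basic transformation", not the paper's exact pipeline.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy stand-in for the Jigsaw comments (label 1 = toxic).
comments = ["you are awesome", "idiot!!! go away", "nice point, thanks",
            "SHUT UP you fool", "interesting read", "total garbage, moron"] * 50
labels = [0, 1, 0, 1, 0, 1] * 50

def basic_clean(text):
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)        # drop punctuation and digits
    return re.sub(r"\s+", " ", text).strip()

for name, docs in [("raw", comments),
                   ("cleaned", [basic_clean(c) for c in comments])]:
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    f1 = cross_val_score(model, docs, labels, cv=5, scoring="f1").mean()
    print(f"{name:8s} mean F1: {f1:.3f}")
```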
View Adaptive Neural Networks for High Performance Skeleton-based Human Action Recognition
Title | View Adaptive Neural Networks for High Performance Skeleton-based Human Action Recognition |
Authors | Pengfei Zhang, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jianru Xue, Nanning Zheng |
Abstract | Skeleton-based human action recognition has recently attracted increasing attention thanks to the accessibility and the popularity of 3D skeleton data. One of the key challenges in skeleton-based action recognition lies in the large view variations when capturing data. In order to alleviate the effects of view variations, this paper introduces a novel view adaptation scheme, which automatically determines the virtual observation viewpoints in a learning-based, data-driven manner. We design two view adaptive neural networks, i.e., VA-RNN based on RNN, and VA-CNN based on CNN. For each network, a novel view adaptation module learns and determines the most suitable observation viewpoints, and transforms the skeletons to those viewpoints for end-to-end recognition with a main classification network. Ablation studies find that the proposed view adaptive models are capable of transforming the skeletons of various viewpoints to much more consistent virtual viewpoints, which largely eliminates the influence of viewpoint. In addition, we design a two-stream scheme (referred to as VA-fusion) that fuses the scores of the two networks to provide the fused prediction. Extensive experimental evaluations on five challenging benchmarks demonstrate the effectiveness of the proposed view-adaptive networks and their superior performance over state-of-the-art approaches. The source code is available at https://github.com/microsoft/View-Adaptive-Neural-Networks-for-Skeleton-based-Human-Action-Recognition. |
Tasks | Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2018-04-20 |
URL | https://arxiv.org/abs/1804.07453v3 |
https://arxiv.org/pdf/1804.07453v3.pdf | |
PWC | https://paperswithcode.com/paper/view-adaptive-neural-networks-for-high |
Repo | https://github.com/microsoft/View-Adaptive-Neural-Networks-for-Skeleton-based-Human-Action-Recognition |
Framework | pytorch |
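A minimal PyTorch sketch of the view-adaptation idea: a small subnetwork regresses rotation angles and a translation from the skeleton sequence, the skeletons are transformed to that virtual viewpoint, and a main classifier runs on the transformed data. For brevity this predicts one viewpoint per sequence, whereas the paper regresses per-frame parameters; all sizes are illustrative.

```python
import torch
import torch.nn as nn

def rotation_matrix(angles):                     # angles: (B, 3) -> (B, 3, 3)
    a, b, c = angles[:, 0], angles[:, 1], angles[:, 2]
    zeros, ones = torch.zeros_like(a), torch.ones_like(a)
    Rx = torch.stack([ones, zeros, zeros,
                      zeros, a.cos(), -a.sin(),
                      zeros, a.sin(),  a.cos()], 1).view(-1, 3, 3)
    Ry = torch.stack([b.cos(), zeros, b.sin(),
                      zeros, ones, zeros,
                      -b.sin(), zeros, b.cos()], 1).view(-1, 3, 3)
    Rz = torch.stack([c.cos(), -c.sin(), zeros,
                      c.sin(),  c.cos(), zeros,
                      zeros, zeros, ones], 1).view(-1, 3, 3)
    return Rz @ Ry @ Rx

class ViewAdaptive(nn.Module):
    def __init__(self, joints=25, classes=60, hidden=128):
        super().__init__()
        self.view = nn.LSTM(joints * 3, 64, batch_first=True)
        self.angles = nn.Linear(64, 3)           # rotation parameters
        self.trans = nn.Linear(64, 3)            # translation of the origin
        self.main = nn.LSTM(joints * 3, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, classes)

    def forward(self, x):                        # x: (B, T, V, 3)
        B, T, V, _ = x.shape
        h, _ = self.view(x.view(B, T, -1))
        R = rotation_matrix(self.angles(h[:, -1]))   # one viewpoint per sequence
        t = self.trans(h[:, -1])
        x = torch.einsum('bij,btvj->btvi', R, x - t[:, None, None, :])
        out, _ = self.main(x.view(B, T, -1))
        return self.cls(out[:, -1])

va = ViewAdaptive()
scores = va(torch.randn(4, 30, 25, 3))           # (4, 60)
```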
Global overview of Imitation Learning
Title | Global overview of Imitation Learning |
Authors | Alexandre Attia, Sharone Dayan |
Abstract | Imitation Learning is a sequential task where the learner tries to mimic an expert’s actions in order to achieve the best performance. Several algorithms have been proposed recently for this task. In this project, we aim to provide a broad review of these algorithms, presenting their main features and comparing them on their performance and their regret bounds. |
Tasks | Imitation Learning |
Published | 2018-01-19 |
URL | http://arxiv.org/abs/1801.06503v1 |
http://arxiv.org/pdf/1801.06503v1.pdf | |
PWC | https://paperswithcode.com/paper/global-overview-of-imitation-learning |
Repo | https://github.com/hbzhang/awesomeimitationlearning |
Framework | none |
Faster PET Reconstruction with Non-Smooth Priors by Randomization and Preconditioning
Title | Faster PET Reconstruction with Non-Smooth Priors by Randomization and Preconditioning |
Authors | Matthias J. Ehrhardt, Pawel Markiewicz, Carola-Bibiane Schönlieb |
Abstract | Uncompressed clinical data from modern positron emission tomography (PET) scanners are very large, exceeding 350 million data points (projection bins). The last decades have seen tremendous advancements in mathematical imaging tools, many of which lead to non-smooth (i.e., non-differentiable) optimization problems that are much harder to solve than smooth optimization problems. Most of these tools have not been translated to clinical PET data, as the state-of-the-art algorithms for non-smooth problems do not scale well to large data. In this work, inspired by big data machine learning applications, we use advanced randomized optimization algorithms to solve the PET reconstruction problem for a very large class of non-smooth priors, which includes, for example, total variation, total generalized variation, directional total variation and various physical constraints. The proposed algorithm randomly uses subsets of the data and only updates the variables associated with these. While this idea often leads to divergent algorithms, we show that the proposed algorithm does indeed converge for any proper subset selection. Numerically, we show on real PET data (FDG and florbetapir) from a Siemens Biograph mMR that about ten projections and backprojections are sufficient to solve the MAP optimisation problem related to many popular non-smooth priors; thus showing that the proposed algorithm is fast enough to bring these models into routine clinical practice. |
Tasks | |
Published | 2018-08-21 |
URL | https://arxiv.org/abs/1808.07150v5 |
https://arxiv.org/pdf/1808.07150v5.pdf | |
PWC | https://paperswithcode.com/paper/faster-pet-reconstruction-with-non-smooth |
Repo | https://github.com/mehrhardt/spdhg |
Framework | none |
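The algorithmic core, stochastic PDHG with random data subsets, fits in a few lines of NumPy. The sketch below applies it to a toy nonnegative least-squares problem rather than the paper's TV-regularised PET objective, but it follows the SPDHG update structure: primal prox step, random subset, dual prox on that subset only, and an incrementally maintained A^T y with extrapolation.

```python
import numpy as np

# Toy problem: min_{x >= 0} 0.5 * ||A x - b||^2, with the rows of A split
# into n_sub subsets sampled uniformly at random (p_i = 1/n_sub).
rng = np.random.default_rng(0)
m, n, n_sub = 200, 50, 10
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.abs(rng.standard_normal(n))
b = A @ x_true

rows = np.array_split(np.arange(m), n_sub)
p = 1.0 / n_sub
# step sizes chosen so that sigma_i * tau * ||A_i||^2 < p_i for each subset
Ls = [np.linalg.norm(A[r], 2) for r in rows]
tau = 0.9 * p / max(Ls)
sigmas = [0.9 * p / (tau * L ** 2) for L in Ls]

x = np.zeros(n)
y = np.zeros(m)
z = A.T @ y                  # z = A^T y, kept up to date incrementally
zbar = z.copy()
for k in range(2000):
    x = np.maximum(x - tau * zbar, 0.0)         # prox of the nonnegativity constraint
    i = rng.integers(n_sub)
    r, s = rows[i], sigmas[i]
    y_old = y[r].copy()
    # prox of sigma * f_i^* for f_i(v) = 0.5 * ||v - b_i||^2
    y[r] = (y[r] + s * (A[r] @ x) - s * b[r]) / (1.0 + s)
    dz = A[r].T @ (y[r] - y_old)
    z += dz
    zbar = z + dz / p                           # extrapolation step
print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```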
Competency Questions and SPARQL-OWL Queries Dataset and Analysis
Title | Competency Questions and SPARQL-OWL Queries Dataset and Analysis |
Authors | Dawid Wisniewski, Jedrzej Potoniec, Agnieszka Lawrynowicz, C. Maria Keet |
Abstract | Competency Questions (CQs) are natural language questions outlining and constraining the scope of knowledge represented by an ontology. Although CQs are part of several ontology engineering methodologies, we have observed that the actual publication of CQs for the available ontologies is very limited, and the publication of their respective formalisations in terms of, e.g., SPARQL queries is scarcer still. This paper aims to contribute to addressing the engineering shortcomings of using CQs in ontology development, to facilitate wider use of CQs. In order to understand the relation between CQs and the queries used to test them on an ontology, we gather, analyse, and publicly release a set of 234 CQs and their translations to SPARQL-OWL for several ontologies in different domains developed by different groups. We analysed the CQs in two principal ways. The first stage focused on a linguistic analysis of the natural language text itself, i.e., a lexico-syntactic analysis without any presuppositions of ontology elements, and a subsequent step of semantic analysis in order to find patterns. This increased diversity of CQ sources resulted in a 5-fold increase over hitherto published patterns, to 106 distinct CQ patterns, of which only a small subset is shared across the CQ sets from the different ontologies. Next, we analysed the relation between the found CQ patterns and the 46 SPARQL-OWL query signatures, which revealed that one CQ pattern may be realised by more than one SPARQL-OWL query signature, and vice versa. We hope that our work will contribute to establishing common practices, templates, automation, and user tools that will support CQ formulation, formalisation, execution, and general management. |
Tasks | |
Published | 2018-11-23 |
URL | http://arxiv.org/abs/1811.09529v1 |
http://arxiv.org/pdf/1811.09529v1.pdf | |
PWC | https://paperswithcode.com/paper/competency-questions-and-sparql-owl-queries |
Repo | https://github.com/CQ2SPARQLOWL/Dataset |
Framework | none |
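To make the CQ-to-query relationship concrete, here is a hedged Python example using rdflib: a hypothetical competency question formalised as a SPARQL query over a toy graph. Note that the dataset's queries are SPARQL-OWL, i.e., they use OWL vocabulary and entailment, which plain rdflib querying does not provide; the ontology, names, and question below are invented for illustration.

```python
from rdflib import Graph, Namespace, RDF

# Hypothetical mini-ontology: processes and the materials they use.
EX = Namespace("http://example.org/onto#")
g = Graph()
g.add((EX.Fermentation, RDF.type, EX.Process))
g.add((EX.Fermentation, EX.usesMaterial, EX.Yeast))
g.add((EX.Baking, RDF.type, EX.Process))
g.add((EX.Baking, EX.usesMaterial, EX.Flour))

# CQ: "Which processes use yeast?" formalised as a SPARQL query.
cq = """
PREFIX ex: <http://example.org/onto#>
SELECT ?process WHERE {
  ?process a ex:Process ;
           ex:usesMaterial ex:Yeast .
}
"""
for row in g.query(cq):
    print(row.process)     # -> http://example.org/onto#Fermentation
```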
Look at Boundary: A Boundary-Aware Face Alignment Algorithm
Title | Look at Boundary: A Boundary-Aware Face Alignment Algorithm |
Authors | Wayne Wu, Chen Qian, Shuo Yang, Quan Wang, Yici Cai, Qiang Zhou |
Abstract | We present a novel boundary-aware face alignment algorithm that utilises boundary lines as the geometric structure of a human face to help facial landmark localisation. Unlike conventional heatmap-based and regression-based methods, our approach derives face landmarks from boundary lines, which removes the ambiguities in landmark definition. Three questions are explored and answered by this work: 1. Why use boundaries? 2. How can boundaries be used? 3. What is the relationship between boundary estimation and landmark localisation? Our boundary-aware face alignment algorithm achieves 3.49% mean error on 300-W Fullset, which outperforms state-of-the-art methods by a large margin. Our method can also easily integrate information from other datasets. By utilising boundary information of the 300-W dataset, our method achieves 3.92% mean error with 0.39% failure rate on the COFW dataset, and 1.25% mean error on the AFLW-Full dataset. Moreover, we propose a new dataset, WFLW, to unify training and testing across different factors, including poses, expressions, illuminations, makeups, occlusions, and blurriness. Dataset and model will be publicly available at https://wywu.github.io/projects/LAB/LAB.html |
Tasks | Face Alignment |
Published | 2018-05-26 |
URL | http://arxiv.org/abs/1805.10483v1 |
http://arxiv.org/pdf/1805.10483v1.pdf | |
PWC | https://paperswithcode.com/paper/look-at-boundary-a-boundary-aware-face |
Repo | https://github.com/wywu/LAB
Framework | none |
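A hedged PyTorch sketch of the boundary-aware structure: a boundary estimator produces K boundary heatmaps, which are fused with image features before landmark coordinates are regressed. The paper uses stacked hourglass networks, message passing, and adversarial learning; the tiny convolutions below only illustrate the fusion interface, and all layer names and sizes are invented.

```python
import torch
import torch.nn as nn

class BoundaryAwareRegressor(nn.Module):
    """Minimal sketch: fuse predicted boundary heatmaps with image
    features, then regress landmark coordinates from the fused map."""
    def __init__(self, boundaries=13, landmarks=98):
        super().__init__()
        # stand-in boundary estimator (the paper uses stacked hourglasses)
        self.boundary = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, boundaries, 1), nn.Sigmoid())
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Sequential(
            nn.Conv2d(32 + boundaries, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, landmarks * 2))

    def forward(self, img):                        # img: (B, 3, H, W)
        heat = self.boundary(img)                  # (B, K, H, W) boundary maps
        fused = torch.cat([self.features(img), heat], dim=1)
        coords = self.head(fused).view(img.size(0), -1, 2)
        return coords, heat

model = BoundaryAwareRegressor()
coords, heatmaps = model(torch.randn(2, 3, 64, 64))   # coords: (2, 98, 2)
```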
Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation
Title | Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation |
Authors | Shao-Yuan Lo, Hsueh-Ming Hang, Sheng-Wei Chan, Jing-Jhih Lin |
Abstract | Real-time semantic segmentation plays an important role in practical applications such as self-driving and robots. Most semantic segmentation research focuses on improving estimation accuracy with little consideration of efficiency. Several previous studies that emphasize high-speed inference often fail to produce high-accuracy segmentation results. In this paper, we propose a novel convolutional network named Efficient Dense modules with Asymmetric convolution (EDANet), which employs an asymmetric convolution structure and incorporates dilated convolution and dense connectivity to achieve high efficiency at low computational cost and model size. EDANet is 2.7 times faster than the existing fast segmentation network ICNet, while it achieves a similar mIoU score without any additional context module, post-processing scheme, or pretrained model. We evaluate EDANet on the Cityscapes and CamVid datasets and compare it with other state-of-the-art systems. Our network runs on high-resolution inputs at a speed of 108 FPS on one GTX 1080Ti. |
Tasks | Real-Time Semantic Segmentation, Semantic Segmentation |
Published | 2018-09-17 |
URL | https://arxiv.org/abs/1809.06323v3 |
https://arxiv.org/pdf/1809.06323v3.pdf | |
PWC | https://paperswithcode.com/paper/efficient-dense-modules-of-asymmetric |
Repo | https://github.com/shaoyuanlo/EDANet |
Framework | pytorch |
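The EDA module is simple to write down. The PyTorch sketch below shows the two asymmetric (3x1 then 1x3) convolution pairs, the second pair dilated, with the module's output concatenated onto its input for dense connectivity. Channel counts, normalisation placement, and the growth rate are illustrative rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class EDAModule(nn.Module):
    """Asymmetric (3x1 + 1x3) convolutions with dilation; the module's
    output is concatenated with its input (dense connectivity), so each
    module adds `growth` channels."""
    def __init__(self, in_c, growth=40, dilation=2):
        super().__init__()
        d = dilation
        self.body = nn.Sequential(
            nn.Conv2d(in_c, growth, 1, bias=False),                  # reduce
            nn.Conv2d(growth, growth, (3, 1), padding=(1, 0)), nn.ReLU(),
            nn.Conv2d(growth, growth, (1, 3), padding=(0, 1)),
            nn.BatchNorm2d(growth), nn.ReLU(),
            nn.Conv2d(growth, growth, (3, 1), padding=(d, 0), dilation=(d, 1)), nn.ReLU(),
            nn.Conv2d(growth, growth, (1, 3), padding=(0, d), dilation=(1, d)),
            nn.BatchNorm2d(growth), nn.ReLU())

    def forward(self, x):
        return torch.cat([x, self.body(x)], dim=1)   # dense connection

# The paper grows the dilation rate over depth; fixed here for brevity.
blocks = nn.Sequential(*[EDAModule(60 + 40 * i) for i in range(4)])
out = blocks(torch.randn(1, 60, 64, 128))            # (1, 220, 64, 128)
```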
Effective Ways to Build and Evaluate Individual Survival Distributions
Title | Effective Ways to Build and Evaluate Individual Survival Distributions |
Authors | Humza Haider, Bret Hoehn, Sarah Davis, Russell Greiner |
Abstract | An accurate model of a patient’s individual survival distribution can help determine the appropriate treatment for terminal patients. Unfortunately, risk scores (e.g., from Cox Proportional Hazard models) do not provide survival probabilities, single-time probability models (e.g., the Gail model, predicting 5 year probability) provide a probability for only a single time point, and standard Kaplan-Meier survival curves provide only population averages for a large class of patients, meaning they are not specific to individual patients. This motivates an alternative class of tools that can learn a model which provides an individual survival distribution giving survival probabilities across all times - such as extensions to the Cox model, Accelerated Failure Time, an extension to Random Survival Forests, and Multi-Task Logistic Regression. This paper first motivates such “individual survival distribution” (ISD) models and explains how they differ from standard models. It then discusses ways to evaluate such models - namely Concordance, 1-Calibration, Brier score, and various versions of L1-loss - and then motivates and defines a novel approach, “D-Calibration”, which determines whether a model’s probability estimates are meaningful. We also discuss how these measures differ, and use them to evaluate several ISD prediction tools over a range of survival datasets. |
Tasks | Calibration |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11347v1 |
http://arxiv.org/pdf/1811.11347v1.pdf | |
PWC | https://paperswithcode.com/paper/effective-ways-to-build-and-evaluate |
Repo | https://github.com/haiderstats/ISDEvaluation |
Framework | none |
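D-Calibration has a compact operational form. In the uncensored case, a model is D-calibrated if the predicted survival probabilities S_i(t_i), each evaluated at that subject's observed event time, are uniform on [0, 1]; binning them and running a chi-square test checks this. The NumPy sketch below uses a simulated exponential ISD as a stand-in model and omits the paper's handling of censored subjects.

```python
import numpy as np
from scipy.stats import chisquare

# If the model is well calibrated, S_i(t_i) evaluated at each subject's
# actual event time t_i should be uniform on [0, 1].
rng = np.random.default_rng(1)
n = 1000
scale = rng.uniform(0.5, 2.0, n)               # hypothetical per-subject scale
event_times = rng.exponential(scale)           # simulated event times

def predicted_survival(t, scale):              # stand-in ISD: S(t) = exp(-t/scale)
    return np.exp(-t / scale)

s = predicted_survival(event_times, scale)     # one S_i(t_i) per subject
counts, _ = np.histogram(s, bins=10, range=(0.0, 1.0))
stat, pvalue = chisquare(counts)               # expected counts are uniform
print(f"chi-square p-value: {pvalue:.3f} (large p = no evidence of miscalibration)")
```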
Dynamic Few-Shot Visual Learning without Forgetting
Title | Dynamic Few-Shot Visual Learning without Forgetting |
Authors | Spyros Gidaris, Nikos Komodakis |
Abstract | The human visual system has the remarkable ability to effortlessly learn novel concepts from only a few examples. Mimicking the same behavior in machine learning vision systems is an interesting and very challenging research problem with many practical advantages for real-world vision applications. In this context, the goal of our work is to devise a few-shot visual learning system that, at test time, can efficiently learn novel categories from only a few training examples while not forgetting the initial categories on which it was trained (here called base categories). To achieve that goal we propose (a) to extend an object recognition system with an attention-based few-shot classification weight generator, and (b) to redesign the classifier of a ConvNet model as the cosine similarity function between feature representations and classification weight vectors. The latter, apart from unifying the recognition of both novel and base categories, also leads to feature representations that generalize better on “unseen” categories. We extensively evaluate our approach on Mini-ImageNet, where we improve the prior state-of-the-art on few-shot recognition (i.e., we achieve 56.20% and 73.00% on the 1-shot and 5-shot settings respectively) while not sacrificing any accuracy on the base categories, a characteristic that most prior approaches lack. Finally, we apply our approach to the recently introduced few-shot benchmark of Bharath and Girshick [4], where we also achieve state-of-the-art results. The code and models of our paper will be published on: https://github.com/gidariss/FewShotWithoutForgetting |
Tasks | Few-Shot Image Classification, Few-Shot Learning, Object Recognition, One-Shot Learning |
Published | 2018-04-25 |
URL | http://arxiv.org/abs/1804.09458v1 |
http://arxiv.org/pdf/1804.09458v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-few-shot-visual-learning-without |
Repo | https://github.com/gidariss/FewShotWithoutForgetting |
Framework | pytorch |
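Component (b), the cosine-similarity classifier, is a few lines of PyTorch. Features and class weight vectors are L2-normalised, and their scaled dot product gives the logits, which puts base-class and generated novel-class weights on the same scale. The attention-based weight generator of component (a) is omitted here; conceptually it would append generated rows to `weight`. Sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Scores are the scaled cosine similarity between L2-normalised
    features and L2-normalised class weight vectors."""
    def __init__(self, dim, classes, scale=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(classes, dim) * 0.01)
        self.scale = nn.Parameter(torch.tensor(scale))   # learnable temperature

    def forward(self, feats):                            # feats: (B, dim)
        return self.scale * F.linear(F.normalize(feats, dim=1),
                                     F.normalize(self.weight, dim=1))

clf = CosineClassifier(dim=128, classes=64)
logits = clf(torch.randn(8, 128))                        # (8, 64)
```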
Memory Attention Networks for Skeleton-based Action Recognition
Title | Memory Attention Networks for Skeleton-based Action Recognition |
Authors | Chunyu Xie, Ce Li, Baochang Zhang, Chen Chen, Jungong Han, Changqing Zou, Jianzhuang Liu |
Abstract | The skeleton-based action recognition task is entangled with complex spatio-temporal variations of skeleton joints and remains challenging for Recurrent Neural Networks (RNNs). In this work, we propose a temporal-then-spatial recalibration scheme to alleviate such complex variations, resulting in end-to-end Memory Attention Networks (MANs), which consist of a Temporal Attention Recalibration Module (TARM) and a Spatio-Temporal Convolution Module (STCM). Specifically, the TARM is deployed in a residual learning module that employs a novel attention learning network to recalibrate the temporal attention of frames in a skeleton sequence. The STCM treats the attention-calibrated skeleton joint sequences as images and leverages Convolutional Neural Networks (CNNs) to further model the spatial and temporal information of skeleton data. These two modules (TARM and STCM) seamlessly form a single network architecture that can be trained in an end-to-end fashion. MANs significantly boost the performance of skeleton-based action recognition and achieve the best results on four challenging benchmark datasets: NTU RGB+D, HDM05, SYSU-3D and UT-Kinect. |
Tasks | Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08254v2 |
http://arxiv.org/pdf/1804.08254v2.pdf | |
PWC | https://paperswithcode.com/paper/memory-attention-networks-for-skeleton-based |
Repo | https://github.com/memory-attention-networks/MANs |
Framework | tf |
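A hedged PyTorch sketch of the TARM's recalibration step (the released code is TensorFlow): each frame of a skeleton sequence gets an attention score, and frames are re-weighted residually before being handed to the convolutional STCM stage. The exact attention network and normalisation in the paper differ; this only illustrates the temporal-then-spatial interface, with invented sizes.

```python
import torch
import torch.nn as nn

class TARM(nn.Module):
    """Temporal attention recalibration (sketch): score every frame,
    then residually re-weight frames before the CNN stage."""
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1))

    def forward(self, x):                    # x: (B, T, D) flattened joints
        a = torch.sigmoid(self.score(x))     # (B, T, 1) per-frame attention
        return x * (1.0 + a)                 # residual recalibration

x = torch.randn(4, 30, 75)                   # 30 frames, 25 joints x 3 coords
recalibrated = TARM(75)(x)                   # same shape; fed to the STCM
```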
Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals
Title | Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals |
Authors | Sören Becker, Marcel Ackermann, Sebastian Lapuschkin, Klaus-Robert Müller, Wojciech Samek |
Abstract | Interpretability of deep neural networks is a recently emerging area of machine learning research targeting a better understanding of how models perform feature selection and derive their classification decisions. This paper explores the interpretability of neural networks in the audio domain by using the previously proposed technique of layer-wise relevance propagation (LRP). We present a novel audio dataset of English spoken digits which we use for classification tasks on spoken digits and speaker’s gender. We use LRP to identify relevant features for two neural network architectures that process either waveform or spectrogram representations of the data. Based on the relevance scores obtained from LRP, hypotheses about the neural networks’ feature selection are derived and subsequently tested through systematic manipulations of the input data. The results confirm that the networks are highly reliant on features marked as relevant by LRP. |
Tasks | Audio Classification, Decision Making, Feature Selection |
Published | 2018-07-09 |
URL | https://arxiv.org/abs/1807.03418v2 |
https://arxiv.org/pdf/1807.03418v2.pdf | |
PWC | https://paperswithcode.com/paper/interpreting-and-explaining-deep-neural |
Repo | https://github.com/soerenab/AudioMNIST |
Framework | none |
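LRP itself is easy to state: relevance at a layer's output is redistributed to its inputs in proportion to each input's contribution to the pre-activations. Below is a NumPy sketch of the epsilon rule on a tiny random ReLU MLP, which merely stands in for the paper's trained audio networks; biases are zero here, so relevance is approximately conserved across layers.

```python
import numpy as np

rng = np.random.default_rng(0)
# Tiny stand-in ReLU MLP: x -> h -> logits
W1, b1 = rng.standard_normal((20, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 3)), np.zeros(3)

x = rng.standard_normal(20)
h = np.maximum(x @ W1 + b1, 0.0)
logits = h @ W2 + b2

def lrp_epsilon(a, W, R_out, eps=1e-6):
    """Redistribute relevance R_out from a layer's outputs to its inputs a,
    proportionally to the contributions z_ij = a_i * W_ij (epsilon rule)."""
    z = a @ W                                 # pre-activations
    s = R_out / (z + eps * np.sign(z))        # stabilised ratio
    return a * (W @ s)                        # R_in_i = a_i * sum_j W_ij * s_j

# Start from the predicted class: all relevance on its logit.
R = np.zeros(3); R[np.argmax(logits)] = logits.max()
R = lrp_epsilon(h, W2, R)        # logits -> hidden relevances
R = lrp_epsilon(x, W1, R)        # hidden -> input relevances
print(R.round(3), "sum:", R.sum().round(3))
```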
A Closer Look at Weak Label Learning for Audio Events
Title | A Closer Look at Weak Label Learning for Audio Events |
Authors | Ankit Shah, Anurag Kumar, Alexander G. Hauptmann, Bhiksha Raj |
Abstract | Audio content analysis in terms of sound events is an important research problem for a variety of applications. Recently, the development of weak labeling approaches for audio or sound event detection (AED) and the availability of large-scale weakly labeled datasets have finally opened up the possibility of large-scale AED. However, a deeper understanding of how weak labels affect the learning of sound events is still missing from the literature. In this work, we first describe a CNN-based approach for weakly supervised training of audio events. The approach follows some basic design principles desirable in a learning method relying on weakly labeled audio. We then describe important characteristics which naturally arise in weakly supervised learning of sound events, and show how these aspects of weak labels affect the generalization of models. More specifically, we study how characteristics such as label density and label corruption affect weakly supervised training for audio events. We also study the feasibility of directly obtaining weakly labeled data from the web without any manual labeling and compare it with a dataset which has been manually labeled. The analysis and understanding of these factors should be taken into account in the development of future weak label learning methods. Audioset, a large-scale weakly labeled dataset for sound events, is used in our experiments. |
Tasks | Audio Classification, Sound Event Detection |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1804.09288v1 |
http://arxiv.org/pdf/1804.09288v1.pdf | |
PWC | https://paperswithcode.com/paper/a-closer-look-at-weak-label-learning-for |
Repo | https://github.com/ankitshah009/WALNet-Weak_Label_Analysis |
Framework | tf |
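A hedged PyTorch sketch of the weak-label training setup (the paper's code is TensorFlow): the CNN emits segment-level event scores over time, and pooling those into a single clip-level prediction lets the weak recording label supervise training. Max pooling is used here as one simple choice; the architecture and pooling are illustrative, not the paper's exact network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeakLabelCNN(nn.Module):
    """Segment-level predictions over time, pooled to one recording-level
    prediction that the weak (clip) label supervises."""
    def __init__(self, n_mels=64, classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((4, 1)),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((4, 1)))
        self.segment_head = nn.Conv2d(64, classes, (n_mels // 16, 1))

    def forward(self, spec):                      # spec: (B, 1, n_mels, T)
        seg = self.segment_head(self.conv(spec))  # (B, classes, 1, T)
        seg = torch.sigmoid(seg.squeeze(2))       # (B, classes, T) segment scores
        clip = seg.max(dim=2).values              # pool segments -> clip prediction
        return clip, seg

model = WeakLabelCNN()
clip_pred, seg_pred = model(torch.randn(2, 1, 64, 400))
targets = torch.randint(0, 2, (2, 10)).float()    # weak clip-level labels
loss = F.binary_cross_entropy(clip_pred, targets)
```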
Hyperspectral Image Classification in the Presence of Noisy Labels
Title | Hyperspectral Image Classification in the Presence of Noisy Labels |
Authors | Junjun Jiang, Jiayi Ma, Zheng Wang, Chen Chen, Xianming Liu |
Abstract | Label information plays an important role in the supervised hyperspectral image classification problem. However, current classification methods all ignore an important and inevitable problem: labels may be corrupted, and collecting clean labels for training samples is difficult and often impractical. Therefore, how to learn from a database with noisy labels is a problem of great practical importance. In this paper, we study the influence of label noise on hyperspectral image classification and develop a random label propagation algorithm (RLPA) to cleanse the label noise. The key idea of RLPA is to exploit knowledge (e.g., the superpixel-based spectral-spatial constraints) from the observed hyperspectral images and apply it to the process of label propagation. Specifically, RLPA first constructs a spectral-spatial probability transfer matrix (SSPTM) that simultaneously considers the spectral similarity and superpixel-based spatial information. It then randomly chooses some training samples as “clean” samples, sets the rest as unlabeled samples, and propagates the label information from the “clean” samples to the remaining unlabeled samples with the SSPTM. By repeating the random assignment (of “clean” labeled samples and unlabeled samples) and propagation, we can obtain multiple labels for each training sample, so the final propagated label can be calculated by a majority vote. Experimental studies show that RLPA can reduce the level of label noise and demonstrate the advantages of our proposed method over four major classifiers by a significant margin: the gains in terms of average OA, AA, and Kappa are impressive, e.g., 9.18%, 9.58%, and 0.1043. The Matlab source code is available at https://github.com/junjun-jiang/RLPA |
Tasks | Hyperspectral Image Classification, Image Classification |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04212v2 |
http://arxiv.org/pdf/1809.04212v2.pdf | |
PWC | https://paperswithcode.com/paper/hyperspectral-image-classification-in-the |
Repo | https://github.com/junjun-jiang/RLPA |
Framework | none |
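The RLPA loop is short enough to sketch in NumPy: repeatedly pick a random "clean" subset, propagate its labels through a probability transfer matrix with the seeds clamped, and majority-vote the propagated labels. A Gaussian feature similarity stands in for the paper's superpixel-based spectral-spatial transfer matrix (SSPTM); all sizes and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def propagate(P, labels, clean_idx, n_classes, iters=20, alpha=0.9):
    """Propagate labels through transition matrix P, clamping the 'clean' seeds."""
    n = len(labels)
    Y = np.zeros((n, n_classes))
    Y[clean_idx, labels[clean_idx]] = 1.0
    F = Y.copy()
    for _ in range(iters):
        F = alpha * P @ F + (1 - alpha) * Y
    return F.argmax(1)

# Toy stand-in for the SSPTM: row-normalised Gaussian similarity.
X = rng.standard_normal((100, 5))
S = np.exp(-np.square(X[:, None] - X[None]).sum(-1))
P = S / S.sum(1, keepdims=True)

noisy = rng.integers(0, 3, 100)                  # noisy training labels
votes = np.zeros((100, 3))
for _ in range(30):                              # random splits + majority vote
    clean = rng.choice(100, size=40, replace=False)
    pred = propagate(P, noisy, clean, 3)
    votes[np.arange(100), pred] += 1
cleansed = votes.argmax(1)                       # majority-voted labels
print("labels changed:", (cleansed != noisy).mean())
```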