February 2, 2020

3112 words 15 mins read

Paper Group AWR 28

Deep Learning with ConvNET Predicts Imagery Tasks Through EEG. Brain Signal Classification via Learning Connectivity Structure. Deep Learning on Small Datasets without Pre-Training using Cosine Loss. Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks. Multi-resolution CSI Feedback with deep learning in Massive …

Deep Learning with ConvNET Predicts Imagery Tasks Through EEG

Title Deep Learning with ConvNET Predicts Imagery Tasks Through EEG
Authors Apdullah Yayık, Yakup Kutlu, Gökhan Altan
Abstract Deep learning with convolutional neural networks (ConvNets) has dramatically improved the learning capabilities of computer vision applications by operating on raw data without any prior feature extraction. There is now rising interest in interpreting and analyzing electroencephalography (EEG) dynamics with ConvNets. Our study focused on ConvNets of different structures, constructed to predict imagined left and right movements on a subject-independent basis from raw EEG data. Results showed that recent advances in machine learning, i.e. adaptive moment estimation and batch normalization together with a dropout strategy, improved the ConvNets' predictive ability, outperforming conventional fully-connected neural networks with widely-used spectral features.
Tasks EEG
Published 2019-07-12
URL https://arxiv.org/abs/1907.05674v1
PDF https://arxiv.org/pdf/1907.05674v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-with-convnet-predicts-imagery
Repo https://github.com/apdullahyayik/EEGMMI-Deep-ConvNET-
Framework none

Brain Signal Classification via Learning Connectivity Structure

Title Brain Signal Classification via Learning Connectivity Structure
Authors Soobeom Jang, Seong-Eun Moon, Jong-Seok Lee
Abstract Connectivity between different brain regions is one of the most important properties for classification of brain signals, including electroencephalography (EEG). However, how to define the connectivity structure for a given task is still an open problem, because there is no ground truth about what the connectivity structure should be in order to maximize performance. In this paper, we propose an end-to-end neural network model for EEG classification, which can extract an appropriate multi-layer graph structure and signal features directly from a set of raw EEG signals and perform classification. Experimental results demonstrate that our method yields improved performance in comparison to existing approaches where manually defined connectivity structures and signal features are used. Furthermore, we show that the graph structure extraction process is reliable in terms of consistency, and the learned graph structures are meaningful from a neuroscientific viewpoint.
Tasks EEG
Published 2019-05-28
URL https://arxiv.org/abs/1905.11678v2
PDF https://arxiv.org/pdf/1905.11678v2.pdf
PWC https://paperswithcode.com/paper/brain-signal-classification-via-learning
Repo https://github.com/ELEMKEP/bsc_lcs
Framework pytorch

Deep Learning on Small Datasets without Pre-Training using Cosine Loss

Title Deep Learning on Small Datasets without Pre-Training using Cosine Loss
Authors Björn Barz, Joachim Denzler
Abstract Two things seem to be indisputable in the contemporary deep learning discourse: 1. The categorical cross-entropy loss after softmax activation is the method of choice for classification. 2. Training a CNN classifier from scratch on small datasets does not work well. In contrast to this, we show that the cosine loss function provides significantly better performance than cross-entropy on datasets with only a handful of samples per class. For example, the accuracy achieved on the CUB-200-2011 dataset without pre-training is 30% higher than with the cross-entropy loss. Further experiments on other popular datasets confirm our findings. Moreover, we demonstrate that integrating prior knowledge in the form of class hierarchies is straightforward with the cosine loss and improves classification performance further.
Published 2019-01-25
URL https://arxiv.org/abs/1901.09054v2
PDF https://arxiv.org/pdf/1901.09054v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-on-small-datasets-without-pre
Repo https://github.com/cvjena/semantic-embeddings
Framework tf
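
The cosine loss the abstract describes compares the L2-normalized network output with the direction of the one-hot target vector. A minimal NumPy sketch, where the function name and the raw-logit input convention are illustrative rather than taken from the paper's implementation:

```python
import numpy as np

def cosine_loss(outputs, labels, num_classes):
    """Cosine loss: 1 minus the cosine similarity between the L2-normalized
    prediction and the one-hot target, averaged over the batch."""
    # Project predictions onto the unit hypersphere
    pred = outputs / np.linalg.norm(outputs, axis=1, keepdims=True)
    onehot = np.eye(num_classes)[labels]  # one-hot targets are already unit-norm
    cos_sim = np.sum(pred * onehot, axis=1)
    return np.mean(1.0 - cos_sim)
```

When class hierarchies are available, the one-hot targets can be replaced by semantic class embeddings; the loss itself stays unchanged.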

Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks

Title Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks
Authors Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu
Abstract Graph convolutional networks (GCNs), which generalize CNNs to more generic non-Euclidean structures, have achieved remarkable performance for skeleton-based action recognition. However, several issues remain in previous GCN-based models. First, the topology of the graph is set heuristically and fixed over all the model layers and input data. This may not be suitable for the hierarchy of the GCN model and the diversity of the data in action recognition tasks. Second, the second-order information of the skeleton data, i.e., the length and orientation of the bones, is rarely investigated, although it is naturally more informative and discriminative for human action recognition. In this work, we propose a novel multi-stream attention-enhanced adaptive graph convolutional neural network (MS-AAGCN) for skeleton-based action recognition. The graph topology in our model can be either uniformly or individually learned based on the input data in an end-to-end manner. This data-driven approach increases the flexibility of the model for graph construction and brings more generality to adapt to various data samples. Besides, the proposed adaptive graph convolutional layer is further enhanced by a spatial-temporal-channel attention module, which helps the model pay more attention to important joints, frames and features. Moreover, the information of both the joints and the bones, together with their motion information, is simultaneously modeled in a multi-stream framework, which shows notable improvement in recognition accuracy. Extensive experiments on two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state-of-the-art by a significant margin.
Tasks graph construction, Skeleton Based Action Recognition, Temporal Action Localization
Published 2019-12-15
URL https://arxiv.org/abs/1912.06971v1
PDF https://arxiv.org/pdf/1912.06971v1.pdf
PWC https://paperswithcode.com/paper/skeleton-based-action-recognition-with-multi
Repo https://github.com/fdu-wuyuan/Siren
Framework none
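
The "second-order information" mentioned in the abstract, i.e. bone length and orientation, can be derived from the joint coordinates alone by differencing each joint against its parent. A toy sketch of the bone-stream input, assuming a hypothetical 5-joint skeleton (the real NTU-RGBD graph has 25 joints; the parent table here is purely illustrative):

```python
import numpy as np

# Hypothetical toy skeleton: entry v gives the parent joint of joint v
# (the root, joint 0, is its own parent, so its bone vector is zero).
PARENTS = [0, 0, 1, 2, 3]

def joints_to_bones(joints):
    """Bone stream: each bone is the vector from a joint's parent to the
    joint itself, encoding both bone length and orientation.
    `joints` has shape (T, V, C): frames, joints, coordinates."""
    bones = np.zeros_like(joints)
    for v, p in enumerate(PARENTS):
        bones[:, v] = joints[:, v] - joints[:, p]
    return bones
```

The joint and bone streams (plus their frame-to-frame motion differences) are then fed to separate network streams whose scores are fused.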

Multi-resolution CSI Feedback with deep learning in Massive MIMO System

Title Multi-resolution CSI Feedback with deep learning in Massive MIMO System
Authors Zhilin Lu, Jintao Wang, Jian Song
Abstract In massive multiple-input multiple-output (MIMO) systems, user equipment (UE) needs to send downlink channel state information (CSI) back to the base station (BS). However, the feedback becomes expensive with the growing complexity of CSI in massive MIMO systems. Recently, deep learning (DL) approaches have been used to improve the reconstruction efficiency of CSI feedback. In this paper, a novel feedback network named CRNet is proposed to achieve better performance by extracting CSI features at multiple resolutions. An advanced training scheme that further boosts the network performance is also introduced. Simulation results show that the proposed CRNet outperforms the state-of-the-art CsiNet under the same computational complexity without any extra information. The open-source code is available at https://github.com/Kylin9511/CRNet
Published 2019-10-31
URL https://arxiv.org/abs/1910.14322v1
PDF https://arxiv.org/pdf/1910.14322v1.pdf
PWC https://paperswithcode.com/paper/multi-resolution-csi-feedback-with-deep
Repo https://github.com/Kylin9511/CRNet
Framework pytorch

MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation

Title MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation
Authors Lorenzo Bertoni, Sven Kreiss, Alexandre Alahi
Abstract We tackle the fundamentally ill-posed problem of 3D human localization from monocular RGB images. Driven by the limitation of neural networks outputting point estimates, we address the ambiguity in the task by predicting confidence intervals through a loss function based on the Laplace distribution. Our architecture is a light-weight feed-forward neural network that predicts 3D locations and corresponding confidence intervals given 2D human poses. The design is particularly well suited for small training data, cross-dataset generalization, and real-time applications. Our experiments show that we (i) outperform state-of-the-art results on KITTI and nuScenes datasets, (ii) even outperform a stereo-based method for far-away pedestrians, and (iii) estimate meaningful confidence intervals. We further share insights on our model of uncertainty in cases of limited observations and out-of-distribution samples.
Tasks 3D Depth Estimation, 3D Object Detection, Self-Driving Cars
Published 2019-06-14
URL https://arxiv.org/abs/1906.06059v2
PDF https://arxiv.org/pdf/1906.06059v2.pdf
PWC https://paperswithcode.com/paper/monoloco-monocular-3d-pedestrian-localization
Repo https://github.com/vita-epfl/monoloco
Framework pytorch
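
The Laplace-based loss the abstract mentions amounts to a negative log-likelihood in which the network predicts both a location (the distance) and a scale (the uncertainty). A hedged sketch of that idea; MonoLoco's actual formulation works with relative distances, and the names below are illustrative:

```python
import numpy as np

def laplace_nll(mu, log_b, target):
    """Negative log-likelihood of a Laplace distribution:
    |target - mu| / b + log(2b), averaged over the batch.
    Predicting log(b) keeps the scale strictly positive and numerically
    stable, and b directly yields a confidence interval around mu."""
    b = np.exp(log_b)
    return np.mean(np.abs(target - mu) / b + np.log(2.0 * b))
```

Because the scale term is learned, the network is penalized both for being over-confident on hard samples and for being needlessly uncertain on easy ones.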

DeepGCNs: Making GCNs Go as Deep as CNNs

Title DeepGCNs: Making GCNs Go as Deep as CNNs
Authors Guohao Li, Matthias Müller, Guocheng Qian, Itzel C. Delgadillo, Abdulellah Abualshour, Ali Thabet, Bernard Ghanem
Abstract Convolutional Neural Networks (CNNs) have been very successful at solving a variety of computer vision tasks such as object classification and detection, semantic segmentation, and activity understanding, to name just a few. One key enabling factor behind their great performance has been the ability to train very deep CNNs. Despite their huge success in many tasks, CNNs do not work well with non-Euclidean data, which is prevalent in many real-world applications. Graph Convolutional Networks (GCNs) offer an alternative that allows non-Euclidean data as input to a neural network, similar to CNNs. While GCNs already achieve encouraging results, they are currently limited to shallow architectures with 2-4 layers due to vanishing gradients during training. This work transfers concepts such as residual/dense connections and dilated convolutions from CNNs to GCNs in order to successfully train very deep GCNs. We show the benefit of deep GCNs with as many as 112 layers experimentally across various datasets and tasks. Specifically, we achieve state-of-the-art performance in part segmentation and semantic segmentation on point clouds and in node classification of protein functions across biological protein-protein interaction (PPI) graphs. We believe that the insights in this work will open many avenues for future research on GCNs and transfer to further tasks not explored here. The source code is available for PyTorch and TensorFlow at https://github.com/lightaime/deep_gcns_torch and https://github.com/lightaime/deep_gcns respectively.
Tasks Node Classification, Object Classification, Semantic Segmentation
Published 2019-10-15
URL https://arxiv.org/abs/1910.06849v1
PDF https://arxiv.org/pdf/1910.06849v1.pdf
PWC https://paperswithcode.com/paper/deepgcns-making-gcns-go-as-deep-as-cnns
Repo https://github.com/lightaime/deep_gcns
Framework tf
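
The residual connections transferred from CNNs amount to wrapping each graph convolution in a skip connection, exactly as in ResNets. A minimal NumPy sketch with a pre-normalized adjacency matrix (shapes and names are illustrative; the paper's layers additionally use dilated neighbor aggregation):

```python
import numpy as np

def gcn_layer(h, adj_norm, w):
    """One plain graph-convolution layer: aggregate normalized neighbor
    features, apply a linear projection, then ReLU."""
    return np.maximum(adj_norm @ h @ w, 0.0)

def res_gcn_layer(h, adj_norm, w):
    """Residual GCN layer: h + F(h). The identity path lets gradients
    flow through very deep stacks without vanishing."""
    return h + gcn_layer(h, adj_norm, w)
```

Stacking `res_gcn_layer` (rather than `gcn_layer`) is what makes 50+ layer GCNs trainable in practice.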

Spatio-spectral networks for color-texture analysis

Title Spatio-spectral networks for color-texture analysis
Authors Leonardo F. S. Scabini, Lucas C. Ribas, Odemir M. Bruno
Abstract Texture has been one of the most-studied visual attributes for image characterization since the 1960s. However, most hand-crafted descriptors are monochromatic, focusing on gray-scale images and discarding the color information. In this context, this work focuses on a new method for color-texture analysis that considers all color channels in a more intrinsic approach. Our proposal consists of modeling color images as directed complex networks that we name Spatio-Spectral Networks (SSN). Their topology includes within-channel edges that cover spatial patterns throughout individual image color channels, while between-channel edges capture spectral properties of channel pairs in an opponent fashion. Image descriptors are obtained through a concise topological characterization of the modeled network in a multiscale approach with radially symmetric neighborhoods. Experiments with four datasets cover several aspects of color-texture analysis, and results demonstrate that SSN outperforms all the compared literature methods, including well-known deep convolutional networks, and also has the most stable performance across datasets, achieving an average accuracy of $98.5(\pm1.1)$% against $97.1(\pm1.3)$% for MCND and $96.8(\pm3.2)$% for AlexNet. Additionally, an experiment verifies the performance of the methods under different color spaces, where results show that SSN also has higher performance and robustness.
Tasks Texture Classification
Published 2019-09-13
URL https://arxiv.org/abs/1909.06446v1
PDF https://arxiv.org/pdf/1909.06446v1.pdf
PWC https://paperswithcode.com/paper/spatio-spectral-networks-for-color-texture
Repo https://github.com/scabini/ssn
Framework none

Combined tract segmentation and orientation mapping for bundle-specific tractography

Title Combined tract segmentation and orientation mapping for bundle-specific tractography
Authors Jakob Wasserthal, Peter Neher, Dusan Hirjak, Klaus H. Maier-Hein
Abstract While the major white matter tracts are of great interest to numerous studies in neuroscience and medicine, their manual dissection in larger cohorts from diffusion MRI tractograms is time-consuming, requires expert knowledge, and is hard to reproduce. In previous work we presented tract orientation mapping (TOM) as a novel concept for bundle-specific tractography. It is based on a learned mapping from the original fiber orientation distribution function (FOD) peaks to tract-specific peaks, called tract orientation maps. Each tract orientation map represents the voxel-wise principal orientation of one tract. Here, we present an extension of this approach that combines TOM with accurate segmentations of the tract outline and its start and end regions. We also introduce a custom probabilistic tracking algorithm that samples from a Gaussian distribution with fixed standard deviation centered on each peak, thus enabling more complete tractograms on the tract orientation maps than deterministic tracking. These extensions enable the automatic creation of bundle-specific tractograms with previously unseen accuracy. We show for 72 different bundles on high-quality, low-quality, and phantom data that our approach runs faster and produces more accurate bundle-specific tractograms than seven state-of-the-art benchmark methods, while avoiding cumbersome processing steps like whole-brain tractography, non-linear registration, clustering, or manual dissection. Moreover, we show on 17 datasets that our approach generalizes well to datasets acquired with different scanners and settings, as well as with pathologies. The code for our method is openly available at https://github.com/MIC-DKFZ/TractSeg.
Published 2019-01-29
URL https://arxiv.org/abs/1901.10271v2
PDF https://arxiv.org/pdf/1901.10271v2.pdf
PWC https://paperswithcode.com/paper/combined-tract-segmentation-and-orientation
Repo https://github.com/MIC-DKFZ/TractSeg
Framework pytorch

Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation

Title Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation
Authors Gianluca Maguolo, Michelangelo Paci, Loris Nanni, Ludovico Bonan
Abstract Audio data augmentation is a key step in training deep neural networks for solving audio classification tasks. In this paper, we introduce Audiogmenter, a novel audio data augmentation library in MATLAB. We provide 15 different augmentation algorithms for raw audio data and 8 for spectrograms. We efficiently implemented several augmentation techniques whose usefulness has been extensively proved in the literature. To the best of our knowledge, this is the largest MATLAB audio data augmentation library freely available. We validate the efficiency of our algorithms by evaluating them on the ESC-50 dataset. The toolbox and its documentation can be downloaded at https://github.com/LorisNanni/Audiogmenter.
Tasks Audio Classification, Data Augmentation
Published 2019-12-11
URL https://arxiv.org/abs/1912.05472v3
PDF https://arxiv.org/pdf/1912.05472v3.pdf
PWC https://paperswithcode.com/paper/audiogmenter-a-matlab-toolbox-for-audio-data
Repo https://github.com/LorisNanni/Audiogmenter
Framework none

Robust Bi-Tempered Logistic Loss Based on Bregman Divergences

Title Robust Bi-Tempered Logistic Loss Based on Bregman Divergences
Authors Ehsan Amid, Manfred K. Warmuth, Rohan Anil, Tomer Koren
Abstract We introduce a temperature into the exponential function and replace the softmax output layer of neural nets by a high temperature generalization. Similarly, the logarithm in the log loss we use for training is replaced by a low temperature logarithm. By tuning the two temperatures we create loss functions that are non-convex already in the single layer case. When replacing the last layer of the neural nets by our bi-temperature generalization of logistic loss, the training becomes more robust to noise. We visualize the effect of tuning the two temperatures in a simple setting and show the efficacy of our method on large data sets. Our methodology is based on Bregman divergences and is superior to a related two-temperature method using the Tsallis divergence.
Published 2019-06-08
URL https://arxiv.org/abs/1906.03361v3
PDF https://arxiv.org/pdf/1906.03361v3.pdf
PWC https://paperswithcode.com/paper/robust-bi-tempered-logistic-loss-based-on
Repo https://github.com/fhopfmueller/bi-tempered-loss-pytorch
Framework pytorch
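
The two temperatures enter through tempered generalizations of the logarithm and exponential, which both reduce to the standard functions as t approaches 1. A sketch of these primitives (the full bi-tempered loss additionally requires an iteratively computed normalization constant for the tempered softmax, omitted here):

```python
import numpy as np

def log_t(x, t):
    """Tempered logarithm: (x^(1-t) - 1) / (1 - t); reduces to log(x) as t -> 1.
    For t < 1 its tails are heavier than log, which bounds the loss on outliers."""
    if t == 1.0:
        return np.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    """Tempered exponential, the inverse of log_t on its range:
    [1 + (1-t) x]_+ ^ (1/(1-t)); reduces to exp(x) as t -> 1."""
    if t == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))
```

The bi-tempered loss replaces log with `log_t` (t1 < 1, for robustness to label noise) and softmax's exp with `exp_t` (t2 > 1, for heavy-tailed output probabilities).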

A Bayesian Perspective on the Deep Image Prior

Title A Bayesian Perspective on the Deep Image Prior
Authors Zezhou Cheng, Matheus Gadelha, Subhransu Maji, Daniel Sheldon
Abstract The deep image prior was recently introduced as a prior for natural images. It represents images as the output of a convolutional network with random inputs. For “inference”, gradient descent is performed to adjust the network parameters so that the output matches the observations. This approach yields good performance on a range of image reconstruction tasks. We show that the deep image prior is asymptotically equivalent to a stationary Gaussian process prior in the limit as the number of channels in each layer of the network goes to infinity, and we derive the corresponding kernel. This informs a Bayesian approach to inference. We show that by conducting posterior inference using stochastic gradient Langevin dynamics we avoid the need for early stopping, which is a drawback of the current approach, and improve results for denoising and inpainting tasks. We illustrate these intuitions on a number of 1D and 2D signal reconstruction tasks.
Tasks Bayesian Inference, Denoising, Image Reconstruction
Published 2019-04-16
URL http://arxiv.org/abs/1904.07457v1
PDF http://arxiv.org/pdf/1904.07457v1.pdf
PWC https://paperswithcode.com/paper/a-bayesian-perspective-on-the-deep-image
Repo https://github.com/ZezhouCheng/GP-DIP
Framework pytorch
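
The posterior inference the abstract refers to replaces plain gradient descent with stochastic gradient Langevin dynamics (SGLD): each parameter update adds Gaussian noise scaled to the step size, so the iterates sample from the posterior rather than converging to a single point estimate. A minimal sketch of one update; the network, loss, and step size are omitted or illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgld_step(theta, grad, lr):
    """One SGLD update: a gradient-descent step plus zero-mean Gaussian
    noise with variance 2*lr. Averaging the network outputs over the
    resulting iterates approximates the posterior mean, which removes
    the need for early stopping."""
    noise = rng.normal(scale=np.sqrt(2.0 * lr), size=theta.shape)
    return theta - lr * grad + noise
```

In practice the reconstruction is formed by averaging outputs from many post-burn-in iterates, and the spread of those outputs gives a pixel-wise uncertainty estimate.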

Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering

Title Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering
Authors Chenyou Fan, Xiaofan Zhang, Shu Zhang, Wensheng Wang, Chi Zhang, Heng Huang
Abstract In this paper, we propose a novel end-to-end trainable Video Question Answering (VideoQA) framework with three major components: 1) a new heterogeneous memory which can effectively learn global context information from appearance and motion features; 2) a redesigned question memory which helps understand the complex semantics of the question and highlights queried subjects; and 3) a new multimodal fusion layer which performs multi-step reasoning by attending to relevant visual and textual hints with self-updated attention. Our VideoQA model first generates the global context-aware visual and textual features, respectively, by interacting current inputs with memory contents. After that, it performs attentional fusion of the multimodal visual and textual representations to infer the correct answer. Multiple cycles of reasoning can be made to iteratively refine attention weights of the multimodal data and improve the final representation of the QA pair. Experimental results demonstrate our approach achieves state-of-the-art performance on four VideoQA benchmark datasets.
Tasks Question Answering, Video Question Answering, Visual Question Answering
Published 2019-04-08
URL http://arxiv.org/abs/1904.04357v1
PDF http://arxiv.org/pdf/1904.04357v1.pdf
PWC https://paperswithcode.com/paper/heterogeneous-memory-enhanced-multimodal
Repo https://github.com/fanchenyou/HME-VideoQA
Framework pytorch

Learning Graph Convolutional Network for Skeleton-based Human Action Recognition by Neural Searching

Title Learning Graph Convolutional Network for Skeleton-based Human Action Recognition by Neural Searching
Authors Wei Peng, Xiaopeng Hong, Haoyu Chen, Guoying Zhao
Abstract Human action recognition from skeleton data, fueled by the Graph Convolutional Network (GCN), has attracted much attention due to its powerful capability of modeling non-Euclidean structured data. However, many existing GCN methods use a pre-defined graph and fix it throughout the entire network, which can lose implicit joint correlations. Besides, the mainstream spectral GCN is approximated by the one-order hop, so higher-order connections are not well involved. Therefore, huge efforts are required to explore a better GCN architecture. To address these problems, we turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for skeleton-based action recognition. Specifically, we enrich the search space by providing multiple dynamic graph modules after fully exploring the spatial-temporal correlations between nodes. Besides, we introduce multiple-hop modules, expecting to break the limitation on representational capacity caused by the one-order approximation. Moreover, a sampling- and memory-efficient evolution strategy is proposed to search for an optimal architecture for this task. The resulting architecture demonstrates the effectiveness of the higher-order approximation and the dynamic graph modeling mechanism with temporal interactions, which has barely been discussed before. To evaluate the performance of the searched model, we conduct extensive experiments on two very large-scale datasets, and the results show that our model achieves state-of-the-art results.
Tasks Neural Architecture Search, Skeleton Based Action Recognition
Published 2019-11-11
URL https://arxiv.org/abs/1911.04131v1
PDF https://arxiv.org/pdf/1911.04131v1.pdf
PWC https://paperswithcode.com/paper/learning-graph-convolutional-network-for
Repo https://github.com/xiaoiker/GCN-NAS
Framework pytorch

Audio Captcha Recognition Using RastaPLP Features by SVM

Title Audio Captcha Recognition Using RastaPLP Features by SVM
Authors Ahmet Faruk Cakmak, Muhammet Balcilar
Abstract CAPTCHAs are computer-generated tests that humans can pass but current computer systems cannot. They are commonly used in various web services to autonomously distinguish humans from computer programs, so that owners can protect their services from bots. In addition to visual CAPTCHAs, which consist of distorted images that a user must describe, there is a significant number of audio CAPTCHAs as well. Briefly, audio CAPTCHAs are sound files in which a speaker pronounces a series of digits consecutively under heavy noise. Generally, periodic and non-periodic noise is added to these sound files to make them difficult for a program to recognize, but not for a human listener. We gathered numerous randomly collected audio files to train, and then test, our SVM algorithm for extracting the digits from each recording.
Published 2019-01-08
URL http://arxiv.org/abs/1901.02153v1
PDF http://arxiv.org/pdf/1901.02153v1.pdf
PWC https://paperswithcode.com/paper/audio-captcha-recognition-using-rastaplp
Repo https://github.com/balcilar/Audio-Captcha-Recognition
Framework none