February 2, 2020

Paper Group AWR 28

Deep Learning with ConvNET Predicts Imagery Tasks Through EEG


Title	Deep Learning with ConvNET Predicts Imagery Tasks Through EEG
Authors	Apdullah Yayık, Yakup Kutlu, Gökhan Altan
Abstract	Deep learning with convolutional neural networks (ConvNets) has dramatically improved the learning capabilities of computer vision applications using only raw data, without any prior feature extraction. There is now rising interest in interpreting and analyzing electroencephalography (EEG) dynamics with ConvNets. Our study focused on ConvNets of different structures, constructed to predict imagined left and right movements on a subject-independent basis from raw EEG data. Results showed that recent advances in machine learning, i.e., adaptive moments and batch normalization together with a dropout strategy, improved the ConvNets' predictive ability, outperforming conventional fully-connected neural networks trained on widely-used spectral features.
Tasks	EEG
Published	2019-07-12
URL	https://arxiv.org/abs/1907.05674v1
PDF	https://arxiv.org/pdf/1907.05674v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-with-convnet-predicts-imagery
Repo	https://github.com/apdullahyayik/EEGMMI-Deep-ConvNET-
Framework	none
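
The abstract credits "adaptive moments" (the Adam optimizer) with part of the improvement. The following minimal scalar sketch shows what that update rule does. The function name `adam_minimize`, the toy objective and the default hyperparameters are assumptions made for illustration. This is not code from the authors' repository.

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function, given its gradient, with the Adam update rule."""
    x = x0
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: m and v start at zero, so early estimates are scaled up
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Hypothetical toy objective f(x) = (x - 2)^2 with gradient 2 * (x - 2)
print(round(adam_minimize(lambda x: 2 * (x - 2.0), x0=0.0), 3))
```

Each step divides a bias-corrected running mean of the gradient by the square root of a running mean of its square. As a result, the very first step moves by roughly `lr`, regardless of the gradient's scale.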

Brain Signal Classification via Learning Connectivity Structure


Title	Brain Signal Classification via Learning Connectivity Structure
Authors	Soobeom Jang, Seong-Eun Moon, Jong-Seok Lee
Abstract	Connectivity between different brain regions is one of the most important properties for classification of brain signals, including electroencephalography (EEG). However, how to define the connectivity structure for a given task is still an open problem, because there is no ground truth about what the connectivity structure should be in order to maximize performance. In this paper, we propose an end-to-end neural network model for EEG classification that can extract an appropriate multi-layer graph structure and signal features directly from a set of raw EEG signals and perform classification. Experimental results demonstrate that our method yields improved performance in comparison to existing approaches where manually defined connectivity structures and signal features are used. Furthermore, we show that the graph structure extraction process is reliable in terms of consistency, and that the learned graph structures are meaningful from a neuroscientific viewpoint.
Tasks	EEG
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11678v2
PDF	https://arxiv.org/pdf/1905.11678v2.pdf
PWC	https://paperswithcode.com/paper/brain-signal-classification-via-learning
Repo	https://github.com/ELEMKEP/bsc_lcs
Framework	pytorch

Deep Learning on Small Datasets without Pre-Training using Cosine Loss


Title	Deep Learning on Small Datasets without Pre-Training using Cosine Loss
Authors	Björn Barz, Joachim Denzler
Abstract	Two things seem to be indisputable in the contemporary deep learning discourse: 1. The categorical cross-entropy loss after softmax activation is the method of choice for classification. 2. Training a CNN classifier from scratch on small datasets does not work well. In contrast to this, we show that the cosine loss function provides significantly better performance than cross-entropy on datasets with only a handful of samples per class. For example, the accuracy achieved on the CUB-200-2011 dataset without pre-training is 30% higher than with the cross-entropy loss. Further experiments on other popular datasets confirm our findings. Moreover, we demonstrate that integrating prior knowledge in the form of class hierarchies is straightforward with the cosine loss and improves classification performance further.
Tasks
Published	2019-01-25
URL	https://arxiv.org/abs/1901.09054v2
PDF	https://arxiv.org/pdf/1901.09054v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-on-small-datasets-without-pre
Repo	https://github.com/cvjena/semantic-embeddings
Framework	tf
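
Here is a minimal sketch of the cosine loss described above. It assumes the loss is computed as one minus the cosine similarity between the network output and a class embedding. In the plain case that embedding is a one-hot vector; in the extended case it is a hierarchy-derived embedding. The helper names are hypothetical, and this is not code from the `semantic-embeddings` repository.

```python
import math

def one_hot(label, num_classes):
    """One-hot class embedding, the simplest target for the cosine loss."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def cosine_loss(output, target):
    """1 - cos(output, target); undefined if either vector is all zeros."""
    dot = sum(o * t for o, t in zip(output, target))
    norm_o = math.sqrt(sum(o * o for o in output))
    norm_t = math.sqrt(sum(t * t for t in target))
    if norm_o == 0.0 or norm_t == 0.0:
        raise ValueError("cosine loss is undefined for zero vectors")
    return 1.0 - dot / (norm_o * norm_t)

# Only the direction of the output matters, not its magnitude
print(cosine_loss([2.0, 0.0, 0.0], one_hot(0, 3)))  # 0.0
```

Because the output is L2-normalized, scaling it up does not change the loss. Softmax cross-entropy behaves differently: it keeps decreasing as the correct-class logit grows.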

Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks


Title	Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks
Authors	Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu
Abstract	Graph convolutional networks (GCNs), which generalize CNNs to more generic non-Euclidean structures, have achieved remarkable performance for skeleton-based action recognition. However, several issues remain in previous GCN-based models. First, the topology of the graph is set heuristically and fixed over all the model layers and input data. This may not be suitable for the hierarchy of the GCN model and the diversity of the data in action recognition tasks. Second, the second-order information of the skeleton data, i.e., the length and orientation of the bones, is rarely investigated, even though it is naturally more informative and discriminative for human action recognition. In this work, we propose a novel multi-stream attention-enhanced adaptive graph convolutional neural network (MS-AAGCN) for skeleton-based action recognition. The graph topology in our model can be either uniformly or individually learned based on the input data in an end-to-end manner. This data-driven approach increases the flexibility of the model for graph construction and brings more generality to adapt to various data samples. Besides, the proposed adaptive graph convolutional layer is further enhanced by a spatial-temporal-channel attention module, which helps the model pay more attention to important joints, frames and features. Moreover, the information of both the joints and bones, together with their motion information, is simultaneously modeled in a multi-stream framework, which shows notable improvement in recognition accuracy. Extensive experiments on two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state of the art by a significant margin.
Tasks	graph construction, Skeleton Based Action Recognition, Temporal Action Localization
Published	2019-12-15
URL	https://arxiv.org/abs/1912.06971v1
PDF	https://arxiv.org/pdf/1912.06971v1.pdf
PWC	https://paperswithcode.com/paper/skeleton-based-action-recognition-with-multi
Repo	https://github.com/fdu-wuyuan/Siren
Framework	none
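
The abstract's "second-order information" (bone length and orientation) and motion information can both be derived from raw joint coordinates. The following minimal sketch rests on several assumptions:

- each frame is a list of 3D joint tuples;
- the skeleton is a hypothetical five-joint one given as a parent list;
- bones are computed as child minus parent;
- motion is the difference between consecutive frames, with the last frame zero-padded.

The joint ordering and padding used in the paper's actual pipeline may differ.

```python
import math

# Hypothetical 5-joint skeleton: PARENTS[i] is the parent of joint i (the root is its own parent)
PARENTS = [0, 0, 1, 1, 3]

def bones(frame, parents=PARENTS):
    """Bone vectors: each joint minus its parent, so the root gets a zero vector."""
    return [tuple(c - p for c, p in zip(frame[i], frame[parents[i]]))
            for i in range(len(frame))]

def motion(frames):
    """Frame-to-frame differences; the last frame is zero-padded (an assumption)."""
    diffs = [[tuple(b - a for a, b in zip(j0, j1)) for j0, j1 in zip(f0, f1)]
             for f0, f1 in zip(frames, frames[1:])]
    zeros = [tuple(0.0 for _ in joint) for joint in frames[-1]]
    return diffs + [zeros]

def four_streams(frames):
    """Joint, bone, joint-motion and bone-motion inputs, one per stream."""
    bone_frames = [bones(f) for f in frames]
    return {
        "joint": frames,
        "bone": bone_frames,
        "joint_motion": motion(frames),
        "bone_motion": motion(bone_frames),
    }

frame = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (1.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
lengths = [math.sqrt(sum(c * c for c in b)) for b in bones(frame)]
print(lengths)  # [0.0, 1.0, 1.0, 1.4142135623730951, 1.0]
```

Translating the whole skeleton changes the joint stream but leaves the bone stream untouched. This is one way to see that bones carry information the joint coordinates alone do not isolate.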

Multi-resolution CSI Feedback with deep learning in Massive MIMO System


Title	Multi-resolution CSI Feedback with deep learning in Massive MIMO System
Authors	Zhilin Lu, Jintao Wang, Jian Song
Abstract	In a massive multiple-input multiple-output (MIMO) system, the user equipment (UE) needs to send downlink channel state information (CSI) back to the base station (BS). However, the feedback becomes expensive with the growing complexity of CSI in massive MIMO systems. Recently, deep learning (DL) approaches have been used to improve the reconstruction efficiency of CSI feedback. In this paper, a novel feedback network named CRNet is proposed to achieve better performance by extracting CSI features at multiple resolutions. An advanced training scheme that further boosts the network performance is also introduced. Simulation results show that the proposed CRNet outperforms the state-of-the-art CsiNet under the same computational complexity without any extra information. The open-source code is available at https://github.com/Kylin9511/CRNet
Tasks
Published	2019-10-31
URL	https://arxiv.org/abs/1910.14322v1
PDF	https://arxiv.org/pdf/1910.14322v1.pdf
PWC	https://paperswithcode.com/paper/multi-resolution-csi-feedback-with-deep
Repo	https://github.com/Kylin9511/CRNet
Framework	pytorch

MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation


Title	MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation
Authors	Lorenzo Bertoni, Sven Kreiss, Alexandre Alahi
Abstract	We tackle the fundamentally ill-posed problem of 3D human localization from monocular RGB images. Driven by the limitation of neural networks outputting point estimates, we address the ambiguity in the task by predicting confidence intervals through a loss function based on the Laplace distribution. Our architecture is a light-weight feed-forward neural network that predicts 3D locations and corresponding confidence intervals given 2D human poses. The design is particularly well suited for small training data, cross-dataset generalization, and real-time applications. Our experiments show that we (i) outperform state-of-the-art results on KITTI and nuScenes datasets, (ii) even outperform a stereo-based method for far-away pedestrians, and (iii) estimate meaningful confidence intervals. We further share insights on our model of uncertainty in cases of limited observations and out-of-distribution samples.
Tasks	3D Depth Estimation, 3D Object Detection, Self-Driving Cars
Published	2019-06-14
URL	https://arxiv.org/abs/1906.06059v2
PDF	https://arxiv.org/pdf/1906.06059v2.pdf
PWC	https://paperswithcode.com/paper/monoloco-monocular-3d-pedestrian-localization
Repo	https://github.com/vita-epfl/monoloco
Framework	pytorch
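
The abstract describes predicting confidence intervals through a loss based on the Laplace distribution. Here is a minimal sketch of that idea. It assumes the standard Laplace negative log-likelihood, with location `mu` (the predicted distance) and scale `b` (the spread). The paper's exact parameterization may differ, for example in whether the error is measured relative to the true distance. The function names are hypothetical.

```python
import math

def laplace_nll(target, mu, b):
    """Negative log-likelihood of `target` under Laplace(mu, b)."""
    if b <= 0:
        raise ValueError("scale b must be positive")
    return abs(target - mu) / b + math.log(2.0 * b)

def laplace_interval(mu, b, level):
    """Central interval holding `level` probability mass of Laplace(mu, b)."""
    # P(|X - mu| <= k * b) = 1 - exp(-k), so k = -ln(1 - level)
    k = -math.log(1.0 - level)
    return mu - k * b, mu + k * b

# Hypothetical prediction mu = 10 with true value 15: a wider b is penalized less
print(laplace_nll(15.0, 10.0, b=1.0) > laplace_nll(15.0, 10.0, b=5.0))  # True
print(laplace_interval(10.0, 2.0, level=0.9))
```

The two terms pull in opposite directions:

- `log(2 * b)` penalizes wide spreads;
- `|error| / b` penalizes overconfident narrow ones.

For a fixed error, the loss is minimized at `b = |error|`, which is how a network trained this way can learn error-sized intervals. For a Laplace distribution, `mu ± b` covers about 63% of the probability mass.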

DeepGCNs: Making GCNs Go as Deep as CNNs


Title	DeepGCNs: Making GCNs Go as Deep as CNNs
Authors	Guohao Li, Matthias Müller, Guocheng Qian, Itzel C. Delgadillo, Abdulellah Abualshour, Ali Thabet, Bernard Ghanem
Abstract	Convolutional Neural Networks (CNNs) have been very successful at solving a variety of computer vision tasks such as object classification and detection, semantic segmentation, activity understanding, to name just a few. One key enabling factor for their great performance has been the ability to train very deep CNNs. Despite their huge success in many tasks, CNNs do not work well with non-Euclidean data, which is prevalent in many real-world applications. Graph Convolutional Networks (GCNs) offer an alternative that allows non-Euclidean data as input to a neural network similar to CNNs. While GCNs already achieve encouraging results, they are currently limited to shallow architectures with 2-4 layers due to vanishing gradients during training. This work transfers concepts such as residual/dense connections and dilated convolutions from CNNs to GCNs in order to successfully train very deep GCNs. We show the benefit of deep GCNs with as many as 112 layers experimentally across various datasets and tasks. Specifically, we achieve state-of-the-art performance in part segmentation and semantic segmentation on point clouds and in node classification of protein functions across biological protein-protein interaction (PPI) graphs. We believe that the insights in this work will open many avenues for future research on GCNs and transfer to further tasks not explored in this work. The source code for this work is available for PyTorch and TensorFlow at https://github.com/lightaime/deep_gcns_torch and https://github.com/lightaime/deep_gcns respectively.
Tasks	Node Classification, Object Classification, Semantic Segmentation
Published	2019-10-15
URL	https://arxiv.org/abs/1910.06849v1
PDF	https://arxiv.org/pdf/1910.06849v1.pdf
PWC	https://paperswithcode.com/paper/deepgcns-making-gcns-go-as-deep-as-cnns
Repo	https://github.com/lightaime/deep_gcns
Framework	tf
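
Dilated convolution is one of the concepts the abstract says it transfers from CNNs. On point clouds, one way to express this is a dilated k-nearest-neighbor search: sort the candidates by distance, take the `k * d` nearest, and keep every `d`-th one. Below is a minimal Python sketch using Euclidean distance. The function name is hypothetical, and this is not code from the `deep_gcns` repositories.

```python
import math

def dilated_knn(points, query_idx, k, d):
    """Indices of k dilated neighbors of points[query_idx] with dilation rate d."""
    query = points[query_idx]
    candidates = [i for i in range(len(points)) if i != query_idx]
    # sorted() is stable, so equidistant points keep their index order
    candidates = sorted(candidates, key=lambda i: math.dist(query, points[i]))
    return candidates[: k * d : d]

line = [(float(x), 0.0) for x in range(10)]
print(dilated_knn(line, 0, k=2, d=1))  # [1, 2]
print(dilated_knn(line, 0, k=2, d=2))  # [1, 3]
```

With `d = 1` this reduces to an ordinary k-NN. Larger `d` widens the receptive field without increasing `k`. If fewer than `k * d` other points exist, fewer than `k` neighbors are returned.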

Spatio-spectral networks for color-texture analysis


Title	Spatio-spectral networks for color-texture analysis
Authors	Leonardo F. S. Scabini, Lucas C. Ribas, Odemir M. Bruno
Abstract	Texture has been one of the most-studied visual attributes for image characterization since the 1960s. However, most hand-crafted descriptors are monochromatic, focusing on grayscale images and discarding the color information. In this context, this work focuses on a new method for color texture analysis that considers all color channels in a more intrinsic approach. Our proposal consists of modeling color images as directed complex networks that we named Spatio-Spectral Network (SSN). Its topology includes within-channel edges that cover spatial patterns throughout individual image color channels, while between-channel edges tackle spectral properties of channel pairs in an opponent fashion. Image descriptors are obtained through a concise topological characterization of the modeled network in a multiscale approach with radially symmetric neighborhoods. Experiments with four datasets cover several aspects of color-texture analysis, and results demonstrate that SSN outperforms all the compared literature methods, including known deep convolutional networks, and also has the most stable performance between datasets, achieving an average accuracy of $98.5(\pm1.1)$ against $97.1(\pm1.3)$ for MCND and $96.8(\pm3.2)$ for AlexNet. Additionally, an experiment verifies the performance of the methods under different color spaces, where results show that SSN also has higher performance and robustness.
Tasks	Texture Classification
Published	2019-09-13
URL	https://arxiv.org/abs/1909.06446v1
PDF	https://arxiv.org/pdf/1909.06446v1.pdf
PWC	https://paperswithcode.com/paper/spatio-spectral-networks-for-color-texture
Repo	https://github.com/scabini/ssn
Framework	none

Combined tract segmentation and orientation mapping for bundle-specific tractography


Title	Combined tract segmentation and orientation mapping for bundle-specific tractography
Authors	Jakob Wasserthal, Peter Neher, Dusan Hirjak, Klaus H. Maier-Hein
Abstract	While the major white matter tracts are of great interest to numerous studies in neuroscience and medicine, their manual dissection in larger cohorts from diffusion MRI tractograms is time-consuming, requires expert knowledge and is hard to reproduce. In previous work we presented tract orientation mapping (TOM) as a novel concept for bundle-specific tractography. It is based on a learned mapping from the original fiber orientation distribution function (FOD) peaks to tract-specific peaks, called tract orientation maps. Each tract orientation map represents the voxel-wise principal orientation of one tract. Here, we present an extension of this approach that combines TOM with accurate segmentations of the tract outline and its start and end region. We also introduce a custom probabilistic tracking algorithm that samples from a Gaussian distribution with fixed standard deviation centered on each peak, thus enabling more complete trackings on the tract orientation maps than deterministic tracking. These extensions enable the automatic creation of bundle-specific tractograms with previously unseen accuracy. We show for 72 different bundles on high quality, low quality and phantom data that our approach runs faster and produces more accurate bundle-specific tractograms than 7 state-of-the-art benchmark methods while avoiding cumbersome processing steps like whole brain tractography, non-linear registration, clustering or manual dissection. Moreover, we show on 17 datasets that our approach generalizes well to datasets acquired with different scanners and settings as well as with pathologies. The code of our method is openly available at https://github.com/MIC-DKFZ/TractSeg.
Tasks
Published	2019-01-29
URL	https://arxiv.org/abs/1901.10271v2
PDF	https://arxiv.org/pdf/1901.10271v2.pdf
PWC	https://paperswithcode.com/paper/combined-tract-segmentation-and-orientation
Repo	https://github.com/MIC-DKFZ/TractSeg
Framework	pytorch
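
The abstract describes a probabilistic tracker that samples from a Gaussian with fixed standard deviation centered on each peak. The following is a minimal sketch. It assumes this means adding isotropic Gaussian noise to the peak direction and renormalizing. Several other details are also assumptions and do not come from TractSeg's implementation:

- the sign flip (peaks have no inherent direction);
- the step size;
- the stopping rule;
- the hypothetical `peak_at` lookup and all function names.

```python
import math
import random

def sample_direction(peak, sigma, rng, prev_dir=None):
    """Unit direction drawn from an isotropic Gaussian (std sigma) centered on `peak`."""
    v = [p + rng.gauss(0.0, sigma) for p in peak]
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero direction")
    v = [c / norm for c in v]
    # Peaks have no inherent sign: keep heading the same way as the previous step
    if prev_dir is not None and sum(a * b for a, b in zip(v, prev_dir)) < 0.0:
        v = [-c for c in v]
    return v

def track(peak_at, seed, step=0.5, sigma=0.1, n_steps=10, rng=None):
    """Follow sampled directions from `seed`; `peak_at(pos)` is a hypothetical peak lookup."""
    if rng is None:
        rng = random.Random(0)
    pos, prev = list(seed), None
    path = [tuple(pos)]
    for _ in range(n_steps):
        peak = peak_at(pos)
        if not any(peak):  # zero peak: treat as leaving the tract and stop
            break
        prev = sample_direction(peak, sigma, rng, prev)
        pos = [p + step * c for p, c in zip(pos, prev)]
        path.append(tuple(pos))
    return path

# Hypothetical uniform peak field pointing along +z
path = track(lambda pos: (0.0, 0.0, 1.0), seed=(0.0, 0.0, 0.0), sigma=0.2, n_steps=5)
print(len(path))  # 6
```

Setting `sigma` to zero turns this into deterministic peak following. A positive `sigma` lets repeated runs from the same seed spread out along the tract.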

Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation


Title	Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation
Authors	Gianluca Maguolo, Michelangelo Paci, Loris Nanni, Ludovico Bonan
Abstract	Audio data augmentation is a key step in training deep neural networks for solving audio classification tasks. In this paper, we introduce Audiogmenter, a novel audio data augmentation library in MATLAB. We provide 15 different augmentation algorithms for raw audio data and 8 for spectrograms. We efficiently implemented several augmentation techniques whose usefulness has been extensively proven in the literature. To the best of our knowledge, this is the largest MATLAB audio data augmentation library freely available. We validate the efficiency of our algorithms by evaluating them on the ESC-50 dataset. The toolbox and its documentation can be downloaded at https://github.com/LorisNanni/Audiogmenter.
Tasks	Audio Classification, Data Augmentation
Published	2019-12-11
URL	https://arxiv.org/abs/1912.05472v3
PDF	https://arxiv.org/pdf/1912.05472v3.pdf
PWC	https://paperswithcode.com/paper/audiogmenter-a-matlab-toolbox-for-audio-data
Repo	https://github.com/LorisNanni/Audiogmenter
Framework	none
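
Audiogmenter itself is written in MATLAB. The sketch below uses Python instead and shows one common raw-audio augmentation: adding white Gaussian noise at a target signal-to-noise ratio. The abstract does not say whether, or how, the toolbox implements this particular augmentation. The function names and parameters are hypothetical.

```python
import math
import random

def add_white_noise(signal, snr_db, rng=None):
    """Return signal + white Gaussian noise scaled to an exact target SNR in dB."""
    if rng is None:
        rng = random.Random(0)
    signal_power = sum(s * s for s in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10**(SNR_dB / 10)
    target_power = signal_power / 10 ** (snr_db / 10)
    scale = math.sqrt(target_power / noise_power)
    return [s + scale * n for s, n in zip(signal, noise)]

def measure_snr_db(clean, noisy):
    """Measured SNR of `noisy` relative to `clean`, in dB."""
    residual = [y - x for x, y in zip(clean, noisy)]
    p_signal = sum(x * x for x in clean) / len(clean)
    p_noise = sum(r * r for r in residual) / len(residual)
    return 10 * math.log10(p_signal / p_noise)

tone = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(8000)]  # 1 s of 440 Hz at 8 kHz
noisy = add_white_noise(tone, snr_db=10.0)
print(round(measure_snr_db(tone, noisy), 3))  # 10.0
```

The noise is rescaled using its measured power rather than its nominal variance, so the requested SNR is hit exactly rather than only on average.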

Robust Bi-Tempered Logistic Loss Based on Bregman Divergences


Title	Robust Bi-Tempered Logistic Loss Based on Bregman Divergences
Authors	Ehsan Amid, Manfred K. Warmuth, Rohan Anil, Tomer Koren
Abstract	We introduce a temperature into the exponential function and replace the softmax output layer of neural nets by a high temperature generalization. Similarly, the logarithm in the log loss we use for training is replaced by a low temperature logarithm. By tuning the two temperatures we create loss functions that are non-convex already in the single layer case. When replacing the last layer of the neural nets by our bi-temperature generalization of logistic loss, the training becomes more robust to noise. We visualize the effect of tuning the two temperatures in a simple setting and show the efficacy of our method on large data sets. Our methodology is based on Bregman divergences and is superior to a related two-temperature method using the Tsallis divergence.
Tasks
Published	2019-06-08
URL	https://arxiv.org/abs/1906.03361v3
PDF	https://arxiv.org/pdf/1906.03361v3.pdf
PWC	https://paperswithcode.com/paper/robust-bi-tempered-logistic-loss-based-on
Repo	https://github.com/fhopfmueller/bi-tempered-loss-pytorch
Framework	pytorch
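
Below is a minimal sketch of the two tempered functions and the resulting loss for a single example with a one-hot label. It assumes the usual definitions:

- log_t(x) = (x^(1-t) - 1)/(1-t)
- exp_t(x) = [1 + (1-t)x]_+^(1/(1-t))

It also assumes the loss is the Bregman-divergence form summed over classes. The normalization of the tempered softmax is found by bisection here. That may differ from the normalization routine in the linked PyTorch port. Function names are hypothetical.

```python
import math

def log_t(x, t):
    """Tempered logarithm (x**(1-t) - 1) / (1-t); equals math.log at t == 1."""
    if t == 1.0:
        return math.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    """Tempered exponential [1 + (1-t) x]_+ ** (1/(1-t)); equals math.exp at t == 1."""
    if t == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - t) * x
    if t > 1.0 and base <= 0.0:
        raise ValueError("exp_t diverges for x >= 1/(t-1) when t > 1")
    return max(0.0, base) ** (1.0 / (1.0 - t))

def tempered_softmax(logits, t, iters=100):
    """Probabilities exp_t(a_j - lam, t), with lam found by bisection so they sum to 1."""
    top = max(logits)
    if t == 1.0:
        exps = [math.exp(a - top) for a in logits]
        total = sum(exps)
        return [e / total for e in exps]
    lo = top                                # largest term is exp_t(0) = 1, so sum >= 1
    hi = top - log_t(1.0 / len(logits), t)  # every term <= 1/n, so sum <= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(exp_t(a - mid, t) for a in logits) > 1.0:
            lo = mid
        else:
            hi = mid
    probs = [exp_t(a - hi, t) for a in logits]
    total = sum(probs)
    return [p / total for p in probs]       # remove the tiny bisection residual

def bi_tempered_loss(logits, label, t1, t2):
    """Bi-tempered logistic loss for one example with a one-hot label (requires t1 < 2)."""
    if t1 >= 2.0:
        raise ValueError("t1 must be below 2")
    probs = tempered_softmax(logits, t2)
    # sum_j [y_j (log_t1 y_j - log_t1 p_j) - (y_j**(2-t1) - p_j**(2-t1)) / (2-t1)], y one-hot
    return -log_t(probs[label], t1) - (1.0 - sum(p ** (2.0 - t1) for p in probs)) / (2.0 - t1)

logits = [2.0, 0.5, -1.0]
print(bi_tempered_loss(logits, 0, t1=0.8, t2=1.2))
```

Two special cases help check the behavior:

- With `t1 = t2 = 1`, both functions reduce to the ordinary log and exp, and the loss reduces to cross-entropy.
- With `0 <= t1 < 1`, the loss is bounded above by `1 / (1 - t1)`, so a single badly mislabeled example cannot dominate the way it can under cross-entropy.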

A Bayesian Perspective on the Deep Image Prior


Title	A Bayesian Perspective on the Deep Image Prior
Authors	Zezhou Cheng, Matheus Gadelha, Subhransu Maji, Daniel Sheldon
Abstract	The deep image prior was recently introduced as a prior for natural images. It represents images as the output of a convolutional network with random inputs. For “inference”, gradient descent is performed to adjust network parameters to make the output match observations. This approach yields good performance on a range of image reconstruction tasks. We show that the deep image prior is asymptotically equivalent to a stationary Gaussian process prior in the limit as the number of channels in each layer of the network goes to infinity, and derive the corresponding kernel. This informs a Bayesian approach to inference. We show that by conducting posterior inference using stochastic gradient Langevin dynamics we avoid the need for early stopping, which is a drawback of the current approach, and improve results for denoising and inpainting tasks. We illustrate these intuitions on a number of 1D and 2D signal reconstruction tasks.
Tasks	Bayesian Inference, Denoising, Image Reconstruction
Published	2019-04-16
URL	http://arxiv.org/abs/1904.07457v1
PDF	http://arxiv.org/pdf/1904.07457v1.pdf
PWC	https://paperswithcode.com/paper/a-bayesian-perspective-on-the-deep-image
Repo	https://github.com/ZezhouCheng/GP-DIP
Framework	pytorch
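
The abstract proposes posterior inference with stochastic gradient Langevin dynamics. The update itself is short. The sketch below runs it on a toy one-dimensional posterior N(3, 1) with an exact gradient, so strictly speaking it is unadjusted Langevin dynamics rather than minibatch SGLD. The target, step size, burn-in length and function name are assumptions made for illustration. In the paper the sampled parameters are network weights, and this is not the GP-DIP code.

```python
import math
import random

def langevin_samples(grad_neg_log_post, theta0, step, n_steps, rng=None):
    """theta <- theta - (step / 2) * grad U(theta) + Normal(0, step)."""
    if rng is None:
        rng = random.Random(0)
    theta = theta0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, math.sqrt(step))
        theta = theta - 0.5 * step * grad_neg_log_post(theta) + noise
        samples.append(theta)
    return samples

# Toy posterior N(3, 1): U(theta) = (theta - 3)^2 / 2, so grad U = theta - 3
draws = langevin_samples(lambda th: th - 3.0, theta0=0.0, step=0.1, n_steps=20000)
kept = draws[2000:]  # discard burn-in
mean = sum(kept) / len(kept)
var = sum((x - mean) ** 2 for x in kept) / len(kept)
print(round(mean, 2), round(var, 2))
```

The injected noise keeps the chain exploring the posterior instead of settling on a single fitted point. Estimates therefore come from averaging over the retained samples rather than from one final iterate.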

Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering


Title	Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering
Authors	Chenyou Fan, Xiaofan Zhang, Shu Zhang, Wensheng Wang, Chi Zhang, Heng Huang
Abstract	In this paper, we propose a novel end-to-end trainable Video Question Answering (VideoQA) framework with three major components: 1) a new heterogeneous memory which can effectively learn global context information from appearance and motion features; 2) a redesigned question memory which helps understand the complex semantics of the question and highlights queried subjects; and 3) a new multimodal fusion layer which performs multi-step reasoning by attending to relevant visual and textual hints with self-updated attention. Our VideoQA model first generates global context-aware visual and textual features by interacting current inputs with memory contents. After that, it performs attentional fusion of the multimodal visual and textual representations to infer the correct answer. Multiple cycles of reasoning can be made to iteratively refine attention weights of the multimodal data and improve the final representation of the QA pair. Experimental results demonstrate that our approach achieves state-of-the-art performance on four VideoQA benchmark datasets.
Tasks	Question Answering, Video Question Answering, Visual Question Answering
Published	2019-04-08
URL	http://arxiv.org/abs/1904.04357v1
PDF	http://arxiv.org/pdf/1904.04357v1.pdf
PWC	https://paperswithcode.com/paper/heterogeneous-memory-enhanced-multimodal
Repo	https://github.com/fanchenyou/HME-VideoQA
Framework	pytorch

Learning Graph Convolutional Network for Skeleton-based Human Action Recognition by Neural Searching


Title	Learning Graph Convolutional Network for Skeleton-based Human Action Recognition by Neural Searching
Authors	Wei Peng, Xiaopeng Hong, Haoyu Chen, Guoying Zhao
Abstract	Human action recognition from skeleton data, fueled by the Graph Convolutional Network (GCN), has attracted lots of attention due to the GCN's powerful capability of modeling non-Euclidean structured data. However, many existing GCN methods provide a pre-defined graph and fix it through the entire network, which can lose implicit joint correlations. Besides, the mainstream spectral GCN is approximated by a one-order hop, so higher-order connections are not well involved. Therefore, considerable effort is required to explore a better GCN architecture. To address these problems, we turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for skeleton-based action recognition. Specifically, we enrich the search space by providing multiple dynamic graph modules after fully exploring the spatial-temporal correlations between nodes. Besides, we introduce multiple-hop modules, expecting to break the limitation of representational capacity caused by the one-order approximation. Moreover, a sampling- and memory-efficient evolution strategy is proposed to search for an optimal architecture for this task. The resulting architecture proves the effectiveness of the higher-order approximation and the dynamic graph modeling mechanism with temporal interactions, which has barely been discussed before. To evaluate the performance of the searched model, we conduct extensive experiments on two very large-scale datasets, and the results show that our model achieves state-of-the-art results.
Tasks	Neural Architecture Search, Skeleton Based Action Recognition
Published	2019-11-11
URL	https://arxiv.org/abs/1911.04131v1
PDF	https://arxiv.org/pdf/1911.04131v1.pdf
PWC	https://paperswithcode.com/paper/learning-graph-convolutional-network-for
Repo	https://github.com/xiaoiker/GCN-NAS
Framework	pytorch
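
The abstract contrasts first-order (one-hop) graph convolution with multiple-hop modules. As a minimal sketch of what widening the neighborhood means, the helper below computes which joints are reachable within `k` hops from a boolean adjacency matrix. This is the sparsity pattern of (A + I)^k. The function name and the toy four-joint chain are hypothetical. The code only illustrates the neighborhood and is not the paper's searched module.

```python
def k_hop_reach(adj, k):
    """R[i][j] is True when j is within k hops of i (pattern of (A + I)^k)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(adj)
    # One hop, including the node itself (A + I)
    one_hop = [[i == j or bool(adj[i][j]) for j in range(n)] for i in range(n)]
    reach = one_hop
    for _ in range(k - 1):
        # Boolean matrix product: extend every path by one more hop
        reach = [[any(reach[i][m] and one_hop[m][j] for m in range(n)) for j in range(n)]
                 for i in range(n)]
    return reach

# Hypothetical 4-joint chain 0-1-2-3
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
print(k_hop_reach(chain, 1)[0])  # [True, True, False, False]
print(k_hop_reach(chain, 2)[0])  # [True, True, True, False]
```

A one-hop layer at joint 0 only mixes in joint 1. A two-hop module already reaches joint 2, which a one-hop GCN could only reach by stacking layers.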

Audio Captcha Recognition Using RastaPLP Features by SVM


Title	Audio Captcha Recognition Using RastaPLP Features by SVM
Authors	Ahmet Faruk Cakmak, Muhammet Balcilar
Abstract	Nowadays, CAPTCHAs are computer-generated tests that humans can pass but current computer systems cannot. They are commonly used in various web services to automatically distinguish humans from computer programs. In this way, owners can protect their web services from bots. In addition to visual CAPTCHAs, which consist of distorted images (mostly text images) that a user must describe in writing, there is a significant number of audio CAPTCHAs as well. Briefly, audio CAPTCHAs are sound files consisting of human speech under heavy noise, in which the speaker pronounces a sequence of digits consecutively. Generally, these sound files contain periodic and non-periodic noise that makes them difficult for a program, but not for a human listener, to recognize. We gathered numerous randomly collected audio files to train and then test our SVM algorithm, with the aim of extracting the digits from each recording.
Tasks
Published	2019-01-08
URL	http://arxiv.org/abs/1901.02153v1
PDF	http://arxiv.org/pdf/1901.02153v1.pdf
PWC	https://paperswithcode.com/paper/audio-captcha-recognition-using-rastaplp
Repo	https://github.com/balcilar/Audio-Captcha-Recognition
Framework	none