Paper Group AWR 28
Deep Learning with ConvNET Predicts Imagery Tasks Through EEG
Title | Deep Learning with ConvNET Predicts Imagery Tasks Through EEG |
Authors | Apdullah Yayık, Yakup Kutlu, Gökhan Altan |
Abstract | Deep learning with convolutional neural networks (ConvNets) has dramatically improved the learning capabilities of computer vision applications by operating directly on raw data, without any prior feature extraction. There is now growing interest in interpreting and analyzing electroencephalography (EEG) dynamics with ConvNets. Our study focused on ConvNets of different structures, constructed to predict imagined left and right movements on a subject-independent basis from raw EEG data. Results showed that recently advanced methods in the machine learning field, i.e. adaptive moments and batch normalization together with a dropout strategy, improved the ConvNets' predictive ability, outperforming conventional fully-connected neural networks with widely used spectral features. |
Tasks | EEG |
Published | 2019-07-12 |
URL | https://arxiv.org/abs/1907.05674v1 |
https://arxiv.org/pdf/1907.05674v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-with-convnet-predicts-imagery |
Repo | https://github.com/apdullahyayik/EEGMMI-Deep-ConvNET- |
Framework | none |
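The three ingredients the abstract singles out, Adam ("adaptive moments"), batch normalization, and dropout, are easy to see in a toy raw-EEG ConvNet. The sketch below is not the authors' architecture; the channel counts, kernel sizes, and the 64-electrode / 640-sample input shape are illustrative assumptions.

```python
# Minimal sketch of a raw-EEG ConvNet with batch normalization, dropout,
# and the Adam optimizer. Layer sizes are placeholders, not the paper's.
import torch
import torch.nn as nn

class EEGConvNet(nn.Module):
    def __init__(self, n_electrodes=64, n_samples=640, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 25)),              # temporal filtering
            nn.Conv2d(16, 16, kernel_size=(n_electrodes, 1)),   # spatial filtering
            nn.BatchNorm2d(16),
            nn.ELU(),
            nn.AvgPool2d(kernel_size=(1, 15), stride=(1, 15)),
            nn.Dropout(p=0.5),
        )
        with torch.no_grad():
            n_feat = self.features(torch.zeros(1, 1, n_electrodes, n_samples)).numel()
        self.classifier = nn.Linear(n_feat, n_classes)

    def forward(self, x):            # x: (batch, 1, electrodes, time)
        return self.classifier(self.features(x).flatten(1))

model = EEGConvNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # "adaptive moments"
```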
Brain Signal Classification via Learning Connectivity Structure
Title | Brain Signal Classification via Learning Connectivity Structure |
Authors | Soobeom Jang, Seong-Eun Moon, Jong-Seok Lee |
Abstract | Connectivity between different brain regions is one of the most important properties for the classification of brain signals, including electroencephalography (EEG). However, how to define the connectivity structure for a given task remains an open problem, because there is no ground truth for what the connectivity structure should be in order to maximize performance. In this paper, we propose an end-to-end neural network model for EEG classification, which can extract an appropriate multi-layer graph structure and signal features directly from a set of raw EEG signals and perform classification. Experimental results demonstrate that our method yields improved performance in comparison to existing approaches where manually defined connectivity structures and signal features are used. Furthermore, we show that the graph structure extraction process is reliable in terms of consistency, and the learned graph structures are meaningful from a neuroscientific viewpoint. |
Tasks | EEG |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11678v2 |
https://arxiv.org/pdf/1905.11678v2.pdf | |
PWC | https://paperswithcode.com/paper/brain-signal-classification-via-learning |
Repo | https://github.com/ELEMKEP/bsc_lcs |
Framework | pytorch |
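A minimal sketch of the core idea: learning the connectivity (adjacency) structure jointly with the classifier rather than fixing it by hand. The pairwise edge-scoring network and the single graph-convolution readout are assumptions for illustration; the paper learns a multi-layer graph structure end to end.

```python
# Sketch: learn a per-sample adjacency matrix from electrode features,
# then classify with one graph-convolution step. Dimensions are placeholders.
import torch
import torch.nn as nn

class LearnedGraphClassifier(nn.Module):
    def __init__(self, n_nodes=64, in_feat=128, hidden=32, n_classes=2):
        super().__init__()
        self.embed = nn.Linear(in_feat, hidden)   # per-electrode embedding
        self.score = nn.Linear(2 * hidden, 1)     # edge score from node pairs
        self.gcn = nn.Linear(in_feat, hidden)     # feature transform
        self.readout = nn.Linear(hidden, n_classes)

    def forward(self, x):                         # x: (batch, nodes, in_feat)
        h = torch.tanh(self.embed(x))             # (B, N, H)
        n = h.size(1)
        pairs = torch.cat([h.unsqueeze(2).expand(-1, -1, n, -1),
                           h.unsqueeze(1).expand(-1, n, -1, -1)], dim=-1)
        adj = torch.softmax(self.score(pairs).squeeze(-1), dim=-1)  # learned (B, N, N)
        out = torch.relu(adj @ self.gcn(x))       # message passing with learned A
        return self.readout(out.mean(dim=1))      # graph-level prediction
```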
Deep Learning on Small Datasets without Pre-Training using Cosine Loss
Title | Deep Learning on Small Datasets without Pre-Training using Cosine Loss |
Authors | Björn Barz, Joachim Denzler |
Abstract | Two things seem to be indisputable in the contemporary deep learning discourse: 1. The categorical cross-entropy loss after softmax activation is the method of choice for classification. 2. Training a CNN classifier from scratch on small datasets does not work well. In contrast to this, we show that the cosine loss function provides significantly better performance than cross-entropy on datasets with only a handful of samples per class. For example, the accuracy achieved on the CUB-200-2011 dataset without pre-training is 30% higher than with the cross-entropy loss. Further experiments on other popular datasets confirm our findings. Moreover, we demonstrate that integrating prior knowledge in the form of class hierarchies is straightforward with the cosine loss and improves classification performance further. |
Tasks | |
Published | 2019-01-25 |
URL | https://arxiv.org/abs/1901.09054v2 |
https://arxiv.org/pdf/1901.09054v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-on-small-datasets-without-pre |
Repo | https://github.com/cvjena/semantic-embeddings |
Framework | tf |
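The cosine loss itself is simple to state: L2-normalize the network output and penalize one minus its cosine similarity to the target class direction. A minimal PyTorch sketch, assuming plain one-hot targets (the paper also supports semantic class embeddings):

```python
# Cosine loss sketch: cross-entropy on softmax is replaced by one minus the
# cosine similarity between the normalized prediction and the target direction.
import torch
import torch.nn.functional as F

def cosine_loss(embeddings, targets, num_classes):
    """embeddings: (batch, num_classes) raw network outputs,
    targets: (batch,) integer class labels."""
    one_hot = F.one_hot(targets, num_classes).float()
    pred = F.normalize(embeddings, dim=1)         # project onto the unit sphere
    return (1.0 - (pred * one_hot).sum(dim=1)).mean()

# Example: 4 samples, 10 classes
logits = torch.randn(4, 10, requires_grad=True)
loss = cosine_loss(logits, torch.tensor([1, 3, 3, 7]), num_classes=10)
loss.backward()
```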
Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks
Title | Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks |
Authors | Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu |
Abstract | Graph convolutional networks (GCNs), which generalize CNNs to more generic non-Euclidean structures, have achieved remarkable performance for skeleton-based action recognition. However, several issues remain in previous GCN-based models. First, the topology of the graph is set heuristically and fixed over all the model layers and input data. This may not be suitable for the hierarchy of the GCN model and the diversity of the data in action recognition tasks. Second, the second-order information of the skeleton data, i.e., the length and orientation of the bones, is rarely investigated, although it is naturally more informative and discriminative for human action recognition. In this work, we propose a novel multi-stream attention-enhanced adaptive graph convolutional neural network (MS-AAGCN) for skeleton-based action recognition. The graph topology in our model can be either uniformly or individually learned from the input data in an end-to-end manner. This data-driven approach increases the flexibility of the model for graph construction and brings more generality to adapt to various data samples. Besides, the proposed adaptive graph convolutional layer is further enhanced by a spatial-temporal-channel attention module, which helps the model pay more attention to important joints, frames, and features. Moreover, the information of both the joints and bones, together with their motion information, is simultaneously modeled in a multi-stream framework, which notably improves recognition accuracy. Extensive experiments on two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state-of-the-art by a significant margin. |
Tasks | graph construction, Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2019-12-15 |
URL | https://arxiv.org/abs/1912.06971v1 |
https://arxiv.org/pdf/1912.06971v1.pdf | |
PWC | https://paperswithcode.com/paper/skeleton-based-action-recognition-with-multi |
Repo | https://github.com/fdu-wuyuan/Siren |
Framework | none |
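The adaptive topology described above can be sketched as an effective adjacency A + B + C, where A is the fixed skeleton graph, B is a freely learned matrix, and C is computed per sample from embedded joint features. The single-subset layer below is a simplified illustration, not the full MS-AAGCN block; the embedding size and the identity skeleton graph are placeholders.

```python
# Sketch of an adaptive graph-convolution layer: fixed graph A, learned
# offset B, and data-dependent graph C from embedded joint features.
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    def __init__(self, in_ch, out_ch, A):        # A: (V, V) fixed skeleton graph
        super().__init__()
        self.register_buffer("A", A)
        self.B = nn.Parameter(torch.zeros_like(A))        # freely learned graph
        self.theta = nn.Conv2d(in_ch, 16, kernel_size=1)  # embeddings for C
        self.phi = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):                         # x: (N, C, T, V)
        n, _, _, v = x.shape
        q = self.theta(x).permute(0, 3, 1, 2).reshape(n, v, -1)  # (N, V, 16*T)
        k = self.phi(x).reshape(n, -1, v)                         # (N, 16*T, V)
        C = torch.softmax(q @ k, dim=-1)          # data-dependent graph (N, V, V)
        adj = self.A + self.B + C                 # adaptive adjacency
        out = torch.einsum("nctv,nvw->nctw", x, adj)
        return self.conv(out)

# Usage: 25 joints (NTU-style skeleton), 3 input channels, 64 output channels
layer = AdaptiveGraphConv(3, 64, torch.eye(25))
y = layer(torch.randn(8, 3, 300, 25))             # -> (8, 64, 300, 25)
```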
Multi-resolution CSI Feedback with Deep Learning in Massive MIMO System
Title | Multi-resolution CSI Feedback with Deep Learning in Massive MIMO System |
Authors | Zhilin Lu, Jintao Wang, Jian Song |
Abstract | In massive multiple-input multiple-output (MIMO) systems, user equipment (UE) needs to send downlink channel state information (CSI) back to the base station (BS). However, the feedback becomes expensive with the growing complexity of CSI in massive MIMO systems. Recently, deep learning (DL) approaches have been applied to improve the reconstruction efficiency of CSI feedback. In this paper, a novel feedback network named CRNet is proposed to achieve better performance by extracting CSI features at multiple resolutions. An advanced training scheme that further boosts the network performance is also introduced. Simulation results show that the proposed CRNet outperforms the state-of-the-art CsiNet under the same computational complexity without any extra information. The open-source code is available at https://github.com/Kylin9511/CRNet |
Tasks | |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14322v1 |
https://arxiv.org/pdf/1910.14322v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-resolution-csi-feedback-with-deep |
Repo | https://github.com/Kylin9511/CRNet |
Framework | pytorch |
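A rough sketch of the encoder/decoder structure implied by the abstract: two convolutional branches with different kernel sizes extract CSI features at different resolutions before compression to a short codeword, and the decoder reconstructs the channel matrix. The 32x32 angular-delay CSI shape and compression ratio are common-setup assumptions, not taken from the paper.

```python
# Minimal CSI-feedback autoencoder sketch in the spirit of CRNet:
# fine/coarse conv branches -> codeword -> reconstruction. Sizes are placeholders.
import torch
import torch.nn as nn

class CSIAutoencoder(nn.Module):
    def __init__(self, h=32, w=32, code_dim=128):
        super().__init__()
        self.branch3 = nn.Conv2d(2, 2, kernel_size=3, padding=1)   # fine resolution
        self.branch9 = nn.Conv2d(2, 2, kernel_size=9, padding=4)   # coarse resolution
        self.enc_fc = nn.Linear(4 * h * w, code_dim)
        self.dec_fc = nn.Linear(code_dim, 2 * h * w)
        self.refine = nn.Conv2d(2, 2, kernel_size=3, padding=1)
        self.h, self.w = h, w

    def forward(self, x):                       # x: (batch, 2, 32, 32) real/imag CSI
        feat = torch.cat([self.branch3(x), self.branch9(x)], dim=1)
        code = self.enc_fc(feat.flatten(1))     # feedback codeword sent to the BS
        out = self.dec_fc(code).view(-1, 2, self.h, self.w)
        return torch.sigmoid(self.refine(out))

model = CSIAutoencoder()
recon = model(torch.rand(16, 2, 32, 32))
```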
MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation
Title | MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation |
Authors | Lorenzo Bertoni, Sven Kreiss, Alexandre Alahi |
Abstract | We tackle the fundamentally ill-posed problem of 3D human localization from monocular RGB images. Driven by the limitation of neural networks outputting point estimates, we address the ambiguity in the task by predicting confidence intervals through a loss function based on the Laplace distribution. Our architecture is a light-weight feed-forward neural network that predicts 3D locations and corresponding confidence intervals given 2D human poses. The design is particularly well suited for small training data, cross-dataset generalization, and real-time applications. Our experiments show that we (i) outperform state-of-the-art results on KITTI and nuScenes datasets, (ii) even outperform a stereo-based method for far-away pedestrians, and (iii) estimate meaningful confidence intervals. We further share insights on our model of uncertainty in cases of limited observations and out-of-distribution samples. |
Tasks | 3D Depth Estimation, 3D Object Detection, Self-Driving Cars |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06059v2 |
https://arxiv.org/pdf/1906.06059v2.pdf | |
PWC | https://paperswithcode.com/paper/monoloco-monocular-3d-pedestrian-localization |
Repo | https://github.com/vita-epfl/monoloco |
Framework | pytorch |
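The confidence-interval idea reduces to a Laplace negative log-likelihood: the network predicts a distance mu and a spread b, and minimizing |d - mu|/b + log(2b) trains both the point estimate and the uncertainty. A small sketch with illustrative values; this is the generic Laplace NLL, not necessarily the paper's exact parameterization.

```python
# Laplace negative log-likelihood for localization with uncertainty.
import torch

def laplace_nll(mu, log_b, target):
    """mu, log_b, target: tensors of shape (batch,).
    Predicting log(b) keeps the spread positive and numerically stable."""
    b = torch.exp(log_b)
    return (torch.abs(target - mu) / b + torch.log(2.0 * b)).mean()

mu = torch.tensor([10.2, 24.8], requires_grad=True)     # predicted distances (m)
log_b = torch.tensor([0.0, 0.5], requires_grad=True)    # predicted log-spread
d = torch.tensor([11.0, 23.5])                          # ground-truth distances
laplace_nll(mu, log_b, d).backward()
```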
DeepGCNs: Making GCNs Go as Deep as CNNs
Title | DeepGCNs: Making GCNs Go as Deep as CNNs |
Authors | Guohao Li, Matthias Müller, Guocheng Qian, Itzel C. Delgadillo, Abdulellah Abualshour, Ali Thabet, Bernard Ghanem |
Abstract | Convolutional Neural Networks (CNNs) have been very successful at solving a variety of computer vision tasks such as object classification and detection, semantic segmentation, and activity understanding, to name just a few. One key enabling factor for their great performance has been the ability to train very deep CNNs. Despite their huge success in many tasks, CNNs do not work well with non-Euclidean data, which is prevalent in many real-world applications. Graph Convolutional Networks (GCNs) offer an alternative that allows for non-Euclidean data as input to a neural network, similar to CNNs. While GCNs already achieve encouraging results, they are currently limited to shallow architectures with 2-4 layers due to vanishing gradients during training. This work transfers concepts such as residual/dense connections and dilated convolutions from CNNs to GCNs in order to successfully train very deep GCNs. We show the benefit of deep GCNs with as many as 112 layers experimentally across various datasets and tasks. Specifically, we achieve state-of-the-art performance in part segmentation and semantic segmentation on point clouds and in node classification of protein functions across biological protein-protein interaction (PPI) graphs. We believe that the insights in this work will open many avenues for future research on GCNs and transfer to further tasks not explored here. The source code for this work is available for PyTorch and TensorFlow at https://github.com/lightaime/deep_gcns_torch and https://github.com/lightaime/deep_gcns respectively. |
Tasks | Node Classification, Object Classification, Semantic Segmentation |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.06849v1 |
https://arxiv.org/pdf/1910.06849v1.pdf | |
PWC | https://paperswithcode.com/paper/deepgcns-making-gcns-go-as-deep-as-cnns |
Repo | https://github.com/lightaime/deep_gcns |
Framework | tf |
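The central trick is the residual connection around each graph-convolution layer, which keeps gradients alive in very deep stacks. The plain mean-aggregation layer below is a simplification of the paper's EdgeConv/dilated variants and serves only to show where the skip connection goes.

```python
# Residual GCN layer sketch: x_{l+1} = F(x_l) + x_l.
import torch
import torch.nn as nn

class ResGCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, adj):                   # x: (N, dim), adj: (N, N) normalized
        h = torch.relu(self.norm(self.lin(adj @ x)))
        return x + h                             # residual (skip) connection

# A 56-layer GCN stack that would be hard to train without the skips
layers = nn.ModuleList([ResGCNLayer(64) for _ in range(56)])
x, adj = torch.randn(100, 64), torch.softmax(torch.randn(100, 100), dim=-1)
for layer in layers:
    x = layer(x, adj)
```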
Spatio-spectral networks for color-texture analysis
Title | Spatio-spectral networks for color-texture analysis |
Authors | Leonardo F. S. Scabini, Lucas C. Ribas, Odemir M. Bruno |
Abstract | Texture has been one of the most-studied visual attributes for image characterization since the 1960s. However, most hand-crafted descriptors are monochromatic, focusing on gray-scale images and discarding the color information. In this context, this work focuses on a new method for color-texture analysis that considers all color channels in a more intrinsic approach. Our proposal consists of modeling color images as directed complex networks that we name the Spatio-Spectral Network (SSN). Its topology includes within-channel edges that cover spatial patterns throughout individual image color channels, while between-channel edges capture spectral properties of channel pairs in an opponent fashion. Image descriptors are obtained through a concise topological characterization of the modeled network in a multiscale approach with radially symmetric neighborhoods. Experiments with four datasets cover several aspects of color-texture analysis, and results demonstrate that SSN outperforms all compared methods from the literature, including well-known deep convolutional networks, and also has the most stable performance across datasets, achieving an average accuracy of $98.5(\pm1.1)$ against $97.1(\pm1.3)$ for MCND and $96.8(\pm3.2)$ for AlexNet. Additionally, an experiment verifies the performance of the methods under different color spaces, where results show that SSN also has higher performance and robustness. |
Tasks | Texture Classification |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06446v1 |
https://arxiv.org/pdf/1909.06446v1.pdf | |
PWC | https://paperswithcode.com/paper/spatio-spectral-networks-for-color-texture |
Repo | https://github.com/scabini/ssn |
Framework | none |
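A rough, heavily simplified sketch of the network construction: each pixel of each color channel is a node, within-channel edges connect spatially close pixels of the same channel, and between-channel edges connect aligned pixels of different channels. The radius, the plain intensity-difference weights, and the returned edge lists are illustrative assumptions; the actual descriptors would then be computed from the topology of this graph.

```python
# Toy construction of within-channel and between-channel (opponent) edges.
import numpy as np

def ssn_edges(img, radius=2):
    """img: (H, W, 3) array with values in [0, 1]. Returns weighted edge lists."""
    h, w, _ = img.shape
    within, between = [], []
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)
               if (dy, dx) != (0, 0) and dy * dy + dx * dx <= radius * radius]
    for c in range(3):
        for y in range(h):
            for x in range(w):
                for dy, dx in offsets:          # within-channel, radially symmetric
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        wgt = abs(img[y, x, c] - img[ny, nx, c])
                        within.append(((c, y, x), (c, ny, nx), wgt))
                for c2 in range(3):             # between-channel (opponent) edges
                    if c2 != c:
                        wgt = abs(img[y, x, c] - img[y, x, c2])
                        between.append(((c, y, x), (c2, y, x), wgt))
    return within, between

w_edges, b_edges = ssn_edges(np.random.rand(8, 8, 3))
```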
Combined tract segmentation and orientation mapping for bundle-specific tractography
Title | Combined tract segmentation and orientation mapping for bundle-specific tractography |
Authors | Jakob Wasserthal, Peter Neher, Dusan Hirjak, Klaus H. Maier-Hein |
Abstract | While the major white matter tracts are of great interest to numerous studies in neuroscience and medicine, their manual dissection in larger cohorts from diffusion MRI tractograms is time-consuming, requires expert knowledge, and is hard to reproduce. In previous work we presented tract orientation mapping (TOM) as a novel concept for bundle-specific tractography. It is based on a learned mapping from the original fiber orientation distribution function (FOD) peaks to tract-specific peaks, called tract orientation maps. Each tract orientation map represents the voxel-wise principal orientation of one tract. Here, we present an extension of this approach that combines TOM with accurate segmentations of the tract outline and its start and end regions. We also introduce a custom probabilistic tracking algorithm that samples from a Gaussian distribution with fixed standard deviation centered on each peak, thus enabling more complete tracking on the tract orientation maps than deterministic tracking. These extensions enable the automatic creation of bundle-specific tractograms with previously unseen accuracy. We show for 72 different bundles on high-quality, low-quality, and phantom data that our approach runs faster and produces more accurate bundle-specific tractograms than 7 state-of-the-art benchmark methods, while avoiding cumbersome processing steps like whole-brain tractography, non-linear registration, clustering, or manual dissection. Moreover, we show on 17 datasets that our approach generalizes well to datasets acquired with different scanners and settings, as well as with pathologies. The code of our method is openly available at https://github.com/MIC-DKFZ/TractSeg. |
Tasks | |
Published | 2019-01-29 |
URL | https://arxiv.org/abs/1901.10271v2 |
https://arxiv.org/pdf/1901.10271v2.pdf | |
PWC | https://paperswithcode.com/paper/combined-tract-segmentation-and-orientation |
Repo | https://github.com/MIC-DKFZ/TractSeg |
Framework | pytorch |
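The probabilistic tracking step described above is compact enough to sketch: each step direction is drawn from a Gaussian with fixed standard deviation centered on the tract-orientation-map peak and then renormalized. The step size and sigma below are illustrative, not the paper's settings.

```python
# One step of Gaussian-perturbed tracking along a tract orientation peak.
import numpy as np

def sample_step(peak, sigma=0.15, step_mm=0.7, rng=np.random.default_rng()):
    """peak: (3,) principal tract orientation at the current position."""
    peak = peak / np.linalg.norm(peak)
    direction = peak + rng.normal(scale=sigma, size=3)   # Gaussian around the peak
    direction /= np.linalg.norm(direction)
    return step_mm * direction                           # displacement to next point

pos = np.array([12.0, 40.0, 25.0])
pos = pos + sample_step(np.array([0.0, 1.0, 0.0]))
```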
Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation
Title | Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation |
Authors | Gianluca Maguolo, Michelangelo Paci, Loris Nanni, Ludovico Bonan |
Abstract | Audio data augmentation is a key step in training deep neural networks for solving audio classification tasks. In this paper, we introduce Audiogmenter, a novel audio data augmentation library in MATLAB. We provide 15 different augmentation algorithms for raw audio data and 8 for spectrograms. We efficiently implemented several augmentation techniques whose usefulness has been extensively demonstrated in the literature. To the best of our knowledge, this is the largest freely available MATLAB audio data augmentation library. We validate the efficiency of our algorithms by evaluating them on the ESC-50 dataset. The toolbox and its documentation can be downloaded at https://github.com/LorisNanni/Audiogmenter. |
Tasks | Audio Classification, Data Augmentation |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05472v3 |
https://arxiv.org/pdf/1912.05472v3.pdf | |
PWC | https://paperswithcode.com/paper/audiogmenter-a-matlab-toolbox-for-audio-data |
Repo | https://github.com/LorisNanni/Audiogmenter |
Framework | none |
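The toolbox itself is MATLAB; purely to illustrate what raw-audio augmentations of this kind do, here is a small NumPy sketch of two common transforms (additive noise at a target SNR and a circular time shift). Parameter values are arbitrary examples, not Audiogmenter defaults.

```python
# Two typical raw-audio augmentations: additive white noise and time shift.
import numpy as np

def add_noise(signal, snr_db=20.0, rng=np.random.default_rng()):
    """Add white noise at a target signal-to-noise ratio (in dB)."""
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return signal + rng.normal(scale=np.sqrt(noise_power), size=signal.shape)

def time_shift(signal, max_frac=0.1, rng=np.random.default_rng()):
    """Circularly shift the waveform by up to max_frac of its length."""
    limit = int(max_frac * len(signal))
    return np.roll(signal, rng.integers(-limit, limit + 1))

wave = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))   # 1 s of a 440 Hz tone
augmented = time_shift(add_noise(wave))
```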
Robust Bi-Tempered Logistic Loss Based on Bregman Divergences
Title | Robust Bi-Tempered Logistic Loss Based on Bregman Divergences |
Authors | Ehsan Amid, Manfred K. Warmuth, Rohan Anil, Tomer Koren |
Abstract | We introduce a temperature into the exponential function and replace the softmax output layer of neural nets by a high temperature generalization. Similarly, the logarithm in the log loss we use for training is replaced by a low temperature logarithm. By tuning the two temperatures we create loss functions that are non-convex already in the single layer case. When replacing the last layer of the neural nets by our bi-temperature generalization of logistic loss, the training becomes more robust to noise. We visualize the effect of tuning the two temperatures in a simple setting and show the efficacy of our method on large data sets. Our methodology is based on Bregman divergences and is superior to a related two-temperature method using the Tsallis divergence. |
Tasks | |
Published | 2019-06-08 |
URL | https://arxiv.org/abs/1906.03361v3 |
https://arxiv.org/pdf/1906.03361v3.pdf | |
PWC | https://paperswithcode.com/paper/robust-bi-tempered-logistic-loss-based-on |
Repo | https://github.com/fhopfmueller/bi-tempered-loss-pytorch |
Framework | pytorch |
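The two tempered functions that replace log and exp are simple to write down, and they recover the standard functions as t approaches 1. Below is a NumPy sketch of just these building blocks; the full bi-tempered loss additionally needs an iteratively computed normalization constant for the tempered softmax, which is omitted here.

```python
# Tempered logarithm and exponential, the building blocks of the bi-tempered loss.
import numpy as np

def log_t(x, t):
    """Tempered logarithm: (x^(1-t) - 1) / (1 - t); equals log(x) when t = 1."""
    if t == 1.0:
        return np.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    """Tempered exponential: [1 + (1-t) x]_+^(1/(1-t)); equals exp(x) when t = 1."""
    if t == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

# t1 < 1 bounds the loss for badly misclassified points; t2 > 1 gives the
# output distribution heavier tails than the standard softmax.
print(log_t(np.array([0.1, 0.5, 1.0]), t=0.7))
print(exp_t(np.array([-2.0, 0.0, 2.0]), t=1.3))
```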
A Bayesian Perspective on the Deep Image Prior
Title | A Bayesian Perspective on the Deep Image Prior |
Authors | Zezhou Cheng, Matheus Gadelha, Subhransu Maji, Daniel Sheldon |
Abstract | The deep image prior was recently introduced as a prior for natural images. It represents images as the output of a convolutional network with random inputs. For “inference”, gradient descent is performed to adjust network parameters to make the output match observations. This approach yields good performance on a range of image reconstruction tasks. We show that the deep image prior is asymptotically equivalent to a stationary Gaussian process prior in the limit as the number of channels in each layer of the network goes to infinity, and derive the corresponding kernel. This informs a Bayesian approach to inference. We show that by conducting posterior inference using stochastic gradient Langevin dynamics we avoid the need for early stopping, which is a drawback of the current approach, and improve results for denoising and inpainting tasks. We illustrate these intuitions on a number of 1D and 2D signal reconstruction tasks. |
Tasks | Bayesian Inference, Denoising, Image Reconstruction |
Published | 2019-04-16 |
URL | http://arxiv.org/abs/1904.07457v1 |
http://arxiv.org/pdf/1904.07457v1.pdf | |
PWC | https://paperswithcode.com/paper/a-bayesian-perspective-on-the-deep-image |
Repo | https://github.com/ZezhouCheng/GP-DIP |
Framework | pytorch |
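The SGLD-based inference amounts to the usual gradient step on the reconstruction loss plus injected Gaussian noise whose variance matches the learning rate; averaging outputs over later iterations then approximates the posterior mean. A toy sketch with an illustrative network and learning rate:

```python
# Stochastic gradient Langevin dynamics (SGLD) sketch for a deep image prior.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 1, 3, padding=1))
z = torch.randn(1, 8, 64, 64)            # fixed random input
y = torch.rand(1, 1, 64, 64)             # noisy observation to fit
lr = 1e-4

for step in range(100):
    loss = ((net(z) - y) ** 2).mean()
    net.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in net.parameters():
            p.add_(-0.5 * lr * p.grad)                   # gradient half-step
            p.add_(torch.randn_like(p) * (lr ** 0.5))    # injected Langevin noise
# Averaging net(z) over later iterations approximates the posterior mean,
# removing the need for early stopping.
```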
Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering
Title | Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering |
Authors | Chenyou Fan, Xiaofan Zhang, Shu Zhang, Wensheng Wang, Chi Zhang, Heng Huang |
Abstract | In this paper, we propose a novel end-to-end trainable Video Question Answering (VideoQA) framework with three major components: 1) a new heterogeneous memory which can effectively learn global context information from appearance and motion features; 2) a redesigned question memory which helps understand the complex semantics of the question and highlights queried subjects; and 3) a new multimodal fusion layer which performs multi-step reasoning by attending to relevant visual and textual hints with self-updated attention. Our VideoQA model first generates global context-aware visual and textual features by interacting current inputs with memory contents. It then performs attentional fusion of the multimodal visual and textual representations to infer the correct answer. Multiple cycles of reasoning can be made to iteratively refine attention weights over the multimodal data and improve the final representation of the QA pair. Experimental results demonstrate that our approach achieves state-of-the-art performance on four VideoQA benchmark datasets. |
Tasks | Question Answering, Video Question Answering, Visual Question Answering |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.04357v1 |
http://arxiv.org/pdf/1904.04357v1.pdf | |
PWC | https://paperswithcode.com/paper/heterogeneous-memory-enhanced-multimodal |
Repo | https://github.com/fanchenyou/HME-VideoQA |
Framework | pytorch |
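A very small sketch of the attentional-fusion idea at the heart of the model: a question vector attends over per-frame visual features, and the attended summary is fused with the question to score answers. The heterogeneous and question memories and the multi-cycle reasoning loop are omitted, and all dimensions are placeholders.

```python
# Question-guided attention over visual features, then simple fusion.
import torch
import torch.nn as nn

class AttnFusion(nn.Module):
    def __init__(self, vis_dim=512, q_dim=512, n_answers=1000):
        super().__init__()
        self.proj = nn.Linear(q_dim, vis_dim)
        self.classify = nn.Linear(vis_dim + q_dim, n_answers)

    def forward(self, vis, q):                 # vis: (B, T, vis_dim), q: (B, q_dim)
        scores = torch.bmm(vis, self.proj(q).unsqueeze(-1)).squeeze(-1)   # (B, T)
        attn = torch.softmax(scores, dim=-1)
        summary = (attn.unsqueeze(-1) * vis).sum(dim=1)                   # (B, vis_dim)
        return self.classify(torch.cat([summary, q], dim=-1))

logits = AttnFusion()(torch.randn(2, 20, 512), torch.randn(2, 512))
```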
Learning Graph Convolutional Network for Skeleton-based Human Action Recognition by Neural Searching
Title | Learning Graph Convolutional Network for Skeleton-based Human Action Recognition by Neural Searching |
Authors | Wei Peng, Xiaopeng Hong, Haoyu Chen, Guoying Zhao |
Abstract | Human action recognition from skeleton data, fueled by the Graph Convolutional Network (GCN), has attracted much attention due to its powerful capability of modeling non-Euclidean structured data. However, many existing GCN methods use a pre-defined graph that is fixed throughout the entire network, which can lose implicit joint correlations. Besides, the mainstream spectral GCN is approximated with a first-order (one-hop) expansion, so higher-order connections are not well captured. Therefore, considerable effort is required to explore a better GCN architecture. To address these problems, we turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for skeleton-based action recognition. Specifically, we enrich the search space by providing multiple dynamic graph modules after fully exploring the spatial-temporal correlations between nodes. Besides, we introduce multiple-hop modules to break the limitation of representational capacity caused by the first-order approximation. Moreover, a sampling- and memory-efficient evolution strategy is proposed to search for an optimal architecture for this task. The resulting architecture demonstrates the effectiveness of the higher-order approximation and of the dynamic graph modeling mechanism with temporal interactions, which has barely been discussed before. To evaluate the performance of the searched model, we conduct extensive experiments on two very large-scale datasets, and the results show that our model achieves state-of-the-art results. |
Tasks | Neural Architecture Search, Skeleton Based Action Recognition |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04131v1 |
https://arxiv.org/pdf/1911.04131v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-graph-convolutional-network-for |
Repo | https://github.com/xiaoiker/GCN-NAS |
Framework | pytorch |
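A toy sketch of the evolution-style search loop: architectures are encoded as per-layer choices of graph modules, and mutating the fittest candidates drives the search. The module names, mutation rate, and the evaluation stub are illustrative placeholders; the paper's contribution is a sampling- and memory-efficient version of this kind of loop.

```python
# Toy evolutionary search over per-layer graph-module choices.
import random

MODULES = ["spatial", "temporal", "higher_order"]   # candidate dynamic graph modules

def random_arch(n_layers=4):
    return [random.choice(MODULES) for _ in range(n_layers)]

def mutate(arch, rate=0.25):
    return [random.choice(MODULES) if random.random() < rate else m for m in arch]

def evaluate(arch):
    # Placeholder: train the candidate GCN and return its validation accuracy.
    return random.random()

population = [random_arch() for _ in range(8)]
for generation in range(10):
    scored = sorted(population, key=evaluate, reverse=True)
    parents = scored[:4]                             # keep the fittest half
    population = parents + [mutate(random.choice(parents)) for _ in range(4)]
```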
Audio Captcha Recognition Using RastaPLP Features by SVM
Title | Audio Captcha Recognition Using RastaPLP Features by SVM |
Authors | Ahmet Faruk Cakmak, Muhammet Balcilar |
Abstract | CAPTCHAs are computer-generated tests that humans can pass but current computer programs cannot. They are commonly used by web services to autonomously distinguish humans from bots, so that owners can protect their services from automated abuse. In addition to visual CAPTCHAs, which consist of distorted images (mostly text images) that a user must describe, there is also a significant number of audio CAPTCHAs. Briefly, audio CAPTCHAs are sound files containing a human voice under heavy noise, in which the speaker pronounces a sequence of digits. Generally, these sound files contain periodic and non-periodic noise intended to make them difficult for a program to recognize, but not for a human listener. We gathered numerous randomly collected audio files and used them to train and then test our SVM algorithm, which extracts the digits from each recording. |
Tasks | |
Published | 2019-01-08 |
URL | http://arxiv.org/abs/1901.02153v1 |
http://arxiv.org/pdf/1901.02153v1.pdf | |
PWC | https://paperswithcode.com/paper/audio-captcha-recognition-using-rastaplp |
Repo | https://github.com/balcilar/Audio-Captcha-Recognition |
Framework | none |
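A sketch of the classification stage only: assuming per-digit RASTA-PLP feature vectors have already been extracted (the feature matrix below is random placeholder data), a support vector machine is trained with scikit-learn to recognize the digits.

```python
# SVM digit classifier on precomputed feature vectors (placeholder data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 39))          # placeholder RASTA-PLP feature vectors
y = rng.integers(0, 10, size=500)       # digit labels 0-9

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```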