January 31, 2020

3337 words 16 mins read

Paper Group AWR 397


A Graph Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction

Title A Graph Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction
Authors Ziqi Ke, Haris Vikalo
Abstract Reconstructing components of a genomic mixture from data obtained by means of DNA sequencing is a challenging problem encountered in a variety of applications including single individual haplotyping and studies of viral communities. High-throughput DNA sequencing platforms oversample mixture components to provide massive amounts of reads whose relative positions can be determined by mapping the reads to a known reference genome; assembly of the components, however, requires discovery of the reads’ origin – an NP-hard problem that the existing methods struggle to solve with the required level of accuracy. In this paper, we present a learning framework based on a graph auto-encoder designed to exploit structural properties of sequencing data. The algorithm is a neural network that, in effect, learns to ignore sequencing errors and infers the a posteriori probabilities of the origin of sequencing reads. Mixture components are then reconstructed by finding the consensus of the reads determined to originate from the same genomic component. Results on realistic synthetic as well as experimental data demonstrate that the proposed framework reliably assembles haplotypes and reconstructs viral communities, often significantly outperforming state-of-the-art techniques.
Tasks
Published 2019-11-13
URL https://arxiv.org/abs/1911.05316v1
PDF https://arxiv.org/pdf/1911.05316v1.pdf
PWC https://paperswithcode.com/paper/a-graph-auto-encoder-for-haplotype-assembly
Repo https://github.com/WuLoli/GAEseq
Framework tf
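
The reconstruction step in the abstract above (assigning each read to the component with the highest inferred origin probability and taking a per-position consensus) could look roughly like the following NumPy sketch; the array shapes and the 4-allele encoding are assumptions, not the authors' GAEseq code.

```python
import numpy as np

def consensus_from_posteriors(reads, posteriors, n_components, n_alleles=4):
    """Reconstruct mixture components by majority vote over reads assigned
    to the same origin.

    reads:       (n_reads, genome_len) int array, -1 marks uncovered positions
    posteriors:  (n_reads, n_components) inferred origin probabilities
    """
    origins = posteriors.argmax(axis=1)               # hard assignment per read
    components = np.full((n_components, reads.shape[1]), -1, dtype=int)
    for k in range(n_components):
        member_reads = reads[origins == k]
        for pos in range(reads.shape[1]):
            observed = member_reads[:, pos]
            observed = observed[observed >= 0]         # drop uncovered positions
            if observed.size:
                counts = np.bincount(observed, minlength=n_alleles)
                components[k, pos] = counts.argmax()   # consensus allele
    return components
```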

Root Identification in Minirhizotron Imagery with Multiple Instance Learning

Title Root Identification in Minirhizotron Imagery with Multiple Instance Learning
Authors Guohao Yu, Alina Zare, Hudanyun Sheng, Roser Matamala, Joel Reyes-Cabrera, Felix B. Fritschi, Thomas E. Juenger
Abstract In this paper, multiple instance learning (MIL) algorithms to automatically perform root detection and segmentation in minirhizotron imagery using only image-level labels are proposed. Root and soil characteristics vary from location to location; thus, supervised machine learning approaches that are trained with local data provide the best ability to identify and segment roots in minirhizotron imagery. However, labeling roots for training data (or otherwise) is an extremely tedious and time-consuming task. This paper aims to address this problem by labeling data at the image level (rather than the individual root or root pixel level) and training algorithms to perform individual root pixel level segmentation using MIL strategies. Three MIL methods (MI-ACE, miSVM, MIForests) were applied to root detection and compared to non-MIL approaches. The results show that MIL methods improve root segmentation in challenging minirhizotron imagery and reduce the labeling burden. In our results, miSVM outperformed other methods. The MI-ACE algorithm was a close second, with the added advantage that it learned an interpretable root signature which identified the traits used to distinguish roots from soil and did not require parameter selection.
Tasks Multiple Instance Learning
Published 2019-03-07
URL http://arxiv.org/abs/1903.03207v2
PDF http://arxiv.org/pdf/1903.03207v2.pdf
PWC https://paperswithcode.com/paper/root-identification-in-minirhizotron-imagery
Repo https://github.com/GatorSense/MILMinirhizotronSegmentation
Framework none
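
As a rough illustration of the miSVM strategy that performed best here, the sketch below alternates between fitting an instance-level SVM and relabeling the instances of positive bags. It is a generic MIL sketch with a hypothetical data layout, not the GatorSense implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC

def mi_svm(bags, bag_labels, n_iters=10, C=1.0):
    """bags: list of (n_instances_i, n_features) arrays (e.g. image patches);
    bag_labels: 1 = image contains roots, 0 = soil only."""
    X = np.vstack(bags)
    bag_idx = np.concatenate([np.full(len(b), i) for i, b in enumerate(bags)])
    # initialize: every instance inherits its bag label
    y = np.concatenate([np.full(len(b), bag_labels[i]) for i, b in enumerate(bags)])
    clf = LinearSVC(C=C)
    for _ in range(n_iters):
        clf.fit(X, y)
        scores = clf.decision_function(X)
        for i, lbl in enumerate(bag_labels):
            if lbl == 1:
                mask = bag_idx == i
                y[mask] = (scores[mask] > 0).astype(int)
                # every positive bag must keep at least one positive instance
                if y[mask].sum() == 0:
                    y[np.where(mask)[0][scores[mask].argmax()]] = 1
    return clf
```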

Quality Assessment of In-the-Wild Videos

Title Quality Assessment of In-the-Wild Videos
Authors Dingquan Li, Tingting Jiang, Ming Jiang
Abstract Quality assessment of in-the-wild videos is a challenging problem because of the absence of reference videos and shooting distortions. Knowledge of the human visual system can help establish methods for objective quality assessment of in-the-wild videos. In this work, we show that two eminent effects of the human visual system, namely content-dependency and temporal-memory effects, can be used for this purpose. We propose an objective no-reference video quality assessment method by integrating both effects into a deep neural network. For content-dependency, we extract features from a pre-trained image classification neural network for its inherent content-aware property. For temporal-memory effects, long-term dependencies, especially the temporal hysteresis, are integrated into the network with a gated recurrent unit and a subjectively-inspired temporal pooling layer. To validate the performance of our method, experiments are conducted on three publicly available in-the-wild video quality assessment databases: KoNViD-1k, CVD2014, and LIVE-Qualcomm. Experimental results demonstrate that our proposed method outperforms five state-of-the-art methods by a large margin, specifically, 12.39%, 15.71%, 15.45%, and 18.09% overall performance improvements over the second-best method VBLIINDS, in terms of SROCC, KROCC, PLCC and RMSE, respectively. Moreover, the ablation study verifies the crucial role of both the content-aware features and the modeling of temporal-memory effects. The PyTorch implementation of our method is released at https://github.com/lidq92/VSFA.
Tasks Image Classification, Video Quality Assessment
Published 2019-08-01
URL https://arxiv.org/abs/1908.00375v3
PDF https://arxiv.org/pdf/1908.00375v3.pdf
PWC https://paperswithcode.com/paper/quality-assessment-of-in-the-wild-videos
Repo https://github.com/SpikeKing/VQA-v2
Framework pytorch
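
A minimal PyTorch sketch of the pipeline shape described above: pretrained image-classification features per frame, a GRU for long-term dependencies, and a hysteresis-inspired temporal pooling. Layer sizes and the pooling blend are assumptions rather than the released VSFA code.

```python
import torch
import torch.nn as nn
from torchvision import models

class TinyVQA(nn.Module):
    """Content-aware frame features -> GRU -> temporal-memory pooling."""
    def __init__(self, hidden=32, tau=12):
        super().__init__()
        backbone = models.resnet50(pretrained=True)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # 2048-d per frame
        self.gru = nn.GRU(2048, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
        self.tau = tau  # how far back the "memory" of bad frames reaches

    def forward(self, frames):                      # frames: (T, 3, H, W)
        with torch.no_grad():
            f = self.features(frames).flatten(1)    # (T, 2048), content-aware
        h, _ = self.gru(f.unsqueeze(0))             # (1, T, hidden)
        q = self.head(h)[0, :, 0]                   # per-frame quality scores (T,)
        # temporal hysteresis: recent bad frames drag the current score down
        pooled = [0.5 * q[t] + 0.5 * q[max(0, t - self.tau):t + 1].min()
                  for t in range(q.shape[0])]
        return torch.stack(pooled).mean()           # overall video quality
```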

Improvements to Target-Based 3D LiDAR to Camera Calibration

Title Improvements to Target-Based 3D LiDAR to Camera Calibration
Authors Jiunn-Kai Huang, Jessy W. Grizzle
Abstract The homogeneous transformation between a LiDAR and monocular camera is required for sensor fusion tasks, such as SLAM. While determining such a transformation is not considered glamorous in any sense of the word, it is nonetheless crucial for many modern autonomous systems. Indeed, an error of a few degrees in rotation or a few percent in translation can lead to 20 cm translation errors at a distance of 5 m when overlaying a LiDAR image on a camera image. The biggest impediments to determining the transformation accurately are the relative sparsity of LiDAR point clouds and systematic errors in their distance measurements. This paper proposes (1) the use of targets of known dimension and geometry to ameliorate target pose estimation in the face of the quantization and systematic errors inherent in a LiDAR image of a target, and (2) a fitting method for the LiDAR to monocular camera transformation that fundamentally assumes the camera image data is the most accurate information in one’s possession.
Tasks Calibration, Pose Estimation, Quantization, Sensor Fusion
Published 2019-10-07
URL https://arxiv.org/abs/1910.03126v2
PDF https://arxiv.org/pdf/1910.03126v2.pdf
PWC https://paperswithcode.com/paper/improvements-to-target-based-3d-lidar-to
Repo https://github.com/UMich-BipedLab/extrinsic_lidar_camera_calibration
Framework none
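
The fitting criterion described above treats the camera image as the most accurate measurement, so one natural quantity to minimize is the pixel reprojection error of target points after applying a candidate LiDAR-to-camera transform. A small NumPy sketch of that error term (not the authors' estimator) follows.

```python
import numpy as np

def reprojection_error(lidar_pts, img_corners, R, t, K):
    """Project LiDAR points of a calibration target into the image and compare
    with the corresponding corners detected in the camera image.

    lidar_pts:   (N, 3) target points in the LiDAR frame
    img_corners: (N, 2) matching pixel coordinates (treated as ground truth,
                 mirroring the paper's assumption that camera data is most accurate)
    R, t:        LiDAR-to-camera rotation (3x3) and translation (3,)
    K:           camera intrinsic matrix (3x3)
    """
    cam = (R @ lidar_pts.T).T + t            # transform into the camera frame
    proj = (K @ cam.T).T
    pixels = proj[:, :2] / proj[:, 2:3]      # perspective division
    return np.linalg.norm(pixels - img_corners, axis=1).mean()
```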

Cross Attention Network for Few-shot Classification

Title Cross Attention Network for Few-shot Classification
Authors Ruibing Hou, Hong Chang, Bingpeng Ma, Shiguang Shan, Xilin Chen
Abstract Few-shot classification aims to recognize unlabeled samples from unseen classes given only a few labeled samples. The unseen classes and the low-data problem make few-shot classification very challenging. Many existing approaches extract features from labeled and unlabeled samples independently; as a result, the features are not discriminative enough. In this work, we propose a novel Cross Attention Network to address the challenging problems in few-shot classification. First, a Cross Attention Module is introduced to deal with the problem of unseen classes. The module generates cross attention maps for each pair of class feature and query sample feature so as to highlight the target object regions, making the extracted features more discriminative. Second, a transductive inference algorithm is proposed to alleviate the low-data problem; it iteratively utilizes the unlabeled query set to augment the support set, thereby making the class features more representative. Extensive experiments on two benchmarks show our method is a simple, effective and computationally efficient framework that outperforms the state of the art.
Tasks
Published 2019-10-17
URL https://arxiv.org/abs/1910.07677v1
PDF https://arxiv.org/pdf/1910.07677v1.pdf
PWC https://paperswithcode.com/paper/cross-attention-network-for-few-shot
Repo https://github.com/blue-blue272/fewshot-CAN
Framework pytorch
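
A toy PyTorch sketch of the cross attention idea: correlate class-feature and query-feature positions, turn the correlations into spatial attention, and reweight both feature maps. The actual module also fuses the correlation map with a small meta-learner, so this is only the core computation under assumed shapes.

```python
import torch
import torch.nn.functional as F

def cross_attention(class_feat, query_feat):
    """class_feat, query_feat: feature maps of shape (C, H, W) for one class
    prototype and one query sample."""
    C, H, W = class_feat.shape
    p = class_feat.reshape(C, H * W)
    q = query_feat.reshape(C, H * W)
    # cosine correlation between every pair of spatial positions
    corr = F.normalize(p, dim=0).t() @ F.normalize(q, dim=0)          # (HW, HW)
    attn_p = torch.softmax(corr.mean(dim=1), dim=0).reshape(1, H, W)  # class-side attention
    attn_q = torch.softmax(corr.mean(dim=0), dim=0).reshape(1, H, W)  # query-side attention
    # highlight target-object regions in both feature maps
    return class_feat * (1 + attn_p), query_feat * (1 + attn_q)
```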

Synthetic Epileptic Brain Activities Using Generative Adversarial Networks

Title Synthetic Epileptic Brain Activities Using Generative Adversarial Networks
Authors Damian Pascual, Amir Aminifar, David Atienza, Philippe Ryvlin, Roger Wattenhofer
Abstract Epilepsy is a chronic neurological disorder affecting more than 65 million people worldwide and manifested by recurrent unprovoked seizures. The unpredictability of seizures not only degrades the quality of life of the patients, but it can also be life-threatening. Modern systems that monitor electroencephalography (EEG) signals are currently being developed with a view to detecting epileptic seizures in order to alert caregivers and reduce the impact of seizures on patients’ quality of life. Such seizure detection systems employ state-of-the-art machine learning algorithms that require a considerably large amount of labeled personal data for training. However, acquiring EEG signals of epileptic seizures is a costly and time-consuming process for medical experts and patients, currently requiring in-hospital recordings in specialized units. In this work, we generate synthetic seizure-like brain electrical activities, i.e., EEG signals, that can be used to train seizure detection algorithms, alleviating the need for recorded data. First, we train a Generative Adversarial Network (GAN) with data from 30 epilepsy patients. Then, we generate synthetic personalized training sets for new, unseen patients, which overall yield higher detection performance than the real-data training sets. We demonstrate our results using the datasets from the EPILEPSIAE Project, one of the world’s largest public databases for seizure detection.
Tasks EEG, Seizure Detection
Published 2019-07-22
URL https://arxiv.org/abs/1907.10518v3
PDF https://arxiv.org/pdf/1907.10518v3.pdf
PWC https://paperswithcode.com/paper/synthetic-epileptic-brain-activities-using
Repo https://github.com/dapascual/GAN_epilepsy
Framework tf
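
A bare-bones sketch of generator and discriminator shapes one might use for fixed-length multi-channel EEG windows (written in PyTorch for illustration, although the released code is TensorFlow); channel counts and window lengths are placeholders, not the paper's architecture.

```python
import torch.nn as nn

class EEGGenerator(nn.Module):
    """Map a latent vector to a (channels x samples) synthetic EEG window."""
    def __init__(self, z_dim=100, channels=4, samples=1024):
        super().__init__()
        self.shape = (channels, samples)
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, channels * samples), nn.Tanh())

    def forward(self, z):
        return self.net(z).view(-1, *self.shape)

class EEGDiscriminator(nn.Module):
    """Score a (channels x samples) window as real or synthetic."""
    def __init__(self, channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 16, kernel_size=15, stride=4), nn.LeakyReLU(0.2),
            nn.Conv1d(16, 32, kernel_size=15, stride=4), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x)
```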

Adversarial Robustness as a Prior for Learned Representations

Title Adversarial Robustness as a Prior for Learned Representations
Authors Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Brandon Tran, Aleksander Madry
Abstract An important goal in deep learning is to learn versatile, high-level feature representations of input data. However, standard networks’ representations seem to possess shortcomings that, as we illustrate, prevent them from fully realizing this goal. In this work, we show that robust optimization can be re-cast as a tool for enforcing priors on the features learned by deep neural networks. It turns out that representations learned by robust models address the aforementioned shortcomings and make significant progress towards learning a high-level encoding of inputs. In particular, these representations are approximately invertible, while allowing for direct visualization and manipulation of salient input features. More broadly, our results indicate adversarial robustness as a promising avenue for improving learned representations. Our code and models for reproducing these results are available at https://git.io/robust-reps .
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.00945v2
PDF https://arxiv.org/pdf/1906.00945v2.pdf
PWC https://paperswithcode.com/paper/190600945
Repo https://github.com/anguyen8/sam
Framework pytorch
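
The "approximately invertible" property can be probed with a simple optimization: start from noise and adjust the input until the (robust) model's representation matches a target representation. A hedged PyTorch sketch, assuming `model` maps image batches to feature vectors:

```python
import torch

def invert_representation(model, target_repr, steps=200, lr=0.1,
                          shape=(1, 3, 224, 224)):
    """Optimize an image from random noise so that model(x) matches a target
    representation; with a robust model the result tends to be recognizable."""
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (model(x) - target_repr).pow(2).sum()  # representation-matching loss
        loss.backward()
        opt.step()
    return x.detach().clamp(0, 1)
```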

Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks

Title Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks
Authors Rohan Ghosh, Anupam K. Gupta
Abstract Augmenting transformation knowledge onto a convolutional neural network’s weights has often yielded significant improvements in performance. For rotational transformation augmentation, an important element of recent approaches has been the use of a steerable basis, i.e. the circular harmonics. Here, we propose a scale-steerable filter basis for the locally scale-invariant CNN, denoted as log-radial harmonics. By replacing the kernels in the locally scale-invariant CNN with scale-steered kernels, significant improvements in performance can be observed on the MNIST-Scale and FMNIST-Scale datasets. Training with a scale-steerable basis results in filters that show meaningful structure, and in feature maps that demonstrate visibly higher preservation of the input’s spatial structure. Furthermore, the proposed scale-steerable CNN shows on-par generalization to global affine transformation estimation methods such as Spatial Transformers, in response to test-time data distortions.
Tasks
Published 2019-06-10
URL https://arxiv.org/abs/1906.03861v1
PDF https://arxiv.org/pdf/1906.03861v1.pdf
PWC https://paperswithcode.com/paper/scale-steerable-filters-for-locally-scale
Repo https://github.com/rghosh92/SS-CNN
Framework pytorch

River Ice Segmentation with Deep Learning

Title River Ice Segmentation with Deep Learning
Authors Abhineet Singh, Hayden Kalke, Mark Loewen, Nilanjan Ray
Abstract This paper deals with the problem of computing surface ice concentration for two different types of ice from digital images of the river surface. It presents the results of attempting to solve this problem using several state-of-the-art semantic segmentation methods based on deep convolutional neural networks (CNNs). This task presents two main challenges: very limited availability of labeled training data and the presence of noisy labels due to the great difficulty of visually distinguishing between the two types of ice, even for human experts. The results are used to analyze the extent to which some of the best deep learning methods currently in existence can handle these challenges. The code and data used in the experiments are made publicly available to facilitate further work in this domain.
Tasks Semantic Segmentation
Published 2019-01-14
URL https://arxiv.org/abs/1901.04412v2
PDF https://arxiv.org/pdf/1901.04412v2.pdf
PWC https://paperswithcode.com/paper/river-ice-segmentation-with-deep-learning
Repo https://github.com/abhineet123/animal_detection
Framework tf
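
Once a segmentation model produces per-pixel labels, the surface ice concentration the paper is ultimately after reduces to class-wise pixel fractions. A trivial NumPy sketch, where the label ids are assumptions:

```python
import numpy as np

# Labels assumed for illustration: 0 = water, 1 = frazil ice, 2 = anchor ice.
def ice_concentration(mask):
    """Compute the surface concentration of each ice type from a per-pixel
    semantic segmentation mask of a river-surface image."""
    total = mask.size
    return {
        "frazil": float((mask == 1).sum()) / total,
        "anchor": float((mask == 2).sum()) / total,
    }
```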

Resolving Gendered Ambiguous Pronouns with BERT

Title Resolving Gendered Ambiguous Pronouns with BERT
Authors Matei Ionita, Yury Kashnitsky, Ken Krige, Vladimir Larin, Denis Logvinenko, Atanas Atanasov
Abstract Pronoun resolution is part of coreference resolution, the task of pairing an expression to its referring entity. This is an important task for natural language understanding and a necessary component of machine translation systems, chatbots and assistants. Neural machine learning systems perform far from ideally in this task, reaching F1 scores as low as 73% on modern benchmark datasets. Moreover, they tend to perform better for masculine pronouns than for feminine ones. Thus, the problem is both challenging and important for NLP researchers and practitioners. In this project, we describe our BERT-based approach to solving the problem of gender-balanced pronoun resolution. We are able to reach a 92% F1 score and a much lower gender bias on the benchmark dataset shared by the Google AI Language team.
Tasks Coreference Resolution, Machine Translation
Published 2019-06-03
URL https://arxiv.org/abs/1906.01161v2
PDF https://arxiv.org/pdf/1906.01161v2.pdf
PWC https://paperswithcode.com/paper/resolving-gendered-ambiguous-pronouns-with
Repo https://github.com/Yorko/gender-unbiased_BERT-based_pronoun_resolution
Framework tf
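
As a loose illustration of extracting BERT span representations for a pronoun and its candidate antecedents, one might write something like the sketch below. The project itself trains a classifier head on such representations; the cosine-similarity scoring here is only a stand-in.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

@torch.no_grad()
def resolve(text, pronoun, candidates):
    """Score each candidate antecedent by similarity between the BERT
    embeddings of the pronoun and of the candidate mention."""
    enc = tokenizer(text, return_tensors="pt")
    hidden = model(**enc).last_hidden_state[0]                   # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())

    def emb(word):
        pieces = tokenizer.tokenize(word)
        for i in range(len(tokens) - len(pieces) + 1):           # first occurrence
            if tokens[i:i + len(pieces)] == pieces:
                return hidden[i:i + len(pieces)].mean(dim=0)
        raise ValueError(f"'{word}' not found in text")

    p = emb(pronoun)
    sims = {c: torch.cosine_similarity(p, emb(c), dim=0).item() for c in candidates}
    return max(sims, key=sims.get)
```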

Privacy-Preserving Deep Visual Recognition: An Adversarial Learning Framework and A New Dataset

Title Privacy-Preserving Deep Visual Recognition: An Adversarial Learning Framework and A New Dataset
Authors Haotao Wang, Zhenyu Wu, Zhangyang Wang, Zhaowen Wang, Hailin Jin
Abstract This paper aims to boost privacy-preserving visual recognition, an increasingly demanded feature in smart camera applications, using deep learning. We formulate a unique adversarial training framework that learns a degradation transform for the original video inputs, in order to explicitly optimize the trade-off between target task performance and the associated privacy budgets on the degraded video. We carefully analyze and benchmark three different optimization strategies to train the resulting model. Notably, the privacy budget, often defined and measured in task-driven contexts, cannot be reliably indicated using any single model's performance, because a strong protection of privacy has to hold up against any possible model that tries to extract private information. In order to tackle this problem, we propose two strategies, model restarting and model ensemble, which can easily be plugged into our training algorithms and further improve the performance. Extensive experiments have been carried out and analyzed. On the other hand, few public datasets are available with both utility and privacy labels provided, so the power of data-driven (supervised) learning has not yet been fully unleashed on this task. We first discuss an innovative heuristic of cross-dataset training and evaluation, which jointly utilizes two datasets with target task and privacy labels respectively, for adversarial training. To further alleviate this challenge, we have constructed a new dataset, termed PA-HMDB51, with both target task (action) and selected privacy attributes (gender, age, race, nudity, and relationship) labeled on a frame-wise basis. This first-of-its-kind video dataset further validates the effectiveness of our proposed framework, and opens up new opportunities for the research community.
Tasks Privacy Preserving Deep Learning
Published 2019-06-12
URL https://arxiv.org/abs/1906.05675v2
PDF https://arxiv.org/pdf/1906.05675v2.pdf
PWC https://paperswithcode.com/paper/privacy-preserving-deep-visual-recognition-an
Repo https://github.com/htwang14/PA-HMDB51
Framework none
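
The adversarial trade-off described above (a degradation transform trained to help the target task while defeating a privacy predictor, which is itself retrained on the degraded output) can be written as an alternating update. A schematic PyTorch step under assumed module and optimizer names, not the authors' training code:

```python
import torch

def adversarial_step(degrade, target_net, privacy_net, video, action_lbl, privacy_lbl,
                     opt_main, opt_privacy, lam=1.0):
    """One alternating update: opt_main holds the parameters of `degrade` and
    `target_net`; opt_privacy holds the parameters of `privacy_net`."""
    ce = torch.nn.functional.cross_entropy
    # 1) update degradation + target task, fooling the (frozen) privacy branch
    degraded = degrade(video)
    loss = ce(target_net(degraded), action_lbl) - lam * ce(privacy_net(degraded), privacy_lbl)
    opt_main.zero_grad(); loss.backward(); opt_main.step()
    # 2) retrain the privacy predictor on the (detached) degraded output
    priv_loss = ce(privacy_net(degrade(video).detach()), privacy_lbl)
    opt_privacy.zero_grad(); priv_loss.backward(); opt_privacy.step()
    return loss.item(), priv_loss.item()
```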

Binary Stochastic Filtering: a Method for Neural Network Size Minimization and Supervised Feature Selection

Title Binary Stochastic Filtering: a Method for Neural Network Size Minimization and Supervised Feature Selection
Authors Andrii Trelin, Ales Prochazka
Abstract Binary Stochastic Filtering (BSF), an algorithm for feature selection and neuron pruning, is proposed in this work. The method defines a filtering layer that penalizes the amount of information involved in the training process. This information can be the input data or the output of the previous layer, which leads directly to feature selection or neuron pruning, respectively, producing an ad hoc subset of features or selecting the optimal number of neurons in each layer. The filtering layer stochastically passes or drops features based on individual weights, which are tuned with the standard backpropagation algorithm during training. A multifold decrease in neural network size was achieved in the experiments. In addition, the method was able to select a minimal number of features, surpassing literature references in terms of the accuracy/dimensionality ratio.
Tasks Feature Selection
Published 2019-02-12
URL https://arxiv.org/abs/1902.04510v2
PDF https://arxiv.org/pdf/1902.04510v2.pdf
PWC https://paperswithcode.com/paper/binary-stochastic-filtering-a-solution-for
Repo https://github.com/Trel725/BSFilter
Framework tf
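
A minimal PyTorch sketch of such a filtering layer: each input feature is passed or dropped according to a learnable probability, a straight-through estimator keeps the layer trainable by backpropagation, and an L1-style penalty on the pass probabilities encourages sparsity. The details (sigmoid parameterization, 0.5 threshold) are assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn

class BinaryStochasticFilter(nn.Module):
    """Stochastically pass or drop each feature with a learnable probability."""
    def __init__(self, n_features):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_features))

    def forward(self, x):
        p = torch.sigmoid(self.logits)
        if self.training:
            mask = torch.bernoulli(p)
            mask = mask + p - p.detach()   # straight-through gradient to the probabilities
        else:
            mask = (p > 0.5).float()       # keep only confidently-passed features
        return x * mask

    def penalty(self):
        """Add this (scaled) to the loss to penalize the amount of information kept."""
        return torch.sigmoid(self.logits).sum()
```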

Dual Path Multi-Scale Fusion Networks with Attention for Crowd Counting

Title Dual Path Multi-Scale Fusion Networks with Attention for Crowd Counting
Authors Liang Zhu, Zhijian Zhao, Chao Lu, Yining Lin, Yao Peng, Tangren Yao
Abstract The task of crowd counting in varying density scenes is an extremely difficult challenge due to large scale variations. In this paper, we propose a novel dual path multi-scale fusion network architecture with an attention mechanism, named SFANet, that can perform accurate count estimation as well as present high-resolution density maps for highly congested crowd scenes. The proposed SFANet contains two main components: a VGG backbone convolutional neural network (CNN) as the front-end feature map extractor and dual path multi-scale fusion networks as the back-end to generate the density map. These dual path multi-scale fusion networks have the same structure: one path is responsible for generating an attention map by highlighting crowd regions in images, while the other is responsible for fusing multi-scale features as well as the attention map to generate the final high-quality, high-resolution density maps. SFANet can be easily trained in an end-to-end way by dual path joint training. We have evaluated our method on four crowd counting datasets (ShanghaiTech, UCF CC 50, UCSD and UCF-QNRF). The results demonstrate that with the attention mechanism and multi-scale feature fusion, the proposed SFANet achieves the best performance on all these datasets and generates better quality density maps compared with other state-of-the-art approaches.
Tasks Crowd Counting
Published 2019-02-04
URL http://arxiv.org/abs/1902.01115v1
PDF http://arxiv.org/pdf/1902.01115v1.pdf
PWC https://paperswithcode.com/paper/dual-path-multi-scale-fusion-networks-with
Repo https://github.com/pxq0312/ASD-crowd-counting
Framework pytorch
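
A compact PyTorch sketch of the dual-path back-end idea: one path predicts a crowd-region attention map, the other a density map, and the two are multiplied. Channel sizes are placeholders, and the real SFANet fuses multi-scale VGG features inside each path.

```python
import torch.nn as nn

class DualPathHead(nn.Module):
    """Attention path gates the density path; a VGG front-end is assumed upstream."""
    def __init__(self, in_ch=512):
        super().__init__()
        self.attention_path = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1), nn.Sigmoid())       # crowd-region attention in [0, 1]
        self.density_path = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1), nn.ReLU())           # non-negative density map

    def forward(self, feats):
        attn = self.attention_path(feats)
        density = self.density_path(feats)
        return density * attn, attn                    # count = (density * attn).sum()
```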

Accurate Retinal Vessel Segmentation via Octave Convolution Neural Network

Title Accurate Retinal Vessel Segmentation via Octave Convolution Neural Network
Authors Zhun Fan, Jiajie Mo, Benzhang Qiu, Wenji Li, Guijie Zhu, Chong Li, Jianye Hu, Yibiao Rong, Xinjian Chen
Abstract Retinal vessel segmentation is a crucial step in diagnosing and screening various diseases, including diabetes, ophthalmologic diseases, and cardiovascular diseases. In this paper, we propose an effective and efficient method for vessel segmentation in color fundus images using an encoder-decoder based octave convolution network. Compared with other convolution networks utilizing vanilla convolution for feature extraction, the proposed method adopts octave convolution for learning multiple-spatial-frequency features, and thus can better capture retinal vasculature with varying sizes and shapes. It is demonstrated that the feature maps of low-frequency kernels respond mainly to the major vascular tree, whereas the high-frequency feature maps can better capture the fine details of thin vessels. To provide the network with the capability of learning how to decode multi-frequency features, we extend octave convolution and propose a new operation named octave transposed convolution. A novel convolutional neural network architecture is proposed based on the encoder-decoder architecture of UNet, which can generate a high-resolution vessel segmentation in a single forward pass. The proposed method is evaluated on four publicly available datasets, including DRIVE, STARE, CHASE_DB1, and HRF. Extensive experimental results demonstrate that the proposed approach achieves better or comparable performance to the state-of-the-art methods with fast processing speed.
Tasks Retinal Vessel Segmentation
Published 2019-06-28
URL https://arxiv.org/abs/1906.12193v7
PDF https://arxiv.org/pdf/1906.12193v7.pdf
PWC https://paperswithcode.com/paper/accurate-retinal-vessel-segmentation-via
Repo https://github.com/koshian2/OctConv-TFKeras
Framework tf
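
For reference, a minimal octave convolution block in PyTorch (the vanilla operation the paper builds on, not the proposed octave transposed convolution): features are kept as a high-frequency map at full resolution and a low-frequency map at half resolution, with four convolutions exchanging information between the two branches.

```python
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv(nn.Module):
    """Minimal octave convolution: alpha controls the low-frequency channel fraction."""
    def __init__(self, in_ch, out_ch, alpha=0.5, k=3):
        super().__init__()
        in_lo, out_lo = int(alpha * in_ch), int(alpha * out_ch)
        in_hi, out_hi = in_ch - in_lo, out_ch - out_lo
        pad = k // 2
        self.hh = nn.Conv2d(in_hi, out_hi, k, padding=pad)  # high -> high
        self.hl = nn.Conv2d(in_hi, out_lo, k, padding=pad)  # high -> low (after pooling)
        self.lh = nn.Conv2d(in_lo, out_hi, k, padding=pad)  # low -> high (then upsampled)
        self.ll = nn.Conv2d(in_lo, out_lo, k, padding=pad)  # low -> low

    def forward(self, x_hi, x_lo):
        # x_hi: (B, in_hi, H, W); x_lo: (B, in_lo, H/2, W/2)
        hi = self.hh(x_hi) + F.interpolate(self.lh(x_lo), scale_factor=2, mode="nearest")
        lo = self.ll(x_lo) + self.hl(F.avg_pool2d(x_hi, 2))
        return hi, lo
```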

Human Action Recognition Using Deep Multilevel Multimodal (M2) Fusion of Depth and Inertial Sensors

Title Human Action Recognition Using Deep Multilevel Multimodal (M2) Fusion of Depth and Inertial Sensors
Authors Zeeshan Ahmad, Naimul Khan
Abstract Multimodal fusion frameworks for Human Action Recognition (HAR) using depth and inertial sensor data have been proposed over the years. In most of the existing works, fusion is performed at a single level (feature level or decision level), missing the opportunity to fuse rich mid-level features necessary for better classification. To address this shortcoming, in this paper, we propose three novel deep multilevel multimodal fusion frameworks to capitalize on different fusion strategies at various stages and to leverage the superiority of multilevel fusion. At the input, we transform the depth data into depth images called sequential front view images (SFIs) and the inertial sensor data into signal images. Each input modality, depth and inertial, is further made multimodal by convolving it with the Prewitt filter. Creating “modality within modality” enables further complementary and discriminative feature extraction through Convolutional Neural Networks (CNNs). CNNs are trained on the input images of each modality to learn low-level, high-level and complex features. Learned features are extracted and fused at different stages of the proposed frameworks to combine discriminative and complementary information. These highly informative features serve as input to a multi-class Support Vector Machine (SVM). We evaluate the proposed frameworks on three publicly available multimodal HAR datasets, namely, UTD Multimodal Human Action Dataset (MHAD), Berkeley MHAD, and UTD-MHAD Kinect V2. Experimental results show the supremacy of the proposed fusion frameworks over existing methods.
Tasks Temporal Action Localization
Published 2019-10-25
URL https://arxiv.org/abs/1910.11482v1
PDF https://arxiv.org/pdf/1910.11482v1.pdf
PWC https://paperswithcode.com/paper/human-action-recognition-using-deep
Repo https://github.com/zaamad/Deep-Multilevel-Multimodal-Fusion
Framework none
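
The "modality within modality" step is easy to picture in code: stack the inertial channels into a 2-D signal image and convolve it with a Prewitt filter to obtain a second, edge-like view of the same data. The sketch below uses illustrative shapes, not the paper's exact image construction.

```python
import numpy as np
from scipy.ndimage import prewitt

def make_signal_image(inertial, size=(52, 52)):
    """Tile (channels x samples) inertial data row-wise into a 2-D signal image.
    Assumes at least size[1] samples are available."""
    reps = int(np.ceil(size[0] / inertial.shape[0]))
    return np.tile(inertial, (reps, 1))[:size[0], :size[1]]

def modality_within_modality(signal_image):
    """Return the original image and its Prewitt-filtered counterpart,
    giving two 'modalities' derived from the same sensor."""
    return signal_image, prewitt(signal_image, axis=0)
```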