Paper Group AWR 397
A Graph Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction
Title | A Graph Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction |
Authors | Ziqi Ke, Haris Vikalo |
Abstract | Reconstructing components of a genomic mixture from data obtained by means of DNA sequencing is a challenging problem encountered in a variety of applications including single individual haplotyping and studies of viral communities. High-throughput DNA sequencing platforms oversample mixture components to provide massive amounts of reads whose relative positions can be determined by mapping the reads to a known reference genome; assembly of the components, however, requires discovery of the reads’ origin – an NP-hard problem that the existing methods struggle to solve with the required level of accuracy. In this paper, we present a learning framework based on a graph auto-encoder designed to exploit structural properties of sequencing data. The algorithm is a neural network which essentially trains to ignore sequencing errors and infers the a posteriori probabilities of the origin of sequencing reads. Mixture components are then reconstructed by finding the consensus of the reads determined to originate from the same genomic component. Results on realistic synthetic as well as experimental data demonstrate that the proposed framework reliably assembles haplotypes and reconstructs viral communities, often significantly outperforming state-of-the-art techniques. |
Tasks | |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05316v1 |
https://arxiv.org/pdf/1911.05316v1.pdf | |
PWC | https://paperswithcode.com/paper/a-graph-auto-encoder-for-haplotype-assembly |
Repo | https://github.com/WuLoli/GAEseq |
Framework | tf |
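
The consensus step described in the abstract is simple to illustrate. Below is a minimal NumPy sketch (not the authors' GAEseq code) of reconstructing mixture components from soft read-origin posteriors by a weighted majority vote over the alleles each read observes; the function name and toy data are hypothetical.

```python
# Minimal sketch of the consensus step: given soft posterior probabilities that
# each read originates from each mixture component, reconstruct every component
# by a weighted majority vote over the alleles observed at each SNP position.
import numpy as np

def consensus_haplotypes(reads, posteriors, n_alleles=4):
    """reads: (n_reads, n_positions) int array, -1 marks positions a read
    does not cover; posteriors: (n_reads, n_components) soft assignments."""
    n_reads, n_pos = reads.shape
    n_comp = posteriors.shape[1]
    votes = np.zeros((n_comp, n_pos, n_alleles))
    for r in range(n_reads):
        covered = reads[r] >= 0
        for k in range(n_comp):
            votes[k, covered, reads[r, covered]] += posteriors[r, k]
    return votes.argmax(axis=2)  # (n_components, n_positions)

# toy example: 4 reads over 5 SNP positions, 2 mixture components
reads = np.array([[0, 1, -1, -1, 2],
                  [0, 1,  3, -1, -1],
                  [1, 2,  3,  0, -1],
                  [-1, 2, 3,  0,  1]])
posteriors = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
print(consensus_haplotypes(reads, posteriors))
```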
Root Identification in Minirhizotron Imagery with Multiple Instance Learning
Title | Root Identification in Minirhizotron Imagery with Multiple Instance Learning |
Authors | Guohao Yu, Alina Zare, Hudanyun Sheng, Roser Matamala, Joel Reyes-Cabrera, Felix B. Fritschi, Thomas E. Juenger |
Abstract | In this paper, multiple instance learning (MIL) algorithms are proposed to automatically perform root detection and segmentation in minirhizotron imagery using only image-level labels. Root and soil characteristics vary from location to location; thus, supervised machine learning approaches that are trained with local data provide the best ability to identify and segment roots in minirhizotron imagery. However, labeling roots for training data (or otherwise) is an extremely tedious and time-consuming task. This paper aims to address this problem by labeling data at the image level (rather than the individual root or root pixel level) and training algorithms to perform individual root pixel level segmentation using MIL strategies. Three MIL methods (MI-ACE, miSVM, MIForests) were applied to root detection and compared to non-MIL approaches. The results show that MIL methods improve root segmentation in challenging minirhizotron imagery and reduce the labeling burden. In our results, miSVM outperformed other methods. The MI-ACE algorithm was a close second with an added advantage that it learned an interpretable root signature which identified the traits used to distinguish roots from soil and did not require parameter selection. |
Tasks | Multiple Instance Learning |
Published | 2019-03-07 |
URL | http://arxiv.org/abs/1903.03207v2 |
http://arxiv.org/pdf/1903.03207v2.pdf | |
PWC | https://paperswithcode.com/paper/root-identification-in-minirhizotron-imagery |
Repo | https://github.com/GatorSense/MILMinirhizotronSegmentation |
Framework | none |
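
For intuition on the miSVM baseline compared above, here is a rough scikit-learn sketch of the usual miSVM iteration: instance labels inside positive bags are re-estimated by a standard SVM while each positive bag is constrained to keep at least one positive instance. This is an illustrative reimplementation under those assumptions, not the code used in the paper.

```python
# Rough miSVM-style sketch: iterate between fitting a standard SVM on tentative
# instance labels and re-labeling instances in positive bags, forcing every
# positive bag to retain at least one positive instance.
import numpy as np
from sklearn.svm import SVC

def misvm(bags, bag_labels, n_iter=10, C=1.0):
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(b), lbl) for b, lbl in zip(bags, bag_labels)])
    for _ in range(n_iter):
        clf = SVC(kernel="rbf", C=C)
        clf.fit(X, y)
        scores = clf.decision_function(X)
        start = 0
        for b, lbl in zip(bags, bag_labels):
            sl = slice(start, start + len(b))
            if lbl == 1:
                y[sl] = (scores[sl] > 0).astype(int)
                if y[sl].sum() == 0:                      # keep >= 1 positive instance per positive bag
                    y[start + np.argmax(scores[sl])] = 1
            start += len(b)
    return clf

# toy usage: two bags of 2-D instances with image-level labels only
bags = [np.random.randn(5, 2) + 2, np.random.randn(5, 2) - 2]
model = misvm(bags, bag_labels=[1, 0])
```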
Quality Assessment of In-the-Wild Videos
Title | Quality Assessment of In-the-Wild Videos |
Authors | Dingquan Li, Tingting Jiang, Ming Jiang |
Abstract | Quality assessment of in-the-wild videos is a challenging problem because of the absence of reference videos and shooting distortions. Knowledge of the human visual system can help establish methods for objective quality assessment of in-the-wild videos. In this work, we show that two eminent effects of the human visual system, namely content-dependency and temporal-memory effects, can be used for this purpose. We propose an objective no-reference video quality assessment method by integrating both effects into a deep neural network. For content-dependency, we extract features from a pre-trained image classification neural network for its inherent content-aware property. For temporal-memory effects, long-term dependencies, especially the temporal hysteresis, are integrated into the network with a gated recurrent unit and a subjectively-inspired temporal pooling layer. To validate the performance of our method, experiments are conducted on three publicly available in-the-wild video quality assessment databases: KoNViD-1k, CVD2014, and LIVE-Qualcomm. Experimental results demonstrate that our proposed method outperforms five state-of-the-art methods by a large margin, specifically, 12.39%, 15.71%, 15.45%, and 18.09% overall performance improvements over the second-best method VBLIINDS, in terms of SROCC, KROCC, PLCC and RMSE, respectively. Moreover, the ablation study verifies the crucial role of both the content-aware features and the modeling of temporal-memory effects. The PyTorch implementation of our method is released at https://github.com/lidq92/VSFA. |
Tasks | Image Classification, Video Quality Assessment |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00375v3 |
https://arxiv.org/pdf/1908.00375v3.pdf | |
PWC | https://paperswithcode.com/paper/quality-assessment-of-in-the-wild-videos |
Repo | https://github.com/SpikeKing/VQA-v2 |
Framework | pytorch |
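
The pipeline the abstract outlines (content-aware features from a pre-trained classifier, a GRU for long-term dependencies, and a subjectively-inspired temporal pooling step) can be sketched in a few lines of PyTorch. The pooling below, which tempers each frame score with the worst recent quality to mimic hysteresis, is an assumption for illustration and not the paper's exact layer; the class name, sizes, and window length are placeholders.

```python
# Hedged sketch: frame features from a classification CNN, a GRU over time, and
# a pooling step that remembers recent bad frames (a stand-in for hysteresis).
import torch
import torch.nn as nn
import torchvision.models as models

class SimpleVQA(nn.Module):
    def __init__(self, hidden=32, window=12):
        super().__init__()
        backbone = models.resnet50()                       # pre-trained weights omitted for a self-contained sketch
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.gru = nn.GRU(2048, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
        self.window = window

    def forward(self, frames):                             # frames: (T, 3, H, W), one video
        with torch.no_grad():
            f = self.features(frames).flatten(1)           # (T, 2048) content-aware features
        h, _ = self.gru(f.unsqueeze(0))                    # (1, T, hidden)
        q = self.head(h).squeeze(-1).squeeze(0)            # per-frame quality scores (T,)
        pooled = []
        for t in range(len(q)):
            past = q[max(0, t - self.window): t + 1]
            pooled.append(0.5 * q[t] + 0.5 * past.min())   # memory of recent low-quality frames
        return torch.stack(pooled).mean()                  # scalar video quality score

score = SimpleVQA()(torch.randn(8, 3, 224, 224))
```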
Improvements to Target-Based 3D LiDAR to Camera Calibration
Title | Improvements to Target-Based 3D LiDAR to Camera Calibration |
Authors | Jiunn-Kai Huang, Jessy W. Grizzle |
Abstract | The homogeneous transformation between a LiDAR and monocular camera is required for sensor fusion tasks, such as SLAM. While determining such a transformation is not considered glamorous in any sense of the word, it is nonetheless crucial for many modern autonomous systems. Indeed, an error of a few degrees in rotation or a few percent in translation can lead to 20 cm translation errors at a distance of 5 m when overlaying a LiDAR image on a camera image. The biggest impediments to determining the transformation accurately are the relative sparsity of LiDAR point clouds and systematic errors in their distance measurements. This paper proposes (1) the use of targets of known dimension and geometry to ameliorate target pose estimation in the face of the quantization and systematic errors inherent in a LiDAR image of a target, and (2) a fitting method for the LiDAR to monocular camera transformation that fundamentally assumes the camera image data is the most accurate information in one’s possession. |
Tasks | Calibration, Pose Estimation, Quantization, Sensor Fusion |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.03126v2 |
https://arxiv.org/pdf/1910.03126v2.pdf | |
PWC | https://paperswithcode.com/paper/improvements-to-target-based-3d-lidar-to |
Repo | https://github.com/UMich-BipedLab/extrinsic_lidar_camera_calibration |
Framework | none |
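
To see why small extrinsic errors matter, the sketch below (NumPy, not the authors' pipeline) projects LiDAR points of a target through a candidate rigid transform [R|t] and an assumed pinhole intrinsic K, and reports the pixel reprojection error that a fitting method of this kind would minimize. The intrinsics, transform, and points are made-up toy values.

```python
# Project LiDAR target points into the image with a candidate extrinsic and
# measure the pixel reprojection error against camera-detected target corners.
import numpy as np

K = np.array([[900.0, 0.0, 640.0],      # assumed pinhole intrinsics
              [0.0, 900.0, 360.0],
              [0.0,   0.0,   1.0]])

def reproject(points_lidar, R, t):
    cam = R @ points_lidar.T + t[:, None]            # LiDAR frame -> camera frame
    pix = K @ cam
    return (pix[:2] / pix[2]).T                      # perspective divide -> pixel coordinates

def reprojection_error(points_lidar, corners_px, R, t):
    return np.linalg.norm(reproject(points_lidar, R, t) - corners_px, axis=1).mean()

# toy example: 4 target corners roughly 5 m in front of the sensor pair
pts = np.array([[5.0, -0.5, 0.5], [5.0, 0.5, 0.5], [5.0, 0.5, -0.5], [5.0, -0.5, -0.5]])
R = np.array([[0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]])  # LiDAR x-forward -> camera z-forward
t = np.array([0.1, 0.0, -0.05])
corners_px = reproject(pts, R, t) + np.random.randn(4, 2)            # "detected" corners with noise
print(reprojection_error(pts, corners_px, R, t))
```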
Cross Attention Network for Few-shot Classification
Title | Cross Attention Network for Few-shot Classification |
Authors | Ruibing Hou, Hong Chang, Bingpeng Ma, Shiguang Shan, Xilin Chen |
Abstract | Few-shot classification aims to recognize unlabeled samples from unseen classes given only a few labeled samples. The unseen classes and the low-data problem make few-shot classification very challenging. Many existing approaches extract features from labeled and unlabeled samples independently; as a result, the features are not discriminative enough. In this work, we propose a novel Cross Attention Network to address the challenging problems in few-shot classification. Firstly, a Cross Attention Module is introduced to deal with the problem of unseen classes. The module generates cross attention maps for each pair of class feature and query sample feature so as to highlight the target object regions, making the extracted features more discriminative. Secondly, a transductive inference algorithm is proposed to alleviate the low-data problem, which iteratively utilizes the unlabeled query set to augment the support set, thereby making the class features more representative. Extensive experiments on two benchmarks show our method is a simple, effective and computationally efficient framework that outperforms the state of the art. |
Tasks | |
Published | 2019-10-17 |
URL | https://arxiv.org/abs/1910.07677v1 |
https://arxiv.org/pdf/1910.07677v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-attention-network-for-few-shot |
Repo | https://github.com/blue-blue272/fewshot-CAN |
Framework | pytorch |
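
A hedged sketch of the core cross-attention idea follows: correlate every spatial position of a class feature map with every position of a query feature map, turn the correlations into attention maps, and use them to highlight the shared target-object regions. The normalization and shapes are assumptions, not the authors' exact module.

```python
# Cross attention between one class feature map and one query feature map.
import torch
import torch.nn.functional as F

def cross_attention(class_feat, query_feat):
    """class_feat, query_feat: (C, H, W) feature maps for one class/query pair."""
    C, H, W = class_feat.shape
    p = class_feat.reshape(C, H * W)                  # (C, HW)
    q = query_feat.reshape(C, H * W)
    p_n = F.normalize(p, dim=0)                       # cosine-style correlation
    q_n = F.normalize(q, dim=0)
    corr = p_n.t() @ q_n                              # (HW_class, HW_query)
    attn_q = F.softmax(corr.mean(dim=0), dim=0)       # attention over query positions
    attn_p = F.softmax(corr.mean(dim=1), dim=0)       # attention over class positions
    q_out = query_feat * attn_q.reshape(1, H, W)      # reweighted query feature
    p_out = class_feat * attn_p.reshape(1, H, W)      # reweighted class feature
    return p_out, q_out

p_out, q_out = cross_attention(torch.randn(64, 6, 6), torch.randn(64, 6, 6))
```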
Synthetic Epileptic Brain Activities Using Generative Adversarial Networks
Title | Synthetic Epileptic Brain Activities Using Generative Adversarial Networks |
Authors | Damian Pascual, Amir Aminifar, David Atienza, Philippe Ryvlin, Roger Wattenhofer |
Abstract | Epilepsy is a chronic neurological disorder affecting more than 65 million people worldwide and manifested by recurrent unprovoked seizures. The unpredictability of seizures not only degrades the quality of life of the patients, but it can also be life-threatening. Modern systems monitoring electroencephalography (EEG) signals are currently being developed with a view to detecting epileptic seizures in order to alert caregivers and reduce the impact of seizures on patients’ quality of life. Such seizure detection systems employ state-of-the-art machine learning algorithms that require a considerably large amount of labeled personal data for training. However, acquiring EEG signals of epileptic seizures is a costly and time-consuming process for medical experts and patients, currently requiring in-hospital recordings in specialized units. In this work, we generate synthetic seizure-like brain electrical activities, i.e., EEG signals, that can be used to train seizure detection algorithms, alleviating the need for recorded data. First, we train a Generative Adversarial Network (GAN) with data from 30 epilepsy patients. Then, we generate synthetic personalized training sets for new, unseen patients, which overall yield higher detection performance than the real-data training sets. We demonstrate our results using the datasets from the EPILEPSIAE Project, one of the world’s largest public databases for seizure detection. |
Tasks | EEG, Seizure Detection |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.10518v3 |
https://arxiv.org/pdf/1907.10518v3.pdf | |
PWC | https://paperswithcode.com/paper/synthetic-epileptic-brain-activities-using |
Repo | https://github.com/dapascual/GAN_epilepsy |
Framework | tf |
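
The GAN setup the abstract describes can be illustrated with a very small generator/discriminator pair for multi-channel EEG-like windows. The architecture sizes and training loop below are placeholders, not the paper's configuration.

```python
# Minimal GAN training step for synthetic multi-channel EEG-like windows.
import torch
import torch.nn as nn

CH, T, Z = 4, 256, 64                      # channels, samples per window, noise dimension

G = nn.Sequential(nn.Linear(Z, 128), nn.ReLU(),
                  nn.Linear(128, CH * T), nn.Tanh())
D = nn.Sequential(nn.Flatten(), nn.Linear(CH * T, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):                                   # real: (B, CH, T) EEG windows
    B = real.size(0)
    fake = G(torch.randn(B, Z)).view(B, CH, T)
    # discriminator: real -> 1, fake -> 0
    d_loss = bce(D(real), torch.ones(B, 1)) + bce(D(fake.detach()), torch.zeros(B, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator: fool the discriminator
    g_loss = bce(D(fake), torch.ones(B, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

print(train_step(torch.randn(8, CH, T)))                # toy batch of "real" windows
```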
Adversarial Robustness as a Prior for Learned Representations
Title | Adversarial Robustness as a Prior for Learned Representations |
Authors | Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Brandon Tran, Aleksander Madry |
Abstract | An important goal in deep learning is to learn versatile, high-level feature representations of input data. However, standard networks’ representations seem to possess shortcomings that, as we illustrate, prevent them from fully realizing this goal. In this work, we show that robust optimization can be re-cast as a tool for enforcing priors on the features learned by deep neural networks. It turns out that representations learned by robust models address the aforementioned shortcomings and make significant progress towards learning a high-level encoding of inputs. In particular, these representations are approximately invertible, while allowing for direct visualization and manipulation of salient input features. More broadly, our results indicate that adversarial robustness is a promising avenue for improving learned representations. Our code and models for reproducing these results are available at https://git.io/robust-reps . |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00945v2 |
https://arxiv.org/pdf/1906.00945v2.pdf | |
PWC | https://paperswithcode.com/paper/190600945 |
Repo | https://github.com/anguyen8/sam |
Framework | pytorch |
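
Robust models of the kind studied here are usually obtained with adversarial training, whose inner maximization is a projected gradient descent (PGD) attack. A hedged sketch of an L-infinity PGD step and the corresponding robust training step is given below; hyperparameters are illustrative, and this is not the authors' released code.

```python
# L-infinity PGD inner maximization plus the outer robust training step.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()                 # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)            # project back to the eps-ball
        x_adv = x_adv.clamp(0, 1)                           # stay a valid image
    return x_adv.detach()

def robust_training_step(model, optimizer, x, y):
    model.train()
    x_adv = pgd_attack(model, x, y)                         # inner maximization
    loss = F.cross_entropy(model(x_adv), y)                 # outer minimization
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# toy usage with a hypothetical linear classifier
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
robust_training_step(model, opt, torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,)))
```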
Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks
Title | Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks |
Authors | Rohan Ghosh, Anupam K. Gupta |
Abstract | Augmenting transformation knowledge onto a convolutional neural network’s weights has often yielded significant improvements in performance. For rotational transformation augmentation, an important element of recent approaches has been the use of a steerable basis, i.e. the circular harmonics. Here, we propose a scale-steerable filter basis for the locally scale-invariant CNN, denoted as log-radial harmonics. By replacing the kernels in the locally scale-invariant CNN \cite{lsi_cnn} with scale-steered kernels, significant improvements in performance can be observed on the MNIST-Scale and FMNIST-Scale datasets. Training with a scale-steerable basis results in filters which show meaningful structure, and feature maps which demonstrate visibly higher spatial-structure preservation of the input. Furthermore, the proposed scale-steerable CNN shows on-par generalization to global affine transformation estimation methods such as Spatial Transformers, in response to test-time data distortions. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03861v1 |
https://arxiv.org/pdf/1906.03861v1.pdf | |
PWC | https://paperswithcode.com/paper/scale-steerable-filters-for-locally-scale |
Repo | https://github.com/rghosh92/SS-CNN |
Framework | pytorch |
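
As a rough illustration of a scale-steerable basis, the sketch below builds complex filters whose angular part is a circular harmonic and whose radial part is a harmonic in log r, so that rescaling the input mostly shifts the radial phase. The envelope, normalization, and the assumed basis form exp(i(m·theta + k·log r)) are illustrative and may differ from the paper's log-radial harmonics.

```python
# Build a real convolution kernel as a weighted sum over an assumed
# log-radial harmonic basis exp(i * (m*theta + k*log r)) with a radial window.
import numpy as np

def log_radial_harmonic(size, m, k, sigma=0.6):
    c = (size - 1) / 2.0
    y, x = np.mgrid[:size, :size] - c
    r = np.hypot(x, y) + 1e-6
    theta = np.arctan2(y, x)
    envelope = np.exp(-(np.log(r / (c / 2)) ** 2) / (2 * sigma ** 2))  # radial window
    return envelope * np.exp(1j * (m * theta + k * np.log(r)))

def steered_filter(coeffs, size=9):
    """Real filter as a weighted sum over (m, k) basis elements;
    coeffs maps (m, k) -> complex weight."""
    f = sum(w * log_radial_harmonic(size, m, k) for (m, k), w in coeffs.items())
    return np.real(f)

kernel = steered_filter({(1, 0): 1.0, (1, 2): 0.5j, (2, 1): 0.25})
print(kernel.shape)   # (9, 9), usable as a convolution kernel
```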
River Ice Segmentation with Deep Learning
Title | River Ice Segmentation with Deep Learning |
Authors | Abhineet Singh, Hayden Kalke, Mark Loewen, Nilanjan Ray |
Abstract | This paper deals with the problem of computing surface ice concentration for two different types of ice from digital images of the river surface. It presents the results of attempting to solve this problem using several state-of-the-art semantic segmentation methods based on deep convolutional neural networks (CNNs). This task presents two main challenges: very limited availability of labeled training data and the presence of noisy labels due to the great difficulty of visually distinguishing between the two types of ice, even for human experts. The results are used to analyze the extent to which some of the best deep learning methods currently in existence can handle these challenges. The code and data used in the experiments are made publicly available to facilitate further work in this domain. |
Tasks | Semantic Segmentation |
Published | 2019-01-14 |
URL | https://arxiv.org/abs/1901.04412v2 |
https://arxiv.org/pdf/1901.04412v2.pdf | |
PWC | https://paperswithcode.com/paper/river-ice-segmentation-with-deep-learning |
Repo | https://github.com/abhineet123/animal_detection |
Framework | tf |
Resolving Gendered Ambiguous Pronouns with BERT
Title | Resolving Gendered Ambiguous Pronouns with BERT |
Authors | Matei Ionita, Yury Kashnitsky, Ken Krige, Vladimir Larin, Denis Logvinenko, Atanas Atanasov |
Abstract | Pronoun resolution is part of coreference resolution, the task of pairing an expression to its referring entity. This is an important task for natural language understanding and a necessary component of machine translation systems, chatbots, and assistants. Neural machine learning systems perform far from ideally in this task, reaching F1 scores as low as 73% on modern benchmark datasets. Moreover, they tend to perform better for masculine pronouns than for feminine ones. Thus, the problem is both challenging and important for NLP researchers and practitioners. In this project, we describe our BERT-based approach to solving the problem of gender-balanced pronoun resolution. We are able to reach a 92% F1 score and a much lower gender bias on the benchmark dataset shared by the Google AI Language team. |
Tasks | Coreference Resolution, Machine Translation |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.01161v2 |
https://arxiv.org/pdf/1906.01161v2.pdf | |
PWC | https://paperswithcode.com/paper/resolving-gendered-ambiguous-pronouns-with |
Repo | https://github.com/Yorko/gender-unbiased_BERT-based_pronoun_resolution |
Framework | tf |
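
A hedged sketch of a BERT-based scoring baseline for this task is shown below: contextual embeddings are extracted for the pronoun and for each candidate name, and the most similar candidate wins. The competitive systems described in the paper fine-tune BERT with a classification head over such span features; the cosine-similarity scoring here is only an illustration, and the helper names are hypothetical.

```python
# Score pronoun/candidate pairs by cosine similarity of BERT contextual embeddings.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

def first_token_index(tokens, word):
    pieces = tokenizer.tokenize(word)
    return tokens.index(pieces[0])            # first sub-token of the mention

def resolve(text, pronoun, candidates):
    enc = tokenizer(text, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, 768)
    p_vec = hidden[first_token_index(tokens, pronoun)]
    scores = {c: torch.cosine_similarity(p_vec,
                                         hidden[first_token_index(tokens, c)],
                                         dim=0).item()
              for c in candidates}
    return max(scores, key=scores.get), scores

text = "Alice told Mary that she had passed the exam."
print(resolve(text, "she", ["Alice", "Mary"]))
```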
Privacy-Preserving Deep Visual Recognition: An Adversarial Learning Framework and A New Dataset
Title | Privacy-Preserving Deep Visual Recognition: An Adversarial Learning Framework and A New Dataset |
Authors | Haotao Wang, Zhenyu Wu, Zhangyang Wang, Zhaowen Wang, Hailin Jin |
Abstract | This paper aims to boost privacy-preserving visual recognition, an increasingly demanded feature in smart camera applications, using deep learning. We formulate a unique adversarial training framework that learns a degradation transform for the original video inputs, in order to explicitly optimize the trade-off between target task performance and the associated privacy budgets on the degraded video. We carefully analyze and benchmark three different optimization strategies to train the resulting model. Notably, the privacy budget, often defined and measured in task-driven contexts, cannot be reliably indicated using any single model performance, because a strong protection of privacy has to sustain against any possible model that tries to hack privacy information. In order to tackle this problem, we propose two strategies, model restarting and model ensemble, which can be easily plugged into our training algorithms and further improve the performance. Extensive experiments have been carried out and analyzed. On the other hand, few public datasets are available with both utility and privacy labels provided, leaving the power of data-driven (supervised) learning not yet fully unleashed on this task. We first discuss an innovative heuristic of cross-dataset training and evaluation, which jointly utilizes two datasets with target task and privacy labels respectively, for adversarial training. To further alleviate this challenge, we have constructed a new dataset, termed PA-HMDB51, with both target task (action) and selected privacy attributes (gender, age, race, nudity, and relationship) labeled on a frame-wise basis. This first-of-its-kind video dataset further validates the effectiveness of our proposed framework, and opens up new opportunities for the research community. |
Tasks | Privacy Preserving Deep Learning |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05675v2 |
https://arxiv.org/pdf/1906.05675v2.pdf | |
PWC | https://paperswithcode.com/paper/privacy-preserving-deep-visual-recognition-an |
Repo | https://github.com/htwang14/PA-HMDB51 |
Framework | none |
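
The adversarial trade-off the abstract describes can be sketched as two alternating updates: the degradation transform and target model are trained to keep utility high while maximizing the privacy classifier's loss, and the privacy classifier is trained to recover the attribute from degraded frames. The models, losses, and budget weight below are placeholders, not the paper's implementation.

```python
# Alternating updates: (degradation + target task) vs. privacy adversary.
import torch
import torch.nn as nn
import torch.nn.functional as F

degrade = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid())
target_net = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 10))
privacy_net = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 2))

opt_main = torch.optim.Adam(list(degrade.parameters()) + list(target_net.parameters()), lr=1e-3)
opt_priv = torch.optim.Adam(privacy_net.parameters(), lr=1e-3)

def train_step(x, y_task, y_privacy, budget_weight=1.0):
    xd = degrade(x)
    # 1) degradation + target model: keep utility high, *maximize* the privacy loss
    loss_main = F.cross_entropy(target_net(xd), y_task) \
                - budget_weight * F.cross_entropy(privacy_net(xd), y_privacy)
    opt_main.zero_grad(); loss_main.backward(); opt_main.step()
    # 2) privacy adversary: learn to recover the attribute from degraded frames
    loss_priv = F.cross_entropy(privacy_net(degrade(x).detach()), y_privacy)
    opt_priv.zero_grad(); loss_priv.backward(); opt_priv.step()
    return loss_main.item(), loss_priv.item()

train_step(torch.rand(4, 3, 64, 64), torch.randint(0, 10, (4,)), torch.randint(0, 2, (4,)))
```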
Binary Stochastic Filtering: a Method for Neural Network Size Minimization and Supervised Feature Selection
Title | Binary Stochastic Filtering: a Method for Neural Network Size Minimization and Supervised Feature Selection |
Authors | Andrii Trelin, Ales Prochazka |
Abstract | Binary Stochastic Filtering (BSF), an algorithm for feature selection and neuron pruning, is proposed in this work. The method defines a filtering layer which penalizes the amount of information involved in the training process. This information could be the input data or the output of the previous layer, which leads directly to feature selection or neuron pruning respectively, producing an ad hoc subset of features or selecting the optimal number of neurons in each layer. The filtering layer stochastically passes or drops features based on individual weights, which are tuned with the standard backpropagation algorithm during the training process. A multifold decrease in neural network size has been achieved in the experiments. Besides, the method was able to select a minimal number of features, surpassing literature references by the accuracy/dimensionality ratio. |
Tasks | Feature Selection |
Published | 2019-02-12 |
URL | https://arxiv.org/abs/1902.04510v2 |
https://arxiv.org/pdf/1902.04510v2.pdf | |
PWC | https://paperswithcode.com/paper/binary-stochastic-filtering-a-solution-for |
Repo | https://github.com/Trel725/BSFilter |
Framework | tf |
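
A hedged PyTorch sketch of such a filtering layer follows: each feature is passed or dropped by a Bernoulli draw with a trainable probability, and a penalty on those probabilities discourages keeping many features. The straight-through gradient and the exact penalty are assumptions for illustration, not necessarily the paper's formulation.

```python
# Stochastic pass/drop layer with trainable per-feature probabilities.
import torch
import torch.nn as nn

class BinaryStochasticFilter(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_features))   # pass-probabilities (pre-sigmoid)

    def forward(self, x):
        p = torch.sigmoid(self.logits)
        if self.training:
            gate = torch.bernoulli(p.expand_as(x))
            gate = gate + p - p.detach()          # straight-through: gradients flow to p
        else:
            gate = (p > 0.5).float()              # keep only confidently selected features
        return x * gate

    def penalty(self):
        return torch.sigmoid(self.logits).sum()   # pushes pass-probabilities toward zero

bsf = BinaryStochasticFilter(20)
net = nn.Sequential(bsf, nn.Linear(20, 2))
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))
loss = nn.functional.cross_entropy(net(x), y) + 1e-3 * bsf.penalty()
loss.backward()
```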
Dual Path Multi-Scale Fusion Networks with Attention for Crowd Counting
Title | Dual Path Multi-Scale Fusion Networks with Attention for Crowd Counting |
Authors | Liang Zhu, Zhijian Zhao, Chao Lu, Yining Lin, Yao Peng, Tangren Yao |
Abstract | The task of crowd counting in varying density scenes is an extremely difficult challenge due to large scale variations. In this paper, we propose a novel dual path multi-scale fusion network architecture with an attention mechanism, named SFANet, that can perform accurate count estimation as well as present high-resolution density maps for highly congested crowd scenes. The proposed SFANet contains two main components: a VGG backbone convolutional neural network (CNN) as the front-end feature map extractor and dual path multi-scale fusion networks as the back-end to generate the density map. These dual path multi-scale fusion networks have the same structure: one path is responsible for generating an attention map by highlighting crowd regions in images, while the other path is responsible for fusing multi-scale features as well as the attention map to generate the final high-quality high-resolution density maps. SFANet can be easily trained in an end-to-end way by dual path joint training. We have evaluated our method on four crowd counting datasets (ShanghaiTech, UCF CC 50, UCSD and UCF-QNRF). The results demonstrate that with the attention mechanism and multi-scale feature fusion, the proposed SFANet achieves the best performance on all these datasets and generates better quality density maps compared with other state-of-the-art approaches. |
Tasks | Crowd Counting |
Published | 2019-02-04 |
URL | http://arxiv.org/abs/1902.01115v1 |
http://arxiv.org/pdf/1902.01115v1.pdf | |
PWC | https://paperswithcode.com/paper/dual-path-multi-scale-fusion-networks-with |
Repo | https://github.com/pxq0312/ASD-crowd-counting |
Framework | pytorch |
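
The dual-path back-end idea can be sketched compactly: one path predicts a crowd-region attention map, the other fuses features for the density map, and the attention map gates the prediction. The module below is a much-simplified placeholder, not the SFANet architecture; channel sizes and depths are assumptions.

```python
# Simplified dual-path head: attention path gates the density path.
import torch
import torch.nn as nn

class DualPathHead(nn.Module):
    def __init__(self, in_ch=512):
        super().__init__()
        self.attention_path = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1), nn.Sigmoid())                        # crowd-region attention map
        self.density_path = nn.Sequential(
            nn.Conv2d(in_ch, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 64, 3, padding=2, dilation=2), nn.ReLU(),  # larger receptive field
            nn.Conv2d(64, 1, 1))

    def forward(self, feats):                              # feats: front-end (e.g. VGG) output
        attn = self.attention_path(feats)
        density = self.density_path(feats) * attn          # attention gates the density map
        return density, attn

density, attn = DualPathHead()(torch.randn(1, 512, 32, 32))
print(density.shape, float(density.sum()))                 # predicted count = sum of the density map
```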
Accurate Retinal Vessel Segmentation via Octave Convolution Neural Network
Title | Accurate Retinal Vessel Segmentation via Octave Convolution Neural Network |
Authors | Zhun Fan, Jiajie Mo, Benzhang Qiu, Wenji Li, Guijie Zhu, Chong Li, Jianye Hu, Yibiao Rong, Xinjian Chen |
Abstract | Retinal vessel segmentation is a crucial step in diagnosing and screening various diseases, including diabetes, ophthalmologic diseases, and cardiovascular diseases. In this paper, we propose an effective and efficient method for vessel segmentation in color fundus images using an encoder-decoder based octave convolution network. Compared with other convolution networks utilizing vanilla convolution for feature extraction, the proposed method adopts octave convolution for learning multiple-spatial-frequency features, and thus can better capture retinal vasculatures with varying sizes and shapes. It is demonstrated that the feature maps of low-frequency kernels respond mainly to the major vascular tree, whereas the high-frequency feature maps can better capture the fine details of thin vessels. To provide the network with the capability of learning how to decode multifrequency features, we extend octave convolution and propose a new operation named octave transposed convolution. A novel convolutional neural network architecture is proposed based on the encoder-decoder architecture of UNet, which can generate high-resolution vessel segmentation in a single forward pass. The proposed method is evaluated on four publicly available datasets, including DRIVE, STARE, CHASE_DB1, and HRF. Extensive experimental results demonstrate that the proposed approach achieves better or comparable performance to the state-of-the-art methods with fast processing speed. |
Tasks | Retinal Vessel Segmentation |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1906.12193v7 |
https://arxiv.org/pdf/1906.12193v7.pdf | |
PWC | https://paperswithcode.com/paper/accurate-retinal-vessel-segmentation-via |
Repo | https://github.com/koshian2/OctConv-TFKeras |
Framework | tf |
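
Octave convolution itself is easy to sketch: channels are split into a full-resolution high-frequency part and a half-resolution low-frequency part, and four convolution paths (H→H, H→L, L→H, L→L) exchange information between them. The PyTorch module below follows the common formulation; pooling and upsampling details may differ from the authors' code.

```python
# Octave convolution with a low-frequency channel fraction alpha.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv(nn.Module):
    def __init__(self, in_ch, out_ch, alpha=0.5, k=3):
        super().__init__()
        in_lo, out_lo = int(alpha * in_ch), int(alpha * out_ch)
        in_hi, out_hi = in_ch - in_lo, out_ch - out_lo
        pad = k // 2
        self.hh = nn.Conv2d(in_hi, out_hi, k, padding=pad)   # high -> high
        self.hl = nn.Conv2d(in_hi, out_lo, k, padding=pad)   # high -> low
        self.lh = nn.Conv2d(in_lo, out_hi, k, padding=pad)   # low  -> high
        self.ll = nn.Conv2d(in_lo, out_lo, k, padding=pad)   # low  -> low

    def forward(self, x_hi, x_lo):
        hi = self.hh(x_hi) + F.interpolate(self.lh(x_lo), scale_factor=2, mode="nearest")
        lo = self.ll(x_lo) + self.hl(F.avg_pool2d(x_hi, 2))
        return hi, lo

oct_conv = OctaveConv(32, 64)
hi, lo = oct_conv(torch.randn(1, 16, 64, 64), torch.randn(1, 16, 32, 32))
print(hi.shape, lo.shape)   # (1, 32, 64, 64), (1, 32, 32, 32)
```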
Human Action Recognition Using Deep Multilevel Multimodal (M2) Fusion of Depth and Inertial Sensors
Title | Human Action Recognition Using Deep Multilevel Multimodal (M2) Fusion of Depth and Inertial Sensors |
Authors | Zeeshan Ahmad, Naimul Khan |
Abstract | Multimodal fusion frameworks for Human Action Recognition (HAR) using depth and inertial sensor data have been proposed over the years. In most of the existing works, fusion is performed at a single level (feature level or decision level), missing the opportunity to fuse rich mid-level features necessary for better classification. To address this shortcoming, in this paper, we propose three novel deep multilevel multimodal fusion frameworks to capitalize on different fusion strategies at various stages and to leverage the superiority of multilevel fusion. At the input, we transform the depth data into depth images called sequential front view images (SFIs) and the inertial sensor data into signal images. Each input modality, depth and inertial, is further made multimodal by taking its convolution with the Prewitt filter. Creating “modality within modality” enables further complementary and discriminative feature extraction through Convolutional Neural Networks (CNNs). CNNs are trained on input images of each modality to learn low-level, high-level and complex features. Learned features are extracted and fused at different stages of the proposed frameworks to combine discriminative and complementary information. These highly informative features serve as input to a multi-class Support Vector Machine (SVM). We evaluate the proposed frameworks on three publicly available multimodal HAR datasets, namely, UTD Multimodal Human Action Dataset (MHAD), Berkeley MHAD, and UTD-MHAD Kinect V2. Experimental results show the superiority of the proposed fusion frameworks over existing methods. |
Tasks | Temporal Action Localization |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11482v1 |
https://arxiv.org/pdf/1910.11482v1.pdf | |
PWC | https://paperswithcode.com/paper/human-action-recognition-using-deep |
Repo | https://github.com/zaamad/Deep-Multilevel-Multimodal-Fusion |
Framework | none |
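
The "modality within modality" step is straightforward to illustrate: convolve each input image (an SFI from depth, or a signal image from inertial data) with Prewitt kernels and stack the edge response with the original image as an extra CNN input channel. The stacking choice in the sketch below is an assumption, and the function names are hypothetical.

```python
# Prewitt response of an input image, stacked with the original as a second channel.
import numpy as np
from scipy.signal import convolve2d

PREWITT_X = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)
PREWITT_Y = PREWITT_X.T

def prewitt_channel(img):
    gx = convolve2d(img, PREWITT_X, mode="same", boundary="symm")
    gy = convolve2d(img, PREWITT_Y, mode="same", boundary="symm")
    return np.hypot(gx, gy)                     # edge magnitude

def modality_within_modality(img):
    """Stack the original image with its Prewitt response -> (2, H, W)."""
    return np.stack([img, prewitt_channel(img)])

signal_image = np.random.rand(52, 52)           # e.g. a signal image built from inertial data
x = modality_within_modality(signal_image)
print(x.shape)                                  # (2, 52, 52)
```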