January 26, 2020


Paper Group ANR 1478

Improving Map Re-localization with Deep ‘Movable’ Objects Segmentation on 3D LiDAR Point Clouds. Dual Learning-based Video Coding with Inception Dense Blocks. DeepcomplexMRI: Exploiting deep residual network for fast parallel MR imaging with complex convolution. Recognizing Facial Expressions of Occluded Faces using Convolutional Neural Networks. C …

Improving Map Re-localization with Deep ‘Movable’ Objects Segmentation on 3D LiDAR Point Clouds


Title	Improving Map Re-localization with Deep ‘Movable’ Objects Segmentation on 3D LiDAR Point Clouds
Authors	Victor Vaquero, Kai Fischer, Francesc Moreno-Noguer, Alberto Sanfeliu, Stefan Milz
Abstract	Localization and Mapping is an essential component to enable Autonomous Vehicle navigation, and requires an accuracy exceeding that of commercial GPS-based systems. Current odometry and mapping algorithms are able to provide this accurate information. However, the lack of robustness of these algorithms against dynamic obstacles and environmental changes, even for short time periods, forces the generation of new maps on every session without taking advantage of previously obtained ones. In this paper we propose the use of a deep learning architecture to segment movable objects from 3D LiDAR point clouds in order to obtain longer-lasting 3D maps. This will in turn allow for better, faster and more accurate re-localization and trajectory estimation on subsequent days. We show the effectiveness of our approach in a very dynamic and cluttered scenario, a supermarket parking lot. For that, we record several sequences on different days and compare localization errors with and without our movable objects segmentation method. Results show that we are able to accurately re-locate over a filtered map, consistently reducing trajectory errors by an average of 35.1% with respect to a non-filtered map version and of 47.9% with respect to a standalone map created on the current session.
Tasks	Autonomous Vehicles
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03336v1
PDF	https://arxiv.org/pdf/1910.03336v1.pdf
PWC	https://paperswithcode.com/paper/improving-map-re-localization-with-deep
Repo
Framework

Dual Learning-based Video Coding with Inception Dense Blocks


Title	Dual Learning-based Video Coding with Inception Dense Blocks
Authors	Chao Liu, Heming Sun, Junan Chen, Zhengxue Cheng, Masaru Takeuchi, Jiro Katto, Xiaoyang Zeng, Yibo Fan
Abstract	In this paper, a dual learning-based method in intra coding is introduced for the PCS Grand Challenge. This method is mainly composed of two parts: intra prediction and reconstruction filtering. They use different network structures: the neural network-based intra prediction uses a fully connected network to predict the block, while the neural network-based reconstruction filtering utilizes convolutional networks. Different from previous filtering works, we use a network with more powerful feature extraction capabilities in our reconstruction filtering network. The filtering unit is at the block level so as to achieve a more accurate filtering compensation. To the best of our knowledge, among all the learning-based methods, this is the first attempt to combine two different networks in one application, and we achieve state-of-the-art performance for the AI configuration on the HEVC test sequences. The experimental result shows that our method leads to significant BD-rate saving for the 8 provided sequences compared to the HM-16.20 baseline (average 10.24% and 3.57% bitrate reductions for all-intra and random-access coding, respectively). For HEVC test sequences, our model also achieved a 9.70% BD-rate saving compared to the HM-16.20 baseline for the all-intra configuration.
Tasks
Published	2019-11-22
URL	https://arxiv.org/abs/1911.09857v1
PDF	https://arxiv.org/pdf/1911.09857v1.pdf
PWC	https://paperswithcode.com/paper/dual-learning-based-video-coding-with
Repo
Framework

DeepcomplexMRI: Exploiting deep residual network for fast parallel MR imaging with complex convolution


Title	DeepcomplexMRI: Exploiting deep residual network for fast parallel MR imaging with complex convolution
Authors	Shanshan Wang, Huitao Cheng, Leslie Ying, Taohui Xiao, Ziwen Ke, Xin Liu, Hairong Zheng, Dong Liang
Abstract	This paper proposes a multi-channel image reconstruction method, named DeepcomplexMRI, to accelerate parallel MR imaging with a residual complex convolutional neural network. Different from most existing works, which rely on the utilization of the coil sensitivities or prior information of predefined transforms, DeepcomplexMRI takes advantage of the availability of a large number of existing multi-channel ground-truth images and uses them as labeled data to train the deep residual convolutional neural network offline. In particular, a complex convolutional network is proposed to take into account the correlation between the real and imaginary parts of MR images. In addition, the k-space data consistency is further enforced repeatedly in between layers of the network. The evaluations on in vivo datasets show that the proposed method has the capability to recover the desired multi-channel images. Its comparison with state-of-the-art methods also demonstrates that the proposed method can reconstruct the desired MR images more accurately.
Tasks	Image Reconstruction
Published	2019-06-11
URL	https://arxiv.org/abs/1906.04359v2
PDF	https://arxiv.org/pdf/1906.04359v2.pdf
PWC	https://paperswithcode.com/paper/deepcomplexmri-exploiting-deep-residual
Repo
Framework
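
The abstract above motivates complex convolution as a way to couple the real and imaginary parts of MR images. As a minimal sketch of that idea only (not the DeepcomplexMRI network itself), a complex convolution can be assembled from four real-valued convolutions using (a + ib)(c + id) = (ac - bd) + i(ad + bc). The function names, the 1D setting, and the sample values are assumptions made for illustration.

```python
def conv1d_valid(signal, kernel):
    """Real-valued 'valid' cross-correlation, the basic operation in CNN layers."""
    n, k = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(n - k + 1)]


def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex convolution built from four real convolutions.

    (x_re + i*x_im) * (w_re + i*w_im)
        = (x_re*w_re - x_im*w_im) + i*(x_re*w_im + x_im*w_re)
    """
    rr = conv1d_valid(x_re, w_re)
    ii = conv1d_valid(x_im, w_im)
    ri = conv1d_valid(x_re, w_im)
    ir = conv1d_valid(x_im, w_re)
    out_re = [a - b for a, b in zip(rr, ii)]
    out_im = [a + b for a, b in zip(ri, ir)]
    return out_re, out_im


# Hypothetical input signal and filter, split into real and imaginary parts.
x_re, x_im = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0]
w_re, w_im = [1.0, -1.0], [0.5, 2.0]
out_re, out_im = complex_conv1d(x_re, x_im, w_re, w_im)
```

Because the same real and imaginary filter weights appear in both output parts, the two channels are tied together instead of being processed as independent real-valued feature maps.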

Recognizing Facial Expressions of Occluded Faces using Convolutional Neural Networks


Title	Recognizing Facial Expressions of Occluded Faces using Convolutional Neural Networks
Authors	Mariana-Iuliana Georgescu, Radu Tudor Ionescu
Abstract	In this paper, we present an approach based on convolutional neural networks (CNNs) for facial expression recognition in a difficult setting with severe occlusions. More specifically, our task is to recognize the facial expression of a person wearing a virtual reality (VR) headset which essentially occludes the upper part of the face. In order to accurately train neural networks for this setting, in which faces are severely occluded, we modify the training examples by intentionally occluding the upper half of the face. This forces the neural networks to focus on the lower part of the face and to obtain better accuracy rates than models trained on the entire faces. Our empirical results on two benchmark data sets, FER+ and AffectNet, show that our CNN models’ predictions on lower-half faces are up to 13% higher than the baseline CNN models trained on entire faces, proving their suitability for the VR setting. Furthermore, our models’ predictions on lower-half faces are no more than 10% under the baseline models’ predictions on full faces, proving that there are enough clues in the lower part of the face to accurately predict facial expressions.
Tasks	Facial Expression Recognition
Published	2019-11-12
URL	https://arxiv.org/abs/1911.04852v1
PDF	https://arxiv.org/pdf/1911.04852v1.pdf
PWC	https://paperswithcode.com/paper/recognizing-facial-expressions-of-occluded
Repo
Framework
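
The training-time modification described in the abstract above, intentionally occluding the upper half of each face, can be sketched as follows. This is a minimal illustration that assumes a grayscale image stored as a list of rows and a constant fill value; the function name `occlude_upper_half` and the choice of zero as the fill are hypothetical and not taken from the paper.

```python
def occlude_upper_half(image, fill=0):
    """Return a copy of a 2D grayscale image with its top half set to `fill`."""
    height = len(image)
    cutoff = height // 2  # rows [0, cutoff) are treated as the upper half
    return [
        [fill] * len(row) if r < cutoff else list(row)
        for r, row in enumerate(image)
    ]


# Hypothetical 4x2 "face" image; only the lower two rows survive.
face = [[10, 20], [30, 40], [50, 60], [70, 80]]
masked = occlude_upper_half(face)
```

Applying the mask to every training example would leave only lower-face information available to the network, which is the effect the abstract describes.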

Characterizing Human Behaviours Using Statistical Motion Descriptor


Title	Characterizing Human Behaviours Using Statistical Motion Descriptor
Authors	Eissa Jaber Alreshidi, Mohammad Bilal
Abstract	Identifying human behaviors is a challenging research problem due to the complexity and variation of appearances and postures, the variation of camera settings, and view angles. In this paper, we try to address the problem of human behavior identification by introducing a novel motion descriptor based on statistical features. The method first divides the video into N temporal segments. Then for each segment, we compute dense optical flow, which provides instantaneous velocity information for all the pixels. We then compute a Histogram of Optical Flow (HOOF) weighted by the norm and quantized into 32 bins. We then compute statistical features from the obtained HOOF, forming a descriptor vector of 192 dimensions. We then train a non-linear multi-class SVM that classifies different human behaviors with an accuracy of 72.1%. We evaluate our method using a publicly available human action data set. Experimental results show that our proposed method outperforms state-of-the-art methods.
Tasks	Optical Flow Estimation
Published	2019-03-06
URL	http://arxiv.org/abs/1903.02236v1
PDF	http://arxiv.org/pdf/1903.02236v1.pdf
PWC	https://paperswithcode.com/paper/characterizing-human-behaviours-using
Repo
Framework
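
The histogram step in the abstract above (optical flow binned by direction into 32 bins and weighted by the flow norm) can be sketched as below. This is a simplified illustration under stated assumptions: directions are binned uniformly over the full circle, the histogram is L1-normalized, and the flow field is given as a flat list of (dx, dy) vectors. The abstract does not say which statistical features produce the 192-dimensional descriptor, so that step is left out.

```python
import math


def hoof(flow, num_bins=32):
    """Norm-weighted, L1-normalized histogram of optical-flow directions.

    `flow` is an iterable of (dx, dy) vectors, one per pixel.
    """
    hist = [0.0] * num_bins
    for dx, dy in flow:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0.0:
            continue  # a zero vector has no direction
        angle = math.atan2(dy, dx) % (2.0 * math.pi)  # map to [0, 2*pi)
        # Clamp guards against rounding that lands exactly on the upper edge.
        bin_index = min(int(angle / (2.0 * math.pi) * num_bins), num_bins - 1)
        hist[bin_index] += magnitude  # weight each vote by the flow norm
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


# Hypothetical flow vectors: right, up, left, and one static pixel.
descriptor = hoof([(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0), (0.0, 0.0)])
```

Weighting by the norm means fast-moving pixels dominate the histogram, while static pixels contribute nothing.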

Matrix Sketching for Secure Collaborative Machine Learning


Title	Matrix Sketching for Secure Collaborative Machine Learning
Authors	Shusen Wang
Abstract	Collaborative machine learning (ML), also known as federated ML, allows participants to jointly train a model without data sharing. To update the model parameters, the central parameter server broadcasts model parameters to the participants, and the participants send ascending directions such as gradients to the server. While data do not leave a participant’s device, the communicated gradients and parameters will leak a participant’s privacy. Prior work proposed attacks that infer a participant’s private information from gradients and parameters, and showed that simple defenses like dropout and differential privacy do not help much. To defend against privacy leakage, we propose a method called Double Blind Collaborative Learning (DBCL) which is based on random matrix sketching. The high-level idea is to apply a random transformation to the parameters, data, and gradients in every iteration so that the existing attacks will fail or become less effective. While it improves the security of collaborative ML, DBCL does not increase the computation and communication cost much and does not hurt prediction accuracy at all. DBCL can potentially be applied to decentralized collaborative ML to defend against privacy leakage.
Tasks
Published	2019-09-24
URL	https://arxiv.org/abs/1909.11201v2
PDF	https://arxiv.org/pdf/1909.11201v2.pdf
PWC	https://paperswithcode.com/paper/matrix-sketching-for-secure-collaborative
Repo
Framework
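
To make the phrase "random matrix sketching" in the abstract above concrete, here is a minimal sketch of one common sketching operator, a Gaussian random projection applied to a data matrix. It is an illustration of the general technique only: the abstract does not specify which sketch DBCL uses or exactly where it is applied, and all names, sizes, and the seed below are assumptions.

```python
import math
import random


def gaussian_sketch_matrix(d, s, rng):
    """Draw a d x s matrix with i.i.d. N(0, 1/s) entries, so E[S S^T] = I."""
    std = 1.0 / math.sqrt(s)
    return [[rng.gauss(0.0, std) for _ in range(s)] for _ in range(d)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


rng = random.Random(0)  # fixed seed only to make the example repeatable
# Hypothetical local data: 5 samples with 8 features each.
X = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(5)]

# A fresh sketch would be drawn in every iteration; here is a single draw.
S = gaussian_sketch_matrix(8, 4, rng)
X_sketched = matmul(X, S)  # 5 x 4: each sample is compressed to 4 numbers
```

Because a new random matrix is drawn each time, an observer who sees only sketched quantities faces a different random transformation in every iteration, which is the high-level intuition the abstract gives for why existing attacks become less effective.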

Facial Emotion Recognition using Convolutional Neural Networks


Title	Facial Emotion Recognition using Convolutional Neural Networks
Authors	Akash Saravanan, Gurudutt Perichetla, Dr. K. S. Gayathri
Abstract	Facial expression recognition is a topic of great interest in most fields from artificial intelligence and gaming to marketing and healthcare. The goal of this paper is to classify images of human faces into one of seven basic emotions. A number of different models were experimented with, including decision trees and neural networks, before arriving at a final Convolutional Neural Network (CNN) model. CNNs work better for image recognition tasks since they are able to capture spatial features of the inputs due to their large number of filters. The proposed model consists of six convolutional layers, two max pooling layers and two fully connected layers. Upon tuning of the various hyperparameters, this model achieved a final accuracy of 0.60.
Tasks	Emotion Recognition, Facial Expression Recognition
Published	2019-10-12
URL	https://arxiv.org/abs/1910.05602v1
PDF	https://arxiv.org/pdf/1910.05602v1.pdf
PWC	https://paperswithcode.com/paper/facial-emotion-recognition-using
Repo
Framework

MetAdapt: Meta-Learned Task-Adaptive Architecture for Few-Shot Classification


Title	MetAdapt: Meta-Learned Task-Adaptive Architecture for Few-Shot Classification
Authors	Sivan Doveh, Eli Schwartz, Chao Xue, Rogerio Feris, Alex Bronstein, Raja Giryes, Leonid Karlinsky
Abstract	Few-Shot Learning (FSL) is a topic of rapidly growing interest. Typically, in FSL a model is trained on a dataset consisting of many small tasks (meta-tasks) and learns to adapt to novel tasks that it will encounter during test time. This is also referred to as meta-learning. Another topic closely related to meta-learning with a lot of interest in the community is Neural Architecture Search (NAS), automatically finding optimal architecture instead of engineering it manually. In this work, we combine these two aspects of meta-learning. So far, meta-learning FSL methods have focused on optimizing parameters of pre-defined network architectures, in order to make them easily adaptable to novel tasks. Moreover, it was observed that, in general, larger architectures perform better than smaller ones up to a certain saturation point (where they start to degrade due to over-fitting). However, little attention has been given to explicitly optimizing the architectures for FSL, nor to an adaptation of the architecture at test time to particular novel tasks. In this work, we propose to employ tools inspired by the Differentiable Neural Architecture Search (D-NAS) literature in order to optimize the architecture for FSL without over-fitting. Additionally, to make the architecture task adaptive, we propose the concept of ‘MetAdapt Controller’ modules. These modules are added to the model and are meta-trained to predict the optimal network connections for a given novel task. Using the proposed approach we observe state-of-the-art results on two popular few-shot benchmarks: miniImageNet and FC100.
Tasks	Few-Shot Learning, Meta-Learning, Neural Architecture Search
Published	2019-12-01
URL	https://arxiv.org/abs/1912.00412v3
PDF	https://arxiv.org/pdf/1912.00412v3.pdf
PWC	https://paperswithcode.com/paper/metadapt-meta-learned-task-adaptive
Repo
Framework

Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra


Title	Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra
Authors	John T. Halloran, David M. Rocke
Abstract	The most widely used technology to identify the proteins present in a complex biological sample is tandem mass spectrometry, which quickly produces a large collection of spectra representative of the peptides (i.e., protein subsequences) present in the original sample. In this work, we greatly expand the parameter learning capabilities of a dynamic Bayesian network (DBN) peptide-scoring algorithm, Didea, by deriving emission distributions for which its conditional log-likelihood scoring function remains concave. We show that this class of emission distributions, called Convex Virtual Emissions (CVEs), naturally generalizes the log-sum-exp function while rendering both maximum likelihood estimation and conditional maximum likelihood estimation concave for a wide range of Bayesian networks. Utilizing CVEs in Didea allows efficient learning of a large number of parameters while ensuring global convergence, in stark contrast to Didea’s previous parameter learning framework (which could only learn a single parameter using a costly grid search) and other trainable models (which only ensure convergence to local optima). The newly trained scoring function substantially outperforms the state-of-the-art in both scoring function accuracy and downstream Fisher kernel analysis. Furthermore, we significantly improve Didea’s runtime performance through successive optimizations to its message passing schedule and derive explicit connections between Didea’s new concave score and related MS/MS scoring functions.
Tasks
Published	2019-09-04
URL	https://arxiv.org/abs/1909.02136v1
PDF	https://arxiv.org/pdf/1909.02136v1.pdf
PWC	https://paperswithcode.com/paper/learning-concave-conditional-likelihood-1
Repo
Framework

Object Detection in 20 Years: A Survey


Title	Object Detection in 20 Years: A Survey
Authors	Zhengxia Zou, Zhenwei Shi, Yuhong Guo, Jieping Ye
Abstract	Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today’s object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold weapon era. This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century’s time (from the 1990s to 2019). A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, text detection, etc., and makes an in-depth analysis of their challenges as well as technical improvements in recent years.
Tasks	Face Detection, Object Detection, Pedestrian Detection
Published	2019-05-13
URL	https://arxiv.org/abs/1905.05055v2
PDF	https://arxiv.org/pdf/1905.05055v2.pdf
PWC	https://paperswithcode.com/paper/object-detection-in-20-years-a-survey
Repo
Framework

Many-to-Many Voice Conversion with Out-of-Dataset Speaker Support


Title	Many-to-Many Voice Conversion with Out-of-Dataset Speaker Support
Authors	Gokce Keskin, Tyler Lee, Cory Stephenson, Oguz H. Elibol
Abstract	We present a Cycle-GAN based many-to-many voice conversion method that can convert between speakers that are not in the training set. This property is enabled through speaker embeddings generated by a neural network that is jointly trained with the Cycle-GAN. In contrast to prior work in this domain, our method enables conversion between an out-of-dataset speaker and a target speaker in either direction and does not require re-training. Out-of-dataset speaker conversion quality is evaluated using an independently trained speaker identification model, and shows good style conversion characteristics for previously unheard speakers. Subjective tests on human listeners show style conversion quality for in-dataset speakers is comparable to the state-of-the-art baseline model.
Tasks	Speaker Identification, Voice Conversion
Published	2019-04-30
URL	http://arxiv.org/abs/1905.02525v1
PDF	http://arxiv.org/pdf/1905.02525v1.pdf
PWC	https://paperswithcode.com/paper/190502525
Repo
Framework

Contrastive Fairness in Machine Learning


Title	Contrastive Fairness in Machine Learning
Authors	Tapabrata Chakraborti, Arijit Patra, Alison Noble
Abstract	Was it fair that Harry was hired but not Barry? Was it fair that Pam was fired instead of Sam? How can one ensure fairness when an intelligent algorithm takes these decisions instead of a human? How can one ensure that the decisions were taken based on merit and not on protected attributes like race or sex? These are the questions that must be answered now that many decisions in real life can be made through machine learning. However, research in fairness of algorithms has focused on the counterfactual questions “what if?” or “why?”, whereas in real life most subjective questions of consequence are contrastive: “why this but not that?”. We introduce concepts and mathematical tools using causal inference to address contrastive fairness in algorithmic decision-making with illustrative examples.
Tasks	Causal Inference, Decision Making
Published	2019-05-17
URL	https://arxiv.org/abs/1905.07360v4
PDF	https://arxiv.org/pdf/1905.07360v4.pdf
PWC	https://paperswithcode.com/paper/contrastive-fairness-in-machine-learning
Repo
Framework

Landslide Geohazard Assessment With Convolutional Neural Networks Using Sentinel-2 Imagery Data


Title	Landslide Geohazard Assessment With Convolutional Neural Networks Using Sentinel-2 Imagery Data
Authors	Silvia L. Ullo, Maximillian S. Langenkamp, Tuomas P. Oikarinen, Maria P. Del Rosso, Alessandro Sebastianelli, Federica Piccirillo, Stefania Sica
Abstract	In this paper, the authors aim to combine the latest state-of-the-art models in image recognition with the best publicly available satellite images to create a system for landslide risk mitigation. We focus first on landslide detection and further propose a similar system to be used for prediction. Such models are valuable as they could easily be scaled up to provide data for hazard evaluation, as satellite imagery becomes increasingly available. The goal is to use satellite images and correlated data to enrich the public repository of data and guide disaster relief efforts for locating precise areas where landslides have occurred. Different image augmentation methods are used to increase diversity in the chosen dataset and create more robust classification. The resulting outputs are then fed into variants of 3-D convolutional neural networks. A review of the current literature indicates there is no research using CNNs (Convolutional Neural Networks) and freely available satellite imagery for classifying landslide risk. The model has been shown to ultimately achieve significantly better-than-baseline accuracy.
Tasks	Image Augmentation
Published	2019-06-10
URL	https://arxiv.org/abs/1906.06151v1
PDF	https://arxiv.org/pdf/1906.06151v1.pdf
PWC	https://paperswithcode.com/paper/landslide-geohazard-assessment-with
Repo
Framework

A Hardware-Efficient ADMM-Based SVM Training Algorithm for Edge Computing


Title	A Hardware-Efficient ADMM-Based SVM Training Algorithm for Edge Computing
Authors	Shuo-An Huang, Chia-Hsiang Yang
Abstract	This work demonstrates a hardware-efficient support vector machine (SVM) training algorithm via the alternating direction method of multipliers (ADMM) optimizer. Low-rank approximation is exploited to reduce the dimension of the kernel matrix by employing the Nyström method. Verified on four datasets, the proposed ADMM-based training algorithm with rank approximation reduces the matrix dimension by 32× with only a 2% drop in inference accuracy. Compared to the conventional sequential minimal optimization (SMO) algorithm, the ADMM-based training algorithm is able to achieve a 9.8×10^7 shorter latency for training 2048 samples. Hardware design techniques, including pre-computation and memory sharing, are proposed to reduce the computational complexity by 62% and the memory usage by 60%. As a proof of concept, an epileptic seizure detector chip is designed to demonstrate the effectiveness of the proposed hardware-efficient training algorithm. The chip achieves a 153,310× higher energy efficiency and a 364× higher throughput-to-area ratio for SVM training than a high-end CPU. This work provides a promising solution for edge devices which require low-power and real-time training.
Tasks
Published	2019-07-23
URL	https://arxiv.org/abs/1907.09916v1
PDF	https://arxiv.org/pdf/1907.09916v1.pdf
PWC	https://paperswithcode.com/paper/a-hardware-efficient-admm-based-svm-training
Repo
Framework

A High-Efficiency Framework for Constructing Large-Scale Face Parsing Benchmark


Title	A High-Efficiency Framework for Constructing Large-Scale Face Parsing Benchmark
Authors	Yinglu Liu, Hailin Shi, Yue Si, Hao Shen, Xiaobo Wang, Tao Mei
Abstract	Face parsing, which is to assign a semantic label to each pixel in face images, has recently attracted increasing interest due to its huge application potential. Although many face-related fields (e.g., face recognition and face detection) have been well studied for many years, the existing datasets for face parsing are still severely limited in terms of scale and quality, e.g., the widely used Helen dataset only contains 2,330 images. This is mainly because pixel-level annotation is costly and time-consuming work, especially for the facial parts without clear boundaries. The lack of accurately annotated datasets becomes a major obstacle in the progress of the face parsing task. It is feasible to utilize dense facial landmarks to guide the parsing annotation. However, annotating dense landmarks on human faces encounters the same issues as the parsing annotation. To overcome the above problems, in this paper, we develop a high-efficiency framework for face parsing annotation, which considerably simplifies and speeds up the parsing annotation by two consecutive modules. Benefiting from the proposed framework, we construct a new Dense Landmark Guided Face Parsing (LaPa) benchmark. It consists of 22,000 face images with large variations in expression, pose, occlusion, etc. Each image is provided with accurate annotation of an 11-category pixel-level label map along with coordinates of 106-point landmarks. To the best of our knowledge, it is currently the largest public dataset for face parsing. To make full use of our LaPa dataset with abundant face shape and boundary priors, we propose a simple yet effective Boundary-Sensitive Parsing Network (BSPNet). Our network is taken as a baseline model on the proposed LaPa dataset, and meanwhile, it achieves state-of-the-art performance on the Helen dataset without resorting to extra face alignment.
Tasks	Face Alignment, Face Detection, Face Recognition
Published	2019-05-13
URL	https://arxiv.org/abs/1905.04830v1
PDF	https://arxiv.org/pdf/1905.04830v1.pdf
PWC	https://paperswithcode.com/paper/a-high-efficiency-framework-for-constructing
Repo
Framework