Paper Group ANR 1478
Improving Map Re-localization with Deep ‘Movable’ Objects Segmentation on 3D LiDAR Point Clouds
Title | Improving Map Re-localization with Deep ‘Movable’ Objects Segmentation on 3D LiDAR Point Clouds |
Authors | Victor Vaquero, Kai Fischer, Francesc Moreno-Noguer, Alberto Sanfeliu, Stefan Milz |
Abstract | Localization and Mapping is an essential component of autonomous vehicle navigation, and requires an accuracy exceeding that of commercial GPS-based systems. Current odometry and mapping algorithms are able to provide this accurate information. However, the lack of robustness of these algorithms against dynamic obstacles and environmental changes, even for short time periods, forces the generation of new maps on every session without taking advantage of previously obtained ones. In this paper we propose the use of a deep learning architecture to segment movable objects from 3D LiDAR point clouds in order to obtain longer-lasting 3D maps. This in turn allows for better, faster and more accurate re-localization and trajectory estimation on subsequent days. We show the effectiveness of our approach in a very dynamic and cluttered scenario, a supermarket parking lot. For that, we record several sequences on different days and compare localization errors with and without our movable objects segmentation method. Results show that we are able to accurately re-localize over a filtered map, consistently reducing trajectory errors by an average of 35.1% with respect to a non-filtered map version and of 47.9% with respect to a standalone map created on the current session. |
Tasks | Autonomous Vehicles |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03336v1 |
https://arxiv.org/pdf/1910.03336v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-map-re-localization-with-deep |
Repo | |
Framework | |
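The core of the pipeline above is straightforward to illustrate: given per-point class predictions from a segmentation network, drop the points labeled as movable before feeding scans to the mapping back-end. A minimal sketch follows; the label convention and the `(x, y, z, intensity)` scan layout are assumptions, not the paper's actual interface.

```python
import numpy as np

# Assumed label convention from a movable-object segmentation network:
# 0 = static (road, buildings), 1 = movable (vehicles, pedestrians, ...).
MOVABLE = 1

def filter_movable(points: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Keep only the static points of an (N, 4) LiDAR scan (x, y, z, intensity)."""
    assert points.shape[0] == labels.shape[0]
    return points[labels != MOVABLE]

# Filtered scans can then be accumulated into a longer-lasting map
# and re-used for localization on subsequent days.
scan = np.random.rand(1000, 4)               # placeholder scan
labels = np.random.randint(0, 2, size=1000)  # placeholder network output
static_points = filter_movable(scan, labels)
```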
Dual Learning-based Video Coding with Inception Dense Blocks
Title | Dual Learning-based Video Coding with Inception Dense Blocks |
Authors | Chao Liu, Heming Sun, Junan Chen, Zhengxue Cheng, Masaru Takeuchi, Jiro Katto, Xiaoyang Zeng, Yibo Fan |
Abstract | In this paper, a dual learning-based method for intra coding is introduced for the PCS Grand Challenge. This method is mainly composed of two parts: intra prediction and reconstruction filtering. They use different network structures: the neural network-based intra prediction uses a fully-connected network to predict the block, while the neural network-based reconstruction filtering utilizes convolutional networks. Different from previous filtering works, we use a network with more powerful feature extraction capabilities in our reconstruction filtering network, and the filtering unit is block-level so as to achieve more accurate filtering compensation. To the best of our knowledge, among all learning-based methods, this is the first attempt to combine two different networks in one application, and we achieve state-of-the-art performance for the AI configuration on the HEVC test sequences. The experimental results show that our method leads to significant BD-rate savings for the provided 8 sequences compared to the HM-16.20 baseline (average 10.24% and 3.57% bitrate reductions for all-intra and random-access coding, respectively). For the HEVC test sequences, our model also achieves a 9.70% BD-rate saving compared to the HM-16.20 baseline for the all-intra configuration. |
Tasks | |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.09857v1 |
https://arxiv.org/pdf/1911.09857v1.pdf | |
PWC | https://paperswithcode.com/paper/dual-learning-based-video-coding-with |
Repo | |
Framework | |
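To make the two-network design concrete, here is a minimal PyTorch sketch of the pair the abstract describes: a fully-connected intra predictor and a block-level convolutional reconstruction filter. Block size, context size, and layer widths are all illustrative guesses, not the challenge submission's actual architecture.

```python
import torch
import torch.nn as nn

BLOCK = 8                 # assumed block size
CTX = 3 * BLOCK * BLOCK   # assumed number of neighbouring reference pixels

class FCIntraPredictor(nn.Module):
    """Fully-connected intra prediction: reference pixels -> predicted block."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(CTX, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, BLOCK * BLOCK),
        )

    def forward(self, ctx):  # ctx: (N, CTX)
        return self.net(ctx).view(-1, 1, BLOCK, BLOCK)

class ConvReconstructionFilter(nn.Module):
    """Block-level convolutional filter predicting a residual correction."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, block):  # block: (N, 1, BLOCK, BLOCK)
        return block + self.body(block)  # filtering compensation
```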
DeepcomplexMRI: Exploiting deep residual network for fast parallel MR imaging with complex convolution
Title | DeepcomplexMRI: Exploiting deep residual network for fast parallel MR imaging with complex convolution |
Authors | Shanshan Wang, Huitao Cheng, Leslie Ying, Taohui Xiao, Ziwen Ke, Xin Liu, Hairong Zheng, Dong Liang |
Abstract | This paper proposes a multi-channel image reconstruction method, named DeepcomplexMRI, to accelerate parallel MR imaging with a residual complex convolutional neural network. Different from most existing works, which rely on coil sensitivities or prior information of predefined transforms, DeepcomplexMRI takes advantage of the availability of a large number of existing multi-channel ground-truth images and uses them as labeled data to train the deep residual convolutional neural network offline. In particular, a complex convolutional network is proposed to take into account the correlation between the real and imaginary parts of MR images. In addition, k-space data consistency is enforced repeatedly between layers of the network. Evaluations on in vivo datasets show that the proposed method is able to recover the desired multi-channel images, and comparison with state-of-the-art methods demonstrates that it reconstructs the desired MR images more accurately. |
Tasks | Image Reconstruction |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04359v2 |
https://arxiv.org/pdf/1906.04359v2.pdf | |
PWC | https://paperswithcode.com/paper/deepcomplexmri-exploiting-deep-residual |
Repo | |
Framework | |
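The complex convolution at the heart of DeepcomplexMRI can be emulated with two real-valued convolutions per layer, following the algebra of complex multiplication. A minimal PyTorch sketch (channel counts and kernel size are illustrative; the k-space data-consistency layers are omitted):

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution via two real convolutions:
    (x_r + i x_i) * (w_r + i w_i) = (x_r*w_r - x_i*w_i) + i(x_r*w_i + x_i*w_r)
    """
    def __init__(self, in_ch, out_ch, k=3, padding=1):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, k, padding=padding)  # real weights
        self.conv_i = nn.Conv2d(in_ch, out_ch, k, padding=padding)  # imaginary weights

    def forward(self, x_r, x_i):
        real = self.conv_r(x_r) - self.conv_i(x_i)
        imag = self.conv_i(x_r) + self.conv_r(x_i)
        return real, imag

# Usage: feed the real/imaginary channels of multi-coil MR images.
x_r = torch.randn(1, 8, 64, 64)  # 8 coil channels, real part
x_i = torch.randn(1, 8, 64, 64)  # imaginary part
y_r, y_i = ComplexConv2d(8, 16)(x_r, x_i)
```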
Recognizing Facial Expressions of Occluded Faces using Convolutional Neural Networks
Title | Recognizing Facial Expressions of Occluded Faces using Convolutional Neural Networks |
Authors | Mariana-Iuliana Georgescu, Radu Tudor Ionescu |
Abstract | In this paper, we present an approach based on convolutional neural networks (CNNs) for facial expression recognition in a difficult setting with severe occlusions. More specifically, our task is to recognize the facial expression of a person wearing a virtual reality (VR) headset which essentially occludes the upper part of the face. In order to accurately train neural networks for this setting, in which faces are severely occluded, we modify the training examples by intentionally occluding the upper half of the face. This forces the neural networks to focus on the lower part of the face and to obtain better accuracy rates than models trained on the entire faces. Our empirical results on two benchmark data sets, FER+ and AffectNet, show that our CNN models’ predictions on lower-half faces are up to 13% higher than the baseline CNN models trained on entire faces, proving their suitability for the VR setting. Furthermore, our models’ predictions on lower-half faces are no more than 10% under the baseline models’ predictions on full faces, proving that there are enough clues in the lower part of the face to accurately predict facial expressions. |
Tasks | Facial Expression Recognition |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.04852v1 |
https://arxiv.org/pdf/1911.04852v1.pdf | |
PWC | https://paperswithcode.com/paper/recognizing-facial-expressions-of-occluded |
Repo | |
Framework | |
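The training-time occlusion described above amounts to blanking the upper half of each face image so the network learns to rely on the mouth and chin region. A minimal sketch (the fill value and exact occluded fraction are assumptions):

```python
import numpy as np

def occlude_upper_half(image: np.ndarray, fill: int = 0) -> np.ndarray:
    """Blank the upper half of a (H, W) or (H, W, C) face image,
    mimicking the occlusion produced by a VR headset."""
    out = image.copy()
    out[: image.shape[0] // 2] = fill
    return out

face = np.random.randint(0, 256, size=(48, 48), dtype=np.uint8)
occluded = occlude_upper_half(face)  # use as a training example
```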
Characterizing Human Behaviours Using Statistical Motion Descriptor
Title | Characterizing Human Behaviours Using Statistical Motion Descriptor |
Authors | Eissa Jaber Alreshidi, Mohammad Bilal |
Abstract | Identifying human behaviors is a challenging research problem due to the complexity and variation of appearances and postures, camera settings, and view angles. In this paper, we address the problem of human behavior identification by introducing a novel motion descriptor based on statistical features. The method first divides the video into N temporal segments. For each segment, we compute dense optical flow, which provides instantaneous velocity information for all pixels. We then compute a Histogram of Oriented Optical Flow (HOOF), weighted by the flow magnitude and quantized into 32 bins, and derive statistical features from the obtained HOOF, forming a 192-dimensional descriptor vector. Finally, we train a non-linear multi-class SVM that classifies different human behaviors with an accuracy of 72.1%. We evaluate our method on a publicly available human action dataset. Experimental results show that our proposed method outperforms state-of-the-art methods. |
Tasks | Optical Flow Estimation |
Published | 2019-03-06 |
URL | http://arxiv.org/abs/1903.02236v1 |
http://arxiv.org/pdf/1903.02236v1.pdf | |
PWC | https://paperswithcode.com/paper/characterizing-human-behaviours-using |
Repo | |
Framework | |
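The descriptor pipeline lends itself to a short sketch: dense optical flow per segment, a magnitude-weighted 32-bin orientation histogram, and statistics stacked into the final vector. Below is one plausible reading using OpenCV's Farnebäck flow; how exactly the statistics combine into 192 dimensions (e.g., six statistics per segment or per bin group) is not specified in the abstract, so treat the composition as an assumption.

```python
import cv2
import numpy as np

def hoof(prev_gray: np.ndarray, next_gray: np.ndarray, bins: int = 32) -> np.ndarray:
    """Magnitude-weighted Histogram of Oriented Optical Flow (HOOF)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])  # ang in radians
    hist, _ = np.histogram(ang.ravel(), bins=bins,
                           range=(0.0, 2 * np.pi), weights=mag.ravel())
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Per-segment histograms (or statistics of them) would then be concatenated and fed to a non-linear multi-class SVM, e.g. scikit-learn's `SVC` with an RBF kernel.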
Matrix Sketching for Secure Collaborative Machine Learning
Title | Matrix Sketching for Secure Collaborative Machine Learning |
Authors | Shusen Wang |
Abstract | Collaborative machine learning (ML), also known as federated ML, allows participants to jointly train a model without data sharing. To update the model parameters, the central parameter server broadcasts model parameters to the participants, and the participants send ascent directions such as gradients to the server. While data do not leave a participant's device, the communicated gradients and parameters can leak a participant's private information. Prior work proposed attacks that infer participants' private data from gradients and parameters, and showed that simple defenses like dropout and differential privacy do not help much. To defend against privacy leakage, we propose a method called Double Blind Collaborative Learning (DBCL), which is based on random matrix sketching. The high-level idea is to apply a random transformation to the parameters, data, and gradients in every iteration so that existing attacks fail or become less effective. While it improves the security of collaborative ML, DBCL does not increase the computation and communication cost much and does not hurt prediction accuracy at all. DBCL can potentially be applied to decentralized collaborative ML to defend against privacy leakage. |
Tasks | |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.11201v2 |
https://arxiv.org/pdf/1909.11201v2.pdf | |
PWC | https://paperswithcode.com/paper/matrix-sketching-for-secure-collaborative |
Repo | |
Framework | |
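The "random transformation in every iteration" idea can be illustrated with the basic sketching primitive: multiplying a gradient matrix by a random matrix whose second moment is the identity, so aggregate statistics survive in expectation while individual entries are masked. This toy sketch shows the primitive only, not DBCL's full protocol:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_sketch(G: np.ndarray, s: int):
    """Sketch a (d, n) gradient matrix down to (d, s) with a random S."""
    n = G.shape[1]
    S = rng.normal(size=(n, s)) / np.sqrt(s)  # E[S @ S.T] = I_n
    return G @ S, S

# A participant would communicate G @ S instead of G; since
# E[(G S)(G S)^T] = G G^T, the second-order information needed for
# training survives in expectation while individual columns of G
# are hidden from the server.
G = rng.normal(size=(10, 100))
G_sketched, S = gaussian_sketch(G, s=20)
```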
Facial Emotion Recognition using Convolutional Neural Networks
Title | Facial Emotion Recognition using Convolutional Neural Networks |
Authors | Akash Saravanan, Gurudutt Perichetla, Dr. K. S. Gayathri |
Abstract | Facial expression recognition is a topic of great interest in most fields, from artificial intelligence and gaming to marketing and healthcare. The goal of this paper is to classify images of human faces into one of seven basic emotions. A number of different models were experimented with, including decision trees and neural networks, before arriving at a final Convolutional Neural Network (CNN) model. CNNs work better for image recognition tasks since they are able to capture spatial features of the inputs due to their large number of filters. The proposed model consists of six convolutional layers, two max pooling layers and two fully connected layers. Upon tuning of the various hyperparameters, this model achieved a final accuracy of 0.60. |
Tasks | Emotion Recognition, Facial Expression Recognition |
Published | 2019-10-12 |
URL | https://arxiv.org/abs/1910.05602v1 |
https://arxiv.org/pdf/1910.05602v1.pdf | |
PWC | https://paperswithcode.com/paper/facial-emotion-recognition-using |
Repo | |
Framework | |
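A PyTorch sketch of the stated layout (six convolutional layers, two max-pooling layers, two fully connected layers, seven emotion classes) is below; the input resolution and layer widths are guesses, since the abstract does not give them:

```python
import torch.nn as nn

# Assumes 48x48 grayscale inputs (FER-2013 style); widths are illustrative.
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                    # 48 -> 24
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                    # 24 -> 12
    nn.Flatten(),
    nn.Linear(128 * 12 * 12, 256), nn.ReLU(),
    nn.Linear(256, 7),                  # seven basic emotions
)
```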
MetAdapt: Meta-Learned Task-Adaptive Architecture for Few-Shot Classification
Title | MetAdapt: Meta-Learned Task-Adaptive Architecture for Few-Shot Classification |
Authors | Sivan Doveh, Eli Schwartz, Chao Xue, Rogerio Feris, Alex Bronstein, Raja Giryes, Leonid Karlinsky |
Abstract | Few-Shot Learning (FSL) is a topic of rapidly growing interest. Typically, in FSL a model is trained on a dataset consisting of many small tasks (meta-tasks) and learns to adapt to novel tasks that it will encounter during test time. This is also referred to as meta-learning. Another topic closely related to meta-learning, with a lot of interest in the community, is Neural Architecture Search (NAS): automatically finding an optimal architecture instead of engineering it manually. In this work, we combine these two aspects of meta-learning. So far, meta-learning FSL methods have focused on optimizing parameters of pre-defined network architectures in order to make them easily adaptable to novel tasks. Moreover, it was observed that, in general, larger architectures perform better than smaller ones up to a certain saturation point (where they start to degrade due to over-fitting). However, little attention has been given to explicitly optimizing architectures for FSL, or to adapting the architecture at test time to particular novel tasks. In this work, we propose to employ tools inspired by the Differentiable Neural Architecture Search (D-NAS) literature in order to optimize the architecture for FSL without over-fitting. Additionally, to make the architecture task-adaptive, we propose the concept of 'MetAdapt Controller' modules. These modules are added to the model and are meta-trained to predict the optimal network connections for a given novel task. Using the proposed approach we observe state-of-the-art results on two popular few-shot benchmarks: miniImageNet and FC100. |
Tasks | Few-Shot Learning, Meta-Learning, Neural Architecture Search |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00412v3 |
https://arxiv.org/pdf/1912.00412v3.pdf | |
PWC | https://paperswithcode.com/paper/metadapt-meta-learned-task-adaptive |
Repo | |
Framework | |
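D-NAS methods relax the discrete choice of operation on each connection into a softmax-weighted mixture. The sketch below shows that standard relaxation; in MetAdapt the controller modules would predict or adjust these weights per novel task, whereas here `alpha` is simply a learnable parameter:

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """Softmax-weighted mixture over candidate operations (DARTS-style)."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))
```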
Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra
Title | Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra |
Authors | John T. Halloran, David M. Rocke |
Abstract | The most widely used technology to identify the proteins present in a complex biological sample is tandem mass spectrometry, which quickly produces a large collection of spectra representative of the peptides (i.e., protein subsequences) present in the original sample. In this work, we greatly expand the parameter learning capabilities of a dynamic Bayesian network (DBN) peptide-scoring algorithm, Didea, by deriving emission distributions for which its conditional log-likelihood scoring function remains concave. We show that this class of emission distributions, called Convex Virtual Emissions (CVEs), naturally generalizes the log-sum-exp function while rendering both maximum likelihood estimation and conditional maximum likelihood estimation concave for a wide range of Bayesian networks. Utilizing CVEs in Didea allows efficient learning of a large number of parameters while ensuring global convergence, in stark contrast to Didea’s previous parameter learning framework (which could only learn a single parameter using a costly grid search) and other trainable models (which only ensure convergence to local optima). The newly trained scoring function substantially outperforms the state-of-the-art in both scoring function accuracy and downstream Fisher kernel analysis. Furthermore, we significantly improve Didea’s runtime performance through successive optimizations to its message passing schedule and derive explicit connections between Didea’s new concave score and related MS/MS scoring functions. |
Tasks | |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.02136v1 |
https://arxiv.org/pdf/1909.02136v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-concave-conditional-likelihood-1 |
Repo | |
Framework | |
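The abstract states that Convex Virtual Emissions generalize the log-sum-exp function; for reference, here is that base function in its numerically stable form (the CVE family itself is defined in the paper and not reproduced here):

```python
import numpy as np

def logsumexp(x: np.ndarray) -> float:
    """Numerically stable log(sum(exp(x))): shift by the max so the
    exponentials cannot overflow. log-sum-exp is the convex function
    that the paper's Convex Virtual Emissions generalize."""
    m = float(np.max(x))
    return m + float(np.log(np.sum(np.exp(x - m))))
```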
Object Detection in 20 Years: A Survey
Title | Object Detection in 20 Years: A Survey |
Authors | Zhengxia Zou, Zhenwei Shi, Yuhong Guo, Jieping Ye |
Abstract | Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold weapon era. This paper extensively reviews 400+ papers on object detection in the light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019). A number of topics are covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of detection systems, speed-up techniques, and the recent state-of-the-art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and makes an in-depth analysis of their challenges as well as technical improvements in recent years. |
Tasks | Face Detection, Object Detection, Pedestrian Detection |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05055v2 |
https://arxiv.org/pdf/1905.05055v2.pdf | |
PWC | https://paperswithcode.com/paper/object-detection-in-20-years-a-survey |
Repo | |
Framework | |
Many-to-Many Voice Conversion with Out-of-Dataset Speaker Support
Title | Many-to-Many Voice Conversion with Out-of-Dataset Speaker Support |
Authors | Gokce Keskin, Tyler Lee, Cory Stephenson, Oguz H. Elibol |
Abstract | We present a Cycle-GAN based many-to-many voice conversion method that can convert between speakers that are not in the training set. This property is enabled through speaker embeddings generated by a neural network that is jointly trained with the Cycle-GAN. In contrast to prior work in this domain, our method enables conversion between an out-of-dataset speaker and a target speaker in either direction and does not require re-training. Out-of-dataset speaker conversion quality is evaluated using an independently trained speaker identification model, and shows good style conversion characteristics for previously unheard speakers. Subjective tests on human listeners show style conversion quality for in-dataset speakers is comparable to the state-of-the-art baseline model. |
Tasks | Speaker Identification, Voice Conversion |
Published | 2019-04-30 |
URL | http://arxiv.org/abs/1905.02525v1 |
http://arxiv.org/pdf/1905.02525v1.pdf | |
PWC | https://paperswithcode.com/paper/190502525 |
Repo | |
Framework | |
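The out-of-dataset capability comes from conditioning the Cycle-GAN generator on learned speaker embeddings rather than on a fixed speaker index. A heavily simplified sketch of the cycle-consistency term under an assumed generator interface `G(features, target_embedding)`:

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(G, x_a, emb_a, emb_b):
    """Convert speaker a -> b -> a and penalize the reconstruction error.
    G and the embedding tensors follow an assumed interface; the real
    model adds adversarial and other losses on top of this term."""
    x_ab = G(x_a, emb_b)    # a's speech rendered in b's voice
    x_aba = G(x_ab, emb_a)  # converted back to speaker a
    return F.l1_loss(x_aba, x_a)
```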
Contrastive Fairness in Machine Learning
Title | Contrastive Fairness in Machine Learning |
Authors | Tapabrata Chakraborti, Arijit Patra, Alison Noble |
Abstract | Was it fair that Harry was hired but not Barry? Was it fair that Pam was fired instead of Sam? How can one ensure fairness when an intelligent algorithm takes these decisions instead of a human? How can one ensure that the decisions were taken based on merit and not on protected attributes like race or sex? These are the questions that must be answered now that many decisions in real life can be made through machine learning. However, research on fairness of algorithms has focused on the counterfactual questions "what if?" or "why?", whereas in real life most subjective questions of consequence are contrastive: "why this but not that?". We introduce concepts and mathematical tools using causal inference to address contrastive fairness in algorithmic decision-making, with illustrative examples. |
Tasks | Causal Inference, Decision Making |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07360v4 |
https://arxiv.org/pdf/1905.07360v4.pdf | |
PWC | https://paperswithcode.com/paper/contrastive-fairness-in-machine-learning |
Repo | |
Framework | |
Landslide Geohazard Assessment With Convolutional Neural Networks Using Sentinel-2 Imagery Data
Title | Landslide Geohazard Assessment With Convolutional Neural Networks Using Sentinel-2 Imagery Data |
Authors | Silvia L. Ullo, Maximillian S. Langenkamp, Tuomas P. Oikarinen, Maria P. Del Rosso, Alessandro Sebastianelli, Federica Piccirillo, Stefania Sica |
Abstract | In this paper, the authors aim to combine the latest state-of-the-art models in image recognition with the best publicly available satellite images to create a system for landslide risk mitigation. We focus first on landslide detection and further propose a similar system to be used for prediction. Such models are valuable as they could easily be scaled up to provide data for hazard evaluation, as satellite imagery becomes increasingly available. The goal is to use satellite images and correlated data to enrich the public repository of data and guide disaster relief efforts by locating the precise areas where landslides have occurred. Different image augmentation methods are used to increase diversity in the chosen dataset and create a more robust classifier. The resulting outputs are then fed into variants of 3-D convolutional neural networks. A review of the current literature indicates there is no research using CNNs (Convolutional Neural Networks) and freely available satellite imagery for classifying landslide risk. The model ultimately achieves significantly better-than-baseline accuracy. |
Tasks | Image Augmentation |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.06151v1 |
https://arxiv.org/pdf/1906.06151v1.pdf | |
PWC | https://paperswithcode.com/paper/landslide-geohazard-assessment-with |
Repo | |
Framework | |
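The augmentation step is the most mechanical part of the pipeline and easy to sketch: simple geometric transforms applied to multi-band Sentinel-2 patches before they are fed to the 3-D CNN. The particular transform set here is an assumption; the abstract does not enumerate the methods used.

```python
import numpy as np

def augment(patch: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random flips and 90-degree rotations of a (H, W, bands) patch."""
    if rng.random() < 0.5:
        patch = np.flip(patch, axis=0)   # vertical flip
    if rng.random() < 0.5:
        patch = np.flip(patch, axis=1)   # horizontal flip
    patch = np.rot90(patch, k=int(rng.integers(4)), axes=(0, 1))
    return patch.copy()

rng = np.random.default_rng(42)
patch = rng.random((64, 64, 13))         # 13 Sentinel-2 bands
augmented = augment(patch, rng)
```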
A Hardware-Efficient ADMM-Based SVM Training Algorithm for Edge Computing
Title | A Hardware-Efficient ADMM-Based SVM Training Algorithm for Edge Computing |
Authors | Shuo-An Huang, Chia-Hsiang Yang |
Abstract | This work demonstrates a hardware-efficient support vector machine (SVM) training algorithm via the alternating direction method of multipliers (ADMM) optimizer. Low-rank approximation is exploited to reduce the dimension of the kernel matrix by employing the Nyström method. Verified on four datasets, the proposed ADMM-based training algorithm with rank approximation reduces the matrix dimension by 32$\times$ with only a 2% drop in inference accuracy. Compared to the conventional sequential minimal optimization (SMO) algorithm, the ADMM-based training algorithm achieves a 9.8$\times$10$^7$ times shorter latency for training 2048 samples. Hardware design techniques, including pre-computation and memory sharing, are proposed to reduce the computational complexity by 62% and the memory usage by 60%. As a proof of concept, an epileptic seizure detector chip is designed to demonstrate the effectiveness of the proposed hardware-efficient training algorithm. The chip achieves a 153,310$\times$ higher energy efficiency and a 364$\times$ higher throughput-to-area ratio for SVM training than a high-end CPU. This work provides a promising solution for edge devices which require low-power and real-time training. |
Tasks | |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.09916v1 |
https://arxiv.org/pdf/1907.09916v1.pdf | |
PWC | https://paperswithcode.com/paper/a-hardware-efficient-admm-based-svm-training |
Repo | |
Framework | |
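The Nyström step that makes the ADMM solver hardware-friendly replaces the full n×n kernel matrix with a rank-m factorization built from m sampled columns. A NumPy sketch (kernel choice and sampling scheme are illustrative):

```python
import numpy as np

def rbf_kernel(A: np.ndarray, B: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(X: np.ndarray, m: int, rng: np.random.Generator):
    """Rank-m Nystrom approximation: K ~= C @ pinv(W) @ C.T,
    where C = K[:, idx] and W = K[idx][:, idx] for m sampled columns."""
    idx = rng.choice(X.shape[0], size=m, replace=False)
    C = rbf_kernel(X, X[idx])        # (n, m) block of the kernel matrix
    W_pinv = np.linalg.pinv(C[idx])  # (m, m) pseudo-inverse
    return C, W_pinv

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
C, W_pinv = nystrom(X, m=20, rng=rng)
K_approx = C @ W_pinv @ C.T          # low-rank surrogate used by the solver
```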
A High-Efficiency Framework for Constructing Large-Scale Face Parsing Benchmark
Title | A High-Efficiency Framework for Constructing Large-Scale Face Parsing Benchmark |
Authors | Yinglu Liu, Hailin Shi, Yue Si, Hao Shen, Xiaobo Wang, Tao Mei |
Abstract | Face parsing, which is to assign a semantic label to each pixel in face images, has recently attracted increasing interest due to its huge application potential. Although many face-related fields (e.g., face recognition and face detection) have been well studied for many years, the existing datasets for face parsing are still severely limited in terms of scale and quality; e.g., the widely used Helen dataset only contains 2,330 images. This is mainly because pixel-level annotation is high-cost and time-consuming work, especially for facial parts without clear boundaries. The lack of accurately annotated datasets has become a major obstacle to progress on the face parsing task. A feasible way forward is to utilize dense facial landmarks to guide the parsing annotation; however, annotating dense landmarks on human faces encounters the same issues as the parsing annotation. To overcome these problems, in this paper we develop a high-efficiency framework for face parsing annotation, which considerably simplifies and speeds up the parsing annotation via two consecutive modules. Benefiting from the proposed framework, we construct a new Dense Landmark Guided Face Parsing (LaPa) benchmark. It consists of 22,000 face images with large variations in expression, pose, occlusion, etc. Each image is provided with an accurate annotation of an 11-category pixel-level label map along with the coordinates of 106-point landmarks. To the best of our knowledge, it is currently the largest public dataset for face parsing. To make full use of our LaPa dataset, with its abundant face shape and boundary priors, we propose a simple yet effective Boundary-Sensitive Parsing Network (BSPNet). Our network is taken as a baseline model on the proposed LaPa dataset and, meanwhile, achieves state-of-the-art performance on the Helen dataset without resorting to extra face alignment. |
Tasks | Face Alignment, Face Detection, Face Recognition |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.04830v1 |
https://arxiv.org/pdf/1905.04830v1.pdf | |
PWC | https://paperswithcode.com/paper/a-high-efficiency-framework-for-constructing |
Repo | |
Framework | |