October 20, 2019

3284 words 16 mins read

Paper Group ANR 35


Presentation Attack Detection for Cadaver Iris. Hallucinating Dense Optical Flow from Sparse Lidar for Autonomous Vehicles. Omnidirectional CNN for Visual Place Recognition and Navigation. Testing Changes in Communities for the Stochastic Block Model. Fitting New Speakers Based on a Short Untranscribed Sample. Predicting Oral Disintegrating Tablet …

Presentation Attack Detection for Cadaver Iris

Title Presentation Attack Detection for Cadaver Iris
Authors Mateusz Trokielewicz, Adam Czajka, Piotr Maciejewicz
Abstract This paper presents a deep-learning-based method for iris presentation attack detection (PAD) when iris images are obtained from deceased people. Our approach is based on the VGG-16 architecture fine-tuned with a database of 574 post-mortem, near-infrared iris images from the Warsaw-BioBase-PostMortem-Iris-v1 database, complemented by a dataset of 256 images of live irises, collected within the scope of this study. Experiments described in this paper show that our approach is able to correctly classify iris images as representing either a live or a dead eye in almost 99% of the trials, averaged over 20 subject-disjoint train/test splits. We also show that the post-mortem iris detection accuracy increases as time since death elapses, and that we are able to construct a classification system with APCER=0%@BPCER=1% (Attack Presentation and Bona Fide Presentation Classification Error Rates, respectively) when only post-mortem samples collected at least 16 hours post-mortem are considered. Since acquisitions of ante- and post-mortem samples differ significantly, we applied countermeasures to minimize bias in our classification methodology caused by image properties that are not related to the PAD. This included using the same iris sensor in the collection of ante- and post-mortem samples, and analysis of class activation maps to ensure that the discriminant iris regions utilized by our classifier are related to properties of the eye, and not to those of the acquisition protocol. This paper offers the first PAD method known to us for a post-mortem setting, together with an explanation of the decisions made by the convolutional neural network. Along with the paper we offer source code, weights of the trained network, and a dataset of live iris images to facilitate reproducibility and further research.
Tasks
Published 2018-07-11
URL http://arxiv.org/abs/1807.04058v2
PDF http://arxiv.org/pdf/1807.04058v2.pdf
PWC https://paperswithcode.com/paper/presentation-attack-detection-for-cadaver
Repo
Framework
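As a rough illustration of the fine-tuning setup the abstract describes, here is a minimal PyTorch sketch that loads an ImageNet-pretrained VGG-16 and retrains it for a two-class live vs. post-mortem decision. The frozen-layer split, learning rate and `train_loader` are placeholders, not details taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained VGG-16 and swap the last classifier layer for a
# binary live-vs-post-mortem output (illustrative sketch only).
model = models.vgg16(pretrained=True)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)

# Freeze the early convolutional layers; fine-tune the rest (split is an assumption).
for param in model.features[:10].parameters():
    param.requires_grad = False

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3, momentum=0.9
)

def train_epoch(train_loader):           # train_loader is a hypothetical DataLoader
    model.train()
    for images, labels in train_loader:  # labels: 0 = live, 1 = post-mortem
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```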

Hallucinating Dense Optical Flow from Sparse Lidar for Autonomous Vehicles

Title Hallucinating Dense Optical Flow from Sparse Lidar for Autonomous Vehicles
Authors Victor Vaquero, Alberto Sanfeliu, Francesc Moreno-Noguer
Abstract In this paper we propose a novel approach to estimate dense optical flow from sparse lidar data acquired on an autonomous vehicle. This is intended to be used as a drop-in replacement of any image-based optical flow system when images are not reliable, e.g. due to adverse weather conditions or at night. In order to infer high-resolution 2D flows from discrete range data, we devise a three-block architecture of multiscale filters that combines multiple intermediate objectives, both in the lidar and image domains. To train this network we introduce a dataset with approximately 20K lidar samples from the Kitti dataset, which we have augmented with a pseudo ground-truth image-based optical flow computed using FlowNet2. We demonstrate the effectiveness of our approach on Kitti, and show that despite using the low-resolution and sparse measurements of the lidar, we can regress dense optical flow maps which are on par with those estimated with image-based methods.
Tasks Autonomous Vehicles, Optical Flow Estimation
Published 2018-08-30
URL http://arxiv.org/abs/1808.10542v1
PDF http://arxiv.org/pdf/1808.10542v1.pdf
PWC https://paperswithcode.com/paper/hallucinating-dense-optical-flow-from-sparse
Repo
Framework
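The paper's three-block multiscale architecture is not spelled out in the abstract; the toy encoder-decoder below only illustrates the overall idea of regressing a dense two-channel flow field from a projected lidar image. The input channels, resolutions and loss are assumptions.

```python
import torch
import torch.nn as nn

class LidarToFlow(nn.Module):
    """Toy encoder-decoder: projected sparse lidar image -> dense 2-channel flow."""
    def __init__(self, in_ch=2):             # e.g. range + reflectivity channels (assumed)
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),   # (u, v) flow
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# The pseudo ground truth would come from an image-based method such as FlowNet2,
# and training would minimise an end-point-error style regression loss:
flow_pred = LidarToFlow()(torch.randn(1, 2, 64, 256))
# epe = torch.norm(flow_pred - flow_gt, dim=1).mean()
```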

Omnidirectional CNN for Visual Place Recognition and Navigation

Title Omnidirectional CNN for Visual Place Recognition and Navigation
Authors Tsun-Hsuan Wang, Hung-Jui Huang, Juan-Ting Lin, Chan-Wei Hu, Kuo-Hao Zeng, Min Sun
Abstract Visual place recognition is challenging, especially when only a few place exemplars are given. To mitigate the challenge, we consider a place recognition method using omnidirectional cameras and propose a novel Omnidirectional Convolutional Neural Network (O-CNN) to handle severe camera pose variation. Given a visual input, the task of the O-CNN is not to retrieve the matched place exemplar, but to retrieve the closest place exemplar and estimate the relative distance between the input and the closest place. With the ability to estimate relative distance, a heuristic policy is proposed to navigate a robot to the retrieved closest place. Note that the network is designed to take advantage of the omnidirectional view by incorporating circular padding and rotation invariance. To train a powerful O-CNN, we build a virtual world for training on a large scale. We also propose a continuous lifted structured feature embedding loss to learn the concept of distance efficiently. Finally, our experimental results confirm that our method achieves state-of-the-art accuracy and speed with both the virtual world and real-world datasets.
Tasks Visual Place Recognition
Published 2018-03-12
URL http://arxiv.org/abs/1803.04228v1
PDF http://arxiv.org/pdf/1803.04228v1.pdf
PWC https://paperswithcode.com/paper/omnidirectional-cnn-for-visual-place
Repo
Framework
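Circular padding is the one architectural detail the abstract names explicitly. A minimal sketch, assuming an equirectangular panorama whose left and right edges wrap around, is:

```python
import torch
import torch.nn.functional as F

def circular_conv2d(x, weight, pad=1):
    """Convolve a panoramic feature map with horizontal wrap-around.

    x: (N, C, H, W) equirectangular panorama (assumed layout), so the left and
    right edges are physically adjacent and should share context.
    """
    x = F.pad(x, (pad, pad, 0, 0), mode="circular")   # wrap the width dimension only
    x = F.pad(x, (0, 0, pad, pad), mode="constant")   # ordinary zero pad in height
    return F.conv2d(x, weight)

x = torch.randn(1, 3, 32, 128)
w = torch.randn(8, 3, 3, 3)
y = circular_conv2d(x, w)          # same spatial size as the input
print(y.shape)                     # torch.Size([1, 8, 32, 128])
```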

Testing Changes in Communities for the Stochastic Block Model

Title Testing Changes in Communities for the Stochastic Block Model
Authors Aditya Gangrade, Praveen Venkatesh, Bobak Nazer, Venkatesh Saligrama
Abstract We propose and analyze the problems of community goodness-of-fit and two-sample testing for stochastic block models (SBM), where changes arise due to modification in community memberships of nodes. Motivated by practical applications, we consider the challenging sparse regime, where expected node degrees are constant, and the inter-community mean degree ($b$) scales proportionally to the intra-community mean degree ($a$). Prior work has sharply characterized partial or full community recovery in terms of a “signal-to-noise ratio” ($\mathrm{SNR}$) based on $a$ and $b$. For both problems, we propose computationally efficient tests that can succeed far beyond the regime where recovery of community membership is even possible. Overall, for large changes, $s \gg \sqrt{n}$, we need only $\mathrm{SNR}= O(1)$, whereas a naïve test based on community recovery with $O(s)$ errors requires $\mathrm{SNR}= \Theta(\log n)$. Conversely, in the small change regime, $s \ll \sqrt{n}$, via an information-theoretic lower bound, we show that, surprisingly, no algorithm can do better than the naïve algorithm that first estimates the community up to $O(s)$ errors and then detects changes. We validate these phenomena numerically on SBMs and on real-world datasets, as well as on Markov Random Fields where we only observe node data rather than the existence of links.
Tasks
Published 2018-11-29
URL https://arxiv.org/abs/1812.00769v3
PDF https://arxiv.org/pdf/1812.00769v3.pdf
PWC https://paperswithcode.com/paper/testing-changes-in-communities-for-the
Repo
Framework
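For context, a common parameterisation of the sparse two-community SBM and the associated signal-to-noise ratio (a standard definition, not necessarily the exact one used in the paper) is:

```latex
% Sparse two-community SBM: an edge between nodes i and j appears independently with
%   P(i \sim j) = a/n   if i, j are in the same community,
%   P(i \sim j) = b/n   otherwise,
% so expected degrees stay constant as n grows. A standard signal-to-noise ratio is
\[
  \mathrm{SNR} \;=\; \frac{(a-b)^2}{2\,(a+b)},
\]
% and the Kesten--Stigum threshold \mathrm{SNR} > 1 marks where partial recovery
% of the community labels becomes possible.
```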

Fitting New Speakers Based on a Short Untranscribed Sample

Title Fitting New Speakers Based on a Short Untranscribed Sample
Authors Eliya Nachmani, Adam Polyak, Yaniv Taigman, Lior Wolf
Abstract Learning-based Text To Speech systems have the potential to generalize from one speaker to the next and thus require a relatively short sample of any new voice. However, this promise is currently largely unrealized. We present a method that is designed to capture a new speaker from a short untranscribed audio sample. This is done by employing an additional network that, given an audio sample, places the speaker in the embedding space. This network is trained as part of the speech synthesis system using various consistency losses. Our results demonstrate a greatly improved performance on both the dataset speakers and, more importantly, when fitting new voices, even from very short samples.
Tasks Speech Synthesis
Published 2018-02-20
URL http://arxiv.org/abs/1802.06984v1
PDF http://arxiv.org/pdf/1802.06984v1.pdf
PWC https://paperswithcode.com/paper/fitting-new-speakers-based-on-a-short
Repo
Framework
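The abstract describes an auxiliary network that maps an untranscribed audio sample to a speaker embedding. The sketch below is a generic illustration of such an encoder; the feature type, layer sizes and the consistency losses are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    """Maps a variable-length mel-spectrogram to a fixed-size speaker embedding."""
    def __init__(self, n_mels=80, emb_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 256, 5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 256, 5, padding=2), nn.ReLU(),
        )
        self.proj = nn.Linear(256, emb_dim)

    def forward(self, mels):                  # mels: (batch, n_mels, frames)
        h = self.conv(mels).mean(dim=2)       # average over time -> (batch, 256)
        return self.proj(h)                   # embedding that conditions the TTS model

# A few seconds of untranscribed audio, already converted to mel frames (assumed).
emb = SpeakerEncoder()(torch.randn(1, 80, 400))
```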

Predicting Oral Disintegrating Tablet Formulations by Neural Network Techniques

Title Predicting Oral Disintegrating Tablet Formulations by Neural Network Techniques
Authors Run Han, Yilong Yang, Xiaoshan Li, Defang Ouyang
Abstract Oral Disintegrating Tablets (ODTs) are a novel dosage form that can dissolve on the tongue within 3 minutes or less, which is especially suitable for geriatric and pediatric patients. Current ODT formulation studies usually rely on the personal experience of pharmaceutical experts and trial-and-error in the laboratory, which is inefficient and time-consuming. The aim of the current research was to establish a prediction model for ODT formulations with a direct compression process using Artificial Neural Network (ANN) and Deep Neural Network (DNN) techniques. 145 formulation records were extracted from the Web of Science. All data were divided into three parts: a training set (105 records), a validation set (20) and a testing set (20). ANN and DNN were compared for the prediction of the disintegration time. The accuracy of the ANN model reached 85.60%, 80.00% and 75.00% on the training, validation and testing sets respectively, whereas that of the DNN model was 85.60%, 85.00% and 80.00%, respectively. Compared with the ANN, the DNN showed better prediction for ODT formulations. This is the first time that a deep neural network with an improved dataset selection algorithm has been applied to formulation prediction on small data. The proposed predictive approach can evaluate the critical parameters for quality control of a formulation, and guide research and process development. The implementation of this prediction model could effectively reduce the drug product development timeline and material usage, and proactively facilitate the development of a robust drug product.
Tasks
Published 2018-03-14
URL http://arxiv.org/abs/1803.05339v1
PDF http://arxiv.org/pdf/1803.05339v1.pdf
PWC https://paperswithcode.com/paper/predicting-oral-disintegrating-tablet
Repo
Framework
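The abstract does not give the network layouts, so the sketch below only illustrates the general ANN-vs-DNN comparison on tabular formulation data with the described 105/20/20 split. Feature counts, layer sizes and the synthetic data are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Placeholder tabular data: 145 formulations x some number of excipient/process
# features, with disintegration time as the target (shapes are assumptions).
X = np.random.rand(145, 20)
y = np.random.rand(145) * 180            # disintegration time in seconds (assumed unit)

# 105 / 20 / 20 split, as described in the abstract.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, train_size=105, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, train_size=20, random_state=0)

# The paper compares a shallow network ("ANN") with a deeper one ("DNN");
# in this toy setup only the hidden layer sizes differ.
ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000).fit(X_train, y_train)
dnn = MLPRegressor(hidden_layer_sizes=(64, 32, 16), max_iter=2000).fit(X_train, y_train)
print(ann.score(X_val, y_val), dnn.score(X_val, y_val))
```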

Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos

Title Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos
Authors Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari
Abstract In Actor and Observer we introduced a dataset linking the first- and third-person video understanding domains, the Charades-Ego Dataset. In this paper we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68,536 activity instances in 68.8 hours of first- and third-person video, making it one of the largest and most diverse egocentric datasets available. Charades-Ego furthermore shares activity classes, scripts, and methodology with the Charades dataset, which consists of an additional 82.3 hours of third-person video with 66,500 activity instances. Charades-Ego has temporal annotations and textual descriptions, making it suitable for egocentric video classification, localization, captioning, and new tasks utilizing the cross-modal nature of the data.
Tasks Video Classification, Video Understanding
Published 2018-04-25
URL http://arxiv.org/abs/1804.09626v2
PDF http://arxiv.org/pdf/1804.09626v2.pdf
PWC https://paperswithcode.com/paper/charades-ego-a-large-scale-dataset-of-paired
Repo
Framework

Simultaneous Compression and Quantization: A Joint Approach for Efficient Unsupervised Hashing

Title Simultaneous Compression and Quantization: A Joint Approach for Efficient Unsupervised Hashing
Authors Tuan Hoang, Thanh-Toan Do, Huu Le, Dang-Khoa Le-Tan, Ngai-Man Cheung
Abstract For unsupervised data-dependent hashing, the two most important requirements are to preserve similarity in the low-dimensional feature space and to minimize the binary quantization loss. A well-established hashing approach is Iterative Quantization (ITQ), which addresses these two requirements in separate steps. In this paper, we revisit the ITQ approach and propose novel formulations and algorithms for the problem. Specifically, we propose a novel approach, named Simultaneous Compression and Quantization (SCQ), to jointly learn to compress (reduce dimensionality) and binarize input data in a single formulation under a strict orthogonal constraint. With this approach, we introduce a loss function and its relaxed version, termed Orthonormal Encoder (OnE) and Orthogonal Encoder (OgE) respectively, which involve challenging binary and orthogonal constraints. We propose to attack the optimization using novel algorithms based on recent advances in the cyclic coordinate descent approach. Comprehensive experiments on unsupervised image retrieval demonstrate that our proposed methods consistently outperform other state-of-the-art hashing methods. Notably, our proposed methods outperform recent deep neural network and GAN-based hashing methods in accuracy, while being very computationally efficient.
Tasks Image Retrieval, Quantization
Published 2018-02-19
URL https://arxiv.org/abs/1802.06645v3
PDF https://arxiv.org/pdf/1802.06645v3.pdf
PWC https://paperswithcode.com/paper/simultaneous-compression-and-quantization-a
Repo
Framework
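For readers unfamiliar with the baseline being revisited, here is a sketch of the classical two-step ITQ iteration (binarise, then solve an orthogonal Procrustes problem for the rotation). The paper's SCQ/OnE/OgE formulations learn the projection and rotation jointly instead; that joint solver is not reproduced here.

```python
import numpy as np

def itq(V, n_iter=50, seed=0):
    """Classical Iterative Quantization (the two-step baseline the paper revisits).

    V: (n_samples, n_bits) zero-centred, PCA-reduced data. Alternates between
    binarisation and an orthogonal Procrustes update of the rotation R.
    """
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.standard_normal((V.shape[1], V.shape[1])))  # random orthogonal init
    for _ in range(n_iter):
        B = np.sign(V @ R)                    # fix R, update binary codes
        U, _, Wt = np.linalg.svd(V.T @ B)     # fix B, update rotation (Procrustes)
        R = U @ Wt
    return np.sign(V @ R), R

codes, R = itq(np.random.randn(1000, 32))
```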

A novel active learning framework for classification: using weighted rank aggregation to achieve multiple query criteria

Title A novel active learning framework for classification: using weighted rank aggregation to achieve multiple query criteria
Authors Yu Zhao, Zhenhui Shi, Jingyang Zhang, Dong Chen, Lixu Gu
Abstract Multiple query criteria active learning (MQCAL) methods have a higher potential performance than conventional active learning methods, in which only one criterion is deployed for sample selection. A central issue for MQCAL methods is the development of an integration criteria strategy (ICS) that makes full use of all criteria. The conventional ICSs adopted in relevant research all achieve the desired effect, but several limitations still must be addressed. For instance, some of the strategies are not sufficiently scalable during the design process, and the number and type of criteria involved are fixed in advance. Thus, it is challenging for the user to integrate other criteria into the original process unless modifications are made to the algorithm. Other strategies are too dependent on empirical parameters, which can only be acquired by experience or cross-validation and thus lack generality; additionally, these strategies run counter to the intention of active learning, as samples need to be labeled in the validation set before the active learning process can begin. To address these limitations, we propose a novel MQCAL method for classification tasks that employs a third strategy via weighted rank aggregation. The proposed method serves as a heuristic means to select high-value samples with high scalability and generality and is implemented through a three-step process: (1) the transformation of the sample selection to sample ranking and scoring, (2) the computation of the self-adaptive weights of each criterion, and (3) the weighted aggregation of each sample rank list. Ultimately, the sample at the top of the aggregated ranking list is the most comprehensively valuable and is labeled. Experiments generating 257 wins, 194 ties and 49 losses against other state-of-the-art MQCAL methods verify that the proposed method can achieve superior results.
Tasks Active Learning
Published 2018-09-27
URL http://arxiv.org/abs/1809.10565v1
PDF http://arxiv.org/pdf/1809.10565v1.pdf
PWC https://paperswithcode.com/paper/a-novel-active-learning-framework-for
Repo
Framework
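The three-step process maps naturally onto a small weighted Borda-style aggregation; the sketch below is an illustration of that reading, with the self-adaptive weight computation left out since it is specific to the paper.

```python
import numpy as np

def weighted_rank_aggregation(scores_per_criterion, weights):
    """Aggregate per-criterion rankings of unlabeled samples (illustrative).

    scores_per_criterion: (n_criteria, n_samples) array, larger = more valuable
    weights: (n_criteria,) non-negative weights, e.g. from a self-adaptive scheme
    Returns the index of the sample to query next.
    """
    # Step 1: turn each criterion's scores into ranks (Borda-style points, 0 = worst).
    ranks = scores_per_criterion.argsort(axis=1).argsort(axis=1)
    # Step 3: weighted aggregation of the rank lists.
    aggregated = weights @ ranks
    return int(aggregated.argmax())

# Two hypothetical criteria (e.g. uncertainty and diversity) scored over 5 samples.
scores = np.array([[0.9, 0.1, 0.5, 0.7, 0.2],
                   [0.2, 0.8, 0.6, 0.4, 0.9]])
print(weighted_rank_aggregation(scores, weights=np.array([0.6, 0.4])))
```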

Deep Multi-Structural Shape Analysis: Application to Neuroanatomy

Title Deep Multi-Structural Shape Analysis: Application to Neuroanatomy
Authors Benjamin Gutierrez-Becker, Christian Wachinger
Abstract We propose a deep neural network for supervised learning on neuroanatomical shapes. The network directly operates on raw point clouds without the need for mesh processing or the identification of point correspondences, as spatial transformer networks map the data to a canonical space. Instead of relying on hand-crafted shape descriptors, an optimal representation is learned in the end-to-end training stage of the network. The proposed network consists of multiple branches, so that features for multiple structures are learned simultaneously. We demonstrate the performance of our method on two applications: (i) the prediction of Alzheimer’s disease and mild cognitive impairment and (ii) the regression of the brain age. Finally, we visualize the important parts of the anatomy for the prediction by adapting the occlusion method to point clouds.
Tasks
Published 2018-06-04
URL http://arxiv.org/abs/1806.01069v1
PDF http://arxiv.org/pdf/1806.01069v1.pdf
PWC https://paperswithcode.com/paper/deep-multi-structural-shape-analysis
Repo
Framework
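The description of a network that directly operates on raw point clouds, with one branch per structure, suggests a PointNet-style design; the sketch below is a generic illustration of that idea, not the paper's exact architecture (the spatial transformer stage and layer sizes are omitted or assumed).

```python
import torch
import torch.nn as nn

class PointBranch(nn.Module):
    """One branch: per-point shared MLP followed by a symmetric max-pool."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, out_dim, 1), nn.ReLU(),
        )

    def forward(self, pts):                 # pts: (batch, 3, n_points)
        return self.mlp(pts).max(dim=2).values

class MultiStructureNet(nn.Module):
    """One branch per neuroanatomical structure, concatenated for the prediction head."""
    def __init__(self, n_structures=2, n_classes=3):
        super().__init__()
        self.branches = nn.ModuleList(PointBranch() for _ in range(n_structures))
        self.head = nn.Linear(128 * n_structures, n_classes)

    def forward(self, clouds):              # list of (batch, 3, n_points) tensors
        feats = [b(c) for b, c in zip(self.branches, clouds)]
        return self.head(torch.cat(feats, dim=1))

net = MultiStructureNet()
logits = net([torch.randn(4, 3, 1024), torch.randn(4, 3, 1024)])
```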

Image-to-GPS Verification Through A Bottom-Up Pattern Matching Network

Title Image-to-GPS Verification Through A Bottom-Up Pattern Matching Network
Authors Jiaxin Cheng, Yue Wu, Wael Abd-Almageed, Prem Natarajan
Abstract The image-to-GPS verification problem asks whether a given image is taken at a claimed GPS location. In this paper, we treat it as an image verification problem – whether a query image is taken at the same place as a reference image retrieved at the claimed GPS location. We make three major contributions: 1) we propose a novel custom bottom-up pattern matching (BUPM) deep neural network solution; 2) we demonstrate that the verification can be done directly by cross-checking a perspective-looking query image against a panorama reference image; and 3) we collect and clean a dataset of 30K pairs of query and reference images. Our experimental results show that the proposed BUPM solution outperforms the state-of-the-art solutions in terms of both verification and localization.
Tasks Image-To-Gps Verification
Published 2018-11-18
URL http://arxiv.org/abs/1811.07288v1
PDF http://arxiv.org/pdf/1811.07288v1.pdf
PWC https://paperswithcode.com/paper/image-to-gps-verification-through-a-bottom-up
Repo
Framework

Distribution Aware Active Learning

Title Distribution Aware Active Learning
Authors Arash Mehrjou, Mehran Khodabandeh, Greg Mori
Abstract Discriminative learning machines often need a large set of labeled samples for training. Active learning (AL) settings assume that the learner has the freedom to ask an oracle to label its desired samples. Traditional AL algorithms heuristically choose query samples about which the current learner is uncertain. This strategy does not make good use of the structure of the dataset at hand and is prone to being misguided by outliers. To alleviate this problem, we propose to distill the structural information into a probabilistic generative model which acts as a teacher in our model. The active learner uses this information effectively at each cycle of active learning. The proposed method is generic and does not depend on the type of learner and teacher. We then suggest a query criterion for active learning that is aware of the distribution of the data and is more robust against outliers. Our method can be combined readily with several other query criteria for active learning. We provide the formulation and empirically demonstrate our idea via toy and real examples.
Tasks Active Learning
Published 2018-05-23
URL http://arxiv.org/abs/1805.08916v1
PDF http://arxiv.org/pdf/1805.08916v1.pdf
PWC https://paperswithcode.com/paper/distribution-aware-active-learning
Repo
Framework
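One way to read "distribution aware" is to weight the learner's uncertainty by a density estimate from the generative teacher, so that uncertain outliers are down-weighted. The sketch below does exactly that with a Gaussian mixture standing in as the teacher; the combination rule and choice of teacher are assumptions, not the paper's formulation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def distribution_aware_scores(probs, X_pool, teacher):
    """Combine learner uncertainty with a density estimate from the teacher.

    probs:   (n_samples, n_classes) predicted class probabilities from the learner
    X_pool:  (n_samples, n_features) unlabeled pool
    teacher: fitted density model (here a GaussianMixture) acting as the teacher
    """
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)   # learner uncertainty
    density = np.exp(teacher.score_samples(X_pool))          # p(x) under the teacher
    return entropy * density                                 # outliers get low density

X_pool = np.random.randn(200, 5)
teacher = GaussianMixture(n_components=3, random_state=0).fit(X_pool)
probs = np.random.dirichlet(np.ones(4), size=200)            # placeholder learner output
query_idx = distribution_aware_scores(probs, X_pool, teacher).argmax()
```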

Learning Descriptor Networks for 3D Shape Synthesis and Analysis

Title Learning Descriptor Networks for 3D Shape Synthesis and Analysis
Authors Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, Ying Nian Wu
Abstract This paper proposes a 3D shape descriptor network, which is a deep convolutional energy-based model, for modeling volumetric shape patterns. The maximum likelihood training of the model follows an “analysis by synthesis” scheme and can be interpreted as a mode seeking and mode shifting process. The model can synthesize 3D shape patterns by sampling from the probability distribution via MCMC such as Langevin dynamics. The model can be used to train a 3D generator network via MCMC teaching. The conditional version of the 3D shape descriptor net can be used for 3D object recovery and 3D object super-resolution. Experiments demonstrate that the proposed model can generate realistic 3D shape patterns and can be useful for 3D shape analysis.
Tasks 3D Object Super-Resolution, 3D Shape Analysis, Super-Resolution
Published 2018-04-02
URL http://arxiv.org/abs/1804.00586v1
PDF http://arxiv.org/pdf/1804.00586v1.pdf
PWC https://paperswithcode.com/paper/learning-descriptor-networks-for-3d-shape
Repo
Framework
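Langevin dynamics is the sampling procedure the abstract names. A generic sketch of drawing a sample from an energy-based model over a voxel grid follows, with a toy quadratic energy standing in for the learned descriptor network.

```python
import torch

def langevin_sample(energy_fn, shape, n_steps=100, step_size=0.01):
    """Approximate sample from p(x) ∝ exp(-E(x)) via Langevin dynamics.

    Update rule: x_{t+1} = x_t - (step_size^2 / 2) * dE/dx + step_size * noise.
    energy_fn would be the 3D shape descriptor network; here it is any callable
    mapping a voxel tensor to a scalar energy.
    """
    x = torch.randn(shape)
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        energy = energy_fn(x).sum()
        grad, = torch.autograd.grad(energy, x)
        x = x - 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(x)
    return x.detach()

# Toy quadratic energy in place of the learned descriptor network.
sample = langevin_sample(lambda v: 0.5 * (v ** 2).sum(), shape=(1, 1, 16, 16, 16))
```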

Shapes Characterization on Address Event Representation Using Histograms of Oriented Events and an Extended LBP Approach

Title Shapes Characterization on Address Event Representation Using Histograms of Oriented Events and an Extended LBP Approach
Authors Pablo Negri
Abstract Address Event Representation is a thriving technology that could change the digital image processing paradigm. This paper proposes a methodology to characterize the shape of objects using the streaming of asynchronous events. A new descriptor that enhances spike connectivity is associated with two oriented-histogram-based representations. This paper uses these features to develop both a non-supervised and a supervised multi-classification framework to recognize poker symbols from the Poker-DVS public dataset. The aforementioned framework, which uses a very limited number of events and simple class modeling, yields results that challenge more sophisticated methodologies proposed in the state of the art. A feature family based on context shapes is applied to the more challenging 2015 Poker-DVS dataset with a supervised classifier, obtaining an accuracy of 98.5%. The system is also applied to the MNIST-DVS dataset, yielding accuracies of 94.6% and 96.3% on digit recognition for scales 4 and 8, respectively.
Tasks
Published 2018-02-09
URL http://arxiv.org/abs/1802.03327v1
PDF http://arxiv.org/pdf/1802.03327v1.pdf
PWC https://paperswithcode.com/paper/shapes-characterization-on-address-event
Repo
Framework

Discriminative Feature Learning for Unsupervised Video Summarization

Title Discriminative Feature Learning for Unsupervised Video Summarization
Authors Yunjae Jung, Donghyeon Cho, Dahun Kim, Sanghyun Woo, In So Kweon
Abstract In this paper, we address the problem of unsupervised video summarization that automatically extracts key-shots from an input video. Specifically, we tackle two critical issues based on our empirical observations: (i) ineffective feature learning due to flat distributions of output importance scores for each frame, and (ii) training difficulty when dealing with long-length video inputs. To alleviate the first problem, we propose a simple yet effective regularization loss term called variance loss. The proposed variance loss allows a network to predict output scores for each frame with high discrepancy, which enables effective feature learning and significantly improves model performance. For the second problem, we design a novel two-stream network named Chunk and Stride Network (CSNet) that utilizes local (chunk) and global (stride) temporal views of the video features. Our CSNet gives better summarization results for long-length videos compared to the existing methods. In addition, we introduce an attention mechanism to handle the dynamic information in videos. We demonstrate the effectiveness of the proposed methods by conducting extensive ablation studies and show that our final model achieves new state-of-the-art results on two benchmark datasets.
Tasks Supervised Video Summarization, Unsupervised Video Summarization, Video Summarization
Published 2018-11-24
URL http://arxiv.org/abs/1811.09791v1
PDF http://arxiv.org/pdf/1811.09791v1.pdf
PWC https://paperswithcode.com/paper/discriminative-feature-learning-for
Repo
Framework
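A hedged sketch of a variance-style regularizer on the predicted frame-importance scores is given below; the paper's exact formulation may differ, e.g. in whether a margin or a reciprocal is used.

```python
import torch

def variance_loss(scores, eps=1e-8):
    """Regularizer that penalises flat frame-importance distributions.

    scores: (batch, n_frames) predicted importance scores in [0, 1].
    Returning the reciprocal of the per-video score variance pushes the network
    to spread its scores out instead of predicting a flat distribution.
    """
    return (1.0 / (scores.var(dim=1) + eps)).mean()

scores = torch.sigmoid(torch.randn(2, 100))   # placeholder network output
print(variance_loss(scores))
```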