Paper Group ANR 35
Presentation Attack Detection for Cadaver Iris. Hallucinating Dense Optical Flow from Sparse Lidar for Autonomous Vehicles. Omnidirectional CNN for Visual Place Recognition and Navigation. Testing Changes in Communities for the Stochastic Block Model. Fitting New Speakers Based on a Short Untranscribed Sample. Predicting Oral Disintegrating Tablet …
Presentation Attack Detection for Cadaver Iris
Title | Presentation Attack Detection for Cadaver Iris |
Authors | Mateusz Trokielewicz, Adam Czajka, Piotr Maciejewicz |
Abstract | This paper presents a deep-learning-based method for iris presentation attack detection (PAD) when iris images are obtained from deceased people. Our approach is based on the VGG-16 architecture fine-tuned with a database of 574 post-mortem, near-infrared iris images from the Warsaw-BioBase-PostMortem-Iris-v1 database, complemented by a dataset of 256 images of live irises, collected within the scope of this study. Experiments described in this paper show that our approach is able to correctly classify iris images as either representing a live or a dead eye in almost 99% of the trials, averaged over 20 subject-disjoint train/test splits. We also show that the post-mortem iris detection accuracy increases as time since death elapses, and that we are able to construct a classification system with APCER=0%@BPCER=1% (Attack Presentation and Bona Fide Presentation Classification Error Rates, respectively) when only post-mortem samples collected at least 16 hours post-mortem are considered. Since acquisitions of ante- and post-mortem samples differ significantly, we applied countermeasures to minimize bias in our classification methodology caused by image properties that are not related to the PAD. These included using the same iris sensor in the collection of ante- and post-mortem samples, and analyzing class activation maps to ensure that the discriminant iris regions utilized by our classifier relate to properties of the eye, and not to those of the acquisition protocol. To our knowledge, this paper offers the first PAD method in a post-mortem setting, together with an explanation of the decisions made by the convolutional neural network. Along with the paper, we offer source code, weights of the trained network, and a dataset of live iris images to facilitate reproducibility and further research. |
Tasks | |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.04058v2 |
http://arxiv.org/pdf/1807.04058v2.pdf | |
PWC | https://paperswithcode.com/paper/presentation-attack-detection-for-cadaver |
Repo | |
Framework | |
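The abstract reports its headline operating point as APCER=0% at BPCER=1%. As a minimal sketch (not the paper's released code), the sketch below assumes a scalar classifier score per sample, with higher scores meaning "more likely post-mortem (attack)", and computes the APCER at the threshold that fixes the BPCER:

```python
import numpy as np

def apcer_at_bpcer(attack_scores, bonafide_scores, target_bpcer=0.01):
    """Report APCER at a fixed BPCER operating point.
    Assumed convention: higher score = more likely an attack sample."""
    # Threshold at the (1 - target_bpcer) quantile of bona fide scores,
    # so about target_bpcer of live samples are misclassified as attacks.
    t = np.quantile(bonafide_scores, 1.0 - target_bpcer)
    bpcer = float(np.mean(np.asarray(bonafide_scores) >= t))   # live flagged as attack
    apcer = float(np.mean(np.asarray(attack_scores) < t))      # attack passed as live
    return apcer, bpcer, t
```

With well-separated score distributions, the returned APCER at BPCER=1% drops to zero, which is how the paper's headline figure would be read off a score histogram.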
Hallucinating Dense Optical Flow from Sparse Lidar for Autonomous Vehicles
Title | Hallucinating Dense Optical Flow from Sparse Lidar for Autonomous Vehicles |
Authors | Victor Vaquero, Alberto Sanfeliu, Francesc Moreno-Noguer |
Abstract | In this paper we propose a novel approach to estimate dense optical flow from sparse lidar data acquired on an autonomous vehicle. This is intended to be used as a drop-in replacement of any image-based optical flow system when images are not reliable due to, e.g., adverse weather conditions or at night. In order to infer high resolution 2D flows from discrete range data we devise a three-block architecture of multiscale filters that combines multiple intermediate objectives, both in the lidar and image domains. To train this network we introduce a dataset with approximately 20K lidar samples from the Kitti dataset, which we have augmented with a pseudo ground-truth image-based optical flow computed using FlowNet2. We demonstrate the effectiveness of our approach on Kitti, and show that despite using the low-resolution and sparse measurements of the lidar, we can regress dense optical flow maps which are on par with those estimated with image-based methods. |
Tasks | Autonomous Vehicles, Optical Flow Estimation |
Published | 2018-08-30 |
URL | http://arxiv.org/abs/1808.10542v1 |
http://arxiv.org/pdf/1808.10542v1.pdf | |
PWC | https://paperswithcode.com/paper/hallucinating-dense-optical-flow-from-sparse |
Repo | |
Framework | |
Omnidirectional CNN for Visual Place Recognition and Navigation
Title | Omnidirectional CNN for Visual Place Recognition and Navigation |
Authors | Tsun-Hsuan Wang, Hung-Jui Huang, Juan-Ting Lin, Chan-Wei Hu, Kuo-Hao Zeng, Min Sun |
Abstract | Visual place recognition is challenging, especially when only a few place exemplars are given. To mitigate the challenge, we consider a place recognition method using omnidirectional cameras and propose a novel Omnidirectional Convolutional Neural Network (O-CNN) to handle severe camera pose variation. Given a visual input, the task of the O-CNN is not to retrieve the matched place exemplar, but to retrieve the closest place exemplar and estimate the relative distance between the input and the closest place. With the ability to estimate relative distance, a heuristic policy is proposed to navigate a robot to the retrieved closest place. Note that the network is designed to take advantage of the omnidirectional view by incorporating circular padding and rotation invariance. To train a powerful O-CNN, we build a virtual world for training on a large scale. We also propose a continuous lifted structured feature embedding loss to learn the concept of distance efficiently. Finally, our experimental results confirm that our method achieves state-of-the-art accuracy and speed with both the virtual world and real-world datasets. |
Tasks | Visual Place Recognition |
Published | 2018-03-12 |
URL | http://arxiv.org/abs/1803.04228v1 |
http://arxiv.org/pdf/1803.04228v1.pdf | |
PWC | https://paperswithcode.com/paper/omnidirectional-cnn-for-visual-place |
Repo | |
Framework | |
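The circular padding mentioned in the abstract exploits the fact that a panorama wraps around horizontally. A minimal numpy sketch (an illustration of the idea, not the O-CNN implementation; the function name and zero vertical padding are assumptions):

```python
import numpy as np

def circular_pad_width(feat, pad):
    """Pad a (H, W) feature map by wrapping along the horizontal axis,
    so a 360-degree panorama stays seamless at its left/right border;
    the vertical axis gets ordinary zero padding."""
    wrapped = np.pad(feat, ((0, 0), (pad, pad)), mode="wrap")
    return np.pad(wrapped, ((pad, pad), (0, 0)), mode="constant")
```

A convolution applied after this padding sees the same neighborhood no matter where the panorama's seam falls, which is the source of the rotation invariance the abstract refers to.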
Testing Changes in Communities for the Stochastic Block Model
Title | Testing Changes in Communities for the Stochastic Block Model |
Authors | Aditya Gangrade, Praveen Venkatesh, Bobak Nazer, Venkatesh Saligrama |
Abstract | We propose and analyze the problems of \textit{community goodness-of-fit and two-sample testing} for stochastic block models (SBM), where changes arise due to modification in community memberships of nodes. Motivated by practical applications, we consider the challenging sparse regime, where expected node degrees are constant, and the inter-community mean degree ($b$) scales proportionally to intra-community mean degree ($a$). Prior work has sharply characterized partial or full community recovery in terms of a “signal-to-noise ratio” ($\mathrm{SNR}$) based on $a$ and $b$. For both problems, we propose computationally-efficient tests that can succeed far beyond the regime where recovery of community membership is even possible. Overall, for large changes, $s \gg \sqrt{n}$, we need only $\mathrm{SNR}= O(1)$ whereas a naïve test based on community recovery with $O(s)$ errors requires $\mathrm{SNR}= \Theta(\log n)$. Conversely, in the small change regime, $s \ll \sqrt{n}$, via an information-theoretic lower bound, we show that, surprisingly, no algorithm can do better than the naïve algorithm that first estimates the community up to $O(s)$ errors and then detects changes. We validate these phenomena numerically on SBMs and on real-world datasets as well as Markov Random Fields where we only observe node data rather than the existence of links. |
Tasks | |
Published | 2018-11-29 |
URL | https://arxiv.org/abs/1812.00769v3 |
https://arxiv.org/pdf/1812.00769v3.pdf | |
PWC | https://paperswithcode.com/paper/testing-changes-in-communities-for-the |
Repo | |
Framework | |
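To make the sparse regime concrete: with intra-community mean degree $a$ and inter-community mean degree $b$, each node's expected degree is constant at $(a+b)/2$ for two equal communities. A small sampler sketch (illustrative only; the paper's tests are not reproduced here):

```python
import numpy as np

def sample_sbm(n, a, b, rng):
    """Two-community SBM in the sparse regime: intra-community edge
    probability a/n, inter-community b/n, so each node's expected
    degree is roughly (a + b) / 2 regardless of n."""
    labels = np.repeat([0, 1], n // 2)
    same = labels[:, None] == labels[None, :]
    p = np.where(same, a / n, b / n)
    upper = np.triu(rng.random((n, n)) < p, k=1)  # sample each pair once
    adj = upper | upper.T                          # symmetrize
    return adj.astype(int), labels
```

Running goodness-of-fit experiments on such samples is how the abstract's phase-transition claims (large changes $s \gg \sqrt{n}$ vs. small changes $s \ll \sqrt{n}$) would be probed numerically.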
Fitting New Speakers Based on a Short Untranscribed Sample
Title | Fitting New Speakers Based on a Short Untranscribed Sample |
Authors | Eliya Nachmani, Adam Polyak, Yaniv Taigman, Lior Wolf |
Abstract | Learning-based Text To Speech systems have the potential to generalize from one speaker to the next and thus require only a relatively short sample of any new voice. However, this promise is currently largely unrealized. We present a method that is designed to capture a new speaker from a short untranscribed audio sample. This is done by employing an additional network that, given an audio sample, places the speaker in the embedding space. This network is trained as part of the speech synthesis system using various consistency losses. Our results demonstrate greatly improved performance on both the dataset speakers and, more importantly, when fitting new voices, even from very short samples. |
Tasks | Speech Synthesis |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.06984v1 |
http://arxiv.org/pdf/1802.06984v1.pdf | |
PWC | https://paperswithcode.com/paper/fitting-new-speakers-based-on-a-short |
Repo | |
Framework | |
Predicting Oral Disintegrating Tablet Formulations by Neural Network Techniques
Title | Predicting Oral Disintegrating Tablet Formulations by Neural Network Techniques |
Authors | Run Han, Yilong Yang, Xiaoshan Li, Defang Ouyang |
Abstract | Oral Disintegrating Tablets (ODTs) are a novel dosage form that dissolves on the tongue within 3 minutes or less, which is especially useful for geriatric and pediatric patients. Current ODT formulation studies usually rely on the personal experience of pharmaceutical experts and trial-and-error in the laboratory, which is inefficient and time-consuming. The aim of the current research was to establish a prediction model for ODT formulations with a direct compression process using Artificial Neural Network (ANN) and Deep Neural Network (DNN) techniques. In total, 145 formulation data points were extracted from Web of Science. All data were divided into three parts: a training set (105 data points), a validation set (20) and a testing set (20). ANN and DNN were compared for the prediction of the disintegrating time. The accuracy of the ANN model reached 85.60%, 80.00% and 75.00% on the training, validation and testing sets respectively, whereas that of the DNN model was 85.60%, 85.00% and 80.00%, respectively. Compared with the ANN, the DNN showed better prediction performance for ODT formulations. This is the first time that a deep neural network with an improved dataset selection algorithm has been applied to formulation prediction on small data. The proposed predictive approach could evaluate critical quality-control parameters of a formulation, and guide research and process development. The implementation of this prediction model could effectively reduce the drug product development timeline and material usage, and proactively facilitate the development of a robust drug product. |
Tasks | |
Published | 2018-03-14 |
URL | http://arxiv.org/abs/1803.05339v1 |
http://arxiv.org/pdf/1803.05339v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-oral-disintegrating-tablet |
Repo | |
Framework | |
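The 145-sample dataset is partitioned 105/20/20 into train/validation/test. A trivial shuffled-split sketch of those sizes (the paper's improved dataset selection algorithm is not described in the abstract and is not reproduced here; this is a plain random split):

```python
import numpy as np

def split_145(indices, rng):
    """Shuffle 145 formulation indices and split them 105/20/20 into
    train/validation/test, mirroring the set sizes used in the paper.
    (A placeholder for the paper's own selection algorithm.)"""
    idx = np.array(list(indices))
    rng.shuffle(idx)
    return idx[:105], idx[105:125], idx[125:145]
```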
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos
Title | Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos |
Authors | Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari |
Abstract | In Actor and Observer we introduced a dataset linking the first- and third-person video understanding domains, the Charades-Ego Dataset. In this paper we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68,536 activity instances in 68.8 hours of first- and third-person video, making it one of the largest and most diverse egocentric datasets available. Charades-Ego furthermore shares activity classes, scripts, and methodology with the Charades dataset, which consists of an additional 82.3 hours of third-person video with 66,500 activity instances. Charades-Ego has temporal annotations and textual descriptions, making it suitable for egocentric video classification, localization, captioning, and new tasks utilizing the cross-modal nature of the data. |
Tasks | Video Classification, Video Understanding |
Published | 2018-04-25 |
URL | http://arxiv.org/abs/1804.09626v2 |
http://arxiv.org/pdf/1804.09626v2.pdf | |
PWC | https://paperswithcode.com/paper/charades-ego-a-large-scale-dataset-of-paired |
Repo | |
Framework | |
Simultaneous Compression and Quantization: A Joint Approach for Efficient Unsupervised Hashing
Title | Simultaneous Compression and Quantization: A Joint Approach for Efficient Unsupervised Hashing |
Authors | Tuan Hoang, Thanh-Toan Do, Huu Le, Dang-Khoa Le-Tan, Ngai-Man Cheung |
Abstract | For unsupervised data-dependent hashing, the two most important requirements are to preserve similarity in the low-dimensional feature space and to minimize the binary quantization loss. A well-established hashing approach is Iterative Quantization (ITQ), which addresses these two requirements in separate steps. In this paper, we revisit the ITQ approach and propose novel formulations and algorithms for the problem. Specifically, we propose a novel approach, named Simultaneous Compression and Quantization (SCQ), to jointly learn to compress (reduce dimensionality) and binarize input data in a single formulation under a strict orthogonal constraint. With this approach, we introduce a loss function and its relaxed version, termed Orthonormal Encoder (OnE) and Orthogonal Encoder (OgE) respectively, which involve challenging binary and orthogonal constraints. We propose to attack the optimization using novel algorithms based on recent advances in the cyclic coordinate descent approach. Comprehensive experiments on unsupervised image retrieval demonstrate that our proposed methods consistently outperform other state-of-the-art hashing methods. Notably, our proposed methods outperform recent deep neural network and GAN based hashing methods in accuracy, while being very computationally efficient. |
Tasks | Image Retrieval, Quantization |
Published | 2018-02-19 |
URL | https://arxiv.org/abs/1802.06645v3 |
https://arxiv.org/pdf/1802.06645v3.pdf | |
PWC | https://paperswithcode.com/paper/simultaneous-compression-and-quantization-a |
Repo | |
Framework | |
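For reference, the ITQ baseline that SCQ improves on alternates between fixing binary codes and solving an orthogonal Procrustes problem for the rotation. A compact numpy sketch of that baseline (not the paper's joint SCQ formulation):

```python
import numpy as np

def itq_rotation(V, n_iter=50, seed=0):
    """ITQ-style alternation: given PCA-compressed, zero-centered data
    V (n x c), find an orthogonal rotation R reducing the quantization
    loss ||sign(VR) - VR||_F."""
    rng = np.random.default_rng(seed)
    # Start from a random orthogonal matrix.
    R, _ = np.linalg.qr(rng.standard_normal((V.shape[1], V.shape[1])))
    for _ in range(n_iter):
        B = np.sign(V @ R)                 # fix R: best binary codes
        U, _, Wt = np.linalg.svd(V.T @ B)  # fix B: Procrustes rotation
        R = U @ Wt
    return R
```

SCQ's contribution, per the abstract, is to fold the dimensionality reduction itself into this loop rather than running PCA as a separate preprocessing step.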
A novel active learning framework for classification: using weighted rank aggregation to achieve multiple query criteria
Title | A novel active learning framework for classification: using weighted rank aggregation to achieve multiple query criteria |
Authors | Yu Zhao, Zhenhui Shi, Jingyang Zhang, Dong Chen, Lixu Gu |
Abstract | Multiple query criteria active learning (MQCAL) methods have a higher potential performance than conventional active learning methods in which only one criterion is deployed for sample selection. A central issue related to MQCAL methods concerns the development of an integration criteria strategy (ICS) that makes full use of all criteria. The conventional ICSs adopted in prior research all achieve the desired effects, but several limitations still must be addressed. For instance, some of the strategies are not sufficiently scalable during the design process, and the number and type of criteria involved are fixed in advance. Thus, it is challenging for the user to integrate other criteria into the original process unless modifications are made to the algorithm. Other strategies are too dependent on empirical parameters, which can only be acquired by experience or cross-validation and thus lack generality; additionally, these strategies are counter to the intention of active learning, as samples in the validation set need to be labeled before the active learning process can begin. To address these limitations, we propose a novel MQCAL method for classification tasks that employs a third strategy via weighted rank aggregation. The proposed method serves as a heuristic means to select high-value samples with high scalability and generality and is implemented through a three-step process: (1) the transformation of the sample selection to sample ranking and scoring, (2) the computation of the self-adaptive weights of each criterion, and (3) the weighted aggregation of each sample rank list. Ultimately, the sample at the top of the aggregated ranking list is the most comprehensively valuable and must be labeled. Several experiments generating 257 wins, 194 ties and 49 losses against other state-of-the-art MQCALs are conducted to verify that the proposed method can achieve superior results. |
Tasks | Active Learning |
Published | 2018-09-27 |
URL | http://arxiv.org/abs/1809.10565v1 |
http://arxiv.org/pdf/1809.10565v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-active-learning-framework-for |
Repo | |
Framework | |
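Step (3) of the pipeline, weighted aggregation of per-criterion rank lists, can be sketched as a weighted Borda count (an illustrative realization; the paper's self-adaptive weight computation in step (2) is assumed given here):

```python
import numpy as np

def weighted_borda(rank_lists, weights):
    """Aggregate per-criterion rankings by a weighted Borda count.
    rank_lists[c][i] is the rank (0 = best) that criterion c assigns
    to sample i; the sample with the lowest weighted mean rank is the
    most comprehensively valuable one to label next."""
    ranks = np.asarray(rank_lists, dtype=float)   # (criteria, samples)
    w = np.asarray(weights, dtype=float)
    agg = w @ ranks / w.sum()                     # weighted mean rank
    return np.argsort(agg)                        # best sample first
```

Because any new criterion just contributes one more rank list and one more weight, this aggregation is scalable in the number of criteria, which is the limitation of earlier ICSs the abstract highlights.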
Deep Multi-Structural Shape Analysis: Application to Neuroanatomy
Title | Deep Multi-Structural Shape Analysis: Application to Neuroanatomy |
Authors | Benjamin Gutierrez-Becker, Christian Wachinger |
Abstract | We propose a deep neural network for supervised learning on neuroanatomical shapes. The network directly operates on raw point clouds without the need for mesh processing or the identification of point correspondences, as spatial transformer networks map the data to a canonical space. Instead of relying on hand-crafted shape descriptors, an optimal representation is learned in the end-to-end training stage of the network. The proposed network consists of multiple branches, so that features for multiple structures are learned simultaneously. We demonstrate the performance of our method on two applications: (i) the prediction of Alzheimer’s disease and mild cognitive impairment and (ii) the regression of the brain age. Finally, we visualize the important parts of the anatomy for the prediction by adapting the occlusion method to point clouds. |
Tasks | |
Published | 2018-06-04 |
URL | http://arxiv.org/abs/1806.01069v1 |
http://arxiv.org/pdf/1806.01069v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-multi-structural-shape-analysis |
Repo | |
Framework | |
Image-to-GPS Verification Through A Bottom-Up Pattern Matching Network
Title | Image-to-GPS Verification Through A Bottom-Up Pattern Matching Network |
Authors | Jiaxin Cheng, Yue Wu, Wael Abd-Almageed, Prem Natarajan |
Abstract | The image-to-GPS verification problem asks whether a given image is taken at a claimed GPS location. In this paper, we treat it as an image verification problem – whether a query image is taken at the same place as a reference image retrieved at the claimed GPS location. We make three major contributions: 1) we propose a novel custom bottom-up pattern matching (BUPM) deep neural network solution; 2) we demonstrate that the verification can be done directly by cross-checking a perspective-looking query image against a panorama reference image; and 3) we collect and clean a dataset of 30K query-and-reference pairs. Our experimental results show that the proposed BUPM solution outperforms the state-of-the-art solutions in terms of both verification and localization. |
Tasks | Image-To-Gps Verification |
Published | 2018-11-18 |
URL | http://arxiv.org/abs/1811.07288v1 |
http://arxiv.org/pdf/1811.07288v1.pdf | |
PWC | https://paperswithcode.com/paper/image-to-gps-verification-through-a-bottom-up |
Repo | |
Framework | |
Distribution Aware Active Learning
Title | Distribution Aware Active Learning |
Authors | Arash Mehrjou, Mehran Khodabandeh, Greg Mori |
Abstract | Discriminative learning machines often need a large set of labeled samples for training. Active learning (AL) settings assume that the learner has the freedom to ask an oracle to label its desired samples. Traditional AL algorithms heuristically choose query samples about which the current learner is uncertain. This strategy does not make good use of the structure of the dataset at hand and is prone to be misguided by outliers. To alleviate this problem, we propose to distill the structural information into a probabilistic generative model which acts as a \emph{teacher} in our model. The active \emph{learner} uses this information effectively at each cycle of active learning. The proposed method is generic and does not depend on the type of learner and teacher. We then suggest a query criterion for active learning that is aware of the distribution of the data and is more robust against outliers. Our method can be combined readily with several other query criteria for active learning. We provide the formulation and empirically show our idea via toy and real examples. |
Tasks | Active Learning |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.08916v1 |
http://arxiv.org/pdf/1805.08916v1.pdf | |
PWC | https://paperswithcode.com/paper/distribution-aware-active-learning |
Repo | |
Framework | |
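One way to realize the distribution-aware criterion is to weight each candidate's uncertainty by a density estimate from the generative teacher, so uncertain-but-isolated outliers stop dominating the query. A toy sketch under that assumption (the entropy-times-KDE score is an illustrative choice, not the paper's exact formulation):

```python
import numpy as np

def distribution_aware_scores(probs, X_pool, bandwidth=1.0):
    """Score = predictive entropy x teacher density.  A Gaussian KDE
    over the unlabeled pool stands in for the probabilistic generative
    teacher; outliers get low density and hence low query priority."""
    eps = 1e-12
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Pairwise squared distances within the pool, then a Gaussian KDE.
    d2 = np.sum((X_pool[:, None, :] - X_pool[None, :, :]) ** 2, axis=-1)
    density = np.mean(np.exp(-d2 / (2.0 * bandwidth ** 2)), axis=1)
    return entropy * density   # query the argmax of this score
```

With this score, a far-away outlier that the classifier is maximally uncertain about still ranks below an equally uncertain sample inside a dense cluster, which is exactly the robustness property the abstract claims.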
Learning Descriptor Networks for 3D Shape Synthesis and Analysis
Title | Learning Descriptor Networks for 3D Shape Synthesis and Analysis |
Authors | Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, Ying Nian Wu |
Abstract | This paper proposes a 3D shape descriptor network, which is a deep convolutional energy-based model, for modeling volumetric shape patterns. The maximum likelihood training of the model follows an “analysis by synthesis” scheme and can be interpreted as a mode seeking and mode shifting process. The model can synthesize 3D shape patterns by sampling from the probability distribution via MCMC such as Langevin dynamics. The model can be used to train a 3D generator network via MCMC teaching. The conditional version of the 3D shape descriptor net can be used for 3D object recovery and 3D object super-resolution. Experiments demonstrate that the proposed model can generate realistic 3D shape patterns and can be useful for 3D shape analysis. |
Tasks | 3D Object Super-Resolution, 3D Shape Analysis, Super-Resolution |
Published | 2018-04-02 |
URL | http://arxiv.org/abs/1804.00586v1 |
http://arxiv.org/pdf/1804.00586v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-descriptor-networks-for-3d-shape |
Repo | |
Framework | |
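The MCMC sampling the abstract mentions is Langevin dynamics: noisy gradient descent on the energy whose stationary distribution is proportional to exp(-E(x)). A one-dimensional toy sketch (with the quadratic energy E(x) = x²/2 standing in for the descriptor net's learned energy; function names are assumptions):

```python
import numpy as np

def langevin_sample(grad_energy, n_samples, n_steps=2000, step=0.01, seed=0):
    """Langevin dynamics on an energy E(x): each update takes a small
    gradient step on E plus Gaussian noise, so the chain's stationary
    distribution is proportional to exp(-E(x)).  In the paper the
    gradient would come from backprop through the descriptor CNN."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)   # arbitrary initialization
    for _ in range(n_steps):
        noise = rng.standard_normal(n_samples)
        x = x - 0.5 * step * grad_energy(x) + np.sqrt(step) * noise
    return x
```

For E(x) = x²/2 the gradient is x and the samples should settle into a standard Gaussian, which is a quick sanity check before swapping in a learned energy.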
Shapes Characterization on Address Event Representation Using Histograms of Oriented Events and an Extended LBP Approach
Title | Shapes Characterization on Address Event Representation Using Histograms of Oriented Events and an Extended LBP Approach |
Authors | Pablo Negri |
Abstract | Address Event Representation is a thriving technology that could change the digital image processing paradigm. This paper proposes a methodology to characterize the shape of objects using the stream of asynchronous events. A new descriptor that enhances spike connectivity is associated with two oriented-histogram-based representations. This paper uses these features to develop both a non-supervised and a supervised multi-classification framework to recognize poker symbols from the Poker-DVS public dataset. The aforementioned framework, which uses a very limited number of events and simple class modeling, yields results that challenge more sophisticated methodologies proposed by the state of the art. A feature family based on context shapes is applied to the more challenging 2015 Poker-DVS dataset with a supervised classifier, obtaining an accuracy of 98.5%. The system is also applied to the MNIST-DVS dataset, yielding accuracies of 94.6% and 96.3% on digit recognition, for scales 4 and 8 respectively. |
Tasks | |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03327v1 |
http://arxiv.org/pdf/1802.03327v1.pdf | |
PWC | https://paperswithcode.com/paper/shapes-characterization-on-address-event |
Repo | |
Framework | |
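The oriented-histogram representation can be illustrated in a heavily simplified form: take the orientation of the displacement between consecutive events and accumulate a normalized orientation histogram (an illustrative reduction of the paper's features, not its actual descriptor):

```python
import numpy as np

def hoe_descriptor(xs, ys, n_bins=8):
    """Simplified histogram-of-oriented-events: orientation of the
    displacement between consecutive events in the stream, binned
    into an L1-normalized n_bins-bin histogram over (-pi, pi]."""
    dx = np.diff(np.asarray(xs, dtype=float))
    dy = np.diff(np.asarray(ys, dtype=float))
    theta = np.arctan2(dy, dx)
    hist, _ = np.histogram(theta, bins=n_bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)
```

Such histograms are cheap to update event by event, which matches the abstract's point that a very limited number of events suffices for classification.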
Discriminative Feature Learning for Unsupervised Video Summarization
Title | Discriminative Feature Learning for Unsupervised Video Summarization |
Authors | Yunjae Jung, Donghyeon Cho, Dahun Kim, Sanghyun Woo, In So Kweon |
Abstract | In this paper, we address the problem of unsupervised video summarization that automatically extracts key-shots from an input video. Specifically, we tackle two critical issues based on our empirical observations: (i) ineffective feature learning due to flat distributions of output importance scores for each frame, and (ii) training difficulty when dealing with long-length video inputs. To alleviate the first problem, we propose a simple yet effective regularization loss term called variance loss. The proposed variance loss allows a network to predict output scores for each frame with high discrepancy, which enables effective feature learning and significantly improves model performance. For the second problem, we design a novel two-stream network named Chunk and Stride Network (CSNet) that utilizes local (chunk) and global (stride) temporal views of the video features. Our CSNet gives better summarization results for long-length videos compared to the existing methods. In addition, we introduce an attention mechanism to handle the dynamic information in videos. We demonstrate the effectiveness of the proposed methods by conducting extensive ablation studies and show that our final model achieves new state-of-the-art results on two benchmark datasets. |
Tasks | Supervised Video Summarization, Unsupervised Video Summarization, Video Summarization |
Published | 2018-11-24 |
URL | http://arxiv.org/abs/1811.09791v1 |
http://arxiv.org/pdf/1811.09791v1.pdf | |
PWC | https://paperswithcode.com/paper/discriminative-feature-learning-for |
Repo | |
Framework | |
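The variance loss idea, penalizing flat per-frame score distributions, can be sketched as a reciprocal-spread penalty (a hedged sketch; the exact form in the paper may differ, and the epsilon is an assumed stabilizer):

```python
import numpy as np

def variance_loss(scores, eps=1e-4):
    """Penalize flat frame-importance distributions: the flatter the
    predicted scores, the smaller their spread around the median and
    the larger this loss, pushing the network toward high-discrepancy
    scores that separate key-shots from the rest."""
    s = np.asarray(scores, dtype=float)
    spread = np.mean((s - np.median(s)) ** 2)
    return 1.0 / (spread + eps)
```

A flat score vector is maximally penalized, while an alternating high/low score pattern (clear key-shot vs. background separation) gets a much smaller loss.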