January 29, 2020

3616 words 17 mins read

Paper Group ANR 737

DeepDistance: A Multi-task Deep Regression Model for Cell Detection in Inverted Microscopy Images. Spherical Principal Component Analysis. GFD-SSD: Gated Fusion Double SSD for Multispectral Pedestrian Detection. Active learning to optimise time-expensive algorithm selection. Short-Term Temporal Convolutional Networks for Dynamic Hand Gesture Recogn …

DeepDistance: A Multi-task Deep Regression Model for Cell Detection in Inverted Microscopy Images

Title DeepDistance: A Multi-task Deep Regression Model for Cell Detection in Inverted Microscopy Images
Authors Can Fahrettin Koyuncu, Gozde Nur Gunesli, Rengul Cetin-Atalay, Cigdem Gunduz-Demir
Abstract This paper presents a new deep regression model, which we call DeepDistance, for cell detection in images acquired with inverted microscopy. This model considers cell detection as the task of finding the most probable locations that suggest cell centers in an image. It represents this main task as a regression task of learning an inner distance metric. However, different from previously reported regression-based methods, the DeepDistance model approaches its learning as a multi-task regression problem where multiple tasks are learned using shared feature representations. To this end, it defines a secondary metric, the normalized outer distance, to represent a different aspect of the problem and defines its learning as complementary to the main cell detection task. To learn these two complementary tasks more effectively, the DeepDistance model designs a fully convolutional network (FCN) with a shared encoder path and trains this FCN end-to-end to learn the tasks concurrently. DeepDistance uses the inner distances estimated by this FCN in a detection algorithm to locate individual cells in a given image. For further performance improvement on the main task, this paper also presents an extended version of the DeepDistance model, which includes an auxiliary classification task and learns it in parallel to the two regression tasks by sharing feature representations with them. Our experiments on three different human cell lines reveal that the proposed multi-task learning models, the DeepDistance model and its extended version, successfully identify cell locations, even for a cell line not used in training, and improve on the results of previous deep learning methods.
Tasks Multi-Task Learning
Published 2019-08-29
URL https://arxiv.org/abs/1908.11211v1
PDF https://arxiv.org/pdf/1908.11211v1.pdf
PWC https://paperswithcode.com/paper/deepdistance-a-multi-task-deep-regression
Repo
Framework
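
To make the shared-encoder idea concrete, here is a minimal PyTorch sketch of a multi-task FCN with one shared encoder and two regression decoders, in the spirit of the DeepDistance entry above. All layer sizes, the placeholder targets, and the head names are illustrative assumptions, not the paper's architecture.

```python
# Hypothetical sketch: shared-encoder FCN with two regression heads.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class MultiTaskFCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = conv_block(1, 16), conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        # one decoder per task, both fed by the shared encoder
        self.dec_inner = nn.Sequential(conv_block(32, 16), nn.Conv2d(16, 1, 1))
        self.dec_outer = nn.Sequential(conv_block(32, 16), nn.Conv2d(16, 1, 1))

    def forward(self, x):
        f = self.pool(self.enc2(self.pool(self.enc1(x))))
        f = self.up(self.up(f))
        return self.dec_inner(f), self.dec_outer(f)  # two distance maps

model = MultiTaskFCN()
inner, outer = model(torch.randn(1, 1, 64, 64))
# joint regression loss with placeholder (all-zero) target maps
loss = nn.MSELoss()(inner, torch.zeros_like(inner)) + \
       nn.MSELoss()(outer, torch.zeros_like(outer))
```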

Spherical Principal Component Analysis

Title Spherical Principal Component Analysis
Authors Kai Liu, Qiuwei Li, Hua Wang, Gongguo Tang
Abstract Principal Component Analysis (PCA) is one of the most important methods for handling high-dimensional data. However, most studies of PCA aim to minimize the loss after projection, where the loss usually measures Euclidean distance, even though in some fields angle distance is known to be more important and critical for analysis. In this paper, we propose a method that adds constraints on the factors to unify Euclidean distance and angle distance. However, due to the nonconvexity of the objective and constraints, an optimal solution is not easy to obtain. We propose an alternating linearized minimization method to solve the problem with a provable convergence rate and guarantee. Experiments on synthetic data and real-world datasets validate the effectiveness of our method and demonstrate its advantages over state-of-the-art clustering methods.
Tasks
Published 2019-03-16
URL http://arxiv.org/abs/1903.06877v1
PDF http://arxiv.org/pdf/1903.06877v1.pdf
PWC https://paperswithcode.com/paper/spherical-principal-component-analysis
Repo
Framework
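
As a toy illustration of constraining PCA factors to the unit sphere, the following sketch runs a naive alternating least squares with a sphere projection after each update. This heuristic is an assumption for illustration only; the paper's method is an alternating linearized minimization with convergence guarantees, which this sketch does not reproduce.

```python
# Toy sketch: low-rank factorization with unit-norm (spherical) loadings.
import numpy as np

def spherical_pca(X, k, iters=100):
    d = X.shape[1]
    rng = np.random.default_rng(0)
    V = rng.standard_normal((d, k))
    V /= np.linalg.norm(V, axis=0, keepdims=True)
    for _ in range(iters):
        U = X @ V @ np.linalg.pinv(V.T @ V)            # least-squares scores
        V = X.T @ U @ np.linalg.pinv(U.T @ U)          # least-squares loadings
        V /= np.linalg.norm(V, axis=0, keepdims=True)  # project to unit sphere
    return U, V

X = np.random.default_rng(1).standard_normal((200, 10))
U, V = spherical_pca(X, k=3)
print(np.linalg.norm(X - U @ V.T))  # reconstruction error
```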

GFD-SSD: Gated Fusion Double SSD for Multispectral Pedestrian Detection

Title GFD-SSD: Gated Fusion Double SSD for Multispectral Pedestrian Detection
Authors Yang Zheng, Izzat H. Izzat, Shahrzad Ziaee
Abstract Pedestrian detection is an essential task in autonomous driving research. In addition to typical color images, thermal images benefit detection in dark environments. Hence, it is worthwhile to explore an integrated approach that takes advantage of both color and thermal images simultaneously. In this paper, we propose a novel approach to fuse color and thermal sensors using deep neural networks (DNN). Current state-of-the-art DNN object detectors vary from two-stage to one-stage mechanisms. Two-stage detectors, like Faster-RCNN, achieve higher accuracy, while one-stage detectors such as the Single Shot Detector (SSD) demonstrate faster performance. To balance this trade-off, especially in consideration of autonomous driving applications, we investigate a fusion strategy that combines two SSDs on color and thermal inputs. Traditional fusion methods stack selected features from each channel and adjust their weights. In this paper, we propose two variations of a novel Gated Fusion Unit (GFU) that learn the combination of feature maps generated by the middle layers of the two SSDs. Leveraging GFUs across the entire feature pyramid structure, we propose several mixed versions of both stack fusion and gated fusion. Experiments are conducted on the KAIST multispectral pedestrian detection dataset. Our Gated Fusion Double SSD (GFD-SSD) outperforms stacked fusion and achieves the lowest miss rate in the benchmark, at an inference speed two times faster than Faster-RCNN-based fusion networks.
Tasks Autonomous Driving, Pedestrian Detection
Published 2019-03-16
URL http://arxiv.org/abs/1903.06999v2
PDF http://arxiv.org/pdf/1903.06999v2.pdf
PWC https://paperswithcode.com/paper/gfd-ssd-gated-fusion-double-ssd-for
Repo
Framework
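
A minimal sketch of a gated fusion unit in the spirit of the GFU described above: a learned sigmoid gate mixes color and thermal feature maps element-wise. The layer choices and the exact gate form are assumptions, not the paper's design.

```python
# Hypothetical gated fusion sketch for two-modality feature maps.
import torch
import torch.nn as nn

class GatedFusionUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # gate computed from the concatenated modalities
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, f_color, f_thermal):
        g = self.gate(torch.cat([f_color, f_thermal], dim=1))
        return g * f_color + (1.0 - g) * f_thermal  # convex combination

f_c = torch.randn(1, 256, 38, 38)  # e.g. an SSD middle-layer feature map
f_t = torch.randn(1, 256, 38, 38)
fused = GatedFusionUnit(256)(f_c, f_t)
```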

Active learning to optimise time-expensive algorithm selection

Title Active learning to optimise time-expensive algorithm selection
Authors Riccardo Volpato, Guangyan Song
Abstract Hard optimisation problems such as Boolean Satisfiability typically have long solving times and can usually be solved by many algorithms, although performance can vary widely in practice. Research has shown that no single algorithm outperforms all the others; thus, it is crucial to select the best algorithm for a given problem. Supervised machine learning models can accurately predict which solver is best for a given problem, but they first require running every solver in the portfolio on all available examples to create labelled data. As this approach cannot scale, we developed an active learning framework that addresses the problem by constructing an optimal training set, so that the learner can achieve equal or better performance with less training data. Our work shows that active learning is beneficial for algorithm selection techniques and provides practical guidance for incorporating it into existing systems.
Tasks Active Learning
Published 2019-09-07
URL https://arxiv.org/abs/1909.03261v1
PDF https://arxiv.org/pdf/1909.03261v1.pdf
PWC https://paperswithcode.com/paper/active-learning-to-optimise-time-expensive
Repo
Framework
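
For illustration, here is a minimal pool-based active learning loop for algorithm selection that queries the instance the current model is least confident about. The least-confidence criterion, the random-forest selector, and the toy data are all assumptions; the paper evaluates its own query strategies on real solver portfolios.

```python
# Toy sketch: uncertainty-sampling loop for an algorithm selector.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 8))   # problem-instance features
y = (X[:, 0] > 0).astype(int)       # index of best solver (toy labels)

labeled = list(range(20))           # seed set: instances already solved
pool = [i for i in range(len(X)) if i not in labeled]
model = RandomForestClassifier(random_state=0)

for _ in range(20):                 # query budget
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    uncertainty = 1.0 - proba.max(axis=1)      # least-confidence score
    pick = pool[int(np.argmax(uncertainty))]
    labeled.append(pick)            # "run every solver" on the picked instance
    pool.remove(pick)

print(model.score(X, y))
```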

Short-Term Temporal Convolutional Networks for Dynamic Hand Gesture Recognition

Title Short-Term Temporal Convolutional Networks for Dynamic Hand Gesture Recognition
Authors Yi Zhang, Chong Wang, Ye Zheng, Jieyu Zhao, Yuqi Li, Xijiong Xie
Abstract The purpose of gesture recognition is to recognize meaningful movements of human bodies, and gesture recognition is an important issue in computer vision. In this paper, we present a multimodal gesture recognition method based on 3D densely connected convolutional networks (3D-DenseNets) and improved temporal convolutional networks (TCNs). The key idea of our approach is to find a compact and effective representation of spatial and temporal features, dividing the task of gesture video analysis into two separate, ordered parts: spatial analysis and temporal analysis. In the spatial analysis, we adopt 3D-DenseNets to learn short-term spatio-temporal features effectively. Subsequently, in the temporal analysis, we use TCNs to extract temporal features and employ improved Squeeze-and-Excitation Networks (SENets) to strengthen the representational power of the temporal features from each TCN layer. The method has been evaluated on the VIVA and NVIDIA Dynamic Hand Gesture datasets. Our approach obtains very competitive performance on the VIVA benchmark with a classification accuracy of 91.54%, and achieves state-of-the-art performance with 86.37% accuracy on the NVIDIA benchmark.
Tasks Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition
Published 2019-12-31
URL https://arxiv.org/abs/2001.05833v1
PDF https://arxiv.org/pdf/2001.05833v1.pdf
PWC https://paperswithcode.com/paper/short-term-temporal-convolutional-networks
Repo
Framework
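
A small sketch of a dilated temporal convolution block gated by a squeeze-and-excitation module over channels, loosely following the TCN+SENet stage described above; all sizes and the exact placement of the SE gate are assumptions.

```python
# Hypothetical sketch: dilated Conv1d temporal block with SE channel gating.
import torch
import torch.nn as nn

class SE(nn.Module):
    def __init__(self, channels, r=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r), nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels), nn.Sigmoid())

    def forward(self, x):              # x: (B, C, T)
        w = self.fc(x.mean(dim=2))     # squeeze over time, excite channels
        return x * w.unsqueeze(-1)

class TemporalBlock(nn.Module):
    def __init__(self, c_in, c_out, dilation):
        super().__init__()
        self.conv = nn.Conv1d(c_in, c_out, 3, padding=dilation, dilation=dilation)
        self.relu = nn.ReLU(inplace=True)
        self.se = SE(c_out)

    def forward(self, x):
        return self.se(self.relu(self.conv(x)))

feats = torch.randn(2, 64, 32)         # (batch, spatial features, frames)
out = TemporalBlock(64, 64, dilation=2)(feats)
```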

Understanding Goal-Oriented Active Learning via Influence Functions

Title Understanding Goal-Oriented Active Learning via Influence Functions
Authors Minjie Xu, Gary Kazantsev
Abstract Active learning (AL) concerns itself with learning a model from as few labelled examples as possible by actively and iteratively querying an oracle with selected unlabelled samples. In this paper, we focus on analyzing a popular type of AL in which the utility of a sample is measured by a specified goal achieved by the retrained model after accounting for the sample’s marginal influence. Such AL strategies attract a lot of attention thanks to their intuitive motivations, yet they also suffer from impractically high computational costs due to their need for many iterations of model retraining. With the help of influence functions, we present an effective approximation that bypasses model retraining altogether, and propose a general efficient implementation that makes such AL strategies applicable in practice, both in the serial and the more challenging batch-mode setting. Additionally, we present both theoretical and empirical findings which call into question a few common practices and beliefs about such AL strategies.
Tasks Active Learning
Published 2019-05-30
URL https://arxiv.org/abs/1905.13183v3
PDF https://arxiv.org/pdf/1905.13183v3.pdf
PWC https://paperswithcode.com/paper/understanding-goal-oriented-active-learning
Repo
Framework
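
To illustrate the retraining-free idea, the following sketch scores candidate samples with the classic influence-function approximation for ridge regression, where the Hessian is available in closed form. The ridge model, toy data, and sign convention are assumptions for illustration; the paper's goal-oriented criteria and batch-mode machinery are not reproduced here.

```python
# Toy sketch: influence-function scoring without model retraining.
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.standard_normal((100, 5)), rng.standard_normal(100)
lam = 0.1

H = X.T @ X / len(X) + lam * np.eye(5)            # Hessian of the ridge loss
theta = np.linalg.solve(H, X.T @ y / len(X))      # ridge solution

def grad(x, t):                                   # per-sample loss gradient
    return (x @ theta - t) * x

x_test, y_test = rng.standard_normal(5), 0.0
Hinv_g_test = np.linalg.solve(H, grad(x_test, y_test))

# influence of upweighting sample i on the test loss: -g_test^T H^{-1} g_i
scores = np.array([-Hinv_g_test @ grad(X[i], y[i]) for i in range(len(X))])
print(scores[:5])   # higher score => more helpful to query (under this toy goal)
```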

Channel Max Pooling Layer for Fine-Grained Vehicle Classification

Title Channel Max Pooling Layer for Fine-Grained Vehicle Classification
Authors Zhanyu Ma, Dongliang Chang, Xiaoxu Li
Abstract Deep convolutional networks have recently shown excellent performance on fine-grained vehicle classification. Building on these existing works, we observe that the back-propagation algorithm does not try to extract features that are as little discriminative as possible, but only drives the loss function to zero. Intuitively, if we can learn less discriminative features that still fit the training data well, the generalization ability of the neural network could be improved. Therefore, we propose a new layer, called Channel Max Pooling, which is placed between the convolutional layers and the fully connected layers. The proposed layer first groups the feature maps and then compresses each group into a new feature map by computing the maximum over pixels at the same positions within the group. Meanwhile, the proposed layer has the advantage of helping the neural network shed a massive number of parameters. Experimental results on two fine-grained vehicle datasets, the Stanford Cars-196 dataset and the CompCars dataset, demonstrate that the proposed layer improves the classification accuracy of deep neural networks on fine-grained vehicle classification while drastically reducing the number of parameters. Moreover, it performs competitively with the state of the art on both datasets.
Tasks
Published 2019-02-14
URL https://arxiv.org/abs/1902.11107v2
PDF https://arxiv.org/pdf/1902.11107v2.pdf
PWC https://paperswithcode.com/paper/channel-max-pooling-layer-for-fine-grained
Repo
Framework
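
The layer described above has a direct tensor formulation: split the channels into groups and take an element-wise maximum within each group. A minimal PyTorch sketch, where the group size is a free parameter:

```python
# Channel max pooling: C channels -> C // group_size channels.
import torch

def channel_max_pool(x, group_size):   # x: (B, C, H, W)
    b, c, h, w = x.shape
    assert c % group_size == 0
    # group the channels, then take the element-wise max within each group
    return x.view(b, c // group_size, group_size, h, w).max(dim=2).values

x = torch.randn(2, 512, 7, 7)
print(channel_max_pool(x, 4).shape)     # torch.Size([2, 128, 7, 7])
```

Because the following fully connected layer now sees a quarter of the channels, its weight matrix shrinks accordingly, which is where the parameter reduction comes from.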

Code-Switching Detection Using ASR-Generated Language Posteriors

Title Code-Switching Detection Using ASR-Generated Language Posteriors
Authors Qinyi Wang, Emre Yılmaz, Adem Derinel, Haizhou Li
Abstract Code-switching (CS) detection refers to the automatic detection of language switches in code-mixed utterances. This task can be achieved by using a CS automatic speech recognition (ASR) system that can handle such language switches. In our previous work, we investigated the code-switching detection performance of a Frisian-Dutch CS ASR system by using the time alignment of the most likely hypothesis and found that this technique suffers from over-switching due to numerous very short spurious language switches. In this paper, we propose a novel method for CS detection that aims to remedy this shortcoming by using language posteriors, which are the sums of the frame-level posteriors of the phones belonging to each language. The CS ASR-generated language posteriors contain more complete language-specific information at the frame level compared to the time alignment of the ASR output, and are hence expected to yield more accurate and robust CS detection. The CS detection experiments demonstrate that the proposed language posterior-based approach provides higher detection accuracy than the baseline system in terms of equal error rate. Moreover, a detailed CS detection error analysis reveals that using language posteriors reduces false alarms and results in more robust CS detection.
Tasks Speech Recognition
Published 2019-06-19
URL https://arxiv.org/abs/1906.08003v1
PDF https://arxiv.org/pdf/1906.08003v1.pdf
PWC https://paperswithcode.com/paper/code-switching-detection-using-asr-generated
Repo
Framework
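
The language posterior described above is simply a sum of frame-level phone posteriors over each language's phone set. A minimal sketch with a hypothetical Frisian/Dutch phone split and random posteriors standing in for ASR output:

```python
# Sketch: per-frame language posteriors from phone posteriors.
import numpy as np

rng = np.random.default_rng(0)
phone_post = rng.dirichlet(np.ones(40), size=300)    # (frames, phones)
is_frisian = np.zeros(40, dtype=bool)
is_frisian[:22] = True                               # hypothetical phone split

p_frisian = phone_post[:, is_frisian].sum(axis=1)    # per-frame language post.
p_dutch = phone_post[:, ~is_frisian].sum(axis=1)

# candidate code-switch boundaries: frames where the dominant language flips
switches = np.flatnonzero(np.diff((p_frisian > 0.5).astype(int)))
```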

Defense Against Adversarial Images using Web-Scale Nearest-Neighbor Search

Title Defense Against Adversarial Images using Web-Scale Nearest-Neighbor Search
Authors Abhimanyu Dubey, Laurens van der Maaten, Zeki Yalniz, Yixuan Li, Dhruv Mahajan
Abstract A plethora of recent work has shown that convolutional networks are not robust to adversarial images: images created by perturbing a sample from the data distribution so as to maximize the loss on the perturbed example. In this work, we hypothesize that adversarial perturbations move the image away from the image manifold, in the sense that no physical process could have produced the adversarial image. This hypothesis suggests that a successful defense mechanism against adversarial images should aim to project the images back onto the image manifold. We study such defense mechanisms, which approximate the projection onto the unknown image manifold by a nearest-neighbor search against a web-scale image database containing tens of billions of images. Empirical evaluations of this defense strategy on ImageNet suggest that it is very effective in attack settings in which the adversary does not have access to the image database. We also propose two novel attack methods to break nearest-neighbor defenses and demonstrate conditions under which the nearest-neighbor defense fails. We perform a series of ablation experiments, which suggest that there is a trade-off between robustness and accuracy in our defenses, that a large image database (with hundreds of millions of images) is crucial for good performance, and that careful construction of the image database is important for robustness against attacks tailored to circumvent our defenses.
Tasks
Published 2019-03-05
URL https://arxiv.org/abs/1903.01612v2
PDF https://arxiv.org/pdf/1903.01612v2.pdf
PWC https://paperswithcode.com/paper/defense-against-adversarial-images-using-web
Repo
Framework
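
A minimal sketch of the projection idea described above: approximate the projection onto the image manifold by averaging the k nearest neighbors of an input's feature in a clean database. The brute-force cosine search and plain feature averaging are assumptions that stand in for the paper's web-scale index and combination rules.

```python
# Sketch: nearest-neighbor "projection" of a (possibly adversarial) feature.
import numpy as np

rng = np.random.default_rng(0)
database = rng.standard_normal((10000, 128))           # clean image features
database /= np.linalg.norm(database, axis=1, keepdims=True)

def project_to_manifold(feat, k=50):
    feat = feat / np.linalg.norm(feat)
    sims = database @ feat                             # cosine similarities
    nn_idx = np.argpartition(-sims, k)[:k]             # top-k neighbors
    return database[nn_idx].mean(axis=0)               # averaged neighbor feature

adv_feat = rng.standard_normal(128)
defended = project_to_manifold(adv_feat)               # classify this instead
```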

Deep learning for Plankton and Coral Classification

Title Deep learning for Plankton and Coral Classification
Authors Alessandra Lumini, Loris Nanni, Gianluca Maguolo
Abstract Oceans are the essential lifeblood of the Earth: they provide over 70% of the oxygen and over 97% of the water. Plankton and corals are two of the most fundamental components of ocean ecosystems, the former due to their function at many levels of the ocean food chain, the latter because they provide spawning and nursery grounds to many fish populations. Studying and monitoring plankton distribution and coral reefs is vital for environmental protection. In recent years there has been a massive proliferation of digital imagery for the monitoring of underwater ecosystems, and much research has concentrated on the automated recognition of plankton and corals. In this paper, we present a study of an automated system for monitoring underwater ecosystems. The proposed system is based on the fusion of different deep learning methods. We study how to create an ensemble of different CNN models, fine-tuned on several datasets, with the aim of exploiting their diversity. Our study explores the possibility of fine-tuning pretrained CNNs for underwater imagery analysis, the opportunity of using different datasets for pretraining models, and the possibility of designing an ensemble using the same architecture with small variations in the training procedure. The experimental results are very encouraging: our experiments on 5 well-known datasets (3 plankton and 2 coral datasets) show that fusing such different CNN models in a heterogeneous ensemble grants a substantial performance improvement with respect to other state-of-the-art approaches on all the tested problems. One of the main contributions of this work is a wide experimental evaluation of well-known CNN architectures, reporting the performance of both single CNNs and ensembles of CNNs on different problems. Moreover, we show how to create an ensemble that improves on the performance of the best single model.
Tasks
Published 2019-08-15
URL https://arxiv.org/abs/1908.05489v2
PDF https://arxiv.org/pdf/1908.05489v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-plankton-and-coral
Repo
Framework
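
Score-level fusion of a heterogeneous ensemble can be as simple as averaging softmax outputs. A minimal sketch, assuming each fine-tuned CNN exposes an (n_samples, n_classes) probability matrix; the paper's exact fusion rule may differ.

```python
# Sketch: softmax averaging across ensemble members.
import numpy as np

def fuse(prob_list):
    # prob_list: one (n_samples, n_classes) softmax matrix per ensemble member
    return np.mean(np.stack(prob_list, axis=0), axis=0)

rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(10), size=100) for _ in range(3)]  # stand-ins
pred = fuse(probs).argmax(axis=1)   # fused class decisions
```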

Towards Real-Time Action Recognition on Mobile Devices Using Deep Models

Title Towards Real-Time Action Recognition on Mobile Devices Using Deep Models
Authors Chen-Lin Zhang, Xin-Xin Liu, Jianxin Wu
Abstract Action recognition is a vital task in computer vision, and many methods have been developed to push it to the limit. However, current action recognition models have huge computational costs, which prevents their deployment in real-world tasks on mobile devices. In this paper, we first describe the setting of real-time action recognition, which differs from current action recognition inference settings. Under this new inference setting, we empirically investigate state-of-the-art action recognition models on the Kinetics dataset. Our results show that designing efficient real-time action recognition models is different from designing efficient ImageNet models, especially with respect to weight initialization. We show that weights pre-trained on ImageNet improve accuracy under the real-time action recognition setting. Finally, we use the hand gesture recognition task as a case study to evaluate our compact real-time action recognition models in real-world applications on mobile phones. Results show that our action recognition models, being 6x faster with accuracy similar to the state of the art, can roughly meet the real-time requirements on mobile devices. To the best of our knowledge, this is the first paper to deploy current deep learning action recognition models on mobile devices.
Tasks Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition
Published 2019-06-17
URL https://arxiv.org/abs/1906.07052v1
PDF https://arxiv.org/pdf/1906.07052v1.pdf
PWC https://paperswithcode.com/paper/towards-real-time-action-recognition-on
Repo
Framework

Tensor Analysis with n-Mode Generalized Difference Subspace

Title Tensor Analysis with n-Mode Generalized Difference Subspace
Authors Bernardo B. Gatto, Eulanda M. dos Santos, Alessandro L. Koerich, Kazuhiro Fukui, Waldir S. S. Junior
Abstract The increasing use of multiple sensors requires more efficient methods to represent and classify multi-dimensional data, since these applications produce a large amount of data, demanding modern techniques for data processing. Considering these observations, we present in this paper a new method for multi-dimensional data classification which relies on two premises: 1) multi-dimensional data are usually represented by tensors, due to benefits from multilinear algebra and the established tensor factorization methods; and 2) this kind of data can be described by a subspace lying within a vector space. Subspace representation has been consistently employed for pattern-set recognition, and its tensor representation counterpart is also available in the literature. However, traditional methods do not employ discriminative information of the tensors, which degrades the classification accuracy. In this scenario, generalized difference subspace (GDS) may provide an enhanced subspace representation by reducing data redundancy and revealing discriminative structures. Since GDS is not able to directly handle tensor data, we propose a new projection called n-mode GDS, which efficiently handles tensor data. In addition, n-mode Fisher score is introduced as a class separability index and an improved metric based on the geodesic distance is provided to measure the similarity between tensor data. To confirm the advantages of the proposed method, we address the problem of representing and classifying tensor data for gesture and action recognition. The experimental results have shown that the proposed approach outperforms methods commonly used in the literature without adopting pre-trained models or transfer learning.
Tasks Transfer Learning
Published 2019-09-04
URL https://arxiv.org/abs/1909.01954v1
PDF https://arxiv.org/pdf/1909.01954v1.pdf
PWC https://paperswithcode.com/paper/tensor-analysis-with-n-mode-generalized
Repo
Framework
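
As a toy illustration of a generalized difference subspace for two class subspaces: sum their projection matrices, eigendecompose, and keep the directions with the smallest nonzero eigenvalues, which carry the discriminative (difference) structure rather than the commonality. The two-class setting and the dimension choice are assumptions; the paper's n-mode version applies such constructions per tensor mode.

```python
# Toy sketch: difference subspace from summed class projection matrices.
import numpy as np

def gds(bases, dim):
    # bases: list of (D, k) orthonormal bases, one per class
    G = sum(B @ B.T for B in bases)        # sum of projection matrices
    evals, evecs = np.linalg.eigh(G)       # eigenvalues in ascending order
    nonzero = np.flatnonzero(evals > 1e-10)  # discard the null space
    return evecs[:, nonzero[:dim]]         # smallest nonzero eigen-directions

rng = np.random.default_rng(0)
B1, _ = np.linalg.qr(rng.standard_normal((20, 3)))  # class-1 subspace basis
B2, _ = np.linalg.qr(rng.standard_normal((20, 3)))  # class-2 subspace basis
D = gds([B1, B2], dim=4)                   # (20, 4) discriminative basis
```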

Progressive Cross-camera Soft-label Learning for Semi-supervised Person Re-identification

Title Progressive Cross-camera Soft-label Learning for Semi-supervised Person Re-identification
Authors Lei Qi, Lei Wang, Jing Huo, Yinghuan Shi, Yang Gao
Abstract In this paper, we focus on the semi-supervised person re-identification (Re-ID) case, which has only intra-camera (within-camera) labels but no inter-camera (cross-camera) labels. In real-world applications, these intra-camera labels can be readily captured by tracking algorithms or a few manual annotations, compared with cross-camera labels. In this case, it is very difficult to explore the relationships between cross-camera persons in the training stage due to the lack of cross-camera label information. To deal with this issue, we propose a novel Progressive Cross-camera Soft-label Learning (PCSL) framework for the semi-supervised person Re-ID task, which can generate cross-camera soft-labels and utilize them to optimize the network. Concretely, we calculate an affinity matrix based on person-level features and adapt it to produce the similarities between cross-camera persons (i.e., cross-camera soft-labels). To exploit these soft-labels to train the network, we investigate the weighted cross-entropy loss and the weighted triplet loss from the classification and discrimination perspectives, respectively. In particular, the proposed framework alternately generates progressive cross-camera soft-labels and gradually improves feature representations over the whole learning course. Extensive experiments on five large-scale benchmark datasets show that PCSL significantly outperforms state-of-the-art unsupervised methods that employ labeled source domains or images generated by GAN-based models. Furthermore, the proposed method achieves performance competitive with deep supervised Re-ID methods.
Tasks Person Re-Identification, Semi-Supervised Person Re-Identification
Published 2019-08-15
URL https://arxiv.org/abs/1908.05669v2
PDF https://arxiv.org/pdf/1908.05669v2.pdf
PWC https://paperswithcode.com/paper/progressive-cross-camera-soft-label-learning
Repo
Framework
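
A minimal sketch of cross-camera soft-label generation as sketched in the entry above: build an affinity matrix from person-level features and turn each row into a soft label for a weighted cross-entropy. The cosine affinity, the temperature, and the identity count are illustrative assumptions.

```python
# Sketch: affinity matrix -> soft labels -> weighted cross-entropy.
import torch
import torch.nn.functional as F

feats = F.normalize(torch.randn(128, 256), dim=1)   # person-level features
affinity = feats @ feats.t()                        # cosine affinity matrix
# each row becomes a soft label over identities (self-similarity kept for brevity)
soft_labels = F.softmax(affinity / 0.1, dim=1)

logits = torch.randn(128, 128)                      # scores vs. 128 identities
log_p = F.log_softmax(logits, dim=1)
weighted_ce = -(soft_labels * log_p).sum(dim=1).mean()
```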

RFBTD: RFB Text Detector

Title RFBTD: RFB Text Detector
Authors Christen M, AB Saravanan
Abstract Text detection plays a critical role in the whole procedure of textual information extraction and understanding. Recent years have seen a surge of high-recall text detectors for scene text images; however, predicting text boxes for individual words remains challenging when dense text is present in the scene. In this work, we propose an elegant solution that promotes the prediction of words or text lines of arbitrary orientations and directions, with emphasis on individual words. We also investigate the effects of Receptive Field Blocks (RFB) and their impact on the receptive fields for text segments. Experiments were conducted on ICDAR2015, achieving an F-score of 47.09 at 720p.
Tasks
Published 2019-07-04
URL https://arxiv.org/abs/1907.02228v1
PDF https://arxiv.org/pdf/1907.02228v1.pdf
PWC https://paperswithcode.com/paper/rfbtd-rfb-text-detector
Repo
Framework
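
For intuition about receptive field blocks, here is a hypothetical RFB-style module: parallel 3x3 branches with increasing dilation emulate receptive fields of different sizes before a 1x1 merge and a residual add. The branch widths and dilation rates are assumptions and follow no particular paper.

```python
# Hypothetical multi-dilation block in the style of a receptive field block.
import torch
import torch.nn as nn

class RFBLike(nn.Module):
    def __init__(self, c_in, c_branch):
        super().__init__()
        # one branch per dilation rate; padding = dilation keeps spatial size
        self.branches = nn.ModuleList([
            nn.Conv2d(c_in, c_branch, 3, padding=d, dilation=d)
            for d in (1, 3, 5)])
        self.merge = nn.Conv2d(3 * c_branch, c_in, 1)

    def forward(self, x):
        return self.merge(torch.cat([b(x) for b in self.branches], dim=1)) + x

y = RFBLike(64, 32)(torch.randn(1, 64, 56, 56))
```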

Estimating the Circuit Deobfuscating Runtime based on Graph Deep Learning

Title Estimating the Circuit Deobfuscating Runtime based on Graph Deep Learning
Authors Zhiqian Chen, Gaurav Kolhe, Setareh Rafatirad, Sai Manoj P. D., Houman Homayoun, Liang Zhao, Chang-Tien Lu
Abstract Circuit obfuscation is a recently proposed defense mechanism that protects digital integrated circuits (ICs) from reverse engineering by using camouflaged gates, i.e., logic gates whose functionality cannot be precisely determined by the attacker. There have been effective schemes, such as satisfiability-checking (SAT)-based attacks, that can potentially decrypt obfuscated circuits, a process called deobfuscation. Deobfuscation runtime can span a large range, from a few milliseconds to thousands of years or more, depending on the number and layout of the ICs and camouflaged gates. Hence, accurately pre-estimating the deobfuscation runtime is highly crucial for defenders to maximize it and optimize their defense. However, estimating the deobfuscation runtime is a challenging task due to 1) the complexity and heterogeneity of graph-structured circuits, and 2) the unknown and sophisticated mechanisms of attackers for deobfuscation. To address these challenges, this work proposes the first machine-learning framework that predicts the deobfuscation runtime using graph deep learning techniques. Specifically, we design a new model, ICNet, with new input and convolution layers that characterize and extract graph frequencies from ICs, which are then integrated by heterogeneous deep fully-connected layers to obtain the final output. ICNet is an end-to-end framework that can automatically extract the determinant features for deobfuscation runtime. Extensive experiments demonstrate its effectiveness and efficiency.
Tasks
Published 2019-02-14
URL https://arxiv.org/abs/1902.05357v2
PDF https://arxiv.org/pdf/1902.05357v2.pdf
PWC https://paperswithcode.com/paper/estimating-the-circuit-deobfuscating-runtime
Repo
Framework
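
A minimal sketch of graph-based runtime regression in the spirit of the ICNet entry above: mean-aggregate neighbor features with a normalized adjacency, then pool to a single scalar prediction per circuit. The two-layer GCN, the gate features, and the regression target are assumptions, not the paper's ICNet layers.

```python
# Sketch: graph-convolutional regressor for a per-circuit scalar target.
import torch
import torch.nn as nn

class GCNRegressor(nn.Module):
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.w1, self.w2 = nn.Linear(in_dim, hidden), nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, A_hat, X):
        # A_hat: (N, N) normalized adjacency; X: (N, in_dim) per-gate features
        h = torch.relu(self.w1(A_hat @ X))
        h = torch.relu(self.w2(A_hat @ h))
        return self.out(h.mean(dim=0))   # graph-level pooled prediction

N = 30
A = torch.rand(N, N).round()                         # toy gate connectivity
A_hat = (A + torch.eye(N)) / (A.sum(1, keepdim=True) + 1)  # row-normalized
pred_runtime = GCNRegressor(in_dim=8)(A_hat, torch.randn(N, 8))
```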