January 27, 2020

3392 words 16 mins read

Paper Group ANR 1068

Siam R-CNN: Visual Tracking by Re-Detection. Improve Object Detection by Data Enhancement based on Generative Adversarial Nets. RED: A ReRAM-based Deconvolution Accelerator. VAE-based regularization for deep speaker embedding. AANet: Attribute Attention Network for Person Re-Identifications. LSMI-Sinkhorn: Semi-supervised Squared-Loss Mutual Inform …

Siam R-CNN: Visual Tracking by Re-Detection

Title Siam R-CNN: Visual Tracking by Re-Detection
Authors Paul Voigtlaender, Jonathon Luiten, Philip H. S. Torr, Bastian Leibe
Abstract We present Siam R-CNN, a Siamese re-detection architecture which unleashes the full power of two-stage object detection approaches for visual object tracking. We combine this with a novel tracklet-based dynamic programming algorithm, which takes advantage of re-detections of both the first-frame template and previous-frame predictions, to model the full history of both the object to be tracked and potential distractor objects. This enables our approach to make better tracking decisions, as well as to re-detect tracked objects after long occlusion. Finally, we propose a novel hard example mining strategy to improve Siam R-CNN’s robustness to similar looking objects. The proposed tracker achieves the current best performance on ten tracking benchmarks, with especially strong results for long-term tracking.
Tasks Object Detection, Object Tracking, Visual Object Tracking, Visual Tracking
Published 2019-11-28
URL https://arxiv.org/abs/1911.12836v1
PDF https://arxiv.org/pdf/1911.12836v1.pdf
PWC https://paperswithcode.com/paper/siam-r-cnn-visual-tracking-by-re-detection
Repo
Framework
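
The tracklet-based dynamic programming is the algorithmic heart of this tracker. As a rough, simplified illustration of the idea (not the authors' actual tracklet algorithm), the sketch below runs a Viterbi-style DP over per-frame candidate detections, trading off re-detection similarity against spatial consistency with the previous frame; the `similarities`, `boxes`, and `consistency_weight` names are invented for the example.

```python
import numpy as np

def track_by_dp(similarities, boxes, consistency_weight=1.0):
    """Pick one detection per frame by dynamic programming.

    similarities: list of length T; similarities[t] is an array of shape (N_t,)
                  with template-similarity scores for the N_t detections in frame t.
    boxes:        list of length T; boxes[t] is an (N_t, 4) array in (x1, y1, x2, y2).

    Returns the index of the chosen detection in every frame.
    """
    T = len(similarities)
    scores = [np.asarray(s, dtype=float) for s in similarities]
    best = [scores[0].copy()]               # best cumulative score ending at each detection
    back = [np.full(len(scores[0]), -1)]

    for t in range(1, T):
        prev_centers = (boxes[t - 1][:, :2] + boxes[t - 1][:, 2:]) / 2.0
        cur_centers = (boxes[t][:, :2] + boxes[t][:, 2:]) / 2.0
        # Pairwise spatial consistency: closer successive boxes are rewarded.
        dist = np.linalg.norm(cur_centers[:, None, :] - prev_centers[None, :, :], axis=-1)
        transition = best[t - 1][None, :] - consistency_weight * dist
        back.append(transition.argmax(axis=1))
        best.append(scores[t] + transition.max(axis=1))

    # Backtrack the highest-scoring chain of detections.
    path = [int(best[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```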

Improve Object Detection by Data Enhancement based on Generative Adversarial Nets

Title Improve Object Detection by Data Enhancement based on Generative Adversarial Nets
Authors Wei Jiang, Na Ying
Abstract The accuracy of an object detection model depends on whether the anchor boxes are trained effectively. Because the number of ground-truth (GT) boxes is small and the object targets do not vary during the training phase, the anchor boxes cannot be trained effectively. Extending the dataset is an effective way to improve detection accuracy. We propose a data enhancement method based on a foreground-background separation model, which uses a binary mask of the object target to randomly perturb the original dataset images. Perturbation methods include changing the color channels of the object, adding salt noise to the object, and enhancing contrast. The main contribution of this paper is a GAN-based data enhancement method that improves the detection accuracy of DSSD. Results are reported on both the PASCAL VOC2007 and PASCAL VOC2012 datasets. Our model with 321x321 input achieves 78.7% mAP on the VOC2007 test set and 76.6% mAP on the VOC2012 test set.
Tasks Object Detection
Published 2019-03-05
URL http://arxiv.org/abs/1903.01716v1
PDF http://arxiv.org/pdf/1903.01716v1.pdf
PWC https://paperswithcode.com/paper/improve-object-detection-by-data-enhancement
Repo
Framework
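
The perturbations listed in the abstract (color-channel changes, salt noise, contrast enhancement) applied inside a foreground mask are easy to sketch. The snippet below is a minimal, hypothetical version assuming the foreground-background separation model has already produced a binary object mask; it is not the paper's GAN pipeline.

```python
import numpy as np

def perturb_object(image, mask, mode, rng=None):
    """Apply one perturbation to the pixels selected by a binary object mask.

    image: uint8 array of shape (H, W, 3); mask: bool array of shape (H, W).
    mode:  'channel_shuffle', 'salt_noise', or 'contrast'.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.astype(np.float32).copy()
    region = mask.astype(bool)

    if mode == "channel_shuffle":            # change the color channels of the object
        order = rng.permutation(3)
        out[region] = out[region][:, order]
    elif mode == "salt_noise":               # add salt noise to the object
        noisy = rng.random(mask.shape) < 0.05
        out[region & noisy] = 255.0
    elif mode == "contrast":                 # enhance the contrast of the object
        mean = out[region].mean()
        out[region] = (out[region] - mean) * 1.5 + mean

    return np.clip(out, 0, 255).astype(np.uint8)
```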

RED: A ReRAM-based Deconvolution Accelerator

Title RED: A ReRAM-based Deconvolution Accelerator
Authors Zichen Fan, Ziru Li, Bing Li, Yiran Chen, Hai Li
Abstract Deconvolution has become widespread in neural networks. For example, it is essential for performing unsupervised learning in generative adversarial networks and for constructing fully convolutional networks for semantic segmentation. Resistive RAM (ReRAM)-based processing-in-memory architectures have been widely explored for accelerating convolutional computation and demonstrate good performance. Performing deconvolution on existing ReRAM-based accelerator designs, however, suffers from long latency and high energy consumption because deconvolutional computation includes not only convolution but also extra add-on operations. To enable more efficient execution of deconvolution, we analyze its computation requirements and propose a ReRAM-based accelerator design, namely RED. More specifically, RED integrates two orthogonal methods: a pixel-wise mapping scheme that reduces the redundancy caused by zero-inserting operations, and a zero-skipping data flow that increases computation parallelism and therefore improves performance. Experimental evaluations show that, compared to the state-of-the-art ReRAM-based accelerator, RED achieves a 1.15x~3.69x speedup and reduces energy consumption by 8%~88.36%.
Tasks Semantic Segmentation
Published 2019-07-05
URL https://arxiv.org/abs/1907.02987v1
PDF https://arxiv.org/pdf/1907.02987v1.pdf
PWC https://paperswithcode.com/paper/red-a-reram-based-deconvolution-accelerator
Repo
Framework
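
The redundancy targeted by RED's pixel-wise mapping comes from the way deconvolution is naively computed: zero-insert (upsample) the input, then convolve, so many multiply-accumulates hit inserted zeros. The sketch below, a plain NumPy illustration rather than anything ReRAM-specific, makes that explicit by counting zero operands.

```python
import numpy as np

def deconv2d_via_zero_insertion(x, w, stride=2):
    """Transposed convolution computed naively as zero-insertion + convolution.

    x: (H, W) input feature map; w: (K, K) kernel.
    Returns the output map and the number of multiply operands that were zero,
    which is (mostly) the wasted work a pixel-wise mapping scheme avoids.
    """
    H, W = x.shape
    K = w.shape[0]
    up = np.zeros(((H - 1) * stride + 1, (W - 1) * stride + 1), dtype=float)
    up[::stride, ::stride] = x                   # zero-inserting (upsampling) step
    up = np.pad(up, K - 1)                       # full padding

    out_h, out_w = up.shape[0] - K + 1, up.shape[1] - K + 1
    out = np.zeros((out_h, out_w))
    zero_operands = 0
    wk = w[::-1, ::-1]                           # flipped kernel (true convolution)
    for i in range(out_h):
        for j in range(out_w):
            window = up[i:i + K, j:j + K]
            out[i, j] = np.sum(window * wk)
            zero_operands += int(np.count_nonzero(window == 0))
    return out, zero_operands
```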

VAE-based regularization for deep speaker embedding

Title VAE-based regularization for deep speaker embedding
Authors Yang Zhang, Lantian Li, Dong Wang
Abstract Deep speaker embedding has achieved state-of-the-art performance in speaker recognition. A potential problem is that these embedded vectors (called 'x-vectors') are not Gaussian, causing performance degradation with the widely used PLDA back-end scoring. In this paper, we propose a regularization approach based on the Variational Auto-Encoder (VAE). This model transforms x-vectors to a latent space in which the mapped latent codes are more Gaussian, and hence more suitable for PLDA scoring.
Tasks Speaker Recognition
Published 2019-04-07
URL http://arxiv.org/abs/1904.03617v1
PDF http://arxiv.org/pdf/1904.03617v1.pdf
PWC https://paperswithcode.com/paper/vae-based-regularization-for-deep-speaker
Repo
Framework
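
A minimal sketch of the idea in PyTorch, under the assumption of 512-dimensional x-vectors and a 150-dimensional latent space (both numbers are placeholders, not the paper's configuration): the KL term of the VAE is what pulls latent codes toward a standard Gaussian, making them friendlier to PLDA.

```python
import torch
import torch.nn as nn

class XVectorVAE(nn.Module):
    """Minimal VAE: maps x-vectors to a latent code pushed toward N(0, I)."""

    def __init__(self, dim=512, latent=150):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent)
        self.logvar = nn.Linear(256, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    recon_err = ((recon - x) ** 2).sum(dim=1).mean()
    # KL(q(z|x) || N(0, I)) regularizes latent codes toward a Gaussian,
    # which is what makes them better suited to PLDA scoring.
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    return recon_err + kl

# At scoring time, one would feed the latent means mu (the "regularized" x-vectors) to PLDA.
```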

AANet: Attribute Attention Network for Person Re-Identifications

Title AANet: Attribute Attention Network for Person Re-Identifications
Authors Chiat-Pin Tay, Sharmili Roy, Kim-Hui Yap
Abstract This paper proposes the Attribute Attention Network (AANet), a new architecture that integrates person attributes and attribute attention maps into a classification framework to solve the person re-identification (re-ID) problem. Many person re-ID models employ semantic cues such as body parts or human pose to improve re-ID performance, but attribute information is often not utilized. The proposed AANet builds on a baseline model that uses body parts and integrates the key attribute information in a unified learning framework. The AANet consists of a global person ID task, a part detection task and a crucial attribute detection task. By estimating the class responses of individual attributes and combining them to form the attribute attention map (AAM), a very strong discriminatory representation is constructed. The proposed AANet outperforms the best state-of-the-art method (arXiv:1711.09349v3 [cs.CV]) using ResNet-50 by 3.36% in mAP and 3.12% in Rank-1 accuracy on the DukeMTMC-reID dataset. On the Market1501 dataset, AANet achieves 92.38% mAP and 95.10% Rank-1 accuracy with re-ranking, outperforming arXiv:1804.00216v1 [cs.CV], another state-of-the-art method using ResNet-152, by 1.42% in mAP and 0.47% in Rank-1 accuracy. In addition, AANet can perform person attribute prediction (e.g., gender, hair length, clothing length), and localize the attributes in the query image.
Tasks Person Re-Identification
Published 2019-12-19
URL https://arxiv.org/abs/1912.09021v1
PDF https://arxiv.org/pdf/1912.09021v1.pdf
PWC https://paperswithcode.com/paper/aanet-attribute-attention-network-for-person-1
Repo
Framework
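
One plausible way to form an attribute attention map from per-attribute class responses is shown below as a hypothetical NumPy sketch (the exact construction and normalization used by AANet may differ): weight each attribute's spatial response map by the attribute confidence, sum, and normalize.

```python
import numpy as np

def attribute_attention_map(response_maps, confidences):
    """Combine per-attribute class response maps into a single attention map.

    response_maps: array of shape (A, H, W), one spatial response map per attribute.
    confidences:   array of shape (A,), predicted confidence for each attribute.
    """
    maps = np.maximum(response_maps, 0.0)                      # keep positive evidence only
    weighted = confidences[:, None, None] * maps               # weight maps by attribute confidence
    aam = weighted.sum(axis=0)
    aam = (aam - aam.min()) / (aam.max() - aam.min() + 1e-8)   # normalize to [0, 1]
    return aam
```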

LSMI-Sinkhorn: Semi-supervised Squared-Loss Mutual Information Estimation with Optimal Transport

Title LSMI-Sinkhorn: Semi-supervised Squared-Loss Mutual Information Estimation with Optimal Transport
Authors Yanbin Liu, Makoto Yamada, Yao-Hung Hubert Tsai, Tam Le, Ruslan Salakhutdinov, Yi Yang
Abstract Estimating mutual information is an important machine learning and statistics problem. To estimate the mutual information from data, a common practice is preparing a set of paired samples. However, in some cases, it is difficult to obtain a large number of data pairs. To address this problem, we propose squared-loss mutual information (SMI) estimation using a small number of paired samples and the available unpaired ones. We first represent SMI through the density ratio function, where the expectation is approximated by the samples from marginals and its assignment parameters. The objective is formulated using the optimal transport problem and quadratic programming. Then, we introduce the least-square mutual information-Sinkhorn algorithm (LSMI-Sinkhorn) for efficient optimization. Through experiments, we first demonstrate that the proposed method can estimate the SMI without a large number of paired samples. We also evaluate and show the effectiveness of the proposed LSMI-Sinkhorn on various types of machine learning problems such as image matching and photo album summarization.
Tasks
Published 2019-09-05
URL https://arxiv.org/abs/1909.02373v1
PDF https://arxiv.org/pdf/1909.02373v1.pdf
PWC https://paperswithcode.com/paper/lsmi-sinkhorn-semi-supervised-squared-loss
Repo
Framework
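
The Sinkhorn component is standard entropy-regularized optimal transport. The snippet below shows only that solver on a generic cost matrix with uniform marginals, not the full LSMI objective or its quadratic-programming step.

```python
import numpy as np

def sinkhorn(cost, eps=0.1, iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost: (n, m) cost matrix. Returns a coupling matrix with uniform marginals
    that LSMI-Sinkhorn-style methods use as soft assignments between unpaired samples.
    """
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)   # uniform marginals
    K = np.exp(-cost / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):                             # alternating scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]
```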

EnsemFDet: An Ensemble Approach to Fraud Detection based on Bipartite Graph

Title EnsemFDet: An Ensemble Approach to Fraud Detection based on Bipartite Graph
Authors Yuxiang Ren, Hao Zhu, Jiawei Zhang, Peng Dai, Liefeng Bo
Abstract Fraud detection is extremely critical for e-commerce businesses, which aim to detect and prevent fraud as early as possible. Existing fraud detection methods try to identify unexpectedly dense subgraphs and treat related nodes as suspicious. Spectral relaxation-based methods solve the problem efficiently but hurt performance due to the relaxed constraints. Besides, many methods cannot be accelerated with parallel computation or control the number of returned suspicious nodes because they provide a set of subgraphs with diverse node sizes. These drawbacks limit the real-world applicability of existing methods. In this paper, we propose an Ensemble-based Fraud Detection (EnsemFDet) method that scales up fraud detection in bipartite graphs by decomposing the original problem into subproblems on small subgraphs. By oversampling the graph and solving the subproblems, the ensemble approach votes on suspicious nodes without sacrificing prediction accuracy. Extensive experiments have been conducted on real transaction data from JD.com, one of the world's largest e-commerce platforms. Experimental results demonstrate the effectiveness, practicability, and scalability of EnsemFDet. More specifically, EnsemFDet is up to 100x faster than state-of-the-art methods due to its parallelism across all aspects of the data.
Tasks Fraud Detection
Published 2019-12-23
URL https://arxiv.org/abs/1912.11113v1
PDF https://arxiv.org/pdf/1912.11113v1.pdf
PWC https://paperswithcode.com/paper/ensemfdet-an-ensemble-approach-to-fraud
Repo
Framework
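
The ensemble idea (decompose the graph into sampled subgraphs, run a base dense-subgraph detector on each, and vote) can be sketched generically. The skeleton below assumes a user-supplied `base_detector` and uniform node sampling; the paper's actual detector and sampling scheme are not reproduced.

```python
import random
from collections import Counter

def ensemble_fraud_votes(nodes, edges, base_detector, rounds=20, sample_ratio=0.3):
    """Run a base dense-subgraph detector on many sampled subgraphs and vote.

    base_detector(sub_nodes, sub_edges) -> set of suspicious node ids.
    Returns a Counter mapping node id -> number of votes; thresholding the votes
    controls how many suspicious nodes are returned.
    """
    votes = Counter()
    node_list = list(nodes)
    for _ in range(rounds):
        sampled = set(random.sample(node_list, max(1, int(sample_ratio * len(node_list)))))
        sub_edges = [(u, v) for u, v in edges if u in sampled and v in sampled]
        votes.update(base_detector(sampled, sub_edges))   # each round casts one vote per flagged node
    return votes
```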

User-Device Authentication in Mobile Banking using APHEN for Paratuck2 Tensor Decomposition

Title User-Device Authentication in Mobile Banking using APHEN for Paratuck2 Tensor Decomposition
Authors Jeremy Charlier, Eric Falk, Radu State, Jean Hilger
Abstract New European financial regulations such as PSD2 are changing retail banking services. Notably, the monitoring of personal expenses is now open to institutions other than retail banks. Nonetheless, retail banks are looking to leverage user-device authentication on mobile banking applications to enhance personalized financial advertisement. To profile the authentication events, we rely on tensor decomposition, a higher-dimensional analogue of matrix decomposition. We use Paratuck2, which expresses a tensor as a multiplication of matrices and diagonal tensors, because of the imbalance between the number of users and devices. We highlight why Paratuck2 is more appropriate in this case than the popular CP tensor decomposition, which decomposes a tensor as a sum of rank-one tensors. However, computing Paratuck2 is computationally intensive. We propose a new APproximate HEssian-based Newton resolution algorithm, APHEN, capable of solving Paratuck2 more accurately and faster than other popular approaches based on alternating least squares or gradient descent. The results of Paratuck2 are used to predict users' authentication with neural networks. We apply our method to the concrete case of targeting clients for financial advertising campaigns based on the authentication events generated by mobile banking applications.
Tasks
Published 2019-05-23
URL https://arxiv.org/abs/1905.10363v1
PDF https://arxiv.org/pdf/1905.10363v1.pdf
PWC https://paperswithcode.com/paper/user-device-authentication-in-mobile-banking
Repo
Framework
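
Paratuck2 itself is easy to state: slice k of the tensor is approximated as A D_k^(A) R D_k^(B) B^T, with diagonal, slice-specific weights on both modes. The sketch below reconstructs a tensor from given factors; it does not implement APHEN or any fitting procedure.

```python
import numpy as np

def paratuck2_reconstruct(A, DA, R, DB, B):
    """Reconstruct a tensor from its Paratuck2 factors.

    A:  (I, P)   latent factors for the first mode (e.g. users)
    DA: (K, P)   per-slice diagonal weights for the first mode
    R:  (P, Q)   interaction matrix linking the two latent spaces
    DB: (K, Q)   per-slice diagonal weights for the second mode
    B:  (J, Q)   latent factors for the second mode (e.g. devices)

    Slice k of the reconstruction is A @ diag(DA[k]) @ R @ diag(DB[k]) @ B.T, which is
    why Paratuck2 copes well with an imbalance between the I users and J devices.
    """
    K = DA.shape[0]
    I, J = A.shape[0], B.shape[0]
    X = np.empty((K, I, J))
    for k in range(K):
        X[k] = A @ np.diag(DA[k]) @ R @ np.diag(DB[k]) @ B.T
    return X
```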

A Comprehensive Survey on Machine Learning Techniques and User Authentication Approaches for Credit Card Fraud Detection

Title A Comprehensive Survey on Machine Learning Techniques and User Authentication Approaches for Credit Card Fraud Detection
Authors Niloofar Yousefi, Marie Alaghband, Ivan Garibay
Abstract With the increase in credit card usage, the volume of credit card misuse has also increased significantly. As a result, financial organizations are working hard on developing and deploying credit card fraud detection methods, in order to adapt to ever-evolving, increasingly sophisticated defrauding strategies and to identify illicit transactions as quickly as possible to protect themselves and their customers. Compounding the complex nature of such adverse strategies, credit card fraudulent activities are rare events compared to the number of legitimate transactions. Hence, the challenge of developing fraud detection methods that are accurate and efficient is substantially intensified and, as a consequence, credit card fraud detection has lately become a very active area of research. In this work, we provide a survey of current techniques most relevant to the problem of credit card fraud detection. We carry out our survey in two main parts. In the first part, we focus on studies utilizing classical machine learning models, which mostly employ traditional transactional features to make fraud predictions. These models typically rely on some static physical characteristics, such as what the user knows (knowledge-based methods) or what he/she has access to (object-based methods). In the second part of our survey, we review more advanced techniques of user authentication, which use behavioral biometrics to identify an individual based on his/her unique behavior while interacting with his/her electronic devices. These approaches rely on how people behave (instead of what they do), which cannot be easily forged. By providing an overview of current approaches and the results reported in the literature, this survey aims to drive the future research agenda for the community in order to develop more accurate, reliable and scalable models of credit card fraud detection.
Tasks Fraud Detection
Published 2019-12-02
URL https://arxiv.org/abs/1912.02629v1
PDF https://arxiv.org/pdf/1912.02629v1.pdf
PWC https://paperswithcode.com/paper/a-comprehensive-survey-on-machine-learning
Repo
Framework

On the Importance of Video Action Recognition for Visual Lipreading

Title On the Importance of Video Action Recognition for Visual Lipreading
Authors Xinshuo Weng
Abstract We focus on word-level visual lipreading, which requires decoding the word from the speaker's video. Recently, many state-of-the-art visual lipreading methods explore end-to-end trainable deep models, using 2D convolutional networks (e.g., ResNet) as the front-end visual feature extractor and a sequential model (e.g., Bi-LSTM or Bi-GRU) as the back-end. Although a deep 2D convolutional neural network can provide informative image-based features, it ignores the temporal motion between adjacent frames. In this work, we investigate the spatial-temporal modeling capacity of I3D (Inflated 3D ConvNet) for visual lipreading. We demonstrate that, after being pre-trained on a large-scale video action recognition dataset (e.g., Kinetics), our models show a considerable improvement in performance on the task of lipreading. A comparison between a set of video model architectures and input data representations is also reported. Our extensive experiments on LRW show that a two-stream I3D model with RGB video and optical flow as the inputs achieves state-of-the-art performance.
Tasks Lipreading, Optical Flow Estimation, Temporal Action Localization
Published 2019-03-22
URL https://arxiv.org/abs/1903.09616v2
PDF https://arxiv.org/pdf/1903.09616v2.pdf
PWC https://paperswithcode.com/paper/on-the-importance-of-video-action-recognition
Repo
Framework
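
The two-stream result relies on late fusion of the RGB and optical-flow I3D streams. A minimal sketch of the common score-averaging fusion is shown below; the paper's exact fusion weights and scheme are assumptions here.

```python
import numpy as np

def two_stream_fusion(rgb_logits, flow_logits, rgb_weight=0.5):
    """Late fusion of a two-stream model: average the per-class scores.

    rgb_logits, flow_logits: arrays of shape (num_classes,) from the RGB and
    optical-flow streams for one clip. Returns the predicted word index.
    """
    def softmax(z):
        z = z - z.max()                       # numerical stability
        e = np.exp(z)
        return e / e.sum()

    fused = rgb_weight * softmax(rgb_logits) + (1.0 - rgb_weight) * softmax(flow_logits)
    return int(fused.argmax())
```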

Reactive, Proactive, and Inductive Agents: An evolutionary path for biological and artificial spiking networks

Title Reactive, Proactive, and Inductive Agents: An evolutionary path for biological and artificial spiking networks
Authors Lana Sinapayen, Atsushi Masumori, Takashi Ikegami
Abstract Complex environments provide structured yet variable sensory inputs. To best exploit information from these environments, organisms must evolve the ability to anticipate consequences of unknown stimuli, and act on these predictions. We propose an evolutionary path for neural networks, leading an organism from reactive behavior to simple proactive behavior and from simple proactive behavior to induction-based behavior. Through in-vitro and in-silico experiments, we define the conditions necessary in a network with spike-timing dependent plasticity for the organism to go from reactive to proactive behavior. Our results support the existence of specific evolutionary steps and four conditions necessary for embodied neural networks to evolve predictive and inductive abilities from an initial reactive strategy. We extend these conditions to more general structures.
Tasks
Published 2019-02-18
URL https://arxiv.org/abs/1902.06410v2
PDF https://arxiv.org/pdf/1902.06410v2.pdf
PWC https://paperswithcode.com/paper/reactive-proactive-and-inductive-agents-an
Repo
Framework
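
The experiments hinge on spike-timing dependent plasticity. The snippet below is the textbook pair-based STDP rule for a single synapse, with placeholder constants; it is not the specific network model used in the paper.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based spike-timing dependent plasticity update for one synapse.

    t_pre, t_post: spike times (ms) of the pre- and post-synaptic neurons.
    Pre-before-post (dt > 0) potentiates; post-before-pre depresses.
    """
    dt = t_post - t_pre
    if dt > 0:
        w += a_plus * np.exp(-dt / tau)       # potentiation window
    else:
        w -= a_minus * np.exp(dt / tau)       # depression window
    return float(np.clip(w, 0.0, 1.0))        # keep the weight bounded
```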

Good, Better, Best: Textual Distractors Generation for Multi-Choice VQA via Policy Gradient

Title Good, Better, Best: Textual Distractors Generation for Multi-Choice VQA via Policy Gradient
Authors Jiaying Lu, Xin Ye, Yi Ren, Yezhou Yang
Abstract Textual distractors in current multi-choice VQA datasets are not challenging enough for state-of-the-art neural models. To better assess whether well-trained VQA models are vulnerable to potential attacks such as more challenging distractors, we introduce a novel task called \textit{textual Distractors Generation for VQA} (DG-VQA). The goal of DG-VQA is to generate the most confusing distractors for multi-choice VQA tasks represented as a tuple of image, question, and the correct answer. Consequently, such distractors expose the vulnerability of neural models. We show that distractor generation can be formulated as a Markov Decision Process, and present a reinforcement learning solution that produces distractors in an unsupervised manner. Our solution addresses the lack of a large annotated corpus that hampers classical distractor generation methods. Our proposed model receives reward signals from well-trained multi-choice VQA models and updates its parameters via policy gradient. The empirical results show that the generated textual distractors can successfully confuse several cutting-edge models, with an average 20% accuracy drop from around 64%. Furthermore, we conduct extra adversarial training to improve the robustness of VQA models by incorporating the generated distractors. The experiment validates the effectiveness of adversarial training by showing a 27% performance improvement on the multi-choice VQA task.
Tasks Visual Question Answering
Published 2019-10-21
URL https://arxiv.org/abs/1910.09134v1
PDF https://arxiv.org/pdf/1910.09134v1.pdf
PWC https://paperswithcode.com/paper/good-better-best-textual-distractors
Repo
Framework
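
The training signal is a plain policy gradient: sample a distractor from the generator, ask the frozen VQA model whether it was fooled, and use that as the reward. The sketch below is a generic REINFORCE step with hypothetical `policy` and `vqa_is_fooled` callables, not the paper's full sequence-generation model.

```python
import torch

def reinforce_step(policy, optimizer, state, vqa_is_fooled):
    """One REINFORCE update for a distractor-generation policy.

    policy(state) -> logits over candidate distractors (torch.Tensor of shape (V,)).
    vqa_is_fooled(action) -> float reward, e.g. 1.0 if the target VQA model picks
    the generated distractor instead of the correct answer, else 0.0.
    """
    logits = policy(state)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                       # pick a candidate distractor
    reward = vqa_is_fooled(action.item())        # reward comes from the frozen VQA model
    loss = -dist.log_prob(action) * reward       # policy-gradient (REINFORCE) objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```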

GenSample: A Genetic Algorithm for Oversampling in Imbalanced Datasets

Title GenSample: A Genetic Algorithm for Oversampling in Imbalanced Datasets
Authors Vishwa Karia, Wenhao Zhang, Arash Naeim, Ramin Ramezani
Abstract Imbalanced datasets are ubiquitous. Classification performance on imbalanced datasets is generally poor for the minority class as the classifier cannot learn decision boundaries well. However, in sensitive applications like fraud detection, medical diagnosis, and spam identification, it is extremely important to classify the minority instances correctly. In this paper, we present a novel technique based on genetic algorithms, GenSample, for oversampling the minority class in imbalanced datasets. GenSample decides the rate of oversampling a minority example by taking into account the difficulty in learning that example, along with the performance improvement achieved by oversampling it. This technique terminates the oversampling process when the performance of the classifier begins to deteriorate. Consequently, it produces synthetic data only as long as a performance boost is obtained. The algorithm was tested on 9 real-world imbalanced datasets of varying sizes and imbalance ratios. It achieved the highest F-Score on 8 out of 9 datasets, confirming its ability to better handle imbalanced data compared to other existing methodologies.
Tasks Fraud Detection, Medical Diagnosis
Published 2019-10-23
URL https://arxiv.org/abs/1910.10806v1
PDF https://arxiv.org/pdf/1910.10806v1.pdf
PWC https://paperswithcode.com/paper/gensample-a-genetic-algorithm-for
Repo
Framework
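
The stopping criterion (keep synthesizing minority samples only while the classifier keeps improving) is the part that is easy to sketch. The toy loop below uses SMOTE-like interpolation between random minority pairs and a decision tree as the classifier; GenSample's genetic selection of which examples to oversample, and at what rate, is not reproduced.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.tree import DecisionTreeClassifier

def oversample_until_plateau(X_tr, y_tr, X_val, y_val, minority=1, batch=20, rng=None):
    """Add interpolated minority samples until the validation F-score stops improving."""
    rng = np.random.default_rng() if rng is None else rng
    best = -1.0
    while True:
        clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
        score = f1_score(y_val, clf.predict(X_val), pos_label=minority)
        if score <= best:
            return X_tr, y_tr            # performance no longer improves: stop oversampling
        best = score
        minority_X = X_tr[y_tr == minority]
        i = rng.integers(0, len(minority_X), size=batch)
        j = rng.integers(0, len(minority_X), size=batch)
        lam = rng.random((batch, 1))
        synthetic = lam * minority_X[i] + (1 - lam) * minority_X[j]   # SMOTE-like interpolation
        X_tr = np.vstack([X_tr, synthetic])
        y_tr = np.concatenate([y_tr, np.full(batch, minority)])
```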

Data-Driven Malaria Prevalence Prediction in Large Densely-Populated Urban Holoendemic sub-Saharan West Africa: Harnessing Machine Learning Approaches and 22-years of Prospectively Collected Data

Title Data-Driven Malaria Prevalence Prediction in Large Densely-Populated Urban Holoendemic sub-Saharan West Africa: Harnessing Machine Learning Approaches and 22-years of Prospectively Collected Data
Authors Biobele J. Brown, Alexander A. Przybylski, Petru Manescu, Fabio Caccioli, Gbeminiyi Oyinloye, Muna Elmi, Michael J. Shaw, Vijay Pawar, Remy Claveau, John Shawe-Taylor, Mandayam A. Srinivasan, Nathaniel K. Afolabi, Adebola E. Orimadegun, Wasiu A. Ajetunmobi, Francis Akinkunmi, Olayinka Kowobari, Kikelomo Osinusi, Felix O. Akinbami, Samuel Omokhodion, Wuraola A. Shokunbi, Ikeoluwa Lagunju, Olugbemiro Sodeinde, Delmiro Fernandez-Reyes
Abstract Plasmodium falciparum malaria still poses one of the greatest threats to human life, with over 200 million cases globally leading to half a million deaths annually. Of these, 90% of the cases and of the mortality occur in sub-Saharan Africa, mostly among children. Although malaria prediction systems are central to the 2016-2030 malaria Global Technical Strategy, they are currently inadequate at capturing and estimating the burden of disease in highly endemic countries. We developed and validated a computational system that exploits the predictive power of current machine learning approaches on 22 years of prospective data from the high-transmission, holoendemic, densely-populated urban sub-Saharan West African metropolis of Ibadan. Our dataset of >9x10^4 screened study participants attending our clinical and community services from 1996 to 2017 contains monthly prevalence, temporal, environmental and host features. Our Locality-specific Elastic-Net based Malaria Prediction System (LEMPS) achieves good generalization performance, both in magnitude and direction of the prediction, when tasked to predict monthly prevalence on previously unseen validation data (MAE <= 6x10^-2, MSE <= 7x10^-3) within a (+0.1 to -0.05) error-tolerance range that is relevant and usable for aiding decision support in a holoendemic setting. LEMPS is well suited to malaria prediction, where multiple features are correlated with one another, and trading off between the L1-norm and L2-norm regularization strengths allows the system to retain stability. Data-driven systems are critical for regionally adaptable surveillance, management of control strategies and resource allocation across stretched healthcare systems.
Tasks
Published 2019-06-18
URL https://arxiv.org/abs/1906.07502v1
PDF https://arxiv.org/pdf/1906.07502v1.pdf
PWC https://paperswithcode.com/paper/data-driven-malaria-prevalence-prediction-in
Repo
Framework
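
An elastic net trades off the L1 and L2 penalties, which is what keeps the model stable when features are correlated. A minimal scikit-learn sketch on synthetic stand-in data (the real LEMPS features and 22-year series are not reproduced here) is shown below.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical monthly feature matrix (temporal, environmental, host features)
# and prevalence target; shapes and values are placeholders, not the study data.
rng = np.random.default_rng(0)
X = rng.normal(size=(264, 12))          # e.g. 22 years x 12 months, 12 features
y = rng.random(264)                     # monthly prevalence in [0, 1]

# l1_ratio controls the trade-off between the L1 and L2 penalties.
model = make_pipeline(
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5, random_state=0),
)
model.fit(X[:-24], y[:-24])             # hold out the last two years for validation
pred = model.predict(X[-24:])
print("MAE:", np.mean(np.abs(pred - y[-24:])))
```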

Radiopathomics: Integration of radiographic and histologic characteristics for prognostication in glioblastoma

Title Radiopathomics: Integration of radiographic and histologic characteristics for prognostication in glioblastoma
Authors Saima Rathore, Muhammad A. Iftikhar, Metin N. Gurcan, Zissimos Mourelatos
Abstract Both radiographic (Rad) imaging, such as multi-parametric magnetic resonance imaging, and digital pathology (Path) images captured from tissue samples are currently acquired as standard clinical practice for glioblastoma tumors. Both of these data streams have been used separately for diagnosis and treatment planning, despite the fact that they provide complementary information. In this research work, we aimed to assess the potential of Rad and Path images in combination and in comparison. An extensive set of engineered features was extracted from delineated tumor regions in Rad images, comprising T1, T1-Gd, T2, and T2-FLAIR, and from 100 random patches extracted from Path images. Specifically, the features comprised descriptors of intensity, histogram, and texture, mainly quantified via gray-level co-occurrence and gray-level run-length matrices. Features extracted from images of 107 glioblastoma patients, downloaded from The Cancer Imaging Archive, were run through a support vector machine for classification using a leave-one-out cross-validation scheme, and through support vector regression for prediction of the continuous survival outcome. The Pearson correlation coefficient was estimated to be 0.75, 0.74, and 0.78 for the Rad, Path and RadPath data. The area under the receiver operating characteristic curve was estimated to be 0.74, 0.76 and 0.80 for the Rad, Path and RadPath data, when patients were discretized into long- and short-survival groups based on an average survival cutoff. Our results support the notion that using Rad and Path images synergistically may lead to better prognostication at the initial presentation of the disease, thereby facilitating the targeted enrollment of patients into clinical trials.
Tasks
Published 2019-09-17
URL https://arxiv.org/abs/1909.07581v2
PDF https://arxiv.org/pdf/1909.07581v2.pdf
PWC https://paperswithcode.com/paper/radiopathomics-integration-of-radiographic
Repo
Framework
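
The modeling pipeline (engineered features fed into a support vector machine evaluated with leave-one-out cross-validation) is straightforward to sketch. The snippet below uses a synthetic stand-in for the 107-patient feature matrix and a linear-kernel SVM; the actual feature extraction (GLCM/GLRLM texture descriptors) is not reproduced.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for the engineered Rad/Path feature matrix:
# 107 patients x 50 intensity/histogram/texture features, with a binary
# long- vs short-survival label derived from an average-survival cutoff.
rng = np.random.default_rng(0)
X = rng.normal(size=(107, 50))
y = rng.integers(0, 2, size=107)

clf = make_pipeline(StandardScaler(), SVC(kernel="linear", probability=True))
# Leave-one-out: each patient is predicted by a model trained on the other 106.
probs = cross_val_predict(clf, X, y, cv=LeaveOneOut(), method="predict_proba")[:, 1]
print("AUC:", roc_auc_score(y, probs))
```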