January 27, 2020


Paper Group ANR 1068


Siam R-CNN: Visual Tracking by Re-Detection


Title	Siam R-CNN: Visual Tracking by Re-Detection
Authors	Paul Voigtlaender, Jonathon Luiten, Philip H. S. Torr, Bastian Leibe
Abstract	We present Siam R-CNN, a Siamese re-detection architecture which unleashes the full power of two-stage object detection approaches for visual object tracking. We combine this with a novel tracklet-based dynamic programming algorithm, which takes advantage of re-detections of both the first-frame template and previous-frame predictions, to model the full history of both the object to be tracked and potential distractor objects. This enables our approach to make better tracking decisions, as well as to re-detect tracked objects after long occlusion. Finally, we propose a novel hard example mining strategy to improve Siam R-CNN’s robustness to similar looking objects. The proposed tracker achieves the current best performance on ten tracking benchmarks, with especially strong results for long-term tracking.
Tasks	Object Detection, Object Tracking, Visual Object Tracking, Visual Tracking
Published	2019-11-28
URL	https://arxiv.org/abs/1911.12836v1
PDF	https://arxiv.org/pdf/1911.12836v1.pdf
PWC	https://paperswithcode.com/paper/siam-r-cnn-visual-tracking-by-re-detection
Repo
Framework

Improve Object Detection by Data Enhancement based on Generative Adversarial Nets


Title	Improve Object Detection by Data Enhancement based on Generative Adversarial Nets
Authors	Wei Jiang, Na Ying
Abstract	The accuracy of an object detection model depends on whether the anchor boxes are effectively trained. When there are few ground-truth (GT) boxes, or the object targets vary little during the training phase, the anchor boxes cannot be trained effectively. Extending the dataset is an effective way to improve detection accuracy. We propose a data enhancement method based on the foreground-background separation model. This model uses a binary image of the object target to randomly perturb the original dataset images. Perturbation methods include changing the color channel of the object, adding salt noise to the object, and enhancing contrast. The main contribution of this paper is to propose a data enhancement method based on GAN and to improve the detection accuracy of DSSD. Results are shown on both the PASCAL VOC2007 and PASCAL VOC2012 datasets. Our model with 321x321 input achieves 78.7% mAP on the VOC2007 test and 76.6% mAP on the VOC2012 test.
Tasks	Object Detection
Published	2019-03-05
URL	http://arxiv.org/abs/1903.01716v1
PDF	http://arxiv.org/pdf/1903.01716v1.pdf
PWC	https://paperswithcode.com/paper/improve-object-detection-by-data-enhancement
Repo
Framework
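
One of the perturbations named in the abstract above, adding salt noise to the object, can be sketched as follows. This is only an illustration under stated assumptions: the function name `add_salt_noise`, the grayscale list-of-lists image, the `prob` parameter and the fixed seed are hypothetical and are not taken from the paper, which does not describe its implementation here.

```python
import random


def add_salt_noise(image, mask, prob, seed=0):
    """Return a copy of `image` in which each pixel covered by the binary
    object `mask` is set to white (255) with probability `prob`.

    Assumed layout (not from the paper): `image` and `mask` are equally
    sized lists of rows; `image` holds grayscale values 0..255 and `mask`
    holds 0 (background) or 1 (object).
    """
    rng = random.Random(seed)
    noisy = [row[:] for row in image]  # leave the input untouched
    for y, mask_row in enumerate(mask):
        for x, is_object in enumerate(mask_row):
            # Background pixels are never modified.
            if is_object and rng.random() < prob:
                noisy[y][x] = 255
    return noisy


image = [[10, 20, 30], [40, 50, 60]]
mask = [[0, 1, 1], [0, 0, 1]]
noisy = add_salt_noise(image, mask, prob=1.0)
unchanged = add_salt_noise(image, mask, prob=0.0)
```

With `prob=1.0` every object pixel turns white while the background stays as it was; with `prob=0.0` the image comes back unchanged.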

RED: A ReRAM-based Deconvolution Accelerator


Title	RED: A ReRAM-based Deconvolution Accelerator
Authors	Zichen Fan, Ziru Li, Bing Li, Yiran Chen, Hai Li
Abstract	Deconvolution has been widespread in neural networks. For example, it is essential for performing unsupervised learning in generative adversarial networks or constructing fully convolutional networks for semantic segmentation. Resistive RAM (ReRAM)-based processing-in-memory architecture has been widely explored in accelerating convolutional computation and demonstrates good performance. Performing deconvolution on existing ReRAM-based accelerator designs, however, suffers from long latency and high energy consumption because deconvolutional computation includes not only convolution but also extra add-on operations. To realize more efficient execution of deconvolution, we analyze its computation requirement and propose a ReRAM-based accelerator design, namely, RED. More specifically, RED integrates two orthogonal methods, the pixel-wise mapping scheme for reducing redundancy caused by zero-inserting operations and the zero-skipping data flow for increasing the computation parallelism and therefore improving performance. Experimental evaluations show that compared to the state-of-the-art ReRAM-based accelerator, RED can speed up operation 3.69x~1.15x and reduce 8%~88.36% energy consumption.
Tasks	Semantic Segmentation
Published	2019-07-05
URL	https://arxiv.org/abs/1907.02987v1
PDF	https://arxiv.org/pdf/1907.02987v1.pdf
PWC	https://paperswithcode.com/paper/red-a-reram-based-deconvolution-accelerator
Repo
Framework

VAE-based regularization for deep speaker embedding


Title	VAE-based regularization for deep speaker embedding
Authors	Yang Zhang, Lantian Li, Dong Wang
Abstract	Deep speaker embedding has achieved state-of-the-art performance in speaker recognition. A potential problem is that these embedded vectors (called ‘x-vectors’) are not Gaussian, causing performance degradation with the famous PLDA back-end scoring. In this paper, we propose a regularization approach based on Variational Auto-Encoder (VAE). This model transforms x-vectors to a latent space where mapped latent codes are more Gaussian, hence more suitable for PLDA scoring.
Tasks	Speaker Recognition
Published	2019-04-07
URL	http://arxiv.org/abs/1904.03617v1
PDF	http://arxiv.org/pdf/1904.03617v1.pdf
PWC	https://paperswithcode.com/paper/vae-based-regularization-for-deep-speaker
Repo
Framework

AANet: Attribute Attention Network for Person Re-Identifications


Title	AANet: Attribute Attention Network for Person Re-Identifications
Authors	Chiat-Pin Tay, Sharmili Roy, Kim-Hui Yap
Abstract	This paper proposes Attribute Attention Network (AANet), a new architecture that integrates person attributes and attribute attention maps into a classification framework to solve the person re-identification (re-ID) problem. Many person re-ID models typically employ semantic cues such as body parts or human pose to improve the re-ID performance. Attribute information, however, is often not utilized. The proposed AANet leverages a baseline model that uses body parts and integrates the key attribute information in a unified learning framework. The AANet consists of a global person ID task, a part detection task and a crucial attribute detection task. By estimating the class responses of individual attributes and combining them to form the attribute attention map (AAM), a very strong discriminatory representation is constructed. The proposed AANet outperforms the best state-of-the-art method arXiv:1711.09349v3 [cs.CV] using ResNet-50 by 3.36% in mAP and 3.12% in Rank-1 accuracy on the DukeMTMC-reID dataset. On the Market1501 dataset, AANet achieves 92.38% mAP and 95.10% Rank-1 accuracy with re-ranking, outperforming arXiv:1804.00216v1 [cs.CV], another state-of-the-art method using ResNet-152, by 1.42% in mAP and 0.47% in Rank-1 accuracy. In addition, AANet can perform person attribute prediction (e.g., gender, hair length, clothing length, etc.), and localize the attributes in the query image.
Tasks	Person Re-Identification
Published	2019-12-19
URL	https://arxiv.org/abs/1912.09021v1
PDF	https://arxiv.org/pdf/1912.09021v1.pdf
PWC	https://paperswithcode.com/paper/aanet-attribute-attention-network-for-person-1
Repo
Framework

LSMI-Sinkhorn: Semi-supervised Squared-Loss Mutual Information Estimation with Optimal Transport


Title	LSMI-Sinkhorn: Semi-supervised Squared-Loss Mutual Information Estimation with Optimal Transport
Authors	Yanbin Liu, Makoto Yamada, Yao-Hung Hubert Tsai, Tam Le, Ruslan Salakhutdinov, Yi Yang
Abstract	Estimating mutual information is an important machine learning and statistics problem. To estimate the mutual information from data, a common practice is preparing a set of paired samples. However, in some cases, it is difficult to obtain a large number of data pairs. To address this problem, we propose squared-loss mutual information (SMI) estimation using a small number of paired samples and the available unpaired ones. We first represent SMI through the density ratio function, where the expectation is approximated by the samples from marginals and its assignment parameters. The objective is formulated using the optimal transport problem and quadratic programming. Then, we introduce the least-square mutual information-Sinkhorn algorithm (LSMI-Sinkhorn) for efficient optimization. Through experiments, we first demonstrate that the proposed method can estimate the SMI without a large number of paired samples. We also evaluate and show the effectiveness of the proposed LSMI-Sinkhorn on various types of machine learning problems such as image matching and photo album summarization.
Tasks
Published	2019-09-05
URL	https://arxiv.org/abs/1909.02373v1
PDF	https://arxiv.org/pdf/1909.02373v1.pdf
PWC	https://paperswithcode.com/paper/lsmi-sinkhorn-semi-supervised-squared-loss
Repo
Framework
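
For readers unfamiliar with the Sinkhorn part of the name, the standard Sinkhorn iteration for entropy-regularized optimal transport can be sketched as below. This is the generic textbook algorithm, not the paper's LSMI-Sinkhorn; the function name, the `eps` regularization strength, the iteration count and the toy cost matrix are all assumptions chosen for illustration.

```python
import math


def sinkhorn(cost, a, b, eps=0.5, n_iters=500):
    """Approximate the entropy-regularized transport plan between
    histograms `a` and `b` for the given `cost` matrix.

    Generic Sinkhorn scaling: alternately rescale rows and columns of the
    Gibbs kernel K = exp(-cost / eps) until the marginals match a and b.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Row scaling: make the row sums of diag(u) K diag(v) equal a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column scaling: make the column sums equal b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


cost = [[0.0, 1.0], [1.0, 0.0]]
a = [0.3, 0.7]
b = [0.6, 0.4]
plan = sinkhorn(cost, a, b)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(2)]
```

After enough iterations, `row_sums` and `col_sums` should agree with `a` and `b` up to numerical tolerance, which is a convenient sanity check for any Sinkhorn implementation.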

EnsemFDet: An Ensemble Approach to Fraud Detection based on Bipartite Graph


Title	EnsemFDet: An Ensemble Approach to Fraud Detection based on Bipartite Graph
Authors	Yuxiang Ren, Hao Zhu, Jiawei Zhang, Peng Dai, Liefeng Bo
Abstract	Fraud detection is extremely critical for e-commerce business. It is the intent of the companies to detect and prevent fraud as early as possible. Existing fraud detection methods try to identify unexpected dense subgraphs and treat related nodes as suspicious. Spectral relaxation-based methods solve the problem efficiently but hurt the performance due to the relaxed constraints. Besides, many methods cannot be accelerated with parallel computation or control the number of returned suspicious nodes because they provide a set of subgraphs with diverse node sizes. These drawbacks affect the real-world applications of existing methods. In this paper, we propose an Ensemble-based Fraud Detection (EnsemFDet) method to scale up fraud detection in bipartite graphs by decomposing the original problem into subproblems on small-sized subgraphs. By oversampling the graph and solving the subproblems, the ensemble approach further votes suspicious nodes without sacrificing the prediction accuracy. Extensive experiments have been done on real transaction data from JD.com, which is one of the world’s largest e-commerce platforms. Experimental results demonstrate the effectiveness, practicability, and scalability of EnsemFDet. More specifically, EnsemFDet is up to 100x faster than the state-of-the-art methods due to its parallelism with all aspects of data.
Tasks	Fraud Detection
Published	2019-12-23
URL	https://arxiv.org/abs/1912.11113v1
PDF	https://arxiv.org/pdf/1912.11113v1.pdf
PWC	https://paperswithcode.com/paper/ensemfdet-an-ensemble-approach-to-fraud
Repo
Framework

User-Device Authentication in Mobile Banking using APHEN for Paratuck2 Tensor Decomposition


Title	User-Device Authentication in Mobile Banking using APHEN for Paratuck2 Tensor Decomposition
Authors	Jeremy Charlier, Eric Falk, Radu State, Jean Hilger
Abstract	The new European financial regulations such as PSD2 are changing the retail banking services. Noticeably, the monitoring of the personal expenses is now opened to institutions other than retail banks. Nonetheless, the retail banks are looking to leverage the user-device authentication on the mobile banking applications to enhance the personal financial advertisement. To address the profiling of the authentication, we rely on tensor decomposition, a higher dimensional analogue of matrix decomposition. We use Paratuck2, which expresses a tensor as a multiplication of matrices and diagonal tensors, because of the imbalance between the number of users and devices. We highlight why Paratuck2 is more appropriate in this case than the popular CP tensor decomposition, which decomposes a tensor as a sum of rank-one tensors. However, the computation of Paratuck2 is computationally intensive. We propose a new APproximate HEssian-based Newton resolution algorithm, APHEN, capable of solving Paratuck2 more accurately and faster than the other popular approaches based on alternating least squares or gradient descent. The results of Paratuck2 are used for the predictions of users’ authentication with neural networks. We apply our method to the concrete case of targeting clients for financial advertising campaigns based on the authentication events generated by mobile banking applications.
Tasks
Published	2019-05-23
URL	https://arxiv.org/abs/1905.10363v1
PDF	https://arxiv.org/pdf/1905.10363v1.pdf
PWC	https://paperswithcode.com/paper/user-device-authentication-in-mobile-banking
Repo
Framework
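
The abstract above contrasts Paratuck2 with the CP decomposition, "which decomposes a tensor as a sum of rank-one tensors". A minimal sketch of what that means for a third-order tensor is given below; it covers only the CP reconstruction, not Paratuck2 or APHEN, and the function name and the small factor matrices are made up for illustration.

```python
def cp_reconstruct(A, B, C):
    """Rebuild a third-order tensor from CP factor matrices.

    Each factor has one column per rank-one component, and
    T[i][j][k] = sum over r of A[i][r] * B[j][r] * C[k][r].
    """
    rank = len(A[0])
    return [
        [
            [sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
             for k in range(len(C))]
            for j in range(len(B))
        ]
        for i in range(len(A))
    ]


# Hypothetical rank-2 factors for a 2 x 2 x 1 tensor.
A = [[1.0, 0.0], [2.0, 1.0]]
B = [[1.0, 1.0], [0.0, 2.0]]
C = [[3.0, 1.0]]
tensor = cp_reconstruct(A, B, C)
```

Each value of `r` contributes one rank-one tensor (the outer product of column `r` of `A`, `B` and `C`), and the result is their sum.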

A Comprehensive Survey on Machine Learning Techniques and User Authentication Approaches for Credit Card Fraud Detection


Title	A Comprehensive Survey on Machine Learning Techniques and User Authentication Approaches for Credit Card Fraud Detection
Authors	Niloofar Yousefi, Marie Alaghband, Ivan Garibay
Abstract	With the increase of credit card usage, the volume of credit card misuse also has significantly increased. As a result, financial organizations are working hard on developing and deploying credit card fraud detection methods, in order to adapt to ever-evolving, increasingly sophisticated defrauding strategies and identify illicit transactions as quickly as possible to protect themselves and their customers. Compounding the complex nature of such adverse strategies, credit card fraudulent activities are rare events compared to the number of legitimate transactions. Hence, the challenge of developing fraud detection methods that are accurate and efficient is substantially intensified and, as a consequence, credit card fraud detection has lately become a very active area of research. In this work, we provide a survey of current techniques most relevant to the problem of credit card fraud detection. We carry out our survey in two main parts. In the first part, we focus on studies utilizing classical machine learning models, which mostly employ traditional transactional features to make fraud predictions. These models typically rely on some static physical characteristics, such as what the user knows (knowledge-based method), or what he/she has access to (object-based method). In the second part of our survey, we review more advanced techniques of user authentication, which use behavioral biometrics to identify an individual based on his/her unique behavior while he/she is interacting with his/her electronic devices. These approaches rely on how people behave (instead of what they do), which cannot be easily forged. By providing an overview of current approaches and the results reported in the literature, this survey aims to drive the future research agenda for the community in order to develop more accurate, reliable and scalable models of credit card fraud detection.
Tasks	Fraud Detection
Published	2019-12-02
URL	https://arxiv.org/abs/1912.02629v1
PDF	https://arxiv.org/pdf/1912.02629v1.pdf
PWC	https://paperswithcode.com/paper/a-comprehensive-survey-on-machine-learning
Repo
Framework

On the Importance of Video Action Recognition for Visual Lipreading


Title	On the Importance of Video Action Recognition for Visual Lipreading
Authors	Xinshuo Weng
Abstract	We focus on word-level visual lipreading, which requires decoding the word from the speaker’s video. Recently, many state-of-the-art visual lipreading methods explore end-to-end trainable deep models, involving the use of 2D convolutional networks (e.g., ResNet) as the front-end visual feature extractor and a sequential model (e.g., Bi-LSTM or Bi-GRU) as the back-end. Although a deep 2D convolutional neural network can provide informative image-based features, it ignores the temporal motion existing between the adjacent frames. In this work, we investigate the spatial-temporal capacity of I3D (Inflated 3D ConvNet) for visual lipreading. We demonstrate that, after pre-training on a large-scale video action recognition dataset (e.g., Kinetics), our models show a considerable improvement in performance on the task of lipreading. A comparison between a set of video model architectures and input data representations is also reported. Our extensive experiments on LRW show that a two-stream I3D model with RGB video and optical flow as the inputs achieves state-of-the-art performance.
Tasks	Lipreading, Optical Flow Estimation, Temporal Action Localization
Published	2019-03-22
URL	https://arxiv.org/abs/1903.09616v2
PDF	https://arxiv.org/pdf/1903.09616v2.pdf
PWC	https://paperswithcode.com/paper/on-the-importance-of-video-action-recognition
Repo
Framework

Reactive, Proactive, and Inductive Agents: An evolutionary path for biological and artificial spiking networks


Title	Reactive, Proactive, and Inductive Agents: An evolutionary path for biological and artificial spiking networks
Authors	Lana Sinapayen, Atsushi Masumori, Ikegami Takashi
Abstract	Complex environments provide structured yet variable sensory inputs. To best exploit information from these environments, organisms must evolve the ability to anticipate consequences of unknown stimuli, and act on these predictions. We propose an evolutionary path for neural networks, leading an organism from reactive behavior to simple proactive behavior and from simple proactive behavior to induction-based behavior. Through in-vitro and in-silico experiments, we define the conditions necessary in a network with spike-timing dependent plasticity for the organism to go from reactive to proactive behavior. Our results support the existence of specific evolutionary steps and four conditions necessary for embodied neural networks to evolve predictive and inductive abilities from an initial reactive strategy. We extend these conditions to more general structures.
Tasks
Published	2019-02-18
URL	https://arxiv.org/abs/1902.06410v2
PDF	https://arxiv.org/pdf/1902.06410v2.pdf
PWC	https://paperswithcode.com/paper/reactive-proactive-and-inductive-agents-an
Repo
Framework

Good, Better, Best: Textual Distractors Generation for Multi-Choice VQA via Policy Gradient


Title	Good, Better, Best: Textual Distractors Generation for Multi-Choice VQA via Policy Gradient
Authors	Jiaying Lu, Xin Ye, Yi Ren, Yezhou Yang
Abstract	Textual distractors in current multi-choice VQA datasets are not challenging enough for state-of-the-art neural models. To better assess whether well-trained VQA models are vulnerable to potential attacks such as more challenging distractors, we introduce a novel task called textual Distractors Generation for VQA (DG-VQA). The goal of DG-VQA is to generate the most confusing distractors in multi-choice VQA tasks represented as a tuple of image, question, and the correct answer. Consequently, such distractors expose the vulnerability of neural models. We show that distractor generation can be formulated as a Markov Decision Process, and present a reinforcement learning solution to produce distractors in an unsupervised manner. Our solution addresses the lack of a large annotated corpus in classical distractor generation methods. Our proposed model receives reward signals from well-trained multi-choice VQA models and updates its parameters via policy gradient. The empirical results show that the generated textual distractors can successfully confuse several cutting-edge models with an average 20% accuracy drop from around 64%. Furthermore, we conduct extra adversarial training to improve the robustness of VQA models by incorporating the generated distractors. The experiment validates the effectiveness of adversarial training by showing a performance improvement of 27% for the multi-choice VQA task.
Tasks	Visual Question Answering
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09134v1
PDF	https://arxiv.org/pdf/1910.09134v1.pdf
PWC	https://paperswithcode.com/paper/good-better-best-textual-distractors
Repo
Framework

GenSample: A Genetic Algorithm for Oversampling in Imbalanced Datasets


Title	GenSample: A Genetic Algorithm for Oversampling in Imbalanced Datasets
Authors	Vishwa Karia, Wenhao Zhang, Arash Naeim, Ramin Ramezani
Abstract	Imbalanced datasets are ubiquitous. Classification performance on imbalanced datasets is generally poor for the minority class as the classifier cannot learn decision boundaries well. However, in sensitive applications like fraud detection, medical diagnosis, and spam identification, it is extremely important to classify the minority instances correctly. In this paper, we present a novel technique based on genetic algorithms, GenSample, for oversampling the minority class in imbalanced datasets. GenSample decides the rate of oversampling a minority example by taking into account the difficulty in learning that example, along with the performance improvement achieved by oversampling it. This technique terminates the oversampling process when the performance of the classifier begins to deteriorate. Consequently, it produces synthetic data only as long as a performance boost is obtained. The algorithm was tested on 9 real-world imbalanced datasets of varying sizes and imbalance ratios. It achieved the highest F-Score on 8 out of 9 datasets, confirming its ability to better handle imbalanced data compared to other existing methodologies.
Tasks	Fraud Detection, Medical Diagnosis
Published	2019-10-23
URL	https://arxiv.org/abs/1910.10806v1
PDF	https://arxiv.org/pdf/1910.10806v1.pdf
PWC	https://paperswithcode.com/paper/gensample-a-genetic-algorithm-for
Repo
Framework

Data-Driven Malaria Prevalence Prediction in Large Densely-Populated Urban Holoendemic sub-Saharan West Africa: Harnessing Machine Learning Approaches and 22-years of Prospectively Collected Data


Title	Data-Driven Malaria Prevalence Prediction in Large Densely-Populated Urban Holoendemic sub-Saharan West Africa: Harnessing Machine Learning Approaches and 22-years of Prospectively Collected Data
Authors	Biobele J. Brown, Alexander A. Przybylski, Petru Manescu, Fabio Caccioli, Gbeminiyi Oyinloye, Muna Elmi, Michael J. Shaw, Vijay Pawar, Remy Claveau, John Shawe-Taylor, Mandayam A. Srinivasan, Nathaniel K. Afolabi, Adebola E. Orimadegun, Wasiu A. Ajetunmobi, Francis Akinkunmi, Olayinka Kowobari, Kikelomo Osinusi, Felix O. Akinbami, Samuel Omokhodion, Wuraola A. Shokunbi, Ikeoluwa Lagunju, Olugbemiro Sodeinde, Delmiro Fernandez-Reyes
Abstract	Plasmodium falciparum malaria still poses one of the greatest threats to human life with over 200 million cases globally leading to half-million deaths annually. Of these, 90% of cases and of the mortality occurs in sub-Saharan Africa, mostly among children. Although malaria prediction systems are central to the 2016-2030 malaria Global Technical Strategy, currently these are inadequate at capturing and estimating the burden of disease in highly endemic countries. We developed and validated a computational system that exploits the predictive power of current Machine Learning approaches on 22-years of prospective data from the high-transmission holoendemic malaria urban-densely-populated sub-Saharan West-Africa metropolis of Ibadan. Our dataset of >9x10^4 screened study participants attending our clinical and community services from 1996 to 2017 contains monthly prevalence, temporal, environmental and host features. Our Locality-specific Elastic-Net based Malaria Prediction System (LEMPS) achieves good generalization performance, both in magnitude and direction of the prediction, when tasked to predict monthly prevalence on previously unseen validation data (MAE<=6x10^-2, MSE<=7x10^-3) within a range of (+0.1 to -0.05) error-tolerance which is relevant and usable for aiding decision-support in a holoendemic setting. LEMPS is well-suited for malaria prediction, where there are multiple features which are correlated with one another, and trading-off between regularization-strength L1-norm and L2-norm allows the system to retain stability. Data-driven systems are critical for regionally-adaptable surveillance, management of control strategies and resource allocation across stretched healthcare systems.
Tasks
Published	2019-06-18
URL	https://arxiv.org/abs/1906.07502v1
PDF	https://arxiv.org/pdf/1906.07502v1.pdf
PWC	https://paperswithcode.com/paper/data-driven-malaria-prevalence-prediction-in
Repo
Framework
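
The trade-off between the L1-norm and L2-norm mentioned in the abstract above is the defining feature of elastic-net regularization. A minimal sketch of the penalty term follows. The parameterization with `alpha` and `l1_ratio` mirrors a common convention (for example the one used by scikit-learn's `ElasticNet`), but it is an assumption here: the abstract does not state how LEMPS parameterizes or tunes the penalty.

```python
def elastic_net_penalty(weights, alpha, l1_ratio):
    """Elastic-net penalty on a weight vector.

    penalty = alpha * (l1_ratio * ||w||_1
                       + 0.5 * (1 - l1_ratio) * ||w||_2^2)

    l1_ratio = 1 gives a pure L1 (lasso) penalty and l1_ratio = 0 a pure
    L2 (ridge) penalty; values in between mix the two.
    """
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2_squared)


weights = [1.0, -2.0]
lasso_only = elastic_net_penalty(weights, alpha=1.0, l1_ratio=1.0)
ridge_only = elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.0)
mixed = elastic_net_penalty(weights, alpha=2.0, l1_ratio=0.5)
```

The L1 part encourages sparse coefficients, while the L2 part keeps the fit stable when features are correlated, which is the situation the abstract describes.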

Radiopathomics: Integration of radiographic and histologic characteristics for prognostication in glioblastoma


Title	Radiopathomics: Integration of radiographic and histologic characteristics for prognostication in glioblastoma
Authors	Saima Rathore, Muhammad A. Iftikhar, Metin N. Gurcan, Zissimos Mourelatos
Abstract	Both radiographic (Rad) imaging, such as multi-parametric magnetic resonance imaging, and digital pathology (Path) images captured from tissue samples are currently acquired as standard clinical practice for glioblastoma tumors. Both these data streams have been separately used for diagnosis and treatment planning, despite the fact that they provide complementary information. In this research work, we aimed to assess the potential of both Rad and Path images in combination and comparison. An extensive set of engineered features was extracted from delineated tumor regions in Rad images, comprising T1, T1-Gd, T2, T2-FLAIR, and 100 random patches extracted from Path images. Specifically, the features comprised descriptors of intensity, histogram, and texture, mainly quantified via gray-level-co-occurrence matrix and gray-level-run-length matrices. Features extracted from images of 107 glioblastoma patients, downloaded from The Cancer Imaging Archive, were run through support vector machine for classification using leave-one-out cross-validation mechanism, and through support vector regression for prediction of continuous survival outcome. The Pearson correlation coefficient was estimated to be 0.75, 0.74, and 0.78 for Rad, Path and RadPath data. The area-under the receiver operating characteristic curve was estimated to be 0.74, 0.76 and 0.80 for Rad, Path and RadPath data, when patients were discretized into long- and short-survival groups based on average survival cutoff. Our results support the notion that synergistically using Rad and Path images may lead to better prognosis at the initial presentation of the disease, thereby facilitating the targeted enrollment of patients into clinical trials.
Tasks
Published	2019-09-17
URL	https://arxiv.org/abs/1909.07581v2
PDF	https://arxiv.org/pdf/1909.07581v2.pdf
PWC	https://paperswithcode.com/paper/radiopathomics-integration-of-radiographic
Repo
Framework
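
The texture features in the abstract above are quantified mainly via the gray-level co-occurrence matrix (GLCM). As a rough illustration of that one building block, the sketch below counts how often gray level `i` is followed by gray level `j` at a fixed pixel offset. The function name, the default horizontal offset and the tiny quantized image are hypothetical; the paper's actual feature-extraction pipeline is not described here.

```python
def glcm(image, levels, dy=0, dx=1):
    """Count co-occurrences of gray levels at the offset (dy, dx).

    `image` is a list of rows of integer gray levels in range(levels).
    Entry [i][j] of the result is the number of pixel pairs where the
    reference pixel has level i and its offset neighbor has level j.
    """
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            # Skip pairs whose neighbor falls outside the image.
            if 0 <= ny < height and 0 <= nx < width:
                counts[image[y][x]][image[ny][nx]] += 1
    return counts


# Toy image already quantized to 3 gray levels (assumed input).
image = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
horizontal = glcm(image, levels=3)
```

Texture descriptors such as contrast or homogeneity are then usually computed from a normalized version of this matrix.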