February 1, 2020

3097 words 15 mins read

Paper Group AWR 203

Auto-ReID: Searching for a Part-aware ConvNet for Person Re-Identification. Dynamic Measurement Scheduling for Event Forecasting using Deep RL. STAR: A Concise Deep Learning Framework for Citywide Human Mobility Prediction. QATM: Quality-Aware Template Matching For Deep Learning. Single Path One-Shot Neural Architecture Search with Uniform Sampling …

Auto-ReID: Searching for a Part-aware ConvNet for Person Re-Identification

Title Auto-ReID: Searching for a Part-aware ConvNet for Person Re-Identification
Authors Ruijie Quan, Xuanyi Dong, Yu Wu, Linchao Zhu, Yi Yang
Abstract Prevailing deep convolutional neural networks (CNNs) for person re-IDentification (reID) are usually built upon ResNet or VGG backbones, which were originally designed for classification. Because reID differs from classification, the architecture should be modified accordingly. We propose to automatically search for a CNN architecture that is specifically suited to the reID task. Three aspects must be tackled. First, body structural information plays an important role in reID, but it is not encoded in these backbones. Second, Neural Architecture Search (NAS) automates the process of architecture design without human effort, but no existing NAS method incorporates the structure information of input images. Third, reID is essentially a retrieval task, but current NAS algorithms are designed only for classification. To solve these problems, we propose a retrieval-based search algorithm over a specifically designed reID search space, named Auto-ReID. Auto-ReID enables an automated approach to finding an efficient and effective CNN architecture for reID. Extensive experiments demonstrate that the searched architecture achieves state-of-the-art performance while using 50% fewer parameters and 53% fewer FLOPs than comparable models.
Tasks Neural Architecture Search, Person Re-Identification
Published 2019-03-23
URL https://arxiv.org/abs/1903.09776v4
PDF https://arxiv.org/pdf/1903.09776v4.pdf
PWC https://paperswithcode.com/paper/auto-reid-searching-for-a-part-aware-convnet
Repo https://github.com/DuanYiqun/Auto-ReID-Fast
Framework pytorch
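
The part-aware prior that motivates the search space can be illustrated with horizontal-stripe pooling. The PyTorch sketch below is our own minimal illustration of that prior (the `PartAwarePooling` name, stripe count, and dimensions are hypothetical); the actual Auto-ReID part-aware module is discovered by the search, not hand-designed like this.

```python
import torch
import torch.nn as nn

class PartAwarePooling(nn.Module):
    """Hypothetical body-part prior for reID: split a backbone feature map
    into horizontal stripes and embed each part separately."""
    def __init__(self, in_channels: int, embed_dim: int, num_parts: int = 4):
        super().__init__()
        self.num_parts = num_parts
        self.embed = nn.ModuleList(
            nn.Conv2d(in_channels, embed_dim, kernel_size=1)
            for _ in range(num_parts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, C, H, W) backbone feature map
        stripes = torch.chunk(x, self.num_parts, dim=2)     # split along height
        feats = []
        for stripe, conv in zip(stripes, self.embed):
            pooled = stripe.mean(dim=(2, 3), keepdim=True)  # average-pool each part
            feats.append(conv(pooled).flatten(1))
        return torch.cat(feats, dim=1)                      # concatenated part descriptor

feat = torch.randn(2, 256, 24, 8)          # e.g. a ResNet stage output
desc = PartAwarePooling(256, 128)(feat)
print(desc.shape)                          # torch.Size([2, 512])
```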

Dynamic Measurement Scheduling for Event Forecasting using Deep RL

Title Dynamic Measurement Scheduling for Event Forecasting using Deep RL
Authors Chun-Hao Chang, Mingjie Mai, Anna Goldenberg
Abstract Imagine a patient in critical condition. What should be measured, and when, to forecast detrimental events, especially under budget constraints? We answer this question with deep reinforcement learning (RL) that jointly minimizes measurement cost and maximizes predictive gain by scheduling strategically timed measurements. We learn a policy that depends dynamically on the patient’s health history. To scale our framework to the exponentially large action space, we distribute the reward in a sequential setting, which makes learning easier. In simulation, our policy outperforms heuristic-based scheduling with higher predictive gain and lower cost. On a real-world ICU mortality prediction task (MIMIC3), our policies reduce the total number of measurements by 31% or improve predictive gain by a factor of 3 compared to physicians, under off-policy policy evaluation.
Tasks Mortality Prediction
Published 2019-01-24
URL https://arxiv.org/abs/1901.09699v3
PDF https://arxiv.org/pdf/1901.09699v3.pdf
PWC https://paperswithcode.com/paper/dynamic-measurement-scheduling-for-event
Repo https://github.com/zzzace2000/autodiagnosis
Framework tf
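
As a rough illustration of cost-aware measurement scheduling, here is a minimal REINFORCE-style sketch in PyTorch (the paper's implementation is in TensorFlow and uses off-policy RL with a sequentially distributed reward; the dimensions, cost constant, and `predictive_gain` placeholder below are our assumptions).

```python
import torch
import torch.nn as nn

N_MEAS, H_DIM, COST = 8, 32, 0.05

policy = nn.Sequential(nn.Linear(H_DIM, 64), nn.ReLU(), nn.Linear(64, N_MEAS))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def predictive_gain(history, actions):
    # Placeholder: stands in for the improvement in event-forecast accuracy
    # obtained by taking the selected measurements.
    return (actions * torch.rand_like(actions)).sum(dim=1)

for step in range(100):
    history = torch.randn(16, H_DIM)           # encoded patient history
    probs = torch.sigmoid(policy(history))     # per-measurement probabilities
    actions = torch.bernoulli(probs)           # which measurements to order
    reward = predictive_gain(history, actions) - COST * actions.sum(dim=1)
    # REINFORCE: raise the log-probability of actions in proportion to reward
    log_prob = (actions * torch.log(probs + 1e-8)
                + (1 - actions) * torch.log(1 - probs + 1e-8)).sum(dim=1)
    loss = -(reward.detach() * log_prob).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```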

STAR: A Concise Deep Learning Framework for Citywide Human Mobility Prediction

Title STAR: A Concise Deep Learning Framework for Citywide Human Mobility Prediction
Authors Hongnian Wang, Han Su
Abstract Human mobility forecasting in a city is of utmost importance to transportation and public safety, but with ongoing urbanization and the growth of big data, the intensive computation and identification of mobility patterns have become challenging. This study focuses on improving the accuracy and efficiency of citywide human mobility prediction via a simpler solution. We propose a spatio-temporal mobility event prediction framework based on a single fully-convolutional residual network (STAR). STAR is a simple, general, and effective method for learning a single tensor that represents the mobility events. Residual learning is used to train the deep network and derive detailed results for citywide prediction scenarios. Extensive benchmark evaluations on real-world data demonstrate that STAR outperforms state-of-the-art approaches in single- and multi-step prediction while using fewer parameters and achieving higher efficiency.
Tasks
Published 2019-05-16
URL https://arxiv.org/abs/1905.06576v1
PDF https://arxiv.org/pdf/1905.06576v1.pdf
PWC https://paperswithcode.com/paper/star-a-concise-deep-learning-framework-for
Repo https://github.com/hongnianwang/STAR
Framework none
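
A minimal PyTorch sketch of the core idea, a single fully-convolutional residual network mapping a stack of past citywide event maps to the next map, is below; the depth, width, and number of input steps are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))   # residual connection

class STARLike(nn.Module):
    """Toy fully-convolutional residual net: past event maps in, next map out."""
    def __init__(self, in_steps=6, ch=64, depth=4):
        super().__init__()
        self.head = nn.Conv2d(in_steps, ch, 3, padding=1)
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(depth)])
        self.tail = nn.Conv2d(ch, 1, 3, padding=1)
    def forward(self, x):                      # x: (batch, in_steps, H, W)
        return self.tail(self.blocks(self.head(x)))

past = torch.randn(2, 6, 32, 32)   # 6 historical citywide event maps
print(STARLike()(past).shape)      # torch.Size([2, 1, 32, 32])
```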

QATM: Quality-Aware Template Matching For Deep Learning

Title QATM: Quality-Aware Template Matching For Deep Learning
Authors Jiaxin Cheng, Yue Wu, Wael Abd-Almageed, Premkumar Natarajan
Abstract Finding a template in a search image is one of the core problems in many computer vision applications, such as semantic image search, image-to-GPS verification, etc. We propose a novel quality-aware template matching method, QATM, which can be used not only as a standalone template matching algorithm but also as a trainable layer that can be easily embedded into any deep neural network. Specifically, we assess the quality of a matching pair using soft-ranking among all matching pairs, so that different matching scenarios such as 1-to-1, 1-to-many, and many-to-many are all reflected in different values. Our extensive evaluation on classic template matching benchmarks and deep learning tasks demonstrates the effectiveness of QATM. It not only outperforms state-of-the-art template matching methods when used alone, but also largely improves existing deep network solutions.
Tasks Image-to-GPS Verification
Published 2019-03-18
URL http://arxiv.org/abs/1903.07254v2
PDF http://arxiv.org/pdf/1903.07254v2.pdf
PWC https://paperswithcode.com/paper/qatm-quality-aware-template-matching-for-deep
Repo https://github.com/kamata1729/QATM_pytorch
Framework pytorch
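
The quality score itself is compact enough to sketch. Below is our reading of it in PyTorch: soft-rank a matching pair among all pairs in both directions and multiply, so only mutual best matches score highly. The temperature `alpha` and feature shapes are assumptions; the repository linked above is authoritative.

```python
import torch
import torch.nn.functional as F

def qatm_scores(feat_t, feat_s, alpha=25.0):
    """Sketch of a QATM-style matching-quality score.

    feat_t: (Nt, D) template patch features; feat_s: (Ns, D) search patch
    features; alpha is the softmax temperature."""
    t = F.normalize(feat_t, dim=1)
    s = F.normalize(feat_s, dim=1)
    sim = t @ s.T                                   # (Nt, Ns) cosine similarities
    like_s_given_t = F.softmax(alpha * sim, dim=1)  # soft-rank s among search patches
    like_t_given_s = F.softmax(alpha * sim, dim=0)  # soft-rank t among template patches
    return like_s_given_t * like_t_given_s          # high only for mutual best matches

q = qatm_scores(torch.randn(16, 64), torch.randn(100, 64))
best_per_template = q.max(dim=1).values             # matching quality per template patch
```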

Single Path One-Shot Neural Architecture Search with Uniform Sampling

Title Single Path One-Shot Neural Architecture Search with Uniform Sampling
Authors Zichao Guo, Xiangyu Zhang, Haoyuan Mu, Wen Heng, Zechun Liu, Yichen Wei, Jian Sun
Abstract The one-shot method is a powerful Neural Architecture Search (NAS) framework, but its training is non-trivial, and it is difficult to achieve competitive results on large-scale datasets such as ImageNet. In this work, we propose a Single Path One-Shot model to address the main challenge in its training. Our central idea is to construct a simplified supernet, the Single Path Supernet, which is trained with a uniform path sampling method. All underlying architectures (and their weights) are trained fully and equally. Once the supernet is trained, we apply an evolutionary algorithm to efficiently search for the best-performing architectures without any fine-tuning. Comprehensive experiments verify that our approach is flexible and effective: it is easy to train and fast to search, and it effortlessly supports complex search spaces (e.g., building blocks, channels, mixed-precision quantization) and different search constraints (e.g., FLOPs, latency). It is thus convenient to use for various needs, and it achieves state-of-the-art performance on the large-scale ImageNet dataset.
Tasks Neural Architecture Search, Quantization
Published 2019-03-31
URL http://arxiv.org/abs/1904.00420v3
PDF http://arxiv.org/pdf/1904.00420v3.pdf
PWC https://paperswithcode.com/paper/single-path-one-shot-neural-architecture
Repo https://github.com/ShunLu91/Single-Path-One-Shot-NAS
Framework pytorch
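
The single-path uniform-sampling idea reduces to a few lines: hold several candidate ops per layer and activate exactly one, drawn uniformly, in each training step. The PyTorch sketch below uses toy ops and a placeholder loss; the actual supernet blocks and training setup differ.

```python
import random
import torch
import torch.nn as nn

class ChoiceBlock(nn.Module):
    """One supernet layer holding candidate ops; a single path (one op per
    layer) is active in each training step."""
    def __init__(self, ch):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.Conv2d(ch, ch, 5, padding=2),
            nn.Sequential(nn.Conv2d(ch, ch, 1), nn.ReLU()),
        ])
    def forward(self, x, choice):
        return self.ops[choice](x)

layers = nn.ModuleList([ChoiceBlock(16) for _ in range(4)])
opt = torch.optim.SGD(layers.parameters(), lr=0.05)

for step in range(10):
    x = torch.randn(8, 16, 8, 8)
    path = [random.randrange(3) for _ in layers]   # uniform single-path sample
    for layer, c in zip(layers, path):
        x = layer(x, c)
    loss = x.pow(2).mean()                         # placeholder training loss
    opt.zero_grad(); loss.backward(); opt.step()
# After supernet training, an evolutionary search evaluates candidate paths
# with these shared weights and no fine-tuning (not shown here).
```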

Joint Iris Segmentation and Localization Using Deep Multi-task Learning Framework

Title Joint Iris Segmentation and Localization Using Deep Multi-task Learning Framework
Authors Caiyong Wang, Yuhao Zhu, Yunfan Liu, Ran He, Zhenan Sun
Abstract Iris segmentation and localization in non-cooperative environments are challenging due to illumination variations, long distances, moving subjects, limited user cooperation, etc. Traditional methods often perform poorly when confronted with iris images captured under these conditions. Recent studies have shown that deep learning methods can achieve impressive performance on the iris segmentation task. In addition, since the iris is defined as an annular region between the pupil and the sclera, geometric constraints can be imposed to help locate the iris more accurately and improve the segmentation results. In this paper, we propose a deep multi-task learning framework, named IrisParseNet, that exploits the inherent correlations between the pupil, iris, and sclera to boost the performance of iris segmentation and localization in a unified model. In particular, IrisParseNet first applies a fully convolutional encoder-decoder attention network to simultaneously estimate the pupil center, iris segmentation mask, and iris inner/outer boundary. An effective post-processing method is then adopted for iris inner/outer circle localization. To train and evaluate the proposed method, we manually labeled three challenging iris datasets, namely CASIA-Iris-Distance, UBIRIS.v2, and MICHE-I, which cover various types of noise. Extensive experiments are conducted on these newly annotated datasets, and the results show that our method outperforms state-of-the-art methods on various benchmarks. All ground-truth annotations, annotation code, and evaluation protocols are publicly available at https://github.com/xiamenwcy/IrisParseNet.
Tasks Iris Segmentation, Medical Image Segmentation, Multi-Task Learning
Published 2019-01-31
URL https://arxiv.org/abs/1901.11195v2
PDF https://arxiv.org/pdf/1901.11195v2.pdf
PWC https://paperswithcode.com/paper/joint-iris-segmentation-and-localization
Repo https://github.com/xiamenwcy/IrisParseNet
Framework none
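
A toy PyTorch sketch of the multi-task setup, one shared encoder-decoder with separate heads for the pupil-center heatmap, iris mask, and inner/outer boundaries, is below; the real IrisParseNet adds attention and is considerably deeper.

```python
import torch
import torch.nn as nn

class MultiTaskIrisNet(nn.Module):
    """Toy encoder-decoder with three task heads, illustrating the
    multi-task structure only (channels and depth are placeholders)."""
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU())
        self.center_head = nn.Conv2d(ch, 1, 1)   # pupil-center heatmap
        self.mask_head = nn.Conv2d(ch, 1, 1)     # iris segmentation mask
        self.edge_head = nn.Conv2d(ch, 2, 1)     # inner/outer boundary maps

    def forward(self, x):
        h = self.decoder(self.encoder(x))
        return self.center_head(h), self.mask_head(h), self.edge_head(h)

center, mask, edges = MultiTaskIrisNet()(torch.randn(1, 3, 64, 64))
```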

Gate-Shift Networks for Video Action Recognition

Title Gate-Shift Networks for Video Action Recognition
Authors Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz
Abstract Deep 3D CNNs for video action recognition are designed to learn powerful representations in the joint spatio-temporal feature space. In practice, however, because of the large number of parameters and computations involved, they may under-perform when sufficiently large datasets for training them at scale are unavailable. In this paper we introduce spatial gating in the spatial-temporal decomposition of 3D kernels. We implement this concept with the Gate-Shift Module (GSM). GSM is lightweight and turns a 2D CNN into a highly efficient spatio-temporal feature extractor. With GSM plugged in, a 2D CNN learns to adaptively route features through time and combine them, with almost no additional parameters or computational overhead. We perform an extensive evaluation of the proposed module to study its effectiveness in video action recognition, achieving state-of-the-art results on the Something-Something-V1 and Diving48 datasets, and obtaining competitive results on EPIC-Kitchens with far lower model complexity.
Tasks Action Recognition In Videos
Published 2019-12-01
URL https://arxiv.org/abs/1912.00381v2
PDF https://arxiv.org/pdf/1912.00381v2.pdf
PWC https://paperswithcode.com/paper/gate-shift-networks-for-video-action
Repo https://github.com/swathikirans/GSM
Framework tf

A Factored Generalized Additive Model for Clinical Decision Support in the Operating Room

Title A Factored Generalized Additive Model for Clinical Decision Support in the Operating Room
Authors Zhicheng Cui, Bradley A Fritz, Christopher R King, Michael S Avidan, Yixin Chen
Abstract Logistic regression (LR) is widely used in clinical prediction because it is simple to deploy and easy to interpret. Nevertheless, being a linear model, LR has limited expressive capability and often performs unsatisfactorily. Generalized additive models (GAMs) extend the linear model with transformations of the input features, though not all GAM variants allow feature interactions. In this paper, we propose a factored generalized additive model (F-GAM) that preserves model interpretability for targeted features while allowing rich interaction with features fixed within the individual. We evaluate F-GAM on the prediction of two targets, postoperative acute kidney injury and acute respiratory failure, from a single-center database. F-GAM achieves superior performance in terms of AUPRC and AUROC compared to several other GAM implementations, random forests, support vector machines, and a deep neural network. We find that the model’s interpretability is good, with results that have high face validity.
Tasks
Published 2019-07-29
URL https://arxiv.org/abs/1907.12596v1
PDF https://arxiv.org/pdf/1907.12596v1.pdf
PWC https://paperswithcode.com/paper/a-factored-generalized-additive-model-for
Repo https://github.com/nostringattached/FGAM
Framework pytorch
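
One way to read the factored construction: each targeted feature keeps an additive shape function, but the function's coefficients are generated from the individual-fixed features, so interaction with those features comes for free while the per-feature additive form stays interpretable. The PyTorch sketch below uses an RBF basis of our own choosing; the paper's exact parameterization may differ.

```python
import torch
import torch.nn as nn

class FactoredGAM(nn.Module):
    """Sketch: additive shape functions over targeted features x, with
    basis coefficients generated from individual-fixed features z."""
    def __init__(self, n_feats, n_fixed, n_basis=8):
        super().__init__()
        self.n_feats, self.n_basis = n_feats, n_basis
        # coefficient generator: z -> one coefficient vector per feature
        self.coef_net = nn.Sequential(
            nn.Linear(n_fixed, 64), nn.ReLU(),
            nn.Linear(64, n_feats * n_basis))
        self.centers = nn.Parameter(torch.linspace(-2, 2, n_basis),
                                    requires_grad=False)

    def forward(self, x, z):                 # x: (B, n_feats), z: (B, n_fixed)
        coefs = self.coef_net(z).view(-1, self.n_feats, self.n_basis)
        # RBF basis expansion of each targeted feature
        basis = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)
        logits = (coefs * basis).sum(dim=(1, 2))   # additive over features
        return torch.sigmoid(logits)               # predicted event risk

model = FactoredGAM(n_feats=10, n_fixed=5)
risk = model(torch.randn(4, 10), torch.randn(4, 5))
```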

ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation

Title ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation
Authors Yuzhe Yang, Guo Zhang, Dina Katabi, Zhi Xu
Abstract Deep neural networks are vulnerable to adversarial attacks. The literature is rich with algorithms that can easily craft successful adversarial examples. In contrast, the performance of defense techniques still lags behind. This paper proposes ME-Net, a defense method that leverages matrix estimation (ME). In ME-Net, images are preprocessed in two steps: first, pixels are randomly dropped from the image; then, the image is reconstructed using ME. We show that this process destroys the adversarial structure of the noise while reinforcing the global structure of the original image. Since humans typically rely on such global structures when classifying images, the process makes the network more compatible with human perception. We conduct comprehensive experiments on prevailing benchmarks such as MNIST, CIFAR-10, SVHN, and Tiny-ImageNet. Comparing ME-Net with state-of-the-art defense mechanisms shows that ME-Net consistently outperforms prior techniques, improving robustness against both black-box and white-box attacks.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.11971v1
PDF https://arxiv.org/pdf/1905.11971v1.pdf
PWC https://paperswithcode.com/paper/me-net-towards-effective-adversarial
Repo https://github.com/YyzHarry/ME-Net
Framework pytorch
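
The preprocessing pipeline is simple to sketch. Below, truncated SVD stands in for the matrix-estimation step (the paper evaluates several ME methods, such as nuclear-norm minimization); only the two-step structure, random pixel dropping followed by reconstruction, is taken from the abstract.

```python
import numpy as np

def menet_preprocess(img, keep_prob=0.5, rank=8):
    """ME-Net-style preprocessing sketch: randomly drop pixels, then
    reconstruct via a rank-`rank` truncated SVD of the rescaled observed
    entries. img: 2-D float array in [0, 1]; apply per channel for color."""
    mask = np.random.rand(*img.shape) < keep_prob
    observed = img * mask / keep_prob          # rescale so entries are unbiased
    u, s, vt = np.linalg.svd(observed, full_matrices=False)
    recon = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return np.clip(recon, 0.0, 1.0)

img = np.random.rand(32, 32)
clean = menet_preprocess(img)   # feed `clean` to the classifier instead of `img`
```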

Finding the most similar textual documents using Case-Based Reasoning

Title Finding the most similar textual documents using Case-Based Reasoning
Authors Marko Mihajlovic, Ning Xiong
Abstract In recent years, the huge amount of unstructured textual data on the Internet has made it difficult for AI algorithms to provide the best recommendations for users and their search queries. Since the Internet became widespread, a great deal of research has been done in Natural Language Processing (NLP) and machine learning. Almost every solution transforms documents into Vector Space Models (VSMs) in order to apply AI algorithms to them. One such approach is based on Case-Based Reasoning (CBR), and the most important part of those systems is computing the similarity between numerical data points. In 2016, a new similarity metric, TS-SS, was proposed, which showed state-of-the-art results in textual mining for unsupervised learning. However, its performance on supervised learning (the classification task) had not been investigated. In this work, we devised a CBR system capable of finding the most similar documents for a given query, aiming to investigate the performance of the new state-of-the-art metric, TS-SS, alongside the two geometric similarity measures, Euclidean distance and cosine similarity, that have shown the best predictive results over several benchmark corpora. The results show that the TS-SS measure is surprisingly ill-suited to high-dimensional features.
Tasks
Published 2019-11-01
URL https://arxiv.org/abs/1911.00262v1
PDF https://arxiv.org/pdf/1911.00262v1.pdf
PWC https://paperswithcode.com/paper/finding-the-most-similar-textual-documents
Repo https://github.com/Maki94/document-classification
Framework none
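
For reference, here is the TS-SS metric as we understand it from the original 2016 paper (Heidarian & Dinneen): the product of a triangle similarity and a sector similarity, combining angle and magnitude information; lower values indicate more similar vectors. Treat the constants as our transcription, not verified against this repo.

```python
import numpy as np

def ts_ss(a, b):
    """TS-SS dissimilarity sketch: triangle area times sector area."""
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    cos = np.clip(a @ b / (norm_a * norm_b), -1.0, 1.0)
    theta = np.degrees(np.arccos(cos)) + 10.0                # degree offset avoids 0
    ts = norm_a * norm_b * np.sin(np.radians(theta)) / 2.0   # triangle area
    ed = np.linalg.norm(a - b)                               # Euclidean distance
    md = abs(norm_a - norm_b)                                # magnitude difference
    ss = np.pi * (ed + md) ** 2 * theta / 360.0              # sector area
    return ts * ss

print(ts_ss(np.array([1.0, 2.0]), np.array([2.0, 1.0])))
```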

Unsupervised speech representation learning using WaveNet autoencoders

Title Unsupervised speech representation learning using WaveNet autoencoders
Authors Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord
Abstract We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or background noise. Since the learned representation is tuned to contain only phonetic content, we resort to using a high capacity WaveNet decoder to infer information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the kind of constraint that is applied to the latent representation. We compare three variants: a simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder (VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately reconstruct individual spectrogram frames. Moreover, for discrete encodings extracted using the VQ-VAE, we measure the ease of mapping them to phonemes. We introduce a regularization scheme that forces the representations to focus on the phonetic content of the utterance and report performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.
Tasks Dimensionality Reduction, Representation Learning
Published 2019-01-25
URL https://arxiv.org/abs/1901.08810v2
PDF https://arxiv.org/pdf/1901.08810v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-speech-representation-learning
Repo https://github.com/swasun/VQ-VAE-Speech
Framework pytorch
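
Of the three bottlenecks compared, the VQ-VAE one is the easiest to sketch: snap each encoder frame to its nearest codebook vector, pass gradients straight through, and add the usual codebook and commitment losses. The codebook size and dimensions below are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Minimal VQ bottleneck with a straight-through gradient estimator."""
    def __init__(self, num_codes=320, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta

    def forward(self, z):                        # z: (batch, time, dim)
        d = torch.cdist(z, self.codebook.weight[None].expand(z.size(0), -1, -1))
        idx = d.argmin(dim=-1)                   # nearest code per frame
        q = self.codebook(idx)
        # codebook loss pulls codes to encodings; commitment loss the reverse
        loss = ((q - z.detach()) ** 2).mean() + self.beta * ((z - q.detach()) ** 2).mean()
        q = z + (q - z).detach()                 # straight-through estimator
        return q, idx, loss

vq = VectorQuantizer()
quantized, codes, vq_loss = vq(torch.randn(2, 50, 64))   # 50 encoder frames
```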

Data Cleansing for Models Trained with SGD

Title Data Cleansing for Models Trained with SGD
Authors Satoshi Hara, Atsushi Nitanda, Takanori Maehara
Abstract Data cleansing is a typical approach used to improve the accuracy of machine learning models, but it requires extensive domain knowledge to identify the influential instances that affect the models. In this paper, we propose an algorithm that can suggest influential instances without using any domain knowledge. With the proposed method, users only need to inspect the instances suggested by the algorithm; no extensive domain knowledge is required, which enables even non-experts to conduct data cleansing and improve the model. Existing methods require the loss function to be convex and an optimal model to be obtained, which is not always the case in modern machine learning. To overcome these limitations, we propose a novel approach specifically designed for models trained with stochastic gradient descent (SGD). The proposed method infers the influential instances by retracing the steps of SGD while incorporating the intermediate models computed in each step. Through experiments, we demonstrate that the proposed method can accurately infer the influential instances. Moreover, we use MNIST and CIFAR10 to show that the models can be effectively improved by removing the influential instances suggested by the proposed method.
Tasks
Published 2019-06-20
URL https://arxiv.org/abs/1906.08473v1
PDF https://arxiv.org/pdf/1906.08473v1.pdf
PWC https://paperswithcode.com/paper/data-cleansing-for-models-trained-with-sgd
Repo https://github.com/sato9hara/sgd-influence
Framework pytorch
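
Our condensed reading of the retracing idea, for a tiny logistic-regression model: record checkpoints and batches during SGD, then walk the steps backward, accumulating each instance's estimated contribution to the validation loss and propagating the validation gradient through one Hessian-vector product per step. The authors' repository is authoritative; this sketch compresses their estimator considerably.

```python
import torch

torch.manual_seed(0)
n, d, steps, lr = 64, 5, 20, 0.1
X, y = torch.randn(n, d), torch.randint(0, 2, (n,)).float()
Xv, yv = torch.randn(16, d), torch.randint(0, 2, (16,)).float()

def loss_fn(w, X, y):
    return torch.nn.functional.binary_cross_entropy_with_logits(X @ w, y)

# Forward pass: train with SGD, recording checkpoints and batches.
w = torch.zeros(d, requires_grad=True)
trace = []
for t in range(steps):
    batch = torch.randperm(n)[:8]
    trace.append((w.detach().clone(), batch))
    g, = torch.autograd.grad(loss_fn(w, X[batch], y[batch]), w)
    w = (w - lr * g).detach().requires_grad_(True)

# Backward retrace: u starts at the validation gradient.
u, = torch.autograd.grad(loss_fn(w, Xv, yv), w)
influence = torch.zeros(n)
for w_t, batch in reversed(trace):
    w_t.requires_grad_(True)
    for i in batch.tolist():             # per-instance contribution this step
        g_i, = torch.autograd.grad(loss_fn(w_t, X[i:i+1], y[i:i+1]), w_t)
        influence[i] += lr / len(batch) * (u @ g_i)
    # Propagate u through the step: u <- u - lr * H_t u (Hessian-vector product).
    g, = torch.autograd.grad(loss_fn(w_t, X[batch], y[batch]), w_t, create_graph=True)
    hvp, = torch.autograd.grad(g @ u, w_t)
    u = (u - lr * hvp).detach()

print(influence.topk(5).indices)         # most influential training instances
```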

TriMap: Large-scale Dimensionality Reduction Using Triplets

Title TriMap: Large-scale Dimensionality Reduction Using Triplets
Authors Ehsan Amid, Manfred K. Warmuth
Abstract We introduce "TriMap", a dimensionality reduction technique based on triplet constraints that preserves the global accuracy of the data better than other commonly used methods such as t-SNE, LargeVis, and UMAP. To quantify global accuracy, we introduce a score that roughly reflects the relative placement of the clusters rather than of individual points. We empirically show the excellent performance of TriMap on a large variety of datasets in terms of both embedding quality and runtime. On our performance benchmarks, TriMap easily scales to millions of points without depleting memory and clearly outperforms t-SNE, LargeVis, and UMAP in terms of runtime.
Tasks Dimensionality Reduction
Published 2019-10-01
URL https://arxiv.org/abs/1910.00204v1
PDF https://arxiv.org/pdf/1910.00204v1.pdf
PWC https://paperswithcode.com/paper/trimap-large-scale-dimensionality-reduction
Repo https://github.com/eamid/trimap
Framework none
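
Usage is a one-liner via the authors' `trimap` package (`pip install trimap`); the constructor argument shown is our assumption from the repo README.

```python
import numpy as np
import trimap

X = np.random.rand(10000, 50)                     # high-dimensional data
embedding = trimap.TRIMAP(n_dims=2).fit_transform(X)
print(embedding.shape)                            # (10000, 2)
```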

Privacy-Preserving Multiple Tensor Factorization for Synthesizing Large-Scale Location Traces

Title Privacy-Preserving Multiple Tensor Factorization for Synthesizing Large-Scale Location Traces
Authors Takao Murakami, Koki Hamada, Yusuke Kawamoto, Takuma Hatano
Abstract With the widespread use of LBSs (Location-based Services), synthesizing location traces plays an increasingly important role in analyzing spatial big data while protecting user privacy. Although location synthesizers have been widely studied, existing synthesizers do not provide sufficient utility, privacy, or scalability, and hence are not practical for large-scale location traces. To overcome this issue, we propose a novel location synthesizer called PPMTF (Privacy-Preserving Multiple Tensor Factorization). We model various statistical features of the original traces with a transition-count tensor and a visit-count tensor. We factorize these two tensors simultaneously via multiple tensor factorization and train the factor matrices via posterior sampling. We then synthesize traces using the MH (Metropolis-Hastings) algorithm and perform a plausible-deniability test on each synthetic trace. We comprehensively evaluate the proposed method on two datasets. Our experimental results show that the proposed method preserves various statistical features, provides plausible deniability and differential privacy, and synthesizes large-scale location traces in practical time. The proposed method also significantly outperforms the state-of-the-art methods in terms of utility, privacy, and scalability.
Tasks
Published 2019-11-11
URL https://arxiv.org/abs/1911.04226v5
PDF https://arxiv.org/pdf/1911.04226v5.pdf
PWC https://paperswithcode.com/paper/privacy-preserving-multiple-tensor
Repo https://github.com/PPMTF/PPMTF
Framework none
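
The "multiple tensor factorization" core, two count tensors factorized jointly with shared user and region factors, can be sketched with plain gradient descent in NumPy. Note this omits the paper's posterior sampling, the MH-based trace synthesis, and any privacy mechanism; all shapes below are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_user, n_reg, n_time, k = 20, 15, 8, 4
T = rng.poisson(1.0, (n_user, n_reg, n_reg)).astype(float)   # transition counts
V = rng.poisson(1.0, (n_user, n_reg, n_time)).astype(float)  # visit counts

A = rng.random((n_user, k)); B = rng.random((n_reg, k))      # shared factors
C = rng.random((n_reg, k)); D = rng.random((n_time, k))
lr = 1e-3
for _ in range(200):
    # Reconstructions share A (users) and B (regions) across both tensors.
    Th = np.einsum('uz,iz,jz->uij', A, B, C)
    Vh = np.einsum('uz,iz,tz->uit', A, B, D)
    Et, Ev = Th - T, Vh - V
    A -= lr * (np.einsum('uij,iz,jz->uz', Et, B, C)
               + np.einsum('uit,iz,tz->uz', Ev, B, D))
    B -= lr * (np.einsum('uij,uz,jz->iz', Et, A, C)
               + np.einsum('uit,uz,tz->iz', Ev, A, D))
    C -= lr * np.einsum('uij,uz,iz->jz', Et, A, B)
    D -= lr * np.einsum('uit,uz,iz->tz', Ev, A, B)

print(np.abs(Th - T).mean(), np.abs(Vh - V).mean())          # reconstruction errors
```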

Connectivity-Optimized Representation Learning via Persistent Homology

Title Connectivity-Optimized Representation Learning via Persistent Homology
Authors Christoph Hofer, Roland Kwitt, Mandar Dixit, Marc Niethammer
Abstract We study the problem of learning representations with controllable connectivity properties. This is beneficial in situations where the imposed structure can be leveraged upstream. In particular, we control the connectivity of an autoencoder’s latent space via a novel type of loss that operates on information from persistent homology. Under mild conditions, this loss is differentiable, and we present a theoretical analysis of the properties it induces. We choose one-class learning as our upstream task and demonstrate that the imposed structure enables informed parameter selection for modeling the in-class distribution via kernel density estimators. Evaluated on computer vision data, these one-class models exhibit competitive performance and, in a low-sample-size regime, outperform other methods by a large margin. Notably, our results indicate that a single autoencoder, trained on auxiliary (unlabeled) data, yields a mapping into latent space that can be reused across datasets for one-class learning.
Tasks Representation Learning
Published 2019-06-21
URL https://arxiv.org/abs/1906.09003v1
PDF https://arxiv.org/pdf/1906.09003v1.pdf
PWC https://paperswithcode.com/paper/connectivity-optimized-representation
Repo https://github.com/c-hofer/COREL_icml2019
Framework pytorch
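
A sketch of a connectivity penalty in the paper's spirit: the death times of 0-dimensional persistent homology of a point cloud are exactly the edge lengths of its minimum spanning tree, so one can select the MST edges with SciPy and differentiate through the corresponding PyTorch distances. The target `eta` and the squared penalty are our simplifications of the paper's loss.

```python
import numpy as np
import torch
from scipy.sparse.csgraph import minimum_spanning_tree

def connectivity_loss(z, eta=1.0):
    """Push the MST edge lengths (0-dim persistence death times) of a batch
    of latent codes toward eta. SciPy picks the edges (a discrete choice);
    gradients flow through the PyTorch pairwise distances."""
    d = torch.cdist(z, z)                              # pairwise latent distances
    mst = minimum_spanning_tree(d.detach().cpu().numpy()).tocoo()
    edges = torch.as_tensor(np.stack([mst.row, mst.col]), dtype=torch.long)
    edge_lengths = d[edges[0], edges[1]]
    return ((edge_lengths - eta) ** 2).mean()

z = torch.randn(32, 8, requires_grad=True)   # a batch of latent codes
loss = connectivity_loss(z)
loss.backward()
```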