Paper Group AWR 203
Auto-ReID: Searching for a Part-aware ConvNet for Person Re-Identification. Dynamic Measurement Scheduling for Event Forecasting using Deep RL. STAR: A Concise Deep Learning Framework for Citywide Human Mobility Prediction. QATM: Quality-Aware Template Matching For Deep Learning. Single Path One-Shot Neural Architecture Search with Uniform Sampling …
Auto-ReID: Searching for a Part-aware ConvNet for Person Re-Identification
Title | Auto-ReID: Searching for a Part-aware ConvNet for Person Re-Identification |
Authors | Ruijie Quan, Xuanyi Dong, Yu Wu, Linchao Zhu, Yi Yang |
Abstract | Prevailing deep convolutional neural networks (CNNs) for person re-identification (reID) are usually built upon ResNet or VGG backbones, which were originally designed for classification. Because reID differs from classification, the architecture should be modified accordingly. We propose to automatically search for a CNN architecture that is specifically suited to the reID task. Three issues must be tackled. First, body structural information plays an important role in reID, but it is not encoded in standard backbones. Second, Neural Architecture Search (NAS) automates architecture design without human effort, but no existing NAS method incorporates the structural information of input images. Third, reID is essentially a retrieval task, whereas current NAS algorithms are designed only for classification. To solve these problems, we propose a retrieval-based search algorithm over a specifically designed reID search space, named Auto-ReID. Auto-ReID enables an automated approach to finding an efficient and effective CNN architecture for reID. Extensive experiments demonstrate that the searched architecture achieves state-of-the-art performance while using about 50% fewer parameters and 53% fewer FLOPs than comparable models. |
Tasks | Neural Architecture Search, Person Re-Identification |
Published | 2019-03-23 |
URL | https://arxiv.org/abs/1903.09776v4 |
https://arxiv.org/pdf/1903.09776v4.pdf | |
PWC | https://paperswithcode.com/paper/auto-reid-searching-for-a-part-aware-convnet |
Repo | https://github.com/DuanYiqun/Auto-ReID-Fast |
Framework | pytorch |
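The searched cell itself is not given in the abstract, but the "body structural information" it refers to can be illustrated with a part-aware head in the spirit of part-based reID models: split the backbone feature map into horizontal stripes and embed each stripe separately. A minimal PyTorch sketch, where all names, channel counts, and the stripe count are illustrative assumptions, not the actual Auto-ReID architecture:

```python
import torch
import torch.nn as nn

class PartAwarePooling(nn.Module):
    """Hypothetical part-aware head: split the feature map into horizontal
    stripes (rough body parts) and embed each stripe separately. This only
    illustrates the 'body structure' idea from the abstract, not the
    searched Auto-ReID cell."""

    def __init__(self, in_channels: int, embed_dim: int, num_parts: int = 6):
        super().__init__()
        self.num_parts = num_parts
        self.pool = nn.AdaptiveAvgPool2d((num_parts, 1))  # one cell per stripe
        self.embeds = nn.ModuleList(
            nn.Conv2d(in_channels, embed_dim, kernel_size=1) for _ in range(num_parts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) backbone feature map
        stripes = self.pool(x)                               # (B, C, P, 1)
        feats = [
            self.embeds[p](stripes[:, :, p:p + 1, :]).flatten(1)
            for p in range(self.num_parts)
        ]
        return torch.cat(feats, dim=1)                       # (B, P * embed_dim)

# usage: head = PartAwarePooling(2048, 256); v = head(torch.randn(4, 2048, 24, 8))
```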
Dynamic Measurement Scheduling for Event Forecasting using Deep RL
Title | Dynamic Measurement Scheduling for Event Forecasting using Deep RL |
Authors | Chun-Hao Chang, Mingjie Mai, Anna Goldenberg |
Abstract | Imagine a patient in critical condition. What should be measured, and when, to forecast detrimental events, especially under budget constraints? We answer this question with deep reinforcement learning (RL) that jointly minimizes measurement cost and maximizes predictive gain by scheduling strategically timed measurements. We learn a policy that depends dynamically on the patient’s health history. To scale our framework to the exponentially large action space, we distribute the reward in a sequential setting, which makes learning easier. In simulation, our policy outperforms heuristic-based scheduling with higher predictive gain and lower cost. In a real-world ICU mortality prediction task (MIMIC3), our policies reduce the total number of measurements by 31% or improve predictive gain by a factor of 3 compared to physicians, under off-policy policy evaluation. |
Tasks | Mortality Prediction |
Published | 2019-01-24 |
URL | https://arxiv.org/abs/1901.09699v3 |
https://arxiv.org/pdf/1901.09699v3.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-measurement-scheduling-for-event |
Repo | https://github.com/zzzace2000/autodiagnosis |
Framework | tf |
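The trade-off the policy optimizes can be summarized as predictive gain minus measurement cost. A hedged sketch of such a reward term, where `cost_coef` and the exact form of the gain are assumptions rather than the paper's precise definitions:

```python
import numpy as np

def measurement_reward(p_before: float, p_after: float,
                       costs: np.ndarray, actions: np.ndarray,
                       cost_coef: float = 0.1) -> float:
    """Hedged sketch of the cost/information trade-off: reward the change in
    the event forecaster's probability after observing the chosen
    measurements, minus their cost. `p_before`/`p_after` are the event
    probabilities before and after the measurements; `actions` is a 0/1
    vector over measurement types (names are illustrative)."""
    predictive_gain = p_after - p_before
    measurement_cost = float(costs @ actions)
    return predictive_gain - cost_coef * measurement_cost
```

Distributing this reward sequentially, one measurement decision at a time, is what keeps learning tractable over the combinatorial action space.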
STAR: A Concise Deep Learning Framework for Citywide Human Mobility Prediction
Title | STAR: A Concise Deep Learning Framework for Citywide Human Mobility Prediction |
Authors | Hongnian Wang, Han Su |
Abstract | Human mobility forecasting in a city is of utmost importance to transportation and public safety, but with ongoing urbanization and the growth of big data, the intensive computation required to determine mobility patterns has become challenging. This study focuses on improving the accuracy and efficiency of citywide human mobility prediction via a simpler solution. We propose a spatio-temporal mobility event prediction framework based on a single fully-convolutional residual network (STAR). STAR is a simple, general, and effective method for learning from a single tensor representing the mobility events. Residual learning is used to train the deep network and derive detailed results for citywide prediction scenarios. Extensive benchmark evaluations on real-world data demonstrate that STAR outperforms state-of-the-art approaches in single- and multi-step prediction while using fewer parameters and achieving higher efficiency. |
Tasks | |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06576v1 |
https://arxiv.org/pdf/1905.06576v1.pdf | |
PWC | https://paperswithcode.com/paper/star-a-concise-deep-learning-framework-for |
Repo | https://github.com/hongnianwang/STAR |
Framework | none |
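A minimal sketch of the kind of model the abstract describes: a single fully-convolutional residual network mapping a tensor of recent mobility frames over the city grid to the next frame. Depth, widths, and the 8-frame history below are assumptions:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain 2-layer fully-convolutional residual block over the city grid;
    a sketch of the kind of block STAR stacks, not its exact design."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

# input: a single tensor stacking recent mobility frames over the H x W city grid
net = nn.Sequential(
    nn.Conv2d(8, 64, 3, padding=1),       # 8 historical frames (assumed)
    *[ResBlock(64) for _ in range(4)],
    nn.Conv2d(64, 1, 3, padding=1),       # next-step mobility map
)
out = net(torch.randn(2, 8, 32, 32))       # (2, 1, 32, 32)
```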
QATM: Quality-Aware Template Matching For Deep Learning
Title | QATM: Quality-Aware Template Matching For Deep Learning |
Authors | Jiaxin Cheng, Yue Wu, Wael Abd-Almageed, Premkumar Natarajan |
Abstract | Finding a template in a search image is one of the core problems in many computer vision applications, such as semantic image alignment and image-to-GPS verification. We propose a novel quality-aware template matching method, QATM, which serves not only as a standalone template matching algorithm but also as a trainable layer that can be easily embedded into any deep neural network. Specifically, we assess the quality of a matching pair using soft-ranking among all matching pairs, so that different matching scenarios such as 1-to-1, 1-to-many, and many-to-many are reflected in different values. Our extensive evaluation on classic template matching benchmarks and deep learning tasks demonstrates the effectiveness of QATM. It not only outperforms state-of-the-art template matching methods when used alone, but also substantially improves existing deep network solutions. |
Tasks | Image-To-Gps Verification |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.07254v2 |
http://arxiv.org/pdf/1903.07254v2.pdf | |
PWC | https://paperswithcode.com/paper/qatm-quality-aware-template-matching-for-deep |
Repo | https://github.com/kamata1729/QATM_pytorch |
Framework | pytorch |
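The soft-ranking idea can be made concrete: the quality of a (search patch, template patch) pair is the product of its softmax likelihood in both matching directions, so 1-to-1 matches score near 1 while 1-to-many and many-to-many matches are discounted. A sketch following the published formulation, with `alpha` as the softmax temperature:

```python
import torch
import torch.nn.functional as F

def qatm_scores(feat_t: torch.Tensor, feat_s: torch.Tensor, alpha: float = 25.0):
    """QATM quality sketch. feat_t: (Nt, D) template patch features,
    feat_s: (Ns, D) search-image patch features. Returns the best matching
    quality per search patch."""
    t = F.normalize(feat_t, dim=1)
    s = F.normalize(feat_s, dim=1)
    rho = s @ t.t()                            # (Ns, Nt) cosine similarities
    like_t = F.softmax(alpha * rho, dim=1)     # soft-ranking over template patches
    like_s = F.softmax(alpha * rho, dim=0)     # soft-ranking over search patches
    quality = like_t * like_s                  # (Ns, Nt), ~1 only for 1-to-1 matches
    return quality.max(dim=1).values           # (Ns,) matching score map
```

Because every operation is differentiable, the same scoring can sit inside a network as a trainable layer, which is what the abstract emphasizes.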
Single Path One-Shot Neural Architecture Search with Uniform Sampling
Title | Single Path One-Shot Neural Architecture Search with Uniform Sampling |
Authors | Zichao Guo, Xiangyu Zhang, Haoyuan Mu, Wen Heng, Zechun Liu, Yichen Wei, Jian Sun |
Abstract | The one-shot method is a powerful Neural Architecture Search (NAS) framework, but its training is non-trivial and it is difficult to achieve competitive results on large-scale datasets like ImageNet. In this work, we propose a Single Path One-Shot model to address the main challenge in its training. Our central idea is to construct a simplified supernet, the Single Path Supernet, which is trained with a uniform path sampling method so that all underlying architectures (and their weights) are trained fully and equally. Once the supernet is trained, we apply an evolutionary algorithm to efficiently search for the best-performing architectures without any fine-tuning. Comprehensive experiments verify that our approach is flexible and effective: it is easy to train and fast to search, and it effortlessly supports complex search spaces (e.g., building blocks, channels, mixed-precision quantization) and different search constraints (e.g., FLOPs, latency), making it convenient for various needs. It achieves state-of-the-art performance on the large-scale ImageNet dataset. |
Tasks | Neural Architecture Search, Quantization |
Published | 2019-03-31 |
URL | http://arxiv.org/abs/1904.00420v3 |
http://arxiv.org/pdf/1904.00420v3.pdf | |
PWC | https://paperswithcode.com/paper/single-path-one-shot-neural-architecture |
Repo | https://github.com/ShunLu91/Single-Path-One-Shot-NAS |
Framework | pytorch |
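The single-path supernet is straightforward to sketch: each layer holds several candidate ops, and every training step samples one op per layer uniformly at random, so only that path receives gradients. A minimal sketch; the real search space (ShuffleNet-style blocks, channel choices, mixed precision) is far richer:

```python
import random
import torch
import torch.nn as nn

class ChoiceBlock(nn.Module):
    """One supernet layer holding several candidate ops; exactly one op is
    active per forward pass."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)

    def forward(self, x, choice: int):
        return self.ops[choice](x)

layers = nn.ModuleList([
    ChoiceBlock([nn.Conv2d(16, 16, k, padding=k // 2) for k in (1, 3, 5)])
    for _ in range(4)
])

def supernet_forward(x, choices):
    for layer, c in zip(layers, choices):
        x = layer(x, c)
    return x

# training step: sample one architecture uniformly, update only its weights
choices = [random.randrange(3) for _ in layers]
y = supernet_forward(torch.randn(2, 16, 8, 8), choices)
```

After training, the evolutionary search just evaluates different `choices` lists against the trained weights, with no fine-tuning.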
Joint Iris Segmentation and Localization Using Deep Multi-task Learning Framework
Title | Joint Iris Segmentation and Localization Using Deep Multi-task Learning Framework |
Authors | Caiyong Wang, Yuhao Zhu, Yunfan Liu, Ran He, Zhenan Sun |
Abstract | Iris segmentation and localization in non-cooperative environments is challenging due to illumination variations, long distances, moving subjects, and limited user cooperation. Traditional methods often perform poorly when confronted with iris images captured under these conditions. Recent studies have shown that deep learning methods can achieve impressive performance on the iris segmentation task. In addition, since the iris is defined as an annular region between the pupil and the sclera, geometric constraints can be imposed to help locate the iris more accurately and improve segmentation results. In this paper, we propose a deep multi-task learning framework, named IrisParseNet, which exploits the inherent correlations between pupil, iris, and sclera to boost the performance of iris segmentation and localization in a unified model. In particular, IrisParseNet first applies a fully convolutional encoder-decoder attention network to simultaneously estimate the pupil center, the iris segmentation mask, and the iris inner/outer boundary. An effective post-processing method is then applied to localize the iris inner/outer circles. To train and evaluate the proposed method, we manually label three challenging iris datasets, namely CASIA-Iris-Distance, UBIRIS.v2, and MICHE-I, which cover various types of noise. Extensive experiments are conducted on these newly annotated datasets, and the results show that our method outperforms state-of-the-art methods on various benchmarks. All ground-truth annotations, annotation code, and evaluation protocols are publicly available at https://github.com/xiamenwcy/IrisParseNet. |
Tasks | Iris Segmentation, Medical Image Segmentation, Multi-Task Learning |
Published | 2019-01-31 |
URL | https://arxiv.org/abs/1901.11195v2 |
https://arxiv.org/pdf/1901.11195v2.pdf | |
PWC | https://paperswithcode.com/paper/joint-iris-segmentation-and-localization |
Repo | https://github.com/xiamenwcy/IrisParseNet |
Framework | none |
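The multi-task structure can be sketched as three light heads over one shared decoder feature map; the attention encoder-decoder and the circle-fitting post-processing are omitted, and all channel counts are assumptions:

```python
import torch
import torch.nn as nn

class MultiTaskIrisHead(nn.Module):
    """Sketch of the multi-task idea only: one shared decoder feature map,
    three 1x1-conv heads for the pupil-center heatmap, the iris mask, and
    the inner/outer boundary maps."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.center = nn.Conv2d(channels, 1, 1)     # pupil-center heatmap
        self.mask = nn.Conv2d(channels, 1, 1)       # iris segmentation mask
        self.boundary = nn.Conv2d(channels, 2, 1)   # inner / outer boundaries

    def forward(self, feat: torch.Tensor):
        return (torch.sigmoid(self.center(feat)),
                torch.sigmoid(self.mask(feat)),
                torch.sigmoid(self.boundary(feat)))

heads = MultiTaskIrisHead()
center, mask, boundary = heads(torch.randn(1, 64, 120, 160))
```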
Gate-Shift Networks for Video Action Recognition
Title | Gate-Shift Networks for Video Action Recognition |
Authors | Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz |
Abstract | Deep 3D CNNs for video action recognition are designed to learn powerful representations in the joint spatio-temporal feature space. In practice, however, because of the large number of parameters and computations involved, they may underperform when sufficiently large datasets for training them at scale are unavailable. In this paper we introduce spatial gating into the spatio-temporal decomposition of 3D kernels. We implement this concept with the Gate-Shift Module (GSM). GSM is lightweight and turns a 2D CNN into a highly efficient spatio-temporal feature extractor. With GSM plugged in, a 2D CNN learns to adaptively route features through time and combine them, with almost no additional parameters or computational overhead. We perform an extensive evaluation of the proposed module to study its effectiveness in video action recognition, achieving state-of-the-art results on the Something-Something-V1 and Diving48 datasets, and obtaining competitive results on EPIC-Kitchens with far lower model complexity. |
Tasks | Action Recognition In Videos |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00381v2 |
https://arxiv.org/pdf/1912.00381v2.pdf | |
PWC | https://paperswithcode.com/paper/gate-shift-networks-for-video-action |
Repo | https://github.com/swathikirans/GSM |
Framework | tf |
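A minimal sketch of the gate-shift idea: a learned spatial gate decides, per location, how much of each feature to route through a temporal shift, while the rest passes through a residual path. The actual GSM's gating and shift layout differ in detail:

```python
import torch
import torch.nn as nn

class GateShift(nn.Module):
    """Sketch of a gate-shift block for clip features from a 2D CNN.
    The gated part of each feature is shifted along time (half forward,
    half backward); the ungated part is passed through unchanged."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C, H, W) clip features
        b, t, c, h, w = x.shape
        g = torch.tanh(self.gate(x.reshape(b * t, c, h, w))).reshape(b, t, c, h, w)
        gated = g * x
        fwd = torch.roll(gated[:, :, : c // 2], shifts=1, dims=1)   # shift forward in time
        bwd = torch.roll(gated[:, :, c // 2:], shifts=-1, dims=1)   # shift backward in time
        shifted = torch.cat([fwd, bwd], dim=2)
        return (x - gated) + shifted   # residual: ungated part + shifted gated part

y = GateShift(32)(torch.randn(2, 8, 32, 14, 14))
```

The only learned parameters are those of the small gating convolution, which is why plugging such a block into a 2D CNN adds almost no overhead.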
A Factored Generalized Additive Model for Clinical Decision Support in the Operating Room
Title | A Factored Generalized Additive Model for Clinical Decision Support in the Operating Room |
Authors | Zhicheng Cui, Bradley A Fritz, Christopher R King, Michael S Avidan, Yixin Chen |
Abstract | Logistic regression (LR) is widely used in clinical prediction because it is simple to deploy and easy to interpret. Nevertheless, being a linear model, LR has limited expressive capability and often yields unsatisfactory performance. Generalized additive models (GAMs) extend the linear model with transformations of input features, though not all GAM variants allow feature interactions. In this paper, we propose a factored generalized additive model (F-GAM) that preserves model interpretability for targeted features while allowing rich interactions with features fixed within the individual. We evaluate F-GAM on the prediction of two targets, postoperative acute kidney injury and acute respiratory failure, on a single-center database. We find superior performance of F-GAM in terms of AUPRC and AUROC compared to several other GAM implementations, random forests, a support vector machine, and a deep neural network. Model interpretability is good, with results showing high face validity. |
Tasks | |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12596v1 |
https://arxiv.org/pdf/1907.12596v1.pdf | |
PWC | https://paperswithcode.com/paper/a-factored-generalized-additive-model-for |
Repo | https://github.com/nostringattached/FGAM |
Framework | pytorch |
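The factorization can be sketched with a small hypernetwork: each targeted feature keeps an additive, plottable shape function, but the shape-function weights are produced from the patient-fixed features z, so targeted features stay interpretable while still interacting with z. The basis choice and sizes below are assumptions:

```python
import torch
import torch.nn as nn

class FactoredGAM(nn.Module):
    """Hedged sketch of the factored-GAM idea: RBF shape functions for the
    targeted features, with per-patient weights produced from the fixed
    features z by a small network."""
    def __init__(self, n_targeted: int, z_dim: int, n_basis: int = 8):
        super().__init__()
        self.n_targeted, self.n_basis = n_targeted, n_basis
        self.centers = nn.Parameter(torch.linspace(0, 1, n_basis).repeat(n_targeted, 1))
        self.hyper = nn.Sequential(          # z -> one weight per basis function
            nn.Linear(z_dim, 64), nn.ReLU(),
            nn.Linear(64, n_targeted * n_basis),
        )

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # x: (B, n_targeted) targeted features scaled to [0, 1]; z: (B, z_dim)
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2) / 0.02)  # (B, J, K)
        w = self.hyper(z).view(-1, self.n_targeted, self.n_basis)           # (B, J, K)
        return (basis * w).sum(dim=(1, 2))   # additive logit over targeted features

logit = FactoredGAM(n_targeted=5, z_dim=20)(torch.rand(4, 5), torch.randn(4, 20))
```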
ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation
Title | ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation |
Authors | Yuzhe Yang, Guo Zhang, Dina Katabi, Zhi Xu |
Abstract | Deep neural networks are vulnerable to adversarial attacks. The literature is rich with algorithms that can easily craft successful adversarial examples; in contrast, the performance of defense techniques still lags behind. This paper proposes ME-Net, a defense method that leverages matrix estimation (ME). In ME-Net, images are preprocessed in two steps: first, pixels are randomly dropped from the image; then, the image is reconstructed using ME. We show that this process destroys the adversarial structure of the noise while reinforcing the global structure of the original image. Since humans typically rely on such global structures when classifying images, the process makes the network more compatible with human perception. We conduct comprehensive experiments on prevailing benchmarks such as MNIST, CIFAR-10, SVHN, and Tiny-ImageNet. Comparing ME-Net with state-of-the-art defense mechanisms shows that ME-Net consistently outperforms prior techniques, improving robustness against both black-box and white-box attacks. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11971v1 |
https://arxiv.org/pdf/1905.11971v1.pdf | |
PWC | https://paperswithcode.com/paper/me-net-towards-effective-adversarial |
Repo | https://github.com/YyzHarry/ME-Net |
Framework | pytorch |
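The two-step preprocessing is easy to sketch with truncated SVD as the matrix estimator; the paper evaluates several ME algorithms, and plain SVD on the masked image is the simplest stand-in:

```python
import numpy as np

def menet_preprocess(img: np.ndarray, keep_prob: float = 0.5, rank: int = 10) -> np.ndarray:
    """Sketch of ME-Net preprocessing: randomly drop pixels, then
    reconstruct the image from a low-rank approximation of the masked
    matrix. `img` is one grayscale channel in [0, 1]."""
    mask = np.random.rand(*img.shape) < keep_prob
    masked = np.where(mask, img, 0.0)
    u, s, vt = np.linalg.svd(masked, full_matrices=False)
    recon = (u[:, :rank] * s[:rank]) @ vt[:rank]   # keep the top-`rank` components
    return np.clip(recon / keep_prob, 0.0, 1.0)    # rescale for the dropped mass

clean = menet_preprocess(np.random.rand(32, 32))
```

Dropping pixels randomizes away the carefully placed adversarial perturbation, while the low-rank reconstruction retains the global image structure the classifier needs.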
Finding the most similar textual documents using Case-Based Reasoning
Title | Finding the most similar textual documents using Case-Based Reasoning |
Authors | Marko Mihajlovic, Ning Xiong |
Abstract | In recent years, the huge amount of unstructured textual data on the Internet has made it difficult for AI algorithms to provide the best recommendations for users and their search queries. Since the Internet became widespread, much research has been done in the fields of Natural Language Processing (NLP) and machine learning. Almost every solution transforms documents into vector space models (VSM) in order to apply AI algorithms over them; one such approach is based on Case-Based Reasoning (CBR), where the most important component is computing the similarity between numerical data points. In 2016, the TS-SS similarity metric was proposed, showing state-of-the-art results in textual mining for unsupervised learning; however, its performance on supervised learning (classification) had not previously been investigated. In this work, we devise a CBR system capable of finding the most similar documents for a given query, in order to investigate the performance of the new metric, TS-SS, alongside the two other geometric similarity measures (Euclidean distance and cosine similarity) that have shown the best predictive results over several benchmark corpora. The results show the surprising inappropriateness of the TS-SS measure for high-dimensional features. |
Tasks | |
Published | 2019-11-01 |
URL | https://arxiv.org/abs/1911.00262v1 |
https://arxiv.org/pdf/1911.00262v1.pdf | |
PWC | https://paperswithcode.com/paper/finding-the-most-similar-textual-documents |
Repo | https://github.com/Maki94/document-classification |
Framework | none |
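For reference, TS-SS combines a Triangle Similarity (the area of the triangle spanned by the two vectors) and a Sector Similarity (the area of a circular sector covering their distance); lower values mean more similar vectors. A NumPy sketch following the published formulation:

```python
import numpy as np

def ts_ss(a: np.ndarray, b: np.ndarray) -> float:
    """TS-SS between two vectors: Triangle Similarity times Sector
    Similarity. The +10 degrees keeps the angle (and hence TS) non-zero
    for parallel vectors."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    cos = np.clip(a @ b / (na * nb), -1.0, 1.0)
    theta = np.degrees(np.arccos(cos)) + 10.0        # angle in degrees, shifted
    ts = na * nb * np.sin(np.radians(theta)) / 2.0   # triangle area
    ed = np.linalg.norm(a - b)                        # Euclidean distance
    md = abs(na - nb)                                 # magnitude difference
    ss = np.pi * (ed + md) ** 2 * theta / 360.0       # sector area
    return ts * ss
```

Unlike cosine similarity, TS-SS is sensitive to both angle and magnitude, which is exactly the property the paper probes in high dimensions.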
Unsupervised speech representation learning using WaveNet autoencoders
Title | Unsupervised speech representation learning using WaveNet autoencoders |
Authors | Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord |
Abstract | We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or background noise. Since the learned representation is tuned to contain only phonetic content, we resort to using a high capacity WaveNet decoder to infer information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the kind of constraint that is applied to the latent representation. We compare three variants: a simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder (VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately reconstruct individual spectrogram frames. Moreover, for discrete encodings extracted using the VQ-VAE, we measure the ease of mapping them to phonemes. We introduce a regularization scheme that forces the representations to focus on the phonetic content of the utterance and report performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task. |
Tasks | Dimensionality Reduction, Representation Learning |
Published | 2019-01-25 |
URL | https://arxiv.org/abs/1901.08810v2 |
https://arxiv.org/pdf/1901.08810v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-speech-representation-learning |
Repo | https://github.com/swasun/VQ-VAE-Speech |
Framework | pytorch |
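Of the three bottlenecks compared, the VQ-VAE variant is the easiest to sketch: snap each encoder output to its nearest codebook vector, pass gradients straight through, and add the standard codebook and commitment losses. A minimal sketch; the codebook size and latent dimension are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VQBottleneck(nn.Module):
    """Minimal VQ-VAE bottleneck between an encoder and a WaveNet decoder."""
    def __init__(self, num_codes: int = 512, dim: int = 64, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta

    def forward(self, z_e: torch.Tensor):
        # z_e: (B, T, D) encoder outputs, one latent per downsampled frame
        flat = z_e.reshape(-1, z_e.size(-1))                # (B*T, D)
        dists = torch.cdist(flat, self.codebook.weight)     # (B*T, K)
        codes = dists.argmin(dim=-1).view(z_e.shape[:-1])   # (B, T) discrete ids
        z_q = self.codebook(codes)                          # (B, T, D)
        loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        z_q = z_e + (z_q - z_e).detach()                    # straight-through gradient
        return z_q, codes, loss

z_q, codes, vq_loss = VQBottleneck()(torch.randn(2, 50, 64))
```

The discrete `codes` are what the paper maps to phonemes when evaluating the VQ-VAE variant.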
Data Cleansing for Models Trained with SGD
Title | Data Cleansing for Models Trained with SGD |
Authors | Satoshi Hara, Atsushi Nitanda, Takanori Maehara |
Abstract | Data cleansing is a typical approach to improving the accuracy of machine learning models; however, it requires extensive domain knowledge to identify the influential instances that affect the models. In this paper, we propose an algorithm that can suggest influential instances without using any domain knowledge. With the proposed method, users only need to inspect the instances suggested by the algorithm, so they do not need extensive knowledge for this procedure, enabling even non-experts to conduct data cleansing and improve the model. Existing methods require the loss function to be convex and an optimal model to be obtained, which is not always the case in modern machine learning. To overcome these limitations, we propose a novel approach specifically designed for models trained with stochastic gradient descent (SGD). The proposed method infers the influential instances by retracing the steps of SGD while incorporating the intermediate models computed at each step. Through experiments, we demonstrate that the proposed method can accurately infer the influential instances. Moreover, we use MNIST and CIFAR10 to show that models can be effectively improved by removing the influential instances suggested by the proposed method. |
Tasks | |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08473v1 |
https://arxiv.org/pdf/1906.08473v1.pdf | |
PWC | https://paperswithcode.com/paper/data-cleansing-for-models-trained-with-sgd |
Repo | https://github.com/sato9hara/sgd-influence |
Framework | pytorch |
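A heavily simplified, first-order sketch of the retracing idea: score each training instance by how its SGD gradient step aligns with the validation-loss gradient. The actual method also propagates this vector backwards through the stored intermediate models via Hessian-vector products, which is dropped here:

```python
import torch

def linear_influence(model, loss_fn, train_batches, val_batch, lr):
    """First-order influence sketch: approximate how removing each training
    instance would change the validation loss, using only gradients at the
    final parameters (the Hessian propagation of the full method is omitted)."""
    xv, yv = val_batch
    val_loss = loss_fn(model(xv), yv)
    u = torch.autograd.grad(val_loss, list(model.parameters()))
    scores = []
    for x, y in train_batches:                       # replay in recorded order
        for i in range(len(x)):
            li = loss_fn(model(x[i:i + 1]), y[i:i + 1])
            g = torch.autograd.grad(li, list(model.parameters()))
            # positive score: removing this instance would raise validation loss
            scores.append(lr * sum((gi * ui).sum() for gi, ui in zip(g, u)).item())
    return scores
```

Instances with large negative scores are the cleansing candidates: removing them is predicted to lower the validation loss.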
TriMap: Large-scale Dimensionality Reduction Using Triplets
Title | TriMap: Large-scale Dimensionality Reduction Using Triplets |
Authors | Ehsan Amid, Manfred K. Warmuth |
Abstract | We introduce "TriMap", a dimensionality reduction technique based on triplet constraints that preserves the global accuracy of the data better than other commonly used methods such as t-SNE, LargeVis, and UMAP. To quantify global accuracy, we introduce a score which roughly reflects the relative placement of clusters rather than individual points. We empirically show the excellent performance of TriMap on a large variety of datasets in terms of embedding quality as well as runtime. On our performance benchmarks, TriMap easily scales to millions of points without depleting memory and clearly outperforms t-SNE, LargeVis, and UMAP in terms of runtime. |
Tasks | Dimensionality Reduction |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00204v1 |
https://arxiv.org/pdf/1910.00204v1.pdf | |
PWC | https://paperswithcode.com/paper/trimap-large-scale-dimensionality-reduction |
Repo | https://github.com/eamid/trimap |
Framework | none |
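The triplet objective can be sketched directly: for each weighted triplet (i, j, k), with j sampled as a neighbor of i and k as a non-neighbor, the loss pushes i closer to j than to k under a heavy-tailed similarity. Triplet sampling and the weights, computed from the high-dimensional data, are assumed given here:

```python
import torch

def trimap_loss(y: torch.Tensor, triplets: torch.Tensor, weights: torch.Tensor):
    """Sketch of the TriMap objective. y: (n, 2) low-dim embedding;
    triplets: (m, 3) long tensor of (i, j, k) indices; weights: (m,).
    Uses the heavy-tailed similarity s(d) = 1 / (1 + d^2)."""
    i, j, k = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    d_ij = ((y[i] - y[j]) ** 2).sum(dim=1)
    d_ik = ((y[i] - y[k]) ** 2).sum(dim=1)
    s_ij = 1.0 / (1.0 + d_ij)
    s_ik = 1.0 / (1.0 + d_ik)
    per_triplet = s_ik / (s_ij + s_ik)   # in (0, 1): near 0 when the triplet is satisfied
    return (weights * per_triplet).sum()

# y = torch.randn(n, 2, requires_grad=True) is then optimized by gradient descent
```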
Privacy-Preserving Multiple Tensor Factorization for Synthesizing Large-Scale Location Traces
Title | Privacy-Preserving Multiple Tensor Factorization for Synthesizing Large-Scale Location Traces |
Authors | Takao Murakami, Koki Hamada, Yusuke Kawamoto, Takuma Hatano |
Abstract | With the widespread use of LBSs (Location-based Services), synthesizing location traces plays an increasingly important role in analyzing spatial big data while protecting user privacy. Although location synthesizers have been widely studied, existing synthesizers do not provide sufficient utility, privacy, or scalability, hence are not practical for large-scale location traces. To overcome this issue, we propose a novel location synthesizer called PPMTF (Privacy-Preserving Multiple Tensor Factorization). We model various statistical features of the original traces by a transition-count tensor and a visit-count tensor. We factorize these two tensors simultaneously via multiple tensor factorization, and train factor matrices via posterior sampling. Then we synthesize traces using the MH (Metropolis-Hastings) algorithm, and perform a plausible deniability test for a synthetic trace. We comprehensively evaluate the proposed method using two datasets. Our experimental results show that the proposed method preserves various statistical features, provides plausible deniability and differential privacy, and synthesizes large-scale location traces in practical time. The proposed method also significantly outperforms the state-of-the-art methods in terms of utility, privacy, and scalability. |
Tasks | |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04226v5 |
https://arxiv.org/pdf/1911.04226v5.pdf | |
PWC | https://paperswithcode.com/paper/privacy-preserving-multiple-tensor |
Repo | https://github.com/PPMTF/PPMTF |
Framework | none |
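The two statistics PPMTF factorizes can be sketched as plain count tensors built from the raw traces; the factorization, posterior sampling, and MH synthesis are not reproduced here, and the trace format below is an assumption:

```python
import numpy as np

def build_tensors(traces, n_users, n_regions, n_slots):
    """Sketch of the two count statistics PPMTF factorizes: a per-user
    transition-count tensor (region -> region) and a per-user visit-count
    tensor (time slot x region). `traces` is assumed to be a list of
    (user, time_slot, region) events sorted by time within each user."""
    transition = np.zeros((n_users, n_regions, n_regions))
    visit = np.zeros((n_users, n_slots, n_regions))
    prev = {}
    for user, slot, region in traces:
        visit[user, slot, region] += 1
        if user in prev:
            transition[user, prev[user], region] += 1
        prev[user] = region
    return transition, visit
```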
Connectivity-Optimized Representation Learning via Persistent Homology
Title | Connectivity-Optimized Representation Learning via Persistent Homology |
Authors | Christoph Hofer, Roland Kwitt, Mandar Dixit, Marc Niethammer |
Abstract | We study the problem of learning representations with controllable connectivity properties. This is beneficial in situations when the imposed structure can be leveraged upstream. In particular, we control the connectivity of an autoencoder’s latent space via a novel type of loss, operating on information from persistent homology. Under mild conditions, this loss is differentiable and we present a theoretical analysis of the properties induced by the loss. We choose one-class learning as our upstream task and demonstrate that the imposed structure enables informed parameter selection for modeling the in-class distribution via kernel density estimators. Evaluated on computer vision data, these one-class models exhibit competitive performance and, in a low sample size regime, outperform other methods by a large margin. Notably, our results indicate that a single autoencoder, trained on auxiliary (unlabeled) data, yields a mapping into latent space that can be reused across datasets for one-class learning. |
Tasks | Representation Learning |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.09003v1 |
https://arxiv.org/pdf/1906.09003v1.pdf | |
PWC | https://paperswithcode.com/paper/connectivity-optimized-representation |
Repo | https://github.com/c-hofer/COREL_icml2019 |
Framework | pytorch |
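The connectivity loss rests on a classical fact: the 0-dimensional persistence death times of a point cloud are exactly the edge lengths of its minimum spanning tree. A non-differentiable NumPy sketch of the penalty on a batch of latent codes (the paper's version is differentiable and applied per mini-batch):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def connectivity_penalty(z: np.ndarray, eta: float = 1.0) -> float:
    """Sketch of the connectivity loss: penalize the deviation of the
    0-dim persistent-homology death times of the latent codes z from a
    target scale eta."""
    dist = squareform(pdist(z))                  # pairwise latent distances
    mst = minimum_spanning_tree(dist).toarray()  # MST edge lengths = death times
    deaths = mst[mst > 0]
    return float(np.abs(deaths - eta).sum())

penalty = connectivity_penalty(np.random.randn(64, 16))
```

Driving all death times toward a common eta is what gives the latent space the controlled connectivity the one-class density estimators then exploit.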