Paper Group ANR 141
Plane Pair Matching for Efficient 3D View Registration. Generating Digital Twins with Multiple Sclerosis Using Probabilistic Neural Networks. Towards Modular Algorithm Induction. Improved Subsampled Randomized Hadamard Transform for Linear SVM. Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder. Two-stage Discriminative Re-ranking for Large-scale Landmark Retrieval. Learning Individually Fair Classifier with Causal-Effect Constraint. VIFB: A Visible and Infrared Image Fusion Benchmark. Mitigating Class Boundary Label Uncertainty to Reduce Both Model Bias and Variance. Driver Gaze Estimation in the Real World: Overcoming the Eyeglass Challenge. Kernel of CycleGAN as a Principal homogeneous space. Distortion Agnostic Deep Watermarking. Forensic Authorship Analysis of Microblogging Texts Using N-Grams and Stylometric Features. A Neural Network Based on First Principles. Can AI decrypt fashion jargon for you?
Plane Pair Matching for Efficient 3D View Registration
| Title | Plane Pair Matching for Efficient 3D View Registration |
|---|---|
| Authors | Adrien Kaiser, José Alonso Ybanez Zepeda, Tamy Boubekeur |
| Abstract | We present a novel method to estimate the motion matrix between overlapping pairs of 3D views in the context of indoor scenes. We use the Manhattan world assumption to introduce lightweight geometric constraints in the form of planes into the problem, which reduces complexity by taking the structure of the scene into account. In particular, we define a stochastic framework to categorize planes as vertical or horizontal and parallel or non-parallel. We leverage this classification to match pairs of planes in overlapping views with point-of-view-agnostic structural metrics. We propose to split the motion computation using the classification and to estimate the rotation and translation of the sensor separately, using a quadric minimizer. We validate our approach on a toy example and present quantitative experiments on a public RGB-D dataset, comparing against recent state-of-the-art methods. Our evaluation shows that planar constraints add only low computational overhead while improving precision when applied after a prior coarse estimate. We conclude by giving hints towards extensions and improvements of the current results. |
| Tasks | |
| Published | 2020-01-20 |
| URL | https://arxiv.org/abs/2001.07058v1 |
| PDF | https://arxiv.org/pdf/2001.07058v1.pdf |
| PWC | https://paperswithcode.com/paper/plane-pair-matching-for-efficient-3d-view |
| Repo | |
| Framework | |
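The abstract's split of the motion estimate into a rotation step and a translation step suggests a simple two-stage scheme. Below is a minimal sketch (not the paper's implementation), assuming matched planes are stored as unit normals and offsets with n·x = d: the rotation aligns matched normals (Kabsch-style), and the translation follows from the change in plane offsets.

```python
# Sketch: rotation from matched plane normals, translation from offsets.
# Plane i in a view is (n_i, d_i) with unit normal n_i and n_i . x = d_i.
# Under a motion (R, t), normals map as n' = R n and offsets as d' = d + n'.t.
import numpy as np

def rotation_from_normals(normals_a, normals_b):
    """Kabsch-style rotation R minimizing sum ||R n_a - n_b||^2."""
    H = normals_a.T @ normals_b                    # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    return Vt.T @ D @ U.T                          # proper rotation (det = +1)

def translation_from_offsets(normals_b, d_a, d_b):
    """Least-squares t from n_b . t = d_b - d_a over all matched planes."""
    t, *_ = np.linalg.lstsq(normals_b, d_b - d_a, rcond=None)
    return t

# Toy check: recover a known motion from four matched planes.
rng = np.random.default_rng(0)
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
n_a = rng.normal(size=(4, 3))
n_a /= np.linalg.norm(n_a, axis=1, keepdims=True)
d_a = rng.normal(size=4)
n_b = n_a @ R_true.T                               # rotated normals
d_b = d_a + n_b @ t_true                           # shifted offsets
R = rotation_from_normals(n_a, n_b)
t = translation_from_offsets(n_b, d_a, d_b)
assert np.allclose(R, R_true) and np.allclose(t, t_true)
```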
Generating Digital Twins with Multiple Sclerosis Using Probabilistic Neural Networks
| Title | Generating Digital Twins with Multiple Sclerosis Using Probabilistic Neural Networks |
|---|---|
| Authors | Jonathan R. Walsh, Aaron M. Smith, Yannick Pouliot, David Li-Bland, Anton Loukianov, Charles K. Fisher |
| Abstract | Multiple Sclerosis (MS) is a neurodegenerative disorder characterized by a complex set of clinical assessments. We use an unsupervised machine learning model called a Conditional Restricted Boltzmann Machine (CRBM) to learn the relationships between covariates commonly used to characterize subjects and their disease progression in MS clinical trials. A CRBM is capable of generating digital twins, which are simulated subjects having the same baseline data as actual subjects. Digital twins allow for subject-level statistical analyses of disease progression. The CRBM is trained using data from 2395 subjects enrolled in the placebo arms of clinical trials across the three primary subtypes of MS. We discuss how CRBMs are trained and show that digital twins generated by the model are statistically indistinguishable from their actual subject counterparts along a number of measures. |
| Tasks | |
| Published | 2020-02-04 |
| URL | https://arxiv.org/abs/2002.02779v1 |
| PDF | https://arxiv.org/pdf/2002.02779v1.pdf |
| PWC | https://paperswithcode.com/paper/generating-digital-twins-with-multiple |
| Repo | |
| Framework | |
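To make the digital-twin mechanism concrete, here is a toy Bernoulli CRBM sketch: hidden units depend on both the visible covariates and a conditioning (baseline) vector, and a twin is drawn by Gibbs sampling with the baseline clamped. The dimensions and (untrained) weights are placeholders; the paper's model also handles continuous and ordinal covariates and is fitted with standard RBM training, none of which is reproduced here.

```python
# Toy conditional RBM: p(h | v, c) and p(v | h) for binary units only.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, n_cond = 8, 16, 4            # assumed toy sizes
W = rng.normal(scale=0.1, size=(n_hidden, n_visible))
U = rng.normal(scale=0.1, size=(n_hidden, n_cond))  # couples baseline to hidden
b = np.zeros(n_visible)
c = np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_twin(baseline, n_gibbs=200):
    """Gibbs-sample visible covariates conditioned on baseline data."""
    v = rng.integers(0, 2, size=n_visible).astype(float)
    for _ in range(n_gibbs):
        h = (rng.random(n_hidden) < sigmoid(W @ v + U @ baseline + c)).astype(float)
        v = (rng.random(n_visible) < sigmoid(W.T @ h + b)).astype(float)
    return v                                       # one simulated subject

twin = sample_twin(baseline=rng.random(n_cond))
```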
Towards Modular Algorithm Induction
| Title | Towards Modular Algorithm Induction |
|---|---|
| Authors | Daniel A. Abolafia, Rishabh Singh, Manzil Zaheer, Charles Sutton |
| Abstract | We present a modular neural network architecture, Main, that learns algorithms given a set of input-output examples. Main consists of a neural controller that interacts with a variable-length input tape and learns to compose modules together with their corresponding argument choices. Unlike previous approaches, Main uses a general domain-agnostic mechanism for selecting modules and their arguments. It uses a general input tape layout together with a parallel history tape to indicate the most recently used locations. Finally, it uses a memoryless controller with a length-invariant self-attention-based input tape encoding to allow for random access to tape locations. The Main architecture is trained end-to-end using reinforcement learning from a set of input-output examples. We evaluate Main on five algorithmic tasks and show that it can learn policies that generalize perfectly to inputs much longer than the ones used for training. |
| Tasks | |
| Published | 2020-02-27 |
| URL | https://arxiv.org/abs/2003.04227v1 |
| PDF | https://arxiv.org/pdf/2003.04227v1.pdf |
| PWC | https://paperswithcode.com/paper/towards-modular-algorithm-induction |
| Repo | |
| Framework | |
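A heavily simplified sketch of the controller idea: a memoryless policy that self-attends over a variable-length tape and emits logits for a module choice plus a per-cell argument score. All sizes, the module set, and the class names are hypothetical; the actual Main architecture differs in many details (history tape, RL training loop) that are not shown.

```python
# Sketch of a memoryless, length-invariant tape controller.
import torch
import torch.nn as nn

class TapeController(nn.Module):
    def __init__(self, vocab=16, d_model=64, n_modules=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)         # tape symbol embedding
        self.attn = nn.MultiheadAttention(d_model, 4, batch_first=True)
        self.module_head = nn.Linear(d_model, n_modules)  # which module to apply
        self.arg_head = nn.Linear(d_model, 1)             # score each tape cell

    def forward(self, tape):                              # tape: (B, L) int ids
        x = self.embed(tape)
        h, _ = self.attn(x, x, x)                         # works for any length L
        pooled = h.mean(dim=1)                            # no recurrent state
        return self.module_head(pooled), self.arg_head(h).squeeze(-1)

ctrl = TapeController()
module_logits, arg_scores = ctrl(torch.randint(0, 16, (2, 10)))
```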
Improved Subsampled Randomized Hadamard Transform for Linear SVM
| Title | Improved Subsampled Randomized Hadamard Transform for Linear SVM |
|---|---|
| Authors | Zijian Lei, Liang Lan |
| Abstract | The Subsampled Randomized Hadamard Transform (SRHT), a popular random projection method that can efficiently project $d$-dimensional data into an $r$-dimensional space ($r \ll d$) in $O(d\log d)$ time, has been widely used to address the challenge of high dimensionality in machine learning. SRHT works by rotating the input data matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ with a randomized Walsh-Hadamard transform, followed by uniform column sampling on the rotated matrix. One limitation of SRHT, however, is that it generates the new low-dimensional embedding without considering any specific properties of a given dataset. Therefore, this data-independent random projection method may result in inferior and unstable performance when used for a particular machine learning task, e.g., classification. To overcome this limitation, we analyze the effect of using SRHT for random projection in the context of linear SVM classification. Based on our analysis, we propose importance sampling and deterministic top-$r$ sampling to produce effective low-dimensional embeddings instead of the uniform sampling used in SRHT. In addition, we also propose a new supervised non-uniform sampling method. Our experimental results demonstrate that our proposed methods can achieve higher classification accuracies than SRHT and other random projection methods on six real-life datasets. |
| Tasks | |
| Published | 2020-02-05 |
| URL | https://arxiv.org/abs/2002.01628v1 |
| PDF | https://arxiv.org/pdf/2002.01628v1.pdf |
| PWC | https://paperswithcode.com/paper/improved-subsampled-randomized-hadamard |
| Repo | |
| Framework | |
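The SRHT pipeline (random signs, Walsh-Hadamard rotation, column sampling) is compact enough to sketch directly. The version below assumes $d$ is a power of two, and the "top-$r$" variant here selects columns by squared column norm; that scoring rule is an assumption for illustration, not necessarily the paper's exact criterion.

```python
# Sketch: SRHT with uniform vs. deterministic top-r column sampling.
import numpy as np

def fwht(x):
    """Iterative fast Walsh-Hadamard transform over the last axis
    (length must be a power of two); unnormalized."""
    x = x.astype(float).copy()
    d = x.shape[-1]
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):              # butterfly passes
            a = x[..., i:i + h].copy()
            b = x[..., i + h:i + 2 * h].copy()
            x[..., i:i + h] = a + b
            x[..., i + h:i + 2 * h] = a - b
        h *= 2
    return x

def srht(X, r, rng, mode="uniform"):
    """Project rows of X (n x d) to r dimensions."""
    n, d = X.shape
    signs = rng.choice([-1.0, 1.0], size=d)       # diagonal sign matrix D
    Z = fwht(X * signs) / np.sqrt(d)              # rotated data H D X^T (rowwise)
    if mode == "uniform":                          # classical SRHT
        cols = rng.choice(d, size=r, replace=False)
    else:                                          # deterministic top-r (assumed score)
        cols = np.argsort(-np.sum(Z ** 2, axis=0))[:r]
    return Z[:, cols] * np.sqrt(d / r)             # rescale for unbiasedness

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))
X_low = srht(X, r=16, rng=rng, mode="top_r")       # feed this to a linear SVM
```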
Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder
| Title | Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder |
|---|---|
| Authors | Zhisheng Xiao, Qing Yan, Yali Amit |
| Abstract | Deep probabilistic generative models enable modeling the likelihoods of very high dimensional data. An important application of generative modeling should be the ability to detect out-of-distribution (OOD) samples by setting a threshold on the likelihood. However, a recent study shows that probabilistic generative models can, in some cases, assign higher likelihoods to certain types of OOD samples, making likelihood-threshold-based OOD detection rules problematic. To address this issue, several OOD detection methods have been proposed for deep generative models. In this paper, we make the observation that some of these methods fail when applied to generative models based on Variational Auto-encoders (VAE). As an alternative, we propose Likelihood Regret, an efficient OOD score for VAEs. We benchmark our proposed method against existing approaches, and empirical results suggest that our method obtains the best overall OOD detection performance among OOD methods applied to VAEs. |
| Tasks | Out-of-Distribution Detection |
| Published | 2020-03-06 |
| URL | https://arxiv.org/abs/2003.02977v1 |
| PDF | https://arxiv.org/pdf/2003.02977v1.pdf |
| PWC | https://paperswithcode.com/paper/likelihood-regret-an-out-of-distribution |
| Repo | |
| Framework | |
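At its core, likelihood regret measures how much the (approximate) log-likelihood of a single test input improves when the VAE's inference model is refit to that input alone. The sketch below assumes a `vae` object with `encoder`/`decoder` submodules and an `elbo(vae, x)` function; those names and the optimizer settings are placeholders, not the authors' exact setup.

```python
# Sketch: likelihood regret = ELBO after per-sample encoder refit - ELBO before.
import copy
import torch

def likelihood_regret(vae, elbo, x, steps=100, lr=1e-3):
    base = elbo(vae, x).item()                 # ELBO under the trained encoder
    tuned = copy.deepcopy(vae)
    for p in tuned.decoder.parameters():       # decoder stays fixed
        p.requires_grad_(False)
    opt = torch.optim.Adam(tuned.encoder.parameters(), lr=lr)
    for _ in range(steps):                     # refit the encoder to x alone
        opt.zero_grad()
        loss = -elbo(tuned, x)
        loss.backward()
        opt.step()
    return elbo(tuned, x).item() - base        # large regret => likely OOD
```

The intuition: for in-distribution inputs the shared encoder is already near-optimal, so refitting gains little; for OOD inputs the gain (regret) is large, which makes the score usable as a detection threshold.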
Two-stage Discriminative Re-ranking for Large-scale Landmark Retrieval
| Title | Two-stage Discriminative Re-ranking for Large-scale Landmark Retrieval |
|---|---|
| Authors | Shuhei Yokoo, Kohei Ozaki, Edgar Simo-Serra, Satoshi Iizuka |
| Abstract | We propose an efficient pipeline for large-scale landmark image retrieval that addresses the diversity of the dataset through two-stage discriminative re-ranking. Our approach is based on embedding the images in a feature space using a convolutional neural network trained with a cosine softmax loss. Due to the variance of the images, which includes extreme viewpoint changes such as having to retrieve images of the exterior of a landmark from images of the interior, this is very challenging for approaches based exclusively on visual similarity. Our proposed re-ranking approach improves the results in two steps: in the sort-step, we use $k$-nearest-neighbor search with soft voting to sort the retrieved results based on their label similarity to the query images, and in the insert-step, we add samples from the dataset that were not retrieved by image similarity. This approach overcomes the low visual diversity of the retrieved images. In-depth experimental results show that the proposed approach significantly outperforms existing approaches on the challenging Google Landmarks Datasets. Using our methods, we achieved 1st place in the Google Landmark Retrieval 2019 challenge and 3rd place in the Google Landmark Recognition 2019 challenge on Kaggle. Our code is publicly available here: \url{https://github.com/lyakaap/Landmark2019-1st-and-3rd-Place-Solution} |
| Tasks | Image Retrieval |
| Published | 2020-03-25 |
| URL | https://arxiv.org/abs/2003.11211v1 |
| PDF | https://arxiv.org/pdf/2003.11211v1.pdf |
| PWC | https://paperswithcode.com/paper/two-stage-discriminative-re-ranking-for-large |
| Repo | |
| Framework | |
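The sort-step is the easier of the two stages to illustrate: estimate the query's landmark labels by soft-voting over its nearest labeled neighbors, then push retrieved items whose labels match those votes to the front. This is a minimal sketch under assumed inputs (unit-normalized features, one label per training image); the paper's exact voting weights may differ.

```python
# Sketch of the sort-step: re-rank retrieved items by soft-voted label match.
import numpy as np

def sort_step(query_feat, train_feats, train_labels, retrieved_ids,
              retrieved_labels, k=5):
    sims = train_feats @ query_feat                    # cosine sims (unit norms)
    topk = np.argsort(-sims)[:k]
    votes = {}
    for i in topk:                                     # soft-vote query's label
        votes[train_labels[i]] = votes.get(train_labels[i], 0.0) + sims[i]
    scores = np.array([votes.get(lab, 0.0) for lab in retrieved_labels])
    # Stable sort: label matches come first, original visual order breaks ties.
    order = np.argsort(-scores, kind="stable")
    return [retrieved_ids[j] for j in order]
```

The insert-step would then append further dataset images carrying the top-voted labels, even when visual similarity alone failed to retrieve them.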
Learning Individually Fair Classifier with Causal-Effect Constraint
| Title | Learning Individually Fair Classifier with Causal-Effect Constraint |
|---|---|
| Authors | Yoichi Chikahara, Shinsaku Sakaue, Akinori Fujino |
| Abstract | Machine learning is increasingly being used in various applications that make decisions for individuals. For such applications, we need to strike a balance between achieving good prediction accuracy and making fair decisions with respect to a sensitive feature (e.g., race or gender), which is difficult in complex real-world scenarios. Existing methods measure the unfairness in such scenarios as {\it unfair causal effects} and constrain their mean to zero. Unfortunately, with these methods, the decisions are not necessarily fair for all individuals because even when the mean unfair effect is zero, unfair effects might be positive for some individuals and negative for others, which is discriminatory. To learn a classifier that is fair for all individuals, we define unfairness as the {\it probability of individual unfairness} (PIU) and propose to solve an optimization problem that constrains an upper bound on PIU. We theoretically illustrate why our method achieves individual fairness. Experimental results demonstrate that our method learns an individually fair classifier at a slight cost in prediction accuracy. |
| Tasks | |
| Published | 2020-02-17 |
| URL | https://arxiv.org/abs/2002.06746v1 |
| PDF | https://arxiv.org/pdf/2002.06746v1.pdf |
| PWC | https://paperswithcode.com/paper/learning-individually-fair-classifier-with |
| Repo | |
| Framework | |
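To see what a PIU-style constraint looks like operationally, here is a soft surrogate: compute the per-individual effect of flipping the sensitive feature and penalize the estimated probability that its magnitude exceeds a tolerance. The sigmoid relaxation of the indicator, the counterfactual pair construction, and all names are assumptions; the paper derives a proper upper bound on PIU rather than this soft penalty.

```python
# Sketch: soft surrogate for P(|f(x, a=1) - f(x, a=0)| > delta).
import torch

def piu_penalty(model, x_a0, x_a1, delta=0.05, temp=20.0):
    effect = model(x_a1) - model(x_a0)       # per-individual unfair effect
    # sigmoid(temp * (|u| - delta)) approximates the indicator |u| > delta
    return torch.sigmoid(temp * (effect.abs() - delta)).mean()

# Training objective (hypothetical): accuracy loss plus a weighted penalty,
# e.g. loss = bce(model(x), y) + lam * piu_penalty(model, x_a0, x_a1)
```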
VIFB: A Visible and Infrared Image Fusion Benchmark
| Title | VIFB: A Visible and Infrared Image Fusion Benchmark |
|---|---|
| Authors | Xingchen Zhang, Ping Ye, Gang Xiao |
| Abstract | Visible and infrared image fusion is one of the most important areas in image processing due to its numerous applications. While much progress has been made in recent years with efforts on developing fusion algorithms, there is a lack of a code library and benchmark that can gauge the state of the art. In this paper, after briefly reviewing recent advances in visible and infrared image fusion, we present a visible and infrared image fusion benchmark (VIFB) that consists of 21 image pairs, a code library of 20 fusion algorithms, and 13 evaluation metrics. We also carry out large-scale experiments within the benchmark to understand the performance of these algorithms. By analyzing qualitative and quantitative results, we identify effective algorithms for robust image fusion and give some observations on the status and future prospects of this field. |
| Tasks | |
| Published | 2020-02-09 |
| URL | https://arxiv.org/abs/2002.03322v2 |
| PDF | https://arxiv.org/pdf/2002.03322v2.pdf |
| PWC | https://paperswithcode.com/paper/vifb-a-visible-and-infrared-image-fusion |
| Repo | |
| Framework | |
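A benchmark of this shape boils down to a loop: run each fusion algorithm on each visible/infrared pair and score the fused result with each metric. The sketch below shows that skeleton with a single metric, image entropy; VIFB's 13 metrics and its algorithm APIs are not reproduced here, so the callables are assumed placeholders.

```python
# Sketch: fusion-benchmark evaluation loop with one example metric.
import numpy as np

def entropy(img):
    """Shannon entropy of an 8-bit grayscale image (a common fusion metric)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def evaluate(algorithms, pairs):
    """algorithms: {name: fuse(vis, ir) -> fused uint8 image}; pairs: [(vis, ir)]."""
    scores = {}
    for name, fuse in algorithms.items():
        scores[name] = [entropy(fuse(vis, ir)) for vis, ir in pairs]
    return scores
```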
Mitigating Class Boundary Label Uncertainty to Reduce Both Model Bias and Variance
| Title | Mitigating Class Boundary Label Uncertainty to Reduce Both Model Bias and Variance |
|---|---|
| Authors | Matthew Almeida, Wei Ding, Scott Crouter, Ping Chen |
| Abstract | The study of model bias and variance with respect to decision boundaries is critically important in supervised classification. There is generally a tradeoff between the two, as fine-tuning the decision boundary of a classification model to accommodate more boundary training samples (i.e., higher model complexity) may improve training accuracy (i.e., lower bias) but hurt generalization against unseen data (i.e., higher variance). When focusing only on classification-boundary fine-tuning and model complexity, it is difficult to reduce both bias and variance. To overcome this dilemma, we take a different perspective and investigate a new approach to handling inaccuracy and uncertainty in the training data labels, which are inevitable in many applications where labels are conceptual and labeling is performed by human annotators. The process of classification can be undermined by uncertainty in the labels of the training data; extending a boundary to accommodate an inaccurately labeled point will increase both bias and variance. Our novel method can reduce both bias and variance by estimating the pointwise label uncertainty of the training set and accordingly adjusting the training sample weights such that samples with high uncertainty are weighted down and those with low uncertainty are weighted up. In this way, uncertain samples have a smaller contribution to the objective function of the model’s learning algorithm and exert less pull on the decision boundary. In a real-world physical activity recognition case study, the data presents many labeling challenges, and we show that this new approach improves model performance and reduces model variance. |
| Tasks | Activity Recognition |
| Published | 2020-02-23 |
| URL | https://arxiv.org/abs/2002.09963v1 |
| PDF | https://arxiv.org/pdf/2002.09963v1.pdf |
| PWC | https://paperswithcode.com/paper/mitigating-class-boundary-label-uncertainty |
| Repo | |
| Framework | |
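The weighting idea transfers directly to any learner that accepts per-sample weights. Here is a minimal sketch, assuming a simple proxy for pointwise label uncertainty (disagreement among a sample's nearest neighbors); the paper's uncertainty estimator differs, so treat this proxy as an illustrative stand-in.

```python
# Sketch: down-weight training samples with uncertain (boundary) labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def knn_label_uncertainty(X, y, k=10):
    """Fraction of a point's k nearest neighbors whose label disagrees."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                      # idx[:, 0] is the point itself
    return (y[idx[:, 1:]] != y[:, None]).mean(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)                      # toy labels
w = 1.0 - knn_label_uncertainty(X, y)              # uncertain points pull less
clf = LogisticRegression().fit(X, y, sample_weight=w)
```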
Driver Gaze Estimation in the Real World: Overcoming the Eyeglass Challenge
| Title | Driver Gaze Estimation in the Real World: Overcoming the Eyeglass Challenge |
|---|---|
| Authors | Akshay Rangesh, Bowen Zhang, Mohan M. Trivedi |
| Abstract | A driver’s gaze is critical for determining the driver’s attention level, state, situational awareness, and readiness to take over control from partially and fully automated vehicles. Tracking both the head and eyes (pupils) can provide reliable estimation of a driver’s gaze using face images under ideal conditions. However, the vehicular environment introduces a variety of challenges that are usually unaccounted for: harsh illumination, nighttime conditions, and reflective/dark eyeglasses. Unfortunately, relying on head pose alone under such conditions can prove to be unreliable owing to significant eye movements. In this study, we offer solutions to address these problems encountered in the real world. To solve issues with lighting, we demonstrate that using an infrared camera with suitable equalization and normalization usually suffices. To handle eyeglasses and their corresponding artifacts, we adopt the idea of image-to-image translation using generative adversarial networks (GANs) to pre-process images prior to gaze estimation. To this end, we propose the Gaze Preserving CycleGAN (GPCycleGAN). As the name suggests, this network preserves the driver’s gaze while removing potential eyeglasses from infrared face images. GPCycleGAN is based on the well-known CycleGAN approach, with the addition of a gaze classifier and a gaze consistency loss for additional supervision. Our approach exhibits improved performance and robustness on challenging real-world data spanning 13 subjects and a variety of driving conditions. |
| Tasks | Gaze Estimation, Image-to-Image Translation |
| Published | 2020-02-06 |
| URL | https://arxiv.org/abs/2002.02077v2 |
| PDF | https://arxiv.org/pdf/2002.02077v2.pdf |
| PWC | https://paperswithcode.com/paper/driver-gaze-estimation-in-the-real-world |
| Repo | |
| Framework | |
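The extra supervision GPCycleGAN adds on top of the standard CycleGAN losses can be sketched as a single term: a gaze classifier scores the de-glassed image, and a consistency loss keeps the gaze label intact. The module names and loss weights below are assumptions; only the shape of the term is taken from the abstract.

```python
# Sketch: gaze-consistency term added to the usual CycleGAN objective.
import torch
import torch.nn.functional as F

def gaze_consistency_loss(generator, gaze_classifier, x_glasses, gaze_label):
    fake_no_glasses = generator(x_glasses)        # translate: remove eyeglasses
    logits = gaze_classifier(fake_no_glasses)     # classifier supervises gaze
    return F.cross_entropy(logits, gaze_label)

# Hypothetical total objective:
# total = adversarial + cycle_consistency + lam * gaze_consistency_loss(...)
```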
Kernel of CycleGAN as a Principal homogeneous space
| Title | Kernel of CycleGAN as a Principal homogeneous space |
|---|---|
| Authors | Nikita Moriakov, Jonas Adler, Jonas Teuwen |
| Abstract | Unpaired image-to-image translation has attracted significant interest due to the invention of CycleGAN, a method which utilizes a combination of adversarial and cycle consistency losses to avoid the need for paired data. It is known that the CycleGAN problem might admit multiple solutions, and our goal in this paper is to analyze the space of exact solutions and to give perturbation bounds for approximate solutions. We show theoretically that the exact solution space is invariant with respect to automorphisms of the underlying probability spaces, and, furthermore, that the group of automorphisms acts freely and transitively on the space of exact solutions. We examine the case of zero 'pure' CycleGAN loss first in its generality and subsequently expand our analysis to approximate solutions for the 'extended' CycleGAN loss, where an identity loss term is included. In order to demonstrate that these results are applicable, we show that under mild conditions nontrivial smooth automorphisms exist. Furthermore, we provide empirical evidence that neural networks can learn these automorphisms with unexpected and unwanted results. We conclude that finding optimal solutions to the CycleGAN loss does not necessarily lead to the envisioned result in image-to-image translation tasks and that underlying hidden symmetries can render the result utterly useless. |
| Tasks | Image-to-Image Translation |
| Published | 2020-01-24 |
| URL | https://arxiv.org/abs/2001.09061v1 |
| PDF | https://arxiv.org/pdf/2001.09061v1.pdf |
| PWC | https://paperswithcode.com/paper/kernel-of-cyclegan-as-a-principle-homogeneous |
| Repo | |
| Framework | |
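The structural claim in the abstract can be restated compactly. The following is a paraphrase in my own notation, not the paper's exact statement: given one exact solution, composing with an automorphism of the source probability space yields another, and this composition exhausts all exact solutions.

```latex
% If $G : X \to Y$ and $F : Y \to X$ form an exact CycleGAN solution
% between probability spaces $(X,\mu)$ and $(Y,\nu)$ (measure-preserving,
% mutually inverse), and $\sigma \in \mathrm{Aut}(X,\mu)$ is an
% automorphism, then the composed pair
\[
  G' = G \circ \sigma, \qquad F' = \sigma^{-1} \circ F
\]
% is again an exact solution. The map $\sigma \mapsto (G\circ\sigma,\,
% \sigma^{-1}\circ F)$ defines a free and transitive group action, so the
% set of exact solutions is a principal homogeneous space (torsor) over
% $\mathrm{Aut}(X,\mu)$: any two exact solutions differ by exactly one
% automorphism, and none is canonically preferred by the loss.
```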
Distortion Agnostic Deep Watermarking
| Title | Distortion Agnostic Deep Watermarking |
|---|---|
| Authors | Xiyang Luo, Ruohan Zhan, Huiwen Chang, Feng Yang, Peyman Milanfar |
| Abstract | Watermarking is the process of embedding information into an image such that it can survive under distortions, while requiring the encoded image to have little or no perceptual difference from the original image. Recently, deep learning-based methods achieved impressive results in both visual quality and message payload under a wide variety of image distortions. However, these methods all require differentiable models for the image distortions at training time, and may generalize poorly to unknown distortions. This is undesirable since the types of distortions applied to watermarked images are usually unknown and non-differentiable. In this paper, we propose a new framework for distortion-agnostic watermarking, where the image distortion is not explicitly modeled during training. Instead, the robustness of our system comes from two sources: adversarial training and channel coding. Compared to training on a fixed set of distortions and noise levels, our method achieves comparable or better results on distortions available during training, and better performance on unknown distortions. |
| Tasks | |
| Published | 2020-01-14 |
| URL | https://arxiv.org/abs/2001.04580v1 |
| PDF | https://arxiv.org/pdf/2001.04580v1.pdf |
| PWC | https://paperswithcode.com/paper/distortion-agnostic-deep-watermarking |
| Repo | |
| Framework | |
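The "adversarial training" source of robustness can be sketched as a two-player update: an attack network learns distortions that break the decoder, while the encoder/decoder learn to survive the current attacker. All module names and the training schedule below are assumptions, and the channel-coding side is only represented by the (pre-coded) message bits.

```python
# Sketch: one adversarial round of distortion-agnostic watermark training.
import torch
import torch.nn.functional as F

def train_step(encoder, decoder, attacker, opt_wm, opt_att, image, bits):
    """`bits` is a float {0,1} tensor holding the channel-coded message;
    opt_wm covers encoder+decoder params, opt_att covers attacker params."""
    encoded = encoder(image, bits)                     # watermarked image
    # 1) Attacker update: learn a distortion that maximizes decoding error.
    att_loss = -F.binary_cross_entropy_with_logits(
        decoder(attacker(encoded.detach())), bits)
    opt_att.zero_grad(); att_loss.backward(); opt_att.step()
    # 2) Encoder/decoder update: decode through the current attacker while
    #    keeping the watermark imperceptible.
    wm_loss = (F.binary_cross_entropy_with_logits(
                   decoder(attacker(encoded)), bits)
               + F.mse_loss(encoded, image))
    opt_wm.zero_grad(); wm_loss.backward(); opt_wm.step()
```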
Forensic Authorship Analysis of Microblogging Texts Using N-Grams and Stylometric Features
| Title | Forensic Authorship Analysis of Microblogging Texts Using N-Grams and Stylometric Features |
|---|---|
| Authors | Nicole Mariah Sharon Belvisi, Naveed Muhammad, Fernando Alonso-Fernandez |
| Abstract | In recent years, messages and text posted on the Internet have been used in criminal investigations. Unfortunately, the authorship of many of them remains unknown. In some channels, the problem of establishing authorship may be even harder, since the length of digital texts is limited to a certain number of characters. In this work, we aim at identifying the authors of tweet messages, which are limited to 280 characters. We evaluate popular features traditionally employed in authorship attribution, which capture properties of the writing style at different levels. For our experiments, we use a self-captured database of 40 users, with 120 to 200 tweets per user. Results using this small set are promising, with the different features providing a classification accuracy between 92% and 98.5%. These results are competitive in comparison to existing studies that employ short texts such as tweets or SMS messages. |
| Tasks | |
| Published | 2020-03-24 |
| URL | https://arxiv.org/abs/2003.11545v1 |
| PDF | https://arxiv.org/pdf/2003.11545v1.pdf |
| PWC | https://paperswithcode.com/paper/forensic-authorship-analysis-of-microblogging |
| Repo | |
| Framework | |
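The classic n-gram stylometry baseline this line of work evaluates is a few lines with standard tooling: character n-gram TF-IDF features and a linear classifier. The dataset below is a placeholder (the paper's self-captured 40-user database is not public), and the exact feature set is assumed for illustration.

```python
# Sketch: character n-gram authorship attribution for short texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["short example tweet one", "another short tweet"]   # placeholder data
authors = ["user_a", "user_b"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),     # char 2-4 grams
    LogisticRegression(max_iter=1000),
)
clf.fit(tweets, authors)
print(clf.predict(["another tweet to attribute"]))
```

Character n-grams work well on 280-character texts because they capture sub-word habits (punctuation, casing, abbreviations) that survive even when there are too few words for lexical features.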
A Neural Network Based on First Principles
| Title | A Neural Network Based on First Principles |
|---|---|
| Authors | Paul M Baggenstoss |
| Abstract | In this paper, a neural network is derived from first principles, assuming only that each layer begins with a linear dimension-reducing transformation. The approach appeals to the principle of Maximum Entropy (MaxEnt) to find the posterior distribution of the input data of each layer, conditioned on the layer output variables. This posterior has a well-defined mean, the conditional mean estimator, that is calculated using a type of neural network with theoretically derived activation functions similar to sigmoid, softplus, and ReLU. This implicitly provides a theoretical justification for their use. A theorem that finds the conditional distribution and conditional mean estimator under the MaxEnt prior is proposed, unifying results for special cases. Combining layers results in an auto-encoder with a conventional feed-forward analysis network and a type of linear Bayesian belief network in the reconstruction path. |
| Tasks | |
| Published | 2020-02-18 |
| URL | https://arxiv.org/abs/2002.07469v1 |
| PDF | https://arxiv.org/pdf/2002.07469v1.pdf |
| PWC | https://paperswithcode.com/paper/a-neural-network-based-on-first-principles |
| Repo | |
| Framework | |
Can AI decrypt fashion jargon for you?
| Title | Can AI decrypt fashion jargon for you? |
|---|---|
| Authors | Yuan Shen, Shanduojiao Jiang, Muhammad Rizky Wellyanto, Ranjitha Kumar |
| Abstract | When people talk about fashion, they care about the underlying meaning of fashion concepts, e.g., style. For example, people ask questions like what features make this dress smart. However, the product descriptions on today's fashion websites are full of domain-specific and low-level words. It is not clear to people how exactly those low-level descriptions contribute to a style or any other high-level fashion concept. In this paper, we propose a data-driven solution to address this concept-understanding issue by leveraging a large amount of existing product data on fashion sites. We first collected and categorized 1546 fashion keywords into 5 different fashion categories. Then, we collected a new fashion product dataset with 853,056 products in total. Finally, we trained a deep learning model that can explicitly predict and explain high-level fashion concepts in a product image through its low-level and domain-specific fashion features. |
| Tasks | |
| Published | 2020-03-18 |
| URL | https://arxiv.org/abs/2003.08052v1 |
| PDF | https://arxiv.org/pdf/2003.08052v1.pdf |
| PWC | https://paperswithcode.com/paper/can-ai-decrypt-fashion-jargon-for-you |
| Repo | |
| Framework | |