Paper Group ANR 141
Plane Pair Matching for Efficient 3D View Registration. Generating Digital Twins with Multiple Sclerosis Using Probabilistic Neural Networks. Towards Modular Algorithm Induction. Improved Subsampled Randomized Hadamard Transform for Linear SVM. Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder. Two-stage Discriminative Re-ranking for Large-scale Landmark Retrieval. Learning Individually Fair Classifier with Causal-Effect Constraint. VIFB: A Visible and Infrared Image Fusion Benchmark. Mitigating Class Boundary Label Uncertainty to Reduce Both Model Bias and Variance. Driver Gaze Estimation in the Real World: Overcoming the Eyeglass Challenge. Kernel of CycleGAN as a Principal homogeneous space. Distortion Agnostic Deep Watermarking. Forensic Authorship Analysis of Microblogging Texts Using N-Grams and Stylometric Features. A Neural Network Based on First Principles. Can AI decrypt fashion jargon for you?
Plane Pair Matching for Efficient 3D View Registration
| Title | Plane Pair Matching for Efficient 3D View Registration |
|---|---|
| Authors | Adrien Kaiser, José Alonso Ybanez Zepeda, Tamy Boubekeur |
| Abstract | We present a novel method to estimate the motion matrix between overlapping pairs of 3D views in the context of indoor scenes. We use the Manhattan world assumption to introduce lightweight geometric constraints in the form of planes into the problem, which reduces complexity by taking the structure of the scene into account. In particular, we define a stochastic framework to categorize planes as vertical or horizontal and parallel or non-parallel. We leverage this classification to match pairs of planes in overlapping views with point-of-view-agnostic structural metrics. We propose to split the motion computation using the classification and to estimate the rotation and translation of the sensor separately, using a quadric minimizer. We validate our approach on a toy example and present quantitative experiments on a public RGB-D dataset, comparing against recent state-of-the-art methods. Our evaluation shows that planar constraints add only low computational overhead while improving precision when applied after a prior coarse estimate. We conclude by giving hints towards extensions and improvements of the current results. |
| Tasks | |
| Published | 2020-01-20 |
| URL | https://arxiv.org/abs/2001.07058v1 |
| PDF | https://arxiv.org/pdf/2001.07058v1.pdf |
| PWC | https://paperswithcode.com/paper/plane-pair-matching-for-efficient-3d-view |
| Repo | |
| Framework | |
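The abstract's split of the motion estimate into a rotation step and a translation step suggests a simple two-stage scheme. Below is a minimal sketch (not the paper's implementation), assuming matched planes are stored as unit normals and offsets with n·x = d: the rotation aligns matched normals (Kabsch-style), and the translation follows from the change in plane offsets.

```python
# Sketch: rotation from matched plane normals, translation from offsets.
# Plane i in a view is (n_i, d_i) with unit normal n_i and n_i . x = d_i.
# Under a motion (R, t), normals map as n' = R n and offsets as d' = d + n'.t.
import numpy as np

def rotation_from_normals(normals_a, normals_b):
    """Kabsch-style rotation R minimizing sum ||R n_a - n_b||^2."""
    H = normals_a.T @ normals_b                    # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    return Vt.T @ D @ U.T                          # proper rotation (det = +1)

def translation_from_offsets(normals_b, d_a, d_b):
    """Least-squares t from n_b . t = d_b - d_a over all matched planes."""
    t, *_ = np.linalg.lstsq(normals_b, d_b - d_a, rcond=None)
    return t

# Toy check: recover a known motion from four matched planes.
rng = np.random.default_rng(0)
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
n_a = rng.normal(size=(4, 3))
n_a /= np.linalg.norm(n_a, axis=1, keepdims=True)
d_a = rng.normal(size=4)
n_b = n_a @ R_true.T                               # rotated normals
d_b = d_a + n_b @ t_true                           # shifted offsets
R = rotation_from_normals(n_a, n_b)
t = translation_from_offsets(n_b, d_a, d_b)
assert np.allclose(R, R_true) and np.allclose(t, t_true)
```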
Generating Digital Twins with Multiple Sclerosis Using Probabilistic Neural Networks
| Title | Generating Digital Twins with Multiple Sclerosis Using Probabilistic Neural Networks |
|---|---|
| Authors | Jonathan R. Walsh, Aaron M. Smith, Yannick Pouliot, David Li-Bland, Anton Loukianov, Charles K. Fisher |
| Abstract | Multiple Sclerosis (MS) is a neurodegenerative disorder characterized by a complex set of clinical assessments. We use an unsupervised machine learning model called a Conditional Restricted Boltzmann Machine (CRBM) to learn the relationships between covariates commonly used to characterize subjects and their disease progression in MS clinical trials. A CRBM is capable of generating digital twins, which are simulated subjects having the same baseline data as actual subjects. Digital twins allow for subject-level statistical analyses of disease progression. The CRBM is trained using data from 2395 subjects enrolled in the placebo arms of clinical trials across the three primary subtypes of MS. We discuss how CRBMs are trained and show that digital twins generated by the model are statistically indistinguishable from their actual subject counterparts along a number of measures. |
| Tasks | |
| Published | 2020-02-04 |
| URL | https://arxiv.org/abs/2002.02779v1 |
| PDF | https://arxiv.org/pdf/2002.02779v1.pdf |
| PWC | https://paperswithcode.com/paper/generating-digital-twins-with-multiple |
| Repo | |
| Framework | |
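To make the digital-twin mechanism concrete, here is a toy Bernoulli CRBM sketch: hidden units depend on both the visible covariates and a conditioning (baseline) vector, and a twin is drawn by Gibbs sampling with the baseline clamped. The dimensions and (untrained) weights are placeholders; the paper's model also handles continuous and ordinal covariates and is fitted with standard RBM training, none of which is reproduced here.

```python
# Toy conditional RBM: p(h | v, c) and p(v | h) for binary units only.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, n_cond = 8, 16, 4            # assumed toy sizes
W = rng.normal(scale=0.1, size=(n_hidden, n_visible))
U = rng.normal(scale=0.1, size=(n_hidden, n_cond))  # couples baseline to hidden
b = np.zeros(n_visible)
c = np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_twin(baseline, n_gibbs=200):
    """Gibbs-sample visible covariates conditioned on baseline data."""
    v = rng.integers(0, 2, size=n_visible).astype(float)
    for _ in range(n_gibbs):
        h = (rng.random(n_hidden) < sigmoid(W @ v + U @ baseline + c)).astype(float)
        v = (rng.random(n_visible) < sigmoid(W.T @ h + b)).astype(float)
    return v                                       # one simulated subject

twin = sample_twin(baseline=rng.random(n_cond))
```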
Towards Modular Algorithm Induction
| Title | Towards Modular Algorithm Induction |
|---|---|
| Authors | Daniel A. Abolafia, Rishabh Singh, Manzil Zaheer, Charles Sutton |
| Abstract | We present a modular neural network architecture, Main, that learns algorithms given a set of input-output examples. Main consists of a neural controller that interacts with a variable-length input tape and learns to compose modules together with their corresponding argument choices. Unlike previous approaches, Main uses a general domain-agnostic mechanism for selecting modules and their arguments. It uses a general input tape layout together with a parallel history tape to indicate the most recently used locations. Finally, it uses a memoryless controller with a length-invariant self-attention-based input tape encoding to allow for random access to tape locations. The Main architecture is trained end-to-end using reinforcement learning from a set of input-output examples. We evaluate Main on five algorithmic tasks and show that it can learn policies that generalize perfectly to inputs much longer than the ones used for training. |
| Tasks | |
| Published | 2020-02-27 |
| URL | https://arxiv.org/abs/2003.04227v1 |
| PDF | https://arxiv.org/pdf/2003.04227v1.pdf |
| PWC | https://paperswithcode.com/paper/towards-modular-algorithm-induction |
| Repo | |
| Framework | |
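A heavily simplified sketch of the controller idea: a memoryless policy that self-attends over a variable-length tape and emits logits for a module choice plus a per-cell argument score. All sizes, the module set, and the class names are hypothetical; the actual Main architecture differs in many details (history tape, RL training loop) that are not shown.

```python
# Sketch of a memoryless, length-invariant tape controller.
import torch
import torch.nn as nn

class TapeController(nn.Module):
    def __init__(self, vocab=16, d_model=64, n_modules=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)         # tape symbol embedding
        self.attn = nn.MultiheadAttention(d_model, 4, batch_first=True)
        self.module_head = nn.Linear(d_model, n_modules)  # which module to apply
        self.arg_head = nn.Linear(d_model, 1)             # score each tape cell

    def forward(self, tape):                              # tape: (B, L) int ids
        x = self.embed(tape)
        h, _ = self.attn(x, x, x)                         # works for any length L
        pooled = h.mean(dim=1)                            # no recurrent state
        return self.module_head(pooled), self.arg_head(h).squeeze(-1)

ctrl = TapeController()
module_logits, arg_scores = ctrl(torch.randint(0, 16, (2, 10)))
```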
Improved Subsampled Randomized Hadamard Transform for Linear SVM
| Title | Improved Subsampled Randomized Hadamard Transform for Linear SVM |
|---|---|
| Authors | Zijian Lei, Liang Lan |
| Abstract | The Subsampled Randomized Hadamard Transform (SRHT), a popular random projection method that can efficiently project $d$-dimensional data into an $r$-dimensional space ($r \ll d$) in $O(d\log d)$ time, has been widely used to address the challenge of high dimensionality in machine learning. SRHT works by rotating the input data matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ with a randomized Walsh-Hadamard transform, followed by uniform column sampling on the rotated matrix. One limitation of SRHT, however, is that it generates the new low-dimensional embedding without considering any specific properties of a given dataset. Therefore, this data-independent random projection method may result in inferior and unstable performance when used for a particular machine learning task, e.g., classification. To overcome this limitation, we analyze the effect of using SRHT for random projection in the context of linear SVM classification. Based on our analysis, we propose importance sampling and deterministic top-$r$ sampling to produce effective low-dimensional embeddings instead of the uniform sampling used in SRHT. In addition, we also propose a new supervised non-uniform sampling method. Our experimental results demonstrate that our proposed methods can achieve higher classification accuracies than SRHT and other random projection methods on six real-life datasets. |
| Tasks | |
| Published | 2020-02-05 |
| URL | https://arxiv.org/abs/2002.01628v1 |
| PDF | https://arxiv.org/pdf/2002.01628v1.pdf |
| PWC | https://paperswithcode.com/paper/improved-subsampled-randomized-hadamard |
| Repo | |
| Framework | |
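The SRHT pipeline (random signs, Walsh-Hadamard rotation, column sampling) is compact enough to sketch directly. The version below assumes $d$ is a power of two, and the "top-$r$" variant here selects columns by squared column norm; that scoring rule is an assumption for illustration, not necessarily the paper's exact criterion.

```python
# Sketch: SRHT with uniform vs. deterministic top-r column sampling.
import numpy as np

def fwht(x):
    """Iterative fast Walsh-Hadamard transform over the last axis
    (length must be a power of two); unnormalized."""
    x = x.astype(float).copy()
    d = x.shape[-1]
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):              # butterfly passes
            a = x[..., i:i + h].copy()
            b = x[..., i + h:i + 2 * h].copy()
            x[..., i:i + h] = a + b
            x[..., i + h:i + 2 * h] = a - b
        h *= 2
    return x

def srht(X, r, rng, mode="uniform"):
    """Project rows of X (n x d) to r dimensions."""
    n, d = X.shape
    signs = rng.choice([-1.0, 1.0], size=d)       # diagonal sign matrix D
    Z = fwht(X * signs) / np.sqrt(d)              # rotated data H D X^T (rowwise)
    if mode == "uniform":                          # classical SRHT
        cols = rng.choice(d, size=r, replace=False)
    else:                                          # deterministic top-r (assumed score)
        cols = np.argsort(-np.sum(Z ** 2, axis=0))[:r]
    return Z[:, cols] * np.sqrt(d / r)             # rescale for unbiasedness

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))
X_low = srht(X, r=16, rng=rng, mode="top_r")       # feed this to a linear SVM
```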
Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder
| Title | Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder |
|---|---|
| Authors | Zhisheng Xiao, Qing Yan, Yali Amit |
| Abstract | Deep probabilistic generative models enable modeling the likelihoods of very high dimensional data. An important application of generative modeling should be the ability to detect out-of-distribution (OOD) samples by setting a threshold on the likelihood. However, a recent study shows that probabilistic generative models can, in some cases, assign higher likelihoods to certain types of OOD samples, making likelihood-threshold-based OOD detection rules problematic. To address this issue, several OOD detection methods have been proposed for deep generative models. In this paper, we make the observation that some of these methods fail when applied to generative models based on Variational Auto-encoders (VAE). As an alternative, we propose Likelihood Regret, an efficient OOD score for VAEs. We benchmark our proposed method against existing approaches, and empirical results suggest that our method obtains the best overall OOD detection performance among OOD methods applied to VAEs. |
| Tasks | Out-of-Distribution Detection |
| Published | 2020-03-06 |
| URL | https://arxiv.org/abs/2003.02977v1 |
| PDF | https://arxiv.org/pdf/2003.02977v1.pdf |
| PWC | https://paperswithcode.com/paper/likelihood-regret-an-out-of-distribution |
| Repo | |
| Framework | |
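At its core, likelihood regret measures how much the (approximate) log-likelihood of a single test input improves when the VAE's inference model is refit to that input alone. The sketch below assumes a `vae` object with `encoder`/`decoder` submodules and an `elbo(vae, x)` function; those names and the optimizer settings are placeholders, not the authors' exact setup.

```python
# Sketch: likelihood regret = ELBO after per-sample encoder refit - ELBO before.
import copy
import torch

def likelihood_regret(vae, elbo, x, steps=100, lr=1e-3):
    base = elbo(vae, x).item()                 # ELBO under the trained encoder
    tuned = copy.deepcopy(vae)
    for p in tuned.decoder.parameters():       # decoder stays fixed
        p.requires_grad_(False)
    opt = torch.optim.Adam(tuned.encoder.parameters(), lr=lr)
    for _ in range(steps):                     # refit the encoder to x alone
        opt.zero_grad()
        loss = -elbo(tuned, x)
        loss.backward()
        opt.step()
    return elbo(tuned, x).item() - base        # large regret => likely OOD
```

The intuition: for in-distribution inputs the shared encoder is already near-optimal, so refitting gains little; for OOD inputs the gain (regret) is large, which makes the score usable as a detection threshold.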
Two-stage Discriminative Re-ranking for Large-scale Landmark Retrieval
| Title | Two-stage Discriminative Re-ranking for Large-scale Landmark Retrieval |
|---|---|
| Authors | Shuhei Yokoo, Kohei Ozaki, Edgar Simo-Serra, Satoshi Iizuka |
| Abstract | We propose an efficient pipeline for large-scale landmark image retrieval that addresses the diversity of the dataset through two-stage discriminative re-ranking. Our approach is based on embedding the images in a feature space using a convolutional neural network trained with a cosine softmax loss. Due to the variance of the images, which includes extreme viewpoint changes such as having to retrieve images of the exterior of a landmark from images of the interior, this is very challenging for approaches based exclusively on visual similarity. Our proposed re-ranking approach improves the results in two steps: in the sort-step, we use $k$-nearest-neighbor search with soft voting to sort the retrieved results based on their label similarity to the query images, and in the insert-step, we add samples from the dataset that were not retrieved by image similarity. This approach overcomes the low visual diversity of the retrieved images. In-depth experimental results show that the proposed approach significantly outperforms existing approaches on the challenging Google Landmarks Datasets. Using our methods, we achieved 1st place in the Google Landmark Retrieval 2019 challenge and 3rd place in the Google Landmark Recognition 2019 challenge on Kaggle. Our code is publicly available here: \url{https://github.com/lyakaap/Landmark2019-1st-and-3rd-Place-Solution} |
| Tasks | Image Retrieval |
| Published | 2020-03-25 |
| URL | https://arxiv.org/abs/2003.11211v1 |
| PDF | https://arxiv.org/pdf/2003.11211v1.pdf |
| PWC | https://paperswithcode.com/paper/two-stage-discriminative-re-ranking-for-large |
| Repo | |
| Framework | |
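The sort-step is the easier of the two stages to illustrate: estimate the query's landmark labels by soft-voting over its nearest labeled neighbors, then push retrieved items whose labels match those votes to the front. This is a minimal sketch under assumed inputs (unit-normalized features, one label per training image); the paper's exact voting weights may differ.

```python
# Sketch of the sort-step: re-rank retrieved items by soft-voted label match.
import numpy as np

def sort_step(query_feat, train_feats, train_labels, retrieved_ids,
              retrieved_labels, k=5):
    sims = train_feats @ query_feat                    # cosine sims (unit norms)
    topk = np.argsort(-sims)[:k]
    votes = {}
    for i in topk:                                     # soft-vote query's label
        votes[train_labels[i]] = votes.get(train_labels[i], 0.0) + sims[i]
    scores = np.array([votes.get(lab, 0.0) for lab in retrieved_labels])
    # Stable sort: label matches come first, original visual order breaks ties.
    order = np.argsort(-scores, kind="stable")
    return [retrieved_ids[j] for j in order]
```

The insert-step would then append further dataset images carrying the top-voted labels, even when visual similarity alone failed to retrieve them.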
Learning Individually Fair Classifier with Causal-Effect Constraint
| Title | Learning Individually Fair Classifier with Causal-Effect Constraint |
|---|---|
| Authors | Yoichi Chikahara, Shinsaku Sakaue, Akinori Fujino |
| Abstract | Machine learning is increasingly being used in various applications that make decisions for individuals. For such applications, we need to strike a balance between achieving good prediction accuracy and making fair decisions with respect to a sensitive feature (e.g., race or gender), which is difficult in complex real-world scenarios. Existing methods measure the unfairness in such scenarios as {\it unfair causal effects} and constrain their mean to zero. Unfortunately, with these methods, the decisions are not necessarily fair for all individuals because even when the mean unfair effect is zero, unfair effects might be positive for some individuals and negative for others, which is discriminatory. To learn a classifier that is fair for all individuals, we define unfairness as the {\it probability of individual unfairness} (PIU) and propose to solve an optimization problem that constrains an upper bound on PIU. We theoretically illustrate why our method achieves individual fairness. Experimental results demonstrate that our method learns an individually fair classifier at a slight cost in prediction accuracy. |
| Tasks | |
| Published | 2020-02-17 |
| URL | https://arxiv.org/abs/2002.06746v1 |
| PDF | https://arxiv.org/pdf/2002.06746v1.pdf |
| PWC | https://paperswithcode.com/paper/learning-individually-fair-classifier-with |
| Repo | |
| Framework | |
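To see what a PIU-style constraint looks like operationally, here is a soft surrogate: compute the per-individual effect of flipping the sensitive feature and penalize the estimated probability that its magnitude exceeds a tolerance. The sigmoid relaxation of the indicator, the counterfactual pair construction, and all names are assumptions; the paper derives a proper upper bound on PIU rather than this soft penalty.

```python
# Sketch: soft surrogate for P(|f(x, a=1) - f(x, a=0)| > delta).
import torch

def piu_penalty(model, x_a0, x_a1, delta=0.05, temp=20.0):
    effect = model(x_a1) - model(x_a0)       # per-individual unfair effect
    # sigmoid(temp * (|u| - delta)) approximates the indicator |u| > delta
    return torch.sigmoid(temp * (effect.abs() - delta)).mean()

# Training objective (hypothetical): accuracy loss plus a weighted penalty,
# e.g. loss = bce(model(x), y) + lam * piu_penalty(model, x_a0, x_a1)
```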
VIFB: A Visible and Infrared Image Fusion Benchmark
| Title | VIFB: A Visible and Infrared Image Fusion Benchmark |
|---|---|
| Authors | Xingchen Zhang, Ping Ye, Gang Xiao |
| Abstract | Visible and infrared image fusion is one of the most important areas in image processing due to its numerous applications. While much progress has been made in recent years with efforts on developing fusion algorithms, there is a lack of a code library and benchmark that can gauge the state of the art. In this paper, after briefly reviewing recent advances in visible and infrared image fusion, we present a visible and infrared image fusion benchmark (VIFB) that consists of 21 image pairs, a code library of 20 fusion algorithms, and 13 evaluation metrics. We also carry out large-scale experiments within the benchmark to understand the performance of these algorithms. By analyzing qualitative and quantitative results, we identify effective algorithms for robust image fusion and give some observations on the status and future prospects of this field. |
| Tasks | |
| Published | 2020-02-09 |
| URL | https://arxiv.org/abs/2002.03322v2 |
| PDF | https://arxiv.org/pdf/2002.03322v2.pdf |
| PWC | https://paperswithcode.com/paper/vifb-a-visible-and-infrared-image-fusion |
| Repo | |
| Framework | |
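A benchmark of this shape boils down to a loop: run each fusion algorithm on each visible/infrared pair and score the fused result with each metric. The sketch below shows that skeleton with a single metric, image entropy; VIFB's 13 metrics and its algorithm APIs are not reproduced here, so the callables are assumed placeholders.

```python
# Sketch: fusion-benchmark evaluation loop with one example metric.
import numpy as np

def entropy(img):
    """Shannon entropy of an 8-bit grayscale image (a common fusion metric)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def evaluate(algorithms, pairs):
    """algorithms: {name: fuse(vis, ir) -> fused uint8 image}; pairs: [(vis, ir)]."""
    scores = {}
    for name, fuse in algorithms.items():
        scores[name] = [entropy(fuse(vis, ir)) for vis, ir in pairs]
    return scores
```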
Mitigating Class Boundary Label Uncertainty to Reduce Both Model Bias and Variance
| Title | Mitigating Class Boundary Label Uncertainty to Reduce Both Model Bias and Variance |
|---|---|
| Authors | Matthew Almeida, Wei Ding, Scott Crouter, Ping Chen |
| Abstract | The study of model bias and variance with respect to decision boundaries is critically important in supervised classification. There is generally a tradeoff between the two, as fine-tuning the decision boundary of a classification model to accommodate more boundary training samples (i.e., higher model complexity) may improve training accuracy (i.e., lower bias) but hurt generalization against unseen data (i.e., higher variance). When focusing only on classification-boundary fine-tuning and model complexity, it is difficult to reduce both bias and variance. To overcome this dilemma, we take a different perspective and investigate a new approach to handling inaccuracy and uncertainty in the training data labels, which are inevitable in many applications where labels are conceptual and labeling is performed by human annotators. The process of classification can be undermined by uncertainty in the labels of the training data; extending a boundary to accommodate an inaccurately labeled point will increase both bias and variance. Our novel method can reduce both bias and variance by estimating the pointwise label uncertainty of the training set and accordingly adjusting the training sample weights such that samples with high uncertainty are weighted down and those with low uncertainty are weighted up. In this way, uncertain samples have a smaller contribution to the objective function of the model’s learning algorithm and exert less pull on the decision boundary. In a real-world physical activity recognition case study, the data presents many labeling challenges, and we show that this new approach improves model performance and reduces model variance. |
| Tasks | Activity Recognition |
| Published | 2020-02-23 |
| URL | https://arxiv.org/abs/2002.09963v1 |
| PDF | https://arxiv.org/pdf/2002.09963v1.pdf |
| PWC | https://paperswithcode.com/paper/mitigating-class-boundary-label-uncertainty |
| Repo | |
| Framework | |
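The weighting idea transfers directly to any learner that accepts per-sample weights. Here is a minimal sketch, assuming a simple proxy for pointwise label uncertainty (disagreement among a sample's nearest neighbors); the paper's uncertainty estimator differs, so treat this proxy as an illustrative stand-in.

```python
# Sketch: down-weight training samples with uncertain (boundary) labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def knn_label_uncertainty(X, y, k=10):
    """Fraction of a point's k nearest neighbors whose label disagrees."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                      # idx[:, 0] is the point itself
    return (y[idx[:, 1:]] != y[:, None]).mean(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)                      # toy labels
w = 1.0 - knn_label_uncertainty(X, y)              # uncertain points pull less
clf = LogisticRegression().fit(X, y, sample_weight=w)
```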
Driver Gaze Estimation in the Real World: Overcoming the Eyeglass Challenge
| Title | Driver Gaze Estimation in the Real World: Overcoming the Eyeglass Challenge |
|---|---|
| Authors | Akshay Rangesh, Bowen Zhang, Mohan M. Trivedi |
| Abstract | A driver’s gaze is critical for determining the driver’s attention level, state, situational awareness, and readiness to take over control from partially and fully automated vehicles. Tracking both the head and eyes (pupils) can provide reliable estimation of a driver’s gaze using face images under ideal conditions. However, the vehicular environment introduces a variety of challenges that are usually unaccounted for: harsh illumination, nighttime conditions, and reflective/dark eyeglasses. Unfortunately, relying on head pose alone under such conditions can prove to be unreliable owing to significant eye movements. In this study, we offer solutions to address these problems encountered in the real world. To solve issues with lighting, we demonstrate that using an infrared camera with suitable equalization and normalization usually suffices. To handle eyeglasses and their corresponding artifacts, we adopt the idea of image-to-image translation using generative adversarial networks (GANs) to pre-process images prior to gaze estimation. To this end, we propose the Gaze Preserving CycleGAN (GPCycleGAN). As the name suggests, this network preserves the driver’s gaze while removing potential eyeglasses from infrared face images. GPCycleGAN is based on the well-known CycleGAN approach, with the addition of a gaze classifier and a gaze consistency loss for additional supervision. Our approach exhibits improved performance and robustness on challenging real-world data spanning 13 subjects and a variety of driving conditions. |
| Tasks | Gaze Estimation, Image-to-Image Translation |
| Published | 2020-02-06 |
| URL | https://arxiv.org/abs/2002.02077v2 |
| PDF | https://arxiv.org/pdf/2002.02077v2.pdf |
| PWC | https://paperswithcode.com/paper/driver-gaze-estimation-in-the-real-world |
| Repo | |
| Framework | |
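The extra supervision GPCycleGAN adds on top of the standard CycleGAN losses can be sketched as a single term: a gaze classifier scores the de-glassed image, and a consistency loss keeps the gaze label intact. The module names and loss weights below are assumptions; only the shape of the term is taken from the abstract.

```python
# Sketch: gaze-consistency term added to the usual CycleGAN objective.
import torch
import torch.nn.functional as F

def gaze_consistency_loss(generator, gaze_classifier, x_glasses, gaze_label):
    fake_no_glasses = generator(x_glasses)        # translate: remove eyeglasses
    logits = gaze_classifier(fake_no_glasses)     # classifier supervises gaze
    return F.cross_entropy(logits, gaze_label)

# Hypothetical total objective:
# total = adversarial + cycle_consistency + lam * gaze_consistency_loss(...)
```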
Kernel of CycleGAN as a Principal homogeneous space
| Title | Kernel of CycleGAN as a Principal homogeneous space |
|---|---|
| Authors | Nikita Moriakov, Jonas Adler, Jonas Teuwen |
| Abstract | Unpaired image-to-image translation has attracted significant interest due to the invention of CycleGAN, a method which utilizes a combination of adversarial and cycle consistency losses to avoid the need for paired data. It is known that the CycleGAN problem might admit multiple solutions, and our goal in this paper is to analyze the space of exact solutions and to give perturbation bounds for approximate solutions. We show theoretically that the exact solution space is invariant with respect to automorphisms of the underlying probability spaces, and, furthermore, that the group of automorphisms acts freely and transitively on the space of exact solutions. We examine the case of zero 'pure' CycleGAN loss first in its generality and subsequently expand our analysis to approximate solutions for the 'extended' CycleGAN loss, where an identity loss term is included. In order to demonstrate that these results are applicable, we show that under mild conditions nontrivial smooth automorphisms exist. Furthermore, we provide empirical evidence that neural networks can learn these automorphisms with unexpected and unwanted results. We conclude that finding optimal solutions to the CycleGAN loss does not necessarily lead to the envisioned result in image-to-image translation tasks and that underlying hidden symmetries can render the result utterly useless. |
| Tasks | Image-to-Image Translation |
| Published | 2020-01-24 |
| URL | https://arxiv.org/abs/2001.09061v1 |
| PDF | https://arxiv.org/pdf/2001.09061v1.pdf |
| PWC | https://paperswithcode.com/paper/kernel-of-cyclegan-as-a-principle-homogeneous |
| Repo | |
| Framework | |
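The structural claim in the abstract can be restated compactly. The following is a paraphrase in my own notation, not the paper's exact statement: given one exact solution, composing with an automorphism of the source probability space yields another, and this composition exhausts all exact solutions.

```latex
% If $G : X \to Y$ and $F : Y \to X$ form an exact CycleGAN solution
% between probability spaces $(X,\mu)$ and $(Y,\nu)$ (measure-preserving,
% mutually inverse), and $\sigma \in \mathrm{Aut}(X,\mu)$ is an
% automorphism, then the composed pair
\[
  G' = G \circ \sigma, \qquad F' = \sigma^{-1} \circ F
\]
% is again an exact solution. The map $\sigma \mapsto (G\circ\sigma,\,
% \sigma^{-1}\circ F)$ defines a free and transitive group action, so the
% set of exact solutions is a principal homogeneous space (torsor) over
% $\mathrm{Aut}(X,\mu)$: any two exact solutions differ by exactly one
% automorphism, and none is canonically preferred by the loss.
```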
Distortion Agnostic Deep Watermarking
| Title | Distortion Agnostic Deep Watermarking |
|---|---|
| Authors | Xiyang Luo, Ruohan Zhan, Huiwen Chang, Feng Yang, Peyman Milanfar |
| Abstract | Watermarking is the process of embedding information into an image such that it can survive under distortions, while requiring the encoded image to have little or no perceptual difference from the original image. Recently, deep learning-based methods achieved impressive results in both visual quality and message payload under a wide variety of image distortions. However, these methods all require differentiable models for the image distortions at training time, and may generalize poorly to unknown distortions. This is undesirable since the types of distortions applied to watermarked images are usually unknown and non-differentiable. In this paper, we propose a new framework for distortion-agnostic watermarking, where the image distortion is not explicitly modeled during training. Instead, the robustness of our system comes from two sources: adversarial training and channel coding. Compared to training on a fixed set of distortions and noise levels, our method achieves comparable or better results on distortions available during training, and better performance on unknown distortions. |
| Tasks | |
| Published | 2020-01-14 |
| URL | https://arxiv.org/abs/2001.04580v1 |
| PDF | https://arxiv.org/pdf/2001.04580v1.pdf |
| PWC | https://paperswithcode.com/paper/distortion-agnostic-deep-watermarking |
| Repo | |
| Framework | |
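The "adversarial training" source of robustness can be sketched as a two-player update: an attack network learns distortions that break the decoder, while the encoder/decoder learn to survive the current attacker. All module names and the training schedule below are assumptions, and the channel-coding side is only represented by the (pre-coded) message bits.

```python
# Sketch: one adversarial round of distortion-agnostic watermark training.
import torch
import torch.nn.functional as F

def train_step(encoder, decoder, attacker, opt_wm, opt_att, image, bits):
    """`bits` is a float {0,1} tensor holding the channel-coded message;
    opt_wm covers encoder+decoder params, opt_att covers attacker params."""
    encoded = encoder(image, bits)                     # watermarked image
    # 1) Attacker update: learn a distortion that maximizes decoding error.
    att_loss = -F.binary_cross_entropy_with_logits(
        decoder(attacker(encoded.detach())), bits)
    opt_att.zero_grad(); att_loss.backward(); opt_att.step()
    # 2) Encoder/decoder update: decode through the current attacker while
    #    keeping the watermark imperceptible.
    wm_loss = (F.binary_cross_entropy_with_logits(
                   decoder(attacker(encoded)), bits)
               + F.mse_loss(encoded, image))
    opt_wm.zero_grad(); wm_loss.backward(); opt_wm.step()
```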
Forensic Authorship Analysis of Microblogging Texts Using N-Grams and Stylometric Features
| Title | Forensic Authorship Analysis of Microblogging Texts Using N-Grams and Stylometric Features |
|---|---|
| Authors | Nicole Mariah Sharon Belvisi, Naveed Muhammad, Fernando Alonso-Fernandez |
| Abstract | In recent years, messages and text posted on the Internet have been used in criminal investigations. Unfortunately, the authorship of many of them remains unknown. In some channels, the problem of establishing authorship may be even harder, since the length of digital texts is limited to a certain number of characters. In this work, we aim at identifying the authors of tweet messages, which are limited to 280 characters. We evaluate popular features traditionally employed in authorship attribution, which capture properties of the writing style at different levels. For our experiments, we use a self-captured database of 40 users, with 120 to 200 tweets per user. Results using this small set are promising, with the different features providing a classification accuracy between 92% and 98.5%. These results are competitive in comparison to existing studies that employ short texts such as tweets or SMS messages. |
| Tasks | |
| Published | 2020-03-24 |
| URL | https://arxiv.org/abs/2003.11545v1 |
| PDF | https://arxiv.org/pdf/2003.11545v1.pdf |
| PWC | https://paperswithcode.com/paper/forensic-authorship-analysis-of-microblogging |
| Repo | |
| Framework | |
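The classic n-gram stylometry baseline this line of work evaluates is a few lines with standard tooling: character n-gram TF-IDF features and a linear classifier. The dataset below is a placeholder (the paper's self-captured 40-user database is not public), and the exact feature set is assumed for illustration.

```python
# Sketch: character n-gram authorship attribution for short texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["short example tweet one", "another short tweet"]   # placeholder data
authors = ["user_a", "user_b"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),     # char 2-4 grams
    LogisticRegression(max_iter=1000),
)
clf.fit(tweets, authors)
print(clf.predict(["another tweet to attribute"]))
```

Character n-grams work well on 280-character texts because they capture sub-word habits (punctuation, casing, abbreviations) that survive even when there are too few words for lexical features.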
A Neural Network Based on First Principles
| Title | A Neural Network Based on First Principles |
|---|---|
| Authors | Paul M Baggenstoss |
| Abstract | In this paper, a neural network is derived from first principles, assuming only that each layer begins with a linear dimension-reducing transformation. The approach appeals to the principle of Maximum Entropy (MaxEnt) to find the posterior distribution of the input data of each layer, conditioned on the layer output variables. This posterior has a well-defined mean, the conditional mean estimator, that is calculated using a type of neural network with theoretically derived activation functions similar to sigmoid, softplus, and ReLU. This implicitly provides a theoretical justification for their use. A theorem that finds the conditional distribution and conditional mean estimator under the MaxEnt prior is proposed, unifying results for special cases. Combining layers results in an auto-encoder with a conventional feed-forward analysis network and a type of linear Bayesian belief network in the reconstruction path. |
| Tasks | |
| Published | 2020-02-18 |
| URL | https://arxiv.org/abs/2002.07469v1 |
| PDF | https://arxiv.org/pdf/2002.07469v1.pdf |
| PWC | https://paperswithcode.com/paper/a-neural-network-based-on-first-principles |
| Repo | |
| Framework | |
Can AI decrypt fashion jargon for you?
| Title | Can AI decrypt fashion jargon for you? |
|---|---|
| Authors | Yuan Shen, Shanduojiao Jiang, Muhammad Rizky Wellyanto, Ranjitha Kumar |
| Abstract | When people talk about fashion, they care about the underlying meaning of fashion concepts, e.g., style. For example, people ask questions like what features make this dress smart. However, the product descriptions on today's fashion websites are full of domain-specific and low-level words. It is not clear to people how exactly those low-level descriptions contribute to a style or any other high-level fashion concept. In this paper, we propose a data-driven solution to address this concept-understanding issue by leveraging a large amount of existing product data on fashion sites. We first collected and categorized 1546 fashion keywords into 5 different fashion categories. Then, we collected a new fashion product dataset with 853,056 products in total. Finally, we trained a deep learning model that can explicitly predict and explain high-level fashion concepts in a product image through its low-level and domain-specific fashion features. |
| Tasks | |
| Published | 2020-03-18 |
| URL | https://arxiv.org/abs/2003.08052v1 |
| PDF | https://arxiv.org/pdf/2003.08052v1.pdf |
| PWC | https://paperswithcode.com/paper/can-ai-decrypt-fashion-jargon-for-you |
| Repo | |
| Framework | |