January 27, 2020

2982 words 14 mins read

Paper Group ANR 1293

Animated Stickies: Fast Video Projection Mapping onto a Markerless Plane through a Direct Closed-Loop Alignment

Title Animated Stickies: Fast Video Projection Mapping onto a Markerless Plane through a Direct Closed-Loop Alignment
Authors Shingo Kagami, Koichi Hashimoto
Abstract This paper presents a fast projection mapping method for moving image content projected onto a markerless planar surface using a low-latency Digital Micromirror Device (DMD) projector. By adopting a closed-loop alignment approach, in which not only the surface texture but also the projected image is tracked by a camera, the proposed method is free from calibration or position adjustment between the camera and projector. We designed fiducial patterns to be inserted into a fast flipping sequence of binary frames of the DMD projector, which allows the surface texture and a fiducial geometry to be tracked simultaneously and separately from a single image captured by the camera. The proposed method implemented on a CPU runs at 400 fps and enables arbitrary video contents to be “stuck” onto a variety of textured surfaces.
Tasks Calibration
Published 2019-08-30
URL https://arxiv.org/abs/1909.00032v1
PDF https://arxiv.org/pdf/1909.00032v1.pdf
PWC https://paperswithcode.com/paper/animated-stickies-fast-video-projection
Repo
Framework

SpecAE: Spectral AutoEncoder for Anomaly Detection in Attributed Networks

Title SpecAE: Spectral AutoEncoder for Anomaly Detection in Attributed Networks
Authors Yuening Li, Xiao Huang, Jundong Li, Mengnan Du, Na Zou
Abstract Anomaly detection aims to distinguish observations that are rare and different from the majority. While most existing algorithms assume that instances are i.i.d., in many practical scenarios, links describing instance-to-instance dependencies and interactions are available. Such systems are called attributed networks. Anomaly detection in attributed networks has various applications such as monitoring suspicious accounts in social media and financial fraud in transaction networks. However, it remains a challenging task since the definition of anomaly becomes more complicated and topological structures are heterogeneous with nodal attributes. In this paper, we propose a spectral convolution and deconvolution based framework, SpecAE, to project the attributed network into a tailored space to detect global and community anomalies. SpecAE leverages Laplacian sharpening to amplify the distances between representations of anomalies and those of the majority. The learned representations along with reconstruction errors are combined with a density estimation model to perform the detection. They are trained jointly as an end-to-end framework. Experiments on real-world datasets demonstrate the effectiveness of SpecAE.
Tasks Anomaly Detection, Density Estimation
Published 2019-08-11
URL https://arxiv.org/abs/1908.03849v3
PDF https://arxiv.org/pdf/1908.03849v3.pdf
PWC https://paperswithcode.com/paper/specae-spectral-autoencoder-for-anomaly
Repo
Framework
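The Laplacian sharpening step the abstract mentions can be illustrated with a minimal sketch: where graph smoothing replaces each node's features with a neighborhood average, sharpening moves them the opposite way, amplifying the deviation of anomalous nodes from their neighbors. The function name and the mixing weight `gamma` are illustrative, not from the paper.

```python
import numpy as np

def laplacian_sharpen(X, A, gamma=1.0):
    """Amplify each node's deviation from its neighborhood mean.

    Smoothing averages a node with its neighbors (A_hat @ X);
    sharpening moves it in the opposite direction, which pushes
    anomalous nodes further from the majority.
    """
    # Row-normalized adjacency with self-loops.
    A_hat = A + np.eye(A.shape[0])
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)
    return (1 + gamma) * X - gamma * (A_hat @ X)

# Toy graph: node 2 carries an outlying attribute value.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
X = np.array([[1.0], [1.0], [5.0]])
X_sharp = laplacian_sharpen(X, A)
```

On this toy graph the outlying node's value grows while the majority nodes shrink, which is the distance-amplifying effect the abstract describes.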

BERT for Large-scale Video Segment Classification with Test-time Augmentation

Title BERT for Large-scale Video Segment Classification with Test-time Augmentation
Authors Tianqi Liu, Qizhan Shao
Abstract This paper presents our approach to the third YouTube-8M video understanding competition that challenges participants to localize video-level labels at scale to the precise time in the video where the label actually occurs. Our model is an ensemble of frame-level models such as Gated NetVLAD and NeXtVLAD and various BERT models with test-time augmentation. We explore multiple ways to aggregate BERT outputs as video representation and various ways to combine visual and audio information. We propose test-time augmentation as shifting video frames to one left or right unit, which adds variety to the predictions and empirically shows improvement in evaluation metrics. We first pre-train the model on the 4M training video-level data, and then fine-tune the model on 237K annotated video segment-level data. We achieve MAP@100K 0.7871 on private testing video segment data, which is ranked 9th out of 283 teams.
Tasks Video Understanding
Published 2019-12-02
URL https://arxiv.org/abs/1912.01127v1
PDF https://arxiv.org/pdf/1912.01127v1.pdf
PWC https://paperswithcode.com/paper/bert-for-large-scale-video-segment
Repo
Framework
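The test-time augmentation described above (shifting video frames one unit left or right and combining the predictions) can be sketched as follows; the helper names and the simple averaging rule are assumptions, and the paper's BERT/NetVLAD ensemble is abstracted as a generic `model_fn`:

```python
import numpy as np

def shift_frames(frames, offset):
    """Shift a (num_frames, feat_dim) sequence by `offset` frames,
    padding the vacated edge by repeating the boundary frame."""
    shifted = np.roll(frames, offset, axis=0)
    if offset > 0:
        shifted[:offset] = frames[0]
    elif offset < 0:
        shifted[offset:] = frames[-1]
    return shifted

def predict_with_tta(model_fn, frames, offsets=(-1, 0, 1)):
    """Average class predictions over shifted copies of the input."""
    preds = [model_fn(shift_frames(frames, o)) for o in offsets]
    return np.mean(preds, axis=0)
```

Each shifted copy sees slightly different frame-to-position alignments, so averaging over them adds the variety in predictions the abstract refers to.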

Standing on the Shoulders of Giants: AI-driven Calibration of Localisation Technologies

Title Standing on the Shoulders of Giants: AI-driven Calibration of Localisation Technologies
Authors Aftab Khan, Tim Farnham, Roget Kou, Usman Raza, Thajanee Premalal, Aleksandar Stanoev, William Thompson
Abstract High accuracy localisation technologies exist but are prohibitively expensive to deploy for large indoor spaces such as warehouses, factories, and supermarkets to track assets and people. However, these technologies can lend their highly accurate localisation capabilities to low-cost, commodity, and less-accurate technologies. In this paper, we bridge this gap by proposing a technology-agnostic calibration framework based on artificial intelligence to assist such low-cost technologies through highly accurate localisation systems. A single-layer neural network is used to calibrate a less accurate technology using a more accurate one, such as BLE using UWB and UWB using a professional motion tracking system. On a real indoor testbed, we demonstrate an increase in accuracy of approximately 70% for BLE and 50% for UWB. Not only does the proposed approach require a very short measurement campaign, but the low complexity of the single-layer neural network also makes it ideal for deployment on constrained devices typically used for localisation purposes.
Tasks Calibration
Published 2019-05-30
URL https://arxiv.org/abs/1905.13118v1
PDF https://arxiv.org/pdf/1905.13118v1.pdf
PWC https://paperswithcode.com/paper/standing-on-the-shoulders-of-giants-ai-driven
Repo
Framework
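A minimal sketch of the calibration idea: fit a single linear layer (weights plus bias, here solved in closed form by least squares rather than by gradient descent) that maps positions from a less accurate technology onto a more accurate one. The synthetic bias, scale, and noise values are illustrative, not measurements from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ground-truth positions (e.g. from UWB) and a biased,
# noisy low-cost estimate of them (e.g. from BLE): scaled, offset,
# plus measurement noise.
true_xy = rng.uniform(0, 10, size=(200, 2))
noisy_xy = 0.8 * true_xy + np.array([1.5, -0.7]) + rng.normal(0, 0.3, (200, 2))

# Single linear layer (weights + bias) fitted by least squares.
X = np.hstack([noisy_xy, np.ones((200, 1))])
W, *_ = np.linalg.lstsq(X, true_xy, rcond=None)
calibrated = X @ W

err_before = np.linalg.norm(noisy_xy - true_xy, axis=1).mean()
err_after = np.linalg.norm(calibrated - true_xy, axis=1).mean()
```

The fitted layer removes the systematic scale and offset errors, leaving only the irreducible noise, which mirrors the accuracy gains the paper reports at a qualitative level.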

Cross-Class Relevance Learning for Temporal Concept Localization

Title Cross-Class Relevance Learning for Temporal Concept Localization
Authors Junwei Ma, Satya Krishna Gorti, Maksims Volkovs, Ilya Stanevich, Guangwei Yu
Abstract We present a novel Cross-Class Relevance Learning approach for the task of temporal concept localization. Most localization architectures rely on feature extraction layers followed by a classification layer which outputs class probabilities for each segment. However, in many real-world applications classes can exhibit complex relationships that are difficult to model with this architecture. In contrast, we propose to incorporate target class and class-related features as input, and learn a pairwise binary model to predict general segment to class relevance. This facilitates learning of shared information between classes, and allows for arbitrary class-specific feature engineering. We apply this approach to the 3rd YouTube-8M Video Understanding Challenge together with other leading models, and achieve first place out of over 280 teams. In this paper we describe our approach and show some empirical results.
Tasks Feature Engineering, Video Understanding
Published 2019-11-19
URL https://arxiv.org/abs/1911.08548v1
PDF https://arxiv.org/pdf/1911.08548v1.pdf
PWC https://paperswithcode.com/paper/cross-class-relevance-learning-for-temporal
Repo
Framework
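The pairwise formulation can be sketched by showing how multi-label segments are expanded into (segment, class) binary examples; the function and the one-hot class embeddings are illustrative stand-ins, not the authors' feature engineering.

```python
import numpy as np

def build_pairwise_examples(segment_feats, labels, num_classes, class_embed):
    """Turn multi-label segments into (segment, class) -> {0,1} pairs.

    Each example concatenates the segment features with the candidate
    class embedding; the target is whether the segment carries that
    class. A single binary model then scores segment-to-class relevance.
    """
    X, y = [], []
    for feats, pos in zip(segment_feats, labels):
        for c in range(num_classes):
            X.append(np.concatenate([feats, class_embed[c]]))
            y.append(1.0 if c in pos else 0.0)
    return np.array(X), np.array(y)
```

Because every class shares the same binary model, information learned for one class can transfer to related classes, which is the motivation the abstract gives.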

Identifying Epigenetic Signature of Breast Cancer with Machine Learning

Title Identifying Epigenetic Signature of Breast Cancer with Machine Learning
Authors Maxim Vaysburd
Abstract The research reported in this paper identifies the epigenetic biomarker (methylation beta pattern) of breast cancer. Many cancers are triggered by abnormal gene expression levels caused by aberrant methylation of CpG sites in the DNA. In order to develop early diagnostics of cancer-causing methylations and to develop a treatment, it is necessary to identify a few dozen key cancer-related CpG methylation sites out of the millions of locations in the DNA. This research used public TCGA dataset to train a TensorFlow machine learning model to classify breast cancer versus non-breast-cancer tissue samples, based on over 300,000 methylation beta values in each sample. L1 regularization was applied to identify the CpG methylation sites most important for accurate classification. It was hypothesized that CpG sites with the highest learned model weights correspond to DNA locations most relevant to breast cancer. A reduced model trained on methylation betas of just the 25 CpG sites having the highest weights in the full model (trained on methylation betas at over 300,000 CpG sites) has achieved over 94% accuracy on evaluation data, confirming that the identified 25 CpG sites are indeed a biomarker of breast cancer.
Tasks
Published 2019-10-12
URL https://arxiv.org/abs/1910.06899v1
PDF https://arxiv.org/pdf/1910.06899v1.pdf
PWC https://paperswithcode.com/paper/identifying-epigenetic-signature-of-breast
Repo
Framework
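The L1-based site selection can be sketched with a small proximal-gradient (ISTA) logistic regression on synthetic data, where only two of fifty features carry signal; the hyperparameters and data are illustrative, not the paper's TensorFlow setup.

```python
import numpy as np

def l1_logistic(X, y, lam=0.1, lr=0.1, steps=500):
    """Logistic regression with an L1 penalty via proximal gradient
    (ISTA): a gradient step on the logistic loss, then soft-thresholding,
    which drives irrelevant weights exactly to zero."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        grad = X.T @ (p - y) / n
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 50))          # 50 "CpG sites", most irrelevant
y = (X[:, 0] - 2 * X[:, 3] > 0).astype(float)  # only sites 0 and 3 matter
w = l1_logistic(X, y)
top = np.argsort(-np.abs(w))[:2]        # sites with the largest |weight|
```

Ranking sites by absolute learned weight and keeping the top few mirrors the paper's strategy of reducing 300,000 methylation sites to a 25-site biomarker.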

Measuring Long-term Impact of Ads on LinkedIn Feed

Title Measuring Long-term Impact of Ads on LinkedIn Feed
Authors Jinyun Yan, Birjodh Tiwana, Souvik Ghosh, Haishan Liu, Shaunak Chatterjee
Abstract Organic updates (from a member’s network) and sponsored updates (or ads, from advertisers) together form the newsfeed on LinkedIn. The newsfeed, the default homepage for members, attracts them to engage, brings them value and helps LinkedIn grow. Engagement and revenue on the feed are two critical, yet often conflicting objectives. Hence, it is important to design a good Revenue-Engagement Tradeoff (RENT) mechanism to blend ads into the feed. In this paper, we design experiments to understand how members’ behavior evolves over time given different ads experiences. These experiences vary in ads density, while the quality of ads (ensured by relevance models) is held constant. Our experiments have been conducted on randomized member buckets, and we use two experimental designs to measure the short-term and long-term effects of the various treatments. Based on the first three months’ data, we observe that the long-term impact is at a much smaller scale than the short-term impact in our application. Furthermore, we observe that different member cohorts (based on user activity level) adapt and react differently over time.
Tasks
Published 2019-01-29
URL https://arxiv.org/abs/1902.03098v2
PDF https://arxiv.org/pdf/1902.03098v2.pdf
PWC https://paperswithcode.com/paper/measuring-long-term-impact-of-ads-on-linkedin
Repo
Framework

Object Pose Estimation in Robotics Revisited

Title Object Pose Estimation in Robotics Revisited
Authors Antti Hietanen, Jyrki Latokartano, Alessandro Foi, Roel Pieters, Ville Kyrki, Minna Lanz, Joni-Kristian Kämäräinen
Abstract Vision-based object grasping and manipulation in robotics require accurate estimation of the object 6D pose. Therefore, pose estimation has received significant attention, and multiple datasets and evaluation metrics have been proposed. Most of the existing evaluation metrics rank the estimated poses solely from the visual perspective, i.e. how well two geometrical surfaces are aligned, which does not directly indicate the goodness of the pose for robotic manipulation. In robotic manipulation the optimal grasp pose depends on many factors such as target object weight and material, robot, gripper, and the task itself. In this work we address these factors by proposing a probabilistic evaluation metric that ranks an estimated object pose based on the conditional probability of completing a task given this estimated pose. The evaluation metric is validated in controlled experiments, and a number of baseline and recent pose estimation methods are compared on a dataset of industrial parts for assembly tasks. The experimental results confirm that the proposed evaluation metric measures the fitness of an estimated pose for a robotic task more accurately than prior metrics.
Tasks 3D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation
Published 2019-06-06
URL https://arxiv.org/abs/1906.02783v2
PDF https://arxiv.org/pdf/1906.02783v2.pdf
PWC https://paperswithcode.com/paper/benchmarking-6d-object-pose-estimation-for
Repo
Framework

Quadtree Generating Networks: Efficient Hierarchical Scene Parsing with Sparse Convolutions

Title Quadtree Generating Networks: Efficient Hierarchical Scene Parsing with Sparse Convolutions
Authors Kashyap Chitta, Jose M. Alvarez, Martial Hebert
Abstract Semantic segmentation with Convolutional Neural Networks is a memory-intensive task due to the high spatial resolution of feature maps and output predictions. In this paper, we present Quadtree Generating Networks (QGNs), a novel approach able to drastically reduce the memory footprint of modern semantic segmentation networks. The key idea is to use quadtrees to represent the predictions and target segmentation masks instead of dense pixel grids. Our quadtree representation enables hierarchical processing of an input image, with the most computationally demanding layers only being used at regions in the image containing boundaries between classes. In addition, given a trained model, our representation enables flexible inference schemes to trade off accuracy and computational cost, allowing the network to adapt in constrained situations such as embedded devices. We demonstrate the benefits of our approach on the Cityscapes, SUN-RGBD and ADE20k datasets. On Cityscapes, we obtain a relative 3% mIoU improvement compared to a dilated network with similar memory consumption, and incur only a 3% relative mIoU drop compared to a large dilated network while reducing memory consumption by over 4$\times$.
Tasks Scene Parsing, Semantic Segmentation
Published 2019-07-27
URL https://arxiv.org/abs/1907.11821v2
PDF https://arxiv.org/pdf/1907.11821v2.pdf
PWC https://paperswithcode.com/paper/quadtree-generating-networks-efficient
Repo
Framework
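The quadtree representation can be sketched as a recursive split that stops on uniform regions, so a mask with a single class boundary needs far fewer leaves than pixels; this illustrates only the data structure, not the QGN architecture itself.

```python
import numpy as np

def quadtree(mask):
    """Recursively split a square mask into four quadrants, stopping
    when a region is uniform. A leaf is the region's single class id;
    an internal node is a list of four children (NW, NE, SW, SE)."""
    if mask.min() == mask.max():
        return int(mask.flat[0])          # uniform region -> one leaf
    h, w = mask.shape
    h2, w2 = h // 2, w // 2
    return [quadtree(mask[:h2, :w2]), quadtree(mask[:h2, w2:]),
            quadtree(mask[h2:, :w2]), quadtree(mask[h2:, w2:])]

def count_leaves(node):
    if isinstance(node, int):
        return 1
    return sum(count_leaves(c) for c in node)

# An 8x8 mask with one vertical class boundary: the tree needs only a
# handful of leaves instead of 64 dense pixels.
mask = np.zeros((8, 8), dtype=int)
mask[:, 4:] = 1
tree = quadtree(mask)
```

Because subdivision only happens near class boundaries, memory scales with boundary length rather than image area, which is the source of the savings the abstract reports.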

Decision-Making in Reinforcement Learning

Title Decision-Making in Reinforcement Learning
Authors Arsh Javed Rehman, Pradeep Tomar
Abstract In this research work, probabilistic decision-making approaches, e.g. Bayesian and Boltzmann strategies, are studied alongside various deterministic exploration strategies, e.g. greedy, epsilon-greedy and random approaches. A comparative study is carried out between probabilistic and deterministic decision-making approaches; the experiments are performed in the OpenAI Gym environment, solving the Cart Pole problem. This work discusses the Bayesian approach to decision-making in deep reinforcement learning and how dropout can reduce its computational cost. All the exploration approaches are compared. It also discusses the importance of exploration in deep reinforcement learning and how improving exploration strategies may benefit science and technology. The results show that probabilistic decision-making approaches perform better in the long run than deterministic approaches; under uncertainty, the Bayesian dropout approach proved to be the best of all the approaches studied.
Tasks Decision Making
Published 2019-06-01
URL https://arxiv.org/abs/1906.00131v1
PDF https://arxiv.org/pdf/1906.00131v1.pdf
PWC https://paperswithcode.com/paper/190600131
Repo
Framework
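The deterministic and probabilistic action-selection strategies compared in the paper can be sketched side by side; the Q-values and hyperparameters below are illustrative, not taken from the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    """Deterministic exploitation with probability 1 - epsilon,
    uniform random exploration otherwise."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(q_values, temperature=1.0):
    """Probabilistic selection: action probability proportional to
    exp(Q / T), so better actions are favored but never certain."""
    z = np.array(q_values) / temperature
    p = np.exp(z - z.max())          # subtract max for numerical stability
    p /= p.sum()
    return int(rng.choice(len(q_values), p=p))
```

Lowering the Boltzmann temperature interpolates toward greedy behavior, while raising it interpolates toward uniform random exploration, which is the probabilistic-versus-deterministic spectrum the abstract studies.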

Latent Gaussian process with composite likelihoods for data-driven disease stratification

Title Latent Gaussian process with composite likelihoods for data-driven disease stratification
Authors Siddharth Ramchandran, Miika Koskinen, Harri Lähdesmäki
Abstract Data-driven techniques for identifying disease subtypes using medical records can greatly benefit the management of patients’ health and unravel the underpinnings of diseases. Clinical patient records are typically collected from disparate sources and result in high-dimensional data comprising multiple likelihoods with noisy and missing values. Probabilistic methods capable of analysing large-scale patient records have a central role in biomedical research and are expected to become even more important once data-driven personalised medicine is established in clinical practice. In this work we propose an unsupervised, generative model that can identify clustering among patients in a latent space while making use of all available data (i.e. in a heterogeneous data setting with noisy and missing values). We make use of Gaussian process latent variable models (GPLVM) and deep neural networks to create a non-linear dimensionality reduction technique for heterogeneous data. The effectiveness of our model is demonstrated on clinical data of Parkinson’s disease patients treated at the HUS Helsinki University Hospital. We identify sub-groups within the heterogeneous patient data, evaluate the robustness of the findings, and interpret cluster characteristics.
Tasks Dimensionality Reduction, Latent Variable Models
Published 2019-09-04
URL https://arxiv.org/abs/1909.01614v1
PDF https://arxiv.org/pdf/1909.01614v1.pdf
PWC https://paperswithcode.com/paper/latent-gaussian-process-with-composite
Repo
Framework

Towards an Intelligent Microscope: adaptively learned illumination for optimal sample classification

Title Towards an Intelligent Microscope: adaptively learned illumination for optimal sample classification
Authors Amey Chaware, Colin L. Cooke, Kanghyun Kim, Roarke Horstmeyer
Abstract Recent machine learning techniques have dramatically changed how we process digital images. However, the way in which we capture images is still largely driven by human intuition and experience. This restriction is in part due to the many available degrees of freedom that alter the image acquisition process (lens focus, exposure, filtering, etc). Here we focus on one such degree of freedom - illumination within a microscope - which can drastically alter information captured by the image sensor. We present a reinforcement learning system that adaptively explores optimal patterns to illuminate specimens for immediate classification. The agent uses a recurrent latent space to encode a large set of variably-illuminated samples and illumination patterns. We train our agent using a reward that balances classification confidence with image acquisition cost. By synthesizing knowledge over multiple snapshots, the agent can classify on the basis of all previous images with higher accuracy than from naively illuminated images, thus demonstrating a smarter way to physically capture task-specific information.
Tasks
Published 2019-10-22
URL https://arxiv.org/abs/1910.10209v2
PDF https://arxiv.org/pdf/1910.10209v2.pdf
PWC https://paperswithcode.com/paper/towards-an-intelligent-microscope-adaptively
Repo
Framework
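A sketch of the kind of reward the abstract describes, trading classification confidence against image acquisition cost; the linear form and the cost weight are assumptions, not the paper's exact reward function.

```python
def acquisition_reward(confidence, num_images, cost_per_image=0.05):
    """Reward an agent for confident classification while penalising
    each additional illuminated snapshot it chooses to capture."""
    return confidence - cost_per_image * num_images
```

Under such a reward, a fairly confident two-image classification can outscore a slightly more confident five-image one, pushing the agent toward informative illumination patterns rather than simply capturing more data.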

Probability Estimation with Truncated Inverse Binomial Sampling

Title Probability Estimation with Truncated Inverse Binomial Sampling
Authors Xinjia Chen
Abstract In this paper, we develop a general theory of truncated inverse binomial sampling. In this theory, fixed-size sampling and inverse binomial sampling are accommodated as special cases. In particular, the classical Chernoff-Hoeffding bound is an immediate consequence of the theory. Moreover, we propose a rigorous and efficient method for probability estimation, which is an adaptive Monte Carlo estimation method based on truncated inverse binomial sampling. Our proposed method of probability estimation can be orders of magnitude more efficient than existing methods in the literature and widely used software.
Tasks
Published 2019-08-19
URL https://arxiv.org/abs/1908.06907v1
PDF https://arxiv.org/pdf/1908.06907v1.pdf
PWC https://paperswithcode.com/paper/probability-estimation-with-truncated-inverse
Repo
Framework
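A minimal sketch of the scheme's core idea: draw Bernoulli samples until either a target number of successes is reached (inverse binomial stopping) or a sample budget is exhausted (truncation). This illustrates only the stopping rule, not the paper's adaptive estimation method or its error guarantees.

```python
import random

def truncated_ibs(bernoulli_trial, target_successes, max_samples):
    """Sample until `target_successes` successes are observed
    (inverse binomial stopping) or `max_samples` draws have been made
    (truncation), then estimate p as successes / draws.

    Stopping on a success count rather than a sample count adapts the
    sample size to the unknown probability: rare events get more draws.
    """
    successes = draws = 0
    while successes < target_successes and draws < max_samples:
        draws += 1
        successes += bernoulli_trial()
    return successes / draws

random.seed(0)
p_hat = truncated_ibs(lambda: random.random() < 0.3,
                      target_successes=200, max_samples=10000)
```

With a fixed success target the relative error of the estimate is roughly constant across values of p, which is what makes inverse binomial stopping attractive compared to fixed-size sampling.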

PixelVAE++: Improved PixelVAE with Discrete Prior

Title PixelVAE++: Improved PixelVAE with Discrete Prior
Authors Hossein Sadeghi, Evgeny Andriyash, Walter Vinci, Lorenzo Buffoni, Mohammad H. Amin
Abstract Constructing powerful generative models for natural images is a challenging task. PixelCNN models capture details and local information in images very well but have a limited receptive field. Variational autoencoders with a factorial decoder can capture global information easily, but they often fail to reconstruct details faithfully. PixelVAE combines the best features of the two models and constructs a generative model that is able to learn local and global structures. Here we introduce PixelVAE++, a VAE with three types of latent variables and a PixelCNN++ for the decoder. We introduce a novel architecture that reuses a part of the decoder as an encoder. We achieve state-of-the-art performance on binary data sets such as MNIST and Omniglot, and state-of-the-art performance on CIFAR-10 among latent variable models, while keeping the latent variables informative.
Tasks Latent Variable Models, Omniglot
Published 2019-08-26
URL https://arxiv.org/abs/1908.09948v1
PDF https://arxiv.org/pdf/1908.09948v1.pdf
PWC https://paperswithcode.com/paper/pixelvae-improved-pixelvae-with-discrete
Repo
Framework

Conditional Flow Variational Autoencoders for Structured Sequence Prediction

Title Conditional Flow Variational Autoencoders for Structured Sequence Prediction
Authors Apratim Bhattacharyya, Michael Hanselmann, Mario Fritz, Bernt Schiele, Christoph-Nikolas Straehle
Abstract Prediction of future states of the environment and interacting agents is a key competence required for autonomous agents to operate successfully in the real world. Prior work for structured sequence prediction based on latent variable models imposes a uni-modal standard Gaussian prior on the latent variables. This induces a strong model bias which makes it challenging to fully capture the multi-modality of the distribution of the future states. In this work, we introduce Conditional Flow Variational Autoencoders (CF-VAE), which use a novel conditional normalizing flow based prior to capture complex multi-modal conditional distributions for effective structured sequence prediction. Moreover, we propose two novel regularization schemes which stabilize training, mitigate posterior collapse, and yield a better fit to the target data distribution. Our experiments on three multi-modal structured sequence prediction datasets – MNIST Sequences, Stanford Drone and HighD – show that the proposed method obtains state-of-the-art results across different evaluation metrics.
Tasks Latent Variable Models
Published 2019-08-24
URL https://arxiv.org/abs/1908.09008v2
PDF https://arxiv.org/pdf/1908.09008v2.pdf
PWC https://paperswithcode.com/paper/conditional-flow-variational-autoencoders-for
Repo
Framework