July 29, 2019

3404 words 16 mins read

Paper Group AWR 187

Learning Hawkes Processes from Short Doubly-Censored Event Sequences. Dropout Inference in Bayesian Neural Networks with Alpha-divergences. Combining Strategic Learning and Tactical Search in Real-Time Strategy Games. Robust Keyframe-based Dense SLAM with an RGB-D Camera. BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting …

Learning Hawkes Processes from Short Doubly-Censored Event Sequences

Title Learning Hawkes Processes from Short Doubly-Censored Event Sequences
Authors Hongteng Xu, Dixin Luo, Hongyuan Zha
Abstract Many real-world applications require robust algorithms to learn point processes based on a type of incomplete data — the so-called short doubly-censored (SDC) event sequences. We study this critical problem of quantitative asynchronous event sequence analysis under the framework of Hawkes processes by leveraging the idea of data synthesis. Given SDC event sequences observed in a variety of time intervals, we propose a sampling-stitching data synthesis method — sampling predecessors and successors for each SDC event sequence from potential candidates and stitching them together to synthesize long training sequences. We discuss the rationality and feasibility of our method with likelihood-based arguments. Experiments on both synthetic and real-world data demonstrate that the proposed data synthesis method indeed improves learning results for both time-invariant and time-varying Hawkes processes.
Tasks Point Processes
Published 2017-02-22
URL http://arxiv.org/abs/1702.07013v2
PDF http://arxiv.org/pdf/1702.07013v2.pdf
PWC https://paperswithcode.com/paper/learning-hawkes-processes-from-short-doubly
Repo https://github.com/HongtengXu/Hawkes-Process-Toolkit
Framework none
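
As a rough illustration of the paper's sampling-stitching idea, here is a minimal Python sketch. It assumes each SDC sequence is stored as (events, window_start, window_end) and stitches a time-shifted successor onto each sequence; the uniform candidate choice is a simplification, since the paper also samples predecessors and weights candidates by likelihood.

```python
import random

def stitch_sequences(sequences):
    """Sampling-stitching sketch: each item is (events, t0, t1), where
    `events` is a sorted list of timestamps inside the observation
    window [t0, t1). For every SDC sequence we sample a successor whose
    window begins at or after this one's end, shift its events, and
    concatenate, yielding a longer synthetic training sequence."""
    out = []
    for events, t0, t1 in sequences:
        # candidate successors: sequences observed in a later window
        candidates = [s for s in sequences if s[1] >= t1]
        if not candidates:
            out.append((list(events), t0, t1))
            continue
        succ_events, s0, s1 = random.choice(candidates)
        shift = t1 - s0  # align the successor's window to this one's end
        stitched = list(events) + [t + shift for t in succ_events]
        out.append((stitched, t0, t1 + (s1 - s0)))
    return out

# toy usage: three SDC sequences observed over different windows
seqs = [([0.5, 1.2], 0.0, 2.0), ([2.3, 3.1], 2.0, 4.0), ([4.4], 4.0, 5.0)]
print(stitch_sequences(seqs))
```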

Dropout Inference in Bayesian Neural Networks with Alpha-divergences

Title Dropout Inference in Bayesian Neural Networks with Alpha-divergences
Authors Yingzhen Li, Yarin Gal
Abstract To obtain uncertainty estimates with real-world Bayesian deep learning models, practical inference approximations are needed. Dropout variational inference (VI), for example, has been used for machine vision and medical applications, but VI can severely underestimate model uncertainty. Alpha-divergences are alternative divergences to VI’s KL objective that are able to avoid VI’s uncertainty underestimation. But they are hard to use in practice: existing techniques can only use Gaussian approximating distributions and require existing models to be changed radically, and are thus of limited use to practitioners. We propose a re-parametrisation of the alpha-divergence objectives, deriving a simple inference technique which, together with dropout, can easily be implemented with existing models by simply changing the loss of the model. We demonstrate improved uncertainty estimates and accuracy compared to VI in dropout networks. We study our model’s epistemic uncertainty far away from the data using adversarial images, showing that these can be distinguished from non-adversarial images by examining our model’s uncertainty.
Tasks
Published 2017-03-08
URL http://arxiv.org/abs/1703.02914v1
PDF http://arxiv.org/pdf/1703.02914v1.pdf
PWC https://paperswithcode.com/paper/dropout-inference-in-bayesian-neural-networks
Repo https://github.com/janisgp/Sampling-free-Epistemic-Uncertainty
Framework tf
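
A minimal sketch of the re-parametrised Monte-Carlo objective, assuming `nll_samples` holds the per-datapoint negative log-likelihoods of K stochastic dropout passes; the constants and regularisation terms of the full derivation are omitted.

```python
import numpy as np
from scipy.special import logsumexp

def alpha_mc_objective(nll_samples, alpha=0.5):
    """Monte-Carlo alpha-divergence objective sketch: `nll_samples`
    has shape (K, N), the negative log-likelihoods of N data points
    under K dropout forward passes. As alpha -> 0 this recovers the
    usual VI-style average; larger alpha weighs well-fitting dropout
    masks more heavily, mitigating uncertainty underestimation."""
    K = nll_samples.shape[0]
    # -1/alpha * log( 1/K * sum_k exp(-alpha * nll_k) ), summed over data
    per_point = -(logsumexp(-alpha * nll_samples, axis=0) - np.log(K)) / alpha
    return per_point.sum()

# toy usage: K = 10 dropout passes over N = 4 points
rng = np.random.default_rng(0)
print(alpha_mc_objective(rng.gamma(2.0, 1.0, size=(10, 4)), alpha=0.5))
```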

Combining Strategic Learning and Tactical Search in Real-Time Strategy Games

Title Combining Strategic Learning and Tactical Search in Real-Time Strategy Games
Authors Nicolas A. Barriga, Marius Stanescu, Michael Buro
Abstract A commonly used technique for managing AI complexity in real-time strategy (RTS) games is to use action and/or state abstractions. High-level abstractions can often lead to good strategic decision making, but tactical decision quality may suffer due to lost details. A competing method is to sample the search space, which often leads to good tactical performance in simple scenarios but poor high-level planning. We propose to use a deep convolutional neural network (CNN) to select among a limited set of abstract action choices, and to utilize the remaining computation time for game tree search to improve low-level tactics. The CNN is trained by supervised learning on game states labelled by Puppet Search, a strategic search algorithm that uses action abstractions. The network is then used to select a script — an abstract action — to produce low-level actions for all units. Subsequently, the game tree search algorithm improves the tactical actions of a subset of units, using a limited view of the game state that considers only units close to opponent units. Experiments in the microRTS game show that the combined algorithm achieves higher win rates than either of its two independent components and other state-of-the-art microRTS agents. To the best of our knowledge, this is the first successful application of a convolutional network to play a full RTS game on standard game maps, as previous work has focused on sub-problems, such as combat, or on very small maps.
Tasks Decision Making, Real-Time Strategy Games
Published 2017-09-11
URL http://arxiv.org/abs/1709.03480v1
PDF http://arxiv.org/pdf/1709.03480v1.pdf
PWC https://paperswithcode.com/paper/combining-strategic-learning-and-tactical
Repo https://github.com/AmoyZhp/microRTS
Framework none
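
The control flow described in the abstract can be summarised in a few lines. Everything below (script names, the stubbed policy, the one-ply move evaluation) is a hypothetical stand-in for the trained CNN and the Puppet-Search-derived tactical search.

```python
import random

SCRIPTS = ["worker_rush", "light_rush", "ranged_rush", "expand"]

def select_script(policy_net, state):
    """Strategic layer: the (stubbed) network scores the abstract
    actions and the best-scoring script controls all units."""
    return max(zip(SCRIPTS, policy_net(state)), key=lambda p: p[1])[0]

def refine_frontline(units, enemy_units, evaluate, radius=3.0):
    """Tactical layer: a one-ply search over the moves of units close
    to an enemy; all other units keep the script's orders."""
    orders = {}
    for u in units:
        near = [e for e in enemy_units if abs(u["x"] - e["x"]) <= radius]
        if not near:
            continue  # the script keeps control of this unit
        orders[u["id"]] = max(["hold", "attack", "retreat"],
                              key=lambda m: evaluate(u, near, m))
    return orders

# toy usage with stubbed components
policy = lambda state: [random.random() for _ in SCRIPTS]
evalfn = lambda u, near, m: {"hold": 0.1, "attack": len(near), "retreat": -0.5}[m]
units, enemies = [{"id": 1, "x": 0.0}, {"id": 2, "x": 9.0}], [{"id": 9, "x": 1.5}]
print(select_script(policy, {}), refine_frontline(units, enemies, evalfn))
```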

Robust Keyframe-based Dense SLAM with an RGB-D Camera

Title Robust Keyframe-based Dense SLAM with an RGB-D Camera
Authors Haomin Liu, Chen Li, Guojun Chen, Guofeng Zhang, Michael Kaess, Hujun Bao
Abstract In this paper, we present RKD-SLAM, a robust keyframe-based dense SLAM approach for an RGB-D camera that can robustly handle fast motion and dense loop closure, and run without time limitation in a moderate-sized scene. It can not only be used to scan high-quality 3D models, but can also satisfy the demands of VR and AR applications. First, we combine color and depth information to construct a very fast keyframe-based tracking method on a CPU, which can work robustly in challenging cases (e.g., fast camera motion and complex loops). To reduce accumulation error, we also introduce a very efficient incremental bundle adjustment (BA) algorithm, which greatly saves unnecessary computation and performs local and global BA in a unified optimization framework. An efficient keyframe-based depth representation and fusion method is proposed to generate and promptly update the dense 3D surface, with online correction according to the refined camera poses of keyframes from BA. The experimental results and comparisons on a variety of challenging datasets and the TUM RGB-D benchmark demonstrate the effectiveness of the proposed system.
Tasks
Published 2017-11-14
URL http://arxiv.org/abs/1711.05166v1
PDF http://arxiv.org/pdf/1711.05166v1.pdf
PWC https://paperswithcode.com/paper/robust-keyframe-based-dense-slam-with-an-rgb
Repo https://github.com/wbnature/ICE-BA
Framework none
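
A keyframe-based tracker of this kind needs a rule for spawning keyframes. The sketch below shows one common heuristic (tracking quality plus camera motion); the thresholds are illustrative and are not taken from RKD-SLAM.

```python
def should_add_keyframe(inlier_ratio, rel_translation_m, rel_rotation_deg,
                        min_inliers=0.6, max_t=0.3, max_r=15.0):
    """Spawn a new keyframe when tracking against the current keyframe
    degrades, or when the camera has moved far enough from it that
    dense depth fusion would benefit from a fresh reference view."""
    return (inlier_ratio < min_inliers
            or rel_translation_m > max_t
            or rel_rotation_deg > max_r)

print(should_add_keyframe(0.8, 0.05, 2.0))  # healthy tracking -> False
print(should_add_keyframe(0.4, 0.05, 2.0))  # weak tracking    -> True
```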

BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth

Title BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth
Authors Mahdi Rad, Vincent Lepetit
Abstract We introduce a novel method for 3D object detection and pose estimation from color images only. We first use segmentation to detect the objects of interest in 2D, even in the presence of partial occlusion and cluttered backgrounds. In contrast with recent patch-based methods, we rely on a “holistic” approach: we apply to the detected objects a Convolutional Neural Network (CNN) trained to predict their 3D poses in the form of 2D projections of the corners of their 3D bounding boxes. This, however, is not sufficient for handling objects from the recent T-LESS dataset: these objects exhibit an axis of rotational symmetry, and the similarity of two images of such an object under two different poses makes training the CNN challenging. We solve this problem by restricting the range of poses used for training, and by introducing a classifier to identify the range of a pose at run-time before estimating it. We also use an optional additional step that refines the predicted poses. We improve the state of the art on the LINEMOD dataset from 73.7% to 89.3% of correctly registered RGB frames. We are also the first to report results on the Occlusion dataset using color images only. We obtain 54% of frames passing the Pose 6D criterion on average on several sequences of the T-LESS dataset, compared to 67% for the state of the art on the same sequences, which uses both color and depth. The full approach is also scalable, as a single network can be trained for multiple objects simultaneously.
Tasks 3D Object Detection, 6D Pose Estimation using RGB, Object Detection, Pose Estimation
Published 2017-03-31
URL http://arxiv.org/abs/1703.10896v2
PDF http://arxiv.org/pdf/1703.10896v2.pdf
PWC https://paperswithcode.com/paper/bb8-a-scalable-accurate-robust-to-partial
Repo https://github.com/Microsoft/singleshotpose
Framework pytorch
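
Once the CNN has predicted the 2D projections of the eight bounding-box corners, the 6D pose follows from a standard PnP solve. A sketch with OpenCV, where the corner ordering, box size, and camera intrinsics are all assumptions:

```python
import numpy as np
import cv2

def pose_from_predicted_corners(corners_2d, box_size, K):
    """BB8-style pose recovery sketch: given the CNN-predicted 2D
    projections of the 8 corners of the object's 3D bounding box,
    solve a PnP problem for rotation and translation. The 2D points
    must be ordered consistently with the 3D corners below."""
    w, h, d = box_size
    # the 8 corners of the axis-aligned 3D bounding box (object frame)
    corners_3d = np.array([[x, y, z]
                           for x in (-w / 2, w / 2)
                           for y in (-h / 2, h / 2)
                           for z in (-d / 2, d / 2)], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(corners_3d, corners_2d.astype(np.float64),
                                  K, distCoeffs=None)
    return ok, rvec, tvec

# toy usage with a synthetic camera and fabricated corner predictions
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
pred = np.array([[300, 200], [310, 260], [360, 195], [370, 255],
                 [305, 205], [315, 265], [365, 200], [375, 260]], float)
print(pose_from_predicted_corners(pred, (0.1, 0.1, 0.1), K))
```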

Hybrid Reward Architecture for Reinforcement Learning

Title Hybrid Reward Architecture for Reinforcement Learning
Authors Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche, Tavian Barnes, Jeffrey Tsang
Abstract One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable. This paper contributes towards tackling such challenging domains by proposing a new method, called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function. Because each component typically only depends on a subset of all features, the corresponding value function can be approximated more easily by a low-dimensional representation, enabling more effective learning. We demonstrate HRA on a toy problem and the Atari game Ms. Pac-Man, where HRA achieves above-human performance.
Tasks Representation Learning
Published 2017-06-13
URL http://arxiv.org/abs/1706.04208v2
PDF http://arxiv.org/pdf/1706.04208v2.pdf
PWC https://paperswithcode.com/paper/hybrid-reward-architecture-for-reinforcement
Repo https://github.com/Maluuba/hra
Framework tf
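
A minimal tabular rendition of the HRA idea: one value head per reward component, each trained on its own component reward, with the behaviour policy acting greedily on the sum of heads. The paper uses a shared deep network with multiple heads; tabular Q-learning is used here only for brevity.

```python
import numpy as np

class HybridQ:
    """HRA sketch: a separate Q-table per reward component; behaviour
    acts greedily on the *sum* of the heads' values."""

    def __init__(self, n_heads, n_states, n_actions, lr=0.1, gamma=0.99):
        self.q = np.zeros((n_heads, n_states, n_actions))
        self.lr, self.gamma = lr, gamma

    def act(self, s):
        # aggregate the heads, then act greedily on the combined value
        return int(self.q[:, s, :].sum(axis=0).argmax())

    def update(self, s, a, rewards, s_next):
        """`rewards` is the decomposed reward vector, one entry per head."""
        for k, r_k in enumerate(rewards):
            target = r_k + self.gamma * self.q[k, s_next].max()
            self.q[k, s, a] += self.lr * (target - self.q[k, s, a])

agent = HybridQ(n_heads=3, n_states=5, n_actions=2)
agent.update(0, 1, rewards=[1.0, 0.0, -0.2], s_next=1)
print(agent.act(0))
```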

Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression

Title Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression
Authors Aaron S. Jackson, Adrian Bulat, Vasileios Argyriou, Georgios Tzimiropoulos
Abstract 3D face reconstruction is a fundamental Computer Vision problem of extraordinary difficulty. Current systems often assume the availability of multiple facial images (sometimes from the same subject) as input, and must address a number of methodological challenges such as establishing dense correspondences across large facial poses, expressions, and non-uniform illumination. In general these methods require complex and inefficient pipelines for model building and fitting. In this work, we propose to address many of these limitations by training a Convolutional Neural Network (CNN) on an appropriate dataset consisting of 2D images and 3D facial models or scans. Our CNN works with just a single 2D facial image, does not require accurate alignment, does not need to establish dense correspondence between images, works for arbitrary facial poses and expressions, and can be used to reconstruct the whole 3D facial geometry (including the non-visible parts of the face), bypassing the construction (during training) and fitting (during testing) of a 3D Morphable Model. We achieve this via a simple CNN architecture that performs direct regression of a volumetric representation of the 3D facial geometry from a single 2D image. We also demonstrate how the related task of facial landmark localization can be incorporated into the proposed framework and help improve reconstruction quality, especially for the cases of large poses and facial expressions. Testing code will be made available online, along with pre-trained models: http://aaronsplace.co.uk/papers/jackson2017recon
Tasks 3D Face Reconstruction, Face Alignment, Face Reconstruction
Published 2017-03-22
URL http://arxiv.org/abs/1703.07834v2
PDF http://arxiv.org/pdf/1703.07834v2.pdf
PWC https://paperswithcode.com/paper/large-pose-3d-face-reconstruction-from-a
Repo https://github.com/AaronJackson/vrn
Framework torch
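
The core training signal is easy to state: the network emits one logit per voxel and is trained with voxel-wise binary cross-entropy against the voxelised ground-truth geometry. A minimal PyTorch sketch with illustrative volume sizes:

```python
import torch
import torch.nn.functional as F

def volumetric_loss(pred_logits, target_volume):
    """Direct volumetric regression sketch: the network outputs one
    logit per voxel of a (D, H, W) grid; training is voxel-wise binary
    cross-entropy against the 0/1 voxelised face geometry."""
    return F.binary_cross_entropy_with_logits(pred_logits, target_volume)

# toy usage: batch of 2 volumes, 32^3 voxels
pred = torch.randn(2, 32, 32, 32)
target = (torch.rand(2, 32, 32, 32) > 0.5).float()
print(volumetric_loss(pred, target).item())
```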

A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification

Title A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification
Authors Shuming Ma, Xu Sun
Abstract Text summarization and text simplification are two major ways to simplify a text for poor readers, including children, non-native speakers, and the functionally illiterate. Text summarization produces a brief summary of the main ideas of the text, while text simplification aims to reduce its linguistic complexity while retaining the original meaning. Recently, most approaches to text summarization and text simplification have been based on the sequence-to-sequence model, which has achieved much success in many text generation tasks. However, although the generated simplified texts are literally similar to the source texts, they often have low semantic relevance. In this work, our goal is to improve the semantic relevance between source texts and simplified texts for text summarization and text simplification. We introduce a Semantic Relevance Based neural model to encourage high semantic similarity between texts and summaries. In our model, the source text is represented by a gated attention encoder, while the summary representation is produced by a decoder. In addition, the similarity score between the representations is maximized during training. Our experiments show that the proposed model outperforms state-of-the-art systems on two benchmark corpora.
Tasks Semantic Similarity, Semantic Textual Similarity, Text Generation, Text Simplification, Text Summarization
Published 2017-10-06
URL http://arxiv.org/abs/1710.02318v1
PDF http://arxiv.org/pdf/1710.02318v1.pdf
PWC https://paperswithcode.com/paper/a-semantic-relevance-based-neural-network-for
Repo https://github.com/shumingma/SRB
Framework tf
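
A sketch of the training objective, assuming the usual token-level NLL plus a cosine-similarity term between the encoder's source representation and the decoder's summary representation; the weight `lam` and the choice of representations are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def srb_loss(logits, targets, src_repr, sum_repr, lam=0.5, pad_id=0):
    """Semantic-relevance training sketch: standard seq2seq negative
    log-likelihood minus a reward for cosine similarity between the
    source and summary representations."""
    nll = F.cross_entropy(logits.flatten(0, 1), targets.flatten(),
                          ignore_index=pad_id)
    relevance = F.cosine_similarity(src_repr, sum_repr, dim=-1).mean()
    return nll - lam * relevance

# toy usage: batch 2, target length 5, vocab 100, hidden size 16
logits = torch.randn(2, 5, 100)
targets = torch.randint(1, 100, (2, 5))
src, summ = torch.randn(2, 16), torch.randn(2, 16)
print(srb_loss(logits, targets, src, summ).item())
```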

Quantifying Facial Age by Posterior of Age Comparisons

Title Quantifying Facial Age by Posterior of Age Comparisons
Authors Yunxuan Zhang, Li Liu, Cheng Li, Chen change Loy
Abstract We introduce a novel approach for annotating a large quantity of in-the-wild facial images with high-quality posterior age distributions as labels. Each posterior provides a probability distribution of estimated ages for a face. Our approach is motivated by the observation that it is easier to distinguish who is the older of two people than to determine a person’s actual age. Given a reference database with samples of known ages and a dataset to label, we can transfer reliable annotations from the former to the latter via human-in-the-loop comparisons. We show an effective way to transform such comparisons into posteriors via fully-connected and SoftMax layers, so as to permit end-to-end training in a deep network. Thanks to this efficient and effective annotation approach, we collect a new large-scale facial age dataset, dubbed ‘MegaAge’, which consists of 41,941 images. Data can be downloaded from our project page mmlab.ie.cuhk.edu.hk/projects/MegaAge and github.com/zyx2012/Age_estimation_BMVC2017. With the dataset, we train a network that jointly performs ordinal hyperplane classification and posterior distribution learning. Our approach achieves state-of-the-art results on popular benchmarks such as MORPH2, Adience, and the newly proposed MegaAge.
Tasks
Published 2017-08-31
URL http://arxiv.org/abs/1708.09687v2
PDF http://arxiv.org/pdf/1708.09687v2.pdf
PWC https://paperswithcode.com/paper/quantifying-facial-age-by-posterior-of-age
Repo https://github.com/zyx2012/Age_estimation_BMVC2017
Framework none
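
The comparison-to-posterior step has a simple probabilistic reading: each "older than reference r" outcome reweights a distribution over age bins. The sigmoid likelihood and temperature below are illustrative only; the paper learns this mapping end-to-end with fully-connected and SoftMax layers.

```python
import numpy as np

def age_posterior(comparisons, ages=np.arange(0, 81), tau=3.0):
    """Posterior-from-comparisons sketch: each comparison is
    (ref_age, outcome), outcome 1 if the query face was judged older
    than the reference. Model P(older | age a, ref r) with a sigmoid
    in (a - r) / tau, multiply likelihoods over a uniform prior, and
    normalise to get a posterior age distribution."""
    log_post = np.zeros_like(ages, dtype=float)
    for r, outcome in comparisons:
        p_older = 1.0 / (1.0 + np.exp(-(ages - r) / tau))
        log_post += np.log(p_older if outcome else 1.0 - p_older)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

post = age_posterior([(25, 1), (40, 0), (30, 1)])
print(post.argmax())  # posterior mode lands between 30 and 40
```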

Automatic Brain Tumor Segmentation using Cascaded Anisotropic Convolutional Neural Networks

Title Automatic Brain Tumor Segmentation using Cascaded Anisotropic Convolutional Neural Networks
Authors Guotai Wang, Wenqi Li, Sebastien Ourselin, Tom Vercauteren
Abstract A cascade of fully convolutional neural networks is proposed to segment multi-modal Magnetic Resonance (MR) images with brain tumor into background and three hierarchical regions: whole tumor, tumor core, and enhancing tumor core. The cascade is designed to decompose the multi-class segmentation problem into a sequence of three binary segmentation problems according to the subregion hierarchy. The whole tumor is segmented in the first step, and the bounding box of the result is used for the tumor core segmentation in the second step. The enhancing tumor core is then segmented based on the bounding box of the tumor core segmentation result. Our networks consist of multiple layers of anisotropic and dilated convolution filters, and they are combined with multi-view fusion to reduce false positives. Residual connections and multi-scale predictions are employed in these networks to boost the segmentation performance. Experiments with the BraTS 2017 validation set show that the proposed method achieved average Dice scores of 0.7859, 0.9050, and 0.8378 for enhancing tumor core, whole tumor, and tumor core, respectively. The corresponding values for the BraTS 2017 testing set were 0.7831, 0.8739, and 0.7748.
Tasks Brain Tumor Segmentation, Medical Image Segmentation
Published 2017-09-01
URL http://arxiv.org/abs/1709.00382v2
PDF http://arxiv.org/pdf/1709.00382v2.pdf
PWC https://paperswithcode.com/paper/automatic-brain-tumor-segmentation-using-1
Repo https://github.com/taigw/brats17
Framework tf
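
The cascade itself is a small amount of glue code around the three binary networks: segment, crop to the bounding box, and repeat. A NumPy sketch, with simple threshold functions standing in for the trained networks:

```python
import numpy as np

def bbox(mask, margin=2):
    """Axis-aligned bounding box of a binary mask, with a small margin."""
    idx = np.argwhere(mask)
    lo = np.maximum(idx.min(axis=0) - margin, 0)
    hi = np.minimum(idx.max(axis=0) + margin + 1, mask.shape)
    return tuple(slice(l, h) for l, h in zip(lo, hi))

def cascade_segment(volume, net_whole, net_core, net_enhancing):
    """Cascade sketch: segment the whole tumor, crop to its bounding
    box for the tumor core, then crop again for the enhancing core.
    The `net_*` callables stand in for the three trained networks."""
    whole = net_whole(volume) > 0.5
    labels = whole.astype(np.uint8)            # 1 = whole tumor
    if whole.any():
        box_w = bbox(whole)
        core = net_core(volume[box_w]) > 0.5
        labels[box_w][core] = 2                # 2 = tumor core
        if core.any():
            box_c = bbox(core)
            enh = net_enhancing(volume[box_w][box_c]) > 0.5
            labels[box_w][box_c][enh] = 3      # 3 = enhancing core
    return labels

# toy usage: nested regions of increasing intensity, thresholded "nets"
vol = np.zeros((16, 16, 16))
vol[2:14, 2:14, 2:14] = 0.55
vol[5:11, 5:11, 5:11] = 0.7
vol[7:9, 7:9, 7:9] = 0.9
seg = cascade_segment(vol, lambda v: v, lambda v: v - 0.15, lambda v: v - 0.35)
print(np.unique(seg))  # -> [0 1 2 3]
```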

Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization

Title Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization
Authors Fabian Pedregosa, Rémi Leblond, Simon Lacoste-Julien
Abstract Due to their simplicity and excellent performance, parallel asynchronous variants of stochastic gradient descent have become popular methods to solve a wide range of large-scale optimization problems on multi-core architectures. Yet, despite their practical success, support for nonsmooth objectives is still lacking, making them unsuitable for many problems of interest in machine learning, such as the Lasso, group Lasso or empirical risk minimization with convex constraints. In this work, we propose and analyze ProxASAGA, a fully asynchronous sparse method inspired by SAGA, a variance reduced incremental gradient algorithm. The proposed method is easy to implement and significantly outperforms the state of the art on several nonsmooth, large-scale problems. We prove that our method achieves a theoretical linear speedup with respect to the sequential version under assumptions on the sparsity of gradients and block-separability of the proximal term. Empirical benchmarks on a multi-core architecture illustrate practical speedups of up to 12x on a 20-core machine.
Tasks
Published 2017-07-20
URL http://arxiv.org/abs/1707.06468v3
PDF http://arxiv.org/pdf/1707.06468v3.pdf
PWC https://paperswithcode.com/paper/breaking-the-nonsmooth-barrier-a-scalable
Repo https://github.com/fabianp/ProxASAGA
Framework none
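
The building block behind ProxASAGA is the proximal SAGA step: a variance-reduced stochastic gradient followed by the proximal operator of the nonsmooth term. A sequential sketch for the Lasso (the asynchronous, sparse, multi-core machinery of the paper is omitted):

```python
import numpy as np

def prox_saga_lasso(A, b, lam, gamma, n_epochs=50, seed=0):
    """Sequential Prox-SAGA sketch for the Lasso
    min_x 1/(2n) ||Ax - b||^2 + lam * ||x||_1."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    grads = A * (A @ x - b)[:, None]          # memory of per-sample gradients
    g_avg = grads.mean(axis=0)
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    for _ in range(n_epochs * n):
        i = rng.integers(n)
        g_i = A[i] * (A[i] @ x - b[i])        # fresh gradient of sample i
        v = g_i - grads[i] + g_avg            # SAGA variance-reduced direction
        x = soft(x - gamma * v, gamma * lam)  # proximal (soft-threshold) step
        g_avg += (g_i - grads[i]) / n         # maintain running average
        grads[i] = g_i
    return x

# toy usage: sparse ground truth recovered from noisy measurements
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 10))
x_true = np.zeros(10); x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true + 0.01 * rng.normal(size=100)
print(np.round(prox_saga_lasso(A, b, lam=0.01, gamma=0.05), 2))
```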

Unsupervised Diverse Colorization via Generative Adversarial Networks

Title Unsupervised Diverse Colorization via Generative Adversarial Networks
Authors Yun Cao, Zhiming Zhou, Weinan Zhang, Yong Yu
Abstract Colorization of grayscale images has been a hot topic in computer vision. Previous research mainly focuses on producing a colored image to match the original one. However, since many colors share the same gray value, an input grayscale image could be diversely colored while remaining realistic. In this paper, we design a novel solution for unsupervised diverse colorization. Specifically, we leverage conditional generative adversarial networks to model the distribution of real-world item colors, in which we develop a fully convolutional generator with multi-layer noise to enhance diversity, with multi-layer condition concatenation to maintain realism, and with stride 1 to keep spatial information. With such a novel network architecture, the model yields highly competitive performance on the open LSUN bedroom dataset. A Turing test with 80 human participants further indicates that our generated color schemes are highly convincing.
Tasks Colorization
Published 2017-02-22
URL http://arxiv.org/abs/1702.06674v2
PDF http://arxiv.org/pdf/1702.06674v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-diverse-colorization-via
Repo https://github.com/ccyyatnet/COLORGAN
Framework tf
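
The architectural recipe (stride-1 convolutions, the grayscale condition concatenated at every layer, fresh noise injected at every layer) can be sketched compactly in PyTorch; channel widths and depth here are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class DiverseColorGenerator(nn.Module):
    """Sketch of a fully convolutional, stride-1 colorization generator:
    the grayscale condition and a fresh noise channel are concatenated
    at every layer to maintain realism and encourage diversity."""

    def __init__(self, hidden=32, layers=4):
        super().__init__()
        self.hidden = hidden
        self.convs = nn.ModuleList(
            nn.Conv2d(1 + 1 + hidden,                   # gray + noise + features
                      2 if i == layers - 1 else hidden, # final layer: ab channels
                      kernel_size=3, stride=1, padding=1)
            for i in range(layers))

    def forward(self, gray):
        b, _, h, w = gray.shape
        feat = gray.new_zeros(b, self.hidden, h, w)
        for i, conv in enumerate(self.convs):
            noise = torch.randn(b, 1, h, w, device=gray.device)
            feat = conv(torch.cat([gray, noise, feat], dim=1))
            if i < len(self.convs) - 1:
                feat = torch.relu(feat)
        return torch.tanh(feat)  # chrominance prediction in [-1, 1]

g = DiverseColorGenerator()
print(g(torch.rand(2, 1, 32, 32)).shape)  # torch.Size([2, 2, 32, 32])
```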

Age Progression/Regression by Conditional Adversarial Autoencoder

Title Age Progression/Regression by Conditional Adversarial Autoencoder
Authors Zhifei Zhang, Yang Song, Hairong Qi
Abstract “If I provide you a face image of mine (without telling you the actual age when I took the picture) and a large amount of face images that I crawled (containing labeled faces of different ages but not necessarily paired), can you show me what I would look like when I am 80 or what I was like when I was 5?” The answer is probably a “No.” Most existing face aging works attempt to learn the transformation between age groups and thus would require the paired samples as well as the labeled query image. In this paper, we look at the problem from a generative modeling perspective such that no paired samples are required. In addition, given an unlabeled image, the generative model can directly produce the image with the desired age attribute. We propose a conditional adversarial autoencoder (CAAE) that learns a face manifold, traversing on which smooth age progression and regression can be realized simultaneously. In CAAE, the face is first mapped to a latent vector through a convolutional encoder, and then the vector is projected to the face manifold conditional on age through a deconvolutional generator. The latent vector preserves personalized face features (i.e., personality) and the age condition controls progression vs. regression. Two adversarial networks are imposed on the encoder and generator, respectively, forcing the model to generate more photo-realistic faces. Experimental results demonstrate the appealing performance and flexibility of the proposed framework by comparing with the state-of-the-art and ground truth.
Tasks
Published 2017-02-27
URL http://arxiv.org/abs/1702.08423v2
PDF http://arxiv.org/pdf/1702.08423v2.pdf
PWC https://paperswithcode.com/paper/age-progressionregression-by-conditional
Repo https://github.com/mattans/AgeProgression
Framework pytorch
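
The conditioning mechanism is the heart of CAAE: encode the face to a latent "personality" vector, concatenate a one-hot age label, and decode. A deliberately small PyTorch sketch, with linear layers replacing the paper's convolutional encoder and deconvolutional generator, and with both discriminators omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CAAESketch(nn.Module):
    """CAAE conditioning sketch: the encoder maps a face to a latent
    vector z; the generator decodes [z ; one-hot(age)] back into an
    image, so identity stays in z while age is externally controlled."""

    def __init__(self, z_dim=50, n_ages=10, img=64):
        super().__init__()
        self.n_ages = n_ages
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * img * img, z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim + n_ages, 3 * img * img),
                                 nn.Tanh(), nn.Unflatten(1, (3, img, img)))

    def forward(self, x, age_bin):
        z = self.enc(x)                                # personality vector
        age = F.one_hot(age_bin, self.n_ages).float()  # target age condition
        return self.dec(torch.cat([z, age], dim=1))    # face at the desired age

m = CAAESketch()
out = m(torch.rand(2, 3, 64, 64), torch.tensor([1, 8]))  # young vs. old bins
print(out.shape)  # torch.Size([2, 3, 64, 64])
```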

It Takes (Only) Two: Adversarial Generator-Encoder Networks

Title It Takes (Only) Two: Adversarial Generator-Encoder Networks
Authors Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky
Abstract We present a new autoencoder-type architecture that is trainable in an unsupervised mode, sustains both generation and inference, and has the quality of conditional and unconditional samples boosted by adversarial learning. Unlike previous hybrids of autoencoders and adversarial networks, the adversarial game in our approach is set up directly between the encoder and the generator, and no external mappings are trained in the process of learning. The game objective compares the divergences of each of the real and the generated data distributions with the prior distribution in the latent space. We show that this direct generator-vs-encoder game leads to a tight coupling of the two components, resulting in samples and reconstructions of quality comparable to some recently proposed, more complex architectures.
Tasks
Published 2017-04-07
URL http://arxiv.org/abs/1704.02304v3
PDF http://arxiv.org/pdf/1704.02304v3.pdf
PWC https://paperswithcode.com/paper/it-takes-only-two-adversarial-generator
Repo https://github.com/DmitryUlyanov/AGE
Framework pytorch
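
The game objective needs a divergence between a batch of latent codes and the prior. A common moment-based estimate of the per-dimension KL to a unit Gaussian looks as follows; this is a sketch of the idea, not necessarily the exact estimator used in the paper.

```python
import torch

def kl_to_unit_gaussian(z):
    """AGE-style divergence sketch: summarise the batch's empirical
    latent distribution by its per-dimension mean and variance, then
    evaluate KL(N(mu, var) || N(0, 1)) dimension-wise. The encoder
    drives this down on real data and up on generated data; the
    generator plays the opposite direction."""
    mu = z.mean(dim=0)
    var = z.var(dim=0, unbiased=False) + 1e-8
    return 0.5 * (mu ** 2 + var - var.log() - 1.0).sum()

z_real = torch.randn(128, 16)              # roughly matches the prior
z_fake = 2.0 * torch.randn(128, 16) + 1.0  # shifted and scaled: larger KL
print(kl_to_unit_gaussian(z_real).item(), kl_to_unit_gaussian(z_fake).item())
```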

What Looks Good with my Sofa: Multimodal Search Engine for Interior Design

Title What Looks Good with my Sofa: Multimodal Search Engine for Interior Design
Authors Ivona Tautkute, Aleksandra Możejko, Wojciech Stokowiec, Tomasz Trzciński, Łukasz Brocki, Krzysztof Marasek
Abstract In this paper, we propose a multi-modal search engine for interior design that combines visual and textual queries. The goal of our engine is to retrieve interior objects, e.g. furniture or wall clocks, that share visual and aesthetic similarities with the query. Our search engine allows the user to take a photo of a room and retrieve, with high recall, a list of items identical or visually similar to those present in the photo. Additionally, it can return other items that aesthetically and stylistically fit well together. To achieve this goal, our system blends the results obtained using the textual and visual modalities. Thanks to this blending strategy, we increase the average style similarity score of the retrieved items by 11%. Our work is implemented as a web-based application, and we plan to open it to the public.
Tasks
Published 2017-07-21
URL http://arxiv.org/abs/1707.06907v2
PDF http://arxiv.org/pdf/1707.06907v2.pdf
PWC https://paperswithcode.com/paper/what-looks-good-with-my-sofa-multimodal
Repo https://github.com/peter0083/DeepDeco
Framework tf
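
The blending strategy admits a compact late-fusion sketch: normalise the per-modality scores and combine them with a convex weight before ranking. The weight and the min-max normalisation are assumptions, not the system's tuned configuration.

```python
import numpy as np

def blend_rankings(visual_scores, text_scores, w_visual=0.6):
    """Late-fusion sketch: items are scored independently by the visual
    and textual retrieval branches, min-max normalised to [0, 1], and
    blended with a convex weight; returns indices, best match first."""
    norm = lambda s: (s - s.min()) / (s.max() - s.min() + 1e-8)
    blended = w_visual * norm(visual_scores) + (1 - w_visual) * norm(text_scores)
    return np.argsort(-blended)

vis = np.array([0.9, 0.2, 0.7, 0.4])  # e.g. CNN feature similarity
txt = np.array([0.1, 0.8, 0.6, 0.3])  # e.g. text embedding similarity
print(blend_rankings(vis, txt))       # -> [2 0 1 3]
```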