October 19, 2019


Paper Group ANR 210


Variational Saccading: Efficient Inference for Large Resolution Images


Title	Variational Saccading: Efficient Inference for Large Resolution Images
Authors	Jason Ramapuram, Maurits Diephuis, Frantzeska Lavda, Russ Webb, Alexandros Kalousis
Abstract	Image classification with deep neural networks is typically restricted to images of small dimensionality such as 224 x 224 in ResNet models [24]. This limitation excludes the 4000 x 3000 dimensional images that are taken by modern smartphone cameras and smart devices. In this work, we aim to mitigate the prohibitive inferential and memory costs of operating in such large dimensional spaces. To sample from the high-resolution original input distribution, we propose using a smaller proxy distribution to learn the co-ordinates that correspond to regions of interest in the high-dimensional space. We introduce a new principled variational lower bound that captures the relationship of the proxy distribution’s posterior and the original image’s co-ordinate space in a way that maximizes the conditional classification likelihood. We empirically demonstrate on one synthetic benchmark and one real-world large-resolution DSLR camera image dataset that our method produces comparable results with ~10x faster inference and lower memory consumption than a model that utilizes the entire original input distribution. Finally, we experiment with a more complex setting using mini-maps from Starcraft II [56] to infer the number of characters in a complex 3D-rendered scene. Even in such complicated scenes our model provides strong localization, a feature missing from traditional classification models.
Tasks	Image Classification, Starcraft, Starcraft II
Published	2018-12-08
URL	https://arxiv.org/abs/1812.03170v3
PDF	https://arxiv.org/pdf/1812.03170v3.pdf
PWC	https://paperswithcode.com/paper/variational-saccading-efficient-inference-for
Repo
Framework
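The abstract describes choosing regions of interest on a small proxy image and then reading only those regions from the full-resolution input. The Python sketch below illustrates only that coordinate plumbing: average-pool the image into a proxy, pick a location on the proxy, map it through normalized coordinates in [-1, 1] back to full-resolution pixels, and crop a fixed-size glimpse. It is an illustration under stated assumptions, not the paper's model: taking the brightest proxy cell stands in for sampling from the learned variational posterior, and all function names are hypothetical.

```python
# Hypothetical sketch: a low-resolution proxy decides where to look, and only a
# small window of the full-resolution image is cropped for classification.

def downsample(img, f):
    """Average-pool a 2D list `img` by an integer factor `f` to build the proxy."""
    h, w = len(img) // f, len(img[0]) // f
    return [[sum(img[r * f + i][c * f + j] for i in range(f) for j in range(f)) / (f * f)
             for c in range(w)]
            for r in range(h)]


def proxy_to_normalized(idx, proxy_size):
    """Map a proxy pixel index to a coordinate in [-1, 1]."""
    return 2.0 * idx / (proxy_size - 1) - 1.0


def normalized_to_pixel(u, full_size):
    """Map a coordinate in [-1, 1] to a full-resolution pixel index."""
    return round((u + 1.0) / 2.0 * (full_size - 1))


def crop(img, cy, cx, h, w):
    """Crop an h x w window centred near (cy, cx), clamped to stay inside the image."""
    top = min(max(cy - h // 2, 0), len(img) - h)
    left = min(max(cx - w // 2, 0), len(img[0]) - w)
    return [row[left:left + w] for row in img[top:top + h]]


def saccade(full, factor, glimpse):
    """Pick a location on the proxy and return a glimpse x glimpse full-resolution crop."""
    proxy = downsample(full, factor)
    # Stand-in for sampling from a learned posterior: take the brightest proxy cell.
    _, r, c = max((v, r, c) for r, row in enumerate(proxy) for c, v in enumerate(row))
    cy = normalized_to_pixel(proxy_to_normalized(r, len(proxy)), len(full))
    cx = normalized_to_pixel(proxy_to_normalized(c, len(proxy[0])), len(full[0]))
    return crop(full, cy, cx, glimpse, glimpse)


full = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8 x 8 "image"
print(saccade(full, factor=2, glimpse=4))
```

Only the glimpse (here 4 x 4 out of 8 x 8) would reach the classifier, which is where savings in memory and inference time would come from.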

Cycle-of-Learning for Autonomous Systems from Human Interaction


Title	Cycle-of-Learning for Autonomous Systems from Human Interaction
Authors	Nicholas R. Waytowich, Vinicius G. Goecks, Vernon J. Lawhern
Abstract	We discuss different types of human-robot interaction paradigms in the context of training end-to-end reinforcement learning algorithms. We provide a taxonomy to categorize the types of human interaction and present our Cycle-of-Learning framework for autonomous systems that combines different human-interaction modalities with reinforcement learning. Two key concepts provided by our Cycle-of-Learning framework are how it handles the integration of the different human-interaction modalities (demonstration, intervention, and evaluation) and how to define the switching criteria between them.
Tasks
Published	2018-08-28
URL	http://arxiv.org/abs/1808.09572v2
PDF	http://arxiv.org/pdf/1808.09572v2.pdf
PWC	https://paperswithcode.com/paper/cycle-of-learning-for-autonomous-systems-from
Repo
Framework

Context-Aware Neural Machine Translation Learns Anaphora Resolution


Title	Context-Aware Neural Machine Translation Learns Anaphora Resolution
Authors	Elena Voita, Pavel Serdyukov, Rico Sennrich, Ivan Titov
Abstract	Standard machine translation systems process sentences in isolation and hence ignore extra-sentential information, even though extended context can both prevent mistakes in ambiguous cases and improve translation coherence. We introduce a context-aware neural machine translation model designed in such a way that the flow of information from the extended context to the translation model can be controlled and analyzed. We experiment with an English-Russian subtitles dataset and observe that much of what is captured by our model deals with improving pronoun translation. We measure correspondences between induced attention distributions and coreference relations and observe that the model implicitly captures anaphora. This is consistent with gains for sentences where pronouns need to be gendered in translation. Besides improvements in anaphoric cases, the model also improves overall BLEU, both over its context-agnostic version (+0.7) and over simple concatenation of the context and source sentences (+0.6).
Tasks	Machine Translation
Published	2018-05-25
URL	http://arxiv.org/abs/1805.10163v1
PDF	http://arxiv.org/pdf/1805.10163v1.pdf
PWC	https://paperswithcode.com/paper/context-aware-neural-machine-translation
Repo
Framework
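One common way to make the flow of context information controllable and inspectable is a learned sigmoid gate that mixes a source-sentence representation with a context representation. The sketch below assumes that design, with the gate computed from the concatenated vectors. The abstract does not spell out the exact layer, so the function and parameter names here are hypothetical.

```python
import math

# Hypothetical sketch of a gated fusion of source and context representations.
# The gate design is an assumption; the abstract does not specify the layer.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m, v):
    """Dense matrix-vector product for plain Python lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gated_fusion(src, ctx, W, b):
    """gate = sigmoid(W [src; ctx] + b); out = gate * src + (1 - gate) * ctx, elementwise."""
    gate = [sigmoid(z + bk) for z, bk in zip(matvec(W, src + ctx), b)]
    out = [g * s + (1.0 - g) * c for g, s, c in zip(gate, src, ctx)]
    return out, gate


src, ctx = [1.0, 2.0], [3.0, -2.0]
W = [[0.0] * 4 for _ in range(2)]  # d x 2d weights (zeros here, so the bias decides)
out, gate = gated_fusion(src, ctx, W, [0.0, 0.0])
print(out, gate)  # [2.0, 0.0] [0.5, 0.5]
```

Reading off the gate values per position is one way to analyze how much such a model relies on context: values near 1 ignore the context, values near 0 defer to it.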

An Optimal Algorithm for Online Unconstrained Submodular Maximization


Title	An Optimal Algorithm for Online Unconstrained Submodular Maximization
Authors	Tim Roughgarden, Joshua R. Wang
Abstract	We consider a basic problem at the interface of two fundamental fields: submodular optimization and online learning. In the online unconstrained submodular maximization (online USM) problem, there is a universe $[n]=\{1,2,\ldots,n\}$, and a sequence of $T$ nonnegative (not necessarily monotone) submodular functions arrives over time. The goal is to design a computationally efficient online algorithm, which chooses a subset of $[n]$ at each time step as a function only of the past, such that the accumulated value of the chosen subsets is as close as possible to the maximum total value of a fixed subset in hindsight. Our main result is a polynomial-time no-$1/2$-regret algorithm for this problem, meaning that for every sequence of nonnegative submodular functions, the algorithm’s expected total value is at least $1/2$ times that of the best subset in hindsight, up to an error term sublinear in $T$. The factor of $1/2$ cannot be improved upon by any polynomial-time online algorithm when the submodular functions are presented as value oracles. Previous work on the offline problem implies that picking a subset uniformly at random in each time step achieves zero $1/4$-regret. A byproduct of our techniques is an explicit subroutine for the two-experts problem that has an unusually strong regret guarantee: the total value of its choices is comparable to twice the total value of either expert on rounds it did not pick that expert. This subroutine may be of independent interest.
Tasks
Published	2018-06-08
URL	http://arxiv.org/abs/1806.03349v1
PDF	http://arxiv.org/pdf/1806.03349v1.pdf
PWC	https://paperswithcode.com/paper/an-optimal-algorithm-for-online-unconstrained
Repo
Framework
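The abstract cites the offline fact that a uniformly random subset achieves a 1/4 approximation for nonnegative submodular functions. The brute-force sketch below is a hypothetical illustration, not the paper's algorithm. It checks the diminishing-returns definition of submodularity on a small graph cut function (nonnegative and non-monotone) and computes the exact ratio of a random subset's expected value to the best subset's value.

```python
from itertools import combinations

# Hypothetical brute-force illustration (exponential in n; tiny examples only).

def all_subsets(n):
    """All subsets of {0, ..., n - 1} as frozensets."""
    return [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]


def cut_value(edges):
    """Graph cut function: nonnegative, submodular and not monotone."""
    def f(s):
        return sum(1 for u, v in edges if (u in s) != (v in s))
    return f


def is_submodular(f, n):
    """Diminishing returns: f(A+x) - f(A) >= f(B+x) - f(B) for all A ⊆ B, x not in B."""
    subsets = all_subsets(n)
    for a in subsets:
        for b in subsets:
            if not a <= b:
                continue
            for x in range(n):
                if x not in b and f(a | {x}) - f(a) < f(b | {x}) - f(b):
                    return False
    return True


def random_subset_ratio(f, n):
    """Exact E[f(R)] for R uniform over all subsets, divided by max_S f(S)."""
    values = [f(s) for s in all_subsets(n)]
    return (sum(values) / len(values)) / max(values)


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-cycle
f = cut_value(edges)
print(is_submodular(f, 4), random_subset_ratio(f, 4))  # True 0.5
```

For the 4-cycle the ratio is 0.5, above the worst-case 1/4 bound; the 1/4 guarantee only says that no nonnegative submodular function can push the ratio lower.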

3D Human Pose Estimation on a Configurable Bed from a Pressure Image


Title	3D Human Pose Estimation on a Configurable Bed from a Pressure Image
Authors	Henry M. Clever, Ariel Kapusta, Daehyung Park, Zackory Erickson, Yash Chitalia, Charles C. Kemp
Abstract	Robots have the potential to assist people in bed, such as in healthcare settings, yet bedding materials like sheets and blankets can make observation of the human body difficult for robots. A pressure-sensing mat on a bed can provide pressure images that are relatively insensitive to bedding materials. However, prior work on estimating human pose from pressure images has been restricted to 2D pose estimates and flat beds. In this work, we present two convolutional neural networks to estimate the 3D joint positions of a person in a configurable bed from a single pressure image. The first network directly outputs 3D joint positions, while the second outputs a kinematic model that includes estimated joint angles and limb lengths. We evaluated our networks on data from 17 human participants with two bed configurations: supine and seated. Our networks achieved a mean joint position error of 77 mm when tested with data from people outside the training set, outperforming several baselines. We also present a simple mechanical model that provides insight into ambiguity associated with limbs raised off of the pressure mat, and demonstrate that Monte Carlo dropout can be used to estimate pose confidence in these situations. Finally, we provide a demonstration in which a mobile manipulator uses our network’s estimated kinematic model to reach a location on a person’s body in spite of the person being seated in a bed and covered by a blanket.
Tasks	3D Human Pose Estimation, Pose Estimation
Published	2018-04-21
URL	http://arxiv.org/abs/1804.07873v2
PDF	http://arxiv.org/pdf/1804.07873v2.pdf
PWC	https://paperswithcode.com/paper/3d-human-pose-estimation-on-a-configurable
Repo
Framework
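The abstract reports using Monte Carlo dropout to estimate pose confidence. The idea is to keep dropout active at test time, run several stochastic forward passes, and read the spread of the predictions as uncertainty. In the sketch below, a toy linear regressor with inverted dropout on its inputs is a hypothetical stand-in for the paper's convolutional networks, and a single scalar output stands in for a joint coordinate.

```python
import random
import statistics

# Hypothetical sketch: a toy linear regressor stands in for the paper's CNNs.

def stochastic_forward(x, weights, bias, p_drop, rng):
    """One forward pass with inverted dropout applied to the inputs."""
    total = bias
    for xi, wi in zip(x, weights):
        if rng.random() >= p_drop:  # keep this input with probability 1 - p_drop
            total += wi * xi / (1.0 - p_drop)
    return total


def mc_dropout(x, weights, bias, p_drop=0.2, passes=200, seed=0):
    """Keep dropout on at test time; return the mean prediction and its spread."""
    rng = random.Random(seed)
    samples = [stochastic_forward(x, weights, bias, p_drop, rng) for _ in range(passes)]
    return statistics.mean(samples), statistics.pstdev(samples)


mean, spread = mc_dropout([1.0, 2.0], [0.5, 1.0], bias=1.0, p_drop=0.2)
print(f"prediction {mean:.2f} +/- {spread:.2f}")
```

A larger spread flags a less reliable estimate, which is the kind of signal the abstract describes for limbs raised off the mat.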

Can Entropy Explain Successor Surprisal Effects in Reading?


Title	Can Entropy Explain Successor Surprisal Effects in Reading?
Authors	Marten van Schijndel, Tal Linzen
Abstract	Human reading behavior is sensitive to surprisal: more predictable words tend to be read faster. Unexpectedly, this applies not only to the surprisal of the word that is currently being read, but also to the surprisal of upcoming (successor) words that have not been fixated yet. This finding has been interpreted as evidence that readers can extract lexical information parafoveally. Calling this interpretation into question, Angele et al. (2015) showed that successor effects appear even in contexts in which those successor words are not yet visible. They hypothesized that successor surprisal predicts reading time because it approximates the reader’s uncertainty about upcoming words. We test this hypothesis on a reading time corpus using an LSTM language model, and find that successor surprisal and entropy are independent predictors of reading time. This independence suggests that entropy alone is unlikely to be the full explanation for successor surprisal effects.
Tasks	Language Modelling
Published	2018-10-26
URL	http://arxiv.org/abs/1810.11481v1
PDF	http://arxiv.org/pdf/1810.11481v1.pdf
PWC	https://paperswithcode.com/paper/can-entropy-explain-successor-surprisal
Repo
Framework
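Surprisal and entropy are easy to conflate. Surprisal is $-\log_2 P(w \mid \text{context})$ for the word that actually occurs. Entropy is the expected surprisal over the whole next-word distribution and does not depend on which word appears. The toy distribution below is hypothetical; the paper obtains these quantities from an LSTM language model.

```python
import math

# Hypothetical next-word distribution; the paper derives these from an LSTM LM.

def surprisal(dist, word):
    """Surprisal in bits of the word that actually occurs: -log2 P(word | context)."""
    return -math.log2(dist[word])


def entropy(dist):
    """Entropy in bits of the whole next-word distribution (expected surprisal)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


next_word = {"the": 0.5, "a": 0.25, "dog": 0.25}
print(surprisal(next_word, "dog"), entropy(next_word))  # 2.0 1.5
```

The entropy of 1.5 bits is the same whichever word is read next, while the surprisal ranges from 1 to 2 bits. This is why the two can act as separate predictors of reading time.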

The Reconstruction Approach: From Interpolation to Regression


Title	The Reconstruction Approach: From Interpolation to Regression
Authors	Shifeng Xiong
Abstract	This paper introduces an interpolation-based method, called the reconstruction approach, for nonparametric regression. Based on the fact that interpolation usually has negligible errors compared to statistical estimation, the reconstruction approach uses an interpolator to parameterize the regression function by its values at finitely many knots, and then estimates these values by (regularized) least squares. Some popular methods, including kernel ridge regression, can be viewed as special cases. It is shown that the reconstruction idea not only provides new perspectives on existing methods, but also produces new effective experimental design and estimation methods for nonparametric models. In particular, for some methods of complexity $O(n^3)$, where $n$ is the sample size, this approach provides effective surrogates with a much lower computational burden, which makes it well suited to large datasets.
Tasks
Published	2018-05-25
URL	https://arxiv.org/abs/1805.10122v3
PDF	https://arxiv.org/pdf/1805.10122v3.pdf
PWC	https://paperswithcode.com/paper/function-estimation-via-reconstruction
Repo
Framework
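A minimal sketch of the reconstruction idea, assuming a piecewise-linear interpolator. The regression function is parameterized by its values $\theta_j$ at a few knots, each observation is a fixed linear combination of neighbouring $\theta_j$, and $\theta$ is estimated by least squares, optionally ridge-regularized. The interpolator, knot placement and penalty are illustrative assumptions, since the abstract leaves them open, and all names are hypothetical.

```python
# Hypothetical sketch of the reconstruction idea with a piecewise-linear interpolator.

def hat_basis(x, knots):
    """Weights w with interpolant(x) = sum_j w[j] * theta[j] (piecewise-linear, sorted knots)."""
    w = [0.0] * len(knots)
    if x <= knots[0]:
        w[0] = 1.0
    elif x >= knots[-1]:
        w[-1] = 1.0
    else:
        for j in range(len(knots) - 1):
            lo, hi = knots[j], knots[j + 1]
            if lo <= x <= hi:
                t = (x - lo) / (hi - lo)
                w[j], w[j + 1] = 1.0 - t, t
                break
    return w


def solve(a, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol


def fit_knot_values(xs, ys, knots, lam=0.0):
    """(Ridge-)regularized least squares for theta, the function's values at the knots."""
    rows = [hat_basis(x, knots) for x in xs]
    k = len(knots)
    gram = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(k)]
            for i in range(k)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(gram, rhs)


def predict_reconstruction(x, knots, theta):
    """Evaluate the fitted interpolant at x."""
    return sum(wj * tj for wj, tj in zip(hat_basis(x, knots), theta))


xs = [i / 10 for i in range(11)]
ys = [2.0 * x + 1.0 for x in xs]  # noise-free linear data for illustration
knots = [0.0, 0.5, 1.0]
theta = fit_knot_values(xs, ys, knots)
print(theta, predict_reconstruction(0.25, knots, theta))
```

The linear system has one unknown per knot, so its size is set by the number of knots rather than by the sample size $n$. This is consistent with the abstract's point about cheaper surrogates for $O(n^3)$ methods.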

Primal-dual residual networks


Title	Primal-dual residual networks
Authors	Christoph Brauer, Dirk Lorenz
Abstract	In this work, we propose a deep neural network architecture motivated by primal-dual splitting methods from convex optimization. We show theoretically that there exists a close relation between the derived architecture and residual networks, and further investigate this connection in numerical experiments. Moreover, we demonstrate how our approach can be used to unroll optimization algorithms for certain problems with hard constraints. Using the example of speech dequantization, we show that our method can outperform classical splitting methods when both are applied to the same task.
Tasks
Published	2018-06-15
URL	http://arxiv.org/abs/1806.05823v1
PDF	http://arxiv.org/pdf/1806.05823v1.pdf
PWC	https://paperswithcode.com/paper/primal-dual-residual-networks
Repo
Framework
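The abstract relates primal-dual splitting to residual networks. The sketch below shows that connection on a problem chosen for illustration, not the paper's speech dequantization task: 1D total-variation denoising, $\min_x \tfrac12\|x-f\|^2 + \lambda\|Dx\|_1$, solved with the Chambolle-Pock primal-dual iteration. Each primal update has the form $x_{k+1} = x_k + \text{correction}$, much like a residual block, and unrolling a fixed number of iterations gives a feed-forward network. In a learned variant, the step sizes and operators would become trainable parameters. That part is an assumption and is not implemented here. The step sizes satisfy $\tau\sigma\|D\|^2 < 1$ because $\|D\|^2 \le 4$ for forward differences.

```python
# Hypothetical sketch: unrolled Chambolle-Pock iterations for 1D TV denoising,
#   min_x 0.5 * ||x - f||^2 + lam * ||D x||_1,  D = forward differences.

def forward_diff(x):
    """D x, of length n - 1."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def forward_diff_adjoint(y, n):
    """D^T y, the adjoint of forward_diff."""
    out = [0.0] * n
    for i, yi in enumerate(y):
        out[i] -= yi
        out[i + 1] += yi
    return out


def primal_dual_tv(f, lam, tau=0.45, sigma=0.45, iters=5000):
    """Step sizes satisfy tau * sigma * ||D||^2 <= 0.45 * 0.45 * 4 < 1."""
    n = len(f)
    x, x_bar, y = list(f), list(f), [0.0] * (n - 1)
    for _ in range(iters):  # each pass is one "layer" of the unrolled network
        # Dual step: gradient ascent, then projection onto the box [-lam, lam].
        y = [min(max(yi + sigma * di, -lam), lam) for yi, di in zip(y, forward_diff(x_bar))]
        # Primal step (prox of the quadratic term) written as a residual update x + correction.
        dty = forward_diff_adjoint(y, n)
        x_new = [xi + tau * (fi - xi - gi) / (1.0 + tau) for xi, fi, gi in zip(x, f, dty)]
        # Over-relaxation of the primal variable.
        x_bar = [2.0 * a - b for a, b in zip(x_new, x)]
        x = x_new
    return x


print(primal_dual_tv([1.0, 2.0, 3.0, 4.0], lam=100.0))  # values near 2.5
```

With $\lambda = 0$ the output equals $f$. With a large $\lambda$ it collapses to the constant mean of $f$, which is the expected total-variation denoising behaviour.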

3D Human Pose Estimation in the Wild by Adversarial Learning


Title	3D Human Pose Estimation in the Wild by Adversarial Learning
Authors	Wei Yang, Wanli Ouyang, Xiaolong Wang, Jimmy Ren, Hongsheng Li, Xiaogang Wang
Abstract	Recently, remarkable advances have been achieved in 3D human pose estimation from monocular images thanks to powerful Deep Convolutional Neural Networks (DCNNs). Despite their success on large-scale datasets collected in constrained lab environments, it is difficult to obtain 3D pose annotations for in-the-wild images. Therefore, 3D human pose estimation in the wild is still a challenge. In this paper, we propose an adversarial learning framework, which distills the 3D human pose structures learned from the fully annotated dataset to in-the-wild images with only 2D pose annotations. Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground truth, which helps force the pose estimator to generate anthropometrically valid poses even for images in the wild. We also observe that a carefully designed information source for the discriminator is essential to boost performance. Thus, we design a geometric descriptor, which computes the pairwise relative locations and distances between body joints, as a new information source for the discriminator. The efficacy of our adversarial learning framework with the new geometric descriptor has been demonstrated through extensive experiments on widely used public benchmarks. Our approach significantly improves performance compared with previous state-of-the-art approaches.
Tasks	3D Human Pose Estimation, Pose Estimation
Published	2018-03-26
URL	http://arxiv.org/abs/1803.09722v2
PDF	http://arxiv.org/pdf/1803.09722v2.pdf
PWC	https://paperswithcode.com/paper/3d-human-pose-estimation-in-the-wild-by
Repo
Framework
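The geometric descriptor in the abstract computes pairwise relative locations and distances between body joints. The following is a minimal sketch that assumes unordered joint pairs and Euclidean distances; the paper's exact ordering or normalization may differ.

```python
import math
from itertools import combinations

# Hypothetical sketch: unordered joint pairs and Euclidean distances are assumptions.

def geometric_descriptor(joints):
    """Return pairwise offsets (p_i - p_j) and distances ||p_i - p_j|| for i < j."""
    offsets, distances = [], []
    for i, j in combinations(range(len(joints)), 2):
        d = [a - b for a, b in zip(joints[i], joints[j])]
        offsets.append(d)
        distances.append(math.sqrt(sum(c * c for c in d)))
    return offsets, distances


joints = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]  # toy 3-joint skeleton
print(geometric_descriptor(joints)[1])  # [5.0, 12.0, 13.0]
```

The offsets do not change when the whole skeleton is translated, and the distances also do not change under rotation. As a result, a discriminator fed this descriptor sees pose structure rather than global placement.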

3D Human Pose Estimation in RGBD Images for Robotic Task Learning


Title	3D Human Pose Estimation in RGBD Images for Robotic Task Learning
Authors	Christian Zimmermann, Tim Welschehold, Christian Dornhege, Wolfram Burgard, Thomas Brox
Abstract	We propose an approach to estimate 3D human pose in real-world units from a single RGBD image and show that it exceeds the performance of monocular 3D pose estimation approaches from color as well as of pose estimation exclusively from depth. Our approach builds on robust human keypoint detectors for color images and incorporates depth for lifting into 3D. We combine the system with our learning-from-demonstration framework to instruct a service robot without the need for markers. Experiments in real-world settings demonstrate that our approach enables a PR2 robot to imitate manipulation actions observed from a human teacher.
Tasks	3D Human Pose Estimation, 3D Pose Estimation, Pose Estimation
Published	2018-03-07
URL	http://arxiv.org/abs/1803.02622v2
PDF	http://arxiv.org/pdf/1803.02622v2.pdf
PWC	https://paperswithcode.com/paper/3d-human-pose-estimation-in-rgbd-images-for
Repo
Framework
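The abstract says depth is incorporated to lift 2D keypoints into 3D in real-world units. The simplest geometric baseline for that step is pinhole back-projection of each keypoint using the depth at its pixel. The paper's own lifting step may be learned, so treat this as a hypothetical illustration with made-up intrinsics $(f_x, f_y, c_x, c_y)$.

```python
# Hypothetical sketch of plain pinhole back-projection; intrinsics are made up.

def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def lift_keypoints(keypoints_2d, depth_map, intrinsics):
    """Back-project each 2D keypoint; return None where the depth reading is missing (<= 0)."""
    fx, fy, cx, cy = intrinsics
    points = []
    for u, v in keypoints_2d:
        z = depth_map[int(round(v))][int(round(u))]
        points.append(backproject(u, v, z, fx, fy, cx, cy) if z > 0 else None)
    return points


depth = [[2.0] * 4 for _ in range(4)]  # toy depth map in metres
depth[1][3] = 0.0  # a pixel with no depth reading
print(lift_keypoints([(3, 2), (3, 1)], depth, (100.0, 100.0, 2.0, 2.0)))
```

This baseline reads the depth of the visible surface rather than the joint centre and returns nothing where depth is missing, which is one reason a learned lifting step can help.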

DARCCC: Detecting Adversaries by Reconstruction from Class Conditional Capsules


Title	DARCCC: Detecting Adversaries by Reconstruction from Class Conditional Capsules
Authors	Nicholas Frosst, Sara Sabour, Geoffrey Hinton
Abstract	We present a simple technique that allows capsule models to detect adversarial images. In addition to being trained to classify images, the capsule model is trained to reconstruct the images from the pose parameters and identity of the correct top-level capsule. Adversarial images do not look like a typical member of the predicted class and they have much larger reconstruction errors when the reconstruction is produced from the top-level capsule for that class. We show that setting a threshold on the $\ell_2$ distance between the input image and its reconstruction from the winning capsule is very effective at detecting adversarial images for three different datasets. The same technique works quite well for CNNs that have been trained to reconstruct the image from all or part of the last hidden layer before the softmax. We then explore a stronger, white-box attack that takes the reconstruction error into account. This attack is able to fool our detection technique but in order to make the model change its prediction to another class, the attack must typically make the “adversarial” image resemble images of the other class.
Tasks
Published	2018-11-16
URL	http://arxiv.org/abs/1811.06969v1
PDF	http://arxiv.org/pdf/1811.06969v1.pdf
PWC	https://paperswithcode.com/paper/darccc-detecting-adversaries-by
Repo
Framework
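The detection rule in the abstract is a threshold on the $\ell_2$ distance between an input and its reconstruction. The following is a minimal sketch that assumes images are flattened to lists and that the threshold is a high quantile of reconstruction errors on clean data. The quantile rule is an assumption and is not taken from the paper.

```python
import math

# Hypothetical helpers; images are assumed to be flattened into lists of floats.

def l2_distance(a, b):
    """Euclidean distance between an input and its reconstruction."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_threshold(clean_errors, quantile=0.95):
    """Assumed rule: threshold at a high quantile of errors on clean validation data."""
    ordered = sorted(clean_errors)
    idx = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[idx]


def is_adversarial(image, reconstruction, threshold):
    """Flag inputs whose reconstruction from the winning capsule is too far away."""
    return l2_distance(image, reconstruction) > threshold


threshold = fit_threshold([0.8, 1.1, 0.9, 1.0, 1.2])
print(is_adversarial([0.0, 0.0], [3.0, 4.0], threshold))  # True
```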

A Bayesian and Machine Learning approach to estimating Influence Model parameters for IM-RO


Title	A Bayesian and Machine Learning approach to estimating Influence Model parameters for IM-RO
Authors	Trisha Lawrence
Abstract	The rise of Online Social Networks (OSNs) has attracted enormous interest from advertisers and researchers seeking to capitalize on their features. Researchers aim to develop strategies for determining how information propagates among users within an OSN, which is captured by diffusion or influence models. We consider the influence models for the Influence Maximization-Revenue Optimization (IM-RO) problem, a novel formulation of the Influence Maximization (IM) problem based on Stochastic Dynamic Programming (SDP). In contrast to existing approaches involving influence spread and the theory of submodular functions, the SDP method focuses on optimizing clicks and ultimately revenue to advertisers in OSNs. Existing approaches to influence maximization have been actively researched over the past decade, with applications to multiple fields; however, our approach is a more practical variant of the original IM problem. In this paper, we provide an analysis of the influence models of the IM-RO problem by conducting experiments on synthetic and real-world datasets. We propose a Bayesian and Machine Learning approach for estimating the parameters of the influence models for the IM-RO problem. We present a Bayesian hierarchical model and implement the well-known Naive Bayes classifier (NBC), Decision Trees classifier (DTC) and Random Forest classifier (RFC) on three real-world datasets. Compared to previous approaches to estimating influence model parameters, our strategy has the great advantage of being directly implementable in standard software packages such as WinBUGS/OpenBUGS/JAGS and Apache Spark. We demonstrate the efficiency and usability of our methods in terms of spreading information and generating revenue for advertisers in the context of OSNs.
Tasks
Published	2018-03-08
URL	http://arxiv.org/abs/1803.03191v1
PDF	http://arxiv.org/pdf/1803.03191v1.pdf
PWC	https://paperswithcode.com/paper/a-bayesian-and-machine-learning-approach-to
Repo
Framework
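The paper fits a Bayesian hierarchical model in WinBUGS/OpenBUGS/JAGS. As a much simpler stand-in, the sketch below shows the conjugate Beta-Binomial update for a single per-user influence or click probability. Treating each exposure as an independent Bernoulli trial with a Beta prior is an assumption made for illustration only.

```python
# Hypothetical single-level stand-in for the paper's hierarchical model: each
# exposure of a user is treated as an independent Bernoulli "click" trial.

def beta_posterior(successes, trials, alpha=1.0, beta=1.0):
    """Conjugate Beta-Binomial update; returns posterior (a, b, mean, variance)."""
    a = alpha + successes
    b = beta + trials - successes
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return a, b, mean, var


print(beta_posterior(successes=3, trials=10))
```

A hierarchical model would instead learn the prior parameters from all users, so users with sparse data borrow strength from the population.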

Sampling the Riemann-Theta Boltzmann Machine


Title	Sampling the Riemann-Theta Boltzmann Machine
Authors	Stefano Carrazza, Daniel Krefl
Abstract	We show that the visible sector probability density function of the Riemann-Theta Boltzmann machine corresponds to a Gaussian mixture model consisting of an infinite number of component multivariate Gaussians. The weights of the mixture are given by a discrete multivariate Gaussian over the hidden state space. This allows us to sample the visible sector density function in a straightforward manner. Furthermore, we show that the visible sector probability density function possesses an affine transform property, similar to the multivariate Gaussian density.
Tasks
Published	2018-04-20
URL	http://arxiv.org/abs/1804.07768v1
PDF	http://arxiv.org/pdf/1804.07768v1.pdf
PWC	https://paperswithcode.com/paper/sampling-the-riemann-theta-boltzmann-machine
Repo
Framework
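Because the abstract describes the visible density as a Gaussian mixture whose weights form a discrete Gaussian over the hidden states, sampling can proceed in two stages: draw a hidden lattice point, then draw from that component's Gaussian. The sketch below does this for one visible and one hidden dimension, truncating the infinite lattice to $|h| \le k$. The linear component mean and the names `q`, `b`, `mu0`, `slope` and `sd` are hypothetical and are not the paper's RTBM parameterization.

```python
import math
import random

# Hypothetical 1D illustration: the parameters q, b, mu0, slope and sd are made up
# and are not the paper's RTBM parameterization.

def lattice_weights(q, b, k):
    """Discrete Gaussian weights over integer hidden states h in [-k, k] (truncated)."""
    logits = {h: -0.5 * q * h * h - b * h for h in range(-k, k + 1)}
    top = max(logits.values())
    unnorm = {h: math.exp(logit - top) for h, logit in logits.items()}
    z = sum(unnorm.values())
    return {h: u / z for h, u in unnorm.items()}


def sample_visible(n, q, b, mu0, slope, sd, k=10, seed=0):
    """Ancestral sampling: h from the lattice weights, then v | h ~ N(mu0 + slope * h, sd^2)."""
    rng = random.Random(seed)
    weights = lattice_weights(q, b, k)
    states = list(weights)
    hs = rng.choices(states, weights=[weights[h] for h in states], k=n)
    return [rng.gauss(mu0 + slope * h, sd) for h in hs]


samples = sample_visible(1000, q=1.0, b=0.0, mu0=0.0, slope=3.0, sd=0.5)
print(sum(samples) / len(samples))
```

With `slope` large relative to `sd`, a histogram of the samples shows separate bumps at multiples of `slope`, one per hidden lattice point.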

Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction


Title	Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction
Authors	Xiangxi Shi, Jianfei Cai, Jiuxiang Gu, Shafiq Joty
Abstract	The explosion of video data on the internet requires effective and efficient technology to generate captions automatically for people who are not able to watch the videos. Despite the great progress of video captioning research, particularly on video feature encoding, the language decoder is still largely based on prevailing RNN decoders such as LSTM, which tend to favor frequent words that align with the video. In this paper, we propose a boundary-aware hierarchical language decoder for video captioning, which consists of a high-level GRU-based language decoder, working as a global (caption-level) language model, and a low-level GRU-based language decoder, working as a local (phrase-level) language model. Most importantly, we introduce a binary gate into the low-level GRU language decoder to detect language boundaries. Together with other advanced components including joint video prediction, shared soft attention, and boundary-aware video encoding, our integrated video captioning framework can discover hierarchical language information and distinguish the subject and the object in a sentence, which are often confused during language generation. Extensive experiments on two widely used video captioning datasets, MSR-Video-to-Text (MSR-VTT) \cite{xu2016msr} and YouTube-to-Text (MSVD) \cite{chen2011collecting}, show that our method is highly competitive with state-of-the-art methods.
Tasks	Language Modelling, Text Generation, Video Captioning, Video Prediction
Published	2018-07-08
URL	http://arxiv.org/abs/1807.03658v1
PDF	http://arxiv.org/pdf/1807.03658v1.pdf
PWC	https://paperswithcode.com/paper/video-captioning-with-boundary-aware
Repo
Framework

Deep Inception Generative Network for Cognitive Image Inpainting


Title	Deep Inception Generative Network for Cognitive Image Inpainting
Authors	Qingguo Xiao, Guangyao Li, Qiaochuan Chen
Abstract	Recent advances in deep learning have shown exciting promise in filling large holes and have opened a new direction for image inpainting. However, existing learning-based methods often create artifacts and fallacious textures because of insufficient cognitive understanding. Previous generative networks are limited to a single receptive-field type and forgo pooling to preserve detail sharpness. Human cognition is constant regardless of the target attribute. Because multiple receptive fields improve the ability to characterize abstract image content and pooling keeps features invariant, deep inception learning is adopted to promote high-level feature representation and enhance the model's learning capacity for local patches. Moreover, approaches for generating diverse mask images are introduced, and a random mask dataset is created. We benchmark our methods on ImageNet, Places2, and CelebA-HQ. Experiments on regular, irregular, and custom region completion are all performed, and free-style image inpainting is also presented. Quantitative comparisons with previous state-of-the-art methods show that ours produces much more natural image completions.
Tasks	Image Inpainting
Published	2018-12-01
URL	http://arxiv.org/abs/1812.01458v1
PDF	http://arxiv.org/pdf/1812.01458v1.pdf
PWC	https://paperswithcode.com/paper/deep-inception-generative-network-for
Repo
Framework
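The abstract mentions approaches for generating diverse mask images. One common way to produce irregular masks is to stamp a square brush along short random walks. The sketch below follows that approach as an assumption; the paper's own mask generators may differ, and the function names and defaults are hypothetical.

```python
import random

# Hypothetical random-walk "brush stroke" mask generator (1 = hole to inpaint).
# The paper's own mask-generation procedures may differ.

def random_stroke_mask(h, w, strokes=4, max_len=20, brush=2, seed=0):
    rng = random.Random(seed)
    mask = [[0] * w for _ in range(h)]
    for _ in range(strokes):
        r, c = rng.randrange(h), rng.randrange(w)
        for _ in range(rng.randint(1, max_len)):
            # Stamp a square brush of side 2 * brush + 1 centred at (r, c).
            for dr in range(-brush, brush + 1):
                for dc in range(-brush, brush + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        mask[rr][cc] = 1
            # Take one random step, staying inside the image.
            sr, sc = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
            r = min(max(r + sr * brush, 0), h - 1)
            c = min(max(c + sc * brush, 0), w - 1)
    return mask


def hole_fraction(mask):
    """Fraction of pixels marked as holes."""
    return sum(map(sum, mask)) / (len(mask) * len(mask[0]))


print(f"hole fraction: {hole_fraction(random_stroke_mask(64, 64)):.3f}")
```

Varying `seed`, `strokes`, `max_len` and `brush` yields a diverse set of masks from the same generator.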
