Paper Group AWR 332
Measuring the Intrinsic Dimension of Objective Landscapes
Title | Measuring the Intrinsic Dimension of Objective Landscapes |
Authors | Chunyuan Li, Heerad Farkhoor, Rosanne Liu, Jason Yosinski |
Abstract | Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times. |
Tasks | |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1804.08838v1 |
PDF | http://arxiv.org/pdf/1804.08838v1.pdf |
PWC | https://paperswithcode.com/paper/measuring-the-intrinsic-dimension-of |
Repo | https://github.com/Helsinki-NLP/shared-info |
Framework | none |
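The subspace-training procedure described in the abstract is simple enough to sketch directly: optimize d coordinates theta_d and map them into the native D-dimensional parameter space through a fixed random projection, theta = theta_0 + P theta_d; sweeping d and noting when solutions first appear yields the intrinsic dimension. Below is a minimal sketch on a toy logistic-regression problem; the task, names, and hyperparameters are illustrative, not from the paper's code.

```python
# Subspace training sketch: only theta_d (d params) is trained; the fixed
# random projection P maps it into the D-dimensional native space.
import numpy as np

rng = np.random.default_rng(0)
D, d, n = 200, 10, 500                      # native dim, subspace dim, samples

# Toy binary classification problem (illustrative stand-in for a real task).
X = rng.normal(size=(n, D))
w_true = rng.normal(size=D)
y = (X @ w_true > 0).astype(float)

theta_0 = np.zeros(D)                       # initialization in native space
P = rng.normal(size=(D, d)) / np.sqrt(D)    # fixed random projection
theta_d = np.zeros(d)                       # the only trainable parameters

def loss_and_grad(theta_d):
    w = theta_0 + P @ theta_d               # map subspace coords to native params
    p = 1.0 / (1.0 + np.exp(-(X @ w)))      # logistic predictions
    g_w = X.T @ (p - y) / n                 # cross-entropy gradient in native space
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    return loss, P.T @ g_w                  # chain rule pulls the gradient back

for step in range(500):                     # plain gradient descent in the subspace
    loss, g = loss_and_grad(theta_d)
    theta_d -= 0.5 * g
print(f"subspace dim {d}: final loss {loss:.3f}")
```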
3D RoI-aware U-Net for Accurate and Efficient Colorectal Tumor Segmentation
Title | 3D RoI-aware U-Net for Accurate and Efficient Colorectal Tumor Segmentation |
Authors | Yi-Jie Huang, Qi Dou, Zi-Xian Wang, Li-Zhi Liu, Ying Jin, Chao-Feng Li, Lisheng Wang, Hao Chen, Rui-Hua Xu |
Abstract | Segmentation of colorectal cancerous regions from 3D Magnetic Resonance (MR) images is a crucial procedure for radiotherapy, which conventionally requires accurate delineation of tumour boundaries at the expense of labor, time, and reproducibility. While deep learning-based methods serve as good baselines in 3D image segmentation tasks, the small applicable patch size limits the effective receptive field and degrades segmentation performance. In addition, localizing regions of interest (RoIs) in large whole-volume 3D images is a preceding operation that brings multiple benefits in terms of speed, target completeness, and reduction of false positives. Distinct from sliding-window or non-joint localization-segmentation models, we propose a novel multi-task framework, referred to as 3D RoI-aware U-Net (3D RU-Net), for RoI localization and in-region segmentation, where the two tasks share one backbone encoder network. With the region proposals from the encoder, we crop multi-level in-region RoI features from the encoder to form a GPU memory-efficient decoder for detail-preserving segmentation, thereby enlarging the applicable volume size and the effective receptive field. To train the model effectively, we designed a Dice-formulated loss function for the global-to-local multi-task learning procedure. Building on the efficiency gains, we further ensembled models with different receptive fields to achieve even higher performance at minor extra computational cost. Extensive experiments were conducted on 64 cancerous cases with four-fold cross-validation, and the results showed significant superiority in terms of accuracy and efficiency over conventional frameworks. In conclusion, the proposed method has great potential for extension to other 3D object segmentation tasks in medical images due to its inherent generalizability. The code for the proposed method is publicly available. |
Tasks | Multi-Task Learning, Semantic Segmentation |
Published | 2018-06-27 |
URL | http://arxiv.org/abs/1806.10342v5 |
PDF | http://arxiv.org/pdf/1806.10342v5.pdf |
PWC | https://paperswithcode.com/paper/3d-roi-aware-u-net-for-accurate-and-efficient |
Repo | https://github.com/RashmiUSC/3D-RU-Net |
Framework | pytorch |
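The abstract mentions a Dice-formulated loss for the global-to-local multi-task objective. A generic soft Dice loss of that kind is sketched below; this is a standard formulation, not the paper's exact 3D RU-Net loss.

```python
# Generic soft Dice loss for volumetric segmentation.
import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    """pred: predicted foreground probabilities; target: binary mask.
    Both arrays share a shape such as (D, H, W) for a 3D volume."""
    intersection = np.sum(pred * target)
    denom = np.sum(pred) + np.sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

# Usage on a toy volume:
pred = np.random.rand(8, 32, 32)
mask = (np.random.rand(8, 32, 32) > 0.9).astype(float)
print(soft_dice_loss(pred, mask))
```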
Learning Attractor Dynamics for Generative Memory
Title | Learning Attractor Dynamics for Generative Memory |
Authors | Yan Wu, Greg Wayne, Karol Gregor, Timothy Lillicrap |
Abstract | A central challenge faced by memory systems is the robust retrieval of a stored pattern in the presence of interference due to other stored patterns and noise. A theoretically well-founded solution to robust retrieval is given by attractor dynamics, which iteratively clean up patterns during recall. However, incorporating attractor dynamics into modern deep learning systems poses difficulties: attractor basins are characterised by vanishing gradients, which are known to make training neural networks difficult. In this work, we avoid the vanishing gradient problem by training a generative distributed memory without simulating the attractor dynamics. Based on the idea of memory writing as inference, as proposed in the Kanerva Machine, we show that a likelihood-based Lyapunov function emerges from maximising the variational lower bound of a generative memory. Experiments show that the resulting model converges to correct patterns upon iterative retrieval and achieves competitive performance as both a memory model and a generative model. |
Tasks | |
Published | 2018-11-23 |
URL | http://arxiv.org/abs/1811.09556v1 |
PDF | http://arxiv.org/pdf/1811.09556v1.pdf |
PWC | https://paperswithcode.com/paper/learning-attractor-dynamics-for-generative |
Repo | https://github.com/deepmind/dynamic-kanerva-machines |
Framework | tf |
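As a loose illustration of the abstract's Lyapunov view of retrieval: given any generative model with a differentiable log-likelihood, iteratively nudging a corrupted query uphill in log p(x) acts as attractor-style clean-up. The toy Gaussian "memory" below is a stand-in for the Kanerva Machine, purely to make the mechanism concrete.

```python
# Toy attractor-style retrieval: gradient ascent on a log-likelihood.
import torch

mu = torch.randn(16)                      # toy stored pattern (memory mean)

def log_prob(x):                          # isotropic Gaussian log-likelihood
    return -0.5 * torch.sum((x - mu) ** 2)

x = (mu + 0.8 * torch.randn(16)).requires_grad_(True)   # noisy query
for _ in range(50):                       # iterative retrieval = ascent steps
    lp = log_prob(x)
    (g,) = torch.autograd.grad(lp, x)
    with torch.no_grad():
        x += 0.1 * g                      # small steps raise log p, so -log p
                                          # behaves like a Lyapunov function here
print(torch.norm(x - mu).item())          # distance to the stored pattern shrinks
```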
Is it Safe to Drive? An Overview of Factors, Challenges, and Datasets for Driveability Assessment in Autonomous Driving
Title | Is it Safe to Drive? An Overview of Factors, Challenges, and Datasets for Driveability Assessment in Autonomous Driving |
Authors | Junyao Guo, Unmesh Kurup, Mohak Shah |
Abstract | With recent advances in learning algorithms and hardware development, autonomous cars have shown promise when operating in structured environments under good driving conditions. However, for complex, cluttered, and unseen environments with high uncertainty, autonomous driving systems still frequently demonstrate erroneous or unexpected behaviors that could lead to catastrophic outcomes. Autonomous vehicles should ideally adapt to driving conditions; while this can be achieved through multiple routes, it would be beneficial as a first step to be able to characterize driveability in some quantified form. To this end, this paper aims to create a framework for investigating the different factors that can impact driveability. Moreover, one of the main mechanisms for adapting autonomous driving systems to any driving condition is the ability to learn and generalize from representative scenarios. The machine learning algorithms that currently do so learn predominantly in a supervised manner and consequently need sufficient data for robust and efficient learning. Therefore, we also perform a comparative overview of 45 public driving datasets that enable learning, and we publish this dataset index at https://sites.google.com/view/driveability-survey-datasets. Specifically, we categorize the datasets according to use cases and highlight the datasets that capture complicated and hazardous driving conditions, which can be better used for training robust driving models. Furthermore, by discussing which driving scenarios are not covered by existing public datasets and which driveability factors need more investigation and data acquisition, this paper aims to encourage both targeted dataset collection and the proposal of novel driveability metrics that enhance the robustness of autonomous cars in adverse environments. |
Tasks | Autonomous Driving, Autonomous Vehicles |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.11277v1 |
PDF | http://arxiv.org/pdf/1811.11277v1.pdf |
PWC | https://paperswithcode.com/paper/is-it-safe-to-drive-an-overview-of-factors |
Repo | https://github.com/scaleapi/open-dataset-list |
Framework | none |
Recurrent Neural Network-Based Semantic Variational Autoencoder for Sequence-to-Sequence Learning
Title | Recurrent Neural Network-Based Semantic Variational Autoencoder for Sequence-to-Sequence Learning |
Authors | Myeongjun Jang, Seungwan Seo, Pilsung Kang |
Abstract | Sequence-to-sequence (Seq2seq) models have played an important role in the recent success of various natural language processing methods, such as machine translation, text summarization, and speech recognition. However, current Seq2seq models have trouble preserving global latent information from a long sequence of words. A variational autoencoder (VAE) alleviates this problem by learning a continuous semantic space for the input sentence, but it does not solve the problem completely. In this paper, we propose a new recurrent neural network (RNN)-based Seq2seq model, the RNN semantic variational autoencoder (RNN–SVAE), to better capture the global latent information of a sequence of words. To properly reflect the meaning of words in a sentence regardless of their position within the sentence, we construct a document information vector using the attention information between the final state of the encoder and every prior hidden state. The mean and standard deviation of the continuous semantic space are then learned from this vector, taking advantage of the variational method. By using the document information vector to find the semantic space of the sentence, it becomes possible to better capture the sentence's global latent features. Experimental results on three natural language tasks (i.e., language modeling, missing word imputation, and paraphrase identification) confirm that the proposed RNN–SVAE yields higher performance than two benchmark models. |
Tasks | Imputation, Language Modelling, Machine Translation, Paraphrase Identification, Speech Recognition, Text Summarization |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03238v2 |
PDF | http://arxiv.org/pdf/1802.03238v2.pdf |
PWC | https://paperswithcode.com/paper/recurrent-neural-network-based-semantic |
Repo | https://github.com/MJ-Jang/RNN_SVAE |
Framework | tf |
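The document information vector described in the abstract, attention between the encoder's final state and every prior hidden state, can be sketched in a few lines. Shapes and names below are illustrative, not the authors' implementation.

```python
# Position-independent summary of a sentence via attention on encoder states.
import numpy as np

def document_information_vector(H):
    """H: encoder hidden states, shape (T, d); H[-1] is the final state."""
    scores = H[:-1] @ H[-1]                     # dot-product attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over prior states
    return weights @ H[:-1]                     # weighted average, shape (d,)

H = np.random.randn(12, 64)                     # 12 timesteps, hidden size 64
v = document_information_vector(H)
print(v.shape)                                  # (64,) -> feeds the VAE mean/std
```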
JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion
Title | JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion |
Authors | Johannes Hellrich, Sven Buechel, Udo Hahn |
Abstract | We here introduce a substantially extended version of JeSemE, a website for visually exploring computationally derived time-variant information on word meaning and lexical emotion assembled from five large diachronic text corpora. JeSemE is intended as an interactive tool for scholars in the (digital) humanities who are mostly limited to consulting manually compiled dictionaries for such information, if available at all. JeSemE uniquely combines state-of-the-art distributional semantics with a nuanced model of human emotions, two information streams we deem beneficial for a data-driven interpretation of texts in the humanities. |
Tasks | |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.04148v1 |
PDF | http://arxiv.org/pdf/1807.04148v1.pdf |
PWC | https://paperswithcode.com/paper/jeseme-a-website-for-exploring-diachronic |
Repo | https://github.com/hellrich/JeSemE |
Framework | none |
How Robust is 3D Human Pose Estimation to Occlusion?
Title | How Robust is 3D Human Pose Estimation to Occlusion? |
Authors | István Sárándi, Timm Linder, Kai O. Arras, Bastian Leibe |
Abstract | Occlusion is commonplace in realistic human-robot shared environments, yet its effects are not considered in standard 3D human pose estimation benchmarks. This leaves the question open: how robust are state-of-the-art 3D pose estimation methods against partial occlusions? We study several types of synthetic occlusions over the Human3.6M dataset and find a method with state-of-the-art benchmark performance to be sensitive even to low amounts of occlusion. Addressing this issue is key to progress in applications such as collaborative and service robotics. We take a first step in this direction by improving occlusion-robustness through training data augmentation with synthetic occlusions. This also turns out to be an effective regularizer that is beneficial even for non-occluded test cases. |
Tasks | 3D Human Pose Estimation, 3D Pose Estimation, Data Augmentation, Pose Estimation |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09316v2 |
PDF | http://arxiv.org/pdf/1808.09316v2.pdf |
PWC | https://paperswithcode.com/paper/how-robust-is-3d-human-pose-estimation-to |
Repo | https://github.com/isarandi/synthetic-occlusion |
Framework | none |
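The augmentation idea is straightforward to sketch: paste a random occluder over the input image during training. The paper pastes segmented objects; the plain random rectangle below is a simplified stand-in.

```python
# Simplified synthetic-occlusion augmentation: one random opaque rectangle.
import numpy as np

def random_occlusion(img, rng, max_frac=0.3):
    """img: (H, W, C) float array; returns a copy with one occluder pasted."""
    H, W = img.shape[:2]
    h = rng.integers(1, int(H * max_frac) + 1)
    w = rng.integers(1, int(W * max_frac) + 1)
    top = rng.integers(0, H - h + 1)
    left = rng.integers(0, W - w + 1)
    out = img.copy()
    out[top:top + h, left:left + w] = rng.random(3)   # random flat color
    return out

rng = np.random.default_rng(0)
img = np.zeros((256, 256, 3))
aug = random_occlusion(img, rng)
```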
Dynamic Graph Representation Learning via Self-Attention Networks
Title | Dynamic Graph Representation Learning via Self-Attention Networks |
Authors | Aravind Sankar, Yanhong Wu, Liang Gou, Wei Zhang, Hao Yang |
Abstract | Learning latent representations of nodes in graphs is an important and ubiquitous task with widespread applications such as link prediction, node classification, and graph visualization. Previous methods on graph representation learning mainly focus on static graphs; however, many real-world graphs are dynamic and evolve over time. In this paper, we present Dynamic Self-Attention Network (DySAT), a novel neural architecture that operates on dynamic graphs and learns node representations that capture both structural properties and temporal evolutionary patterns. Specifically, DySAT computes node representations by jointly employing self-attention layers along two dimensions: structural neighborhood and temporal dynamics. We conduct link prediction experiments on two classes of graphs: communication networks and bipartite rating networks. Our experimental results show that DySAT has a significant performance gain over several different state-of-the-art graph embedding baselines. |
Tasks | Graph Embedding, Graph Representation Learning, Link Prediction, Node Classification, Representation Learning |
Published | 2018-12-22 |
URL | https://arxiv.org/abs/1812.09430v2 |
PDF | https://arxiv.org/pdf/1812.09430v2.pdf |
PWC | https://paperswithcode.com/paper/dynamic-graph-representation-learning-via |
Repo | https://github.com/aravindsankar28/DySAT |
Framework | tf |
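As a toy sketch of the temporal half of DySAT's design, the block below applies masked single-head self-attention across one node's embeddings at successive snapshots; the projections and masking scheme are generic, not the paper's exact architecture.

```python
# Single-head causal self-attention over one node's per-snapshot embeddings.
import numpy as np

def temporal_self_attention(Z, Wq, Wk, Wv):
    """Z: (T, d) embeddings of one node over T snapshots."""
    Q, K, V = Z @ Wq, Z @ Wk, Z @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    # Causal mask: each snapshot attends only to itself and the past.
    scores = np.tril(scores) + np.triu(np.full_like(scores, -1e9), 1)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V                           # (T, d): time-aware embeddings

rng = np.random.default_rng(0)
T, d = 6, 32
Z = rng.normal(size=(T, d))
W = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
print(temporal_self_attention(Z, *W).shape)      # (6, 32)
```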
Personalized Gaussian Processes for Forecasting of Alzheimer’s Disease Assessment Scale-Cognition Sub-Scale (ADAS-Cog13)
Title | Personalized Gaussian Processes for Forecasting of Alzheimer’s Disease Assessment Scale-Cognition Sub-Scale (ADAS-Cog13) |
Authors | Yuria Utsumi, Ognjen Rudovic, Kelly Peterson, Ricardo Guerrero, Rosalind W. Picard |
Abstract | In this paper, we introduce the use of a personalized Gaussian Process model (pGP) to predict per-patient changes in ADAS-Cog13 – a significant predictor of Alzheimer’s Disease (AD) in the cognitive domain – using data from each patient’s previous visits, and testing on future (held-out) data. We start by learning a population-level model from multi-modal data of previously seen patients using base Gaussian Process (GP) regression. The personalized GP (pGP) is formed by adapting the base GP sequentially over time to a new (target) patient using domain-adaptive GPs. We extend this personalized approach to predict the values of ADAS-Cog13 over the future 6, 12, 18, and 24 months. We compare this approach to a GP model trained only on past data of the target patients (tGP), as well as to a new approach that combines pGP with tGP. We find that the new approach, combining pGP with tGP, leads to large improvements in accurately forecasting future ADAS-Cog13 scores. |
Tasks | Gaussian Processes |
Published | 2018-02-22 |
URL | http://arxiv.org/abs/1802.08561v4 |
PDF | http://arxiv.org/pdf/1802.08561v4.pdf |
PWC | https://paperswithcode.com/paper/personalized-gaussian-processes-for |
Repo | https://github.com/yuriautsumi/PersonalizedGP |
Framework | tf |
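A hedged sketch of the population-then-personalized recipe from the abstract, using scikit-learn's GP regressor as a stand-in for the paper's domain-adaptive GP machinery. The data, features, and the simple averaging used to combine pGP and tGP are invented for illustration.

```python
# Population GP + target-patient GP, combined by averaging predictive means.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)

# Population model: visits from previously seen patients (toy data).
X_pop = rng.normal(size=(200, 5))              # 5 invented biomarker features
y_pop = X_pop @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
base_gp = GaussianProcessRegressor(kernel=kernel).fit(X_pop, y_pop)

# Target-patient model: trained only on that patient's past visits (tGP).
X_tgt, y_tgt = rng.normal(size=(4, 5)), rng.normal(size=4)
t_gp = GaussianProcessRegressor(kernel=kernel).fit(X_tgt, y_tgt)

# A crude pGP+tGP combination: average the two predictive means.
X_future = rng.normal(size=(1, 5))
pred = 0.5 * (base_gp.predict(X_future) + t_gp.predict(X_future))
print(pred)
```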
On the Spectrum of Random Features Maps of High Dimensional Data
Title | On the Spectrum of Random Features Maps of High Dimensional Data |
Authors | Zhenyu Liao, Romain Couillet |
Abstract | Random feature maps are ubiquitous in modern statistical machine learning, where they generalize random projections by means of powerful, yet often difficult to analyze, nonlinear operators. In this paper, we leverage the “concentration” phenomenon induced by random matrix theory to perform a spectral analysis on the Gram matrix of these random feature maps, here for Gaussian mixture models of simultaneously large dimension and size. Our results are instrumental to a deeper understanding of the interplay between the nonlinearity and the statistics of the data, thereby allowing for a better tuning of random feature-based techniques. |
Tasks | |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11916v2 |
PDF | http://arxiv.org/pdf/1805.11916v2.pdf |
PWC | https://paperswithcode.com/paper/on-the-spectrum-of-random-features-maps-of |
Repo | https://github.com/Zhenyu-LIAO/RMT4RFM |
Framework | none |
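The abstract's setting is easy to reproduce empirically: draw high-dimensional Gaussian-mixture data, push it through a random nonlinear feature map, and inspect the spectrum of the resulting Gram matrix. A short numpy experiment with arbitrary sizes:

```python
# Spectrum of the Gram matrix of a random ReLU feature map on mixture data.
import numpy as np

rng = np.random.default_rng(0)
p, n, N = 256, 512, 1024                 # data dim, samples, random features

# Two-component Gaussian mixture with shifted means.
mu = np.zeros(p)
mu[0] = 2.0
X = np.concatenate([rng.normal(size=(n // 2, p)) + mu,
                    rng.normal(size=(n // 2, p)) - mu])

W = rng.normal(size=(N, p)) / np.sqrt(p) # random projection
S = np.maximum(W @ X.T, 0.0)             # ReLU feature map, shape (N, n)
G = S.T @ S / N                          # Gram matrix of the features

eigs = np.linalg.eigvalsh(G)
print(eigs[-5:])                         # isolated spikes carry the class info
```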
Digging Into Self-Supervised Monocular Depth Estimation
Title | Digging Into Self-Supervised Monocular Depth Estimation |
Authors | Clément Godard, Oisin Mac Aodha, Michael Firman, Gabriel Brostow |
Abstract | Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully-supervised methods. We show that a surprisingly simple model, and associated design choices, lead to superior predictions. In particular, we propose (i) a minimum reprojection loss, designed to robustly handle occlusions, (ii) a full-resolution multi-scale sampling method that reduces visual artifacts, and (iii) an auto-masking loss to ignore training pixels that violate camera motion assumptions. We demonstrate the effectiveness of each component in isolation, and show high quality, state-of-the-art results on the KITTI benchmark. |
Tasks | Depth Estimation, Image Reconstruction, Motion Estimation, Scene Understanding |
Published | 2018-06-04 |
URL | https://arxiv.org/abs/1806.01260v4 |
PDF | https://arxiv.org/pdf/1806.01260v4.pdf |
PWC | https://paperswithcode.com/paper/digging-into-self-supervised-monocular-depth |
Repo | https://github.com/FangGet/tf-monodepth2 |
Framework | tf |
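Improvement (i) from the abstract, the minimum reprojection loss, takes the per-pixel minimum of the photometric error over source frames rather than the average, so a pixel occluded in one source frame falls back to its better match in another. A minimal sketch using plain L1 error (Monodepth2 itself combines SSIM with L1):

```python
# Per-pixel minimum reprojection loss over multiple warped source frames.
import numpy as np

def min_reprojection_loss(target, warped_sources):
    """target: (H, W, C); warped_sources: list of (H, W, C) reprojections."""
    errors = [np.abs(target - w).mean(axis=-1) for w in warped_sources]  # L1
    per_pixel = np.min(np.stack(errors, axis=0), axis=0)  # min over sources
    return per_pixel.mean()

tgt = np.random.rand(64, 64, 3)
srcs = [np.random.rand(64, 64, 3) for _ in range(2)]
print(min_reprojection_loss(tgt, srcs))
```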
Interactive Language Acquisition with One-shot Visual Concept Learning through a Conversational Game
Title | Interactive Language Acquisition with One-shot Visual Concept Learning through a Conversational Game |
Authors | Haichao Zhang, Haonan Yu, Wei Xu |
Abstract | Building intelligent agents that can communicate with and learn from humans in natural language is of great value. Supervised language learning is limited in that it mainly captures the statistics of the training data, and it is hardly adaptive to new scenarios or flexible enough to acquire new knowledge without inefficient retraining or catastrophic forgetting. We highlight the perspective that conversational interaction serves as a natural interface both for language learning and for novel knowledge acquisition, and we propose a joint imitation and reinforcement approach for grounded language learning through an interactive conversational game. The agent trained with this approach is able to actively acquire information by asking questions about novel objects and to use the just-learned knowledge in subsequent conversations in a one-shot fashion. Comparisons with other methods verify the effectiveness of the proposed approach. |
Tasks | Language Acquisition |
Published | 2018-04-26 |
URL | http://arxiv.org/abs/1805.00462v1 |
PDF | http://arxiv.org/pdf/1805.00462v1.pdf |
PWC | https://paperswithcode.com/paper/interactive-language-acquisition-with-one |
Repo | https://github.com/PaddlePaddle/XWorld |
Framework | none |
Multi-Agent Imitation Learning for Driving Simulation
Title | Multi-Agent Imitation Learning for Driving Simulation |
Authors | Raunak P. Bhattacharyya, Derek J. Phillips, Blake Wulfe, Jeremy Morton, Alex Kuefler, Mykel J. Kochenderfer |
Abstract | Simulation is an appealing option for validating the safety of autonomous vehicles. Generative Adversarial Imitation Learning (GAIL) has recently been shown to learn representative human driver models. These human driver models were learned through training in single-agent environments, but they have difficulty in generalizing to multi-agent driving scenarios. We argue these difficulties arise because observations at training and test time are sampled from different distributions. This difference makes such models unsuitable for the simulation of driving scenes, where multiple agents must interact realistically over long time horizons. We extend GAIL to address these shortcomings through a parameter-sharing approach grounded in curriculum learning. Compared with single-agent GAIL policies, policies generated by our PS-GAIL method prove superior at interacting stably in a multi-agent setting and capturing the emergent behavior of human drivers. |
Tasks | Autonomous Vehicles, Imitation Learning |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.01044v1 |
PDF | http://arxiv.org/pdf/1803.01044v1.pdf |
PWC | https://paperswithcode.com/paper/multi-agent-imitation-learning-for-driving |
Repo | https://github.com/sisl/ngsim_env |
Framework | tf |
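The parameter-sharing idea at the core of PS-GAIL can be reduced to a skeleton: every agent in the scene is stepped with the same policy, so experience from all agents trains one set of weights. The linear "policy" below is a hypothetical placeholder, not the NGSIM setup.

```python
# Parameter sharing: one policy's weights serve every agent in the scene.
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim, n_agents = 8, 2, 5
W = rng.normal(size=(obs_dim, act_dim)) * 0.1    # one shared linear policy

def shared_policy(obs_batch):
    """obs_batch: (n_agents, obs_dim) -> actions (n_agents, act_dim)."""
    return obs_batch @ W                         # same weights for all agents

obs = rng.normal(size=(n_agents, obs_dim))
actions = shared_policy(obs)                     # one forward pass, all agents
print(actions.shape)                             # (5, 2)
```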
Deep Pictorial Gaze Estimation
Title | Deep Pictorial Gaze Estimation |
Authors | Seonwook Park, Adrian Spurr, Otmar Hilliges |
Abstract | Estimating human gaze from natural eye images only is a challenging task. Gaze direction can be defined by the pupil center and the eyeball center, where the latter is unobservable in 2D images. Hence, achieving highly accurate gaze estimates is an ill-posed problem. In this paper, we introduce a novel deep neural network architecture specifically designed for the task of gaze estimation from single eye input. Instead of directly regressing two angles for the pitch and yaw of the eyeball, we regress to an intermediate pictorial representation, which in turn simplifies the task of 3D gaze direction estimation. Our quantitative and qualitative results show that our approach achieves higher accuracies than the state-of-the-art and is robust to variation in gaze, head pose, and image quality. |
Tasks | Gaze Estimation |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10002v1 |
PDF | http://arxiv.org/pdf/1807.10002v1.pdf |
PWC | https://paperswithcode.com/paper/deep-pictorial-gaze-estimation |
Repo | https://github.com/swook/GazeML |
Framework | tf |
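One way to see why an intermediate pictorial representation simplifies regression: a spatial map can be reduced to coordinates with a differentiable soft-argmax, so the network's job becomes drawing a map rather than directly predicting angles. The generic soft-argmax below is illustrative, not the paper's gazemap decoder.

```python
# Differentiable soft-argmax: expected (row, col) under a softmax of the map.
import numpy as np

def soft_argmax2d(heatmap, beta=10.0):
    """heatmap: (H, W) scores -> expected (row, col) coordinates."""
    H, W = heatmap.shape
    p = np.exp(beta * (heatmap - heatmap.max()))
    p /= p.sum()
    rows, cols = np.mgrid[0:H, 0:W]
    return (p * rows).sum(), (p * cols).sum()

hm = np.zeros((36, 60))
hm[20, 45] = 1.0
print(soft_argmax2d(hm))                 # approximately (20, 45)
```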
Learning model-based strategies in simple environments with hierarchical q-networks
Title | Learning model-based strategies in simple environments with hierarchical q-networks |
Authors | Necati Alp Muyesser, Kyle Dunovan, Timothy Verstynen |
Abstract | Recent advances in deep learning have allowed artificial agents to rival human-level performance on a wide range of complex tasks; however, the ability of these networks to learn generalizable strategies remains a pressing challenge. This critical limitation is due in part to two factors: the opaque information representation in deep neural networks and the complexity of the task environments in which they are typically deployed. Here we propose a novel Hierarchical Q-Network (HQN), motivated by theories of the hierarchical organization of the human prefrontal cortex, that attempts to identify lower-dimensional patterns in the value landscape that can be exploited to construct an internal model of the rules of simple environments. We draw on combinatorial games, where there exists a single optimal strategy for winning that generalizes across other features of the game, to probe the strategy generalization of the HQN and other reinforcement learning (RL) agents using variations of Wythoff’s game. Traditional RL approaches failed to reach satisfactory performance on variants of Wythoff’s game; however, the HQN learned heuristic-like strategies that generalized across changes in board configuration. More importantly, the HQN allowed for transparent inspection of the agent’s internal model of the game following training. Our results show how a biologically inspired hierarchical learner can facilitate learning abstract rules to promote robust and flexible action policies in simplified training environments with clearly delineated optimal strategies. |
Tasks | |
Published | 2018-01-20 |
URL | http://arxiv.org/abs/1801.06689v1 |
PDF | http://arxiv.org/pdf/1801.06689v1.pdf |
PWC | https://paperswithcode.com/paper/learning-model-based-strategies-in-simple |
Repo | https://github.com/CoAxLab/azad |
Framework | pytorch |
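For context on the "single optimal strategy" the abstract probes: in Wythoff's game the losing (cold) positions are the pairs (floor(n*phi), floor(n*phi^2)) with phi the golden ratio, a rule that generalizes across board sizes. A tiny sketch of that rule:

```python
# Wythoff's game cold positions via the golden ratio.
import math

phi = (1 + math.sqrt(5)) / 2

def is_cold(a, b):
    """True if (a, b) is a losing position for the player to move."""
    a, b = min(a, b), max(a, b)
    n = b - a                               # cold pairs differ by exactly n
    return a == math.floor(n * phi)

print([(a, b) for a in range(8) for b in range(a, 8) if is_cold(a, b)])
# -> [(0, 0), (1, 2), (3, 5), (4, 7)]
```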