October 20, 2019


Paper Group AWR 332



Measuring the Intrinsic Dimension of Objective Landscapes

Title Measuring the Intrinsic Dimension of Objective Landscapes
Authors Chunyuan Li, Heerad Farkhoor, Rosanne Liu, Jason Yosinski
Abstract Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.
Tasks
Published 2018-04-24
URL http://arxiv.org/abs/1804.08838v1
PDF http://arxiv.org/pdf/1804.08838v1.pdf
PWC https://paperswithcode.com/paper/measuring-the-intrinsic-dimension-of
Repo https://github.com/Helsinki-NLP/shared-info
Framework none
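
The subspace-training procedure is simple enough to prototype directly. Below is a minimal PyTorch sketch of the idea as stated in the abstract: every native parameter is expressed as theta0 + P z for a frozen random basis P, and only the d-dimensional vector z is trained. The two-layer network, dimensions, and random stand-in data are our own choices, not the authors' code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D_in, H, D_out = 784, 64, 10
D = H * D_in + H + D_out * H + D_out       # total native parameter count
theta0 = torch.randn(D) * 0.05             # frozen random initialization
d = 300                                    # candidate subspace dimension
P = torch.randn(D, d) / d ** 0.5           # fixed random projection basis
z = torch.zeros(d, requires_grad=True)     # the only trainable parameters

def unpack(theta):
    """Slice the flat parameter vector into layer tensors."""
    shapes = [(H, D_in), (H,), (D_out, H), (D_out,)]
    out, i = [], 0
    for shape in shapes:
        n = 1
        for s in shape:
            n *= s
        out.append(theta[i:i + n].view(shape))
        i += n
    return out

def forward(x, theta):
    w1, b1, w2, b2 = unpack(theta)
    return F.linear(F.relu(F.linear(x, w1, b1)), w2, b2)

opt = torch.optim.Adam([z], lr=1e-2)
x = torch.randn(256, D_in)                 # stand-in batch; use MNIST in practice
y = torch.randint(0, D_out, (256,))
for step in range(200):
    loss = F.cross_entropy(forward(x, theta0 + P @ z), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Sweeping d upward and recording the smallest d whose solution reaches roughly 90% of the directly trained baseline gives the paper's intrinsic-dimension estimate.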

3D RoI-aware U-Net for Accurate and Efficient Colorectal Tumor Segmentation

Title 3D RoI-aware U-Net for Accurate and Efficient Colorectal Tumor Segmentation
Authors Yi-Jie Huang, Qi Dou, Zi-Xian Wang, Li-Zhi Liu, Ying Jin, Chao-Feng Li, Lisheng Wang, Hao Chen, Rui-Hua Xu
Abstract Segmentation of colorectal cancerous regions from 3D Magnetic Resonance (MR) images is a crucial procedure for radiotherapy, which conventionally requires accurate delineation of tumour boundaries at the expense of labor, time, and reproducibility. While deep learning based methods serve as good baselines in 3D image segmentation tasks, the small applicable patch size limits the effective receptive field and degrades segmentation performance. In addition, localization of Regions of Interest (RoIs) from large whole-volume 3D images serves as a preceding operation that brings multiple benefits in terms of speed, target completeness, and reduction of false positives. Distinct from sliding-window or non-joint localization-segmentation based models, we propose a novel multi-task framework, referred to as 3D RoI-aware U-Net (3D RU-Net), for RoI localization and in-region segmentation, where the two tasks share one backbone encoder network. Using the region proposals from the encoder, we crop multi-level RoI in-region features from the encoder to form a GPU memory-efficient decoder for detail-preserving segmentation, thereby enlarging the applicable volume size and effective receptive field. To effectively train the model, we designed a Dice-formulated loss function for the global-to-local multi-task learning procedure. Based on the efficiency gains, we further ensembled models with different receptive fields to achieve even higher performance at a minor extra computational cost. Extensive experiments were conducted on 64 cancerous cases with four-fold cross-validation, and the results showed significant superiority in terms of accuracy and efficiency over conventional frameworks. In conclusion, the proposed method has great potential for extension to other 3D object segmentation tasks from medical images due to its inherent generalizability. The code for the proposed method is publicly available.
Tasks Multi-Task Learning, Semantic Segmentation
Published 2018-06-27
URL http://arxiv.org/abs/1806.10342v5
PDF http://arxiv.org/pdf/1806.10342v5.pdf
PWC https://paperswithcode.com/paper/3d-roi-aware-u-net-for-accurate-and-efficient
Repo https://github.com/RashmiUSC/3D-RU-Net
Framework pytorch
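
As a rough illustration of the Dice-formulated loss mentioned in the abstract, here is a generic soft Dice loss for 3D volumes; the exact global-to-local formulation in the paper may differ.

```python
import torch

def soft_dice_loss(probs, target, eps=1e-6):
    """probs, target: (N, 1, D, H, W) tensors with values in [0, 1]."""
    dims = (1, 2, 3, 4)                       # sum over all but the batch axis
    intersection = (probs * target).sum(dims)
    union = probs.sum(dims) + target.sum(dims)
    dice = (2 * intersection + eps) / (union + eps)
    return 1 - dice.mean()                    # 0 = perfect overlap

probs = torch.rand(2, 1, 16, 64, 64)          # predicted foreground probabilities
target = (torch.rand(2, 1, 16, 64, 64) > 0.5).float()
print(soft_dice_loss(probs, target))
```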

Learning Attractor Dynamics for Generative Memory

Title Learning Attractor Dynamics for Generative Memory
Authors Yan Wu, Greg Wayne, Karol Gregor, Timothy Lillicrap
Abstract A central challenge faced by memory systems is the robust retrieval of a stored pattern in the presence of interference due to other stored patterns and noise. A theoretically well-founded solution to robust retrieval is given by attractor dynamics, which iteratively clean up patterns during recall. However, incorporating attractor dynamics into modern deep learning systems poses difficulties: attractor basins are characterised by vanishing gradients, which are known to make training neural networks difficult. In this work, we avoid the vanishing gradient problem by training a generative distributed memory without simulating the attractor dynamics. Based on the idea of memory writing as inference, as proposed in the Kanerva Machine, we show that a likelihood-based Lyapunov function emerges from maximising the variational lower bound of a generative memory. Experiments show that the model converges to correct patterns upon iterative retrieval and achieves competitive performance as both a memory model and a generative model.
Tasks
Published 2018-11-23
URL http://arxiv.org/abs/1811.09556v1
PDF http://arxiv.org/pdf/1811.09556v1.pdf
PWC https://paperswithcode.com/paper/learning-attractor-dynamics-for-generative
Repo https://github.com/deepmind/dynamic-kanerva-machines
Framework tf
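
The retrieval loop the abstract describes can be conveyed in a few lines. The sketch below shows only the generic control flow of iterative cleanup; the network here is untrained, and the paper's actual model is a variational Kanerva-style memory whose ELBO supplies the Lyapunov function.

```python
import torch
import torch.nn as nn

# Stand-in encoder/decoder; after training, each pass "cleans up" the query,
# and a fixed point of this map is an attractor.
model = nn.Sequential(nn.Linear(784, 64), nn.Tanh(),
                      nn.Linear(64, 784), nn.Sigmoid())
x = torch.rand(1, 784)                     # noisy query pattern
with torch.no_grad():
    for _ in range(5):                     # iterative retrieval
        x = model(x)
```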

Is it Safe to Drive? An Overview of Factors, Challenges, and Datasets for Driveability Assessment in Autonomous Driving

Title Is it Safe to Drive? An Overview of Factors, Challenges, and Datasets for Driveability Assessment in Autonomous Driving
Authors Junyao Guo, Unmesh Kurup, Mohak Shah
Abstract With recent advances in learning algorithms and hardware development, autonomous cars have shown promise when operating in structured environments under good driving conditions. However, for complex, cluttered, and unseen environments with high uncertainty, autonomous driving systems still frequently demonstrate erroneous or unexpected behaviors that could lead to catastrophic outcomes. Autonomous vehicles should ideally adapt to driving conditions; while this can be achieved through multiple routes, it would be beneficial as a first step to be able to characterize Driveability in some quantified form. To this end, this paper aims to create a framework for investigating the different factors that can impact driveability. Furthermore, one of the main mechanisms for adapting autonomous driving systems to any driving condition is the ability to learn and generalize from representative scenarios. The machine learning algorithms that currently do so learn predominantly in a supervised manner and consequently need sufficient data for robust and efficient learning. Therefore, we also perform a comparative overview of 45 public driving datasets that enable learning, and publish this dataset index at https://sites.google.com/view/driveability-survey-datasets. Specifically, we categorize the datasets according to use cases, and highlight the datasets that capture complicated and hazardous driving conditions, which can be better used for training robust driving models. Finally, by discussing which driving scenarios are not covered by existing public datasets and which driveability factors need more investigation and data acquisition, this paper aims to encourage both targeted dataset collection and the proposal of novel driveability metrics that enhance the robustness of autonomous cars in adverse environments.
Tasks Autonomous Driving, Autonomous Vehicles
Published 2018-11-27
URL http://arxiv.org/abs/1811.11277v1
PDF http://arxiv.org/pdf/1811.11277v1.pdf
PWC https://paperswithcode.com/paper/is-it-safe-to-drive-an-overview-of-factors
Repo https://github.com/scaleapi/open-dataset-list
Framework none

Recurrent Neural Network-Based Semantic Variational Autoencoder for Sequence-to-Sequence Learning

Title Recurrent Neural Network-Based Semantic Variational Autoencoder for Sequence-to-Sequence Learning
Authors Myeongjun Jang, Seungwan Seo, Pilsung Kang
Abstract Sequence-to-sequence (Seq2seq) models have played an important role in the recent success of various natural language processing methods, such as machine translation, text summarization, and speech recognition. However, current Seq2seq models have trouble preserving global latent information from a long sequence of words. Variational autoencoders (VAEs) alleviate this problem by learning a continuous semantic space of the input sentence; however, they do not solve the problem completely. In this paper, we propose a new recurrent neural network (RNN)-based Seq2seq model, the RNN semantic variational autoencoder (RNN–SVAE), to better capture the global latent information of a sequence of words. To properly reflect the meaning of words in a sentence regardless of their position within the sentence, we construct a document information vector using the attention information between the final state of the encoder and every prior hidden state. The mean and standard deviation of the continuous semantic space are then learned from this vector so as to take advantage of the variational method. By using the document information vector to find the semantic space of the sentence, it becomes possible to better capture the global latent features of the sentence. Experimental results on three natural language tasks (language modeling, missing word imputation, and paraphrase identification) confirm that the proposed RNN–SVAE yields higher performance than two benchmark models.
Tasks Imputation, Language Modelling, Machine Translation, Paraphrase Identification, Speech Recognition, Text Summarization
Published 2018-02-09
URL http://arxiv.org/abs/1802.03238v2
PDF http://arxiv.org/pdf/1802.03238v2.pdf
PWC https://paperswithcode.com/paper/recurrent-neural-network-based-semantic
Repo https://github.com/MJ-Jang/RNN_SVAE
Framework tf
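
To make the document-information-vector construction concrete, here is a hedged PyTorch sketch based on our reading of the abstract: attend from the encoder's final state over all hidden states, then map the attended vector to the mean and log-variance of the latent space. Layer names and dimensions are placeholders, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden, latent = 128, 32
gru = nn.GRU(input_size=64, hidden_size=hidden, batch_first=True)
to_mu = nn.Linear(hidden, latent)
to_logvar = nn.Linear(hidden, latent)

x = torch.randn(8, 20, 64)                    # batch of 20-token embedded sentences
states, h_final = gru(x)                      # states: (8, 20, 128)
query = h_final[-1].unsqueeze(1)              # final encoder state as query: (8, 1, 128)
scores = (states @ query.transpose(1, 2)).squeeze(-1) / hidden ** 0.5
attn = F.softmax(scores, dim=1)               # attention over all hidden states
doc_vec = (attn.unsqueeze(-1) * states).sum(dim=1)   # document information vector
mu, logvar = to_mu(doc_vec), to_logvar(doc_vec)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp() # reparameterization trick
```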

JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion

Title JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion
Authors Johannes Hellrich, Sven Buechel, Udo Hahn
Abstract We here introduce a substantially extended version of JeSemE, a website for visually exploring computationally derived time-variant information on word meaning and lexical emotion assembled from five large diachronic text corpora. JeSemE is intended as an interactive tool for scholars in the (digital) humanities who are mostly limited to consulting manually compiled dictionaries for such information, if available at all. JeSemE uniquely combines state-of-the-art distributional semantics with a nuanced model of human emotions, two information streams we deem beneficial for a data-driven interpretation of texts in the humanities.
Tasks
Published 2018-07-11
URL http://arxiv.org/abs/1807.04148v1
PDF http://arxiv.org/pdf/1807.04148v1.pdf
PWC https://paperswithcode.com/paper/jeseme-a-website-for-exploring-diachronic
Repo https://github.com/hellrich/JeSemE
Framework none

How Robust is 3D Human Pose Estimation to Occlusion?

Title How Robust is 3D Human Pose Estimation to Occlusion?
Authors István Sárándi, Timm Linder, Kai O. Arras, Bastian Leibe
Abstract Occlusion is commonplace in realistic human-robot shared environments, yet its effects are not considered in standard 3D human pose estimation benchmarks. This leaves the question open: how robust are state-of-the-art 3D pose estimation methods against partial occlusions? We study several types of synthetic occlusions over the Human3.6M dataset and find a method with state-of-the-art benchmark performance to be sensitive even to low amounts of occlusion. Addressing this issue is key to progress in applications such as collaborative and service robotics. We take a first step in this direction by improving occlusion-robustness through training data augmentation with synthetic occlusions. This also turns out to be an effective regularizer that is beneficial even for non-occluded test cases.
Tasks 3D Human Pose Estimation, 3D Pose Estimation, Data Augmentation, Pose Estimation
Published 2018-08-28
URL http://arxiv.org/abs/1808.09316v2
PDF http://arxiv.org/pdf/1808.09316v2.pdf
PWC https://paperswithcode.com/paper/how-robust-is-3d-human-pose-estimation-to
Repo https://github.com/isarandi/synthetic-occlusion
Framework none
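
The augmentation strategy is easy to reproduce in outline. The sketch below pastes a random opaque rectangle over the input image; the authors paste segmented Pascal VOC objects, so the rectangle here is a simplified stand-in.

```python
import numpy as np

def random_occlusion(img, max_frac=0.5, rng=np.random):
    """Paste one random opaque rectangle onto a copy of img (H, W, C)."""
    h, w = img.shape[:2]
    oh = rng.randint(1, int(h * max_frac) + 1)   # occluder height
    ow = rng.randint(1, int(w * max_frac) + 1)   # occluder width
    y = rng.randint(0, h - oh + 1)
    x = rng.randint(0, w - ow + 1)
    out = img.copy()
    out[y:y + oh, x:x + ow] = rng.randint(0, 256, size=(oh, ow) + img.shape[2:])
    return out

img = np.zeros((256, 256, 3), dtype=np.uint8)    # stand-in input image
aug = random_occlusion(img)
```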

Dynamic Graph Representation Learning via Self-Attention Networks

Title Dynamic Graph Representation Learning via Self-Attention Networks
Authors Aravind Sankar, Yanhong Wu, Liang Gou, Wei Zhang, Hao Yang
Abstract Learning latent representations of nodes in graphs is an important and ubiquitous task with widespread applications such as link prediction, node classification, and graph visualization. Previous methods for graph representation learning mainly focus on static graphs; however, many real-world graphs are dynamic and evolve over time. In this paper, we present Dynamic Self-Attention Network (DySAT), a novel neural architecture that operates on dynamic graphs and learns node representations that capture both structural properties and temporal evolutionary patterns. Specifically, DySAT computes node representations by jointly employing self-attention layers along two dimensions: structural neighborhood and temporal dynamics. We conduct link prediction experiments on two classes of graphs: communication networks and bipartite rating networks. Our experimental results show that DySAT has a significant performance gain over several different state-of-the-art graph embedding baselines.
Tasks Graph Embedding, Graph Representation Learning, Link Prediction, Node Classification, Representation Learning
Published 2018-12-22
URL https://arxiv.org/abs/1812.09430v2
PDF https://arxiv.org/pdf/1812.09430v2.pdf
PWC https://paperswithcode.com/paper/dynamic-graph-representation-learning-via
Repo https://github.com/aravindsankar28/DySAT
Framework tf
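
The temporal half of the two attention dimensions can be sketched with a standard attention layer. The snippet below is our simplification, not the DySAT code: a causal self-attention over each node's embeddings across snapshots.

```python
import torch
import torch.nn as nn

T, N, F_dim = 5, 100, 64                    # snapshots, nodes, feature size
emb = torch.randn(T, N, F_dim)              # per-snapshot structural embeddings
temporal_attn = nn.MultiheadAttention(embed_dim=F_dim, num_heads=4)
# Causal mask: a node at snapshot t may only attend to snapshots <= t.
mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
out, _ = temporal_attn(emb, emb, emb, attn_mask=mask)   # (T, N, F_dim)
final = out[-1]                             # each node's embedding at the last step
```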

Personalized Gaussian Processes for Forecasting of Alzheimer’s Disease Assessment Scale-Cognition Sub-Scale (ADAS-Cog13)

Title Personalized Gaussian Processes for Forecasting of Alzheimer’s Disease Assessment Scale-Cognition Sub-Scale (ADAS-Cog13)
Authors Yuria Utsumi, Ognjen Rudovic, Kelly Peterson, Ricardo Guerrero, Rosalind W. Picard
Abstract In this paper, we introduce the use of a personalized Gaussian Process model (pGP) to predict per-patient changes in ADAS-Cog13 – a significant predictor of Alzheimer’s Disease (AD) in the cognitive domain – using data from each patient’s previous visits, and testing on future (held-out) data. We start by learning a population-level model using multi-modal data from previously seen patients using a base Gaussian Process (GP) regression. The personalized GP (pGP) is formed by adapting the base GP sequentially over time to a new (target) patient using domain adaptive GPs. We extend this personalized approach to predict the values of ADAS-Cog13 over the future 6, 12, 18, and 24 months. We compare this approach to a GP model trained only on past data of the target patients (tGP), as well as to a new approach that combines pGP with tGP. We find that the new approach, combining pGP with tGP, leads to large improvements in accurately forecasting future ADAS-Cog13 scores.
Tasks Gaussian Processes
Published 2018-02-22
URL http://arxiv.org/abs/1802.08561v4
PDF http://arxiv.org/pdf/1802.08561v4.pdf
PWC https://paperswithcode.com/paper/personalized-gaussian-processes-for
Repo https://github.com/yuriautsumi/PersonalizedGP
Framework tf
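
The tGP baseline from the abstract reduces to ordinary GP regression over a patient's own visit history. The sketch below uses scikit-learn with synthetic scores; the paper's pGP additionally adapts a multi-modal population-level GP, which is omitted here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

months = np.array([0, 6, 12, 18])[:, None]     # past visit times (synthetic)
scores = np.array([12.0, 13.5, 15.0, 17.0])    # past ADAS-Cog13 scores (synthetic)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=12.0) + WhiteKernel(1e-2),
                              normalize_y=True)
gp.fit(months, scores)
future = np.array([24, 30, 36, 42])[:, None]   # forecast 6/12/18/24 months ahead
mean, std = gp.predict(future, return_std=True)
```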

On the Spectrum of Random Features Maps of High Dimensional Data

Title On the Spectrum of Random Features Maps of High Dimensional Data
Authors Zhenyu Liao, Romain Couillet
Abstract Random feature maps are ubiquitous in modern statistical machine learning, where they generalize random projections by means of powerful, yet often difficult to analyze nonlinear operators. In this paper, we leverage the “concentration” phenomenon induced by random matrix theory to perform a spectral analysis on the Gram matrix of these random feature maps, here for Gaussian mixture models of simultaneously large dimension and size. Our results are instrumental to a deeper understanding on the interplay of the nonlinearity and the statistics of the data, thereby allowing for a better tuning of random feature-based techniques.
Tasks
Published 2018-05-30
URL http://arxiv.org/abs/1805.11916v2
PDF http://arxiv.org/pdf/1805.11916v2.pdf
PWC https://paperswithcode.com/paper/on-the-spectrum-of-random-features-maps-of
Repo https://github.com/Zhenyu-LIAO/RMT4RFM
Framework none
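
The object under study is straightforward to simulate numerically. Below is a small NumPy sketch that builds the Gram matrix of a ReLU random feature map on two-class Gaussian-mixture data and computes its spectrum; the dimensions and nonlinearity are our choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, N = 256, 1024, 512                    # data dim, samples, random features
X = np.concatenate([rng.normal(-1, 1, (p, n // 2)),     # class 1
                    rng.normal(+1, 1, (p, n // 2))],    # class 2
                   axis=1) / np.sqrt(p)
W = rng.normal(size=(N, p))                 # random projection
S = np.maximum(W @ X, 0)                    # sigma(WX): ReLU feature map
G = S.T @ S / N                             # Gram matrix of the feature map
eigvals = np.linalg.eigvalsh(G)             # spectrum analyzed via random matrix theory
```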

Digging Into Self-Supervised Monocular Depth Estimation

Title Digging Into Self-Supervised Monocular Depth Estimation
Authors Clément Godard, Oisin Mac Aodha, Michael Firman, Gabriel Brostow
Abstract Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully-supervised methods. We show that a surprisingly simple model, and associated design choices, lead to superior predictions. In particular, we propose (i) a minimum reprojection loss, designed to robustly handle occlusions, (ii) a full-resolution multi-scale sampling method that reduces visual artifacts, and (iii) an auto-masking loss to ignore training pixels that violate camera motion assumptions. We demonstrate the effectiveness of each component in isolation, and show high quality, state-of-the-art results on the KITTI benchmark.
Tasks Depth Estimation, Image Reconstruction, Motion Estimation, Scene Understanding
Published 2018-06-04
URL https://arxiv.org/abs/1806.01260v4
PDF https://arxiv.org/pdf/1806.01260v4.pdf
PWC https://paperswithcode.com/paper/digging-into-self-supervised-monocular-depth
Repo https://github.com/FangGet/tf-monodepth2
Framework tf
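
The minimum reprojection loss, component (i), is compact enough to state in code. The sketch below simplifies the photometric error to plain L1 (the paper combines L1 with SSIM):

```python
import torch

def min_reprojection_loss(target, reprojected):
    """target: (N, 3, H, W); reprojected: list of warped source frames."""
    errors = torch.stack([(target - r).abs().mean(dim=1) for r in reprojected])
    # Per-pixel minimum over source frames handles occlusion: a pixel only
    # needs to be explained well by its best source view.
    return errors.min(dim=0).values.mean()

target = torch.rand(2, 3, 64, 64)
warped = [torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)]  # from t-1, t+1
loss = min_reprojection_loss(target, warped)
```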

Interactive Language Acquisition with One-shot Visual Concept Learning through a Conversational Game

Title Interactive Language Acquisition with One-shot Visual Concept Learning through a Conversational Game
Authors Haichao Zhang, Haonan Yu, Wei Xu
Abstract Building intelligent agents that can communicate with and learn from humans in natural language is of great value. Supervised language learning is limited in that it mainly captures the statistics of the training data, and is hardly adaptive to new scenarios or flexible enough to acquire new knowledge without inefficient retraining or catastrophic forgetting. We highlight the perspective that conversational interaction serves as a natural interface both for language learning and for novel knowledge acquisition, and we propose a joint imitation and reinforcement approach for grounded language learning through an interactive conversational game. An agent trained with this approach is able to actively acquire information by asking questions about novel objects and to use the just-learned knowledge in subsequent conversations in a one-shot fashion. Comparisons with other methods verify the effectiveness of the proposed approach.
Tasks Language Acquisition
Published 2018-04-26
URL http://arxiv.org/abs/1805.00462v1
PDF http://arxiv.org/pdf/1805.00462v1.pdf
PWC https://paperswithcode.com/paper/interactive-language-acquisition-with-one
Repo https://github.com/PaddlePaddle/XWorld
Framework none

Multi-Agent Imitation Learning for Driving Simulation

Title Multi-Agent Imitation Learning for Driving Simulation
Authors Raunak P. Bhattacharyya, Derek J. Phillips, Blake Wulfe, Jeremy Morton, Alex Kuefler, Mykel J. Kochenderfer
Abstract Simulation is an appealing option for validating the safety of autonomous vehicles. Generative Adversarial Imitation Learning (GAIL) has recently been shown to learn representative human driver models. These human driver models were learned through training in single-agent environments, but they have difficulty in generalizing to multi-agent driving scenarios. We argue these difficulties arise because observations at training and test time are sampled from different distributions. This difference makes such models unsuitable for the simulation of driving scenes, where multiple agents must interact realistically over long time horizons. We extend GAIL to address these shortcomings through a parameter-sharing approach grounded in curriculum learning. Compared with single-agent GAIL policies, policies generated by our PS-GAIL method prove superior at interacting stably in a multi-agent setting and capturing the emergent behavior of human drivers.
Tasks Autonomous Vehicles, Imitation Learning
Published 2018-03-02
URL http://arxiv.org/abs/1803.01044v1
PDF http://arxiv.org/pdf/1803.01044v1.pdf
PWC https://paperswithcode.com/paper/multi-agent-imitation-learning-for-driving
Repo https://github.com/sisl/ngsim_env
Framework tf
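
Parameter sharing itself is a one-liner in practice: a single policy network is applied row-wise to every agent's observation, so all drivers improve the same weights during curriculum training. Dimensions below are placeholders, not those used in the paper.

```python
import torch
import torch.nn as nn

# One shared policy for all agents in the scene.
policy = nn.Sequential(nn.Linear(51, 64), nn.Tanh(), nn.Linear(64, 2))
obs = torch.randn(22, 51)      # 22 agents, each with a 51-dim observation
actions = policy(obs)          # (22, 2): e.g. acceleration and turn rate per agent
```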

Deep Pictorial Gaze Estimation

Title Deep Pictorial Gaze Estimation
Authors Seonwook Park, Adrian Spurr, Otmar Hilliges
Abstract Estimating human gaze from natural eye images alone is a challenging task. Gaze direction can be defined by the pupil center and the eyeball center, where the latter is unobservable in 2D images; hence, achieving highly accurate gaze estimates is an ill-posed problem. In this paper, we introduce a novel deep neural network architecture specifically designed for the task of gaze estimation from single-eye input. Instead of directly regressing two angles for the pitch and yaw of the eyeball, we regress to an intermediate pictorial representation, which in turn simplifies the task of 3D gaze direction estimation. Our quantitative and qualitative results show that our approach achieves higher accuracy than the state of the art and is robust to variation in gaze, head pose, and image quality.
Tasks Gaze Estimation
Published 2018-07-26
URL http://arxiv.org/abs/1807.10002v1
PDF http://arxiv.org/pdf/1807.10002v1.pdf
PWC https://paperswithcode.com/paper/deep-pictorial-gaze-estimation
Repo https://github.com/swook/GazeML
Framework tf

Learning model-based strategies in simple environments with hierarchical q-networks

Title Learning model-based strategies in simple environments with hierarchical q-networks
Authors Necati Alp Muyesser, Kyle Dunovan, Timothy Verstynen
Abstract Recent advances in deep learning have allowed artificial agents to rival human-level performance on a wide range of complex tasks; however, the ability of these networks to learn generalizable strategies remains a pressing challenge. This critical limitation is due in part to two factors: the opaque information representation in deep neural networks and the complexity of the task environments in which they are typically deployed. Here we propose a novel Hierarchical Q-Network (HQN), motivated by theories of the hierarchical organization of the human prefrontal cortex, which attempts to identify lower-dimensional patterns in the value landscape that can be exploited to construct an internal model of rules in simple environments. We draw on combinatorial games, where there exists a single optimal strategy for winning that generalizes across other features of the game, to probe the strategy generalization of the HQN and other reinforcement learning (RL) agents using variations of Wythoff's game. Traditional RL approaches failed to reach satisfactory performance on variants of Wythoff's game; the HQN, however, learned heuristic-like strategies that generalized across changes in board configuration. More importantly, the HQN allowed for transparent inspection of the agent's internal model of the game following training. Our results show how a biologically inspired hierarchical learner can facilitate learning abstract rules that promote robust and flexible action policies in simplified training environments with clearly delineated optimal strategies.
Tasks
Published 2018-01-20
URL http://arxiv.org/abs/1801.06689v1
PDF http://arxiv.org/pdf/1801.06689v1.pdf
PWC https://paperswithcode.com/paper/learning-model-based-strategies-in-simple
Repo https://github.com/CoAxLab/azad
Framework pytorch
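
For context, the kind of tabular value landscape the HQN is said to find structure in can be generated with plain Q-learning on Wythoff's game (remove tokens from one pile, or the same number from both; taking the last token wins). The toy below is our own illustration with a zero-sum backup and epsilon-greedy exploration, not the HQN itself.

```python
import numpy as np

rng = np.random.default_rng(0)
MAX, ALPHA, EPS = 10, 0.2, 0.1
Q = {}                                      # (state, next_state) -> value

def moves(s):
    """All states reachable in one legal move from s = (pile1, pile2)."""
    a, b = s
    return ([(a - k, b) for k in range(1, a + 1)] +
            [(a, b - k) for k in range(1, b + 1)] +
            [(a - k, b - k) for k in range(1, min(a, b) + 1)])

for _ in range(50000):
    s = (int(rng.integers(1, MAX + 1)), int(rng.integers(1, MAX + 1)))
    while s != (0, 0):
        opts = moves(s)
        if rng.random() < EPS:
            m = opts[int(rng.integers(len(opts)))]
        else:
            m = max(opts, key=lambda n: Q.get((s, n), 0.0))
        r = 1.0 if m == (0, 0) else 0.0     # reaching (0, 0) wins
        # Zero-sum backup: the opponent moves next, so our future value is the
        # negative of their best action value from the resulting state.
        future = 0.0 if m == (0, 0) else -max(Q.get((m, n), 0.0) for n in moves(m))
        old = Q.get((s, m), 0.0)
        Q[(s, m)] = old + ALPHA * (r + future - old)
        s = m
```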