Paper Group AWR 332
Measuring the Intrinsic Dimension of Objective Landscapes
Title | Measuring the Intrinsic Dimension of Objective Landscapes |
Authors | Chunyuan Li, Heerad Farkhoor, Rosanne Liu, Jason Yosinski |
Abstract | Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times. |
Tasks | |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1804.08838v1 |
PDF | http://arxiv.org/pdf/1804.08838v1.pdf |
PWC | https://paperswithcode.com/paper/measuring-the-intrinsic-dimension-of |
Repo | https://github.com/Helsinki-NLP/shared-info |
Framework | none |
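The subspace-training procedure described in the abstract is simple enough to sketch directly: optimize d coordinates theta_d and map them into the native D-dimensional parameter space through a fixed random projection, theta = theta_0 + P theta_d; sweeping d and noting when solutions first appear yields the intrinsic dimension. Below is a minimal sketch on a toy logistic-regression problem; the task, names, and hyperparameters are illustrative, not from the paper's code.

```python
# Subspace training sketch: only theta_d (d params) is trained; the fixed
# random projection P maps it into the D-dimensional native space.
import numpy as np

rng = np.random.default_rng(0)
D, d, n = 200, 10, 500                      # native dim, subspace dim, samples

# Toy binary classification problem (illustrative stand-in for a real task).
X = rng.normal(size=(n, D))
w_true = rng.normal(size=D)
y = (X @ w_true > 0).astype(float)

theta_0 = np.zeros(D)                       # initialization in native space
P = rng.normal(size=(D, d)) / np.sqrt(D)    # fixed random projection
theta_d = np.zeros(d)                       # the only trainable parameters

def loss_and_grad(theta_d):
    w = theta_0 + P @ theta_d               # map subspace coords to native params
    p = 1.0 / (1.0 + np.exp(-(X @ w)))      # logistic predictions
    g_w = X.T @ (p - y) / n                 # cross-entropy gradient in native space
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    return loss, P.T @ g_w                  # chain rule pulls the gradient back

for step in range(500):                     # plain gradient descent in the subspace
    loss, g = loss_and_grad(theta_d)
    theta_d -= 0.5 * g
print(f"subspace dim {d}: final loss {loss:.3f}")
```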
3D RoI-aware U-Net for Accurate and Efficient Colorectal Tumor Segmentation
Title | 3D RoI-aware U-Net for Accurate and Efficient Colorectal Tumor Segmentation |
Authors | Yi-Jie Huang, Qi Dou, Zi-Xian Wang, Li-Zhi Liu, Ying Jin, Chao-Feng Li, Lisheng Wang, Hao Chen, Rui-Hua Xu |
Abstract | Segmentation of colorectal cancerous regions from 3D Magnetic Resonance (MR) images is a crucial procedure for radiotherapy, which conventionally requires accurate delineation of tumour boundaries at the expense of labor, time, and reproducibility. While deep learning-based methods serve as good baselines in 3D image segmentation tasks, the small applicable patch size limits the effective receptive field and degrades segmentation performance. In addition, localizing regions of interest (RoIs) in large whole-volume 3D images is a preceding operation that brings multiple benefits in terms of speed, target completeness, and reduction of false positives. Distinct from sliding-window or non-joint localization-segmentation models, we propose a novel multi-task framework, referred to as 3D RoI-aware U-Net (3D RU-Net), for RoI localization and in-region segmentation, where the two tasks share one backbone encoder network. With the region proposals from the encoder, we crop multi-level in-region RoI features from the encoder to form a GPU memory-efficient decoder for detail-preserving segmentation, thereby enlarging the applicable volume size and the effective receptive field. To train the model effectively, we designed a Dice-formulated loss function for the global-to-local multi-task learning procedure. Building on the efficiency gains, we further ensembled models with different receptive fields to achieve even higher performance at minor extra computational cost. Extensive experiments were conducted on 64 cancerous cases with four-fold cross-validation, and the results showed significant superiority in terms of accuracy and efficiency over conventional frameworks. In conclusion, the proposed method has great potential for extension to other 3D object segmentation tasks in medical images due to its inherent generalizability. The code for the proposed method is publicly available. |
Tasks | Multi-Task Learning, Semantic Segmentation |
Published | 2018-06-27 |
URL | http://arxiv.org/abs/1806.10342v5 |
PDF | http://arxiv.org/pdf/1806.10342v5.pdf |
PWC | https://paperswithcode.com/paper/3d-roi-aware-u-net-for-accurate-and-efficient |
Repo | https://github.com/RashmiUSC/3D-RU-Net |
Framework | pytorch |
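The abstract mentions a Dice-formulated loss for the global-to-local multi-task objective. A generic soft Dice loss of that kind is sketched below; this is a standard formulation, not the paper's exact 3D RU-Net loss.

```python
# Generic soft Dice loss for volumetric segmentation.
import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    """pred: predicted foreground probabilities; target: binary mask.
    Both arrays share a shape such as (D, H, W) for a 3D volume."""
    intersection = np.sum(pred * target)
    denom = np.sum(pred) + np.sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

# Usage on a toy volume:
pred = np.random.rand(8, 32, 32)
mask = (np.random.rand(8, 32, 32) > 0.9).astype(float)
print(soft_dice_loss(pred, mask))
```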
Learning Attractor Dynamics for Generative Memory
Title | Learning Attractor Dynamics for Generative Memory |
Authors | Yan Wu, Greg Wayne, Karol Gregor, Timothy Lillicrap |
Abstract | A central challenge faced by memory systems is the robust retrieval of a stored pattern in the presence of interference due to other stored patterns and noise. A theoretically well-founded solution to robust retrieval is given by attractor dynamics, which iteratively clean up patterns during recall. However, incorporating attractor dynamics into modern deep learning systems poses difficulties: attractor basins are characterised by vanishing gradients, which are known to make training neural networks difficult. In this work, we avoid the vanishing gradient problem by training a generative distributed memory without simulating the attractor dynamics. Based on the idea of memory writing as inference, as proposed in the Kanerva Machine, we show that a likelihood-based Lyapunov function emerges from maximising the variational lower bound of a generative memory. Experiments show that the resulting model converges to correct patterns upon iterative retrieval and achieves competitive performance as both a memory model and a generative model. |
Tasks | |
Published | 2018-11-23 |
URL | http://arxiv.org/abs/1811.09556v1 |
PDF | http://arxiv.org/pdf/1811.09556v1.pdf |
PWC | https://paperswithcode.com/paper/learning-attractor-dynamics-for-generative |
Repo | https://github.com/deepmind/dynamic-kanerva-machines |
Framework | tf |
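As a loose illustration of the abstract's Lyapunov view of retrieval: given any generative model with a differentiable log-likelihood, iteratively nudging a corrupted query uphill in log p(x) acts as attractor-style clean-up. The toy Gaussian "memory" below is a stand-in for the Kanerva Machine, purely to make the mechanism concrete.

```python
# Toy attractor-style retrieval: gradient ascent on a log-likelihood.
import torch

mu = torch.randn(16)                      # toy stored pattern (memory mean)

def log_prob(x):                          # isotropic Gaussian log-likelihood
    return -0.5 * torch.sum((x - mu) ** 2)

x = (mu + 0.8 * torch.randn(16)).requires_grad_(True)   # noisy query
for _ in range(50):                       # iterative retrieval = ascent steps
    lp = log_prob(x)
    (g,) = torch.autograd.grad(lp, x)
    with torch.no_grad():
        x += 0.1 * g                      # small steps raise log p, so -log p
                                          # behaves like a Lyapunov function here
print(torch.norm(x - mu).item())          # distance to the stored pattern shrinks
```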
Is it Safe to Drive? An Overview of Factors, Challenges, and Datasets for Driveability Assessment in Autonomous Driving
Title | Is it Safe to Drive? An Overview of Factors, Challenges, and Datasets for Driveability Assessment in Autonomous Driving |
Authors | Junyao Guo, Unmesh Kurup, Mohak Shah |
Abstract | With recent advances in learning algorithms and hardware development, autonomous cars have shown promise when operating in structured environments under good driving conditions. However, for complex, cluttered, and unseen environments with high uncertainty, autonomous driving systems still frequently demonstrate erroneous or unexpected behaviors that could lead to catastrophic outcomes. Autonomous vehicles should ideally adapt to driving conditions; while this can be achieved through multiple routes, it would be beneficial as a first step to be able to characterize driveability in some quantified form. To this end, this paper aims to create a framework for investigating the different factors that can impact driveability. Moreover, one of the main mechanisms for adapting autonomous driving systems to any driving condition is the ability to learn and generalize from representative scenarios. The machine learning algorithms that currently do so learn predominantly in a supervised manner and consequently need sufficient data for robust and efficient learning. Therefore, we also perform a comparative overview of 45 public driving datasets that enable learning, and we publish this dataset index at https://sites.google.com/view/driveability-survey-datasets. Specifically, we categorize the datasets according to use cases and highlight the datasets that capture complicated and hazardous driving conditions, which can be better used for training robust driving models. Furthermore, by discussing which driving scenarios are not covered by existing public datasets and which driveability factors need more investigation and data acquisition, this paper aims to encourage both targeted dataset collection and the proposal of novel driveability metrics that enhance the robustness of autonomous cars in adverse environments. |
Tasks | Autonomous Driving, Autonomous Vehicles |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.11277v1 |
PDF | http://arxiv.org/pdf/1811.11277v1.pdf |
PWC | https://paperswithcode.com/paper/is-it-safe-to-drive-an-overview-of-factors |
Repo | https://github.com/scaleapi/open-dataset-list |
Framework | none |
Recurrent Neural Network-Based Semantic Variational Autoencoder for Sequence-to-Sequence Learning
Title | Recurrent Neural Network-Based Semantic Variational Autoencoder for Sequence-to-Sequence Learning |
Authors | Myeongjun Jang, Seungwan Seo, Pilsung Kang |
Abstract | Sequence-to-sequence (Seq2seq) models have played an important role in the recent success of various natural language processing methods, such as machine translation, text summarization, and speech recognition. However, current Seq2seq models have trouble preserving global latent information from a long sequence of words. A variational autoencoder (VAE) alleviates this problem by learning a continuous semantic space for the input sentence, but it does not solve the problem completely. In this paper, we propose a new recurrent neural network (RNN)-based Seq2seq model, the RNN semantic variational autoencoder (RNN–SVAE), to better capture the global latent information of a sequence of words. To properly reflect the meaning of words in a sentence regardless of their position within the sentence, we construct a document information vector using the attention information between the final state of the encoder and every prior hidden state. The mean and standard deviation of the continuous semantic space are then learned from this vector, taking advantage of the variational method. By using the document information vector to find the semantic space of the sentence, it becomes possible to better capture the sentence's global latent features. Experimental results on three natural language tasks (i.e., language modeling, missing word imputation, and paraphrase identification) confirm that the proposed RNN–SVAE yields higher performance than two benchmark models. |
Tasks | Imputation, Language Modelling, Machine Translation, Paraphrase Identification, Speech Recognition, Text Summarization |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03238v2 |
PDF | http://arxiv.org/pdf/1802.03238v2.pdf |
PWC | https://paperswithcode.com/paper/recurrent-neural-network-based-semantic |
Repo | https://github.com/MJ-Jang/RNN_SVAE |
Framework | tf |
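The document information vector described in the abstract, attention between the encoder's final state and every prior hidden state, can be sketched in a few lines. Shapes and names below are illustrative, not the authors' implementation.

```python
# Position-independent summary of a sentence via attention on encoder states.
import numpy as np

def document_information_vector(H):
    """H: encoder hidden states, shape (T, d); H[-1] is the final state."""
    scores = H[:-1] @ H[-1]                     # dot-product attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over prior states
    return weights @ H[:-1]                     # weighted average, shape (d,)

H = np.random.randn(12, 64)                     # 12 timesteps, hidden size 64
v = document_information_vector(H)
print(v.shape)                                  # (64,) -> feeds the VAE mean/std
```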
JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion
Title | JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion |
Authors | Johannes Hellrich, Sven Buechel, Udo Hahn |
Abstract | We here introduce a substantially extended version of JeSemE, a website for visually exploring computationally derived time-variant information on word meaning and lexical emotion assembled from five large diachronic text corpora. JeSemE is intended as an interactive tool for scholars in the (digital) humanities who are mostly limited to consulting manually compiled dictionaries for such information, if available at all. JeSemE uniquely combines state-of-the-art distributional semantics with a nuanced model of human emotions, two information streams we deem beneficial for a data-driven interpretation of texts in the humanities. |
Tasks | |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.04148v1 |
PDF | http://arxiv.org/pdf/1807.04148v1.pdf |
PWC | https://paperswithcode.com/paper/jeseme-a-website-for-exploring-diachronic |
Repo | https://github.com/hellrich/JeSemE |
Framework | none |
How Robust is 3D Human Pose Estimation to Occlusion?
Title | How Robust is 3D Human Pose Estimation to Occlusion? |
Authors | István Sárándi, Timm Linder, Kai O. Arras, Bastian Leibe |
Abstract | Occlusion is commonplace in realistic human-robot shared environments, yet its effects are not considered in standard 3D human pose estimation benchmarks. This leaves the question open: how robust are state-of-the-art 3D pose estimation methods against partial occlusions? We study several types of synthetic occlusions over the Human3.6M dataset and find a method with state-of-the-art benchmark performance to be sensitive even to low amounts of occlusion. Addressing this issue is key to progress in applications such as collaborative and service robotics. We take a first step in this direction by improving occlusion-robustness through training data augmentation with synthetic occlusions. This also turns out to be an effective regularizer that is beneficial even for non-occluded test cases. |
Tasks | 3D Human Pose Estimation, 3D Pose Estimation, Data Augmentation, Pose Estimation |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09316v2 |
PDF | http://arxiv.org/pdf/1808.09316v2.pdf |
PWC | https://paperswithcode.com/paper/how-robust-is-3d-human-pose-estimation-to |
Repo | https://github.com/isarandi/synthetic-occlusion |
Framework | none |
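The augmentation idea is straightforward to sketch: paste a random occluder over the input image during training. The paper pastes segmented objects; the plain random rectangle below is a simplified stand-in.

```python
# Simplified synthetic-occlusion augmentation: one random opaque rectangle.
import numpy as np

def random_occlusion(img, rng, max_frac=0.3):
    """img: (H, W, C) float array; returns a copy with one occluder pasted."""
    H, W = img.shape[:2]
    h = rng.integers(1, int(H * max_frac) + 1)
    w = rng.integers(1, int(W * max_frac) + 1)
    top = rng.integers(0, H - h + 1)
    left = rng.integers(0, W - w + 1)
    out = img.copy()
    out[top:top + h, left:left + w] = rng.random(3)   # random flat color
    return out

rng = np.random.default_rng(0)
img = np.zeros((256, 256, 3))
aug = random_occlusion(img, rng)
```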
Dynamic Graph Representation Learning via Self-Attention Networks
Title | Dynamic Graph Representation Learning via Self-Attention Networks |
Authors | Aravind Sankar, Yanhong Wu, Liang Gou, Wei Zhang, Hao Yang |
Abstract | Learning latent representations of nodes in graphs is an important and ubiquitous task with widespread applications such as link prediction, node classification, and graph visualization. Previous methods on graph representation learning mainly focus on static graphs; however, many real-world graphs are dynamic and evolve over time. In this paper, we present Dynamic Self-Attention Network (DySAT), a novel neural architecture that operates on dynamic graphs and learns node representations that capture both structural properties and temporal evolutionary patterns. Specifically, DySAT computes node representations by jointly employing self-attention layers along two dimensions: structural neighborhood and temporal dynamics. We conduct link prediction experiments on two classes of graphs: communication networks and bipartite rating networks. Our experimental results show that DySAT has a significant performance gain over several different state-of-the-art graph embedding baselines. |
Tasks | Graph Embedding, Graph Representation Learning, Link Prediction, Node Classification, Representation Learning |
Published | 2018-12-22 |
URL | https://arxiv.org/abs/1812.09430v2 |
PDF | https://arxiv.org/pdf/1812.09430v2.pdf |
PWC | https://paperswithcode.com/paper/dynamic-graph-representation-learning-via |
Repo | https://github.com/aravindsankar28/DySAT |
Framework | tf |
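As a toy sketch of the temporal half of DySAT's design, the block below applies masked single-head self-attention across one node's embeddings at successive snapshots; the projections and masking scheme are generic, not the paper's exact architecture.

```python
# Single-head causal self-attention over one node's per-snapshot embeddings.
import numpy as np

def temporal_self_attention(Z, Wq, Wk, Wv):
    """Z: (T, d) embeddings of one node over T snapshots."""
    Q, K, V = Z @ Wq, Z @ Wk, Z @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    # Causal mask: each snapshot attends only to itself and the past.
    scores = np.tril(scores) + np.triu(np.full_like(scores, -1e9), 1)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V                           # (T, d): time-aware embeddings

rng = np.random.default_rng(0)
T, d = 6, 32
Z = rng.normal(size=(T, d))
W = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
print(temporal_self_attention(Z, *W).shape)      # (6, 32)
```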
Personalized Gaussian Processes for Forecasting of Alzheimer’s Disease Assessment Scale-Cognition Sub-Scale (ADAS-Cog13)
Title | Personalized Gaussian Processes for Forecasting of Alzheimer’s Disease Assessment Scale-Cognition Sub-Scale (ADAS-Cog13) |
Authors | Yuria Utsumi, Ognjen Rudovic, Kelly Peterson, Ricardo Guerrero, Rosalind W. Picard |
Abstract | In this paper, we introduce the use of a personalized Gaussian Process model (pGP) to predict per-patient changes in ADAS-Cog13 – a significant predictor of Alzheimer’s Disease (AD) in the cognitive domain – using data from each patient’s previous visits, and testing on future (held-out) data. We start by learning a population-level model from multi-modal data of previously seen patients using base Gaussian Process (GP) regression. The personalized GP (pGP) is formed by adapting the base GP sequentially over time to a new (target) patient using domain-adaptive GPs. We extend this personalized approach to predict the values of ADAS-Cog13 over the future 6, 12, 18, and 24 months. We compare this approach to a GP model trained only on past data of the target patients (tGP), as well as to a new approach that combines pGP with tGP. We find that the new approach, combining pGP with tGP, leads to large improvements in accurately forecasting future ADAS-Cog13 scores. |
Tasks | Gaussian Processes |
Published | 2018-02-22 |
URL | http://arxiv.org/abs/1802.08561v4 |
PDF | http://arxiv.org/pdf/1802.08561v4.pdf |
PWC | https://paperswithcode.com/paper/personalized-gaussian-processes-for |
Repo | https://github.com/yuriautsumi/PersonalizedGP |
Framework | tf |
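A hedged sketch of the population-then-personalized recipe from the abstract, using scikit-learn's GP regressor as a stand-in for the paper's domain-adaptive GP machinery. The data, features, and the simple averaging used to combine pGP and tGP are invented for illustration.

```python
# Population GP + target-patient GP, combined by averaging predictive means.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)

# Population model: visits from previously seen patients (toy data).
X_pop = rng.normal(size=(200, 5))              # 5 invented biomarker features
y_pop = X_pop @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
base_gp = GaussianProcessRegressor(kernel=kernel).fit(X_pop, y_pop)

# Target-patient model: trained only on that patient's past visits (tGP).
X_tgt, y_tgt = rng.normal(size=(4, 5)), rng.normal(size=4)
t_gp = GaussianProcessRegressor(kernel=kernel).fit(X_tgt, y_tgt)

# A crude pGP+tGP combination: average the two predictive means.
X_future = rng.normal(size=(1, 5))
pred = 0.5 * (base_gp.predict(X_future) + t_gp.predict(X_future))
print(pred)
```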
On the Spectrum of Random Features Maps of High Dimensional Data
Title | On the Spectrum of Random Features Maps of High Dimensional Data |
Authors | Zhenyu Liao, Romain Couillet |
Abstract | Random feature maps are ubiquitous in modern statistical machine learning, where they generalize random projections by means of powerful, yet often difficult to analyze, nonlinear operators. In this paper, we leverage the “concentration” phenomenon induced by random matrix theory to perform a spectral analysis on the Gram matrix of these random feature maps, here for Gaussian mixture models of simultaneously large dimension and size. Our results are instrumental to a deeper understanding of the interplay between the nonlinearity and the statistics of the data, thereby allowing for a better tuning of random feature-based techniques. |
Tasks | |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11916v2 |
PDF | http://arxiv.org/pdf/1805.11916v2.pdf |
PWC | https://paperswithcode.com/paper/on-the-spectrum-of-random-features-maps-of |
Repo | https://github.com/Zhenyu-LIAO/RMT4RFM |
Framework | none |
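The abstract's setting is easy to reproduce empirically: draw high-dimensional Gaussian-mixture data, push it through a random nonlinear feature map, and inspect the spectrum of the resulting Gram matrix. A short numpy experiment with arbitrary sizes:

```python
# Spectrum of the Gram matrix of a random ReLU feature map on mixture data.
import numpy as np

rng = np.random.default_rng(0)
p, n, N = 256, 512, 1024                 # data dim, samples, random features

# Two-component Gaussian mixture with shifted means.
mu = np.zeros(p)
mu[0] = 2.0
X = np.concatenate([rng.normal(size=(n // 2, p)) + mu,
                    rng.normal(size=(n // 2, p)) - mu])

W = rng.normal(size=(N, p)) / np.sqrt(p) # random projection
S = np.maximum(W @ X.T, 0.0)             # ReLU feature map, shape (N, n)
G = S.T @ S / N                          # Gram matrix of the features

eigs = np.linalg.eigvalsh(G)
print(eigs[-5:])                         # isolated spikes carry the class info
```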
Digging Into Self-Supervised Monocular Depth Estimation
Title | Digging Into Self-Supervised Monocular Depth Estimation |
Authors | Clément Godard, Oisin Mac Aodha, Michael Firman, Gabriel Brostow |
Abstract | Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully-supervised methods. We show that a surprisingly simple model, and associated design choices, lead to superior predictions. In particular, we propose (i) a minimum reprojection loss, designed to robustly handle occlusions, (ii) a full-resolution multi-scale sampling method that reduces visual artifacts, and (iii) an auto-masking loss to ignore training pixels that violate camera motion assumptions. We demonstrate the effectiveness of each component in isolation, and show high quality, state-of-the-art results on the KITTI benchmark. |
Tasks | Depth Estimation, Image Reconstruction, Motion Estimation, Scene Understanding |
Published | 2018-06-04 |
URL | https://arxiv.org/abs/1806.01260v4 |
PDF | https://arxiv.org/pdf/1806.01260v4.pdf |
PWC | https://paperswithcode.com/paper/digging-into-self-supervised-monocular-depth |
Repo | https://github.com/FangGet/tf-monodepth2 |
Framework | tf |
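Improvement (i) from the abstract, the minimum reprojection loss, takes the per-pixel minimum of the photometric error over source frames rather than the average, so a pixel occluded in one source frame falls back to its better match in another. A minimal sketch using plain L1 error (Monodepth2 itself combines SSIM with L1):

```python
# Per-pixel minimum reprojection loss over multiple warped source frames.
import numpy as np

def min_reprojection_loss(target, warped_sources):
    """target: (H, W, C); warped_sources: list of (H, W, C) reprojections."""
    errors = [np.abs(target - w).mean(axis=-1) for w in warped_sources]  # L1
    per_pixel = np.min(np.stack(errors, axis=0), axis=0)  # min over sources
    return per_pixel.mean()

tgt = np.random.rand(64, 64, 3)
srcs = [np.random.rand(64, 64, 3) for _ in range(2)]
print(min_reprojection_loss(tgt, srcs))
```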
Interactive Language Acquisition with One-shot Visual Concept Learning through a Conversational Game
Title | Interactive Language Acquisition with One-shot Visual Concept Learning through a Conversational Game |
Authors | Haichao Zhang, Haonan Yu, Wei Xu |
Abstract | Building intelligent agents that can communicate with and learn from humans in natural language is of great value. Supervised language learning is limited in that it mainly captures the statistics of the training data, and it is hardly adaptive to new scenarios or flexible enough to acquire new knowledge without inefficient retraining or catastrophic forgetting. We highlight the perspective that conversational interaction serves as a natural interface both for language learning and for novel knowledge acquisition, and we propose a joint imitation and reinforcement approach for grounded language learning through an interactive conversational game. The agent trained with this approach is able to actively acquire information by asking questions about novel objects and to use the just-learned knowledge in subsequent conversations in a one-shot fashion. Comparisons with other methods verify the effectiveness of the proposed approach. |
Tasks | Language Acquisition |
Published | 2018-04-26 |
URL | http://arxiv.org/abs/1805.00462v1 |
PDF | http://arxiv.org/pdf/1805.00462v1.pdf |
PWC | https://paperswithcode.com/paper/interactive-language-acquisition-with-one |
Repo | https://github.com/PaddlePaddle/XWorld |
Framework | none |
Multi-Agent Imitation Learning for Driving Simulation
Title | Multi-Agent Imitation Learning for Driving Simulation |
Authors | Raunak P. Bhattacharyya, Derek J. Phillips, Blake Wulfe, Jeremy Morton, Alex Kuefler, Mykel J. Kochenderfer |
Abstract | Simulation is an appealing option for validating the safety of autonomous vehicles. Generative Adversarial Imitation Learning (GAIL) has recently been shown to learn representative human driver models. These human driver models were learned through training in single-agent environments, but they have difficulty in generalizing to multi-agent driving scenarios. We argue these difficulties arise because observations at training and test time are sampled from different distributions. This difference makes such models unsuitable for the simulation of driving scenes, where multiple agents must interact realistically over long time horizons. We extend GAIL to address these shortcomings through a parameter-sharing approach grounded in curriculum learning. Compared with single-agent GAIL policies, policies generated by our PS-GAIL method prove superior at interacting stably in a multi-agent setting and capturing the emergent behavior of human drivers. |
Tasks | Autonomous Vehicles, Imitation Learning |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.01044v1 |
PDF | http://arxiv.org/pdf/1803.01044v1.pdf |
PWC | https://paperswithcode.com/paper/multi-agent-imitation-learning-for-driving |
Repo | https://github.com/sisl/ngsim_env |
Framework | tf |
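The parameter-sharing idea at the core of PS-GAIL can be reduced to a skeleton: every agent in the scene is stepped with the same policy, so experience from all agents trains one set of weights. The linear "policy" below is a hypothetical placeholder, not the NGSIM setup.

```python
# Parameter sharing: one policy's weights serve every agent in the scene.
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim, n_agents = 8, 2, 5
W = rng.normal(size=(obs_dim, act_dim)) * 0.1    # one shared linear policy

def shared_policy(obs_batch):
    """obs_batch: (n_agents, obs_dim) -> actions (n_agents, act_dim)."""
    return obs_batch @ W                         # same weights for all agents

obs = rng.normal(size=(n_agents, obs_dim))
actions = shared_policy(obs)                     # one forward pass, all agents
print(actions.shape)                             # (5, 2)
```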
Deep Pictorial Gaze Estimation
Title | Deep Pictorial Gaze Estimation |
Authors | Seonwook Park, Adrian Spurr, Otmar Hilliges |
Abstract | Estimating human gaze from natural eye images only is a challenging task. Gaze direction can be defined by the pupil center and the eyeball center, where the latter is unobservable in 2D images. Hence, achieving highly accurate gaze estimates is an ill-posed problem. In this paper, we introduce a novel deep neural network architecture specifically designed for the task of gaze estimation from single eye input. Instead of directly regressing two angles for the pitch and yaw of the eyeball, we regress to an intermediate pictorial representation, which in turn simplifies the task of 3D gaze direction estimation. Our quantitative and qualitative results show that our approach achieves higher accuracies than the state-of-the-art and is robust to variation in gaze, head pose, and image quality. |
Tasks | Gaze Estimation |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10002v1 |
PDF | http://arxiv.org/pdf/1807.10002v1.pdf |
PWC | https://paperswithcode.com/paper/deep-pictorial-gaze-estimation |
Repo | https://github.com/swook/GazeML |
Framework | tf |
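One way to see why an intermediate pictorial representation simplifies regression: a spatial map can be reduced to coordinates with a differentiable soft-argmax, so the network's job becomes drawing a map rather than directly predicting angles. The generic soft-argmax below is illustrative, not the paper's gazemap decoder.

```python
# Differentiable soft-argmax: expected (row, col) under a softmax of the map.
import numpy as np

def soft_argmax2d(heatmap, beta=10.0):
    """heatmap: (H, W) scores -> expected (row, col) coordinates."""
    H, W = heatmap.shape
    p = np.exp(beta * (heatmap - heatmap.max()))
    p /= p.sum()
    rows, cols = np.mgrid[0:H, 0:W]
    return (p * rows).sum(), (p * cols).sum()

hm = np.zeros((36, 60))
hm[20, 45] = 1.0
print(soft_argmax2d(hm))                 # approximately (20, 45)
```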
Learning model-based strategies in simple environments with hierarchical q-networks
Title | Learning model-based strategies in simple environments with hierarchical q-networks |
Authors | Necati Alp Muyesser, Kyle Dunovan, Timothy Verstynen |
Abstract | Recent advances in deep learning have allowed artificial agents to rival human-level performance on a wide range of complex tasks; however, the ability of these networks to learn generalizable strategies remains a pressing challenge. This critical limitation is due in part to two factors: the opaque information representation in deep neural networks and the complexity of the task environments in which they are typically deployed. Here we propose a novel Hierarchical Q-Network (HQN), motivated by theories of the hierarchical organization of the human prefrontal cortex, that attempts to identify lower-dimensional patterns in the value landscape that can be exploited to construct an internal model of the rules of simple environments. We draw on combinatorial games, where there exists a single optimal strategy for winning that generalizes across other features of the game, to probe the strategy generalization of the HQN and other reinforcement learning (RL) agents using variations of Wythoff’s game. Traditional RL approaches failed to reach satisfactory performance on variants of Wythoff’s game; however, the HQN learned heuristic-like strategies that generalized across changes in board configuration. More importantly, the HQN allowed for transparent inspection of the agent’s internal model of the game following training. Our results show how a biologically inspired hierarchical learner can facilitate learning abstract rules to promote robust and flexible action policies in simplified training environments with clearly delineated optimal strategies. |
Tasks | |
Published | 2018-01-20 |
URL | http://arxiv.org/abs/1801.06689v1 |
PDF | http://arxiv.org/pdf/1801.06689v1.pdf |
PWC | https://paperswithcode.com/paper/learning-model-based-strategies-in-simple |
Repo | https://github.com/CoAxLab/azad |
Framework | pytorch |
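For context on the "single optimal strategy" the abstract probes: in Wythoff's game the losing (cold) positions are the pairs (floor(n*phi), floor(n*phi^2)) with phi the golden ratio, a rule that generalizes across board sizes. A tiny sketch of that rule:

```python
# Wythoff's game cold positions via the golden ratio.
import math

phi = (1 + math.sqrt(5)) / 2

def is_cold(a, b):
    """True if (a, b) is a losing position for the player to move."""
    a, b = min(a, b), max(a, b)
    n = b - a                               # cold pairs differ by exactly n
    return a == math.floor(n * phi)

print([(a, b) for a in range(8) for b in range(a, 8) if is_cold(a, b)])
# -> [(0, 0), (1, 2), (3, 5), (4, 7)]
```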