February 1, 2020

Paper Group AWR 329

Attentive Single-Tasking of Multiple Tasks

Title Attentive Single-Tasking of Multiple Tasks
Authors Kevis-Kokitsi Maninis, Ilija Radosavovic, Iasonas Kokkinos
Abstract In this work we address task interference in universal networks by considering that a network is trained on multiple tasks, but performs one task at a time, an approach we refer to as “single-tasking multiple tasks”. The network thus modifies its behaviour through task-dependent feature adaptation, or task attention. This gives the network the ability to accentuate the features that are adapted to a task, while shunning irrelevant ones. We further reduce task interference by forcing the task gradients to be statistically indistinguishable through adversarial training, ensuring that the common backbone architecture serving all tasks is not dominated by any of the task-specific gradients. Results in three multi-task dense labelling problems consistently show: (i) a large reduction in the number of parameters while preserving, or even improving performance and (ii) a smooth trade-off between computation and multi-task accuracy. We provide our system’s code and pre-trained models at http://vision.ee.ethz.ch/~kmaninis/astmt/.
Tasks
Published 2019-04-18
URL http://arxiv.org/abs/1904.08918v1
PDF http://arxiv.org/pdf/1904.08918v1.pdf
PWC https://paperswithcode.com/paper/attentive-single-tasking-of-multiple-tasks
Repo https://github.com/facebookresearch/astmt
Framework pytorch

Graphical model inference: Sequential Monte Carlo meets deterministic approximations

Title Graphical model inference: Sequential Monte Carlo meets deterministic approximations
Authors Fredrik Lindsten, Jouni Helske, Matti Vihola
Abstract Approximate inference in probabilistic graphical models (PGMs) can be grouped into deterministic methods and Monte-Carlo-based methods. The former can often provide accurate and rapid inferences, but are typically associated with biases that are hard to quantify. The latter enjoy asymptotic consistency, but can suffer from high computational costs. In this paper we present a way of bridging the gap between deterministic and stochastic inference. Specifically, we suggest an efficient sequential Monte Carlo (SMC) algorithm for PGMs which can leverage the output from deterministic inference methods. While generally applicable, we show explicitly how this can be done with loopy belief propagation, expectation propagation, and Laplace approximations. The resulting algorithm can be viewed as a post-correction of the biases associated with these methods and, indeed, numerical results show clear improvements over the baseline deterministic methods as well as over “plain” SMC.
Tasks
Published 2019-01-08
URL http://arxiv.org/abs/1901.02374v1
PDF http://arxiv.org/pdf/1901.02374v1.pdf
PWC https://paperswithcode.com/paper/graphical-model-inference-sequential-monte
Repo https://github.com/freli005/smc-pgm-twist
Framework none

Explaining individual predictions when features are dependent: More accurate approximations to Shapley values

Title Explaining individual predictions when features are dependent: More accurate approximations to Shapley values
Authors Kjersti Aas, Martin Jullum, Anders Løland
Abstract Explaining complex or seemingly simple machine learning models is an important practical problem. We want to explain individual predictions from a complex machine learning model by learning simple, interpretable explanations. The Shapley value is a game-theoretic concept that can be used for this purpose. The Shapley value framework has a series of desirable theoretical properties, and can in principle handle any predictive model. Kernel SHAP is a computationally efficient approximation to Shapley values in higher dimensions. Like several other existing methods, this approach assumes that the features are independent, which may give very wrong explanations. This is the case even if a simple linear model is used for predictions. In this paper, we extend the Kernel SHAP method to handle dependent features. We provide several examples of linear and non-linear models with various degrees of feature dependence, where our method gives more accurate approximations to the true Shapley values. We also propose a method for aggregating individual Shapley values, such that the prediction can be explained by groups of dependent variables.
Tasks
Published 2019-03-25
URL https://arxiv.org/abs/1903.10464v3
PDF https://arxiv.org/pdf/1903.10464v3.pdf
PWC https://paperswithcode.com/paper/explaining-individual-predictions-when
Repo https://github.com/NorskRegnesentral/shapr
Framework none
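
To make the game-theoretic definition mentioned in the abstract concrete, here is a minimal sketch of an exact Shapley value computation for a toy three-feature model. It is not Kernel SHAP and not the authors' method: the function names and the toy model are hypothetical, and absent features are simply set to a baseline of 0, which is exactly the independence-style assumption the paper argues can mislead when features are dependent.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_players, value):
    """Exact Shapley values for a game given by value(frozenset) -> float."""
    players = range(n_players)
    phi = [0.0] * n_players
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_players - size - 1)
                      / factorial(n_players))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of player i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(subset):
    """Hypothetical model f(x) = 2*x0 + x1 + 3*x0*x2, explained at x = (1, 1, 1).

    Features outside the coalition are replaced by a baseline of 0.
    """
    x = [1.0 if j in subset else 0.0 for j in range(3)]
    return 2 * x[0] + 1 * x[1] + 3 * x[0] * x[2]


phi = shapley_values(3, toy_value)
```

In this toy game the interaction term is split equally between the two features involved, and the values sum to the difference between the full prediction and the baseline prediction. The cost grows exponentially with the number of features, which is why approximations are needed in higher dimensions.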

Eliciting New Wikipedia Users’ Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start

Title Eliciting New Wikipedia Users’ Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start
Authors Ramtin Yazdanian, Leila Zia, Jonathan Morgan, Bahodir Mansurov, Robert West
Abstract Every day, thousands of users sign up as new Wikipedia contributors. Once joined, these users have to decide which articles to contribute to, which users to seek out and learn from or collaborate with, etc. Any such task is a hard and potentially frustrating one given the sheer size of Wikipedia. Supporting newcomers in their first steps by recommending articles they would enjoy editing or editors they would enjoy collaborating with is thus a promising route toward converting them into long-term contributors. Standard recommender systems, however, rely on users’ histories of previous interactions with the platform. As such, these systems cannot make high-quality recommendations to newcomers without any previous interactions – the so-called cold-start problem. The present paper addresses the cold-start problem on Wikipedia by developing a method for automatically building short questionnaires that, when completed by a newly registered Wikipedia user, can be used for a variety of purposes, including article recommendations that can help new editors get started. Our questionnaires are constructed based on the text of Wikipedia articles as well as the history of contributions by the already onboarded Wikipedia editors. We assess the quality of our questionnaire-based recommendations in an offline evaluation using historical data, as well as an online evaluation with hundreds of real Wikipedia newcomers, concluding that our method provides cohesive, human-readable questions that perform well against several baselines. By addressing the cold-start problem, this work can help with the sustainable growth and maintenance of Wikipedia’s diverse editor community.
Tasks Recommendation Systems
Published 2019-04-08
URL http://arxiv.org/abs/1904.03889v1
PDF http://arxiv.org/pdf/1904.03889v1.pdf
PWC https://paperswithcode.com/paper/eliciting-new-wikipedia-users-interests-via
Repo https://github.com/RamtinYazdanian/wikipedia_coldstart_questionnaire
Framework tf

GAN-Tree: An Incrementally Learned Hierarchical Generative Framework for Multi-Modal Data Distributions

Title GAN-Tree: An Incrementally Learned Hierarchical Generative Framework for Multi-Modal Data Distributions
Authors Jogendra Nath Kundu, Maharshi Gor, Dakshit Agrawal, R. Venkatesh Babu
Abstract Despite the remarkable success of generative adversarial networks, their performance seems less impressive for diverse training sets, requiring learning of discontinuous mapping functions. Though multi-mode prior or multi-generator models have been proposed to alleviate this problem, such approaches may fail depending on the empirically chosen initial mode components. In contrast to such bottom-up approaches, we present GAN-Tree, which follows a hierarchical divisive strategy to address such discontinuous multi-modal data. Devoid of any assumption on the number of modes, GAN-Tree utilizes a novel mode-splitting algorithm to effectively split the parent mode into semantically cohesive child modes, facilitating unsupervised clustering. Further, it also enables incremental addition of new data modes to an already trained GAN-Tree, by updating only a single branch of the tree structure. Compared to prior approaches, the proposed framework offers a higher degree of flexibility in choosing a large variety of mutually exclusive and exhaustive tree nodes called GAN-Set. Extensive experiments on synthetic and natural image datasets including ImageNet demonstrate the superiority of GAN-Tree over the prior state of the art.
Tasks
Published 2019-08-11
URL https://arxiv.org/abs/1908.03919v3
PDF https://arxiv.org/pdf/1908.03919v3.pdf
PWC https://paperswithcode.com/paper/gan-tree-an-incrementally-learned
Repo https://github.com/maharshi95/GANTree
Framework pytorch

Model-agnostic Approaches to Handling Noisy Labels When Training Sound Event Classifiers

Title Model-agnostic Approaches to Handling Noisy Labels When Training Sound Event Classifiers
Authors Eduardo Fonseca, Frederic Font, Xavier Serra
Abstract Label noise is emerging as a pressing issue in sound event classification. This arises as we move towards larger datasets that are difficult to annotate manually, but it is even more severe if datasets are collected automatically from online repositories, where labels are inferred through automated heuristics applied to the audio content or metadata. While learning from noisy labels has been an active area of research in computer vision, it has received little attention in sound event classification. Most recent computer vision approaches against label noise are relatively complex, requiring complex networks or extra data resources. In this work, we evaluate simple and efficient model-agnostic approaches to handling noisy labels when training sound event classifiers, namely label smoothing regularization, mixup and noise-robust loss functions. The main advantage of these methods is that they can be easily incorporated into existing deep learning pipelines without the need for network modifications or extra resources. We report results from experiments conducted with the FSDnoisy18k dataset. We show that these simple methods can be effective in mitigating the effect of label noise, providing an accuracy boost of up to 2.5% when incorporated into two different CNNs, while requiring minimal intervention and computational overhead.
Tasks
Published 2019-10-26
URL https://arxiv.org/abs/1910.12004v1
PDF https://arxiv.org/pdf/1910.12004v1.pdf
PWC https://paperswithcode.com/paper/model-agnostic-approaches-to-handling-noisy
Repo https://github.com/edufonseca/waspaa19
Framework tf
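
Two of the techniques named in the abstract, label smoothing and mixup, can be sketched in a few lines. This is a minimal illustration of the textbook forms, not the authors' code: the function names, the uniform smoothing target, and the default values of `epsilon` and `alpha` are assumptions made here, and the paper may use different variants or settings.

```python
import random


def smooth_labels(one_hot, epsilon=0.1):
    """Uniform label smoothing: move a fraction epsilon of the mass to all classes."""
    k = len(one_hot)
    return [(1 - epsilon) * y + epsilon / k for y in one_hot]


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Mix two examples and their labels with a Beta(alpha, alpha) coefficient."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


rng = random.Random(0)  # fixed seed so the sketch is repeatable
smoothed = smooth_labels([0.0, 1.0, 0.0, 0.0])
mixed_x, mixed_y, lam = mixup([1.0, 2.0], [1.0, 0.0], [3.0, 4.0], [0.0, 1.0], rng=rng)
```

Both functions only touch the inputs and targets, which is why such methods can be dropped into an existing training loop without changing the network itself.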

HIGAN: Cosmic Neutral Hydrogen with Generative Adversarial Networks

Title HIGAN: Cosmic Neutral Hydrogen with Generative Adversarial Networks
Authors Juan Zamudio-Fernandez, Atakan Okan, Francisco Villaescusa-Navarro, Seda Bilaloglu, Asena Derin Cengiz, Siyu He, Laurence Perreault Levasseur, Shirley Ho
Abstract One of the most promising ways to observe the Universe is by detecting the 21cm emission from cosmic neutral hydrogen (HI) through radio-telescopes. Those observations can shed light on fundamental astrophysical questions only if accurate theoretical predictions are available. In order to maximize the scientific return of these surveys, those predictions need to include different observables and be precise on non-linear scales. Currently, one of the best ways to achieve this is via cosmological hydrodynamic simulations; however, the computational cost of these simulations is high – tens of millions of CPU hours. In this work, we use Wasserstein Generative Adversarial Networks (WGANs) to generate new high-resolution ($35~h^{-1}{\rm kpc}$) 3D realizations of cosmic HI at $z=5$. We do so by sampling from a 100-dimensional manifold, learned by the generator, that characterizes the fully non-linear abundance and clustering of cosmic HI from the state-of-the-art simulation IllustrisTNG. We show that different statistical properties of the produced samples – 1D PDF, power spectrum, bispectrum, and void size function – closely match those of IllustrisTNG, and outperform state-of-the-art models such as Halo Occupation Distributions (HODs). Our WGAN samples reproduce the abundance of HI across 9 orders of magnitude, from the Ly$\alpha$ forest to Damped Lyman Absorbers. WGAN can produce new samples orders of magnitude faster than hydrodynamic simulations.
Tasks
Published 2019-04-29
URL http://arxiv.org/abs/1904.12846v1
PDF http://arxiv.org/pdf/1904.12846v1.pdf
PWC https://paperswithcode.com/paper/higan-cosmic-neutral-hydrogen-with-generative
Repo https://github.com/jjzamudio/HIGAN
Framework pytorch

Predicting Long-Term Skeletal Motions by a Spatio-Temporal Hierarchical Recurrent Network

Title Predicting Long-Term Skeletal Motions by a Spatio-Temporal Hierarchical Recurrent Network
Authors Junfeng Hu, Zhencheng Fan, Jun Liao, Li Liu
Abstract The primary goal of skeletal motion prediction is to generate future motion by observing a sequence of 3D skeletons. A key challenge in motion prediction is the fact that a motion can often be performed in several different ways, with each consisting of its own configuration of poses and their spatio-temporal dependencies, and as a result, the predicted poses often converge to motionless poses or non-human-like motions in long-term prediction. This leads us to define a hierarchical recurrent network model that explicitly characterizes these internal configurations of poses and their local and global spatio-temporal dependencies. The model introduces a latent vector variable from the Lie algebra to represent spatial and temporal relations simultaneously. Furthermore, a structured stack LSTM-based decoder is devised to decode the predicted poses with a new loss function defined to estimate the quantized weight of each body part in a pose. Empirical evaluations on benchmark datasets suggest our approach significantly outperforms the state-of-the-art methods on both short-term and long-term motion prediction.
Tasks motion prediction
Published 2019-11-06
URL https://arxiv.org/abs/1911.02404v3
PDF https://arxiv.org/pdf/1911.02404v3.pdf
PWC https://paperswithcode.com/paper/predicting-long-term-skeletal-motions-by-a
Repo https://github.com/p0werHu/articulated-objects-motion-prediction
Framework pytorch

Segmentation of lesioned brain anatomy with deep volumetric neural networks and multiple spatial priors achieves human-level performance

Title Segmentation of lesioned brain anatomy with deep volumetric neural networks and multiple spatial priors achieves human-level performance
Authors Lukas Hirsch, Yu Huang, Lucas C Parra
Abstract Conventional automated segmentation of MRI of the brain and head distinguishes different tissues based on image intensities and prior tissue probability maps (TPM). This works well for normal head anatomies, but fails in the presence of unexpected lesions. Deep convolutional neural networks instead leverage spatial patterns and can learn to segment lesions, but have thus far not leveraged prior probabilities. Here we add to a three-dimensional convolutional network spatial priors with a TPM, morphological priors with conditional random fields, and context with a wider field-of-view at lower resolution. We train and test these networks on images of 43 stroke patients and 4 healthy individuals which have been manually segmented. The analysis demonstrates the benefits of leveraging the three sources of prior information. We also provide an out-of-sample validation and clinical application of the approach on an additional 47 patients with disorders of consciousness. Importantly, we demonstrate that the new architecture, which we call MultiPrior network, surpasses the performance of expert human segmenters. We make the code and trained networks freely available.
Tasks
Published 2019-05-24
URL https://arxiv.org/abs/1905.10010v3
PDF https://arxiv.org/pdf/1905.10010v3.pdf
PWC https://paperswithcode.com/paper/tissue-segmentation-with-deep-3d-networks-and
Repo https://github.com/lkshrsch/MultiPrior_Brain
Framework none

Loss Landscapes of Regularized Linear Autoencoders

Title Loss Landscapes of Regularized Linear Autoencoders
Authors Daniel Kunin, Jonathan M. Bloom, Aleksandrina Goeva, Cotton Seed
Abstract Autoencoders are a deep learning model for representation learning. When trained to minimize the distance between the data and its reconstruction, linear autoencoders (LAEs) learn the subspace spanned by the top principal directions but cannot learn the principal directions themselves. In this paper, we prove that $L_2$-regularized LAEs are symmetric at all critical points and learn the principal directions as the left singular vectors of the decoder. We smoothly parameterize the critical manifold and relate the minima to the MAP estimate of probabilistic PCA. We illustrate these results empirically and consider implications for PCA algorithms, computational neuroscience, and the algebraic topology of learning.
Tasks Representation Learning
Published 2019-01-23
URL https://arxiv.org/abs/1901.08168v2
PDF https://arxiv.org/pdf/1901.08168v2.pdf
PWC https://paperswithcode.com/paper/loss-landscapes-of-regularized-linear
Repo https://github.com/danielkunin/Regularized-Linear-Autoencoders
Framework tf
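
For readers who prefer the objective in symbols, one common way to write an $L_2$-regularized linear autoencoder loss is sketched below. The notation is assumed here for illustration and does not appear in the abstract: $X$ is the data matrix, $W_1$ the encoder, $W_2$ the decoder, and $\lambda$ the regularization strength.

```latex
\mathcal{L}_{\lambda}(W_1, W_2)
  = \lVert X - W_2 W_1 X \rVert_F^2
  + \lambda \left( \lVert W_1 \rVert_F^2 + \lVert W_2 \rVert_F^2 \right)
```

Setting $\lambda = 0$ gives the plain reconstruction loss which, per the abstract, recovers only the subspace spanned by the top principal directions rather than the directions themselves.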

Convolution with even-sized kernels and symmetric padding

Title Convolution with even-sized kernels and symmetric padding
Authors Shuang Wu, Guanrui Wang, Pei Tang, Feng Chen, Luping Shi
Abstract Compact convolutional neural networks gain efficiency mainly through depthwise convolutions, expanded channels and complex topologies, which, conversely, aggravate the training process. Besides, 3x3 kernels dominate the spatial representation in these models, whereas even-sized kernels (2x2, 4x4) are rarely adopted. In this work, we quantify the shift problem that occurs in even-sized kernel convolutions by an information erosion hypothesis, and eliminate it by proposing symmetric padding on four sides of the feature maps (C2sp, C4sp). Symmetric padding releases the generalization capabilities of even-sized kernels at little computational cost, making them outperform 3x3 kernels in image classification and generation tasks. Moreover, C2sp obtains comparable accuracy to emerging compact models with much less memory and time consumption during training. Symmetric padding coupled with even-sized convolutions can be neatly implemented in existing frameworks, providing effective elements for architecture designs, especially in online and continual learning settings where training efforts are emphasized.
Tasks Continual Learning, Image Classification
Published 2019-03-20
URL https://arxiv.org/abs/1903.08385v2
PDF https://arxiv.org/pdf/1903.08385v2.pdf
PWC https://paperswithcode.com/paper/convolution-with-even-sized-kernels-and
Repo https://github.com/boluoweifenda/CNN
Framework tf
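
The shift problem described in the abstract can be illustrated with a 1D analogue. This is a minimal sketch, not the authors' C2sp implementation: the helper names `conv2_tap` and `centre_of_mass` are hypothetical, and averaging a left-padded and a right-padded branch is an assumed simplification of padding different groups of feature maps on different sides.

```python
def conv2_tap(signal, pad_side):
    """Apply a 2-tap averaging kernel with one zero of padding on one side."""
    if pad_side == "left":
        padded = [0.0] + signal
    elif pad_side == "right":
        padded = signal + [0.0]
    else:
        raise ValueError("pad_side must be 'left' or 'right'")
    # Output has the same length as the input ("same" convolution).
    return [0.5 * padded[i] + 0.5 * padded[i + 1] for i in range(len(signal))]


def centre_of_mass(values):
    """Position of the response, used here to measure the spatial shift."""
    total = sum(values)
    return sum(i * v for i, v in enumerate(values)) / total


impulse = [0.0] * 7
impulse[3] = 1.0  # a single activation at index 3

left = conv2_tap(impulse, "left")    # response drifts towards higher indices
right = conv2_tap(impulse, "right")  # response drifts towards lower indices
symmetric = [(a + b) / 2 for a, b in zip(left, right)]  # the two shifts cancel
```

With one-sided padding the response to the impulse moves by half a position per layer, always in the same direction, so stacking layers compounds the drift. Combining branches padded on opposite sides keeps the response centred on the original position.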

Decoupled Data Based Approach for Learning to Control Nonlinear Dynamical Systems

Title Decoupled Data Based Approach for Learning to Control Nonlinear Dynamical Systems
Authors Ran Wang, Karthikeya Parunandi, Dan Yu, Dileep Kalathil, Suman Chakravorty
Abstract This paper addresses the problem of learning the optimal control policy for a nonlinear stochastic dynamical system with continuous state space, continuous action space and unknown dynamics. This class of problems is typically addressed in the stochastic adaptive control and reinforcement learning literature using model-based and model-free approaches, respectively. Both methods rely on solving a dynamic programming problem, either directly or indirectly, for finding the optimal closed-loop control policy. The inherent “curse of dimensionality” associated with the dynamic programming method also makes these approaches computationally difficult. This paper proposes a novel decoupled data-based control (D2C) algorithm that addresses this problem using a decoupled “open loop - closed loop” approach. First, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, a closed-loop control is developed around this open-loop trajectory by linearization of the dynamics about this nominal trajectory. By virtue of linearization, a linear quadratic regulator based algorithm can be used for this closed-loop control. We show that the performance of the D2C algorithm is approximately optimal. Moreover, simulation performance suggests a significant reduction in training time compared to other state-of-the-art algorithms.
Tasks
Published 2019-04-17
URL http://arxiv.org/abs/1904.08361v1
PDF http://arxiv.org/pdf/1904.08361v1.pdf
PWC https://paperswithcode.com/paper/decoupled-data-based-approach-for-learning-to
Repo https://github.com/rwang0417/d2c_mujoco200
Framework none
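
The closed-loop step in the abstract relies on a linear quadratic regulator once the dynamics have been linearized. The sketch below shows that idea for the simplest possible case, a scalar time-invariant system, and is not the D2C algorithm itself: the system parameters `a`, `b`, the cost weights `q`, `r`, and the function name are illustrative assumptions, whereas the paper linearizes about a nominal trajectory.

```python
def scalar_lqr_gain(a, b, q, r, iterations=500):
    """Iterate the discrete-time Riccati recursion for x' = a*x + b*u.

    The stage cost is q*x**2 + r*u**2 and the resulting policy is u = -k*x.
    """
    p = q
    for _ in range(iterations):
        k = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
    k = a * b * p / (r + b * b * p)
    return k, p


# Hypothetical linearized dynamics: unstable without feedback because |a| > 1.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
k, p = scalar_lqr_gain(a, b, q, r)

# Simulate the closed loop from a perturbed initial state.
x = 1.0
for _ in range(20):
    u = -k * x
    x = a * x + b * u
```

The feedback gain turns the unstable open-loop factor `a` into a closed-loop factor `a - b*k` with magnitude below one, so deviations from the nominal state decay instead of growing.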

Neural Mention Detection

Title Neural Mention Detection
Authors Juntao Yu, Bernd Bohnet, Massimo Poesio
Abstract Mention detection is an important aspect of the annotation task and interpretation process for applications such as coreference resolution. In this work, we propose and compare three neural network-based approaches to mention detection. The first approach is based on the mention detection part of a state-of-the-art coreference resolution system; the second uses ELMo embeddings together with a bidirectional LSTM and a biaffine classifier; the third approach uses the recently introduced BERT model. Our best model (using a biaffine classifier) achieved gains of up to 1.8 percentage points on mention recall when compared with a strong baseline in a HIGH RECALL setting. The same model achieved improvements of up to 5.3 and 6.5 p.p. when compared with the best-reported mention detection F1 on the CONLL and CRAC data sets respectively in a HIGH F1 setting. We further evaluated our models on coreference resolution by using mentions predicted by our best model in the state-of-the-art coreference systems. The enhanced model achieved absolute improvements of up to 1.7 and 0.7 p.p. when compared with the best pipeline system and the state-of-the-art end-to-end system respectively.
Tasks Coreference Resolution
Published 2019-07-29
URL https://arxiv.org/abs/1907.12524v1
PDF https://arxiv.org/pdf/1907.12524v1.pdf
PWC https://paperswithcode.com/paper/neural-mention-detection
Repo https://github.com/juntaoy/dali-md
Framework tf

Unsupervised Attention Mechanism across Neural Network Layers

Title Unsupervised Attention Mechanism across Neural Network Layers
Authors Baihan Lin
Abstract Inspired by the adaptation phenomenon of neuronal firing, we propose an unsupervised attention mechanism (UAM) which computes the statistical regularity in the implicit space of neural networks under the Minimum Description Length (MDL) principle. Treating the neural network optimization process as a partially observable model selection problem, UAM constrains the implicit space by a normalization factor, the universal code length. We compute this universal code incrementally across neural network layers and demonstrate the flexibility to include data priors such as top-down attention and other oracle information. Empirically, our approach outperforms existing normalization methods in tackling limited, imbalanced and non-stationary input distributions in computer vision and reinforcement learning tasks. Lastly, UAM tracks dependency and critical learning stages across layers and recurrent time steps of deep networks.
Tasks Model Selection
Published 2019-02-27
URL https://arxiv.org/abs/1902.10658v9
PDF https://arxiv.org/pdf/1902.10658v9.pdf
PWC https://paperswithcode.com/paper/regularity-normalization-constraining
Repo https://github.com/doerlbh/UnsupervisedAttentionMechanism
Framework pytorch

Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses

Title Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses
Authors Yingwei Li, Song Bai, Cihang Xie, Zhenyu Liao, Xiaohui Shen, Alan L. Yuille
Abstract This paper focuses on learning transferable adversarial examples specifically against defense models (models designed to defend against adversarial attacks). In particular, we show that a simple universal perturbation can fool a series of state-of-the-art defenses. Adversarial examples generated by existing attacks are generally hard to transfer to defense models. We observe the property of regional homogeneity in adversarial perturbations and suggest that the defenses are less robust to regionally homogeneous perturbations. Therefore, we propose an effective transforming paradigm and a customized gradient transformer module to transform existing perturbations into regionally homogeneous ones. Without explicitly forcing the perturbations to be universal, we observe that a well-trained gradient transformer module tends to output input-independent gradients (hence universal) benefiting from the under-fitting phenomenon. Thorough experiments demonstrate that our work significantly outperforms prior attack algorithms (either image-dependent or universal ones) by an average improvement of 14.0% when attacking 9 defenses in the black-box setting. In addition to the cross-model transferability, we also verify that regionally homogeneous perturbations transfer well across different vision tasks (attacking with the semantic segmentation task and testing on the object detection task).
Tasks Object Detection, Semantic Segmentation
Published 2019-04-01
URL http://arxiv.org/abs/1904.00979v1
PDF http://arxiv.org/pdf/1904.00979v1.pdf
PWC https://paperswithcode.com/paper/regional-homogeneity-towards-learning
Repo https://github.com/LiYingwei/Regional-Homogeneity
Framework tf