April 2, 2020


Paper Group ANR 181

Product Kanerva Machines: Factorized Bayesian Memory. CTM: Collaborative Temporal Modeling for Action Recognition. Machine Unlearning: Linear Filtration for Logit-based Classifiers. A Hoeffding Inequality for Finite State Markov Chains and its Applications to Markovian Bandits. Blur, Noise, and Compression Robust Generative Adversarial Networks. Bi …

Product Kanerva Machines: Factorized Bayesian Memory


Title	Product Kanerva Machines: Factorized Bayesian Memory
Authors	Adam Marblestone, Yan Wu, Greg Wayne
Abstract	An ideal cognitively-inspired memory system would compress and organize incoming items. The Kanerva Machine (Wu et al., 2018) is a Bayesian model that naturally implements online memory compression. However, the organization of the Kanerva Machine is limited by its use of a single Gaussian random matrix for storage. Here we introduce the Product Kanerva Machine, which dynamically combines many smaller Kanerva Machines. Its hierarchical structure provides a principled way to abstract invariant features and gives scaling and capacity advantages over single Kanerva Machines. We show that it can exhibit unsupervised clustering, find sparse and combinatorial allocation patterns, and discover spatial tunings that approximately factorize simple images by object.
Tasks
Published	2020-02-06
URL	https://arxiv.org/abs/2002.02385v1
PDF	https://arxiv.org/pdf/2002.02385v1.pdf
PWC	https://paperswithcode.com/paper/product-kanerva-machines-factorized-bayesian
Repo
Framework
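The sketch below makes the Bayesian memory behind this paper concrete. It assumes the linear-Gaussian write rule of the single Kanerva Machine (Wu et al., 2018), in which each write of an item `z` with addressing weights `w` updates the memory mean `R` and row covariance `U` in the manner of a Kalman filter. It does not implement the product (factorized) variant introduced here. The function names, shapes, and `noise_var` value are illustrative assumptions.

```python
import numpy as np

def km_write(R, U, w, z, noise_var=0.1):
    """Kalman-style Bayesian write of one item into a Gaussian memory.

    R: (K, D) posterior mean of the memory matrix
    U: (K, K) posterior covariance over memory rows (assumed shared by all D columns)
    w: (K,) addressing weights for this write
    z: (D,) item to store
    noise_var: assumed observation-noise variance (hypothetical value)
    """
    delta = z - w @ R                    # prediction error of the current memory read-out
    c = U @ w                            # covariance between memory rows and the read-out
    s = w @ U @ w + noise_var            # predictive variance of the read-out (scalar)
    R_new = R + np.outer(c, delta) / s   # posterior mean moves toward explaining z
    U_new = U - np.outer(c, c) / s       # posterior uncertainty shrinks along w
    return R_new, U_new

def km_read(R, w):
    """Mean read-out of the memory for addressing weights w."""
    return w @ R

rng = np.random.default_rng(0)
K, D = 8, 4
R, U = np.zeros((K, D)), np.eye(K)
w, z = rng.normal(size=K), rng.normal(size=D)
R2, U2 = km_write(R, U, w, z, noise_var=1e-6)
```

With a near-zero noise variance, reading back with the same `w` recovers `z` almost exactly, and the posterior variance along `w` collapses.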

CTM: Collaborative Temporal Modeling for Action Recognition


Title	CTM: Collaborative Temporal Modeling for Action Recognition
Authors	Qian Liu, Tao Wang, Jie Liu, Yang Guan, Qi Bu, Longfei Yang
Abstract	With the rapid development of digital multimedia, video understanding has become an important field. For action recognition, the temporal dimension plays an important role, which distinguishes it from image recognition. To learn powerful video features, we propose a Collaborative Temporal Modeling (CTM) block (Figure 1) that learns temporal information for action recognition. Besides a parameter-free identity shortcut, CTM, as a separate temporal modeling block, includes two collaborative paths: a spatial-aware temporal modeling path, built with our proposed Temporal-Channel Convolution Module (TCCM), which has unshared parameters for each spatial position (H×W), and a spatial-unaware temporal modeling path. CTM blocks can be inserted seamlessly into many popular networks to produce CTM Networks, giving 2D CNN backbones, which capture only spatial information, the ability to learn temporal information. Experiments on several popular action recognition datasets demonstrate that CTM blocks improve performance over 2D CNN baselines, and our method achieves competitive results against state-of-the-art methods. Code will be made publicly available.
Tasks	Video Understanding
Published	2020-02-08
URL	https://arxiv.org/abs/2002.03152v1
PDF	https://arxiv.org/pdf/2002.03152v1.pdf
PWC	https://paperswithcode.com/paper/ctm-collaborative-temporal-modeling-for
Repo
Framework
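The abstract describes TCCM as a temporal-channel convolution with unshared parameters at each spatial position. One way to read that is sketched below. Every (h, w) location gets its own temporal kernel mixing input and output channels over K neighbouring frames. The tensor layout `(T, C, H, W)`, the zero padding in time, and the name `tccm` are assumptions; the spatial-unaware path and the identity shortcut are not shown.

```python
import numpy as np

def tccm(x, weight):
    """Temporal-channel convolution with a separate kernel at every spatial position.

    x:      (T, C_in, H, W) feature maps for T frames
    weight: (H, W, C_out, C_in, K) one temporal kernel per position, K odd
    returns (T, C_out, H, W)
    """
    T, C_in, H, W = x.shape
    K = weight.shape[-1]
    pad = K // 2
    # zero-pad along time so the output keeps all T frames
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0), (0, 0)))
    out = np.zeros((T, weight.shape[2], H, W))
    for k in range(K):
        window = xp[k:k + T]  # frame t here is input frame t + k - pad
        # position (h, w) applies its own (C_out, C_in) matrix for tap k
        out += np.einsum("hwoc,tchw->tohw", weight[..., k], window)
    return out
```

Because the weights are indexed by `h` and `w`, changing the kernel at one location leaves every other location's output untouched. That is what "unshared parameters" implies.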

Machine Unlearning: Linear Filtration for Logit-based Classifiers


Title	Machine Unlearning: Linear Filtration for Logit-based Classifiers
Authors	Thomas Baumhauer, Pascal Schöttle, Matthias Zeppelzauer
Abstract	Recently enacted legislation grants individuals certain rights to decide how their personal data may be used, in particular a “right to be forgotten”. This poses a challenge to machine learning: how should one proceed when an individual retracts permission to use data that has been part of the training process of a model? From this question emerges the field of machine unlearning, which can be broadly described as the investigation of how to “delete training data from models”. Our work complements this direction of research for the specific setting of class-wide deletion requests for classification models (e.g., deep neural networks). As a first step, we propose linear filtration as a computationally efficient sanitization method. Our experiments demonstrate benefits in an adversarial setting over naive deletion schemes.
Tasks
Published	2020-02-07
URL	https://arxiv.org/abs/2002.02730v1
PDF	https://arxiv.org/pdf/2002.02730v1.pdf
PWC	https://paperswithcode.com/paper/machine-unlearning-linear-filtration-for
Repo
Framework
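The abstract names linear filtration but does not spell out how the filter is built. The sketch below therefore shows only the interface: a class-wide deletion applied after the classifier, as a linear map on its output probability vector. `naive_deletion` is our guess at a "naive" scheme (drop the forgotten class's logit and renormalize). `redistribute_matrix` is a hypothetical placeholder filter that spreads the forgotten class's probability mass evenly. Neither is the filtration matrix derived in the paper.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def naive_deletion(logits, forget_class):
    """Assumed naive scheme: drop the forgotten class's logit, renormalize the rest."""
    return softmax(np.delete(logits, forget_class, axis=-1))

def redistribute_matrix(num_classes, forget_class):
    """Hypothetical (K-1, K) filter: keep other classes, spread the forgotten mass evenly."""
    keep = [c for c in range(num_classes) if c != forget_class]
    A = np.zeros((num_classes - 1, num_classes))
    for row, c in enumerate(keep):
        A[row, c] = 1.0
    A[:, forget_class] = 1.0 / (num_classes - 1)
    return A

def linear_filter(probs, A):
    """Apply a linear filtration matrix A to probability vectors of shape (..., K)."""
    return probs @ A.T

logits = np.array([2.0, 1.0, 0.5, -1.0])
filtered = linear_filter(softmax(logits), redistribute_matrix(4, forget_class=1))
```

Every column of the placeholder matrix sums to one, so the filtered output is still a probability vector over the remaining K-1 classes.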

A Hoeffding Inequality for Finite State Markov Chains and its Applications to Markovian Bandits


Title	A Hoeffding Inequality for Finite State Markov Chains and its Applications to Markovian Bandits
Authors	Vrettos Moulos
Abstract	This paper develops a Hoeffding inequality for the partial sums $\sum_{k=1}^n f(X_k)$, where $\{X_k\}_{k \in \mathbb{Z}_{>0}}$ is an irreducible Markov chain on a finite state space $S$, and $f : S \to [a, b]$ is a real-valued function. Our bound is simple, general (it assumes only irreducibility and finiteness of the state space), and powerful. To demonstrate its usefulness, we provide two applications to multi-armed bandit problems. The first concerns identifying an approximately best Markovian arm, and the second concerns regret minimization for Markovian bandits.
Tasks
Published	2020-01-05
URL	https://arxiv.org/abs/2001.01199v1
PDF	https://arxiv.org/pdf/2001.01199v1.pdf
PWC	https://paperswithcode.com/paper/a-hoeffding-inequality-for-finite-state
Repo
Framework
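The abstract does not give the bound's constants, so the sketch below does not reproduce the inequality. It illustrates the quantity being bounded instead. It simulates $S_n/n$ for a small irreducible two-state chain and estimates the tail probability $P(|S_n/n - \mathbb{E}_\pi f| \ge t)$ empirically. The transition matrix, $f$, $n$, and $t$ are arbitrary choices. The paper's inequality would bound such a tail with constants that depend on the chain.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi of an irreducible transition matrix P (pi @ P == pi)."""
    m = P.shape[0]
    # Stack the equations pi (P - I) = 0 and sum(pi) = 1, then solve by least squares.
    A = np.vstack([(P - np.eye(m)).T, np.ones(m)])
    b = np.zeros(m + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def sample_partial_sums(P, f, n, trials, rng, x0=0):
    """Draw `trials` samples of S_n = f(X_1) + ... + f(X_n) for the chain started at X_0 = x0."""
    m = P.shape[0]
    cdf = np.cumsum(P, axis=1)
    sums = np.empty(trials)
    for i in range(trials):
        x, s = x0, 0.0
        for _ in range(n):
            # inverse-CDF sampling of the next state; min() guards against rounding in cdf
            x = min(int(np.searchsorted(cdf[x], rng.random(), side="right")), m - 1)
            s += f[x]
        sums[i] = s
    return sums

# Two-state chain, f = indicator of state 1 (so f maps into [a, b] = [0, 1]).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
f = np.array([0.0, 1.0])
pi = stationary_distribution(P)
rng = np.random.default_rng(0)
n = 200
means = sample_partial_sums(P, f, n, trials=500, rng=rng) / n
tail = np.mean(np.abs(means - pi @ f) >= 0.15)  # empirical tail probability at t = 0.15
```

The samples $X_k$ are dependent, so the classical i.i.d. Hoeffding bound does not apply directly. Closing that gap under only irreducibility and a finite state space is the point of the paper.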

Blur, Noise, and Compression Robust Generative Adversarial Networks


Title	Blur, Noise, and Compression Robust Generative Adversarial Networks
Authors	Takuhiro Kaneko, Tatsuya Harada
Abstract	Recently, generative adversarial networks (GANs), which learn data distributions through adversarial training, have gained special attention owing to their high image reproduction ability. However, one limitation of standard GANs is that they faithfully recreate training images together with their degradations, such as blur, noise, and compression. To remedy this, we address the problem of blur, noise, and compression robust image generation. Our objective is to learn a non-degraded image generator directly from degraded images without prior knowledge of the image degradation. The recently proposed noise robust GAN (NR-GAN) already provides a solution to the problem of noise degradation. Therefore, we first focus on blur and compression degradations. We propose blur robust GAN (BR-GAN) and compression robust GAN (CR-GAN), which learn a kernel generator and a quality factor generator, respectively, together with non-degraded image generators. Because blur and compression are irreversible, adjusting their strengths is non-trivial. Therefore, we incorporate switching architectures that can adapt the strengths in a data-driven manner. Based on BR-GAN, NR-GAN, and CR-GAN, we further propose blur, noise, and compression robust GAN (BNCR-GAN), which unifies these three models into a single model with additionally introduced adaptive consistency losses that suppress the uncertainty caused by the combination. We provide benchmark scores through large-scale comparative studies on CIFAR-10 and a generality analysis on the FFHQ dataset.
Tasks	Image Generation
Published	2020-03-17
URL	https://arxiv.org/abs/2003.07849v1
PDF	https://arxiv.org/pdf/2003.07849v1.pdf
PWC	https://paperswithcode.com/paper/blur-noise-and-compression-robust-generative
Repo
Framework
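The key idea is that the generator's clean output is degraded before the discriminator compares it with real, already-degraded images. This pushes the generator itself to stay clean. The sketch below shows only that degradation step for a grayscale image, under simplifying assumptions. The blur kernel is a fixed Gaussian rather than the output of a learned kernel generator, the noise is additive Gaussian, and JPEG-style compression is omitted. All names are hypothetical.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 2D Gaussian blur kernel (stand-in for a learned kernel generator)."""
    ax = np.arange(size) - (size - 1) / 2
    k = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, kernel):
    """'Same'-size 2D filtering with reflect padding (equals convolution for symmetric kernels)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="reflect")
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def degrade(clean, kernel, noise_std, rng):
    """Blur, then add Gaussian noise: what the discriminator would see from the generator."""
    return blur(clean, kernel) + noise_std * rng.standard_normal(clean.shape)

rng = np.random.default_rng(0)
clean = rng.random((8, 8))            # placeholder for a generator output G(z)
fake_degraded = degrade(clean, gaussian_kernel(5, 1.0), noise_std=0.05, rng=rng)
```

In a GAN training loop, `fake_degraded` rather than `clean` would be passed to the discriminator. The switching architectures described in the abstract would additionally let the model choose whether, and how strongly, to apply each degradation.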

BigGAN-based Bayesian reconstruction of natural images from human brain activity


Title	BigGAN-based Bayesian reconstruction of natural images from human brain activity
Authors	Kai Qiao, Jian Chen, Linyuan Wang, Chi Zhang, Li Tong, Bin Yan
Abstract	In the visual decoding domain, reconstructing viewed images from the corresponding human brain activity recorded with functional magnetic resonance imaging (fMRI) is difficult, especially for natural images. Visual reconstruction is conditional image generation on fMRI data, and generative adversarial networks (GANs) for natural image generation have therefore recently been introduced for this task. Although GAN-based methods have brought substantial improvements, the fidelity and naturalness of reconstructions remain unsatisfactory because of the small number of fMRI samples and the instability of GAN training. In this study, we propose a new GAN-based Bayesian visual reconstruction method (GAN-BVRM) that includes a classifier to decode categories from fMRI data, a pre-trained conditional generator to generate natural images of specified categories, and a set of encoding models and an evaluator to assess the generated images. GAN-BVRM uses the pre-trained generator of the prevailing BigGAN to generate large numbers of natural images and selects, through the encoding models, the images that best match the corresponding brain activity as the reconstruction of the image stimuli. In this process, the semantic and detailed contents of the reconstruction are controlled by the decoded categories and the encoding models, respectively. GAN-BVRM uses a Bayesian approach to avoid the conflict between naturalness and fidelity in current GAN-based methods and thus better exploits the advantages of GANs. Experimental results show that GAN-BVRM improves fidelity and naturalness: the reconstructions are natural and similar to the presented image stimuli.
Tasks	Conditional Image Generation, Image Generation
Published	2020-03-13
URL	https://arxiv.org/abs/2003.06105v1
PDF	https://arxiv.org/pdf/2003.06105v1.pdf
PWC	https://paperswithcode.com/paper/biggan-based-bayesian-reconstruction-of
Repo
Framework
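The selection step in GAN-BVRM picks, among many generated candidates, the image whose predicted brain response best matches the measured one. The sketch below shows that step in isolation, assuming a linear encoding model from image features to voxel responses and Pearson correlation as the match score. Both are assumptions; the abstract does not specify the encoding model or the score. In the real pipeline the candidates would come from BigGAN conditioned on the decoded category.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_reconstruction(candidate_features, encoding_weights, measured_voxels):
    """Return the index and score of the candidate whose predicted fMRI response fits best.

    candidate_features: (N, F) features of N generated images
    encoding_weights:   (F, V) assumed linear encoding model, features -> V voxels
    measured_voxels:    (V,) observed response to the stimulus
    """
    predicted = candidate_features @ encoding_weights        # (N, V) predicted responses
    scores = np.array([pearson(p, measured_voxels) for p in predicted])
    best = int(np.argmax(scores))
    return best, scores[best]

rng = np.random.default_rng(0)
feats = rng.normal(size=(10, 6))      # placeholder features of 10 candidate images
W = rng.normal(size=(6, 50))          # placeholder encoding model for 50 voxels
measured = feats[3] @ W + 0.01 * rng.normal(size=50)
best, score = select_reconstruction(feats, W, measured)
```

Here the measurement is synthesized from candidate 3 plus a little noise, so the selection recovers that candidate.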

Efficient Content-Based Sparse Attention with Routing Transformers


Title	Efficient Content-Based Sparse Attention with Routing Transformers
Authors	Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier
Abstract	Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic compute and memory requirements with respect to sequence length. Successful approaches to reduce this complexity have focused on attending to local sliding windows or to a small set of locations independent of content. Our work proposes to learn dynamic sparse attention patterns that avoid allocating computation and memory to attend to content unrelated to the query of interest. This work builds upon two lines of research: it combines the modeling flexibility of prior work on content-based sparse attention with the efficiency gains from approaches based on local, temporal sparse attention. Our model, the Routing Transformer, endows self-attention with a sparse routing module based on online k-means while reducing the overall complexity of attention to $O\left(n^{1.5}d\right)$ from $O\left(n^2d\right)$ for sequence length $n$ and hidden dimension $d$. We show that our model outperforms comparable sparse attention models on language modeling on Wikitext-103 (15.8 vs 18.3 perplexity) as well as on image generation on ImageNet-64 (3.43 vs 3.44 bits/dim) while using fewer self-attention layers.
Tasks	Image Generation, Language Modelling
Published	2020-03-12
URL	https://arxiv.org/abs/2003.05997v1
PDF	https://arxiv.org/pdf/2003.05997v1.pdf
PWC	https://paperswithcode.com/paper/efficient-content-based-sparse-attention-with-1
Repo
Framework
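Routing attention can be illustrated in a few lines. Queries and keys are assigned to their nearest centroid, each query attends only to keys in the same cluster, and centroids are updated by an exponential moving average (online k-means). The sketch below is a simplified reading under stated assumptions. It uses cosine assignment, no causal mask, and variable cluster sizes, whereas the paper's implementation may use balanced clusters and other details for efficiency. The per-query loop is for clarity, not speed. The $O(n^{1.5}d)$ cost comes from using roughly $\sqrt{n}$ clusters of size roughly $\sqrt{n}$.

```python
import numpy as np

def routing_attention(Q, K, V, centroids):
    """Content-based sparse attention: each query attends only to keys in its cluster.

    Q, K: (n, d); V: (n, d_v); centroids: (c, d)
    """
    def assign(X):
        # nearest centroid by cosine similarity
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        Cn = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
        return np.argmax(Xn @ Cn.T, axis=1)

    q_cl, k_cl = assign(Q), assign(K)
    out = np.zeros((Q.shape[0], V.shape[1]))
    for i in range(Q.shape[0]):
        keys = np.nonzero(k_cl == q_cl[i])[0]
        if keys.size == 0:
            continue  # no key routed to this cluster; output stays zero
        scores = K[keys] @ Q[i] / np.sqrt(Q.shape[1])
        w = np.exp(scores - scores.max())
        out[i] = (w / w.sum()) @ V[keys]
    return out, q_cl, k_cl

def update_centroids(centroids, X, assignments, decay=0.99):
    """Online k-means step: move each centroid toward the mean of its assigned vectors."""
    new = centroids.copy()
    for c in range(centroids.shape[0]):
        members = X[assignments == c]
        if len(members):
            new[c] = decay * centroids[c] + (1 - decay) * members.mean(axis=0)
    return new
```

With a single centroid, every query and key falls into one cluster and the routine reduces to ordinary dense softmax attention. This gives a quick sanity check.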

$\Pi$-nets: Deep Polynomial Neural Networks


Title	$\Pi$-nets: Deep Polynomial Neural Networks
Authors	Grigorios G. Chrysos, Stylianos Moschoglou, Giorgos Bouritsas, Yannis Panagakis, Jiankang Deng, Stefanos Zafeiriou
Abstract	Deep Convolutional Neural Networks (DCNNs) are currently the method of choice for both generative and discriminative learning in computer vision and machine learning. The success of DCNNs can be attributed to the careful selection of their building blocks (e.g., residual blocks, rectifiers, and sophisticated normalization schemes, to mention but a few). In this paper, we propose $\Pi$-Nets, a new class of DCNNs. $\Pi$-Nets are polynomial neural networks, i.e., the output is a high-order polynomial of the input. $\Pi$-Nets can be implemented using a special kind of skip connection, and their parameters can be represented via high-order tensors. We empirically demonstrate that $\Pi$-Nets have better representation power than standard DCNNs and that they produce good results even without non-linear activation functions across a large battery of tasks and signals, namely images, graphs, and audio. When used in conjunction with activation functions, $\Pi$-Nets produce state-of-the-art results in challenging tasks, such as image generation. Lastly, our framework elucidates why recent generative models, such as StyleGAN, improve upon their predecessors, e.g., ProGAN.
Tasks	Image Generation
Published	2020-03-08
URL	https://arxiv.org/abs/2003.03828v2
PDF	https://arxiv.org/pdf/2003.03828v2.pdf
PWC	https://paperswithcode.com/paper/-nets-deep-polynomial-neural-networks
Repo
Framework
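Skip connections can produce a high-order polynomial of the input without any activation function. The sketch below shows how, using a coupled recursion of the form $x_1 = U_1 z$, $x_n = (U_n z) * x_{n-1} + x_{n-1}$, $G(z) = C x_N + \beta$, where $*$ is the elementwise product. This recursion is our reading of one of the paper's parameterizations; the matrix shapes and the name `ccp_polynomial` are illustrative. Each Hadamard product raises the degree by one, and the `+ x` term is the skip connection that keeps the lower-degree terms.

```python
import numpy as np

def ccp_polynomial(z, U, C, beta):
    """Degree-N polynomial of z built from Hadamard products and skip connections.

    z: (d,) input; U: list of N matrices, each (k, d); C: (o, k); beta: (o,)
    """
    x = U[0] @ z                 # degree-1 term
    for Un in U[1:]:
        x = (Un @ z) * x + x     # elementwise product adds one degree; "+ x" is the skip
    return C @ x + beta

rng = np.random.default_rng(0)
U = [rng.normal(size=(4, 3)) for _ in range(2)]   # N = 2, so the output is quadratic in z
C, beta = rng.normal(size=(2, 4)), rng.normal(size=2)
z = rng.normal(size=3)
```

For N = 2, scaling the input by t gives $G(tz) - \beta = t^2 C\big((U_2 z) * (U_1 z)\big) + t\,C(U_1 z)$. This is a second-degree polynomial in t, which is easy to check numerically.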

Adversarial Data Encryption


Title	Adversarial Data Encryption
Authors	Yingdong Hu, Liang Zhang, Wei Shan, Xiaoxiao Qin, Jing Qi, Zhenzhou Wu, Yang Yuan
Abstract	In the big data era, many organizations face the dilemma of data sharing. Regular data sharing is often necessary for human-centered discussion and communication, especially in medical scenarios. However, unprotected data sharing may also lead to data leakage. Inspired by adversarial attacks, we propose a data encryption method such that the encrypted data look identical to the original to human beings but mislead machine learning methods. To show the effectiveness of our method, we collaborated with Beijing Tiantan Hospital, which has a world-leading neurological center. We invited three doctors to manually inspect our encryption method on real-world medical images. The results show that the encrypted images can be used for diagnosis by the doctors but not by machine learning methods.
Tasks	Adversarial Attack
Published	2020-02-10
URL	https://arxiv.org/abs/2002.03793v2
PDF	https://arxiv.org/pdf/2002.03793v2.pdf
PWC	https://paperswithcode.com/paper/adversarial-data-encryption
Repo
Framework
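The abstract says only that the method is "inspired by adversarial attack". To show what that family of techniques does, the sketch below applies a single fast-gradient-sign step against a toy logistic-regression classifier. The change to each pixel is bounded by `eps`, so a human sees essentially the same image, yet the classifier's score moves toward the wrong answer. FGSM, the linear model, and `eps` are stand-ins chosen for illustration, not the paper's encryption procedure.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def fgsm_perturb(x, y, w, b, eps):
    """One signed-gradient step that increases the logistic loss of a linear classifier.

    x: (d,) image flattened, values in [0, 1]; y: label in {0, 1}; w, b: classifier parameters
    Each pixel changes by at most eps.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w                 # gradient of binary cross-entropy w.r.t. x
    x_adv = x + eps * np.sign(grad_x)
    return np.clip(x_adv, 0.0, 1.0)      # keep a valid image

rng = np.random.default_rng(0)
d = 16
w, b = rng.normal(size=d), 0.1
x = rng.uniform(0.2, 0.8, size=d)        # placeholder "image"
x_adv = fgsm_perturb(x, 1, w, b, eps=0.05)
```

For a true label of 1, the step moves every pixel against the sign of its weight. The classifier's logit therefore drops by up to `eps` times the L1 norm of `w`, while no pixel moves by more than `eps`.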

Solving Raven’s Progressive Matrices with Neural Networks


Title	Solving Raven’s Progressive Matrices with Neural Networks
Authors	Tao Zhuo, Mohan Kankanhalli
Abstract	Raven’s Progressive Matrices (RPM) have been widely used in Intelligence Quotient (IQ) tests for humans. In this paper, we aim to solve RPM with neural networks in both supervised and unsupervised manners. First, we investigate strategies to reduce over-fitting in supervised learning. We suggest using a deeper neural network and pre-training on large-scale datasets to improve model generalization. Experiments on the RAVEN dataset show that the overall accuracy of our supervised approach surpasses human-level performance. Second, since an intelligent agent must automatically learn new skills to solve new problems, we propose the first unsupervised method for RPM problems, Multilabel Classification with Pseudo Target (MCPT). Based on the design of the pseudo target, MCPT converts the unsupervised learning problem into a supervised task. Experiments show that MCPT more than doubles the test accuracy of random guessing (28.50% vs. 12.5%). Finally, we discuss future directions for solving RPM with unsupervised and explainable strategies.
Tasks
Published	2020-02-05
URL	https://arxiv.org/abs/2002.01646v2
PDF	https://arxiv.org/pdf/2002.01646v2.pdf
PWC	https://paperswithcode.com/paper/solving-ravens-progressive-matrices-with
Repo
Framework

Analysis and Prediction of Pedestrian Crosswalk Behavior during Automated Vehicle Interactions


Title	Analysis and Prediction of Pedestrian Crosswalk Behavior during Automated Vehicle Interactions
Authors	Suresh Kumaar Jayaraman, Dawn M. Tilbury, X. Jessie Yang, Anuj K. Pradhan, Lionel P. Robert Jr
Abstract	For safe navigation around pedestrians, automated vehicles (AVs) need to plan their motion by accurately predicting pedestrians' trajectories over long time horizons. Current approaches to AV motion planning around crosswalks predict only over short time horizons (1-2 s) and are based on data from pedestrian interactions with human-driven vehicles (HDVs). In this paper, we develop a hybrid systems model that uses pedestrians' gap acceptance behavior and constant-velocity dynamics for long-term pedestrian trajectory prediction when interacting with AVs. Results demonstrate the applicability of the model for long-term (> 5 s) pedestrian trajectory prediction at crosswalks. Further, we compared measures of pedestrian crossing behavior in an immersive virtual environment (IVE), when interacting with AVs, with those in the real world (results of published studies of pedestrians interacting with HDVs), and found similarities between the two. These similarities demonstrate that the hybrid model of AV interactions developed in the IVE is applicable to real-world scenarios involving both AVs and HDVs.
Tasks	Motion Planning, Trajectory Prediction
Published	2020-03-22
URL	https://arxiv.org/abs/2003.09996v1
PDF	https://arxiv.org/pdf/2003.09996v1.pdf
PWC	https://paperswithcode.com/paper/analysis-and-prediction-of-pedestrian
Repo
Framework
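The hybrid model combines a discrete decision (gap acceptance) with continuous dynamics (constant velocity). The sketch below shows that combination as a two-mode predictor. The pedestrian waits while the time gap to the approaching vehicle is below a critical gap. Once a gap is accepted, they walk across at constant speed. The critical gap, walking speed, and gap function are hypothetical values; the paper calibrates gap acceptance from data collected in the immersive virtual environment.

```python
import numpy as np

def predict_pedestrian(x0, vehicle_gap, critical_gap, walk_speed, dt, horizon):
    """Two-mode (hybrid) prediction of a pedestrian's position along the crosswalk.

    vehicle_gap: callable t -> time gap (s) to the approaching vehicle at time t
    Mode 'wait':  stay put while the gap is below critical_gap.
    Mode 'cross': once a gap is accepted, move at constant velocity walk_speed.
    Returns positions at t = 0, dt, ..., horizon.
    """
    steps = int(round(horizon / dt))
    pos = np.empty(steps + 1)
    pos[0] = x0
    crossing = False
    for i in range(steps):
        t = i * dt
        if not crossing and vehicle_gap(t) >= critical_gap:
            crossing = True   # discrete mode switch: gap accepted
        pos[i + 1] = pos[i] + (walk_speed * dt if crossing else 0.0)
    return pos

# Hypothetical scenario: short gaps until t = 2 s, then a long gap.
gap = lambda t: 1.0 if t < 2.0 else 10.0
traj = predict_pedestrian(0.0, gap, critical_gap=4.0, walk_speed=1.4, dt=0.1, horizon=6.0)
```

Because the mode switch is explicit, the prediction horizon can extend well past the point where the pedestrian starts walking. This matches the long-horizon (> 5 s) use case described in the abstract.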

Detecting Lane and Road Markings at A Distance with Perspective Transformer Layers


Title	Detecting Lane and Road Markings at A Distance with Perspective Transformer Layers
Authors	Zhuoping Yu, Xiaozhou Ren, Yuyao Huang, Wei Tian, Junqiao Zhao
Abstract	Accurate detection of lane and road markings is a task of great importance for intelligent vehicles. In existing approaches, detection accuracy often degrades with increasing distance, because distant lane and road markings occupy few pixels in the image and their scales vary with distance and perspective. Inverse Perspective Mapping (IPM) can be used to eliminate the perspective distortion, but its inherent interpolation can introduce artifacts, especially around distant lane and road markings, which degrades the accuracy of lane marking detection and segmentation. To solve this problem, we adopt the encoder-decoder architecture of Fully Convolutional Networks and leverage the idea of Spatial Transformer Networks to introduce a novel semantic segmentation neural network. This approach decomposes the IPM process into multiple consecutive differentiable homographic transform layers, called “Perspective Transformer Layers”. Furthermore, the interpolated feature maps are refined by subsequent convolutional layers, reducing artifacts and improving accuracy. The effectiveness of the proposed method for lane marking detection is validated on two public datasets: TuSimple and ApolloScape.
Tasks	Semantic Segmentation
Published	2020-03-19
URL	https://arxiv.org/abs/2003.08550v1
PDF	https://arxiv.org/pdf/2003.08550v1.pdf
PWC	https://paperswithcode.com/paper/detecting-lane-and-road-markings-at-a
Repo
Framework
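Splitting IPM into several consecutive homographic layers works because homographies compose. Applying them one after another is the same geometric mapping as applying their matrix product once, so the network can refine features between the steps without changing the overall warp. The sketch below shows only this coordinate algebra on pixel positions. The matrices are arbitrary examples, and the paper's layers warp feature maps with interpolation rather than transforming point lists.

```python
import numpy as np

def apply_homography(H, pts):
    """Map (n, 2) pixel coordinates through a 3x3 homography."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T   # rows are H @ [x, y, 1]
    return homog[:, :2] / homog[:, 2:3]                       # back to Cartesian coordinates

def apply_chain(Hs, pts):
    """Apply several homographies in sequence, like stacked transform layers."""
    for H in Hs:
        pts = apply_homography(H, pts)
    return pts

# Arbitrary example matrices standing in for two partial IPM steps.
H1 = np.array([[1.0, 0.1, 2.0],
               [0.0, 1.2, -1.0],
               [0.0, 0.001, 1.0]])
H2 = np.array([[0.9, 0.0, 0.0],
               [0.05, 1.0, 3.0],
               [0.0005, 0.0, 1.0]])
pts = np.array([[0.0, 0.0], [10.0, 5.0], [3.0, 7.0]])
```

Applying `H1` and then `H2` gives the same points as the single homography `H2 @ H1` (note the right-to-left order of the product).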

Data Selection for Federated Learning with Relevant and Irrelevant Data at Clients


Title	Data Selection for Federated Learning with Relevant and Irrelevant Data at Clients
Authors	Tiffany Tuor, Shiqiang Wang, Bong Jun Ko, Changchang Liu, Kin K. Leung
Abstract	Federated learning is an effective way of training a machine learning model from data collected by client devices. A challenge is that, among the wide variety of data collected at each client, it is likely that only a subset is relevant to a learning task, while the rest of the data has a negative impact on model training. Therefore, before starting the learning process, it is important to select the subset of data that is relevant to the given federated learning task. In this paper, we propose a method for selecting relevant data in a distributed manner, in which a benchmark model trained on a small, task-specific benchmark dataset is used to evaluate the relevance of individual data samples at each client, and data with sufficiently high relevance are selected. Each client then uses only the selected subset of its data in the federated learning process. The effectiveness of our approach is evaluated on multiple real-world datasets in a simulated system with a large number of clients, showing up to 25% improvement in model accuracy compared to training with all data.
Tasks
Published	2020-01-22
URL	https://arxiv.org/abs/2001.08300v1
PDF	https://arxiv.org/pdf/2001.08300v1.pdf
PWC	https://paperswithcode.com/paper/data-selection-for-federated-learning-with
Repo
Framework
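Per-client data selection with a shared benchmark model can be sketched as a simple filter. Each client scores its own samples with the benchmark model and keeps only those it deems relevant, without sharing raw data. In the sketch below, relevance is measured by the benchmark model's cross-entropy loss on each labelled sample, and samples with loss at or below `max_loss` are kept. Both the score and the threshold are assumptions; the paper's relevance measure and threshold choice may differ. `benchmark_predict_proba` is a hypothetical callable.

```python
import numpy as np

def select_relevant(features, labels, benchmark_predict_proba, max_loss):
    """Keep samples whose cross-entropy under the benchmark model is at most max_loss.

    benchmark_predict_proba: hypothetical callable (n, d) -> (n, num_classes) probabilities
    """
    probs = benchmark_predict_proba(features)
    losses = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)  # per-sample loss
    mask = losses <= max_loss
    return features[mask], labels[mask], losses

# Toy benchmark model with fixed outputs for three samples.
feats = np.arange(6, dtype=float).reshape(3, 2)
labels = np.array([0, 0, 1])
benchmark = lambda X: np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
X_sel, y_sel, losses = select_relevant(feats, labels, benchmark, max_loss=0.7)
```

The second sample has a loss of about 1.61 (the benchmark model assigns its label only 0.2), so it is filtered out. The client would then train locally on `X_sel` and `y_sel` only.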

Object condensation: one-stage grid-free multi-object reconstruction in physics detectors, graph and image data


Title	Object condensation: one-stage grid-free multi-object reconstruction in physics detectors, graph and image data
Authors	Jan Kieseler
Abstract	High-energy physics detectors, images, and point clouds share many similarities as far as object detection is concerned. However, while detecting an unknown number of objects in an image is well established in computer vision, even machine-learning-assisted object reconstruction algorithms in particle physics almost exclusively predict properties on an object-by-object basis. One of the reasons is that traditional approaches to deep-neural-network-based multi-object detection usually employ anchor boxes, imposing implicit constraints on object sizes and density, which are not well suited for highly sparse detector data with differences in densities spanning multiple orders of magnitude. Other approaches rely heavily on objects being dense and solid, with well-defined edges and a central point that is used as a keypoint to attach properties. This approach is also not directly applicable to generic detector signals. The object condensation method proposed here is independent of assumptions on object size, sorting or object density, and further generalises to non-image-like data structures, such as graphs and point clouds, which are more suitable to represent detector signals. The pixels or vertices themselves serve as representations of the entire object, and a combination of learnable local clustering in a latent space and confidence assignment allows one to collect condensates of the predicted object properties with a simple algorithm. As proof of concept, the object condensation method is applied to a simple object classification problem in images and used to reconstruct multiple particles from detector signals. The latter results are also compared to a classic particle flow approach.
Tasks	Object Classification, Object Detection, Object Reconstruction
Published	2020-02-10
URL	https://arxiv.org/abs/2002.03605v2
PDF	https://arxiv.org/pdf/2002.03605v2.pdf
PWC	https://paperswithcode.com/paper/object-condensation-one-stage-grid-free-multi
Repo
Framework
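The "simple algorithm" that collects condensates at inference time can be sketched as a greedy loop over per-vertex confidences and learned clustering coordinates. Take the most confident unassigned vertex above a confidence threshold as a condensation point. Assign every unassigned vertex within a latent-space distance of it to the same object. Repeat. The thresholds `t_beta` and `t_d` and their default values are assumptions for illustration, and the training loss that shapes `beta` and `coords` is not shown.

```python
import numpy as np

def condense(beta, coords, t_beta=0.5, t_d=0.5):
    """Greedy condensation-point inference.

    beta:   (n,) confidence per vertex in (0, 1)
    coords: (n, k) coordinates in the learned clustering space
    returns (n,) object index per vertex, -1 for vertices left unassigned
    """
    assignment = np.full(len(beta), -1)
    obj = 0
    for i in np.argsort(-beta):          # most confident vertices first
        if beta[i] < t_beta:
            break                        # all remaining vertices are below threshold
        if assignment[i] != -1:
            continue                     # already absorbed by an earlier condensation point
        dist = np.linalg.norm(coords - coords[i], axis=1)
        members = (dist < t_d) & (assignment == -1)
        assignment[members] = obj
        obj += 1
    return assignment

coords = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 10.0]])
beta = np.array([0.9, 0.2, 0.8, 0.3, 0.1])
objects = condense(beta, coords)
```

No anchor boxes or object-size priors appear anywhere. The number of objects is simply the number of condensation points found, which is what lets the method handle sparse graphs and point clouds.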

Toward Interpretability of Dual-Encoder Models for Dialogue Response Suggestions


Title	Toward Interpretability of Dual-Encoder Models for Dialogue Response Suggestions
Authors	Yitong Li, Dianqi Li, Sushant Prakash, Peng Wang
Abstract	This work shows how to improve and interpret the commonly used dual encoder model for response suggestion in dialogue. We present an attentive dual encoder model that includes an attention mechanism on top of the word-level features extracted by two encoders, one for the context and one for the label. To improve the interpretability of dual encoder models, we design a novel regularization loss that minimizes the mutual information between unimportant words and desired labels, in addition to the original attention method, so that important words are emphasized while unimportant words are de-emphasized. This not only helps model interpretability but also further improves model accuracy. We propose an approximation method that uses a neural network to calculate the mutual information. Furthermore, by adding a residual layer between the raw word embeddings and the final encoded context feature, word-level interpretability is preserved at the final prediction of the model. We compare the proposed model with existing methods for the dialogue response task on two public datasets (Persona and Ubuntu). The experiments demonstrate the effectiveness of the proposed model in terms of better Recall@1 accuracy and visualized interpretability.
Tasks	Word Embeddings
Published	2020-03-02
URL	https://arxiv.org/abs/2003.04998v1
PDF	https://arxiv.org/pdf/2003.04998v1.pdf
PWC	https://paperswithcode.com/paper/toward-interpretability-of-dual-encoder
Repo
Framework
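The two pieces of the attentive dual encoder that are easiest to pin down are attention pooling over word-level features and the Recall@1 metric used for evaluation. The sketch below shows both. The per-word attention weights it returns are the kind of quantity that would be visualized for interpretability. The single learned query vector, the dot-product scoring, and the in-batch form of Recall@1 (each context ranked against all responses in the batch) are assumptions; the mutual-information regularizer is not shown.

```python
import numpy as np

def attentive_pool(word_feats, query):
    """Attention-weighted sum of word-level features.

    word_feats: (L, d) features for L words; query: (d,) assumed learned query vector
    returns the pooled (d,) vector and the (L,) per-word weights
    """
    scores = word_feats @ query
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ word_feats, w

def recall_at_1(context_vecs, response_vecs):
    """Fraction of contexts whose top-scoring response (dot product) is the paired one.

    context_vecs, response_vecs: (n, d); row i of each forms the correct pair
    """
    scores = context_vecs @ response_vecs.T
    return float(np.mean(np.argmax(scores, axis=1) == np.arange(len(scores))))

rng = np.random.default_rng(0)
pooled, weights = attentive_pool(rng.normal(size=(5, 3)), rng.normal(size=3))
```

In a full model, `attentive_pool` would be applied separately to the context encoder's and the label encoder's word features. The two pooled vectors would then be compared by dot product, as in `recall_at_1`.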