April 3, 2020

# Paper Group ANR 20

Learning to Catch Piglets in Flight. Countering Language Drift with Seeded Iterated Learning. Verification of Neural Networks: Enhancing Scalability through Pruning. On the Decision Boundaries of Deep Neural Networks: A Tropical Geometry Perspective. Jointly Learning to Recommend and Advertise. Sketching Transformed Matrices with Applications to Na …

#### Learning to Catch Piglets in Flight

Title Learning to Catch Piglets in Flight
Authors Ozan Çatal, Lawrence De Mol, Tim Verbelen, Bart Dhoedt
Abstract Catching objects in-flight is an outstanding challenge in robotics. In this paper, we present a closed-loop control system fusing data from two sensor modalities: an RGB-D camera and a radar. To develop and test our method, we start with an easy to identify object: a stuffed Piglet. We implement and compare two approaches to detect and track the object, and to predict the interception point. A baseline model uses colour filtering for locating the thrown object in the environment, while the interception point is predicted using a least squares regression over the physical ballistic trajectory equations. A deep learning based method uses artificial neural networks for both object detection and interception point prediction. We show that we are able to successfully catch Piglet in 80% of the cases with our deep learning approach.
Published 2020-01-28
URL https://arxiv.org/abs/2001.10220v1
PDF https://arxiv.org/pdf/2001.10220v1.pdf
PWC https://paperswithcode.com/paper/learning-to-catch-piglets-in-flight
Repo
Framework
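
The baseline in the abstract above predicts the interception point by least squares regression over the ballistic trajectory equations. A minimal sketch of that idea follows, assuming a 2D throw (horizontal `x`, vertical `z`), no air drag, known gravity, and a fixed catch height. The function names, the coordinate convention, and the catch-height parameter are illustrative assumptions and are not taken from the paper.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed known)


def fit_line(ts, ys):
    """Ordinary least squares fit of y = a + b * t; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def predict_interception(ts, xs, zs, catch_height):
    """Fit x(t) = x0 + vx*t and z(t) = z0 + vz*t - G*t^2/2 to tracked
    positions, then return (t_hit, x_hit) for the descending crossing
    of z = catch_height."""
    x0, vx = fit_line(ts, xs)
    # Adding G*t^2/2 to z removes the known gravity term, leaving a line.
    z0, vz = fit_line(ts, [z + 0.5 * G * t * t for t, z in zip(ts, zs)])
    # Solve z0 + vz*t - G*t^2/2 = catch_height and take the later root.
    disc = vz * vz + 2.0 * G * (z0 - catch_height)
    if disc < 0:
        raise ValueError("trajectory never reaches the catch height")
    t_hit = (vz + math.sqrt(disc)) / G
    return t_hit, x0 + vx * t_hit
```

In a closed-loop setting the fit would be repeated as new detections arrive, so the predicted point is refined while the object is still in the air.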

#### Countering Language Drift with Seeded Iterated Learning

Title Countering Language Drift with Seeded Iterated Learning
Authors Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville
Abstract Supervised learning methods excel at capturing statistical properties of language when trained over large text corpora. Yet, these models often produce inconsistent outputs in goal-oriented language settings as they are not trained to complete the underlying task. Moreover, as soon as the agents are finetuned to maximize task completion, they suffer from the so-called language drift phenomenon: they slowly lose syntactic and semantic properties of language as they only focus on solving the task. In this paper, we propose a generic approach to counter language drift by using iterated learning. We iterate between fine-tuning agents with interactive training steps, and periodically replacing them with new agents that are seeded from the last iteration and trained to imitate the latest finetuned models. Iterated learning does not require external syntactic constraints or semantic knowledge, making it a valuable task-agnostic finetuning protocol. We first explore iterated learning in the Lewis Game. We then scale up the approach in the translation game. In both settings, our results show that iterated learning drastically counters language drift and improves the task completion metric.
Published 2020-03-28
URL https://arxiv.org/abs/2003.12694v1
PDF https://arxiv.org/pdf/2003.12694v1.pdf
PWC https://paperswithcode.com/paper/countering-language-drift-with-seeded
Repo
Framework

#### Verification of Neural Networks: Enhancing Scalability through Pruning

Title Verification of Neural Networks: Enhancing Scalability through Pruning
Authors Dario Guidotti, Francesco Leofante, Luca Pulina, Armando Tacchella
Abstract Verification of deep neural networks has witnessed a recent surge of interest, fueled by success stories in diverse domains and by abreast concerns about safety and security in envisaged applications. Complexity and sheer size of such networks are challenging for automated formal verification techniques which, on the other hand, could ease the adoption of deep networks in safety- and security-critical contexts. In this paper we focus on enabling state-of-the-art verification tools to deal with neural networks of some practical interest. We propose a new training pipeline based on network pruning with the goal of striking a balance between maintaining accuracy and robustness while making the resulting networks amenable to formal analysis. The results of our experiments with a portfolio of pruning algorithms and verification tools show that our approach is successful for the kind of networks we consider and for some combinations of pruning and verification techniques, thus bringing deep neural networks closer to the reach of formally-grounded methods.
Published 2020-03-17
URL https://arxiv.org/abs/2003.07636v1
PDF https://arxiv.org/pdf/2003.07636v1.pdf
PWC https://paperswithcode.com/paper/verification-of-neural-networks-enhancing
Repo
Framework

#### On the Decision Boundaries of Deep Neural Networks: A Tropical Geometry Perspective

Title On the Decision Boundaries of Deep Neural Networks: A Tropical Geometry Perspective
Authors Motasem Alfarra, Adel Bibi, Hasan Hammoud, Mohamed Gaafar, Bernard Ghanem
Abstract This work tackles the problem of characterizing and understanding the decision boundaries of neural networks with piecewise linear non-linearity activations. We use tropical geometry, a new development in the area of algebraic geometry, to characterize the decision boundaries of a simple neural network of the form (Affine, ReLU, Affine). Our main finding is that the decision boundaries are a subset of a tropical hypersurface, which is intimately related to a polytope formed by the convex hull of two zonotopes. The generators of these zonotopes are functions of the neural network parameters. This geometric characterization provides new perspective to three tasks. Specifically, we propose a new tropical perspective to the lottery ticket hypothesis, where we see the effect of different initializations on the tropical geometric representation of a network’s decision boundaries. Moreover, we use this characterization to propose a new set of tropical regularizers, which directly deal with the decision boundaries of a network. We investigate the use of these regularizers in neural network pruning (by removing network parameters that do not contribute to the tropical geometric representation of the decision boundaries) and in generating adversarial input attacks (by producing input perturbations that explicitly perturb the decision boundaries’ geometry and ultimately change the network’s prediction).
Published 2020-02-20
URL https://arxiv.org/abs/2002.08838v1
PDF https://arxiv.org/pdf/2002.08838v1.pdf
PWC https://paperswithcode.com/paper/on-the-decision-boundaries-of-deep-neural-1
Repo
Framework

#### Jointly Learning to Recommend and Advertise

Title Jointly Learning to Recommend and Advertise
Authors Xiangyu Zhao, Xudong Zheng, Xiwang Yang, Xiaobing Liu, Jiliang Tang
Published 2020-02-28
URL https://arxiv.org/abs/2003.00097v1
PDF https://arxiv.org/pdf/2003.00097v1.pdf
Repo
Framework

#### Sketching Transformed Matrices with Applications to Natural Language Processing

Title Sketching Transformed Matrices with Applications to Natural Language Processing
Authors Yingyu Liang, Zhao Song, Mengdi Wang, Lin F. Yang, Xin Yang
Abstract Suppose we are given a large matrix $A=(a_{i,j})$ that cannot be stored in memory but is in a disk or is presented in a data stream. However, we need to compute a matrix decomposition of the entry-wisely transformed matrix, $f(A):=(f(a_{i,j}))$ for some function $f$. Is it possible to do it in a space efficient way? Many machine learning applications indeed need to deal with such large transformed matrices, for example word embedding method in NLP needs to work with the pointwise mutual information (PMI) matrix, while the entrywise transformation makes it difficult to apply known linear algebraic tools. Existing approaches for this problem either need to store the whole matrix and perform the entry-wise transformation afterwards, which is space consuming or infeasible, or need to redesign the learning method, which is application specific and requires substantial remodeling. In this paper, we first propose a space-efficient sketching algorithm for computing the product of a given small matrix with the transformed matrix. It works for a general family of transformations with provable small error bounds and thus can be used as a primitive in downstream learning tasks. We then apply this primitive to a concrete application: low-rank approximation. We show that our approach obtains small error and is efficient in both space and time. We complement our theoretical results with experiments on synthetic and real data.
Published 2020-02-23
URL https://arxiv.org/abs/2002.09812v1
PDF https://arxiv.org/pdf/2002.09812v1.pdf
PWC https://paperswithcode.com/paper/sketching-transformed-matrices-with
Repo
Framework

#### Approximate Data Deletion from Machine Learning Models: Algorithms and Evaluations

Title Approximate Data Deletion from Machine Learning Models: Algorithms and Evaluations
Authors Zachary Izzo, Mary Anne Smart, Kamalika Chaudhuri, James Zou
Abstract Deleting data from a trained machine learning (ML) model is a critical task in many applications. For example, we may want to remove the influence of training points that might be out of date or outliers. Regulations such as EU’s General Data Protection Regulation also stipulate that individuals can request to have their data deleted. The naive approach to data deletion is to retrain the ML model on the remaining data, but this is too time consuming. Moreover there is no known efficient algorithm that exactly deletes data from most ML models. In this work, we evaluate several approaches for approximate data deletion from trained models. For the case of linear regression, we propose a new method with linear dependence on the feature dimension $d$, a significant gain over all existing methods which all have superlinear time dependence on the dimension. We also provide a new test for evaluating data deletion from linear models.
Published 2020-02-24
URL https://arxiv.org/abs/2002.10077v1
PDF https://arxiv.org/pdf/2002.10077v1.pdf
PWC https://paperswithcode.com/paper/approximate-data-deletion-from-machine
Repo
Framework
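
For linear regression, the contrast between retraining and updating can be made concrete. The sketch below is not the method proposed in the paper: it shows the classical exact removal of one training point from an ordinary least squares fit using a Sherman-Morrison downdate of $(X^\top X)^{-1}$, which costs $O(d^2)$ per deletion and so is superlinear in the dimension $d$. All function names are illustrative, and the code assumes the remaining data still has full column rank.

```python
def mat_vec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (fine for small d)."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(xs, ys):
    """Return (A_inv, b, theta) where A = X^T X, b = X^T y, theta = A_inv b."""
    d = len(xs[0])
    a = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    b = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    a_inv = invert(a)
    return a_inv, b, mat_vec(a_inv, b)


def delete_point(a_inv, b, x, y):
    """Remove one training pair (x, y) without touching the other data.

    Uses (A - x x^T)^{-1} = A^{-1} + (A^{-1} x)(A^{-1} x)^T / (1 - x^T A^{-1} x),
    which relies on A^{-1} being symmetric.
    """
    d = len(x)
    u = mat_vec(a_inv, x)
    denom = 1.0 - sum(xi * ui for xi, ui in zip(x, u))
    new_inv = [[a_inv[i][j] + u[i] * u[j] / denom for j in range(d)]
               for i in range(d)]
    new_b = [bi - y * xi for bi, xi in zip(b, x)]
    return new_inv, new_b, mat_vec(new_inv, new_b)
```

The parameters returned by `delete_point` should match those from calling `fit` on the remaining points, up to floating-point error, which gives a simple check for any approximate deletion scheme on linear models.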

#### Compact Representation of Uncertainty in Hierarchical Clustering

Title Compact Representation of Uncertainty in Hierarchical Clustering
Authors Craig S. Greenberg, Sebastian Macaluso, Nicholas Monath, Ji-Ah Lee, Patrick Flaherty, Kyle Cranmer, Andrew McGregor, Andrew McCallum
Abstract Hierarchical clustering is a fundamental task often used to discover meaningful structures in data, such as phylogenetic trees, taxonomies of concepts, subtypes of cancer, and cascades of particle decays in particle physics. When multiple hierarchical clusterings of the data are possible, it is useful to represent uncertainty in the clustering through various probabilistic quantities. Existing approaches represent uncertainty for a range of models; however, they only provide approximate inference. This paper presents dynamic-programming algorithms and proofs for exact inference in hierarchical clustering. We are able to compute the partition function, MAP hierarchical clustering, and marginal probabilities of sub-hierarchies and clusters. Our method supports a wide range of hierarchical models and only requires a cluster compatibility function. Rather than scaling with the number of hierarchical clusterings of $n$ elements ($\omega(n n! / 2^{n-1})$), our approach runs in time and space proportional to the significantly smaller powerset of $n$. Despite still being large, these algorithms enable exact inference in small-data applications and are also interesting from a theoretical perspective. We demonstrate the utility of our method and compare its performance with respect to existing approximate methods.
Published 2020-02-26
URL https://arxiv.org/abs/2002.11661v1
PDF https://arxiv.org/pdf/2002.11661v1.pdf
PWC https://paperswithcode.com/paper/compact-representation-of-uncertainty-in-1
Repo
Framework
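
The idea of replacing a sum over all hierarchies with a dynamic program over subsets can be sketched as follows. This is an illustrative sketch rather than the authors' algorithm: it assumes binary hierarchies and assumes that the unnormalized weight of a hierarchy is the product of a pairwise `potential(left, right)` over its internal splits. The function name and that factorization are assumptions made for the example.

```python
from functools import lru_cache
from itertools import combinations


def partition_function(elements, potential):
    """Sum, over all binary hierarchies on `elements`, of the product of
    `potential(left, right)` over every internal split."""

    @lru_cache(maxsize=None)
    def z(cluster):
        if len(cluster) == 1:
            return 1.0
        items = sorted(cluster)
        anchor, rest = items[0], items[1:]
        total = 0.0
        # Keep `anchor` on the left so each unordered split is counted once;
        # k < len(rest) guarantees the right side is never empty.
        for k in range(len(rest)):
            for extra in combinations(rest, k):
                left = frozenset((anchor,) + extra)
                right = cluster - left
                total += potential(left, right) * z(left) * z(right)
        return total

    return z(frozenset(elements))
```

With a constant potential of 1 the result is simply the number of binary hierarchies, for example 15 for four elements, while the memo table holds at most one entry per non-empty subset, that is $2^n - 1$ entries.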

#### Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Title Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs
Authors Valentin Radu, Kuba Kaszyk, Yuan Wen, Jack Turner, Jose Cano, Elliot J. Crowley, Bjorn Franke, Amos Storkey, Michael O’Boyle
Abstract Convolutional Neural Networks (CNN) are becoming a common presence in many applications and services, due to their superior recognition accuracy. They are increasingly being used on mobile devices, many times just by porting large models designed for server space, although several model compression techniques have been considered. One model compression technique intended to reduce computations is channel pruning. Mobile and embedded systems now have GPUs which are ideal for the parallel computations of neural networks and for their lower energy cost per operation. Specialized libraries perform these neural network computations through highly optimized routines. As we find in our experiments, these libraries are optimized for the most common network shapes, making uninstructed channel pruning inefficient. We evaluate higher level libraries, which analyze the input characteristics of a convolutional layer, based on which they produce optimized OpenCL (Arm Compute Library and TVM) and CUDA (cuDNN) code. However, in reality, these characteristics and subsequent choices intended for optimization can have the opposite effect. We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance, leading to 2x slowdown. On the other hand, we also find examples where performance-aware pruning achieves the intended results, with performance speedups of 3x with cuDNN and above 10x with Arm Compute Library and TVM. Our findings expose the need for hardware-instructed neural network pruning.
Published 2020-02-20
URL https://arxiv.org/abs/2002.08697v1
PDF https://arxiv.org/pdf/2002.08697v1.pdf
PWC https://paperswithcode.com/paper/performance-aware-convolutional-neural
Repo
Framework

#### Bayesian Domain Randomization for Sim-to-Real Transfer

Title Bayesian Domain Randomization for Sim-to-Real Transfer
Authors Fabio Muratore, Christian Eilers, Michael Gienger, Jan Peters
Abstract When learning policies for robot control, the real-world data required is typically prohibitively expensive to acquire, so learning in simulation is a popular strategy. Unfortunately, such policies are often not transferable to the real world due to a mismatch between the simulation and reality, called ‘reality gap’. Domain randomization methods tackle this problem by randomizing the physics simulator (source domain) according to a distribution over domain parameters during training in order to obtain more robust policies that are able to overcome the reality gap. Most domain randomization approaches sample the domain parameters from a fixed distribution. This solution is suboptimal in the context of sim-to-real transferability, since it yields policies that have been trained without explicitly optimizing for the reward on the real system (target domain). Additionally, a fixed distribution assumes there is prior knowledge about the uncertainty over the domain parameters. Thus, we propose Bayesian Domain Randomization (BayRn), a black box sim-to-real algorithm that solves tasks efficiently by adapting the domain parameter distribution during learning by sampling the real-world target domain. BayRn utilizes Bayesian optimization to search the space of source domain distribution parameters which produce a policy that maximizes the real-world objective, allowing for adaptive distributions during policy optimization. We experimentally validate the proposed approach by comparing against two baseline methods on a nonlinear under-actuated swing-up task. Our results show that BayRn is capable of performing direct sim-to-real transfer, while significantly reducing the required prior knowledge.
Published 2020-03-05
URL https://arxiv.org/abs/2003.02471v1
PDF https://arxiv.org/pdf/2003.02471v1.pdf
PWC https://paperswithcode.com/paper/bayesian-domain-randomization-for-sim-to-real
Repo
Framework

#### Parallel Performance-Energy Predictive Modeling of Browsers: Case Study of Servo

Title Parallel Performance-Energy Predictive Modeling of Browsers: Case Study of Servo
Authors Rohit Zambre, Lars Bergstrom, Laleh Aghababaie Beni, Aparna Chandramowlishwaran
Abstract Mozilla Research is developing Servo, a parallel web browser engine, to exploit the benefits of parallelism and concurrency in the web rendering pipeline. Parallelization results in improved performance for pinterest.com but not for google.com. This is because the workload of a browser is dependent on the web page it is rendering. In many cases, the overhead of creating, deleting, and coordinating parallel work outweighs any of its benefits. In this paper, we model the relationship between web page primitives and a web browser’s parallel performance using supervised learning. We discover a feature space that is representative of the parallelism available in a web page and characterize it using seven key features. Additionally, we consider energy usage trade-offs for different levels of performance improvements using automated labeling algorithms. Such a model allows us to predict the degree of parallelism available in a web page and decide whether or not to render a web page in parallel. This modeling is critical for improving the browser’s performance and minimizing its energy usage. We evaluate our model by using Servo’s layout stage as a case study. Experiments on a quad-core Intel Ivy Bridge (i7-3615QM) laptop show that we can improve performance and energy usage by up to 94.52% and 46.32% respectively on the 535 web pages considered in this study. Looking forward, we identify opportunities to apply this model to other stages of a browser’s architecture as well as other performance- and energy-critical devices.
Published 2020-02-06
URL https://arxiv.org/abs/2002.03850v1
PDF https://arxiv.org/pdf/2002.03850v1.pdf
PWC https://paperswithcode.com/paper/parallel-performance-energy-predictive
Repo
Framework

#### Quaternion-Valued Recurrent Projection Neural Networks on Unit Quaternions

Title Quaternion-Valued Recurrent Projection Neural Networks on Unit Quaternions
Authors Marcos Eduardo Valle, Rodolfo Anibal Lobo
Abstract Hypercomplex-valued neural networks, including quaternion-valued neural networks, can treat multi-dimensional data as a single entity. In this paper, we present the quaternion-valued recurrent projection neural networks (QRPNNs). Briefly, QRPNNs are obtained by combining the non-local projection learning with the quaternion-valued recurrent correlation neural network (QRCNNs). We show that QRPNNs overcome the cross-talk problem of QRCNNs. Thus, they are appropriate to implement associative memories. Furthermore, computational experiments reveal that QRPNNs exhibit greater storage capacity and noise tolerance than their corresponding QRCNNs.
Published 2020-01-30
URL https://arxiv.org/abs/2001.11846v1
PDF https://arxiv.org/pdf/2001.11846v1.pdf
PWC https://paperswithcode.com/paper/quaternion-valued-recurrent-projection-neural
Repo
Framework

#### Politics of Adversarial Machine Learning

Title Politics of Adversarial Machine Learning
Authors Kendra Albert, Jonathon Penney, Bruce Schneier, Ram Shankar Siva Kumar
Abstract In addition to their security properties, adversarial machine-learning attacks and defenses have political dimensions. They enable or foreclose certain options for both the subjects of the machine learning systems and for those who deploy them, creating risks for civil liberties and human rights. In this paper, we draw on insights from science and technology studies, anthropology, and human rights literature, to inform how defenses against adversarial attacks can be used to suppress dissent and limit attempts to investigate machine learning systems. To make this concrete, we use real-world examples of how attacks such as perturbation, model inversion, or membership inference can be used for socially desirable ends. Although the predictions of this analysis may seem dire, there is hope. Efforts to address human rights concerns in the commercial spyware industry provide guidance for similar measures to ensure ML systems serve democratic, not authoritarian ends.
Published 2020-02-01
URL https://arxiv.org/abs/2002.05648v2
PDF https://arxiv.org/pdf/2002.05648v2.pdf
Repo
Framework

#### NeuroFabric: Identifying Ideal Topologies for Training A Priori Sparse Networks

Title NeuroFabric: Identifying Ideal Topologies for Training A Priori Sparse Networks
Authors Mihailo Isakov, Michel A. Kinsy
Abstract Long training times of deep neural networks are a bottleneck in machine learning research. The major impediment to fast training is the quadratic growth of both memory and compute requirements of dense and convolutional layers with respect to their information bandwidth. Recently, training ‘a priori’ sparse networks has been proposed as a method for allowing layers to retain high information bandwidth, while keeping memory and compute low. However, the choice of which sparse topology should be used in these networks is unclear. In this work, we provide a theoretical foundation for the choice of intra-layer topology. First, we derive a new sparse neural network initialization scheme that allows us to explore the space of very deep sparse networks. Next, we evaluate several topologies and show that seemingly similar topologies can often have a large difference in attainable accuracy. To explain these differences, we develop a data-free heuristic that can evaluate a topology independently from the dataset the network will be trained on. We then derive a set of requirements that make a good topology, and arrive at a single topology that satisfies all of them.
Published 2020-02-19
URL https://arxiv.org/abs/2002.08339v1
PDF https://arxiv.org/pdf/2002.08339v1.pdf
PWC https://paperswithcode.com/paper/neurofabric-identifying-ideal-topologies-for-1
Repo
Framework

#### Registration by tracking for sequential 2D MRI

Title Registration by tracking for sequential 2D MRI
Authors Niklas Gunnarsson, Jens Sjölund, Thomas B. Schön
Abstract Our anatomy is in constant motion. With modern MR imaging it is possible to record this motion in real-time during an ongoing radiation therapy session. In this paper we present an image registration method that exploits the sequential nature of 2D MR images to estimate the corresponding displacement field. The method employs several discriminative correlation filters that independently track specific points. Together with a sparse-to-dense interpolation scheme we can then estimate the displacement field. The discriminative correlation filters are trained online, and our method is modality agnostic. For the interpolation scheme we use a neural network with normalized convolutions that is trained using synthetic diffeomorphic displacement fields. The method is evaluated on a segmented cardiac dataset and when compared to two conventional methods we observe an improved performance. This improvement is especially pronounced when it comes to the detection of larger motions of small objects.