Paper Group ANR 20
Learning to Catch Piglets in Flight
Title | Learning to Catch Piglets in Flight |
Authors | Ozan Çatal, Lawrence De Mol, Tim Verbelen, Bart Dhoedt |
Abstract | Catching objects in flight is an outstanding challenge in robotics. In this paper, we present a closed-loop control system fusing data from two sensor modalities: an RGB-D camera and a radar. To develop and test our method, we start with an easy-to-identify object: a stuffed Piglet. We implement and compare two approaches to detect and track the object, and to predict the interception point. A baseline model uses colour filtering for locating the thrown object in the environment, while the interception point is predicted using a least squares regression over the physical ballistic trajectory equations. A deep learning based method uses artificial neural networks for both object detection and interception point prediction. We show that we are able to successfully catch Piglet in 80% of the cases with our deep learning approach. |
Tasks | Object Detection |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2001.10220v1 |
https://arxiv.org/pdf/2001.10220v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-catch-piglets-in-flight |
Repo | |
Framework | |
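The baseline described in the abstract above fits ballistic trajectory equations by least squares. A minimal sketch of that idea follows, assuming a z-up world frame, no air drag, and known gravity; the function names `fit_ballistic` and `predict` are hypothetical and are not taken from the paper.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2 (assumption: z axis points up)
ACCEL = np.array([0.0, 0.0, -G])


def fit_ballistic(t, pos):
    """Least-squares fit of p(t) = p0 + v0 * t + 0.5 * ACCEL * t^2.

    t   : (n,) observation times in seconds
    pos : (n, 3) observed positions in metres
    Returns the estimated initial position p0 and initial velocity v0.
    """
    t = np.asarray(t, dtype=float)
    pos = np.asarray(pos, dtype=float)
    # Subtract the known gravity term so the remainder is linear in (p0, v0).
    linear_part = pos - 0.5 * ACCEL * t[:, None] ** 2
    design = np.column_stack([np.ones_like(t), t])
    coeffs, *_ = np.linalg.lstsq(design, linear_part, rcond=None)
    return coeffs[0], coeffs[1]


def predict(p0, v0, t):
    """Position on the fitted trajectory at time t."""
    return p0 + v0 * t + 0.5 * ACCEL * t ** 2
```

With `p0` and `v0` estimated from the first few tracked positions, `predict` can be evaluated at a future time to obtain a candidate interception point.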
Countering Language Drift with Seeded Iterated Learning
Title | Countering Language Drift with Seeded Iterated Learning |
Authors | Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville |
Abstract | Supervised learning methods excel at capturing statistical properties of language when trained over large text corpora. Yet, these models often produce inconsistent outputs in goal-oriented language settings as they are not trained to complete the underlying task. Moreover, as soon as the agents are finetuned to maximize task completion, they suffer from the so-called language drift phenomenon: they slowly lose syntactic and semantic properties of language as they only focus on solving the task. In this paper, we propose a generic approach to counter language drift by using iterated learning. We iterate between fine-tuning agents with interactive training steps, and periodically replacing them with new agents that are seeded from the last iteration and trained to imitate the latest finetuned models. Iterated learning requires neither external syntactic constraints nor semantic knowledge, making it a valuable task-agnostic finetuning protocol. We first explore iterated learning in the Lewis Game. We then scale up the approach in the translation game. In both settings, our results show that iterated learning drastically counters language drift and improves the task completion metric. |
Tasks | |
Published | 2020-03-28 |
URL | https://arxiv.org/abs/2003.12694v1 |
https://arxiv.org/pdf/2003.12694v1.pdf | |
PWC | https://paperswithcode.com/paper/countering-language-drift-with-seeded |
Repo | |
Framework | |
Verification of Neural Networks: Enhancing Scalability through Pruning
Title | Verification of Neural Networks: Enhancing Scalability through Pruning |
Authors | Dario Guidotti, Francesco Leofante, Luca Pulina, Armando Tacchella |
Abstract | Verification of deep neural networks has witnessed a recent surge of interest, fueled by success stories in diverse domains and by abreast concerns about safety and security in envisaged applications. Complexity and sheer size of such networks are challenging for automated formal verification techniques which, on the other hand, could ease the adoption of deep networks in safety- and security-critical contexts. In this paper we focus on enabling state-of-the-art verification tools to deal with neural networks of some practical interest. We propose a new training pipeline based on network pruning with the goal of striking a balance between maintaining accuracy and robustness while making the resulting networks amenable to formal analysis. The results of our experiments with a portfolio of pruning algorithms and verification tools show that our approach is successful for the kind of networks we consider and for some combinations of pruning and verification techniques, thus bringing deep neural networks closer to the reach of formally-grounded methods. |
Tasks | Network Pruning |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07636v1 |
https://arxiv.org/pdf/2003.07636v1.pdf | |
PWC | https://paperswithcode.com/paper/verification-of-neural-networks-enhancing |
Repo | |
Framework | |
On the Decision Boundaries of Deep Neural Networks: A Tropical Geometry Perspective
Title | On the Decision Boundaries of Deep Neural Networks: A Tropical Geometry Perspective |
Authors | Motasem Alfarra, Adel Bibi, Hasan Hammoud, Mohamed Gaafar, Bernard Ghanem |
Abstract | This work tackles the problem of characterizing and understanding the decision boundaries of neural networks with piecewise linear non-linearity activations. We use tropical geometry, a new development in the area of algebraic geometry, to characterize the decision boundaries of a simple neural network of the form (Affine, ReLU, Affine). Our main finding is that the decision boundaries are a subset of a tropical hypersurface, which is intimately related to a polytope formed by the convex hull of two zonotopes. The generators of these zonotopes are functions of the neural network parameters. This geometric characterization provides new perspective to three tasks. Specifically, we propose a new tropical perspective to the lottery ticket hypothesis, where we see the effect of different initializations on the tropical geometric representation of a network’s decision boundaries. Moreover, we use this characterization to propose a new set of tropical regularizers, which directly deal with the decision boundaries of a network. We investigate the use of these regularizers in neural network pruning (by removing network parameters that do not contribute to the tropical geometric representation of the decision boundaries) and in generating adversarial input attacks (by producing input perturbations that explicitly perturb the decision boundaries’ geometry and ultimately change the network’s prediction). |
Tasks | Network Pruning |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08838v1 |
https://arxiv.org/pdf/2002.08838v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-decision-boundaries-of-deep-neural-1 |
Repo | |
Framework | |
Jointly Learning to Recommend and Advertise
Title | Jointly Learning to Recommend and Advertise |
Authors | Xiangyu Zhao, Xudong Zheng, Xiwang Yang, Xiaobing Liu, Jiliang Tang |
Abstract | Online recommendation and advertising are two major income channels for online recommendation platforms (e.g. e-commerce and news feed sites). However, most platforms optimize recommending and advertising strategies by different teams separately via different techniques, which may lead to suboptimal overall performance. To this end, in this paper, we propose a novel two-level reinforcement learning framework to jointly optimize the recommending and advertising strategies, where the first level generates a list of recommendations to optimize user experience in the long run; then the second level inserts ads into the recommendation list that can balance the immediate advertising revenue from advertisers and the negative influence of ads on long-term user experience. To be specific, the first level tackles the high combinatorial action space problem of selecting a subset of items from the large item space, while the second level determines three internally related tasks, i.e., (i) whether to insert an ad, and if yes, (ii) the optimal ad and (iii) the optimal location to insert it. The experimental results based on real-world data demonstrate the effectiveness of the proposed framework. |
Tasks | |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2003.00097v1 |
https://arxiv.org/pdf/2003.00097v1.pdf | |
PWC | https://paperswithcode.com/paper/jointly-learning-to-recommend-and-advertise |
Repo | |
Framework | |
Sketching Transformed Matrices with Applications to Natural Language Processing
Title | Sketching Transformed Matrices with Applications to Natural Language Processing |
Authors | Yingyu Liang, Zhao Song, Mengdi Wang, Lin F. Yang, Xin Yang |
Abstract | Suppose we are given a large matrix $A=(a_{i,j})$ that cannot be stored in memory but is on disk or is presented in a data stream. However, we need to compute a matrix decomposition of the entrywise-transformed matrix, $f(A):=(f(a_{i,j}))$ for some function $f$. Is it possible to do it in a space-efficient way? Many machine learning applications indeed need to deal with such large transformed matrices; for example, word embedding methods in NLP need to work with the pointwise mutual information (PMI) matrix, while the entrywise transformation makes it difficult to apply known linear algebraic tools. Existing approaches for this problem either need to store the whole matrix and perform the entry-wise transformation afterwards, which is space consuming or infeasible, or need to redesign the learning method, which is application specific and requires substantial remodeling. In this paper, we first propose a space-efficient sketching algorithm for computing the product of a given small matrix with the transformed matrix. It works for a general family of transformations with provable small error bounds and thus can be used as a primitive in downstream learning tasks. We then apply this primitive to a concrete application: low-rank approximation. We show that our approach obtains small error and is efficient in both space and time. We complement our theoretical results with experiments on synthetic and real data. |
Tasks | |
Published | 2020-02-23 |
URL | https://arxiv.org/abs/2002.09812v1 |
https://arxiv.org/pdf/2002.09812v1.pdf | |
PWC | https://paperswithcode.com/paper/sketching-transformed-matrices-with |
Repo | |
Framework | |
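To make the PMI example in the abstract above concrete, the sketch below builds a PMI matrix from a small co-occurrence count matrix held fully in memory. This is the store-everything approach the abstract calls space consuming, not the sketching algorithm the paper proposes. The function name `pmi_matrix` and the convention of mapping undefined entries to 0 are assumptions made for illustration.

```python
import numpy as np


def pmi_matrix(counts):
    """Pointwise mutual information from a dense co-occurrence count matrix.

    PMI(i, j) = log( p(i, j) / (p(i) * p(j)) ), where the probabilities are
    estimated from the counts. Entries that are undefined (zero counts) are
    set to 0 here, which is one common convention and an assumption.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        # The logarithm is the entrywise step that blocks linear sketching.
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0
    return pmi
```

Because the logarithm is applied entry by entry, a linear sketch of the raw counts does not directly give a sketch of the PMI matrix, which is the difficulty the abstract refers to.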
Approximate Data Deletion from Machine Learning Models: Algorithms and Evaluations
Title | Approximate Data Deletion from Machine Learning Models: Algorithms and Evaluations |
Authors | Zachary Izzo, Mary Anne Smart, Kamalika Chaudhuri, James Zou |
Abstract | Deleting data from a trained machine learning (ML) model is a critical task in many applications. For example, we may want to remove the influence of training points that might be out of date or outliers. Regulations such as EU’s General Data Protection Regulation also stipulate that individuals can request to have their data deleted. The naive approach to data deletion is to retrain the ML model on the remaining data, but this is too time consuming. Moreover there is no known efficient algorithm that exactly deletes data from most ML models. In this work, we evaluate several approaches for approximate data deletion from trained models. For the case of linear regression, we propose a new method with linear dependence on the feature dimension $d$, a significant gain over all existing methods which all have superlinear time dependence on the dimension. We also provide a new test for evaluating data deletion from linear models. |
Tasks | |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10077v1 |
https://arxiv.org/pdf/2002.10077v1.pdf | |
PWC | https://paperswithcode.com/paper/approximate-data-deletion-from-machine |
Repo | |
Framework | |
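For context on the linear regression case mentioned in the abstract above, the sketch below contrasts naive retraining with a standard Sherman-Morrison downdate of the normal equations. The downdate is a textbook exact update that costs O(d^2) per deleted point, so it is superlinear in the feature dimension; it is not the linear-time method the paper proposes. The names `fit` and `delete_point` are hypothetical.

```python
import numpy as np


def fit(X, y):
    """Ordinary least squares via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def delete_point(XtX_inv, Xty, x, y_val):
    """Remove one training point (x, y_val) without refitting from scratch.

    Uses the Sherman-Morrison identity
        (A - x x^T)^{-1} = A^{-1} + (A^{-1} x)(A^{-1} x)^T / (1 - x^T A^{-1} x)
    for symmetric A = X^T X. Returns the updated inverse, the updated X^T y,
    and the new weight vector.
    """
    u = XtX_inv @ x
    denom = 1.0 - x @ u
    new_inv = XtX_inv + np.outer(u, u) / denom
    new_Xty = Xty - y_val * x
    return new_inv, new_Xty, new_inv @ new_Xty
```

A short usage example, comparing the downdate with retraining on the remaining data:

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
_, _, w_fast = delete_point(np.linalg.inv(X.T @ X), X.T @ y, X[0], y[0])
w_retrained = fit(X[1:], y[1:])  # agrees with w_fast up to rounding
```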
Compact Representation of Uncertainty in Hierarchical Clustering
Title | Compact Representation of Uncertainty in Hierarchical Clustering |
Authors | Craig S. Greenberg, Sebastian Macaluso, Nicholas Monath, Ji-Ah Lee, Patrick Flaherty, Kyle Cranmer, Andrew McGregor, Andrew McCallum |
Abstract | Hierarchical clustering is a fundamental task often used to discover meaningful structures in data, such as phylogenetic trees, taxonomies of concepts, subtypes of cancer, and cascades of particle decays in particle physics. When multiple hierarchical clusterings of the data are possible, it is useful to represent uncertainty in the clustering through various probabilistic quantities. Existing approaches represent uncertainty for a range of models; however, they only provide approximate inference. This paper presents dynamic-programming algorithms and proofs for exact inference in hierarchical clustering. We are able to compute the partition function, MAP hierarchical clustering, and marginal probabilities of sub-hierarchies and clusters. Our method supports a wide range of hierarchical models and only requires a cluster compatibility function. Rather than scaling with the number of hierarchical clusterings of $n$ elements ($\omega(n n! / 2^{n-1})$), our approach runs in time and space proportional to the significantly smaller powerset of $n$. Despite still being large, these algorithms enable exact inference in small-data applications and are also interesting from a theoretical perspective. We demonstrate the utility of our method and compare its performance with respect to existing approximate methods. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11661v1 |
https://arxiv.org/pdf/2002.11661v1.pdf | |
PWC | https://paperswithcode.com/paper/compact-representation-of-uncertainty-in-1 |
Repo | |
Framework | |
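The gap between the number of hierarchies and the size of the powerset, mentioned in the abstract above, can be illustrated numerically. The sketch assumes hierarchies are rooted binary trees over n labeled leaves, whose count is the double factorial (2n - 3)!!; this is a standard counting result and is not stated in the abstract itself. Both function names are hypothetical.

```python
from math import prod


def num_binary_hierarchies(n):
    """Number of rooted binary trees over n labeled leaves: (2n - 3)!!."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Product of the odd numbers 1, 3, ..., 2n - 3 (empty product is 1).
    return prod(range(1, 2 * n - 2, 2))


def num_subsets(n):
    """Size of the powerset of n elements."""
    return 2 ** n


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, num_binary_hierarchies(n), num_subsets(n))
```

For small n the two counts are comparable, but the number of hierarchies grows far faster than the powerset as n increases.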
Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs
Title | Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs |
Authors | Valentin Radu, Kuba Kaszyk, Yuan Wen, Jack Turner, Jose Cano, Elliot J. Crowley, Bjorn Franke, Amos Storkey, Michael O’Boyle |
Abstract | Convolutional Neural Networks (CNN) are becoming a common presence in many applications and services, due to their superior recognition accuracy. They are increasingly being used on mobile devices, many times just by porting large models designed for server space, although several model compression techniques have been considered. One model compression technique intended to reduce computations is channel pruning. Mobile and embedded systems now have GPUs which are ideal for the parallel computations of neural networks and for their lower energy cost per operation. Specialized libraries perform these neural network computations through highly optimized routines. As we find in our experiments, these libraries are optimized for the most common network shapes, making uninstructed channel pruning inefficient. We evaluate higher level libraries, which analyze the input characteristics of a convolutional layer, based on which they produce optimized OpenCL (Arm Compute Library and TVM) and CUDA (cuDNN) code. However, in reality, these characteristics and subsequent choices intended for optimization can have the opposite effect. We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance, leading to 2x slowdown. On the other hand, we also find examples where performance-aware pruning achieves the intended results, with performance speedups of 3x with cuDNN and above 10x with Arm Compute Library and TVM. Our findings expose the need for hardware-instructed neural network pruning. |
Tasks | Model Compression, Network Pruning |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08697v1 |
https://arxiv.org/pdf/2002.08697v1.pdf | |
PWC | https://paperswithcode.com/paper/performance-aware-convolutional-neural |
Repo | |
Framework | |
Bayesian Domain Randomization for Sim-to-Real Transfer
Title | Bayesian Domain Randomization for Sim-to-Real Transfer |
Authors | Fabio Muratore, Christian Eilers, Michael Gienger, Jan Peters |
Abstract | When learning policies for robot control, the real-world data required is typically prohibitively expensive to acquire, so learning in simulation is a popular strategy. Unfortunately, such policies are often not transferable to the real world due to a mismatch between the simulation and reality, called the ‘reality gap’. Domain randomization methods tackle this problem by randomizing the physics simulator (source domain) according to a distribution over domain parameters during training in order to obtain more robust policies that are able to overcome the reality gap. Most domain randomization approaches sample the domain parameters from a fixed distribution. This solution is suboptimal in the context of sim-to-real transferability, since it yields policies that have been trained without explicitly optimizing for the reward on the real system (target domain). Additionally, a fixed distribution assumes there is prior knowledge about the uncertainty over the domain parameters. Thus, we propose Bayesian Domain Randomization (BayRn), a black box sim-to-real algorithm that solves tasks efficiently by adapting the domain parameter distribution during learning by sampling the real-world target domain. BayRn utilizes Bayesian optimization to search the space of source domain distribution parameters which produce a policy that maximizes the real-world objective, allowing for adaptive distributions during policy optimization. We experimentally validate the proposed approach by comparing against two baseline methods on a nonlinear under-actuated swing-up task. Our results show that BayRn is capable of performing direct sim-to-real transfer, while significantly reducing the required prior knowledge. |
Tasks | |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02471v1 |
https://arxiv.org/pdf/2003.02471v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-domain-randomization-for-sim-to-real |
Repo | |
Framework | |
Parallel Performance-Energy Predictive Modeling of Browsers: Case Study of Servo
Title | Parallel Performance-Energy Predictive Modeling of Browsers: Case Study of Servo |
Authors | Rohit Zambre, Lars Bergstrom, Laleh Aghababaie Beni, Aparna Chandramowlishwaran |
Abstract | Mozilla Research is developing Servo, a parallel web browser engine, to exploit the benefits of parallelism and concurrency in the web rendering pipeline. Parallelization results in improved performance for pinterest.com but not for google.com. This is because the workload of a browser is dependent on the web page it is rendering. In many cases, the overhead of creating, deleting, and coordinating parallel work outweighs any of its benefits. In this paper, we model the relationship between web page primitives and a web browser’s parallel performance using supervised learning. We discover a feature space that is representative of the parallelism available in a web page and characterize it using seven key features. Additionally, we consider energy usage trade-offs for different levels of performance improvements using automated labeling algorithms. Such a model allows us to predict the degree of parallelism available in a web page and decide whether or not to render a web page in parallel. This modeling is critical for improving the browser’s performance and minimizing its energy usage. We evaluate our model by using Servo’s layout stage as a case study. Experiments on a quad-core Intel Ivy Bridge (i7-3615QM) laptop show that we can improve performance and energy usage by up to 94.52% and 46.32% respectively on the 535 web pages considered in this study. Looking forward, we identify opportunities to apply this model to other stages of a browser’s architecture as well as other performance- and energy-critical devices. |
Tasks | |
Published | 2020-02-06 |
URL | https://arxiv.org/abs/2002.03850v1 |
https://arxiv.org/pdf/2002.03850v1.pdf | |
PWC | https://paperswithcode.com/paper/parallel-performance-energy-predictive |
Repo | |
Framework | |
Quaternion-Valued Recurrent Projection Neural Networks on Unit Quaternions
Title | Quaternion-Valued Recurrent Projection Neural Networks on Unit Quaternions |
Authors | Marcos Eduardo Valle, Rodolfo Anibal Lobo |
Abstract | Hypercomplex-valued neural networks, including quaternion-valued neural networks, can treat multi-dimensional data as a single entity. In this paper, we present the quaternion-valued recurrent projection neural networks (QRPNNs). Briefly, QRPNNs are obtained by combining the non-local projection learning with the quaternion-valued recurrent correlation neural network (QRCNNs). We show that QRPNNs overcome the cross-talk problem of QRCNNs. Thus, they are appropriate to implement associative memories. Furthermore, computational experiments reveal that QRPNNs exhibit greater storage capacity and noise tolerance than their corresponding QRCNNs. |
Tasks | |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11846v1 |
https://arxiv.org/pdf/2001.11846v1.pdf | |
PWC | https://paperswithcode.com/paper/quaternion-valued-recurrent-projection-neural |
Repo | |
Framework | |
Politics of Adversarial Machine Learning
Title | Politics of Adversarial Machine Learning |
Authors | Kendra Albert, Jonathon Penney, Bruce Schneier, Ram Shankar Siva Kumar |
Abstract | In addition to their security properties, adversarial machine-learning attacks and defenses have political dimensions. They enable or foreclose certain options for both the subjects of the machine learning systems and for those who deploy them, creating risks for civil liberties and human rights. In this paper, we draw on insights from science and technology studies, anthropology, and human rights literature, to inform how defenses against adversarial attacks can be used to suppress dissent and limit attempts to investigate machine learning systems. To make this concrete, we use real-world examples of how attacks such as perturbation, model inversion, or membership inference can be used for socially desirable ends. Although the predictions of this analysis may seem dire, there is hope. Efforts to address human rights concerns in the commercial spyware industry provide guidance for similar measures to ensure ML systems serve democratic, not authoritarian, ends. |
Tasks | |
Published | 2020-02-01 |
URL | https://arxiv.org/abs/2002.05648v2 |
https://arxiv.org/pdf/2002.05648v2.pdf | |
PWC | https://paperswithcode.com/paper/politics-of-adversarial-machine-learning |
Repo | |
Framework | |
NeuroFabric: Identifying Ideal Topologies for Training A Priori Sparse Networks
Title | NeuroFabric: Identifying Ideal Topologies for Training A Priori Sparse Networks |
Authors | Mihailo Isakov, Michel A. Kinsy |
Abstract | Long training times of deep neural networks are a bottleneck in machine learning research. The major impediment to fast training is the quadratic growth of both memory and compute requirements of dense and convolutional layers with respect to their information bandwidth. Recently, training ‘a priori’ sparse networks has been proposed as a method for allowing layers to retain high information bandwidth, while keeping memory and compute low. However, the choice of which sparse topology should be used in these networks is unclear. In this work, we provide a theoretical foundation for the choice of intra-layer topology. First, we derive a new sparse neural network initialization scheme that allows us to explore the space of very deep sparse networks. Next, we evaluate several topologies and show that seemingly similar topologies can often have a large difference in attainable accuracy. To explain these differences, we develop a data-free heuristic that can evaluate a topology independently from the dataset the network will be trained on. We then derive a set of requirements that make a good topology, and arrive at a single topology that satisfies all of them. |
Tasks | |
Published | 2020-02-19 |
URL | https://arxiv.org/abs/2002.08339v1 |
https://arxiv.org/pdf/2002.08339v1.pdf | |
PWC | https://paperswithcode.com/paper/neurofabric-identifying-ideal-topologies-for-1 |
Repo | |
Framework | |
Registration by tracking for sequential 2D MRI
Title | Registration by tracking for sequential 2D MRI |
Authors | Niklas Gunnarsson, Jens Sjölund, Thomas B. Schön |
Abstract | Our anatomy is in constant motion. With modern MR imaging it is possible to record this motion in real-time during an ongoing radiation therapy session. In this paper we present an image registration method that exploits the sequential nature of 2D MR images to estimate the corresponding displacement field. The method employs several discriminative correlation filters that independently track specific points. Together with a sparse-to-dense interpolation scheme we can then estimate the displacement field. The discriminative correlation filters are trained online, and our method is modality agnostic. For the interpolation scheme we use a neural network with normalized convolutions that is trained using synthetic diffeomorphic displacement fields. The method is evaluated on a segmented cardiac dataset and when compared to two conventional methods we observe an improved performance. This improvement is especially pronounced when it comes to the detection of larger motions of small objects. |
Tasks | Image Registration |
Published | 2020-03-24 |
URL | https://arxiv.org/abs/2003.10819v1 |
https://arxiv.org/pdf/2003.10819v1.pdf | |
PWC | https://paperswithcode.com/paper/registration-by-tracking-for-sequential-2d |
Repo | |
Framework | |