October 21, 2019

Paper Group AWR 7

Learning Multiplication-free Linear Transformations. iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection. URSA: A Neural Network for Unordered Point Clouds Using Constellations. Learning Permutations with Sinkhorn Policy Gradient. Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Netwo …

Learning Multiplication-free Linear Transformations

Title Learning Multiplication-free Linear Transformations
Authors Cristian Rusu
Abstract In this paper, we propose several dictionary learning algorithms for sparse representations that also impose specific structures on the learned dictionaries such that they are numerically efficient to use: a reduced number of additions/multiplications, and even avoiding multiplications altogether. We base our work on factorizations of the dictionary into highly structured basic building blocks (binary orthonormal, scaling and shear transformations) for which we can write closed-form solutions to the optimization problems that we consider. We show the effectiveness of our methods on image data where we can compare against well-known numerically efficient transforms such as the fast Fourier and the fast discrete cosine transforms.
Tasks Dictionary Learning
Published 2018-12-09
URL http://arxiv.org/abs/1812.03412v1
PDF http://arxiv.org/pdf/1812.03412v1.pdf
PWC https://paperswithcode.com/paper/learning-multiplication-free-linear
Repo https://github.com/cristian-rusu-research/multiplication-free-transform
Framework none

iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection

Title iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection
Authors Chen Gao, Yuliang Zou, Jia-Bin Huang
Abstract Recent years have witnessed rapid progress in detecting and recognizing individual object instances. To understand the situation in a scene, however, computers need to recognize how humans interact with surrounding objects. In this paper, we tackle the challenging task of detecting human-object interactions (HOI). Our core idea is that the appearance of a person or an object instance contains informative cues on which relevant parts of an image to attend to for facilitating interaction prediction. To exploit these cues, we propose an instance-centric attention module that learns to dynamically highlight regions in an image conditioned on the appearance of each instance. Such an attention-based network allows us to selectively aggregate features relevant for recognizing HOIs. We validate the efficacy of the proposed network on the Verb in COCO and HICO-DET datasets and show that our approach compares favorably with the state of the art.
Tasks Human-Object Interaction Detection
Published 2018-08-30
URL http://arxiv.org/abs/1808.10437v1
PDF http://arxiv.org/pdf/1808.10437v1.pdf
PWC https://paperswithcode.com/paper/ican-instance-centric-attention-network-for
Repo https://github.com/TaiwanRobert/iCAN_for_live_video
Framework tf

URSA: A Neural Network for Unordered Point Clouds Using Constellations

Title URSA: A Neural Network for Unordered Point Clouds Using Constellations
Authors Mark B. Skouson, Brett J. Borghetti, Robert C. Leishman
Abstract This paper describes a neural network layer, named Ursa, that uses a constellation of points to learn classification information from point cloud data. Unlike other machine learning classification problems where the task is to classify an individual high-dimensional observation, in a point-cloud classification problem the goal is to classify a set of d-dimensional observations. Because a point cloud is a set, there is no ordering to the collection of points in a point-cloud classification problem. Thus, the challenge of classifying point-cloud inputs is in building a classifier which is agnostic to the ordering of the observations, yet preserves the d-dimensional information of each point in the set. This research presents Ursa, a new layer type for an artificial neural network which achieves these two properties. Similar to new methods for this task, this architecture works directly on d-dimensional points rather than first converting the points to a d-dimensional volume. The Ursa layer is followed by a series of dense layers to classify 2D and 3D objects from point clouds. Experiments on ModelNet40 and MNIST data show classification results comparable with current methods, while reducing the training parameters by over 50 percent.
Tasks
Published 2018-08-14
URL http://arxiv.org/abs/1808.04848v2
PDF http://arxiv.org/pdf/1808.04848v2.pdf
PWC https://paperswithcode.com/paper/ursa-a-neural-network-for-unordered-point
Repo https://github.com/RadicalAcronym/URSA
Framework tf

Learning Permutations with Sinkhorn Policy Gradient

Title Learning Permutations with Sinkhorn Policy Gradient
Authors Patrick Emami, Sanjay Ranka
Abstract Many problems at the intersection of combinatorics and computer science require solving for a permutation that optimally matches, ranks, or sorts some data. These problems usually have a task-specific, often non-differentiable objective function that data-driven algorithms can use as a learning signal. In this paper, we propose the Sinkhorn Policy Gradient (SPG) algorithm for learning policies on permutation matrices. The actor-critic neural network architecture we introduce for SPG uniquely decouples representation learning of the state space from the highly-structured action space of permutations with a temperature-controlled Sinkhorn layer. The Sinkhorn layer produces continuous relaxations of permutation matrices so that the actor-critic architecture can be trained end-to-end. Our empirical results show that agents trained with SPG can perform competitively on sorting, the Euclidean TSP, and matching tasks. We also observe that SPG is significantly more data efficient at the matching task than the baseline methods, which indicates that SPG is conducive to learning representations that are useful for reasoning about permutations.
Tasks Representation Learning
Published 2018-05-18
URL http://arxiv.org/abs/1805.07010v1
PDF http://arxiv.org/pdf/1805.07010v1.pdf
PWC https://paperswithcode.com/paper/learning-permutations-with-sinkhorn-policy
Repo https://github.com/pemami4911/sinkhorn-policy-gradient.pytorch
Framework pytorch

Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks

Title Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks
Authors Rodrigo Caye Daudt, Bertrand Le Saux, Alexandre Boulch, Yann Gousseau
Abstract The Copernicus Sentinel-2 program now provides multispectral images at a global scale with a high revisit rate. In this paper we explore the usage of convolutional neural networks for urban change detection using such multispectral images. We first present the new change detection dataset that was used for training the proposed networks, which will be openly available to serve as a benchmark. The Onera Satellite Change Detection (OSCD) dataset is composed of pairs of multispectral aerial images, and the changes were manually annotated at pixel level. We then propose two architectures to detect changes, Siamese and Early Fusion, and compare the impact of using different numbers of spectral channels as inputs. These architectures are trained from scratch using the provided dataset.
Tasks
Published 2018-10-19
URL http://arxiv.org/abs/1810.08468v1
PDF http://arxiv.org/pdf/1810.08468v1.pdf
PWC https://paperswithcode.com/paper/urban-change-detection-for-multispectral
Repo https://github.com/rcdaudt/patch_based_change_detection
Framework pytorch

CAAD 2018: Generating Transferable Adversarial Examples

Title CAAD 2018: Generating Transferable Adversarial Examples
Authors Yash Sharma, Tien-Dung Le, Moustafa Alzantot
Abstract Deep neural networks (DNNs) are vulnerable to adversarial examples, perturbations carefully crafted to fool the targeted DNN, in both the non-targeted and targeted case. In the non-targeted case, the attacker simply aims to induce misclassification. In the targeted case, the attacker aims to induce classification to a specified target class. In addition, it has been observed that strong adversarial examples can transfer to unknown models, yielding a serious security concern. The NIPS 2017 competition was organized to accelerate research in adversarial attacks and defenses, taking place in the realistic setting where submitted adversarial attacks attempt to transfer to submitted defenses. The CAAD 2018 competition took place with nearly identical rules to the NIPS 2017 one. Given the requirement that the NIPS 2017 submissions were to be open-sourced, participants in the CAAD 2018 competition were able to directly build upon previous solutions, and thus improve the state-of-the-art in this setting. Our team participated in the CAAD 2018 competition, and won 1st place in both attack subtracks, non-targeted and targeted adversarial attacks, and 3rd place in defense. We outline our solutions and development results in this article. We hope our results can inform researchers in both generating and defending against adversarial examples.
Tasks
Published 2018-09-29
URL http://arxiv.org/abs/1810.01268v2
PDF http://arxiv.org/pdf/1810.01268v2.pdf
PWC https://paperswithcode.com/paper/caad-2018-generating-transferable-adversarial
Repo https://github.com/ysharma1126/caad_18
Framework tf
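
The difference between the non-targeted and targeted cases can be sketched on a toy linear softmax classifier. This uses a single sign-of-gradient step, a standard baseline attack and not the team's competition solution; the classifier, `weights`, `x`, and `eps` are all hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Class probabilities of a toy linear classifier: softmax(W x)."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def loss_gradient(weights, x, label):
    """Gradient of the cross-entropy loss for `label` with respect to x."""
    probs = predict(weights, x)
    residual = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return [sum(residual[k] * weights[k][d] for k in range(len(weights)))
            for d in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def non_targeted_step(weights, x, true_label, eps):
    """Move x to increase the loss of the true class (induce any mistake)."""
    grad = loss_gradient(weights, x, true_label)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


def targeted_step(weights, x, target_label, eps):
    """Move x to decrease the loss of a chosen target class."""
    grad = loss_gradient(weights, x, target_label)
    return [xi - eps * sign(g) for xi, g in zip(x, grad)]


weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [1.0, 0.2]
x_non_targeted = non_targeted_step(weights, x, true_label=0, eps=0.1)
x_targeted = targeted_step(weights, x, target_label=2, eps=0.1)
```

The two steps differ only in which label the loss is taken for and in the direction of the update: ascent on the true class versus descent on the target class.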

Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks

Title Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks
Authors Felipe Oviedo, Zekun Ren, Shijing Sun, Charlie Settens, Zhe Liu, Noor Titan Putri Hartono, Ramasamy Savitha, Brian L. DeCost, Siyu I. P. Tian, Giuseppe Romano, Aaron Gilad Kusne, Tonio Buonassisi
Abstract X-ray diffraction (XRD) data acquisition and analysis is among the most time-consuming steps in the development cycle of novel thin-film materials. We propose a machine-learning-enabled approach to predict crystallographic dimensionality and space group from a limited number of thin-film XRD patterns. We overcome the scarce-data problem intrinsic to novel materials development by coupling a supervised machine learning approach with a model agnostic, physics-informed data augmentation strategy using simulated data from the Inorganic Crystal Structure Database (ICSD) and experimental data. As a test case, 115 thin-film metal halides spanning 3 dimensionalities and 7 space-groups are synthesized and classified. After testing various algorithms, we develop and implement an all convolutional neural network, with cross validated accuracies for dimensionality and space-group classification of 93% and 89%, respectively. We propose average class activation maps, computed from a global average pooling layer, to allow high model interpretability by human experimentalists, elucidating the root causes of misclassification. Finally, we systematically evaluate the maximum XRD pattern step size (data acquisition rate) before loss of predictive accuracy occurs, and determine it to be 0.16°, which enables an XRD pattern to be obtained and classified in 5.5 minutes or less.
Tasks Data Augmentation, X-Ray Diffraction (XRD)
Published 2018-11-20
URL http://arxiv.org/abs/1811.08425v2
PDF http://arxiv.org/pdf/1811.08425v2.pdf
PWC https://paperswithcode.com/paper/fast-classification-of-small-x-ray
Repo https://github.com/PV-Lab/autoXRD
Framework tf

BinGAN: Learning Compact Binary Descriptors with a Regularized GAN

Title BinGAN: Learning Compact Binary Descriptors with a Regularized GAN
Authors Maciej Zieba, Piotr Semberecki, Tarek El-Gaaly, Tomasz Trzcinski
Abstract In this paper, we propose a novel regularization method for Generative Adversarial Networks, which allows the model to learn discriminative yet compact binary representations of image patches (image descriptors). We employ the dimensionality reduction that takes place in the intermediate layers of the discriminator network and train binarized low-dimensional representation of the penultimate layer to mimic the distribution of the higher-dimensional preceding layers. To achieve this, we introduce two loss terms that aim at: (i) reducing the correlation between the dimensions of the binarized low-dimensional representation of the penultimate layer (i.e., maximizing joint entropy) and (ii) propagating the relations between the dimensions in the high-dimensional space to the low-dimensional space. We evaluate the resulting binary image descriptors on two challenging applications, image matching and retrieval, and achieve state-of-the-art results.
Tasks Dimensionality Reduction
Published 2018-06-18
URL http://arxiv.org/abs/1806.06778v5
PDF http://arxiv.org/pdf/1806.06778v5.pdf
PWC https://paperswithcode.com/paper/bingan-learning-compact-binary-descriptors
Repo https://github.com/maciejzieba/binGAN
Framework none

On Breiman’s Dilemma in Neural Networks: Phase Transitions of Margin Dynamics

Title On Breiman’s Dilemma in Neural Networks: Phase Transitions of Margin Dynamics
Authors Weizhi Zhu, Yifei Huang, Yuan Yao
Abstract Margin enlargement over training data has been an important strategy since perceptrons in machine learning for the purpose of boosting the robustness of classifiers toward a good generalization ability. Yet Breiman shows a dilemma (Breiman, 1999) that a uniform improvement on margin distribution does not necessarily reduce generalization errors. In this paper, we revisit Breiman’s dilemma in deep neural networks with recently proposed spectrally normalized margins. A novel perspective is provided to explain Breiman’s dilemma based on phase transitions in dynamics of normalized margin distributions, that reflects the trade-off between expressive power of models and complexity of data. When data complexity is comparable to the model expressiveness in the sense that both training and test data share similar phase transitions in normalized margin dynamics, two efficient ways are derived to predict the trend of generalization or test error via classic margin-based generalization bounds with restricted Rademacher complexities. On the other hand, over-expressive models that exhibit uniform improvements on training margins, as a distinct phase transition to test margin dynamics, may lose such a prediction power and fail to prevent overfitting. Experiments are conducted to show the validity of the proposed method with some basic convolutional networks, AlexNet, VGG-16, and ResNet-18, on several datasets including Cifar10/100 and mini-ImageNet.
Tasks
Published 2018-10-08
URL http://arxiv.org/abs/1810.03389v2
PDF http://arxiv.org/pdf/1810.03389v2.pdf
PWC https://paperswithcode.com/paper/on-breimans-dilemma-in-neural-networks-phase
Repo https://github.com/yao-lab/margin
Framework pytorch

High throughput quantitative metallography for complex microstructures using deep learning: A case study in ultrahigh carbon steel

Title High throughput quantitative metallography for complex microstructures using deep learning: A case study in ultrahigh carbon steel
Authors Brian L. DeCost, Bo Lei, Toby Francis, Elizabeth A. Holm
Abstract We apply a deep convolutional neural network segmentation model to enable novel automated microstructure segmentation applications for complex microstructures typically evaluated manually and subjectively. We explore two microstructure segmentation tasks in an openly-available ultrahigh carbon steel microstructure dataset: segmenting cementite particles in the spheroidized matrix, and segmenting larger fields of view featuring grain boundary carbide, spheroidized particle matrix, particle-free grain boundary denuded zone, and Widmanstätten cementite. We also demonstrate how to combine these data-driven microstructure segmentation models to obtain empirical cementite particle size and denuded zone width distributions from more complex micrographs containing multiple microconstituents. The full annotated dataset is available on materialsdata.nist.gov (https://materialsdata.nist.gov/handle/11256/964).
Tasks
Published 2018-05-04
URL http://arxiv.org/abs/1805.08693v2
PDF http://arxiv.org/pdf/1805.08693v2.pdf
PWC https://paperswithcode.com/paper/high-throughput-quantitative-metallography
Repo https://github.com/bdecost/pixelnet
Framework tf

Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning

Title Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning
Authors Mitchell Wortsman, Kiana Ehsani, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi
Abstract Learning is an inherently continuous phenomenon. When humans learn a new task there is no explicit distinction between training and inference. As we learn a task, we keep learning about it while performing the task. What we learn and how we learn it varies during different stages of learning. Learning how to learn and adapt is a key property that enables us to generalize effortlessly to new settings. This is in contrast with conventional settings in machine learning where a trained model is frozen during inference. In this paper we study the problem of learning to learn at both training and test time in the context of visual navigation. A fundamental challenge in navigation is generalization to unseen scenes. In this paper we propose a self-adaptive visual navigation method (SAVN) which learns to adapt to new environments without any explicit supervision. Our solution is a meta-reinforcement learning approach where an agent learns a self-supervised interaction loss that encourages effective navigation. Our experiments, performed in the AI2-THOR framework, show major improvements in both success rate and SPL for visual navigation in novel scenes. Our code and data are available at: https://github.com/allenai/savn .
Tasks Meta-Learning, Visual Navigation
Published 2018-12-03
URL http://arxiv.org/abs/1812.00971v2
PDF http://arxiv.org/pdf/1812.00971v2.pdf
PWC https://paperswithcode.com/paper/learning-to-learn-how-to-learn-self-adaptive
Repo https://github.com/allenai/savn
Framework pytorch
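
The abstract reports success rate and SPL without defining them. Assuming SPL refers to the commonly used "Success weighted by Path Length" navigation metric, a minimal sketch of both quantities follows; the episode tuples and function names are hypothetical, and path lengths are assumed to be positive.

```python
def success_rate(episodes):
    """Fraction of episodes in which the agent reached the target."""
    return sum(1 for success, _, _ in episodes if success) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length.

    Each episode is (success, shortest_path_length, agent_path_length).
    A successful episode contributes shortest / max(agent, shortest),
    so a detour lowers the score; a failed episode contributes 0.
    """
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            total += shortest / max(taken, shortest)
    return total / len(episodes)


episodes = [
    (True, 10.0, 10.0),   # optimal path
    (True, 10.0, 20.0),   # success, but twice the shortest path
    (False, 8.0, 30.0),   # failure
    (True, 6.0, 8.0),     # success with a small detour
]
rate = success_rate(episodes)
score = spl(episodes)
```

Under this definition SPL can never exceed the success rate, which is why the two are usually reported together.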

DiCE: The Infinitely Differentiable Monte-Carlo Estimator

Title DiCE: The Infinitely Differentiable Monte-Carlo Estimator
Authors Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric P. Xing, Shimon Whiteson
Abstract The score function estimator is widely used for estimating gradients of stochastic objectives in stochastic computation graphs (SCG), e.g., in reinforcement learning and meta-learning. While deriving the first-order gradient estimators by differentiating a surrogate loss (SL) objective is computationally and conceptually simple, using the same approach for higher-order derivatives is more challenging. Firstly, analytically deriving and implementing such estimators is laborious and not compliant with automatic differentiation. Secondly, repeatedly applying SL to construct new objectives for each order derivative involves increasingly cumbersome graph manipulations. Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives. To address all these shortcomings in a unified way, we introduce DiCE, which provides a single objective that can be differentiated repeatedly, generating correct estimators of derivatives of any order in SCGs. Unlike SL, DiCE relies on automatic differentiation for performing the requisite graph manipulations. We verify the correctness of DiCE both through a proof and numerical evaluation of the DiCE derivative estimates. We also use DiCE to propose and evaluate a novel approach for multi-agent learning. Our code is available at https://www.github.com/alshedivat/lola.
Tasks Meta-Learning
Published 2018-02-14
URL http://arxiv.org/abs/1802.05098v3
PDF http://arxiv.org/pdf/1802.05098v3.pdf
PWC https://paperswithcode.com/paper/dice-the-infinitely-differentiable-monte
Repo https://github.com/alexis-jacq/LOLA_DICE
Framework pytorch
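
As background for the abstract above, the first-order score function estimator can be sketched on a one-parameter Bernoulli example. This is not DiCE itself, which targets higher-order derivatives and relies on automatic differentiation; the cost function, `theta`, and all names here are hypothetical.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def score_function_gradient(f, theta, n_samples, rng):
    """Monte Carlo estimate of d/dtheta E[f(x)], x ~ Bernoulli(sigmoid(theta))."""
    p = sigmoid(theta)
    total = 0.0
    for _ in range(n_samples):
        x = 1 if rng.random() < p else 0
        # For a Bernoulli with logit theta, d/dtheta log p(x; theta) = x - p.
        total += f(x) * (x - p)
    return total / n_samples


def exact_gradient(f, theta):
    """Closed-form gradient of p * f(1) + (1 - p) * f(0) with respect to theta."""
    p = sigmoid(theta)
    return p * (1.0 - p) * (f(1) - f(0))


def cost(x):
    return 3.0 if x == 1 else 1.0


rng = random.Random(0)
estimate = score_function_gradient(cost, theta=0.3, n_samples=100_000, rng=rng)
reference = exact_gradient(cost, theta=0.3)
```

The estimator only needs samples of `x` and the derivative of the log-probability, so it applies even when `cost` is not differentiable. The sampled cost is treated as a constant here, which is the part of the surrogate-loss approach that the abstract identifies as problematic for higher-order derivatives.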

Multi-platform Version of StarCraft: Brood War in a Docker Container: Technical Report

Title Multi-platform Version of StarCraft: Brood War in a Docker Container: Technical Report
Authors Michal Šustr, Jan Malý, Michal Čertický
Abstract We present a dockerized version of the real-time strategy game StarCraft: Brood War, commonly used as a domain for AI research, with a pre-installed collection of AI development tools supporting all the major types of StarCraft bots. This provides a convenient way to deploy StarCraft AIs on numerous hosts at once and across multiple platforms despite limited OS support of StarCraft. In this technical report, we describe the design of our Docker images and present a few use cases.
Tasks Real-Time Strategy Games, Starcraft
Published 2018-01-07
URL http://arxiv.org/abs/1801.02193v1
PDF http://arxiv.org/pdf/1801.02193v1.pdf
PWC https://paperswithcode.com/paper/multi-platform-version-of-starcraft-brood-war
Repo https://github.com/Games-and-Simulations/sc-docker
Framework none

TextBoxes++: A Single-Shot Oriented Scene Text Detector

Title TextBoxes++: A Single-Shot Oriented Scene Text Detector
Authors Minghui Liao, Baoguang Shi, Xiang Bai
Abstract Scene text detection is an important step in a scene text recognition system and also a challenging problem. Different from general object detection, the main challenges of scene text detection lie in arbitrary orientations, small sizes, and significantly variant aspect ratios of text in natural images. In this paper, we present an end-to-end trainable fast scene text detector, named TextBoxes++, which detects arbitrary-oriented scene text with both high accuracy and efficiency in a single network forward pass. No post-processing other than an efficient non-maximum suppression is involved. We have evaluated the proposed TextBoxes++ on four public datasets. In all experiments, TextBoxes++ outperforms competing methods in terms of text localization accuracy and runtime. More specifically, TextBoxes++ achieves an f-measure of 0.817 at 11.6fps for 1024×1024 ICDAR 2015 Incidental text images, and an f-measure of 0.5591 at 19.8fps for 768×768 COCO-Text images. Furthermore, combined with a text recognizer, TextBoxes++ significantly outperforms the state-of-the-art approaches for word spotting and end-to-end text recognition tasks on popular benchmarks. Code is available at: https://github.com/MhLiao/TextBoxes_plusplus
Tasks Object Detection, Scene Text Detection, Scene Text Recognition
Published 2018-01-09
URL http://arxiv.org/abs/1801.02765v3
PDF http://arxiv.org/pdf/1801.02765v3.pdf
PWC https://paperswithcode.com/paper/textboxes-a-single-shot-oriented-scene-text
Repo https://github.com/jercas/TextBoxes_plusplus_tf
Framework tf
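
The non-maximum suppression step named in the abstract can be sketched in its simplest greedy form. This version assumes axis-aligned boxes given as (x1, y1, x2, y2); TextBoxes++ works with oriented text boxes, so this is a simplification and not the authors' implementation, and the threshold and sample boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0.0, 0.0, 10.0, 4.0), (1.0, 0.0, 11.0, 4.0), (20.0, 20.0, 30.0, 24.0)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of 36/44 and is suppressed, while the distant third box survives. For oriented boxes the `iou` function would need to be replaced by a polygon overlap computation.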

Robustness of Rotation-Equivariant Networks to Adversarial Perturbations

Title Robustness of Rotation-Equivariant Networks to Adversarial Perturbations
Authors Beranger Dumont, Simona Maggio, Pablo Montalvo
Abstract Deep neural networks have been shown to be vulnerable to adversarial examples: very small perturbations of the input having a dramatic impact on the predictions. A wealth of adversarial attacks and distance metrics to quantify the similarity between natural and adversarial images have been proposed, recently enlarging the scope of adversarial examples with geometric transformations beyond pixel-wise attacks. In this context, we investigate the robustness to adversarial attacks of new Convolutional Neural Network architectures providing equivariance to rotations. We found that rotation-equivariant networks are significantly less vulnerable to geometric-based attacks than regular networks on the MNIST, CIFAR-10, and ImageNet datasets.
Tasks
Published 2018-02-19
URL http://arxiv.org/abs/1802.06627v2
PDF http://arxiv.org/pdf/1802.06627v2.pdf
PWC https://paperswithcode.com/paper/robustness-of-rotation-equivariant-networks
Repo https://github.com/rakutentech/stAdv
Framework tf