October 21, 2019

2986 words 15 mins read

Paper Group AWR 48

Actor Conditioned Attention Maps for Video Action Detection. Certified Defenses against Adversarial Examples. AirLab: Autograd Image Registration Laboratory. Deep Learning based Inter-Modality Image Registration Supervised by Intra-Modality Similarity. One-Shot Unsupervised Cross Domain Translation. L4: Practical loss-based stepsize adaptation for …

Actor Conditioned Attention Maps for Video Action Detection

Title Actor Conditioned Attention Maps for Video Action Detection
Authors Oytun Ulutan, Swati Rallapalli, Mudhakar Srivatsa, Carlos Torres, B. S. Manjunath
Abstract While observing complex events with multiple actors, humans do not assess each actor separately, but infer from the context. The surrounding context provides essential information for understanding actions. To this end, we propose to replace region-of-interest (RoI) pooling with an attention module, which ranks each spatio-temporal region’s relevance to a detected actor instead of cropping. We refer to these as Actor-Conditioned Attention Maps (ACAM), which weight the features extracted from the entire scene. The resulting actor-conditioned features focus the model on regions that are relevant to the conditioned actor. For actor localization, we leverage pre-trained object detectors, which generalize better. The proposed model is efficient, and our action detection pipeline achieves near real-time performance. Experimental results on AVA 2.1 and JHMDB demonstrate the effectiveness of attention maps, with improvements of 5 mAP on AVA and 4 mAP on JHMDB. (See the sketch after this entry.)
Tasks Action Detection
Published 2018-12-30
URL http://arxiv.org/abs/1812.11631v2
PDF http://arxiv.org/pdf/1812.11631v2.pdf
PWC https://paperswithcode.com/paper/actor-conditioned-attention-maps-for-video
Repo https://github.com/oulutan/ACAM_Demo
Framework tf
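
The attention module above replaces RoI cropping with actor-conditioned weighting of whole-scene features. Below is a minimal PyTorch sketch of that idea; the module name, the sigmoid gating, and all shapes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ActorConditionedAttention(nn.Module):
    """Weights scene features by an attention map conditioned on one actor."""
    def __init__(self, feat_dim, actor_dim):
        super().__init__()
        # Project the actor embedding so it can be broadcast over the feature map.
        self.actor_proj = nn.Linear(actor_dim, feat_dim)
        self.attend = nn.Conv2d(2 * feat_dim, 1, kernel_size=1)

    def forward(self, scene_feats, actor_emb):
        # scene_feats: (B, C, H, W); actor_emb: (B, A)
        b, c, h, w = scene_feats.shape
        actor = self.actor_proj(actor_emb).view(b, c, 1, 1).expand(b, c, h, w)
        # Score each spatial location's relevance to this actor.
        attn = torch.sigmoid(self.attend(torch.cat([scene_feats, actor], dim=1)))
        return scene_feats * attn   # actor-conditioned features, same shape as input

feats, actor = torch.randn(2, 256, 14, 14), torch.randn(2, 128)
print(ActorConditionedAttention(256, 128)(feats, actor).shape)  # (2, 256, 14, 14)
```

Because the output keeps the full scene resolution, context outside the actor's box can still contribute, which is the point of dropping the crop.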

Certified Defenses against Adversarial Examples

Title Certified Defenses against Adversarial Examples
Authors Aditi Raghunathan, Jacob Steinhardt, Percy Liang
Abstract While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs. Defenses based on regularization and adversarial training have been proposed, but are often followed by new, stronger attacks that defeat them. Can we somehow end this arms race? In this work, we study this problem for neural networks with one hidden layer. We first propose a method based on a semidefinite relaxation that outputs a certificate that, for a given network and test input, no attack can force the error to exceed a certain value. Second, as this certificate is differentiable, we jointly optimize it with the network parameters, providing an adaptive regularizer that encourages robustness against all attacks. On MNIST, our approach produces a network and a certificate that no attack that perturbs each pixel by at most $\epsilon = 0.1$ can cause more than 35% test error. (See the sketch after this entry.)
Tasks Adversarial Attack, Adversarial Defense, Image Classification
Published 2018-01-29
URL http://arxiv.org/abs/1801.09344v1
PDF http://arxiv.org/pdf/1801.09344v1.pdf
PWC https://paperswithcode.com/paper/certified-defenses-against-adversarial
Repo https://github.com/UnofficialJuliaMirrorSnapshots/MIPVerify.jl-e5e5f8be-2a6a-5994-adbb-5afbd0e30425
Framework none
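
The paper's certificate comes from a semidefinite relaxation, which is too involved to reproduce here. As a stand-in that illustrates the certification idea, the sketch below uses interval bound propagation, a simpler and generally looser technique that also gives sound per-input robustness certificates for one-hidden-layer ReLU networks.

```python
import numpy as np

def ibp_certificate(W1, b1, W2, b2, x, eps, true_label):
    """Certify f(x) = W2 @ relu(W1 @ x + b1) + b2 against l_inf noise of size eps.
    Returns True only if NO such perturbation can change the predicted class.
    (Interval bounds, not the paper's tighter SDP certificate.)"""
    center = W1 @ x + b1
    radius = np.abs(W1) @ np.full_like(x, eps)       # worst-case pre-activation shift
    lo, hi = np.maximum(center - radius, 0), np.maximum(center + radius, 0)  # ReLU bounds
    # Per-logit extremes: pick the worst hidden value for each output weight's sign.
    out_lo = np.where(W2 > 0, W2 * lo, W2 * hi).sum(axis=1) + b2
    out_hi = np.where(W2 > 0, W2 * hi, W2 * lo).sum(axis=1) + b2
    others = np.delete(out_hi, true_label)
    return bool(out_lo[true_label] > others.max())

rng = np.random.default_rng(0)
W1, b1 = 0.05 * rng.normal(size=(50, 784)), np.zeros(50)
W2, b2 = 0.05 * rng.normal(size=(10, 50)), np.zeros(10)
x = rng.normal(size=784)
label = int(np.argmax(W2 @ np.maximum(W1 @ x + b1, 0) + b2))
print(ibp_certificate(W1, b1, W2, b2, x, eps=0.1, true_label=label))
```

A `True` here is a guarantee; a `False` only means this looser bound failed to certify the input, which is exactly the gap the paper's SDP relaxation narrows.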

AirLab: Autograd Image Registration Laboratory

Title AirLab: Autograd Image Registration Laboratory
Authors Robin Sandkühler, Christoph Jud, Simon Andermatt, Philippe C. Cattin
Abstract Medical image registration is an active research topic and forms a basis for many medical image analysis tasks. Although image registration is a rather general concept, specialized methods are usually required to target a specific registration problem. The development and implementation of such methods has been tough so far, as the gradient of the objective has to be derived and implemented by hand, and its evaluation preferably has to be performed on a GPU for larger images and for more complex transformation models and regularization terms. This hinders researchers from rapid prototyping and poses hurdles for reproducing research results. There is a clear need for an environment which hides this complexity and puts the modeling and the experimental exploration of registration methods into the foreground. With the “Autograd Image Registration Laboratory” (AIRLab), we introduce an open laboratory for image registration tasks, where the analytic gradients of the objective function are computed automatically and the device where the computations are performed, a CPU or a GPU, is transparent. It is meant as a laboratory for researchers and developers, enabling them to rapidly try out new ideas for registering images and to reproduce registration results which have already been published. AIRLab is implemented in Python, using PyTorch as its tensor and optimization library and SimpleITK for basic image I/O. It therefore profits from recent advances made by the machine learning community in optimization and deep neural network models. This draft of the paper outlines AIRLab with first code snippets and performance analyses; a more exhaustive introduction will follow as a final version soon. (See the sketch after this entry.)
Tasks Image Registration, Medical Image Registration
Published 2018-06-26
URL https://arxiv.org/abs/1806.09907v2
PDF https://arxiv.org/pdf/1806.09907v2.pdf
PWC https://paperswithcode.com/paper/airlab-autograd-image-registration-laboratory
Repo https://github.com/airlab-unibas/airlab
Framework pytorch
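
AIRLab's central point is that autograd removes the need to derive registration gradients by hand. The sketch below shows that workflow in plain PyTorch rather than AIRLab's own API; the affine parametrization and MSE objective are illustrative choices.

```python
import torch
import torch.nn.functional as F

def register_affine(fixed, moving, iters=200, lr=1e-2):
    """Fit a 2-D affine transform aligning `moving` to `fixed` by gradient
    descent; autograd supplies d(MSE)/d(theta), no hand-derived gradient."""
    theta = torch.tensor([[1., 0., 0.], [0., 1., 0.]], requires_grad=True)
    opt = torch.optim.Adam([theta], lr=lr)
    for _ in range(iters):
        grid = F.affine_grid(theta.unsqueeze(0), moving.shape, align_corners=False)
        warped = F.grid_sample(moving, grid, align_corners=False)
        loss = F.mse_loss(warped, fixed)
        opt.zero_grad(); loss.backward(); opt.step()
    return theta.detach(), loss.item()

fixed = torch.rand(1, 1, 64, 64)                 # (N, C, H, W)
moving = torch.roll(fixed, shifts=3, dims=3)     # the same image, shifted 3 px
theta, final_loss = register_affine(fixed, moving)
print(theta, final_loss)
```

Swapping the objective (e.g. mutual information) or the transform (e.g. a B-spline field) changes only the forward pass, which is the rapid-prototyping argument the abstract makes; moving the computation to a GPU is a matter of `.to('cuda')`.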

Deep Learning based Inter-Modality Image Registration Supervised by Intra-Modality Similarity

Title Deep Learning based Inter-Modality Image Registration Supervised by Intra-Modality Similarity
Authors Xiaohuan Cao, Jianhua Yang, Li Wang, Zhong Xue, Qian Wang, Dinggang Shen
Abstract Non-rigid inter-modality registration can facilitate accurate information fusion from different modalities, but it is challenging due to the very different image appearances across modalities. In this paper, we propose to train a non-rigid inter-modality image registration network which can directly predict the transformation field from the input multimodal images, such as CT and MR images. In particular, the training of our inter-modality registration network is supervised by an intra-modality similarity metric based on available paired data, derived from a pre-aligned CT and MR dataset. Specifically, in the training stage, to register the input CT and MR images, their similarity is evaluated between the warped MR image and the MR image that is paired with the input CT. In this way, the intra-modality similarity metric can be directly applied to measure whether the input CT and MR images are well registered. Moreover, we measure the similarity in a dual-modality fashion, on both the CT modality and the MR modality, so that the complementary anatomy in both modalities can be jointly considered to train the inter-modality registration network more accurately. In the testing stage, the trained inter-modality registration network can be directly applied to register new multimodal images without any paired data. Experimental results show that the proposed method achieves promising accuracy and efficiency for the challenging non-rigid inter-modality registration task and also outperforms state-of-the-art approaches. (See the sketch after this entry.)
Tasks Image Registration
Published 2018-04-28
URL http://arxiv.org/abs/1804.10735v1
PDF http://arxiv.org/pdf/1804.10735v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-inter-modality-image
Repo https://github.com/Duoduo-Qian/Medical-image-registration-Resources
Framework pytorch
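
A hedged sketch of the dual intra-modality supervision described above: a predicted displacement field warps images, and the loss compares like with like in each modality using the pre-aligned training pairs. The function names, MSE as the intra-modality similarity, and 2-D warping are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Warp img (N, 1, H, W) with a dense displacement field flow (N, 2, H, W)."""
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                            indexing="ij")
    base = torch.stack([xs, ys], dim=-1).unsqueeze(0)   # identity sampling grid
    return F.grid_sample(img, base + flow.permute(0, 2, 3, 1), align_corners=True)

def dual_intra_modality_loss(flow, ct_fixed, mr_moving, mr_paired_ct, ct_paired_mr):
    # MR side: the warped moving MR should match the MR pre-aligned with the fixed CT.
    loss_mr = F.mse_loss(warp(mr_moving, flow), mr_paired_ct)
    # CT side: warping the CT pre-aligned with the moving MR should match the fixed CT.
    loss_ct = F.mse_loss(warp(ct_paired_mr, flow), ct_fixed)
    return loss_mr + loss_ct

flow = torch.zeros(1, 2, 64, 64, requires_grad=True)    # stand-in for a network output
imgs = [torch.rand(1, 1, 64, 64) for _ in range(4)]
dual_intra_modality_loss(flow, *imgs).backward()        # gradients reach the field
```

At test time only the registration network remains; the paired images exist solely to let an intra-modality metric supervise an inter-modality task.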

One-Shot Unsupervised Cross Domain Translation

Title One-Shot Unsupervised Cross Domain Translation
Authors Sagie Benaim, Lior Wolf
Abstract Given a single image x from domain A and a set of images from domain B, our task is to generate the analog of x in B. We argue that this task could be a key AI capability that underlies the ability of cognitive agents to act in the world, and we present empirical evidence that the existing unsupervised domain translation methods fail on this task. Our method follows a two-step process. First, a variational autoencoder for domain B is trained. Then, given the new sample x, we create a variational autoencoder for domain A by adapting the layers that are close to the image in order to directly fit x, and only indirectly adapting the other layers. Our experiments indicate that the new method, trained on one sample x, does as well as the existing domain transfer methods when these enjoy a multitude of training samples from domain A. Our code is made publicly available at https://github.com/sagiebenaim/OneShotTranslation (See the sketch after this entry.)
Tasks One Shot Image to Image Translation, Unsupervised Image-To-Image Translation, Zero-Shot Learning
Published 2018-06-15
URL http://arxiv.org/abs/1806.06029v2
PDF http://arxiv.org/pdf/1806.06029v2.pdf
PWC https://paperswithcode.com/paper/one-shot-unsupervised-cross-domain
Repo https://github.com/guy-oren/OneShotTranslationExt
Framework pytorch
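
A hedged sketch of the selective adaptation step: clone the domain-B model, freeze the deep shared layers, and fit only the layers closest to the image on the single sample x. The `.encoder`/`.decoder` attributes (as `nn.Sequential`, image-side layers first/last) and the `shallow` split point are assumptions; the paper's full method also keeps its translation objectives, omitted here, and this sketch treats the VAE as a plain autoencoder.

```python
import copy
import torch
import torch.nn.functional as F

def adapt_to_one_sample(vae_b, x, shallow=2, steps=100, lr=1e-4):
    """Build a domain-A model from a domain-B VAE by fitting x with only the
    image-adjacent layers unfrozen; deep layers adapt only indirectly."""
    vae_a = copy.deepcopy(vae_b)
    for p in vae_a.parameters():
        p.requires_grad = False                     # freeze everything ...
    for layer in list(vae_a.encoder[:shallow]) + list(vae_a.decoder[-shallow:]):
        for p in layer.parameters():
            p.requires_grad = True                  # ... except layers near the image
    opt = torch.optim.Adam([p for p in vae_a.parameters() if p.requires_grad], lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(vae_a.decoder(vae_a.encoder(x)), x)   # fit the single x
        opt.zero_grad(); loss.backward(); opt.step()
    return vae_a
```

Freezing the deep layers is what keeps the shared, domain-invariant representation from collapsing onto the one available sample.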

L4: Practical loss-based stepsize adaptation for deep learning

Title L4: Practical loss-based stepsize adaptation for deep learning
Authors Michal Rolinek, Georg Martius
Abstract We propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly on the loss function and rescales the gradient in order to make fixed predicted progress on the loss. We demonstrate its capabilities by conclusively improving the performance of the Adam and Momentum optimizers. The enhanced optimizers with default hyperparameters consistently outperform their constant-stepsize counterparts, even the best-tuned ones, without a measurable increase in computational cost. The performance is validated on multiple architectures, including dense nets, CNNs, ResNets, and the recurrent Differentiable Neural Computer, on the classical datasets MNIST, Fashion-MNIST, CIFAR-10, and others. (See the sketch after this entry.)
Tasks
Published 2018-02-14
URL http://arxiv.org/abs/1802.05074v5
PDF http://arxiv.org/pdf/1802.05074v5.pdf
PWC https://paperswithcode.com/paper/l4-practical-loss-based-stepsize-adaptation
Repo https://github.com/martius-lab/l4-optimizer
Framework tf
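
The L4 rule itself is compact: scale whatever update direction v the base optimizer proposes so that the predicted linear progress on the loss equals a fraction α of the gap to the minimal attainable loss, i.e. η = α (L − L_min) / (gᵀv). Below is a sketch with plain gradients as the direction; the paper wraps Adam and momentum and maintains L_min as a running estimate, fixed here for brevity.

```python
import torch

def l4_step(params, loss, base_update, l_min, alpha=0.15):
    """One L4 update: rescale the base direction so the linearized loss drop
    equals alpha * (loss - l_min). `base_update(p, g)` returns the direction v
    a base optimizer would take (here: plain gradient); names are illustrative."""
    grads = torch.autograd.grad(loss, params)
    vs = [base_update(p, g) for p, g in zip(params, grads)]
    gv = sum((g * v).sum() for g, v in zip(grads, vs))  # predicted progress per unit step
    eta = alpha * (loss.item() - l_min) / (gv.item() + 1e-12)
    with torch.no_grad():
        for p, v in zip(params, vs):
            p -= eta * v
    return eta

w = torch.randn(5, requires_grad=True)
loss = (w ** 2).sum()
print(l4_step([w], loss, base_update=lambda p, g: g, l_min=0.0))  # adaptive stepsize
```

Because η is recomputed from the current loss at every step, the scheme needs no stepsize schedule, which is what the "default hyperparameters" claim above rests on.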

Learning Tasks for Multitask Learning: Heterogenous Patient Populations in the ICU

Title Learning Tasks for Multitask Learning: Heterogenous Patient Populations in the ICU
Authors Harini Suresh, Jen J. Gong, John Guttag
Abstract Machine learning approaches have been effective in predicting adverse outcomes in different clinical settings. These models are often developed and evaluated on datasets with heterogeneous patient populations. However, good predictive performance on the aggregate population does not imply good performance for specific groups. In this work, we present a two-step framework to (1) learn relevant patient subgroups and (2) predict an outcome for separate patient populations in a multi-task framework, where each population is a separate task. We demonstrate how to discover relevant groups in an unsupervised way with a sequence-to-sequence autoencoder. We show that using these groups in a multi-task framework leads to better predictive performance of in-hospital mortality, both across groups and overall. We also highlight the need for more granular evaluation of performance when dealing with heterogeneous populations. (See the sketch after this entry.)
Tasks
Published 2018-06-07
URL http://arxiv.org/abs/1806.02878v1
PDF http://arxiv.org/pdf/1806.02878v1.pdf
PWC https://paperswithcode.com/paper/learning-tasks-for-multitask-learning
Repo https://github.com/mit-ddig/multitask-patients
Framework tf
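
A hedged sketch of step 2, the multi-task predictor: a shared recurrent encoder with one mortality head per discovered patient group. Group assignments are assumed to come from step 1 (the unsupervised sequence-to-sequence autoencoder plus clustering), which is not reproduced here; all layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class MultiTaskMortality(nn.Module):
    """Shared LSTM encoder with one prediction head per learned patient group."""
    def __init__(self, n_feats, n_groups, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(n_feats, hidden, batch_first=True)
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_groups))

    def forward(self, x, group_ids):
        # x: (B, T, F) physiological time series; group_ids: one cluster id per patient
        _, (h, _) = self.encoder(x)
        h = h[-1]                                    # final hidden state, (B, hidden)
        logits = torch.stack([self.heads[g](h[i]) for i, g in enumerate(group_ids)])
        return logits.squeeze(-1)                    # in-hospital mortality logits

model = MultiTaskMortality(n_feats=20, n_groups=3)
x = torch.randn(4, 48, 20)                           # 4 patients, 48 hourly steps
print(model(x, group_ids=[0, 2, 1, 0]).shape)        # torch.Size([4])
```

The shared encoder lets small subgroups borrow statistical strength from the aggregate, while the per-group heads keep their decision boundaries separate.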

The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamical Systems

Title The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamical Systems
Authors Spencer M. Richards, Felix Berkenkamp, Andreas Krause
Abstract Learning algorithms have shown considerable prowess in simulation by allowing robots to adapt to uncertain environments and improve their performance. However, such algorithms are rarely used in practice on safety-critical systems, since the learned policy typically does not yield any safety guarantees: the required exploration may cause physical harm to the robot or its environment. In this paper, we present a method to learn accurate safety certificates for nonlinear, closed-loop dynamical systems. Specifically, we construct a neural network Lyapunov function and a training algorithm that adapts it to the shape of the largest safe region in the state space. The algorithm relies only on knowledge of inputs and outputs of the dynamics, rather than on any specific model structure. We demonstrate our method by learning the safe region of attraction for a simulated inverted pendulum. Furthermore, we discuss how our method can be used in safe learning algorithms together with statistical models of dynamical systems. (See the sketch after this entry.)
Tasks
Published 2018-08-02
URL http://arxiv.org/abs/1808.00924v2
PDF http://arxiv.org/pdf/1808.00924v2.pdf
PWC https://paperswithcode.com/paper/the-lyapunov-neural-network-adaptive
Repo https://github.com/befelix/safe_learning
Framework tf
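
A minimal sketch of the two ingredients: a network whose output is a Lyapunov candidate by construction (V(0) = 0 and V(x) > 0 elsewhere), and a loss penalizing violations of the decrease condition along the dynamics. The bias-free tanh parametrization and the hinge-style loss are illustrative assumptions; the paper additionally grows a verified level set, omitted here.

```python
import torch
import torch.nn as nn

class LyapunovNet(nn.Module):
    """V(x) = ||phi(x)||^2 + eps * ||x||^2 with bias-free tanh layers, so that
    phi(0) = 0 and hence V(0) = 0, while V(x) > 0 for every x != 0."""
    def __init__(self, dim, hidden=64, eps=1e-3):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Linear(dim, hidden, bias=False), nn.Tanh(),
            nn.Linear(hidden, hidden, bias=False), nn.Tanh())
        self.eps = eps

    def forward(self, x):
        return (self.phi(x) ** 2).sum(-1) + self.eps * (x ** 2).sum(-1)

def decrease_loss(V, f, x):
    """Penalize states where V fails to decrease along x' = f(x); only calls
    to f (input/output access) are needed, not f's internal model structure."""
    return torch.relu(V(f(x)) - V(x)).mean()

V, f = LyapunovNet(dim=2), lambda x: 0.9 * x         # toy stable dynamics
loss = decrease_loss(V, f, torch.randn(256, 2))
loss.backward()                                      # shapes V toward the safe region
```

States where V decreases form the certificate: any sublevel set on which the decrease condition holds everywhere is forward-invariant, hence safe.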

Consistent Individualized Feature Attribution for Tree Ensembles

Title Consistent Individualized Feature Attribution for Tree Ensembles
Authors Scott M. Lundberg, Gabriel G. Erion, Su-In Lee
Abstract A unified approach to explaining the output of any machine learning model. (See the usage sketch after this entry.)
Tasks
Published 2018-02-12
URL http://arxiv.org/abs/1802.03888v3
PDF http://arxiv.org/pdf/1802.03888v3.pdf
PWC https://paperswithcode.com/paper/consistent-individualized-feature-attribution
Repo https://github.com/saivarunr/xshap
Framework tf
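
This is the TreeSHAP paper behind TreeExplainer in the authors' `shap` library (the repo listed here, xshap, is a third-party port). A usage sketch, assuming `shap` and `xgboost` are installed and that, as in recent releases, binary XGBoost models yield a single (n, d) array of margin-space attributions:

```python
import numpy as np
import shap
import xgboost

X = np.random.rand(100, 4)
y = (X[:, 0] + X[:, 1] > 1).astype(int)
model = xgboost.XGBClassifier(n_estimators=50).fit(X, y)

explainer = shap.TreeExplainer(model)        # exact Shapley values in polynomial time
shap_values = explainer.shap_values(X)       # per-sample, per-feature attributions
print(np.shape(shap_values))                 # (100, 4)
# Local accuracy: attributions plus the base value recover the margin output.
print(np.allclose(np.sum(shap_values, axis=-1) + explainer.expected_value,
                  model.predict(X, output_margin=True), atol=1e-3))
```

Consistency, the property in the title, is the guarantee that a feature's attribution never decreases when the model is changed so as to rely on that feature more.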

Mutual Information Maximization for Simple and Accurate Part-Of-Speech Induction

Title Mutual Information Maximization for Simple and Accurate Part-Of-Speech Induction
Authors Karl Stratos
Abstract We address part-of-speech (POS) induction by maximizing the mutual information between the induced label and its context. We focus on two training objectives that are amenable to stochastic gradient descent (SGD): a novel generalization of the classical Brown clustering objective and a recently proposed variational lower bound. While both objectives are subject to noise in gradient updates, we show through analysis and experiments that the variational lower bound is robust whereas the generalized Brown objective is vulnerable. We obtain competitive performance on a multitude of datasets and languages with a simple architecture that encodes morphology and context. (See the sketch after this entry.)
Tasks
Published 2018-04-20
URL http://arxiv.org/abs/1804.07849v4
PDF http://arxiv.org/pdf/1804.07849v4.pdf
PWC https://paperswithcode.com/paper/mutual-information-maximization-for-simple
Repo https://github.com/karlstratos/mmi-tagger
Framework pytorch
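
A hedged sketch of the variational objective (the robust one above): since I(Y; C) ≥ H(Y) + E[log q(y|c)], one can maximize the entropy of the induced tag marginal while making the word encoder's tags predictable from a separate context encoder. The encoders producing the two sets of logits are assumed, not shown.

```python
import torch
import torch.nn.functional as F

def variational_mi_loss(word_logits, context_logits):
    """Negative variational lower bound on I(label; context). Minimizing it
    makes p(y|word) predictable from q(y|context) while keeping the label
    marginal high-entropy, so the induced tags cannot collapse to one class."""
    p = F.softmax(word_logits, dim=-1)                  # p(y|x), shape (B, K)
    log_q = F.log_softmax(context_logits, dim=-1)       # log q(y|c), shape (B, K)
    cross = -(p * log_q).sum(-1).mean()                 # E[-log q(y|c)] under p(y|x)
    marginal = p.mean(0)                                # empirical label marginal
    entropy = -(marginal * (marginal + 1e-12).log()).sum()
    return cross - entropy                              # minimize => maximize the bound

word_logits = torch.randn(32, 45, requires_grad=True)   # 45 induced POS tags
context_logits = torch.randn(32, 45, requires_grad=True)
variational_mi_loss(word_logits, context_logits).backward()
```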

Guess Where? Actor-Supervision for Spatiotemporal Action Localization

Title Guess Where? Actor-Supervision for Spatiotemporal Action Localization
Authors Victor Escorcia, Cuong D. Dao, Mihir Jain, Bernard Ghanem, Cees Snoek
Abstract This paper addresses the problem of spatiotemporal localization of actions in videos. Compared to leading approaches, which all learn to localize based on carefully annotated boxes on training video frames, we adhere to a weakly-supervised solution that only requires a video class label. We introduce an actor-supervised architecture that exploits the inherent compositionality of actions, in terms of actor transformations, to localize actions. We make two contributions. First, we propose actor proposals derived from a detector for human and non-human actors intended for images, linked over time by Siamese similarity matching to account for actor deformations. Second, we propose an actor-based attention mechanism that enables the localization of actions from action class labels and actor proposals, and is end-to-end trainable. Experiments on three human and non-human action datasets show that actor supervision is state-of-the-art for weakly-supervised action localization and is even competitive with some fully-supervised alternatives. (See the sketch after this entry.)
Tasks Action Localization, Weakly Supervised Action Localization
Published 2018-04-05
URL http://arxiv.org/abs/1804.01824v1
PDF http://arxiv.org/pdf/1804.01824v1.pdf
PWC https://paperswithcode.com/paper/guess-where-actor-supervision-for
Repo https://github.com/escorciav/roi_pooling
Framework pytorch
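
A toy sketch of the temporal linking step only: greedy cosine-similarity matching of per-frame actor-proposal embeddings, standing in for the paper's Siamese matching. The detector, the embedding network, and any smoothing are assumed to exist upstream.

```python
import torch
import torch.nn.functional as F

def link_proposals(embeddings):
    """embeddings[t]: (N_t, D), one row per actor proposal in frame t.
    Returns, for each consecutive frame pair, the index in frame t+1 best
    matching each proposal in frame t (greedy, highest cosine similarity)."""
    links = []
    for cur, nxt in zip(embeddings, embeddings[1:]):
        sim = F.normalize(cur, dim=1) @ F.normalize(nxt, dim=1).T
        links.append(sim.argmax(dim=1))
    return links

frames = [torch.randn(3, 128), torch.randn(4, 128), torch.randn(2, 128)]
print(link_proposals(frames))   # e.g. [tensor([2, 0, 1]), tensor([0, 1, 1, 0])]
```

Chaining these per-pair matches yields the actor tubes over which the attention mechanism then localizes the action.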

Low Latency Privacy Preserving Inference

Title Low Latency Privacy Preserving Inference
Authors Alon Brutzkus, Oren Elisha, Ran Gilad-Bachrach
Abstract When applying machine learning to sensitive data, one has to find a balance between accuracy, information security, and computational complexity. Recent studies have combined Homomorphic Encryption with neural networks to make inferences while protecting against information leakage. However, these methods are limited by the width and depth of neural networks that can be used (and hence the accuracy), and exhibit high latency even for relatively simple networks. In this study, we provide two solutions that address these limitations. In the first solution, we present more than $10\times$ improvement in latency and enable inference on wider networks compared to prior attempts with the same level of security. The improved performance is achieved by novel methods to represent the data during the computation. In the second solution, we apply the method of transfer learning to provide private inference services using deep networks with latency of $\sim0.16$ seconds. We demonstrate the efficacy of our methods on several computer vision tasks. (See the sketch after this entry.)
Tasks Transfer Learning
Published 2018-12-27
URL https://arxiv.org/abs/1812.10659v2
PDF https://arxiv.org/pdf/1812.10659v2.pdf
PWC https://paperswithcode.com/paper/low-latency-privacy-preserving-inference
Repo https://github.com/microsoft/CryptoNets
Framework none
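
The paper builds on CryptoNets-style leveled homomorphic encryption, which is too heavy to reproduce here; instead, a toy Paillier cryptosystem makes the enabling fact concrete: linear layers can be evaluated directly on ciphertexts, because multiplying ciphertexts adds plaintexts and exponentiating scales them. Primes and parameters below are illustrative and completely insecure.

```python
import math, random

p, q = 104729, 1299709                          # small known primes, demo only
n, n2 = p * q, (p * q) ** 2
g, lam = n + 1, math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)     # standard Paillier key setup

def enc(m):
    r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

def enc_dot(cipher_vec, weights):
    """Dot product of an encrypted vector with plaintext integer weights:
    additions become multiplications, scalings become exponentiations."""
    out = 1
    for c, w in zip(cipher_vec, weights):
        out = (out * pow(c, w, n2)) % n2
    return out

x, w = [3, 1, 4], [2, 7, 1]
print(dec(enc_dot([enc(v) for v in x], w)))     # 17 = 2*3 + 7*1 + 1*4
```

Nonlinearities are the hard part: schemes like the paper's replace activations with low-degree polynomials (e.g. squaring), and the width/depth limits the abstract mentions come from the ciphertext noise budget such operations consume.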

Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than Character Level

Title Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than Character Level
Authors Lifeng Han, Shaohui Kuang
Abstract In neural machine translation (NMT), researchers face the challenge of translating unseen (out-of-vocabulary, OOV) words. To address this, some researchers have proposed splitting Western languages such as English and German into sub-words or compounds. In this paper, we address the OOV issue and improve NMT adequacy for Chinese, a harder language whose characters are even more sophisticated in composition. We integrate Chinese radicals into the NMT model under different settings to address the unseen-word challenge in Chinese-to-English translation. This can also be considered a semantic component of the MT system, since Chinese radicals usually carry the essential meaning of the words they form. Meaningful radicals and new characters can be integrated into NMT systems with our models. We use an attention-based NMT system as a strong baseline. Experiments on the standard NIST Chinese-to-English translation shared-task data from 2006 and 2008 show that our models outperform the baseline on a wide range of state-of-the-art evaluation metrics, including LEPOR, BEER, and CharacTER, in addition to BLEU and NIST scores, especially on adequacy-level translation. Our various experimental settings also yield interesting findings about the behavior of words and characters in Chinese NMT, which differs from other languages: while fully character-level NMT may perform well or even achieve the state of the art in some other languages, as researchers have recently demonstrated, word-boundary knowledge remains important for learning Chinese NMT models. (See the sketch after this entry.)
Tasks Machine Translation
Published 2018-05-03
URL https://arxiv.org/abs/1805.01565v3
PDF https://arxiv.org/pdf/1805.01565v3.pdf
PWC https://paperswithcode.com/paper/apply-chinese-radicals-into-neural-machine
Repo https://github.com/poethan/MWE4MT
Framework none
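
A hedged sketch of one natural way to realize "deeper than character level": sum each character's embedding with the embeddings of its radicals, so an unseen character still contributes meaning through radicals it shares with known ones. The lookup-table layout and the sum composition are assumptions; the paper compares several integration settings.

```python
import torch
import torch.nn as nn

class RadicalAwareEmbedding(nn.Module):
    """Character embedding enriched with the embeddings of its radicals."""
    def __init__(self, vocab, radical_vocab, dim):
        super().__init__()
        self.char = nn.Embedding(vocab, dim)
        self.radical = nn.Embedding(radical_vocab, dim, padding_idx=0)

    def forward(self, char_ids, radical_ids):
        # char_ids: (B, T); radical_ids: (B, T, R), zero-padded per character
        return self.char(char_ids) + self.radical(radical_ids).sum(dim=2)

emb = RadicalAwareEmbedding(vocab=8000, radical_vocab=300, dim=256)
chars = torch.randint(1, 8000, (2, 10))
radicals = torch.randint(0, 300, (2, 10, 3))       # up to 3 radicals, 0 = padding
print(emb(chars, radicals).shape)                  # torch.Size([2, 10, 256])
```

The enriched embeddings drop into the source side of any attention-based NMT encoder in place of plain character embeddings.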

L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data

Title L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data
Authors Jianbo Chen, Le Song, Martin J. Wainwright, Michael I. Jordan
Abstract We study instancewise feature importance scoring as a method for model interpretation. Any such method yields, for each predicted instance, a vector of importance scores associated with the feature vector. Methods based on the Shapley score have been proposed as a fair way of computing feature attributions of this kind, but they incur exponential complexity in the number of features. This combinatorial explosion arises from the definition of the Shapley value and prevents these methods from scaling to large data sets and complex models. We focus on settings in which the data have a graph structure, and the contribution of features to the target variable is well-approximated by a graph-structured factorization. In such settings, we develop two algorithms with linear complexity for instancewise feature importance scoring. We establish the relationship of our methods to the Shapley value and to another closely related concept known as the Myerson value from cooperative game theory. We demonstrate on both language and image data that our algorithms compare favorably with other methods for model interpretation. (See the sketch after this entry.)
Tasks Feature Importance
Published 2018-08-08
URL http://arxiv.org/abs/1808.02610v1
PDF http://arxiv.org/pdf/1808.02610v1.pdf
PWC https://paperswithcode.com/paper/l-shapley-and-c-shapley-efficient-model
Repo https://github.com/Jianbo-Lab/LCShapley
Framework tf
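
A minimal sketch of L-Shapley for sequence data, following the idea above: feature i's score averages its marginal contributions over subsets of its radius-k neighborhood only, so the cost is constant per feature instead of exponential in the total number of features. The toy value function stands in for a real model score.

```python
from itertools import combinations
from math import comb

def l_shapley(value, n_features, i, k=2):
    """L-Shapley score of feature i on a line graph: Shapley-weighted marginal
    contributions, restricted to the radius-k window around i.
    `value(S)` scores a feature subset S (e.g. model confidence on masked input)."""
    hood = [j for j in range(max(0, i - k), min(n_features, i + k + 1)) if j != i]
    m = len(hood) + 1                     # size of the neighborhood including i
    score = 0.0
    for size in range(len(hood) + 1):
        for S in combinations(hood, size):
            weight = 1.0 / (m * comb(m - 1, size))
            score += weight * (value(set(S) | {i}) - value(set(S)))
    return score

value = lambda S: len(S & {1, 2})         # toy "model": how many of features 1, 2 kept
print([round(l_shapley(value, 5, i), 3) for i in range(5)])  # [0.0, 1.0, 1.0, 0.0, 0.0]
```

When the model's dependence really is local, as in the graph-structured factorizations the paper assumes, the window truncation loses little; C-Shapley further restricts the sum to connected subsets.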

Learning Multi-scale Features for Foreground Segmentation

Title Learning Multi-scale Features for Foreground Segmentation
Authors Long Ang Lim, Hacer Yalim Keles
Abstract Foreground segmentation algorithms aim to segment moving objects from the background robustly under various challenging scenarios. Encoder-decoder deep neural networks used in this domain have recently achieved impressive segmentation results. In this work, we propose a novel, robust encoder-decoder neural network that can be trained end-to-end using only a few training examples. The proposed method extends the Feature Pooling Module (FPM) of FgSegNet by introducing feature fusions inside the module, which extracts multi-scale features within images; the result is feature pooling that is robust against camera motion and can alleviate the need for multi-scale inputs to the network. Our method outperforms all existing state-of-the-art methods on the CDnet2014 dataset, with an average overall F-measure of 0.9847. We also evaluate the effectiveness of our method on the SBI2015 and UCSD Background Subtraction datasets. The source code of the proposed method is available at https://github.com/lim-anggun/FgSegNet_v2 . (See the sketch after this entry.)
Tasks
Published 2018-08-04
URL http://arxiv.org/abs/1808.01477v1
PDF http://arxiv.org/pdf/1808.01477v1.pdf
PWC https://paperswithcode.com/paper/learning-multi-scale-features-for-foreground
Repo https://github.com/lim-anggun/FgSegNet_v2
Framework tf
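
A hedged sketch of the modified FPM idea: parallel dilated convolutions pool context at several scales from a single-scale input, with fusions feeding each branch the previous branch's output. Channel counts, dilation rates, and the fusion rule are illustrative, not FgSegNet_v2's exact configuration.

```python
import torch
import torch.nn as nn

class FeaturePoolingModule(nn.Module):
    """Multi-scale feature pooling via dilated branches with inter-branch fusion."""
    def __init__(self, ch):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4, 8))
        self.fuse = nn.Sequential(nn.Conv2d(4 * ch, ch, 1),
                                  nn.InstanceNorm2d(ch), nn.ReLU())

    def forward(self, x):
        outs, inp = [], x
        for conv in self.branches:
            out = torch.relu(conv(inp))
            outs.append(out)
            inp = x + out      # fusion: the next branch also sees this branch's output
        return self.fuse(torch.cat(outs, dim=1))

fpm = FeaturePoolingModule(64)
print(fpm(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])
```

Pooling scales inside the module is what removes the need to feed the network multiple resized copies of each frame.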