October 21, 2019

Paper Group AWR 134

Scale-aware multi-level guidance for interactive instance segmentation. Bidirectional Quaternion Long-Short Term Memory Recurrent Neural Networks for Speech Recognition. Learning to Summarize Radiology Findings. Deep Multimodal Image-Repurposing Detection. Reinforcement Learning for Relation Classification from Noisy Data. Security Analysis of Deep …

Scale-aware multi-level guidance for interactive instance segmentation

Title Scale-aware multi-level guidance for interactive instance segmentation
Authors Soumajit Majumder, Angela Yao
Abstract In interactive instance segmentation, users give feedback to iteratively refine segmentation masks. The user-provided clicks are transformed into guidance maps which provide the network with necessary cues on the whereabouts of the object of interest. Guidance maps used in current systems are purely distance-based and are either too localized or non-informative. We propose a novel transformation of user clicks to generate scale-aware guidance maps that leverage the hierarchical structural information present in an image. Using our guidance maps, even the most basic FCNs are able to outperform existing approaches that require state-of-the-art segmentation networks pre-trained on large-scale segmentation datasets. We demonstrate the effectiveness of our proposed transformation strategy through comprehensive experimentation in which we significantly raise the state of the art on four standard interactive segmentation benchmarks.
Tasks Instance Segmentation, Interactive Segmentation, Semantic Segmentation
Published 2018-12-07
URL http://arxiv.org/abs/1812.02967v1
PDF http://arxiv.org/pdf/1812.02967v1.pdf
PWC https://paperswithcode.com/paper/scale-aware-multi-level-guidance-for
Repo https://github.com/sm176357/mlg
Framework none
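
As a rough illustration of the "purely distance-based" guidance maps that the abstract above contrasts against, the following sketch builds a capped Euclidean distance map from a list of clicks. It shows the baseline idea only, not the scale-aware transformation the authors propose; the function name, the `(row, col)` click format, and the cap of 255 are assumptions made for this example.

```python
import math


def distance_guidance_map(height, width, clicks, cap=255.0):
    """Return a height x width grid of capped distances to the nearest click.

    `clicks` is a list of (row, col) positions (hypothetical input format).
    With no clicks, every pixel gets the cap value.
    """
    if not clicks:
        return [[cap] * width for _ in range(height)]
    return [
        [
            min(cap, min(math.hypot(r - cr, c - cc) for cr, cc in clicks))
            for c in range(width)
        ]
        for r in range(height)
    ]


# One positive click at row 1, column 2 on a 4 x 5 image.
guidance = distance_guidance_map(4, 5, clicks=[(1, 2)])
```

A map like this depends only on pixel distance to the click and ignores image structure, which is the limitation the abstract points to.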

Bidirectional Quaternion Long-Short Term Memory Recurrent Neural Networks for Speech Recognition

Title Bidirectional Quaternion Long-Short Term Memory Recurrent Neural Networks for Speech Recognition
Authors Titouan Parcollet, Mohamed Morchid, Georges Linarès, Renato De Mori
Abstract Recurrent neural networks (RNN) are at the core of modern automatic speech recognition (ASR) systems. In particular, long-short term memory (LSTM) recurrent neural networks have achieved state-of-the-art results in many speech recognition tasks, due to their efficient representation of long and short term dependencies in sequences of inter-dependent features. Nonetheless, internal dependencies within the elements composing multidimensional features are weakly considered by traditional real-valued representations. We propose a novel quaternion long-short term memory (QLSTM) recurrent neural network that takes into account both the external relations between the features composing a sequence, and these internal latent structural dependencies with the quaternion algebra. QLSTMs are compared to LSTMs during a memory copy-task and a realistic application of speech recognition on the Wall Street Journal (WSJ) dataset. QLSTM reaches better performance in both experiments with up to 2.8 times fewer learning parameters, leading to a more expressive representation of the information.
Tasks Speech Recognition
Published 2018-11-06
URL http://arxiv.org/abs/1811.02566v1
PDF http://arxiv.org/pdf/1811.02566v1.pdf
PWC https://paperswithcode.com/paper/bidirectional-quaternion-long-short-term
Repo https://github.com/mravanelli/pytorch-kaldi
Framework pytorch

Learning to Summarize Radiology Findings

Title Learning to Summarize Radiology Findings
Authors Yuhao Zhang, Daisy Yi Ding, Tianpei Qian, Christopher D. Manning, Curtis P. Langlotz
Abstract The Impression section of a radiology report summarizes crucial radiology findings in natural language and plays a central role in communicating these findings to physicians. However, the process of generating impressions by summarizing findings is time-consuming for radiologists and prone to errors. We propose to automate the generation of radiology impressions with neural sequence-to-sequence learning. We further propose a customized neural model for this task which learns to encode the study background information and use this information to guide the decoding process. On a large dataset of radiology reports collected from actual hospital studies, our model outperforms existing non-neural and neural baselines under the ROUGE metrics. In a blind experiment, a board-certified radiologist indicated that 67% of sampled system summaries are at least as good as the corresponding human-written summaries, suggesting significant clinical validity. To our knowledge our work represents the first attempt in this direction.
Tasks
Published 2018-09-12
URL http://arxiv.org/abs/1809.04698v2
PDF http://arxiv.org/pdf/1809.04698v2.pdf
PWC https://paperswithcode.com/paper/learning-to-summarize-radiology-findings
Repo https://github.com/abhishekr7/Summarization-of-Radiological-Reports
Framework none

Deep Multimodal Image-Repurposing Detection

Title Deep Multimodal Image-Repurposing Detection
Authors Ekraam Sabir, Wael AbdAlmageed, Yue Wu, Prem Natarajan
Abstract Nefarious actors on social media and other platforms often spread rumors and falsehoods through images whose metadata (e.g., captions) have been modified to provide visual substantiation of the rumor/falsehood. This type of modification is referred to as image repurposing, in which often an unmanipulated image is published along with incorrect or manipulated metadata to serve the actor’s ulterior motives. We present the Multimodal Entity Image Repurposing (MEIR) dataset, a substantially more challenging dataset than those previously available to support research into image repurposing detection. The new dataset includes location, person, and organization manipulations on real-world data sourced from Flickr. We also present a novel, end-to-end, deep multimodal learning model for assessing the integrity of an image by combining information extracted from the image with related information from a knowledge base. The proposed method is compared against state-of-the-art techniques on existing datasets as well as MEIR, where it outperforms existing methods across the board, with AUC improvement up to 0.23.
Tasks
Published 2018-08-20
URL http://arxiv.org/abs/1808.06686v1
PDF http://arxiv.org/pdf/1808.06686v1.pdf
PWC https://paperswithcode.com/paper/deep-multimodal-image-repurposing-detection
Repo https://github.com/Ekraam/MEIR
Framework tf

Reinforcement Learning for Relation Classification from Noisy Data

Title Reinforcement Learning for Relation Classification from Noisy Data
Authors Jun Feng, Minlie Huang, Li Zhao, Yang Yang, Xiaoyan Zhu
Abstract Existing relation classification methods that rely on distant supervision assume that a bag of sentences mentioning an entity pair are all describing a relation for the entity pair. Such methods, performing classification at the bag level, cannot identify the mapping between a relation and a sentence, and largely suffer from the noisy labeling problem. In this paper, we propose a novel model for relation classification at the sentence level from noisy data. The model has two modules: an instance selector and a relation classifier. The instance selector chooses high-quality sentences with reinforcement learning and feeds the selected sentences into the relation classifier, and the relation classifier makes sentence-level predictions and provides rewards to the instance selector. The two modules are trained jointly to optimize the instance selection and relation classification processes. Experimental results show that our model can deal with the noise of data effectively and obtains better performance for relation classification at the sentence level.
Tasks Relation Classification
Published 2018-08-24
URL http://arxiv.org/abs/1808.08013v1
PDF http://arxiv.org/pdf/1808.08013v1.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-for-relation
Repo https://github.com/unreliableXu/TensorFlow_RLRE
Framework tf

Security Analysis of Deep Neural Networks Operating in the Presence of Cache Side-Channel Attacks

Title Security Analysis of Deep Neural Networks Operating in the Presence of Cache Side-Channel Attacks
Authors Sanghyun Hong, Michael Davinroy, Yiǧitcan Kaya, Stuart Nevans Locke, Ian Rackow, Kevin Kulda, Dana Dachman-Soled, Tudor Dumitraş
Abstract Recent work has introduced attacks that extract the architecture information of deep neural networks (DNN), as this knowledge enhances an adversary’s capability to conduct black-box attacks against the model. This paper presents the first in-depth security analysis of DNN fingerprinting attacks that exploit cache side-channels. First, we define the threat model for these attacks: our adversary does not need the ability to query the victim model; instead, she runs a co-located process on the host machine where the victim’s deep learning (DL) system is running and passively monitors the accesses of the target functions in the shared framework. Second, we introduce DeepRecon, an attack that reconstructs the architecture of the victim network by using the internal information extracted via Flush+Reload, a cache side-channel technique. Once the attacker observes function invocations that map directly to architecture attributes of the victim network, the attacker can reconstruct the victim’s entire network architecture. In our evaluation, we demonstrate that an attacker can accurately reconstruct two complex networks (VGG19 and ResNet50) having observed only one forward propagation. Based on the extracted architecture attributes, we also demonstrate that an attacker can build a meta-model that accurately fingerprints the architecture and family of the pre-trained model in a transfer learning setting. From this meta-model, we evaluate the importance of the observed attributes in the fingerprinting process. Third, we propose and evaluate new framework-level defense techniques that obfuscate our attacker’s observations. Our empirical security analysis represents a step toward understanding the DNNs’ vulnerability to cache side-channel attacks.
Tasks Transfer Learning
Published 2018-10-08
URL https://arxiv.org/abs/1810.03487v4
PDF https://arxiv.org/pdf/1810.03487v4.pdf
PWC https://paperswithcode.com/paper/security-analysis-of-deep-neural-networks
Repo https://github.com/Sanghyun-Hong/DeepRecon
Framework tf

Dynamic Natural Language Processing with Recurrence Quantification Analysis

Title Dynamic Natural Language Processing with Recurrence Quantification Analysis
Authors Rick Dale, Nicholas D. Duran, Moreno Coco
Abstract Writing and reading are dynamic processes. As an author composes a text, a sequence of words is produced. This sequence is one that, the author hopes, causes a revisitation of certain thoughts and ideas in others. These processes of composition and revisitation by readers are ordered in time. This means that text itself can be investigated under the lens of dynamical systems. A common technique for analyzing the behavior of dynamical systems, known as recurrence quantification analysis (RQA), can be used as a method for analyzing sequential structure of text. RQA treats text as a sequential measurement, much like a time series, and can thus be seen as a kind of dynamic natural language processing (NLP). The extension has several benefits. First, because it is part of a suite of time series analysis tools, many measures can be extracted in one common framework. Second, the measures have a close relationship with some commonly used measures from natural language processing. Finally, using recurrence analysis offers an opportunity to expand analysis of text by developing theoretical descriptions derived from complex dynamic systems. We showcase an example analysis on 8,000 texts from the Gutenberg Project, compare it to well-known NLP approaches, and describe an R package (crqanlp) that can be used in conjunction with R library crqa.
Tasks Time Series, Time Series Analysis
Published 2018-03-19
URL http://arxiv.org/abs/1803.07136v1
PDF http://arxiv.org/pdf/1803.07136v1.pdf
PWC https://paperswithcode.com/paper/dynamic-natural-language-processing-with
Repo https://github.com/racdale/crqanlp
Framework none
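
To make the idea of treating text as a sequential measurement more concrete, here is a minimal sketch of one basic RQA measure, the recurrence rate, computed over a token sequence. It is written in plain Python for illustration and does not reproduce the crqanlp or crqa interfaces; the function name and the choice to exclude the main diagonal are assumptions of this example.

```python
def recurrence_rate(tokens):
    """Fraction of off-diagonal cells (i, j) in the recurrence plot
    where token i and token j are the same word."""
    n = len(tokens)
    if n < 2:
        return 0.0
    recurrent = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and tokens[i] == tokens[j]
    )
    return recurrent / (n * n - n)


words = "the cat saw the dog and the dog saw the cat".split()
rate = recurrence_rate(words)
```

A higher rate means the text revisits the same words more often; other RQA measures build on the same recurrence plot by looking at diagonal and vertical line structures.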

Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors

Title Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors
Authors Xuanyi Dong, Shoou-I Yu, Xinshuo Weng, Shih-En Wei, Yi Yang, Yaser Sheikh
Abstract In this paper, we present supervision-by-registration, an unsupervised approach to improve the precision of facial landmark detectors on both images and video. Our key observation is that the detections of the same landmark in adjacent frames should be coherent with registration, i.e., optical flow. Interestingly, the coherency of optical flow is a source of supervision that does not require manual labeling, and can be leveraged during detector training. For example, we can enforce in the training loss function that a detected landmark at frame$_{t-1}$ followed by optical flow tracking from frame$_{t-1}$ to frame$_t$ should coincide with the location of the detection at frame$_t$. Essentially, supervision-by-registration augments the training loss function with a registration loss, thus training the detector to have output that is not only close to the annotations in labeled images, but also consistent with registration on large amounts of unlabeled videos. End-to-end training with the registration loss is made possible by a differentiable Lucas-Kanade operation, which computes optical flow registration in the forward pass, and back-propagates gradients that encourage temporal coherency in the detector. The output of our method is a more precise image-based facial landmark detector, which can be applied to single images or video. With supervision-by-registration, we demonstrate (1) improvements in facial landmark detection on both images (300W, AFLW) and video (300VW, YouTube-Celebrities), and (2) significant reduction of jittering in video detections.
Tasks Facial Landmark Detection, Optical Flow Estimation
Published 2018-07-03
URL http://arxiv.org/abs/1807.00966v2
PDF http://arxiv.org/pdf/1807.00966v2.pdf
PWC https://paperswithcode.com/paper/supervision-by-registration-an-unsupervised
Repo https://github.com/facebookresearch/supervision-by-registration
Framework pytorch
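
The registration constraint described in the abstract above (a landmark detected at frame t-1, moved by optical flow, should coincide with the detection at frame t) can be sketched as a simple distance. This is an illustrative simplification: the flow is passed in as a precomputed per-landmark displacement, whereas the paper uses a differentiable Lucas-Kanade operation, and the function name and mean-Euclidean form are assumptions of this example rather than the authors' exact loss.

```python
import math


def registration_loss(prev_landmarks, flow, curr_landmarks):
    """Mean Euclidean distance between flow-tracked landmarks from
    frame t-1 and the landmarks detected at frame t.

    Each argument is a list of (x, y) pairs of equal length; `flow`
    holds the hypothetical displacement of each landmark from t-1 to t.
    """
    total = 0.0
    for (x, y), (dx, dy), (cx, cy) in zip(prev_landmarks, flow, curr_landmarks):
        total += math.hypot(x + dx - cx, y + dy - cy)
    return total / len(prev_landmarks)


prev = [(10.0, 10.0), (20.0, 5.0)]
flow = [(1.0, 0.0), (0.0, 2.0)]
curr = [(11.0, 10.0), (23.0, 11.0)]
loss = registration_loss(prev, flow, curr)
```

No landmark annotations appear in this computation, which is why the term can be evaluated on unlabeled video.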

Model Agnostic Supervised Local Explanations

Title Model Agnostic Supervised Local Explanations
Authors Gregory Plumb, Denali Molitor, Ameet Talwalkar
Abstract Model interpretability is an increasingly important component of practical machine learning. Some of the most common forms of interpretability systems are example-based, local, and global explanations. One of the main challenges in interpretability is designing explanation systems that can capture aspects of each of these explanation types, in order to develop a more thorough understanding of the model. We address this challenge in a novel model called MAPLE that uses local linear modeling techniques along with a dual interpretation of random forests (both as a supervised neighborhood approach and as a feature selection method). MAPLE has two fundamental advantages over existing interpretability systems. First, while it is effective as a black-box explanation system, MAPLE itself is a highly accurate predictive model that provides faithful self explanations, and thus sidesteps the typical accuracy-interpretability trade-off. Specifically, we demonstrate, on several UCI datasets, that MAPLE is at least as accurate as random forests and that it produces more faithful local explanations than LIME, a popular interpretability system. Second, MAPLE provides both example-based and local explanations and can detect global patterns, which allows it to diagnose limitations in its local explanations.
Tasks Feature Selection
Published 2018-07-09
URL http://arxiv.org/abs/1807.02910v3
PDF http://arxiv.org/pdf/1807.02910v3.pdf
PWC https://paperswithcode.com/paper/model-agnostic-supervised-local-explanations
Repo https://github.com/GDPlumb/MAPLE
Framework none

ECG Heartbeat Classification: A Deep Transferable Representation

Title ECG Heartbeat Classification: A Deep Transferable Representation
Authors Mohammad Kachuee, Shayan Fazeli, Majid Sarrafzadeh
Abstract Electrocardiogram (ECG) can be reliably used as a measure to monitor the functionality of the cardiovascular system. Recently, there has been great attention towards accurate categorization of heartbeats. While there are many commonalities between different ECG conditions, the focus of most studies has been classifying a set of conditions on a dataset annotated for that task rather than learning and employing a transferable knowledge between different tasks. In this paper, we propose a method based on deep convolutional neural networks for the classification of heartbeats which is able to accurately classify five different arrhythmias in accordance with the AAMI EC57 standard. Furthermore, we suggest a method for transferring the knowledge acquired on this task to the myocardial infarction (MI) classification task. We evaluated the proposed method on PhysioNet’s MIT-BIH and PTB Diagnostics datasets. According to the results, the suggested method is able to make predictions with the average accuracies of 93.4% and 95.9% on arrhythmia classification and MI classification, respectively.
Tasks Arrhythmia Detection, Electrocardiography (ECG), Heartbeat Classification, Myocardial infarction detection
Published 2018-04-19
URL http://arxiv.org/abs/1805.00794v2
PDF http://arxiv.org/pdf/1805.00794v2.pdf
PWC https://paperswithcode.com/paper/ecg-heartbeat-classification-a-deep
Repo https://github.com/ljleeworking/4-Heartbeat-Categorization-from-ECG-Signal
Framework none

Cost-Sensitive Robustness against Adversarial Examples

Title Cost-Sensitive Robustness against Adversarial Examples
Authors Xiao Zhang, David Evans
Abstract Several recent works have developed methods for training classifiers that are certifiably robust against norm-bounded adversarial perturbations. These methods assume that all the adversarial transformations are equally important, which is seldom the case in real-world applications. We advocate for cost-sensitive robustness as the criterion for measuring the classifier’s performance for tasks where some adversarial transformations are more important than others. We encode the potential harm of each adversarial transformation in a cost matrix, and propose a general objective function to adapt the robust training method of Wong & Kolter (2018) to optimize for cost-sensitive robustness. Our experiments on simple MNIST and CIFAR10 models with a variety of cost matrices show that the proposed approach can produce models with substantially reduced cost-sensitive robust error, while maintaining classification accuracy.
Tasks
Published 2018-10-22
URL http://arxiv.org/abs/1810.09225v2
PDF http://arxiv.org/pdf/1810.09225v2.pdf
PWC https://paperswithcode.com/paper/cost-sensitive-robustness-against-adversarial
Repo https://github.com/xiaozhanguva/Cost-Sensitive-Robustness
Framework pytorch
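
The cost matrix mentioned in the abstract above can be illustrated with a small sketch: entry `cost[i][j]` encodes the harm of an adversary turning an input of true class `i` into a prediction of class `j`. The scoring rule below (average worst-case cost over the classes an adversary can reach) is a simplified stand-in chosen for this example, not the paper's objective, and the `reachable` sets are a hypothetical output of some robustness certifier.

```python
def average_worst_case_cost(examples, cost):
    """Average, over examples, of the highest cost among target classes
    the adversary can reach. An example with no reachable wrong class
    contributes zero.

    `examples` is a list of (true_label, reachable_classes) pairs.
    """
    total = 0.0
    for true_label, reachable in examples:
        total += max(
            (cost[true_label][t] for t in reachable if t != true_label),
            default=0.0,
        )
    return total / len(examples)


# Three classes; misclassifying class 1 as class 0 is treated as most harmful.
cost = [
    [0.0, 1.0, 0.0],
    [10.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
examples = [(0, {1, 2}), (1, {0}), (2, {0, 1}), (1, set())]
avg_cost = average_worst_case_cost(examples, cost)
```

With a uniform cost matrix every reachable misclassification counts the same, which corresponds to the equal-importance assumption the abstract argues against.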

ResNets Ensemble via the Feynman-Kac Formalism to Improve Natural and Robust Accuracies

Title ResNets Ensemble via the Feynman-Kac Formalism to Improve Natural and Robust Accuracies
Authors Bao Wang, Binjie Yuan, Zuoqiang Shi, Stanley J. Osher
Abstract Empirical adversarial risk minimization (EARM) is a widely used mathematical framework to robustly train deep neural nets (DNNs) that are resistant to adversarial attacks. However, both natural and robust accuracies, in classifying clean and adversarial images, respectively, of the trained robust models are far from satisfactory. In this work, we unify the theory of optimal control of transport equations with the practice of training and testing of ResNets. Based on this unified viewpoint, we propose a simple yet effective ResNets ensemble algorithm to boost the accuracy of the robustly trained model on both clean and adversarial images. The proposed algorithm consists of two components: First, we modify the base ResNets by injecting a variance specified Gaussian noise to the output of each residual mapping. Second, we average over the production of multiple jointly trained modified ResNets to get the final prediction. These two steps give an approximation to the Feynman-Kac formula for representing the solution of a transport equation with viscosity, or a convection-diffusion equation. For the CIFAR10 benchmark, this simple algorithm leads to a robust model with a natural accuracy of 85.62% on clean images and a robust accuracy of 57.94% under the 20 iterations of the IFGSM attack, which outperforms the current state-of-the-art in defending against IFGSM attack on the CIFAR10. Both natural and robust accuracies of the proposed ResNets ensemble can be improved dynamically as the building block ResNet advances. The code is available at: https://github.com/BaoWangMath/EnResNet.
Tasks Adversarial Attack, Adversarial Defense
Published 2018-11-26
URL https://arxiv.org/abs/1811.10745v2
PDF https://arxiv.org/pdf/1811.10745v2.pdf
PWC https://paperswithcode.com/paper/enresnet-resnet-ensemble-via-the-feynman-kac
Repo https://github.com/BaoWangMath/Graph-Structured-Recurrent-Neural-Nets-
Framework tf
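
The two components named in the abstract above (Gaussian noise added to the output of each residual mapping, then averaging over several modified networks) can be sketched on plain Python lists. This is a toy illustration under stated assumptions: each "network" is a single residual step, the residual functions are hypothetical stand-ins for trained blocks, and the helper names do not come from the authors' code.

```python
import random


def noisy_residual_step(x, residual_fn, sigma, rng):
    """Compute x + F(x) + noise, with noise drawn from N(0, sigma^2)
    independently for each coordinate."""
    return [
        xi + fi + rng.gauss(0.0, sigma)
        for xi, fi in zip(x, residual_fn(x))
    ]


def ensemble_predict(x, members, sigma, rng):
    """Average the outputs of several noise-injected residual steps."""
    outputs = [noisy_residual_step(x, f, sigma, rng) for f in members]
    return [sum(vals) / len(outputs) for vals in zip(*outputs)]


# Two hypothetical residual mappings standing in for trained blocks.
members = [
    lambda x: [0.5 * v for v in x],
    lambda x: [-0.5 * v for v in x],
]
rng = random.Random(0)
noiseless = ensemble_predict([1.0, 2.0], members, sigma=0.0, rng=rng)
noisy = ensemble_predict([1.0, 2.0], members, sigma=0.1, rng=rng)
```

Setting `sigma` to zero recovers a plain average of ordinary residual steps, which makes the effect of the injected noise easy to isolate.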

Modeling Visual Context is Key to Augmenting Object Detection Datasets

Title Modeling Visual Context is Key to Augmenting Object Detection Datasets
Authors Nikita Dvornik, Julien Mairal, Cordelia Schmid
Abstract Performing data augmentation for learning deep neural networks is well known to be important for training visual recognition systems. By artificially increasing the number of training examples, it helps reduce overfitting and improves generalization. For object detection, classical approaches for data augmentation consist of generating images obtained by basic geometrical transformations and color changes of original training images. In this work, we go one step further and leverage segmentation annotations to increase the number of object instances present on training data. For this approach to be successful, we show that appropriately modeling the visual context surrounding objects is crucial to place them in the right environment. Otherwise, we show that the previous strategy actually hurts. With our context model, we achieve significant mean average precision improvements when few labeled examples are available on the VOC’12 benchmark.
Tasks Data Augmentation, Object Detection
Published 2018-07-19
URL http://arxiv.org/abs/1807.07428v1
PDF http://arxiv.org/pdf/1807.07428v1.pdf
PWC https://paperswithcode.com/paper/modeling-visual-context-is-key-to-augmenting
Repo https://github.com/dvornikita/context_aug
Framework none

Learning Hierarchical Semantic Image Manipulation through Structured Representations

Title Learning Hierarchical Semantic Image Manipulation through Structured Representations
Authors Seunghoon Hong, Xinchen Yan, Thomas Huang, Honglak Lee
Abstract Understanding, reasoning, and manipulating semantic concepts of images has been a fundamental research problem for decades. Previous work mainly focused on direct manipulation on natural image manifold through color strokes, key-points, textures, and holes-to-fill. In this work, we present a novel hierarchical framework for semantic image manipulation. Key to our hierarchical framework is that we employ a structured semantic layout as our intermediate representation for manipulation. Initialized with coarse-level bounding boxes, our structure generator first creates pixel-wise semantic layout capturing the object shape, object-object interactions, and object-scene relations. Then our image generator fills in the pixel-level textures guided by the semantic layout. Such a framework allows a user to manipulate images at object-level by adding, removing, and moving one bounding box at a time. Experimental evaluations demonstrate the advantages of the hierarchical manipulation framework over existing image generation and context hole-filling models, both qualitatively and quantitatively. Benefits of the hierarchical framework are further demonstrated in applications such as semantic object manipulation, interactive image editing, and data-driven image manipulation.
Tasks Image Generation
Published 2018-08-22
URL http://arxiv.org/abs/1808.07535v2
PDF http://arxiv.org/pdf/1808.07535v2.pdf
PWC https://paperswithcode.com/paper/learning-hierarchical-semantic-image
Repo https://github.com/xcyan/neurips18_hierchical_image_manipulation
Framework pytorch

Compressing the Input for CNNs with the First-Order Scattering Transform

Title Compressing the Input for CNNs with the First-Order Scattering Transform
Authors Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko, Michal Valko
Abstract We study the first-order scattering transform as a candidate for reducing the signal processed by a convolutional neural network (CNN). We show theoretical and empirical evidence that in the case of natural images and sufficiently small translation invariance, this transform preserves most of the signal information needed for classification while substantially reducing the spatial resolution and total signal size. We demonstrate that cascading a CNN with this representation performs on par with ImageNet classification models, commonly used in downstream tasks, such as the ResNet-50. We subsequently apply our trained hybrid ImageNet model as a base model on a detection system, which has typically larger image inputs. On Pascal VOC and COCO detection tasks we demonstrate improvements in the inference speed and training memory consumption compared to models trained directly on the input image.
Tasks
Published 2018-09-27
URL http://arxiv.org/abs/1809.10200v1
PDF http://arxiv.org/pdf/1809.10200v1.pdf
PWC https://paperswithcode.com/paper/compressing-the-input-for-cnns-with-the-first
Repo https://github.com/edouardoyallon/pyscatlight
Framework pytorch