October 20, 2019

3171 words 15 mins read

Paper Group AWR 292

Stronger Data Poisoning Attacks Break Data Sanitization Defenses

Title Stronger Data Poisoning Attacks Break Data Sanitization Defenses
Authors Pang Wei Koh, Jacob Steinhardt, Percy Liang
Abstract Machine learning models trained on data from the outside world can be corrupted by data poisoning attacks that inject malicious points into the models’ training sets. A common defense against these attacks is data sanitization: filtering out anomalous training points before training the model. Can data poisoning attacks break data sanitization defenses? In this paper, we develop three new attacks that can all bypass a broad range of data sanitization defenses, including commonly-used anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition. For example, our attacks successfully increase the test error on the Enron spam detection dataset from 3% to 24% and on the IMDB sentiment classification dataset from 12% to 29% by adding just 3% poisoned data. In contrast, many existing attacks from the literature do not explicitly consider defenses, and we show that those attacks are ineffective in the presence of the defenses we consider. Our attacks are based on two ideas: (i) we coordinate our attacks to place poisoned points near one another, which fools some anomaly detectors, and (ii) we formulate each attack as a constrained optimization problem, with constraints designed to ensure that the poisoned points evade detection. While this optimization involves solving an expensive bilevel problem, we explore and develop three efficient approximations to this problem based on influence functions, minimax duality, and the Karush-Kuhn-Tucker (KKT) conditions. Our results underscore the urgent need to develop more sophisticated and robust defenses against data poisoning attacks.
Tasks Data Poisoning, Sentiment Analysis
Published 2018-11-02
URL http://arxiv.org/abs/1811.00741v1
PDF http://arxiv.org/pdf/1811.00741v1.pdf
PWC https://paperswithcode.com/paper/stronger-data-poisoning-attacks-break-data
Repo https://github.com/kohpangwei/data-poisoning-journal-release
Framework tf
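
Below is a minimal sketch of the kind of sanitization defense the paper studies: filter out training points that sit far from their class centroid before fitting the model. It is a generic illustration of the defense family, not the authors' released code (which is in the repo above).

```python
import numpy as np

def sanitize_by_centroid_distance(X, y, quantile=0.95):
    """Keep only points within the per-class `quantile` of centroid distance."""
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        centroid = X[idx].mean(axis=0)
        dists = np.linalg.norm(X[idx] - centroid, axis=1)
        keep[idx] = dists <= np.quantile(dists, quantile)
    return X[keep], y[keep]

# The paper's key observation: an attacker who concentrates poisoned points
# near one another, just inside such a threshold, can evade this filter
# while still shifting the learned decision boundary.
```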

Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network

Title Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network
Authors Namhyuk Ahn, Byungkon Kang, Kyung-Ah Sohn
Abstract In recent years, deep learning methods have been successfully applied to single-image super-resolution tasks. Despite their strong performance, these methods are difficult to apply in real-world settings because of their heavy computational requirements. In this paper, we address this issue by proposing an accurate and lightweight deep network for image super-resolution. In detail, we design an architecture that implements a cascading mechanism upon a residual network. We also present variant models of the proposed cascading residual network to further improve efficiency. Our extensive experiments show that, even with far fewer parameters and operations, our models achieve performance comparable to that of state-of-the-art methods.
Tasks Image Super-Resolution, Super-Resolution
Published 2018-03-23
URL http://arxiv.org/abs/1803.08664v5
PDF http://arxiv.org/pdf/1803.08664v5.pdf
PWC https://paperswithcode.com/paper/fast-accurate-and-lightweight-super-1
Repo https://github.com/godpgf/scarn
Framework pytorch
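
As a rough PyTorch sketch of the cascading mechanism described above, each residual block's output can be concatenated with all earlier features and compressed back to the working width by a 1x1 convolution. Layer sizes and block counts here are illustrative assumptions, not the exact CARN configuration.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return torch.relu(self.body(x) + x)

class CascadingBlock(nn.Module):
    def __init__(self, ch, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(ResidualBlock(ch) for _ in range(n_blocks))
        # 1x1 convs fuse the growing concatenation back to `ch` channels
        self.fuse = nn.ModuleList(
            nn.Conv2d(ch * (i + 2), ch, 1) for i in range(n_blocks))

    def forward(self, x):
        feats = [x]
        out = x
        for block, fuse in zip(self.blocks, self.fuse):
            feats.append(block(out))            # cascade: keep every output
            out = fuse(torch.cat(feats, dim=1)) # compress with a 1x1 conv
        return out
```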

AI Blue Book: Vehicle Price Prediction using Visual Features

Title AI Blue Book: Vehicle Price Prediction using Visual Features
Authors Richard R. Yang, Steven Chen, Edward Chou
Abstract In this work, we build a series of machine learning models to predict the price of a product given its image, and visualize the features that result in higher or lower price predictions. We collect two novel datasets of product images and their MSRP prices for this purpose: a bicycle dataset and a car dataset. We set baselines for price regression using linear regression on histogram of oriented gradients (HOG) and convolutional neural network (CNN) features, and a baseline for price segment classification using a multiclass SVM. For our main models, we train several deep CNNs using both transfer learning and our own architectures, for both regression and classification. We achieve strong results on both datasets, with deep CNNs significantly outperforming other models in a variety of metrics. Finally, we use several recently-developed methods to visualize the image features that result in higher or lower prices.
Tasks Transfer Learning
Published 2018-03-29
URL http://arxiv.org/abs/1803.11227v2
PDF http://arxiv.org/pdf/1803.11227v2.pdf
PWC https://paperswithcode.com/paper/ai-blue-book-vehicle-price-prediction-using
Repo https://github.com/richardyang/AI-blue-book
Framework none
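
A minimal sketch of the HOG-plus-linear-regression baseline described in the abstract might look as follows. The library choices (scikit-image 0.19+ for `channel_axis`, scikit-learn) and the resize/HOG parameters are our assumptions, not taken from the authors' repo.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.linear_model import LinearRegression

def hog_features(images, size=(128, 128)):
    """Resize each HxWx3 image and extract a HOG descriptor."""
    return np.stack([
        hog(resize(img, size), pixels_per_cell=(16, 16), channel_axis=-1)
        for img in images])

# images: list of HxWx3 arrays; prices: array of MSRPs
# X = hog_features(train_images)
# model = LinearRegression().fit(X, train_prices)
# predictions = model.predict(hog_features(test_images))
```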

Generating Diverse and Meaningful Captions

Title Generating Diverse and Meaningful Captions
Authors Annika Lindh, Robert J. Ross, Abhijit Mahalunkar, Giancarlo Salton, John D. Kelleher
Abstract Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty. We make our source code publicly available online.
Tasks Image Captioning, Image Retrieval
Published 2018-12-19
URL http://arxiv.org/abs/1812.08126v1
PDF http://arxiv.org/pdf/1812.08126v1.pdf
PWC https://paperswithcode.com/paper/generating-diverse-and-meaningful-captions
Repo https://github.com/AnnikaLindh/Diverse_and_Specific_Image_Captioning
Framework pytorch

DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map

Title DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map
Authors Peng Wang, Ruigang Yang, Binbin Cao, Wei Xu, Yuanqing Lin
Abstract For applications such as autonomous driving, self-localization/camera pose estimation and scene parsing are crucial technologies. In this paper, we propose a unified framework to tackle these two problems simultaneously. The uniqueness of our design is a sensor fusion scheme which integrates camera videos, motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robustness and efficiency of the system. Specifically, we first obtain an initial coarse camera pose from consumer-grade GPS/IMU, based on which a label map can be rendered from the 3D semantic map. Then, the rendered label map and the RGB image are jointly fed into a pose CNN, yielding a corrected camera pose. In addition, to incorporate temporal information, a multi-layer recurrent neural network (RNN) is further deployed to improve the pose accuracy. Finally, based on the pose from the RNN, we render a new label map, which is fed together with the RGB image into a segment CNN that produces per-pixel semantic labels. In order to validate our approach, we build a dataset with registered 3D point clouds and video camera images. Both the point clouds and the images are semantically labeled. Each video frame has a ground-truth pose from highly accurate motion sensors. We show that, in practice, pose estimation relying solely on images, as in PoseNet, may fail due to street-view confusion, and that it is important to fuse multiple sensors. Finally, various ablation studies are performed, which demonstrate the effectiveness of the proposed system. In particular, we show that scene parsing and pose estimation are mutually beneficial in achieving a more robust and accurate system.
Tasks Autonomous Driving, Pose Estimation, Scene Parsing, Sensor Fusion
Published 2018-05-13
URL http://arxiv.org/abs/1805.04949v1
PDF http://arxiv.org/pdf/1805.04949v1.pdf
PWC https://paperswithcode.com/paper/dels-3d-deep-localization-and-segmentation
Repo https://github.com/pengwangucla/DeLS-3D
Framework tf
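
The pipeline in the abstract can be summarized as one per-frame step. Every callable in this sketch (renderer, pose CNN, pose RNN, segment CNN) is a placeholder for a real component of the authors' system, so the sketch only shows the data flow.

```python
def dels3d_step(rgb, coarse_pose, semantic_map, rnn_state,
                render, pose_cnn, pose_rnn, seg_cnn):
    """One per-frame pass; arguments after `rnn_state` are components."""
    label_map = render(semantic_map, coarse_pose)         # from GPS/IMU pose
    corrected = pose_cnn(rgb, label_map)                  # per-frame correction
    refined, rnn_state = pose_rnn(corrected, rnn_state)   # temporal smoothing
    labels = seg_cnn(rgb, render(semantic_map, refined))  # per-pixel labels
    return refined, labels, rnn_state
```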

Counterexample-Guided Data Augmentation

Title Counterexample-Guided Data Augmentation
Authors Tommaso Dreossi, Shromona Ghosh, Xiangyu Yue, Kurt Keutzer, Alberto Sangiovanni-Vincentelli, Sanjit A. Seshia
Abstract We present a novel framework for augmenting data sets for machine learning based on counterexamples. Counterexamples are misclassified examples that have important properties for retraining and improving the model. Key components of our framework include a counterexample generator, which produces data items that are misclassified by the model, and error tables, a novel data structure that stores information pertaining to misclassifications. Error tables can be used to explain the model’s vulnerabilities and are used to efficiently generate counterexamples for augmentation. We show the efficacy of the proposed framework by comparing it to classical augmentation techniques on a case study of object detection in autonomous driving based on deep neural networks.
Tasks Autonomous Driving, Data Augmentation, Object Detection
Published 2018-05-17
URL http://arxiv.org/abs/1805.06962v1
PDF http://arxiv.org/pdf/1805.06962v1.pdf
PWC https://paperswithcode.com/paper/counterexample-guided-data-augmentation
Repo https://github.com/BerkeleyLearnVerify/VerifAI
Framework tf
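
A high-level sketch of the counterexample-guided loop implied by the abstract is given below. All interfaces (`fit`, `predict`, `sample_candidates`, the `Candidate` record) are hypothetical stand-ins; the actual implementation lives in the VerifAI repo linked above.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    image: object   # a rendered or modified input
    label: int      # its ground-truth class
    params: dict    # modification parameters (pose, lighting, ...)

def augment_loop(fit, predict, sample_candidates, train_set, rounds=5):
    """fit(data) -> model; predict(model, image) -> int;
    sample_candidates() -> list of Candidate."""
    error_table = []   # stores the parameters of each misclassification
    model = fit(train_set)
    for _ in range(rounds):
        counterexamples = [c for c in sample_candidates()
                           if predict(model, c.image) != c.label]
        error_table.extend(c.params for c in counterexamples)
        train_set = train_set + counterexamples   # augment, then retrain
        model = fit(train_set)
    return model, error_table
```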

Fully Convolutional Network Ensembles for White Matter Hyperintensities Segmentation in MR Images

Title Fully Convolutional Network Ensembles for White Matter Hyperintensities Segmentation in MR Images
Authors Hongwei Li, Gongfa Jiang, Jianguo Zhang, Ruixuan Wang, Zhaolei Wang, Wei-Shi Zheng, Bjoern Menze
Abstract White matter hyperintensities (WMH) are commonly found in the brains of healthy elderly individuals and have been associated with various neurological and geriatric disorders. In this paper, we present a study using deep fully convolutional networks and ensemble models to automatically detect such WMH using fluid-attenuated inversion recovery (FLAIR) and T1 magnetic resonance (MR) scans. The algorithm was evaluated and ranked 1st in the WMH Segmentation Challenge at MICCAI 2017. In the evaluation stage, the implementation of the algorithm was submitted to the challenge organizers, who then independently tested it on a hidden set of 110 cases from 5 scanners. The averaged Dice score, precision, and robust Hausdorff distance obtained on the held-out test datasets were 80%, 84%, and 6.30 mm, respectively. These were the highest achieved in the challenge, suggesting the proposed method is the state of the art. In this paper, we provide detailed descriptions and quantitative analysis of key components of the system. Furthermore, a cross-scanner evaluation study is presented to discuss how the combination of modalities and data augmentation affects the generalization capability of the system. The adaptability of the system to different scanners and protocols is also investigated. A quantitative study is further presented to test the effect of ensemble size. Additionally, the software and models of our method are made publicly available. The effectiveness and generalization capability of the proposed system show its potential for real-world clinical practice.
Tasks Data Augmentation
Published 2018-02-14
URL http://arxiv.org/abs/1802.05203v3
PDF http://arxiv.org/pdf/1802.05203v3.pdf
PWC https://paperswithcode.com/paper/fully-convolutional-network-ensembles-for
Repo https://github.com/labhstats/lbhs_wmh_seg_manuals
Framework tf
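
The ensembling step itself is simple to sketch: average the probability maps produced by several independently trained FCNs and threshold the mean. Model loading is omitted here; each model is assumed to be a callable mapping a (FLAIR, T1) pair to a per-voxel probability map.

```python
import numpy as np

def ensemble_predict(models, flair, t1, threshold=0.5):
    """models: callables mapping a (FLAIR, T1) pair to a probability map."""
    prob_maps = [m(flair, t1) for m in models]
    mean_prob = np.mean(prob_maps, axis=0)   # average across the ensemble
    return mean_prob > threshold             # binary WMH mask
```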

A Dataset and Architecture for Visual Reasoning with a Working Memory

Title A Dataset and Architecture for Visual Reasoning with a Working Memory
Authors Guangyu Robert Yang, Igor Ganichev, Xiao-Jing Wang, Jonathon Shlens, David Sussillo
Abstract A vexing problem in artificial intelligence is reasoning about events that occur in complex, changing visual stimuli such as in video analysis or game play. Inspired by a rich tradition of visual reasoning and memory in cognitive psychology and neuroscience, we developed an artificial, configurable visual question-and-answer dataset (COG) to parallel experiments in humans and animals. COG is much simpler than the general problem of video analysis, yet it addresses many of the problems relating to visual and logical reasoning and memory – problems that remain challenging for modern deep learning architectures. We additionally propose a deep learning architecture that performs competitively on other diagnostic VQA datasets (e.g., CLEVR) as well as easy settings of the COG dataset. However, several settings of COG result in datasets that are progressively more challenging to learn. After training, the network can zero-shot generalize to many new tasks. Preliminary analyses of the network architectures trained on COG demonstrate that the network accomplishes the task in a manner interpretable to humans.
Tasks Visual Question Answering, Visual Reasoning
Published 2018-03-16
URL http://arxiv.org/abs/1803.06092v2
PDF http://arxiv.org/pdf/1803.06092v2.pdf
PWC https://paperswithcode.com/paper/a-dataset-and-architecture-for-visual
Repo https://github.com/google/cog
Framework tf

Multimodal Unsupervised Image-to-Image Translation

Title Multimodal Unsupervised Image-to-Image Translation
Authors Xun Huang, Ming-Yu Liu, Serge Belongie, Jan Kautz
Abstract Unsupervised image-to-image translation is an important and challenging problem in computer vision. Given an image in the source domain, the goal is to learn the conditional distribution of corresponding images in the target domain, without seeing any pairs of corresponding images. While this conditional distribution is inherently multimodal, existing approaches make an overly simplified assumption, modeling it as a deterministic one-to-one mapping. As a result, they fail to generate diverse outputs from a given source domain image. To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. We assume that the image representation can be decomposed into a content code that is domain-invariant, and a style code that captures domain-specific properties. To translate an image to another domain, we recombine its content code with a random style code sampled from the style space of the target domain. We analyze the proposed framework and establish several theoretical results. Extensive experiments with comparisons to state-of-the-art approaches further demonstrate the advantage of the proposed framework. Moreover, our framework allows users to control the style of translation outputs by providing an example style image. Code and pretrained models are available at https://github.com/nvlabs/MUNIT
Tasks Image-to-Image Translation, Multimodal Unsupervised Image-To-Image Translation, Unsupervised Image-To-Image Translation
Published 2018-04-12
URL http://arxiv.org/abs/1804.04732v2
PDF http://arxiv.org/pdf/1804.04732v2.pdf
PWC https://paperswithcode.com/paper/multimodal-unsupervised-image-to-image
Repo https://github.com/taki0112/MUNIT-Tensorflow
Framework tf
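
In code, the translation step described in the abstract amounts to recombining a content code with a randomly sampled style code. The sketch below is schematic: the encoder and decoder internals, and the style dimension of 8, are assumptions rather than details taken from the MUNIT implementation.

```python
import torch

def translate_a_to_b(content_encoder_a, decoder_b, image_a, style_dim=8):
    content = content_encoder_a(image_a)              # domain-invariant code
    style = torch.randn(image_a.size(0), style_dim)   # random target style
    return decoder_b(content, style)                  # output in domain B
```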

Compressed Sensing of Scanning Transmission Electron Microscopy (STEM) on Non-Rectangular Scans

Title Compressed Sensing of Scanning Transmission Electron Microscopy (STEM) on Non-Rectangular Scans
Authors Xin Li, Ondrej Dyck, Sergei V. Kalinin, Stephen Jesse
Abstract Scanning Transmission Electron Microscopy (STEM) has become the mainstay of atomic-level materials characterization, with applications ranging from the visualization of localized and extended defects to the mapping of order-parameter fields. In the last several years, attention has been drawn to the potential of STEM to explore beam-induced chemical processes and, especially, to manipulate atomic motion, enabling atom-by-atom fabrication. These applications, as well as traditional imaging of beam-sensitive materials, necessitate increasing the dynamic range of STEM between imaging and manipulation modes and increasing absolute scanning/imaging speeds, which can be achieved by combining sparse sensing methods with non-rectangular scanning trajectories. Here we develop a general method for real-time reconstruction of sparsely sampled images from high-speed, non-invasive, and diverse scanning pathways. This approach is demonstrated both on synthetic data, where the ground truth is known, and on experimental STEM data. This work lays the foundation for future tasks such as the optimal design of dose-efficient scanning strategies and real-time adaptive inference and control of e-beam-induced atomic fabrication.
Tasks
Published 2018-05-13
URL http://arxiv.org/abs/1805.04957v3
PDF http://arxiv.org/pdf/1805.04957v3.pdf
PWC https://paperswithcode.com/paper/compressed-sensing-of-scanning-transmission
Repo https://github.com/nonmin/RTSSTEM
Framework none
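
As a point of reference for what reconstruction from non-rectangular scans involves, a naive baseline simply interpolates the measured points onto the full pixel grid. The paper's real-time method is more sophisticated than this SciPy-based sketch.

```python
import numpy as np
from scipy.interpolate import griddata

def reconstruct(points, values, shape):
    """points: (N, 2) (row, col) scan coordinates; values: (N,) intensities."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    # Cubic interpolation onto the dense grid; unreachable pixels become 0.
    return griddata(points, values, (rows, cols),
                    method='cubic', fill_value=0.0)
```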

Mapping Natural Language Commands to Web Elements

Title Mapping Natural Language Commands to Web Elements
Authors Panupong Pasupat, Tian-Shun Jiang, Evan Zheran Liu, Kelvin Guu, Percy Liang
Abstract The web provides a rich, open-domain environment with textual, structural, and spatial properties. We propose a new task for grounding language in this environment: given a natural language command (e.g., “click on the second article”), choose the correct element on the web page (e.g., a hyperlink or text box). We collected a dataset of over 50,000 commands that capture various phenomena such as functional references (e.g. “find who made this site”), relational reasoning (e.g. “article by john”), and visual reasoning (e.g. “top-most article”). We also implemented and analyzed three baseline models that capture different phenomena present in the dataset.
Tasks Relational Reasoning, Visual Reasoning
Published 2018-08-28
URL http://arxiv.org/abs/1808.09132v2
PDF http://arxiv.org/pdf/1808.09132v2.pdf
PWC https://paperswithcode.com/paper/mapping-natural-language-commands-to-web
Repo https://github.com/stanfordnlp/phrasenode
Framework pytorch
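
For intuition about the task, the simplest conceivable baseline embeds the command and each element's text and picks the element with the highest cosine similarity. This is our illustration only; it is not one of the paper's three baselines, and `embed` is a placeholder for any text encoder.

```python
import numpy as np

def choose_element(embed, command, elements):
    """embed: text -> 1-D vector; elements: list of element text strings."""
    q = embed(command)
    q = q / (np.linalg.norm(q) + 1e-8)
    scores = [np.dot(q, v / (np.linalg.norm(v) + 1e-8))
              for v in (embed(e) for e in elements)]
    return int(np.argmax(scores))   # index of the best-matching element
```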

Covariance-based Dissimilarity Measures Applied to Clustering Wide-sense Stationary Ergodic Processes

Title Covariance-based Dissimilarity Measures Applied to Clustering Wide-sense Stationary Ergodic Processes
Authors Qidi Peng, Nan Rao, Ran Zhao
Abstract We introduce a new unsupervised learning problem: clustering wide-sense stationary ergodic stochastic processes. A covariance-based dissimilarity measure, together with asymptotically consistent algorithms, is designed for clustering offline and online datasets, respectively. We also suggest a formal criterion for the efficiency of dissimilarity measures and discuss approaches to improving the efficiency of our clustering algorithms when they are applied to particular types of processes, such as self-similar processes with wide-sense stationary ergodic increments. Clustering experiments on synthetic and real-world data are provided as example applications.
Tasks
Published 2018-01-27
URL https://arxiv.org/abs/1801.09049v4
PDF https://arxiv.org/pdf/1801.09049v4.pdf
PWC https://paperswithcode.com/paper/covariance-based-dissimilarity-measures
Repo https://github.com/researchcoding/clustering_stochastic_processes
Framework none
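
One plausible reading of a covariance-based dissimilarity for wide-sense stationary processes is to compare their empirical autocovariance sequences up to some maximal lag. The sketch below is illustrative; it is not claimed to be the exact measure defined in the paper.

```python
import numpy as np

def autocov(x, max_lag):
    """Empirical autocovariance of a 1-D series at lags 0..max_lag."""
    x = x - x.mean()
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n
                     for k in range(max_lag + 1)])

def cov_dissimilarity(x, y, max_lag=20):
    """Distance between the autocovariance profiles of two sample paths."""
    return np.linalg.norm(autocov(x, max_lag) - autocov(y, max_lag))
```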

Towards Query Efficient Black-box Attacks: An Input-free Perspective

Title Towards Query Efficient Black-box Attacks: An Input-free Perspective
Authors Yali Du, Meng Fang, Jinfeng Yi, Jun Cheng, Dacheng Tao
Abstract Recent studies have highlighted that deep neural networks (DNNs) are vulnerable to adversarial attacks, even in a black-box scenario. However, most existing black-box attack algorithms need to issue a huge number of queries to perform attacks, which is not practical in the real world. We note that one of the main reasons for the massive number of queries is the requirement that the adversarial example be visually similar to the original image, but in many cases what adversarial examples look like does not matter much. This inspires us to introduce a new attack called the input-free attack, under which an adversary can choose an arbitrary image to start with and is allowed to add perceptible perturbations to it. Following this approach, we propose two techniques to significantly reduce the query complexity. First, we initialize an adversarial example with a gray image, on which every pixel has roughly the same importance for the target model. Then we shrink the dimension of the attack space by perturbing a small region and tiling it to cover the input image. To make our algorithm more effective, we stabilize a projected gradient ascent algorithm with momentum and also propose a heuristic approach for region-size selection. Through extensive experiments, we show that with only 1,701 queries on average, we can perturb a gray image to any target class of ImageNet with a 100% success rate on InceptionV3. Our algorithm has also successfully defeated two real-world systems, the Clarifai food detection API and the Baidu Animal Identification API.
Tasks
Published 2018-09-09
URL http://arxiv.org/abs/1809.02918v1
PDF http://arxiv.org/pdf/1809.02918v1.pdf
PWC https://paperswithcode.com/paper/towards-query-efficient-black-box-attacks-an
Repo https://github.com/yalidu/input-free-attack
Framework tf
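
The two query-saving tricks from the abstract are easy to sketch in isolation: start from a mid-gray image, and parameterize the perturbation as a small tile repeated across the input. The attack loop itself (momentum-stabilized projected gradient ascent with estimated gradients) is omitted, and the 299x299 InceptionV3 input size is an assumption.

```python
import numpy as np

def init_gray_image(h=299, w=299):
    return np.full((h, w, 3), 0.5, dtype=np.float32)   # mid-gray start point

def tile_perturbation(delta_tile, h=299, w=299):
    """Repeat a small (th, tw, 3) perturbation to cover an h x w image."""
    th, tw, _ = delta_tile.shape
    reps = (int(np.ceil(h / th)), int(np.ceil(w / tw)), 1)
    return np.tile(delta_tile, reps)[:h, :w, :]

# adversarial = np.clip(init_gray_image() + tile_perturbation(delta), 0.0, 1.0)
```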

Deep Multimodal Subspace Clustering Networks

Title Deep Multimodal Subspace Clustering Networks
Authors Mahdi Abavisani, Vishal M. Patel
Abstract We present convolutional neural network (CNN) based approaches for unsupervised multimodal subspace clustering. The proposed framework consists of three main stages - multimodal encoder, self-expressive layer, and multimodal decoder. The encoder takes multimodal data as input and fuses them into a latent-space representation. The self-expressive layer is responsible for enforcing the self-expressiveness property and acquiring an affinity matrix corresponding to the data points. The decoder reconstructs the original input data. The network uses the distance between the decoder’s reconstruction and the original input in its training. We investigate early, late, and intermediate fusion techniques and propose three different encoders corresponding to them for spatial fusion. The self-expressive layers and multimodal decoders are essentially the same across the different spatial fusion-based approaches. In addition to the various spatial fusion-based methods, an affinity fusion-based network is also proposed, in which the self-expressive layer corresponding to different modalities is enforced to be the same. Extensive experiments on three datasets show that the proposed methods significantly outperform the state-of-the-art multimodal subspace clustering methods.
Tasks Image Clustering, Multi-modal Subspace Clustering, Multiview Learning, Multi-view Subspace Clustering
Published 2018-04-17
URL http://arxiv.org/abs/1804.06498v3
PDF http://arxiv.org/pdf/1804.06498v3.pdf
PWC https://paperswithcode.com/paper/deep-multimodal-subspace-clustering-networks
Repo https://github.com/mahdiabavisani/Deep-multimodal-subspace-clustering-networks
Framework tf
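
A minimal PyTorch sketch of a self-expressive layer, as the abstract describes it: a trainable coefficient matrix reconstructs each latent point as a combination of the others, and its magnitudes then serve as an affinity matrix. The initialization scale and the diagonal-suppression trick are our choices, not necessarily the authors'.

```python
import torch
import torch.nn as nn

class SelfExpressive(nn.Module):
    def __init__(self, n_points):
        super().__init__()
        self.C = nn.Parameter(1e-4 * torch.randn(n_points, n_points))

    def forward(self, z):
        # z: (n_points, latent_dim); zero the diagonal so no point
        # trivially reconstructs itself
        C = self.C - torch.diag(torch.diagonal(self.C))
        return C @ z

# Training minimizes ||z - SelfExpressive(z)||^2 plus a norm penalty on C;
# |C| + |C|^T then serves as the affinity matrix for spectral clustering.
```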

Deep Generative Networks For Sequence Prediction

Title Deep Generative Networks For Sequence Prediction
Authors Markus Beissinger
Abstract This thesis investigates unsupervised time-series representation learning for sequence prediction problems, i.e., generating nice-looking input samples given a previous history, for high-dimensional input sequences, by decoupling the static input representation from the recurrent sequence representation. We introduce three models based on Generative Stochastic Networks (GSN) for unsupervised sequence learning and prediction. Experimental results for these three models are presented on pixels of sequential handwritten digit (MNIST) data, videos of low-resolution bouncing balls, and motion capture data. The main contribution of this thesis is to provide evidence that GSNs are a viable framework for learning useful representations of complex sequential input data, and to suggest a new framework for deep generative models to learn complex sequences by decoupling static input representations from dynamic time-dependency representations.
Tasks Motion Capture, Representation Learning, Time Series
Published 2018-04-18
URL http://arxiv.org/abs/1804.06546v1
PDF http://arxiv.org/pdf/1804.06546v1.pdf
PWC https://paperswithcode.com/paper/deep-generative-networks-for-sequence
Repo https://github.com/mbeissinger/recurrent_gsn
Framework none