January 28, 2020


Paper Group ANR 1037

Imperio: Robust Over-the-Air Adversarial Examples for Automatic Speech Recognition Systems. Label Universal Targeted Attack. Question Answering via Web Extracted Tables and Pipelined Models. Improved Visual Localization via Graph Smoothing. Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus. X-ToM: Explaining with The …

Imperio: Robust Over-the-Air Adversarial Examples for Automatic Speech Recognition Systems


Title	Imperio: Robust Over-the-Air Adversarial Examples for Automatic Speech Recognition Systems
Authors	Lea Schönherr, Steffen Zeiler, Thorsten Holz, Dorothea Kolossa
Abstract	Previous research showed that automatic speech recognition (ASR) systems can be fooled via adversarial examples. These can induce the ASR system to produce an arbitrary transcription in response to any type of audio signal. Unfortunately, the adversarial examples introduced in prior work did not work in a real-world setup, where the attack is played over the air. Instead, most examples have to be fed directly into the ASR system, ignoring practical side-effects such as reflections. In the few cases where the adversarial examples have been successfully demonstrated over the air, the attacks were not transferable between environments, but instead required precise information about the room where the attack was to take place. The remaining over-the-air attacks in the literature are either handcrafted examples or ones in which human listeners can easily recognize the target transcription once they have been alerted to its content. We demonstrate the first algorithm that produces generic adversarial examples, which remain robust in an over-the-air attack that is not adapted to the specific environment. Hence, no prior knowledge of the room characteristics is required. Instead, we use room impulse responses to compute robust adversarial examples for arbitrary room characteristics and employ the open-source ASR system Kaldi to demonstrate a full end-to-end attack. Further, we utilize psychoacoustic masking to hide the changes of the original audio signal below the human thresholds of hearing. We show that the adversarial examples work for varying room setups and that no line-of-sight between speaker and microphone is necessary. As a result, an attacker can optimize adversarial examples for any kind of target transcription, based on any kind of audio content, for arbitrary room setups without any prior knowledge. Additionally, the adversarial examples remain transferable across a wide range of rooms.
Tasks	Speech Recognition
Published	2019-08-05
URL	https://arxiv.org/abs/1908.01551v3
PDF	https://arxiv.org/pdf/1908.01551v3.pdf
PWC	https://paperswithcode.com/paper/robust-over-the-air-adversarial-examples
Repo
Framework
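
The abstract above mentions room impulse responses (RIRs) as the tool for making examples robust across rooms. As a rough illustration only, the sketch below simulates over-the-air playback by convolving a signal with several RIRs. The function names `convolve` and `simulate_rooms` and all numeric values are hypothetical; how the paper integrates RIRs into its optimization is not described in the abstract and is not reproduced here.

```python
def convolve(signal, impulse_response):
    """Full discrete convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def simulate_rooms(signal, rirs):
    """Return the signal as it would be received in each simulated room."""
    return [convolve(signal, rir) for rir in rirs]


# Toy data (assumed, not from the paper): a short "dry" signal and two RIRs.
dry = [1.0, 0.5, -0.25, 0.0]
rirs = [
    [1.0, 0.0, 0.3],   # direct path plus one reflection two samples later
    [0.8, 0.2],        # attenuated direct path plus an immediate echo
]
received = simulate_rooms(dry, rirs)
wet = received[0]
```

An attack that must survive playback would, under this assumption, be evaluated against every entry of `received` rather than against the dry signal alone.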

Label Universal Targeted Attack


Title	Label Universal Targeted Attack
Authors	Naveed Akhtar, Mohammad A. A. K. Jalwana, Mohammed Bennamoun, Ajmal Mian
Abstract	We introduce Label Universal Targeted Attack (LUTA) that makes a deep model predict a label of the attacker’s choice for ‘any’ sample of a given source class with high probability. Our attack stochastically maximizes the log-probability of the target label for the source class with first-order gradient optimization, while accounting for the gradient moments. It also suppresses the leakage of attack information to the non-source classes to avoid raising suspicion of an attack. The perturbations resulting from our attack achieve high fooling ratios on the large-scale ImageNet and VGGFace models, and transfer well to the physical world. Given full control over the perturbation scope in LUTA, we also demonstrate it as a tool for deep model autopsy. The proposed attack reveals interesting perturbation patterns and observations regarding the deep models.
Tasks
Published	2019-05-27
URL	https://arxiv.org/abs/1905.11544v2
PDF	https://arxiv.org/pdf/1905.11544v2.pdf
PWC	https://paperswithcode.com/paper/label-universal-targeted-attack
Repo
Framework

Question Answering via Web Extracted Tables and Pipelined Models


Title	Question Answering via Web Extracted Tables and Pipelined Models
Authors	Bhavya Karki, Fan Hu, Nithin Haridas, Suhail Barot, Zihua Liu, Lucile Callebert, Matthias Grabmair, Anthony Tomasic
Abstract	In this paper, we describe a dataset and baseline result for a question answering task that utilizes web tables. The dataset contains commonly asked questions on the web and their corresponding answers found in tables on websites. Our dataset is novel in that every question is paired with a table of a different signature. In particular, the dataset contains two classes of tables: entity-instance tables and key-value tables. Each QA instance comprises a table of either kind, a natural language question, and a corresponding structured SQL query. We build our model by dividing question answering into several tasks, including table retrieval and question element classification, and conduct experiments to measure the performance of each task. We extract various features specific to each task and compose a full pipeline which constructs the SQL query from its parts. Our work provides qualitative results and error analysis for each task, and identifies in detail the reasoning required to generate SQL expressions from natural language questions. This analysis of reasoning informs future models based on neural machine learning.
Tasks	Question Answering
Published	2019-03-17
URL	http://arxiv.org/abs/1903.07113v2
PDF	http://arxiv.org/pdf/1903.07113v2.pdf
PWC	https://paperswithcode.com/paper/question-answering-via-web-extracted-tables
Repo
Framework
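
The abstract above describes a pipeline that constructs a SQL query from separately predicted parts. The sketch below shows only that final assembly step, under the assumption that earlier stages have already produced a table name, a column to select, and an optional filter column. The function `build_sql` and the example names are hypothetical and are not taken from the paper.

```python
def build_sql(select_column, table, where_column=None):
    """Assemble a parameterized SQL string from pipeline outputs.

    The filter value is left as a '?' placeholder so that it can be bound
    safely by a database driver instead of being spliced into the string.
    """
    query = f'SELECT "{select_column}" FROM "{table}"'
    if where_column is not None:
        query += f' WHERE "{where_column}" = ?'
    return query


# Hypothetical outputs of table retrieval and question element classification
# for a question such as "What is the capital of France?".
sql = build_sql(select_column="capital", table="countries", where_column="name")
params = ("France",)
```

Keeping the value separate from the query text is a design choice of this sketch; the abstract does not say how the authors represent query parts.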

Improved Visual Localization via Graph Smoothing


Title	Improved Visual Localization via Graph Smoothing
Authors	Carlos Lassance, Yasir Latif, Ravi Garg, Vincent Gripon, Ian Reid
Abstract	Vision based localization is the problem of inferring the pose of the camera given a single image. One solution to this problem is to train a deep neural network to infer the pose of a query image after learning on a dataset of images with known poses. Another, more commonly used approach relies on image retrieval, where the query image is compared against a database of images and its pose is inferred with the help of the retrieved images. The latter approach assumes that images taken from the same place contain the same landmarks and thus have similar feature representations. These representations can be learned using full supervision to be robust to variations in capture conditions such as time of day and weather. In this work, we introduce a framework to enhance the performance of these retrieval based localization methods. In addition to the descriptor similarity of pairs of images in the reference or query database, which is traditionally used for localization, the framework takes into account additional information provided by the acquisition process, including GPS coordinates and the temporal neighbourhood of the images. Our method constructs a graph based on this additional information and uses it for robust retrieval by smoothing the feature representation of reference and/or query images. We show that the proposed method is able to significantly improve the localization accuracy on two large scale datasets over the baselines.
Tasks	Image Retrieval, Visual Localization
Published	2019-11-07
URL	https://arxiv.org/abs/1911.02961v1
PDF	https://arxiv.org/pdf/1911.02961v1.pdf
PWC	https://paperswithcode.com/paper/improved-visual-localization-via-graph
Repo
Framework
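
The abstract above describes smoothing image features over a graph built from GPS and temporal neighbourhood information. The sketch below illustrates one simple form of graph smoothing, assumed here for illustration: each feature vector is blended with the mean of its neighbours' vectors. The function `smooth_features`, the blending weight `alpha`, and the toy data are hypothetical; the abstract does not state which graph filter the authors use.

```python
def smooth_features(features, neighbors, alpha=0.5):
    """Blend each feature vector with the mean of its graph neighbours.

    features:  list of equal-length lists of floats, one per image
    neighbors: dict mapping an image index to a list of neighbour indices
    alpha:     weight given to the neighbourhood mean (0 keeps the original)
    """
    smoothed = []
    for i, x in enumerate(features):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            # Isolated nodes are left unchanged.
            smoothed.append(list(x))
            continue
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs)
                for d in range(len(x))]
        smoothed.append([(1 - alpha) * xd + alpha * md
                         for xd, md in zip(x, mean)])
    return smoothed


# Toy example: images 0 and 1 are neighbours (for instance, consecutive
# frames), while image 2 has no neighbours.
features = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
neighbors = {0: [1], 1: [0]}
result = smooth_features(features, neighbors, alpha=0.5)
```

After smoothing, the descriptors of images 0 and 1 move toward each other, which is the intended effect for images captured at nearby places or times.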

Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus


Title	Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus
Authors	Hongyu Gong, Suma Bhat, Lingfei Wu, Jinjun Xiong, Wen-mei Hwu
Abstract	Text style transfer rephrases a text from a source style (e.g., informal) to a target style (e.g., formal) while keeping its original meaning. Despite the success existing works have achieved using a parallel corpus for the two styles, transferring text style has proven significantly more challenging when there is no parallel training corpus. In this paper, we address this challenge by using a reinforcement-learning-based generator-evaluator architecture. Our generator employs an attention-based encoder-decoder to transfer a sentence from the source style to the target style. Our evaluator is an adversarially trained style discriminator with semantic and syntactic constraints that score the generated sentence for style, meaning preservation, and fluency. Experimental results on two different style transfer tasks (sentiment transfer and formality transfer) show that our model outperforms state-of-the-art approaches. Furthermore, we perform a manual evaluation that demonstrates the effectiveness of the proposed method using subjective metrics of generated text quality.
Tasks	Style Transfer, Text Style Transfer
Published	2019-03-26
URL	http://arxiv.org/abs/1903.10671v2
PDF	http://arxiv.org/pdf/1903.10671v2.pdf
PWC	https://paperswithcode.com/paper/reinforcement-learning-based-text-style
Repo
Framework

X-ToM: Explaining with Theory-of-Mind for Gaining Justified Human Trust


Title	X-ToM: Explaining with Theory-of-Mind for Gaining Justified Human Trust
Authors	Arjun R. Akula, Changsong Liu, Sari Saba-Sadiya, Hongjing Lu, Sinisa Todorovic, Joyce Y. Chai, Song-Chun Zhu
Abstract	We present a new explainable AI (XAI) framework aimed at increasing justified human trust in and reliance on the AI machine through explanations. We pose explanation as an iterative communication process, i.e. dialog, between the machine and human user. More concretely, the machine generates a sequence of explanations in a dialog which takes into account three important aspects at each dialog turn: (a) the human’s intention (or curiosity); (b) the human’s understanding of the machine; and (c) the machine’s understanding of the human user. To do this, we use Theory of Mind (ToM), which helps us explicitly model the human’s intention, the machine’s mind as inferred by the human, and the human’s mind as inferred by the machine. In other words, these explicit mental representations in ToM are incorporated to learn an optimal explanation policy that takes into account the human’s perception and beliefs. Furthermore, we also show that ToM facilitates quantitative measurement of justified human trust in the machine by comparing all three mental representations. We applied our framework to three visual recognition tasks, namely, image classification, action recognition, and human body pose estimation. We argue that our ToM-based explanations are practical and more natural for both expert and non-expert users to understand the internal workings of complex machine learning models. To the best of our knowledge, this is the first work to derive explanations using ToM. Extensive human study experiments verify our hypotheses, showing that the proposed explanations significantly outperform the state-of-the-art XAI methods in terms of all the standard quantitative and qualitative XAI evaluation metrics including human trust, reliance, and explanation satisfaction.
Tasks	Image Classification, Pose Estimation
Published	2019-09-15
URL	https://arxiv.org/abs/1909.06907v1
PDF	https://arxiv.org/pdf/1909.06907v1.pdf
PWC	https://paperswithcode.com/paper/x-tom-explaining-with-theory-of-mind-for
Repo
Framework

Code Farming: A Process for Creating Generic Computational Building Blocks


Title	Code Farming: A Process for Creating Generic Computational Building Blocks
Authors	David Landaeta
Abstract	Motivated by a desire to improve on the current state of the art in genetic programming, and aided by recent progress in understanding the computational aspects of evolutionary systems, we describe a process that creates a set of generic computational building blocks for the purpose of seeding initial populations of programs in any genetic programming system. This provides an advantage over the standard approach of initializing the population purely randomly in that it avoids the need to constantly rediscover such building blocks. It is also better than seeding the initial population with hand-coded building blocks, since it lessens the amount of human intervention required by the system.
Tasks
Published	2019-01-30
URL	http://arxiv.org/abs/1901.11115v2
PDF	http://arxiv.org/pdf/1901.11115v2.pdf
PWC	https://paperswithcode.com/paper/code-farming-a-process-for-creating-generic
Repo
Framework

Set Functions for Time Series


Title	Set Functions for Time Series
Authors	Max Horn, Michael Moor, Christian Bock, Bastian Rieck, Karsten Borgwardt
Abstract	Despite the eminent successes of deep neural networks, many architectures are often hard to transfer to irregularly-sampled and asynchronous time series that commonly occur in real-world datasets, especially in healthcare applications. This paper proposes a novel approach for classifying irregularly-sampled time series with unaligned measurements, focusing on high scalability and data efficiency. Our method SeFT (Set Functions for Time Series) is based on recent advances in differentiable set function learning and is extremely parallelizable with a beneficial memory footprint, thus scaling well to large datasets of long time series and online monitoring scenarios. Furthermore, our approach permits quantifying per-observation contributions to the classification outcome. We extensively compare our method with existing algorithms on multiple healthcare time series datasets and demonstrate that it performs competitively whilst significantly reducing runtime.
Tasks	Time Series, Time Series Classification
Published	2019-09-26
URL	https://arxiv.org/abs/1909.12064v2
PDF	https://arxiv.org/pdf/1909.12064v2.pdf
PWC	https://paperswithcode.com/paper/set-functions-for-time-series-1
Repo
Framework
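
The abstract above treats an irregularly-sampled time series as a set of observations. The sketch below illustrates that idea with a fixed, hand-written encoding and mean pooling, so that the result does not depend on the order of the observations. The functions `encode_observation` and `set_embedding`, the sine/cosine time encoding, and the choice of mean pooling are assumptions made for illustration; the learned components of SeFT are not reproduced.

```python
import math


def encode_observation(t, modality, value, num_modalities):
    """Encode one (time, modality, value) observation as a flat vector."""
    one_hot = [1.0 if m == modality else 0.0 for m in range(num_modalities)]
    return [math.sin(t), math.cos(t), value] + one_hot


def set_embedding(observations, num_modalities):
    """Mean-pool the encoded observations into one fixed-size vector."""
    encoded = [encode_observation(t, m, v, num_modalities)
               for t, m, v in observations]
    dim = len(encoded[0])
    return [sum(e[d] for e in encoded) / len(encoded) for d in range(dim)]


# Toy series with two modalities measured at unaligned times
# (for example, heart rate as modality 0 and temperature as modality 1).
observations = [(0.0, 0, 72.0), (0.7, 1, 36.6), (1.9, 0, 75.0)]
embedding = set_embedding(observations, num_modalities=2)
shuffled = set_embedding(list(reversed(observations)), num_modalities=2)
```

Because each observation carries its own timestamp and modality, no imputation or alignment onto a regular grid is needed before pooling.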

Embracing Imperfect Datasets: A Review of Deep Learning Solutions for Medical Image Segmentation


Title	Embracing Imperfect Datasets: A Review of Deep Learning Solutions for Medical Image Segmentation
Authors	Nima Tajbakhsh, Laura Jeyaseelan, Qian Li, Jeffrey Chiang, Zhihao Wu, Xiaowei Ding
Abstract	The medical imaging literature has witnessed remarkable progress in high-performing segmentation models based on convolutional neural networks. Despite the new performance highs, the recent advanced segmentation models still require large, representative, and high quality annotated datasets. However, rarely do we have a perfect training dataset, particularly in the field of medical imaging, where data and annotations are both expensive to acquire. Recently, a large body of research has studied the problem of medical image segmentation with imperfect datasets, tackling two major dataset limitations: scarce annotations where only limited annotated data is available for training, and weak annotations where the training data has only sparse annotations, noisy annotations, or image-level annotations. In this article, we provide a detailed review of the solutions above, summarizing both the technical novelties and empirical results. We further compare the benefits and requirements of the surveyed methodologies and provide our recommended solutions. We hope this survey article increases the community awareness of the techniques that are available to handle imperfect medical image segmentation datasets.
Tasks	Medical Image Segmentation, Semantic Segmentation
Published	2019-08-27
URL	https://arxiv.org/abs/1908.10454v2
PDF	https://arxiv.org/pdf/1908.10454v2.pdf
PWC	https://paperswithcode.com/paper/embracing-imperfect-datasets-a-review-of-deep
Repo
Framework

Hue Modification Localization By Pair Matching


Title	Hue Modification Localization By Pair Matching
Authors	Quoc-Tin Phan, Michele Vascotto, Giulia Boato
Abstract	Hue modification is the adjustment of the hue property of color images. Conducting hue modification on an image is trivial, and it can be abused to falsify the opinions of viewers. Since shapes, edges, and textural information remain unchanged after hue modification, this type of manipulation is relatively hard to detect and localize. Since small patches inherit the same Color Filter Array (CFA) configuration and demosaicing, any distortion made by local hue modification can be detected by patch matching within the same image. In this paper, we propose to localize hue modification by means of a Siamese neural network specifically designed for matching two inputs. By crafting the network outputs, we are able to form a heatmap which potentially highlights malicious regions. Our proposed method deals well not only with uncompressed images but also with the presence of JPEG compression, an operation that usually hinders the exploitation of CFA and demosaicing artifacts. Experimental evidence corroborates the effectiveness of the proposed method.
Tasks	Demosaicking
Published	2019-03-05
URL	http://arxiv.org/abs/1903.01735v1
PDF	http://arxiv.org/pdf/1903.01735v1.pdf
PWC	https://paperswithcode.com/paper/hue-modification-localization-by-pair
Repo
Framework

Sentence Length


Title	Sentence Length
Authors	Gábor Borbély, András Kornai
Abstract	The distribution of sentence length in ordinary language is not well captured by the existing models. Here we survey previous models of sentence length and present our random walk model that offers both a better fit with the data and a better understanding of the distribution. We develop a generalization of KL divergence, discuss measuring the noise inherent in a corpus, and present a hyperparameter-free Bayesian model comparison method that has strong conceptual ties to Minimal Description Length modeling. The models we obtain require only a few dozen bits, orders of magnitude less than the naive nonparametric MDL models would.
Tasks
Published	2019-05-22
URL	https://arxiv.org/abs/1905.09139v1
PDF	https://arxiv.org/pdf/1905.09139v1.pdf
PWC	https://paperswithcode.com/paper/sentence-length
Repo
Framework
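
The abstract above compares sentence-length models by how well they fit corpus data and mentions a generalization of KL divergence. The sketch below computes only the standard KL divergence, in bits, between an empirical sentence-length distribution and a candidate model; it is not the generalization developed in the paper. The function names and the toy numbers are hypothetical.

```python
import math
from collections import Counter


def empirical_distribution(lengths):
    """Relative frequency of each sentence length in a corpus sample."""
    counts = Counter(lengths)
    total = len(lengths)
    return {k: c / total for k, c in counts.items()}


def kl_divergence(p, q):
    """Standard D_KL(p || q) in bits.

    Assumes q assigns positive probability to every length that p does.
    """
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)


# Toy corpus of four sentences and a made-up model over the same lengths.
observed = empirical_distribution([1, 1, 2, 2])
model = {1: 0.25, 2: 0.75}
divergence = kl_divergence(observed, model)
```

A lower divergence means the model wastes fewer bits per sentence when encoding the observed lengths, which is the sense in which one model fits better than another.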

Knowledge Map: Toward a New Approach Supporting the Knowledge Management in Distributed Data Mining


Title	Knowledge Map: Toward a New Approach Supporting the Knowledge Management in Distributed Data Mining
Authors	Nhien-An Le-Khac, Lamine M. Aouad, M-Tahar Kechadi
Abstract	Distributed data mining (DDM) deals with the problem of finding patterns or models, called knowledge, in an environment with distributed data and computations. Today, massive amounts of data, which are often geographically distributed and owned by different organisations, are being mined. As a consequence, a large amount of knowledge is being produced. This causes problems not only of knowledge management but also of visualization in data mining. Besides, the main aim of DDM is to fully exploit the benefit of distributed data analysis while minimising communication. Existing DDM techniques perform partial analysis of local data at individual sites and then generate a global model by aggregating these local results. These two steps are not independent, since naive approaches to local analysis may produce an incorrect and ambiguous global data model. Integrating and coordinating these two steps requires effective knowledge management, concretely an efficient map of knowledge, in order to take advantage of mined knowledge to guide the mining of the data. In this paper, we present the “knowledge map”, a representation of knowledge about mined knowledge. This new approach aims to efficiently manage mined knowledge on large scale distributed platforms such as the Grid. This knowledge map is used to facilitate not only the visualization and evaluation of mining results but also the coordination of local mining processes and existing knowledge to increase the accuracy of the final model.
Tasks
Published	2019-10-23
URL	https://arxiv.org/abs/1910.10547v1
PDF	https://arxiv.org/pdf/1910.10547v1.pdf
PWC	https://paperswithcode.com/paper/knowledge-map-toward-a-new-approach
Repo
Framework

The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue


Title	The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue
Authors	Janosch Haber, Tim Baumgärtner, Ece Takmaz, Lieke Gelderloos, Elia Bruni, Raquel Fernández
Abstract	This paper introduces the PhotoBook dataset, a large-scale collection of visually-grounded, task-oriented dialogues in English designed to investigate shared dialogue history accumulating during conversation. Taking inspiration from seminal work on dialogue analysis, we propose a data-collection task formulated as a collaborative game prompting two online participants to refer to images utilising both their visual context as well as previously established referring expressions. We provide a detailed description of the task setup and a thorough analysis of the 2,500 dialogues collected. To further illustrate the novel features of the dataset, we propose a baseline model for reference resolution which uses a simple method to take into account shared information accumulated in a reference chain. Our results show that this information is particularly important to resolve later descriptions and underline the need to develop more sophisticated models of common ground in dialogue interaction.
Tasks
Published	2019-06-04
URL	https://arxiv.org/abs/1906.01530v2
PDF	https://arxiv.org/pdf/1906.01530v2.pdf
PWC	https://paperswithcode.com/paper/the-photobook-dataset-building-common-ground
Repo
Framework

Distributed Deep Learning Model for Intelligent Video Surveillance Systems with Edge Computing


Title	Distributed Deep Learning Model for Intelligent Video Surveillance Systems with Edge Computing
Authors	Jianguo Chen, Kenli Li, Qingying Deng, Keqin Li, Philip S. Yu
Abstract	In this paper, we propose a Distributed Intelligent Video Surveillance (DIVS) system using Deep Learning (DL) algorithms and deploy it in an edge computing environment. We establish a multi-layer edge computing architecture and a distributed DL training model for the DIVS system. The DIVS system can migrate computing workloads from the network center to network edges to reduce huge network communication overhead and provide low-latency and accurate video analysis solutions. We implement the proposed DIVS system and address the problems of parallel training, model synchronization, and workload balancing. Task-level parallel and model-level parallel training methods are proposed to further accelerate the video analysis process. In addition, we propose a model parameter updating method to achieve model synchronization of the global DL model in a distributed EC environment. Moreover, a dynamic data migration approach is proposed to address the imbalance of workload and computational power of edge nodes. Experimental results showed that the EC architecture can provide elastic and scalable computing power, and the proposed DIVS system can efficiently handle video surveillance and analysis tasks.
Tasks
Published	2019-04-12
URL	http://arxiv.org/abs/1904.06400v1
PDF	http://arxiv.org/pdf/1904.06400v1.pdf
PWC	https://paperswithcode.com/paper/distributed-deep-learning-model-for
Repo
Framework

An Encoder-Decoder Based Approach for Anomaly Detection with Application in Additive Manufacturing


Title	An Encoder-Decoder Based Approach for Anomaly Detection with Application in Additive Manufacturing
Authors	Baihong Jin, Yingshui Tan, Alexander Nettekoven, Yuxin Chen, Ufuk Topcu, Yisong Yue, Alberto Sangiovanni Vincentelli
Abstract	We present a novel unsupervised deep learning approach that utilizes the encoder-decoder architecture for detecting anomalies in sequential sensor data collected during industrial manufacturing. Our approach is designed not only to detect whether there exists an anomaly at a given time step, but also to predict what will happen next in the (sequential) process. We demonstrate our approach on a dataset collected from a real-world testbed. The dataset contains images collected under both normal conditions and synthetic anomalies. We show that the encoder-decoder model is able to identify the injected anomalies in a modern manufacturing process in an unsupervised fashion. In addition, it also gives hints about the temperature non-uniformity of the testbed during manufacturing, which we were not aware of before conducting the experiment.
Tasks	Anomaly Detection
Published	2019-07-26
URL	https://arxiv.org/abs/1907.11778v1
PDF	https://arxiv.org/pdf/1907.11778v1.pdf
PWC	https://paperswithcode.com/paper/an-encoder-decoder-based-approach-for-anomaly
Repo
Framework