Paper Group ANR 1037
Imperio: Robust Over-the-Air Adversarial Examples for Automatic Speech Recognition Systems
Title | Imperio: Robust Over-the-Air Adversarial Examples for Automatic Speech Recognition Systems |
Authors | Lea Schönherr, Steffen Zeiler, Thorsten Holz, Dorothea Kolossa |
Abstract | Previous research showed that automatic speech recognition (ASR) systems can be fooled via adversarial examples. These can induce the ASR system to produce an arbitrary transcription in response to any type of audio signal. Unfortunately, the adversarial examples introduced in prior work did not work in a real-world setup, where the attack is played over the air. Instead, most examples rather have to be fed directly into the ASR system, ignoring practical side-effects such as reflections. In the few cases where the adversarial examples have been successfully demonstrated over the air, the attacks were not transferable between environments, but instead required precise information about the room where the attack was to take place. The remaining over-the-air attacks in the literature are either handcrafted examples or human listeners can easily recognize the target transcription once they have been alerted to its content. We demonstrate the first algorithm that produces generic adversarial examples, which remain robust in an over-the-air attack that is not adapted to the specific environment. Hence, no prior knowledge of the room characteristics is required. Instead, we use room impulse responses to compute robust adversarial examples for arbitrary room characteristics and employ the open-source ASR system Kaldi to demonstrate a full end-to-end attack. Further, we utilize psychoacoustic masking to hide the changes of the original audio signal below the human thresholds of hearing. We show that the adversarial examples work for varying room setups and that no line-of-sight between speaker and microphone is necessary. As a result, an attacker can optimize adversarial examples for any kind of target transcription, based on any kind of audio content, for arbitrary room setups without any prior knowledge. Additionally, the adversarial examples remain transferable across a wide range of rooms. |
Tasks | Speech Recognition |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01551v3 |
https://arxiv.org/pdf/1908.01551v3.pdf | |
PWC | https://paperswithcode.com/paper/robust-over-the-air-adversarial-examples |
Repo | |
Framework | |
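The abstract above mentions using room impulse responses to make adversarial examples robust to playback in a room. As a rough illustration of that one ingredient only, the sketch below simulates over-the-air transmission by convolving a signal with an impulse response. The `convolve` helper, the toy `audio` samples, and the `rir` values are all made-up assumptions for illustration; this is not the authors' implementation and omits the ASR model, the optimization, and the psychoacoustic masking.

```python
def convolve(signal, impulse_response):
    """Full discrete convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# Hypothetical toy data: a three-sample "audio" clip and an invented
# impulse response with a direct path and one delayed reflection.
audio = [1.0, 0.5, -0.25]
rir = [1.0, 0.0, 0.5]

# Approximation of what a microphone in that (imaginary) room would record.
received = convolve(audio, rir)
print(received)
```

An attack that should survive many rooms would, under this view, be evaluated against many different impulse responses rather than a single one.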
Label Universal Targeted Attack
Title | Label Universal Targeted Attack |
Authors | Naveed Akhtar, Mohammad A. A. K. Jalwana, Mohammed Bennamoun, Ajmal Mian |
Abstract | We introduce Label Universal Targeted Attack (LUTA) that makes a deep model predict a label of attacker’s choice for ‘any’ sample of a given source class with high probability. Our attack stochastically maximizes the log-probability of the target label for the source class with first order gradient optimization, while accounting for the gradient moments. It also suppresses the leakage of attack information to the non-source classes to avoid raising suspicion of the attack. The perturbations resulting from our attack achieve high fooling ratios on the large-scale ImageNet and VGGFace models, and transfer well to the physical world. Given full control over the perturbation scope in LUTA, we also demonstrate it as a tool for deep model autopsy. The proposed attack reveals interesting perturbation patterns and observations regarding the deep models. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11544v2 |
https://arxiv.org/pdf/1905.11544v2.pdf | |
PWC | https://paperswithcode.com/paper/label-universal-targeted-attack |
Repo | |
Framework | |
Question Answering via Web Extracted Tables and Pipelined Models
Title | Question Answering via Web Extracted Tables and Pipelined Models |
Authors | Bhavya Karki, Fan Hu, Nithin Haridas, Suhail Barot, Zihua Liu, Lucile Callebert, Matthias Grabmair, Anthony Tomasic |
Abstract | In this paper, we describe a dataset and baseline results for a question answering task that utilizes web tables. The dataset contains commonly asked questions on the web and their corresponding answers found in tables on websites. Our dataset is novel in that every question is paired with a table of a different signature. In particular, the dataset contains two classes of tables: entity-instance tables and key-value tables. Each QA instance comprises a table of either kind, a natural language question, and a corresponding structured SQL query. We build our model by dividing question answering into several tasks, including table retrieval and question element classification, and conduct experiments to measure the performance of each task. We extract various features specific to each task and compose a full pipeline which constructs the SQL query from its parts. Our work provides qualitative results and error analysis for each task, and identifies in detail the reasoning required to generate SQL expressions from natural language questions. This analysis of reasoning informs future models based on neural machine learning. |
Tasks | Question Answering |
Published | 2019-03-17 |
URL | http://arxiv.org/abs/1903.07113v2 |
http://arxiv.org/pdf/1903.07113v2.pdf | |
PWC | https://paperswithcode.com/paper/question-answering-via-web-extracted-tables |
Repo | |
Framework | |
Improved Visual Localization via Graph Smoothing
Title | Improved Visual Localization via Graph Smoothing |
Authors | Carlos Lassance, Yasir Latif, Ravi Garg, Vincent Gripon, Ian Reid |
Abstract | Vision based localization is the problem of inferring the pose of the camera given a single image. One solution to this problem is to learn a deep neural network to infer the pose of a query image after learning on a dataset of images with known poses. Another, more commonly used approach relies on image retrieval, where the query image is compared against the database of images and its pose is inferred with the help of the retrieved images. The latter approach assumes that images taken from the same places consist of the same landmarks and thus have similar feature representations. These representations can be learned using full supervision to be robust to different variations in capture conditions like time of day and weather. In this work, we introduce a framework to enhance the performance of these retrieval based localization methods by taking into account additional information provided by the acquisition process, including GPS coordinates and the temporal neighbourhood of the images, in addition to the descriptor similarity of pairs of images in the reference or query database, which is traditionally used for localization. Our method constructs a graph based on this additional information and uses it for robust retrieval by smoothing the feature representation of reference and/or query images. We show that the proposed method is able to significantly improve the localization accuracy on two large scale datasets over the baselines. |
Tasks | Image Retrieval, Visual Localization |
Published | 2019-11-07 |
URL | https://arxiv.org/abs/1911.02961v1 |
https://arxiv.org/pdf/1911.02961v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-visual-localization-via-graph |
Repo | |
Framework | |
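The abstract above describes smoothing feature representations over a graph built from side information such as GPS coordinates and temporal neighbourhood. A minimal sketch of one possible smoothing step follows: each image descriptor is blended with the mean descriptor of its graph neighbours. The function name `smooth_features`, the blending weight `alpha`, and the toy data are assumptions made for illustration; the paper's actual graph filter may differ.

```python
def smooth_features(features, neighbors, alpha=0.5):
    """Blend each node's feature vector with the mean of its neighbours'.

    features:  list of equal-length lists of floats, one per image
    neighbors: dict mapping node index -> list of neighbour indices
    alpha:     hypothetical blending weight (0 keeps the original features)
    """
    smoothed = []
    for i, feat in enumerate(features):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            # Isolated nodes keep their original descriptor.
            smoothed.append(list(feat))
            continue
        mean = [
            sum(features[j][d] for j in nbrs) / len(nbrs)
            for d in range(len(feat))
        ]
        smoothed.append([(1 - alpha) * f + alpha * m for f, m in zip(feat, mean)])
    return smoothed


# Toy example: images 0 and 1 were captured close together, image 2 was not.
features = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
neighbors = {0: [1], 1: [0]}
result = smooth_features(features, neighbors)
print(result)
```

In this toy case the two neighbouring images end up with identical descriptors, which is the intended effect: images known to come from the same place are pulled together before retrieval.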
Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus
Title | Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus |
Authors | Hongyu Gong, Suma Bhat, Lingfei Wu, Jinjun Xiong, Wen-mei Hwu |
Abstract | Text style transfer rephrases a text from a source style (e.g., informal) to a target style (e.g., formal) while keeping its original meaning. Despite the success existing works have achieved using a parallel corpus for the two styles, transferring text style has proven significantly more challenging when there is no parallel training corpus. In this paper, we address this challenge by using a reinforcement-learning-based generator-evaluator architecture. Our generator employs an attention-based encoder-decoder to transfer a sentence from the source style to the target style. Our evaluator is an adversarially trained style discriminator with semantic and syntactic constraints that score the generated sentence for style, meaning preservation, and fluency. Experimental results on two different style transfer tasks (sentiment transfer and formality transfer) show that our model outperforms state-of-the-art approaches. Furthermore, we perform a manual evaluation that demonstrates the effectiveness of the proposed method using subjective metrics of generated text quality. |
Tasks | Style Transfer, Text Style Transfer |
Published | 2019-03-26 |
URL | http://arxiv.org/abs/1903.10671v2 |
http://arxiv.org/pdf/1903.10671v2.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-based-text-style |
Repo | |
Framework | |
X-ToM: Explaining with Theory-of-Mind for Gaining Justified Human Trust
Title | X-ToM: Explaining with Theory-of-Mind for Gaining Justified Human Trust |
Authors | Arjun R. Akula, Changsong Liu, Sari Saba-Sadiya, Hongjing Lu, Sinisa Todorovic, Joyce Y. Chai, Song-Chun Zhu |
Abstract | We present a new explainable AI (XAI) framework aimed at increasing justified human trust and reliance in the AI machine through explanations. We pose explanation as an iterative communication process, i.e. dialog, between the machine and human user. More concretely, the machine generates a sequence of explanations in a dialog which takes into account three important aspects at each dialog turn: (a) human’s intention (or curiosity); (b) human’s understanding of the machine; and (c) machine’s understanding of the human user. To do this, we use Theory of Mind (ToM), which helps us in explicitly modeling human’s intention, machine’s mind as inferred by the human, as well as human’s mind as inferred by the machine. In other words, these explicit mental representations in ToM are incorporated to learn an optimal explanation policy that takes into account human’s perception and beliefs. Furthermore, we also show that ToM facilitates quantitatively measuring justified human trust in the machine by comparing all three mental representations. We applied our framework to three visual recognition tasks, namely, image classification, action recognition, and human body pose estimation. We argue that our ToM based explanations are practical and more natural for both expert and non-expert users to understand the internal workings of complex machine learning models. To the best of our knowledge, this is the first work to derive explanations using ToM. Extensive human study experiments verify our hypotheses, showing that the proposed explanations significantly outperform the state-of-the-art XAI methods in terms of all the standard quantitative and qualitative XAI evaluation metrics including human trust, reliance, and explanation satisfaction. |
Tasks | Image Classification, Pose Estimation |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.06907v1 |
https://arxiv.org/pdf/1909.06907v1.pdf | |
PWC | https://paperswithcode.com/paper/x-tom-explaining-with-theory-of-mind-for |
Repo | |
Framework | |
Code Farming: A Process for Creating Generic Computational Building Blocks
Title | Code Farming: A Process for Creating Generic Computational Building Blocks |
Authors | David Landaeta |
Abstract | Motivated by a desire to improve on the current state of the art in genetic programming, and aided by recent progress in understanding the computational aspects of evolutionary systems, we describe a process that creates a set of generic computational building blocks for the purpose of seeding initial populations of programs in any genetic programming system. This provides an advantage over the standard approach of initializing the population purely randomly in that it avoids the need to constantly rediscover such building blocks. It is also better than seeding the initial population with hand-coded building blocks, since it lessens the amount of human intervention required by the system. |
Tasks | |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.11115v2 |
http://arxiv.org/pdf/1901.11115v2.pdf | |
PWC | https://paperswithcode.com/paper/code-farming-a-process-for-creating-generic |
Repo | |
Framework | |
Set Functions for Time Series
Title | Set Functions for Time Series |
Authors | Max Horn, Michael Moor, Christian Bock, Bastian Rieck, Karsten Borgwardt |
Abstract | Despite the eminent successes of deep neural networks, many architectures are often hard to transfer to irregularly-sampled and asynchronous time series that commonly occur in real-world datasets, especially in healthcare applications. This paper proposes a novel approach for classifying irregularly-sampled time series with unaligned measurements, focusing on high scalability and data efficiency. Our method SeFT (Set Functions for Time Series) is based on recent advances in differentiable set function learning, extremely parallelizable with a beneficial memory footprint, thus scaling well to large datasets of long time series and online monitoring scenarios. Furthermore, our approach permits quantifying per-observation contributions to the classification outcome. We extensively compare our method with existing algorithms on multiple healthcare time series datasets and demonstrate that it performs competitively whilst significantly reducing runtime. |
Tasks | Time Series, Time Series Classification |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12064v2 |
https://arxiv.org/pdf/1909.12064v2.pdf | |
PWC | https://paperswithcode.com/paper/set-functions-for-time-series-1 |
Repo | |
Framework | |
Embracing Imperfect Datasets: A Review of Deep Learning Solutions for Medical Image Segmentation
Title | Embracing Imperfect Datasets: A Review of Deep Learning Solutions for Medical Image Segmentation |
Authors | Nima Tajbakhsh, Laura Jeyaseelan, Qian Li, Jeffrey Chiang, Zhihao Wu, Xiaowei Ding |
Abstract | The medical imaging literature has witnessed remarkable progress in high-performing segmentation models based on convolutional neural networks. Despite the new performance highs, the recent advanced segmentation models still require large, representative, and high quality annotated datasets. However, rarely do we have a perfect training dataset, particularly in the field of medical imaging, where data and annotations are both expensive to acquire. Recently, a large body of research has studied the problem of medical image segmentation with imperfect datasets, tackling two major dataset limitations: scarce annotations where only limited annotated data is available for training, and weak annotations where the training data has only sparse annotations, noisy annotations, or image-level annotations. In this article, we provide a detailed review of the solutions above, summarizing both the technical novelties and empirical results. We further compare the benefits and requirements of the surveyed methodologies and provide our recommended solutions. We hope this survey article increases the community awareness of the techniques that are available to handle imperfect medical image segmentation datasets. |
Tasks | Medical Image Segmentation, Semantic Segmentation |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10454v2 |
https://arxiv.org/pdf/1908.10454v2.pdf | |
PWC | https://paperswithcode.com/paper/embracing-imperfect-datasets-a-review-of-deep |
Repo | |
Framework | |
Hue Modification Localization By Pair Matching
Title | Hue Modification Localization By Pair Matching |
Authors | Quoc-Tin Phan, Michele Vascotto, Giulia Boato |
Abstract | Hue modification is the adjustment of the hue property of color images. Conducting hue modification on an image is trivial, and it can be abused to falsify opinions of viewers. Since shapes, edges, and textural information remain unchanged after hue modification, this type of manipulation is relatively hard to detect and localize. Since small patches inherit the same Color Filter Array (CFA) configuration and demosaicing, any distortion made by local hue modification can be detected by patch matching within the same image. In this paper, we propose to localize hue modification by means of a Siamese neural network specifically designed for matching two inputs. By crafting the network outputs, we are able to form a heatmap which potentially highlights malicious regions. Our proposed method deals well not only with uncompressed images but also with the presence of JPEG compression, an operation usually hindering the exploitation of CFA and demosaicing artifacts. Experimental evidence corroborates the effectiveness of the proposed method. |
Tasks | Demosaicking |
Published | 2019-03-05 |
URL | http://arxiv.org/abs/1903.01735v1 |
http://arxiv.org/pdf/1903.01735v1.pdf | |
PWC | https://paperswithcode.com/paper/hue-modification-localization-by-pair |
Repo | |
Framework | |
Sentence Length
Title | Sentence Length |
Authors | Gábor Borbély, András Kornai |
Abstract | The distribution of sentence length in ordinary language is not well captured by the existing models. Here we survey previous models of sentence length and present our random walk model that offers both a better fit with the data and a better understanding of the distribution. We develop a generalization of KL divergence, discuss measuring the noise inherent in a corpus, and present a hyperparameter-free Bayesian model comparison method that has strong conceptual ties to Minimal Description Length modeling. The models we obtain require only a few dozen bits, orders of magnitude less than the naive nonparametric MDL models would. |
Tasks | |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.09139v1 |
https://arxiv.org/pdf/1905.09139v1.pdf | |
PWC | https://paperswithcode.com/paper/sentence-length |
Repo | |
Framework | |
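The abstract above mentions a generalization of KL divergence and model sizes measured in bits. For orientation, the sketch below computes the standard (ungeneralized) KL divergence in bits between two discrete distributions. It is a textbook definition, not the paper's generalized measure, and the two toy distributions are invented for illustration.

```python
import math


def kl_divergence_bits(p, q):
    """Standard KL divergence D(p || q) in bits for discrete distributions.

    Assumes p and q have the same length, each sums to 1, and q[i] > 0
    wherever p[i] > 0. Terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy distributions over two outcomes.
p = [0.5, 0.5]
q = [0.25, 0.75]

kl_pq = kl_divergence_bits(p, q)
kl_pp = kl_divergence_bits(p, p)
print(kl_pq, kl_pp)
```

The divergence of a distribution from itself is zero, and the measure is asymmetric, so `kl_divergence_bits(p, q)` generally differs from `kl_divergence_bits(q, p)`.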
Knowledge Map: Toward a New Approach Supporting the Knowledge Management in Distributed Data Mining
Title | Knowledge Map: Toward a New Approach Supporting the Knowledge Management in Distributed Data Mining |
Authors | Nhien-An Le-Khac, Lamine M. Aouad, M-Tahar Kechadi |
Abstract | Distributed data mining (DDM) deals with the problem of finding patterns or models, called knowledge, in an environment with distributed data and computations. Today, massive amounts of data, which are often geographically distributed and owned by different organisations, are being mined. As a consequence, a large amount of knowledge is being produced. This causes problems not only of knowledge management but also of visualization in data mining. Besides, the main aim of DDM is to fully exploit the benefit of distributed data analysis while minimising communication. Existing DDM techniques perform partial analysis of local data at individual sites and then generate a global model by aggregating these local results. These two steps are not independent, since naive approaches to local analysis may produce an incorrect and ambiguous global data model. The integration and cooperation of these two steps require effective knowledge management, concretely an efficient map of knowledge, in order to take advantage of mined knowledge to guide the mining of the data. In this paper, we present “knowledge map”, a representation of knowledge about mined knowledge. This new approach aims to efficiently manage mined knowledge on large-scale distributed platforms such as the Grid. This knowledge map is used to facilitate not only the visualization and evaluation of mining results but also the coordination of local mining processes and existing knowledge to increase the accuracy of the final model. |
Tasks | |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10547v1 |
https://arxiv.org/pdf/1910.10547v1.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-map-toward-a-new-approach |
Repo | |
Framework | |
The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue
Title | The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue |
Authors | Janosch Haber, Tim Baumgärtner, Ece Takmaz, Lieke Gelderloos, Elia Bruni, Raquel Fernández |
Abstract | This paper introduces the PhotoBook dataset, a large-scale collection of visually-grounded, task-oriented dialogues in English designed to investigate shared dialogue history accumulating during conversation. Taking inspiration from seminal work on dialogue analysis, we propose a data-collection task formulated as a collaborative game prompting two online participants to refer to images utilising both their visual context as well as previously established referring expressions. We provide a detailed description of the task setup and a thorough analysis of the 2,500 dialogues collected. To further illustrate the novel features of the dataset, we propose a baseline model for reference resolution which uses a simple method to take into account shared information accumulated in a reference chain. Our results show that this information is particularly important to resolve later descriptions and underline the need to develop more sophisticated models of common ground in dialogue interaction. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01530v2 |
https://arxiv.org/pdf/1906.01530v2.pdf | |
PWC | https://paperswithcode.com/paper/the-photobook-dataset-building-common-ground |
Repo | |
Framework | |
Distributed Deep Learning Model for Intelligent Video Surveillance Systems with Edge Computing
Title | Distributed Deep Learning Model for Intelligent Video Surveillance Systems with Edge Computing |
Authors | Jianguo Chen, Kenli Li, Qingying Deng, Keqin Li, Philip S. Yu |
Abstract | In this paper, we propose a Distributed Intelligent Video Surveillance (DIVS) system using Deep Learning (DL) algorithms and deploy it in an edge computing environment. We establish a multi-layer edge computing architecture and a distributed DL training model for the DIVS system. The DIVS system can migrate computing workloads from the network center to network edges to reduce huge network communication overhead and provide low-latency and accurate video analysis solutions. We implement the proposed DIVS system and address the problems of parallel training, model synchronization, and workload balancing. Task-level parallel and model-level parallel training methods are proposed to further accelerate the video analysis process. In addition, we propose a model parameter updating method to achieve model synchronization of the global DL model in a distributed EC environment. Moreover, a dynamic data migration approach is proposed to address the imbalance of workload and computational power of edge nodes. Experimental results showed that the EC architecture can provide elastic and scalable computing power, and the proposed DIVS system can efficiently handle video surveillance and analysis tasks. |
Tasks | |
Published | 2019-04-12 |
URL | http://arxiv.org/abs/1904.06400v1 |
http://arxiv.org/pdf/1904.06400v1.pdf | |
PWC | https://paperswithcode.com/paper/distributed-deep-learning-model-for |
Repo | |
Framework | |
An Encoder-Decoder Based Approach for Anomaly Detection with Application in Additive Manufacturing
Title | An Encoder-Decoder Based Approach for Anomaly Detection with Application in Additive Manufacturing |
Authors | Baihong Jin, Yingshui Tan, Alexander Nettekoven, Yuxin Chen, Ufuk Topcu, Yisong Yue, Alberto Sangiovanni Vincentelli |
Abstract | We present a novel unsupervised deep learning approach that utilizes the encoder-decoder architecture for detecting anomalies in sequential sensor data collected during industrial manufacturing. Our approach is designed not only to detect whether there exists an anomaly at a given time step, but also to predict what will happen next in the (sequential) process. We demonstrate our approach on a dataset collected from a real-world testbed. The dataset contains images collected under both normal conditions and synthetic anomalies. We show that the encoder-decoder model is able to identify the injected anomalies in a modern manufacturing process in an unsupervised fashion. In addition, it also gives hints about the temperature non-uniformity of the testbed during manufacturing, which we were not aware of before conducting the experiment. |
Tasks | Anomaly Detection |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11778v1 |
https://arxiv.org/pdf/1907.11778v1.pdf | |
PWC | https://paperswithcode.com/paper/an-encoder-decoder-based-approach-for-anomaly |
Repo | |
Framework | |