January 29, 2020

Paper Group ANR 751

Towards Verifying Robustness of Neural Networks Against Semantic Perturbations


Title	Towards Verifying Robustness of Neural Networks Against Semantic Perturbations
Authors	Jeet Mohapatra, Tsui-Wei Weng, Pin-Yu Chen, Sijia Liu, Luca Daniel
Abstract	Verifying robustness of neural networks given a specified threat model is a fundamental yet challenging task. While current verification methods mainly focus on the $L_p$-norm-ball threat model of the input instances, robustness verification against semantic adversarial attacks that induce large $L_p$-norm perturbations, such as color shifting and lighting adjustment, is beyond their capacity. To bridge this gap, we propose Semantify-NN, a model-agnostic and generic robustness verification approach against semantic perturbations for neural networks. By simply inserting our proposed semantic perturbation layers (SP-layers) at the input layer of any given model, Semantify-NN is model-agnostic, and any $L_p$-norm-ball based verification tool can be used to verify the model's robustness against semantic perturbations. We illustrate the principles of designing the SP-layers and provide examples including semantic perturbations to image classification in the space of hue, saturation, lightness, brightness, contrast and rotation, respectively. Experimental results on various network architectures and different datasets demonstrate the superior verification performance of Semantify-NN over $L_p$-norm-based verification frameworks that naively convert semantic perturbations to $L_p$-norm perturbations. To the best of our knowledge, Semantify-NN is the first framework to support robustness verification against a wide range of semantic perturbations.
Tasks	Image Classification
Published	2019-12-19
URL	https://arxiv.org/abs/1912.09533v1
PDF	https://arxiv.org/pdf/1912.09533v1.pdf
PWC	https://paperswithcode.com/paper/towards-verifying-robustness-of-neural
Repo
Framework
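
The gap the abstract describes between a semantic threat model and a naive $L_p$ conversion can be illustrated on a toy case. The sketch below is not the Semantify-NN implementation: it assumes a single linear score, a brightness shift that adds the same offset to every pixel, and no clipping to the valid pixel range. The function names are hypothetical.

```python
from typing import List, Tuple


def linear_score(weights: List[float], bias: float, image: List[float]) -> float:
    """Score of a toy linear classifier (an assumption; real models are deep networks)."""
    return sum(w * x for w, x in zip(weights, image)) + bias


def bounds_under_brightness(
    weights: List[float], bias: float, image: List[float], eps: float
) -> Tuple[float, float]:
    """Exact score range when one shared offset b in [-eps, eps] is added to every pixel."""
    base = linear_score(weights, bias, image)
    # A shared offset moves the score by b * sum(weights), so the range is exact.
    spread = abs(sum(weights)) * eps
    return base - spread, base + spread


def bounds_under_linf(
    weights: List[float], bias: float, image: List[float], eps: float
) -> Tuple[float, float]:
    """Score range when every pixel may move independently by up to eps (L_inf ball)."""
    base = linear_score(weights, bias, image)
    # Independent pixel moves can all push in the worst direction at once.
    spread = sum(abs(w) for w in weights) * eps
    return base - spread, base + spread


if __name__ == "__main__":
    w, b, x = [1.0, -1.0, 0.5], 0.0, [0.2, 0.4, 0.6]
    print(bounds_under_brightness(w, b, x, 0.1))
    print(bounds_under_linf(w, b, x, 0.1))
```

In this toy setting the brightness bound keeps the score positive, while the $L_\infty$ ball of the same radius does not, because it also admits perturbations that no brightness change can produce.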


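Multi-modal dialog for browsing large visual catalogs using exploration-exploitation paradigm in a joint embedding space


Title	Multi-modal dialog for browsing large visual catalogs using exploration-exploitation paradigm in a joint embedding space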
Authors	Indrani Bhattacharya, Arkabandhu Chowdhury, Vikas Raykar
Abstract	We present a multi-modal dialog system to assist online shoppers in visually browsing through large catalogs. Visual browsing is different from visual search in that it allows the user to explore the wide range of products in a catalog, beyond the exact search matches. We focus on a slightly asymmetric version of the complete multi-modal dialog where the system can understand both text and image queries but responds only in images. We formulate our problem of “showing $k$ best images to a user”, based on the dialog context so far, as sampling from a Gaussian Mixture Model in a high-dimensional joint multi-modal embedding space that embeds both the text and the image queries. Our system remembers the context of the dialog and uses an exploration-exploitation paradigm to assist in visual browsing. We train and evaluate the system on a multi-modal dialog dataset that we generate from large catalog data. Our experiments are promising and show that the agent is capable of learning and can display relevant results with an average cosine similarity of 0.85 to the ground truth. Our preliminary human evaluation also corroborates that such a multi-modal dialog system for visual browsing is well received and is capable of engaging human users.
Tasks
Published	2019-01-28
URL	http://arxiv.org/abs/1901.09854v2
PDF	http://arxiv.org/pdf/1901.09854v2.pdf
PWC	https://paperswithcode.com/paper/multi-modal-dialog-for-browsing-large-visual
Repo
Framework
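
The "showing $k$ best images" formulation in the abstract can be sketched as follows. This is an illustration only, not the authors' system: it assumes isotropic Gaussian components centred on embeddings of earlier dialog turns, a catalog that is already embedded in the same space, and retrieval by cosine similarity. The names `sample_gmm` and `show_k_images` are hypothetical, and using `sigma` as the exploration knob is an assumption.

```python
import math
import random
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def sample_gmm(means: List[List[float]], weights: List[float],
               sigma: float, rng: random.Random) -> List[float]:
    """Draw one point from an isotropic Gaussian mixture."""
    mean = rng.choices(means, weights=weights, k=1)[0]
    return [rng.gauss(m, sigma) for m in mean]


def show_k_images(catalog: List[List[float]], means: List[List[float]],
                  weights: List[float], sigma: float, k: int,
                  rng: random.Random) -> List[int]:
    """Return indices of k distinct catalog items, one per mixture sample.

    A small sigma stays close to the dialog context (exploitation);
    a larger sigma spreads samples over the catalog (exploration).
    """
    chosen: List[int] = []
    for _ in range(min(k, len(catalog))):
        z = sample_gmm(means, weights, sigma, rng)
        remaining = [i for i in range(len(catalog)) if i not in chosen]
        chosen.append(max(remaining, key=lambda i: cosine(catalog[i], z)))
    return chosen


if __name__ == "__main__":
    catalog = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.7, 0.7]]
    print(show_k_images(catalog, [[1.0, 0.0]], [1.0], 1e-6, 2, random.Random(0)))
```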

Domain Adversarial Reinforcement Learning for Partial Domain Adaptation


Title	Domain Adversarial Reinforcement Learning for Partial Domain Adaptation
Authors	Jin Chen, Xinxiao Wu, Lixin Duan, Shenghua Gao
Abstract	Partial domain adaptation aims to transfer knowledge from a label-rich source domain to a label-scarce target domain, relaxing the assumption of a fully shared label space across domains. In this more general and practical scenario, a major challenge is how to select source instances in the shared classes across different domains for positive transfer. To address this issue, we propose a Domain Adversarial Reinforcement Learning (DARL) framework to automatically select source instances in the shared classes, circumventing negative transfer, and to simultaneously learn transferable features between domains by reducing the domain shift. Specifically, in this framework, we employ deep Q-learning to learn policies for an agent to make selection decisions by approximating the action-value function. Moreover, domain adversarial learning is introduced to learn domain-invariant features for the source instances selected by the agent and the target instances, and also to determine rewards for the agent based on how relevant the selected source instances are to the target domain. Experiments on several benchmark datasets demonstrate the superior performance of our DARL method over existing state-of-the-art methods for partial domain adaptation.
Tasks	Domain Adaptation, Partial Domain Adaptation, Q-Learning
Published	2019-05-10
URL	https://arxiv.org/abs/1905.04094v1
PDF	https://arxiv.org/pdf/1905.04094v1.pdf
PWC	https://paperswithcode.com/paper/domain-adversarial-reinforcement-learning-for
Repo
Framework

Query-by-example on-device keyword spotting


Title	Query-by-example on-device keyword spotting
Authors	Byeonggeun Kim, Mingu Lee, Jinkyu Lee, Yeonseok Kim, Kyuwoong Hwang
Abstract	A keyword spotting (KWS) system determines whether a keyword, usually predefined, is present in a continuous speech stream. This paper presents a query-by-example on-device KWS system which is user-specific. The proposed system consists of two main steps: query enrollment and testing. In the query enrollment step, phonetic posteriors are output by a small-footprint automatic speech recognition model based on connectionist temporal classification. Using the phonetic-level posteriorgram, a hypothesis graph in the form of a finite-state transducer (FST) is built, so any keyword can be enrolled, which avoids the out-of-vocabulary problem. In testing, a log-likelihood is scored for input audio using the FST. We propose a threshold prediction method that uses only the user-specific keyword hypothesis. The system generates query-specific negatives by rearranging each query utterance in waveform. The threshold is decided based on the enrollment queries and generated negatives. We tested two keywords in English, and the proposed work shows promising performance while preserving simplicity.
Tasks	Keyword Spotting, Speech Recognition
Published	2019-10-11
URL	https://arxiv.org/abs/1910.05171v3
PDF	https://arxiv.org/pdf/1910.05171v3.pdf
PWC	https://paperswithcode.com/paper/query-by-example-on-device-keyword-spotting
Repo
Framework

Reinforcement Learning in Healthcare: A Survey


Title	Reinforcement Learning in Healthcare: A Survey
Authors	Chao Yu, Jiming Liu, Shamim Nemati
Abstract	As a subfield of machine learning, reinforcement learning (RL) aims at empowering one’s capabilities in behavioural decision making by using interaction experience with the world and evaluative feedback. Unlike traditional supervised learning methods that usually rely on one-shot, exhaustive and supervised reward signals, RL tackles sequential decision making problems with sampled, evaluative and delayed feedback simultaneously. Such distinctive features make RL techniques a suitable candidate for developing powerful solutions in a variety of healthcare domains, where diagnostic decisions or treatment regimes are usually characterized by a prolonged and sequential procedure. This survey discusses the broad applications of RL techniques in healthcare domains, in order to provide the research community with a systematic understanding of theoretical foundations, enabling methods and techniques, existing challenges, and new insights of this emerging paradigm. After briefly examining theoretical foundations and key techniques in RL research from efficient and representational directions, we provide an overview of RL applications in a variety of healthcare domains, ranging from dynamic treatment regimes in chronic diseases and critical care, to automated medical diagnosis from both unstructured and structured clinical data, to many other control or scheduling domains that have infiltrated many aspects of a healthcare system. Finally, we summarize the challenges and open issues in current research, and point out some potential solutions and directions for future research.
Tasks	Decision Making, Medical Diagnosis
Published	2019-08-22
URL	https://arxiv.org/abs/1908.08796v1
PDF	https://arxiv.org/pdf/1908.08796v1.pdf
PWC	https://paperswithcode.com/paper/reinforcement-learning-in-healthcare-a-survey
Repo
Framework

An End-to-End Encrypted Neural Network for Gradient Updates Transmission in Federated Learning


Title	An End-to-End Encrypted Neural Network for Gradient Updates Transmission in Federated Learning
Authors	Hongyu Li, Tianqi Han
Abstract	Federated learning is a distributed learning method to train a shared model by aggregating the locally-computed gradient updates. In federated learning, bandwidth and privacy are two main concerns of gradient updates transmission. This paper proposes an end-to-end encrypted neural network for gradient updates transmission. This network first encodes the input gradient updates into a lower-dimensional space in each client, which significantly reduces the communication load in federated learning. The encoded gradient updates are directly recovered as a whole, i.e. the aggregated gradient updates of the trained model, in the decoding layers of the network on the server. In this way, gradient updates encrypted in each client are not only protected from interception during communication, but also unknown to the server. Based on the encrypted neural network, a novel federated learning framework is designed for real applications. Experimental results show that the proposed network can effectively achieve two goals, privacy protection and data compression, at a small cost in model accuracy in federated learning.
Tasks
Published	2019-08-22
URL	https://arxiv.org/abs/1908.08340v1
PDF	https://arxiv.org/pdf/1908.08340v1.pdf
PWC	https://paperswithcode.com/paper/an-end-to-end-encrypted-neural-network-for
Repo
Framework

Applying Constraint Logic Programming to SQL Semantic Analysis


Title	Applying Constraint Logic Programming to SQL Semantic Analysis
Authors	Fernando Sáenz-Pérez
Abstract	This paper proposes the use of Constraint Logic Programming (CLP) to model SQL queries in a data-independent abstract layer by focusing on some semantic properties for signalling possible errors in such queries. First, we define a translation from SQL to Datalog, and from Datalog to CLP, so that solving this CLP program will give information about inconsistency, tautology, and possible simplifications. We use different constraint domains which are mapped to SQL types, and propose that they cooperate to improve accuracy. Our approach leverages a deductive system that includes SQL and Datalog, and we present an implementation in this system which is currently being tested in the classroom, showing its advantages and differences with respect to other approaches, as well as some performance data. This paper is under consideration for acceptance in TPLP.
Tasks
Published	2019-07-25
URL	https://arxiv.org/abs/1907.10914v1
PDF	https://arxiv.org/pdf/1907.10914v1.pdf
PWC	https://paperswithcode.com/paper/applying-constraint-logic-programming-to-sql
Repo
Framework

Reconstructing dynamical networks via feature ranking


Title	Reconstructing dynamical networks via feature ranking
Authors	Marc G. Leguia, Zoran Levnajic, Ljupco Todorovski, Bernard Zenko
Abstract	Empirical data on real complex systems are becoming increasingly available. Parallel to this is the need for new methods of reconstructing (inferring) the topology of networks from time-resolved observations of their node-dynamics. The methods based on physical insights often rely on strong assumptions about the properties and dynamics of the scrutinized network. Here, we use the insights from machine learning to design a new method of network reconstruction that essentially makes no such assumptions. Specifically, we interpret the available trajectories (data) as features, and use two independent feature ranking approaches – Random forest and RReliefF – to rank the importance of each node for predicting the value of each other node, which yields the reconstructed adjacency matrix. We show that our method is fairly robust to coupling strength, system size, trajectory length and noise. We also find that the reconstruction quality strongly depends on the dynamical regime.
Tasks
Published	2019-02-11
URL	https://arxiv.org/abs/1902.03896v2
PDF	https://arxiv.org/pdf/1902.03896v2.pdf
PWC	https://paperswithcode.com/paper/reconstructing-dynamical-networks-via-feature
Repo
Framework

Evaluating the Stability of Recurrent Neural Models during Training with Eigenvalue Spectra Analysis


Title	Evaluating the Stability of Recurrent Neural Models during Training with Eigenvalue Spectra Analysis
Authors	Priyadarshini Panda, Efstathia Soufleri, Kaushik Roy
Abstract	We analyze the stability of recurrent networks, specifically reservoir computing models, during training by evaluating the eigenvalue spectra of the reservoir dynamics. To circumvent the instability arising in examining a closed-loop reservoir system with feedback, we propose to break the closed-loop system. Essentially, we unroll the reservoir dynamics over time while incorporating the feedback effects that preserve the overall temporal integrity of the system. We evaluate our methodology for fixed-point and time-varying targets with least squares regression and FORCE training, respectively. Our analysis establishes eigenvalue spectra (that is, the shrinking of the spectral circle as training progresses) as a valid and effective metric to gauge the convergence of training as well as the convergence of the chaotic activity of the reservoir toward stable states.
Tasks
Published	2019-05-08
URL	https://arxiv.org/abs/1905.03219v1
PDF	https://arxiv.org/pdf/1905.03219v1.pdf
PWC	https://paperswithcode.com/paper/evaluating-the-stability-of-recurrent-neural
Repo
Framework

Improving Reverberant Speech Training Using Diffuse Acoustic Simulation


Title	Improving Reverberant Speech Training Using Diffuse Acoustic Simulation
Authors	Zhenyu Tang, Lianwu Chen, Bo Wu, Dong Yu, Dinesh Manocha
Abstract	We present an efficient and realistic geometric acoustic simulation approach for generating and augmenting training data in speech-related machine learning tasks. Our physically-based acoustic simulation method is capable of modeling occlusion, specular and diffuse reflections of sound in complicated acoustic environments, whereas the classical image method can only model specular reflections in simple room settings. We show that by using our synthetic training data, the same neural networks gain significant performance improvement on real test sets in far-field speech recognition by 1.58% and keyword spotting by 21%, without fine-tuning using real impulse responses.
Tasks	Keyword Spotting, Speech Recognition
Published	2019-07-09
URL	https://arxiv.org/abs/1907.03988v4
PDF	https://arxiv.org/pdf/1907.03988v4.pdf
PWC	https://paperswithcode.com/paper/improving-reverberant-speech-training-using
Repo
Framework

Towards Reliable Online Clickbait Video Detection: A Content-Agnostic Approach


Title	Towards Reliable Online Clickbait Video Detection: A Content-Agnostic Approach
Authors	Lanyu Shang, Daniel Zhang, Michael Wang, Shuyue Lai, Dong Wang
Abstract	Online video sharing platforms (e.g., YouTube, Vimeo) have become an increasingly popular paradigm for people to consume video content. Clickbait video, whose content clearly deviates from its title/thumbnail, has emerged as a critical problem on online video sharing platforms. Current clickbait detection solutions that mainly focus on analyzing the text of the title, the image of the thumbnail, or the content of the video are shown to be suboptimal in detecting online clickbait videos. In this paper, we develop a novel content-agnostic scheme, Online Video Clickbait Protector (OVCP), to effectively detect clickbait videos by exploring the comments from the audience who watched the video. Different from existing solutions, OVCP does not directly analyze the content of the video and its pre-click information (e.g., title and thumbnail). Therefore, it is robust against sophisticated content creators who often generate clickbait videos that can bypass the current clickbait detectors. We evaluate OVCP with a real-world dataset collected from YouTube. Experimental results demonstrate that OVCP is effective in identifying clickbait videos and significantly outperforms both state-of-the-art baseline models and human annotators.
Tasks	Clickbait Detection
Published	2019-07-17
URL	https://arxiv.org/abs/1907.07604v2
PDF	https://arxiv.org/pdf/1907.07604v2.pdf
PWC	https://paperswithcode.com/paper/towards-reliable-online-clickbait-video
Repo
Framework

Learning a sparse database for patch-based medical image segmentation


Title	Learning a sparse database for patch-based medical image segmentation
Authors	Moti Freiman, Hannes Nickisch, Holger Schmitt, Pal Maurovich-Horvat, Patrick Donnelly, Mani Vembar, Liran Goshen
Abstract	We introduce a functional for the learning of an optimal database for patch-based image segmentation with application to coronary lumen segmentation from coronary computed tomography angiography (CCTA) data. The proposed functional consists of fidelity, sparseness and robustness to small-variations terms and their associated weights. Existing work addresses database optimization by prototype selection, aiming to optimize the database by either adding or removing prototypes according to a set of predefined rules. In contrast, we formulate the database optimization task as an energy minimization problem that can be solved using standard numerical tools. We apply the proposed database optimization functional to the task of optimizing a database for patch-based coronary lumen segmentation. Our experiments using the publicly available MICCAI 2012 coronary lumen segmentation challenge data show that optimizing the database using the proposed approach reduced database size by 96% while maintaining the same level of lumen segmentation accuracy. Moreover, we show that the optimized database yields an improved specificity of CCTA-based fractional flow reserve (0.73 vs 0.7 for all lesions and 0.68 vs 0.65 for obstructive lesions) using a training set of 132 (76 obstructive) coronary lesions with invasively measured FFR as the reference.
Tasks	Medical Image Segmentation, Semantic Segmentation
Published	2019-06-25
URL	https://arxiv.org/abs/1906.10338v1
PDF	https://arxiv.org/pdf/1906.10338v1.pdf
PWC	https://paperswithcode.com/paper/learning-a-sparse-database-for-patch-based
Repo
Framework

A Monaural Speech Enhancement Method for Robust Small-Footprint Keyword Spotting


Title	A Monaural Speech Enhancement Method for Robust Small-Footprint Keyword Spotting
Authors	Yue Gu, Zhihao Du, Hui Zhang, Xueliang Zhang
Abstract	Robustness against noise is critical for keyword spotting (KWS) in real-world environments. To improve the robustness, a speech enhancement front-end is involved. Instead of treating the speech enhancement as a separate preprocessing step before the KWS system, in this study, a pre-trained speech enhancement front-end and a convolutional neural network (CNN) based KWS system are concatenated, where a feature transformation block is used to transform the output from the enhancement front-end into the KWS system’s input. The whole model is trained jointly, so the linguistic and other useful information from the KWS system can be back-propagated to the enhancement front-end to improve its performance. To fit small-footprint devices, a novel convolutional recurrent network is proposed, which needs fewer parameters and less computation and does not degrade performance. Furthermore, by changing the input features from the power spectrogram to the Mel-spectrogram, less computation and better performance are obtained. Our experimental results demonstrate that the proposed method significantly improves the KWS system with respect to noise robustness.
Tasks	Keyword Spotting, Small-Footprint Keyword Spotting, Speech Enhancement
Published	2019-06-20
URL	https://arxiv.org/abs/1906.08415v1
PDF	https://arxiv.org/pdf/1906.08415v1.pdf
PWC	https://paperswithcode.com/paper/a-monaural-speech-enhancement-method-for
Repo
Framework

Perception Evaluation – A new solar image quality metric based on the multi-fractal property of texture features


Title	Perception Evaluation – A new solar image quality metric based on the multi-fractal property of texture features
Authors	Yi Huang, Peng Jia, Dongmei Cai, Bojun Cai
Abstract	Next-generation ground-based solar observations require good image quality metrics for post-facto processing techniques. Based on the assumption that texture features in solar images are multi-fractal and can be extracted by a trained deep neural network as feature maps, a new reduced-reference objective image quality metric, the perception evaluation, is proposed. The perception evaluation is defined as the cosine distance between the Gram matrices of feature maps extracted from a high-resolution reference image and from blurred images. We evaluate the performance of the perception evaluation with simulated and real observation images. The results show that, with a high-resolution image as reference, the perception evaluation can give a robust estimate of image quality for solar images of different scenes.
Tasks
Published	2019-05-24
URL	https://arxiv.org/abs/1905.09980v2
PDF	https://arxiv.org/pdf/1905.09980v2.pdf
PWC	https://paperswithcode.com/paper/perception-evaluation-a-new-solar-image
Repo
Framework
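
The metric described in the abstract, a cosine distance between Gram matrices of feature maps, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the feature maps are taken as given (the trained network that extracts them is omitted), each map is a list of channels flattened to equal length, and the function names are hypothetical.

```python
import math
from typing import List

FeatureMaps = List[List[float]]  # C channels, each flattened to N values


def gram_matrix(maps: FeatureMaps) -> List[List[float]]:
    """C x C matrix of inner products between channels."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in maps] for fi in maps]


def gram_cosine_distance(ref_maps: FeatureMaps, test_maps: FeatureMaps) -> float:
    """1 - cosine similarity between the flattened Gram matrices."""
    g_ref = [v for row in gram_matrix(ref_maps) for v in row]
    g_test = [v for row in gram_matrix(test_maps) for v in row]
    if len(g_ref) != len(g_test):
        raise ValueError("both inputs need the same number of channels")
    dot = sum(a * b for a, b in zip(g_ref, g_test))
    n_ref = math.sqrt(sum(a * a for a in g_ref))
    n_test = math.sqrt(sum(b * b for b in g_test))
    if n_ref == 0 or n_test == 0:
        raise ValueError("Gram matrix is all zeros")
    return 1.0 - dot / (n_ref * n_test)


if __name__ == "__main__":
    reference = [[1.0, 0.0], [0.0, 1.0]]  # toy stand-in for sharp-image features
    blurred = [[1.0, 1.0], [1.0, 1.0]]    # toy stand-in for blurred-image features
    print(gram_cosine_distance(reference, reference))
    print(gram_cosine_distance(reference, blurred))
```

Because the distance is computed on Gram matrices, it compares channel correlations rather than pixel positions, and it is unchanged when all features are scaled by a constant.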

Scalable Neural Architecture Search for 3D Medical Image Segmentation


Title	Scalable Neural Architecture Search for 3D Medical Image Segmentation
Authors	Sungwoong Kim, Ildoo Kim, Sungbin Lim, Woonhyuk Baek, Chiheon Kim, Hyungjoo Cho, Boogeon Yoon, Taesup Kim
Abstract	In this paper, a neural architecture search (NAS) framework is proposed for 3D medical image segmentation, to automatically optimize a neural architecture from a large design space. Our NAS framework searches the structure of each layer, including neural connectivities and operation types, in both the encoder and the decoder. Since optimizing over a large discrete architecture space is difficult due to high-resolution 3D medical images, a novel stochastic sampling algorithm based on a continuous relaxation is also proposed for scalable gradient-based optimization. On 3D medical image segmentation tasks with a benchmark dataset, an architecture automatically designed by the proposed NAS framework outperforms the human-designed 3D U-Net, and moreover this optimized architecture is well suited to be transferred to different tasks.
Tasks	Medical Image Segmentation, Neural Architecture Search, Semantic Segmentation
Published	2019-06-13
URL	https://arxiv.org/abs/1906.05956v1
PDF	https://arxiv.org/pdf/1906.05956v1.pdf
PWC	https://paperswithcode.com/paper/scalable-neural-architecture-search-for-3d
Repo
Framework