January 27, 2020

3253 words 16 mins read

Paper Group ANR 1160

Exploring Stereovision-Based 3-D Scene Reconstruction for Augmented Reality

Title Exploring Stereovision-Based 3-D Scene Reconstruction for Augmented Reality
Authors Guang-Yu Nie, Yun Liu, Cong Wang, Yue Liu, Yongtian Wang
Abstract Three-dimensional (3-D) scene reconstruction is one of the key techniques in Augmented Reality (AR), which is related to the integration of image processing and display systems of complex information. Stereo matching is a computer vision based approach for 3-D scene reconstruction. In this paper, we explore an improved stereo matching network, SLED-Net, in which a Single Long Encoder-Decoder is proposed to replace the stacked hourglass network in PSM-Net for better contextual information learning. We compare SLED-Net to state-of-the-art methods recently published, and demonstrate its superior performance on Scene Flow and KITTI2015 test sets.
Tasks Stereo Matching
Published 2019-02-17
URL http://arxiv.org/abs/1902.06255v1
PDF http://arxiv.org/pdf/1902.06255v1.pdf
PWC https://paperswithcode.com/paper/exploring-stereovision-based-3-d-scene
Repo
Framework
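
As a rough illustration of the idea in the abstract above, the sketch below replaces a stacked-hourglass cost-volume aggregator with a single, longer 3-D encoder-decoder. It is a minimal PyTorch sketch with assumed channel sizes and depths, not the authors' SLED-Net implementation.

```python
# A minimal sketch (not the authors' code) of the SLED-Net idea: a single,
# longer 3-D encoder-decoder over a stereo cost volume instead of stacked
# hourglasses. Channel sizes and depth are illustrative assumptions.
import torch
import torch.nn as nn

def conv3d_bn(in_ch, out_ch, stride=1):
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),
    )

class SingleLongEncoderDecoder(nn.Module):
    """One deep encoder-decoder over a cost volume of shape (B, C, D, H, W)."""
    def __init__(self, channels=32):
        super().__init__()
        self.enc1 = conv3d_bn(channels, channels * 2, stride=2)   # downsample D, H, W
        self.enc2 = conv3d_bn(channels * 2, channels * 4, stride=2)
        self.enc3 = conv3d_bn(channels * 4, channels * 4)
        self.dec2 = nn.ConvTranspose3d(channels * 4, channels * 2, 3, stride=2,
                                       padding=1, output_padding=1)
        self.dec1 = nn.ConvTranspose3d(channels * 2, channels, 3, stride=2,
                                       padding=1, output_padding=1)
        self.head = nn.Conv3d(channels, 1, kernel_size=3, padding=1)

    def forward(self, cost_volume):
        e1 = self.enc1(cost_volume)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d2 = self.dec2(e3) + e1            # long skip connections
        d1 = self.dec1(d2) + cost_volume
        return self.head(d1)               # per-disparity matching cost

# Smoke test on a toy cost volume: batch 1, 32 feature channels,
# 48 disparity levels, 64x128 feature map.
if __name__ == "__main__":
    vol = torch.randn(1, 32, 48, 64, 128)
    print(SingleLongEncoderDecoder(32)(vol).shape)  # -> (1, 1, 48, 64, 128)
```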

Challenging the Boundaries of Speech Recognition: The MALACH Corpus

Title Challenging the Boundaries of Speech Recognition: The MALACH Corpus
Authors Michael Picheny, Zoltán Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon
Abstract There has been huge progress in speech recognition over the last several years. Tasks once thought extremely difficult, such as SWITCHBOARD, now approach levels of human performance. The MALACH corpus (LDC catalog LDC2012S05), a 375-hour subset of a large archive of Holocaust testimonies collected by the Survivors of the Shoah Visual History Foundation, presents significant challenges to the speech community. The collection consists of unconstrained, natural speech filled with disfluencies, heavy accents, age-related coarticulations, un-cued speaker and language switching, and emotional speech - all still open problems for speech recognition systems. Transcription is challenging even for skilled human annotators. This paper proposes that the community focus on the MALACH corpus to develop speech recognition systems that are more robust with respect to accents, disfluencies and emotional speech. To reduce the barrier for entry, a lexicon and training and testing setups have been created and baseline results using current deep learning technologies are presented. The metadata has just been released by LDC (LDC2019S11). It is hoped that this resource will enable the community to build on top of these baselines so that the extremely important information in these and related oral histories becomes accessible to a wider audience.
Tasks Speech Recognition
Published 2019-08-09
URL https://arxiv.org/abs/1908.03455v1
PDF https://arxiv.org/pdf/1908.03455v1.pdf
PWC https://paperswithcode.com/paper/challenging-the-boundaries-of-speech
Repo
Framework

A New CGAN Technique for Constrained Topology Design Optimization

Title A New CGAN Technique for Constrained Topology Design Optimization
Authors M. -H. Herman Shen, Liang Chen
Abstract This paper presents a new conditional GAN (named convex relaxing CGAN or crCGAN) to replicate the conventional constrained topology optimization algorithms in an extremely effective and efficient process. The proposed crCGAN consists of a generator and a discriminator, both of which are deep convolutional neural networks (CNN), and the topology design constraint can be conditionally set to both the generator and discriminator. In order to improve the training efficiency and accuracy despite the dependency between the training images and the condition, a variety of crCGAN formulations are introduced to relax the non-convex design space. These new formulations were evaluated and validated via a series of comprehensive experiments. Moreover, a minibatch discrimination technique was introduced in the crCGAN training process to stabilize the convergence and avoid the mode collapse problems. Additional verifications were conducted using the standard MNIST digit and CIFAR-10 image benchmarks conditioned on class labels. The experimental evaluations clearly reveal that the new objective formulation with the minibatch discrimination training provides not only accuracy but also consistency in the designs.
Tasks
Published 2019-01-22
URL http://arxiv.org/abs/1901.07675v2
PDF http://arxiv.org/pdf/1901.07675v2.pdf
PWC https://paperswithcode.com/paper/a-new-cgan-technique-for-constrained-topology
Repo
Framework
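
The sketch below illustrates the conditioning mechanism described above: a small design-constraint vector is fed to both the generator and the discriminator of a conditional GAN. Architecture sizes, the constraint encoding, and the omission of minibatch discrimination are simplifying assumptions; this is not the authors' crCGAN.

```python
# A minimal, generic conditional-GAN sketch showing how a topology-design
# constraint (here just a small vector, e.g. volume fraction and load position)
# can condition both generator and discriminator. All sizes are illustrative.
import torch
import torch.nn as nn

NOISE_DIM, COND_DIM, IMG_PIXELS = 64, 4, 64 * 64

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + COND_DIM, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, IMG_PIXELS), nn.Tanh(),   # 64x64 topology image in [-1, 1]
        )

    def forward(self, z, cond):
        return self.net(torch.cat([z, cond], dim=1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(IMG_PIXELS + COND_DIM, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),                       # real/fake logit
        )

    def forward(self, img, cond):
        return self.net(torch.cat([img, cond], dim=1))

# One conditional step: sample noise, attach the design constraint, and push
# the generated design through the conditioned discriminator.
if __name__ == "__main__":
    g, d = Generator(), Discriminator()
    z = torch.randn(8, NOISE_DIM)
    cond = torch.rand(8, COND_DIM)        # e.g. normalized constraint values
    fake = g(z, cond)
    print(d(fake, cond).shape)            # -> (8, 1)
```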

MemGuard: Defending against Black-Box Membership Inference Attacks via Adversarial Examples

Title MemGuard: Defending against Black-Box Membership Inference Attacks via Adversarial Examples
Authors Jinyuan Jia, Ahmed Salem, Michael Backes, Yang Zhang, Neil Zhenqiang Gong
Abstract In a membership inference attack, an attacker aims to infer whether a data sample is in a target classifier’s training dataset or not. Specifically, given black-box access to the target classifier, the attacker trains a binary classifier, which takes a data sample’s confidence score vector predicted by the target classifier as an input and predicts the data sample to be a member or non-member of the target classifier’s training dataset. Membership inference attacks pose severe privacy and security threats to the training dataset. Most existing defenses leverage differential privacy when training the target classifier or regularize the training process of the target classifier. These defenses suffer from two key limitations: 1) they do not have formal utility-loss guarantees of the confidence score vectors, and 2) they achieve suboptimal privacy-utility tradeoffs. In this work, we propose MemGuard, the first defense with formal utility-loss guarantees against black-box membership inference attacks. Instead of tampering with the training process of the target classifier, MemGuard adds noise to each confidence score vector predicted by the target classifier. Our key observation is that the attacker uses a classifier to predict member or non-member, and this classifier is vulnerable to adversarial examples. Based on this observation, we propose to add a carefully crafted noise vector to a confidence score vector to turn it into an adversarial example that misleads the attacker’s classifier. Our experimental results on three datasets show that MemGuard can effectively defend against membership inference attacks and achieve better privacy-utility tradeoffs than existing defenses. Our work is the first one to show that adversarial examples can be used as defensive mechanisms to defend against membership inference attacks.
Tasks Inference Attack
Published 2019-09-23
URL https://arxiv.org/abs/1909.10594v3
PDF https://arxiv.org/pdf/1909.10594v3.pdf
PWC https://paperswithcode.com/paper/memguard-defending-against-black-box
Repo
Framework
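
A simplified illustration of the core mechanism described above: perturb a confidence score vector so that a surrogate membership classifier is pushed toward an uninformative output, while the vector stays a valid probability distribution and the predicted label is preserved. The plain gradient procedure below is a sketch under those assumptions, not the paper's utility-loss-bounded algorithm.

```python
# A simplified, illustrative take on MemGuard's core idea (not the paper's
# exact optimization): nudge a confidence vector so a surrogate membership
# classifier outputs ~0.5, keeping a valid distribution and the same argmax.
import torch
import torch.nn.functional as F

def memguard_like_noise(conf, attack_model, steps=20, lr=0.1):
    """conf: (C,) probability vector from the target classifier."""
    label = conf.argmax()
    logits = conf.clamp_min(1e-8).log().clone().requires_grad_(True)
    for _ in range(steps):
        probs = F.softmax(logits, dim=0)
        member_score = attack_model(probs)              # scalar in (0, 1)
        loss = (member_score - 0.5) ** 2                # push toward "don't know"
        grad, = torch.autograd.grad(loss, logits)
        with torch.no_grad():
            candidate = logits - lr * grad
            # only accept steps that keep the predicted label unchanged
            if F.softmax(candidate, dim=0).argmax() == label:
                logits.copy_(candidate)
    return F.softmax(logits.detach(), dim=0)

if __name__ == "__main__":
    # Surrogate attacker: a tiny fixed MLP standing in for the attacker's model.
    attacker = torch.nn.Sequential(torch.nn.Linear(10, 8), torch.nn.ReLU(),
                                   torch.nn.Linear(8, 1), torch.nn.Sigmoid())
    conf = F.softmax(torch.randn(10), dim=0)
    noisy = memguard_like_noise(conf, lambda p: attacker(p).squeeze())
    print(conf.argmax().item(), noisy.argmax().item())  # same predicted label
```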

Remote Heart Rate Measurement from Highly Compressed Facial Videos: an End-to-end Deep Learning Solution with Video Enhancement

Title Remote Heart Rate Measurement from Highly Compressed Facial Videos: an End-to-end Deep Learning Solution with Video Enhancement
Authors Zitong Yu, Wei Peng, Xiaobai Li, Xiaopeng Hong, Guoying Zhao
Abstract Remote photoplethysmography (rPPG), which aims at measuring heart activities without any contact, has great potential in many applications (e.g., remote healthcare). Existing rPPG approaches rely on analyzing very fine details of facial videos, which are prone to be affected by video compression. Here we propose a two-stage, end-to-end method using hidden rPPG information enhancement and attention networks, which is the first attempt to counter video compression loss and recover rPPG signals from highly compressed videos. The method includes two parts: 1) a Spatio-Temporal Video Enhancement Network (STVEN) for video enhancement, and 2) an rPPG network (rPPGNet) for rPPG signal recovery. The rPPGNet can work on its own for robust rPPG measurement, and the STVEN network can be added and jointly trained to further boost the performance, especially on highly compressed videos. Comprehensive experiments are performed on two benchmark datasets to show that 1) the proposed method not only achieves superior performance on compressed videos when paired high-quality videos are available, but 2) it also generalizes well on novel data with only compressed videos available, which implies promising potential for real-world applications.
Tasks Video Compression
Published 2019-07-27
URL https://arxiv.org/abs/1907.11921v1
PDF https://arxiv.org/pdf/1907.11921v1.pdf
PWC https://paperswithcode.com/paper/remote-heart-rate-measurement-from-highly
Repo
Framework
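
The two-stage structure described above can be sketched as a video-to-video enhancement network chained with a video-to-signal network that can be trained jointly. Shapes and layer sizes below are assumptions for illustration; this is not the STVEN/rPPGNet architecture.

```python
# A bare-bones sketch of the two-stage pipeline: an enhancement network maps a
# compressed clip to an enhanced clip, and a signal network maps the clip to a
# 1-D rPPG signal; the two can be chained and trained jointly.
import torch
import torch.nn as nn

class EnhanceNet(nn.Module):                 # video -> enhanced video
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 3, 3, padding=1),
        )
    def forward(self, clip):                 # (B, 3, T, H, W)
        return clip + self.body(clip)        # residual enhancement

class SignalNet(nn.Module):                  # video -> rPPG signal of length T
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),    # keep time axis, pool space
        )
        self.head = nn.Conv1d(16, 1, kernel_size=1)
    def forward(self, clip):
        f = self.features(clip).squeeze(-1).squeeze(-1)   # (B, 16, T)
        return self.head(f).squeeze(1)                    # (B, T)

if __name__ == "__main__":
    clip = torch.randn(2, 3, 64, 64, 64)     # 64-frame compressed face clips
    signal = SignalNet()(EnhanceNet()(clip))
    print(signal.shape)                      # -> (2, 64)
```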

A Deep Dive into Understanding Tumor Foci Classification using Multiparametric MRI Based on Convolutional Neural Network

Title A Deep Dive into Understanding Tumor Foci Classification using Multiparametric MRI Based on Convolutional Neural Network
Authors Weiwei Zong, Joon Lee, Chang Liu, Eric Carver, Aharon Feldman, Branislava Janic, Mohamed Elshaikh, Milan Pantelic, David Hearshen, Indrin Chetty, Benjamin Movsas, Ning Wen
Abstract Data scarcity has kept deep learning models from making greater progress in prostate image analysis using multiparametric MRI. In this paper, an efficient convolutional neural network (CNN) was developed to classify lesion malignancy for prostate cancer patients, and its model interpretation was systematically analyzed to bridge the gap between natural images and MR images, the first analysis of its kind in the literature. The problem of small sample size was successfully tackled by feeding the intermediate features into a traditional classification algorithm known as the weighted extreme learning machine, with the imbalanced distribution among output categories taken into consideration. The model trained on a public data set was able to generalize to data from an independent institution and make accurate predictions. The generated saliency map was found to overlay well with the lesion and could benefit clinicians for diagnostic purposes.
Tasks
Published 2019-03-29
URL http://arxiv.org/abs/1903.12331v2
PDF http://arxiv.org/pdf/1903.12331v2.pdf
PWC https://paperswithcode.com/paper/a-deep-dive-into-understanding-tumor-foci
Repo
Framework
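
The classification step described above, feeding intermediate CNN features into a weighted extreme learning machine, can be sketched as a random hidden projection followed by class-weighted ridge regression. The feature dimensions, weighting scheme, and regularization constant below are illustrative assumptions.

```python
# A hedged numpy sketch of a weighted extreme learning machine (ELM) applied to
# CNN features, with inverse-frequency sample weights to handle class imbalance.
import numpy as np

def weighted_elm_fit(X, y, hidden=256, C=1.0, seed=0):
    """X: (n, d) CNN features, y: (n,) binary labels in {0, 1}."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden))
    b = rng.standard_normal(hidden)
    H = np.tanh(X @ W + b)                              # random hidden layer
    counts = np.bincount(y, minlength=2)
    w = 1.0 / counts[y]                                 # inverse-frequency weights
    T = np.where(y == 1, 1.0, -1.0)                     # +/-1 targets
    HW = H * w[:, None]
    beta = np.linalg.solve(H.T @ HW + np.eye(hidden) / C, H.T @ (w * T))
    return W, b, beta

def weighted_elm_predict(X, W, b, beta):
    return (np.tanh(X @ W + b) @ beta > 0).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.standard_normal((200, 128))             # stand-in CNN features
    labels = (rng.random(200) < 0.2).astype(int)        # imbalanced classes
    model = weighted_elm_fit(feats, labels)
    print((weighted_elm_predict(feats, *model) == labels).mean())
```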

Temporal Graph Kernels for Classifying Dissemination Processes

Title Temporal Graph Kernels for Classifying Dissemination Processes
Authors Lutz Oettershagen, Nils M. Kriege, Christopher Morris, Petra Mutzel
Abstract Many real-world graphs or networks are temporal, e.g., in a social network persons only interact at specific points in time. This information directs dissemination processes on the network, such as the spread of rumors, fake news, or diseases. However, the current state-of-the-art methods for supervised graph classification are designed mainly for static graphs and may not be able to capture temporal information. Hence, they are not powerful enough to distinguish between graphs modeling different dissemination processes. To address this, we introduce a framework to lift standard graph kernels to the temporal domain. Specifically, we explore three different approaches and investigate the trade-offs between loss of temporal information and efficiency. Moreover, to handle large-scale graphs, we propose stochastic variants of our kernels with provable approximation guarantees. We evaluate our methods on a wide range of real-world social networks. Our methods beat static kernels by a large margin in terms of accuracy while still being scalable to large graphs and data sets. Hence, we confirm that taking temporal information into account is crucial for the successful classification of dissemination processes.
Tasks Graph Classification
Published 2019-10-14
URL https://arxiv.org/abs/1911.05496v1
PDF https://arxiv.org/pdf/1911.05496v1.pdf
PWC https://paperswithcode.com/paper/temporal-graph-kernels-for-classifying
Repo
Framework
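
One simple way to lift a static kernel to the temporal domain, in the spirit of the framework above, is to fold interaction times into the node labels before applying an ordinary label-based kernel. The toy sketch below shows such a lifting with a single WL-style refinement; it is not necessarily one of the paper's three approaches.

```python
# A toy illustration of turning temporal information into something a static
# graph kernel can use: refine each node label by the multiset of
# (neighbor label, time bucket) pairs, then compare label histograms.
from collections import Counter

def temporal_wl_labels(nodes, temporal_edges, buckets=4, t_max=100):
    """temporal_edges: iterable of (u, v, t) undirected interactions."""
    labels = {u: 0 for u in nodes}                 # uniform initial labels
    neigh = {u: [] for u in nodes}
    for u, v, t in temporal_edges:
        bucket = min(int(buckets * t / t_max), buckets - 1)
        neigh[u].append((v, bucket))
        neigh[v].append((u, bucket))
    # one refinement: new label = old label + sorted multiset of (label, bucket)
    return {u: hash((labels[u], tuple(sorted((labels[v], b) for v, b in neigh[u]))))
            for u in nodes}

def histogram_kernel(labels_a, labels_b):
    ca, cb = Counter(labels_a.values()), Counter(labels_b.values())
    return sum(ca[k] * cb[k] for k in ca)          # dot product of label histograms

if __name__ == "__main__":
    g1 = ([0, 1, 2], [(0, 1, 5), (1, 2, 40)])      # interactions early, then mid
    g2 = ([0, 1, 2], [(0, 1, 90), (1, 2, 95)])     # all interactions late
    l1, l2 = temporal_wl_labels(*g1), temporal_wl_labels(*g2)
    print(histogram_kernel(l1, l1), histogram_kernel(l1, l2))
```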

Assessment of the Local Tchebichef Moments Method for Texture Classification by Fine Tuning Extraction Parameters

Title Assessment of the Local Tchebichef Moments Method for Texture Classification by Fine Tuning Extraction Parameters
Authors Andre Barczak, Napoleon Reyes, Teo Susnjak
Abstract In this paper we use machine learning to study the application of Local Tchebichef Moments (LTM) to the problem of texture classification. The original LTM method was proposed by Mukundan (2014). The LTM method can be used for texture analysis in many different ways, either using the moment values directly, or more simply creating a relationship between the moment values of different orders, producing a histogram similar to those of Local Binary Pattern (LBP) based methods. The original method was not fully tested with large datasets, and there are several parameters that should be characterised for performance. Among these parameters are the kernel size, the moment orders and the weights for each moment. We implemented the LTM method in a flexible way in order to allow for the modification of the parameters that can affect its performance. Using four subsets from the Outex dataset (a popular benchmark for texture analysis), we used Random Forests to create models and to classify texture images, recording the standard metrics for each classifier. We repeated the process using several variations of the LBP method for comparison. This allowed us to find the best combination of orders and weights for the LTM method for texture classification.
Tasks Texture Classification
Published 2019-10-22
URL https://arxiv.org/abs/1910.09758v1
PDF https://arxiv.org/pdf/1910.09758v1.pdf
PWC https://paperswithcode.com/paper/assessment-of-the-local-tchebichef-moments
Repo
Framework
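
A hedged sketch of the pipeline above: build an orthonormal discrete polynomial basis on a small window (the discrete Tchebichef basis up to sign), compute each block's moment matrix, and turn low-order moments into an LBP-like histogram. The kernel size, orders, and code construction below are illustrative, not the tuned parameters the paper searches over.

```python
# A hedged sketch of Local Tchebichef Moments for texture description; the
# LBP-style code construction is one simple choice, not the paper's setting.
import numpy as np

def tchebichef_basis(k):
    """Rows are orthonormal polynomials of increasing order on points 0..k-1."""
    x = np.arange(k, dtype=float)
    vandermonde = np.vander(x, k, increasing=True)     # columns 1, x, x^2, ...
    q, _ = np.linalg.qr(vandermonde)                   # orthonormal columns
    return q.T                                         # (order, position)

def ltm_histogram(image, k=5, orders=3):
    T = tchebichef_basis(k)
    h, w = image.shape
    codes = []
    for i in range(0, h - k + 1, k):
        for j in range(0, w - k + 1, k):
            m = T @ image[i:i + k, j:j + k] @ T.T      # moment matrix of the block
            low = m[:orders, :orders].ravel()
            # LBP-style code: sign of each low-order moment vs. the DC moment
            code = sum(int(low[n] > low[0]) << (n - 1) for n in range(1, len(low)))
            codes.append(code)
    n_bins = 2 ** (orders * orders - 1)
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / max(hist.sum(), 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    texture = rng.random((64, 64))
    print(ltm_histogram(texture)[:8])
```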

Neural Networks for Predicting Human Interactions in Repeated Games

Title Neural Networks for Predicting Human Interactions in Repeated Games
Authors Yoav Kolumbus, Gali Noti
Abstract We consider the problem of predicting human players’ actions in repeated strategic interactions. Our goal is to predict the dynamic step-by-step behavior of individual players in previously unseen games. We study the ability of neural networks to perform such predictions and the information that they require. We show on a dataset of normal-form games from experiments with human participants that standard neural networks are able to learn functions that provide more accurate predictions of the players’ actions than established models from behavioral economics. The networks outperform the other models in terms of prediction accuracy and cross-entropy, and yield higher economic value. We show that if the available input is only a short sequence of play, economic information about the game is important for predicting the behavior of human agents. However, interestingly, we find that when the networks are trained with long enough sequences of history of play, action-based networks do well and additional economic details about the game do not improve their performance, indicating that the sequence of actions encodes sufficient information for success in the prediction task.
Tasks
Published 2019-11-08
URL https://arxiv.org/abs/1911.03233v1
PDF https://arxiv.org/pdf/1911.03233v1.pdf
PWC https://paperswithcode.com/paper/neural-networks-for-predicting-human
Repo
Framework
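
A minimal example of the kind of action-based sequence model discussed above: an LSTM reads the one-hot history of both players' actions in a repeated 2x2 game and predicts the next action; payoff features could be concatenated to each step's input. Sizes and encodings are assumptions, not the paper's configuration.

```python
# A minimal action-based sequence predictor for repeated 2x2 games.
import torch
import torch.nn as nn

class ActionPredictor(nn.Module):
    def __init__(self, n_actions=2, hidden=32, game_feats=0):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2 * n_actions + game_feats,
                            hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_actions)

    def forward(self, history):              # (B, T, 2*n_actions + game_feats)
        h, _ = self.lstm(history)
        return self.out(h[:, -1])            # logits for the next action

if __name__ == "__main__":
    B, T = 16, 10
    # one-hot encode (own action, opponent action) for each past round
    own = torch.randint(0, 2, (B, T))
    opp = torch.randint(0, 2, (B, T))
    hist = torch.cat([nn.functional.one_hot(own, 2),
                      nn.functional.one_hot(opp, 2)], dim=-1).float()
    print(ActionPredictor()(hist).shape)     # -> (16, 2)
```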

Deep Discriminative Clustering Analysis

Title Deep Discriminative Clustering Analysis
Authors Jianlong Chang, Yiwen Guo, Lingfeng Wang, Gaofeng Meng, Shiming Xiang, Chunhong Pan
Abstract Traditional clustering methods often perform clustering with low-level, indiscriminative representations and ignore relationships between patterns, resulting in only modest gains in the era of deep learning. To handle this problem, we develop Deep Discriminative Clustering (DDC), which models the clustering task by investigating relationships between patterns with a deep neural network. Technically, a global constraint is introduced to adaptively estimate the relationships, and a local constraint is developed to endow the network with the capability of learning high-level discriminative representations. By iteratively training the network and estimating the relationships in a mini-batch manner, DDC theoretically converges, and the trained network can generate a group of discriminative representations that can be treated as clustering centers for straightforward clustering. Extensive experiments strongly demonstrate that DDC outperforms current methods on eight image, text and audio datasets concurrently.
Tasks
Published 2019-05-05
URL https://arxiv.org/abs/1905.01681v1
PDF https://arxiv.org/pdf/1905.01681v1.pdf
PWC https://paperswithcode.com/paper/deep-discriminative-clustering-analysis
Repo
Framework

Graph-Preserving Grid Layout: A Simple Graph Drawing Method for Graph Classification using CNNs

Title Graph-Preserving Grid Layout: A Simple Graph Drawing Method for Graph Classification using CNNs
Authors Yecheng Lyu, Xinming Huang, Ziming Zhang
Abstract Graph convolutional networks (GCNs) suffer from the irregularity of graphs, while more widely-used convolutional neural networks (CNNs) benefit from regular grids. To bridge the gap between GCN and CNN, in contrast to previous works on generalizing the basic operations in CNNs to graph data, in this paper we address the problem of how to project undirected graphs onto the grid in a {\em principled} way where CNNs can be used as the backbone for geometric deep learning. To this end, inspired by the graph drawing literature, we propose a novel graph-preserving grid layout (GPGL), an integer program that minimizes the topological loss on the grid. Technically, we propose solving GPGL approximately using a {\em regularized} Kamada-Kawai algorithm, a well-known nonconvex optimization technique in graph drawing, with a vertex separation penalty that improves the rounding performance on top of the solutions from relaxation. Using GPGL we can easily conduct data augmentation as every local minimum will lead to a grid layout for the same graph. Together with the help of multi-scale maxout CNNs, we demonstrate the empirical success of our method for graph classification.
Tasks Data Augmentation, Graph Classification
Published 2019-09-26
URL https://arxiv.org/abs/1909.12383v1
PDF https://arxiv.org/pdf/1909.12383v1.pdf
PWC https://paperswithcode.com/paper/graph-preserving-grid-layout-a-simple-graph
Repo
Framework
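
An illustrative approximation of the projection described above: compute a plain Kamada-Kawai layout with networkx, scale it, and round each vertex to the nearest free integer grid cell. The paper instead solves a regularized Kamada-Kawai objective with a vertex-separation penalty; the ring search below is only a stand-in for that step.

```python
# An illustrative approximation of a graph-to-grid layout: continuous
# Kamada-Kawai positions, scaled and snapped to free integer grid cells.
import itertools
import networkx as nx

def graph_to_grid(G, scale=3.0):
    pos = nx.kamada_kawai_layout(G)               # node -> 2-D continuous position
    occupied, grid = set(), {}
    for node, p in pos.items():
        target = (int(round(scale * float(p[0]))), int(round(scale * float(p[1]))))
        # search rings of increasing radius for a free cell
        for r in itertools.count():
            ring = [(target[0] + dx, target[1] + dy)
                    for dx in range(-r, r + 1) for dy in range(-r, r + 1)
                    if max(abs(dx), abs(dy)) == r]
            free = [c for c in ring if c not in occupied]
            if free:
                cell = min(free)                  # deterministic tie-break
                occupied.add(cell)
                grid[node] = cell
                break
    return grid

if __name__ == "__main__":
    print(graph_to_grid(nx.karate_club_graph()))
```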

On a generalization of the Jensen-Shannon divergence and the JS-symmetrization of distances relying on abstract means

Title On a generalization of the Jensen-Shannon divergence and the JS-symmetrization of distances relying on abstract means
Authors Frank Nielsen
Abstract The Jensen-Shannon divergence is a renowned bounded symmetrization of the unbounded Kullback-Leibler divergence which measures the total Kullback-Leibler divergence to the average mixture distribution. However, the Jensen-Shannon divergence between Gaussian distributions is not available in closed form. To bypass this problem, we present a generalization of the Jensen-Shannon (JS) divergence using abstract means which yields closed-form expressions when the mean is chosen according to the parametric family of distributions. More generally, we define the JS-symmetrizations of any distance using mixtures derived from abstract means. In particular, we first show that the geometric mean is well-suited for exponential families, and report two closed-form formulas for (i) the geometric Jensen-Shannon divergence between probability densities of the same exponential family, and (ii) the geometric JS-symmetrization of the reverse Kullback-Leibler divergence. As a second illustrating example, we show that the harmonic mean is well-suited for the scale Cauchy distributions, and report a closed-form formula for the harmonic Jensen-Shannon divergence between scale Cauchy distributions. Applications to clustering with respect to these novel Jensen-Shannon divergences are touched upon.
Tasks
Published 2019-04-08
URL http://arxiv.org/abs/1904.04017v2
PDF http://arxiv.org/pdf/1904.04017v2.pdf
PWC https://paperswithcode.com/paper/on-a-generalization-of-the-jensen-shannon
Repo
Framework
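
In the usual notation, the definitions the abstract builds on can be sketched as follows; the precise skew-parameter and weighting conventions follow the paper.

```latex
% Standard Jensen-Shannon divergence as the average KL divergence to the mixture:
\[
\mathrm{JS}(p,q) \;=\; \tfrac12\,\mathrm{KL}\!\left(p \,\middle\|\, m\right)
                 \;+\; \tfrac12\,\mathrm{KL}\!\left(q \,\middle\|\, m\right),
\qquad m = \tfrac12\,(p+q).
\]
% Abstract-mean generalization: replace the arithmetic mixture by a normalized
% abstract mean of the densities.
\[
(pq)^{M_\alpha}(x) \;=\; \frac{M_\alpha\!\big(p(x),q(x)\big)}
                              {\displaystyle\int M_\alpha\!\big(p(t),q(t)\big)\,dt},
\qquad
\mathrm{JS}^{M_\alpha}(p:q) \;=\; \tfrac12\,\mathrm{KL}\!\left(p \,\middle\|\, (pq)^{M_\alpha}\right)
                              \;+\; \tfrac12\,\mathrm{KL}\!\left(q \,\middle\|\, (pq)^{M_\alpha}\right).
\]
% For the geometric mean $G_\alpha(a,b)=a^{1-\alpha}b^{\alpha}$ and $p,q$ in the
% same exponential family, the normalizer, and hence the divergence, is available
% in closed form.
```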

Towards Testing of Deep Learning Systems with Training Set Reduction

Title Towards Testing of Deep Learning Systems with Training Set Reduction
Authors Helge Spieker, Arnaud Gotlieb
Abstract Testing the implementation of deep learning systems and their training routines is crucial to maintain a reliable code base. Modern software development employs processes, such as Continuous Integration, in which changes to the software are frequently integrated and tested. However, testing the training routines requires running them, and fully training a deep learning model can be resource-intensive when using the full data set. Using only a subset of the training data can improve test run time, but can also reduce its effectiveness. We evaluate different ways of performing training set reduction and their ability to mimic the characteristics of model training with the original full data set. Our results underline the usefulness of training set reduction, especially in resource-constrained environments.
Tasks
Published 2019-01-14
URL http://arxiv.org/abs/1901.04169v1
PDF http://arxiv.org/pdf/1901.04169v1.pdf
PWC https://paperswithcode.com/paper/towards-testing-of-deep-learning-systems-with
Repo
Framework
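
A small, self-contained illustration of the trade-off discussed above: train the same model on the full training split and on a stratified 10% subset, and compare test accuracy against wall-clock training time. The dataset and model are stand-ins, not the paper's setup.

```python
# Training-set reduction for faster testing: full training vs. a stratified
# 10% subset, compared on test accuracy and training time.
import time
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
X_sub, _, y_sub, _ = train_test_split(X_tr, y_tr, train_size=0.1,
                                      stratify=y_tr, random_state=0)

for name, (Xt, yt) in [("full", (X_tr, y_tr)), ("10% stratified", (X_sub, y_sub))]:
    start = time.time()
    clf = LogisticRegression(max_iter=2000).fit(Xt, yt)
    print(f"{name:>15}: acc={clf.score(X_te, y_te):.3f}, "
          f"train_time={time.time() - start:.2f}s")
```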

Flexible Operator Embeddings via Deep Learning

Title Flexible Operator Embeddings via Deep Learning
Authors Ryan Marcus, Olga Papaemmanouil
Abstract Integrating machine learning into the internals of database management systems requires significant feature engineering, a human effort-intensive process to determine the best way to represent the pieces of information that are relevant to a task. In addition to being labor intensive, the process of hand-engineering features must generally be repeated for each data management task, and may make assumptions about the underlying database that are not universally true. We introduce flexible operator embeddings, a deep learning technique for automatically transforming query operators into feature vectors that are useful for multiple data management tasks and are custom-tailored to the underlying database. Our approach works by taking advantage of an operator’s context, resulting in a neural network that quickly transforms sparse representations of query operators into dense, information-rich feature vectors. Experimentally, we show that our flexible operator embeddings perform well across a number of data management tasks, using both synthetic and real-world datasets.
Tasks Feature Engineering
Published 2019-01-25
URL http://arxiv.org/abs/1901.09090v2
PDF http://arxiv.org/pdf/1901.09090v2.pdf
PWC https://paperswithcode.com/paper/flexible-operator-embeddings-via-deep
Repo
Framework

Generating Persona Consistent Dialogues by Exploiting Natural Language Inference

Title Generating Persona Consistent Dialogues by Exploiting Natural Language Inference
Authors Haoyu Song, Wei-Nan Zhang, Jingwen Hu, Ting Liu
Abstract Consistency is one of the major challenges faced by dialogue agents. A human-like dialogue agent should not only respond naturally, but also maintain a consistent persona. In this paper, we exploit the advantages of natural language inference (NLI) techniques to address the issue of generating persona-consistent dialogues. Different from existing work that re-ranks the retrieved responses through an NLI model, we cast the task as a reinforcement learning problem and propose to exploit the NLI signals from response-persona pairs as rewards for the process of dialogue generation. Specifically, our generator employs an attention-based encoder-decoder to generate persona-based responses. Our evaluator consists of two components: an adversarially trained naturalness module and an NLI-based consistency module. Moreover, we use another well-performing NLI model in the evaluation of persona consistency. Experimental results on both human and automatic metrics, including the model-based consistency evaluation, demonstrate that the proposed approach outperforms strong generative baselines, especially in the persona consistency of generated responses.
Tasks Dialogue Generation, Natural Language Inference
Published 2019-11-14
URL https://arxiv.org/abs/1911.05889v3
PDF https://arxiv.org/pdf/1911.05889v3.pdf
PWC https://paperswithcode.com/paper/generating-persona-consistent-dialogues-by
Repo
Framework
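
The reward signal described above can be sketched as follows: score each (persona sentence, response) pair with an NLI model and convert the entailment and contradiction probabilities into a scalar reward for the generator. The nli_probs callable below is a placeholder for any NLI scorer, not a specific library API.

```python
# A hedged sketch of an NLI-based consistency reward for persona dialogue.
from typing import Callable, List, Tuple

NLIProbs = Tuple[float, float, float]  # (entailment, neutral, contradiction)

def consistency_reward(persona: List[str], response: str,
                       nli_probs: Callable[[str, str], NLIProbs]) -> float:
    """Average (entailment - contradiction) over all persona sentences."""
    scores = []
    for sentence in persona:
        ent, _, contra = nli_probs(sentence, response)
        scores.append(ent - contra)
    return sum(scores) / len(scores)

if __name__ == "__main__":
    # Toy stand-in NLI scorer based on word overlap, just to exercise the code.
    def toy_nli(premise: str, hypothesis: str) -> NLIProbs:
        overlap = len(set(premise.lower().split()) & set(hypothesis.lower().split()))
        ent = min(1.0, 0.2 * overlap)
        return ent, 1.0 - ent, 0.0

    persona = ["i love hiking in the mountains", "i have two dogs"]
    print(consistency_reward(persona, "my dogs and i go hiking every weekend", toy_nli))
```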