October 17, 2019

Paper Group ANR 925

Security Event Recognition for Visual Surveillance


Title	Security Event Recognition for Visual Surveillance
Authors	Michael Ying Yang, Wentong Liao, Chun Yang, Yanpeng Cao, Bodo Rosenhahn
Abstract	With the rapidly increasing deployment of surveillance cameras, reliable methods for automatically analyzing surveillance video and recognizing special events are demanded by different practical applications. This paper proposes a novel, effective framework for security event analysis in surveillance videos. First, a convolutional neural network (CNN) framework is used to detect objects of interest in the given videos. Second, the owners of the objects are recognized and monitored in real time as well. If anyone moves an object, the system verifies whether that person is its owner. If not, the event is further analyzed and classified as one of two scenes: moving the object away or stealing it. To validate the proposed approach, a new video dataset consisting of various scenarios is constructed for more complex tasks. For comparison purposes, experiments are also carried out on benchmark databases for abandoned luggage detection. The experimental results show that the proposed approach outperforms state-of-the-art methods and is effective in recognizing complex security events.
Tasks
Published	2018-10-26
URL	http://arxiv.org/abs/1810.11348v1
PDF	http://arxiv.org/pdf/1810.11348v1.pdf
PWC	https://paperswithcode.com/paper/security-event-recognition-for-visual
Repo
Framework

Decision method choice in a human posture recognition context


Title	Decision method choice in a human posture recognition context
Authors	Stéphane Perrin, Eric Benoit, Didier Coquin
Abstract	Human posture recognition provides a dynamic field that has produced many methods. Using fuzzy subsets based data fusion methods to aggregate the results given by different types of recognition processes is a convenient way to improve recognition methods. Nevertheless, choosing a defuzzification method to implement the decision is a crucial point of this approach. The goal of this paper is to present an approach where the choice of the defuzzification method is driven by the constraints of the final data user, which are expressed as limitations on indicators like confidence or accuracy. A practical experimentation illustrating this approach is presented: from a depth camera sensor, human posture is interpreted and the defuzzification method is selected in accordance with the constraints of the final information consumer. The paper illustrates the interest of the approach in a context of postures based human robot communication.
Tasks
Published	2018-07-11
URL	http://arxiv.org/abs/1807.04170v1
PDF	http://arxiv.org/pdf/1807.04170v1.pdf
PWC	https://paperswithcode.com/paper/decision-method-choice-in-a-human-posture
Repo
Framework
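
The abstract above describes choosing a defuzzification rule according to the consumer's constraints on confidence or accuracy. The following is a minimal sketch of that idea, not the authors' method: the posture labels, membership values, and the two rules (`decide_with_threshold`, `decide_with_margin`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional


def max_membership(memberships: Dict[str, float]) -> str:
    """Return the label with the highest membership degree."""
    return max(memberships, key=memberships.get)


def decide_with_threshold(memberships: Dict[str, float],
                          min_confidence: float) -> Optional[str]:
    """Return the top label only if its membership reaches min_confidence."""
    label = max_membership(memberships)
    return label if memberships[label] >= min_confidence else None


def decide_with_margin(memberships: Dict[str, float],
                       min_margin: float) -> Optional[str]:
    """Return the top label only if it beats the runner-up by min_margin."""
    ranked = sorted(memberships.values(), reverse=True)
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    if ranked[0] - runner_up >= min_margin:
        return max_membership(memberships)
    return None


# Hypothetical fused membership degrees for one depth-camera frame.
posture = {"standing": 0.55, "sitting": 0.40, "lying": 0.05}

print(decide_with_threshold(posture, 0.5))  # accepts "standing"
print(decide_with_margin(posture, 0.2))     # rejects: margin is too small
```

A consumer that tolerates few wrong decisions would pick the stricter rule and accept more rejected frames; a consumer that needs a decision on every frame would fall back to plain `max_membership`.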

Do WaveNets Dream of Acoustic Waves?


Title	Do WaveNets Dream of Acoustic Waves?
Authors	Kanru Hua
Abstract	Various sources have reported the WaveNet deep learning architecture being able to generate high-quality speech, but to our knowledge there haven’t been studies on the interpretation or visualization of trained WaveNets. This study investigates the possibility that WaveNet understands speech by unsupervisedly learning an acoustically meaningful latent representation of the speech signals in its receptive field; we also attempt to interpret the mechanism by which the feature extraction is performed. Suggested by singular value decomposition and linear regression analysis on the activations and known acoustic features (e.g. F0), the key findings are (1) activations in the higher layers are highly correlated with spectral features; (2) WaveNet explicitly performs pitch extraction despite being trained to directly predict the next audio sample and (3) for the said feature analysis to take place, the latent signal representation is converted back and forth between baseband and wideband components.
Tasks
Published	2018-02-23
URL	http://arxiv.org/abs/1802.08370v1
PDF	http://arxiv.org/pdf/1802.08370v1.pdf
PWC	https://paperswithcode.com/paper/do-wavenets-dream-of-acoustic-waves
Repo
Framework

Multi-modal Image Processing based on Coupled Dictionary Learning


Title	Multi-modal Image Processing based on Coupled Dictionary Learning
Authors	Pingfan Song, Miguel R. D. Rodrigues
Abstract	In real-world scenarios, many data processing problems often involve heterogeneous images associated with different imaging modalities. Since these multimodal images originate from the same phenomenon, it is realistic to assume that they share common attributes or characteristics. In this paper, we propose a multi-modal image processing framework based on coupled dictionary learning to capture similarities and disparities between different image modalities. In particular, our framework can capture favorable structure similarities across different image modalities such as edges, corners, and other elementary primitives in a learned sparse transform domain, instead of the original pixel domain, that can be used to improve a number of image processing tasks such as denoising, inpainting, or super-resolution. Practical experiments demonstrate that incorporating multimodal information using our framework brings notable benefits.
Tasks	Denoising, Dictionary Learning, Super-Resolution
Published	2018-06-26
URL	http://arxiv.org/abs/1806.09882v1
PDF	http://arxiv.org/pdf/1806.09882v1.pdf
PWC	https://paperswithcode.com/paper/multi-modal-image-processing-based-on-coupled
Repo
Framework

Evaluating Multimodal Representations on Sentence Similarity: vSTS, Visual Semantic Textual Similarity Dataset


Title	Evaluating Multimodal Representations on Sentence Similarity: vSTS, Visual Semantic Textual Similarity Dataset
Authors	Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre
Abstract	In this paper we introduce vSTS, a new dataset for measuring textual similarity of sentences using multimodal information. The dataset comprises images along with their respective textual captions. We describe the dataset both quantitatively and qualitatively, and claim that it is a valid gold standard for measuring automatic multimodal textual similarity systems. We also describe the initial experiments combining the multimodal information.
Tasks	Semantic Textual Similarity
Published	2018-09-11
URL	http://arxiv.org/abs/1809.03695v1
PDF	http://arxiv.org/pdf/1809.03695v1.pdf
PWC	https://paperswithcode.com/paper/evaluating-multimodal-representations-on
Repo
Framework

A Tour of Unsupervised Deep Learning for Medical Image Analysis


Title	A Tour of Unsupervised Deep Learning for Medical Image Analysis
Authors	Khalid Raza, Nripendra Kumar Singh
Abstract	Interpretation of medical images for diagnosis and treatment of complex disease from high-dimensional and heterogeneous data remains a key challenge in transforming healthcare. In the last few years, both supervised and unsupervised deep learning have achieved promising results in the area of medical imaging and image analysis. Unlike supervised learning, which is biased by how it is supervised and requires manual effort to create class labels for the algorithm, unsupervised learning derives insights directly from the data itself, groups the data, and helps to make data-driven decisions without any external bias. This review systematically presents various unsupervised models applied to medical image analysis, including autoencoders and their several variants, restricted Boltzmann machines, deep belief networks, deep Boltzmann machines and generative adversarial networks. Future research opportunities and challenges of unsupervised techniques for medical image analysis are also discussed.
Tasks
Published	2018-12-19
URL	http://arxiv.org/abs/1812.07715v1
PDF	http://arxiv.org/pdf/1812.07715v1.pdf
PWC	https://paperswithcode.com/paper/a-tour-of-unsupervised-deep-learning-for
Repo
Framework

Using Multi-task and Transfer Learning to Solve Working Memory Tasks


Title	Using Multi-task and Transfer Learning to Solve Working Memory Tasks
Authors	T. S. Jayram, Tomasz Kornuta, Ryan L. McAvoy, Ahmet S. Ozcan
Abstract	We propose a new architecture called Memory-Augmented Encoder-Solver (MAES) that enables transfer learning to solve complex working memory tasks adapted from cognitive psychology. It uses dual recurrent neural network controllers, inside the encoder and solver, respectively, that interface with a shared memory module and is completely differentiable. We study different types of encoders in a systematic manner and demonstrate a unique advantage of multi-task learning in obtaining the best possible encoder. We show by extensive experimentation that the trained MAES models achieve task-size generalization, i.e., they are capable of handling sequential inputs 50 times longer than seen during training, with appropriately large memory modules. We demonstrate that the performance achieved by MAES far outperforms existing and well-known models such as the LSTM, NTM and DNC on the entire suite of tasks.
Tasks	Multi-Task Learning, Transfer Learning
Published	2018-09-28
URL	http://arxiv.org/abs/1809.10847v1
PDF	http://arxiv.org/pdf/1809.10847v1.pdf
PWC	https://paperswithcode.com/paper/using-multi-task-and-transfer-learning-to
Repo
Framework

Neural Speech Synthesis with Transformer Network


Title	Neural Speech Synthesis with Transformer Network
Authors	Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, Ming Liu, Ming Zhou
Abstract	Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) have been proposed and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency during training and inference; 2) difficulty modeling long-range dependencies with current recurrent neural networks (RNNs). Inspired by the success of the Transformer network in neural machine translation (NMT), in this paper we introduce and adapt the multi-head attention mechanism to replace the RNN structures and also the original attention mechanism in Tacotron2. With the help of multi-head self-attention, the hidden states in the encoder and decoder are constructed in parallel, which improves the training efficiency. Meanwhile, any two inputs at different times are connected directly by the self-attention mechanism, which solves the long-range dependency problem effectively. Using phoneme sequences as input, our Transformer TTS network generates mel spectrograms, followed by a WaveNet vocoder to output the final audio results. Experiments are conducted to test the efficiency and performance of our new network. For efficiency, our Transformer TTS network speeds up training by about 4.25 times compared with Tacotron2. For performance, rigorous human tests show that our proposed model achieves state-of-the-art performance (outperforms Tacotron2 with a gap of 0.048) and is very close to human quality (4.39 vs 4.44 in MOS).
Tasks	Machine Translation, Speech Synthesis
Published	2018-09-19
URL	http://arxiv.org/abs/1809.08895v3
PDF	http://arxiv.org/pdf/1809.08895v3.pdf
PWC	https://paperswithcode.com/paper/neural-speech-synthesis-with-transformer
Repo
Framework
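
The abstract above notes that self-attention connects any two input positions directly. A minimal single-head sketch of scaled dot-product attention shows why: every output row is a weighted sum over all positions. This is an illustration only; the learned projections, multiple heads, and the rest of the Transformer TTS network are omitted, and the toy inputs are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # One score per position: each query attends to every key directly.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        dim = len(values[0])
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(dim)])
    return outputs


# Toy example: zero queries give uniform weights, so each output row
# is the average of the value rows.
zero_q = [[0.0, 0.0], [0.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
print(attention(zero_q, keys, values))
```

Because the loop over queries has no dependence between iterations, all positions can be computed in parallel, which is the source of the training speed-up the abstract refers to.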

Multiaccuracy: Black-Box Post-Processing for Fairness in Classification


Title	Multiaccuracy: Black-Box Post-Processing for Fairness in Classification
Authors	Michael P. Kim, Amirata Ghorbani, James Zou
Abstract	Prediction systems are successfully deployed in applications ranging from disease diagnosis, to predicting credit worthiness, to image recognition. Even when the overall accuracy is high, these systems may exhibit systematic biases that harm specific subpopulations; such biases may arise inadvertently due to underrepresentation in the data used to train a machine-learning model, or as the result of intentional malicious discrimination. We develop a rigorous framework of multiaccuracy auditing and post-processing to ensure accurate predictions across identifiable subgroups. Our algorithm, MULTIACCURACY-BOOST, works in any setting where we have black-box access to a predictor and a relatively small set of labeled data for auditing; importantly, this black-box framework allows for improved fairness and accountability of predictions, even when the predictor is minimally transparent. We prove that MULTIACCURACY-BOOST converges efficiently and show that if the initial model is accurate on an identifiable subgroup, then the post-processed model will be also. We experimentally demonstrate the effectiveness of the approach to improve the accuracy among minority subgroups in diverse applications (image classification, finance, population health). Interestingly, MULTIACCURACY-BOOST can improve subpopulation accuracy (e.g. for “black women”) even when the sensitive features (e.g. “race”, “gender”) are not given to the algorithm explicitly.
Tasks	Image Classification
Published	2018-05-31
URL	http://arxiv.org/abs/1805.12317v2
PDF	http://arxiv.org/pdf/1805.12317v2.pdf
PWC	https://paperswithcode.com/paper/multiaccuracy-black-box-post-processing-for
Repo
Framework
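
The audit-then-correct loop described in the abstract above can be sketched in a few lines. This is a deliberately simplified additive variant, not MULTIACCURACY-BOOST itself: it assumes the subgroups are given explicitly as index lists (the paper's auditor does not need that), and the names `audit`, `post_process`, `alpha`, and `step` are hypothetical.

```python
from typing import Dict, List


def audit(preds: List[float], labels: List[int],
          groups: Dict[str, List[int]]) -> Dict[str, float]:
    """Mean residual (label - prediction) on each subgroup."""
    return {name: sum(labels[i] - preds[i] for i in idx) / len(idx)
            for name, idx in groups.items()}


def post_process(preds: List[float], labels: List[int],
                 groups: Dict[str, List[int]],
                 alpha: float = 0.05, step: float = 1.0,
                 max_rounds: int = 100) -> List[float]:
    """Shift predictions on the worst-audited subgroup until every
    subgroup's mean residual is within alpha (or max_rounds is hit)."""
    preds = list(preds)  # treat the original predictor as a black box
    for _ in range(max_rounds):
        residuals = audit(preds, labels, groups)
        name, worst = max(residuals.items(), key=lambda kv: abs(kv[1]))
        if abs(worst) <= alpha:
            break
        for i in groups[name]:
            preds[i] = min(1.0, max(0.0, preds[i] + step * worst))
    return preds


# Hypothetical audit set: the predictor is systematically low on group "a".
preds = [0.5, 0.5, 0.5, 0.5]
labels = [1, 1, 0, 1]
groups = {"a": [0, 1], "b": [2, 3]}
print(post_process(preds, labels, groups))
```

Only predictions and a small labeled audit set are touched; the underlying model is never retrained, which matches the black-box setting the abstract describes.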

MSc Dissertation: Exclusive Row Biclustering for Gene Expression Using a Combinatorial Auction Approach


Title	MSc Dissertation: Exclusive Row Biclustering for Gene Expression Using a Combinatorial Auction Approach
Authors	Amichai Painsky
Abstract	The availability of large microarray data has led to a growing interest in biclustering methods in the past decade. Several algorithms have been proposed to identify subsets of genes and conditions according to different similarity measures and under varying constraints. In this paper we focus on the exclusive row biclustering problem for gene expression data sets, in which each row can only be a member of a single bicluster while columns can participate in multiple ones. This type of biclustering may be adequate, for example, for clustering groups of cancer patients where each patient (row) is expected to be carrying only a single type of cancer, while each cancer type is associated with multiple (and possibly overlapping) genes (columns). We present a novel method to identify these exclusive row biclusters through a combination of existing biclustering algorithms and combinatorial auction techniques. We devise an approach for tuning the threshold for our algorithm based on comparison to a null model in the spirit of the Gap statistic approach. We demonstrate our approach on both synthetic and real-world gene expression data and show its power in identifying large span non-overlapping rows sub matrices, while considering their unique nature. The Gap statistic approach succeeds in identifying appropriate thresholds in all our examples.
Tasks
Published	2018-09-13
URL	http://arxiv.org/abs/1809.05077v2
PDF	http://arxiv.org/pdf/1809.05077v2.pdf
PWC	https://paperswithcode.com/paper/msc-dissertation-exclusive-row-biclustering
Repo
Framework

Implementation of Deep Convolutional Neural Network in Multi-class Categorical Image Classification


Title	Implementation of Deep Convolutional Neural Network in Multi-class Categorical Image Classification
Authors	Pushparaja Murugan
Abstract	Convolutional Neural Networks have been implemented in many complex machine learning tasks such as image classification, object identification, autonomous vehicles and robotic vision. However, ConvNet architecture efficiency and accuracy depend on a large number of factors. Also, the complex architecture requires a significant amount of data to train and involves a large number of hyperparameters, which increases the computational expense and difficulty. Hence, it is necessary to address the limitations and techniques to overcome the barriers to ensure that the architecture performs well in complex visual tasks. This article is intended to develop an efficient ConvNet architecture for multi-class categorical image classification. In the development of the architecture, a large pool of grey scale images is taken as input and split into training and test datasets. Several available techniques are implemented to reduce overfitting and poor generalization of the network. The hyperparameters are determined by Bayesian Optimization with a Gaussian Process prior. The ReLU non-linear activation function is applied after the convolutional layers. A max pooling operation is carried out to downsample the data points in the pooling layers. The cross-entropy loss function is used to measure the performance of the architecture, with softmax used in the classification layer. Mini-batch gradient descent with the Adam optimizer is used for backpropagation. The developed architecture is validated with a confusion matrix and a classification report.
Tasks	Image Classification
Published	2018-01-03
URL	http://arxiv.org/abs/1801.01397v1
PDF	http://arxiv.org/pdf/1801.01397v1.pdf
PWC	https://paperswithcode.com/paper/implementation-of-deep-convolutional-neural
Repo
Framework
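
Several building blocks named in the abstract above (ReLU, max pooling, softmax, cross-entropy) are small enough to write out directly. The sketch below uses plain Python lists and 1-D pooling for brevity; it is not the paper's architecture, and the function names and sample values are illustrative assumptions.

```python
import math
from typing import List


def relu(xs: List[float]) -> List[float]:
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in xs]


def max_pool_1d(xs: List[float], size: int = 2) -> List[float]:
    """Downsample by keeping the maximum of each non-overlapping window."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def softmax(logits: List[float]) -> List[float]:
    """Turn class scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], target: int) -> float:
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[target])


features = max_pool_1d(relu([-1.0, 3.0, 2.0, 5.0]))
probs = softmax(features)
print(features, probs, cross_entropy(probs, target=1))
```

The loss shrinks toward zero as the probability of the true class approaches one, which is why it is a natural training signal for a softmax classification layer.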

Deep Auto-Set: A Deep Auto-Encoder-Set Network for Activity Recognition Using Wearables


Title	Deep Auto-Set: A Deep Auto-Encoder-Set Network for Activity Recognition Using Wearables
Authors	Alireza Abedin Varamin, Ehsan Abbasnejad, Qinfeng Shi, Damith Ranasinghe, Hamid Rezatofighi
Abstract	Automatic recognition of human activities from time-series sensor data (referred to as HAR) is a growing area of research in ubiquitous computing. Most recent research in the field adopts supervised deep learning paradigms to automate extraction of intrinsic features from raw signal inputs and addresses HAR as a multi-class classification problem where detecting a single activity class within the duration of a sensory data segment suffices. However, due to the innate diversity of human activities and their corresponding duration, no data segment is guaranteed to contain sensor recordings of a single activity type. In this paper, we express HAR more naturally as a set prediction problem where the predictions are sets of ongoing activity elements with unfixed and unknown cardinality. For the first time, we address this problem by presenting a novel HAR approach that learns to output activity sets using deep neural networks. Moreover, motivated by the limited availability of annotated HAR datasets as well as the unfortunate immaturity of existing unsupervised systems, we complement our supervised set learning scheme with a prior unsupervised feature learning process that adopts convolutional auto-encoders to exploit unlabeled data. The empirical experiments on two widely adopted HAR datasets demonstrate the substantial improvement of our proposed methodology over the baseline models.
Tasks	Activity Recognition, Time Series
Published	2018-11-20
URL	http://arxiv.org/abs/1811.08127v1
PDF	http://arxiv.org/pdf/1811.08127v1.pdf
PWC	https://paperswithcode.com/paper/deep-auto-set-a-deep-auto-encoder-set-network
Repo
Framework

Compressive Hyperspectral Imaging: Fourier Transform Interferometry meets Single Pixel Camera


Title	Compressive Hyperspectral Imaging: Fourier Transform Interferometry meets Single Pixel Camera
Authors	Amirafshar Moshtaghpour, José M. Bioucas-Dias, Laurent Jacques
Abstract	This paper introduces a single-pixel HyperSpectral (HS) imaging framework based on Fourier Transform Interferometry (FTI). By combining a space-time coding of the light illumination with partial interferometric observations of a collimated light beam (observed by a single pixel), our system benefits from (i) reduced measurement rate and light-exposure of the observed object compared to common (Nyquist) FTI imagers, and (ii) high spectral resolution as desirable in, e.g., Fluorescence Spectroscopy (FS). From the principles of compressive sensing with multilevel sampling, our method leverages the sparsity “in level” of FS data, both in the spectral and the spatial domains. This allows us to optimize the space-time light coding using time-modulated Hadamard patterns. We confirm the effectiveness of our approach by a few numerical experiments.
Tasks	Compressive Sensing
Published	2018-09-04
URL	http://arxiv.org/abs/1809.00950v1
PDF	http://arxiv.org/pdf/1809.00950v1.pdf
PWC	https://paperswithcode.com/paper/compressive-hyperspectral-imaging-fourier
Repo
Framework
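
To make the single-pixel idea in the abstract above concrete, the sketch below builds Hadamard patterns with the Sylvester construction and shows that a scene can be recovered from one scalar measurement per pattern. It covers only the fully sampled case; the time modulation, interferometry, and compressive multilevel sampling of the paper are not modeled, and the four-pixel `scene` is a made-up example.

```python
from typing import List


def hadamard(n: int) -> List[List[int]]:
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def measure(patterns: List[List[int]], scene: List[float]) -> List[float]:
    """One single-pixel reading per pattern: the inner product with the scene."""
    return [sum(p * s for p, s in zip(row, scene)) for row in patterns]


def reconstruct(patterns: List[List[int]], readings: List[float]) -> List[float]:
    """Invert the measurements using H^T H = n I."""
    n = len(patterns)
    return [sum(patterns[i][j] * readings[i] for i in range(n)) / n
            for j in range(n)]


scene = [3, 1, 4, 1]           # hypothetical 4-pixel scene
H = hadamard(4)
readings = measure(H, scene)   # what the single pixel would record
print(reconstruct(H, readings))
```

A compressive scheme would keep only a subset of the rows of `H` and rely on sparsity to recover the scene, which is where the reduced measurement rate in the abstract comes from.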

Online local pool generation for dynamic classifier selection: an extended version


Title	Online local pool generation for dynamic classifier selection: an extended version
Authors	Mariana A. Souza, George D. C. Cavalcanti, Rafael M. O. Cruz, Robert Sabourin
Abstract	Dynamic Classifier Selection (DCS) techniques have difficulty in selecting the most competent classifier in a pool, even when its presence is assured. Since the DCS techniques rely only on local data to estimate a classifier’s competence, the manner in which the pool is generated could affect the choice of the best classifier for a given sample. That is, the global perspective in which pools are generated may not help the DCS techniques in selecting a competent classifier for samples that are likely to be mislabelled. Thus, we propose in this work an online pool generation method that produces a locally accurate pool for test samples in difficult regions of the feature space. The difficulty of a given area is determined by the classification difficulty of the samples in it. That way, by using classifiers that were generated in a local scope, it could be easier for the DCS techniques to select the best one for the difficult samples. For the query samples in easy regions, a simple nearest neighbors rule is used. In the extended version of this work, a deep analysis on the correlation between instance hardness and the performance of DCS techniques is presented. An instance hardness measure that conveys the degree of local class overlap is then used to decide when the local pool is used in the proposed scheme. The proposed method yielded significantly greater recognition rates in comparison to a Bagging-generated pool and two other global pool generation schemes for all DCS techniques evaluated. The proposed scheme’s performance was also significantly superior to three state-of-the-art classification models and statistically equivalent to five of them. Moreover, an extended analysis on the computational complexity of the proposed method and of several DS techniques is presented in this version. We also provide the implementation of the proposed technique using the DESLib library on GitHub.
Tasks
Published	2018-09-05
URL	http://arxiv.org/abs/1809.01628v1
PDF	http://arxiv.org/pdf/1809.01628v1.pdf
PWC	https://paperswithcode.com/paper/online-local-pool-generation-for-dynamic
Repo
Framework
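
The abstract above uses an instance hardness measure that reflects local class overlap to decide when a local pool is needed. One common measure of this kind is k-Disagreeing Neighbors (kDN); whether the paper uses exactly this definition is an assumption here, and the toy points and the function name `kdn` are hypothetical.

```python
import math
from typing import List, Sequence


def kdn(index: int, points: List[Sequence[float]],
        labels: List[int], k: int = 3) -> float:
    """Fraction of the k nearest neighbours of points[index] whose label
    differs from labels[index]; higher values mean a harder instance."""
    dists = sorted((math.dist(points[index], p), j)
                   for j, p in enumerate(points) if j != index)
    neighbours = [j for _, j in dists[:k]]
    return sum(labels[j] != labels[index] for j in neighbours) / len(neighbours)


# Toy data: the last point has label 1 but sits inside the class-0 cluster.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (0.5, 0.5)]
labels = [0, 0, 0, 1, 1, 1]

print(kdn(5, points, labels))        # surrounded by the other class
print(kdn(3, points, labels, k=1))   # nearest neighbour shares its label
```

In a scheme like the one described, a query whose neighbourhood scores above some threshold would trigger local pool generation, while easy regions would fall back to a nearest neighbors rule.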

Deep Learning Topological Invariants of Band Insulators


Title	Deep Learning Topological Invariants of Band Insulators
Authors	Ning Sun, Jinmin Yi, Pengfei Zhang, Huitao Shen, Hui Zhai
Abstract	In this work we design and train deep neural networks to predict topological invariants for one-dimensional four-band insulators in AIII class whose topological invariant is the winding number, and two-dimensional two-band insulators in A class whose topological invariant is the Chern number. Given Hamiltonians in the momentum space as the input, neural networks can predict topological invariants for both classes with accuracy close to or higher than 90%, even for Hamiltonians whose invariants are beyond the training data set. Despite the complexity of the neural network, we find that the output of certain intermediate hidden layers resembles either the winding angle for models in AIII class or the solid angle (Berry curvature) for models in A class, indicating that neural networks essentially capture the mathematical formula of topological invariants. Our work demonstrates the ability of neural networks to predict topological invariants for complicated models with local Hamiltonians as the only input, and offers an example that even a deep neural network is understandable.
Tasks
Published	2018-05-26
URL	http://arxiv.org/abs/1805.10503v2
PDF	http://arxiv.org/pdf/1805.10503v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-topological-invariants-of-band
Repo
Framework
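
The winding number mentioned in the abstract above can be computed directly by accumulating the winding angle around the Brillouin zone. The sketch below does this for the two-band SSH chain, a standard toy model in the AIII class; it is a simpler stand-in for the four-band models in the paper, it uses the explicit formula rather than a neural network, and the hopping values are arbitrary.

```python
import cmath
import math
from typing import Callable


def winding_number(h: Callable[[float], complex], samples: int = 400) -> int:
    """Count how many times h(k) circles the origin as k goes from 0 to 2*pi.
    h must be nonzero everywhere (a gapped Hamiltonian)."""
    total = 0.0
    prev = h(0.0)
    for n in range(1, samples + 1):
        cur = h(2 * math.pi * n / samples)
        total += cmath.phase(cur / prev)  # small increment of winding angle
        prev = cur
    return round(total / (2 * math.pi))


def ssh(t1: float, t2: float) -> Callable[[float], complex]:
    """Off-diagonal element of the SSH Hamiltonian: h(k) = t1 + t2 * e^{ik}."""
    return lambda k: t1 + t2 * cmath.exp(1j * k)


print(winding_number(ssh(0.5, 1.0)))  # topological phase
print(winding_number(ssh(1.0, 0.5)))  # trivial phase
```

The running sum inside the loop is the winding angle, the quantity that the abstract reports certain hidden layers come to resemble.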