Paper Group ANR 925
Security Event Recognition for Visual Surveillance
Title | Security Event Recognition for Visual Surveillance |
Authors | Michael Ying Yang, Wentong Liao, Chun Yang, Yanpeng Cao, Bodo Rosenhahn |
Abstract | With the rapidly increasing deployment of surveillance cameras, reliable methods for automatically analyzing surveillance video and recognizing special events are in demand across many practical applications. This paper proposes a novel and effective framework for security event analysis in surveillance videos. First, a convolutional neural network (CNN) framework is used to detect objects of interest in the given videos. Second, the owners of the objects are recognized and monitored in real time. If anyone moves an object, the system verifies whether that person is its owner. If not, the event is further analyzed and classified into one of two scenarios: moving the object away or stealing it. To validate the proposed approach, a new video dataset consisting of various scenarios is constructed for these more complex tasks. For comparison purposes, experiments are also carried out on benchmark databases for abandoned luggage detection. The experimental results show that the proposed approach outperforms state-of-the-art methods and is effective in recognizing complex security events. |
Tasks | |
Published | 2018-10-26 |
URL | http://arxiv.org/abs/1810.11348v1 |
http://arxiv.org/pdf/1810.11348v1.pdf | |
PWC | https://paperswithcode.com/paper/security-event-recognition-for-visual |
Repo | |
Framework | |
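The if/else logic in the abstract above can be written down as a tiny rule sketch (illustrative only — the paper's pipeline makes these decisions from CNN detections and real-time tracking; the identifiers and the `object_left_scene` flag below are hypothetical stand-ins):

```python
# One plausible decision rule for the described event logic; the real
# system derives mover/owner identities and scene state from detection
# and tracking, not from hand-fed booleans.
def classify_event(mover_id: str, owner_id: str, object_left_scene: bool) -> str:
    """Classify an object-interaction event."""
    if mover_id == owner_id:
        return "normal"  # the owner handles their own object
    # A non-owner moved the object: distinguish the two suspicious scenes.
    return "stealing" if object_left_scene else "moved-away"

events = [
    ("p1", "p1", False),   # owner picks up own bag
    ("p2", "p1", False),   # stranger shifts the bag but leaves it
    ("p2", "p1", True),    # stranger walks off with the bag
]
labels = [classify_event(m, o, gone) for m, o, gone in events]
```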
Decision method choice in a human posture recognition context
Title | Decision method choice in a human posture recognition context |
Authors | Stéphane Perrin, Eric Benoit, Didier Coquin |
Abstract | Human posture recognition is a dynamic field that has produced many methods. Using fuzzy-subset-based data fusion methods to aggregate the results given by different types of recognition processes is a convenient way to improve recognition methods. Nevertheless, choosing a defuzzification method to implement the decision is a crucial point of this approach. The goal of this paper is to present an approach where the choice of the defuzzification method is driven by the constraints of the final data user, which are expressed as limitations on indicators like confidence or accuracy. A practical experiment illustrating this approach is presented: from a depth camera sensor, human posture is interpreted, and the defuzzification method is selected in accordance with the constraints of the final information consumer. The paper illustrates the value of the approach in the context of posture-based human-robot communication. |
Tasks | |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.04170v1 |
http://arxiv.org/pdf/1807.04170v1.pdf | |
PWC | https://paperswithcode.com/paper/decision-method-choice-in-a-human-posture |
Repo | |
Framework | |
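As a rough illustration of why the defuzzification choice matters, the sketch below (with made-up membership values and threshold, not the paper's method) contrasts a plain maximum rule with one constrained by a confidence indicator:

```python
import numpy as np

# Hypothetical fuzzy membership degrees for candidate postures.
postures = ["standing", "sitting", "pointing"]
mu = np.array([0.2, 0.7, 0.5])

def defuzz_max(memberships):
    """Maximum rule: always commit to the posture with highest membership."""
    return int(np.argmax(memberships))

def defuzz_confidence(memberships, threshold=0.3):
    """Confidence-constrained rule: accept the top posture only if it
    dominates the runner-up by `threshold`; otherwise stay undecided."""
    order = np.argsort(memberships)[::-1]
    best, second = memberships[order[0]], memberships[order[1]]
    return int(order[0]) if best - second >= threshold else None

best = defuzz_max(mu)              # commits even on ambiguous input
safe = defuzz_confidence(mu, 0.3)  # refuses when confidence is too low
```

A robot commanded by postures might prefer the second rule: a refused decision is cheaper than a wrong command.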
Do WaveNets Dream of Acoustic Waves?
Title | Do WaveNets Dream of Acoustic Waves? |
Authors | Kanru Hua |
Abstract | Various sources have reported that the WaveNet deep learning architecture is able to generate high-quality speech, but to our knowledge there have not been studies on the interpretation or visualization of trained WaveNets. This study investigates the possibility that WaveNet understands speech by learning, without supervision, an acoustically meaningful latent representation of the speech signals in its receptive field; we also attempt to interpret the mechanism by which this feature extraction is performed. Based on singular value decomposition and linear regression analysis of the activations and known acoustic features (e.g. F0), the key findings are: (1) activations in the higher layers are highly correlated with spectral features; (2) WaveNet explicitly performs pitch extraction despite being trained to directly predict the next audio sample; and (3) for the said feature analysis to take place, the latent signal representation is converted back and forth between baseband and wideband components. |
Tasks | |
Published | 2018-02-23 |
URL | http://arxiv.org/abs/1802.08370v1 |
http://arxiv.org/pdf/1802.08370v1.pdf | |
PWC | https://paperswithcode.com/paper/do-wavenets-dream-of-acoustic-waves |
Repo | |
Framework | |
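The probing methodology can be mimicked on synthetic data: build activations that linearly encode F0, then apply SVD and linear regression as the study does. Everything below (layer width, noise level, F0 contour) is a made-up stand-in for real WaveNet activations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: per-frame activations of one layer (n_frames x n_channels)
# and the matching F0 contour in Hz.
n_frames, n_channels = 200, 16
f0 = 100.0 + 50.0 * np.sin(np.linspace(0, 4 * np.pi, n_frames))
W = rng.normal(size=n_channels)
acts = np.outer(f0, W) + 0.1 * rng.normal(size=(n_frames, n_channels))

# SVD shows how many directions in activation space carry the variance.
sing_vals = np.linalg.svd(acts - acts.mean(0), compute_uv=False)

# Linear regression: predict F0 from the activations, report R^2.
X = np.column_stack([acts, np.ones(n_frames)])
coef, *_ = np.linalg.lstsq(X, f0, rcond=None)
pred = X @ coef
r2 = 1.0 - np.sum((f0 - pred) ** 2) / np.sum((f0 - f0.mean()) ** 2)
```

A high R^2 and a dominant first singular value are the kind of evidence the paper reads as "this layer encodes pitch".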
Multi-modal Image Processing based on Coupled Dictionary Learning
Title | Multi-modal Image Processing based on Coupled Dictionary Learning |
Authors | Pingfan Song, Miguel R. D. Rodrigues |
Abstract | In real-world scenarios, many data processing problems involve heterogeneous images associated with different imaging modalities. Since these multimodal images originate from the same phenomenon, it is realistic to assume that they share common attributes or characteristics. In this paper, we propose a multi-modal image processing framework based on coupled dictionary learning to capture similarities and disparities between different image modalities. In particular, our framework can capture favorable structural similarities across different image modalities, such as edges, corners, and other elementary primitives, in a learned sparse transform domain rather than the original pixel domain; these can be used to improve a number of image processing tasks such as denoising, inpainting, or super-resolution. Practical experiments demonstrate that incorporating multimodal information using our framework brings notable benefits. |
Tasks | Denoising, Dictionary Learning, Super-Resolution |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.09882v1 |
http://arxiv.org/pdf/1806.09882v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-modal-image-processing-based-on-coupled |
Repo | |
Framework | |
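The coupling assumption — both modalities generated by one shared sparse code — can be sketched in a few lines. The dictionaries here are random stand-ins rather than learned ones, and the sparse-coding step is reduced to a joint least-squares solve for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical coupled dictionaries: one per modality, sharing sparse codes.
n_atoms, dim = 8, 16
D_x = rng.normal(size=(dim, n_atoms))   # e.g. intensity-patch atoms
D_y = rng.normal(size=(dim, n_atoms))   # e.g. depth-patch atoms

# A common sparse code generates both modalities (the coupling assumption).
z = np.zeros(n_atoms)
z[[1, 5]] = [1.0, -0.5]
x, y = D_x @ z, D_y @ z

# Coupled coding: stack both modalities and solve one joint system, so the
# code is forced to explain both observations at once.
D_joint = np.vstack([D_x, D_y])
obs = np.concatenate([x, y])
z_hat, *_ = np.linalg.lstsq(D_joint, obs, rcond=None)
```

In the real framework the joint solve would be a sparsity-regularized one and the dictionaries would be learned jointly from paired patches.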
Evaluating Multimodal Representations on Sentence Similarity: vSTS, Visual Semantic Textual Similarity Dataset
Title | Evaluating Multimodal Representations on Sentence Similarity: vSTS, Visual Semantic Textual Similarity Dataset |
Authors | Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre |
Abstract | In this paper we introduce vSTS, a new dataset for measuring the textual similarity of sentences using multimodal information. The dataset comprises images along with their respective textual captions. We describe the dataset both quantitatively and qualitatively, and claim that it is a valid gold standard for measuring automatic multimodal textual similarity systems. We also describe initial experiments combining the multimodal information. |
Tasks | Semantic Textual Similarity |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.03695v1 |
http://arxiv.org/pdf/1809.03695v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-multimodal-representations-on |
Repo | |
Framework | |
A Tour of Unsupervised Deep Learning for Medical Image Analysis
Title | A Tour of Unsupervised Deep Learning for Medical Image Analysis |
Authors | Khalid Raza, Nripendra Kumar Singh |
Abstract | Interpretation of medical images for the diagnosis and treatment of complex diseases from high-dimensional and heterogeneous data remains a key challenge in transforming healthcare. In the last few years, both supervised and unsupervised deep learning have achieved promising results in medical imaging and image analysis. Unlike supervised learning, which is biased by how it is supervised and by the manual effort needed to create class labels for the algorithm, unsupervised learning derives insights directly from the data itself, groups the data, and helps to make data-driven decisions without any external bias. This review systematically presents various unsupervised models applied to medical image analysis, including autoencoders and their several variants, restricted Boltzmann machines, deep belief networks, deep Boltzmann machines, and generative adversarial networks. Future research opportunities and challenges of unsupervised techniques for medical image analysis are also discussed. |
Tasks | |
Published | 2018-12-19 |
URL | http://arxiv.org/abs/1812.07715v1 |
http://arxiv.org/pdf/1812.07715v1.pdf | |
PWC | https://paperswithcode.com/paper/a-tour-of-unsupervised-deep-learning-for |
Repo | |
Framework | |
Using Multi-task and Transfer Learning to Solve Working Memory Tasks
Title | Using Multi-task and Transfer Learning to Solve Working Memory Tasks |
Authors | T. S. Jayram, Tomasz Kornuta, Ryan L. McAvoy, Ahmet S. Ozcan |
Abstract | We propose a new architecture called Memory-Augmented Encoder-Solver (MAES) that enables transfer learning to solve complex working memory tasks adapted from cognitive psychology. It uses dual recurrent neural network controllers, inside the encoder and solver, respectively, that interface with a shared memory module and is completely differentiable. We study different types of encoders in a systematic manner and demonstrate a unique advantage of multi-task learning in obtaining the best possible encoder. We show by extensive experimentation that the trained MAES models achieve task-size generalization, i.e., they are capable of handling sequential inputs 50 times longer than seen during training, with appropriately large memory modules. We demonstrate that the performance achieved by MAES far outperforms existing and well-known models such as the LSTM, NTM and DNC on the entire suite of tasks. |
Tasks | Multi-Task Learning, Transfer Learning |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.10847v1 |
http://arxiv.org/pdf/1809.10847v1.pdf | |
PWC | https://paperswithcode.com/paper/using-multi-task-and-transfer-learning-to |
Repo | |
Framework | |
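A deliberately minimal sketch of the encoder/solver split with a shared external memory, on a serial-recall working-memory task (hard location-based writes stand in for the paper's differentiable, attention-based controllers; nothing here is the MAES architecture itself):

```python
# Encoder phase: write each input into its own memory slot.
def encode(seq, memory_size):
    memory = [0.0] * memory_size
    for t, x in enumerate(seq):
        memory[t] = x
    return memory

# Solver phase: read the slots back in order to reproduce the sequence.
def solve(memory, length):
    return memory[:length]

seq = [3.0, 1.0, 4.0, 1.0, 5.0]
memory = encode(seq, memory_size=16)
recalled = solve(memory, len(seq))
```

The task-size generalization the paper reports corresponds, in this caricature, to the fact that the read/write rules are independent of sequence length: only the memory size bounds how long a sequence can be recalled.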
Neural Speech Synthesis with Transformer Network
Title | Neural Speech Synthesis with Transformer Network |
Authors | Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, Ming Liu, Ming Zhou |
Abstract | Although end-to-end neural text-to-speech (TTS) methods such as Tacotron2 have been proposed and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency during training and inference; 2) difficulty modeling long-range dependencies with current recurrent neural networks (RNNs). Inspired by the success of the Transformer network in neural machine translation (NMT), in this paper we introduce and adapt the multi-head attention mechanism to replace the RNN structures, as well as the original attention mechanism, in Tacotron2. With the help of multi-head self-attention, the hidden states in the encoder and decoder are constructed in parallel, which improves training efficiency. Meanwhile, any two inputs at different times are connected directly by the self-attention mechanism, which effectively solves the long-range dependency problem. Using phoneme sequences as input, our Transformer TTS network generates mel spectrograms, followed by a WaveNet vocoder to output the final audio. Experiments are conducted to test the efficiency and performance of the new network. In terms of efficiency, our Transformer TTS network speeds up training by about 4.25x compared with Tacotron2. In terms of performance, rigorous human tests show that our proposed model achieves state-of-the-art performance (outperforming Tacotron2 by a gap of 0.048) and is very close to human quality (4.39 vs. 4.44 in MOS). |
Tasks | Machine Translation, Speech Synthesis |
Published | 2018-09-19 |
URL | http://arxiv.org/abs/1809.08895v3 |
http://arxiv.org/pdf/1809.08895v3.pdf | |
PWC | https://paperswithcode.com/paper/neural-speech-synthesis-with-transformer |
Repo | |
Framework | |
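The core replacement — multi-head scaled dot-product self-attention over all positions in parallel — can be sketched in NumPy (dimensions and weights are arbitrary stand-ins; a real Transformer TTS adds positional encodings, masking, layer normalization, and feed-forward blocks):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product self-attention: every pair of positions is
    connected directly, and all positions are processed in parallel —
    the property that replaces Tacotron2's sequential RNNs."""
    T, d = X.shape
    dh = d // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(n_heads):
        q, k, v = (M[:, h * dh:(h + 1) * dh] for M in (Q, K, V))
        scores = softmax(q @ k.T / np.sqrt(dh))  # (T, T) attention weights
        heads.append(scores @ v)
    return np.concatenate(heads, axis=1) @ Wo

d, T, H = 8, 5, 2
X = rng.normal(size=(T, d))            # e.g. phoneme embeddings
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) for _ in range(4))
out = multi_head_self_attention(X, Wq, Wk, Wv, Wo, H)
```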
Multiaccuracy: Black-Box Post-Processing for Fairness in Classification
Title | Multiaccuracy: Black-Box Post-Processing for Fairness in Classification |
Authors | Michael P. Kim, Amirata Ghorbani, James Zou |
Abstract | Prediction systems are successfully deployed in applications ranging from disease diagnosis, to predicting creditworthiness, to image recognition. Even when the overall accuracy is high, these systems may exhibit systematic biases that harm specific subpopulations; such biases may arise inadvertently due to underrepresentation in the data used to train a machine-learning model, or as the result of intentional malicious discrimination. We develop a rigorous framework of multiaccuracy auditing and post-processing to ensure accurate predictions across identifiable subgroups. Our algorithm, MULTIACCURACY-BOOST, works in any setting where we have black-box access to a predictor and a relatively small set of labeled data for auditing; importantly, this black-box framework allows for improved fairness and accountability of predictions, even when the predictor is minimally transparent. We prove that MULTIACCURACY-BOOST converges efficiently and show that if the initial model is accurate on an identifiable subgroup, then so is the post-processed model. We experimentally demonstrate the effectiveness of the approach in improving accuracy among minority subgroups in diverse applications (image classification, finance, population health). Interestingly, MULTIACCURACY-BOOST can improve subpopulation accuracy (e.g. for “black women”) even when the sensitive features (e.g. “race”, “gender”) are not given to the algorithm explicitly. |
Tasks | Image Classification |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1805.12317v2 |
http://arxiv.org/pdf/1805.12317v2.pdf | |
PWC | https://paperswithcode.com/paper/multiaccuracy-black-box-post-processing-for |
Repo | |
Framework | |
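One auditing-and-update round can be sketched as follows. This is a simplification of MULTIACCURACY-BOOST — the real algorithm audits with a learned function over a rich class of subgroup tests, not a hard-coded pair of groups — run on a synthetic predictor that under-serves one subgroup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: a base classifier is reasonable overall but
# systematically under-predicts the positive class on one subgroup.
n = 1000
group = rng.integers(0, 2, size=n)                     # 0 = majority, 1 = minority
y = rng.integers(0, 2, size=n).astype(float)
p = np.clip(0.8 * y + 0.1 - 0.3 * group, 0.01, 0.99)   # biased predictor

def multiaccuracy_step(p, y, group, eta=1.0):
    """One auditing round (a sketch of the idea, not the paper's code):
    find the subgroup with the largest mean residual and shift its
    predictions toward the labels by a fraction eta of that residual."""
    residual = y - p
    gaps = [abs(residual[group == g].mean()) for g in (0, 1)]
    g = int(np.argmax(gaps))            # the audit flags this subgroup
    q = p.copy()
    q[group == g] += eta * residual[group == g].mean()
    return np.clip(q, 0.0, 1.0)

before = abs((y - p)[group == 1].mean())
p2 = multiaccuracy_step(p, y, group)
after = abs((y - p2)[group == 1].mean())
```

Crucially, the update only needs black-box access to `p` and a small labeled audit set, matching the setting the abstract describes.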
MSc Dissertation: Exclusive Row Biclustering for Gene Expression Using a Combinatorial Auction Approach
Title | MSc Dissertation: Exclusive Row Biclustering for Gene Expression Using a Combinatorial Auction Approach |
Authors | Amichai Painsky |
Abstract | The availability of large microarray data has led to a growing interest in biclustering methods in the past decade. Several algorithms have been proposed to identify subsets of genes and conditions according to different similarity measures and under varying constraints. In this paper we focus on the exclusive row biclustering problem for gene expression data sets, in which each row can only be a member of a single bicluster while columns can participate in multiple ones. This type of biclustering may be adequate, for example, for clustering groups of cancer patients where each patient (row) is expected to be carrying only a single type of cancer, while each cancer type is associated with multiple (and possibly overlapping) genes (columns). We present a novel method to identify these exclusive row biclusters through a combination of existing biclustering algorithms and combinatorial auction techniques. We devise an approach for tuning the threshold for our algorithm based on comparison to a null model, in the spirit of the Gap statistic approach. We demonstrate our approach on both synthetic and real-world gene expression data and show its power in identifying large-span, non-overlapping row submatrices while considering their unique nature. The Gap statistic approach succeeds in identifying appropriate thresholds in all our examples. |
Tasks | |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.05077v2 |
http://arxiv.org/pdf/1809.05077v2.pdf | |
PWC | https://paperswithcode.com/paper/msc-dissertation-exclusive-row-biclustering |
Repo | |
Framework | |
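The null-model thresholding idea can be sketched directly: score a planted bicluster, re-score it on shuffled (structure-free) copies of the matrix, and accept only if the observed score stands far above the null distribution. The matrix, the mean-based score, and the threshold below are all illustrative stand-ins for the paper's bicluster-quality measure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic expression matrix with one planted bicluster (rows 0-9,
# columns 0-4 shifted upward).
X = rng.normal(size=(30, 20))
X[:10, :5] += 2.0
score = X[:10, :5].mean()        # quality of the candidate bicluster

# Null model in the spirit of the Gap statistic: destroy all structure
# by shuffling entries, then re-score the same submatrix position.
null = []
for _ in range(50):
    Xs = rng.permutation(X.ravel()).reshape(X.shape)
    null.append(Xs[:10, :5].mean())

gap = (score - np.mean(null)) / np.std(null)   # score in null std units
accept = gap > 2.0                             # illustrative threshold
```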
Implementation of Deep Convolutional Neural Network in Multi-class Categorical Image Classification
Title | Implementation of Deep Convolutional Neural Network in Multi-class Categorical Image Classification |
Authors | Pushparaja Murugan |
Abstract | Convolutional Neural Networks have been implemented in many complex machine learning tasks, such as image classification, object identification, autonomous vehicles, and robotic vision. However, ConvNet architecture efficiency and accuracy depend on a large number of factors. Moreover, the complex architecture requires a significant amount of data to train and involves a large number of hyperparameters, which increases the computational expense and difficulty. Hence, it is necessary to address these limitations and the techniques to overcome them, to ensure that the architecture performs well in complex visual tasks. This article is intended to develop an efficient ConvNet architecture for multi-class categorical image classification. In the development of the architecture, a large pool of greyscale images is taken as input and split into training and test datasets. Commonly available techniques are implemented to reduce overfitting and poor generalization of the network. The hyperparameters are determined by Bayesian optimization with a Gaussian-process prior. The ReLU non-linear activation function is applied after the convolutional layers. Max pooling is carried out to downsample the data points in the pooling layers. A cross-entropy loss function is used to measure the performance of the architecture, with softmax used in the classification layer. Mini-batch gradient descent with the Adam optimizer is used for backpropagation. The developed architecture is validated with a confusion matrix and classification report. |
Tasks | Image Classification |
Published | 2018-01-03 |
URL | http://arxiv.org/abs/1801.01397v1 |
http://arxiv.org/pdf/1801.01397v1.pdf | |
PWC | https://paperswithcode.com/paper/implementation-of-deep-convolutional-neural |
Repo | |
Framework | |
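The forward pass described above — convolution, ReLU, max pooling, softmax with cross-entropy — can be traced on a toy single-channel example in NumPy (the real architecture stacks many such layers and is trained with Adam; this sketch omits training entirely, and all sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, kernel):
    """Valid-mode 2-D convolution, single channel, stride 1."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2(x):
    """2x2 max pooling: downsample by keeping the max of each block."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Forward pass on a fake 8x8 greyscale image, 4 output classes.
img = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))
W = 0.1 * rng.normal(size=(9, 4))            # flattened 3x3 map -> 4 classes
feat = maxpool2(relu(conv2d(img, kernel)))   # conv -> ReLU -> max pool
probs = softmax(feat.ravel() @ W)            # classification layer
loss = -np.log(probs[2])                     # cross-entropy, true class = 2
```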
Deep Auto-Set: A Deep Auto-Encoder-Set Network for Activity Recognition Using Wearables
Title | Deep Auto-Set: A Deep Auto-Encoder-Set Network for Activity Recognition Using Wearables |
Authors | Alireza Abedin Varamin, Ehsan Abbasnejad, Qinfeng Shi, Damith Ranasinghe, Hamid Rezatofighi |
Abstract | Automatic recognition of human activities from time-series sensor data (referred to as HAR) is a growing area of research in ubiquitous computing. Most recent research in the field adopts supervised deep learning paradigms to automate extraction of intrinsic features from raw signal inputs and addresses HAR as a multi-class classification problem where detecting a single activity class within the duration of a sensory data segment suffices. However, due to the innate diversity of human activities and their corresponding duration, no data segment is guaranteed to contain sensor recordings of a single activity type. In this paper, we express HAR more naturally as a set prediction problem where the predictions are sets of ongoing activity elements with unfixed and unknown cardinality. For the first time, we address this problem by presenting a novel HAR approach that learns to output activity sets using deep neural networks. Moreover, motivated by the limited availability of annotated HAR datasets as well as the unfortunate immaturity of existing unsupervised systems, we complement our supervised set learning scheme with a prior unsupervised feature learning process that adopts convolutional auto-encoders to exploit unlabeled data. The empirical experiments on two widely adopted HAR datasets demonstrate the substantial improvement of our proposed methodology over the baseline models. |
Tasks | Activity Recognition, Time Series |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08127v1 |
http://arxiv.org/pdf/1811.08127v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-auto-set-a-deep-auto-encoder-set-network |
Repo | |
Framework | |
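The set-prediction formulation can be illustrated with a thresholding sketch: per-activity scores become a set of unknown cardinality rather than a single argmax class. The activity names, scores, and threshold are invented for illustration — the paper learns the set output end-to-end with deep networks:

```python
import numpy as np

# Hypothetical per-activity scores for one sensor window; note the window
# may contain more than one ongoing activity.
activities = ["walk", "sit", "type", "drink"]
scores = np.array([0.91, 0.08, 0.77, 0.40])

def predict_set(scores, labels, tau=0.5):
    """Return the set of activities whose score clears the threshold —
    a set prediction with unfixed, unknown cardinality."""
    return {l for l, s in zip(labels, scores) if s >= tau}

pred = predict_set(scores, activities)
```

A single-label softmax would be forced to pick one of "walk" or "type"; the set formulation keeps both.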
Compressive Hyperspectral Imaging: Fourier Transform Interferometry meets Single Pixel Camera
Title | Compressive Hyperspectral Imaging: Fourier Transform Interferometry meets Single Pixel Camera |
Authors | Amirafshar Moshtaghpour, José M. Bioucas-Dias, Laurent Jacques |
Abstract | This paper introduces a single-pixel HyperSpectral (HS) imaging framework based on Fourier Transform Interferometry (FTI). By combining a space-time coding of the light illumination with partial interferometric observations of a collimated light beam (observed by a single pixel), our system benefits from (i) reduced measurement rate and light-exposure of the observed object compared to common (Nyquist) FTI imagers, and (ii) high spectral resolution as desirable in, e.g., Fluorescence Spectroscopy (FS). From the principles of compressive sensing with multilevel sampling, our method leverages the sparsity “in level” of FS data, both in the spectral and the spatial domains. This allows us to optimize the space-time light coding using time-modulated Hadamard patterns. We confirm the effectiveness of our approach by a few numerical experiments. |
Tasks | Compressive Sensing |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.00950v1 |
http://arxiv.org/pdf/1809.00950v1.pdf | |
PWC | https://paperswithcode.com/paper/compressive-hyperspectral-imaging-fourier |
Repo | |
Framework | |
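A toy version of the pipeline — Hadamard-coded single-pixel measurements of a sparse scene, followed by greedy sparse recovery — can be written as follows. The paper's actual reconstruction relies on multilevel-sampling compressive sensing theory; orthogonal matching pursuit is substituted here as a simple stand-in solver, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

# Single-pixel setup: modulate the scene with m Hadamard patterns and
# record one number per pattern (m << n measurements).
n, m = 64, 24
H = hadamard(n)
rows = rng.choice(n, size=m, replace=False)
A = H[rows] / np.sqrt(n)

x = np.zeros(n)
x[[3, 17, 40]] = [2.0, -1.5, 1.0]   # sparse scene
y = A @ x                           # single-pixel readings

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily pick the atom most
    correlated with the residual, then refit by least squares."""
    r, support = y.copy(), []
    for _ in range(k):
        i = int(np.argmax(np.abs(A.T @ r)))
        if i not in support:
            support.append(i)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

x_hat = omp(A, y, 5)   # a few extra iterations beyond the sparsity level
```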
Online local pool generation for dynamic classifier selection: an extended version
Title | Online local pool generation for dynamic classifier selection: an extended version |
Authors | Mariana A. Souza, George D. C. Cavalcanti, Rafael M. O. Cruz, Robert Sabourin |
Abstract | Dynamic Classifier Selection (DCS) techniques have difficulty in selecting the most competent classifier in a pool, even when its presence is assured. Since the DCS techniques rely only on local data to estimate a classifier’s competence, the manner in which the pool is generated could affect the choice of the best classifier for a given sample. That is, the global perspective in which pools are generated may not help the DCS techniques in selecting a competent classifier for samples that are likely to be mislabelled. Thus, we propose in this work an online pool generation method that produces a locally accurate pool for test samples in difficult regions of the feature space. The difficulty of a given area is determined by the classification difficulty of the samples in it. That way, by using classifiers that were generated in a local scope, it could be easier for the DCS techniques to select the best one for the difficult samples. For the query samples in easy regions, a simple nearest neighbors rule is used. In the extended version of this work, a deep analysis on the correlation between instance hardness and the performance of DCS techniques is presented. An instance hardness measure that conveys the degree of local class overlap is then used to decide when the local pool is used in the proposed scheme. The proposed method yielded significantly greater recognition rates in comparison to a Bagging-generated pool and two other global pool generation schemes for all DCS techniques evaluated. The proposed scheme’s performance was also significantly superior to three state-of-the-art classification models and statistically equivalent to five of them. Moreover, an extended analysis on the computational complexity of the proposed method and of several DS techniques is presented in this version. We also provide the implementation of the proposed technique using the DESLib library on GitHub. |
Tasks | |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01628v1 |
http://arxiv.org/pdf/1809.01628v1.pdf | |
PWC | https://paperswithcode.com/paper/online-local-pool-generation-for-dynamic |
Repo | |
Framework | |
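A common kNN-style instance-hardness measure conveys the intuition behind switching between the global rule and the local pool. The data and neighbourhood size below are toy stand-ins; the paper uses a local class-overlap measure in this spirit:

```python
import numpy as np

def knn_hardness(X, y, i, k=3):
    """Fraction of sample i's k nearest neighbours whose label disagrees
    with y[i]: near 0 in 'easy' regions, near 1 in overlap regions where
    a locally generated pool is worth the extra cost."""
    d = np.linalg.norm(X - X[i], axis=1)
    nn = np.argsort(d)[1:k + 1]          # skip the sample itself
    return float(np.mean(y[nn] != y[i]))

# Toy 1-D data: class 0 clustered on the left, class 1 on the right,
# plus one class-1 point deep inside class-0 territory.
X = np.array([[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [0.35]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

easy = knn_hardness(X, y, 0)   # surrounded by its own class
hard = knn_hardness(X, y, 7)   # isolated among the other class
```

In the proposed scheme a query landing in a low-hardness region would get the simple nearest-neighbours rule, while a high-hardness region would trigger online generation of a local pool.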
Deep Learning Topological Invariants of Band Insulators
Title | Deep Learning Topological Invariants of Band Insulators |
Authors | Ning Sun, Jinmin Yi, Pengfei Zhang, Huitao Shen, Hui Zhai |
Abstract | In this work we design and train deep neural networks to predict topological invariants for one-dimensional four-band insulators in AIII class whose topological invariant is the winding number, and two-dimensional two-band insulators in A class whose topological invariant is the Chern number. Given Hamiltonians in the momentum space as the input, neural networks can predict topological invariants for both classes with accuracy close to or higher than 90%, even for Hamiltonians whose invariants are beyond the training data set. Despite the complexity of the neural network, we find that the output of certain intermediate hidden layers resembles either the winding angle for models in AIII class or the solid angle (Berry curvature) for models in A class, indicating that neural networks essentially capture the mathematical formula of topological invariants. Our work demonstrates the ability of neural networks to predict topological invariants for complicated models with local Hamiltonians as the only input, and offers an example that even a deep neural network is understandable. |
Tasks | |
Published | 2018-05-26 |
URL | http://arxiv.org/abs/1805.10503v2 |
http://arxiv.org/pdf/1805.10503v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-topological-invariants-of-band |
Repo | |
Framework | |
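For the AIII-class case, the quantity the networks are trained to predict has a direct formula: the winding number of the off-diagonal Hamiltonian vector (h_x(k), h_y(k)) around the origin. The sketch below computes it for a simplified two-band chiral (SSH-like) model — this is the labeling formula, not the paper's neural network, and the model is an illustrative stand-in for their four-band Hamiltonians:

```python
import numpy as np

def winding_number(hx, hy):
    """Accumulate the change of the winding angle arctan2(hy, hx) around
    the Brillouin zone and divide by 2*pi."""
    theta = np.unwrap(np.arctan2(hy, hx))
    return int(np.rint((theta[-1] - theta[0]) / (2 * np.pi)))

k = np.linspace(0, 2 * np.pi, 401)

# SSH-like chiral model h(k) = (m + cos k, sin k): the curve encircles
# the origin (winding 1) for |m| < 1 and misses it (winding 0) otherwise.
results = {m: winding_number(m + np.cos(k), np.sin(k)) for m in (0.5, 1.5)}
```

The paper's observation is that intermediate layers of the trained network resemble exactly this winding angle, i.e. the network rediscovers the formula.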