January 28, 2020

3196 words 16 mins read

Paper Group ANR 853

Degraded Historical Documents Images Binarization Using a Combination of Enhanced Techniques. Open Set Domain Adaptation for Image and Action Recognition. Predicting Soil pH by Using Nearest Fields. The Semantic Mutex Watershed for Efficient Bottom-Up Semantic Instance Segmentation. On the Vulnerability of CNN Classifiers in EEG-Based BCIs. Develop …

Degraded Historical Documents Images Binarization Using a Combination of Enhanced Techniques

Title Degraded Historical Documents Images Binarization Using a Combination of Enhanced Techniques
Authors Omar Boudraa, Walid Khaled Hidouci, Dominique Michelucci
Abstract Document image binarization is the initial and a crucial step in many document analysis and recognition schemes. In fact, it remains a relevant research subject and a fundamental challenge due to its importance and influence. This paper presents an original multi-phase system that hybridizes several efficient image thresholding methods in order to get the best binarization output. First, to improve contrast in particularly defective images, the application of the CLAHE algorithm is suggested and justified. We then use a cooperative technique to segment the image into two separate classes. Finally, a special transformation is applied for the purpose of removing scattered noise and correcting character shapes. Experiments on degraded historical document images from three benchmarks demonstrate the precision and robustness of our framework compared to other noted methods.
Tasks
Published 2019-01-27
URL http://arxiv.org/abs/1901.09425v1
PDF http://arxiv.org/pdf/1901.09425v1.pdf
PWC https://paperswithcode.com/paper/degraded-historical-documents-images
Repo
Framework
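
The pipeline above combines CLAHE contrast enhancement with a cooperative thresholding step. As a minimal sketch of the thresholding idea (not the authors' implementation, which hybridizes several methods), here is Otsu's classical global threshold in pure Python: it picks the gray level that maximizes between-class variance, then binarizes.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes between-class variance (Otsu)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg, sum_bg = 0, 0.0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]                      # background pixel count up to t
        if w_bg == 0:
            continue
        w_fg = total - w_bg                  # foreground pixel count
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Toy "scan": dark ink values vs. bright paper values.
pixels = [10, 12, 11, 200, 210, 205, 15, 198]
t = otsu_threshold(pixels)
binary = [0 if p <= t else 255 for p in pixels]   # 0 = ink, 255 = background
```

On a real degraded document, this would run after CLAHE and before the noise-removal transformation described in the abstract.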

Open Set Domain Adaptation for Image and Action Recognition

Title Open Set Domain Adaptation for Image and Action Recognition
Authors Pau Panareda Busto, Ahsan Iqbal, Juergen Gall
Abstract Since annotating and curating large datasets is very expensive, there is a need to transfer the knowledge from existing annotated datasets to unlabelled data. Data that is relevant for a specific application, however, usually differs from publicly available datasets since it is sampled from a different domain. While domain adaptation methods compensate for such a domain shift, they assume that all categories in the target domain are known and match the categories in the source domain. Since this assumption is violated under real-world conditions, we propose an approach for open set domain adaptation where the target domain contains instances of categories that are not present in the source domain. The proposed approach achieves state-of-the-art results on various datasets for image classification and action recognition. Since the approach can be used for open set and closed set domain adaptation, as well as unsupervised and semi-supervised domain adaptation, it is a versatile tool for many applications.
Tasks Domain Adaptation, Image Classification
Published 2019-07-30
URL https://arxiv.org/abs/1907.12865v1
PDF https://arxiv.org/pdf/1907.12865v1.pdf
PWC https://paperswithcode.com/paper/open-set-domain-adaptation-for-image-and
Repo
Framework

Predicting Soil pH by Using Nearest Fields

Title Predicting Soil pH by Using Nearest Fields
Authors Quoc Hung Ngo, Nhien-An Le-Khac, Tahar Kechadi
Abstract In precision agriculture (PA), soil sampling and testing is carried out before planting any new crop. It is an expensive operation, since there are many soil characteristics to take into account. This paper gives an overview of soil characteristics and their relationships with crop yield and soil profiling. We propose an approach for predicting soil pH based on nearest neighbour fields. It implements spatial radius queries and various regression techniques in data mining. We use a soil dataset containing about 4,000 field profiles to evaluate them and analyse their robustness. A comparative study indicates that the LR, SVR, and GBRT techniques achieved high accuracy, with R² values of about 0.718 and MAE values of about 0.29. The experimental results show that the proposed approach is very promising and can contribute significantly to PA.
Tasks
Published 2019-12-03
URL https://arxiv.org/abs/1912.01303v1
PDF https://arxiv.org/pdf/1912.01303v1.pdf
PWC https://paperswithcode.com/paper/predicting-soil-ph-by-using-nearest-fields
Repo
Framework
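
The nearest-field idea can be sketched in a few lines: predict a field's pH as the inverse-distance-weighted mean pH of known fields inside a spatial radius. The radius, weighting scheme, and field records below are illustrative assumptions; the paper additionally fits LR, SVR, and GBRT regressors on the retrieved neighbours.

```python
import math

def predict_ph(target, fields, radius=10.0):
    """Predict soil pH at `target` (x, y) as the inverse-distance-weighted
    mean pH of known fields within `radius` (a spatial radius query)."""
    num = den = 0.0
    for (x, y, ph) in fields:
        d = math.hypot(x - target[0], y - target[1])
        if d <= radius:
            w = 1.0 / (d + 1e-9)     # small epsilon guards coincident fields
            num += w * ph
            den += w
    if den == 0.0:
        raise ValueError("no neighbouring fields within radius")
    return num / den

# Two nearby fields and one far-away field (hypothetical coordinates, km).
fields = [(0.0, 0.0, 6.2), (1.0, 1.0, 6.8), (50.0, 50.0, 5.0)]
prediction = predict_ph((0.5, 0.5), fields)
```

The far field falls outside the radius query and does not influence the prediction.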

The Semantic Mutex Watershed for Efficient Bottom-Up Semantic Instance Segmentation

Title The Semantic Mutex Watershed for Efficient Bottom-Up Semantic Instance Segmentation
Authors Steffen Wolf, Yuyan Li, Constantin Pape, Alberto Bailoni, Anna Kreshuk, Fred A. Hamprecht
Abstract Semantic instance segmentation is the task of simultaneously partitioning an image into distinct segments while associating each pixel with a class label. In commonly used pipelines, segmentation and label assignment are solved separately since joint optimization is computationally expensive. We propose a greedy algorithm for joint graph partitioning and labeling derived from the efficient Mutex Watershed partitioning algorithm. It optimizes an objective function closely related to the Symmetric Multiway Cut objective and empirically shows efficient scaling behavior. Due to the algorithm’s efficiency, it can operate directly on pixels without prior over-segmentation of the image into superpixels. We evaluate the performance on the Cityscapes dataset (2D urban scenes) and on a 3D microscopy volume. In urban scenes, the proposed algorithm combined with current deep neural networks outperforms the strong baseline of ‘Panoptic Feature Pyramid Networks’ by Kirillov et al. (2019). On the 3D electron microscopy images, we show explicitly that our joint formulation outperforms a separate optimization of the partitioning and labeling problems.
Tasks Graph Partitioning, Instance Segmentation, Semantic Segmentation
Published 2019-12-29
URL https://arxiv.org/abs/1912.12717v1
PDF https://arxiv.org/pdf/1912.12717v1.pdf
PWC https://paperswithcode.com/paper/the-semantic-mutex-watershed-for-efficient
Repo
Framework
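
The Mutex Watershed that the paper builds on is a greedy clustering over a graph with attractive and repulsive edges: edges are visited in order of decreasing strength, attractive edges merge clusters unless a mutex (repulsive) constraint forbids it, and repulsive edges record new constraints. The sketch below, on a toy 4-node graph, shows this base algorithm only; the semantic variant's joint class-label assignment is omitted.

```python
class MutexWatershed:
    """Simplified Mutex Watershed clustering with union-find plus
    mutex (cannot-merge) constraints between cluster roots."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.mutex = [set() for _ in range(n)]   # forbidden merges per root

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def cluster(self, edges):
        # edges: (weight, u, v, attractive); greedily process by weight.
        for w, u, v, attractive in sorted(edges, key=lambda e: -e[0]):
            ru, rv = self.find(u), self.find(v)
            if ru == rv:
                continue
            if attractive:
                if rv in self.mutex[ru]:
                    continue                     # a stronger repulsion forbids it
                self.parent[rv] = ru             # merge rv into ru
                for m in self.mutex[rv]:         # transfer rv's constraints
                    self.mutex[m].discard(rv)
                    self.mutex[m].add(ru)
                    self.mutex[ru].add(m)
                self.mutex[rv] = set()
            else:
                self.mutex[ru].add(rv)           # record a mutual exclusion
                self.mutex[rv].add(ru)

# Nodes 0-1 and 2-3 attract; a strong repulsion separates 1 and 2.
mw = MutexWatershed(4)
mw.cluster([(0.95, 1, 2, False), (0.9, 0, 1, True),
            (0.8, 2, 3, True), (0.5, 1, 2, True)])
```

The weak attractive edge between 1 and 2 is blocked by the earlier, stronger mutex, so two clusters remain.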

On the Vulnerability of CNN Classifiers in EEG-Based BCIs

Title On the Vulnerability of CNN Classifiers in EEG-Based BCIs
Authors Xiao Zhang, Dongrui Wu
Abstract Deep learning has been successfully used in numerous applications because of its outstanding performance and the ability to avoid manual feature engineering. One such application is electroencephalogram (EEG) based brain-computer interface (BCI), where multiple convolutional neural network (CNN) models have been proposed for EEG classification. However, it has been found that deep learning models can be easily fooled with adversarial examples, which are normal examples with small deliberate perturbations. This paper proposes an unsupervised fast gradient sign method (UFGSM) to attack three popular CNN classifiers in BCIs, and demonstrates its effectiveness. We also verify the transferability of adversarial examples in BCIs, which means we can perform attacks even without knowing the architecture and parameters of the target models, or the datasets they were trained on. To our knowledge, this is the first study on the vulnerability of CNN classifiers in EEG-based BCIs, and hopefully will trigger more attention on the security of BCI systems.
Tasks EEG, Feature Engineering
Published 2019-03-31
URL http://arxiv.org/abs/1904.01002v1
PDF http://arxiv.org/pdf/1904.01002v1.pdf
PWC https://paperswithcode.com/paper/on-the-vulnerability-of-cnn-classifiers-in
Repo
Framework
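
The paper's UFGSM builds on the fast gradient sign method (FGSM): perturb the input by ε times the sign of the loss gradient with respect to the input. To keep the sketch self-contained it attacks a toy logistic-regression "classifier" with an analytic gradient, not the EEG CNNs from the paper, and the unsupervised aspect of UFGSM is not modelled.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, y, w, b, eps):
    """FGSM step: x_adv = x + eps * sign(dL/dx), where L is cross-entropy
    loss of p = sigmoid(w.x + b) and dL/dx = (p - y) * w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1                 # correctly classified: w.x = 0.8 > 0
x_adv = fgsm(x, y, w, b, eps=0.5)
score_adv = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
```

A small, targeted perturbation is enough to push the score across the decision boundary, which is the vulnerability the paper demonstrates for EEG classifiers.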

Development of an Entropy-Based Feature Selection Method and Analysis of Online Reviews on Real Estate

Title Development of an Entropy-Based Feature Selection Method and Analysis of Online Reviews on Real Estate
Authors Hiroki Horino, Hirofumi Nonaka, Elisa Claire Alemán Carreón, Toru Hiraoka
Abstract In recent years, the amount of data posted about real estate on the Internet has been increasing. In this study, in order to analyze user needs for real estate, we focus on “Mansion Community”, a Japanese bulletin board system (hereinafter referred to as BBS) about Japanese real estate. In our study, keywords are extracted based on the entropy value of each word, and we use them as features in a machine learning classifier to analyze 6 million posts on “Mansion Community”. As a result, we achieved an F-measure of 0.69 and found that customers are particularly concerned about an apartment’s facilities, access, and price.
Tasks Feature Selection
Published 2019-04-23
URL http://arxiv.org/abs/1904.11797v1
PDF http://arxiv.org/pdf/1904.11797v1.pdf
PWC https://paperswithcode.com/paper/development-of-an-entropy-based-feature
Repo
Framework
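
The entropy-based selection can be illustrated with Shannon entropy over a word's occurrence distribution. The exact formulation in the paper is not given here; the sketch below uses one common convention, where a word concentrated in few categories has low entropy and makes a good discriminative feature, while an evenly spread word is dropped. The counts and cut-off are hypothetical.

```python
import math

def word_entropy(counts):
    """Shannon entropy (bits) of a word's occurrence distribution.
    Low entropy -> the word concentrates in few categories."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Occurrence counts of each word over 3 hypothetical review categories.
occurrences = {
    "price": [40, 2, 1],     # concentrated -> low entropy, keep as feature
    "the":   [30, 31, 29],   # evenly spread -> high entropy, drop
}
keywords = [w for w, c in occurrences.items() if word_entropy(c) < 1.0]
```

The surviving keywords would then serve as features for the downstream classifier.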

End-to-end Adaptation with Backpropagation through WFST for On-device Speech Recognition System

Title End-to-end Adaptation with Backpropagation through WFST for On-device Speech Recognition System
Authors Emiru Tsunoo, Yosuke Kashiwagi, Satoshi Asakawa, Toshiyuki Kumakura
Abstract An on-device DNN-HMM speech recognition system efficiently works with a limited vocabulary in the presence of a variety of predictable noise. In such a case, vocabulary and environment adaptation is highly effective. In this paper, we propose a novel method of end-to-end (E2E) adaptation, which adjusts not only an acoustic model (AM) but also a weighted finite-state transducer (WFST). We convert a pretrained WFST to a trainable neural network and adapt the system to target environments/vocabulary by E2E joint training with an AM. We replicate Viterbi decoding with forward–backward neural network computation, which is similar to recurrent neural networks (RNNs). By pooling output score sequences, a vocabulary posterior for each utterance is obtained and used for discriminative loss computation. Experiments using 2–10 hours of English/Japanese adaptation datasets indicate that the fine-tuning of only WFSTs and that of only AMs are both comparable to a state-of-the-art adaptation method, and E2E joint training of the two components achieves the best recognition performance. We also adapt each language system to the other language using the adaptation data, and the results show that the proposed method also works well for language adaptations.
Tasks Speech Recognition
Published 2019-05-17
URL https://arxiv.org/abs/1905.07149v3
PDF https://arxiv.org/pdf/1905.07149v3.pdf
PWC https://paperswithcode.com/paper/end-to-end-adaptation-with-backpropagation
Repo
Framework
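
The computation the paper makes trainable is essentially Viterbi decoding unrolled as a forward recurrence. As a point of reference, here is the standard max-sum Viterbi recursion in the log domain; the paper replaces the hard max with differentiable network operations over the WFST, which this sketch does not attempt. The toy transition and emission scores are made up.

```python
def viterbi(log_emit, log_trans, log_init):
    """Max-sum Viterbi over T frames and S states, all in log scores.
    Returns the best state path and its total log score."""
    T, S = len(log_emit), len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(S)]
    back = []
    for t in range(1, T):
        new, ptr = [], []
        for s in range(S):
            best_prev = max(range(S), key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new.append(score[best_prev] + log_trans[best_prev][s] + log_emit[t][s])
        score, back = new, back + [ptr]
    path = [max(range(S), key=lambda s: score[s])]
    for ptr in reversed(back):         # backtrace through the pointers
        path.append(ptr[path[-1]])
    return list(reversed(path)), max(score)

# Two states; emissions favour state 0 first, then state 1.
log_init = [0.0, -1.0]
log_trans = [[-0.1, -2.3], [-2.3, -0.1]]
log_emit = [[-0.1, -2.3], [-2.3, -0.1], [-2.3, -0.1]]
path, best = viterbi(log_emit, log_trans, log_init)
```

Pooling the per-frame scores of such a decoder over an utterance yields the vocabulary posterior the paper uses for its discriminative loss.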

Face Video Generation from a Single Image and Landmarks

Title Face Video Generation from a Single Image and Landmarks
Authors Kritaphat Songsri-in, Stefanos Zafeiriou
Abstract In this paper we are concerned with the challenging problem of producing a full image sequence of a deformable face given only an image and generic facial motions encoded by a set of sparse landmarks. To this end we build upon recent breakthroughs in image-to-image translation such as pix2pix, CycleGAN and StarGAN, which learn Deep Convolutional Neural Networks (DCNNs) that map aligned pairs of images between different domains (i.e., having different labels), and propose a new architecture which is driven not by labels but by spatial maps: facial landmarks. In particular, we propose MotionGAN, which transforms an input face image into a new one according to a heatmap of target landmarks. We show that it is possible to create very realistic face videos using a single image and a set of target landmarks. Furthermore, our method can be used to edit a facial image with arbitrary motions according to landmarks (e.g., expression, speech, etc.). This provides much more flexibility for face editing, expression transfer, facial video creation, etc. than models based on discrete expressions, audio or action units.
Tasks Image-to-Image Translation, Video Generation
Published 2019-04-25
URL http://arxiv.org/abs/1904.11521v1
PDF http://arxiv.org/pdf/1904.11521v1.pdf
PWC https://paperswithcode.com/paper/face-video-generation-from-a-single-image-and
Repo
Framework

VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019

Title VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019
Authors Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura
Abstract We describe our submitted system for the ZeroSpeech Challenge 2019. The current challenge theme addresses the difficulty of constructing a speech synthesizer without any text or phonetic labels and requires a system that can (1) discover subword units in an unsupervised way, and (2) synthesize the speech with a target speaker’s voice. Moreover, the system should also balance the discrimination score ABX, the bit-rate compression rate, and the naturalness and the intelligibility of the constructed voice. To tackle these problems and achieve the best trade-off, we utilize a vector quantized variational autoencoder (VQ-VAE) and a multi-scale codebook-to-spectrogram (Code2Spec) inverter trained by mean square error and adversarial loss. The VQ-VAE extracts the speech to a latent space, forces itself to map it into the nearest codebook and produces compressed representation. Next, the inverter generates a magnitude spectrogram to the target voice, given the codebook vectors from VQ-VAE. In our experiments, we also investigated several other clustering algorithms, including K-Means and GMM, and compared them with the VQ-VAE result on ABX scores and bit rates. Our proposed approach significantly improved the intelligibility (in CER), the MOS, and discrimination ABX scores compared to the official ZeroSpeech 2019 baseline or even the topline.
Tasks
Published 2019-05-27
URL https://arxiv.org/abs/1905.11449v2
PDF https://arxiv.org/pdf/1905.11449v2.pdf
PWC https://paperswithcode.com/paper/vqvae-unsupervised-unit-discovery-and-multi
Repo
Framework
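
The discrete-unit discovery hinges on the VQ-VAE bottleneck: each latent vector is snapped to its nearest codebook entry, yielding a unit index (for the bit rate) and a quantized vector (for the Code2Spec inverter). A minimal sketch of that quantization step, with a fixed toy codebook; in the paper the codebook is learned jointly and gradients pass through via the straight-through estimator, both omitted here.

```python
def quantize(z, codebook):
    """VQ-VAE bottleneck: map latent vector z to its nearest codebook
    entry by squared Euclidean distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    idx = min(range(len(codebook)), key=lambda k: dist2(z, codebook[k]))
    return idx, codebook[idx]

# A fixed 3-entry codebook for illustration (learned in the real model).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
idx, zq = quantize([0.9, 1.2], codebook)
```

The sequence of indices is the discovered subword-unit transcription; the sequence of quantized vectors feeds the spectrogram inverter.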

Long-Term Human Video Generation of Multiple Futures Using Poses

Title Long-Term Human Video Generation of Multiple Futures Using Poses
Authors Naoya Fushishita, Antonio Tejero-de-Pablos, Yusuke Mukuta, Tatsuya Harada
Abstract Predicting future human behavior from an input human video is a useful task for applications such as autonomous driving and robotics. While most previous works predict a single future, multiple futures with different behavior can potentially occur. Moreover, if the predicted future is too short (e.g., less than one second), it may not be fully usable by a human or other systems. In this paper, we propose a novel method for future human pose prediction capable of predicting multiple long-term futures. This makes the predictions more suitable for real applications. Also, from the input video and the predicted human behavior, we generate future videos. First, from an input human video, we generate sequences of future human poses (i.e., the image coordinates of their body-joints) via adversarial learning. Adversarial learning suffers from mode collapse, which makes it difficult to generate a variety of multiple poses. We solve this problem by utilizing two additional inputs to the generator to make the outputs diverse, namely, a latent code (to reflect various behaviors) and an attraction point (to reflect various trajectories). In addition, we generate long-term future human poses using a novel approach based on unidimensional convolutional neural networks. Last, we generate an output video based on the generated poses for visualization. We evaluate the generated future poses and videos using three criteria (i.e., realism, diversity and accuracy), and show that our proposed method outperforms other state-of-the-art works.
Tasks Autonomous Driving, Pose Prediction, Video Generation, Video Prediction
Published 2019-04-16
URL https://arxiv.org/abs/1904.07538v3
PDF https://arxiv.org/pdf/1904.07538v3.pdf
PWC https://paperswithcode.com/paper/long-term-video-generation-of-multiple
Repo
Framework

Driver Identification Based on Vehicle Telematics Data using LSTM-Recurrent Neural Network

Title Driver Identification Based on Vehicle Telematics Data using LSTM-Recurrent Neural Network
Authors Abenezer Girma, Xuyang Yan, Abdollah Homaifar
Abstract Despite advancements in vehicle security systems, auto-theft rates have increased over the last decade, and cyber-security attacks on internet-connected and autonomous vehicles are becoming a new threat. In this paper, a deep learning model is proposed that can identify drivers from their driving behavior based on vehicle telematics data. The proposed Long Short-Term Memory (LSTM) model predicts the identity of the driver from the individual’s unique driving patterns learned from the vehicle telematics data. Given that telematics is time-series data, the problem is formulated as a time series prediction task to exploit the embedded sequential information. The performance of the proposed approach is evaluated on three naturalistic driving datasets, on which it gives highly accurate predictions. The robustness of the model to noisy and anomalous data, usually caused by sensor defects or environmental factors, is also investigated. Results show that the proposed model’s prediction accuracy remains satisfactory and outperforms the other approaches despite the extent of anomalies and noise induced in the data.
Tasks Autonomous Vehicles, Time Series, Time Series Prediction
Published 2019-11-19
URL https://arxiv.org/abs/1911.08030v1
PDF https://arxiv.org/pdf/1911.08030v1.pdf
PWC https://paperswithcode.com/paper/driver-identification-based-on-vehicle
Repo
Framework
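
Formulating telematics as a sequence task starts with framing the sensor streams into fixed-length windows that a recurrent model can consume, each carrying the driver-identity label. A minimal sketch of that framing step (the window length, stride, and single toy channel are illustrative; the LSTM itself is not shown):

```python
def make_windows(series, labels, window, step):
    """Frame per-trip telematics readings into fixed-length sequences for
    a sequence classifier; each window keeps its driver-identity label."""
    X, y = [], []
    for t in range(0, len(series) - window + 1, step):
        X.append(series[t:t + window])
        y.append(labels[t + window - 1])   # identity at the window's end
    return X, y

speed = [30, 32, 35, 33, 31, 29, 28, 30]   # toy telematics channel (km/h)
driver = ["A"] * len(speed)
X, y = make_windows(speed, driver, window=4, step=2)
```

Real telematics would stack several channels per time step (speed, throttle, steering, etc.) before feeding the windows to the LSTM.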

Fuzzy adaptive teaching learning-based optimization strategy for the problem of generating mixed strength t-way test suites

Title Fuzzy adaptive teaching learning-based optimization strategy for the problem of generating mixed strength t-way test suites
Authors Kamal Z. Zamli, Fakhrud Din, Salmi Baharom, Bestoun S. Ahmed
Abstract The teaching learning-based optimization (TLBO) algorithm has shown competitive performance in solving numerous real-world optimization problems. Nevertheless, this algorithm requires better control of exploitation and exploration to prevent premature convergence (i.e., being trapped in local optima) and to enhance solution diversity. Thus, this paper proposes a new TLBO variant based on a Mamdani fuzzy inference system, called ATLBO, which permits adaptive selection of its global and local search operations. To assess its performance, we adopt ATLBO for the mixed strength t-way test generation problem. Experimental results reveal that ATLBO exhibits competitive performance against the original TLBO and other meta-heuristic counterparts.
Tasks
Published 2019-04-10
URL https://arxiv.org/abs/1906.08855v1
PDF https://arxiv.org/pdf/1906.08855v1.pdf
PWC https://paperswithcode.com/paper/fuzzy-adaptive-teaching-learning-based
Repo
Framework
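
For readers unfamiliar with the base algorithm: TLBO evolves a population through a teacher phase (move toward the best solution, away from the class mean) and a learner phase (learn from a random classmate). Below is a plain TLBO sketch on the sphere function; the fuzzy adaptive phase selection that defines ATLBO, and the t-way test-suite encoding, are not modelled here.

```python
import random

def sphere(x):
    """Classic benchmark objective: minimum 0 at the origin."""
    return sum(v * v for v in x)

def tlbo(obj, dim, pop_size=20, iters=100, lo=-5.0, hi=5.0, seed=0):
    """Plain TLBO (teacher + learner phases), minimizing `obj`."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(iters):
        teacher = min(pop, key=obj)
        mean = [sum(p[d] for p in pop) / pop_size for d in range(dim)]
        for i, x in enumerate(pop):
            # Teacher phase: shift toward the teacher, away from the mean.
            tf = rng.choice([1, 2])          # teaching factor
            cand = [x[d] + rng.random() * (teacher[d] - tf * mean[d])
                    for d in range(dim)]
            if obj(cand) < obj(x):
                pop[i] = x = cand
            # Learner phase: move toward (or away from) a random classmate.
            other = pop[rng.randrange(pop_size)]
            if obj(other) < obj(x):
                cand = [x[d] + rng.random() * (other[d] - x[d]) for d in range(dim)]
            else:
                cand = [x[d] + rng.random() * (x[d] - other[d]) for d in range(dim)]
            if obj(cand) < obj(x):
                pop[i] = cand
    return min(pop, key=obj)

best = tlbo(sphere, dim=3)
```

ATLBO's contribution is to let a Mamdani fuzzy system decide, per iteration, how the two phases are applied to balance exploration and exploitation.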

Detecting Activities of Daily Living and Routine Behaviours in Dementia Patients Living Alone Using Smart Meter Load Disaggregation

Title Detecting Activities of Daily Living and Routine Behaviours in Dementia Patients Living Alone Using Smart Meter Load Disaggregation
Authors C. Chalmers, P. Fergus, C. Aday Curbelo Montanez, S. Sikdar, F. Ball, B. Kendall
Abstract The emergence of an ageing population is a significant public health concern. This has led to an increase in the number of people living with progressive neurodegenerative disorders like dementia. Consequently, the strain this places on health and social care services means that providing 24-hour monitoring is not sustainable. Technological intervention is being considered; however, no solution exists to non-intrusively monitor the independent living needs of patients with dementia. As a result, many patients reach crisis point before intervention and support are provided. In parallel, patient care relies on feedback from informal carers about significant behavioural changes. Yet not all people have a social support network, and early intervention in dementia care is often missed. The smart meter rollout has the potential to change this. Using machine learning and signal processing techniques, a home energy supply can be disaggregated to detect which home appliances are turned on and off. This allows Activities of Daily Living (ADLs), such as eating and drinking, to be assessed, and observed changes in routine to be detected for early intervention. The primary aim is to help reduce deterioration and enable patients to stay in their homes for longer. A Support Vector Machine (SVM) and a Random Decision Forest classifier are modelled using data from three test homes. The trained models are then used to monitor two patients with dementia during a six-month clinical trial undertaken in partnership with Mersey Care NHS Foundation Trust. For load disaggregation for appliance detection, the SVM achieved AUC=0.86074, Sen=0.756 and Spec=0.92838, while the Decision Forest achieved AUC=0.9429, Sen=0.9634 and Spec=0.9634. ADLs are also analysed to identify the behavioural patterns of the occupant while detecting alterations in routine.
Tasks
Published 2019-03-18
URL http://arxiv.org/abs/1903.12080v1
PDF http://arxiv.org/pdf/1903.12080v1.pdf
PWC https://paperswithcode.com/paper/detecting-activities-of-daily-living-and
Repo
Framework
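
A common front end for load disaggregation is step-change (edge) detection on the aggregate meter signal: a large positive jump suggests an appliance switching on, a large negative jump switching off. The sketch below shows only this simple front end on a toy trace; the paper's actual pipeline classifies appliance signatures with the SVM and Random Decision Forest models described above.

```python
def detect_events(power, threshold):
    """Flag candidate appliance on/off events as step changes in the
    aggregate smart-meter power that exceed `threshold` watts."""
    events = []
    for t in range(1, len(power)):
        delta = power[t] - power[t - 1]
        if abs(delta) >= threshold:
            events.append((t, "on" if delta > 0 else "off"))
    return events

# Toy aggregate load (W): a ~2000 W appliance runs from t=3 to t=5.
power = [100, 105, 102, 2102, 2100, 2104, 104, 100]
events = detect_events(power, threshold=500)
```

The timing and duration of such events is what lets ADLs like boiling a kettle be mapped onto a patient's daily routine.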

Controllable Paraphrase Generation with a Syntactic Exemplar

Title Controllable Paraphrase Generation with a Syntactic Exemplar
Authors Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel
Abstract Prior work on controllable text generation usually assumes that the controlled attribute can take on one of a small set of values known a priori. In this work, we propose a novel task where the syntax of a generated sentence is instead controlled by a sentential exemplar. To evaluate quantitatively with standard metrics, we create a novel dataset with human annotations. We also develop a variational model with a neural module specifically designed for capturing syntactic knowledge, together with several multitask training objectives to promote disentangled representation learning. Empirically, the proposed model achieves improvements over baselines and learns to capture desirable characteristics.
Tasks Paraphrase Generation, Representation Learning, Text Generation
Published 2019-06-03
URL https://arxiv.org/abs/1906.00565v1
PDF https://arxiv.org/pdf/1906.00565v1.pdf
PWC https://paperswithcode.com/paper/190600565
Repo
Framework

PolSAR Image Classification based on Polarimetric Scattering Coding and Sparse Support Matrix Machine

Title PolSAR Image Classification based on Polarimetric Scattering Coding and Sparse Support Matrix Machine
Authors Xu Liu, Licheng Jiao, Dan Zhang, Fang Liu
Abstract A PolSAR image has an advantage over an optical image because it can be acquired independently of cloud cover and solar illumination. PolSAR image classification is a popular and valuable topic in the interpretation of PolSAR images. In this paper, a novel PolSAR image classification method is proposed based on polarimetric scattering coding and a sparse support matrix machine. First, we transform the original PolSAR data into a real-valued matrix using polarimetric scattering coding; the result, called the polarimetric scattering matrix, is a sparse matrix. Second, the sparse support matrix machine is used to classify the sparse polarimetric scattering matrix and obtain the classification map. The combination of these two steps takes full account of the characteristics of PolSAR data. The experimental results show that the proposed method obtains better results and is an effective classification method.
Tasks Image Classification
Published 2019-06-17
URL https://arxiv.org/abs/1906.07176v1
PDF https://arxiv.org/pdf/1906.07176v1.pdf
PWC https://paperswithcode.com/paper/polsar-image-classification-based-on
Repo
Framework