Paper Group ANR 853
Degraded Historical Documents Images Binarization Using a Combination of Enhanced Techniques
Title | Degraded Historical Documents Images Binarization Using a Combination of Enhanced Techniques |
Authors | Omar Boudraa, Walid Khaled Hidouci, Dominique Michelucci |
Abstract | Document image binarization is the initial and a crucial step in many document analysis and recognition schemes. In fact, it is still a relevant research subject and a fundamental challenge due to its importance and influence. This paper presents an original multi-phase system that hybridizes various efficient image thresholding methods in order to get the best binarization output. First, to improve contrast in particularly defective images, the application of the CLAHE algorithm is suggested and justified. We then use a cooperative technique to segment the image into two separate classes. Finally, a special transformation is applied to remove scattered noise and correct character shapes. Experiments on three benchmarks demonstrate the precision and robustness of our framework on degraded historical document images compared to other noted methods. |
Tasks | |
Published | 2019-01-27 |
URL | http://arxiv.org/abs/1901.09425v1 |
http://arxiv.org/pdf/1901.09425v1.pdf | |
PWC | https://paperswithcode.com/paper/degraded-historical-documents-images |
Repo | |
Framework | |
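The three-phase pipeline described in the abstract above (contrast enhancement, thresholding into two classes, then noise cleanup) can be sketched in plain Python. This is a minimal stand-in, not the authors' system: a global Otsu threshold and a 3x3 majority filter substitute for CLAHE, the cooperative thresholding and the character-correction transform.

```python
from collections import Counter

def otsu_threshold(pixels):
    """Global Otsu threshold: pick t maximizing between-class variance."""
    hist = Counter(pixels)
    total = len(pixels)
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = sum(c for v, c in hist.items() if v <= t)
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum(v * c for v, c in hist.items() if v <= t) / w0
        mu1 = sum(v * c for v, c in hist.items() if v > t) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(image):
    """image: 2-D list of grey levels (0-255) -> 2-D list of 0 (ink) / 1 (paper)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 1 for p in row] for row in image]

def median_filter(binary):
    """3x3 majority vote to remove scattered noise pixels."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [binary[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = 1 if sum(window) >= 5 else 0
    return out
```

A real implementation would operate on full-resolution scans via a library such as OpenCV rather than nested lists.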
Open Set Domain Adaptation for Image and Action Recognition
Title | Open Set Domain Adaptation for Image and Action Recognition |
Authors | Pau Panareda Busto, Ahsan Iqbal, Juergen Gall |
Abstract | Since annotating and curating large datasets is very expensive, there is a need to transfer the knowledge from existing annotated datasets to unlabelled data. Data that is relevant for a specific application, however, usually differs from publicly available datasets since it is sampled from a different domain. While domain adaptation methods compensate for such a domain shift, they assume that all categories in the target domain are known and match the categories in the source domain. Since this assumption is violated under real-world conditions, we propose an approach for open set domain adaptation where the target domain contains instances of categories that are not present in the source domain. The proposed approach achieves state-of-the-art results on various datasets for image classification and action recognition. Since the approach can be used for open set and closed set domain adaptation, as well as unsupervised and semi-supervised domain adaptation, it is a versatile tool for many applications. |
Tasks | Domain Adaptation, Image Classification |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1907.12865v1 |
https://arxiv.org/pdf/1907.12865v1.pdf | |
PWC | https://paperswithcode.com/paper/open-set-domain-adaptation-for-image-and |
Repo | |
Framework | |
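The core open-set step described above — labelling target samples with a known source class or rejecting them as unknown — can be illustrated with a nearest-class-mean rule. This is a simplified sketch under assumed Euclidean features and a hand-set rejection distance; the paper's actual method solves a joint assignment and mapping problem.

```python
import math

def class_means(source):
    """source: list of (feature_vector, label) -> {label: mean vector}."""
    sums, counts = {}, {}
    for x, y in source:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def assign_open_set(target, means, reject_dist):
    """Label each target sample with the nearest source class,
    or 'unknown' when no class mean lies within reject_dist."""
    labels = []
    for x in target:
        best_y, best_d = 'unknown', reject_dist
        for y, m in means.items():
            d = math.dist(x, m)
            if d < best_d:
                best_y, best_d = y, d
        labels.append(best_y)
    return labels
```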
Predicting Soil pH by Using Nearest Fields
Title | Predicting Soil pH by Using Nearest Fields |
Authors | Quoc Hung Ngo, Nhien-An Le-Khac, Tahar Kechadi |
Abstract | In precision agriculture (PA), soil sampling and testing is performed prior to planting any new crop. It is an expensive operation, since there are many soil characteristics to take into account. This paper gives an overview of soil characteristics and their relationships with crop yield and soil profiling. We propose an approach for predicting soil pH based on nearest neighbour fields. It implements spatial radius queries and various regression techniques in data mining. We use a soil dataset containing about 4,000 field profiles to evaluate them and analyse their robustness. A comparative study indicates that the LR, SVR, and GBRT techniques achieved high accuracy, with R^2 values of about 0.718 and MAE values of 0.29. The experimental results show that the proposed approach is very promising and can contribute significantly to PA. |
Tasks | |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01303v1 |
https://arxiv.org/pdf/1912.01303v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-soil-ph-by-using-nearest-fields |
Repo | |
Framework | |
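A minimal version of the nearest-fields idea above — a spatial radius query followed by a regression (here just the neighbourhood mean, standing in for LR/SVR/GBRT) — might look like this; the coordinates and radius units are hypothetical.

```python
import math

def predict_ph(fields, query, radius):
    """fields: list of (x, y, ph). Predict pH at `query` as the mean
    pH of fields within `radius`, falling back to the global mean
    when the radius query returns no neighbours."""
    neighbours = [ph for x, y, ph in fields
                  if math.hypot(x - query[0], y - query[1]) <= radius]
    pool = neighbours or [ph for _, _, ph in fields]
    return sum(pool) / len(pool)
```

In practice the radius query would run against a spatial index, and the aggregation step would be replaced by a trained regressor.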
The Semantic Mutex Watershed for Efficient Bottom-Up Semantic Instance Segmentation
Title | The Semantic Mutex Watershed for Efficient Bottom-Up Semantic Instance Segmentation |
Authors | Steffen Wolf, Yuyan Li, Constantin Pape, Alberto Bailoni, Anna Kreshuk, Fred A. Hamprecht |
Abstract | Semantic instance segmentation is the task of simultaneously partitioning an image into distinct segments while associating each pixel with a class label. In commonly used pipelines, segmentation and label assignment are solved separately since joint optimization is computationally expensive. We propose a greedy algorithm for joint graph partitioning and labeling derived from the efficient Mutex Watershed partitioning algorithm. It optimizes an objective function closely related to the Symmetric Multiway Cut objective and empirically shows efficient scaling behavior. Due to the algorithm's efficiency, it can operate directly on pixels without prior over-segmentation of the image into superpixels. We evaluate the performance on the Cityscapes dataset (2D urban scenes) and on a 3D microscopy volume. In urban scenes, the proposed algorithm combined with current deep neural networks outperforms the strong baseline of "Panoptic Feature Pyramid Networks" by Kirillov et al. (2019). In the 3D electron microscopy images, we show explicitly that our joint formulation outperforms a separate optimization of the partitioning and labeling problems. |
Tasks | graph partitioning, Instance Segmentation, Semantic Segmentation |
Published | 2019-12-29 |
URL | https://arxiv.org/abs/1912.12717v1 |
https://arxiv.org/pdf/1912.12717v1.pdf | |
PWC | https://paperswithcode.com/paper/the-semantic-mutex-watershed-for-efficient |
Repo | |
Framework | |
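The Mutex Watershed core that the algorithm above extends can be sketched as a greedy union-find over edges sorted by decreasing absolute weight: attractive edges merge clusters unless a mutex (repulsive) constraint forbids it. The semantic label assignment that the paper adds on top is omitted from this sketch.

```python
def mutex_watershed(n_nodes, edges):
    """edges: (weight, u, v, attractive), processed by decreasing |weight|.
    Attractive edges merge clusters unless a mutex forbids it;
    repulsive edges record a mutex between the two clusters."""
    parent = list(range(n_nodes))
    mutex = {i: set() for i in range(n_nodes)}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for w, u, v, attractive in sorted(edges, key=lambda e: -abs(e[0])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        if attractive:
            if rv not in mutex[ru]:          # merge unless forbidden
                parent[rv] = ru
                mutex[ru] |= mutex[rv]
                for m in mutex[rv]:          # re-point partners to new root
                    mutex[m].discard(rv)
                    mutex[m].add(ru)
        else:
            mutex[ru].add(rv)                # record repulsive constraint
            mutex[rv].add(ru)
    return [find(i) for i in range(n_nodes)]
```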
On the Vulnerability of CNN Classifiers in EEG-Based BCIs
Title | On the Vulnerability of CNN Classifiers in EEG-Based BCIs |
Authors | Xiao Zhang, Dongrui Wu |
Abstract | Deep learning has been successfully used in numerous applications because of its outstanding performance and the ability to avoid manual feature engineering. One such application is electroencephalogram (EEG) based brain-computer interface (BCI), where multiple convolutional neural network (CNN) models have been proposed for EEG classification. However, it has been found that deep learning models can be easily fooled with adversarial examples, which are normal examples with small deliberate perturbations. This paper proposes an unsupervised fast gradient sign method (UFGSM) to attack three popular CNN classifiers in BCIs, and demonstrates its effectiveness. We also verify the transferability of adversarial examples in BCIs, which means we can perform attacks even without knowing the architecture and parameters of the target models, or the datasets they were trained on. To our knowledge, this is the first study on the vulnerability of CNN classifiers in EEG-based BCIs, and hopefully will trigger more attention on the security of BCI systems. |
Tasks | EEG, Feature Engineering |
Published | 2019-03-31 |
URL | http://arxiv.org/abs/1904.01002v1 |
http://arxiv.org/pdf/1904.01002v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-vulnerability-of-cnn-classifiers-in |
Repo | |
Framework | |
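For reference, the standard (supervised) fast gradient sign method on a toy logistic classifier looks as follows; the UFGSM proposed above is an unsupervised variant, which this sketch does not reproduce.

```python
import math

def fgsm_perturb(x, w, b, y, eps):
    """FGSM on a logistic classifier p = sigmoid(w.x + b): shift each
    input feature by eps in the sign of the loss gradient w.r.t. x,
    for true label y in {0, 1}."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    # d(cross-entropy)/dx_i = (p - y) * w_i
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]
```

Applied to an EEG trial, `x` would be the flattened signal and the gradient would come from backpropagation through the CNN instead of this closed form.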
Development of an Entropy-Based Feature Selection Method and Analysis of Online Reviews on Real Estate
Title | Development of an Entropy-Based Feature Selection Method and Analysis of Online Reviews on Real Estate |
Authors | Hiroki Horino, Hirofumi Nonaka, Elisa Claire Alemán Carreón, Toru Hiraoka |
Abstract | In recent years, the amount of real estate data posted on the Internet has been increasing. In this study, in order to analyze user needs for real estate, we focus on "Mansion Community", a Japanese bulletin board system (hereinafter referred to as BBS) about Japanese real estate. In our study, keywords are extracted based on the entropy value of each word, and we use them as features in a machine learning classifier to analyze 6 million posts on "Mansion Community". As a result, we achieved a 0.69 F-measure and found that customers are particularly concerned about the facilities, access, and price of an apartment. |
Tasks | Feature Selection |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.11797v1 |
http://arxiv.org/pdf/1904.11797v1.pdf | |
PWC | https://paperswithcode.com/paper/development-of-an-entropy-based-feature |
Repo | |
Framework | |
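One plausible reading of the entropy-based selection above — score each word by the Shannon entropy of its distribution over documents and keep concentrated (low-entropy) words as keywords — can be sketched as follows; the exact criterion used in the paper may differ, and the threshold is a hypothetical parameter.

```python
import math
from collections import Counter

def word_entropy(occurrences):
    """occurrences: list of document ids in which the word appears
    (repeats allowed). Returns the Shannon entropy (bits) of the
    word's distribution over documents."""
    counts = Counter(occurrences)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def select_keywords(word_docs, max_entropy):
    """Keep words whose entropy is at most max_entropy, i.e. words
    concentrated in few documents rather than spread evenly."""
    return [w for w, docs in word_docs.items()
            if word_entropy(docs) <= max_entropy]
```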
End-to-end Adaptation with Backpropagation through WFST for On-device Speech Recognition System
Title | End-to-end Adaptation with Backpropagation through WFST for On-device Speech Recognition System |
Authors | Emiru Tsunoo, Yosuke Kashiwagi, Satoshi Asakawa, Toshiyuki Kumakura |
Abstract | An on-device DNN-HMM speech recognition system efficiently works with a limited vocabulary in the presence of a variety of predictable noise. In such a case, vocabulary and environment adaptation is highly effective. In this paper, we propose a novel method of end-to-end (E2E) adaptation, which adjusts not only an acoustic model (AM) but also a weighted finite-state transducer (WFST). We convert a pretrained WFST to a trainable neural network and adapt the system to target environments/vocabulary by E2E joint training with an AM. We replicate Viterbi decoding with forward–backward neural network computation, which is similar to recurrent neural networks (RNNs). By pooling output score sequences, a vocabulary posterior for each utterance is obtained and used for discriminative loss computation. Experiments using 2–10 hours of English/Japanese adaptation datasets indicate that the fine-tuning of only WFSTs and that of only AMs are both comparable to a state-of-the-art adaptation method, and E2E joint training of the two components achieves the best recognition performance. We also adapt each language system to the other language using the adaptation data, and the results show that the proposed method also works well for language adaptations. |
Tasks | Speech Recognition |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07149v3 |
https://arxiv.org/pdf/1905.07149v3.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-adaptation-with-backpropagation |
Repo | |
Framework | |
Face Video Generation from a Single Image and Landmarks
Title | Face Video Generation from a Single Image and Landmarks |
Authors | Kritaphat Songsri-in, Stefanos Zafeiriou |
Abstract | In this paper we are concerned with the challenging problem of producing a full image sequence of a deformable face given only an image and generic facial motions encoded by a set of sparse landmarks. To this end we build upon recent breakthroughs in image-to-image translation such as pix2pix, CycleGAN and StarGAN, which learn Deep Convolutional Neural Networks (DCNNs) that map aligned pairs of images between different domains (i.e., having different labels), and propose a new architecture which is no longer driven by labels but by spatial maps, namely facial landmarks. In particular, we propose MotionGAN, which transforms an input face image into a new one according to a heatmap of target landmarks. We show that it is possible to create very realistic face videos using a single image and a set of target landmarks. Furthermore, our method can be used to edit a facial image with arbitrary motions according to landmarks (e.g., expression, speech, etc.). This provides much more flexibility for face editing, expression transfer and facial video creation than models based on discrete expressions, audio or action units. |
Tasks | Image-to-Image Translation, Video Generation |
Published | 2019-04-25 |
URL | http://arxiv.org/abs/1904.11521v1 |
http://arxiv.org/pdf/1904.11521v1.pdf | |
PWC | https://paperswithcode.com/paper/face-video-generation-from-a-single-image-and |
Repo | |
Framework | |
VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019
Title | VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019 |
Authors | Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura |
Abstract | We describe our submitted system for the ZeroSpeech Challenge 2019. The current challenge theme addresses the difficulty of constructing a speech synthesizer without any text or phonetic labels and requires a system that can (1) discover subword units in an unsupervised way, and (2) synthesize the speech with a target speaker's voice. Moreover, the system should also balance the ABX discrimination score, the bit rate (compression rate), and the naturalness and intelligibility of the constructed voice. To tackle these problems and achieve the best trade-off, we utilize a vector quantized variational autoencoder (VQ-VAE) and a multi-scale codebook-to-spectrogram (Code2Spec) inverter trained with mean square error and adversarial losses. The VQ-VAE encodes the speech into a latent space, forces itself to map it onto the nearest codebook entry and produces a compressed representation. Next, the inverter generates a magnitude spectrogram in the target voice, given the codebook vectors from the VQ-VAE. In our experiments, we also investigated several other clustering algorithms, including K-Means and GMM, and compared them with the VQ-VAE results on ABX scores and bit rates. Our proposed approach significantly improves the intelligibility (in CER), the MOS, and the ABX discrimination scores compared to the official ZeroSpeech 2019 baseline and even the topline. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11449v2 |
https://arxiv.org/pdf/1905.11449v2.pdf | |
PWC | https://paperswithcode.com/paper/vqvae-unsupervised-unit-discovery-and-multi |
Repo | |
Framework | |
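The quantization step at the heart of the VQ-VAE described above — mapping each latent vector to its nearest codebook entry — is simple to illustrate; training with straight-through gradients and the Code2Spec inverter are not shown.

```python
import math

def quantize(vectors, codebook):
    """VQ-VAE quantization: map each latent vector to the index of
    its nearest codebook entry under Euclidean distance. The code
    sequence is the compressed representation mentioned above."""
    return [min(range(len(codebook)),
                key=lambda k: math.dist(v, codebook[k]))
            for v in vectors]
```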
Long-Term Human Video Generation of Multiple Futures Using Poses
Title | Long-Term Human Video Generation of Multiple Futures Using Poses |
Authors | Naoya Fushishita, Antonio Tejero-de-Pablos, Yusuke Mukuta, Tatsuya Harada |
Abstract | Predicting future human behavior from an input human video is a useful task for applications such as autonomous driving and robotics. While most previous works predict a single future, multiple futures with different behavior can potentially occur. Moreover, if the predicted future is too short (e.g., less than one second), it may not be fully usable by a human or other systems. In this paper, we propose a novel method for future human pose prediction capable of predicting multiple long-term futures. This makes the predictions more suitable for real applications. Also, from the input video and the predicted human behavior, we generate future videos. First, from an input human video, we generate sequences of future human poses (i.e., the image coordinates of their body-joints) via adversarial learning. Adversarial learning suffers from mode collapse, which makes it difficult to generate a variety of multiple poses. We solve this problem by utilizing two additional inputs to the generator to make the outputs diverse, namely, a latent code (to reflect various behaviors) and an attraction point (to reflect various trajectories). In addition, we generate long-term future human poses using a novel approach based on unidimensional convolutional neural networks. Last, we generate an output video based on the generated poses for visualization. We evaluate the generated future poses and videos using three criteria (i.e., realism, diversity and accuracy), and show that our proposed method outperforms other state-of-the-art works. |
Tasks | Autonomous Driving, Pose Prediction, Video Generation, Video Prediction |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07538v3 |
https://arxiv.org/pdf/1904.07538v3.pdf | |
PWC | https://paperswithcode.com/paper/long-term-video-generation-of-multiple |
Repo | |
Framework | |
Driver Identification Based on Vehicle Telematics Data using LSTM-Recurrent Neural Network
Title | Driver Identification Based on Vehicle Telematics Data using LSTM-Recurrent Neural Network |
Authors | Abenezer Girma, Xuyang Yan, Abdollah Homaifar |
Abstract | Despite advancements in vehicle security systems, auto-theft rates have increased over the last decade, and cyber-security attacks on internet-connected and autonomous vehicles are becoming a new threat. In this paper, a deep learning model is proposed which can identify drivers from their driving behaviors based on vehicle telematics data. The proposed Long Short-Term Memory (LSTM) model predicts the identity of the driver based on the individual's unique driving patterns learned from the vehicle telematics data. Given that telematics is time-series data, the problem is formulated as a time series prediction task to exploit the embedded sequential information. The performance of the proposed approach is evaluated on three naturalistic driving datasets, on which it gives highly accurate prediction results. The robustness of the model to noisy and anomalous data, typically caused by sensor defects or environmental factors, is also investigated. Results show that the proposed model's prediction accuracy remains satisfactory and outperforms the other approaches despite the extent of anomalies and noise induced in the data. |
Tasks | Autonomous Vehicles, Time Series, Time Series Prediction |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08030v1 |
https://arxiv.org/pdf/1911.08030v1.pdf | |
PWC | https://paperswithcode.com/paper/driver-identification-based-on-vehicle |
Repo | |
Framework | |
Fuzzy adaptive teaching learning-based optimization strategy for the problem of generating mixed strength t-way test suites
Title | Fuzzy adaptive teaching learning-based optimization strategy for the problem of generating mixed strength t-way test suites |
Authors | Kamal Z. Zamli, Fakhrud Din, Salmi Baharom, Bestoun S. Ahmed |
Abstract | The teaching learning-based optimization (TLBO) algorithm has shown competitive performance in solving numerous real-world optimization problems. Nevertheless, this algorithm requires better control of exploitation and exploration to prevent premature convergence (i.e., being trapped in local optima) and to enhance solution diversity. Thus, this paper proposes a new TLBO variant based on a Mamdani fuzzy inference system, called ATLBO, to permit adaptive selection of its global and local search operations. In order to assess its performance, we apply ATLBO to the mixed strength t-way test generation problem. Experimental results reveal that ATLBO exhibits competitive performance against the original TLBO and other meta-heuristic counterparts. |
Tasks | |
Published | 2019-04-10 |
URL | https://arxiv.org/abs/1906.08855v1 |
https://arxiv.org/pdf/1906.08855v1.pdf | |
PWC | https://paperswithcode.com/paper/fuzzy-adaptive-teaching-learning-based |
Repo | |
Framework | |
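For context, one plain TLBO iteration (teacher phase, then learner phase, with greedy acceptance so no individual ever worsens) on a minimization problem can be sketched as below; ATLBO's Mamdani fuzzy controller for choosing between global and local search is not modelled here.

```python
import random

def tlbo_step(population, fitness, rng=random.Random(0)):
    """One TLBO iteration for minimization. Teacher phase: move each
    learner toward the best solution relative to the population mean.
    Learner phase: move toward (or away from) a random peer. A move
    is kept only if it improves fitness."""
    dim = len(population[0])
    scores = [fitness(x) for x in population]
    teacher = population[scores.index(min(scores))]
    mean = [sum(x[i] for x in population) / len(population) for i in range(dim)]

    new_pop = []
    for x in population:
        tf = rng.choice((1, 2))  # teaching factor
        cand = [xi + rng.random() * (ti - tf * mi)
                for xi, ti, mi in zip(x, teacher, mean)]
        new_pop.append(cand if fitness(cand) < fitness(x) else x)

    for i, x in enumerate(new_pop):
        j = rng.randrange(len(new_pop))
        if j == i:
            continue
        other = new_pop[j]
        direction = 1 if fitness(other) < fitness(x) else -1
        cand = [xi + direction * rng.random() * (oi - xi)
                for xi, oi in zip(x, other)]
        if fitness(cand) < fitness(x):
            new_pop[i] = cand
    return new_pop
```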
Detecting Activities of Daily Living and Routine Behaviours in Dementia Patients Living Alone Using Smart Meter Load Disaggregation
Title | Detecting Activities of Daily Living and Routine Behaviours in Dementia Patients Living Alone Using Smart Meter Load Disaggregation |
Authors | C. Chalmers, P. Fergus, C. Aday Curbelo Montanez, S. Sikdar, F. Ball, B. Kendall |
Abstract | The emergence of an ageing population is a significant public health concern. This has led to an increase in the number of people living with progressive neurodegenerative disorders like dementia. Consequently, the strain this places on health and social care services means that providing 24-hour monitoring is not sustainable. Technological intervention is being considered; however, no solution exists to non-intrusively monitor the independent living needs of patients with dementia. As a result, many patients reach crisis point before intervention and support are provided. In parallel, patient care relies on feedback from informal carers about significant behavioural changes. Yet not all people have a social support network, and early intervention in dementia care is often missed. The smart meter rollout has the potential to change this. Using machine learning and signal processing techniques, a home energy supply can be disaggregated to detect which home appliances are turned on and off. This allows Activities of Daily Living (ADLs), such as eating and drinking, to be assessed, and observed changes in routine to be detected for early intervention. The primary aim is to help reduce deterioration and enable patients to stay in their homes for longer. A Support Vector Machine (SVM) and a Random Decision Forest classifier are modelled using data from three test homes. The trained models are then used to monitor two patients with dementia during a six-month clinical trial undertaken in partnership with Mersey Care NHS Foundation Trust. In the case of load disaggregation for appliance detection, the SVM achieved AUC=0.86074, Sen=0.756 and Spec=0.92838, while the Decision Forest achieved AUC=0.9429, Sen=0.9634 and Spec=0.9634. ADLs are also analysed to identify the behavioural patterns of the occupant while detecting alterations in routine. |
Tasks | |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.12080v1 |
http://arxiv.org/pdf/1903.12080v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-activities-of-daily-living-and |
Repo | |
Framework | |
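The underlying idea of the load disaggregation above — attributing step changes in whole-home power to individual appliances, from which ADL events follow — can be illustrated with a simple edge-matching sketch. The paper instead trains SVM and Random Decision Forest classifiers, and the appliance wattages below are hypothetical.

```python
def disaggregate(readings, signatures, tolerance=50):
    """Edge-detection NILM sketch: a jump in total power (watts)
    matching an appliance's rated wattage within `tolerance` is
    logged as that appliance switching on (+) or off (-)."""
    events = []
    for t in range(1, len(readings)):
        delta = readings[t] - readings[t - 1]
        for name, watts in signatures.items():
            if abs(abs(delta) - watts) <= tolerance:
                events.append((t, name, 'on' if delta > 0 else 'off'))
                break
    return events
```

An ADL such as "making a hot drink" would then be inferred from the resulting sequence of appliance events and their times of day.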
Controllable Paraphrase Generation with a Syntactic Exemplar
Title | Controllable Paraphrase Generation with a Syntactic Exemplar |
Authors | Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel |
Abstract | Prior work on controllable text generation usually assumes that the controlled attribute can take on one of a small set of values known a priori. In this work, we propose a novel task where the syntax of a generated sentence is instead controlled by a sentential exemplar. To evaluate quantitatively with standard metrics, we create a novel dataset with human annotations. We also develop a variational model with a neural module specifically designed for capturing syntactic knowledge, and several multitask training objectives to promote disentangled representation learning. Empirically, the proposed model is observed to achieve improvements over baselines and to learn to capture desirable characteristics. |
Tasks | Paraphrase Generation, Representation Learning, Text Generation |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00565v1 |
https://arxiv.org/pdf/1906.00565v1.pdf | |
PWC | https://paperswithcode.com/paper/190600565 |
Repo | |
Framework | |
PolSAR Image Classification based on Polarimetric Scattering Coding and Sparse Support Matrix Machine
Title | PolSAR Image Classification based on Polarimetric Scattering Coding and Sparse Support Matrix Machine |
Authors | Xu Liu, Licheng Jiao, Dan Zhang, Fang Liu |
Abstract | PolSAR images have an advantage over optical images because they can be acquired independently of cloud cover and solar illumination. PolSAR image classification is a hot and valuable topic for the interpretation of PolSAR images. In this paper, a novel PolSAR image classification method is proposed based on polarimetric scattering coding and a sparse support matrix machine. First, we transform the original PolSAR data into a real-valued matrix by polarimetric scattering coding; the result, called the polarimetric scattering matrix, is sparse. Second, the sparse support matrix machine is used to classify the sparse polarimetric scattering matrix and obtain the classification map. The combination of these two steps takes full account of the characteristics of PolSAR data. The experimental results show that the proposed method achieves better results and is an effective classification method. |
Tasks | Image Classification |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.07176v1 |
https://arxiv.org/pdf/1906.07176v1.pdf | |
PWC | https://paperswithcode.com/paper/polsar-image-classification-based-on |
Repo | |
Framework | |
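One plausible form of the polarimetric scattering coding mentioned above — splitting each complex scattering coefficient into four non-negative real channels so that the stacked result is a sparse real matrix — is sketched below. The paper's exact coding scheme may differ; this is an illustrative assumption.

```python
def scattering_code(z):
    """Encode one complex scattering coefficient as four non-negative
    reals (re+, re-, im+, im-); at most two are non-zero, so stacking
    these codes yields a sparse real-valued matrix."""
    re, im = z.real, z.imag
    return [max(re, 0.0), max(-re, 0.0), max(im, 0.0), max(-im, 0.0)]

def code_matrix(scattering_vector):
    """Apply the coding elementwise to a complex scattering vector."""
    return [scattering_code(z) for z in scattering_vector]
```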