Paper Group ANR 853
Degraded Historical Documents Images Binarization Using a Combination of Enhanced Techniques
Title | Degraded Historical Documents Images Binarization Using a Combination of Enhanced Techniques |
Authors | Omar Boudraa, Walid Khaled Hidouci, Dominique Michelucci |
Abstract | Document image binarization is the initial and a crucial step in many document analysis and recognition schemes. In fact, it is still a relevant research subject and a fundamental challenge due to its importance and influence. This paper presents an original multi-phase system that hybridizes various efficient image thresholding methods in order to get the best binarization output. First, to improve contrast in particularly defective images, the application of the CLAHE algorithm is suggested and justified. We then use a cooperative technique to segment the image into two separate classes. Finally, a special transformation is applied to remove scattered noise and correct character shapes. Experiments on three benchmarks demonstrate the precision and robustness of our framework on degraded historical document images compared to other noted methods. |
Tasks | |
Published | 2019-01-27 |
URL | http://arxiv.org/abs/1901.09425v1 |
http://arxiv.org/pdf/1901.09425v1.pdf | |
PWC | https://paperswithcode.com/paper/degraded-historical-documents-images |
Repo | |
Framework | |
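The three-phase pipeline described in the abstract above (contrast enhancement, thresholding into two classes, then noise cleanup) can be sketched in plain Python. This is a minimal stand-in, not the authors' system: a global Otsu threshold and a 3x3 majority filter substitute for CLAHE, the cooperative thresholding and the character-correction transform.

```python
from collections import Counter

def otsu_threshold(pixels):
    """Global Otsu threshold: pick t maximizing between-class variance."""
    hist = Counter(pixels)
    total = len(pixels)
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = sum(c for v, c in hist.items() if v <= t)
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum(v * c for v, c in hist.items() if v <= t) / w0
        mu1 = sum(v * c for v, c in hist.items() if v > t) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(image):
    """image: 2-D list of grey levels (0-255) -> 2-D list of 0 (ink) / 1 (paper)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 1 for p in row] for row in image]

def median_filter(binary):
    """3x3 majority vote to remove scattered noise pixels."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [binary[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = 1 if sum(window) >= 5 else 0
    return out
```

A real implementation would operate on full-resolution scans via a library such as OpenCV rather than nested lists.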
Open Set Domain Adaptation for Image and Action Recognition
Title | Open Set Domain Adaptation for Image and Action Recognition |
Authors | Pau Panareda Busto, Ahsan Iqbal, Juergen Gall |
Abstract | Since annotating and curating large datasets is very expensive, there is a need to transfer the knowledge from existing annotated datasets to unlabelled data. Data that is relevant for a specific application, however, usually differs from publicly available datasets since it is sampled from a different domain. While domain adaptation methods compensate for such a domain shift, they assume that all categories in the target domain are known and match the categories in the source domain. Since this assumption is violated under real-world conditions, we propose an approach for open set domain adaptation where the target domain contains instances of categories that are not present in the source domain. The proposed approach achieves state-of-the-art results on various datasets for image classification and action recognition. Since the approach can be used for open set and closed set domain adaptation, as well as unsupervised and semi-supervised domain adaptation, it is a versatile tool for many applications. |
Tasks | Domain Adaptation, Image Classification |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1907.12865v1 |
https://arxiv.org/pdf/1907.12865v1.pdf | |
PWC | https://paperswithcode.com/paper/open-set-domain-adaptation-for-image-and |
Repo | |
Framework | |
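The core open-set step described above — labelling target samples with a known source class or rejecting them as unknown — can be illustrated with a nearest-class-mean rule. This is a simplified sketch under assumed Euclidean features and a hand-set rejection distance; the paper's actual method solves a joint assignment and mapping problem.

```python
import math

def class_means(source):
    """source: list of (feature_vector, label) -> {label: mean vector}."""
    sums, counts = {}, {}
    for x, y in source:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def assign_open_set(target, means, reject_dist):
    """Label each target sample with the nearest source class,
    or 'unknown' when no class mean lies within reject_dist."""
    labels = []
    for x in target:
        best_y, best_d = 'unknown', reject_dist
        for y, m in means.items():
            d = math.dist(x, m)
            if d < best_d:
                best_y, best_d = y, d
        labels.append(best_y)
    return labels
```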
Predicting Soil pH by Using Nearest Fields
Title | Predicting Soil pH by Using Nearest Fields |
Authors | Quoc Hung Ngo, Nhien-An Le-Khac, Tahar Kechadi |
Abstract | In precision agriculture (PA), soil sampling and testing is performed prior to planting any new crop. It is an expensive operation, since there are many soil characteristics to take into account. This paper gives an overview of soil characteristics and their relationships with crop yield and soil profiling. We propose an approach for predicting soil pH based on nearest neighbour fields. It implements spatial radius queries and various regression techniques in data mining. We use a soil dataset containing about 4,000 field profiles to evaluate them and analyse their robustness. A comparative study indicates that the LR, SVR, and GBRT techniques achieved high accuracy, with R^2 values of about 0.718 and MAE values of 0.29. The experimental results show that the proposed approach is very promising and can contribute significantly to PA. |
Tasks | |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01303v1 |
https://arxiv.org/pdf/1912.01303v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-soil-ph-by-using-nearest-fields |
Repo | |
Framework | |
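A minimal version of the nearest-fields idea above — a spatial radius query followed by a regression (here just the neighbourhood mean, standing in for LR/SVR/GBRT) — might look like this; the coordinates and radius units are hypothetical.

```python
import math

def predict_ph(fields, query, radius):
    """fields: list of (x, y, ph). Predict pH at `query` as the mean
    pH of fields within `radius`, falling back to the global mean
    when the radius query returns no neighbours."""
    neighbours = [ph for x, y, ph in fields
                  if math.hypot(x - query[0], y - query[1]) <= radius]
    pool = neighbours or [ph for _, _, ph in fields]
    return sum(pool) / len(pool)
```

In practice the radius query would run against a spatial index, and the aggregation step would be replaced by a trained regressor.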
The Semantic Mutex Watershed for Efficient Bottom-Up Semantic Instance Segmentation
Title | The Semantic Mutex Watershed for Efficient Bottom-Up Semantic Instance Segmentation |
Authors | Steffen Wolf, Yuyan Li, Constantin Pape, Alberto Bailoni, Anna Kreshuk, Fred A. Hamprecht |
Abstract | Semantic instance segmentation is the task of simultaneously partitioning an image into distinct segments while associating each pixel with a class label. In commonly used pipelines, segmentation and label assignment are solved separately since joint optimization is computationally expensive. We propose a greedy algorithm for joint graph partitioning and labeling derived from the efficient Mutex Watershed partitioning algorithm. It optimizes an objective function closely related to the Symmetric Multiway Cut objective and empirically shows efficient scaling behavior. Due to the algorithm's efficiency, it can operate directly on pixels without prior over-segmentation of the image into superpixels. We evaluate the performance on the Cityscapes dataset (2D urban scenes) and on a 3D microscopy volume. In urban scenes, the proposed algorithm combined with current deep neural networks outperforms the strong baseline of "Panoptic Feature Pyramid Networks" by Kirillov et al. (2019). In the 3D electron microscopy images, we show explicitly that our joint formulation outperforms a separate optimization of the partitioning and labeling problems. |
Tasks | graph partitioning, Instance Segmentation, Semantic Segmentation |
Published | 2019-12-29 |
URL | https://arxiv.org/abs/1912.12717v1 |
https://arxiv.org/pdf/1912.12717v1.pdf | |
PWC | https://paperswithcode.com/paper/the-semantic-mutex-watershed-for-efficient |
Repo | |
Framework | |
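The Mutex Watershed core that the algorithm above extends can be sketched as a greedy union-find over edges sorted by decreasing absolute weight: attractive edges merge clusters unless a mutex (repulsive) constraint forbids it. The semantic label assignment that the paper adds on top is omitted from this sketch.

```python
def mutex_watershed(n_nodes, edges):
    """edges: (weight, u, v, attractive), processed by decreasing |weight|.
    Attractive edges merge clusters unless a mutex forbids it;
    repulsive edges record a mutex between the two clusters."""
    parent = list(range(n_nodes))
    mutex = {i: set() for i in range(n_nodes)}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for w, u, v, attractive in sorted(edges, key=lambda e: -abs(e[0])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        if attractive:
            if rv not in mutex[ru]:          # merge unless forbidden
                parent[rv] = ru
                mutex[ru] |= mutex[rv]
                for m in mutex[rv]:          # re-point partners to new root
                    mutex[m].discard(rv)
                    mutex[m].add(ru)
        else:
            mutex[ru].add(rv)                # record repulsive constraint
            mutex[rv].add(ru)
    return [find(i) for i in range(n_nodes)]
```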
On the Vulnerability of CNN Classifiers in EEG-Based BCIs
Title | On the Vulnerability of CNN Classifiers in EEG-Based BCIs |
Authors | Xiao Zhang, Dongrui Wu |
Abstract | Deep learning has been successfully used in numerous applications because of its outstanding performance and the ability to avoid manual feature engineering. One such application is electroencephalogram (EEG) based brain-computer interface (BCI), where multiple convolutional neural network (CNN) models have been proposed for EEG classification. However, it has been found that deep learning models can be easily fooled with adversarial examples, which are normal examples with small deliberate perturbations. This paper proposes an unsupervised fast gradient sign method (UFGSM) to attack three popular CNN classifiers in BCIs, and demonstrates its effectiveness. We also verify the transferability of adversarial examples in BCIs, which means we can perform attacks even without knowing the architecture and parameters of the target models, or the datasets they were trained on. To our knowledge, this is the first study on the vulnerability of CNN classifiers in EEG-based BCIs, and hopefully will trigger more attention on the security of BCI systems. |
Tasks | EEG, Feature Engineering |
Published | 2019-03-31 |
URL | http://arxiv.org/abs/1904.01002v1 |
http://arxiv.org/pdf/1904.01002v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-vulnerability-of-cnn-classifiers-in |
Repo | |
Framework | |
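For reference, the standard (supervised) fast gradient sign method on a toy logistic classifier looks as follows; the UFGSM proposed above is an unsupervised variant, which this sketch does not reproduce.

```python
import math

def fgsm_perturb(x, w, b, y, eps):
    """FGSM on a logistic classifier p = sigmoid(w.x + b): shift each
    input feature by eps in the sign of the loss gradient w.r.t. x,
    for true label y in {0, 1}."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    # d(cross-entropy)/dx_i = (p - y) * w_i
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]
```

Applied to an EEG trial, `x` would be the flattened signal and the gradient would come from backpropagation through the CNN instead of this closed form.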
Development of an Entropy-Based Feature Selection Method and Analysis of Online Reviews on Real Estate
Title | Development of an Entropy-Based Feature Selection Method and Analysis of Online Reviews on Real Estate |
Authors | Hiroki Horino, Hirofumi Nonaka, Elisa Claire Alemán Carreón, Toru Hiraoka |
Abstract | In recent years, the amount of real estate data posted on the Internet has been increasing. In this study, in order to analyze user needs for real estate, we focus on "Mansion Community", a Japanese bulletin board system (hereinafter referred to as BBS) about Japanese real estate. In our study, keywords are extracted based on the entropy value of each word, and we use them as features in a machine learning classifier to analyze 6 million posts on "Mansion Community". As a result, we achieved a 0.69 F-measure and found that customers are particularly concerned about the facilities, access, and price of an apartment. |
Tasks | Feature Selection |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.11797v1 |
http://arxiv.org/pdf/1904.11797v1.pdf | |
PWC | https://paperswithcode.com/paper/development-of-an-entropy-based-feature |
Repo | |
Framework | |
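One plausible reading of the entropy-based selection above — score each word by the Shannon entropy of its distribution over documents and keep concentrated (low-entropy) words as keywords — can be sketched as follows; the exact criterion used in the paper may differ, and the threshold is a hypothetical parameter.

```python
import math
from collections import Counter

def word_entropy(occurrences):
    """occurrences: list of document ids in which the word appears
    (repeats allowed). Returns the Shannon entropy (bits) of the
    word's distribution over documents."""
    counts = Counter(occurrences)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def select_keywords(word_docs, max_entropy):
    """Keep words whose entropy is at most max_entropy, i.e. words
    concentrated in few documents rather than spread evenly."""
    return [w for w, docs in word_docs.items()
            if word_entropy(docs) <= max_entropy]
```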
End-to-end Adaptation with Backpropagation through WFST for On-device Speech Recognition System
Title | End-to-end Adaptation with Backpropagation through WFST for On-device Speech Recognition System |
Authors | Emiru Tsunoo, Yosuke Kashiwagi, Satoshi Asakawa, Toshiyuki Kumakura |
Abstract | An on-device DNN-HMM speech recognition system efficiently works with a limited vocabulary in the presence of a variety of predictable noise. In such a case, vocabulary and environment adaptation is highly effective. In this paper, we propose a novel method of end-to-end (E2E) adaptation, which adjusts not only an acoustic model (AM) but also a weighted finite-state transducer (WFST). We convert a pretrained WFST to a trainable neural network and adapt the system to target environments/vocabulary by E2E joint training with an AM. We replicate Viterbi decoding with forward–backward neural network computation, which is similar to recurrent neural networks (RNNs). By pooling output score sequences, a vocabulary posterior for each utterance is obtained and used for discriminative loss computation. Experiments using 2–10 hours of English/Japanese adaptation datasets indicate that the fine-tuning of only WFSTs and that of only AMs are both comparable to a state-of-the-art adaptation method, and E2E joint training of the two components achieves the best recognition performance. We also adapt each language system to the other language using the adaptation data, and the results show that the proposed method also works well for language adaptations. |
Tasks | Speech Recognition |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07149v3 |
https://arxiv.org/pdf/1905.07149v3.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-adaptation-with-backpropagation |
Repo | |
Framework | |
Face Video Generation from a Single Image and Landmarks
Title | Face Video Generation from a Single Image and Landmarks |
Authors | Kritaphat Songsri-in, Stefanos Zafeiriou |
Abstract | In this paper we are concerned with the challenging problem of producing a full image sequence of a deformable face given only an image and generic facial motions encoded by a set of sparse landmarks. To this end we build upon recent breakthroughs in image-to-image translation such as pix2pix, CycleGAN and StarGAN, which learn Deep Convolutional Neural Networks (DCNNs) that map aligned pairs of images between different domains (i.e., having different labels), and propose a new architecture which is no longer driven by labels but by spatial maps, namely facial landmarks. In particular, we propose MotionGAN, which transforms an input face image into a new one according to a heatmap of target landmarks. We show that it is possible to create very realistic face videos using a single image and a set of target landmarks. Furthermore, our method can be used to edit a facial image with arbitrary motions according to landmarks (e.g., expression, speech, etc.). This provides much more flexibility for face editing, expression transfer and facial video creation than models based on discrete expressions, audio or action units. |
Tasks | Image-to-Image Translation, Video Generation |
Published | 2019-04-25 |
URL | http://arxiv.org/abs/1904.11521v1 |
http://arxiv.org/pdf/1904.11521v1.pdf | |
PWC | https://paperswithcode.com/paper/face-video-generation-from-a-single-image-and |
Repo | |
Framework | |
VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019
Title | VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019 |
Authors | Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura |
Abstract | We describe our submitted system for the ZeroSpeech Challenge 2019. The current challenge theme addresses the difficulty of constructing a speech synthesizer without any text or phonetic labels and requires a system that can (1) discover subword units in an unsupervised way, and (2) synthesize the speech with a target speaker's voice. Moreover, the system should also balance the ABX discrimination score, the bit rate (compression rate), and the naturalness and intelligibility of the constructed voice. To tackle these problems and achieve the best trade-off, we utilize a vector quantized variational autoencoder (VQ-VAE) and a multi-scale codebook-to-spectrogram (Code2Spec) inverter trained with mean square error and adversarial losses. The VQ-VAE encodes the speech into a latent space, forces itself to map it onto the nearest codebook entry and produces a compressed representation. Next, the inverter generates a magnitude spectrogram in the target voice, given the codebook vectors from the VQ-VAE. In our experiments, we also investigated several other clustering algorithms, including K-Means and GMM, and compared them with the VQ-VAE results on ABX scores and bit rates. Our proposed approach significantly improves the intelligibility (in CER), the MOS, and the ABX discrimination scores compared to the official ZeroSpeech 2019 baseline and even the topline. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11449v2 |
https://arxiv.org/pdf/1905.11449v2.pdf | |
PWC | https://paperswithcode.com/paper/vqvae-unsupervised-unit-discovery-and-multi |
Repo | |
Framework | |
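The quantization step at the heart of the VQ-VAE described above — mapping each latent vector to its nearest codebook entry — is simple to illustrate; training with straight-through gradients and the Code2Spec inverter are not shown.

```python
import math

def quantize(vectors, codebook):
    """VQ-VAE quantization: map each latent vector to the index of
    its nearest codebook entry under Euclidean distance. The code
    sequence is the compressed representation mentioned above."""
    return [min(range(len(codebook)),
                key=lambda k: math.dist(v, codebook[k]))
            for v in vectors]
```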
Long-Term Human Video Generation of Multiple Futures Using Poses
Title | Long-Term Human Video Generation of Multiple Futures Using Poses |
Authors | Naoya Fushishita, Antonio Tejero-de-Pablos, Yusuke Mukuta, Tatsuya Harada |
Abstract | Predicting future human behavior from an input human video is a useful task for applications such as autonomous driving and robotics. While most previous works predict a single future, multiple futures with different behavior can potentially occur. Moreover, if the predicted future is too short (e.g., less than one second), it may not be fully usable by a human or other systems. In this paper, we propose a novel method for future human pose prediction capable of predicting multiple long-term futures. This makes the predictions more suitable for real applications. Also, from the input video and the predicted human behavior, we generate future videos. First, from an input human video, we generate sequences of future human poses (i.e., the image coordinates of their body-joints) via adversarial learning. Adversarial learning suffers from mode collapse, which makes it difficult to generate a variety of multiple poses. We solve this problem by utilizing two additional inputs to the generator to make the outputs diverse, namely, a latent code (to reflect various behaviors) and an attraction point (to reflect various trajectories). In addition, we generate long-term future human poses using a novel approach based on unidimensional convolutional neural networks. Last, we generate an output video based on the generated poses for visualization. We evaluate the generated future poses and videos using three criteria (i.e., realism, diversity and accuracy), and show that our proposed method outperforms other state-of-the-art works. |
Tasks | Autonomous Driving, Pose Prediction, Video Generation, Video Prediction |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07538v3 |
https://arxiv.org/pdf/1904.07538v3.pdf | |
PWC | https://paperswithcode.com/paper/long-term-video-generation-of-multiple |
Repo | |
Framework | |
Driver Identification Based on Vehicle Telematics Data using LSTM-Recurrent Neural Network
Title | Driver Identification Based on Vehicle Telematics Data using LSTM-Recurrent Neural Network |
Authors | Abenezer Girma, Xuyang Yan, Abdollah Homaifar |
Abstract | Despite advancements in vehicle security systems, auto-theft rates have increased over the last decade, and cyber-security attacks on internet-connected and autonomous vehicles are becoming a new threat. In this paper, a deep learning model is proposed which can identify drivers from their driving behaviors based on vehicle telematics data. The proposed Long Short-Term Memory (LSTM) model predicts the identity of the driver based on the individual's unique driving patterns learned from the vehicle telematics data. Given that telematics is time-series data, the problem is formulated as a time series prediction task to exploit the embedded sequential information. The performance of the proposed approach is evaluated on three naturalistic driving datasets, on which it gives highly accurate prediction results. The robustness of the model to noisy and anomalous data, typically caused by sensor defects or environmental factors, is also investigated. Results show that the proposed model's prediction accuracy remains satisfactory and outperforms the other approaches despite the extent of anomalies and noise induced in the data. |
Tasks | Autonomous Vehicles, Time Series, Time Series Prediction |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08030v1 |
https://arxiv.org/pdf/1911.08030v1.pdf | |
PWC | https://paperswithcode.com/paper/driver-identification-based-on-vehicle |
Repo | |
Framework | |
Fuzzy adaptive teaching learning-based optimization strategy for the problem of generating mixed strength t-way test suites
Title | Fuzzy adaptive teaching learning-based optimization strategy for the problem of generating mixed strength t-way test suites |
Authors | Kamal Z. Zamli, Fakhrud Din, Salmi Baharom, Bestoun S. Ahmed |
Abstract | The teaching learning-based optimization (TLBO) algorithm has shown competitive performance in solving numerous real-world optimization problems. Nevertheless, this algorithm requires better control of exploitation and exploration to prevent premature convergence (i.e., being trapped in local optima) and to enhance solution diversity. Thus, this paper proposes a new TLBO variant based on a Mamdani fuzzy inference system, called ATLBO, to permit adaptive selection of its global and local search operations. In order to assess its performance, we apply ATLBO to the mixed strength t-way test generation problem. Experimental results reveal that ATLBO exhibits competitive performance against the original TLBO and other meta-heuristic counterparts. |
Tasks | |
Published | 2019-04-10 |
URL | https://arxiv.org/abs/1906.08855v1 |
https://arxiv.org/pdf/1906.08855v1.pdf | |
PWC | https://paperswithcode.com/paper/fuzzy-adaptive-teaching-learning-based |
Repo | |
Framework | |
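For context, one plain TLBO iteration (teacher phase, then learner phase, with greedy acceptance so no individual ever worsens) on a minimization problem can be sketched as below; ATLBO's Mamdani fuzzy controller for choosing between global and local search is not modelled here.

```python
import random

def tlbo_step(population, fitness, rng=random.Random(0)):
    """One TLBO iteration for minimization. Teacher phase: move each
    learner toward the best solution relative to the population mean.
    Learner phase: move toward (or away from) a random peer. A move
    is kept only if it improves fitness."""
    dim = len(population[0])
    scores = [fitness(x) for x in population]
    teacher = population[scores.index(min(scores))]
    mean = [sum(x[i] for x in population) / len(population) for i in range(dim)]

    new_pop = []
    for x in population:
        tf = rng.choice((1, 2))  # teaching factor
        cand = [xi + rng.random() * (ti - tf * mi)
                for xi, ti, mi in zip(x, teacher, mean)]
        new_pop.append(cand if fitness(cand) < fitness(x) else x)

    for i, x in enumerate(new_pop):
        j = rng.randrange(len(new_pop))
        if j == i:
            continue
        other = new_pop[j]
        direction = 1 if fitness(other) < fitness(x) else -1
        cand = [xi + direction * rng.random() * (oi - xi)
                for xi, oi in zip(x, other)]
        if fitness(cand) < fitness(x):
            new_pop[i] = cand
    return new_pop
```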
Detecting Activities of Daily Living and Routine Behaviours in Dementia Patients Living Alone Using Smart Meter Load Disaggregation
Title | Detecting Activities of Daily Living and Routine Behaviours in Dementia Patients Living Alone Using Smart Meter Load Disaggregation |
Authors | C. Chalmers, P. Fergus, C. Aday Curbelo Montanez, S. Sikdar, F. Ball, B. Kendall |
Abstract | The emergence of an ageing population is a significant public health concern. This has led to an increase in the number of people living with progressive neurodegenerative disorders like dementia. Consequently, the strain this places on health and social care services means that providing 24-hour monitoring is not sustainable. Technological intervention is being considered; however, no solution exists to non-intrusively monitor the independent living needs of patients with dementia. As a result, many patients reach crisis point before intervention and support are provided. In parallel, patient care relies on feedback from informal carers about significant behavioural changes. Yet not all people have a social support network, and early intervention in dementia care is often missed. The smart meter rollout has the potential to change this. Using machine learning and signal processing techniques, a home energy supply can be disaggregated to detect which home appliances are turned on and off. This allows Activities of Daily Living (ADLs), such as eating and drinking, to be assessed, and observed changes in routine to be detected for early intervention. The primary aim is to help reduce deterioration and enable patients to stay in their homes for longer. A Support Vector Machine (SVM) and a Random Decision Forest classifier are modelled using data from three test homes. The trained models are then used to monitor two patients with dementia during a six-month clinical trial undertaken in partnership with Mersey Care NHS Foundation Trust. In the case of load disaggregation for appliance detection, the SVM achieved AUC=0.86074, Sen=0.756 and Spec=0.92838, while the Decision Forest achieved AUC=0.9429, Sen=0.9634 and Spec=0.9634. ADLs are also analysed to identify the behavioural patterns of the occupant while detecting alterations in routine. |
Tasks | |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.12080v1 |
http://arxiv.org/pdf/1903.12080v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-activities-of-daily-living-and |
Repo | |
Framework | |
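The underlying idea of the load disaggregation above — attributing step changes in whole-home power to individual appliances, from which ADL events follow — can be illustrated with a simple edge-matching sketch. The paper instead trains SVM and Random Decision Forest classifiers, and the appliance wattages below are hypothetical.

```python
def disaggregate(readings, signatures, tolerance=50):
    """Edge-detection NILM sketch: a jump in total power (watts)
    matching an appliance's rated wattage within `tolerance` is
    logged as that appliance switching on (+) or off (-)."""
    events = []
    for t in range(1, len(readings)):
        delta = readings[t] - readings[t - 1]
        for name, watts in signatures.items():
            if abs(abs(delta) - watts) <= tolerance:
                events.append((t, name, 'on' if delta > 0 else 'off'))
                break
    return events
```

An ADL such as "making a hot drink" would then be inferred from the resulting sequence of appliance events and their times of day.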
Controllable Paraphrase Generation with a Syntactic Exemplar
Title | Controllable Paraphrase Generation with a Syntactic Exemplar |
Authors | Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel |
Abstract | Prior work on controllable text generation usually assumes that the controlled attribute can take on one of a small set of values known a priori. In this work, we propose a novel task where the syntax of a generated sentence is instead controlled by a sentential exemplar. To evaluate quantitatively with standard metrics, we create a novel dataset with human annotations. We also develop a variational model with a neural module specifically designed for capturing syntactic knowledge, and several multitask training objectives to promote disentangled representation learning. Empirically, the proposed model is observed to achieve improvements over baselines and to learn to capture desirable characteristics. |
Tasks | Paraphrase Generation, Representation Learning, Text Generation |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00565v1 |
https://arxiv.org/pdf/1906.00565v1.pdf | |
PWC | https://paperswithcode.com/paper/190600565 |
Repo | |
Framework | |
PolSAR Image Classification based on Polarimetric Scattering Coding and Sparse Support Matrix Machine
Title | PolSAR Image Classification based on Polarimetric Scattering Coding and Sparse Support Matrix Machine |
Authors | Xu Liu, Licheng Jiao, Dan Zhang, Fang Liu |
Abstract | PolSAR images have an advantage over optical images because they can be acquired independently of cloud cover and solar illumination. PolSAR image classification is a hot and valuable topic for the interpretation of PolSAR images. In this paper, a novel PolSAR image classification method is proposed based on polarimetric scattering coding and a sparse support matrix machine. First, we transform the original PolSAR data into a real-valued matrix by polarimetric scattering coding; the result, called the polarimetric scattering matrix, is sparse. Second, the sparse support matrix machine is used to classify the sparse polarimetric scattering matrix and obtain the classification map. The combination of these two steps takes full account of the characteristics of PolSAR data. The experimental results show that the proposed method achieves better results and is an effective classification method. |
Tasks | Image Classification |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.07176v1 |
https://arxiv.org/pdf/1906.07176v1.pdf | |
PWC | https://paperswithcode.com/paper/polsar-image-classification-based-on |
Repo | |
Framework | |
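One plausible form of the polarimetric scattering coding mentioned above — splitting each complex scattering coefficient into four non-negative real channels so that the stacked result is a sparse real matrix — is sketched below. The paper's exact coding scheme may differ; this is an illustrative assumption.

```python
def scattering_code(z):
    """Encode one complex scattering coefficient as four non-negative
    reals (re+, re-, im+, im-); at most two are non-zero, so stacking
    these codes yields a sparse real-valued matrix."""
    re, im = z.real, z.imag
    return [max(re, 0.0), max(-re, 0.0), max(im, 0.0), max(-im, 0.0)]

def code_matrix(scattering_vector):
    """Apply the coding elementwise to a complex scattering vector."""
    return [scattering_code(z) for z in scattering_vector]
```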