October 18, 2019


Paper Group ANR 591

Beyond Pixels: Image Provenance Analysis Leveraging Metadata. Bayesian optimization of the PC algorithm for learning Gaussian Bayesian networks. Unlabeled sample compression schemes and corner peelings for ample and maximum classes. Layerwise Perturbation-Based Adversarial Training for Hard Drive Health Degree Prediction. A Transfer Learning based …

Beyond Pixels: Image Provenance Analysis Leveraging Metadata


Title	Beyond Pixels: Image Provenance Analysis Leveraging Metadata
Authors	Aparna Bharati, Daniel Moreira, Joel Brogan, Patricia Hale, Kevin W. Bowyer, Patrick J. Flynn, Anderson Rocha, Walter J. Scheirer
Abstract	Creative works, whether paintings or memes, follow unique journeys that result in their final form. Understanding these journeys, a process known as “provenance analysis”, provides rich insights into the use, motivation, and authenticity underlying any given work. The application of this type of study to the expanse of unregulated content on the Internet is what we consider in this paper. Provenance analysis provides a snapshot of the chronology and validity of content as it is uploaded, re-uploaded, and modified over time. Although still in its infancy, automated provenance analysis for online multimedia is already being applied to different types of content. Most current works seek to build provenance graphs based on the shared content between images or videos. This can be a computationally expensive task, especially when considering the vast influx of content that the Internet sees every day. Utilizing non-content-based information, such as timestamps, geotags, and camera IDs can help provide important insights into the path a particular image or video has traveled during its time on the Internet without large computational overhead. This paper tests the scope and applicability of metadata-based inferences for provenance graph construction in two different scenarios: digital image forensics and cultural analytics.
Tasks	graph construction
Published	2018-07-09
URL	http://arxiv.org/abs/1807.03376v3
PDF	http://arxiv.org/pdf/1807.03376v3.pdf
PWC	https://paperswithcode.com/paper/beyond-pixels-image-provenance-analysis
Repo
Framework

Bayesian optimization of the PC algorithm for learning Gaussian Bayesian networks


Title	Bayesian optimization of the PC algorithm for learning Gaussian Bayesian networks
Authors	Irene Córdoba, Eduardo C. Garrido-Merchán, Daniel Hernández-Lobato, Concha Bielza, Pedro Larrañaga
Abstract	The PC algorithm is a popular method for learning the structure of Gaussian Bayesian networks. It carries out statistical tests to determine absent edges in the network. It is hence governed by two parameters: (i) the type of test, and (ii) its significance level. These parameters are usually set to values recommended by an expert. Nevertheless, such an approach can suffer from human bias, leading to suboptimal reconstruction results. In this paper we consider a more principled approach for choosing these parameters in an automatic way. For this we optimize a reconstruction score evaluated on a set of different Gaussian Bayesian networks. This objective is expensive to evaluate and lacks a closed-form expression, which means that Bayesian optimization (BO) is a natural choice. BO methods use a model to guide the search and are hence able to exploit smoothness properties of the objective surface. We show that the parameters found by a BO method outperform those found by a random search strategy and the expert recommendation. Importantly, we have found that an often overlooked statistical test provides the best overall reconstruction results.
Tasks
Published	2018-06-28
URL	http://arxiv.org/abs/1806.11015v1
PDF	http://arxiv.org/pdf/1806.11015v1.pdf
PWC	https://paperswithcode.com/paper/bayesian-optimization-of-the-pc-algorithm-for
Repo
Framework
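
The abstract above says that the PC algorithm removes edges based on a statistical test with a chosen significance level. As a rough illustration only, the sketch below uses the Fisher z-test on partial correlations, a common choice for Gaussian data; the abstract does not say which tests the authors compared, and the function names, the example correlation matrix, and the sample size `n` are all assumptions made for this example.

```python
import math
import numpy as np

def partial_corr(corr, i, j, cond):
    """Partial correlation of variables i and j given the variables in cond."""
    idx = [i, j] + list(cond)
    sub = corr[np.ix_(idx, idx)]
    prec = np.linalg.inv(sub)  # precision matrix of the selected variables
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])

def fisher_z_independent(corr, n, i, j, cond, alpha=0.05):
    """Return True if the test does NOT reject independence at level alpha."""
    r = partial_corr(corr, i, j, cond)
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    p_value = math.erfc(stat / math.sqrt(2))  # two-sided standard normal p-value
    return p_value > alpha

# Hypothetical chain X -> Y -> Z: corr(X, Z) = corr(X, Y) * corr(Y, Z).
corr = np.array([
    [1.00, 0.80, 0.64],
    [0.80, 1.00, 0.80],
    [0.64, 0.80, 1.00],
])
marginal = fisher_z_independent(corr, n=100, i=0, j=2, cond=[])
conditional = fisher_z_independent(corr, n=100, i=0, j=2, cond=[1])
print(marginal, conditional)
```

In this toy chain the X-Z edge would be kept when no variable is conditioned on and dropped once Y is in the conditioning set. The test type and `alpha` are the two parameters the paper proposes to tune.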

Unlabeled sample compression schemes and corner peelings for ample and maximum classes


Title	Unlabeled sample compression schemes and corner peelings for ample and maximum classes
Authors	Jérémie Chalopin, Victor Chepoi, Shay Moran, Manfred K. Warmuth
Abstract	We examine connections between combinatorial notions that arise in machine learning and topological notions in cubical/simplicial geometry. These connections enable us to export results from geometry to machine learning. Our first main result is based on a geometric construction by Tracy Hall (2004) of a partial shelling of the cross-polytope which cannot be extended. We use it to derive a maximum class of VC dimension 3 that has no corners. This refutes several previous works in machine learning from the past 11 years. In particular, it implies that all previous constructions of optimal unlabeled sample compression schemes for maximum classes are erroneous. On the positive side we present a new construction of an unlabeled sample compression scheme for maximum classes. We leave as open whether our unlabeled sample compression scheme extends to ample (a.k.a. lopsided or extremal) classes, which represent a natural and far-reaching generalization of maximum classes. Towards resolving this question, we provide a geometric characterization in terms of unique sink orientations of the 1-skeletons of associated cubical complexes.
Tasks
Published	2018-12-05
URL	http://arxiv.org/abs/1812.02099v1
PDF	http://arxiv.org/pdf/1812.02099v1.pdf
PWC	https://paperswithcode.com/paper/unlabeled-sample-compression-schemes-and
Repo
Framework

Layerwise Perturbation-Based Adversarial Training for Hard Drive Health Degree Prediction


Title	Layerwise Perturbation-Based Adversarial Training for Hard Drive Health Degree Prediction
Authors	Jianguo Zhang, Ji Wang, Lifang He, Zhao Li, Philip S. Yu
Abstract	With the development of cloud computing and big data, the reliability of data storage systems becomes increasingly important. Previous researchers have shown that machine learning algorithms based on SMART attributes are effective methods to predict hard drive failures. In this paper, we use SMART attributes to predict hard drive health degrees, which are helpful for taking different fault-tolerant actions in advance. Given the highly imbalanced SMART datasets, predicting the health degree precisely is nontrivial. The proposed model would encounter overfitting and biased fitting problems if it were trained by the traditional methods. In order to resolve this problem, we propose two strategies to better utilize imbalanced data and improve performance. Firstly, we design a layerwise perturbation-based adversarial training method which can add perturbations to any layer of a neural network to improve the generalization of the network. Secondly, we extend the training method to the semi-supervised setting. Then, it is possible to utilize unlabeled data that have a potential of failure to further improve the performance of the model. Our extensive experiments on two real-world hard drive datasets demonstrate the superiority of the proposed schemes for both supervised and semi-supervised classification. The model trained by the proposed method can correctly predict the hard drive health status 5 and 15 days in advance. Finally, we verify the generality of the proposed training method in other similar anomaly detection tasks where the dataset is imbalanced. The results suggest that the proposed methods are applicable to other domains.
Tasks	Anomaly Detection
Published	2018-09-11
URL	http://arxiv.org/abs/1809.04188v4
PDF	http://arxiv.org/pdf/1809.04188v4.pdf
PWC	https://paperswithcode.com/paper/layerwise-perturbation-based-adversarial
Repo
Framework

A Transfer Learning based Feature-Weak-Relevant Method for Image Clustering


Title	A Transfer Learning based Feature-Weak-Relevant Method for Image Clustering
Authors	Bo Dong, Xinnian Wang
Abstract	Image clustering groups a set of images into disjoint clusters such that images in the same cluster are more similar to each other than to those in other clusters; it is an unsupervised or semi-supervised learning process. It is a crucial and challenging task in machine learning and computer vision. The performance of existing image clustering methods is closely tied to the features used for clustering, even though unsupervised coding based methods have improved performance considerably. To reduce the effect of clustering features, we propose a feature-weak-relevant method for image clustering. The proposed method converts an unsupervised clustering process into an alternating iterative process of unsupervised learning and transfer learning. The clustering process first starts from handcrafted-feature-based image clustering to estimate an initial label for every image, and then uses a proposed sampling strategy to choose images with reliable labels to feed a transfer-learning model that learns representative features for the next round of unsupervised learning. In this manner, image clustering is iteratively optimized. What’s more, the handcrafted features are used only to boot up the clustering process and have little effect on the final performance; therefore, the proposed method is feature-weak-relevant. Experimental results on six kinds of publicly available datasets show that the proposed method outperforms state-of-the-art methods and depends less on the employed features at the same time.
Tasks	Image Clustering, Transfer Learning
Published	2018-08-13
URL	http://arxiv.org/abs/1808.04068v2
PDF	http://arxiv.org/pdf/1808.04068v2.pdf
PWC	https://paperswithcode.com/paper/a-transfer-learning-based-feature-weak
Repo
Framework

The feasibility of automated identification of six algae types using neural networks and fluorescence-based spectral-morphological features


Title	The feasibility of automated identification of six algae types using neural networks and fluorescence-based spectral-morphological features
Authors	Jason L. Deglint, Chao Jin, Angela Chao, Alexander Wong
Abstract	Harmful algae blooms (HABs), which produce lethal toxins, are a growing global concern since they negatively affect the quality of drinking water and have a major negative impact on wildlife, the fishing industry, as well as tourism and recreational water use. In this study, we investigate the feasibility of leveraging machine learning and fluorescence-based spectral-morphological features to enable the identification of six different algae types in an automated fashion. More specifically, a custom multi-band fluorescence imaging microscope is used to capture fluorescence imaging data of a water sample at six different excitation wavelengths ranging from 405 nm to 530 nm. A number of morphological and spectral fluorescence features are then extracted from the isolated micro-organism imaging data, and used to train neural network classification models designed for the purpose of identification of the six algae types given an isolated micro-organism. Experimental results using three different neural network classification models showed that the use of either fluorescence-based spectral features or fluorescence-based spectral-morphological features to train neural network classification models led to statistically significant improvements in identification accuracy when compared to the use of morphological features (with average identification accuracies of 95.7%+/-3.5% and 96.1%+/-1.5%, respectively). These preliminary results are quite promising, given that the identification accuracy of human taxonomists is typically between 67% and 83%, and thus illustrate the feasibility of leveraging machine learning and fluorescence-based spectral-morphological features as a viable method for automated identification of different algae types.
Tasks
Published	2018-05-03
URL	http://arxiv.org/abs/1805.01093v1
PDF	http://arxiv.org/pdf/1805.01093v1.pdf
PWC	https://paperswithcode.com/paper/the-feasibility-of-automated-identification
Repo
Framework

Identity-Enhanced Network for Facial Expression Recognition


Title	Identity-Enhanced Network for Facial Expression Recognition
Authors	Yanwei Li, Xingang Wang, Shilei Zhang, Lingxi Xie, Wenqi Wu, Hongyuan Yu, Zheng Zhu
Abstract	Facial expression recognition is a challenging task, arguably because of large intra-class variations and high inter-class similarities. The core drawback of the existing approaches is the lack of ability to discriminate the changes in appearance caused by emotions and identities. In this paper, we present a novel identity-enhanced network (IDEnNet) to eliminate the negative impact of identity factor and focus on recognizing facial expressions. Spatial fusion combined with self-constrained multi-task learning are adopted to jointly learn the expression representations and identity-related information. We evaluate our approach on three popular datasets, namely Oulu-CASIA, CK+ and MMI. IDEnNet improves the baseline consistently, and achieves the best or comparable state-of-the-art on all three datasets.
Tasks	Facial Expression Recognition, Multi-Task Learning
Published	2018-12-11
URL	http://arxiv.org/abs/1812.04207v1
PDF	http://arxiv.org/pdf/1812.04207v1.pdf
PWC	https://paperswithcode.com/paper/identity-enhanced-network-for-facial
Repo
Framework

Chargrid: Towards Understanding 2D Documents


Title	Chargrid: Towards Understanding 2D Documents
Authors	Anoop Raveendra Katti, Christian Reisswig, Cordula Guder, Sebastian Brarda, Steffen Bickel, Johannes Höhne, Jean Baptiste Faddoul
Abstract	We introduce a novel type of text representation that preserves the 2D layout of a document. This is achieved by encoding each document page as a two-dimensional grid of characters. Based on this representation, we present a generic document understanding pipeline for structured documents. This pipeline makes use of a fully convolutional encoder-decoder network that predicts a segmentation mask and bounding boxes. We demonstrate its capabilities on an information extraction task from invoices and show that it significantly outperforms approaches based on sequential text or document images.
Tasks
Published	2018-09-24
URL	http://arxiv.org/abs/1809.08799v1
PDF	http://arxiv.org/pdf/1809.08799v1.pdf
PWC	https://paperswithcode.com/paper/chargrid-towards-understanding-2d-documents
Repo
Framework
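
The abstract above describes encoding each page as a two-dimensional grid of characters. A minimal sketch of one way to do that is shown below. The abstract does not specify the input format, so the `(char, x, y, w, h)` box tuples, the `vocab` mapping, and the choice to fill each character's box with its integer index are all assumptions for illustration.

```python
import numpy as np

def build_chargrid(chars, height, width, vocab):
    """Rasterize character boxes into an integer grid (0 = background).

    chars: iterable of (char, x, y, w, h) tuples in grid coordinates
           (a hypothetical format, e.g. derived from OCR output).
    vocab: dict mapping each character to a positive integer index.
    """
    grid = np.zeros((height, width), dtype=np.int64)
    for ch, x, y, w, h in chars:
        # Fill the area covered by the character with its index;
        # unknown characters fall back to the background index 0.
        grid[y:y + h, x:x + w] = vocab.get(ch, 0)
    return grid

vocab = {"A": 1, "B": 2}
boxes = [("A", 0, 0, 2, 2), ("B", 3, 1, 1, 2)]
grid = build_chargrid(boxes, height=3, width=5, vocab=vocab)
print(grid)
```

A grid like this keeps the spatial layout that a plain character sequence loses, which is what lets a convolutional network operate on it like an image.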

Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning


Title	Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning
Authors	David Mascharka, Philip Tran, Ryan Soklaski, Arjun Majumdar
Abstract	Visual question answering requires high-order reasoning about an image, which is a fundamental capability needed by machine systems to follow complex directives. Recently, modular networks have been shown to be an effective framework for performing visual reasoning tasks. While modular networks were initially designed with a degree of model transparency, their performance on complex visual reasoning benchmarks was lacking. Current state-of-the-art approaches do not provide an effective mechanism for understanding the reasoning process. In this paper, we close the performance gap between interpretable models and state-of-the-art visual reasoning methods. We propose a set of visual-reasoning primitives which, when composed, manifest as a model capable of performing complex reasoning tasks in an explicitly-interpretable manner. The fidelity and interpretability of the primitives’ outputs enable an unparalleled ability to diagnose the strengths and weaknesses of the resulting model. Critically, we show that these primitives are highly performant, achieving state-of-the-art accuracy of 99.1% on the CLEVR dataset. We also show that our model is able to effectively learn generalized representations when provided a small amount of data containing novel object attributes. Using the CoGenT generalization task, we show more than a 20 percentage point improvement over the current state of the art.
Tasks	Question Answering, Visual Question Answering, Visual Reasoning
Published	2018-03-14
URL	http://arxiv.org/abs/1803.05268v2
PDF	http://arxiv.org/pdf/1803.05268v2.pdf
PWC	https://paperswithcode.com/paper/transparency-by-design-closing-the-gap
Repo
Framework

PDE-constrained optimization in medical image analysis


Title	PDE-constrained optimization in medical image analysis
Authors	Andreas Mang, Amir Gholami, Christos Davatzikos, George Biros
Abstract	PDE-constrained optimization problems find many applications in medical image analysis, for example, neuroimaging, cardiovascular imaging, and oncological imaging. We review related literature and give examples on the formulation, discretization, and numerical solution of PDE-constrained optimization problems for medical imaging. We discuss three examples. The first one is image registration. The second one is data assimilation for brain tumor patients, and the third one is data assimilation in cardiovascular imaging. The image registration problem is a classical task in medical image analysis and seeks to find pointwise correspondences between two or more images. The data assimilation problems use a PDE-constrained formulation to link a biophysical model to patient-specific data obtained from medical images. The associated optimality systems turn out to be sets of nonlinear, multicomponent PDEs that are challenging to solve in an efficient way. The ultimate goal of our work is the design of inversion methods that integrate complementary data, and rigorously follow mathematical and physical principles, in an attempt to support clinical decision making. This requires reliable, high-fidelity algorithms with a short time-to-solution. This task is complicated by model and data uncertainties, and by the fact that PDE-constrained optimization problems are ill-posed in nature, and in general yield high-dimensional, severely ill-conditioned systems after discretization. These features make regularization, effective preconditioners, and iterative solvers that, in many cases, have to be implemented on distributed-memory architectures to be practical, a prerequisite. We showcase state-of-the-art techniques in scientific computing to tackle these challenges.
Tasks	Decision Making, Image Registration
Published	2018-02-28
URL	http://arxiv.org/abs/1803.00058v1
PDF	http://arxiv.org/pdf/1803.00058v1.pdf
PWC	https://paperswithcode.com/paper/pde-constrained-optimization-in-medical-image
Repo
Framework

Towards Highly Accurate Coral Texture Images Classification Using Deep Convolutional Neural Networks and Data Augmentation


Title	Towards Highly Accurate Coral Texture Images Classification Using Deep Convolutional Neural Networks and Data Augmentation
Authors	Anabel Gómez-Ríos, Siham Tabik, Julián Luengo, ASM Shihavuddin, Bartosz Krawczyk, Francisco Herrera
Abstract	The recognition of coral species based on underwater texture images poses a significant difficulty for machine learning algorithms, due to the following three challenges embedded in the nature of this data: 1) datasets do not include information about the global structure of the coral; 2) several species of coral have very similar characteristics; and 3) defining the spatial borders between classes is difficult as many corals tend to appear together in groups. For this reason, the classification of coral species has always required aid from a domain expert. The objective of this paper is to develop an accurate classification model for coral texture images. Current datasets contain a large number of imbalanced classes, while the images are subject to inter-class variation. We have analyzed 1) several Convolutional Neural Network (CNN) architectures, 2) data augmentation techniques and 3) transfer learning. We have achieved state-of-the-art accuracies using different variations of ResNet on the two current coral texture datasets, EILAT and RSMAS.
Tasks	Data Augmentation, Transfer Learning
Published	2018-03-27
URL	http://arxiv.org/abs/1804.00516v1
PDF	http://arxiv.org/pdf/1804.00516v1.pdf
PWC	https://paperswithcode.com/paper/towards-highly-accurate-coral-texture-images
Repo
Framework

Face Recognition: Primates in the Wild


Title	Face Recognition: Primates in the Wild
Authors	Debayan Deb, Susan Wiper, Alexandra Russo, Sixue Gong, Yichun Shi, Cori Tymoszek, Anil Jain
Abstract	We present a new method of primate face recognition, and evaluate this method on several endangered primates, including golden monkeys, lemurs, and chimpanzees. The three datasets contain a total of 11,637 images of 280 individual primates from 14 species. Primate face recognition performance is evaluated using two existing state-of-the-art open-source systems, (i) FaceNet and (ii) SphereFace, (iii) a lemur face recognition system from literature, and (iv) our new convolutional neural network (CNN) architecture called PrimNet. Three recognition scenarios are considered: verification (1:1 comparison), and both open-set and closed-set identification (1:N search). We demonstrate that PrimNet outperforms all of the other systems in all three scenarios for all primate species tested. Finally, we implement an Android application of this recognition system to assist primate researchers and conservationists in the wild for individual recognition of primates.
Tasks	Face Recognition
Published	2018-04-24
URL	http://arxiv.org/abs/1804.08790v1
PDF	http://arxiv.org/pdf/1804.08790v1.pdf
PWC	https://paperswithcode.com/paper/face-recognition-primates-in-the-wild
Repo
Framework
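
The abstract above distinguishes verification (1:1 comparison) from closed-set and open-set identification (1:N search). The sketch below illustrates the difference using cosine similarity on embedding vectors. It is not the PrimNet method: the function names, the toy two-dimensional embeddings, and the threshold value are assumptions chosen for the example.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(a, b, threshold=0.5):
    """1:1 comparison: do the two embeddings belong to the same individual?"""
    return cosine(a, b) >= threshold

def identify(probe, gallery, threshold=None):
    """1:N search over a gallery of known individuals.

    threshold=None  -> closed-set: always return the best match.
    threshold=float -> open-set: return None if the best score is too low.
    """
    scores = {name: cosine(probe, emb) for name, emb in gallery.items()}
    best = max(scores, key=scores.get)
    if threshold is not None and scores[best] < threshold:
        return None
    return best

# Hypothetical gallery with toy 2-D embeddings.
gallery = {"lemur_a": [1.0, 0.0], "lemur_b": [0.0, 1.0]}
same = verify([1.0, 0.0], [0.9, 0.1])
closed = identify([0.9, 0.1], gallery)
unknown = identify([-1.0, -1.0], gallery, threshold=0.5)
print(same, closed, unknown)
```

Open-set identification is the harder setting because the system must also decide that a probe matches nobody in the gallery.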

Image Classification for Arabic: Assessing the Accuracy of Direct English to Arabic Translations


Title	Image Classification for Arabic: Assessing the Accuracy of Direct English to Arabic Translations
Authors	Abdulkareem Alsudais
Abstract	Image classification is an ongoing research challenge. Most of the available research focuses on image classification for the English language; however, there is very little research on image classification for the Arabic language. Expanding image classification to Arabic has several applications. The present study investigated a method for generating Arabic labels for images of objects. The method used in this study involved a direct English to Arabic translation of the labels that are currently available on ImageNet, a database commonly used in image classification research. The purpose of this study was to test the accuracy of this method. In this study, 2,887 labeled images were randomly selected from ImageNet. All of the labels were translated from English to Arabic using Google Translate. The accuracy of the translations was evaluated. Results indicated that 65.6% of the Arabic labels were accurate. This study makes three important contributions to the image classification literature: (1) it determined the baseline level of accuracy for algorithms that provide Arabic labels for images, (2) it provided 1,895 images that are tagged with accurate Arabic labels, and (3) it provided the accuracy of translations of image labels from English to Arabic.
Tasks	Image Classification
Published	2018-07-13
URL	https://arxiv.org/abs/1807.05206v2
PDF	https://arxiv.org/pdf/1807.05206v2.pdf
PWC	https://paperswithcode.com/paper/image-classification-for-arabic-assessing-the
Repo
Framework
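
The figures in the abstract above can be cross-checked with a short calculation. The sketch assumes that the 1,895 accurately labeled images are the accurate subset of the 2,887 sampled images, which the abstract implies but does not state outright.

```python
total_labels = 2887      # images sampled from ImageNet (from the abstract)
accurate_labels = 1895   # images with accurate Arabic labels (from the abstract)

accuracy = accurate_labels / total_labels
print(f"{accuracy:.1%}")
```

Under that assumption the ratio rounds to the 65.6% reported in the abstract.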

Deep Learning For Computer Vision Tasks: A review


Title	Deep Learning For Computer Vision Tasks: A review
Authors	Rajat Kumar Sinha, Ruchi Pandey, Rohan Pattnaik
Abstract	Deep learning has recently become one of the most popular sub-fields of machine learning owing to its distributed data representation with multiple levels of abstraction. A diverse range of deep learning algorithms are being employed to solve conventional artificial intelligence problems. This paper gives an overview of some of the most widely used deep learning algorithms applied in the field of computer vision. It first inspects the various approaches of deep learning algorithms, followed by a description of their applications in image classification, object identification, image extraction and semantic segmentation in the presence of noise. The paper concludes with the discussion of the future scope and challenges for construction and training of deep neural networks.
Tasks	Image Classification, Semantic Segmentation
Published	2018-04-11
URL	http://arxiv.org/abs/1804.03928v1
PDF	http://arxiv.org/pdf/1804.03928v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-computer-vision-tasks-a
Repo
Framework

Automatically augmenting an emotion dataset improves classification using audio


Title	Automatically augmenting an emotion dataset improves classification using audio
Authors	Egor Lakomkin, Cornelius Weber, Stefan Wermter
Abstract	In this work, we tackle the problem of speech emotion classification. One of the issues in the area of affective computing is that the amount of annotated data is very limited. On the other hand, the number of ways that the same emotion can be expressed verbally is enormous due to variability between speakers. This is one of the factors that limits performance and generalization. We propose a simple method that extracts audio samples from movies using textual sentiment analysis. As a result, it is possible to automatically construct a larger dataset of audio samples with positive, negative emotional and neutral speech. We show that pretraining a recurrent neural network on such a dataset yields better results on the challenging EmotiW corpus. This experiment shows a potential benefit of combining textual sentiment analysis with vocal information.
Tasks	Emotion Classification, Sentiment Analysis
Published	2018-03-30
URL	http://arxiv.org/abs/1803.11506v1
PDF	http://arxiv.org/pdf/1803.11506v1.pdf
PWC	https://paperswithcode.com/paper/automatically-augmenting-an-emotion-dataset
Repo
Framework