January 31, 2020


Paper Group ANR 76

POIRot: A rotation invariant omni-directional pointnet. FLATM: A Fuzzy Logic Approach Topic Model for Medical Documents. Feature Relevance Determination for Ordinal Regression in the Context of Feature Redundancies and Privileged Information. M2FPA: A Multi-Yaw Multi-Pitch High-Quality Database and Benchmark for Facial Pose Analysis. On the Self-Si …

POIRot: A rotation invariant omni-directional pointnet


Title	POIRot: A rotation invariant omni-directional pointnet
Authors	Liu Yang, Rudrasis Chakraborty, Stella X. Yu
Abstract	A point-cloud is an efficient way to represent the 3D world. Analysis of point-clouds deals with understanding the underlying 3D geometric structure. But due to the lack of smooth topology, and hence the lack of neighborhood structure, standard correlation cannot be directly applied to a point-cloud. One popular approach to point correlation is to partition the point-cloud into voxels and extract features using standard 3D correlation. But this approach suffers from the sparsity of the point-cloud and hence results in many empty voxels. One possible solution to this problem is to learn an MLP that maps a point or its local neighborhood to a high-dimensional feature space. All these methods require a large number of parameters and are susceptible to random rotations. A popular way to make the model "invariant" to rotations is to use data augmentation with small rotations, but the potential drawbacks include (i) the need for more training samples and (ii) susceptibility to large rotations. In this work, we develop a rotation-invariant point-cloud segmentation and classification scheme based on the omni-directional camera model (dubbed POIRot). Our proposed model is rotationally invariant and can preserve the geometric shape of a 3D point-cloud. Because of this inherent rotation invariance, our proposed framework requires fewer parameters (please see [Iandola2017SqueezeNetAA] and the references therein for the motivation behind lean models). Several experiments have been performed to show that our proposed method can beat the state-of-the-art algorithms in classification and part segmentation applications.
Tasks	Data Augmentation
Published	2019-10-29
URL	https://arxiv.org/abs/1910.13050v2
PDF	https://arxiv.org/pdf/1910.13050v2.pdf
PWC	https://paperswithcode.com/paper/poirot-a-rotation-invariant-omni-directional
Repo
Framework

FLATM: A Fuzzy Logic Approach Topic Model for Medical Documents


Title	FLATM: A Fuzzy Logic Approach Topic Model for Medical Documents
Authors	Amir Karami, Aryya Gangopadhyay, Bin Zhou, Hadi Kharrazi
Abstract	One of the challenges for text analysis in medical domains is analyzing large-scale medical documents. As a consequence, finding relevant documents has become more difficult. One of the popular methods to retrieve information based on discovering the themes in the documents is topic modeling. The themes in the documents help to retrieve documents on the same topic with and without a query. In this paper, we present a novel approach to topic modeling using fuzzy clustering. To evaluate our model, we experiment with two text datasets of medical documents. The evaluation metrics carried out through document classification and document modeling show that our model produces better performance than LDA, indicating that fuzzy set theory can improve the performance of topic models in medical domains.
Tasks	Document Classification, Topic Models
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10953v1
PDF	https://arxiv.org/pdf/1911.10953v1.pdf
PWC	https://paperswithcode.com/paper/flatm-a-fuzzy-logic-approach-topic-model-for
Repo
Framework
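
The FLATM abstract names fuzzy clustering as its basis but gives no algorithmic detail. As a rough illustration of what "fuzzy" membership means for documents, here is a minimal sketch of generic fuzzy c-means on a toy term-count matrix. It is not the authors' method; the function name `fuzzy_c_means`, the fuzzifier `m=2.0`, and the toy data are all assumptions made for this example.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Generic fuzzy c-means (illustrative only, not the FLATM algorithm).

    X is an (n_docs, n_terms) matrix. Returns soft memberships U of shape
    (n_docs, n_clusters), whose rows sum to 1, and the cluster centers.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Start from random memberships, normalised per document.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the documents.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every document to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        # Standard membership update: closer centers get larger membership.
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

# Hypothetical toy corpus: rows are documents, columns are term counts.
docs = np.array([[3, 2, 0, 0],
                 [2, 3, 0, 0],
                 [0, 0, 3, 2],
                 [0, 0, 2, 3]])
U, centers = fuzzy_c_means(docs, n_clusters=2)
```

Unlike a hard clustering, each row of `U` spreads a document's weight over all clusters, which is the property a fuzzy topic model can exploit when a document mixes several themes.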

Feature Relevance Determination for Ordinal Regression in the Context of Feature Redundancies and Privileged Information


Title	Feature Relevance Determination for Ordinal Regression in the Context of Feature Redundancies and Privileged Information
Authors	Lukas Pfannschmidt, Jonathan Jakob, Fabian Hinder, Michael Biehl, Peter Tino, Barbara Hammer
Abstract	Advances in machine learning technologies have led to increasingly powerful models, in particular in the context of big data. Yet, many application scenarios demand robustly interpretable models rather than optimum model accuracy; as an example, this is the case if potential biomarkers or causal factors should be discovered based on a set of given measurements. In this contribution, we focus on feature selection paradigms, which enable us to uncover relevant factors of a given regularity based on a sparse model. We focus on the important specific setting of linear ordinal regression, i.e., data have to be ranked into one of a finite number of ordered categories by a linear projection. Unlike previous work, we consider the case that features are potentially redundant, such that no unique minimum set of relevant features exists. We aim for an identification of all strongly and all weakly relevant features as well as their type of relevance (strong or weak); we achieve this goal by determining feature relevance bounds, which correspond to the minimum and maximum feature relevance, respectively, if searched over all equivalent models. In addition, we discuss how this setting enables us to substitute some of the features, e.g., due to their semantics, and how to extend the framework of feature relevance intervals to the setting of privileged information, i.e., potentially relevant information is available for training purposes only, but cannot be used for the prediction itself.
Tasks	Feature Selection
Published	2019-12-10
URL	https://arxiv.org/abs/1912.04832v1
PDF	https://arxiv.org/pdf/1912.04832v1.pdf
PWC	https://paperswithcode.com/paper/feature-relevance-determination-for-ordinal
Repo
Framework

M2FPA: A Multi-Yaw Multi-Pitch High-Quality Database and Benchmark for Facial Pose Analysis


Title	M2FPA: A Multi-Yaw Multi-Pitch High-Quality Database and Benchmark for Facial Pose Analysis
Authors	Peipei Li, Xiang Wu, Yibo Hu, Ran He, Zhenan Sun
Abstract	Facial images in surveillance or mobile scenarios often have large view-point variations in terms of pitch and yaw angles. These jointly occurring angle variations make face recognition challenging. Current public face databases mainly consider the case of yaw variations. In this paper, a new large-scale Multi-yaw Multi-pitch high-quality database is proposed for Facial Pose Analysis (M2FPA), including face frontalization, face rotation, facial pose estimation and pose-invariant face recognition. It contains 397,544 images of 229 subjects, covering variations in yaw, pitch, attribute, illumination and accessory. M2FPA is the most comprehensive multi-view face database for facial pose analysis. Further, we provide an effective benchmark for face frontalization and pose-invariant face recognition on M2FPA with several state-of-the-art methods, including DR-GAN, TP-GAN and CAPG-GAN. We believe that the new database and benchmark can significantly push forward the advance of facial pose analysis in real-world applications. Moreover, a simple yet effective parsing guided discriminator is introduced to capture the local consistency during GAN optimization. Extensive quantitative and qualitative results on M2FPA and Multi-PIE demonstrate the superiority of our face frontalization method. Baseline results for both face synthesis and face recognition from state-of-the-art methods demonstrate the challenge offered by this new database.
Tasks	Face Generation, Face Recognition, Pose Estimation, Robust Face Recognition
Published	2019-03-30
URL	https://arxiv.org/abs/1904.00168v2
PDF	https://arxiv.org/pdf/1904.00168v2.pdf
PWC	https://paperswithcode.com/paper/m2fpa-a-multi-yaw-multi-pitch-high-quality
Repo
Framework

On the Self-Similarity of Natural Stochastic Textures


Title	On the Self-Similarity of Natural Stochastic Textures
Authors	Samah Khawaled, Yehoshua Y. Zeevi
Abstract	Self-similarity is the essence of fractal images and, as such, characterizes natural stochastic textures. This paper is concerned with the property of self-similarity in the statistical sense in the case of fully-textured images that contain both stochastic texture and structural (mostly deterministic) information. We first decompose a textured image into two layers corresponding to its texture and structure, and show that the layer representing the stochastic texture is characterized by random phase of uniform distribution, unlike the phase of the structured information, which is coherent. The uniform distribution of the random phase is verified by using a suitable hypothesis testing framework. We proceed by proposing two approaches to the assessment of self-similarity. The first is based on patch-wise calculation of the mutual information, while the second measures the mutual information that exists across scales. Quantifying the extent of self-similarity by means of mutual information is of paramount importance in the analysis of natural stochastic textures that are encountered in medical imaging, geology, agriculture and in computer vision algorithms that are designed for application on fully-textured images.
Tasks
Published	2019-06-16
URL	https://arxiv.org/abs/1906.06768v1
PDF	https://arxiv.org/pdf/1906.06768v1.pdf
PWC	https://paperswithcode.com/paper/on-the-self-similarity-of-natural-stochastic
Repo
Framework
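
The abstract above assesses self-similarity through mutual information between patches but does not say which estimator is used. The following is a minimal sketch of one common choice, a histogram-based plug-in estimate; the function name `mutual_information`, the bin count, and the random test patches are assumptions for illustration rather than the authors' procedure.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Histogram (plug-in) estimate of mutual information, in nats,
    between two equally sized patches a and b."""
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    p_ab = joint / joint.sum()                 # joint distribution
    p_a = p_ab.sum(axis=1, keepdims=True)      # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)      # marginal of b, shape (1, bins)
    nz = p_ab > 0                              # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))

# Hypothetical patches: one compared with itself, one with unrelated noise.
rng = np.random.default_rng(0)
patch = rng.random((32, 32))
noise = rng.random((32, 32))
mi_self = mutual_information(patch, patch)
mi_noise = mutual_information(patch, noise)
```

A patch shares far more information with itself than with independent noise, so `mi_self` comes out much larger than `mi_noise`; a patch-wise self-similarity score would compare pairs of patches from the same texture in the same way.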

Wav2Pix: Speech-conditioned Face Generation using Generative Adversarial Networks


Title	Wav2Pix: Speech-conditioned Face Generation using Generative Adversarial Networks
Authors	Amanda Duarte, Francisco Roldan, Miquel Tubau, Janna Escur, Santiago Pascual, Amaia Salvador, Eva Mohedano, Kevin McGuinness, Jordi Torres, Xavier Giro-i-Nieto
Abstract	Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a Generative Adversarial Network (GAN) with raw speech input. We propose a deep neural network that is trained from scratch in an end-to-end fashion, generating a face directly from the raw speech waveform without any additional identity information (e.g., a reference image or one-hot encoding). Our model is trained in a self-supervised approach by exploiting the audio and visual signals naturally aligned in videos. With the purpose of training from video data, we present a novel dataset collected for this work, with high-quality videos of YouTubers with notable expressiveness in both the speech and visual signals.
Tasks	Face Generation
Published	2019-03-25
URL	http://arxiv.org/abs/1903.10195v1
PDF	http://arxiv.org/pdf/1903.10195v1.pdf
PWC	https://paperswithcode.com/paper/wav2pix-speech-conditioned-face-generation
Repo
Framework

Adversarial Security Attacks and Perturbations on Machine Learning and Deep Learning Methods


Title	Adversarial Security Attacks and Perturbations on Machine Learning and Deep Learning Methods
Authors	Arif Siddiqi
Abstract	Ever-growing big data and emerging artificial intelligence (AI) demand the use of machine learning (ML) and deep learning (DL) methods. Cybersecurity also benefits from ML and DL methods for various types of applications. These methods, however, are susceptible to security attacks. Adversaries can exploit the training and testing data of the learning models or can explore the workings of those models to launch advanced future attacks. The topic of adversarial security attacks and perturbations within the ML and DL domains is a recent exploration, and security researchers and practitioners have expressed great interest in it. The literature covers different adversarial security attacks and perturbations on ML and DL methods, each with its own presentation style and merits. However, the research community currently needs a review that consolidates knowledge on this increasingly focused and growing research topic. In this review paper, we specifically aim to target new researchers in the cybersecurity domain who may seek to acquire some basic knowledge of machine learning and deep learning models and algorithms, as well as some of the relevant adversarial security attacks and perturbations.
Tasks
Published	2019-07-17
URL	https://arxiv.org/abs/1907.07291v1
PDF	https://arxiv.org/pdf/1907.07291v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-security-attacks-and
Repo
Framework

CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition


Title	CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition
Authors	Linhao Dong, Bo Xu
Abstract	In this paper, we propose a novel soft and monotonic alignment mechanism for sequence transduction. It is inspired by the integrate-and-fire model in spiking neural networks, is employed in the encoder-decoder framework, and consists of continuous functions, hence the name Continuous Integrate-and-Fire (CIF). Applied to the ASR task, CIF not only shows a concise calculation, but also supports online recognition and acoustic boundary positioning, making it suitable for various ASR scenarios. Several support strategies are also proposed to alleviate the unique problems of CIF-based models. With the joint action of these methods, the CIF-based model shows competitive performance. Notably, it achieves a word error rate (WER) of 2.86% on the test-clean set of Librispeech and sets a new state-of-the-art result on a Mandarin telephone ASR benchmark.
Tasks	End-To-End Speech Recognition, Language Modelling, Multi-Task Learning, Speech Recognition
Published	2019-05-27
URL	https://arxiv.org/abs/1905.11235v4
PDF	https://arxiv.org/pdf/1905.11235v4.pdf
PWC	https://paperswithcode.com/paper/cif-continuous-integrate-and-fire-for-end-to
Repo
Framework
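
The CIF abstract describes a soft, monotonic integrate-and-fire alignment without giving the update rule. The sketch below shows one way such a mechanism can work: per-frame weights are accumulated until a threshold is reached, at which point an integrated vector is emitted ("fired") and the leftover weight starts the next one. The function name `integrate_and_fire`, the threshold of 1.0, the assumption that each weight lies in [0, 1], and the toy inputs are illustrative choices, not details confirmed by the abstract.

```python
import numpy as np

def integrate_and_fire(h, alpha, threshold=1.0):
    """Accumulate encoder frames h (T, D) with weights alpha (T,).

    Assumes 0 <= alpha[t] <= threshold, so at most one firing per frame.
    Returns the fired vectors, shape (n_fired, D). Any weight left over
    after the last frame is discarded in this sketch.
    """
    h = np.asarray(h, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    acc_w = 0.0                      # weight integrated so far
    acc_h = np.zeros(h.shape[1])     # weighted sum of frames so far
    fired = []
    for h_t, a_t in zip(h, alpha):
        if acc_w + a_t < threshold:
            # Keep integrating: no boundary reached yet.
            acc_w += a_t
            acc_h = acc_h + a_t * h_t
        else:
            # Boundary inside this frame: split its weight in two parts.
            used = threshold - acc_w
            fired.append(acc_h + used * h_t)
            rest = a_t - used
            acc_w = rest
            acc_h = rest * h_t
    return np.array(fired).reshape(-1, h.shape[1])

# Hypothetical encoder outputs (4 frames, 2 dims) and per-frame weights.
frames = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
weights = np.array([0.5, 0.75, 0.5, 0.5])
outputs = integrate_and_fire(frames, weights)
```

Because frames are consumed strictly left to right and each fired vector uses exactly `threshold` worth of weight, the alignment is monotonic and can be computed online, which matches the properties the abstract highlights.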

Artificial Intelligence as a Services (AI-aaS) on Software-Defined Infrastructure


Title	Artificial Intelligence as a Services (AI-aaS) on Software-Defined Infrastructure
Authors	Saeedeh Parsaeefard, Iman Tabrizian, Alberto Leon-Garcia
Abstract	This paper investigates a paradigm for offering artificial intelligence as a service (AI-aaS) on software-defined infrastructures (SDIs). The increasing complexity of networking and computing infrastructures is already driving the introduction of automation in networking and cloud computing management systems. Here we consider how these automation mechanisms can be leveraged to offer AI-aaS. Use cases for AI-aaS are easily found in addressing smart applications in sectors such as transportation, manufacturing, energy, water, air quality, and emissions. We propose an architectural scheme based on SDIs where each AI-aaS application comprises a monitoring, analysis, policy, execution plus knowledge (MAPE-K) loop (MKL). Each application is composed as one or more specific service chains embedded in the SDI, some of which will include a Machine Learning (ML) pipeline. Our model includes a new training plane and an AI-aaS plane to deal with the model-development and operational phases of AI applications. We also consider the role of an ML/MKL sandbox in ensuring coherency and consistency in the operation of multiple parallel MKL loops. We present experimental measurement results for three AI-aaS applications deployed on the SAVI testbed: (1) compressing monitored data in SDI using autoencoders; (2) traffic monitoring to allocate CPU resources to VNFs; and (3) highway segment classification in smart transportation.
Tasks
Published	2019-07-11
URL	https://arxiv.org/abs/1907.05505v1
PDF	https://arxiv.org/pdf/1907.05505v1.pdf
PWC	https://paperswithcode.com/paper/artificial-intelligence-as-a-services-ai-aas
Repo
Framework

DeepEvolution: A Search-Based Testing Approach for Deep Neural Networks


Title	DeepEvolution: A Search-Based Testing Approach for Deep Neural Networks
Authors	Houssem Ben Braiek, Foutse Khomh
Abstract	The increasing inclusion of Deep Learning (DL) models in safety-critical systems such as autonomous vehicles has led to the development of multiple model-based DL testing techniques. One common denominator of these testing techniques is the automated generation of test cases, e.g., new inputs transformed from the original training data with the aim of optimizing some test adequacy criteria. So far, the effectiveness of these approaches has been hindered by their reliance on random fuzzing or transformations that do not always produce test cases with good diversity. To overcome these limitations, we propose DeepEvolution, a novel search-based approach for testing DL models that relies on metaheuristics to ensure maximum diversity in generated test cases. We assess the effectiveness of DeepEvolution in testing computer-vision DL models and find that it significantly increases the neuronal coverage of generated test cases. Moreover, using DeepEvolution, we could successfully find several corner-case behaviors. Finally, DeepEvolution outperformed Tensorfuzz (a coverage-guided fuzzing tool developed at Google Brain) in detecting latent defects introduced during the quantization of the models. These results suggest that search-based approaches can help build effective testing tools for DL systems.
Tasks	Autonomous Vehicles, Quantization
Published	2019-09-05
URL	https://arxiv.org/abs/1909.02563v1
PDF	https://arxiv.org/pdf/1909.02563v1.pdf
PWC	https://paperswithcode.com/paper/deepevolution-a-search-based-testing-approach
Repo
Framework

Feedback Recurrent AutoEncoder


Title	Feedback Recurrent AutoEncoder
Authors	Yang Yang, Guillaume Sautière, J. Jon Ryu, Taco S Cohen
Abstract	In this work, we propose a new recurrent autoencoder architecture, termed Feedback Recurrent AutoEncoder (FRAE), for online compression of sequential data with temporal dependency. The recurrent structure of FRAE is designed to efficiently extract the redundancy along the time dimension and allows a compact discrete representation of the data to be learned. We demonstrate its effectiveness in speech spectrogram compression. Specifically, we show that the FRAE, paired with a powerful neural vocoder, can produce high-quality speech waveforms at a low, fixed bitrate. We further show that by adding a learned prior for the latent space and using an entropy coder, we can achieve an even lower variable bitrate.
Tasks
Published	2019-11-11
URL	https://arxiv.org/abs/1911.04018v2
PDF	https://arxiv.org/pdf/1911.04018v2.pdf
PWC	https://paperswithcode.com/paper/feedback-recurrent-autoencoder
Repo
Framework

CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement


Title	CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement
Authors	Mandar Gogate, Kia Dashtipour, Ahsan Adeel, Amir Hussain
Abstract	Noisy situations cause huge problems for sufferers of hearing loss, as hearing aids often make the signal more audible but do not always restore intelligibility. In noisy settings, humans routinely exploit the audio-visual (AV) nature of speech to selectively suppress the background noise and to focus on the target speaker. In this paper, we present a causal, language-, noise- and speaker-independent AV deep neural network (DNN) architecture for speech enhancement (SE). The model exploits the noisy acoustic cues and noise-robust visual cues to focus on the desired speaker and improve speech intelligibility. To evaluate the proposed SE framework, a first-of-its-kind AV binaural speech corpus, called ASPIRE, is recorded in real noisy environments including a cafeteria and a restaurant. We demonstrate superior performance of our approach in terms of objective measures and subjective listening tests over the state-of-the-art SE approaches as well as recent DNN-based SE models. In addition, our work challenges a popular belief that the scarcity of multi-language, large-vocabulary AV corpora and of a wide variety of noises is a major bottleneck to building robust language-, speaker- and noise-independent SE systems. We show that a model trained on a synthetic mixture of the Grid corpus (with 33 speakers and a small English vocabulary) and CHiME 3 noises (consisting of only bus, pedestrian, cafeteria, and street noises) generalises well not only to large-vocabulary corpora but also to completely unrelated languages (such as Mandarin) and a wide variety of speakers and noises.
Tasks	Speech Enhancement
Published	2019-09-23
URL	https://arxiv.org/abs/1909.10407v1
PDF	https://arxiv.org/pdf/1909.10407v1.pdf
PWC	https://paperswithcode.com/paper/190910407
Repo
Framework

Classifying Topological Charge in SU(3) Yang-Mills Theory with Machine Learning


Title	Classifying Topological Charge in SU(3) Yang-Mills Theory with Machine Learning
Authors	Takuya Matsumoto, Masakiyo Kitazawa, Yasuhiro Kohno
Abstract	We apply a machine learning technique to identify the topological charge of quantum gauge configurations in four-dimensional SU(3) Yang-Mills theory. The topological charge density measured on the original and smoothed gauge configurations, with and without dimensional reduction, is used as input to neural networks (NN) with and without convolutional layers. The gradient flow is used for the smoothing of the gauge field. We find that the topological charge determined at a large flow time can be predicted with high accuracy from the data at small flow times by the trained NN; the accuracy exceeds $99\%$ with the data at $t/a^2\le0.3$. High robustness against changes in the simulation parameters is also confirmed. We find that the best performance is obtained when the spatial coordinates of the topological charge density are fully integrated out as a preprocessing step, which implies that our convolutional NN does not find characteristic structures in multi-dimensional space relevant for the determination of the topological charge.
Tasks
Published	2019-09-13
URL	https://arxiv.org/abs/1909.06238v1
PDF	https://arxiv.org/pdf/1909.06238v1.pdf
PWC	https://paperswithcode.com/paper/classifying-topological-charge-in-su3-yang
Repo
Framework

When a Tweet is Actually Sexist. A more Comprehensive Classification of Different Online Harassment Categories and The Challenges in NLP


Title	When a Tweet is Actually Sexist. A more Comprehensive Classification of Different Online Harassment Categories and The Challenges in NLP
Authors	Sima Sharifirad, Stan Matwin
Abstract	Sexism is very common in social media and makes the boundaries of freedom tighter for feminist and female users. There is still no comprehensive classification of sexism suited to natural language processing techniques. Categorizing sexism in social media as either hostile or benevolent is so general that it ignores the other types of sexism occurring in these media. This paper proposes a more comprehensive and in-depth categorization of online harassment in social media (e.g., Twitter) into the following categories: "Indirect harassment", "Information threat", "Sexual harassment", "Physical harassment" and "Not sexist". It addresses the challenge of labeling them and presents the classification results for these categories. It is preliminary work applying machine learning to learn the concept of sexism, and it distinguishes itself by looking at more precise categories of sexism in social media.
Tasks
Published	2019-02-27
URL	http://arxiv.org/abs/1902.10584v1
PDF	http://arxiv.org/pdf/1902.10584v1.pdf
PWC	https://paperswithcode.com/paper/when-a-tweet-is-actually-sexist-a-more
Repo
Framework

Joint, Partially-joint, and Individual Independent Component Analysis in Multi-Subject fMRI Data


Title	Joint, Partially-joint, and Individual Independent Component Analysis in Multi-Subject fMRI Data
Authors	Mansooreh Pakravan, Mohammad Bagher Shamsollahi
Abstract	Objective: Joint analysis of multi-subject brain imaging datasets has wide applications in biomedical engineering. In these datasets, some sources belong to all subjects (joint), a subset of subjects (partially-joint), or a single subject (individual). In this paper, this source model is referred to as joint/partially-joint/individual multiple datasets multidimensional (JpJI-MDM), and accordingly, a source extraction method is developed. Method: We present a deflation-based algorithm utilizing higher-order cumulants to analyze the JpJI-MDM source model. The algorithm maximizes a cost function, which leads to an eigenvalue problem solved with thin-SVD (singular value decomposition) factorization. Furthermore, we introduce the JpJI-feature, which indicates the spatial shape of each source and the amount of its jointness with other subjects. We use this feature to determine the type of sources. Results: We evaluate our algorithm by analyzing simulated data and two real functional magnetic resonance imaging (fMRI) datasets. In our simulation study, we show that the proposed algorithm determines the type of sources with accuracies of 95% and 100% for the 2-class and 3-class clustering scenarios, respectively. Furthermore, our algorithm extracts meaningful joint and partially-joint sources from the two real datasets, which are consistent with existing neuroscience studies. Conclusion: Our results in analyzing the real datasets reveal that both datasets follow the JpJI-MDM source model. This source model improves the accuracy of source extraction methods developed for multi-subject datasets. Significance: The proposed joint blind source separation algorithm is robust and avoids parameters that are difficult to fine-tune.
Tasks
Published	2019-09-09
URL	https://arxiv.org/abs/1909.03676v2
PDF	https://arxiv.org/pdf/1909.03676v2.pdf
PWC	https://paperswithcode.com/paper/joint-partially-joint-and-individual
Repo
Framework