January 31, 2020

Paper Group ANR 162

Improving Borderline Adulthood Facial Age Estimation through Ensemble Learning. Robust and Communication-Efficient Federated Learning from Non-IID Data. Introduction to Voice Presentation Attack Detection and Recent Advances. Connecting Touch and Vision via Cross-Modal Prediction. Augmenting Gastrointestinal Health: A Deep Learning Approach to Huma …

Improving Borderline Adulthood Facial Age Estimation through Ensemble Learning

Title Improving Borderline Adulthood Facial Age Estimation through Ensemble Learning
Authors Felix Anda, David Lillis, Aikaterini Kanta, Brett A. Becker, Elias Bou-Harb, Nhien-An Le-Khac, Mark Scanlon
Abstract Achieving high performance for facial age estimation with subjects in the borderline between adulthood and non-adulthood has always been a challenge. Several studies have used different approaches covering ages from infancy to old age, and different datasets have been employed, with reported mean absolute errors (MAE) ranging from 1.47 to 8 years. The weakness of the algorithms specifically in the borderline has been a motivation for this paper. In our approach, we have developed an ensemble technique that improves the accuracy of underage estimation in conjunction with our deep learning model (DS13K), which has been fine-tuned from the Deep Expectation (DEX) model. We have achieved an accuracy of 68% for the age group 16 to 17 years old, which is 4 times better than the DEX accuracy for this age range. We also present an evaluation of existing cloud-based and offline facial age prediction services, such as Amazon Rekognition, Microsoft Azure Cognitive Services, How-Old.net and DEX.
Tasks Age Estimation
Published 2019-07-02
URL https://arxiv.org/abs/1907.01427v1
PDF https://arxiv.org/pdf/1907.01427v1.pdf
PWC https://paperswithcode.com/paper/improving-borderline-adulthood-facial-age
Repo
Framework
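
The abstract reports overall error as mean absolute error (MAE) and borderline performance as accuracy for the 16 to 17 age group, but it does not describe the ensemble rule. The sketch below shows how such figures could be computed. Simple averaging of model outputs stands in for the paper's ensemble technique, the definition of group accuracy (rounded prediction landing in the same age range) is an assumption, and all function names and numbers are hypothetical.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """MAE in years: the average of |true age - predicted age|."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def average_ensemble(*model_predictions):
    """Hypothetical ensemble rule: average the age predicted by each model for each subject."""
    return [mean(preds) for preds in zip(*model_predictions)]


def group_accuracy(y_true, y_pred, low=16, high=17):
    """Share of subjects aged [low, high] whose rounded prediction also falls in that range."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if low <= t <= high]
    if not pairs:
        return float("nan")
    hits = sum(1 for _, p in pairs if low <= round(p) <= high)
    return hits / len(pairs)


# Toy predictions from two hypothetical models (not data from the paper).
y_true = [16, 17, 25, 40]
model_a = [15.0, 18.0, 24.0, 42.0]
model_b = [17.0, 16.0, 27.0, 40.0]
ensemble = average_ensemble(model_a, model_b)

print(mean_absolute_error(y_true, model_a), group_accuracy(y_true, model_a))
print(mean_absolute_error(y_true, ensemble), group_accuracy(y_true, ensemble))
```

In this toy case each model errs in opposite directions on the borderline subjects, so averaging pulls both predictions into the 16 to 17 range; real ensembles need not behave this way.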

Robust and Communication-Efficient Federated Learning from Non-IID Data

Title Robust and Communication-Efficient Federated Learning from Non-IID Data
Authors Felix Sattler, Simon Wiedemann, Klaus-Robert Müller, Wojciech Samek
Abstract Federated Learning allows multiple parties to jointly train a deep learning model on their combined data, without any of the participants having to reveal their local data to a centralized server. This form of privacy-preserving collaborative learning, however, comes at the cost of a significant communication overhead during training. To address this problem, several compression methods have been proposed in the distributed training literature that can reduce the amount of required communication by up to three orders of magnitude. These existing methods, however, are of only limited utility in the Federated Learning setting, as they either compress only the upstream communication from the clients to the server (leaving the downstream communication uncompressed) or perform well only under idealized conditions, such as an iid distribution of the client data, which typically cannot be found in Federated Learning. In this work, we propose Sparse Ternary Compression (STC), a new compression framework that is specifically designed to meet the requirements of the Federated Learning environment. Our experiments on four different learning tasks demonstrate that STC distinctively outperforms Federated Averaging in common Federated Learning scenarios where clients either a) hold non-iid data, b) use small batch sizes during training, or where c) the number of clients is large and the participation rate in every communication round is low. We furthermore show that even if the clients hold iid data and use medium-sized batches for training, STC is still Pareto-superior to Federated Averaging in the sense that it achieves fixed target accuracies on our benchmarks within both fewer training iterations and a smaller communication budget.
Tasks
Published 2019-03-07
URL http://arxiv.org/abs/1903.02891v1
PDF http://arxiv.org/pdf/1903.02891v1.pdf
PWC https://paperswithcode.com/paper/robust-and-communication-efficient-federated
Repo
Framework
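
The abstract does not say how STC compresses an update. The sketch below follows our reading of the full paper: top-k sparsification, ternarization of the surviving entries to a single magnitude mu, and error accumulation so that dropped information is carried into the next round. It omits the encoding of non-zero positions and the server-side (downstream) compression, and the function names are hypothetical, so treat it as an illustration rather than the authors' implementation.

```python
def sparse_ternary_compress(update, k):
    """Top-k sparsification followed by ternarization.

    Keeps the k largest-magnitude entries and replaces each by +mu or -mu,
    where mu is the mean magnitude of the kept entries; all other entries become 0.
    """
    k = max(1, min(k, len(update)))
    top = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)[:k]
    mu = sum(abs(update[i]) for i in top) / k
    compressed = [0.0] * len(update)
    for i in top:
        compressed[i] = mu if update[i] > 0 else -mu
    return compressed


def compress_with_residual(update, residual, k):
    """Error accumulation: add last round's remainder before compressing, keep the new remainder."""
    corrected = [u + r for u, r in zip(update, residual)]
    compressed = sparse_ternary_compress(corrected, k)
    new_residual = [c - s for c, s in zip(corrected, compressed)]
    return compressed, new_residual


update = [0.5, -0.1, 0.02, -0.9, 0.3]
compressed, residual = compress_with_residual(update, [0.0] * len(update), k=2)
print(compressed)  # two non-zero entries of equal magnitude, at indices 0 and 3
```

Because every transmitted value is one of {-mu, 0, +mu}, a client only needs to send mu once plus the positions and signs of the k non-zero entries.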

Introduction to Voice Presentation Attack Detection and Recent Advances

Title Introduction to Voice Presentation Attack Detection and Recent Advances
Authors Md Sahidullah, Hector Delgado, Massimiliano Todisco, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Kong-Aik Lee
Abstract Over the past few years, significant progress has been made in the field of presentation attack detection (PAD) for automatic speaker verification (ASV). This includes the development of new speech corpora, standard evaluation protocols and advancements in front-end feature extraction and back-end classifiers. The use of standard databases and evaluation protocols has enabled, for the first time, meaningful benchmarking of different PAD solutions. This chapter summarises the progress, with a focus on studies completed in the last three years. The article presents a summary of findings and lessons learned from two ASVspoof challenges, the first community-led benchmarking efforts. These show that ASV PAD remains an unsolved problem and that further attention is required to develop generalised PAD solutions that have the potential to detect diverse and previously unseen spoofing attacks.
Tasks Speaker Recognition
Published 2019-01-04
URL http://arxiv.org/abs/1901.01085v1
PDF http://arxiv.org/pdf/1901.01085v1.pdf
PWC https://paperswithcode.com/paper/introduction-to-voice-presentation-attack
Repo
Framework
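
Countermeasures in the ASVspoof challenges are commonly compared by their equal error rate (EER): the operating point where the rate of accepted spoofed trials equals the rate of rejected bona fide trials. The sketch below approximates the EER from detector scores with a simple threshold sweep. The scoring convention and the function name are assumptions, and official evaluation scripts may compute it differently (for example, by interpolating between thresholds).

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER by sweeping a threshold over every observed score.

    Convention assumed here: a higher score means "more likely bona fide", and a
    trial is accepted when its score is >= the threshold.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # false rejection rate: bona fide trials scored below the threshold
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # false acceptance rate: spoofed trials accepted at this threshold
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer


print(equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1]))  # 0.25
```

Here one bona fide trial (0.3) scores below one spoofed trial (0.6), so no threshold separates the classes perfectly, and the best balance misclassifies one trial in four from each side.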

Connecting Touch and Vision via Cross-Modal Prediction

Title Connecting Touch and Vision via Cross-Modal Prediction
Authors Yunzhu Li, Jun-Yan Zhu, Russ Tedrake, Antonio Torralba
Abstract Humans perceive the world using multi-modal sensory inputs such as vision, audition, and touch. In this work, we investigate the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task lies in the significant scale discrepancy between the two: while our eyes perceive an entire visual scene at once, humans can only feel a small region of an object at any given moment. To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input. To accomplish our goals, we first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. To close the scale gap, we present a new conditional adversarial model that incorporates the scale and location information of the touch. Human perceptual studies demonstrate that our model can produce realistic visual images from tactile data and vice versa. Finally, we present both qualitative and quantitative experimental results regarding different system designs, and we visualize the learned representations of our model.
Tasks
Published 2019-06-14
URL https://arxiv.org/abs/1906.06322v1
PDF https://arxiv.org/pdf/1906.06322v1.pdf
PWC https://paperswithcode.com/paper/connecting-touch-and-vision-via-cross-modal-1
Repo
Framework

Augmenting Gastrointestinal Health: A Deep Learning Approach to Human Stool Recognition and Characterization in Macroscopic Images

Title Augmenting Gastrointestinal Health: A Deep Learning Approach to Human Stool Recognition and Characterization in Macroscopic Images
Authors David Hachuel, Akshay Jha, Deborah Estrin, Alfonso Martinez, Kyle Staller, Christopher Velez
Abstract Purpose - Functional bowel diseases, including irritable bowel syndrome, chronic constipation, and chronic diarrhea, are some of the most common diseases seen in clinical practice. Many patients describe a range of triggers for altered bowel consistency and symptoms. However, characterization of the relationship between symptom triggers using bowel diaries is hampered by poor compliance and lack of objective stool consistency measurements. We sought to develop a stool detection and tracking system using computer vision and deep convolutional neural networks (CNN) that could be used by patients, providers, and researchers in the assessment of chronic gastrointestinal (GI) disease.
Tasks
Published 2019-03-25
URL http://arxiv.org/abs/1903.10578v1
PDF http://arxiv.org/pdf/1903.10578v1.pdf
PWC https://paperswithcode.com/paper/augmenting-gastrointestinal-health-a-deep
Repo
Framework

Free Component Analysis: Theory, Algorithms & Applications

Title Free Component Analysis: Theory, Algorithms & Applications
Authors Hao Wu, Raj Rao Nadakuditi
Abstract We describe a method for unmixing mixtures of freely independent random variables in a manner analogous to the independent component analysis (ICA) based method for unmixing independent random variables from their additive mixtures. Random matrices play the role of free random variables in this context, so the method we develop, which we call free component analysis (FCA), unmixes matrices from additive mixtures of matrices. We describe the theory and the various algorithms, and we compare FCA to ICA. We show that FCA performs comparably to, and often better than, ICA in every application, such as image and speech unmixing, where ICA has been known to succeed. Our computational experiments suggest that not-so-random matrices, such as images and spectrograms of waveforms, are (closer to being) freer "in the wild" than we might have theoretically expected.
Tasks
Published 2019-05-05
URL https://arxiv.org/abs/1905.01713v1
PDF https://arxiv.org/pdf/1905.01713v1.pdf
PWC https://paperswithcode.com/paper/free-component-analysis-theory-algorithms
Repo
Framework

Composing and Embedding the Words-as-Classifiers Model of Grounded Semantics

Title Composing and Embedding the Words-as-Classifiers Model of Grounded Semantics
Authors Daniele Moro, Stacy Black, Casey Kennington
Abstract The words-as-classifiers model of grounded lexical semantics learns a semantic fitness score between physical entities and the words that are used to denote those entities. In this paper, we explore how such a model can incrementally perform composition and how the model can be unified with a distributional representation. For the latter, we leverage the classifier coefficients as an embedding. For composition, we leverage the underlying mechanics of three different classifier types (i.e., logistic regression, decision trees, and multi-layer perceptrons) to arrive at several systematic approaches to composition unique to each classifier, including both denotational and connotational methods of composition. We compare these approaches to each other and to prior work in a visual reference resolution task using the refCOCO dataset. Our results demonstrate the need to expand upon existing composition strategies and bring together grounded and distributional representations.
Tasks
Published 2019-11-08
URL https://arxiv.org/abs/1911.03283v1
PDF https://arxiv.org/pdf/1911.03283v1.pdf
PWC https://paperswithcode.com/paper/composing-and-embedding-the-words-as
Repo
Framework
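
The abstract names the two ingredients (one classifier per word whose output is a fitness score, and classifier coefficients reused as an embedding) without giving formulas. The sketch below illustrates them with logistic regression, one of the three classifier types the paper studies, and uses multiplication of per-word scores as one simple composition rule; the class, functions, features and coefficients are all hypothetical and are not the paper's trained models.

```python
import math


class WordClassifier:
    """Hypothetical per-word logistic-regression classifier over visual features."""

    def __init__(self, weights, bias):
        self.weights = list(weights)
        self.bias = bias

    def fitness(self, features):
        """Semantic fitness in (0, 1) between this word and an object's feature vector."""
        z = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))

    def embedding(self):
        """Reuse the classifier coefficients (weights plus bias) as a word vector."""
        return self.weights + [self.bias]


def compose_product(classifiers, features):
    """One simple composition: multiply the per-word fitness scores of a phrase."""
    score = 1.0
    for clf in classifiers:
        score *= clf.fitness(features)
    return score


def resolve_reference(classifiers, candidates):
    """Index of the candidate object that best fits the whole referring expression."""
    scores = [compose_product(classifiers, f) for f in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Toy 2-D features: [redness, size]; coefficients are made up for illustration.
red = WordClassifier([4.0, 0.0], -2.0)
big = WordClassifier([0.0, 4.0], -2.0)
objects = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(resolve_reference([red, big], objects))  # 1: the only object that is both red and big
```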

Zero Shot Learning with the Isoperimetric Loss

Title Zero Shot Learning with the Isoperimetric Loss
Authors Shay Deutsch, Andrea Bertozzi, Stefano Soatto
Abstract We introduce the isoperimetric loss as a regularization criterion for learning the map from a visual representation to a semantic embedding, to be used to transfer knowledge to unknown classes in a zero-shot learning setting. We use a pre-trained deep neural network model as a visual representation of image data, a Word2Vec embedding of class labels, and linear maps between the visual and semantic embedding spaces. However, the spaces themselves are not linear, and we postulate the sample embedding to be populated by noisy samples near otherwise smooth manifolds. We exploit the graph structure defined by the sample points to regularize the estimates of the manifolds by inferring the graph connectivity using a generalization of the isoperimetric inequalities from Riemannian geometry to graphs. Surprisingly, this regularization alone, paired with the simplest baseline model, outperforms the state-of-the-art among fully automated methods in zero-shot learning benchmarks such as AwA and CUB. This improvement is achieved solely by learning the structure of the underlying spaces by imposing regularity.
Tasks Zero-Shot Learning
Published 2019-03-15
URL https://arxiv.org/abs/1903.06781v2
PDF https://arxiv.org/pdf/1903.06781v2.pdf
PWC https://paperswithcode.com/paper/zero-shot-learning-with-the-isoperimetric
Repo
Framework
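
The regularizer builds on isoperimetric inequalities carried over to graphs. To show what such a quantity measures, the sketch below computes the classic discrete isoperimetric (Cheeger) number of a small graph by brute force: the smallest ratio of boundary edges to set size over all vertex sets containing at most half the nodes. This is standard graph theory, not the paper's isoperimetric loss, and the function names are hypothetical.

```python
from itertools import combinations


def edge_boundary(edges, subset):
    """Number of edges with exactly one endpoint inside `subset`."""
    s = set(subset)
    return sum((u in s) != (v in s) for u, v in edges)


def isoperimetric_number(num_nodes, edges):
    """Brute-force h(G) = min over 0 < |S| <= n/2 of |boundary(S)| / |S|.

    Exponential in num_nodes, so only suitable for tiny illustrative graphs.
    """
    best = float("inf")
    for size in range(1, num_nodes // 2 + 1):
        for subset in combinations(range(num_nodes), size):
            best = min(best, edge_boundary(edges, subset) / size)
    return best


# Two triangles joined by a single "bridge" edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(isoperimetric_number(6, edges))  # 0.3333333333333333, i.e. 1/3 from cutting the bridge
```

A small value signals a vertex set that is densely connected inside but attached to the rest of the graph by a thin boundary, which is the kind of cluster structure a manifold-aware regularizer can exploit.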

A Hybrid Persian Sentiment Analysis Framework: Integrating Dependency Grammar Based Rules and Deep Neural Networks

Title A Hybrid Persian Sentiment Analysis Framework: Integrating Dependency Grammar Based Rules and Deep Neural Networks
Authors Kia Dashtipour, Mandar Gogate, Jingpeng Li, Fengling Jiang, Bin Kong, Amir Hussain
Abstract Social media hold vast, valuable and unstructured information on public opinion that can be used to improve products and services. The automatic analysis of such data, however, requires a deep understanding of natural language. Current sentiment analysis approaches are mainly based on word co-occurrence frequencies, which are inadequate in most practical cases. In this work, we propose a novel hybrid framework for concept-level sentiment analysis in the Persian language that integrates linguistic rules and deep learning to optimize polarity detection. When a pattern is triggered, the framework allows sentiments to flow from words to concepts based on symbolic dependency relations. When no pattern is triggered, the framework switches to its subsymbolic counterpart and leverages deep neural networks (DNN) to perform the classification. On benchmark Persian product and hotel review corpora, the proposed framework outperforms state-of-the-art approaches (including support vector machines and logistic regression) and DNN classifiers (long short-term memory and convolutional neural networks) by margins of 10-15% and 3-4%, respectively.
Tasks Sentiment Analysis
Published 2019-09-30
URL https://arxiv.org/abs/1909.13568v1
PDF https://arxiv.org/pdf/1909.13568v1.pdf
PWC https://paperswithcode.com/paper/a-hybrid-persian-sentiment-analysis-framework
Repo
Framework
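
The control flow described in the abstract (symbolic rules first, a neural classifier only when no pattern fires) can be sketched as follows. The toy English lexicon, the two adjacency rules and the stub classifier are hypothetical placeholders: the paper works on Persian text with dependency-based patterns and a trained DNN, none of which this sketch reproduces.

```python
from typing import Callable, Optional, Sequence

# Hypothetical toy lexicon and negators; the paper works on Persian text with
# dependency-based patterns, which this sketch does not reproduce.
LEXICON = {"good": 1, "excellent": 1, "bad": -1, "terrible": -1}
NEGATORS = {"not", "never"}

Rule = Callable[[Sequence[str]], Optional[int]]


def negation_rule(tokens: Sequence[str]) -> Optional[int]:
    """Fires when a negator directly precedes a sentiment word and flips its polarity."""
    for prev, word in zip(tokens, tokens[1:]):
        if prev in NEGATORS and word in LEXICON:
            return -LEXICON[word]
    return None  # pattern not triggered


def lexicon_rule(tokens: Sequence[str]) -> Optional[int]:
    """Fires when sentiment words are present and do not cancel out."""
    total = sum(LEXICON.get(t, 0) for t in tokens)
    if total == 0:
        return None
    return 1 if total > 0 else -1


def hybrid_polarity(tokens: Sequence[str], rules: Sequence[Rule],
                    fallback: Callable[[Sequence[str]], int]) -> tuple[int, str]:
    """Symbolic rules first; the learned classifier only when no rule fires."""
    for rule in rules:
        polarity = rule(tokens)
        if polarity is not None:
            return polarity, "rule"
    return fallback(tokens), "classifier"


def stub_classifier(tokens: Sequence[str]) -> int:
    """Stand-in for a trained DNN; always predicts positive."""
    return 1


rules = [negation_rule, lexicon_rule]
print(hybrid_polarity(["service", "was", "not", "good"], rules, stub_classifier))  # (-1, 'rule')
print(hybrid_polarity(["the", "room", "was", "quiet"], rules, stub_classifier))   # (1, 'classifier')
```

Rule order matters in this design: the more specific negation pattern is tried before the plain lexicon sum, so "not good" is not misread as positive.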

Exploration and Exploitation in Symbolic Regression using Quality-Diversity and Evolutionary Strategies Algorithms

Title Exploration and Exploitation in Symbolic Regression using Quality-Diversity and Evolutionary Strategies Algorithms
Authors J.-P. Bruneton, L. Cazenille, A. Douin, V. Reverdy
Abstract By combining Genetic Programming, MAP-Elites and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), we demonstrate very high success rates in Symbolic Regression problems. MAP-Elites is used to improve exploration while preserving diversity and avoiding premature convergence and bloat. Then, CMA-ES, a gradient-free black-box optimizer, is used to fit the free scalar constants. Although this evaluation approach does not scale computationally to high-dimensional problems, our algorithm exactly recovers most of the 31 target expressions, taken from the literature, on which we evaluate it.
Tasks
Published 2019-06-10
URL https://arxiv.org/abs/1906.03959v1
PDF https://arxiv.org/pdf/1906.03959v1.pdf
PWC https://paperswithcode.com/paper/exploration-and-exploitation-in-symbolic
Repo
Framework
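
MAP-Elites keeps an archive of the best solution found in each cell of a behaviour-descriptor grid, which is how it preserves diversity. The sketch below shows the generic loop on a hypothetical 1-D toy problem. In the paper's setting, solutions would be expression trees, fitness would be the fitting error after CMA-ES has tuned the constants, and the descriptor would be some property of the expression (the abstract does not say which; expression size is one plausible choice).

```python
import random


def map_elites(random_solution, mutate, fitness, descriptor,
               n_init=10, iterations=2000, seed=0):
    """Minimal MAP-Elites: keep the fittest solution found in each descriptor cell."""
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, solution)

    def try_insert(solution):
        cell, f = descriptor(solution), fitness(solution)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, solution)

    for _ in range(n_init):
        try_insert(random_solution(rng))
    for _ in range(iterations):
        # pick any elite as parent, uniformly over occupied cells
        _, parent = rng.choice(list(archive.values()))
        try_insert(mutate(parent, rng))
    return archive


# Toy problem (hypothetical): solutions are numbers in [0, 10], the descriptor is
# the integer bin of x, and fitness rewards closeness to 3.7.
archive = map_elites(
    random_solution=lambda rng: rng.uniform(0, 10),
    mutate=lambda x, rng: min(10.0, max(0.0, x + rng.gauss(0, 0.5))),
    fitness=lambda x: -abs(x - 3.7),
    descriptor=lambda x: min(9, int(x)),
)
print(sorted(archive))  # the occupied cells
print(archive[3])       # best (fitness, solution) in the cell containing 3.7
```

Every cell keeps its own elite even when it is far from the global optimum, so lower-fitness regions remain available as stepping stones instead of being discarded early.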

Large-scale Collaborative Filtering with Product Embeddings

Title Large-scale Collaborative Filtering with Product Embeddings
Authors Thom Lake, Sinead A. Williamson, Alexander T. Hawk, Christopher C. Johnson, Benjamin P. Wing
Abstract The application of machine learning techniques to large-scale personalized recommendation problems is a challenging task. Such systems must make sense of enormous amounts of implicit feedback in order to understand user preferences across numerous product categories. This paper presents a deep learning based solution to this problem within the collaborative filtering with implicit feedback framework. Our approach combines neural attention mechanisms, which allow for context-dependent weighting of past behavioral signals, with representation learning techniques to produce models that obtain extremely high coverage, can easily incorporate new information as it becomes available, and are computationally efficient. Offline experiments demonstrate significant performance improvements when compared to several alternative methods from the literature. Results from an online setting show that the approach compares favorably with current production techniques used to produce personalized product recommendations.
Tasks Representation Learning
Published 2019-01-11
URL http://arxiv.org/abs/1901.04321v1
PDF http://arxiv.org/pdf/1901.04321v1.pdf
PWC https://paperswithcode.com/paper/large-scale-collaborative-filtering-with
Repo
Framework
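
The abstract describes attention as a context-dependent weighting of past behavioral signals but does not give the scoring function. The sketch below assumes the simplest form: each past item embedding is scored by its dot product with a context vector, the scores are normalized with a softmax, and the weighted average becomes the user vector used to rank candidates. The embeddings, category names and function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def user_vector(history, context):
    """Attention-weighted average of the embeddings of previously interacted items.

    The attention score of each past item is its dot product with a context vector
    (an assumption for this sketch), so the same history can yield different user
    representations in different contexts.
    """
    weights = softmax([dot(item, context) for item in history])
    dim = len(history[0])
    return [sum(w * item[d] for w, item in zip(weights, history)) for d in range(dim)]


def rank_items(history, context, candidates):
    """Rank candidate items (name -> embedding) by dot product with the user vector."""
    u = user_vector(history, context)
    scores = {name: dot(u, emb) for name, emb in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy 2-D embeddings: one past purchase in each of two hypothetical categories.
history = [[1.0, 0.0], [0.0, 1.0]]
candidates = {"kitchen": [1.0, 0.0], "garden": [0.0, 1.0]}
print(rank_items(history, [5.0, 0.0], candidates))  # ['kitchen', 'garden']
print(rank_items(history, [0.0, 5.0], candidates))  # ['garden', 'kitchen']
```

The same purchase history produces opposite rankings under the two contexts, which is the point of weighting past signals by context rather than averaging them uniformly.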

Emotionless: Privacy-Preserving Speech Analysis for Voice Assistants

Title Emotionless: Privacy-Preserving Speech Analysis for Voice Assistants
Authors Ranya Aloufi, Hamed Haddadi, David Boyle
Abstract Voice-enabled interactions provide more human-like experiences in many popular IoT systems. Cloud-based speech analysis services extract useful information from voice input using speech recognition techniques. The voice signal is a rich resource that discloses several possible states of a speaker, such as emotional state, confidence and stress levels, physical condition, age, gender, and personal traits. Service providers can therefore build a very accurate profile of a user’s demographic category and personal preferences, which may compromise privacy. To address this problem, a privacy-preserving intermediate layer between users and cloud services is proposed to sanitize the voice input. It aims to maintain utility while preserving user privacy. It achieves this by collecting real-time speech data and analyzing the signal to ensure privacy protection before the data is shared with service providers. Specifically, the sensitive representations are extracted from the raw signal using transformation functions and then wrapped via voice conversion technology. An experimental evaluation based on emotion recognition shows that the proposed method reduces identification of the speaker's sensitive emotional state by ~96%.
Tasks Emotion Recognition, Speech Recognition, Voice Conversion
Published 2019-08-09
URL https://arxiv.org/abs/1908.03632v1
PDF https://arxiv.org/pdf/1908.03632v1.pdf
PWC https://paperswithcode.com/paper/emotionless-privacy-preserving-speech
Repo
Framework

LPM: Learnable Pooling Module for Efficient Full-Face Gaze Estimation

Title LPM: Learnable Pooling Module for Efficient Full-Face Gaze Estimation
Authors Reo Ogusu, Takao Yamanaka
Abstract Gaze tracking is an important technology in many domains. Techniques such as Convolutional Neural Networks (CNNs) have enabled gaze-tracking methods that rely only on commodity hardware, such as the camera on a personal computer. It has been shown that using the full-face region for gaze estimation can provide better performance than using an eye image alone. However, a problem with using the full-face image is the heavy computation due to the larger image size. This study tackles this problem by compressing the input full-face image, removing redundant information with a novel learnable pooling module. The module can be trained end-to-end by backpropagation to learn the size of the grid in the pooling filter. The learnable pooling module keeps the resolution high in valuable regions and low elsewhere. The proposed method preserved gaze estimation accuracy to a certain level when the image was reduced to a smaller size.
Tasks Gaze Estimation
Published 2019-03-13
URL http://arxiv.org/abs/1903.05761v2
PDF http://arxiv.org/pdf/1903.05761v2.pdf
PWC https://paperswithcode.com/paper/lpm-learnable-pooling-module-for-efficient
Repo
Framework
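
LPM learns the sizes of the pooling grid cells, so informative regions get small cells (high resolution) and the rest get large ones. The abstract does not say how the sizes are parameterized so that gradients can reach them. The sketch below therefore uses fixed, hand-chosen integer cell sizes only to show the effect of a non-uniform pooling grid; learning those sizes is not shown, and the function name is hypothetical.

```python
def nonuniform_avg_pool(image, row_sizes, col_sizes):
    """Average-pool a 2-D image over a grid whose cell heights and widths are given.

    Narrow cells keep more detail; wide cells summarize large areas.
    """
    if any(s <= 0 for s in (*row_sizes, *col_sizes)):
        raise ValueError("cell sizes must be positive")
    if sum(row_sizes) != len(image) or sum(col_sizes) != len(image[0]):
        raise ValueError("grid sizes must cover the image exactly")
    pooled, r0 = [], 0
    for height in row_sizes:
        row, c0 = [], 0
        for width in col_sizes:
            block = [image[r][c] for r in range(r0, r0 + height) for c in range(c0, c0 + width)]
            row.append(sum(block) / len(block))
            c0 += width
        pooled.append(row)
        r0 += height
    return pooled


image = [[4 * r + c for c in range(4)] for r in range(4)]  # values 0..15
print(nonuniform_avg_pool(image, row_sizes=[1, 3], col_sizes=[2, 2]))
# [[0.5, 2.5], [8.5, 10.5]]
```

The 4x4 input shrinks to 2x2 either way, but the top row is kept at full vertical resolution while the bottom three rows are merged, which is the kind of trade-off the learned grid is meant to make automatically.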

Photo-Geometric Autoencoding to Learn 3D Objects from Unlabelled Images

Title Photo-Geometric Autoencoding to Learn 3D Objects from Unlabelled Images
Authors Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi
Abstract We show that generative models can be used to capture visual geometry constraints statistically. We use this fact to infer the 3D shape of object categories from raw single-view images. Unlike prior work, we use no external supervision, nor do we use multiple views or videos of the objects. We achieve this by a simple reconstruction task, exploiting the symmetry of the objects’ shape and albedo. Specifically, given a single image of the object seen from an arbitrary viewpoint, our model predicts a symmetric canonical view, the corresponding 3D shape and a viewpoint transformation, and trains with the goal of reconstructing the input view, resembling an auto-encoder. Our experiments show that this method can recover the 3D shape of human faces, cat faces, and cars from single-view images, without supervision. On benchmarks, we demonstrate superior accuracy compared to other methods that use supervision at the level of 2D image correspondences.
Tasks
Published 2019-06-04
URL https://arxiv.org/abs/1906.01568v1
PDF https://arxiv.org/pdf/1906.01568v1.pdf
PWC https://paperswithcode.com/paper/photo-geometric-autoencoding-to-learn-3d
Repo
Framework

Meta-SR: A Magnification-Arbitrary Network for Super-Resolution

Title Meta-SR: A Magnification-Arbitrary Network for Super-Resolution
Authors Xuecai Hu, Haoyuan Mu, Xiangyu Zhang, Zilei Wang, Tieniu Tan, Jian Sun
Abstract Recent research on super-resolution has achieved great success due to the development of deep convolutional neural networks (DCNNs). However, super-resolution with an arbitrary scale factor has long been ignored. Most previous researchers regard super-resolution at different scale factors as independent tasks. They train a specific model for each scale factor, which is computationally inefficient, and prior work only considers super-resolution at a few integer scale factors. In this work, we propose a novel method called Meta-SR, the first to solve super-resolution with an arbitrary scale factor (including non-integer scale factors) using a single model. In our Meta-SR, the Meta-Upscale Module is proposed to replace the traditional upscale module. For an arbitrary scale factor, the Meta-Upscale Module dynamically predicts the weights of the upscale filters by taking the scale factor as input and uses these weights to generate an HR image of arbitrary size. For any low-resolution image, our Meta-SR can continuously zoom in on it with an arbitrary scale factor using only a single model. We evaluated the proposed method through extensive experiments on widely used benchmark datasets for single image super-resolution. The experimental results show the superiority of our Meta-Upscale.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-03-03
URL http://arxiv.org/abs/1903.00875v4
PDF http://arxiv.org/pdf/1903.00875v4.pdf
PWC https://paperswithcode.com/paper/meta-sr-a-magnification-arbitrary-network-for
Repo
Framework
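
The abstract says only that the Meta-Upscale Module predicts filter weights from the scale factor. Our reading of the full paper is that each HR pixel (i, j) is projected to the LR pixel (floor(i/r), floor(j/r)), and the weight-prediction network receives the offset vector (i/r - floor(i/r), j/r - floor(j/r), 1/r). The toy sketch below implements that projection with one output channel and a 1x1 "filter"; `weight_fn` is a hypothetical stand-in for the learned weight-prediction network, and the function names are not from the paper.

```python
import math


def location_projection(i, j, r):
    """Map HR pixel (i, j) at scale r to its LR source pixel and the weight-predictor input."""
    li, lj = math.floor(i / r), math.floor(j / r)
    offset = (i / r - li, j / r - lj, 1.0 / r)
    return (li, lj), offset


def meta_upscale(lr_features, r, weight_fn):
    """Toy Meta-Upscale: one output channel, 1x1 'filter' predicted per HR pixel.

    lr_features[y][x] is a feature vector; weight_fn maps the offset vector to a weight
    vector of the same length (a stand-in for the paper's weight-prediction network).
    """
    h, w = len(lr_features), len(lr_features[0])
    out_h, out_w = math.floor(h * r), math.floor(w * r)
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            (li, lj), v = location_projection(i, j, r)
            weights = weight_fn(v)
            out[i][j] = sum(wt * f for wt, f in zip(weights, lr_features[li][lj]))
    return out


lr = [[[1.0], [2.0]], [[3.0], [4.0]]]  # 2x2 image, one feature per pixel
print(location_projection(3, 4, 1.5))  # non-integer scale
print(meta_upscale(lr, 2, lambda v: [1.0]))  # constant weights reduce to nearest-neighbour upscaling
```

Because the offset vector, not a fixed kernel, drives the weights, the same predictor can in principle serve any scale r, including non-integer ones such as 1.5.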