January 30, 2020

Paper Group ANR 406

ML-LOO: Detecting Adversarial Examples with Feature Attribution

Title ML-LOO: Detecting Adversarial Examples with Feature Attribution
Authors Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, Michael I. Jordan
Abstract Deep neural networks obtain state-of-the-art performance on a series of tasks. However, they are easily fooled by adding a small adversarial perturbation to the input; on image data, the perturbation is often imperceptible to humans. We observe a significant difference between the feature attributions of adversarially crafted examples and those of original ones. Based on this observation, we introduce a new framework to detect adversarial examples by thresholding a scale estimate of the feature attribution scores. Furthermore, we extend our method to include multi-layer feature attributions in order to tackle attacks with mixed confidence levels. Through extensive experiments, our method outperforms state-of-the-art detection methods in distinguishing adversarial examples generated by popular attack methods on a variety of real datasets. In particular, our method is able to detect adversarial examples of mixed confidence levels, and transfers between different attack methods.
Tasks
Published 2019-06-08
URL https://arxiv.org/abs/1906.03499v1
PDF https://arxiv.org/pdf/1906.03499v1.pdf
PWC https://paperswithcode.com/paper/ml-loo-detecting-adversarial-examples-with
Repo
Framework
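The detection pipeline described in the abstract, leave-one-out feature attribution followed by thresholding a scale estimate of the attribution scores, can be sketched as follows. This is a minimal reading of the method, not the authors' code; the `predict_prob` interface, the zero mask value, and the interquartile range as the scale estimate are assumptions.

```python
import numpy as np

def loo_attribution(predict_prob, x, class_idx, mask_value=0.0):
    """Leave-one-out attribution: zero out each feature in turn and
    record the drop in the predicted probability of class_idx."""
    base = predict_prob(x)[class_idx]
    flat = x.ravel()
    attr = np.empty(flat.size)
    for i in range(flat.size):
        masked = flat.copy()
        masked[i] = mask_value
        attr[i] = base - predict_prob(masked.reshape(x.shape))[class_idx]
    return attr

def ml_loo_score(attr):
    """Scale estimate (here: interquartile range) of the attribution map;
    adversarial inputs tend to show more dispersed attributions."""
    q75, q25 = np.percentile(attr, [75, 25])
    return q75 - q25

def is_adversarial(attr, threshold):
    # Flag the input when dispersion exceeds a threshold calibrated
    # on clean validation data (the calibration step is not shown).
    return ml_loo_score(attr) > threshold
```

A classifier's probability function is plugged in as `predict_prob`; the threshold would be chosen to fix a false-positive rate on clean data.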

Technically correct visualization of biological microscopic experiments

Title Technically correct visualization of biological microscopic experiments
Authors Ganna Platonova, Dalibor Stys, Pavel Soucek, Petr Machacek, Vladimir Kotal, Renata Rychtarikova
Abstract The most realistic images that reflect native cellular and intracellular structure and behavior can be obtained only using brightfield microscopy. Under high-intensity pulsing LED illumination, we captured the primary 12-bit-per-channel (bpc) signal from an observed sample using a brightfield microscope equipped with a high-resolution (4872x3248) camera. In order to suppress image distortions arising from light passing through the whole microscope optical path, from camera sensor defects, and from geometrical peculiarities of sensor sensitivity, these uncompressed 12-bpc images were corrected after simultaneous calibration of all parts of the experimental arrangement. Moreover, the final corrected images (from biological experiments) show the number of photons reaching each camera pixel and can be visualized at 8-bpc intensity depth after Least Information Loss compression (Stys et al., Lect. Notes Bioinform. 9656, 2016).
Tasks Calibration
Published 2019-03-14
URL http://arxiv.org/abs/1903.06519v1
PDF http://arxiv.org/pdf/1903.06519v1.pdf
PWC https://paperswithcode.com/paper/technically-correct-visualization-of
Repo
Framework
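The calibration-based correction described above is, in spirit, a flat-field correction. A minimal sketch under that assumption (the paper's actual correction chain and photon-count conversion are more involved; the dark frame and flat field here are generic calibration inputs):

```python
import numpy as np

def flat_field_correct(raw, dark, flat):
    """Standard flat-field correction: subtract the dark frame,
    divide by the dark-subtracted flat-field response, and rescale
    to preserve the overall intensity level."""
    num = raw.astype(float) - dark
    den = np.clip(flat.astype(float) - dark, 1e-6, None)  # avoid /0
    return num / den * den.mean()
```

Applied pixelwise, this removes fixed optical-path and sensor-sensitivity variation: an image identical to the flat field comes out uniform.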

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

Title Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
Authors Xu Xiang, Shuai Wang, Houjun Huang, Yanmin Qian, Kai Yu
Abstract Recently, speaker embeddings extracted from a speaker-discriminative deep neural network (DNN) have yielded better performance than conventional methods such as i-vectors. In most cases, the DNN speaker classifier is trained using cross-entropy loss with softmax. However, this kind of loss function does not explicitly encourage inter-class separability and intra-class compactness; as a result, the embeddings are not optimal for speaker recognition tasks. In this paper, to address this issue, three different margin-based losses, which not only separate classes but also demand a fixed margin between classes, are introduced to deep speaker embedding learning. We demonstrate that the margin is the key to obtaining more discriminative speaker embeddings. Experiments are conducted on two public text-independent tasks: VoxCeleb1 and Speakers in the Wild (SITW). The proposed approach achieves state-of-the-art performance, with 25%-30% equal error rate (EER) reductions over strong baselines using cross-entropy loss with softmax, obtaining 2.238% EER on the VoxCeleb1 test set and 2.761% EER on the SITW core-core test set.
Tasks Speaker Recognition
Published 2019-06-18
URL https://arxiv.org/abs/1906.07317v1
PDF https://arxiv.org/pdf/1906.07317v1.pdf
PWC https://paperswithcode.com/paper/margin-matters-towards-more-discriminative
Repo
Framework
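One widely used margin-based loss of the kind described above is additive-margin softmax, sketched below in NumPy. The choice of this particular variant, the scale `s`, and the margin `m` are illustrative, not necessarily the paper's exact configuration:

```python
import numpy as np

def am_softmax_loss(emb, weights, labels, s=30.0, m=0.2):
    """Additive-margin softmax: logits are scaled cosine similarities,
    with a fixed margin m subtracted from each target-class logit,
    which pushes same-speaker embeddings closer together."""
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)       # unit rows
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = e @ w                                   # (batch, n_speakers)
    cos[np.arange(len(labels)), labels] -= m      # enforce the margin
    logits = s * cos
    # numerically stable cross entropy
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()
```

With m = 0 this reduces to plain softmax over cosine logits; any m > 0 makes the objective strictly harder, which is what demands a margin between classes.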

Towards Sharing Task Environments to Support Reproducible Evaluations of Interactive Recommender Systems

Title Towards Sharing Task Environments to Support Reproducible Evaluations of Interactive Recommender Systems
Authors Andrea Barraza-Urbina, Mathieu d’Aquin
Abstract Beyond sharing datasets or simulations, we believe the Recommender Systems (RS) community should share Task Environments. In this work, we propose a high-level logical architecture that helps to reason about the core components of an RS Task Environment, identify the differences between Environments, datasets, and simulations, and, most importantly, understand what needs to be shared about Environments to achieve reproducible experiments. This work is intended as initial groundwork, open to discussion and extension.
Tasks Recommendation Systems
Published 2019-09-13
URL https://arxiv.org/abs/1909.06133v2
PDF https://arxiv.org/pdf/1909.06133v2.pdf
PWC https://paperswithcode.com/paper/towards-sharing-task-environments-to-support
Repo
Framework

Deep Joint Embeddings of Context and Content for Recommendation

Title Deep Joint Embeddings of Context and Content for Recommendation
Authors Miklas S. Kristoffersen, Jacob L. Wieland, Sven E. Shepstone, Zheng-Hua Tan, Vinoba Vinayagamoorthy
Abstract This paper proposes a deep learning-based method for learning joint context-content embeddings (JCCE) with a view to context-aware recommendations, and demonstrates its application in the television domain. JCCE builds on recent progress in latent representations for recommendation and deep metric learning. The model effectively groups viewing situations and associated consumed content, based on supervision from 2.7 million viewing events. Experiments confirm the recommendation ability of JCCE, achieving improvements over state-of-the-art methods. Furthermore, the approach reveals meaningful structure in the learned representations that can be used to gain valuable insights into the underlying factors relating contextual settings and content properties.
Tasks Metric Learning
Published 2019-09-13
URL https://arxiv.org/abs/1909.06076v2
PDF https://arxiv.org/pdf/1909.06076v2.pdf
PWC https://paperswithcode.com/paper/deep-joint-embeddings-of-context-and-content
Repo
Framework
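Once contexts and content items live in a shared metric space, recommendation reduces to nearest-neighbour retrieval. A sketch of that retrieval step (the JCCE training objective itself is not reproduced; the Euclidean metric and array shapes are assumptions):

```python
import numpy as np

def recommend(context_emb, content_embs, k=3):
    """Rank candidate content items by distance to the context
    embedding in the shared metric space; return the top-k indices."""
    d = np.linalg.norm(content_embs - context_emb, axis=1)
    return np.argsort(d)[:k]   # nearest content first
```

At serving time, a viewing situation is embedded once and scored against all candidate items in a single vectorized pass.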

A Comprehensive Comparison of Machine Learning Based Methods Used in Bengali Question Classification

Title A Comprehensive Comparison of Machine Learning Based Methods Used in Bengali Question Classification
Authors Afra Anika, Md. Hasibur Rahman, Salekul Islam, Abu Shafin Mohammad Mahdee Jameel, Chowdhury Rafeed Rahman
Abstract A question answering (QA) classification system maps questions asked by humans to an appropriate answer category. A sound question classification (QC) model is a prerequisite for a sound QA system. This work demonstrates the phases of assembling a QC model and presents a comprehensive comparison, in terms of performance and computational complexity, of several machine learning based approaches to QC for the Bengali language.
Tasks Question Answering
Published 2019-11-08
URL https://arxiv.org/abs/1911.03059v2
PDF https://arxiv.org/pdf/1911.03059v2.pdf
PWC https://paperswithcode.com/paper/comparison-of-machine-learning-based-methods
Repo
Framework

Statistical Learning from Biased Training Samples

Title Statistical Learning from Biased Training Samples
Authors Pierre Laforgue, Stephan Clémençon
Abstract With the deluge of digitized information in the Big Data era, massive datasets are becoming increasingly available for learning predictive models. However, in many situations, poor control of the data-acquisition process may jeopardize the outputs of machine-learning algorithms, and selection-bias issues are now the subject of much attention in the literature. It is precisely the purpose of the present article to investigate how to extend Empirical Risk Minimization (ERM), the main paradigm of statistical learning, when the training observations are generated from biased models, i.e., from distributions different from that of the data in the test/prediction stage. Specifically, we show how to build a “nearly debiased” training statistical population from biased samples and the related biasing functions, following in the footsteps of the approach originally proposed by Vardi (1985). Furthermore, we study from a non-asymptotic perspective the performance of minimizers of an empirical version of the risk computed from the statistical population thus constructed. Remarkably, the learning rate achieved by this procedure is of the same order as that attained in the absence of any selection bias. Beyond these theoretical guarantees, illustrative experimental results supporting the relevance of the proposed algorithmic approach are also presented.
Tasks
Published 2019-06-28
URL https://arxiv.org/abs/1906.12304v2
PDF https://arxiv.org/pdf/1906.12304v2.pdf
PWC https://paperswithcode.com/paper/statistical-learning-from-biased-training
Repo
Framework
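In its simplest form, the Vardi-style debiasing described above amounts to importance-weighting each biased observation by the inverse of its biasing-function value. A minimal sketch, assuming the biasing-function values are known and positive (the paper's construction is more general):

```python
import numpy as np

def debiased_risk(losses, bias_values):
    """Importance-weighted empirical risk: each training observation
    is reweighted by the inverse of its biasing-function value, so
    over-represented samples count less."""
    w = 1.0 / np.asarray(bias_values, dtype=float)
    w /= w.sum()                  # normalize to a probability vector
    return float(np.dot(w, losses))
```

An ERM procedure would then minimize this reweighted risk instead of the plain sample mean of the losses.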

ATZSL: Defensive Zero-Shot Recognition in the Presence of Adversaries

Title ATZSL: Defensive Zero-Shot Recognition in the Presence of Adversaries
Authors Xingxing Zhang, Shupeng Gui, Zhenfeng Zhu, Yao Zhao, Ji Liu
Abstract Zero-shot learning (ZSL) has received extensive attention recently, especially in areas of fine-grained object recognition, retrieval, and image captioning. Due to the complete lack of training samples and the high requirement of defense transferability, a learned ZSL model is particularly vulnerable to adversarial attacks. Recent work has also shown that adversarially robust generalization requires more data, which may significantly affect the robustness of ZSL. However, very few efforts have been devoted to this direction. In this paper, we take an initial step and propose a generic formulation that provides a systematic solution (named ATZSL) for learning a robust ZSL model. By casting ZSL as a min-max optimization problem, ATZSL achieves better generalization on various adversarial object recognition tasks while losing only negligible performance on clean images of unseen classes. To address the optimization, we design a defensive relation prediction network, which bridges the seen and unseen class domains via attributes to generalize both the prediction and defense strategies. Additionally, our framework can be extended to handle the poisoned scenario of unseen-class attributes. An extensive group of experiments is then presented, demonstrating that ATZSL obtains a remarkably more favorable trade-off between model transferability and robustness than currently available alternatives under various settings.
Tasks Image Captioning, Object Recognition, Zero-Shot Learning
Published 2019-10-24
URL https://arxiv.org/abs/1910.10994v2
PDF https://arxiv.org/pdf/1910.10994v2.pdf
PWC https://paperswithcode.com/paper/atzsl-defensive-zero-shot-recognition-in-the
Repo
Framework
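The min-max formulation implies an inner maximization over adversarial perturbations. A generic sketch of that inner step via projected gradient ascent; the gradient oracle `grad_fn`, step sizes, and the L-infinity ball are assumptions, not the paper's exact attack model:

```python
import numpy as np

def pgd_perturb(x, grad_fn, eps=0.03, step=0.01, n_iter=10):
    """Inner maximization of a min-max robust objective: ascend the
    loss by signed gradient steps, projecting the perturbation back
    into the L-infinity ball of radius eps after each step."""
    delta = np.zeros_like(x, dtype=float)
    for _ in range(n_iter):
        delta += step * np.sign(grad_fn(x + delta))
        delta = np.clip(delta, -eps, eps)   # projection
    return x + delta
```

The outer minimization then updates the model on these worst-case inputs, which is what trades a little clean accuracy for robustness.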

Blockwise Self-Attention for Long Document Understanding

Title Blockwise Self-Attention for Long Document Understanding
Authors Jiezhong Qiu, Hao Ma, Omer Levy, Scott Wen-tau Yih, Sinong Wang, Jie Tang
Abstract We present BlockBERT, a lightweight and efficient BERT model designed to better model long-distance dependencies. Our model extends BERT by introducing sparse block structures into the attention matrix to reduce both memory consumption and training time, which also enables attention heads to capture either short- or long-range contextual information. We conduct experiments on several benchmark question answering datasets with various paragraph lengths. Results show that BlockBERT uses 18.7-36.1% less memory and reduces training time by 12.0-25.1%, while achieving comparable and sometimes better prediction accuracy than an advanced BERT-based model, RoBERTa.
Tasks Question Answering
Published 2019-11-07
URL https://arxiv.org/abs/1911.02972v1
PDF https://arxiv.org/pdf/1911.02972v1.pdf
PWC https://paperswithcode.com/paper/blockwise-self-attention-for-long-document
Repo
Framework
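The sparse block structure in the attention matrix can be pictured as a blockwise mask: queries in block i attend only to keys in one designated block. A sketch, where the block-to-block permutation is an assumption about how individual heads are configured:

```python
import numpy as np

def blockwise_mask(seq_len, n_blocks, perm):
    """Sparse block attention mask: queries in block i may attend
    only to keys in block perm[i]. The identity permutation gives
    purely local attention; other permutations give heads that
    reach across the sequence."""
    assert seq_len % n_blocks == 0
    b = seq_len // n_blocks
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i, j in enumerate(perm):
        mask[i * b:(i + 1) * b, j * b:(j + 1) * b] = True
    return mask
```

With n_blocks blocks, only a 1/n_blocks fraction of the attention matrix is materialized per head, which is where the memory savings come from.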

A Spoken Dialogue System for Spatial Question Answering in a Physical Blocks World

Title A Spoken Dialogue System for Spatial Question Answering in a Physical Blocks World
Authors Georgiy Platonov, Benjamin Kane, Aaron Gindi, Lenhart K. Schubert
Abstract The blocks world is a classic toy domain that has long been used to build and test spatial reasoning systems. Despite its relative simplicity, tackling this domain in its full complexity requires the agent to exhibit a rich set of functional capabilities, ranging from vision to natural language understanding. There is currently a resurgence of interest in solving problems in such limited domains using modern techniques. In this work we tackle spatial question answering in a holistic way, using a vision system, speech input and output mediated by an animated avatar, a dialogue system that robustly interprets spatial queries, and a constraint solver that derives answers based on 3-D spatial modeling. The contributions of this work include a semantic parser that maps spatial questions into logical forms consistent with a general approach to meaning representation, a dialog manager based on a schema representation, and a constraint solver for spatial questions that provides answers in agreement with human perception. These and other components are integrated into a multi-modal human-computer interaction pipeline.
Tasks Question Answering
Published 2019-11-06
URL https://arxiv.org/abs/1911.02524v1
PDF https://arxiv.org/pdf/1911.02524v1.pdf
PWC https://paperswithcode.com/paper/a-spoken-dialogue-system-for-spatial-question
Repo
Framework
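The constraint solver answers spatial questions by evaluating predicates over 3-D block coordinates. Two toy predicates in that spirit; the coordinate convention (x rightward, z up), block size, and tolerances are illustrative assumptions, not the paper's solver:

```python
def left_of(a, b, margin=0.05):
    """Toy spatial predicate over 3-D block centroids: a is left of b
    when its x-coordinate is smaller by at least a margin (metres)."""
    return a[0] + margin < b[0]

def on_top_of(a, b, size=0.1, tol=0.02):
    """True when block a rests on block b: horizontally aligned and
    one block-height above (edge length `size` is illustrative)."""
    aligned = abs(a[0] - b[0]) < tol and abs(a[1] - b[1]) < tol
    return aligned and abs((a[2] - b[2]) - size) < tol
```

A question like "is the red block on the green block?" then reduces to evaluating such predicates against the tracked block positions, with graded tolerances standing in for agreement with human perception.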

Hierarchical Prototype Learning for Zero-Shot Recognition

Title Hierarchical Prototype Learning for Zero-Shot Recognition
Authors Xingxing Zhang, Shupeng Gui, Zhenfeng Zhu, Yao Zhao, Ji Liu
Abstract Zero-Shot Learning (ZSL) has received extensive attention and seen successes in recent years, especially in areas of fine-grained object recognition, retrieval, and image captioning. Key to ZSL is transferring knowledge from the seen to the unseen classes via auxiliary semantic prototypes (e.g., word or attribute vectors). However, the projection functions commonly learned in previous works do not generalize well, owing to non-visual components included in the semantic prototypes. Moreover, the incompleteness of the provided prototypes and captured images has received little consideration from state-of-the-art approaches to ZSL. In this paper, we propose a hierarchical prototype learning formulation that provides a systematic solution (named HPL) for zero-shot recognition. Specifically, HPL achieves discriminability on both the seen and unseen class domains by learning visual prototypes for each under the transductive setting. To narrow the gap between the two domains, we further learn interpretable super-prototypes in both the visual and semantic spaces, and bridge the two spaces by maximizing their structural consistency. This not only improves the representativeness of the visual prototypes but also alleviates the loss of information in the semantic prototypes. An extensive group of experiments is then carefully designed and presented, demonstrating that HPL obtains remarkably better efficiency and effectiveness than currently available alternatives under various settings.
Tasks Image Captioning, Object Recognition, Zero-Shot Learning
Published 2019-10-24
URL https://arxiv.org/abs/1910.11671v2
PDF https://arxiv.org/pdf/1910.11671v2.pdf
PWC https://paperswithcode.com/paper/hierarchical-prototype-learning-for-zero-shot
Repo
Framework

An Optimal Transport Framework for Zero-Shot Learning

Title An Optimal Transport Framework for Zero-Shot Learning
Authors Wenlin Wang, Hongteng Xu, Guoyin Wang, Wenqi Wang, Lawrence Carin
Abstract We present an optimal transport (OT) framework for generalized zero-shot learning (GZSL) of imaging data, seeking to distinguish samples of both seen and unseen classes with the help of auxiliary attributes. The discrepancy between features and attributes is minimized by solving an optimal transport problem. Specifically, we build a conditional generative model to generate features from seen-class attributes, and establish an optimal transport between the distribution of the generated features and that of the real features. The generative model and the optimal transport are optimized iteratively with an attribute-based regularizer that further enhances the discriminative power of the generated features. A classifier is learned based on the features generated for both the seen and unseen classes. In addition to generalized zero-shot learning, our framework is also applicable to standard and transductive ZSL problems. Experiments show that our optimal transport-based method outperforms state-of-the-art methods on several benchmark datasets.
Tasks Zero-Shot Learning
Published 2019-10-20
URL https://arxiv.org/abs/1910.09057v1
PDF https://arxiv.org/pdf/1910.09057v1.pdf
PWC https://paperswithcode.com/paper/an-optimal-transport-framework-for-zero-shot
Repo
Framework
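The transport step between the generated-feature and real-feature distributions can be approximated with entropic-regularized OT via Sinkhorn iterations over two empirical samples. This is a generic stand-in, not necessarily the paper's solver; uniform marginals and the regularization strength are assumptions:

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.5, n_iter=500):
    """Entropic-regularized optimal transport between two uniform
    empirical distributions: alternate scaling of the Gibbs kernel
    until the plan's marginals match the target weights."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)          # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan
```

The cost matrix would hold pairwise distances between generated and real feature vectors; the total transport cost `(plan * cost).sum()` is the quantity the generative model is trained to reduce.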

Evaluating Disentangled Representations

Title Evaluating Disentangled Representations
Authors Anna Sepliarskaia, Julia Kiseleva, Maarten de Rijke
Abstract There is no generally agreed-upon definition of disentangled representation. Intuitively, the data is generated by a few factors of variation, which are captured and separated in a disentangled representation. Disentangled representations are useful for many tasks such as reinforcement learning, transfer learning, and zero-shot learning. However, the absence of a formally accepted definition makes it difficult to evaluate algorithms for learning disentangled representations. Recently, important steps have been taken towards evaluating disentangled representations: the existing metrics of disentanglement were compared through an experimental study and a framework for the quantitative evaluation of disentangled representations was proposed. However, theoretical guarantees for existing metrics of disentanglement are still missing. In this paper, we analyze metrics of disentanglement and their properties. Specifically, we analyze if the metrics satisfy two desirable properties: (1) give a high score to representations that are disentangled according to the definition; and (2) give a low score to representations that are entangled according to the definition. We show that most of the current metrics do not satisfy at least one of these properties. Consequently, we propose a new definition for a metric of disentanglement that satisfies both of the properties.
Tasks Transfer Learning, Zero-Shot Learning
Published 2019-10-12
URL https://arxiv.org/abs/1910.05587v1
PDF https://arxiv.org/pdf/1910.05587v1.pdf
PWC https://paperswithcode.com/paper/evaluating-disentangled-representations
Repo
Framework

A Framework for Multi-f0 Modeling in SATB Choir Recordings

Title A Framework for Multi-f0 Modeling in SATB Choir Recordings
Authors Helena Cuesta, Emilia Gómez, Pritish Chandna
Abstract Fundamental frequency (f0) modeling is an important but relatively unexplored aspect of choir singing. Performance evaluation, as well as auditory analysis of singing, whether individually or in a choir, often depends on extracting f0 contours for the singing voice. However, due to the large number of singers singing in a similar frequency range, extracting the exact individual pitch contours from choir recordings is a challenging task. In this paper, we address this task and develop a methodology for modeling pitch contours of SATB choir recordings. A typical SATB choir consists of four parts, each covering a distinct range of pitches and often with multiple singers each. We first evaluate some state-of-the-art multi-f0 estimation systems for the particular case of choirs with a single singer per part, and observe that the pitch of individual singers can be estimated to a relatively high degree of accuracy. We observe, however, that the scenario of multiple singers for each choir part (i.e., unison singing) is far more challenging. We therefore propose a methodology that combines deep-learning-based multi-f0 estimation with a set of traditional DSP techniques to model f0 and its dispersion, instead of a single f0 trajectory, for each choir part. We present and discuss our observations and test our framework with different singer configurations.
Tasks
Published 2019-04-10
URL http://arxiv.org/abs/1904.05086v1
PDF http://arxiv.org/pdf/1904.05086v1.pdf
PWC https://paperswithcode.com/paper/a-framework-for-multi-f0-modeling-in-satb
Repo
Framework
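Modeling f0 and its dispersion instead of a single trajectory per part can be sketched as follows: given per-singer f0 tracks for one part, take a central trajectory and measure the per-frame spread in cents. The geometric-mean center and cents-based spread are assumptions about the representation, not the paper's exact model:

```python
import numpy as np

def part_f0_model(f0_tracks):
    """Model a choir part by a central f0 trajectory plus a per-frame
    dispersion value. f0_tracks has shape (n_singers, n_frames), in Hz;
    the center is the per-frame geometric mean, and the dispersion is
    the standard deviation of per-singer deviations in cents."""
    f0 = np.asarray(f0_tracks, dtype=float)
    center = np.exp(np.log(f0).mean(axis=0))   # geometric mean per frame
    cents = 1200.0 * np.log2(f0 / center)      # deviations in cents
    return center, cents.std(axis=0)
```

Working in cents makes the dispersion pitch-independent, so a 10-cent spread means the same thing for basses and sopranos.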

An LGMD Based Competitive Collision Avoidance Strategy for UAV

Title An LGMD Based Competitive Collision Avoidance Strategy for UAV
Authors Jiannan Zhao, Xingzao Ma, Qinbing Fu, Cheng Hu, Shigang Yue
Abstract Building a reliable and efficient collision avoidance system for unmanned aerial vehicles (UAVs) is still a challenging problem. This research takes inspiration from locusts, which can fly in dense swarms for hundreds of miles without collision. In the locust’s brain, the LGMD-DCMD visual pathway (lobula giant movement detector and descending contra-lateral motion detector) has been identified as a collision-perception system that guides fast collision avoidance, making it ideal for designing artificial vision systems. However, there are very few works investigating its potential in real-world UAV applications. In this paper, we present an LGMD-based competitive collision avoidance method for UAV indoor navigation. In contrast to previous works, we divide the UAV’s field of view into four subfields, each handled by an LGMD neuron, so that four individual competitive LGMDs (C-LGMD) compete to guide the directional collision avoidance of the UAV. With more degrees of freedom than ground robots and vehicles, the UAV can escape from collision along four cardinal directions (e.g., an object approaching from the left side triggers a rightward shift of the UAV). Our proposed method has been validated by both simulations and real-time quadcopter arena experiments.
Tasks
Published 2019-04-15
URL http://arxiv.org/abs/1904.07206v1
PDF http://arxiv.org/pdf/1904.07206v1.pdf
PWC https://paperswithcode.com/paper/an-lgmd-based-competitive-collision-avoidance
Repo
Framework
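The competitive four-subfield scheme can be sketched as a simple arbitration rule: the subfield whose LGMD responds most strongly wins, and the UAV escapes in the opposite direction. The dictionary keys and the threshold below are illustrative, not the paper's neuron model:

```python
def escape_direction(lgmd_outputs, threshold=0.5):
    """Competitive C-LGMD decision: one LGMD response per subfield of
    view ('left', 'right', 'up', 'down'); the UAV shifts opposite to
    the subfield with the strongest collision response, or holds
    course when no response exceeds the threshold."""
    opposite = {"left": "right", "right": "left",
                "up": "down", "down": "up"}
    danger = max(lgmd_outputs, key=lgmd_outputs.get)
    if lgmd_outputs[danger] <= threshold:
        return None              # no imminent collision detected
    return opposite[danger]
```

In the full system each LGMD response would itself be computed from excitation and lateral inhibition over its image subfield; only the winner-take-all arbitration is sketched here.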