January 30, 2020

Paper Group ANR 406

ML-LOO: Detecting Adversarial Examples with Feature Attribution

Title ML-LOO: Detecting Adversarial Examples with Feature Attribution
Authors Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, Michael I. Jordan
Abstract Deep neural networks obtain state-of-the-art performance on a series of tasks. However, they are easily fooled by adding a small adversarial perturbation to the input; on image data, the perturbation is often imperceptible to humans. We observe a significant difference between the feature attributions of adversarially crafted examples and those of original ones. Based on this observation, we introduce a new framework to detect adversarial examples by thresholding a scale estimate of the feature attribution scores. Furthermore, we extend our method to include multi-layer feature attributions in order to tackle attacks with mixed confidence levels. Through extensive experiments, our method outperforms state-of-the-art detection methods in distinguishing adversarial examples generated by popular attack methods on a variety of real datasets. In particular, our method is able to detect adversarial examples of mixed confidence levels, and transfers between different attack methods.
Tasks
Published 2019-06-08
URL https://arxiv.org/abs/1906.03499v1
PDF https://arxiv.org/pdf/1906.03499v1.pdf
PWC https://paperswithcode.com/paper/ml-loo-detecting-adversarial-examples-with
Repo
Framework
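The detection pipeline described in the abstract, leave-one-out feature attribution followed by thresholding a scale estimate of the attribution scores, can be sketched as follows. This is a minimal reading of the method, not the authors' code; the `predict_prob` interface, the zero mask value, and the interquartile range as the scale estimate are assumptions.

```python
import numpy as np

def loo_attribution(predict_prob, x, class_idx, mask_value=0.0):
    """Leave-one-out attribution: zero out each feature in turn and
    record the drop in the predicted probability of class_idx."""
    base = predict_prob(x)[class_idx]
    flat = x.ravel()
    attr = np.empty(flat.size)
    for i in range(flat.size):
        masked = flat.copy()
        masked[i] = mask_value
        attr[i] = base - predict_prob(masked.reshape(x.shape))[class_idx]
    return attr

def ml_loo_score(attr):
    """Scale estimate (here: interquartile range) of the attribution map;
    adversarial inputs tend to show more dispersed attributions."""
    q75, q25 = np.percentile(attr, [75, 25])
    return q75 - q25

def is_adversarial(attr, threshold):
    # Flag the input when dispersion exceeds a threshold calibrated
    # on clean validation data (the calibration step is not shown).
    return ml_loo_score(attr) > threshold
```

A classifier's probability function is plugged in as `predict_prob`; the threshold would be chosen to fix a false-positive rate on clean data.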

Technically correct visualization of biological microscopic experiments

Title Technically correct visualization of biological microscopic experiments
Authors Ganna Platonova, Dalibor Stys, Pavel Soucek, Petr Machacek, Vladimir Kotal, Renata Rychtarikova
Abstract The most realistic images that reflect native cellular and intracellular structure and behavior can be obtained only using brightfield microscopy. Under high-intensity pulsing LED illumination, we captured the primary 12-bit-per-channel (bpc) signal from an observed sample using a brightfield microscope equipped with a high-resolution (4872x3248) camera. In order to suppress image distortions arising from light passing through the whole microscope optical path, from camera sensor defects, and from geometrical peculiarities of sensor sensitivity, these uncompressed 12-bpc images were corrected after simultaneous calibration of all parts of the experimental arrangement. Moreover, the final corrected images (from biological experiments) show the number of photons reaching each camera pixel and can be visualized at 8-bpc intensity depth after Least Information Loss compression (Stys et al., Lect. Notes Bioinform. 9656, 2016).
Tasks Calibration
Published 2019-03-14
URL http://arxiv.org/abs/1903.06519v1
PDF http://arxiv.org/pdf/1903.06519v1.pdf
PWC https://paperswithcode.com/paper/technically-correct-visualization-of
Repo
Framework
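The calibration-based correction described above is, in spirit, a flat-field correction. A minimal sketch under that assumption (the paper's actual correction chain and photon-count conversion are more involved; the dark frame and flat field here are generic calibration inputs):

```python
import numpy as np

def flat_field_correct(raw, dark, flat):
    """Standard flat-field correction: subtract the dark frame,
    divide by the dark-subtracted flat-field response, and rescale
    to preserve the overall intensity level."""
    num = raw.astype(float) - dark
    den = np.clip(flat.astype(float) - dark, 1e-6, None)  # avoid /0
    return num / den * den.mean()
```

Applied pixelwise, this removes fixed optical-path and sensor-sensitivity variation: an image identical to the flat field comes out uniform.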

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

Title Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
Authors Xu Xiang, Shuai Wang, Houjun Huang, Yanmin Qian, Kai Yu
Abstract Recently, speaker embeddings extracted from a speaker-discriminative deep neural network (DNN) have yielded better performance than conventional methods such as i-vectors. In most cases, the DNN speaker classifier is trained using cross-entropy loss with softmax. However, this kind of loss function does not explicitly encourage inter-class separability and intra-class compactness; as a result, the embeddings are not optimal for speaker recognition tasks. In this paper, to address this issue, three different margin-based losses, which not only separate classes but also demand a fixed margin between classes, are introduced to deep speaker embedding learning. We demonstrate that the margin is the key to obtaining more discriminative speaker embeddings. Experiments are conducted on two public text-independent tasks: VoxCeleb1 and Speakers in the Wild (SITW). The proposed approach achieves state-of-the-art performance, with 25%-30% equal error rate (EER) reductions over strong baselines using cross-entropy loss with softmax, obtaining 2.238% EER on the VoxCeleb1 test set and 2.761% EER on the SITW core-core test set.
Tasks Speaker Recognition
Published 2019-06-18
URL https://arxiv.org/abs/1906.07317v1
PDF https://arxiv.org/pdf/1906.07317v1.pdf
PWC https://paperswithcode.com/paper/margin-matters-towards-more-discriminative
Repo
Framework
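One widely used margin-based loss of the kind described above is additive-margin softmax, sketched below in NumPy. The choice of this particular variant, the scale `s`, and the margin `m` are illustrative, not necessarily the paper's exact configuration:

```python
import numpy as np

def am_softmax_loss(emb, weights, labels, s=30.0, m=0.2):
    """Additive-margin softmax: logits are scaled cosine similarities,
    with a fixed margin m subtracted from each target-class logit,
    which pushes same-speaker embeddings closer together."""
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)       # unit rows
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = e @ w                                   # (batch, n_speakers)
    cos[np.arange(len(labels)), labels] -= m      # enforce the margin
    logits = s * cos
    # numerically stable cross entropy
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()
```

With m = 0 this reduces to plain softmax over cosine logits; any m > 0 makes the objective strictly harder, which is what demands a margin between classes.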

Towards Sharing Task Environments to Support Reproducible Evaluations of Interactive Recommender Systems

Title Towards Sharing Task Environments to Support Reproducible Evaluations of Interactive Recommender Systems
Authors Andrea Barraza-Urbina, Mathieu d’Aquin
Abstract Beyond sharing datasets or simulations, we believe the Recommender Systems (RS) community should share Task Environments. In this work, we propose a high-level logical architecture that helps to reason about the core components of an RS Task Environment, identify the differences between Environments, datasets, and simulations, and, most importantly, understand what needs to be shared about Environments to achieve reproducible experiments. This work is intended as initial groundwork, open to discussion and extension.
Tasks Recommendation Systems
Published 2019-09-13
URL https://arxiv.org/abs/1909.06133v2
PDF https://arxiv.org/pdf/1909.06133v2.pdf
PWC https://paperswithcode.com/paper/towards-sharing-task-environments-to-support
Repo
Framework

Deep Joint Embeddings of Context and Content for Recommendation

Title Deep Joint Embeddings of Context and Content for Recommendation
Authors Miklas S. Kristoffersen, Jacob L. Wieland, Sven E. Shepstone, Zheng-Hua Tan, Vinoba Vinayagamoorthy
Abstract This paper proposes a deep learning-based method for learning joint context-content embeddings (JCCE) with a view to context-aware recommendations, and demonstrates its application in the television domain. JCCE builds on recent progress in latent representations for recommendation and deep metric learning. The model effectively groups viewing situations and associated consumed content, based on supervision from 2.7 million viewing events. Experiments confirm the recommendation ability of JCCE, achieving improvements over state-of-the-art methods. Furthermore, the approach reveals meaningful structure in the learned representations that can be used to gain valuable insights into the underlying factors relating contextual settings and content properties.
Tasks Metric Learning
Published 2019-09-13
URL https://arxiv.org/abs/1909.06076v2
PDF https://arxiv.org/pdf/1909.06076v2.pdf
PWC https://paperswithcode.com/paper/deep-joint-embeddings-of-context-and-content
Repo
Framework
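Once contexts and content items live in a shared metric space, recommendation reduces to nearest-neighbour retrieval. A sketch of that retrieval step (the JCCE training objective itself is not reproduced; the Euclidean metric and array shapes are assumptions):

```python
import numpy as np

def recommend(context_emb, content_embs, k=3):
    """Rank candidate content items by distance to the context
    embedding in the shared metric space; return the top-k indices."""
    d = np.linalg.norm(content_embs - context_emb, axis=1)
    return np.argsort(d)[:k]   # nearest content first
```

At serving time, a viewing situation is embedded once and scored against all candidate items in a single vectorized pass.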

A Comprehensive Comparison of Machine Learning Based Methods Used in Bengali Question Classification

Title A Comprehensive Comparison of Machine Learning Based Methods Used in Bengali Question Classification
Authors Afra Anika, Md. Hasibur Rahman, Salekul Islam, Abu Shafin Mohammad Mahdee Jameel, Chowdhury Rafeed Rahman
Abstract A question answering (QA) classification system maps questions asked by humans to an appropriate answer category. A sound question classification (QC) model is a prerequisite for a sound QA system. This work demonstrates the phases of assembling a QC model and presents a comprehensive comparison, in terms of performance and computational complexity, of several machine learning based approaches to QC for the Bengali language.
Tasks Question Answering
Published 2019-11-08
URL https://arxiv.org/abs/1911.03059v2
PDF https://arxiv.org/pdf/1911.03059v2.pdf
PWC https://paperswithcode.com/paper/comparison-of-machine-learning-based-methods
Repo
Framework

Statistical Learning from Biased Training Samples

Title Statistical Learning from Biased Training Samples
Authors Pierre Laforgue, Stephan Clémençon
Abstract With the deluge of digitized information in the Big Data era, massive datasets are becoming increasingly available for learning predictive models. However, in many situations, poor control of the data-acquisition process may jeopardize the outputs of machine-learning algorithms, and selection-bias issues are now the subject of much attention in the literature. It is precisely the purpose of the present article to investigate how to extend Empirical Risk Minimization (ERM), the main paradigm of statistical learning, when the training observations are generated from biased models, i.e., from distributions different from that of the data in the test/prediction stage. Specifically, we show how to build a “nearly debiased” training statistical population from biased samples and the related biasing functions, following in the footsteps of the approach originally proposed by Vardi (1985). Furthermore, we study from a non-asymptotic perspective the performance of minimizers of an empirical version of the risk computed from the statistical population thus constructed. Remarkably, the learning rate achieved by this procedure is of the same order as that attained in the absence of any selection bias. Beyond these theoretical guarantees, illustrative experimental results supporting the relevance of the proposed algorithmic approach are also presented.
Tasks
Published 2019-06-28
URL https://arxiv.org/abs/1906.12304v2
PDF https://arxiv.org/pdf/1906.12304v2.pdf
PWC https://paperswithcode.com/paper/statistical-learning-from-biased-training
Repo
Framework
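In its simplest form, the Vardi-style debiasing described above amounts to importance-weighting each biased observation by the inverse of its biasing-function value. A minimal sketch, assuming the biasing-function values are known and positive (the paper's construction is more general):

```python
import numpy as np

def debiased_risk(losses, bias_values):
    """Importance-weighted empirical risk: each training observation
    is reweighted by the inverse of its biasing-function value, so
    over-represented samples count less."""
    w = 1.0 / np.asarray(bias_values, dtype=float)
    w /= w.sum()                  # normalize to a probability vector
    return float(np.dot(w, losses))
```

An ERM procedure would then minimize this reweighted risk instead of the plain sample mean of the losses.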

ATZSL: Defensive Zero-Shot Recognition in the Presence of Adversaries

Title ATZSL: Defensive Zero-Shot Recognition in the Presence of Adversaries
Authors Xingxing Zhang, Shupeng Gui, Zhenfeng Zhu, Yao Zhao, Ji Liu
Abstract Zero-shot learning (ZSL) has received extensive attention recently, especially in areas of fine-grained object recognition, retrieval, and image captioning. Due to the complete lack of training samples and the high requirement of defense transferability, a learned ZSL model is particularly vulnerable to adversarial attacks. Recent work has also shown that adversarially robust generalization requires more data, which may significantly affect the robustness of ZSL. However, very few efforts have been devoted to this direction. In this paper, we take an initial step and propose a generic formulation that provides a systematic solution (named ATZSL) for learning a robust ZSL model. By casting ZSL as a min-max optimization problem, ATZSL achieves better generalization on various adversarial object recognition tasks while losing only negligible performance on clean images of unseen classes. To address the optimization, we design a defensive relation prediction network, which bridges the seen and unseen class domains via attributes to generalize both the prediction and defense strategies. Additionally, our framework can be extended to handle the poisoned scenario of unseen-class attributes. An extensive group of experiments is then presented, demonstrating that ATZSL obtains a remarkably more favorable trade-off between model transferability and robustness than currently available alternatives under various settings.
Tasks Image Captioning, Object Recognition, Zero-Shot Learning
Published 2019-10-24
URL https://arxiv.org/abs/1910.10994v2
PDF https://arxiv.org/pdf/1910.10994v2.pdf
PWC https://paperswithcode.com/paper/atzsl-defensive-zero-shot-recognition-in-the
Repo
Framework
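The min-max formulation implies an inner maximization over adversarial perturbations. A generic sketch of that inner step via projected gradient ascent; the gradient oracle `grad_fn`, step sizes, and the L-infinity ball are assumptions, not the paper's exact attack model:

```python
import numpy as np

def pgd_perturb(x, grad_fn, eps=0.03, step=0.01, n_iter=10):
    """Inner maximization of a min-max robust objective: ascend the
    loss by signed gradient steps, projecting the perturbation back
    into the L-infinity ball of radius eps after each step."""
    delta = np.zeros_like(x, dtype=float)
    for _ in range(n_iter):
        delta += step * np.sign(grad_fn(x + delta))
        delta = np.clip(delta, -eps, eps)   # projection
    return x + delta
```

The outer minimization then updates the model on these worst-case inputs, which is what trades a little clean accuracy for robustness.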

Blockwise Self-Attention for Long Document Understanding

Title Blockwise Self-Attention for Long Document Understanding
Authors Jiezhong Qiu, Hao Ma, Omer Levy, Scott Wen-tau Yih, Sinong Wang, Jie Tang
Abstract We present BlockBERT, a lightweight and efficient BERT model designed to better model long-distance dependencies. Our model extends BERT by introducing sparse block structures into the attention matrix to reduce both memory consumption and training time, which also enables attention heads to capture either short- or long-range contextual information. We conduct experiments on several benchmark question answering datasets with various paragraph lengths. Results show that BlockBERT uses 18.7-36.1% less memory and reduces training time by 12.0-25.1%, while achieving comparable and sometimes better prediction accuracy than an advanced BERT-based model, RoBERTa.
Tasks Question Answering
Published 2019-11-07
URL https://arxiv.org/abs/1911.02972v1
PDF https://arxiv.org/pdf/1911.02972v1.pdf
PWC https://paperswithcode.com/paper/blockwise-self-attention-for-long-document
Repo
Framework
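The sparse block structure in the attention matrix can be pictured as a blockwise mask: queries in block i attend only to keys in one designated block. A sketch, where the block-to-block permutation is an assumption about how individual heads are configured:

```python
import numpy as np

def blockwise_mask(seq_len, n_blocks, perm):
    """Sparse block attention mask: queries in block i may attend
    only to keys in block perm[i]. The identity permutation gives
    purely local attention; other permutations give heads that
    reach across the sequence."""
    assert seq_len % n_blocks == 0
    b = seq_len // n_blocks
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i, j in enumerate(perm):
        mask[i * b:(i + 1) * b, j * b:(j + 1) * b] = True
    return mask
```

With n_blocks blocks, only a 1/n_blocks fraction of the attention matrix is materialized per head, which is where the memory savings come from.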

A Spoken Dialogue System for Spatial Question Answering in a Physical Blocks World

Title A Spoken Dialogue System for Spatial Question Answering in a Physical Blocks World
Authors Georgiy Platonov, Benjamin Kane, Aaron Gindi, Lenhart K. Schubert
Abstract The blocks world is a classic toy domain that has long been used to build and test spatial reasoning systems. Despite its relative simplicity, tackling this domain in its full complexity requires the agent to exhibit a rich set of functional capabilities, ranging from vision to natural language understanding. There is currently a resurgence of interest in solving problems in such limited domains using modern techniques. In this work we tackle spatial question answering in a holistic way, using a vision system, speech input and output mediated by an animated avatar, a dialogue system that robustly interprets spatial queries, and a constraint solver that derives answers based on 3-D spatial modeling. The contributions of this work include a semantic parser that maps spatial questions into logical forms consistent with a general approach to meaning representation, a dialog manager based on a schema representation, and a constraint solver for spatial questions that provides answers in agreement with human perception. These and other components are integrated into a multi-modal human-computer interaction pipeline.
Tasks Question Answering
Published 2019-11-06
URL https://arxiv.org/abs/1911.02524v1
PDF https://arxiv.org/pdf/1911.02524v1.pdf
PWC https://paperswithcode.com/paper/a-spoken-dialogue-system-for-spatial-question
Repo
Framework
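The constraint solver answers spatial questions by evaluating predicates over 3-D block coordinates. Two toy predicates in that spirit; the coordinate convention (x rightward, z up), block size, and tolerances are illustrative assumptions, not the paper's solver:

```python
def left_of(a, b, margin=0.05):
    """Toy spatial predicate over 3-D block centroids: a is left of b
    when its x-coordinate is smaller by at least a margin (metres)."""
    return a[0] + margin < b[0]

def on_top_of(a, b, size=0.1, tol=0.02):
    """True when block a rests on block b: horizontally aligned and
    one block-height above (edge length `size` is illustrative)."""
    aligned = abs(a[0] - b[0]) < tol and abs(a[1] - b[1]) < tol
    return aligned and abs((a[2] - b[2]) - size) < tol
```

A question like "is the red block on the green block?" then reduces to evaluating such predicates against the tracked block positions, with graded tolerances standing in for agreement with human perception.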

Hierarchical Prototype Learning for Zero-Shot Recognition

Title Hierarchical Prototype Learning for Zero-Shot Recognition
Authors Xingxing Zhang, Shupeng Gui, Zhenfeng Zhu, Yao Zhao, Ji Liu
Abstract Zero-Shot Learning (ZSL) has received extensive attention and seen successes in recent years, especially in areas of fine-grained object recognition, retrieval, and image captioning. Key to ZSL is transferring knowledge from the seen to the unseen classes via auxiliary semantic prototypes (e.g., word or attribute vectors). However, the projection functions commonly learned in previous works do not generalize well, owing to non-visual components included in the semantic prototypes. Moreover, the incompleteness of the provided prototypes and captured images has received little consideration from state-of-the-art approaches to ZSL. In this paper, we propose a hierarchical prototype learning formulation that provides a systematic solution (named HPL) for zero-shot recognition. Specifically, HPL achieves discriminability on both the seen and unseen class domains by learning visual prototypes for each under the transductive setting. To narrow the gap between the two domains, we further learn interpretable super-prototypes in both the visual and semantic spaces, and bridge the two spaces by maximizing their structural consistency. This not only improves the representativeness of the visual prototypes but also alleviates the loss of information in the semantic prototypes. An extensive group of experiments is then carefully designed and presented, demonstrating that HPL obtains remarkably better efficiency and effectiveness than currently available alternatives under various settings.
Tasks Image Captioning, Object Recognition, Zero-Shot Learning
Published 2019-10-24
URL https://arxiv.org/abs/1910.11671v2
PDF https://arxiv.org/pdf/1910.11671v2.pdf
PWC https://paperswithcode.com/paper/hierarchical-prototype-learning-for-zero-shot
Repo
Framework

An Optimal Transport Framework for Zero-Shot Learning

Title An Optimal Transport Framework for Zero-Shot Learning
Authors Wenlin Wang, Hongteng Xu, Guoyin Wang, Wenqi Wang, Lawrence Carin
Abstract We present an optimal transport (OT) framework for generalized zero-shot learning (GZSL) of imaging data, seeking to distinguish samples of both seen and unseen classes with the help of auxiliary attributes. The discrepancy between features and attributes is minimized by solving an optimal transport problem. Specifically, we build a conditional generative model to generate features from seen-class attributes, and establish an optimal transport between the distribution of the generated features and that of the real features. The generative model and the optimal transport are optimized iteratively with an attribute-based regularizer that further enhances the discriminative power of the generated features. A classifier is learned based on the features generated for both the seen and unseen classes. In addition to generalized zero-shot learning, our framework is also applicable to standard and transductive ZSL problems. Experiments show that our optimal transport-based method outperforms state-of-the-art methods on several benchmark datasets.
Tasks Zero-Shot Learning
Published 2019-10-20
URL https://arxiv.org/abs/1910.09057v1
PDF https://arxiv.org/pdf/1910.09057v1.pdf
PWC https://paperswithcode.com/paper/an-optimal-transport-framework-for-zero-shot
Repo
Framework
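The transport step between the generated-feature and real-feature distributions can be approximated with entropic-regularized OT via Sinkhorn iterations over two empirical samples. This is a generic stand-in, not necessarily the paper's solver; uniform marginals and the regularization strength are assumptions:

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.5, n_iter=500):
    """Entropic-regularized optimal transport between two uniform
    empirical distributions: alternate scaling of the Gibbs kernel
    until the plan's marginals match the target weights."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)          # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan
```

The cost matrix would hold pairwise distances between generated and real feature vectors; the total transport cost `(plan * cost).sum()` is the quantity the generative model is trained to reduce.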

Evaluating Disentangled Representations

Title Evaluating Disentangled Representations
Authors Anna Sepliarskaia, Julia Kiseleva, Maarten de Rijke
Abstract There is no generally agreed-upon definition of disentangled representation. Intuitively, the data is generated by a few factors of variation, which are captured and separated in a disentangled representation. Disentangled representations are useful for many tasks such as reinforcement learning, transfer learning, and zero-shot learning. However, the absence of a formally accepted definition makes it difficult to evaluate algorithms for learning disentangled representations. Recently, important steps have been taken towards evaluating disentangled representations: the existing metrics of disentanglement were compared through an experimental study and a framework for the quantitative evaluation of disentangled representations was proposed. However, theoretical guarantees for existing metrics of disentanglement are still missing. In this paper, we analyze metrics of disentanglement and their properties. Specifically, we analyze if the metrics satisfy two desirable properties: (1) give a high score to representations that are disentangled according to the definition; and (2) give a low score to representations that are entangled according to the definition. We show that most of the current metrics do not satisfy at least one of these properties. Consequently, we propose a new definition for a metric of disentanglement that satisfies both of the properties.
Tasks Transfer Learning, Zero-Shot Learning
Published 2019-10-12
URL https://arxiv.org/abs/1910.05587v1
PDF https://arxiv.org/pdf/1910.05587v1.pdf
PWC https://paperswithcode.com/paper/evaluating-disentangled-representations
Repo
Framework

A Framework for Multi-f0 Modeling in SATB Choir Recordings

Title A Framework for Multi-f0 Modeling in SATB Choir Recordings
Authors Helena Cuesta, Emilia Gómez, Pritish Chandna
Abstract Fundamental frequency (f0) modeling is an important but relatively unexplored aspect of choir singing. Performance evaluation, as well as auditory analysis of singing, whether individually or in a choir, often depends on extracting f0 contours for the singing voice. However, due to the large number of singers singing in a similar frequency range, extracting the exact individual pitch contours from choir recordings is a challenging task. In this paper, we address this task and develop a methodology for modeling pitch contours of SATB choir recordings. A typical SATB choir consists of four parts, each covering a distinct range of pitches and often with multiple singers each. We first evaluate some state-of-the-art multi-f0 estimation systems for the particular case of choirs with a single singer per part, and observe that the pitch of individual singers can be estimated to a relatively high degree of accuracy. We observe, however, that the scenario of multiple singers for each choir part (i.e., unison singing) is far more challenging. We therefore propose a methodology that combines deep-learning-based multi-f0 estimation with a set of traditional DSP techniques to model f0 and its dispersion, instead of a single f0 trajectory, for each choir part. We present and discuss our observations and test our framework with different singer configurations.
Tasks
Published 2019-04-10
URL http://arxiv.org/abs/1904.05086v1
PDF http://arxiv.org/pdf/1904.05086v1.pdf
PWC https://paperswithcode.com/paper/a-framework-for-multi-f0-modeling-in-satb
Repo
Framework
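Modeling f0 and its dispersion instead of a single trajectory per part can be sketched as follows: given per-singer f0 tracks for one part, take a central trajectory and measure the per-frame spread in cents. The geometric-mean center and cents-based spread are assumptions about the representation, not the paper's exact model:

```python
import numpy as np

def part_f0_model(f0_tracks):
    """Model a choir part by a central f0 trajectory plus a per-frame
    dispersion value. f0_tracks has shape (n_singers, n_frames), in Hz;
    the center is the per-frame geometric mean, and the dispersion is
    the standard deviation of per-singer deviations in cents."""
    f0 = np.asarray(f0_tracks, dtype=float)
    center = np.exp(np.log(f0).mean(axis=0))   # geometric mean per frame
    cents = 1200.0 * np.log2(f0 / center)      # deviations in cents
    return center, cents.std(axis=0)
```

Working in cents makes the dispersion pitch-independent, so a 10-cent spread means the same thing for basses and sopranos.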

An LGMD Based Competitive Collision Avoidance Strategy for UAV

Title An LGMD Based Competitive Collision Avoidance Strategy for UAV
Authors Jiannan Zhao, Xingzao Ma, Qinbing Fu, Cheng Hu, Shigang Yue
Abstract Building a reliable and efficient collision avoidance system for unmanned aerial vehicles (UAVs) is still a challenging problem. This research takes inspiration from locusts, which can fly in dense swarms for hundreds of miles without collision. In the locust’s brain, the LGMD-DCMD visual pathway (lobula giant movement detector and descending contra-lateral motion detector) has been identified as a collision-perception system that guides fast collision avoidance, making it ideal for designing artificial vision systems. However, there are very few works investigating its potential in real-world UAV applications. In this paper, we present an LGMD-based competitive collision avoidance method for UAV indoor navigation. In contrast to previous works, we divide the UAV’s field of view into four subfields, each handled by an LGMD neuron, so that four individual competitive LGMDs (C-LGMD) compete to guide the directional collision avoidance of the UAV. With more degrees of freedom than ground robots and vehicles, the UAV can escape from collision along four cardinal directions (e.g., an object approaching from the left side triggers a rightward shift of the UAV). Our proposed method has been validated by both simulations and real-time quadcopter arena experiments.
Tasks
Published 2019-04-15
URL http://arxiv.org/abs/1904.07206v1
PDF http://arxiv.org/pdf/1904.07206v1.pdf
PWC https://paperswithcode.com/paper/an-lgmd-based-competitive-collision-avoidance
Repo
Framework
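The competitive four-subfield scheme can be sketched as a simple arbitration rule: the subfield whose LGMD responds most strongly wins, and the UAV escapes in the opposite direction. The dictionary keys and the threshold below are illustrative, not the paper's neuron model:

```python
def escape_direction(lgmd_outputs, threshold=0.5):
    """Competitive C-LGMD decision: one LGMD response per subfield of
    view ('left', 'right', 'up', 'down'); the UAV shifts opposite to
    the subfield with the strongest collision response, or holds
    course when no response exceeds the threshold."""
    opposite = {"left": "right", "right": "left",
                "up": "down", "down": "up"}
    danger = max(lgmd_outputs, key=lgmd_outputs.get)
    if lgmd_outputs[danger] <= threshold:
        return None              # no imminent collision detected
    return opposite[danger]
```

In the full system each LGMD response would itself be computed from excitation and lateral inhibition over its image subfield; only the winner-take-all arbitration is sketched here.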