January 25, 2020

3409 words 17 mins read

Paper Group ANR 1669

Distilled embedding: non-linear embedding factorization using knowledge distillation

Title Distilled embedding: non-linear embedding factorization using knowledge distillation
Authors Vasileios Lioutas, Ahmad Rashid, Krtin Kumar, Md Akmal Haidar, Mehdi Rezagholizadeh
Abstract Word embeddings are a vital component of Natural Language Processing (NLP) systems and have been extensively researched. Better representations of words have come at the cost of huge memory footprints, which has made deploying NLP models on edge devices challenging due to memory limitations. Compressing embedding matrices without sacrificing model performance is essential for successful commercial edge deployment. In this paper, we propose Distilled Embedding, an (input/output) embedding compression method based on low-rank matrix decomposition with an added non-linearity. First, we initialize the weights of our decomposition by learning to reconstruct the full word-embedding matrix, and then fine-tune on the downstream task, employing knowledge distillation on the factorized embedding. We conduct extensive experiments with various compression rates on machine translation, using different datasets with a shared word-embedding matrix for both the embedding and vocabulary projection matrices. We show that the proposed technique outperforms conventional low-rank matrix factorization and other recently proposed word-embedding matrix compression methods.
Tasks Machine Translation, Word Embeddings
Published 2019-10-02
URL https://arxiv.org/abs/1910.06720v1
PDF https://arxiv.org/pdf/1910.06720v1.pdf
PWC https://paperswithcode.com/paper/distilled-embedding-non-linear-embedding
Repo
Framework
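
To make the factorization concrete, here is a minimal PyTorch sketch of the idea as described in the abstract: the V×d embedding matrix is replaced by a V×r lookup followed by a non-linear projection back to d dimensions, pre-trained to reconstruct the original embedding. The module names and the choice of ReLU as the non-linearity are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Low-rank embedding factorization with an added non-linearity (sketch)."""
    def __init__(self, vocab_size, embed_dim, rank):
        super().__init__()
        self.low_rank = nn.Embedding(vocab_size, rank)  # V x r lookup table
        self.act = nn.ReLU()                            # assumed non-linearity
        self.project = nn.Linear(rank, embed_dim)       # r -> d projection

    def forward(self, token_ids):
        return self.project(self.act(self.low_rank(token_ids)))

def reconstruction_loss(factorized, full_embedding, token_ids):
    """Stage 1: learn to reconstruct the full pretrained embedding."""
    target = full_embedding(token_ids).detach()  # teacher vectors, frozen
    return torch.mean((factorized(token_ids) - target) ** 2)
```

Stage 2 would then fine-tune this module on the downstream translation task with a distillation term on the factorized embedding, sharing it between the input embedding and the vocabulary projection.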

Variance Reduction for Evolution Strategies via Structured Control Variates

Title Variance Reduction for Evolution Strategies via Structured Control Variates
Authors Yunhao Tang, Krzysztof Choromanski, Alp Kucukelbir
Abstract Evolution Strategies (ES) are a powerful class of blackbox optimization techniques that have recently become a competitive alternative to state-of-the-art policy gradient (PG) algorithms for reinforcement learning (RL). We propose a new method for improving the accuracy of ES algorithms which, as opposed to recent approaches that utilize only the Monte Carlo structure of the gradient estimator, takes advantage of the underlying MDP structure to reduce the variance. We observe that the gradient estimator of the ES objective can alternatively be computed using reparametrization and PG estimators, which leads to new control variate techniques for gradient estimation in ES optimization. We provide theoretical insights and show through extensive experiments that this RL-specific variance reduction approach outperforms general-purpose variance reduction methods.
Tasks
Published 2019-05-29
URL https://arxiv.org/abs/1906.08868v2
PDF https://arxiv.org/pdf/1906.08868v2.pdf
PWC https://paperswithcode.com/paper/variance-reduction-for-evolution-strategies
Repo
Framework
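
The control-variate mechanism itself is simple to illustrate. The sketch below uses a plain scalar baseline, which keeps the vanilla ES estimator unbiased because the perturbations have zero mean; the paper's structured control variates instead exploit the MDP structure via reparametrization and PG estimators, so treat this as a generic stand-in rather than the proposed method.

```python
import numpy as np

def es_gradient_with_cv(f, theta, sigma=0.1, n_samples=100, baseline=None):
    """Vanilla ES gradient estimate with a scalar control variate (sketch).

    Estimates the gradient of E[f(theta + sigma * eps)] w.r.t. theta.
    Subtracting a baseline b leaves the estimator unbiased, since
    E[b * eps] = 0, but can substantially reduce its variance.
    """
    d = theta.shape[0]
    eps = np.random.randn(n_samples, d)               # Gaussian perturbations
    returns = np.array([f(theta + sigma * e) for e in eps])
    b = returns.mean() if baseline is None else baseline
    return ((returns - b)[:, None] * eps).mean(axis=0) / sigma
```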

Speech Recognition With No Speech Or With Noisy Speech Beyond English

Title Speech Recognition With No Speech Or With Noisy Speech Beyond English
Authors Gautam Krishna, Co Tran, Yan Han, Mason Carnahan, Ahmed H Tewfik
Abstract In this paper we demonstrate continuous noisy speech recognition using a connectionist temporal classification (CTC) model on a limited Chinese vocabulary, with electroencephalography (EEG) features and no speech signal as input. We further demonstrate continuous noisy speech recognition with a single CTC model on a limited joint English and Chinese vocabulary, again using EEG features with no speech signal as input. We report results using various EEG feature sets recently introduced in [1], and we also propose a new deep learning architecture that can perform continuous speech recognition from raw EEG signals on the limited joint English and Chinese vocabulary.
Tasks EEG, Noisy Speech Recognition, Speech Recognition
Published 2019-06-17
URL https://arxiv.org/abs/1906.08045v5
PDF https://arxiv.org/pdf/1906.08045v5.pdf
PWC https://paperswithcode.com/paper/speech-recognition-with-no-speech-or-with-1
Repo
Framework
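
As a rough picture of the setup, the sketch below pairs a recurrent encoder over EEG feature frames with a CTC loss, which aligns the unsegmented frame sequence to a character sequence. The GRU choice, layer sizes, and vocabulary size are assumptions for illustration; the paper's architectures (including the raw-EEG model) differ.

```python
import torch
import torch.nn as nn

class EEGToTextCTC(nn.Module):
    """Recurrent encoder + CTC output head over EEG feature frames (sketch)."""
    def __init__(self, eeg_dim, hidden, n_chars):
        super().__init__()
        self.encoder = nn.GRU(eeg_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_chars + 1)   # +1 for the CTC blank

    def forward(self, x):                            # x: (batch, frames, eeg_dim)
        h, _ = self.encoder(x)
        return self.head(h).log_softmax(dim=-1)

model = EEGToTextCTC(eeg_dim=30, hidden=128, n_chars=40)
ctc = nn.CTCLoss(blank=40)                           # blank index = last class
x = torch.randn(2, 100, 30)                          # two EEG feature sequences
log_probs = model(x).transpose(0, 1)                 # CTCLoss wants (time, batch, classes)
targets = torch.randint(0, 40, (2, 12))              # character indices
loss = ctc(log_probs, targets,
           input_lengths=torch.tensor([100, 100]),
           target_lengths=torch.tensor([12, 12]))
```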

Stochastic Filter Groups for Multi-Task CNNs: Learning Specialist and Generalist Convolution Kernels

Title Stochastic Filter Groups for Multi-Task CNNs: Learning Specialist and Generalist Convolution Kernels
Authors Felix J. S. Bragman, Ryutaro Tanno, Sebastien Ourselin, Daniel C. Alexander, M. Jorge Cardoso
Abstract The performance of multi-task learning in Convolutional Neural Networks (CNNs) hinges on the design of feature sharing between tasks within the architecture. The number of possible sharing patterns is combinatorial in the depth of the network and the number of tasks, so hand-crafting an architecture based purely on human intuitions about task relationships can be time-consuming and suboptimal. In this paper, we present a probabilistic approach to learning task-specific and shared representations in CNNs for multi-task learning. Specifically, we propose “stochastic filter groups” (SFG), a mechanism to assign convolution kernels in each layer to “specialist” or “generalist” groups, which are specific to or shared across different tasks, respectively. The SFG modules determine the connectivity between layers and the structures of task-specific and shared representations in the network. We employ variational inference to learn the posterior distribution over the possible groupings of kernels and the network parameters. Experiments demonstrate that the proposed method generalises across multiple tasks and shows improved performance over baseline methods.
Tasks Multi-Task Learning
Published 2019-08-26
URL https://arxiv.org/abs/1908.09597v1
PDF https://arxiv.org/pdf/1908.09597v1.pdf
PWC https://paperswithcode.com/paper/stochastic-filter-groups-for-multi-task-cnns
Repo
Framework
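
To illustrate the grouping mechanism, the sketch below assigns each output filter of a convolution to one of three groups (task-1 specialist, task-2 specialist, generalist) by sampling from learnable per-filter logits. A Gumbel-softmax sample stands in for the paper's variational posterior over groupings, which is a simplifying assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticFilterGroupConv(nn.Module):
    """Assign each conv filter to a specialist or generalist group (sketch)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        # Per-filter logits over 3 groups: task-0, task-1, generalist.
        self.group_logits = nn.Parameter(torch.zeros(out_ch, 3))

    def forward(self, x, task):                      # task in {0, 1}
        g = F.gumbel_softmax(self.group_logits, tau=1.0, hard=True)  # (out_ch, 3)
        feats = self.conv(x)
        # Each task sees its specialist filters plus the shared generalist ones.
        mask = (g[:, task] + g[:, 2]).view(1, -1, 1, 1)
        return feats * mask
```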

Safe Interactive Model-Based Learning

Title Safe Interactive Model-Based Learning
Authors Marco Gallieri, Seyed Sina Mirrazavi Salehian, Nihat Engin Toklu, Alessio Quaglino, Jonathan Masci, Jan Koutník, Faustino Gomez
Abstract Control applications present hard operational constraints whose violation can result in unsafe behavior. This paper introduces Safe Interactive Model-Based Learning (SiMBL), a framework to refine an existing controller and a system model while operating on the real environment. SiMBL is composed of the following trainable components: a Lyapunov function, which determines a safe set; a safe control policy; and a Bayesian RNN forward model. A min-max control framework, based on alternate minimisation and backpropagation through the forward model, is used for the offline computation of the controller and the safe set. Safety is formally verified a posteriori with a probabilistic method that utilizes the Noise Contrastive Priors (NCP) idea to build a Bayesian RNN forward model with an additive state uncertainty estimate that is large outside the training data distribution. Iterative refinement of the model and the safe set is achieved thanks to a novel loss that conditions the uncertainty estimates of the new model to be close to those of the current one. The learned safe set and model can also be used for safe exploration, i.e., to collect data within the safe invariant set, for which a simple one-step MPC is proposed. The individual components are tested on a simulation of an inverted pendulum with limited torque and stability region, showing that iteratively adding more data can improve the model, the controller and the size of the safe region.
Tasks Safe Exploration
Published 2019-11-15
URL https://arxiv.org/abs/1911.06556v2
PDF https://arxiv.org/pdf/1911.06556v2.pdf
PWC https://paperswithcode.com/paper/safe-interactive-model-based-learning
Repo
Framework
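
The safe-exploration component can be pictured as a one-step MPC that keeps predicted next states inside a Lyapunov level set. The sketch below assumes a callable forward model `model(states, actions)` and a scalar `lyapunov` network, both hypothetical stand-ins; it also ignores the model's uncertainty estimate, which SiMBL uses for formal verification.

```python
import torch

def safe_one_step_mpc(state, candidate_actions, model, lyapunov, level=1.0):
    """One-step MPC for safe exploration (sketch).

    Pick, among sampled candidate actions, one whose predicted next state
    stays inside the safe level set {s : V(s) <= level}. The random-shooting
    scheme, scalar level, and fallback rule are illustrative assumptions.
    """
    states = state.expand(len(candidate_actions), -1)    # state: (1, state_dim)
    next_states = model(states, candidate_actions)       # hypothetical forward model
    values = lyapunov(next_states).squeeze(-1)           # V(s') per candidate
    safe = values <= level
    if safe.any():
        # Among safe actions, prefer the one deepest inside the safe set.
        return candidate_actions[values.masked_fill(~safe, float("inf")).argmin()]
    return candidate_actions[values.argmin()]            # fallback: least unsafe
```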

Deep Learning for Channel Coding via Neural Mutual Information Estimation

Title Deep Learning for Channel Coding via Neural Mutual Information Estimation
Authors Rick Fritschek, Rafael F. Schaefer, Gerhard Wunder
Abstract End-to-end deep learning for communication systems, i.e., systems whose encoder and decoder are learned, has attracted significant interest recently due to its performance, which comes close to well-developed classical encoder-decoder designs. However, one of the drawbacks of current learning approaches is that a differentiable channel model is needed for training the underlying neural networks. In real-world scenarios, such a channel model is rarely available, and often the channel density is not known at all. Some works therefore focus on a generative approach, i.e., generating the channel from samples, or rely on reinforcement learning to circumvent this problem. We present a novel approach that utilizes a recently proposed neural estimator of mutual information. We use this estimator to optimize the encoder for maximal mutual information, relying only on channel samples. Moreover, we show that our approach achieves the same performance as state-of-the-art end-to-end learning with perfect channel model knowledge.
Tasks
Published 2019-03-07
URL http://arxiv.org/abs/1903.02865v1
PDF http://arxiv.org/pdf/1903.02865v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-channel-coding-via-neural
Repo
Framework
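
One such neural estimator is the Donsker-Varadhan bound used by MINE (Belghazi et al., 2018), sketched below; the encoder would then be trained to maximize this bound using only channel input/output samples. Whether this is the exact estimator variant used, as well as the layer sizes, are assumptions here.

```python
import torch
import torch.nn as nn

class MINE(nn.Module):
    """Donsker-Varadhan lower bound on mutual information I(X; Y) (sketch)."""
    def __init__(self, x_dim, y_dim, hidden=64):
        super().__init__()
        self.T = nn.Sequential(                      # statistics network T(x, y)
            nn.Linear(x_dim + y_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x, y):
        joint = self.T(torch.cat([x, y], dim=1)).mean()
        y_shuffled = y[torch.randperm(y.size(0))]    # break pairing -> marginals
        marg = torch.exp(self.T(torch.cat([x, y_shuffled], dim=1))).mean()
        return joint - torch.log(marg)               # DV bound: maximize this
```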

Deep-learning-based identification of odontogenic keratocysts in hematoxylin- and eosin-stained jaw cyst specimens

Title Deep-learning-based identification of odontogenic keratocysts in hematoxylin- and eosin-stained jaw cyst specimens
Authors Kei Sakamoto, Kei-ichi Morita, Tohru Ikeda, Kou Kayamori
Abstract The aim of this study was to develop a digital histopathology system for identifying odontogenic keratocysts in hematoxylin- and eosin-stained tissue specimens of jaw cysts. Approximately 5000 microscopy images at 400× magnification were obtained from 199 odontogenic keratocysts, 208 dentigerous cysts, and 55 radicular cysts. A proportion of these images were used to make training patches, which were annotated as belonging to one of three classes: keratocyst, non-keratocyst, and stroma. The patches for the cysts contained the complete lining epithelium, with the cyst cavity on the upper side. The convolutional neural network (CNN) VGG16 was fine-tuned on this dataset. The trained CNN could recognize the basal cell palisading pattern, which is the definitive criterion for diagnosing keratocysts. Some of the remaining images were scanned and analyzed by the trained CNN, whose output was then used to train another CNN for binary classification (keratocyst or not). The area under the receiver operating characteristic curve for the entire algorithm was 0.997 on the test dataset. Thus, the proposed patch classification strategy is usable for automated keratocyst diagnosis. However, further optimization is required to make it suitable for practical use.
Tasks
Published 2019-01-12
URL http://arxiv.org/abs/1901.03857v1
PDF http://arxiv.org/pdf/1901.03857v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-identification-of
Repo
Framework
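
The transfer-learning step described in the abstract amounts to replacing VGG16's final classification layer and fine-tuning on the three patch classes. A minimal torchvision sketch follows; the weight initialization, freezing policy, and training protocol are assumptions, not the paper's exact setup.

```python
import torch.nn as nn
from torchvision import models

def build_patch_classifier(n_classes=3):
    """VGG16 fine-tuned for keratocyst / non-keratocyst / stroma patches (sketch)."""
    vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    vgg.classifier[6] = nn.Linear(4096, n_classes)  # swap the 1000-way head
    return vgg
```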

Deep Anomaly Detection for Generalized Face Anti-Spoofing

Title Deep Anomaly Detection for Generalized Face Anti-Spoofing
Authors Daniel Pérez-Cabo, David Jiménez-Cabello, Artur Costa-Pazo, Roberto J. López-Sastre
Abstract Face recognition has achieved unprecedented results, surpassing human capabilities in certain scenarios. However, these automatic solutions are not ready for production because they can be easily fooled by simple identity impersonation attacks. Although much effort has been devoted to developing face anti-spoofing models, their generalization capacity still remains a challenge in real scenarios. In this paper, we introduce a novel approach that reformulates the Generalized Presentation Attack Detection (GPAD) problem from an anomaly detection perspective. Technically, a deep metric learning model is proposed, where a triplet focal loss is used as a regularization for a novel loss coined “metric-softmax”, which is in charge of guiding the learning process towards more discriminative feature representations in an embedding space. Finally, we demonstrate the benefits of our deep anomaly detection architecture by introducing a few-shot a posteriori probability estimation that does not need any classifier to be trained on the learned features. We conduct extensive experiments using the GRAD-GPAD framework, which provides the largest aggregated dataset for face GPAD. Results confirm that our approach outperforms all state-of-the-art methods by a considerable margin.
Tasks Anomaly Detection, Face Anti-Spoofing, Face Recognition, Metric Learning
Published 2019-04-17
URL http://arxiv.org/abs/1904.08241v1
PDF http://arxiv.org/pdf/1904.08241v1.pdf
PWC https://paperswithcode.com/paper/190408241
Repo
Framework
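
The “metric-softmax” and triplet focal losses are the paper's own constructions, so the sketch below only illustrates the general idea of a softmax cross-entropy over embedding distances, where the positive pair must be closer than all negatives; it should not be read as the exact loss.

```python
import torch
import torch.nn.functional as F

def distance_softmax_loss(anchor, positive, negatives):
    """Cross-entropy over negated squared distances (sketch, not the paper's loss).

    anchor, positive: (B, dim) embeddings; negatives: list of (B, dim) tensors.
    Class 0 is the positive pair, so minimizing this pulls positives closer
    than every negative in the embedding space.
    """
    d_pos = (anchor - positive).pow(2).sum(-1, keepdim=True)          # (B, 1)
    d_neg = torch.stack([(anchor - n).pow(2).sum(-1) for n in negatives], -1)
    logits = -torch.cat([d_pos, d_neg], dim=-1)       # closer => larger logit
    target = torch.zeros(anchor.size(0), dtype=torch.long)
    return F.cross_entropy(logits, target)
```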

Semantic Relationships Guided Representation Learning for Facial Action Unit Recognition

Title Semantic Relationships Guided Representation Learning for Facial Action Unit Recognition
Authors Guanbin Li, Xin Zhu, Yirui Zeng, Qing Wang, Liang Lin
Abstract Facial action unit (AU) recognition is a crucial task for facial expression analysis and has attracted extensive attention in the fields of artificial intelligence and computer vision. Existing works have either focused on designing or learning complex regional feature representations, or delved into various types of AU relationship modeling. Albeit with varying degrees of progress, it is still arduous for existing methods to handle complex situations. In this paper, we investigate how to integrate semantic relationship propagation between AUs into a deep neural network framework to enhance the feature representation of facial regions, and propose an AU semantic relationship embedded representation learning (SRERL) framework. Specifically, by analyzing the symbiosis and mutual exclusion of AUs in various facial expressions, we organize the facial AUs in the form of a structured knowledge graph and integrate a Gated Graph Neural Network (GGNN) into a multi-scale CNN framework to propagate node information through the graph and generate enhanced AU representations. As the learned features involve both appearance characteristics and AU relationship reasoning, the proposed model is more robust and can cope with more challenging cases, e.g., illumination change and partial occlusion. Extensive experiments on two public benchmarks demonstrate that our method outperforms previous work and achieves state-of-the-art performance.
Tasks Facial Action Unit Detection, Representation Learning
Published 2019-04-22
URL http://arxiv.org/abs/1904.09939v1
PDF http://arxiv.org/pdf/1904.09939v1.pdf
PWC https://paperswithcode.com/paper/semantic-relationships-guided-representation
Repo
Framework
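
The graph propagation at the heart of SRERL can be sketched as a standard gated graph update over AU nodes: each node aggregates its neighbors' features through a fixed relation matrix built from AU symbiosis/exclusion statistics, then applies a gated (GRU-style) update. The GRUCell formulation below follows the generic GGNN recipe, not necessarily the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class GGNNLayer(nn.Module):
    """One gated graph propagation step over AU nodes (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, h, A):
        # h: (n_au, dim) node features; A: (n_au, n_au) fixed relation matrix
        # derived from AU co-occurrence / mutual-exclusion statistics.
        messages = A @ h                 # aggregate neighbor features
        return self.gru(messages, h)     # gated node update
```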

Towards Real-time Eyeblink Detection in the Wild: Dataset, Theory and Practices

Title Towards Real-time Eyeblink Detection in the Wild: Dataset, Theory and Practices
Authors Guilei Hu, Yang Xiao, Zhiguo Cao, Lubin Meng, Zhiwen Fang, Joey Tianyi Zhou, Junsong Yuan
Abstract Effective and real-time eyeblink detection has a wide range of applications, such as deception detection, driver fatigue detection, face anti-spoofing, etc. Although numerous efforts have already been made, most focus on addressing the eyeblink detection problem under constrained indoor conditions with a relatively consistent subject and environment setup. Nevertheless, for practical applications, eyeblink detection in the wild is more relevant and poses greater challenges. However, to our knowledge this has not been well studied before. In this paper, we shed light on this research topic. A labelled eyeblink-in-the-wild dataset (i.e., HUST-LEBW) of 673 eyeblink video samples (i.e., 381 positives and 292 negatives) is first established by us. These samples are captured from unconstrained movies, with dramatic variation in human attributes, human pose, illumination conditions, imaging configuration, etc. Then, we formulate the eyeblink detection task as a spatial-temporal pattern recognition problem. After locating and tracking the human eye using the SeetaFace engine and KCF tracker, respectively, a modified LSTM model able to capture multi-scale temporal information is proposed to perform eyeblink verification. A feature extraction approach that reveals appearance and motion characteristics simultaneously is also proposed. The experiments on HUST-LEBW reveal the superiority and efficiency of our approach. They also verify that existing eyeblink detection methods cannot achieve satisfactory performance in the wild.
Tasks Deception Detection, Face Anti-Spoofing
Published 2019-02-21
URL https://arxiv.org/abs/1902.07891v3
PDF https://arxiv.org/pdf/1902.07891v3.pdf
PWC https://paperswithcode.com/paper/towards-real-time-eyeblink-detection-in-the
Repo
Framework
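
Schematically, the verification stage can be pictured as an LSTM over per-frame eye appearance/motion features whose final state is classified as blink or non-blink. The sketch below is a plain single-scale LSTM, whereas the paper proposes a modified multi-scale variant; all sizes are assumptions.

```python
import torch
import torch.nn as nn

class EyeblinkLSTM(nn.Module):
    """LSTM verifier over per-frame eye features (sketch)."""
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, 2)

    def forward(self, x):                    # x: (batch, frames, feat_dim)
        _, (h, _) = self.lstm(x)
        return self.cls(h[-1])               # logits for blink vs. no blink
```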

Evaluating and Calibrating Uncertainty Prediction in Regression Tasks

Title Evaluating and Calibrating Uncertainty Prediction in Regression Tasks
Authors Dan Levi, Liran Gispan, Niv Giladi, Ethan Fetaya
Abstract Predicting not only the target but also an accurate measure of uncertainty is important for many machine learning applications, and in particular safety-critical ones. In this work we study the calibration of uncertainty prediction for regression tasks, which often arise in real-world systems. We show that the existing definition for calibration of regression uncertainty [Kuleshov et al. 2018] has severe limitations in distinguishing informative from non-informative uncertainty predictions. We propose a new definition that escapes this caveat, together with an evaluation method using a simple histogram-based approach. Our method clusters examples with similar uncertainty predictions and compares the predictions with the empirical uncertainty on these examples. We also propose a simple, scaling-based calibration method that performs as well as much more complex ones. We show results on both a synthetic, controlled problem and on the object detection bounding-box regression task using the COCO and KITTI datasets.
Tasks Calibration, Object Detection
Published 2019-05-28
URL https://arxiv.org/abs/1905.11659v3
PDF https://arxiv.org/pdf/1905.11659v3.pdf
PWC https://paperswithcode.com/paper/evaluating-and-calibrating-uncertainty
Repo
Framework
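
The histogram-based evaluation can be sketched as follows: sort examples by predicted uncertainty, bin them into equal-count groups, and compare each bin's predicted sigma with the empirical RMSE of its residuals. The exact metric and weighting in the paper may differ; this is an illustrative assumption.

```python
import numpy as np

def uncertainty_calibration_error(pred_sigma, errors, n_bins=10):
    """Histogram-style calibration check for regression uncertainty (sketch).

    pred_sigma: predicted standard deviations, errors: signed residuals.
    Returns the size-weighted average gap between predicted and empirical
    uncertainty across equal-count bins of similar predicted sigma.
    """
    order = np.argsort(pred_sigma)
    bins = np.array_split(order, n_bins)     # equal-count bins of similar sigma
    gaps = []
    for b in bins:
        predicted = np.sqrt(np.mean(pred_sigma[b] ** 2))
        empirical = np.sqrt(np.mean(errors[b] ** 2))
        gaps.append(abs(predicted - empirical) * len(b))
    return sum(gaps) / len(pred_sigma)
```

A scaling-based calibration would then shrink this gap by fitting a single multiplier on the predicted sigmas using held-out validation data.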

Category-Aware Location Embedding for Point-of-Interest Recommendation

Title Category-Aware Location Embedding for Point-of-Interest Recommendation
Authors Hossein A. Rahmani, Mohammad Aliannejadi, Rasoul Mirzaei Zadeh, Mitra Baratchi, Mohsen Afsharchi, Fabio Crestani
Abstract Recently, point-of-interest (POI) recommendation has gained ever-increasing importance in various Location-Based Social Networks (LBSNs). With the recent advances in neural models, much work has sought to leverage neural networks to learn neural embeddings in a pre-training phase, achieving an improved representation of POIs and consequently better recommendations. However, previous studies fail to capture crucial information about POIs such as categorical information. In this paper, we propose a novel neural model that generates a POI embedding incorporating sequential and categorical information from POIs. Our model consists of a check-in module and a category module. The check-in module captures the geographical influence of POIs derived from the sequence of users’ check-ins, while the category module captures the characteristics of POIs derived from the category information. To validate the efficacy of the model, we experimented with two large-scale LBSN datasets. Our experimental results demonstrate that our approach significantly outperforms state-of-the-art POI recommendation methods.
Tasks
Published 2019-07-31
URL https://arxiv.org/abs/1907.13376v1
PDF https://arxiv.org/pdf/1907.13376v1.pdf
PWC https://paperswithcode.com/paper/category-aware-location-embedding-for-point
Repo
Framework
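
The two-module design can be summarized as learning one embedding from check-in sequences and one from category information, then combining them for recommendation. The sketch below simply concatenates the two lookups; the class name, the training objectives of each module, and the fusion scheme are all assumptions.

```python
import torch
import torch.nn as nn

class CategoryAwarePOIEmbedding(nn.Module):
    """Combine sequence-based and category-based POI embeddings (sketch)."""
    def __init__(self, n_pois, n_categories, dim):
        super().__init__()
        self.checkin = nn.Embedding(n_pois, dim)        # trained on check-in sequences
        self.category = nn.Embedding(n_categories, dim) # trained on category info

    def forward(self, poi_ids, category_ids):
        # Concatenation is one simple fusion choice among several.
        return torch.cat([self.checkin(poi_ids), self.category(category_ids)], -1)
```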

Generalizing from a Few Examples: A Survey on Few-Shot Learning

Title Generalizing from a Few Examples: A Survey on Few-Shot Learning
Authors Yaqing Wang, Quanming Yao, James Kwok, Lionel M. Ni
Abstract Machine learning has been highly successful in data-intensive applications but is often hampered when the data set is small. Recently, Few-Shot Learning (FSL) has been proposed to tackle this problem. Using prior knowledge, FSL can rapidly generalize to new tasks containing only a few samples with supervised information. In this paper, we conduct a thorough survey to fully understand FSL. Starting from a formal definition of FSL, we distinguish FSL from several relevant machine learning problems. We then point out that the core issue in FSL is that the empirical risk minimizer is unreliable. Based on how prior knowledge can be used to handle this core issue, we categorize FSL methods from three perspectives: (i) data, which uses prior knowledge to augment the supervised experience; (ii) model, which uses prior knowledge to reduce the size of the hypothesis space; and (iii) algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space. With this taxonomy, we review and discuss the pros and cons of each category. Promising directions, in the aspects of FSL problem setups, techniques, applications and theories, are also proposed to provide insights for future research.
Tasks Few-Shot Learning
Published 2019-04-10
URL https://arxiv.org/abs/1904.05046v3
PDF https://arxiv.org/pdf/1904.05046v3.pdf
PWC https://paperswithcode.com/paper/few-shot-learning-a-survey
Repo
Framework

Enhance the Motion Cues for Face Anti-Spoofing using CNN-LSTM Architecture

Title Enhance the Motion Cues for Face Anti-Spoofing using CNN-LSTM Architecture
Authors Xiaoguang Tu, Hengsheng Zhang, Mei Xie, Yao Luo, Yuefei Zhang, Zheng Ma
Abstract Spatio-temporal information is very important for capturing the discriminative cues between genuine and fake faces in video sequences. To exploit such temporal features, the fine-grained motions (e.g., eye blinking, mouth movements and head swings) across video frames are critical. In this paper, we propose a joint CNN-LSTM network for face anti-spoofing, focusing on the motion cues across video frames. We first extract highly discriminative features from video frames using a conventional Convolutional Neural Network (CNN). We then feed the extracted features to a Long Short-Term Memory (LSTM) network to capture the temporal dynamics in videos. To make the fine-grained motions easier to perceive during training, Eulerian motion magnification is used as a preprocessing step to enhance the facial expressions exhibited by individuals, and an attention mechanism is embedded in the LSTM so that the model learns to focus selectively on the dynamic frames across the video clips. Experiments on the Replay-Attack and MSU-MFSD databases show that the proposed method yields state-of-the-art performance with better generalization ability than several other popular algorithms.
Tasks Face Anti-Spoofing
Published 2019-01-17
URL http://arxiv.org/abs/1901.05635v1
PDF http://arxiv.org/pdf/1901.05635v1.pdf
PWC https://paperswithcode.com/paper/enhance-the-motion-cues-for-face-anti
Repo
Framework
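
A schematic of the pipeline: per-frame CNN features feed an LSTM, and an attention layer reweights the hidden states so dynamic frames dominate the clip-level decision. The backbone, the attention form, and the sizes below are illustrative assumptions, and the Eulerian magnification preprocessing is omitted.

```python
import torch
import torch.nn as nn

class CNNLSTMAntiSpoof(nn.Module):
    """Per-frame CNN features -> attention-weighted LSTM classifier (sketch)."""
    def __init__(self, cnn, feat_dim, hidden=128):
        super().__init__()
        self.cnn = cnn                               # any per-frame feature extractor
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.cls = nn.Linear(hidden, 2)              # genuine vs. spoof

    def forward(self, frames):                       # (batch, T, C, H, W)
        b, t = frames.shape[:2]
        f = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        h, _ = self.lstm(f)
        w = torch.softmax(self.attn(h), dim=1)       # attend to dynamic frames
        return self.cls((w * h).sum(dim=1))
```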

Spatio-Temporal Deep Learning Models for Tip Force Estimation During Needle Insertion

Title Spatio-Temporal Deep Learning Models for Tip Force Estimation During Needle Insertion
Authors Nils Gessert, Torben Priegnitz, Thore Saathoff, Sven-Thomas Antoni, David Meyer, Moritz Franz Hamann, Klaus-Peter Jünemann, Christoph Otte, Alexander Schlaefer
Abstract Purpose. Precise placement of needles is a challenge in a number of clinical applications such as brachytherapy or biopsy. Forces acting at the needle cause tissue deformation and needle deflection which in turn may lead to misplacement or injury. Hence, a number of approaches to estimate the forces at the needle have been proposed. Yet, integrating sensors into the needle tip is challenging and a careful calibration is required to obtain good force estimates. Methods. We describe a fiber-optical needle tip force sensor design using a single OCT fiber for measurement. The fiber images the deformation of an epoxy layer placed below the needle tip which results in a stream of 1D depth profiles. We study different deep learning approaches to facilitate calibration between this spatio-temporal image data and the related forces. In particular, we propose a novel convGRU-CNN architecture for simultaneous spatial and temporal data processing. Results. The needle can be adapted to different operating ranges by changing the stiffness of the epoxy layer. Likewise, calibration can be adapted by training the deep learning models. Our novel convGRU-CNN architecture results in the lowest mean absolute error of 1.59 ± 1.3 mN and a cross-correlation coefficient of 0.9997, and clearly outperforms the other methods. Ex vivo experiments in human prostate tissue demonstrate the needle’s application. Conclusions. Our OCT-based fiber-optical sensor presents a viable alternative for needle tip force estimation. The results indicate that the rich spatio-temporal information included in the stream of images showing the deformation throughout the epoxy layer can be effectively used by deep learning models. Particularly, we demonstrate that the convGRU-CNN architecture performs favorably, making it a promising approach for other spatio-temporal learning problems.
Tasks Calibration
Published 2019-05-22
URL https://arxiv.org/abs/1905.09282v1
PDF https://arxiv.org/pdf/1905.09282v1.pdf
PWC https://paperswithcode.com/paper/spatio-temporal-deep-learning-models-for-tip
Repo
Framework
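
The distinguishing ingredient of the convGRU-CNN is a recurrence whose gates are convolutions over the depth axis of the 1D OCT profiles, so spatial structure is preserved through time; a small CNN head would then regress the force from the final hidden map. Below is a minimal 1-D convolutional GRU cell under those assumptions (kernel size and channel counts are illustrative).

```python
import torch
import torch.nn as nn

class ConvGRUCell1d(nn.Module):
    """Minimal 1-D convolutional GRU cell (sketch)."""
    def __init__(self, in_ch, hid_ch, k=5):
        super().__init__()
        p = k // 2
        self.gates = nn.Conv1d(in_ch + hid_ch, 2 * hid_ch, k, padding=p)
        self.cand = nn.Conv1d(in_ch + hid_ch, hid_ch, k, padding=p)

    def forward(self, x, h):
        # x: (B, in_ch, depth) current OCT profile; h: (B, hid_ch, depth) state.
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, dim=1)
        n = torch.tanh(self.cand(torch.cat([x, r * h], 1)))   # candidate state
        return (1 - z) * h + z * n                            # gated update
```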