October 20, 2019

Paper Group ANR 100

Efficient Large-Scale Domain Classification with Personalized Attention

Title Efficient Large-Scale Domain Classification with Personalized Attention
Authors Young-Bum Kim, Dongchan Kim, Anjishnu Kumar, Ruhi Sarikaya
Abstract In this paper, we explore the task of mapping spoken language utterances to one of thousands of natural language understanding domains in intelligent personal digital assistants (IPDAs). This scenario is observed for many mainstream IPDAs in industry that allow third parties to develop thousands of new domains to augment built-in ones to rapidly increase domain coverage and overall IPDA capabilities. We propose a scalable neural model architecture with a shared encoder, a novel attention mechanism that incorporates personalization information and domain-specific classifiers that solves the problem efficiently. Our architecture is designed to efficiently accommodate new domains that appear in-between full model retraining cycles with a rapid bootstrapping mechanism two orders of magnitude faster than retraining. We account for practical constraints in real-time production systems, and design to minimize memory footprint and runtime latency. We demonstrate that incorporating personalization results in significantly more accurate domain classification in the setting with thousands of overlapping domains.
Tasks
Published 2018-04-22
URL http://arxiv.org/abs/1804.08065v1
PDF http://arxiv.org/pdf/1804.08065v1.pdf
PWC https://paperswithcode.com/paper/efficient-large-scale-domain-classification
Repo
Framework

Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data

Title Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data
Authors Emilia Apostolova, R. Andrew Kreek
Abstract Industry datasets used for text classification are rarely created for that purpose. In most cases, the data and target predictions are a by-product of accumulated historical data, typically fraught with noise, present in both the text-based document, as well as in the targeted labels. In this work, we address the question of how well performance metrics computed on noisy, historical data reflect the performance on the intended future machine learning model input. The results demonstrate the utility of dirty training datasets used to build prediction models for cleaner (and different) prediction inputs.
Tasks Text Classification
Published 2018-09-11
URL http://arxiv.org/abs/1809.04019v1
PDF http://arxiv.org/pdf/1809.04019v1.pdf
PWC https://paperswithcode.com/paper/training-and-prediction-data-discrepancies
Repo
Framework

Characterizing and Avoiding Negative Transfer

Title Characterizing and Avoiding Negative Transfer
Authors Zirui Wang, Zihang Dai, Barnabás Póczos, Jaime Carbonell
Abstract When labeled data is scarce for a specific target task, transfer learning often offers an effective solution by utilizing data from a related source task. However, when transferring knowledge from a less related source, it may inversely hurt the target performance, a phenomenon known as negative transfer. Despite its pervasiveness, negative transfer is usually described in an informal manner, lacking rigorous definition, careful analysis, or systematic treatment. This paper proposes a formal definition of negative transfer and analyzes three important aspects thereof. Stemming from this analysis, a novel technique is proposed to circumvent negative transfer by filtering out unrelated source data. Based on adversarial networks, the technique is highly generic and can be applied to a wide range of transfer learning algorithms. The proposed approach is evaluated on six state-of-the-art deep transfer methods via experiments on four benchmark datasets with varying levels of difficulty. Empirically, the proposed method consistently improves the performance of all baseline methods and largely avoids negative transfer, even when the source data is degenerate.
Tasks Transfer Learning
Published 2018-11-24
URL https://arxiv.org/abs/1811.09751v4
PDF https://arxiv.org/pdf/1811.09751v4.pdf
PWC https://paperswithcode.com/paper/characterizing-and-avoiding-negative-transfer
Repo
Framework

Analysis of Noisy Evolutionary Optimization When Sampling Fails

Title Analysis of Noisy Evolutionary Optimization When Sampling Fails
Authors Chao Qian, Chao Bian, Yang Yu, Ke Tang, Xin Yao
Abstract In noisy evolutionary optimization, sampling is a common strategy to deal with noise. By the sampling strategy, the fitness of a solution is evaluated multiple times (called sample size) independently, and its true fitness is then approximated by the average of these evaluations. Previous studies on sampling are mainly empirical. In this paper, we first investigate the effect of sample size from a theoretical perspective. By analyzing the (1+1)-EA on the noisy LeadingOnes problem, we show that as the sample size increases, the running time can reduce from exponential to polynomial, but then return to exponential. This suggests that a proper sample size is crucial in practice. Then, we investigate what strategies can work when sampling with any fixed sample size fails. By two illustrative examples, we prove that using parent or offspring populations can be better. Finally, we construct an artificial noisy example to show that when using neither sampling nor populations is effective, adaptive sampling (i.e., sampling with an adaptive sample size) can work. This, for the first time, provides a theoretical support for the use of adaptive sampling.
Tasks
Published 2018-10-11
URL http://arxiv.org/abs/1810.05045v1
PDF http://arxiv.org/pdf/1810.05045v1.pdf
PWC https://paperswithcode.com/paper/analysis-of-noisy-evolutionary-optimization
Repo
Framework
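
The sampling strategy described in this abstract can be illustrated with a minimal sketch of a (1+1)-EA on LeadingOnes, where each fitness comparison averages `sample_size` noisy evaluations. The noise model used here (flip one random bit with probability `noise_prob` before evaluating) and all function and parameter names are assumptions made for illustration; they are not taken from the paper.

```python
import random


def leading_ones(bits):
    """Number of consecutive 1-bits counted from the left."""
    count = 0
    for b in bits:
        if b != 1:
            break
        count += 1
    return count


def noisy_fitness(bits, rng, noise_prob):
    # Assumed noise model (illustrative only): with probability noise_prob,
    # one uniformly chosen bit is flipped before the evaluation.
    if rng.random() < noise_prob:
        i = rng.randrange(len(bits))
        bits = bits[:i] + [1 - bits[i]] + bits[i + 1:]
    return leading_ones(bits)


def sampled_fitness(bits, rng, sample_size, noise_prob):
    """Average of sample_size independent noisy evaluations."""
    total = sum(noisy_fitness(bits, rng, noise_prob) for _ in range(sample_size))
    return total / sample_size


def one_plus_one_ea(n, sample_size, noise_prob, max_iters, seed=0):
    """Run a (1+1)-EA; return the final parent and the iterations used."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n)]
    for iteration in range(max_iters):
        # The stopping test uses the true (noise-free) fitness.
        if leading_ones(parent) == n:
            return parent, iteration
        # Standard bit-wise mutation: flip each bit with probability 1/n.
        child = [1 - b if rng.random() < 1.0 / n else b for b in parent]
        child_fit = sampled_fitness(child, rng, sample_size, noise_prob)
        parent_fit = sampled_fitness(parent, rng, sample_size, noise_prob)
        if child_fit >= parent_fit:
            parent = child
    return parent, max_iters
```

Calling `one_plus_one_ea` with different `sample_size` values on the same `n` and `noise_prob` is one way to observe informally how the number of iterations depends on the sample size.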

Right for the Right Reason: Training Agnostic Networks

Title Right for the Right Reason: Training Agnostic Networks
Authors Sen Jia, Thomas Lansdall-Welfare, Nello Cristianini
Abstract We consider the problem of a neural network being requested to classify images (or other inputs) without making implicit use of a “protected concept”, that is a concept that should not play any role in the decision of the network. Typically these concepts include information such as gender or race, or other contextual information such as image backgrounds that might be implicitly reflected in unknown correlations with other variables, making it insufficient to simply remove them from the input features. In other words, making accurate predictions is not good enough if those predictions rely on information that should not be used: predictive performance is not the only important metric for learning systems. We apply a method developed in the context of domain adaptation to address this problem of “being right for the right reason”, where we request a classifier to make a decision in a way that is entirely ‘agnostic’ to a given protected concept (e.g. gender, race, background etc.), even if this could be implicitly reflected in other attributes via unknown correlations. After defining the concept of an ‘agnostic model’, we demonstrate how the Domain-Adversarial Neural Network can remove unwanted information from a model using a gradient reversal layer.
Tasks Domain Adaptation
Published 2018-06-16
URL http://arxiv.org/abs/1806.06296v1
PDF http://arxiv.org/pdf/1806.06296v1.pdf
PWC https://paperswithcode.com/paper/right-for-the-right-reason-training-agnostic
Repo
Framework

Assessing Crosslingual Discourse Relations in Machine Translation

Title Assessing Crosslingual Discourse Relations in Machine Translation
Authors Karin Sim Smith, Lucia Specia
Abstract In an attempt to improve overall translation quality, there has been an increasing focus on integrating more linguistic elements into Machine Translation (MT). While significant progress has been achieved, especially recently with neural models, automatically evaluating the output of such systems is still an open problem. Current practice in MT evaluation relies on a single reference translation, even though there are many ways of translating a particular text, and it tends to disregard higher level information such as discourse. We propose a novel approach that assesses the translated output based on the source text rather than the reference translation, and measures the extent to which the semantics of the discourse elements (discourse relations, in particular) in the source text are preserved in the MT output. The challenge is to detect the discourse relations in the source text and determine whether these relations are correctly transferred crosslingually to the target language – without a reference translation. This methodology could be used independently for discourse-level evaluation, or as a component in other metrics, at a time where substantial amounts of MT are online and would benefit from evaluation where the source text serves as a benchmark.
Tasks Machine Translation
Published 2018-10-07
URL http://arxiv.org/abs/1810.03148v1
PDF http://arxiv.org/pdf/1810.03148v1.pdf
PWC https://paperswithcode.com/paper/assessing-crosslingual-discourse-relations-in
Repo
Framework

An Adaptive Approach for Automated Grapevine Phenotyping using VGG-based Convolutional Neural Networks

Title An Adaptive Approach for Automated Grapevine Phenotyping using VGG-based Convolutional Neural Networks
Authors Jonatan Grimm, Katja Herzog, Florian Rist, Anna Kicherer, Reinhard Töpfer, Volker Steinhage
Abstract In (grapevine) breeding programs and research, periodic phenotyping and multi-year monitoring of different grapevine traits, like growth or yield, is needed especially in the field. This demand implies objective, precise and automated methods using sensors and adaptive software. This work presents a proof-of-concept analyzing RGB images of different growth stages of grapevines with the aim to detect and quantify promising plant organs which are related to yield. The input images are segmented by a Fully Convolutional Neural Network (FCN) into object and background pixels. The objects are plant organs like young shoots, pedicels, flower buds or grapes, which are principally suitable for yield estimation. In the ground truth of the training images, each object is separately annotated as a connected segment of object pixels, which enables end-to-end learning of the object features. Based on the CNN-based segmentation, the number of objects is determined by detecting and counting connected components of object pixels using region labeling. In an evaluation on six different data sets, the system achieves an IoU of up to 87.3% for the segmentation and an F1 score of up to 88.6% for the object detection.
Tasks Object Detection
Published 2018-11-23
URL http://arxiv.org/abs/1811.09561v2
PDF http://arxiv.org/pdf/1811.09561v2.pdf
PWC https://paperswithcode.com/paper/an-adaptive-approach-for-automated-grapevine
Repo
Framework
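
The counting step mentioned in this abstract (connected components of object pixels) and the IoU metric can be sketched on a small binary mask. This is a minimal illustration, not the authors' pipeline: the use of 4-connectivity and the function names `count_objects` and `iou` are assumptions.

```python
from collections import deque


def count_objects(mask):
    """Count 4-connected components of 1-pixels in a binary mask."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # New unlabeled object pixel: flood-fill its whole component.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def iou(pred, truth):
    """Intersection over union of the object pixels in two binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                inter += 1
            if p == 1 or t == 1:
                union += 1
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0
```

For example, the mask `[[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 1]]` contains three separate objects under 4-connectivity.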

Combining Natural Gradient with Hessian Free Methods for Sequence Training

Title Combining Natural Gradient with Hessian Free Methods for Sequence Training
Authors Adnan Haider, P. C. Woodland
Abstract This paper presents a new optimisation approach to train Deep Neural Networks (DNNs) with discriminative sequence criteria. At each iteration, the method combines information from the Natural Gradient (NG) direction with local curvature information of the error surface that enables better paths on the parameter manifold to be traversed. The method is derived using an alternative derivation of Taylor’s theorem using the concepts of manifolds, tangent vectors and directional derivatives from the perspective of Information Geometry. The efficacy of the method is shown within a Hessian Free (HF) style optimisation framework to sequence train both standard fully-connected DNNs and Time Delay Neural Networks as speech recognition acoustic models. It is shown that for the same number of updates the proposed approach achieves larger reductions in the word error rate (WER) than both NG and HF, and also leads to a lower WER than standard stochastic gradient descent. The paper also addresses the issue of over-fitting due to mismatch between training criterion and Word Error Rate (WER) that primarily arises during sequence training of ReLU-DNN models.
Tasks Speech Recognition
Published 2018-10-03
URL http://arxiv.org/abs/1810.01873v1
PDF http://arxiv.org/pdf/1810.01873v1.pdf
PWC https://paperswithcode.com/paper/combining-natural-gradient-with-hessian-free
Repo
Framework

Factorized Distillation: Training Holistic Person Re-identification Model by Distilling an Ensemble of Partial ReID Models

Title Factorized Distillation: Training Holistic Person Re-identification Model by Distilling an Ensemble of Partial ReID Models
Authors Pengyuan Ren, Jianmin Li
Abstract Person re-identification (ReID) aims to identify the same person across videos captured by different cameras. Because networks that extract global features with ordinary architectures struggle to extract local features due to their weak attention mechanisms, researchers have proposed many elaborately designed ReID networks. While these greatly improve accuracy, model size and feature extraction latency are also soaring. We argue that a relatively compact ordinary network extracting globally pooled features has the capability to extract discriminative local features and can achieve state-of-the-art precision if only the model’s parameters are properly learnt. In order to reduce the difficulty in learning hard identity labels, we propose a novel knowledge distillation method: Factorized Distillation, which factorizes both feature maps and retrieval features of the holistic ReID network to mimic representations of multiple partial ReID models, thus transferring the knowledge from partial ReID models to the holistic network. Experiments show that a model trained with the proposed method can outperform the state of the art with relatively few network parameters.
Tasks Person Re-Identification
Published 2018-11-20
URL http://arxiv.org/abs/1811.08073v1
PDF http://arxiv.org/pdf/1811.08073v1.pdf
PWC https://paperswithcode.com/paper/factorized-distillation-training-holistic
Repo
Framework

Abstaining Classification When Error Costs are Unequal and Unknown

Title Abstaining Classification When Error Costs are Unequal and Unknown
Authors Hongjiao Guan, Yingtao Zhang, H. D. Cheng, Xianglong Tang
Abstract Abstaining classification aims to reject the easily misclassified examples rather than classify them, so it is an effective approach to increase classification reliability and reduce the misclassification risk in cost-sensitive applications. In such applications, different types of errors (false positive or false negative) usually have unequal costs, and the error costs, which depend on the specific application, are usually unknown. However, current abstaining classification methods either do not distinguish the error types, or they need the cost information of misclassification and rejection, which are realized in the framework of cost-sensitive learning. In this paper, we propose a bounded-abstention method with two constraints of reject rates (BA2), which performs abstaining classification when error costs are unequal and unknown. BA2 aims to obtain the optimal area under the ROC curve (AUC) by constraining the reject rates of the positive and negative classes respectively. Specifically, we construct the receiver operating characteristic (ROC) curve, and stepwise search the optimal reject thresholds from both ends of the curve, until the two constraints are satisfied. Experimental results show that BA2 obtains higher AUC and lower total cost than the state-of-the-art abstaining classification methods. Meanwhile, BA2 achieves controllable reject rates of the positive and negative classes.
Tasks
Published 2018-06-09
URL http://arxiv.org/abs/1806.03445v2
PDF http://arxiv.org/pdf/1806.03445v2.pdf
PWC https://paperswithcode.com/paper/abstaining-classification-when-error-costs
Repo
Framework
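
The idea of rejecting uncertain examples and tracking a separate reject rate per class can be sketched as below. This is only an illustration of two-threshold abstention and of the quantities the abstract constrains; it does not implement the BA2 threshold search, and the names `abstaining_predict`, `reject_rates`, `low` and `high` are hypothetical.

```python
def abstaining_predict(scores, low, high):
    """Return 1, 0, or None (reject) for each score.

    Scores at or above `high` are labeled positive, scores at or below
    `low` are labeled negative, and anything in between is rejected.
    """
    decisions = []
    for s in scores:
        if s >= high:
            decisions.append(1)
        elif s <= low:
            decisions.append(0)
        else:
            decisions.append(None)
    return decisions


def reject_rates(decisions, labels):
    """Fraction of rejected examples among true positives and true negatives."""
    pos = [d for d, y in zip(decisions, labels) if y == 1]
    neg = [d for d, y in zip(decisions, labels) if y == 0]
    pos_rate = sum(d is None for d in pos) / len(pos) if pos else 0.0
    neg_rate = sum(d is None for d in neg) / len(neg) if neg else 0.0
    return pos_rate, neg_rate
```

A threshold search in the spirit of the abstract would adjust `low` and `high` while checking that both values returned by `reject_rates` stay within their chosen bounds.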

DeePathology: Deep Multi-Task Learning for Inferring Molecular Pathology from Cancer Transcriptome

Title DeePathology: Deep Multi-Task Learning for Inferring Molecular Pathology from Cancer Transcriptome
Authors Behrooz Azarkhalili, Ali Saberi, Hamidreza Chitsaz, Ali Sharifi-Zarchi
Abstract Despite great advances, molecular cancer pathology is often limited to the use of a small number of biomarkers rather than the whole transcriptome, partly due to computational challenges. Here, we introduce a novel architecture of Deep Neural Networks (DNNs) that is capable of simultaneous inference of various properties of biological samples, through multi-task and transfer learning. It encodes the whole transcription profile into a strikingly low-dimensional latent vector of size 8, and then recovers mRNA and miRNA expression profiles, tissue and disease type from this vector. This latent space is significantly better than the original gene expression profiles for discriminating samples based on their tissue and disease. We employed this architecture on mRNA transcription profiles of 10787 clinical samples from 34 classes (one healthy and 33 different types of cancer) from 27 tissues. Our method significantly outperforms prior works and classical machine learning approaches in predicting tissue-of-origin, normal or disease state and cancer type of each sample. For tissues with more than one type of cancer, it reaches 99.4% accuracy in identifying the correct cancer subtype. We also show this system is very robust against noise and missing values. Collectively, our results highlight applications of artificial intelligence in molecular cancer pathology and oncological research. DeePathology is freely available at https://github.com/SharifBioinf/DeePathology.
Tasks Multi-Task Learning, Transfer Learning
Published 2018-08-07
URL https://arxiv.org/abs/1808.02237v2
PDF https://arxiv.org/pdf/1808.02237v2.pdf
PWC https://paperswithcode.com/paper/inferring-molecular-pathology-and-micro-rna
Repo
Framework

Improved Adaptive Brovey as a New Method for Image Fusion

Title Improved Adaptive Brovey as a New Method for Image Fusion
Authors Hamid Reza Shahdoosti
Abstract An ideal fusion method preserves the spectral information in the fused image and adds spatial information to it with no spectral distortion. Among the existing fusion algorithms, the contourlet-based fusion method is the most frequently discussed one in recent publications, because the contourlet has the ability to capture and link the points of discontinuity to form a linear structure. The Brovey is a popular pan-sharpening method owing to its efficiency and high spatial resolution. This method can be explained by a mathematical model of optical remote sensing sensors. This study presents a new fusion approach that integrates the advantages of both the Brovey and the contourlet techniques to reduce the color distortion of fusion results. Visual and statistical analyses show that the proposed algorithm clearly improves the merging quality in terms of correlation coefficient, ERGAS, UIQI, and Q4, compared to fusion methods including IHS, PCA, Adaptive IHS, and Improved Adaptive PCA.
Tasks
Published 2018-07-24
URL http://arxiv.org/abs/1807.09610v1
PDF http://arxiv.org/pdf/1807.09610v1.pdf
PWC https://paperswithcode.com/paper/improved-adaptive-brovey-as-a-new-method-for
Repo
Framework
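
For orientation, the classic Brovey transform that this abstract builds on scales each multispectral band by the ratio of the panchromatic value to the sum of the multispectral bands. The sketch below applies that textbook form to a single pixel; it is not the improved adaptive variant proposed in the paper, and the function name `brovey_pixel` is hypothetical.

```python
def brovey_pixel(ms_bands, pan):
    """Classic Brovey fusion for one pixel.

    ms_bands: multispectral band values at this pixel (resampled to the
              panchromatic grid).
    pan:      panchromatic value at the same pixel.
    """
    total = sum(ms_bands)
    if total == 0:
        # No spectral signal to redistribute; return zeros to avoid
        # dividing by zero.
        return [0.0 for _ in ms_bands]
    return [band * pan / total for band in ms_bands]
```

With this normalization the fused bands keep the ratios between the original bands while their sum equals the panchromatic value, which is why the method sharpens spatial detail but can distort color.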

Towards Language Agnostic Universal Representations

Title Towards Language Agnostic Universal Representations
Authors Armen Aghajanyan, Xia Song, Saurabh Tiwary
Abstract When a bilingual student learns to solve word problems in math, we expect the student to be able to solve these problems in both languages the student is fluent in, even if the math lessons were only taught in one language. However, current representations in machine learning are language dependent. In this work, we present a method to decouple the language from the problem by learning language agnostic representations, thereby allowing a model to be trained in one language and applied to a different one in a zero shot fashion. We learn these representations by taking inspiration from linguistics and formalizing Universal Grammar as an optimization process (Chomsky, 2014; Montague, 1970). We demonstrate the capabilities of these representations by showing that the models trained on a single language using language agnostic representations achieve very similar accuracies in other languages.
Tasks
Published 2018-09-23
URL http://arxiv.org/abs/1809.08510v1
PDF http://arxiv.org/pdf/1809.08510v1.pdf
PWC https://paperswithcode.com/paper/towards-language-agnostic-universal
Repo
Framework

Deep feature compression for collaborative object detection

Title Deep feature compression for collaborative object detection
Authors Hyomin Choi, Ivan V. Bajic
Abstract Recent studies have shown that the efficiency of deep neural networks in mobile applications can be significantly improved by distributing the computational workload between the mobile device and the cloud. This paradigm, termed collaborative intelligence, involves communicating feature data between the mobile device and the cloud. The efficiency of such an approach can be further improved by lossy compression of feature data, which has not been examined to date. In this work we focus on collaborative object detection and study the impact of both near-lossless and lossy compression of feature data on its accuracy. We also propose a strategy for improving the accuracy under lossy feature compression. Experiments indicate that using this strategy, the communication overhead can be reduced by up to 70% without sacrificing accuracy.
Tasks Object Detection
Published 2018-02-12
URL http://arxiv.org/abs/1802.03931v1
PDF http://arxiv.org/pdf/1802.03931v1.pdf
PWC https://paperswithcode.com/paper/deep-feature-compression-for-collaborative
Repo
Framework

Hierarchical Neural Network Architecture In Keyword Spotting

Title Hierarchical Neural Network Architecture In Keyword Spotting
Authors Yixiao Qu, Sihao Xue, Zhenyi Ying, Hang Zhou, Jue Sun
Abstract Keyword Spotting (KWS) provides the start signal for ASR, and thus it is essential to ensure a high recall rate. However, its real-time nature requires low computational complexity. This tension motivates the search for a model that is small yet performs well in multiple environments. To address it, we implement the Hierarchical Neural Network (HNN), which has proved effective in many speech recognition problems. HNN outperforms traditional DNN and CNN even though its model size and computational complexity are slightly lower. Also, its simple topology makes it easy to deploy on any device.
Tasks Keyword Spotting, Speech Recognition
Published 2018-11-06
URL http://arxiv.org/abs/1811.02320v1
PDF http://arxiv.org/pdf/1811.02320v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-neural-network-architecture-in
Repo
Framework