Paper Group ANR 100
Efficient Large-Scale Domain Classification with Personalized Attention
Title | Efficient Large-Scale Domain Classification with Personalized Attention |
Authors | Young-Bum Kim, Dongchan Kim, Anjishnu Kumar, Ruhi Sarikaya |
Abstract | In this paper, we explore the task of mapping spoken language utterances to one of thousands of natural language understanding domains in intelligent personal digital assistants (IPDAs). This scenario arises in many mainstream IPDAs in industry that allow third parties to develop thousands of new domains to augment built-in ones, rapidly increasing domain coverage and overall IPDA capabilities. We propose a scalable neural model architecture, consisting of a shared encoder, a novel attention mechanism that incorporates personalization information, and domain-specific classifiers, that solves the problem efficiently. Our architecture is designed to efficiently accommodate new domains that appear between full model retraining cycles, using a rapid bootstrapping mechanism two orders of magnitude faster than retraining. We account for practical constraints in real-time production systems, and design the system to minimize memory footprint and runtime latency. We demonstrate that incorporating personalization results in significantly more accurate domain classification in the setting with thousands of overlapping domains. |
Tasks | |
Published | 2018-04-22 |
URL | http://arxiv.org/abs/1804.08065v1 |
PDF | http://arxiv.org/pdf/1804.08065v1.pdf |
PWC | https://paperswithcode.com/paper/efficient-large-scale-domain-classification |
Repo | |
Framework | |
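The abstract mentions an attention mechanism that incorporates personalization information but does not give its equations. As a minimal sketch, assuming plain dot-product attention over embeddings of the domains a user has enabled (the function names and the choice of personalization signal are hypothetical, not the authors' formulation), a personalized context vector could be computed like this:

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def personalized_context(utterance: List[float], enabled_domains: List[List[float]]) -> List[float]:
    """Hypothetical dot-product attention over the embeddings of a user's enabled domains.

    Scores each enabled domain against the utterance encoding, normalizes the scores
    with softmax, and returns the weighted sum of the domain embeddings.
    """
    if not enabled_domains:
        return [0.0] * len(utterance)  # no personalization signal available
    weights = softmax([dot(utterance, d) for d in enabled_domains])
    dim = len(enabled_domains[0])
    return [sum(w * d[i] for w, d in zip(weights, enabled_domains)) for i in range(dim)]
```

In a full model, such a context vector would typically be combined with the shared encoder output before the domain-specific classifiers score each domain; how the paper combines them is not stated in the abstract.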
Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data
Title | Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data |
Authors | Emilia Apostolova, R. Andrew Kreek |
Abstract | Industry datasets used for text classification are rarely created for that purpose. In most cases, the data and target predictions are a by-product of accumulated historical data, typically fraught with noise present in both the text-based documents and the target labels. In this work, we address the question of how well performance metrics computed on noisy, historical data reflect the performance on the intended future model inputs. The results demonstrate the utility of dirty training datasets used to build prediction models for cleaner (and different) prediction inputs. |
Tasks | Text Classification |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.04019v1 |
PDF | http://arxiv.org/pdf/1809.04019v1.pdf |
PWC | https://paperswithcode.com/paper/training-and-prediction-data-discrepancies |
Repo | |
Framework | |
Characterizing and Avoiding Negative Transfer
Title | Characterizing and Avoiding Negative Transfer |
Authors | Zirui Wang, Zihang Dai, Barnabás Póczos, Jaime Carbonell |
Abstract | When labeled data is scarce for a specific target task, transfer learning often offers an effective solution by utilizing data from a related source task. However, transferring knowledge from a less related source may instead hurt target performance, a phenomenon known as negative transfer. Despite its pervasiveness, negative transfer is usually described in an informal manner, lacking rigorous definition, careful analysis, or systematic treatment. This paper proposes a formal definition of negative transfer and analyzes three important aspects thereof. Stemming from this analysis, a novel technique is proposed to circumvent negative transfer by filtering out unrelated source data. Based on adversarial networks, the technique is highly generic and can be applied to a wide range of transfer learning algorithms. The proposed approach is evaluated on six state-of-the-art deep transfer methods via experiments on four benchmark datasets with varying levels of difficulty. Empirically, the proposed method consistently improves the performance of all baseline methods and largely avoids negative transfer, even when the source data is degenerate. |
Tasks | Transfer Learning |
Published | 2018-11-24 |
URL | https://arxiv.org/abs/1811.09751v4 |
PDF | https://arxiv.org/pdf/1811.09751v4.pdf |
PWC | https://paperswithcode.com/paper/characterizing-and-avoiding-negative-transfer |
Repo | |
Framework | |
Analysis of Noisy Evolutionary Optimization When Sampling Fails
Title | Analysis of Noisy Evolutionary Optimization When Sampling Fails |
Authors | Chao Qian, Chao Bian, Yang Yu, Ke Tang, Xin Yao |
Abstract | In noisy evolutionary optimization, sampling is a common strategy for dealing with noise. Under the sampling strategy, the fitness of a solution is evaluated independently multiple times (the number of evaluations is called the sample size), and its true fitness is then approximated by the average of these evaluations. Previous studies on sampling are mainly empirical. In this paper, we first investigate the effect of sample size from a theoretical perspective. By analyzing the (1+1)-EA on the noisy LeadingOnes problem, we show that as the sample size increases, the running time can reduce from exponential to polynomial, but then return to exponential. This suggests that a proper sample size is crucial in practice. Then, we investigate what strategies can work when sampling with any fixed sample size fails. Through two illustrative examples, we prove that using parent or offspring populations can be better. Finally, we construct an artificial noisy example to show that when neither sampling nor populations are effective, adaptive sampling (i.e., sampling with an adaptive sample size) can work. This, for the first time, provides theoretical support for the use of adaptive sampling. |
Tasks | |
Published | 2018-10-11 |
URL | http://arxiv.org/abs/1810.05045v1 |
PDF | http://arxiv.org/pdf/1810.05045v1.pdf |
PWC | https://paperswithcode.com/paper/analysis-of-noisy-evolutionary-optimization |
Repo | |
Framework | |
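The sampling strategy described in this abstract can be made concrete with a small simulation of the (1+1)-EA on LeadingOnes. This is a sketch under stated assumptions: the one-bit noise model (with probability `p`, one uniformly random bit is flipped before evaluation), the re-evaluation of the parent in every iteration, and all function names are illustrative choices, not necessarily the exact setting analyzed in the paper.

```python
import random
from typing import List, Optional


def leading_ones(x: List[int]) -> int:
    """True LeadingOnes fitness: number of consecutive 1-bits from the left."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def noisy_leading_ones(x: List[int], p: float, rng: random.Random) -> int:
    """Assumed one-bit noise: with probability p, evaluate a copy with one random bit flipped."""
    if rng.random() < p:
        y = list(x)
        i = rng.randrange(len(y))
        y[i] = 1 - y[i]
        return leading_ones(y)
    return leading_ones(x)


def sampled_fitness(x: List[int], p: float, m: int, rng: random.Random) -> float:
    """Sampling strategy: average of m independent noisy evaluations (m is the sample size)."""
    return sum(noisy_leading_ones(x, p, rng) for _ in range(m)) / m


def one_plus_one_ea(n: int, p: float, m: int, max_iters: int, seed: int = 0) -> Optional[int]:
    """(1+1)-EA with sampling; returns the iteration at which the true optimum is reached, else None."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    for t in range(1, max_iters + 1):
        # Standard bit-wise mutation: flip each bit independently with probability 1/n.
        y = [1 - b if rng.random() < 1 / n else b for b in x]
        # Parent and offspring are both (re-)evaluated with sampling in every iteration.
        if sampled_fitness(y, p, m, rng) >= sampled_fitness(x, p, m, rng):
            x = y
        if leading_ones(x) == n:
            return t
    return None
```

Varying `m` for a fixed `p` lets one observe empirically how the sample size affects running time. The paper's result is a theoretical statement about expected running time, so a handful of runs of this toy should not be taken as reproducing it.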
Right for the Right Reason: Training Agnostic Networks
Title | Right for the Right Reason: Training Agnostic Networks |
Authors | Sen Jia, Thomas Lansdall-Welfare, Nello Cristianini |
Abstract | We consider the problem of a neural network being requested to classify images (or other inputs) without making implicit use of a “protected concept”, that is, a concept that should not play any role in the decision of the network. Typically these concepts include information such as gender or race, or other contextual information such as image backgrounds that might be implicitly reflected in unknown correlations with other variables, making it insufficient to simply remove them from the input features. In other words, making accurate predictions is not good enough if those predictions rely on information that should not be used: predictive performance is not the only important metric for learning systems. We apply a method developed in the context of domain adaptation to address this problem of “being right for the right reason”, where we require a classifier to make its decision in a way that is entirely ‘agnostic’ to a given protected concept (e.g. gender, race, or background), even if this concept could be implicitly reflected in other attributes via unknown correlations. After defining the concept of an ‘agnostic model’, we demonstrate how the Domain-Adversarial Neural Network can remove unwanted information from a model using a gradient reversal layer. |
Tasks | Domain Adaptation |
Published | 2018-06-16 |
URL | http://arxiv.org/abs/1806.06296v1 |
PDF | http://arxiv.org/pdf/1806.06296v1.pdf |
PWC | https://paperswithcode.com/paper/right-for-the-right-reason-training-agnostic |
Repo | |
Framework | |
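The gradient reversal layer mentioned in this abstract is the identity in the forward pass and multiplies gradients by a negative factor in the backward pass, so the shared feature extractor is pushed to increase, rather than decrease, the loss of the branch that tries to predict the protected concept. The plain-Python sketch below shows only that gradient bookkeeping; the class and function names and the scaling factor `lam` are hypothetical, and a real implementation would hook into a framework's automatic differentiation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradientReversal:
    """Identity on the forward pass; scales incoming gradients by -lam on the backward pass."""
    lam: float = 1.0

    def forward(self, features: List[float]) -> List[float]:
        return list(features)

    def backward(self, grad_output: List[float]) -> List[float]:
        return [-self.lam * g for g in grad_output]


def feature_gradient(task_grad: List[float], adversary_grad: List[float],
                     grl: GradientReversal) -> List[float]:
    """Gradient reaching the shared features: task gradient plus the reversed adversary gradient."""
    reversed_grad = grl.backward(adversary_grad)
    return [a + b for a, b in zip(task_grad, reversed_grad)]
```

Because the adversary's gradient arrives with its sign flipped, a gradient-descent step on the shared features moves them in the direction that makes the protected concept harder to predict, while the adversary's own parameters are still trained normally on its loss.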
Assessing Crosslingual Discourse Relations in Machine Translation
Title | Assessing Crosslingual Discourse Relations in Machine Translation |
Authors | Karin Sim Smith, Lucia Specia |
Abstract | In an attempt to improve overall translation quality, there has been an increasing focus on integrating more linguistic elements into Machine Translation (MT). While significant progress has been achieved, especially recently with neural models, automatically evaluating the output of such systems is still an open problem. Current practice in MT evaluation relies on a single reference translation, even though there are many ways of translating a particular text, and it tends to disregard higher-level information such as discourse. We propose a novel approach that assesses the translated output based on the source text rather than the reference translation, and measures the extent to which the semantics of the discourse elements (discourse relations, in particular) in the source text are preserved in the MT output. The challenge is to detect the discourse relations in the source text and determine whether these relations are correctly transferred crosslingually to the target language, without a reference translation. This methodology could be used independently for discourse-level evaluation, or as a component in other metrics, at a time when substantial amounts of MT are online and would benefit from evaluation where the source text serves as a benchmark. |
Tasks | Machine Translation |
Published | 2018-10-07 |
URL | http://arxiv.org/abs/1810.03148v1 |
PDF | http://arxiv.org/pdf/1810.03148v1.pdf |
PWC | https://paperswithcode.com/paper/assessing-crosslingual-discourse-relations-in |
Repo | |
Framework | |
An Adaptive Approach for Automated Grapevine Phenotyping using VGG-based Convolutional Neural Networks
Title | An Adaptive Approach for Automated Grapevine Phenotyping using VGG-based Convolutional Neural Networks |
Authors | Jonatan Grimm, Katja Herzog, Florian Rist, Anna Kicherer, Reinhard Töpfer, Volker Steinhage |
Abstract | In (grapevine) breeding programs and research, periodic phenotyping and multi-year monitoring of different grapevine traits, like growth or yield, are needed, especially in the field. This demand calls for objective, precise and automated methods using sensors and adaptive software. This work presents a proof-of-concept analyzing RGB images of different growth stages of grapevines with the aim of detecting and quantifying promising plant organs which are related to yield. The input images are segmented by a Fully Convolutional Neural Network (FCN) into object and background pixels. The objects are plant organs like young shoots, pedicels, flower buds or grapes, which are in principle suitable for yield estimation. In the ground truth of the training images, each object is separately annotated as a connected segment of object pixels, which enables end-to-end learning of the object features. Based on the CNN-based segmentation, the number of objects is determined by detecting and counting connected components of object pixels using region labeling. In an evaluation on six different data sets, the system achieves an IoU of up to 87.3% for the segmentation and an F1 score of up to 88.6% for the object detection. |
Tasks | Object Detection |
Published | 2018-11-23 |
URL | http://arxiv.org/abs/1811.09561v2 |
PDF | http://arxiv.org/pdf/1811.09561v2.pdf |
PWC | https://paperswithcode.com/paper/an-adaptive-approach-for-automated-grapevine |
Repo | |
Framework | |
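The counting step described in this abstract (region labeling of connected object pixels) can be sketched as breadth-first search over a binary segmentation mask. This is an illustrative sketch, assuming 4-connectivity and a hypothetical minimum-size filter `min_pixels`; the abstract does not specify the connectivity or any filtering used by the authors.

```python
from collections import deque
from typing import List, Tuple


def label_components(mask: List[List[int]]) -> Tuple[List[List[int]], int]:
    """4-connected region labeling of a binary mask; returns (label image, number of regions)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                current += 1  # start a new region at an unlabeled object pixel
                labels[r][c] = current
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current


def count_objects(mask: List[List[int]], min_pixels: int = 1) -> int:
    """Count regions with at least min_pixels pixels (hypothetical noise filter)."""
    labels, n = label_components(mask)
    sizes = [0] * (n + 1)
    for row in labels:
        for lab in row:
            sizes[lab] += 1
    return sum(1 for lab in range(1, n + 1) if sizes[lab] >= min_pixels)
```

Switching to 8-connectivity (adding the four diagonal offsets) would merge diagonally touching pixels into one object, which changes the count for thin or touching organs.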
Combining Natural Gradient with Hessian Free Methods for Sequence Training
Title | Combining Natural Gradient with Hessian Free Methods for Sequence Training |
Authors | Adnan Haider, P. C. Woodland |
Abstract | This paper presents a new optimisation approach to train Deep Neural Networks (DNNs) with discriminative sequence criteria. At each iteration, the method combines information from the Natural Gradient (NG) direction with local curvature information of the error surface that enables better paths on the parameter manifold to be traversed. The method is derived using an alternative derivation of Taylor’s theorem based on the concepts of manifolds, tangent vectors and directional derivatives from the perspective of Information Geometry. The efficacy of the method is shown within a Hessian Free (HF) style optimisation framework to sequence train both standard fully-connected DNNs and Time Delay Neural Networks as speech recognition acoustic models. It is shown that for the same number of updates the proposed approach achieves larger reductions in the word error rate (WER) than both NG and HF, and also leads to a lower WER than standard stochastic gradient descent. The paper also addresses the issue of over-fitting due to mismatch between the training criterion and WER that primarily arises during sequence training of ReLU-DNN models. |
Tasks | Speech Recognition |
Published | 2018-10-03 |
URL | http://arxiv.org/abs/1810.01873v1 |
PDF | http://arxiv.org/pdf/1810.01873v1.pdf |
PWC | https://paperswithcode.com/paper/combining-natural-gradient-with-hessian-free |
Repo | |
Framework | |
Factorized Distillation: Training Holistic Person Re-identification Model by Distilling an Ensemble of Partial ReID Models
Title | Factorized Distillation: Training Holistic Person Re-identification Model by Distilling an Ensemble of Partial ReID Models |
Authors | Pengyuan Ren, Jianmin Li |
Abstract | Person re-identification (ReID) aims to identify the same person across videos captured by different cameras. Because networks that extract global features with ordinary architectures have difficulty extracting local features owing to their weak attention mechanisms, researchers have proposed many elaborately designed ReID networks; while these greatly improve accuracy, their model size and feature extraction latency have also soared. We argue that a relatively compact ordinary network extracting globally pooled features is capable of extracting discriminative local features and can achieve state-of-the-art precision, provided its parameters are properly learnt. To reduce the difficulty of learning hard identity labels, we propose a novel knowledge distillation method, Factorized Distillation, which factorizes both the feature maps and the retrieval features of the holistic ReID network to mimic the representations of multiple partial ReID models, thus transferring knowledge from the partial ReID models to the holistic network. Experiments show that a model trained with the proposed method can outperform the state of the art with relatively few network parameters. |
Tasks | Person Re-Identification |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08073v1 |
PDF | http://arxiv.org/pdf/1811.08073v1.pdf |
PWC | https://paperswithcode.com/paper/factorized-distillation-training-holistic |
Repo | |
Framework | |
Abstaining Classification When Error Costs are Unequal and Unknown
Title | Abstaining Classification When Error Costs are Unequal and Unknown |
Authors | Hongjiao Guan, Yingtao Zhang, H. D. Cheng, Xianglong Tang |
Abstract | Abstaining classification aims to refrain from classifying examples that are easily misclassified, so it is an effective approach to increasing classification reliability and reducing misclassification risk in cost-sensitive applications. In such applications, different types of errors (false positive or false negative) usually have unequal costs. Moreover, the error costs, which depend on specific applications, are usually unknown. However, current abstaining classification methods either do not distinguish the error types, or they require cost information for misclassification and rejection, being formulated within the framework of cost-sensitive learning. In this paper, we propose a bounded-abstention method with two constraints of reject rates (BA2), which performs abstaining classification when error costs are unequal and unknown. BA2 aims to obtain the optimal area under the ROC curve (AUC) by constraining the reject rates of the positive and negative classes respectively. Specifically, we construct the receiver operating characteristic (ROC) curve, and search stepwise for the optimal reject thresholds from both ends of the curve, until the two constraints are satisfied. Experimental results show that BA2 obtains higher AUC and lower total cost than the state-of-the-art abstaining classification methods. Meanwhile, BA2 achieves controllable reject rates of the positive and negative classes. |
Tasks | |
Published | 2018-06-09 |
URL | http://arxiv.org/abs/1806.03445v2 |
PDF | http://arxiv.org/pdf/1806.03445v2.pdf |
PWC | https://paperswithcode.com/paper/abstaining-classification-when-error-costs |
Repo | |
Framework | |
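BA2, as summarized in this abstract, looks for two reject thresholds subject to separate reject-rate constraints for the positive and negative classes. The sketch below is a simplified, brute-force stand-in rather than the paper's stepwise ROC search: it assumes that examples with scores strictly between `lo` and `hi` are rejected, tries every pair of observed scores as thresholds, and keeps the pair that maximizes the AUC of the accepted examples (computed with the Mann-Whitney pair-counting formula) while respecting both constraints. All names are hypothetical, and both classes are assumed to be present.

```python
from itertools import combinations_with_replacement
from typing import Optional, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """Mann-Whitney AUC: fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def reject_rates(scores, labels, lo, hi) -> Tuple[float, float]:
    """Fraction of positives and of negatives whose score falls strictly between lo and hi."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rej_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and lo < s < hi)
    rej_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and lo < s < hi)
    return rej_pos / n_pos, rej_neg / n_neg


def search_thresholds(scores, labels, max_rej_pos, max_rej_neg):
    """Brute-force search over threshold pairs lo <= hi; returns (auc, lo, hi) or None."""
    best = None
    for lo, hi in combinations_with_replacement(sorted(set(scores)), 2):
        rp, rn = reject_rates(scores, labels, lo, hi)
        if rp > max_rej_pos or rn > max_rej_neg:
            continue  # violates one of the two reject-rate constraints
        kept = [(s, y) for s, y in zip(scores, labels) if not lo < s < hi]
        value = auc([s for s, _ in kept], [y for _, y in kept])
        if value is not None and (best is None or value > best[0]):
            best = (value, lo, hi)
    return best
```

The search is cubic in the number of examples, which is fine for illustration but not for real data; the stepwise search from both ends of the ROC curve described in the abstract avoids enumerating every pair.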
DeePathology: Deep Multi-Task Learning for Inferring Molecular Pathology from Cancer Transcriptome
Title | DeePathology: Deep Multi-Task Learning for Inferring Molecular Pathology from Cancer Transcriptome |
Authors | Behrooz Azarkhalili, Ali Saberi, Hamidreza Chitsaz, Ali Sharifi-Zarchi |
Abstract | Despite great advances, molecular cancer pathology is often limited to the use of a small number of biomarkers rather than the whole transcriptome, partly due to computational challenges. Here, we introduce a novel architecture of Deep Neural Networks (DNNs) that is capable of simultaneous inference of various properties of biological samples, through multi-task and transfer learning. It encodes the whole transcription profile into a strikingly low-dimensional latent vector of size 8, and then recovers mRNA and miRNA expression profiles, tissue and disease type from this vector. This latent space is significantly better than the original gene expression profiles for discriminating samples based on their tissue and disease. We employed this architecture on mRNA transcription profiles of 10787 clinical samples from 34 classes (one healthy and 33 different types of cancer) from 27 tissues. Our method significantly outperforms prior works and classical machine learning approaches in predicting tissue-of-origin, normal or disease state and cancer type of each sample. For tissues with more than one type of cancer, it reaches 99.4% accuracy in identifying the correct cancer subtype. We also show this system is very robust against noise and missing values. Collectively, our results highlight applications of artificial intelligence in molecular cancer pathology and oncological research. DeePathology is freely available at \url{https://github.com/SharifBioinf/DeePathology}. |
Tasks | Multi-Task Learning, Transfer Learning |
Published | 2018-08-07 |
URL | https://arxiv.org/abs/1808.02237v2 |
PDF | https://arxiv.org/pdf/1808.02237v2.pdf |
PWC | https://paperswithcode.com/paper/inferring-molecular-pathology-and-micro-rna |
Repo | |
Framework | |
Improved Adaptive Brovey as a New Method for Image Fusion
Title | Improved Adaptive Brovey as a New Method for Image Fusion |
Authors | Hamid Reza Shahdoosti |
Abstract | An ideal fusion method preserves the spectral information in the fused image and adds spatial information to it with no spectral distortion. Among the existing fusion algorithms, the contourlet-based fusion method is the most frequently discussed one in recent publications, because the contourlet has the ability to capture and link points of discontinuity to form a linear structure. The Brovey transform is a popular pan-sharpening method owing to its efficiency and high spatial resolution. This method can be explained by a mathematical model of optical remote sensing sensors. This study presents a new fusion approach that integrates the advantages of both the Brovey and the contourlet techniques to reduce the color distortion of fusion results. Visual and statistical analyses show that, compared to fusion methods including IHS, PCA, Adaptive IHS, and Improved Adaptive PCA, the proposed algorithm clearly improves the merging quality in terms of correlation coefficient, ERGAS, UIQI, and Q4. |
Tasks | |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.09610v1 |
PDF | http://arxiv.org/pdf/1807.09610v1.pdf |
PWC | https://paperswithcode.com/paper/improved-adaptive-brovey-as-a-new-method-for |
Repo | |
Framework | |
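For reference, the classic Brovey transform scales each multispectral band (resampled to the panchromatic grid) by the ratio of the panchromatic value to the sum of the multispectral bands at each pixel. The sketch below implements only that classic formula on flattened, co-registered images; it does not include the contourlet component or the adaptive improvements proposed in the paper, and the function name is hypothetical.

```python
from typing import List, Sequence


def brovey_fuse(ms_bands: Sequence[Sequence[float]], pan: Sequence[float]) -> List[List[float]]:
    """Classic Brovey pan-sharpening.

    ms_bands[k][i] is band k at pixel i, already resampled to the panchromatic grid;
    pan[i] is the panchromatic value at pixel i. Returns fused bands in the same layout.
    """
    fused = [[0.0] * len(pan) for _ in ms_bands]
    for i, p in enumerate(pan):
        total = sum(band[i] for band in ms_bands)
        if total == 0:
            continue  # leave the fused pixel at 0 to avoid division by zero
        for k, band in enumerate(ms_bands):
            # Each band keeps its share of the multispectral sum, rescaled to the pan intensity.
            fused[k][i] = band[i] * p / total
    return fused
```

Because each band keeps its proportion of the per-pixel sum, the band ratios (and hence hue) are preserved while intensity is replaced by the panchromatic value; the fused bands at each pixel sum to the panchromatic value. Some formulations use the band mean instead of the sum, which only rescales the output.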
Towards Language Agnostic Universal Representations
Title | Towards Language Agnostic Universal Representations |
Authors | Armen Aghajanyan, Xia Song, Saurabh Tiwary |
Abstract | When a bilingual student learns to solve word problems in math, we expect the student to be able to solve these problems in both languages the student is fluent in, even if the math lessons were only taught in one language. However, current representations in machine learning are language dependent. In this work, we present a method to decouple the language from the problem by learning language agnostic representations, thereby allowing a model to be trained in one language and applied to a different one in a zero-shot fashion. We learn these representations by taking inspiration from linguistics and formalizing Universal Grammar as an optimization process (Chomsky, 2014; Montague, 1970). We demonstrate the capabilities of these representations by showing that models trained on a single language using language agnostic representations achieve very similar accuracies in other languages. |
Tasks | |
Published | 2018-09-23 |
URL | http://arxiv.org/abs/1809.08510v1 |
PDF | http://arxiv.org/pdf/1809.08510v1.pdf |
PWC | https://paperswithcode.com/paper/towards-language-agnostic-universal |
Repo | |
Framework | |
Deep feature compression for collaborative object detection
Title | Deep feature compression for collaborative object detection |
Authors | Hyomin Choi, Ivan V. Bajic |
Abstract | Recent studies have shown that the efficiency of deep neural networks in mobile applications can be significantly improved by distributing the computational workload between the mobile device and the cloud. This paradigm, termed collaborative intelligence, involves communicating feature data between the mobile device and the cloud. The efficiency of such an approach can be further improved by lossy compression of feature data, which has not been examined to date. In this work we focus on collaborative object detection and study the impact of both near-lossless and lossy compression of feature data on its accuracy. We also propose a strategy for improving the accuracy under lossy feature compression. Experiments indicate that using this strategy, the communication overhead can be reduced by up to 70% without sacrificing accuracy. |
Tasks | Object Detection |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1802.03931v1 |
PDF | http://arxiv.org/pdf/1802.03931v1.pdf |
PWC | https://paperswithcode.com/paper/deep-feature-compression-for-collaborative |
Repo | |
Framework | |
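A common first step before compressing intermediate feature data is uniform scalar quantization of the floating-point values to a small number of bits. The sketch below is a generic min-max quantizer, not the compression scheme evaluated in the paper; the function names and the 8-bit default are assumptions.

```python
from typing import List, Tuple


def quantize(features: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Min-max uniform quantization to integer codes in [0, 2**bits - 1].

    Returns (codes, offset, step); the decoder needs offset and step to reconstruct.
    """
    lo, hi = min(features), max(features)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0  # constant input: any step works
    codes = [round((f - lo) / step) for f in features]
    return codes, lo, step


def dequantize(codes: List[int], lo: float, step: float) -> List[float]:
    """Map integer codes back to the centers of their quantization bins."""
    return [lo + c * step for c in codes]
```

Each reconstructed value lies within half a quantization step of the original. Storing 8-bit codes instead of 32-bit floats shrinks the raw payload by a factor of 4 before any further entropy or image coding is applied.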
Hierarchical Neural Network Architecture In Keyword Spotting
Title | Hierarchical Neural Network Architecture In Keyword Spotting |
Authors | Yixiao Qu, Sihao Xue, Zhenyi Ying, Hang Zhou, Jue Sun |
Abstract | Keyword Spotting (KWS) provides the start signal for automatic speech recognition (ASR), so a high recall rate is essential. However, its real-time nature requires low computational complexity. This tension motivates the search for a model that is small enough yet performs well in multiple environments. To resolve it, we implement the Hierarchical Neural Network (HNN), which has proved effective in many speech recognition problems. HNN outperforms traditional DNN and CNN models even though its model size and computational complexity are slightly lower. Also, its simple topology makes it easy to deploy on any device. |
Tasks | Keyword Spotting, Speech Recognition |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02320v1 |
PDF | http://arxiv.org/pdf/1811.02320v1.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-neural-network-architecture-in |
Repo | |
Framework | |