October 20, 2019


Paper Group ANR 100


Efficient Large-Scale Domain Classification with Personalized Attention


Title	Efficient Large-Scale Domain Classification with Personalized Attention
Authors	Young-Bum Kim, Dongchan Kim, Anjishnu Kumar, Ruhi Sarikaya
Abstract	In this paper, we explore the task of mapping spoken language utterances to one of thousands of natural language understanding domains in intelligent personal digital assistants (IPDAs). This scenario is observed for many mainstream IPDAs in industry that allow third parties to develop thousands of new domains to augment built-in ones to rapidly increase domain coverage and overall IPDA capabilities. We propose a scalable neural model architecture with a shared encoder, a novel attention mechanism that incorporates personalization information and domain-specific classifiers that solves the problem efficiently. Our architecture is designed to efficiently accommodate new domains that appear in-between full model retraining cycles with a rapid bootstrapping mechanism two orders of magnitude faster than retraining. We account for practical constraints in real-time production systems, and design to minimize memory footprint and runtime latency. We demonstrate that incorporating personalization results in significantly more accurate domain classification in the setting with thousands of overlapping domains.
Tasks
Published	2018-04-22
URL	http://arxiv.org/abs/1804.08065v1
PDF	http://arxiv.org/pdf/1804.08065v1.pdf
PWC	https://paperswithcode.com/paper/efficient-large-scale-domain-classification
Repo
Framework

Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data


Title	Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data
Authors	Emilia Apostolova, R. Andrew Kreek
Abstract	Industry datasets used for text classification are rarely created for that purpose. In most cases, the data and target predictions are a by-product of accumulated historical data, typically fraught with noise, present in both the text-based document, as well as in the targeted labels. In this work, we address the question of how well performance metrics computed on noisy, historical data reflect the performance on the intended future machine learning model input. The results demonstrate the utility of dirty training datasets used to build prediction models for cleaner (and different) prediction inputs.
Tasks	Text Classification
Published	2018-09-11
URL	http://arxiv.org/abs/1809.04019v1
PDF	http://arxiv.org/pdf/1809.04019v1.pdf
PWC	https://paperswithcode.com/paper/training-and-prediction-data-discrepancies
Repo
Framework

Characterizing and Avoiding Negative Transfer


Title	Characterizing and Avoiding Negative Transfer
Authors	Zirui Wang, Zihang Dai, Barnabás Póczos, Jaime Carbonell
Abstract	When labeled data is scarce for a specific target task, transfer learning often offers an effective solution by utilizing data from a related source task. However, when transferring knowledge from a less related source, it may inversely hurt the target performance, a phenomenon known as negative transfer. Despite its pervasiveness, negative transfer is usually described in an informal manner, lacking rigorous definition, careful analysis, or systematic treatment. This paper proposes a formal definition of negative transfer and analyzes three important aspects thereof. Stemming from this analysis, a novel technique is proposed to circumvent negative transfer by filtering out unrelated source data. Based on adversarial networks, the technique is highly generic and can be applied to a wide range of transfer learning algorithms. The proposed approach is evaluated on six state-of-the-art deep transfer methods via experiments on four benchmark datasets with varying levels of difficulty. Empirically, the proposed method consistently improves the performance of all baseline methods and largely avoids negative transfer, even when the source data is degenerate.
Tasks	Transfer Learning
Published	2018-11-24
URL	https://arxiv.org/abs/1811.09751v4
PDF	https://arxiv.org/pdf/1811.09751v4.pdf
PWC	https://paperswithcode.com/paper/characterizing-and-avoiding-negative-transfer
Repo
Framework

Analysis of Noisy Evolutionary Optimization When Sampling Fails


Title	Analysis of Noisy Evolutionary Optimization When Sampling Fails
Authors	Chao Qian, Chao Bian, Yang Yu, Ke Tang, Xin Yao
Abstract	In noisy evolutionary optimization, sampling is a common strategy to deal with noise. By the sampling strategy, the fitness of a solution is evaluated multiple times (called the sample size) independently, and its true fitness is then approximated by the average of these evaluations. Previous studies on sampling are mainly empirical. In this paper, we first investigate the effect of sample size from a theoretical perspective. By analyzing the (1+1)-EA on the noisy LeadingOnes problem, we show that as the sample size increases, the running time can reduce from exponential to polynomial, but then return to exponential. This suggests that a proper sample size is crucial in practice. Then, we investigate what strategies can work when sampling with any fixed sample size fails. By two illustrative examples, we prove that using parent or offspring populations can be better. Finally, we construct an artificial noisy example to show that when using neither sampling nor populations is effective, adaptive sampling (i.e., sampling with an adaptive sample size) can work. This, for the first time, provides theoretical support for the use of adaptive sampling.
Tasks
Published	2018-10-11
URL	http://arxiv.org/abs/1810.05045v1
PDF	http://arxiv.org/pdf/1810.05045v1.pdf
PWC	https://paperswithcode.com/paper/analysis-of-noisy-evolutionary-optimization
Repo
Framework
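
The sampling strategy described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' experimental setup: the one-bit noise model, the parameter names (noise_prob, sample_size, max_iters) and the stopping rule are assumptions made for the example.

```python
import random


def leading_ones(bits):
    """Number of consecutive 1-bits at the start of the bit string."""
    count = 0
    for b in bits:
        if b != 1:
            break
        count += 1
    return count


def noisy_fitness(bits, rng, noise_prob):
    """Assumed one-bit noise: with probability noise_prob, evaluate a copy
    of the solution with one uniformly chosen bit flipped."""
    if rng.random() < noise_prob:
        i = rng.randrange(len(bits))
        bits = bits[:i] + [1 - bits[i]] + bits[i + 1:]
    return leading_ones(bits)


def sampled_fitness(bits, rng, noise_prob, sample_size):
    """Average of sample_size independent noisy evaluations."""
    total = sum(noisy_fitness(bits, rng, noise_prob) for _ in range(sample_size))
    return total / sample_size


def one_plus_one_ea(n, sample_size, noise_prob, max_iters, seed=0):
    """(1+1)-EA with bit-wise mutation rate 1/n and sampled fitness.
    Returns the iteration at which the true optimum is first held by the
    parent, or None if max_iters is exhausted."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n)]
    for iteration in range(max_iters):
        if leading_ones(parent) == n:
            return iteration
        child = [1 - b if rng.random() < 1.0 / n else b for b in parent]
        child_fit = sampled_fitness(child, rng, noise_prob, sample_size)
        parent_fit = sampled_fitness(parent, rng, noise_prob, sample_size)
        if child_fit >= parent_fit:
            parent = child
    return None
```

Running one_plus_one_ea with several values of sample_size at a fixed noise_prob is one way to observe informally how the choice of sample size affects the number of iterations needed.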

Right for the Right Reason: Training Agnostic Networks


Title	Right for the Right Reason: Training Agnostic Networks
Authors	Sen Jia, Thomas Lansdall-Welfare, Nello Cristianini
Abstract	We consider the problem of a neural network being requested to classify images (or other inputs) without making implicit use of a “protected concept”, that is a concept that should not play any role in the decision of the network. Typically these concepts include information such as gender or race, or other contextual information such as image backgrounds that might be implicitly reflected in unknown correlations with other variables, making it insufficient to simply remove them from the input features. In other words, making accurate predictions is not good enough if those predictions rely on information that should not be used: predictive performance is not the only important metric for learning systems. We apply a method developed in the context of domain adaptation to address this problem of “being right for the right reason”, where we request a classifier to make a decision in a way that is entirely ‘agnostic’ to a given protected concept (e.g. gender, race, background etc.), even if this could be implicitly reflected in other attributes via unknown correlations. After defining the concept of an ‘agnostic model’, we demonstrate how the Domain-Adversarial Neural Network can remove unwanted information from a model using a gradient reversal layer.
Tasks	Domain Adaptation
Published	2018-06-16
URL	http://arxiv.org/abs/1806.06296v1
PDF	http://arxiv.org/pdf/1806.06296v1.pdf
PWC	https://paperswithcode.com/paper/right-for-the-right-reason-training-agnostic
Repo
Framework
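
The gradient reversal layer mentioned in the abstract above has a simple behaviour: it passes features through unchanged in the forward pass and flips the sign of the gradient in the backward pass. The sketch below shows only that behaviour on plain Python lists; the class name and the scaling factor lam are illustrative assumptions, and a real model would implement this inside an automatic-differentiation framework.

```python
class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the
    backward pass (lam is an assumed scaling hyperparameter)."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        # Features reach the protected-concept classifier unchanged.
        return list(features)

    def backward(self, grad_output):
        # The feature extractor receives the negated gradient, so it is
        # pushed to make the protected concept harder to predict.
        return [-self.lam * g for g in grad_output]
```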

Assessing Crosslingual Discourse Relations in Machine Translation


Title	Assessing Crosslingual Discourse Relations in Machine Translation
Authors	Karin Sim Smith, Lucia Specia
Abstract	In an attempt to improve overall translation quality, there has been an increasing focus on integrating more linguistic elements into Machine Translation (MT). While significant progress has been achieved, especially recently with neural models, automatically evaluating the output of such systems is still an open problem. Current practice in MT evaluation relies on a single reference translation, even though there are many ways of translating a particular text, and it tends to disregard higher level information such as discourse. We propose a novel approach that assesses the translated output based on the source text rather than the reference translation, and measures the extent to which the semantics of the discourse elements (discourse relations, in particular) in the source text are preserved in the MT output. The challenge is to detect the discourse relations in the source text and determine whether these relations are correctly transferred crosslingually to the target language – without a reference translation. This methodology could be used independently for discourse-level evaluation, or as a component in other metrics, at a time where substantial amounts of MT are online and would benefit from evaluation where the source text serves as a benchmark.
Tasks	Machine Translation
Published	2018-10-07
URL	http://arxiv.org/abs/1810.03148v1
PDF	http://arxiv.org/pdf/1810.03148v1.pdf
PWC	https://paperswithcode.com/paper/assessing-crosslingual-discourse-relations-in
Repo
Framework

An Adaptive Approach for Automated Grapevine Phenotyping using VGG-based Convolutional Neural Networks


Title	An Adaptive Approach for Automated Grapevine Phenotyping using VGG-based Convolutional Neural Networks
Authors	Jonatan Grimm, Katja Herzog, Florian Rist, Anna Kicherer, Reinhard Töpfer, Volker Steinhage
Abstract	In (grapevine) breeding programs and research, periodic phenotyping and multi-year monitoring of different grapevine traits, like growth or yield, are needed, especially in the field. This demand implies objective, precise and automated methods using sensors and adaptive software. This work presents a proof-of-concept analyzing RGB images of different growth stages of grapevines with the aim to detect and quantify promising plant organs which are related to yield. The input images are segmented by a Fully Convolutional Neural Network (FCN) into object and background pixels. The objects are plant organs like young shoots, pedicels, flower buds or grapes, which are principally suitable for yield estimation. In the ground truth of the training images, each object is separately annotated as a connected segment of object pixels, which enables end-to-end learning of the object features. Based on the CNN-based segmentation, the number of objects is determined by detecting and counting connected components of object pixels using region labeling. In an evaluation on six different data sets, the system achieves an IoU of up to 87.3% for the segmentation and an F1 score of up to 88.6% for the object detection.
Tasks	Object Detection
Published	2018-11-23
URL	http://arxiv.org/abs/1811.09561v2
PDF	http://arxiv.org/pdf/1811.09561v2.pdf
PWC	https://paperswithcode.com/paper/an-adaptive-approach-for-automated-grapevine
Repo
Framework
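
The counting step in the abstract above (connected components of object pixels) and the IoU metric can be illustrated on a small binary mask. This is a sketch under assumptions: masks are nested Python lists, 4-connectivity is used, and the function names are made up for the example; the paper's own labeling procedure may differ.

```python
from collections import deque


def count_components(mask):
    """Count 4-connected components of truthy pixels in a 2D list."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # New unlabeled object pixel: flood-fill its whole component.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def iou(pred, truth):
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0
```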

Combining Natural Gradient with Hessian Free Methods for Sequence Training


Title	Combining Natural Gradient with Hessian Free Methods for Sequence Training
Authors	Adnan Haider, P. C. Woodland
Abstract	This paper presents a new optimisation approach to train Deep Neural Networks (DNNs) with discriminative sequence criteria. At each iteration, the method combines information from the Natural Gradient (NG) direction with local curvature information of the error surface that enables better paths on the parameter manifold to be traversed. The method is derived using an alternative derivation of Taylor’s theorem using the concepts of manifolds, tangent vectors and directional derivatives from the perspective of Information Geometry. The efficacy of the method is shown within a Hessian Free (HF) style optimisation framework to sequence train both standard fully-connected DNNs and Time Delay Neural Networks as speech recognition acoustic models. It is shown that for the same number of updates the proposed approach achieves larger reductions in the word error rate (WER) than both NG and HF, and also leads to a lower WER than standard stochastic gradient descent. The paper also addresses the issue of over-fitting due to mismatch between training criterion and Word Error Rate (WER) that primarily arises during sequence training of ReLU-DNN models.
Tasks	Speech Recognition
Published	2018-10-03
URL	http://arxiv.org/abs/1810.01873v1
PDF	http://arxiv.org/pdf/1810.01873v1.pdf
PWC	https://paperswithcode.com/paper/combining-natural-gradient-with-hessian-free
Repo
Framework

Factorized Distillation: Training Holistic Person Re-identification Model by Distilling an Ensemble of Partial ReID Models


Title	Factorized Distillation: Training Holistic Person Re-identification Model by Distilling an Ensemble of Partial ReID Models
Authors	Pengyuan Ren, Jianmin Li
Abstract	Person re-identification (ReID) is aimed at identifying the same person across videos captured from different cameras. Because networks that extract global features with ordinary architectures have difficulty extracting local features owing to their weak attention mechanisms, researchers have proposed many elaborately designed ReID networks; while these greatly improve accuracy, model size and feature extraction latency are also soaring. We argue that a relatively compact ordinary network extracting globally pooled features has the capability to extract discriminative local features and can achieve state-of-the-art precision provided the model’s parameters are properly learnt. In order to reduce the difficulty in learning hard identity labels, we propose a novel knowledge distillation method: Factorized Distillation, which factorizes both feature maps and retrieval features of the holistic ReID network to mimic representations of multiple partial ReID models, thus transferring the knowledge from partial ReID models to the holistic network. Experiments show that a model trained with the proposed method can outperform the state of the art with relatively few network parameters.
Tasks	Person Re-Identification
Published	2018-11-20
URL	http://arxiv.org/abs/1811.08073v1
PDF	http://arxiv.org/pdf/1811.08073v1.pdf
PWC	https://paperswithcode.com/paper/factorized-distillation-training-holistic
Repo
Framework

Abstaining Classification When Error Costs are Unequal and Unknown


Title	Abstaining Classification When Error Costs are Unequal and Unknown
Authors	Hongjiao Guan, Yingtao Zhang, H. D. Cheng, Xianglong Tang
Abstract	Abstaining classification aims to reject examples that are easily misclassified, so it is an effective approach to increasing classification reliability and reducing misclassification risk in cost-sensitive applications. In such applications, different types of errors (false positive or false negative) usually have unequal costs, and the error costs, which depend on the specific application, are usually unknown. However, current abstaining classification methods either do not distinguish the error types, or they need the cost information of misclassification and rejection, which are realized in the framework of cost-sensitive learning. In this paper, we propose a bounded-abstention method with two constraints of reject rates (BA2), which performs abstaining classification when error costs are unequal and unknown. BA2 aims to obtain the optimal area under the ROC curve (AUC) by constraining the reject rates of the positive and negative classes respectively. Specifically, we construct the receiver operating characteristic (ROC) curve, and stepwise search the optimal reject thresholds from both ends of the curve, until the two constraints are satisfied. Experimental results show that BA2 obtains higher AUC and lower total cost than the state-of-the-art abstaining classification methods. Meanwhile, BA2 achieves controllable reject rates of the positive and negative classes.
Tasks
Published	2018-06-09
URL	http://arxiv.org/abs/1806.03445v2
PDF	http://arxiv.org/pdf/1806.03445v2.pdf
PWC	https://paperswithcode.com/paper/abstaining-classification-when-error-costs
Repo
Framework
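
To make the idea of per-class reject rates in the abstract above concrete, the sketch below applies a two-threshold abstention rule to classifier scores and measures how often each class is rejected. It does not reproduce BA2's stepwise threshold search along the ROC curve; the thresholds low and high, the function names and the 0/1 label encoding are assumptions for illustration.

```python
def abstaining_predict(score, low, high):
    """Return 1 or 0 when the score is confident enough, else None (abstain)."""
    if score >= high:
        return 1
    if score <= low:
        return 0
    return None


def class_reject_rates(scores, labels, low, high):
    """Fraction of negatives (label 0) and positives (label 1) rejected."""
    rejected = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for score, label in zip(scores, labels):
        totals[label] += 1
        if abstaining_predict(score, low, high) is None:
            rejected[label] += 1
    return {c: (rejected[c] / totals[c] if totals[c] else 0.0) for c in (0, 1)}
```

A bounded-abstention method would then choose low and high so that both rates stay within user-specified limits.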

DeePathology: Deep Multi-Task Learning for Inferring Molecular Pathology from Cancer Transcriptome


Title	DeePathology: Deep Multi-Task Learning for Inferring Molecular Pathology from Cancer Transcriptome
Authors	Behrooz Azarkhalili, Ali Saberi, Hamidreza Chitsaz, Ali Sharifi-Zarchi
Abstract	Despite great advances, molecular cancer pathology is often limited to the use of a small number of biomarkers rather than the whole transcriptome, partly due to computational challenges. Here, we introduce a novel architecture of Deep Neural Networks (DNNs) that is capable of simultaneous inference of various properties of biological samples, through multi-task and transfer learning. It encodes the whole transcription profile into a strikingly low-dimensional latent vector of size 8, and then recovers mRNA and miRNA expression profiles, tissue and disease type from this vector. This latent space is significantly better than the original gene expression profiles for discriminating samples based on their tissue and disease. We employed this architecture on mRNA transcription profiles of 10787 clinical samples from 34 classes (one healthy and 33 different types of cancer) from 27 tissues. Our method significantly outperforms prior works and classical machine learning approaches in predicting tissue-of-origin, normal or disease state and cancer type of each sample. For tissues with more than one type of cancer, it reaches 99.4% accuracy in identifying the correct cancer subtype. We also show this system is very robust against noise and missing values. Collectively, our results highlight applications of artificial intelligence in molecular cancer pathology and oncological research. DeePathology is freely available at https://github.com/SharifBioinf/DeePathology.
Tasks	Multi-Task Learning, Transfer Learning
Published	2018-08-07
URL	https://arxiv.org/abs/1808.02237v2
PDF	https://arxiv.org/pdf/1808.02237v2.pdf
PWC	https://paperswithcode.com/paper/inferring-molecular-pathology-and-micro-rna
Repo
Framework

Improved Adaptive Brovey as a New Method for Image Fusion


Title	Improved Adaptive Brovey as a New Method for Image Fusion
Authors	Hamid Reza Shahdoosti
Abstract	An ideal fusion method preserves the spectral information in the fused image and adds spatial information to it with no spectral distortion. Among the existing fusion algorithms, the contourlet-based fusion method is the most frequently discussed one in recent publications, because the contourlet has the ability to capture and link the points of discontinuity to form a linear structure. The Brovey is a popular pan-sharpening method owing to its efficiency and high spatial resolution. This method can be explained by a mathematical model of optical remote sensing sensors. This study presents a new fusion approach that integrates the advantages of both the Brovey and the contourlet techniques to reduce the color distortion of fusion results. Visual and statistical analyses show that the proposed algorithm clearly improves the merging quality in terms of correlation coefficient, ERGAS, UIQI, and Q4, compared to fusion methods including IHS, PCA, Adaptive IHS, and Improved Adaptive PCA.
Tasks
Published	2018-07-24
URL	http://arxiv.org/abs/1807.09610v1
PDF	http://arxiv.org/pdf/1807.09610v1.pdf
PWC	https://paperswithcode.com/paper/improved-adaptive-brovey-as-a-new-method-for
Repo
Framework
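
For readers unfamiliar with the Brovey method named in the abstract above, the sketch below shows the classic Brovey transform in one common formulation: each multispectral band is scaled by the ratio of the panchromatic value to the sum of the bands. It is not the improved adaptive variant the paper proposes; the function name and the eps guard are assumptions for the example.

```python
def brovey_pixel(bands, pan, eps=1e-12):
    """Classic Brovey pan-sharpening for a single pixel.

    bands: multispectral values already resampled to the pan grid.
    pan:   panchromatic value at the same location.
    """
    total = sum(bands)
    if total <= eps:
        # Avoid division by zero on dark pixels.
        return [0.0] * len(bands)
    return [b * pan / total for b in bands]
```

Because every band is multiplied by the same ratio, the band proportions of the pixel are preserved while its overall intensity follows the panchromatic image.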

Towards Language Agnostic Universal Representations


Title	Towards Language Agnostic Universal Representations
Authors	Armen Aghajanyan, Xia Song, Saurabh Tiwary
Abstract	When a bilingual student learns to solve word problems in math, we expect the student to be able to solve these problems in both languages the student is fluent in, even if the math lessons were only taught in one language. However, current representations in machine learning are language dependent. In this work, we present a method to decouple the language from the problem by learning language agnostic representations and therefore allowing training a model in one language and applying to a different one in a zero-shot fashion. We learn these representations by taking inspiration from linguistics and formalizing Universal Grammar as an optimization process (Chomsky, 2014; Montague, 1970). We demonstrate the capabilities of these representations by showing that the models trained on a single language using language agnostic representations achieve very similar accuracies in other languages.
Tasks
Published	2018-09-23
URL	http://arxiv.org/abs/1809.08510v1
PDF	http://arxiv.org/pdf/1809.08510v1.pdf
PWC	https://paperswithcode.com/paper/towards-language-agnostic-universal
Repo
Framework

Deep feature compression for collaborative object detection


Title	Deep feature compression for collaborative object detection
Authors	Hyomin Choi, Ivan V. Bajic
Abstract	Recent studies have shown that the efficiency of deep neural networks in mobile applications can be significantly improved by distributing the computational workload between the mobile device and the cloud. This paradigm, termed collaborative intelligence, involves communicating feature data between the mobile and the cloud. The efficiency of such an approach can be further improved by lossy compression of feature data, which has not been examined to date. In this work we focus on collaborative object detection and study the impact of both near-lossless and lossy compression of feature data on its accuracy. We also propose a strategy for improving the accuracy under lossy feature compression. Experiments indicate that using this strategy, the communication overhead can be reduced by up to 70% without sacrificing accuracy.
Tasks	Object Detection
Published	2018-02-12
URL	http://arxiv.org/abs/1802.03931v1
PDF	http://arxiv.org/pdf/1802.03931v1.pdf
PWC	https://paperswithcode.com/paper/deep-feature-compression-for-collaborative
Repo
Framework
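
One simple form of lossy feature compression is uniform quantization of the feature values before they are sent from the mobile device to the cloud. The sketch below is a generic illustration of that idea, not the codec or the strategy used in the paper; the function names and the default bit depth are assumptions.

```python
def quantize(values, bits=8):
    """Map floats to integer codes in [0, 2**bits - 1] (mobile side)."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from the integer codes (cloud side)."""
    return [lo + c * scale for c in codes]
```

Lowering bits shrinks the payload at the cost of a larger reconstruction error, which is bounded by half a quantization step.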

Hierarchical Neural Network Architecture In Keyword Spotting


Title	Hierarchical Neural Network Architecture In Keyword Spotting
Authors	Yixiao Qu, Sihao Xue, Zhenyi Ying, Hang Zhou, Jue Sun
Abstract	Keyword Spotting (KWS) provides the start signal of the ASR problem, and thus it is essential to ensure a high recall rate. However, its real-time property requires low computation complexity. This contradiction inspires people to find a suitable model which is small enough to perform well in multiple environments. To deal with this contradiction, we implement the Hierarchical Neural Network (HNN), which has proved to be effective in many speech recognition problems. HNN outperforms traditional DNN and CNN even though its model size and computation complexity are slightly less. Also, its simple topology structure makes it easy to deploy on any device.
Tasks	Keyword Spotting, Speech Recognition
Published	2018-11-06
URL	http://arxiv.org/abs/1811.02320v1
PDF	http://arxiv.org/pdf/1811.02320v1.pdf
PWC	https://paperswithcode.com/paper/hierarchical-neural-network-architecture-in
Repo
Framework