April 1, 2020

3438 words 17 mins read

Paper Group ANR 503

SuperNet – An efficient method of neural networks ensembling

Title SuperNet – An efficient method of neural networks ensembling
Authors Ludwik Bukowski, Witold Dzwinel
Abstract The main flaw of neural network ensembling is that it is exceptionally demanding computationally, especially if the individual sub-models are large neural networks that must be trained separately. Bearing in mind that modern DNNs can be very accurate, that they are already huge ensembles of simple classifiers, and that for any ensemble one can construct a more thrifty compressed neural network of similar performance, the idea of designing expensive SuperNets can be questioned. The widespread belief that ensembling increases prediction time makes it unattractive, and this may be why the mainstream of ML research is directed towards developing better loss functions and learning strategies for more advanced and efficient neural networks. On the other hand, all these factors make architectures more complex, which may lead to overfitting and high computational complexity, that is, to the very flaws for which highly parametrized SuperNet ensembles are blamed. The goal of this master's thesis is to speed up the execution time required for ensemble generation. Instead of training K sub-models separately, each sub-model can be taken from a different phase of training (a different local minimum of the loss function) of a single DNN [Huang et al., 2017; Garipov et al., 2018]. Thus, the computational cost of the SuperNet can be comparable to the CPU time spent on training its single sub-model, plus the usually much shorter CPU time required for training the SuperNet coupling factors. (A toy sketch of this snapshot-ensembling idea follows this entry.)
Tasks
Published 2020-03-29
URL https://arxiv.org/abs/2003.13021v1
PDF https://arxiv.org/pdf/2003.13021v1.pdf
PWC https://paperswithcode.com/paper/supernet-an-efficient-method-of-neural
Repo
Framework
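
As a concrete illustration of the snapshot-ensembling idea described above, here is a minimal PyTorch sketch in which the sub-models are checkpoints of a single training run and only the small vector of coupling factors is trained afterwards. The class and all names are hypothetical, not the thesis code.

```python
# Hypothetical sketch: sub-models are frozen checkpoints of ONE training
# run; only the K coupling factors are learned afterwards.
import torch
import torch.nn as nn

class SuperNet(nn.Module):
    def __init__(self, snapshots):
        super().__init__()
        self.snapshots = nn.ModuleList(snapshots)
        for m in self.snapshots:          # freeze the sub-models
            m.requires_grad_(False)
        # one learnable coupling factor per snapshot
        self.coupling = nn.Parameter(torch.ones(len(snapshots)))

    def forward(self, x):
        w = torch.softmax(self.coupling, dim=0)
        preds = torch.stack([m(x) for m in self.snapshots])  # (K, B, C)
        return (w[:, None, None] * preds).sum(dim=0)

# Usage: `snapshots` would be copies of the model saved at several epochs
# of a single run; here stand-ins are used so the sketch runs.
snaps = [nn.Linear(4, 3) for _ in range(3)]
out = SuperNet(snaps)(torch.randn(2, 4))
```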

SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives

Title SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives
Authors Emmanouil Angelis, Philippe Wenk, Bernhard Schölkopf, Stefan Bauer, Andreas Krause
Abstract Gaussian processes are an important regression tool with excellent analytic properties which allow for direct integration of derivative observations. However, vanilla GP methods scale cubically in the number of observations. In this work, we propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features. We then prove deterministic, non-asymptotic, and exponentially fast decaying error bounds which apply to both the approximated kernel and the approximated posterior. To further illustrate the practical applicability of our method, we apply it to ODIN, a recently developed algorithm for ODE parameter inference. In an extensive experimental section, all results are empirically validated, demonstrating the speed, accuracy, and practical applicability of this approach. (A one-dimensional sketch of quadrature Fourier features follows this entry.)
Tasks Gaussian Processes
Published 2020-03-05
URL https://arxiv.org/abs/2003.02658v1
PDF https://arxiv.org/pdf/2003.02658v1.pdf
PWC https://paperswithcode.com/paper/sleipnir-deterministic-and-provably-accurate
Repo
Framework
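
To make the quadrature-Fourier-feature construction concrete, below is a minimal one-dimensional numpy sketch that approximates the RBF kernel with deterministic Gauss-Hermite nodes. It is an illustration under simplifying assumptions (1-D inputs, no derivative observations), not the authors' SLEIPNIR implementation.

```python
# Deterministic Gauss-Hermite nodes replace the random frequencies of
# random Fourier features, approximating the RBF kernel.
import numpy as np

def qff_features(x, n_nodes=32, lengthscale=1.0):
    t, w = np.polynomial.hermite.hermgauss(n_nodes)  # nodes/weights for e^{-t^2}
    omega = np.sqrt(2.0) * t / lengthscale           # spectral frequencies
    scale = np.sqrt(w / np.sqrt(np.pi))              # quadrature weights
    ang = np.outer(x, omega)
    return np.hstack([scale * np.cos(ang), scale * np.sin(ang)])

x = np.linspace(-2, 2, 5)
Phi = qff_features(x)
K_approx = Phi @ Phi.T
K_exact = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
print(np.max(np.abs(K_approx - K_exact)))  # error decays exponentially in n_nodes
```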

Analysis of the quotation corpus of the Russian Wiktionary

Title Analysis of the quotation corpus of the Russian Wiktionary
Authors A. Smirnov, T. Levashova, A. Karpov, I. Kipyatkova, A. Ronzhin, A. Krizhanovsky, N. Krizhanovsky
Abstract A quantitative evaluation of the quotations in the Russian Wiktionary was performed using a purpose-built Wiktionary parser. It was found that the number of quotations in the dictionary is growing fast (51.5 thousand in 2011, 62 thousand in 2012). These quotations were extracted and saved in the relational database of a machine-readable dictionary, for which tables related to the quotations were designed. A histogram of the distribution of quotations from literary works written in different years was built. An attempt was made to explain the characteristics of the histogram by associating it with the years of the most popular and most cited (in the Russian Wiktionary) writers of the nineteenth century. It was found that more than one-third of all the quotations (example sentences) in the Russian Wiktionary were taken by the editors of Wiktionary entries from the Russian National Corpus. (A toy sketch of such quotation tables follows this entry.)
Tasks
Published 2020-01-20
URL https://arxiv.org/abs/2002.00734v1
PDF https://arxiv.org/pdf/2002.00734v1.pdf
PWC https://paperswithcode.com/paper/analysis-of-the-quotation-corpus-of-the
Repo
Framework
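
Purely as an illustration of the relational design mentioned in the abstract, here is a hypothetical sqlite sketch of a quotations table; all column names are invented, not the parser's actual schema.

```python
# Hypothetical quotation table for a machine-readable dictionary.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quote (
    id       INTEGER PRIMARY KEY,
    entry    TEXT NOT NULL,      -- Wiktionary headword
    text     TEXT NOT NULL,      -- the example sentence itself
    author   TEXT,
    title    TEXT,               -- cited literary work
    year     INTEGER,            -- year of the work, for the histogram
    from_rnc INTEGER DEFAULT 0   -- 1 if taken from the Russian National Corpus
);
""")
conn.execute("INSERT INTO quote (entry, text, year, from_rnc) VALUES (?, ?, ?, ?)",
             ("дом", "Мой дом — моя крепость.", 1862, 1))
# Histogram of quotations per year of the cited work:
for year, n in conn.execute("SELECT year, COUNT(*) FROM quote GROUP BY year"):
    print(year, n)
```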

Is the Meta-Learning Idea Able to Improve the Generalization of Deep Neural Networks on the Standard Supervised Learning?

Title Is the Meta-Learning Idea Able to Improve the Generalization of Deep Neural Networks on the Standard Supervised Learning?
Authors Xiang Deng, Zhongfei Zhang
Abstract Substantial efforts have been made to improve the generalization abilities of deep neural networks (DNNs) in order to obtain better performance without introducing more parameters. On the other hand, meta-learning approaches exhibit powerful generalization on new tasks in few-shot learning. Intuitively, few-shot learning is more challenging than standard supervised learning, as each target class has only very few or even no training samples. The natural question that arises is whether the meta-learning idea can be used to improve the generalization of DNNs on standard supervised learning. In this paper, we propose a novel meta-learning based training procedure (MLTP) for DNNs and demonstrate that the meta-learning idea can indeed improve their generalization abilities. MLTP simulates the meta-training process by considering a batch of training samples as a task. The key idea is that the gradient descent step for improving the current task's performance should also improve the performance on a new task, a signal ignored by the current standard procedure for training neural networks. MLTP is also compatible with all existing training techniques such as dropout, weight decay, and batch normalization. We evaluate MLTP by training a variety of small and large neural networks on three benchmark datasets, i.e., CIFAR-10, CIFAR-100, and Tiny ImageNet. The experimental results show consistently improved generalization performance for all the DNNs of different sizes, which verifies the promise of MLTP and demonstrates that the meta-learning idea can indeed improve the generalization of DNNs on standard supervised learning. (A first-order sketch of this idea follows this entry.)
Tasks Few-Shot Learning, Meta-Learning
Published 2020-02-27
URL https://arxiv.org/abs/2002.12455v1
PDF https://arxiv.org/pdf/2002.12455v1.pdf
PWC https://paperswithcode.com/paper/is-the-meta-learning-idea-able-to-improve-the
Repo
Framework
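
A hedged, first-order sketch of the stated key idea, that a gradient step computed on one batch should also lower the loss on a second batch, is given below. It is written in the spirit of the abstract, with an assumed inner step size `alpha`; it is not the authors' exact MLTP procedure.

```python
# Batch A plays the role of the current task; batch B is the "new task"
# evaluated at the looked-ahead parameters theta - alpha * grad.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 2)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
alpha = 0.01  # inner "lookahead" step size (assumed hyper-parameter)

def mltp_step(xa, ya, xb, yb):
    loss_a = loss_fn(model(xa), ya)
    grads = torch.autograd.grad(loss_a, list(model.parameters()), create_graph=True)
    w, b = model.weight - alpha * grads[0], model.bias - alpha * grads[1]
    loss_b = loss_fn(F.linear(xb, w, b), yb)   # batch B at looked-ahead params
    opt.zero_grad()
    (loss_a + loss_b).backward()               # second term carries the meta signal
    opt.step()

xa, xb = torch.randn(8, 10), torch.randn(8, 10)
ya, yb = torch.randint(0, 2, (8,)), torch.randint(0, 2, (8,))
mltp_step(xa, ya, xb, yb)
```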

Exemplar Normalization for Learning Deep Representation

Title Exemplar Normalization for Learning Deep Representation
Authors Ruimao Zhang, Zhanglin Peng, Lingyun Wu, Zhen Li, Ping Luo
Abstract Normalization techniques are important components of advanced neural networks across different tasks. This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn different normalization methods for different convolutional layers and image samples of a deep network. EN significantly improves the flexibility of the recently proposed switchable normalization (SN), which solves a static L2N problem by linearly combining several normalizers in each normalization layer (with the same combination for all samples). Instead of directly employing a multi-layer perceptron (MLP) to learn data-dependent parameters as conditional batch normalization (cBN) does, the internal architecture of EN is carefully designed to stabilize its optimization, leading to many appealing benefits. (1) EN enables different convolutional layers, image samples, categories, benchmarks, and tasks to use different normalization methods, shedding light on analyzing them from a holistic view. (2) EN is effective for various network architectures and tasks. (3) It can replace any normalization layer in a deep network and still produce stable model training. Extensive experiments demonstrate the effectiveness of EN in a wide spectrum of tasks, including image recognition, noisy-label learning, and semantic segmentation. For example, by replacing BN in an ordinary ResNet50, the improvement produced by EN is 300% larger than that of SN on both ImageNet and the noisy WebVision dataset. (A simplified sketch of a sample-dependent normalizer combination follows this entry.)
Tasks Semantic Segmentation
Published 2020-03-19
URL https://arxiv.org/abs/2003.08761v2
PDF https://arxiv.org/pdf/2003.08761v2.pdf
PWC https://paperswithcode.com/paper/exemplar-normalization-for-learning-deep
Repo
Framework
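
The following simplified PyTorch sketch shows the general learning-to-normalize pattern: per-sample softmax weights over several standard normalizers. The real EN module has a carefully designed internal architecture; this toy version only makes the SN-style linear combination sample-dependent.

```python
# Toy learning-to-normalize layer: per-sample weights over BN/IN/LN.
import torch
import torch.nn as nn

class ToyExemplarNorm(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.norms = nn.ModuleList([
            nn.BatchNorm2d(channels), nn.InstanceNorm2d(channels),
            nn.GroupNorm(1, channels),  # GroupNorm(1, C) == LayerNorm over C,H,W
        ])
        self.gate = nn.Linear(channels, len(self.norms))  # weights from pooled features

    def forward(self, x):
        w = torch.softmax(self.gate(x.mean(dim=(2, 3))), dim=1)  # (N, 3), per sample
        outs = torch.stack([n(x) for n in self.norms], dim=1)    # (N, 3, C, H, W)
        return (w[:, :, None, None, None] * outs).sum(dim=1)

y = ToyExemplarNorm(16)(torch.randn(4, 16, 8, 8))
```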

Meta-learning for mixed linear regression

Title Meta-learning for mixed linear regression
Authors Weihao Kong, Raghav Somani, Zhao Song, Sham Kakade, Sewoong Oh
Abstract In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labeled data. These include data from medical image processing and robotic interaction. Even though each individual task cannot be meaningfully trained in isolation, one seeks to meta-learn across the tasks from past experience by exploiting their similarities. We study a fundamental question of interest: when can abundant tasks with small data compensate for the lack of tasks with big data? We focus on a canonical scenario where each task is drawn from a mixture of $k$ linear regressions, and identify sufficient conditions for such a graceful exchange to hold: the total number of examples necessary with only small-data tasks scales similarly to the case when big-data tasks are available. To this end, we introduce a novel spectral approach and show that we can efficiently utilize small-data tasks with the help of $\tilde\Omega(k^{3/2})$ medium-data tasks, each with $\tilde\Omega(k^{1/2})$ examples. (A toy sketch of the spectral idea follows this entry.)
Tasks Meta-Learning
Published 2020-02-20
URL https://arxiv.org/abs/2002.08936v1
PDF https://arxiv.org/pdf/2002.08936v1.pdf
PWC https://paperswithcode.com/paper/meta-learning-for-mixed-linear-regression
Repo
Framework
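
A toy numpy sketch of the spectral idea: even two-example tasks give an unbiased estimate of E[w w^T], whose top eigenvectors recover the subspace spanned by the k regression vectors. The constants and noise level are invented for illustration.

```python
# For x ~ N(0, I) and y = x.w + noise, E[y1*y2*outer(x1, x2)] = w w^T,
# so averaging over many tiny tasks reveals a rank-k signal.
import numpy as np

rng = np.random.default_rng(0)
d, k, n_tasks = 20, 3, 20000
W = rng.standard_normal((k, d))           # the k hidden regressors

M = np.zeros((d, d))
for _ in range(n_tasks):
    w = W[rng.integers(k)]                # task drawn from the mixture
    x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
    y1 = x1 @ w + 0.1 * rng.standard_normal()
    y2 = x2 @ w + 0.1 * rng.standard_normal()
    M += y1 * y2 * np.outer(x1, x2)       # unbiased estimate of w w^T per task
M = (M + M.T) / (2 * n_tasks)             # symmetrize the estimate

eigvals = np.linalg.eigvalsh(M)
print(eigvals[-k:])                       # k large eigenvalues -> rank-k signal
```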

A Structured Prediction Approach for Conditional Meta-Learning

Title A Structured Prediction Approach for Conditional Meta-Learning
Authors Ruohan Wang, Yiannis Demiris, Carlo Ciliberto
Abstract Optimization-based meta-learning algorithms are a powerful class of methods for learning-to-learn applications such as few-shot learning. They tackle the limited availability of training data by leveraging the experience gained from previously observed tasks. However, when the complexity of the task distribution cannot be captured by a single set of shared meta-parameters, existing methods may fail to fully adapt to a target task. We address this issue with a novel perspective on conditional meta-learning based on structured prediction. We propose task-adaptive structured meta-learning (TASML), a principled estimator that weights meta-training data conditioned on the target task to design tailored meta-learning objectives. In addition, we introduce algorithmic improvements to tackle key computational limitations of existing methods. Experimentally, we show that TASML outperforms state-of-the-art methods on benchmark datasets in terms of both accuracy and efficiency. An ablation study quantifies the individual contribution of model components and suggests useful practices for meta-learning. (A schematic sketch of task reweighting follows this entry.)
Tasks Few-Shot Learning, Meta-Learning, Structured Prediction
Published 2020-02-20
URL https://arxiv.org/abs/2002.08799v1
PDF https://arxiv.org/pdf/2002.08799v1.pdf
PWC https://paperswithcode.com/paper/a-structured-prediction-approach-for-2
Repo
Framework
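
Schematically, conditional meta-learning of this kind reweights meta-training tasks by their similarity to the target task and minimizes the weighted objective. The Gaussian kernel and task embedding below are illustrative assumptions, not the paper's estimator.

```python
# Weight each meta-training task by its kernel similarity to the target.
import numpy as np

def task_weights(target_feat, train_feats, bandwidth=1.0):
    d2 = ((train_feats - target_feat) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))   # Gaussian kernel on task embeddings
    return w / w.sum()

# meta_objective(theta) = sum_i weights[i] * task_loss_i(theta)
train_feats = np.random.randn(50, 8)         # e.g. mean-pooled support-set features
print(task_weights(np.random.randn(8), train_feats)[:5])
```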

Discoverability in Satellite Imagery: A Good Sentence is Worth a Thousand Pictures

Title Discoverability in Satellite Imagery: A Good Sentence is Worth a Thousand Pictures
Authors David Noever, Wes Regian, Matt Ciolino, Josh Kalin, Dom Hambrick, Kaye Blankenship
Abstract Small satellite constellations provide daily global coverage of the Earth's landmass, but image enrichment relies on automating key tasks like change detection or feature searches. For example, extracting text annotations from raw pixels requires two dependent machine learning models, one to analyze the overhead image and the other to generate a descriptive caption. We evaluate seven models on the previously largest benchmark for satellite image captions. We extend the labeled image samples five-fold, then augment, correct, and prune the vocabulary to approach a rough min-max (minimum word count, maximum description). This outcome compares favorably to previous work with large pre-trained image models, but offers a hundred-fold reduction in model size without sacrificing overall accuracy (when measured with log entropy loss). These smaller models provide new deployment opportunities, particularly when pushed to edge processors, on-board satellites, or distributed ground stations. To quantify a caption's descriptiveness, we introduce a novel multi-class confusion or error matrix to score both human-labeled test data and never-labeled images that include bounding-box detections but lack full-sentence captions. This work suggests future captioning strategies, particularly ones that can enrich class coverage beyond land-use applications and that lessen color-centered and adjacency adjectives ("green", "near", "between", etc.). Many modern language transformers present novel and exploitable models with world knowledge gleaned from training on their vast online corpora. One interesting but simple example might learn the word association between wind and waves, thus enriching a beach scene with more than just the color descriptions that could otherwise be extracted from raw pixels without text annotation.
Tasks Image Captioning
Published 2020-01-03
URL https://arxiv.org/abs/2001.05839v1
PDF https://arxiv.org/pdf/2001.05839v1.pdf
PWC https://paperswithcode.com/paper/discoverability-in-satellite-imagery-a-good
Repo
Framework

ARAACOM: ARAbic Algerian Corpus for Opinion Mining

Title ARAACOM: ARAbic Algerian Corpus for Opinion Mining
Authors Zitouni Abdelhafid, Hichem Rahab, Abdelhafid Zitouni, Mahieddine Djoudi
Abstract Nowadays, it is no longer necessary to make an enormous effort to distribute thousands of paper forms, collect them, and then convert them into electronic format in order to track people's opinions on a subject. Many web sites can today reach a large audience with far less effort. Most web sites invite their visitors to leave feedback about their impressions of the site or of events, which yields a great deal of data that requires powerful means to exploit. Opinion mining on the web is becoming an increasingly attractive task, due to the growing need of individuals and organizations to track the mood of people towards many subjects of daily life (sports, politics, television, etc.). A lot of work on opinion mining has been carried out for Western languages, especially English; such work for the Arabic language is still very scarce. In this paper, we propose our approach to opinion mining in Arabic Algerian newspapers. CCS Concepts: • Information systems → Sentiment analysis; • Computing methodologies → Natural language processing.
Tasks Opinion Mining, Sentiment Analysis
Published 2020-01-22
URL https://arxiv.org/abs/2001.08010v1
PDF https://arxiv.org/pdf/2001.08010v1.pdf
PWC https://paperswithcode.com/paper/araacom-arabic-algerian-corpus-for-opinion
Repo
Framework

Online Joint Bid/Daily Budget Optimization of Internet Advertising Campaigns

Title Online Joint Bid/Daily Budget Optimization of Internet Advertising Campaigns
Authors Alessandro Nuara, Francesco Trovò, Nicola Gatti, Marcello Restelli
Abstract Pay-per-click advertising includes various formats (e.g., search, contextual, social) with a total investment of more than 200 billion USD per year worldwide. An advertiser is given a daily budget to allocate over several, even thousands of, campaigns, mainly distinguished by ad, target, or channel. Furthermore, publishers choose the ads to display and how to allocate them by employing auctioning mechanisms, in which every day the advertisers set, for each campaign, a bid corresponding to the maximum amount of money per click they are willing to pay and the fraction of the daily budget to invest. In this paper, we study the problem of automating the online joint bid/daily budget optimization of pay-per-click advertising campaigns over multiple channels. We formulate our problem as a combinatorial semi-bandit problem, which requires solving a special case of the Multiple-Choice Knapsack problem every day. Furthermore, for every campaign, we capture the dependency of the number of clicks on the bid and daily budget by Gaussian Processes, thus requiring mild assumptions on the regularity of these functions. We design four algorithms and show that they suffer a regret that is upper bounded with high probability as $O(\sqrt{T})$, where $T$ is the time horizon of the learning process. We experimentally evaluate our algorithms on synthetic settings generated from real data from Yahoo!, and we present the results of the adoption of our algorithms in a real-world application with a daily average spend of 1,000 Euros for more than one year. (A toy sketch of the daily knapsack subproblem follows this entry.)
Tasks Gaussian Processes
Published 2020-03-03
URL https://arxiv.org/abs/2003.01452v1
PDF https://arxiv.org/pdf/2003.01452v1.pdf
PWC https://paperswithcode.com/paper/online-joint-biddaily-budget-optimization-of
Repo
Framework
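
The daily subproblem can be illustrated as a Multiple-Choice Knapsack solved by dynamic programming: pick exactly one (budget, expected clicks) option per campaign, subject to the total daily budget. In the paper the expected clicks come from Gaussian Process estimates; the numbers below are made up.

```python
# DP over discretized budget; options[c] = list of (cost, value) pairs
# for campaign c, and each campaign must pick exactly one option.
import numpy as np

def mck(options, total_budget):
    best = np.full(total_budget + 1, 0.0)
    for opts in options:
        new = np.full(total_budget + 1, -np.inf)
        for cost, value in opts:
            for b in range(cost, total_budget + 1):
                if best[b - cost] + value > new[b]:
                    new[b] = best[b - cost] + value
        best = new
    return best.max()

options = [[(0, 0.0), (10, 4.0), (20, 7.0)],   # campaign 1: budget -> clicks
           [(0, 0.0), (15, 6.5), (30, 9.0)]]   # campaign 2
print(mck(options, 30))                        # best total expected clicks: 10.5
```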

Context-Aware Domain Adaptation in Semantic Segmentation

Title Context-Aware Domain Adaptation in Semantic Segmentation
Authors Jinyu Yang, Weizhi An, Chaochao Yan, Peilin Zhao, Junzhou Huang
Abstract In this paper, we consider the problem of unsupervised domain adaptation in semantic segmentation. There are two primary issues in this field, i.e., what and how to transfer domain knowledge across two domains. Existing methods mainly focus on adapting domain-invariant features (what to transfer) through adversarial learning (how to transfer). Context dependency is essential for semantic segmentation; however, its transferability is still not well understood. Furthermore, how to transfer contextual information across two domains remains unexplored. Motivated by this, we propose a cross-attention mechanism based on self-attention to capture context dependencies between two domains and adapt transferable context. To achieve this goal, we design two cross-domain attention modules to adapt context dependencies from both spatial and channel views. Specifically, the spatial attention module captures local feature dependencies between each position in the source and target images. The channel attention module models semantic dependencies between each pair of cross-domain channel maps. To adapt context dependencies, we further selectively aggregate the context information from the two domains. The superiority of our method over existing state-of-the-art methods is empirically demonstrated on “GTA5 to Cityscapes” and “SYNTHIA to Cityscapes”. (A schematic sketch of cross-domain spatial attention follows this entry.)
Tasks Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation
Published 2020-03-09
URL https://arxiv.org/abs/2003.04010v1
PDF https://arxiv.org/pdf/2003.04010v1.pdf
PWC https://paperswithcode.com/paper/context-aware-domain-adaptation-in-semantic
Repo
Framework
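
A schematic sketch of cross-domain spatial attention follows: queries come from the source feature map and keys/values from the target, so each source position aggregates context from the target image. The single-head form and dimensions are simplifying assumptions, not the paper's exact modules.

```python
# Cross-attention between two feature maps: source queries, target keys/values.
import torch
import torch.nn as nn

class CrossSpatialAttention(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.q = nn.Conv2d(c, c // 8, 1)
        self.k = nn.Conv2d(c, c // 8, 1)
        self.v = nn.Conv2d(c, c, 1)

    def forward(self, src, tgt):                      # (N, C, H, W) each
        n, c, h, w = src.shape
        q = self.q(src).flatten(2).transpose(1, 2)    # (N, HW, C/8)
        k = self.k(tgt).flatten(2)                    # (N, C/8, HW)
        attn = torch.softmax(q @ k, dim=-1)           # source->target affinities
        v = self.v(tgt).flatten(2).transpose(1, 2)    # (N, HW, C)
        out = (attn @ v).transpose(1, 2).reshape(n, c, h, w)
        return src + out                              # residual fusion

y = CrossSpatialAttention(32)(torch.randn(2, 32, 16, 16), torch.randn(2, 32, 16, 16))
```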

Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift

Title Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift
Authors Remi Tachet des Combes, Han Zhao, Yu-Xiang Wang, Geoff Gordon
Abstract Adversarial learning has demonstrated good performance in the unsupervised domain adaptation setting, by learning domain-invariant representations that perform well on the source domain. However, recent work has underlined limitations of existing methods in the presence of mismatched label distributions between the source and target domains. In this paper, we extend a recent upper bound on the performance of adversarial domain adaptation to multi-class classification and more general discriminators. We then propose generalized label shift (GLS) as a way to improve robustness against mismatched label distributions. GLS states that, conditioned on the label, there exists a representation of the input that is invariant between the source and target domains. Under GLS, we provide theoretical guarantees on the transfer performance of any classifier. We also devise necessary and sufficient conditions for GLS to hold. The conditions are based on the estimation of the relative class weights between domains and on an appropriate reweighting of samples. Guided by our theoretical insights, we modify three widely used algorithms, JAN, DANN, and CDAN, and evaluate their performance on standard domain adaptation tasks, where our method outperforms the base versions. We also demonstrate significant gains on artificially created tasks with large divergences between their source and target label distributions. (A sketch of class-ratio estimation for such reweighting follows this entry.)
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2020-03-10
URL https://arxiv.org/abs/2003.04475v1
PDF https://arxiv.org/pdf/2003.04475v1.pdf
PWC https://paperswithcode.com/paper/domain-adaptation-with-conditional
Repo
Framework
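
The class-ratio estimation that GLS-style reweighting needs can be sketched in the spirit of black-box shift estimation: solve C w = mu, where C is the classifier's confusion behavior on source data and mu its prediction distribution on the target. This is an illustrative stand-in, not necessarily the estimator used in the paper.

```python
# Estimate target/source class ratios from a classifier's predictions.
import numpy as np

def class_weights(src_preds, src_labels, tgt_preds, n_classes):
    C = np.zeros((n_classes, n_classes))
    for p, y in zip(src_preds, src_labels):
        C[p, y] += 1.0 / len(src_labels)       # C[i, j] = P_src(pred=i, label=j)
    mu = np.bincount(tgt_preds, minlength=n_classes) / len(tgt_preds)
    w = np.linalg.solve(C, mu)                 # w[j] ~= P_tgt(y=j) / P_src(y=j)
    return np.clip(w, 0.0, None)               # source sample i then gets weight w[y_i]

rng = np.random.default_rng(0)
src_labels = rng.integers(0, 3, 1000)
# toy case: a perfect classifier, so predictions equal labels on the source
w = class_weights(src_labels, src_labels, rng.integers(0, 3, 500), 3)
print(w)
```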

Visual Commonsense R-CNN

Title Visual Commonsense R-CNN
Authors Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun
Abstract We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA. Given a set of detected object regions in an image (e.g., using Faster R-CNN), like any other unsupervised feature learning method (e.g., word2vec), the proxy training objective of VC R-CNN is to predict the contextual objects of a region. However, they are fundamentally different: the prediction of VC R-CNN uses causal intervention, P(Y|do(X)), while others use the conventional likelihood, P(Y|X). This is also the core reason why VC R-CNN can learn “sense-making” knowledge, such as that a chair can be sat on, rather than just “common” co-occurrences, such as that a chair is likely to exist if a table is observed. We extensively apply VC R-CNN features in prevailing models of three popular tasks: Image Captioning, VQA, and VCR, and observe consistent performance boosts across them, achieving many new state-of-the-art results. Code and features are available at https://github.com/Wangt-CN/VC-R-CNN. (A toy illustration of intervention versus likelihood follows this entry.)
Tasks Image Captioning, Representation Learning, Visual Question Answering
Published 2020-02-27
URL https://arxiv.org/abs/2002.12204v2
PDF https://arxiv.org/pdf/2002.12204v2.pdf
PWC https://paperswithcode.com/paper/visual-commonsense-r-cnn
Repo
Framework
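
The observational/interventional distinction the abstract draws can be shown with a few lines of numpy: backdoor adjustment computes P(Y|do(X)) as sum_z P(Y|X,z) P(z), which can differ sharply from P(Y|X) when Z confounds X and Y. All numbers are invented.

```python
# Backdoor adjustment over a confounder Z versus conditioning on X.
import numpy as np

p_z = np.array([0.8, 0.2])                 # confounder, e.g. scene context
p_x_given_z = np.array([0.9, 0.1])         # P(X=1 | z)
p_y_given_xz = np.array([[0.2, 0.9],       # P(Y=1 | X=0, z)
                         [0.3, 0.95]])     # P(Y=1 | X=1, z)

# Observational: P(Y=1 | X=1) = sum_z P(Y=1 | X=1, z) P(z | X=1)
p_z_given_x1 = p_x_given_z * p_z / (p_x_given_z * p_z).sum()
p_y_obs = (p_y_given_xz[1] * p_z_given_x1).sum()

# Interventional: P(Y=1 | do(X=1)) = sum_z P(Y=1 | X=1, z) P(z)
p_y_do = (p_y_given_xz[1] * p_z).sum()
print(p_y_obs, p_y_do)   # ~0.32 vs 0.43: the two disagree under confounding
```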

An Advance on Variable Elimination with Applications to Tensor-Based Computation

Title An Advance on Variable Elimination with Applications to Tensor-Based Computation
Authors Adnan Darwiche
Abstract We present new results on the classical algorithm of variable elimination, which underlies many algorithms, including those for probabilistic inference. The results relate to exploiting functional dependencies, allowing one to perform inference and learning efficiently on models that have very large treewidth. The highlight of the advance is that it works with standard (dense) factors, without the need for sparse factors or techniques based on knowledge compilation that are commonly utilized. This is significant, as it permits a direct implementation of the improved variable elimination algorithm using tensors and their operations, leading to extremely efficient implementations, especially when learning model parameters. Moreover, the proposed technique does not require knowledge of the specific functional dependencies, only that they exist, and so can be used when learning these dependencies. We illustrate the efficacy of our proposed algorithm by compiling Bayesian network queries into tensor graphs and then learning their parameters from labeled data using a standard tool for tensor computation. (A minimal tensor-based sketch of classical variable elimination follows this entry.)
Tasks
Published 2020-02-21
URL https://arxiv.org/abs/2002.09320v1
PDF https://arxiv.org/pdf/2002.09320v1.pdf
PWC https://paperswithcode.com/paper/an-advance-on-variable-elimination-with
Repo
Framework
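
For reference, here is a minimal sketch of the classical variable elimination algorithm with dense tensors, where each elimination is an einsum contraction; it shows only the baseline the paper improves upon.

```python
# Chain A -> B -> C with binary variables; factors are dense numpy arrays.
import numpy as np

pA = np.array([0.6, 0.4])
pB_A = np.array([[0.7, 0.3], [0.2, 0.8]])   # pB_A[a, b] = P(B=b | A=a)
pC_B = np.array([[0.9, 0.1], [0.5, 0.5]])   # pC_B[b, c] = P(C=c | B=b)

# Eliminate A, then B, to get the marginal P(C):
phi_B = np.einsum("a,ab->b", pA, pB_A)      # sum_a P(A) P(B|A)
pC = np.einsum("b,bc->c", phi_B, pC_B)      # sum_b phi(B) P(C|B)
print(pC)                                    # equals einsum("a,ab,bc->c", ...)
```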

Measuring and improving the quality of visual explanations

Title Measuring and improving the quality of visual explanations
Authors Agnieszka Grabska-Barwińska
Abstract The ability to explain neural network decisions goes hand in hand with their safe deployment. Several methods have been proposed to highlight the features important for a given network decision. However, there is no consensus on how to measure the effectiveness of these methods. We propose a new procedure for evaluating explanations. We use it to investigate visual explanations extracted from a range of possible sources in a neural network. We quantify the benefit of combining these sources and challenge a recent appeal for taking bias parameters into account. We support our conclusions with a general assessment of the impact of bias parameters in ImageNet classifiers. (A generic deletion-style evaluation sketch follows this entry.)
Tasks
Published 2020-03-14
URL https://arxiv.org/abs/2003.08774v2
PDF https://arxiv.org/pdf/2003.08774v2.pdf
PWC https://paperswithcode.com/paper/measuring-and-improving-the-quality-of-visual
Repo
Framework
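
The abstract does not spell out its evaluation procedure, so the sketch below shows a generic deletion-style check for saliency maps instead: occlude the pixels an explanation ranks highest and record how fast the class score drops. The stand-in `score_fn` would be a network's class probability in practice.

```python
# Generic deletion metric: better explanations make the score drop faster.
import numpy as np

def deletion_curve(image, saliency, score_fn, steps=10):
    order = np.argsort(saliency.ravel())[::-1]     # most "important" first
    x = image.copy().ravel()
    scores = [score_fn(x.reshape(image.shape))]
    for chunk in np.array_split(order, steps):
        x[chunk] = 0.0                             # occlude next chunk of pixels
        scores.append(score_fn(x.reshape(image.shape)))
    return np.array(scores)

img = np.random.rand(8, 8)
curve = deletion_curve(img, img, lambda z: z.sum())  # toy score: pixel sum
print(curve)    # area under this curve is the evaluation number
```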