Paper Group ANR 503
SuperNet – An efficient method of neural networks ensembling
Title | SuperNet – An efficient method of neural networks ensembling |
Authors | Ludwik Bukowski, Witold Dzwinel |
Abstract | The main flaw of neural network ensembling is that it is exceptionally demanding computationally, especially if the individual sub-models are large neural networks that must be trained separately. Bearing in mind that modern DNNs can be very accurate, that they are already huge ensembles of simple classifiers, and that one can construct a more thrifty compressed neural net of similar performance for any ensemble, the idea of designing expensive SuperNets can be questioned. The widespread belief that ensembling increases prediction time makes it unattractive, and may be the reason the mainstream of ML research is directed towards developing better loss functions and learning strategies for more advanced and efficient neural networks. On the other hand, all these factors make architectures more complex, which may lead to overfitting and high computational complexity, that is, to the same flaws for which highly parametrized SuperNet ensembles are blamed. The goal of this master thesis is to speed up the execution time required for ensemble generation. Instead of training K inaccurate sub-models, each sub-model can represent a different phase of training (a different local minimum of the loss function) of a single DNN [Huang et al., 2017; Garipov et al., 2018]. Thus, the computational cost of the SuperNet can be comparable to the CPU time spent on training its single sub-model, plus the usually much shorter CPU time required for training the SuperNet coupling factors. (A minimal sketch of this idea follows the entry.) |
Tasks | |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.13021v1 |
https://arxiv.org/pdf/2003.13021v1.pdf | |
PWC | https://paperswithcode.com/paper/supernet-an-efficient-method-of-neural |
Repo | |
Framework | |
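Below is a minimal sketch of the snapshot-based SuperNet idea described in the abstract above: checkpoints of a single training run serve as sub-models, and only the coupling factors that blend their outputs are trained afterwards. The model, loader, and hyperparameter names are illustrative assumptions, not the thesis code.

```python
import copy
import torch
from torch import nn

def train_with_snapshots(model, loader, epochs, snapshot_every):
    """Train one DNN, keeping frozen copies from different training phases."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    snapshots = []
    for epoch in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            nn.functional.cross_entropy(model(x), y).backward()
            opt.step()
        if (epoch + 1) % snapshot_every == 0:
            snapshots.append(copy.deepcopy(model).eval())  # one sub-model per phase
    return snapshots

class SuperNet(nn.Module):
    """Ensemble whose only trainable parameters are the coupling factors."""
    def __init__(self, snapshots):
        super().__init__()
        self.snapshots = snapshots                 # plain list: frozen, not trained
        self.coupling = nn.Parameter(torch.zeros(len(snapshots)))
    def forward(self, x):
        w = torch.softmax(self.coupling, dim=0)
        with torch.no_grad():                      # sub-models stay fixed
            logits = torch.stack([m(x) for m in self.snapshots])  # (K, B, C)
        return (w[:, None, None] * logits).sum(0)
```

Training only the K coupling factors on held-out data costs far less than training K independent networks, which is the efficiency argument the abstract makes.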
SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives
Title | SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives |
Authors | Emmanouil Angelis, Philippe Wenk, Bernhard Schölkopf, Stefan Bauer, Andreas Krause |
Abstract | Gaussian processes are an important regression tool with excellent analytic properties that allow for direct integration of derivative observations. However, vanilla GP methods scale cubically in the number of observations. In this work, we propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features. We then prove deterministic, non-asymptotic, and exponentially fast decaying error bounds that apply to both the approximated kernel and the approximated posterior. To further illustrate the practical applicability of our method, we apply it to ODIN, a recently developed algorithm for ODE parameter inference. In extensive experiments, all results are empirically validated, demonstrating the speed, accuracy, and practical applicability of this approach. (A minimal feature-expansion sketch follows the entry.) |
Tasks | Gaussian Processes |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02658v1 |
https://arxiv.org/pdf/2003.02658v1.pdf | |
PWC | https://paperswithcode.com/paper/sleipnir-deterministic-and-provably-accurate |
Repo | |
Framework | |
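As a concrete (and much simplified) illustration of quadrature Fourier features, the 1-D sketch below builds a deterministic feature map for an RBF kernel together with its derivative, so that k(x, x') ≈ φ(x)ᵀφ(x'). It is not the paper's implementation; the Gauss–Hermite construction is our assumption of the general recipe, with constants matching k(r) = exp(-r²/(2ℓ²)).

```python
import numpy as np

def qff(x, m=32, lengthscale=1.0):
    """Deterministic feature map phi(x) with k(x, x') ~ phi(x) @ phi(x'),
    plus dphi/dx for derivative observations."""
    t, a = np.polynomial.hermite.hermgauss(m)   # Gauss-Hermite nodes and weights
    omega = np.sqrt(2.0) * t / lengthscale      # frequencies of the RBF spectrum
    scale = np.sqrt(a / np.sqrt(np.pi))         # per-node amplitude
    phi  = np.concatenate([scale * np.cos(np.outer(x, omega)),
                           scale * np.sin(np.outer(x, omega))], axis=1)
    dphi = np.concatenate([-scale * omega * np.sin(np.outer(x, omega)),
                            scale * omega * np.cos(np.outer(x, omega))], axis=1)
    return phi, dphi

x = np.linspace(-2, 2, 5)
phi, _ = qff(x)
K_approx = phi @ phi.T
K_exact = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)   # RBF with lengthscale 1
print(np.max(np.abs(K_approx - K_exact)))   # error decays exponentially fast in m
```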
Analysis of the quotation corpus of the Russian Wiktionary
Title | Analysis of the quotation corpus of the Russian Wiktionary |
Authors | A. Smirnov, T. Levashova, A. Karpov, I. Kipyatkova, A. Ronzhin, A. Krizhanovsky, N. Krizhanovsky |
Abstract | A quantitative evaluation of quotations in the Russian Wiktionary was performed using a purpose-built Wiktionary parser. It was found that the number of quotations in the dictionary is growing fast (51.5 thousand in 2011, 62 thousand in 2012). These quotations were extracted and saved in the relational database of a machine-readable dictionary, for which tables related to the quotations were designed. A histogram of the distribution of quotations over the years in which the quoted literary works were written was built. An attempt was made to explain the features of the histogram by relating them to the active years of the nineteenth-century writers most popular and most frequently cited in the Russian Wiktionary. It was found that more than one-third of all quotations (example sentences) in the Russian Wiktionary were taken by the editors of Wiktionary entries from the Russian National Corpus. (A small query sketch follows the entry.) |
Tasks | |
Published | 2020-01-20 |
URL | https://arxiv.org/abs/2002.00734v1 |
https://arxiv.org/pdf/2002.00734v1.pdf | |
PWC | https://paperswithcode.com/paper/analysis-of-the-quotation-corpus-of-the |
Repo | |
Framework | |
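For illustration only, a query of the kind that could produce the per-year histogram described above might look as follows; the table and column names are hypothetical, not the actual schema of the machine-readable dictionary.

```python
import sqlite3

# Hypothetical schema: quotation(id, entry, text, work_year, source).
conn = sqlite3.connect("wiktionary.db")
rows = conn.execute(
    """SELECT work_year, COUNT(*) AS n_quotes
       FROM quotation
       WHERE work_year IS NOT NULL
       GROUP BY work_year
       ORDER BY work_year"""
).fetchall()
for year, n in rows:                      # crude text histogram of the counts
    print(f"{year}: {'#' * (n // 100)}")  # one '#' per 100 quotations
```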
Is the Meta-Learning Idea Able to Improve the Generalization of Deep Neural Networks on the Standard Supervised Learning?
Title | Is the Meta-Learning Idea Able to Improve the Generalization of Deep Neural Networks on the Standard Supervised Learning? |
Authors | Xiang Deng, Zhongfei Zhang |
Abstract | Substantial efforts have been made to improve the generalization abilities of deep neural networks (DNNs) in order to obtain better performance without introducing more parameters. On the other hand, meta-learning approaches exhibit powerful generalization on new tasks in few-shot learning. Intuitively, few-shot learning is more challenging than standard supervised learning, as each target class has only very few or no training samples. The natural question that arises is whether the meta-learning idea can be used to improve the generalization of DNNs on standard supervised learning. In this paper, we propose a novel meta-learning based training procedure (MLTP) for DNNs and demonstrate that the meta-learning idea can indeed improve their generalization abilities. MLTP simulates the meta-training process by considering a batch of training samples as a task. The key idea is that the gradient descent step for improving the current task performance should also improve the performance on a new task, which is ignored by the current standard procedure for training neural networks. MLTP also benefits from all existing training techniques, such as dropout, weight decay, and batch normalization. We evaluate MLTP by training a variety of small and large neural networks on three benchmark datasets, i.e., CIFAR-10, CIFAR-100, and Tiny ImageNet. The experimental results show consistently improved generalization performance for all DNNs of different sizes, which verifies the promise of MLTP and demonstrates that the meta-learning idea is indeed able to improve the generalization of DNNs on standard supervised learning. (A first-order sketch of the update follows the entry.) |
Tasks | Few-Shot Learning, Meta-Learning |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12455v1 |
https://arxiv.org/pdf/2002.12455v1.pdf | |
PWC | https://paperswithcode.com/paper/is-the-meta-learning-idea-able-to-improve-the |
Repo | |
Framework | |
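The sketch below shows a first-order reading of the MLTP idea: the parameters adapted by a gradient step on one batch ("task" A) are also asked to do well on a second batch (B). The single inner step and the first-order approximation are our simplifications of the procedure outlined in the abstract.

```python
from torch.func import functional_call, grad

def mltp_step(model, loss_fn, batch_a, batch_b, inner_lr, optimizer):
    """One meta-style update: a step that helps batch A should also help batch B."""
    params = dict(model.named_parameters())

    def task_loss(p, batch):
        x, y = batch
        return loss_fn(functional_call(model, p, (x,)), y)

    g = grad(task_loss)(params, batch_a)                 # inner gradient on "task" A
    adapted = {k: v - inner_lr * g[k] for k, v in params.items()}
    # Outer objective: current-task loss plus the adapted parameters' loss on B.
    meta_loss = task_loss(params, batch_a) + task_loss(adapted, batch_b)
    optimizer.zero_grad()
    meta_loss.backward()
    optimizer.step()
    return float(meta_loss)
```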
Exemplar Normalization for Learning Deep Representation
Title | Exemplar Normalization for Learning Deep Representation |
Authors | Ruimao Zhang, Zhanglin Peng, Lingyun Wu, Zhen Li, Ping Luo |
Abstract | Normalization techniques matter across advanced neural network architectures and tasks. This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn different normalization methods for different convolutional layers and image samples of a deep network. EN significantly improves on the flexibility of the recently proposed switchable normalization (SN), which solves a static L2N problem by linearly combining several normalizers in each normalization layer (the combination being the same for all samples). Instead of directly employing a multi-layer perceptron (MLP) to learn data-dependent parameters as conditional batch normalization (cBN) does, the internal architecture of EN is carefully designed to stabilize its optimization, leading to many appealing benefits. (1) EN enables different convolutional layers, image samples, categories, benchmarks, and tasks to use different normalization methods, shedding light on analyzing them in a holistic view. (2) EN is effective for various network architectures and tasks. (3) It can replace any normalization layer in a deep network and still produce stable model training. Extensive experiments demonstrate the effectiveness of EN in a wide spectrum of tasks, including image recognition, noisy-label learning, and semantic segmentation. For example, when replacing BN in an ordinary ResNet50, the improvement produced by EN is 300% larger than that of SN on both ImageNet and the noisy WebVision dataset. (A simplified layer sketch follows the entry.) |
Tasks | Semantic Segmentation |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08761v2 |
https://arxiv.org/pdf/2003.08761v2.pdf | |
PWC | https://paperswithcode.com/paper/exemplar-normalization-for-learning-deep |
Repo | |
Framework | |
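A simplified exemplar-normalization-style layer is sketched below: a small head computes per-sample weights that mix batch, instance, and layer normalization statistics. The three candidate normalizers and the tiny routing head are our assumptions; the paper's actual internal architecture is more carefully designed.

```python
import torch
from torch import nn
import torch.nn.functional as F

class ExemplarNorm2d(nn.Module):
    """Per-sample learned mixture of BN / IN / LN statistics (illustrative)."""
    def __init__(self, channels, hidden=16):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, affine=False)
        self.inorm = nn.InstanceNorm2d(channels, affine=False)
        self.router = nn.Sequential(             # small head instead of a plain MLP,
            nn.Linear(channels, hidden), nn.ReLU(),  # to keep optimization stable
            nn.Linear(hidden, 3))
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        w = torch.softmax(self.router(x.mean(dim=(2, 3))), dim=1)   # (B, 3) per sample
        cands = torch.stack([self.bn(x), self.inorm(x),
                             F.layer_norm(x, x.shape[1:])], dim=1)  # (B, 3, C, H, W)
        out = (w[:, :, None, None, None] * cands).sum(1)
        return self.gamma * out + self.beta
```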
Meta-learning for mixed linear regression
Title | Meta-learning for mixed linear regression |
Authors | Weihao Kong, Raghav Somani, Zhao Song, Sham Kakade, Sewoong Oh |
Abstract | In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labeled data. These include data from medical image processing and robotic interaction. Even though each individual task cannot be meaningfully trained in isolation, one seeks to meta-learn across the tasks from past experiences by exploiting some similarities. We study a fundamental question of interest: when can abundant tasks with small data compensate for a lack of tasks with big data? We focus on a canonical scenario where each task is drawn from a mixture of $k$ linear regressions, and identify sufficient conditions for such a graceful exchange to hold; the total number of examples necessary with only small-data tasks scales similarly to when big-data tasks are available. To this end, we introduce a novel spectral approach and show that we can efficiently utilize small-data tasks with the help of $\tilde\Omega(k^{3/2})$ medium-data tasks, each with $\tilde\Omega(k^{1/2})$ examples. (A toy moment-based sketch follows the entry.) |
Tasks | Meta-Learning |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08936v1 |
https://arxiv.org/pdf/2002.08936v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-for-mixed-linear-regression |
Repo | |
Framework | |
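A toy rendering of the spectral idea: even tasks with just two examples each reveal the subspace spanned by the $k$ regression vectors, since for isotropic Gaussian inputs E[y₁y₂ x₁x₂ᵀ] = ββᵀ within a task. The code below is our illustrative reduction, not the paper's full algorithm.

```python
import numpy as np

def regressor_subspace(tasks, k):
    """tasks: list of (X, y) with at least 2 examples each.
    Returns a (d, k) orthonormal basis for the span of the regression vectors."""
    d = tasks[0][0].shape[1]
    M = np.zeros((d, d))
    for X, y in tasks:
        M += y[0] * y[1] * np.outer(X[0], X[1])   # unbiased rank-one moment term
    M = (M + M.T) / (2 * len(tasks))              # symmetrize the moment estimate
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, np.argsort(np.abs(eigvals))[-k:]]   # top-k eigenspace
```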
A Structured Prediction Approach for Conditional Meta-Learning
Title | A Structured Prediction Approach for Conditional Meta-Learning |
Authors | Ruohan Wang, Yiannis Demiris, Carlo Ciliberto |
Abstract | Optimization-based meta-learning algorithms are a powerful class of methods for learning-to-learn applications such as few-shot learning. They tackle the limited availability of training data by leveraging the experience gained from previously observed tasks. However, when the complexity of the task distribution cannot be captured by a single set of shared meta-parameters, existing methods may fail to fully adapt to a target task. We address this issue with a novel perspective on conditional meta-learning based on structured prediction. We propose task-adaptive structured meta-learning (TASML), a principled estimator that weighs meta-training data conditioned on the target task in order to design tailored meta-learning objectives. In addition, we introduce algorithmic improvements to tackle key computational limitations of existing methods. Experimentally, we show that TASML outperforms state-of-the-art methods on benchmark datasets in terms of both accuracy and efficiency. An ablation study quantifies the individual contribution of model components and suggests useful practices for meta-learning. (A schematic weighting sketch follows the entry.) |
Tasks | Few-Shot Learning, Meta-Learning, Structured Prediction |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08799v1 |
https://arxiv.org/pdf/2002.08799v1.pdf | |
PWC | https://paperswithcode.com/paper/a-structured-prediction-approach-for-2 |
Repo | |
Framework | |
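Schematically, the conditional estimator weighs each meta-training task by its relevance to the target task and uses those weights in the meta-objective. The Gaussian-kernel weighting over task embeddings below is an illustrative stand-in for the structured prediction estimator in the paper.

```python
import numpy as np

def task_weights(train_embeddings, target_embedding, bandwidth=1.0):
    """Weight meta-training tasks by similarity to the target task."""
    d2 = np.sum((train_embeddings - target_embedding) ** 2, axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    return w / w.sum()

def tailored_meta_objective(meta_losses, weights):
    """meta_losses[i]: loss of the meta-learner on training task i."""
    return float(np.dot(weights, meta_losses))   # target-conditioned objective
```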
Discoverability in Satellite Imagery: A Good Sentence is Worth a Thousand Pictures
Title | Discoverability in Satellite Imagery: A Good Sentence is Worth a Thousand Pictures |
Authors | David Noever, Wes Regian, Matt Ciolino, Josh Kalin, Dom Hambrick, Kaye Blankenship |
Abstract | Small satellite constellations provide daily global coverage of the Earth's landmass, but image enrichment relies on automating key tasks like change detection or feature searches. For example, extracting text annotations from raw pixels requires two dependent machine learning models, one to analyze the overhead image and the other to generate a descriptive caption. We evaluate seven models on what was previously the largest benchmark for satellite image captions. We extend the labeled image samples five-fold, then augment, correct, and prune the vocabulary to approach a rough min-max (minimum words, maximum description). This outcome compares favorably to previous work with large pre-trained image models but offers a hundred-fold reduction in model size without sacrificing overall accuracy (when measured with log entropy loss). These smaller models provide new deployment opportunities, particularly when pushed to edge processors, on-board satellites, or distributed ground stations. To quantify a caption's descriptiveness, we introduce a novel multi-class confusion (error) matrix to score both human-labeled test data and never-labeled images that include bounding-box detections but lack full-sentence captions. This work suggests future captioning strategies, particularly ones that can enrich class coverage beyond land-use applications and lessen color-centered and adjacency adjectives ("green", "near", "between", etc.). Many modern language transformers present novel and exploitable models with world knowledge gleaned from training on vast online corpora. One interesting but simple example might learn the word association between wind and waves, thus enriching a beach scene with more than just the color descriptions that could be extracted from raw pixels without text annotation. (A small scoring sketch follows the entry.) |
Tasks | Image Captioning |
Published | 2020-01-03 |
URL | https://arxiv.org/abs/2001.05839v1 |
https://arxiv.org/pdf/2001.05839v1.pdf | |
PWC | https://paperswithcode.com/paper/discoverability-in-satellite-imagery-a-good |
Repo | |
Framework | |
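The caption-scoring idea can be illustrated with an ordinary multi-class confusion matrix over scene classes: its diagonal mass counts correctly described images, and it applies equally to human-labeled test data and to images that only carry detection classes. The class indices below are placeholders.

```python
import numpy as np

def confusion(true_classes, pred_classes, n_classes):
    """Confusion matrix C[t, p]: images of class t described as class p."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_classes, pred_classes):
        C[t, p] += 1
    return C

C = confusion([0, 1, 2, 1], [0, 2, 2, 1], n_classes=3)
accuracy = np.trace(C) / C.sum()   # diagonal mass = correctly described images
print(C, accuracy)
```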
ARAACOM: ARAbic Algerian Corpus for Opinion Mining
Title | ARAACOM: ARAbic Algerian Corpus for Opinion Mining |
Authors | Hichem Rahab, Abdelhafid Zitouni, Mahieddine Djoudi |
Abstract | Nowadays, there is no longer any need to make an enormous effort to distribute forms to thousands of people, collect them, and convert them into electronic format in order to track people's opinions on a subject. Websites can today reach a large audience with far less effort. Most websites invite their visitors to leave feedback on how they feel about the site or about events. This yields a large amount of data that requires powerful means to exploit. Opinion mining on the web is becoming an ever more attractive task, due to the increasing need of individuals and societies to track the mood of people on several subjects of daily life (sports, politics, television, ...). A lot of work in opinion mining has been carried out on Western languages, especially English, while such work in Arabic is still very scarce. In this paper, we propose our approach to opinion mining in Arabic Algerian newspapers. (A generic pipeline sketch follows the entry.) CCS Concepts: • Information systems → Sentiment analysis; • Computing methodologies → Natural language processing |
Tasks | Opinion Mining, Sentiment Analysis |
Published | 2020-01-22 |
URL | https://arxiv.org/abs/2001.08010v1 |
https://arxiv.org/pdf/2001.08010v1.pdf | |
PWC | https://paperswithcode.com/paper/araacom-arabic-algerian-corpus-for-opinion |
Repo | |
Framework | |
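The abstract does not detail the classification approach, so the pipeline below is only a generic sketch of the kind of supervised opinion mining such a corpus supports; the character n-gram features and logistic regression are our choices, not the paper's.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["...Arabic comment 1...", "...Arabic comment 2..."]   # corpus documents
labels = [1, 0]                                  # 1 = positive, 0 = negative opinion

# Character n-grams within word boundaries cope with rich Arabic morphology.
clf = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["...new reader comment..."]))
```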
Online Joint Bid/Daily Budget Optimization of Internet Advertising Campaigns
Title | Online Joint Bid/Daily Budget Optimization of Internet Advertising Campaigns |
Authors | Alessandro Nuara, Francesco Trovò, Nicola Gatti, Marcello Restelli |
Abstract | Pay-per-click advertising includes various formats (\emph{e.g.}, search, contextual, social) with a total investment of more than 200 billion USD per year worldwide. An advertiser is given a daily budget to allocate over several, even thousands of, campaigns, distinguished mainly by ad, target, or channel. Furthermore, publishers choose the ads to display and how to allocate them by employing auctioning mechanisms in which, every day, the advertisers set for each campaign a bid, corresponding to the maximum amount of money per click they are willing to pay, and the fraction of the daily budget to invest. In this paper, we study the problem of automating the online joint bid/daily budget optimization of pay-per-click advertising campaigns over multiple channels. We formulate our problem as a combinatorial semi-bandit problem, which requires solving a special case of the Multiple-Choice Knapsack problem every day. Furthermore, for every campaign, we capture the dependency of the number of clicks on the bid and daily budget with Gaussian Processes, thus requiring only mild assumptions on the regularity of these functions. We design four algorithms and show that they suffer a regret that is upper bounded, with high probability, as $O(\sqrt{T})$, where $T$ is the time horizon of the learning process. We experimentally evaluate our algorithms in synthetic settings generated from real Yahoo! data, and we present the results of the adoption of our algorithms in a real-world application with a daily average spend of 1,000 Euros for more than one year. (A stylized planning sketch follows the entry.) |
Tasks | Gaussian Processes |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01452v1 |
https://arxiv.org/pdf/2003.01452v1.pdf | |
PWC | https://paperswithcode.com/paper/online-joint-biddaily-budget-optimization-of |
Repo | |
Framework | |
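One day of the learning loop can be caricatured as below: a Gaussian process per campaign models clicks as a function of spend, a sampled curve gives optimistic click estimates, and a multiple-choice knapsack allocates the daily budget. The even budget grid and Thompson-style sampling are our simplifications of the four algorithms in the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def plan_day(gps, budget_grid, total_budget):
    """gps: one fitted GaussianProcessRegressor (spend -> clicks) per campaign.
    budget_grid: evenly spaced spends starting at 0. Returns a grid index per
    campaign, maximizing sampled total clicks subject to the daily budget."""
    n, m = len(gps), len(budget_grid)
    clicks = np.array([gp.sample_y(budget_grid.reshape(-1, 1)).ravel()
                       for gp in gps])                    # (n, m) sampled click curves
    unit = budget_grid[1] - budget_grid[0]
    steps = np.round(budget_grid / unit).astype(int)      # budgets in grid units
    cap = int(round(total_budget / unit))
    best = np.full((n + 1, cap + 1), -np.inf)
    best[0, 0] = 0.0
    choice = np.zeros((n, cap + 1), dtype=int)
    for i in range(n):                                    # multiple-choice knapsack DP
        for c in range(cap + 1):
            for j in range(m):
                if steps[j] <= c and best[i, c - steps[j]] + clicks[i, j] > best[i + 1, c]:
                    best[i + 1, c] = best[i, c - steps[j]] + clicks[i, j]
                    choice[i, c] = j
    c = int(np.argmax(best[n]))                           # best reachable spend level
    plan = []
    for i in range(n - 1, -1, -1):                        # backtrack chosen budgets
        plan.append(int(choice[i, c]))
        c -= steps[choice[i, c]]
    return plan[::-1]
```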
Context-Aware Domain Adaptation in Semantic Segmentation
Title | Context-Aware Domain Adaptation in Semantic Segmentation |
Authors | Jinyu Yang, Weizhi An, Chaochao Yan, Peilin Zhao, Junzhou Huang |
Abstract | In this paper, we consider the problem of unsupervised domain adaptation in semantic segmentation. There are two primary issues in this field, i.e., what domain knowledge to transfer across two domains and how to transfer it. Existing methods mainly focus on adapting domain-invariant features (what to transfer) through adversarial learning (how to transfer). Context dependency is essential for semantic segmentation; however, its transferability is still not well understood, and how to transfer contextual information across two domains remains unexplored. Motivated by this, we propose a cross-attention mechanism based on self-attention to capture context dependencies between two domains and adapt transferable context. To achieve this goal, we design two cross-domain attention modules to adapt context dependencies from both spatial and channel views. Specifically, the spatial attention module captures local feature dependencies between each position in the source and target images. The channel attention module models semantic dependencies between each pair of cross-domain channel maps. To adapt context dependencies, we further selectively aggregate the context information from the two domains. The superiority of our method over existing state-of-the-art methods is demonstrated empirically on "GTA5 to Cityscapes" and "SYNTHIA to Cityscapes". (A minimal attention-module sketch follows the entry.) |
Tasks | Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.04010v1 |
https://arxiv.org/pdf/2003.04010v1.pdf | |
PWC | https://paperswithcode.com/paper/context-aware-domain-adaptation-in-semantic |
Repo | |
Framework | |
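A minimal cross-domain spatial attention block in the spirit of the abstract is sketched below: queries come from the target feature map and keys/values from the source map, so each target position aggregates source-domain context. The layer sizes, the residual gate, and the assumption of equal spatial sizes are ours, not the released module.

```python
import torch
from torch import nn

class CrossDomainSpatialAttention(nn.Module):
    """Target positions attend over source positions (illustrative sketch)."""
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or max(1, channels // 8)
        self.q = nn.Conv2d(channels, reduced, 1)
        self.k = nn.Conv2d(channels, reduced, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learned residual gate

    def forward(self, target_feat, source_feat):
        B, C, H, W = target_feat.shape
        q = self.q(target_feat).flatten(2).transpose(1, 2)   # (B, HW, r)
        k = self.k(source_feat).flatten(2)                   # (B, r, HW)
        attn = torch.softmax(q @ k, dim=-1)                  # target-to-source affinity
        v = self.v(source_feat).flatten(2).transpose(1, 2)   # (B, HW, C)
        ctx = (attn @ v).transpose(1, 2).reshape(B, C, H, W)
        return target_feat + self.gamma * ctx
```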
Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift
Title | Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift |
Authors | Remi Tachet des Combes, Han Zhao, Yu-Xiang Wang, Geoff Gordon |
Abstract | Adversarial learning has demonstrated good performance in the unsupervised domain adaptation setting by learning domain-invariant representations that perform well on the source domain. However, recent work has underlined limitations of existing methods in the presence of mismatched label distributions between the source and target domains. In this paper, we extend a recent upper bound on the performance of adversarial domain adaptation to multi-class classification and more general discriminators. We then propose generalized label shift (GLS) as a way to improve robustness against mismatched label distributions. GLS states that, conditioned on the label, there exists a representation of the input that is invariant between the source and target domains. Under GLS, we provide theoretical guarantees on the transfer performance of any classifier. We also devise necessary and sufficient conditions for GLS to hold. The conditions are based on the estimation of the relative class weights between domains and on an appropriate reweighting of samples. Guided by our theoretical insights, we modify three widely used algorithms, JAN, DANN, and CDAN, and evaluate their performance on standard domain adaptation tasks, where our method outperforms the base versions. We also demonstrate significant gains on artificially created tasks with large divergences between their source and target label distributions. (A compact weight-estimation sketch follows the entry.) |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04475v1 |
https://arxiv.org/pdf/2003.04475v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-adaptation-with-conditional |
Repo | |
Framework | |
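The relative class weights the abstract refers to can be estimated from classifier outputs alone. The sketch below uses the standard confusion-matrix trick (solve C w = μ, cf. black-box shift estimation), which is an illustrative instance of the estimation step rather than the paper's exact procedure.

```python
import numpy as np

def estimate_class_weights(source_preds, source_labels, target_preds, n_classes):
    """Estimate w[y] = P_target(y) / P_source(y) from hard predictions."""
    C = np.zeros((n_classes, n_classes))
    for p, y in zip(source_preds, source_labels):
        C[p, y] += 1
    C /= len(source_labels)                      # C[p, y] = P_source(pred=p, label=y)
    mu = np.bincount(target_preds, minlength=n_classes) / len(target_preds)
    w, *_ = np.linalg.lstsq(C, mu, rcond=None)   # solve C w = mu
    return np.clip(w, 0, None)                   # weights must be non-negative
```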
Visual Commonsense R-CNN
Title | Visual Commonsense R-CNN |
Authors | Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun |
Abstract | We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA. Given a set of detected object regions in an image (e.g., using Faster R-CNN), like any other unsupervised feature learning method (e.g., word2vec), the proxy training objective of VC R-CNN is to predict the contextual objects of a region. However, they are fundamentally different: VC R-CNN predicts using the causal intervention $P(Y \mid do(X))$, while others use the conventional likelihood $P(Y \mid X)$. This is also the core reason why VC R-CNN can learn "sense-making" knowledge, such as that a chair can be sat on, and not just "common" co-occurrences, such as that a chair is likely to exist if a table is observed. We extensively apply VC R-CNN features in prevailing models for three popular tasks: Image Captioning, VQA, and VCR, and observe consistent performance boosts across them, achieving many new state-of-the-art results. Code and features are available at https://github.com/Wangt-CN/VC-R-CNN. (A tiny intervention-vs-conditioning sketch follows the entry.) |
Tasks | Image Captioning, Representation Learning, Visual Question Answering |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12204v2 |
https://arxiv.org/pdf/2002.12204v2.pdf | |
PWC | https://paperswithcode.com/paper/visual-commonsense-r-cnn |
Repo | |
Framework | |
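The abstract's distinction between conditioning and intervening can be made concrete with the backdoor adjustment $P(Y \mid do(X)) = \sum_z P(Y \mid X, z) P(z)$. The sketch below marginalizes a confounder dictionary that way; the predictor and dictionary are placeholders, not VC R-CNN internals.

```python
import torch

def do_prediction(predictor, x, confounders, priors):
    """Backdoor adjustment: predictor(x, z) -> class logits for one region x;
    confounders: (K, d) dictionary of confounder vectors; priors: (K,) = P(z)."""
    probs = torch.stack([torch.softmax(predictor(x, z), dim=-1)
                         for z in confounders])   # P(Y | X, z) for each z in the dict
    return (priors[:, None] * probs).sum(0)       # marginalize z with weights P(z)

# A plain likelihood model, by contrast, would just return
# torch.softmax(predictor(x, observed_context), dim=-1), i.e. P(Y | X).
```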
An Advance on Variable Elimination with Applications to Tensor-Based Computation
Title | An Advance on Variable Elimination with Applications to Tensor-Based Computation |
Authors | Adnan Darwiche |
Abstract | We present new results on the classical algorithm of variable elimination, which underlies many algorithms, including those for probabilistic inference. The results relate to exploiting functional dependencies, allowing one to perform inference and learning efficiently on models that have very large treewidth. The highlight of the advance is that it works with standard (dense) factors, without the need for sparse factors or the knowledge-compilation techniques that are commonly utilized. This is significant, as it permits a direct implementation of the improved variable elimination algorithm using tensors and their operations, leading to extremely efficient implementations, especially when learning model parameters. Moreover, the proposed technique does not require knowledge of the specific functional dependencies, only that they exist, and so can be used when learning these dependencies. We illustrate the efficacy of our proposed algorithm by compiling Bayesian network queries into tensor graphs and then learning their parameters from labeled data using a standard tool for tensor computation. (A small tensor-based sketch follows the entry.) |
Tasks | |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09320v1 |
https://arxiv.org/pdf/2002.09320v1.pdf | |
PWC | https://paperswithcode.com/paper/an-advance-on-variable-elimination-with |
Repo | |
Framework | |
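Variable elimination maps directly onto dense tensor contractions, which is why the abstract's tensor-based implementation is natural. The sketch below eliminates variables from a small chain network A → B → C with einsum; the network and numbers are a toy example, not the paper's improved algorithm.

```python
import numpy as np

pA  = np.array([0.6, 0.4])                  # P(A)
pBA = np.array([[0.9, 0.1], [0.2, 0.8]])    # P(B | A), rows indexed by A
pCB = np.array([[0.7, 0.3], [0.5, 0.5]])    # P(C | B), rows indexed by B

# Eliminate B: sum_b P(B=b | A) P(C | B=b)  ->  factor over (A, C)
fAC = np.einsum("ab,bc->ac", pBA, pCB)
# Eliminate A: sum_a P(A=a) f(a, C)         ->  marginal over C
pC = np.einsum("a,ac->c", pA, fAC)
print(pC, pC.sum())                          # a proper distribution over C
```

Because each elimination is a dense contraction, the whole query is differentiable, so the factors themselves can be learned with any standard tensor tool, as the abstract describes.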
Measuring and improving the quality of visual explanations
Title | Measuring and improving the quality of visual explanations |
Authors | Agnieszka Grabska-Barwińska |
Abstract | The ability to explain neural network decisions goes hand in hand with their safe deployment. Several methods have been proposed to highlight the features important for a given network decision. However, there is no consensus on how to measure the effectiveness of these methods. We propose a new procedure for evaluating explanations and use it to investigate visual explanations extracted from a range of possible sources in a neural network. We quantify the benefit of combining these sources and challenge a recent appeal for taking bias parameters into account. We support our conclusions with a general assessment of the impact of bias parameters in ImageNet classifiers. (A generic evaluation sketch, not the paper's procedure, follows the entry.) |
Tasks | |
Published | 2020-03-14 |
URL | https://arxiv.org/abs/2003.08774v2 |
https://arxiv.org/pdf/2003.08774v2.pdf | |
PWC | https://paperswithcode.com/paper/measuring-and-improving-the-quality-of-visual |
Repo | |
Framework | |
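The abstract does not specify the proposed evaluation procedure, so the sketch below shows a generic deletion-style check instead: occlude pixels in decreasing order of claimed importance and watch how quickly the class score drops. A faithful explanation makes the curve fall fast. This is a stand-in for illustration, not the paper's method.

```python
import numpy as np

def deletion_curve(model_score, image, saliency, steps=10):
    """model_score(img) -> scalar class score; saliency: HxW importance map."""
    order = np.argsort(saliency.ravel())[::-1]            # most important pixels first
    flat = image.reshape(image.shape[0] * image.shape[1], -1).copy()
    scores = [model_score(flat.reshape(image.shape))]
    chunk = max(1, len(order) // steps)
    for s in range(steps):
        flat[order[s * chunk:(s + 1) * chunk]] = 0.0      # occlude the next chunk
        scores.append(model_score(flat.reshape(image.shape)))
    return np.array(scores)   # area under this curve: lower = better explanation
```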