January 25, 2020

Paper Group ANR 1671

Split Batch Normalization: Improving Semi-Supervised Learning under Domain Shift


Title	Split Batch Normalization: Improving Semi-Supervised Learning under Domain Shift
Authors	Michał Zając, Konrad Żołna, Stanisław Jastrzębski
Abstract	Recent work has shown that using unlabeled data in semi-supervised learning is not always beneficial and can even hurt generalization, especially when there is a class mismatch between the unlabeled and labeled examples. We investigate this phenomenon for image classification on the CIFAR-10 and the ImageNet datasets, and with many other forms of domain shifts applied (e.g. salt-and-pepper noise). Our main contribution is Split Batch Normalization (Split-BN), a technique to improve SSL when the additional unlabeled data comes from a shifted distribution. We achieve it by using separate batch normalization statistics for unlabeled examples. Due to its simplicity, we recommend it as a standard practice. Finally, we analyse how domain shift affects the SSL training process. In particular, we find that during training the statistics of hidden activations in late layers become markedly different between the unlabeled and the labeled examples.
Tasks	Image Classification
Published	2019-04-06
URL	http://arxiv.org/abs/1904.03515v1
PDF	http://arxiv.org/pdf/1904.03515v1.pdf
PWC	https://paperswithcode.com/paper/split-batch-normalization-improving-semi
Repo
Framework
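
The core idea in the abstract above, separate batch normalization statistics for labeled and unlabeled examples, can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function names `split_batch_norm` and `joint_batch_norm` are hypothetical, and learnable scale/shift parameters and running averages are omitted for brevity.

```python
def _normalize(batch, eps=1e-5):
    """Normalize each feature column of `batch` (a list of rows) using the batch's own mean and variance."""
    n, n_features = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(n_features)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(n_features)]
    return [
        [(row[j] - means[j]) / (variances[j] + eps) ** 0.5 for j in range(n_features)]
        for row in batch
    ]


def split_batch_norm(labeled, unlabeled, eps=1e-5):
    """Hypothetical Split-BN sketch: each group is normalized with its own statistics."""
    return _normalize(labeled, eps), _normalize(unlabeled, eps)


def joint_batch_norm(labeled, unlabeled, eps=1e-5):
    """Ordinary batch norm for comparison: one set of statistics over the mixed batch."""
    out = _normalize(labeled + unlabeled, eps)
    return out[: len(labeled)], out[len(labeled):]


# Toy activations: the unlabeled group is shifted and more spread out.
labeled = [[0.0], [2.0]]
unlabeled = [[10.0], [14.0]]

split_l, split_u = split_batch_norm(labeled, unlabeled)
joint_l, joint_u = joint_batch_norm(labeled, unlabeled)
```

With joint statistics, the shifted unlabeled group drags the labeled activations away from zero mean; with split statistics, each group is centered on its own.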

Copy this Sentence


Title	Copy this Sentence
Authors	Vasileios Lioutas, Andriy Drozdyuk
Abstract	Attention is an operation that selects some largest element from some set, where the notion of largest is defined elsewhere. Applying this operation to sequence to sequence mapping results in significant improvements to the task at hand. In this paper we provide the mathematical definition of attention and examine its application to sequence to sequence models. We highlight the exact correspondences between machine learning implementations of attention and our mathematical definition. We provide clear evidence of effectiveness of attention mechanisms evaluating models with varying degrees of attention on a very simple task: copying a sentence. We find that models that make greater use of attention perform much better on sequence to sequence mapping tasks, converge faster and are more stable.
Tasks
Published	2019-05-23
URL	https://arxiv.org/abs/1905.09856v1
PDF	https://arxiv.org/pdf/1905.09856v1.pdf
PWC	https://paperswithcode.com/paper/copy-this-sentence
Repo
Framework

Faster Algorithms for High-Dimensional Robust Covariance Estimation


Title	Faster Algorithms for High-Dimensional Robust Covariance Estimation
Authors	Yu Cheng, Ilias Diakonikolas, Rong Ge, David Woodruff
Abstract	We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal error guarantees for several natural structured distributions. Our main contribution is to develop faster algorithms for this problem whose running time nearly matches that of computing the empirical covariance. Given $N = \tilde{\Omega}(d^2/\epsilon^2)$ samples from a $d$-dimensional Gaussian distribution, an $\epsilon$-fraction of which may be arbitrarily corrupted, our algorithm runs in time $\tilde{O}(d^{3.26})/\mathrm{poly}(\epsilon)$ and approximates the unknown covariance matrix to optimal error up to a logarithmic factor. Previous robust algorithms with comparable error guarantees all have runtimes $\tilde{\Omega}(d^{2 \omega})$ when $\epsilon = \Omega(1)$, where $\omega$ is the exponent of matrix multiplication. We also provide evidence that improving the running time of our algorithm may require new algorithmic techniques.
Tasks
Published	2019-06-11
URL	https://arxiv.org/abs/1906.04661v1
PDF	https://arxiv.org/pdf/1906.04661v1.pdf
PWC	https://paperswithcode.com/paper/faster-algorithms-for-high-dimensional-robust
Repo
Framework

Learning from Adversarial Features for Few-Shot Classification


Title	Learning from Adversarial Features for Few-Shot Classification
Authors	Wei Shen, Ziqiang Shi, Jun Sun
Abstract	Many recent few-shot learning methods concentrate on designing novel model architectures. In this paper, we instead show that with a simple backbone convolutional network we can even surpass state-of-the-art classification accuracy. The essential part that contributes to this superior performance is an adversarial feature learning strategy that improves the generalization capability of our model. In this work, adversarial features are those features that can make the classifier uncertain about its prediction. In order to generate adversarial features, we first locate adversarial regions based on the derivative of the entropy with respect to an averaging mask. Then we use the adversarial region attention to aggregate the feature maps to obtain the adversarial features. In this way, we can explore and exploit the entire spatial area of the feature maps to mine more diverse discriminative knowledge. We perform extensive model evaluations and analyses on miniImageNet and tieredImageNet datasets demonstrating the effectiveness of the proposed method.
Tasks	Few-Shot Learning
Published	2019-03-25
URL	http://arxiv.org/abs/1903.10225v1
PDF	http://arxiv.org/pdf/1903.10225v1.pdf
PWC	https://paperswithcode.com/paper/learning-from-adversarial-features-for-few
Repo
Framework

A Walk-based Model on Entity Graphs for Relation Extraction


Title	A Walk-based Model on Entity Graphs for Relation Extraction
Authors	Fenia Christopoulou, Makoto Miwa, Sophia Ananiadou
Abstract	We present a novel graph-based neural network model for relation extraction. Our model treats multiple pairs in a sentence simultaneously and considers interactions among them. All the entities in a sentence are placed as nodes in a fully-connected graph structure. The edges are represented with position-aware contexts around the entity pairs. In order to consider different relation paths between two entities, we construct up to l-length walks between each pair. The resulting walks are merged and iteratively used to update the edge representations into longer walk representations. We show that the model achieves performance comparable to the state-of-the-art systems on the ACE 2005 dataset without using any external tools.
Tasks	Relation Extraction
Published	2019-02-19
URL	https://arxiv.org/abs/1902.07023v2
PDF	https://arxiv.org/pdf/1902.07023v2.pdf
PWC	https://paperswithcode.com/paper/a-walk-based-model-on-entity-graphs-for
Repo
Framework

On-line Non-Convex Constrained Optimization


Title	On-line Non-Convex Constrained Optimization
Authors	Olivier Massicot, Jakub Marecek
Abstract	Time-varying non-convex continuous-valued non-linear constrained optimization is a fundamental problem. We study conditions wherein a momentum-like regularising term allows for the tracking of local optima by considering an ordinary differential equation (ODE). We then derive an efficient algorithm based on a predictor-corrector method, to track the ODE solution.
Tasks
Published	2019-09-16
URL	https://arxiv.org/abs/1909.07492v1
PDF	https://arxiv.org/pdf/1909.07492v1.pdf
PWC	https://paperswithcode.com/paper/on-line-non-convex-constrained-optimization
Repo
Framework

Detector With Focus: Normalizing Gradient In Image Pyramid


Title	Detector With Focus: Normalizing Gradient In Image Pyramid
Authors	Yonghyun Kim, Bong-Nam Kang, Daijin Kim
Abstract	An image pyramid can extend many object detection algorithms to solve detection on multiple scales. However, interpolation during the resampling process of an image pyramid causes gradient variation, which is the difference of the gradients between the original image and the scaled images. Our key insight is that the increased variance of gradients makes it difficult for classifiers to assign categories correctly. We prove the existence of the gradient variation by formulating the ratio of gradient expectations between an original image and scaled images, then propose a simple and novel gradient normalization method to eliminate the effect of this variation. The proposed normalization method reduces the variance in an image pyramid and allows the classifier to focus on a smaller coverage. We show the improvement in three different visual recognition problems: pedestrian detection, pose estimation, and object detection. The method is generally applicable to many vision algorithms based on an image pyramid with gradients.
Tasks	Object Detection, Pedestrian Detection, Pose Estimation
Published	2019-09-05
URL	https://arxiv.org/abs/1909.02301v1
PDF	https://arxiv.org/pdf/1909.02301v1.pdf
PWC	https://paperswithcode.com/paper/detector-with-focus-normalizing-gradient-in
Repo
Framework

Adversarial Learning for Improved Onsets and Frames Music Transcription


Title	Adversarial Learning for Improved Onsets and Frames Music Transcription
Authors	Jong Wook Kim, Juan Pablo Bello
Abstract	Automatic music transcription is considered to be one of the hardest problems in music information retrieval, yet recent deep learning approaches have achieved substantial improvements on transcription performance. These approaches commonly employ supervised learning models that predict various time-frequency representations, by minimizing element-wise losses such as the cross entropy function. However, applying the loss in this manner assumes conditional independence of each label given the input, and thus cannot accurately express inter-label dependencies. To address this issue, we introduce an adversarial training scheme that operates directly on the time-frequency representations and makes the output distribution closer to the ground-truth. Through adversarial learning, we achieve a consistent improvement in both frame-level and note-level metrics over Onsets and Frames, a state-of-the-art music transcription model. Our results show that adversarial learning can significantly reduce the error rate while increasing the confidence of the model estimations. Our approach is generic and applicable to any transcription model based on multi-label predictions, which are very common in music signal analysis.
Tasks	Information Retrieval, Music Information Retrieval
Published	2019-06-20
URL	https://arxiv.org/abs/1906.08512v1
PDF	https://arxiv.org/pdf/1906.08512v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-learning-for-improved-onsets-and
Repo
Framework
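
The abstract's point about element-wise losses can be made concrete with a small sketch. The function below (a hypothetical name, `elementwise_bce`, not taken from the paper) computes binary cross-entropy over a toy frames-by-pitches grid. Because the loss is a plain sum of per-cell terms, it scores each cell in isolation and cannot tell apart label patterns that differ only in how active cells are arranged.

```python
import math


def elementwise_bce(pred, target, eps=1e-7):
    """Sum of independent binary cross-entropy terms over a 2-D grid (frames x pitches)."""
    total = 0.0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total


# A maximally uncertain prediction: every cell is 0.5.
pred = [[0.5, 0.5], [0.5, 0.5]]

# Two different label patterns with the same number of active cells.
diagonal = [[1, 0], [0, 1]]
sustained = [[1, 1], [0, 0]]

loss_diagonal = elementwise_bce(pred, diagonal)
loss_sustained = elementwise_bce(pred, sustained)
```

Both patterns receive the same loss here, which illustrates why a per-element objective alone carries no information about inter-label structure.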

A Center in Your Neighborhood: Fairness in Facility Location


Title	A Center in Your Neighborhood: Fairness in Facility Location
Authors	Christopher Jung, Sampath Kannan, Neil Lutz
Abstract	When selecting locations for a set of facilities, standard clustering algorithms may place unfair burden on some individuals and neighborhoods. We formulate a fairness concept that takes local population densities into account. In particular, given $k$ facilities to locate and a population of size $n$, we define the “neighborhood radius” of an individual $i$ as the minimum radius of a ball centered at $i$ that contains at least $n/k$ individuals. Our objective is to ensure that each individual has a facility within at most a small constant factor of her neighborhood radius. We present several theoretical results: We show that optimizing this factor is NP-hard; we give an approximation algorithm that guarantees a factor of at most 2 in all metric spaces; and we prove matching lower bounds in some metric spaces. We apply a variant of this algorithm to real-world address data, showing that it is quite different from standard clustering algorithms and outperforms them on our objective function and balances the load between facilities more evenly.
Tasks
Published	2019-08-23
URL	https://arxiv.org/abs/1908.09041v2
PDF	https://arxiv.org/pdf/1908.09041v2.pdf
PWC	https://paperswithcode.com/paper/a-center-in-your-neighborhood-fairness-in
Repo
Framework
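
The "neighborhood radius" defined in the abstract above is straightforward to compute by brute force. The sketch below makes several assumptions that are not stated in the abstract: points live in Euclidean space, an individual counts as a member of her own ball, and n/k is rounded up. The names `neighborhood_radii` and `is_alpha_fair` are hypothetical, and this is not the paper's approximation algorithm, only a check of the fairness condition for a given set of centers.

```python
import math


def neighborhood_radii(points, k):
    """For each point, the smallest radius of a ball around it holding at least ceil(n/k) points."""
    need = math.ceil(len(points) / k)
    radii = []
    for p in points:
        # Sorted distances include 0.0 for the point itself (assumed to count).
        dists = sorted(math.dist(p, q) for q in points)
        radii.append(dists[need - 1])
    return radii


def is_alpha_fair(points, centers, k, alpha):
    """True if every point has a center within alpha times its neighborhood radius."""
    radii = neighborhood_radii(points, k)
    return all(
        min(math.dist(p, c) for c in centers) <= alpha * r
        for p, r in zip(points, radii)
    )


# Four individuals on a line, two facilities to place.
points = [(0.0,), (1.0,), (2.0,), (10.0,)]
radii = neighborhood_radii(points, k=2)

good = is_alpha_fair(points, centers=[(1.0,), (10.0,)], k=2, alpha=1)
poor = is_alpha_fair(points, centers=[(0.0,), (1.0,)], k=2, alpha=1)
```

The isolated individual at 10 has a large neighborhood radius, so the criterion tolerates a more distant facility for her than for the individuals in the dense cluster.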

Analysis of CNN-based remote-PPG to understand limitations and sensitivities


Title	Analysis of CNN-based remote-PPG to understand limitations and sensitivities
Authors	Qi Zhan, Wenjin Wang, Gerard de Haan
Abstract	Deep learning based on Convolutional Neural Network (CNN) has shown promising results in various vision-based applications, recently also in camera-based vital signs monitoring. The CNN-based Photoplethysmography (PPG) extraction has, so far, been focused on performance rather than understanding. In this paper, we try to answer four questions with experiments aiming at improving our understanding of this methodology as it gains popularity. We conclude that the network exploits the blood absorption variation to extract the physiological signals, and that the choice and parameters (phase, spectral content, etc.) of the reference-signal may be more critical than anticipated. The availability of multiple convolutional kernels is necessary for CNN to arrive at a flexible channel combination through the spatial operation, but may not provide the same motion-robustness as a multi-site measurement using knowledge-based PPG extraction. Finally, we conclude that the PPG-related prior knowledge is still helpful for the CNN-based PPG extraction. Consequently, we recommend further investigation of hybrid CNN-based methods to include prior knowledge in their design.
Tasks	Photoplethysmography (PPG)
Published	2019-11-07
URL	https://arxiv.org/abs/1911.02736v2
PDF	https://arxiv.org/pdf/1911.02736v2.pdf
PWC	https://paperswithcode.com/paper/analysis-of-cnn-based-remote-ppg-to
Repo
Framework

Sparse Spectrum Gaussian Process for Bayesian Optimisation


Title	Sparse Spectrum Gaussian Process for Bayesian Optimisation
Authors	Ang Yang, Cheng Li, Santu Rana, Sunil Gupta, Svetha Venkatesh
Abstract	We propose a novel sparse spectrum approximation of Gaussian process (GP) tailored for Bayesian optimisation. Whilst the current sparse spectrum methods provide good approximations for regression problems, it is observed that this particular form of sparse approximations generates an overconfident GP, i.e. it predicts less variance than the original GP. Since the balance between predictive mean and the predictive variance is a key determinant in the success of Bayesian optimisation, the current sparse spectrum methods are less suitable. We derive a regularised marginal likelihood for finding the optimal frequencies in optimisation problems. The regulariser trades the accuracy in the model fitting with the targeted increase in the variance of the resultant GP. We first consider the entropy of the distribution over the maxima as the regulariser that needs to be maximised. Later we show that the Expected Improvement acquisition function can also be used as a proxy for that, thus making the optimisation less computationally expensive. Experiments show an increase in the Bayesian optimisation convergence rate over the vanilla sparse spectrum method.
Tasks	Bayesian Optimisation
Published	2019-06-21
URL	https://arxiv.org/abs/1906.08898v1
PDF	https://arxiv.org/pdf/1906.08898v1.pdf
PWC	https://paperswithcode.com/paper/sparse-spectrum-gaussian-process-for-bayesian
Repo
Framework

A Survey on Recent Advancements for AI Enabled Radiomics in Neuro-Oncology


Title	A Survey on Recent Advancements for AI Enabled Radiomics in Neuro-Oncology
Authors	Syed Muhammad Anwar, Tooba Altaf, Khola Rafique, Harish RaviPrakash, Hassan Mohy-ud-Din, Ulas Bagci
Abstract	Artificial intelligence (AI) enabled radiomics has evolved immensely especially in the field of oncology. Radiomics provide assistance in diagnosis of cancer, planning of treatment strategy, and prediction of survival. Radiomics in neuro-oncology has progressed significantly in the recent past. Deep learning has outperformed conventional machine learning methods in most image-based applications. Convolutional neural networks (CNNs) have seen some popularity in radiomics, since they do not require hand-crafted features and can automatically extract features during the learning process. In this regard, it is observed that CNN based radiomics could provide state-of-the-art results in neuro-oncology, similar to the recent success of such methods in a wide spectrum of medical image analysis applications. Herein we present a review of the most recent best practices and establish the future trends for AI enabled radiomics in neuro-oncology.
Tasks
Published	2019-10-16
URL	https://arxiv.org/abs/1910.07470v1
PDF	https://arxiv.org/pdf/1910.07470v1.pdf
PWC	https://paperswithcode.com/paper/a-survey-on-recent-advancements-for-ai
Repo
Framework

Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions


Title	Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions
Authors	Anna Stelzer
Abstract	This study benchmarks 23 different statistical and machine learning methods in a credit scoring application. To do so, the models’ performance is evaluated over four different data sets in combination with five data sampling strategies to tackle existing class imbalances in the data. Six different performance measures are used to cover different aspects of predictive performance. The results indicate a strong superiority of ensemble methods and show that simple sampling strategies deliver better results than more sophisticated ones.
Tasks
Published	2019-07-30
URL	https://arxiv.org/abs/1907.12996v1
PDF	https://arxiv.org/pdf/1907.12996v1.pdf
PWC	https://paperswithcode.com/paper/predicting-credit-default-probabilities-using
Repo
Framework

Quantum-Inspired Support Vector Machine


Title	Quantum-Inspired Support Vector Machine
Authors	Chen Ding, Tian-Yi Bao, He-Liang Huang
Abstract	Support vector machine (SVM) is a particularly powerful and flexible supervised learning model that analyzes data for both classification and regression, and whose usual algorithmic complexity scales polynomially with the dimension of the data space and the number of data points. Inspired by quantum SVM, we present a quantum-inspired classical algorithm for SVM using fast sampling techniques. In our approach, we develop a method that samples the kernel matrix using the given information on data points and performs classification by estimating the classification expression. Our approach can be applied to various types of SVM, such as linear SVM, non-linear SVM and soft SVM. Theoretical analysis shows one can make classification with arbitrary success probability in logarithmic runtime of both the dimension of data space and the number of data points, matching the runtime of the quantum SVM.
Tasks
Published	2019-06-21
URL	https://arxiv.org/abs/1906.08902v2
PDF	https://arxiv.org/pdf/1906.08902v2.pdf
PWC	https://paperswithcode.com/paper/quantum-inspired-support-vector-machine
Repo
Framework

Real-time Policy Distillation in Deep Reinforcement Learning


Title	Real-time Policy Distillation in Deep Reinforcement Learning
Authors	Yuxiang Sun, Pooyan Fazli
Abstract	Policy distillation in deep reinforcement learning provides an effective way to transfer control policies from a larger network to a smaller untrained network without a significant degradation in performance. However, policy distillation is underexplored in deep reinforcement learning, and existing approaches are computationally inefficient, resulting in a long distillation time. In addition, the effectiveness of the distillation process is still limited to the model capacity. We propose a new distillation mechanism, called real-time policy distillation, in which training the teacher model and distilling the policy to the student model occur simultaneously. Accordingly, the teacher’s latest policy is transferred to the student model in real time. This reduces the distillation time to half the original time or even less and also makes it possible for extremely small student models to learn skills at the expert level. We evaluated the proposed algorithm in the Atari 2600 domain. The results show that our approach can achieve full distillation in most games, even with compression ratios up to 1.7%.
Tasks
Published	2019-12-29
URL	https://arxiv.org/abs/1912.12630v1
PDF	https://arxiv.org/pdf/1912.12630v1.pdf
PWC	https://paperswithcode.com/paper/real-time-policy-distillation-in-deep
Repo
Framework