Paper Group ANR 1671
Split Batch Normalization: Improving Semi-Supervised Learning under Domain Shift
Title | Split Batch Normalization: Improving Semi-Supervised Learning under Domain Shift |
Authors | Michał Zając, Konrad Żołna, Stanisław Jastrzębski |
Abstract | Recent work has shown that using unlabeled data in semi-supervised learning is not always beneficial and can even hurt generalization, especially when there is a class mismatch between the unlabeled and labeled examples. We investigate this phenomenon for image classification on the CIFAR-10 and ImageNet datasets, and with many other forms of domain shift applied (e.g. salt-and-pepper noise). Our main contribution is Split Batch Normalization (Split-BN), a technique to improve SSL when the additional unlabeled data comes from a shifted distribution. We achieve this by using separate batch normalization statistics for unlabeled examples. Due to its simplicity, we recommend it as a standard practice. Finally, we analyse how domain shift affects the SSL training process. In particular, we find that during training the statistics of hidden activations in late layers become markedly different between the unlabeled and the labeled examples. |
Tasks | Image Classification |
Published | 2019-04-06 |
URL | http://arxiv.org/abs/1904.03515v1 |
http://arxiv.org/pdf/1904.03515v1.pdf | |
PWC | https://paperswithcode.com/paper/split-batch-normalization-improving-semi |
Repo | |
Framework | |
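A minimal PyTorch-style sketch of the Split-BN idea described in the entry above, assuming a simple two-branch design in which labeled and unlabeled batches keep separate batch-norm statistics; the class and flag names are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class SplitBatchNorm2d(nn.Module):
    """Keeps separate BN statistics for labeled and unlabeled batches.

    Illustrative sketch of the Split-BN idea; not the authors' implementation.
    """
    def __init__(self, num_features):
        super().__init__()
        self.bn_labeled = nn.BatchNorm2d(num_features)
        self.bn_unlabeled = nn.BatchNorm2d(num_features)

    def forward(self, x, is_unlabeled: bool = False):
        # Route the batch through the BN layer matching its data source, so the
        # (possibly shifted) unlabeled distribution does not contaminate the
        # labeled running statistics.
        return self.bn_unlabeled(x) if is_unlabeled else self.bn_labeled(x)

# Usage: pass is_unlabeled=True for batches drawn from the unlabeled set.
bn = SplitBatchNorm2d(64)
labeled = torch.randn(8, 64, 16, 16)
unlabeled = torch.randn(8, 64, 16, 16)
out_l = bn(labeled)                       # updates labeled statistics only
out_u = bn(unlabeled, is_unlabeled=True)  # updates unlabeled statistics only
```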
Copy this Sentence
Title | Copy this Sentence |
Authors | Vasileios Lioutas, Andriy Drozdyuk |
Abstract | Attention is an operation that selects some largest element from some set, where the notion of largest is defined elsewhere. Applying this operation to sequence to sequence mapping results in significant improvements to the task at hand. In this paper we provide the mathematical definition of attention and examine its application to sequence to sequence models. We highlight the exact correspondences between machine learning implementations of attention and our mathematical definition. We provide clear evidence of effectiveness of attention mechanisms evaluating models with varying degrees of attention on a very simple task: copying a sentence. We find that models that make greater use of attention perform much better on sequence to sequence mapping tasks, converge faster and are more stable. |
Tasks | |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09856v1 |
https://arxiv.org/pdf/1905.09856v1.pdf | |
PWC | https://paperswithcode.com/paper/copy-this-sentence |
Repo | |
Framework | |
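A hedged NumPy sketch of the abstract's framing of attention as a soft "select the largest element" operation over a set: a query scores the set's elements and a softmax turns the scores into a weighted selection. Function and variable names are illustrative.

```python
import numpy as np

def soft_select(query, keys, values, temperature=1.0):
    """Soft 'select the largest' over a set: a softmax over query-key scores
    weights the values; as temperature -> 0 this approaches a hard argmax."""
    scores = keys @ query / temperature      # similarity of each element to the query
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values                  # convex combination of the values

rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 4))    # 5 set elements, 4-dim descriptors
values = rng.normal(size=(5, 3))  # payload attached to each element
query = rng.normal(size=4)
print(soft_select(query, keys, values))
```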
Faster Algorithms for High-Dimensional Robust Covariance Estimation
Title | Faster Algorithms for High-Dimensional Robust Covariance Estimation |
Authors | Yu Cheng, Ilias Diakonikolas, Rong Ge, David Woodruff |
Abstract | We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal error guarantees for several natural structured distributions. Our main contribution is to develop faster algorithms for this problem whose running time nearly matches that of computing the empirical covariance. Given $N = \tilde{\Omega}(d^2/\epsilon^2)$ samples from a $d$-dimensional Gaussian distribution, an $\epsilon$-fraction of which may be arbitrarily corrupted, our algorithm runs in time $\tilde{O}(d^{3.26})/\mathrm{poly}(\epsilon)$ and approximates the unknown covariance matrix to optimal error up to a logarithmic factor. Previous robust algorithms with comparable error guarantees all have runtimes $\tilde{\Omega}(d^{2 \omega})$ when $\epsilon = \Omega(1)$, where $\omega$ is the exponent of matrix multiplication. We also provide evidence that improving the running time of our algorithm may require new algorithmic techniques. |
Tasks | |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04661v1 |
https://arxiv.org/pdf/1906.04661v1.pdf | |
PWC | https://paperswithcode.com/paper/faster-algorithms-for-high-dimensional-robust |
Repo | |
Framework | |
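For intuition only, a naive iterative-filtering baseline (not the paper's algorithm): repeatedly discard the samples that look most outlying under the current estimate, then re-estimate the covariance. The paper's contribution is achieving near-empirical-covariance runtime with near-optimal error; this sketch only illustrates the general filtering idea behind robust estimators.

```python
import numpy as np

def filtered_covariance(X, trim_frac=0.05, iters=3):
    """Naive iterative-filtering baseline (not the paper's algorithm):
    repeatedly drop the samples with the largest Mahalanobis distance under
    the current estimate, then re-estimate the covariance."""
    X = np.asarray(X, dtype=float)
    for _ in range(iters):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        inv = np.linalg.pinv(cov)
        d = np.einsum('ij,jk,ik->i', X - mu, inv, X - mu)  # squared Mahalanobis distances
        X = X[d <= np.quantile(d, 1.0 - trim_frac)]
    return np.cov(X, rowvar=False)

# 5% of the samples replaced by far-away outliers:
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
X[:100] = 50.0
print(np.linalg.norm(filtered_covariance(X) - np.eye(10)))  # close to identity
```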
Learning from Adversarial Features for Few-Shot Classification
Title | Learning from Adversarial Features for Few-Shot Classification |
Authors | Wei Shen, Ziqiang Shi, Jun Sun |
Abstract | Many recent few-shot learning methods concentrate on designing novel model architectures. In this paper, we instead show that with a simple backbone convolutional network we can even surpass state-of-the-art classification accuracy. The essential part that contributes to this superior performance is an adversarial feature learning strategy that improves the generalization capability of our model. In this work, adversarial features are those features that can make the classifier uncertain about its prediction. In order to generate adversarial features, we first locate adversarial regions based on the derivative of the entropy with respect to an averaging mask. Then we use the adversarial region attention to aggregate the feature maps to obtain the adversarial features. In this way, we can explore and exploit the entire spatial area of the feature maps to mine more diverse discriminative knowledge. We perform extensive model evaluations and analyses on the miniImageNet and tieredImageNet datasets, demonstrating the effectiveness of the proposed method. |
Tasks | Few-Shot Learning |
Published | 2019-03-25 |
URL | http://arxiv.org/abs/1903.10225v1 |
http://arxiv.org/pdf/1903.10225v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-from-adversarial-features-for-few |
Repo | |
Framework | |
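A hedged PyTorch sketch of the adversarial-feature construction described above: the gradient of the prediction entropy with respect to an averaging mask scores spatial locations, and the resulting attention pools the feature map. The helper and its arguments are assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def adversarial_feature(feature_map, classifier):
    """Score spatial locations by the entropy gradient w.r.t. an averaging
    mask, then pool the feature map with attention on those locations.
    Illustrative sketch, not the authors' implementation."""
    B, C, H, W = feature_map.shape
    mask = torch.full((B, 1, H, W), 1.0 / (H * W), requires_grad=True)
    pooled = (feature_map * mask).sum(dim=(2, 3))            # masked average pooling
    probs = F.softmax(classifier(pooled), dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    grad, = torch.autograd.grad(entropy, mask)               # sensitivity of entropy to each location
    attn = F.softmax(grad.view(B, -1), dim=1).view(B, 1, H, W)
    return (feature_map * attn).sum(dim=(2, 3))              # adversarial feature vector

# Usage with a toy classifier head:
clf = torch.nn.Linear(64, 5)
fmap = torch.randn(2, 64, 7, 7)
print(adversarial_feature(fmap, clf).shape)  # torch.Size([2, 64])
```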
A Walk-based Model on Entity Graphs for Relation Extraction
Title | A Walk-based Model on Entity Graphs for Relation Extraction |
Authors | Fenia Christopoulou, Makoto Miwa, Sophia Ananiadou |
Abstract | We present a novel graph-based neural network model for relation extraction. Our model treats multiple pairs in a sentence simultaneously and considers interactions among them. All the entities in a sentence are placed as nodes in a fully-connected graph structure. The edges are represented with position-aware contexts around the entity pairs. In order to consider different relation paths between two entities, we construct up to $l$-length walks between each pair. The resulting walks are merged and iteratively used to update the edge representations into longer-walk representations. We show that the model achieves performance comparable to the state-of-the-art systems on the ACE 2005 dataset without using any external tools. |
Tasks | Relation Extraction |
Published | 2019-02-19 |
URL | https://arxiv.org/abs/1902.07023v2 |
https://arxiv.org/pdf/1902.07023v2.pdf | |
PWC | https://paperswithcode.com/paper/a-walk-based-model-on-entity-graphs-for |
Repo | |
Framework | |
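A simplified NumPy sketch of the walk-aggregation step described above, assuming element-wise products of edge representations stand in for the paper's bilinear walk construction.

```python
import numpy as np

def extend_walks(edge_repr, beta=0.5):
    """Illustrative walk-aggregation step (a simplification of the paper's
    construction): combine direct edge representations with representations of
    two-hop walks through every intermediate entity.

    edge_repr: (n, n, d) array of pairwise edge representations.
    """
    # Two-hop walk i -> k -> j, modelled here as an element-wise product of the
    # two edge representations, summed over intermediate entities k.
    two_hop = np.einsum('ikd,kjd->ijd', edge_repr, edge_repr)
    # Interpolate between length-l and length-2l walk representations.
    return beta * edge_repr + (1.0 - beta) * two_hop

edges = np.random.default_rng(0).normal(size=(4, 4, 8))  # 4 entities, 8-dim edges
longer = extend_walks(edges)  # representations now cover walks up to length 2
print(longer.shape)           # (4, 4, 8)
```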
On-line Non-Convex Constrained Optimization
Title | On-line Non-Convex Constrained Optimization |
Authors | Olivier Massicot, Jakub Marecek |
Abstract | Time-varying non-convex continuous-valued non-linear constrained optimization is a fundamental problem. We study conditions under which a momentum-like regularising term allows for the tracking of local optima, by considering an ordinary differential equation (ODE). We then derive an efficient algorithm, based on a predictor-corrector method, to track the ODE solution. |
Tasks | |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.07492v1 |
https://arxiv.org/pdf/1909.07492v1.pdf | |
PWC | https://paperswithcode.com/paper/on-line-non-convex-constrained-optimization |
Repo | |
Framework | |
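A toy predictor-corrector tracker, offered as a generic illustration of the tracking idea rather than the algorithm derived in the paper; the time-varying objective $f(x, t) = (x - \sin t)^2$ is an arbitrary example.

```python
import numpy as np

def grad_f(x, t):
    # Gradient of the toy objective f(x, t) = (x - sin(t))**2.
    return 2.0 * (x - np.sin(t))

def track(x0=0.0, dt=0.05, steps=200, corrector_iters=3, lr=0.2):
    """Generic predictor-corrector sketch for a time-varying optimum."""
    x, t = x0, 0.0
    xs = []
    for _ in range(steps):
        # Predictor: extrapolate using the drift of the optimum (d/dt sin t = cos t).
        x = x + dt * np.cos(t)
        t += dt
        # Corrector: a few gradient steps on the new objective f(., t).
        for _ in range(corrector_iters):
            x -= lr * grad_f(x, t)
        xs.append(x)
    return np.array(xs)

trajectory = track()
print(float(np.abs(trajectory[-1] - np.sin(200 * 0.05))))  # small tracking error
```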
Detector With Focus: Normalizing Gradient In Image Pyramid
Title | Detector With Focus: Normalizing Gradient In Image Pyramid |
Authors | Yonghyun Kim, Bong-Nam Kang, Daijin Kim |
Abstract | An image pyramid can extend many object detection algorithms to solve detection on multiple scales. However, interpolation during the resampling process of an image pyramid causes gradient variation, which is the difference of the gradients between the original image and the scaled images. Our key insight is that the increased variance of gradients makes the classifiers have difficulty in correctly assigning categories. We prove the existence of the gradient variation by formulating the ratio of gradient expectations between an original image and scaled images, then propose a simple and novel gradient normalization method to eliminate the effect of this variation. The proposed normalization method reduces the variance in an image pyramid and allows the classifier to focus on a smaller coverage. We show the improvement in three different visual recognition problems: pedestrian detection, pose estimation, and object detection. The method is generally applicable to many vision algorithms based on an image pyramid with gradients. |
Tasks | Object Detection, Pedestrian Detection, Pose Estimation |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02301v1 |
https://arxiv.org/pdf/1909.02301v1.pdf | |
PWC | https://paperswithcode.com/paper/detector-with-focus-normalizing-gradient-in |
Repo | |
Framework | |
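An illustrative NumPy sketch of normalizing gradient magnitudes across an image pyramid. The power-law compensation factor and the `lam` parameter are assumptions standing in for the paper's derived ratio of gradient expectations.

```python
import numpy as np

def pyramid_gradients(image, scales=(1.0, 0.5, 0.25), lam=1.0):
    """Compute gradient magnitudes on rescaled images and renormalize them by
    a power of the scale factor, so the resampling-induced change in gradient
    statistics is compensated. Illustrative only; not the paper's exact ratio."""
    grads = {}
    for s in scales:
        h, w = (np.array(image.shape) * s).astype(int)
        # Nearest-neighbour resampling keeps the example dependency-free.
        rows = (np.arange(h) / s).astype(int)
        cols = (np.arange(w) / s).astype(int)
        scaled = image[np.ix_(rows, cols)].astype(float)
        gy, gx = np.gradient(scaled)
        mag = np.hypot(gx, gy)
        grads[s] = mag * (1.0 / s) ** lam  # compensate the resampling-induced variation
    return grads

img = np.random.default_rng(0).integers(0, 255, size=(64, 64))
mags = pyramid_gradients(img)
print({s: float(m.mean()) for s, m in mags.items()})
```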
Adversarial Learning for Improved Onsets and Frames Music Transcription
Title | Adversarial Learning for Improved Onsets and Frames Music Transcription |
Authors | Jong Wook Kim, Juan Pablo Bello |
Abstract | Automatic music transcription is considered to be one of the hardest problems in music information retrieval, yet recent deep learning approaches have achieved substantial improvements on transcription performance. These approaches commonly employ supervised learning models that predict various time-frequency representations, by minimizing element-wise losses such as the cross entropy function. However, applying the loss in this manner assumes conditional independence of each label given the input, and thus cannot accurately express inter-label dependencies. To address this issue, we introduce an adversarial training scheme that operates directly on the time-frequency representations and makes the output distribution closer to the ground-truth. Through adversarial learning, we achieve a consistent improvement in both frame-level and note-level metrics over Onsets and Frames, a state-of-the-art music transcription model. Our results show that adversarial learning can significantly reduce the error rate while increasing the confidence of the model estimations. Our approach is generic and applicable to any transcription model based on multi-label predictions, which are very common in music signal analysis. |
Tasks | Information Retrieval, Music Information Retrieval |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08512v1 |
https://arxiv.org/pdf/1906.08512v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-learning-for-improved-onsets-and |
Repo | |
Framework | |
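A hedged PyTorch sketch of combining the element-wise transcription loss with an adversarial term computed by a discriminator on predicted piano rolls; the small networks below are stand-ins, not the Onsets and Frames architecture or the authors' training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """Toy discriminator scoring the realism of a (batch, frames, pitches) roll."""
    def __init__(self, n_pitches=88):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_pitches, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def forward(self, piano_roll):
        return self.net(piano_roll).mean(dim=1)  # one realism logit per clip

def transcriber_loss(pred_roll, target_roll, disc, adv_weight=0.1):
    """Element-wise BCE (the conditional-independence term) plus an adversarial
    term that pushes predictions toward the distribution of real rolls."""
    bce = F.binary_cross_entropy(pred_roll, target_roll)
    logits = disc(pred_roll)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return bce + adv_weight * adv

disc = Discriminator()
pred = torch.rand(4, 100, 88, requires_grad=True)       # predicted note probabilities
target = (torch.rand(4, 100, 88) > 0.95).float()        # sparse ground-truth roll
loss = transcriber_loss(pred, target, disc)
loss.backward()
```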
A Center in Your Neighborhood: Fairness in Facility Location
Title | A Center in Your Neighborhood: Fairness in Facility Location |
Authors | Christopher Jung, Sampath Kannan, Neil Lutz |
Abstract | When selecting locations for a set of facilities, standard clustering algorithms may place unfair burden on some individuals and neighborhoods. We formulate a fairness concept that takes local population densities into account. In particular, given $k$ facilities to locate and a population of size $n$, we define the “neighborhood radius” of an individual $i$ as the minimum radius of a ball centered at $i$ that contains at least $n/k$ individuals. Our objective is to ensure that each individual has a facility within at most a small constant factor of her neighborhood radius. We present several theoretical results: we show that optimizing this factor is NP-hard; we give an approximation algorithm that guarantees a factor of at most 2 in all metric spaces; and we prove matching lower bounds in some metric spaces. We apply a variant of this algorithm to real-world address data, showing that it is quite different from standard clustering algorithms, outperforms them on our objective function, and balances the load between facilities more evenly. |
Tasks | |
Published | 2019-08-23 |
URL | https://arxiv.org/abs/1908.09041v2 |
https://arxiv.org/pdf/1908.09041v2.pdf | |
PWC | https://paperswithcode.com/paper/a-center-in-your-neighborhood-fairness-in |
Repo | |
Framework | |
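A short NumPy sketch of the "neighborhood radius" definition from the entry above and the resulting fairness factor, assuming Euclidean distance and counting the individual themself among the $n/k$ neighbors.

```python
import numpy as np

def neighborhood_radii(points, k):
    """For each individual i, the smallest radius of a ball centred at i that
    contains at least n/k individuals (brute force, for clarity)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    need = int(np.ceil(n / k))
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Distance to the `need`-th nearest individual (including oneself).
    return np.sort(dists, axis=1)[:, need - 1]

def fairness_factor(points, centers, k):
    """Max over individuals of (distance to nearest center) / (neighborhood radius)."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    radii = neighborhood_radii(points, k)
    to_center = np.min(np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1), axis=1)
    return float(np.max(to_center / np.maximum(radii, 1e-12)))

pts = np.random.default_rng(0).normal(size=(100, 2))
print(fairness_factor(pts, centers=pts[:5], k=5))
```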
Analysis of CNN-based remote-PPG to understand limitations and sensitivities
Title | Analysis of CNN-based remote-PPG to understand limitations and sensitivities |
Authors | Qi Zhan, Wenjin Wang, Gerard de Haan |
Abstract | Deep learning based on Convolutional Neural Network (CNN) has shown promising results in various vision-based applications, recently also in camera-based vital signs monitoring. The CNN-based Photoplethysmography (PPG) extraction has, so far, been focused on performance rather than understanding. In this paper, we try to answer four questions with experiments aiming at improving our understanding of this methodology as it gains popularity. We conclude that the network exploits the blood absorption variation to extract the physiological signals, and that the choice and parameters (phase, spectral content, etc.) of the reference-signal may be more critical than anticipated. The availability of multiple convolutional kernels is necessary for CNN to arrive at a flexible channel combination through the spatial operation, but may not provide the same motion-robustness as a multi-site measurement using knowledge-based PPG extraction. Finally, we conclude that the PPG-related prior knowledge is still helpful for the CNN-based PPG extraction. Consequently, we recommend further investigation of hybrid CNN-based methods to include prior knowledge in their design. |
Tasks | Photoplethysmography (PPG) |
Published | 2019-11-07 |
URL | https://arxiv.org/abs/1911.02736v2 |
https://arxiv.org/pdf/1911.02736v2.pdf | |
PWC | https://paperswithcode.com/paper/analysis-of-cnn-based-remote-ppg-to |
Repo | |
Framework | |
Sparse Spectrum Gaussian Process for Bayesian Optimisation
Title | Sparse Spectrum Gaussian Process for Bayesian Optimisation |
Authors | Ang Yang, Cheng Li, Santu Rana, Sunil Gupta, Svetha Venkatesh |
Abstract | We propose a novel sparse spectrum approximation of Gaussian process (GP) tailored for Bayesian optimisation. Whilst the current sparse spectrum methods provide good approximations for regression problems, it is observed that this particular form of sparse approximation generates an overconfident GP, i.e. it predicts less variance than the original GP. Since the balance between the predictive mean and the predictive variance is a key determinant in the success of Bayesian optimisation, the current sparse spectrum methods are less suitable. We derive a regularised marginal likelihood for finding the optimal frequencies in optimisation problems. The regulariser trades accuracy in the model fitting against the targeted increase in the variance of the resultant GP. We first consider the entropy of the distribution over the maxima as the regulariser that needs to be maximised. Later we show that the Expected Improvement acquisition function can also be used as a proxy for it, thus making the optimisation less computationally expensive. Experiments show an increase in the Bayesian optimisation convergence rate over the vanilla sparse spectrum method. |
Tasks | Bayesian Optimisation |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.08898v1 |
https://arxiv.org/pdf/1906.08898v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-spectrum-gaussian-process-for-bayesian |
Repo | |
Framework | |
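A plain sparse-spectrum GP in NumPy (random Fourier features for an RBF kernel), i.e. the vanilla baseline that the paper regularises; the entropy/Expected-Improvement regulariser itself is not reproduced here, and all names are illustrative.

```python
import numpy as np

def ssgp_fit_predict(X, y, X_test, n_features=50, lengthscale=1.0, noise=1e-2, seed=0):
    """Sparse spectrum GP regression via random Fourier features and Bayesian
    linear regression in the feature space (unit prior variance on weights)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(n_features, d))  # spectral frequencies
    b = rng.uniform(0, 2 * np.pi, size=n_features)                 # random phases

    def phi(Z):
        return np.sqrt(2.0 / n_features) * np.cos(Z @ W.T + b)

    Phi = phi(X)                                  # (n, m) feature map
    A = Phi.T @ Phi + noise * np.eye(n_features)  # posterior precision (up to 1/noise)
    w_mean = np.linalg.solve(A, Phi.T @ y)
    Phi_t = phi(X_test)
    mean = Phi_t @ w_mean
    var = noise * np.einsum('ij,ij->i', Phi_t @ np.linalg.inv(A), Phi_t) + noise
    return mean, var

X = np.linspace(0, 5, 30)[:, None]
y = np.sin(X).ravel() + 0.05 * np.random.default_rng(1).normal(size=30)
mu, var = ssgp_fit_predict(X, y, np.linspace(0, 5, 100)[:, None])
```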
A Survey on Recent Advancements for AI Enabled Radiomics in Neuro-Oncology
Title | A Survey on Recent Advancements for AI Enabled Radiomics in Neuro-Oncology |
Authors | Syed Muhammad Anwar, Tooba Altaf, Khola Rafique, Harish RaviPrakash, Hassan Mohy-ud-Din, Ulas Bagci |
Abstract | Artificial intelligence (AI) enabled radiomics has evolved immensely, especially in the field of oncology. Radiomics provides assistance in the diagnosis of cancer, planning of treatment strategy, and prediction of survival. Radiomics in neuro-oncology has progressed significantly in the recent past. Deep learning has outperformed conventional machine learning methods in most image-based applications. Convolutional neural networks (CNNs) have seen some popularity in radiomics, since they do not require hand-crafted features and can automatically extract features during the learning process. In this regard, it is observed that CNN-based radiomics could provide state-of-the-art results in neuro-oncology, similar to the recent success of such methods in a wide spectrum of medical image analysis applications. Herein we present a review of the most recent best practices and establish the future trends for AI enabled radiomics in neuro-oncology. |
Tasks | |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07470v1 |
https://arxiv.org/pdf/1910.07470v1.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-on-recent-advancements-for-ai |
Repo | |
Framework | |
Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions
Title | Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions |
Authors | Anna Stelzer |
Abstract | This study presents a benchmark comparison of 23 different statistical and machine learning methods in a credit scoring application. In order to do so, the models’ performance is evaluated over four different data sets in combination with five data sampling strategies to tackle existing class imbalances in the data. Six different performance measures are used to cover different aspects of predictive performance. The results indicate a strong superiority of ensemble methods and show that simple sampling strategies deliver better results than more sophisticated ones. |
Tasks | |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1907.12996v1 |
https://arxiv.org/pdf/1907.12996v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-credit-default-probabilities-using |
Repo | |
Framework | |
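A minimal scikit-learn sketch of the kind of setup the study evaluates: an imbalanced binary classification problem, a simple random-oversampling strategy, and an ensemble classifier. Synthetic data stands in for the credit data sets, which are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced binary "default" problem on synthetic data.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Random oversampling: duplicate minority-class rows until the classes balance.
minority = np.flatnonzero(y_tr == 1)
extra = np.random.default_rng(0).choice(minority, size=(y_tr == 0).sum() - minority.size, replace=True)
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])

# An ensemble method, evaluated with one of the usual performance measures (AUC).
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```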
Quantum-Inspired Support Vector Machine
Title | Quantum-Inspired Support Vector Machine |
Authors | Chen Ding, Tian-Yi Bao, He-Liang Huang |
Abstract | Support vector machine (SVM) is a particularly powerful and flexible supervised learning model that analyzes data for both classification and regression, and whose usual algorithm complexity scales polynomially with the dimension of the data space and the number of data points. Inspired by quantum SVM, we present a quantum-inspired classical algorithm for SVM using fast sampling techniques. In our approach, we develop a method to sample the kernel matrix from the given information on the data points and make classifications by estimating the classification expression. Our approach can be applied to various types of SVM, such as linear SVM, non-linear SVM and soft SVM. Theoretical analysis shows one can make classifications with arbitrary success probability in runtime logarithmic in both the dimension of the data space and the number of data points, matching the runtime of the quantum SVM. |
Tasks | |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.08902v2 |
https://arxiv.org/pdf/1906.08902v2.pdf | |
PWC | https://paperswithcode.com/paper/quantum-inspired-support-vector-machine |
Repo | |
Framework | |
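A loose classical analogue for intuition only: approximating the kernel by sub-sampling (Nyström) before fitting a linear SVM. This is a simpler relative of kernel sampling, not the sampling algorithm analysed in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Sub-sample the kernel matrix (Nystroem landmarks), then train a linear SVM
# in the resulting feature space instead of solving the full kernel SVM.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
model = make_pipeline(Nystroem(kernel='rbf', n_components=100, random_state=0),
                      LinearSVC(max_iter=5000))
model.fit(X, y)
print("train accuracy:", model.score(X, y))
```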
Real-time Policy Distillation in Deep Reinforcement Learning
Title | Real-time Policy Distillation in Deep Reinforcement Learning |
Authors | Yuxiang Sun, Pooyan Fazli |
Abstract | Policy distillation in deep reinforcement learning provides an effective way to transfer control policies from a larger network to a smaller untrained network without a significant degradation in performance. However, policy distillation is underexplored in deep reinforcement learning, and existing approaches are computationally inefficient, resulting in a long distillation time. In addition, the effectiveness of the distillation process is still limited by the model capacity. We propose a new distillation mechanism, called real-time policy distillation, in which training the teacher model and distilling the policy to the student model occur simultaneously. Accordingly, the teacher’s latest policy is transferred to the student model in real time. This reduces the distillation time to half the original time or even less and also makes it possible for extremely small student models to learn skills at the expert level. We evaluated the proposed algorithm in the Atari 2600 domain. The results show that our approach can achieve full distillation in most games, even with compression ratios up to 1.7%. |
Tasks | |
Published | 2019-12-29 |
URL | https://arxiv.org/abs/1912.12630v1 |
https://arxiv.org/pdf/1912.12630v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-policy-distillation-in-deep |
Repo | |
Framework | |
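A minimal PyTorch sketch of the real-time (online) distillation term: a KL loss against the teacher's current policy that can be added at every teacher training step, so the student tracks the teacher as it learns. This is a generic distillation loss, not the authors' exact objective.

```python
import torch
import torch.nn.functional as F

def online_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL divergence from the teacher's *current* policy to the student's,
    computed per training step so distillation happens alongside RL training."""
    t = F.softmax(teacher_logits.detach() / temperature, dim=-1)   # stop-gradient on the teacher
    log_s = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_s, t, reduction='batchmean') * temperature ** 2

teacher_logits = torch.randn(32, 6)                     # e.g. 6 Atari actions
student_logits = torch.randn(32, 6, requires_grad=True)
loss = online_distillation_loss(teacher_logits, student_logits)
loss.backward()                                         # updates only the student
```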