February 1, 2020

3238 words 16 mins read

Paper Group AWR 317

Improving short text classification through global augmentation methods. Constrained K-means with General Pairwise and Cardinality Constraints. Robust Attribution Regularization. Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products. Enhancing Adversarial Defense by k-Winners-Take-All. How does …

Improving short text classification through global augmentation methods

Title Improving short text classification through global augmentation methods
Authors Vukosi Marivate, Tshephisho Sefara
Abstract We study the effect of different approaches to text augmentation. To do this we use 3 datasets that include social media and formal text in the form of news articles. Our goal is to provide insights for practitioners and researchers on making choices for augmentation for classification use cases. We observe that Word2vec-based augmentation is a viable option when one does not have access to a formal synonym model (like WordNet-based augmentation). The use of mixup further improves performance of all text-based augmentations and reduces the effects of overfitting on a tested deep learning model. Round-trip translation with a translation service proves to be harder to use due to cost and as such is less accessible for both normal and low-resource use cases.
Tasks Text Augmentation, Text Classification
Published 2019-07-07
URL https://arxiv.org/abs/1907.03752v1
PDF https://arxiv.org/pdf/1907.03752v1.pdf
PWC https://paperswithcode.com/paper/improving-short-text-classification-through
Repo https://github.com/dsfsi/textaugment
Framework none
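
For readers who want a concrete picture of the Word2vec-based augmentation the abstract describes, the sketch below replaces a few random words with nearest neighbours from a pre-trained embedding model. It is a minimal illustration only, assuming a gensim KeyedVectors file at an arbitrary path (`word2vec.kv`); the released textaugment package exposes its own, different API.

```python
# Minimal sketch of word2vec-based augmentation: swap a few random words for nearest
# neighbours in a pre-trained embedding space. The dsfsi/textaugment package has its
# own API; this standalone version only illustrates the idea and assumes a gensim
# KeyedVectors file saved at the (hypothetical) path "word2vec.kv".
import random
from gensim.models import KeyedVectors

wv = KeyedVectors.load("word2vec.kv")

def augment(sentence, n_replacements=2, topn=5):
    tokens = sentence.split()
    candidates = [i for i, t in enumerate(tokens) if t in wv.key_to_index]
    for i in random.sample(candidates, min(n_replacements, len(candidates))):
        neighbours = [w for w, _ in wv.most_similar(tokens[i], topn=topn)]
        tokens[i] = random.choice(neighbours)
    return " ".join(tokens)

print(augment("the president met with foreign leaders today"))
```

Repeated calls produce different paraphrase-like variants of the same sentence, which is what makes the method usable when no curated synonym resource such as WordNet is available.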

Constrained K-means with General Pairwise and Cardinality Constraints

Title Constrained K-means with General Pairwise and Cardinality Constraints
Authors Adel Bibi, Baoyuan Wu, Bernard Ghanem
Abstract In this work, we study constrained clustering, where constraints are utilized to guide the clustering process. In existing works, two categories of constraints have been widely explored, namely pairwise and cardinality constraints. Pairwise constraints enforce the cluster labels of two instances to be the same (must-link constraints) or different (cannot-link constraints). Cardinality constraints encourage cluster sizes to satisfy a user-specified distribution. However, most existing constrained clustering models can only utilize one category of constraints at a time. In this paper, we incorporate both categories into a unified clustering model, starting from the integer program formulation of standard K-means. As these two categories provide useful information at different levels, utilizing both of them is expected to allow for better clustering performance. However, the optimization is difficult due to the binary and quadratic constraints in the proposed unified formulation. To alleviate this difficulty, we utilize two techniques: one equivalently replaces the binary constraints by the intersection of two continuous constraints; the other transforms the quadratic constraints into bi-linear constraints by introducing extra variables. We then derive an equivalent continuous reformulation with simple constraints, which can be efficiently solved by the Alternating Direction Method of Multipliers (ADMM) algorithm. Extensive experiments on both synthetic and real data demonstrate: (1) when utilizing a single category of constraints, the proposed model is superior to or competitive with state-of-the-art constrained clustering models, and (2) when utilizing both categories of constraints jointly, the proposed model shows better performance than when using either category alone.
Tasks
Published 2019-07-24
URL https://arxiv.org/abs/1907.10410v1
PDF https://arxiv.org/pdf/1907.10410v1.pdf
PWC https://paperswithcode.com/paper/constrained-k-means-with-general-pairwise-and
Repo https://github.com/wubaoyuan/Lpbox-ADMM
Framework none
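
The linked Lpbox-ADMM repository suggests how the binary assignment constraints are likely handled: the standard Lp-box trick replaces each binary vector by the intersection of a box and a sphere, which is an exact equivalence rather than a relaxation. Stated as the generic identity (not necessarily the paper's exact formulation):

$$
y \in \{0,1\}^n \;\Longleftrightarrow\; y \in [0,1]^n \,\cap\, \Big\{ y : \big\|y - \tfrac{1}{2}\mathbf{1}\big\|_2^2 = \tfrac{n}{4} \Big\}.
$$

The remaining quadratic (pairwise) constraints can then be turned into bi-linear ones with auxiliary variables, leaving blocks simple enough for ADMM updates.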

Robust Attribution Regularization

Title Robust Attribution Regularization
Authors Jiefeng Chen, Xi Wu, Vaibhav Rastogi, Yingyu Liang, Somesh Jha
Abstract An emerging problem in trustworthy machine learning is to train models that produce robust interpretations for their predictions. We take a step towards solving this problem through the lens of axiomatic attribution of neural networks. Our theory is grounded in the recent work, Integrated Gradients (IG), in axiomatically attributing a neural network’s output change to its input change. We propose training objectives in classic robust optimization models to achieve robust IG attributions. Our objectives give principled generalizations of previous objectives designed for robust predictions, and they naturally degenerate to classic soft-margin training for one-layer neural networks. We also generalize previous theory and prove that the objectives for different robust optimization models are closely related. Experiments demonstrate the effectiveness of our method, and also point to intriguing problems which hint at the need for better optimization techniques or better neural network architectures for robust attribution training.
Tasks
Published 2019-05-23
URL https://arxiv.org/abs/1905.09957v3
PDF https://arxiv.org/pdf/1905.09957v3.pdf
PWC https://paperswithcode.com/paper/robust-attribution-regularization
Repo https://github.com/jfc43/robust-attribution-regularization
Framework tf
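
Since the objectives above are built around Integrated Gradients, it may help to recall the standard IG attribution of input feature i relative to a baseline x' (this is the definition from the original IG paper, restated here only for context):

$$
\mathrm{IG}_i(x) \;=\; (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha (x - x')\big)}{\partial x_i}\, d\alpha .
$$

Robust attribution training then asks that these attributions, not just the prediction F(x), stay stable under small perturbations of x.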

Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products

Title Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products
Authors Tharun Medini, Qixuan Huang, Yiqiu Wang, Vijai Mohan, Anshumali Shrivastava
Abstract In the last decade, it has been shown that many hard AI tasks, especially in NLP, can be naturally modeled as extreme classification problems, leading to improved precision. However, such models are prohibitively expensive to train due to the memory blow-up in the last layer. For example, a reasonable softmax layer for the dataset of interest in this paper can easily reach well beyond 100 billion parameters (>400 GB memory). To alleviate this problem, we present Merged-Average Classifiers via Hashing (MACH), a generic K-classification algorithm where memory provably scales at O(logK) without any strong assumption on the classes. MACH is subtly a count-min sketch structure in disguise, which uses universal hashing to reduce classification with a large number of classes to a few embarrassingly parallel and independent classification tasks with a small (constant) number of classes. MACH naturally provides a technique for zero-communication model parallelism. We experiment with 6 datasets, some multiclass and some multilabel, and show consistent improvement over respective state-of-the-art baselines. In particular, we train an end-to-end deep classifier on a private product search dataset sampled from Amazon Search Engine with 70 million queries and 49.46 million products. MACH outperforms, by a significant margin, the state-of-the-art extreme classification models deployed on commercial search engines: Parabel and dense embedding models. Our largest model has 6.4 billion parameters and trains in less than 35 hours on a single p3.16x machine. Our training times are 7-10x faster, and our memory footprints are 2-4x smaller than the best baselines. This training time is also significantly lower than the one reported by Google's mixture of experts (MoE) language model on a comparable model size and hardware.
Tasks Language Modelling
Published 2019-10-28
URL https://arxiv.org/abs/1910.13830v1
PDF https://arxiv.org/pdf/1910.13830v1.pdf
PWC https://paperswithcode.com/paper/extreme-classification-in-log-memory-using
Repo https://github.com/Tharun24/MACH
Framework tf
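
A minimal sketch of the count-min-sketch view of MACH described above: classes are hashed into a small number of buckets by several independent hash functions, each small classifier predicts over buckets, and a class score is recovered by averaging the probabilities of the buckets it hashes to. All sizes and the random hash maps below are illustrative, not the authors' configuration.

```python
# Illustrative sketch of MACH-style decoding (not the authors' implementation).
# K classes are hashed into B buckets by R independent hash functions; each of the R
# small models outputs probabilities over its B buckets, and a class's score is the
# average probability of the buckets it hashes to. The model only ever stores R
# classifiers with B outputs each, instead of one softmax over K classes.
import numpy as np

K, B, R = 1_000_000, 10_000, 16
rng = np.random.default_rng(0)
bucket_of = rng.integers(0, B, size=(R, K))   # stand-in for 2-universal hash functions

def decode(bucket_probs):
    """bucket_probs: shape (R, B), one probability vector per small classifier."""
    scores = np.zeros(K)
    for r in range(R):
        scores += bucket_probs[r, bucket_of[r]]
    return scores / R

top5 = np.argsort(decode(rng.random((R, B))))[-5:]
print(top5)
```

Because the R small models never exchange parameters, they can be trained on separate devices with zero communication, which is the model-parallelism property the abstract refers to.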

Enhancing Adversarial Defense by k-Winners-Take-All

Title Enhancing Adversarial Defense by k-Winners-Take-All
Authors Chang Xiao, Peilin Zhong, Changxi Zheng
Abstract We propose a simple change to existing neural network structures for better defending against gradient-based adversarial attacks. Instead of using popular activation functions (such as ReLU), we advocate the use of k-Winners-Take-All (k-WTA) activation, a C^0 discontinuous function that purposely invalidates the neural network model's gradient at densely distributed input data points. The proposed k-WTA activation can be readily used in nearly all existing networks and training methods with no significant overhead. Our proposal is theoretically rationalized. We analyze why the discontinuities in k-WTA networks can largely prevent gradient-based search of adversarial examples and why they at the same time remain innocuous to the network training. This understanding is also empirically backed. We test k-WTA activation on various network structures optimized by a training method, be it adversarial training or not. In all cases, the robustness of k-WTA networks outperforms that of traditional networks under white-box attacks.
Tasks Adversarial Defense
Published 2019-05-25
URL https://arxiv.org/abs/1905.10510v3
PDF https://arxiv.org/pdf/1905.10510v3.pdf
PWC https://paperswithcode.com/paper/resisting-adversarial-attacks-by-k-winners
Repo https://github.com/a554b554/kWTA-Activation
Framework pytorch
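
A minimal PyTorch sketch of the k-WTA activation described above, assuming a fixed sparsity ratio per layer; the released kWTA-Activation code may organise the layer differently.

```python
# Minimal sketch of a k-Winners-Take-All activation: keep the k largest entries of
# each sample's activation vector and zero out the rest. The sparsity ratio here is
# an illustrative hyperparameter.
import torch
import torch.nn as nn

class KWTA(nn.Module):
    def __init__(self, sparsity=0.1):
        super().__init__()
        self.sparsity = sparsity  # fraction of units allowed to stay active

    def forward(self, x):
        flat = x.flatten(1)                           # (batch, features)
        k = max(1, int(self.sparsity * flat.shape[1]))
        kth = flat.topk(k, dim=1).values[:, -1:]      # k-th largest value per sample
        return ((flat >= kth).float() * flat).view_as(x)

net = nn.Sequential(nn.Linear(784, 256), KWTA(0.1), nn.Linear(256, 10))
```

Because the set of "winning" units changes discontinuously with the input, gradient-based attacks that rely on a locally smooth loss surface are the ones most affected, while standard training is largely unchanged.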

How does Disagreement Help Generalization against Label Corruption?

Title How does Disagreement Help Generalization against Label Corruption?
Authors Xingrui Yu, Bo Han, Jiangchao Yao, Gang Niu, Ivor W. Tsang, Masashi Sugiyama
Abstract Learning with noisy labels is one of the hottest problems in weakly-supervised learning. Based on memorization effects of deep neural networks, training on small-loss instances becomes very promising for handling noisy labels. This fosters the state-of-the-art approach “Co-teaching” that cross-trains two deep neural networks using the small-loss trick. However, with the increase of epochs, two networks converge to a consensus and Co-teaching reduces to the self-training MentorNet. To tackle this issue, we propose a robust learning paradigm called Co-teaching+, which bridges the “Update by Disagreement” strategy with the original Co-teaching. First, two networks feed forward and predict all data, but keep prediction disagreement data only. Then, among such disagreement data, each network selects its small-loss data, but back propagates the small-loss data from its peer network and updates its own parameters. Empirical results on benchmark datasets demonstrate that Co-teaching+ is much superior to many state-of-the-art methods in the robustness of trained models.
Tasks
Published 2019-01-14
URL https://arxiv.org/abs/1901.04215v3
PDF https://arxiv.org/pdf/1901.04215v3.pdf
PWC https://paperswithcode.com/paper/how-does-disagreement-help-generalization
Repo https://github.com/xingruiyu/coteaching_plus
Framework pytorch
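
A rough sketch of one Co-teaching+ update, following the description in the abstract: keep only the samples on which the two networks disagree, let each network pick its small-loss subset among them, and update each network on its peer's subset. The keep ratio is a placeholder; in practice it would be scheduled over epochs as in Co-teaching.

```python
# Rough sketch of one Co-teaching+ step (illustrative, not the released training loop).
import torch
import torch.nn.functional as F

def coteaching_plus_step(net1, net2, opt1, opt2, x, y, keep_ratio=0.8):
    with torch.no_grad():
        disagree = net1(x).argmax(1) != net2(x).argmax(1)   # keep disagreement data only
    if disagree.sum() == 0:
        return
    x_d, y_d = x[disagree], y[disagree]
    loss1 = F.cross_entropy(net1(x_d), y_d, reduction="none")
    loss2 = F.cross_entropy(net2(x_d), y_d, reduction="none")
    k = max(1, int(keep_ratio * len(y_d)))
    idx1 = loss1.topk(k, largest=False).indices   # small-loss picks of net1
    idx2 = loss2.topk(k, largest=False).indices   # small-loss picks of net2
    # Cross-update: each network learns from its peer's small-loss selection.
    opt1.zero_grad(); F.cross_entropy(net1(x_d[idx2]), y_d[idx2]).backward(); opt1.step()
    opt2.zero_grad(); F.cross_entropy(net2(x_d[idx1]), y_d[idx1]).backward(); opt2.step()
```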

You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle

Title You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle
Authors Dinghuai Zhang, Tianyuan Zhang, Yiping Lu, Zhanxing Zhu, Bin Dong
Abstract Deep learning achieves state-of-the-art results in many tasks in computer vision and natural language processing. However, recent works have shown that deep networks can be vulnerable to adversarial perturbations, which raises a serious robustness issue for deep networks. Adversarial training, typically formulated as a robust optimization problem, is an effective way of improving the robustness of deep networks. A major drawback of existing adversarial training algorithms is the computational overhead of generating adversarial examples, typically far greater than that of the network training. This leads to an unbearable overall computational cost for adversarial training. In this paper, we show that adversarial training can be cast as a discrete time differential game. Through analyzing the Pontryagin's Maximal Principle (PMP) of the problem, we observe that the adversary update is only coupled with the parameters of the first layer of the network. This inspires us to restrict most of the forward and back propagation within the first layer of the network during adversary updates. This effectively reduces the total number of full forward and backward propagations to only one for each group of adversary updates. Therefore, we refer to this algorithm as YOPO (You Only Propagate Once). Numerical experiments demonstrate that YOPO can achieve comparable defense accuracy with approximately 1/5 ~ 1/4 of the GPU time of the projected gradient descent (PGD) algorithm. Our codes are available at https://github.com/a1600012888/YOPO-You-Only-Propagate-Once.
Tasks Adversarial Defense
Published 2019-05-02
URL https://arxiv.org/abs/1905.00877v6
PDF https://arxiv.org/pdf/1905.00877v6.pdf
PWC https://paperswithcode.com/paper/you-only-propagate-once-painless-adversarial
Repo https://github.com/a1600012888/YOPO-You-Only-Propagate-Once
Framework pytorch
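
A brief, loosely stated note on the PMP observation above: by the chain rule, the gradient of the loss with respect to the adversarial perturbation η factors through the first layer f_0 only,

$$
\nabla_{\eta}\, \ell\big(f_{\theta}(x+\eta),\, y\big)
\;=\;
\Big(\frac{\partial f_0(x+\eta;\,\theta_0)}{\partial \eta}\Big)^{\!\top} p,
\qquad p := \nabla_{f_0}\, \ell,
$$

so the expensive factor p can be held fixed over a group of inner adversary updates while only the cheap first-layer term is recomputed, which is where the "propagate once" saving comes from. This is a simplified reading of the algorithm, not its full statement.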

Machine learning for music genre: multifaceted review and experimentation with audioset

Title Machine learning for music genre: multifaceted review and experimentation with audioset
Authors Jaime Ramírez, M. Julia Flores
Abstract Music genre classification is one of the sub-disciplines of music information retrieval (MIR) with growing popularity among researchers, mainly due to its still-open challenges. Although research has been prolific in terms of the number of published works, the topic still suffers from a problem in its foundations: there is no clear and formal definition of what a genre is. Music categorizations are vague and unclear, suffering from human subjectivity and lack of agreement. In its first part, this paper offers a survey that tries to cover the many different aspects of the matter. Its main goal is to give the reader an overview of the history and the current state of the art, exploring the techniques and datasets used to date, as well as identifying current challenges, such as the ambiguity of genre definitions or the introduction of human-centric approaches. The paper pays special attention to new trends in machine learning applied to the music annotation problem. Finally, we also include a music genre classification experiment that compares different machine learning models using Audioset.
Tasks Information Retrieval, Music Information Retrieval
Published 2019-11-28
URL https://arxiv.org/abs/1911.12618v1
PDF https://arxiv.org/pdf/1911.12618v1.pdf
PWC https://paperswithcode.com/paper/machine-learning-for-music-genre-multifaceted
Repo https://github.com/jramcast/music-genre-classification-audioset
Framework none

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

Title Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
Authors Dan Hendrycks, Thomas Dietterich
Abstract In this paper we establish rigorous benchmarks for image classifier robustness. Our first benchmark, ImageNet-C, standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications. Then we propose a new dataset called ImageNet-P which enables researchers to benchmark a classifier's robustness to common perturbations. Unlike recent robustness research, this benchmark evaluates performance on common corruptions and perturbations, not worst-case adversarial perturbations. We find that there are negligible changes in relative corruption robustness from AlexNet classifiers to ResNet classifiers. Afterward we discover ways to enhance corruption and perturbation robustness. We even find that a bypassed adversarial defense provides substantial common perturbation robustness. Together our benchmarks may aid future work toward networks that robustly generalize.
Tasks Adversarial Defense
Published 2019-03-28
URL http://arxiv.org/abs/1903.12261v1
PDF http://arxiv.org/pdf/1903.12261v1.pdf
PWC https://paperswithcode.com/paper/benchmarking-neural-network-robustness-to-2
Repo https://github.com/hendrycks/robustness
Framework pytorch
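
For context, ImageNet-C results are usually summarized with the mean Corruption Error defined in the paper: for a classifier f, the top-1 error E on corruption c is aggregated over the five severity levels s and normalized by AlexNet's errors,

$$
\mathrm{CE}^{f}_{c} \;=\; \frac{\sum_{s=1}^{5} E^{f}_{s,c}}{\sum_{s=1}^{5} E^{\mathrm{AlexNet}}_{s,c}},
\qquad
\mathrm{mCE}^{f} \;=\; \frac{1}{|C|} \sum_{c \in C} \mathrm{CE}^{f}_{c},
$$

so a value below 100% means the model degrades less under the same corruptions than the AlexNet baseline.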

Better Long-Range Dependency By Bootstrapping A Mutual Information Regularizer

Title Better Long-Range Dependency By Bootstrapping A Mutual Information Regularizer
Authors Yanshuai Cao, Peng Xu
Abstract In this work, we develop a novel regularizer to improve the learning of long-range dependency in sequence data. Applied to language modelling, our regularizer expresses the inductive bias that sequence variables should have high mutual information even though the model might not see abundant observations for complex long-range dependency. We show how the 'next sentence prediction (classification)' heuristic can be derived in a principled way from our mutual information estimation framework, and be further extended to maximize the mutual information of sequence variables. The proposed approach not only is effective at increasing the mutual information of segments under the learned model but, more importantly, leads to a higher likelihood on holdout data and improved generation quality. Code is released at https://github.com/BorealisAI/BMI.
Tasks Language Modelling
Published 2019-05-28
URL https://arxiv.org/abs/1905.11978v2
PDF https://arxiv.org/pdf/1905.11978v2.pdf
PWC https://paperswithcode.com/paper/better-long-range-dependency-by-bootstrapping
Repo https://github.com/BorealisAI/BMI
Framework pytorch

dpUGC: Learn Differentially Private Representation for User Generated Contents

Title dpUGC: Learn Differentially Private Representation for User Generated Contents
Authors Xuan-Son Vu, Son N. Tran, Lili Jiang
Abstract This paper first proposes a simple yet efficient generalized approach to apply differential privacy to text representation (i.e., word embedding). Based on it, we propose a user-level approach to learn personalized, differentially private word embedding models on user-generated content (UGC). To the best of our knowledge, this is the first work on learning user-level differentially private word embedding models from text for sharing. The proposed approaches protect the privacy of individuals from re-identification, and in particular provide a better trade-off between privacy and data utility on UGC data for sharing. The experimental results show that the trained embedding models are applicable to classic text analysis tasks (e.g., regression). Moreover, the proposed approaches to learning differentially private embedding models are both framework- and data-independent, which facilitates deployment and sharing. The source code is available at https://github.com/sonvx/dpText.
Tasks
Published 2019-03-25
URL http://arxiv.org/abs/1903.10453v1
PDF http://arxiv.org/pdf/1903.10453v1.pdf
PWC https://paperswithcode.com/paper/dpugc-learn-differentially-private
Repo https://github.com/sonvx/dpText
Framework tf
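
The abstract does not spell out the privacy mechanism, but a common way to make embedding training differentially private is a DP-SGD-style update: clip each example's gradient and add Gaussian noise before applying it. The sketch below shows only that generic pattern, under the assumption that something similar underlies the approach; it is not the paper's implementation.

```python
# Generic DP-SGD-style update: clip each example's gradient, average, add Gaussian
# noise. The paper may use a different mechanism; this only illustrates how an
# embedding update can be made differentially private in principle.
import numpy as np

def dp_sgd_update(weights, per_example_grads, lr=0.1, clip_norm=1.0,
                  noise_multiplier=1.1, rng=np.random.default_rng(0)):
    clipped = []
    for g in per_example_grads:                       # bound each example's influence
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=mean_grad.shape)          # Gaussian noise calibrated to the clip
    return weights - lr * (mean_grad + noise)
```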

ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs

Title ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs
Authors Amir Gholami, Kurt Keutzer, George Biros
Abstract Residual neural networks can be viewed as the forward Euler discretization of an Ordinary Differential Equation (ODE) with a unit time step. This has recently motivated researchers to explore other discretization approaches and train ODE-based networks. However, an important challenge of neural ODEs is their prohibitive memory cost during gradient backpropagation. Recently, a method proposed in [8] claimed that this memory overhead can be reduced from O(LN_t), where N_t is the number of time steps, down to O(L) by solving the forward ODE backwards in time, where L is the depth of the network. However, we will show that this approach may lead to several problems: (i) it may be numerically unstable for ReLU/non-ReLU activations and general convolution operators, and (ii) the proposed optimize-then-discretize approach may lead to divergent training due to inconsistent gradients for small time step sizes. We discuss the underlying problems, and to address them we propose ANODE, an adjoint-based Neural ODE framework which avoids the numerical instability problems noted above and provides unconditionally accurate gradients. ANODE has a memory footprint of O(L) + O(N_t), with the same computational cost as the reverse ODE solve. We furthermore discuss a memory-efficient algorithm which can further reduce this footprint at the cost of additional computation. We show results on Cifar-10/100 datasets using ResNet and SqueezeNext neural networks.
Tasks
Published 2019-02-27
URL https://arxiv.org/abs/1902.10298v3
PDF https://arxiv.org/pdf/1902.10298v3.pdf
PWC https://paperswithcode.com/paper/anode-unconditionally-accurate-memory
Repo https://github.com/alexandrejash/Chivvo
Framework pytorch
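
For readers, the residual-network/ODE correspondence the abstract starts from is simply the forward Euler step

$$
x_{t+1} \;=\; x_t + \Delta t\, f(x_t;\,\theta), \qquad \Delta t = 1 \;\text{for a residual block},
$$

and the ANODE argument concerns how to backpropagate through this discretization: roughly, by checkpointing and storing the trajectory within each ODE block one obtains exact (discretize-then-optimize) gradients at a memory cost of O(L) + O(N_t), rather than recovering activations by solving the ODE backwards in time, which can be unstable.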

DeepSUM: Deep neural network for Super-resolution of Unregistered Multitemporal images

Title DeepSUM: Deep neural network for Super-resolution of Unregistered Multitemporal images
Authors Andrea Bordone Molini, Diego Valsesia, Giulia Fracastoro, Enrico Magli
Abstract Recently, convolutional neural networks (CNNs) have been successfully applied to many remote sensing problems. However, deep learning techniques for multi-image super-resolution from multitemporal unregistered imagery have received little attention so far. This work proposes a novel CNN-based technique that exploits both spatial and temporal correlations to combine multiple images. This novel framework integrates the spatial registration task directly inside the CNN and exploits the representation learning capabilities of the network to enhance registration accuracy. The entire super-resolution process relies on a single CNN with three main stages: shared 2D convolutions to extract high-dimensional features from the input images; a subnetwork proposing registration filters derived from the high-dimensional feature representations; and 3D convolutions for slow fusion of the features from multiple images. The whole network can be trained end-to-end to recover a single high-resolution image from multiple unregistered low-resolution images. The method presented in this paper is the winner of the PROBA-V super-resolution challenge issued by the European Space Agency.
Tasks Image Super-Resolution, Multi-Frame Super-Resolution, Representation Learning, Super-Resolution
Published 2019-07-15
URL https://arxiv.org/abs/1907.06490v2
PDF https://arxiv.org/pdf/1907.06490v2.pdf
PWC https://paperswithcode.com/paper/deepsum-deep-neural-network-for-super
Repo https://github.com/diegovalsesia/deepsum
Framework tf
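
A heavily reduced sketch of the three-stage pipeline described above, written with tf.keras to match the listed framework: a shared 2D convolutional stack applied to every input image, followed by 3D convolutions that slowly fuse the multitemporal stack. The registration subnetwork and the upsampling to the high-resolution grid are omitted, and all layer sizes are invented; the released model in diegovalsesia/deepsum is substantially more detailed.

```python
# Reduced sketch of a DeepSUM-like pipeline: shared 2D feature extraction per image,
# then 3D convolutions fusing the temporal stack into a single output image.
import tensorflow as tf
from tensorflow.keras import layers

N_IMAGES, H, W = 9, 128, 128
inputs = layers.Input(shape=(N_IMAGES, H, W, 1))
# Stage 1: the same 2D conv stack is applied to every low-resolution image.
x = layers.TimeDistributed(layers.Conv2D(64, 3, padding="same", activation="relu"))(inputs)
x = layers.TimeDistributed(layers.Conv2D(64, 3, padding="same", activation="relu"))(x)
# Stage 2 (omitted here): a subnetwork predicting per-image registration filters.
# Stage 3: 3D convolutions slowly fuse features across the temporal dimension.
x = layers.Conv3D(64, (3, 3, 3), padding="same", activation="relu")(x)
x = layers.Conv3D(1, (N_IMAGES, 1, 1), padding="valid")(x)   # collapse the temporal axis
outputs = layers.Reshape((H, W, 1))(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```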

Auto-Embedding Generative Adversarial Networks for High Resolution Image Synthesis

Title Auto-Embedding Generative Adversarial Networks for High Resolution Image Synthesis
Authors Yong Guo, Qi Chen, Jian Chen, Qingyao Wu, Qinfeng Shi, Mingkui Tan
Abstract Generating images via the generative adversarial network (GAN) has attracted much attention recently. However, most of the existing GAN-based methods can only produce low-resolution images of limited quality. Directly generating high-resolution images using GANs is nontrivial, and often produces problematic images with incomplete objects. To address this issue, we develop a novel GAN called Auto-Embedding Generative Adversarial Network (AEGAN), which simultaneously encodes the global structure features and captures the fine-grained details. In our network, we use an autoencoder to learn the intrinsic high-level structure of real images and design a novel denoiser network to provide photo-realistic details for the generated images. In the experiments, we are able to produce 512x512 images of promising quality directly from the input noise. The resultant images exhibit better perceptual photo-realism, i.e., with sharper structure and richer details, than other baselines on several datasets, including Oxford-102 Flowers, Caltech-UCSD Birds (CUB), High-Quality Large-scale CelebFaces Attributes (CelebA-HQ), Large-scale Scene Understanding (LSUN) and ImageNet.
Tasks Image Generation, Scene Understanding
Published 2019-03-27
URL http://arxiv.org/abs/1903.11250v2
PDF http://arxiv.org/pdf/1903.11250v2.pdf
PWC https://paperswithcode.com/paper/auto-embedding-generative-adversarial
Repo https://github.com/guoyongcs/AEGAN
Framework tf

Evolving Structures in Complex Systems

Title Evolving Structures in Complex Systems
Authors Hugo Cisneros, Josef Sivic, Tomas Mikolov
Abstract In this paper we propose an approach for measuring the growth of complexity of emerging patterns in complex systems such as cellular automata. We discuss several ways in which a metric for measuring complexity growth can be defined, including approaches based on compression algorithms and artificial neural networks. We believe such a metric can be useful for designing systems that could exhibit open-ended evolution, which itself might be a prerequisite for the development of general artificial intelligence. We conduct experiments on 1D and 2D grid worlds and demonstrate that, using the proposed metric, we can automatically construct computational models with emerging properties similar to those found in Conway's Game of Life, as well as many other emergent phenomena. Interestingly, some of the patterns we observe resemble forms of artificial life. Our metric of structural complexity growth can be applied to a wide range of complex systems, as it is not limited to cellular automata.
Tasks Artificial Life
Published 2019-11-04
URL https://arxiv.org/abs/1911.01086v2
PDF https://arxiv.org/pdf/1911.01086v2.pdf
PWC https://paperswithcode.com/paper/evolving-structures-in-complex-systems
Repo https://github.com/hugcis/evolving-structures-in-complex-systems
Framework none
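
The compression-based idea mentioned in the abstract can be illustrated with a generic toy version: run a 1D cellular automaton and track how the zlib-compressed size of its state evolves over time. The actual metrics in the paper are more refined; the snippet only conveys the intuition that structured-but-nontrivial dynamics sit between trivially compressible and incompressible states.

```python
# Toy compression-based complexity measure for a 1D cellular automaton (rule 110):
# track how the zlib-compressed size of the state changes as the system evolves.
import zlib
import numpy as np

def step_rule110(state):
    left, right = np.roll(state, 1), np.roll(state, -1)
    pattern = 4 * left + 2 * state + right
    return np.isin(pattern, [1, 2, 3, 5, 6]).astype(np.uint8)   # rule 110 lookup

state = np.zeros(512, dtype=np.uint8)
state[256] = 1                                # single live cell as initial condition
for t in range(200):
    state = step_rule110(state)
    if t % 50 == 0:
        size = len(zlib.compress(state.tobytes()))
        print(f"t={t:3d}  compressed size={size} bytes")
```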