February 1, 2020

Paper Group AWR 256

Efficient Exploration via State Marginal Matching. Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks. From Patches to Pictures (PaQ-2-PiQ): Mapping the Perceptual Space of Picture Quality. Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation. T …

Efficient Exploration via State Marginal Matching

Title Efficient Exploration via State Marginal Matching
Authors Lisa Lee, Benjamin Eysenbach, Emilio Parisotto, Eric Xing, Sergey Levine, Ruslan Salakhutdinov
Abstract Exploration is critical to a reinforcement learning agent’s performance in its given environment. Prior exploration methods are often based on using heuristic auxiliary predictions to guide policy behavior, lacking a mathematically-grounded objective with clear properties. In contrast, we recast exploration as a problem of State Marginal Matching (SMM), where we aim to learn a policy for which the state marginal distribution matches a given target state distribution. The target distribution is a uniform distribution in most cases, but can incorporate prior knowledge if available. In effect, SMM amortizes the cost of learning to explore in a given environment. The SMM objective can be viewed as a two-player, zero-sum game between a state density model and a parametric policy, an idea that we use to build an algorithm for optimizing the SMM objective. Using this formalism, we further demonstrate that prior work approximately maximizes the SMM objective, offering an explanation for the success of these methods. On both simulated and real-world tasks, we demonstrate that agents that directly optimize the SMM objective explore faster and adapt more quickly to new tasks as compared to prior exploration methods.
Tasks Efficient Exploration
Published 2019-06-12
URL https://arxiv.org/abs/1906.05274v3
PDF https://arxiv.org/pdf/1906.05274v3.pdf
PWC https://paperswithcode.com/paper/efficient-exploration-via-state-marginal
Repo https://github.com/RLAgent/state-marginal-matching
Framework none
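
As a rough illustration of the matching idea in the abstract above, the sketch below measures how far an empirical state-visitation distribution is from a uniform target over a small discrete state space. The choice of KL divergence as the mismatch measure, the function names, and the toy state space are assumptions made for illustration; they are not taken from this listing or from the linked repository.

```python
import math
from collections import Counter

def state_marginal(visited_states, all_states):
    """Empirical state-visitation distribution over a discrete state space."""
    counts = Counter(visited_states)
    total = len(visited_states)
    return {s: counts[s] / total for s in all_states}

def kl_to_target(marginal, target):
    """KL(marginal || target); states with zero visitation contribute 0."""
    return sum(p * math.log(p / target[s]) for s, p in marginal.items() if p > 0)

# Hypothetical four-state environment with a uniform target distribution.
states = ["a", "b", "c", "d"]
uniform_target = {s: 1 / len(states) for s in states}

narrow = state_marginal(["a", "a", "a", "b"], states)  # visits mostly one state
broad = state_marginal(["a", "b", "c", "d"], states)   # visits every state once

kl_narrow = kl_to_target(narrow, uniform_target)
kl_broad = kl_to_target(broad, uniform_target)
```

Under these assumptions, a policy whose visits are spread evenly scores a lower mismatch than one that stays near a single state, which is the behaviour a uniform target is meant to reward.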
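
Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks

Title Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks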
Authors Charles H. Martin, Michael W. Mahoney
Abstract Given two or more Deep Neural Networks (DNNs) with the same or similar architectures, and trained on the same dataset, but trained with different solvers, parameters, hyper-parameters, regularization, etc., can we predict which DNN will have the best test accuracy, and can we do so without peeking at the test data? In this paper, we show how to use a new Theory of Heavy-Tailed Self-Regularization (HT-SR) to answer this. HT-SR suggests, among other things, that modern DNNs exhibit what we call Heavy-Tailed Mechanistic Universality (HT-MU), meaning that the correlations in the layer weight matrices can be fit to a power law (PL) with exponents that lie in common Universality classes from Heavy-Tailed Random Matrix Theory (HT-RMT). From this, we develop a Universal capacity control metric that is a weighted average of PL exponents. Rather than considering small toy NNs, we examine over 50 different, large-scale pre-trained DNNs, ranging over 15 different architectures, trained on ImageNet, each of which has been reported to have different test accuracies. We show that this new capacity metric correlates very well with the reported test accuracies of these DNNs, looking across each architecture (VGG16/…/VGG19, ResNet10/…/ResNet152, etc.). We also show how to approximate the metric by the more familiar Product Norm capacity measure, as the average of the log Frobenius norm of the layer weight matrices. Our approach requires no changes to the underlying DNN or its loss function, it does not require us to train a model (although it could be used to monitor training), and it does not even require access to the ImageNet data.
Tasks
Published 2019-01-24
URL https://arxiv.org/abs/1901.08278v2
PDF https://arxiv.org/pdf/1901.08278v2.pdf
PWC https://paperswithcode.com/paper/heavy-tailed-universality-predicts-trends-in
Repo https://github.com/CalculatedContent/WeightWatcher
Framework pytorch
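
The abstract above describes an approximation computed as the average of the log Frobenius norm of the layer weight matrices. A minimal sketch of that calculation follows, using plain Python lists in place of real network weights. The base-10 logarithm, the use of the norm rather than the squared norm, and the function names are assumptions for illustration; this is not the API of the linked repository.

```python
import math

def log_frobenius_norm(matrix):
    """log10 of the Frobenius norm of a matrix given as a list of rows."""
    squared_sum = sum(w * w for row in matrix for w in row)
    return math.log10(math.sqrt(squared_sum))

def average_log_norm(layer_weights):
    """Average of the per-layer log Frobenius norms."""
    return sum(log_frobenius_norm(w) for w in layer_weights) / len(layer_weights)

# Two hypothetical layer weight matrices.
layers = [
    [[3.0, 4.0], [0.0, 0.0]],   # Frobenius norm 5
    [[10.0, 0.0], [0.0, 0.0]],  # Frobenius norm 10
]
metric = average_log_norm(layers)
```

Only the weights are needed to compute this quantity, which is consistent with the abstract's point that neither training nor test data is required.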

From Patches to Pictures (PaQ-2-PiQ): Mapping the Perceptual Space of Picture Quality

Title From Patches to Pictures (PaQ-2-PiQ): Mapping the Perceptual Space of Picture Quality
Authors Zhenqiang Ying, Haoran Niu, Praful Gupta, Dhruv Mahajan, Deepti Ghadiyaram, Alan Bovik
Abstract Blind or no-reference (NR) perceptual picture quality prediction is a difficult, unsolved problem of great consequence to the social and streaming media industries that impacts billions of viewers daily. Unfortunately, popular NR prediction models perform poorly on real-world distorted pictures. To advance progress on this problem, we introduce the largest (by far) subjective picture quality database, containing about 40000 real-world distorted pictures and 120000 patches, on which we collected about 4M human judgments of picture quality. Using these picture and patch quality labels, we built deep region-based architectures that learn to produce state-of-the-art global picture quality predictions as well as useful local picture quality maps. Our innovations include picture quality prediction architectures that produce global-to-local inferences as well as local-to-global inferences (via feedback).
Tasks
Published 2019-12-20
URL https://arxiv.org/abs/1912.10088v1
PDF https://arxiv.org/pdf/1912.10088v1.pdf
PWC https://paperswithcode.com/paper/from-patches-to-pictures-paq-2-piq-mapping
Repo https://github.com/baidut/PaQ-2-PiQ
Framework pytorch

Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation

Title Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation
Authors Linfeng Zhang, Jiebo Song, Anni Gao, Jingwei Chen, Chenglong Bao, Kaisheng Ma
Abstract Convolutional neural networks have been widely deployed in various application scenarios. In order to extend the applications’ boundaries to some accuracy-crucial domains, researchers have been investigating approaches to boost accuracy through either deeper or wider network structures, which brings with them the exponential increment of the computational and storage cost, delaying the responding time. In this paper, we propose a general training framework named self distillation, which notably enhances the performance (accuracy) of convolutional neural networks through shrinking the size of the network rather than aggrandizing it. Different from traditional knowledge distillation - a knowledge transformation methodology among networks, which forces student neural networks to approximate the softmax layer outputs of pre-trained teacher neural networks, the proposed self distillation framework distills knowledge within network itself. The networks are firstly divided into several sections. Then the knowledge in the deeper portion of the networks is squeezed into the shallow ones. Experiments further prove the generalization of the proposed self distillation framework: enhancement of accuracy at average level is 2.65%, varying from 0.61% in ResNeXt as minimum to 4.07% in VGG19 as maximum. In addition, it can also provide flexibility of depth-wise scalable inference on resource-limited edge devices. Our codes will be released on github soon.
Tasks
Published 2019-05-17
URL https://arxiv.org/abs/1905.08094v1
PDF https://arxiv.org/pdf/1905.08094v1.pdf
PWC https://paperswithcode.com/paper/be-your-own-teacher-improve-the-performance
Repo https://github.com/luanyunteng/pytorch-be-your-own-teacher
Framework pytorch

Text Classification Algorithms: A Survey

Title Text Classification Algorithms: A Survey
Authors Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura E. Barnes, Donald E. Brown
Abstract In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application in real-world problems are discussed.
Tasks Dimensionality Reduction, Text Classification
Published 2019-04-17
URL https://arxiv.org/abs/1904.08067v4
PDF https://arxiv.org/pdf/1904.08067v4.pdf
PWC https://paperswithcode.com/paper/text-classification-algorithms-a-survey
Repo https://github.com/kk7nc/Text_Classification
Framework tf

Self-Referencing Embedded Strings (SELFIES): A 100% robust molecular string representation

Title Self-Referencing Embedded Strings (SELFIES): A 100% robust molecular string representation
Authors Mario Krenn, Florian Häse, AkshatKumar Nigam, Pascal Friederich, Alán Aspuru-Guzik
Abstract The discovery of novel materials and functional molecules can help to solve some of society’s most urgent challenges, ranging from efficient energy harvesting and storage to uncovering novel pharmaceutical drug candidates. Traditionally matter engineering – generally denoted as inverse design – was based massively on human intuition and high-throughput virtual screening. The last few years have seen the emergence of significant interest in computer-inspired designs based on evolutionary or deep learning methods. The major challenge here is that the standard strings molecular representation SMILES shows substantial weaknesses in that task because large fractions of strings do not correspond to valid molecules. Here, we solve this problem at a fundamental level and introduce SELFIES (SELF-referencIng Embedded Strings), a string-based representation of molecules which is 100% robust. Every SELFIES string corresponds to a valid molecule, and SELFIES can represent every molecule. SELFIES can be directly applied in arbitrary machine learning models without the adaptation of the models; each of the generated molecule candidates is valid. In our experiments, the model’s internal memory stores two orders of magnitude more diverse molecules than a similar test with SMILES. Furthermore, as all molecules are valid, it allows for explanation and interpretation of the internal working of the generative models.
Tasks
Published 2019-05-31
URL https://arxiv.org/abs/1905.13741v2
PDF https://arxiv.org/pdf/1905.13741v2.pdf
PWC https://paperswithcode.com/paper/selfies-a-robust-representation-of
Repo https://github.com/aspuru-guzik-group/selfies
Framework none

Weight Agnostic Neural Networks

Title Weight Agnostic Neural Networks
Authors Adam Gaier, David Ha
Abstract Not all neural network architectures are created equal, some perform much better than others for certain tasks. But how important are the weight parameters of a neural network compared to its architecture? In this work, we question to what extent neural network architectures alone, without learning any weight parameters, can encode solutions for a given task. We propose a search method for neural network architectures that can already perform a task without any explicit weight training. To evaluate these networks, we populate the connections with a single shared weight parameter sampled from a uniform random distribution, and measure the expected performance. We demonstrate that our method can find minimal neural network architectures that can perform several reinforcement learning tasks without weight training. On a supervised learning domain, we find network architectures that achieve much higher than chance accuracy on MNIST using random weights. Interactive version of this paper at https://weightagnostic.github.io/
Tasks Car Racing, Image Classification
Published 2019-06-11
URL https://arxiv.org/abs/1906.04358v2
PDF https://arxiv.org/pdf/1906.04358v2.pdf
PWC https://paperswithcode.com/paper/weight-agnostic-neural-networks
Repo https://github.com/google/brain-tokyo-workshop
Framework none
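
The evaluation procedure in the abstract above (fill every connection with one shared weight drawn from a uniform distribution, then measure expected performance) can be sketched on a toy problem. The two-connection topology, the sign-prediction task, the sampling range of -2 to 2, and all names below are assumptions chosen for illustration; they are not taken from the paper or the linked repository.

```python
import math
import random

def forward(x, input_index, w):
    """Tiny fixed topology: input -> tanh -> tanh, both connections share w."""
    hidden = math.tanh(w * x[input_index])
    return math.tanh(w * hidden)

def accuracy(input_index, w, data):
    """Fraction of examples where the sign of the output matches the label."""
    correct = sum(1 for x, y in data if (forward(x, input_index, w) > 0) == (y > 0))
    return correct / len(data)

def expected_performance(input_index, data, n_samples=20, low=-2.0, high=2.0, seed=0):
    """Average accuracy over shared weights sampled from a uniform distribution."""
    rng = random.Random(seed)
    scores = [accuracy(input_index, rng.uniform(low, high), data) for _ in range(n_samples)]
    return sum(scores) / len(scores)

# Hypothetical task: the label is the sign of the first input feature.
data = [([1.0, -1.0], 1), ([-1.0, 1.0], -1), ([2.0, 2.0], 1), ([-0.5, -3.0], -1)]

wired_to_relevant = expected_performance(0, data)
wired_to_irrelevant = expected_performance(1, data)
```

In this toy setup the shared weight appears twice along the path, so its sign cancels and the output sign follows the chosen input for any nonzero weight. The topology wired to the relevant input therefore scores well regardless of the sampled weight, while the other does not, which is the kind of architecture-only signal the abstract describes.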

IPOD: Corpus of 190,000 Industrial Occupations

Title IPOD: Corpus of 190,000 Industrial Occupations
Authors Junhua Liu, Chu Guo, Yung Chuen Ng, Kristin L. Wood, Kwan Hui Lim
Abstract Job titles are the most fundamental building blocks for occupational data mining tasks, such as Career Modelling and Job Recommendation. However, there is no publicly available dataset to support such efforts. In this work, we present the Industrial and Professional Occupations Dataset (IPOD), which is a comprehensive corpus that consists of over 190,000 job titles crawled from over 56,000 profiles from LinkedIn. To the best of our knowledge, IPOD is the first dataset released for industrial occupations mining. We use a knowledge-based approach for sequence tagging, creating a gazetteer with domain-specific named entities tagged by 3 experts. All title NE tags are populated by the gazetteer using BIOES scheme. Finally, we develop 4 baseline models for the dataset on NER task with several models, including Linear Regression, CRF, LSTM and the state-of-the-art bi-directional LSTM-CRF. Both CRF and LSTM-CRF outperform humans in both exact-match accuracy and f1 scores.
Tasks
Published 2019-10-22
URL https://arxiv.org/abs/1910.10495v1
PDF https://arxiv.org/pdf/1910.10495v1.pdf
PWC https://paperswithcode.com/paper/ipod-corpus-of-190000-industrial-occupations
Repo https://github.com/junhua/IPOD
Framework none
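
The gazetteer-driven BIOES tagging mentioned in the abstract above can be illustrated with a short sketch. BIOES marks the Beginning, Inside, and End tokens of a multi-token entity, Single-token entities, and Outside tokens. The longest-match lookup, the entity labels LEVEL and ROLE, and the example title are hypothetical and are not taken from the IPOD dataset or its repository.

```python
def bioes_tags(tokens, gazetteer):
    """Tag tokens with BIOES labels using longest-match gazetteer lookup."""
    tags = ["O"] * len(tokens)
    max_len = max((len(phrase) for phrase in gazetteer), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(t.lower() for t in tokens[i:i + length])
            label = gazetteer.get(phrase)
            if label is None:
                continue
            if length == 1:
                tags[i] = f"S-{label}"
            else:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + length - 1):
                    tags[j] = f"I-{label}"
                tags[i + length - 1] = f"E-{label}"
            i += length
            matched = True
            break
        if not matched:
            i += 1  # token stays tagged "O"
    return tags

# Hypothetical gazetteer: lowercase token tuples mapped to entity labels.
gazetteer = {
    ("senior",): "LEVEL",
    ("machine", "learning", "engineer"): "ROLE",
}
tags = bioes_tags(["Senior", "Machine", "Learning", "Engineer", "Asia"], gazetteer)
```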

Towards Digital Retina in Smart Cities: A Model Generation, Utilization and Communication Paradigm

Title Towards Digital Retina in Smart Cities: A Model Generation, Utilization and Communication Paradigm
Authors Yihang Lou, Ling-Yu Duan, Yong Luo, Ziqian Chen, Tongliang Liu, Shiqi Wang, Wen Gao
Abstract The digital retina in smart cities is to select what the City Eye tells the City Brain, and convert the acquired visual data from front-end visual sensors to features in an intelligent sensing manner. By deploying deep learning and/or handcrafted models in front-end devices, the compact features can be extracted and subsequently delivered to back-end cloud for search and advanced analytics. In this context, we propose a model generation, utilization, and communication paradigm, aiming to address a set of unique challenges for better artificial intelligence services in smart cities. In particular, we present an integrated multiple deep learning models reuse and prediction strategy, which greatly increases the feasibility of the digital retina in processing and analyzing the large-scale visual data in smart cities. The promise of the proposed paradigm is demonstrated through a set of experiments.
Tasks
Published 2019-07-31
URL https://arxiv.org/abs/1907.13368v1
PDF https://arxiv.org/pdf/1907.13368v1.pdf
PWC https://paperswithcode.com/paper/towards-digital-retina-in-smart-cities-a
Repo https://github.com/PKU-IMRE/Retina
Framework none

Using Deep Learning and Machine Learning to Detect Epileptic Seizure with Electroencephalography (EEG) Data

Title Using Deep Learning and Machine Learning to Detect Epileptic Seizure with Electroencephalography (EEG) Data
Authors Haotian Liu, Lin Xi, Ying Zhao, Zhixiang Li
Abstract The prediction of epileptic seizures has always been extremely challenging in the medical domain. However, with the development of computer technology, the application of machine learning has introduced new ideas for seizure forecasting. Applying machine learning models to the prediction of epileptic seizures could help us obtain better results, and many scientists have been doing such work, so sufficient medical data is available for researchers to train machine learning models.
Tasks EEG
Published 2019-10-06
URL https://arxiv.org/abs/1910.02544v1
PDF https://arxiv.org/pdf/1910.02544v1.pdf
PWC https://paperswithcode.com/paper/using-deep-learning-and-machine-learning-to
Repo https://github.com/gabi-a/EEG-Literature
Framework none

Signal2Image Modules in Deep Neural Networks for EEG Classification

Title Signal2Image Modules in Deep Neural Networks for EEG Classification
Authors Paschalis Bizopoulos, George I Lambrou, Dimitrios Koutsouris
Abstract Deep learning has revolutionized computer vision utilizing the increased availability of big data and the power of parallel computational units such as graphical processing units. The vast majority of deep learning research is conducted using images as training data, however the biomedical domain is rich in physiological signals that are used for diagnosis and prediction problems. It is still an open research question how to best utilize signals to train deep neural networks. In this paper we define the term Signal2Image (S2Is) as trainable or non-trainable prefix modules that convert signals, such as Electroencephalography (EEG), to image-like representations making them suitable for training image-based deep neural networks defined as 'base models'. We compare the accuracy and time performance of four S2Is ('signal as image', spectrogram, one and two layer Convolutional Neural Networks (CNNs)) combined with a set of 'base models' (LeNet, AlexNet, VGGnet, ResNet, DenseNet) along with the depth-wise and 1D variations of the latter. We also provide empirical evidence that the one layer CNN S2I performs better in eleven out of fifteen tested models than non-trainable S2Is for classifying EEG signals and we present visual comparisons of the outputs of the S2Is.
Tasks EEG
Published 2019-04-18
URL https://arxiv.org/abs/1904.13216v3
PDF https://arxiv.org/pdf/1904.13216v3.pdf
PWC https://paperswithcode.com/paper/190413216
Repo https://github.com/pbizopoulos/signal2image-modules-in-deep-neural-networks-for-eeg-classification
Framework pytorch

Asymmetric Valleys: Beyond Sharp and Flat Local Minima

Title Asymmetric Valleys: Beyond Sharp and Flat Local Minima
Authors Haowei He, Gao Huang, Yang Yuan
Abstract Despite the non-convex nature of their loss functions, deep neural networks are known to generalize well when optimized with stochastic gradient descent (SGD). Recent work conjectures that SGD with proper configuration is able to find wide and flat local minima, which have been proposed to be associated with good generalization performance. In this paper, we observe that local minima of modern deep networks are more than being flat or sharp. Specifically, at a local minimum there exist many asymmetric directions such that the loss increases abruptly along one side, and slowly along the opposite side–we formally define such minima as asymmetric valleys. Under mild assumptions, we prove that for asymmetric valleys, a solution biased towards the flat side generalizes better than the exact minimizer. Further, we show that simply averaging the weights along the SGD trajectory gives rise to such biased solutions implicitly. This provides a theoretical explanation for the intriguing phenomenon observed by Izmailov et al. (2018). In addition, we empirically find that batch normalization (BN) appears to be a major cause for asymmetric valleys.
Tasks
Published 2019-02-02
URL http://arxiv.org/abs/1902.00744v2
PDF http://arxiv.org/pdf/1902.00744v2.pdf
PWC https://paperswithcode.com/paper/asymmetric-valleys-beyond-sharp-and-flat
Repo https://github.com/962086838/code-for-Asymmetric-Valley
Framework pytorch
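
The abstract above refers to averaging the weights along the SGD trajectory. A minimal sketch of that averaging step, done incrementally so the full trajectory need not be stored, is shown below. The function name and the three-step toy trajectory are assumptions for illustration; the sketch does not reproduce the paper's experiments or the linked code.

```python
def update_running_average(avg, new_weights, n_averaged):
    """Fold one more weight vector into a running element-wise mean."""
    return [a + (w - a) / (n_averaged + 1) for a, w in zip(avg, new_weights)]

# Hypothetical weight vectors visited by SGD at three checkpoints.
trajectory = [[1.0, 4.0], [2.0, 2.0], [3.0, 0.0]]

averaged = list(trajectory[0])
for n_averaged, weights in enumerate(trajectory[1:], start=1):
    averaged = update_running_average(averaged, weights, n_averaged)
```

The result equals the plain mean of the collected weight vectors; the averaged point generally differs from the final iterate, which is the kind of biased solution the abstract discusses.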

Learning Cross-modal Context Graph for Visual Grounding

Title Learning Cross-modal Context Graph for Visual Grounding
Authors Yongfei Liu, Bo Wan, Xiaodan Zhu, Xuming He
Abstract Visual grounding is a ubiquitous building block in many vision-language tasks and yet remains challenging due to large variations in visual and linguistic features of grounding entities, strong context effect and the resulting semantic ambiguities. Prior works typically focus on learning representations of individual phrases with limited context information. To address their limitations, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develop a cross-modal graph matching strategy for the multiple-phrase visual grounding task. In particular, we introduce a modular graph neural network to compute context-aware representations of phrases and object proposals respectively via message propagation, followed by a graph-based matching module to generate globally consistent localization of grounding phrases. We train the entire graph neural network jointly in a two-stage strategy and evaluate it on the Flickr30K Entities benchmark. Extensive experiments show that our method outperforms the prior state of the arts by a sizable margin, evidencing the efficacy of our grounding framework. Code is available at https://github.com/youngfly11/LCMCG-PyTorch.
Tasks Graph Matching
Published 2019-11-20
URL https://arxiv.org/abs/1911.09042v2
PDF https://arxiv.org/pdf/1911.09042v2.pdf
PWC https://paperswithcode.com/paper/learning-cross-modal-context-graph-for-visual-1
Repo https://github.com/youngfly11/LCMCG-PyTorch
Framework pytorch

Prescribed Generative Adversarial Networks

Title Prescribed Generative Adversarial Networks
Authors Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei, Michalis K. Titsias
Abstract Generative adversarial networks (GANs) are a powerful approach to unsupervised learning. They have achieved state-of-the-art performance in the image domain. However, GANs are limited in two ways. They often learn distributions with low support—a phenomenon known as mode collapse—and they do not guarantee the existence of a probability density, which makes evaluating generalization using predictive log-likelihood impossible. In this paper, we develop the prescribed GAN (PresGAN) to address these shortcomings. PresGANs add noise to the output of a density network and optimize an entropy-regularized adversarial loss. The added noise renders tractable approximations of the predictive log-likelihood and stabilizes the training procedure. The entropy regularizer encourages PresGANs to capture all the modes of the data distribution. Fitting PresGANs involves computing the intractable gradients of the entropy regularization term; PresGANs sidestep this intractability using unbiased stochastic estimates. We evaluate PresGANs on several datasets and found they mitigate mode collapse and generate samples with high perceptual quality. We further found that PresGANs reduce the gap in performance in terms of predictive log-likelihood between traditional GANs and variational autoencoders (VAEs).
Tasks Image Generation
Published 2019-10-09
URL https://arxiv.org/abs/1910.04302v1
PDF https://arxiv.org/pdf/1910.04302v1.pdf
PWC https://paperswithcode.com/paper/prescribed-generative-adversarial-networks
Repo https://github.com/adjidieng/PresGANs
Framework pytorch

Posterior-regularized REINFORCE for Instance Selection in Distant Supervision

Title Posterior-regularized REINFORCE for Instance Selection in Distant Supervision
Authors Qi Zhang, Siliang Tang, Xiang Ren, Fei Wu, Shiliang Pu, Yueting Zhuang
Abstract This paper provides a new way to improve the efficiency of the REINFORCE training process. We apply it to the task of instance selection in distant supervision. Modeling the instance selection in one bag as a sequential decision process, a reinforcement learning agent is trained to determine whether an instance is valuable or not and construct a new bag with less noisy instances. However unbiased methods, such as REINFORCE, could usually take much time to train. This paper adopts posterior regularization (PR) to integrate some domain-specific rules in instance selection using REINFORCE. As the experiment results show, this method remarkably improves the performance of the relation classifier trained on cleaned distant supervision dataset as well as the efficiency of the REINFORCE training.
Tasks
Published 2019-04-17
URL http://arxiv.org/abs/1904.08051v1
PDF http://arxiv.org/pdf/1904.08051v1.pdf
PWC https://paperswithcode.com/paper/posterior-regularized-reinforce-for-instance
Repo https://github.com/hitcszq/PRRLRE
Framework tf