July 29, 2019

3145 words 15 mins read

Paper Group AWR 123

Practical Bayesian Optimization for Variable Cost Objectives. Neural Message Passing for Quantum Chemistry. DOTA: A Large-scale Dataset for Object Detection in Aerial Images. Lifelong Generative Modeling. Artistic style transfer for videos and spherical images. A probabilistic and multi-objective analysis of lexicase selection and epsilon-lexicase …

Practical Bayesian Optimization for Variable Cost Objectives

Title Practical Bayesian Optimization for Variable Cost Objectives
Authors Mark McLeod, Michael A. Osborne, Stephen J. Roberts
Abstract We propose a novel Bayesian Optimization approach for black-box functions with an environmental variable whose value determines the tradeoff between evaluation cost and the fidelity of the evaluations. Further, we use a novel approach to sampling support points, allowing faster construction of the acquisition function. This allows us to achieve optimization with lower overheads than previous approaches and is implemented for a more general class of problems. We show this approach to be effective on synthetic and real world benchmark problems.
Tasks
Published 2017-03-13
URL http://arxiv.org/abs/1703.04335v2
PDF http://arxiv.org/pdf/1703.04335v2.pdf
PWC https://paperswithcode.com/paper/practical-bayesian-optimization-for-variable
Repo https://github.com/markm541374/gpbo
Framework none
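The cost-aware acquisition idea can be sketched with a plain GP posterior and an expected-improvement-per-unit-cost rule. This is a generic illustration, not the authors' gpbo implementation: the quadratic objective, the RBF kernel, and the linear cost profile are all invented for the demo, and the environmental (fidelity) variable is collapsed into a fixed per-point cost.

```python
import numpy as np
from math import erf

def phi(z):   # standard normal pdf
    return np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)

def Phi(z):   # standard normal cdf via erf
    return np.array([0.5 * (1 + erf(t / np.sqrt(2))) for t in np.atleast_1d(z)])

def rbf(a, b, ls=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, jitter=1e-6):
    """Exact GP posterior mean/variance for a unit-amplitude RBF kernel."""
    K = rbf(X, X) + jitter * np.eye(len(X))
    Ks = rbf(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mu = Ks.T @ alpha
    var = np.maximum(1.0 - np.sum(v ** 2, axis=0), 1e-12)  # k(x,x) = 1
    return mu, var

def ei_per_cost(mu, var, best, cost):
    """Expected improvement (for minimisation), discounted by evaluation cost."""
    s = np.sqrt(var)
    z = (best - mu) / s
    return (s * (z * Phi(z) + phi(z))) / cost

f = lambda x: (x - 0.6) ** 2        # toy objective to minimise
grid = np.linspace(0.0, 1.0, 101)
cost = 1.0 + grid                   # pretend right-hand evaluations cost more
X, y = np.array([0.1, 0.9]), f(np.array([0.1, 0.9]))
for _ in range(5):
    mu, var = gp_posterior(X, y, grid)
    xn = grid[np.argmax(ei_per_cost(mu, var, y.min(), cost))]
    X, y = np.append(X, xn), np.append(y, f(xn))
print(round(X[np.argmin(y)], 2))    # lands near the optimum at 0.6
```

Dividing the acquisition by cost is the simplest way to trade fidelity against expense; the paper's approach is more sophisticated, but the loop structure is the same.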

Neural Message Passing for Quantum Chemistry

Title Neural Message Passing for Quantum Chemistry
Authors Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, George E. Dahl
Abstract Supervised learning on molecules has incredible potential to be useful in chemistry, drug discovery, and materials science. Luckily, several promising and closely related neural network models invariant to molecular symmetries have already been described in the literature. These models learn a message passing algorithm and aggregation procedure to compute a function of their entire input graph. At this point, the next step is to find a particularly effective variant of this general approach and apply it to chemical prediction benchmarks until we either solve them or reach the limits of the approach. In this paper, we reformulate existing models into a single common framework we call Message Passing Neural Networks (MPNNs) and explore additional novel variations within this framework. Using MPNNs we demonstrate state of the art results on an important molecular property prediction benchmark; these results are strong enough that we believe future work should focus on datasets with larger molecules or more accurate ground truth labels.
Tasks Drug Discovery, Formation Energy, Graph Regression, Molecular Property Prediction, Node Classification
Published 2017-04-04
URL http://arxiv.org/abs/1704.01212v2
PDF http://arxiv.org/pdf/1704.01212v2.pdf
PWC https://paperswithcode.com/paper/neural-message-passing-for-quantum-chemistry
Repo https://github.com/Microsoft/gated-graph-neural-network-samples
Framework tf
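The MPNN abstraction is easy to state in code: for T rounds, every node aggregates messages from its neighbours and updates its hidden state, and a permutation-invariant readout produces the graph-level feature. A minimal numpy sketch with sum aggregation and a tanh update, one arbitrary choice among the many variants the paper unifies:

```python
import numpy as np

def mpnn_step(h, edges, W_msg, W_upd):
    """One round of message passing.
    h: (n, d) node states; edges: directed (src, dst) pairs."""
    m = np.zeros_like(h)
    for s, t in edges:
        m[t] += h[s] @ W_msg          # message function M(h_s)
    return np.tanh(h @ W_upd + m)     # update function U(h_t, m_t)

def readout(h):
    return h.sum(axis=0)              # permutation-invariant graph feature

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]   # undirected path 0-1-2
W_msg, W_upd = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
for _ in range(2):                    # T = 2 message-passing rounds
    h = mpnn_step(h, edges, W_msg, W_upd)
print(readout(h).shape)               # (4,): one feature vector per graph
```

Because messages are summed and the readout is a sum, relabelling the nodes (and the edge list with them) leaves the graph-level output unchanged, which is the molecular-symmetry invariance the paper builds on.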

DOTA: A Large-scale Dataset for Object Detection in Aerial Images

Title DOTA: A Large-scale Dataset for Object Detection in Aerial Images
Authors Gui-Song Xia, Xiang Bai, Jian Ding, Zhen Zhu, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang
Abstract Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to transfer to aerial imagery, not only because of the huge variation in the scale, orientation and shape of the object instances on the earth’s surface, but also due to the scarcity of well-annotated datasets of objects in aerial scenes. To advance object detection research in Earth Vision, also known as Earth Observation and Remote Sensing, we introduce a large-scale Dataset for Object deTection in Aerial images (DOTA). To this end, we collect $2806$ aerial images from different sensors and platforms. Each image is about 4000-by-4000 pixels in size and contains objects exhibiting a wide variety of scales, orientations, and shapes. These DOTA images are then annotated by experts in aerial image interpretation using $15$ common object categories. The fully annotated DOTA images contain $188,282$ instances, each of which is labeled by an arbitrary (8 d.o.f.) quadrilateral. To build a baseline for object detection in Earth Vision, we evaluate state-of-the-art object detection algorithms on DOTA. Experiments demonstrate that DOTA well represents real Earth Vision applications and is quite challenging.
Tasks Object Detection, Object Detection In Aerial Images
Published 2017-11-28
URL https://arxiv.org/abs/1711.10398v3
PDF https://arxiv.org/pdf/1711.10398v3.pdf
PWC https://paperswithcode.com/paper/dota-a-large-scale-dataset-for-object
Repo https://github.com/CAPTAIN-WHU/DOTA_devkit
Framework none
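Each DOTA instance is an arbitrary quadrilateral given as 8 coordinates, and many detectors first collapse it to a horizontal bounding box. A small helper in the spirit of the devkit (the function name is ours, not from the repo):

```python
def quad_to_hbb(quad):
    """Collapse an 8-d.o.f. DOTA quadrilateral (x1, y1, ..., x4, y4) into
    the axis-aligned (horizontal) bounding box (xmin, ymin, xmax, ymax)."""
    xs, ys = quad[0::2], quad[1::2]   # every other value is an x / a y
    return min(xs), min(ys), max(xs), max(ys)

# a rectangle rotated 45 degrees, listed from its topmost corner
print(quad_to_hbb([10, 0, 20, 10, 10, 20, 0, 10]))  # (0, 0, 20, 20)
```

The reverse direction is lossy, which is why DOTA's oriented annotations (and the Object Detection In Aerial Images task) matter: an axis-aligned box around a rotated, elongated object mostly contains background.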

Lifelong Generative Modeling

Title Lifelong Generative Modeling
Authors Jason Ramapuram, Magda Gregorova, Alexandros Kalousis
Abstract Lifelong learning is the problem of learning multiple consecutive tasks in a sequential manner, where knowledge gained from previous tasks is retained and used to aid future learning over the lifetime of the learner. It is essential to the development of intelligent machines that can adapt to their surroundings. In this work we focus on a lifelong learning approach to unsupervised generative modeling, where we continuously incorporate newly observed distributions into a learned model. We do so through a student-teacher Variational Autoencoder architecture which allows us to learn and preserve all the distributions seen so far, without the need to retain the past data nor the past models. Through the introduction of a novel cross-model regularizer, inspired by a Bayesian update rule, the student model leverages the information learned by the teacher, which acts as a probabilistic knowledge store. The regularizer reduces the effect of catastrophic interference that appears when we learn over sequences of distributions. We validate our model’s performance on sequential variants of MNIST, FashionMNIST, PermutedMNIST, SVHN and Celeb-A and demonstrate that our model mitigates the effects of catastrophic interference faced by neural networks in sequential learning scenarios.
Tasks Transfer Learning
Published 2017-05-27
URL https://arxiv.org/abs/1705.09847v6
PDF https://arxiv.org/pdf/1705.09847v6.pdf
PWC https://paperswithcode.com/paper/lifelong-generative-modeling
Repo https://github.com/jramapuram/LifelongVAE_pytorch
Framework pytorch
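The cross-model regularizer pulls the student's posterior toward the frozen teacher's. For diagonal-Gaussian VAE posteriors, the natural distance is a closed-form KL divergence; this sketch shows that term only (the full objective, the Bayesian-update weighting, and the training loop are in the authors' repo, not here):

```python
import numpy as np

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians: the shape of cross-model
    term that can tie a student posterior q to a frozen teacher's p."""
    return 0.5 * np.sum(
        logvar_p - logvar_q
        + (np.exp(logvar_q) + (mu_q - mu_p) ** 2) / np.exp(logvar_p)
        - 1.0
    )

mu_t = np.array([0.3, -0.1])   # teacher posterior mean (frozen knowledge store)
mu_s = np.array([0.5, -0.1])   # student posterior mean after seeing new data
print(kl_diag_gaussians(mu_s, np.zeros(2), mu_t, np.zeros(2)))  # ≈ 0.02
```

Minimising this alongside the usual ELBO discourages the student from drifting away from distributions only the teacher has seen, which is how catastrophic interference is reduced without replaying past data.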

Artistic style transfer for videos and spherical images

Title Artistic style transfer for videos and spherical images
Authors Manuel Ruder, Alexey Dosovitskiy, Thomas Brox
Abstract Manually re-drawing an image in a certain artistic style takes a professional artist a long time. Doing this for a video sequence single-handedly is beyond imagination. We present two computational approaches that transfer the style from one image (for example, a painting) to a whole video sequence. In our first approach, we adapt to videos the original image style transfer technique by Gatys et al. based on energy minimization. We introduce new ways of initialization and new loss functions to generate consistent and stable stylized video sequences even in cases with large motion and strong occlusion. Our second approach formulates video stylization as a learning problem. We propose a deep network architecture and training procedures that allow us to stylize arbitrary-length videos in a consistent and stable way, and nearly in real time. We show that the proposed methods clearly outperform simpler baselines both qualitatively and quantitatively. Finally, we propose a way to adapt these approaches also to 360 degree images and videos as they emerge with recent virtual reality hardware.
Tasks Style Transfer
Published 2017-08-13
URL http://arxiv.org/abs/1708.04538v3
PDF http://arxiv.org/pdf/1708.04538v3.pdf
PWC https://paperswithcode.com/paper/artistic-style-transfer-for-videos-and
Repo https://github.com/Kishwar/tensorflow
Framework tf
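The key ingredient for stable video stylization is a temporal consistency loss: penalise the current stylized frame for differing from the previous stylized frame warped forward by optical flow, except where the flow is unreliable. A hedged numpy sketch of that loss term alone (the warping, flow estimation, and network are not shown, and the function name is ours):

```python
import numpy as np

def temporal_loss(stylized, warped_prev, mask):
    """Short-term consistency penalty: per-pixel squared difference to the
    flow-warped previous stylized frame, with mask = 0 at occlusions and
    motion boundaries so disoccluded regions are free to change."""
    diff = (stylized - warped_prev) ** 2          # (H, W, 3)
    return float(np.sum(mask[..., None] * diff) / max(mask.sum(), 1.0))

frame = np.ones((4, 4, 3))
mask = np.ones((4, 4))
print(temporal_loss(frame, frame, mask))          # 0.0: perfectly consistent
mask[0, 0] = 0.0                                  # mark one pixel occluded
changed = frame.copy(); changed[0, 0] += 5.0      # change only that pixel
print(temporal_loss(changed, frame, mask))        # still 0.0: it is masked out
```

In the learning-based variant the same loss is used at training time so the feed-forward network produces consistent frames without per-frame optimization.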

A probabilistic and multi-objective analysis of lexicase selection and epsilon-lexicase selection

Title A probabilistic and multi-objective analysis of lexicase selection and epsilon-lexicase selection
Authors William La Cava, Thomas Helmuth, Lee Spector, Jason H. Moore
Abstract Lexicase selection is a parent selection method that considers training cases individually, rather than in aggregate, when performing parent selection. Whereas previous work has demonstrated the ability of lexicase selection to solve difficult problems in program synthesis and symbolic regression, the central goal of this paper is to develop the theoretical underpinnings that explain its performance. To this end, we derive an analytical formula that gives the expected probabilities of selection under lexicase selection, given a population and its behavior. In addition, we expand upon the relation of lexicase selection to many-objective optimization methods to describe the behavior of lexicase selection, which is to select individuals on the boundaries of Pareto fronts in high-dimensional space. We show analytically why lexicase selection performs more poorly for certain sizes of population and training cases, and show why it has been shown to perform more poorly in continuous error spaces. To address this last concern, we propose new variants of epsilon-lexicase selection, a method that modifies the pass condition in lexicase selection to allow near-elite individuals to pass cases, thereby improving selection performance with continuous errors. We show that epsilon-lexicase outperforms several diversity-maintenance strategies on a number of real-world and synthetic regression problems.
Tasks Program Synthesis
Published 2017-09-15
URL http://arxiv.org/abs/1709.05394v3
PDF http://arxiv.org/pdf/1709.05394v3.pdf
PWC https://paperswithcode.com/paper/a-probabilistic-and-multi-objective-analysis
Repo https://github.com/lacava/epsilon_lexicase
Framework none
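Lexicase selection itself is a short algorithm: shuffle the training cases, then repeatedly keep only the individuals that are elite (or, for epsilon-lexicase, within eps of elite) on the next case. A compact sketch:

```python
import random

def epsilon_lexicase(errors, eps=0.0, rng=random):
    """Select one parent index. errors[i][c] is individual i's error on
    case c (lower is better); eps = 0 recovers plain lexicase selection."""
    pool = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)                 # case order is randomised per selection
    for c in cases:
        best = min(errors[i][c] for i in pool)
        pool = [i for i in pool if errors[i][c] <= best + eps]
        if len(pool) == 1:
            return pool[0]
    return rng.choice(pool)            # ties broken uniformly at random

errors = [[0, 5], [5, 0], [3, 3]]      # two specialists and one generalist
print(epsilon_lexicase(errors))        # 0 or 1: a specialist always wins here
```

This tiny example shows the Pareto-boundary behaviour the paper analyses: with eps = 0 the aggregate-best generalist (individual 2) can never be selected, because some specialist beats it on whichever case comes first; a positive eps lets near-elite individuals survive each filtering step, which is what makes the method workable in continuous error spaces.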

Supervised Deep Sparse Coding Networks

Title Supervised Deep Sparse Coding Networks
Authors Xiaoxia Sun, Nasser M. Nasrabadi, Trac D. Tran
Abstract In this paper, we describe the deep sparse coding network (SCN), a novel deep network that encodes intermediate representations with nonnegative sparse coding. The SCN is built upon a number of cascading bottleneck modules, where each module consists of two sparse coding layers with relatively wide and slim dictionaries that are specialized to produce high dimensional discriminative features and low dimensional representations for clustering, respectively. During training, both the dictionaries and regularization parameters are optimized with an end-to-end supervised learning algorithm based on multilevel optimization. Effectiveness of an SCN with seven bottleneck modules is verified on several popular benchmark datasets. Remarkably, with few parameters to learn, our SCN achieves 5.81% and 19.93% classification error rate on CIFAR-10 and CIFAR-100, respectively.
Tasks
Published 2017-01-29
URL http://arxiv.org/abs/1701.08349v3
PDF http://arxiv.org/pdf/1701.08349v3.pdf
PWC https://paperswithcode.com/paper/supervised-deep-sparse-coding-networks
Repo https://github.com/XiaoxiaSun/supervised-deep-sparse-coding-networks
Framework none
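The nonnegative sparse coding each SCN layer performs can be approximated with projected ISTA: gradient steps on the reconstruction term followed by a nonnegative soft-threshold. This sketch solves min_z 0.5||x - Dz||^2 + lam*||z||_1 with z >= 0 for a fixed dictionary; the paper's end-to-end, multilevel optimization of the dictionaries is not shown:

```python
import numpy as np

def nn_sparse_code(D, x, lam=0.1, n_iter=200):
    """Nonnegative sparse coding of x in dictionary D via projected ISTA:
    z <- max(0, z - step * (D^T (D z - x) + lam))."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2      # 1 / Lipschitz constant of grad
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)
        z = np.maximum(0.0, z - step * (grad + lam))
    return z

D = np.eye(3)                          # trivial dictionary for illustration
x = np.array([1.0, 0.02, 0.0])
print(nn_sparse_code(D, x, lam=0.1))   # [0.9 0. 0.]: small entries shrink to 0
```

With an identity dictionary the solution is just the nonnegative soft-threshold max(0, x - lam), which makes the sparsifying effect easy to see; a learned wide/slim dictionary pair, as in the SCN, produces the discriminative and clustering representations the abstract describes.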

Using convolutional networks and satellite imagery to identify patterns in urban environments at a large scale

Title Using convolutional networks and satellite imagery to identify patterns in urban environments at a large scale
Authors Adrian Albert, Jasleen Kaur, Marta Gonzalez
Abstract Urban planning applications (energy audits, investment, etc.) require an understanding of built infrastructure and its environment, i.e., both low-level, physical features (amount of vegetation, building area and geometry etc.), as well as higher-level concepts such as land use classes (which encode expert understanding of socio-economic end uses). This kind of data is expensive and labor-intensive to obtain, which limits its availability (particularly in developing countries). We analyze patterns in land use in urban neighborhoods using large-scale satellite imagery data (which is available worldwide from third-party providers) and state-of-the-art computer vision techniques based on deep convolutional neural networks. For supervision, given the limited availability of standard benchmarks for remote-sensing data, we obtain ground truth land use class labels carefully sampled from open-source surveys, in particular the Urban Atlas land classification dataset of $20$ land use classes across $~300$ European cities. We use this data to train and compare deep architectures which have recently shown good performance on standard computer vision tasks (image classification and segmentation), including on geospatial data. Furthermore, we show that the deep representations extracted from satellite imagery of urban environments can be used to compare neighborhoods across several cities. We make our dataset available for other machine learning researchers to use for remote-sensing applications.
Tasks Image Classification
Published 2017-04-10
URL http://arxiv.org/abs/1704.02965v2
PDF http://arxiv.org/pdf/1704.02965v2.pdf
PWC https://paperswithcode.com/paper/using-convolutional-networks-and-satellite
Repo https://github.com/adrianalbert/urban-environments
Framework tf

Improving Generalization Performance by Switching from Adam to SGD

Title Improving Generalization Performance by Switching from Adam to SGD
Authors Nitish Shirish Keskar, Richard Socher
Abstract Despite superior training outcomes, adaptive optimization methods such as Adam, Adagrad or RMSprop have been found to generalize poorly compared to Stochastic gradient descent (SGD). These methods tend to perform well in the initial portion of training but are outperformed by SGD at later stages of training. We investigate a hybrid strategy that begins training with an adaptive method and switches to SGD when appropriate. Concretely, we propose SWATS, a simple strategy which switches from Adam to SGD when a triggering condition is satisfied. The condition we propose relates to the projection of Adam steps on the gradient subspace. By design, the monitoring process for this condition adds very little overhead and does not increase the number of hyperparameters in the optimizer. We report experiments on several standard benchmarks such as: ResNet, SENet, DenseNet and PyramidNet for the CIFAR-10 and CIFAR-100 data sets, ResNet on the tiny-ImageNet data set and language modeling with recurrent networks on the PTB and WT2 data sets. The results show that our strategy is capable of closing the generalization gap between SGD and Adam on a majority of the tasks.
Tasks Language Modelling
Published 2017-12-20
URL http://arxiv.org/abs/1712.07628v1
PDF http://arxiv.org/pdf/1712.07628v1.pdf
PWC https://paperswithcode.com/paper/improving-generalization-performance-by
Repo https://github.com/mhmdsabry/Sentiment_Analysis
Framework none
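The quantity SWATS monitors is cheap to state: project the Adam step onto the gradient to recover the SGD learning rate that would have produced the same movement along the gradient direction. A minimal sketch of that estimate (the exponential averaging and the actual switching logic from the paper are omitted):

```python
import numpy as np

def sgd_lr_estimate(adam_step, grad):
    """SGD learning rate whose step matches the Adam update's projection
    onto the gradient direction: gamma = (p . p) / -(p . g)."""
    denom = -np.dot(adam_step, grad)
    return np.dot(adam_step, adam_step) / denom if denom != 0 else float("inf")

# sanity check: if Adam happened to take an exact SGD step of size 0.01,
# the estimate recovers 0.01
g = np.array([1.0, -2.0, 0.5])
p = -0.01 * g
print(sgd_lr_estimate(p, g))  # 0.01
```

In the paper, a bias-corrected running average of this estimate is tracked, and training switches to SGD with that learning rate once the average stabilises, adding no new hyperparameters.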

Reinforcement Learning with a Corrupted Reward Channel

Title Reinforcement Learning with a Corrupted Reward Channel
Authors Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg
Abstract No real-world reward function is perfect. Sensory errors and software bugs may result in RL agents observing higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP. Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agent’s optimisation, reward corruption can be partially managed under some assumptions.
Tasks
Published 2017-05-23
URL http://arxiv.org/abs/1705.08417v2
PDF http://arxiv.org/pdf/1705.08417v2.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-with-a-corrupted
Repo https://github.com/jvmancuso/safe-grid-agents
Framework tf
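The failure mode is easy to reproduce in miniature: an agent that greedily maximises the observed reward lands exactly on the corrupt state. The randomisation remedy the abstract alludes to can be sketched as a quantiliser-style rule that samples among the top observed rewards instead of committing to the maximum (the numbers below are invented for illustration):

```python
import numpy as np

# three candidate states: the true reward and what the sensor reports
true_reward = np.array([0.0, 0.8, 0.9])
observed    = np.array([1.0, 0.8, 0.9])    # state 0's reward sensor is broken

greedy = int(np.argmax(observed))          # a naive agent trusts the sensor
print(true_reward[greedy])                 # 0.0: maximal observed, zero true

# quantiliser-style randomisation: sample uniformly from the top half of
# states by observed reward, limiting how much one corrupt entry can cost
rng = np.random.default_rng(0)
top = np.argsort(observed)[-2:]            # states {2, 0} by observed reward
expected_true = true_reward[top].mean()
print(expected_true)                       # 0.45: corruption only partially hurts
```

This matches the abstract's conclusion: randomisation blunts the optimisation pressure toward corrupt states, managing the problem partially rather than completely.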

Deep learning for universal linear embeddings of nonlinear dynamics

Title Deep learning for universal linear embeddings of nonlinear dynamics
Authors Bethany Lusch, J. Nathan Kutz, Steven L. Brunton
Abstract Identifying coordinate transformations that make strongly nonlinear dynamics approximately linear is a central challenge in modern dynamical systems. These transformations have the potential to enable prediction, estimation, and control of nonlinear systems using standard linear theory. The Koopman operator has emerged as a leading data-driven embedding, as eigenfunctions of this operator provide intrinsic coordinates that globally linearize the dynamics. However, identifying and representing these eigenfunctions has proven to be mathematically and computationally challenging. This work leverages the power of deep learning to discover representations of Koopman eigenfunctions from trajectory data of dynamical systems. Our network is parsimonious and interpretable by construction, embedding the dynamics on a low-dimensional manifold that is of the intrinsic rank of the dynamics and parameterized by the Koopman eigenfunctions. In particular, we identify nonlinear coordinates on which the dynamics are globally linear using a modified auto-encoder. We also generalize Koopman representations to include a ubiquitous class of systems that exhibit continuous spectra, ranging from the simple pendulum to nonlinear optics and broadband turbulence. Our framework parametrizes the continuous frequency using an auxiliary network, enabling a compact and efficient embedding at the intrinsic rank, while connecting our models to half a century of asymptotics. In this way, we benefit from the power and generality of deep learning, while retaining the physical interpretability of Koopman embeddings.
Tasks
Published 2017-12-27
URL http://arxiv.org/abs/1712.09707v2
PDF http://arxiv.org/pdf/1712.09707v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-universal-linear-embeddings
Repo https://github.com/BethanyL/DeepKoopman
Framework none
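Once coordinates are found in which the dynamics are linear, advancing the system is a single matrix multiply, and the Koopman matrix itself can be fit by least squares on trajectory snapshots (the DMD-style step; the paper's contribution, learning the nonlinear coordinate change with an autoencoder, is not shown here):

```python
import numpy as np

def fit_koopman(Y):
    """Least-squares fit of K with y_{t+1} ≈ K y_t, from a trajectory Y of
    shape (T, d) in coordinates where the dynamics are (nearly) linear."""
    Y0, Y1 = Y[:-1].T, Y[1:].T            # snapshot pairs, each d x (T-1)
    return Y1 @ np.linalg.pinv(Y0)

# trajectory of a known linear system y_{t+1} = A y_t
A = np.array([[0.9, 0.1], [0.0, 0.8]])
y = np.array([1.0, 1.0])
traj = [y]
for _ in range(10):
    y = A @ y
    traj.append(y)
K = fit_koopman(np.array(traj))
print(np.round(K, 3))                      # recovers A exactly here
```

For genuinely nonlinear systems this fit fails in the original coordinates, which is precisely why the paper trains an encoder whose latent trajectories admit such a K, and adds an auxiliary network to parametrise continuously varying eigenfrequencies.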

Revisiting Simple Neural Networks for Learning Representations of Knowledge Graphs

Title Revisiting Simple Neural Networks for Learning Representations of Knowledge Graphs
Authors Srinivas Ravishankar, Chandrahas, Partha Pratim Talukdar
Abstract We address the problem of learning vector representations for entities and relations in Knowledge Graphs (KGs) for Knowledge Base Completion (KBC). This problem has received significant attention in the past few years and multiple methods have been proposed. Most of the existing methods in the literature use a predefined characteristic scoring function for evaluating the correctness of KG triples. These scoring functions distinguish correct triples (high score) from incorrect ones (low score). However, their performance varies across different datasets. In this work, we demonstrate that a simple neural network based score function can consistently achieve near state-of-the-art performance on multiple datasets. We also quantitatively demonstrate biases in standard benchmark datasets, and highlight the need to perform evaluation spanning various datasets.
Tasks Knowledge Base Completion, Knowledge Graphs
Published 2017-11-15
URL http://arxiv.org/abs/1711.05401v3
PDF http://arxiv.org/pdf/1711.05401v3.pdf
PWC https://paperswithcode.com/paper/revisiting-simple-neural-networks-for
Repo https://github.com/Srinivas-R/AKBC-2017-Paper-14
Framework tf
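The "simple neural network based score function" family is easy to illustrate: concatenate the subject, relation, and object embeddings and pass them through a small MLP whose scalar output is the triple's plausibility (an ER-MLP-style scorer; the exact architecture and training objective here are our simplification, not the paper's):

```python
import numpy as np

def score_triple(e_s, r, e_o, W1, w2):
    """Score a (subject, relation, object) triple with a one-hidden-layer
    network over the concatenated embeddings; higher means more plausible."""
    x = np.concatenate([e_s, r, e_o])   # (3d,) joint input
    h = np.tanh(W1 @ x)                 # hidden representation
    return float(w2 @ h)                # scalar score

rng = np.random.default_rng(0)
d, hidden = 4, 8
W1, w2 = rng.normal(size=(hidden, 3 * d)), rng.normal(size=hidden)
s, r, o = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
print(score_triple(s, r, o, W1, w2))    # a real-valued plausibility score
```

Unlike TransE- or DistMult-style scorers, nothing about the interaction between entities and relations is hard-coded; the MLP learns it, which is the flexibility the paper credits for the consistent cross-dataset performance.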

Characterizing Political Fake News in Twitter by its Meta-Data

Title Characterizing Political Fake News in Twitter by its Meta-Data
Authors Julio Amador, Axel Oehmichen, Miguel Molina-Solana
Abstract This article presents a preliminary approach towards characterizing political fake news on Twitter through the analysis of their meta-data. In particular, we focus on more than 1.5M tweets collected on the day of the election of Donald Trump as 45th president of the United States of America. We use the meta-data embedded within those tweets in order to look for differences between tweets containing fake news and tweets not containing them. Specifically, we perform our analysis only on tweets that went viral, by studying proxies for users’ exposure to the tweets, by characterizing accounts spreading fake news, and by looking at their polarization. We found significant differences in the distribution of followers, the number of URLs in tweets, and the verification of the users.
Tasks
Published 2017-12-16
URL http://arxiv.org/abs/1712.05999v1
PDF http://arxiv.org/pdf/1712.05999v1.pdf
PWC https://paperswithcode.com/paper/characterizing-political-fake-news-in-twitter
Repo https://github.com/MissLummie/ADAFake
Framework none

From safe screening rules to working sets for faster Lasso-type solvers

Title From safe screening rules to working sets for faster Lasso-type solvers
Authors Mathurin Massias, Alexandre Gramfort, Joseph Salmon
Abstract Convex sparsity-promoting regularizations are ubiquitous in modern statistical learning. By construction, they yield solutions with few non-zero coefficients, which correspond to saturated constraints in the dual optimization formulation. Working set (WS) strategies are generic optimization techniques that consist in solving simpler problems that only consider a subset of constraints, whose indices form the WS. Working set methods therefore involve two nested iterations: the outer loop corresponds to the definition of the WS and the inner loop calls a solver for the subproblems. For the Lasso estimator a WS is a set of features, while for a Group Lasso it refers to a set of groups. In practice, WS are generally small in this context so the associated feature Gram matrix can fit in memory. Here we show that the Gauss-Southwell rule (a greedy strategy for block coordinate descent techniques) leads to fast solvers in this case. Combined with a working set strategy based on an aggressive use of so-called Gap Safe screening rules, we propose a solver achieving state-of-the-art performance on sparse learning problems. Results are presented on Lasso and multi-task Lasso estimators.
Tasks Sparse Learning
Published 2017-03-21
URL http://arxiv.org/abs/1703.07285v2
PDF http://arxiv.org/pdf/1703.07285v2.pdf
PWC https://paperswithcode.com/paper/from-safe-screening-rules-to-working-sets-for
Repo https://github.com/mathurinm/A5G
Framework none
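The Gauss-Southwell rule the abstract highlights is greedy coordinate descent: at each step, update the coordinate with the largest partial gradient instead of cycling. A self-contained Lasso sketch (full coordinate descent rather than the paper's working-set-plus-screening solver, and without the Gap Safe rules):

```python
import numpy as np

def lasso_cd_gs(X, y, lam, n_iter=100):
    """Coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1, choosing the
    coordinate by the Gauss-Southwell (largest partial gradient) rule."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)          # per-feature curvature ||X_j||^2
    for _ in range(n_iter):
        r = y - X @ w
        grads = X.T @ r
        j = int(np.argmax(np.abs(grads)))  # Gauss-Southwell choice
        z = w[j] + grads[j] / col_sq[j]    # unpenalised coordinate optimum
        w[j] = np.sign(z) * max(abs(z) - lam / col_sq[j], 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
w_true = np.zeros(10); w_true[:2] = [3.0, -2.0]
y = X @ w_true                             # noiseless sparse signal
w = lasso_cd_gs(X, y, lam=0.1)
print(np.nonzero(np.abs(w) > 0.5)[0])      # support concentrates on 0 and 1
```

The paper's point is that when the working set is small enough for its Gram matrix to fit in memory, this greedy rule becomes cheap and fast, and combining it with aggressive Gap Safe screening gives the reported state-of-the-art solver.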

A Large Dimensional Analysis of Least Squares Support Vector Machines

Title A Large Dimensional Analysis of Least Squares Support Vector Machines
Authors Zhenyu Liao, Romain Couillet
Abstract In this article, a large dimensional performance analysis of kernel least squares support vector machines (LS-SVMs) is provided under the assumption of a two-class Gaussian mixture model for the input data. Building upon recent advances in random matrix theory, we show, when the dimension of data $p$ and their number $n$ are both large, that the LS-SVM decision function can be well approximated by a normally distributed random variable, the mean and variance of which depend explicitly on a local behavior of the kernel function. This theoretical result is then applied to the MNIST and Fashion-MNIST datasets which, despite their non-Gaussianity, exhibit a convincingly close behavior. Most importantly, our analysis provides a deeper understanding of the mechanism at play in SVM-type methods and in particular of the impact of the choice of kernel function, as well as some of their theoretical limits in separating high dimensional Gaussian vectors.
Tasks
Published 2017-01-11
URL http://arxiv.org/abs/1701.02967v2
PDF http://arxiv.org/pdf/1701.02967v2.pdf
PWC https://paperswithcode.com/paper/a-large-dimensional-analysis-of-least-squares
Repo https://github.com/Zhenyu-LIAO/RMT4LSSVM
Framework none
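Unlike the standard SVM, the LS-SVM replaces inequality constraints with equalities, so training reduces to one linear system. A minimal sketch in the function-estimation form, solving [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] and classifying by the sign of the decision function (the toy data and linear kernel are ours):

```python
import numpy as np

def lssvm_train(K, y, gamma=1.0):
    """Solve the LS-SVM linear system for bias b and dual coefficients alpha."""
    n = len(y)
    top = np.concatenate([[0.0], np.ones(n)])
    A = np.vstack([top, np.column_stack([np.ones(n), K + np.eye(n) / gamma])])
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]

def lssvm_decision(K_eval, alpha, b):
    """f(x) = sum_i alpha_i k(x_i, x) + b, given kernel evaluations K_eval."""
    return K_eval @ alpha + b

# linearly separable 1-D toy data with a linear kernel
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
K = X @ X.T
b, alpha = lssvm_train(K, y, gamma=10.0)
print(np.sign(lssvm_decision(K, alpha, b)))  # [-1. -1.  1.  1.]
```

Because every training point contributes through a dense linear system, the decision function is a smooth function of the whole kernel matrix, which is exactly the structure that makes the paper's random-matrix analysis of its large-$p$, large-$n$ behavior tractable.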