Paper Group AWR 123
Practical Bayesian Optimization for Variable Cost Objectives
Title | Practical Bayesian Optimization for Variable Cost Objectives |
Authors | Mark McLeod, Michael A. Osborne, Stephen J. Roberts |
Abstract | We propose a novel Bayesian Optimization approach for black-box functions with an environmental variable whose value determines the tradeoff between evaluation cost and the fidelity of the evaluations. Further, we use a novel approach to sampling support points, allowing faster construction of the acquisition function. This allows us to achieve optimization with lower overheads than previous approaches and is implemented for a more general class of problem. We show this approach to be effective on synthetic and real world benchmark problems. |
Tasks | |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04335v2 |
PDF | http://arxiv.org/pdf/1703.04335v2.pdf |
PWC | https://paperswithcode.com/paper/practical-bayesian-optimization-for-variable |
Repo | https://github.com/markm541374/gpbo |
Framework | none |
Neural Message Passing for Quantum Chemistry
Title | Neural Message Passing for Quantum Chemistry |
Authors | Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, George E. Dahl |
Abstract | Supervised learning on molecules has incredible potential to be useful in chemistry, drug discovery, and materials science. Luckily, several promising and closely related neural network models invariant to molecular symmetries have already been described in the literature. These models learn a message passing algorithm and aggregation procedure to compute a function of their entire input graph. At this point, the next step is to find a particularly effective variant of this general approach and apply it to chemical prediction benchmarks until we either solve them or reach the limits of the approach. In this paper, we reformulate existing models into a single common framework we call Message Passing Neural Networks (MPNNs) and explore additional novel variations within this framework. Using MPNNs we demonstrate state of the art results on an important molecular property prediction benchmark; these results are strong enough that we believe future work should focus on datasets with larger molecules or more accurate ground truth labels. |
Tasks | Drug Discovery, Formation Energy, Graph Regression, Molecular Property Prediction, Node Classification |
Published | 2017-04-04 |
URL | http://arxiv.org/abs/1704.01212v2 |
PDF | http://arxiv.org/pdf/1704.01212v2.pdf |
PWC | https://paperswithcode.com/paper/neural-message-passing-for-quantum-chemistry |
Repo | https://github.com/Microsoft/gated-graph-neural-network-samples |
Framework | tf |
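The message passing and aggregation idea described in the abstract above can be illustrated with a minimal sketch. The message, update, and readout functions here are hand-picked stand-ins (a neighbor sum, a fixed average, and a global sum) chosen for illustration; in an MPNN these would be learned, and node states would be vectors rather than scalars. The function name `message_passing` is hypothetical and not taken from the paper or its repository.

```python
def message_passing(node_states, edges, steps=2):
    """Toy message passing on an undirected graph with scalar node states.

    Assumptions (not from the paper): messages are plain neighbor sums,
    the update is a fixed 50/50 average, and the readout is a global sum.
    """
    states = list(node_states)
    neighbors = {v: [] for v in range(len(states))}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    for _ in range(steps):
        # Message phase: each node aggregates the states of its neighbors.
        messages = [sum(states[w] for w in neighbors[v]) for v in range(len(states))]
        # Update phase: combine the old state with the aggregated message.
        states = [0.5 * states[v] + 0.5 * messages[v] for v in range(len(states))]

    # Readout phase: a sum over nodes does not depend on node ordering.
    return sum(states)


# A path graph 0-1-2, and the same graph with its nodes relabeled.
original = message_passing([1.0, 2.0, 3.0], [(0, 1), (1, 2)])
relabeled = message_passing([3.0, 1.0, 2.0], [(1, 2), (2, 0)])
print(original, relabeled)
```

Because both the neighbor aggregation and the readout are sums, relabeling the nodes leaves the output unchanged, which is the kind of invariance to graph symmetries the abstract refers to.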
DOTA: A Large-scale Dataset for Object Detection in Aerial Images
Title | DOTA: A Large-scale Dataset for Object Detection in Aerial Images |
Authors | Gui-Song Xia, Xiang Bai, Jian Ding, Zhen Zhu, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang |
Abstract | Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to reach aerial imagery, not only because of the huge variation in the scale, orientation and shape of the object instances on the earth’s surface, but also due to the scarcity of well-annotated datasets of objects in aerial scenes. To advance object detection research in Earth Vision, also known as Earth Observation and Remote Sensing, we introduce a large-scale Dataset for Object deTection in Aerial images (DOTA). To this end, we collect $2806$ aerial images from different sensors and platforms. Each image is about 4000-by-4000 pixels in size and contains objects exhibiting a wide variety of scales, orientations, and shapes. These DOTA images are then annotated by experts in aerial image interpretation using $15$ common object categories. The fully annotated DOTA images contain $188,282$ instances, each of which is labeled by an arbitrary (8 d.o.f.) quadrilateral. To build a baseline for object detection in Earth Vision, we evaluate state-of-the-art object detection algorithms on DOTA. Experiments demonstrate that DOTA well represents real Earth Vision applications and is quite challenging. |
Tasks | Object Detection, Object Detection In Aerial Images |
Published | 2017-11-28 |
URL | https://arxiv.org/abs/1711.10398v3 |
PDF | https://arxiv.org/pdf/1711.10398v3.pdf |
PWC | https://paperswithcode.com/paper/dota-a-large-scale-dataset-for-object |
Repo | https://github.com/CAPTAIN-WHU/DOTA_devkit |
Framework | none |
Lifelong Generative Modeling
Title | Lifelong Generative Modeling |
Authors | Jason Ramapuram, Magda Gregorova, Alexandros Kalousis |
Abstract | Lifelong learning is the problem of learning multiple consecutive tasks in a sequential manner, where knowledge gained from previous tasks is retained and used to aid future learning over the lifetime of the learner. It is essential towards the development of intelligent machines that can adapt to their surroundings. In this work we focus on a lifelong learning approach to unsupervised generative modeling, where we continuously incorporate newly observed distributions into a learned model. We do so through a student-teacher Variational Autoencoder architecture which allows us to learn and preserve all the distributions seen so far, without the need to retain the past data nor the past models. Through the introduction of a novel cross-model regularizer, inspired by a Bayesian update rule, the student model leverages the information learned by the teacher, which acts as a probabilistic knowledge store. The regularizer reduces the effect of catastrophic interference that appears when we learn over sequences of distributions. We validate our model’s performance on sequential variants of MNIST, FashionMNIST, PermutedMNIST, SVHN and Celeb-A and demonstrate that our model mitigates the effects of catastrophic interference faced by neural networks in sequential learning scenarios. |
Tasks | Transfer Learning |
Published | 2017-05-27 |
URL | https://arxiv.org/abs/1705.09847v6 |
PDF | https://arxiv.org/pdf/1705.09847v6.pdf |
PWC | https://paperswithcode.com/paper/lifelong-generative-modeling |
Repo | https://github.com/jramapuram/LifelongVAE_pytorch |
Framework | pytorch |
Artistic style transfer for videos and spherical images
Title | Artistic style transfer for videos and spherical images |
Authors | Manuel Ruder, Alexey Dosovitskiy, Thomas Brox |
Abstract | Manually re-drawing an image in a certain artistic style takes a professional artist a long time. Doing this for a video sequence single-handedly is beyond imagination. We present two computational approaches that transfer the style from one image (for example, a painting) to a whole video sequence. In our first approach, we adapt to videos the original image style transfer technique by Gatys et al. based on energy minimization. We introduce new ways of initialization and new loss functions to generate consistent and stable stylized video sequences even in cases with large motion and strong occlusion. Our second approach formulates video stylization as a learning problem. We propose a deep network architecture and training procedures that allow us to stylize arbitrary-length videos in a consistent and stable way, and nearly in real time. We show that the proposed methods clearly outperform simpler baselines both qualitatively and quantitatively. Finally, we propose a way to adapt these approaches also to 360 degree images and videos as they emerge with recent virtual reality hardware. |
Tasks | Style Transfer |
Published | 2017-08-13 |
URL | http://arxiv.org/abs/1708.04538v3 |
PDF | http://arxiv.org/pdf/1708.04538v3.pdf |
PWC | https://paperswithcode.com/paper/artistic-style-transfer-for-videos-and |
Repo | https://github.com/Kishwar/tensorflow |
Framework | tf |
A probabilistic and multi-objective analysis of lexicase selection and epsilon-lexicase selection
Title | A probabilistic and multi-objective analysis of lexicase selection and epsilon-lexicase selection |
Authors | William La Cava, Thomas Helmuth, Lee Spector, Jason H. Moore |
Abstract | Lexicase selection is a parent selection method that considers training cases individually, rather than in aggregate, when performing parent selection. Whereas previous work has demonstrated the ability of lexicase selection to solve difficult problems in program synthesis and symbolic regression, the central goal of this paper is to develop the theoretical underpinnings that explain its performance. To this end, we derive an analytical formula that gives the expected probabilities of selection under lexicase selection, given a population and its behavior. In addition, we expand upon the relation of lexicase selection to many-objective optimization methods to describe the behavior of lexicase selection, which is to select individuals on the boundaries of Pareto fronts in high-dimensional space. We show analytically why lexicase selection performs more poorly for certain sizes of population and training cases, and show why it has been shown to perform more poorly in continuous error spaces. To address this last concern, we propose new variants of epsilon-lexicase selection, a method that modifies the pass condition in lexicase selection to allow near-elite individuals to pass cases, thereby improving selection performance with continuous errors. We show that epsilon-lexicase outperforms several diversity-maintenance strategies on a number of real-world and synthetic regression problems. |
Tasks | Program Synthesis |
Published | 2017-09-15 |
URL | http://arxiv.org/abs/1709.05394v3 |
PDF | http://arxiv.org/pdf/1709.05394v3.pdf |
PWC | https://paperswithcode.com/paper/a-probabilistic-and-multi-objective-analysis |
Repo | https://github.com/lacava/epsilon_lexicase |
Framework | none |
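The selection procedure summarized in the abstract above can be sketched as follows: training cases are visited one at a time in random order, and only candidates that pass the current case survive. This is a minimal sketch, not the authors' implementation. It assumes a fixed, user-supplied `epsilon` threshold, whereas the paper proposes its own variants of the pass condition; the function name `lexicase_select` and the error-matrix layout are also assumptions made for illustration.

```python
import random


def lexicase_select(errors, epsilon=0.0, rng=None):
    """Pick one parent index from a population.

    errors[i][j] is the error of individual i on training case j (lower is
    better). With epsilon == 0 this is plain lexicase selection; a positive
    epsilon lets near-elite individuals pass a case (assumed fixed here).
    """
    rng = rng or random.Random()
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # each selection event uses a fresh random case order

    for case in cases:
        best = min(errors[i][case] for i in candidates)
        # Keep only candidates within epsilon of the best error on this case.
        candidates = [i for i in candidates if errors[i][case] <= best + epsilon]
        if len(candidates) == 1:
            break

    # Any remaining ties are broken uniformly at random.
    return rng.choice(candidates)


# Individual 0 is best on case 0, individual 1 is best on case 1,
# and individual 2 is never elite on any case.
errors = [[0.0, 1.0], [0.05, 0.0], [1.0, 1.0]]
print(lexicase_select(errors, epsilon=0.0, rng=random.Random(0)))
print(lexicase_select(errors, epsilon=0.1, rng=random.Random(0)))
```

With `epsilon=0.0` the specialists on each case (individuals 0 and 1) are the only ones that can be chosen. With `epsilon=0.1` individual 1 counts as near-elite on case 0 as well, so it is selected regardless of the case order.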
Supervised Deep Sparse Coding Networks
Title | Supervised Deep Sparse Coding Networks |
Authors | Xiaoxia Sun, Nasser M. Nasrabadi, Trac D. Tran |
Abstract | In this paper, we describe the deep sparse coding network (SCN), a novel deep network that encodes intermediate representations with nonnegative sparse coding. The SCN is built upon a number of cascading bottleneck modules, where each module consists of two sparse coding layers with relatively wide and slim dictionaries that are specialized to produce high dimensional discriminative features and low dimensional representations for clustering, respectively. During training, both the dictionaries and regularization parameters are optimized with an end-to-end supervised learning algorithm based on multilevel optimization. Effectiveness of an SCN with seven bottleneck modules is verified on several popular benchmark datasets. Remarkably, with few parameters to learn, our SCN achieves 5.81% and 19.93% classification error rate on CIFAR-10 and CIFAR-100, respectively. |
Tasks | |
Published | 2017-01-29 |
URL | http://arxiv.org/abs/1701.08349v3 |
PDF | http://arxiv.org/pdf/1701.08349v3.pdf |
PWC | https://paperswithcode.com/paper/supervised-deep-sparse-coding-networks |
Repo | https://github.com/XiaoxiaSun/supervised-deep-sparse-coding-networks |
Framework | none |
Using convolutional networks and satellite imagery to identify patterns in urban environments at a large scale
Title | Using convolutional networks and satellite imagery to identify patterns in urban environments at a large scale |
Authors | Adrian Albert, Jasleen Kaur, Marta Gonzalez |
Abstract | Urban planning applications (energy audits, investment, etc.) require an understanding of built infrastructure and its environment, i.e., both low-level, physical features (amount of vegetation, building area and geometry etc.), as well as higher-level concepts such as land use classes (which encode expert understanding of socio-economic end uses). This kind of data is expensive and labor-intensive to obtain, which limits its availability (particularly in developing countries). We analyze patterns in land use in urban neighborhoods using large-scale satellite imagery data (which is available worldwide from third-party providers) and state-of-the-art computer vision techniques based on deep convolutional neural networks. For supervision, given the limited availability of standard benchmarks for remote-sensing data, we obtain ground truth land use class labels carefully sampled from open-source surveys, in particular the Urban Atlas land classification dataset of $20$ land use classes across $\sim 300$ European cities. We use this data to train and compare deep architectures which have recently shown good performance on standard computer vision tasks (image classification and segmentation), including on geospatial data. Furthermore, we show that the deep representations extracted from satellite imagery of urban environments can be used to compare neighborhoods across several cities. We make our dataset available for other machine learning researchers to use for remote-sensing applications. |
Tasks | Image Classification |
Published | 2017-04-10 |
URL | http://arxiv.org/abs/1704.02965v2 |
PDF | http://arxiv.org/pdf/1704.02965v2.pdf |
PWC | https://paperswithcode.com/paper/using-convolutional-networks-and-satellite |
Repo | https://github.com/adrianalbert/urban-environments |
Framework | tf |
Improving Generalization Performance by Switching from Adam to SGD
Title | Improving Generalization Performance by Switching from Adam to SGD |
Authors | Nitish Shirish Keskar, Richard Socher |
Abstract | Despite superior training outcomes, adaptive optimization methods such as Adam, Adagrad or RMSprop have been found to generalize poorly compared to stochastic gradient descent (SGD). These methods tend to perform well in the initial portion of training but are outperformed by SGD at later stages of training. We investigate a hybrid strategy that begins training with an adaptive method and switches to SGD when appropriate. Concretely, we propose SWATS, a simple strategy which switches from Adam to SGD when a triggering condition is satisfied. The condition we propose relates to the projection of Adam steps on the gradient subspace. By design, the monitoring process for this condition adds very little overhead and does not increase the number of hyperparameters in the optimizer. We report experiments on several standard benchmarks such as ResNet, SENet, DenseNet and PyramidNet for the CIFAR-10 and CIFAR-100 data sets, ResNet on the tiny-ImageNet data set and language modeling with recurrent networks on the PTB and WT2 data sets. The results show that our strategy is capable of closing the generalization gap between SGD and Adam on a majority of the tasks. |
Tasks | Language Modelling |
Published | 2017-12-20 |
URL | http://arxiv.org/abs/1712.07628v1 |
PDF | http://arxiv.org/pdf/1712.07628v1.pdf |
PWC | https://paperswithcode.com/paper/improving-generalization-performance-by |
Repo | https://github.com/mhmdsabry/Sentiment_Analysis |
Framework | none |
Reinforcement Learning with a Corrupted Reward Channel
Title | Reinforcement Learning with a Corrupted Reward Channel |
Authors | Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg |
Abstract | No real-world reward function is perfect. Sensory errors and software bugs may result in RL agents observing higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP. Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agent’s optimisation, reward corruption can be partially managed under some assumptions. |
Tasks | |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08417v2 |
PDF | http://arxiv.org/pdf/1705.08417v2.pdf |
PWC | https://paperswithcode.com/paper/reinforcement-learning-with-a-corrupted |
Repo | https://github.com/jvmancuso/safe-grid-agents |
Framework | tf |
Deep learning for universal linear embeddings of nonlinear dynamics
Title | Deep learning for universal linear embeddings of nonlinear dynamics |
Authors | Bethany Lusch, J. Nathan Kutz, Steven L. Brunton |
Abstract | Identifying coordinate transformations that make strongly nonlinear dynamics approximately linear is a central challenge in modern dynamical systems. These transformations have the potential to enable prediction, estimation, and control of nonlinear systems using standard linear theory. The Koopman operator has emerged as a leading data-driven embedding, as eigenfunctions of this operator provide intrinsic coordinates that globally linearize the dynamics. However, identifying and representing these eigenfunctions has proven to be mathematically and computationally challenging. This work leverages the power of deep learning to discover representations of Koopman eigenfunctions from trajectory data of dynamical systems. Our network is parsimonious and interpretable by construction, embedding the dynamics on a low-dimensional manifold that is of the intrinsic rank of the dynamics and parameterized by the Koopman eigenfunctions. In particular, we identify nonlinear coordinates on which the dynamics are globally linear using a modified auto-encoder. We also generalize Koopman representations to include a ubiquitous class of systems that exhibit continuous spectra, ranging from the simple pendulum to nonlinear optics and broadband turbulence. Our framework parametrizes the continuous frequency using an auxiliary network, enabling a compact and efficient embedding at the intrinsic rank, while connecting our models to half a century of asymptotics. In this way, we benefit from the power and generality of deep learning, while retaining the physical interpretability of Koopman embeddings. |
Tasks | |
Published | 2017-12-27 |
URL | http://arxiv.org/abs/1712.09707v2 |
PDF | http://arxiv.org/pdf/1712.09707v2.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-universal-linear-embeddings |
Repo | https://github.com/BethanyL/DeepKoopman |
Framework | none |
Revisiting Simple Neural Networks for Learning Representations of Knowledge Graphs
Title | Revisiting Simple Neural Networks for Learning Representations of Knowledge Graphs |
Authors | Srinivas Ravishankar, Chandrahas, Partha Pratim Talukdar |
Abstract | We address the problem of learning vector representations for entities and relations in Knowledge Graphs (KGs) for Knowledge Base Completion (KBC). This problem has received significant attention in the past few years and multiple methods have been proposed. Most of the existing methods in the literature use a predefined characteristic scoring function for evaluating the correctness of KG triples. These scoring functions distinguish correct triples (high score) from incorrect ones (low score). However, their performance varies across different datasets. In this work, we demonstrate that a simple neural network based score function can consistently achieve near state-of-the-art performance on multiple datasets. We also quantitatively demonstrate biases in standard benchmark datasets, and highlight the need to perform evaluation spanning various datasets. |
Tasks | Knowledge Base Completion, Knowledge Graphs |
Published | 2017-11-15 |
URL | http://arxiv.org/abs/1711.05401v3 |
PDF | http://arxiv.org/pdf/1711.05401v3.pdf |
PWC | https://paperswithcode.com/paper/revisiting-simple-neural-networks-for |
Repo | https://github.com/Srinivas-R/AKBC-2017-Paper-14 |
Framework | tf |
Characterizing Political Fake News in Twitter by its Meta-Data
Title | Characterizing Political Fake News in Twitter by its Meta-Data |
Authors | Julio Amador, Axel Oehmichen, Miguel Molina-Solana |
Abstract | This article presents a preliminary approach towards characterizing political fake news on Twitter through the analysis of its meta-data. In particular, we focus on more than 1.5M tweets collected on the day of the election of Donald Trump as 45th president of the United States of America. We use the meta-data embedded within those tweets in order to look for differences between tweets containing fake news and tweets not containing it. Specifically, we perform our analysis only on tweets that went viral, by studying proxies for users’ exposure to the tweets, by characterizing accounts spreading fake news, and by looking at their polarization. We found significant differences in the distribution of followers, the number of URLs in tweets, and the verification of the users. |
Tasks | |
Published | 2017-12-16 |
URL | http://arxiv.org/abs/1712.05999v1 |
PDF | http://arxiv.org/pdf/1712.05999v1.pdf |
PWC | https://paperswithcode.com/paper/characterizing-political-fake-news-in-twitter |
Repo | https://github.com/MissLummie/ADAFake |
Framework | none |
From safe screening rules to working sets for faster Lasso-type solvers
Title | From safe screening rules to working sets for faster Lasso-type solvers |
Authors | Mathurin Massias, Alexandre Gramfort, Joseph Salmon |
Abstract | Convex sparsity-promoting regularizations are ubiquitous in modern statistical learning. By construction, they yield solutions with few non-zero coefficients, which correspond to saturated constraints in the dual optimization formulation. Working set (WS) strategies are generic optimization techniques that consist in solving simpler problems that only consider a subset of constraints, whose indices form the WS. Working set methods therefore involve two nested iterations: the outer loop corresponds to the definition of the WS and the inner loop calls a solver for the subproblems. For the Lasso estimator a WS is a set of features, while for a Group Lasso it refers to a set of groups. In practice, WS are generally small in this context so the associated feature Gram matrix can fit in memory. Here we show that the Gauss-Southwell rule (a greedy strategy for block coordinate descent techniques) leads to fast solvers in this case. Combined with a working set strategy based on an aggressive use of so-called Gap Safe screening rules, we propose a solver achieving state-of-the-art performance on sparse learning problems. Results are presented on Lasso and multi-task Lasso estimators. |
Tasks | Sparse Learning |
Published | 2017-03-21 |
URL | http://arxiv.org/abs/1703.07285v2 |
PDF | http://arxiv.org/pdf/1703.07285v2.pdf |
PWC | https://paperswithcode.com/paper/from-safe-screening-rules-to-working-sets-for |
Repo | https://github.com/mathurinm/A5G |
Framework | none |
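The two nested iterations described in the abstract above (an outer loop that defines the working set and an inner loop that solves the restricted subproblem) can be sketched for the Lasso as follows. This is a simplified illustration under stated assumptions, not the solver from the paper: the working set is grown by ranking features on the correlation `|x_j^T r|` rather than by Gap Safe screening rules, and the inner solver is plain cyclic coordinate descent rather than a Gauss-Southwell rule. The names `lasso_working_set`, `soft_threshold`, and the `grow` parameter are hypothetical.

```python
def soft_threshold(z, lam):
    """Proximal operator of lam * |.| applied to a scalar."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_working_set(X, y, lam, grow=2, max_outer=50, max_sweeps=1000, tol=1e-10):
    """Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 with a working-set loop."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    sq_norms = [sum(v * v for v in c) for c in cols]
    w = [0.0] * p
    r = list(y)  # residual y - X w, kept up to date after every update

    for _ in range(max_outer):
        # Outer loop: find inactive features violating |x_j^T r| <= lam.
        corr = [abs(sum(c[i] * r[i] for i in range(n))) for c in cols]
        violators = sorted(
            (j for j in range(p) if w[j] == 0.0 and corr[j] > lam + tol),
            key=lambda j: -corr[j],
        )
        if not violators:
            break  # optimality condition holds outside the support
        support = {j for j in range(p) if w[j] != 0.0}
        ws = sorted(support | set(violators[:grow]))

        # Inner loop: cyclic coordinate descent restricted to the working set.
        for _ in range(max_sweeps):
            max_change = 0.0
            for j in ws:
                if sq_norms[j] == 0.0:
                    continue
                old = w[j]
                rho = sum(cols[j][i] * r[i] for i in range(n)) + sq_norms[j] * old
                new = soft_threshold(rho, lam) / sq_norms[j]
                if new != old:
                    for i in range(n):
                        r[i] -= (new - old) * cols[j][i]
                    w[j] = new
                    max_change = max(max_change, abs(new - old))
            if max_change <= tol:
                break
    return w


# With an identity design the Lasso solution is the soft-thresholded target.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(lasso_working_set(X, [3.0, 0.5, -2.0], lam=1.0))
```

Only the features in the working set are touched by the inner solver, which is what keeps each subproblem small when the solution is sparse.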
A Large Dimensional Analysis of Least Squares Support Vector Machines
Title | A Large Dimensional Analysis of Least Squares Support Vector Machines |
Authors | Zhenyu Liao, Romain Couillet |
Abstract | In this article, a large dimensional performance analysis of kernel least squares support vector machines (LS-SVMs) is provided under the assumption of a two-class Gaussian mixture model for the input data. Building upon recent advances in random matrix theory, we show, when the dimension of data $p$ and their number $n$ are both large, that the LS-SVM decision function can be well approximated by a normally distributed random variable, the mean and variance of which depend explicitly on a local behavior of the kernel function. This theoretical result is then applied to the MNIST and Fashion-MNIST datasets which, despite their non-Gaussianity, exhibit a convincingly close behavior. Most importantly, our analysis provides a deeper understanding of the mechanism at play in SVM-type methods and in particular of the impact of the choice of the kernel function as well as some of their theoretical limits in separating high dimensional Gaussian vectors. |
Tasks | |
Published | 2017-01-11 |
URL | http://arxiv.org/abs/1701.02967v2 |
PDF | http://arxiv.org/pdf/1701.02967v2.pdf |
PWC | https://paperswithcode.com/paper/a-large-dimensional-analysis-of-least-squares |
Repo | https://github.com/Zhenyu-LIAO/RMT4LSSVM |
Framework | none |