Paper Group AWR 139
Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions
Title | Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions |
Authors | Matthew MacKay, Paul Vicol, Jon Lorraine, David Duvenaud, Roger Grosse |
Abstract | Hyperparameter optimization can be formulated as a bilevel optimization problem, where the optimal parameters on the training set depend on the hyperparameters. We aim to adapt regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases. We show how to construct scalable best-response approximations for neural networks by modeling the best-response as a single network whose hidden units are gated conditionally on the regularizer. We justify this approximation by showing the exact best-response for a shallow linear network with L2-regularized Jacobian can be represented by a similar gating mechanism. We fit this model using a gradient-based hyperparameter optimization algorithm which alternates between approximating the best-response around the current hyperparameters and optimizing the hyperparameters using the approximate best-response function. Unlike other gradient-based approaches, we do not require differentiating the training loss with respect to the hyperparameters, allowing us to tune discrete hyperparameters, data augmentation hyperparameters, and dropout probabilities. Because the hyperparameters are adapted online, our approach discovers hyperparameter schedules that can outperform fixed hyperparameter values. Empirically, our approach outperforms competing hyperparameter optimization methods on large-scale deep learning problems. We call our networks, which update their own hyperparameters online during training, Self-Tuning Networks (STNs). |
Tasks | bilevel optimization, Data Augmentation, Hyperparameter Optimization |
Published | 2019-03-07 |
URL | http://arxiv.org/abs/1903.03088v1 |
PDF | http://arxiv.org/pdf/1903.03088v1.pdf |
PWC | https://paperswithcode.com/paper/self-tuning-networks-bilevel-optimization-of |
Repo | https://github.com/lessw2020/auto-adaptive-ai |
Framework | pytorch |
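The bilevel formulation mentioned in the abstract can be written out as follows. This is a sketch in generic notation, not copied from the paper: the symbols $\lambda$ (hyperparameters), $w$ (weights), $\mathcal{L}_T$ (training loss) and $\mathcal{L}_V$ (validation loss) are assumed names.

```latex
\lambda^{*} = \arg\min_{\lambda} \; \mathcal{L}_{V}\bigl(\lambda, w^{*}(\lambda)\bigr)
\quad \text{subject to} \quad
w^{*}(\lambda) = \arg\min_{w} \; \mathcal{L}_{T}(\lambda, w)
```

Here $w^{*}(\lambda)$ is the best-response function. As the abstract describes it, the method fits a compact approximation to $w^{*}$ around the current $\lambda$ and alternates between updating that approximation and updating $\lambda$.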
CUDA-Self-Organizing feature map based visual sentiment analysis of bank customer complaints for Analytical CRM
Title | CUDA-Self-Organizing feature map based visual sentiment analysis of bank customer complaints for Analytical CRM |
Authors | Rohit Gavval, Vadlamani Ravi, Kalavala Revanth Harshal, Akhilesh Gangwar, Kumar Ravi |
Abstract | With the widespread use of social media, companies now have access to a wealth of customer feedback data which has valuable applications to Customer Relationship Management (CRM). Analyzing customer grievance data is paramount, as failing to redress grievances speedily would lead to customer churn, resulting in lower profitability. In this paper, we propose a descriptive analytics framework using a Self-organizing feature map (SOM) for Visual Sentiment Analysis of customer complaints. The network learns the inherent grouping of the complaints automatically, which can then be visualized using various techniques. Analytical Customer Relationship Management (ACRM) executives can draw useful business insights from the maps and take timely remedial action. We also propose a high-performance version of the algorithm, CUDASOM (CUDA based Self Organizing feature Map), implemented using NVIDIA's parallel computing platform, CUDA, which speeds up the processing of high-dimensional text data and generates fast results. The efficacy of the proposed model has been demonstrated on the customer complaints data regarding the products and services of four leading Indian banks. CUDASOM achieved an average speed up of 44 times. Our approach can expand research into intelligent grievance redressal systems to provide rapid solutions to the complaining customers. |
Tasks | Sentiment Analysis |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09598v1 |
PDF | https://arxiv.org/pdf/1905.09598v1.pdf |
PWC | https://paperswithcode.com/paper/cuda-self-organizing-feature-map-based-visual |
Repo | https://github.com/kravi2018/ffca_sentiment_analysis |
Framework | none |
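For readers unfamiliar with self-organizing maps, the following is a minimal CPU sketch of one online SOM update on a 1-D grid of units. It is not the CUDASOM implementation. The function names, the Gaussian neighbourhood, and the values of `lr` and `sigma` are illustrative assumptions.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    def sqdist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda i: sqdist(weights[i]))

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One online SOM update on a 1-D grid of units; returns the BMU index."""
    bmu = best_matching_unit(weights, x)
    for i, w in enumerate(weights):
        # Gaussian neighbourhood: units near the BMU on the grid move more.
        h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
        for d in range(len(w)):
            w[d] += lr * h * (x[d] - w[d])
    return bmu

random.seed(0)
weights = [[random.random(), random.random()] for _ in range(5)]
x = [0.9, 0.1]  # hypothetical 2-D feature vector for one complaint

before = best_matching_unit(weights, x)
dist_before = sum((w - xi) ** 2 for w, xi in zip(weights[before], x))
bmu = som_step(weights, x)
dist_after = sum((w - xi) ** 2 for w, xi in zip(weights[bmu], x))
```

Repeating `som_step` over many inputs while shrinking `lr` and `sigma` is the usual training loop. Each step's distance computations are independent across units, which is presumably what makes the algorithm a good fit for a GPU.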
Graph Transformer for Graph-to-Sequence Learning
Title | Graph Transformer for Graph-to-Sequence Learning |
Authors | Deng Cai, Wai Lam |
Abstract | The dominant graph-to-sequence transduction models employ graph neural networks for graph representation learning, where the structural information is reflected by the receptive field of neurons. Unlike graph neural networks that restrict the information exchange between immediate neighborhood, we propose a new model, known as Graph Transformer, that uses explicit relation encoding and allows direct communication between two distant nodes. It provides a more efficient way for global graph structure modeling. Experiments on the applications of text generation from Abstract Meaning Representation (AMR) and syntax-based neural machine translation show the superiority of our proposed model. Specifically, our model achieves 27.4 BLEU on LDC2015E86 and 29.7 BLEU on LDC2017T10 for AMR-to-text generation, outperforming the state-of-the-art results by up to 2.2 points. On the syntax-based translation tasks, our model establishes new single-model state-of-the-art BLEU scores, 21.3 for English-to-German and 14.1 for English-to-Czech, improving over the existing best results, including ensembles, by over 1 BLEU. |
Tasks | Graph Representation Learning, Graph-to-Sequence, Machine Translation, Representation Learning, Text Generation |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07470v2 |
PDF | https://arxiv.org/pdf/1911.07470v2.pdf |
PWC | https://paperswithcode.com/paper/graph-transformer-for-graph-to-sequence |
Repo | https://github.com/jcyk/gtos |
Framework | pytorch |
Market Trend Prediction using Sentiment Analysis: Lessons Learned and Paths Forward
Title | Market Trend Prediction using Sentiment Analysis: Lessons Learned and Paths Forward |
Authors | Andrius Mudinas, Dell Zhang, Mark Levene |
Abstract | Financial market forecasting is one of the most attractive practical applications of sentiment analysis. In this paper, we investigate the potential of using sentiment attitudes (positive vs negative) and also sentiment emotions (joy, sadness, etc.) extracted from financial news or tweets to help predict stock price movements. Our extensive experiments using the Granger-causality test have revealed that (i) in general sentiment attitudes do not seem to Granger-cause stock price changes; and (ii) while on some specific occasions sentiment emotions do seem to Granger-cause stock price changes, the exhibited pattern is not universal and must be looked at on a case by case basis. Furthermore, it has been observed that at least for certain stocks, integrating sentiment emotions as additional features into the machine learning based market trend prediction model could improve its accuracy. |
Tasks | Sentiment Analysis |
Published | 2019-03-13 |
URL | http://arxiv.org/abs/1903.05440v1 |
PDF | http://arxiv.org/pdf/1903.05440v1.pdf |
PWC | https://paperswithcode.com/paper/market-trend-prediction-using-sentiment |
Repo | https://github.com/AndMu/Market-Wisdom |
Framework | none |
Quantitative Error Prediction of Medical Image Registration using Regression Forests
Title | Quantitative Error Prediction of Medical Image Registration using Regression Forests |
Authors | Hessam Sokooti, Gorkem Saygili, Ben Glocker, Boudewijn P. F. Lelieveldt, Marius Staring |
Abstract | Predicting registration error can be useful for evaluation of registration procedures, which is important for the adoption of registration techniques in the clinic. In addition, quantitative error prediction can be helpful in improving the registration quality. The task of predicting registration error is demanding due to the lack of a ground truth in medical images. This paper proposes a new automatic method to predict the registration error in a quantitative manner, and is applied to chest CT scans. A random regression forest is utilized to predict the registration error locally. The forest is built with features related to the transformation model and features related to the dissimilarity after registration. The forest is trained and tested using manually annotated corresponding points between pairs of chest CT scans in two experiments: SPREAD (trained and tested on SPREAD) and inter-database (including three databases SPREAD, DIR-Lab-4DCT and DIR-Lab-COPDgene). The results show that the mean absolute errors of regression are 1.07 ± 1.86 and 1.76 ± 2.59 mm for the SPREAD and inter-database experiment, respectively. The overall accuracy of classification in three classes (correct, poor and wrong registration) is 90.7% and 75.4%, for SPREAD and inter-database respectively. The good performance of the proposed method enables important applications such as automatic quality control in large-scale image analysis. |
Tasks | Image Registration, Medical Image Registration |
Published | 2019-05-18 |
URL | https://arxiv.org/abs/1905.07624v1 |
PDF | https://arxiv.org/pdf/1905.07624v1.pdf |
PWC | https://paperswithcode.com/paper/quantitative-error-prediction-of-medical |
Repo | https://github.com/hsokooti/regun |
Framework | none |
DeepAtlas: Joint Semi-Supervised Learning of Image Registration and Segmentation
Title | DeepAtlas: Joint Semi-Supervised Learning of Image Registration and Segmentation |
Authors | Zhenlin Xu, Marc Niethammer |
Abstract | Deep convolutional neural networks (CNNs) are state-of-the-art for semantic image segmentation, but typically require many labeled training samples. Obtaining 3D segmentations of medical images for supervised training is difficult and labor intensive. Motivated by classical approaches for joint segmentation and registration we therefore propose a deep learning framework that jointly learns networks for image registration and image segmentation. In contrast to previous work on deep unsupervised image registration, which showed the benefit of weak supervision via image segmentations, our approach can use existing segmentations when available and computes them via the segmentation network otherwise, thereby providing the same registration benefit. Conversely, segmentation network training benefits from the registration, which essentially provides a realistic form of data augmentation. Experiments on knee and brain 3D magnetic resonance (MR) images show that our approach achieves large simultaneous improvements of segmentation and registration accuracy (over independently trained networks) and allows training high-quality models with very limited training data. Specifically, in a one-shot-scenario (with only one manually labeled image) our approach increases Dice scores (%) over an unsupervised registration network by 2.7 and 1.8 on the knee and brain images respectively. |
Tasks | Data Augmentation, Image Registration, Semantic Segmentation |
Published | 2019-04-17 |
URL | https://arxiv.org/abs/1904.08465v2 |
PDF | https://arxiv.org/pdf/1904.08465v2.pdf |
PWC | https://paperswithcode.com/paper/deepatlas-joint-semi-supervised-learning-of |
Repo | https://github.com/uncbiag/DeepAtlas |
Framework | none |
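The Dice score quoted in the abstract is the standard overlap measure between a predicted and a reference segmentation. Below is a minimal sketch for binary masks stored as flat 0/1 lists. The function name and the convention of returning 1.0 for two empty masks are assumptions, not taken from the DeepAtlas code.

```python
def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 lists."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

pred   = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
```

Multiplying the result by 100 gives the percentage scale used in the abstract.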
Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
Title | Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer |
Authors | René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun |
Abstract | The success of monocular depth estimation relies on large and diverse training sets. Due to the challenges associated with acquiring dense ground-truth depth across different environments at scale, a number of datasets with distinct characteristics and biases have emerged. We develop tools that enable mixing multiple datasets during training, even if their annotations are incompatible. In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks. Armed with these tools, we experiment with five diverse training datasets, including a new, massive data source: 3D films. To demonstrate the generalization power of our approach we use zero-shot cross-dataset transfer, i.e. we evaluate on datasets that were not seen during training. The experiments confirm that mixing data from complementary sources greatly improves monocular depth estimation. Our approach clearly outperforms competing methods across diverse datasets, setting a new state of the art for monocular depth estimation. Some results are shown in the supplementary video at https://youtu.be/D46FzVyL9I8 |
Tasks | Depth Estimation, Monocular Depth Estimation |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01341v2 |
PDF | https://arxiv.org/pdf/1907.01341v2.pdf |
PWC | https://paperswithcode.com/paper/towards-robust-monocular-depth-estimation |
Repo | https://github.com/intel-isl/MiDaS |
Framework | pytorch |
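One way to make an error measure invariant to depth range and scale is to align the prediction to the target with a least-squares scale and shift before comparing them. The sketch below illustrates that idea only. The abstract does not give the exact objective, so the closed-form alignment, the mean-absolute-error comparison, and all function names are assumptions rather than the MiDaS loss.

```python
def align_scale_shift(pred, target):
    """Closed-form least squares for s, t minimising sum((s*pred + t - target)^2)."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    s = cov / var_p if var_p > 0 else 0.0
    t = mean_t - s * mean_p
    return s, t

def scale_shift_invariant_mae(pred, target):
    """Mean absolute error after aligning pred to target with scale and shift."""
    s, t = align_scale_shift(pred, target)
    return sum(abs(s * p + t - y) for p, y in zip(pred, target)) / len(pred)

target = [1.0, 2.0, 3.0, 4.0]
pred_affine = [3.0 * y + 5.0 for y in target]   # same structure, different scale/shift
loss_affine = scale_shift_invariant_mae(pred_affine, target)

pred_noisy = [1.2, 1.9, 3.3, 3.8]
loss_a = scale_shift_invariant_mae(pred_noisy, target)
loss_b = scale_shift_invariant_mae([2.0 * p + 7.0 for p in pred_noisy], target)
```

Because the alignment absorbs any affine change of the prediction, `loss_a` and `loss_b` agree up to rounding. This is the property that lets datasets with incompatible depth annotations share one objective.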
Feature-Based Image Clustering and Segmentation Using Wavelets
Title | Feature-Based Image Clustering and Segmentation Using Wavelets |
Authors | Junyu Chen, Eric C. Frey |
Abstract | Pixel intensity is a widely used feature for clustering and segmentation algorithms, but segmentation using only intensity values might suffer from noise and a lack of spatial context information. The Wavelet transform is often used for image denoising and classification. We proposed a novel method to incorporate Wavelet features in segmentation and clustering algorithms. The conventional K-means, Fuzzy c-means (FCM), and Active contour without edges (ACWE) algorithms were modified to accommodate Wavelet features, leading to robust clustering/segmentation algorithms. A weighting parameter to control the weight of low-frequency sub-band information was also introduced. The new algorithms showed the capability to converge to different segmentation results based on the frequency information derived from the Wavelet sub-bands. |
Tasks | Denoising, Image Clustering, Image Denoising |
Published | 2019-07-05 |
URL | https://arxiv.org/abs/1907.03591v1 |
PDF | https://arxiv.org/pdf/1907.03591v1.pdf |
PWC | https://paperswithcode.com/paper/feature-based-image-clustering-and |
Repo | https://github.com/junyuchen245/Active-Contour-Wavelet-Seg |
Framework | none |
Deep Prediction of Investor Interest: a Supervised Clustering Approach
Title | Deep Prediction of Investor Interest: a Supervised Clustering Approach |
Authors | Baptiste Barreau, Laurent Carlier, Damien Challet |
Abstract | We propose a novel deep learning architecture suitable for the prediction of investor interest for a given asset in a given time frame. This architecture performs both investor clustering and modelling at the same time. We first verify its superior performance on a synthetic scenario inspired by real data and then apply it to two real-world databases, a publicly available dataset about the position of investors in Spanish stock market and proprietary data from BNP Paribas Corporate and Institutional Banking. |
Tasks | |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.05289v2 |
PDF | https://arxiv.org/pdf/1909.05289v2.pdf |
PWC | https://paperswithcode.com/paper/deep-prediction-of-investor-interest-a |
Repo | https://github.com/BptBrr/deep_prediction |
Framework | tf |
How to Evaluate Machine Learning Approaches for Combinatorial Optimization: Application to the Travelling Salesman Problem
Title | How to Evaluate Machine Learning Approaches for Combinatorial Optimization: Application to the Travelling Salesman Problem |
Authors | Antoine François, Quentin Cappart, Louis-Martin Rousseau |
Abstract | Combinatorial optimization is the field devoted to the study and practice of algorithms that solve NP-hard problems. As Machine Learning (ML) and deep learning have grown in popularity, several research groups have started to use ML to solve combinatorial optimization problems, such as the well-known Travelling Salesman Problem (TSP). Based on deep (reinforcement) learning, new models and architectures for the TSP have been successively developed and have achieved increasing performance. At the time of writing, state-of-the-art models provide solutions to TSP instances of 100 cities that are roughly 1.33% away from optimal solutions. However, despite these apparently positive results, the performance remains far from what can be achieved using a specialized search procedure. In this paper, we address the limitations of ML approaches for solving the TSP and investigate two fundamental questions: (1) how can we measure the level of accuracy of the pure ML component of such methods; and (2) what is the impact of a search procedure plugged inside an ML model on the performance? To answer these questions, we propose a new metric, ratio of optimal decisions (ROD), based on a fair comparison with a parametrized oracle, mimicking an ML model with a controlled accuracy. All the experiments are carried out on four state-of-the-art ML approaches dedicated to solving the TSP. Finally, we made ROD open-source in order to ease future research in the field. |
Tasks | Combinatorial Optimization |
Published | 2019-09-28 |
URL | https://arxiv.org/abs/1909.13121v1 |
PDF | https://arxiv.org/pdf/1909.13121v1.pdf |
PWC | https://paperswithcode.com/paper/how-to-evaluate-machine-learning-approaches |
Repo | https://github.com/qcappart/ROD_oracle |
Framework | none |
Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past
Title | Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past |
Authors | Che Wang, Keith Ross |
Abstract | Soft Actor-Critic (SAC) is an off-policy actor-critic deep reinforcement learning (DRL) algorithm based on maximum entropy reinforcement learning. By combining off-policy updates with an actor-critic formulation, SAC achieves state-of-the-art performance on a range of continuous-action benchmark tasks, outperforming prior on-policy and off-policy methods. The off-policy method employed by SAC samples data uniformly from past experience when performing parameter updates. We propose Emphasizing Recent Experience (ERE), a simple but powerful off-policy sampling technique, which emphasizes recently observed data while not forgetting the past. The ERE algorithm samples more aggressively from recent experience, and also orders the updates to ensure that updates from old data do not overwrite updates from new data. We compare vanilla SAC and SAC+ERE, and show that ERE is more sample efficient than vanilla SAC for continuous-action Mujoco tasks. We also consider combining SAC with Priority Experience Replay (PER), a scheme originally proposed for deep Q-learning which prioritizes the data based on temporal-difference (TD) error. We show that SAC+PER can marginally improve the sample efficiency performance of SAC, but much less so than SAC+ERE. Finally, we propose an algorithm which integrates ERE and PER and show that this hybrid algorithm can give the best results for some of the Mujoco tasks. |
Tasks | Q-Learning |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04009v1 |
PDF | https://arxiv.org/pdf/1906.04009v1.pdf |
PWC | https://paperswithcode.com/paper/boosting-soft-actor-critic-emphasizing-recent |
Repo | https://github.com/BY571/Soft-Actor-Critic-and-Extensions |
Framework | pytorch |
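The sampling idea in the abstract, drawing more aggressively from recent data and ordering updates so that later ones use newer data, can be sketched as a sampling window that shrinks over a sequence of updates. The abstract does not state the schedule. The geometric decay below, the parameter names `eta` and `c_min`, and their default values are assumptions for illustration only.

```python
import random

def ere_window_sizes(buffer_size, num_updates, eta=0.996, c_min=5000):
    """Window size c_k for update k = 1..K: shrinks geometrically, never below c_min."""
    sizes = []
    for k in range(1, num_updates + 1):
        c_k = int(buffer_size * eta ** (k * 1000 / num_updates))
        sizes.append(min(buffer_size, max(c_k, c_min)))
    return sizes

def sample_recent(buffer, window, batch_size, rng=random):
    """Uniformly sample a mini-batch from the `window` most recent transitions."""
    recent = buffer[-window:]
    return [rng.choice(recent) for _ in range(batch_size)]

buffer = list(range(100_000))   # stand-in for stored transitions, oldest first
sizes = ere_window_sizes(len(buffer), num_updates=200)
rng = random.Random(0)
batches = [sample_recent(buffer, c, 4, rng) for c in sizes]
```

Early updates in the sequence see almost the whole buffer, while the last ones see only the newest `c_min` transitions. That ordering is how updates from old data avoid overwriting updates from new data.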
Blaze: Simplified High Performance Cluster Computing
Title | Blaze: Simplified High Performance Cluster Computing |
Authors | Junhao Li, Hang Zhang |
Abstract | MapReduce and its variants have significantly simplified and accelerated the process of developing parallel programs. However, most MapReduce implementations focus on data-intensive tasks while many real-world tasks are compute intensive and their data can fit distributedly into the memory. For these tasks, the speed of MapReduce programs can be much slower than those hand-optimized ones. We present Blaze, a C++ library that makes it easy to develop high performance parallel programs for such compute intensive tasks. At the core of Blaze is a highly-optimized in-memory MapReduce function, which has three main improvements over conventional MapReduce implementations: eager reduction, fast serialization, and special treatment for a small fixed key range. We also offer additional conveniences that make developing parallel programs similar to developing serial programs. These improvements make Blaze an easy-to-use cluster computing library that approaches the speed of hand-optimized parallel code. We apply Blaze to some common data mining tasks, including word frequency count, PageRank, k-means, expectation maximization (Gaussian mixture model), and k-nearest neighbors. Blaze outperforms Apache Spark by more than 10 times on average for these tasks, and the speed of Blaze scales almost linearly with the number of nodes. In addition, Blaze uses only the MapReduce function and 3 utility functions in its implementation while Spark uses almost 30 different parallel primitives in its official implementation. |
Tasks | |
Published | 2019-02-04 |
URL | http://arxiv.org/abs/1902.01437v2 |
PDF | http://arxiv.org/pdf/1902.01437v2.pdf |
PWC | https://paperswithcode.com/paper/blaze-simplified-high-performance-cluster |
Repo | https://github.com/junhao12131/blaze |
Framework | none |
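Eager reduction, one of the three improvements the abstract lists, can be illustrated with a single-process word count. Blaze itself is a C++ library. The Python sketch below shows only the idea of folding each emitted value into its key immediately rather than buffering all values first, and the function names are hypothetical, not Blaze's API.

```python
def mapreduce_eager(items, mapper, reducer):
    """In-memory MapReduce that folds each emitted value into its key immediately."""
    result = {}
    for item in items:
        for key, value in mapper(item):
            # Eager reduction: combine now instead of buffering all values per key.
            result[key] = reducer(result[key], value) if key in result else value
    return result

def word_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in the line."""
    for word in line.split():
        yield word.lower(), 1

lines = ["the quick brown fox", "the lazy dog", "The end"]
counts = mapreduce_eager(lines, word_mapper, lambda a, b: a + b)
```

Memory use stays proportional to the number of distinct keys rather than the number of emitted pairs. This matters for the compute-intensive, in-memory workloads the abstract targets.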
Data driven approximation of parametrized PDEs by Reduced Basis and Neural Networks
Title | Data driven approximation of parametrized PDEs by Reduced Basis and Neural Networks |
Authors | Niccolò Dal Santo, Simone Deparis, Luca Pegolotti |
Abstract | We are interested in the approximation of partial differential equations with a data-driven approach based on the reduced basis method and machine learning. We suppose that the phenomenon of interest can be modeled by a parametrized partial differential equation, but that the value of the physical parameters is unknown or difficult to measure directly. Our method allows estimating fields of interest, for instance temperature of a sample of material or velocity of a fluid, given data at a handful of points in the domain. We propose to accomplish this task with a neural network embedding a reduced basis solver as an exotic activation function in the last layer. The reduced basis solver accounts for the underlying physical phenomenon and it is constructed from snapshots obtained from randomly selected values of the physical parameters during an expensive offline phase. The same full order solutions are then employed for the training of the neural network. As a matter of fact, the chosen architecture resembles an asymmetric autoencoder in which the decoder is the reduced basis solver and as such it does not contain trainable parameters. The resulting latent space of our autoencoder includes parameter-dependent quantities feeding the reduced basis solver, which – depending on the considered partial differential equation – are the values of the physical parameters themselves or the affine decomposition coefficients of the differential operators. |
Tasks | Network Embedding |
Published | 2019-04-02 |
URL | https://arxiv.org/abs/1904.01514v2 |
PDF | https://arxiv.org/pdf/1904.01514v2.pdf |
PWC | https://paperswithcode.com/paper/data-driven-approximation-of-parametrized |
Repo | https://github.com/ndalsanto/PDE-DNN |
Framework | tf |
Global Vectors for Node Representations
Title | Global Vectors for Node Representations |
Authors | Robin Brochier, Adrien Guille, Julien Velcin |
Abstract | Most network embedding algorithms consist in measuring co-occurrences of nodes via random walks then learning the embeddings using Skip-Gram with Negative Sampling. While it has proven to be a relevant choice, there are alternatives, such as GloVe, which has not been investigated yet for network embedding. Even though SGNS better handles non co-occurrence than GloVe, it has a worse time-complexity. In this paper, we propose a matrix factorization approach for network embedding, inspired by GloVe, that better handles non co-occurrence with a competitive time-complexity. We also show how to extend this model to deal with networks where nodes are documents, by simultaneously learning word, node and document representations. Quantitative evaluations show that our model achieves state-of-the-art performance, while not being so sensitive to the choice of hyper-parameters. Qualitatively speaking, we show how our model helps exploring a network of documents by generating complementary network-oriented and content-oriented keywords. |
Tasks | Network Embedding |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.11004v1 |
PDF | http://arxiv.org/pdf/1902.11004v1.pdf |
PWC | https://paperswithcode.com/paper/global-vectors-for-node-representations |
Repo | https://github.com/brochier/gvnr |
Framework | tf |
Monte Carlo Gradient Estimation in Machine Learning
Title | Monte Carlo Gradient Estimation in Machine Learning |
Authors | Shakir Mohamed, Mihaela Rosca, Michael Figurnov, Andriy Mnih |
Abstract | This paper is a broad and accessible survey of the methods we have at our disposal for Monte Carlo gradient estimation in machine learning and across the statistical sciences: the problem of computing the gradient of an expectation of a function with respect to parameters defining the distribution that is integrated; the problem of sensitivity analysis. In machine learning research, this gradient problem lies at the core of many learning problems, in supervised, unsupervised and reinforcement learning. We will generally seek to rewrite such gradients in a form that allows for Monte Carlo estimation, allowing them to be easily and efficiently used and analysed. We explore three strategies–the pathwise, score function, and measure-valued gradient estimators–exploring their historical developments, derivation, and underlying assumptions. We describe their use in other fields, show how they are related and can be combined, and expand on their possible generalisations. Wherever Monte Carlo gradient estimators have been derived and deployed in the past, important advances have followed. A deeper and more widely-held understanding of this problem will lead to further advances, and it is these advances that we wish to support. |
Tasks | |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10652v1 |
PDF | https://arxiv.org/pdf/1906.10652v1.pdf |
PWC | https://paperswithcode.com/paper/monte-carlo-gradient-estimation-in-machine |
Repo | https://github.com/deepmind/mc_gradients |
Framework | none |
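Two of the three estimator families named in the abstract, pathwise and score function, can be compared on a toy problem: the gradient of E[x²] with respect to the mean μ of x ~ N(μ, 1), whose exact value is 2μ. This is a minimal sketch under those assumptions. The test function, sample sizes, and function names are illustrative and not taken from the paper or its repository.

```python
import random

def f(x):
    return x * x

def pathwise_grad(mu, n, rng):
    """Reparameterise x = mu + eps, eps ~ N(0, 1); average f'(x) = 2x."""
    return sum(2.0 * (mu + rng.gauss(0.0, 1.0)) for _ in range(n)) / n

def score_function_grad(mu, n, rng):
    """Average f(x) * d/dmu log N(x; mu, 1) = f(x) * (x - mu)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, 1.0)
        total += f(x) * (x - mu)
    return total / n

rng = random.Random(0)
mu = 1.5
g_path = pathwise_grad(mu, 100_000, rng)
g_score = score_function_grad(mu, 100_000, rng)
exact = 2.0 * mu  # d/dmu E[x^2] = d/dmu (mu^2 + 1)
```

Both estimators are unbiased here, but the score-function estimate has noticeably higher variance for the same sample count. The pathwise estimator requires f to be differentiable, whereas the score-function estimator does not.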