February 1, 2020

Paper Group AWR 139

Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions


Title	Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions
Authors	Matthew MacKay, Paul Vicol, Jon Lorraine, David Duvenaud, Roger Grosse
Abstract	Hyperparameter optimization can be formulated as a bilevel optimization problem, where the optimal parameters on the training set depend on the hyperparameters. We aim to adapt regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases. We show how to construct scalable best-response approximations for neural networks by modeling the best-response as a single network whose hidden units are gated conditionally on the regularizer. We justify this approximation by showing the exact best-response for a shallow linear network with L2-regularized Jacobian can be represented by a similar gating mechanism. We fit this model using a gradient-based hyperparameter optimization algorithm which alternates between approximating the best-response around the current hyperparameters and optimizing the hyperparameters using the approximate best-response function. Unlike other gradient-based approaches, we do not require differentiating the training loss with respect to the hyperparameters, allowing us to tune discrete hyperparameters, data augmentation hyperparameters, and dropout probabilities. Because the hyperparameters are adapted online, our approach discovers hyperparameter schedules that can outperform fixed hyperparameter values. Empirically, our approach outperforms competing hyperparameter optimization methods on large-scale deep learning problems. We call our networks, which update their own hyperparameters online during training, Self-Tuning Networks (STNs).
Tasks	bilevel optimization, Data Augmentation, Hyperparameter Optimization
Published	2019-03-07
URL	http://arxiv.org/abs/1903.03088v1
PDF	http://arxiv.org/pdf/1903.03088v1.pdf
PWC	https://paperswithcode.com/paper/self-tuning-networks-bilevel-optimization-of
Repo	https://github.com/lessw2020/auto-adaptive-ai
Framework	pytorch
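
The bilevel formulation mentioned in the abstract can be written out as below. The notation is an assumption chosen here for illustration and is not taken from the paper: \lambda for hyperparameters, w for weights and biases, \mathcal{L}_T for the training loss, \mathcal{L}_V for an outer (typically validation) loss, and \hat{w}_{\phi} for the learned best-response approximation.

```latex
% Outer problem: choose hyperparameters given the best-response weights.
\lambda^{*} = \arg\min_{\lambda} \; \mathcal{L}_{V}\bigl(\lambda, w^{*}(\lambda)\bigr)
\quad \text{subject to} \quad
w^{*}(\lambda) = \arg\min_{w} \; \mathcal{L}_{T}(\lambda, w)

% Approximation: replace the exact best response with a parametric model.
w^{*}(\lambda) \approx \hat{w}_{\phi}(\lambda)
```

Under this reading, training alternates between updating \phi so that \hat{w}_{\phi} tracks the best response near the current \lambda, and updating \lambda through \mathcal{L}_{V}(\lambda, \hat{w}_{\phi}(\lambda)).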

CUDA-Self-Organizing feature map based visual sentiment analysis of bank customer complaints for Analytical CRM


Title	CUDA-Self-Organizing feature map based visual sentiment analysis of bank customer complaints for Analytical CRM
Authors	Rohit Gavval, Vadlamani Ravi, Kalavala Revanth Harshal, Akhilesh Gangwar, Kumar Ravi
Abstract	With the widespread use of social media, companies now have access to a wealth of customer feedback data which has valuable applications to Customer Relationship Management (CRM). Analyzing customer grievance data is paramount, as failing to redress grievances quickly leads to customer churn and lower profitability. In this paper, we propose a descriptive analytics framework using Self-organizing feature map (SOM), for Visual Sentiment Analysis of customer complaints. The network learns the inherent grouping of the complaints automatically, which can then be visualized using various techniques. Analytical Customer Relationship Management (ACRM) executives can draw useful business insights from the maps and take timely remedial action. We also propose a high-performance version of the algorithm, CUDASOM (CUDA based Self Organizing feature Map), implemented using NVIDIA's parallel computing platform, CUDA, which speeds up the processing of high-dimensional text data and generates fast results. The efficacy of the proposed model has been demonstrated on the customer complaints data regarding the products and services of four leading Indian banks. CUDASOM achieved an average speed up of 44 times. Our approach can expand research into intelligent grievance redressal systems that provide rapid solutions to the complaining customers.
Tasks	Sentiment Analysis
Published	2019-05-23
URL	https://arxiv.org/abs/1905.09598v1
PDF	https://arxiv.org/pdf/1905.09598v1.pdf
PWC	https://paperswithcode.com/paper/cuda-self-organizing-feature-map-based-visual
Repo	https://github.com/kravi2018/ffca_sentiment_analysis
Framework	none

Graph Transformer for Graph-to-Sequence Learning


Title	Graph Transformer for Graph-to-Sequence Learning
Authors	Deng Cai, Wai Lam
Abstract	The dominant graph-to-sequence transduction models employ graph neural networks for graph representation learning, where the structural information is reflected by the receptive field of neurons. Unlike graph neural networks that restrict information exchange to immediate neighborhoods, we propose a new model, known as Graph Transformer, that uses explicit relation encoding and allows direct communication between two distant nodes. It provides a more efficient way for global graph structure modeling. Experiments on the applications of text generation from Abstract Meaning Representation (AMR) and syntax-based neural machine translation show the superiority of our proposed model. Specifically, our model achieves 27.4 BLEU on LDC2015E86 and 29.7 BLEU on LDC2017T10 for AMR-to-text generation, outperforming the state-of-the-art results by up to 2.2 points. On the syntax-based translation tasks, our model establishes new single-model state-of-the-art BLEU scores, 21.3 for English-to-German and 14.1 for English-to-Czech, improving over the existing best results, including ensembles, by over 1 BLEU.
Tasks	Graph Representation Learning, Graph-to-Sequence, Machine Translation, Representation Learning, Text Generation
Published	2019-11-18
URL	https://arxiv.org/abs/1911.07470v2
PDF	https://arxiv.org/pdf/1911.07470v2.pdf
PWC	https://paperswithcode.com/paper/graph-transformer-for-graph-to-sequence
Repo	https://github.com/jcyk/gtos
Framework	pytorch

Market Trend Prediction using Sentiment Analysis: Lessons Learned and Paths Forward


Title	Market Trend Prediction using Sentiment Analysis: Lessons Learned and Paths Forward
Authors	Andrius Mudinas, Dell Zhang, Mark Levene
Abstract	Financial market forecasting is one of the most attractive practical applications of sentiment analysis. In this paper, we investigate the potential of using sentiment attitudes (positive vs negative) and also sentiment emotions (joy, sadness, etc.) extracted from financial news or tweets to help predict stock price movements. Our extensive experiments using the Granger-causality test have revealed that (i) in general sentiment attitudes do not seem to Granger-cause stock price changes; and (ii) while on some specific occasions sentiment emotions do seem to Granger-cause stock price changes, the exhibited pattern is not universal and must be looked at on a case by case basis. Furthermore, it has been observed that at least for certain stocks, integrating sentiment emotions as additional features into the machine learning based market trend prediction model could improve its accuracy.
Tasks	Sentiment Analysis
Published	2019-03-13
URL	http://arxiv.org/abs/1903.05440v1
PDF	http://arxiv.org/pdf/1903.05440v1.pdf
PWC	https://paperswithcode.com/paper/market-trend-prediction-using-sentiment
Repo	https://github.com/AndMu/Market-Wisdom
Framework	none

Quantitative Error Prediction of Medical Image Registration using Regression Forests


Title	Quantitative Error Prediction of Medical Image Registration using Regression Forests
Authors	Hessam Sokooti, Gorkem Saygili, Ben Glocker, Boudewijn P. F. Lelieveldt, Marius Staring
Abstract	Predicting registration error can be useful for evaluation of registration procedures, which is important for the adoption of registration techniques in the clinic. In addition, quantitative error prediction can be helpful in improving the registration quality. The task of predicting registration error is demanding due to the lack of a ground truth in medical images. This paper proposes a new automatic method to predict the registration error in a quantitative manner, and is applied to chest CT scans. A random regression forest is utilized to predict the registration error locally. The forest is built with features related to the transformation model and features related to the dissimilarity after registration. The forest is trained and tested using manually annotated corresponding points between pairs of chest CT scans in two experiments: SPREAD (trained and tested on SPREAD) and inter-database (including three databases SPREAD, DIR-Lab-4DCT and DIR-Lab-COPDgene). The results show that the mean absolute errors of regression are 1.07 ± 1.86 and 1.76 ± 2.59 mm for the SPREAD and inter-database experiment, respectively. The overall accuracy of classification in three classes (correct, poor and wrong registration) is 90.7% and 75.4%, for SPREAD and inter-database respectively. The good performance of the proposed method enables important applications such as automatic quality control in large-scale image analysis.
Tasks	Image Registration, Medical Image Registration
Published	2019-05-18
URL	https://arxiv.org/abs/1905.07624v1
PDF	https://arxiv.org/pdf/1905.07624v1.pdf
PWC	https://paperswithcode.com/paper/quantitative-error-prediction-of-medical
Repo	https://github.com/hsokooti/regun
Framework	none

DeepAtlas: Joint Semi-Supervised Learning of Image Registration and Segmentation


Title	DeepAtlas: Joint Semi-Supervised Learning of Image Registration and Segmentation
Authors	Zhenlin Xu, Marc Niethammer
Abstract	Deep convolutional neural networks (CNNs) are state-of-the-art for semantic image segmentation, but typically require many labeled training samples. Obtaining 3D segmentations of medical images for supervised training is difficult and labor intensive. Motivated by classical approaches for joint segmentation and registration we therefore propose a deep learning framework that jointly learns networks for image registration and image segmentation. In contrast to previous work on deep unsupervised image registration, which showed the benefit of weak supervision via image segmentations, our approach can use existing segmentations when available and computes them via the segmentation network otherwise, thereby providing the same registration benefit. Conversely, segmentation network training benefits from the registration, which essentially provides a realistic form of data augmentation. Experiments on knee and brain 3D magnetic resonance (MR) images show that our approach achieves large simultaneous improvements of segmentation and registration accuracy (over independently trained networks) and allows training high-quality models with very limited training data. Specifically, in a one-shot-scenario (with only one manually labeled image) our approach increases Dice scores (%) over an unsupervised registration network by 2.7 and 1.8 on the knee and brain images respectively.
Tasks	Data Augmentation, Image Registration, Semantic Segmentation
Published	2019-04-17
URL	https://arxiv.org/abs/1904.08465v2
PDF	https://arxiv.org/pdf/1904.08465v2.pdf
PWC	https://paperswithcode.com/paper/deepatlas-joint-semi-supervised-learning-of
Repo	https://github.com/uncbiag/DeepAtlas
Framework	none

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer


Title	Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
Authors	René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun
Abstract	The success of monocular depth estimation relies on large and diverse training sets. Due to the challenges associated with acquiring dense ground-truth depth across different environments at scale, a number of datasets with distinct characteristics and biases have emerged. We develop tools that enable mixing multiple datasets during training, even if their annotations are incompatible. In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks. Armed with these tools, we experiment with five diverse training datasets, including a new, massive data source: 3D films. To demonstrate the generalization power of our approach we use zero-shot cross-dataset transfer, i.e. we evaluate on datasets that were not seen during training. The experiments confirm that mixing data from complementary sources greatly improves monocular depth estimation. Our approach clearly outperforms competing methods across diverse datasets, setting a new state of the art for monocular depth estimation. Some results are shown in the supplementary video at https://youtu.be/D46FzVyL9I8
Tasks	Depth Estimation, Monocular Depth Estimation
Published	2019-07-02
URL	https://arxiv.org/abs/1907.01341v2
PDF	https://arxiv.org/pdf/1907.01341v2.pdf
PWC	https://paperswithcode.com/paper/towards-robust-monocular-depth-estimation
Repo	https://github.com/intel-isl/MiDaS
Framework	pytorch
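
To make the idea of an objective that is invariant to depth range and scale concrete, here is a minimal sketch. It assumes a plain least-squares alignment of the prediction to the target before measuring error; the function names are hypothetical, and the paper's actual robust objective may differ in its details.

```python
import numpy as np

def align_scale_shift(pred, target):
    """Least-squares scale s and shift t minimizing ||s * pred + t - target||^2."""
    # Design matrix with one column for the prediction and one for the offset.
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target.ravel(), rcond=None)
    return s, t

def scale_shift_invariant_mse(pred, target):
    """Mean squared error after optimally rescaling and shifting the prediction."""
    s, t = align_scale_shift(pred, target)
    return float(np.mean((s * pred + t - target) ** 2))

rng = np.random.default_rng(0)
pred = rng.normal(size=(4, 5))
target = rng.normal(size=(4, 5))

# Rescaling or shifting the prediction leaves the aligned error unchanged
# (up to floating-point error).
loss_a = scale_shift_invariant_mse(pred, target)
loss_b = scale_shift_invariant_mse(3.0 * pred + 2.0, target)
print(loss_a, loss_b)
```

Because the alignment is recomputed per image, predictions trained against datasets with different depth ranges or unknown scale can be compared on the same footing.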

Feature-Based Image Clustering and Segmentation Using Wavelets


Title	Feature-Based Image Clustering and Segmentation Using Wavelets
Authors	Junyu Chen, Eric C. Frey
Abstract	Pixel intensity is a widely used feature for clustering and segmentation algorithms, but a segmentation that uses only intensity values may suffer from noise and a lack of spatial context information. The wavelet transform is often used for image denoising and classification. We propose a novel method to incorporate Wavelet features in segmentation and clustering algorithms. The conventional K-means, Fuzzy c-means (FCM), and Active contour without edges (ACWE) algorithms were modified to use Wavelet features, leading to robust clustering/segmentation algorithms. A weighting parameter to control the weight of low-frequency sub-band information was also introduced. The new algorithms showed the capability to converge to different segmentation results based on the frequency information derived from the Wavelet sub-bands.
Tasks	Denoising, Image Clustering, Image Denoising
Published	2019-07-05
URL	https://arxiv.org/abs/1907.03591v1
PDF	https://arxiv.org/pdf/1907.03591v1.pdf
PWC	https://paperswithcode.com/paper/feature-based-image-clustering-and
Repo	https://github.com/junyuchen245/Active-Contour-Wavelet-Seg
Framework	none

Deep Prediction of Investor Interest: a Supervised Clustering Approach


Title	Deep Prediction of Investor Interest: a Supervised Clustering Approach
Authors	Baptiste Barreau, Laurent Carlier, Damien Challet
Abstract	We propose a novel deep learning architecture suitable for the prediction of investor interest for a given asset in a given time frame. This architecture performs both investor clustering and modelling at the same time. We first verify its superior performance on a synthetic scenario inspired by real data and then apply it to two real-world databases: a publicly available dataset about the position of investors in the Spanish stock market and proprietary data from BNP Paribas Corporate and Institutional Banking.
Tasks
Published	2019-09-11
URL	https://arxiv.org/abs/1909.05289v2
PDF	https://arxiv.org/pdf/1909.05289v2.pdf
PWC	https://paperswithcode.com/paper/deep-prediction-of-investor-interest-a
Repo	https://github.com/BptBrr/deep_prediction
Framework	tf

How to Evaluate Machine Learning Approaches for Combinatorial Optimization: Application to the Travelling Salesman Problem


Title	How to Evaluate Machine Learning Approaches for Combinatorial Optimization: Application to the Travelling Salesman Problem
Authors	Antoine François, Quentin Cappart, Louis-Martin Rousseau
Abstract	Combinatorial optimization is the field devoted to the study and practice of algorithms that solve NP-hard problems. As Machine Learning (ML) and deep learning have grown in popularity, several research groups have started to use ML to solve combinatorial optimization problems, such as the well-known Travelling Salesman Problem (TSP). Based on deep (reinforcement) learning, new models and architectures for the TSP have been successively developed and have achieved increasing performance. At the time of writing, state-of-the-art models provide solutions to TSP instances of 100 cities that are roughly 1.33% away from optimal solutions. However, despite these apparently positive results, the performance remains far from what can be achieved using a specialized search procedure. In this paper, we address the limitations of ML approaches for solving the TSP and investigate two fundamental questions: (1) how can we measure the level of accuracy of the pure ML component of such methods; and (2) what is the impact of a search procedure plugged inside an ML model on the performance? To answer these questions, we propose a new metric, ratio of optimal decisions (ROD), based on a fair comparison with a parametrized oracle, mimicking an ML model with a controlled accuracy. All the experiments are carried out on four state-of-the-art ML approaches dedicated to solving the TSP. Finally, we made ROD open-source in order to ease future research in the field.
Tasks	Combinatorial Optimization
Published	2019-09-28
URL	https://arxiv.org/abs/1909.13121v1
PDF	https://arxiv.org/pdf/1909.13121v1.pdf
PWC	https://paperswithcode.com/paper/how-to-evaluate-machine-learning-approaches
Repo	https://github.com/qcappart/ROD_oracle
Framework	none

Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past


Title	Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past
Authors	Che Wang, Keith Ross
Abstract	Soft Actor-Critic (SAC) is an off-policy actor-critic deep reinforcement learning (DRL) algorithm based on maximum entropy reinforcement learning. By combining off-policy updates with an actor-critic formulation, SAC achieves state-of-the-art performance on a range of continuous-action benchmark tasks, outperforming prior on-policy and off-policy methods. The off-policy method employed by SAC samples data uniformly from past experience when performing parameter updates. We propose Emphasizing Recent Experience (ERE), a simple but powerful off-policy sampling technique, which emphasizes recently observed data while not forgetting the past. The ERE algorithm samples more aggressively from recent experience, and also orders the updates to ensure that updates from old data do not overwrite updates from new data. We compare vanilla SAC and SAC+ERE, and show that ERE is more sample efficient than vanilla SAC for continuous-action Mujoco tasks. We also consider combining SAC with Priority Experience Replay (PER), a scheme originally proposed for deep Q-learning which prioritizes the data based on temporal-difference (TD) error. We show that SAC+PER can marginally improve the sample efficiency performance of SAC, but much less so than SAC+ERE. Finally, we propose an algorithm which integrates ERE and PER and show that this hybrid algorithm can give the best results for some of the Mujoco tasks.
Tasks	Q-Learning
Published	2019-06-10
URL	https://arxiv.org/abs/1906.04009v1
PDF	https://arxiv.org/pdf/1906.04009v1.pdf
PWC	https://paperswithcode.com/paper/boosting-soft-actor-critic-emphasizing-recent
Repo	https://github.com/BY571/Soft-Actor-Critic-and-Extensions
Framework	pytorch
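
The sampling idea described in the abstract (drawing later updates from an increasingly recent slice of the replay buffer) can be sketched as follows. The geometric shrink schedule and the parameter names `eta` and `min_window` are assumptions made for illustration and are not claimed to match the paper's exact formula.

```python
import random

def recent_window_sizes(buffer_size, num_updates, eta=0.9, min_window=2):
    """Window size for each update; windows shrink so later updates see newer data."""
    sizes = []
    for k in range(1, num_updates + 1):
        window = int(buffer_size * eta ** k)
        # Never drop below min_window, never exceed the buffer itself.
        sizes.append(min(buffer_size, max(window, min_window)))
    return sizes

def sample_batches(buffer, num_updates, batch_size, rng, eta=0.9, min_window=2):
    """Draw one batch per update, uniformly from the most recent `window` items."""
    batches = []
    for window in recent_window_sizes(len(buffer), num_updates, eta, min_window):
        recent = buffer[-window:]  # the newest items sit at the end of the buffer
        batches.append([rng.choice(recent) for _ in range(batch_size)])
    return batches

buffer = list(range(100))  # stand-in transitions, oldest first
rng = random.Random(0)
for batch in sample_batches(buffer, num_updates=5, batch_size=4, rng=rng, eta=0.5, min_window=10):
    print(batch)
```

Ordering the updates from the widest window to the narrowest means the final parameter updates in each round come from the newest data, which is one way to read the abstract's point about old data not overwriting updates from new data.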

Blaze: Simplified High Performance Cluster Computing


Title	Blaze: Simplified High Performance Cluster Computing
Authors	Junhao Li, Hang Zhang
Abstract	MapReduce and its variants have significantly simplified and accelerated the process of developing parallel programs. However, most MapReduce implementations focus on data-intensive tasks, while many real-world tasks are compute intensive and their data can fit into memory when distributed across nodes. For these tasks, MapReduce programs can be much slower than hand-optimized ones. We present Blaze, a C++ library that makes it easy to develop high performance parallel programs for such compute intensive tasks. At the core of Blaze is a highly-optimized in-memory MapReduce function, which has three main improvements over conventional MapReduce implementations: eager reduction, fast serialization, and special treatment for a small fixed key range. We also offer additional conveniences that make developing parallel programs similar to developing serial programs. These improvements make Blaze an easy-to-use cluster computing library that approaches the speed of hand-optimized parallel code. We apply Blaze to some common data mining tasks, including word frequency count, PageRank, k-means, expectation maximization (Gaussian mixture model), and k-nearest neighbors. Blaze outperforms Apache Spark by more than 10 times on average for these tasks, and the speed of Blaze scales almost linearly with the number of nodes. In addition, Blaze uses only the MapReduce function and 3 utility functions in its implementation while Spark uses almost 30 different parallel primitives in its official implementation.
Tasks
Published	2019-02-04
URL	http://arxiv.org/abs/1902.01437v2
PDF	http://arxiv.org/pdf/1902.01437v2.pdf
PWC	https://paperswithcode.com/paper/blaze-simplified-high-performance-cluster
Repo	https://github.com/junhao12131/blaze
Framework	none
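
As a rough illustration of what "eager reduction" can mean, the single-process sketch below folds each emitted value into a running result immediately, instead of first collecting every (key, value) pair and grouping afterwards. This is one reading of the term, written in Python for brevity; the function names are hypothetical and this is not Blaze's C++ API.

```python
def map_reduce_eager(records, mapper, reducer):
    """Apply mapper to each record and reduce every emitted value on the spot."""
    acc = {}

    def emit(key, value):
        # Eager step: combine with the running value as soon as it is emitted,
        # so no intermediate list of pairs is ever materialized.
        acc[key] = reducer(acc[key], value) if key in acc else value

    for record in records:
        mapper(record, emit)
    return acc

def word_count_mapper(line, emit):
    """Emit (word, 1) for every whitespace-separated word in the line."""
    for word in line.split():
        emit(word, 1)

counts = map_reduce_eager(["a b a", "b c"], word_count_mapper, lambda x, y: x + y)
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

Memory use here grows with the number of distinct keys rather than the number of emitted pairs, which is the property that makes this style attractive for compute-intensive, in-memory workloads.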

Data driven approximation of parametrized PDEs by Reduced Basis and Neural Networks


Title	Data driven approximation of parametrized PDEs by Reduced Basis and Neural Networks
Authors	Niccolò Dal Santo, Simone Deparis, Luca Pegolotti
Abstract	We are interested in the approximation of partial differential equations with a data-driven approach based on the reduced basis method and machine learning. We suppose that the phenomenon of interest can be modeled by a parametrized partial differential equation, but that the value of the physical parameters is unknown or difficult to measure directly. Our method allows estimating fields of interest, for instance temperature of a sample of material or velocity of a fluid, given data at a handful of points in the domain. We propose to accomplish this task with a neural network embedding a reduced basis solver as an exotic activation function in the last layer. The reduced basis solver accounts for the underlying physical phenomenon and it is constructed from snapshots obtained from randomly selected values of the physical parameters during an expensive offline phase. The same full order solutions are then employed for the training of the neural network. As a matter of fact, the chosen architecture resembles an asymmetric autoencoder in which the decoder is the reduced basis solver and as such it does not contain trainable parameters. The resulting latent space of our autoencoder includes parameter-dependent quantities feeding the reduced basis solver, which – depending on the considered partial differential equation – are the values of the physical parameters themselves or the affine decomposition coefficients of the differential operators.
Tasks	Network Embedding
Published	2019-04-02
URL	https://arxiv.org/abs/1904.01514v2
PDF	https://arxiv.org/pdf/1904.01514v2.pdf
PWC	https://paperswithcode.com/paper/data-driven-approximation-of-parametrized
Repo	https://github.com/ndalsanto/PDE-DNN
Framework	tf

Global Vectors for Node Representations


Title	Global Vectors for Node Representations
Authors	Robin Brochier, Adrien Guille, Julien Velcin
Abstract	Most network embedding algorithms consist of measuring co-occurrences of nodes via random walks and then learning the embeddings using Skip-Gram with Negative Sampling (SGNS). While it has proven to be a relevant choice, there are alternatives, such as GloVe, which has not yet been investigated for network embedding. Even though SGNS better handles non co-occurrence than GloVe, it has a worse time-complexity. In this paper, we propose a matrix factorization approach for network embedding, inspired by GloVe, that better handles non co-occurrence with a competitive time-complexity. We also show how to extend this model to deal with networks where nodes are documents, by simultaneously learning word, node and document representations. Quantitative evaluations show that our model achieves state-of-the-art performance, while not being so sensitive to the choice of hyper-parameters. Qualitatively speaking, we show how our model helps explore a network of documents by generating complementary network-oriented and content-oriented keywords.
Tasks	Network Embedding
Published	2019-02-28
URL	http://arxiv.org/abs/1902.11004v1
PDF	http://arxiv.org/pdf/1902.11004v1.pdf
PWC	https://paperswithcode.com/paper/global-vectors-for-node-representations
Repo	https://github.com/brochier/gvnr
Framework	tf

Monte Carlo Gradient Estimation in Machine Learning


Title	Monte Carlo Gradient Estimation in Machine Learning
Authors	Shakir Mohamed, Mihaela Rosca, Michael Figurnov, Andriy Mnih
Abstract	This paper is a broad and accessible survey of the methods we have at our disposal for Monte Carlo gradient estimation in machine learning and across the statistical sciences: the problem of computing the gradient of an expectation of a function with respect to parameters defining the distribution that is integrated; the problem of sensitivity analysis. In machine learning research, this gradient problem lies at the core of many learning problems, in supervised, unsupervised and reinforcement learning. We will generally seek to rewrite such gradients in a form that allows for Monte Carlo estimation, allowing them to be easily and efficiently used and analysed. We explore three strategies–the pathwise, score function, and measure-valued gradient estimators–exploring their historical developments, derivation, and underlying assumptions. We describe their use in other fields, show how they are related and can be combined, and expand on their possible generalisations. Wherever Monte Carlo gradient estimators have been derived and deployed in the past, important advances have followed. A deeper and more widely-held understanding of this problem will lead to further advances, and it is these advances that we wish to support.
Tasks
Published	2019-06-25
URL	https://arxiv.org/abs/1906.10652v1
PDF	https://arxiv.org/pdf/1906.10652v1.pdf
PWC	https://paperswithcode.com/paper/monte-carlo-gradient-estimation-in-machine
Repo	https://github.com/deepmind/mc_gradients
Framework	none
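
Two of the estimator families named in the abstract can be compared on a toy problem. The sketch below estimates the gradient with respect to mu of E[x^2] for x ~ N(mu, 1), whose exact value is 2 * mu; the toy objective and function names are chosen here for illustration and are not taken from the survey.

```python
import numpy as np

def score_function_grad(mu, n, rng):
    """Score-function estimator: E[f(x) * d/dmu log p(x; mu)] with f(x) = x^2."""
    x = rng.normal(mu, 1.0, size=n)
    # For a unit-variance Gaussian, d/dmu log p(x; mu) = (x - mu).
    return float(np.mean(x**2 * (x - mu)))

def pathwise_grad(mu, n, rng):
    """Pathwise estimator: write x = mu + eps and differentiate through the sample."""
    eps = rng.normal(0.0, 1.0, size=n)
    # d/dmu (mu + eps)^2 = 2 * (mu + eps)
    return float(np.mean(2.0 * (mu + eps)))

rng = np.random.default_rng(0)
mu = 1.5
sf = score_function_grad(mu, 200_000, rng)
pw = pathwise_grad(mu, 200_000, rng)
print("exact:", 2 * mu, "score function:", sf, "pathwise:", pw)
```

Both estimators are unbiased for this problem, but on this example the pathwise estimate has far lower variance; the score-function form has the advantage of not requiring f to be differentiable.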