Paper Group ANR 1664
Latent Relation Language Models
Title | Latent Relation Language Models |
Authors | Hiroaki Hayashi, Zecong Hu, Chenyan Xiong, Graham Neubig |
Abstract | In this paper, we propose Latent Relation Language Models (LRLMs), a class of language models that parameterizes the joint distribution over the words in a document and the entities that occur therein via knowledge graph relations. This model has a number of attractive properties: it not only improves language modeling performance, but is also able to annotate the posterior probability of entity spans for a given text through relations. Experiments demonstrate empirical improvements over both a word-based baseline language model and a previous approach that incorporates knowledge graph information. Qualitative analysis further demonstrates the proposed model’s ability to learn to predict appropriate relations in context. |
Tasks | Language Modelling |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.07690v1 |
https://arxiv.org/pdf/1908.07690v1.pdf | |
PWC | https://paperswithcode.com/paper/190807690 |
Repo | |
Framework | |
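To make the mixture idea in the abstract concrete, here is a toy sketch of interpolating a word-level softmax with a copy distribution over knowledge-graph objects. The vocabulary, the relation, and the mixture weight `lam` are all invented for illustration; the paper's model predicts the relation and span structure in context, which this sketch omits.

```python
import numpy as np

# Toy sketch: the next-token distribution mixes an ordinary word-level
# softmax with a "copy via relation" distribution that puts mass on
# entity tokens reachable through knowledge-graph relations.
vocab = ["the", "capital", "of", "france", "is", "paris", "berlin"]
word_logits = np.array([1.0, 0.5, 0.5, 0.2, 0.8, 0.1, 0.1])

# One KG relation for the entity in context: (France, capital, Paris).
relation_objects = {"paris"}

word_probs = np.exp(word_logits) / np.exp(word_logits).sum()
copy_probs = np.array([1.0 if w in relation_objects else 0.0 for w in vocab])
copy_probs /= copy_probs.sum()

lam = 0.6  # hypothetical mixture weight; the model predicts this in context
next_token_probs = (1 - lam) * word_probs + lam * copy_probs
print(dict(zip(vocab, next_token_probs.round(3))))
```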
Data Extraction from Charts via Single Deep Neural Network
Title | Data Extraction from Charts via Single Deep Neural Network |
Authors | Xiaoyi Liu, Diego Klabjan, Patrick N. Bless |
Abstract | Automatic data extraction from charts is challenging for two reasons: there exist many relations among objects in a chart, which is not a common consideration in general computer vision problems; and different types of charts may not be processed by the same model. To address these problems, we propose a framework of a single deep neural network, which consists of object detection, text recognition and object matching modules. The framework handles both bar and pie charts, and it may also be extended to other types of charts by slight revisions and by augmenting the training data. Our model performs successfully on 79.4% of test simulated bar charts and 88.0% of test simulated pie charts, while for charts outside of the training domain performance degrades to 57.5% and 62.3%, respectively. |
Tasks | Object Detection |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.11906v1 |
https://arxiv.org/pdf/1906.11906v1.pdf | |
PWC | https://paperswithcode.com/paper/data-extraction-from-charts-via-single-deep |
Repo | |
Framework | |
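The object-matching module is the least standard piece of the pipeline described above. Below is a hypothetical sketch of one simple geometric heuristic for bar charts: pair each detected bar box with the recognized value label whose horizontal center is closest. The function and data layout are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the object-matching stage for bar charts: pair
# each detected bar box with the recognized value label whose x-center
# is closest. Boxes are (x0, y0, x1, y1); labels carry recognized text.
def x_center(box):
    return (box[0] + box[2]) / 2

def match_bars_to_labels(bar_boxes, labels):
    matches = []
    for bar in bar_boxes:
        box_text = min(labels, key=lambda lb: abs(x_center(lb[0]) - x_center(bar)))
        matches.append((bar, box_text[1]))  # (bar box, recognized value)
    return matches

bars = [(10, 50, 30, 100), (40, 20, 60, 100)]
labels = [((12, 40, 28, 48), "3.2"), ((42, 10, 58, 18), "7.9")]
print(match_bars_to_labels(bars, labels))
```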
Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret
Title | Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret |
Authors | Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco |
Abstract | Gaussian processes (GP) are a well-studied Bayesian approach for the optimization of black-box functions. Despite their effectiveness in simple problems, GP-based algorithms hardly scale to high-dimensional functions, as their per-iteration time and space cost is at least quadratic in the number of dimensions $d$ and iterations $t$. Given a set of $A$ alternatives to choose from, the overall runtime $O(t^3A)$ is prohibitive. In this paper we introduce BKB (budgeted kernelized bandit), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and remarkably no assumption on the input space or covariance of the GP. We combine a kernelized linear bandit algorithm (GP-UCB) with randomized matrix sketching based on leverage score sampling, and we prove that randomly sampling inducing points based on their posterior variance gives an accurate low-rank approximation of the GP, preserving variance estimates and confidence intervals. As a consequence, BKB does not suffer from variance starvation, an important problem faced by many previous sparse GP approximations. Moreover, we show that our procedure selects at most $\tilde{O}(d_{eff})$ points, where $d_{eff}$ is the effective dimension of the explored space, which is typically much smaller than both $d$ and $t$. This greatly reduces the dimensionality of the problem, thus leading to a $O(tAd_{eff}^2)$ runtime and $O(A d_{eff})$ space complexity. |
Tasks | Gaussian Processes |
Published | 2019-03-13 |
URL | https://arxiv.org/abs/1903.05594v2 |
https://arxiv.org/pdf/1903.05594v2.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-process-optimization-with-adaptive |
Repo | |
Framework | |
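A minimal numpy sketch of the central mechanism, assuming an RBF kernel and an invented oversampling constant `qbar`: at each step, every observed point is kept in the inducing set with probability proportional to its approximate posterior variance, which is the variance-based leverage-score sampling the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, ell=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell**2))

def posterior_var(X, S, lam=0.1):
    # Approximate GP posterior variance at X given inducing set S.
    Kxx = rbf(X, X)
    if len(S) == 0:
        return np.diag(Kxx)
    Kxs, Kss = rbf(X, S), rbf(S, S)
    A = np.linalg.solve(Kss + lam * np.eye(len(S)), Kxs.T)
    return np.diag(Kxx) - np.sum(Kxs * A.T, axis=1)

# BKB-style resparsification sketch: keep each observed point with
# probability proportional to its posterior variance.
X = rng.uniform(-3, 3, size=(50, 1))
S = np.empty((0, 1))
qbar = 2.0  # hypothetical oversampling constant
for t in range(len(X)):
    var = posterior_var(X[: t + 1], S)
    keep = rng.random(t + 1) < np.minimum(1.0, qbar * var)
    S = X[: t + 1][keep]
print(f"kept {len(S)} of {len(X)} points as inducing set")
```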
Financial Applications of Gaussian Processes and Bayesian Optimization
Title | Financial Applications of Gaussian Processes and Bayesian Optimization |
Authors | Joan Gonzalvez, Edmond Lezmi, Thierry Roncalli, Jiali Xu |
Abstract | In the last five years, the financial industry has been impacted by the emergence of digitalization and machine learning. In this article, we explore two methods that have undergone rapid development in recent years: Gaussian processes and Bayesian optimization. Gaussian processes can be seen as a generalization of Gaussian random vectors and are associated with the development of kernel methods. Bayesian optimization is an approach for performing derivative-free global optimization in low dimensions, and uses Gaussian processes to locate the global maximum of a black-box function. The first part of the article reviews these two tools and shows how they are connected. In particular, we focus on Gaussian process regression, which is the core of Bayesian machine learning, and the issue of hyperparameter selection. The second part is dedicated to two financial applications. We first consider the modeling of the term structure of interest rates. More precisely, we test the fitting method and compare the GP prediction with the random walk model. The second application is the construction of trend-following strategies, in particular the online estimation of trend and covariance windows. |
Tasks | Gaussian Processes |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.04841v1 |
http://arxiv.org/pdf/1903.04841v1.pdf | |
PWC | https://paperswithcode.com/paper/financial-applications-of-gaussian-processes |
Repo | |
Framework | |
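Since the article centers on Gaussian process regression, a minimal sketch of the GP posterior may help fix notation: with an RBF kernel and Gaussian noise, the posterior mean and covariance at test points follow in closed form. The hyperparameters below are illustrative; the article selects them by maximizing the marginal likelihood.

```python
import numpy as np

def rbf(A, B, ell=0.5, sf=1.0):
    d2 = (A[:, None] - B[None, :]) ** 2
    return sf**2 * np.exp(-d2 / (2 * ell**2))

# Minimal GP regression: posterior mean and variance at test points,
# with illustrative hyperparameters (length-scale, signal and noise std).
X = np.linspace(0, 1, 8)
y = np.sin(2 * np.pi * X) + 0.05 * np.random.default_rng(1).normal(size=8)
Xs = np.linspace(0, 1, 5)

sn2 = 0.05**2  # noise variance (chosen here by hand, not by optimization)
K = rbf(X, X) + sn2 * np.eye(len(X))
Ks, Kss = rbf(Xs, X), rbf(Xs, Xs)
mean = Ks @ np.linalg.solve(K, y)
cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
print(mean.round(2), np.sqrt(np.diag(cov)).round(3))
```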
Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation
Title | Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation |
Authors | Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, Orhan Firat, Karthik Raman |
Abstract | The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model. Its improved translation performance on low resource languages hints at potential cross-lingual transfer capability for downstream tasks. In this paper, we evaluate the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream classification and sequence labeling tasks covering a diverse set of over 50 languages. We compare against a strong baseline, multilingual BERT (mBERT), in different cross-lingual transfer learning scenarios and show gains in zero-shot transfer in 4 out of these 5 tasks. |
Tasks | Cross-Lingual Transfer, Machine Translation, Transfer Learning |
Published | 2019-09-01 |
URL | https://arxiv.org/abs/1909.00437v1 |
https://arxiv.org/pdf/1909.00437v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-the-cross-lingual-effectiveness-of |
Repo | |
Framework | |
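The evaluation recipe reduces to a linear probe on frozen encoder representations. The sketch below stubs the NMT encoder with random features (the actual model is not available here), so it only illustrates the train-on-English, predict-on-another-language workflow rather than real transfer quality.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def encode(sentences):
    # Placeholder for the frozen multilingual NMT encoder: random
    # 512-dim features, purely for illustrating the probing workflow.
    return rng.normal(size=(len(sentences), 512))

# Train a classifier on English representations only...
en_X = encode(["good movie", "bad movie"] * 50)
en_y = np.tile([1, 0], 50)
clf = LogisticRegression(max_iter=1000).fit(en_X, en_y)

# ...then apply it zero-shot to another language's representations.
de_X = encode(["guter film", "schlechter film"])  # no German labels used
print(clf.predict(de_X))
```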
Compositional Video Prediction
Title | Compositional Video Prediction |
Authors | Yufei Ye, Maneesh Singh, Abhinav Gupta, Shubham Tulsiani |
Abstract | We present an approach for pixel-level future prediction given an input image of a scene. We observe that a scene is comprised of distinct entities that undergo motion and present an approach that operationalizes this insight. We implicitly predict future states of independent entities while reasoning about their interactions, and compose future video frames using these predicted states. We overcome the inherent multi-modality of the task using a global trajectory-level latent random variable, and show that this allows us to sample diverse and plausible futures. We empirically validate our approach against alternate representations and ways of incorporating multi-modality. We examine two datasets, one comprising stacked objects that may fall, and the other containing videos of humans performing activities in a gym, and show that our approach allows realistic stochastic video prediction across these diverse settings. See https://judyye.github.io/CVP/ for video predictions. |
Tasks | Future prediction, Video Prediction |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08522v1 |
https://arxiv.org/pdf/1908.08522v1.pdf | |
PWC | https://paperswithcode.com/paper/compositional-video-prediction |
Repo | |
Framework | |
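A toy sketch of the entity-centric rollout: sample a global latent `z`, advance each entity's state with a simple pairwise interaction, and read diverse futures off the per-entity states. The dynamics are invented for illustration; the paper learns them with neural networks and composes pixels from the predicted states.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(positions, z, steps=4):
    # positions: (N, 2) entity states; z: global latent picking the mode.
    futures = [positions]
    for _ in range(steps):
        p = futures[-1]
        drift = z  # the global latent biases the shared motion direction
        repel = np.zeros_like(p)
        for i in range(len(p)):  # toy pairwise interaction between entities
            d = p[i] - np.delete(p, i, 0)
            repel[i] = (d / (np.linalg.norm(d, axis=1, keepdims=True) ** 2 + 1e-6)).sum(0)
        futures.append(p + 0.1 * drift + 0.01 * repel)
    return np.stack(futures)

entities = rng.uniform(-1, 1, size=(3, 2))
for _ in range(2):  # different z samples yield different plausible futures
    z = rng.normal(size=2)
    print(rollout(entities, z)[-1].round(2))
```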
Towards Model-Agnostic Adversarial Defenses using Adversarially Trained Autoencoders
Title | Towards Model-Agnostic Adversarial Defenses using Adversarially Trained Autoencoders |
Authors | Pratik Vaishnavi, Kevin Eykholt, Atul Prakash, Amir Rahmati |
Abstract | Adversarial machine learning is a well-studied field of research where an adversary causes predictable errors in a machine learning algorithm through precise manipulation of the input. Numerous techniques have been proposed to harden machine learning algorithms and mitigate the effect of adversarial attacks. Of these techniques, adversarial training, which augments the training data with adversarial samples, has proven to be an effective defense with respect to a certain class of attacks. However, adversarial training is computationally expensive and its improvements are limited to a single model. In this work, we take a first step toward creating a model-agnostic adversarial defense. We propose Adversarially-Trained Autoencoder Augmentation (AAA), the first model-agnostic adversarial defense that is robust against certain adaptive adversaries. We show that AAA allows us to achieve a partially model-agnostic defense by training a single autoencoder to protect multiple pre-trained classifiers, achieving adversarial performance on par with or better than adversarial training without modifying the classifiers. Furthermore, we demonstrate that AAA can be used to create a fully model-agnostic defense for MNIST and Fashion MNIST datasets by improving the adversarial performance of a never before seen pre-trained classifier by at least 45% with no additional training. Finally, using a natural image corruption dataset, we show that our approach improves robustness to naturally corrupted images, which has been identified as strongly indicative of true adversarial robustness. |
Tasks | Adversarial Defense |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05921v3 |
https://arxiv.org/pdf/1909.05921v3.pdf | |
PWC | https://paperswithcode.com/paper/transferable-adversarial-robustness-using |
Repo | |
Framework | |
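A PyTorch sketch of the core training step under stated assumptions: adversarial examples are crafted through the full autoencoder-plus-classifier pipeline (FGSM here for brevity; the paper considers stronger adaptive adversaries), and only the autoencoder is updated so the frozen classifier keeps labeling them correctly. Architectures and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

# Autoencoder to be trained; classifier is pre-trained and kept frozen.
ae = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 784))
clf = nn.Sequential(nn.Linear(784, 10))
for p in clf.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(32, 784)              # stand-in batch
y = torch.randint(0, 10, (32,))

# FGSM crafted through the full autoencoder + classifier pipeline.
x_adv = x.clone().requires_grad_(True)
loss_fn(clf(ae(x_adv)), y).backward()
x_adv = (x + 0.1 * x_adv.grad.sign()).clamp(0, 1).detach()

# Update only the autoencoder so the frozen classifier stays correct.
opt.zero_grad()
loss_fn(clf(ae(x_adv)), y).backward()
opt.step()
```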
Generalized Domain Adaptation with Covariate and Label Shift CO-ALignment
Title | Generalized Domain Adaptation with Covariate and Label Shift CO-ALignment |
Authors | Shuhan Tan, Xingchao Peng, Kate Saenko |
Abstract | Unsupervised knowledge transfer has great potential to improve the generalizability of deep models to novel domains. Yet the current literature assumes that the label distribution is domain-invariant and aligns only the covariate distribution, or vice versa. In this paper, we explore the task of Generalized Domain Adaptation (GDA): how to transfer knowledge across different domains in the presence of both covariate and label shift? We propose a covariate and label distribution CO-ALignment (COAL) model to tackle this problem. Our model leverages prototype-based conditional alignment and label distribution estimation to diminish the covariate and label shifts, respectively. We demonstrate experimentally that when both types of shift exist in the data, COAL leads to state-of-the-art performance on several cross-domain benchmarks. |
Tasks | Domain Adaptation, Transfer Learning |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10320v1 |
https://arxiv.org/pdf/1910.10320v1.pdf | |
PWC | https://paperswithcode.com/paper/generalized-domain-adaptation-with-covariate-1 |
Repo | |
Framework | |
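A numpy sketch of the two ingredients on stub features: nearest-prototype pseudo-labeling of target samples (conditional alignment) and target label-distribution estimation used to importance-weight the source classes (label-shift correction). The data and the exact weighting rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stub features: class identity shifts the features so prototypes are
# informative; the target set has a different, unknown label balance.
src_y = rng.integers(0, 2, 100)
src_X = rng.normal(size=(100, 8)) + src_y[:, None] * 3.0
tgt_X = rng.normal(size=(60, 8)) + rng.integers(0, 2, 60)[:, None] * 3.0

# (1) Conditional alignment via nearest source-class prototypes.
prototypes = np.stack([src_X[src_y == c].mean(0) for c in (0, 1)])
dists = ((tgt_X[:, None, :] - prototypes[None]) ** 2).sum(-1)
pseudo = dists.argmin(1)  # pseudo-labels for target samples

# (2) Label-shift correction: reweight source classes by the ratio of
# the estimated target label distribution to the source one.
tgt_dist = np.bincount(pseudo, minlength=2) / len(pseudo)
src_dist = np.bincount(src_y, minlength=2) / len(src_y)
print((tgt_dist / src_dist).round(2))  # per-class source-loss weights
```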
Learning Gaussian Policies from Corrective Human Feedback
Title | Learning Gaussian Policies from Corrective Human Feedback |
Authors | Daan Wout, Jan Scholten, Carlos Celemin, Jens Kober |
Abstract | Learning from human feedback is a viable alternative to control design that does not require modelling or control expertise. Particularly, learning from corrective advice garners advantages over evaluative feedback as it is a more intuitive and scalable format. The current state-of-the-art in this field, COACH, has proven to be an effective approach for confined problems. However, it parameterizes the policy with Radial Basis Function networks, which require meticulous feature space engineering for higher order systems. We introduce Gaussian Process Coach (GPC), where feature space engineering is avoided by employing Gaussian Processes. In addition, we use the available policy uncertainty to 1) inquire feedback samples of maximal utility and 2) adapt the learning rate to the teacher’s learning phase. We demonstrate that the novel algorithm outperforms the current state-of-the-art in final performance, convergence rate and robustness to erroneous feedback in OpenAI Gym continuous control benchmarks, both for simulated and real human teachers. |
Tasks | Continuous Control, Gaussian Processes |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.05216v1 |
http://arxiv.org/pdf/1903.05216v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-gaussian-policies-from-corrective |
Repo | |
Framework | |
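A sketch of the corrective-advice update, assuming scikit-learn's GaussianProcessRegressor as the policy: the action target is shifted in the advised direction by a step size `e`, the GP is refit, and its predictive standard deviation indicates where feedback would be most useful. The step size and query threshold are invented.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Corrective advice is a direction (+1/-1/0); shift the action targets
# by a step size e and refit a GP policy over states.
states = np.array([[0.0], [0.5], [1.0]])
actions = np.array([0.2, 0.1, -0.3])   # current policy outputs
advice = np.array([+1, 0, -1])         # corrective human feedback
e = 0.1                                # hypothetical correction step size

targets = actions + e * advice
policy = GaussianProcessRegressor().fit(states, targets)

# Predictive uncertainty drives active feedback queries.
s_new = np.array([[0.25], [2.0]])
mean, std = policy.predict(s_new, return_std=True)
print(mean.round(2), std > 0.5)        # ask the teacher where std is high
```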
Scalable Grouped Gaussian Processes via Direct Cholesky Functional Representations
Title | Scalable Grouped Gaussian Processes via Direct Cholesky Functional Representations |
Authors | Astrid Dahl, Edwin V. Bonilla |
Abstract | We consider multi-task regression models where observations are assumed to be a linear combination of several latent node and weight functions, all drawn from Gaussian process (GP) priors that allow nonzero covariance between grouped latent functions. We show that when these grouped functions are conditionally independent given a group-dependent pivot, it is possible to parameterize the prior through sparse Cholesky factors directly, hence avoiding their computation during inference. Furthermore, we establish that kernels that are multiplicatively separable over input points give rise to such sparse parameterizations naturally without any additional assumptions. Finally, we extend the use of these sparse structures to approximate posteriors within variational inference, further improving scalability in the number of functions. We test our approach on multi-task datasets concerning distributed solar forecasting and show that it outperforms several multi-task GP baselines and that our sparse specifications achieve the same or better accuracy than non-sparse counterparts. |
Tasks | Gaussian Processes |
Published | 2019-03-10 |
URL | https://arxiv.org/abs/1903.03986v2 |
https://arxiv.org/pdf/1903.03986v2.pdf | |
PWC | https://paperswithcode.com/paper/sparse-grouped-gaussian-processes-for-solar |
Repo | |
Framework | |
Making the Dynamic Time Warping Distance Warping-Invariant
Title | Making the Dynamic Time Warping Distance Warping-Invariant |
Authors | Brijnesh Jain |
Abstract | The literature postulates that the dynamic time warping (dtw) distance can cope with temporal variations but stores and processes time series in a form as if the dtw-distance cannot cope with such variations. To address this inconsistency, we first show that the dtw-distance is not warping-invariant. The lack of warping-invariance contributes to the inconsistency mentioned above and to a strange behavior. To eliminate these peculiarities, we convert the dtw-distance to a warping-invariant semi-metric, called time-warp-invariant (twi) distance. Empirical results suggest that the error rates of the twi and dtw nearest-neighbor classifier are practically equivalent in a Bayesian sense. However, the twi-distance requires less storage and computation time than the dtw-distance for a broad range of problems. These results challenge the current practice of applying the dtw-distance in nearest-neighbor classification and suggest the proposed twi-distance as a more efficient and consistent option. |
Tasks | Time Series |
Published | 2019-03-04 |
URL | http://arxiv.org/abs/1903.01454v2 |
http://arxiv.org/pdf/1903.01454v2.pdf | |
PWC | https://paperswithcode.com/paper/making-the-dynamic-time-warping-distance |
Repo | |
Framework | |
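To see the non-invariance the abstract starts from, here is a plain DTW implementation plus a small check: duplicating samples in a series (a warping that preserves its shape) changes its dtw-distance to a third series. The twi construction itself follows the paper and is not reproduced here.

```python
import numpy as np

def dtw(x, y):
    # Classic O(len(x) * len(y)) dynamic-time-warping distance.
    D = np.full((len(x) + 1, len(y) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[len(x), len(y)]

# Duplicating points in x leaves its "shape" intact, yet the distance
# to a third series z changes -- dtw is not warping-invariant.
x = np.array([0.0, 1.0, 0.0])
x_warped = np.array([0.0, 0.0, 1.0, 1.0, 0.0])  # x with repeated samples
z = np.array([0.0, 2.0, 0.0])
print(dtw(x, z), dtw(x_warped, z))              # 1.0 vs 2.0
```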
Towards Automation of Creativity: A Machine Intelligence Approach
Title | Towards Automation of Creativity: A Machine Intelligence Approach |
Authors | Subodh Deolekar, Siby Abraham |
Abstract | This paper demonstrates the emergence of computational creativity in the field of music. Different aspects of creativity such as producer, process, product and press are studied and formulated. Different notions of computational creativity such as novelty, quality and typicality of compositions as products are studied and evaluated. We formulate an algorithmic perspective on human creativity and propose a prototype that is capable of demonstrating human-level creativity. We then validate the proposed prototype by applying various creativity benchmarks to the results obtained, and we compare the proposed prototype with other existing computational creative systems. |
Tasks | |
Published | 2019-04-27 |
URL | http://arxiv.org/abs/1904.12194v1 |
http://arxiv.org/pdf/1904.12194v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-automation-of-creativity-a-machine |
Repo | |
Framework | |
Style Mixer: Semantic-aware Multi-Style Transfer Network
Title | Style Mixer: Semantic-aware Multi-Style Transfer Network |
Authors | Zixuan Huang, Jinghuai Zhang, Jing Liao |
Abstract | Recent neural style transfer frameworks have obtained astonishing visual quality and flexibility in Single-style Transfer (SST), but little attention has been paid to Multi-style Transfer (MST), which refers to simultaneously transferring multiple styles to the same image. Compared to SST, MST has the potential to create more diverse and visually pleasing stylization results. In this paper, we propose the first MST framework to automatically incorporate multiple styles into one result based on regional semantics. We first improve the existing SST backbone network by introducing a novel multi-level feature fusion module and a patch attention module to achieve better semantic correspondences and preserve richer style details. For MST, we design a conceptually simple yet effective region-based style fusion module to insert into the backbone. It assigns corresponding styles to content regions based on semantic matching, and then seamlessly combines multiple styles together. Comprehensive evaluations demonstrate that our framework outperforms existing works on SST and MST. |
Tasks | |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13093v1 |
https://arxiv.org/pdf/1910.13093v1.pdf | |
PWC | https://paperswithcode.com/paper/style-mixer-semantic-aware-multi-style |
Repo | |
Framework | |
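A numpy sketch of region-based style fusion using AdaIN statistics, assuming the semantic region-to-style matching is already given: each style's channel-wise mean and standard deviation are applied only inside its matched region. The feature shapes and masks are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def adain(content, style):
    # Match the content features' channel-wise mean/std to the style's.
    c_mu, c_std = content.mean((1, 2), keepdims=True), content.std((1, 2), keepdims=True)
    s_mu, s_std = style.mean((1, 2), keepdims=True), style.std((1, 2), keepdims=True)
    return s_std * (content - c_mu) / (c_std + 1e-5) + s_mu

content = rng.normal(size=(16, 8, 8))        # C x H x W feature map
styles = [rng.normal(size=(16, 8, 8)) for _ in range(2)]
region = rng.integers(0, 2, size=(8, 8))     # semantic region -> style index

# Region-based fusion: each style's statistics apply only in its region.
out = np.zeros_like(content)
for k, style in enumerate(styles):
    mask = region == k
    out[:, mask] = adain(content, style)[:, mask]
print(out.shape)
```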
Key Protected Classification for Collaborative Learning
Title | Key Protected Classification for Collaborative Learning |
Authors | Mert Bülent Sarıyıldız, Ramazan Gökberk Cinbiş, Erman Ayday |
Abstract | Large-scale datasets play a fundamental role in training deep learning models. However, dataset collection is difficult in domains that involve sensitive information. Collaborative learning techniques provide a privacy-preserving solution, by enabling training over a number of private datasets that are not shared by their owners. However, recently, it has been shown that the existing collaborative learning frameworks are vulnerable to an active adversary that runs a generative adversarial network (GAN) attack. In this work, we propose a novel classification model that is resilient against such attacks by design. More specifically, we introduce a key-based classification model and a principled training scheme that protects class scores by using class-specific private keys, which effectively hides the information necessary for a GAN attack. We additionally show how to utilize high dimensional keys to improve the robustness against attacks without increasing the model complexity. Our detailed experiments demonstrate the effectiveness of the proposed technique. |
Tasks | |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10172v1 |
https://arxiv.org/pdf/1908.10172v1.pdf | |
PWC | https://paperswithcode.com/paper/key-protected-classification-for |
Repo | |
Framework | |
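A minimal sketch of the key-based scoring idea, with invented dimensions and scoring rule: class scores are inner products between the network's output embedding and high-dimensional class-specific private keys, so the raw embedding alone reveals little about class membership.

```python
import numpy as np

rng = np.random.default_rng(0)

# Class scores = inner products with private, per-class keys; without
# the keys, the network's raw output does not expose class scores.
# Dimensions and the scoring rule are illustrative assumptions.
num_classes, key_dim = 10, 256
keys = rng.normal(size=(num_classes, key_dim))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)

def class_scores(embedding, keys):
    return keys @ embedding  # one score per class

embedding = rng.normal(size=key_dim)  # stand-in for the network output
print(class_scores(embedding, keys).argmax())
```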
How does the Mind store Information?
Title | How does the Mind store Information? |
Authors | Rina Panigrahy |
Abstract | How we store information in our mind has been a major intriguing open question. We approach this question not from a physiological standpoint as to how information is physically stored in the brain, but from a conceptual and algorithmic standpoint as to the right data structures to be used to organize and index information. Here we propose a memory architecture directly based on the recursive sketching ideas from the paper “Recursive Sketches for Modular Deep Networks”, ICML 2019 (arXiv:1905.12730), to store information in memory as concise sketches. We also give a high-level, informal exposition of the recursive sketching idea from the paper, which makes use of subspace embeddings to capture deep network computations into a concise sketch. These sketches form an implicit knowledge graph that can be used to find related information via sketches from the past while processing an event. |
Tasks | |
Published | 2019-10-03 |
URL | https://arxiv.org/abs/1910.06718v1 |
https://arxiv.org/pdf/1910.06718v1.pdf | |
PWC | https://paperswithcode.com/paper/how-does-the-mind-store-information |
Repo | |
Framework | |
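A toy numpy rendering of the recursive-sketching idea, with plain random projections standing in for the paper's subspace embeddings: each module combines a projection of its own output with projections of its children's sketches into one fixed-size vector, and related events can be surfaced by inner-product similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # fixed sketch dimension

# Fixed random projections stand in for the paper's subspace embeddings.
R_out = rng.normal(size=(D, 16)) / np.sqrt(D)
R_child = rng.normal(size=(D, D)) / np.sqrt(D)

def sketch(module_output, child_sketches):
    s = R_out @ module_output
    for c in child_sketches:
        s = s + R_child @ c  # fold each child's sketch into the parent's
    return s

leaf_a = sketch(rng.normal(size=16), [])
leaf_b = sketch(rng.normal(size=16), [])
event = sketch(rng.normal(size=16), [leaf_a, leaf_b])

# Retrieval by similarity: an event sharing the sub-sketch leaf_a tends
# to score higher against `event` than an unrelated one.
related = sketch(rng.normal(size=16), [leaf_a])
unrelated = sketch(rng.normal(size=16), [])
print(float(event @ related), float(event @ unrelated))
```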