Paper Group ANR 1664
Latent Relation Language Models
Title | Latent Relation Language Models |
Authors | Hiroaki Hayashi, Zecong Hu, Chenyan Xiong, Graham Neubig |
Abstract | In this paper, we propose Latent Relation Language Models (LRLMs), a class of language models that parameterizes the joint distribution over the words in a document and the entities that occur therein via knowledge graph relations. This model has a number of attractive properties: it not only improves language modeling performance, but is also able to annotate the posterior probability of entity spans for a given text through relations. Experiments demonstrate empirical improvements over both a word-based baseline language model and a previous approach that incorporates knowledge graph information. Qualitative analysis further demonstrates the proposed model’s ability to learn to predict appropriate relations in context. |
Tasks | Language Modelling |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.07690v1 |
https://arxiv.org/pdf/1908.07690v1.pdf | |
PWC | https://paperswithcode.com/paper/190807690 |
Repo | |
Framework | |
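To make the mixture idea in the abstract concrete, here is a toy sketch of interpolating a word-level softmax with a copy distribution over knowledge-graph objects. The vocabulary, the relation, and the mixture weight `lam` are all invented for illustration; the paper's model predicts the relation and span structure in context, which this sketch omits.

```python
import numpy as np

# Toy sketch: the next-token distribution mixes an ordinary word-level
# softmax with a "copy via relation" distribution that puts mass on
# entity tokens reachable through knowledge-graph relations.
vocab = ["the", "capital", "of", "france", "is", "paris", "berlin"]
word_logits = np.array([1.0, 0.5, 0.5, 0.2, 0.8, 0.1, 0.1])

# One KG relation for the entity in context: (France, capital, Paris).
relation_objects = {"paris"}

word_probs = np.exp(word_logits) / np.exp(word_logits).sum()
copy_probs = np.array([1.0 if w in relation_objects else 0.0 for w in vocab])
copy_probs /= copy_probs.sum()

lam = 0.6  # hypothetical mixture weight; the model predicts this in context
next_token_probs = (1 - lam) * word_probs + lam * copy_probs
print(dict(zip(vocab, next_token_probs.round(3))))
```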
Data Extraction from Charts via Single Deep Neural Network
Title | Data Extraction from Charts via Single Deep Neural Network |
Authors | Xiaoyi Liu, Diego Klabjan, Patrick N. Bless |
Abstract | Automatic data extraction from charts is challenging for two reasons: there exist many relations among objects in a chart, which is not a common consideration in general computer vision problems; and different types of charts may not be processed by the same model. To address these problems, we propose a framework of a single deep neural network, which consists of object detection, text recognition and object matching modules. The framework handles both bar and pie charts, and it may also be extended to other types of charts by slight revisions and by augmenting the training data. Our model performs successfully on 79.4% of test simulated bar charts and 88.0% of test simulated pie charts, while for charts outside of the training domain performance degrades to 57.5% and 62.3%, respectively. |
Tasks | Object Detection |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.11906v1 |
https://arxiv.org/pdf/1906.11906v1.pdf | |
PWC | https://paperswithcode.com/paper/data-extraction-from-charts-via-single-deep |
Repo | |
Framework | |
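The object-matching module is the least standard piece of the pipeline described above. Below is a hypothetical sketch of one simple geometric heuristic for bar charts: pair each detected bar box with the recognized value label whose horizontal center is closest. The function and data layout are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the object-matching stage for bar charts: pair
# each detected bar box with the recognized value label whose x-center
# is closest. Boxes are (x0, y0, x1, y1); labels carry recognized text.
def x_center(box):
    return (box[0] + box[2]) / 2

def match_bars_to_labels(bar_boxes, labels):
    matches = []
    for bar in bar_boxes:
        box_text = min(labels, key=lambda lb: abs(x_center(lb[0]) - x_center(bar)))
        matches.append((bar, box_text[1]))  # (bar box, recognized value)
    return matches

bars = [(10, 50, 30, 100), (40, 20, 60, 100)]
labels = [((12, 40, 28, 48), "3.2"), ((42, 10, 58, 18), "7.9")]
print(match_bars_to_labels(bars, labels))
```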
Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret
Title | Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret |
Authors | Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco |
Abstract | Gaussian processes (GP) are a well-studied Bayesian approach for the optimization of black-box functions. Despite their effectiveness in simple problems, GP-based algorithms hardly scale to high-dimensional functions, as their per-iteration time and space cost is at least quadratic in the number of dimensions $d$ and iterations $t$. Given a set of $A$ alternatives to choose from, the overall runtime $O(t^3A)$ is prohibitive. In this paper we introduce BKB (budgeted kernelized bandit), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and remarkably no assumption on the input space or covariance of the GP. We combine a kernelized linear bandit algorithm (GP-UCB) with randomized matrix sketching based on leverage score sampling, and we prove that randomly sampling inducing points based on their posterior variance gives an accurate low-rank approximation of the GP, preserving variance estimates and confidence intervals. As a consequence, BKB does not suffer from variance starvation, an important problem faced by many previous sparse GP approximations. Moreover, we show that our procedure selects at most $\tilde{O}(d_{eff})$ points, where $d_{eff}$ is the effective dimension of the explored space, which is typically much smaller than both $d$ and $t$. This greatly reduces the dimensionality of the problem, thus leading to a $O(tAd_{eff}^2)$ runtime and $O(A d_{eff})$ space complexity. |
Tasks | Gaussian Processes |
Published | 2019-03-13 |
URL | https://arxiv.org/abs/1903.05594v2 |
https://arxiv.org/pdf/1903.05594v2.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-process-optimization-with-adaptive |
Repo | |
Framework | |
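A minimal numpy sketch of the central mechanism, assuming an RBF kernel and an invented oversampling constant `qbar`: at each step, every observed point is kept in the inducing set with probability proportional to its approximate posterior variance, which is the variance-based leverage-score sampling the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, ell=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell**2))

def posterior_var(X, S, lam=0.1):
    # Approximate GP posterior variance at X given inducing set S.
    Kxx = rbf(X, X)
    if len(S) == 0:
        return np.diag(Kxx)
    Kxs, Kss = rbf(X, S), rbf(S, S)
    A = np.linalg.solve(Kss + lam * np.eye(len(S)), Kxs.T)
    return np.diag(Kxx) - np.sum(Kxs * A.T, axis=1)

# BKB-style resparsification sketch: keep each observed point with
# probability proportional to its posterior variance.
X = rng.uniform(-3, 3, size=(50, 1))
S = np.empty((0, 1))
qbar = 2.0  # hypothetical oversampling constant
for t in range(len(X)):
    var = posterior_var(X[: t + 1], S)
    keep = rng.random(t + 1) < np.minimum(1.0, qbar * var)
    S = X[: t + 1][keep]
print(f"kept {len(S)} of {len(X)} points as inducing set")
```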
Financial Applications of Gaussian Processes and Bayesian Optimization
Title | Financial Applications of Gaussian Processes and Bayesian Optimization |
Authors | Joan Gonzalvez, Edmond Lezmi, Thierry Roncalli, Jiali Xu |
Abstract | In the last five years, the financial industry has been impacted by the emergence of digitalization and machine learning. In this article, we explore two methods that have undergone rapid development in recent years: Gaussian processes and Bayesian optimization. Gaussian processes can be seen as a generalization of Gaussian random vectors and are associated with the development of kernel methods. Bayesian optimization is an approach for performing derivative-free global optimization in low dimensions, and uses Gaussian processes to locate the global maximum of a black-box function. The first part of the article reviews these two tools and shows how they are connected. In particular, we focus on Gaussian process regression, which is the core of Bayesian machine learning, and the issue of hyperparameter selection. The second part is dedicated to two financial applications. We first consider the modeling of the term structure of interest rates. More precisely, we test the fitting method and compare the GP prediction with the random walk model. The second application is the construction of trend-following strategies, in particular the online estimation of trend and covariance windows. |
Tasks | Gaussian Processes |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.04841v1 |
http://arxiv.org/pdf/1903.04841v1.pdf | |
PWC | https://paperswithcode.com/paper/financial-applications-of-gaussian-processes |
Repo | |
Framework | |
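Since the article centers on Gaussian process regression, a minimal sketch of the GP posterior may help fix notation: with an RBF kernel and Gaussian noise, the posterior mean and covariance at test points follow in closed form. The hyperparameters below are illustrative; the article selects them by maximizing the marginal likelihood.

```python
import numpy as np

def rbf(A, B, ell=0.5, sf=1.0):
    d2 = (A[:, None] - B[None, :]) ** 2
    return sf**2 * np.exp(-d2 / (2 * ell**2))

# Minimal GP regression: posterior mean and variance at test points,
# with illustrative hyperparameters (length-scale, signal and noise std).
X = np.linspace(0, 1, 8)
y = np.sin(2 * np.pi * X) + 0.05 * np.random.default_rng(1).normal(size=8)
Xs = np.linspace(0, 1, 5)

sn2 = 0.05**2  # noise variance (chosen here by hand, not by optimization)
K = rbf(X, X) + sn2 * np.eye(len(X))
Ks, Kss = rbf(Xs, X), rbf(Xs, Xs)
mean = Ks @ np.linalg.solve(K, y)
cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
print(mean.round(2), np.sqrt(np.diag(cov)).round(3))
```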
Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation
Title | Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation |
Authors | Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, Orhan Firat, Karthik Raman |
Abstract | The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model. Its improved translation performance on low resource languages hints at potential cross-lingual transfer capability for downstream tasks. In this paper, we evaluate the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream classification and sequence labeling tasks covering a diverse set of over 50 languages. We compare against a strong baseline, multilingual BERT (mBERT), in different cross-lingual transfer learning scenarios and show gains in zero-shot transfer in 4 out of these 5 tasks. |
Tasks | Cross-Lingual Transfer, Machine Translation, Transfer Learning |
Published | 2019-09-01 |
URL | https://arxiv.org/abs/1909.00437v1 |
https://arxiv.org/pdf/1909.00437v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-the-cross-lingual-effectiveness-of |
Repo | |
Framework | |
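The evaluation recipe reduces to a linear probe on frozen encoder representations. The sketch below stubs the NMT encoder with random features (the actual model is not available here), so it only illustrates the train-on-English, predict-on-another-language workflow rather than real transfer quality.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def encode(sentences):
    # Placeholder for the frozen multilingual NMT encoder: random
    # 512-dim features, purely for illustrating the probing workflow.
    return rng.normal(size=(len(sentences), 512))

# Train a classifier on English representations only...
en_X = encode(["good movie", "bad movie"] * 50)
en_y = np.tile([1, 0], 50)
clf = LogisticRegression(max_iter=1000).fit(en_X, en_y)

# ...then apply it zero-shot to another language's representations.
de_X = encode(["guter film", "schlechter film"])  # no German labels used
print(clf.predict(de_X))
```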
Compositional Video Prediction
Title | Compositional Video Prediction |
Authors | Yufei Ye, Maneesh Singh, Abhinav Gupta, Shubham Tulsiani |
Abstract | We present an approach for pixel-level future prediction given an input image of a scene. We observe that a scene is comprised of distinct entities that undergo motion and present an approach that operationalizes this insight. We implicitly predict future states of independent entities while reasoning about their interactions, and compose future video frames using these predicted states. We overcome the inherent multi-modality of the task using a global trajectory-level latent random variable, and show that this allows us to sample diverse and plausible futures. We empirically validate our approach against alternate representations and ways of incorporating multi-modality. We examine two datasets, one comprising stacked objects that may fall, and the other containing videos of humans performing activities in a gym, and show that our approach allows realistic stochastic video prediction across these diverse settings. See https://judyye.github.io/CVP/ for video predictions. |
Tasks | Future prediction, Video Prediction |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08522v1 |
https://arxiv.org/pdf/1908.08522v1.pdf | |
PWC | https://paperswithcode.com/paper/compositional-video-prediction |
Repo | |
Framework | |
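A toy sketch of the entity-centric rollout: sample a global latent `z`, advance each entity's state with a simple pairwise interaction, and read diverse futures off the per-entity states. The dynamics are invented for illustration; the paper learns them with neural networks and composes pixels from the predicted states.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(positions, z, steps=4):
    # positions: (N, 2) entity states; z: global latent picking the mode.
    futures = [positions]
    for _ in range(steps):
        p = futures[-1]
        drift = z  # the global latent biases the shared motion direction
        repel = np.zeros_like(p)
        for i in range(len(p)):  # toy pairwise interaction between entities
            d = p[i] - np.delete(p, i, 0)
            repel[i] = (d / (np.linalg.norm(d, axis=1, keepdims=True) ** 2 + 1e-6)).sum(0)
        futures.append(p + 0.1 * drift + 0.01 * repel)
    return np.stack(futures)

entities = rng.uniform(-1, 1, size=(3, 2))
for _ in range(2):  # different z samples yield different plausible futures
    z = rng.normal(size=2)
    print(rollout(entities, z)[-1].round(2))
```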
Towards Model-Agnostic Adversarial Defenses using Adversarially Trained Autoencoders
Title | Towards Model-Agnostic Adversarial Defenses using Adversarially Trained Autoencoders |
Authors | Pratik Vaishnavi, Kevin Eykholt, Atul Prakash, Amir Rahmati |
Abstract | Adversarial machine learning is a well-studied field of research where an adversary causes predictable errors in a machine learning algorithm through precise manipulation of the input. Numerous techniques have been proposed to harden machine learning algorithms and mitigate the effect of adversarial attacks. Of these techniques, adversarial training, which augments the training data with adversarial samples, has proven to be an effective defense with respect to a certain class of attacks. However, adversarial training is computationally expensive and its improvements are limited to a single model. In this work, we take a first step toward creating a model-agnostic adversarial defense. We propose Adversarially-Trained Autoencoder Augmentation (AAA), the first model-agnostic adversarial defense that is robust against certain adaptive adversaries. We show that AAA allows us to achieve a partially model-agnostic defense by training a single autoencoder to protect multiple pre-trained classifiers, achieving adversarial performance on par with or better than adversarial training without modifying the classifiers. Furthermore, we demonstrate that AAA can be used to create a fully model-agnostic defense for MNIST and Fashion MNIST datasets by improving the adversarial performance of a never before seen pre-trained classifier by at least 45% with no additional training. Finally, using a natural image corruption dataset, we show that our approach improves robustness to naturally corrupted images, which has been identified as strongly indicative of true adversarial robustness. |
Tasks | Adversarial Defense |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05921v3 |
https://arxiv.org/pdf/1909.05921v3.pdf | |
PWC | https://paperswithcode.com/paper/transferable-adversarial-robustness-using |
Repo | |
Framework | |
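A PyTorch sketch of the core training step under stated assumptions: adversarial examples are crafted through the full autoencoder-plus-classifier pipeline (FGSM here for brevity; the paper considers stronger adaptive adversaries), and only the autoencoder is updated so the frozen classifier keeps labeling them correctly. Architectures and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

# Autoencoder to be trained; classifier is pre-trained and kept frozen.
ae = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 784))
clf = nn.Sequential(nn.Linear(784, 10))
for p in clf.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(32, 784)              # stand-in batch
y = torch.randint(0, 10, (32,))

# FGSM crafted through the full autoencoder + classifier pipeline.
x_adv = x.clone().requires_grad_(True)
loss_fn(clf(ae(x_adv)), y).backward()
x_adv = (x + 0.1 * x_adv.grad.sign()).clamp(0, 1).detach()

# Update only the autoencoder so the frozen classifier stays correct.
opt.zero_grad()
loss_fn(clf(ae(x_adv)), y).backward()
opt.step()
```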
Generalized Domain Adaptation with Covariate and Label Shift CO-ALignment
Title | Generalized Domain Adaptation with Covariate and Label Shift CO-ALignment |
Authors | Shuhan Tan, Xingchao Peng, Kate Saenko |
Abstract | Unsupervised knowledge transfer has great potential to improve the generalizability of deep models to novel domains. Yet the current literature assumes that the label distribution is domain-invariant and aligns only the covariate distribution, or vice versa. In this paper, we explore the task of Generalized Domain Adaptation (GDA): how to transfer knowledge across different domains in the presence of both covariate and label shift? We propose a covariate and label distribution CO-ALignment (COAL) model to tackle this problem. Our model leverages prototype-based conditional alignment and label distribution estimation to diminish the covariate and label shifts, respectively. We demonstrate experimentally that when both types of shift exist in the data, COAL leads to state-of-the-art performance on several cross-domain benchmarks. |
Tasks | Domain Adaptation, Transfer Learning |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10320v1 |
https://arxiv.org/pdf/1910.10320v1.pdf | |
PWC | https://paperswithcode.com/paper/generalized-domain-adaptation-with-covariate-1 |
Repo | |
Framework | |
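A numpy sketch of the two ingredients on stub features: nearest-prototype pseudo-labeling of target samples (conditional alignment) and target label-distribution estimation used to importance-weight the source classes (label-shift correction). The data and the exact weighting rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stub features: class identity shifts the features so prototypes are
# informative; the target set has a different, unknown label balance.
src_y = rng.integers(0, 2, 100)
src_X = rng.normal(size=(100, 8)) + src_y[:, None] * 3.0
tgt_X = rng.normal(size=(60, 8)) + rng.integers(0, 2, 60)[:, None] * 3.0

# (1) Conditional alignment via nearest source-class prototypes.
prototypes = np.stack([src_X[src_y == c].mean(0) for c in (0, 1)])
dists = ((tgt_X[:, None, :] - prototypes[None]) ** 2).sum(-1)
pseudo = dists.argmin(1)  # pseudo-labels for target samples

# (2) Label-shift correction: reweight source classes by the ratio of
# the estimated target label distribution to the source one.
tgt_dist = np.bincount(pseudo, minlength=2) / len(pseudo)
src_dist = np.bincount(src_y, minlength=2) / len(src_y)
print((tgt_dist / src_dist).round(2))  # per-class source-loss weights
```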
Learning Gaussian Policies from Corrective Human Feedback
Title | Learning Gaussian Policies from Corrective Human Feedback |
Authors | Daan Wout, Jan Scholten, Carlos Celemin, Jens Kober |
Abstract | Learning from human feedback is a viable alternative to control design that does not require modelling or control expertise. Particularly, learning from corrective advice garners advantages over evaluative feedback as it is a more intuitive and scalable format. The current state-of-the-art in this field, COACH, has proven to be an effective approach for confined problems. However, it parameterizes the policy with Radial Basis Function networks, which require meticulous feature space engineering for higher order systems. We introduce Gaussian Process Coach (GPC), where feature space engineering is avoided by employing Gaussian Processes. In addition, we use the available policy uncertainty to 1) inquire feedback samples of maximal utility and 2) adapt the learning rate to the teacher’s learning phase. We demonstrate that the novel algorithm outperforms the current state-of-the-art in final performance, convergence rate and robustness to erroneous feedback in OpenAI Gym continuous control benchmarks, both for simulated and real human teachers. |
Tasks | Continuous Control, Gaussian Processes |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.05216v1 |
http://arxiv.org/pdf/1903.05216v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-gaussian-policies-from-corrective |
Repo | |
Framework | |
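A sketch of the corrective-advice update, assuming scikit-learn's GaussianProcessRegressor as the policy: the action target is shifted in the advised direction by a step size `e`, the GP is refit, and its predictive standard deviation indicates where feedback would be most useful. The step size and query threshold are invented.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Corrective advice is a direction (+1/-1/0); shift the action targets
# by a step size e and refit a GP policy over states.
states = np.array([[0.0], [0.5], [1.0]])
actions = np.array([0.2, 0.1, -0.3])   # current policy outputs
advice = np.array([+1, 0, -1])         # corrective human feedback
e = 0.1                                # hypothetical correction step size

targets = actions + e * advice
policy = GaussianProcessRegressor().fit(states, targets)

# Predictive uncertainty drives active feedback queries.
s_new = np.array([[0.25], [2.0]])
mean, std = policy.predict(s_new, return_std=True)
print(mean.round(2), std > 0.5)        # ask the teacher where std is high
```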
Scalable Grouped Gaussian Processes via Direct Cholesky Functional Representations
Title | Scalable Grouped Gaussian Processes via Direct Cholesky Functional Representations |
Authors | Astrid Dahl, Edwin V. Bonilla |
Abstract | We consider multi-task regression models where observations are assumed to be a linear combination of several latent node and weight functions, all drawn from Gaussian process (GP) priors that allow nonzero covariance between grouped latent functions. We show that when these grouped functions are conditionally independent given a group-dependent pivot, it is possible to parameterize the prior through sparse Cholesky factors directly, hence avoiding their computation during inference. Furthermore, we establish that kernels that are multiplicatively separable over input points give rise to such sparse parameterizations naturally without any additional assumptions. Finally, we extend the use of these sparse structures to approximate posteriors within variational inference, further improving scalability in the number of functions. We test our approach on multi-task datasets concerning distributed solar forecasting and show that it outperforms several multi-task GP baselines and that our sparse specifications achieve the same or better accuracy than non-sparse counterparts. |
Tasks | Gaussian Processes |
Published | 2019-03-10 |
URL | https://arxiv.org/abs/1903.03986v2 |
https://arxiv.org/pdf/1903.03986v2.pdf | |
PWC | https://paperswithcode.com/paper/sparse-grouped-gaussian-processes-for-solar |
Repo | |
Framework | |
Making the Dynamic Time Warping Distance Warping-Invariant
Title | Making the Dynamic Time Warping Distance Warping-Invariant |
Authors | Brijnesh Jain |
Abstract | The literature postulates that the dynamic time warping (dtw) distance can cope with temporal variations but stores and processes time series in a form as if the dtw-distance cannot cope with such variations. To address this inconsistency, we first show that the dtw-distance is not warping-invariant. The lack of warping-invariance contributes to the inconsistency mentioned above and to a strange behavior. To eliminate these peculiarities, we convert the dtw-distance to a warping-invariant semi-metric, called time-warp-invariant (twi) distance. Empirical results suggest that the error rates of the twi and dtw nearest-neighbor classifier are practically equivalent in a Bayesian sense. However, the twi-distance requires less storage and computation time than the dtw-distance for a broad range of problems. These results challenge the current practice of applying the dtw-distance in nearest-neighbor classification and suggest the proposed twi-distance as a more efficient and consistent option. |
Tasks | Time Series |
Published | 2019-03-04 |
URL | http://arxiv.org/abs/1903.01454v2 |
http://arxiv.org/pdf/1903.01454v2.pdf | |
PWC | https://paperswithcode.com/paper/making-the-dynamic-time-warping-distance |
Repo | |
Framework | |
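To see the non-invariance the abstract starts from, here is a plain DTW implementation plus a small check: duplicating samples in a series (a warping that preserves its shape) changes its dtw-distance to a third series. The twi construction itself follows the paper and is not reproduced here.

```python
import numpy as np

def dtw(x, y):
    # Classic O(len(x) * len(y)) dynamic-time-warping distance.
    D = np.full((len(x) + 1, len(y) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[len(x), len(y)]

# Duplicating points in x leaves its "shape" intact, yet the distance
# to a third series z changes -- dtw is not warping-invariant.
x = np.array([0.0, 1.0, 0.0])
x_warped = np.array([0.0, 0.0, 1.0, 1.0, 0.0])  # x with repeated samples
z = np.array([0.0, 2.0, 0.0])
print(dtw(x, z), dtw(x_warped, z))              # 1.0 vs 2.0
```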
Towards Automation of Creativity: A Machine Intelligence Approach
Title | Towards Automation of Creativity: A Machine Intelligence Approach |
Authors | Subodh Deolekar, Siby Abraham |
Abstract | This paper demonstrates the emergence of computational creativity in the field of music. Different aspects of creativity such as producer, process, product and press are studied and formulated. Different notions of computational creativity such as novelty, quality and typicality of compositions as products are studied and evaluated. We formulate an algorithmic perspective on human creativity and propose a prototype that is capable of demonstrating human-level creativity. We then validate the proposed prototype by applying various creativity benchmarks to the results obtained, and we compare the proposed prototype with other existing computational creative systems. |
Tasks | |
Published | 2019-04-27 |
URL | http://arxiv.org/abs/1904.12194v1 |
http://arxiv.org/pdf/1904.12194v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-automation-of-creativity-a-machine |
Repo | |
Framework | |
Style Mixer: Semantic-aware Multi-Style Transfer Network
Title | Style Mixer: Semantic-aware Multi-Style Transfer Network |
Authors | Zixuan Huang, Jinghuai Zhang, Jing Liao |
Abstract | Recent neural style transfer frameworks have obtained astonishing visual quality and flexibility in Single-style Transfer (SST), but little attention has been paid to Multi-style Transfer (MST), which refers to simultaneously transferring multiple styles to the same image. Compared to SST, MST has the potential to create more diverse and visually pleasing stylization results. In this paper, we propose the first MST framework to automatically incorporate multiple styles into one result based on regional semantics. We first improve the existing SST backbone network by introducing a novel multi-level feature fusion module and a patch attention module to achieve better semantic correspondences and preserve richer style details. For MST, we design a conceptually simple yet effective region-based style fusion module to insert into the backbone. It assigns corresponding styles to content regions based on semantic matching, and then seamlessly combines multiple styles together. Comprehensive evaluations demonstrate that our framework outperforms existing works on SST and MST. |
Tasks | |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13093v1 |
https://arxiv.org/pdf/1910.13093v1.pdf | |
PWC | https://paperswithcode.com/paper/style-mixer-semantic-aware-multi-style |
Repo | |
Framework | |
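A numpy sketch of region-based style fusion using AdaIN statistics, assuming the semantic region-to-style matching is already given: each style's channel-wise mean and standard deviation are applied only inside its matched region. The feature shapes and masks are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def adain(content, style):
    # Match the content features' channel-wise mean/std to the style's.
    c_mu, c_std = content.mean((1, 2), keepdims=True), content.std((1, 2), keepdims=True)
    s_mu, s_std = style.mean((1, 2), keepdims=True), style.std((1, 2), keepdims=True)
    return s_std * (content - c_mu) / (c_std + 1e-5) + s_mu

content = rng.normal(size=(16, 8, 8))        # C x H x W feature map
styles = [rng.normal(size=(16, 8, 8)) for _ in range(2)]
region = rng.integers(0, 2, size=(8, 8))     # semantic region -> style index

# Region-based fusion: each style's statistics apply only in its region.
out = np.zeros_like(content)
for k, style in enumerate(styles):
    mask = region == k
    out[:, mask] = adain(content, style)[:, mask]
print(out.shape)
```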
Key Protected Classification for Collaborative Learning
Title | Key Protected Classification for Collaborative Learning |
Authors | Mert Bülent Sarıyıldız, Ramazan Gökberk Cinbiş, Erman Ayday |
Abstract | Large-scale datasets play a fundamental role in training deep learning models. However, dataset collection is difficult in domains that involve sensitive information. Collaborative learning techniques provide a privacy-preserving solution, by enabling training over a number of private datasets that are not shared by their owners. However, recently, it has been shown that the existing collaborative learning frameworks are vulnerable to an active adversary that runs a generative adversarial network (GAN) attack. In this work, we propose a novel classification model that is resilient against such attacks by design. More specifically, we introduce a key-based classification model and a principled training scheme that protects class scores by using class-specific private keys, which effectively hides the information necessary for a GAN attack. We additionally show how to utilize high dimensional keys to improve the robustness against attacks without increasing the model complexity. Our detailed experiments demonstrate the effectiveness of the proposed technique. |
Tasks | |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10172v1 |
https://arxiv.org/pdf/1908.10172v1.pdf | |
PWC | https://paperswithcode.com/paper/key-protected-classification-for |
Repo | |
Framework | |
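A minimal sketch of the key-based scoring idea, with invented dimensions and scoring rule: class scores are inner products between the network's output embedding and high-dimensional class-specific private keys, so the raw embedding alone reveals little about class membership.

```python
import numpy as np

rng = np.random.default_rng(0)

# Class scores = inner products with private, per-class keys; without
# the keys, the network's raw output does not expose class scores.
# Dimensions and the scoring rule are illustrative assumptions.
num_classes, key_dim = 10, 256
keys = rng.normal(size=(num_classes, key_dim))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)

def class_scores(embedding, keys):
    return keys @ embedding  # one score per class

embedding = rng.normal(size=key_dim)  # stand-in for the network output
print(class_scores(embedding, keys).argmax())
```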
How does the Mind store Information?
Title | How does the Mind store Information? |
Authors | Rina Panigrahy |
Abstract | How we store information in our mind has been a major intriguing open question. We approach this question not from a physiological standpoint as to how information is physically stored in the brain, but from a conceptual and algorithmic standpoint as to the right data structures to be used to organize and index information. Here we propose a memory architecture directly based on the recursive sketching ideas from the paper “Recursive Sketches for Modular Deep Networks”, ICML 2019 (arXiv:1905.12730), to store information in memory as concise sketches. We also give a high-level, informal exposition of the recursive sketching idea from the paper, which makes use of subspace embeddings to capture deep network computations into a concise sketch. These sketches form an implicit knowledge graph that can be used to find related information via sketches from the past while processing an event. |
Tasks | |
Published | 2019-10-03 |
URL | https://arxiv.org/abs/1910.06718v1 |
https://arxiv.org/pdf/1910.06718v1.pdf | |
PWC | https://paperswithcode.com/paper/how-does-the-mind-store-information |
Repo | |
Framework | |
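A toy numpy rendering of the recursive-sketching idea, with plain random projections standing in for the paper's subspace embeddings: each module combines a projection of its own output with projections of its children's sketches into one fixed-size vector, and related events can be surfaced by inner-product similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # fixed sketch dimension

# Fixed random projections stand in for the paper's subspace embeddings.
R_out = rng.normal(size=(D, 16)) / np.sqrt(D)
R_child = rng.normal(size=(D, D)) / np.sqrt(D)

def sketch(module_output, child_sketches):
    s = R_out @ module_output
    for c in child_sketches:
        s = s + R_child @ c  # fold each child's sketch into the parent's
    return s

leaf_a = sketch(rng.normal(size=16), [])
leaf_b = sketch(rng.normal(size=16), [])
event = sketch(rng.normal(size=16), [leaf_a, leaf_b])

# Retrieval by similarity: an event sharing the sub-sketch leaf_a tends
# to score higher against `event` than an unrelated one.
related = sketch(rng.normal(size=16), [leaf_a])
unrelated = sketch(rng.normal(size=16), [])
print(float(event @ related), float(event @ unrelated))
```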