October 16, 2019

Paper Group ANR 1087

Unsupervised Learning of Artistic Styles with Archetypal Style Analysis. Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation. Robust Text-to-SQL Generation with Execution-Guided Decoding. The LKPY Package for Recommender Systems Experiments: Next-Generation Tools and Lessons Learned from the LensKit Project …

Unsupervised Learning of Artistic Styles with Archetypal Style Analysis

Title Unsupervised Learning of Artistic Styles with Archetypal Style Analysis
Authors Daan Wynen, Cordelia Schmid, Julien Mairal
Abstract In this paper, we introduce an unsupervised learning approach to automatically discover, summarize, and manipulate artistic styles from large collections of paintings. Our method is based on archetypal analysis, which is an unsupervised learning technique akin to sparse coding with a geometric interpretation. When applied to deep image representations from a collection of artworks, it learns a dictionary of archetypal styles, which can be easily visualized. After training the model, the style of a new image, which is characterized by local statistics of deep visual features, is approximated by a sparse convex combination of archetypes. This enables us to interpret which archetypal styles are present in the input image, and in which proportion. Finally, our approach allows us to manipulate the coefficients of the latent archetypal decomposition, and achieve various special effects such as style enhancement, transfer, and interpolation between multiple archetypes.
Tasks
Published 2018-05-28
URL http://arxiv.org/abs/1805.11155v2
PDF http://arxiv.org/pdf/1805.11155v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-of-artistic-styles-with
Repo
Framework
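
The abstract's "sparse convex combination of archetypes" can be illustrated with a small sketch. It is not the authors' implementation: the toy vectors stand in for the deep style statistics, the function names are hypothetical, and plain projected gradient descent onto the probability simplex is an assumed solver. The archetype matrix is assumed to be non-zero.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # keep the threshold of the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def archetypal_weights(archetypes, target, n_iter=500):
    """Approximate `target` as a convex combination of the rows of `archetypes`.

    Minimises ||sum_k w_k * archetypes[k] - target||^2 over the simplex.
    """
    k = len(archetypes)
    # Step size 1 / (2 * ||Z||_F^2), a safe bound on the inverse Lipschitz constant.
    frob_sq = sum(z * z for row in archetypes for z in row)
    step = 1.0 / (2.0 * frob_sq)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        residual = [
            sum(w * row[d] for w, row in zip(weights, archetypes)) - target[d]
            for d in range(len(target))
        ]
        grad = [2.0 * sum(z * r for z, r in zip(row, residual)) for row in archetypes]
        weights = project_to_simplex([w - step * g for w, g in zip(weights, grad)])
    return weights


# Toy example: two hypothetical "archetypes" in a 2-D feature space.
weights = archetypal_weights([[1.0, 0.0], [0.0, 1.0]], [0.3, 0.7])
```

The returned weights are non-negative and sum to one, so they can be read as the proportion of each archetype in the input, which is the interpretation the abstract describes.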

Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation

Title Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation
Authors Craig Sherstan, Marlos C. Machado, Patrick M. Pilarski
Abstract Here we propose using the successor representation (SR) to accelerate learning in a constructive knowledge system based on general value functions (GVFs). In real-world settings like robotics for unstructured and dynamic environments, it is infeasible to model all meaningful aspects of a system and its environment by hand due to both complexity and size. Instead, robots must be capable of learning and adapting to changes in their environment and task, incrementally constructing models from their own experience. GVFs, taken from the field of reinforcement learning (RL), are a way of modeling the world as predictive questions. One approach to such models proposes a massive network of interconnected and interdependent GVFs, which are incrementally added over time. It is reasonable to expect that new, incrementally added predictions can be learned more swiftly if the learning process leverages knowledge gained from past experience. The SR provides such a means of separating the dynamics of the world from the prediction targets and thus capturing regularities that can be reused across multiple GVFs. As a primary contribution of this work, we show that using SR-based predictions can improve sample efficiency and learning speed in a continual learning setting where new predictions are incrementally added and learned over time. We analyze our approach in a grid-world and then demonstrate its potential on data from a physical robot arm.
Tasks Continual Learning
Published 2018-03-23
URL http://arxiv.org/abs/1803.09001v1
PDF http://arxiv.org/pdf/1803.09001v1.pdf
PWC https://paperswithcode.com/paper/accelerating-learning-in-constructive
Repo
Framework
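
A minimal tabular sketch of why the SR separates world dynamics from prediction targets. It assumes a known transition matrix `P` under a fixed policy and a constant discount `gamma`; the paper instead learns from experience, so this only illustrates the reuse idea. The function names are hypothetical.

```python
def successor_representation(P, gamma, tol=1e-12, max_iter=10_000):
    """Iterate M <- I + gamma * P @ M until it converges to (I - gamma * P)^-1."""
    n = len(P)
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        new_M = [
            [
                (1.0 if i == j else 0.0)
                + gamma * sum(P[i][k] * M[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
        delta = max(abs(new_M[i][j] - M[i][j]) for i in range(n) for j in range(n))
        M = new_M
        if delta < tol:
            break
    return M


def predict(M, cumulant):
    """Value of any per-state cumulant under the same dynamics: v = M @ c."""
    return [sum(m * c for m, c in zip(row, cumulant)) for row in M]


# Two states that deterministically alternate; the SR is computed once...
P = [[0.0, 1.0], [1.0, 0.0]]
M = successor_representation(P, gamma=0.5)
# ...and reused for two different prediction targets (cumulants).
v_a = predict(M, [1.0, 0.0])
v_b = predict(M, [0.0, 1.0])
```

Once `M` is available, adding a new prediction only requires a new cumulant vector, which is the kind of reuse across GVFs that the abstract points to.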

Robust Text-to-SQL Generation with Execution-Guided Decoding

Title Robust Text-to-SQL Generation with Execution-Guided Decoding
Authors Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, Rishabh Singh
Abstract We consider the problem of neural semantic parsing, which translates natural language questions into executable SQL queries. We introduce a new mechanism, execution guidance, to leverage the semantics of SQL. It detects and excludes faulty programs during the decoding procedure by conditioning on the execution of a partially generated program. The mechanism can be used with any autoregressive generative model, which we demonstrate on four state-of-the-art recurrent or template-based semantic parsing models. We demonstrate that execution guidance universally improves model performance on various text-to-SQL datasets with different scales and query complexity: WikiSQL, ATIS, and GeoQuery. As a result, we achieve new state-of-the-art execution accuracy of 83.8% on WikiSQL.
Tasks Semantic Parsing, Text-To-Sql
Published 2018-07-09
URL http://arxiv.org/abs/1807.03100v3
PDF http://arxiv.org/pdf/1807.03100v3.pdf
PWC https://paperswithcode.com/paper/robust-text-to-sql-generation-with-execution
Repo
Framework
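
A simplified sketch of the execution-guidance idea, applied here to complete candidate queries rather than to partial programs inside the decoder as the abstract describes. The table, the candidate list, and the function name are hypothetical. Treating an empty result as a reason to reject a candidate is an assumption of this sketch.

```python
import sqlite3


def execution_guided_pick(conn, ranked_candidates):
    """Return the first candidate (in model-score order) that executes
    without error and yields a non-empty result, or (None, []) if none does."""
    for sql in ranked_candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # parse or runtime error: exclude this candidate
        if rows:
            return sql, rows
    return None, []


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?)", [("Springfield", 500), ("Shelbyville", 50)]
)

candidates = [
    "SELECT nme FROM city WHERE population > 100",       # unknown column
    "SELECT name FROM city WHERE population > 1000000",  # runs, but empty
    "SELECT name FROM city WHERE population > 100",      # runs, non-empty
]
chosen_sql, chosen_rows = execution_guided_pick(conn, candidates)
```

Because the check only needs a database connection and a ranked list of strings, it is independent of the model that produced the candidates.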

The LKPY Package for Recommender Systems Experiments: Next-Generation Tools and Lessons Learned from the LensKit Project

Title The LKPY Package for Recommender Systems Experiments: Next-Generation Tools and Lessons Learned from the LensKit Project
Authors Michael D. Ekstrand
Abstract Since 2010, we have built and maintained LensKit, an open-source toolkit for building, researching, and learning about recommender systems. We have successfully used the software in a wide range of recommender systems experiments, to support education in traditional classroom and online settings, and as the algorithmic backend for user-facing recommendation services in movies and books. This experience, along with community feedback, has surfaced a number of challenges with LensKit’s design and environmental choices. In response to these challenges, we are developing a new set of tools that leverage the PyData stack to enable the kinds of research experiments and educational experiences that we have been able to deliver with LensKit, along with new experimental structures that the existing code makes difficult. The result is a set of research tools that should significantly increase research velocity and provide much smoother integration with other software such as Keras while maintaining the same level of reproducibility as a LensKit experiment. In this paper, we reflect on the LensKit project, particularly on our experience using it for offline evaluation experiments, and describe the next-generation LKPY tools for enabling new offline evaluations and experiments with flexible, open-ended designs and well-tested evaluation primitives.
Tasks Recommendation Systems
Published 2018-09-10
URL https://arxiv.org/abs/1809.03125v2
PDF https://arxiv.org/pdf/1809.03125v2.pdf
PWC https://paperswithcode.com/paper/the-lkpy-package-for-recommender-systems
Repo
Framework

Asynchronous Byzantine Machine Learning (the case of SGD)

Title Asynchronous Byzantine Machine Learning (the case of SGD)
Authors Georgios Damaskinos, El Mahdi El Mhamdi, Rachid Guerraoui, Rhicheek Patra, Mahsa Taziki
Abstract Asynchronous distributed machine learning solutions have proven very effective so far, but always assuming perfectly functioning workers. In practice, however, some of the workers can exhibit Byzantine behavior, caused by hardware failures, software bugs, corrupt data, or even malicious attacks. We introduce Kardam, the first distributed asynchronous stochastic gradient descent (SGD) algorithm that copes with Byzantine workers. Kardam consists of two complementary components: a filtering and a dampening component. The first is scalar-based and ensures resilience against $\frac{1}{3}$ Byzantine workers. Essentially, this filter leverages the Lipschitzness of cost functions and acts as a self-stabilizer against Byzantine workers that would attempt to corrupt the progress of SGD. The dampening component bounds the convergence rate by adjusting to stale information through a generic gradient weighting scheme. We prove that Kardam guarantees almost sure convergence in the presence of asynchrony and Byzantine behavior, and we derive its convergence rate. We evaluate Kardam on the CIFAR-100 and EMNIST datasets and measure its overhead with respect to non-Byzantine-resilient solutions. We empirically show that Kardam does not introduce additional noise to the learning procedure but does induce a slowdown (the cost of Byzantine resilience) that we both theoretically and empirically show to be less than $f/n$, where $f$ is the number of Byzantine failures tolerated and $n$ the total number of workers. Interestingly, we also empirically observe that the dampening component is interesting in its own right, for it enables building an SGD algorithm that outperforms alternative staleness-aware asynchronous competitors in environments with honest workers.
Tasks
Published 2018-02-22
URL http://arxiv.org/abs/1802.07928v2
PDF http://arxiv.org/pdf/1802.07928v2.pdf
PWC https://paperswithcode.com/paper/asynchronous-byzantine-machine-learning-the
Repo
Framework
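
The two components named in the abstract can be sketched as follows. This is not Kardam itself: the quantile rule used as the acceptance threshold and the `1 / (1 + staleness)` weight are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math


def empirical_lipschitz(grad, prev_grad, params, prev_params):
    """Scalar growth rate of the gradient between two model states."""
    num = math.dist(grad, prev_grad)
    den = math.dist(params, prev_params)
    return num / den if den > 0 else float("inf")


def passes_filter(coeff, recent_coeffs, n_workers, n_byzantine):
    """Accept a gradient whose coefficient is no larger than the
    (n - f) / n quantile of recently observed coefficients (assumed rule)."""
    if not recent_coeffs:
        return True  # nothing to compare against yet
    ranked = sorted(recent_coeffs)
    idx = max(0, math.ceil(len(ranked) * (n_workers - n_byzantine) / n_workers) - 1)
    return coeff <= ranked[idx]


def dampen(grad, staleness):
    """Down-weight a gradient computed on a model that is `staleness` steps old."""
    weight = 1.0 / (1.0 + staleness)  # assumed weighting scheme
    return [weight * g for g in grad]


coeff = empirical_lipschitz([2.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 0.0])
accepted = passes_filter(coeff, [1.0, 2.0, 3.0, 100.0], n_workers=3, n_byzantine=1)
update = dampen([4.0, -2.0], staleness=3)
```

The filter only compares scalars, so its cost does not grow with the model dimension beyond computing the two norms.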

Linking Gaussian Process regression with data-driven manifold embeddings for nonlinear data fusion

Title Linking Gaussian Process regression with data-driven manifold embeddings for nonlinear data fusion
Authors Seungjoon Lee, Felix Dietrich, George E. Karniadakis, Ioannis G. Kevrekidis
Abstract In statistical modeling with Gaussian Process regression, it has been shown that combining (few) high-fidelity data with (many) low-fidelity data can enhance prediction accuracy, compared to prediction based on the few high-fidelity data only. Such information fusion techniques for multifidelity data commonly approach the high-fidelity model $f_h(t)$ as a function of two variables $(t,y)$, and then use $f_l(t)$ as the $y$ data. More generally, the high-fidelity model can be written as a function of several variables $(t, y_1, y_2, \ldots)$; the low-fidelity model $f_l$ and, say, some of its derivatives, can then be substituted for these variables. In this paper, we will explore mathematical algorithms for multifidelity information fusion that use such an approach towards improving the representation of the high-fidelity function with only a few training data points. Given that $f_h$ may not be a simple function – and sometimes not even a function – of $f_l$, we demonstrate that using additional functions of $t$, such as derivatives or shifts of $f_l$, can drastically improve the approximation of $f_h$ through Gaussian Processes. We also point out a connection with “embedology” techniques from topology and dynamical systems.
Tasks Gaussian Processes
Published 2018-12-16
URL http://arxiv.org/abs/1812.06467v1
PDF http://arxiv.org/pdf/1812.06467v1.pdf
PWC https://paperswithcode.com/paper/linking-gaussian-process-regression-with-data
Repo
Framework

Models for Capturing Temporal Smoothness in Evolving Networks for Learning Latent Representation of Nodes

Title Models for Capturing Temporal Smoothness in Evolving Networks for Learning Latent Representation of Nodes
Authors Tanay Kumar Saha, Thomas Williams, Mohammad Al Hasan, Shafiq Joty, Nicholas K. Varberg
Abstract In a dynamic network, the neighborhoods of the vertices evolve across different temporal snapshots of the network. Accurate modeling of this temporal evolution can help solve complex tasks involving real-life social and interaction networks. However, existing models for learning latent representation are inadequate for obtaining the representation vectors of the vertices for different time-stamps of a dynamic network in a meaningful way. In this paper, we propose latent representation learning models for dynamic networks which overcome the above limitation by considering two different kinds of temporal smoothness: (i) retrofitted, and (ii) linear transformation. The retrofitted model tracks the representation vector of a vertex over time, facilitating vertex-based temporal analysis of a network. On the other hand, the linear transformation based model provides a smooth transition operator which maps the representation vectors of all vertices from one temporal snapshot to the next (unobserved) snapshot; this facilitates prediction of the state of a network at a future time-stamp. We validate the performance of our proposed models by employing them for solving the temporal link prediction task. Experiments on 9 real-life networks from various domains validate that the proposed models are significantly better than the existing models for predicting the dynamics of an evolving network.
Tasks Link Prediction, Representation Learning
Published 2018-04-16
URL http://arxiv.org/abs/1804.05816v1
PDF http://arxiv.org/pdf/1804.05816v1.pdf
PWC https://paperswithcode.com/paper/models-for-capturing-temporal-smoothness-in
Repo
Framework

Doc2Im: document to image conversion through self-attentive embedding

Title Doc2Im: document to image conversion through self-attentive embedding
Authors Mithun Das Gupta
Abstract Text classification is a fundamental task in NLP applications. The latest research in this field has largely been divided into two major sub-fields. Learning representations is one sub-field; learning deeper models, both sequential and convolutional, which again connects back to the representation, is the other. We posit the idea that the stronger the representation is, the simpler the classifier models needed to achieve higher performance. In this paper we propose a completely novel direction to text classification research, wherein we convert text to a representation very similar to images, such that any deep network able to handle images is equally able to handle text. We take a deeper look at the representation of documents as an image and subsequently utilize very simple convolution based models taken as is from the computer vision domain. This image can be cropped, re-scaled, re-sampled and augmented just like any other image to work with most of the state-of-the-art large convolution based models which have been designed to handle large image datasets. We show impressive results with some of the latest benchmarks in the related fields. We perform transfer learning experiments, both from text to text domain and also from image to text domain. We believe this is a paradigm shift from the way document understanding and text classification have traditionally been done, and will drive numerous novel research ideas in the community.
Tasks Document To Image Conversion, Text Classification, Transfer Learning
Published 2018-11-08
URL http://arxiv.org/abs/1811.03291v1
PDF http://arxiv.org/pdf/1811.03291v1.pdf
PWC https://paperswithcode.com/paper/doc2im-document-to-image-conversion-through
Repo
Framework

In-depth Question classification using Convolutional Neural Networks

Title In-depth Question classification using Convolutional Neural Networks
Authors Prudhvi Raj Dachapally, Srikanth Ramanam
Abstract Convolutional neural networks for computer vision are fairly intuitive. In a typical CNN used in image classification, the first layers learn edges, and the following layers learn some filters that can identify an object. But CNNs for Natural Language Processing are not used often and are not completely intuitive. We have a good idea about what the convolution filters learn for the task of text classification, and to that end, we propose a neural network structure that will be able to give good results in less time. We will be using convolutional neural networks to predict the primary or broader topic of a question, and then use separate networks for each of these predicted topics to accurately classify their sub-topics.
Tasks Image Classification, Text Classification
Published 2018-03-31
URL http://arxiv.org/abs/1804.00968v1
PDF http://arxiv.org/pdf/1804.00968v1.pdf
PWC https://paperswithcode.com/paper/in-depth-question-classification-using
Repo
Framework

Learning To Simulate

Title Learning To Simulate
Authors Nataniel Ruiz, Samuel Schulter, Manmohan Chandraker
Abstract Simulation is a useful tool in situations where training data for machine learning models is costly to annotate or even hard to acquire. In this work, we propose a reinforcement learning-based method for automatically adjusting the parameters of any (non-differentiable) simulator, thereby controlling the distribution of synthesized data in order to maximize the accuracy of a model trained on that data. In contrast to prior art that hand-crafts these simulation parameters or adjusts only parts of the available parameters, our approach fully controls the simulator with the actual underlying goal of maximizing accuracy, rather than mimicking the real data distribution or randomly generating a large volume of data. We find that our approach (i) quickly converges to the optimal simulation parameters in controlled experiments and (ii) can indeed discover good sets of parameters for an image rendering simulator in actual computer vision applications.
Tasks
Published 2018-10-05
URL https://arxiv.org/abs/1810.02513v2
PDF https://arxiv.org/pdf/1810.02513v2.pdf
PWC https://paperswithcode.com/paper/learning-to-simulate
Repo
Framework

Modular Mechanistic Networks: On Bridging Mechanistic and Phenomenological Models with Deep Neural Networks in Natural Language Processing

Title Modular Mechanistic Networks: On Bridging Mechanistic and Phenomenological Models with Deep Neural Networks in Natural Language Processing
Authors Simon Dobnik, John D. Kelleher
Abstract Natural language processing (NLP) can be done using either top-down (theory driven) or bottom-up (data driven) approaches, which we call mechanistic and phenomenological respectively. The approaches are frequently considered to stand in opposition to each other. Examining some recent approaches in deep learning we argue that deep neural networks incorporate both perspectives and, furthermore, that leveraging this aspect of deep learning may help in solving complex problems within language technology, such as modelling language and perception in the domain of spatial cognition.
Tasks
Published 2018-07-21
URL http://arxiv.org/abs/1807.09844v2
PDF http://arxiv.org/pdf/1807.09844v2.pdf
PWC https://paperswithcode.com/paper/modular-mechanistic-networks-on-bridging
Repo
Framework

Nonparametric Estimation of Low Rank Matrix Valued Function

Title Nonparametric Estimation of Low Rank Matrix Valued Function
Authors Fan Zhou
Abstract Let $A:[0,1]\rightarrow\mathbb{H}_m$ (the space of Hermitian matrices) be a matrix valued function which is low rank with entries in Hölder class $\Sigma(\beta,L)$. The goal of this paper is to study statistical estimation of $A$ based on the regression model $\mathbb{E}(Y_j \mid \tau_j, X_j) = \langle A(\tau_j), X_j \rangle,$ where $\tau_j$ are i.i.d. uniformly distributed in $[0,1]$, $X_j$ are i.i.d. matrix completion sampling matrices, $Y_j$ are independent bounded responses. We propose an innovative nuclear norm penalized local polynomial estimator and establish an upper bound on its point-wise risk measured by Frobenius norm. Then we extend this estimator globally and prove an upper bound on its integrated risk measured by $L_2$-norm. We also propose another new estimator based on bias-reducing kernels to study the case when $A$ is not necessarily low rank and establish an upper bound on its risk measured by $L_{\infty}$-norm. We show that the obtained rates are all optimal up to some logarithmic factor in minimax sense. Finally, we propose an adaptive estimation procedure based on Lepskii’s method and model selection with data splitting which is computationally efficient and can be easily implemented and parallelized.
Tasks Matrix Completion, Model Selection
Published 2018-02-17
URL http://arxiv.org/abs/1802.06292v3
PDF http://arxiv.org/pdf/1802.06292v3.pdf
PWC https://paperswithcode.com/paper/nonparametric-estimation-of-low-rank-matrix
Repo
Framework

Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

Title Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization
Authors Blake Woodworth, Jialei Wang, Adam Smith, Brendan McMahan, Nathan Srebro
Abstract We suggest a general oracle-based framework that captures different parallel stochastic optimization settings described by a dependency graph, and derive generic lower bounds in terms of this graph. We then use the framework and derive lower bounds for several specific parallel optimization settings, including delayed updates and parallel processing with intermittent communication. We highlight gaps between lower and upper bounds on the oracle complexity, and cases where the “natural” algorithms are not known to be optimal.
Tasks Stochastic Optimization
Published 2018-05-25
URL http://arxiv.org/abs/1805.10222v3
PDF http://arxiv.org/pdf/1805.10222v3.pdf
PWC https://paperswithcode.com/paper/graph-oracle-models-lower-bounds-and-gaps-for
Repo
Framework

Detecting Anomalous Faces with ‘No Peeking’ Autoencoders

Title Detecting Anomalous Faces with ‘No Peeking’ Autoencoders
Authors Anand Bhattad, Jason Rock, David Forsyth
Abstract Detecting anomalous faces has important applications. For example, a system might tell when a train driver is incapacitated by a medical event, and assist in adopting a safe recovery strategy. These applications are demanding, because they require accurate detection of rare anomalies that may be seen only at runtime. Such a setting causes supervised methods to perform poorly. We describe a method for detecting an anomalous face image that meets these requirements. We construct a feature vector that reliably has large entries for anomalous images, then use various simple unsupervised methods to score the image based on the feature. Obvious constructions (autoencoder codes; autoencoder residuals) are defeated by a ‘peeking’ behavior in autoencoders. Our feature construction removes rectangular patches from the image, predicts the likely content of the patch conditioned on the rest of the image using a specially trained autoencoder, then compares the result to the image. High scores suggest that the patch was difficult for an autoencoder to predict, and so is likely anomalous. We demonstrate that our method can identify real anomalous face images in pools of typical images, taken from celeb-A, that are much larger than usual in state-of-the-art experiments. In a control experiment based on our method, another set of normal celebrity images (a ‘typical set’, but not from celeb-A) is not identified as anomalous, which confirms that this is not due to special properties of celeb-A.
Tasks
Published 2018-02-15
URL http://arxiv.org/abs/1802.05798v1
PDF http://arxiv.org/pdf/1802.05798v1.pdf
PWC https://paperswithcode.com/paper/detecting-anomalous-faces-with-no-peeking
Repo
Framework
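
The patch-removal scoring described in the abstract can be sketched in a few lines. The paper uses a specially trained autoencoder as the predictor; here a trivial stand-in that fills the patch with the mean of the visible pixels is swapped in, purely to show the "no peeking" data flow. All names are hypothetical, and images are plain nested lists of grey values.

```python
def patch_score(image, top, left, size, predict_patch):
    """Hide a size x size patch, predict it from the rest, and return the
    mean squared error between the prediction and the true patch."""
    actual = [row[left:left + size] for row in image[top:top + size]]
    # The predictor only ever sees the masked image (None marks hidden pixels).
    masked = [
        [
            None if top <= r < top + size and left <= c < left + size else value
            for c, value in enumerate(row)
        ]
        for r, row in enumerate(image)
    ]
    predicted = predict_patch(masked, top, left, size)
    diffs = [
        (p - a) ** 2
        for p_row, a_row in zip(predicted, actual)
        for p, a in zip(p_row, a_row)
    ]
    return sum(diffs) / len(diffs)


def context_mean_predictor(masked, top, left, size):
    """Stand-in predictor: fill the patch with the mean of all visible pixels."""
    visible = [v for row in masked for v in row if v is not None]
    mean = sum(visible) / len(visible)
    return [[mean] * size for _ in range(size)]


# A flat 4x4 image, and a copy with an unexpected bright 2x2 block.
typical = [[0.0] * 4 for _ in range(4)]
unusual = [row[:] for row in typical]
for r in (1, 2):
    for c in (1, 2):
        unusual[r][c] = 1.0

typical_score = patch_score(typical, 1, 1, 2, context_mean_predictor)
unusual_score = patch_score(unusual, 1, 1, 2, context_mean_predictor)
```

A patch that the context cannot explain receives the higher score, which matches the abstract's reading that high scores indicate likely anomalies.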

Temporal Interpolation as an Unsupervised Pretraining Task for Optical Flow Estimation

Title Temporal Interpolation as an Unsupervised Pretraining Task for Optical Flow Estimation
Authors Jonas Wulff, Michael J. Black
Abstract The difficulty of annotating training data is a major obstacle to using CNNs for low-level tasks in video. Synthetic data often does not generalize to real videos, while unsupervised methods require heuristic losses. Proxy tasks can overcome these issues, and start by training a network for a task for which annotation is easier or which can be trained unsupervised. The trained network is then fine-tuned for the original task using small amounts of ground truth data. Here, we investigate frame interpolation as a proxy task for optical flow. Using real movies, we train a CNN unsupervised for temporal interpolation. Such a network implicitly estimates motion, but cannot handle untextured regions. By fine-tuning on small amounts of ground truth flow, the network can learn to fill in homogeneous regions and compute full optical flow fields. Using this unsupervised pre-training, our network outperforms similar architectures that were trained supervised using synthetic optical flow.
Tasks Optical Flow Estimation
Published 2018-09-21
URL http://arxiv.org/abs/1809.08317v1
PDF http://arxiv.org/pdf/1809.08317v1.pdf
PWC https://paperswithcode.com/paper/temporal-interpolation-as-an-unsupervised
Repo
Framework