February 1, 2020

3513 words 17 mins read

Paper Group AWR 239

Curriculum Learning for Cumulative Return Maximization

Title Curriculum Learning for Cumulative Return Maximization
Authors Francesco Foglino, Christiano Coletto Christakou, Ricardo Luna Gutierrez, Matteo Leonetti
Abstract Curriculum learning has been successfully used in reinforcement learning to accelerate the learning process, through knowledge transfer between tasks of increasing complexity. Critical tasks, in which suboptimal exploratory actions must be minimized, can benefit from curriculum learning, and its ability to shape exploration through transfer. We propose a task sequencing algorithm maximizing the cumulative return, that is, the return obtained by the agent across all the learning episodes. By maximizing the cumulative return, the agent not only aims at achieving high rewards as fast as possible, but also at doing so while limiting suboptimal actions. We experimentally compare our task sequencing algorithm to several popular metaheuristic algorithms for combinatorial optimization, and show that it achieves significantly better performance on the problem of cumulative return maximization. Furthermore, we validate our algorithm on a critical task, optimizing a home controller for a micro energy grid.
Tasks Combinatorial Optimization, Transfer Learning
Published 2019-06-13
URL https://arxiv.org/abs/1906.06178v1
PDF https://arxiv.org/pdf/1906.06178v1.pdf
PWC https://paperswithcode.com/paper/curriculum-learning-for-cumulative-return
Repo https://github.com/francescofoglino/Curriculum-Learning
Framework none
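
The objective described in the abstract, the cumulative return, is easy to state in code. Below is a minimal, hypothetical sketch (not taken from the linked repository): `agent_factory` and `learn_on_task` are assumed stand-ins for creating a fresh learner and running it on one task, and the exhaustive search is only a brute-force baseline for the sequencing problem the paper solves with a dedicated algorithm.

```python
from itertools import permutations

def cumulative_return(agent_factory, task_sequence, learn_on_task):
    """Sum of returns over every learning episode across the curriculum."""
    agent = agent_factory()                      # fresh agent per sequence
    total = 0.0
    for task in task_sequence:
        episode_returns = learn_on_task(agent, task)  # assumed: list of floats
        total += sum(episode_returns)
    return total

def best_sequence_exhaustive(agent_factory, tasks, learn_on_task):
    """Brute-force baseline: the search space grows factorially with the
    number of tasks, which is why dedicated sequencing algorithms exist."""
    return max(permutations(tasks),
               key=lambda s: cumulative_return(agent_factory, s, learn_on_task))
```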

ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection

Title ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection
Authors Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, Noboru Harada, Keisuke Imoto
Abstract This paper introduces a new dataset called “ToyADMOS” designed for anomaly detection in machine operating sounds (ADMOS). To the best of our knowledge, no large-scale datasets are available for ADMOS, although large-scale datasets have contributed to recent advancements in acoustic signal processing. This is because anomalous sound data are difficult to collect. To build a large-scale dataset for ADMOS, we collected anomalous operating sounds of miniature machines (toys) by deliberately damaging them. The released dataset consists of three sub-datasets for machine-condition inspection, fault diagnosis of machines with geometrically fixed tasks, and fault diagnosis of machines with moving tasks. Each sub-dataset includes over 180 hours of normal machine-operating sounds and over 4,000 samples of anomalous sounds collected with four microphones at a 48-kHz sampling rate. The dataset is freely available for download at https://github.com/YumaKoizumi/ToyADMOS-dataset
Tasks Anomaly Detection
Published 2019-08-09
URL https://arxiv.org/abs/1908.03299v1
PDF https://arxiv.org/pdf/1908.03299v1.pdf
PWC https://paperswithcode.com/paper/toyadmos-a-dataset-of-miniature-machine
Repo https://github.com/YumaKoizumi/ToyADMOS-dataset
Framework none
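
For readers who want a starting point with the data, a common ADMOS baseline (not the paper's method) scores a clip by the reconstruction error of an autoencoder trained on normal sounds only; `model` and its `predict` call are assumed placeholders.

```python
import numpy as np
import librosa

def anomaly_score(model, wav_path, sr=48000, n_mels=64):
    """Mean squared log-mel reconstruction error; higher means more anomalous."""
    audio, _ = librosa.load(wav_path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel).T       # (frames, n_mels)
    recon = model.predict(log_mel)             # assumed autoencoder API
    return float(np.mean((log_mel - recon) ** 2))
```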

Statistical Model Aggregation via Parameter Matching

Title Statistical Model Aggregation via Parameter Matching
Authors Mikhail Yurochkin, Mayank Agarwal, Soumya Ghosh, Kristjan Greenewald, Trong Nghia Hoang
Abstract We consider the problem of aggregating models learned from sequestered, possibly heterogeneous datasets. Exploiting tools from Bayesian nonparametrics, we develop a general meta-modeling framework that learns shared global latent structures by identifying correspondences among local model parameterizations. Our proposed framework is model-independent and is applicable to a wide range of model types. After verifying our approach on simulated data, we demonstrate its utility in aggregating Gaussian topic models, hierarchical Dirichlet process based hidden Markov models, and sparse Gaussian processes with applications spanning text summarization, motion capture analysis, and temperature forecasting.
Tasks Gaussian Processes, Motion Capture, Text Summarization, Topic Models
Published 2019-11-01
URL https://arxiv.org/abs/1911.00218v1
PDF https://arxiv.org/pdf/1911.00218v1.pdf
PWC https://paperswithcode.com/paper/statistical-model-aggregation-via-parameter
Repo https://github.com/IBM/SPAHM
Framework none
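
A rough sketch of the matching step at the heart of the method: align the parameter vectors of a local model to global ones by solving an assignment problem over pairwise distances. The paper's Bayesian nonparametric machinery (which can also create new global parameters for unmatched local ones) is deliberately omitted here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def match_and_average(global_params, local_params):
    """global_params, local_params: (k, d) arrays of parameter vectors.
    Returns global parameters averaged with their matched local ones."""
    cost = cdist(global_params, local_params, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)   # Hungarian matching
    merged = global_params.copy()
    merged[rows] = 0.5 * (global_params[rows] + local_params[cols])
    return merged
```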

Deep Learning Based Chatbot Models

Title Deep Learning Based Chatbot Models
Authors Richard Csaky
Abstract A conversational agent (chatbot) is a piece of software that is able to communicate with humans using natural language. Modeling conversation is an important task in natural language processing and artificial intelligence. While chatbots can be used for various tasks, in general they have to understand users’ utterances and provide responses that are relevant to the problem at hand. In my work, I conduct an in-depth survey of recent literature, examining over 70 publications related to chatbots published in the last 3 years. Then, I proceed to make the argument that the very nature of the general conversation domain demands approaches that are different from current state-of-the-art architectures. Based on several examples from the literature I show why current chatbot models fail to take into account enough priors when generating responses and how this affects the quality of the conversation. In the case of chatbots, these priors can be outside sources of information on which the conversation is conditioned, such as the persona or mood of the speakers. In addition to presenting the reasons behind this problem, I propose several ideas on how it could be remedied. The next section focuses on adapting the Transformer model, currently the state of the art in neural machine translation, to the chatbot domain. I first present experiments with the vanilla model, using conversations extracted from the Cornell Movie-Dialog Corpus. Secondly, I augment the model with some of my ideas regarding the issues of encoder-decoder architectures. More specifically, I feed additional features into the model, such as mood or persona, together with the raw conversation data. Finally, I conduct a detailed analysis of how the vanilla model performs on conversational data by comparing it to previous chatbot models and how the additional features affect the quality of the generated responses.
Tasks Chatbot, Language Modelling, Machine Translation
Published 2019-08-23
URL https://arxiv.org/abs/1908.08835v1
PDF https://arxiv.org/pdf/1908.08835v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-chatbot-models
Repo https://github.com/ricsinaruto/Seq2seqChatbots
Framework tf
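
One simple way to condition a seq2seq chatbot on outside priors such as mood or persona, in the spirit of the abstract, is to prepend feature tokens to the source sequence so the encoder can attend to them. The token names below are illustrative and not the repository's actual vocabulary.

```python
def add_feature_tokens(source_tokens, mood=None, persona=None):
    """Prepend illustrative conditioning tokens to a tokenized utterance."""
    prefix = []
    if persona is not None:
        prefix.append(f"<persona:{persona}>")
    if mood is not None:
        prefix.append(f"<mood:{mood}>")
    return prefix + source_tokens

print(add_feature_tokens(["how", "are", "you", "?"], mood="cheerful"))
# ['<mood:cheerful>', 'how', 'are', 'you', '?']
```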

Data assimilation in Agent-based models using creation and annihilation operators

Title Data assimilation in Agent-based models using creation and annihilation operators
Authors Daniel Tang
Abstract Agent-based models are a powerful tool for studying the behaviour of complex systems that can be described in terms of multiple, interacting “agents”. However, because of their inherently discrete and often highly non-linear nature, it is very difficult to reason about the relationship between the state of the model, on the one hand, and our observations of the real world on the other. In this paper we consider agents that have a discrete set of states and, at any instant, act with a probability that may depend on the environment or the state of other agents. Given this, we show how the mathematical apparatus of quantum field theory can be used to reason probabilistically about the state and dynamics of the model, and describe an algorithm to update our belief in the state of the model in the light of new, real-world observations. Using a simple predator-prey model on a 2-dimensional spatial grid as an example, we demonstrate the assimilation of incomplete, noisy observations and show that this leads to an increase in the mutual information between the actual state of the observed system and the posterior distribution given the observations, when compared to a null model.
Tasks
Published 2019-10-08
URL https://arxiv.org/abs/1910.09442v1
PDF https://arxiv.org/pdf/1910.09442v1.pdf
PWC https://paperswithcode.com/paper/data-assimilation-in-agent-based-models-using
Repo https://github.com/deselby-research/ProbabilisticABM
Framework none
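
The paper reasons over occupation-number states with creation and annihilation operators; as a generic stand-in for the same belief-update idea, the sketch below reweights an ensemble of simulated model states by the likelihood of a noisy scalar observation (a particle-filter step, not the paper's operator formalism).

```python
import numpy as np

def assimilate(particles, weights, observe, observation, noise_std):
    """particles: list of model states; observe(state) -> predicted scalar."""
    predicted = np.array([observe(p) for p in particles])
    lik = np.exp(-0.5 * ((predicted - observation) / noise_std) ** 2)
    posterior = weights * lik                  # Bayes rule, up to a constant
    return posterior / posterior.sum()         # renormalized belief
```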

A Capsule Network-based Model for Learning Node Embeddings

Title A Capsule Network-based Model for Learning Node Embeddings
Authors Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, Dinh Phung
Abstract In this paper, we focus on learning low-dimensional embeddings of entity nodes from graph-structured data, where we can use the learned node embeddings for a downstream node classification task. Existing node embedding models often struggle to exploit graph information when inferring plausible embeddings of unseen nodes. To address this issue, we propose Caps2NE, a new unsupervised embedding model using a network of two capsule layers. Given a target node and its context nodes, Caps2NE applies a routing process to aggregate features of the context nodes at the first capsule layer, then feeds these features into the second capsule layer to produce an embedding vector. This embedding vector is then used to infer a plausible embedding for the target node. Experimental results for the node classification task on six well-known benchmark datasets show that Caps2NE obtains state-of-the-art performance.
Tasks Node Classification
Published 2019-11-12
URL https://arxiv.org/abs/1911.04822v1
PDF https://arxiv.org/pdf/1911.04822v1.pdf
PWC https://paperswithcode.com/paper/a-capsule-network-based-model-for-learning
Repo https://github.com/daiquocnguyen/Caps2NE
Framework tf
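
A simplified sketch of the aggregation idea: combine the feature vectors of a target node's context nodes into one output capsule with dynamic routing (a squash nonlinearity plus agreement-based coupling updates). This compresses the paper's two-capsule-layer architecture into a single routing step.

```python
import numpy as np

def squash(v):
    norm2 = np.sum(v ** 2)
    return (norm2 / (1.0 + norm2)) * v / np.sqrt(norm2 + 1e-9)

def route(context_vecs, iterations=3):
    """context_vecs: (n, d) array of context-node features."""
    logits = np.zeros(len(context_vecs))
    for _ in range(iterations):
        c = np.exp(logits) / np.exp(logits).sum()    # coupling coefficients
        v = squash((c[:, None] * context_vecs).sum(axis=0))
        logits += context_vecs @ v                   # agreement update
    return v                                         # aggregated capsule
```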

Identification and Estimation of Hierarchical Latent Attribute Models

Title Identification and Estimation of Hierarchical Latent Attribute Models
Authors Yuqi Gu, Gongjun Xu
Abstract Hierarchical Latent Attribute Models (HLAMs) are a popular family of discrete latent variable models widely used in social and biological sciences. The key ingredients of an HLAM include a binary structural matrix specifying how the observed variables depend on the latent attributes, and also certain hierarchical constraints on allowable configurations of the latent attributes. This paper studies the theoretical identifiability issue and the practical estimation problem of HLAMs. For identification, the challenging problem of identifiability under a complex hierarchy is addressed and sufficient and almost necessary identification conditions are proposed. For estimation, a scalable algorithm for estimating both the structural matrix and the attribute hierarchy is developed. The superior performance of the proposed algorithm is demonstrated in various experimental settings, including both synthetic data and a real dataset from an international educational assessment.
Tasks Latent Variable Models
Published 2019-06-19
URL https://arxiv.org/abs/1906.07869v1
PDF https://arxiv.org/pdf/1906.07869v1.pdf
PWC https://paperswithcode.com/paper/identification-and-estimation-of-hierarchical
Repo https://github.com/zhenkewu/slamR
Framework none
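
To make the two key ingredients concrete, here is a small sketch assuming a DINA-style conjunctive response rule (the paper covers a broader model family): the binary structural matrix says which attributes each item requires, and the hierarchy restricts which attribute profiles are allowable.

```python
import numpy as np

def ideal_responses(Q, profile):
    """Q: (items, attrs) binary structural matrix; profile: (attrs,) 0/1."""
    return (Q @ profile >= Q.sum(axis=1)).astype(int)  # all required attrs

def respects_hierarchy(profile, prereq):
    """prereq: (a, b) pairs meaning attribute b requires attribute a."""
    return all(profile[a] >= profile[b] for a, b in prereq)

Q = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 1]])
print(ideal_responses(Q, np.array([1, 1, 0])))          # [1 1 0]
print(respects_hierarchy([1, 1, 0], [(0, 1), (1, 2)]))  # True
```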

Learning without feedback: Direct random target projection as a feedback-alignment algorithm with layerwise feedforward training

Title Learning without feedback: Direct random target projection as a feedback-alignment algorithm with layerwise feedforward training
Authors Charlotte Frenkel, Martin Lefebvre, David Bol
Abstract While the backpropagation of error algorithm allowed for a rapid rise in the development and deployment of artificial neural networks, two key issues currently preclude biological plausibility: (i) symmetry is required between forward and backward weights, which is known as the weight transport problem, and (ii) updates are locked before both the forward and backward passes have been completed. The feedback alignment (FA) algorithm uses fixed random feedback weights to relax the weight transport problem. The direct feedback alignment (DFA) variation directly propagates the output error to each hidden layer through fixed random connectivity matrices. In this work, we show that using only the error sign is sufficient to maintain feedback alignment and to provide learning in the hidden layers. As the error sign information is already contained in the target vector in classification problems, using the latter as a proxy for the error brings three advantages: (i) it solves the weight transport problem by eliminating the requirement for an explicit feedback pathway, which also reduces the computational workload, (ii) it reduces memory requirements by removing update locking, allowing weight updates to be computed in each layer independently without requiring a full forward pass, and (iii) it leads to a purely feedforward and low-cost algorithm that only requires a label-dependent random vector selection to estimate the layerwise loss gradients. We therefore propose the direct random target projection (DRTP) algorithm and demonstrate on the MNIST and CIFAR-10 datasets that, despite the absence of an explicit error feedback, DRTP performance can still lie close to that of BP, FA and DFA. The low memory and computational cost of DRTP and its reliance only on layerwise feedforward computation make it suitable for deployment in adaptive edge computing devices.
Tasks
Published 2019-09-03
URL https://arxiv.org/abs/1909.01311v1
PDF https://arxiv.org/pdf/1909.01311v1.pdf
PWC https://paperswithcode.com/paper/learning-without-feedback-direct-random
Repo https://github.com/ChFrenkel/DirectRandomTargetProjection
Framework pytorch
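
The core DRTP update is compact enough to sketch for a single hidden layer: the one-hot label is projected through a fixed random matrix to form the layerwise modulatory signal, so no error is propagated back from the output. Shapes and the exact local loss here follow the abstract only loosely; see the repository for the real implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_classes = 784, 256, 10
W = rng.normal(0.0, 0.1, (n_in, n_hidden))       # trainable layer weights
B = rng.normal(0.0, 0.1, (n_classes, n_hidden))  # fixed random projection

def drtp_step(W, x, y_onehot, lr=0.01):
    """One purely feedforward, local update for a single hidden layer."""
    h = np.maximum(0.0, x @ W)            # forward pass (ReLU activations)
    delta = (y_onehot @ B) * (h > 0.0)    # label projected through fixed B
    return W + lr * np.outer(x, delta), h

W, h = drtp_step(W, rng.normal(size=n_in), np.eye(n_classes)[3])
```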

Structural Neural Encoders for AMR-to-text Generation

Title Structural Neural Encoders for AMR-to-text Generation
Authors Marco Damonte, Shay B. Cohen
Abstract AMR-to-text generation is a problem recently introduced to the NLP community, in which the goal is to generate sentences from Abstract Meaning Representation (AMR) graphs. Sequence-to-sequence models can be used to this end by converting the AMR graphs to strings. Approaching the problem while working directly with graphs requires the use of graph-to-sequence models that encode the AMR graph into a vector representation. Such encoding has been shown to be beneficial in the past, and unlike sequential encoding, it allows us to explicitly capture reentrant structures in the AMR graphs. We investigate the extent to which reentrancies (nodes with multiple parents) have an impact on AMR-to-text generation by comparing graph encoders to tree encoders, where reentrancies are not preserved. We show that improvements in the treatment of reentrancies and long-range dependencies contribute to higher overall scores for graph encoders. Our best model achieves 24.40 BLEU on LDC2015E86, outperforming the state of the art by 1.1 points and 24.54 BLEU on LDC2017T10, outperforming the state of the art by 1.24 points.
Tasks Graph-to-Sequence, Text Generation
Published 2019-03-27
URL https://arxiv.org/abs/1903.11410v2
PDF https://arxiv.org/pdf/1903.11410v2.pdf
PWC https://paperswithcode.com/paper/structural-neural-encoders-for-amr-to-text
Repo https://github.com/mdtux89/OpenNMT-py
Framework pytorch
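
Why a graph encoder can preserve reentrancies is easy to see in a toy, generic message-passing layer (not the paper's exact architecture): a node with two parents receives messages from both in a single adjacency-based update, whereas converting the graph to a tree would duplicate that node.

```python
import numpy as np

def graph_encode(node_feats, edges, W):
    """node_feats: (n, d); edges: (parent, child) pairs; W: (d, d)."""
    n = node_feats.shape[0]
    A = np.eye(n)                          # self-loops
    for parent, child in edges:
        A[child, parent] = 1.0             # a child sees every parent
    A /= A.sum(axis=1, keepdims=True)      # mean aggregation
    return np.tanh(A @ node_feats @ W)

# Node 2 is reentrant: both node 0 and node 1 point to it.
feats = np.random.default_rng(0).normal(size=(3, 4))
encoded = graph_encode(feats, [(0, 2), (1, 2)], np.eye(4))
```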

Statistical Loss and Analysis for Deep Learning in Hyperspectral Image Classification

Title Statistical Loss and Analysis for Deep Learning in Hyperspectral Image Classification
Authors Zhiqiang Gong, Ping Zhong, Weidong Hu
Abstract Nowadays, deep learning methods, especially convolutional neural networks (CNNs), have shown impressive performance in extracting abstract and high-level features from hyperspectral images. However, the general training process of CNNs mainly considers pixel-wise information or the correlation between samples to formulate the penalization, while ignoring the statistical properties of each class in the hyperspectral image, especially its spectral variability. These sample-based penalizations make the training process uncertain when the training samples are imbalanced and limited. To overcome this problem, this work characterizes each class in the hyperspectral image as a statistical distribution and develops a novel statistical loss defined over these distributions, rather than directly over samples, for deep learning. Based on the Fisher discrimination criterion, the loss penalizes the sample variance of each class distribution to decrease the intra-class variance of the training samples. Moreover, an additional diversity-promoting condition is added to enlarge the inter-class variance between different class distributions, which better discriminates samples from different classes in the hyperspectral image. Finally, a statistical estimation form of the loss is developed from the training samples through multivariate statistical analysis. Experiments on real-world hyperspectral images show the effectiveness of the developed statistical loss for deep learning.
Tasks Hyperspectral Image Classification, Image Classification
Published 2019-12-28
URL https://arxiv.org/abs/1912.12385v2
PDF https://arxiv.org/pdf/1912.12385v2.pdf
PWC https://paperswithcode.com/paper/statistical-loss-and-analysis-for-deep
Repo https://github.com/shendu-sw/statistical-loss
Framework none
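
A sketch of the loss idea under simplifying assumptions (the weighting and the exact estimator differ from the paper's formulation): penalize intra-class variance of deep features, per the Fisher criterion, and add a diversity-promoting term that pushes class means apart.

```python
import numpy as np

def statistical_loss(features, labels, margin=1.0):
    """features: (n, d) deep features; labels: (n,) integer class ids."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    intra = np.mean([features[labels == c].var(axis=0).sum() for c in classes])
    dists = np.linalg.norm(means[:, None] - means[None, :], axis=-1)
    iu = np.triu_indices(len(classes), k=1)             # each class pair once
    inter = np.maximum(0.0, margin - dists[iu]).mean()  # hinge on mean gaps
    return intra + inter
```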

Total Least Squares Regression in Input Sparsity Time

Title Total Least Squares Regression in Input Sparsity Time
Authors Huaian Diao, Zhao Song, David P. Woodruff, Xin Yang
Abstract In the total least squares problem, one is given an $m \times n$ matrix $A$, and an $m \times d$ matrix $B$, and one seeks to “correct” both $A$ and $B$, obtaining matrices $\hat{A}$ and $\hat{B}$, so that there exists an $X$ satisfying the equation $\hat{A}X = \hat{B}$. Typically the problem is overconstrained, meaning that $m \gg \max(n,d)$. The cost of the solution $\hat{A}, \hat{B}$ is given by $\|A-\hat{A}\|_F^2 + \|B - \hat{B}\|_F^2$. We give an algorithm for finding a solution $X$ to the linear system $\hat{A}X=\hat{B}$ for which the cost $\|A-\hat{A}\|_F^2 + \|B-\hat{B}\|_F^2$ is at most a multiplicative $(1+\epsilon)$ factor times the optimal cost, up to an additive error $\eta$ that may be an arbitrarily small function of $n$. Importantly, our running time is $\tilde{O}( \mathrm{nnz}(A) + \mathrm{nnz}(B) ) + \mathrm{poly}(n/\epsilon) \cdot d$, where for a matrix $C$, $\mathrm{nnz}(C)$ denotes its number of non-zero entries; in particular, the running time does not directly depend on the large parameter $m$. As total least squares regression is known to be solvable via low rank approximation, a natural approach is to invoke fast algorithms for approximate low rank approximation, obtaining matrices $\hat{A}$ and $\hat{B}$ from this low rank approximation, and then solving for $X$ so that $\hat{A}X = \hat{B}$. However, existing algorithms do not apply since in total least squares the rank of the low rank approximation needs to be $n$, and so the running time of known methods would be at least $mn^2$. In contrast, we are able to achieve a much faster running time for finding $X$ by never explicitly forming the equation $\hat{A} X = \hat{B}$, but instead solving for an $X$ which is a solution to an implicit such equation. Finally, we generalize our algorithm to the total least squares problem with regularization.
Tasks
Published 2019-09-27
URL https://arxiv.org/abs/1909.12441v1
PDF https://arxiv.org/pdf/1909.12441v1.pdf
PWC https://paperswithcode.com/paper/total-least-squares-regression-in-input
Repo https://github.com/yangxinuw/total_least_squares_code
Framework none
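
For contrast with the paper's input-sparsity-time algorithm, the classical dense solution (Golub and Van Loan) takes an SVD of the stacked matrix $[A\ B]$ and reads $X$ off the trailing right singular vectors, at a cost of roughly $O(m(n+d)^2)$:

```python
import numpy as np

def total_least_squares(A, B):
    """A: (m, n); B: (m, d). Classical TLS via the SVD of [A B]."""
    n, d = A.shape[1], B.shape[1]
    _, _, Vt = np.linalg.svd(np.hstack([A, B]), full_matrices=True)
    V = Vt.T
    V12, V22 = V[:n, n:], V[n:, n:]       # blocks of the last d columns
    return -V12 @ np.linalg.inv(V22)

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5)); X_true = rng.normal(size=(5, 2))
B = A @ X_true + 0.01 * rng.normal(size=(100, 2))
print(np.allclose(total_least_squares(A, B), X_true, atol=0.1))  # True
```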

FastDepth: Fast Monocular Depth Estimation on Embedded Systems

Title FastDepth: Fast Monocular Depth Estimation on Embedded Systems
Authors Diana Wofk, Fangchang Ma, Tien-Ju Yang, Sertac Karaman, Vivienne Sze
Abstract Depth sensing is a critical function for robotic tasks such as localization, mapping and obstacle detection. There has been a significant and growing interest in depth estimation from a single RGB image, due to the relatively low cost and size of monocular cameras. However, state-of-the-art single-view depth estimation algorithms are based on fairly complex deep neural networks that are too slow for real-time inference on an embedded platform, for instance, mounted on a micro aerial vehicle. In this paper, we address the problem of fast depth estimation on embedded systems. We propose an efficient and lightweight encoder-decoder network architecture and apply network pruning to further reduce computational complexity and latency. In particular, we focus on the design of a low-latency decoder. Our methodology demonstrates that it is possible to achieve similar accuracy as prior work on depth estimation, but at inference speeds that are an order of magnitude faster. Our proposed network, FastDepth, runs at 178 fps on an NVIDIA Jetson TX2 GPU and at 27 fps when using only the TX2 CPU, with active power consumption under 10 W. FastDepth achieves close to state-of-the-art accuracy on the NYU Depth v2 dataset. To the best of the authors’ knowledge, this paper demonstrates real-time monocular depth estimation using a deep neural network with the lowest latency and highest throughput on an embedded platform that can be carried by a micro aerial vehicle.
Tasks Depth Estimation, Monocular Depth Estimation, Network Pruning
Published 2019-03-08
URL http://arxiv.org/abs/1903.03273v1
PDF http://arxiv.org/pdf/1903.03273v1.pdf
PWC https://paperswithcode.com/paper/fastdepth-fast-monocular-depth-estimation-on
Repo https://github.com/dwofk/fast-depth
Framework pytorch
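
A toy encoder-decoder in the spirit of the design described above (depthwise-separable convolutions and nearest-neighbor upsampling in a low-latency decoder); it is far smaller than the released FastDepth network and meant only to illustrate the cost-saving building blocks.

```python
import torch
import torch.nn as nn

def sep_conv(c_in, c_out, stride=1):
    """Depthwise-separable conv: a cheap substitute for a full 3x3 conv."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, stride, 1, groups=c_in), nn.ReLU(),
        nn.Conv2d(c_in, c_out, 1), nn.ReLU())

class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            sep_conv(3, 32, stride=2), sep_conv(32, 64, stride=2))
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"), sep_conv(64, 32),
            nn.Upsample(scale_factor=2, mode="nearest"), sep_conv(32, 16),
            nn.Conv2d(16, 1, 1))                 # one-channel depth map

    def forward(self, x):
        return self.decoder(self.encoder(x))

depth = TinyDepthNet()(torch.randn(1, 3, 224, 224))   # -> (1, 1, 224, 224)
```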

Transformation of Dense and Sparse Text Representations

Title Transformation of Dense and Sparse Text Representations
Authors Wenpeng Hu, Mengyu Wang, Bing Liu, Feng Ji, Haiqing Chen, Dongyan Zhao, Jinwen Ma, Rui Yan
Abstract Sparsity is regarded as a desirable property of representations, especially in terms of explanation. However, its usage has been limited due to the gap with dense representations. Most NLP research advances in recent years are based on dense representations; thus the desirable property of sparsity cannot be leveraged. Inspired by the Fourier transformation, in this paper we propose a novel Semantic Transformation method to bridge the dense and sparse spaces, which can help NLP research shift from dense space to sparse space or jointly use both spaces. The key idea of the proposed approach is to use a Forward Transformation to transform dense representations into sparse representations. Useful operations can then be performed over the sparse representations in the sparse space, and the sparse representations can be used directly for downstream tasks such as text classification and natural language inference. A Backward Transformation can then be carried out to transform the processed sparse representations back into dense representations. Experiments on text classification tasks and a natural language inference task show that the proposed Semantic Transformation is effective.
Tasks Natural Language Inference, Text Classification
Published 2019-11-07
URL https://arxiv.org/abs/1911.02914v1
PDF https://arxiv.org/pdf/1911.02914v1.pdf
PWC https://paperswithcode.com/paper/transformation-of-dense-and-sparse-text
Repo https://github.com/morning-dews/ST
Framework none
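
A rough sketch of the forward/backward transformation idea, with the learned transforms replaced by a fixed random matrix and top-$k$ thresholding standing in for the paper's sparsification: map a dense vector up to a higher-dimensional sparse view, operate there, then map back down.

```python
import numpy as np

rng = np.random.default_rng(0)
D, S, K = 128, 1024, 32                  # dense dim, sparse dim, sparsity
T = rng.normal(0, S ** -0.5, (D, S))     # stand-in for a learned transform

def forward_transform(dense):
    z = dense @ T
    keep = np.argsort(np.abs(z))[-K:]    # keep the K largest activations
    sparse = np.zeros_like(z)
    sparse[keep] = z[keep]
    return sparse

def backward_transform(sparse):
    return sparse @ T.T                  # approximate inverse map

x = rng.normal(size=D)
x_hat = backward_transform(forward_transform(x))  # dense -> sparse -> dense
```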

Quantum adiabatic machine learning with zooming

Title Quantum adiabatic machine learning with zooming
Authors Alexander Zlokapa, Alex Mott, Joshua Job, Jean-Roch Vlimant, Daniel Lidar, Maria Spiropulu
Abstract Recent work has shown that quantum annealing for machine learning (QAML) can perform comparably to state-of-the-art machine learning methods with a specific application to Higgs boson classification. We propose a variant algorithm (QAML-Z) that iteratively zooms in on a region of the energy surface by mapping the problem to a continuous space and sequentially applying quantum annealing to an augmented set of weak classifiers. Results on a programmable quantum annealer show that QAML-Z increases the performance difference between QAML and classical deep neural networks by over 40% as measured by area under the ROC curve for small training set sizes. Furthermore, QAML-Z reduces the advantage of deep neural networks over QAML for large training sets by around 50%, indicating that QAML-Z produces stronger classifiers that retain the robustness of the original QAML algorithm.
Tasks
Published 2019-08-13
URL https://arxiv.org/abs/1908.04480v1
PDF https://arxiv.org/pdf/1908.04480v1.pdf
PWC https://paperswithcode.com/paper/quantum-adiabatic-machine-learning-with
Repo https://github.com/quantummind/qaml-z
Framework none
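
The zooming schedule from the abstract can be sketched in a few lines; `anneal(center, width)` is an assumed oracle standing in for a call to the programmable annealer, returning spins in {-1, +1} for the augmented weak classifiers.

```python
import numpy as np

def qaml_z(anneal, n_weights, iterations=8):
    """Iteratively refine continuous weights by zooming on the energy surface."""
    center = np.zeros(n_weights)          # continuous weights being refined
    width = 1.0
    for _ in range(iterations):
        spins = anneal(center, width)     # assumed annealer oracle, +/-1
        center = center + width * spins   # shift toward the returned solution
        width *= 0.5                      # halve the search region
    return center
```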

Trade-offs in Large-Scale Distributed Tuplewise Estimation and Learning

Title Trade-offs in Large-Scale Distributed Tuplewise Estimation and Learning
Authors Robin Vogel, Aurélien Bellet, Stephan Clémençon, Ons Jelassi, Guillaume Papa
Abstract The development of cluster computing frameworks has allowed practitioners to scale out various statistical estimation and machine learning algorithms with minimal programming effort. This is especially true for machine learning problems whose objective function is nicely separable across individual data points, such as classification and regression. In contrast, statistical learning tasks involving pairs (or more generally tuples) of data points, such as metric learning, clustering or ranking, do not lend themselves as easily to data-parallelism and in-memory computing. In this paper, we investigate how to balance between statistical performance and computational efficiency in such distributed tuplewise statistical problems. We first propose a simple strategy based on occasionally repartitioning data across workers between parallel computation stages, where the number of repartitioning steps rules the trade-off between accuracy and runtime. We then present some theoretical results highlighting the benefits brought by the proposed method in terms of variance reduction, and extend our results to design distributed stochastic gradient descent algorithms for tuplewise empirical risk minimization. Our results are supported by numerical experiments in pairwise statistical estimation and learning on synthetic and real-world datasets.
Tasks Metric Learning
Published 2019-06-21
URL https://arxiv.org/abs/1906.09234v1
PDF https://arxiv.org/pdf/1906.09234v1.pdf
PWC https://paperswithcode.com/paper/trade-offs-in-large-scale-distributed
Repo https://github.com/RobinVogel/Trade-offs-in-Large-Scale-Distributed-Tuplewise-Estimation-and-Learning
Framework none
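
A sketch of the repartitioning strategy under simplifying assumptions (workers simulated in one process, a scalar pairwise kernel): each stage reshuffles the data across workers and averages the statistic over within-worker pairs only, so more distinct pairs contribute as stages accumulate.

```python
import numpy as np

def distributed_pairwise_mean(data, kernel, n_workers=4, n_stages=3, seed=0):
    """Average a pairwise kernel using only within-worker pairs per stage."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_stages):                        # repartition each stage
        shuffled = data[rng.permutation(len(data))]
        for part in np.array_split(shuffled, n_workers):
            vals = [kernel(part[i], part[j])
                    for i in range(len(part)) for j in range(i + 1, len(part))]
            estimates.append(np.mean(vals))
    return float(np.mean(estimates))

data = np.random.default_rng(1).normal(size=64)
print(distributed_pairwise_mean(data, lambda a, b: abs(a - b)))
```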