January 29, 2020

2937 words 14 mins read

Paper Group ANR 642

Knowing The What But Not The Where in Bayesian Optimization. Tree-Wasserstein Barycenter for Large-Scale Multilevel Clustering and Scalable Bayes. Exact slice sampler for Hierarchical Dirichlet Processes. A Note On $k$-Means Probabilistic Poverty. Self-Attentional Models for Lattice Inputs. Orthogonal Wasserstein GANs. Randomly initialized EM algor …

Knowing The What But Not The Where in Bayesian Optimization

Title Knowing The What But Not The Where in Bayesian Optimization
Authors Vu Nguyen, Michael A. Osborne
Abstract Bayesian optimization has demonstrated impressive success in finding the optimum input x* and output f* = f(x*) = max f(x) of a black-box function f. In some applications, however, the optimum output f* is known in advance and the goal is to find the corresponding optimum input x*. In this paper, we consider a new setting in BO in which the knowledge of the optimum output f* is available. Our goal is to exploit the knowledge about f* to search for the input x* efficiently. To achieve this goal, we first transform the Gaussian process surrogate using the information about the optimum output. Then, we propose two acquisition functions, called confidence bound minimization and expected regret minimization. We show that our approaches work intuitively and give quantitatively better performance than standard BO methods. We demonstrate real applications in tuning a deep reinforcement learning algorithm on the CartPole problem and XGBoost on the Skin Segmentation dataset, in which the optimum values are publicly available.
Tasks
Published 2019-05-07
URL https://arxiv.org/abs/1905.02685v4
PDF https://arxiv.org/pdf/1905.02685v4.pdf
PWC https://paperswithcode.com/paper/knowing-the-what-but-not-the-where-in
Repo
Framework
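
The known-optimum setting above lends itself to a compact illustration. Below is a minimal sketch of a confidence-bound-style acquisition that exploits a known f*: among candidates, pick the point whose upper confidence bound sits closest to f*. The GP surrogate, the Matern kernel, the kappa value, and the toy objective are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):
    # Toy black-box with a known optimum f* = 1.0 at x* = 0.5 (illustrative only).
    return np.exp(-(x - 0.5) ** 2 / 0.01)

def cbm_acquisition(gp, X_cand, f_star, kappa=2.0):
    # Score candidates by how close their upper confidence bound is to the known f*.
    mu, sigma = gp.predict(X_cand, return_std=True)
    return np.abs(mu + kappa * sigma - f_star)  # smaller is better

f_star = 1.0
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(5, 1))
y = f(X).ravel()

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    X_cand = np.linspace(0, 1, 200).reshape(-1, 1)
    x_next = X_cand[np.argmin(cbm_acquisition(gp, X_cand, f_star))]
    X = np.vstack([X, x_next.reshape(1, 1)])
    y = np.append(y, f(x_next))

print("best input found:", X[np.argmax(y)])
```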

Tree-Wasserstein Barycenter for Large-Scale Multilevel Clustering and Scalable Bayes

Title Tree-Wasserstein Barycenter for Large-Scale Multilevel Clustering and Scalable Bayes
Authors Tam Le, Viet Huynh, Nhat Ho, Dinh Phung, Makoto Yamada
Abstract We study in this paper a variant of Wasserstein barycenter problem, which we refer to as tree-Wasserstein barycenter, by leveraging a specific class of ground metrics, namely tree metrics, for Wasserstein distance. Drawing on the tree structure, we propose an efficient algorithmic approach to solve the tree-Wasserstein barycenter and its variants. The proposed approach is not only fast for computation but also efficient for memory usage. Exploiting the tree-Wasserstein barycenter and its variants, we scale up multi-level clustering and scalable Bayes, especially for large-scale applications where the number of supports in probability measures is large. Empirically, we test our proposed approach against other baselines on large-scale synthetic and real datasets.
Tasks
Published 2019-10-10
URL https://arxiv.org/abs/1910.04483v3
PDF https://arxiv.org/pdf/1910.04483v3.pdf
PWC https://paperswithcode.com/paper/on-scalable-variant-of-wasserstein-barycenter
Repo
Framework
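
For context on what tree metrics buy computationally, here is a hedged sketch of the closed-form tree-Wasserstein distance between two measures supported on the nodes of a rooted tree: a weighted sum over edges of the absolute difference in subtree masses. The toy tree, the `parent` encoding, and the helper names are assumptions for illustration; the paper's barycenter algorithm builds on this distance rather than being reproduced here.

```python
# Rooted tree encoded as child -> (parent, edge_weight); node 0 is the root.
parent = {1: (0, 1.0), 2: (0, 2.0), 3: (1, 0.5), 4: (1, 0.5)}

def subtree_mass(node, mass, children):
    # Total probability mass in the subtree rooted at `node`.
    total = mass[node]
    for child in children.get(node, []):
        total += subtree_mass(child, mass, children)
    return total

def tree_wasserstein(mu, nu, parent):
    children = {}
    for child, (par, _) in parent.items():
        children.setdefault(par, []).append(child)
    # Closed form: sum over edges of weight times |difference of subtree masses below the edge|.
    return sum(
        w * abs(subtree_mass(child, mu, children) - subtree_mass(child, nu, children))
        for child, (_, w) in parent.items()
    )

mu = {0: 0.0, 1: 0.2, 2: 0.3, 3: 0.5, 4: 0.0}
nu = {0: 0.1, 1: 0.0, 2: 0.4, 3: 0.2, 4: 0.3}
print(tree_wasserstein(mu, nu, parent))
```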

Exact slice sampler for Hierarchical Dirichlet Processes

Title Exact slice sampler for Hierarchical Dirichlet Processes
Authors Arash A. Amini, Marina Paez, Lizhen Lin, Zahra S. Razaee
Abstract We propose an exact slice sampler for Hierarchical Dirichlet process (HDP) and its associated mixture models (Teh et al., 2006). Although there are existing MCMC algorithms for sampling from the HDP, a slice sampler has been missing from the literature. Slice sampling is well-known for its desirable properties including its fast mixing and its natural potential for parallelization. On the other hand, the hierarchical nature of HDPs poses challenges to adopting a full-fledged slice sampler that automatically truncates all the infinite measures involved without ad-hoc modifications. In this work, we adopt the powerful idea of Bayesian variable augmentation to address this challenge. By introducing new latent variables, we obtain a full factorization of the joint distribution that is suitable for slice sampling. Our algorithm has several appealing features such as (1) fast mixing; (2) remaining exact while allowing natural truncation of the underlying infinite-dimensional measures, as in (Kalli et al., 2011), resulting in updates of only a finite number of necessary atoms and weights in each iteration; and (3) being naturally suited to parallel implementations. The underlying principle for joint factorization of the full likelihood is simple and can be applied to many other settings, such as designing sampling algorithms for general dependent Dirichlet process (DDP) models.
Tasks
Published 2019-03-21
URL http://arxiv.org/abs/1903.08829v1
PDF http://arxiv.org/pdf/1903.08829v1.pdf
PWC https://paperswithcode.com/paper/exact-slice-sampler-for-hierarchical
Repo
Framework
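
As background for the entry above, the sketch below shows a generic univariate slice-sampling step (stepping-out plus shrinkage). It is only the basic move that the HDP sampler generalizes via variable augmentation, not the paper's hierarchical construction; the step size `w` and the standard-normal example are illustrative.

```python
import numpy as np

def slice_sample_step(logp, x, w=1.0, rng=np.random.default_rng()):
    # One slice-sampling update for a univariate log-density `logp`,
    # using stepping-out to bracket the slice and shrinkage to sample from it.
    log_u = logp(x) + np.log(rng.uniform())        # level of the horizontal slice
    left = x - w * rng.uniform()
    right = left + w
    while logp(left) > log_u:                      # step out to the left
        left -= w
    while logp(right) > log_u:                     # step out to the right
        right += w
    while True:                                    # shrink until an accepted point
        x_new = rng.uniform(left, right)
        if logp(x_new) > log_u:
            return x_new
        if x_new < x:
            left = x_new
        else:
            right = x_new

# Example: draw from a standard normal density (up to a constant).
def logp(x):
    return -0.5 * x ** 2

samples, x = [], 0.0
for _ in range(5000):
    x = slice_sample_step(logp, x)
    samples.append(x)
print(np.mean(samples), np.std(samples))
```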

A Note On $k$-Means Probabilistic Poverty

Title A Note On $k$-Means Probabilistic Poverty
Authors Mieczysław A. Kłopotek
Abstract It is proven, by example, that the version of $k$-means with random initialization does not have the property \emph{probabilistic $k$-richness}.
Tasks
Published 2019-09-28
URL https://arxiv.org/abs/1910.00413v1
PDF https://arxiv.org/pdf/1910.00413v1.pdf
PWC https://paperswithcode.com/paper/a-note-on-k-means-probabilistic-poverty
Repo
Framework

Self-Attentional Models for Lattice Inputs

Title Self-Attentional Models for Lattice Inputs
Authors Matthias Sperber, Graham Neubig, Ngoc-Quan Pham, Alex Waibel
Abstract Lattices are an efficient and effective method to encode ambiguity of upstream systems in natural language processing tasks, for example to compactly capture multiple speech recognition hypotheses, or to represent multiple linguistic analyses. Previous work has extended recurrent neural networks to model lattice inputs and achieved improvements in various tasks, but these models suffer from very slow computation speeds. This paper extends the recently proposed paradigm of self-attention to handle lattice inputs. Self-attention is a sequence modeling technique that relates inputs to one another by computing pairwise similarities and has gained popularity for both its strong results and its computational efficiency. To extend such models to handle lattices, we introduce probabilistic reachability masks that incorporate lattice structure into the model and support lattice scores if available. We also propose a method for adapting positional embeddings to lattice structures. We apply the proposed model to a speech translation task and find that it outperforms all examined baselines while being much faster to compute than previous neural lattice models during both training and inference.
Tasks Speech Recognition
Published 2019-06-04
URL https://arxiv.org/abs/1906.01617v1
PDF https://arxiv.org/pdf/1906.01617v1.pdf
PWC https://paperswithcode.com/paper/self-attentional-models-for-lattice-inputs
Repo
Framework
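
A rough illustration of the masking idea for the lattice self-attention entry above: build a reachability mask from the lattice edges and use it to block attention between nodes that do not lie on a common path. The transitive-closure construction and the plain scaled dot-product attention are assumptions for illustration; the paper's masks are probabilistic and can also incorporate lattice scores.

```python
import numpy as np

def reachability_mask(n, edges):
    # mask[i, j] = 1 if node j is reachable from node i (or i == j).
    adj = np.eye(n, dtype=int)
    for i, j in edges:
        adj[i, j] = 1
    reach = adj.copy()
    for _ in range(n):                              # crude transitive closure
        reach = np.clip(reach @ adj + reach, 0, 1)
    return reach.astype(float)

def masked_self_attention(X, mask):
    # Plain scaled dot-product self-attention with blocked (unreachable) pairs.
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)
    scores = np.where(mask > 0, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X

# Tiny lattice with two alternative hypotheses: 0 -> 1 -> 3 and 0 -> 2 -> 3.
edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
X = np.random.default_rng(0).normal(size=(4, 8))
out = masked_self_attention(X, reachability_mask(4, edges))
print(out.shape)
```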

Orthogonal Wasserstein GANs

Title Orthogonal Wasserstein GANs
Authors Jan Müller, Reinhard Klein, Michael Weinmann
Abstract Wasserstein-GANs have been introduced to address the deficiencies of generative adversarial networks (GANs) regarding the problems of vanishing gradients and mode collapse during the training, leading to improved convergence behaviour and improved image quality. However, Wasserstein-GANs require the discriminator to be Lipschitz continuous. In current state-of-the-art Wasserstein-GANs this constraint is enforced via gradient norm regularization. In this paper, we demonstrate that this regularization does not encourage a broad distribution of spectral-values in the discriminator weights, hence resulting in less fidelity in the learned distribution. We therefore investigate the possibility of substituting this Lipschitz constraint with an orthogonality constraint on the weight matrices. We compare three different weight orthogonalization techniques with regard to their convergence properties, their ability to ensure the Lipschitz condition and the achieved quality of the learned distribution. In addition, we provide a comparison to Wasserstein-GANs trained with current state-of-the-art methods, where we demonstrate the potential of solely using orthogonality-based regularization. In this context, we propose an improved training procedure for Wasserstein-GANs which utilizes orthogonalization to further increase its generalization capability. Finally, we provide a novel metric to evaluate the generalization capabilities of the discriminators of different Wasserstein-GANs.
Tasks
Published 2019-11-29
URL https://arxiv.org/abs/1911.13060v2
PDF https://arxiv.org/pdf/1911.13060v2.pdf
PWC https://paperswithcode.com/paper/orthogonal-wasserstein-gans
Repo
Framework
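
One simple way to encourage orthogonal discriminator weights, sketched below for illustration, is a soft penalty ||W W^T - I||_F^2 added to the critic loss. The paper compares several orthogonalization techniques; this particular penalty, the toy critic, and the coefficient are assumptions rather than the authors' training procedure.

```python
import torch
import torch.nn as nn

def orthogonality_penalty(model, coeff=1e-4):
    # Soft penalty ||W W^T - I||_F^2 summed over the linear layers of the critic.
    penalty = 0.0
    for module in model.modules():
        if isinstance(module, nn.Linear):
            W = module.weight                      # shape (out_features, in_features)
            gram = W @ W.t()
            eye = torch.eye(gram.shape[0], device=W.device)
            penalty = penalty + ((gram - eye) ** 2).sum()
    return coeff * penalty

critic = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
x_real, x_fake = torch.randn(32, 64), torch.randn(32, 64)

# WGAN critic objective (maximize real minus fake scores) with the soft orthogonality term.
loss = -(critic(x_real).mean() - critic(x_fake).mean()) + orthogonality_penalty(critic)
loss.backward()
print(float(loss))
```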

Randomly initialized EM algorithm for two-component Gaussian mixture achieves near optimality in $O(\sqrt{n})$ iterations

Title Randomly initialized EM algorithm for two-component Gaussian mixture achieves near optimality in $O(\sqrt{n})$ iterations
Authors Yihong Wu, Harrison H. Zhou
Abstract We analyze the classical EM algorithm for parameter estimation in the symmetric two-component Gaussian mixtures in $d$ dimensions. We show that, even in the absence of any separation between components, provided that the sample size satisfies $n=\Omega(d \log^3 d)$, the randomly initialized EM algorithm converges to an estimate in at most $O(\sqrt{n})$ iterations with high probability, which is at most $O((\frac{d \log^3 n}{n})^{1/4})$ in Euclidean distance from the true parameter and within logarithmic factors of the minimax rate of $(\frac{d}{n})^{1/4}$. Both the nonparametric statistical rate and the sublinear convergence rate are direct consequences of the zero Fisher information in the worst case. Refined pointwise guarantees beyond worst-case analysis and convergence to the MLE are also shown under mild conditions. This improves the previous result of Balakrishnan et al \cite{BWY17} which requires strong conditions on both the separation of the components and the quality of the initialization, and that of Daskalakis et al \cite{DTZ17} which requires sample splitting and restarting the EM iteration.
Tasks
Published 2019-08-28
URL https://arxiv.org/abs/1908.10935v1
PDF https://arxiv.org/pdf/1908.10935v1.pdf
PWC https://paperswithcode.com/paper/randomly-initialized-em-algorithm-for-two
Repo
Framework
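
The EM iteration analyzed above is short enough to write out. For the symmetric mixture 0.5 N(theta, I) + 0.5 N(-theta, I) with known identity covariance, the update averages the data weighted by tanh responsibilities. The random initialization scale and stopping after roughly sqrt(n) iterations below are illustrative choices, not the paper's exact constants.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 20000

# Data from the symmetric two-component mixture 0.5*N(theta, I) + 0.5*N(-theta, I).
theta_true = np.zeros(d)
theta_true[0] = 1.0
labels = rng.choice([-1, 1], size=n)
X = labels[:, None] * theta_true + rng.normal(size=(n, d))

# Random initialization (small random direction; the exact scale is an illustrative choice).
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta) * np.sqrt(d)

for _ in range(int(np.sqrt(n))):                           # O(sqrt(n)) iterations
    responsibilities = np.tanh(X @ theta)                  # E-step for the symmetric mixture
    theta = (X * responsibilities[:, None]).mean(axis=0)   # M-step: weighted average

err = min(np.linalg.norm(theta - theta_true), np.linalg.norm(theta + theta_true))
print("estimation error up to sign:", err)
```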

Automated curriculum generation for Policy Gradients from Demonstrations

Title Automated curriculum generation for Policy Gradients from Demonstrations
Authors Anirudh Srinivasan, Dzmitry Bahdanau, Maxime Chevalier-Boisvert, Yoshua Bengio
Abstract In this paper, we present a technique that improves the process of training an agent (using RL) for instruction following. We develop a training curriculum that uses a nominal number of expert demonstrations and trains the agent in a manner that draws parallels from one of the ways in which humans learn to perform complex tasks, i.e., by starting from the goal and working backwards. We test our method on the BabyAI platform and show an improvement in sample efficiency for some of its tasks compared to a PPO (proximal policy optimization) baseline.
Tasks
Published 2019-12-01
URL https://arxiv.org/abs/1912.00444v1
PDF https://arxiv.org/pdf/1912.00444v1.pdf
PWC https://paperswithcode.com/paper/automated-curriculum-generation-for-policy
Repo
Framework
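
A hedged sketch of what a backward curriculum over a single demonstration might look like: episodes first start near the goal (late demonstration states), and the start point moves earlier once the success rate clears a threshold. The schedule, the threshold, and the `run_episode` interface are assumptions, not the paper's exact procedure.

```python
import random

def backward_curriculum(demo_states, run_episode, success_threshold=0.8, episodes_per_stage=50):
    # demo_states: states along one expert demonstration, ordered from start to goal.
    # run_episode(state) -> bool, runs one training episode started from `state`.
    start_idx = len(demo_states) - 2                # begin just before the goal
    while start_idx >= 0:
        successes = sum(run_episode(demo_states[start_idx]) for _ in range(episodes_per_stage))
        if successes / episodes_per_stage >= success_threshold:
            start_idx -= 1                          # agent is ready: move the start earlier
    return "reached the original start state"

# Dummy usage: stand-in "agent" that succeeds 90% of the time regardless of start state.
demo = list(range(10))
print(backward_curriculum(demo, lambda s: random.random() < 0.9))
```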

PIRM2018 Challenge on Spectral Image Super-Resolution: Dataset and Study

Title PIRM2018 Challenge on Spectral Image Super-Resolution: Dataset and Study
Authors Mehrdad Shoeiby, Antonio Robles-Kelly, Ran Wei, Radu Timofte
Abstract This paper introduces a newly collected and novel dataset (StereoMSI) for example-based single and colour-guided spectral image super-resolution. The dataset was first released and promoted during the PIRM2018 spectral image super-resolution challenge. To the best of our knowledge, the dataset is the first of its kind, comprising 350 registered colour-spectral image pairs. The dataset has been used for the two tracks of the challenge and, for each of these, we have provided a split into training, validation and testing. This arrangement is a result of the challenge structure and phases, with the first track focusing on example-based spectral image super-resolution and the second one aiming at exploiting the registered stereo colour imagery to improve the resolution of the spectral images. Each of the tracks and splits has been selected to be consistent across a number of image quality metrics. The dataset is quite general in nature and can be used for a wide variety of applications in addition to the development of spectral image super-resolution methods.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-04-01
URL http://arxiv.org/abs/1904.00540v2
PDF http://arxiv.org/pdf/1904.00540v2.pdf
PWC https://paperswithcode.com/paper/pirm2018-challenge-on-spectral-image-super
Repo
Framework

LS-Net: Fast Single-Shot Line-Segment Detector

Title LS-Net: Fast Single-Shot Line-Segment Detector
Authors Van Nhan Nguyen, Robert Jenssen, Davide Roverso
Abstract In low-altitude Unmanned Aerial Vehicle (UAV) flights, power lines are considered as one of the most threatening hazards and one of the most difficult obstacles to avoid. In recent years, many vision-based techniques have been proposed to detect power lines to facilitate self-driving UAVs and automatic obstacle avoidance. However, most of the proposed methods are typically based on a common three-step approach: (i) edge detection, (ii) the Hough transform, and (iii) spurious line elimination based on power line constraints. These approaches not only are slow and inaccurate but also require a huge amount of effort in post-processing to distinguish between power lines and spurious lines. In this paper, we introduce LS-Net, a fast single-shot line-segment detector, and apply it to power line detection. The LS-Net is by design fully convolutional and consists of three modules: (i) a fully convolutional feature extractor, (ii) a classifier, and (iii) a line segment regressor. Due to the unavailability of large datasets with annotations of power lines, we render synthetic images of power lines using the Physically Based Rendering (PBR) approach and propose a series of effective data augmentation techniques to generate more training data. With a customized version of the VGG-16 network as the backbone, the proposed approach outperforms existing state-of-the-art approaches. In addition, the LS-Net can detect power lines in near real-time (20.4 FPS). This suggests that our proposed approach has a promising role in automatic obstacle avoidance and as a valuable component of self-driving UAVs, especially for automatic autonomous power line inspection.
Tasks Data Augmentation, Edge Detection
Published 2019-12-19
URL https://arxiv.org/abs/1912.09532v2
PDF https://arxiv.org/pdf/1912.09532v2.pdf
PWC https://paperswithcode.com/paper/ls-net-fast-single-shot-line-segment-detector
Repo
Framework
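
For reference, the traditional three-step baseline that the paper contrasts LS-Net with can be sketched with OpenCV as below: Canny edge detection, a probabilistic Hough transform, and a crude angle-based filter standing in for constraint-based spurious-line elimination. The thresholds and the angle filter are illustrative assumptions.

```python
import cv2
import numpy as np

def hough_line_baseline(gray_image):
    edges = cv2.Canny(gray_image, 50, 150)                       # (i) edge detection
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,     # (ii) Hough transform
                            threshold=80, minLineLength=100, maxLineGap=10)
    if lines is None:
        return []
    kept = []
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        if abs(angle) < 30 or abs(angle) > 150:                  # (iii) keep near-horizontal lines
            kept.append((x1, y1, x2, y2))
    return kept

# Usage on a synthetic image with one bright, nearly horizontal "power line".
img = np.zeros((480, 640), dtype=np.uint8)
cv2.line(img, (0, 240), (639, 242), 255, 2)
print(len(hough_line_baseline(img)))
```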

On Gossip-based Information Dissemination in Pervasive Recommender Systems

Title On Gossip-based Information Dissemination in Pervasive Recommender Systems
Authors Tobias Eichinger, Felix Beierle, Robin Papke, Lucas Rebscher, Hong Chinh Tran, Magdalena Trzeciak
Abstract Pervasive computing systems employ distributed and embedded devices in order to raise, communicate, and process data in an anytime-anywhere fashion. Certainly, its most prominent device is the smartphone due to its wide proliferation, growing computation power, and wireless networking capabilities. In this context, we revisit the implementation of digitalized word-of-mouth that suggests exchanging item preferences between smartphones offline and directly in immediate proximity. Collaboratively and decentrally collecting data in this way has two benefits. First, it allows attaching, for instance, location-sensitive context information in order to enrich collected item preferences. Second, model building does not require network connectivity. Despite the benefits, the approach naturally raises data privacy and data scarcity issues. In order to address both, we propose Propagate and Filter, a method that translates the traditional approach of finding similar peers and exchanging item preferences among each other from the field of decentralized to that of pervasive recommender systems. Additionally, we present preliminary results on a prototype mobile application that implements the proposed device-to-device information exchange. Average ad-hoc connection delays of 25.9 seconds and reliable connection success rates within 6 meters underpin the approach’s technical feasibility.
Tasks Recommendation Systems
Published 2019-08-15
URL https://arxiv.org/abs/1908.05544v1
PDF https://arxiv.org/pdf/1908.05544v1.pdf
PWC https://paperswithcode.com/paper/on-gossip-based-information-dissemination-in
Repo
Framework
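
A minimal sketch of a propagate-and-filter-style exchange, under the assumption that peer similarity is measured by cosine over co-rated items and that a peer's ratings are merged only above a similarity threshold. The measure, the threshold, and the merge rule are illustrative, not the paper's protocol.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity over items both profiles have rated.
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    a = np.array([u[i] for i in shared])
    b = np.array([v[i] for i in shared])
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def exchange(own, received, threshold=0.5):
    # Merge a peer's ratings into `own` only if the profiles are similar (the filter step).
    if cosine(own, received) < threshold:
        return own
    merged = dict(received)
    merged.update(own)                      # keep own ratings on conflicts
    return merged

alice = {"item1": 5.0, "item2": 1.0}
bob = {"item1": 4.0, "item3": 5.0}
print(exchange(alice, bob))
```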

Feature discriminativity estimation in CNNs for transfer learning

Title Feature discriminativity estimation in CNNs for transfer learning
Authors Victor Gimenez-Abalos, Armand Vilalta, Dario Garcia-Gasulla, Jesus Labarta, Eduard Ayguadé
Abstract The purpose of feature extraction on convolutional neural networks is to reuse deep representations learnt for a pre-trained model to solve a new, potentially unrelated problem. However, raw feature extraction from all layers is unfeasible given the massive size of these networks. Recently, a supervised method using complexity reduction was proposed, resulting in significant improvements in performance for transfer learning tasks. This approach first computes the discriminative power of features, and then discretises them using thresholds computed for the task. In this paper, we analyse the behaviour of these thresholds, with the purpose of finding a methodology for their estimation. After a comprehensive study, we find a very strong correlation between problem size and threshold value, with coefficient of determination above 90%. These results allow us to propose a unified model for threshold estimation, with potential application to transfer learning tasks.
Tasks Transfer Learning
Published 2019-11-08
URL https://arxiv.org/abs/1911.03332v1
PDF https://arxiv.org/pdf/1911.03332v1.pdf
PWC https://paperswithcode.com/paper/feature-discriminativity-estimation-in-cnns
Repo
Framework
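
To make the pipeline concrete, the sketch below scores each feature's discriminative power with a simple between-class/within-class spread ratio and then discretises values against a threshold. Both the scoring function and the binary discretisation are assumptions standing in for the supervised method the paper analyses.

```python
import numpy as np

def discriminative_power(F, y):
    # Between-class versus within-class spread for each feature column.
    classes = np.unique(y)
    means = np.stack([F[y == c].mean(axis=0) for c in classes])
    between = means.var(axis=0)
    within = np.mean([F[y == c].var(axis=0) for c in classes], axis=0)
    return between / (within + 1e-12)

def discretise(F, threshold):
    return (F > threshold).astype(np.int8)

rng = np.random.default_rng(0)
F = rng.normal(size=(200, 32))                 # stand-in for pre-trained CNN features
y = rng.integers(0, 2, size=200)
F[:, 0] += 3 * y                               # make one feature clearly discriminative
print(discriminative_power(F, y)[:3])
print(discretise(F, threshold=0.0).shape)
```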

Learning Ensembles of Anomaly Detectors on Synthetic Data

Title Learning Ensembles of Anomaly Detectors on Synthetic Data
Authors D. Smolyakov, N. Sviridenko, V. Ishimtsev, E. Burikov, E. Burnaev
Abstract The main aim of this work is to develop and implement an automatic anomaly detection algorithm for meteorological time-series. To achieve this goal we develop an approach to constructing an ensemble of anomaly detectors in combination with adaptive threshold selection based on artificially generated anomalies. We demonstrate the efficiency of the proposed method by integrating the corresponding implementation into the “Minimax-94” road weather information system.
Tasks Anomaly Detection, Time Series
Published 2019-05-20
URL https://arxiv.org/abs/1905.07892v1
PDF https://arxiv.org/pdf/1905.07892v1.pdf
PWC https://paperswithcode.com/paper/learning-ensembles-of-anomaly-detectors-on
Repo
Framework
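
The core loop can be illustrated as follows: inject artificial anomalies into a series, score it with an ensemble of simple detectors, and select the decision threshold that performs best on the injected labels. The two detectors, their equal weighting, and the F1-based selection are illustrative assumptions.

```python
import numpy as np

def zscore_detector(x):
    return np.abs((x - x.mean()) / (x.std() + 1e-12))

def residual_detector(x, window=24):
    smooth = np.convolve(x, np.ones(window) / window, mode="same")
    return np.abs(x - smooth)

def f1(labels, preds):
    tp = np.sum(preds & labels)
    fp = np.sum(preds & ~labels)
    fn = np.sum(~preds & labels)
    return 2 * tp / (2 * tp + fp + fn + 1e-12)

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 40, 2000)) + 0.1 * rng.normal(size=2000)   # stand-in weather series
labels = np.zeros(2000, dtype=bool)
idx = rng.choice(2000, size=20, replace=False)
x[idx] += rng.choice([-1, 1], size=20) * 2.0                          # injected synthetic anomalies
labels[idx] = True

score = 0.5 * zscore_detector(x) + 0.5 * residual_detector(x)         # simple two-detector ensemble
thresholds = np.quantile(score, np.linspace(0.9, 0.999, 50))
best = max(thresholds, key=lambda t: f1(labels, score > t))           # adaptive threshold selection
print("chosen threshold:", best, "F1:", f1(labels, score > best))
```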

Sequential VAE-LSTM for Anomaly Detection on Time Series

Title Sequential VAE-LSTM for Anomaly Detection on Time Series
Authors Run-Qing Chen, Guang-Hui Shi, Wan-Lei Zhao, Chang-Hui Liang
Abstract In order to support stable web-based applications and services, anomalies in IT performance status have to be detected in a timely manner. Moreover, the performance trend across the time series should be predicted. In this paper, we propose SeqVL (Sequential VAE-LSTM), a neural network model based on both VAE (Variational Auto-Encoder) and LSTM (Long Short-Term Memory). This work is the first attempt to integrate unsupervised anomaly detection and trend prediction under one framework. Moreover, this model performs considerably better on detection and prediction than VAE and LSTM working alone. On unsupervised anomaly detection, SeqVL achieves competitive experimental results compared with other state-of-the-art methods on public datasets. On trend prediction, SeqVL outperforms several classic time series prediction models in experiments on the public dataset.
Tasks Anomaly Detection, Time Series, Time Series Prediction, Unsupervised Anomaly Detection
Published 2019-10-09
URL https://arxiv.org/abs/1910.03818v2
PDF https://arxiv.org/pdf/1910.03818v2.pdf
PWC https://paperswithcode.com/paper/sequential-vae-lstm-for-anomaly-detection-on
Repo
Framework
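
A structural sketch of combining a VAE (per-window reconstruction for anomaly scoring) with an LSTM over the latent sequence (trend prediction) is given below. The layer sizes and the way the two parts are wired are assumptions; this is not the SeqVL architecture, only the general shape of such a hybrid.

```python
import torch
import torch.nn as nn

class VAELSTMSketch(nn.Module):
    def __init__(self, window=32, latent=8, hidden=32):
        super().__init__()
        self.enc = nn.Linear(window, 2 * latent)         # outputs mean and log-variance
        self.dec = nn.Linear(latent, window)
        self.lstm = nn.LSTM(latent, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)                 # next-value (trend) prediction

    def forward(self, windows):                          # windows: (batch, seq, window)
        mu, logvar = self.enc(windows).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        recon = self.dec(z)                              # anomaly score ~ reconstruction error
        out, _ = self.lstm(z)
        pred = self.head(out[:, -1])
        return recon, pred, mu, logvar

model = VAELSTMSketch()
recon, pred, mu, logvar = model(torch.randn(4, 10, 32))
print(recon.shape, pred.shape)
```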

Spatial Feature Extraction in Airborne Hyperspectral Images Using Local Spectral Similarity

Title Spatial Feature Extraction in Airborne Hyperspectral Images Using Local Spectral Similarity
Authors Anand S Sahadevan, Arundhati Misra, Praveen Gupta
Abstract Local spectral similarity (LSS) algorithm has been developed for detecting homogeneous areas and edges in hyperspectral images (HSIs). The proposed algorithm transforms the 3-D data cube (within a spatial window) into a spectral similarity matrix by calculating the vector-similarity between the center pixel-spectrum and the neighborhood spectra. The final edge intensity is derived upon order statistics of the similarity matrix or spatial convolution of the similarity matrix with the spatial kernels. The LSS algorithm facilitates simultaneous use of spectral-spatial information for the edge detection by considering the spatial pattern of similar spectra within a spatial window. The proposed edge-detection method is tested on benchmark HSIs as well as the image obtained from the Airborne Visible and Infra-Red Imaging Spectrometer Next Generation (AVIRIS-NG). Robustness of the LSS method against multivariate Gaussian noise and low spatial resolution scenarios was also verified with the benchmark HSIs. Figure-of-merit, false-alarm-count and miss-count were applied to evaluate the performance of edge detection methods. Results showed that the fractional distance measure and the Euclidean distance measure were able to detect the edges in HSIs more precisely as compared to other spectral similarity measures. The proposed method can be applied to radiance and reflectance data (whole spectrum) and it has shown good performance on principal component images as well. In addition, the proposed algorithm outperforms the traditional multichannel edge detectors in terms of speed, accuracy, and robustness. The experimental results also confirm that LSS can be applied as a pre-processing approach to reduce the errors in clustering as well as classification outputs.
Tasks Edge Detection
Published 2019-11-06
URL https://arxiv.org/abs/1911.02285v1
PDF https://arxiv.org/pdf/1911.02285v1.pdf
PWC https://paperswithcode.com/paper/spatial-feature-extraction-in-airborne
Repo
Framework
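
The local-spectral-similarity idea can be sketched directly: within a sliding window, compare the centre pixel's spectrum with its neighbours and take an order statistic of the dissimilarities as the edge intensity. The window size, the Euclidean measure, and the max statistic below are illustrative choices among those the paper evaluates.

```python
import numpy as np

def lss_edge_map(cube, window=3):
    # cube: (rows, cols, bands) hyperspectral image.
    r = window // 2
    rows, cols, _ = cube.shape
    edge = np.zeros((rows, cols))
    for i in range(r, rows - r):
        for j in range(r, cols - r):
            centre = cube[i, j]
            patch = cube[i - r:i + r + 1, j - r:j + r + 1].reshape(-1, cube.shape[2])
            dists = np.linalg.norm(patch - centre, axis=1)   # dissimilarity to each neighbour
            edge[i, j] = np.max(dists)                        # order statistic as edge intensity
    return edge

# Synthetic cube: two regions with different spectra produce a vertical edge.
cube = np.ones((20, 20, 30))
cube[:, 10:, :] = 2.0
print(lss_edge_map(cube).max())
```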