July 27, 2019

2904 words 14 mins read

Paper Group ANR 483

FADO: A Deterministic Detection/Learning Algorithm. Entropic Trace Estimates for Log Determinants. Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks. Multiscale sequence modeling with a learned dictionary. Dual-fisheye lens stitching for 360-degree imaging. Super-Resolution of Wavelet-Encoded Images. Memory-Efficient Gl …

FADO: A Deterministic Detection/Learning Algorithm

Title FADO: A Deterministic Detection/Learning Algorithm
Authors Kristiaan Pelckmans
Abstract This paper proposes and studies a detection technique for adversarial scenarios (dubbed deterministic detection). This technique provides an alternative detection methodology in case the usual stochastic methods are not applicable: this can be because the studied phenomenon does not follow a stochastic sampling scheme, samples are high-dimensional and subsequent multiple-testing corrections render results overly conservative, sample sizes are too low for asymptotic results (e.g., the central limit theorem) to kick in, or one cannot allow for the small probability of failure inherent to stochastic approaches. This paper instead designs a method based on insights from machine learning and online learning theory: this detection algorithm - named Online FAult Detection (FADO) - comes with theoretical guarantees of its detection capabilities. A version of the margin is found to regulate the detection performance of FADO. A precise expression is derived for bounding the performance, and experimental results are presented assessing the influence of the quantities involved. A case study of scene detection is used to illustrate the approach. The technique is closely related to the linear perceptron rule and inherits its computational attractiveness and flexibility towards various extensions.
Tasks Fault Detection
Published 2017-11-07
URL http://arxiv.org/abs/1711.02361v1
PDF http://arxiv.org/pdf/1711.02361v1.pdf
PWC https://paperswithcode.com/paper/fado-a-deterministic-detectionlearning
Repo
Framework

Entropic Trace Estimates for Log Determinants

Title Entropic Trace Estimates for Log Determinants
Authors Jack Fitzsimons, Diego Granziol, Kurt Cutajar, Michael Osborne, Maurizio Filippone, Stephen Roberts
Abstract The scalable calculation of matrix determinants has been a bottleneck to the widespread application of many machine learning methods such as determinantal point processes, Gaussian processes, generalised Markov random fields, graph models and many others. In this work, we estimate log determinants under the framework of maximum entropy, given information in the form of moment constraints from stochastic trace estimation. The estimates demonstrate a significant improvement on state-of-the-art alternative methods, as shown on a wide variety of UFL sparse matrices. By taking the example of a general Markov random field, we also demonstrate how this approach can significantly accelerate inference in large-scale learning methods involving the log determinant.
Tasks Gaussian Processes, Point Processes
Published 2017-04-24
URL http://arxiv.org/abs/1704.07223v1
PDF http://arxiv.org/pdf/1704.07223v1.pdf
PWC https://paperswithcode.com/paper/entropic-trace-estimates-for-log-determinants
Repo
Framework
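The identity the paper builds on is log det A = tr(log A), with the trace estimated stochastically. The sketch below shows only that ingredient, the Hutchinson-style estimator with Rademacher probes; the paper's contribution (recovering the estimate from moment constraints via maximum entropy, without ever forming log A) is not reproduced here, so the explicit eigendecomposition is purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric positive-definite test matrix
B = rng.normal(size=(30, 30))
A = B @ B.T + 30 * np.eye(30)

# log det A = tr(log A); estimate the trace stochastically with
# Rademacher probes z:  tr(log A) ~= mean over z of  z^T (log A) z
w, V = np.linalg.eigh(A)
L = (V * np.log(w)) @ V.T            # matrix logarithm of A
probes = rng.choice([-1.0, 1.0], size=(200, 30))
estimate = np.mean([z @ L @ z for z in probes])

exact = np.linalg.slogdet(A)[1]
print(estimate, exact)
```

With 200 probes the stochastic estimate lands within a fraction of a percent of the exact log determinant, which is why trace estimation is an attractive substitute for direct factorization at scale.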

Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks

Title Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
Authors Shankar Krishnan, Ying Xiao, Rif A. Saurous
Abstract Progress in deep learning is slowed by the days or weeks it takes to train large models. The natural solution of using more hardware is limited by diminishing returns, and leads to inefficient use of additional resources. In this paper, we present a large batch, stochastic optimization algorithm that is both faster than widely used algorithms for fixed amounts of computation, and also scales up substantially better as more computational resources become available. Our algorithm implicitly computes the inverse Hessian of each mini-batch to produce descent directions; we do so without either an explicit approximation to the Hessian or Hessian-vector products. We demonstrate the effectiveness of our algorithm by successfully training large ImageNet models (Inception-V3, Resnet-50, Resnet-101 and Inception-Resnet-V2) with mini-batch sizes of up to 32000 with no loss in validation error relative to current baselines, and no increase in the total number of steps. At smaller mini-batch sizes, our optimizer improves the validation error in these models by 0.8-0.9%. Alternatively, we can trade off this accuracy to reduce the number of training steps needed by roughly 10-30%. Our work is practical and easily usable by others – only one hyperparameter (learning rate) needs tuning, and furthermore, the algorithm is as computationally cheap as the commonly used Adam optimizer.
Tasks Stochastic Optimization
Published 2017-12-08
URL http://arxiv.org/abs/1712.03298v1
PDF http://arxiv.org/pdf/1712.03298v1.pdf
PWC https://paperswithcode.com/paper/neumann-optimizer-a-practical-optimization
Repo
Framework

Multiscale sequence modeling with a learned dictionary

Title Multiscale sequence modeling with a learned dictionary
Authors Bart van Merriënboer, Amartya Sanyal, Hugo Larochelle, Yoshua Bengio
Abstract We propose a generalization of neural network sequence models. Instead of predicting one symbol at a time, our multi-scale model makes predictions over multiple, potentially overlapping multi-symbol tokens. A variation of the byte-pair encoding (BPE) compression algorithm is used to learn the dictionary of tokens that the model is trained with. When applied to language modelling, our model has the flexibility of character-level models while maintaining many of the performance benefits of word-level models. Our experiments show that this model performs better than a regular LSTM on language modeling tasks, especially for smaller models.
Tasks Language Modelling
Published 2017-07-03
URL http://arxiv.org/abs/1707.00762v2
PDF http://arxiv.org/pdf/1707.00762v2.pdf
PWC https://paperswithcode.com/paper/multiscale-sequence-modeling-with-a-learned
Repo
Framework
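The dictionary-learning step is a variation of byte-pair encoding, whose core loop is simple enough to sketch: repeatedly count adjacent symbol pairs and fuse the most frequent pair into a new token. This is the standard BPE procedure, not the paper's exact variation; the corpus is a toy example.

```python
from collections import Counter

def learn_bpe(text, num_merges):
    """Learn byte-pair-encoding merges: repeatedly fuse the most frequent
    adjacent symbol pair into a new token (standard BPE sketch)."""
    words = Counter(tuple(w) for w in text.split())  # each word as symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        new_words = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == (a, b):
                    out.append(word[i] + word[i + 1]); i += 2
                else:
                    out.append(word[i]); i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return merges

merges = learn_bpe("low lower lowest low low", 3)
print(merges)
```

On this corpus the learned merges build up the frequent stem "low" symbol by symbol, giving exactly the multi-symbol, potentially overlapping tokens the model predicts over.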

Dual-fisheye lens stitching for 360-degree imaging

Title Dual-fisheye lens stitching for 360-degree imaging
Authors Tuan Ho, Madhukar Budagavi
Abstract Dual-fisheye lens cameras have been increasingly used for 360-degree immersive imaging. However, the limited overlapping fields of view and misalignment between the two lenses give rise to visible discontinuities in the stitching boundaries. This paper introduces a novel method for dual-fisheye camera stitching that adaptively minimizes the discontinuities in the overlapping regions to generate full spherical 360-degree images. Results show that this approach can produce good quality stitched images for Samsung Gear 360 – a dual-fisheye camera, even with hard-to-stitch objects at the stitching borders.
Tasks
Published 2017-08-20
URL http://arxiv.org/abs/1708.08988v1
PDF http://arxiv.org/pdf/1708.08988v1.pdf
PWC https://paperswithcode.com/paper/dual-fisheye-lens-stitching-for-360-degree
Repo
Framework

Super-Resolution of Wavelet-Encoded Images

Title Super-Resolution of Wavelet-Encoded Images
Authors Vildan Atalay Aydin, Hassan Foroosh
Abstract Multiview super-resolution image reconstruction (SRIR) is often cast as a resampling problem by merging non-redundant data from multiple low-resolution (LR) images on a finer high-resolution (HR) grid, while inverting the effect of the camera point spread function (PSF). One main problem with multiview methods is that resampling from nonuniform samples (provided by LR images) and the inversion of the PSF are highly nonlinear and ill-posed problems. Non-linearity and ill-posedness are typically overcome by linearization and regularization, often through an iterative optimization process, which essentially trades off the very same information (i.e. high frequency) that we want to recover. We propose a novel point of view for multiview SRIR: Unlike existing multiview methods that reconstruct the entire spectrum of the HR image from the multiple given LR images, we derive explicit expressions that show how the high-frequency spectra of the unknown HR image are related to the spectra of the LR images. Therefore, by taking any of the LR images as the reference to represent the low-frequency spectra of the HR image, one can reconstruct the super-resolution image by focusing only on the reconstruction of the high-frequency spectra. This is very much like single-image methods, which extrapolate the spectrum of one image, except that we rely on information provided by all other views, rather than by prior constraints as in single-image methods (which may not be an accurate source of information). This is made possible by deriving and applying explicit closed-form expressions that define how the local high frequency information that we aim to recover for the reference high resolution image is related to the local low frequency information in the sequence of views. Results and comparisons with recently published state-of-the-art methods show the superiority of the proposed solution.
Tasks Image Reconstruction, Super-Resolution
Published 2017-05-03
URL http://arxiv.org/abs/1705.01258v1
PDF http://arxiv.org/pdf/1705.01258v1.pdf
PWC https://paperswithcode.com/paper/super-resolution-of-wavelet-encoded-images
Repo
Framework
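The spectral relations the method exploits generalize the classical aliasing identity for decimation. For a 1-D signal $x[n]$ downsampled by a factor $L$, $y[n] = x[Ln]$, the low-resolution spectrum is a mixture of $L$ shifted copies of the high-resolution one; this is the textbook identity, not the paper's wavelet-domain closed-form expressions:

```latex
% DTFT of y[n] = x[Ln]: each LR spectrum sums L aliased HR spectral bands
Y\!\left(e^{j\omega}\right) \;=\; \frac{1}{L} \sum_{m=0}^{L-1} X\!\left(e^{j(\omega - 2\pi m)/L}\right)
```

Inverting such relations across several shifted views is what lets the missing high-frequency content of the reference image be pinned down from the other LR images.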

Memory-Efficient Global Refinement of Decision-Tree Ensembles and its Application to Face Alignment

Title Memory-Efficient Global Refinement of Decision-Tree Ensembles and its Application to Face Alignment
Authors Nenad Markuš, Ivan Gogić, Igor S. Pandžić, Jörgen Ahlberg
Abstract Ren et al. recently introduced a method for aggregating multiple decision trees into a strong predictor by interpreting a path taken by a sample down each tree as a binary vector and performing linear regression on top of these vectors stacked together. They provided experimental evidence that the method offers advantages over the usual approaches for combining decision trees (random forests and boosting). The method truly shines when the regression target is a large vector with correlated dimensions, such as a 2D face shape represented with the positions of several facial landmarks. However, we argue that their basic method is not applicable in many practical scenarios due to large memory requirements. This paper shows how this issue can be solved through the use of quantization and architectural changes of the predictor that maps decision tree-derived encodings to the desired output.
Tasks Face Alignment, Quantization
Published 2017-02-27
URL http://arxiv.org/abs/1702.08481v2
PDF http://arxiv.org/pdf/1702.08481v2.pdf
PWC https://paperswithcode.com/paper/memory-efficient-global-refinement-of
Repo
Framework
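The Ren et al. base scheme that the paper refines is easy to sketch: encode each sample by the one-hot indicators of the leaves it reaches in every tree, stack those indicators, and fit a linear model on top. The sketch below (with two toy depth-1 stumps and a synthetic target of my own choosing) shows only that base scheme; the paper's actual contribution, the quantization and architectural changes that shrink its memory footprint, is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "forest": each tree is a depth-1 stump routing x to one of two leaves.
stumps = [(0, 0.0), (1, 0.5)]          # (feature index, threshold)

def encode(X):
    """Stack one-hot leaf indicators from all trees into one binary vector."""
    cols = []
    for feat, thr in stumps:
        leaf = (X[:, feat] > thr).astype(float)
        cols += [1.0 - leaf, leaf]     # one-hot over the stump's 2 leaves
    return np.stack(cols, axis=1)

X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + X[:, 1]          # target with correlated structure

Phi = encode(X)
# Global refinement: ridge regression on the stacked leaf indicators
lam = 1e-3
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
mse = np.mean((Phi @ w - y) ** 2)
print(mse)
```

Even these two stumps cut the error well below the target's variance; with thousands of deep trees the indicator vectors become huge, which is exactly the memory problem the paper addresses.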

DeepSafe: A Data-driven Approach for Checking Adversarial Robustness in Neural Networks

Title DeepSafe: A Data-driven Approach for Checking Adversarial Robustness in Neural Networks
Authors Divya Gopinath, Guy Katz, Corina S. Pasareanu, Clark Barrett
Abstract Deep neural networks have become widely used, obtaining remarkable results in domains such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, and bio-informatics, where they have produced results comparable to human experts. However, these networks can be easily fooled by adversarial perturbations: minimal changes to correctly classified inputs that cause the network to misclassify them. This phenomenon represents a concern for both safety and security, but it is currently unclear how to measure a network’s robustness against such perturbations. Existing techniques are limited to checking robustness around a few individual input points, providing only very limited guarantees. We propose a novel approach for automatically identifying safe regions of the input space, within which the network is robust against adversarial perturbations. The approach is data-guided, relying on clustering to identify well-defined geometric regions as candidate safe regions. We then utilize verification techniques to confirm that these regions are safe or to provide counter-examples showing that they are not safe. We also introduce the notion of targeted robustness which, for a given target label and region, ensures that a network does not map any input in the region to the target label. We evaluated our technique on the MNIST dataset and on a neural network implementation of a controller for the next-generation Airborne Collision Avoidance System for unmanned aircraft (ACAS Xu). For these networks, our approach identified multiple regions which were completely safe as well as some which were only safe for specific labels. It also discovered several adversarial perturbations of interest.
Tasks Machine Translation, Speech Recognition
Published 2017-10-02
URL https://arxiv.org/abs/1710.00486v2
PDF https://arxiv.org/pdf/1710.00486v2.pdf
PWC https://paperswithcode.com/paper/deepsafe-a-data-driven-approach-for-checking
Repo
Framework
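The data-guided half of the pipeline can be sketched with a toy dataset: cluster the labeled inputs and propose, per cluster, a ball whose radius stays short of the nearest differently-labelled point. This is my own simplified stand-in (two Gaussian clusters, one region per label, an arbitrary 0.5 shrink factor); crucially, the paper does not trust these candidates but passes each one to a formal verifier, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two labeled clusters standing in for a classifier's input data
a = rng.normal(loc=[0, 0], scale=0.3, size=(100, 2))
b = rng.normal(loc=[4, 4], scale=0.3, size=(100, 2))
X = np.vstack([a, b])
y = np.array([0] * 100 + [1] * 100)

def candidate_safe_regions(X, y):
    """One candidate ball per label: centered on the cluster mean, with a
    radius stopping well short of the nearest differently-labelled point."""
    regions = []
    for label in np.unique(y):
        center = X[y == label].mean(axis=0)
        nearest_other = np.min(np.linalg.norm(X[y != label] - center, axis=1))
        regions.append((label, center, 0.5 * nearest_other))
    return regions

regions = candidate_safe_regions(X, y)
for label, c, r in regions:
    print(label, np.round(c, 2), round(r, 2))
```

Each printed ball is only a *candidate*: verification must still confirm that the network's prediction is constant inside it, or return a counter-example that shrinks it.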

Objective Bayesian Analysis for Change Point Problems

Title Objective Bayesian Analysis for Change Point Problems
Authors Laurentiu Hinoveanu, Fabrizio Leisen, Cristiano Villa
Abstract In this paper we present a loss-based approach to change point analysis. In particular, we look at the problem from two perspectives. The first focuses on the definition of a prior when the number of change points is known a priori. The second contribution aims to estimate the number of change points by using a loss-based approach recently introduced in the literature. The latter considers change point estimation as a model selection exercise. We show the performance of the proposed approach on simulated data and real data sets.
Tasks Model Selection
Published 2017-02-17
URL http://arxiv.org/abs/1702.05462v2
PDF http://arxiv.org/pdf/1702.05462v2.pdf
PWC https://paperswithcode.com/paper/objective-bayesian-analysis-for-change-point
Repo
Framework

Fast mixing for Latent Dirichlet allocation

Title Fast mixing for Latent Dirichlet allocation
Authors Johan Jonasson
Abstract Markov chain Monte Carlo (MCMC) algorithms are ubiquitous in probability theory in general and in machine learning in particular. A Markov chain is devised so that its stationary distribution is some probability distribution of interest. Then one samples from the given distribution by running the Markov chain for a “long time” until it appears to be stationary and then collects the sample. However, these chains are often very complex and there are no theoretical guarantees that stationarity is actually reached. In this paper we study the Gibbs sampler of the posterior distribution of a very simple case of Latent Dirichlet Allocation, arguably the best-known Bayesian unsupervised learning model for text generation and text classification. It is shown that when the corpus consists of two long documents of equal length $m$ and the vocabulary consists of only two different words, the mixing time is at most of order $m^2\log m$ (which corresponds to $m\log m$ rounds over the corpus). It will be apparent from our analysis that it seems very likely that the mixing time is not much worse in the more relevant case when the number of documents and the size of the vocabulary are also large, as long as each word is represented a large number of times in each document, even though the computations involved may be intractable.
Tasks Text Classification, Text Generation
Published 2017-01-11
URL http://arxiv.org/abs/1701.02960v2
PDF http://arxiv.org/pdf/1701.02960v2.pdf
PWC https://paperswithcode.com/paper/fast-mixing-for-latent-dirichlet-allocation
Repo
Framework
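The sampler under study is the standard collapsed Gibbs sampler for LDA, run here on the paper's toy regime: two documents over a two-word vocabulary (my document lengths, hyperparameters, and sweep count are illustrative choices, not the paper's).

```python
import numpy as np

rng = np.random.default_rng(0)

# The paper's toy setting: 2 documents, 2-word vocabulary, K = 2 topics
docs = [[0] * 30 + [1] * 10, [0] * 10 + [1] * 30]   # word ids per document
K, V, alpha, beta = 2, 2, 0.5, 0.5

# initialize topic assignments and the count tables
z = [rng.integers(K, size=len(d)) for d in docs]
ndk = np.zeros((len(docs), K))          # topic counts per document
nkw = np.zeros((K, V))                  # word counts per topic
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        ndk[d, z[d][i]] += 1
        nkw[z[d][i], w] += 1

# collapsed Gibbs: resample each token's topic given all the others
for _ in range(200):                    # "long enough" in this tiny case
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] -= 1; nkw[k, w] -= 1
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) \
                / (nkw.sum(axis=1) + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1

theta = nkw / nkw.sum(axis=1, keepdims=True)
print(theta)                            # per-topic word distributions
```

The paper's result bounds how many such sweeps ("rounds over the corpus") this chain needs before its distribution is close to the true posterior: order m log m for documents of length m.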

Inducing Interpretability in Knowledge Graph Embeddings

Title Inducing Interpretability in Knowledge Graph Embeddings
Authors Chandrahas, Tathagata Sengupta, Cibi Pragadeesh, Partha Pratim Talukdar
Abstract We study the problem of inducing interpretability in KG embeddings. Specifically, we explore the Universal Schema (Riedel et al., 2013) and propose a method to induce interpretability. There have been many vector space models proposed for the problem; however, most of these methods do not address the interpretability (semantics) of individual dimensions. In this work, we study this problem and propose a method for inducing interpretability in KG embeddings using entity co-occurrence statistics. The proposed method significantly improves the interpretability, while maintaining comparable performance in other KG tasks.
Tasks Knowledge Graph Embeddings
Published 2017-12-10
URL http://arxiv.org/abs/1712.03547v1
PDF http://arxiv.org/pdf/1712.03547v1.pdf
PWC https://paperswithcode.com/paper/inducing-interpretability-in-knowledge-graph
Repo
Framework

Minimax Statistical Learning with Wasserstein Distances

Title Minimax Statistical Learning with Wasserstein Distances
Authors Jaeho Lee, Maxim Raginsky
Abstract As opposed to standard empirical risk minimization (ERM), distributionally robust optimization aims to minimize the worst-case risk over a larger ambiguity set containing the original empirical distribution of the training data. In this work, we describe a minimax framework for statistical learning with ambiguity sets given by balls in Wasserstein space. In particular, we prove generalization bounds that involve the covering number properties of the original ERM problem. As an illustrative example, we provide generalization guarantees for transport-based domain adaptation problems where the Wasserstein distance between the source and target domain distributions can be reliably estimated from unlabeled samples.
Tasks Domain Adaptation
Published 2017-05-22
URL http://arxiv.org/abs/1705.07815v2
PDF http://arxiv.org/pdf/1705.07815v2.pdf
PWC https://paperswithcode.com/paper/minimax-statistical-learning-with-wasserstein
Repo
Framework
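The objective in question is the standard distributionally robust risk over a Wasserstein ball; a generic statement (notation mine, not taken verbatim from the paper) is:

```latex
% Worst-case risk over a Wasserstein ball of radius rho around the
% empirical distribution \hat{P}_n (generic distributionally robust form)
\inf_{f \in \mathcal{F}} \; \sup_{Q \,:\, W_p(Q, \hat{P}_n) \le \rho} \; \mathbb{E}_{X \sim Q}\!\left[ \ell(f, X) \right]
```

Standard ERM is the special case $\rho = 0$; the paper's generalization bounds control the gap between this worst-case risk and its population counterpart via covering numbers of $\mathcal{F}$.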

WebAPIRec: Recommending Web APIs to Software Projects via Personalized Ranking

Title WebAPIRec: Recommending Web APIs to Software Projects via Personalized Ranking
Authors Ferdian Thung, Richard J. Oentaryo, David Lo, Yuan Tian
Abstract Application programming interfaces (APIs) offer a plethora of functionalities for developers to reuse without reinventing the wheel. Identifying the appropriate APIs given a project requirement is critical for the success of a project, as many functionalities can be reused to achieve faster development. However, the massive number of APIs would often hinder the developers’ ability to quickly find the right APIs. In this light, we propose a new, automated approach called WebAPIRec that takes as input a project profile and outputs a ranked list of web APIs that can be used to implement the project. At its heart, WebAPIRec employs a personalized ranking model that ranks web APIs specific (personalized) to a project. Based on the historical data of web API usages, WebAPIRec learns a model that minimizes the incorrect ordering of web APIs, i.e., when a used web API is ranked lower than an unused (or a not-yet-used) web API. We have evaluated our approach on a dataset comprising 9,883 web APIs and 4,315 web application projects from ProgrammableWeb with promising results. For 84.0% of the projects, WebAPIRec is able to successfully return correct APIs that are used to implement the projects in the top-5 positions. This is substantially better than the recommendations provided by ProgrammableWeb’s native search functionality. WebAPIRec also outperforms McMillan et al.’s application search engine and popularity-based recommendation.
Tasks
Published 2017-05-01
URL http://arxiv.org/abs/1705.00561v1
PDF http://arxiv.org/pdf/1705.00561v1.pdf
PWC https://paperswithcode.com/paper/webapirec-recommending-web-apis-to-software
Repo
Framework
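"Minimizing the incorrect ordering" of used versus unused items is the classic pairwise personalized-ranking objective (BPR-style). The sketch below is a generic latent-factor version on synthetic data, assumed for illustration; WebAPIRec's actual model ranks from project *profiles* (text features), which this toy omits.

```python
import numpy as np

rng = np.random.default_rng(0)

n_projects, n_apis, dim = 20, 15, 5
P = rng.normal(scale=0.1, size=(n_projects, dim))   # project factors
A = rng.normal(scale=0.1, size=(n_apis, dim))       # API factors

# synthetic ground truth: each project "uses" 3 APIs
used = {p: set(rng.choice(n_apis, 3, replace=False))
        for p in range(n_projects)}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# pairwise updates: push a used API's score above an unused one's
lr = 0.05
for _ in range(3000):
    p = rng.integers(n_projects)
    i = rng.choice(list(used[p]))                   # used (positive) API
    j = rng.integers(n_apis)
    while j in used[p]:
        j = rng.integers(n_apis)                    # unused (negative) API
    u = P[p].copy()
    g = sigmoid(-(u @ (A[i] - A[j])))               # gradient weight
    P[p] += lr * g * (A[i] - A[j])
    A[i] += lr * g * u
    A[j] -= lr * g * u

# sanity check: used APIs should now score above a project's median API
scores = P @ A.T
hits = np.mean([np.mean([scores[p, i] > np.median(scores[p])
                         for i in used[p]]) for p in range(n_projects)])
print(round(hits, 2))
```

After training, most used APIs sit above their project's median score, i.e. the pairwise loss has pushed the "correct" orderings into place.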

Machine Translation in Indian Languages: Challenges and Resolution

Title Machine Translation in Indian Languages: Challenges and Resolution
Authors Raj Nath Patel, Prakash B. Pimpale, M Sasikumar
Abstract English to Indian language machine translation poses the challenge of structural and morphological divergence. This paper describes English to Indian language statistical machine translation using pre-ordering and suffix separation. The pre-ordering uses rules to transfer the structure of the source sentences prior to training and translation. This syntactic restructuring helps statistical machine translation to tackle the structural divergence and hence achieve better translation quality. The suffix separation is used to tackle the morphological divergence between English and highly agglutinative Indian languages. We demonstrate that the use of pre-ordering and suffix separation helps in improving the quality of English to Indian language machine translation.
Tasks Machine Translation
Published 2017-08-26
URL http://arxiv.org/abs/1708.07950v3
PDF http://arxiv.org/pdf/1708.07950v3.pdf
PWC https://paperswithcode.com/paper/machine-translation-in-indian-languages
Repo
Framework
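Suffix separation splits agglutinated case markers off word stems before training, so the statistical model sees them as separate tokens. A minimal sketch, with a hypothetical suffix list of my own (loosely Kannada-flavored; the paper's actual rule set is not reproduced here):

```python
# hypothetical suffix inventory, for illustration only
SUFFIXES = ["inda", "alli", "ige"]   # e.g. Kannada-like case markers

def separate_suffix(token, min_stem=3):
    """Split a trailing suffix off the stem so the SMT system can align
    stem and case marker independently (toy rule, not the paper's)."""
    for s in SUFFIXES:
        if token.endswith(s) and len(token) >= len(s) + min_stem:
            return token[:-len(s)] + " " + s
    return token

print(separate_suffix("maneyinda"))  # "from the house" -> stem + marker
print(separate_suffix("house"))      # unchanged: no listed suffix
```

Applied corpus-wide before training, this shrinks the vocabulary of inflected forms and gives the aligner many more stem/marker co-occurrences to learn from.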

The Causal Role of Astrocytes in Slow-Wave Rhythmogenesis: A Computational Modelling Study

Title The Causal Role of Astrocytes in Slow-Wave Rhythmogenesis: A Computational Modelling Study
Authors Leo Kozachkov, Konstantinos P. Michmizos
Abstract Finding the origin of slow and infra-slow oscillations could reveal or explain brain mechanisms in health and disease. Here, we present a biophysically constrained computational model of a neural network where the inclusion of astrocytes introduced slow and infra-slow oscillations through two distinct mechanisms. Specifically, we show how astrocytes can modulate the fast network activity through their slow inter-cellular calcium wave speed and amplitude and possibly cause the oscillatory imbalances observed in diseases commonly known for such abnormalities, namely Alzheimer’s disease, Parkinson’s disease, epilepsy, depression and ischemic stroke. This work aims to increase our knowledge of how astrocytes and neurons synergize to affect brain function and dysfunction.
Tasks
Published 2017-02-13
URL http://arxiv.org/abs/1702.03993v1
PDF http://arxiv.org/pdf/1702.03993v1.pdf
PWC https://paperswithcode.com/paper/the-causal-role-of-astrocytes-in-slow-wave
Repo
Framework