Paper Group ANR 483
FADO: A Deterministic Detection/Learning Algorithm
Title | FADO: A Deterministic Detection/Learning Algorithm |
Authors | Kristiaan Pelckmans |
Abstract | This paper proposes and studies a detection technique for adversarial scenarios (dubbed deterministic detection). This technique provides an alternative detection methodology in case the usual stochastic methods are not applicable: this can be because the studied phenomenon does not follow a stochastic sampling scheme, samples are high-dimensional and subsequent multiple-testing corrections render results overly conservative, sample sizes are too low for asymptotic results (e.g. the central limit theorem) to kick in, or one cannot allow for the small probability of failure inherent to stochastic approaches. This paper instead designs a method based on insights from machine learning and online learning theory: this detection algorithm, named Online FAult Detection (FADO), comes with theoretical guarantees of its detection capabilities. A version of the margin is found to regulate the detection performance of FADO. A precise expression is derived for bounding the performance, and experimental results are presented assessing the influence of involved quantities. A case study of scene detection is used to illustrate the approach. The technology is closely related to the linear perceptron rule, and inherits its computational attractiveness and its flexibility towards various extensions. |
Tasks | Fault Detection |
Published | 2017-11-07 |
URL | http://arxiv.org/abs/1711.02361v1 |
PDF | http://arxiv.org/pdf/1711.02361v1.pdf |
PWC | https://paperswithcode.com/paper/fado-a-deterministic-detectionlearning |
Repo | |
Framework | |
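The abstract above relates FADO to the linear perceptron rule. For orientation, here is a minimal sketch of the classic mistake-driven perceptron update. It is not the FADO algorithm itself; the function names and the toy data are illustrative assumptions.

```python
def perceptron_train(samples, labels, epochs=10):
    """Classic perceptron: update the weights only when a sample is misclassified.

    Labels are expected to be +1 or -1. Returns (weights, bias, mistake_count).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    mistakes = 0
    for _ in range(epochs):
        errors_this_pass = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # wrong side of (or on) the hyperplane
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
                errors_this_pass += 1
        if errors_this_pass == 0:  # a full pass without mistakes: stop early
            break
    return w, b, mistakes


def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane x falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical, linearly separable toy data.
samples = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (-3.0, -1.0)]
labels = [1, 1, -1, -1]
w, b, mistakes = perceptron_train(samples, labels)
```

On separable data the number of updates is bounded in terms of the margin, which is the kind of quantity the abstract says regulates detection performance.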
Entropic Trace Estimates for Log Determinants
Title | Entropic Trace Estimates for Log Determinants |
Authors | Jack Fitzsimons, Diego Granziol, Kurt Cutajar, Michael Osborne, Maurizio Filippone, Stephen Roberts |
Abstract | The scalable calculation of matrix determinants has been a bottleneck to the widespread application of many machine learning methods such as determinantal point processes, Gaussian processes, generalised Markov random fields, graph models and many others. In this work, we estimate log determinants under the framework of maximum entropy, given information in the form of moment constraints from stochastic trace estimation. The estimates demonstrate a significant improvement on state-of-the-art alternative methods, as shown on a wide variety of UFL sparse matrices. By taking the example of a general Markov random field, we also demonstrate how this approach can significantly accelerate inference in large-scale learning methods involving the log determinant. |
Tasks | Gaussian Processes, Point Processes |
Published | 2017-04-24 |
URL | http://arxiv.org/abs/1704.07223v1 |
PDF | http://arxiv.org/pdf/1704.07223v1.pdf |
PWC | https://paperswithcode.com/paper/entropic-trace-estimates-for-log-determinants |
Repo | |
Framework | |
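The abstract above uses moment constraints obtained from stochastic trace estimation. The sketch below shows one standard way to estimate the moments tr(A^k), a Hutchinson-style estimator with Rademacher probe vectors. The maximum entropy step of the paper is not shown, and the function names, probe count, and test matrix are assumptions made for illustration.

```python
import random


def matvec(a, v):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def hutchinson_moments(a, max_power, num_probes=30, seed=0):
    """Estimate tr(A^k) for k = 1..max_power.

    Uses E[z^T A^k z] = tr(A^k) for probe vectors z with independent +/-1 entries.
    Entry k-1 of the returned list is the estimate of tr(A^k).
    """
    rng = random.Random(seed)
    n = len(a)
    totals = [0.0] * max_power
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        v = z
        for k in range(max_power):
            v = matvec(a, v)  # after this line, v = A^(k+1) z
            totals[k] += sum(zi * vi for zi, vi in zip(z, v))
    return [t / num_probes for t in totals]


# For a diagonal matrix every probe gives the exact trace, which makes a handy check.
diag = [[0.5, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.125]]
moments = hutchinson_moments(diag, max_power=3)
```

Only matrix-vector products are needed, so the same idea applies to large sparse matrices where forming A^k explicitly would be too expensive.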
Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
Title | Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks |
Authors | Shankar Krishnan, Ying Xiao, Rif A. Saurous |
Abstract | Progress in deep learning is slowed by the days or weeks it takes to train large models. The natural solution of using more hardware is limited by diminishing returns, and leads to inefficient use of additional resources. In this paper, we present a large batch, stochastic optimization algorithm that is both faster than widely used algorithms for fixed amounts of computation, and also scales up substantially better as more computational resources become available. Our algorithm implicitly computes the inverse Hessian of each mini-batch to produce descent directions; we do so without either an explicit approximation to the Hessian or Hessian-vector products. We demonstrate the effectiveness of our algorithm by successfully training large ImageNet models (Inception-V3, Resnet-50, Resnet-101 and Inception-Resnet-V2) with mini-batch sizes of up to 32000 with no loss in validation error relative to current baselines, and no increase in the total number of steps. At smaller mini-batch sizes, our optimizer improves the validation error in these models by 0.8-0.9%. Alternatively, we can trade off this accuracy to reduce the number of training steps needed by roughly 10-30%. Our work is practical and easily usable by others – only one hyperparameter (learning rate) needs tuning, and furthermore, the algorithm is as computationally cheap as the commonly used Adam optimizer. |
Tasks | Stochastic Optimization |
Published | 2017-12-08 |
URL | http://arxiv.org/abs/1712.03298v1 |
PDF | http://arxiv.org/pdf/1712.03298v1.pdf |
PWC | https://paperswithcode.com/paper/neumann-optimizer-a-practical-optimization |
Repo | |
Framework | |
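The abstract above does not spell out how the inverse Hessian is obtained implicitly. Assuming the optimizer's name refers to the Neumann series, the textbook identity behind it is A^{-1} b = sum over k of (I - A)^k b, valid when the spectral radius of I - A is below 1. The sketch below illustrates only that identity on a small matrix; it is not the paper's optimizer, and the function name and example values are hypothetical.

```python
def neumann_solve(a, b, num_terms=200):
    """Approximate A^{-1} b by the truncated series sum_{k=0}^{num_terms} (I - A)^k b.

    Converges only if the spectral radius of (I - A) is below 1.
    """
    n = len(b)
    x = list(b)  # the k = 0 term of the series
    for _ in range(num_terms):
        ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        # x <- b + (I - A) x adds the next term of the series
        x = [b[i] + x[i] - ax[i] for i in range(n)]
    return x


# Hypothetical well-conditioned example: the eigenvalues of I - A lie well inside (-1, 1).
a = [[1.0, 0.2], [0.2, 0.8]]
b = [1.0, 2.0]
x = neumann_solve(a, b)
```

The recurrence needs only products with A, never an explicit inverse, which is why series of this kind are attractive when A is too large to factorize.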
Multiscale sequence modeling with a learned dictionary
Title | Multiscale sequence modeling with a learned dictionary |
Authors | Bart van Merriënboer, Amartya Sanyal, Hugo Larochelle, Yoshua Bengio |
Abstract | We propose a generalization of neural network sequence models. Instead of predicting one symbol at a time, our multi-scale model makes predictions over multiple, potentially overlapping multi-symbol tokens. A variation of the byte-pair encoding (BPE) compression algorithm is used to learn the dictionary of tokens that the model is trained with. When applied to language modelling, our model has the flexibility of character-level models while maintaining many of the performance benefits of word-level models. Our experiments show that this model performs better than a regular LSTM on language modeling tasks, especially for smaller models. |
Tasks | Language Modelling |
Published | 2017-07-03 |
URL | http://arxiv.org/abs/1707.00762v2 |
PDF | http://arxiv.org/pdf/1707.00762v2.pdf |
PWC | https://paperswithcode.com/paper/multiscale-sequence-modeling-with-a-learned |
Repo | |
Framework | |
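The abstract above learns its token dictionary with a variation of byte-pair encoding (BPE). The sketch below shows the standard BPE merge loop on a character sequence, not the paper's variation; the function names and the tie-breaking rule are assumptions.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each left-to-right occurrence of `pair` with the joined symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe(text, num_merges):
    """Greedily merge the most frequent adjacent symbol pair, up to num_merges times."""
    symbols = list(text)
    merges = []
    for _ in range(num_merges):
        counts = Counter(zip(symbols, symbols[1:]))
        if not counts:
            break
        # Most frequent pair; ties are broken lexicographically for determinism.
        best = min(counts, key=lambda p: (-counts[p], p))
        if counts[best] < 2:  # nothing repeats any more
            break
        merges.append(best)
        symbols = merge_pair(symbols, best)
    return merges, symbols


merges, symbols = learn_bpe("abababcab", num_merges=2)
# merges  -> [('a', 'b'), ('ab', 'ab')]
# symbols -> ['abab', 'ab', 'c', 'ab']
```

Each learned merge adds one multi-symbol token to the dictionary, so the vocabulary ranges from single characters up to frequent longer strings.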
Dual-fisheye lens stitching for 360-degree imaging
Title | Dual-fisheye lens stitching for 360-degree imaging |
Authors | Tuan Ho, Madhukar Budagavi |
Abstract | Dual-fisheye lens cameras have been increasingly used for 360-degree immersive imaging. However, the limited overlapping fields of view and misalignment between the two lenses give rise to visible discontinuities in the stitching boundaries. This paper introduces a novel method for dual-fisheye camera stitching that adaptively minimizes the discontinuities in the overlapping regions to generate full spherical 360-degree images. Results show that this approach can produce good quality stitched images for Samsung Gear 360 – a dual-fisheye camera, even with hard-to-stitch objects in the stitching borders. |
Tasks | |
Published | 2017-08-20 |
URL | http://arxiv.org/abs/1708.08988v1 |
PDF | http://arxiv.org/pdf/1708.08988v1.pdf |
PWC | https://paperswithcode.com/paper/dual-fisheye-lens-stitching-for-360-degree |
Repo | |
Framework | |
Super-Resolution of Wavelet-Encoded Images
Title | Super-Resolution of Wavelet-Encoded Images |
Authors | Vildan Atalay Aydin, Hassan Foroosh |
Abstract | Multiview super-resolution image reconstruction (SRIR) is often cast as a resampling problem by merging non-redundant data from multiple low-resolution (LR) images on a finer high-resolution (HR) grid, while inverting the effect of the camera point spread function (PSF). One main problem with multiview methods is that resampling from nonuniform samples (provided by LR images) and the inversion of the PSF are highly nonlinear and ill-posed problems. Non-linearity and ill-posedness are typically overcome by linearization and regularization, often through an iterative optimization process, which essentially trade off the very same information (i.e. high frequency) that we want to recover. We propose a novel point of view for multiview SRIR: Unlike existing multiview methods that reconstruct the entire spectrum of the HR image from the multiple given LR images, we derive explicit expressions that show how the high-frequency spectra of the unknown HR image are related to the spectra of the LR images. Therefore, by taking any of the LR images as the reference to represent the low-frequency spectra of the HR image, one can reconstruct the super-resolution image by focusing only on the reconstruction of the high-frequency spectra. This is very much like single-image methods, which extrapolate the spectrum of one image, except that we rely on information provided by all other views, rather than by prior constraints as in single-image methods (which may not be an accurate source of information). This is made possible by deriving and applying explicit closed-form expressions that define how the local high frequency information that we aim to recover for the reference high resolution image is related to the local low frequency information in the sequence of views. Results and comparisons with recently published state-of-the-art methods show the superiority of the proposed solution. |
Tasks | Image Reconstruction, Super-Resolution |
Published | 2017-05-03 |
URL | http://arxiv.org/abs/1705.01258v1 |
PDF | http://arxiv.org/pdf/1705.01258v1.pdf |
PWC | https://paperswithcode.com/paper/super-resolution-of-wavelet-encoded-images |
Repo | |
Framework | |
Memory-Efficient Global Refinement of Decision-Tree Ensembles and its Application to Face Alignment
Title | Memory-Efficient Global Refinement of Decision-Tree Ensembles and its Application to Face Alignment |
Authors | Nenad Markuš, Ivan Gogić, Igor S. Pandžić, Jörgen Ahlberg |
Abstract | Ren et al. recently introduced a method for aggregating multiple decision trees into a strong predictor by interpreting a path taken by a sample down each tree as a binary vector and performing linear regression on top of these vectors stacked together. They provided experimental evidence that the method offers advantages over the usual approaches for combining decision trees (random forests and boosting). The method truly shines when the regression target is a large vector with correlated dimensions, such as a 2D face shape represented with the positions of several facial landmarks. However, we argue that their basic method is not applicable in many practical scenarios due to large memory requirements. This paper shows how this issue can be solved through the use of quantization and architectural changes of the predictor that maps decision tree-derived encodings to the desired output. |
Tasks | Face Alignment, Quantization |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08481v2 |
PDF | http://arxiv.org/pdf/1702.08481v2.pdf |
PWC | https://paperswithcode.com/paper/memory-efficient-global-refinement-of |
Repo | |
Framework | |
DeepSafe: A Data-driven Approach for Checking Adversarial Robustness in Neural Networks
Title | DeepSafe: A Data-driven Approach for Checking Adversarial Robustness in Neural Networks |
Authors | Divya Gopinath, Guy Katz, Corina S. Pasareanu, Clark Barrett |
Abstract | Deep neural networks have become widely used, obtaining remarkable results in domains such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, and bio-informatics, where they have produced results comparable to human experts. However, these networks can be easily fooled by adversarial perturbations: minimal changes to correctly-classified inputs, that cause the network to mis-classify them. This phenomenon represents a concern for both safety and security, but it is currently unclear how to measure a network’s robustness against such perturbations. Existing techniques are limited to checking robustness around a few individual input points, providing only very limited guarantees. We propose a novel approach for automatically identifying safe regions of the input space, within which the network is robust against adversarial perturbations. The approach is data-guided, relying on clustering to identify well-defined geometric regions as candidate safe regions. We then utilize verification techniques to confirm that these regions are safe or to provide counter-examples showing that they are not safe. We also introduce the notion of targeted robustness which, for a given target label and region, ensures that a NN does not map any input in the region to the target label. We evaluated our technique on the MNIST dataset and on a neural network implementation of a controller for the next-generation Airborne Collision Avoidance System for unmanned aircraft (ACAS Xu). For these networks, our approach identified multiple regions which were completely safe as well as some which were only safe for specific labels. It also discovered several adversarial perturbations of interest. |
Tasks | Machine Translation, Speech Recognition |
Published | 2017-10-02 |
URL | https://arxiv.org/abs/1710.00486v2 |
PDF | https://arxiv.org/pdf/1710.00486v2.pdf |
PWC | https://paperswithcode.com/paper/deepsafe-a-data-driven-approach-for-checking |
Repo | |
Framework | |
Objective Bayesian Analysis for Change Point Problems
Title | Objective Bayesian Analysis for Change Point Problems |
Authors | Laurentiu Hinoveanu, Fabrizio Leisen, Cristiano Villa |
Abstract | In this paper we present a loss-based approach to change point analysis. In particular, we look at the problem from two perspectives. The first focuses on the definition of a prior when the number of change points is known a priori. The second contribution aims to estimate the number of change points by using a loss-based approach recently introduced in the literature. The latter considers change point estimation as a model selection exercise. We show the performance of the proposed approach on simulated data and real data sets. |
Tasks | Model Selection |
Published | 2017-02-17 |
URL | http://arxiv.org/abs/1702.05462v2 |
PDF | http://arxiv.org/pdf/1702.05462v2.pdf |
PWC | https://paperswithcode.com/paper/objective-bayesian-analysis-for-change-point |
Repo | |
Framework | |
Fast mixing for Latent Dirichlet allocation
Title | Fast mixing for Latent Dirichlet allocation |
Authors | Johan Jonasson |
Abstract | Markov chain Monte Carlo (MCMC) algorithms are ubiquitous in probability theory in general and in machine learning in particular. A Markov chain is devised so that its stationary distribution is some probability distribution of interest. Then one samples from the given distribution by running the Markov chain for a “long time” until it appears to be stationary and then collects the sample. However, these chains are often very complex and there are no theoretical guarantees that stationarity is actually reached. In this paper we study the Gibbs sampler of the posterior distribution of a very simple case of Latent Dirichlet Allocation, arguably the best-known Bayesian unsupervised learning model for text generation and text classification. It is shown that when the corpus consists of two long documents of equal length $m$ and the vocabulary consists of only two different words, the mixing time is at most of order $m^2\log m$ (which corresponds to $m\log m$ rounds over the corpus). It will be apparent from our analysis that the mixing time is very likely not much worse in the more relevant case when the number of documents and the size of the vocabulary are also large, as long as each word is represented a large number of times in each document, even though the computations involved may be intractable. |
Tasks | Text Classification, Text Generation |
Published | 2017-01-11 |
URL | http://arxiv.org/abs/1701.02960v2 |
PDF | http://arxiv.org/pdf/1701.02960v2.pdf |
PWC | https://paperswithcode.com/paper/fast-mixing-for-latent-dirichlet-allocation |
Repo | |
Framework | |
Inducing Interpretability in Knowledge Graph Embeddings
Title | Inducing Interpretability in Knowledge Graph Embeddings |
Authors | Chandrahas, Tathagata Sengupta, Cibi Pragadeesh, Partha Pratim Talukdar |
Abstract | We study the problem of inducing interpretability in KG embeddings. Specifically, we explore the Universal Schema (Riedel et al., 2013) and propose a method to induce interpretability. Many vector space models have been proposed for the problem; however, most of these methods do not address the interpretability (semantics) of individual dimensions. In this work, we study this problem and propose a method for inducing interpretability in KG embeddings using entity co-occurrence statistics. The proposed method significantly improves the interpretability, while maintaining comparable performance in other KG tasks. |
Tasks | Knowledge Graph Embeddings |
Published | 2017-12-10 |
URL | http://arxiv.org/abs/1712.03547v1 |
PDF | http://arxiv.org/pdf/1712.03547v1.pdf |
PWC | https://paperswithcode.com/paper/inducing-interpretability-in-knowledge-graph |
Repo | |
Framework | |
Minimax Statistical Learning with Wasserstein Distances
Title | Minimax Statistical Learning with Wasserstein Distances |
Authors | Jaeho Lee, Maxim Raginsky |
Abstract | As opposed to standard empirical risk minimization (ERM), distributionally robust optimization aims to minimize the worst-case risk over a larger ambiguity set containing the original empirical distribution of the training data. In this work, we describe a minimax framework for statistical learning with ambiguity sets given by balls in Wasserstein space. In particular, we prove generalization bounds that involve the covering number properties of the original ERM problem. As an illustrative example, we provide generalization guarantees for transport-based domain adaptation problems where the Wasserstein distance between the source and target domain distributions can be reliably estimated from unlabeled samples. |
Tasks | Domain Adaptation |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07815v2 |
PDF | http://arxiv.org/pdf/1705.07815v2.pdf |
PWC | https://paperswithcode.com/paper/minimax-statistical-learning-with-wasserstein |
Repo | |
Framework | |
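The abstract above relies on estimating a Wasserstein distance from samples. As a minimal illustration, and only for the one-dimensional case with equal sample sizes (an assumption of this sketch, not of the paper), the 1-Wasserstein distance between two empirical distributions is the mean absolute difference of their sorted samples. The function name and data are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size empirical distributions on the real line.

    In one dimension the optimal transport plan matches order statistics,
    so sorting both samples and pairing them up is enough.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical source and target samples: the target is the source shifted by 0.5.
source = [0.0, 1.0, 2.0, 3.0]
target = [2.5, 0.5, 1.5, 3.5]
distance = wasserstein_1d(source, target)  # 0.5
```

In higher dimensions the distance no longer reduces to sorting and generally requires solving an optimal transport problem.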
WebAPIRec: Recommending Web APIs to Software Projects via Personalized Ranking
Title | WebAPIRec: Recommending Web APIs to Software Projects via Personalized Ranking |
Authors | Ferdian Thung, Richard J. Oentaryo, David Lo, Yuan Tian |
Abstract | Application programming interfaces (APIs) offer a plethora of functionalities for developers to reuse without reinventing the wheel. Identifying the appropriate APIs given a project requirement is critical for the success of a project, as many functionalities can be reused to achieve faster development. However, the massive number of APIs would often hinder the developers’ ability to quickly find the right APIs. In this light, we propose a new, automated approach called WebAPIRec that takes as input a project profile and outputs a ranked list of web APIs that can be used to implement the project. At its heart, WebAPIRec employs a personalized ranking model that ranks web APIs specific (personalized) to a project. Based on the historical data of web API usages, WebAPIRec learns a model that minimizes the incorrect ordering of web APIs, i.e., when a used web API is ranked lower than an unused (or a not-yet-used) web API. We have evaluated our approach on a dataset comprising 9,883 web APIs and 4,315 web application projects from ProgrammableWeb with promising results. For 84.0% of the projects, WebAPIRec is able to successfully return correct APIs that are used to implement the projects in the top-5 positions. This is substantially better than the recommendations provided by ProgrammableWeb’s native search functionality. WebAPIRec also outperforms McMillan et al.'s application search engine and popularity-based recommendation. |
Tasks | |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00561v1 |
PDF | http://arxiv.org/pdf/1705.00561v1.pdf |
PWC | https://paperswithcode.com/paper/webapirec-recommending-web-apis-to-software |
Repo | |
Framework | |
Machine Translation in Indian Languages: Challenges and Resolution
Title | Machine Translation in Indian Languages: Challenges and Resolution |
Authors | Raj Nath Patel, Prakash B. Pimpale, M Sasikumar |
Abstract | English to Indian language machine translation poses the challenge of structural and morphological divergence. This paper describes English to Indian language statistical machine translation using pre-ordering and suffix separation. The pre-ordering uses rules to transfer the structure of the source sentences prior to training and translation. This syntactic restructuring helps statistical machine translation to tackle the structural divergence and hence improves translation quality. The suffix separation is used to tackle the morphological divergence between English and highly agglutinative Indian languages. We demonstrate that the use of pre-ordering and suffix separation helps in improving the quality of English to Indian language machine translation. |
Tasks | Machine Translation |
Published | 2017-08-26 |
URL | http://arxiv.org/abs/1708.07950v3 |
PDF | http://arxiv.org/pdf/1708.07950v3.pdf |
PWC | https://paperswithcode.com/paper/machine-translation-in-indian-languages |
Repo | |
Framework | |
The Causal Role of Astrocytes in Slow-Wave Rhythmogenesis: A Computational Modelling Study
Title | The Causal Role of Astrocytes in Slow-Wave Rhythmogenesis: A Computational Modelling Study |
Authors | Leo Kozachkov, Konstantinos P. Michmizos |
Abstract | Finding the origin of slow and infra-slow oscillations could reveal or explain brain mechanisms in health and disease. Here, we present a biophysically constrained computational model of a neural network where the inclusion of astrocytes introduced slow and infra-slow oscillations through two distinct mechanisms. Specifically, we show how astrocytes can modulate the fast network activity through their slow inter-cellular calcium wave speed and amplitude and possibly cause the oscillatory imbalances observed in diseases commonly known for such abnormalities, namely Alzheimer’s disease, Parkinson’s disease, epilepsy, depression and ischemic stroke. This work aims to increase our knowledge on how astrocytes and neurons synergize to affect brain function and dysfunction. |
Tasks | |
Published | 2017-02-13 |
URL | http://arxiv.org/abs/1702.03993v1 |
PDF | http://arxiv.org/pdf/1702.03993v1.pdf |
PWC | https://paperswithcode.com/paper/the-causal-role-of-astrocytes-in-slow-wave |
Repo | |
Framework | |