Paper Group ANR 257
Statistical Properties of European Languages and Voynich Manuscript Analysis. The Computational Power of Dynamic Bayesian Networks. On the Existence of a Projective Reconstruction. Primal-Dual Rates and Certificates. Learning to predict where to look in interactive environments using deep recurrent q-learning. Nonsymbolic Text Representation. Faster variational inducing input Gaussian process classification. Deep Variational Canonical Correlation Analysis. LSTM with Working Memory. Expectation Consistent Approximate Inference: Generalizations and Convergence. Keyphrase Extraction using Sequential Labeling. Multi-Output Artificial Neural Network for Storm Surge Prediction in North Carolina. Distributed Deep Learning Using Synchronous Stochastic Gradient Descent. Temporal Attention Model for Neural Machine Translation. Distilling Information Reliability and Source Trustworthiness from Digital Traces.
Statistical Properties of European Languages and Voynich Manuscript Analysis
Title | Statistical Properties of European Languages and Voynich Manuscript Analysis |
Authors | Andronik Arutyunov, Leonid Borisov, Sergey Fedorov, Anastasiya Ivchenko, Elizabeth Kirina-Lilinskaya, Yurii Orlov, Konstantin Osminin, Sergey Shilin, Dmitriy Zeniuk |
Abstract | The statistical properties of letter frequencies in European literary texts are investigated. The logarithmic dependence of letter sequences is examined for one-language and two-language texts. A pair of languages is suggested for the Voynich Manuscript. The internal structure of the Manuscript is considered. Spectral portraits of two-letter distributions are constructed. |
Tasks | |
Published | 2016-11-18 |
URL | http://arxiv.org/abs/1611.09122v1 |
http://arxiv.org/pdf/1611.09122v1.pdf | |
PWC | https://paperswithcode.com/paper/statistical-properties-of-european-languages |
Repo | |
Framework | |
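As a concrete illustration of the kind of analysis this paper performs, the sketch below computes ranked letter frequencies for a text and fits the cumulative frequency against the logarithm of rank. The fitting form and the toy sample are illustrative assumptions, not the authors' pipeline.

```python
# A minimal sketch (not the authors' code) of letter-frequency rank analysis:
# ranked relative letter frequencies and a least-squares fit of cumulative
# frequency against log(rank).
from collections import Counter
import math

def letter_ranks(text):
    """Return letters sorted by descending frequency with relative frequencies."""
    letters = [c.lower() for c in text if c.isalpha()]
    counts = Counter(letters)
    total = sum(counts.values())
    return [(ch, n / total) for ch, n in counts.most_common()]

def log_fit(ranked):
    """Least-squares fit: cumulative frequency ~ a * ln(rank) + b."""
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    cum, ys = 0.0, []
    for _, f in ranked:
        cum += f
        ys.append(cum)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

sample = "this is a tiny english sample used only to illustrate the interface"
ranked = letter_ranks(sample)
a, b = log_fit(ranked)
print(ranked[:5], (round(a, 3), round(b, 3)))
```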
The Computational Power of Dynamic Bayesian Networks
Title | The Computational Power of Dynamic Bayesian Networks |
Authors | Joshua Brulé |
Abstract | This paper considers the computational power of constant size, dynamic Bayesian networks. Although discrete dynamic Bayesian networks are no more powerful than hidden Markov models, dynamic Bayesian networks with continuous random variables and discrete children of continuous parents are capable of performing Turing-complete computation. With modified versions of existing algorithms for belief propagation, such a simulation can be carried out in real time. This result suggests that dynamic Bayesian networks may be more powerful than previously considered. Relationships to causal models and recurrent neural networks are also discussed. |
Tasks | |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06125v1 |
http://arxiv.org/pdf/1603.06125v1.pdf | |
PWC | https://paperswithcode.com/paper/the-computational-power-of-dynamic-bayesian |
Repo | |
Framework | |
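The Turing-completeness construction is not spelled out in the abstract; the hedged sketch below illustrates the standard device behind such results for continuous-state models: an unbounded binary stack encoded in a single real number, so a constant-size network can carry unbounded state. The base-4 encoding is an illustrative assumption, not the paper's construction.

```python
# A hedged illustration (not from the paper) of encoding an unbounded binary
# stack in one real-valued state in (0, 1). Base-4 encoding keeps the
# push/pop maps contractive and exactly invertible.
def push(s: float, bit: int) -> float:
    """Push a bit: s' = (s + 2*bit + 1) / 4, keeping s' in (0, 1)."""
    return (s + 2 * bit + 1) / 4

def top(s: float) -> int:
    """Read the most recently pushed bit."""
    return 1 if s >= 0.5 else 0

def pop(s: float) -> float:
    """Invert push to recover the previous stack state."""
    return 4 * s - 2 * top(s) - 1

s = 0.0
for bit in [1, 0, 1]:   # push 1, then 0, then 1
    s = push(s, bit)
out = []
for _ in range(3):      # pop back in LIFO order
    out.append(top(s))
    s = pop(s)
print(out)  # [1, 0, 1]
```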
On the Existence of a Projective Reconstruction
Title | On the Existence of a Projective Reconstruction |
Authors | Hon-Leung Lee |
Abstract | In this note we study the connection between the existence of a projective reconstruction and the existence of a fundamental matrix satisfying the epipolar constraints. |
Tasks | |
Published | 2016-08-19 |
URL | http://arxiv.org/abs/1608.05518v1 |
http://arxiv.org/pdf/1608.05518v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-existence-of-a-projective |
Repo | |
Framework | |
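The epipolar constraint at the heart of the note can be checked numerically. The sketch below builds a consistent two-camera toy instance and verifies that the rank-2 fundamental matrix F = [t]_x R satisfies x2^T F x1 = 0 for all correspondences; the camera setup is an illustrative assumption.

```python
# A small numpy sketch of the epipolar constraint: a fundamental matrix F
# (rank 2) must satisfy x2^T F x1 = 0 for every corresponding pair of image
# points in homogeneous coordinates.
import numpy as np

def epipolar_residuals(F, pts1, pts2):
    """|x2^T F x1| for each correspondence; zero iff the constraints hold."""
    return np.abs(np.einsum('ni,ij,nj->n', pts2, F, pts1))

# Build a consistent toy instance: project 3D points with two cameras.
rng = np.random.default_rng(0)
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])            # camera 1: [I | 0]
R = np.eye(3); t = np.array([[1.0], [0.0], [0.0]])
P2 = np.hstack([R, t])                                    # camera 2: [R | t]
X = np.hstack([rng.normal(size=(8, 3)) + [0, 0, 5], np.ones((8, 1))])
x1 = (P1 @ X.T).T; x2 = (P2 @ X.T).T                      # homogeneous images

# For this pair, F = [t]_x R (up to scale), which has rank 2.
tx = np.array([[0, -t[2, 0], t[1, 0]],
               [t[2, 0], 0, -t[0, 0]],
               [-t[1, 0], t[0, 0], 0]])
F = tx @ R
print(np.linalg.matrix_rank(F), epipolar_residuals(F, x1, x2).max())
```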
Primal-Dual Rates and Certificates
Title | Primal-Dual Rates and Certificates |
Authors | Celestine Dünner, Simone Forte, Martin Takáč, Martin Jaggi |
Abstract | We propose an algorithm-independent framework to equip existing optimization methods with primal-dual certificates. Such certificates and corresponding convergence-rate guarantees are important for practitioners to diagnose progress, in particular in machine learning applications. We obtain new primal-dual convergence rates, e.g., for the Lasso as well as many L1, Elastic Net, group Lasso and TV-regularized problems. The theory applies to any norm-regularized generalized linear model. Our approach provides efficiently computable duality gaps which are globally defined, without modifying the original problems in the region of interest. |
Tasks | |
Published | 2016-02-16 |
URL | http://arxiv.org/abs/1602.05205v2 |
http://arxiv.org/pdf/1602.05205v2.pdf | |
PWC | https://paperswithcode.com/paper/primal-dual-rates-and-certificates |
Repo | |
Framework | |
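For the Lasso case mentioned in the abstract, a duality-gap certificate can be computed by rescaling the residual into a dual-feasible point. The sketch below is a minimal version of that standard construction, not the paper's general framework.

```python
# A minimal sketch (standard Lasso duality, not the paper's framework) of a
# globally defined, efficiently computable duality gap certificate. For
# P(w) = 0.5*||Aw - b||^2 + lam*||w||_1, a dual-feasible point is obtained
# by rescaling the residual so that ||A^T theta||_inf <= lam.
import numpy as np

def lasso_duality_gap(A, b, w, lam):
    r = A @ w - b
    primal = 0.5 * r @ r + lam * np.abs(w).sum()
    scale = min(1.0, lam / max(np.abs(A.T @ r).max(), 1e-12))
    theta = scale * r                        # dual-feasible point
    dual = -0.5 * theta @ theta - theta @ b  # dual objective D(theta)
    return primal - dual                     # >= 0; bounds suboptimality of w

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 20)); b = rng.normal(size=50)
w = np.zeros(20)
lam = 0.1 * np.abs(A.T @ b).max()
step = 1.0 / np.linalg.norm(A, 2) ** 2
for _ in range(200):                          # plain ISTA steps shrink the gap
    g = w - step * A.T @ (A @ w - b)
    w = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)
print(lasso_duality_gap(A, b, w, lam))
```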
Learning to predict where to look in interactive environments using deep recurrent q-learning
Title | Learning to predict where to look in interactive environments using deep recurrent q-learning |
Authors | Sajad Mousavi, Michael Schukat, Enda Howley, Ali Borji, Nasser Mozayani |
Abstract | Bottom-Up (BU) saliency models do not perform well in complex interactive environments where humans are actively engaged in tasks (e.g., sandwich making and playing video games). In this paper, we leverage Reinforcement Learning (RL) to highlight task-relevant locations of input frames. We propose a soft attention mechanism combined with the Deep Q-Network (DQN) model to teach an RL agent how to play a game and where to look by focusing on the most pertinent parts of its visual input. Our evaluations on several Atari 2600 games show that the soft-attention-based model predicts fixation locations significantly better than bottom-up models such as the Itti-Koch saliency and Graph-Based Visual Saliency (GBVS) models. |
Tasks | Atari Games, Q-Learning |
Published | 2016-12-17 |
URL | http://arxiv.org/abs/1612.05753v2 |
http://arxiv.org/pdf/1612.05753v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-predict-where-to-look-in |
Repo | |
Framework | |
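A hedged sketch of the mechanism described above: each location of a convolutional feature map is scored against the recurrent hidden state, a softmax turns scores into a fixation-like distribution, and the context vector is their weighted sum. The shapes and the additive scoring function are illustrative assumptions, not the paper's exact architecture.

```python
# Soft spatial attention over a conv feature map, conditioned on a recurrent
# state; alpha plays the role of the predicted "where to look" distribution.
import numpy as np

def soft_attention(features, h, Wf, Wh, v):
    """features: (L, C) conv-map locations; h: (D,) recurrent state."""
    scores = np.tanh(features @ Wf + h @ Wh) @ v        # (L,) alignment scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                # attention weights
    context = alpha @ features                          # (C,) glimpse vector
    return context, alpha

rng = np.random.default_rng(0)
L, C, D, K = 49, 64, 32, 32            # e.g. a 7x7 map with 64 channels
features = rng.normal(size=(L, C))
h = rng.normal(size=D)
Wf = rng.normal(size=(C, K)) * 0.1
Wh = rng.normal(size=(D, K)) * 0.1
v = rng.normal(size=K) * 0.1
context, alpha = soft_attention(features, h, Wf, Wh, v)
print(context.shape, alpha.argmax())   # most attended location
```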
Nonsymbolic Text Representation
Title | Nonsymbolic Text Representation |
Authors | Hinrich Schuetze, Heike Adel, Ehsaneddin Asgari |
Abstract | We introduce the first generic text representation model that is completely nonsymbolic, i.e., it does not require the availability of a segmentation or tokenization method that attempts to identify words or other symbolic units in text. This applies to training the parameters of the model on a training corpus as well as to applying it when computing the representation of a new text. We show that our model performs better than prior work on an information extraction and a text denoising task. |
Tasks | Denoising, Tokenization |
Published | 2016-10-03 |
URL | http://arxiv.org/abs/1610.00479v3 |
http://arxiv.org/pdf/1610.00479v3.pdf | |
PWC | https://paperswithcode.com/paper/nonsymbolic-text-representation |
Repo | |
Framework | |
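One way to realize a nonsymbolic representation in the spirit of this paper is hashed character n-gram counts, which require no tokenizer or segmentation anywhere in the pipeline. The sketch below is such an illustration under assumed n-gram sizes and hashing dimension, not the authors' model.

```python
# Map text to a fixed-size vector of hashed character n-gram counts, with no
# word segmentation. Note: Python's str hash is salted per process, so the
# representation is consistent within a single run.
import numpy as np

def char_ngram_vector(text, ns=(2, 3, 4), dim=1024):
    v = np.zeros(dim)
    for n in ns:
        for i in range(len(text) - n + 1):
            v[hash(text[i:i + n]) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

a = char_ngram_vector("the cat sat on the mat")
b = char_ngram_vector("the cat sat on a mat")
c = char_ngram_vector("gradient descent converges")
print(round(float(a @ b), 3), round(float(a @ c), 3))  # similar >> dissimilar
```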
Faster variational inducing input Gaussian process classification
Title | Faster variational inducing input Gaussian process classification |
Authors | Pavel Izmailov, Dmitry Kropotov |
Abstract | Gaussian processes (GP) provide a prior over functions and allow finding complex regularities in data. Gaussian processes are successfully used for classification/regression problems and dimensionality reduction. In this work we consider the classification problem only. The complexity of standard methods for GP-classification scales cubically with the size of the training dataset, which makes them inapplicable to big data problems. Therefore, a variety of methods has been introduced to overcome this limitation. In this paper we focus on methods based on so-called inducing inputs. This approach is based on variational inference and proposes a particular lower bound on the marginal likelihood (evidence). The bound is then maximized w.r.t. the parameters of the kernel function of the Gaussian process, thus fitting the model to data. The computational complexity of this method is $O(nm^2)$, where $m$ is the number of inducing inputs used by the model and is assumed to be substantially smaller than the size of the dataset $n$. Recently, a new evidence lower bound for the GP-classification problem was introduced. It allows using stochastic optimization, which makes it suitable for big data problems. However, the new lower bound depends on $O(m^2)$ variational parameters, which makes optimization challenging for large $m$. In this work we develop a new approach for training inducing-input GP models for classification problems. We use a quadratic approximation of several terms in the aforementioned evidence lower bound, obtaining analytical expressions for the optimal values of most of the parameters, thus substantially reducing the dimension of the optimization space. In our experiments we achieve results as good as or better than those of the existing method. Moreover, our method does not require the user to manually set the learning rate, making it more practical than the existing method. |
Tasks | Dimensionality Reduction, Gaussian Processes, Stochastic Optimization |
Published | 2016-11-18 |
URL | http://arxiv.org/abs/1611.06132v1 |
http://arxiv.org/pdf/1611.06132v1.pdf | |
PWC | https://paperswithcode.com/paper/faster-variational-inducing-input-gaussian |
Repo | |
Framework | |
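The $O(nm^2)$ complexity quoted above comes from replacing the $n \times n$ kernel matrix with a low-rank inducing-point (Nystrom) approximation, so everything downstream touches only $n \times m$ and $m \times m$ matrices. The sketch below shows that substitution on a toy RBF kernel; it illustrates the cost structure, not the paper's training method.

```python
# Nystrom approximation K ~= Knm Kmm^{-1} Kmn via inducing inputs Z.
import numpy as np

def rbf(X, Z, ell=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(0)
n, m = 500, 20
X = rng.uniform(-3, 3, size=(n, 1))
Z = np.linspace(-3, 3, m)[:, None]            # inducing inputs

Knm = rbf(X, Z)                               # n x m
Kmm = rbf(Z, Z) + 1e-8 * np.eye(m)            # m x m (jittered)
L = np.linalg.cholesky(Kmm)
A = np.linalg.solve(L, Knm.T)                 # m x n solve: O(n m^2)
K_nystrom = A.T @ A                           # == Knm Kmm^{-1} Kmn

K_exact = rbf(X, X)
print(np.abs(K_exact - K_nystrom).max())      # small for well-placed Z
```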
Deep Variational Canonical Correlation Analysis
Title | Deep Variational Canonical Correlation Analysis |
Authors | Weiran Wang, Xinchen Yan, Honglak Lee, Karen Livescu |
Abstract | We present deep variational canonical correlation analysis (VCCA), a deep multi-view learning model that extends the latent variable model interpretation of linear CCA to nonlinear observation models parameterized by deep neural networks. We derive variational lower bounds of the data likelihood by parameterizing the posterior probability of the latent variables from the view that is available at test time. We also propose a variant of VCCA called VCCA-private that can, in addition to the “common variables” underlying both views, extract the “private variables” within each view, and disentangles the shared and private information for multi-view data without hard supervision. Experimental results on real-world datasets show that our methods are competitive across domains. |
Tasks | Multi-View Learning |
Published | 2016-10-11 |
URL | http://arxiv.org/abs/1610.03454v3 |
http://arxiv.org/pdf/1610.03454v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-variational-canonical-correlation |
Repo | |
Framework | |
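A hedged PyTorch sketch of the VCCA objective described above: encode the view available at test time into q(z|x1), decode both views from a sampled z, and minimize reconstruction error plus the KL to the prior (the negative variational lower bound). Layer sizes and Gaussian likelihoods are toy assumptions, not the paper's architectures.

```python
import torch
import torch.nn as nn

class VCCA(nn.Module):
    def __init__(self, d1=20, d2=15, dz=8, h=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d1, h), nn.ReLU(), nn.Linear(h, 2 * dz))
        self.dec1 = nn.Sequential(nn.Linear(dz, h), nn.ReLU(), nn.Linear(h, d1))
        self.dec2 = nn.Sequential(nn.Linear(dz, h), nn.ReLU(), nn.Linear(h, d2))

    def loss(self, x1, x2):
        mu, logvar = self.enc(x1).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        rec = ((self.dec1(z) - x1) ** 2).sum(-1) + ((self.dec2(z) - x2) ** 2).sum(-1)
        kl = 0.5 * (mu**2 + logvar.exp() - logvar - 1).sum(-1)
        return (rec + kl).mean()          # negative ELBO over both views

model = VCCA()
x1, x2 = torch.randn(32, 20), torch.randn(32, 15)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
opt.zero_grad(); model.loss(x1, x2).backward(); opt.step()
```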
LSTM with Working Memory
Title | LSTM with Working Memory |
Authors | Andrew Pulver, Siwei Lyu |
Abstract | Previous RNN architectures have largely been superseded by LSTM, or “Long Short-Term Memory”. Since its introduction, there have been many variations on this simple design. However, it is still widely used and we are not aware of a gated-RNN architecture that outperforms LSTM in a broad sense while remaining as simple and efficient. In this paper we propose a modified LSTM-like architecture. Our architecture is still simple and achieves better performance on the tasks that we tested it on. We also introduce a new RNN performance benchmark that uses handwritten digits and stresses several important network capabilities. |
Tasks | |
Published | 2016-05-06 |
URL | http://arxiv.org/abs/1605.01988v3 |
http://arxiv.org/pdf/1605.01988v3.pdf | |
PWC | https://paperswithcode.com/paper/lstm-with-working-memory |
Repo | |
Framework | |
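The abstract does not give the proposed gating, so as a reference point the sketch below implements one step of the standard LSTM cell that the paper modifies; it is the textbook baseline, not the proposed variant.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """W: (4H, D), U: (4H, H), b: (4H,); gates stacked as [i, f, o, g]."""
    zi, zf, zo, zg = np.split(W @ x + U @ h + b, 4)
    i, f, o, g = sigmoid(zi), sigmoid(zf), sigmoid(zo), np.tanh(zg)
    c_new = f * c + i * g          # gated update of the memory cell
    h_new = o * np.tanh(c_new)     # exposed hidden state
    return h_new, c_new

D, H = 8, 16
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * H, D)) * 0.1
U = rng.normal(size=(4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
print(h.shape, c.shape)
```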
Expectation Consistent Approximate Inference: Generalizations and Convergence
Title | Expectation Consistent Approximate Inference: Generalizations and Convergence |
Authors | Alyson K. Fletcher, Mojtaba Sahraee-Ardakan, Sundeep Rangan, Philip Schniter |
Abstract | Approximations of loopy belief propagation, including expectation propagation and approximate message passing, have attracted considerable attention for probabilistic inference problems. This paper proposes and analyzes a generalization of Opper and Winther’s expectation consistent (EC) approximate inference method. The proposed method, called Generalized Expectation Consistency (GEC), can be applied to both maximum a posteriori (MAP) and minimum mean squared error (MMSE) estimation. Here we characterize its fixed points, convergence, and performance relative to the replica prediction of optimality. |
Tasks | |
Published | 2016-02-25 |
URL | http://arxiv.org/abs/1602.07795v2 |
http://arxiv.org/pdf/1602.07795v2.pdf | |
PWC | https://paperswithcode.com/paper/expectation-consistent-approximate-inference |
Repo | |
Framework | |
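A hedged 1-D toy (not the paper's GEC algorithm) showing the expectation-consistency idea: two exponential-family site approximations are iterated until both tilted distributions agree on their first and second moments, which is the fixed-point condition EC/EP-style methods converge to. Moments are computed on a grid; the factors are illustrative choices.

```python
import numpy as np

x = np.linspace(-8, 8, 4001)
f1 = np.exp(-0.5 * x**2)             # Gaussian prior factor
f2 = 1.0 / (1.0 + np.exp(-3 * x))    # step-like likelihood factor

def moments(p):
    p = p / np.trapz(p, x)
    m = np.trapz(x * p, x)
    v = np.trapz((x - m) ** 2 * p, x)
    return m, v

def natural(m, v):                    # (eta1, eta2) of exp(eta1*x + eta2*x^2)
    return m / v, -0.5 / v

g1 = np.zeros(2)                      # site approximating f2, paired with f1
g2 = np.zeros(2)                      # site approximating f1, paired with f2
for _ in range(50):
    t1 = f1 * np.exp(g1[0] * x + g1[1] * x**2)   # tilted distribution 1
    m, v = moments(t1)
    g2 = np.array(natural(m, v)) - g1            # moment match minus cavity
    t2 = f2 * np.exp(g2[0] * x + g2[1] * x**2)   # tilted distribution 2
    m2, v2 = moments(t2)
    g1 = np.array(natural(m2, v2)) - g2

print(moments(t1), moments(t2))       # consistent moments at the fixed point
```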
Keyphrase Extraction using Sequential Labeling
Title | Keyphrase Extraction using Sequential Labeling |
Authors | Sujatha Das Gollapalli, Xiao-li Li |
Abstract | Keyphrases efficiently summarize a document’s content and are used in various document processing and retrieval tasks. Several unsupervised techniques and classifiers exist for extracting keyphrases from text documents. Most of these methods operate at a phrase-level and rely on part-of-speech (POS) filters for candidate phrase generation. In addition, they do not directly handle keyphrases of varying lengths. We overcome these modeling shortcomings by addressing keyphrase extraction as a sequential labeling task in this paper. We explore a basic set of features commonly used in NLP tasks as well as predictions from various unsupervised methods to train our taggers. In addition to a more natural modeling for the keyphrase extraction problem, we show that tagging models yield significant performance benefits over existing state-of-the-art extraction methods. |
Tasks | |
Published | 2016-08-01 |
URL | http://arxiv.org/abs/1608.00329v2 |
http://arxiv.org/pdf/1608.00329v2.pdf | |
PWC | https://paperswithcode.com/paper/keyphrase-extraction-using-sequential |
Repo | |
Framework | |
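A small sketch of the paper's reformulation: keyphrase extraction becomes sequential labeling by converting gold keyphrases into per-token BIO tags, which a tagger is then trained on. The BIO scheme is the standard one; the paper's feature set is richer than anything shown here. Note how multi-word phrases get one B tag plus I tags, which is how the formulation handles keyphrases of varying lengths.

```python
def bio_tags(tokens, keyphrases):
    """Tag each token as B-KP (begins a keyphrase), I-KP (inside), or O."""
    tags = ['O'] * len(tokens)
    phrases = [kp.lower().split() for kp in keyphrases]
    i = 0
    while i < len(tokens):
        for p in phrases:
            if [t.lower() for t in tokens[i:i + len(p)]] == p:
                tags[i] = 'B-KP'
                for j in range(i + 1, i + len(p)):
                    tags[j] = 'I-KP'
                i += len(p) - 1
                break
        i += 1
    return tags

tokens = "we study keyphrase extraction using sequential labeling models".split()
print(list(zip(tokens, bio_tags(tokens, ["keyphrase extraction",
                                         "sequential labeling"]))))
```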
Multi-Output Artificial Neural Network for Storm Surge Prediction in North Carolina
Title | Multi-Output Artificial Neural Network for Storm Surge Prediction in North Carolina |
Authors | Anton Bezuglov, Brian Blanton, Reinaldo Santiago |
Abstract | During hurricane seasons, emergency managers and other decision makers need accurate and ‘on-time’ information on potential storm surge impacts. Fully dynamical computer models, such as the ADCIRC tide, storm surge, and wind-wave model, take several hours to complete a forecast when configured at high spatial resolution. Additionally, statistically meaningful ensembles of high-resolution models (needed for uncertainty estimation) cannot easily be computed in near real-time. This paper discusses an artificial neural network model for storm surge prediction in North Carolina. The network model provides fast, real-time storm surge estimates at coastal locations in North Carolina. The paper studies the performance of the neural network model vs. other models on synthetic and real hurricane data. |
Tasks | |
Published | 2016-09-23 |
URL | http://arxiv.org/abs/1609.07378v1 |
http://arxiv.org/pdf/1609.07378v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-output-artificial-neural-network-for |
Repo | |
Framework | |
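A hedged sketch of the model class: one feedforward network mapping storm parameters to surge estimates at several coastal stations at once, i.e. a multi-output regression. The input/output sizes and training data below are illustrative stand-ins, not the paper's configuration.

```python
import torch
import torch.nn as nn

n_inputs, n_stations = 6, 12          # storm features -> per-station surge
net = nn.Sequential(
    nn.Linear(n_inputs, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, n_stations),        # one surge estimate per station
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x = torch.randn(256, n_inputs)        # synthetic storm parameters
y = torch.randn(256, n_stations)      # stand-in surge targets
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward(); opt.step()
print(float(loss))
```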
Distributed Deep Learning Using Synchronous Stochastic Gradient Descent
Title | Distributed Deep Learning Using Synchronous Stochastic Gradient Descent |
Authors | Dipankar Das, Sasikanth Avancha, Dheevatsa Mudigere, Karthikeyan Vaidynathan, Srinivas Sridharan, Dhiraj Kalamkar, Bharat Kaul, Pradeep Dubey |
Abstract | We design and implement a distributed multinode synchronous SGD algorithm, without altering hyperparameters, compressing data, or altering algorithmic behavior. We perform a detailed analysis of scaling and identify optimal design points for different networks. We demonstrate scaling of CNNs on hundreds of nodes and present what we believe to be record training throughputs. A 512-minibatch VGG-A CNN training run is scaled 90X on 128 nodes. Also, 256-minibatch VGG-A and OverFeat-FAST networks are scaled 53X and 42X respectively on a 64-node cluster. We also demonstrate the generality of our approach via best-in-class 6.5X scaling for a 7-layer DNN on 16 nodes. Thereafter we attempt to democratize deep learning by training on an Ethernet-based AWS cluster and show ~14X scaling on 16 nodes. |
Tasks | |
Published | 2016-02-22 |
URL | http://arxiv.org/abs/1602.06709v1 |
http://arxiv.org/pdf/1602.06709v1.pdf | |
PWC | https://paperswithcode.com/paper/distributed-deep-learning-using-synchronous |
Repo | |
Framework | |
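A minimal single-process simulation (not the paper's multinode implementation) of synchronous data-parallel SGD: each "node" computes a gradient on its own shard, gradients are averaged (the all-reduce step), and every node applies the same update, so the result matches single-node SGD on the full batch, which is why no hyperparameters need altering.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 10)); y = X @ rng.normal(size=10)
w = np.zeros(10)
nodes = 8
shards = np.array_split(np.arange(len(X)), nodes)   # equal-size shards

for step in range(100):
    grads = []
    for s in shards:                        # would run in parallel per node
        r = X[s] @ w - y[s]
        grads.append(X[s].T @ r / len(s))   # local minibatch gradient
    g = np.mean(grads, axis=0)              # all-reduce: average gradients
    w -= 0.05 * g                           # identical update on every node
print(np.abs(X @ w - y).mean())
```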
Temporal Attention Model for Neural Machine Translation
Title | Temporal Attention Model for Neural Machine Translation |
Authors | Baskaran Sankaran, Haitao Mi, Yaser Al-Onaizan, Abe Ittycheriah |
Abstract | Attention-based Neural Machine Translation (NMT) models suffer from attention deficiency issues, as has been observed in recent research. We propose a novel mechanism to address some of these limitations and improve NMT attention. Specifically, our approach memorizes the alignments temporally (within each sentence) and modulates the attention with the accumulated temporal memory as the decoder generates the candidate translation. We compare our approach against the baseline NMT model and two other related approaches that address this issue either explicitly or implicitly. Large-scale experiments on two language pairs show that our approach achieves robust gains over both the baseline and the related NMT approaches. Our model further outperforms strong SMT baselines in some settings even without using ensembles. |
Tasks | Machine Translation |
Published | 2016-08-09 |
URL | http://arxiv.org/abs/1608.02927v1 |
http://arxiv.org/pdf/1608.02927v1.pdf | |
PWC | https://paperswithcode.com/paper/temporal-attention-model-for-neural-machine |
Repo | |
Framework | |
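A hedged sketch of the core mechanism: keep a running sum of past attention weights per source position and use it to modulate new scores, discouraging the decoder from re-attending to already-covered words. The exact modulation in the paper may differ; this only follows the abstract's description of accumulating temporal memory.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def temporal_attention(scores, history, eps=1e-6):
    """Downweight source positions in proportion to attention already spent."""
    modulated = softmax(scores) / (history + 1.0)   # temporal modulation
    alpha = modulated / (modulated.sum() + eps)     # renormalize
    return alpha, history + alpha                   # updated temporal memory

rng = np.random.default_rng(0)
src_len = 6
history = np.zeros(src_len)
for t in range(3):                     # three decoder steps, similar scores
    scores = rng.normal(size=src_len) * 0.1 + np.array([3, 0, 0, 0, 0, 0.0])
    alpha, history = temporal_attention(scores, history)
    print(t, np.round(alpha, 2))       # weight on position 0 decays per step
```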
Distilling Information Reliability and Source Trustworthiness from Digital Traces
Title | Distilling Information Reliability and Source Trustworthiness from Digital Traces |
Authors | Behzad Tabibian, Isabel Valera, Mehrdad Farajtabar, Le Song, Bernhard Schölkopf, Manuel Gomez-Rodriguez |
Abstract | Online knowledge repositories typically rely on their users or dedicated editors to evaluate the reliability of their content. These evaluations can be viewed as noisy measurements of both information reliability and information source trustworthiness. Can we leverage these noisy evaluations, often biased, to distill a robust, unbiased and interpretable measure of both notions? In this paper, we argue that the temporal traces left by these noisy evaluations give cues on the reliability of the information and the trustworthiness of the sources. Then, we propose a temporal point process modeling framework that links these temporal traces to robust, unbiased and interpretable notions of information reliability and source trustworthiness. Furthermore, we develop an efficient convex optimization procedure to learn the parameters of the model from historical traces. Experiments on real-world data gathered from Wikipedia and Stack Overflow show that our modeling framework accurately predicts evaluation events, provides an interpretable measure of information reliability and source trustworthiness, and yields interesting insights about real-world events. |
Tasks | |
Published | 2016-10-24 |
URL | http://arxiv.org/abs/1610.07472v3 |
http://arxiv.org/pdf/1610.07472v3.pdf | |
PWC | https://paperswithcode.com/paper/distilling-information-reliability-and-source |
Repo | |
Framework | |
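A hedged sketch of the modeling ingredient: a temporal point process over evaluation events, with an intensity combining a base rate and self-excitation from past events. The exponential-kernel Hawkes form below is a standard choice used only to illustrate the machinery; the paper's intensities additionally encode reliability and trustworthiness, and all parameter names here are illustrative.

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))."""
    past = events[events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

def log_likelihood(events, T, mu, alpha, beta):
    """Point-process log-likelihood: sum_i log lambda(t_i) - int_0^T lambda."""
    ll = sum(np.log(hawkes_intensity(t, events, mu, alpha, beta))
             for t in events)
    compensator = mu * T + (alpha / beta) * (
        1 - np.exp(-beta * (T - events))).sum()
    return ll - compensator

events = np.array([0.5, 0.9, 2.3, 2.4, 2.5, 6.0])   # evaluation times
print(log_likelihood(events, T=10.0, mu=0.2, alpha=0.5, beta=1.0))
```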