May 6, 2019

3027 words 15 mins read

Paper Group ANR 272

A Context-aware Attention Network for Interactive Question Answering

Title A Context-aware Attention Network for Interactive Question Answering
Authors Huayu Li, Martin Renqiang Min, Yong Ge, Asim Kadav
Abstract Neural network based sequence-to-sequence models in an encoder-decoder framework have been successfully applied to solve Question Answering (QA) problems, predicting answers from statements and questions. However, almost all previous models have failed to consider detailed context information and unknown states under which systems do not have enough information to answer given questions. These scenarios with incomplete or ambiguous information are very common in the setting of Interactive Question Answering (IQA). To address this challenge, we develop a novel model, employing context-dependent word-level attention for more accurate statement representations and question-guided sentence-level attention for better context modeling. We also generate unique IQA datasets to test our model, which will be made publicly available. Employing these attention mechanisms, our model accurately understands when it can output an answer or when it requires generating a supplementary question for additional input depending on different contexts. When available, user’s feedback is encoded and directly applied to update sentence-level attention to infer an answer. Extensive experiments on QA and IQA datasets quantitatively demonstrate the effectiveness of our model with significant improvement over state-of-the-art conventional QA models.
Tasks Question Answering
Published 2016-12-22
URL http://arxiv.org/abs/1612.07411v2
PDF http://arxiv.org/pdf/1612.07411v2.pdf
PWC https://paperswithcode.com/paper/a-context-aware-attention-network-for
Repo
Framework
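The attention mechanisms described in the abstract reduce, at their core, to softmax-weighted sums over embeddings. A minimal NumPy sketch of dot-product word-level attention follows; it is illustrative only (the paper's model conditions its scores on context and is far richer), and the embedding size and random vectors are made up:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(word_vectors, query):
    """Score each word against the query, then take a weighted sum."""
    scores = word_vectors @ query            # dot-product scores, shape (n_words,)
    weights = softmax(scores)                # attention distribution over words
    return weights @ word_vectors, weights   # context vector and its weights

rng = np.random.default_rng(0)
words = rng.normal(size=(5, 8))   # 5 words, 8-dim embeddings (hypothetical)
query = rng.normal(size=8)
context, w = attend(words, query)
assert np.isclose(w.sum(), 1.0)   # weights form a distribution
```

Sentence-level attention works the same way, with sentence representations in place of word embeddings and the question as the query.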

Reparameterization trick for discrete variables

Title Reparameterization trick for discrete variables
Authors Seiya Tokui, Issei Sato
Abstract Low-variance gradient estimation is crucial for learning directed graphical models parameterized by neural networks, where the reparameterization trick is widely used for those with continuous variables. While this technique gives low-variance gradient estimates, it has not been directly applicable to discrete variables, the sampling of which inherently requires discontinuous operations. We argue that the discontinuity can be bypassed by marginalizing out the variable of interest, which results in a new reparameterization trick for discrete variables. This reparameterization greatly reduces the variance, which is understood by regarding the method as an application of common random numbers to the estimation. The resulting estimator is theoretically guaranteed to have a variance not larger than that of the likelihood-ratio method with the optimal input-dependent baseline. We give empirical results for variational learning of sigmoid belief networks.
Tasks
Published 2016-11-04
URL http://arxiv.org/abs/1611.01239v1
PDF http://arxiv.org/pdf/1611.01239v1.pdf
PWC https://paperswithcode.com/paper/reparameterization-trick-for-discrete
Repo
Framework
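The marginalization idea can be illustrated on a single Bernoulli variable: summing over its two outcomes removes the sampling noise entirely, whereas the likelihood-ratio (REINFORCE) estimator stays noisy. A toy sketch, with an illustrative objective `f` not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3
f = lambda z: (z - 0.7) ** 2          # toy objective of one Bernoulli variable z

# Likelihood-ratio estimator of d/dtheta E[f(z)]: f(z) * d/dtheta log p(z)
z = rng.random(100_000) < theta
lr_grads = f(z.astype(float)) * (z / theta - (~z) / (1 - theta))

# Marginalizing z out gives the exact gradient f(1) - f(0): zero variance here
marg_grad = f(1.0) - f(0.0)

print(lr_grads.mean(), lr_grads.var(), marg_grad)
```

In the scalar case the marginalized estimator is exact; in a belief network only one variable is marginalized at a time, but the same mechanism drives the variance reduction.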

Limitations and Alternatives for the Evaluation of Large-scale Link Prediction

Title Limitations and Alternatives for the Evaluation of Large-scale Link Prediction
Authors Dario Garcia-Gasulla, Eduard Ayguadé, Jesús Labarta, Ulises Cortés
Abstract Link prediction, the problem of identifying missing links among a set of inter-related data entities, is a popular field of research due to its application to graph-like domains. Producing consistent evaluations of the performance of the many link prediction algorithms being proposed can be challenging due to variable graph properties, such as size and density. In this paper we first discuss traditional data mining solutions which are applicable to link prediction evaluation, arguing about their capacity for producing faithful and useful evaluations. We also introduce an innovative modification to a traditional evaluation methodology with the goal of adapting it to the problem of evaluating link prediction algorithms when applied to large graphs, by tackling the problem of class imbalance. We empirically evaluate the proposed methodology and, building on these findings, make a case for its importance on the evaluation of large-scale graph processing.
Tasks Link Prediction
Published 2016-11-02
URL http://arxiv.org/abs/1611.00547v2
PDF http://arxiv.org/pdf/1611.00547v2.pdf
PWC https://paperswithcode.com/paper/limitations-and-alternatives-for-the
Repo
Framework
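The class-imbalance problem the paper tackles is easy to demonstrate: when true links are a tiny fraction of all candidate pairs, plain accuracy rewards a predictor that never predicts a link. A small sketch with synthetic labels (the 0.1% link rate is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs = 100_000
y_true = rng.random(n_pairs) < 0.001     # ~0.1% of candidate pairs are real links

trivial = np.zeros(n_pairs, dtype=bool)  # predict "no link" everywhere
accuracy = (trivial == y_true).mean()    # looks excellent under imbalance
recall = trivial[y_true].mean() if y_true.any() else 0.0  # but finds nothing

print(f"accuracy={accuracy:.4f}  recall={recall:.4f}")
```

This is why the paper argues for evaluation methodologies that account for the imbalance rather than headline accuracy.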

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition

Title Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
Authors Hamed Karimi, Julie Nutini, Mark Schmidt
Abstract In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Łojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-Łojasiewicz (PL) inequality is actually weaker than the main conditions that have been explored to show linear convergence rates without strong convexity over the last 25 years. We also use the PL inequality to give new analyses of randomized and greedy coordinate descent methods, sign-based gradient descent methods, and stochastic gradient methods in the classic setting (with decreasing or constant step-sizes) as well as the variance-reduced setting. We further propose a generalization that applies to proximal-gradient methods for non-smooth optimization, leading to simple proofs of linear convergence of these methods. Along the way, we give simple convergence results for a wide variety of problems in machine learning: least squares, logistic regression, boosting, resilient backpropagation, L1-regularization, support vector machines, stochastic dual coordinate ascent, and stochastic variance-reduced gradient methods.
Tasks
Published 2016-08-16
URL http://arxiv.org/abs/1608.04636v3
PDF http://arxiv.org/pdf/1608.04636v3.pdf
PWC https://paperswithcode.com/paper/linear-convergence-of-gradient-and-proximal
Repo
Framework
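The PL inequality 1/2‖∇f(x)‖² ≥ μ(f(x) − f*) gives, for gradient descent with step 1/L on an L-smooth function, the per-step contraction f(x_{k+1}) − f* ≤ (1 − μ/L)(f(x_k) − f*). The sketch below checks this on a rank-deficient least-squares problem, which satisfies PL without being strongly convex; taking μ as the smallest nonzero eigenvalue of AᵀA is the standard choice for this example, not something specific to the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 10)) @ np.diag([1] * 5 + [0] * 5)  # rank-deficient
b = rng.normal(size=20)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)

eigs = np.linalg.eigvalsh(A.T @ A)
L = eigs.max()                    # smoothness constant
mu = eigs[eigs > 1e-10].min()     # PL constant: smallest *nonzero* eigenvalue

x = np.zeros(10)
fstar = f(np.linalg.pinv(A) @ b)  # optimal value via the min-norm solution
gaps = []
for _ in range(50):
    x -= grad(x) / L
    gaps.append(f(x) - fstar)

# Linear convergence: the gap shrinks by at least (1 - mu/L) every step
rate = 1 - mu / L
assert all(g2 <= rate * g1 + 1e-12 for g1, g2 in zip(gaps, gaps[1:]))
```

Note that strong convexity fails here (five directions are flat), yet the linear rate still holds, which is exactly the paper's point.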

On the consistency of inversion-free parameter estimation for Gaussian random fields

Title On the consistency of inversion-free parameter estimation for Gaussian random fields
Authors Hossein Keshavarz, Clayton Scott, XuanLong Nguyen
Abstract Gaussian random fields are a powerful tool for modeling environmental processes. For high dimensional samples, classical approaches for estimating the covariance parameters require highly challenging and massive computations, such as the evaluation of the Cholesky factorization or solving linear systems. Recently, Anitescu, Chen and Stein proposed a fast and scalable algorithm which does not need such burdensome computations. The main focus of this article is to study the asymptotic behavior of the algorithm of Anitescu et al. (ACS) for regular and irregular grids in the increasing domain setting. Consistency, minimax optimality and asymptotic normality of this algorithm are proved under mild differentiability conditions on the covariance function. Despite the fact that ACS’s method entails a non-concave maximization, our results hold for any stationary point of the objective function. A numerical study is presented to evaluate the efficiency of this algorithm for large data sets.
Tasks
Published 2016-01-15
URL http://arxiv.org/abs/1601.03822v2
PDF http://arxiv.org/pdf/1601.03822v2.pdf
PWC https://paperswithcode.com/paper/on-the-consistency-of-inversion-free
Repo
Framework

Lie Access Neural Turing Machine

Title Lie Access Neural Turing Machine
Authors Greg Yang
Abstract Following the recent trend in explicit neural memory structures, we present a new design of an external memory, wherein memories are stored in a Euclidean key space $\mathbb R^n$. An LSTM controller performs read and write via specialized read and write heads. It can move a head by either providing a new address in the key space (aka random access) or moving from its previous position via a Lie group action (aka Lie access). In this way, the “L” and “R” instructions of a traditional Turing Machine are generalized to arbitrary elements of a fixed Lie group action. For this reason, we name this new model the Lie Access Neural Turing Machine, or LANTM. We tested two different configurations of LANTM against an LSTM baseline in several basic experiments. We found the right configuration of LANTM to outperform the baseline in all of our experiments. In particular, we trained LANTM on addition of $k$-digit numbers for $2 \le k \le 16$, but it was able to generalize almost perfectly to $17 \le k \le 32$, all with the number of parameters 2 orders of magnitude below the LSTM baseline.
Tasks
Published 2016-02-28
URL http://arxiv.org/abs/1602.08671v3
PDF http://arxiv.org/pdf/1602.08671v3.pdf
PWC https://paperswithcode.com/paper/lie-access-neural-turing-machine
Repo
Framework
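The Lie-access idea can be caricatured in a few lines: memories live at keys in Euclidean space, reads are soft nearest-key lookups, and head movement is a group action. In this sketch the action is a plain translation in R² (generalizing the Turing machine's "R" instruction); the LSTM controller, learned actions, and write heads are all omitted, and the sharpness constant is made up:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def read(head, keys, values, sharpness=8.0):
    """Read by soft nearest-key lookup around the head position."""
    d2 = np.sum((keys - head) ** 2, axis=1)  # squared distance to each key
    w = softmax(-sharpness * d2)             # weights concentrate on close keys
    return w @ values

# Memories stored at keys in R^2; head moves by a group action on that space
keys = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
values = np.array([10.0, 20.0, 30.0])

head = np.array([0.0, 0.0])
shift = np.array([1.0, 0.0])     # translation: the Lie-access analogue of "R"
head = head + shift
print(read(head, keys, values))  # close to 20, the memory at the new position
```

The appeal is that the controller can learn a single action (a shift, a rotation) and apply it repeatedly, rather than re-emitting absolute addresses.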

Pre-Translation for Neural Machine Translation

Title Pre-Translation for Neural Machine Translation
Authors Jan Niehues, Eunah Cho, Thanh-Le Ha, Alex Waibel
Abstract Recently, the development of neural machine translation (NMT) has significantly improved the translation quality of automatic machine translation. While most sentences are more accurate and fluent than translations by statistical machine translation (SMT)-based systems, in some cases, the NMT system produces translations that have a completely different meaning. This is especially the case when rare words occur. When using statistical machine translation, it has already been shown that significant gains can be achieved by simplifying the input in a preprocessing step. A commonly used example is the pre-reordering approach. In this work, we used phrase-based machine translation to pre-translate the input into the target language. Then a neural machine translation system generates the final hypothesis using the pre-translation. Thereby, we use either only the output of the phrase-based machine translation (PBMT) system or a combination of the PBMT output and the source sentence. We evaluate the technique on the English to German translation task. Using this approach we are able to outperform the PBMT system as well as the baseline neural MT system by up to 2 BLEU points. We analyzed the influence of the quality of the initial system on the final result.
Tasks Machine Translation
Published 2016-10-17
URL http://arxiv.org/abs/1610.05243v1
PDF http://arxiv.org/pdf/1610.05243v1.pdf
PWC https://paperswithcode.com/paper/pre-translation-for-neural-machine
Repo
Framework

Distributed Mean Estimation with Limited Communication

Title Distributed Mean Estimation with Limited Communication
Authors Ananda Theertha Suresh, Felix X. Yu, Sanjiv Kumar, H. Brendan McMahan
Abstract Motivated by the need for distributed learning and optimization algorithms with low communication cost, we study communication efficient algorithms for distributed mean estimation. Unlike previous works, we make no probabilistic assumptions on the data. We first show that for $d$ dimensional data with $n$ clients, a naive stochastic binary rounding approach yields a mean squared error (MSE) of $\Theta(d/n)$ and uses a constant number of bits per dimension per client. We then extend this naive algorithm in two ways: we show that applying a structured random rotation before quantization reduces the error to $\mathcal{O}((\log d)/n)$ and a better coding strategy further reduces the error to $\mathcal{O}(1/n)$ and uses a constant number of bits per dimension per client. We also show that the latter coding strategy is optimal up to a constant in the minimax sense i.e., it achieves the best MSE for a given communication cost. We finally demonstrate the practicality of our algorithms by applying them to distributed Lloyd’s algorithm for k-means and power iteration for PCA.
Tasks Quantization
Published 2016-11-02
URL http://arxiv.org/abs/1611.00429v3
PDF http://arxiv.org/pdf/1611.00429v3.pdf
PWC https://paperswithcode.com/paper/distributed-mean-estimation-with-limited
Repo
Framework
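The naive one-bit scheme the abstract starts from can be sketched directly: each client stochastically rounds every coordinate to its vector's minimum or maximum so that the quantized vector is unbiased, and the server averages the quantized vectors. The random-rotation and coding refinements that improve the error are omitted; the data sizes are illustrative:

```python
import numpy as np

def binary_round(x, rng):
    """Stochastically round each coordinate to the vector's min or max.
    Unbiased: the probability of rounding up makes E[q] = x coordinatewise."""
    lo, hi = x.min(), x.max()
    p = (x - lo) / (hi - lo)               # probability of rounding up
    return np.where(rng.random(x.shape) < p, hi, lo)

rng = np.random.default_rng(0)
clients = [rng.normal(size=1000) for _ in range(50)]   # n=50 clients, d=1000

true_mean = np.mean(clients, axis=0)
est_mean = np.mean([binary_round(x, rng) for x in clients], axis=0)

mse = np.mean((est_mean - true_mean) ** 2)
print(mse)   # one bit per coordinate per client (plus the two endpoints)
```

Averaging over clients shrinks the per-coordinate quantization variance by a factor of n, matching the abstract's MSE-versus-communication trade-off in spirit.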

sk_p: a neural program corrector for MOOCs

Title sk_p: a neural program corrector for MOOCs
Authors Yewen Pu, Karthik Narasimhan, Armando Solar-Lezama, Regina Barzilay
Abstract We present a novel technique for automatic program correction in MOOCs, capable of fixing both syntactic and semantic errors without manual, problem-specific correction strategies. Given an incorrect student program, it generates candidate programs from a distribution of likely corrections, and checks each candidate for correctness against a test suite. The key observation is that in MOOCs many programs share similar code fragments, and the seq2seq neural network model, used in the natural-language processing task of machine translation, can be modified and trained to recover these fragments. Experiments show our scheme can correct 29% of all incorrect submissions and outperforms the state-of-the-art approach, which requires manual, problem-specific correction strategies.
Tasks Machine Translation
Published 2016-07-11
URL http://arxiv.org/abs/1607.02902v1
PDF http://arxiv.org/pdf/1607.02902v1.pdf
PWC https://paperswithcode.com/paper/sk_p-a-neural-program-corrector-for-moocs
Repo
Framework
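The generate-and-check loop is simple to sketch: sample candidate programs, keep the ones that pass the test suite. Here the candidates are hard-coded, hypothetical stand-ins for samples from the trained seq2seq model, and the function name `f` and the tests are made up:

```python
def passes(program_src, tests):
    """Run a candidate program and check it against the test suite."""
    env = {}
    try:
        exec(program_src, env)
        return all(env["f"](inp) == expected for inp, expected in tests)
    except Exception:
        return False   # syntax errors or crashes simply fail the check

# Hypothetical buggy submission plus candidate corrections (in the paper's
# setting these would be sampled from the neural model)
tests = [(2, 4), (3, 9), (5, 25)]
candidates = [
    "def f(x): return x + x",     # the original, incorrect program
    "def f(x): return x ** 2",    # a sampled correction
    "def f(x): return x * 3",
]
fixed = [c for c in candidates if passes(c, tests)]
print(fixed)   # only the squaring candidate survives the test suite
```

The test suite acts as the correctness oracle, so the generator only needs to put nonzero probability mass on a correct program.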

Applying Naive Bayes Classification to Google Play Apps Categorization

Title Applying Naive Bayes Classification to Google Play Apps Categorization
Authors Babatunde Olabenjo
Abstract There are over one million apps on Google Play Store and over half a million publishers. Having such a huge number of apps and developers can pose a challenge to app users and new publishers on the store. Discovering apps can be challenging if apps are not correctly published in the right category, which, in turn, reduces earnings for app developers. Additionally, with over 41 categories on Google Play Store, deciding on the right category to publish an app can be challenging for developers due to the number of categories they have to choose from. Machine Learning has been very useful, especially in classification problems such as sentiment analysis, document classification and spam detection. These strategies can also be applied to app categorization on Google Play Store to suggest appropriate categories for app publishers using details from their application. In this project, we built two variations of the Naive Bayes classifier using open metadata from top developer apps on Google Play Store in order to classify new apps on the store. These classifiers are then evaluated using various evaluation methods and their results compared against each other. The results show that the Naive Bayes algorithm performs well for our classification problem and can potentially automate app categorization for Android app publishers on Google Play Store.
Tasks Document Classification, Sentiment Analysis
Published 2016-08-30
URL http://arxiv.org/abs/1608.08574v1
PDF http://arxiv.org/pdf/1608.08574v1.pdf
PWC https://paperswithcode.com/paper/applying-naive-bayes-classification-to-google
Repo
Framework
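A multinomial Naive Bayes classifier with Laplace smoothing, the core of this approach, fits in a few lines of standard-library Python. The app descriptions below are hypothetical stand-ins for Google Play metadata, not the project's actual data:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, category) pairs."""
    word_counts = defaultdict(Counter)
    cat_counts = Counter()
    vocab = set()
    for tokens, cat in docs:
        cat_counts[cat] += 1
        word_counts[cat].update(tokens)
        vocab.update(tokens)
    return word_counts, cat_counts, vocab

def predict(tokens, word_counts, cat_counts, vocab):
    n_docs = sum(cat_counts.values())
    best, best_lp = None, -math.inf
    for cat in cat_counts:
        lp = math.log(cat_counts[cat] / n_docs)       # log prior
        total = sum(word_counts[cat].values()) + len(vocab)
        for t in tokens:                              # Laplace-smoothed likelihood
            lp += math.log((word_counts[cat][t] + 1) / total)
        if lp > best_lp:
            best, best_lp = cat, lp
    return best

# Hypothetical app descriptions standing in for Google Play metadata
docs = [
    (["puzzle", "levels", "match"], "Games"),
    (["race", "cars", "levels"], "Games"),
    (["budget", "expense", "tracker"], "Finance"),
    (["invoice", "tax", "tracker"], "Finance"),
]
model = train_nb(docs)
print(predict(["match", "race", "levels"], *model))
```

Smoothing matters here: without it, a single unseen word would zero out a category's probability entirely.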

Comparative Study of Instance Based Learning and Back Propagation for Classification Problems

Title Comparative Study of Instance Based Learning and Back Propagation for Classification Problems
Authors Nadia Kanwal, Erkan Bostanci
Abstract The paper presents a comparative study of the performance of Back Propagation and the Instance Based Learning algorithm for classification tasks. The study is carried out by a series of experiments with all possible combinations of parameter values for the algorithms under evaluation. The algorithms’ classification accuracy is compared over a range of datasets, and measures such as Cross Validation, Kappa Statistics, Root Mean Squared Value and True Positive vs False Positive rate have been used to evaluate their performance. Along with the performance comparison, techniques for handling missing values have also been compared, including Mean or Mode replacement and Multiple Imputation. The results showed that parameter adjustment plays a vital role in improving an algorithm’s accuracy, and accordingly Back Propagation has shown better results than Instance Based Learning. Furthermore, the problem of missing values was better handled by the Multiple Imputation method, although it is not suitable for small amounts of data.
Tasks Imputation
Published 2016-04-19
URL http://arxiv.org/abs/1604.05429v1
PDF http://arxiv.org/pdf/1604.05429v1.pdf
PWC https://paperswithcode.com/paper/comparative-study-of-instance-based-learning
Repo
Framework
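Mean replacement, the simpler of the two missing-value strategies compared above, fills each missing entry with its column's observed mean; Multiple Imputation instead draws several regression-based completions and pools the results, which is too involved to sketch here. A minimal mean-imputation example (toy data):

```python
import numpy as np

def mean_impute(X):
    """Replace NaNs in each column with that column's observed mean."""
    X = X.copy()
    for j in range(X.shape[1]):
        col = X[:, j]
        col[np.isnan(col)] = np.nanmean(col)  # mean over the non-missing entries
    return X

X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])
print(mean_impute(X))
```

For categorical attributes the same idea uses the mode instead of the mean, which is the "Mean or Mode replacement" the abstract refers to.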

Modern WLAN Fingerprinting Indoor Positioning Methods and Deployment Challenges

Title Modern WLAN Fingerprinting Indoor Positioning Methods and Deployment Challenges
Authors Ali Khalajmehrabadi, Nikolaos Gatsis, David Akopian
Abstract Wireless Local Area Network (WLAN) has become a promising choice for indoor positioning, as it is the only existing and established infrastructure able to localize mobile and stationary users indoors. However, since WLAN was initially designed for wireless networking rather than positioning, localization based on WLAN signals faces several challenges. Among WLAN positioning methods, WLAN fingerprinting localization has recently attracted considerable attention due to its promising results. WLAN fingerprinting faces several challenges, and hence our goal in this paper is to survey these challenges and the state-of-the-art solutions. This paper consists of three main parts: 1) conventional localization schemes; 2) state-of-the-art approaches; 3) practical deployment challenges. Since the methods proposed in the WLAN literature have been evaluated in different settings, the reported results are not directly comparable. We therefore compare some of the main localization schemes in a single real environment and assess their localization accuracy, positioning error statistics, and complexity. Our results provide an illustrative evaluation of WLAN localization systems and point to opportunities for future improvement.
Tasks
Published 2016-10-18
URL http://arxiv.org/abs/1610.05424v1
PDF http://arxiv.org/pdf/1610.05424v1.pdf
PWC https://paperswithcode.com/paper/modern-wlan-fingerprinting-indoor-positioning
Repo
Framework
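Conventional fingerprinting localization is, at its simplest, a weighted k-nearest-neighbor lookup in received-signal-strength (RSS) space: match the live RSS vector against a surveyed fingerprint database and average the matching survey positions. A sketch with a hypothetical four-point survey of three access points (all RSS values and positions are made up):

```python
import numpy as np

def locate(rss, fingerprints, positions, k=3):
    """Weighted k-nearest-neighbor lookup in RSS fingerprint space."""
    d = np.linalg.norm(fingerprints - rss, axis=1)  # distance to each fingerprint
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] + 1e-9)                   # closer fingerprints weigh more
    return (w[:, None] * positions[nearest]).sum(0) / w.sum()

# Hypothetical survey: RSS (dBm) from 3 access points at known grid positions
fingerprints = np.array([[-40., -70., -80.],
                         [-55., -55., -75.],
                         [-70., -45., -60.],
                         [-80., -60., -42.]])
positions = np.array([[0., 0.], [5., 0.], [10., 0.], [10., 5.]])

estimate = locate(np.array([-52., -57., -74.]), fingerprints, positions)
print(estimate)   # a point near the second survey location
```

The deployment challenges the paper surveys (device heterogeneity, signal drift, survey cost) all show up as noise or staleness in this fingerprint database.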

Unsupervised single-particle deep clustering via statistical manifold learning

Title Unsupervised single-particle deep clustering via statistical manifold learning
Authors Jiayi Wu, Yong-Bei Ma, Charles Congdon, Bevin Brett, Shuobing Chen, Qi Ouyang, Youdong Mao
Abstract Motivation: Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. Traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise-ratio (SNR) in the image data, yet demand increased cost in computation. Overcoming these limitations requires further development on clustering algorithms for high-performance cryo-EM data analysis. Results: Here we introduce a statistical manifold learning algorithm for unsupervised single-particle deep clustering. We show that statistical manifold learning improves classification accuracy by about 40% in the absence of input references for lower SNR data. Applications to several experimental datasets suggest that our deep clustering approach can detect subtle structural difference among classes. Through code optimization over the Intel high-performance computing (HPC) processors, our software implementation can generate thousands of reference-free class averages within several hours from hundreds of thousands of single-particle cryo-EM images, which allows significant improvement in ab initio 3D reconstruction resolution and quality. Our approach has been successfully applied in several structural determination projects. We expect that it provides a powerful computational tool in analyzing highly heterogeneous structural data and assisting in computational purification of single-particle datasets for high-resolution reconstruction.
Tasks 3D Reconstruction
Published 2016-04-15
URL http://arxiv.org/abs/1604.04539v2
PDF http://arxiv.org/pdf/1604.04539v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-single-particle-deep-clustering
Repo
Framework

Online Learning of Commission Avoidant Portfolio Ensembles

Title Online Learning of Commission Avoidant Portfolio Ensembles
Authors Guy Uziel, Ran El-Yaniv
Abstract We present a novel online ensemble learning strategy for portfolio selection. The new strategy controls and exploits any set of commission-oblivious portfolio selection algorithms. The strategy handles transaction costs using a novel commission avoidance mechanism. We prove a logarithmic regret bound for our strategy with respect to optimal mixtures of the base algorithms. Numerical examples validate the viability of our method and show significant improvement over the state-of-the-art.
Tasks
Published 2016-05-03
URL http://arxiv.org/abs/1605.00788v2
PDF http://arxiv.org/pdf/1605.00788v2.pdf
PWC https://paperswithcode.com/paper/online-learning-of-commission-avoidant
Repo
Framework
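Setting aside the paper's commission-avoidance mechanism, the ensemble layer resembles multiplicative-weights aggregation over base portfolio strategies. This sketch is not the authors' algorithm: the experts are plain buy-and-hold portfolios, the price relatives are synthetic, and the learning rate is arbitrary:

```python
import numpy as np

def run_ensemble(price_relatives, base_portfolios, eta=0.1):
    """Combine fixed base portfolios with multiplicative weights on returns."""
    n_experts = len(base_portfolios)
    w = np.ones(n_experts) / n_experts
    wealth = 1.0
    for x in price_relatives:                  # x[j] = price ratio of asset j today
        returns = np.array([b @ x for b in base_portfolios])
        wealth *= w @ returns                  # invest according to the mixture
        w = w * returns ** eta                 # reward experts that did well
        w /= w.sum()
    return wealth

rng = np.random.default_rng(0)
T, n_assets = 250, 3
price_relatives = 1 + 0.01 * rng.standard_normal((T, n_assets))
base_portfolios = [np.eye(n_assets)[j] for j in range(n_assets)]  # hold one asset each
print(run_ensemble(price_relatives, base_portfolios))
```

Multiplicative updates of this kind are what make logarithmic regret bounds against mixtures of experts attainable; the paper's contribution is doing this while controlling transaction costs.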

Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

Title Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values
Authors Talayeh Razzaghi, Oleg Roderick, Ilya Safro, Nicholas Marko
Abstract This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries, with imbalance in classes of interest, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for development of specialized techniques of data-preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and the expectation-maximization imputation method for missing values, which relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values, as well as on real data in health applications, and show that our multilevel SVM-based method produces fast, more accurate, and more robust classification results.
Tasks Imputation
Published 2016-04-07
URL http://arxiv.org/abs/1604.02123v1
PDF http://arxiv.org/pdf/1604.02123v1.pdf
PWC https://paperswithcode.com/paper/multilevel-weighted-support-vector-machine
Repo
Framework