May 5, 2019

2893 words 14 mins read

Paper Group ANR 465

Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning. An empirical analysis of the optimization of deep network loss surfaces. Simple and Efficient Parallelization for Probabilistic Temporal Tensor Factorization. CaMKII activation supports reward-based neural network optimization through Hamiltonian samp …

Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning

Title Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning
Authors Yulia Tsvetkov, Manaal Faruqui, Wang Ling, Brian MacWhinney, Chris Dyer
Abstract We use Bayesian optimization to learn curricula for word representation learning, optimizing performance on downstream tasks that depend on the learned representations as features. The curricula are modeled by a linear ranking function which is the scalar product of a learned weight vector and an engineered feature vector that characterizes the different aspects of the complexity of each instance in the training corpus. We show that learning the curriculum improves performance on a variety of downstream tasks over random orders and in comparison to the natural corpus order.
Tasks Representation Learning
Published 2016-05-12
URL http://arxiv.org/abs/1605.03852v2
PDF http://arxiv.org/pdf/1605.03852v2.pdf
PWC https://paperswithcode.com/paper/learning-the-curriculum-with-bayesian
Repo
Framework
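
The curriculum in this paper is just a linear ranking function: the score of each training instance is the scalar product of a learned weight vector and an engineered complexity feature vector, and the corpus is sorted by score. A minimal sketch of that mechanism, where the two complexity features (sentence length, type/token ratio) and the weights are illustrative stand-ins, not the paper's actual feature set:

```python
def rank_curriculum(instances, featurize, w):
    """Order training instances by a linear ranking score w . phi(x)."""
    scored = [(sum(wi * fi for wi, fi in zip(w, featurize(x))), x)
              for x in instances]
    scored.sort(key=lambda p: p[0])   # lowest complexity score first
    return [x for _, x in scored]

# Hypothetical complexity features: sentence length and type/token ratio.
def featurize(sentence):
    tokens = sentence.split()
    return [len(tokens), len(set(tokens)) / len(tokens)]

corpus = ["the cat sat", "a a a a", "colorless green ideas sleep furiously"]
w = [1.0, 2.0]   # the weight vector Bayesian optimization would tune
curriculum = rank_curriculum(corpus, featurize, w)
print(curriculum[0])
```

In the paper, Bayesian optimization searches over `w`, with downstream task performance of the resulting word vectors as the objective.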

An empirical analysis of the optimization of deep network loss surfaces

Title An empirical analysis of the optimization of deep network loss surfaces
Authors Daniel Jiwoong Im, Michael Tao, Kristin Branson
Abstract The success of deep neural networks hinges on our ability to accurately and efficiently optimize high-dimensional, non-convex functions. In this paper, we empirically investigate the loss functions of state-of-the-art networks and how commonly used stochastic gradient descent variants optimize them. To do this, we visualize the loss functions by projecting them down to low-dimensional spaces chosen based on the convergence points of different optimization algorithms. Our observations suggest that optimization algorithms encounter and choose different descent directions at many saddle points to find different final weights. Based on the consistency we observe across re-runs of the same stochastic optimization algorithm, we hypothesize that each optimization algorithm makes characteristic choices at these saddle points.
Tasks Stochastic Optimization
Published 2016-12-13
URL http://arxiv.org/abs/1612.04010v4
PDF http://arxiv.org/pdf/1612.04010v4.pdf
PWC https://paperswithcode.com/paper/an-empirical-analysis-of-the-optimization-of
Repo
Framework
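
The visualization trick here is to restrict the high-dimensional loss to a 2-D slice spanned by the convergence points of different optimizers. A toy sketch of that projection, using a small synthetic loss and made-up "convergence points" in place of real trained weights:

```python
def loss(w):
    # Toy non-convex loss standing in for a network's training loss.
    return sum((wi**2 - 1.0)**2 for wi in w)

def plane_point(w0, w1, w2, a, b):
    # Point on the 2-D slice spanned by directions (w1 - w0) and (w2 - w0).
    return [w0i + a*(w1i - w0i) + b*(w2i - w0i)
            for w0i, w1i, w2i in zip(w0, w1, w2)]

# Hypothetical convergence points of three optimizer runs.
w_sgd, w_mom, w_adam = [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]

grid = [[loss(plane_point(w_sgd, w_mom, w_adam, a/4, b/4))
         for a in range(5)] for b in range(5)]
print(grid[0][0], grid[0][4])   # losses at w_sgd and at w_mom
```

The corners of the grid sit at the optimizers' final weights; the interior reveals whether a loss barrier separates them.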

Simple and Efficient Parallelization for Probabilistic Temporal Tensor Factorization

Title Simple and Efficient Parallelization for Probabilistic Temporal Tensor Factorization
Authors Guangxi Li, Zenglin Xu, Linnan Wang, Jinmian Ye, Irwin King, Michael Lyu
Abstract Probabilistic Temporal Tensor Factorization (PTTF) is an effective algorithm for modeling temporal tensor data. It leverages a time constraint to capture the evolving properties of tensor data. Nowadays, exploding dataset sizes demand large-scale PTTF analysis, and a parallel solution is critical to accommodate this trend. However, the parallelization of PTTF remains unexplored. In this paper, we propose a simple yet efficient Parallel Probabilistic Temporal Tensor Factorization, referred to as P$^2$T$^2$F, to provide a scalable PTTF solution. P$^2$T$^2$F is fundamentally disparate from existing parallel tensor factorizations in that it considers the probabilistic decomposition and the temporal effects of tensor data. It adopts a new tensor data split strategy to subdivide a large tensor into independent sub-tensors, the computation of which is inherently parallel. We train P$^2$T$^2$F with an efficient stochastic Alternating Direction Method of Multipliers algorithm and show that convergence is guaranteed. Experiments on several real-world tensor datasets demonstrate that P$^2$T$^2$F is a highly effective and efficiently scalable algorithm dedicated to large-scale probabilistic temporal tensor analysis.
Tasks
Published 2016-11-11
URL http://arxiv.org/abs/1611.03578v1
PDF http://arxiv.org/pdf/1611.03578v1.pdf
PWC https://paperswithcode.com/paper/simple-and-efficient-parallelization-for
Repo
Framework
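
The core scalability idea is the data split: carve a large sparse tensor into sub-tensors whose factor updates do not interfere, so workers can proceed independently. A simplified sketch (blocking on one mode; the paper's actual split strategy and the ADMM updates are more involved):

```python
from collections import defaultdict

def split_tensor(entries, n_workers):
    """Partition sparse tensor entries (i, j, t, value) into shards.

    Blocking on the first mode means each worker touches a disjoint
    slice of that mode's factor matrix, so its updates are independent.
    Illustrative only, not the paper's exact scheme.
    """
    shards = defaultdict(list)
    for i, j, t, v in entries:
        shards[i % n_workers].append((i, j, t, v))
    return [shards[k] for k in range(n_workers)]

entries = [(0, 0, 0, 1.0), (1, 2, 0, 2.0), (2, 1, 1, 0.5), (3, 0, 1, 4.0)]
shards = split_tensor(entries, 2)
print(len(shards[0]), len(shards[1]))
```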

CaMKII activation supports reward-based neural network optimization through Hamiltonian sampling

Title CaMKII activation supports reward-based neural network optimization through Hamiltonian sampling
Authors Zhaofei Yu, David Kappel, Robert Legenstein, Sen Song, Feng Chen, Wolfgang Maass
Abstract Synaptic plasticity is implemented and controlled through over a thousand different types of molecules in the postsynaptic density and presynaptic boutons, which assume a staggering array of different states through phosphorylation and other mechanisms. One of the most prominent molecules in the postsynaptic density is CaMKII, described in molecular biology as a “memory molecule” that can integrate Ca-influx signals through auto-phosphorylation on a relatively large time scale of dozens of seconds. The functional impact of this memory mechanism is largely unknown. We show that the experimental data on the specific role of CaMKII activation in dopamine-gated spine consolidation suggest a general functional role in speeding up reward-guided search for network configurations that maximize reward expectation. Our theoretical analysis shows that stochastic search could in principle even attain optimal network configurations by emulating one of the best-known nonlinear optimization methods, simulated annealing. But this optimization is usually impeded by the slowness of stochastic search at a given temperature. We propose that CaMKII contributes a momentum term that substantially speeds up this search. In particular, it allows the network to overcome saddle points of the fitness function. The resulting improved stochastic policy search can be understood on a more abstract level as Hamiltonian sampling, which is known to be one of the most efficient stochastic search methods.
Tasks
Published 2016-06-01
URL http://arxiv.org/abs/1606.00157v3
PDF http://arxiv.org/pdf/1606.00157v3.pdf
PWC https://paperswithcode.com/paper/camkii-activation-supports-reward-based
Repo
Framework
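
The optimization argument is easiest to see in a toy setting: simulated annealing explores a fitness landscape stochastically, and a momentum term (the proposed abstract role of CaMKII) biases proposals along the recent direction of travel, helping the search cross flat regions and saddles. A minimal sketch with a double-well objective standing in for a network's fitness; the cooling schedule and momentum coupling are illustrative, not the paper's biophysical model:

```python
import math, random

def anneal(f, x0, steps=2000, t0=2.0, momentum=0.0, seed=0):
    """Simulated annealing (minimization) with an optional momentum term."""
    rng = random.Random(seed)
    x, v, best = x0, 0.0, x0
    for k in range(steps):
        t = t0 / (1 + k)                          # cooling schedule
        cand = x + rng.gauss(0.0, 0.5) + momentum * v
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if f(cand) < f(x) or rng.random() < math.exp((f(x) - f(cand)) / max(t, 1e-9)):
            v = cand - x                          # remember accepted direction
            x = cand
            if f(x) < f(best):
                best = x
    return best

f = lambda x: (x*x - 4.0)**2      # double well with minima at +-2
best = anneal(f, x0=5.0, momentum=0.5)
print(round(f(best), 3))
```

With `momentum=0` this is plain annealing; the momentum variant corresponds loosely to the Hamiltonian-sampling view in the abstract.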

Improving Recurrent Neural Networks For Sequence Labelling

Title Improving Recurrent Neural Networks For Sequence Labelling
Authors Marco Dinarelli, Isabelle Tellier
Abstract In this paper we study different types of Recurrent Neural Networks (RNN) for sequence labeling tasks. We propose two new variants of RNNs integrating improvements for sequence labeling, and we compare them to the more traditional Elman and Jordan RNNs. We compare all models, traditional and new, on four distinct sequence labeling tasks: two Spoken Language Understanding tasks (ATIS and MEDIA) and two POS tagging tasks, on the French Treebank (FTB) and the Penn Treebank (PTB) corpora. The results show that our new variants of RNNs are consistently more effective than the others.
Tasks Spoken Language Understanding
Published 2016-06-08
URL http://arxiv.org/abs/1606.02555v1
PDF http://arxiv.org/pdf/1606.02555v1.pdf
PWC https://paperswithcode.com/paper/improving-recurrent-neural-networks-for
Repo
Framework
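
The Elman/Jordan distinction the paper builds on is about what feeds back into the recurrence: Elman recycles the hidden state, Jordan recycles the previous output. A scalar sketch of the two recurrences (weights and the sigmoid output layer are arbitrary illustrative choices, not the paper's architectures):

```python
import math

sigmoid = lambda z: 1 / (1 + math.exp(-z))

def run_elman(seq, wx=0.8, wh=0.5, wo=1.0):
    # Elman RNN: the previous hidden state feeds back into the hidden layer.
    h, ys = 0.0, []
    for x in seq:
        h = math.tanh(wx * x + wh * h)
        ys.append(sigmoid(wo * h))        # per-token label score
    return ys

def run_jordan(seq, wx=0.8, wy=0.5, wo=1.0):
    # Jordan RNN: the previous *output* feeds back instead of the hidden state.
    h, y, ys = 0.0, 0.0, []
    for x in seq:
        h = math.tanh(wx * x + wy * y)
        y = sigmoid(wo * h)
        ys.append(y)
    return ys

seq = [1.0, 0.0, 0.0]
e, j = run_elman(seq), run_jordan(seq)
print(e[0] == j[0], e[1] == j[1])   # first step agrees; feedback then differs
```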

High Accuracy Android Malware Detection Using Ensemble Learning

Title High Accuracy Android Malware Detection Using Ensemble Learning
Authors Suleiman Y. Yerima, Sakir Sezer, Igor Muttik
Abstract With over 50 billion downloads and more than 1.3 million apps in the official Google market, Android has continued to gain popularity amongst smartphone users worldwide. At the same time there has been a rise in malware targeting the platform, with more recent strains employing highly sophisticated detection-avoidance techniques. As traditional signature-based methods become less potent in detecting unknown malware, alternatives are needed for timely zero-day discovery. Thus, this paper proposes an approach that utilizes ensemble learning for Android malware detection. It combines the advantages of static analysis with the efficiency and performance of ensemble machine learning to improve Android malware detection accuracy. The machine learning models are built using a large repository of malware samples and benign apps from a leading antivirus vendor. The experimental results and analysis presented show that the proposed method, which uses a large feature space to leverage the power of ensemble learning, achieves 97.3 to 99 percent detection accuracy with very low false positive rates.
Tasks Android Malware Detection, Malware Detection
Published 2016-08-02
URL http://arxiv.org/abs/1608.00835v1
PDF http://arxiv.org/pdf/1608.00835v1.pdf
PWC https://paperswithcode.com/paper/high-accuracy-android-malware-detection-using
Repo
Framework
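
The ensemble idea is straightforward: many weak detectors over static features (permissions, API calls) vote, and the averaged vote is thresholded. A minimal soft-voting sketch; the one-feature "stump" classifiers and the feature names are made up for illustration, not the paper's learned base models:

```python
def stump(feature_idx):
    # A one-feature rule: flag the app if the static feature is present.
    return lambda features: float(features[feature_idx])

def ensemble_predict(classifiers, features, threshold=0.5):
    """Average the votes of the base classifiers (soft voting)."""
    score = sum(c(features) for c in classifiers) / len(classifiers)
    return score >= threshold

# Hypothetical binary static features:
# [SEND_SMS permission, dynamic code loading, READ_CONTACTS, native exec]
classifiers = [stump(0), stump(1), stump(3)]
print(ensemble_predict(classifiers, [1, 1, 0, 0]))   # 2/3 flag it -> True
print(ensemble_predict(classifiers, [0, 0, 1, 0]))   # 0/3 flag it -> False
```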

Analysis of Bayesian Classification based Approaches for Android Malware Detection

Title Analysis of Bayesian Classification based Approaches for Android Malware Detection
Authors Suleiman Y. Yerima, Sakir Sezer, Gavin McWilliams
Abstract Mobile malware has been growing in scale and complexity, spurred by the unabated uptake of smartphones worldwide. Android is fast becoming the most popular mobile platform, resulting in a sharp increase in malware targeting it. Additionally, Android malware is evolving rapidly to evade detection by traditional signature-based scanning. Despite current detection measures in place, timely discovery of new malware is still a critical issue. This calls for novel approaches to mitigate the growing threat of zero-day Android malware. Hence, in this paper we develop and analyze proactive machine learning approaches based on Bayesian classification aimed at uncovering unknown Android malware via static analysis. The study, based on a large malware sample set covering the majority of existing families, demonstrates detection capabilities with high accuracy. Empirical results and comparative analysis are presented, offering useful insight towards the development of effective static-analytic Bayesian classification based solutions for detecting unknown Android malware.
Tasks Android Malware Detection, Malware Detection
Published 2016-08-20
URL http://arxiv.org/abs/1608.05812v1
PDF http://arxiv.org/pdf/1608.05812v1.pdf
PWC https://paperswithcode.com/paper/analysis-of-bayesian-classification-based
Repo
Framework
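
Bayesian classification over binary static-analysis features is, in its simplest form, Bernoulli naive Bayes with Laplace smoothing. A self-contained sketch of that setting; the tiny dataset and feature semantics are invented for illustration:

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing."""
    model = {}
    for c in (0, 1):
        rows = [x for x, yi in zip(X, y) if yi == c]
        prior = len(rows) / len(X)
        # Smoothed per-feature Bernoulli parameters for class c.
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2*alpha)
                 for j in range(len(X[0]))]
        model[c] = (math.log(prior), probs)
    return model

def predict(model, x):
    def log_post(c):
        lp, probs = model[c]
        return lp + sum(math.log(p if xi else 1 - p)
                        for xi, p in zip(x, probs))
    return 1 if log_post(1) > log_post(0) else 0

# Rows: apps; columns: hypothetical binary features (permission/API flags).
X = [[1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0]]
y = [1, 1, 0, 0]   # 1 = malware, 0 = benign
model = train_bernoulli_nb(X, y)
print(predict(model, [1, 0, 1]))   # resembles the malware rows
```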

Android Malware Detection Using Parallel Machine Learning Classifiers

Title Android Malware Detection Using Parallel Machine Learning Classifiers
Authors Suleiman Y. Yerima, Sakir Sezer, Igor Muttik
Abstract Mobile malware has continued to grow at an alarming rate despite ongoing mitigation efforts. This has been particularly noticeable on Android due to its being an open platform that has subsequently overtaken other platforms in the share of the mobile smart devices market, incentivizing a new wave of emerging Android malware sophisticated enough to evade most common detection methods. This paper proposes and investigates a parallel machine learning based classification approach for early detection of Android malware. Using real malware samples and benign applications, a composite classification model is developed from a parallel combination of heterogeneous classifiers. The empirical evaluation of the model under different combination schemes demonstrates its efficacy and potential to improve detection accuracy. More importantly, by utilizing several classifiers with diverse characteristics, their strengths can be harnessed not only for enhanced Android malware detection but also for quicker white-box analysis by means of the more interpretable constituent classifiers.
Tasks Android Malware Detection, Malware Detection
Published 2016-07-27
URL http://arxiv.org/abs/1607.08186v1
PDF http://arxiv.org/pdf/1607.08186v1.pdf
PWC https://paperswithcode.com/paper/android-malware-detection-using-parallel
Repo
Framework
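
The "different combination schemes" evaluated in the paper combine the base classifiers' outputs under different rules. A generic sketch of three such rules applied to per-classifier malware probabilities (the scheme names and exact rules here are generic illustrations, not necessarily the paper's set):

```python
def combine(probs, scheme="majority", threshold=0.5):
    """Combine heterogeneous classifiers' malware probabilities."""
    if scheme == "max":
        score = max(probs)                              # most suspicious wins
    elif scheme == "average":
        score = sum(probs) / len(probs)                 # mean probability
    elif scheme == "majority":
        score = sum(p >= threshold for p in probs) / len(probs)  # vote share
    else:
        raise ValueError(scheme)
    return score >= threshold

probs = [0.9, 0.2, 0.3]    # outputs of three diverse base classifiers
print(combine(probs, "max"))        # True: one classifier is very confident
print(combine(probs, "average"))    # False
print(combine(probs, "majority"))   # False
```

The choice of rule trades false positives against detection rate, which is exactly what the paper's empirical evaluation measures.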

On the Simultaneous Preservation of Privacy and Community Structure in Anonymized Networks

Title On the Simultaneous Preservation of Privacy and Community Structure in Anonymized Networks
Authors Daniel Cullina, Kushagra Singhal, Negar Kiyavash, Prateek Mittal
Abstract We consider the problem of performing community detection on a network while maintaining privacy, assuming that the adversary has access to an auxiliary correlated network. We ask the question: “Does there exist a regime where the network cannot be deanonymized perfectly, yet the community structure could be learned?” To answer this question, we derive information-theoretic converses for the perfect deanonymization problem using the Stochastic Block Model and edge sub-sampling. We also provide an almost tight achievability result for perfect deanonymization, and we evaluate the performance of a percolation-based deanonymization algorithm on Stochastic Block Model datasets that satisfy the conditions of our converse. Although our converse applies only to exact deanonymization, the algorithm fails drastically when the conditions of the converse are met. Additionally, we study the effect of edge sub-sampling on the community structure of a real-world dataset. Results show that the dataset falls under the purview of the idea of this paper. These results suggest that it may be possible to prove stronger partial deanonymizability converses, which would enable better privacy guarantees.
Tasks Community Detection
Published 2016-03-25
URL http://arxiv.org/abs/1603.08028v1
PDF http://arxiv.org/pdf/1603.08028v1.pdf
PWC https://paperswithcode.com/paper/on-the-simultaneous-preservation-of-privacy
Repo
Framework
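
The two modeling ingredients here, the Stochastic Block Model and edge sub-sampling, are simple to state in code. A sketch of both (parameters are arbitrary toy values; the paper works with the regimes its converse requires):

```python
import random

def sbm(n, k, p_in, p_out, seed=0):
    """Sample a Stochastic Block Model graph with k equal communities."""
    rng = random.Random(seed)
    label = [i % k for i in range(n)]
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < (p_in if label[i] == label[j] else p_out)}
    return label, edges

def subsample(edges, s, seed=1):
    # Keep each edge independently with probability s: the edge
    # sub-sampling used to trade privacy against community structure.
    rng = random.Random(seed)
    return {e for e in edges if rng.random() < s}

labels, edges = sbm(n=60, k=2, p_in=0.5, p_out=0.05)
released = subsample(edges, s=0.5)
print(len(edges), len(released))
```

The question in the abstract is then whether communities remain detectable in `released` while matching it to a correlated copy of `edges` is information-theoretically impossible.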

Efficient batch-sequential Bayesian optimization with moments of truncated Gaussian vectors

Title Efficient batch-sequential Bayesian optimization with moments of truncated Gaussian vectors
Authors Sébastien Marmin, Clément Chevalier, David Ginsbourger
Abstract We deal with the efficient parallelization of Bayesian global optimization algorithms, and more specifically of those based on the expected improvement criterion and its variants. A closed-form formula relying on multivariate Gaussian cumulative distribution functions is established for a generalized version of the multipoint expected improvement criterion. In turn, the latter relies on intermediate results that could be of independent interest concerning moments of truncated Gaussian vectors. The obtained expansion of the criterion enables studying its differentiability with respect to point batches and calculating the corresponding gradient in closed form. Furthermore, we derive fast numerical approximations of this gradient and propose efficient batch optimization strategies. Numerical experiments illustrate that the proposed approaches enable computational savings of between one and two orders of magnitude, hence enabling derivative-based batch-sequential acquisition function maximization to become a practically implementable and efficient standard.
Tasks
Published 2016-09-09
URL http://arxiv.org/abs/1609.02700v1
PDF http://arxiv.org/pdf/1609.02700v1.pdf
PWC https://paperswithcode.com/paper/efficient-batch-sequential-bayesian
Repo
Framework
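
For context, the single-point expected improvement criterion that the paper generalizes has a well-known closed form in terms of the Gaussian CDF and PDF; the multipoint (batch) version replaces these with moments of truncated Gaussian vectors and multivariate normal CDFs. The one-point case, for minimization:

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form single-point expected improvement (minimization):
    EI = (best - mu) * Phi(z) + sigma * phi(z),  z = (best - mu) / sigma.
    """
    if sigma <= 0:
        return max(best - mu, 0.0)        # degenerate, no uncertainty
    z = (best - mu) / sigma
    Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))      # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # standard normal PDF
    return (best - mu) * Phi + sigma * phi

# A point predicted at the current best with high uncertainty is worth more
# than a certain point at the same value.
print(expected_improvement(mu=1.0, sigma=1.0, best=1.0))   # 1/sqrt(2*pi)
print(expected_improvement(mu=1.0, sigma=0.0, best=1.0))   # 0.0
```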

Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition

Title Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition
Authors Hagen Soltau, Hank Liao, Hasim Sak
Abstract We present results that show it is possible to build a competitive, greatly simplified, large vocabulary continuous speech recognition system with whole words as acoustic units. We model the output vocabulary of about 100,000 words directly using deep bi-directional LSTM RNNs with CTC loss. The model is trained on 125,000 hours of semi-supervised acoustic training data, which enables us to alleviate the data sparsity problem for word models. We show that the CTC word models work very well as an end-to-end all-neural speech recognition model without the use of traditional context-dependent sub-word phone units that require a pronunciation lexicon, and without any language model, removing the need to decode. We demonstrate that the CTC word models perform better than a strong, more complex, state-of-the-art baseline with sub-word units.
Tasks Language Modelling, Large Vocabulary Continuous Speech Recognition, Speech Recognition
Published 2016-10-31
URL http://arxiv.org/abs/1610.09975v1
PDF http://arxiv.org/pdf/1610.09975v1.pdf
PWC https://paperswithcode.com/paper/neural-speech-recognizer-acoustic-to-word
Repo
Framework
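
The reason no lexicon or decoder is needed is the CTC output convention: the LSTM emits one word label (or blank) per acoustic frame, and the final transcript is obtained by collapsing repeats and dropping blanks. A minimal greedy decode of that collapsing step (the frame sequence is invented for illustration):

```python
def ctc_greedy_decode(frame_labels, blank="_"):
    """Collapse a frame-level CTC path into an output word sequence:
    drop consecutive repeats, then drop blanks."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

frames = ["_", "the", "the", "_", "_", "cat", "cat", "cat", "_", "sat"]
print(ctc_greedy_decode(frames))   # ['the', 'cat', 'sat']
```

Note the blank also separates genuine repetitions: `["a", "_", "a"]` decodes to two words.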

Distributed Inexact Damped Newton Method: Data Partitioning and Load-Balancing

Title Distributed Inexact Damped Newton Method: Data Partitioning and Load-Balancing
Authors Chenxin Ma, Martin Takáč
Abstract In this paper we study an inexact damped Newton method implemented in a distributed environment. We start with the original DiSCO algorithm [Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss, Yuchen Zhang and Lin Xiao, 2015]. We show that this algorithm may not scale well and propose algorithmic modifications that lead to less communication, better load-balancing, and more efficient computation. We perform numerical experiments on a regularized empirical loss minimization instance described by a 273GB dataset.
Tasks Distributed Optimization
Published 2016-03-16
URL http://arxiv.org/abs/1603.05191v1
PDF http://arxiv.org/pdf/1603.05191v1.pdf
PWC https://paperswithcode.com/paper/distributed-inexact-damped-newton-method-data
Repo
Framework
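
The damped Newton update at the heart of DiSCO-style methods is x ← x − v / (1 + δ), where v approximates H⁻¹g and δ = √(vᵀHv) is the Newton decrement. A one-dimensional sketch where v is computed exactly (in the distributed setting it would instead be an inexact, communication-efficient solve); the regularized logistic-style loss is an illustrative stand-in:

```python
import math

def damped_newton(grad, hess, x0, iters=20):
    """Damped Newton iteration in one dimension."""
    x = x0
    for _ in range(iters):
        g, h = grad(x), hess(x)
        v = g / h                        # exact Newton direction (1-D)
        delta = math.sqrt(v * h * v)     # Newton decrement
        x = x - v / (1.0 + delta)        # damped step
    return x

# Regularized logistic-style loss: f(x) = log(1 + e^x) + 0.05 * x^2
grad = lambda x: 1 / (1 + math.exp(-x)) + 0.1 * x
hess = lambda x: math.exp(-x) / (1 + math.exp(-x))**2 + 0.1
x_star = damped_newton(grad, hess, x0=5.0)
print(abs(grad(x_star)) < 1e-6)
```

The damping factor 1/(1 + δ) keeps early steps conservative far from the optimum; near the solution δ → 0 and the method recovers Newton's quadratic convergence.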

Image Aesthetic Assessment: An Experimental Survey

Title Image Aesthetic Assessment: An Experimental Survey
Authors Yubin Deng, Chen Change Loy, Xiaoou Tang
Abstract This survey reviews recent computer vision techniques used in the assessment of image aesthetic quality. Image aesthetic assessment aims at computationally distinguishing high-quality photos from low-quality ones based on photographic rules, typically in the form of binary classification or quality scoring. A variety of approaches have been proposed in the literature to tackle this challenging problem. In this survey, we present a systematic listing of the reviewed approaches based on visual feature types (hand-crafted features and deep features) and evaluation criteria (dataset characteristics and evaluation metrics). Main contributions and novelties of the reviewed approaches are highlighted and discussed. In addition, following the emergence of deep learning techniques, we systematically evaluate recent deep learning settings that are useful for developing a robust deep model for aesthetic scoring. Experiments are conducted using simple yet solid baselines that are competitive with the current state of the art. Moreover, we discuss the possibility of manipulating the aesthetics of images through computational approaches. We hope that our survey can serve as a comprehensive reference source for future research on image aesthetic assessment.
Tasks
Published 2016-10-04
URL http://arxiv.org/abs/1610.00838v2
PDF http://arxiv.org/pdf/1610.00838v2.pdf
PWC https://paperswithcode.com/paper/image-aesthetic-assessment-an-experimental
Repo
Framework

Without-Replacement Sampling for Stochastic Gradient Methods: Convergence Results and Application to Distributed Optimization

Title Without-Replacement Sampling for Stochastic Gradient Methods: Convergence Results and Application to Distributed Optimization
Authors Ohad Shamir
Abstract Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled \emph{with} replacement. In practice, however, sampling \emph{without} replacement is very common, easier to implement in many cases, and often performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling, under various scenarios, for three types of algorithms: Any algorithm with online regret guarantees, stochastic gradient descent, and SVRG. A useful application of our SVRG analysis is a nearly-optimal algorithm for regularized least squares in a distributed setting, in terms of both communication complexity and runtime complexity, when the data is randomly partitioned and the condition number can be as large as the data size per machine (up to logarithmic factors). Our proof techniques combine ideas from stochastic optimization, adversarial online learning, and transductive learning theory, and can potentially be applied to other stochastic optimization and learning problems.
Tasks Distributed Optimization, Stochastic Optimization
Published 2016-03-02
URL http://arxiv.org/abs/1603.00570v3
PDF http://arxiv.org/pdf/1603.00570v3.pdf
PWC https://paperswithcode.com/paper/without-replacement-sampling-for-stochastic
Repo
Framework
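
The distinction the paper analyzes is between sampling gradients with replacement and random reshuffling, where each epoch visits every point exactly once in a fresh random order. A minimal sketch of the without-replacement regime on 1-D least squares (learning rate, data, and epoch count are illustrative):

```python
import random

def sgd_without_replacement(data, lr=0.1, epochs=30, seed=0):
    """SGD for 1-D least squares, sampling without replacement:
    each epoch shuffles the data and visits every point exactly once."""
    rng = random.Random(seed)
    w = 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)                 # fresh permutation per epoch
        for i in order:
            x, y = data[i]
            w -= lr * (w * x - y) * x      # gradient of 0.5 * (w*x - y)^2
    return w

# Noiseless data generated with w = 2, so SGD should recover it.
data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
w = sgd_without_replacement(data)
print(round(w, 3))   # close to 2.0
```

Swapping `rng.shuffle(order)` for i.i.d. draws from the dataset gives the with-replacement scheme assumed by most classical analyses.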

Projected Regression Methods for Inverting Fredholm Integrals: Formalism and Application to Analytical Continuation

Title Projected Regression Methods for Inverting Fredholm Integrals: Formalism and Application to Analytical Continuation
Authors Louis-Francois Arsenault, Richard Neuberg, Lauren A. Hannah, Andrew J. Millis
Abstract We present a machine learning approach to the inversion of Fredholm integrals of the first kind. The approach provides a natural regularization in cases where the inverse of the Fredholm kernel is ill-conditioned. It also provides an efficient and stable treatment of constraints. The key observation is that the stability of the forward problem permits the construction of a large database of outputs for physically meaningful inputs. We apply machine learning to this database to generate a regression function of controlled complexity, which returns approximate solutions for previously unseen inputs; the approximate solutions are then projected onto the subspace of functions satisfying relevant constraints. We also derive and present uncertainty estimates. We illustrate the approach by applying it to the analytical continuation problem of quantum many-body physics, which involves reconstructing the frequency dependence of physical excitation spectra from data obtained at specific points in the complex frequency plane. Under standard error metrics the method performs as well or better than the Maximum Entropy method for low input noise and is substantially more robust to increased input noise. We expect the methodology to be similarly effective for any problem involving a formally ill-conditioned inversion, provided that the forward problem can be efficiently solved.
Tasks
Published 2016-12-15
URL http://arxiv.org/abs/1612.04895v1
PDF http://arxiv.org/pdf/1612.04895v1.pdf
PWC https://paperswithcode.com/paper/projected-regression-methods-for-inverting
Repo
Framework
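
The pipeline in the abstract — exploit the stable forward problem to build a database of (input, output) pairs, then regress inputs on outputs with regularization to tame the ill-conditioning — can be sketched in a tiny discretized setting. Here a 2×2 near-singular matrix stands in for the discretized Fredholm kernel, and ridge regression plays the role of the paper's controlled-complexity regression (the constraint-projection step is omitted):

```python
def matvec(M, v):
    return [sum(Mij * vj for Mij, vj in zip(row, v)) for row in M]

def inv2(M):
    (a, b), (c, d) = M
    det = a*d - b*c
    return [[d/det, -b/det], [-c/det, a/det]]

def fit_ridge_inverse(K, inputs, lam=1e-3):
    """Learn a regularized inverse of the forward map g = K f from data:
    solve min_W sum ||W g - f||^2 + lam ||W||^2 in closed form."""
    pairs = [(f, matvec(K, f)) for f in inputs]     # database of solved forwards
    A = [[sum(f[i]*g[j] for f, g in pairs) for j in range(2)] for i in range(2)]
    B = [[sum(g[i]*g[j] for _, g in pairs) + (lam if i == j else 0)
          for j in range(2)] for i in range(2)]
    Binv = inv2(B)
    return [[sum(A[i][k]*Binv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

K = [[1.0, 0.9], [0.9, 1.0]]            # nearly singular forward kernel
train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
W = fit_ridge_inverse(K, train)
f_true = [0.5, 1.5]
f_hat = matvec(W, matvec(K, f_true))    # invert a previously unseen output
print(f_hat)
```

The regularization parameter trades reconstruction bias against robustness to noise in the outputs, which is the trade-off the paper studies against the Maximum Entropy method.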