July 27, 2019

Paper Group ANR 612

Using Deep Neural Networks to Automate Large Scale Statistical Analysis for Big Data Applications

Title Using Deep Neural Networks to Automate Large Scale Statistical Analysis for Big Data Applications
Authors Rongrong Zhang, Wei Deng, Michael Yu Zhu
Abstract Statistical analysis (SA) is a complex process to deduce population properties from analysis of data. It usually takes a well-trained analyst to successfully perform SA, and it becomes extremely challenging to apply SA to big data applications. We propose to use deep neural networks to automate the SA process. In particular, we propose to construct convolutional neural networks (CNNs) to perform automatic model selection and parameter estimation, two of the most important SA tasks. We refer to the resulting CNNs as the neural model selector and the neural model estimator, respectively, which can be properly trained using labeled data systematically generated from candidate models. A simulation study shows that both the selector and the estimator demonstrate excellent performance. The idea and proposed framework can be further extended to automate the entire SA process and have the potential to revolutionize how SA is performed in big data analytics.
Tasks Model Selection
Published 2017-08-09
URL http://arxiv.org/abs/1708.03027v1
PDF http://arxiv.org/pdf/1708.03027v1.pdf
PWC https://paperswithcode.com/paper/using-deep-neural-networks-to-automate-large
Repo
Framework

Persistence Flamelets: multiscale Persistent Homology for kernel density exploration

Title Persistence Flamelets: multiscale Persistent Homology for kernel density exploration
Authors Tullia Padellini, Pierpaolo Brutti
Abstract In recent years there has been noticeable interest in the study of the “shape of data”. Among the many ways a “shape” could be defined, topology is the most general one, as it describes an object in terms of its connectivity structure: connected components (topological features of dimension 0), cycles (features of dimension 1) and so on. There is a growing number of techniques, generally denoted as Topological Data Analysis (TDA), aimed at estimating topological invariants of a fixed object; when we allow this object to change, however, little has been done to investigate the evolution in its topology. In this work we define the Persistence Flamelets, a multiscale version of one of the most popular tools in TDA, the Persistence Landscape. We examine its theoretical properties and we show how it could be used to gain insights on the bandwidth parameter of kernel density estimators (KDEs).
Tasks Topological Data Analysis
Published 2017-09-20
URL http://arxiv.org/abs/1709.07097v1
PDF http://arxiv.org/pdf/1709.07097v1.pdf
PWC https://paperswithcode.com/paper/persistence-flamelets-multiscale-persistent
Repo
Framework
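
The Persistence Landscape named in the abstract above can be sketched in a few lines. This is a minimal illustration of the standard landscape definition, not the authors' Flamelet construction: each (birth, death) pair contributes a "tent" function, and the k-th landscape value at t is the k-th largest tent height. The function name `landscape` and the example diagram are assumptions made for illustration.

```python
def landscape(diagram, k, t):
    """Value of the k-th persistence landscape function at t (k starts at 1)."""
    # Each (birth, death) pair contributes a tent peaking at its midpoint.
    tents = sorted(
        (max(0.0, min(t - b, d - t)) for b, d in diagram),
        reverse=True,
    )
    # The k-th landscape is the k-th largest tent height, or 0 if none is left.
    return tents[k - 1] if k <= len(tents) else 0.0


# Hypothetical persistence diagram with two (birth, death) pairs.
diagram = [(0.0, 4.0), (1.0, 3.0)]
print(landscape(diagram, 1, 2.0))  # 2.0
print(landscape(diagram, 2, 2.0))  # 1.0
```

A multiscale version in the spirit of the abstract would evaluate such a function over a range of bandwidths as well as over t; that extension is not shown here.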

Quantitative Analysis of Automatic Image Cropping Algorithms: A Dataset and Comparative Study

Title Quantitative Analysis of Automatic Image Cropping Algorithms: A Dataset and Comparative Study
Authors Yi-Ling Chen, Tzu-Wei Huang, Kai-Han Chang, Yu-Chen Tsai, Hwann-Tzong Chen, Bing-Yu Chen
Abstract Automatic photo cropping is an important tool for improving visual quality of digital photos without resorting to tedious manual selection. Traditionally, photo cropping is accomplished by determining the best proposal window through visual quality assessment or saliency detection. In essence, the performance of an image cropper highly depends on the ability to correctly rank a number of visually similar proposal windows. Despite the ranking nature of automatic photo cropping, little attention has been paid to learning-to-rank algorithms in tackling such a problem. In this work, we conduct an extensive study on traditional approaches as well as ranking-based croppers trained on various image features. In addition, a new dataset consisting of high quality cropping and pairwise ranking annotations is presented to evaluate the performance of various baselines. The experimental results on the new dataset provide useful insights into the design of better photo cropping algorithms.
Tasks Image Cropping, Learning-To-Rank, Saliency Detection
Published 2017-01-05
URL http://arxiv.org/abs/1701.01480v1
PDF http://arxiv.org/pdf/1701.01480v1.pdf
PWC https://paperswithcode.com/paper/quantitative-analysis-of-automatic-image
Repo
Framework

Fast single image super-resolution based on sigmoid transformation

Title Fast single image super-resolution based on sigmoid transformation
Authors Longguang Wang, Zaiping Lin, Jinyan Gao, Xinpu Deng, Wei An
Abstract Single image super-resolution aims to generate a high-resolution image from a single low-resolution image, which is of great significance in extensive applications. Because the problem is ill-posed, numerous methods have been proposed to reconstruct the missing image details based on exemplars or priors. In this paper, we propose a fast and simple single image super-resolution strategy utilizing patch-wise sigmoid transformation as an imposed sharpening regularization term in the reconstruction, which realizes amazing reconstruction performance. Extensive experiments compared with other state-of-the-art approaches demonstrate the superior effectiveness and efficiency of the proposed algorithm.
Tasks Image Super-Resolution, Super-Resolution
Published 2017-08-23
URL http://arxiv.org/abs/1708.07029v3
PDF http://arxiv.org/pdf/1708.07029v3.pdf
PWC https://paperswithcode.com/paper/fast-single-image-super-resolution-based-on
Repo
Framework

Convergence Analysis of Gradient EM for Multi-component Gaussian Mixture

Title Convergence Analysis of Gradient EM for Multi-component Gaussian Mixture
Authors Bowei Yan, Mingzhang Yin, Purnamrita Sarkar
Abstract In this paper, we study convergence properties of the gradient Expectation-Maximization algorithm (Lange, 1995) for Gaussian Mixture Models for a general number of clusters and mixing coefficients. We derive the convergence rate depending on the mixing coefficients, minimum and maximum pairwise distances between the true centers and dimensionality and number of components; and obtain a near-optimal local contraction radius. While there have been some recent notable works that derive local convergence rates for EM in the two equal mixture symmetric GMM, in the more general case, the derivations need structurally different and non-trivial arguments. We use recent tools from learning theory and empirical processes to achieve our theoretical results.
Tasks
Published 2017-05-23
URL http://arxiv.org/abs/1705.08530v2
PDF http://arxiv.org/pdf/1705.08530v2.pdf
PWC https://paperswithcode.com/paper/convergence-analysis-of-gradient-em-for-multi
Repo
Framework
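
As a rough illustration of the gradient EM idea in the abstract above, the sketch below replaces the exact M-step with a single gradient ascent step on the means. It assumes a one-dimensional mixture with unit variances and known mixing weights; the function name `gradient_em_step`, the step size and the toy data are assumptions for illustration, not the setting analysed in the paper.

```python
import math


def gradient_em_step(xs, mus, weights, step=1.0):
    """One gradient EM update of the means of a 1-D, unit-variance GMM."""
    n = len(xs)
    grads = [0.0] * len(mus)
    for x in xs:
        # E-step: unnormalised posterior weight of each component for x.
        dens = [w * math.exp(-0.5 * (x - m) ** 2) for w, m in zip(weights, mus)]
        total = sum(dens)
        for k, m in enumerate(mus):
            # Gradient of the EM surrogate Q with respect to mu_k.
            grads[k] += (dens[k] / total) * (x - m) / n
    # A single gradient ascent step stands in for the exact M-step.
    return [m + step * g for m, g in zip(mus, grads)]


# Toy data clustered around -2 and +2 (hypothetical values).
xs = [-2.1, -1.9, 1.9, 2.1]
mus = [-1.0, 1.0]
for _ in range(100):
    mus = gradient_em_step(xs, mus, [0.5, 0.5])
print(mus)
```

On this well-separated toy data the means drift towards the two cluster centres; the paper's results concern when and how fast such iterations contract in general.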

Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework

Title Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework
Authors Xiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur
Abstract Speech recognition systems for irregularly-spelled languages like English normally require hand-written pronunciations. In this paper, we describe a system for automatically obtaining pronunciations of words for which pronunciations are not available, but for which transcribed data exists. Our method integrates information from the letter sequence and from the acoustic evidence. The novel aspect of the problem that we address is the problem of how to prune entries from such a lexicon (since, empirically, lexicons with too many entries do not tend to be good for ASR performance). Experiments on various ASR tasks show that, with the proposed framework, starting with an initial lexicon of several thousand words, we are able to learn a lexicon which performs close to a full expert lexicon in terms of WER performance on test data, and is better than lexicons built using G2P alone or with a pruning criterion based on pronunciation probability.
Tasks Speech Recognition
Published 2017-06-12
URL http://arxiv.org/abs/1706.03747v1
PDF http://arxiv.org/pdf/1706.03747v1.pdf
PWC https://paperswithcode.com/paper/acoustic-data-driven-lexicon-learning-based
Repo
Framework

Scatteract: Automated extraction of data from scatter plots

Title Scatteract: Automated extraction of data from scatter plots
Authors Mathieu Cliche, David Rosenberg, Dhruv Madeka, Connie Yee
Abstract Charts are an excellent way to convey patterns and trends in data, but they do not facilitate further modeling of the data or close inspection of individual data points. We present a fully automated system for extracting the numerical values of data points from images of scatter plots. We use deep learning techniques to identify the key components of the chart, and optical character recognition together with robust regression to map from pixels to the coordinate system of the chart. We focus on scatter plots with linear scales, which already have several interesting challenges. Previous work has done fully automatic extraction for other types of charts, but to our knowledge this is the first approach that is fully automatic for scatter plots. Our method performs well, achieving successful data extraction on 89% of the plots in our test set.
Tasks Optical Character Recognition
Published 2017-04-21
URL http://arxiv.org/abs/1704.06687v1
PDF http://arxiv.org/pdf/1704.06687v1.pdf
PWC https://paperswithcode.com/paper/scatteract-automated-extraction-of-data-from
Repo
Framework

TFW, DamnGina, Juvie, and Hotsie-Totsie: On the Linguistic and Social Aspects of Internet Slang

Title TFW, DamnGina, Juvie, and Hotsie-Totsie: On the Linguistic and Social Aspects of Internet Slang
Authors Vivek Kulkarni, William Yang Wang
Abstract Slang is ubiquitous on the Internet. The emergence of new social contexts like micro-blogs, question-answering forums, and social networks has enabled slang and non-standard expressions to abound on the web. Despite this, slang has been traditionally viewed as a form of non-standard language, a form of language that is not the focus of linguistic analysis and has largely been neglected. In this work, we use UrbanDictionary to conduct the first large-scale linguistic analysis of slang and its social aspects on the Internet to yield insights into this variety of language that is increasingly used all over the world online. We begin by computationally analyzing the phonological, morphological and syntactic properties of slang. We then study linguistic patterns in four specific categories of slang, namely alphabetisms, blends, clippings, and reduplicatives. Our analysis reveals that slang demonstrates extra-grammatical rules of phonological and morphological formation that markedly distinguish it from the standard form, shedding light on its generative patterns. Next, we analyze the social aspects of slang by studying subject restriction and stereotyping in slang usage. Analyzing tens of thousands of such slang words reveals that the majority of slang on the Internet belongs to two major categories: sex and drugs. We also note that slang usage is not immune to prevalent social biases and prejudices, and in fact reflects such biases and stereotypes more intensely than the standard variety.
Tasks Question Answering
Published 2017-12-22
URL http://arxiv.org/abs/1712.08291v1
PDF http://arxiv.org/pdf/1712.08291v1.pdf
PWC https://paperswithcode.com/paper/tfw-damngina-juvie-and-hotsie-totsie-on-the
Repo
Framework

Sparse Representation Based Augmented Multinomial Logistic Extreme Learning Machine with Weighted Composite Features for Spectral Spatial Hyperspectral Image Classification

Title Sparse Representation Based Augmented Multinomial Logistic Extreme Learning Machine with Weighted Composite Features for Spectral Spatial Hyperspectral Image Classification
Authors Faxian Cao, Zhijing Yang, Jinchang Ren, Wing-Kuen Ling
Abstract Although extreme learning machine (ELM) has been successfully applied to a number of pattern recognition problems, it fails to provide sufficiently good results in hyperspectral image (HSI) classification due to two main drawbacks. The first is due to the random weights and bias of ELM, which may lead to ill-posed problems. The second is the lack of spatial information for classification. To tackle these two problems, in this paper, we propose a new framework for ELM based spectral-spatial classification of HSI, where probabilistic modelling with sparse representation and weighted composite features (WCF) are employed respectively to derive the optimized output weights and extract spatial features. First, the ELM is represented as a concave logarithmic likelihood function under statistical modelling using the maximum a posteriori (MAP). Second, the sparse representation is applied to the Laplacian prior to efficiently determine a logarithmic posterior with a unique maximum in order to solve the ill-posed problem of ELM. The variable splitting and the augmented Lagrangian are subsequently used to further reduce the computation complexity of the proposed algorithm and it has been proven a more efficient method for speed improvement. Third, the spatial information is extracted using the weighted composite features (WCFs) to construct the spectral-spatial classification framework. In addition, the lower bound of the proposed method is derived by a rigorous mathematical proof. Experimental results on two publicly available HSI data sets demonstrate that the proposed methodology outperforms ELM and a number of state-of-the-art approaches.
Tasks Hyperspectral Image Classification, Image Classification
Published 2017-09-12
URL http://arxiv.org/abs/1709.03792v2
PDF http://arxiv.org/pdf/1709.03792v2.pdf
PWC https://paperswithcode.com/paper/sparse-representation-based-augmented
Repo
Framework

Continual One-Shot Learning of Hidden Spike-Patterns with Neural Network Simulation Expansion and STDP Convergence Predictions

Title Continual One-Shot Learning of Hidden Spike-Patterns with Neural Network Simulation Expansion and STDP Convergence Predictions
Authors Toby Lightheart, Steven Grainger, Tien-Fu Lu
Abstract This paper presents a constructive algorithm that achieves successful one-shot learning of hidden spike-patterns in a competitive detection task. It has previously been shown (Masquelier et al., 2008) that spike-timing-dependent plasticity (STDP) and lateral inhibition can result in neurons competitively tuned to repeating spike-patterns concealed in high rates of overall presynaptic activity. One-shot construction of neurons with synapse weights calculated as estimates of converged STDP outcomes results in immediate selective detection of hidden spike-patterns. The capability of continual learning is demonstrated through the successful one-shot detection of new sets of spike-patterns introduced after long intervals in the simulation time. Simulation expansion (Lightheart et al., 2013) has been proposed as an approach to the development of constructive algorithms that are compatible with simulations of biological neural networks. A simulation of a biological neural network may have orders of magnitude fewer neurons and connections than the related biological neural systems; therefore, simulated neural networks can be assumed to be a subset of a larger neural system. The constructive algorithm is developed using simulation expansion concepts to perform an operation equivalent to the exchange of neurons between the simulation and the larger hypothetical neural system. The dynamic selection of neurons to simulate within a larger neural system (hypothetical or stored in memory) may be a starting point for a wide range of developments and applications in machine learning and the simulation of biology.
Tasks Continual Learning, One-Shot Learning
Published 2017-08-30
URL http://arxiv.org/abs/1708.09072v1
PDF http://arxiv.org/pdf/1708.09072v1.pdf
PWC https://paperswithcode.com/paper/continual-one-shot-learning-of-hidden-spike
Repo
Framework

Comparison of Decision Tree Based Classification Strategies to Detect External Chemical Stimuli from Raw and Filtered Plant Electrical Response

Title Comparison of Decision Tree Based Classification Strategies to Detect External Chemical Stimuli from Raw and Filtered Plant Electrical Response
Authors Shre Kumar Chatterjee, Saptarshi Das, Koushik Maharatna, Elisa Masi, Luisa Santopolo, Ilaria Colzi, Stefano Mancuso, Andrea Vitaletti
Abstract Plants monitor their surrounding environment and control their physiological functions by producing an electrical response. We recorded electrical signals from different plants by exposing them to Sodium Chloride (NaCl), Ozone (O3) and Sulfuric Acid (H2SO4) under laboratory conditions. After applying pre-processing techniques such as filtering and drift removal, we extracted a few statistical features from the acquired plant electrical signals. Using these features, combined with different classification algorithms, we used a decision tree based multi-class classification strategy to identify the three different external chemical stimuli. We here present our exploration to obtain the optimum combination of ranked features and classifier that can separate a particular chemical stimulus from the incoming stream of plant electrical signals. The paper also reports an exhaustive comparison of similar feature based classification using the filtered and the raw plant signals, containing the high frequency stochastic part and also the low frequency trends present in it, as two different cases for feature extraction. The work presented in this paper opens up new possibilities for using plant electrical signals to monitor and detect other environmental stimuli apart from NaCl, O3 and H2SO4 in the future.
Tasks
Published 2017-05-13
URL http://arxiv.org/abs/1707.07620v1
PDF http://arxiv.org/pdf/1707.07620v1.pdf
PWC https://paperswithcode.com/paper/comparison-of-decision-tree-based
Repo
Framework

Dynamic time warping distance for message propagation classification in Twitter

Title Dynamic time warping distance for message propagation classification in Twitter
Authors Siwar Jendoubi, Arnaud Martin, Ludovic Liétard, Boutheina Ben Yaghlane, Hend Ben Hadji
Abstract Social message classification is a research domain that has attracted the attention of many researchers in recent years. Indeed, the social message is different from ordinary text because it has some special characteristics like its shortness. Thus, the development of new approaches for the processing of the social message is now essential to make its classification more efficient. In this paper, we are mainly interested in the classification of social messages based on their spreading on online social networks (OSN). We propose a new distance metric based on the Dynamic Time Warping distance and we use it with the probabilistic and the evidential k Nearest Neighbors (k-NN) classifiers to classify propagation networks (PrNets) of messages. The propagation network is a directed acyclic graph (DAG) that is used to record propagation traces of the message, the traversed links and their types. We tested the proposed metric with the chosen k-NN classifiers on real world propagation traces that were collected from the Twitter social network and we got good classification accuracies.
Tasks
Published 2017-01-26
URL http://arxiv.org/abs/1701.07756v1
PDF http://arxiv.org/pdf/1701.07756v1.pdf
PWC https://paperswithcode.com/paper/dynamic-time-warping-distance-for-message
Repo
Framework
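
The metric in the abstract above builds on the Dynamic Time Warping distance. The sketch below shows only the classic DTW recurrence on two numeric sequences, not the authors' adaptation to propagation networks. The function name `dtw_distance` and the sample sequences are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The second sequence repeats one value; warping absorbs the repetition.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

Because the distance tolerates local stretching, two sequences of different lengths can still be compared, which is what makes it usable inside a k-NN classifier.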

Learning Random Fourier Features by Hybrid Constrained Optimization

Title Learning Random Fourier Features by Hybrid Constrained Optimization
Authors Jianqiao Wangni, Jingwei Zhuo, Jun Zhu
Abstract The kernel embedding algorithm is an important component for adapting kernel methods to large datasets. Since the algorithm consumes a major computation cost in the testing phase, we propose a novel teacher-learner framework of learning computation-efficient kernel embeddings from specific data. In the framework, the high-precision embeddings (teacher) transfer the data information to the computation-efficient kernel embeddings (learner). We jointly select informative embedding functions and pursue an orthogonal transformation between two embeddings. We propose a novel approach of constrained variational expectation maximization (CVEM), where the alternating direction method of multipliers (ADMM) is applied over a nonconvex domain in the maximization step. We also propose two specific formulations based on the prevalent Random Fourier Feature (RFF), the masked and blocked version of Computation-Efficient RFF (CERF), by imposing a random binary mask or a block structure on the transformation matrix. By empirical studies of several applications on different real-world datasets, we demonstrate that the CERF significantly improves the performance of kernel methods upon the RFF, under certain arithmetic operation requirements, and is suitable for structured matrix multiplication in Fastfood type algorithms.
Tasks
Published 2017-12-07
URL http://arxiv.org/abs/1712.02527v1
PDF http://arxiv.org/pdf/1712.02527v1.pdf
PWC https://paperswithcode.com/paper/learning-random-fourier-features-by-hybrid
Repo
Framework
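
For readers unfamiliar with the Random Fourier Feature (RFF) baseline mentioned in the abstract above, here is a minimal sketch of the standard construction for the Gaussian kernel exp(-gamma * ||x - y||^2). It does not implement CERF or the teacher-learner framework. The names `make_rff` and `dot`, and the parameter values, are assumptions for illustration.

```python
import math
import random


def make_rff(dim, n_features, gamma, seed=0):
    """Return a feature map z with z(x).z(y) close to exp(-gamma*||x-y||^2)."""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, 2*gamma*I), phases uniformly on [0, 2*pi].
    scale = math.sqrt(2.0 * gamma)
    W = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def z(x):
        return [
            norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b_j)
            for w, b_j in zip(W, b)
        ]

    return z


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


z = make_rff(dim=2, n_features=4000, gamma=0.5)
x, y = [0.0, 0.0], [1.0, 0.5]
exact = math.exp(-0.5 * ((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2))
print(exact, dot(z(x), z(y)))  # the two values should be close
```

The cost of evaluating `z` is dominated by the dense product with `W`, which is the part that masked or block-structured transformation matrices aim to make cheaper.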

Spectrum-based deep neural networks for fraud detection

Title Spectrum-based deep neural networks for fraud detection
Authors Shuhan Yuan, Xintao Wu, Jun Li, Aidong Lu
Abstract In this paper, we focus on fraud detection on a signed graph with only a small set of labeled training data. We propose a novel framework that combines deep neural networks and spectral graph analysis. In particular, we use the node projection (called the spectral coordinate) in the low dimensional spectral space of the graph’s adjacency matrix as input of deep neural networks. Spectral coordinates in the spectral space capture the most useful topology information of the network. Due to the small dimension of spectral coordinates (compared with the dimension of the adjacency matrix derived from a graph), training deep neural networks becomes feasible. We develop and evaluate two neural networks, deep autoencoder and convolutional neural network, in our fraud detection framework. Experimental results on a real signed graph show that our spectrum based deep neural networks are effective in fraud detection.
Tasks Fraud Detection
Published 2017-06-03
URL http://arxiv.org/abs/1706.00891v1
PDF http://arxiv.org/pdf/1706.00891v1.pdf
PWC https://paperswithcode.com/paper/spectrum-based-deep-neural-networks-for-fraud
Repo
Framework
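
The spectral coordinates described in the abstract above can be illustrated with a small eigen-decomposition. This sketch assumes a symmetric signed adjacency matrix and takes the eigenvectors with the k algebraically largest eigenvalues; which eigenvectors the paper actually selects is not stated in the abstract. The function name `spectral_coordinates` and the toy graph are hypothetical.

```python
import numpy as np


def spectral_coordinates(adjacency, k):
    """Project each node onto k leading eigenvectors of the adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    # eigh expects a symmetric matrix and returns ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(A)
    top = np.argsort(eigenvalues)[::-1][:k]
    # Row i of the result is the k-dimensional coordinate of node i.
    return eigenvectors[:, top]


# Toy signed graph: +1 for a positive edge, -1 for a negative edge.
A = [
    [0, 1, 1, -1],
    [1, 0, 1, -1],
    [1, 1, 0, -1],
    [-1, -1, -1, 0],
]
X = spectral_coordinates(A, k=2)
print(X.shape)  # (4, 2)
```

Each row of `X` is a short vector regardless of graph size, which is why feeding spectral coordinates to a neural network is cheaper than feeding rows of the full adjacency matrix.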

Optimizing scoring function of dynamic programming of pairwise profile alignment using derivative free neural network

Title Optimizing scoring function of dynamic programming of pairwise profile alignment using derivative free neural network
Authors Kazunori D Yamada
Abstract A profile comparison method with position-specific scoring matrix (PSSM) is one of the most accurate alignment methods. Currently, cosine similarity and correlation coefficient are used as scoring functions of dynamic programming to calculate similarity between PSSMs. However, it is unclear whether these functions are optimal for profile alignment methods. At least, by definition, these functions cannot capture non-linear relationships between profiles. Therefore, in this study, we attempted to discover a novel scoring function, which was more suitable for the profile comparison method than the existing ones. First, we implemented a new derivative free neural network by combining the conventional neural network with an evolutionary strategy optimization method. Next, using the framework, the scoring function was optimized for aligning remote sequence pairs. Nepal, the pairwise profile aligner with the novel scoring function, significantly improved both alignment sensitivity and precision, compared to aligners with the existing functions. Nepal improved alignment quality because of adaptation to remote sequence alignment and increasing the expressive power of similarity score. The novel scoring function can be realized using a simple matrix operation and easily incorporated into other aligners. With our scoring function, the performance of homology detection and/or multiple sequence alignment for remote homologous sequences would be further improved.
Tasks
Published 2017-08-30
URL http://arxiv.org/abs/1708.09097v2
PDF http://arxiv.org/pdf/1708.09097v2.pdf
PWC https://paperswithcode.com/paper/optimizing-scoring-function-of-dynamic
Repo
Framework
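
The abstract above names cosine similarity as one of the existing scoring functions for comparing PSSM columns. A minimal sketch of that baseline follows; it is not the learned scoring function proposed in the paper. The function name `cosine_similarity`, the zero-vector convention and the example columns are assumptions for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two PSSM columns given as score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero columns
    return dot / (norm_u * norm_v)


# Hypothetical, shortened profile columns (real PSSM columns have 20 entries).
col_a = [1.0, 2.0, 3.0]
col_b = [2.0, 4.0, 6.0]
print(cosine_similarity(col_a, col_b))  # close to 1, since col_b = 2 * col_a
```

In a profile aligner, a score like this fills each cell of the dynamic programming matrix; being a normalized dot product, it can only express linear relationships between columns, which is the limitation the abstract points out.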