Paper Group ANR 559
Representation of linguistic form and function in recurrent neural networks
Title | Representation of linguistic form and function in recurrent neural networks |
Authors | Ákos Kádár, Grzegorz Chrupała, Afra Alishahi |
Abstract | We present novel methods for analyzing the activation patterns of RNNs from a linguistic point of view and explore the types of linguistic structure they learn. As a case study, we use a multi-task gated recurrent network architecture consisting of two parallel pathways with shared word embeddings, trained on predicting the representations of the visual scene corresponding to an input sentence and on predicting the next word in the same sentence. Based on our proposed method for estimating the contribution of individual input tokens to the networks' final prediction, we show that the image prediction pathway: a) is sensitive to the information structure of the sentence; b) pays selective attention to lexical categories and grammatical functions that carry semantic information; and c) learns to treat the same input token differently depending on its grammatical function in the sentence. In contrast, the language model is comparatively more sensitive to words with a syntactic function. Furthermore, we propose methods to explore the function of individual hidden units in RNNs and show that the two pathways of the architecture in our case study contain specialized units tuned to patterns informative for the task, some of which can carry activations to later time steps to encode long-term dependencies. |
Tasks | Language Modelling, Word Embeddings |
Published | 2016-02-29 |
URL | http://arxiv.org/abs/1602.08952v2 |
PDF | http://arxiv.org/pdf/1602.08952v2.pdf |
PWC | https://paperswithcode.com/paper/representation-of-linguistic-form-and |
Repo | |
Framework | |
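Kádár et al.'s token-contribution analysis lends itself to a compact illustration. Below is a minimal sketch of the omission-style measure the abstract describes — score each token by how much the sentence representation changes when that token is left out. `token_vec` and `encode` are hypothetical stand-ins, not the paper's trained pathways; any sentence-vector model could be plugged in:

```python
# Contribution of token i = distance between the full-sentence representation
# and the representation with token i omitted.
import numpy as np

def token_vec(tok):
    # Placeholder: a deterministic random vector per token.
    rng = np.random.default_rng(sum(tok.encode()))
    return rng.standard_normal(64)

def encode(tokens):
    # Placeholder encoder: mean-pooled token vectors.
    return np.mean([token_vec(t) for t in tokens], axis=0)

def omission_scores(tokens):
    full = encode(tokens)
    scores = []
    for i in range(len(tokens)):
        reduced = encode(tokens[:i] + tokens[i + 1:])
        # Cosine distance between full and token-omitted representations.
        cos = np.dot(full, reduced) / (np.linalg.norm(full) * np.linalg.norm(reduced))
        scores.append(1.0 - cos)
    return scores

sentence = "a brown dog plays in the park".split()
for tok, s in zip(sentence, omission_scores(sentence)):
    print(f"{tok:>6s}  {s:.3f}")
```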
Computer Aided Restoration of Handwritten Character Strokes
Title | Computer Aided Restoration of Handwritten Character Strokes |
Authors | Barak Sober, David Levin |
Abstract | This work suggests a new variational approach to the task of computer-aided restoration of incomplete characters residing in a highly noisy document. We model character strokes as the movement of a pen with a varying radius. Following this model, a cubic spline representation is used to perform gradient descent steps while maintaining interpolation at some initial (manually sampled) points. The proposed algorithm was used to restore approximately 1000 ancient Hebrew characters (dating to ca. the 8th-7th century BCE), some of which are presented herein and show that the algorithm yields plausible results when applied to deteriorated documents. |
Tasks | |
Published | 2016-02-23 |
URL | http://arxiv.org/abs/1602.07038v2 |
PDF | http://arxiv.org/pdf/1602.07038v2.pdf |
PWC | https://paperswithcode.com/paper/computer-aided-restoration-of-handwritten |
Repo | |
Framework | |
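The stroke model is concrete enough to sketch. Below, as an illustration only, the pen centerline is a parametric cubic spline interpolating a few manually sampled points, with the varying pen radius interpolated alongside; the paper's variational gradient-descent refinement against the noisy document image is not reproduced here, and the sample points are invented:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Manually sampled (x, y) points along a deteriorated stroke, plus pen radii.
pts = np.array([[0.0, 0.0], [1.0, 1.5], [2.0, 1.2], [3.0, 2.5]])
radii = np.array([0.30, 0.25, 0.28, 0.22])

# Chord-length parameterization keeps the parametric spline well behaved.
t = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))])
cx, cy = CubicSpline(t, pts[:, 0]), CubicSpline(t, pts[:, 1])
cr = CubicSpline(t, radii)

ts = np.linspace(t[0], t[-1], 200)
centerline = np.stack([cx(ts), cy(ts)], axis=1)   # restored stroke path
widths = 2.0 * cr(ts)                             # varying stroke width
print(centerline.shape, widths.min(), widths.max())
```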
Sparse additive Gaussian process with soft interactions
Title | Sparse additive Gaussian process with soft interactions |
Authors | Garret Vo, Debdeep Pati |
Abstract | Additive nonparametric regression models provide an attractive tool for variable selection in high dimensions when the relationship between the response and predictors is complex. They offer greater flexibility than parametric non-linear regression models, and better interpretability and scalability than fully non-parametric regression models. However, achieving sparsity simultaneously in the number of nonparametric components and in the variables within each component poses a stiff computational challenge. In this article, we develop a novel Bayesian additive regression model using a combination of hard and soft shrinkage to separately control the number of additive components and the variables within each component. An efficient algorithm is developed to select the important variables and estimate the interaction network. Excellent performance is obtained in simulated and real data examples. |
Tasks | |
Published | 2016-07-09 |
URL | http://arxiv.org/abs/1607.02670v1 |
PDF | http://arxiv.org/pdf/1607.02670v1.pdf |
PWC | https://paperswithcode.com/paper/sparse-additive-gaussian-process-with-soft |
Repo | |
Framework | |
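The additive structure can be illustrated with a small kernel sketch. The following is a minimal stand-in for the model class — a sum of one-dimensional RBF kernels, one per additive component, fitted in closed form as a GP posterior mean — without the paper's hard/soft shrinkage priors or posterior computation; weights and lengthscales are fixed by assumption:

```python
import numpy as np

def additive_rbf(X1, X2, lengthscale=1.0, weights=None):
    """Sum of per-dimension RBF kernels: one additive component per input."""
    d = X1.shape[1]
    w = np.ones(d) if weights is None else weights
    K = np.zeros((X1.shape[0], X2.shape[0]))
    for j in range(d):
        sq = (X1[:, j:j + 1] - X2[:, j]) ** 2
        K += w[j] * np.exp(-sq / (2.0 * lengthscale ** 2))
    return K

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(80, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.standard_normal(80)

noise = 0.05
K = additive_rbf(X, X)
alpha = np.linalg.solve(K + noise * np.eye(len(X)), y)  # GP posterior mean weights

Xtest = rng.uniform(-2, 2, size=(5, 5))
pred = additive_rbf(Xtest, X) @ alpha
print(pred)
```

Shrinking a component's weight `w[j]` to zero removes that variable from the fit, which is the lever the paper's shrinkage scheme operates on.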
Process Discovery using Inductive Miner and Decomposition
Title | Process Discovery using Inductive Miner and Decomposition |
Authors | Raji Ghawi |
Abstract | This report presents a submission to the Process Discovery Contest. The contest is dedicated to the assessment of tools and techniques that discover business process models from event logs. The objective is to compare the efficiency of techniques to discover process models that provide a proper balance between “overfitting” and “underfitting”. In the context of the Process Discovery Contest, process discovery is turned into a classification task with a training set and a test set, where a process model needs to decide whether traces are fitting or not. In this report, we first show how we use two discovery techniques, namely Inductive Miner and Decomposition, to discover process models from the training set using the ProM tool. Second, we show how we use replay results to 1) check the rediscoverability of the models and 2) classify unseen traces (in the test logs) as fitting or not. Then, we discuss the classification results on the validation logs, the complexity of the discovered models, and their impact on the selection of models for submission. The report ends with pictures of the submitted process models. |
Tasks | |
Published | 2016-10-25 |
URL | http://arxiv.org/abs/1610.07989v1 |
PDF | http://arxiv.org/pdf/1610.07989v1.pdf |
PWC | https://paperswithcode.com/paper/process-discovery-using-inductive-miner-and |
Repo | |
Framework | |
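For readers who want to reproduce the general workflow outside ProM, here is a hedged sketch using pm4py's simplified Python interface (not the ProM toolchain the report actually used); `training_log.xes` and `test_log.xes` are hypothetical file names, and the function and result-key names assume a recent pm4py release:

```python
import pm4py

train_log = pm4py.read_xes("training_log.xes")  # hypothetical paths
test_log = pm4py.read_xes("test_log.xes")

# Inductive Miner: discover a Petri net from the training traces.
net, im, fm = pm4py.discover_petri_net_inductive(train_log)

# Token-based replay on the test traces; a trace is classified as
# "fitting" when it replays perfectly.
replay = pm4py.conformance_diagnostics_token_based_replay(test_log, net, im, fm)
labels = ["fitting" if r["trace_is_fit"] else "non-fitting" for r in replay]
print(labels[:10])
```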
Adapting ELM to Time Series Classification: A Novel Diversified Top-k Shapelets Extraction Method
Title | Adapting ELM to Time Series Classification: A Novel Diversified Top-k Shapelets Extraction Method |
Authors | Qiuyan Yan, Qifa Sun, Xinming Yan |
Abstract | ELM (Extreme Learning Machine) is a single-hidden-layer feed-forward network in which the weights between the input and hidden layers are initialized randomly. ELM is efficient because it computes the weights between the hidden and output layers analytically. However, ELM still fails to output a semantic classification outcome. To address this limitation, in this paper we propose a diversified top-k shapelets transform framework, where shapelets are the subsequences that best represent and interpret each class. The most challenging problems, as we identify them, are how to extract the best k shapelets from the original candidate set and how to determine the k value automatically. Specifically, we first define similar shapelets and diversified top-k shapelets to construct a diversity shapelets graph. Then, a novel diversity-graph-based top-k shapelets extraction algorithm, named DivTopkshapelets, is proposed to search for the top-k diversified shapelets. Finally, we propose a shapelets-transformed ELM algorithm, named DivShapELM, which automatically determines the k value and is further utilized for time series classification. Experimental results on public data sets demonstrate that the proposed approach significantly outperforms the traditional ELM algorithm in terms of effectiveness and efficiency. |
Tasks | Time Series, Time Series Classification |
Published | 2016-06-20 |
URL | http://arxiv.org/abs/1606.05934v1 |
PDF | http://arxiv.org/pdf/1606.05934v1.pdf |
PWC | https://paperswithcode.com/paper/adapting-elm-to-time-series-classification-a |
Repo | |
Framework | |
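Two of the building blocks above are easy to sketch: the subsequence distance used to score shapelet candidates, and a greedy diversified top-k selection that stands in for the paper's diversity-graph algorithm. The scores below are random placeholders for the information gain a real implementation would compute:

```python
import numpy as np

def subsequence_distance(shapelet, series):
    """Min Euclidean distance from a shapelet to any window of the series."""
    m = len(shapelet)
    return min(np.linalg.norm(series[i:i + m] - shapelet)
               for i in range(len(series) - m + 1))

def diversified_top_k(candidates, scores, k, sim_threshold):
    """Greedily pick high-scoring shapelets, skipping near-duplicates."""
    order = np.argsort(scores)[::-1]
    chosen = []
    for idx in order:
        cand = candidates[idx]
        if all(np.linalg.norm(cand - c) > sim_threshold for c in chosen):
            chosen.append(cand)
        if len(chosen) == k:
            break
    return chosen

rng = np.random.default_rng(1)
series = rng.standard_normal(200)
candidates = [series[i:i + 20].copy() for i in range(0, 180, 5)]
print(f"distance of first candidate to its own series: "
      f"{subsequence_distance(candidates[0], series):.3f}")

scores = rng.random(len(candidates))  # stand-in for information gain
top = diversified_top_k(candidates, scores, k=5, sim_threshold=2.0)
print(len(top), "diversified shapelets selected")
```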
Unsupervised Learning For Effective User Engagement on Social Media
Title | Unsupervised Learning For Effective User Engagement on Social Media |
Authors | Thai Pham, Camelia Simoiu |
Abstract | In this paper, we investigate the effectiveness of unsupervised feature learning techniques in predicting user engagement on social media. Specifically, we compare two methods for predicting the number of feedbacks (i.e., comments) that a blog post is likely to receive. We compare Principal Component Analysis (PCA) and a sparse Autoencoder against a baseline method in which the data are only centered and scaled, on each of two models: Linear Regression and Regression Tree. We find that unsupervised learning techniques significantly improve the prediction accuracy of both models. For the Linear Regression model, the sparse Autoencoder achieves the best result, improving the root mean squared error (RMSE) on the test set by 42% over the baseline method. For the Regression Tree model, PCA achieves the best result, improving RMSE by 15% over the baseline. |
Tasks | |
Published | 2016-11-11 |
URL | http://arxiv.org/abs/1611.03894v1 |
PDF | http://arxiv.org/pdf/1611.03894v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-learning-for-effective-user |
Repo | |
Framework | |
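The experimental setup is straightforward to mirror. A minimal sketch with synthetic stand-in data, comparing the centered-and-scaled baseline against a PCA pipeline under linear regression, scored by test RMSE; the sparse-autoencoder variant would slot into the same pipeline:

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=50, noise=10.0, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

baseline = make_pipeline(StandardScaler(), LinearRegression())
pca_model = make_pipeline(StandardScaler(), PCA(n_components=20), LinearRegression())

for name, model in [("baseline", baseline), ("PCA", pca_model)]:
    model.fit(Xtr, ytr)
    rmse = mean_squared_error(yte, model.predict(Xte)) ** 0.5
    print(f"{name:>8s} RMSE: {rmse:.2f}")
```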
Predicting online extremism, content adopters, and interaction reciprocity
Title | Predicting online extremism, content adopters, and interaction reciprocity |
Authors | Emilio Ferrara, Wen-Qiang Wang, Onur Varol, Alessandro Flammini, Aram Galstyan |
Abstract | We present a machine learning framework that leverages a mixture of metadata, network, and temporal features to detect extremist users, and predict content adopters and interaction reciprocity in social media. We exploit a unique dataset containing millions of tweets generated by more than 25 thousand users who have been manually identified, reported, and suspended by Twitter due to their involvement with extremist campaigns. We also leverage millions of tweets generated by a random sample of 25 thousand regular users who were exposed to, or consumed, extremist content. We carry out three forecasting tasks: (i) detecting extremist users, (ii) estimating whether regular users will adopt extremist content, and (iii) predicting whether users will reciprocate contacts initiated by extremists. All forecasting tasks are set up in two scenarios: a post hoc (time-independent) prediction task on aggregated data, and a simulated real-time prediction task. The performance of our framework is extremely promising, yielding up to 93% AUC for extremist user detection, up to 80% AUC for content adoption prediction, and up to 72% AUC for interaction reciprocity forecasting across the different scenarios. We conclude with a thorough feature analysis that helps determine which emerging signals provide predictive power in the different scenarios. |
Tasks | |
Published | 2016-05-02 |
URL | http://arxiv.org/abs/1605.00659v1 |
PDF | http://arxiv.org/pdf/1605.00659v1.pdf |
PWC | https://paperswithcode.com/paper/predicting-online-extremism-content-adopters |
Repo | |
Framework | |
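As a hedged illustration of how one of the three tasks (extremist-user detection) could be set up as plain supervised learning: the features and labels below are synthetic stand-ins for the paper's metadata/network/temporal features, and the random forest is my choice of classifier, not necessarily the paper's; AUC is the metric the abstract reports:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_users = 2000
X = rng.standard_normal((n_users, 20))  # stand-in user-level features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n_users) > 1).astype(int)

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
auc = roc_auc_score(yte, clf.predict_proba(Xte)[:, 1])
print(f"extremist-user detection AUC: {auc:.2f}")
```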
A new correlation clustering method for cancer mutation analysis
Title | A new correlation clustering method for cancer mutation analysis |
Authors | Jack P. Hou, Amin Emad, Gregory J. Puleo, Jian Ma, Olgica Milenkovic |
Abstract | Cancer genomes exhibit a large number of different alterations that affect many genes in a diverse manner. It is widely believed that these alterations follow combinatorial patterns that have a strong connection with the underlying molecular interaction networks and functional pathways. A better understanding of the generative mechanisms behind the mutation rules and their influence on gene communities is of great importance for driver mutation discovery and for the identification of network modules related to cancer development and progression. We developed a new method for cancer mutation pattern analysis based on a constrained form of correlation clustering. Correlation clustering is an agnostic learning method that can be used for general community detection problems in which the number of communities or their structure is not known beforehand. The resulting algorithm, named $C^3$, leverages mutual exclusivity of mutations, patient coverage, and driver network concentration principles; it accepts as its input a user-determined combination of heterogeneous patient data, such as that available from TCGA (including mutation, copy number, and gene expression information), and creates a large number of clusters containing mutually exclusive mutated genes in a particular type of cancer. The cluster sizes may be required to obey some useful soft size constraints, without impacting the computational complexity of the algorithm. To test $C^3$, we performed a detailed analysis on TCGA breast cancer and glioblastoma data and showed that our algorithm outperforms the state-of-the-art CoMEt method in terms of discovering mutually exclusive gene modules and identifying driver genes. Our $C^3$ method represents a unique tool for efficient and reliable identification of mutation patterns and driver pathways in large-scale cancer genomics studies. |
Tasks | Community Detection |
Published | 2016-01-25 |
URL | http://arxiv.org/abs/1601.06476v1 |
PDF | http://arxiv.org/pdf/1601.06476v1.pdf |
PWC | https://paperswithcode.com/paper/a-new-correlation-clustering-method-for |
Repo | |
Framework | |
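The correlation-clustering core can be illustrated with the classic randomized pivot (KwikCluster) algorithm over a signed similarity matrix, where a positive entry records "same cluster" evidence (e.g., mutual exclusivity of mutations) and a negative entry records the opposite. This is a simplification that omits $C^3$'s soft size constraints and multi-omic weighting:

```python
import numpy as np

def pivot_correlation_clustering(S, rng):
    """KwikCluster: S[i, j] > 0 means i and j prefer the same cluster."""
    remaining = list(range(S.shape[0]))
    clusters = []
    while remaining:
        pivot = remaining[rng.integers(len(remaining))]
        # The pivot grabs every remaining node with positive similarity to it.
        cluster = [j for j in remaining if j == pivot or S[pivot, j] > 0]
        clusters.append(cluster)
        remaining = [j for j in remaining if j not in cluster]
    return clusters

rng = np.random.default_rng(0)
n = 12
S = -np.ones((n, n))
for block in (range(0, 4), range(4, 8), range(8, 12)):  # planted gene modules
    for i in block:
        for j in block:
            S[i, j] = 1.0
print(pivot_correlation_clustering(S, rng))
```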
Early Detection of Combustion Instabilities using Deep Convolutional Selective Autoencoders on Hi-speed Flame Video
Title | Early Detection of Combustion Instabilities using Deep Convolutional Selective Autoencoders on Hi-speed Flame Video |
Authors | Adedotun Akintayo, Kin Gwn Lore, Soumalya Sarkar, Soumik Sarkar |
Abstract | This paper proposes an end-to-end convolutional selective autoencoder approach for early detection of combustion instabilities using rapidly arriving flame image frames. The instabilities arising in combustion processes cause significant deterioration and safety issues in various human-engineered systems such as land- and air-based gas turbine engines. These instabilities are characterized by self-sustaining, large-amplitude pressure oscillations and periodic, coherent vortex structure shedding at varying spatial scales. However, such instability is extremely difficult to detect before a combustion process becomes completely unstable due to its sudden (bifurcation-type) nature. In this context, an autoencoder is trained to selectively mask stable flame frames and pass through unstable flame image frames. In that process, the model learns to identify and extract rich descriptive and explanatory flame shape features. With such a training scheme, the selective autoencoder is shown to be able to detect subtle instability features as a combustion process transitions from a stable to an unstable regime. As a consequence, the deep learning tool-chain can serve as an early detection framework for combustion instabilities that will have a transformative impact on the safety and performance of modern engines. |
Tasks | |
Published | 2016-03-25 |
URL | http://arxiv.org/abs/1603.07839v1 |
PDF | http://arxiv.org/pdf/1603.07839v1.pdf |
PWC | https://paperswithcode.com/paper/early-detection-of-combustion-instabilities |
Repo | |
Framework | |
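The "selective" training scheme reduces to a choice of regression target. A minimal PyTorch sketch under stated assumptions — the architecture, sizes, and synthetic frames are illustrative, not the paper's: unstable frames are reconstructed as-is while stable frames are mapped to an all-zero mask, so only instability structure survives reconstruction:

```python
import torch
import torch.nn as nn

class SelectiveAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = SelectiveAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

frames = torch.rand(16, 1, 64, 64)             # stand-in flame frames
unstable = torch.randint(0, 2, (16,)).bool()   # stand-in frame labels
targets = frames * unstable.view(-1, 1, 1, 1)  # mask stable frames to zero

for _ in range(3):                             # a few illustrative steps
    opt.zero_grad()
    loss = loss_fn(model(frames), targets)
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.4f}")
```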
Local Discriminant Hyperalignment for multi-subject fMRI data alignment
Title | Local Discriminant Hyperalignment for multi-subject fMRI data alignment |
Authors | Muhammad Yousefnezhad, Daoqiang Zhang |
Abstract | Multivariate Pattern (MVP) classification can map different cognitive states to brain tasks. One of the main challenges in MVP analysis is validating the generated results across subjects. Analyzing multi-subject fMRI data requires accurate functional alignment between the neuronal activities of different subjects, which can greatly improve the performance and robustness of the final results. Hyperalignment (HA) is one of the most effective functional alignment methods and can be mathematically formulated via Canonical Correlation Analysis (CCA). Since HA mostly uses unsupervised CCA techniques, its solution may not be optimal for MVP analysis. By incorporating the idea of Local Discriminant Analysis (LDA) into CCA, this paper proposes Local Discriminant Hyperalignment (LDHA), a novel supervised HA method that provides better functional alignment for MVP analysis. The locality is defined based on the stimulus categories in the training set: the correlation between all stimuli in the same category is maximized, while the correlation between distinct categories of stimuli is pushed toward zero. Experimental studies on multi-subject MVP analysis confirm that the LDHA method achieves superior performance compared to other state-of-the-art HA algorithms. |
Tasks | Multi-Subject Fmri Data Alignment |
Published | 2016-11-25 |
URL | http://arxiv.org/abs/1611.08366v1 |
PDF | http://arxiv.org/pdf/1611.08366v1.pdf |
PWC | https://paperswithcode.com/paper/local-discriminant-hyperalignment-for-multi |
Repo | |
Framework | |
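The unsupervised CCA step underlying hyperalignment is easy to demonstrate; the supervised, category-aware locality that defines LDHA is not reproduced in this sketch, and the two "subjects" are synthetic views of a shared latent signal:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
shared = rng.standard_normal((100, 10))  # latent stimulus signal
subj1 = shared @ rng.standard_normal((10, 40)) + 0.1 * rng.standard_normal((100, 40))
subj2 = shared @ rng.standard_normal((10, 40)) + 0.1 * rng.standard_normal((100, 40))

# Project both subjects into a shared space where corresponding
# time points are maximally correlated.
cca = CCA(n_components=10)
z1, z2 = cca.fit_transform(subj1, subj2)

corrs = [np.corrcoef(z1[:, k], z2[:, k])[0, 1] for k in range(10)]
print(np.round(corrs, 2))
```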
Segmentation Rectification for Video Cutout via One-Class Structured Learning
Title | Segmentation Rectification for Video Cutout via One-Class Structured Learning |
Authors | Junyan Wang, Sai-kit Yeung, Jue Wang, Kun Zhou |
Abstract | Recent works on interactive video object cutout mainly focus on designing dynamic foreground-background (FB) classifiers for segmentation propagation. However, research on optimally removing errors from the FB classification is sparse, and the errors often accumulate rapidly, causing significant errors in the propagated frames. In this work, we take the initial steps toward addressing this problem, and we call this new task *segmentation rectification*. Our key observation is that the possibly asymmetrically distributed false positive and false negative errors are handled equally in conventional methods. We instead propose to optimally remove these two types of errors separately. To this end, we propose a novel bilayer Markov Random Field (MRF) model for this new task. We also adopt the well-established structured learning framework to learn the optimal model from data. Additionally, we propose a novel one-class structured SVM (OSSVM) that greatly speeds up the structured learning process. Our method naturally extends to RGB-D videos as well. Comprehensive experiments on both RGB and RGB-D data demonstrate that our simple and effective method significantly outperforms the segmentation propagation methods adopted in state-of-the-art video cutout systems, and the results also suggest the potential usefulness of our method in image cutout systems. |
Tasks | |
Published | 2016-02-16 |
URL | http://arxiv.org/abs/1602.04906v1 |
PDF | http://arxiv.org/pdf/1602.04906v1.pdf |
PWC | https://paperswithcode.com/paper/segmentation-rectification-for-video-cutout |
Repo | |
Framework | |
Fast Graph-Based Object Segmentation for RGB-D Images
Title | Fast Graph-Based Object Segmentation for RGB-D Images |
Authors | Giorgio Toscana, Stefano Rosa |
Abstract | Object segmentation is an important capability for robotic systems, in particular for grasping. We present a graph-based approach for the segmentation of simple objects from RGB-D images. We are interested in segmenting objects with a large variety in appearance, from textureless to strongly textured, for the task of robotic grasping. The algorithm does not rely on image features or machine learning. We propose a modified Canny edge detector that extracts robust edges by using depth information, and two simple cost functions for combining color and depth cues. The cost functions are used to build an undirected graph, which is partitioned using the concept of internal and external differences between graph regions. The partitioning is fast, with O(N log N) complexity. We also discuss ways to deal with missing depth information. We test the approach on different publicly available RGB-D object datasets, such as the Rutgers APC RGB-D dataset and the RGB-D Object Dataset, and compare the results with other existing methods. |
Tasks | Robotic Grasping, Semantic Segmentation |
Published | 2016-05-12 |
URL | http://arxiv.org/abs/1605.03746v1 |
PDF | http://arxiv.org/pdf/1605.03746v1.pdf |
PWC | https://paperswithcode.com/paper/fast-graph-based-object-segmentation-for-rgb |
Repo | |
Framework | |
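The partitioning step can be sketched compactly: 4-neighbor edges weighted by a simple combination of color and depth differences, merged Felzenszwalb-style by comparing each edge against the components' internal differences plus a size-dependent tolerance k/|C|. The paper's modified Canny stage and exact cost functions are simplified away, and the weights alpha/beta are assumptions:

```python
import numpy as np

def segment_rgbd(color, depth, k=2.0, alpha=1.0, beta=1.0):
    h, w = depth.shape
    idx = np.arange(h * w).reshape(h, w)
    edges = []
    for s0, s1 in [((slice(None), slice(None, -1)), (slice(None), slice(1, None))),   # horizontal
                   ((slice(None, -1), slice(None)), (slice(1, None), slice(None)))]:  # vertical
        dc = np.linalg.norm(color[s0] - color[s1], axis=-1)   # color cue
        dd = np.abs(depth[s0] - depth[s1])                    # depth cue
        wgt = alpha * dc + beta * dd
        edges += list(zip(wgt.ravel(), idx[s0].ravel(), idx[s1].ravel()))
    edges.sort(key=lambda e: e[0])                            # O(N log N) step

    parent = list(range(h * w))
    size = [1] * (h * w)
    internal = [0.0] * (h * w)  # max internal edge weight per component

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for wgt, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Merge when the external difference (this edge) is no larger than
        # either internal difference plus the size-dependent tolerance.
        if wgt <= min(internal[ra] + k / size[ra], internal[rb] + k / size[rb]):
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = max(internal[ra], internal[rb], wgt)
    return np.array([find(i) for i in range(h * w)]).reshape(h, w)

rng = np.random.default_rng(0)
color = np.where(np.arange(40)[None, :, None] < 20, 0.2, 0.8) + 0.02 * rng.random((40, 40, 3))
depth = np.where(np.arange(40)[None, :] < 20, 1.0, 2.0) + 0.01 * rng.random((40, 40))
labels = segment_rgbd(color, depth, k=2.0)
print(len(np.unique(labels)), "segments")
```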
Restoring STM images via Sparse Coding: noise and artifact removal
Title | Restoring STM images via Sparse Coding: noise and artifact removal |
Authors | João P. Oliveira, Ana Bragança, José Bioucas-Dias, Mário Figueiredo, Luís Alcácer, Jorge Morgado, Quirina Ferreira |
Abstract | In this article, we present a denoising algorithm to improve the interpretation and quality of scanning tunneling microscopy (STM) images. Given the high level of self-similarity of STM images, we propose a denoising algorithm that reformulates the estimation problem as a sparse regression, often termed sparse coding. We introduce modifications to the algorithm to cope with the existence of artifacts, mainly dropouts, which appear in a structured way as consecutive line segments along the scanning direction. The resulting algorithm treats the artifacts as missing data, and the estimated values outperform those of algorithms that substitute the outliers via local filtering. We provide code implementations for both Matlab and Gwyddion. |
Tasks | Denoising |
Published | 2016-10-11 |
URL | http://arxiv.org/abs/1610.03437v1 |
PDF | http://arxiv.org/pdf/1610.03437v1.pdf |
PWC | https://paperswithcode.com/paper/restoring-stm-images-via-sparse-coding-noise |
Repo | |
Framework | |
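A hedged sketch of patch-based sparse-coding denoising with scikit-learn's dictionary learning, standing in for the authors' Matlab/Gwyddion implementations; the paper's structured treatment of dropout lines as missing data is not reproduced, the image is synthetic, and the parameter names assume scikit-learn ≥ 1.1:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d

rng = np.random.default_rng(0)
clean = np.outer(np.sin(np.linspace(0, 6, 64)), np.cos(np.linspace(0, 6, 64)))
noisy = clean + 0.2 * rng.standard_normal(clean.shape)  # stand-in STM image

# Sparse-code overlapping 8x8 patches against a learned dictionary.
patches = extract_patches_2d(noisy, (8, 8))
flat = patches.reshape(len(patches), -1)
mean = flat.mean(axis=1, keepdims=True)
flat = flat - mean

dico = MiniBatchDictionaryLearning(n_components=50, alpha=1.0, max_iter=50,
                                   random_state=0)
codes = dico.fit_transform(flat)                         # sparse codes
recon = (codes @ dico.components_ + mean).reshape(patches.shape)
denoised = reconstruct_from_patches_2d(recon, noisy.shape)
print(f"noisy MSE: {np.mean((noisy - clean) ** 2):.4f}, "
      f"denoised MSE: {np.mean((denoised - clean) ** 2):.4f}")
```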
Network-Efficient Distributed Word2vec Training System for Large Vocabularies
Title | Network-Efficient Distributed Word2vec Training System for Large Vocabularies |
Authors | Erik Ordentlich, Lee Yang, Andy Feng, Peter Cnudde, Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, Gavin Owens |
Abstract | Word2vec is a popular family of algorithms for unsupervised training of dense vector representations of words on large text corpora. The resulting vectors have been shown to capture semantic relationships among their corresponding words, and have shown promise in reducing a number of natural language processing (NLP) tasks to mathematical operations on these vectors. While applications of word2vec have heretofore centered on vocabularies with a few million words, where the vocabulary is the set of words for which vectors are simultaneously trained, novel applications are emerging in areas outside of NLP with vocabularies comprising several hundred million words. Existing word2vec training systems are impractical for training such large vocabularies, as they either require that the vectors of all vocabulary words be stored in the memory of a single server or suffer unacceptable training latency due to massive network data transfer. In this paper, we present a novel distributed, parallel training system that enables unprecedented practical training of vectors for vocabularies with several hundred million words on a shared cluster of commodity servers, using far less network traffic than existing solutions. We evaluate the proposed system on a benchmark dataset, showing that the quality of vectors does not degrade relative to non-distributed training. Finally, the system has been deployed for several quarters to match queries to ads in Gemini, the sponsored search advertising platform at Yahoo, resulting in significant improvement of business metrics. |
Tasks | |
Published | 2016-06-27 |
URL | http://arxiv.org/abs/1606.08495v1 |
PDF | http://arxiv.org/pdf/1606.08495v1.pdf |
PWC | https://paperswithcode.com/paper/network-efficient-distributed-word2vec |
Repo | |
Framework | |
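The network saving in such systems comes from partitioning vectors by dimension rather than by word. A toy sketch of that idea under my own simplifying assumptions (shard count and sizes are illustrative): each "shard" holds a disjoint slice of every vector, computes its partial dot product locally, and only scalars are aggregated, so full vectors never cross the network:

```python
import numpy as np

dim, n_shards, vocab = 300, 4, 1000
rng = np.random.default_rng(0)
vectors = rng.standard_normal((vocab, dim))

# Column-wise partition: shard s owns one slice of dimensions of every vector.
slices = np.array_split(np.arange(dim), n_shards)
shards = [vectors[:, s] for s in slices]

def distributed_dot(u, v):
    """Dot product of words u and v as a sum of per-shard partial sums."""
    return sum(float(shard[u] @ shard[v]) for shard in shards)

u, v = 3, 7
assert np.isclose(distributed_dot(u, v), vectors[u] @ vectors[v])
print(distributed_dot(u, v))
```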
Flexible Models for Microclustering with Application to Entity Resolution
Title | Flexible Models for Microclustering with Application to Entity Resolution |
Authors | Giacomo Zanella, Brenda Betancourt, Hanna Wallach, Jeffrey Miller, Abbas Zaidi, Rebecca C. Steorts |
Abstract | Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman–Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some applications, this assumption is inappropriate. For example, when performing entity resolution, the size of each cluster should be unrelated to the size of the data set, and each cluster should contain a negligible fraction of the total number of data points. These applications require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new class of models that can exhibit this property. We compare models within this class to two commonly used clustering models using four entity-resolution data sets. |
Tasks | Entity Resolution |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1610.09780v1 |
PDF | http://arxiv.org/pdf/1610.09780v1.pdf |
PWC | https://paperswithcode.com/paper/flexible-models-for-microclustering-with |
Repo | |
Framework | |
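The microclustering property is easiest to appreciate against its opposite. A small simulation of the Chinese restaurant process (the Dirichlet process's exchangeable clustering scheme) shows the largest cluster claiming a roughly constant fraction of the data — i.e., linear growth — which is exactly the behavior the paper's models are designed to avoid for entity resolution:

```python
import numpy as np

def crp_cluster_sizes(n, alpha, rng):
    """Sequentially seat n points: join a cluster w.p. prop. to its size,
    or open a new cluster w.p. prop. to alpha."""
    sizes = []
    for _ in range(n):
        probs = np.array(sizes + [alpha], dtype=float)
        choice = rng.choice(len(probs), p=probs / probs.sum())
        if choice == len(sizes):
            sizes.append(1)
        else:
            sizes[choice] += 1
    return sizes

rng = np.random.default_rng(0)
for n in (1_000, 10_000, 100_000):
    sizes = crp_cluster_sizes(n, alpha=1.0, rng=rng)
    print(f"n={n:>7d}  largest cluster / n = {max(sizes) / n:.3f}")
```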