Paper Group ANR 161
A Novel Transfer Learning Approach upon Hindi, Arabic, and Bangla Numerals using Convolutional Neural Networks
Title | A Novel Transfer Learning Approach upon Hindi, Arabic, and Bangla Numerals using Convolutional Neural Networks |
Authors | Abdul Kawsar Tushar, Akm Ashiquzzaman, Afia Afrin, Md. Rashedul Islam |
Abstract | Increased accuracy in predictive models for handwritten character recognition will open up new frontiers for optical character recognition. The major drawbacks of predictive machine learning models are the long training time required by some models and the requirement that training and test data lie in the same feature space and follow the same distribution. In this study, these obstacles are minimized by presenting a model for transferring knowledge from one task to another. This model is presented for the recognition of handwritten numerals in Indic languages. The model utilizes convolutional neural networks with backpropagation for error reduction and dropout to prevent overfitting. The output performance of the proposed neural network is shown to closely match other state-of-the-art methods while using only a fraction of their training time. |
Tasks | Optical Character Recognition, Transfer Learning |
Published | 2017-07-26 |
URL | http://arxiv.org/abs/1707.08385v1 |
PDF | http://arxiv.org/pdf/1707.08385v1.pdf |
PWC | https://paperswithcode.com/paper/a-novel-transfer-learning-approach-upon-hindi |
Repo | |
Framework | |
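As a rough illustration of the transfer-learning recipe in the abstract above (train a CNN on one numeral script, then freeze the convolutional base and retrain only the classifier head on another script), here is a minimal tf.keras sketch. The layer sizes, epoch counts, and data variables (x_hindi, x_bangla, and their labels) are illustrative assumptions, not the paper's exact architecture.

```python
# A minimal transfer-learning sketch; architecture and hyperparameters are
# illustrative assumptions, not the paper's exact configuration.
import tensorflow as tf

def build_cnn(num_classes: int) -> tf.keras.Model:
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),  # dropout against overfitting
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

# Phase 1: train on the source script (e.g. Hindi numerals).
model = build_cnn(num_classes=10)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_hindi, y_hindi, epochs=10)  # data loading omitted

# Phase 2: freeze the convolutional base and retrain only the dense head
# on the target script (e.g. Bangla numerals); far less training is needed.
for layer in model.layers[:-2]:
    layer.trainable = False
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])  # recompile so the freeze takes effect
# model.fit(x_bangla, y_bangla, epochs=3)
```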
HPSLPred: An Ensemble Multi-label Classifier for Human Protein Subcellular Location Prediction with Imbalanced Source
Title | HPSLPred: An Ensemble Multi-label Classifier for Human Protein Subcellular Location Prediction with Imbalanced Source |
Authors | Shixiang Wan, Quan Zou |
Abstract | Predicting the subcellular localization of proteins is an important and challenging problem. Traditional experimental approaches are often expensive and time-consuming. Consequently, a growing number of research efforts employ a series of machine learning approaches to predict the subcellular location of proteins. There are two main challenges among the state-of-the-art prediction methods. First, most of the existing techniques are designed to deal with multi-class rather than multi-label classification, which ignores connections between multiple labels. In reality, the multiple locations of particular proteins carry vital and unique biological significance that deserves special focus and cannot be ignored. Second, techniques for handling imbalanced data in multi-label classification problems are necessary, but have not previously been employed. To solve these two issues, we have developed an ensemble multi-label classifier called HPSLPred, which can be applied for multi-label classification with an imbalanced protein source. For convenience, a user-friendly webserver has been established at http://server.malab.cn/HPSLPred. |
Tasks | Multi-Label Classification |
Published | 2017-04-18 |
URL | http://arxiv.org/abs/1704.05204v1 |
PDF | http://arxiv.org/pdf/1704.05204v1.pdf |
PWC | https://paperswithcode.com/paper/hpslpred-an-ensemble-multi-label-classifier |
Repo | |
Framework | |
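The core mechanism described above (heterogeneous base classifiers voting on per-label probabilities, with class weighting to counter imbalance) can be sketched with scikit-learn. The base learners, weighting scheme, and decision threshold below are illustrative assumptions, not HPSLPred's exact configuration.

```python
# Sketch of an ensemble multi-label classifier: average per-label
# probabilities across heterogeneous base models, then threshold.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

def ensemble_predict(X_train, Y_train, X_test, threshold=0.5):
    """Y_train: binary indicator matrix of shape (n_samples, n_labels)."""
    bases = [
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
        OneVsRestClassifier(RandomForestClassifier(
            n_estimators=200, class_weight="balanced")),  # counter imbalance
    ]
    probs = []
    for model in bases:
        model.fit(X_train, Y_train)
        probs.append(model.predict_proba(X_test))  # (n_samples, n_labels)
    avg = np.mean(probs, axis=0)
    return (avg >= threshold).astype(int)  # independent decision per label
```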
Dynamic Clustering Algorithms via Small-Variance Analysis of Markov Chain Mixture Models
Title | Dynamic Clustering Algorithms via Small-Variance Analysis of Markov Chain Mixture Models |
Authors | Trevor Campbell, Brian Kulis, Jonathan How |
Abstract | Bayesian nonparametrics are a class of probabilistic models in which the model size is inferred from data. A recently developed methodology in this field is small-variance asymptotic analysis, a mathematical technique for deriving learning algorithms that capture much of the flexibility of Bayesian nonparametric inference algorithms, but are simpler to implement and less computationally expensive. Past work on small-variance analysis of Bayesian nonparametric inference algorithms has exclusively considered batch models trained on a single, static dataset, which are incapable of capturing time evolution in the latent structure of the data. This work presents a small-variance analysis of the maximum a posteriori filtering problem for a temporally varying mixture model with a Markov dependence structure, which captures temporally evolving clusters within a dataset. Two clustering algorithms result from the analysis: D-Means, an iterative clustering algorithm for linearly separable, spherical clusters; and SD-Means, a spectral clustering algorithm derived from a kernelized, relaxed version of the clustering problem. Empirical results from experiments demonstrate the advantages of using D-Means and SD-Means over contemporary clustering algorithms, in terms of both computational cost and clustering accuracy. |
Tasks | |
Published | 2017-07-26 |
URL | http://arxiv.org/abs/1707.08493v1 |
PDF | http://arxiv.org/pdf/1707.08493v1.pdf |
PWC | https://paperswithcode.com/paper/dynamic-clustering-algorithms-via-small |
Repo | |
Framework | |
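The flavor of the resulting algorithms can be conveyed by the static core of D-Means, a DP-means-style pass in which a point farther than a penalty radius from every existing center spawns a new cluster. The full D-Means adds birth, death, and transition penalties for clusters across time steps; this minimal numpy sketch omits them.

```python
# DP-means-style sketch: the small-variance limit turns the Bayesian
# nonparametric prior into a hard penalty `lam` on opening a new cluster.
import numpy as np

def dp_means(X, lam, n_iter=20):
    centers = [X[0].copy()]
    assign = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest center, or a new one beyond the penalty.
        for i, x in enumerate(X):
            d2 = [np.sum((x - c) ** 2) for c in centers]
            k = int(np.argmin(d2))
            if d2[k] > lam:
                centers.append(x.copy())
                k = len(centers) - 1
            assign[i] = k
        # Update step: each nonempty cluster's center becomes its mean.
        centers = [X[assign == k].mean(axis=0) if np.any(assign == k)
                   else centers[k] for k in range(len(centers))]
    return np.array(centers), assign
```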
Minimally Naturalistic Artificial Intelligence
Title | Minimally Naturalistic Artificial Intelligence |
Authors | Steven Stenberg Hansen |
Abstract | The rapid advancement of machine learning techniques has re-energized research into general artificial intelligence. While the idea of domain-agnostic meta-learning is appealing, this emerging field must come to terms with its relationship to human cognition and the statistics and structure of the tasks humans perform. The position of this article is that only by aligning our agents’ abilities and environments with those of humans do we stand a chance at developing general artificial intelligence (GAI). A broad reading of the famous ‘No Free Lunch’ theorem is that there is no universally optimal inductive bias or, equivalently, that bias-free learning is impossible. This follows from the fact that there are an infinite number of ways to extrapolate data, any of which might be the one used by the data-generating environment; an inductive bias prefers some of these extrapolations to others, which lowers performance in environments using these adversarial extrapolations. We may posit that the optimal GAI is the one that maximally exploits the statistics of its environment to create its inductive bias, accepting the fact that this agent is guaranteed to be extremely sub-optimal for some alternative environments. This trade-off appears benign when thinking about the environment as being the physical universe, as performance on any fictive universe is obviously irrelevant. But we should expect a sharper inductive bias if we further constrain our environment. Indeed, we implicitly do so by defining GAI in terms of accomplishing tasks that humans consider useful. One common version of this is the need for ‘common-sense reasoning’, which implicitly appeals to the statistics of the physical universe as perceived by humans. |
Tasks | Common Sense Reasoning, Meta-Learning |
Published | 2017-01-14 |
URL | http://arxiv.org/abs/1701.03868v1 |
PDF | http://arxiv.org/pdf/1701.03868v1.pdf |
PWC | https://paperswithcode.com/paper/minimally-naturalistic-artificial |
Repo | |
Framework | |
Semi-supervised Embedding in Attributed Networks with Outliers
Title | Semi-supervised Embedding in Attributed Networks with Outliers |
Authors | Jiongqian Liang, Peter Jacobs, Jiankai Sun, Srinivasan Parthasarathy |
Abstract | In this paper, we propose a novel framework, called Semi-supervised Embedding in Attributed Networks with Outliers (SEANO), to learn a low-dimensional vector representation that systematically captures the topological proximity, attribute affinity and label similarity of vertices in a partially labeled attributed network (PLAN). Our method is designed to work in both transductive and inductive settings while explicitly alleviating noise effects from outliers. Experimental results on various datasets drawn from the web, text and image domains demonstrate the advantages of SEANO over state-of-the-art methods in semi-supervised classification under transductive as well as inductive settings. We also show that a subset of parameters in SEANO is interpretable as an outlier score and can significantly outperform baseline methods when applied to detecting network outliers. Finally, we present the use of SEANO in a challenging real-world setting – flood mapping of satellite images – and show that it is able to outperform modern remote sensing algorithms for this task. |
Tasks | |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.08100v4 |
PDF | http://arxiv.org/pdf/1703.08100v4.pdf |
PWC | https://paperswithcode.com/paper/semi-supervised-embedding-in-attributed |
Repo | |
Framework | |
The Scaling Limit of High-Dimensional Online Independent Component Analysis
Title | The Scaling Limit of High-Dimensional Online Independent Component Analysis |
Authors | Chuang Wang, Yue M. Lu |
Abstract | We analyze the dynamics of an online algorithm for independent component analysis in the high-dimensional scaling limit. As the ambient dimension tends to infinity, and with proper time scaling, we show that the time-varying joint empirical measure of the target feature vector and the estimates provided by the algorithm will converge weakly to a deterministic measure-valued process that can be characterized as the unique solution of a nonlinear PDE. Numerical solutions of this PDE, which involves two spatial variables and one time variable, can be efficiently obtained. These solutions provide detailed information about the performance of the ICA algorithm, as many practical performance metrics are functionals of the joint empirical measures. Numerical simulations show that our asymptotic analysis is accurate even for moderate dimensions. In addition to providing a tool for understanding the performance of the algorithm, our PDE analysis also provides useful insight. In particular, in the high-dimensional limit, the original coupled dynamics associated with the algorithm will be asymptotically “decoupled”, with each coordinate independently solving a 1-D effective minimization problem via stochastic gradient descent. Exploiting this insight to design new algorithms for achieving optimal trade-offs between computational and statistical efficiency may prove an interesting line of future research. |
Tasks | |
Published | 2017-10-15 |
URL | http://arxiv.org/abs/1710.05384v2 |
PDF | http://arxiv.org/pdf/1710.05384v2.pdf |
PWC | https://paperswithcode.com/paper/the-scaling-limit-of-high-dimensional-online |
Repo | |
Framework | |
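The algorithm under analysis processes one sample per step with a stochastic-gradient update on the feature estimate; in the high-dimensional limit, each coordinate then behaves like an independent 1-D SGD problem, as the abstract notes. A textbook-style sketch of such an online update follows; the contrast nonlinearity and step size are generic choices, not necessarily those analyzed in the paper.

```python
# Online ICA sketch: projection-pursuit-style stochastic gradient ascent
# on the unit sphere, one streaming observation at a time.
import numpy as np

def online_ica(stream, dim, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)
    w /= np.linalg.norm(w)
    for x in stream:                 # x: one observation of shape (dim,)
        g = np.tanh(w @ x)           # contrast nonlinearity
        w += lr * g * x              # stochastic gradient step
        w /= np.linalg.norm(w)       # project back to the unit sphere
    return w
```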
Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces
Title | Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces |
Authors | Stefan Klus, Ingmar Schuster, Krikamol Muandet |
Abstract | Transfer operators such as the Perron–Frobenius or Koopman operator play an important role in the global analysis of complex dynamical systems. The eigenfunctions of these operators can be used to detect metastable sets, to project the dynamics onto the dominant slow processes, or to separate superimposed signals. We extend transfer operator theory to reproducing kernel Hilbert spaces and show that these operators are related to Hilbert space representations of conditional distributions, known as conditional mean embeddings in the machine learning community. Moreover, numerical methods to compute empirical estimates of these embeddings are akin to data-driven methods for the approximation of transfer operators such as extended dynamic mode decomposition and its variants. One main benefit of the presented kernel-based approaches is that these methods can be applied to any domain where a similarity measure given by a kernel is available. We illustrate the results with the aid of guiding examples and highlight potential applications in molecular dynamics as well as video and text data analysis. |
Tasks | |
Published | 2017-12-05 |
URL | https://arxiv.org/abs/1712.01572v3 |
PDF | https://arxiv.org/pdf/1712.01572v3.pdf |
PWC | https://paperswithcode.com/paper/eigendecompositions-of-transfer-operators-in |
Repo | |
Framework | |
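The empirical estimators discussed above are close to kernel EDMD: from snapshot pairs (x_i, y_i), with y_i the image of x_i under the dynamics, build Gram matrices and eigendecompose a regularized ratio. A minimal numpy sketch, with the kernel and the regularization as illustrative assumptions:

```python
# Kernel-EDMD-style sketch of an empirical Koopman eigendecomposition.
import numpy as np

def rbf_gram(A, B, sigma=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def kernel_koopman_eig(X, Y, sigma=1.0, reg=1e-6):
    """X, Y: (n, dim) snapshot pairs with Y[i] = dynamics(X[i])."""
    G_xx = rbf_gram(X, X, sigma)     # Gram matrix of the inputs
    G_xy = rbf_gram(X, Y, sigma)     # cross-Gram with the images
    n = len(X)
    # Empirical operator matrix in the subspace spanned by the data.
    K = np.linalg.solve(G_xx + reg * n * np.eye(n), G_xy)
    vals, vecs = np.linalg.eig(K)
    order = np.argsort(-np.abs(vals))  # dominant (slow) modes first
    return vals[order], vecs[:, order]
```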
Automatic Discovery, Association Estimation and Learning of Semantic Attributes for a Thousand Categories
Title | Automatic Discovery, Association Estimation and Learning of Semantic Attributes for a Thousand Categories |
Authors | Ziad Al-Halah, Rainer Stiefelhagen |
Abstract | Attribute-based recognition models, due to their impressive performance and their ability to generalize well on novel categories, have been widely adopted for many computer vision applications. However, usually both the attribute vocabulary and the class-attribute associations have to be provided manually by domain experts or a large number of annotators. This is very costly and not necessarily optimal for recognition performance; most importantly, it limits the applicability of attribute-based models to large-scale datasets. To tackle this problem, we propose an end-to-end unsupervised attribute learning approach. We utilize online text corpora to automatically discover a salient and discriminative vocabulary that correlates well with the human concept of semantic attributes. Moreover, we propose a deep convolutional model to optimize class-attribute associations with a linguistic prior that accounts for noise and missing data in text. In a thorough evaluation on ImageNet, we demonstrate that our model is able to efficiently discover and learn semantic attributes at a large scale. Furthermore, we demonstrate that our model outperforms the state-of-the-art in zero-shot learning on three datasets: ImageNet, Animals with Attributes and aPascal/aYahoo. Finally, we enable attribute-based learning on ImageNet and will share the attributes and associations for future research. |
Tasks | Zero-Shot Learning |
Published | 2017-04-12 |
URL | http://arxiv.org/abs/1704.03607v1 |
PDF | http://arxiv.org/pdf/1704.03607v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-discovery-association-estimation |
Repo | |
Framework | |
Really? Well. Apparently Bootstrapping Improves the Performance of Sarcasm and Nastiness Classifiers for Online Dialogue
Title | Really? Well. Apparently Bootstrapping Improves the Performance of Sarcasm and Nastiness Classifiers for Online Dialogue |
Authors | Stephanie Lukin, Marilyn Walker |
Abstract | More and more of the information on the web is dialogic, from Facebook newsfeeds, to forum conversations, to comment threads on news articles. In contrast to traditional, monologic Natural Language Processing resources such as news, highly social dialogue is frequent in social media, making it a challenging context for NLP. This paper tests a bootstrapping method, originally proposed in a monologic domain, to train classifiers to identify two different types of subjective language in dialogue: sarcasm and nastiness. We explore two methods of developing linguistic indicators to be used in a first level classifier aimed at maximizing precision at the expense of recall. The best performing classifier for the first phase achieves 54% precision and 38% recall for sarcastic utterances. We then use general syntactic patterns from previous work to create more general sarcasm indicators, improving precision to 62% and recall to 52%. To further test the generality of the method, we then apply it to bootstrapping a classifier for nastiness dialogic acts. Our first phase, using crowdsourced nasty indicators, achieves 58% precision and 49% recall, which increases to 75% precision and 62% recall when we bootstrap over the first level with generalized syntactic patterns. |
Tasks | |
Published | 2017-08-29 |
URL | http://arxiv.org/abs/1708.08572v1 |
PDF | http://arxiv.org/pdf/1708.08572v1.pdf |
PWC | https://paperswithcode.com/paper/really-well-apparently-bootstrapping-improves |
Repo | |
Framework | |
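The two-phase scheme in the abstract (a precision-oriented indicator classifier whose output labels then train a broader statistical classifier) can be sketched as follows. The cue matching, the features, and the treatment of non-matching utterances as negatives are deliberate simplifications of the paper's pipeline.

```python
# Bootstrapping sketch: seed labels from high-precision cues, then
# generalize with a statistical classifier over n-gram features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def bootstrap(indicators, unlabeled):
    # Phase 1: label utterances that contain a high-precision cue.
    texts = list(unlabeled)
    labels = [1 if any(ind in t.lower() for ind in indicators) else 0
              for t in texts]
    # Phase 2: generalize beyond the literal cues.
    vec = TfidfVectorizer(ngram_range=(1, 2))
    clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(texts), labels)
    return vec, clf

# Usage: keep only confident predictions on new dialogue turns.
# vec, clf = bootstrap(["oh really", "how original"], corpus)
# sarcastic = clf.predict_proba(vec.transform(new_turns))[:, 1] >= 0.9
```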
DeepKSPD: Learning Kernel-matrix-based SPD Representation for Fine-grained Image Recognition
Title | DeepKSPD: Learning Kernel-matrix-based SPD Representation for Fine-grained Image Recognition |
Authors | Melih Engin, Lei Wang, Luping Zhou, Xinwang Liu |
Abstract | Being symmetric positive-definite (SPD), the covariance matrix has traditionally been used to represent a set of local descriptors in visual recognition. Recent studies show that a kernel matrix can give a considerably better representation by modelling the nonlinearity in the local descriptor set. Nevertheless, neither the descriptors nor the kernel matrix is deeply learned. Worse, they are considered separately, hindering the pursuit of an optimal SPD representation. This work proposes a deep network that jointly learns local descriptors, the kernel-matrix-based SPD representation, and the classifier via an end-to-end training process. We derive the derivatives for the mapping from a local descriptor set to the SPD representation to carry out backpropagation. Also, we exploit the Daleckii-Krein formula in operator theory to give a concise and unified result on differentiating SPD matrix functions, including the matrix logarithm used to handle the Riemannian geometry of the kernel matrix. Experiments not only show the superiority of the kernel-matrix-based SPD representation with deep local descriptors, but also verify the advantage of the proposed deep network in pursuing better SPD representations for fine-grained image recognition tasks. |
Tasks | Fine-Grained Image Recognition |
Published | 2017-11-11 |
URL | http://arxiv.org/abs/1711.04047v1 |
PDF | http://arxiv.org/pdf/1711.04047v1.pdf |
PWC | https://paperswithcode.com/paper/deepkspd-learning-kernel-matrix-based-spd |
Repo | |
Framework | |
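The forward computation behind the representation (a kernel matrix over feature channels, mapped through the matrix logarithm to respect SPD geometry) can be sketched in numpy. The paper's contribution is learning this end to end, backpropagating through the matrix function via the Daleckii-Krein formula; the sketch below shows only a static forward map under assumed kernel and parameters.

```python
# Kernel-matrix SPD representation sketch: RBF kernel between feature
# channels (dim x dim, mirroring the covariance matrix), then log-Euclidean
# mapping via eigendecomposition.
import numpy as np

def kspd_representation(D, sigma=1.0, eps=1e-6):
    """D: (n_descriptors, dim) local descriptors of one image."""
    C = D.T                                   # one row per feature channel
    d2 = np.sum(C**2, 1)[:, None] + np.sum(C**2, 1)[None, :] - 2 * C @ C.T
    K = np.exp(-d2 / (2 * sigma**2))          # (dim, dim) SPD kernel matrix
    vals, vecs = np.linalg.eigh(K + eps * np.eye(len(K)))
    log_K = (vecs * np.log(vals)) @ vecs.T    # matrix logarithm
    iu = np.triu_indices(len(K))
    return log_K[iu]                          # vectorized upper triangle
```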
Word Sense Disambiguation with LSTM: Do We Really Need 100 Billion Words?
Title | Word Sense Disambiguation with LSTM: Do We Really Need 100 Billion Words? |
Authors | Minh Le, Marten Postma, Jacopo Urbani |
Abstract | Recently, Yuan et al. (2016) have shown the effectiveness of using Long Short-Term Memory (LSTM) for performing Word Sense Disambiguation (WSD). Their proposed technique outperformed the previous state-of-the-art on several benchmarks, but neither the training data nor the source code was released. This paper presents the results of a reproduction study of this technique using only openly available datasets (GigaWord, SemCor, OMSTI) and software (TensorFlow). From these experiments, it emerged that state-of-the-art results can be obtained with much less data than hinted at by Yuan et al. All code and trained models are made freely available. |
Tasks | Word Sense Disambiguation |
Published | 2017-12-09 |
URL | http://arxiv.org/abs/1712.03376v2 |
PDF | http://arxiv.org/pdf/1712.03376v2.pdf |
PWC | https://paperswithcode.com/paper/word-sense-disambiguation-with-lstm-do-we |
Repo | |
Framework | |
Addressing Cross-Lingual Word Sense Disambiguation on Low-Density Languages: Application to Persian
Title | Addressing Cross-Lingual Word Sense Disambiguation on Low-Density Languages: Application to Persian |
Authors | Navid Rekabsaz, Mihai Lupu, Allan Hanbury, Andres Duque |
Abstract | We explore the use of unsupervised methods in Cross-Lingual Word Sense Disambiguation (CL-WSD) with the application of English to Persian. Our proposed approach targets the languages with scarce resources (low-density) by exploiting word embedding and semantic similarity of the words in context. We evaluate the approach on a recent evaluation benchmark and compare it with the state-of-the-art unsupervised system (CO-Graph). The results show that our approach outperforms both the standard baseline and the CO-Graph system in both of the task evaluation metrics (Out-Of-Five and Best result). |
Tasks | Semantic Similarity, Semantic Textual Similarity, Word Sense Disambiguation |
Published | 2017-11-16 |
URL | http://arxiv.org/abs/1711.06196v3 |
PDF | http://arxiv.org/pdf/1711.06196v3.pdf |
PWC | https://paperswithcode.com/paper/addressing-cross-lingual-word-sense |
Repo | |
Framework | |
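The unsupervised scoring idea (rank candidate Persian translations of an ambiguous English word by embedding similarity to the source context) is easy to sketch. The shared cross-lingual embedding space and the mean-similarity aggregation below are assumptions for illustration, not the paper's exact formulation.

```python
# CL-WSD sketch: score each candidate translation by its average cosine
# similarity to the context words, assuming embeddings in a shared space.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def rank_translations(context_words, candidates, emb_en, emb_fa):
    """emb_en / emb_fa: dicts mapping words to vectors in one shared space."""
    ctx = [emb_en[w] for w in context_words if w in emb_en]
    scores = {c: np.mean([cosine(emb_fa[c], v) for v in ctx])
              for c in candidates if c in emb_fa}
    # Best translation first, matching the Out-Of-Five / Best evaluation.
    return sorted(scores, key=scores.get, reverse=True)
```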
Deep Adaptive Feature Embedding with Local Sample Distributions for Person Re-identification
Title | Deep Adaptive Feature Embedding with Local Sample Distributions for Person Re-identification |
Authors | Lin Wu, Yang Wang, Junbin Gao, Xue Li |
Abstract | Person re-identification (re-id) aims to match pedestrians observed by disjoint camera views. It attracts increasing attention in computer vision due to its importance to surveillance systems. To combat the major challenge of cross-view visual variations, deep embedding approaches have been proposed to learn a compact feature space from images such that Euclidean distances correspond to a cross-view similarity metric. However, a global Euclidean distance cannot faithfully characterize the ideal similarity in a complex visual feature space, because features of pedestrian images exhibit unknown distributions due to large variations in poses, illumination and occlusion. Moreover, intra-personal training samples within a local range can robustly guide deep embedding against uncontrolled variations, but such local structure cannot be captured by a global Euclidean distance. In this paper, we study the problem of person re-id by proposing a novel sampling strategy to mine suitable positives (i.e., intra-class samples) within a local range to improve the deep embedding in the context of large intra-class variations. Our method is capable of learning a deep similarity metric adaptive to local sample structure by minimizing each sample’s local distances while propagating through the relationship between samples to attain the whole intra-class minimization. To this end, a novel objective function is proposed to jointly optimize similarity metric learning, local positive mining and robust deep embedding. This yields local discriminations by selecting local-ranged positive samples, and the learned features are robust to dramatic intra-class variations. Experiments on benchmarks show state-of-the-art results achieved by our method. |
Tasks | Metric Learning, Person Re-Identification |
Published | 2017-06-10 |
URL | http://arxiv.org/abs/1706.03160v2 |
PDF | http://arxiv.org/pdf/1706.03160v2.pdf |
PWC | https://paperswithcode.com/paper/deep-adaptive-feature-embedding-with-local |
Repo | |
Framework | |
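The local positive mining idea (penalize only each anchor's k nearest same-identity samples instead of pulling in all positives) can be written as a simple batch loss. The margin, k, and loss form below are illustrative assumptions, not the paper's exact objective.

```python
# Sketch of a local-positive contrastive loss over one embedded batch.
import numpy as np

def local_positive_loss(emb, labels, k=3, margin=1.0):
    n = len(emb)
    d = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)  # pairwise dists
    loss = 0.0
    for i in range(n):
        pos = np.where((labels == labels[i]) & (np.arange(n) != i))[0]
        neg = np.where(labels != labels[i])[0]
        if len(pos) == 0 or len(neg) == 0:
            continue
        local = pos[np.argsort(d[i, pos])[:k]]       # k nearest intra-class
        loss += d[i, local].mean()                   # pull local positives in
        loss += max(0.0, margin - d[i, neg].min())   # push nearest negative out
    return loss / n
```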
A Continuously Growing Dataset of Sentential Paraphrases
Title | A Continuously Growing Dataset of Sentential Paraphrases |
Authors | Wuwei Lan, Siyu Qiu, Hua He, Wei Xu |
Abstract | A major challenge in paraphrase research is the lack of parallel corpora. In this paper, we present a new method to collect large-scale sentential paraphrases from Twitter by linking tweets through shared URLs. The main advantage of our method is its simplicity: it dispenses with the classifier or human in the loop that previous work needed to select data before annotation, and with the subsequent application of paraphrase identification algorithms. We present the largest human-labeled paraphrase corpus to date of 51,524 sentence pairs and the first cross-domain benchmarking for automatic paraphrase identification. In addition, we show that more than 30,000 new sentential paraphrases can be easily and continuously captured every month at ~70% precision, and demonstrate their utility for downstream NLP tasks through phrasal paraphrase extraction. We make our code and data freely available. |
Tasks | Paraphrase Identification |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00391v1 |
PDF | http://arxiv.org/pdf/1708.00391v1.pdf |
PWC | https://paperswithcode.com/paper/a-continuously-growing-dataset-of-sentential |
Repo | |
Framework | |
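The collection step itself is simple: group tweets by the URL they share and pair the sentences within each group. A minimal sketch, with field names as illustrative assumptions:

```python
# Group tweets by shared URL; every within-group pair is a candidate
# sentential paraphrase (to be filtered by annotation downstream).
from collections import defaultdict
from itertools import combinations

def candidate_paraphrases(tweets):
    """tweets: iterable of dicts like {"text": ..., "url": ...}."""
    by_url = defaultdict(list)
    for t in tweets:
        by_url[t["url"]].append(t["text"])
    pairs = []
    for texts in by_url.values():
        unique = list(dict.fromkeys(texts))  # drop exact-duplicate retweets
        pairs.extend(combinations(unique, 2))
    return pairs
```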
Generative Mixture of Networks
Title | Generative Mixture of Networks |
Authors | Ershad Banijamali, Ali Ghodsi, Pascal Poupart |
Abstract | A generative model based on training deep architectures is proposed. The model consists of K networks that are trained together to learn the underlying distribution of a given data set. The process starts by dividing the input data into K clusters and feeding each of them into a separate network. After a few iterations of training the networks separately, we use an EM-like algorithm to train the networks together and update the clusters of the data. We call this model Mixture of Networks. The model is a platform that can be used with any deep structure and trained by any conventional objective function for distribution modeling. As the components of the model are neural networks, it is highly capable of characterizing complicated data distributions as well as clustering data. We apply the algorithm to the MNIST hand-written digits and Yale face datasets. We also demonstrate the clustering ability of the model using some real-world and toy examples. |
Tasks | |
Published | 2017-02-10 |
URL | http://arxiv.org/abs/1702.03307v1 |
PDF | http://arxiv.org/pdf/1702.03307v1.pdf |
PWC | https://paperswithcode.com/paper/generative-mixture-of-networks |
Repo | |
Framework | |
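The EM-like schedule described above alternates between training each network on its current cluster and reassigning every sample to the network that explains it best. A sketch of that loop follows; `make_network` is a stand-in for any generative component exposing `fit(X)` and per-sample `log_likelihood(X)`, an interface assumed here for illustration.

```python
# EM-like training loop for a mixture of K generative networks.
import numpy as np
from sklearn.cluster import KMeans

def train_mixture(X, make_network, K, n_rounds=10):
    # Initialization: partition the data into K clusters, one per network.
    assign = KMeans(n_clusters=K, n_init=10).fit_predict(X)
    nets = [make_network() for _ in range(K)]
    for _ in range(n_rounds):
        # M-step: train each network on the samples currently assigned to it.
        for k in range(K):
            if np.any(assign == k):
                nets[k].fit(X[assign == k])
        # E-step: reassign each sample to the best-explaining network.
        scores = np.stack([net.log_likelihood(X) for net in nets], axis=1)
        assign = np.argmax(scores, axis=1)
    return nets, assign
```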