Paper Group ANR 308
The observer-assisted method for adjusting hyper-parameters in deep learning algorithms. Training LDCRF model on unsegmented sequences using Connectionist Temporal Classification. Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce. Document Clustering Games in Static and Dynamic Scenarios …
The observer-assisted method for adjusting hyper-parameters in deep learning algorithms
Title | The observer-assisted method for adjusting hyper-parameters in deep learning algorithms |
Authors | Maciej Wielgosz |
Abstract | This paper presents a concept of a novel method for adjusting hyper-parameters in Deep Learning (DL) algorithms. An external agent-observer monitors the performance of a selected Deep Learning algorithm. The observer learns to model the DL algorithm using a series of random experiments. Consequently, it may be used to predict the response of the DL algorithm, in terms of a selected quality measure, to a given set of hyper-parameters. This makes it possible to construct an ensemble composed of a series of evaluators which constitute an observer-assisted architecture. The architecture may be used to gradually iterate toward the best achievable quality score in tiny steps governed by a unit of progress. The algorithm is stopped when the maximum number of steps is reached or no further progress is made. |
Tasks | |
Published | 2016-11-30 |
URL | http://arxiv.org/abs/1611.10328v1 |
http://arxiv.org/pdf/1611.10328v1.pdf | |
PWC | https://paperswithcode.com/paper/the-observer-assisted-method-for-adjusting |
Repo | |
Framework | |
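The loop described above can be sketched in a few lines: run random experiments against a quality function, fit a simple surrogate "observer" to the responses, then climb the surrogate in tiny steps. The quadratic `quality` stand-in and the nearest-neighbour surrogate are illustrative assumptions, not the paper's design.

```python
import random

random.seed(0)

def quality(lr):
    # Hypothetical stand-in for the DL algorithm's quality measure,
    # peaked at lr = 0.3; the paper treats this response as a black box.
    return -(lr - 0.3) ** 2

# Observer: model the algorithm from a series of random experiments.
observations = [(h, quality(h)) for h in
                (random.uniform(0.0, 1.0) for _ in range(50))]

def predict(lr):
    # Simple 1-nearest-neighbour surrogate of the observed responses.
    return min(observations, key=lambda o: abs(o[0] - lr))[1]

# Iterate toward the best achievable score in tiny steps (the "unit of
# progress"), stopping at the step budget or when no progress is made.
step, lr, best = 0.02, 0.5, float("-inf")
for _ in range(200):
    lr = max((lr - step, lr, lr + step), key=predict)
    if predict(lr) <= best:
        break
    best = predict(lr)
```

A real observer would be a learned regression model over many hyper-parameters rather than a 1-D nearest-neighbour lookup.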
Training LDCRF model on unsegmented sequences using Connectionist Temporal Classification
Title | Training LDCRF model on unsegmented sequences using Connectionist Temporal Classification |
Authors | Amir Ahooye Atashin, Kamaledin Ghiasi-Shirazi, Ahad Harati |
Abstract | Many machine learning problems such as speech recognition, gesture recognition, and handwriting recognition are concerned with simultaneous segmentation and labeling of sequence data. Latent-dynamic conditional random field (LDCRF) is a well-known discriminative method that has been successfully used for this task. However, LDCRF can only be trained with pre-segmented data sequences in which the label of each frame is available a priori. In the realm of neural networks, the invention of connectionist temporal classification (CTC) made it possible to train recurrent neural networks on unsegmented sequences with great success. In this paper, we use CTC to train an LDCRF model on unsegmented sequences. Experimental results on two gesture recognition tasks show that the proposed method outperforms LDCRFs, hidden Markov models, and conditional random fields. |
Tasks | Gesture Recognition, Speech Recognition |
Published | 2016-06-26 |
URL | http://arxiv.org/abs/1606.08051v3 |
http://arxiv.org/pdf/1606.08051v3.pdf | |
PWC | https://paperswithcode.com/paper/training-ldcrf-model-on-unsegmented-sequences |
Repo | |
Framework | |
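The CTC objective at the heart of the paper can be illustrated with the standard forward recursion over a blank-extended label sequence. The tiny two-frame example is illustrative and not tied to the LDCRF specifics.

```python
BLANK = 0

def ctc_prob(y, labels):
    """P(labels | y) under CTC, where y[t][k] is the per-frame
    probability of symbol k and BLANK separates repeated labels."""
    ext = [BLANK]
    for l in labels:
        ext += [l, BLANK]              # blank-extended sequence l'
    T, S = len(y), len(ext)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = y[0][BLANK]
    if S > 1:
        alpha[0][1] = y[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]
            if s >= 1:
                a += alpha[t - 1][s - 1]
            # The skip transition is allowed unless the current symbol
            # is a blank or repeats the symbol two positions back.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * y[t][ext[s]]
    return alpha[-1][-1] + (alpha[-1][-2] if S > 1 else 0.0)

# Two frames, uniform over {blank, 'a'}: the frame-level paths
# "aa", "a-", "-a" all collapse to the label sequence "a".
p = ctc_prob([[0.5, 0.5], [0.5, 0.5]], [1])
```

Three of the four equally likely paths collapse to "a", so `p` comes out to 0.75.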
Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce
Title | Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce |
Authors | Tom Zahavy, Alessandro Magnani, Abhinandan Krishnan, Shie Mannor |
Abstract | Classifying products into categories precisely and efficiently is a major challenge in modern e-commerce. The high traffic of new products uploaded daily and the dynamic nature of the categories raise the need for machine learning models that can reduce the cost and time of human editors. In this paper, we propose a decision-level fusion approach for multi-modal product classification using text and image inputs. We train input-specific state-of-the-art deep neural networks for each input source, show the potential of forging them together into a multi-modal architecture, and train a novel policy network that learns to choose between them. Finally, we demonstrate that our multi-modal network improves the top-1 accuracy over both networks on a real-world large-scale product classification dataset that we collected from Walmart.com. While we focus on the image-text fusion that characterizes e-commerce domains, our algorithms can be easily applied to other modalities such as audio, video, physical sensors, etc. |
Tasks | |
Published | 2016-11-29 |
URL | http://arxiv.org/abs/1611.09534v1 |
http://arxiv.org/pdf/1611.09534v1.pdf | |
PWC | https://paperswithcode.com/paper/is-a-picture-worth-a-thousand-words-a-deep |
Repo | |
Framework | |
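Decision-level fusion of the kind described can be sketched as follows; the policy here is a stub confidence score rather than the learned policy network of the paper.

```python
def fuse(text_probs, image_probs, policy_score):
    """Pick one input-specific network's prediction per example.
    policy_score in [0, 1] is the (learned) confidence that the
    text network is the better expert for this example."""
    chosen = text_probs if policy_score >= 0.5 else image_probs
    return max(range(len(chosen)), key=chosen.__getitem__)

# Text net is confident about class 1, image net about class 0;
# the policy trusts the text net for this example.
label = fuse([0.1, 0.9], [0.8, 0.2], policy_score=0.7)
```

The point of decision-level (rather than feature-level) fusion is that each expert stays independently trainable, and the policy only has to learn which one to believe.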
Document Clustering Games in Static and Dynamic Scenarios
Title | Document Clustering Games in Static and Dynamic Scenarios |
Authors | Rocco Tripodi, Marcello Pelillo |
Abstract | In this work we propose a game-theoretic model for document clustering. Each document to be clustered is represented as a player and each cluster as a strategy. Each player receives a reward from interacting with other players, and tries to maximize it by choosing its best strategies. The geometry of the data is modeled with a weighted graph that encodes the pairwise similarity among documents, so that similar players are constrained to choose similar strategies, updating their strategy preferences at each iteration of the games. We used different approaches to find the prototypical elements of the clusters, and with this information we divided the players into two disjoint sets: one collecting players with a definite strategy, and the other collecting players that try to learn the correct strategy to play from the others. The latter set of players can be considered as new data points that have to be clustered according to previous information. This representation is useful in scenarios in which the data are streamed continuously. The evaluation of the system was conducted on 13 document datasets using different settings. It shows that the proposed method performs well compared to different document clustering algorithms. |
Tasks | |
Published | 2016-07-08 |
URL | http://arxiv.org/abs/1607.02436v1 |
http://arxiv.org/pdf/1607.02436v1.pdf | |
PWC | https://paperswithcode.com/paper/document-clustering-games-in-static-and |
Repo | |
Framework | |
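The game dynamics can be sketched with discrete-time replicator dynamics on the similarity graph: each player's payoff for a cluster is the similarity-weighted support its neighbours give that cluster. A minimal sketch under these assumptions:

```python
def replicator_clustering(W, X, iters=50):
    """W: pairwise document similarities; X[i]: player i's mixed
    strategy (a distribution over clusters). Each round, every
    strategy's share grows in proportion to its payoff."""
    n, k = len(X), len(X[0])
    for _ in range(iters):
        new_X = []
        for i in range(n):
            payoff = [sum(W[i][j] * X[j][c] for j in range(n))
                      for c in range(k)]
            avg = sum(X[i][c] * payoff[c] for c in range(k))
            new_X.append([X[i][c] * payoff[c] / avg for c in range(k)])
        X = new_X
    return X

# Two similar pairs of documents; a slight initial bias breaks the tie.
W = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
X = replicator_clustering(W, [[0.6, 0.4], [0.5, 0.5],
                              [0.4, 0.6], [0.5, 0.5]])
```

After the dynamics converge, similar documents hold (near-)pure strategies for the same cluster, which is the clustering read off from the game.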
Automatic Identification of Scenedesmus Polymorphic Microalgae from Microscopic Images
Title | Automatic Identification of Scenedesmus Polymorphic Microalgae from Microscopic Images |
Authors | Jhony-Heriberto Giraldo-Zuluaga, Geman Diez, Alexander Gomez, Tatiana Martinez, Mariana Peñuela Vasquez, Jesus Francisco Vargas Bonilla, Augusto Salazar |
Abstract | Microalgae counting is used to measure biomass quantity. Usually, it is performed manually using a Neubauer chamber and expert criterion, with the risk of a high error rate. This paper addresses the methodology for automatic identification of Scenedesmus microalgae (used in methane production and the food industry) and applies it to images captured by a digital microscope. The use of contrast adaptive histogram equalization for pre-processing and active contours for segmentation is presented. The calculation of statistical features (Histogram of Oriented Gradients, Hu and Zernike moments) together with texture features (Haralick and Local Binary Patterns descriptors) is proposed for algae characterization. Scenedesmus algae can build coenobia consisting of 1, 2, 4, and 8 cells. The number of algae in each coenobium helps to determine the amount of lipids, proteins, and other substances in a given sample of an algae crop. Knowledge of the quantity of those elements improves the quality of bioprocess applications. Classification of coenobia achieves accuracies of 98.63% and 97.32% with a Support Vector Machine (SVM) and an Artificial Neural Network (ANN), respectively. According to the results, it is possible to consider the proposed methodology as an alternative to the traditional technique for algae counting. The database used in this paper is publicly available for download. |
Tasks | |
Published | 2016-12-21 |
URL | http://arxiv.org/abs/1612.07379v2 |
http://arxiv.org/pdf/1612.07379v2.pdf | |
PWC | https://paperswithcode.com/paper/automatic-identification-of-scenedesmus |
Repo | |
Framework | |
Robust Ensemble Classifier Combination Based on Noise Removal with One-Class SVM
Title | Robust Ensemble Classifier Combination Based on Noise Removal with One-Class SVM |
Authors | Ferhat Özgür Çatak |
Abstract | In machine learning, as the number of labeled input samples becomes very large, it is difficult to build a classification model because the input data set does not fit in memory during the training phase of the algorithm; it is therefore necessary to partition the data in order to handle the overall data set. Bagging- and boosting-based data partitioning methods have been used broadly in data mining and pattern recognition. Both of these methods have shown great potential for improving classification model performance. This study is concerned with the analysis of data set partitioning with noise removal and its impact on the performance of multiple-classifier models. We propose noise-filtering preprocessing at each data set partition to improve classifier model performance, and we apply a Gini-impurity approach to find the best split percentage for the noise filter ratio. The filtered sub-data sets are then used to train the individual ensemble models. |
Tasks | |
Published | 2016-02-09 |
URL | http://arxiv.org/abs/1602.02888v1 |
http://arxiv.org/pdf/1602.02888v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-ensemble-classifier-combination-based |
Repo | |
Framework | |
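A minimal sketch of the pipeline: partition the data, filter suspected noise per partition, train a model per filtered partition, and combine by majority vote. For self-containment, a simple distance-from-the-class-mean filter stands in for the one-class SVM, and a nearest-class-mean classifier stands in for the base learners; both are assumptions of this sketch, not the paper's components.

```python
def filter_noise(points, labels, keep_ratio=0.8):
    """Drop the points farthest from their class mean
    (a crude stand-in for the one-class-SVM noise filter)."""
    by_class = {}
    for p, l in zip(points, labels):
        by_class.setdefault(l, []).append(p)
    means = {l: sum(v) / len(v) for l, v in by_class.items()}
    ranked = sorted(zip(points, labels),
                    key=lambda pl: abs(pl[0] - means[pl[1]]))
    kept = ranked[:max(1, int(keep_ratio * len(ranked)))]
    return [p for p, _ in kept], [l for _, l in kept]

def train(points, labels):
    """Nearest-class-mean classifier standing in for the base learner."""
    by_class = {}
    for p, l in zip(points, labels):
        by_class.setdefault(l, []).append(p)
    means = {l: sum(v) / len(v) for l, v in by_class.items()}
    return lambda x: min(means, key=lambda l: abs(x - means[l]))

# Two partitions of 1-D data: class 0 near 0.0, class 1 near 1.0,
# each partition polluted by one mislabeled outlier.
partitions = [([0.0, 0.1, 0.9, 1.0, 5.0], [0, 0, 1, 1, 0]),
              ([0.05, 0.15, 0.95, 1.05, -4.0], [0, 0, 1, 1, 1])]
models = [train(*filter_noise(ps, ls)) for ps, ls in partitions]

def ensemble(x):
    votes = [m(x) for m in models]
    return max(set(votes), key=votes.count)
```

The filter removes each partition's outlier before training, so the per-partition models (and hence the vote) are not pulled toward the noise.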
Sub-Sampled Newton Methods II: Local Convergence Rates
Title | Sub-Sampled Newton Methods II: Local Convergence Rates |
Authors | Farbod Roosta-Khorasani, Michael W. Mahoney |
Abstract | Many data-fitting applications require the solution of an optimization problem involving a sum of a large number of functions of a high-dimensional parameter. Here, we consider the problem of minimizing a sum of $n$ functions over a convex constraint set $\mathcal{X} \subseteq \mathbb{R}^{p}$ where both $n$ and $p$ are large. In such problems, sub-sampling as a way to reduce $n$ can offer a great deal of computational efficiency. Within the context of second-order methods, we first give quantitative local convergence results for variants of Newton's method where the Hessian is uniformly sub-sampled. Using random matrix concentration inequalities, one can sub-sample in a way that preserves the curvature information. Using such a sub-sampling strategy, we establish locally Q-linear and Q-superlinear convergence rates. We also give additional convergence results for when the sub-sampled Hessian is regularized by modifying its spectrum or by Levenberg-type regularization. Finally, in addition to Hessian sub-sampling, we consider sub-sampling the gradient as a way to further reduce the computational complexity per iteration. We use approximate matrix multiplication results from randomized numerical linear algebra (RandNLA) to obtain the proper sampling strategy, and we establish locally R-linear convergence rates. In such a setting, we also show that a very aggressive sample size increase results in an R-superlinearly convergent algorithm. While the sample size depends on the condition number of the problem, our convergence rates are problem-independent, i.e., they do not depend on the quantities related to the problem. Hence, our analysis here can be used to complement the results of our basic framework from the companion paper, [38], by exploring algorithmic trade-offs that are important in practice. |
Tasks | |
Published | 2016-01-18 |
URL | http://arxiv.org/abs/1601.04738v3 |
http://arxiv.org/pdf/1601.04738v3.pdf | |
PWC | https://paperswithcode.com/paper/sub-sampled-newton-methods-ii-local |
Repo | |
Framework | |
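The Hessian sub-sampling idea for a sum of $n$ functions can be sketched on a toy 1-D least-squares problem. This is illustrative only; the paper's analysis covers high dimensions, convex constraints, and precise sample-size conditions.

```python
import random

random.seed(1)
n = 200
a = [random.uniform(1.0, 2.0) for _ in range(n)]
b = [2.5 * ai + random.gauss(0.0, 0.1) for ai in a]

# Minimize f(x) = sum_i (a_i * x - b_i)^2.
def grad(x):
    return 2.0 * sum(ai * (ai * x - bi) for ai, bi in zip(a, b))

def subsampled_hessian(sample):
    # Scaled curvature estimate from a uniform sub-sample of the n terms;
    # the full Hessian would be 2 * sum_i a_i^2.
    return 2.0 * (n / len(sample)) * sum(a[i] ** 2 for i in sample)

x = 0.0
for _ in range(30):
    sample = random.sample(range(n), n // 4)   # sub-sample 25% of terms
    x -= grad(x) / subsampled_hessian(sample)  # Newton step, cheap Hessian

x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai ** 2 for ai in a)
```

As long as the sub-sampled curvature stays close to the true curvature (which concentration inequalities guarantee for large enough samples), each step contracts the error and the iterates converge to the least-squares solution.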
Provable learning of Noisy-or Networks
Title | Provable learning of Noisy-or Networks |
Authors | Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski |
Abstract | Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (i.e., the coordinates of a given datapoint) are explained as a probabilistic function of some hidden variables. Finding parameters with the maximum likelihood is NP-hard even in very simple settings. In recent years, provably efficient algorithms were nevertheless developed for models with linear structures: topic models, mixture models, hidden Markov models, etc. These algorithms use matrix or tensor decomposition, and make some reasonable assumptions about the parameters of the underlying model. But matrix or tensor decomposition seems of little use when the latent variable model has nonlinearities. The current paper shows how to make progress: tensor decomposition is applied for learning the single-layer {\em noisy-or} network, which is a textbook example of a Bayes net, and is used, for example, in the classic QMR-DT software for diagnosing which disease(s) a patient may have by observing the symptoms he/she exhibits. The technical novelty here, which should be useful in other settings in the future, is the analysis of tensor decomposition in the presence of systematic error (i.e., where the noise/error is correlated with the signal and does not decrease as the number of samples goes to infinity). This requires rethinking all steps of tensor decomposition methods from the ground up. For simplicity, our analysis is stated assuming that the network parameters were chosen from a probability distribution, but the method seems more generally applicable. |
Tasks | Latent Variable Models, Topic Models |
Published | 2016-12-28 |
URL | http://arxiv.org/abs/1612.08795v1 |
http://arxiv.org/pdf/1612.08795v1.pdf | |
PWC | https://paperswithcode.com/paper/provable-learning-of-noisy-or-networks |
Repo | |
Framework | |
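The single-layer noisy-or model itself is easy to state: each active parent (disease) independently fails to trigger the child (symptom) with probability $1 - w_j$. A minimal sketch of the model, separate from the tensor-decomposition learning procedure the paper develops:

```python
def noisy_or(active_parents, weights, leak=0.0):
    """P(symptom = 1 | set of active diseases) under the noisy-or
    model: the symptom stays off only if the leak and every active
    parent all independently fail to trigger it."""
    p_off = 1.0 - leak
    for j in active_parents:
        p_off *= 1.0 - weights[j]
    return 1.0 - p_off

# Two active diseases, each triggering the symptom with probability 0.5:
# the symptom fires unless both fail, so P = 1 - 0.5 * 0.5 = 0.75.
p = noisy_or({0, 1}, weights={0: 0.5, 1: 0.5})
```

The nonlinearity is visible here: the symptom probability is 1 minus a product over parents, not a linear function of them, which is why linear matrix methods alone do not suffice.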
A Geometrical Approach to Topic Model Estimation
Title | A Geometrical Approach to Topic Model Estimation |
Authors | Zheng Tracy Ke |
Abstract | In probabilistic topic models, the quantity of interest (a low-rank matrix consisting of topic vectors) is hidden in the text corpus matrix, masked by noise, and the Singular Value Decomposition (SVD) is a potentially useful tool for learning such a low-rank matrix. However, the connection between this low-rank matrix and the singular vectors of the text corpus matrix is usually complicated and hard to spell out, so using SVD to learn topic models is challenging. In this paper, we overcome the challenge by revealing a surprising insight: there is a low-dimensional simplex structure which can be viewed as a bridge between the low-rank matrix of interest and the SVD of the text corpus matrix, and which allows us to conveniently reconstruct the former using the latter. This insight motivates a new SVD-based approach to learning topic models, which we analyze with delicate random matrix theory, deriving the rate of convergence. We support our methods and theory numerically, using both simulated data and real data. |
Tasks | Topic Models |
Published | 2016-08-16 |
URL | http://arxiv.org/abs/1608.04478v1 |
http://arxiv.org/pdf/1608.04478v1.pdf | |
PWC | https://paperswithcode.com/paper/a-geometrical-approach-to-topic-model |
Repo | |
Framework | |
Multi Model Data mining approach for Heart failure prediction
Title | Multi Model Data mining approach for Heart failure prediction |
Authors | Priyanka H U, Vivek R |
Abstract | Developing predictive modelling solutions for risk estimation is extremely challenging in health-care informatics. Risk estimation involves integration of heterogeneous clinical sources having different representations from different health-care providers, making the task increasingly complex. Such sources are typically voluminous and diverse, and change significantly over time. Therefore, distributed and parallel computing tools, collectively termed big data tools, are needed to synthesize these sources and assist the physician in making the right clinical decisions. In this work we propose a multi-model predictive architecture, a novel approach for combining the predictive ability of multiple models for better prediction accuracy. We demonstrate the effectiveness and efficiency of the proposed work on data from the Framingham Heart Study. Results show that the proposed multi-model predictive architecture is able to provide better accuracy than the best-model approach. By modelling the errors of the predictive models we are able to choose a subset of models which yields accurate results. More information was modelled into the system by multi-level mining, which has resulted in enhanced predictive accuracy. |
Tasks | |
Published | 2016-09-29 |
URL | http://arxiv.org/abs/1609.09194v1 |
http://arxiv.org/pdf/1609.09194v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-model-data-mining-approach-for-heart |
Repo | |
Framework | |
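The multi-model idea, i.e. score candidate models on held-out data, keep the subset with the lowest error, and combine their votes, can be sketched as follows. The threshold-based toy models and the median error cutoff are assumptions of this sketch, not the paper's pipeline.

```python
def select_and_vote(models, val_x, val_y, x):
    """Keep models whose validation error is not above the median
    error, then predict by majority vote of the kept subset."""
    def error(m):
        return sum(m(v) != y for v, y in zip(val_x, val_y)) / len(val_x)
    errs = sorted(error(m) for m in models)
    median = errs[len(errs) // 2]
    kept = [m for m in models if error(m) <= median]
    votes = [m(x) for m in kept]
    return max(set(votes), key=votes.count)

# Three toy "models": two reasonable thresholds and one that is
# always wrong; error-based selection discards the bad one.
models = [lambda v: int(v > 0.5),
          lambda v: int(v > 0.4),
          lambda v: 1 - int(v > 0.5)]
val_x, val_y = [0.1, 0.3, 0.7, 0.9], [0, 0, 1, 1]
pred = select_and_vote(models, val_x, val_y, 0.8)
```

Selecting by modelled error before voting is what separates this from a plain ensemble: a consistently wrong model never gets a vote.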
Stylometric Analysis of Early Modern Period English Plays
Title | Stylometric Analysis of Early Modern Period English Plays |
Authors | Mark Eisen, Santiago Segarra, Gabriel Egan, Alejandro Ribeiro |
Abstract | Function word adjacency networks (WANs) are used to study the authorship of plays from the Early Modern English period. In these networks, nodes are function words and directed edges between two nodes represent the relative frequency of directed co-appearance of the two words. For every analyzed play, a WAN is constructed and these are aggregated to generate author profile networks. We first study the similarity of writing styles between Early English playwrights by comparing the profile WANs. The accuracy of using WANs for authorship attribution is then demonstrated by attributing known plays among six popular playwrights. Moreover, the WAN method is shown to outperform other frequency-based methods on attributing Early English plays. In addition, WANs are shown to be reliable classifiers even when attributing collaborative plays. For several plays of disputed co-authorship, a deeper analysis is performed by attributing every act and scene separately, in which we both corroborate existing breakdowns and provide evidence of new assignments. |
Tasks | |
Published | 2016-10-18 |
URL | http://arxiv.org/abs/1610.05670v2 |
http://arxiv.org/pdf/1610.05670v2.pdf | |
PWC | https://paperswithcode.com/paper/stylometric-analysis-of-early-modern-period |
Repo | |
Framework | |
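The core data structure, a function word adjacency network, can be sketched with simple windowed co-appearance counts. Raw counts within a fixed window are a simplification of this sketch; the published method normalises into probabilities and discounts by distance.

```python
def build_wan(tokens, function_words, window=2):
    """Directed function-word adjacency counts: edge (u, v) counts
    how often function word v appears within `window` tokens after
    an occurrence of function word u."""
    wan = {}
    for i, u in enumerate(tokens):
        if u not in function_words:
            continue
        for v in tokens[i + 1:i + 1 + window]:
            if v in function_words:
                wan[(u, v)] = wan.get((u, v), 0) + 1
    return wan

wan = build_wan("the cat and the dog and a bird".split(),
                {"the", "and", "a"})
```

Per-play networks built this way are aggregated into author profiles, and attribution compares a disputed play's network against each profile.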
Resource Allocation in a MAC with and without security via Game Theoretic Learning
Title | Resource Allocation in a MAC with and without security via Game Theoretic Learning |
Authors | Shahid Mehraj Shah, Krishna Chaitanya A, Vinod Sharma |
Abstract | In this paper a $K$-user fading multiple access channel (F-MAC), with and without security constraints, is studied. First we consider an F-MAC without the security constraints. Under the assumption of individual CSI at the users, we pose the problem of power allocation as a stochastic game in which the receiver sends an ACK or a NACK depending on whether it was able to decode the message or not. We use the multiplicative-weights no-regret algorithm to obtain a Coarse Correlated Equilibrium (CCE). Then we consider the case where the users can decode each other's ACK/NACK. In this scenario we provide an algorithm to maximize the weighted sum-utility of all the users and obtain a Pareto-optimal point (PP). A PP is socially optimal but may be unfair to individual users. Next we consider the case where the users can cooperate with each other so as to reject a policy that is unfair to an individual user. We then obtain a Nash bargaining solution (NBS), which in addition to being Pareto optimal, is also fair to each user. Next we study a $K$-user fading multiple access wiretap channel with the CSI of Eve available to the users. We use the previous algorithms to obtain a CCE, a PP, and an NBS. Next we consider the case where each user does not know the CSI of Eve but only its distribution. In that case we use secrecy outage as the criterion for the receiver to send an ACK or a NACK. Here also we use the previous algorithms to obtain a CCE, a PP, or an NBS. Finally we show that our algorithms can be extended to the case where a user can transmit at different rates. At the end we provide a few examples to compute the different solutions and compare them under different CSI scenarios. |
Tasks | |
Published | 2016-07-05 |
URL | http://arxiv.org/abs/1607.01346v1 |
http://arxiv.org/pdf/1607.01346v1.pdf | |
PWC | https://paperswithcode.com/paper/resource-allocation-in-a-mac-with-and-without |
Repo | |
Framework | |
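The multiplicative-weights no-regret update used to reach a CCE can be sketched in its simplest, fixed-payoff form; the actual game has stochastic ACK/NACK feedback and one learner per user.

```python
import math

def multiplicative_weights(payoffs, rounds=200, eta=0.2):
    """Maintain one weight per action (e.g. a power level); every
    round, multiply each weight by exp(eta * payoff) and play the
    normalised mixture. The mixture concentrates on high-payoff
    actions, yielding vanishing regret."""
    w = [1.0] * len(payoffs)
    for _ in range(rounds):
        w = [wi * math.exp(eta * p) for wi, p in zip(w, payoffs)]
    total = sum(w)
    return [wi / total for wi in w]

# Three candidate power levels with stationary utilities:
# the mixture concentrates on the best one.
mix = multiplicative_weights([0.1, 0.9, 0.5])
```

When every user runs such a no-regret learner, the empirical distribution of joint play converges to the set of coarse correlated equilibria, which is the equilibrium notion the abstract invokes.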
Stability revisited: new generalisation bounds for the Leave-one-Out
Title | Stability revisited: new generalisation bounds for the Leave-one-Out |
Authors | Alain Celisse, Benjamin Guedj |
Abstract | The present paper provides a new generic strategy leading to non-asymptotic theoretical guarantees on the Leave-one-Out procedure applied to a broad class of learning algorithms. This strategy relies on two main ingredients: the new notion of $L^q$ stability, and the strong use of moment inequalities. $L^q$ stability extends the existing notion of hypothesis stability while remaining weaker than uniform stability. It leads to new PAC exponential generalisation bounds for Leave-one-Out under mild assumptions. In the literature, such bounds are available only for uniformly stable algorithms, under boundedness assumptions for instance. As a first step, our generic strategy is applied to the Ridge regression algorithm. |
Tasks | |
Published | 2016-08-23 |
URL | http://arxiv.org/abs/1608.06412v1 |
http://arxiv.org/pdf/1608.06412v1.pdf | |
PWC | https://paperswithcode.com/paper/stability-revisited-new-generalisation-bounds |
Repo | |
Framework | |
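The Leave-one-Out procedure the bounds apply to, instantiated for the 1-D ridge estimator the paper uses as its first application, can be sketched as:

```python
def loo_risk(xs, ys, lam):
    """Leave-one-Out risk estimate for 1-D ridge regression:
    refit w = <x, y> / (<x, x> + lam) on each (n-1)-point subset
    and score the squared error on the held-out point."""
    n = len(xs)
    errs = []
    for i in range(n):
        xtr = [x for j, x in enumerate(xs) if j != i]
        ytr = [y for j, y in enumerate(ys) if j != i]
        w = (sum(x * y for x, y in zip(xtr, ytr))
             / (sum(x * x for x in xtr) + lam))
        errs.append((ys[i] - w * xs[i]) ** 2)
    return sum(errs) / n

# Perfectly linear data: the LOO risk vanishes without
# regularisation and grows as lam shrinks the estimator.
r0 = loo_risk([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], lam=0.0)
r1 = loo_risk([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], lam=1.0)
```

The paper's $L^q$-stability bounds control how far this LOO estimate can deviate from the true risk, without requiring the uniform stability assumed by earlier bounds.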
Learning rotation invariant convolutional filters for texture classification
Title | Learning rotation invariant convolutional filters for texture classification |
Authors | Diego Marcos, Michele Volpi, Devis Tuia |
Abstract | We present a method for learning discriminative filters using a shallow Convolutional Neural Network (CNN). We encode rotation invariance directly in the model by tying the weights of groups of filters to several rotated versions of the canonical filter in the group. These filters can be used to extract rotation invariant features well-suited for image classification. We test this learning procedure on a texture classification benchmark, where the orientations of the training images differ from those of the test images. We obtain results comparable to the state-of-the-art. Compared to standard shallow CNNs, the proposed method obtains higher classification performance while reducing by an order of magnitude the number of parameters to be learned. |
Tasks | Image Classification, Texture Classification |
Published | 2016-04-22 |
URL | http://arxiv.org/abs/1604.06720v2 |
http://arxiv.org/pdf/1604.06720v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-rotation-invariant-convolutional |
Repo | |
Framework | |
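The weight-tying idea, one canonical filter plus its rotated copies with responses pooled over orientations, can be sketched with 90-degree rotations (the paper also ties finer rotation angles via interpolation):

```python
def rot90(f):
    """Rotate a square filter by 90 degrees."""
    n = len(f)
    return [[f[n - 1 - c][r] for c in range(n)] for r in range(n)]

def oriented_response(patch, filt):
    """Correlate the patch with the canonical filter and its three
    rotated, weight-tied copies, then max-pool over orientations.
    Only the canonical filter's weights are free parameters."""
    rots = [filt]
    for _ in range(3):
        rots.append(rot90(rots[-1]))
    dot = lambda a, b: sum(a[i][j] * b[i][j]
                           for i in range(len(a)) for j in range(len(a)))
    return max(dot(r, patch) for r in rots)

patch, filt = [[1, 2], [3, 4]], [[1, 0], [0, 0]]
r = oriented_response(patch, filt)
```

Because rotating the input only permutes which tied copy fires, the max-pooled response is invariant to 90-degree rotations of the patch, and the parameter count stays that of a single filter.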
Sequential ranking under random semi-bandit feedback
Title | Sequential ranking under random semi-bandit feedback |
Authors | Hossein Vahabi, Paul Lagrée, Claire Vernade, Olivier Cappé |
Abstract | In many web applications, a recommendation is not a single item suggested to a user but a list of possibly interesting contents that may be ranked in some contexts. The combinatorial bandit problem has been studied quite extensively over the last two years, and many theoretical results now exist: lower bounds on the regret and asymptotically optimal algorithms. However, because of the variety of situations that can be considered, these results are designed to solve the problem for a specific reward structure, such as the Cascade Model. The present work focuses on the problem of ranking items when the user is allowed to click on several items while scanning the list from top to bottom. |
Tasks | |
Published | 2016-03-04 |
URL | http://arxiv.org/abs/1603.01450v2 |
http://arxiv.org/pdf/1603.01450v2.pdf | |
PWC | https://paperswithcode.com/paper/sequential-ranking-under-random-semi-bandit |
Repo | |
Framework | |
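The setting, show a ranked list and observe a click on every displayed item as the user scans top to bottom, can be simulated with a simple per-item index. Plain UCB1 and the synthetic click probabilities below are assumptions of this toy; the paper analyses KL-UCB-style algorithms.

```python
import math
import random

random.seed(0)
true_click_prob = [0.8, 0.5, 0.2]     # hypothetical item attractiveness
n_items, list_len = 3, 2
shows, clicks = [0] * n_items, [0] * n_items

for t in range(1, 2001):
    # UCB1 index per item; display the list_len highest-index items.
    ucb = [clicks[i] / shows[i] + math.sqrt(2 * math.log(t) / shows[i])
           if shows[i] else float("inf") for i in range(n_items)]
    ranking = sorted(range(n_items), key=lambda i: -ucb[i])[:list_len]
    for i in ranking:                  # semi-bandit feedback: a click
        shows[i] += 1                  # outcome is observed for every
        clicks[i] += random.random() < true_click_prob[i]  # shown item

means = [clicks[i] / shows[i] for i in range(n_items)]
```

Semi-bandit feedback (one observation per displayed item, rather than one for the whole list) is what lets each item's estimate be updated independently.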