Paper Group ANR 379
DeepCas: an End-to-end Predictor of Information Cascades
Title | DeepCas: an End-to-end Predictor of Information Cascades |
Authors | Cheng Li, Jiaqi Ma, Xiaoxiao Guo, Qiaozhu Mei |
Abstract | Information cascades, effectively facilitated by most social network platforms, are recognized as a major factor in almost every social success and disaster in these networks. Can cascades be predicted? While many believe that they are inherently unpredictable, recent work has shown that some key properties of information cascades, such as size, growth, and shape, can be predicted by a machine learning algorithm that combines many features. These predictors all depend on a bag of hand-crafted features to represent the cascade network and the global network structure. Such features, always carefully and sometimes mysteriously designed, are not easy to extend or to generalize to a different platform or domain. Inspired by the recent successes of deep learning in multiple data mining tasks, we investigate whether an end-to-end deep learning approach could effectively predict the future size of cascades. Such a method automatically learns the representation of individual cascade graphs in the context of the global network structure, without hand-crafted features and heuristics. We find that node embeddings fall short of predictive power, and it is critical to learn the representation of a cascade graph as a whole. We present algorithms that learn the representation of cascade graphs in an end-to-end manner, which significantly improve the performance of cascade prediction over strong baselines that include feature-based methods, node embedding methods, and graph kernel methods. Our results also provide interesting implications for cascade prediction in general. |
Tasks | |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05373v1 |
http://arxiv.org/pdf/1611.05373v1.pdf | |
PWC | https://paperswithcode.com/paper/deepcas-an-end-to-end-predictor-of |
Repo | |
Framework | |
Clustering Comparable Corpora of Russian and Ukrainian Academic Texts: Word Embeddings and Semantic Fingerprints
Title | Clustering Comparable Corpora of Russian and Ukrainian Academic Texts: Word Embeddings and Semantic Fingerprints |
Authors | Andrey Kutuzov, Mikhail Kopotev, Tatyana Sviridenko, Lyubov Ivanova |
Abstract | We present our experience in applying distributional semantics (neural word embeddings) to the problem of representing and clustering documents in a bilingual comparable corpus. Our data is a collection of Russian and Ukrainian academic texts, for which topics are their academic fields. In order to build language-independent semantic representations of these documents, we train neural distributional models on monolingual corpora and learn the optimal linear transformation of vectors from one language to another. The resulting vectors are then used to produce 'semantic fingerprints' of documents, serving as input to a clustering algorithm. The presented method is compared to several baselines, including 'orthographic translation' with Levenshtein edit distance, and outperforms them by a large margin. We also show that language-independent 'semantic fingerprints' are superior to multi-lingual clustering algorithms proposed in the previous work, at the same time requiring fewer linguistic resources. |
Tasks | Word Embeddings |
Published | 2016-04-18 |
URL | http://arxiv.org/abs/1604.05372v1 |
http://arxiv.org/pdf/1604.05372v1.pdf | |
PWC | https://paperswithcode.com/paper/clustering-comparable-corpora-of-russian-and |
Repo | |
Framework | |
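The abstract above names 'orthographic translation' with Levenshtein edit distance as one of its baselines but does not spell out the procedure. The sketch below shows only the standard dynamic-programming edit distance and a nearest-word lookup built on it. The function names `levenshtein` and `closest_word`, and the idea of mapping a word to its nearest vocabulary entry, are assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Standard edit distance: insertions, deletions, substitutions cost 1 each."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string; rows have len(shorter) + 1 cells
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest_word(word: str, vocabulary: Sequence[str]) -> str:
    """Hypothetical helper: pick the vocabulary entry nearest to `word`."""
    return min(vocabulary, key=lambda candidate: levenshtein(word, candidate))
```

Closely related languages share many near-identical spellings, which is what makes a lookup of this kind a plausible baseline: Russian "история" and Ukrainian "історія" differ by two substitutions.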
Discovering Patterns in Time-Varying Graphs: A Triclustering Approach
Title | Discovering Patterns in Time-Varying Graphs: A Triclustering Approach |
Authors | Romain Guigourès, Marc Boullé, Fabrice Rossi |
Abstract | This paper introduces a novel technique to track structures in time-varying graphs. The method uses a maximum a posteriori approach for adjusting a three-dimensional co-clustering of the source vertices, the destination vertices and the time, to the data under study, in a way that does not require any hyper-parameter tuning. The three dimensions are simultaneously segmented in order to build clusters of source vertices, destination vertices and time segments where the edge distributions across clusters of vertices follow the same evolution over the time segments. The main novelty of this approach is that the time segments are directly inferred from the evolution of the edge distribution between the vertices, thus not requiring the user to make any a priori quantization. Experiments conducted on artificial data illustrate the good behavior of the technique, and a study of a real-life data set shows the potential of the proposed approach for exploratory data analysis. |
Tasks | Quantization |
Published | 2016-08-29 |
URL | http://arxiv.org/abs/1608.07929v1 |
http://arxiv.org/pdf/1608.07929v1.pdf | |
PWC | https://paperswithcode.com/paper/discovering-patterns-in-time-varying-graphs-a |
Repo | |
Framework | |
Pufferfish Privacy Mechanisms for Correlated Data
Title | Pufferfish Privacy Mechanisms for Correlated Data |
Authors | Shuang Song, Yizhen Wang, Kamalika Chaudhuri |
Abstract | Many modern databases include personal and sensitive correlated data, such as private information on users connected together in a social network, and measurements of physical activity of single subjects across time. However, differential privacy, the current gold standard in data privacy, does not adequately address privacy issues in this kind of data. This work looks at a recent generalization of differential privacy, called Pufferfish, that can be used to address privacy in correlated data. The main challenge in applying Pufferfish is a lack of suitable mechanisms. We provide the first mechanism – the Wasserstein Mechanism – which applies to any general Pufferfish framework. Since this mechanism may be computationally inefficient, we provide an additional mechanism that applies to some practical cases such as physical activity measurements across time, and is computationally efficient. Our experimental evaluations indicate that this mechanism provides privacy and utility for synthetic as well as real data in two separate domains. |
Tasks | |
Published | 2016-03-13 |
URL | http://arxiv.org/abs/1603.03977v3 |
http://arxiv.org/pdf/1603.03977v3.pdf | |
PWC | https://paperswithcode.com/paper/pufferfish-privacy-mechanisms-for-correlated |
Repo | |
Framework | |
Video Registration in Egocentric Vision under Day and Night Illumination Changes
Title | Video Registration in Egocentric Vision under Day and Night Illumination Changes |
Authors | Stefano Alletto, Giuseppe Serra, Rita Cucchiara |
Abstract | With the spread of wearable devices and head-mounted cameras, a wide range of applications requiring precise user localization are now possible. In this paper we propose to treat the problem of obtaining the user position with respect to a known environment as a video registration problem. Video registration, i.e. the task of aligning an input video sequence to a pre-built 3D model, relies on matching local keypoints extracted from the query sequence to a 3D point cloud. The overall registration performance is strictly tied to the actual quality of this 2D-3D matching, and can degrade under environmental conditions such as steep changes in lighting, like those between day and night. To effectively register an egocentric video sequence under these conditions, we propose to tackle the source of the problem: the matching process. To overcome the shortcomings of standard matching techniques, we introduce a novel embedding space that allows us to obtain robust matches by jointly taking into account local descriptors, their spatial arrangement and their temporal robustness. The proposal is evaluated using unconstrained egocentric video sequences both in terms of matching quality and resulting registration performance using different 3D models of historical landmarks. The results show that the proposed method can outperform state-of-the-art registration algorithms, in particular when dealing with the challenges of night and day sequences. |
Tasks | |
Published | 2016-07-28 |
URL | http://arxiv.org/abs/1607.08434v1 |
http://arxiv.org/pdf/1607.08434v1.pdf | |
PWC | https://paperswithcode.com/paper/video-registration-in-egocentric-vision-under |
Repo | |
Framework | |
Piecewise convexity of artificial neural networks
Title | Piecewise convexity of artificial neural networks |
Authors | Blaine Rister, Daniel L Rubin |
Abstract | Although artificial neural networks have shown great promise in applications including computer vision and speech recognition, there remains considerable practical and theoretical difficulty in optimizing their parameters. The seemingly unreasonable success of gradient descent methods in minimizing these non-convex functions remains poorly understood. In this work we offer some theoretical guarantees for networks with piecewise affine activation functions, which have in recent years become the norm. We prove three main results. Firstly, that the network is piecewise convex as a function of the input data. Secondly, that the network, considered as a function of the parameters in a single layer, all others held constant, is again piecewise convex. Finally, that the network as a function of all its parameters is piecewise multi-convex, a generalization of biconvexity. From here we characterize the local minima and stationary points of the training objective, showing that they minimize certain subsets of the parameter space. We then analyze the performance of two optimization algorithms on multi-convex problems: gradient descent, and a method which repeatedly solves a number of convex sub-problems. We prove necessary convergence conditions for the first algorithm and both necessary and sufficient conditions for the second, after introducing regularization to the objective. Finally, we remark on the remaining difficulty of the global optimization problem. Under the squared error objective, we show that by varying the training data, a single rectifier neuron admits local minima arbitrarily far apart, both in objective value and parameter space. |
Tasks | Speech Recognition |
Published | 2016-07-17 |
URL | http://arxiv.org/abs/1607.04917v2 |
http://arxiv.org/pdf/1607.04917v2.pdf | |
PWC | https://paperswithcode.com/paper/piecewise-convexity-of-artificial-neural |
Repo | |
Framework | |
Randomized Clustered Nystrom for Large-Scale Kernel Machines
Title | Randomized Clustered Nystrom for Large-Scale Kernel Machines |
Authors | Farhad Pourkamali-Anaraki, Stephen Becker |
Abstract | The Nystrom method has been popular for generating the low-rank approximation of kernel matrices that arise in many machine learning problems. The approximation quality of the Nystrom method depends crucially on the number of selected landmark points and the selection procedure. In this paper, we present a novel algorithm to compute the optimal Nystrom low-rank approximation when the number of landmark points exceeds the target rank. Moreover, we introduce a randomized algorithm for generating landmark points that is scalable to large-scale data sets. The proposed method performs K-means clustering on low-dimensional random projections of a data set and, thus, leads to significant savings for high-dimensional data sets. Our theoretical results characterize the tradeoffs between the accuracy and efficiency of our proposed method. Extensive experiments demonstrate the competitive performance as well as the efficiency of our proposed method. |
Tasks | |
Published | 2016-12-20 |
URL | http://arxiv.org/abs/1612.06470v1 |
http://arxiv.org/pdf/1612.06470v1.pdf | |
PWC | https://paperswithcode.com/paper/randomized-clustered-nystrom-for-large-scale |
Repo | |
Framework | |
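For readers unfamiliar with the Nystrom method mentioned in the abstract above, its standard textbook form can be written as follows. This is the generic approximation, not the paper's new rank-restricted algorithm. The symbols K, C, W, n, m, x_i, z_j and the kernel function k are notation assumed here for illustration.

```latex
\[
K \approx \tilde{K} = C \, W^{\dagger} C^{\top},
\qquad
C_{ij} = k(x_i, z_j), \quad W_{ij} = k(z_i, z_j),
\]
\[
K \in \mathbb{R}^{n \times n}, \quad
C \in \mathbb{R}^{n \times m}, \quad
W \in \mathbb{R}^{m \times m}, \quad m \ll n .
\]
```

Here x_1, ..., x_n are the data points, z_1, ..., z_m are the landmark points, and W^† denotes the pseudo-inverse of W. Only C and W need to be formed, so the full n-by-n kernel matrix is never computed. This is why the choice and number of landmark points drive both cost and accuracy.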
Evaluating Crowdsourcing Participants in the Absence of Ground-Truth
Title | Evaluating Crowdsourcing Participants in the Absence of Ground-Truth |
Authors | Ramanathan Subramanian, Romer Rosales, Glenn Fung, Jennifer Dy |
Abstract | Given a supervised/semi-supervised learning scenario where multiple annotators are available, we consider the problem of identification of adversarial or unreliable annotators. |
Tasks | |
Published | 2016-05-30 |
URL | http://arxiv.org/abs/1605.09432v1 |
http://arxiv.org/pdf/1605.09432v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-crowdsourcing-participants-in-the |
Repo | |
Framework | |
Learning Methods for Dynamic Topic Modeling in Automated Behaviour Analysis
Title | Learning Methods for Dynamic Topic Modeling in Automated Behaviour Analysis |
Authors | Olga Isupova, Danil Kuzin, Lyudmila Mihaylova |
Abstract | Semi-supervised and unsupervised systems provide operators with invaluable support and can tremendously reduce the operators' load. In light of the need to process large volumes of video data and provide autonomous decisions, this work proposes new learning algorithms for activity analysis in video. The activities and behaviours are described by a dynamic topic model. Two novel learning algorithms based on the expectation maximisation approach and variational Bayes inference are proposed. Theoretical derivations of the posterior of model parameters are given. The designed learning algorithms are compared with the Gibbs sampling inference scheme introduced earlier in the literature. A detailed comparison of the learning algorithms is presented on real video data. We also propose an anomaly localisation procedure, elegantly embedded in the topic modeling framework. The proposed framework can be applied to a number of areas, including transportation systems, security and surveillance. |
Tasks | |
Published | 2016-11-02 |
URL | http://arxiv.org/abs/1611.00565v2 |
http://arxiv.org/pdf/1611.00565v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-methods-for-dynamic-topic-modeling |
Repo | |
Framework | |
Automating Political Bias Prediction
Title | Automating Political Bias Prediction |
Authors | Felix Biessmann |
Abstract | Every day media generate large amounts of text. An unbiased view on media reports requires an understanding of the political bias of media content. Assistive technology for estimating the political bias of texts can be helpful in this context. This study proposes a simple statistical learning approach to predict political bias from text. Standard text features extracted from speeches and manifestos of political parties are used to predict political bias in terms of political party affiliation and in terms of political views. Results indicate that political bias can be predicted with above-chance accuracy. Mistakes of the model can be interpreted with respect to changes of policies of political actors. Two approaches are presented to make the results more interpretable: a) discriminative text features are related to the political orientation of a party and b) sentiment features of texts are correlated with a measure of political power. Political power appears to be strongly correlated with positive sentiment of a text. To highlight some potential use cases, a web application shows how the model can be used for texts whose political bias is not clear, such as news articles. |
Tasks | |
Published | 2016-08-07 |
URL | http://arxiv.org/abs/1608.02195v1 |
http://arxiv.org/pdf/1608.02195v1.pdf | |
PWC | https://paperswithcode.com/paper/automating-political-bias-prediction |
Repo | |
Framework | |
Estimation of solar irradiance using ground-based whole sky imagers
Title | Estimation of solar irradiance using ground-based whole sky imagers |
Authors | Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee, Stefan Winkler |
Abstract | Ground-based whole sky imagers (WSIs) can provide localized images of the sky of high temporal and spatial resolution, which permits fine-grained cloud observation. In this paper, we show how images taken by WSIs can be used to estimate solar radiation. Sky cameras are useful here because they provide additional information about cloud movement and coverage, which are otherwise not available from weather station data. Our setup includes ground-based weather stations at the same location as the imagers. We use their measurements to validate our methods. |
Tasks | |
Published | 2016-06-08 |
URL | http://arxiv.org/abs/1606.02546v2 |
http://arxiv.org/pdf/1606.02546v2.pdf | |
PWC | https://paperswithcode.com/paper/estimation-of-solar-irradiance-using-ground |
Repo | |
Framework | |
Laplacian Eigenmaps from Sparse, Noisy Similarity Measurements
Title | Laplacian Eigenmaps from Sparse, Noisy Similarity Measurements |
Authors | Keith Levin, Vince Lyzinski |
Abstract | Manifold learning and dimensionality reduction techniques are ubiquitous in science and engineering, but can be computationally expensive procedures when applied to large data sets or when similarities are expensive to compute. To date, little work has been done to investigate the tradeoff between computational resources and the quality of learned representations. We present both theoretical and experimental explorations of this question. In particular, we consider Laplacian eigenmaps embeddings based on a kernel matrix, and explore how the embeddings behave when this kernel matrix is corrupted by occlusion and noise. Our main theoretical result shows that under modest noise and occlusion assumptions, we can (with high probability) recover a good approximation to the Laplacian eigenmaps embedding based on the uncorrupted kernel matrix. Our results also show how regularization can aid this approximation. Experimentally, we explore the effects of noise and occlusion on Laplacian eigenmaps embeddings of two real-world data sets, one from speech processing and one from neuroscience, as well as a synthetic data set. |
Tasks | Dimensionality Reduction |
Published | 2016-03-12 |
URL | http://arxiv.org/abs/1603.03972v2 |
http://arxiv.org/pdf/1603.03972v2.pdf | |
PWC | https://paperswithcode.com/paper/laplacian-eigenmaps-from-sparse-noisy |
Repo | |
Framework | |
The Inverse Bagging Algorithm: Anomaly Detection by Inverse Bootstrap Aggregating
Title | The Inverse Bagging Algorithm: Anomaly Detection by Inverse Bootstrap Aggregating |
Authors | Pietro Vischia, Tommaso Dorigo |
Abstract | For data sets populated by a very well modeled process and by another process of unknown probability density function (PDF), a desired feature when manipulating the fraction of the unknown process (either to enhance it or to suppress it) is to avoid modifying the kinematic distributions of the well modeled one. A bootstrap technique is used to identify sub-samples rich in the well modeled process, and to classify each event according to how frequently it is part of such sub-samples. Comparisons with general MVA algorithms will be shown, as well as a study of the asymptotic properties of the method, making use of a public domain data set that models a typical search for new physics as performed at hadronic colliders such as the Large Hadron Collider (LHC). |
Tasks | Anomaly Detection |
Published | 2016-11-24 |
URL | http://arxiv.org/abs/1611.08256v1 |
http://arxiv.org/pdf/1611.08256v1.pdf | |
PWC | https://paperswithcode.com/paper/the-inverse-bagging-algorithm-anomaly |
Repo | |
Framework | |
Towards stationary time-vertex signal processing
Title | Towards stationary time-vertex signal processing |
Authors | Nathanael Perraudin, Andreas Loukas, Francesco Grassi, Pierre Vandergheynst |
Abstract | Graph-based methods for signal processing have shown promise for the analysis of data exhibiting irregular structure, such as those found in social, transportation, and sensor networks. Yet, though these systems are often dynamic, state-of-the-art methods for signal processing on graphs ignore the dimension of time, treating successive graph signals independently or taking a global average. To address this shortcoming, this paper considers the statistical analysis of time-varying graph signals. We introduce a novel definition of joint (time-vertex) stationarity, which generalizes the classical definition of time stationarity and the more recent definition appropriate for graphs. Joint stationarity gives rise to a scalable Wiener optimization framework for joint denoising, semi-supervised learning, or more generally inverting a linear operator, that is provably optimal. Experimental results on real weather data demonstrate that taking into account graph and time dimensions jointly can yield significant accuracy improvements in the reconstruction effort. |
Tasks | Denoising |
Published | 2016-06-22 |
URL | http://arxiv.org/abs/1606.06962v1 |
http://arxiv.org/pdf/1606.06962v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-stationary-time-vertex-signal |
Repo | |
Framework | |
An electronic-game framework for evaluating coevolutionary algorithms
Title | An electronic-game framework for evaluating coevolutionary algorithms |
Authors | Karine da Silva Miras de Araújo, Fabrício Olivetti de França |
Abstract | One of the common artificial intelligence applications in electronic games consists of making an artificial agent learn how to execute some determined task successfully in a game environment. One way to perform this task is through machine learning algorithms capable of learning the sequence of actions required to win in a given game environment. There are several supervised learning techniques able to learn the correct answer for a problem through examples. However, when learning how to play electronic games, the correct answer might only be known by the end of the game, after all the actions have already been taken. Thus, it is not possible to measure the accuracy of each individual action taken at each time step. One way of dealing with this problem is Neuroevolution, a method which trains Artificial Neural Networks using evolutionary algorithms. In this article, we introduce a framework for testing optimization algorithms with artificial agent controllers in electronic games, called EvoMan, which is inspired by the action-platformer game Mega Man II. The environment can be configured to run in different experiment modes, such as single evolution, coevolution and others. To demonstrate some challenges regarding the proposed platform, as initial experiments we applied Neuroevolution using Genetic Algorithms and the NEAT algorithm, in the context of competitively coevolving two distinct agents in this game. |
Tasks | |
Published | 2016-04-03 |
URL | http://arxiv.org/abs/1604.00644v2 |
http://arxiv.org/pdf/1604.00644v2.pdf | |
PWC | https://paperswithcode.com/paper/an-electronic-game-framework-for-evaluating |
Repo | |
Framework | |