Paper Group ANR 372
Big Data Approaches to Knot Theory: Understanding the Structure of the Jones Polynomial. Rumour Detection via News Propagation Dynamics and User Representation Learning. Causal Regularization. Jacobian Policy Optimizations. Reading Comprehension Ability Test-A Turing Test for Reading Comprehension. Identity Connections in Residual Nets Improve Noise Stability. Active Probabilistic Inference on Matrices for Pre-Conditioning in Stochastic Optimization. Connections Between Adaptive Control and Optimization in Machine Learning. Representation Learning: A Statistical Perspective. Convergence of Gradient Methods on Bilinear Zero-Sum Games. Multi-objective Evolutionary Approach to Grey-Box Identification of Buck Converter. Adaptive Iterative Hessian Sketch via A-Optimal Subsampling. Closed-form Expressions for Maximum Mean Discrepancy with Applications to Wasserstein Auto-Encoders. Beta Survival Models. Understanding Social Networks using Transfer Learning.
Big Data Approaches to Knot Theory: Understanding the Structure of the Jones Polynomial
Title | Big Data Approaches to Knot Theory: Understanding the Structure of the Jones Polynomial |
Authors | Jesse S F Levitt, Mustafa Hajij, Radmila Sazdanovic |
Abstract | We examine the structure and dimensionality of the Jones polynomial using manifold learning techniques. Our data set consists of more than 10 million knots with up to 17 crossings and two other special families with up to 2001 crossings. We introduce and describe a method for using filtrations to analyze infinite data sets where representative sampling is impossible or impractical, an essential requirement for working with knots and the data from knot invariants. In particular, this method provides a new approach for analyzing knot invariants using Principal Component Analysis. Applying this approach to the Jones polynomial data, we find that it can be viewed as an approximately 3-dimensional manifold, that this description is surprisingly stable with respect to the filtration by crossing number, and that the results suggest further structures to be examined and understood. |
Tasks | |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.10086v1 |
PDF | https://arxiv.org/pdf/1912.10086v1.pdf |
PWC | https://paperswithcode.com/paper/big-data-approaches-to-knot-theory |
Repo | |
Framework | |
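The filtration-and-PCA pipeline described above is straightforward to prototype. Below is a minimal sketch, under the assumption (ours, not the paper's stated encoding) that each Jones polynomial is flattened into a fixed-length vector of its coefficients; the random integer matrix is only a stand-in for real tabulated invariants.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in data: each row holds the coefficients of one knot's Jones
# polynomial, zero-padded to a common span of exponents. The paper's actual
# data set comes from millions of tabulated knots, filtered by crossing number.
coeffs = rng.integers(-5, 6, size=(1000, 40)).astype(float)

pca = PCA()
pca.fit(coeffs)

# If the data were approximately 3-dimensional, the first three principal
# components would capture most of the variance.
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("variance explained by 3 components:", cumulative[2])
```

Repeating the fit on each stage of the filtration (e.g., knots with at most k crossings) is how one would probe the stability the paper reports.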
Rumour Detection via News Propagation Dynamics and User Representation Learning
Title | Rumour Detection via News Propagation Dynamics and User Representation Learning |
Authors | Tien Huu Do, Xiao Luo, Duc Minh Nguyen, Nikos Deligiannis |
Abstract | Rumours have existed for a long time and are known to have serious consequences. The rapid growth of social media platforms has multiplied their negative impact, making early detection important. Many methods have been introduced to detect rumours using the content or the social context of news. However, most existing methods ignore, or do not effectively exploit, the propagation pattern of news in social media, i.e., the sequence of interactions of social media users with news over time. In this work, we propose a novel method for rumour detection based on deep learning. Our method leverages the propagation process of the news by learning the users' representations and the temporal interrelation of users' responses. Experiments conducted on Twitter and Weibo datasets demonstrate the state-of-the-art performance of the proposed method. |
Tasks | Representation Learning, Rumour Detection |
Published | 2019-04-18 |
URL | http://arxiv.org/abs/1905.03042v1 |
PDF | http://arxiv.org/pdf/1905.03042v1.pdf |
PWC | https://paperswithcode.com/paper/190503042 |
Repo | |
Framework | |
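The core idea, learning over the temporal sequence of user interactions with a news item, can be illustrated with a generic recurrent classifier. This is a sketch of the general approach, not the authors' architecture; the embedding and hidden sizes are arbitrary.

```python
import torch
import torch.nn as nn

class PropagationClassifier(nn.Module):
    """Toy rumour classifier: embed the users who interacted with a news
    item, run a GRU over the interaction sequence, classify the final state."""
    def __init__(self, n_users, emb_dim=32, hidden=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)   # rumour vs. non-rumour logits

    def forward(self, user_seqs):          # (batch, seq_len) of user ids
        x = self.user_emb(user_seqs)       # (batch, seq_len, emb_dim)
        _, h = self.gru(x)                 # h: (1, batch, hidden)
        return self.head(h.squeeze(0))     # (batch, 2)

model = PropagationClassifier(n_users=10_000)
logits = model(torch.randint(0, 10_000, (8, 50)))  # 8 items, 50 interactions each
print(logits.shape)  # torch.Size([8, 2])
```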
Causal Regularization
Title | Causal Regularization |
Authors | Dominik Janzing |
Abstract | I argue that regularizing terms in standard regression methods not only help against overfitting finite data, but sometimes also yield better causal models in the infinite sample regime. I first consider a multi-dimensional variable linearly influencing a target variable, with some multi-dimensional unobserved common cause, where the confounding effect can be decreased by keeping the penalizing term in Ridge and Lasso regression even in the population limit. Choosing the size of the penalizing term is, however, challenging, because cross validation is pointless. Here it is done by first estimating the strength of confounding via a method proposed earlier, which yielded reasonable results for simulated and real data. Further, I prove a 'causal generalization bound' which states (subject to a particular model of confounding) that the error made by interpreting any non-linear regression as a causal model can be bounded from above whenever functions are taken from a not too rich class. In other words, the bound guarantees "generalization" from observational to interventional distributions, which is usually not the subject of statistical learning theory (and is only possible due to the underlying symmetries of the confounder model). |
Tasks | |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1906.12179v1 |
PDF | https://arxiv.org/pdf/1906.12179v1.pdf |
PWC | https://paperswithcode.com/paper/causal-regularization-1 |
Repo | |
Framework | |
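The claim that the penalty should be kept even in the population limit can be probed with a quick simulation. Here is a minimal one-dimensional sketch under assumed linear confounding (all coefficients are illustrative); note that the best penalty level depends on the confounding strength, which the paper estimates with a separately proposed method rather than cross validation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000                      # large n: effectively the population limit

Z = rng.normal(size=n)             # unobserved common cause
X = 2.0 * Z + rng.normal(size=n)   # confounder drives X strongly
a = 1.0                            # true causal effect of X on Y
Y = a * X + 3.0 * Z + rng.normal(size=n)

# OLS converges to a + Cov(X, Z-part) / Var(X) = 2.2, not to a = 1.
# Ridge shrinkage moves the estimate back toward the causal coefficient.
for lam_per_n in [0.0, 1.0, 3.0, 6.0]:
    b = (X @ Y) / (X @ X + lam_per_n * n)   # 1-D ridge estimate
    print(f"lambda/n = {lam_per_n:3.1f}  b = {b:5.2f}  |b - a| = {abs(b - a):4.2f}")
```

Cross validation would always pick lambda near 0 here, because the confounded regression predicts Y best; that is exactly why the paper calls it pointless for this purpose.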
Jacobian Policy Optimizations
Title | Jacobian Policy Optimizations |
Authors | Arip Asadulaev, Gideon Stein, Igor Kuznetsov, Andrey Filchenkov |
Abstract | Recently, natural policy gradient algorithms gained widespread recognition due to their strong performance in reinforcement learning tasks. However, their major drawback is the need to keep the policy within a 'trust region' while still allowing sufficient exploration. The main objective of this study was to present an approach that models the dynamical isometry of agents' policies by estimating the conditioning of their Jacobians at individual points in the environment space. We present the Jacobian Policy Optimization algorithm, which dynamically adapts the trust interval with respect to policy conditioning. The suggested approach was tested across a range of Atari environments. This paper offers some important insights into improving policy optimization in reinforcement learning tasks. |
Tasks | |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05437v1 |
PDF | https://arxiv.org/pdf/1906.05437v1.pdf |
PWC | https://paperswithcode.com/paper/jacobian-policy-optimizations |
Repo | |
Framework | |
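The central quantity, the conditioning of a policy's Jacobian at a single point of the environment space, is easy to compute for a small network. A hedged sketch with a hypothetical toy policy follows; the paper's actual trust-interval adaptation rule is not reproduced here.

```python
import torch
import torch.nn as nn

# Hypothetical toy policy: 8-D observation -> logits over 4 discrete actions.
policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))

def policy_conditioning(obs):
    """Condition number (sigma_max / sigma_min) of the policy Jacobian at one
    observation; values near 1 indicate approximate dynamical isometry there."""
    J = torch.autograd.functional.jacobian(policy, obs)  # shape (4, 8)
    s = torch.linalg.svdvals(J)
    return (s.max() / s.min()).item()

print(policy_conditioning(torch.randn(8)))
```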
Reading Comprehension Ability Test-A Turing Test for Reading Comprehension
Title | Reading Comprehension Ability Test-A Turing Test for Reading Comprehension |
Authors | Yuan Miao, Gongqi Lin, Yidan Hu, Chunyan Miao |
Abstract | Reading comprehension is an important ability of human intelligence. Literacy and numeracy are the two most essential foundations for people to succeed at study, at work and in life, and reading comprehension ability is a core component of literacy. In most education systems, developing reading comprehension ability is compulsory in the curriculum from year one to year 12; it is an indispensable ability in the dissemination of knowledge. With emerging artificial intelligence, computers are starting to be able to read and understand like people in some contexts. They can even read better than human beings for some tasks, but have little clue in others. It would be very beneficial if we could identify the levels of machine comprehension ability, which would direct further improvement. The Turing test is a well-known test of the difference between computer intelligence and human intelligence. In order to compare the difference between people reading and machines reading, we propose a test called the (reading) Comprehension Ability Test (CAT). CAT is similar to the Turing test: passing it means we cannot differentiate people from algorithms in terms of their comprehension ability. CAT has multiple levels reflecting different reading comprehension abilities, from identifying basic facts, to performing inference, to understanding intent and sentiment. |
Tasks | Reading Comprehension |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02399v1 |
PDF | https://arxiv.org/pdf/1909.02399v1.pdf |
PWC | https://paperswithcode.com/paper/reading-comprehension-ability-test-a-turing |
Repo | |
Framework | |
Identity Connections in Residual Nets Improve Noise Stability
Title | Identity Connections in Residual Nets Improve Noise Stability |
Authors | Shuzhi Yu, Carlo Tomasi |
Abstract | Residual Neural Networks (ResNets) achieve state-of-the-art performance in many computer vision problems. Compared to plain networks without residual connections (PlnNets), ResNets train faster, generalize better, and suffer less from the so-called degradation problem. We introduce simplified (but still nonlinear) versions of ResNets and PlnNets for which these discrepancies still hold, although to a lesser degree. We establish a 1-1 mapping between simplified ResNets and simplified PlnNets, and show that they are exactly equivalent to each other in expressive power for the same computational complexity. We conjecture that ResNets generalize better because they have better noise stability, and empirically support it for both simplified and fully-fledged networks. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.10944v1 |
PDF | https://arxiv.org/pdf/1905.10944v1.pdf |
PWC | https://paperswithcode.com/paper/identity-connections-in-residual-nets-improve |
Repo | |
Framework | |
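The conjectured noise-stability gap can be measured directly: perturb the input with Gaussian noise and compare the relative output change of a residual stack against a plain stack. A minimal sketch with simplified (linear + ReLU) blocks, in the spirit of the paper's simplified networks; the exact numbers vary with initialization and depth.

```python
import torch
import torch.nn as nn

class PlainBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
    def forward(self, x):
        return self.f(x)

class ResBlock(PlainBlock):
    def forward(self, x):
        return x + self.f(x)                 # identity connection

@torch.no_grad()
def noise_sensitivity(net, x, sigma=0.1, trials=100):
    """Average relative output change under Gaussian input perturbations."""
    y = net(x)
    changes = []
    for _ in range(trials):
        y_noisy = net(x + sigma * torch.randn_like(x))
        changes.append((y_noisy - y).norm() / y.norm())
    return torch.stack(changes).mean()

torch.manual_seed(0)
dim, depth = 64, 20
plain = nn.Sequential(*[PlainBlock(dim) for _ in range(depth)])
resnet = nn.Sequential(*[ResBlock(dim) for _ in range(depth)])
x = torch.randn(dim)
print("plain :", noise_sensitivity(plain, x).item())
print("resnet:", noise_sensitivity(resnet, x).item())
```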
Active Probabilistic Inference on Matrices for Pre-Conditioning in Stochastic Optimization
Title | Active Probabilistic Inference on Matrices for Pre-Conditioning in Stochastic Optimization |
Authors | Filip de Roos, Philipp Hennig |
Abstract | Pre-conditioning is a well-known concept that can significantly improve the convergence of optimization algorithms. For noise-free problems, where good pre-conditioners are not known a priori, iterative linear algebra methods offer one way to efficiently construct them. For the stochastic optimization problems that dominate contemporary machine learning, however, this approach is not readily available. We propose an iterative algorithm inspired by classic iterative linear solvers that uses a probabilistic model to actively infer a pre-conditioner in situations where Hessian-projections can only be constructed with strong Gaussian noise. The algorithm is empirically demonstrated to efficiently construct effective pre-conditioners for stochastic gradient descent and its variants. Experiments on problems of comparably low dimensionality show improved convergence. In very high-dimensional problems, such as those encountered in deep learning, the pre-conditioner effectively becomes an automatic learning-rate adaptation scheme, which we also empirically show to work well. |
Tasks | Stochastic Optimization |
Published | 2019-02-20 |
URL | http://arxiv.org/abs/1902.07557v1 |
PDF | http://arxiv.org/pdf/1902.07557v1.pdf |
PWC | https://paperswithcode.com/paper/active-probabilistic-inference-on-matrices |
Repo | |
Framework | |
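To see why pre-conditioning matters for noisy gradients, consider an ill-conditioned quadratic: without a pre-conditioner, the stable step size is capped by the largest curvature, leaving flat directions slow. A sketch with a known Hessian; the paper's contribution, actively inferring such a pre-conditioner from noisy Hessian-projections, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Ill-conditioned quadratic f(w) = 0.5 w^T H w, seen only through noisy gradients.
H = np.diag([100.0, 1.0])

def noisy_grad(w):
    return H @ w + 0.1 * rng.normal(size=2)

def sgd(P, lr, steps=50):
    w = np.array([1.0, 1.0])
    for _ in range(steps):
        w -= lr * P @ noisy_grad(w)        # the pre-conditioner P re-scales the step
    return np.linalg.norm(w)

# Without pre-conditioning, stability caps lr near 2/100; the flat direction crawls.
print("identity   :", sgd(np.eye(2), lr=0.015))
# With P ~ H^-1 the curvature is equalized and a far larger lr is stable.
print("P = H^{-1} :", sgd(np.linalg.inv(H), lr=0.5))
```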
Connections Between Adaptive Control and Optimization in Machine Learning
Title | Connections Between Adaptive Control and Optimization in Machine Learning |
Authors | Joseph E. Gaudio, Travis E. Gibson, Anuradha M. Annaswamy, Michael A. Bolender, Eugene Lavretsky |
Abstract | This paper demonstrates many immediate connections between adaptive control and optimization methods commonly employed in machine learning. Starting from common output error formulations, similarities in update law modifications are examined. Concepts in stability, performance, and learning, common to both fields are then discussed. Building on the similarities in update laws and common concepts, new intersections and opportunities for improved algorithm analysis are provided. In particular, a specific problem related to higher order learning is solved through insights obtained from these intersections. |
Tasks | |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05856v1 |
PDF | http://arxiv.org/pdf/1904.05856v1.pdf |
PWC | https://paperswithcode.com/paper/connections-between-adaptive-control-and |
Repo | |
Framework | |
Representation Learning: A Statistical Perspective
Title | Representation Learning: A Statistical Perspective |
Authors | Jianwen Xie, Ruiqi Gao, Erik Nijkamp, Song-Chun Zhu, Ying Nian Wu |
Abstract | Learning representations of data is an important problem in statistics and machine learning. While the origin of learning representations can be traced back to factor analysis and multidimensional scaling in statistics, it has become a central theme in deep learning with important applications in computer vision and computational neuroscience. In this article, we review recent advances in learning representations from a statistical perspective. In particular, we review the following two themes: (a) unsupervised learning of vector representations and (b) learning of both vector and matrix representations. |
Tasks | Representation Learning |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11374v1 |
PDF | https://arxiv.org/pdf/1911.11374v1.pdf |
PWC | https://paperswithcode.com/paper/representation-learning-a-statistical |
Repo | |
Framework | |
Convergence of Gradient Methods on Bilinear Zero-Sum Games
Title | Convergence of Gradient Methods on Bilinear Zero-Sum Games |
Authors | Guojun Zhang, Yaoliang Yu |
Abstract | Min-max formulations have attracted great attention in the ML community due to the rise of deep generative models and adversarial methods, while understanding the dynamics of gradient algorithms for solving such formulations has remained a grand challenge. As a first step, we restrict to bilinear zero-sum games and give a systematic analysis of popular gradient updates, for both simultaneous and alternating versions. We provide exact conditions for their convergence and find the optimal parameter setup and convergence rates. In particular, our results offer formal evidence that alternating updates converge “better” than simultaneous ones. |
Tasks | |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05699v4 |
PDF | https://arxiv.org/pdf/1908.05699v4.pdf |
PWC | https://paperswithcode.com/paper/convergence-behaviour-of-some-gradient-based |
Repo | |
Framework | |
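The simultaneous-versus-alternating distinction is easy to reproduce on the simplest bilinear game f(x, y) = xy, whose unique equilibrium is the origin. A minimal sketch: simultaneous gradient descent-ascent grows the iterate norm by a factor of sqrt(1 + eta^2) per step and diverges, while the alternating version stays bounded for eta < 2, consistent with the kind of exact conditions the paper derives.

```python
import numpy as np

# min_x max_y  f(x, y) = x * y   (1-D bilinear game; equilibrium at the origin)
eta, steps = 0.1, 500

def simultaneous():
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x   # both players use the old iterate
    return np.hypot(x, y)

def alternating():
    x, y = 1.0, 1.0
    for _ in range(steps):
        x = x - eta * y
        y = y + eta * x                   # the second player sees the fresh x
    return np.hypot(x, y)

print("simultaneous:", simultaneous())   # diverges: ~(1 + eta^2)^(steps/2)
print("alternating :", alternating())    # bounded for eta < 2
```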
Multi-objective Evolutionary Approach to Grey-Box Identification of Buck Converter
Title | Multi-objective Evolutionary Approach to Grey-Box Identification of Buck Converter |
Authors | Faizal Hafiz, Akshya Swain, Eduardo M. A. M. Mendes, Luis Aguirre |
Abstract | The present study proposes a simple grey-box identification approach to model a real DC-DC buck converter operating in continuous conduction mode. The problem associated with the information void in the observed dynamical data, which is often obtained over a relatively narrow input range, is alleviated by exploiting the known static behavior of the buck converter as a priori knowledge. A simple method is developed, based on the concept of term clusters, to determine the static response of the candidate models. The error in the static behavior is then directly embedded into the multi-objective framework for structure selection. In essence, the proposed approach casts the grey-box identification problem into a multi-objective framework to balance the bias-variance dilemma of model building while explicitly integrating a priori knowledge into the structure selection process. The results of the investigation, considering a practical buck converter, demonstrate that it is possible to identify parsimonious models which capture both the dynamic and static behavior of the system over a wide input range. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04320v2 |
PDF | https://arxiv.org/pdf/1909.04320v2.pdf |
PWC | https://paperswithcode.com/paper/multi-objective-evolutionary-approach-to-grey |
Repo | |
Framework | |
Adaptive Iterative Hessian Sketch via A-Optimal Subsampling
Title | Adaptive Iterative Hessian Sketch via A-Optimal Subsampling |
Authors | Aijun Zhang, Hengtao Zhang, Guosheng Yin |
Abstract | Iterative Hessian sketch (IHS) is an effective sketching method for modeling large-scale data. It was originally proposed by Pilanci and Wainwright (2016; JMLR) based on randomized sketching matrices. However, it is computationally intensive due to the iterative sketch process. In this paper, we analyze the IHS algorithm under the unconstrained least squares problem setting, then propose a deterministic approach for improving IHS via A-optimal subsampling. Our contributions are three-fold: (1) a good initial estimator based on the A-optimal design is suggested; (2) a novel ridged preconditioner is developed for repeated sketching; and (3) an exact line search method is proposed for determining the optimal step length adaptively. Extensive experimental results demonstrate that our proposed A-optimal IHS algorithm outperforms the existing accelerated IHS methods. |
Tasks | |
Published | 2019-02-20 |
URL | https://arxiv.org/abs/1902.07627v2 |
PDF | https://arxiv.org/pdf/1902.07627v2.pdf |
PWC | https://paperswithcode.com/paper/adaptive-iterative-hessian-sketch-via-a |
Repo | |
Framework | |
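For reference, the baseline randomized IHS iteration that the paper improves upon works as follows: each round draws a fresh sketch, forms a sketched Hessian, and takes an approximate Newton step using the exact gradient. A sketch with a Gaussian sketching matrix; the paper's A-optimal subsampling, initial estimator, ridged preconditioner, and exact line search are not shown.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m = 10_000, 20, 200          # m: sketch size (m >> d, m << n)

A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

x = np.zeros(d)
for t in range(10):
    S = rng.normal(size=(m, n)) / np.sqrt(m)   # fresh Gaussian sketch per round
    SA = S @ A                                  # sketched design, m x d
    g = A.T @ (b - A @ x)                       # exact gradient (one pass over data)
    x = x + np.linalg.solve(SA.T @ SA, g)       # sketched Newton step
    print(t, np.linalg.norm(x - x_true))
```

The error contracts linearly per iteration (roughly like sqrt(d/m)) down to the noise floor of the least-squares solution.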
Closed-form Expressions for Maximum Mean Discrepancy with Applications to Wasserstein Auto-Encoders
Title | Closed-form Expressions for Maximum Mean Discrepancy with Applications to Wasserstein Auto-Encoders |
Authors | Raif M. Rustamov |
Abstract | The Maximum Mean Discrepancy (MMD) has found numerous applications in statistics and machine learning, most recently as a penalty in the Wasserstein Auto-Encoder (WAE). In this paper we compute closed-form expressions for estimating the Gaussian kernel based MMD between a given distribution and the standard multivariate normal distribution. We introduce the standardized version of MMD as a penalty for the WAE training objective, allowing for a better interpretability of MMD values and more compatibility across different hyperparameter settings. Next, we propose using a version of batch normalization at the code layer; this has the benefits of making the kernel width selection easier, reducing the training effort, and preventing outliers in the aggregate code distribution. Finally, we discuss the appropriate null distributions and provide thresholds for multivariate normality testing with the standardized MMD, leading to a number of easy rules of thumb for monitoring the progress of WAE training. Curiously, our MMD formula reveals a connection to the Baringhaus-Henze-Epps-Pulley (BHEP) statistic of the Henze-Zirkler test and provides further insights about the MMD. Our experiments on synthetic and real data show that the analytic formulation improves over the commonly used stochastic approximation of the MMD, and demonstrate that code normalization provides significant benefits when training WAEs. |
Tasks | |
Published | 2019-01-10 |
URL | http://arxiv.org/abs/1901.03227v1 |
PDF | http://arxiv.org/pdf/1901.03227v1.pdf |
PWC | https://paperswithcode.com/paper/closed-form-expressions-for-maximum-mean |
Repo | |
Framework | |
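The Gaussian-kernel expectations against a standard normal that make a closed form possible are classical. Below is a sketch of the resulting plain (biased, V-statistic) estimator, assuming the kernel parameterization k(x, y) = exp(-||x - y||^2 / (2 gamma^2)); the paper's standardized MMD and null-distribution thresholds are not reproduced here.

```python
import numpy as np

def mmd2_to_standard_normal(X, gamma=1.0):
    """V-statistic estimate of squared Gaussian-kernel MMD between the sample
    X (n x d) and N(0, I_d), using analytic kernel expectations."""
    n, d = X.shape
    g2 = gamma ** 2
    # E_{z,z' ~ N(0,I)}[k(z, z')] in closed form:
    Ezz = (g2 / (g2 + 2.0)) ** (d / 2.0)
    # E_{z ~ N(0,I)}[k(x_i, z)] in closed form, per sample point:
    Exz = (g2 / (g2 + 1.0)) ** (d / 2.0) * np.exp(
        -np.sum(X ** 2, axis=1) / (2.0 * (g2 + 1.0)))
    # Sample-sample term (stochastic part that remains):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    Exx = np.exp(-sq / (2.0 * g2)).mean()
    return Exx - 2.0 * Exz.mean() + Ezz

rng = np.random.default_rng(4)
print("normal sample :", mmd2_to_standard_normal(rng.normal(size=(500, 2))))
print("shifted sample:", mmd2_to_standard_normal(rng.normal(size=(500, 2)) + 1.0))
```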
Beta Survival Models
Title | Beta Survival Models |
Authors | David Hubbard, Benoit Rostykus, Yves Raimond, Tony Jebara |
Abstract | This article analyzes the problem of estimating the time until an event occurs, also known as survival modeling. We observe through substantial experiments on large real-world datasets and use-cases that populations are largely heterogeneous. Sub-populations have different mean and variance in their survival rates requiring flexible models that capture heterogeneity. We leverage a classical extension of the logistic function into the survival setting to characterize unobserved heterogeneity using the beta distribution. This yields insights into the geometry of the problem as well as efficient estimation methods for linear, tree and neural network models that adjust the beta distribution based on observed covariates. We also show that the additional information captured by the beta distribution leads to interesting ranking implications as we determine who is most-at-risk. We show theoretically that the ranking is variable as we forecast forward in time and prove that pairwise comparisons of survival remain transitive. Empirical results using large-scale datasets across two use-cases (online conversions and retention modeling), demonstrate the competitiveness of the method. The simplicity of the method and its ability to capture skew in the data makes it a viable alternative to standard techniques particularly when we are interested in the time to event and when the underlying probabilities are heterogeneous. |
Tasks | |
Published | 2019-05-09 |
URL | https://arxiv.org/abs/1905.03818v1 |
PDF | https://arxiv.org/pdf/1905.03818v1.pdf |
PWC | https://paperswithcode.com/paper/beta-survival-models |
Repo | |
Framework | |
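The classical construction the paper builds on is the beta-geometric model: each subject's per-period event probability is drawn from a Beta(a, b), which gives the survival function in closed form, S(t) = B(a, b + t) / B(a, b). A minimal sketch of that closed form follows; the paper's covariate-dependent (linear, tree, and neural) parameterizations of a and b are omitted.

```python
import numpy as np
from scipy.special import betaln

def survival(t, a, b):
    """P(T > t) = E_p[(1 - p)^t] for p ~ Beta(a, b), i.e. B(a, b+t) / B(a, b),
    computed in log space for numerical stability."""
    return np.exp(betaln(a, b + t) - betaln(a, b))

t = np.arange(0, 12)
# Same mean hazard a / (a + b) = 0.25, different heterogeneity:
print(survival(t, 1.0, 3.0))    # high variance in p: heavy survival tail
print(survival(t, 10.0, 30.0))  # nearly homogeneous population
```

The comparison illustrates the abstract's point: two populations with the same mean rate but different heterogeneity have visibly different survival curves, which is what drives the ranking behavior over time.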
Understanding Social Networks using Transfer Learning
Title | Understanding Social Networks using Transfer Learning |
Authors | Jun Sun, Steffen Staab, Jérôme Kunegis |
Abstract | A detailed understanding of users contributes to the understanding of the Web’s evolution, and to the development of Web applications. Although for new Web platforms such a study is especially important, it is often jeopardized by the lack of knowledge about novel phenomena due to the sparsity of data. Akin to human transfer of experiences from one domain to the next, transfer learning as a subfield of machine learning adapts knowledge acquired in one domain to a new domain. We systematically investigate how the concept of transfer learning may be applied to the study of users on newly created (emerging) Web platforms, and propose our transfer learning-based approach, TraNet. We show two use cases where TraNet is applied to tasks involving the identification of user trust and roles on different Web platforms. We compare the performance of TraNet with other approaches and find that our approach can best transfer knowledge on users across platforms in the given tasks. |
Tasks | Transfer Learning |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07918v1 |
PDF | https://arxiv.org/pdf/1910.07918v1.pdf |
PWC | https://paperswithcode.com/paper/understanding-social-networks-using-transfer |
Repo | |
Framework | |