Paper Group ANR 395
Does Weather Matter? Causal Analysis of TV Logs
Title | Does Weather Matter? Causal Analysis of TV Logs |
Authors | Shi Zong, Branislav Kveton, Shlomo Berkovsky, Azin Ashkan, Nikos Vlassis, Zheng Wen |
Abstract | Weather affects our mood and behaviors, and many aspects of our life. When it is sunny, most people become happier; but when it rains, some people get depressed. Despite this evidence and the abundance of data, weather has mostly been overlooked in machine learning and data science research. This work presents a causal analysis of how weather affects TV watching patterns. We show that some weather attributes, such as pressure and precipitation, cause major changes in TV watching patterns. To the best of our knowledge, this is the first large-scale causal study of the impact of weather on TV watching patterns. |
Tasks | |
Published | 2017-01-25 |
URL | http://arxiv.org/abs/1701.08716v2 |
PDF | http://arxiv.org/pdf/1701.08716v2.pdf |
PWC | https://paperswithcode.com/paper/does-weather-matter-causal-analysis-of-tv |
Repo | |
Framework | |
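The causal question in the abstract can be made concrete with a deliberately simple adjustment estimator. This is not the paper's method; it is a toy stratified difference-in-means on synthetic data, adjusting only for a hypothetical weekday confounder, just to show the shape of such an analysis.

```python
import random

def stratified_effect(records):
    """Naive stratified difference-in-means: within each stratum
    (here a hypothetical weekday variable) compare mean watch time on
    rainy vs. dry days, then average the per-stratum differences.
    This adjusts only for the stratifying variable, nothing else."""
    strata = {}
    for stratum, rainy, minutes in records:
        strata.setdefault(stratum, {True: [], False: []})[rainy].append(minutes)
    diffs = []
    for groups in strata.values():
        if groups[True] and groups[False]:
            diffs.append(sum(groups[True]) / len(groups[True])
                         - sum(groups[False]) / len(groups[False]))
    return sum(diffs) / len(diffs)

# Synthetic logs: rain adds ~20 minutes on top of a weekday effect.
random.seed(0)
data = [(day, rainy, 100.0 + 10.0 * day + (20.0 if rainy else 0.0)
         + random.gauss(0.0, 1.0))
        for day in range(7) for rainy in (True, False) for _ in range(50)]
print(round(stratified_effect(data), 1))
```

On this synthetic data the estimator recovers the planted ~20-minute effect despite the weekday shift, because the comparison is made within strata.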
Scale-invariant temporal history (SITH): optimal slicing of the past in an uncertain world
Title | Scale-invariant temporal history (SITH): optimal slicing of the past in an uncertain world |
Authors | Tyler A. Spears, Brandon G. Jacques, Marc W. Howard, Per B. Sederberg |
Abstract | In both the human brain and any general artificial intelligence (AI), a representation of the past is necessary to predict the future. However, perfect storage of all experiences is not feasible. One approach utilized in many applications, including reward prediction in reinforcement learning, is to retain recently active features of experience in a buffer. Despite its prior successes, we show that the fixed length buffer renders Deep Q-learning Networks (DQNs) fragile to changes in the scale over which information can be learned. To enable learning when the relevant temporal scales in the environment are not known a priori, recent advances in psychology and neuroscience suggest that the brain maintains a compressed representation of the past. Here we introduce a neurally-plausible, scale-free memory representation we call Scale-Invariant Temporal History (SITH) for use with artificial agents. This representation covers an exponentially large period of time by sacrificing temporal accuracy for events further in the past. We demonstrate the utility of this representation by comparing the performance of agents given SITH, buffer, and exponential decay representations in learning to play video games at different levels of complexity. In these environments, SITH exhibits better learning performance by storing information for longer timescales than a fixed-size buffer, and representing this information more clearly than a set of exponentially decayed features. Finally, we discuss how the application of SITH, along with other human-inspired models of cognition, could improve reinforcement and machine learning algorithms in general. |
Tasks | Q-Learning |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.07165v3 |
PDF | http://arxiv.org/pdf/1712.07165v3.pdf |
PWC | https://paperswithcode.com/paper/scale-invariant-temporal-history-sith-optimal |
Repo | |
Framework | |
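The core idea, fine temporal resolution for the recent past and coarse resolution for the distant past, can be sketched without the paper's neurally plausible machinery. Below is a toy compression that averages past observations into exponentially growing windows; the actual SITH representation is built from scale-invariant leaky integrators, not simple windowed means.

```python
def log_spaced_history(seq, n_slots, base=2):
    """Compress a sequence's past into n_slots with exponentially
    growing windows: slot k averages the base**k items that precede
    the items covered by the earlier slots.  The recent past is kept
    at fine resolution; the distant past is increasingly blurred,
    covering an exponentially large span with few slots."""
    out, i = [], len(seq)
    for k in range(n_slots):
        width = base ** k
        lo = max(0, i - width)
        window = seq[lo:i]
        out.append(sum(window) / len(window) if window else 0.0)
        i = lo
    return out  # out[0] is the most recent slot

print(log_spaced_history(list(range(15)), 4))  # -> [14.0, 12.5, 9.5, 3.5]
```

Four slots here summarize fifteen time steps: the last step exactly, then two, four, and eight steps at ever coarser resolution.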
A Method for Determining Weights of Criterias and Alternative of Fuzzy Group Decision Making Problem
Title | A Method for Determining Weights of Criterias and Alternative of Fuzzy Group Decision Making Problem |
Authors | Jon JaeGyong, Mun JongHui, Ryang GyongIl |
Abstract | In this paper, we construct a model for determining the weights of criteria, and present a method for selecting the optimal alternative by combining this model with an analysis of the relationships between criteria, in fuzzy group decision-making problems where decision makers provide preference information about the criteria in different forms. |
Tasks | Decision Making |
Published | 2017-05-16 |
URL | http://arxiv.org/abs/1705.05515v1 |
PDF | http://arxiv.org/pdf/1705.05515v1.pdf |
PWC | https://paperswithcode.com/paper/a-method-for-determining-weights-of-criterias |
Repo | |
Framework | |
A proposal for ethically traceable artificial intelligence
Title | A proposal for ethically traceable artificial intelligence |
Authors | Christopher A. Tucker |
Abstract | Although the problem of critiquing robotic behavior for near-unanimous agreement with human norms seems intractable, a starting point for such an ambition is a framework in which knowledge a priori and experience a posteriori are collected and categorized as a set of synthetical judgments available to the intelligence, translated into computer code. If such a proposal were successful, an algorithm with ethically traceable behavior and cogent equivalence to human cognition would be established. This paper proposes the application of Kant’s critique of reason to current programming constructs of an autonomous intelligent system. |
Tasks | |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.01908v2 |
PDF | http://arxiv.org/pdf/1703.01908v2.pdf |
PWC | https://paperswithcode.com/paper/a-proposal-for-ethically-traceable-artificial |
Repo | |
Framework | |
Learning Independent Causal Mechanisms
Title | Learning Independent Causal Mechanisms |
Authors | Giambattista Parascandolo, Niki Kilbertus, Mateo Rojas-Carulla, Bernhard Schölkopf |
Abstract | Statistical learning relies upon data sampled from a distribution, and we usually do not care what actually generated it in the first place. From the point of view of causal modeling, the structure of each distribution is induced by physical mechanisms that give rise to dependences between observables. Mechanisms, however, can be meaningful autonomous modules of generative models that make sense beyond a particular entailed data distribution, lending themselves to transfer between problems. We develop an algorithm to recover a set of independent (inverse) mechanisms from a set of transformed data points. The approach is unsupervised and based on a set of experts that compete for data generated by the mechanisms, driving specialization. We analyze the proposed method in a series of experiments on image data. Each expert learns to map a subset of the transformed data back to a reference distribution. The learned mechanisms generalize to novel domains. We discuss implications for transfer learning and links to recent trends in generative modeling. |
Tasks | Transfer Learning |
Published | 2017-12-04 |
URL | http://arxiv.org/abs/1712.00961v5 |
PDF | http://arxiv.org/pdf/1712.00961v5.pdf |
PWC | https://paperswithcode.com/paper/learning-independent-causal-mechanisms |
Repo | |
Framework | |
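A toy, one-dimensional version of the competition makes the mechanism concrete. Here the unknown mechanisms are scalar shifts, each "expert" is just a learnable inverse offset, and the discriminator is replaced by distance to the reference mean; the paper's experts are neural networks trained adversarially on images. All names and constants below are illustrative.

```python
import random

random.seed(1)
REF_MEAN = 0.0                     # reference distribution: N(0, 1)
shifts = [5.0, -3.0]               # two unknown mechanisms (translations)
data = [random.gauss(REF_MEAN, 1.0) + random.choice(shifts)
        for _ in range(2000)]

experts = [0.0, 1.0]               # learnable inverse offsets
lr = 0.05
for x in data:
    outs = [x + e for e in experts]
    # The winning expert is the one whose output looks most like the
    # reference (here: closest to its mean, standing in for a
    # discriminator score).
    winner = min(range(len(experts)), key=lambda k: abs(outs[k] - REF_MEAN))
    # Only the winner trains on this point, driving specialization.
    experts[winner] -= lr * (outs[winner] - REF_MEAN)

print(sorted(round(e, 1) for e in experts))
```

Each expert ends up inverting exactly one mechanism (offsets near -5 and +3): winner-take-all training prevents a single expert from averaging over both transformations.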
Learning linear structural equation models in polynomial time and sample complexity
Title | Learning linear structural equation models in polynomial time and sample complexity |
Authors | Asish Ghoshal, Jean Honorio |
Abstract | The problem of learning structural equation models (SEMs) from data is a fundamental problem in causal inference. We develop a new algorithm, which is computationally and statistically efficient and works in the high-dimensional regime, for learning linear SEMs from purely observational data with arbitrary noise distribution. We consider three aspects of the problem: identifiability, computational efficiency, and statistical efficiency. We show that when data is generated from a linear SEM over $p$ nodes and maximum degree $d$, our algorithm recovers the directed acyclic graph (DAG) structure of the SEM under an identifiability condition that is more general than those considered in the literature, and without faithfulness assumptions. In the population setting, our algorithm recovers the DAG structure in $\mathcal{O}(p(d^2 + \log p))$ operations. In the finite sample setting, if the estimated precision matrix is sparse, our algorithm has a smoothed complexity of $\widetilde{\mathcal{O}}(p^3 + pd^7)$, while if the estimated precision matrix is dense, our algorithm has a smoothed complexity of $\widetilde{\mathcal{O}}(p^5)$. For sub-Gaussian noise, we show that our algorithm has a sample complexity of $\mathcal{O}(\frac{d^8}{\varepsilon^2} \log (\frac{p}{\sqrt{\delta}}))$ to achieve $\varepsilon$ element-wise additive error with respect to the true autoregression matrix with probability at least $1 - \delta$, while for noise with bounded $(4m)$-th moment, with $m$ being a positive integer, our algorithm has a sample complexity of $\mathcal{O}(\frac{d^8}{\varepsilon^2} (\frac{p^2}{\delta})^{1/m})$. |
Tasks | Causal Inference |
Published | 2017-07-15 |
URL | http://arxiv.org/abs/1707.04673v1 |
PDF | http://arxiv.org/pdf/1707.04673v1.pdf |
PWC | https://paperswithcode.com/paper/learning-linear-structural-equation-models-in |
Repo | |
Framework | |
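The population-setting claim can be illustrated in the special case of equal noise variances, where a terminal vertex of the DAG attains the minimum diagonal entry of the precision matrix $\Theta = (I-B)^\top D^{-1} (I-B)$, and removing it corresponds to a Schur complement. The sketch below is a simplified reading of that idea, not the paper's full algorithm, which handles estimated, sparse precision matrices and a more general identifiability condition.

```python
def recover_order(theta):
    """Recover a topological order of a linear SEM with equal noise
    variances from its population precision matrix: the smallest
    diagonal entry marks a terminal vertex; marginalize it out with a
    Schur complement and repeat."""
    n = len(theta)
    remaining = list(range(n))
    elim = []
    while remaining:
        t = min(remaining, key=lambda i: theta[i][i])
        elim.append(t)
        remaining.remove(t)
        # Schur complement: precision matrix of the remaining variables.
        theta = [[theta[a][b] - theta[a][t] * theta[t][b] / theta[t][t]
                  for b in range(n)] for a in range(n)]
    return elim[::-1]  # reverse elimination order = topological order

# Chain x0 -> x1 -> x2 (edge weights 0.8 and -0.6, unit noise variances):
# Theta = (I - B)^T (I - B).
theta = [[1.64, -0.8, 0.0],
         [-0.8, 1.36, 0.6],
         [0.0,  0.6,  1.0]]
print(recover_order(theta))  # -> [0, 1, 2]
```

For a terminal vertex $i$ with unit noise, $\Theta_{ii} = 1$, while any vertex with a child picks up an extra $\sum_k B_{ki}^2 > 0$ on its diagonal, which is why the argmin is safe here.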
Determinants of Mobile Money Adoption in Pakistan
Title | Determinants of Mobile Money Adoption in Pakistan |
Authors | Muhammad Raza Khan, Joshua Blumenstock |
Abstract | In this work, we analyze the adoption of mobile money in Pakistan, using the call detail records of a major telecom company as our input. Our results highlight the fact that different sections of society show different patterns of adoption of digital financial services, but that user-mobility-related features are the most important ones when it comes to adopting and using mobile money services. |
Tasks | |
Published | 2017-11-13 |
URL | http://arxiv.org/abs/1712.01081v1 |
PDF | http://arxiv.org/pdf/1712.01081v1.pdf |
PWC | https://paperswithcode.com/paper/determinants-of-mobile-money-adoption-in |
Repo | |
Framework | |
Learning Hard Alignments with Variational Inference
Title | Learning Hard Alignments with Variational Inference |
Authors | Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly |
Abstract | There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition. Hard attention can offer benefits over soft attention such as decreased computational cost, but training hard attention models can be difficult because of the discrete latent variables they introduce. Previous work used REINFORCE and Q-learning to approach these issues, but those methods can provide high-variance gradient estimates and be slow to train. In this paper, we tackle the problem of learning hard attention for a sequential task using variational inference methods, specifically the recently introduced VIMCO and NVIL. Furthermore, we propose a novel baseline that adapts VIMCO to this setting. We demonstrate our method on a phoneme recognition task in clean and noisy environments and show that our method outperforms REINFORCE, with the difference being greater for a more complicated task. |
Tasks | Image Captioning, Object Recognition, Q-Learning, Speech Recognition |
Published | 2017-05-16 |
URL | http://arxiv.org/abs/1705.05524v2 |
PDF | http://arxiv.org/pdf/1705.05524v2.pdf |
PWC | https://paperswithcode.com/paper/learning-hard-alignments-with-variational |
Repo | |
Framework | |
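The VIMCO estimator mentioned above uses the multi-sample bound $L = \log \frac{1}{K}\sum_i w_i$ and gives each sample a baseline built from the other samples' weights (their geometric mean). A minimal sketch of the per-sample learning signals, the terms that multiply $\nabla \log q(h_j\mid x)$ in the score-function gradient, might look like this; the surrounding model and gradient bookkeeping are omitted.

```python
import math

def logmeanexp(ls):
    """Numerically stable log of the mean of exp(l) over a list."""
    m = max(ls)
    return m + math.log(sum(math.exp(l - m) for l in ls) / len(ls))

def vimco_signals(logw):
    """Per-sample VIMCO learning signals for L = log((1/K) sum_i w_i):
    each sample's baseline re-evaluates the bound with its own
    log-weight replaced by the mean of the others' log-weights,
    i.e. the geometric mean of the other weights."""
    K = len(logw)
    L = logmeanexp(logw)
    signals = []
    for j in range(K):
        rest = [logw[i] for i in range(K) if i != j]
        swapped = list(logw)
        swapped[j] = sum(rest) / len(rest)
        signals.append(L - logmeanexp(swapped))
    return signals

# If all weights are equal, every baseline matches L exactly:
print(vimco_signals([0.5, 0.5, 0.5]))  # -> [0.0, 0.0, 0.0]
```

A sample whose weight exceeds the others gets a positive signal, so the variational distribution is pushed toward it; the zero signal under equal weights is what keeps the estimator's variance low.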
The information bottleneck and geometric clustering
Title | The information bottleneck and geometric clustering |
Authors | D J Strouse, David J Schwab |
Abstract | The information bottleneck (IB) approach to clustering takes a joint distribution $P\!\left(X,Y\right)$ and maps the data $X$ to cluster labels $T$ which retain maximal information about $Y$ (Tishby et al., 1999). This objective results in an algorithm that clusters data points based upon the similarity of their conditional distributions $P\!\left(Y\mid X\right)$. This is in contrast to classic “geometric clustering” algorithms such as $k$-means and Gaussian mixture models (GMMs) which take a set of observed data points $\left\{ \mathbf{x}_{i}\right\}_{i=1:N}$ and cluster them based upon their geometric (typically Euclidean) distance from one another. Here, we show how to use the deterministic information bottleneck (DIB) (Strouse and Schwab, 2017), a variant of IB, to perform geometric clustering, by choosing cluster labels that preserve information about data point location on a smoothed dataset. We also introduce a novel intuitive method to choose the number of clusters, via kinks in the information curve. We apply this approach to a variety of simple clustering problems, showing that DIB with our model selection procedure recovers the generative cluster labels. We also show that, for one simple case, DIB interpolates between the cluster boundaries of GMMs and $k$-means in the large data limit. Thus, our IB approach to clustering also provides an information-theoretic perspective on these classic algorithms. |
Tasks | Model Selection |
Published | 2017-12-27 |
URL | http://arxiv.org/abs/1712.09657v1 |
PDF | http://arxiv.org/pdf/1712.09657v1.pdf |
PWC | https://paperswithcode.com/paper/the-information-bottleneck-and-geometric |
Repo | |
Framework | |
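The model-selection idea, choosing the number of clusters at a kink in the information curve, can be sketched generically: with one information value per candidate cluster count, the kink is where the marginal gain collapses. This is an illustrative second-difference heuristic on made-up curve values, not the exact procedure in the paper.

```python
def kink(values):
    """Locate the most pronounced kink in a concave curve of
    information values (one value per candidate cluster count,
    starting at 1 cluster): the index where the marginal gain
    drops the most, i.e. the largest negative second difference."""
    gains = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    drops = [gains[i] - gains[i + 1] for i in range(len(gains) - 1)]
    return drops.index(max(drops)) + 2  # +2: cluster count at the kink

# I(T;Y) saturates after 3 clusters -> the kink picks 3.
print(kink([0.0, 0.9, 1.5, 1.55, 1.58]))  # -> 3
```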
A Recursive Bayesian Approach To Describe Retinal Vasculature Geometry
Title | A Recursive Bayesian Approach To Describe Retinal Vasculature Geometry |
Authors | Fatmatulzehra Uslu, Anil Anthony Bharath |
Abstract | Demographic studies suggest that changes in the retinal vasculature geometry, especially in vessel width, are associated with the incidence or progression of eye-related or systemic diseases. To date, the main information source for width estimation from fundus images has been the intensity profile between vessel edges. However, there are many factors affecting the intensity profile: pathologies, the central light reflex and local illumination levels, to name a few. In this study, we introduce three information sources for width estimation. These are the probability profiles of vessel interior, centreline and edge locations generated by a deep network. The probability profiles provide direct access to vessel geometry and are used in the likelihood calculation for a Bayesian method, particle filtering. We also introduce a geometric model which can handle non-ideal conditions of the probability profiles. Our experiments conducted on the REVIEW dataset yielded consistent estimates of vessel width, even in cases when one of the vessel edges is difficult to identify. Moreover, our results suggest that the method is better than human observers at locating edges of low contrast vessels. |
Tasks | |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10521v1 |
PDF | http://arxiv.org/pdf/1711.10521v1.pdf |
PWC | https://paperswithcode.com/paper/a-recursive-bayesian-approach-to-describe |
Repo | |
Framework | |
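Particle filtering, the Bayesian workhorse of the method, can be sketched in one dimension. This is a generic bootstrap filter for a random-walk state observed with Gaussian noise, not the paper's likelihood built from deep-network probability profiles; all constants are illustrative.

```python
import math
import random

def particle_filter(obs, n=500, q=0.1, r=0.5):
    """Minimal bootstrap particle filter for a 1-D random-walk state:
    predict with process noise q, weight each particle by the Gaussian
    observation likelihood (noise r), resample in proportion to the
    weights, and report the posterior mean at every step."""
    rng = random.Random(0)
    parts = [rng.uniform(0.0, 10.0) for _ in range(n)]
    means = []
    for y in obs:
        parts = [p + rng.gauss(0.0, q) for p in parts]             # predict
        w = [math.exp(-0.5 * ((y - p) / r) ** 2) for p in parts]   # weight
        parts = rng.choices(parts, weights=w, k=n)                 # resample
        means.append(sum(parts) / n)
    return means

# Noisy measurements of a constant vessel width of 4.0 pixels.
meas_rng = random.Random(1)
obs = [4.0 + meas_rng.gauss(0.0, 0.5) for _ in range(30)]
est = particle_filter(obs)
print(round(est[-1], 1))
```

The filtered mean settles near the true width even though each individual measurement is noisy, which is the behavior the paper relies on when tracking vessel edges along the centreline.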
Mining a Sub-Matrix of Maximal Sum
Title | Mining a Sub-Matrix of Maximal Sum |
Authors | Vincent Branders, Pierre Schaus, Pierre Dupont |
Abstract | Biclustering techniques have been widely used to identify homogeneous subgroups within large data matrices, such as subsets of genes similarly expressed across subsets of patients. Mining a max-sum sub-matrix is a related but distinct problem for which one looks for a (not necessarily contiguous) rectangular sub-matrix with a maximal sum of its entries. Le Van et al. (Ranked Tiling, 2014) already illustrated its applicability to gene expression analysis and addressed it with a constraint programming (CP) approach combined with large neighborhood search (CP-LNS). In this work, we exhibit some key properties of this NP-hard problem and define a bounding function such that larger problems can be solved in reasonable time. Two different algorithms are proposed in order to exploit the highlighted characteristics of the problem: a CP approach with a global constraint (CPGC) and mixed integer linear programming (MILP). Practical experiments conducted both on synthetic and real gene expression data exhibit the characteristics of these approaches and their relative benefits over the original CP-LNS method. Overall, the CPGC approach tends to be the fastest to produce a good solution. Yet, the MILP formulation is arguably the easiest to formulate and can also be competitive. |
Tasks | |
Published | 2017-09-25 |
URL | http://arxiv.org/abs/1709.08461v1 |
PDF | http://arxiv.org/pdf/1709.08461v1.pdf |
PWC | https://paperswithcode.com/paper/mining-a-sub-matrix-of-maximal-sum |
Repo | |
Framework | |
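A useful property of the problem is that, with the column set fixed, the optimal row set is exactly the rows with positive sum over the selected columns (and symmetrically for columns). An alternating heuristic built on that observation looks as follows; unlike the exact CP and MILP approaches in the paper, it can stop at a local optimum.

```python
def max_sum_submatrix(m, iters=20):
    """Alternating local search for the max-sum sub-matrix problem:
    given the current column set, keep exactly the rows with a
    positive sum over those columns, then symmetrically re-select
    the columns, and iterate to a fixed point."""
    cols = list(range(len(m[0])))
    rows = []
    for _ in range(iters):
        rows = [i for i in range(len(m))
                if sum(m[i][j] for j in cols) > 0]
        cols = [j for j in range(len(m[0]))
                if sum(m[i][j] for i in rows) > 0]
    total = sum(m[i][j] for i in rows for j in cols)
    return rows, cols, total

m = [[ 3, -1,  2],
     [-2,  4, -1],
     [-1, -2, -1]]
print(max_sum_submatrix(m))  # -> ([0, 1], [0, 1, 2], 5)
```

Here the search drops the all-negative third row and keeps every column, reaching an optimal sub-matrix of sum 5 for this small instance.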
The Effectiveness of Data Augmentation for Detection of Gastrointestinal Diseases from Endoscopical Images
Title | The Effectiveness of Data Augmentation for Detection of Gastrointestinal Diseases from Endoscopical Images |
Authors | Andrea Asperti, Claudio Mastronardo |
Abstract | The lack, due to privacy concerns, of large public databases of medical pathologies is a well-known and major problem, substantially hindering the application of deep learning techniques in this field. In this article, we investigate the possibility of compensating for the scarcity of data by means of data augmentation techniques, working on the recent Kvasir dataset of endoscopical images of gastrointestinal diseases. The dataset comprises 4,000 colored images labeled and verified by medical endoscopists, covering a few common pathologies at different anatomical landmarks: Z-line, pylorus and cecum. We show how the application of data augmentation techniques yields substantial improvements in classification with respect to previous approaches, both in terms of precision and recall. |
Tasks | Data Augmentation |
Published | 2017-12-11 |
URL | http://arxiv.org/abs/1712.03689v1 |
PDF | http://arxiv.org/pdf/1712.03689v1.pdf |
PWC | https://paperswithcode.com/paper/the-effectiveness-of-data-augmentation-for |
Repo | |
Framework | |
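A minimal flavor of label-preserving augmentation on a tiny image (represented as nested lists) might look like this; the paper evaluates a richer set of transformations on the Kvasir images, and the specific variants below are merely illustrative.

```python
def augment(img):
    """Generate simple label-preserving variants of an image given as
    a list of rows: the original, a horizontal flip, a vertical flip,
    and a 90-degree clockwise rotation.  Each variant can be fed to
    the classifier as an extra training example."""
    hflip = [row[::-1] for row in img]            # mirror left-right
    vflip = img[::-1]                             # mirror top-bottom
    rot90 = [list(col) for col in zip(*img[::-1])]  # rotate clockwise
    return [img, hflip, vflip, rot90]

img = [[1, 2],
       [3, 4]]
for variant in augment(img):
    print(variant)
```

One labeled image becomes four, which is exactly how augmentation inflates a small medical dataset without new annotation effort.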
Learning a Complete Image Indexing Pipeline
Title | Learning a Complete Image Indexing Pipeline |
Authors | Himalaya Jain, Joaquin Zepeda, Patrick Pérez, Rémi Gribonval |
Abstract | To work at scale, a complete image indexing system comprises two components: an inverted file index, which restricts the actual search to only a subset that should contain most of the items relevant to the query, and an approximate distance computation mechanism to rapidly scan these lists. While supervised deep learning has recently enabled improvements to the latter, the former continues to be based on unsupervised clustering in the literature. In this work, we propose a first system that learns both components within a unifying neural framework of structured binary encoding. |
Tasks | |
Published | 2017-12-12 |
URL | http://arxiv.org/abs/1712.04480v1 |
PDF | http://arxiv.org/pdf/1712.04480v1.pdf |
PWC | https://paperswithcode.com/paper/learning-a-complete-image-indexing-pipeline |
Repo | |
Framework | |
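The two components can be sketched with plain nearest-centroid quantization: an inverted file built from a coarse codebook, and a search that scans only the probed lists. In the paper both stages are learned jointly; here the codebook is simply assumed given (e.g. from k-means) and exact distances stand in for the approximate ones.

```python
def dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Inverted file: each vector id is filed under its nearest
    coarse centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        c = min(range(len(centroids)), key=lambda k: dist(v, centroids[k]))
        lists[c].append(i)
    return lists

def search(q, vectors, centroids, lists, nprobe=1):
    """Scan only the nprobe inverted lists whose centroids are closest
    to the query, then rank those candidates by exact distance; the
    remaining lists are never touched."""
    probed = sorted(range(len(centroids)),
                    key=lambda k: dist(q, centroids[k]))[:nprobe]
    cand = [i for c in probed for i in lists[c]]
    return min(cand, key=lambda i: dist(q, vectors[i]))

vectors = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0)]
centroids = [(0.1, 0.05), (5.0, 5.05)]   # coarse codebook, e.g. from k-means
lists = build_ivf(vectors, centroids)
print(search((4.95, 5.15), vectors, centroids, lists))  # -> 2
```

The query only visits the second inverted list (two candidates instead of four), which is the scaling benefit the inverted index provides.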
On Residual CNN in text-dependent speaker verification task
Title | On Residual CNN in text-dependent speaker verification task |
Authors | Egor Malykh, Sergey Novoselov, Oleg Kudashev |
Abstract | Deep learning approaches are still not very common in the speaker verification field. We investigate the possibility of using a deep residual convolutional neural network with spectrograms as input features in the text-dependent speaker verification task. Although we were not able to surpass the baseline system in quality, we achieved quite good results for such a new approach, obtaining a 5.23% EER on the RSR2015 evaluation part. Fusion of the baseline and proposed systems outperformed the best individual system by 18% relative. |
Tasks | Speaker Verification, Text-Dependent Speaker Verification |
Published | 2017-05-29 |
URL | http://arxiv.org/abs/1705.10134v2 |
PDF | http://arxiv.org/pdf/1705.10134v2.pdf |
PWC | https://paperswithcode.com/paper/on-residual-cnn-in-text-dependent-speaker |
Repo | |
Framework | |
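The verification metric above, the equal error rate, can be made concrete with a simple threshold sweep over hypothetical genuine/impostor score lists; score-level fusion would amount to scoring each trial with a weighted sum of the two systems' scores before running the same computation. The scores below are invented for illustration.

```python
def eer(genuine, impostor):
    """Equal error rate for a verification system: sweep score
    thresholds and return the operating point where the false-accept
    rate (impostors accepted) meets the false-reject rate (genuine
    trials rejected)."""
    best_gap, best_rate = float("inf"), None
    for t in sorted(set(genuine + impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, best_rate = abs(far - frr), (far + frr) / 2
    return best_rate

genuine  = [0.9, 0.8, 0.75, 0.6, 0.4]   # hypothetical target-trial scores
impostor = [0.7, 0.5, 0.45, 0.3, 0.2]   # hypothetical non-target scores
print(eer(genuine, impostor))  # -> 0.2
```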
Skyline Identification in Multi-Armed Bandits
Title | Skyline Identification in Multi-Armed Bandits |
Authors | Albert Cheu, Ravi Sundaram, Jonathan Ullman |
Abstract | We introduce a variant of the classical PAC multi-armed bandit problem. There is an ordered set of $n$ arms $A[1],\dots,A[n]$, each with some stochastic reward drawn from some unknown bounded distribution. The goal is to identify the $skyline$ of the set $A$, consisting of all arms $A[i]$ such that $A[i]$ has larger expected reward than all lower-numbered arms $A[1],\dots,A[i-1]$. We define a natural notion of an $\varepsilon$-approximate skyline and prove matching upper and lower bounds for identifying an $\varepsilon$-skyline. Specifically, we show that in order to identify an $\varepsilon$-skyline from among $n$ arms with probability $1-\delta$, $$ \Theta\bigg(\frac{n}{\varepsilon^2} \cdot \min\bigg\{ \log\bigg(\frac{1}{\varepsilon \delta}\bigg), \log\bigg(\frac{n}{\delta}\bigg) \bigg\} \bigg) $$ samples are necessary and sufficient. When $\varepsilon \gg 1/n$, our results improve over the naive algorithm, which draws enough samples to approximate the expected reward of every arm; the algorithm of (Auer et al., AISTATS’16) for Pareto-optimal arm identification is likewise superseded. Our results show that the sample complexity of the skyline problem lies strictly in between that of best arm identification (Even-Dar et al., COLT’02) and that of approximating the expected reward of every arm. |
Tasks | Multi-Armed Bandits |
Published | 2017-11-12 |
URL | http://arxiv.org/abs/1711.04213v2 |
PDF | http://arxiv.org/pdf/1711.04213v2.pdf |
PWC | https://paperswithcode.com/paper/skyline-identification-in-multi-armed-bandits |
Repo | |
Framework | |
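Given the true means, the skyline itself is a single pass with a running maximum. The hard part, and the paper's contribution, is the sample complexity of identifying an $\varepsilon$-approximate skyline from noisy pulls, which this definitional sketch does not address.

```python
def skyline(means):
    """An arm is in the skyline iff its expected reward strictly
    exceeds that of every lower-numbered arm, so one pass with a
    running maximum suffices when the means are known."""
    out, best = [], float("-inf")
    for i, m in enumerate(means):
        if m > best:
            out.append(i)
            best = m
    return out

# Arm 3 beats arms 0-2; arm 4 only ties arm 3 and is excluded.
print(skyline([0.3, 0.5, 0.2, 0.9, 0.9]))  # -> [0, 1, 3]
```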