Paper Group ANR 395
Does Weather Matter? Causal Analysis of TV Logs
Title | Does Weather Matter? Causal Analysis of TV Logs |
Authors | Shi Zong, Branislav Kveton, Shlomo Berkovsky, Azin Ashkan, Nikos Vlassis, Zheng Wen |
Abstract | Weather affects our mood and behaviors, and many aspects of our life. When it is sunny, most people become happier; but when it rains, some people get depressed. Despite this evidence and the abundance of data, weather has mostly been overlooked in machine learning and data science research. This work presents a causal analysis of how weather affects TV watching patterns. We show that some weather attributes, such as pressure and precipitation, cause major changes in TV watching patterns. To the best of our knowledge, this is the first large-scale causal study of the impact of weather on TV watching patterns. |
Tasks | |
Published | 2017-01-25 |
URL | http://arxiv.org/abs/1701.08716v2 |
PDF | http://arxiv.org/pdf/1701.08716v2.pdf |
PWC | https://paperswithcode.com/paper/does-weather-matter-causal-analysis-of-tv |
Repo | |
Framework | |
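The causal question in the abstract can be made concrete with a deliberately simple adjustment estimator. This is not the paper's method; it is a toy stratified difference-in-means on synthetic data, adjusting only for a hypothetical weekday confounder, just to show the shape of such an analysis.

```python
import random

def stratified_effect(records):
    """Naive stratified difference-in-means: within each stratum
    (here a hypothetical weekday variable) compare mean watch time on
    rainy vs. dry days, then average the per-stratum differences.
    This adjusts only for the stratifying variable, nothing else."""
    strata = {}
    for stratum, rainy, minutes in records:
        strata.setdefault(stratum, {True: [], False: []})[rainy].append(minutes)
    diffs = []
    for groups in strata.values():
        if groups[True] and groups[False]:
            diffs.append(sum(groups[True]) / len(groups[True])
                         - sum(groups[False]) / len(groups[False]))
    return sum(diffs) / len(diffs)

# Synthetic logs: rain adds ~20 minutes on top of a weekday effect.
random.seed(0)
data = [(day, rainy, 100.0 + 10.0 * day + (20.0 if rainy else 0.0)
         + random.gauss(0.0, 1.0))
        for day in range(7) for rainy in (True, False) for _ in range(50)]
print(round(stratified_effect(data), 1))
```

On this synthetic data the estimator recovers the planted ~20-minute effect despite the weekday shift, because the comparison is made within strata.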
Scale-invariant temporal history (SITH): optimal slicing of the past in an uncertain world
Title | Scale-invariant temporal history (SITH): optimal slicing of the past in an uncertain world |
Authors | Tyler A. Spears, Brandon G. Jacques, Marc W. Howard, Per B. Sederberg |
Abstract | In both the human brain and any general artificial intelligence (AI), a representation of the past is necessary to predict the future. However, perfect storage of all experiences is not feasible. One approach utilized in many applications, including reward prediction in reinforcement learning, is to retain recently active features of experience in a buffer. Despite its prior successes, we show that the fixed length buffer renders Deep Q-learning Networks (DQNs) fragile to changes in the scale over which information can be learned. To enable learning when the relevant temporal scales in the environment are not known a priori, recent advances in psychology and neuroscience suggest that the brain maintains a compressed representation of the past. Here we introduce a neurally-plausible, scale-free memory representation we call Scale-Invariant Temporal History (SITH) for use with artificial agents. This representation covers an exponentially large period of time by sacrificing temporal accuracy for events further in the past. We demonstrate the utility of this representation by comparing the performance of agents given SITH, buffer, and exponential decay representations in learning to play video games at different levels of complexity. In these environments, SITH exhibits better learning performance by storing information for longer timescales than a fixed-size buffer, and representing this information more clearly than a set of exponentially decayed features. Finally, we discuss how the application of SITH, along with other human-inspired models of cognition, could improve reinforcement and machine learning algorithms in general. |
Tasks | Q-Learning |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.07165v3 |
PDF | http://arxiv.org/pdf/1712.07165v3.pdf |
PWC | https://paperswithcode.com/paper/scale-invariant-temporal-history-sith-optimal |
Repo | |
Framework | |
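The core idea, fine temporal resolution for the recent past and coarse resolution for the distant past, can be sketched without the paper's neurally plausible machinery. Below is a toy compression that averages past observations into exponentially growing windows; the actual SITH representation is built from scale-invariant leaky integrators, not simple windowed means.

```python
def log_spaced_history(seq, n_slots, base=2):
    """Compress a sequence's past into n_slots with exponentially
    growing windows: slot k averages the base**k items that precede
    the items covered by the earlier slots.  The recent past is kept
    at fine resolution; the distant past is increasingly blurred,
    covering an exponentially large span with few slots."""
    out, i = [], len(seq)
    for k in range(n_slots):
        width = base ** k
        lo = max(0, i - width)
        window = seq[lo:i]
        out.append(sum(window) / len(window) if window else 0.0)
        i = lo
    return out  # out[0] is the most recent slot

print(log_spaced_history(list(range(15)), 4))  # -> [14.0, 12.5, 9.5, 3.5]
```

Four slots here summarize fifteen time steps: the last step exactly, then two, four, and eight steps at ever coarser resolution.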
A Method for Determining Weights of Criterias and Alternative of Fuzzy Group Decision Making Problem
Title | A Method for Determining Weights of Criterias and Alternative of Fuzzy Group Decision Making Problem |
Authors | Jon JaeGyong, Mun JongHui, Ryang GyongIl |
Abstract | In this paper, we construct a model for determining the weights of criteria, and present a method for selecting the optimal alternative by combining this model with an analysis of the relationships between criteria, in fuzzy group decision-making problems where decision makers provide preference information about the criteria in different forms. |
Tasks | Decision Making |
Published | 2017-05-16 |
URL | http://arxiv.org/abs/1705.05515v1 |
PDF | http://arxiv.org/pdf/1705.05515v1.pdf |
PWC | https://paperswithcode.com/paper/a-method-for-determining-weights-of-criterias |
Repo | |
Framework | |
A proposal for ethically traceable artificial intelligence
Title | A proposal for ethically traceable artificial intelligence |
Authors | Christopher A. Tucker |
Abstract | Although the problem of critiquing robotic behavior for near-unanimous agreement with human norms seems intractable, a starting point for such an ambition is a framework in which knowledge a priori and experience a posteriori are collected and categorized as a set of synthetical judgments available to the intelligence, translated into computer code. If such a proposal were successful, an algorithm with ethically traceable behavior and cogent equivalence to human cognition would be established. This paper proposes the application of Kant’s critique of reason to current programming constructs of an autonomous intelligent system. |
Tasks | |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.01908v2 |
PDF | http://arxiv.org/pdf/1703.01908v2.pdf |
PWC | https://paperswithcode.com/paper/a-proposal-for-ethically-traceable-artificial |
Repo | |
Framework | |
Learning Independent Causal Mechanisms
Title | Learning Independent Causal Mechanisms |
Authors | Giambattista Parascandolo, Niki Kilbertus, Mateo Rojas-Carulla, Bernhard Schölkopf |
Abstract | Statistical learning relies upon data sampled from a distribution, and we usually do not care what actually generated it in the first place. From the point of view of causal modeling, the structure of each distribution is induced by physical mechanisms that give rise to dependences between observables. Mechanisms, however, can be meaningful autonomous modules of generative models that make sense beyond a particular entailed data distribution, lending themselves to transfer between problems. We develop an algorithm to recover a set of independent (inverse) mechanisms from a set of transformed data points. The approach is unsupervised and based on a set of experts that compete for data generated by the mechanisms, driving specialization. We analyze the proposed method in a series of experiments on image data. Each expert learns to map a subset of the transformed data back to a reference distribution. The learned mechanisms generalize to novel domains. We discuss implications for transfer learning and links to recent trends in generative modeling. |
Tasks | Transfer Learning |
Published | 2017-12-04 |
URL | http://arxiv.org/abs/1712.00961v5 |
PDF | http://arxiv.org/pdf/1712.00961v5.pdf |
PWC | https://paperswithcode.com/paper/learning-independent-causal-mechanisms |
Repo | |
Framework | |
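A toy, one-dimensional version of the competition makes the mechanism concrete. Here the unknown mechanisms are scalar shifts, each "expert" is just a learnable inverse offset, and the discriminator is replaced by distance to the reference mean; the paper's experts are neural networks trained adversarially on images. All names and constants below are illustrative.

```python
import random

random.seed(1)
REF_MEAN = 0.0                     # reference distribution: N(0, 1)
shifts = [5.0, -3.0]               # two unknown mechanisms (translations)
data = [random.gauss(REF_MEAN, 1.0) + random.choice(shifts)
        for _ in range(2000)]

experts = [0.0, 1.0]               # learnable inverse offsets
lr = 0.05
for x in data:
    outs = [x + e for e in experts]
    # The winning expert is the one whose output looks most like the
    # reference (here: closest to its mean, standing in for a
    # discriminator score).
    winner = min(range(len(experts)), key=lambda k: abs(outs[k] - REF_MEAN))
    # Only the winner trains on this point, driving specialization.
    experts[winner] -= lr * (outs[winner] - REF_MEAN)

print(sorted(round(e, 1) for e in experts))
```

Each expert ends up inverting exactly one mechanism (offsets near -5 and +3): winner-take-all training prevents a single expert from averaging over both transformations.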
Learning linear structural equation models in polynomial time and sample complexity
Title | Learning linear structural equation models in polynomial time and sample complexity |
Authors | Asish Ghoshal, Jean Honorio |
Abstract | The problem of learning structural equation models (SEMs) from data is a fundamental problem in causal inference. We develop a new algorithm, which is computationally and statistically efficient and works in the high-dimensional regime, for learning linear SEMs from purely observational data with arbitrary noise distribution. We consider three aspects of the problem: identifiability, computational efficiency, and statistical efficiency. We show that when data is generated from a linear SEM over $p$ nodes and maximum degree $d$, our algorithm recovers the directed acyclic graph (DAG) structure of the SEM under an identifiability condition that is more general than those considered in the literature, and without faithfulness assumptions. In the population setting, our algorithm recovers the DAG structure in $\mathcal{O}(p(d^2 + \log p))$ operations. In the finite sample setting, if the estimated precision matrix is sparse, our algorithm has a smoothed complexity of $\widetilde{\mathcal{O}}(p^3 + pd^7)$, while if the estimated precision matrix is dense, our algorithm has a smoothed complexity of $\widetilde{\mathcal{O}}(p^5)$. For sub-Gaussian noise, we show that our algorithm has a sample complexity of $\mathcal{O}(\frac{d^8}{\varepsilon^2} \log (\frac{p}{\sqrt{\delta}}))$ to achieve $\varepsilon$ element-wise additive error with respect to the true autoregression matrix with probability at least $1 - \delta$, while for noise with bounded $(4m)$-th moment, with $m$ being a positive integer, our algorithm has a sample complexity of $\mathcal{O}(\frac{d^8}{\varepsilon^2} (\frac{p^2}{\delta})^{1/m})$. |
Tasks | Causal Inference |
Published | 2017-07-15 |
URL | http://arxiv.org/abs/1707.04673v1 |
PDF | http://arxiv.org/pdf/1707.04673v1.pdf |
PWC | https://paperswithcode.com/paper/learning-linear-structural-equation-models-in |
Repo | |
Framework | |
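The population-setting claim can be illustrated in the special case of equal noise variances, where a terminal vertex of the DAG attains the minimum diagonal entry of the precision matrix $\Theta = (I-B)^\top D^{-1} (I-B)$, and removing it corresponds to a Schur complement. The sketch below is a simplified reading of that idea, not the paper's full algorithm, which handles estimated, sparse precision matrices and a more general identifiability condition.

```python
def recover_order(theta):
    """Recover a topological order of a linear SEM with equal noise
    variances from its population precision matrix: the smallest
    diagonal entry marks a terminal vertex; marginalize it out with a
    Schur complement and repeat."""
    n = len(theta)
    remaining = list(range(n))
    elim = []
    while remaining:
        t = min(remaining, key=lambda i: theta[i][i])
        elim.append(t)
        remaining.remove(t)
        # Schur complement: precision matrix of the remaining variables.
        theta = [[theta[a][b] - theta[a][t] * theta[t][b] / theta[t][t]
                  for b in range(n)] for a in range(n)]
    return elim[::-1]  # reverse elimination order = topological order

# Chain x0 -> x1 -> x2 (edge weights 0.8 and -0.6, unit noise variances):
# Theta = (I - B)^T (I - B).
theta = [[1.64, -0.8, 0.0],
         [-0.8, 1.36, 0.6],
         [0.0,  0.6,  1.0]]
print(recover_order(theta))  # -> [0, 1, 2]
```

For a terminal vertex $i$ with unit noise, $\Theta_{ii} = 1$, while any vertex with a child picks up an extra $\sum_k B_{ki}^2 > 0$ on its diagonal, which is why the argmin is safe here.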
Determinants of Mobile Money Adoption in Pakistan
Title | Determinants of Mobile Money Adoption in Pakistan |
Authors | Muhammad Raza Khan, Joshua Blumenstock |
Abstract | In this work, we analyze the adoption of mobile money in Pakistan, using the call detail records of a major telecom company as our input. Our results highlight the fact that different sections of society show different patterns of adoption of digital financial services, but that user-mobility-related features are the most important ones when it comes to adopting and using mobile money services. |
Tasks | |
Published | 2017-11-13 |
URL | http://arxiv.org/abs/1712.01081v1 |
PDF | http://arxiv.org/pdf/1712.01081v1.pdf |
PWC | https://paperswithcode.com/paper/determinants-of-mobile-money-adoption-in |
Repo | |
Framework | |
Learning Hard Alignments with Variational Inference
Title | Learning Hard Alignments with Variational Inference |
Authors | Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly |
Abstract | There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition. Hard attention can offer benefits over soft attention such as decreased computational cost, but training hard attention models can be difficult because of the discrete latent variables they introduce. Previous work used REINFORCE and Q-learning to approach these issues, but those methods can provide high-variance gradient estimates and be slow to train. In this paper, we tackle the problem of learning hard attention for a sequential task using variational inference methods, specifically the recently introduced VIMCO and NVIL. Furthermore, we propose a novel baseline that adapts VIMCO to this setting. We demonstrate our method on a phoneme recognition task in clean and noisy environments and show that our method outperforms REINFORCE, with the difference being greater for a more complicated task. |
Tasks | Image Captioning, Object Recognition, Q-Learning, Speech Recognition |
Published | 2017-05-16 |
URL | http://arxiv.org/abs/1705.05524v2 |
PDF | http://arxiv.org/pdf/1705.05524v2.pdf |
PWC | https://paperswithcode.com/paper/learning-hard-alignments-with-variational |
Repo | |
Framework | |
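The VIMCO estimator mentioned above uses the multi-sample bound $L = \log \frac{1}{K}\sum_i w_i$ and gives each sample a baseline built from the other samples' weights (their geometric mean). A minimal sketch of the per-sample learning signals, the terms that multiply $\nabla \log q(h_j\mid x)$ in the score-function gradient, might look like this; the surrounding model and gradient bookkeeping are omitted.

```python
import math

def logmeanexp(ls):
    """Numerically stable log of the mean of exp(l) over a list."""
    m = max(ls)
    return m + math.log(sum(math.exp(l - m) for l in ls) / len(ls))

def vimco_signals(logw):
    """Per-sample VIMCO learning signals for L = log((1/K) sum_i w_i):
    each sample's baseline re-evaluates the bound with its own
    log-weight replaced by the mean of the others' log-weights,
    i.e. the geometric mean of the other weights."""
    K = len(logw)
    L = logmeanexp(logw)
    signals = []
    for j in range(K):
        rest = [logw[i] for i in range(K) if i != j]
        swapped = list(logw)
        swapped[j] = sum(rest) / len(rest)
        signals.append(L - logmeanexp(swapped))
    return signals

# If all weights are equal, every baseline matches L exactly:
print(vimco_signals([0.5, 0.5, 0.5]))  # -> [0.0, 0.0, 0.0]
```

A sample whose weight exceeds the others gets a positive signal, so the variational distribution is pushed toward it; the zero signal under equal weights is what keeps the estimator's variance low.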
The information bottleneck and geometric clustering
Title | The information bottleneck and geometric clustering |
Authors | D J Strouse, David J Schwab |
Abstract | The information bottleneck (IB) approach to clustering takes a joint distribution $P\!\left(X,Y\right)$ and maps the data $X$ to cluster labels $T$ which retain maximal information about $Y$ (Tishby et al., 1999). This objective results in an algorithm that clusters data points based upon the similarity of their conditional distributions $P\!\left(Y\mid X\right)$. This is in contrast to classic “geometric clustering” algorithms such as $k$-means and Gaussian mixture models (GMMs) which take a set of observed data points $\left\{ \mathbf{x}_{i}\right\}_{i=1:N}$ and cluster them based upon their geometric (typically Euclidean) distance from one another. Here, we show how to use the deterministic information bottleneck (DIB) (Strouse and Schwab, 2017), a variant of IB, to perform geometric clustering, by choosing cluster labels that preserve information about data point location on a smoothed dataset. We also introduce a novel intuitive method to choose the number of clusters, via kinks in the information curve. We apply this approach to a variety of simple clustering problems, showing that DIB with our model selection procedure recovers the generative cluster labels. We also show that, for one simple case, DIB interpolates between the cluster boundaries of GMMs and $k$-means in the large data limit. Thus, our IB approach to clustering also provides an information-theoretic perspective on these classic algorithms. |
Tasks | Model Selection |
Published | 2017-12-27 |
URL | http://arxiv.org/abs/1712.09657v1 |
PDF | http://arxiv.org/pdf/1712.09657v1.pdf |
PWC | https://paperswithcode.com/paper/the-information-bottleneck-and-geometric |
Repo | |
Framework | |
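The model-selection idea, choosing the number of clusters at a kink in the information curve, can be sketched generically: with one information value per candidate cluster count, the kink is where the marginal gain collapses. This is an illustrative second-difference heuristic on made-up curve values, not the exact procedure in the paper.

```python
def kink(values):
    """Locate the most pronounced kink in a concave curve of
    information values (one value per candidate cluster count,
    starting at 1 cluster): the index where the marginal gain
    drops the most, i.e. the largest negative second difference."""
    gains = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    drops = [gains[i] - gains[i + 1] for i in range(len(gains) - 1)]
    return drops.index(max(drops)) + 2  # +2: cluster count at the kink

# I(T;Y) saturates after 3 clusters -> the kink picks 3.
print(kink([0.0, 0.9, 1.5, 1.55, 1.58]))  # -> 3
```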
A Recursive Bayesian Approach To Describe Retinal Vasculature Geometry
Title | A Recursive Bayesian Approach To Describe Retinal Vasculature Geometry |
Authors | Fatmatulzehra Uslu, Anil Anthony Bharath |
Abstract | Demographic studies suggest that changes in the retinal vasculature geometry, especially in vessel width, are associated with the incidence or progression of eye-related or systemic diseases. To date, the main information source for width estimation from fundus images has been the intensity profile between vessel edges. However, there are many factors affecting the intensity profile: pathologies, the central light reflex and local illumination levels, to name a few. In this study, we introduce three information sources for width estimation. These are the probability profiles of vessel interior, centreline and edge locations generated by a deep network. The probability profiles provide direct access to vessel geometry and are used in the likelihood calculation for a Bayesian method, particle filtering. We also introduce a geometric model which can handle non-ideal conditions of the probability profiles. Our experiments conducted on the REVIEW dataset yielded consistent estimates of vessel width, even in cases when one of the vessel edges is difficult to identify. Moreover, our results suggest that the method is better than human observers at locating edges of low contrast vessels. |
Tasks | |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10521v1 |
PDF | http://arxiv.org/pdf/1711.10521v1.pdf |
PWC | https://paperswithcode.com/paper/a-recursive-bayesian-approach-to-describe |
Repo | |
Framework | |
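Particle filtering, the Bayesian workhorse of the method, can be sketched in one dimension. This is a generic bootstrap filter for a random-walk state observed with Gaussian noise, not the paper's likelihood built from deep-network probability profiles; all constants are illustrative.

```python
import math
import random

def particle_filter(obs, n=500, q=0.1, r=0.5):
    """Minimal bootstrap particle filter for a 1-D random-walk state:
    predict with process noise q, weight each particle by the Gaussian
    observation likelihood (noise r), resample in proportion to the
    weights, and report the posterior mean at every step."""
    rng = random.Random(0)
    parts = [rng.uniform(0.0, 10.0) for _ in range(n)]
    means = []
    for y in obs:
        parts = [p + rng.gauss(0.0, q) for p in parts]             # predict
        w = [math.exp(-0.5 * ((y - p) / r) ** 2) for p in parts]   # weight
        parts = rng.choices(parts, weights=w, k=n)                 # resample
        means.append(sum(parts) / n)
    return means

# Noisy measurements of a constant vessel width of 4.0 pixels.
meas_rng = random.Random(1)
obs = [4.0 + meas_rng.gauss(0.0, 0.5) for _ in range(30)]
est = particle_filter(obs)
print(round(est[-1], 1))
```

The filtered mean settles near the true width even though each individual measurement is noisy, which is the behavior the paper relies on when tracking vessel edges along the centreline.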
Mining a Sub-Matrix of Maximal Sum
Title | Mining a Sub-Matrix of Maximal Sum |
Authors | Vincent Branders, Pierre Schaus, Pierre Dupont |
Abstract | Biclustering techniques have been widely used to identify homogeneous subgroups within large data matrices, such as subsets of genes similarly expressed across subsets of patients. Mining a max-sum sub-matrix is a related but distinct problem for which one looks for a (not necessarily contiguous) rectangular sub-matrix with a maximal sum of its entries. Le Van et al. (Ranked Tiling, 2014) already illustrated its applicability to gene expression analysis and addressed it with a constraint programming (CP) approach combined with large neighborhood search (CP-LNS). In this work, we exhibit some key properties of this NP-hard problem and define a bounding function such that larger problems can be solved in reasonable time. Two different algorithms are proposed in order to exploit the highlighted characteristics of the problem: a CP approach with a global constraint (CPGC) and mixed integer linear programming (MILP). Practical experiments conducted both on synthetic and real gene expression data exhibit the characteristics of these approaches and their relative benefits over the original CP-LNS method. Overall, the CPGC approach tends to be the fastest to produce a good solution. Yet, the MILP formulation is arguably the easiest to formulate and can also be competitive. |
Tasks | |
Published | 2017-09-25 |
URL | http://arxiv.org/abs/1709.08461v1 |
PDF | http://arxiv.org/pdf/1709.08461v1.pdf |
PWC | https://paperswithcode.com/paper/mining-a-sub-matrix-of-maximal-sum |
Repo | |
Framework | |
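A useful property of the problem is that, with the column set fixed, the optimal row set is exactly the rows with positive sum over the selected columns (and symmetrically for columns). An alternating heuristic built on that observation looks as follows; unlike the exact CP and MILP approaches in the paper, it can stop at a local optimum.

```python
def max_sum_submatrix(m, iters=20):
    """Alternating local search for the max-sum sub-matrix problem:
    given the current column set, keep exactly the rows with a
    positive sum over those columns, then symmetrically re-select
    the columns, and iterate to a fixed point."""
    cols = list(range(len(m[0])))
    rows = []
    for _ in range(iters):
        rows = [i for i in range(len(m))
                if sum(m[i][j] for j in cols) > 0]
        cols = [j for j in range(len(m[0]))
                if sum(m[i][j] for i in rows) > 0]
    total = sum(m[i][j] for i in rows for j in cols)
    return rows, cols, total

m = [[ 3, -1,  2],
     [-2,  4, -1],
     [-1, -2, -1]]
print(max_sum_submatrix(m))  # -> ([0, 1], [0, 1, 2], 5)
```

Here the search drops the all-negative third row and keeps every column, reaching an optimal sub-matrix of sum 5 for this small instance.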
The Effectiveness of Data Augmentation for Detection of Gastrointestinal Diseases from Endoscopical Images
Title | The Effectiveness of Data Augmentation for Detection of Gastrointestinal Diseases from Endoscopical Images |
Authors | Andrea Asperti, Claudio Mastronardo |
Abstract | The lack, due to privacy concerns, of large public databases of medical pathologies is a well-known and major problem, substantially hindering the application of deep learning techniques in this field. In this article, we investigate the possibility of compensating for the scarcity of data by means of data augmentation techniques, working on the recent Kvasir dataset of endoscopical images of gastrointestinal diseases. The dataset comprises 4,000 colored images labeled and verified by medical endoscopists, covering a few common pathologies at different anatomical landmarks: Z-line, pylorus and cecum. We show how the application of data augmentation techniques yields substantial improvements in classification with respect to previous approaches, both in terms of precision and recall. |
Tasks | Data Augmentation |
Published | 2017-12-11 |
URL | http://arxiv.org/abs/1712.03689v1 |
PDF | http://arxiv.org/pdf/1712.03689v1.pdf |
PWC | https://paperswithcode.com/paper/the-effectiveness-of-data-augmentation-for |
Repo | |
Framework | |
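A minimal flavor of label-preserving augmentation on a tiny image (represented as nested lists) might look like this; the paper evaluates a richer set of transformations on the Kvasir images, and the specific variants below are merely illustrative.

```python
def augment(img):
    """Generate simple label-preserving variants of an image given as
    a list of rows: the original, a horizontal flip, a vertical flip,
    and a 90-degree clockwise rotation.  Each variant can be fed to
    the classifier as an extra training example."""
    hflip = [row[::-1] for row in img]            # mirror left-right
    vflip = img[::-1]                             # mirror top-bottom
    rot90 = [list(col) for col in zip(*img[::-1])]  # rotate clockwise
    return [img, hflip, vflip, rot90]

img = [[1, 2],
       [3, 4]]
for variant in augment(img):
    print(variant)
```

One labeled image becomes four, which is exactly how augmentation inflates a small medical dataset without new annotation effort.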
Learning a Complete Image Indexing Pipeline
Title | Learning a Complete Image Indexing Pipeline |
Authors | Himalaya Jain, Joaquin Zepeda, Patrick Pérez, Rémi Gribonval |
Abstract | To work at scale, a complete image indexing system comprises two components: an inverted file index, which restricts the actual search to only a subset that should contain most of the items relevant to the query, and an approximate distance computation mechanism to rapidly scan these lists. While supervised deep learning has recently enabled improvements to the latter, the former continues to be based on unsupervised clustering in the literature. In this work, we propose a first system that learns both components within a unifying neural framework of structured binary encoding. |
Tasks | |
Published | 2017-12-12 |
URL | http://arxiv.org/abs/1712.04480v1 |
PDF | http://arxiv.org/pdf/1712.04480v1.pdf |
PWC | https://paperswithcode.com/paper/learning-a-complete-image-indexing-pipeline |
Repo | |
Framework | |
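The two components can be sketched with plain nearest-centroid quantization: an inverted file built from a coarse codebook, and a search that scans only the probed lists. In the paper both stages are learned jointly; here the codebook is simply assumed given (e.g. from k-means) and exact distances stand in for the approximate ones.

```python
def dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Inverted file: each vector id is filed under its nearest
    coarse centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        c = min(range(len(centroids)), key=lambda k: dist(v, centroids[k]))
        lists[c].append(i)
    return lists

def search(q, vectors, centroids, lists, nprobe=1):
    """Scan only the nprobe inverted lists whose centroids are closest
    to the query, then rank those candidates by exact distance; the
    remaining lists are never touched."""
    probed = sorted(range(len(centroids)),
                    key=lambda k: dist(q, centroids[k]))[:nprobe]
    cand = [i for c in probed for i in lists[c]]
    return min(cand, key=lambda i: dist(q, vectors[i]))

vectors = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0)]
centroids = [(0.1, 0.05), (5.0, 5.05)]   # coarse codebook, e.g. from k-means
lists = build_ivf(vectors, centroids)
print(search((4.95, 5.15), vectors, centroids, lists))  # -> 2
```

The query only visits the second inverted list (two candidates instead of four), which is the scaling benefit the inverted index provides.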
On Residual CNN in text-dependent speaker verification task
Title | On Residual CNN in text-dependent speaker verification task |
Authors | Egor Malykh, Sergey Novoselov, Oleg Kudashev |
Abstract | Deep learning approaches are still not very common in the speaker verification field. We investigate the possibility of using a deep residual convolutional neural network with spectrograms as input features in the text-dependent speaker verification task. Although we were not able to surpass the baseline system in quality, we achieved quite good results for such a new approach, obtaining a 5.23% EER on the RSR2015 evaluation part. Fusion of the baseline and proposed systems outperformed the best individual system by 18% relative. |
Tasks | Speaker Verification, Text-Dependent Speaker Verification |
Published | 2017-05-29 |
URL | http://arxiv.org/abs/1705.10134v2 |
PDF | http://arxiv.org/pdf/1705.10134v2.pdf |
PWC | https://paperswithcode.com/paper/on-residual-cnn-in-text-dependent-speaker |
Repo | |
Framework | |
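The verification metric above, the equal error rate, can be made concrete with a simple threshold sweep over hypothetical genuine/impostor score lists; score-level fusion would amount to scoring each trial with a weighted sum of the two systems' scores before running the same computation. The scores below are invented for illustration.

```python
def eer(genuine, impostor):
    """Equal error rate for a verification system: sweep score
    thresholds and return the operating point where the false-accept
    rate (impostors accepted) meets the false-reject rate (genuine
    trials rejected)."""
    best_gap, best_rate = float("inf"), None
    for t in sorted(set(genuine + impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, best_rate = abs(far - frr), (far + frr) / 2
    return best_rate

genuine  = [0.9, 0.8, 0.75, 0.6, 0.4]   # hypothetical target-trial scores
impostor = [0.7, 0.5, 0.45, 0.3, 0.2]   # hypothetical non-target scores
print(eer(genuine, impostor))  # -> 0.2
```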
Skyline Identification in Multi-Armed Bandits
Title | Skyline Identification in Multi-Armed Bandits |
Authors | Albert Cheu, Ravi Sundaram, Jonathan Ullman |
Abstract | We introduce a variant of the classical PAC multi-armed bandit problem. There is an ordered set of $n$ arms $A[1],\dots,A[n]$, each with some stochastic reward drawn from some unknown bounded distribution. The goal is to identify the $skyline$ of the set $A$, consisting of all arms $A[i]$ such that $A[i]$ has larger expected reward than all lower-numbered arms $A[1],\dots,A[i-1]$. We define a natural notion of an $\varepsilon$-approximate skyline and prove matching upper and lower bounds for identifying an $\varepsilon$-skyline. Specifically, we show that in order to identify an $\varepsilon$-skyline from among $n$ arms with probability $1-\delta$, $$ \Theta\bigg(\frac{n}{\varepsilon^2} \cdot \min\bigg\{ \log\bigg(\frac{1}{\varepsilon \delta}\bigg), \log\bigg(\frac{n}{\delta}\bigg) \bigg\} \bigg) $$ samples are necessary and sufficient. When $\varepsilon \gg 1/n$, our results improve over the naive algorithm, which draws enough samples to approximate the expected reward of every arm; the algorithm of (Auer et al., AISTATS’16) for Pareto-optimal arm identification is likewise superseded. Our results show that the sample complexity of the skyline problem lies strictly in between that of best arm identification (Even-Dar et al., COLT’02) and that of approximating the expected reward of every arm. |
Tasks | Multi-Armed Bandits |
Published | 2017-11-12 |
URL | http://arxiv.org/abs/1711.04213v2 |
PDF | http://arxiv.org/pdf/1711.04213v2.pdf |
PWC | https://paperswithcode.com/paper/skyline-identification-in-multi-armed-bandits |
Repo | |
Framework | |
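Given the true means, the skyline itself is a single pass with a running maximum. The hard part, and the paper's contribution, is the sample complexity of identifying an $\varepsilon$-approximate skyline from noisy pulls, which this definitional sketch does not address.

```python
def skyline(means):
    """An arm is in the skyline iff its expected reward strictly
    exceeds that of every lower-numbered arm, so one pass with a
    running maximum suffices when the means are known."""
    out, best = [], float("-inf")
    for i, m in enumerate(means):
        if m > best:
            out.append(i)
            best = m
    return out

# Arm 3 beats arms 0-2; arm 4 only ties arm 3 and is excluded.
print(skyline([0.3, 0.5, 0.2, 0.9, 0.9]))  # -> [0, 1, 3]
```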