May 6, 2019

3320 words 16 mins read

Paper Group ANR 203


Differential TD Learning for Value Function Approximation

Title Differential TD Learning for Value Function Approximation
Authors Adithya M. Devraj, Sean P. Meyn
Abstract Value functions arise as a component of algorithms as well as performance metrics in statistics and engineering applications. Computation of the associated Bellman equations is numerically challenging in all but a few special cases. A popular approximation technique is known as Temporal Difference (TD) learning. The algorithm introduced in this paper is intended to resolve two well-known problems with this approach: First, in the discounted-cost setting, the variance of the algorithm diverges as the discount factor approaches unity. Second, in the average-cost setting, unbiased algorithms exist only in special cases. It is shown that the gradient of any of these value functions admits a representation that lends itself to algorithm design. Based on this result, the new differential TD method is obtained for Markovian models on Euclidean space with smooth dynamics. Numerical examples show remarkable improvements in performance. In an application to speed scaling, variance is reduced by two orders of magnitude.
Tasks
Published 2016-04-06
URL http://arxiv.org/abs/1604.01828v3
PDF http://arxiv.org/pdf/1604.01828v3.pdf
PWC https://paperswithcode.com/paper/differential-td-learning-for-value-function
Repo
Framework
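
Differential TD itself is not spelled out in the abstract, but the baseline it improves on, TD(0) with linear value-function approximation, fits in a few lines. The sketch below is a minimal illustration on assumed toy dynamics and cost (the feature map, chain, and parameters are all made up for the example); this is the setting in which the variance problem appears as the discount factor approaches one.

```python
# Minimal TD(0) with linear function approximation: the baseline family the
# paper's differential TD method builds on, not the authors' algorithm.
import numpy as np

rng = np.random.default_rng(0)

def features(s, dim=8):
    """Hypothetical feature map: cosines of a scalar state at several scales."""
    w = np.linspace(1.0, dim, dim)
    return np.cos(w * s)

def td0(step, cost, n_steps=10_000, gamma=0.99, alpha=0.01, dim=8):
    """Estimate theta with V(s) ~ theta . phi(s) for a discounted cost."""
    theta = np.zeros(dim)
    s = 0.0
    for _ in range(n_steps):
        s_next = step(s)
        phi, phi_next = features(s, dim), features(s_next, dim)
        # TD error: c(s) + gamma * V(s') - V(s)
        delta = cost(s) + gamma * theta @ phi_next - theta @ phi
        theta += alpha * delta * phi
        s = s_next
    return theta

# Toy ergodic chain on R: mean-reverting dynamics with Gaussian noise.
theta = td0(step=lambda s: 0.9 * s + 0.1 * rng.standard_normal(),
            cost=lambda s: s ** 2)
```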

Outlier Detection on Mixed-Type Data: An Energy-based Approach

Title Outlier Detection on Mixed-Type Data: An Energy-based Approach
Authors Kien Do, Truyen Tran, Dinh Phung, Svetha Venkatesh
Abstract Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for a single data type, such as continuous or discrete. However, real-world data is increasingly heterogeneous: a data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. In this paper, we propose a new unsupervised outlier detection method for mixed-type data based on the Mixed-variate Restricted Boltzmann Machine (Mv.RBM). The Mv.RBM is a principled probabilistic method that models data density. We propose to use the free energy derived from the Mv.RBM as an outlier score, detecting outliers as data points lying in low-density regions. The method is fast to learn and compute, and scales to massive datasets. At the same time, the outlier score is identical to the data's negative log-density up to an additive constant. We evaluate the proposed method on synthetic and real-world datasets and demonstrate that (a) proper handling of mixed types is necessary in outlier detection, and (b) the free energy of the Mv.RBM is a powerful and efficient outlier scoring method that is highly competitive with the state of the art.
Tasks Outlier Detection
Published 2016-08-17
URL http://arxiv.org/abs/1608.04830v1
PDF http://arxiv.org/pdf/1608.04830v1.pdf
PWC https://paperswithcode.com/paper/outlier-detection-on-mixed-type-data-an
Repo
Framework
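
As a concrete illustration of free-energy scoring, the sketch below uses a plain binary RBM; the Mv.RBM generalizes the visible layer to mixed types, which is not reproduced here. The weights are random stand-ins for a trained model.

```python
# Free-energy outlier scoring with a binary RBM: a simplified stand-in for the
# paper's Mv.RBM. High free energy marks low-density points.
import numpy as np

def free_energy(v, W, b_vis, b_hid):
    """F(v) = -b_vis.v - sum_j softplus(b_hid_j + v.W_j); this equals
    -log p(v) up to an additive constant."""
    pre = v @ W + b_hid                            # (n, n_hidden)
    softplus = np.logaddexp(0.0, pre)              # stable log(1 + e^x)
    return -(v @ b_vis) - softplus.sum(axis=1)

rng = np.random.default_rng(0)
n_vis, n_hid = 20, 8
W = 0.1 * rng.standard_normal((n_vis, n_hid))      # pretend these were trained
b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)

data = rng.integers(0, 2, size=(100, n_vis)).astype(float)
scores = free_energy(data, W, b_vis, b_hid)
outliers = np.argsort(scores)[-5:]                 # five highest-energy points
```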

A Review of 40 Years of Cognitive Architecture Research: Core Cognitive Abilities and Practical Applications

Title A Review of 40 Years of Cognitive Architecture Research: Core Cognitive Abilities and Practical Applications
Authors Iuliia Kotseruba, John K. Tsotsos
Abstract In this paper we present a broad overview of the last 40 years of research on cognitive architectures. Although the number of existing architectures is nearing several hundred, most existing surveys do not reflect this growth and focus on a handful of well-established architectures. In this survey we therefore shift the focus towards a more inclusive, high-level overview of research on cognitive architectures. Our final set of 84 architectures includes 49 that are still actively developed, and they borrow from a diverse set of disciplines, spanning areas from psychoanalysis to neuroscience. To keep the length of this paper within reasonable limits, we discuss only the core cognitive abilities, such as perception, attention mechanisms, action selection, memory, learning, and reasoning. To assess the breadth of practical applications of cognitive architectures, we gathered information on over 900 practical projects implemented using the architectures in our list. We use various visualization techniques to highlight overall trends in the development of the field. In addition to summarizing the current state of the art in cognitive architecture research, this survey describes the methods and ideas that have been tried and their relative success in modeling human cognitive abilities, as well as the aspects of cognitive behavior that need more research with respect to their mechanistic counterparts and can thus further inform how cognitive science might progress.
Tasks
Published 2016-10-27
URL http://arxiv.org/abs/1610.08602v3
PDF http://arxiv.org/pdf/1610.08602v3.pdf
PWC https://paperswithcode.com/paper/a-review-of-40-years-of-cognitive
Repo
Framework

On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems

Title On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems
Authors Besmira Nushi, Ece Kamar, Eric Horvitz, Donald Kossmann
Abstract We study the problem of troubleshooting machine learning systems that rely on analytical pipelines of distinct components. Understanding and fixing errors that arise in such integrative systems is difficult as failures can occur at multiple points in the execution workflow. Moreover, errors can propagate, become amplified or be suppressed, making blame assignment difficult. We propose a human-in-the-loop methodology which leverages human intellect for troubleshooting system failures. The approach simulates potential component fixes through human computation tasks and measures the expected improvements in the holistic behavior of the system. The method provides guidance to designers about how they can best improve the system. We demonstrate the effectiveness of the approach on an automated image captioning system that has been pressed into real-world use.
Tasks Image Captioning
Published 2016-11-24
URL http://arxiv.org/abs/1611.08309v1
PDF http://arxiv.org/pdf/1611.08309v1.pdf
PWC https://paperswithcode.com/paper/on-human-intellect-and-machine-failures
Repo
Framework
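
The core move, simulating a component fix with human-provided outputs and measuring the end-to-end gain, can be shown on a toy two-stage pipeline. Everything below (the components, data, and metric) is a hypothetical stand-in, not the authors' captioning system.

```python
# Toy fix-simulation: swap one component's output for human-corrected output
# and measure the holistic gain, to decide which component is worth fixing.
def run_pipeline(xs, detect, caption):
    return [caption(detect(x)) for x in xs]

def accuracy(preds, gold):
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

# Noisy components on toy data: input -> tag -> caption.
xs = list(range(10))
gold = [f"caption-{x % 3}" for x in xs]
detect_noisy = lambda x: x % 3 if x != 7 else 0        # one detection error
caption_noisy = lambda t: f"caption-{t}" if t != 2 else "caption-0"
human_detect = lambda x: x % 3                         # oracle (human) fix

base = accuracy(run_pipeline(xs, detect_noisy, caption_noisy), gold)
fixed = accuracy(run_pipeline(xs, human_detect, caption_noisy), gold)
print(f"expected gain from fixing the detector: {fixed - base:+.2f}")
```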

A Bayesian non-parametric method for clustering high-dimensional binary data

Title A Bayesian non-parametric method for clustering high-dimensional binary data
Authors Tapesh Santra
Abstract In many real-life problems, objects are described by a large number of binary features. For instance, documents are characterized by the presence or absence of certain keywords; cancer patients are characterized by the presence or absence of certain mutations; and so on. In such cases, grouping together similar objects/profiles based on such high-dimensional binary features is desirable, but challenging. Here, I present a Bayesian non-parametric algorithm for clustering high-dimensional binary data. It uses a Dirichlet Process (DP) mixture model and simulated annealing to not only cluster binary data, but also find the optimal number of clusters in the data. The performance of the algorithm was evaluated and compared with other algorithms using simulated datasets. It outperformed all other clustering methods tested in the simulation studies. It was also used to cluster real datasets arising from document analysis, handwritten image analysis, and cancer research. It successfully divided a set of documents based on their topics, divided handwritten images based on different styles of writing digits, and identified the tissue and mutation specificity of chemotherapy treatments.
Tasks
Published 2016-03-08
URL http://arxiv.org/abs/1603.02494v1
PDF http://arxiv.org/pdf/1603.02494v1.pdf
PWC https://paperswithcode.com/paper/a-bayesian-non-parametric-method-for
Repo
Framework
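
For readers who want the mechanics, here is a minimal collapsed Gibbs sampler for a DP mixture of product-Bernoullis (Beta-Bernoulli base measure). It is a sketch of the model class only; the paper's simulated-annealing component is omitted and the hyperparameters are illustrative.

```python
# Collapsed Gibbs sampling for a Dirichlet-process mixture of Bernoullis.
# The number of clusters is inferred rather than fixed in advance.
import numpy as np

def dpmm_bernoulli(X, alpha=1.0, a=1.0, b=1.0, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    z = np.zeros(n, dtype=int)                  # start with one big cluster
    counts = {0: n}
    ones = {0: X.sum(axis=0).astype(float)}     # per-dimension counts of 1s
    next_id = 1
    for _ in range(n_iters):
        for i in range(n):
            k = z[i]                            # detach point i
            counts[k] -= 1
            ones[k] -= X[i]
            if counts[k] == 0:
                del counts[k], ones[k]
            ks = list(counts)
            logp = np.empty(len(ks) + 1)
            for j, k2 in enumerate(ks):         # existing cluster: CRP weight
                p1 = (a + ones[k2]) / (a + b + counts[k2])   # Beta-Bernoulli predictive
                logp[j] = np.log(counts[k2]) + np.sum(
                    X[i] * np.log(p1) + (1 - X[i]) * np.log(1 - p1))
            p1 = a / (a + b)                    # new cluster: prior predictive
            logp[-1] = np.log(alpha) + np.sum(
                X[i] * np.log(p1) + (1 - X[i]) * np.log(1 - p1))
            probs = np.exp(logp - logp.max())
            choice = rng.choice(len(ks) + 1, p=probs / probs.sum())
            if choice < len(ks):
                k_new = ks[choice]
            else:
                k_new, next_id = next_id, next_id + 1
            z[i] = k_new
            counts[k_new] = counts.get(k_new, 0) + 1
            ones[k_new] = ones.get(k_new, np.zeros(d)) + X[i]
    return z

# Usage: z = dpmm_bernoulli((np.random.rand(200, 30) < 0.3).astype(int))
```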

Predicting Clinical Events by Combining Static and Dynamic Information Using Recurrent Neural Networks

Title Predicting Clinical Events by Combining Static and Dynamic Information Using Recurrent Neural Networks
Authors Cristóbal Esteban, Oliver Staeck, Yinchong Yang, Volker Tresp
Abstract In clinical data sets we often find static information (e.g. patient gender, blood type, etc.) combined with sequences of data that are recorded during multiple hospital visits (e.g. medications prescribed, tests performed, etc.). Recurrent Neural Networks (RNNs) have proven to be very successful at modelling sequences of data in many areas of Machine Learning. In this work we present an approach based on RNNs, specifically designed for the clinical domain, that combines static and dynamic information in order to predict future events. We work with a database collected at the Charité Hospital in Berlin that contains complete information on patients who underwent a kidney transplantation. After the transplantation, three main endpoints can occur: rejection of the kidney, loss of the kidney, and death of the patient. Our goal is to predict, based on information recorded in each patient's Electronic Health Record, whether any of those endpoints will occur within the next six or twelve months after each visit to the clinic. We compared different types of RNNs developed for this work with a model based on a Feedforward Neural Network and a Logistic Regression model. We found that the RNN we developed based on Gated Recurrent Units provides the best performance for this task. We also used the same models for a second task, next-event prediction, and found that here the model based on a Feedforward Neural Network outperformed the other models. Our hypothesis is that long-term dependencies are not as relevant in this task.
Tasks
Published 2016-02-08
URL http://arxiv.org/abs/1602.02685v2
PDF http://arxiv.org/pdf/1602.02685v2.pdf
PWC https://paperswithcode.com/paper/predicting-clinical-events-by-combining
Repo
Framework
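
One standard way to realize the static-plus-dynamic combination is to append the static vector to every timestep of the sequence before the recurrent layer. The PyTorch sketch below shows that variant with made-up dimensions; the paper compares several RNN designs, and this is not claimed to be their exact architecture.

```python
# A GRU over visit sequences with static patient features appended to every
# timestep; the head predicts the three endpoints from the last visit.
import torch
import torch.nn as nn

class StaticDynamicGRU(nn.Module):
    def __init__(self, static_dim, dynamic_dim, hidden_dim, n_endpoints=3):
        super().__init__()
        self.gru = nn.GRU(static_dim + dynamic_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_endpoints)  # rejection/loss/death

    def forward(self, static, dynamic):
        # static: (batch, static_dim); dynamic: (batch, time, dynamic_dim)
        t = dynamic.size(1)
        s = static.unsqueeze(1).expand(-1, t, -1)       # repeat over time
        out, _ = self.gru(torch.cat([dynamic, s], dim=-1))
        return self.head(out[:, -1])                    # predict from last visit

model = StaticDynamicGRU(static_dim=5, dynamic_dim=32, hidden_dim=64)
logits = model(torch.randn(8, 5), torch.randn(8, 12, 32))  # shape (8, 3)
```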

A Linked Data Scalability Challenge: Concept Reuse Leads to Semantic Decay

Title A Linked Data Scalability Challenge: Concept Reuse Leads to Semantic Decay
Authors Paolo Pareti, Ewan Klein, Adam Barker
Abstract The increasing amount of available Linked Data resources is laying the foundations for more advanced Semantic Web applications. One of their main limitations, however, remains the general low level of data quality. In this paper we focus on a measure of quality which is negatively affected by the increase of the available resources. We propose a measure of semantic richness of Linked Data concepts and we demonstrate our hypothesis that the more a concept is reused, the less semantically rich it becomes. This is a significant scalability issue, as one of the core aspects of Linked Data is the propagation of semantic information on the Web by reusing common terms. We prove our hypothesis with respect to our measure of semantic richness and we validate our model empirically. Finally, we suggest possible future directions to address this scalability problem.
Tasks
Published 2016-03-05
URL http://arxiv.org/abs/1603.01722v1
PDF http://arxiv.org/pdf/1603.01722v1.pdf
PWC https://paperswithcode.com/paper/a-linked-data-scalability-challenge-concept
Repo
Framework
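
The abstract does not reproduce the paper's definition of semantic richness, so the following is only a hypothetical proxy in the same spirit: score a concept by the entropy of the predicates it is used with, so that heavy reuse with a narrow set of predicates yields a low score.

```python
# Hypothetical richness proxy: per-concept entropy of predicate usage over an
# RDF-style triple store (higher = richer). Illustrative only, not the paper's measure.
from collections import defaultdict
from math import log2

def richness(triples):
    """triples: iterable of (subject, predicate, object) strings."""
    pred_counts = defaultdict(lambda: defaultdict(int))
    for s, p, _ in triples:
        pred_counts[s][p] += 1
    scores = {}
    for concept, preds in pred_counts.items():
        total = sum(preds.values())
        scores[concept] = -sum(c / total * log2(c / total)
                               for c in preds.values())
    return scores

triples = [("ex:Berlin", "rdf:type", "ex:City"),
           ("ex:Berlin", "ex:population", "3500000"),
           ("ex:Berlin", "ex:capitalOf", "ex:Germany"),
           ("ex:Thing", "rdf:type", "ex:Misc")]       # toy data
print(richness(triples))                              # ex:Berlin scores higher
```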

Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

Title Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Authors Amit Sheth, Sujan Perera, Sanjaya Wijeratne
Abstract Machine Learning has been a big success story during the AI resurgence. One particular standout success relates to unsupervised learning from a massive amount of data, albeit much of it relates to one modality or type of data at a time. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of the value of utilizing knowledge whenever it is available or can be created purposefully. In this paper, we focus on discussing the indispensable role of knowledge for deeper understanding of complex text and multimodal data in situations where (i) large amounts of training data (labeled/unlabeled) are not available or are labor-intensive to create, (ii) the objects (particularly text) to be recognized are complex (i.e., beyond simple person/location/organization entity names), such as implicit entities and highly subjective content, and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create knowledge, varying from comprehensive or cross-domain to domain- or application-specific, and (b) carefully exploit that knowledge to further empower or extend the applications of ML/NLP techniques. Using early results from several diverse situations, both in data types and applications, we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data.
Tasks
Published 2016-10-25
URL http://arxiv.org/abs/1610.07708v2
PDF http://arxiv.org/pdf/1610.07708v2.pdf
PWC https://paperswithcode.com/paper/knowledge-will-propel-machine-understanding
Repo
Framework

Joint Recognition and Segmentation of Actions via Probabilistic Integration of Spatio-Temporal Fisher Vectors

Title Joint Recognition and Segmentation of Actions via Probabilistic Integration of Spatio-Temporal Fisher Vectors
Authors Johanna Carvajal, Chris McCool, Brian Lovell, Conrad Sanderson
Abstract We propose a hierarchical approach to multi-action recognition that performs joint classification and segmentation. A given video (containing several consecutive actions) is processed via a sequence of overlapping temporal windows. Each frame in a temporal window is represented through selective low-level spatio-temporal features which efficiently capture relevant local dynamics. Features from each window are represented as a Fisher vector, which captures first and second order statistics. Instead of directly classifying each Fisher vector, it is converted into a vector of class probabilities. The final classification decision for each frame is then obtained by integrating the class probabilities at the frame level, which exploits the overlapping of the temporal windows. Experiments were performed on two datasets: s-KTH (a stitched version of the KTH dataset to simulate multi-actions), and the challenging CMU-MMAC dataset. On s-KTH, the proposed approach achieves an accuracy of 85.0%, significantly outperforming two recent approaches based on GMMs and HMMs which obtained 78.3% and 71.2%, respectively. On CMU-MMAC, the proposed approach achieves an accuracy of 40.9%, outperforming the GMM and HMM approaches which obtained 33.7% and 38.4%, respectively. Furthermore, the proposed system is on average 40 times faster than the GMM based approach.
Tasks Temporal Action Localization
Published 2016-02-04
URL http://arxiv.org/abs/1602.01601v3
PDF http://arxiv.org/pdf/1602.01601v3.pdf
PWC https://paperswithcode.com/paper/joint-recognition-and-segmentation-of-actions
Repo
Framework
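
The final integration step is easy to make concrete: each overlapping window emits a class-probability vector, every frame averages the probabilities of the windows covering it, and the per-frame label is the argmax. In the sketch below, the Fisher-vector classifier is stubbed out with random probabilities; window length and stride are illustrative.

```python
# Frame-level integration of per-window class probabilities, exploiting the
# overlap of temporal windows; window scoring is stubbed with random values.
import numpy as np

def integrate(window_probs, starts, win_len, n_frames):
    """window_probs: (n_windows, n_classes) probability vectors."""
    n_classes = window_probs.shape[1]
    acc = np.zeros((n_frames, n_classes))
    cover = np.zeros(n_frames)
    for p, s in zip(window_probs, starts):
        acc[s:s + win_len] += p
        cover[s:s + win_len] += 1
    return np.argmax(acc / np.maximum(cover, 1)[:, None], axis=1)

rng = np.random.default_rng(0)
n_frames, win_len, stride = 120, 30, 10
starts = np.arange(0, n_frames - win_len + 1, stride)
probs = rng.dirichlet(np.ones(4), size=len(starts))   # stand-in classifier output
labels = integrate(probs, starts, win_len, n_frames)  # one action label per frame
```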

Convolutional Rectifier Networks as Generalized Tensor Decompositions

Title Convolutional Rectifier Networks as Generalized Tensor Decompositions
Authors Nadav Cohen, Amnon Shashua
Abstract Convolutional rectifier networks, i.e. convolutional neural networks with rectified linear activation and max or average pooling, are the cornerstone of modern deep learning. However, despite their wide use and success, our theoretical understanding of the expressive properties that drive these networks is partial at best. On the other hand, we have a much firmer grasp of these issues in the world of arithmetic circuits. Specifically, it is known that convolutional arithmetic circuits possess the property of “complete depth efficiency”, meaning that besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require exponential size in order to be implemented (or even approximated) by a shallow network. In this paper we describe a construction based on generalized tensor decompositions, that transforms convolutional arithmetic circuits into convolutional rectifier networks. We then use mathematical tools available from the world of arithmetic circuits to prove new results. First, we show that convolutional rectifier networks are universal with max pooling but not with average pooling. Second, and more importantly, we show that depth efficiency is weaker with convolutional rectifier networks than it is with convolutional arithmetic circuits. This leads us to believe that developing effective methods for training convolutional arithmetic circuits, thereby fulfilling their expressive potential, may give rise to a deep learning architecture that is provably superior to convolutional rectifier networks but has so far been overlooked by practitioners.
Tasks
Published 2016-03-01
URL http://arxiv.org/abs/1603.00162v2
PDF http://arxiv.org/pdf/1603.00162v2.pdf
PWC https://paperswithcode.com/paper/convolutional-rectifier-networks-as
Repo
Framework
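
For orientation, the construction replaces the product in a tensor decomposition with an activation-pooling operator; the LaTeX sketch below paraphrases that correspondence from memory, so the notation should be checked against the paper.

```latex
% Generalized tensor product: multiplication is replaced by an
% activation-pooling operator built from activation sigma and pooling P.
\[
\rho_{\sigma,P}(a,b) := P\big(\sigma(a), \sigma(b)\big)
\]
% Convolutional arithmetic circuits: linear activation, product pooling
\[
\rho(a,b) = a \cdot b
\]
% Convolutional rectifier networks with max pooling: ReLU activation, max pooling
\[
\rho(a,b) = \max\{[a]_+, [b]_+\} = \max\{a, b, 0\}
\]
```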

Significance testing in non-sparse high-dimensional linear models

Title Significance testing in non-sparse high-dimensional linear models
Authors Yinchu Zhu, Jelena Bradic
Abstract In high-dimensional linear models, the sparsity assumption is typically made, stating that most of the parameters are equal to zero. Under the sparsity assumption, estimation and, recently, inference have been well studied. However, in practice the sparsity assumption is not checkable and, more importantly, is often violated: a large number of covariates might be expected to be associated with the response, indicating that possibly all, rather than just a few, parameters are non-zero. A natural example is genome-wide gene expression profiling, where all genes are believed to affect a common disease marker. We show that existing inferential methods are sensitive to the sparsity assumption and may, in turn, result in a severe lack of Type I error control. In this article, we propose a new inferential method, named CorrT, which is robust to model misspecification such as heteroscedasticity and lack of sparsity. CorrT is shown to have Type I error approaching the nominal level for any model and Type II error approaching zero for sparse and many dense models. In fact, CorrT is also shown to be optimal in a variety of frameworks: sparse, non-sparse, and hybrid models where sparse and dense signals are mixed. Numerical experiments show a favorable performance of the CorrT test compared to state-of-the-art methods.
Tasks
Published 2016-10-07
URL http://arxiv.org/abs/1610.02122v4
PDF http://arxiv.org/pdf/1610.02122v4.pdf
PWC https://paperswithcode.com/paper/significance-testing-in-non-sparse-high
Repo
Framework
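
CorrT's construction is not given in the abstract, so the sketch below only illustrates the evaluation protocol: a Monte Carlo estimate of Type I error under a dense (non-sparse), correlated design. The test being checked is a deliberately naive marginal correlation test, used purely as a placeholder; it over-rejects here, which is the failure mode the paper targets.

```python
# Monte Carlo check of Type I error for a significance test in a dense
# high-dimensional linear model with a correlated design.
import numpy as np
from scipy import stats

def type_one_error(test, n_reps=300, n=100, p=200, level=0.05, seed=0):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_reps):
        z = rng.standard_normal((n, 1))             # shared factor -> correlated X
        X = np.sqrt(0.5) * (rng.standard_normal((n, p)) + z)
        beta = rng.standard_normal(p) / np.sqrt(p)  # dense: all coefficients nonzero
        beta[0] = 0.0                               # ...except the one under H0
        y = X @ beta + rng.standard_normal(n)
        rejections += test(X, y) < level
    return rejections / n_reps

def naive_marginal_test(X, y):
    """p-value for corr(x_1, y) = 0, ignoring other covariates (placeholder)."""
    return stats.pearsonr(X[:, 0], y)[1]

print(type_one_error(naive_marginal_test))          # typically far above 0.05
```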

Domain Specific Author Attribution Based on Feedforward Neural Network Language Models

Title Domain Specific Author Attribution Based on Feedforward Neural Network Language Models
Authors Zhenhao Ge, Yufang Sun
Abstract Authorship attribution refers to the task of automatically determining the author of a given sample of text. It is a problem with a long history and a wide range of applications. Building author profiles using language models is one of the most successful ways to automate this task. New language modeling methods based on neural networks alleviate the curse of dimensionality and usually outperform conventional N-gram methods. However, there has not been much research applying them to authorship attribution. In this paper, we present a novel setup of a Neural Network Language Model (NNLM) and apply it to a database of text samples from different authors. We investigate how the NNLM performs on a task with a moderate author set size and relatively limited training and test data, and how the topics of the text samples affect the accuracy. The NNLM achieves a nearly 2.5% reduction in perplexity, a measure of how well a trained language model fits the test data. Given 5 random test sentences, it also increases the author classification accuracy by 3.43% on average, compared with N-gram methods using the SRILM tools. An open-source implementation of our methodology is freely available at https://github.com/zge/authorship-attribution/.
Tasks Language Modelling
Published 2016-02-24
URL http://arxiv.org/abs/1602.07393v1
PDF http://arxiv.org/pdf/1602.07393v1.pdf
PWC https://paperswithcode.com/paper/domain-specific-author-attribution-based-on
Repo
Framework
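
The decision rule behind language-model attribution is compact: fit one model per author and attribute a new text to the author whose model assigns it the lowest perplexity. The sketch below uses add-one-smoothed unigram models as a stand-in for the paper's NNLMs, with a toy corpus invented for the example.

```python
# Perplexity-based author attribution with per-author unigram models.
import math
from collections import Counter

def train_unigram(texts, vocab):
    counts = Counter(w for t in texts for w in t.split())
    total = sum(counts.values())
    # add-one smoothing so unseen words get nonzero probability
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}

def perplexity(model, text):
    words = text.split()
    nll = -sum(math.log(model[w]) for w in words)
    return math.exp(nll / len(words))

corpus = {"austen": ["she was very happy", "it was a truth"],
          "doyle": ["the game is afoot", "he examined the clue"]}
vocab = {w for ts in corpus.values() for t in ts for w in t.split()}
models = {a: train_unigram(ts, vocab) for a, ts in corpus.items()}

test = "the clue was examined"
print(min(models, key=lambda a: perplexity(models[a], test)))  # -> "doyle"
```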

A Way out of the Odyssey: Analyzing and Combining Recent Insights for LSTMs

Title A Way out of the Odyssey: Analyzing and Combining Recent Insights for LSTMs
Authors Shayne Longpre, Sabeek Pradhan, Caiming Xiong, Richard Socher
Abstract LSTMs have become a basic building block for many deep NLP models. In recent years, many improvements and variations have been proposed for deep sequence models in general, and LSTMs in particular. We propose and analyze a series of augmentations and modifications to LSTM networks resulting in improved performance for text classification datasets. We observe compounding improvements on traditional LSTMs using Monte Carlo test-time model averaging, average pooling, and residual connections, along with four other suggested modifications. Our analysis provides a simple, reliable, and high quality baseline model.
Tasks Text Classification
Published 2016-11-16
URL http://arxiv.org/abs/1611.05104v2
PDF http://arxiv.org/pdf/1611.05104v2.pdf
PWC https://paperswithcode.com/paper/a-way-out-of-the-odyssey-analyzing-and
Repo
Framework
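
Two of the named augmentations, average pooling over hidden states and a residual connection, plus Monte Carlo test-time averaging via dropout, can be sketched in PyTorch as follows. Dimensions are arbitrary, and this is not the authors' full model.

```python
# LSTM classifier with average pooling over time and a residual connection
# from the (projected) input, plus MC test-time averaging via active dropout.
import torch
import torch.nn as nn

class PooledResidualLSTM(nn.Module):
    def __init__(self, emb_dim, hidden_dim, n_classes):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(emb_dim, hidden_dim)   # match dims for the residual
        self.head = nn.Linear(hidden_dim, n_classes)
        self.drop = nn.Dropout(0.5)

    def forward(self, x):                            # x: (batch, time, emb_dim)
        h, _ = self.lstm(x)
        h = h + self.proj(x)                         # residual connection
        return self.head(self.drop(h.mean(dim=1)))   # average pooling over time

# Monte Carlo test-time averaging: keep dropout active, average predictions.
model = PooledResidualLSTM(50, 64, 2).train()        # train() keeps dropout on
x = torch.randn(4, 20, 50)
with torch.no_grad():
    probs = torch.stack([model(x).softmax(-1) for _ in range(10)]).mean(0)
```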

Dual-tree $k$-means with bounded iteration runtime

Title Dual-tree $k$-means with bounded iteration runtime
Authors Ryan R. Curtin
Abstract $k$-means is a widely used clustering algorithm, but for $k$ clusters and a dataset size of $N$, each iteration of Lloyd’s algorithm costs $O(kN)$ time. Although there are existing techniques to accelerate single Lloyd iterations, none of these are tailored to the case of large $k$, which is increasingly common as dataset sizes grow. We propose a dual-tree algorithm that gives the exact same results as standard $k$-means; when using cover trees, we use adaptive analysis techniques to, under some assumptions, bound the single-iteration runtime of the algorithm as $O(N + k \log k)$. To our knowledge these are the first sub-$O(kN)$ bounds for exact Lloyd iterations. We then show that this theoretically favorable algorithm performs competitively in practice, especially for large $N$ and $k$ in low dimensions. Further, the algorithm is tree-independent, so any type of tree may be used.
Tasks
Published 2016-01-14
URL http://arxiv.org/abs/1601.03754v1
PDF http://arxiv.org/pdf/1601.03754v1.pdf
PWC https://paperswithcode.com/paper/dual-tree-k-means-with-bounded-iteration
Repo
Framework
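
For contrast with the paper's contribution, here is the plain Lloyd iteration whose $O(kN)$ per-iteration cost the dual-tree algorithm attacks; the tree-based pruning itself is substantially more involved and is not reproduced.

```python
# One exact Lloyd iteration: assign every point to its nearest centroid, then
# recompute means. The (N, k) distance matrix is the O(kN) cost in question.
import numpy as np

def lloyd_iteration(X, centroids):
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (N, k)
    assign = d2.argmin(axis=1)
    new_centroids = np.array([X[assign == j].mean(axis=0) if (assign == j).any()
                              else centroids[j] for j in range(len(centroids))])
    return new_centroids, assign

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))
centroids = X[rng.choice(len(X), 5, replace=False)]
for _ in range(10):
    centroids, assign = lloyd_iteration(X, centroids)
```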

Robustness of Voice Conversion Techniques Under Mismatched Conditions

Title Robustness of Voice Conversion Techniques Under Mismatched Conditions
Authors Monisankha Pal, Dipjyoti Paul, Md Sahidullah, Goutam Saha
Abstract Most existing studies on voice conversion (VC) are conducted under acoustically matched conditions between the source and target signal. However, the robustness of VC methods in the presence of mismatch remains unknown. In this paper, we report a comparative analysis of different VC techniques under mismatched conditions. Extensive experiments with five different VC techniques on the CMU ARCTIC corpus suggest that the performance of VC methods degrades substantially in noisy conditions. We found that bilinear frequency warping with amplitude scaling (BLFWAS) outperforms the other methods in most noisy conditions. We further explore the suitability of different speech enhancement techniques for robust conversion. The objective evaluation results indicate that spectral subtraction and log minimum mean square error (logMMSE) based speech enhancement techniques can be used to improve performance in specific noisy conditions.
Tasks Speech Enhancement, Voice Conversion
Published 2016-12-22
URL http://arxiv.org/abs/1612.07523v1
PDF http://arxiv.org/pdf/1612.07523v1.pdf
PWC https://paperswithcode.com/paper/robustness-of-voice-conversion-techniques
Repo
Framework
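
Of the enhancement front-ends mentioned, spectral subtraction is simple enough to sketch: estimate the noise magnitude spectrum from a stretch assumed to be speech-free, subtract it, and resynthesize with the noisy phase. The version below is a common textbook simplification (the parameters and the noise-only head are assumptions), not the paper's exact recipe.

```python
# Minimal magnitude spectral subtraction via STFT/ISTFT.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(x, fs, noise_secs=0.25, floor=0.02):
    f, t, Z = stft(x, fs, nperseg=512)
    mag, phase = np.abs(Z), np.angle(Z)
    n_noise = max(1, int(noise_secs * fs / 256))       # frames in the noise-only head
    noise_mag = mag[:, :n_noise].mean(axis=1, keepdims=True)
    clean = np.maximum(mag - noise_mag, floor * mag)   # subtract, keep a floor
    _, y = istft(clean * np.exp(1j * phase), fs, nperseg=512)
    return y

fs = 16000
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)    # 1 s of "speech"
noisy = np.concatenate([np.zeros(fs // 4), tone]) + 0.1 * rng.standard_normal(fs + fs // 4)
enhanced = spectral_subtraction(noisy, fs)
```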