Paper Group ANR 692
Accounting for hidden common causes when inferring cause and effect from observational data
Title | Accounting for hidden common causes when inferring cause and effect from observational data |
Authors | David Heckerman |
Abstract | Identifying causal relationships from observational data is difficult, in large part, due to the presence of hidden common causes. In some cases, where just the right patterns of conditional independence and dependence lie in the data (for example, Y-structures), it is possible to identify cause and effect. In other cases, the analyst deliberately makes an uncertain assumption that hidden common causes are absent, and infers putative causal relationships to be tested in a randomized trial. Here, we consider a third approach, where there are sufficient clues in the data such that hidden common causes can be inferred. |
Tasks | |
Published | 2018-01-02 |
URL | http://arxiv.org/abs/1801.00727v2 |
http://arxiv.org/pdf/1801.00727v2.pdf | |
PWC | https://paperswithcode.com/paper/accounting-for-hidden-common-causes-when |
Repo | |
Framework | |
Learning Patient Representations from Text
Title | Learning Patient Representations from Text |
Authors | Dmitriy Dligach, Timothy Miller |
Abstract | Mining electronic health records for patients who satisfy a set of predefined criteria is known in medical informatics as phenotyping. Phenotyping has numerous applications such as outcome prediction, clinical trial recruitment, and retrospective studies. Supervised machine learning for phenotyping typically relies on sparse patient representations such as bag-of-words. We consider an alternative that involves learning patient representations. We develop a neural network model for learning patient representations and show that the learned representations are general enough to obtain state-of-the-art performance on a standard comorbidity detection task. |
Tasks | |
Published | 2018-05-05 |
URL | http://arxiv.org/abs/1805.02096v1 |
http://arxiv.org/pdf/1805.02096v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-patient-representations-from-text |
Repo | |
Framework | |
Learning Neural Emotion Analysis from 100 Observations: The Surprising Effectiveness of Pre-Trained Word Representations
Title | Learning Neural Emotion Analysis from 100 Observations: The Surprising Effectiveness of Pre-Trained Word Representations |
Authors | Sven Buechel, João Sedoc, H. Andrew Schwartz, Lyle Ungar |
Abstract | Deep Learning has drastically reshaped virtually all areas of NLP. Yet on the downside, it is commonly thought to be dependent on vast amounts of training data. As such, these techniques appear ill-suited for areas where annotated data is limited, like emotion analysis, with its many nuanced and hard-to-acquire annotation formats, or other low-data scenarios encountered in under-resourced languages. In contrast to this popular notion, we provide empirical evidence from three typologically diverse languages that today’s favorite neural architectures can be trained on a few hundred observations only. Our results suggest that high-quality, pre-trained word embeddings are crucial for achieving high performance despite such strong data limitations. |
Tasks | Emotion Recognition, Word Embeddings |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.10949v1 |
http://arxiv.org/pdf/1810.10949v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-neural-emotion-analysis-from-100 |
Repo | |
Framework | |
Determining the best classifier for predicting the value of a boolean field on a blood donor database using genetic algorithms
Title | Determining the best classifier for predicting the value of a boolean field on a blood donor database using genetic algorithms |
Authors | Ritabrata Maiti |
Abstract | Motivation: Thanks to digitization, we often have access to large databases consisting of various fields of information, ranging from numbers to text and even boolean values. Such databases lend themselves especially well to machine learning, classification and big data analysis tasks. We can train classifiers on existing data and use them to predict the value of a certain field, given information about the other fields. More specifically, in this study, we look at the Electronic Health Records (EHRs) that are compiled by hospitals. These EHRs are a convenient means of accessing the data of individual patients, but their processing as a whole still remains a challenge. However, EHRs that are composed of coherent, well-tabulated structures lend themselves quite well to the application of machine learning via classifiers. In this study, we look at a Blood Transfusion Service Center Data Set (data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan). We used the scikit-learn machine learning library in Python. From Support Vector Machines (SVM), we use Support Vector Classification (SVC); from the linear model module, we import Perceptron. We also used KNeighborsClassifier and the decision tree classifiers. Furthermore, we use the TPOT library to find an optimized pipeline using genetic algorithms. We score each of the above classifiers using k-fold cross-validation. Contact: ritabratamaiti@hiretrex.com GitHub Repository: https://github.com/ritabratamaiti/Blooddonorprediction |
Tasks | |
Published | 2018-02-21 |
URL | http://arxiv.org/abs/1802.07756v4 |
http://arxiv.org/pdf/1802.07756v4.pdf | |
PWC | https://paperswithcode.com/paper/determining-the-best-classifier-for |
Repo | |
Framework | |
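The blood-donor abstract above scores each classifier with k-fold cross-validation. The following is a minimal, dependency-free sketch of that scoring loop, not the author's code: the toy 1-nearest-neighbour classifier, the synthetic data and all function names are assumptions made for illustration. In practice, scikit-learn's `cross_val_score` performs the same procedure for real estimators.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_neighbour_predict(train_X, train_y, x):
    """Toy 1-nearest-neighbour classifier (hypothetical stand-in for a real estimator)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]

def cross_val_accuracy(X, y, k=5, seed=0):
    """Return one accuracy per fold: train on k-1 folds, test on the held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        correct = sum(
            nearest_neighbour_predict(train_X, train_y, X[i]) == y[i]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return scores

# Synthetic, well-separated data (an assumption, not the Hsin-Chu data set):
# the boolean label is True when the first feature is large.
X = [[float(i), float(i % 3)] for i in range(20)]
X += [[100.0 + i, float(i % 3)] for i in range(20)]
y = [False] * 20 + [True] * 20

scores = cross_val_accuracy(X, y, k=5)
print(scores, mean(scores))
```

Reporting the per-fold scores alongside their mean makes it easier to see whether one classifier is consistently better or only better on average.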
Explainable Security
Title | Explainable Security |
Authors | Luca Viganò, Daniele Magazzeni |
Abstract | The Defense Advanced Research Projects Agency (DARPA) recently launched the Explainable Artificial Intelligence (XAI) program that aims to create a suite of new AI techniques that enable end users to understand, appropriately trust, and effectively manage the emerging generation of AI systems. In this paper, inspired by DARPA’s XAI program, we propose a new paradigm in security research: Explainable Security (XSec). We discuss the “Six Ws” of XSec (Who? What? Where? When? Why? and How?) and argue that XSec has unique and complex characteristics: XSec involves several different stakeholders (i.e., the system’s developers, analysts, users and attackers) and is multi-faceted by nature (as it requires reasoning about system model, threat model and properties of security, privacy and trust as well as about concrete attacks, vulnerabilities and countermeasures). We define a roadmap for XSec that identifies several possible research directions. |
Tasks | |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.04178v1 |
http://arxiv.org/pdf/1807.04178v1.pdf | |
PWC | https://paperswithcode.com/paper/explainable-security |
Repo | |
Framework | |
Hierarchical Attention Networks for Knowledge Base Completion via Joint Adversarial Training
Title | Hierarchical Attention Networks for Knowledge Base Completion via Joint Adversarial Training |
Authors | Chen Li, Xutan Peng, Shanghang Zhang, Jianxin Li, Lihong Wang |
Abstract | Knowledge Base (KB) completion, which aims to determine missing relations between entities, has attracted increasing attention in recent years. Most existing methods either focus on the positional relationship between an entity pair and a single relation (1-hop path) in semantic space or concentrate on the joint probability of random walks on multi-hop paths among entities. However, they do not fully consider the intrinsic relationships of all the links among entities. By observing that the single relation and multi-hop paths between the same entity pair generally contain shared/similar semantic information, this paper proposes a novel method to capture the shared features between them as the basis for inferring missing relations. To capture the shared features jointly, we develop Hierarchical Attention Networks (HANs) to automatically encode the inputs into low-dimensional vectors, and exploit two components with partially shared parameters, one for feature source discrimination and the other for determining missing relations. By jointly training the entire model with Adversarial Training (AT), our method minimizes the classification error of missing relations while ensuring that the source of the shared features is difficult to discriminate. The AT mechanism encourages our model to extract features that are both discriminative for missing relation prediction and shareable between single relation and multi-hop paths. We extensively evaluate our method on several large-scale KBs for relation completion. Experimental results show that our method consistently outperforms the baseline approaches. In addition, the hierarchical attention mechanism and the feature extractor in our model can be well interpreted and utilized in the related downstream tasks. |
Tasks | Knowledge Base Completion |
Published | 2018-10-14 |
URL | http://arxiv.org/abs/1810.06033v1 |
http://arxiv.org/pdf/1810.06033v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-attention-networks-for-knowledge |
Repo | |
Framework | |
Learning Robotic Assembly from CAD
Title | Learning Robotic Assembly from CAD |
Authors | Garrett Thomas, Melissa Chien, Aviv Tamar, Juan Aparicio Ojea, Pieter Abbeel |
Abstract | In this work, motivated by recent manufacturing trends, we investigate autonomous robotic assembly. Industrial assembly tasks require contact-rich manipulation skills, which are challenging to acquire using classical control and motion planning approaches. Consequently, robot controllers for assembly domains are presently engineered to solve a particular task, and cannot easily handle variations in the product or environment. Reinforcement learning (RL) is a promising approach for autonomously acquiring robot skills that involve contact-rich dynamics. However, RL relies on random exploration for learning a control policy, which requires many robot executions, and often gets trapped in locally suboptimal solutions. Instead, we posit that prior knowledge, when available, can improve RL performance. We exploit the fact that in modern assembly domains, geometric information about the task is readily available via the CAD design files. We propose to leverage this prior knowledge by guiding RL along a geometric motion plan, calculated using the CAD data. We show that our approach effectively improves over traditional control approaches for tracking the motion plan, and can solve assembly tasks that require high precision, even without accurate state estimation. In addition, we propose a neural network architecture that can learn to track the motion plan, and generalize the assembly controller to changes in the object positions. |
Tasks | Motion Planning |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07635v2 |
http://arxiv.org/pdf/1803.07635v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-robotic-assembly-from-cad |
Repo | |
Framework | |
Determinantal Point Processes for Coresets
Title | Determinantal Point Processes for Coresets |
Authors | Nicolas Tremblay, Simon Barthelmé, Pierre-Olivier Amblard |
Abstract | When faced with a data set too large to be processed all at once, an obvious solution is to retain only part of it. In practice this takes a wide variety of different forms, and among them “coresets” are especially appealing. A coreset is a (small) weighted sample of the original data that comes with the following guarantee: a cost function can be evaluated on the smaller set instead of the larger one, with low relative error. For some classes of problems, and via a careful choice of sampling distribution (based on the so-called “sensitivity” metric), iid random sampling has turned out to be one of the most successful methods for building coresets efficiently. However, independent samples are sometimes overly redundant, and one could hope that enforcing diversity would lead to better performance. The difficulty lies in proving coreset properties in non-iid samples. We show that the coreset property holds for samples formed with determinantal point processes (DPP). DPPs are interesting because they are a rare example of repulsive point processes with tractable theoretical properties, enabling us to prove general coreset theorems. We apply our results to both the k-means and the linear regression problems, and give extensive empirical evidence that the small additional computational cost of DPP sampling comes with superior performance over its iid counterpart. Of independent interest, we also provide analytical formulas for the sensitivity in the linear regression and 1-means cases. |
Tasks | Point Processes |
Published | 2018-03-23 |
URL | https://arxiv.org/abs/1803.08700v3 |
https://arxiv.org/pdf/1803.08700v3.pdf | |
PWC | https://paperswithcode.com/paper/determinantal-point-processes-for-coresets |
Repo | |
Framework | |
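The coreset abstract above takes iid importance sampling as its baseline. The sketch below illustrates only that baseline, not DPP sampling: it draws a weighted sample and evaluates the k-means cost on it. The sampling distribution in `sampling_probs` is an assumed proxy for sensitivity (half uniform, half proportional to squared distance from the mean), not the paper's analytical formula, and all names are hypothetical.

```python
import random

def kmeans_cost(points, weights, centers):
    """Weighted k-means cost: sum_i w_i * min_c ||x_i - c||^2."""
    return sum(
        w * min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
        for x, w in zip(points, weights)
    )

def sampling_probs(points):
    """Assumed sensitivity proxy: mix of uniform and squared distance to the mean."""
    n, d = len(points), len(points[0])
    mu = [sum(x[j] for x in points) / n for j in range(d)]
    sq = [sum((a - b) ** 2 for a, b in zip(x, mu)) for x in points]
    total = sum(sq)
    if total == 0:  # all points identical: fall back to uniform sampling
        return [1.0 / n] * n
    return [0.5 / n + 0.5 * s / total for s in sq]

def importance_sample(points, probs, m, seed=0):
    """Draw m iid points with probabilities probs; weight each by 1 / (m * p_i)."""
    rng = random.Random(seed)
    idx = rng.choices(range(len(points)), weights=probs, k=m)
    return [points[i] for i in idx], [1.0 / (m * probs[i]) for i in idx]

# Synthetic data and candidate centers (assumptions for illustration only).
rng = random.Random(1)
data = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
centers = [[0.0, 0.0], [1.0, 1.0]]

probs = sampling_probs(data)
sample, weights = importance_sample(data, probs, m=50)
full = kmeans_cost(data, [1.0] * len(data), centers)
approx = kmeans_cost(sample, weights, centers)
print(full, approx)
```

The weights 1 / (m * p_i) make the weighted cost an unbiased estimate of the full cost for any fixed set of centers. The coreset property is the stronger statement that the relative error stays small for all centers simultaneously.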
Semantically Enhanced Models for Commonsense Knowledge Acquisition
Title | Semantically Enhanced Models for Commonsense Knowledge Acquisition |
Authors | Ikhlas Alhussien, Erik Cambria, Zhang NengSheng |
Abstract | Commonsense knowledge is paramount to enable intelligent systems. It is typically characterized as implicit and ambiguous, which hinders the automation of its acquisition. To address these challenges, this paper presents semantically enhanced models that enable reasoning by resolving part of the commonsense ambiguity. The proposed models are built within a knowledge graph embedding (KGE) framework for knowledge base completion. Experimental results show the effectiveness of the new semantic models in commonsense reasoning. |
Tasks | Graph Embedding, Knowledge Base Completion, Knowledge Graph Embedding |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04708v2 |
http://arxiv.org/pdf/1809.04708v2.pdf | |
PWC | https://paperswithcode.com/paper/semantically-enhanced-models-for-commonsense |
Repo | |
Framework | |
DBSCAN++: Towards fast and scalable density clustering
Title | DBSCAN++: Towards fast and scalable density clustering |
Authors | Jennifer Jang, Heinrich Jiang |
Abstract | DBSCAN is a classical density-based clustering procedure with tremendous practical relevance. However, DBSCAN implicitly needs to compute the empirical density for each sample point, leading to a quadratic worst-case time complexity, which is too slow on large datasets. We propose DBSCAN++, a simple modification of DBSCAN which only requires computing the densities for a chosen subset of points. We show empirically that, compared to traditional DBSCAN, DBSCAN++ can provide not only competitive performance but also added robustness in the bandwidth hyperparameter while taking a fraction of the runtime. We also present statistical consistency guarantees showing the trade-off between computational cost and estimation rates. Surprisingly, up to a certain point, we can enjoy the same estimation rates while lowering computational cost, showing that DBSCAN++ is a sub-quadratic algorithm that attains minimax optimal rates for level-set estimation, a quality that may be of independent interest. |
Tasks | |
Published | 2018-10-31 |
URL | https://arxiv.org/abs/1810.13105v3 |
https://arxiv.org/pdf/1810.13105v3.pdf | |
PWC | https://paperswithcode.com/paper/dbscan-towards-fast-and-scalable-density |
Repo | |
Framework | |
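The DBSCAN++ abstract above describes computing densities only for a chosen subset of points. The sketch below is one plain-Python reading of that idea, not the authors' implementation: it samples `m` points uniformly, keeps those that are core points with respect to the full data set, links each core point to everything in its `eps`-ball, and returns the connected components. Uniform sampling, the union-find bookkeeping and all names are assumptions.

```python
import random

def dbscan_pp(points, eps, min_pts, m, seed=0):
    """Return one cluster label per point (-1 = noise); densities are computed for m sampled points only."""
    n = len(points)

    def close(i, j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps * eps

    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sample = random.Random(seed).sample(range(n), m)
    covered = set()
    for s in sample:
        # One O(n) range query per sampled point, so O(n * m) overall.
        neigh = [j for j in range(n) if close(s, j)]
        if len(neigh) >= min_pts:  # s is a core point w.r.t. the full data set
            for j in neigh:
                parent[find(j)] = find(s)
                covered.add(j)

    roots, labels = {}, []
    for i in range(n):
        if i in covered:
            labels.append(roots.setdefault(find(i), len(roots)))
        else:
            labels.append(-1)
    return labels

# Two synthetic blobs on a line plus one far-away outlier (illustrative data).
pts = [(i * 0.1, 0.0) for i in range(10)]
pts += [(10 + i * 0.1, 0.0) for i in range(10)]
pts.append((50.0, 50.0))

print(dbscan_pp(pts, eps=0.5, min_pts=3, m=len(pts) // 2))
```

Setting `m` equal to the number of points makes every point a candidate core point. Smaller values of `m` trade clustering detail for fewer range queries.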
Customer Sharing in Economic Networks with Costs
Title | Customer Sharing in Economic Networks with Costs |
Authors | Bin Li, Dong Hao, Dengji Zhao, Tao Zhou |
Abstract | In an economic market, sellers, infomediaries and customers constitute an economic network. Each seller has her own customer group and the seller’s private customers are unobservable to other sellers. Therefore, a seller can only sell commodities among her own customers unless other sellers or infomediaries share her sale information to their customer groups. However, a seller is not incentivized to share others’ sale information by default, which leads to inefficient resource allocation and limited revenue for the sale. To tackle this problem, we develop a novel mechanism called customer sharing mechanism (CSM) which incentivizes all sellers to share each other’s sale information to their private customer groups. Furthermore, CSM also incentivizes all customers to truthfully participate in the sale. In the end, CSM not only allocates the commodities efficiently but also optimizes the seller’s revenue. |
Tasks | |
Published | 2018-07-18 |
URL | http://arxiv.org/abs/1807.06822v1 |
http://arxiv.org/pdf/1807.06822v1.pdf | |
PWC | https://paperswithcode.com/paper/customer-sharing-in-economic-networks-with |
Repo | |
Framework | |
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Title | Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis |
Authors | Tal Ben-Nun, Torsten Hoefler |
Abstract | Deep Neural Networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge and techniques range from distributed algorithms to low-level circuit design. In this survey, we describe the problem from a theoretical perspective, followed by approaches for its parallelization. We present trends in DNN architectures and the resulting implications on parallelization strategies. We then review and model the different types of concurrency in DNNs: from the single operator, through parallelism in network inference and training, to distributed deep learning. We discuss asynchronous stochastic optimization, distributed system architectures, communication schemes, and neural architecture search. Based on those approaches, we extrapolate potential directions for parallelism in deep learning. |
Tasks | Neural Architecture Search, Stochastic Optimization |
Published | 2018-02-26 |
URL | http://arxiv.org/abs/1802.09941v2 |
http://arxiv.org/pdf/1802.09941v2.pdf | |
PWC | https://paperswithcode.com/paper/demystifying-parallel-and-distributed-deep |
Repo | |
Framework | |
Understanding Generalization and Optimization Performance of Deep CNNs
Title | Understanding Generalization and Optimization Performance of Deep CNNs |
Authors | Pan Zhou, Jiashi Feng |
Abstract | This work aims to provide an understanding of the remarkable success of deep convolutional neural networks (CNNs) by theoretically analyzing their generalization performance and establishing optimization guarantees for gradient descent based training algorithms. Specifically, for a CNN model consisting of $l$ convolutional layers and one fully connected layer, we prove that its generalization error is bounded by $\mathcal{O}(\sqrt{\theta\widetilde{\varrho}/n})$ where $\theta$ denotes the degrees of freedom of the network parameters and $\widetilde{\varrho}=\mathcal{O}(\log(\prod_{i=1}^{l}b_{i}(k_{i}-s_{i}+1)/p)+\log(b_{l+1}))$ encapsulates architecture parameters including the kernel size $k_{i}$, stride $s_{i}$, pooling size $p$ and parameter magnitude $b_{i}$. To the best of our knowledge, this is the first generalization bound that only depends on $\mathcal{O}(\log(\prod_{i=1}^{l+1}b_{i}))$, tighter than existing ones that all involve an exponential term like $\mathcal{O}(\prod_{i=1}^{l+1}b_{i})$. Besides, we prove that for an arbitrary gradient descent algorithm, the approximate stationary point computed by minimizing the empirical risk is also an approximate stationary point of the population risk. This explains why gradient descent training algorithms usually perform sufficiently well in practice. Furthermore, we prove the one-to-one correspondence and convergence guarantees for the non-degenerate stationary points between the empirical and population risks. It implies that the computed local minimum for the empirical risk is also close to a local minimum for the population risk, thus ensuring the good generalization performance of CNNs. |
Tasks | |
Published | 2018-05-28 |
URL | http://arxiv.org/abs/1805.10767v1 |
http://arxiv.org/pdf/1805.10767v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-generalization-and-optimization |
Repo | |
Framework | |
Data-dependent Learning of Symmetric/Antisymmetric Relations for Knowledge Base Completion
Title | Data-dependent Learning of Symmetric/Antisymmetric Relations for Knowledge Base Completion |
Authors | Hitoshi Manabe, Katsuhiko Hayashi, Masashi Shimbo |
Abstract | Embedding-based methods for knowledge base completion (KBC) learn representations of entities and relations in a vector space, along with the scoring function to estimate the likelihood of relations between entities. The learnable class of scoring functions is designed to be expressive enough to cover a variety of real-world relations, but this expressiveness comes at the cost of an increased number of parameters. In particular, parameters in these methods are superfluous for relations that are either symmetric or antisymmetric. To mitigate this problem, we propose a new L1 regularizer for Complex Embeddings, which is one of the state-of-the-art embedding-based methods for KBC. This regularizer promotes symmetry or antisymmetry of the scoring function on a relation-by-relation basis, in accordance with the observed data. Our empirical evaluation shows that the proposed method outperforms the original Complex Embeddings and other baseline methods on the FB15k dataset. |
Tasks | Knowledge Base Completion |
Published | 2018-08-25 |
URL | http://arxiv.org/abs/1808.08361v1 |
http://arxiv.org/pdf/1808.08361v1.pdf | |
PWC | https://paperswithcode.com/paper/data-dependent-learning-of |
Repo | |
Framework | |
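The abstract above concerns symmetry and antisymmetry of the Complex Embeddings scoring function. The sketch below shows why the two cases correspond to a purely real or purely imaginary relation vector. The score is the standard ComplEx form, Re(sum_k e_s[k] * w_r[k] * conj(e_o[k])). The penalty in `multiplicative_l1` is only an assumed example of an L1-style term that vanishes in exactly those two cases; it is not taken from the abstract, and the vectors are made up.

```python
def complex_score(e_s, w_r, e_o):
    """ComplEx score: real part of sum_k e_s[k] * w_r[k] * conj(e_o[k])."""
    return sum(s * w * o.conjugate() for s, w, o in zip(e_s, w_r, e_o)).real

def multiplicative_l1(w_r):
    """Assumed penalty: sum_k |Re(w_k) * Im(w_k)|, zero when each coordinate is purely real or purely imaginary."""
    return sum(abs(w.real * w.imag) for w in w_r)

# Made-up entity embeddings.
e_s = [1 + 2j, 0.5 - 1j]
e_o = [-1 + 0.5j, 2 + 1j]

w_sym = [2 + 0j, -1 + 0j]    # purely real relation vector: symmetric score
w_anti = [0 + 3j, 0 - 0.5j]  # purely imaginary relation vector: antisymmetric score

print(complex_score(e_s, w_sym, e_o), complex_score(e_o, w_sym, e_s))
print(complex_score(e_s, w_anti, e_o), complex_score(e_o, w_anti, e_s))
print(multiplicative_l1(w_sym), multiplicative_l1(w_anti), multiplicative_l1([1 + 1j]))
```

A relation vector with both real and imaginary parts in the same coordinate uses parameters that a purely symmetric or purely antisymmetric relation does not need. These are the superfluous parameters the abstract refers to.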
Domain Adversarial Training for Accented Speech Recognition
Title | Domain Adversarial Training for Accented Speech Recognition |
Authors | Sining Sun, Ching-Feng Yeh, Mei-Yuh Hwang, Mari Ostendorf, Lei Xie |
Abstract | In this paper, we propose a domain adversarial training (DAT) algorithm to alleviate the accented speech recognition problem. In order to reduce the mismatch between labeled source domain data (“standard” accent) and unlabeled target domain data (with heavy accents), we augment the learning objective for a Kaldi TDNN network with a domain adversarial training (DAT) objective to encourage the model to learn accent-invariant features. In experiments with three Mandarin accents, we show that DAT yields up to 7.45% relative character error rate reduction when we do not have transcriptions of the accented speech, compared with the baseline trained on standard accent data only. We also find a benefit from DAT when used in combination with training from automatic transcriptions on the accented data. Furthermore, we find that DAT is superior to multi-task learning for accented speech recognition. |
Tasks | Accented Speech Recognition, Multi-Task Learning, Speech Recognition |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02786v1 |
http://arxiv.org/pdf/1806.02786v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-adversarial-training-for-accented |
Repo | |
Framework | |
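The domain adversarial training abstract above augments the recognition objective with an adversarial domain objective. One common way to write such an objective is the gradient-reversal formulation sketched below. It is given as an assumption about the general technique, not as the paper's exact loss. Here $\theta_f$, $\theta_y$ and $\theta_d$ denote the parameters of a shared feature extractor, the senone (label) classifier and the accent (domain) classifier, $L_y$ and $L_d$ are their losses, and $\lambda$ is a trade-off weight. All of these symbols are introduced here for illustration.

```latex
\begin{aligned}
E(\theta_f,\theta_y,\theta_d) &= L_y(\theta_f,\theta_y) - \lambda\, L_d(\theta_f,\theta_d),\\
(\hat\theta_f,\hat\theta_y) &= \arg\min_{\theta_f,\theta_y} E(\theta_f,\theta_y,\hat\theta_d),\\
\hat\theta_d &= \arg\max_{\theta_d} E(\hat\theta_f,\hat\theta_y,\theta_d).
\end{aligned}
```

The domain classifier is trained to tell accents apart, while the feature extractor is pushed in the opposite direction. This is what encourages accent-invariant features.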