Paper Group AWR 277
Challenges of Using Text Classifiers for Causal Inference
Title | Challenges of Using Text Classifiers for Causal Inference |
Authors | Zach Wood-Doughty, Ilya Shpitser, Mark Dredze |
Abstract | Causal understanding is essential for many kinds of decision-making, but causal inference from observational data has typically only been applied to structured, low-dimensional datasets. While text classifiers produce low-dimensional outputs, their use in causal inference has not previously been studied. To facilitate causal analyses based on language data, we consider the role that text classifiers can play in causal inference through established modeling mechanisms from the causality literature on missing data and measurement error. We demonstrate how to conduct causal analyses using text classifiers on simulated and Yelp data, and discuss the opportunities and challenges of future work that uses text data in causal inference. |
Tasks | Causal Inference, Decision Making |
Published | 2018-10-01 |
URL | http://arxiv.org/abs/1810.00956v1 |
PDF | http://arxiv.org/pdf/1810.00956v1.pdf |
PWC | https://paperswithcode.com/paper/challenges-of-using-text-classifiers-for |
Repo | https://github.com/zachwooddoughty/emnlp2018-causal |
Framework | none |
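The measurement-error machinery the abstract alludes to can be made concrete with the classic matrix-adjustment correction: if a text classifier's confusion matrix is estimated on labeled validation data, a class-prevalence estimate computed from its predictions can be de-biased by inverting that matrix. A minimal numpy sketch with made-up numbers, not the paper's own procedure:

```python
# Hedged sketch: matrix-adjustment correction for classifier
# misclassification. All numbers are illustrative.
import numpy as np

# Rows = true class, columns = predicted class: P(pred = j | true = i),
# estimated on a labeled validation set.
confusion = np.array([[0.9, 0.1],
                      [0.2, 0.8]])

# Prevalence of the classifier's predicted labels on new, unlabeled data.
observed = np.array([0.35, 0.65])

# observed = confusion.T @ true_prevalence, so invert to de-bias it.
corrected = np.linalg.solve(confusion.T, observed)
print(corrected)  # approx [0.214, 0.786]: corrected class prevalence
```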
From Soft Classifiers to Hard Decisions: How fair can we be?
Title | From Soft Classifiers to Hard Decisions: How fair can we be? |
Authors | Ran Canetti, Aloni Cohen, Nishanth Dikkala, Govind Ramnarayan, Sarah Scheffler, Adam Smith |
Abstract | A popular methodology for building binary decision-making classifiers in the presence of imperfect information is to first construct a non-binary “scoring” classifier that is calibrated over all protected groups, and then to post-process this score to obtain a binary decision. We study the feasibility of achieving various fairness properties by post-processing calibrated scores, and then show that deferring post-processors allow for more fairness conditions to hold on the final decision. Specifically, we show: 1. There does not exist a general way to post-process a calibrated classifier to equalize protected groups’ positive or negative predictive value (PPV or NPV). For certain “nice” calibrated classifiers, either PPV or NPV can be equalized when the post-processor uses different thresholds across protected groups, though there exist distributions of calibrated scores for which the two measures cannot both be equalized. When the post-processing consists of a single global threshold across all groups, natural fairness properties, such as equalizing PPV in a nontrivial way, do not hold even for “nice” classifiers. 2. When the post-processing is allowed to “defer” on some decisions (that is, to avoid making a decision by handing off some examples to a separate process), then for the non-deferred decisions, the resulting classifier can be made to equalize PPV, NPV, false positive rate (FPR) and false negative rate (FNR) across the protected groups. This suggests a way to partially evade the impossibility results of Chouldechova and Kleinberg et al., which preclude equalizing all of these measures simultaneously. We also present different deferring strategies and show how they affect the fairness properties of the overall system. We evaluate our post-processing techniques using the COMPAS data set from 2016. |
Tasks | Decision Making |
Published | 2018-10-03 |
URL | http://arxiv.org/abs/1810.02003v2 |
PDF | http://arxiv.org/pdf/1810.02003v2.pdf |
PWC | https://paperswithcode.com/paper/from-soft-classifiers-to-hard-decisions-how |
Repo | https://github.com/nishanthdikkala/postprocessing-deferrals |
Framework | none |
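The deferring post-processors in the abstract can be pictured as a pair of thresholds per group with a "defer band" in between: only scores outside the band receive a hard decision. A toy sketch with illustrative thresholds, not values derived from the paper's analysis:

```python
# Toy sketch of a deferring post-processor: hard decisions only outside a
# per-group "defer band". Threshold values are illustrative.
import numpy as np

def decide_with_deferrals(scores, lo, hi):
    """Map calibrated scores to accept / reject / defer decisions."""
    decisions = np.full(len(scores), "defer", dtype=object)
    decisions[scores >= hi] = "accept"   # confident positive region
    decisions[scores <= lo] = "reject"   # confident negative region
    return decisions                     # the band (lo, hi) is handed off

group_a_scores = np.array([0.05, 0.40, 0.55, 0.92])
print(decide_with_deferrals(group_a_scores, lo=0.2, hi=0.8))
# ['reject' 'defer' 'defer' 'accept']
```

Because each group can get its own (lo, hi) pair, the non-deferred decisions gain extra degrees of freedom with which to equalize PPV, NPV, FPR and FNR.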
EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE
Title | EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE |
Authors | Chao Ma, Sebastian Tschiatschek, Konstantina Palla, José Miguel Hernández-Lobato, Sebastian Nowozin, Cheng Zhang |
Abstract | Many real-life decision-making situations allow further relevant information to be acquired at a specific cost, for example, in assessing the health status of a patient we may decide to take additional measurements such as diagnostic tests or imaging scans before making a final assessment. Acquiring more relevant information enables better decision making, but may be costly. How can we trade off the desire to make good decisions by acquiring further information with the cost of performing that acquisition? To this end, we propose a principled framework, named EDDI (Efficient Dynamic Discovery of high-value Information), based on the theory of Bayesian experimental design. In EDDI, we propose a novel partial variational autoencoder (Partial VAE) to predict missing data entries probabilistically given any subset of the observed ones, and combine it with an acquisition function that maximizes expected information gain on a set of target variables. We show cost reduction at the same decision quality and improved decision quality at the same cost in multiple machine learning benchmarks and two real-world health-care applications. |
Tasks | Decision Making |
Published | 2018-09-28 |
URL | https://arxiv.org/abs/1809.11142v4 |
PDF | https://arxiv.org/pdf/1809.11142v4.pdf |
PWC | https://paperswithcode.com/paper/eddi-efficient-dynamic-discovery-of-high |
Repo | https://github.com/microsoft/EDDI |
Framework | tf |
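EDDI's acquisition loop greedily measures the feature with the highest expected information gain on the targets. The sketch below shows only the greedy loop; the information-gain score is a stand-in placeholder, since the real computation requires samples from the trained Partial VAE:

```python
# Sketch of EDDI's greedy acquisition loop. The information-gain estimate
# is a placeholder for the Partial-VAE-based expected information gain.
import numpy as np

def expected_info_gain(feature, observed):
    """Stand-in acquisition score; replace with the Partial-VAE estimate."""
    return np.random.default_rng(feature).random()  # deterministic dummy

def acquire_greedily(n_features, budget):
    observed = set()
    for _ in range(budget):
        candidates = [i for i in range(n_features) if i not in observed]
        # Measure the feature whose acquisition is expected to be most
        # informative about the target, then repeat with it observed.
        best = max(candidates, key=lambda i: expected_info_gain(i, observed))
        observed.add(best)
    return observed

print(acquire_greedily(n_features=10, budget=3))
```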
Better Safe than Sorry: Evidence Accumulation Allows for Safe Reinforcement Learning
Title | Better Safe than Sorry: Evidence Accumulation Allows for Safe Reinforcement Learning |
Authors | Akshat Agarwal, Abhinau Kumar V, Kyle Dunovan, Erik Peterson, Timothy Verstynen, Katia Sycara |
Abstract | In the real world, agents often have to operate in situations with incomplete information, limited sensing capabilities, and inherently stochastic environments, making individual observations incomplete and unreliable. Moreover, in many situations it is preferable to delay a decision rather than run the risk of making a bad decision. In such situations it is necessary to aggregate information before taking an action; however, most state-of-the-art reinforcement learning (RL) algorithms are biased towards taking actions at every time step, even if the agent is not particularly confident in its chosen action. This lack of caution can lead the agent to make critical mistakes, regardless of prior experience and acclimation to the environment. Motivated by theories of dynamic resolution of uncertainty during decision making in biological brains, we propose a simple accumulator module which accumulates evidence in favor of each possible decision, encodes uncertainty as a dynamic competition between actions, and acts on the environment only when it is sufficiently confident in the chosen action. The agent makes no decision by default, and the burden of proof to make a decision falls on the policy to accrue evidence strongly in favor of a single decision. Our results show that this accumulator module achieves near-optimal performance on a simple guessing game, far outperforming deep recurrent networks using traditional, forced action selection policies. |
Tasks | Decision Making |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.09147v1 |
PDF | http://arxiv.org/pdf/1809.09147v1.pdf |
PWC | https://paperswithcode.com/paper/better-safe-than-sorry-evidence-accumulation |
Repo | https://github.com/susumuota/gym-modeestimation |
Framework | none |
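The accumulator module can be pictured as a race-to-threshold process: per-step evidence is summed for each action, and the agent commits only when the leading action's margin over the runner-up clears a confidence threshold. A toy numpy sketch under that assumption; the authors' exact competition dynamics may differ:

```python
# Race-to-threshold sketch of an evidence accumulator. A toy model of the
# idea, not the authors' exact module.
import numpy as np

def accumulate_until_confident(logit_stream, threshold):
    evidence = None
    for t, logits in enumerate(logit_stream):
        evidence = logits if evidence is None else evidence + logits
        margin = evidence.max() - np.partition(evidence, -2)[-2]
        if margin >= threshold:               # sufficiently confident
            return int(evidence.argmax()), t  # chosen action, decision time
    return None, len(logit_stream) - 1        # default: no decision

rng = np.random.default_rng(0)
noisy_evidence = [rng.normal([0.3, 0.0], 0.5) for _ in range(100)]
print(accumulate_until_confident(noisy_evidence, threshold=5.0))
```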
Temporal Convolutional Memory Networks for Remaining Useful Life Estimation of Industrial Machinery
Title | Temporal Convolutional Memory Networks for Remaining Useful Life Estimation of Industrial Machinery |
Authors | Lahiru Jayasinghe, Tharaka Samarasinghe, Chau Yuen, Jenny Chen Ni Low, Shuzhi Sam Ge |
Abstract | Accurately estimating the remaining useful life (RUL) of industrial machinery is beneficial in many real-world applications. Estimation techniques have mainly utilized linear models or neural network based approaches with a focus on short-term time dependencies. This paper introduces a system model that incorporates temporal convolutions with both long-term and short-term time dependencies. The proposed network learns salient features and complex temporal variations in sensor values, and predicts the RUL. A data augmentation method is used for increased accuracy. The proposed method is compared with several state-of-the-art algorithms on publicly available datasets. It demonstrates promising results, with superior results for datasets obtained from complex environments. |
Tasks | Data Augmentation |
Published | 2018-10-12 |
URL | http://arxiv.org/abs/1810.05644v2 |
PDF | http://arxiv.org/pdf/1810.05644v2.pdf |
PWC | https://paperswithcode.com/paper/temporal-convolutional-memory-networks-for |
Repo | https://github.com/LahiruJayasinghe/RUL-Net |
Framework | tf |
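For context, RUL models on public run-to-failure datasets such as NASA's C-MAPSS are commonly trained against a piecewise-linear target: RUL is capped early in life and decays linearly to zero at failure. A small sketch of that common preprocessing step, not necessarily the paper's exact augmentation:

```python
# Common RUL preprocessing: piecewise-linear target that caps early-life
# RUL and decays linearly to zero at failure. max_rul = 130 is a
# conventional choice, not necessarily the paper's setting.
import numpy as np

def piecewise_rul(cycle_count, max_rul=130):
    rul = np.arange(cycle_count - 1, -1, -1)  # cycles remaining until failure
    return np.minimum(rul, max_rul)           # cap the healthy early phase

print(piecewise_rul(10, max_rul=5))  # [5 5 5 5 5 4 3 2 1 0]
```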
Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents
Title | Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents |
Authors | Aditya Siddhant, Anuj Goyal, Angeliki Metallinou |
Abstract | User interaction with voice-powered agents generates large amounts of unlabeled utterances. In this paper, we explore techniques to efficiently transfer the knowledge from these unlabeled utterances to improve model performance on Spoken Language Understanding (SLU) tasks. We use Embeddings from Language Model (ELMo) to take advantage of unlabeled data by learning contextualized word representations. Additionally, we propose ELMo-Light (ELMoL), a faster and simpler unsupervised pre-training method for SLU. Our findings suggest that unsupervised pre-training on a large corpus of unlabeled utterances leads to significantly better SLU performance compared to training from scratch, and that it can even outperform conventional supervised transfer. Additionally, we show that the gains from unsupervised transfer techniques can be further improved by supervised transfer. The improvements are more pronounced in low resource settings, and when using only 1000 labeled in-domain samples, our techniques match the performance of training from scratch on 10-15x more labeled in-domain data. |
Tasks | Language Modelling, Spoken Language Understanding, Transfer Learning |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05370v1 |
PDF | http://arxiv.org/pdf/1811.05370v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-transfer-learning-for-spoken |
Repo | https://github.com/sxjscience/GluonNLP-Slot-Filling |
Framework | mxnet |
An Improved Evaluation Framework for Generative Adversarial Networks
Title | An Improved Evaluation Framework for Generative Adversarial Networks |
Authors | Shaohui Liu, Yi Wei, Jiwen Lu, Jie Zhou |
Abstract | In this paper, we propose an improved quantitative evaluation framework for Generative Adversarial Networks (GANs) on generating domain-specific images, where we improve conventional evaluation methods on two levels: the feature representation and the evaluation metric. Unlike most existing evaluation frameworks which transfer the representation of ImageNet inception model to map images onto the feature space, our framework uses a specialized encoder to acquire fine-grained domain-specific representation. Moreover, for datasets with multiple classes, we propose Class-Aware Frechet Distance (CAFD), which employs a Gaussian mixture model on the feature space to better fit the multi-manifold feature distribution. Experiments and analysis on both the feature level and the image level were conducted to demonstrate improvements of our proposed framework over the recently proposed state-of-the-art FID method. To the best of our knowledge, we are the first to provide counter examples where FID gives inconsistent results with human judgments. It is shown in the experiments that our framework is able to overcome the shortcomings of FID and improve robustness. Code will be made available. |
Tasks | |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07474v3 |
PDF | http://arxiv.org/pdf/1803.07474v3.pdf |
PWC | https://paperswithcode.com/paper/an-improved-evaluation-framework-for |
Repo | https://github.com/B1ueber2y/CAFD |
Framework | tf |
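The Frechet distance between Gaussians fitted to real and generated features is what underlies FID; CAFD's idea is to make it class-aware. The sketch below averages per-class Frechet distances, which simplifies the paper's Gaussian-mixture formulation:

```python
# Sketch of a class-aware Frechet distance: fit one Gaussian per class in
# feature space and average the per-class distances. A simplification of
# the paper's Gaussian-mixture formulation of CAFD.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, cov1, mu2, cov2):
    covmean = sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):      # numerical noise can yield tiny
        covmean = covmean.real        # imaginary parts; drop them
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(cov1 + cov2 - 2 * covmean))

def class_aware_fd(real_feats, fake_feats, real_labels, fake_labels):
    dists = []
    for c in np.unique(real_labels):
        r = real_feats[real_labels == c]
        f = fake_feats[fake_labels == c]
        dists.append(frechet_distance(r.mean(0), np.cov(r, rowvar=False),
                                      f.mean(0), np.cov(f, rowvar=False)))
    return float(np.mean(dists))

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 8))
fake = rng.normal(0.1, 1.0, size=(200, 8))
labels = rng.integers(0, 4, size=200)
print(class_aware_fd(real, fake, labels, labels))
```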
Generating Adversarial Examples with Adversarial Networks
Title | Generating Adversarial Examples with Adversarial Networks |
Authors | Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, Dawn Song |
Abstract | Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. Once the generator is trained, it can generate adversarial perturbations efficiently for any instance, potentially accelerating adversarial training as a defense. We apply AdvGAN in both semi-whitebox and black-box attack settings, and the adversarial examples it generates achieve high attack success rates under state-of-the-art defenses, placing first on a public MNIST black-box attack challenge. |
Tasks | |
Published | 2018-01-08 |
URL | http://arxiv.org/abs/1801.02610v5 |
PDF | http://arxiv.org/pdf/1801.02610v5.pdf |
PWC | https://paperswithcode.com/paper/generating-adversarial-examples-with |
Repo | https://github.com/niharikajainn/adv_gan_keras |
Framework | tf |
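The generator in AdvGAN is trained against a composite objective: look realistic to a discriminator, mislead the target model, and keep the perturbation small via a hinge on its norm. A hedged PyTorch sketch of such an objective (untargeted variant; the loss weights and bound c are illustrative, not the paper's values):

```python
# Hedged PyTorch sketch of an AdvGAN-style generator objective. Weights
# alpha, beta and the bound c are illustrative placeholders.
import torch
import torch.nn.functional as F

def advgan_generator_loss(perturbation, disc_fake, target_logits, y_true,
                          c=0.3, alpha=1.0, beta=10.0):
    # GAN term: perturbed inputs should look real to the discriminator.
    loss_gan = F.binary_cross_entropy_with_logits(
        disc_fake, torch.ones_like(disc_fake))
    # Adversarial term: push the target model away from the true labels.
    loss_adv = -F.cross_entropy(target_logits, y_true)
    # Hinge term: penalize perturbations whose L2 norm exceeds c.
    norms = perturbation.flatten(1).norm(dim=1)
    loss_hinge = torch.clamp(norms - c, min=0).mean()
    return loss_gan + beta * loss_adv + alpha * loss_hinge

disc_out = torch.randn(4, 1)
target_out = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
perturb = 0.1 * torch.randn(4, 3, 32, 32)
print(advgan_generator_loss(perturb, disc_out, target_out, labels))
```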
Lessons from Natural Language Inference in the Clinical Domain
Title | Lessons from Natural Language Inference in the Clinical Domain |
Authors | Alexey Romanov, Chaitanya Shivade |
Abstract | State of the art models using deep neural networks have become very good in learning an accurate mapping from inputs to outputs. However, they still lack generalization capabilities in conditions that differ from the ones encountered during training. This is even more challenging in specialized and knowledge-intensive domains, where training data is limited. To address this gap, we introduce MedNLI, a dataset annotated by doctors, performing a natural language inference task (NLI), grounded in the medical history of patients. We present strategies to: 1) leverage transfer learning using datasets from the open domain (e.g., SNLI), and 2) incorporate domain knowledge from external data and lexical sources (e.g., medical terminologies). Our results demonstrate performance gains using both strategies. |
Tasks | Natural Language Inference, Transfer Learning |
Published | 2018-08-21 |
URL | http://arxiv.org/abs/1808.06752v2 |
PDF | http://arxiv.org/pdf/1808.06752v2.pdf |
PWC | https://paperswithcode.com/paper/lessons-from-natural-language-inference-in |
Repo | https://github.com/jgc128/mednli_baseline |
Framework | pytorch |
RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets
Title | RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets |
Authors | Liping Li, Wei Xu, Tianyi Chen, Georgios B. Giannakis, Qing Ling |
Abstract | In this paper, we propose a class of robust stochastic subgradient methods for distributed learning from heterogeneous datasets in the presence of an unknown number of Byzantine workers. The Byzantine workers, during the learning process, may send arbitrary incorrect messages to the master due to data corruptions, communication failures or malicious attacks, and consequently bias the learned model. The key to the proposed methods is a regularization term incorporated with the objective function so as to robustify the learning task and mitigate the negative effects of Byzantine attacks. The resultant subgradient-based algorithms are termed Byzantine-Robust Stochastic Aggregation methods, justifying our acronym RSA used henceforth. In contrast to most of the existing algorithms, RSA does not rely on the assumption that the data are independent and identically distributed (i.i.d.) on the workers, and hence fits a wider class of applications. Theoretically, we show that: i) RSA converges to a near-optimal solution with the learning error dependent on the number of Byzantine workers; ii) the convergence rate of RSA under Byzantine attacks is the same as that of the stochastic gradient descent method, which is free of Byzantine attacks. Numerically, experiments on a real dataset corroborate the competitive performance of RSA and its complexity reduction compared to state-of-the-art alternatives. |
Tasks | |
Published | 2018-11-09 |
URL | https://arxiv.org/abs/1811.03761v2 |
PDF | https://arxiv.org/pdf/1811.03761v2.pdf |
PWC | https://paperswithcode.com/paper/rsa-byzantine-robust-stochastic-aggregation |
Repo | https://github.com/Liepill/RSA-Byzantine |
Framework | none |
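The robustness of RSA comes from an l1-style consensus penalty: each worker influences the master only through the sign of its disagreement, so a bounded number of Byzantine workers can inject only bounded bias. A simplified numpy sketch of the master-side update; the full algorithm also includes worker updates and the master's own gradient term, both omitted here:

```python
# Simplified numpy sketch of the master-side, l1-regularized RSA update.
import numpy as np

def rsa_l1_master_step(x, worker_models, lr, lam):
    # Each worker contributes only sign(x - w): bounded per-worker influence.
    penalty = sum(np.sign(x - w) for w in worker_models)
    return x - lr * lam * penalty

x = np.zeros(3)
honest = [np.array([1.0, 1.0, 1.0])] * 8
byzantine = [np.array([1e6, -1e6, 1e6])] * 2   # arbitrary corrupted messages
for _ in range(2000):
    x = rsa_l1_master_step(x, honest + byzantine, lr=0.01, lam=0.1)
print(x)  # stays near the honest consensus [1, 1, 1] despite the outliers
```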
Formalized Conceptual Spaces with a Geometric Representation of Correlations
Title | Formalized Conceptual Spaces with a Geometric Representation of Correlations |
Authors | Lucas Bechberger, Kai-Uwe Kühnberger |
Abstract | The highly influential framework of conceptual spaces provides a geometric way of representing knowledge. Instances are represented by points in a similarity space and concepts are represented by convex regions in this space. After pointing out a problem with the convexity requirement, we propose a formalization of conceptual spaces based on fuzzy star-shaped sets. Our formalization uses a parametric definition of concepts and extends the original framework by adding means to represent correlations between different domains in a geometric way. Moreover, we define various operations for our formalization, both for creating new concepts from old ones and for measuring relations between concepts. We present an illustrative toy example and sketch a research project on concept formation that is based on both our formalization and its implementation. |
Tasks | |
Published | 2018-01-11 |
URL | https://arxiv.org/abs/1801.03929v2 |
PDF | https://arxiv.org/pdf/1801.03929v2.pdf |
PWC | https://paperswithcode.com/paper/formalized-conceptual-spaces-with-a-geometric |
Repo | https://github.com/lbechberger/ConceptualSpaces |
Framework | none |
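The fuzzy star-shaped concepts can be pictured as a crisp core region plus a membership function that decays exponentially with distance from that core. A toy sketch with an axis-aligned box as the core; the paper's formalization allows more general star-shaped cores and per-domain distance weights:

```python
# Toy sketch of a fuzzified concept: full membership inside a crisp core,
# exponential decay with distance outside it.
import numpy as np

def membership(x, core_min, core_max, mu0=1.0, c=2.0):
    below = np.maximum(0.0, core_min - x)   # distance below the box, per dim
    above = np.maximum(0.0, x - core_max)   # distance above the box, per dim
    d = np.linalg.norm(below + above)       # distance from x to the core
    return mu0 * np.exp(-c * d)

apple_core = (np.array([0.2, 0.4]), np.array([0.5, 0.7]))  # toy "apple"
print(membership(np.array([0.3, 0.5]), *apple_core))  # 1.0 (inside core)
print(membership(np.array([0.9, 0.9]), *apple_core))  # < 1.0 (outside)
```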
HENet: A Highly Efficient Convolutional Neural Networks Optimized for Accuracy, Speed and Storage
Title | HENet: A Highly Efficient Convolutional Neural Networks Optimized for Accuracy, Speed and Storage |
Authors | Qiuyu Zhu, Ruixin Zhang |
Abstract | In order to enhance the real-time performance of convolutional neural networks (CNNs), more and more researchers are focusing on improving their efficiency. Based on an analysis of several CNN architectures, such as ResNet, DenseNet and ShuffleNet, we combine their advantages and propose a very efficient model called Highly Efficient Networks (HENet). The new architecture uses an unusual way to combine the group convolution and channel shuffle introduced in ShuffleNet. Inspired by ResNet and DenseNet, we also propose a new way to use element-wise addition and concatenation connections within each block. In order to make greater use of feature maps, pooling operations are removed from HENet. The experiments show that our model's efficiency is more than 1 times higher than that of ShuffleNet on many open source datasets, such as CIFAR-10/100 and SVHN. |
Tasks | |
Published | 2018-03-07 |
URL | http://arxiv.org/abs/1803.02742v2 |
PDF | http://arxiv.org/pdf/1803.02742v2.pdf |
PWC | https://paperswithcode.com/paper/heneta-highly-efficient-convolutional-neural |
Repo | https://github.com/anlongstory/HENet |
Framework | none |
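The channel shuffle operation that HENet borrows from ShuffleNet interleaves channels across groups so that grouped convolutions can exchange information; it amounts to a reshape-transpose-reshape. A numpy sketch on an NCHW tensor:

```python
# Channel shuffle as popularized by ShuffleNet: reshape-transpose-reshape.
import numpy as np

def channel_shuffle(x, groups):
    b, c, h, w = x.shape
    assert c % groups == 0
    return (x.reshape(b, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(b, c, h, w))

x = np.arange(8).reshape(1, 8, 1, 1)
print(channel_shuffle(x, groups=2).ravel())  # [0 4 1 5 2 6 3 7]
```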
DeepGRU: Deep Gesture Recognition Utility
Title | DeepGRU: Deep Gesture Recognition Utility |
Authors | Mehran Maghoumi, Joseph J. LaViola Jr |
Abstract | We propose DeepGRU, a novel end-to-end deep network model informed by recent developments in deep learning for gesture and action recognition, that is streamlined and device-agnostic. DeepGRU, which uses only raw skeleton, pose or vector data, is quickly understood, implemented, and trained, and yet achieves state-of-the-art results on challenging datasets. At the heart of our method lies a set of stacked gated recurrent units (GRU), two fully-connected layers and a novel global attention model. We evaluate our method on seven publicly available datasets, containing varying numbers of samples and spanning a broad range of interactions (full-body, multi-actor, hand gestures, etc.). In all but one case we outperform the state-of-the-art pose-based methods. For instance, we achieve a recognition accuracy of 84.9% and 92.3% on cross-subject and cross-view tests of the NTU RGB+D dataset respectively, and 100% recognition accuracy on the UT-Kinect dataset. While DeepGRU works well on large datasets with many training samples, we show that even in the absence of large amounts of training data, and with as few as four samples per class, DeepGRU can beat traditional methods specifically designed for small training sets. Lastly, we demonstrate that even without powerful hardware, and using only the CPU, our method can still be trained in under 10 minutes on small-scale datasets, making it an enticing choice for rapid application prototyping and development. |
Tasks | Gesture Recognition, Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2018-10-30 |
URL | https://arxiv.org/abs/1810.12514v4 |
PDF | https://arxiv.org/pdf/1810.12514v4.pdf |
PWC | https://paperswithcode.com/paper/deepgru-deep-gesture-recognition-utility |
Repo | https://github.com/Maghoumi/DeepGRU |
Framework | pytorch |
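The architecture described in the abstract (stacked GRUs followed by attention pooling over time and a fully-connected head) can be sketched in a few lines of PyTorch; the layer sizes below are illustrative, not the paper's configuration:

```python
# PyTorch sketch in the spirit of DeepGRU: stacked GRUs, global attention
# pooling over time, fully-connected head. Sizes are illustrative.
import torch
import torch.nn as nn

class GRUAttentionClassifier(nn.Module):
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, num_layers=2, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, n_classes))

    def forward(self, x):                    # x: (batch, time, in_dim)
        h, _ = self.gru(x)                   # (batch, time, hidden)
        w = torch.softmax(self.attn(h), 1)   # attention weights over time
        context = (w * h).sum(dim=1)         # weighted sum: (batch, hidden)
        return self.head(context)

model = GRUAttentionClassifier(in_dim=63, hidden=128, n_classes=14)
print(model(torch.randn(2, 50, 63)).shape)   # torch.Size([2, 14])
```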
Deep Convolutional Neural Networks for Breast Cancer Histology Image Analysis
Title | Deep Convolutional Neural Networks for Breast Cancer Histology Image Analysis |
Authors | Alexander Rakhlin, Alexey Shvets, Vladimir Iglovikov, Alexandr A. Kalinin |
Abstract | Breast cancer is one of the main causes of cancer death worldwide. Early diagnostics significantly increases the chances of correct treatment and survival, but this process is tedious and often leads to disagreement between pathologists. Computer-aided diagnosis systems have shown potential for improving diagnostic accuracy. In this work, we develop a computational approach based on deep convolutional neural networks for breast cancer histology image classification. A hematoxylin and eosin stained breast histology microscopy image dataset is provided as part of the ICIAR 2018 Grand Challenge on Breast Cancer Histology Images. Our approach utilizes several deep neural network architectures and a gradient boosted trees classifier. For the 4-class classification task, we report 87.2% accuracy. For the 2-class classification task to detect carcinomas, we report 93.8% accuracy, AUC 97.3%, and sensitivity/specificity of 96.5%/88.0% at the high-sensitivity operating point. To our knowledge, this approach outperforms other common methods in automated histopathological image classification. The source code for our approach is made publicly available at https://github.com/alexander-rakhlin/ICIAR2018 |
Tasks | Breast Cancer Detection, Breast Cancer Histology Image Classification, Histopathological Image Classification, Image Classification |
Published | 2018-02-02 |
URL | http://arxiv.org/abs/1802.00752v2 |
PDF | http://arxiv.org/pdf/1802.00752v2.pdf |
PWC | https://paperswithcode.com/paper/deep-convolutional-neural-networks-for-breast |
Repo | https://github.com/alexander-rakhlin/ICIAR2018 |
Framework | tf |
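The two-stage recipe in the abstract (deep CNN descriptors, then a gradient boosted trees classifier) looks roughly like the following scikit-learn sketch; the CNN features are mocked with random data here, so the score is meaningless except as a smoke test:

```python
# Sketch of the two-stage recipe: CNN descriptors + gradient boosted trees.
# Features are mocked with random data for a self-contained example.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
deep_features = rng.normal(size=(300, 64))   # stand-in CNN descriptors
labels = rng.integers(0, 2, size=300)        # toy benign vs. carcinoma

clf = GradientBoostingClassifier(n_estimators=50)
print(cross_val_score(clf, deep_features, labels, cv=3).mean())
```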
Understanding the Origins of Bias in Word Embeddings
Title | Understanding the Origins of Bias in Word Embeddings |
Authors | Marc-Etienne Brunet, Colleen Alkalay-Houlihan, Ashton Anderson, Richard Zemel |
Abstract | The power of machine learning systems not only promises great technical progress, but risks societal harm. As a recent example, researchers have shown that popular word embedding algorithms exhibit stereotypical biases, such as gender bias. The widespread use of these algorithms in machine learning systems, from automated translation services to curriculum vitae scanners, can amplify stereotypes in important contexts. Although methods have been developed to measure these biases and alter word embeddings to mitigate their biased representations, there is a lack of understanding in how word embedding bias depends on the training data. In this work, we develop a technique for understanding the origins of bias in word embeddings. Given a word embedding trained on a corpus, our method identifies how perturbing the corpus will affect the bias of the resulting embedding. This can be used to trace the origins of word embedding bias back to the original training documents. Using our method, one can investigate trends in the bias of the underlying corpus and identify subsets of documents whose removal would most reduce bias. We demonstrate our techniques on both a New York Times and Wikipedia corpus and find that our influence function-based approximations are very accurate. |
Tasks | Word Embeddings |
Published | 2018-10-08 |
URL | https://arxiv.org/abs/1810.03611v2 |
PDF | https://arxiv.org/pdf/1810.03611v2.pdf |
PWC | https://paperswithcode.com/paper/understanding-the-origins-of-bias-in-word |
Repo | https://github.com/sathvikn/word_embedding_bias |
Framework | none |
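A standard embedding-bias measure that corpus-perturbation analyses of this kind build on is the WEAT effect size of Caliskan et al.; the sketch below computes it from word vectors. The paper's influence-function machinery for tracing bias back to individual documents is not shown:

```python
# Sketch of the WEAT effect size (Caliskan et al.), a common embedding
# bias measure. Vectors below are random stand-ins for real embeddings.
import numpy as np

def _cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def _assoc(w, A, B):
    return (np.mean([_cos(w, a) for a in A])
            - np.mean([_cos(w, b) for b in B]))

def weat_effect_size(X, Y, A, B):
    """Differential association of target sets X, Y with attributes A, B,
    normalized by the pooled standard deviation over X union Y."""
    s = [_assoc(w, A, B) for w in X + Y]
    return (np.mean(s[:len(X)]) - np.mean(s[len(X):])) / np.std(s, ddof=1)

rng = np.random.default_rng(1)
vec = lambda: rng.normal(size=8)
X, Y = [vec(), vec()], [vec(), vec()]        # e.g. career vs. family words
A, B = [vec(), vec()], [vec(), vec()]        # e.g. male vs. female names
print(weat_effect_size(X, Y, A, B))
```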