Paper Group AWR 277
Challenges of Using Text Classifiers for Causal Inference
Title | Challenges of Using Text Classifiers for Causal Inference |
Authors | Zach Wood-Doughty, Ilya Shpitser, Mark Dredze |
Abstract | Causal understanding is essential for many kinds of decision-making, but causal inference from observational data has typically only been applied to structured, low-dimensional datasets. While text classifiers produce low-dimensional outputs, their use in causal inference has not previously been studied. To facilitate causal analyses based on language data, we consider the role that text classifiers can play in causal inference through established modeling mechanisms from the causality literature on missing data and measurement error. We demonstrate how to conduct causal analyses using text classifiers on simulated and Yelp data, and discuss the opportunities and challenges of future work that uses text data in causal inference. |
Tasks | Causal Inference, Decision Making |
Published | 2018-10-01 |
URL | http://arxiv.org/abs/1810.00956v1 |
PDF | http://arxiv.org/pdf/1810.00956v1.pdf |
PWC | https://paperswithcode.com/paper/challenges-of-using-text-classifiers-for |
Repo | https://github.com/zachwooddoughty/emnlp2018-causal |
Framework | none |
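The measurement-error machinery the abstract alludes to can be made concrete with the classic matrix-adjustment correction: if a text classifier's confusion matrix is estimated on labeled validation data, a class-prevalence estimate computed from its predictions can be de-biased by inverting that matrix. A minimal numpy sketch with made-up numbers, not the paper's own procedure:

```python
# Hedged sketch: matrix-adjustment correction for classifier
# misclassification. All numbers are illustrative.
import numpy as np

# Rows = true class, columns = predicted class: P(pred = j | true = i),
# estimated on a labeled validation set.
confusion = np.array([[0.9, 0.1],
                      [0.2, 0.8]])

# Prevalence of the classifier's predicted labels on new, unlabeled data.
observed = np.array([0.35, 0.65])

# observed = confusion.T @ true_prevalence, so invert to de-bias it.
corrected = np.linalg.solve(confusion.T, observed)
print(corrected)  # approx [0.214, 0.786]: corrected class prevalence
```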
From Soft Classifiers to Hard Decisions: How fair can we be?
Title | From Soft Classifiers to Hard Decisions: How fair can we be? |
Authors | Ran Canetti, Aloni Cohen, Nishanth Dikkala, Govind Ramnarayan, Sarah Scheffler, Adam Smith |
Abstract | A popular methodology for building binary decision-making classifiers in the presence of imperfect information is to first construct a non-binary “scoring” classifier that is calibrated over all protected groups, and then to post-process this score to obtain a binary decision. We study the feasibility of achieving various fairness properties by post-processing calibrated scores, and then show that deferring post-processors allow for more fairness conditions to hold on the final decision. Specifically, we show: 1. There does not exist a general way to post-process a calibrated classifier to equalize protected groups’ positive or negative predictive value (PPV or NPV). For certain “nice” calibrated classifiers, either PPV or NPV can be equalized when the post-processor uses different thresholds across protected groups, though there exist distributions of calibrated scores for which the two measures cannot both be equalized. When the post-processing consists of a single global threshold across all groups, natural fairness properties, such as equalizing PPV in a nontrivial way, do not hold even for “nice” classifiers. 2. When the post-processing is allowed to “defer” on some decisions (that is, to avoid making a decision by handing off some examples to a separate process), then for the non-deferred decisions, the resulting classifier can be made to equalize PPV, NPV, false positive rate (FPR) and false negative rate (FNR) across the protected groups. This suggests a way to partially evade the impossibility results of Chouldechova and Kleinberg et al., which preclude equalizing all of these measures simultaneously. We also present different deferring strategies and show how they affect the fairness properties of the overall system. We evaluate our post-processing techniques using the COMPAS data set from 2016. |
Tasks | Decision Making |
Published | 2018-10-03 |
URL | http://arxiv.org/abs/1810.02003v2 |
PDF | http://arxiv.org/pdf/1810.02003v2.pdf |
PWC | https://paperswithcode.com/paper/from-soft-classifiers-to-hard-decisions-how |
Repo | https://github.com/nishanthdikkala/postprocessing-deferrals |
Framework | none |
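The deferring post-processors in the abstract can be pictured as a pair of thresholds per group with a "defer band" in between: only scores outside the band receive a hard decision. A toy sketch with illustrative thresholds, not values derived from the paper's analysis:

```python
# Toy sketch of a deferring post-processor: hard decisions only outside a
# per-group "defer band". Threshold values are illustrative.
import numpy as np

def decide_with_deferrals(scores, lo, hi):
    """Map calibrated scores to accept / reject / defer decisions."""
    decisions = np.full(len(scores), "defer", dtype=object)
    decisions[scores >= hi] = "accept"   # confident positive region
    decisions[scores <= lo] = "reject"   # confident negative region
    return decisions                     # the band (lo, hi) is handed off

group_a_scores = np.array([0.05, 0.40, 0.55, 0.92])
print(decide_with_deferrals(group_a_scores, lo=0.2, hi=0.8))
# ['reject' 'defer' 'defer' 'accept']
```

Because each group can get its own (lo, hi) pair, the non-deferred decisions gain extra degrees of freedom with which to equalize PPV, NPV, FPR and FNR.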
EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE
Title | EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE |
Authors | Chao Ma, Sebastian Tschiatschek, Konstantina Palla, José Miguel Hernández-Lobato, Sebastian Nowozin, Cheng Zhang |
Abstract | Many real-life decision-making situations allow further relevant information to be acquired at a specific cost, for example, in assessing the health status of a patient we may decide to take additional measurements such as diagnostic tests or imaging scans before making a final assessment. Acquiring more relevant information enables better decision making, but may be costly. How can we trade off the desire to make good decisions by acquiring further information with the cost of performing that acquisition? To this end, we propose a principled framework, named EDDI (Efficient Dynamic Discovery of high-value Information), based on the theory of Bayesian experimental design. In EDDI, we propose a novel partial variational autoencoder (Partial VAE) to predict missing data entries probabilistically given any subset of the observed ones, and combine it with an acquisition function that maximizes expected information gain on a set of target variables. We show cost reduction at the same decision quality and improved decision quality at the same cost in multiple machine learning benchmarks and two real-world health-care applications. |
Tasks | Decision Making |
Published | 2018-09-28 |
URL | https://arxiv.org/abs/1809.11142v4 |
PDF | https://arxiv.org/pdf/1809.11142v4.pdf |
PWC | https://paperswithcode.com/paper/eddi-efficient-dynamic-discovery-of-high |
Repo | https://github.com/microsoft/EDDI |
Framework | tf |
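EDDI's acquisition loop greedily measures the feature with the highest expected information gain on the targets. The sketch below shows only the greedy loop; the information-gain score is a stand-in placeholder, since the real computation requires samples from the trained Partial VAE:

```python
# Sketch of EDDI's greedy acquisition loop. The information-gain estimate
# is a placeholder for the Partial-VAE-based expected information gain.
import numpy as np

def expected_info_gain(feature, observed):
    """Stand-in acquisition score; replace with the Partial-VAE estimate."""
    return np.random.default_rng(feature).random()  # deterministic dummy

def acquire_greedily(n_features, budget):
    observed = set()
    for _ in range(budget):
        candidates = [i for i in range(n_features) if i not in observed]
        # Measure the feature whose acquisition is expected to be most
        # informative about the target, then repeat with it observed.
        best = max(candidates, key=lambda i: expected_info_gain(i, observed))
        observed.add(best)
    return observed

print(acquire_greedily(n_features=10, budget=3))
```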
Better Safe than Sorry: Evidence Accumulation Allows for Safe Reinforcement Learning
Title | Better Safe than Sorry: Evidence Accumulation Allows for Safe Reinforcement Learning |
Authors | Akshat Agarwal, Abhinau Kumar V, Kyle Dunovan, Erik Peterson, Timothy Verstynen, Katia Sycara |
Abstract | In the real world, agents often have to operate in situations with incomplete information, limited sensing capabilities, and inherently stochastic environments, making individual observations incomplete and unreliable. Moreover, in many situations it is preferable to delay a decision rather than run the risk of making a bad decision. In such situations it is necessary to aggregate information before taking an action; however, most state-of-the-art reinforcement learning (RL) algorithms are biased towards taking actions at every time step, even if the agent is not particularly confident in its chosen action. This lack of caution can lead the agent to make critical mistakes, regardless of prior experience and acclimation to the environment. Motivated by theories of dynamic resolution of uncertainty during decision making in biological brains, we propose a simple accumulator module which accumulates evidence in favor of each possible decision, encodes uncertainty as a dynamic competition between actions, and acts on the environment only when it is sufficiently confident in the chosen action. The agent makes no decision by default, and the burden of proof to make a decision falls on the policy to accrue evidence strongly in favor of a single decision. Our results show that this accumulator module achieves near-optimal performance on a simple guessing game, far outperforming deep recurrent networks using traditional, forced action selection policies. |
Tasks | Decision Making |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.09147v1 |
PDF | http://arxiv.org/pdf/1809.09147v1.pdf |
PWC | https://paperswithcode.com/paper/better-safe-than-sorry-evidence-accumulation |
Repo | https://github.com/susumuota/gym-modeestimation |
Framework | none |
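The accumulator module can be pictured as a race-to-threshold process: per-step evidence is summed for each action, and the agent commits only when the leading action's margin over the runner-up clears a confidence threshold. A toy numpy sketch under that assumption; the authors' exact competition dynamics may differ:

```python
# Race-to-threshold sketch of an evidence accumulator. A toy model of the
# idea, not the authors' exact module.
import numpy as np

def accumulate_until_confident(logit_stream, threshold):
    evidence = None
    for t, logits in enumerate(logit_stream):
        evidence = logits if evidence is None else evidence + logits
        margin = evidence.max() - np.partition(evidence, -2)[-2]
        if margin >= threshold:               # sufficiently confident
            return int(evidence.argmax()), t  # chosen action, decision time
    return None, len(logit_stream) - 1        # default: no decision

rng = np.random.default_rng(0)
noisy_evidence = [rng.normal([0.3, 0.0], 0.5) for _ in range(100)]
print(accumulate_until_confident(noisy_evidence, threshold=5.0))
```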
Temporal Convolutional Memory Networks for Remaining Useful Life Estimation of Industrial Machinery
Title | Temporal Convolutional Memory Networks for Remaining Useful Life Estimation of Industrial Machinery |
Authors | Lahiru Jayasinghe, Tharaka Samarasinghe, Chau Yuen, Jenny Chen Ni Low, Shuzhi Sam Ge |
Abstract | Accurately estimating the remaining useful life (RUL) of industrial machinery is beneficial in many real-world applications. Estimation techniques have mainly utilized linear models or neural network based approaches with a focus on short-term time dependencies. This paper introduces a system model that incorporates temporal convolutions with both long-term and short-term time dependencies. The proposed network learns salient features and complex temporal variations in sensor values, and predicts the RUL. A data augmentation method is used for increased accuracy. The proposed method is compared with several state-of-the-art algorithms on publicly available datasets. It demonstrates promising results, with superior results for datasets obtained from complex environments. |
Tasks | Data Augmentation |
Published | 2018-10-12 |
URL | http://arxiv.org/abs/1810.05644v2 |
PDF | http://arxiv.org/pdf/1810.05644v2.pdf |
PWC | https://paperswithcode.com/paper/temporal-convolutional-memory-networks-for |
Repo | https://github.com/LahiruJayasinghe/RUL-Net |
Framework | tf |
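For context, RUL models on public run-to-failure datasets such as NASA's C-MAPSS are commonly trained against a piecewise-linear target: RUL is capped early in life and decays linearly to zero at failure. A small sketch of that common preprocessing step, not necessarily the paper's exact augmentation:

```python
# Common RUL preprocessing: piecewise-linear target that caps early-life
# RUL and decays linearly to zero at failure. max_rul = 130 is a
# conventional choice, not necessarily the paper's setting.
import numpy as np

def piecewise_rul(cycle_count, max_rul=130):
    rul = np.arange(cycle_count - 1, -1, -1)  # cycles remaining until failure
    return np.minimum(rul, max_rul)           # cap the healthy early phase

print(piecewise_rul(10, max_rul=5))  # [5 5 5 5 5 4 3 2 1 0]
```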
Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents
Title | Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents |
Authors | Aditya Siddhant, Anuj Goyal, Angeliki Metallinou |
Abstract | User interaction with voice-powered agents generates large amounts of unlabeled utterances. In this paper, we explore techniques to efficiently transfer the knowledge from these unlabeled utterances to improve model performance on Spoken Language Understanding (SLU) tasks. We use Embeddings from Language Model (ELMo) to take advantage of unlabeled data by learning contextualized word representations. Additionally, we propose ELMo-Light (ELMoL), a faster and simpler unsupervised pre-training method for SLU. Our findings suggest that unsupervised pre-training on a large corpus of unlabeled utterances leads to significantly better SLU performance compared to training from scratch, and that it can even outperform conventional supervised transfer. Additionally, we show that the gains from unsupervised transfer techniques can be further improved by supervised transfer. The improvements are more pronounced in low resource settings, and when using only 1000 labeled in-domain samples, our techniques match the performance of training from scratch on 10-15x more labeled in-domain data. |
Tasks | Language Modelling, Spoken Language Understanding, Transfer Learning |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05370v1 |
PDF | http://arxiv.org/pdf/1811.05370v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-transfer-learning-for-spoken |
Repo | https://github.com/sxjscience/GluonNLP-Slot-Filling |
Framework | mxnet |
An Improved Evaluation Framework for Generative Adversarial Networks
Title | An Improved Evaluation Framework for Generative Adversarial Networks |
Authors | Shaohui Liu, Yi Wei, Jiwen Lu, Jie Zhou |
Abstract | In this paper, we propose an improved quantitative evaluation framework for Generative Adversarial Networks (GANs) on generating domain-specific images, where we improve conventional evaluation methods on two levels: the feature representation and the evaluation metric. Unlike most existing evaluation frameworks which transfer the representation of ImageNet inception model to map images onto the feature space, our framework uses a specialized encoder to acquire fine-grained domain-specific representation. Moreover, for datasets with multiple classes, we propose Class-Aware Frechet Distance (CAFD), which employs a Gaussian mixture model on the feature space to better fit the multi-manifold feature distribution. Experiments and analysis on both the feature level and the image level were conducted to demonstrate improvements of our proposed framework over the recently proposed state-of-the-art FID method. To the best of our knowledge, we are the first to provide counter examples where FID gives inconsistent results with human judgments. It is shown in the experiments that our framework is able to overcome the shortcomings of FID and improve robustness. Code will be made available. |
Tasks | |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07474v3 |
PDF | http://arxiv.org/pdf/1803.07474v3.pdf |
PWC | https://paperswithcode.com/paper/an-improved-evaluation-framework-for |
Repo | https://github.com/B1ueber2y/CAFD |
Framework | tf |
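The Frechet distance between Gaussians fitted to real and generated features is what underlies FID; CAFD's idea is to make it class-aware. The sketch below averages per-class Frechet distances, which simplifies the paper's Gaussian-mixture formulation:

```python
# Sketch of a class-aware Frechet distance: fit one Gaussian per class in
# feature space and average the per-class distances. A simplification of
# the paper's Gaussian-mixture formulation of CAFD.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, cov1, mu2, cov2):
    covmean = sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):      # numerical noise can yield tiny
        covmean = covmean.real        # imaginary parts; drop them
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(cov1 + cov2 - 2 * covmean))

def class_aware_fd(real_feats, fake_feats, real_labels, fake_labels):
    dists = []
    for c in np.unique(real_labels):
        r = real_feats[real_labels == c]
        f = fake_feats[fake_labels == c]
        dists.append(frechet_distance(r.mean(0), np.cov(r, rowvar=False),
                                      f.mean(0), np.cov(f, rowvar=False)))
    return float(np.mean(dists))

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 8))
fake = rng.normal(0.1, 1.0, size=(200, 8))
labels = rng.integers(0, 4, size=200)
print(class_aware_fd(real, fake, labels, labels))
```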
Generating Adversarial Examples with Adversarial Networks
Title | Generating Adversarial Examples with Adversarial Networks |
Authors | Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, Dawn Song |
Abstract | Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. Once the generator is trained, it can generate adversarial perturbations efficiently for any instance, potentially accelerating adversarial training as a defense. We apply AdvGAN in both semi-whitebox and black-box attack settings, and the adversarial examples it generates achieve high attack success rates under state-of-the-art defenses, placing first on a public MNIST black-box attack challenge. |
Tasks | |
Published | 2018-01-08 |
URL | http://arxiv.org/abs/1801.02610v5 |
PDF | http://arxiv.org/pdf/1801.02610v5.pdf |
PWC | https://paperswithcode.com/paper/generating-adversarial-examples-with |
Repo | https://github.com/niharikajainn/adv_gan_keras |
Framework | tf |
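The generator in AdvGAN is trained against a composite objective: look realistic to a discriminator, mislead the target model, and keep the perturbation small via a hinge on its norm. A hedged PyTorch sketch of such an objective (untargeted variant; the loss weights and bound c are illustrative, not the paper's values):

```python
# Hedged PyTorch sketch of an AdvGAN-style generator objective. Weights
# alpha, beta and the bound c are illustrative placeholders.
import torch
import torch.nn.functional as F

def advgan_generator_loss(perturbation, disc_fake, target_logits, y_true,
                          c=0.3, alpha=1.0, beta=10.0):
    # GAN term: perturbed inputs should look real to the discriminator.
    loss_gan = F.binary_cross_entropy_with_logits(
        disc_fake, torch.ones_like(disc_fake))
    # Adversarial term: push the target model away from the true labels.
    loss_adv = -F.cross_entropy(target_logits, y_true)
    # Hinge term: penalize perturbations whose L2 norm exceeds c.
    norms = perturbation.flatten(1).norm(dim=1)
    loss_hinge = torch.clamp(norms - c, min=0).mean()
    return loss_gan + beta * loss_adv + alpha * loss_hinge

disc_out = torch.randn(4, 1)
target_out = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
perturb = 0.1 * torch.randn(4, 3, 32, 32)
print(advgan_generator_loss(perturb, disc_out, target_out, labels))
```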
Lessons from Natural Language Inference in the Clinical Domain
Title | Lessons from Natural Language Inference in the Clinical Domain |
Authors | Alexey Romanov, Chaitanya Shivade |
Abstract | State of the art models using deep neural networks have become very good in learning an accurate mapping from inputs to outputs. However, they still lack generalization capabilities in conditions that differ from the ones encountered during training. This is even more challenging in specialized and knowledge-intensive domains, where training data is limited. To address this gap, we introduce MedNLI, a dataset annotated by doctors, performing a natural language inference task (NLI), grounded in the medical history of patients. We present strategies to: 1) leverage transfer learning using datasets from the open domain (e.g., SNLI), and 2) incorporate domain knowledge from external data and lexical sources (e.g., medical terminologies). Our results demonstrate performance gains using both strategies. |
Tasks | Natural Language Inference, Transfer Learning |
Published | 2018-08-21 |
URL | http://arxiv.org/abs/1808.06752v2 |
PDF | http://arxiv.org/pdf/1808.06752v2.pdf |
PWC | https://paperswithcode.com/paper/lessons-from-natural-language-inference-in |
Repo | https://github.com/jgc128/mednli_baseline |
Framework | pytorch |
RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets
Title | RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets |
Authors | Liping Li, Wei Xu, Tianyi Chen, Georgios B. Giannakis, Qing Ling |
Abstract | In this paper, we propose a class of robust stochastic subgradient methods for distributed learning from heterogeneous datasets in the presence of an unknown number of Byzantine workers. The Byzantine workers, during the learning process, may send arbitrary incorrect messages to the master due to data corruptions, communication failures or malicious attacks, and consequently bias the learned model. The key to the proposed methods is a regularization term incorporated with the objective function so as to robustify the learning task and mitigate the negative effects of Byzantine attacks. The resultant subgradient-based algorithms are termed Byzantine-Robust Stochastic Aggregation methods, justifying our acronym RSA used henceforth. In contrast to most of the existing algorithms, RSA does not rely on the assumption that the data are independent and identically distributed (i.i.d.) on the workers, and hence fits a wider class of applications. Theoretically, we show that: i) RSA converges to a near-optimal solution with the learning error dependent on the number of Byzantine workers; ii) the convergence rate of RSA under Byzantine attacks is the same as that of the stochastic gradient descent method, which is free of Byzantine attacks. Numerically, experiments on a real dataset corroborate the competitive performance of RSA and its complexity reduction compared to state-of-the-art alternatives. |
Tasks | |
Published | 2018-11-09 |
URL | https://arxiv.org/abs/1811.03761v2 |
PDF | https://arxiv.org/pdf/1811.03761v2.pdf |
PWC | https://paperswithcode.com/paper/rsa-byzantine-robust-stochastic-aggregation |
Repo | https://github.com/Liepill/RSA-Byzantine |
Framework | none |
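The robustness of RSA comes from an l1-style consensus penalty: each worker influences the master only through the sign of its disagreement, so a bounded number of Byzantine workers can inject only bounded bias. A simplified numpy sketch of the master-side update; the full algorithm also includes worker updates and the master's own gradient term, both omitted here:

```python
# Simplified numpy sketch of the master-side, l1-regularized RSA update.
import numpy as np

def rsa_l1_master_step(x, worker_models, lr, lam):
    # Each worker contributes only sign(x - w): bounded per-worker influence.
    penalty = sum(np.sign(x - w) for w in worker_models)
    return x - lr * lam * penalty

x = np.zeros(3)
honest = [np.array([1.0, 1.0, 1.0])] * 8
byzantine = [np.array([1e6, -1e6, 1e6])] * 2   # arbitrary corrupted messages
for _ in range(2000):
    x = rsa_l1_master_step(x, honest + byzantine, lr=0.01, lam=0.1)
print(x)  # stays near the honest consensus [1, 1, 1] despite the outliers
```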
Formalized Conceptual Spaces with a Geometric Representation of Correlations
Title | Formalized Conceptual Spaces with a Geometric Representation of Correlations |
Authors | Lucas Bechberger, Kai-Uwe Kühnberger |
Abstract | The highly influential framework of conceptual spaces provides a geometric way of representing knowledge. Instances are represented by points in a similarity space and concepts are represented by convex regions in this space. After pointing out a problem with the convexity requirement, we propose a formalization of conceptual spaces based on fuzzy star-shaped sets. Our formalization uses a parametric definition of concepts and extends the original framework by adding means to represent correlations between different domains in a geometric way. Moreover, we define various operations for our formalization, both for creating new concepts from old ones and for measuring relations between concepts. We present an illustrative toy example and sketch a research project on concept formation that is based on both our formalization and its implementation. |
Tasks | |
Published | 2018-01-11 |
URL | https://arxiv.org/abs/1801.03929v2 |
PDF | https://arxiv.org/pdf/1801.03929v2.pdf |
PWC | https://paperswithcode.com/paper/formalized-conceptual-spaces-with-a-geometric |
Repo | https://github.com/lbechberger/ConceptualSpaces |
Framework | none |
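The fuzzy star-shaped concepts can be pictured as a crisp core region plus a membership function that decays exponentially with distance from that core. A toy sketch with an axis-aligned box as the core; the paper's formalization allows more general star-shaped cores and per-domain distance weights:

```python
# Toy sketch of a fuzzified concept: full membership inside a crisp core,
# exponential decay with distance outside it.
import numpy as np

def membership(x, core_min, core_max, mu0=1.0, c=2.0):
    below = np.maximum(0.0, core_min - x)   # distance below the box, per dim
    above = np.maximum(0.0, x - core_max)   # distance above the box, per dim
    d = np.linalg.norm(below + above)       # distance from x to the core
    return mu0 * np.exp(-c * d)

apple_core = (np.array([0.2, 0.4]), np.array([0.5, 0.7]))  # toy "apple"
print(membership(np.array([0.3, 0.5]), *apple_core))  # 1.0 (inside core)
print(membership(np.array([0.9, 0.9]), *apple_core))  # < 1.0 (outside)
```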
HENet: A Highly Efficient Convolutional Neural Networks Optimized for Accuracy, Speed and Storage
Title | HENet: A Highly Efficient Convolutional Neural Networks Optimized for Accuracy, Speed and Storage |
Authors | Qiuyu Zhu, Ruixin Zhang |
Abstract | In order to enhance the real-time performance of convolutional neural networks (CNNs), more and more researchers are focusing on improving their efficiency. Based on an analysis of several CNN architectures, such as ResNet, DenseNet and ShuffleNet, we combine their advantages and propose a very efficient model called Highly Efficient Networks (HENet). The new architecture uses an unusual way to combine the group convolution and channel shuffle introduced in ShuffleNet. Inspired by ResNet and DenseNet, we also propose a new way to use element-wise addition and concatenation connections within each block. In order to make greater use of feature maps, pooling operations are removed from HENet. The experiments show that our model's efficiency is more than 1 times higher than that of ShuffleNet on many open source datasets, such as CIFAR-10/100 and SVHN. |
Tasks | |
Published | 2018-03-07 |
URL | http://arxiv.org/abs/1803.02742v2 |
PDF | http://arxiv.org/pdf/1803.02742v2.pdf |
PWC | https://paperswithcode.com/paper/heneta-highly-efficient-convolutional-neural |
Repo | https://github.com/anlongstory/HENet |
Framework | none |
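The channel shuffle operation that HENet borrows from ShuffleNet interleaves channels across groups so that grouped convolutions can exchange information; it amounts to a reshape-transpose-reshape. A numpy sketch on an NCHW tensor:

```python
# Channel shuffle as popularized by ShuffleNet: reshape-transpose-reshape.
import numpy as np

def channel_shuffle(x, groups):
    b, c, h, w = x.shape
    assert c % groups == 0
    return (x.reshape(b, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(b, c, h, w))

x = np.arange(8).reshape(1, 8, 1, 1)
print(channel_shuffle(x, groups=2).ravel())  # [0 4 1 5 2 6 3 7]
```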
DeepGRU: Deep Gesture Recognition Utility
Title | DeepGRU: Deep Gesture Recognition Utility |
Authors | Mehran Maghoumi, Joseph J. LaViola Jr |
Abstract | We propose DeepGRU, a novel end-to-end deep network model informed by recent developments in deep learning for gesture and action recognition, that is streamlined and device-agnostic. DeepGRU, which uses only raw skeleton, pose or vector data, is quickly understood, implemented, and trained, and yet achieves state-of-the-art results on challenging datasets. At the heart of our method lies a set of stacked gated recurrent units (GRU), two fully-connected layers and a novel global attention model. We evaluate our method on seven publicly available datasets, containing varying numbers of samples and spanning a broad range of interactions (full-body, multi-actor, hand gestures, etc.). In all but one case we outperform the state-of-the-art pose-based methods. For instance, we achieve a recognition accuracy of 84.9% and 92.3% on cross-subject and cross-view tests of the NTU RGB+D dataset respectively, and 100% recognition accuracy on the UT-Kinect dataset. While DeepGRU works well on large datasets with many training samples, we show that even in the absence of large amounts of training data, and with as few as four samples per class, DeepGRU can beat traditional methods specifically designed for small training sets. Lastly, we demonstrate that even without powerful hardware, and using only the CPU, our method can still be trained in under 10 minutes on small-scale datasets, making it an enticing choice for rapid application prototyping and development. |
Tasks | Gesture Recognition, Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2018-10-30 |
URL | https://arxiv.org/abs/1810.12514v4 |
PDF | https://arxiv.org/pdf/1810.12514v4.pdf |
PWC | https://paperswithcode.com/paper/deepgru-deep-gesture-recognition-utility |
Repo | https://github.com/Maghoumi/DeepGRU |
Framework | pytorch |
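The architecture described in the abstract (stacked GRUs followed by attention pooling over time and a fully-connected head) can be sketched in a few lines of PyTorch; the layer sizes below are illustrative, not the paper's configuration:

```python
# PyTorch sketch in the spirit of DeepGRU: stacked GRUs, global attention
# pooling over time, fully-connected head. Sizes are illustrative.
import torch
import torch.nn as nn

class GRUAttentionClassifier(nn.Module):
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, num_layers=2, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, n_classes))

    def forward(self, x):                    # x: (batch, time, in_dim)
        h, _ = self.gru(x)                   # (batch, time, hidden)
        w = torch.softmax(self.attn(h), 1)   # attention weights over time
        context = (w * h).sum(dim=1)         # weighted sum: (batch, hidden)
        return self.head(context)

model = GRUAttentionClassifier(in_dim=63, hidden=128, n_classes=14)
print(model(torch.randn(2, 50, 63)).shape)   # torch.Size([2, 14])
```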
Deep Convolutional Neural Networks for Breast Cancer Histology Image Analysis
Title | Deep Convolutional Neural Networks for Breast Cancer Histology Image Analysis |
Authors | Alexander Rakhlin, Alexey Shvets, Vladimir Iglovikov, Alexandr A. Kalinin |
Abstract | Breast cancer is one of the main causes of cancer death worldwide. Early diagnostics significantly increases the chances of correct treatment and survival, but this process is tedious and often leads to disagreement between pathologists. Computer-aided diagnosis systems have shown potential for improving diagnostic accuracy. In this work, we develop a computational approach based on deep convolutional neural networks for breast cancer histology image classification. A hematoxylin and eosin stained breast histology microscopy image dataset is provided as part of the ICIAR 2018 Grand Challenge on Breast Cancer Histology Images. Our approach utilizes several deep neural network architectures and a gradient boosted trees classifier. For the 4-class classification task, we report 87.2% accuracy. For the 2-class classification task to detect carcinomas, we report 93.8% accuracy, AUC 97.3%, and sensitivity/specificity of 96.5%/88.0% at the high-sensitivity operating point. To our knowledge, this approach outperforms other common methods in automated histopathological image classification. The source code for our approach is made publicly available at https://github.com/alexander-rakhlin/ICIAR2018 |
Tasks | Breast Cancer Detection, Breast Cancer Histology Image Classification, Histopathological Image Classification, Image Classification |
Published | 2018-02-02 |
URL | http://arxiv.org/abs/1802.00752v2 |
PDF | http://arxiv.org/pdf/1802.00752v2.pdf |
PWC | https://paperswithcode.com/paper/deep-convolutional-neural-networks-for-breast |
Repo | https://github.com/alexander-rakhlin/ICIAR2018 |
Framework | tf |
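The two-stage recipe in the abstract (deep CNN descriptors, then a gradient boosted trees classifier) looks roughly like the following scikit-learn sketch; the CNN features are mocked with random data here, so the score is meaningless except as a smoke test:

```python
# Sketch of the two-stage recipe: CNN descriptors + gradient boosted trees.
# Features are mocked with random data for a self-contained example.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
deep_features = rng.normal(size=(300, 64))   # stand-in CNN descriptors
labels = rng.integers(0, 2, size=300)        # toy benign vs. carcinoma

clf = GradientBoostingClassifier(n_estimators=50)
print(cross_val_score(clf, deep_features, labels, cv=3).mean())
```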
Understanding the Origins of Bias in Word Embeddings
Title | Understanding the Origins of Bias in Word Embeddings |
Authors | Marc-Etienne Brunet, Colleen Alkalay-Houlihan, Ashton Anderson, Richard Zemel |
Abstract | The power of machine learning systems not only promises great technical progress, but risks societal harm. As a recent example, researchers have shown that popular word embedding algorithms exhibit stereotypical biases, such as gender bias. The widespread use of these algorithms in machine learning systems, from automated translation services to curriculum vitae scanners, can amplify stereotypes in important contexts. Although methods have been developed to measure these biases and alter word embeddings to mitigate their biased representations, there is a lack of understanding in how word embedding bias depends on the training data. In this work, we develop a technique for understanding the origins of bias in word embeddings. Given a word embedding trained on a corpus, our method identifies how perturbing the corpus will affect the bias of the resulting embedding. This can be used to trace the origins of word embedding bias back to the original training documents. Using our method, one can investigate trends in the bias of the underlying corpus and identify subsets of documents whose removal would most reduce bias. We demonstrate our techniques on both a New York Times and Wikipedia corpus and find that our influence function-based approximations are very accurate. |
Tasks | Word Embeddings |
Published | 2018-10-08 |
URL | https://arxiv.org/abs/1810.03611v2 |
PDF | https://arxiv.org/pdf/1810.03611v2.pdf |
PWC | https://paperswithcode.com/paper/understanding-the-origins-of-bias-in-word |
Repo | https://github.com/sathvikn/word_embedding_bias |
Framework | none |
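A standard embedding-bias measure that corpus-perturbation analyses of this kind build on is the WEAT effect size of Caliskan et al.; the sketch below computes it from word vectors. The paper's influence-function machinery for tracing bias back to individual documents is not shown:

```python
# Sketch of the WEAT effect size (Caliskan et al.), a common embedding
# bias measure. Vectors below are random stand-ins for real embeddings.
import numpy as np

def _cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def _assoc(w, A, B):
    return (np.mean([_cos(w, a) for a in A])
            - np.mean([_cos(w, b) for b in B]))

def weat_effect_size(X, Y, A, B):
    """Differential association of target sets X, Y with attributes A, B,
    normalized by the pooled standard deviation over X union Y."""
    s = [_assoc(w, A, B) for w in X + Y]
    return (np.mean(s[:len(X)]) - np.mean(s[len(X):])) / np.std(s, ddof=1)

rng = np.random.default_rng(1)
vec = lambda: rng.normal(size=8)
X, Y = [vec(), vec()], [vec(), vec()]        # e.g. career vs. family words
A, B = [vec(), vec()], [vec(), vec()]        # e.g. male vs. female names
print(weat_effect_size(X, Y, A, B))
```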