January 27, 2020

3165 words 15 mins read

Paper Group ANR 1274

Wasserstein Dependency Measure for Representation Learning. BEHRT: Transformer for Electronic Health Records. Acquisition of Inflectional Morphology in Artificial Neural Networks With Prior Knowledge. Unbabel’s Submission to the WMT2019 APE Shared Task: BERT-based Encoder-Decoder for Automatic Post-Editing. Mixed-Integer Optimization Approach to Le …

Wasserstein Dependency Measure for Representation Learning

Title Wasserstein Dependency Measure for Representation Learning
Authors Sherjil Ozair, Corey Lynch, Yoshua Bengio, Aaron van den Oord, Sergey Levine, Pierre Sermanet
Abstract Mutual information maximization has emerged as a powerful learning objective for unsupervised representation learning, achieving state-of-the-art performance in applications such as object recognition, speech recognition, and reinforcement learning. However, such approaches are fundamentally limited since a tight lower bound on mutual information requires a sample size exponential in the mutual information. This limits the applicability of these approaches for prediction tasks with high mutual information, such as in video understanding or reinforcement learning. In these settings, such techniques are prone to overfit, both in theory and in practice, and capture only a few of the relevant factors of variation. This leads to incomplete representations that are not optimal for downstream tasks. In this work, we empirically demonstrate that mutual information-based representation learning approaches do fail to learn complete representations on a number of designed and real-world tasks. To mitigate these problems we introduce the Wasserstein dependency measure, which learns more complete representations by using the Wasserstein distance instead of the KL divergence in the mutual information estimator. We show that a practical approximation to this theoretically motivated solution, constructed using Lipschitz constraint techniques from the GAN literature, achieves substantially improved results on tasks where incomplete representations are a major challenge.
Tasks Object Recognition, Representation Learning, Speech Recognition, Unsupervised Representation Learning, Video Understanding
Published 2019-03-28
URL http://arxiv.org/abs/1903.11780v1
PDF http://arxiv.org/pdf/1903.11780v1.pdf
PWC https://paperswithcode.com/paper/wasserstein-dependency-measure-for
Repo
Framework
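
To make the idea above concrete, here is a minimal sketch of a Wasserstein-style dependency objective: a critic whose Lipschitz constant is softly constrained with spectral normalization (one of the GAN-literature techniques the abstract mentions) scores joint pairs against within-batch shuffled pairs. The architecture, dimensions, and training details are illustrative assumptions, not the authors' exact estimator.

```python
# Minimal sketch: a Lipschitz-constrained critic for a Wasserstein-style
# dependency objective, contrasting joint pairs (x, y) against shuffled pairs.
# The encoder is omitted; sizes and the loss form are illustrative assumptions.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class LipschitzCritic(nn.Module):
    def __init__(self, x_dim, y_dim, hidden=256):
        super().__init__()
        # Spectral normalization (a GAN-literature technique) softly
        # constrains the critic's Lipschitz constant.
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(x_dim + y_dim, hidden)), nn.ReLU(),
            spectral_norm(nn.Linear(hidden, hidden)), nn.ReLU(),
            spectral_norm(nn.Linear(hidden, 1)),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def wasserstein_dependency_loss(critic, x, y):
    """E_joint[f(x, y)] - E_marginals[f(x, y')] with y' a within-batch shuffle.
    Minimizing the negative encourages representations that retain the
    dependence between x and y."""
    joint = critic(x, y).mean()
    y_shuffled = y[torch.randperm(y.size(0))]
    marginal = critic(x, y_shuffled).mean()
    return -(joint - marginal)

# Usage on random stand-in features:
x = torch.randn(64, 32)
y = torch.randn(64, 32)
critic = LipschitzCritic(32, 32)
loss = wasserstein_dependency_loss(critic, x, y)
loss.backward()
```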

BEHRT: Transformer for Electronic Health Records

Title BEHRT: Transformer for Electronic Health Records
Authors Yikuan Li, Shishir Rao, Jose Roberto Ayala Solares, Abdelaali Hassaine, Dexter Canoy, Yajie Zhu, Kazem Rahimi, Gholamreza Salimi-Khorshidi
Abstract Today, despite decades of developments in medicine and the growing interest in precision healthcare, the vast majority of diagnoses happen once patients begin to show noticeable signs of illness. Early indication and detection of diseases, however, can provide patients and carers with the chance of early intervention, better disease management, and efficient allocation of healthcare resources. The latest developments in machine learning (more specifically, deep learning) provide a great opportunity to address this unmet need. In this study, we introduce BEHRT: a deep neural sequence transduction model for EHR (electronic health records), capable of multitask prediction and disease trajectory mapping. When trained and evaluated on data from nearly 1.6 million individuals, BEHRT shows a striking absolute improvement of 8.0-10.8% in average precision score over the existing state-of-the-art deep EHR models when predicting the onset of 301 conditions. In addition to its superior prediction power, BEHRT provides a personalised view of disease trajectories through its attention mechanism; its flexible architecture enables it to incorporate multiple heterogeneous concepts (e.g., diagnosis, medication, measurements, and more) to improve the accuracy of its predictions; and its (pre-)training results in disease and patient representations that can help us get a step closer to interpretable predictions.
Tasks
Published 2019-07-22
URL https://arxiv.org/abs/1907.09538v1
PDF https://arxiv.org/pdf/1907.09538v1.pdf
PWC https://paperswithcode.com/paper/behrt-transformer-for-electronic-health
Repo
Framework
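
As a rough illustration of the kind of input construction a transformer over EHR sequences needs, the sketch below embeds diagnosis codes together with patient age and position and feeds them to a standard Transformer encoder. The vocabulary sizes, dimensions, and prediction head are illustrative assumptions rather than BEHRT's published architecture.

```python
# Minimal sketch of a BEHRT-style input: each token is a diagnosis code,
# embedded together with the patient's age at that visit plus positional
# information, then fed to a standard Transformer encoder.
# Vocabulary sizes, dimensions, and the head are illustrative assumptions.
import torch
import torch.nn as nn

class EHRTransformer(nn.Module):
    def __init__(self, n_codes=301, n_ages=120, d_model=128, max_len=512):
        super().__init__()
        self.code_emb = nn.Embedding(n_codes, d_model)
        self.age_emb = nn.Embedding(n_ages, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_codes)  # multitask next-disease logits

    def forward(self, codes, ages):
        pos = torch.arange(codes.size(1), device=codes.device).unsqueeze(0)
        h = self.code_emb(codes) + self.age_emb(ages) + self.pos_emb(pos)
        h = self.encoder(h)
        return self.head(h)  # per-position disease logits

# Usage: a batch of 2 patients, 10 visit tokens each.
codes = torch.randint(0, 301, (2, 10))
ages = torch.randint(0, 120, (2, 10))
logits = EHRTransformer()(codes, ages)  # shape (2, 10, 301)
```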

Acquisition of Inflectional Morphology in Artificial Neural Networks With Prior Knowledge

Title Acquisition of Inflectional Morphology in Artificial Neural Networks With Prior Knowledge
Authors Katharina Kann
Abstract How does knowledge of one language’s morphology influence learning of inflection rules in a second one? In order to investigate this question in artificial neural network models, we perform experiments with a sequence-to-sequence architecture, which we train on different combinations of eight source and three target languages. A detailed analysis of the model outputs suggests the following conclusions: (i) if source and target language are closely related, acquisition of the target language’s inflectional morphology constitutes an easier task for the model; (ii) knowledge of a prefixing (resp. suffixing) language makes acquisition of a suffixing (resp. prefixing) language’s morphology more challenging; and (iii) surprisingly, a source language which exhibits an agglutinative morphology simplifies learning of a second language’s inflectional morphology, independent of their relatedness.
Tasks
Published 2019-10-12
URL https://arxiv.org/abs/1910.05456v1
PDF https://arxiv.org/pdf/1910.05456v1.pdf
PWC https://paperswithcode.com/paper/acquisition-of-inflectional-morphology-in
Repo
Framework
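
The transfer setup described in the abstract (train an inflection model on a source language, then continue training on a target language) can be sketched as a two-stage training loop. The toy character-level seq2seq model and the example lemma-form pairs below are illustrative assumptions, not the paper's architecture or data.

```python
# Minimal sketch of the cross-lingual transfer protocol: pretrain a
# character-level seq2seq inflector on a source language, then continue
# training (fine-tune) on a low-resource target language.
import torch
import torch.nn as nn

PAD, EOS = 0, 1
vocab = {c: i + 2 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz#")}

def encode(word, max_len=12):
    ids = [vocab[c] for c in word][: max_len - 1] + [EOS]
    return torch.tensor(ids + [PAD] * (max_len - len(ids)))

class TinyInflector(nn.Module):
    def __init__(self, vocab_size=29, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d, padding_idx=PAD)
        self.enc = nn.GRU(d, d, batch_first=True)
        self.dec = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, vocab_size)

    def forward(self, src, tgt):
        _, h = self.enc(self.emb(src))     # encode lemma + morphological tag
        y, _ = self.dec(self.emb(tgt), h)  # teacher-forced decoding
        return self.out(y)

def train_stage(model, pairs, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss(ignore_index=PAD)
    for _ in range(epochs):
        for lemma_tag, form in pairs:
            src, tgt = encode(lemma_tag).unsqueeze(0), encode(form).unsqueeze(0)
            logits = model(src, tgt)
            # Predict each target character from the previous ones.
            loss = loss_fn(logits[:, :-1].reshape(-1, logits.size(-1)),
                           tgt[:, 1:].reshape(-1))
            opt.zero_grad(); loss.backward(); opt.step()

model = TinyInflector()
source_pairs = [("walk#pst", "walked"), ("play#pst", "played")]  # "source language"
target_pairs = [("lach#pst", "lachte")]                          # "target language"
train_stage(model, source_pairs)  # stage 1: acquire source-language morphology
train_stage(model, target_pairs)  # stage 2: transfer to the target language
```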

Unbabel’s Submission to the WMT2019 APE Shared Task: BERT-based Encoder-Decoder for Automatic Post-Editing

Title Unbabel’s Submission to the WMT2019 APE Shared Task: BERT-based Encoder-Decoder for Automatic Post-Editing
Authors António V. Lopes, M. Amin Farajian, Gonçalo M. Correia, Jonay Trenous, André F. T. Martins
Abstract This paper describes Unbabel’s submission to the WMT2019 APE Shared Task for the English-German language pair. Following the recent rise of large, powerful, pre-trained models, we adapt the BERT pretrained model to perform Automatic Post-Editing in an encoder-decoder framework. Analogously to dual-encoder architectures, we develop a BERT-based encoder-decoder (BED) model in which a single pretrained BERT encoder receives both the source src and machine translation tgt strings. Furthermore, we explore a conservativeness factor to constrain the APE system to perform fewer edits. As the official results show, when trained on a weighted combination of in-domain and artificial training data, our BED system with the conservativeness penalty significantly improves the translations of a strong Neural Machine Translation system by $-0.78$ and $+1.23$ in terms of TER and BLEU, respectively. Finally, our submission achieves a new state-of-the-art, ex-aequo, in English-German APE of NMT.
Tasks Automatic Post-Editing, Machine Translation
Published 2019-05-30
URL https://arxiv.org/abs/1905.13068v2
PDF https://arxiv.org/pdf/1905.13068v2.pdf
PWC https://paperswithcode.com/paper/unbabels-submission-to-the-wmt2019-ape-shared
Repo
Framework
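
A minimal sketch of the input packing the abstract describes: the source and the MT hypothesis are fed to a single pretrained BERT encoder as one sequence pair separated by segment embeddings. The decoder and the conservativeness penalty are only indicated in comments; the model name and example strings are assumptions.

```python
# Minimal sketch: one pretrained BERT encoder receives source and MT hypothesis
# as a packed sequence pair. Requires the `transformers` package.
from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = BertModel.from_pretrained("bert-base-multilingual-cased")

src = "The cat sat on the mat."        # English source
mt = "Die Katze saß auf der Matte."    # German MT hypothesis to be post-edited

# Pack src and mt into one sequence: [CLS] src [SEP] mt [SEP],
# with token_type_ids distinguishing the two segments.
batch = tokenizer(src, mt, return_tensors="pt")
with torch.no_grad():
    enc_states = encoder(**batch).last_hidden_state  # (1, seq_len, 768)

# A transformer decoder (not shown) would cross-attend over enc_states to
# generate the post-edited output. A conservativeness penalty could then be
# applied at decoding time, e.g. subtracting a constant from the logits of
# tokens appearing in neither src nor mt, which biases the system toward
# fewer edits.
print(enc_states.shape)
```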

Mixed-Integer Optimization Approach to Learning Association Rules for Unplanned ICU Transfer

Title Mixed-Integer Optimization Approach to Learning Association Rules for Unplanned ICU Transfer
Authors Chun-An Chou, Qingtao Cao, Shao-Jen Weng, Che-Hung Tsai
Abstract After admission to the emergency department (ED), patients with critical illnesses are transferred to the intensive care unit (ICU) when unexpected clinical deterioration occurs. Identifying such unplanned ICU transfers is urgently needed to help medical physicians achieve two goals: improving critical care quality and preventing mortality. A priority task is to understand the crucial rationale behind the diagnosis results of individual patients during their stay in the ED, which helps prepare for an early transfer to the ICU. Most existing prediction studies were based on univariate analysis or multiple logistic regression, providing one-size-fits-all results. However, patient conditions vary from case to case and may not be accurately captured by such a single judgment. In this study, we present a new decision tool using a mathematical optimization approach that aims to automatically discover rules associating diagnostic features with the high-risk outcome (i.e., unplanned transfers) in different deterioration scenarios. We consider four mutually exclusive patient subgroups based on the principal reasons for ED visits (infections, cardiovascular/respiratory diseases, gastrointestinal diseases, and neurological/other diseases) at a suburban teaching hospital. The results demonstrate significant rules associated with the unplanned transfer outcome for each subgroup and show prediction accuracy comparable to state-of-the-art machine learning methods while providing easy-to-interpret symptom-outcome information.
Tasks
Published 2019-08-02
URL https://arxiv.org/abs/1908.00966v1
PDF https://arxiv.org/pdf/1908.00966v1.pdf
PWC https://paperswithcode.com/paper/mixed-integer-optimization-approach-to
Repo
Framework
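
To give a flavour of rule learning via mixed-integer optimization, the sketch below selects a conjunction of binary symptom features that maximizes coverage of unplanned-transfer patients while penalizing coverage of other patients. The formulation, toy data, and penalty weight are illustrative assumptions and not the paper's exact model; it requires the `pulp` package.

```python
# Minimal sketch (not the paper's exact formulation): a small mixed-integer
# program that learns one association rule (a conjunction of binary symptom
# features) maximizing coverage of unplanned-transfer patients while
# penalizing coverage of non-transfer patients.
import pulp

features = ["fever", "low_bp", "high_lactate"]
X = [  # one row per ED patient, binary symptom indicators
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
]
y = [1, 1, 1, 0, 0]  # 1 = unplanned ICU transfer
lam = 2.0            # penalty for covering a non-transfer patient

prob = pulp.LpProblem("association_rule", pulp.LpMaximize)
s = [pulp.LpVariable(f"s_{j}", cat="Binary") for j in range(len(features))]
c = [pulp.LpVariable(f"c_{i}", cat="Binary") for i in range(len(X))]

for i, row in enumerate(X):
    absent = [s[j] for j in range(len(features)) if row[j] == 0]
    for sj in absent:
        prob += c[i] <= 1 - sj              # a selected-but-absent feature kills coverage
    prob += c[i] >= 1 - pulp.lpSum(absent)  # covered if every selected feature is present

prob += pulp.lpSum(s) >= 1                  # the rule must use at least one feature
prob += pulp.lpSum(c[i] for i in range(len(X)) if y[i] == 1) \
      - lam * pulp.lpSum(c[i] for i in range(len(X)) if y[i] == 0)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
rule = [features[j] for j in range(len(features)) if s[j].value() == 1]
print("IF", " AND ".join(rule), "THEN high risk of unplanned ICU transfer")
```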

Part-Guided Attention Learning for Vehicle Re-Identification

Title Part-Guided Attention Learning for Vehicle Re-Identification
Authors Xinyu Zhang, Rufeng Zhang, Jiewei Cao, Dong Gong, Mingyu You, Chunhua Shen
Abstract Vehicle re-identification (Re-ID) often requires one to recognize the fine-grained visual differences between vehicles. Besides the holistic appearance of vehicles which is easily affected by the viewpoint variation and distortion, vehicle parts also provide crucial cues to differentiate near-identical vehicles. Motivated by these observations, we introduce a Part-Guided Attention Network (PGAN) to pinpoint the prominent part regions and effectively combine the global and part information for discriminative feature learning. PGAN first detects the locations of different part components and salient regions regardless of the vehicle identity, which serve as the bottom-up attention to narrow down the possible searching regions. To estimate the importance of detected parts, we propose a Part Attention Module (PAM) to adaptively locate the most discriminative regions with high-attention weights and suppress the distraction of irrelevant parts with relatively low weights. The PAM is guided by the Re-ID loss and therefore provides top-down attention that enables attention to be calculated at the level of car parts and other salient regions. Finally, we aggregate the global appearance and part features to improve the feature performance further. The PGAN combines part-guided bottom-up and top-down attention, global and part visual features in an end-to-end framework. Extensive experiments demonstrate that the proposed method achieves new state-of-the-art vehicle Re-ID performance on four large-scale benchmark datasets.
Tasks Vehicle Re-Identification
Published 2019-09-13
URL https://arxiv.org/abs/1909.06023v3
PDF https://arxiv.org/pdf/1909.06023v3.pdf
PWC https://paperswithcode.com/paper/part-guided-attention-learning-for-vehicle-re
Repo
Framework
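
A hedged sketch of the part-attention idea described above: score each detected part feature, turn the scores into attention weights, and fuse the attention-weighted part feature with the global feature. Dimensions and the fusion layer are assumptions, not the published PAM.

```python
# Minimal sketch of a part-attention module consistent with the description
# above: softmax importance scores over part features, aggregate them, and
# fuse with the global appearance feature.
import torch
import torch.nn as nn

class PartAttention(nn.Module):
    def __init__(self, d=256):
        super().__init__()
        self.score = nn.Linear(d, 1)      # importance score per part
        self.fuse = nn.Linear(2 * d, d)   # combine global and part information

    def forward(self, global_feat, part_feats):
        # part_feats: (batch, num_parts, d); global_feat: (batch, d)
        attn = torch.softmax(self.score(part_feats), dim=1)  # (batch, parts, 1)
        part_agg = (attn * part_feats).sum(dim=1)            # weighted part feature
        return self.fuse(torch.cat([global_feat, part_agg], dim=-1))

# Usage: 8 vehicles, 5 detected parts each, 256-d features from a backbone.
feat = PartAttention()(torch.randn(8, 256), torch.randn(8, 5, 256))
print(feat.shape)  # torch.Size([8, 256])
```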

Why gradient clipping accelerates training: A theoretical justification for adaptivity

Title Why gradient clipping accelerates training: A theoretical justification for adaptivity
Authors Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie
Abstract We provide a theoretical explanation for the effectiveness of gradient clipping in training deep neural networks. The key ingredient is a new smoothness condition derived from practical neural network training examples. We observe that gradient smoothness, a concept central to the analysis of first-order optimization algorithms that is often assumed to be a constant, demonstrates significant variability along the training trajectory of deep neural networks. Further, this smoothness positively correlates with the gradient norm, and contrary to standard assumptions in the literature, it can grow with the norm of the gradient. These empirical observations limit the applicability of existing theoretical analyses of algorithms that rely on a fixed bound on smoothness. These observations motivate us to introduce a novel relaxation of gradient smoothness that is weaker than the commonly used Lipschitz smoothness assumption. Under the new condition, we prove that two popular methods, namely, \emph{gradient clipping} and \emph{normalized gradient}, converge arbitrarily faster than gradient descent with fixed stepsize. We further explain why such adaptively scaled gradient methods can accelerate empirical convergence and verify our results empirically in popular neural network training settings.
Tasks Image Classification, Language Modelling
Published 2019-05-28
URL https://arxiv.org/abs/1905.11881v2
PDF https://arxiv.org/pdf/1905.11881v2.pdf
PWC https://paperswithcode.com/paper/analysis-of-gradient-clipping-and-adaptive
Repo
Framework
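
The clipped update the analysis covers is simple to state: take a gradient step whose direction is $g$ but whose norm never exceeds a threshold, i.e. $x \leftarrow x - \eta \min(1, \gamma/\lVert g\rVert)\, g$. Below is a minimal sketch on a toy quadratic; the threshold and step size are illustrative assumptions.

```python
# Minimal sketch of a norm-clipped gradient step: the step is g rescaled so
# its norm never exceeds clip_norm, then applied with step size lr.
import torch

def clipped_sgd_step(params, lr=0.1, clip_norm=1.0):
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    scale = min(1.0, clip_norm / (total_norm.item() + 1e-12))
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p -= lr * scale * p.grad

# Usage on a toy quadratic with a large initial gradient:
x = torch.tensor([100.0], requires_grad=True)
for _ in range(5):
    loss = 0.5 * (x ** 2).sum()
    loss.backward()
    clipped_sgd_step([x])   # step size is capped at lr * clip_norm
    x.grad.zero_()
    print(x.item())
```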

Scalable Thompson Sampling via Optimal Transport

Title Scalable Thompson Sampling via Optimal Transport
Authors Ruiyi Zhang, Zheng Wen, Changyou Chen, Lawrence Carin
Abstract Thompson sampling (TS) is a class of algorithms for sequential decision-making, which requires maintaining a posterior distribution over a model. However, calculating exact posterior distributions is intractable for all but the simplest models. Consequently, efficient computation of an approximate posterior distribution is a crucial problem for scalable TS with complex models, such as neural networks. In this paper, we use distribution optimization techniques to approximate the posterior distribution, solved via Wasserstein gradient flows. Based on the framework, a principled particle-optimization algorithm is developed for TS to approximate the posterior efficiently. Our approach is scalable and does not make explicit distribution assumptions on posterior approximations. Extensive experiments on both synthetic data and real large-scale data demonstrate the superior performance of the proposed methods.
Tasks Decision Making
Published 2019-02-19
URL http://arxiv.org/abs/1902.07239v1
PDF http://arxiv.org/pdf/1902.07239v1.pdf
PWC https://paperswithcode.com/paper/scalable-thompson-sampling-via-optimal
Repo
Framework
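
As a rough illustration of particle-based Thompson sampling (not the paper's Wasserstein-gradient-flow update), the sketch below represents each arm's posterior by a set of particles, samples one particle per arm to choose an action, and nudges the pulled arm's particles by a gradient step on the Gaussian log posterior. All constants are illustrative assumptions.

```python
# Minimal sketch of particle-based Thompson sampling on a Gaussian bandit.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])
n_arms, n_particles, lr = 3, 50, 0.1

# Particles initialized from a broad prior N(0, 1).
particles = rng.normal(0.0, 1.0, size=(n_arms, n_particles))
counts = np.zeros(n_arms)
sums = np.zeros(n_arms)

for t in range(500):
    # Thompson sampling: draw one particle per arm, act greedily on the draws.
    sampled = particles[np.arange(n_arms), rng.integers(n_particles, size=n_arms)]
    arm = int(np.argmax(sampled))
    reward = rng.normal(true_means[arm], 0.5)
    counts[arm] += 1
    sums[arm] += reward

    # Crude particle update: gradient ascent on the Gaussian log posterior,
    # grad log p(theta | data) = (sum_rewards - n * theta) / sigma^2 - theta.
    theta = particles[arm]
    grad = (sums[arm] - counts[arm] * theta) / 0.25 - theta
    particles[arm] = theta + lr * grad / (counts[arm] + 1)

print("pulls per arm:", counts)  # most pulls should go to the best arm
```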

A computational model of early language acquisition from audiovisual experiences of young infants

Title A computational model of early language acquisition from audiovisual experiences of young infants
Authors Okko Räsänen, Khazar Khorrami
Abstract Earlier research has suggested that human infants might use statistical dependencies between speech and non-linguistic multimodal input to bootstrap their language learning before they know how to segment words from running speech. However, feasibility of this hypothesis in terms of real-world infant experiences has remained unclear. This paper presents a step towards a more realistic test of the multimodal bootstrapping hypothesis by describing a neural network model that can learn word segments and their meanings from referentially ambiguous acoustic input. The model is tested on recordings of real infant-caregiver interactions using utterance-level labels for concrete visual objects that were attended by the infant when caregiver spoke an utterance containing the name of the object, and using random visual labels for utterances during absence of attention. The results show that beginnings of lexical knowledge may indeed emerge from individually ambiguous learning scenarios. In addition, the hidden layers of the network show gradually increasing selectivity to phonetic categories as a function of layer depth, resembling models trained for phone recognition in a supervised manner.
Tasks Language Acquisition
Published 2019-06-24
URL https://arxiv.org/abs/1906.09832v1
PDF https://arxiv.org/pdf/1906.09832v1.pdf
PWC https://paperswithcode.com/paper/a-computational-model-of-early-language
Repo
Framework
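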

Live Forensics for Distributed Storage Systems

Title Live Forensics for Distributed Storage Systems
Authors Saurabh Jha, Shengkun Cui, Tianyin Xu, Jeremy Enos, Mike Showerman, Mark Dalton, Zbigniew T. Kalbarczyk, William T. Kramer, Ravishankar K. Iyer
Abstract We present Kaleidoscope, an innovative system that supports live forensics for application performance problems caused by either individual component failures or resource contention issues in large-scale distributed storage systems. The design of Kaleidoscope is driven by our study of I/O failures observed in a peta-scale storage system anonymized as PetaStore. Kaleidoscope is built on three key features: 1) using temporal and spatial differential observability for end-to-end performance monitoring of I/O requests, 2) modeling the health of storage components as a stochastic process using domain-guided functions that account for path redundancy and uncertainty in measurements, and 3) observing differences in reliability and performance metrics between similar types of healthy and unhealthy components to attribute the most likely root causes. We deployed Kaleidoscope on PetaStore and our evaluation shows that Kaleidoscope can run live forensics at 5-minute intervals and pinpoint the root causes of 95.8% of real-world performance issues, with negligible monitoring overhead.
Tasks
Published 2019-07-24
URL https://arxiv.org/abs/1907.10203v1
PDF https://arxiv.org/pdf/1907.10203v1.pdf
PWC https://paperswithcode.com/paper/live-forensics-for-distributed-storage
Repo
Framework
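
Feature (3) above, attributing root causes by comparing healthy and unhealthy components of the same type, can be illustrated with a simple peer comparison: flag components whose latency deviates far from the median of their peers. Component names, metrics, and the threshold are assumptions, not Kaleidoscope's actual statistical machinery.

```python
# Minimal sketch of peer comparison for root-cause attribution: rank
# same-type components by how far their latency deviates from the peer median.
import statistics

latencies_ms = {          # per-component I/O latency observed this interval
    "oss-01": 4.1, "oss-02": 4.4, "oss-03": 58.0,
    "oss-04": 3.9, "oss-05": 4.6,
}

values = list(latencies_ms.values())
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values) or 1e-9

scores = {name: abs(v - median) / mad for name, v in latencies_ms.items()}
suspects = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

for name, score in suspects:
    flag = "LIKELY ROOT CAUSE" if score > 5 else "healthy"
    print(f"{name}: robust z-score {score:.1f} -> {flag}")
```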

Event Outcome Prediction using Sentiment Analysis and Crowd Wisdom in Microblog Feeds

Title Event Outcome Prediction using Sentiment Analysis and Crowd Wisdom in Microblog Feeds
Authors Rahul Radhakrishnan Iyer, Ronghuo Zheng, Yuezhang Li, Katia Sycara
Abstract Sentiment analysis of microblog feeds has attracted considerable interest in recent times. Most of the current work focuses on tweet sentiment classification, but not much work has been done to explore how reliable the opinions of the mass (crowd wisdom) in social network microblogs such as Twitter are in predicting outcomes of certain events such as election debates. In this work, we investigate whether crowd wisdom is useful in predicting such outcomes and whether their opinions are influenced by the experts in the field. We work in the domain of multi-label classification to perform sentiment classification of tweets and obtain the opinion of the crowd. This learnt sentiment is then used to predict outcomes of events such as US Presidential Debate winners, Grammy Award winners, and Super Bowl winners. We find that in most of the cases the wisdom of the crowd does indeed match that of the experts, and in cases where they don’t (particularly in the case of debates), we see that the crowd’s opinion is actually influenced by that of the experts.
Tasks Multi-Label Classification, Sentiment Analysis
Published 2019-12-11
URL https://arxiv.org/abs/1912.05066v1
PDF https://arxiv.org/pdf/1912.05066v1.pdf
PWC https://paperswithcode.com/paper/event-outcome-prediction-using-sentiment
Repo
Framework
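
The aggregation step, turning per-tweet sentiment into a crowd verdict per contender, can be sketched as below. The keyword-based scorer stands in for the paper's multi-label sentiment classifier, and the tweets and keyword lists are illustrative assumptions.

```python
# Minimal sketch of crowd-wisdom aggregation: sum per-tweet sentiment toward
# each contender and predict the contender with the highest total.
from collections import defaultdict

tweets = [
    ("candidate_a", "great performance tonight, so sharp"),
    ("candidate_a", "weak answers, really bad night"),
    ("candidate_b", "brilliant and confident, clearly won"),
    ("candidate_b", "strong closing statement, impressive"),
]

POSITIVE = {"great", "sharp", "brilliant", "confident", "won", "strong", "impressive"}
NEGATIVE = {"weak", "bad", "poor"}

def tweet_sentiment(text):
    # Stand-in for the paper's classifier: keyword polarity count.
    words = set(text.lower().replace(",", "").split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

crowd_score = defaultdict(int)
for contender, text in tweets:
    crowd_score[contender] += tweet_sentiment(text)

predicted_winner = max(crowd_score, key=crowd_score.get)
print(dict(crowd_score), "->", predicted_winner)  # crowd-wisdom prediction
```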

Topic Classification Method for Analyzing Effect of eWOM on Consumer Game Sales

Title Topic Classification Method for Analyzing Effect of eWOM on Consumer Game Sales
Authors Yoshiki Horii, Hirofumi Nonaka, Elisa Claire Alemán Carreón, Hiroki Horino, Toru Hiraoka
Abstract Electronic word-of-mouth (eWOM) has become an important resource for marketing research. In this study, we focus on tweet data in order to analyze user needs for consumer game software. We propose a topic extraction method that uses entropy-based feature selection for feature expansion, and we apply it to classify the extracted tweets with an SVM. As a result, we achieve an F-measure of 0.63.
Tasks Feature Selection
Published 2019-04-23
URL http://arxiv.org/abs/1904.13213v1
PDF http://arxiv.org/pdf/1904.13213v1.pdf
PWC https://paperswithcode.com/paper/190413213
Repo
Framework
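
One plausible reading of the method (keep terms whose distribution over classes has low entropy, then train an SVM on the reduced bag-of-words) is sketched below with scikit-learn. The toy tweets, the entropy threshold, and the exact selection criterion are assumptions rather than the authors' implementation.

```python
# Minimal sketch: entropy-based term selection followed by a linear SVM.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

tweets = ["the controls feel laggy and broken", "loving the new story chapter",
          "servers crash every night", "soundtrack and story are amazing"]
labels = np.array([0, 1, 0, 1])   # 0 = complaint topic, 1 = praise topic

vec = CountVectorizer()
X = vec.fit_transform(tweets).toarray()

def term_entropy(col):
    # Entropy of a term's count distribution over the two classes.
    counts = np.array([col[labels == c].sum() for c in (0, 1)], dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

entropies = np.array([term_entropy(X[:, j]) for j in range(X.shape[1])])
keep = entropies <= 0.5           # low-entropy, topic-specific terms
print("kept terms:", np.array(vec.get_feature_names_out())[keep])

clf = LinearSVC().fit(X[:, keep], labels)
print(clf.predict(vec.transform(["the story is amazing"]).toarray()[:, keep]))
```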

Fair Division Without Disparate Impact

Title Fair Division Without Disparate Impact
Authors Alexander Peysakhovich, Christian Kroer
Abstract We consider the problem of dividing items between individuals in a way that is fair both in the sense of distributional fairness and in the sense of not having disparate impact across protected classes. An important existing mechanism for distributionally fair division is competitive equilibrium from equal incomes (CEEI). Unfortunately, CEEI will not, in general, respect disparate impact constraints. We consider two types of disparate impact measures: requiring that allocations be similar across protected classes and requiring that average utility levels be similar across protected classes. We modify the standard CEEI algorithm in two ways: equitable equilibrium from equal incomes, which removes disparate impact in allocations, and competitive equilibrium from equitable incomes, which removes disparate impact in attained utility levels. We show analytically that removing disparate impact in outcomes breaks several of CEEI’s desirable properties such as envy-freeness, regret-freeness, Pareto optimality, and incentive compatibility. By contrast, we can remove disparate impact in attained utility levels without affecting these properties. Finally, we experimentally evaluate the tradeoffs between efficiency, equity, and disparate impact in a recommender-system based market.
Tasks Recommendation Systems
Published 2019-06-06
URL https://arxiv.org/abs/1906.02775v1
PDF https://arxiv.org/pdf/1906.02775v1.pdf
PWC https://paperswithcode.com/paper/fair-division-without-disparate-impact
Repo
Framework
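
The CEEI baseline the paper modifies can, for linear (additive) utilities, be computed with the Eisenberg-Gale convex program: maximize the sum of log utilities subject to each item being allocated at most once. The cvxpy sketch below shows that baseline; the valuation matrix is an assumption, and the paper's equitable variants would add cross-group constraints on top.

```python
# Minimal sketch of CEEI for linear utilities via the Eisenberg-Gale program.
# Requires the `cvxpy` package.
import cvxpy as cp
import numpy as np

V = np.array([[3.0, 1.0, 2.0],   # valuations: rows = individuals, cols = items
              [1.0, 4.0, 1.0],
              [2.0, 2.0, 3.0]])
n, m = V.shape

X = cp.Variable((n, m), nonneg=True)          # fractional allocation
utilities = cp.sum(cp.multiply(V, X), axis=1)
objective = cp.Maximize(cp.sum(cp.log(utilities)))
constraints = [cp.sum(X, axis=0) <= 1]        # each item allocated at most once
cp.Problem(objective, constraints).solve()

print("CEEI allocation:\n", np.round(X.value, 3))
print("utilities:", np.round(utilities.value, 3))
```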

Recognition in Unseen Domains: Domain Generalization via Universal Non-volume Preserving Models

Title Recognition in Unseen Domains: Domain Generalization via Universal Non-volume Preserving Models
Authors Thanh-Dat Truong, Chi Nhan Duong, Khoa Luu, Minh-Triet Tran
Abstract Recognition across domains has recently become an active topic in the research community. However, the problem of recognition in new, unseen domains has been largely overlooked. Under this condition, the delivered deep network models are unable to be updated, adapted or fine-tuned. Therefore, recent deep learning techniques, such as domain adaptation, feature transferring, and fine-tuning, cannot be applied. This paper presents a novel approach to the problem of domain generalization in the context of deep learning. The proposed method is evaluated on different datasets in various problems, i.e. (i) digit recognition on MNIST, SVHN and MNIST-M, (ii) face recognition on Extended Yale-B, CMU-PIE and CMU-MPIE, and (iii) pedestrian recognition on RGB and Thermal image datasets. The experimental results show that our proposed method consistently improves recognition accuracy. It can also be easily incorporated with other CNN frameworks within an end-to-end deep network design for object detection and recognition problems to improve their performance.
Tasks Domain Adaptation, Domain Generalization, Face Recognition, Object Detection
Published 2019-05-28
URL https://arxiv.org/abs/1905.13040v1
PDF https://arxiv.org/pdf/1905.13040v1.pdf
PWC https://paperswithcode.com/paper/recognition-in-unseen-domains-domain
Repo
Framework

Giant Panda Face Recognition Using Small Dataset

Title Giant Panda Face Recognition Using Small Dataset
Authors Wojciech Michal Matkowski, Adams Wai Kin Kong, Han Su, Peng Chen, Rong Hou, Zhihe Zhang
Abstract The giant panda (panda) is a highly endangered animal. Significant efforts and resources have been put into panda conservation. To measure the effectiveness of conservation schemes, estimating its population size in the wild is an important task. The current population estimation approaches, including capture-recapture, human visual identification and collection of DNA from hair or feces, are invasive, subjective, costly or even dangerous to the workers who perform these tasks in the wild. Cameras have been widely installed in the regions where pandas live, which opens a new possibility for non-invasive, image-based panda recognition. Panda face recognition is naturally a small-dataset problem, because of the number of pandas in the world and the number of qualified images captured by the cameras in each encounter. In this paper, a panda face recognition algorithm, which includes alignment, large feature set extraction and matching, is proposed and evaluated on a dataset consisting of 163 images. The experimental results are encouraging.
Tasks Face Recognition
Published 2019-05-27
URL https://arxiv.org/abs/1905.11163v1
PDF https://arxiv.org/pdf/1905.11163v1.pdf
PWC https://paperswithcode.com/paper/giant-panda-face-recognition-using-small
Repo
Framework