Paper Group ANR 1274
Wasserstein Dependency Measure for Representation Learning
Title | Wasserstein Dependency Measure for Representation Learning |
Authors | Sherjil Ozair, Corey Lynch, Yoshua Bengio, Aaron van den Oord, Sergey Levine, Pierre Sermanet |
Abstract | Mutual information maximization has emerged as a powerful learning objective for unsupervised representation learning, obtaining state-of-the-art performance in applications such as object recognition, speech recognition, and reinforcement learning. However, such approaches are fundamentally limited since a tight lower bound on mutual information requires a sample size exponential in the mutual information. This limits the applicability of these approaches to prediction tasks with high mutual information, such as video understanding or reinforcement learning. In these settings, such techniques are prone to overfitting, both in theory and in practice, and capture only a few of the relevant factors of variation. This leads to incomplete representations that are not optimal for downstream tasks. In this work, we empirically demonstrate that mutual information-based representation learning approaches do fail to learn complete representations on a number of designed and real-world tasks. To mitigate these problems, we introduce the Wasserstein dependency measure, which learns more complete representations by using the Wasserstein distance instead of the KL divergence in the mutual information estimator. We show that a practical approximation to this theoretically motivated solution, constructed using Lipschitz constraint techniques from the GAN literature, achieves substantially improved results on tasks where incomplete representations are a major challenge. |
Tasks | Object Recognition, Representation Learning, Speech Recognition, Unsupervised Representation Learning, Video Understanding |
Published | 2019-03-28 |
URL | http://arxiv.org/abs/1903.11780v1 |
PDF | http://arxiv.org/pdf/1903.11780v1.pdf |
PWC | https://paperswithcode.com/paper/wasserstein-dependency-measure-for |
Repo | |
Framework | |
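A minimal sketch of the kind of objective the paper describes, not the authors' code: a contrastive dependency estimate whose critic is kept approximately Lipschitz with spectral normalization, echoing the GAN-style Lipschitz constraints mentioned in the abstract. All names and layer sizes below are illustrative assumptions.

```python
# Sketch only: contrastive dependency objective with a spectrally normalized critic,
# in the spirit of swapping the KL-based MI bound for a Wasserstein-style one.
import torch
import torch.nn as nn

class LipschitzCritic(nn.Module):
    def __init__(self, x_dim, y_dim, hidden=256):
        super().__init__()
        sn = nn.utils.spectral_norm  # constrains each layer's spectral norm
        self.net = nn.Sequential(
            sn(nn.Linear(x_dim + y_dim, hidden)), nn.ReLU(),
            sn(nn.Linear(hidden, hidden)), nn.ReLU(),
            sn(nn.Linear(hidden, 1)),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def wasserstein_dependency_loss(critic, x, y):
    """E_p(x,y)[f(x,y)] - E_p(x)p(y)[f(x,y)], with the product of marginals
    approximated by shuffling y within the batch."""
    joint = critic(x, y).mean()
    marginal = critic(x, y[torch.randperm(y.size(0))]).mean()
    return -(joint - marginal)  # minimize the negative dependency estimate
```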
BEHRT: Transformer for Electronic Health Records
Title | BEHRT: Transformer for Electronic Health Records |
Authors | Yikuan Li, Shishir Rao, Jose Roberto Ayala Solares, Abdelaali Hassaine, Dexter Canoy, Yajie Zhu, Kazem Rahimi, Gholamreza Salimi-Khorshidi |
Abstract | Today, despite decades of developments in medicine and the growing interest in precision healthcare, the vast majority of diagnoses happen once patients begin to show noticeable signs of illness. Early indication and detection of diseases, however, can provide patients and carers with the chance of early intervention, better disease management, and efficient allocation of healthcare resources. The latest developments in machine learning (more specifically, deep learning) provide a great opportunity to address this unmet need. In this study, we introduce BEHRT: a deep neural sequence transduction model for EHR (electronic health records), capable of multitask prediction and disease trajectory mapping. When trained and evaluated on data from nearly 1.6 million individuals, BEHRT shows a striking absolute improvement of 8.0-10.8% in Average Precision Score, compared to existing state-of-the-art deep EHR models, when predicting the onset of 301 conditions. In addition to its superior prediction power, BEHRT provides a personalised view of disease trajectories through its attention mechanism; its flexible architecture enables it to incorporate multiple heterogeneous concepts (e.g., diagnosis, medication, measurements, and more) to improve the accuracy of its predictions; and its (pre-)training results in disease and patient representations that can help us get a step closer to interpretable predictions. |
Tasks | |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.09538v1 |
PDF | https://arxiv.org/pdf/1907.09538v1.pdf |
PWC | https://paperswithcode.com/paper/behrt-transformer-for-electronic-health |
Repo | |
Framework | |
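A rough, illustrative sketch of a BERT-style encoder over sequences of diagnosis codes with age, visit-segment, and position embeddings feeding a multi-label prediction head. Vocabulary sizes, dimensions, and the pooling strategy are placeholder assumptions, not the released BEHRT model.

```python
# Illustrative sketch only: transformer encoder over EHR code sequences.
import torch
import torch.nn as nn

class EHRTransformer(nn.Module):
    def __init__(self, n_codes, n_ages, n_conditions,
                 d_model=128, n_heads=4, n_layers=4, max_len=512):
        super().__init__()
        self.code_emb = nn.Embedding(n_codes, d_model, padding_idx=0)
        self.age_emb = nn.Embedding(n_ages, d_model)
        self.seg_emb = nn.Embedding(2, d_model)       # alternating visit segments
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_conditions)  # multi-label condition logits

    def forward(self, codes, ages, segments, pad_mask):
        pos = torch.arange(codes.size(1), device=codes.device).unsqueeze(0)
        h = (self.code_emb(codes) + self.age_emb(ages)
             + self.seg_emb(segments) + self.pos_emb(pos))
        h = self.encoder(h, src_key_padding_mask=pad_mask)
        return self.head(h[:, 0])  # pooled first position; train with BCEWithLogitsLoss
```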
Acquisition of Inflectional Morphology in Artificial Neural Networks With Prior Knowledge
Title | Acquisition of Inflectional Morphology in Artificial Neural Networks With Prior Knowledge |
Authors | Katharina Kann |
Abstract | How does knowledge of one language’s morphology influence learning of inflection rules in a second one? In order to investigate this question in artificial neural network models, we perform experiments with a sequence-to-sequence architecture, which we train on different combinations of eight source and three target languages. A detailed analysis of the model outputs suggests the following conclusions: (i) if source and target language are closely related, acquisition of the target language’s inflectional morphology constitutes an easier task for the model; (ii) knowledge of a prefixing (resp. suffixing) language makes acquisition of a suffixing (resp. prefixing) language’s morphology more challenging; and (iii) surprisingly, a source language which exhibits an agglutinative morphology simplifies learning of a second language’s inflectional morphology, independent of their relatedness. |
Tasks | |
Published | 2019-10-12 |
URL | https://arxiv.org/abs/1910.05456v1 |
PDF | https://arxiv.org/pdf/1910.05456v1.pdf |
PWC | https://paperswithcode.com/paper/acquisition-of-inflectional-morphology-in |
Repo | |
Framework | |
Unbabel’s Submission to the WMT2019 APE Shared Task: BERT-based Encoder-Decoder for Automatic Post-Editing
Title | Unbabel’s Submission to the WMT2019 APE Shared Task: BERT-based Encoder-Decoder for Automatic Post-Editing |
Authors | António V. Lopes, M. Amin Farajian, Gonçalo M. Correia, Jonay Trenous, André F. T. Martins |
Abstract | This paper describes Unbabel’s submission to the WMT2019 APE Shared Task for the English-German language pair. Following the recent rise of large, powerful, pre-trained models, we adapt the BERT pretrained model to perform Automatic Post-Editing in an encoder-decoder framework. Analogously to dual-encoder architectures, we develop a BERT-based encoder-decoder (BED) model in which a single pretrained BERT encoder receives both the source src and machine translation tgt strings. Furthermore, we explore a conservativeness factor to constrain the APE system to perform fewer edits. As the official results show, when trained on a weighted combination of in-domain and artificial training data, our BED system with the conservativeness penalty significantly improves the translations of a strong Neural Machine Translation system by $-0.78$ and $+1.23$ in terms of TER and BLEU, respectively. Finally, our submission achieves a new state-of-the-art, ex-aequo, in English-German APE of NMT. |
Tasks | Automatic Post-Editing, Machine Translation |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13068v2 |
PDF | https://arxiv.org/pdf/1905.13068v2.pdf |
PWC | https://paperswithcode.com/paper/unbabels-submission-to-the-wmt2019-ape-shared |
Repo | |
Framework | |
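One way a "conservativeness" factor could be realized at decoding time, shown as a sketch of our own interpretation rather than Unbabel's implementation: tokens that never appear in the source or the machine-translation input have their next-token scores pushed down, which biases the post-editor toward copying and performing fewer edits. The function name and penalty value are hypothetical.

```python
# Sketch (interpretation, not the submission's code): penalize tokens absent
# from the source/MT inputs during beam search to encourage conservative edits.
import torch

def apply_conservativeness_penalty(logits, src_ids, mt_ids, penalty=1.0):
    """logits: (vocab_size,) next-token scores; src_ids/mt_ids: 1-D id tensors."""
    allowed = torch.unique(torch.cat([src_ids, mt_ids]))
    mask = torch.ones_like(logits, dtype=torch.bool)
    mask[allowed] = False                    # keep tokens seen in src or mt untouched
    return logits - penalty * mask.float()   # push down everything else

# usage inside decoding: scores = apply_conservativeness_penalty(scores, src, mt, penalty=2.0)
```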
Mixed-Integer Optimization Approach to Learning Association Rules for Unplanned ICU Transfer
Title | Mixed-Integer Optimization Approach to Learning Association Rules for Unplanned ICU Transfer |
Authors | Chun-An Chou, Qingtao Cao, Shao-Jen Weng, Che-Hung Tsai |
Abstract | After admission to the emergency department (ED), patients with critical illnesses are transferred to the intensive care unit (ICU) when unexpected clinical deterioration occurs. Identifying such unplanned ICU transfers is urgently needed to help physicians achieve two goals: improving critical care quality and preventing mortality. A priority task is to understand the crucial rationale behind the diagnosis results of individual patients during their stay in the ED, which helps prepare for an early transfer to the ICU. Most existing prediction studies were based on univariate analysis or multiple logistic regression and provide one-size-fits-all results. However, patient conditions vary from case to case and may not be accurately examined by such a single judgment. In this study, we present a new decision tool using a mathematical optimization approach that aims to automatically discover rules associating diagnostic features with the high-risk outcome (i.e., unplanned transfers) in different deterioration scenarios. We consider four mutually exclusive patient subgroups based on the principal reasons for ED visits: infections, cardiovascular/respiratory diseases, gastrointestinal diseases, and neurological/other diseases at a suburban teaching hospital. The analysis results demonstrate significant rules associated with the unplanned transfer outcome for each subgroup and also show prediction accuracy comparable to state-of-the-art machine learning methods while providing easy-to-interpret symptom-outcome information. |
Tasks | |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.00966v1 |
PDF | https://arxiv.org/pdf/1908.00966v1.pdf |
PWC | https://paperswithcode.com/paper/mixed-integer-optimization-approach-to |
Repo | |
Framework | |
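A generic sketch of learning a single conjunctive rule over binary diagnostic features with a mixed-integer program, using PuLP. This is a standard rule-learning MIP shape, not the paper's exact formulation; the variable names, objective weighting, and feature budget are assumptions for illustration.

```python
# Generic rule-learning MIP sketch: s[j] = 1 selects feature j for the rule;
# z[i] = 1 iff sample i satisfies every selected feature. The objective rewards
# covered positives (unplanned transfers) and penalizes covered negatives.
import pulp

def learn_rule(X, y, lam=1.0, max_features=3):
    n, d = len(X), len(X[0])
    prob = pulp.LpProblem("rule_learning", pulp.LpMaximize)
    s = [pulp.LpVariable(f"s_{j}", cat="Binary") for j in range(d)]
    z = [pulp.LpVariable(f"z_{i}", cat="Binary") for i in range(n)]
    for i in range(n):
        # if any selected feature is absent in sample i, the rule cannot cover it
        for j in range(d):
            if X[i][j] == 0:
                prob += z[i] <= 1 - s[j]
        # if no selected feature is absent, the rule does cover sample i
        prob += z[i] >= 1 - pulp.lpSum(s[j] for j in range(d) if X[i][j] == 0)
    prob += pulp.lpSum(s) <= max_features
    # objective: covered positives minus lam * covered negatives
    prob += pulp.lpSum(z[i] for i in range(n) if y[i] == 1) \
          - lam * pulp.lpSum(z[i] for i in range(n) if y[i] == 0)
    prob.solve()
    return [j for j in range(d) if s[j].value() == 1]
```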
Part-Guided Attention Learning for Vehicle Re-Identification
Title | Part-Guided Attention Learning for Vehicle Re-Identification |
Authors | Xinyu Zhang, Rufeng Zhang, Jiewei Cao, Dong Gong, Mingyu You, Chunhua Shen |
Abstract | Vehicle re-identification (Re-ID) often requires one to recognize the fine-grained visual differences between vehicles. Besides the holistic appearance of vehicles which is easily affected by the viewpoint variation and distortion, vehicle parts also provide crucial cues to differentiate near-identical vehicles. Motivated by these observations, we introduce a Part-Guided Attention Network (PGAN) to pinpoint the prominent part regions and effectively combine the global and part information for discriminative feature learning. PGAN first detects the locations of different part components and salient regions regardless of the vehicle identity, which serve as the bottom-up attention to narrow down the possible searching regions. To estimate the importance of detected parts, we propose a Part Attention Module (PAM) to adaptively locate the most discriminative regions with high-attention weights and suppress the distraction of irrelevant parts with relatively low weights. The PAM is guided by the Re-ID loss and therefore provides top-down attention that enables attention to be calculated at the level of car parts and other salient regions. Finally, we aggregate the global appearance and part features to improve the feature performance further. The PGAN combines part-guided bottom-up and top-down attention, global and part visual features in an end-to-end framework. Extensive experiments demonstrate that the proposed method achieves new state-of-the-art vehicle Re-ID performance on four large-scale benchmark datasets. |
Tasks | Vehicle Re-Identification |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06023v3 |
PDF | https://arxiv.org/pdf/1909.06023v3.pdf |
PWC | https://paperswithcode.com/paper/part-guided-attention-learning-for-vehicle-re |
Repo | |
Framework | |
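A minimal sketch of a part-attention style module, our simplification rather than the authors' PAM: each detected part feature receives a learned attention weight, parts are aggregated by the softmax weights, and the result is concatenated with the global appearance feature for the Re-ID classifier. Dimensions and the scoring network are illustrative.

```python
# Simplified part-attention sketch (not the authors' code).
import torch
import torch.nn as nn

class PartAttention(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(feat_dim, feat_dim // 4), nn.ReLU(),
                                   nn.Linear(feat_dim // 4, 1))

    def forward(self, part_feats):                       # (batch, n_parts, feat_dim)
        weights = torch.softmax(self.score(part_feats).squeeze(-1), dim=1)
        attended = (weights.unsqueeze(-1) * part_feats).sum(dim=1)
        return attended, weights

class ReIDHead(nn.Module):
    def __init__(self, feat_dim=256, n_ids=1000):
        super().__init__()
        self.pam = PartAttention(feat_dim)
        self.classifier = nn.Linear(2 * feat_dim, n_ids)  # global + attended parts

    def forward(self, global_feat, part_feats):
        part_agg, _ = self.pam(part_feats)
        return self.classifier(torch.cat([global_feat, part_agg], dim=-1))
```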
Why gradient clipping accelerates training: A theoretical justification for adaptivity
Title | Why gradient clipping accelerates training: A theoretical justification for adaptivity |
Authors | Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie |
Abstract | We provide a theoretical explanation for the effectiveness of gradient clipping in training deep neural networks. The key ingredient is a new smoothness condition derived from practical neural network training examples. We observe that gradient smoothness, a concept central to the analysis of first-order optimization algorithms that is often assumed to be a constant, demonstrates significant variability along the training trajectory of deep neural networks. Further, this smoothness positively correlates with the gradient norm, and contrary to standard assumptions in the literature, it can grow with the norm of the gradient. These empirical observations limit the applicability of existing theoretical analyses of algorithms that rely on a fixed bound on smoothness. These observations motivate us to introduce a novel relaxation of gradient smoothness that is weaker than the commonly used Lipschitz smoothness assumption. Under the new condition, we prove that two popular methods, namely, \emph{gradient clipping} and \emph{normalized gradient}, converge arbitrarily faster than gradient descent with fixed stepsize. We further explain why such adaptively scaled gradient methods can accelerate empirical convergence and verify our results empirically in popular neural network training settings. |
Tasks | Image Classification, Language Modelling |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11881v2 |
PDF | https://arxiv.org/pdf/1905.11881v2.pdf |
PWC | https://paperswithcode.com/paper/analysis-of-gradient-clipping-and-adaptive |
Repo | |
Framework | |
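Roughly, the paper's relaxed smoothness condition lets the local smoothness scale like $L_0 + L_1\|\nabla f(x)\|$ instead of being a fixed constant, which is why clipping the gradient norm (an adaptive effective step size) helps. Below is a standard clipped-gradient training step in PyTorch, a sketch of the kind of update the paper analyzes rather than code from the paper; the clip value is arbitrary.

```python
# Standard clipped-SGD step: cap the global gradient norm before the optimizer step.
import torch

def clipped_step(model, loss_fn, batch, optimizer, clip_value=1.0):
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()
    # rescale gradients so their total norm is at most clip_value
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip_value)
    optimizer.step()
    return loss.item()
```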
Scalable Thompson Sampling via Optimal Transport
Title | Scalable Thompson Sampling via Optimal Transport |
Authors | Ruiyi Zhang, Zheng Wen, Changyou Chen, Lawrence Carin |
Abstract | Thompson sampling (TS) is a class of algorithms for sequential decision-making, which requires maintaining a posterior distribution over a model. However, calculating exact posterior distributions is intractable for all but the simplest models. Consequently, efficient computation of an approximate posterior distribution is a crucial problem for scalable TS with complex models, such as neural networks. In this paper, we use distribution optimization techniques to approximate the posterior distribution, solved via Wasserstein gradient flows. Based on the framework, a principled particle-optimization algorithm is developed for TS to approximate the posterior efficiently. Our approach is scalable and does not make explicit distribution assumptions on posterior approximations. Extensive experiments on both synthetic data and real large-scale data demonstrate the superior performance of the proposed methods. |
Tasks | Decision Making |
Published | 2019-02-19 |
URL | http://arxiv.org/abs/1902.07239v1 |
PDF | http://arxiv.org/pdf/1902.07239v1.pdf |
PWC | https://paperswithcode.com/paper/scalable-thompson-sampling-via-optimal |
Repo | |
Framework | |
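A simplified sketch of particle-based Thompson sampling for a linear Gaussian bandit: a set of particles stands in for posterior samples, and the particles are refreshed with noisy gradient steps on the log-posterior. This is a crude stand-in for the paper's Wasserstein gradient-flow updates; the model, step sizes, and particle count are assumptions.

```python
# Particle-based Thompson sampling sketch (simplification, not the paper's algorithm).
import numpy as np

def log_post_grad(theta, X, y, noise=0.1, prior=1.0):
    # gradient of Gaussian log-likelihood plus isotropic Gaussian prior
    resid = y - X @ theta
    return (X.T @ resid) / noise**2 - theta / prior**2

def particle_thompson_sampling(arms, reward_fn, n_rounds=500, n_particles=20, lr=1e-3):
    d = arms.shape[1]
    particles = np.random.randn(n_particles, d)
    X_hist, y_hist = np.zeros((0, d)), np.zeros(0)
    for _ in range(n_rounds):
        theta = particles[np.random.randint(n_particles)]   # one "posterior draw"
        arm = arms[np.argmax(arms @ theta)]                  # act greedily under it
        X_hist = np.vstack([X_hist, arm])
        y_hist = np.append(y_hist, reward_fn(arm))
        for k in range(n_particles):                         # noisy gradient particle update
            g = log_post_grad(particles[k], X_hist, y_hist)
            particles[k] += lr * g + np.sqrt(2 * lr) * np.random.randn(d)
    return particles
```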
A computational model of early language acquisition from audiovisual experiences of young infants
Title | A computational model of early language acquisition from audiovisual experiences of young infants |
Authors | Okko Räsänen, Khazar Khorrami |
Abstract | Earlier research has suggested that human infants might use statistical dependencies between speech and non-linguistic multimodal input to bootstrap their language learning before they know how to segment words from running speech. However, the feasibility of this hypothesis in terms of real-world infant experiences has remained unclear. This paper presents a step towards a more realistic test of the multimodal bootstrapping hypothesis by describing a neural network model that can learn word segments and their meanings from referentially ambiguous acoustic input. The model is tested on recordings of real infant-caregiver interactions using utterance-level labels for concrete visual objects that were attended by the infant when the caregiver spoke an utterance containing the name of the object, and using random visual labels for utterances during the absence of attention. The results show that the beginnings of lexical knowledge may indeed emerge from individually ambiguous learning scenarios. In addition, the hidden layers of the network show gradually increasing selectivity to phonetic categories as a function of layer depth, resembling models trained for phone recognition in a supervised manner. |
Tasks | Language Acquisition |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.09832v1 |
PDF | https://arxiv.org/pdf/1906.09832v1.pdf |
PWC | https://paperswithcode.com/paper/a-computational-model-of-early-language |
Repo | |
Framework | |
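A rough sketch of the training setup described above, not the authors' model: an utterance-level acoustic encoder is trained to predict the visual-object label attached to the whole utterance, so word-like units and their meanings must emerge from referentially ambiguous supervision. The architecture and feature choices are placeholder assumptions.

```python
# Sketch of utterance-level cross-modal supervision for an acoustic encoder.
import torch
import torch.nn as nn

class UtteranceClassifier(nn.Module):
    def __init__(self, n_mels=40, hidden=256, n_visual_labels=100):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_visual_labels)

    def forward(self, logmel):                 # (batch, frames, n_mels)
        h, _ = self.rnn(logmel)
        return self.out(h.mean(dim=1))         # pool over time, predict the visual label

# train with nn.CrossEntropyLoss against the utterance-level visual label
```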
Live Forensics for Distributed Storage Systems
Title | Live Forensics for Distributed Storage Systems |
Authors | Saurabh Jha, Shengkun Cui, Tianyin Xu, Jeremy Enos, Mike Showerman, Mark Dalton, Zbigniew T. Kalbarczyk, William T. Kramer, Ravishankar K. Iyer |
Abstract | We present Kaleidoscope, an innovative system that supports live forensics for application performance problems caused by either individual component failures or resource contention issues in large-scale distributed storage systems. The design of Kaleidoscope is driven by our study of I/O failures observed in a petascale storage system anonymized as PetaStore. Kaleidoscope is built on three key features: 1) using temporal and spatial differential observability for end-to-end performance monitoring of I/O requests, 2) modeling the health of storage components as a stochastic process using domain-guided functions that account for path redundancy and uncertainty in measurements, and 3) observing differences in reliability and performance metrics between similar types of healthy and unhealthy components to attribute the most likely root causes. We deployed Kaleidoscope on PetaStore and our evaluation shows that Kaleidoscope can run live forensics at 5-minute intervals and pinpoint the root causes of 95.8% of real-world performance issues, with negligible monitoring overhead. |
Tasks | |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10203v1 |
PDF | https://arxiv.org/pdf/1907.10203v1.pdf |
PWC | https://paperswithcode.com/paper/live-forensics-for-distributed-storage |
Repo | |
Framework | |
Event Outcome Prediction using Sentiment Analysis and Crowd Wisdom in Microblog Feeds
Title | Event Outcome Prediction using Sentiment Analysis and Crowd Wisdom in Microblog Feeds |
Authors | Rahul Radhakrishnan Iyer, Ronghuo Zheng, Yuezhang Li, Katia Sycara |
Abstract | Sentiment analysis of microblog feeds has attracted considerable interest in recent times. Most of the current work focuses on tweet sentiment classification, but not much work has been done to explore how reliable the opinions of the mass (crowd wisdom) in social-network microblogs such as Twitter are in predicting the outcomes of events such as election debates. In this work, we investigate whether crowd wisdom is useful in predicting such outcomes and whether the crowd’s opinions are influenced by the experts in the field. We work in the domain of multi-label classification to perform sentiment classification of tweets and obtain the opinion of the crowd. This learnt sentiment is then used to predict the outcomes of events such as US Presidential Debate winners, Grammy Award winners, and Super Bowl winners. We find that in most cases the wisdom of the crowd does indeed match that of the experts, and in cases where they differ (particularly in the case of debates), we see that the crowd’s opinion is actually influenced by that of the experts. |
Tasks | Multi-Label Classification, Sentiment Analysis |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05066v1 |
PDF | https://arxiv.org/pdf/1912.05066v1.pdf |
PWC | https://paperswithcode.com/paper/event-outcome-prediction-using-sentiment |
Repo | |
Framework | |
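A sketch of the general pipeline, not the authors' exact models: multi-label sentiment classification of tweets with a linear model, followed by aggregating the crowd's predicted sentiment per candidate to call an event outcome. The features, classifier, and aggregation rule are illustrative assumptions.

```python
# Pipeline sketch: multi-label tweet sentiment -> per-candidate aggregation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

def train_sentiment_model(tweets, labels):
    """labels: binary indicator matrix of shape (n_tweets, n_sentiment_labels)."""
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                          OneVsRestClassifier(LogisticRegression(max_iter=1000)))
    model.fit(tweets, labels)
    return model

def predict_winner(model, tweets_by_candidate, positive_label_idx=0):
    # score each candidate by the average probability of positive crowd sentiment
    scores = {cand: model.predict_proba(tweets)[:, positive_label_idx].mean()
              for cand, tweets in tweets_by_candidate.items()}
    return max(scores, key=scores.get)
```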
Topic Classification Method for Analyzing Effect of eWOM on Consumer Game Sales
Title | Topic Classification Method for Analyzing Effect of eWOM on Consumer Game Sales |
Authors | Yoshiki Horii, Hirofumi Nonaka, Elisa Claire Alemán Carreón, Hiroki Horino, Toru Hiraoka |
Abstract | Electronic word-of-mouth (eWOM) has become an important resource for marketing research. In this study, in order to analyze user needs for consumer game software, we focus on tweet data and propose a topic extraction method that uses entropy-based feature selection for feature expansion. We apply it, together with an SVM, to classify the topics of the extracted tweet data, achieving an F-measure of 0.63. |
Tasks | Feature Selection |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.13213v1 |
PDF | http://arxiv.org/pdf/1904.13213v1.pdf |
PWC | https://paperswithcode.com/paper/190413213 |
Repo | |
Framework | |
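A sketch of an entropy-style feature selection plus SVM pipeline in scikit-learn. Mutual information is used here as a stand-in for the paper's entropy-based criterion, and the feature count, vectorizer, and split are illustrative assumptions.

```python
# Feature-selection + SVM sketch (mutual information as an entropy-based stand-in).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def run(tweets, labels, k=500):
    X_tr, X_te, y_tr, y_te = train_test_split(tweets, labels,
                                              test_size=0.2, random_state=0)
    model = make_pipeline(CountVectorizer(min_df=2),
                          SelectKBest(mutual_info_classif, k=k),  # k is illustrative
                          LinearSVC())
    model.fit(X_tr, y_tr)
    return f1_score(y_te, model.predict(X_te), average="macro")
```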
Fair Division Without Disparate Impact
Title | Fair Division Without Disparate Impact |
Authors | Alexander Peysakhovich, Christian Kroer |
Abstract | We consider the problem of dividing items between individuals in a way that is fair both in the sense of distributional fairness and in the sense of not having disparate impact across protected classes. An important existing mechanism for distributionally fair division is competitive equilibrium from equal incomes (CEEI). Unfortunately, CEEI will not, in general, respect disparate impact constraints. We consider two types of disparate impact measures: requiring that allocations be similar across protected classes and requiring that average utility levels be similar across protected classes. We modify the standard CEEI algorithm in two ways: equitable equilibrium from equal incomes, which removes disparate impact in allocations, and competitive equilibrium from equitable incomes, which removes disparate impact in attained utility levels. We show analytically that removing disparate impact in outcomes breaks several of CEEI’s desirable properties, such as envy-freeness, regret-freeness, Pareto optimality, and incentive compatibility. By contrast, we can remove disparate impact in attained utility levels without affecting these properties. Finally, we experimentally evaluate the tradeoffs between efficiency, equity, and disparate impact in a recommender-system based market. |
Tasks | Recommendation Systems |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02775v1 |
PDF | https://arxiv.org/pdf/1906.02775v1.pdf |
PWC | https://paperswithcode.com/paper/fair-division-without-disparate-impact |
Repo | |
Framework | |
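For reference, the standard CEEI baseline that the paper modifies can be computed, for divisible goods with additive utilities, via the Eisenberg-Gale convex program (equal incomes reduce it to maximizing the sum of log utilities). The sketch below is that baseline in cvxpy, not the paper's disparate-impact variants.

```python
# Baseline CEEI via the Eisenberg-Gale program (divisible goods, additive utilities).
import cvxpy as cp
import numpy as np

def ceei_allocation(valuations):
    """valuations: (n_agents, n_items) nonnegative matrix; returns the allocation."""
    n, m = valuations.shape
    x = cp.Variable((n, m), nonneg=True)
    utilities = cp.sum(cp.multiply(valuations, x), axis=1)
    # equal incomes => maximize the sum of log utilities (Nash welfare)
    problem = cp.Problem(cp.Maximize(cp.sum(cp.log(utilities))),
                         [cp.sum(x, axis=0) <= 1])   # each item has unit supply
    problem.solve()
    return x.value

# example: ceei_allocation(np.array([[1.0, 2.0], [3.0, 1.0]]))
```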
Recognition in Unseen Domains: Domain Generalization via Universal Non-volume Preserving Models
Title | Recognition in Unseen Domains: Domain Generalization via Universal Non-volume Preserving Models |
Authors | Thanh-Dat Truong, Chi Nhan Duong, Khoa Luu, Minh-Triet Tran |
Abstract | Recognition across domains has recently become an active topic in the research community. However, recognition in new, unseen domains has been largely overlooked. Under this condition, the delivered deep network models cannot be updated, adapted, or fine-tuned. Therefore, recent deep learning techniques, such as domain adaptation, feature transferring, and fine-tuning, cannot be applied. This paper presents a novel approach to the problem of domain generalization in the context of deep learning. The proposed method is evaluated on different datasets in various problems, i.e. (i) digit recognition on MNIST, SVHN and MNIST-M, (ii) face recognition on Extended Yale-B, CMU-PIE and CMU-MPIE, and (iii) pedestrian recognition on RGB and thermal image datasets. The experimental results show that our proposed method consistently improves performance accuracy. It can also be easily incorporated with any other CNN frameworks within an end-to-end deep network design for object detection and recognition problems to improve their performance. |
Tasks | Domain Adaptation, Domain Generalization, Face Recognition, Object Detection |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.13040v1 |
PDF | https://arxiv.org/pdf/1905.13040v1.pdf |
PWC | https://paperswithcode.com/paper/recognition-in-unseen-domains-domain |
Repo | |
Framework | |
Giant Panda Face Recognition Using Small Dataset
Title | Giant Panda Face Recognition Using Small Dataset |
Authors | Wojciech Michal Matkowski, Adams Wai Kin Kong, Han Su, Peng Chen, Rong Hou, Zhihe Zhang |
Abstract | The giant panda (panda) is a highly endangered animal. Significant efforts and resources have been put into panda conservation. To measure the effectiveness of conservation schemes, estimating its population size in the wild is an important task. The current population estimation approaches, including capture-recapture, human visual identification and collection of DNA from hair or feces, are invasive, subjective, costly or even dangerous to the workers who perform these tasks in the wild. Cameras have been widely installed in the regions where pandas live. This opens a new possibility for non-invasive, image-based panda recognition. Panda face recognition is naturally a small-dataset problem, because of the number of pandas in the world and the number of qualified images captured by the cameras in each encounter. In this paper, a panda face recognition algorithm, which includes alignment, large feature set extraction and matching, is proposed and evaluated on a dataset consisting of 163 images. The experimental results are encouraging. |
Tasks | Face Recognition |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11163v1 |
PDF | https://arxiv.org/pdf/1905.11163v1.pdf |
PWC | https://paperswithcode.com/paper/giant-panda-face-recognition-using-small |
Repo | |
Framework | |
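A toy sketch of the align / extract / match pipeline, using HOG descriptors and cosine similarity as a common small-dataset recipe; this is not the paper's large feature set, and all parameters are illustrative.

```python
# Toy identification pipeline: describe aligned face crops, match by cosine similarity.
import numpy as np
from skimage.feature import hog
from sklearn.metrics.pairwise import cosine_similarity

def describe(face_crop):
    # face_crop: aligned grayscale image as a 2-D array (all crops the same size)
    return hog(face_crop, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

def identify(probe_crop, gallery_crops, gallery_ids):
    gallery = np.stack([describe(c) for c in gallery_crops])
    sims = cosine_similarity(describe(probe_crop)[None, :], gallery)[0]
    return gallery_ids[int(np.argmax(sims))]
```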