Paper Group ANR 1727
Wasserstein-Fisher-Rao Document Distance. MANELA: A Multi-Agent Algorithm for Learning Network Embeddings. An Imitation Learning Approach to Unsupervised Parsing. Inpatient2Vec: Medical Representation Learning for Inpatients. Adaptive Matrix Completion for the Users and the Items in Tail. Style Transfer with Time Series: Generating Synthetic Financ …
Wasserstein-Fisher-Rao Document Distance
Title | Wasserstein-Fisher-Rao Document Distance |
Authors | Zihao Wang, Datong Zhou, Yong Zhang, Hao Wu, Chenglong Bao |
Abstract | As a fundamental problem of natural language processing, it is important to measure the distance between different documents. Among the existing methods, the Word Mover’s Distance (WMD) has shown remarkable success in document semantic matching for its clear physical insight as a parameter-free model. However, WMD is essentially based on the classical Wasserstein metric, so it often fails to robustly represent the semantic similarity between texts of different lengths. In this paper, we apply the newly developed Wasserstein-Fisher-Rao (WFR) metric from unbalanced optimal transport theory to measure the distance between different documents. The proposed WFR document distance retains the interpretability and simplicity of WMD. We demonstrate that the WFR document distance has significant advantages when comparing texts of different lengths. In addition, an accelerated Sinkhorn-based algorithm with a GPU implementation has been developed for the fast computation of WFR distances. The KNN classification results on eight datasets show a clear improvement over WMD. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2019-04-23 |
URL | https://arxiv.org/abs/1904.10294v2 |
https://arxiv.org/pdf/1904.10294v2.pdf | |
PWC | https://paperswithcode.com/paper/wasserstein-fisher-rao-document-distance |
Repo | |
Framework | |
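To illustrate why unbalanced optimal transport copes with documents of different lengths, here is a minimal sketch of a generic entropic Sinkhorn scaling iteration with KL-relaxed marginals. It is not the accelerated GPU algorithm from the paper, and it takes an arbitrary cost matrix rather than the WFR-specific cost. The function name `unbalanced_sinkhorn` and the parameters `eps` (entropic regularization) and `rho` (marginal relaxation strength) are assumptions made for this example.

```python
import math

def unbalanced_sinkhorn(a, b, cost, eps=0.1, rho=1.0, iters=200):
    """Generic entropic unbalanced OT: marginals are penalized by KL, not enforced."""
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    # Exponent < 1 softens the marginal constraints; rho -> infinity recovers balanced Sinkhorn.
    power = rho / (rho + eps)
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        for i in range(n):
            Kv = sum(K[i][j] * v[j] for j in range(m))
            u[i] = (a[i] / Kv) ** power
        for j in range(m):
            Ktu = sum(K[i][j] * u[i] for i in range(n))
            v[j] = (b[j] / Ktu) ** power
    # Transport plan: diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Two "documents" with different total mass (word counts not normalized).
plan = unbalanced_sinkhorn([1.0, 1.0], [1.0, 1.0, 1.0],
                           [[0.0, 1.0, 2.0], [1.0, 0.0, 2.0]])
```

With a small `rho` the plan is free to create or destroy mass instead of forcing every word in the longer document to be matched, which is the behaviour the abstract contrasts with the classical Wasserstein setting.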
MANELA: A Multi-Agent Algorithm for Learning Network Embeddings
Title | MANELA: A Multi-Agent Algorithm for Learning Network Embeddings |
Authors | Han Zhang, Hong Xu |
Abstract | Machine learning has a long history of being applied to networks on multifarious tasks and has played an essential role in data mining. However, the discrete and sparse nature of networks often makes it difficult to apply machine learning directly to them. To circumvent this difficulty, one major school of thought approaches networks with machine learning via network embeddings. On the one hand, network embeddings have achieved huge success on aggregated network data in recent years. On the other hand, learning network embeddings on distributively stored networks remains understudied: to the best of our knowledge, all existing algorithms for learning network embeddings have hitherto been exclusively centralized and thus cannot be applied to these networks. To accommodate distributively stored networks, in this paper, we propose a multi-agent model. Under this model, we develop the multi-agent network embedding learning algorithm (MANELA) for learning network embeddings. We demonstrate MANELA’s advantages over existing centralized network embedding learning algorithms both theoretically and experimentally. Finally, we further our understanding of MANELA via visualization and exploration of its relationship to DeepWalk. |
Tasks | Network Embedding |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00303v1 |
https://arxiv.org/pdf/1912.00303v1.pdf | |
PWC | https://paperswithcode.com/paper/manela-a-multi-agent-algorithm-for-learning |
Repo | |
Framework | |
An Imitation Learning Approach to Unsupervised Parsing
Title | An Imitation Learning Approach to Unsupervised Parsing |
Authors | Bowen Li, Lili Mou, Frank Keller |
Abstract | Recently, there has been an increasing interest in unsupervised parsers that optimize semantically oriented objectives, typically using reinforcement learning. Unfortunately, the learned trees often do not match actual syntax trees well. Shen et al. (2018) propose a structured attention mechanism for language modeling (PRPN), which induces better syntactic structures but relies on ad hoc heuristics. Also, their model lacks interpretability as it is not grounded in parsing actions. In our work, we propose an imitation learning approach to unsupervised parsing, where we transfer the syntactic knowledge induced by the PRPN to a Tree-LSTM model with discrete parsing actions. Its policy is then refined by Gumbel-Softmax training towards a semantically oriented objective. We evaluate our approach on the All Natural Language Inference dataset and show that it achieves a new state of the art in terms of parsing $F$-score, outperforming our base models, including the PRPN. |
Tasks | Imitation Learning, Language Modelling, Natural Language Inference |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02276v1 |
https://arxiv.org/pdf/1906.02276v1.pdf | |
PWC | https://paperswithcode.com/paper/an-imitation-learning-approach-to |
Repo | |
Framework | |
Inpatient2Vec: Medical Representation Learning for Inpatients
Title | Inpatient2Vec: Medical Representation Learning for Inpatients |
Authors | Ying Wang, Xiao Xu, Tao Jin, Xiang Li, Guotong Xie, Jianmin Wang |
Abstract | Representation learning (RL) plays an important role in extracting proper representations from complex medical data for various analysis tasks, such as patient grouping, clinical endpoint prediction and medication recommendation. Medical data can be divided into two typical categories, outpatient and inpatient, that have different data characteristics. However, few existing RL methods are specially designed for inpatient data, which have strong temporal relations and consistent diagnoses. In addition, for unordered medical activity sets, existing medical RL methods utilize a simple pooling strategy, which results in indistinguishable contributions among the activities for learning. In this work, we propose Inpatient2Vec, a novel model for learning three kinds of representations for inpatients, including medical activity, hospital day and diagnosis. A multi-layer self-attention mechanism with two training tasks is designed to capture the inpatient data characteristics and process the unordered set. Using a real-world dataset, we demonstrate that the proposed approach outperforms the competitive baselines on semantic similarity measurement and clinical events prediction tasks. |
Tasks | Representation Learning, Semantic Similarity, Semantic Textual Similarity |
Published | 2019-04-18 |
URL | https://arxiv.org/abs/1904.08558v2 |
https://arxiv.org/pdf/1904.08558v2.pdf | |
PWC | https://paperswithcode.com/paper/inpatient2vec-medical-representation-learning |
Repo | |
Framework | |
Adaptive Matrix Completion for the Users and the Items in Tail
Title | Adaptive Matrix Completion for the Users and the Items in Tail |
Authors | Mohit Sharma, George Karypis |
Abstract | Recommender systems are widely used to recommend the most appealing items to users. These recommendations can be generated by applying collaborative filtering methods. The low-rank matrix completion method is the state-of-the-art collaborative filtering method. In this work, we show that the skewed distribution of ratings in the user-item rating matrix of real-world datasets affects the accuracy of matrix-completion-based approaches. Also, we show that the number of ratings that an item or a user has positively correlates with the ability of low-rank matrix-completion-based approaches to predict the ratings for the item or the user accurately. Furthermore, we use these insights to develop four matrix-completion-based approaches, i.e., Frequency Adaptive Rating Prediction (FARP), Truncated Matrix Factorization (TMF), Truncated Matrix Factorization with Dropout (TMF + Dropout) and Inverse Frequency Weighted Matrix Factorization (IFWMF), that outperform traditional matrix-completion-based approaches for the users and the items with few ratings in the user-item rating matrix. |
Tasks | Low-Rank Matrix Completion, Matrix Completion, Recommendation Systems |
Published | 2019-04-22 |
URL | https://arxiv.org/abs/1904.11800v2 |
https://arxiv.org/pdf/1904.11800v2.pdf | |
PWC | https://paperswithcode.com/paper/190411800 |
Repo | |
Framework | |
Style Transfer with Time Series: Generating Synthetic Financial Data
Title | Style Transfer with Time Series: Generating Synthetic Financial Data |
Authors | Brandon Da Silva, Sylvie Shang Shi |
Abstract | Training deep learning models that generalize well to live deployment is a challenging problem in the financial markets. The challenge arises because of high dimensionality, limited observations, changing data distributions, and a low signal-to-noise ratio. High dimensionality can be dealt with using robust feature selection or dimensionality reduction, but limited observations often result in a model that overfits due to the large parameter space of most deep neural networks. We propose a generative model for financial time series, which allows us to train deep learning models on millions of simulated paths. We show that our generative model is able to create realistic paths that embed the underlying structure of the markets in a way stochastic processes cannot. |
Tasks | Dimensionality Reduction, Feature Selection, Style Transfer, Time Series |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1906.03232v2 |
https://arxiv.org/pdf/1906.03232v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-improved-generalization-in-financial |
Repo | |
Framework | |
Out of distribution detection for intra-operative functional imaging
Title | Out of distribution detection for intra-operative functional imaging |
Authors | Tim J. Adler, Leonardo Ayala, Lynton Ardizzone, Hannes G. Kenngott, Anant Vemuri, Beat P. Müller-Stich, Carsten Rother, Ullrich Köthe, Lena Maier-Hein |
Abstract | Multispectral optical imaging is becoming a key tool in the operating room. Recent research has shown that machine learning algorithms can be used to convert pixel-wise reflectance measurements to tissue parameters, such as oxygenation. However, the accuracy of these algorithms can only be guaranteed if the spectra acquired during surgery match the ones seen during training. It is therefore of great interest to detect so-called out of distribution (OoD) spectra to prevent the algorithm from presenting spurious results. In this paper we present an information-theoretic approach to OoD detection based on the widely applicable information criterion (WAIC). Our work builds upon recent methodology related to invertible neural networks (INN). Specifically, we make use of an ensemble of INNs as we need their tractable Jacobians in order to compute the WAIC. Comprehensive experiments with in silico and in vivo multispectral imaging data indicate that our approach is well-suited for OoD detection. Our method could thus be an important step towards reliable functional imaging in the operating room. |
Tasks | Out-of-Distribution Detection |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.01877v1 |
https://arxiv.org/pdf/1911.01877v1.pdf | |
PWC | https://paperswithcode.com/paper/out-of-distribution-detection-for-intra |
Repo | |
Framework | |
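As a rough illustration of how an ensemble can yield a WAIC-style OoD score, the sketch below uses one commonly cited ensemble form: the mean of the per-model log-likelihoods minus their variance. Sign conventions differ between papers, and the abstract does not state which one is used, so treat the formula, the function name `waic_score`, and the example numbers as assumptions. The log-likelihoods would in practice come from the individual INNs.

```python
from statistics import fmean, pvariance

def waic_score(log_likelihoods):
    """Ensemble WAIC-style score for one sample: mean log-likelihood minus its variance.

    Under this (assumed) convention, lower scores suggest out-of-distribution inputs:
    either the models assign low likelihood, or they disagree strongly.
    """
    return fmean(log_likelihoods) - pvariance(log_likelihoods)

# Hypothetical per-model log-likelihoods for two spectra.
in_distribution = waic_score([-1.0, -1.1, -0.9])
suspicious = waic_score([-1.0, -6.0, -3.5])
```

A threshold on this score, chosen on held-out in-distribution data, would then decide which spectra to flag.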
Performance Analysis of Deep Learning Workloads on Leading-edge Systems
Title | Performance Analysis of Deep Learning Workloads on Leading-edge Systems |
Authors | Yihui Ren, Shinjae Yoo, Adolfy Hoisie |
Abstract | This work examines the performance of leading-edge systems designed for machine learning computing, including the NVIDIA DGX-2, Amazon Web Services (AWS) P3, IBM Power System Accelerated Compute Server AC922, and a consumer-grade Exxact TensorEX TS4 GPU server. Representative deep learning workloads from the fields of computer vision and natural language processing are the focus of the analysis. Performance analysis is performed along a number of important dimensions. The performance of the communication interconnects and of large, high-throughput deep learning models is considered. Different potential use models for the systems, as standalone and in the cloud, are also examined. The effect of various optimizations of the deep learning models and system configurations is included in the analysis. |
Tasks | |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08764v2 |
https://arxiv.org/pdf/1905.08764v2.pdf | |
PWC | https://paperswithcode.com/paper/performance-analysis-of-deep-learning |
Repo | |
Framework | |
Components of Machine Learning: Binding Bits and FLOPS
Title | Components of Machine Learning: Binding Bits and FLOPS |
Authors | Alexander Jung |
Abstract | Many machine learning problems and methods are combinations of three components: data, hypothesis space and loss function. Different machine learning methods are obtained as combinations of different choices for the representation of data, hypothesis space and loss function. After reviewing the mathematical structure of these three components, we discuss intrinsic trade-offs between statistical and computational properties of machine learning methods. |
Tasks | |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.12387v2 |
https://arxiv.org/pdf/1910.12387v2.pdf | |
PWC | https://paperswithcode.com/paper/components-of-machine-learning-binding-bits |
Repo | |
Framework | |
Federated Deep Reinforcement Learning
Title | Federated Deep Reinforcement Learning |
Authors | Hankz Hankui Zhuo, Wenfeng Feng, Yufeng Lin, Qian Xu, Qiang Yang |
Abstract | In deep reinforcement learning, building high-quality policies is challenging when the feature space of states is small and the training data is limited. Despite the success of previous transfer learning approaches in deep reinforcement learning, directly transferring data or models from one agent to another is often not allowed due to the privacy of data and/or models in many privacy-aware applications. In this paper, we propose a novel deep reinforcement learning framework, namely Federated deep Reinforcement Learning (FedRL), to federatively build high-quality models for agents while taking their privacy into consideration. To protect the privacy of data and models, we exploit Gaussian differentials on the information shared with each other when updating their local models. In the experiment, we evaluate our FedRL framework in two diverse domains, Grid-world and Text2Action, by comparing it to various baselines. |
Tasks | Transfer Learning |
Published | 2019-01-24 |
URL | https://arxiv.org/abs/1901.08277v3 |
https://arxiv.org/pdf/1901.08277v3.pdf | |
PWC | https://paperswithcode.com/paper/federated-reinforcement-learning |
Repo | |
Framework | |
The Practical Challenges of Active Learning: Lessons Learned from Live Experimentation
Title | The Practical Challenges of Active Learning: Lessons Learned from Live Experimentation |
Authors | Jean-François Kagy, Tolga Kayadelen, Ji Ma, Afshin Rostamizadeh, Jana Strnadova |
Abstract | We tested in a live setting the use of active learning for selecting text sentences for human annotations used in training a Thai segmentation machine learning model. In our study, two concurrent annotated samples were constructed, one through random sampling of sentences from a text corpus, and the other through model-based scoring and ranking of sentences from the same corpus. In the course of the experiment, we observed the effect of significant changes to the learning environment which are likely to occur in real-world learning tasks. We describe how our active learning strategy interacted with these events and discuss other practical challenges encountered in using active learning in the live setting. |
Tasks | Active Learning |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1907.00038v1 |
https://arxiv.org/pdf/1907.00038v1.pdf | |
PWC | https://paperswithcode.com/paper/the-practical-challenges-of-active-learning |
Repo | |
Framework | |
Private Hypothesis Selection
Title | Private Hypothesis Selection |
Authors | Mark Bun, Gautam Kamath, Thomas Steinke, Zhiwei Steven Wu |
Abstract | We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution $P$ and a set of $m$ probability distributions $\mathcal{H}$, the goal is to output, in a $\varepsilon$-differentially private manner, a distribution from $\mathcal{H}$ whose total variation distance to $P$ is comparable to that of the best such distribution (which we denote by $\alpha$). The sample complexity of our basic algorithm is $O\left(\frac{\log m}{\alpha^2} + \frac{\log m}{\alpha \varepsilon}\right)$, representing a minimal cost for privacy when compared to the non-private algorithm. We also can handle infinite hypothesis classes $\mathcal{H}$ by relaxing to $(\varepsilon,\delta)$-differential privacy. We apply our hypothesis selection algorithm to give learning algorithms for a number of natural distribution classes, including Gaussians, product distributions, sums of independent random variables, piecewise polynomials, and mixture classes. Our hypothesis selection procedure allows us to generically convert a cover for a class to a learning algorithm, complementing known learning lower bounds which are in terms of the size of the packing number of the class. As the covering and packing numbers are often closely related, for constant $\alpha$, our algorithms achieve the optimal sample complexity for many classes of interest. Finally, we describe an application to private distribution-free PAC learning. |
Tasks | |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13229v3 |
https://arxiv.org/pdf/1905.13229v3.pdf | |
PWC | https://paperswithcode.com/paper/private-hypothesis-selection |
Repo | |
Framework | |
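To make the sample complexity bound from the abstract concrete, the sketch below evaluates its two terms separately, ignoring the constants hidden by the $O(\cdot)$ notation. The function name `sample_complexity_terms` and the example values are assumptions chosen for illustration only.

```python
import math

def sample_complexity_terms(m, alpha, eps):
    """Return the two terms of O(log m / alpha^2 + log m / (alpha * eps)), constants dropped."""
    non_private = math.log(m) / alpha ** 2   # cost of hypothesis selection without privacy
    privacy = math.log(m) / (alpha * eps)    # additional cost attributable to eps-DP
    return non_private, privacy

non_private, privacy = sample_complexity_terms(m=1000, alpha=0.1, eps=1.0)
ratio = privacy / non_private  # equals alpha / eps
```

The ratio of the privacy term to the non-private term is $\alpha / \varepsilon$, so whenever $\varepsilon \ge \alpha$ the privacy term is no larger than the non-private one, which is one way to read the abstract's "minimal cost for privacy".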
Unsupervised 3D Pose Estimation with Geometric Self-Supervision
Title | Unsupervised 3D Pose Estimation with Geometric Self-Supervision |
Authors | Ching-Hang Chen, Ambrish Tyagi, Amit Agrawal, Dylan Drover, Rohith MV, Stefan Stojanov, James M. Rehg |
Abstract | We present an unsupervised learning approach to recover 3D human pose from 2D skeletal joints extracted from a single image. Our method does not require any multi-view image data, 3D skeletons, correspondences between 2D-3D points, or use previously learned 3D priors during training. A lifting network accepts 2D landmarks as inputs and generates a corresponding 3D skeleton estimate. During training, the recovered 3D skeleton is reprojected on random camera viewpoints to generate new “synthetic” 2D poses. By lifting the synthetic 2D poses back to 3D and re-projecting them in the original camera view, we can define a self-consistency loss both in 3D and in 2D. The training can thus be self-supervised by exploiting the geometric self-consistency of the lift-reproject-lift process. We show that self-consistency alone is not sufficient to generate realistic skeletons; however, adding a 2D pose discriminator enables the lifter to output valid 3D poses. Additionally, to learn from 2D poses “in the wild”, we train an unsupervised 2D domain adapter network to allow for an expansion of 2D data. This improves results and demonstrates the usefulness of 2D pose data for unsupervised 3D lifting. Results on the Human3.6M dataset for 3D human pose estimation demonstrate that our approach improves upon the previous unsupervised methods by 30% and outperforms many weakly supervised approaches that explicitly use 3D data. |
Tasks | 3D Human Pose Estimation, 3D Pose Estimation, Pose Estimation |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.04812v1 |
http://arxiv.org/pdf/1904.04812v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-3d-pose-estimation-with |
Repo | |
Framework | |
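The lift-reproject-lift cycle described in the abstract can be sketched geometrically as follows. This is a toy illustration under several assumptions: the camera is orthographic, the new viewpoint is a rotation about the vertical axis, only the 2D consistency term is computed, and `flat_lifter` is a hypothetical stand-in for the lifting network that predicts zero depth for every joint.

```python
import math

def rotate_y(points3d, angle):
    """Rotate 3D points about the vertical (y) axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c + z * s, y, -x * s + z * c) for x, y, z in points3d]

def project(points3d):
    """Orthographic projection: drop the depth coordinate."""
    return [(x, y) for x, y, _ in points3d]

def flat_lifter(points2d):
    """Hypothetical lifter that always predicts depth 0 (deliberately poor)."""
    return [(x, y, 0.0) for x, y in points2d]

def cycle_loss_2d(lifter, pose2d, angle):
    """Lift, view from a new angle, lift again, rotate back, compare in 2D."""
    skeleton = lifter(pose2d)                          # first lift
    synthetic2d = project(rotate_y(skeleton, angle))   # synthetic 2D pose
    relifted = lifter(synthetic2d)                     # second lift
    back2d = project(rotate_y(relifted, -angle))       # back in the original view
    return sum((x - bx) ** 2 + (y - by) ** 2
               for (x, y), (bx, by) in zip(pose2d, back2d))

loss = cycle_loss_2d(flat_lifter, [(1.0, 0.0), (0.0, 1.0)], math.pi / 2)
```

The flat lifter incurs a non-zero loss for any non-trivial rotation, which shows how the cycle penalizes depth estimates that are inconsistent across viewpoints without ever using 3D ground truth.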
State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions
Title | State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions |
Authors | Kyu J. Han, Ramon Prieto, Kaixing Wu, Tao Ma |
Abstract | Self-attention has been a huge success for many downstream tasks in NLP, which has led to exploration of applying self-attention to speech problems as well. The efficacy of self-attention in speech applications, however, does not yet seem fully realized, since it is challenging to handle highly correlated speech frames in the context of self-attention. In this paper we propose a new neural network model architecture, namely multi-stream self-attention, to address this issue and thus make the self-attention mechanism more effective for speech recognition. The proposed model architecture consists of parallel streams of self-attention encoders, and each stream has layers of 1D convolutions with dilated kernels whose dilation rates are unique to a given stream, followed by a self-attention layer. The self-attention mechanism in each stream pays attention to only one resolution of input speech frames and the attentive computation can be more efficient. In a later stage, outputs from all the streams are concatenated and then linearly projected to the final embedding. By stacking the proposed multi-stream self-attention encoder blocks and rescoring the resultant lattices with neural network language models, we achieve a word error rate of 2.2% on the test-clean dataset of the LibriSpeech corpus, the best number reported thus far on the dataset. |
Tasks | Speech Recognition |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00716v1 |
https://arxiv.org/pdf/1910.00716v1.pdf | |
PWC | https://paperswithcode.com/paper/state-of-the-art-speech-recognition-using |
Repo | |
Framework | |
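The idea of giving each stream its own dilation rate can be illustrated with a plain 1D dilated convolution. The sketch below covers only the convolution and the per-frame concatenation of stream outputs. The self-attention layers and the final linear projection are omitted, and the zero padding, kernel values, and dilation rates are assumptions made for the example.

```python
def dilated_conv1d(x, kernel, dilation):
    """1D convolution with zero padding so the output has the same length as x."""
    center = len(kernel) // 2
    out = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - center) * dilation
            if 0 <= idx < len(x):  # positions outside the signal count as zero
                total += w * x[idx]
        out.append(total)
    return out

def multi_stream(x, kernel, dilations):
    """Run one dilated convolution per stream and concatenate the outputs per frame."""
    streams = [dilated_conv1d(x, kernel, d) for d in dilations]
    return [[s[t] for s in streams] for t in range(len(x))]

frames = [1.0, 2.0, 3.0, 4.0, 5.0]
features = multi_stream(frames, [1.0, 1.0, 1.0], dilations=[1, 2, 3])
```

Each stream sees the same frames at a different temporal resolution: with dilation 1 the kernel covers adjacent frames, while larger dilations skip over neighbouring, highly correlated frames.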
BOSH: An Efficient Meta Algorithm for Decision-based Attacks
Title | BOSH: An Efficient Meta Algorithm for Decision-based Attacks |
Authors | Zhenxin Xiao, Puyudi Yang, Yuchen Jiang, Kai-Wei Chang, Cho-Jui Hsieh |
Abstract | Adversarial example generation has become a viable method for evaluating the robustness of a machine learning model. In this paper, we consider hard-label black-box attacks (a.k.a. decision-based attacks), a challenging setting in which adversarial examples are generated based only on a series of black-box hard-label queries. This type of attack can be used to attack discrete and complex models, such as Gradient Boosting Decision Tree (GBDT) and detection-based defense models. Existing decision-based attacks based on iterative local updates often get stuck in a local minimum and fail to generate the optimal adversarial example with the smallest distortion. To remedy this issue, we propose an efficient meta algorithm called BOSH-attack, which tremendously improves existing algorithms through Bayesian Optimization (BO) and Successive Halving (SH). In particular, instead of traversing a single solution path when searching for an adversarial example, we maintain a pool of solution paths to explore important regions. We show empirically that the proposed algorithm converges to a better solution than existing approaches, while the query count is smaller than applying multiple random initializations by a factor of 10. |
Tasks | Adversarial Attack |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04288v3 |
https://arxiv.org/pdf/1909.04288v3.pdf | |
PWC | https://paperswithcode.com/paper/toward-finding-the-global-optimal-of |
Repo | |
Framework | |
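The Successive Halving component mentioned in the abstract can be sketched generically: evaluate a pool of candidates, keep the better half, and give the survivors a larger budget. This is the textbook scheme, not the BOSH-attack procedure itself, which also uses Bayesian Optimization to propose new solution paths. The `evaluate` callback and the budget doubling are assumptions made for the example.

```python
def successive_halving(candidates, evaluate, initial_budget=1):
    """Repeatedly keep the better half of the pool (lower score is better)."""
    pool = list(candidates)
    budget = initial_budget
    while len(pool) > 1:
        # Score every surviving candidate with the current budget.
        ranked = sorted(pool, key=lambda c: evaluate(c, budget))
        pool = ranked[: max(1, len(pool) // 2)]
        budget *= 2  # survivors get more resources in the next round
    return pool[0]

# Hypothetical example: candidates are starting points, the score is a distortion value.
best = successive_halving([5.0, 3.0, 8.0, 1.5],
                          evaluate=lambda c, budget: abs(c - 1.0))
```

In an attack setting, `evaluate` would spend `budget` queries refining a solution path and return the resulting distortion, so that queries concentrate on the most promising paths.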