Paper Group ANR 1302
Quantum Unsupervised and Supervised Learning on Superconducting Processors
Title | Quantum Unsupervised and Supervised Learning on Superconducting Processors |
Authors | Abhijat Sarma, Rupak Chatterjee, Kaitlin Gili, Ting Yu |
Abstract | Machine learning algorithms perform well on identifying patterns in many datasets due to their versatility. However, as one increases the size of the data, the time for training and using these statistical models grows quickly. Here, we propose and implement on the IBMQ a quantum analogue to K-means clustering, and compare it to a previously developed quantum support vector machine. We find the algorithm’s accuracy comparable to classical K-means for clustering and classification problems, and find that it becomes less computationally expensive to implement for large datasets. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04226v1 |
https://arxiv.org/pdf/1909.04226v1.pdf | |
PWC | https://paperswithcode.com/paper/quantum-unsupervised-and-supervised-learning |
Repo | |
Framework | |
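The quantum K-means variant described above estimates distances between states via overlap measurements. As a hedged illustration only (not the paper's actual IBMQ circuit), the sketch below classically simulates the standard swap test, whose ancilla statistics encode the squared overlap |⟨a|b⟩|², and derives a distance proxy from it; the helper names are our own.

```python
import numpy as np

def swap_test_overlap(a, b):
    """Classically simulate the swap test: for normalized states |a>, |b>,
    P(ancilla = 0) = 1/2 + |<a|b>|^2 / 2, so the overlap is recoverable."""
    a = np.asarray(a, dtype=float); a = a / np.linalg.norm(a)
    b = np.asarray(b, dtype=float); b = b / np.linalg.norm(b)
    p0 = 0.5 + 0.5 * np.dot(a, b) ** 2   # probability of measuring 0
    return 2.0 * p0 - 1.0                 # invert to get |<a|b>|^2

def quantum_style_distance(a, b):
    # a common proxy: D = sqrt(2 - 2|<a|b>|) for unit vectors
    return np.sqrt(max(0.0, 2.0 - 2.0 * np.sqrt(swap_test_overlap(a, b))))
```

Identical states give overlap 1 and distance 0; orthogonal states give overlap 0 and distance sqrt(2).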
Offensive Language Analysis using Deep Learning Architecture
Title | Offensive Language Analysis using Deep Learning Architecture |
Authors | Ryan Ong |
Abstract | SemEval-2019 Task 6 (Zampieri et al., 2019b) requires us to identify and categorise offensive language in social media. In this paper we describe the process we took to tackle this challenge. Our approach is heavily inspired by Sosa (2017), who proposed CNN-LSTM and LSTM-CNN models for Twitter sentiment analysis. We follow his approach and extend his work by testing different combinations of RNN models with CNNs. Specifically, we divide the challenge into two parts: (1) data processing and sampling, and (2) choosing the optimal deep learning architecture. In preprocessing, we experimented with two techniques, SMOTE and class weights, to counter the imbalance between classes. Once satisfied with the quality of our input data, we proceeded to choose the deep learning architecture best suited to this task. Given the quality and quantity of the data available, we found that adding a CNN layer provided little to no improvement in our model's performance and sometimes even led to a decrease in our F1-score. In the end, the deep learning architecture that gave us the highest macro F1-score was a simple BiLSTM-CNN. |
Tasks | Sentiment Analysis, Twitter Sentiment Analysis |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.05280v3 |
http://arxiv.org/pdf/1903.05280v3.pdf | |
PWC | https://paperswithcode.com/paper/transforma-at-semeval-2019-task-6-offensive |
Repo | |
Framework | |
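The abstract mentions class weights as one countermeasure for class imbalance. A minimal sketch of the standard inverse-frequency ("balanced") weighting heuristic follows; the function name is ours, and the paper's exact weighting scheme may differ.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: weight_c = n_samples / (n_classes * count_c),
    the common 'balanced' heuristic so rare classes get larger loss weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```

For a 20/80 split of offensive vs. non-offensive tweets, the minority class gets weight 2.5 and the majority 0.625.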
Conversational Contextual Bandit: Algorithm and Application
Title | Conversational Contextual Bandit: Algorithm and Application |
Authors | Xiaoying Zhang, Hong Xie, Hang Li, John C. S. Lui |
Abstract | Contextual bandit algorithms provide principled online learning solutions to balance the exploitation-exploration trade-off in various applications such as recommender systems. However, the learning speed of the traditional contextual bandit algorithms is often slow due to the need for extensive exploration. This poses a critical issue in applications like recommender systems, since users may need to provide feedback on many items they are not interested in. To accelerate the learning speed, we generalize contextual bandit to conversational contextual bandit. Conversational contextual bandit leverages not only behavioral feedback on arms (e.g., articles in news recommendation), but also occasional conversational feedback on key-terms from the user. Here, a key-term can relate to a subset of arms, for example, a category of articles in news recommendation. We then design the Conversational UCB algorithm (ConUCB) to address two challenges in conversational contextual bandit: (1) which key-terms to select to conduct conversation, (2) how to leverage conversational feedback to accelerate the speed of bandit learning. We theoretically prove that ConUCB can achieve a smaller regret upper bound than the traditional contextual bandit algorithm LinUCB, which implies a faster learning speed. Experiments on synthetic data, as well as real datasets from Yelp and Toutiao, demonstrate the efficacy of the ConUCB algorithm. |
Tasks | Recommendation Systems |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01219v2 |
https://arxiv.org/pdf/1906.01219v2.pdf | |
PWC | https://paperswithcode.com/paper/toward-building-conversational-recommender |
Repo | |
Framework | |
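ConUCB builds on LinUCB, the baseline the abstract compares against. The sketch below shows a single LinUCB step (ridge estimate plus confidence bonus); ConUCB's key-term conversation mechanism is omitted, and the helper names are assumptions of ours.

```python
import numpy as np

def linucb_select(arms, A, b, alpha=1.0):
    """One LinUCB step: pick the arm whose feature vector x maximizes
    x^T theta + alpha * sqrt(x^T A^{-1} x), where A is the ridge Gram
    matrix and theta = A^{-1} b the current reward estimate."""
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b
    scores = [x @ theta + alpha * np.sqrt(x @ A_inv @ x) for x in arms]
    return int(np.argmax(scores))

def linucb_update(A, b, x, reward):
    # rank-one update with the chosen arm's features and observed reward
    return A + np.outer(x, x), b + reward * x
```

With no observations yet (A = I, b = 0), the score reduces to the exploration bonus, so the arm with the largest feature norm is tried first.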
Detecting Patterns of Physiological Response to Hemodynamic Stress via Unsupervised Deep Learning
Title | Detecting Patterns of Physiological Response to Hemodynamic Stress via Unsupervised Deep Learning |
Authors | Chufan Gao, Fabian Falck, Mononito Goswami, Anthony Wertz, Michael R. Pinsky, Artur Dubrawski |
Abstract | Monitoring physiological responses to hemodynamic stress can help in determining appropriate treatment and ensuring good patient outcomes. Physicians' intuition suggests that the human body has a number of physiological response patterns to hemorrhage which escalate as blood loss continues; however, the exact etiology and phenotypes of such responses are not well known, or are understood only at a coarse level. Although previous research has shown that machine learning models can perform well in hemorrhage detection and survival prediction, it is unclear whether machine learning could help to identify and characterize the underlying physiological responses in raw vital sign data. We approach this problem by first transforming the high-dimensional vital sign time series into a tractable, lower-dimensional latent space using a dilated, causal convolutional encoder model trained purely unsupervised. Second, we identify informative clusters in the embeddings. By analyzing the clusters of latent embeddings and visualizing them over time, we hypothesize that the clusters correspond to the physiological response patterns that match physicians' intuition. Furthermore, we attempt to evaluate the latent embeddings using a variety of methods, such as predicting the cluster labels using explainable features. |
Tasks | Time Series |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.05121v1 |
https://arxiv.org/pdf/1911.05121v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-patterns-of-physiological-response |
Repo | |
Framework | |
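The encoder above is built from dilated, causal convolutions. As a hedged sketch (not the paper's network), the building block can be written in plain NumPy: each output depends only on the current and past samples, spaced by the dilation factor.

```python
import numpy as np

def dilated_causal_conv1d(x, kernel, dilation=1):
    """Causal 1D convolution: output[t] depends only on x[t], x[t-d],
    x[t-2d], ... (left-padded with zeros), the building block of a
    dilated causal encoder such as the one described above."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(kernel[j] * xp[pad + t - j * dilation] for j in range(k))
                     for t in range(len(x))])
```

With kernel [1, 1] the output at time t is x[t] + x[t - d]; increasing d widens the receptive field without adding parameters.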
REMAP: Multi-layer entropy-guided pooling of dense CNN features for image retrieval
Title | REMAP: Multi-layer entropy-guided pooling of dense CNN features for image retrieval |
Authors | Syed Sameed Husain, Miroslaw Bober |
Abstract | This paper addresses the problem of very large-scale image retrieval, focusing on improving its accuracy and robustness. We target enhanced robustness of search to factors such as variations in illumination, object appearance and scale, partial occlusions, and cluttered backgrounds - particularly important when search is performed across very large datasets with significant variability. We propose a novel CNN-based global descriptor, called REMAP, which learns and aggregates a hierarchy of deep features from multiple CNN layers, and is trained end-to-end with a triplet loss. REMAP explicitly learns discriminative features which are mutually-supportive and complementary at various semantic levels of visual abstraction. These dense local features are max-pooled spatially at each layer, within multi-scale overlapping regions, before aggregation into a single image-level descriptor. To identify the semantically useful regions and layers for retrieval, we propose to measure the information gain of each region and layer using KL-divergence. Our system effectively learns during training how useful various regions and layers are and weights them accordingly. We show that such relative entropy-guided aggregation outperforms classical CNN-based aggregation controlled by SGD. The entire framework is trained in an end-to-end fashion, outperforming the latest state-of-the-art results. On image retrieval datasets Holidays, Oxford and MPEG, the REMAP descriptor achieves mAP of 95.5%, 91.5%, and 80.1% respectively, outperforming any results published to date. REMAP also formed the core of the winning submission to the Google Landmark Retrieval Challenge on Kaggle. |
Tasks | Image Retrieval |
Published | 2019-06-15 |
URL | https://arxiv.org/abs/1906.06626v1 |
https://arxiv.org/pdf/1906.06626v1.pdf | |
PWC | https://paperswithcode.com/paper/remap-multi-layer-entropy-guided-pooling-of |
Repo | |
Framework | |
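REMAP weights regions and layers by their KL-divergence-based information gain. The toy sketch below, under our own simplifying assumptions (per-region score distributions compared against a reference, weights normalized to sum to one), illustrates the idea without reproducing the paper's training procedure.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions, with smoothing for zeros."""
    p = np.asarray(p, dtype=float) + eps; p /= p.sum()
    q = np.asarray(q, dtype=float) + eps; q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def region_weights(region_dists, reference):
    """Weight each region by its KL divergence from a reference
    distribution, normalized so weights sum to 1: regions whose
    responses deviate more from the reference count as more informative."""
    gains = np.array([kl_divergence(p, reference) for p in region_dists])
    return gains / gains.sum()
```

A region whose response distribution matches the reference gets weight near zero; a sharply skewed region dominates.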
New Potential-Based Bounds for Prediction with Expert Advice
Title | New Potential-Based Bounds for Prediction with Expert Advice |
Authors | Vladimir A. Kobzar, Robert V. Kohn, Zhilei Wang |
Abstract | This work addresses the classic machine learning problem of online prediction with expert advice. We consider the finite-horizon version of this zero-sum, two-person game. Using verification arguments from optimal control theory, we view the task of finding better lower and upper bounds on the value of the game (regret) as the problem of finding better sub- and supersolutions of certain partial differential equations (PDEs). These sub- and supersolutions serve as the potentials for player and adversary strategies, which lead to the corresponding bounds. Our techniques extend in a nonasymptotic setting the recent work of Drenska and Kohn (J. Nonlinear Sci. 2020), which showed that the asymptotically optimal value function is the unique solution of an associated nonlinear PDE. To get explicit bounds, we use closed-form solutions of specific PDEs. Our bounds hold for any fixed number of experts and any time-horizon; in certain regimes (which we identify) they improve upon the previous state-of-the-art. For up to three experts, our bounds provide the asymptotically optimal leading order term. Therefore, in this setting, we provide a continuum perspective on recent work on optimal strategies. |
Tasks | |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.01641v2 |
https://arxiv.org/pdf/1911.01641v2.pdf | |
PWC | https://paperswithcode.com/paper/new-potential-based-bounds-for-prediction |
Repo | |
Framework | |
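For context on the prediction-with-expert-advice setting, the classic potential-based strategy is the exponential-weights forecaster; the sketch below implements that standard baseline, not the paper's PDE-derived potentials.

```python
import numpy as np

def exp_weights_predict(losses, eta=0.5):
    """Exponential-weights forecaster: given a (T x N) matrix of per-round
    expert losses in [0, 1], return the final weight vector and the
    forecaster's cumulative expected loss under those weights."""
    T, N = losses.shape
    w = np.ones(N) / N
    total = 0.0
    for t in range(T):
        total += float(w @ losses[t])       # expected loss this round
        w = w * np.exp(-eta * losses[t])    # down-weight poor experts
        w /= w.sum()
    return w, total
```

Against an expert that is always right and one that is always wrong, the weights quickly concentrate on the good expert, and the cumulative loss stays within the usual (log N)/eta + eta*T/8 regret bound of the best expert.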
The Online Resources Shared on Twitter About the #MeToo Movement: The Pareto Principle
Title | The Online Resources Shared on Twitter About the #MeToo Movement: The Pareto Principle |
Authors | Iman Tahamtan, Javad Seif |
Abstract | In this paper we examine the most influential resources shared on Twitter about the #MeToo movement. We also examine whether a small proportion of domain names and URLs (e.g. 20%) appear in a large number of tweets (e.g. 80%) that contain #MeToo (known as the 80/20 rule or Pareto principle). R and Python were used to analyze the data. Results demonstrated that the most frequently shared domains were twitter.com (47.20%), nytimes.com (4.42%) and youtube.com (3.69%). The most frequently shared content was a recent poll which indicated “men are afraid to mentor women after the #MeToo movement”. In accordance with the Pareto principle, 8% of domain names accounted for 80% of the shared content on Twitter that contained #MeToo. This study provides a base for researchers who are interested in understanding what online resources people rely on when sharing information about online social movements (e.g. #MeToo). |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.12321v2 |
https://arxiv.org/pdf/1906.12321v2.pdf | |
PWC | https://paperswithcode.com/paper/the-online-resources-shared-on-twitter-about |
Repo | |
Framework | |
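The 80/20 check in the abstract is a simple computation on domain-name counts. A minimal sketch (helper name ours): find the smallest fraction of distinct items whose counts cover a target share of all occurrences.

```python
from collections import Counter

def pareto_share(items, coverage=0.80):
    """Smallest fraction of distinct items whose counts cover `coverage`
    of all occurrences (e.g. what share of domains produce 80% of the
    tweeted links)."""
    counts = sorted(Counter(items).values(), reverse=True)
    total, running = sum(counts), 0
    for i, c in enumerate(counts, start=1):
        running += c
        if running >= coverage * total:
            return i / len(counts)
    return 1.0
```

If one of five domains accounts for 80 of 100 shared links, a 0.20 fraction of domains already covers 80% of the volume.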
Estimating brain age based on a healthy population with deep learning and structural MRI
Title | Estimating brain age based on a healthy population with deep learning and structural MRI |
Authors | Xinyang Feng, Zachary C. Lipton, Jie Yang, Scott A. Small, Frank A. Provenzano |
Abstract | Numerous studies have established that estimated brain age, as derived from statistical models trained on healthy populations, constitutes a valuable biomarker that is predictive of cognitive decline and various neurological diseases. In this work, we curate a large-scale heterogeneous dataset (N = 10,158, age range 18 - 97) of structural brain MRIs in a healthy population from multiple publicly-available sources, upon which we train a deep learning model for brain age estimation. The availability of the large-scale dataset enables a more uniform age distribution across adult life-span for effective age estimation with no bias toward certain age groups. We demonstrate that the age estimation accuracy, evaluated with mean absolute error (MAE) and correlation coefficient (r), outperforms previously reported methods in both a hold-out test set reflective of the custom population (MAE = 4.06 years, r = 0.970) and an independent life-span evaluation dataset (MAE = 4.21 years, r = 0.960) on which a previous study was evaluated. We further demonstrate the utility of the estimated age in life-span aging analysis of cognitive functions. Furthermore, we conduct extensive ablation tests and employ feature-attribution techniques to analyze which regions contribute the most predictive value, demonstrating the prominence of the frontal lobe as well as pattern shift across life-span. In summary, we achieve superior age estimation performance confirming the efficacy of deep learning and the added utility of training with data both in larger number and more uniformly distributed than in previous studies. We demonstrate the regional contribution to our brain age predictions through multiple routes and confirm the association of divergence between estimated and chronological brain age with neuropsychological measures. |
Tasks | Age Estimation |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00943v1 |
https://arxiv.org/pdf/1907.00943v1.pdf | |
PWC | https://paperswithcode.com/paper/estimating-brain-age-based-on-a-healthy |
Repo | |
Framework | |
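The two evaluation metrics quoted above, MAE and Pearson's r, are straightforward to compute; a minimal sketch for checking predicted against chronological ages:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between chronological and predicted ages."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between true and predicted ages."""
    a = np.asarray(y_true, dtype=float); a = a - a.mean()
    b = np.asarray(y_pred, dtype=float); b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))
```

Predictions within a couple of years of truth across the age range yield a small MAE and r close to 1, as in the reported results.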
State Drug Policy Effectiveness: Comparative Policy Analysis of Drug Overdose Mortality
Title | State Drug Policy Effectiveness: Comparative Policy Analysis of Drug Overdose Mortality |
Authors | Jarrod Olson, Po-Hsu Allen Chen, Marissa White, Nicole Brennan, Ning Gong |
Abstract | Opioid overdose rates have reached an epidemic level and state-level policy innovations have followed suit in an effort to prevent overdose deaths. State-level drug law is a set of policies that may reinforce or undermine each other, and analysts have a limited set of tools for handling the policy collinearity using statistical methods. This paper uses a machine learning method called hierarchical clustering to empirically generate “policy bundles” by grouping states with similar sets of policies in force at a given time together for analysis in a 50-state, 10-year interrupted time series regression with drug overdose deaths as the dependent variable. Policy clusters were generated from 138 binomial variables observed by state and year from the Prescription Drug Abuse Policy System. Clustering reduced the policies to a set of 10 bundles. The approach allows for ranking of the relative effect of different bundles and is a tool to recommend those most likely to succeed. This study shows that a set of policies balancing Medication Assisted Treatment, Naloxone Access, Good Samaritan Laws, Prescription Drug Monitoring Programs and legalization of medical marijuana leads to a reduced number of overdose deaths, but not until its second year in force. |
Tasks | Time Series |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01936v2 |
https://arxiv.org/pdf/1909.01936v2.pdf | |
PWC | https://paperswithcode.com/paper/state-drug-policy-effectiveness-comparative |
Repo | |
Framework | |
Blended Convolution and Synthesis for Efficient Discrimination of 3D Shapes
Title | Blended Convolution and Synthesis for Efficient Discrimination of 3D Shapes |
Authors | Sameera Ramasinghe, Salman Khan, Nick Barnes, Stephen Gould |
Abstract | Existing networks directly learn feature representations on 3D point clouds for shape analysis. We argue that 3D point clouds are highly redundant and hold irregular (permutation-invariant) structure, which makes it difficult to achieve inter-class discrimination efficiently. In this paper, we propose a two-faceted solution to this problem that is seamlessly integrated in a single 'Blended Convolution and Synthesis' layer. This fully differentiable layer performs two critical tasks in succession. In the first step, it projects the input 3D point clouds into a latent 3D space to synthesize a highly compact and more inter-class discriminative point cloud representation. Since 3D point clouds do not follow a Euclidean topology, standard 2/3D Convolutional Neural Networks offer limited representation capability. Therefore, in the second step, it uses a novel 3D convolution operator functioning inside the unit ball ($\mathbb{B}^3$) to extract useful volumetric features. We extensively derive formulae to achieve both translation and rotation of our novel convolution kernels. Finally, using the proposed techniques we present an extremely light-weight, end-to-end architecture that achieves compelling results on 3D shape recognition and retrieval. |
Tasks | 3D Shape Recognition |
Published | 2019-08-24 |
URL | https://arxiv.org/abs/1908.10209v1 |
https://arxiv.org/pdf/1908.10209v1.pdf | |
PWC | https://paperswithcode.com/paper/blended-convolution-and-synthesis-for |
Repo | |
Framework | |
Minimum description length as an objective function for non-negative matrix factorization
Title | Minimum description length as an objective function for non-negative matrix factorization |
Authors | Steven Squires, Adam Prugel Bennett, Mahesan Niranjan |
Abstract | Non-negative matrix factorization (NMF) is a dimensionality reduction technique which tends to produce a sparse representation of data. Commonly, the error between the actual and recreated matrices is used as an objective function, but this method may not produce the type of representation we desire as it allows for the complexity of the model to grow, constrained only by the size of the subspace and the non-negativity requirement. If additional constraints, such as sparsity, are imposed the question of parameter selection becomes critical. Instead of adding sparsity constraints in an ad-hoc manner we propose a novel objective function created by using the principle of minimum description length (MDL). Our formulation, MDL-NMF, automatically trades off between the complexity and accuracy of the model using a principled approach with little parameter selection or the need for domain expertise. We demonstrate our model works effectively on three heterogeneous data-sets and on a range of semi-synthetic data showing the broad applicability of our method. |
Tasks | Dimensionality Reduction |
Published | 2019-02-05 |
URL | http://arxiv.org/abs/1902.01632v1 |
http://arxiv.org/pdf/1902.01632v1.pdf | |
PWC | https://paperswithcode.com/paper/minimum-description-length-as-an-objective |
Repo | |
Framework | |
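MDL-NMF replaces the plain reconstruction-error objective with a description-length one. As background, the sketch below implements only the standard Lee-Seung multiplicative updates that such an objective would modify; it is our baseline illustration, not the paper's MDL formulation.

```python
import numpy as np

def nmf_multiplicative(V, k, iters=200, seed=0):
    """Plain NMF via Lee-Seung multiplicative updates (Frobenius loss).
    MDL-NMF would swap this loss for a description-length objective that
    penalizes model complexity; here we only fit the reconstruction."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + 0.1   # non-negative init, bounded away from 0
    H = rng.random((k, n)) + 0.1
    eps = 1e-9                     # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# exactly rank-1 non-negative data is recovered almost perfectly
V = np.outer([1.0, 2.0, 3.0], [1.0, 0.5, 2.0])
W, H = nmf_multiplicative(V, k=1)
err = np.linalg.norm(V - W @ H)
```

The updates preserve non-negativity by construction, which is why the factors stay valid without explicit projection.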
Multi-View Time Series Classification via Global-Local Correlative Channel-Aware Fusion Mechanism
Title | Multi-View Time Series Classification via Global-Local Correlative Channel-Aware Fusion Mechanism |
Authors | Yue Bai, Lichen Wang, Zhiqiang Tao, Sheng Li, Yun Fu |
Abstract | Multi-view time series classification aims to fuse the distinctive temporal information from different views to further enhance the classification performance. Existing methods mainly focus on fusing multi-view features at an early stage (e.g., learning a common representation shared by multiple views). However, these early fusion methods may not fully exploit the view-specific distinctive patterns in high-dimension time series data. Moreover, the intra-view and inter-view label correlations, which are critical for multi-view classification, are usually ignored in previous works. In this paper, we propose a Global-Local Correlative Channel-Aware Fusion (GLCCF) model to address the aforementioned issues. Particularly, our model extracts global and local temporal patterns by a two-stream structure encoder, captures the intra-view and inter-view label correlations by constructing a graph based correlation matrix, and extracts the cross-view global patterns via a learnable channel-aware late fusion mechanism, which could be effectively implemented with a convolutional neural network. Extensive experiments on two real-world datasets demonstrate the superiority of our approach over the state-of-the-art methods. An ablation study is further provided to show the effectiveness of each model component. |
Tasks | Time Series, Time Series Classification |
Published | 2019-11-24 |
URL | https://arxiv.org/abs/1911.11561v1 |
https://arxiv.org/pdf/1911.11561v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-view-time-series-classification-via |
Repo | |
Framework | |
Time Series Classification: Lessons Learned in the (Literal) Field while Studying Chicken Behavior
Title | Time Series Classification: Lessons Learned in the (Literal) Field while Studying Chicken Behavior |
Authors | Alireza Abdoli, Amy C. Murillo, Alec C. Gerry, Eamonn J. Keogh |
Abstract | Poultry farms are a major contributor to the human food chain. However, around the world, there have been growing concerns about the quality of life for the livestock in poultry farms; and increasingly vocal demands for improved standards of animal welfare. Recent advances in sensing technologies and machine learning allow the possibility of monitoring birds, and employing the lessons learned to improve the welfare for all birds. This task superficially appears to be easy, yet, studying behavioral patterns involves collecting enormous amounts of data, justifying the term Big Data. Before the big data can be used for analytical purposes to tease out meaningful, well-conserved behavioral patterns, the collected data needs to be pre-processed. The pre-processing refers to processes for cleansing and preparing data so that it is in the format ready to be analyzed by downstream algorithms, such as classification and clustering algorithms. However, as we shall demonstrate, efficient pre-processing of chicken big data is both non-trivial and crucial towards success of further analytics. |
Tasks | Time Series, Time Series Classification |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1912.05913v2 |
https://arxiv.org/pdf/1912.05913v2.pdf | |
PWC | https://paperswithcode.com/paper/time-series-classification-lessons-learned-in |
Repo | |
Framework | |
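The abstract stresses that pre-processing sensor data is crucial before classification or clustering. One standard cleansing step in time series work is z-normalization; the sketch below is a generic illustration (not the authors' pipeline), with a guard for near-constant series that would otherwise amplify sensor noise.

```python
import numpy as np

def znormalize(ts, eps=1e-8):
    """Z-normalize a time series (zero mean, unit variance) - a standard
    preparation step before distance-based classification or clustering.
    Near-constant series are returned as zeros instead of amplified noise."""
    ts = np.asarray(ts, dtype=float)
    sd = ts.std()
    if sd < eps:
        return np.zeros_like(ts)
    return (ts - ts.mean()) / sd
```

Without the guard, a flat accelerometer trace divided by a tiny standard deviation would dominate any subsequent distance computation.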
The Geometry of Deep Networks: Power Diagram Subdivision
Title | The Geometry of Deep Networks: Power Diagram Subdivision |
Authors | Randall Balestriero, Romain Cosentino, Behnaam Aazhang, Richard Baraniuk |
Abstract | We study the geometry of deep (neural) networks (DNs) with piecewise affine and convex nonlinearities. The layers of such DNs have been shown to be max-affine spline operators (MASOs) that partition their input space and apply a region-dependent affine mapping to their input to produce their output. We demonstrate that each MASO layer’s input space partitioning corresponds to a power diagram (an extension of the classical Voronoi tiling) with a number of regions that grows exponentially with respect to the number of units (neurons). We further show that a composition of MASO layers (e.g., the entire DN) produces a progressively subdivided power diagram and provide its analytical form. The subdivision process constrains the affine maps on the (exponentially many) power diagram regions to greatly reduce their complexity. For classification problems, we obtain a formula for a MASO DN’s decision boundary in the input space plus a measure of its curvature that depends on the DN’s nonlinearities, weights, and architecture. Numerous numerical experiments support and extend our theoretical results. |
Tasks | |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08443v1 |
https://arxiv.org/pdf/1905.08443v1.pdf | |
PWC | https://paperswithcode.com/paper/the-geometry-of-deep-networks-power-diagram |
Repo | |
Framework | |
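The region growth described above can be observed empirically: each ReLU unit contributes a hyperplane, and the distinct activation patterns over the input correspond to the cells of the partition. The sketch below (our own illustration, not the paper's machinery) counts patterns hit by random samples for a single layer.

```python
import numpy as np

def count_activation_patterns(W, b, n_samples=20000, seed=0):
    """Empirically count distinct ReLU activation patterns (i.e. partition
    regions hit) of one layer x -> relu(W @ x + b) over random inputs."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-3, 3, size=(n_samples, W.shape[1]))
    patterns = (X @ W.T + b > 0)          # boolean on/off pattern per sample
    return len({tuple(row) for row in patterns})
```

Three hyperplanes through the origin in 2D carve exactly six sectors; adding more (offset) units strictly increases the region count, consistent with the growth the paper quantifies.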
Adaptive Transfer Learning of Multi-View Time Series Classification
Title | Adaptive Transfer Learning of Multi-View Time Series Classification |
Authors | Donglin Zhan, Shiyu Yi, Dongli Xu, Xiao Yu, Denglin Jiang, Siqi Yu, Haoting Zhang, Wenfang Shangguan, Weihua Zhang |
Abstract | Time Series Classification (TSC) has been an important and challenging task in data mining, especially on multivariate time series and multi-view time series data sets. Meanwhile, transfer learning has been widely applied in computer vision and natural language processing applications to improve deep neural network’s generalization capabilities. However, very few previous works applied transfer learning framework to time series mining problems. Particularly, the technique of measuring similarities between source domain and target domain based on dynamic representation such as density estimation with importance sampling has never been combined with transfer learning framework. In this paper, we first proposed a general adaptive transfer learning framework for multi-view time series data, which shows a strong ability to store inter-view importance values in the process of knowledge transfer. Next, we represented inter-view importance through time series similarity measurements and approximated the posterior distribution in latent space for the importance sampling via density estimation techniques. We then computed the matrix norm of the sampled importance values, which controls the degree of knowledge transfer in the pre-training process. We further evaluated our work, applied it to many other time series classification tasks, and observed that our architecture maintained desirable generalization ability. Finally, we concluded that our framework could be adapted with deep learning techniques to receive significant model performance improvements. |
Tasks | Density Estimation, Time Series, Time Series Classification, Transfer Learning |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.07632v1 |
https://arxiv.org/pdf/1910.07632v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-transfer-learning-of-multi-view-time |
Repo | |
Framework | |
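The importance-sampling ingredient above can be sketched minimally: weight each sample by the ratio of target to source density, then use a norm of the normalized weights as a transfer signal. This is our own simplified illustration under discrete-distribution assumptions; the paper works with density estimates in a latent space.

```python
import numpy as np

def importance_weights(p_target, q_source):
    """Importance-sampling weights w = p(x) / q(x), normalized to sum to 1,
    for samples drawn from the source distribution q."""
    w = np.asarray(p_target, dtype=float) / np.asarray(q_source, dtype=float)
    return w / w.sum()

def transfer_strength(w):
    """L2 norm of normalized weights: 1/sqrt(n) when domains match, rising
    toward 1 when a single sample dominates (poor source/target overlap)."""
    return float(np.linalg.norm(w))
```

A larger norm flags weaker overlap between domains, which a framework like the one above could use to throttle how much source knowledge is transferred.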