Paper Group ANR 30
Matrix Normal PCA for Interpretable Dimension Reduction and Graphical Noise Modeling. UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging. Event detection in Colombian security Twitter news using fine-grained latent topic analysis. Learning the dynamics of technical trading strategies. …
Matrix Normal PCA for Interpretable Dimension Reduction and Graphical Noise Modeling
Title | Matrix Normal PCA for Interpretable Dimension Reduction and Graphical Noise Modeling |
Authors | Chihao Zhang, Kuo Gai, Shihua Zhang |
Abstract | Principal component analysis (PCA) is one of the most widely used dimension reduction and multivariate statistical techniques. From a probabilistic perspective, PCA seeks a low-dimensional representation of data in the presence of independent, identically distributed Gaussian noise. Probabilistic PCA (PPCA) and its variants have been extensively studied for decades. Most of them assume the underlying noise follows an independent, identical distribution. However, noise in the real world is usually complicated and structured. To address this challenge, some non-linear variants of PPCA have been proposed, but those methods are generally difficult to interpret. To this end, we propose a powerful and intuitive PCA method (MN-PCA) that models the graphical noise with the matrix normal distribution, which enables us to explore the structure of noise in both the feature space and the sample space. MN-PCA obtains a low-rank representation of the data and the structure of the noise simultaneously, and it can be interpreted as approximating the data under a generalized Mahalanobis distance. We develop two algorithms to solve this model: one maximizes the regularized likelihood; the other exploits the Wasserstein distance, which is more robust. Extensive experiments on various data demonstrate their effectiveness. |
Tasks | Dimensionality Reduction |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.10796v1 |
PDF | https://arxiv.org/pdf/1911.10796v1.pdf |
PWC | https://paperswithcode.com/paper/matrix-normal-pca-for-interpretable-dimension |
Repo | |
Framework | |
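The generalized Mahalanobis reading of MN-PCA can be made concrete with a small sketch: under matrix normal noise with row covariance U and column covariance V, the fit of a low-rank reconstruction M is measured by tr(U⁻¹(X−M)V⁻¹(X−M)ᵀ). The function name below is ours, not from the paper.

```python
import numpy as np

def matrix_normal_mahalanobis(X, M, U, V):
    """Generalized Mahalanobis distance tr(U^-1 (X-M) V^-1 (X-M)^T) between
    data X and a low-rank reconstruction M, where U and V are the row and
    column covariances of the matrix normal noise model."""
    R = X - M
    return float(np.trace(np.linalg.solve(U, R) @ np.linalg.solve(V, R.T)))

# With U = I and V = I this reduces to the squared Frobenius norm,
# i.e. the i.i.d.-noise objective of ordinary (P)PCA.
```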
UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging
Title | UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging |
Authors | Milan Straka, Jana Straková, Jan Hajič |
Abstract | We present our contribution to the SIGMORPHON 2019 Shared Task: Crosslinguality and Context in Morphology, Task 2: contextual morphological analysis and lemmatization. We submitted a modification of UDPipe 2.0, one of the best-performing systems of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies and the overall winner of The 2018 Shared Task on Extrinsic Parser Evaluation. As our first improvement, we use pretrained contextualized embeddings (BERT) as additional inputs to the network; secondly, we use individual morphological features as regularization; and finally, we merge the selected corpora of the same language. In the lemmatization task, our system exceeds all the submitted systems by a wide margin, with a lemmatization accuracy of 95.78 (the second best was 95.00, the third 94.46). In morphological analysis, our system placed a close second: our accuracy was 93.19, the winning system's 93.23. |
Tasks | Lemmatization, Morphological Analysis |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06931v1 |
PDF | https://arxiv.org/pdf/1908.06931v1.pdf |
PWC | https://paperswithcode.com/paper/udpipe-at-sigmorphon-2019-contextualized-1 |
Repo | |
Framework | |
Event detection in Colombian security Twitter news using fine-grained latent topic analysis
Title | Event detection in Colombian security Twitter news using fine-grained latent topic analysis |
Authors | Vladimir Vargas-Calderón, Nicolás Parra-A., Jorge E. Camargo, Herbert Vinck-Posada |
Abstract | Cultural and social dynamics are important concepts that must be understood in order to grasp what a community cares about. To that end, an excellent source of information on what occurs in a community is the news, especially in recent years, when mass media giants use social networks to communicate and interact with their audience. In this work, we use a method to discover latent topics in tweets from Colombian Twitter news accounts in order to identify the most prominent events in the country. We pay particular attention to security-, violence- and crime-related tweets because of the violent environment that surrounds Colombian society. The latent topic discovery method that we use builds vector representations of the tweets using FastText and finds clusters of tweets through the K-means clustering algorithm. The number of clusters is found by measuring the $C_V$ coherence for a range of topic counts of the Latent Dirichlet Allocation (LDA) model. We finally use Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction to visualise the tweet vectors. Once the clusters related to security, violence and crime are identified, we apply the same method within each cluster to perform a fine-grained analysis in which specific events mentioned in the news are grouped together. Our method is able to discover event-specific sets of news, which is the baseline for an extensive analysis of how people engage in Twitter threads on different types of news, with an emphasis on security-, violence- and crime-related tweets. |
Tasks | Dimensionality Reduction |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08370v1 |
PDF | https://arxiv.org/pdf/1911.08370v1.pdf |
PWC | https://paperswithcode.com/paper/event-detection-in-colombian-security-twitter |
Repo | |
Framework | |
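The clustering stage of the pipeline can be sketched as follows. The random points below are synthetic stand-ins for the FastText tweet embeddings the paper uses, and the cluster count is taken as given (in the paper it is selected by scanning LDA $C_V$ coherence).

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-ins for FastText tweet embeddings: three well-separated
# groups of 5-d points (the paper clusters real embeddings of news tweets).
rng = np.random.default_rng(0)
vectors = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 5))
                     for c in (0.0, 1.0, 2.0)])

# Cluster the tweet vectors; the number of clusters is assumed chosen
# beforehand via the LDA C_V coherence scan described in the abstract.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(vectors)
```

The same call, re-applied inside each security/violence/crime cluster, gives the fine-grained, event-level grouping.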
Learning the dynamics of technical trading strategies
Title | Learning the dynamics of technical trading strategies |
Authors | Nicholas Murphy, Tim Gebbie |
Abstract | We use an adversarial expert-based online learning algorithm to learn the optimal parameters required to maximise wealth when trading zero-cost portfolio strategies. The learning algorithm is used to determine the relative population dynamics of technical trading strategies that can survive historical back-testing, as well as to form an overall aggregated portfolio trading strategy from the set of underlying trading strategies implemented on daily and intraday Johannesburg Stock Exchange data. The resulting population time-series are investigated using unsupervised learning for dimensionality reduction and visualisation. A key contribution is that the overall aggregated trading strategies are tested for statistical arbitrage using a novel hypothesis test proposed by Jarrow et al. (2012) on both daily sampled and intraday time-scales. The (low-frequency) daily sampled strategies fail the arbitrage tests after costs, while the (high-frequency) intraday sampled strategies are not falsified as statistical arbitrages after costs. The estimates of trading strategy success, cost of trading and slippage are considered along with an online benchmark portfolio algorithm for performance comparison. In addition, the algorithm's generalisation error is analysed by recovering a probability of back-test overfitting estimate using a nonparametric procedure introduced by Bailey et al. (2016). The work aims to explore and better understand the interplay between different technical trading strategies from a data-informed perspective. |
Tasks | Dimensionality Reduction, Time Series |
Published | 2019-03-06 |
URL | https://arxiv.org/abs/1903.02228v3 |
PDF | https://arxiv.org/pdf/1903.02228v3.pdf |
PWC | https://paperswithcode.com/paper/learning-the-population-dynamics-of-technical |
Repo | |
Framework | |
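The expert-aggregation idea can be sketched with a standard multiplicative-weights learner. This illustrates the class of adversarial expert algorithms the paper draws on, not its exact method; the returns below are synthetic.

```python
import numpy as np

def exponential_weights(returns, eta=0.5):
    """Aggregate a population of expert trading strategies online with
    multiplicative weight updates. `returns[t, k]` is the period-t return
    of expert k; weights shift toward experts that performed well, and the
    aggregated portfolio trades the weighted combination each period."""
    n_experts = returns.shape[1]
    w = np.ones(n_experts) / n_experts
    wealth = 1.0
    for r in returns:
        wealth *= 1.0 + w @ r        # trade the aggregated portfolio
        w = w * np.exp(eta * r)      # reward experts that did well
        w /= w.sum()
    return wealth, w
```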
Nonparametric Density Estimation & Convergence Rates for GANs under Besov IPM Losses
Title | Nonparametric Density Estimation & Convergence Rates for GANs under Besov IPM Losses |
Authors | Ananya Uppal, Shashank Singh, Barnabás Póczos |
Abstract | We study the problem of estimating a nonparametric probability density under a large family of losses called Besov IPMs, which include, for example, $\mathcal{L}^p$ distances, total variation distance, and generalizations of both Wasserstein and Kolmogorov-Smirnov distances. For a wide variety of settings, we provide both lower and upper bounds, identifying precisely how the choice of loss function and assumptions on the data interact to determine the minimax optimal convergence rate. We also show that linear distribution estimates, such as the empirical distribution or kernel density estimator, often fail to converge at the optimal rate. Our bounds generalize, unify, or improve several recent and classical results. Moreover, IPMs can be used to formalize a statistical model of generative adversarial networks (GANs). Thus, we show how our results imply bounds on the statistical error of a GAN, showing, for example, that GANs can strictly outperform the best linear estimator. |
Tasks | Density Estimation |
Published | 2019-02-09 |
URL | https://arxiv.org/abs/1902.03511v4 |
PDF | https://arxiv.org/pdf/1902.03511v4.pdf |
PWC | https://paperswithcode.com/paper/nonparametric-density-estimation-under-besov |
Repo | |
Framework | |
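For reference, an integral probability metric (IPM) over a function class $\mathcal{F}$ is defined as below. Taking $\mathcal{F}$ to be a Besov ball yields the losses studied in the paper; a 1-Lipschitz ball recovers the Wasserstein-1 distance, and indicators of half-lines recover the Kolmogorov-Smirnov distance.

```latex
d_{\mathcal{F}}(P, Q) \;=\; \sup_{f \in \mathcal{F}}
\left| \, \mathbb{E}_{X \sim P}\, f(X) \;-\; \mathbb{E}_{X \sim Q}\, f(X) \, \right|
```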
Optimal Mini-Batch Size Selection for Fast Gradient Descent
Title | Optimal Mini-Batch Size Selection for Fast Gradient Descent |
Authors | Michael P. Perrone, Haidar Khan, Changhoan Kim, Anastasios Kyrillidis, Jerry Quinn, Valentina Salapura |
Abstract | This paper presents a methodology for selecting the mini-batch size that minimizes Stochastic Gradient Descent (SGD) learning time for single- and multiple-learner problems. By decoupling algorithmic analysis issues from hardware and software implementation details, we reveal a robust empirical inverse law between mini-batch size and the average number of SGD updates required to converge to a specified error threshold. Combining this empirical inverse law with measured system performance, we create an accurate, closed-form model of average training time and show how this model can be used to identify quantifiable implications for both the algorithmic and hardware aspects of machine learning. We demonstrate the inverse law empirically, on both image recognition (MNIST, CIFAR10 and CIFAR100) and machine translation (Europarl) tasks, and provide a theoretical justification by proving a novel bound on mini-batch SGD training. |
Tasks | Machine Translation |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06459v1 |
PDF | https://arxiv.org/pdf/1911.06459v1.pdf |
PWC | https://paperswithcode.com/paper/optimal-mini-batch-size-selection-for-fast |
Repo | |
Framework | |
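The closed-form training-time model the abstract describes combines an inverse law for the update count with a per-step system-time model; a sketch with illustrative (not measured) constants:

```python
def training_time(batch_size, updates_floor=200.0, inverse_coeff=1e5,
                  time_fixed=1e-3, time_per_sample=1e-5):
    """Average-training-time model of the kind the paper builds: an empirical
    inverse law for the number of SGD updates needed to reach a target error,
    multiplied by a measured per-update time. All constants here are
    illustrative placeholders, not values from the paper."""
    updates = updates_floor + inverse_coeff / batch_size      # inverse law
    step_time = time_fixed + time_per_sample * batch_size     # system model
    return updates * step_time

# The closed form makes finding the optimal mini-batch size a one-line search.
best = min(range(1, 4097), key=training_time)
```

Small batches waste per-step overhead on many updates; large batches make each step expensive. The model's minimum balances the two.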
Time-Series Analysis via Low-Rank Matrix Factorization Applied to Infant-Sleep Data
Title | Time-Series Analysis via Low-Rank Matrix Factorization Applied to Infant-Sleep Data |
Authors | Sheng Liu, Mark Cheng, Hayley Brooks, Wayne Mackey, David J. Heeger, Esteban G. Tabak, Carlos Fernandez-Granda |
Abstract | We propose a nonparametric model for time series with missing data based on low-rank matrix factorization. The model expresses each instance in a set of time series as a linear combination of a small number of shared basis functions. Constraining the functions and the corresponding coefficients to be nonnegative yields an interpretable low-dimensional representation of the data. A time-smoothing regularization term ensures that the model captures meaningful trends in the data, instead of overfitting short-term fluctuations. The low-dimensional representation makes it possible to detect outliers and cluster the time series according to the interpretable features extracted by the model, and also to perform forecasting via kernel regression. We apply our methodology to a large real-world dataset of infant-sleep data gathered by caregivers with a mobile-phone app. Our analysis automatically extracts daily-sleep patterns consistent with the existing literature. This allows us to compute sleep-development trends for the cohort, which characterize the emergence of circadian sleep and different napping habits. We apply our methodology to detect anomalous individuals, to cluster the cohort into groups with different sleeping tendencies, and to obtain improved predictions of future sleep behavior. |
Tasks | Time Series, Time Series Analysis |
Published | 2019-04-09 |
URL | https://arxiv.org/abs/1904.04780v3 |
PDF | https://arxiv.org/pdf/1904.04780v3.pdf |
PWC | https://paperswithcode.com/paper/time-series-analysis-via-low-rank-matrix |
Repo | |
Framework | |
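The core factorization can be sketched with plain multiplicative-update NMF. The paper's time-smoothing regularizer and missing-data handling are omitted here, and the function name is ours.

```python
import numpy as np

def nmf(X, rank, iters=300, seed=0):
    """Plain multiplicative-update NMF: X (time x series) ~ W @ H with
    W, H >= 0, giving the interpretable nonnegative low-rank representation
    the paper builds on. The columns of W play the role of shared basis
    functions; H holds the nonnegative coefficients."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-12)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H
```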
Design of intentional backdoors in sequential models
Title | Design of intentional backdoors in sequential models |
Authors | Zhaoyuan Yang, Naresh Iyer, Johan Reimann, Nurali Virani |
Abstract | Recent work has demonstrated robust mechanisms by which attacks can be orchestrated on machine learning models. In contrast to adversarial examples, backdoor or trojan attacks embed surgically modified samples with targeted labels in the model training process, causing the targeted model to learn to misclassify chosen samples in the presence of specific triggers while keeping its performance stable on other nominal samples. However, current published research on trojan attacks mainly focuses on classification problems and ignores the sequential dependency between inputs. In this paper, we propose methods to discreetly introduce and exploit novel backdoor attacks within a sequential decision-making agent, such as a reinforcement learning agent, by training multiple benign and malicious policies within a single long short-term memory (LSTM) network. We demonstrate the effectiveness as well as the damaging impact of such attacks through initial outcomes generated from our approach, employed on grid-world environments. We also provide evidence as well as intuition on how the trojan trigger and malicious policy are activated. Challenges with network size and unintentional triggers are identified, and analogies with adversarial examples are discussed. Finally, we propose potential approaches to defend against such attacks or serve as early detection for them. Results of our work can also be extended to many applications of LSTM and recurrent networks. |
Tasks | Decision Making |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.09972v1 |
PDF | http://arxiv.org/pdf/1902.09972v1.pdf |
PWC | https://paperswithcode.com/paper/design-of-intentional-backdoors-in-sequential |
Repo | |
Framework | |
Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks
Title | Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks |
Authors | Yuan Cao, Quanquan Gu |
Abstract | We study the sample complexity of learning one-hidden-layer convolutional neural networks (CNNs) with non-overlapping filters. We propose a novel algorithm called approximate gradient descent for training CNNs, and show that, with high probability, the proposed algorithm with random initialization achieves linear convergence to the ground-truth parameters up to statistical precision. Compared with existing work, our result applies to general non-trivial, monotonic and Lipschitz continuous activation functions, including ReLU, Leaky ReLU, Sigmoid, and Softplus. Moreover, our sample complexity beats existing results in its dependency on the number of hidden nodes and the filter size. In fact, our result matches the information-theoretic lower bound for learning one-hidden-layer CNNs with linear activation functions, suggesting that our sample complexity is tight. Our theoretical analysis is backed up by numerical experiments. |
Tasks | |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.05059v1 |
PDF | https://arxiv.org/pdf/1911.05059v1.pdf |
PWC | https://paperswithcode.com/paper/tight-sample-complexity-of-learning-one-1 |
Repo | |
Framework | |
Reconciling Utility and Membership Privacy via Knowledge Distillation
Title | Reconciling Utility and Membership Privacy via Knowledge Distillation |
Authors | Virat Shejwalkar, Amir Houmansadr |
Abstract | Large-capacity machine learning models are prone to membership inference attacks, in which an adversary aims to infer whether a particular data sample is a member of the target model's training dataset. Such membership inferences can lead to serious privacy violations, as machine learning models are often trained on privacy-sensitive data such as medical records and controversial user opinions. Recently, defenses against membership inference attacks have been developed, in particular based on differential privacy and adversarial regularization; unfortunately, such defenses substantially degrade the classification accuracy of the underlying machine learning models. In this work, we present a new defense against membership inference attacks that preserves the utility of the target machine learning models significantly better than prior defenses. Our defense, called distillation for membership privacy (DMP), leverages knowledge distillation to train machine learning models with membership privacy. We analyze the key requirements for membership privacy and provide a novel criterion to select the data used for knowledge transfer, in order to improve the membership privacy of the final models. DMP works effectively against attackers with either whitebox or blackbox access to the target model. We evaluate DMP's performance through extensive experiments on different deep neural networks and various benchmark datasets. We show that DMP provides a significantly better tradeoff between inference resistance and classification performance than state-of-the-art membership inference defenses. For instance, a DMP-trained DenseNet provides a classification accuracy of 65.3% for a 54.4% blackbox membership inference attack accuracy, while an adversarially regularized DenseNet provides a classification accuracy of only 53.7% for a (much worse) 68.7% blackbox membership inference attack accuracy. |
Tasks | Inference Attack, Model Compression, Transfer Learning |
Published | 2019-06-15 |
URL | https://arxiv.org/abs/1906.06589v2 |
PDF | https://arxiv.org/pdf/1906.06589v2.pdf |
PWC | https://paperswithcode.com/paper/reconciling-utility-and-membership-privacy |
Repo | |
Framework | |
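The transfer signal in distillation-based defenses like DMP is the teacher's temperature-softened prediction; a minimal sketch (the temperature value is illustrative):

```python
import numpy as np

def soft_labels(teacher_logits, temperature=4.0):
    """Temperature-softened teacher predictions -- the signal a student is
    trained on in knowledge distillation, which DMP computes on carefully
    selected transfer data. A higher temperature flattens the distribution,
    so the student inherits less of the teacher's membership-revealing
    overconfidence on training points."""
    z = teacher_logits / temperature
    z = z - z.max(axis=1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
```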
RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework
Title | RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework |
Authors | Pankaj Singh, Sudhakar Singh, P. K. Mishra, Rakhi Garg |
Abstract | Initially, a number of frequent itemset mining (FIM) algorithms were designed on Hadoop MapReduce, a distributed big data processing framework. However, due to heavy disk I/O, MapReduce has been found inefficient for such highly iterative algorithms. Therefore Spark, a more efficient distributed data processing framework, was developed with in-memory computation and resilient distributed dataset (RDD) features to support iterative algorithms. On the Spark RDD framework, Apriori- and FP-Growth-based FIM algorithms have been designed, but an Eclat-based algorithm has not been explored yet. In this paper, RDD-Eclat, a parallel Eclat algorithm on the Spark RDD framework, is proposed along with five variants. The proposed algorithms are evaluated on various benchmark datasets, and the results show that RDD-Eclat outperforms Spark-based Apriori many times over. The experimental results also show the scalability of the proposed algorithms as the number of cores and the size of the dataset increase. |
Tasks | |
Published | 2019-12-13 |
URL | https://arxiv.org/abs/1912.06415v1 |
PDF | https://arxiv.org/pdf/1912.06415v1.pdf |
PWC | https://paperswithcode.com/paper/rdd-eclat-approaches-to-parallelize-eclat |
Repo | |
Framework | |
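The core Eclat operation being parallelized is tid-set intersection over a vertical data layout; a single-machine sketch (the RDD partitioning itself is not shown):

```python
from itertools import combinations

def eclat(transactions, min_support):
    """Sequential Eclat: build a vertical layout (item -> set of transaction
    ids), then grow itemsets level by level, computing each candidate's
    support by intersecting the tid-sets of its parents. RDD-Eclat
    distributes this computation over Spark RDDs."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    frontier = {frozenset([i]): t for i, t in tidsets.items()
                if len(t) >= min_support}
    result = dict(frontier)
    k = 1
    while frontier:
        nxt = {}
        for (a, ta), (b, tb) in combinations(frontier.items(), 2):
            cand = a | b
            if len(cand) == k + 1 and cand not in nxt:
                t = ta & tb           # tid-set of the merged itemset
                if len(t) >= min_support:
                    nxt[cand] = t
        result.update(nxt)
        frontier, k = nxt, k + 1
    return {tuple(sorted(s)): len(t) for s, t in result.items()}
```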
Automatic Identification of Traditional Colombian Music Genres based on Audio Content Analysis and Machine Learning Technique
Title | Automatic Identification of Traditional Colombian Music Genres based on Audio Content Analysis and Machine Learning Technique |
Authors | Diego A. Cruz, Sergio S. Lopez, Jorge E. Camargo |
Abstract | Colombia has a diversity of genres in traditional music, which expresses the richness of Colombian culture according to the region. This musical diversity is the result of a mixture of African, native Indigenous, and European influences. Organizing large collections of songs is a time-consuming task that requires a human to listen to fragments of audio to identify genre, singer, year, instruments, and other relevant characteristics that allow indexing the song dataset. This paper presents a method to automatically identify the genre of a Colombian song by means of its audio content. The method extracts audio features that are used to train a machine learning model that learns to classify the genre. The method was evaluated on a dataset of 180 musical pieces belonging to six folkloric Colombian music genres: Bambuco, Carranga, Cumbia, Joropo, Pasillo, and Vallenato. Results show that it is possible to automatically identify the music genre in spite of the complexity of Colombian rhythms, reaching an average accuracy of 69%. |
Tasks | |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03372v1 |
PDF | https://arxiv.org/pdf/1911.03372v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-identification-of-traditional |
Repo | |
Framework | |
A comparative study of physics-informed neural network models for learning unknown dynamics and constitutive relations
Title | A comparative study of physics-informed neural network models for learning unknown dynamics and constitutive relations |
Authors | Ramakrishna Tipireddy, Paris Perdikaris, Panos Stinis, Alexandre Tartakovsky |
Abstract | We investigate the use of discrete and continuous versions of physics-informed neural network methods for learning unknown dynamics or constitutive relations of a dynamical system. For the case of unknown dynamics, we represent all the dynamics with a deep neural network (DNN). When the dynamics of the system are known up to the specification of constitutive relations (which can depend on the state of the system), we represent these constitutive relations with a DNN. The discrete versions combine classical multistep discretization methods for dynamical systems with neural-network-based machine learning methods. The continuous versions, on the other hand, utilize deep neural networks to minimize the residual function for the continuous governing equations. We use the case of a fed-batch bioreactor system to study the effectiveness of these approaches and discuss conditions for their applicability. Our results indicate that the accuracy of the trained neural network models is much higher when we only have to learn a constitutive relation instead of the whole dynamics. This finding corroborates the well-known fact from scientific computing that building as much structural information as is available into an algorithm can enhance its efficiency and/or accuracy. |
Tasks | |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.04058v1 |
PDF | http://arxiv.org/pdf/1904.04058v1.pdf |
PWC | https://paperswithcode.com/paper/a-comparative-study-of-physics-informed |
Repo | |
Framework | |
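The discrete (multistep) variant can be sketched as a residual that couples consecutive snapshots through the unknown right-hand side. Here `f` is any callable, whereas in the paper it is a neural network trained to drive this residual to zero; the trapezoidal scheme is one representative multistep choice.

```python
import numpy as np

def trapezoidal_residual(x, h, f):
    """Residual of the trapezoidal multistep scheme along a trajectory:
    r_n = x_{n+1} - x_n - (h/2) * (f(x_{n+1}) + f(x_n)).
    In the discrete physics-informed approach, f (a network modeling the
    dynamics or a constitutive relation) is fit by minimizing this residual
    over the observed snapshots x[0..N]."""
    fx = np.array([f(v) for v in x])
    return x[1:] - x[:-1] - 0.5 * h * (fx[1:] + fx[:-1])
```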
RADE: Resource-Efficient Supervised Anomaly Detection Using Decision Tree-Based Ensemble Methods
Title | RADE: Resource-Efficient Supervised Anomaly Detection Using Decision Tree-Based Ensemble Methods |
Authors | Shay Vargaftik, Isaac Keslassy, Ariel Orda, Yaniv Ben-Itzhak |
Abstract | Decision-tree-based ensemble classification methods (DTEMs) are a prevalent tool for supervised anomaly detection. However, due to the continued growth of datasets, DTEMs suffer increasing drawbacks such as growing memory footprints, longer training times, and higher classification latencies at lower throughput. In this paper, we present, design, and evaluate RADE - a DTEM-based anomaly detection framework that augments standard DTEM classifiers and alleviates these drawbacks by relying on two observations: (1) a small (coarse-grained) DTEM model is sufficient to classify the majority of classification queries correctly, where a classification is accepted only if its corresponding confidence level is greater than or equal to a predetermined classification confidence threshold; (2) in the fewer, harder cases where the coarse-grained DTEM model yields insufficient confidence in its classification, we can improve it by forwarding the classification query to one of several expert (fine-grained) DTEM models, each explicitly trained for that particular case. We implement RADE in Python based on scikit-learn and evaluate it over different DTEM methods: RF, XGBoost, AdaBoost, GBDT and LightGBM, and over three publicly available datasets. Our evaluation over both a strong AWS EC2 instance and a Raspberry Pi 3 device indicates that RADE offers competitive and often superior anomaly detection capabilities compared to standard DTEM methods, while significantly improving memory footprint (by up to 5.46x), training time (by up to 17.2x), and classification latency (by up to 31.2x). |
Tasks | Anomaly Detection |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.11877v2 |
PDF | https://arxiv.org/pdf/1909.11877v2.pdf |
PWC | https://paperswithcode.com/paper/rade-resource-efficient-supervised-anomaly |
Repo | |
Framework | |
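RADE's coarse-to-expert flow can be sketched as a confidence-gated cascade; the threshold value and the duck-typed model interface below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cascade_predict(coarse, experts, X, threshold=0.9):
    """RADE-style two-level classification: accept the small coarse model's
    answer when its confidence clears `threshold`; otherwise forward the
    query to the expert model trained for that low-confidence prediction.
    Models are duck-typed: `coarse` needs predict_proba/classes_, and each
    expert needs predict (scikit-learn estimators fit this interface)."""
    proba = coarse.predict_proba(X)
    conf = proba.max(axis=1)
    pred = coarse.classes_[proba.argmax(axis=1)].copy()
    for i in np.where(conf < threshold)[0]:
        pred[i] = experts[pred[i]].predict(X[i:i + 1])[0]
    return pred
```

Because most queries stop at the small coarse model, the expensive fine-grained models are consulted only for the hard minority, which is where the memory and latency savings come from.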
Cost-Sensitive Feature-Value Acquisition Using Feature Relevance
Title | Cost-Sensitive Feature-Value Acquisition Using Feature Relevance |
Authors | Kimmo Kärkkäinen, Mohammad Kachuee, Orpaz Goldstein, Majid Sarrafzadeh |
Abstract | In many real-world machine learning problems, feature values are not readily available. To make predictions, some of the missing features have to be acquired, which can incur a cost in money, computational time, or human time, depending on the problem domain. This leads us to the problem of choosing which features to use at prediction time. The chosen features should increase the prediction accuracy for a low cost, but determining which features will do that is challenging. The choice should take into account the previously acquired feature values as well as the feature costs. This paper proposes a novel approach to address this problem. The proposed approach chooses the most useful features adaptively based on how relevant they are for the prediction task and what the corresponding feature costs are. Our approach uses a generic neural network architecture, which is suitable for a wide range of problems. We evaluate our approach on three cost-sensitive datasets, including the Yahoo! Learning to Rank Competition dataset as well as two health datasets. We show that our approach achieves high accuracy with a lower cost than current state-of-the-art approaches. |
Tasks | Learning-To-Rank |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.08281v2 |
PDF | https://arxiv.org/pdf/1912.08281v2.pdf |
PWC | https://paperswithcode.com/paper/cost-sensitive-feature-value-acquisition |
Repo | |
Framework | |