Paper Group ANR 1247
Myers-Briggs Personality Classification and Personality-Specific Language Generation Using Pre-trained Language Models
Title | Myers-Briggs Personality Classification and Personality-Specific Language Generation Using Pre-trained Language Models |
Authors | Sedrick Scott Keh, I-Tsun Cheng |
Abstract | The Myers-Briggs Type Indicator (MBTI) is a popular personality metric that uses four dichotomies as indicators of personality traits. This paper examines the use of pre-trained language models to predict MBTI personality types based on scraped labeled texts. The proposed model reaches an accuracy of $0.47$ for correctly predicting all 4 types and $0.86$ for correctly predicting at least 2 types. Furthermore, we investigate the possible uses of a fine-tuned BERT model for personality-specific language generation. This is a task essential for both modern psychology and for intelligent empathetic systems. |
Tasks | Text Generation |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06333v1 |
https://arxiv.org/pdf/1907.06333v1.pdf | |
PWC | https://paperswithcode.com/paper/myers-briggs-personality-classification-and |
Repo | |
Framework | |
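The two accuracy figures in the abstract above (all 4 types correct, at least 2 types correct) can be illustrated with a small sketch. It assumes that "types" refers to the four dichotomy letters compared position by position; the abstract does not spell out the exact computation, and the function names and toy labels below are hypothetical.

```python
def dichotomy_matches(true_type: str, predicted_type: str) -> int:
    """Count how many of the four MBTI letters agree, position by position."""
    if len(true_type) != 4 or len(predicted_type) != 4:
        raise ValueError("MBTI types must have exactly four letters")
    return sum(t == p for t, p in zip(true_type.upper(), predicted_type.upper()))


def accuracy_at_least(true_types, predicted_types, k: int) -> float:
    """Fraction of samples with at least k of the four letters predicted correctly."""
    pairs = list(zip(true_types, predicted_types))
    if not pairs:
        raise ValueError("need at least one sample")
    hits = sum(dichotomy_matches(t, p) >= k for t, p in pairs)
    return hits / len(pairs)


# Toy labels, invented for illustration only (not data from the paper).
y_true = ["INTJ", "ENFP", "ISTP", "ESFJ"]
y_pred = ["INTJ", "ENTP", "ESFJ", "INTP"]

all_four = accuracy_at_least(y_true, y_pred, k=4)      # exact-type accuracy
at_least_two = accuracy_at_least(y_true, y_pred, k=2)  # looser, partial-credit accuracy
```

Under this reading the "at least 2" figure is always greater than or equal to the "all 4" figure, which is consistent with the 0.86 and 0.47 reported in the abstract.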
Learning Deep Transformer Models for Machine Translation
Title | Learning Deep Transformer Models for Machine Translation |
Authors | Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao |
Abstract | Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for the development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep networks. Here, we continue the line of research on the latter. We claim that a truly deep Transformer model can surpass the Transformer-Big counterpart by 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next. On WMT’16 English-German, NIST OpenMT’12 Chinese-English and larger WMT’18 Chinese-English tasks, our deep system (30/25-layer encoder) outperforms the shallow Transformer-Big/Base baseline (6-layer encoder) by 0.4-2.4 BLEU points. As another bonus, the deep model is 1.6X smaller in size and 3X faster in training than Transformer-Big. |
Tasks | Machine Translation |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01787v1 |
https://arxiv.org/pdf/1906.01787v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-deep-transformer-models-for-machine |
Repo | |
Framework | |
oLMpics – On what Language Model Pre-training Captures
Title | oLMpics – On what Language Model Pre-training Captures |
Authors | Alon Talmor, Yanai Elazar, Yoav Goldberg, Jonathan Berant |
Abstract | Recent success of pre-trained language models (LMs) has spurred widespread interest in the language capabilities that they possess. However, efforts to understand whether LM representations are useful for symbolic reasoning tasks have been limited and scattered. In this work, we propose eight reasoning tasks, which conceptually require operations such as comparison, conjunction, and composition. A fundamental challenge is to understand whether the performance of a LM on a task should be attributed to the pre-trained representations or to the process of fine-tuning on the task data. To address this, we propose an evaluation protocol that includes both zero-shot evaluation (no fine-tuning), as well as comparing the learning curve of a fine-tuned LM to the learning curve of multiple controls, which paints a rich picture of the LM capabilities. Our main findings are that: (a) different LMs exhibit qualitatively different reasoning abilities, e.g., RoBERTa succeeds in reasoning tasks where BERT fails completely; (b) LMs do not reason in an abstract manner and are context-dependent, e.g., while RoBERTa can compare ages, it can do so only when the ages are in the typical range of human ages; (c) On half of our reasoning tasks all models fail completely. Our findings and infrastructure can help future work on designing new datasets, models and objective functions for pre-training. |
Tasks | Language Modelling |
Published | 2019-12-31 |
URL | https://arxiv.org/abs/1912.13283v1 |
https://arxiv.org/pdf/1912.13283v1.pdf | |
PWC | https://paperswithcode.com/paper/olmpics-on-what-language-model-pre-training |
Repo | |
Framework | |
“Hinglish” Language – Modeling a Messy Code-Mixed Language
Title | “Hinglish” Language – Modeling a Messy Code-Mixed Language |
Authors | Vivek Kumar Gupta |
Abstract | With a sharp rise in fluency and users of “Hinglish” in India, a linguistically diverse country, it has become increasingly important to analyze social content written in this language on platforms such as Twitter, Reddit, and Facebook. This project focuses on using deep learning techniques to tackle a classification problem in categorizing social content written in Hindi-English into Abusive, Hate-Inducing and Not offensive categories. We utilize bi-directional sequence models with easy text augmentation techniques such as synonym replacement, random insertion, random swap, and random deletion to produce a state-of-the-art classifier that outperforms the previous work done on analyzing this dataset. |
Tasks | Language Modelling, Text Augmentation |
Published | 2019-12-30 |
URL | https://arxiv.org/abs/1912.13109v1 |
https://arxiv.org/pdf/1912.13109v1.pdf | |
PWC | https://paperswithcode.com/paper/hinglish-language-modeling-a-messy-code-mixed |
Repo | |
Framework | |
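The four augmentation operations named in the abstract above (synonym replacement, random insertion, random swap, random deletion) can be sketched on tokenized text as follows. This is a minimal illustration, not the author's implementation: the synonym table is a hypothetical toy, and the function names and default parameters are assumptions.

```python
import random

# Hypothetical toy synonym table; a real system would need a Hinglish-aware lexicon.
TOY_SYNONYMS = {"good": ["nice", "fine"], "movie": ["film"]}


def synonym_replacement(words, rng, n=1):
    """Replace up to n words that have an entry in the toy synonym table."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in TOY_SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(TOY_SYNONYMS[out[i]])
    return out


def random_insertion(words, rng, n=1):
    """Insert a synonym of a randomly chosen word at a random position, n times."""
    out = list(words)
    for _ in range(n):
        candidates = [w for w in out if w in TOY_SYNONYMS]
        if not candidates:
            break
        synonym = rng.choice(TOY_SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randrange(len(out) + 1), synonym)
    return out


def random_swap(words, rng, n=1):
    """Swap two randomly chosen positions, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, rng, p=0.1):
    """Drop each word independently with probability p, keeping at least one word."""
    words = list(words)
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(0)  # seeded so that runs are repeatable
tokens = "yeh movie bahut good hai".split()
augmented = [
    synonym_replacement(tokens, rng),
    random_insertion(tokens, rng),
    random_swap(tokens, rng),
    random_deletion(tokens, rng, p=0.2),
]
```

Each function returns a new list and leaves the input untouched, so several augmented variants can be generated from one labeled sentence while reusing its original label.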
Laguerre-Gauss Preprocessing: Line Profiles as Image Features for Aerial Images Classification
Title | Laguerre-Gauss Preprocessing: Line Profiles as Image Features for Aerial Images Classification |
Authors | Alejandro Murillo-González, José David Ortega Pabón, Juan Guillermo Paniagua, Olga Lucía Quintero Montoya |
Abstract | An image preprocessing methodology based on Fourier analysis together with the Laguerre-Gauss Spatial Filter is proposed. This is an alternative to obtain features from aerial images that reduces the feature space significantly, preserving enough information for classification tasks. Experiments on a challenging data set of aerial images show that it is possible to learn a robust classifier from this transformed and smaller feature space using simple models, with similar performance to the complete feature space and more complex models. |
Tasks | |
Published | 2019-12-13 |
URL | https://arxiv.org/abs/1912.06729v1 |
https://arxiv.org/pdf/1912.06729v1.pdf | |
PWC | https://paperswithcode.com/paper/laguerre-gauss-preprocessing-line-profiles-as |
Repo | |
Framework | |
Personalized Multimedia Item and Key Frame Recommendation
Title | Personalized Multimedia Item and Key Frame Recommendation |
Authors | Le Wu, Lei Chen, Yonghui Yang, Richang Hong, Yong Ge, Xing Xie, Meng Wang |
Abstract | When recommending or advertising items to users, an emerging trend is to present each multimedia item with a key frame image (e.g., the poster of a movie). As each multimedia item can be represented as multiple fine-grained visual images (e.g., related images of the movie), personalized key frame recommendation is necessary in these applications to attract users’ unique visual preferences. However, previous personalized key frame recommendation models relied on users’ fine-grained image behavior of multimedia items (e.g., user-image interaction behavior), which is often not available in real scenarios. In this paper, we study the general problem of joint multimedia item and key frame recommendation in the absence of the fine-grained user-image behavior. We argue that the key challenge of this problem lies in discovering users’ visual profiles for key frame recommendation, as most recommendation models would fail without any users’ fine-grained image behavior. To tackle this challenge, we leverage users’ item behavior by projecting users (items) in two latent spaces: a collaborative latent space and a visual latent space. We further design a model to discern both the collaborative and visual dimensions of users, and model how users make decisive item preferences from these two spaces. As a result, the learned user visual profiles could be directly applied for key frame recommendation. Finally, experimental results on a real-world dataset clearly show the effectiveness of our proposed model on the two recommendation tasks. |
Tasks | |
Published | 2019-06-01 |
URL | https://arxiv.org/abs/1906.00246v2 |
https://arxiv.org/pdf/1906.00246v2.pdf | |
PWC | https://paperswithcode.com/paper/190600246 |
Repo | |
Framework | |
SAdam: A Variant of Adam for Strongly Convex Functions
Title | SAdam: A Variant of Adam for Strongly Convex Functions |
Authors | Guanghui Wang, Shiyin Lu, Weiwei Tu, Lijun Zhang |
Abstract | The Adam algorithm has become extremely popular for large-scale machine learning. Under the convexity condition, it has been proved to enjoy a data-dependent $O(\sqrt{T})$ regret bound where $T$ is the time horizon. However, whether strong convexity can be utilized to further improve the performance remains an open problem. In this paper, we give an affirmative answer by developing a variant of Adam (referred to as SAdam) which achieves a data-dependent $O(\log T)$ regret bound for strongly convex functions. The essential idea is to maintain a faster-decaying yet controlled step size for exploiting strong convexity. In addition, under a special configuration of hyperparameters, our SAdam reduces to SC-RMSprop, a recently proposed variant of RMSprop for strongly convex functions, for which we provide the first data-dependent logarithmic regret bound. Empirical results on optimizing strongly convex functions and training deep networks demonstrate the effectiveness of our method. |
Tasks | |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.02957v1 |
https://arxiv.org/pdf/1905.02957v1.pdf | |
PWC | https://paperswithcode.com/paper/sadam-a-variant-of-adam-for-strongly-convex |
Repo | |
Framework | |
Multi-GCN: Graph Convolutional Networks for Multi-View Networks, with Applications to Global Poverty
Title | Multi-GCN: Graph Convolutional Networks for Multi-View Networks, with Applications to Global Poverty |
Authors | Muhammad Raza Khan, Joshua E. Blumenstock |
Abstract | With the rapid expansion of mobile phone networks in developing countries, large-scale graph machine learning has gained sudden relevance in the study of global poverty. Recent applications range from humanitarian response and poverty estimation to urban planning and epidemic containment. Yet the vast majority of computational tools and algorithms used in these applications do not account for the multi-view nature of social networks: people are related in myriad ways, but most graph learning models treat relations as binary. In this paper, we develop a graph-based convolutional network for learning on multi-view networks. We show that this method outperforms state-of-the-art semi-supervised learning algorithms on three different prediction tasks using mobile phone datasets from three different developing countries. We also show that, while designed specifically for use in poverty research, the algorithm also outperforms existing benchmarks on a broader set of learning tasks on multi-view networks, including node labelling in citation networks. |
Tasks | |
Published | 2019-01-31 |
URL | http://arxiv.org/abs/1901.11213v1 |
http://arxiv.org/pdf/1901.11213v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-gcn-graph-convolutional-networks-for |
Repo | |
Framework | |
Biosignal Generation and Latent Variable Analysis with Recurrent Generative Adversarial Networks
Title | Biosignal Generation and Latent Variable Analysis with Recurrent Generative Adversarial Networks |
Authors | Shota Harada, Hideaki Hayashi, Seiichi Uchida |
Abstract | The effectiveness of biosignal generation and data augmentation with biosignal generative models based on generative adversarial networks (GANs), which are a type of deep learning technique, was demonstrated in our previous paper. GAN-based generative models only learn the projection between a random distribution as input data and the distribution of training data. Therefore, the relationship between input and generated data is unclear, and the characteristics of the data generated from this model cannot be controlled. This study proposes a method for generating time-series data based on GANs and explores their ability to generate biosignals with certain classes and characteristics. Moreover, in the proposed method, latent variables are analyzed using canonical correlation analysis (CCA) to represent the relationship between input and generated data as canonical loadings. Using these loadings, we can control the characteristics of the data generated by the proposed method. The influence of class labels on generated data is analyzed by feeding the data interpolated between two class labels into the generator of the proposed GANs. The CCA of the latent variables is shown to be an effective method of controlling the generated data characteristics. We are able to model the distribution of the time-series data without requiring domain-dependent knowledge using the proposed method. Furthermore, it is possible to control the characteristics of these data by analyzing the model trained using the proposed method. To the best of our knowledge, this work is the first to generate biosignals using GANs while controlling the characteristics of the generated data. |
Tasks | Data Augmentation, Time Series |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07136v1 |
https://arxiv.org/pdf/1905.07136v1.pdf | |
PWC | https://paperswithcode.com/paper/biosignal-generation-and-latent-variable |
Repo | |
Framework | |
Pushing the limits of RNN Compression
Title | Pushing the limits of RNN Compression |
Authors | Urmish Thakker, Igor Fedorov, Jesse Beu, Dibakar Gope, Chu Zhou, Ganesh Dasika, Matthew Mattina |
Abstract | Recurrent Neural Networks (RNN) can be difficult to deploy on resource constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy. This paper introduces a method to compress RNNs for resource constrained environments using Kronecker product (KP). KPs can compress RNN layers by 16-38x with minimal accuracy loss. We show that KP can beat the task accuracy achieved by other state-of-the-art compression techniques (pruning and low-rank matrix factorization) across 4 benchmarks spanning 3 different applications, while simultaneously improving inference run-time. |
Tasks | |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.02558v2 |
https://arxiv.org/pdf/1910.02558v2.pdf | |
PWC | https://paperswithcode.com/paper/pushing-the-limits-of-rnn-compression |
Repo | |
Framework | |
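To make the Kronecker product idea in the abstract above concrete, the sketch below stores a weight matrix as two small factors and applies it to a vector without ever building the full matrix. The layer shape (256 x 256 from two 16 x 16 factors) is a hypothetical choice for illustration, so the resulting ratio reflects these toy shapes only and is not one of the paper's reported 16-38x figures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer shape: a 256 x 256 weight matrix built from two 16 x 16 factors.
A = rng.standard_normal((16, 16))
B = rng.standard_normal((16, 16))
x = rng.standard_normal(256)

# Dense reference: materialize the full Kronecker product.
W = np.kron(A, B)
y_dense = W @ x

# Factored form: (A kron B) x equals A X B^T flattened, where X is x reshaped
# row-major to (A.shape[1], B.shape[1]). The full matrix is never built.
X = x.reshape(A.shape[1], B.shape[1])
y_factored = (A @ X @ B.T).reshape(-1)

# Parameter counts for the dense matrix versus the two stored factors.
dense_params = W.size
factored_params = A.size + B.size
compression = dense_params / factored_params
```

The factored product replaces one large matrix-vector multiply with two small matrix-matrix multiplies, which is one plausible reason such a scheme could also reduce inference run-time, as the abstract claims for the authors' method.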
Disease Knowledge Transfer across Neurodegenerative Diseases
Title | Disease Knowledge Transfer across Neurodegenerative Diseases |
Authors | Razvan V. Marinescu, Marco Lorenzi, Stefano B. Blumberg, Alexandra L. Young, Pere P. Morell, Neil P. Oxtoby, Arman Eshaghi, Keir X. Yong, Sebastian J. Crutch, Polina Golland, Daniel C. Alexander |
Abstract | We introduce Disease Knowledge Transfer (DKT), a novel technique for transferring biomarker information between related neurodegenerative diseases. DKT infers robust multimodal biomarker trajectories in rare neurodegenerative diseases even when only limited, unimodal data is available, by transferring information from larger multimodal datasets from common neurodegenerative diseases. DKT is a joint-disease generative model of biomarker progressions, which exploits biomarker relationships that are shared across diseases. Our proposed method allows, for the first time, the estimation of plausible, multimodal biomarker trajectories in Posterior Cortical Atrophy (PCA), a rare neurodegenerative disease where only unimodal MRI data is available. For this we train DKT on a combined dataset containing subjects with two distinct diseases and sizes of data available: 1) a larger, multimodal typical AD (tAD) dataset from the TADPOLE Challenge, and 2) a smaller unimodal Posterior Cortical Atrophy (PCA) dataset from the Dementia Research Centre (DRC), for which only a limited number of Magnetic Resonance Imaging (MRI) scans are available. Although validation is challenging due to lack of data in PCA, we validate DKT on synthetic data and two patient datasets (TADPOLE and PCA cohorts), showing it can estimate the ground truth parameters in the simulation and predict unseen biomarkers on the two patient datasets. While we demonstrated DKT on Alzheimer’s variants, we note DKT is generalisable to other forms of related neurodegenerative diseases. Source code for DKT is available online: https://github.com/mrazvan22/dkt. |
Tasks | Transfer Learning |
Published | 2019-01-11 |
URL | https://arxiv.org/abs/1901.03517v2 |
https://arxiv.org/pdf/1901.03517v2.pdf | |
PWC | https://paperswithcode.com/paper/disease-knowledge-transfer-across |
Repo | |
Framework | |
When Explainability Meets Adversarial Learning: Detecting Adversarial Examples using SHAP Signatures
Title | When Explainability Meets Adversarial Learning: Detecting Adversarial Examples using SHAP Signatures |
Authors | Gil Fidel, Ron Bitton, Asaf Shabtai |
Abstract | State-of-the-art deep neural networks (DNNs) are highly effective in solving many complex real-world problems. However, these models are vulnerable to adversarial perturbation attacks, and despite the plethora of research in this domain, to this day, adversaries still have the upper hand in the cat and mouse game of adversarial example generation methods vs. detection and prevention methods. In this research, we present a novel detection method that uses Shapley Additive Explanations (SHAP) values computed for the internal layers of a DNN classifier to discriminate between normal and adversarial inputs. We evaluate our method by building an extensive dataset of adversarial examples over the popular CIFAR-10 and MNIST datasets, and training a neural network-based detector to distinguish between normal and adversarial inputs. We evaluate our detector against adversarial examples generated by diverse state-of-the-art attacks and demonstrate its high detection accuracy and strong generalization ability to adversarial inputs generated with different attack methods. |
Tasks | |
Published | 2019-09-08 |
URL | https://arxiv.org/abs/1909.03418v1 |
https://arxiv.org/pdf/1909.03418v1.pdf | |
PWC | https://paperswithcode.com/paper/when-explainability-meets-adversarial |
Repo | |
Framework | |
An Implicit Form of Krasulina’s k-PCA Update without the Orthonormality Constraint
Title | An Implicit Form of Krasulina’s k-PCA Update without the Orthonormality Constraint |
Authors | Ehsan Amid, Manfred K. Warmuth |
Abstract | We offer new insights into the two commonly used updates for the online $k$-PCA problem, namely, Krasulina’s and Oja’s updates. We show that Krasulina’s update corresponds to a projected gradient descent step on the Stiefel manifold of the orthonormal $k$-frames, while Oja’s update amounts to a gradient descent step using the unprojected gradient. Following these observations, we derive a more \emph{implicit} form of Krasulina’s $k$-PCA update, i.e. a version that uses the information of the future gradient as much as possible. Most interestingly, our implicit Krasulina update avoids the costly QR-decomposition step by bypassing the orthonormality constraint. We show that the new update in fact corresponds to an online EM step applied to a probabilistic $k$-PCA model. The probabilistic view of the updates allows us to combine multiple models in a distributed setting. We show experimentally that the implicit Krasulina update yields superior convergence while being significantly faster. We also give strong evidence that the new update can benefit from parallelism and is more stable w.r.t. tuning of the learning rate. |
Tasks | |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04803v1 |
https://arxiv.org/pdf/1909.04803v1.pdf | |
PWC | https://paperswithcode.com/paper/an-implicit-form-of-krasulinas-k-pca-update |
Repo | |
Framework | |
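For context on the QR-decomposition cost mentioned in the abstract above, here is a minimal sketch of the textbook Oja update for online k-PCA, which takes a step with the unprojected gradient and then re-orthonormalizes with QR. It is the baseline, not the paper's implicit Krasulina update; the synthetic data, dimensions, and step-size schedule are assumptions chosen for illustration.

```python
import numpy as np


def oja_step(W, x, lr):
    """One Oja update for online k-PCA followed by QR re-orthonormalization.

    W: (d, k) matrix with orthonormal columns; x: (d,) sample; lr: step size.
    """
    W = W + lr * np.outer(x, x @ W)  # step along the unprojected gradient x x^T W
    Q, _ = np.linalg.qr(W)           # per-step orthonormalization (the costly part)
    return Q


rng = np.random.default_rng(0)
d, k = 10, 2

# Synthetic data with two dominant directions (variances invented for illustration).
scales = np.array([5.0, 4.0] + [0.5] * (d - 2))

# Random orthonormal starting point.
W, _ = np.linalg.qr(rng.standard_normal((d, k)))

for t in range(1, 2001):
    x = scales * rng.standard_normal(d)
    W = oja_step(W, x, lr=1.0 / (100 + t))
```

The QR call runs once per sample in this sketch, which is the per-step overhead the abstract says the implicit update avoids by dropping the orthonormality constraint.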
Prediction of individual progression rate in Parkinson’s disease using clinical measures and biomechanical measures of gait and postural stability
Title | Prediction of individual progression rate in Parkinson’s disease using clinical measures and biomechanical measures of gait and postural stability |
Authors | Vyom Raval, Kevin P. Nguyen, Ashley Gerald, Richard B. Dewey Jr., Albert Montillo |
Abstract | Parkinson’s disease (PD) is a common neurological disorder characterized by gait impairment. PD has no cure, and an impediment to developing a treatment is the lack of any accepted method to predict disease progression rate. The primary aim of this study was to develop a model using clinical measures and biomechanical measures of gait and postural stability to predict an individual’s PD progression over two years. Data from 160 PD subjects were utilized. Machine learning models, including XGBoost and Feed Forward Neural Networks, were developed using extensive model optimization and cross-validation. The highest performing model was a neural network that used a group of clinical measures, achieved a PPV of 71% in identifying fast progressors, and explained a large portion (37%) of the variance in an individual’s progression rate on held-out test data. This demonstrates the potential to predict individual PD progression rate and enrich trials by analyzing clinical and biomechanical measures with machine learning. |
Tasks | |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.10227v1 |
https://arxiv.org/pdf/1911.10227v1.pdf | |
PWC | https://paperswithcode.com/paper/prediction-of-individual-progression-rate-in |
Repo | |
Framework | |
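The two headline metrics in the abstract above, PPV for identifying fast progressors and the share of variance explained, can be computed as sketched below. This assumes PPV is the usual TP / (TP + FP) and that variance explained means the coefficient of determination R²; the abstract does not state the exact definitions, and all numbers here are toy values rather than study data.

```python
def positive_predictive_value(y_true, y_pred):
    """PPV = TP / (TP + FP) for binary labels, where 1 marks a fast progressor."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    if tp + fp == 0:
        raise ValueError("no positive predictions, PPV is undefined")
    return tp / (tp + fp)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("targets are constant, R^2 is undefined")
    return 1 - ss_res / ss_tot


# Toy values, invented for illustration only.
labels_true = [1, 0, 1, 1, 0, 0]
labels_pred = [1, 1, 1, 0, 0, 0]
rates_true = [1.0, 2.0, 3.0, 4.0]
rates_pred = [1.5, 2.0, 2.5, 4.0]

ppv = positive_predictive_value(labels_true, labels_pred)
r2 = r_squared(rates_true, rates_pred)
```

PPV answers "of the subjects flagged as fast progressors, how many truly were", which is the quantity that matters when the flagged group is used to enrich a trial.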
Biology and Compositionality: Empirical Considerations for Emergent-Communication Protocols
Title | Biology and Compositionality: Empirical Considerations for Emergent-Communication Protocols |
Authors | Travis LaCroix |
Abstract | Significant advances have been made in artificial systems by using biological systems as a guide. However, there is often little interaction between computational models for emergent communication and biological models of the emergence of language. Many researchers in language origins and emergent communication take compositionality as their primary target for explaining how simple communication systems can become more like natural language. However, there is reason to think that compositionality is the wrong target on the biological side, and so too the wrong target on the machine-learning side. As such, the purpose of this paper is to explore this claim. This has theoretical implications for language origins research more generally, but the focus here will be the implications for research on emergent communication in computer science and machine learning—specifically regarding the types of programmes that might be expected to work and those which will not. I further suggest an alternative approach for future research which focuses on reflexivity, rather than compositionality, as a target for explaining how simple communication systems may become more like natural language. I end by providing some reference to the language origins literature that may be of some use to researchers in machine learning. |
Tasks | |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11668v2 |
https://arxiv.org/pdf/1911.11668v2.pdf | |
PWC | https://paperswithcode.com/paper/biology-and-compositionality-empirical |
Repo | |
Framework | |