Paper Group ANR 852
Cross-Domain Cascaded Deep Feature Translation
Title | Cross-Domain Cascaded Deep Feature Translation |
Authors | Oren Katzir, Dani Lischinski, Daniel Cohen-Or |
Abstract | In recent years we have witnessed tremendous progress in unpaired image-to-image translation methods, propelled by the emergence of DNNs and adversarial training strategies. However, most existing methods focus on transfer of style and appearance, rather than on shape translation. The latter task is challenging, due to its intricate non-local nature, which calls for additional supervision. We mitigate this by descending the deep layers of a pre-trained network, where the deep features contain more semantics, and applying the translation from and between these deep features. Specifically, we leverage VGG, which is a classification network, pre-trained with large-scale semantic supervision. Our translation is performed in a cascaded, deep-to-shallow fashion, along the deep feature hierarchy: we first translate between the deepest layers that encode the higher-level semantic content of the image, proceeding to translate the shallower layers, conditioned on the deeper ones. We show that our method is able to translate between domains that exhibit significantly different shapes. We evaluate our method both qualitatively and quantitatively and compare it to state-of-the-art image-to-image translation methods. Our code and trained models will be made available. |
Tasks | Image-to-Image Translation |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01526v1 |
https://arxiv.org/pdf/1906.01526v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-domain-cascaded-deep-feature |
Repo | |
Framework | |
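The cascaded, deep-to-shallow translation described in the abstract can be pictured with a short sketch: VGG features are tapped at several depths, the deepest level is translated first, and each shallower level is translated conditioned on the already-translated deeper one. The `Translator` module, the tapped layer indices, and the channel sizes below are illustrative assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

# Hypothetical per-level translator: refines a feature map, optionally
# conditioned on the already-translated deeper level.
class Translator(nn.Module):
    def __init__(self, ch, cond_ch=0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch + cond_ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, feat, deeper=None):
        if deeper is not None:
            deeper = F.interpolate(deeper, size=feat.shape[-2:], mode="nearest")
            feat = torch.cat([feat, deeper], dim=1)
        return self.net(feat)

# Frozen VGG-19 feature extractor (older torchvision uses pretrained=True instead).
vgg = vgg19(weights="IMAGENET1K_V1").features.eval()
taps = {17: 256, 26: 512, 35: 512}      # assumed layer indices -> channel counts

def extract_features(x):
    feats = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in taps:
            feats[i] = x
    return feats

translators = {
    35: Translator(512),                 # deepest level, unconditioned
    26: Translator(512, cond_ch=512),    # conditioned on translated level 35
    17: Translator(256, cond_ch=512),    # conditioned on translated level 26
}

def cascaded_translate(image):
    feats = extract_features(image)
    deeper, translated = None, {}
    for idx in sorted(taps, reverse=True):           # 35 -> 26 -> 17
        deeper = translators[idx](feats[idx], deeper)
        translated[idx] = deeper
    # a decoder (not shown) would map the translated hierarchy back to pixels
    return translated
```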
Assessing Partisan Traits of News Text Attributions
Title | Assessing Partisan Traits of News Text Attributions |
Authors | Logan Martel, Edward Newell, Drew Margolin, Derek Ruths |
Abstract | On the topic of journalistic integrity, the current state of accurate, impartial news reporting has garnered much debate in the context of the 2016 US Presidential Election. In pursuit of computational evaluation of news text, the statements (attributions) ascribed by media outlets to sources provide a common category of evidence on which to operate. In this paper, we develop an approach to compare partisan traits of news text attributions and apply it to characterize differences in statements ascribed to candidate Hillary Clinton and incumbent President Donald Trump. In doing so, we present a model trained on over 600 in-house annotated attributions that identifies each candidate with accuracy > 88%. Finally, we discuss insights from its performance for future research. |
Tasks | |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1902.02179v1 |
http://arxiv.org/pdf/1902.02179v1.pdf | |
PWC | https://paperswithcode.com/paper/assessing-partisan-traits-of-news-text |
Repo | |
Framework | |
Zero-shot Dependency Parsing with Pre-trained Multilingual Sentence Representations
Title | Zero-shot Dependency Parsing with Pre-trained Multilingual Sentence Representations |
Authors | Ke Tran, Arianna Bisazza |
Abstract | We investigate whether off-the-shelf deep bidirectional sentence representations trained on a massively multilingual corpus (multilingual BERT) enable the development of an unsupervised universal dependency parser. This approach leverages only a mix of monolingual corpora in many languages and does not require any translation data, making it applicable to low-resource languages. In our experiments we outperform the best CoNLL 2018 language-specific systems in all of the shared task’s six truly low-resource languages while using a single system. However, we also find that (i) parsing accuracy still varies dramatically when changing the training languages and (ii) in some target languages zero-shot transfer fails under all tested conditions, raising concerns about the ‘universality’ of the whole approach. |
Tasks | Dependency Parsing |
Published | 2019-10-12 |
URL | https://arxiv.org/abs/1910.05479v1 |
https://arxiv.org/pdf/1910.05479v1.pdf | |
PWC | https://paperswithcode.com/paper/zero-shot-dependency-parsing-with-pre-trained |
Repo | |
Framework | |
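As a rough illustration of the "off-the-shelf representations" ingredient, the snippet below extracts frozen multilingual BERT vectors for the words of a sentence, which could then feed a dependency-parser head such as a biaffine arc scorer (not shown). The word-piece averaging and the use of the last hidden layer are assumptions; the paper's exact setup may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Standard public mBERT checkpoint, kept frozen.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased").eval()

def word_representations(words):
    """Return one vector per word by averaging its word-piece vectors."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]      # (num_pieces, 768)
    vecs = []
    for i in range(len(words)):
        piece_ids = [j for j, w in enumerate(enc.word_ids()) if w == i]
        vecs.append(hidden[piece_ids].mean(dim=0))
    return torch.stack(vecs)                              # (num_words, 768)

sent = ["Das", "ist", "ein", "Test", "."]
feats = word_representations(sent)   # input to e.g. a biaffine arc/label scorer
```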
ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems
Title | ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems |
Authors | Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Nazanin Esmaili, Massimo Piccardi |
Abstract | Regularization of neural machine translation is still a significant problem, especially in low-resource settings. To mitigate this problem, we propose regressing word embeddings (ReWE) as a new regularization technique in a system that is jointly trained to predict the next word in the translation (categorical value) and its word embedding (continuous value). Such joint training allows the proposed system to learn the distributional properties represented by the word embeddings, empirically improving the generalization to unseen sentences. Experiments over three translation datasets have shown a consistent improvement over a strong baseline, ranging between 0.91 and 2.54 BLEU points, and also a marked improvement over a state-of-the-art system. |
Tasks | Machine Translation, Word Embeddings |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02461v1 |
http://arxiv.org/pdf/1904.02461v1.pdf | |
PWC | https://paperswithcode.com/paper/rewe-regressing-word-embeddings-for |
Repo | |
Framework | |
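The core idea (a categorical next-word loss plus a regression toward the target word's embedding) can be written down compactly. A minimal sketch follows, assuming one decoder hidden state per target position; the cosine-based regression term and the weight `lam` are illustrative choices, not necessarily those of the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReWEHead(nn.Module):
    """Two heads on the decoder state: next-token logits and a regressed embedding."""
    def __init__(self, hidden_size, vocab_size, emb_dim):
        super().__init__()
        self.vocab_proj = nn.Linear(hidden_size, vocab_size)   # categorical head
        self.emb_proj = nn.Linear(hidden_size, emb_dim)         # regression head

    def forward(self, decoder_states):
        return self.vocab_proj(decoder_states), self.emb_proj(decoder_states)

def rewe_loss(logits, pred_emb, target_ids, embedding_table, lam=0.1):
    # standard cross-entropy over the vocabulary
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))
    # regression of the continuous prediction toward the target word's embedding
    target_emb = embedding_table(target_ids)                   # (batch, len, emb_dim)
    reg = (1 - F.cosine_similarity(pred_emb, target_emb, dim=-1)).mean()
    return ce + lam * reg
```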
Snowball: Iterative Model Evolution and Confident Sample Discovery for Semi-Supervised Learning on Very Small Labeled Datasets
Title | Snowball: Iterative Model Evolution and Confident Sample Discovery for Semi-Supervised Learning on Very Small Labeled Datasets |
Authors | Yang Li, Jianhe Yuan, Zhiqun Zhao, Hao Sun, Zhihai He |
Abstract | In this work, we develop a joint sample discovery and iterative model evolution method for semi-supervised learning on very small labeled training sets. We propose a master-teacher-student model framework to provide multi-layer guidance during the model evolution process with multiple iterations and generations. The teacher model is constructed by performing an exponential moving average of the student models obtained from past training steps. The master network combines the knowledge of the student and teacher models with additional access to newly discovered samples. The master and teacher models are then used to guide the training of the student network by enforcing consistency between their predictions on unlabeled samples, and all models evolve as more and more samples are discovered. Our extensive experiments demonstrate that discovering confident samples from the unlabeled dataset, once coupled with the above master-teacher-student network evolution, can significantly improve the overall semi-supervised learning performance. For example, on the CIFAR-10 dataset, with a very small set of 250 labeled samples, our method achieves an error rate of 11.81%, more than 38% lower than the state-of-the-art method Mean-Teacher (49.91%). |
Tasks | |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01542v1 |
https://arxiv.org/pdf/1909.01542v1.pdf | |
PWC | https://paperswithcode.com/paper/snowball-iterative-model-evolution-and |
Repo | |
Framework | |
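Two of the mechanisms named in the abstract are easy to sketch: the teacher as an exponential moving average (EMA) of the student's weights, and a consistency loss between their predictions on unlabeled samples. The confidence threshold used to "discover" samples below is an assumed stand-in for the paper's selection rule, and the master network is omitted.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    """Teacher parameters track an exponential moving average of the student's."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1 - decay)

def consistency_loss(student_logits, teacher_logits):
    """Penalize disagreement between student and teacher on unlabeled inputs."""
    return F.mse_loss(F.softmax(student_logits, dim=1),
                      F.softmax(teacher_logits, dim=1))

def discover_confident(teacher_logits, threshold=0.95):
    """Indices and pseudo-labels of samples the teacher predicts with high confidence."""
    probs = F.softmax(teacher_logits, dim=1)
    conf, pseudo = probs.max(dim=1)
    idx = (conf > threshold).nonzero(as_tuple=True)[0]
    return idx, pseudo[idx]
```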
A Broad Class of Discrete-Time Hypercomplex-Valued Hopfield Neural Networks
Title | A Broad Class of Discrete-Time Hypercomplex-Valued Hopfield Neural Networks |
Authors | Fidelis Zanetti de Castro, Marcos Eduardo Valle |
Abstract | In this paper, we address the stability of a broad class of discrete-time hypercomplex-valued Hopfield-type neural networks. To ensure the neural networks belonging to this class always settle down at a stationary state, we introduce novel hypercomplex number systems referred to as real-part associative hypercomplex number systems. Real-part associative hypercomplex number systems generalize the well-known Cayley-Dickson algebras and real Clifford algebras and include the systems of real numbers, complex numbers, dual numbers, hyperbolic numbers, quaternions, tessarines, and octonions as particular instances. Apart from the novel hypercomplex number systems, we introduce a family of hypercomplex-valued activation functions called $\mathcal{B}$-projection functions. Broadly speaking, a $\mathcal{B}$-projection function projects the activation potential onto the set of all possible states of a hypercomplex-valued neuron. Using the theory presented in this paper, we confirm the stability analysis of several discrete-time hypercomplex-valued Hopfield-type neural networks from the literature. Moreover, we introduce and provide the stability analysis of a general class of Hopfield-type neural networks on Cayley-Dickson algebras. |
Tasks | |
Published | 2019-02-14 |
URL | https://arxiv.org/abs/1902.05478v3 |
https://arxiv.org/pdf/1902.05478v3.pdf | |
PWC | https://paperswithcode.com/paper/a-broad-class-of-discrete-time-hypercomplex |
Repo | |
Framework | |
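As one concrete, low-dimensional instance of the setting described here, the sketch below implements a complex-valued Hopfield-type network whose activation projects the activation potential onto a finite set of unit-circle states, in the spirit of a projection-style activation. The Hebbian weights, the number of admissible states `K`, and the synchronous update are illustrative assumptions and do not reproduce the paper's general hypercomplex construction.

```python
import numpy as np

def project_to_states(potential, K=8):
    """Project each activation potential onto K equally spaced unit-circle states."""
    angles = 2 * np.pi * np.arange(K) / K
    states = np.exp(1j * angles)                       # admissible neuron states
    # nearest admissible state = largest real part of <state, potential>
    scores = np.real(potential[:, None] * states[None, :].conj())
    return states[np.argmax(scores, axis=1)]

def hebbian_weights(patterns):
    """patterns: (P, N) complex array of stored unit-modulus patterns."""
    P, N = patterns.shape
    W = patterns.T @ patterns.conj() / N               # W_ij = (1/N) sum_mu x_i conj(x_j)
    np.fill_diagonal(W, 0)
    return W

def recall(W, x, iters=20, K=8):
    """Synchronous Hopfield-type iteration with the projection activation."""
    for _ in range(iters):
        x = project_to_states(W @ x, K)
    return x
```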
Intra-Ensemble in Neural Networks
Title | Intra-Ensemble in Neural Networks |
Authors | Yuan Gao, Zixiang Cai, Yimin Chen, Wenke Chen, Kan Yang, Chen Sun, Cong Yao |
Abstract | Improving model performance is a central problem in machine learning, including deep learning. However, stand-alone neural networks suffer from diminishing returns when more layers are stacked. At the same time, ensembling is a useful technique for further enhancing model performance. Nevertheless, training several independent stand-alone deep neural networks multiplies the required resources. In this work, we propose Intra-Ensemble, an end-to-end strategy with stochastic training operations to train several sub-networks simultaneously within one neural network. The additional parameter overhead is marginal, since the majority of parameters are mutually shared. Meanwhile, stochastic training increases the diversity of the weight-sharing sub-networks, which significantly enhances intra-ensemble performance. Extensive experiments demonstrate the applicability of intra-ensemble to various kinds of datasets and network architectures. Our models achieve results comparable with state-of-the-art architectures on CIFAR-10 and CIFAR-100. |
Tasks | |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.04466v1 |
http://arxiv.org/pdf/1904.04466v1.pdf | |
PWC | https://paperswithcode.com/paper/intra-ensemble-in-neural-networks |
Repo | |
Framework | |
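A rough sketch of the weight-sharing idea: several sub-networks share one backbone and differ only in lightweight heads; during training one sub-network is sampled per step, and at test time their predictions are averaged. The head design and the sampling scheme are assumptions; the paper's actual stochastic training operations are more involved.

```python
import random
import torch
import torch.nn as nn

class IntraEnsemble(nn.Module):
    def __init__(self, backbone, feat_dim, num_classes, num_subnets=4):
        super().__init__()
        self.backbone = backbone                            # shared parameters
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, num_classes) for _ in range(num_subnets)
        )

    def forward(self, x, subnet=None):
        feat = self.backbone(x)
        if subnet is not None:                              # training: one sampled sub-net
            return self.heads[subnet](feat)
        return torch.stack([h(feat) for h in self.heads]).mean(0)  # test: ensemble

# Illustrative training step: sample which sub-network receives this batch.
# model = IntraEnsemble(backbone, feat_dim=512, num_classes=10)
# logits = model(images, subnet=random.randrange(len(model.heads)))
```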
Deep Learning the EEG Manifold for Phonological Categorization from Active Thoughts
Title | Deep Learning the EEG Manifold for Phonological Categorization from Active Thoughts |
Authors | Pramit Saha, Muhammad Abdul-Mageed, Sidney Fels |
Abstract | Speech-related Brain Computer Interfaces (BCI) aim primarily at finding an alternative vocal communication pathway for people with speaking disabilities. As a step towards full decoding of imagined speech from active thoughts, we present a BCI system for subject-independent classification of phonological categories exploiting a novel deep learning based hierarchical feature extraction scheme. To better capture the complex representation of high-dimensional electroencephalography (EEG) data, we compute the joint variability of EEG electrodes into a channel cross-covariance matrix. We then extract the spatio-temporal information encoded within the matrix using a mixed deep neural network strategy. Our model framework is composed of a convolutional neural network (CNN), a long short-term memory (LSTM) network, and a deep autoencoder. We train the individual networks hierarchically, feeding their combined outputs into a final gradient boosting classification step. Our best models achieve an average accuracy of 77.9% across five different binary classification tasks, providing a significant 22.5% improvement over previous methods. As we also show visually, our work demonstrates that speech imagery EEG possesses significant discriminative information about the intended articulatory movements responsible for natural speech synthesis. |
Tasks | EEG, Speech Synthesis |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.04358v1 |
http://arxiv.org/pdf/1904.04358v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-the-eeg-manifold-for |
Repo | |
Framework | |
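The channel cross-covariance input mentioned in the abstract is straightforward to compute; a small sketch is below. The electrode and sample counts are made up for illustration, and the hierarchical CNN/LSTM/autoencoder stack and the gradient boosting classifier that consume this matrix are not shown.

```python
import numpy as np

def channel_cross_covariance(trial):
    """trial: (n_channels, n_samples) EEG array -> (n_channels, n_channels) covariance."""
    centered = trial - trial.mean(axis=1, keepdims=True)
    return centered @ centered.T / (trial.shape[1] - 1)

trial = np.random.randn(64, 1000)        # e.g. 64 electrodes, 1000 time samples
cov = channel_cross_covariance(trial)    # input to the hierarchical deep model
```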
Arterial incident duration prediction using a bi-level framework of extreme gradient-tree boosting
Title | Arterial incident duration prediction using a bi-level framework of extreme gradient-tree boosting |
Authors | Adriana-Simona Mihaita, Zheyuan Liu, Chen Cai, Marian-Andrei Rizoiu |
Abstract | Predicting traffic incident duration is a major challenge for many traffic centres around the world. Most research studies focus on predicting incident duration on motorways rather than arterial roads, due to high network complexity and a lack of data. In this paper we propose a bi-level framework for predicting accident duration on arterial road networks in Sydney, based on the operational requirement that incidents be cleared within 45 minutes. Using incident baseline information, we first deploy a classification method using various ensemble tree models to predict whether a new incident will be cleared in less than 45 minutes. If the incident is classified as short-term, various regression models are then developed to predict the actual incident duration in minutes by incorporating various traffic flow features. After outlier removal and intensive model hyper-parameter tuning through randomized search and cross-validation, we show that the extreme gradient boosting approach outperforms all other models, including gradient-boosted decision trees, by almost 53%. Finally, we perform a feature importance evaluation for incident duration prediction and show that the best prediction results are obtained when leveraging the real-time traffic flow in road sections in the vicinity of the reported accident location. |
Tasks | Feature Importance |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12254v1 |
https://arxiv.org/pdf/1905.12254v1.pdf | |
PWC | https://paperswithcode.com/paper/arterial-incident-duration-prediction-using-a |
Repo | |
Framework | |
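The bi-level structure (first classify whether an incident clears within 45 minutes, then regress the duration for those classified as short-term) can be sketched with the XGBoost scikit-learn wrappers. The hyper-parameters, and treating long incidents as out of scope for the regressor, are assumptions; the paper's feature engineering, outlier removal, and randomized hyper-parameter search are omitted.

```python
import numpy as np
from xgboost import XGBClassifier, XGBRegressor

THRESHOLD_MIN = 45   # operational clearance target in minutes

def fit_bilevel(X, duration_min):
    """Level 1: short vs. long incident. Level 2: duration regression for short ones."""
    is_short = (duration_min < THRESHOLD_MIN).astype(int)
    clf = XGBClassifier(n_estimators=300, max_depth=6).fit(X, is_short)
    short_mask = is_short == 1
    reg = XGBRegressor(n_estimators=300, max_depth=6).fit(
        X[short_mask], duration_min[short_mask]
    )
    return clf, reg

def predict_bilevel(clf, reg, X):
    short = clf.predict(X).astype(bool)
    duration = np.full(len(X), np.nan)    # long incidents handled by a separate model
    if short.any():
        duration[short] = reg.predict(X[short])
    return short, duration
```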
Recognition of Advertisement Emotions with Application to Computational Advertising
Title | Recognition of Advertisement Emotions with Application to Computational Advertising |
Authors | Abhinav Shukla, Shruti Shriya Gullapuram, Harish Katti, Mohan Kankanhalli, Stefan Winkler, Ramanathan Subramanian |
Abstract | Advertisements (ads) often contain strong affective content to capture viewer attention and convey an effective message to the audience. However, most computational affect recognition (AR) approaches examine ads via the text modality, and only limited work has been devoted to decoding ad emotions from audiovisual or user cues. This work (1) compiles an affective ad dataset capable of evoking coherent emotions across users; (2) explores the efficacy of content-centric convolutional neural network (CNN) features for AR vis-à-vis handcrafted audio-visual descriptors; (3) examines user-centric ad AR from Electroencephalogram (EEG) responses acquired during ad-viewing, and (4) demonstrates how better affect predictions facilitate effective computational advertising as determined by a study involving 18 users. Experiments reveal that (a) CNN features outperform audiovisual descriptors for content-centric AR; (b) EEG features are able to encode ad-induced emotions better than content-based features; (c) Multi-task learning performs best among a slew of classification algorithms to achieve optimal AR, and (d) Pursuant to (b), EEG features also enable optimized ad insertion onto streamed video, as compared to content-based or manual insertion techniques in terms of ad memorability and overall user experience. |
Tasks | EEG, Multi-Task Learning |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.01778v1 |
http://arxiv.org/pdf/1904.01778v1.pdf | |
PWC | https://paperswithcode.com/paper/recognition-of-advertisement-emotions-with |
Repo | |
Framework | |
Incremental Reinforcement Learning — a New Continuous Reinforcement Learning Frame Based on Stochastic Differential Equation methods
Title | Incremental Reinforcement Learning — a New Continuous Reinforcement Learning Frame Based on Stochastic Differential Equation methods |
Authors | Tianhao Chen, Limei Cheng, Yang Liu, Wenchuan Jia, Shugen Ma |
Abstract | Continuous reinforcement learning methods such as DDPG and A3C are widely used in robot control and autonomous driving. However, both methods have theoretical weaknesses. While DDPG cannot control noise in the control process, A3C does not satisfy the continuity conditions under the Gaussian policy. To address these concerns, we propose a new continuous reinforcement learning method based on stochastic differential equations, which we call Incremental Reinforcement Learning (IRL). This method not only guarantees the continuity of actions within any time interval, but also controls the variance of actions during training. In addition, our method does not assume Markovian action control and allows agents to predict scene changes when selecting actions. With our method, agents no longer passively adapt to the environment. Instead, they actively interact with the environment to maximize rewards. |
Tasks | Autonomous Driving |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.02974v1 |
https://arxiv.org/pdf/1908.02974v1.pdf | |
PWC | https://paperswithcode.com/paper/incremental-reinforcement-learning-a-new |
Repo | |
Framework | |
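The incremental-action idea can be illustrated with an Euler-Maruyama step of a stochastic differential equation: the policy supplies a drift and a diffusion term, the action is updated by a small increment each step, so consecutive actions are continuous and their variance is explicitly controlled by the diffusion term. The drift and diffusion functions below are placeholders, not the networks or training procedure from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def drift(state, action):
    return -0.5 * action                      # placeholder policy drift mu(s, a)

def diffusion(state, action):
    return 0.1 * np.ones_like(action)         # placeholder sigma(s, a), controls variance

def incremental_step(state, action, dt=0.05):
    """Euler-Maruyama update: a <- a + mu*dt + sigma*dW, keeping actions continuous."""
    dW = rng.normal(scale=np.sqrt(dt), size=action.shape)   # Brownian increment
    return action + drift(state, action) * dt + diffusion(state, action) * dW

action = np.zeros(2)
for t in range(100):
    state = None                               # environment state would go here
    action = incremental_step(state, action)   # actions evolve continuously over time
```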
Interpreting a Recurrent Neural Network Model for ICU Mortality Using Learned Binary Masks
Title | Interpreting a Recurrent Neural Network Model for ICU Mortality Using Learned Binary Masks |
Authors | Long V. Ho, Melissa D. Aczon, David Ledbetter, Randall Wetzel |
Abstract | An attribution method was developed to interpret a recurrent neural network (RNN) trained to predict a child’s risk of ICU mortality using multi-modal, time series data in the Electronic Medical Records. By learning a sparse, binary mask that highlights salient features of the input data, critical features determining an individual patient’s severity of illness could be identified. The method, called Learned Binary Masks (LBM), demonstrated that the RNN used different feature sets specific to each patient’s illness; and further, the features highlighted aligned with clinical intuition of the patient’s disease trajectories. LBM was also used to identify the most salient features across the model, analogous to “feature importance” computed in the Random Forest. This measure of the RNN’s feature importance was further used to select the 25% most used features for training a second RNN model. Interestingly, but not surprisingly, the second model maintained similar performance to the model trained on all features. LBM is data-agnostic and can be used to interpret the predictions of any differentiable model. |
Tasks | Feature Importance, Time Series |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09865v2 |
https://arxiv.org/pdf/1905.09865v2.pdf | |
PWC | https://paperswithcode.com/paper/interpreting-a-recurrent-neural-network-model |
Repo | |
Framework | |
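A minimal sketch in the spirit of a learned binary mask: optimize per-feature mask logits so that the masked input reproduces the model's original output while a sparsity penalty drives the (sigmoid-relaxed) mask toward zero, then threshold it. The loss form, the penalty weight, and the 0.5 threshold are assumptions; the paper's exact objective and binarization are not reproduced here.

```python
import torch
import torch.nn.functional as F

def learn_mask(model, x, steps=300, sparsity=0.01, lr=0.05):
    """Learn a sparse mask over input features that preserves the model's prediction."""
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)                          # only the mask is optimized
    with torch.no_grad():
        target = model(x)                                # prediction to preserve
    logits = torch.zeros_like(x, requires_grad=True)     # one logit per input feature
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        mask = torch.sigmoid(logits)                     # relaxed binary mask in (0, 1)
        loss = F.mse_loss(model(x * mask), target) + sparsity * mask.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (torch.sigmoid(logits) > 0.5).float()         # hard mask of salient features
```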
Naive probability
Title | Naive probability |
Authors | Zalan Gyenis, Andras Kornai |
Abstract | We describe a rational but low-resolution model of probability. |
Tasks | |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.10924v1 |
https://arxiv.org/pdf/1905.10924v1.pdf | |
PWC | https://paperswithcode.com/paper/190510924 |
Repo | |
Framework | |
Hierarchical Target-Attentive Diagnosis Prediction in Heterogeneous Information Networks
Title | Hierarchical Target-Attentive Diagnosis Prediction in Heterogeneous Information Networks |
Authors | Anahita Hosseini, Tyler Davis, Majid Sarrafzadeh |
Abstract | We introduce HTAD, a novel model for diagnosis prediction using Electronic Health Records (EHR) represented as Heterogeneous Information Networks. Recent studies on modeling EHR have shown success in automatically learning representations of the clinical records in order to avoid the need for manual feature selection. However, these representations are often learned and aggregated without specificity for the different possible targets being predicted. Our model introduces a target-aware hierarchical attention mechanism that allows it to learn to attend to the most important clinical records when aggregating their representations for prediction of a diagnosis. We evaluate our model using a publicly available benchmark dataset and demonstrate that the use of target-aware attention significantly improves performance compared to the current state of the art. Additionally, we propose a method for incorporating non-categorical data into our predictions and demonstrate that this technique leads to further performance improvements. Lastly, we demonstrate that the predictions made by our proposed model are easily interpretable. |
Tasks | Feature Selection |
Published | 2019-12-22 |
URL | https://arxiv.org/abs/1912.10552v1 |
https://arxiv.org/pdf/1912.10552v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-target-attentive-diagnosis |
Repo | |
Framework | |
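The target-aware attention ingredient can be sketched as follows: given embeddings of a patient's clinical records and an embedding of the candidate diagnosis, the records are scored against the target and aggregated with attention weights, yielding a target-specific patient representation. The bilinear scoring form and all dimensions are assumptions, and the hierarchical part of the paper's mechanism is not shown.

```python
import torch
import torch.nn as nn

class TargetAttention(nn.Module):
    def __init__(self, record_dim, target_dim):
        super().__init__()
        self.score = nn.Bilinear(record_dim, target_dim, 1)   # record-target relevance

    def forward(self, records, target):
        # records: (num_records, record_dim); target: (target_dim,)
        t = target.expand(records.size(0), -1)
        weights = torch.softmax(self.score(records, t).squeeze(-1), dim=0)
        return (weights.unsqueeze(-1) * records).sum(dim=0)   # target-specific summary

att = TargetAttention(record_dim=128, target_dim=64)
records = torch.randn(20, 128)        # embeddings of 20 clinical records
target = torch.randn(64)              # embedding of the candidate diagnosis
patient_vec = att(records, target)    # aggregated representation for this target
```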
Distinguishing Clinical Sentiment: The Importance of Domain Adaptation in Psychiatric Patient Health Records
Title | Distinguishing Clinical Sentiment: The Importance of Domain Adaptation in Psychiatric Patient Health Records |
Authors | Eben Holderness, Philip Cawkwell, Kirsten Bolton, James Pustejovsky, Mei-Hua Hall |
Abstract | Recently natural language processing (NLP) tools have been developed to identify and extract salient risk indicators in electronic health records (EHRs). Sentiment analysis, although widely used in non-medical areas for improving decision making, has been studied minimally in the clinical setting. In this study, we undertook, to our knowledge, the first domain adaptation of sentiment analysis to psychiatric EHRs by defining psychiatric clinical sentiment, performing an annotation project, and evaluating multiple sentence-level sentiment machine learning (ML) models. Results indicate that off-the-shelf sentiment analysis tools fail in identifying clinically positive or negative polarity, and that the definition of clinical sentiment that we provide is learnable with relatively small amounts of training data. This project is an initial step towards further refining sentiment analysis methods for clinical use. Our long-term objective is to incorporate the results of this project as part of a machine learning model that predicts inpatient readmission risk. We hope that this work will initiate a discussion concerning domain adaptation of sentiment analysis to the clinical setting. |
Tasks | Decision Making, Domain Adaptation, Sentiment Analysis |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.03225v1 |
http://arxiv.org/pdf/1904.03225v1.pdf | |
PWC | https://paperswithcode.com/paper/distinguishing-clinical-sentiment-the |
Repo | |
Framework | |