October 15, 2019

2830 words 14 mins read

Paper Group NANR 48

Initialization matters: Orthogonal Predictive State Recurrent Neural Networks. Integrating Knowledge-Supported Search into the INCEpTION Annotation Platform. Homonym Detection For Humor Recognition In Short Text. Microblog Conversation Recommendation via Joint Modeling of Topics and Discourse. The University of Maryland’s Chinese-English Neural Mac …

Initialization matters: Orthogonal Predictive State Recurrent Neural Networks

Title Initialization matters: Orthogonal Predictive State Recurrent Neural Networks
Authors Krzysztof Choromanski, Carlton Downey, Byron Boots
Abstract Learning to predict complex time-series data is a fundamental challenge in a range of disciplines including Machine Learning, Robotics, and Natural Language Processing. Predictive State Recurrent Neural Networks (PSRNNs) (Downey et al.) are a state-of-the-art approach for modeling time-series data which combines the benefits of probabilistic filters and Recurrent Neural Networks in a single model. PSRNNs leverage the concept of Hilbert Space Embeddings of distributions (Smola et al.) to embed predictive states into a Reproducing Kernel Hilbert Space, then estimate, predict, and update these embedded states using Kernel Bayes' Rule. Practical implementations of PSRNNs are made possible by the machinery of Random Features (RFs), which map input features into a new space in which dot products approximate the kernel well. Unfortunately, PSRNNs often require a large number of RFs to obtain good results, resulting in large models which are slow to execute and slow to train. Orthogonal Random Features (ORFs) (Choromanski et al.) are an improvement on RFs that has been shown to decrease the number of RFs required for pointwise kernel approximation. Unfortunately, it is not clear that ORFs can be applied to PSRNNs, as PSRNNs rely on Kernel Ridge Regression as a core component of their learning algorithm, and the theoretical guarantees of ORFs do not apply in this setting. In this paper, we extend the theory of ORFs to Kernel Ridge Regression and show that ORFs can be used to obtain Orthogonal PSRNNs (OPSRNNs), which are smaller and faster than PSRNNs. In particular, we show that OPSRNN models clearly outperform LSTMs and, furthermore, can achieve accuracy similar to PSRNNs with an order of magnitude fewer features.
Tasks Time Series
Published 2018-01-01
URL https://openreview.net/forum?id=HJJ23bW0b
PDF https://openreview.net/pdf?id=HJJ23bW0b
PWC https://paperswithcode.com/paper/initialization-matters-orthogonal-predictive
Repo
Framework
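
The abstract turns on orthogonal random features reducing the number of RFs needed for kernel approximation. As a hedged illustration of that building block (not the authors' PSRNN code), here is a minimal NumPy sketch comparing i.i.d. and orthogonal random Fourier features for the Gaussian kernel; all sizes are placeholders.

```python
# Minimal sketch: orthogonal random features (ORFs) vs. plain i.i.d. random
# features for approximating the Gaussian (RBF) kernel. Illustrative only;
# this is not the authors' PSRNN implementation.
import numpy as np

def random_features(X, W):
    """Map X to random Fourier features phi(x) such that
    phi(x).dot(phi(y)) approximates exp(-||x - y||^2 / 2)."""
    D = W.shape[0]
    proj = X @ W.T
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(D)

def gaussian_matrix(D, d, rng):
    return rng.standard_normal((D, d))

def orthogonal_matrix(D, d, rng):
    """Stack orthogonal blocks; each row is rescaled to a chi-distributed
    norm so the rows match the marginals of a Gaussian matrix."""
    blocks, rows = [], 0
    while rows < D:
        G = rng.standard_normal((d, d))
        Q, _ = np.linalg.qr(G)          # orthonormal rows
        norms = np.linalg.norm(rng.standard_normal((d, d)), axis=1)
        blocks.append(Q * norms[:, None])
        rows += d
    return np.vstack(blocks)[:D]

rng = np.random.default_rng(0)
d, D, n = 16, 64, 200
X = rng.standard_normal((n, d))
K_true = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))

for name, W in [("iid", gaussian_matrix(D, d, rng)),
                ("orthogonal", orthogonal_matrix(D, d, rng))]:
    Phi = random_features(X, W)
    err = np.max(np.abs(Phi @ Phi.T - K_true))
    print(f"{name:>10s} RFs: max kernel approx. error = {err:.4f}")
```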

Integrating Knowledge-Supported Search into the INCEpTION Annotation Platform

Title Integrating Knowledge-Supported Search into the INCEpTION Annotation Platform
Authors Beto Boullosa, Richard Eckart de Castilho, Naveen Kumar, Jan-Christoph Klie, Iryna Gurevych
Abstract Annotating entity mentions and linking them to a knowledge resource are essential tasks in many domains. Linking disambiguates mentions, introduces cross-document coreferences, and lets the resources contribute extra information, e.g. taxonomic relations. Such tasks benefit from text annotation tools that integrate a search covering the text, the annotations, and the knowledge resource. However, to the best of our knowledge, no current tool integrates both knowledge-supported search and entity linking support. We address this gap by introducing knowledge-supported search functionality into the INCEpTION text annotation platform. In our approach, cross-document references are created by linking entity mentions to a knowledge base in the form of a structured hierarchical vocabulary. The resulting annotations are then indexed to enable fast yet complex queries that take into account the text, the annotations, and the vocabulary structure.
Tasks Entity Linking
Published 2018-11-01
URL https://www.aclweb.org/anthology/D18-2022/
PDF https://www.aclweb.org/anthology/D18-2022
PWC https://paperswithcode.com/paper/integrating-knowledge-supported-search-into
Repo
Framework

Homonym Detection For Humor Recognition In Short Text

Title Homonym Detection For Humor Recognition In Short Text
Authors Sven van den Beukel, Lora Aroyo
Abstract In this paper, automatic homophone and homograph detection are suggested as new, useful features for humor recognition systems. The system combines style features from previous studies on humor recognition in short text with ambiguity-based features. The performance of two potentially useful homograph detection methods is evaluated using crowdsourced annotations as ground truth. Adding homophones and homographs as features to the classifier results in a small but significant improvement over the style features alone. For the task of humor recognition, recall appears to be a more important quality measure than precision. Although the system was designed for humor recognition in one-liners, it also performs well at classifying longer humorous texts.
Tasks
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-6242/
PDF https://www.aclweb.org/anthology/W18-6242
PWC https://paperswithcode.com/paper/homonym-detection-for-humor-recognition-in
Repo
Framework
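
To make the ambiguity features concrete, here is a toy sketch of homophone detection by grouping words on pronunciation. The tiny pronunciation dictionary is hypothetical (a real system would use a resource such as the CMU Pronouncing Dictionary), and the paper's actual detectors, evaluated against crowdsourced annotations, are not reproduced here.

```python
# Toy sketch of homophone detection: group words by pronunciation and flag
# any pronunciation shared by more than one spelling. PRONUNCIATIONS is a
# hypothetical stand-in for a real pronunciation lexicon.
from collections import defaultdict

PRONUNCIATIONS = {
    "bank":   "B AE1 NG K",
    "knight": "N AY1 T",
    "night":  "N AY1 T",
    "flour":  "F L AW1 ER0",
    "flower": "F L AW1 ER0",
}

def homophone_sets(words):
    by_sound = defaultdict(set)
    for w in words:
        pron = PRONUNCIATIONS.get(w.lower())
        if pron:
            by_sound[pron].add(w.lower())
    return [ws for ws in by_sound.values() if len(ws) > 1]

def count_homophones(sentence):
    """Feature value: how many tokens have a same-sounding rival word."""
    tokens = sentence.lower().split()
    sets = homophone_sets(PRONUNCIATIONS)   # all known homophone groups
    rivals = {w for ws in sets for w in ws}
    return sum(t in rivals for t in tokens)

print(count_homophones("The knight rode through the night"))  # -> 2
```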

Microblog Conversation Recommendation via Joint Modeling of Topics and Discourse

Title Microblog Conversation Recommendation via Joint Modeling of Topics and Discourse
Authors Xingshan Zeng, Jing Li, Lu Wang, Nicholas Beauchamp, Sarah Shugars, Kam-Fai Wong
Abstract Millions of conversations are generated every day on social media platforms. With limited attention, it is challenging for users to select which discussions they would like to participate in. Here we propose a new method for microblog conversation recommendation. While much prior work has focused on post-level recommendation, we exploit both the conversational context and user content and behavior preferences. We propose a statistical model that jointly captures: (1) topics for representing user interests and conversation content, and (2) discourse modes for describing user replying behavior and conversation dynamics. Experimental results on two Twitter datasets demonstrate that our system outperforms methods that only model content without considering discourse.
Tasks
Published 2018-06-01
URL https://www.aclweb.org/anthology/N18-1035/
PDF https://www.aclweb.org/anthology/N18-1035
PWC https://paperswithcode.com/paper/microblog-conversation-recommendation-via
Repo
Framework

The University of Maryland’s Chinese-English Neural Machine Translation Systems at WMT18

Title The University of Maryland’s Chinese-English Neural Machine Translation Systems at WMT18
Authors Weijia Xu, Marine Carpuat
Abstract This paper describes the University of Maryland's submission to the WMT 2018 Chinese↔English news translation tasks. Our systems are BPE-based self-attentional Transformer networks with parallel and backtranslated monolingual training data. Using ensembling and reranking, we improve over the Transformer baseline by +1.4 BLEU for Chinese→English and +3.97 BLEU for English→Chinese on newstest2017. Our best systems reach BLEU scores of 24.4 for Chinese→English and 39.0 for English→Chinese on newstest2018.
Tasks Machine Translation
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-6431/
PDF https://www.aclweb.org/anthology/W18-6431
PWC https://paperswithcode.com/paper/the-university-of-marylands-chinese-english
Repo
Framework

Team GESIS Cologne: An all in all sentence-based approach for FEVER

Title Team GESIS Cologne: An all in all sentence-based approach for FEVER
Authors Wolfgang Otto
Abstract In this system description of our pipeline for the FEVER Shared Task, we describe our sentence-based approach. Throughout all steps of the pipeline, we treated single sentences as our processing unit. In the IR component, we searched the set of all Wikipedia introduction sentences without limiting candidates to a fixed number of relevant documents. In the entailment module, we judged every sentence separately and combined the classifier results for the top 5 sentences with an ensemble classifier to decide whether the truth of a statement can be derived from the given claim.
Tasks Coreference Resolution
Published 2018-11-01
URL https://www.aclweb.org/anthology/W18-5524/
PDF https://www.aclweb.org/anthology/W18-5524
PWC https://paperswithcode.com/paper/team-gesis-cologne-an-all-in-all-sentence
Repo
Framework
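
A minimal sketch of the final step described above, combining per-sentence entailment scores for the top 5 sentences with an ensemble classifier. The logistic-regression combiner and the simulated scores are assumptions for illustration, not the paper's exact ensemble.

```python
# Sketch of the "combine top-5 sentence judgments" step: each candidate
# sentence gets per-class entailment probabilities, and a combiner over the
# stacked scores makes the final verdict. The logistic-regression combiner
# and simulated data are stand-in assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_claims, top_k, n_classes = 500, 5, 3   # SUPPORTS / REFUTES / NOT ENOUGH INFO

# Simulated per-sentence entailment probabilities, shape (claims, 5, 3)
sentence_probs = rng.dirichlet(np.ones(n_classes), size=(n_claims, top_k))
labels = rng.integers(0, n_classes, size=n_claims)  # placeholder labels

X = sentence_probs.reshape(n_claims, top_k * n_classes)  # stack top-5 scores
combiner = LogisticRegression(max_iter=1000).fit(X, labels)
print(combiner.predict(X[:3]))
```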

Measures of distortion for machine learning

Title Measures of distortion for machine learning
Authors Leena Chennuru Vankadara, Ulrike Von Luxburg
Abstract Given data from a general metric space, a standard machine learning pipeline is to first embed the data into a Euclidean space and subsequently apply out-of-the-box machine learning algorithms to analyze the data. The quality of such an embedding is typically described in terms of a distortion measure. In this paper, we show that many of the existing distortion measures behave in an undesired way when considered from a machine learning point of view. We investigate desirable properties of distortion measures and formally prove that most of the existing measures fail to satisfy these properties. These theoretical findings are supported by simulations, which for example demonstrate that existing distortion measures are not robust to noise or outliers and cannot serve as good indicators for classification accuracy. As an alternative, we suggest a new measure of distortion, called $\sigma$-distortion. We show, both in theory and in experiments, that it satisfies all desirable properties and is a better candidate to evaluate distortion in the context of machine learning.
Tasks
Published 2018-12-01
URL http://papers.nips.cc/paper/7737-measures-of-distortion-for-machine-learning
PDF http://papers.nips.cc/paper/7737-measures-of-distortion-for-machine-learning.pdf
PWC https://paperswithcode.com/paper/measures-of-distortion-for-machine-learning
Repo
Framework
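
As a rough sketch of a variance-of-distance-ratios measure in the spirit of the proposed $\sigma$-distortion (the authors' exact normalization may differ), the following computes the normalized variance of pairwise distance-expansion ratios between a metric space and its embedding.

```python
# Sketch of a sigma-distortion-style measure: the normalized variance of
# pairwise distance-expansion ratios between the original space and its
# embedding. In the spirit of the paper's definition; the authors' exact
# normalization may differ.
import numpy as np
from scipy.spatial.distance import pdist

def sigma_distortion(X_orig, X_emb):
    d_orig = pdist(X_orig)              # pairwise distances, original space
    d_emb = pdist(X_emb)                # pairwise distances, embedding
    rho = d_emb / d_orig                # expansion ratio per pair
    return np.mean((rho / rho.mean() - 1.0) ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
print(sigma_distortion(X, X))           # perfect embedding -> 0.0
print(sigma_distortion(X, X + 0.1 * rng.standard_normal((100, 10))))
```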

Leaving no token behind: comprehensive (and delicious) annotation of MWEs and supersenses

Title Leaving no token behind: comprehensive (and delicious) annotation of MWEs and supersenses
Authors Nathan Schneider
Abstract I will describe an unorthodox approach to lexical semantic annotation that prioritizes corpus coverage, democratizing analysis of a wide range of expression types. I argue that a lexicon-free lexical semantics, defined in terms of units and supersense tags, is an appetizing direction for NLP, as it is robust, cost-effective, easily understood, not too language-specific, and can serve as a foundation for richer semantic structure. Linguistic delicacies from the STREUSLE and DiMSUM corpora, which have been multiword- and supersense-annotated, attest to the veritable smörgåsbord of noncanonical constructions in English, including various flavors of prepositions, MWEs, and other curiosities. Bio: Nathan Schneider is an annotation schemer and computational modeler for natural language. As Assistant Professor of Linguistics and Computer Science at Georgetown University, he looks for synergies between practical language technologies and the scientific study of language. He specializes in broad-coverage semantic analysis: designing linguistic meaning representations, annotating them in corpora, and automating them with statistical natural language processing techniques. A central focus in this research is the nexus between grammar and lexicon as manifested in multiword expressions and adpositions/case markers. He has inhabited UC Berkeley (BA in Computer Science and Linguistics), Carnegie Mellon University (Ph.D. in Language Technologies), and the University of Edinburgh (postdoc). Now a Hoya and leader of NERT, he continues to play with data and algorithms for linguistic meaning.
Tasks
Published 2018-08-01
URL https://www.aclweb.org/anthology/W18-4903/
PDF https://www.aclweb.org/anthology/W18-4903
PWC https://paperswithcode.com/paper/leaving-no-token-behind-comprehensive-and
Repo
Framework

Neural Network Architectures for Arabic Dialect Identification

Title Neural Network Architectures for Arabic Dialect Identification
Authors Elise Michon, Minh Quang Pham, Josep Crego, Jean Senellart
Abstract SYSTRAN competes this year for the first time in the DSL shared task, in the Arabic Dialect Identification subtask. We participate by training several neural network models, showing that we can obtain competitive results despite the limited amount of training data available. We report our experiments and detail the network architecture and parameters of our 3 runs: our best performing system consists of a Multi-Input CNN that learns separate embeddings for lexical, phonetic, and acoustic input features (F1: 0.5289); we also built a CNN-biLSTM network aimed at capturing both spatial and sequential features directly from speech spectrograms (F1: 0.3894 at submission time, F1: 0.4235 with parameters found later); and finally a system relying on binary CNN-biLSTMs (F1: 0.4339).
Tasks Feature Engineering, Language Identification, Machine Translation, Sentence Classification, Speech Recognition
Published 2018-08-01
URL https://www.aclweb.org/anthology/W18-3914/
PDF https://www.aclweb.org/anthology/W18-3914
PWC https://paperswithcode.com/paper/neural-network-architectures-for-arabic
Repo
Framework
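
A hedged PyTorch sketch of the best-performing idea above, a multi-input CNN with separate embedding branches for lexical, phonetic, and acoustic features merged before the dialect classifier. All layer sizes and vocabularies are placeholder assumptions, not the paper's reported configuration.

```python
# Hedged sketch of a multi-input CNN: separate embeddings/branches for
# lexical, phonetic, and acoustic inputs, globally max-pooled and merged.
# Sizes are placeholders.
import torch
import torch.nn as nn

class MultiInputCNN(nn.Module):
    def __init__(self, lex_vocab=5000, phon_vocab=60, acoustic_dim=40,
                 emb_dim=64, n_dialects=5):
        super().__init__()
        self.lex_emb = nn.Embedding(lex_vocab, emb_dim)
        self.phon_emb = nn.Embedding(phon_vocab, emb_dim)
        self.lex_conv = nn.Conv1d(emb_dim, 64, kernel_size=3, padding=1)
        self.phon_conv = nn.Conv1d(emb_dim, 64, kernel_size=3, padding=1)
        self.acou_conv = nn.Conv1d(acoustic_dim, 64, kernel_size=3, padding=1)
        self.classifier = nn.Linear(3 * 64, n_dialects)

    def branch(self, conv, x):
        return conv(x).relu().max(dim=-1).values   # global max pooling

    def forward(self, lex_ids, phon_ids, acoustic):
        lex = self.branch(self.lex_conv, self.lex_emb(lex_ids).transpose(1, 2))
        phon = self.branch(self.phon_conv, self.phon_emb(phon_ids).transpose(1, 2))
        acou = self.branch(self.acou_conv, acoustic.transpose(1, 2))
        return self.classifier(torch.cat([lex, phon, acou], dim=-1))

model = MultiInputCNN()
logits = model(torch.randint(0, 5000, (2, 30)),   # lexical token ids
               torch.randint(0, 60, (2, 80)),     # phonetic symbol ids
               torch.randn(2, 200, 40))           # acoustic frames x features
print(logits.shape)  # torch.Size([2, 5])
```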

UBC-NLP at IEST 2018: Learning Implicit Emotion With an Ensemble of Language Models

Title UBC-NLP at IEST 2018: Learning Implicit Emotion With an Ensemble of Language Models
Authors Hassan Alhuzali, Mohamed Elaraby, Muhammad Abdul-Mageed
Abstract We describe the UBC-NLP contribution to IEST-2018, focused on learning implicit emotion in Twitter data. Among the 30 participating teams, our system ranked 4th (with 69.3% F-score). Post-competition, we were able to score slightly higher than the 3rd-ranking system (reaching 70.7%). Our system is trained on top of a pre-trained language model (LM), fine-tuned on the data provided by the task organizers. Our best results are obtained by averaging an ensemble of language models. We also offer an analysis of system performance and the impact of training data size on the task. For example, we show that training our best model for only one epoch with less than 40% of the data enables better performance than the baseline reported by Klinger et al. (2018) for the task.
Tasks Language Modelling
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-6250/
PDF https://www.aclweb.org/anthology/W18-6250
PWC https://paperswithcode.com/paper/ubc-nlp-at-iest-2018-learning-implicit
Repo
Framework
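
The ensembling step ("an average of an ensemble of language models") reduces to averaging per-class probabilities across member models. A minimal sketch, with simulated member-model outputs standing in for the fine-tuned LM classifiers:

```python
# Minimal sketch of the ensembling step: average the per-class probabilities
# from several fine-tuned LM classifiers and take the argmax. How each
# member model is built is assumed, not taken from the paper.
import numpy as np

def ensemble_predict(member_probs):
    """member_probs: array of shape (n_models, n_examples, n_classes)."""
    avg = np.mean(member_probs, axis=0)
    return avg.argmax(axis=-1)

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(6), size=(4, 10))  # 4 models, 10 tweets, 6 emotions
print(ensemble_predict(probs))
```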

CTSys at SemEval-2018 Task 3: Irony in Tweets

Title CTSys at SemEval-2018 Task 3: Irony in Tweets
Authors Myan Sherif, Sherine Mamdouh, Wegdan Ghazi
Abstract This paper describes the system we built for SemEval-2018 Task 3 on irony detection in English tweets. The system classifies a tweet as either ironic or non-ironic through a supervised learning approach. Our approach is to implement three feature models and then improve the performance of supervised tweet classification by combining many data features and using a voting system over four different classifiers. We describe the process of pre-processing the data, extracting features, and running different types of classifiers against our feature set. In the competition, our system achieved an F1-score of 0.4675, ranking 35th in subtask A, and an F1-score of 0.3014, ranking 22nd in subtask B.
Tasks Feature Engineering
Published 2018-06-01
URL https://www.aclweb.org/anthology/S18-1094/
PDF https://www.aclweb.org/anthology/S18-1094
PWC https://paperswithcode.com/paper/ctsys-at-semeval-2018-task-3-irony-in-tweets
Repo
Framework
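
A short scikit-learn sketch of a four-classifier voting setup like the one described; the particular classifiers chosen here are assumptions, since the abstract does not name them.

```python
# Sketch of a four-classifier majority-voting setup with scikit-learn.
# The four member classifiers are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
vote = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC())],
    voting="hard",                      # majority vote over the four models
)
vote.fit(X, y)
print(vote.predict(X[:5]))
```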

Facial Expression Recognition by De-Expression Residue Learning

Title Facial Expression Recognition by De-Expression Residue Learning
Authors Huiyuan Yang, Umur Ciftci, Lijun Yin
Abstract A facial expression is a combination of an expressive component and a neutral component of a person's face. In this paper, we propose to recognize facial expressions by extracting information about the expressive component through a de-expression learning procedure called De-expression Residue Learning (DeRL). First, a generative model is trained with a conditional GAN (cGAN). This model generates the corresponding neutral face image for any input face image. We call this procedure de-expression because the expressive information is filtered out by the generative model; however, the expressive information is still recorded in the intermediate layers. Given the neutral face image, unlike previous works using pixel-level or feature-level differences for facial expression classification, our new method learns the deposition (or residue) that remains in the intermediate layers of the generative model. Such a residue is essential, as it contains the expressive component deposited in the generative model by any input facial expression image. Seven public facial expression databases are employed in our experiments. With two databases (BU-4DFE and BP4D-spontaneous) for pre-training, the DeRL method has been evaluated on five databases: CK+, Oulu-CASIA, MMI, BU-3DFE, and BP4D+. The experimental results demonstrate the superior performance of the proposed method.
Tasks Facial Expression Recognition
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Yang_Facial_Expression_Recognition_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Yang_Facial_Expression_Recognition_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/facial-expression-recognition-by-de
Repo
Framework
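
The DeRL idea hinges on reading the expressive "residue" out of a generator's intermediate layers. A hedged PyTorch sketch of that mechanism using forward hooks follows; the toy generator is a stand-in assumption, not the paper's trained cGAN.

```python
# Sketch of the mechanism DeRL relies on: run an input through a
# (placeholder) generator and capture intermediate-layer activations with
# forward hooks; the expressive residue would then be learned from these
# features. The toy generator is an assumption, not the paper's cGAN.
import torch
import torch.nn as nn

generator = nn.Sequential(              # stand-in for a trained cGAN generator
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)

captured = {}
def save_activation(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

for idx in (1, 3):                      # tap the two ReLU layers
    generator[idx].register_forward_hook(save_activation(f"layer{idx}"))

face = torch.randn(1, 3, 64, 64)        # fake "expressive" input image
neutral = generator(face)               # de-expressed output
print(neutral.shape, {k: tuple(v.shape) for k, v in captured.items()})
```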

Joint Pose and Expression Modeling for Facial Expression Recognition

Title Joint Pose and Expression Modeling for Facial Expression Recognition
Authors Feifei Zhang, Tianzhu Zhang, Qirong Mao, Changsheng Xu
Abstract Facial expression recognition (FER) is a challenging task due to different expressions under arbitrary poses. Most conventional approaches either perform face frontalization on a non-frontal facial image or learn separate classifiers for each pose. Different from existing methods, in this paper, we propose an end-to-end deep learning model by exploiting different poses and expressions jointly for simultaneous facial image synthesis and pose-invariant facial expression recognition. The proposed model is based on generative adversarial network (GAN) and enjoys several merits. First, the encoder-decoder structure of the generator can learn a generative and discriminative identity representation for face images. Second, the identity representation is explicitly disentangled from both expression and pose variations through the expression and pose codes. Third, our model can automatically generate face images with different expressions under arbitrary poses to enlarge and enrich the training set for FER. Quantitative and qualitative evaluations on both controlled and in-the-wild datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.
Tasks Facial Expression Recognition, Image Generation
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_Joint_Pose_and_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Joint_Pose_and_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/joint-pose-and-expression-modeling-for-facial
Repo
Framework

BML: A High-performance, Low-cost Gradient Synchronization Algorithm for DML Training

Title BML: A High-performance, Low-cost Gradient Synchronization Algorithm for DML Training
Authors Songtao Wang, Dan Li, Yang Cheng, Jinkun Geng, Yanshu Wang, Shuai Wang, Shu-Tao Xia, Jianping Wu
Abstract In distributed machine learning (DML), network performance between machines significantly impacts the speed of iterative training. In this paper we propose BML, a new gradient synchronization algorithm with higher network performance and lower network cost than current practice. BML runs on a BCube network instead of the traditional Fat-Tree topology. The BML algorithm is designed such that, compared to the parameter server (PS) algorithm on a Fat-Tree network connecting the same number of server machines, BML theoretically achieves 1/k of the gradient synchronization time with k/5 of the switches (typically k is 2∼4). Experiments with LeNet-5 and VGG-19 benchmarks on a testbed of 9 dual-GPU servers show that BML reduces the job completion time of DML training by up to 56.4%.
Tasks
Published 2018-12-01
URL http://papers.nips.cc/paper/7678-bml-a-high-performance-low-cost-gradient-synchronization-algorithm-for-dml-training
PDF http://papers.nips.cc/paper/7678-bml-a-high-performance-low-cost-gradient-synchronization-algorithm-for-dml-training.pdf
PWC https://paperswithcode.com/paper/bml-a-high-performance-low-cost-gradient
Repo
Framework
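
A toy calculation that just restates the abstract's headline claims (1/k of the synchronization time with k/5 of the switches) over the typical range of k:

```python
# Toy calculation of the abstract's stated ratios: relative to a
# parameter-server setup on Fat-Tree, BML is claimed to need 1/k of the
# synchronization time with k/5 of the switches. Illustrative only.
for k in (2, 3, 4):
    sync_time_ratio = 1.0 / k    # BML sync time / PS-on-Fat-Tree sync time
    switch_ratio = k / 5.0       # BML switches / Fat-Tree switches
    print(f"k={k}: sync time x{sync_time_ratio:.2f}, switches x{switch_ratio:.2f}")
```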

Exponentially Weighted Imitation Learning for Batched Historical Data

Title Exponentially Weighted Imitation Learning for Batched Historical Data
Authors Qing Wang, Jiechao Xiong, Lei Han, Peng Sun, Han Liu, Tong Zhang
Abstract We consider deep policy learning with only batched historical trajectories. The main challenge of this problem is that the learner no longer has a simulator or "environment oracle" as in most reinforcement learning settings. To solve this problem, we propose a monotonic advantage-reweighted imitation learning strategy that is applicable to problems with complex nonlinear function approximation and works well with hybrid (discrete and continuous) action spaces. The method does not rely on knowledge of the behavior policy and thus can be used to learn from data generated by an unknown policy. Under mild conditions, our algorithm, though surprisingly simple, has a policy improvement bound and empirically outperforms most competing methods. Thorough numerical results are also provided to demonstrate the efficacy of the proposed methodology.
Tasks Imitation Learning
Published 2018-12-01
URL http://papers.nips.cc/paper/7866-exponentially-weighted-imitation-learning-for-batched-historical-data
PDF http://papers.nips.cc/paper/7866-exponentially-weighted-imitation-learning-for-batched-historical-data.pdf
PWC https://paperswithcode.com/paper/exponentially-weighted-imitation-learning-for
Repo
Framework
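
A hedged sketch of an exponentially advantage-weighted imitation objective consistent with the title: the log-likelihood of each logged action is weighted by exp(beta * advantage), so better-than-average actions are imitated more strongly. The beta value and advantage estimates are assumptions; this shows the spirit of the method, not the authors' implementation.

```python
# Hedged sketch of an exponentially advantage-weighted imitation loss:
# reweight the log-likelihood of each logged action by exp(beta * advantage).
# beta and the advantages are placeholder assumptions.
import torch
import torch.nn.functional as F

def weighted_imitation_loss(logits, actions, advantages, beta=1.0):
    log_probs = F.log_softmax(logits, dim=-1)
    action_log_probs = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    weights = torch.exp(beta * advantages).detach()   # no gradient through weights
    return -(weights * action_log_probs).mean()

logits = torch.randn(8, 4, requires_grad=True)   # batch of 8, 4 discrete actions
actions = torch.randint(0, 4, (8,))              # logged actions
advantages = torch.randn(8)                      # estimated advantages
loss = weighted_imitation_loss(logits, actions, advantages)
loss.backward()
print(loss.item())
```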