Paper Group NANR 149
Towards Summarization for Social Media - Results of the TL;DR Challenge. Embedding Complementary Deep Networks for Image Classification. Scalable Knowledge Graph Construction from Text Collections. Surface Realisation Using Full Delexicalisation. Modeling Paths for Explainable Knowledge Base Completion. Probing Word and Sentence Embeddings for Long …
Towards Summarization for Social Media - Results of the TL;DR Challenge
Title | Towards Summarization for Social Media - Results of the TL;DR Challenge |
Authors | Shahbaz Syed, Michael Völske, Nedim Lipka, Benno Stein, Hinrich Schütze, Martin Potthast |
Abstract | In this paper, we report on the results of the TL;DR challenge, discussing an extensive manual evaluation of the expected properties of a good summary based on analyzing the comments provided by human annotators. |
Tasks | |
Published | 2019-10-01 |
URL | https://www.aclweb.org/anthology/W19-8666/ |
https://www.aclweb.org/anthology/W19-8666 | |
PWC | https://paperswithcode.com/paper/towards-summarization-for-social-media |
Repo | |
Framework | |
Embedding Complementary Deep Networks for Image Classification
Title | Embedding Complementary Deep Networks for Image Classification |
Authors | Qiuyu Chen, Wei Zhang, Jun Yu, Jianping Fan |
Abstract | In this paper, a deep embedding algorithm is developed to achieve higher accuracy rates on large-scale image classification. By adapting the importance of the object classes to their error rates, our deep embedding algorithm can train multiple complementary deep networks sequentially, where each of them focuses on achieving higher accuracy rates for different subsets of object classes in an easy-to-hard way. By integrating such complementary deep networks into an ensemble network, our deep embedding algorithm can improve the accuracy rates for the hard object classes (which initially have higher error rates) to a certain degree while effectively preserving high accuracy rates for the easy object classes. Our deep embedding algorithm achieves higher overall accuracy rates on large-scale image classification. |
Tasks | Image Classification |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Chen_Embedding_Complementary_Deep_Networks_for_Image_Classification_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Chen_Embedding_Complementary_Deep_Networks_for_Image_Classification_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/embedding-complementary-deep-networks-for |
Repo | |
Framework | |
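The abstract above describes adapting class importance to per-class error rates so that each successive network focuses on the harder classes. A minimal sketch of that reweighting step (the update rule, smoothing constant, and values are my own illustrative choices, not taken from the paper):

```python
import numpy as np

def update_class_weights(error_rates):
    """Give harder classes (higher error rate) proportionally more weight
    when training the next complementary network."""
    error_rates = np.asarray(error_rates, dtype=float)
    # Small floor avoids zero weight for classes the current network solves.
    weights = error_rates + 1e-3
    return weights / weights.sum()

# Per-class error rates of the current network: class 2 is the hardest.
w = update_class_weights([0.05, 0.10, 0.60])
print(w)  # class 2 receives the largest share of the training weight
```

In the easy-to-hard scheme described above, each new network would be trained under such a weighting, and the resulting networks combined into the ensemble.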
Scalable Knowledge Graph Construction from Text Collections
Title | Scalable Knowledge Graph Construction from Text Collections |
Authors | Ryan Clancy, Ihab F. Ilyas, Jimmy Lin |
Abstract | We present a scalable, open-source platform that "distills" a potentially large text collection into a knowledge graph. Our platform takes documents stored in Apache Solr and scales out the Stanford CoreNLP toolkit via Apache Spark integration to extract mentions and relations that are then ingested into the Neo4j graph database. The raw knowledge graph is then enriched with facts extracted from an external knowledge graph. The complete product can be manipulated by various applications using Neo4j's native Cypher query language: We present a subgraph-matching approach to align extracted relations with external facts and show that fact verification, locating textual support for asserted facts, detecting inconsistent and missing facts, and extracting distantly-supervised training data can all be performed within the same framework. |
Tasks | graph construction |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-6607/ |
https://www.aclweb.org/anthology/D19-6607 | |
PWC | https://paperswithcode.com/paper/scalable-knowledge-graph-construction-from-1 |
Repo | |
Framework | |
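The pipeline described above (Solr, Spark, CoreNLP, Neo4j) is infrastructure-heavy, but the core data flow, extracted triples enriched with external facts and checked for missing ones, can be illustrated with plain Python sets (the triples and variable names below are invented for illustration):

```python
# Triples extracted from text: (subject, relation, object).
extracted = {
    ("Alan Turing", "born_in", "London"),
    ("Alan Turing", "field", "computer science"),
}

# Facts from an external knowledge graph used for enrichment.
external = {
    ("Alan Turing", "born_in", "London"),
    ("Alan Turing", "educated_at", "Cambridge"),
}

# Alignment: extracted facts confirmed by external knowledge.
verified = extracted & external
# Facts the external graph asserts but extraction missed.
missing = external - extracted
# Enriched graph: union of both sources.
enriched = extracted | external

print(sorted(verified))
print(sorted(missing))
```

In the actual platform these operations run as graph queries over Neo4j rather than set arithmetic, but the verification/missing-fact logic is the same.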
Surface Realisation Using Full Delexicalisation
Title | Surface Realisation Using Full Delexicalisation |
Authors | Anastasia Shimorina, Claire Gardent |
Abstract | Surface realisation (SR) maps a meaning representation to a sentence and can be viewed as consisting of three subtasks: word ordering, morphological inflection and contraction generation (e.g., clitic attachment in Portuguese or elision in French). We propose a modular approach to surface realisation which models each of these components separately, and evaluate our approach on the 10 languages covered by the SR'18 Surface Realisation Shared Task shallow track. We provide a detailed evaluation of how word order, morphological realisation and contractions are handled by the model and an analysis of the differences in word ordering performance across languages. |
Tasks | Morphological Inflection |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1305/ |
https://www.aclweb.org/anthology/D19-1305 | |
PWC | https://paperswithcode.com/paper/surface-realisation-using-full |
Repo | |
Framework | |
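The three subtasks named in the abstract compose naturally as a pipeline: order the lemmas, inflect each word, then apply contractions. A toy sketch with a French elision rule as the contraction step (the lookup table and rules are invented for illustration, not from the paper's models):

```python
def order_words(lemmas):
    # Toy word ordering: assume the input is already in target order.
    return lemmas

def inflect(word, feats):
    # Toy morphological inflection via a tiny lookup table.
    table = {("arbre", "Plur"): "arbres"}
    return table.get((word, feats.get("Number", "")), word)

def contract(words):
    # French elision: "le"/"la" before a vowel-initial word -> l'.
    out, vowels = [], "aeiouéè"
    i = 0
    while i < len(words):
        w = words[i]
        if w in ("le", "la") and i + 1 < len(words) and words[i + 1][0] in vowels:
            out.append("l'" + words[i + 1])
            i += 2
        else:
            out.append(w)
            i += 1
    return out

lemmas = ["le", "arbre"]
sent = contract([inflect(w, {}) for w in order_words(lemmas)])
print(" ".join(sent))  # l'arbre
```

Modelling each stage separately, as the paper proposes, makes it possible to evaluate word ordering, inflection and contraction errors independently.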
Modeling Paths for Explainable Knowledge Base Completion
Title | Modeling Paths for Explainable Knowledge Base Completion |
Authors | Josua Stadelmaier, Sebastian Padó |
Abstract | A common approach in knowledge base completion (KBC) is to learn representations for entities and relations in order to infer missing facts by generalizing existing ones. A shortcoming of standard models is that they do not explain their predictions, which makes them hard to verify by human inspection. In this paper, we propose the Context Path Model (CPM), which generates explanations for new facts in KBC by providing sets of context paths as supporting evidence for these triples. For example, a new triple (Theresa May, nationality, Britain) may be explained by the path (Theresa May, born in, Eastbourne, contained in, Britain). The CPM is formulated as a wrapper that can be applied on top of various existing KBC models. We evaluate it for the well-established TransE model. We observe that its performance remains very close to that of the underlying model despite the added complexity, and that most of the paths proposed as explanations provide meaningful evidence for assessing correctness. |
Tasks | Knowledge Base Completion |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4816/ |
https://www.aclweb.org/anthology/W19-4816 | |
PWC | https://paperswithcode.com/paper/modeling-paths-for-explainable-knowledge-base |
Repo | |
Framework | |
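The context-path idea in the abstract, explaining a predicted triple by a chain of existing facts between its head and tail, can be sketched as a breadth-first search over the knowledge graph (the toy facts mirror the paper's Theresa May example; the search itself is a simplification, not the CPM's scoring model):

```python
from collections import deque

facts = [
    ("Theresa May", "born_in", "Eastbourne"),
    ("Eastbourne", "contained_in", "Britain"),
]

def context_path(facts, head, tail):
    """BFS from head to tail; return the sequence of facts as evidence."""
    adj = {}
    for h, r, t in facts:
        adj.setdefault(h, []).append((r, t))
    queue, seen = deque([(head, [])]), {head}
    while queue:
        node, path = queue.popleft()
        if node == tail:
            return path
        for r, t in adj.get(node, []):
            if t not in seen:
                seen.add(t)
                queue.append((t, path + [(node, r, t)]))
    return None

# Explain the new triple (Theresa May, nationality, Britain):
print(context_path(facts, "Theresa May", "Britain"))
```

The CPM wraps a base model such as TransE and additionally scores such paths as supporting evidence, rather than merely enumerating them.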
Probing Word and Sentence Embeddings for Long-distance Dependencies Effects in French and English
Title | Probing Word and Sentence Embeddings for Long-distance Dependencies Effects in French and English |
Authors | Paola Merlo |
Abstract | The recent widespread and strong interest in RNNs has spurred detailed investigations of the distributed representations they generate and, specifically, whether they exhibit properties similar to those characterising human languages. Results are at present inconclusive. In this paper, we extend previous work on long-distance dependencies in three ways. We manipulate word embeddings to translate them into a space that is attuned to the linguistic properties under study. We extend the work to sentence embeddings and to new languages. We confirm previous negative results: word embeddings and sentence embeddings do not unequivocally encode fine-grained linguistic properties of long-distance dependencies. |
Tasks | Sentence Embeddings, Word Embeddings |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4817/ |
https://www.aclweb.org/anthology/W19-4817 | |
PWC | https://paperswithcode.com/paper/probing-word-and-sentence-embeddings-for-long |
Repo | |
Framework | |
Comparing MT Approaches for Text Normalization
Title | Comparing MT Approaches for Text Normalization |
Authors | Claudia Matos Veliz, Orphee De Clercq, Veronique Hoste |
Abstract | One of the main characteristics of social media data is the use of non-standard language. Since NLP tools have been trained on traditional text material their performance drops when applied to social media data. One way to overcome this is to first perform text normalization. In this work, we apply text normalization to noisy English and Dutch text coming from different social media genres: text messages, message board posts and tweets. We consider the normalization task as a Machine Translation problem and test the two leading paradigms: statistical and neural machine translation. For SMT we explore the added value of varying background corpora for training the language model. For NMT we have a look at data augmentation since the parallel datasets we are working with are limited in size. Our results reveal that when relying on SMT to perform the normalization it is beneficial to use a background corpus that is close to the genre you are normalizing. Regarding NMT, we find that the translations - or normalizations - coming out of this model are far from perfect and that for a low-resource language like Dutch adding additional training data works better than artificially augmenting the data. |
Tasks | Data Augmentation, Language Modelling, Machine Translation |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1086/ |
https://www.aclweb.org/anthology/R19-1086 | |
PWC | https://paperswithcode.com/paper/comparing-mt-approaches-for-text |
Repo | |
Framework | |
Geolocation with Attention-Based Multitask Learning Models
Title | Geolocation with Attention-Based Multitask Learning Models |
Authors | Tommaso Fornaciari, Dirk Hovy |
Abstract | Geolocation, predicting the location of a post based on text and other information, has huge potential for several social media applications. Typically, the problem is modeled as either multi-class classification or regression. In the first case, the classes are geographic areas previously identified; in the second, the models directly predict geographic coordinates. The former requires discretization of the coordinates, but yields better performance. The latter is potentially more precise and true to the nature of the problem, but often results in worse performance. We propose to combine the two approaches in an attention-based multitask convolutional neural network that jointly predicts both discrete locations and continuous geographic coordinates. We evaluate the multi-task (MTL) model against single-task models and prior work. We find that MTL significantly improves performance, reporting large gains on one data set, but also note that the correlation between labels and coordinates has a marked impact on the effectiveness of including a regression task. |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5528/ |
https://www.aclweb.org/anthology/D19-5528 | |
PWC | https://paperswithcode.com/paper/geolocation-with-attention-based-multitask |
Repo | |
Framework | |
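The joint objective the abstract describes, a classification loss over discrete regions plus a regression loss over coordinates, can be sketched in numpy with a simple weighted sum (the weighting scheme and toy values are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def multitask_loss(logits, true_region, pred_coords, true_coords, lam=0.5):
    """Cross-entropy on the discrete region + weighted MSE on coordinates."""
    ce = -np.log(softmax(logits)[true_region])
    mse = np.mean((np.asarray(pred_coords) - np.asarray(true_coords)) ** 2)
    return ce + lam * mse

# Region 1 is correct; prediction A is closer on both tasks than B.
loss_a = multitask_loss(np.array([0.1, 2.0, 0.3]), 1, [45.4, 9.2], [45.5, 9.2])
loss_b = multitask_loss(np.array([2.0, 0.1, 0.3]), 1, [48.0, 2.0], [45.5, 9.2])
print(loss_a < loss_b)  # True
```

Sharing the encoder between the two heads lets the discrete and continuous views of location regularize each other, which is the mechanism behind the reported MTL gains.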
BERT for Question Generation
Title | BERT for Question Generation |
Authors | Ying-Hong Chan, Yao-Chung Fan |
Abstract | In this study, we investigate the employment of the pre-trained BERT language model to tackle question generation tasks. We introduce two neural architectures built on top of BERT for question generation tasks. The first is a straightforward BERT employment, which reveals the defects of directly using BERT for text generation. The second remedies the first by restructuring the BERT employment in a sequential manner that takes information from previously decoded results. Our models are trained and evaluated on the question-answering dataset SQuAD. Experiment results show that our best model yields state-of-the-art performance, advancing the BLEU4 score of the existing best models from 16.85 to 18.91. |
Tasks | Language Modelling, Question Answering, Question Generation, Text Generation |
Published | 2019-10-01 |
URL | https://www.aclweb.org/anthology/W19-8624/ |
https://www.aclweb.org/anthology/W19-8624 | |
PWC | https://paperswithcode.com/paper/bert-for-question-generation |
Repo | |
Framework | |
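The second architecture's key idea, feeding previously decoded tokens back into the model instead of generating all positions at once, can be sketched with a dummy next-token function standing in for BERT (everything here is a toy stand-in, not the paper's model or vocabulary):

```python
def next_token(context):
    """Toy stand-in for a BERT-based decoder: maps the tokens generated
    so far to the next question token via a fixed lookup."""
    table = {
        (): "what",
        ("what",): "is",
        ("what", "is"): "squad",
        ("what", "is", "squad"): "?",
    }
    return table.get(tuple(context), "?")

def generate(max_len=10):
    # Sequential decoding: each step conditions on all previous outputs.
    tokens = []
    while len(tokens) < max_len:
        tok = next_token(tokens)
        tokens.append(tok)
        if tok == "?":  # end of question
            break
    return tokens

print(" ".join(generate()))  # what is squad ?
```

The straightforward (non-sequential) employment would instead predict every position in one pass, without access to earlier decoded tokens, which is the defect the abstract points out.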
Sigmorphon 2019 Task 2 system description paper: Morphological analysis in context for many languages, with supervision from only a few
Title | Sigmorphon 2019 Task 2 system description paper: Morphological analysis in context for many languages, with supervision from only a few |
Authors | Brad Aiken, Jared Kelly, Alexis Palmer, Suleyman Olcay Polat, Taraka Rama, Rodney Nielsen |
Abstract | This paper presents the UNT HiLT+Ling system for the Sigmorphon 2019 shared Task 2: Morphological Analysis and Lemmatization in Context. Our core approach focuses on the morphological tagging task; part-of-speech tagging and lemmatization are treated as secondary tasks. Given the highly multilingual nature of the task, we propose an approach which makes minimal use of the supplied training data, in order to be extensible to languages without labeled training data for the morphological inflection task. Specifically, we use a parallel Bible corpus to align contextual embeddings at the verse level. The aligned verses are used to build cross-language translation matrices, which in turn are used to map between embedding spaces for the various languages. Finally, we use sets of inflected forms, primarily from a high-resource language, to induce vector representations for individual UniMorph tags. Morphological analysis is performed by matching vector representations to embeddings for individual tokens. While our system results are dramatically below the average system submitted for the shared task evaluation campaign, our method is (we suspect) unique in its minimal reliance on labeled training data. |
Tasks | Lemmatization, Morphological Analysis, Morphological Inflection, Morphological Tagging, Part-Of-Speech Tagging |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4211/ |
https://www.aclweb.org/anthology/W19-4211 | |
PWC | https://paperswithcode.com/paper/sigmorphon-2019-task-2-system-description |
Repo | |
Framework | |
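The cross-language translation matrices described above can be learned by least squares between aligned embedding pairs. A numpy sketch on random toy embeddings (the verse-aligned embeddings and dimensions are stand-ins, not the system's actual data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Aligned "verse-level" embeddings for a source and a target language.
src = rng.normal(size=(50, 8))
true_map = rng.normal(size=(8, 8))
tgt = src @ true_map  # target embeddings, linearly related in this toy case

# Learn the translation matrix W minimizing ||src @ W - tgt||.
W, *_ = np.linalg.lstsq(src, tgt, rcond=None)

# On noiseless toy data, W recovers the underlying mapping.
print(np.allclose(W, true_map, atol=1e-6))  # True
```

With `W` in hand, an embedding from the source space can be mapped into the target space as `vec @ W`, which is how the system transfers UniMorph tag representations across languages.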
Correlation clustering with local objectives
Title | Correlation clustering with local objectives |
Authors | Sanchit Kalhan, Konstantin Makarychev, Timothy Zhou |
Abstract | Correlation Clustering is a powerful graph partitioning model that aims to cluster items based on the notion of similarity between items. An instance of the Correlation Clustering problem consists of a graph G (not necessarily complete) whose edges are labeled by a binary classifier as similar and dissimilar. Classically, we are tasked with producing a clustering that minimizes the number of disagreements: an edge is in disagreement if it is a similar edge and is present across clusters or if it is a dissimilar edge and is present within a cluster. Define the disagreements vector to be an n dimensional vector indexed by the vertices, where the v-th index is the number of disagreements at vertex v. Recently, Puleo and Milenkovic (ICML '16) initiated the study of the Correlation Clustering framework in which the objectives were more general functions of the disagreements vector. In this paper, we study algorithms for minimizing ℓ_q norms (q >= 1) of the disagreements vector for both arbitrary and complete graphs. We present the first known algorithm for minimizing the ℓ_q norm of the disagreements vector on arbitrary graphs and also provide an improved algorithm for minimizing the ℓ_q norm (q >= 1) of the disagreements vector on complete graphs. We also study an alternate cluster-wise local objective introduced by Ahmadi, Khuller and Saha (IPCO '19), which aims to minimize the maximum number of disagreements associated with a cluster. We present an improved (2 + ε) approximation algorithm for this objective. |
Tasks | graph partitioning |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9132-correlation-clustering-with-local-objectives |
http://papers.nips.cc/paper/9132-correlation-clustering-with-local-objectives.pdf | |
PWC | https://paperswithcode.com/paper/correlation-clustering-with-local-objectives |
Repo | |
Framework | |
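The disagreements vector defined in the abstract is straightforward to compute for a given clustering; a sketch including the ℓ_q-norm objective (the toy graph and cluster assignment are invented for illustration):

```python
import numpy as np

# Edges labeled similar (+) or dissimilar (-); a clustering assigns ids.
edges = [("a", "b", "+"), ("b", "c", "-"), ("a", "c", "+")]
cluster = {"a": 0, "b": 0, "c": 1}

def disagreements_vector(edges, cluster):
    """The v-th entry counts the disagreements incident to vertex v."""
    dis = {v: 0 for v in cluster}
    for u, v, label in edges:
        same = cluster[u] == cluster[v]
        # Similar edge across clusters, or dissimilar edge within one.
        if (label == "+" and not same) or (label == "-" and same):
            dis[u] += 1
            dis[v] += 1
    return np.array([dis[v] for v in sorted(dis)])

vec = disagreements_vector(edges, cluster)
print(vec, np.linalg.norm(vec, 2))  # the l_2 objective of this clustering
```

Setting q = 1 recovers the classical total-disagreements objective (up to the factor of counting each edge at both endpoints), while q = ∞ penalizes the single worst vertex, which is the "local objective" flavor the paper studies.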
Towards Zero-shot Language Modeling
Title | Towards Zero-shot Language Modeling |
Authors | Edoardo Maria Ponti, Ivan Vulić, Ryan Cotterell, Roi Reichart, Anna Korhonen |
Abstract | Can we construct a neural language model which is inductively biased towards learning human language? Motivated by this question, we aim at constructing an informative prior for held-out languages on the task of character-level, open-vocabulary language modelling. We obtain this prior as the posterior over network weights conditioned on the data from a sample of training languages, which is approximated through Laplace's method. Based on a large and diverse sample of languages, the use of our prior outperforms baseline models with an uninformative prior in both zero-shot and few-shot settings, showing that the prior is imbued with universal linguistic knowledge. Moreover, we harness broad language-specific information available for most languages of the world, i.e., features from typological databases, as distant supervision for held-out languages. We explore several language modelling conditioning techniques, including concatenation and meta-networks for parameter generation. They appear beneficial in the few-shot setting, but ineffective in the zero-shot setting. Since the paucity of even plain digital text affects the majority of the world's languages, we hope that these insights will broaden the scope of applications for language technology. |
Tasks | Language Modelling |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1288/ |
https://www.aclweb.org/anthology/D19-1288 | |
PWC | https://paperswithcode.com/paper/towards-zero-shot-language-modeling |
Repo | |
Framework | |
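Laplace's method, used in the abstract to approximate the posterior over network weights, fits a Gaussian at the posterior mode with variance given by the inverse curvature of the negative log posterior there. A one-dimensional numpy sketch (the quadratic example is mine, not the paper's model):

```python
def neg_log_posterior(w):
    # Toy negative log posterior with its mode at w = 2.
    return 2.0 * (w - 2.0) ** 2

def laplace_approximation(f, w0=0.0, h=1e-4, steps=5000, lr=1e-3):
    # Find the mode by gradient descent with numerical gradients.
    w = w0
    for _ in range(steps):
        grad = (f(w + h) - f(w - h)) / (2 * h)
        w -= lr * grad
    # Curvature (second derivative) at the mode gives the Gaussian variance.
    hess = (f(w + h) - 2 * f(w) + f(w - h)) / h ** 2
    return w, 1.0 / hess  # mean and variance of the Gaussian approximation

mean, var = laplace_approximation(neg_log_posterior)
print(round(mean, 3), round(var, 3))  # close to 2 and 0.25
```

In the paper's setting, the Gaussian fitted over the training languages' weights serves as the informative prior for held-out languages.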
Inverting and Modeling Morphological Inflection
Title | Inverting and Modeling Morphological Inflection |
Authors | Yohei Oseki, Yasutada Sudo, Hiromu Sakai, Alec Marantz |
Abstract | Previous "wug" tests (Berko, 1958) on Japanese verbal inflection have demonstrated that Japanese speakers, both adults and children, cannot inflect novel present tense forms to "correct" past tense forms predicted by rules of existent verbs (de Chene, 1982; Vance, 1987, 1991; Klafehn, 2003, 2013), indicating that Japanese verbs are merely stored in the mental lexicon. However, the implicit assumption that present tense forms are bases for verbal inflection should not be blindly extended to morphologically rich languages like Japanese in which both present and past tense forms are morphologically complex without inherent direction (Albright, 2002). Interestingly, there are also independent observations in the acquisition literature to suggest that past tense forms may be bases for verbal inflection in Japanese (Klafehn, 2003; Murasugi et al., 2010; Hirose, 2017; Tatsumi et al., 2018). In this paper, we computationally simulate two directions of verbal inflection in Japanese, Present → Past and Past → Present, with the rule-based computational model called Minimal Generalization Learner (MGL; Albright and Hayes, 2003) and experimentally evaluate the model with the bidirectional "wug" test where humans inflect novel verbs in two opposite directions. We conclude that Japanese verbs can be computed online via some generalizations and those generalizations do depend on the direction of morphological inflection. |
Tasks | Morphological Inflection |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4220/ |
https://www.aclweb.org/anthology/W19-4220 | |
PWC | https://paperswithcode.com/paper/inverting-and-modeling-morphological |
Repo | |
Framework | |
Bayes Test of Precision, Recall, and F1 Measure for Comparison of Two Natural Language Processing Models
Title | Bayes Test of Precision, Recall, and F1 Measure for Comparison of Two Natural Language Processing Models |
Authors | Ruibo Wang, Jihong Li |
Abstract | Direct comparison on point estimation of the precision (P), recall (R), and F1 measure of two natural language processing (NLP) models on a common test corpus is unreasonable and results in less replicable conclusions due to a lack of a statistical test. However, the existing t-tests in cross-validation (CV) for model comparison are inappropriate because the distributions of P, R, F1 are skewed and an interval estimation of P, R, and F1 based on a t-test may exceed [0,1]. In this study, we propose to use a block-regularized 3×2 CV (3×2 BCV) in model comparison because it could regularize the difference in certain frequency distributions over linguistic units between training and validation sets and yield stable estimators of P, R, and F1. On the basis of the 3×2 BCV, we calibrate the posterior distributions of P, R, and F1 and derive an accurate interval estimation of P, R, and F1. Furthermore, we formulate the comparison into a hypothesis testing problem and propose a novel Bayes test. The test could directly compute the probabilities of the hypotheses on the basis of the posterior distributions and provide more informative decisions than the existing significance t-tests. Three experiments with regard to NLP chunking tasks are conducted, and the results illustrate the validity of the Bayes test. |
Tasks | Chunking |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1405/ |
https://www.aclweb.org/anthology/P19-1405 | |
PWC | https://paperswithcode.com/paper/bayes-test-of-precision-recall-and-f1-measure |
Repo | |
Framework | |
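The Bayes-test idea, comparing posterior distributions of F1 rather than point estimates, can be sketched with independent Beta posteriors for precision and recall (a deliberate simplification of the paper's 3×2 BCV-calibrated posteriors; the confusion counts are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

def f1_posterior_samples(tp, fp, fn, n=10000):
    """Sample P and R from Beta posteriors (uniform prior) and form F1."""
    p = rng.beta(tp + 1, fp + 1, size=n)
    r = rng.beta(tp + 1, fn + 1, size=n)
    return 2 * p * r / (p + r)

# Confusion counts of two models on the same test set (illustrative).
f1_a = f1_posterior_samples(tp=90, fp=10, fn=10)
f1_b = f1_posterior_samples(tp=70, fp=30, fn=30)

# Posterior probability that model A's F1 exceeds model B's.
prob = float(np.mean(f1_a > f1_b))
print(prob)
```

Reporting this posterior probability directly, instead of a p-value from a t-test, is what makes the decision more informative, and it keeps all interval estimates inside [0,1] by construction.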
Submodular Function Minimization with Noisy Evaluation Oracle
Title | Submodular Function Minimization with Noisy Evaluation Oracle |
Authors | Shinji Ito |
Abstract | This paper considers submodular function minimization with "noisy evaluation oracles" that return the function value of a submodular objective with zero-mean additive noise. For this problem, we provide an algorithm that returns an $O(n^{3/2}/\sqrt{T})$-additive approximate solution in expectation, where $n$ and $T$ stand for the size of the problem and the number of oracle calls, respectively. There is no room for reducing this error bound by a factor smaller than $O(1/\sqrt{n})$. Indeed, we show that any algorithm will suffer additive errors of $\Omega(n/\sqrt{T})$ in the worst case. Further, we consider an extended problem setting with "multiple-point feedback" in which we can get the feedback of $k$ function values with each oracle call. Under the additional assumption that each noisy oracle is submodular and that $2 \leq k = O(1)$, we provide an algorithm with an $O(n/\sqrt{T})$-additive error bound as well as a worst-case analysis including a lower bound of $\Omega(n/\sqrt{T})$, which together imply that the algorithm achieves an optimal error bound up to a constant. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9378-submodular-function-minimization-with-noisy-evaluation-oracle |
http://papers.nips.cc/paper/9378-submodular-function-minimization-with-noisy-evaluation-oracle.pdf | |
PWC | https://paperswithcode.com/paper/submodular-function-minimization-with-noisy |
Repo | |
Framework | |
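The zero-mean noisy oracle setting can be illustrated with a simple wrapper: averaging repeated oracle calls shrinks the noise, which is the basic mechanism behind the $1/\sqrt{T}$ dependence in the error bounds (the cut function, graph, and noise level below are toy choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# A submodular function: the cut value of a subset S in a tiny graph.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
def cut(S):
    return sum((u in S) != (v in S) for u, v in edges)

def noisy_oracle(S, sigma=1.0):
    # Zero-mean additive Gaussian noise on each evaluation.
    return cut(S) + rng.normal(0.0, sigma)

def averaged_value(S, calls=1000):
    # Averaging T calls shrinks the noise std by a factor of 1/sqrt(T).
    return float(np.mean([noisy_oracle(S) for _ in range(calls)]))

S = {0, 3}
print(cut(S), round(averaged_value(S), 2))  # averaged estimate near the truth
```

An actual minimization algorithm must spread its $T$ oracle calls across many candidate sets rather than one, which is why the achievable additive error scales with $n$ as well as $1/\sqrt{T}$.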