January 24, 2020

Paper Group NANR 132

Comparing a Hand-crafted to an Automatically Generated Feature Set for Deep Learning: Pairwise Translation Evaluation. Linguistically-Informed Specificity and Semantic Plausibility for Dialogue Generation. Simple But Not Naïve: Fine-Grained Arabic Dialect Identification Using Only N-Grams. Nonlinear scaling of resource allocation in sensory bott …

Comparing a Hand-crafted to an Automatically Generated Feature Set for Deep Learning: Pairwise Translation Evaluation

Title Comparing a Hand-crafted to an Automatically Generated Feature Set for Deep Learning: Pairwise Translation Evaluation
Authors Despoina Mouratidis, Katia Lida Kermanidis
Abstract The automatic evaluation of machine translation (MT) has proven to be a very significant research topic. Most automatic evaluation methods focus on the evaluation of the output of MT, as they compute similarity scores that represent translation quality. This work targets the performance of MT evaluation. We present a general scheme for learning to classify parallel translations, using linguistic information, of two MT model outputs and one human (reference) translation. We present three experiments with this scheme using neural networks (NN): the first uses string-based hand-crafted features (Exp1); the second uses automatically trained embeddings from the reference and the two MT outputs (one from a statistical machine translation (SMT) model and the other from a neural machine translation (NMT) model), which are learned using NN (Exp2); and the third (Exp3) combines information from the other two experiments. The languages involved are English (EN), Greek (GR) and Italian (IT), and the segments come from the educational domain. The proposed language-independent learning scheme that combines information from the two experiments (Exp3) achieves higher classification accuracy than models using BLEU score information and other classification approaches, such as Random Forest (RF) and Support Vector Machine (SVM).
Tasks Machine Translation
Published 2019-09-01
URL https://www.aclweb.org/anthology/W19-8708/
PDF https://www.aclweb.org/anthology/W19-8708
PWC https://paperswithcode.com/paper/comparing-a-hand-crafted-to-an-automatically
Repo
Framework
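The abstract does not list the string-based features used in Exp1. As a rough illustration only, the sketch below builds a pairwise feature vector for one (SMT, NMT, reference) triple from two hypothetical features, clipped unigram F1 and length ratio against the reference, plus their SMT-minus-NMT differences. The function names and the feature choice are assumptions, not the authors' feature set.

```python
from collections import Counter


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Clipped unigram-overlap F1 between a hypothesis and the reference."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())  # clipped matches (min of counts)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def length_ratio(hypothesis: str, reference: str) -> float:
    """Hypothesis length divided by reference length, in tokens."""
    return len(hypothesis.split()) / max(1, len(reference.split()))


def pairwise_features(smt: str, nmt: str, reference: str) -> list[float]:
    """Hypothetical feature vector for one (SMT, NMT, reference) triple."""
    f_smt = [unigram_f1(smt, reference), length_ratio(smt, reference)]
    f_nmt = [unigram_f1(nmt, reference), length_ratio(nmt, reference)]
    # Differences let a downstream classifier compare the two outputs directly.
    diffs = [a - b for a, b in zip(f_smt, f_nmt)]
    return f_smt + f_nmt + diffs


ref = "the students read the book in class"
smt = "the students reads book in the class"
nmt = "the students read the book in class"
print(pairwise_features(smt, nmt, ref))
```

A vector like this would then be fed to whatever classifier decides which MT output is better; in the paper that role is played by neural networks.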

Linguistically-Informed Specificity and Semantic Plausibility for Dialogue Generation

Title Linguistically-Informed Specificity and Semantic Plausibility for Dialogue Generation
Authors Wei-Jen Ko, Greg Durrett, Junyi Jessy Li
Abstract Sequence-to-sequence models for open-domain dialogue generation tend to favor generic, uninformative responses. Past work has focused on word frequency-based approaches to improving specificity, such as penalizing responses with only common words. In this work, we examine whether specificity is solely a frequency-related notion and find that more linguistically-driven specificity measures are better suited to improving response informativeness. However, we find that forcing a sequence-to-sequence model to be more specific can expose a host of other problems in the responses, including flawed discourse and implausible semantics. We rerank our model's outputs using externally-trained classifiers targeting each of these identified factors. Experiments show that our final model using linguistically motivated specificity and plausibility reranking improves the informativeness, reasonableness, and grammaticality of responses.
Tasks Dialogue Generation
Published 2019-06-01
URL https://www.aclweb.org/anthology/N19-1349/
PDF https://www.aclweb.org/anthology/N19-1349
PWC https://paperswithcode.com/paper/linguistically-informed-specificity-and
Repo
Framework
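The reranking step described above can be sketched as sorting candidate responses by a weighted sum of per-factor classifier scores. In the sketch, the two scorers are deliberately crude, hypothetical stand-ins (a length proxy for specificity and a blacklist for plausibility); the paper uses externally trained classifiers and a linguistically informed specificity measure, and the weights here are arbitrary.

```python
from typing import Callable, Sequence

Scorer = Callable[[str], float]


def rerank(candidates: Sequence[str], scorers: dict[str, Scorer],
           weights: dict[str, float]) -> list[str]:
    """Sort candidate responses by a weighted sum of scorer outputs, best first."""
    def combined(response: str) -> float:
        return sum(weights[name] * score(response) for name, score in scorers.items())
    return sorted(candidates, key=combined, reverse=True)


# Hypothetical toy scorers; NOT the classifiers from the paper.
GENERIC = {"i don't know", "yes", "ok"}
scorers = {
    "specificity": lambda r: min(len(r.split()), 10) / 10,   # toy length proxy
    "plausibility": lambda r: 0.0 if r in GENERIC else 1.0,  # toy blacklist
}
weights = {"specificity": 0.5, "plausibility": 0.5}

candidates = ["i don't know", "we hiked the ridge trail at dawn", "ok"]
print(rerank(candidates, scorers, weights))
# ['we hiked the ridge trail at dawn', "i don't know", 'ok']
```

Keeping each factor as a separate scorer makes it easy to add or reweight factors (for example a discourse-coherence classifier) without touching the generator.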

Simple But Not Naïve: Fine-Grained Arabic Dialect Identification Using Only N-Grams

Title Simple But Not Naïve: Fine-Grained Arabic Dialect Identification Using Only N-Grams
Authors Sohaila Eltanbouly, May Bashendy, Tamer Elsayed
Abstract This paper presents the participation of the Qatar University team in the MADAR shared task, which addresses the problem of sentence-level fine-grained Arabic Dialect Identification over 25 different Arabic dialects in addition to Modern Standard Arabic. Arabic Dialect Identification is not a trivial task since different dialects share some features, e.g., the same character set and some vocabulary. We opted to adopt a very simple approach in terms of extracted features and classification models; we only utilize word and character n-grams as features, and Naïve Bayes models as classifiers. Surprisingly, the simple approach achieved non-naïve performance. The official results, reported on a held-out testing set, show that the dialect of a given sentence can be identified at an accuracy of 64.58% by our best submitted run.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4624/
PDF https://www.aclweb.org/anthology/W19-4624
PWC https://paperswithcode.com/paper/simple-but-not-naive-fine-grained-arabic
Repo
Framework
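To make "word and character n-grams with Naïve Bayes" concrete, here is a minimal from-scratch multinomial Naïve Bayes with add-alpha smoothing over word unigrams and character trigrams. The n-gram orders, the smoothing value, and the tiny transliterated training set (labelled EGY and LEV) are illustrative assumptions; the team's actual n-gram ranges and data are not given in the abstract.

```python
import math
from collections import Counter, defaultdict


def ngrams(text: str, n_word: int = 1, n_char: int = 3) -> list[str]:
    """Word n-grams plus space-padded character n-grams, tagged by type."""
    words = text.split()
    feats = ["w:" + " ".join(words[i:i + n_word])
             for i in range(len(words) - n_word + 1)]
    padded = f" {text} "
    feats += ["c:" + padded[i:i + n_char]
              for i in range(len(padded) - n_char + 1)]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feat_counts[label].update(ngrams(text))
        self.vocab = {f for counts in self.feat_counts.values() for f in counts}
        self.totals = {y: sum(c.values()) for y, c in self.feat_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        V = len(self.vocab)
        best, best_score = None, -math.inf
        for y in self.class_counts:
            score = math.log(self.class_counts[y] / n_docs)  # log prior
            for f in ngrams(text):
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((self.feat_counts[y][f] + self.alpha)
                                      / (self.totals[y] + self.alpha * V))
            if score > best_score:
                best, best_score = y, score
        return best


# Toy transliterated sentences, for illustration only.
train_texts = ["ezzayak 3amel eh", "ana 3ayez ashrab shay",
               "kifak shu 3am ta3mel", "ana baddi ishrab shai"]
train_labels = ["EGY", "EGY", "LEV", "LEV"]
clf = NaiveBayes().fit(train_texts, train_labels)
print(clf.predict("kifak"), clf.predict("ezzayak"))  # LEV EGY
```

Character n-grams are what let the model generalise across spelling variants that share substrings, which matters when dialects share most of their vocabulary.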

Nonlinear scaling of resource allocation in sensory bottlenecks

Title Nonlinear scaling of resource allocation in sensory bottlenecks
Authors Laura Rose Edmondson, Alejandro Jimenez Rodriguez, Hannes P. Saal
Abstract In many sensory systems, information transmission is constrained by a bottleneck, where the number of output neurons is vastly smaller than the number of input neurons. Efficient coding theory predicts that in these scenarios the brain should allocate its limited resources by removing redundant information. Previous work has typically assumed that receptors are uniformly distributed across the sensory sheet, when in reality these vary in density, often by an order of magnitude. How, then, should the brain efficiently allocate output neurons when the density of input neurons is nonuniform? Here, we show analytically and numerically that resource allocation scales nonlinearly in efficient coding models that maximize information transfer, when inputs arise from separate regions with different receptor densities. Importantly, the proportion of output neurons allocated to a given input region changes depending on the width of the bottleneck, and thus cannot be predicted from input density or region size alone. Narrow bottlenecks favor magnification of high density input regions, while wider bottlenecks often cause contraction. Our results demonstrate that both expansion and contraction of sensory input regions can arise in efficient coding models and that the final allocation crucially depends on the neural resources made available.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/8972-nonlinear-scaling-of-resource-allocation-in-sensory-bottlenecks
PDF http://papers.nips.cc/paper/8972-nonlinear-scaling-of-resource-allocation-in-sensory-bottlenecks.pdf
PWC https://paperswithcode.com/paper/nonlinear-scaling-of-resource-allocation-in
Repo
Framework
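One way to see why allocation can scale nonlinearly: in a linear efficient coding model where the k output neurons span the top-k principal components, and the input regions are uncorrelated (so the covariance is block-diagonal), each eigenvector belongs to one region and a region's share is simply how many of the k largest eigenvalues it owns. The sketch below implements only that counting step. The two eigenvalue spectra are made up for illustration (a "dense" region with 16 receptors and a "sparse" one with 4); they are not derived from the paper's receptor model.

```python
import heapq


def allocate(spectra: dict[str, list[float]], k: int) -> dict[str, int]:
    """Count how many of the k largest eigenvalues come from each region.

    Assumes output neurons span the top-k principal components of a
    block-diagonal input covariance (one block per region).
    """
    pooled = [(lam, region) for region, lams in spectra.items() for lam in lams]
    counts = {region: 0 for region in spectra}
    for _, region in heapq.nlargest(k, pooled):
        counts[region] += 1
    return counts


# Hypothetical spectra: 16 "dense" receptors vs 4 "sparse" ones (80% vs 20% of inputs).
spectra = {
    "dense": [4 * 0.5 ** n for n in range(16)],
    "sparse": [0.5 ** n for n in range(4)],
}
for k in (2, 6, 20):
    print(k, allocate(spectra, k))
# 2 {'dense': 2, 'sparse': 0}    -> 100% to the dense region (magnification)
# 6 {'dense': 4, 'sparse': 2}    -> about 67%, below its 80% input share (contraction)
# 20 {'dense': 16, 'sparse': 4}  -> 80%, matching input share
```

With these toy spectra the dense region's share is not fixed by its input density: it depends on the bottleneck width k, which is the qualitative effect the abstract describes.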

Construction and Alignment of Multilingual Entailment Graphs for Semantic Inference

Title Construction and Alignment of Multilingual Entailment Graphs for Semantic Inference
Authors Sabine Weber, Mark Steedman
Abstract This paper presents ongoing work on the construction and alignment of predicate entailment graphs in English and German. We extract predicate-argument pairs from large corpora of monolingual English and German news text and construct monolingual paraphrase clusters and entailment graphs. We use an aligned subset of entities to derive the bilingual alignment of entities and relations, and achieve better than baseline results on a translated subset of a predicate entailment data set (Levy and Dagan, 2016) and the German portion of XNLI (Conneau et al., 2018).
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/papers/W/W19/W19-3625/
PDF https://www.aclweb.org/anthology/W19-3625
PWC https://paperswithcode.com/paper/construction-and-alignment-of-multilingual
Repo
Framework
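Entailment graphs are commonly built from directional distributional-inclusion scores between predicates, computed over the argument pairs they occur with. The abstract does not say which score the authors use; the sketch below shows one standard choice, Weeds precision, on hypothetical argument-pair counts (encoded as "X|Y" strings) for the predicates "defeat" and "play".

```python
def weeds_precision(u: dict[str, float], v: dict[str, float]) -> float:
    """Directional score for 'u entails v' from argument-pair feature weights.

    High when most of u's feature mass falls on features that v also has,
    i.e. u's argument pairs are (mostly) included in v's.
    """
    total = sum(u.values())
    if total == 0:
        return 0.0
    shared = sum(w for feat, w in u.items() if feat in v)
    return shared / total


# Hypothetical argument-pair counts extracted from news text.
defeat = {"Germany|France": 2, "Liverpool|Chelsea": 3}
play = {"Germany|France": 4, "Liverpool|Chelsea": 1, "Nadal|Federer": 5}

print(weeds_precision(defeat, play))  # 1.0: every 'defeat' pair also occurs with 'play'
print(weeds_precision(play, defeat))  # 0.5: only half of 'play' mass is shared
```

The asymmetry is the point: a high score in one direction only supports an edge "defeat → play", not the reverse.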

Iterative Alignment Network for Continuous Sign Language Recognition

Title Iterative Alignment Network for Continuous Sign Language Recognition
Authors Junfu Pu, Wengang Zhou, Houqiang Li
Abstract In this paper, we propose an alignment network with iterative optimization for weakly supervised continuous sign language recognition. Our framework consists of two modules: a 3D convolutional residual network (3D-ResNet) for feature learning and an encoder-decoder network with connectionist temporal classification (CTC) for sequence modelling. The two modules are optimized alternately. The encoder-decoder sequence learning network includes two decoders, namely an LSTM decoder and a CTC decoder. Both decoders are jointly trained by a maximum likelihood criterion with a soft Dynamic Time Warping (soft-DTW) alignment constraint. The warping path, which indicates the possible alignment between input video clips and sign words, is used as training labels to fine-tune the 3D-ResNet with a classification loss. After fine-tuning, the improved features are extracted to optimize the encoder-decoder sequence learning network in the next iteration. The proposed algorithm is evaluated on two large-scale continuous sign language recognition benchmarks, i.e., RWTH-PHOENIX-Weather and CSL. Experimental results demonstrate the effectiveness of our proposed method.
Tasks Sign Language Recognition
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Pu_Iterative_Alignment_Network_for_Continuous_Sign_Language_Recognition_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Pu_Iterative_Alignment_Network_for_Continuous_Sign_Language_Recognition_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/iterative-alignment-network-for-continuous
Repo
Framework
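For readers unfamiliar with soft-DTW, the sketch below implements its standard dynamic-programming recurrence on two 1-D sequences with a squared-error cost: ordinary DTW with the hard minimum replaced by a smoothed minimum controlled by gamma. It only illustrates the recurrence; how the paper couples it with the LSTM and CTC decoders, and the cost it uses between video clips and sign words, are not shown.

```python
import math


def softmin(values: list[float], gamma: float) -> float:
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma)))."""
    m = min(values)
    if math.isinf(m):
        return m
    # Subtract the minimum before exponentiating (log-sum-exp trick).
    s = sum(math.exp(-(v - m) / gamma) for v in values)
    return m - gamma * math.log(s)


def dtw_table(x: list[float], y: list[float], combine) -> float:
    """Generic DTW recurrence; `combine` reduces the three predecessor cells."""
    n, m = len(x), len(y)
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i][j] = cost + combine([R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]])
    return R[n][m]


def hard_dtw(x, y):
    return dtw_table(x, y, min)


def soft_dtw(x, y, gamma: float = 1.0):
    return dtw_table(x, y, lambda v: softmin(v, gamma))


x = [0.0, 1.0, 2.0]
y = [0.0, 1.0, 1.0, 2.0]
print(hard_dtw(x, y))             # 0.0: the repeated 1.0 is absorbed by warping
print(soft_dtw(x, y, gamma=1.0))  # below 0.0, since softmin never exceeds min
print(soft_dtw(x, y, gamma=1e-3)) # close to 0.0 as gamma -> 0
```

Because softmin is differentiable everywhere, soft-DTW can serve as a training loss or constraint, which hard DTW cannot; as gamma shrinks it approaches classic DTW.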

Are ambiguous conjunctions problematic for machine translation?

Title Are ambiguous conjunctions problematic for machine translation?
Authors Maja Popović, Sheila Castilho
Abstract The translation of ambiguous words still poses challenges for machine translation. In this work, we carry out a systematic quantitative analysis regarding the ability of different machine translation systems to disambiguate the source language conjunctions "but" and "and". We evaluate specialised test sets focused on the translation of these two conjunctions. The test sets contain source languages that do not distinguish different variants of the given conjunction, whereas the target languages do. In total, we evaluate the conjunction "but" on 20 translation outputs, and the conjunction "and" on 10. All machine translation systems almost perfectly recognise one variant of the target conjunction, especially for the source conjunction "but". The other target variant, however, represents a challenge for machine translation systems, with accuracy varying from 50% to 95% for "but" and from 20% to 57% for "and". The major error for all systems is replacing the correct target variant with the opposite one.
Tasks Machine Translation
Published 2019-09-01
URL https://www.aclweb.org/anthology/R19-1111/
PDF https://www.aclweb.org/anthology/R19-1111
PWC https://paperswithcode.com/paper/are-ambiguous-conjunctions-problematic-for
Repo
Framework

Benchmark Dataset for Propaganda Detection in Czech Newspaper Texts

Title Benchmark Dataset for Propaganda Detection in Czech Newspaper Texts
Authors Vít Baisa, Ondřej Herman, Ales Horak
Abstract Propaganda of various pressure groups, ranging from big economies to ideological blocs, is often presented in the form of objective newspaper texts. However, this apparent objectivity is undermined by imbalanced views and distorted attitudes conveyed through various manipulative stylistic techniques. Within the project Manipulative Propaganda Techniques in the Age of Internet, we develop a new resource for the automatic analysis of stylistic mechanisms for influencing readers' opinions. In its current version, the resource consists of 7,494 newspaper articles from four selected Czech digital news servers annotated for the presence of specific manipulative techniques. In this paper, we present the current state of the annotations and describe the structure of the dataset in detail. We also offer an evaluation of bag-of-words classification algorithms for the annotated manipulative techniques.
Tasks
Published 2019-09-01
URL https://www.aclweb.org/anthology/R19-1010/
PDF https://www.aclweb.org/anthology/R19-1010
PWC https://paperswithcode.com/paper/benchmark-dataset-for-propaganda-detection-in
Repo
Framework

Measuring Diachronic Evolution of Evaluative Adjectives with Word Embeddings: the Case for English, Norwegian, and Russian

Title Measuring Diachronic Evolution of Evaluative Adjectives with Word Embeddings: the Case for English, Norwegian, and Russian
Authors Julia Rodina, Daria Bakshandaeva, Vadim Fomin, Andrey Kutuzov, Samia Touileb, Erik Velldal
Abstract We measure the intensity of diachronic semantic shifts in adjectives in English, Norwegian and Russian across 5 decades. This is done in order to test the hypothesis that evaluative adjectives are more prone to temporal semantic change. To this end, 6 different methods of quantifying semantic change are used. Frequency-controlled experimental results show that, depending on the particular method, evaluative adjectives either do not differ from other types of adjectives in terms of semantic change or appear to actually be less prone to shifting (particularly, to 'jitter'-type shifting). Thus, in spite of many well-known examples of semantically changing evaluative adjectives (like 'terrific' or 'incredible'), it seems that such cases are not specific to this particular type of words.
Tasks Word Embeddings
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4725/
PDF https://www.aclweb.org/anthology/W19-4725
PWC https://paperswithcode.com/paper/measuring-diachronic-evolution-of-evaluative
Repo
Framework
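The abstract mentions six methods for quantifying semantic change without naming them. As one hedged example of the general idea, the sketch below scores change as 1 minus the Jaccard overlap of a word's k nearest neighbours (by cosine similarity) in embeddings trained on two periods. This measure needs no alignment between the two vector spaces; whether it is among the authors' six is not stated in the abstract. The 2-D vectors are invented toy data.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def neighbours(word: str, emb: dict[str, list[float]], k: int) -> set[str]:
    """The k words most cosine-similar to `word` within one period's embeddings."""
    others = [w for w in emb if w != word]
    others.sort(key=lambda w: cosine(emb[word], emb[w]), reverse=True)
    return set(others[:k])


def neighbourhood_shift(word: str, emb_old, emb_new, k: int = 2) -> float:
    """1 - Jaccard overlap of the word's k nearest neighbours in two periods.

    Compares neighbour identities only, so the two spaces need not be aligned.
    """
    a = neighbours(word, emb_old, k)
    b = neighbours(word, emb_new, k)
    return 1 - len(a & b) / len(a | b)


# Invented toy embeddings for two decades.
emb_old = {"terrific": [1.0, 0.1], "terrible": [0.9, 0.2], "frightful": [1.0, 0.0],
           "great": [0.1, 1.0], "excellent": [0.0, 1.0]}
emb_new = {"terrific": [0.1, 1.0], "terrible": [1.0, 0.1], "frightful": [1.0, 0.0],
           "great": [0.2, 0.9], "excellent": [0.0, 1.0]}

print(neighbourhood_shift("terrific", emb_old, emb_new))  # 1.0: neighbours fully replaced
```

A real study would also control for frequency, as the abstract notes, since rare words tend to get noisier neighbourhoods.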

EmoTag – Towards an Emotion-Based Analysis of Emojis

Title EmoTag – Towards an Emotion-Based Analysis of Emojis
Authors Abu Awal Md Shoeb, Shahab Raji, Gerard de Melo
Abstract Despite being a fairly recent phenomenon, emojis have quickly become ubiquitous. Besides their extensive use in social media, they are now also invoked in customer surveys and feedback forms. Hence, there is a need for techniques to understand their sentiment and emotion. In this work, we provide a method to quantify the emotional association of basic emotions such as anger, fear, joy, and sadness for a set of emojis. We collect and process a unique corpus of 20 million emoji-centric tweets, such that we can capture rich emoji semantics using a comparably small dataset. We evaluate the induced emotion profiles of emojis with regard to their ability to predict word affect intensities as well as sentiment scores.
Tasks
Published 2019-09-01
URL https://www.aclweb.org/anthology/R19-1126/
PDF https://www.aclweb.org/anthology/R19-1126
PWC https://paperswithcode.com/paper/emotag-towards-an-emotion-based-analysis-of
Repo
Framework
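The abstract does not specify how emotional association is quantified. A common baseline is pointwise mutual information (PMI) between an emoji and an emotion, estimated from co-occurrence in tweets. The sketch below assumes a tweet "expresses" an emotion if it contains a word from a small, hypothetical emotion word list; both the tweets and the word list are toy data, not the EmoTag corpus or lexicon.

```python
import math


def emoji_emotion_pmi(tweets: list[str], emoji: str, emotion_words: set[str]) -> float:
    """Tweet-level PMI (base 2) between an emoji and an emotion word set."""
    n = len(tweets)
    has_emoji = [emoji in t for t in tweets]
    has_emotion = [bool(emotion_words & set(t.split())) for t in tweets]
    joint = sum(a and b for a, b in zip(has_emoji, has_emotion))
    if joint == 0:
        return float("-inf")  # never co-occur in this sample
    p_emoji = sum(has_emoji) / n
    p_emotion = sum(has_emotion) / n
    return math.log2((joint / n) / (p_emoji * p_emotion))


# Toy data for illustration only.
tweets = ["so happy today 😂", "lol 😂 joy", "i am scared 😱",
          "terrified of exams 😱", "happy weekend", "just a normal day"]
joy_words = {"happy", "joy"}

print(emoji_emotion_pmi(tweets, "😂", joy_words))  # about 1.0 (positive association)
print(emoji_emotion_pmi(tweets, "😱", joy_words))  # -inf (no co-occurrence)
```

Raw PMI is unstable for rare emojis, which is one reason a large emoji-centric corpus helps.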

On Making Reading Comprehension More Comprehensive

Title On Making Reading Comprehension More Comprehensive
Authors Matt Gardner, Jonathan Berant, Hannaneh Hajishirzi, Alon Talmor, Sewon Min
Abstract Machine reading comprehension, the task of evaluating a machine's ability to comprehend a passage of text, has seen a surge in popularity in recent years. There are many datasets that are targeted at reading comprehension, and many systems that perform as well as humans on some of these datasets. Despite all of this interest, there is no work that systematically defines what reading comprehension is. In this work, we justify a question answering approach to reading comprehension and describe the various kinds of questions one might use to more fully test a system's comprehension of a passage, moving beyond questions that only probe local predicate-argument structures. The main pitfall of this approach is that questions can easily have surface cues or other biases that allow a model to shortcut the intended reasoning process. We discuss ways proposed in current literature to mitigate these shortcuts, and we conclude with recommendations for future dataset collection efforts.
Tasks Machine Reading Comprehension, Question Answering, Reading Comprehension
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5815/
PDF https://www.aclweb.org/anthology/D19-5815
PWC https://paperswithcode.com/paper/on-making-reading-comprehension-more
Repo
Framework

Using natural conversations to classify autism with limited data: Age matters

Title Using natural conversations to classify autism with limited data: Age matters
Authors Michael Hauser, Evangelos Sariyanidi, Birkan Tunc, Casey Zampella, Edward Brodkin, Robert Schultz, Julia Parish-Morris
Abstract Spoken language ability is highly heterogeneous in Autism Spectrum Disorder (ASD), which complicates efforts to identify linguistic markers for use in diagnostic classification, clinical characterization, and for research and clinical outcome measurement. Machine learning techniques that harness the power of multivariate statistics and non-linear data analysis hold promise for modeling this heterogeneity, but many models require enormous datasets, which are unavailable for most psychiatric conditions (including ASD). In lieu of such datasets, good models can still be built by leveraging domain knowledge. In this study, we compare two machine learning approaches: the first approach incorporates prior knowledge about language variation across middle childhood, adolescence, and adulthood to classify 6-minute naturalistic conversation samples from 140 age- and IQ-matched participants (81 with ASD), while the other approach treats all ages the same. We found that individual age-informed models were significantly more accurate than a single model tasked with building a common algorithm across age groups. Furthermore, predictive linguistic features differed significantly by age group, confirming the importance of considering age-related changes in language use when classifying ASD. Our results suggest that limitations imposed by heterogeneity inherent to ASD and from developmental change with age can be (at least partially) overcome using domain knowledge, such as understanding spoken language development from childhood through adulthood.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-3006/
PDF https://www.aclweb.org/anthology/W19-3006
PWC https://paperswithcode.com/paper/using-natural-conversations-to-classify
Repo
Framework

NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit

Title NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit
Authors Liqun Liu, Funan Mu, Pengyu Li, Xin Mu, Jing Tang, Xingsheng Ai, Ran Fu, Lifeng Wang, Xing Zhou
Abstract In this paper, we introduce NeuralClassifier, a toolkit for neural hierarchical multi-label text classification. NeuralClassifier is designed for quick implementation of neural models for the hierarchical multi-label classification task, which is more challenging and more common in real-world scenarios. A salient feature is that NeuralClassifier currently provides a variety of text encoders, such as FastText, TextCNN, TextRNN, RCNN, VDCNN, DPCNN, DRNN, AttentiveConvNet and a Transformer encoder. It also supports other text classification scenarios, including binary-class and multi-class classification. Because it is built on PyTorch and computes its core operations in batches, the toolkit runs efficiently with GPU acceleration. Experiments show that models built with our toolkit achieve performance comparable to results reported in the literature.
Tasks Multi-Label Classification, Multi-Label Text Classification, Text Classification
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-3015/
PDF https://www.aclweb.org/anthology/P19-3015
PWC https://paperswithcode.com/paper/neuralclassifier-an-open-source-neural
Repo
Framework
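A generic way to decode hierarchical multi-label predictions is to apply an independent sigmoid per label, keep labels above a threshold, and then add every ancestor of each kept label so the output is consistent with the hierarchy. The sketch below does this for labels written as "/"-separated paths. The path format, the 0.5 threshold, and the function names are assumptions for illustration; they are not NeuralClassifier's documented label format or API.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def decode(logits: dict[str, float], threshold: float = 0.5, sep: str = "/") -> set[str]:
    """Independent per-label sigmoid, then close the prediction set under ancestors."""
    predicted = {label for label, z in logits.items() if sigmoid(z) >= threshold}
    closed = set()
    for label in predicted:
        parts = label.split(sep)
        for depth in range(1, len(parts) + 1):
            closed.add(sep.join(parts[:depth]))
    return closed


# Hypothetical logits from some text encoder's output layer.
logits = {"Sports/Football": 2.0, "Sports/Tennis": -1.5, "Politics": -3.0}
print(sorted(decode(logits)))  # ['Sports', 'Sports/Football']
```

Using independent sigmoids (rather than a softmax) is what allows several labels, or none, to be active at once.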

Discovering the Functions of Language in Online Forums

Title Discovering the Functions of Language in Online Forums
Authors Youmna Ismaeil, Oana Balalau, Paramita Mirza
Abstract In this work, we revisit the functions of language proposed by linguist Roman Jakobson and we highlight their potential in analyzing online forum conversations. We investigate the relationship between functions and other properties of comments, such as controversiality. We propose and evaluate a semi-supervised framework for predicting the functions of Reddit comments. To accommodate further research, we release a corpus of 165K comments annotated with their functions of language.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5534/
PDF https://www.aclweb.org/anthology/D19-5534
PWC https://paperswithcode.com/paper/discovering-the-functions-of-language-in
Repo
Framework

Elaborate Monocular Point and Line SLAM With Robust Initialization

Title Elaborate Monocular Point and Line SLAM With Robust Initialization
Authors Sang Jun Lee, Sung Soo Hwang
Abstract This paper presents a monocular indirect SLAM system which performs robust initialization and accurate localization. For initialization, we utilize a matrix factorization-based method. Matrix factorization-based methods require that extracted feature points be tracked in all used frames. Since consistent tracking is difficult in challenging environments, a geometric interpolation that utilizes epipolar geometry is proposed. For localization, 3D lines are utilized. We propose the use of Plücker line coordinates to represent geometric information of lines. We also propose an orthonormal representation of Plücker line coordinates and Jacobians of lines for better optimization. Experimental results show that the proposed initialization generates a consistent and robust map in linear time with fast convergence, even in challenging scenes. Localization using the proposed line representations is also faster, more accurate and more memory-efficient than other state-of-the-art methods.
Tasks
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Lee_Elaborate_Monocular_Point_and_Line_SLAM_With_Robust_Initialization_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Lee_Elaborate_Monocular_Point_and_Line_SLAM_With_Robust_Initialization_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/elaborate-monocular-point-and-line-slam-with
Repo
Framework
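As background for the line representation mentioned above, the sketch below computes Plücker coordinates (d, m) of a 3D line from two points on it: direction d = p2 - p1 and moment m = p1 × p2. These satisfy d · m = 0, are defined only up to a common scale, and give the line's distance to the origin as |m| / |d|. The (d, m) ordering is one convention (some texts write (m, d)), and the paper's orthonormal representation and Jacobians for optimization are not shown.

```python
import math

Vec3 = tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def plucker_from_points(p1: Vec3, p2: Vec3) -> tuple[Vec3, Vec3]:
    """Plücker coordinates (d, m) of the line through p1 and p2."""
    d = tuple(b - a for a, b in zip(p1, p2))  # direction
    m = cross(p1, p2)                         # moment, equal to p1 x d
    return d, m


def distance_to_origin(d: Vec3, m: Vec3) -> float:
    return math.sqrt(dot(m, m)) / math.sqrt(dot(d, d))


d, m = plucker_from_points((1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
print(d, m)                     # (0.0, 1.0, 0.0) (0.0, 0.0, 1.0)
print(dot(d, m))                # 0.0: the Plücker constraint holds
print(distance_to_origin(d, m)) # 1.0: the line x = 1, z = 0
```

Picking two other points on the same line, e.g. (1, 2, 0) and (1, 4, 0), yields (d, m) scaled by 2, which is why Plücker coordinates are treated as homogeneous and need a minimal (e.g. orthonormal) parameterization for optimization.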