January 24, 2020

2527 words 12 mins read

Paper Group NANR 130

Automated Identification of Verbally Abusive Behaviors in Online Discussions. Subspace Structure-Aware Spectral Clustering for Robust Subspace Clustering. Vaijayantīkośa Knowledge-Net. Preemptive Toxic Language Detection in Wikipedia Comments Using Thread-Level Context. Assessing Arabic Weblog Credibility via Deep Co-learning. Chains-of-Reason …

Automated Identification of Verbally Abusive Behaviors in Online Discussions

Title Automated Identification of Verbally Abusive Behaviors in Online Discussions
Authors Srecko Joksimovic, Ryan S. Baker, Jaclyn Ocumpaugh, Juan Miguel L. Andres, Ivan Tot, Elle Yuan Wang, Shane Dawson
Abstract Discussion forum participation represents one of the crucial factors for learning and often the only way of supporting social interactions in online settings. However, as much as sharing new ideas or asking thoughtful questions contributes to learning, verbally abusive behaviors, such as expressing negative emotions in online discussions, could have disproportionately detrimental effects. To provide means for mitigating the potential negative effects on course participation and learning, we developed an automated classifier for identifying communication that shows linguistic patterns associated with hostility in online forums. In so doing, we employ several well-established automated text analysis tools and build on the common practices for handling highly imbalanced datasets and reducing the sensitivity to overfitting. Although still in its infancy, our approach shows promising results (ROC AUC .73) towards establishing a robust detector of abusive behaviors. We further provide an overview of the classification (linguistic and contextual) features most indicative of online aggression.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-3505/
PDF https://www.aclweb.org/anthology/W19-3505
PWC https://paperswithcode.com/paper/automated-identification-of-verbally-abusive
Repo
Framework
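The imbalance-handling step this abstract mentions can be sketched as a class-weighted classifier. The toy example below is an assumption for illustration only: it uses a bag-of-words logistic regression with inverse-frequency class weights, not the authors' actual feature set (they rely on established text-analysis tools), and the names `featurize` and `train_weighted_logreg` are hypothetical.

```python
# Hedged sketch: class-weighted bag-of-words logistic regression for a rare
# "abusive" class. Illustrative only; not the paper's exact pipeline.
import math
from collections import Counter

def featurize(text, vocab):
    counts = Counter(text.lower().split())
    return [counts.get(w, 0) for w in vocab]

def train_weighted_logreg(X, y, lr=0.5, epochs=200):
    # Weight each class inversely to its frequency so the rare abusive
    # class contributes as much total gradient as the majority class.
    n = len(y)
    pos = sum(y)
    neg = n - pos
    w_pos, w_neg = n / (2 * pos), n / (2 * neg)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1 / (1 + math.exp(-z))
            g = (p - yi) * (w_pos if yi == 1 else w_neg)
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b

def predict(w, b, xi):
    z = b + sum(wj * xj for wj, xj in zip(w, xi))
    return 1 / (1 + math.exp(-z))
```

Without the class weights, a classifier trained on such skewed data tends to predict the majority class for everything, which is exactly the failure mode the paper's "common practices for handling highly imbalanced datasets" address.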

Subspace Structure-Aware Spectral Clustering for Robust Subspace Clustering

Title Subspace Structure-Aware Spectral Clustering for Robust Subspace Clustering
Authors Masataka Yamaguchi, Go Irie, Takahito Kawanishi, Kunio Kashino
Abstract Subspace clustering is the problem of partitioning data drawn from a union of multiple subspaces. The most popular subspace clustering framework in recent years is the graph clustering-based approach, which performs subspace clustering in two steps: graph construction and graph clustering. Although both steps are equally important for accurate clustering, the vast majority of work has focused on improving the graph construction step rather than the graph clustering step. In this paper, we propose a novel graph clustering framework for robust subspace clustering. By incorporating a geometry-aware term with the spectral clustering objective, we encourage our framework to be robust to noise and outliers in given affinity matrices. We also develop an efficient expectation-maximization-based algorithm for optimization. Through extensive experiments on four real-world datasets, we demonstrate that the proposed method outperforms existing methods.
Tasks Graph Clustering, graph construction
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Yamaguchi_Subspace_Structure-Aware_Spectral_Clustering_for_Robust_Subspace_Clustering_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Yamaguchi_Subspace_Structure-Aware_Spectral_Clustering_for_Robust_Subspace_Clustering_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/subspace-structure-aware-spectral-clustering
Repo
Framework
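The graph clustering step that the paper improves on can be sketched in its plain form. The code below shows only the standard spectral 2-way cut from an affinity matrix; the paper's contributions (the geometry-aware robustness term and the EM-based solver) are not reproduced here, and `spectral_bipartition` is a hypothetical name.

```python
# Hedged sketch of plain spectral clustering (2-way cut) from a symmetric
# nonnegative affinity matrix A. The paper's geometry-aware term is omitted.
import numpy as np

def spectral_bipartition(A):
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)
    # The eigenvector of the second-smallest eigenvalue separates the two
    # weakly connected groups; a sign split recovers the partition.
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)
```

Because this step takes the affinity matrix as given, noise and outliers in the graph-construction step propagate directly into the cut, which is the robustness gap the paper targets.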

Vaijayantīkośa Knowledge-Net

Title Vaijayantīkośa Knowledge-Net
Authors Aruna Vayuvegula, Satish Kanugovi, Sivaja S Nair, Shivani V
Abstract
Tasks
Published 2019-10-01
URL https://www.aclweb.org/anthology/W19-7510/
PDF https://www.aclweb.org/anthology/W19-7510
PWC https://paperswithcode.com/paper/vaijayantikosa-knowledge-net
Repo
Framework

Preemptive Toxic Language Detection in Wikipedia Comments Using Thread-Level Context

Title Preemptive Toxic Language Detection in Wikipedia Comments Using Thread-Level Context
Authors Mladen Karan, Jan Šnajder
Abstract We address the task of automatically detecting toxic content in user generated texts. We focus on exploring the potential for preemptive moderation, i.e., predicting whether a particular conversation thread will, in the future, incite a toxic comment. Moreover, we perform preliminary investigation of whether a model that jointly considers all comments in a conversation thread outperforms a model that considers only individual comments. Using an existing dataset of conversations among Wikipedia contributors as a starting point, we compile a new large-scale dataset for this task consisting of labeled comments and comments from their conversation threads.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-3514/
PDF https://www.aclweb.org/anthology/W19-3514
PWC https://paperswithcode.com/paper/preemptive-toxic-language-detection-in
Repo
Framework

Assessing Arabic Weblog Credibility via Deep Co-learning

Title Assessing Arabic Weblog Credibility via Deep Co-learning
Authors Chadi Helwe, Shady Elbassuoni, Ayman Al Zaatari, Wassim El-Hajj
Abstract Assessing the credibility of online content has garnered a lot of attention lately. We focus on one such type of online content, namely weblogs or blogs for short. Some recent work attempted the task of automatically assessing the credibility of blogs, typically via machine learning. However, in the case of Arabic blogs, there are hardly any datasets available that can be used to train robust machine learning models for this difficult task. To overcome the lack of sufficient training data, we propose deep co-learning, a semi-supervised end-to-end deep learning approach to assess the credibility of Arabic blogs. In deep co-learning, multiple weak deep neural network classifiers are trained using a small labeled dataset, and each using a different view of the data. Each one of these classifiers is then used to classify unlabeled data, and its prediction is used to train the other classifiers in a semi-supervised fashion. We evaluate our deep co-learning approach on an Arabic blogs dataset, and we report significant improvements in performance compared to many baselines including fully-supervised deep learning models as well as ensemble models.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4614/
PDF https://www.aclweb.org/anthology/W19-4614
PWC https://paperswithcode.com/paper/assessing-arabic-weblog-credibility-via-deep
Repo
Framework
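The co-training loop behind "deep co-learning" can be sketched in miniature. In the example below the two view-specific "classifiers" are toy one-dimensional nearest-centroid models rather than deep networks, and agreement between views stands in for the confidence-based selection the paper uses; all names (`fit_centroids`, `co_train`) are hypothetical.

```python
# Hedged sketch of co-training: two classifiers, each on a different view,
# label unlabeled examples for each other. Toy centroid models, not deep nets.
def fit_centroids(xs, ys):
    # One centroid per binary class on a single scalar feature (one view).
    groups = {0: [], 1: []}
    for x, y in zip(xs, ys):
        groups[y].append(x)
    return {c: sum(v) / len(v) for c, v in groups.items()}

def predict(cent, x):
    return min(cent, key=lambda c: abs(x - cent[c]))

def co_train(view_a, view_b, labels, unlabeled_a, unlabeled_b, rounds=3):
    xa, xb, ys = list(view_a), list(view_b), list(labels)
    ua, ub = list(unlabeled_a), list(unlabeled_b)
    for _ in range(rounds):
        if not ua:
            break
        ca, cb = fit_centroids(xa, ys), fit_centroids(xb, ys)
        rest_a, rest_b = [], []
        for a, b in zip(ua, ub):
            pa, pb = predict(ca, a), predict(cb, b)
            if pa == pb:  # cross-view agreement as a crude confidence proxy
                xa.append(a); xb.append(b); ys.append(pa)
            else:
                rest_a.append(a); rest_b.append(b)
        ua, ub = rest_a, rest_b
    return fit_centroids(xa, ys), fit_centroids(xb, ys)
```

The value of the scheme is that each view's mistakes are partially independent, so pseudo-labels from one view add genuinely new training signal to the other — which is why it helps when labeled Arabic-blog data is scarce.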

Chains-of-Reasoning at TextGraphs 2019 Shared Task: Reasoning over Chains of Facts for Explainable Multi-hop Inference

Title Chains-of-Reasoning at TextGraphs 2019 Shared Task: Reasoning over Chains of Facts for Explainable Multi-hop Inference
Authors Rajarshi Das, Ameya Godbole, Manzil Zaheer, Shehzaad Dhuliawala, Andrew McCallum
Abstract This paper describes our submission to the shared task on “Multi-hop Inference Explanation Regeneration” in the TextGraphs workshop at EMNLP 2019 (Jansen and Ustalov, 2019). Our system identifies chains of facts relevant to explain an answer to an elementary science examination question. To counter the problem of ‘spurious chains’ leading to ‘semantic drift’, we train a ranker that uses contextualized representations of facts to score their relevance for explaining an answer to a question. Our system was ranked first w.r.t. the mean average precision (MAP) metric, outperforming the second-best system by 14.95 points.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5313/
PDF https://www.aclweb.org/anthology/D19-5313
PWC https://paperswithcode.com/paper/chains-of-reasoning-at-textgraphs-2019-shared
Repo
Framework

Minimally-Augmented Grammatical Error Correction

Title Minimally-Augmented Grammatical Error Correction
Authors Roman Grundkiewicz, Marcin Junczys-Dowmunt
Abstract There has been an increased interest in low-resource approaches to automatic grammatical error correction. We introduce Minimally-Augmented Grammatical Error Correction (MAGEC) that does not require any error-labelled data. Our unsupervised approach is based on a simple but effective synthetic error generation method based on confusion sets from inverted spell-checkers. In low-resource settings, we outperform the current state-of-the-art results for German and Russian GEC tasks by a large margin without using any real error-annotated training data. When combined with labelled data, our method can serve as an efficient pre-training technique.
Tasks Grammatical Error Correction
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5546/
PDF https://www.aclweb.org/anthology/D19-5546
PWC https://paperswithcode.com/paper/minimally-augmented-grammatical-error
Repo
Framework
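The confusion-set-based synthetic error generation can be sketched as follows. The confusion sets here are tiny hand-made examples; the paper derives its sets from inverted spell-checkers, and the `corrupt` function name is hypothetical.

```python
# Hedged sketch of MAGEC-style synthetic error generation: corrupt clean text
# by swapping words for entries from their confusion sets.
import random

CONFUSIONS = {  # hypothetical example sets; the paper builds these
    "their": ["there", "they're"],  # from inverted spell-checkers
    "then": ["than"],
    "affect": ["effect"],
}

def corrupt(sentence, p=0.5, rng=None):
    rng = rng or random.Random(0)  # seeded for reproducibility
    out = []
    for tok in sentence.split():
        alts = CONFUSIONS.get(tok.lower())
        if alts and rng.random() < p:
            out.append(rng.choice(alts))  # inject a plausible error
        else:
            out.append(tok)
    return " ".join(out)
```

Pairing each corrupted sentence with its clean original yields (source, target) training pairs for a correction model without any human error annotation, which is the core of the unsupervised setting described above.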

Deep Parametric Indoor Lighting Estimation

Title Deep Parametric Indoor Lighting Estimation
Authors Marc-Andre Gardner, Yannick Hold-Geoffroy, Kalyan Sunkavalli, Christian Gagne, Jean-Francois Lalonde
Abstract We present a method to estimate lighting from a single image of an indoor scene. Previous work has used an environment map representation that does not account for the localized nature of indoor lighting. Instead, we represent lighting as a set of discrete 3D lights with geometric and photometric parameters. We train a deep neural network to regress these parameters from a single image, on a dataset of environment maps annotated with depth. We propose a differentiable layer to convert these parameters to an environment map to compute our loss; this bypasses the challenge of establishing correspondences between estimated and ground truth lights. We demonstrate, via quantitative and qualitative evaluations, that our representation and training scheme lead to more accurate results compared to previous work, while allowing for more realistic 3D object compositing with spatially-varying lighting.
Tasks
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Gardner_Deep_Parametric_Indoor_Lighting_Estimation_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Gardner_Deep_Parametric_Indoor_Lighting_Estimation_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/deep-parametric-indoor-lighting-estimation-1
Repo
Framework
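The idea of converting discrete light parameters back into an environment map can be illustrated with a simple splatting operation. The sketch below is an assumption for illustration: it renders each (direction, RGB intensity, angular size) light onto a small latitude-longitude map with a smooth falloff, whereas the paper defines its own differentiable layer; `lights_to_envmap` is a hypothetical name.

```python
# Hedged sketch: splat parametric lights onto a lat-long environment map.
# Illustrates the parametric-to-envmap conversion concept, not the paper's layer.
import numpy as np

def lights_to_envmap(lights, h=16, w=32):
    # Unit direction for every pixel of the lat-long map.
    theta = (np.arange(h) + 0.5) / h * np.pi        # polar angle
    phi = (np.arange(w) + 0.5) / w * 2 * np.pi      # azimuth
    st, ct = np.sin(theta)[:, None], np.cos(theta)[:, None]
    dirs = np.stack([st * np.cos(phi)[None, :],
                     st * np.sin(phi)[None, :],
                     np.broadcast_to(ct, (h, w))], axis=-1)
    env = np.zeros((h, w, 3))
    for d, rgb, size in lights:
        d = np.asarray(d, float)
        d /= np.linalg.norm(d)
        cos_ang = dirs @ d                           # alignment with light axis
        # Smooth angular falloff controlled by the light's size parameter.
        env += np.exp((cos_ang - 1.0) / size)[..., None] * np.asarray(rgb)
    return env
```

Because every operation here is differentiable in the light parameters, a loss computed on the resulting map can be backpropagated to the parameter regressor — sidestepping the correspondence problem between predicted and ground-truth lights, as the abstract notes.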

CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations

Title CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations
Authors Frederic Bechet, Cindy Aloui, Delphine Charlet, Geraldine Damnati, Johannes Heinecke, Alexis Nasr, Frederic Herledan
Abstract Machine reading comprehension is a task related to question answering where questions are not generic in scope but are tied to a particular document. Recently, very large corpora (SQuAD, MS MARCO) containing (document, question, answer) triplets were made available to the scientific community to develop supervised methods based on deep neural networks, with promising results. These methods need very large training corpora to be effective; however, such data only exist for English and Chinese at the moment. The aim of this study is the development of such resources for other languages by generating questions semi-automatically from the semantic frame analysis of large corpora. The collection of natural questions is reduced to a validation/test set. We applied this method to the CALOR-Frame French corpus to develop the CALOR-QUEST resource presented in this paper.
Tasks Machine Reading Comprehension, Question Answering, Reading Comprehension
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5803/
PDF https://www.aclweb.org/anthology/D19-5803
PWC https://paperswithcode.com/paper/calor-quest-generating-a-training-corpus-for
Repo
Framework
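The semi-automatic question generation from frame annotations can be sketched with role-based templates. The frames and templates below are invented toy examples (CALOR-QUEST works on French frame-semantic annotations with its own generation procedure), and `questions_from_frame` is a hypothetical name.

```python
# Hedged sketch: instantiate (question, answer) pairs from a predicate and
# its frame elements via hand-written templates. Toy data, not CALOR-QUEST's.
TEMPLATES = {
    "Agent": "Who {verb} {theme}?",
    "Location": "Where was {theme} {verb}?",
}

def questions_from_frame(frame):
    slots = {"verb": frame["verb"], "theme": frame.get("Theme", "")}
    out = []
    for role, tpl in TEMPLATES.items():
        if role in frame:
            # The filler of the questioned role becomes the answer span.
            out.append((tpl.format(**slots), frame[role]))
    return out
```

Each generated pair uses the frame-element filler as its answer, so a large frame-annotated corpus yields training triplets cheaply, leaving only a small validation/test set to be collected as natural questions.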

Quasi-Unsupervised Color Constancy

Title Quasi-Unsupervised Color Constancy
Authors Simone Bianco, Claudio Cusano
Abstract We present here a method for computational color constancy in which a deep convolutional neural network is trained to detect achromatic pixels in color images after they have been converted to grayscale. The method does not require any information about the illuminant in the scene and relies on the weak assumption, fulfilled by almost all images available on the web, that training images have been approximately balanced. Because of this requirement we define our method as quasi-unsupervised. After training, unbalanced images can be processed thanks to the preliminary conversion to grayscale of the input to the neural network. The results of extensive experiments demonstrate that the proposed method outperforms the other unsupervised methods in the state of the art while remaining flexible enough to be fine-tuned in a supervised fashion, reaching performance comparable to that of the best supervised methods.
Tasks Color Constancy
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Bianco_Quasi-Unsupervised_Color_Constancy_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Bianco_Quasi-Unsupervised_Color_Constancy_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/quasi-unsupervised-color-constancy
Repo
Framework

Mawdoo3 AI at MADAR Shared Task: Arabic Tweet Dialect Identification

Title Mawdoo3 AI at MADAR Shared Task: Arabic Tweet Dialect Identification
Authors Bashar Talafha, Wael Farhan, Ahmed Altakrouri, Hussein Al-Natsheh
Abstract Arabic dialect identification is an inherently complex problem, as Arabic dialect taxonomy is convoluted and aims to dissect a continuous space rather than a discrete one. In this work, we present machine and deep learning approaches to predict 21 fine-grained dialects from a set of given tweets per user. We adopted numerous feature extraction methods, most of which showed improvement in the final model, such as word embeddings, TF-IDF, and other tweet features. Our results show that a simple LinearSVC can outperform any complex deep learning model given a set of curated features. With a relatively complex user-voting mechanism, we were able to achieve a macro-averaged F1-score of 71.84% on MADAR shared subtask-2. Our best submitted model ranked second out of all participating teams.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4629/
PDF https://www.aclweb.org/anthology/W19-4629
PWC https://paperswithcode.com/paper/mawdoo3-ai-at-madar-shared-task-arabic-tweet
Repo
Framework
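The TF-IDF features that fed the winning LinearSVC can be sketched in pure Python. This is a minimal illustration under the usual definitions (term frequency times log inverse document frequency); the real system would use a library vectorizer, and the `tfidf` function here is hypothetical.

```python
# Hedged sketch: whitespace-tokenized TF-IDF vectors, the kind of curated
# sparse feature a linear classifier can exploit well on short tweets.
import math
from collections import Counter

def tfidf(docs):
    n = len(docs)
    tokenized = [doc.split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))            # document frequency per term
    vocab = sorted(df)
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] / len(toks) * idf[w] for w in vocab])
    return vocab, vectors
```

On short, morphologically rich tweets, such sparse lexical features often carry most of the dialect signal, which helps explain the abstract's finding that a linear model over curated features beat more complex deep models.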

Memory Graph Networks for Explainable Memory-grounded Question Answering

Title Memory Graph Networks for Explainable Memory-grounded Question Answering
Authors Seungwhan Moon, Pararth Shah, Anuj Kumar, Rajen Subba
Abstract We introduce Episodic Memory QA, the task of answering personal user questions grounded on memory graph (MG), where episodic memories and related entity nodes are connected via relational edges. We create a new benchmark dataset first by generating synthetic memory graphs with simulated attributes, and by composing 100K QA pairs for the generated MG with bootstrapped scripts. To address the unique challenges for the proposed task, we propose Memory Graph Networks (MGN), a novel extension of memory networks to enable dynamic expansion of memory slots through graph traversals, thus able to answer queries in which contexts from multiple linked episodes and external knowledge are required. We then propose the Episodic Memory QA Net with multiple module networks to effectively handle various question types. Empirical results show improvement over the QA baselines in top-k answer prediction accuracy in the proposed task. The proposed model also generates a graph walk path and attention vectors for each predicted answer, providing a natural way to explain its QA reasoning.
Tasks Question Answering
Published 2019-11-01
URL https://www.aclweb.org/anthology/K19-1068/
PDF https://www.aclweb.org/anthology/K19-1068
PWC https://paperswithcode.com/paper/memory-graph-networks-for-explainable-memory
Repo
Framework
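The explainable graph-walk idea can be sketched as a breadth-first traversal over a memory graph that also records the path it took. The graph, relation names, and `walk` function below are invented toy stand-ins, not the paper's MGN model or benchmark.

```python
# Hedged sketch: BFS over (node, relation, node) edges of a memory graph,
# returning both the answer node and the walk path (the "explanation").
from collections import deque

def walk(graph, start, target_relation):
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        for rel, nxt in graph.get(node, []):
            if rel == target_relation:
                return nxt, path + [nxt]   # answer plus explanatory path
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None, []
```

Returning the traversed path alongside the answer mirrors the abstract's point that a graph walk provides a natural explanation of the QA reasoning, something a flat memory-slot lookup cannot offer.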

Adaptively Scheduled Multitask Learning: The Case of Low-Resource Neural Machine Translation

Title Adaptively Scheduled Multitask Learning: The Case of Low-Resource Neural Machine Translation
Authors Poorya Zaremoodi, Gholamreza Haffari
Abstract Neural Machine Translation (NMT), a data-hungry technology, suffers from the lack of bilingual data in low-resource scenarios. Multitask learning (MTL) can alleviate this issue by injecting inductive biases into NMT, using auxiliary syntactic and semantic tasks. However, an effective training schedule is required to balance the importance of tasks to get the best use of the training signal. The role of the training schedule becomes even more crucial in biased-MTL, where the goal is to improve one (or a subset) of tasks the most, e.g. translation quality. Current approaches for biased-MTL are based on brittle hand-engineered heuristics that require trial and error, and should be (re-)designed for each learning scenario. To the best of our knowledge, ours is the first work on adaptively and dynamically changing the training schedule in biased-MTL. We propose a rigorous approach for automatically reweighting the training data of the main and auxiliary tasks throughout the training process based on their contributions to the generalisability of the main NMT task. Our experiments on translating from English to Vietnamese/Turkish/Spanish show improvements of up to +1.2 BLEU points, compared to strong baselines. Additionally, our analyses shed light on the dynamics of needs throughout the training of NMT: from syntax to semantics.
Tasks Low-Resource Neural Machine Translation, Machine Translation
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5618/
PDF https://www.aclweb.org/anthology/D19-5618
PWC https://paperswithcode.com/paper/adaptively-scheduled-multitask-learning-the
Repo
Framework
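One common family of adaptive task-weighting heuristics, in the spirit of (but not identical to) the paper's approach, scores each auxiliary task by how well its gradient aligns with the main task's gradient. The sketch below is an illustration under that assumption; `task_weights` is a hypothetical name and the clipping-plus-normalization rule is a simplification, not the authors' algorithm.

```python
# Hedged sketch: weight auxiliary tasks by cosine similarity between their
# gradients and the main task's gradient; conflicting tasks get zero weight.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def task_weights(main_grad, aux_grads):
    # Negative similarity means the auxiliary update would hurt the main
    # task this step, so it is clipped to zero before normalizing.
    sims = [max(0.0, cosine(main_grad, g)) for g in aux_grads]
    total = sum(sims)
    return [s / total for s in sims] if total else [0.0] * len(sims)
```

Recomputing such weights during training makes the schedule adaptive and dynamic rather than hand-engineered, which is the gap over prior biased-MTL heuristics that the abstract identifies.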

Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation

Title Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation
Authors Raj Dabre, Atsushi Fujita, Chenhui Chu
Abstract This paper highlights the impressive utility of multi-parallel corpora for transfer learning in a one-to-many low-resource neural machine translation (NMT) setting. We report on a systematic comparison of multistage fine-tuning configurations, consisting of (1) pre-training on an external large (209k–440k) parallel corpus for English and a helping target language, (2) mixed pre-training or fine-tuning on a mixture of the external and low-resource (18k) target parallel corpora, and (3) pure fine-tuning on the target parallel corpora. Our experiments confirm that multi-parallel corpora are extremely useful despite their scarcity and content-wise redundancy, thus exhibiting the true power of multilingualism. Even when the helping target language is not one of the target languages of our concern, our multistage fine-tuning can give 3–9 BLEU score gains over a simple one-to-one model.
Tasks Low-Resource Neural Machine Translation, Machine Translation, Transfer Learning
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1146/
PDF https://www.aclweb.org/anthology/D19-1146
PWC https://paperswithcode.com/paper/exploiting-multilingualism-through-multistage
Repo
Framework

Bilingual Low-Resource Neural Machine Translation with Round-Tripping: The Case of Persian-Spanish

Title Bilingual Low-Resource Neural Machine Translation with Round-Tripping: The Case of Persian-Spanish
Authors Benyamin Ahmadnia, Bonnie Dorr
Abstract The quality of Neural Machine Translation (NMT), as a data-driven approach, massively depends on quantity, quality, and relevance of the training dataset. Such approaches have achieved promising results for bilingually high-resource scenarios but are inadequate for low-resource conditions. This paper describes a round-trip training approach to bilingual low-resource NMT that takes advantage of monolingual datasets to address training data scarcity, thus augmenting translation quality. We conduct detailed experiments on Persian-Spanish as a bilingually low-resource scenario. Experimental results demonstrate that this competitive approach outperforms the baselines.
Tasks Low-Resource Neural Machine Translation, Machine Translation
Published 2019-09-01
URL https://www.aclweb.org/anthology/R19-1003/
PDF https://www.aclweb.org/anthology/R19-1003
PWC https://paperswithcode.com/paper/bilingual-low-resource-neural-machine
Repo
Framework