January 24, 2020

2527 words 12 mins read

Paper Group NANR 130

Automated Identification of Verbally Abusive Behaviors in Online Discussions. Subspace Structure-Aware Spectral Clustering for Robust Subspace Clustering. Vaijayantīkośa Knowledge-Net. Preemptive Toxic Language Detection in Wikipedia Comments Using Thread-Level Context. Assessing Arabic Weblog Credibility via Deep Co-learning. Chains-of-Reason …

Automated Identification of Verbally Abusive Behaviors in Online Discussions

Title Automated Identification of Verbally Abusive Behaviors in Online Discussions
Authors Srecko Joksimovic, Ryan S. Baker, Jaclyn Ocumpaugh, Juan Miguel L. Andres, Ivan Tot, Elle Yuan Wang, Shane Dawson
Abstract Discussion forum participation represents one of the crucial factors for learning and often the only way of supporting social interactions in online settings. However, as much as sharing new ideas or asking thoughtful questions contributes to learning, verbally abusive behaviors, such as expressing negative emotions in online discussions, could have disproportionately detrimental effects. To provide means for mitigating the potential negative effects on course participation and learning, we developed an automated classifier for identifying communication that shows linguistic patterns associated with hostility in online forums. In so doing, we employ several well-established automated text analysis tools and build on the common practices for handling highly imbalanced datasets and reducing the sensitivity to overfitting. Although still in its infancy, our approach shows promising results (ROC AUC .73) towards establishing a robust detector of abusive behaviors. We further provide an overview of the classification (linguistic and contextual) features most indicative of online aggression.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-3505/
PDF https://www.aclweb.org/anthology/W19-3505
PWC https://paperswithcode.com/paper/automated-identification-of-verbally-abusive
Repo
Framework
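The imbalance-handling step this abstract mentions can be sketched as a class-weighted classifier. The toy example below is an assumption for illustration only: it uses a bag-of-words logistic regression with inverse-frequency class weights, not the authors' actual feature set (they rely on established text-analysis tools), and the names `featurize` and `train_weighted_logreg` are hypothetical.

```python
# Hedged sketch: class-weighted bag-of-words logistic regression for a rare
# "abusive" class. Illustrative only; not the paper's exact pipeline.
import math
from collections import Counter

def featurize(text, vocab):
    counts = Counter(text.lower().split())
    return [counts.get(w, 0) for w in vocab]

def train_weighted_logreg(X, y, lr=0.5, epochs=200):
    # Weight each class inversely to its frequency so the rare abusive
    # class contributes as much total gradient as the majority class.
    n = len(y)
    pos = sum(y)
    neg = n - pos
    w_pos, w_neg = n / (2 * pos), n / (2 * neg)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1 / (1 + math.exp(-z))
            g = (p - yi) * (w_pos if yi == 1 else w_neg)
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b

def predict(w, b, xi):
    z = b + sum(wj * xj for wj, xj in zip(w, xi))
    return 1 / (1 + math.exp(-z))
```

Without the class weights, a classifier trained on such skewed data tends to predict the majority class for everything, which is exactly the failure mode the paper's "common practices for handling highly imbalanced datasets" address.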

Subspace Structure-Aware Spectral Clustering for Robust Subspace Clustering

Title Subspace Structure-Aware Spectral Clustering for Robust Subspace Clustering
Authors Masataka Yamaguchi, Go Irie, Takahito Kawanishi, Kunio Kashino
Abstract Subspace clustering is the problem of partitioning data drawn from a union of multiple subspaces. The most popular subspace clustering framework in recent years is the graph clustering-based approach, which performs subspace clustering in two steps: graph construction and graph clustering. Although both steps are equally important for accurate clustering, the vast majority of work has focused on improving the graph construction step rather than the graph clustering step. In this paper, we propose a novel graph clustering framework for robust subspace clustering. By incorporating a geometry-aware term with the spectral clustering objective, we encourage our framework to be robust to noise and outliers in given affinity matrices. We also develop an efficient expectation-maximization-based algorithm for optimization. Through extensive experiments on four real-world datasets, we demonstrate that the proposed method outperforms existing methods.
Tasks Graph Clustering, graph construction
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Yamaguchi_Subspace_Structure-Aware_Spectral_Clustering_for_Robust_Subspace_Clustering_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Yamaguchi_Subspace_Structure-Aware_Spectral_Clustering_for_Robust_Subspace_Clustering_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/subspace-structure-aware-spectral-clustering
Repo
Framework
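The graph clustering step that the paper improves on can be sketched in its plain form. The code below shows only the standard spectral 2-way cut from an affinity matrix; the paper's contributions (the geometry-aware robustness term and the EM-based solver) are not reproduced here, and `spectral_bipartition` is a hypothetical name.

```python
# Hedged sketch of plain spectral clustering (2-way cut) from a symmetric
# nonnegative affinity matrix A. The paper's geometry-aware term is omitted.
import numpy as np

def spectral_bipartition(A):
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)
    # The eigenvector of the second-smallest eigenvalue separates the two
    # weakly connected groups; a sign split recovers the partition.
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)
```

Because this step takes the affinity matrix as given, noise and outliers in the graph-construction step propagate directly into the cut, which is the robustness gap the paper targets.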

Vaijayantīkośa Knowledge-Net

Title Vaijayantīkośa Knowledge-Net
Authors Aruna Vayuvegula, Satish Kanugovi, Sivaja S Nair, Shivani V
Abstract
Tasks
Published 2019-10-01
URL https://www.aclweb.org/anthology/W19-7510/
PDF https://www.aclweb.org/anthology/W19-7510
PWC https://paperswithcode.com/paper/vaijayantikosa-knowledge-net
Repo
Framework

Preemptive Toxic Language Detection in Wikipedia Comments Using Thread-Level Context

Title Preemptive Toxic Language Detection in Wikipedia Comments Using Thread-Level Context
Authors Mladen Karan, Jan Šnajder
Abstract We address the task of automatically detecting toxic content in user generated texts. We focus on exploring the potential for preemptive moderation, i.e., predicting whether a particular conversation thread will, in the future, incite a toxic comment. Moreover, we perform preliminary investigation of whether a model that jointly considers all comments in a conversation thread outperforms a model that considers only individual comments. Using an existing dataset of conversations among Wikipedia contributors as a starting point, we compile a new large-scale dataset for this task consisting of labeled comments and comments from their conversation threads.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-3514/
PDF https://www.aclweb.org/anthology/W19-3514
PWC https://paperswithcode.com/paper/preemptive-toxic-language-detection-in
Repo
Framework

Assessing Arabic Weblog Credibility via Deep Co-learning

Title Assessing Arabic Weblog Credibility via Deep Co-learning
Authors Chadi Helwe, Shady Elbassuoni, Ayman Al Zaatari, Wassim El-Hajj
Abstract Assessing the credibility of online content has garnered a lot of attention lately. We focus on one such type of online content, namely weblogs or blogs for short. Some recent work attempted the task of automatically assessing the credibility of blogs, typically via machine learning. However, in the case of Arabic blogs, there are hardly any datasets available that can be used to train robust machine learning models for this difficult task. To overcome the lack of sufficient training data, we propose deep co-learning, a semi-supervised end-to-end deep learning approach to assess the credibility of Arabic blogs. In deep co-learning, multiple weak deep neural network classifiers are trained using a small labeled dataset, and each using a different view of the data. Each one of these classifiers is then used to classify unlabeled data, and its prediction is used to train the other classifiers in a semi-supervised fashion. We evaluate our deep co-learning approach on an Arabic blogs dataset, and we report significant improvements in performance compared to many baselines including fully-supervised deep learning models as well as ensemble models.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4614/
PDF https://www.aclweb.org/anthology/W19-4614
PWC https://paperswithcode.com/paper/assessing-arabic-weblog-credibility-via-deep
Repo
Framework
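The co-training loop behind "deep co-learning" can be sketched in miniature. In the example below the two view-specific "classifiers" are toy one-dimensional nearest-centroid models rather than deep networks, and agreement between views stands in for the confidence-based selection the paper uses; all names (`fit_centroids`, `co_train`) are hypothetical.

```python
# Hedged sketch of co-training: two classifiers, each on a different view,
# label unlabeled examples for each other. Toy centroid models, not deep nets.
def fit_centroids(xs, ys):
    # One centroid per binary class on a single scalar feature (one view).
    groups = {0: [], 1: []}
    for x, y in zip(xs, ys):
        groups[y].append(x)
    return {c: sum(v) / len(v) for c, v in groups.items()}

def predict(cent, x):
    return min(cent, key=lambda c: abs(x - cent[c]))

def co_train(view_a, view_b, labels, unlabeled_a, unlabeled_b, rounds=3):
    xa, xb, ys = list(view_a), list(view_b), list(labels)
    ua, ub = list(unlabeled_a), list(unlabeled_b)
    for _ in range(rounds):
        if not ua:
            break
        ca, cb = fit_centroids(xa, ys), fit_centroids(xb, ys)
        rest_a, rest_b = [], []
        for a, b in zip(ua, ub):
            pa, pb = predict(ca, a), predict(cb, b)
            if pa == pb:  # cross-view agreement as a crude confidence proxy
                xa.append(a); xb.append(b); ys.append(pa)
            else:
                rest_a.append(a); rest_b.append(b)
        ua, ub = rest_a, rest_b
    return fit_centroids(xa, ys), fit_centroids(xb, ys)
```

The value of the scheme is that each view's mistakes are partially independent, so pseudo-labels from one view add genuinely new training signal to the other — which is why it helps when labeled Arabic-blog data is scarce.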

Chains-of-Reasoning at TextGraphs 2019 Shared Task: Reasoning over Chains of Facts for Explainable Multi-hop Inference

Title Chains-of-Reasoning at TextGraphs 2019 Shared Task: Reasoning over Chains of Facts for Explainable Multi-hop Inference
Authors Rajarshi Das, Ameya Godbole, Manzil Zaheer, Shehzaad Dhuliawala, Andrew McCallum
Abstract This paper describes our submission to the shared task on “Multi-hop Inference Explanation Regeneration” in the TextGraphs workshop at EMNLP 2019 (Jansen and Ustalov, 2019). Our system identifies chains of facts relevant to explain an answer to an elementary science examination question. To counter the problem of ‘spurious chains’ leading to ‘semantic drift’, we train a ranker that uses contextualized representations of facts to score their relevance for explaining an answer to a question. Our system was ranked first w.r.t. the mean average precision (MAP) metric, outperforming the second-best system by 14.95 points.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5313/
PDF https://www.aclweb.org/anthology/D19-5313
PWC https://paperswithcode.com/paper/chains-of-reasoning-at-textgraphs-2019-shared
Repo
Framework

Minimally-Augmented Grammatical Error Correction

Title Minimally-Augmented Grammatical Error Correction
Authors Roman Grundkiewicz, Marcin Junczys-Dowmunt
Abstract There has been an increased interest in low-resource approaches to automatic grammatical error correction. We introduce Minimally-Augmented Grammatical Error Correction (MAGEC) that does not require any error-labelled data. Our unsupervised approach is based on a simple but effective synthetic error generation method based on confusion sets from inverted spell-checkers. In low-resource settings, we outperform the current state-of-the-art results for German and Russian GEC tasks by a large margin without using any real error-annotated training data. When combined with labelled data, our method can serve as an efficient pre-training technique.
Tasks Grammatical Error Correction
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5546/
PDF https://www.aclweb.org/anthology/D19-5546
PWC https://paperswithcode.com/paper/minimally-augmented-grammatical-error
Repo
Framework
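The confusion-set-based synthetic error generation can be sketched as follows. The confusion sets here are tiny hand-made examples; the paper derives its sets from inverted spell-checkers, and the `corrupt` function name is hypothetical.

```python
# Hedged sketch of MAGEC-style synthetic error generation: corrupt clean text
# by swapping words for entries from their confusion sets.
import random

CONFUSIONS = {  # hypothetical example sets; the paper builds these
    "their": ["there", "they're"],  # from inverted spell-checkers
    "then": ["than"],
    "affect": ["effect"],
}

def corrupt(sentence, p=0.5, rng=None):
    rng = rng or random.Random(0)  # seeded for reproducibility
    out = []
    for tok in sentence.split():
        alts = CONFUSIONS.get(tok.lower())
        if alts and rng.random() < p:
            out.append(rng.choice(alts))  # inject a plausible error
        else:
            out.append(tok)
    return " ".join(out)
```

Pairing each corrupted sentence with its clean original yields (source, target) training pairs for a correction model without any human error annotation, which is the core of the unsupervised setting described above.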

Deep Parametric Indoor Lighting Estimation

Title Deep Parametric Indoor Lighting Estimation
Authors Marc-Andre Gardner, Yannick Hold-Geoffroy, Kalyan Sunkavalli, Christian Gagne, Jean-Francois Lalonde
Abstract We present a method to estimate lighting from a single image of an indoor scene. Previous work has used an environment map representation that does not account for the localized nature of indoor lighting. Instead, we represent lighting as a set of discrete 3D lights with geometric and photometric parameters. We train a deep neural network to regress these parameters from a single image, on a dataset of environment maps annotated with depth. We propose a differentiable layer to convert these parameters to an environment map to compute our loss; this bypasses the challenge of establishing correspondences between estimated and ground truth lights. We demonstrate, via quantitative and qualitative evaluations, that our representation and training scheme lead to more accurate results compared to previous work, while allowing for more realistic 3D object compositing with spatially-varying lighting.
Tasks
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Gardner_Deep_Parametric_Indoor_Lighting_Estimation_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Gardner_Deep_Parametric_Indoor_Lighting_Estimation_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/deep-parametric-indoor-lighting-estimation-1
Repo
Framework
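The idea of converting discrete light parameters back into an environment map can be illustrated with a simple splatting operation. The sketch below is an assumption for illustration: it renders each (direction, RGB intensity, angular size) light onto a small latitude-longitude map with a smooth falloff, whereas the paper defines its own differentiable layer; `lights_to_envmap` is a hypothetical name.

```python
# Hedged sketch: splat parametric lights onto a lat-long environment map.
# Illustrates the parametric-to-envmap conversion concept, not the paper's layer.
import numpy as np

def lights_to_envmap(lights, h=16, w=32):
    # Unit direction for every pixel of the lat-long map.
    theta = (np.arange(h) + 0.5) / h * np.pi        # polar angle
    phi = (np.arange(w) + 0.5) / w * 2 * np.pi      # azimuth
    st, ct = np.sin(theta)[:, None], np.cos(theta)[:, None]
    dirs = np.stack([st * np.cos(phi)[None, :],
                     st * np.sin(phi)[None, :],
                     np.broadcast_to(ct, (h, w))], axis=-1)
    env = np.zeros((h, w, 3))
    for d, rgb, size in lights:
        d = np.asarray(d, float)
        d /= np.linalg.norm(d)
        cos_ang = dirs @ d                           # alignment with light axis
        # Smooth angular falloff controlled by the light's size parameter.
        env += np.exp((cos_ang - 1.0) / size)[..., None] * np.asarray(rgb)
    return env
```

Because every operation here is differentiable in the light parameters, a loss computed on the resulting map can be backpropagated to the parameter regressor — sidestepping the correspondence problem between predicted and ground-truth lights, as the abstract notes.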

CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations

Title CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations
Authors Frederic Bechet, Cindy Aloui, Delphine Charlet, Geraldine Damnati, Johannes Heinecke, Alexis Nasr, Frederic Herledan
Abstract Machine reading comprehension is a task related to question answering where questions are not generic in scope but are tied to a particular document. Recently, very large corpora (SQuAD, MS MARCO) containing (document, question, answer) triplets were made available to the scientific community to develop supervised methods based on deep neural networks, with promising results. These methods need very large training corpora to be effective; however, such data only exist for English and Chinese at the moment. The aim of this study is the development of such resources for other languages by generating questions semi-automatically from the semantic frame analysis of large corpora. The collection of natural questions is reduced to a validation/test set. We applied this method to the CALOR-Frame French corpus to develop the CALOR-QUEST resource presented in this paper.
Tasks Machine Reading Comprehension, Question Answering, Reading Comprehension
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5803/
PDF https://www.aclweb.org/anthology/D19-5803
PWC https://paperswithcode.com/paper/calor-quest-generating-a-training-corpus-for
Repo
Framework
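The semi-automatic question generation from frame annotations can be sketched with role-based templates. The frames and templates below are invented toy examples (CALOR-QUEST works on French frame-semantic annotations with its own generation procedure), and `questions_from_frame` is a hypothetical name.

```python
# Hedged sketch: instantiate (question, answer) pairs from a predicate and
# its frame elements via hand-written templates. Toy data, not CALOR-QUEST's.
TEMPLATES = {
    "Agent": "Who {verb} {theme}?",
    "Location": "Where was {theme} {verb}?",
}

def questions_from_frame(frame):
    slots = {"verb": frame["verb"], "theme": frame.get("Theme", "")}
    out = []
    for role, tpl in TEMPLATES.items():
        if role in frame:
            # The filler of the questioned role becomes the answer span.
            out.append((tpl.format(**slots), frame[role]))
    return out
```

Each generated pair uses the frame-element filler as its answer, so a large frame-annotated corpus yields training triplets cheaply, leaving only a small validation/test set to be collected as natural questions.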

Quasi-Unsupervised Color Constancy

Title Quasi-Unsupervised Color Constancy
Authors Simone Bianco, Claudio Cusano
Abstract We present here a method for computational color constancy in which a deep convolutional neural network is trained to detect achromatic pixels in color images after they have been converted to grayscale. The method does not require any information about the illuminant in the scene and relies on the weak assumption, fulfilled by almost all images available on the web, that training images have been approximately balanced. Because of this requirement we define our method as quasi-unsupervised. After training, unbalanced images can be processed thanks to the preliminary conversion to grayscale of the input to the neural network. The results of extensive experiments demonstrate that the proposed method outperforms the other unsupervised methods in the state of the art while remaining flexible enough to be fine-tuned in a supervised fashion, reaching performance comparable to that of the best supervised methods.
Tasks Color Constancy
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Bianco_Quasi-Unsupervised_Color_Constancy_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Bianco_Quasi-Unsupervised_Color_Constancy_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/quasi-unsupervised-color-constancy
Repo
Framework

Mawdoo3 AI at MADAR Shared Task: Arabic Tweet Dialect Identification

Title Mawdoo3 AI at MADAR Shared Task: Arabic Tweet Dialect Identification
Authors Bashar Talafha, Wael Farhan, Ahmed Altakrouri, Hussein Al-Natsheh
Abstract Arabic dialect identification is an inherently complex problem, as Arabic dialect taxonomy is convoluted and aims to dissect a continuous space rather than a discrete one. In this work, we present machine and deep learning approaches to predict 21 fine-grained dialects from a set of given tweets per user. We adopted numerous feature extraction methods, most of which showed improvement in the final model, such as word embeddings, TF-IDF, and other tweet features. Our results show that a simple LinearSVC can outperform any complex deep learning model given a set of curated features. With a relatively complex user-voting mechanism, we were able to achieve a macro-averaged F1-score of 71.84% on MADAR shared subtask-2. Our best submitted model ranked second out of all participating teams.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4629/
PDF https://www.aclweb.org/anthology/W19-4629
PWC https://paperswithcode.com/paper/mawdoo3-ai-at-madar-shared-task-arabic-tweet
Repo
Framework
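The TF-IDF features that fed the winning LinearSVC can be sketched in pure Python. This is a minimal illustration under the usual definitions (term frequency times log inverse document frequency); the real system would use a library vectorizer, and the `tfidf` function here is hypothetical.

```python
# Hedged sketch: whitespace-tokenized TF-IDF vectors, the kind of curated
# sparse feature a linear classifier can exploit well on short tweets.
import math
from collections import Counter

def tfidf(docs):
    n = len(docs)
    tokenized = [doc.split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))            # document frequency per term
    vocab = sorted(df)
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] / len(toks) * idf[w] for w in vocab])
    return vocab, vectors
```

On short, morphologically rich tweets, such sparse lexical features often carry most of the dialect signal, which helps explain the abstract's finding that a linear model over curated features beat more complex deep models.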

Memory Graph Networks for Explainable Memory-grounded Question Answering

Title Memory Graph Networks for Explainable Memory-grounded Question Answering
Authors Seungwhan Moon, Pararth Shah, Anuj Kumar, Rajen Subba
Abstract We introduce Episodic Memory QA, the task of answering personal user questions grounded on memory graph (MG), where episodic memories and related entity nodes are connected via relational edges. We create a new benchmark dataset first by generating synthetic memory graphs with simulated attributes, and by composing 100K QA pairs for the generated MG with bootstrapped scripts. To address the unique challenges for the proposed task, we propose Memory Graph Networks (MGN), a novel extension of memory networks to enable dynamic expansion of memory slots through graph traversals, thus able to answer queries in which contexts from multiple linked episodes and external knowledge are required. We then propose the Episodic Memory QA Net with multiple module networks to effectively handle various question types. Empirical results show improvement over the QA baselines in top-k answer prediction accuracy in the proposed task. The proposed model also generates a graph walk path and attention vectors for each predicted answer, providing a natural way to explain its QA reasoning.
Tasks Question Answering
Published 2019-11-01
URL https://www.aclweb.org/anthology/K19-1068/
PDF https://www.aclweb.org/anthology/K19-1068
PWC https://paperswithcode.com/paper/memory-graph-networks-for-explainable-memory
Repo
Framework
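The explainable graph-walk idea can be sketched as a breadth-first traversal over a memory graph that also records the path it took. The graph, relation names, and `walk` function below are invented toy stand-ins, not the paper's MGN model or benchmark.

```python
# Hedged sketch: BFS over (node, relation, node) edges of a memory graph,
# returning both the answer node and the walk path (the "explanation").
from collections import deque

def walk(graph, start, target_relation):
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        for rel, nxt in graph.get(node, []):
            if rel == target_relation:
                return nxt, path + [nxt]   # answer plus explanatory path
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None, []
```

Returning the traversed path alongside the answer mirrors the abstract's point that a graph walk provides a natural explanation of the QA reasoning, something a flat memory-slot lookup cannot offer.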

Adaptively Scheduled Multitask Learning: The Case of Low-Resource Neural Machine Translation

Title Adaptively Scheduled Multitask Learning: The Case of Low-Resource Neural Machine Translation
Authors Poorya Zaremoodi, Gholamreza Haffari
Abstract Neural Machine Translation (NMT), a data-hungry technology, suffers from the lack of bilingual data in low-resource scenarios. Multitask learning (MTL) can alleviate this issue by injecting inductive biases into NMT, using auxiliary syntactic and semantic tasks. However, an effective training schedule is required to balance the importance of tasks to get the best use of the training signal. The role of the training schedule becomes even more crucial in biased-MTL, where the goal is to improve one (or a subset) of tasks the most, e.g. translation quality. Current approaches for biased-MTL are based on brittle hand-engineered heuristics that require trial and error, and should be (re-)designed for each learning scenario. To the best of our knowledge, ours is the first work on adaptively and dynamically changing the training schedule in biased-MTL. We propose a rigorous approach for automatically reweighting the training data of the main and auxiliary tasks throughout the training process based on their contributions to the generalisability of the main NMT task. Our experiments on translating from English to Vietnamese/Turkish/Spanish show improvements of up to +1.2 BLEU points, compared to strong baselines. Additionally, our analyses shed light on the dynamics of needs throughout the training of NMT: from syntax to semantics.
Tasks Low-Resource Neural Machine Translation, Machine Translation
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5618/
PDF https://www.aclweb.org/anthology/D19-5618
PWC https://paperswithcode.com/paper/adaptively-scheduled-multitask-learning-the
Repo
Framework
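One common family of adaptive task-weighting heuristics, in the spirit of (but not identical to) the paper's approach, scores each auxiliary task by how well its gradient aligns with the main task's gradient. The sketch below is an illustration under that assumption; `task_weights` is a hypothetical name and the clipping-plus-normalization rule is a simplification, not the authors' algorithm.

```python
# Hedged sketch: weight auxiliary tasks by cosine similarity between their
# gradients and the main task's gradient; conflicting tasks get zero weight.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def task_weights(main_grad, aux_grads):
    # Negative similarity means the auxiliary update would hurt the main
    # task this step, so it is clipped to zero before normalizing.
    sims = [max(0.0, cosine(main_grad, g)) for g in aux_grads]
    total = sum(sims)
    return [s / total for s in sims] if total else [0.0] * len(sims)
```

Recomputing such weights during training makes the schedule adaptive and dynamic rather than hand-engineered, which is the gap over prior biased-MTL heuristics that the abstract identifies.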

Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation

Title Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation
Authors Raj Dabre, Atsushi Fujita, Chenhui Chu
Abstract This paper highlights the impressive utility of multi-parallel corpora for transfer learning in a one-to-many low-resource neural machine translation (NMT) setting. We report on a systematic comparison of multistage fine-tuning configurations, consisting of (1) pre-training on an external large (209k–440k) parallel corpus for English and a helping target language, (2) mixed pre-training or fine-tuning on a mixture of the external and low-resource (18k) target parallel corpora, and (3) pure fine-tuning on the target parallel corpora. Our experiments confirm that multi-parallel corpora are extremely useful despite their scarcity and content-wise redundancy, thus exhibiting the true power of multilingualism. Even when the helping target language is not one of the target languages of our concern, our multistage fine-tuning can give 3–9 BLEU score gains over a simple one-to-one model.
Tasks Low-Resource Neural Machine Translation, Machine Translation, Transfer Learning
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1146/
PDF https://www.aclweb.org/anthology/D19-1146
PWC https://paperswithcode.com/paper/exploiting-multilingualism-through-multistage
Repo
Framework

Bilingual Low-Resource Neural Machine Translation with Round-Tripping: The Case of Persian-Spanish

Title Bilingual Low-Resource Neural Machine Translation with Round-Tripping: The Case of Persian-Spanish
Authors Benyamin Ahmadnia, Bonnie Dorr
Abstract The quality of Neural Machine Translation (NMT), as a data-driven approach, massively depends on quantity, quality, and relevance of the training dataset. Such approaches have achieved promising results for bilingually high-resource scenarios but are inadequate for low-resource conditions. This paper describes a round-trip training approach to bilingual low-resource NMT that takes advantage of monolingual datasets to address training data scarcity, thus augmenting translation quality. We conduct detailed experiments on Persian-Spanish as a bilingually low-resource scenario. Experimental results demonstrate that this competitive approach outperforms the baselines.
Tasks Low-Resource Neural Machine Translation, Machine Translation
Published 2019-09-01
URL https://www.aclweb.org/anthology/R19-1003/
PDF https://www.aclweb.org/anthology/R19-1003
PWC https://paperswithcode.com/paper/bilingual-low-resource-neural-machine
Repo
Framework