Paper Group NANR 130
Automated Identification of Verbally Abusive Behaviors in Online Discussions. Subspace Structure-Aware Spectral Clustering for Robust Subspace Clustering. Vaijayantīkośa Knowledge-Net. Preemptive Toxic Language Detection in Wikipedia Comments Using Thread-Level Context. Assessing Arabic Weblog Credibility via Deep Co-learning. Chains-of-Reason …
Automated Identification of Verbally Abusive Behaviors in Online Discussions
Title | Automated Identification of Verbally Abusive Behaviors in Online Discussions |
Authors | Srecko Joksimovic, Ryan S. Baker, Jaclyn Ocumpaugh, Juan Miguel L. Andres, Ivan Tot, Elle Yuan Wang, Shane Dawson |
Abstract | Discussion forum participation represents one of the crucial factors for learning and is often the only way of supporting social interactions in online settings. However, as much as sharing new ideas or asking thoughtful questions contributes to learning, verbally abusive behaviors, such as expressing negative emotions in online discussions, can have disproportionately detrimental effects. To provide means for mitigating the potential negative effects on course participation and learning, we developed an automated classifier for identifying communication that shows linguistic patterns associated with hostility in online forums. In so doing, we employ several well-established automated text analysis tools and build on common practices for handling highly imbalanced datasets and reducing sensitivity to overfitting. Although still in its infancy, our approach shows promising results (ROC AUC .73) towards establishing a robust detector of abusive behaviors. We further provide an overview of the classification (linguistic and contextual) features most indicative of online aggression. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-3505/ |
https://www.aclweb.org/anthology/W19-3505 | |
PWC | https://paperswithcode.com/paper/automated-identification-of-verbally-abusive |
Repo | |
Framework | |
Subspace Structure-Aware Spectral Clustering for Robust Subspace Clustering
Title | Subspace Structure-Aware Spectral Clustering for Robust Subspace Clustering |
Authors | Masataka Yamaguchi, Go Irie, Takahito Kawanishi, Kunio Kashino |
Abstract | Subspace clustering is the problem of partitioning data drawn from a union of multiple subspaces. The most popular subspace clustering framework in recent years is the graph clustering-based approach, which performs subspace clustering in two steps: graph construction and graph clustering. Although both steps are equally important for accurate clustering, the vast majority of work has focused on improving the graph construction step rather than the graph clustering step. In this paper, we propose a novel graph clustering framework for robust subspace clustering. By incorporating a geometry-aware term with the spectral clustering objective, we encourage our framework to be robust to noise and outliers in given affinity matrices. We also develop an efficient expectation-maximization-based algorithm for optimization. Through extensive experiments on four real-world datasets, we demonstrate that the proposed method outperforms existing methods. |
Tasks | Graph Clustering, graph construction |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Yamaguchi_Subspace_Structure-Aware_Spectral_Clustering_for_Robust_Subspace_Clustering_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Yamaguchi_Subspace_Structure-Aware_Spectral_Clustering_for_Robust_Subspace_Clustering_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/subspace-structure-aware-spectral-clustering |
Repo | |
Framework | |
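The two-step pipeline the abstract describes (graph construction, then graph clustering) can be sketched on toy data. The snippet below is illustrative only: it uses a plain absolute-cosine affinity and off-the-shelf spectral clustering, not the paper's geometry-aware objective or its EM-based solver.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy data drawn from a union of two 1-D subspaces (lines) in R^2.
rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, size=50)
line1 = np.stack([t, 2 * t], axis=1)   # subspace spanned by (1, 2)
line2 = np.stack([t, -3 * t], axis=1)  # subspace spanned by (1, -3)
X = np.vstack([line1, line2])

# Step 1, graph construction: affinity from absolute cosine similarity.
# Points on the same 1-D subspace are (anti-)parallel, so |cos| is ~1.
Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
affinity = np.abs(Xn @ Xn.T)

# Step 2, graph clustering: spectral clustering on the affinity matrix.
labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(affinity)
```

On this clean toy example, the two lines fall into separate clusters; the paper's contribution is making step 2 robust when the affinity matrix is noisy.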
Vaijayantīkośa Knowledge-Net
Title | Vaijayantīkośa Knowledge-Net |
Authors | Aruna Vayuvegula, Satish Kanugovi, Sivaja S Nair, Shivani V |
Abstract | |
Tasks | |
Published | 2019-10-01 |
URL | https://www.aclweb.org/anthology/W19-7510/ |
https://www.aclweb.org/anthology/W19-7510 | |
PWC | https://paperswithcode.com/paper/vaijayantikosa-knowledge-net |
Repo | |
Framework | |
Preemptive Toxic Language Detection in Wikipedia Comments Using Thread-Level Context
Title | Preemptive Toxic Language Detection in Wikipedia Comments Using Thread-Level Context |
Authors | Mladen Karan, Jan Šnajder |
Abstract | We address the task of automatically detecting toxic content in user-generated texts. We focus on exploring the potential for preemptive moderation, i.e., predicting whether a particular conversation thread will, in the future, incite a toxic comment. Moreover, we perform a preliminary investigation of whether a model that jointly considers all comments in a conversation thread outperforms a model that considers only individual comments. Using an existing dataset of conversations among Wikipedia contributors as a starting point, we compile a new large-scale dataset for this task consisting of labeled comments and comments from their conversation threads. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-3514/ |
https://www.aclweb.org/anthology/W19-3514 | |
PWC | https://paperswithcode.com/paper/preemptive-toxic-language-detection-in |
Repo | |
Framework | |
Assessing Arabic Weblog Credibility via Deep Co-learning
Title | Assessing Arabic Weblog Credibility via Deep Co-learning |
Authors | Chadi Helwe, Shady Elbassuoni, Ayman Al Zaatari, Wassim El-Hajj |
Abstract | Assessing the credibility of online content has garnered a lot of attention lately. We focus on one such type of online content, namely weblogs, or blogs for short. Some recent work attempted the task of automatically assessing the credibility of blogs, typically via machine learning. However, in the case of Arabic blogs, there are hardly any datasets available that can be used to train robust machine learning models for this difficult task. To overcome the lack of sufficient training data, we propose deep co-learning, a semi-supervised end-to-end deep learning approach to assess the credibility of Arabic blogs. In deep co-learning, multiple weak deep neural network classifiers are trained using a small labeled dataset, each using a different view of the data. Each one of these classifiers is then used to classify unlabeled data, and its predictions are used to train the other classifiers in a semi-supervised fashion. We evaluate our deep co-learning approach on an Arabic blogs dataset, and we report significant improvements in performance compared to many baselines, including fully-supervised deep learning models as well as ensemble models. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4614/ |
https://www.aclweb.org/anthology/W19-4614 | |
PWC | https://paperswithcode.com/paper/assessing-arabic-weblog-credibility-via-deep |
Repo | |
Framework | |
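The co-learning loop in the abstract (classifiers on different views pseudo-labeling unlabeled data for each other) follows the classic co-training pattern. A minimal sketch with two views and logistic regression stand-ins for the paper's deep networks — the data, views, and 0.9 confidence threshold are all illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two redundant feature views of the same binary label.
def make_data(n):
    y = np.arange(n) % 2
    view_a = y[:, None] + rng.normal(0, 0.5, (n, 3))
    view_b = -y[:, None] + rng.normal(0, 0.5, (n, 3))
    return view_a, view_b, y

Xa_l, Xb_l, y_l = make_data(20)    # small labeled set
Xa_u, Xb_u, _ = make_data(200)     # large unlabeled pool

clf_a = LogisticRegression().fit(Xa_l, y_l)
clf_b = LogisticRegression().fit(Xb_l, y_l)

for _ in range(3):                  # co-training rounds
    pa = clf_a.predict_proba(Xa_u)  # each model pseudo-labels the pool...
    pb = clf_b.predict_proba(Xb_u)
    conf_a = pa.max(axis=1) > 0.9   # ...keeping only confident labels,
    conf_b = pb.max(axis=1) > 0.9
    # ...and each model is retrained on the *other* model's labels.
    clf_b = LogisticRegression().fit(
        np.vstack([Xb_l, Xb_u[conf_a]]),
        np.concatenate([y_l, pa.argmax(axis=1)[conf_a]]))
    clf_a = LogisticRegression().fit(
        np.vstack([Xa_l, Xa_u[conf_b]]),
        np.concatenate([y_l, pb.argmax(axis=1)[conf_b]]))
```

The key design choice is that each classifier never trains on its own pseudo-labels, which limits confirmation bias when the views are sufficiently independent.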
Chains-of-Reasoning at TextGraphs 2019 Shared Task: Reasoning over Chains of Facts for Explainable Multi-hop Inference
Title | Chains-of-Reasoning at TextGraphs 2019 Shared Task: Reasoning over Chains of Facts for Explainable Multi-hop Inference |
Authors | Rajarshi Das, Ameya Godbole, Manzil Zaheer, Shehzaad Dhuliawala, Andrew McCallum |
Abstract | This paper describes our submission to the shared task on "Multi-hop Inference Explanation Regeneration" in the TextGraphs workshop at EMNLP 2019 (Jansen and Ustalov, 2019). Our system identifies chains of facts relevant to explain an answer to an elementary science examination question. To counter the problem of 'spurious chains' leading to 'semantic drifts', we train a ranker that uses contextualized representations of facts to score their relevance for explaining an answer to a question. Our system was ranked first w.r.t. the mean average precision (MAP) metric, outperforming the second-best system by 14.95 points. |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5313/ |
https://www.aclweb.org/anthology/D19-5313 | |
PWC | https://paperswithcode.com/paper/chains-of-reasoning-at-textgraphs-2019-shared |
Repo | |
Framework | |
Minimally-Augmented Grammatical Error Correction
Title | Minimally-Augmented Grammatical Error Correction |
Authors | Roman Grundkiewicz, Marcin Junczys-Dowmunt |
Abstract | There has been an increased interest in low-resource approaches to automatic grammatical error correction. We introduce Minimally-Augmented Grammatical Error Correction (MAGEC) that does not require any error-labelled data. Our unsupervised approach uses a simple but effective synthetic error generation method based on confusion sets from inverted spell-checkers. In low-resource settings, we outperform the current state-of-the-art results for German and Russian GEC tasks by a large margin without using any real error-annotated training data. When combined with labelled data, our method can serve as an efficient pre-training technique. |
Tasks | Grammatical Error Correction |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5546/ |
https://www.aclweb.org/anthology/D19-5546 | |
PWC | https://paperswithcode.com/paper/minimally-augmented-grammatical-error |
Repo | |
Framework | |
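The core idea — corrupting clean text with confusion-set substitutions to get synthetic (noisy, clean) training pairs — can be sketched as below. The confusion dictionary here is hand-made and hypothetical; the paper derives its sets by inverting a spell-checker (mapping each word to the words the checker would suggest for it).

```python
import random

# Hypothetical confusion sets; MAGEC builds these from inverted
# spell-checkers rather than by hand.
CONFUSION = {
    "their": ["there", "they're"],
    "affect": ["effect"],
    "its": ["it's"],
}

def corrupt(sentence, p=0.5, rng=None):
    """Return a (noisy_source, clean_target) pair for GEC training.

    Each confusable token is swapped for a confusion-set member with
    probability p; everything else is left untouched.
    """
    rng = rng or random.Random(0)
    noisy = [
        rng.choice(CONFUSION[tok]) if tok in CONFUSION and rng.random() < p
        else tok
        for tok in sentence.split()
    ]
    return " ".join(noisy), sentence

# p=1.0 corrupts every confusable token, for a deterministic demo.
src, tgt = corrupt("their plan will affect its outcome", p=1.0)
```

A synthetic GEC corpus is then just this function mapped over a large monolingual corpus, with the corrupted side as model input and the original as the target.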
Deep Parametric Indoor Lighting Estimation
Title | Deep Parametric Indoor Lighting Estimation |
Authors | Marc-Andre Gardner, Yannick Hold-Geoffroy, Kalyan Sunkavalli, Christian Gagne, Jean-Francois Lalonde |
Abstract | We present a method to estimate lighting from a single image of an indoor scene. Previous work has used an environment map representation that does not account for the localized nature of indoor lighting. Instead, we represent lighting as a set of discrete 3D lights with geometric and photometric parameters. We train a deep neural network to regress these parameters from a single image, on a dataset of environment maps annotated with depth. We propose a differentiable layer to convert these parameters to an environment map to compute our loss; this bypasses the challenge of establishing correspondences between estimated and ground truth lights. We demonstrate, via quantitative and qualitative evaluations, that our representation and training scheme lead to more accurate results compared to previous work, while allowing for more realistic 3D object compositing with spatially-varying lighting. |
Tasks | |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Gardner_Deep_Parametric_Indoor_Lighting_Estimation_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Gardner_Deep_Parametric_Indoor_Lighting_Estimation_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/deep-parametric-indoor-lighting-estimation-1 |
Repo | |
Framework | |
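The differentiable parametric-lights-to-environment-map layer the abstract mentions can be approximated as splatting Gaussian lobes onto a lat-long map; everything below (lobe shape, resolution, parameter names) is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

def lights_to_envmap(dirs, intensities, sigmas, res=(8, 16)):
    """Splat parametric lights (unit direction, RGB intensity, angular
    width) onto a lat-long environment map. Each operation is smooth in
    the light parameters, so a pixel-wise loss on the map can train a
    network that regresses those parameters."""
    h, w = res
    theta = (np.arange(h) + 0.5) / h * np.pi        # polar angle
    phi = (np.arange(w) + 0.5) / w * 2 * np.pi      # azimuth
    st, ct = np.sin(theta)[:, None], np.cos(theta)[:, None]
    pixel_dirs = np.stack([st * np.cos(phi), st * np.sin(phi),
                           np.broadcast_to(ct, (h, w))], axis=-1)
    env = np.zeros((h, w, 3))
    for d, c, s in zip(dirs, intensities, sigmas):
        ang = np.arccos(np.clip(pixel_dirs @ d, -1, 1))  # angle to light
        env += np.exp(-(ang / s) ** 2)[..., None] * c    # Gaussian lobe
    return env

env = lights_to_envmap(
    dirs=[np.array([0.0, 0.0, 1.0])],            # single light at the pole
    intensities=[np.array([1.0, 1.0, 1.0])],
    sigmas=[0.3],
)
```

Because the loss is computed on the rendered map rather than on light parameters directly, no correspondence between estimated and ground-truth lights is needed — which is exactly the difficulty the paper's layer sidesteps.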
CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations
Title | CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations |
Authors | Frederic Bechet, Cindy Aloui, Delphine Charlet, Geraldine Damnati, Johannes Heinecke, Alexis Nasr, Frederic Herledan |
Abstract | Machine reading comprehension is a task related to question answering where questions are not generic in scope but are related to a particular document. Recently, very large corpora (SQuAD, MS MARCO) containing triplets (document, question, answer) were made available to the scientific community to develop supervised methods based on deep neural networks, with promising results. These methods need very large training corpora to be effective; however, such data exist only for English and Chinese at the moment. The aim of this study is to develop such resources for other languages by generating, in a semi-automatic way, questions from the semantic frame analysis of large corpora. The collection of natural questions is reduced to a validation/test set. We applied this method to the CALOR-Frame French corpus to develop the CALOR-QUEST resource presented in this paper. |
Tasks | Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5803/ |
https://www.aclweb.org/anthology/D19-5803 | |
PWC | https://paperswithcode.com/paper/calor-quest-generating-a-training-corpus-for |
Repo | |
Framework | |
Quasi-Unsupervised Color Constancy
Title | Quasi-Unsupervised Color Constancy |
Authors | Simone Bianco, Claudio Cusano |
Abstract | We present here a method for computational color constancy in which a deep convolutional neural network is trained to detect achromatic pixels in color images after they have been converted to grayscale. The method does not require any information about the illuminant in the scene and relies on the weak assumption, fulfilled by almost all images available on the web, that training images have been approximately balanced. Because of this requirement we define our method as quasi-unsupervised. After training, unbalanced images can be processed thanks to the preliminary conversion of the network input to grayscale. The results of extensive experimentation demonstrate that the proposed method outperforms the other unsupervised methods in the state of the art while being flexible enough to be fine-tuned in a supervised manner to reach performance comparable with that of the best supervised methods. |
Tasks | Color Constancy |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Bianco_Quasi-Unsupervised_Color_Constancy_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Bianco_Quasi-Unsupervised_Color_Constancy_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/quasi-unsupervised-color-constancy |
Repo | |
Framework | |
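Once achromatic pixels are identified (by the paper's CNN; here simply given as a mask), the illuminant estimate and correction reduce to a standard diagonal (von Kries) transform. A minimal sketch under that assumption:

```python
import numpy as np

def correct_image(img, achromatic_mask):
    """Estimate the illuminant as the mean RGB of pixels flagged as
    achromatic, then apply a diagonal (von Kries) correction so those
    pixels become neutral gray. The mask stands in for the CNN's
    achromatic-pixel detections."""
    illum = img[achromatic_mask].mean(axis=0)   # (3,) RGB illuminant
    illum = illum / illum.mean()                # keep overall intensity
    return img / illum                          # per-channel rescale

# Toy scene: a uniform gray card photographed under a reddish illuminant.
illuminant = np.array([1.4, 1.0, 0.6])
img = np.full((4, 4, 3), 0.5) * illuminant      # every pixel is achromatic
mask = np.ones((4, 4), dtype=bool)
balanced = correct_image(img, mask)             # recovers the gray card
```

The learning problem in the paper is entirely in producing a good mask from the grayscale image; the correction step itself is this simple closed form.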
Mawdoo3 AI at MADAR Shared Task: Arabic Tweet Dialect Identification
Title | Mawdoo3 AI at MADAR Shared Task: Arabic Tweet Dialect Identification |
Authors | Bashar Talafha, Wael Farhan, Ahmed Altakrouri, Hussein Al-Natsheh |
Abstract | Arabic dialect identification is an inherently complex problem, as Arabic dialect taxonomy is convoluted and aims to dissect a continuous space rather than a discrete one. In this work, we present machine and deep learning approaches to predict 21 fine-grained dialects from a set of given tweets per user. We adopted numerous feature extraction methods, most of which showed improvement in the final model, such as word embeddings, tf-idf, and other tweet features. Our results show that a simple LinearSVC can outperform any complex deep learning model given a set of curated features. With a relatively complex user voting mechanism, we were able to achieve a Macro-Averaged F1-score of 71.84% on MADAR shared subtask-2. Our best submitted model ranked second out of all participating teams. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4629/ |
https://www.aclweb.org/anthology/W19-4629 | |
PWC | https://paperswithcode.com/paper/mawdoo3-ai-at-madar-shared-task-arabic-tweet |
Repo | |
Framework | |
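The winning-recipe ingredients named in the abstract (tf-idf features into a LinearSVC) fit in a few lines of scikit-learn. The tweets, labels, and character n-gram range below are illustrative placeholders, not the team's actual features or MADAR data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in tweets; the real task has 21 fine-grained dialect labels.
tweets = ["shlonak ya zalameh", "ezayak ya basha", "shlonak habibi",
          "ezayak ya sahbi", "shlonich ya zalameh", "ezayik ya basha"]
labels = ["LEV", "EGY", "LEV", "EGY", "LEV", "EGY"]

model = make_pipeline(
    # Character n-grams are a common choice for dialect ID, since
    # dialects differ in morphology and spelling more than vocabulary.
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)
model.fit(tweets, labels)
pred = model.predict(["shlonak ya sahbi"])
```

Per-user voting, as described in the abstract, would then aggregate such per-tweet predictions (e.g. by majority) into a single dialect label per user.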
Memory Graph Networks for Explainable Memory-grounded Question Answering
Title | Memory Graph Networks for Explainable Memory-grounded Question Answering |
Authors | Seungwhan Moon, Pararth Shah, Anuj Kumar, Rajen Subba |
Abstract | We introduce Episodic Memory QA, the task of answering personal user questions grounded on a memory graph (MG), where episodic memories and related entity nodes are connected via relational edges. We create a new benchmark dataset, first by generating synthetic memory graphs with simulated attributes, and then by composing 100K QA pairs for the generated MG with bootstrapped scripts. To address the unique challenges of the proposed task, we propose Memory Graph Networks (MGN), a novel extension of memory networks that enables dynamic expansion of memory slots through graph traversals, making it able to answer queries in which contexts from multiple linked episodes and external knowledge are required. We then propose the Episodic Memory QA Net with multiple module networks to effectively handle various question types. Empirical results show improvement over the QA baselines in top-k answer prediction accuracy on the proposed task. The proposed model also generates a graph walk path and attention vectors for each predicted answer, providing a natural way to explain its QA reasoning. |
Tasks | Question Answering |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/K19-1068/ |
https://www.aclweb.org/anthology/K19-1068 | |
PWC | https://paperswithcode.com/paper/memory-graph-networks-for-explainable-memory |
Repo | |
Framework | |
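The "dynamic expansion of memory slots through graph traversals" can be illustrated with a plain BFS over a toy memory graph. MGN learns which edges to follow; the graph, node names, and fixed hop budget below are hypothetical stand-ins:

```python
from collections import deque

# Hypothetical memory graph: episodic memory nodes linked to entities.
GRAPH = {
    "mem:beach_trip_2019": ["ent:Alice", "ent:Hawaii"],
    "ent:Alice":           ["mem:beach_trip_2019", "mem:dinner_2020"],
    "ent:Hawaii":          ["mem:beach_trip_2019"],
    "mem:dinner_2020":     ["ent:Alice", "ent:Bob"],
    "ent:Bob":             ["mem:dinner_2020"],
}

def expand_memory(seeds, hops):
    """Grow the initial memory slots by traversing up to `hops` edges;
    MGN does this with learned traversal, here it is exhaustive BFS."""
    slots, frontier = set(seeds), deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nbr in GRAPH.get(node, []):
            if nbr not in slots:
                slots.add(nbr)
                frontier.append((nbr, depth + 1))
    return slots

# "Who was at dinner with the person I went to Hawaii with?" requires
# chaining Hawaii -> trip -> Alice -> dinner -> Bob, i.e. four hops.
slots = expand_memory({"ent:Hawaii"}, hops=4)
```

This is why a fixed-slot memory network fails on such queries: the answer node is simply not among the initial slots until the graph is traversed.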
Adaptively Scheduled Multitask Learning: The Case of Low-Resource Neural Machine Translation
Title | Adaptively Scheduled Multitask Learning: The Case of Low-Resource Neural Machine Translation |
Authors | Poorya Zaremoodi, Gholamreza Haffari |
Abstract | Neural Machine Translation (NMT), a data-hungry technology, suffers from the lack of bilingual data in low-resource scenarios. Multitask learning (MTL) can alleviate this issue by injecting inductive biases into NMT, using auxiliary syntactic and semantic tasks. However, an effective training schedule is required to balance the importance of tasks to get the best use of the training signal. The role of the training schedule becomes even more crucial in biased-MTL, where the goal is to improve one (or a subset) of tasks the most, e.g. translation quality. Current approaches for biased-MTL are based on brittle hand-engineered heuristics that require trial and error and should be (re-)designed for each learning scenario. To the best of our knowledge, ours is the first work on adaptively and dynamically changing the training schedule in biased-MTL. We propose a rigorous approach for automatically reweighting the training data of the main and auxiliary tasks throughout the training process based on their contributions to the generalisability of the main NMT task. Our experiments on translating from English to Vietnamese/Turkish/Spanish show improvements of up to +1.2 BLEU points, compared to strong baselines. Additionally, our analyses shed light on the dynamics of needs throughout the training of NMT: from syntax to semantics. |
Tasks | Low-Resource Neural Machine Translation, Machine Translation |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5618/ |
https://www.aclweb.org/anthology/D19-5618 | |
PWC | https://paperswithcode.com/paper/adaptively-scheduled-multitask-learning-the |
Repo | |
Framework | |
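One common concrete instantiation of adaptive auxiliary-task weighting is gradient alignment: downweight an auxiliary task when its gradient conflicts with the main task's. This is a simplified stand-in for intuition, not the paper's method, which reweights tasks by their measured contribution to the main task's generalisability:

```python
import numpy as np

def task_weight(main_grad, aux_grad):
    """Weight an auxiliary task by the cosine similarity between its
    gradient and the main task's gradient, clipped at zero so that
    conflicting auxiliary updates are ignored."""
    cos = main_grad @ aux_grad / (
        np.linalg.norm(main_grad) * np.linalg.norm(aux_grad) + 1e-12)
    return max(cos, 0.0)

g_main = np.array([1.0, 0.0])
w_helpful = task_weight(g_main, np.array([1.0, 1.0]))   # aligned task
w_harmful = task_weight(g_main, np.array([-1.0, 0.0]))  # conflicting task
update = g_main + w_helpful * np.array([1.0, 1.0])      # reweighted step
```

The same scalar-weight idea extends to the paper's setting by making the weights a function of training progress rather than a fixed heuristic schedule.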
Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation
Title | Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation |
Authors | Raj Dabre, Atsushi Fujita, Chenhui Chu |
Abstract | This paper highlights the impressive utility of multi-parallel corpora for transfer learning in a one-to-many low-resource neural machine translation (NMT) setting. We report on a systematic comparison of multistage fine-tuning configurations, consisting of (1) pre-training on an external large (209k–440k) parallel corpus for English and a helping target language, (2) mixed pre-training or fine-tuning on a mixture of the external and low-resource (18k) target parallel corpora, and (3) pure fine-tuning on the target parallel corpora. Our experiments confirm that multi-parallel corpora are extremely useful despite their scarcity and content-wise redundancy, thus exhibiting the true power of multilingualism. Even when the helping target language is not one of the target languages of our concern, our multistage fine-tuning can give 3–9 BLEU score gains over a simple one-to-one model. |
Tasks | Low-Resource Neural Machine Translation, Machine Translation, Transfer Learning |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1146/ |
https://www.aclweb.org/anthology/D19-1146 | |
PWC | https://paperswithcode.com/paper/exploiting-multilingualism-through-multistage |
Repo | |
Framework | |
Bilingual Low-Resource Neural Machine Translation with Round-Tripping: The Case of Persian-Spanish
Title | Bilingual Low-Resource Neural Machine Translation with Round-Tripping: The Case of Persian-Spanish |
Authors | Benyamin Ahmadnia, Bonnie Dorr |
Abstract | The quality of Neural Machine Translation (NMT), as a data-driven approach, depends massively on the quantity, quality, and relevance of the training dataset. Such approaches have achieved promising results for bilingually high-resource scenarios but are inadequate for low-resource conditions. This paper describes a round-trip training approach to bilingual low-resource NMT that takes advantage of monolingual datasets to address training data scarcity, thus augmenting translation quality. We conduct detailed experiments on Persian-Spanish as a bilingually low-resource scenario. Experimental results demonstrate that this competitive approach outperforms the baselines. |
Tasks | Low-Resource Neural Machine Translation, Machine Translation |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1003/ |
https://www.aclweb.org/anthology/R19-1003 | |
PWC | https://paperswithcode.com/paper/bilingual-low-resource-neural-machine |
Repo | |
Framework | |
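The round-trip idea — translate a monolingual sentence out and back, and reward the model pair for reconstructing it — can be caricatured with dictionary "translators". The word tables and reconstruction reward below are toy assumptions; the actual approach trains two NMT models and back-propagates such a signal into both:

```python
# Hypothetical word tables standing in for fa->es and es->fa NMT models.
FA2ES = {"salam": "hola", "donya": "mundo"}
ES2FA = {"hola": "salam", "mundo": "donya"}

def translate(sentence, table):
    # Unknown words pass through unchanged, like a copy fallback.
    return " ".join(table.get(w, w) for w in sentence.split())

def round_trip_reward(sentence):
    """Token-overlap reward for fa -> es -> fa reconstruction: 1.0 when
    the round trip reproduces the monolingual sentence exactly."""
    back = translate(translate(sentence, FA2ES), ES2FA)
    orig, rec = sentence.split(), back.split()
    return sum(a == b for a, b in zip(orig, rec)) / len(orig)

reward = round_trip_reward("salam donya")
```

The appeal for low-resource pairs is that this reward needs only monolingual Persian (and, symmetrically, Spanish) text, not parallel data.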