July 26, 2019

Paper Group NANR 112

Ethical by Design: Ethics Best Practices for Natural Language Processing. Building Better Open-Source Tools to Support Fairness in Automated Scoring. Handling Multi-Sentence Queries in a Domain Independent Dialogue System. Integrating the Management of Personal Data Protection and Open Science with Research Ethics. Predicting User Activity Level In …

Ethical by Design: Ethics Best Practices for Natural Language Processing

Title Ethical by Design: Ethics Best Practices for Natural Language Processing
Authors Jochen L. Leidner, Vassilis Plachouras
Abstract Natural language processing (NLP) systems analyze and/or generate human language, typically on users' behalf. One natural and necessary question that needs to be addressed in this context, both in research projects and in production settings, is the question how ethical the work is, both regarding the process and its outcome. Towards this end, we articulate a set of issues, propose a set of best practices, notably a process featuring an ethics review board, and sketch how they could be meaningfully applied. Our main argument is that ethical outcomes ought to be achieved by design, i.e. by following a process aligned by ethical values. We also offer some response options for those facing ethics issues. While a number of previous works exist that discuss ethical issues, in particular around big data and machine learning, to the authors' knowledge this is the first account of NLP and ethics from the perspective of a principled process.
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1604/
PDF https://www.aclweb.org/anthology/W17-1604
PWC https://paperswithcode.com/paper/ethical-by-design-ethics-best-practices-for

Building Better Open-Source Tools to Support Fairness in Automated Scoring

Title Building Better Open-Source Tools to Support Fairness in Automated Scoring
Authors Nitin Madnani, Anastassia Loukina, Alina von Davier, Jill Burstein, Aoife Cahill
Abstract Automated scoring of written and spoken responses is an NLP application that can significantly impact lives especially when deployed as part of high-stakes tests such as the GRE® and the TOEFL®. Ethical considerations require that automated scoring algorithms treat all test-takers fairly. The educational measurement community has done significant research on fairness in assessments and automated scoring systems must incorporate their recommendations. The best way to do that is by making available automated, non-proprietary tools to NLP researchers that directly incorporate these recommendations and generate the analyses needed to help identify and resolve biases in their scoring systems. In this paper, we attempt to provide such a solution.
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1605/
PDF https://www.aclweb.org/anthology/W17-1605
PWC https://paperswithcode.com/paper/building-better-open-source-tools-to-support

Handling Multi-Sentence Queries in a Domain Independent Dialogue System

Title Handling Multi-Sentence Queries in a Domain Independent Dialogue System
Authors Prathyusha Jwalapuram, Radhika Mamidi
Published 2017-12-01
URL https://www.aclweb.org/anthology/W17-7516/
PDF https://www.aclweb.org/anthology/W17-7516
PWC https://paperswithcode.com/paper/handling-multi-sentence-queries-in-a-domain

Integrating the Management of Personal Data Protection and Open Science with Research Ethics

Title Integrating the Management of Personal Data Protection and Open Science with Research Ethics
Authors Dave Lewis, Joss Moorkens, Kaniz Fatema
Abstract We examine the impact of the EU General Data Protection Regulation and the push from research funders to provide open access research data on the current practices in Language Technology Research. We analyse the challenges that arise and the opportunities to address many of them through the use of existing open data practices. We discuss the impact of this also on current practice in research ethics.
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1607/
PDF https://www.aclweb.org/anthology/W17-1607
PWC https://paperswithcode.com/paper/integrating-the-management-of-personal-data

Predicting User Activity Level In Point Processes With Mass Transport Equation

Title Predicting User Activity Level In Point Processes With Mass Transport Equation
Authors Yichen Wang, Xiaojing Ye, Hongyuan Zha, Le Song
Abstract Point processes are powerful tools to model user activities and have a plethora of applications in social sciences. Predicting user activities based on point processes is a central problem. However, existing works are mostly problem specific, use heuristics, or simplify the stochastic nature of point processes. In this paper, we propose a framework that provides an unbiased estimator of the probability mass function of point processes. In particular, we design a key reformulation of the prediction problem, and further derive a differential-difference equation to compute a conditional probability mass function. Our framework is applicable to general point processes and prediction tasks, and achieves superb predictive and efficiency performance in diverse real-world applications compared to state-of-the-art methods.
Tasks Point Processes
Published 2017-12-01
URL http://papers.nips.cc/paper/6762-predicting-user-activity-level-in-point-processes-with-mass-transport-equation
PDF http://papers.nips.cc/paper/6762-predicting-user-activity-level-in-point-processes-with-mass-transport-equation.pdf
PWC https://paperswithcode.com/paper/predicting-user-activity-level-in-point

Ethical Considerations in NLP Shared Tasks

Title Ethical Considerations in NLP Shared Tasks
Authors Carla Parra Escartín, Wessel Reijers, Teresa Lynn, Joss Moorkens, Andy Way, Chao-Hong Liu
Abstract Shared tasks are increasingly common in our field, and new challenges are suggested at almost every conference and workshop. However, as this has become an established way of pushing research forward, it is important to discuss how we researchers organise and participate in shared tasks, and make that information available to the community to allow further research improvements. In this paper, we present a number of ethical issues along with other areas of concern that are related to the competitive nature of shared tasks. As such issues could potentially impact on research ethics in the Natural Language Processing community, we also propose the development of a framework for the organisation of and participation in shared tasks that can help mitigate against these issues arising.
Tasks Machine Translation
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1608/
PDF https://www.aclweb.org/anthology/W17-1608
PWC https://paperswithcode.com/paper/ethical-considerations-in-nlp-shared-tasks

LexSubNC: A Dataset of Lexical Substitution for Nominal Compounds

Title LexSubNC: A Dataset of Lexical Substitution for Nominal Compounds
Authors Rodrigo Wilkens, Leonardo Zilio, Silvio Ricardo Cordeiro, Felipe Paula, Carlos Ramisch, Marco Idiart, Aline Villavicencio
Tasks Machine Translation, Text Simplification
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-6941/
PDF https://www.aclweb.org/anthology/W17-6941
PWC https://paperswithcode.com/paper/lexsubnc-a-dataset-of-lexical-substitution

Universal Dependencies

Title Universal Dependencies
Authors Joakim Nivre, Daniel Zeman, Filip Ginter, Francis Tyers
Abstract Universal Dependencies (UD) is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages. This tutorial gives an introduction to the UD framework and resources, from basic design principles to annotation guidelines and existing treebanks. We also discuss tools for developing and exploiting UD treebanks and survey applications of UD in NLP and linguistics.
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-5001/
PDF https://www.aclweb.org/anthology/E17-5001
PWC https://paperswithcode.com/paper/universal-dependencies

GeoDict: an integrated gazetteer

Title GeoDict: an integrated gazetteer
Authors Jacques Fize, Gaurav Shrivastava, Pierre André Ménard
Tasks Epidemiology
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-7004/
PDF https://www.aclweb.org/anthology/W17-7004
PWC https://paperswithcode.com/paper/geodict-an-integrated-gazetteer

Proceedings of the IWCS workshop on Foundations of Situated and Multimodal Communication

Title Proceedings of the IWCS workshop on Foundations of Situated and Multimodal Communication
Tasks Language Acquisition
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-7100/
PDF https://www.aclweb.org/anthology/W17-7100
PWC https://paperswithcode.com/paper/proceedings-of-the-iwcs-workshop-on

Text-Picture Relations in Multimodal Instructions

Title Text-Picture Relations in Multimodal Instructions
Authors Ielka van der Sluis, Anne Nienke Eppinga, Gisela Redeker
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-7104/
PDF https://www.aclweb.org/anthology/W17-7104
PWC https://paperswithcode.com/paper/text-picture-relations-in-multimodal

Ethical Research Protocols for Social Media Health Research

Title Ethical Research Protocols for Social Media Health Research
Authors Adrian Benton, Glen Coppersmith, Mark Dredze
Abstract Social media have transformed data-driven research in political science, the social sciences, health, and medicine. Since health research often touches on sensitive topics that relate to ethics of treatment and patient privacy, similar ethical considerations should be acknowledged when using social media data in health research. While much has been said regarding the ethical considerations of social media research, health research leads to an additional set of concerns. We provide practical suggestions in the form of guidelines for researchers working with social media data in health research. These guidelines can inform an IRB proposal for researchers new to social media health research.
Tasks Decision Making
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1612/
PDF https://www.aclweb.org/anthology/W17-1612
PWC https://paperswithcode.com/paper/ethical-research-protocols-for-social-media

Multimodal Fusion with Recurrent Neural Networks for Rumor Detection on Microblogs

Title Multimodal Fusion with Recurrent Neural Networks for Rumor Detection on Microblogs
Authors Zhiwei Jin, Juan Cao, Han Guo, Yongdong Zhang
Abstract Microblogs have become popular media for news propagation in recent years. Meanwhile, numerous rumors and fake news also bloom and spread wildly on the open social media platforms. Without verification, they could seriously jeopardize the credibility of microblogs. We observe that an increasing number of users are using images and videos to post news in addition to texts. Tweets or microblogs are commonly composed of text, image and social context. In this paper, we propose a novel Recurrent Neural Network with an attention mechanism (att-RNN) to fuse multimodal features for effective rumor detection. In this end-to-end network, image features are incorporated into the joint features of text and social context, which are obtained with an LSTM (Long-Short Term Memory) network, to produce a reliable fused classification. The neural attention from the outputs of the LSTM is utilized when fusing with the visual features. Extensive experiments are conducted on two multimedia rumor datasets collected from Weibo and Twitter. The results demonstrate the effectiveness of the proposed end-to-end att-RNN in detecting rumors with multimodal contents.
Published 2017-10-23
URL https://dl.acm.org/citation.cfm?id=3123454
PDF https://doi.org/10.1145/3123266.3123454
PWC https://paperswithcode.com/paper/multimodal-fusion-with-recurrent-neural

Priv’IT: Private and Sample Efficient Identity Testing

Title Priv’IT: Private and Sample Efficient Identity Testing
Authors Bryan Cai, Constantinos Daskalakis, Gautam Kamath
Abstract We develop differentially private hypothesis testing methods for the small sample regime. Given a sample $\mathcal{D}$ from a categorical distribution $p$ over some domain $\Sigma$, an explicitly described distribution $q$ over $\Sigma$, some privacy parameter $\epsilon$, accuracy parameter $\alpha$, and requirements $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ for the type I and type II errors of our test, the goal is to distinguish between $p=q$ and $d_\mathrm{tv}(p,q) \ge \alpha$. We provide theoretical bounds for the sample size $|\mathcal{D}|$ so that our method both satisfies $(\epsilon,0)$-differential privacy, and guarantees $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ type I and type II errors. We show that differential privacy may come for free in some regimes of parameters, and we always beat the sample complexity resulting from running the $\chi^2$-test with noisy counts, or standard approaches such as repetition for endowing non-private $\chi^2$-style statistics with differential privacy guarantees. We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing.
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=459
PDF http://proceedings.mlr.press/v70/cai17a/cai17a.pdf
PWC https://paperswithcode.com/paper/privit-private-and-sample-efficient-identity-1

Four types of temporal signals

Title Four types of temporal signals
Authors Kiyong Lee
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-7412/
PDF https://www.aclweb.org/anthology/W17-7412
PWC https://paperswithcode.com/paper/four-types-of-emporal-signals