July 26, 2019


Paper Group NANR 112

Ethical by Design: Ethics Best Practices for Natural Language Processing. Building Better Open-Source Tools to Support Fairness in Automated Scoring. Handling Multi-Sentence Queries in a Domain Independent Dialogue System. Integrating the Management of Personal Data Protection and Open Science with Research Ethics. Predicting User Activity Level In …

Ethical by Design: Ethics Best Practices for Natural Language Processing


Title	Ethical by Design: Ethics Best Practices for Natural Language Processing
Authors	Jochen L. Leidner, Vassilis Plachouras
Abstract	Natural language processing (NLP) systems analyze and/or generate human language, typically on users' behalf. One natural and necessary question that needs to be addressed in this context, both in research projects and in production settings, is the question of how ethical the work is, both regarding the process and its outcome. Towards this end, we articulate a set of issues, propose a set of best practices, notably a process featuring an ethics review board, and sketch how they could be meaningfully applied. Our main argument is that ethical outcomes ought to be achieved by design, i.e. by following a process aligned by ethical values. We also offer some response options for those facing ethics issues. While a number of previous works exist that discuss ethical issues, in particular around big data and machine learning, to the authors' knowledge this is the first account of NLP and ethics from the perspective of a principled process.
Tasks
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1604/
PDF	https://www.aclweb.org/anthology/W17-1604
PWC	https://paperswithcode.com/paper/ethical-by-design-ethics-best-practices-for
Repo
Framework

Building Better Open-Source Tools to Support Fairness in Automated Scoring


Title	Building Better Open-Source Tools to Support Fairness in Automated Scoring
Authors	Nitin Madnani, Anastassia Loukina, Alina von Davier, Jill Burstein, Aoife Cahill
Abstract	Automated scoring of written and spoken responses is an NLP application that can significantly impact lives especially when deployed as part of high-stakes tests such as the GRE® and the TOEFL®. Ethical considerations require that automated scoring algorithms treat all test-takers fairly. The educational measurement community has done significant research on fairness in assessments and automated scoring systems must incorporate their recommendations. The best way to do that is by making available automated, non-proprietary tools to NLP researchers that directly incorporate these recommendations and generate the analyses needed to help identify and resolve biases in their scoring systems. In this paper, we attempt to provide such a solution.
Tasks
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1605/
PDF	https://www.aclweb.org/anthology/W17-1605
PWC	https://paperswithcode.com/paper/building-better-open-source-tools-to-support
Repo
Framework

Handling Multi-Sentence Queries in a Domain Independent Dialogue System


Title	Handling Multi-Sentence Queries in a Domain Independent Dialogue System
Authors	Prathyusha Jwalapuram, Radhika Mamidi
Abstract
Tasks
Published	2017-12-01
URL	https://www.aclweb.org/anthology/W17-7516/
PDF	https://www.aclweb.org/anthology/W17-7516
PWC	https://paperswithcode.com/paper/handling-multi-sentence-queries-in-a-domain
Repo
Framework

Integrating the Management of Personal Data Protection and Open Science with Research Ethics


Title	Integrating the Management of Personal Data Protection and Open Science with Research Ethics
Authors	Dave Lewis, Joss Moorkens, Kaniz Fatema
Abstract	We examine the impact of the EU General Data Protection Regulation and the push from research funders to provide open access research data on the current practices in Language Technology Research. We analyse the challenges that arise and the opportunities to address many of them through the use of existing open data practices. We discuss the impact of this also on current practice in research ethics.
Tasks
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1607/
PDF	https://www.aclweb.org/anthology/W17-1607
PWC	https://paperswithcode.com/paper/integrating-the-management-of-personal-data
Repo
Framework

Predicting User Activity Level In Point Processes With Mass Transport Equation


Title	Predicting User Activity Level In Point Processes With Mass Transport Equation
Authors	Yichen Wang, Xiaojing Ye, Hongyuan Zha, Le Song
Abstract	Point processes are powerful tools to model user activities and have a plethora of applications in social sciences. Predicting user activities based on point processes is a central problem. However, existing works are mostly problem specific, use heuristics, or simplify the stochastic nature of point processes. In this paper, we propose a framework that provides an unbiased estimator of the probability mass function of point processes. In particular, we design a key reformulation of the prediction problem, and further derive a differential-difference equation to compute a conditional probability mass function. Our framework is applicable to general point processes and prediction tasks, and achieves superb predictive and efficiency performance in diverse real-world applications compared to the state of the art.
Tasks	Point Processes
Published	2017-12-01
URL	http://papers.nips.cc/paper/6762-predicting-user-activity-level-in-point-processes-with-mass-transport-equation
PDF	http://papers.nips.cc/paper/6762-predicting-user-activity-level-in-point-processes-with-mass-transport-equation.pdf
PWC	https://paperswithcode.com/paper/predicting-user-activity-level-in-point
Repo
Framework

Ethical Considerations in NLP Shared Tasks


Title	Ethical Considerations in NLP Shared Tasks
Authors	Carla Parra Escartín, Wessel Reijers, Teresa Lynn, Joss Moorkens, Andy Way, Chao-Hong Liu
Abstract	Shared tasks are increasingly common in our field, and new challenges are suggested at almost every conference and workshop. However, as this has become an established way of pushing research forward, it is important to discuss how we researchers organise and participate in shared tasks, and make that information available to the community to allow further research improvements. In this paper, we present a number of ethical issues along with other areas of concern that are related to the competitive nature of shared tasks. As such issues could potentially impact on research ethics in the Natural Language Processing community, we also propose the development of a framework for the organisation of and participation in shared tasks that can help mitigate against these issues arising.
Tasks	Machine Translation
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1608/
PDF	https://www.aclweb.org/anthology/W17-1608
PWC	https://paperswithcode.com/paper/ethical-considerations-in-nlp-shared-tasks
Repo
Framework

LexSubNC: A Dataset of Lexical Substitution for Nominal Compounds


Title	LexSubNC: A Dataset of Lexical Substitution for Nominal Compounds
Authors	Rodrigo Wilkens, Leonardo Zilio, Silvio Ricardo Cordeiro, Felipe Paula, Carlos Ramisch, Marco Idiart, Aline Villavicencio
Abstract
Tasks	Machine Translation, Text Simplification
Published	2017-01-01
URL	https://www.aclweb.org/anthology/W17-6941/
PDF	https://www.aclweb.org/anthology/W17-6941
PWC	https://paperswithcode.com/paper/lexsubnc-a-dataset-of-lexical-substitution
Repo
Framework

Universal Dependencies


Title	Universal Dependencies
Authors	Joakim Nivre, Daniel Zeman, Filip Ginter, Francis Tyers
Abstract	Universal Dependencies (UD) is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages. This tutorial gives an introduction to the UD framework and resources, from basic design principles to annotation guidelines and existing treebanks. We also discuss tools for developing and exploiting UD treebanks and survey applications of UD in NLP and linguistics.
Tasks
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-5001/
PDF	https://www.aclweb.org/anthology/E17-5001
PWC	https://paperswithcode.com/paper/universal-dependencies
Repo
Framework

GeoDict: an integrated gazetteer


Title	GeoDict: an integrated gazetteer
Authors	Jacques Fize, Gaurav Shrivastava, Pierre André Ménard
Abstract
Tasks	Epidemiology
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-7004/
PDF	https://www.aclweb.org/anthology/W17-7004
PWC	https://paperswithcode.com/paper/geodict-an-integrated-gazetteer
Repo
Framework

Proceedings of the IWCS workshop on Foundations of Situated and Multimodal Communication


Title	Proceedings of the IWCS workshop on Foundations of Situated and Multimodal Communication
Authors
Abstract
Tasks	Language Acquisition
Published	2017-01-01
URL	https://www.aclweb.org/anthology/W17-7100/
PDF	https://www.aclweb.org/anthology/W17-7100
PWC	https://paperswithcode.com/paper/proceedings-of-the-iwcs-workshop-on
Repo
Framework

Text-Picture Relations in Multimodal Instructions


Title	Text-Picture Relations in Multimodal Instructions
Authors	Ielka van der Sluis, Anne Nienke Eppinga, Gisela Redeker
Abstract
Tasks
Published	2017-01-01
URL	https://www.aclweb.org/anthology/W17-7104/
PDF	https://www.aclweb.org/anthology/W17-7104
PWC	https://paperswithcode.com/paper/text-picture-relations-in-multimodal
Repo
Framework

Ethical Research Protocols for Social Media Health Research


Title	Ethical Research Protocols for Social Media Health Research
Authors	Adrian Benton, Glen Coppersmith, Mark Dredze
Abstract	Social media have transformed data-driven research in political science, the social sciences, health, and medicine. Since health research often touches on sensitive topics that relate to ethics of treatment and patient privacy, similar ethical considerations should be acknowledged when using social media data in health research. While much has been said regarding the ethical considerations of social media research, health research leads to an additional set of concerns. We provide practical suggestions in the form of guidelines for researchers working with social media data in health research. These guidelines can inform an IRB proposal for researchers new to social media health research.
Tasks	Decision Making
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1612/
PDF	https://www.aclweb.org/anthology/W17-1612
PWC	https://paperswithcode.com/paper/ethical-research-protocols-for-social-media
Repo
Framework

Multimodal Fusion with Recurrent Neural Networks for Rumor Detection on Microblogs


Title	Multimodal Fusion with Recurrent Neural Networks for Rumor Detection on Microblogs
Authors	Zhiwei Jin, Juan Cao, Han Guo, Yongdong Zhang
Abstract	Microblogs have become popular media for news propagation in recent years. Meanwhile, numerous rumors and fake news also bloom and spread wildly on the open social media platforms. Without verification, they could seriously jeopardize the credibility of microblogs. We observe that an increasing number of users are using images and videos to post news in addition to texts. Tweets or microblogs are commonly composed of text, image and social context. In this paper, we propose a novel Recurrent Neural Network with an attention mechanism (att-RNN) to fuse multimodal features for effective rumor detection. In this end-to-end network, image features are incorporated into the joint features of text and social context, which are obtained with an LSTM (Long-Short Term Memory) network, to produce a reliable fused classification. The neural attention from the outputs of the LSTM is utilized when fusing with the visual features. Extensive experiments are conducted on two multimedia rumor datasets collected from Weibo and Twitter. The results demonstrate the effectiveness of the proposed end-to-end att-RNN in detecting rumors with multimodal contents.
Tasks
Published	2017-10-23
URL	https://dl.acm.org/citation.cfm?id=3123454
PDF	https://doi.org/10.1145/3123266.3123454
PWC	https://paperswithcode.com/paper/multimodal-fusion-with-recurrent-neural
Repo
Framework

Priv’IT: Private and Sample Efficient Identity Testing


Title	Priv’IT: Private and Sample Efficient Identity Testing
Authors	Bryan Cai, Constantinos Daskalakis, Gautam Kamath
Abstract	We develop differentially private hypothesis testing methods for the small sample regime. Given a sample $\mathcal{D}$ from a categorical distribution $p$ over some domain $\Sigma$, an explicitly described distribution $q$ over $\Sigma$, some privacy parameter $\epsilon$, accuracy parameter $\alpha$, and requirements $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ for the type I and type II errors of our test, the goal is to distinguish between $p=q$ and $d_\mathrm{tv}(p,q) \ge \alpha$. We provide theoretical bounds for the sample size $|\mathcal{D}|$ so that our method both satisfies $(\epsilon,0)$-differential privacy, and guarantees $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ type I and type II errors. We show that differential privacy may come for free in some regimes of parameters, and we always beat the sample complexity resulting from running the $\chi^2$-test with noisy counts, or standard approaches such as repetition for endowing non-private $\chi^2$-style statistics with differential privacy guarantees. We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing.
Tasks
Published	2017-08-01
URL	https://icml.cc/Conferences/2017/Schedule?showEvent=459
PDF	http://proceedings.mlr.press/v70/cai17a/cai17a.pdf
PWC	https://paperswithcode.com/paper/privit-private-and-sample-efficient-identity-1
Repo
Framework
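
Note: the Priv'IT abstract above names two ingredients that a short sketch may make concrete: the total variation distance $d_\mathrm{tv}(p,q)$ and the baseline of a $\chi^2$-test run on noisy counts. The Python below is a minimal illustration of that baseline only, not the paper's method. The function names, the choice of Laplace noise with scale $2/\epsilon$, and the assumption that the sample size is public are all assumptions made for this sketch and do not come from the abstract.

```python
import random


def total_variation(p, q):
    """Total variation distance between two distributions on the same finite domain."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two
    independent exponentials that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_chi_square(counts, q, epsilon, rng):
    """Pearson chi-square statistic computed on Laplace-noised counts.

    Assumption for this sketch: replacing one sample changes two
    histogram cells by 1 each (L1 sensitivity 2), so noise of scale
    2 / epsilon is added to every cell. The sample size n is treated
    as public and is not noised.
    """
    n = sum(counts)
    noisy = [c + laplace_noise(2.0 / epsilon, rng) for c in counts]
    return sum((c - n * qi) ** 2 / (n * qi) for c, qi in zip(noisy, q))
```

A caller would compare the returned statistic against a threshold chosen for the desired type I error; choosing that threshold so that the noise is accounted for is exactly the part this sketch leaves out.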

Four types of temporal signals


Title	Four types of temporal signals
Authors	Kiyong Lee
Abstract
Tasks
Published	2017-01-01
URL	https://www.aclweb.org/anthology/W17-7412/
PDF	https://www.aclweb.org/anthology/W17-7412
PWC	https://paperswithcode.com/paper/four-types-of-emporal-signals
Repo
Framework