Document Type
Article
Publication Date
9-29-2017
Abstract
In recent years there have been many studies investigating gender biases in the content and editorial process of Wikipedia. In addition to creating a distorted account of knowledge, biases in Wikipedia and similar corpora have especially harmful downstream effects, as these corpora are often used in Artificial Intelligence and Machine Learning applications. As a result, many of the algorithms deployed in production "learn" the same biases inherent in the data they consume. It is therefore increasingly important to develop quantitative metrics to measure bias. In this study we propose a simple metric, the Gendered Pronoun Gap, that measures the ratio of the occurrences of the pronoun "he" versus the pronoun "she." We use this metric to investigate the distribution of the Gendered Pronoun Gap in two Wikipedia corpora prepared by Machine Learning companies for developing and benchmarking algorithms. Our results suggest that the way these datasets have been produced introduces different types of gender biases that can potentially distort the learning process for Machine Learning algorithms. While a single metric cannot completely capture the rich nuances of bias, we suggest that the Gendered Pronoun Gap can serve as one of many such metrics.
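The abstract describes the Gendered Pronoun Gap as the ratio of occurrences of "he" to occurrences of "she" in a text. The paper itself does not specify an implementation here, but a minimal sketch of such a count-based ratio (assuming case-insensitive, whole-word matching, and a hypothetical function name `gendered_pronoun_gap`) might look like:

```python
import re
from collections import Counter

def gendered_pronoun_gap(text):
    """Return the ratio of whole-word occurrences of 'he' to 'she',
    matched case-insensitively. Returns None if 'she' never occurs,
    since the ratio is then undefined."""
    tokens = re.findall(r"\b\w+\b", text.lower())
    counts = Counter(tokens)
    he, she = counts["he"], counts["she"]
    if she == 0:
        return None
    return he / she

# Toy usage: 2 occurrences of "he", 1 of "she" -> gap of 2.0
print(gendered_pronoun_gap("He said she was late. He left."))
```

In practice, tokenization choices (handling of capitalization, contractions, and sentence boundaries) and how zero denominators are treated would affect the measured gap across a corpus.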
Relational Format
journal article
Creative Commons License
This work is licensed under a Creative Commons Attribution-No Derivative Works 4.0 International License.
Recommended Citation
Yazdani, Mehrdad, "Investigating the Gender Pronoun Gap in Wikipedia" (2017). WikiStudies. 4.
https://egrove.olemiss.edu/wiki_studies/4