Honors Theses

Date of Award


Document Type

Undergraduate Thesis


Computer and Information Science

First Advisor

Dawn Wilkins

Relational Format



This thesis is a first attempt at assessing gender-related patterns in classical literature written before the 21st century. It looks specifically at how writing style and language differ between female and male authors, e.g., in sentence structure, lexical diversity, and dictionary similarity. Further topics of analysis include gender distribution and sentiment within the novels themselves. Seventy-eight open-source novels were analyzed: thirty-one by female authors and forty-seven by male authors. Data was extracted from the novels using Python and natural language processing with the Natural Language Toolkit (NLTK). Graphs and visuals were then created to summarize the data gathered. This information was used to explore whether bias or patterns exist among authors in their writing style or in their descriptions of and references to gender within their novels. The thesis concludes that while no distinctive pattern of bias or prejudice exists among authors, writing style and vocabulary correlate strongly with author gender.
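
As an illustration only, the kinds of stylistic measures named in the abstract (lexical diversity, sentence structure, vocabulary similarity) might be computed along the following lines. This is a minimal sketch and not the thesis's actual code: the function names are hypothetical, the regular-expression tokenizer is a simplified stand-in for NLTK's tokenizers so that the example runs without downloading any data, and the use of a type-token ratio and Jaccard overlap is an assumption about how such measures could be defined.

```python
import re


def tokenize_words(text):
    """Return lowercased word tokens (a simplified stand-in for an NLTK tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def lexical_diversity(text):
    """Type-token ratio: distinct words divided by total words."""
    words = tokenize_words(text)
    return len(set(words)) / len(words) if words else 0.0


def mean_sentence_length(text):
    """Average number of word tokens per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(tokenize_words(s)) for s in sentences) / len(sentences)


def vocabulary_overlap(text_a, text_b):
    """Jaccard similarity between the vocabularies of two texts."""
    vocab_a = set(tokenize_words(text_a))
    vocab_b = set(tokenize_words(text_b))
    union = vocab_a | vocab_b
    return len(vocab_a & vocab_b) / len(union) if union else 0.0
```

Each function takes the raw text of a novel (or of two novels, for the overlap measure), so per-author or per-gender averages could be built by applying them across a corpus. Note that the type-token ratio is sensitive to text length, so texts of very different sizes would likely need to be compared on equal-sized samples.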

Available for download on Wednesday, September 02, 2020