Honors Theses

Date of Award


Document Type

Undergraduate Thesis


Computer and Information Science

First Advisor

Dawn Wilkins

Relational Format



This thesis is a first attempt at assessing patterns related to gender in classical literature written before the 21st century. It looks specifically at how writing style and language differ between female and male authors, e.g., in sentence structure, lexical diversity, and dictionary similarity. Further topics of analysis include gender distribution and sentiment within the novels themselves. Seventy-eight open-source novels were used in the analysis: thirty-one by female authors and forty-seven by male authors. Data was extracted from the novels using Python, natural language processing, and the Natural Language Toolkit (NLTK). Graphs and visuals were then created to summarize the data gathered. This information was used to explore whether bias or patterns exist among authors in their writing style or in their descriptions of and references to gender within their novels. The thesis concludes that, while no distinctive pattern of bias or prejudice exists among the authors, writing style and vocabulary correlate strongly with author gender.
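
As a rough illustration of two of the stylistic measures named above, the sketch below computes lexical diversity and mean sentence length for a short text. The abstract does not say how the thesis defines either measure, so the type-token ratio and the regex-based tokenizer used here are assumptions, and the function names are hypothetical. The thesis itself used NLTK; plain regular expressions stand in for it here so that the example runs without extra downloads.

```python
import re

# Assumed word pattern: runs of letters, optionally with one internal apostrophe.
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Return lowercase word tokens (a simple stand-in for an NLTK tokenizer)."""
    return WORD_PATTERN.findall(text.lower())


def lexical_diversity(text):
    """Type-token ratio: number of distinct words divided by total words."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mean_sentence_length(text):
    """Average number of words per sentence, splitting on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


sample = "The cat sat on the mat. The dog sat too."
print(lexical_diversity(sample))      # 7 distinct words out of 10
print(mean_sentence_length(sample))   # 10 words over 2 sentences
```

Type-token ratio is sensitive to text length, so comparing whole novels of different sizes would likely require normalizing, for example by measuring fixed-size samples of each book.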


