Honors Theses
Date of Award
Spring 5-8-2022
Document Type
Undergraduate Thesis
Department
Computer and Information Science
First Advisor
Dawn E. Wilkins
Relational Format
Dissertation/Thesis
Abstract
This project compares methods of missing-data imputation in the context of predicting real estate listing prices. The methods are evaluated on two criteria: how closely they recreate the original data, and how they affect a final predictive model. First, a predictive model is built from the complete dataset to serve as a benchmark for the imputed datasets. The complete dataset is then split into 80% training and 20% testing sets, and missing values are introduced into the training data under two missing-data mechanisms: missing completely at random (MCAR) and missing at random (MAR). The resulting datasets are imputed using several popular imputation methods and used as training data for the same model architecture as the benchmark.
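The split-then-mask procedure described above can be illustrated with a minimal sketch. It is not the thesis code: the column names (`sqft`, `lot_size`), the 20% missingness rate, the synthetic listings, and the helper names `make_mcar` and `make_mar` are all assumptions made for illustration. Under MCAR every row has the same chance of losing a value; under MAR that chance depends on another, fully observed column.

```python
import random

def make_mcar(rows, target, rate, rng):
    """Blank out `target` in each row with the same probability `rate`."""
    out = []
    for row in rows:
        row = dict(row)  # copy so the complete data stays untouched
        if rng.random() < rate:
            row[target] = None
        out.append(row)
    return out

def make_mar(rows, target, driver, rate, rng):
    """Blank out `target` with a probability that depends on the observed `driver`."""
    median = sorted(r[driver] for r in rows)[len(rows) // 2]
    out = []
    for row in rows:
        row = dict(row)
        # Assumed rule: rows above the driver's median are three times as
        # likely to lose the target as rows at or below it.
        p = 1.5 * rate if row[driver] > median else 0.5 * rate
        if rng.random() < p:
            row[target] = None
        out.append(row)
    return out

rng = random.Random(0)
# Synthetic stand-in for a complete listings dataset (hypothetical columns).
listings = [
    {"sqft": rng.uniform(500, 4000), "lot_size": rng.uniform(0.1, 2.0)}
    for _ in range(1000)
]

# 80% training / 20% testing split; only the training rows receive missing values.
rng.shuffle(listings)
cut = int(0.8 * len(listings))
train, test = listings[:cut], listings[cut:]

train_mcar = make_mcar(train, "sqft", 0.2, rng)
train_mar = make_mar(train, "sqft", "lot_size", 0.2, rng)
```

Keeping the unmasked `train` rows alongside the masked copies is what makes the first comparison possible: imputed values can be scored against the values that were removed.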
The final predictive models show that multiple imputation using deterministic regression gives the best results for MCAR data, and multiple imputation using stochastic regression gives the best results for MAR data.
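The difference between deterministic and stochastic regression imputation can be shown with a small single-variable sketch. This is an illustration under stated assumptions, not the thesis implementation: it uses one predictor, synthetic data, and the hypothetical helpers `fit_line` and `regression_impute`, and it performs a single imputation pass rather than the multiple-imputation procedure the abstract refers to.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
    return a, b, sd

def regression_impute(xs, ys, stochastic=False, rng=None):
    """Fill None entries of ys from xs using a line fit on the complete cases."""
    if rng is None:
        rng = random.Random()
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, sd = fit_line([x for x, _ in observed], [y for _, y in observed])
    filled = []
    for x, y in zip(xs, ys):
        if y is None:
            y = a + b * x                 # deterministic: the fitted value
            if stochastic:
                y += rng.gauss(0.0, sd)   # stochastic: add residual-scale noise
        filled.append(y)
    return filled

# Synthetic example: a linear relationship with about 30% of y removed at random.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
truth = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]
ys = [None if rng.random() < 0.3 else y for y in truth]

deterministic = regression_impute(xs, ys)
stochastic = regression_impute(xs, ys, stochastic=True, rng=rng)
```

In this sketch the deterministic fills all lie exactly on the fitted line, so they understate the spread of the variable, while the stochastic fills add noise at the scale of the residuals to restore that spread.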
Recommended Citation
Donlen, Connor, "Comparative Analysis of Imputation Methods in Real Estate Data" (2022). Honors Theses. 2735.
https://egrove.olemiss.edu/hon_thesis/2735
Accessibility Status
Searchable text