Date of Award
Computer and Information Science
Dawn E. Wilkins
This project compares methods of missing data imputation in the context of predicting real estate listing prices. The methods are compared on both their ability to recreate the original data and their effect on a final predictive model. To evaluate their effectiveness, a predictive model is first built on the complete dataset to serve as a benchmark for the imputed datasets. The complete dataset is then split into 80% training and 20% testing sets, and missing values are introduced into the training data under two missing data mechanisms: missing completely at random (MCAR) and missing at random (MAR). The resulting datasets are imputed using several popular imputation methods and used as training data for the same model architecture as the benchmark.
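As a rough illustration of the setup described above, the following sketch performs an 80/20 split and then introduces MCAR and MAR missingness into the training portion only. It is not the thesis's code: the synthetic two-column feature matrix, the helper names `make_mcar` and `make_mar`, the 20% missingness rate, and the rule that ties MAR missingness to an above-median value in another column are all assumptions chosen for illustration.

```python
import numpy as np

def make_mcar(X, column, rate, rng):
    """Blank out entries of `column` with the same probability for every row."""
    X = X.astype(float).copy()
    mask = rng.random(X.shape[0]) < rate
    X[mask, column] = np.nan
    return X

def make_mar(X, column, driver, rate, rng):
    """Blank out entries of `column` with a probability that depends only on
    the fully observed `driver` column (hypothetical rule: rows with an
    above-median driver value are three times as likely to be missing)."""
    X = X.astype(float).copy()
    high = X[:, driver] > np.median(X[:, driver])
    prob = np.where(high, 1.5 * rate, 0.5 * rate)
    mask = rng.random(X.shape[0]) < prob
    X[mask, column] = np.nan
    return X

rng = np.random.default_rng(0)

# Synthetic stand-in for listing features: [square_feet, bedrooms]
X = np.column_stack([
    rng.normal(2000, 500, 1000),
    rng.integers(1, 6, 1000),
])

# 80% training / 20% testing split; only the training rows lose values
idx = rng.permutation(len(X))
cut = int(0.8 * len(X))
X_train, X_test = X[idx[:cut]], X[idx[cut:]]

X_mcar = make_mcar(X_train, column=1, rate=0.2, rng=rng)
X_mar = make_mar(X_train, column=1, driver=0, rate=0.2, rng=rng)
```

Under MCAR the missingness mask is independent of the data, while under MAR it depends on an observed column (here `square_feet`) but not on the missing values themselves. The test set is left complete so that every imputed training set can be evaluated against the same held-out rows.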
The final predictive models show that multiple imputation using deterministic regression gives the best results for MCAR data, and multiple imputation using stochastic regression gives the best results for MAR data.
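To make the distinction between the two regression variants concrete, the sketch below imputes one variable from another in a single pass. It is a simplified illustration under stated assumptions, not the procedure used in the thesis: the synthetic data, the helper name `regression_impute`, and the single-predictor linear fit are hypothetical, and a multiple imputation procedure would repeat the imputation several times and pool the results rather than impute once.

```python
import numpy as np

def regression_impute(x, y, stochastic, rng):
    """Fill NaNs in y using a least-squares line fitted on the observed pairs.

    Deterministic: each missing value is replaced by its fitted value.
    Stochastic: a normal residual draw is added to each fitted value.
    """
    y = y.copy()
    obs = ~np.isnan(y)
    # Fit y = a + b * x on the rows where y is observed
    A = np.column_stack([np.ones(obs.sum()), x[obs]])
    coef, *_ = np.linalg.lstsq(A, y[obs], rcond=None)
    pred = coef[0] + coef[1] * x[~obs]
    if stochastic:
        # Residual standard deviation (two fitted parameters)
        sigma = (y[obs] - A @ coef).std(ddof=2)
        pred = pred + rng.normal(0.0, sigma, size=pred.shape)
    y[~obs] = pred
    return y

rng = np.random.default_rng(0)

# Synthetic stand-in: price depends linearly on square footage plus noise
x = rng.normal(2000, 500, 500)
y = 50_000 + 100 * x + rng.normal(0, 20_000, 500)
y[rng.random(500) < 0.3] = np.nan  # values removed for illustration

y_det = regression_impute(x, y, stochastic=False, rng=rng)
y_sto = regression_impute(x, y, stochastic=True, rng=rng)
```

Deterministic imputations fall exactly on the fitted line, which understates the variability of the imputed variable; the stochastic variant adds residual noise back in so that the imputed values scatter around the line as the observed values do.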
Donlen, Connor, "Comparative Analysis of Imputation Methods in Real Estate Data" (2022). Honors Theses. 2735.