Honors Theses

Date of Award

Spring 5-8-2022

Document Type

Undergraduate Thesis

Department

Computer and Information Science

First Advisor

Dawn E. Wilkins

Relational Format

Dissertation/Thesis

Abstract

This project compares several methods of missing-data imputation in the context of predicting real estate listing prices. The methods are compared both on their ability to recreate the original data and on their effect on a final predictive model. To evaluate them, a predictive model is first trained on the complete dataset to serve as a benchmark for the imputed datasets. The complete dataset is then split into 80% training and 20% testing sets, and missing values are created in the training data under two missing-data mechanisms: missing completely at random (MCAR) and missing at random (MAR). The resulting datasets are imputed using several popular imputation methods and used as training data for the same model architecture as the benchmark.

The final predictive models show that multiple imputation using deterministic regression gives the best results for MCAR data, and multiple imputation using stochastic regression gives the best results for MAR data.
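
As a rough illustration of the procedure described in the abstract, the sketch below creates MCAR and MAR missingness in one feature of a training split and fills it back in with deterministic and stochastic regression. It is not the thesis's code: the synthetic data, the column layout, the function names (`make_mcar`, `make_mar`, `regression_impute`), and the 20% missingness rate are all assumptions made for the example. It also performs a single imputation pass, whereas the abstract refers to multiple imputation.

```python
import numpy as np

def make_mcar(X, col, rate, rng):
    """MCAR: blank entries in `col` with a fixed probability, independent of the data."""
    X = X.astype(float).copy()
    mask = rng.random(X.shape[0]) < rate
    X[mask, col] = np.nan
    return X

def make_mar(X, col, driver, rate, rng):
    """MAR: probability of missingness in `col` depends only on the observed column `driver`."""
    X = X.astype(float).copy()
    # Assumption for illustration: rows with above-median driver values go missing more often.
    high = X[:, driver] > np.median(X[:, driver])
    p = np.where(high, min(1.0, 1.5 * rate), 0.5 * rate)
    mask = rng.random(X.shape[0]) < p
    X[mask, col] = np.nan
    return X

def regression_impute(X, col, rng=None):
    """Fill NaNs in `col` from a linear fit on the other (fully observed) columns.

    rng=None gives deterministic regression imputation; passing a generator adds
    Gaussian noise scaled to the residual spread (stochastic regression imputation).
    """
    X = X.copy()
    miss = np.isnan(X[:, col])
    others = np.delete(np.arange(X.shape[1]), col)
    A = np.column_stack([np.ones(X.shape[0]), X[:, others]])
    beta, *_ = np.linalg.lstsq(A[~miss], X[~miss, col], rcond=None)
    pred = A[miss] @ beta
    if rng is not None:
        resid = X[~miss, col] - A[~miss] @ beta
        pred = pred + rng.normal(0.0, resid.std(), size=pred.shape)
    X[miss, col] = pred
    return X

def rmse(a, b):
    """Root-mean-square error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic stand-in for listing data: columns are [sqft, beds, price].
rng = np.random.default_rng(0)
n = 500
sqft = rng.normal(2000, 400, n)
beds = rng.integers(1, 6, n).astype(float)
price = 100 * sqft + 10_000 * beds + rng.normal(0, 5_000, n)
X = np.column_stack([sqft, beds, price])

# 80% training / 20% testing split; only the training rows receive missing values.
idx = rng.permutation(n)
train, test = X[idx[:400]], X[idx[400:]]

mcar = make_mcar(train, col=0, rate=0.2, rng=rng)
mar = make_mar(train, col=0, driver=1, rate=0.2, rng=rng)

det = regression_impute(mcar, col=0)            # deterministic regression
sto = regression_impute(mar, col=0, rng=rng)    # stochastic regression

# Reconstruction error on the entries that were blanked out.
gone_mcar = np.isnan(mcar[:, 0])
gone_mar = np.isnan(mar[:, 0])
err_det = rmse(det[gone_mcar, 0], train[gone_mcar, 0])
err_sto = rmse(sto[gone_mar, 0], train[gone_mar, 0])
print(err_det, err_sto)
```

Comparing the imputed values with the held-back originals corresponds to the "ability to recreate the original data" criterion; the second criterion would involve training the same predictive model on each imputed training set and scoring it on the untouched test rows.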

Accessibility Status

Searchable text
