Date of Award

2001

Document Type

Undergraduate Thesis

Department

Computer and Information Science

First Advisor

Dawn Wilkins

Relational Format

Dissertation/Thesis

Abstract

The amount of raw data archived across many distinct domains is growing at an exponential rate. "Data mining" is a continuously evolving family of processes by which useful information is extracted from these data. Classification is one of these processes: the construction of descriptive models from labeled data objects in order to predict the labels of objects whose labels are unknown. The construction of these models is often adversely affected by incorrect or outlier values within the data, a phenomenon known as noise. The original motivation of this research was to test the performance of the binary genetic algorithm, one of many algorithms used for model construction, on data containing varying percentages of noise. In the course of experimentation, however, several issues arose concerning the effectiveness of the binary genetic algorithm as a classifier. Specifically, the chosen method for encoding classification hypotheses demonstrated limited scalability. Furthermore, the chosen method for encoding continuous and nominally valued data attributes proved unreasonably strict, leading to poor performance; further research should be undertaken to find a more reasonable encoding method. The algorithm nevertheless performed favorably on purely categorical data with a moderate number of small-domained dimensions. When varying percentages of noise were injected into these data, the algorithm exhibited a slow, steady decline in classification accuracy. These results lead to the conclusion that the binary genetic algorithm should not be discounted as a possible approach to data classification, especially for data sets with the above characteristics, and that further research could reveal hypothesis encoding strategies that improve scalability.
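
The noise-injection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the thesis's actual procedure, so everything here is an assumption: the function name `inject_noise`, its parameters, and the choice to corrupt a fixed fraction of attribute cells by swapping each for a different value from that attribute's domain are all hypothetical.

```python
import random


def inject_noise(rows, domains, rate, seed=0):
    """Return a copy of `rows` with a fraction `rate` of cells corrupted.

    Hypothetical helper, not the thesis's procedure.
    rows    -- list of tuples of categorical attribute values
    domains -- one list of allowed values per attribute (each with >= 2 values)
    rate    -- fraction of all cells to corrupt, between 0.0 and 1.0
    """
    rng = random.Random(seed)  # seeded so experiments are repeatable
    cells = [(r, c) for r in range(len(rows)) for c in range(len(domains))]
    n_noisy = round(rate * len(cells))
    noisy = [list(row) for row in rows]
    # Pick distinct cells, then replace each with a *different* domain value
    # so that every selected cell really becomes incorrect.
    for r, c in rng.sample(cells, n_noisy):
        alternatives = [v for v in domains[c] if v != noisy[r][c]]
        noisy[r][c] = rng.choice(alternatives)
    return noisy
```

Running a classifier on the output of `inject_noise` at several values of `rate` and comparing accuracy against the clean data is one way to produce the kind of accuracy-versus-noise comparison the abstract refers to.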

Recommended Citation

Stine, Matthew E., "Performance of Genetic Algorithms for Data Classification" (2001). Honors Theses. 676.
https://egrove.olemiss.edu/hon_thesis/676

Accessibility Status

Searchable text

Included in

Computer Sciences Commons