Honors Theses

Date of Award


Document Type

Undergraduate Thesis


Computer and Information Science

First Advisor

Yixin Chen

Relational Format



The Expectation-Maximization (EM) algorithm is an iterative method for maximum likelihood parameter estimation. It is used when some of the data are missing or incomplete, which makes the parameters of the underlying distribution difficult to estimate directly. The EM algorithm consists of two steps: the E-step and the M-step. In the E-step, the current parameter estimates are treated as the true values and used to compute the expected log-likelihood; in the M-step, this expected log-likelihood is maximized to obtain updated parameter estimates. The two steps alternate until a specified convergence criterion is met. Applications of the EM algorithm include density estimation in unsupervised clustering, estimation of class-conditional densities in supervised learning, and outlier detection. The Spatial-EM algorithm is a novel approach that replaces the sample mean and sample covariance matrix in the M-step of the EM algorithm with median-based location and rank-based scatter estimators. This replacement enhances the stability and robustness of the algorithm for finite mixture models, in particular its robustness to outliers. In this research, we use the trimmed Bayesian Information Criterion (BIC) to select the number of mixture components. The algorithm is implemented as an R package and tested on several datasets.
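
As a rough illustration of the E-step and M-step alternation described in the abstract, the following Python sketch fits a two-component, one-dimensional Gaussian mixture with the classical EM updates. It is not the thesis's R package and not the Spatial-EM algorithm: the M-step here uses the ordinary weighted sample mean and variance, which are exactly the estimators Spatial-EM replaces with median-based and rank-based ones. The function names (`normal_pdf`, `em_gmm_1d`), the quantile-based initialisation, the variance floor, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k=2, max_iter=200, tol=1e-8):
    """Classical EM for a k-component 1-D Gaussian mixture (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: spread the starting means over the data quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf

    for _ in range(max_iter):
        # E-step: posterior membership probabilities under the current parameters.
        resp = []
        ll = 0.0
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([p / total for p in joint])

        # M-step: weighted sample mean and variance (the non-robust estimators).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6,  # assumed floor to keep a component from collapsing
            )

        # Stop once the log-likelihood improvement falls below the tolerance.
        if ll - prev_ll < tol:
            break
        prev_ll = ll

    return weights, means, variances


# Synthetic data: two well-separated normal clusters (made up for this example).
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(6.0, 1.0) for _ in range(300)]
weights, means, variances = em_gmm_1d(sample)
```

Because the weighted mean and variance in the M-step give every observation influence proportional to its membership probability, a few extreme points can pull the estimates away from the bulk of a cluster. That sensitivity is the motivation the abstract gives for substituting robust location and scatter estimators.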

Accessibility Status

Searchable text


