## Honors Theses

#### Date of Award

2015

#### Document Type

Undergraduate Thesis

#### Department

Computer and Information Science

#### First Advisor

Yixin Chen

#### Relational Format

Dissertation/Thesis

#### Abstract

The Expectation Maximization (EM) algorithm is used to solve the maximum likelihood parameter estimation problem. This problem arises when some of the data involved are missing or incomplete, which makes it difficult to know the parameters of the underlying distribution. The EM algorithm comprises two main steps: the E-Step and the M-Step. In the E-Step, estimated parameter values are used as true values to calculate the maximum likelihood estimate, and in the M-Step, the maximum likelihood calculated is used to estimate the parameters. The E-Step and M-Step iterate until a specified convergence criterion is met. Applications of the EM algorithm include density estimation in unsupervised clustering, estimation of class-conditional densities in supervised learning settings, and outlier detection. The Spatial-EM algorithm is a novel approach that uses median-based location and rank-based scatter estimators to replace the sample mean and sample covariance matrix in the M-Step of an EM algorithm. This enhances the stability and robustness of the Spatial-EM algorithm for finite mixture models. The algorithm is especially robust to outliers. In this research, we use the trimmed Bayesian Information Criterion (BIC) to determine the optimal number of components in the distribution. The algorithm is implemented as an R package and tested on different datasets.
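As an illustration of the E-Step/M-Step iteration and of BIC-based selection of the number of components, the following is a minimal sketch only, not the thesis's R package. It assumes a one-dimensional Gaussian mixture, uses the ordinary sample mean and variance in the M-Step (not the median-based and rank-based estimators of Spatial-EM), and uses the standard BIC rather than the trimmed BIC. All function names (`em_gmm_1d`, `bic`, `choose_k`) and the quantile-based initialization are assumptions made for this example.

```python
import math
import random  # used only to generate demo data


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the mixture (floored to avoid log(0))."""
    total = 0.0
    for x in data:
        dens = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(dens, 1e-300))
    return total


def em_gmm_1d(data, k, iters=200, tol=1e-8):
    """Classical EM for a k-component 1-D Gaussian mixture (illustrative)."""
    n = len(data)
    ordered = sorted(data)
    # Assumption: initialize means at evenly spaced sample quantiles.
    means = [ordered[min(n - 1, int((j + 0.5) / k * n))] for j in range(k)]
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    variances = [max(grand_var, 1e-6)] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(iters):
        # E-Step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = max(sum(dens), 1e-300)
            resp.append([d / s for d in dens])
        # M-Step: re-estimate weights, means, and variances from those posteriors.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
        ll = log_likelihood(data, weights, means, variances)
        if abs(ll - prev_ll) < tol:  # stop once the log-likelihood stabilizes
            break
        prev_ll = ll
    return weights, means, variances, ll


def bic(ll, k, n):
    """Standard BIC; a k-component 1-D mixture has 3k - 1 free parameters."""
    return -2.0 * ll + (3 * k - 1) * math.log(n)


def choose_k(data, candidates):
    """Fit each candidate k and return (best_k, {k: BIC}); lower BIC is better."""
    scores = {k: bic(em_gmm_1d(data, k)[3], k, len(data)) for k in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(1)
    demo = [rng.gauss(0, 1) for _ in range(150)] + [rng.gauss(10, 1) for _ in range(150)]
    best_k, scores = choose_k(demo, [1, 2, 3])
    print("BIC by k:", scores, "selected k:", best_k)
```

Under this sign convention, the candidate with the lowest BIC is selected. A robust variant in the spirit of the abstract would replace the mean and variance updates in the M-Step and trim suspected outliers before computing the criterion; neither is attempted here.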

#### Recommended Citation

Aloba, Aishat O., "Estimating the Number of Components of a Spatial-Em Algorithm: an R Package" (2015). *Honors Theses*. 13.

https://egrove.olemiss.edu/hon_thesis/13