Date of Award

8-2002

Degree Name

Doctor of Philosophy

Department

Statistics

First Advisor

Dr. Joseph W. McKean

Abstract

Two major goals in discriminant analysis are discrimination and classification. In discrimination, the goal is to describe graphically (visualize) the features that distinguish several known groups. In classification, the goal is to allocate unknown observations to one of several known groups. We have developed new visualization procedures based on both traditional and robust estimating procedures. We also propose several robust classification procedures based on coordinatewise and affine equivariant, rank-based robust estimates. Empirical studies over many different error distributions yield empirical efficiencies of the robust and traditional procedures. The robust procedures are much less sensitive to outliers than the traditional procedures.

Traditional procedures in discriminant analysis generalize to Sliced Inverse Regression (SIR) and Sliced Average Variance Estimation (SAVE) for discrete data. These are visualization methods used to determine graphically group structures and patterns in the data. One of their goals is to determine these relationships in a much smaller coordinate system (the “principal discriminate co-ordinates”) than that of the original data. For both, a kernel matrix is obtained based upon a spanning set. The spanning set for SIR involves differences in locations of the variables, while the spanning set for SAVE contains differences in variance-covariance estimates as well as differences in locations. We present two new visualization procedures. Based on numerical linear algebra techniques, our first procedure allows the researcher to select, in order of importance, the vectors in the spanning set that enter the kernel. Our second procedure uses the same ordering technique but, by expanding the spanning set, also allows the researcher to know which particular differences (location, variance, covariance) are most important. We call this second procedure Sliced Mean Variance Covariance Inverse Regression (SMVCIR). We also investigate robust procedures for our two new visualization methods, as well as robust generalizations of SIR and SAVE, and we extend these procedures to continuous response data. We investigate and compare these procedures over many real and simulated data sets. We have made our procedures available to the scientific community via the World Wide Web.
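To make the kernel-from-spanning-set idea concrete, the following is a minimal sketch of the standard textbook form of SIR for grouped (discrete) data, written with NumPy. It is not the dissertation's code and does not implement the robust, ordered, or SMVCIR variants; the function name `sir_directions` and its parameters are hypothetical and chosen only for illustration. The spanning set here consists of the weighted group means of the standardized data, that is, the differences between each group location and the overall location.

```python
import numpy as np

def sir_directions(X, labels, n_dirs=2):
    """Textbook SIR for grouped data (illustrative sketch, hypothetical name)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Standardize: Z = (X - mean) @ cov^(-1/2), assuming cov has full rank.
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt

    # Spanning set: one column per group, holding the group mean of Z
    # weighted by the square root of the group proportion.
    groups = np.unique(labels)
    span = np.column_stack([
        np.sqrt(np.mean(labels == g)) * Z[labels == g].mean(axis=0)
        for g in groups
    ])

    # Kernel matrix and its eigen-decomposition, largest eigenvalue first.
    kernel = span @ span.T
    kvals, kvecs = np.linalg.eigh(kernel)
    order = np.argsort(kvals)[::-1]
    kvals, kvecs = kvals[order], kvecs[:, order]

    # Map the leading eigenvectors back to the original coordinate scale.
    directions = inv_sqrt @ kvecs[:, :n_dirs]
    return kvals, directions
```

Projecting the centered data onto the returned directions gives the low-dimensional coordinates in which group separation can be plotted. With G groups the location-based spanning set has rank at most G - 1, which is one reason SAVE-type spanning sets add variance-covariance differences.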

Access Setting

Dissertation-Open Access
