Date of Award

6-2020

Degree Name

Doctor of Philosophy

Department

Statistics

First Advisor

Dr. Joshua Naranjo

Second Advisor

Dr. Joseph McKean

Third Advisor

Dr. Hyun Bin Kang

Fourth Advisor

Dr. Bradford Dykes

Keywords

PRESS statistic, influential observations, diagnostic analysis, mean square error

Abstract

The most widely used statistic, R², has a fundamental weakness in model building: it favors adding more predictors to the model because R² never decreases when a predictor is added. In effect, the additional predictors start fitting the noise in the data. Other criteria for selecting a regression model, such as adjusted R², AIC, SBC, and Mallows's Cp, do not guarantee that the selected model will also make better predictions of future values. To avoid this, data scientists withhold a percentage of the data for validation purposes. The PRESS statistic does something similar by withholding each observation when calculating its own predicted value. In this paper, we investigated the properties of the PRESS statistic and explored how it performs compared to other criteria in model selection. We also derived estimators of the parameters of interest in linear regression that are based on PRESS, while maintaining desirable statistical properties of estimators such as unbiasedness. A diagnostic statistic that looks at the impact of deleting one observation from the estimation of MSE is also presented.
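
As an illustration of the leave-one-out idea behind PRESS, here is a minimal sketch that is not taken from the dissertation. It assumes ordinary least squares with an intercept and a full-rank design matrix. It computes PRESS in two ways: by the standard shortcut that divides each residual by one minus its leverage, and by refitting the model with each observation withheld. The function names and the simulated data are hypothetical and chosen only for this example.

```python
import numpy as np

def press_statistic(X, y):
    """PRESS via the shortcut sum((e_i / (1 - h_ii))**2) for an OLS fit."""
    X1 = np.column_stack([np.ones(len(y)), X])      # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # full-data OLS fit
    resid = y - X1 @ beta                           # ordinary residuals e_i
    # Leverages h_ii: diagonal of the hat matrix, from the reduced QR factor
    Q, _ = np.linalg.qr(X1)
    h = np.sum(Q**2, axis=1)
    return float(np.sum((resid / (1.0 - h)) ** 2))

def press_brute_force(X, y):
    """PRESS by refitting n times, each time withholding one observation."""
    X1 = np.column_stack([np.ones(len(y)), X])
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i               # drop observation i
        beta, *_ = np.linalg.lstsq(X1[keep], y[keep], rcond=None)
        total += (y[i] - X1[i] @ beta) ** 2         # squared prediction error
    return float(total)

# Simulated data (assumed for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=30)

press_fast = press_statistic(X, y)
press_slow = press_brute_force(X, y)
```

The two values agree up to floating-point error, so a single fit is enough to obtain every leave-one-out prediction error. Because each leverage lies between 0 and 1, PRESS is never smaller than the ordinary residual sum of squares.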

Access Setting

Dissertation-Open Access
