Date of Award
Doctor of Philosophy
Dr. Mary Anne Bunda
Dr. Jianping Shen
Dr. Michael Stoline
This study addressed the problem of the probable effectiveness of the Pearson correlation coefficient (r) as an estimator of moderate or strong population correlation (rho) when that estimate is based on small sample data which contains an outlier. In such a situation, three components contribute to the size of a sample correlation coefficient, and so to the subsequent effectiveness of the resulting estimation decision. These components are 1) rho, 2) sample size, and 3) outlier. Considered in this study were: two conditions of rho (.5 and .8), three sample sizes (10, 30 and 50) and two outlier conditions (without outlier and with outlier).
The investigation was conducted by simulating the distribution of Pearson r’s under each condition and observing its behavior. Each sample distribution was characterized by values of central tendency, dispersion and skew. Each distribution was also summarized in terms of a hit rate which indicated the percentage of times the confidence interval about its sample r’s contained the known population rho. The nominal expected hit rate was 95%.
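The simulation design described above can be illustrated with a minimal sketch. It is not the procedure used in the study: the abstract does not say how the outlier was generated or how the confidence interval was computed, so the sketch assumes bivariate normal data, a Fisher z-transformed 95% interval, and an outlier created by overwriting one observation with a fixed point. The function name simulate_hit_rate, the outlier value (3, -3) and the replication count are hypothetical choices made only for illustration.

```python
import numpy as np

def simulate_hit_rate(rho, n, reps=2000, outlier=None, seed=0):
    """Return (mean of sample r, hit rate) over `reps` simulated samples.

    Assumptions (not taken from the study): bivariate normal data with unit
    variances, a Fisher z 95% confidence interval, and an outlier introduced
    by replacing the first observation with a fixed (x, y) point.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    z_crit = 1.959964  # two-sided 95% standard normal quantile
    rs = np.empty(reps)
    hits = 0
    for i in range(reps):
        xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        if outlier is not None:
            xy[0] = outlier  # hypothetical outlier mechanism
        r = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]
        rs[i] = r
        # Fisher z interval, back-transformed to the r scale
        z = np.arctanh(r)
        half_width = z_crit / np.sqrt(n - 3)
        lower, upper = np.tanh(z - half_width), np.tanh(z + half_width)
        hits += int(lower <= rho <= upper)
    return rs.mean(), hits / reps

if __name__ == "__main__":
    for rho in (0.5, 0.8):
        for n in (10, 30, 50):
            clean = simulate_hit_rate(rho, n)
            contaminated = simulate_hit_rate(rho, n, outlier=(3.0, -3.0))
            print(rho, n, clean, contaminated)
```

Because the outlier mechanism and interval method are assumed, the hit rates this sketch prints should not be expected to reproduce the percentages reported in the study; it only shows how a hit rate of this kind can be estimated by simulation.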
Results indicated that in the condition without outlier, measures of central tendency were close to rho across all sample sizes and for both conditions of rho. Hit rate was very close to the expected 95% across all study conditions.
In the condition with outlier, measures of central tendency were not close to rho, and were farther from rho as sample size became smaller. Hit rate was considerably smaller than the expected 95%, particularly when rho was .5 rather than .8. When rho was .5, the hit rate was 73% at sample size 10, 83% at sample size 30 and 87% at sample size 50.
When rho was .8, the hit rate was 84% at sample size 10, 90% at sample size 30, and 92% at sample size 50. The implication of these results for the practical investigator is that if an outlier appears in small study data, the risk of making an incorrect decision is substantially increased, particularly when rho is moderate.
Suchowski, Maria A., "An Analysis of the Impact of an Outlier on Correlation Coefficients Across Small Sample Data Where RHO is Non-Zero" (2001). Dissertations. 1348.