Date of Award


Degree Name

Doctor of Philosophy



First Advisor

Dr. E. Brooks Applegate

Second Advisor

Dr. Jianping Shen

Third Advisor

Dr. Ya Zhang


test, measurement, differential item functioning, differential test functioning, norm-referenced tests, criterion-referenced tests


In a compensatory criterion-referenced test (CRT), test developers commonly maximize test information at or near the cut score location, which is believed to produce the highest classification accuracy. However, a recent study showed that the optimal location at which to maximize test information for the highest classification accuracy (CA) and consistency (CC) also depends on factors such as test length, cut score location, the examinee ability distribution, and the test model applied for latent trait estimation (Wyse & Babcock, 2016). Moreover, when unintended secondary dimensions are also measured (nuisance dimensions caused by test construction, test standardization, or test scoring), they interfere with the construct the test is designed to measure. Fortunately, these nuisance factors can be estimated through differential item functioning (DIF) and differential test functioning (DTF; Drasgow, 1984; Cohen & Bolt, 2005). The presence of DIF/DTF in a CRT may cause misclassification errors that decrease CA, thus lowering the validity of the test, and/or decrease the CC of the classification decision, thus lowering the reliability of the test.

Little attention has been paid to the effects of DIF and DTF in CRTs. This study was designed to gain comprehensive knowledge of how classification results are affected when tests vary in conditions such as a) DIF item location: at or near a cut score location; b) form of DIF: uniform or nonuniform; c) number of DIF items: 10%, 20%, or 30% of the items in the test; d) which group the DIF favors: the focal group or both the focal and reference groups; and e) the magnitude of DIF: small, medium, or large. Moreover, there has been little investigation of how item and person parameter estimates are affected by these DIF conditions. A Monte Carlo study was conducted to simulate tests with varying DIF conditions in order to investigate their effects on test classification results and parameter estimates.
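As an illustration only, one replication of this kind of simulation might be sketched as follows. The abstract does not state the measurement model, sample sizes, or parameter distributions, so everything here is an assumption chosen for brevity: a Rasch model, 2,000 examinees per group, item difficulties clustered around the cut score, uniform DIF implemented as a fixed difficulty shift against the focal group, and a raw-score cut placed at the expected score of an examinee located exactly at the ability cut. The function name and its parameters are hypothetical and are not taken from the dissertation.

```python
import numpy as np


def simulate_dif_classification(n_per_group=2000, n_items=20, dif_fraction=0.3,
                                dif_size=0.8, cut_theta=0.0, seed=0):
    """One Monte Carlo replication of a pass/fail test with uniform DIF.

    All defaults are illustrative assumptions, not values from the study.
    """
    rng = np.random.default_rng(seed)

    # Item difficulties clustered near the cut score (assumed design).
    b_reference = rng.normal(cut_theta, 0.5, size=n_items)

    # Uniform DIF: the first n_dif items are harder for the focal group.
    n_dif = int(round(dif_fraction * n_items))
    b_focal = b_reference.copy()
    b_focal[:n_dif] += dif_size

    def rasch_prob(theta, b):
        # Probability of a correct response under the Rasch model.
        return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

    def simulate_responses(theta, b):
        p = rasch_prob(theta, b)
        return (rng.random(p.shape) < p).astype(int)

    # Both groups share the same ability distribution, so any group
    # difference in outcomes is attributable to the DIF items.
    theta_ref = rng.normal(0.0, 1.0, n_per_group)
    theta_foc = rng.normal(0.0, 1.0, n_per_group)
    x_ref = simulate_responses(theta_ref, b_reference)
    x_foc = simulate_responses(theta_foc, b_focal)

    # Raw-score cut: expected score at cut_theta under reference parameters.
    cut_raw = rasch_prob(np.array([cut_theta]), b_reference).sum()

    def summarize(theta, x):
        observed_pass = x.sum(axis=1) >= cut_raw
        true_pass = theta >= cut_theta
        return {
            "pass_rate": float(observed_pass.mean()),
            # Share of examinees whose observed decision matches the
            # decision implied by their true ability.
            "accuracy": float((observed_pass == true_pass).mean()),
            "dif_item_mean": float(x[:, :n_dif].mean()),
        }

    return {"reference": summarize(theta_ref, x_ref),
            "focal": summarize(theta_foc, x_foc)}


if __name__ == "__main__":
    for group, stats in simulate_dif_classification().items():
        print(group, stats)
```

A full study would repeat this over many replications and over the crossed conditions listed above (item location, DIF form, proportion, direction, and magnitude), and would estimate item and person parameters rather than classify on raw scores.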

The results of this study demonstrate that item and person parameter estimates shifted relatively little as DIF contamination became heavier. However, bias in the classification results and in Rudner’s CA and CC indices increased as the number and magnitude of DIF items increased when only the reference or the focal group was favored, especially for DIF items located at the optimal location. This study suggests that test developers and researchers should be alert during CRT development to DIF items at the cut score or optimal location, especially when the DIF favors only the reference or the focal group.

Access Setting

Dissertation-Open Access