Doctor of Philosophy


Educational Leadership, Research and Technology

Dr. Brooks Applegate

Dr. David Hartmann

Dr. Warren Lacefield


Appropriate scoring methods for tests should be based on theories of the construct domains of those tests (Messick, 1989). This property is called structural fidelity (Loevinger, 1957), a necessary but not sufficient condition for construct validity (Keith & Kranzler, 1999). A situation reaction test (SRT) consists of items whose alternatives are ranked according to subjects’ best judgment. Traditional rank scoring methods assume unidimensionality, implying one scoring key for each item, that is, a single external criterion. This assumption is not appropriate when items elicit multidimensional responses determined by subjects’ best judgments based on possibly multiple internal criteria. The purpose of this study is to determine whether, when a complex ranking item is theoretically governed by multiple traits, multiple trait keys can be identified and validated such that multiple item scores, one for each trait, can be derived from each item.
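The idea of deriving several scores from one ranking item can be illustrated with a minimal sketch. It is not the ORORCS procedure, which is not specified here. It assumes, purely for illustration, that each trait has a known key ranking of the 4 alternatives and that agreement between a subject's ranking and a key is measured with Spearman's rank correlation. The trait names, key rankings, and response below are hypothetical.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two untied rankings of the same alternatives."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("rankings must have the same length (at least 2)")
    # Sum of squared rank differences across alternatives.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def score_item(response, trait_keys):
    """Return one score per trait for a single ranking item.

    Assumption: the score is the rank agreement between the response and
    each trait's key. The study's own scoring rule may differ.
    """
    return {trait: spearman_rho(response, key) for trait, key in trait_keys.items()}


# Hypothetical keys: the rank each trait assigns to alternatives A, B, C, D.
trait_keys = {
    "trait_1": [1, 2, 3, 4],
    "trait_2": [4, 3, 2, 1],
}

# Hypothetical subject response: ranks given to alternatives A, B, C, D.
response = [1, 2, 4, 3]

scores = score_item(response, trait_keys)
print(scores)
```

Under a single-key (unidimensional) scheme only one of these scores would exist per item; with multiple keys the same response yields one score for each trait.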

SRTs with 4 alternatives per item were examined using "optimal ranking order relationship with criterion scores" (ORORCS) in a Monte Carlo simulation under various test conditions: correlation among traits, sample size for calibration and validation, and number of items. The dependent variables were cksk (Fisher’s z between SRT and criterion k scores) and the corresponding confidence interval (CI) widths, obtained using a validation sample. MANOVA results (N = 1000/cell) demonstrated that SRTs can be scored in a multidimensional manner: multiple traits can be measured simultaneously using multiple trait keys to score ranking items.
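For readers unfamiliar with the dependent variables, the following minimal sketch shows the standard Fisher z transformation of a correlation and a normal-theory confidence interval built from it. How the study actually computed its CI widths is not stated here, so the large-sample standard error 1/sqrt(n - 3) and the 95% critical value are assumptions, and the function names are illustrative.

```python
import math


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient r."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate CI for a correlation via Fisher's z.

    Assumes the usual large-sample standard error 1 / sqrt(n - 3);
    z_crit defaults to the two-sided 95% normal critical value.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = fisher_z(r)
    se = 1 / math.sqrt(n - 3)
    # Build the interval on the z scale, then map back to the r scale.
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


def ci_width(r, n):
    """Width of the back-transformed interval on the correlation scale."""
    lower, upper = fisher_ci(r, n)
    return upper - lower


# Illustrative values only: the same correlation at two validation sample sizes.
for n in (25, 100):
    print(n, round(ci_width(0.5, n), 3))
```

Under these assumptions the interval narrows as the validation sample grows, which is the sense in which a larger sample "sharpens" a validity estimate.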

The ability of ORORCS to resolve multiple traits in SRTs decreases as inter-trait correlations increase (p < .0001). For a fixed correlation, however, SRT validity improves and sharpens (tighter CIs) with test length (p < .0001) and with sample size (p < .0001). Interactions were also significant. The results indicate that the SRT ORORCS scoring method has structural fidelity: it can effectively measure subject states with respect to several primary causal factors, even when these factors are somewhat inter-correlated. Sample size, though statistically significant, had little practical effect above 25. Test length is important, but good results do not require long tests; good design matters more. This study demonstrated an alternative scoring model in which items may be scored with multiple keys corresponding to different traits identified using the ORORCS procedure.

