Date of Award


Degree Name

Doctor of Philosophy



First Advisor

Dr. Chris L. S. Coryn

Second Advisor

Dr. Arlen R. Gullickson

Third Advisor

Dr. Leslie Cooksy


Metaevaluation is the evaluation of evaluation. Metaevaluation may focus on particular evaluation cases, evaluation systems, or the discipline overall. Leading scholars within the discipline consider metaevaluation to be a professional imperative, demonstrating that evaluation is a reflexive enterprise. Various criteria have been set forth for what constitutes excellence in evaluation. In the context of educational program evaluation, the dominant criteria are the Program Evaluation Standards, developed by the Joint Committee on Standards for Educational Evaluation.

There has been widespread acceptance and application of the Program Evaluation Standards, and their use is advocated by major organizations and several of the leading scholars and textbooks in evaluation. Concurrently, metaevaluation has received increasing attention within the evaluation discipline. Despite these two important developments in the field, there has been little empirical study of the Standards and their role in metaevaluation practice. Their use as a tool for metaevaluation rests on an implicit assumption: that different individuals would reach comparable judgments about a given evaluation when they use the Standards as criteria for assessing its quality. This is a question of interrater reliability among metaevaluators who use the Standards as rating criteria. Since reliability is a prerequisite for validity, this is a critical assumption worthy of empirical investigation.

The legitimacy of this assumption was investigated in this study by having thirty individuals—ten evaluation doctoral students, ten evaluation practitioners, and ten evaluation scholars—rate the same ten program evaluations using the Program Evaluation Standards as criteria. The overall purpose of the study was to assess interrater reliability in this context using multiple measures. The results showed uniformly low interrater reliability, which has direct implications for how metaevaluations should be performed and their results used.

Access Setting

Dissertation-Open Access