Date of Award


Degree Name

Doctor of Philosophy



First Advisor

Dr. Jessaca K. Spybrook

Second Advisor

Dr. Chris L. S. Coryn

Third Advisor

Dr. Joseph A. Taylor


Cluster-randomized trials, intraclass correlation, design parameters, relative efficiency, optimal design, science education evaluation


The purpose of this three-essay dissertation is to provide practical guidance to evaluators planning cluster-randomized trials (CRTs) of science achievement. In an educational setting, interventions are often administered at the cluster level, while outcomes are typically measured at the student level through standardized achievement testing. When evaluating an intervention, a CRT is appropriate because it allows for treatment to be modeled at a different level than the unit of analysis, and properly accounts for the violation of independence that occurs due to nesting. Accurately designing a CRT involves estimating variance parameters (i.e., intraclass correlations [ICCs] and percent of variance explained [R²] values). Prior efforts to improve the design of CRTs in education have primarily been limited to mathematics and reading disciplines, and the applicability of their findings to studies of science achievement is unknown.
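
As a rough illustration of how these design parameters enter the planning of a two-level CRT, the sketch below computes an ICC from variance components, the familiar design effect, and the variance of a standardized treatment-effect estimate under a balanced design. The function names and all numeric inputs are hypothetical and chosen only for illustration; they are not estimates from the dissertation.

```python
def icc_two_level(tau2: float, sigma2: float) -> float:
    """Share of total outcome variance that lies between clusters."""
    return tau2 / (tau2 + sigma2)


def design_effect(icc: float, cluster_size: int) -> float:
    """Variance inflation relative to simple random sampling of students."""
    return 1.0 + (cluster_size - 1) * icc


def effect_variance(icc: float, n: int, J: int,
                    r2_cluster: float = 0.0, r2_student: float = 0.0) -> float:
    """Variance of the standardized treatment effect in a balanced
    two-level CRT with J clusters of n students each.

    r2_cluster and r2_student are the proportions of cluster-level and
    student-level variance explained by covariates (0 means no covariate).
    """
    between = (1.0 - r2_cluster) * icc
    within = (1.0 - r2_student) * (1.0 - icc) / n
    return 4.0 * (between + within) / J


# Illustrative (made-up) inputs: 40 schools, 20 students per school.
rho = icc_two_level(tau2=0.2, sigma2=0.8)
print("ICC:", rho)
print("Design effect:", design_effect(rho, cluster_size=20))
print("Variance, no covariate:", effect_variance(rho, n=20, J=40))
print("Variance, cluster-level pretest:",
      effect_variance(rho, n=20, J=40, r2_cluster=0.7))
```

Under these assumptions, a larger ICC raises the variance of the effect estimate, while a covariate with a high cluster-level R² lowers it, which is why both parameters must be chosen carefully at the design stage.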

I use three essays to present decision scenarios an evaluator faces when designing a CRT. In the first essay, the evaluator has limited information to inform the selection of ICCs for a three-level CRT. I use surface plots of relative efficiency to explore the robustness of an optimal design to misspecification of the ICCs. Findings suggest that three-level CRTs are quite robust to misspecification of either or both ICCs. In the second essay, I resolve the challenge of limited information by using five years of achievement data from Texas to estimate ICCs for two- and three-level CRTs. I then analyze the decision of which covariate to include by estimating and evaluating R² values for demographic and pretest covariates. Findings suggest that ICCs are larger in science than in mathematics and reading, and that when a one-year lagged student-level science pretest is unavailable, a one-year lagged school-level science pretest is preferred. In the final essay, I recognize that a multi-site CRT (MSCRT) design is often more appropriate than a CRT, and that the evaluator must once again select appropriate values for the variance design parameters. Using the Texas data, I empirically estimate a distribution of within-district ICCs and show that the number of districts in the MSCRT can affect the average within-district ICC value.

Access Setting

Dissertation-Open Access