Date of Award


Degree Name

Doctor of Philosophy



First Advisor

Dr. Jessaca K. Spybrook

Second Advisor

Dr. Ya Zhang

Third Advisor

Dr. Joseph A. Taylor


Keywords

Impact studies, cluster randomized trial, experimental design, moderator effects, multi-level design, educational evaluation


Abstract

In the last two decades, we have seen an increasing number of impact studies in education designed to detect a meaningful average treatment effect, answering the "what works" question. More recently, researchers have begun to consider how to expand the design of impact studies to answer questions about treatment effect heterogeneity across contexts, such as "for whom" and "under what conditions" programs work. Answering these questions is imperative in the quest to make evidence-based, context-relevant decisions that improve educational outcomes for all students.

Many of these impact studies use a cluster randomized trial (CRT). CRTs are prevalent in education because interventions are often implemented in intact groups. In a typical CRT, schools are randomly assigned to treatment conditions. A two-level CRT has students nested within schools; a three-level CRT has teachers nested within schools and students nested within teachers. For these designs, the "for whom" and "under what conditions" questions can be answered with individual-level (i.e., student) and cluster-level (i.e., school, teacher) moderator analyses. Thus, it is important to ensure that these CRTs have sufficient statistical power to answer these questions.
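For concreteness, a cluster-level moderator analysis in a two-level CRT is commonly specified as a multilevel model of the following standard form (a textbook formulation offered here for illustration, not a specification taken from this dissertation):

```latex
Y_{ij} = \gamma_{00} + \gamma_{01} T_j + \gamma_{02} M_j
       + \gamma_{03}\,(T_j \times M_j) + u_j + e_{ij},
\qquad u_j \sim N(0, \tau^2), \quad e_{ij} \sim N(0, \sigma^2)
```

where \(Y_{ij}\) is the outcome for student \(i\) in school \(j\), \(T_j\) is the treatment indicator, \(M_j\) is a school-level moderator, \(\gamma_{03}\) is the differential treatment (moderator) effect, and the intraclass correlation is \(\rho = \tau^2/(\tau^2 + \sigma^2)\).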

Currently, there is abundant empirical work on design parameters and meaningful effect sizes for planning two-level CRTs to detect average treatment effects. However, the empirical base of design parameters (i.e., intraclass correlation coefficients [ICCs] and R² values) for planning three-level CRTs is limited, and the expected effect sizes for moderator effects at all levels are relatively unknown. In this study, I investigate the design parameters for planning three-level and two-level CRTs using datasets from large-scale impact evaluations funded by the Institute of Education Sciences (IES). These impact evaluations examined the effects of mathematics and literacy interventions in different regions of the U.S. I also investigate the feasibility of conducting power analysis and estimating effect sizes for three-level CRTs as two-level CRTs while ignoring the middle level. Further, I present the differential treatment effect sizes associated with moderators at the student, teacher, and school levels, with the hope that they can serve as target effect sizes for designing impact studies and establish a foundation for the field to continue building empirical work on this topic. Lastly, I demonstrate the application of these design parameters, effect sizes, and differential effect sizes to conduct power analysis for CRTs that aim to detect average treatment and moderator effects.
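As a rough illustration of the kind of power analysis described above (a sketch, not the dissertation's own procedure), the minimum detectable effect size (MDES) for a two-level CRT can be computed from the standard variance formula that combines the ICC and R² design parameters. All sample sizes and parameter values below are hypothetical, and a normal approximation stands in for the t-based multiplier typically used when the number of clusters is small:

```python
from statistics import NormalDist

def mdes_two_level_crt(J, n, rho, r2_l2=0.0, r2_l1=0.0,
                       P=0.5, alpha=0.05, power=0.80):
    """Approximate MDES for a two-level CRT with clusters (schools) randomized.

    J      -- number of schools (clusters)
    n      -- students per school
    rho    -- intraclass correlation coefficient (ICC)
    r2_l2  -- proportion of between-school variance explained by covariates
    r2_l1  -- proportion of within-school variance explained by covariates
    P      -- proportion of schools assigned to treatment
    """
    z = NormalDist()
    # Multiplier: two-tailed critical value plus the power quantile
    # (normal approximation; exact formulas use the t distribution
    # with degrees of freedom based on J).
    M = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Variance of the standardized treatment-effect estimate.
    var = (rho * (1 - r2_l2) / (P * (1 - P) * J)
           + (1 - rho) * (1 - r2_l1) / (P * (1 - P) * J * n))
    return M * var ** 0.5

# Hypothetical design: 40 schools, 25 students each, ICC = 0.20,
# and a school-level covariate explaining 50% of between-school variance.
print(round(mdes_two_level_crt(J=40, n=25, rho=0.20, r2_l2=0.5), 3))  # ≈ 0.322
```

The formula makes the abstract's point concrete: the ICC and the R² values substantially drive the detectable effect size, which is why empirical estimates of these design parameters are needed before a CRT is planned.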

Access Setting

Dissertation-Campus Only

Restricted to Campus until