A theory cannot be fully validated unless the original results have been replicated with consistent conclusions. Replications are the strongest means of verifying research findings and knowledge claims. Sciences such as medicine, chemistry, physics, genetics, and biology are considered successful because their knowledge claims are buttressed by a large set of replications of original studies. Unfortunately, many attempts to replicate in the social sciences fail, and thus there is a continuing need for replication studies to confirm facts, expand knowledge, and verify hypotheses. There are two plausible explanations for the failure to replicate in the social sciences. The first is a dissimilarity of research questions between original and replication studies. The second is that the same hypothesis is tested over and over (i.e., replicated) in a manner that neglects the knowledge gained from previous experiments, as when the effect sizes of the original study are not considered in replication studies.

Evaluation, as part of the social sciences, depends on replications to lend credibility to its primary purpose of maintaining and improving services and protecting citizens. To achieve this purpose, evaluation findings should not be based on a single study.

To increase the replicability of original research findings and evaluation studies, this dissertation sought to demonstrate that applying two one-sided tests to evaluate a replication question provides a superior way to conduct replication inquiry, assuming other methodological procedures remain as similar as possible. Furthermore, it explored the impact of heterogeneity of variance and nonorthogonal sample sizes in replication studies. A two-stage Monte Carlo simulation was conducted to investigate conclusion consistency among different replication procedures regarding the repeatability of an observed effect.
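
As an illustration only, the two one-sided tests procedure and a Monte Carlo estimate of its rejection rate might be sketched as follows. This is a minimal sketch under stated assumptions, not the dissertation's simulation design: the function names `tost_welch` and `equivalence_rate`, the use of Welch's unequal-variance t statistic, the equivalence margin, and all sample sizes and standard deviations are hypothetical choices made for the example.

```python
import numpy as np
from scipy import stats


def tost_welch(x, y, margin):
    """Two one-sided Welch t-tests for equivalence of two means.

    H0: |mean(x) - mean(y)| >= margin; H1: |mean(x) - mean(y)| < margin.
    Returns the TOST p-value (the larger of the two one-sided p-values).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    vx = x.var(ddof=1) / x.size
    vy = y.var(ddof=1) / y.size
    se = np.sqrt(vx + vy)
    # Welch-Satterthwaite degrees of freedom (assumed here, allows unequal variances)
    df = (vx + vy) ** 2 / (vx ** 2 / (x.size - 1) + vy ** 2 / (y.size - 1))
    diff = x.mean() - y.mean()
    p_lower = stats.t.sf((diff + margin) / se, df)   # tests H0: diff <= -margin
    p_upper = stats.t.cdf((diff - margin) / se, df)  # tests H0: diff >= +margin
    return max(p_lower, p_upper)


def equivalence_rate(n_x, n_y, sd_x, sd_y, true_diff, margin,
                     reps=2000, alpha=0.05, seed=0):
    """Monte Carlo proportion of samples in which TOST declares equivalence."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.normal(true_diff, sd_x, n_x)
        y = rng.normal(0.0, sd_y, n_y)
        hits += tost_welch(x, y, margin) < alpha
    return hits / reps


if __name__ == "__main__":
    # Hypothetical conditions: equal vs. unequal variances and sample sizes.
    print(equivalence_rate(50, 50, 1.0, 1.0, true_diff=0.0, margin=0.5))
    print(equivalence_rate(80, 20, 1.0, 2.0, true_diff=0.0, margin=0.5))
```

Varying `sd_x`, `sd_y`, `n_x`, and `n_y` in such a sketch gives a rough sense of how unequal variances and unequal sample sizes could shift the rejection rate; the specific conditions studied in the dissertation are not reproduced here.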

Overall, the alternative approach based on two one-sided tests yielded a higher proportion of successful replications than the traditional approach. The presence of heterogeneity of variance made the equivalence test more liberal in rejecting the null hypothesis, whereas nonorthogonal sample sizes made it more conservative. Thus, findings can be confirmed by replications; in their absence, no final statement can be made about any theory.

