Imagine a promising new reading program touted by research as delivering significant gains. A district invests heavily, only to find that the hoped-for results don't materialize. Unfortunately, this scenario is more common than we'd like to admit in education. Practitioners are often left navigating a confusing landscape of studies, unsure which interventions will truly make a difference for their students.
Why does this happen? A growing number of researchers across the social sciences, including education, are grappling with a fundamental challenge: how we interpret research results. The way we have traditionally analyzed and reported study findings has critical flaws that can lead us astray.
One major issue is that promising findings are often difficult to replicate. Just as psychology has faced scrutiny over studies that could not be reproduced, education research struggles with replication as well: exciting results from one study often don't hold up when others try to reproduce them (Gehlbach & Robinson, 2021; Open Science Collaboration, 2015).
Furthermore, we rarely hear about the interventions that don't work. Negative or null findings often go unpublished, preventing the field from learning valuable lessons about what doesn't succeed in different contexts. Practitioners are left with mixed messages: measuring how much students grew is straightforward, but determining which interventions were responsible for that growth is not. Without this complete picture, districts can invest heavily in interventions on the strength of a handful of positive studies, only to see those investments fall short.
Think again of the resources sometimes poured into reading programs that don't align with the well-established science of reading—the consequences for students can be profound. Fortunately, there are ways to navigate this more effectively. But it requires all of us—researchers, practitioners, and policymakers—to become more critical consumers of research.
The Problem with "Statistical Significance" and Interpreting Effects
At the heart of the issue lie some common research practices related to how we determine whether a finding is "real." Many are familiar with the idea of "statistical significance," often presented as a way to know whether an effect is likely due to the intervention rather than chance. In practice, if the probability of seeing a result at least as large when the intervention truly has no effect is less than 5%, the result is deemed statistically significant.
However, researchers know that this widely used framework has limitations that can lead to biased or inaccurate interpretations. For instance, a statistically significant result is more likely to be misleading when the true impact of an intervention is small and the study involves relatively few participants (Sims et al., 2022). These conditions are not uncommon in the complex world of education research. One analysis estimated that the effects reported in statistically significant education studies can be inflated by 52% or more (Sims et al., 2022).
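To see how this inflation arises, consider the simple simulation sketched below. It uses illustrative numbers of our own choosing (not drawn from Sims et al.): when the true effect is small and the sample is modest, only unusually large estimates clear the significance bar, so the estimates that get reported skew high.

```python
# A minimal simulation sketch with hypothetical numbers (not drawn from
# Sims et al.). True effect: 0.10 SD; 100 students per study arm.
import numpy as np

rng = np.random.default_rng(0)
true_effect, n_per_arm, n_trials = 0.10, 100, 20_000
se = np.sqrt(2 / n_per_arm)  # standard error of a difference in means (outcome SD = 1)

reported_wins = []
for _ in range(n_trials):
    control = rng.normal(0.0, 1.0, n_per_arm)
    treatment = rng.normal(true_effect, 1.0, n_per_arm)
    estimate = treatment.mean() - control.mean()
    # Keep only the "successes": positive estimates that clear the 5% significance bar
    if estimate > 0 and abs(estimate) / se > 1.96:
        reported_wins.append(estimate)

print(f"True effect:                    {true_effect:.2f} SD")
print(f"Average 'significant' estimate: {np.mean(reported_wins):.2f} SD")
# In this setup the significant estimates average several times the true
# effect, which is the inflation pattern described above.
```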
In education, we often see these study results translated into practical terms, like "months of learning gained," to inform important decisions. We might also mistakenly interpret statistical significance as proof that the reported effect size is accurate.
In reality, an "effect size" is our best estimate of an intervention's impact, since we can never directly observe what would have happened without it. Statistical significance, on the other hand, primarily tells us whether the estimated effect is likely different from zero, not how large or how precisely estimated that effect is (Deke, 2023). These misunderstandings, combined with the inherent complexity of schools and the challenges of small studies, contribute to often exaggerated claims about what works.
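To make the distinction concrete, the short sketch below (again with hypothetical numbers) computes both a p-value and a 95% confidence interval for the same estimate: the result clears the significance threshold, yet the interval still spans everything from a negligible effect to a large one.

```python
# A small illustration with hypothetical numbers: a "significant" estimate
# whose confidence interval still spans a wide range of practical impacts.
from scipy import stats

estimate, se = 0.14, 0.065                       # hypothetical effect and standard error
z = estimate / se
p_value = 2 * (1 - stats.norm.cdf(abs(z)))       # two-sided test against zero
ci_low, ci_high = estimate - 1.96 * se, estimate + 1.96 * se

print(f"p-value: {p_value:.3f}")                 # about 0.03: "statistically significant"
print(f"95% CI:  [{ci_low:.2f}, {ci_high:.2f}]") # about [0.01, 0.27]: negligible to large
```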
The consequence is that much of what we believe about the impact of a curriculum, policy, or practice may not hold true in the practical realities of our schools. If we don't address these fundamental issues, the credibility of education research is at risk, and decisions may be made without a reliable evidence base.
A More Reliable Path Forward
The good news is that solutions exist, and they can be communicated in ways that are more intuitive for practitioners.
Newer research methods are gaining traction that aim to correct for these biases. One approach, Bayesian analysis, uses probability models that incorporate existing knowledge from prior research, giving us a more realistic picture of an intervention's likely impact (Deke et al., 2022).
Imagine a study on a new science of reading partnership. Instead of a headline declaring, "The intervention had a significant, positive effect on student achievement of 0.14 standard deviations," a more informative statement might be: "We found a 97% chance the program had a positive effect on student test score performance, and there is a nearly 70% chance that effect was larger than 0.10 standard deviations." This is the kind of insight we gained from the first year of Leading Educators’ science of reading partnership with Harlem Community School District 5.
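For readers curious about the mechanics, the sketch below shows one simple way such probability statements can be produced. It uses a basic normal prior and normal likelihood with hypothetical numbers; it is our own illustration, not the actual model behind the Harlem results.

```python
# A minimal Bayesian sketch with hypothetical numbers -- our own illustration,
# not the model used in the partnership study. A normal prior (informed by
# typical education effect sizes) is combined with the study's estimate.
from scipy import stats

prior_mean, prior_sd = 0.0, 0.15    # prior: most education effects are modest
estimate, se = 0.14, 0.07           # hypothetical study estimate and standard error

# Conjugate normal-normal update for the posterior over the true effect
post_var = 1 / (1 / prior_sd**2 + 1 / se**2)
post_mean = post_var * (prior_mean / prior_sd**2 + estimate / se**2)
posterior = stats.norm(post_mean, post_var**0.5)

print(f"Posterior mean effect: {post_mean:.2f} SD")
print(f"P(effect > 0):         {1 - posterior.cdf(0.0):.0%}")
print(f"P(effect > 0.10 SD):   {1 - posterior.cdf(0.10):.0%}")
```

In frameworks like the one described by Deke et al. (2022), the prior is grounded in the distribution of effects from past education studies rather than picked by hand, but the logic is the same: the study's data update what we already know into a probability statement about the true effect.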
While the difference in the statement might seem subtle, the implications over many studies are significant. It could lead to fewer failed replications and more realistic expectations about the potential of different educational approaches, ultimately leading to better cost-benefit analyses.
This shift requires researchers to communicate their findings in accessible language, focusing on probabilities and the likely range of impact. Consumers of research, in turn, may need to become more comfortable with these more nuanced interpretations of evidence. While effect sizes and probabilities offer a more reliable understanding, they aren't as straightforward as a simple percentage increase. This learning curve will be worthwhile.
What Decision Makers Can Do Now
If you are a funder, district leader, or policymaker, you have a crucial role to play in raising the bar for research quality and the use of evidence:
● Demand multiple studies: Ask vendors and partners for results from several impact studies, not just a single promising one.
● Consider the context: Inquire about who was included in the study and who was not.
● Be skeptical of large claims: Exercise caution with interventions reporting very large effect sizes (remember, a moderate effect is generally between 0.05 and 0.20 standard deviations) and ask how those estimates were calculated.
● Prioritize transparency: Work with partners who are open about the limitations of their data and can explain their impact in ways that resonate with the realities of learning.
This transformation won't happen overnight. However, as more decision-makers demand credible, well-grounded evidence, we can collectively improve how we invest in our students' success.
Our schools and students have the potential for remarkable growth when provided with the right tools and support. In the Harlem partnership mentioned earlier, we saw the percentage of students at or above grade level on i-Ready double in just one year. Yet, our confidence in the specific contribution of each individual intervention to that growth needs to be framed with nuance.
By embracing a more sophisticated understanding of research evidence, we can avoid placing excessive faith in some interventions while potentially overlooking others that could yield positive outcomes for our students.
About the author
Rebecca Taylor-Perryman is the Managing Director of Research and Data at Leading Educators.