The international development sector has been grappling with the question of scale for decades, recognizing that our best hope of solving intractable global problems is to scale what works. Given recent paradigm shifts in the development and humanitarian sphere, funding is more uncertain than ever, making it all the more important to direct limited resources toward proven, scalable solutions. Yet the development literature has many more examples of “pilots to nowhere” than solutions that have successfully scaled.
In recent years, as conflict, crisis, and climate change have deepened and intensified the challenges vulnerable populations face, the development sector’s focus on scale has also intensified. Many funders and organizations now actively prioritize scaling successful interventions. For instance, GiveWell and Open Philanthropy have funded CHAI’s Incubator and Evidence Action’s Accelerator programs, both of which aim to identify and scale evidence-based, high-impact interventions through a multi-stage review and testing process. The Global Schools Forum has established the Impact at Scale Labs. The UK-based NGO Elrha has emerged as a thought leader on scale in the humanitarian sector. The Scaling Community of Practice has recently released a rich library of case studies documenting efforts by donors, multilaterals, and implementing organizations to mainstream scaling into their vision, mission, and operations to maximize impact.
In 2022, development and humanitarian organization CARE and evidence-focused decision support organization IDinsight formed a three-year learning partnership to identify, design, and grow CARE’s portfolio solutions with high potential to achieve exponential impact through scale. As we dove into this work, we found ourselves questioning not only what makes an intervention successful, but whether successful always means scalable. The methods we developed to answer these questions have proven invaluable to us and we hope they can benefit the broader sector.
The Ultimate Goal: Sustainable Scale
While a concern with scale is prevalent across the development and humanitarian sphere, there is no agreed-upon definition of the concept. ‘Scaling’ might refer to anything from organizations expanding direct implementation through their own footprint, to organizations transferring ownership of proven interventions to governments, the private sector, or civil society partners – with many variations in between.
We think of the former as replication—when the original implementing organization takes interventions that worked well in one place, adapts them for new contexts, and directly implements them, generally using grant funding. This approach yields linear scale, limited by the implementing organization’s footprint and available funding.
We describe the latter as sustainable scale—the result of “doers and payers at scale” other than the original implementing organization scaling solutions through local systems, using local resources. Sustainable scale aligns with recent shifts in the foreign aid sector, as it supports local system actors—government, private sector, and civil society—to own, implement, fund and grow solutions, gradually reducing dependency on international aid. In this article, when we refer to scale, we mean sustainable scale.
In looking for or designing self-sustaining models that can scale through other actors, the scaling pathways we consider include the following institutional sectors, or collaborative networks that include all of them:
- Government adoption scales by embedding evidence-based interventions in government policy and systems, financed by government budgets, enabling them to reach tens of millions of people – far beyond the operational and financial capacity of any single NGO or social enterprise.
- Private sector scaling relies on the existence of market mechanisms connecting suppliers of scalable goods or services with customers from target groups who are willing and able to pay for them.
- Civil society organizations can also serve as scaling pathways for powerful interventions, likely relying on grant funding.
Scaling through these pathways is pursued by many but achieved by relatively few. We believe this is generally because of the presumption that any intervention with measurable impact, which has expanded to multiple sites, and which donors have funded, is inherently scalable. While such interventions are good places to start the search for scalable models, our work suggests that most legacy interventions designed and tested under multi-year grant funding lack potential for sustainable scale and cannot be effectively retrofitted for scale.
What Can Actually Scale?
Rigorous evidence of positive impact determines whether an intervention should scale. But whether it actually can scale is another question.
Development challenges are typically complex, and characterized by interrelated economic, social, cultural, and environmental factors. To address these complex problems, organizations like CARE have traditionally developed holistic interventions addressing multiple underlying factors. While these complex solutions may be impactful, and replicable when the original organization remains the implementer and uses grant funding, they rarely lend themselves to sustainable scale.
This becomes especially clear when considering who will implement and fund the intervention at scale. For sustainable scaling, an intervention must be owned and implemented by local actors and integrated into local stakeholder ecosystems – in this sense, scale and localization are deeply intertwined. Therefore, the intervention must fit within the current (or possible) capacities, capabilities, and resources of local implementers (government, private sector, civil society) and funders (government budgets, market mechanisms, ODA). This requires the intervention to be simple enough that local actors can deliver it, affordable enough that their available funding can pay for it, and cost-effective enough to be worth scaling.
Most legacy grant-funded programs we assessed for scalability were complex and costly, making it difficult to maintain quality and fidelity at scale. However, simplifying an intervention to align with local capabilities or making it more affordable can result in quality and fidelity losses so significant that the adapted intervention yields reduced or no impact—or in the worst cases, causes harm.
For example, in 2015, CARE collaborated with the Rwanda Men’s Resource Centre and the Rwanda Women’s Network to implement a comprehensive intimate partner violence (IPV) prevention program in Rwanda. A study of the original program found large reductions in reported IPV levels and improvements in related gender norms. However, a subsequent adaptation for scale by the World Bank and Government of Rwanda, which made parts of the intervention simpler and more affordable, resulted in large statistically significant increases in IPV levels and related negative effects on gender norms. A retrospective analysis of these disparate outcomes concluded that “the conflicting evaluation findings are largely due to significant differences in implementation quality.” This example highlights the importance of ensuring that the magnitude of an intervention’s positive impact remains meaningful as it is adapted for sustainable scale. It also points to the importance of establishing and maintaining minimum quality standards to ensure continued impact.
Methods and Tools
With this in mind, we assess whether an intervention is simple and affordable enough to scale and, if not, whether it can be adapted to a scalable “optimal fidelity model” that maintains core impact drivers and delivers meaningful impact. We use three primary methods to assess scalability in terms of simplicity, affordability, and cost-effectiveness:
1. Competitive Landscape Assessment
Our competitive landscape assessment (CLA) is an adaptation of a strategic business analysis commonly used in the private sector. In this assessment, we systematically identify and analyze interventions that target the same outcomes as our intervention of interest, whether or not they use a similar approach. This comparative analysis helps us understand the broader ecosystem of solutions and where our intervention fits within it, in terms of impact, cost effectiveness, and scalability.
The CLA follows a structured, multi-stage process:
First, we create a comprehensive longlist of interventions targeting the same outcome. For each intervention, we collect basic information such as implementing organizations, countries and years of implementation, reach, and approach. We then sort these interventions based on their similarity to the target program, creating an initial landscape map.
Next, we move to shortlisting the interventions by narrowing down the list to 10-15 interventions for deeper consideration. We prioritize interventions with rigorous evidence (experimental or quasi-experimental) of positive impact, as well as those showing signs of scalability. These scalability indicators include government adoption, implementation by multiple organizations across diverse contexts, or a design that is inherently simple and affordable to implement. This stage helps focus our attention on the most promising comparative examples.
Finally, we make a final selection of about 5-6 interventions and conduct an in-depth review of their impacts and cost-effectiveness in consultation with the program team. We ensure diversity in our selection by including different types of interventions or approaches to achieve the same outcome. When similar interventions appear multiple times, we select just one representative example to avoid redundancy and manage the scope of the review.
The completed CLA helps us assess whether existing alternative interventions have stronger evidence of impact and cost-effectiveness, or better scalability potential. This analysis informs whether we should proceed with scaling our intervention, pivot by improving it to incorporate successful features from other models, or partner to support a more promising existing solution.
For instance, CLA proved valuable in assessing the scalability of a CARE intervention focused on improving agricultural yields, income, and women’s agency. This model had been successfully replicated in many contexts using grant funding, and had a rigorous impact evaluation demonstrating improvements to food security, wealth creation, gender equality and women’s empowerment. However, like many similar interventions, it was resource-intensive due to the high costs of in-person training and the difficulty of finding and maintaining skilled trainers. It also relied significantly on uncompensated local volunteer ‘lead farmers’ to provide ongoing support to their peers, a precarious foundation for a scalable model. The CLA revealed that a number of competitors were using existing digital platforms to thoughtfully supplement a leaner set of in-person trainings, which made their solutions less resource-intensive. We also learned that social enterprises selling targeted inputs and services to the same farmers CARE targets (seeds, soil testing, climate projections, tractor-share, etc.) faced challenges in finding last-mile village agents which made it difficult to scale their solutions. This information enabled us to advocate for a more scalable intervention that leveraged digital platforms and partnerships with social businesses to deliver a more sustainable and affordable model.
2. Collaborative Theory of Change Review
We then explore opportunities to simplify an intervention while preserving its impact, engaging program teams and stakeholders in a series of interactive workshops and key informant interviews around the theory of change (ToC). This collaborative process aims to develop a lean optimal fidelity model that balances simplicity (for scalability) with effectiveness.
The review involves three main components:
First, we systematically question whether all activities in the current status-quo theory of change are essential for achieving desired outcomes. We ask program teams to distinguish between non-negotiable components that drive impact and secondary activities that might be desirable but not essential. This critical analysis helps identify opportunities to streamline the program logic and reduce implementation costs without sacrificing core impact drivers.
Second, we explore whether there are existing factors or infrastructure in the enabling environment that could be leveraged to simplify the intervention. For example, can we utilize government community health workers or agricultural extension networks instead of creating parallel structures? Can we integrate the intervention into existing platforms or systems rather than build new ones? Can we leverage technology and leaner systems to simplify operations, rather than simplifying the ToC itself? This analysis helps identify ways to make interventions more affordable and sustainable by utilizing existing resources and infrastructure.
Third, we identify and interrogate assumptions that may hold in pilot implementations but could break down at scale. For instance, an intervention may assume a certain level of implementation quality that is achievable with intensive NGO supervision but may not be realistic to maintain for a government implementing at scale. By proactively identifying these scale-sensitive assumptions, we can address them in developing the optimal fidelity model and intentionally monitor and test them during implementation.
The collaborative ToC review helps envision what a more scalable version of the intervention might look like, and what trade-offs between impact and scalability might be necessary and acceptable. It also builds program team ownership for the scaling journey, which often requires letting go of certain aspects of an intervention they may have been heavily invested in.
It is important to note that, however carefully it is done, simplifying an intervention can have implications for impact. Therefore, any proposed changes should be tested and monitored. For example, A/B testing can help understand how changes in inputs affect outputs and outcomes. Similarly, a robust monitoring system can help track whether important links in the ToC remain intact as the intervention scales. Where larger adjustments to intervention design are made to facilitate scale, or where scaling yields significant variances in implementation fidelity and quality, new rigorous evaluations may be required to ensure that the impacts of the base model are realized at scale.
3. Cost Review
To make informed decisions about scalability, we need to understand not just whether an intervention can be simplified, but also the financial feasibility of scaling. A holistic cost review aims to quantify cost-effectiveness, efficiency, and affordability, providing concrete financial information to guide scaling decisions.
Multi-component interventions with long implementation timelines and intensive human capital requirements are expensive, sometimes costing millions of dollars to implement while reaching relatively few beneficiaries. Though this level of resourcing may be possible for an organization implementing a grant-funded project, for potential “doers and payers at scale,” like governments, implementing such interventions nationally may be financially impossible, even if they recognize the intervention’s value. While economies of scale and integration into existing systems may reduce per capita costs and improve cost-effectiveness, the absolute amount required might still far exceed available sectoral budgets.
A thorough cost review includes several analytical approaches:
We begin by compiling detailed data on actual program implementation costs. We separate one-off costs that will not recur as the intervention scales (such as program design, contextualization, or initial training) from ongoing operational costs that will continue throughout implementation. This differentiation helps us understand the potential cost structure of taking an intervention to scale.
With cost data in hand, we conduct various cost analyses depending on the available evidence. When rigorous impact data exists, we may do a formal cost-effectiveness analysis that calculates the cost per unit of outcome achieved (e.g., cost per percentage point reduction in malnutrition). Common metrics, such as disease incidence (health), child test scores (education), and crop yields (agriculture), allow comparison across different interventions within a sector targeting the same outcomes. Conversion of outcomes to a common unit, such as utility or monetary terms, can build on a cost-effectiveness analysis framework to allow for comparisons across sectors.
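As a minimal illustration of this kind of calculation, the sketch below computes cost per percentage-point reduction per person reached for two interventions. All names and figures here are invented for the example, not data from any real program.

```python
# Illustrative cost-effectiveness comparison (all figures hypothetical).
# Computes the kind of metric a formal cost-effectiveness analysis
# produces: cost per percentage-point reduction in an outcome such as
# malnutrition prevalence, per person reached.

def cost_per_unit_outcome(total_cost, people_reached, effect_pp):
    """Cost per percentage-point of outcome change, per person reached."""
    return total_cost / (people_reached * effect_pp)

interventions = {
    # name: (total cost in USD, people reached, pp reduction in outcome)
    "intervention_a": (1_200_000, 40_000, 3.0),
    "intervention_b": (600_000, 15_000, 5.0),
}

for name, (cost, reach, effect) in interventions.items():
    ratio = cost_per_unit_outcome(cost, reach, effect)
    print(f"{name}: ${ratio:.2f} per person per percentage point")
```

Holding the outcome metric constant across the shortlist is what makes this comparison meaningful; if two interventions report different outcome units, they must first be converted to a common one.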
In cases where impact data is limited, we might analyze per-unit costs based on outputs or intermediate outcomes available through monitoring data. Such a cost-efficiency analysis can be used to compare interventions in a sector that have similar theories of change with comparable outputs. For instance, we might calculate the intervention cost per latrine sold or per savings group established. Further, if there is strong causal evidence linking inputs to outputs and outputs to impact, cost-efficiency analysis can help find the lowest-cost configuration of inputs that supports impact.
We may also develop projections for how costs would scale under different scenarios. This includes modelling how economies of scale might reduce certain costs, how utilizing existing government systems could change the cost structure, and how technological innovations might improve cost-efficiency. Predictive analytics and machine learning can also be useful tools in this effort. Projections are then compared to relevant government sectoral budgets to assess affordability in absolute terms.
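A simple version of such a projection can be sketched as follows. The one-off/recurring split, the economies-of-scale discount, and the sectoral budget figure are all hypothetical assumptions for illustration, not parameters from any real costing exercise.

```python
# Illustrative cost projection at scale (all figures hypothetical).
# One-off costs (design, contextualization, initial training) are
# excluded from the recurring scale budget; variable per-participant
# costs shrink by an assumed economies-of-scale factor.

def projected_annual_cost(participants, fixed_annual, variable_per_person,
                          scale_discount=0.0):
    """Recurring annual cost at a given scale.

    scale_discount: assumed fractional reduction in the variable unit
    cost from economies of scale (e.g. 0.3 = 30% cheaper per person).
    """
    unit_cost = variable_per_person * (1 - scale_discount)
    return fixed_annual + participants * unit_cost

# Pilot: 10,000 participants; national scale-up scenario: 2,000,000.
pilot = projected_annual_cost(10_000, 500_000, 40.0)
national = projected_annual_cost(2_000_000, 500_000, 40.0,
                                 scale_discount=0.3)

sector_budget = 80_000_000  # hypothetical annual sectoral budget (USD)
print(f"pilot: ${pilot:,.0f}; national: ${national:,.0f}")
print(f"share of sector budget at scale: {national / sector_budget:.1%}")
```

Even with a sizeable per-unit discount, the projection makes the affordability question concrete: the absolute amount at national scale is compared directly against the relevant sectoral budget, which is the test described above.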
In some cases, we use a reverse approach, calculating the maximum cost per beneficiary or output that would make our intervention competitive with alternatives. This provides a helpful reference point: keeping one of the two parameters (financial inputs or desired outputs) constant tells us by how much the second parameter would need to change to be competitive with other evidence-backed interventions.
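The reverse calculation is simple arithmetic: holding the intervention's measured effect constant, it solves for the break-even cost per participant at which the intervention matches a benchmark's cost-effectiveness. The benchmark and effect figures below are hypothetical.

```python
# Illustrative "reverse" calculation (hypothetical figures). Holding
# the intervention's effect per participant constant, find the maximum
# cost per participant at which it matches the cost-effectiveness of
# the best evidence-backed alternative.

def max_competitive_cost(own_effect_pp, benchmark_cost_per_pp):
    """Break-even cost per participant: the benchmark's cost per
    percentage point of outcome, scaled by our effect size."""
    return own_effect_pp * benchmark_cost_per_pp

# Benchmark alternative: $8 per percentage-point of outcome per person.
# Our intervention's effect: 3 percentage points per participant.
ceiling = max_competitive_cost(3.0, 8.0)
print(f"break-even cost per participant: ${ceiling:.2f}")
```

Running the same calculation in the other direction (holding cost fixed and solving for the required effect size) gives the complementary reference point mentioned above.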
Throughout the cost review, we consider both quantitative financial data and qualitative information about willingness to pay. For instance, examples of governments having previously paid for similar interventions can provide valuable signals about affordability and priority.
Key Lessons
Together, these three analytical approaches – competitive landscape assessment, collaborative ToC review, and cost review – provide a framework for identifying interventions that are simple, affordable, and cost-effective enough to scale sustainably. They help us understand whether there are viable pathways to develop an optimal fidelity model that balances impact and cost to substantially increase reach.
We believe these methods are valuable for anyone thinking deeply about what should scale and what can scale. The insights they deliver can inform whether organizations should proceed with scaling their own model, pivot by improving their model, partner to scale competitor solutions, or pause on scaling if the intervention cannot be simplified or better alternatives do not exist. In iterating these methods across the many CARE interventions we have analyzed for scalability, we have drawn a few guiding lessons that underpin a pragmatic view of scaling.
Not everything can scale, and not everything has to scale
In our enthusiasm for scalable models with potential for exponential impact, we should not presume that interventions that cannot sustainably scale are bad. Relatively few existing interventions are likely to be simple, affordable, and cost-effective enough to sustainably scale – the limited number of development innovations that have scaled over the last decade is evidence of that. This is okay, because not all issues lend themselves to sustainable scale, and not all interventions need to scale. Some challenges – for instance related to social norms change, systems change, or other ‘public goods’ – may truly require highly contextual, comprehensive programming approaches that will always rely on philanthropic funding. Where policy failures keep governments from engaging, and market failures create limited incentives for the private sector to engage, there will always be gaps that civil society organizations and philanthropic dollars must fill. While reaching impact at scale is a worthy ambition to address key social challenges, sustainable scale should not be the only goal for all interventions, and replication will remain a valuable approach for certain contexts and challenges.
Scale implies trade-offs between impact, reach and cost
In making interventions more scalable, trade-offs are inevitable. A streamlined optimal fidelity model that local systems actors can afford to implement is likely to reach many more people, though potentially with less impact than the resource-intensive pilot that preceded it. However, if the impact per participant is still significant, and delivered at a cost that makes the intervention more scalable and generates impact more cost-effectively, the trade-off is likely a good one. Alternatively, an intervention’s impact may rely so heavily on complex and costly activities that there is no way to simplify it responsibly. This interplay of trade-offs between impact, reach, and cost will be different across interventions and contexts. There is no right answer to the question of how to balance them; the best we can do is accept that there will be trade-offs that should be optimized. To ensure that impact remains meaningful, it is important to define a set of minimum standards for the intervention – retaining core impact drivers while interrogating the most resource-intensive components – and then ensure those standards are maintained as the model evolves. Of course, changes to intervention design ultimately do limit the relevance of existing evidence, so testing assumptions and re-estimating the impact of new optimal fidelity models through rigorous evidence generation is essential.
Designing for scale from the start may be preferable to retrofitting
Many grant-funded multi-year projects that deliver meaningful impact rely on a level of resourcing that is feasible when donors are funding implementation, but that is inherently not suited to sustainable scale. Sustainable scale requires a solution to be integrated into local systems; implemented by local actors; and paid for by local resources. This requires it to fit the current or potential capacities, capabilities, and resources of local systems actors including government, private sector, and civil society organizations. In our experience, trying to “retrofit” an intervention for scale can be costly, time-consuming, and ultimately ineffective. For instance, interventions that are highly sensitive to changes in program design or implementation quality are difficult to adapt for scale. Additionally, teams often struggle with the trade-off between impact and reach that making a legacy intervention scalable requires. Given this reality, it seems prudent to shift focus from modifying complex legacy interventions to designing interventions with scale, localization, and the realities of local doers and payers at scale in mind, from the start.
We have come to believe that “scaling what works” requires that we become more selective about what we try to scale. By being systematic in how we assess scalability, and designing with scale in mind from the start, the development sector can more effectively navigate the challenging journey from successful pilots to sustainable impact at scale.