Evaluation’s Journey towards the Future, Part 2. How did we get here?

<p><img src="https://www.zendaofir.com/content/images/2025/11/jeremy-bishop-asao6acgrqu-unsplash-900x500-1-300x163.jpg" alt width="987" height="536"></p>

In my previous post I imagined the ancient evolution of evaluation as a journey across uncharted waters - aquifers, canals, streams and tributaries flowing together to become a long, winding river.

In this post, I trace modern evaluation's journey across continents and sectors. It reflects its ancient instincts while continuously adapting to the needs of the time.

Today evaluation is more than a river. It has reached wider, more open waters - an ocean, perhaps. It has flowed across vast landscapes, shaping and being shaped by the complex systems around it. Now it is no longer on a fairly linear, predictable path of evolution, but part of a vast, interconnected sea where it has latent potential to be a central force for understanding, navigating and shaping the future. The challenge now is to navigate the currents and waves as they increase in turbulence and unpredictability.

Let us briefly examine how we arrived here.

The Genesis: Post-War Accountability and Efficiency (1950s-1960s)

In the rubble of WWII, Western governments urgently needed ways to rebuild, and rebuild fast. Evaluation emerged as a systematic exercise, but one focused almost exclusively on accountability - quantitative, top-down, and financial. It served as an audit tool, verifying whether funds were spent properly, and whether activities were completed and outputs achieved.

Countries emerging from colonisation mirrored these approaches, often replicating colonial audit practices and emphasising project completion without much concern for the difference made. Meanwhile, in the private sector 'evaluation' primarily meant performance measurement, centered on financial returns and operational metrics, with formal evaluation methodologies still in their infancy. Evaluation was a tool for control - nothing more, nothing less.

Formalisation and Emerging Critiques (1970s)

The 1970s saw evaluation formalise as a field in its own right. Western governments and multilaterals integrated it into public systems to track programmes and public spending, reinforcing top-down control. Neoliberal ideas were simmering in the background, laying the groundwork for future efficiency obsessions.

Yet even then, critics were pushing back. Paulo Freire's Pedagogy of the Oppressed inspired Participatory Evaluation approaches focused on giving voice to the marginalised, while in the UK Robert Chambers pioneered Participatory Rural Appraisal. Michael Scriven introduced Goal-Free Evaluation, challenging the fixation with pre-set objectives. Stake's Responsive Evaluation and Patton's early work on Utilisation-Focused Evaluation shifted attention to real-world use and relevance. Logic models began making systematic appearances, mapping inputs to outcomes.

Across it all was a clear tension: evaluation could either reinforce bureaucratic power - or disrupt it in service of justice, equity and community agency.

Expansion and the Neoliberal Turn (1980s)

The 1980s brought expansion, but also distortion. Evaluation flourished, but largely in the service of neoliberal policies and efficiency.

Under Reagan, cost-benefit analyses dominated. In Chile under Pinochet, evaluation supported authoritarian control. Governments and organisations institutionalised evaluation further to demonstrate transparency and control.

Aid-driven evaluation started to arrive in the Global South, bringing with it an outsider's gaze, often failing to notice the knowledge already present in local systems. Logical Frameworks or logframes became standard - even though critics were already highlighting their rigidity and inattention to complexity. The Inter-Agency Working Group on Evaluation, precursor of UNEG, formed as a forum for discussion and harmonisation of evaluation practices in the UN system.

At the same time, the field's intellectual undercurrents grew richer. Deliberative Democratic Evaluation and early feminist critiques questioned evaluation's assumed neutrality. Participatory methods expanded across the Global South, infusing evaluation with goals of empowerment, not just measurement.

The field expanded - but remained torn between competing visions: compliance and control on one side; learning and empowerment on the other.

Professionalisation, Values and Innovation (1990s)

The 1990s heralded an era of rapid professionalisation and innovation, still largely in the Global North and among aid agencies. Standards were developed to ensure consistency, led by the AEA's Guiding Principles for Evaluators and the JCSEE Programme Evaluation Standards. The Evaluation Cooperation Group (ECG) emerged to promote harmonisation among the multilateral development banks, and already in 1991, the OECD DAC codified the now-famous criteria: relevance, effectiveness, efficiency, impact and sustainability - criteria that would dominate development evaluation for decades (revised only in 2019, when 'coherence' was added).

New evaluation designs blossomed in the Global North with its abundant resources. Empowerment Evaluation stressed community control; Theory-Driven Evaluation brought programme theory into the heart of evaluation design; and Realist Evaluation reframed questions: what works, for whom, in what contexts and why? Culturally Responsive Evaluation (CRE) pushed evaluators to centre marginalised cultures and voices.

In the private sector, corporate social responsibility (CSR) efforts grew, pushing businesses to adopt quasi-philanthropic evaluation frameworks primarily for reputation management. On the other hand, philanthropy moved away from just compliance reporting, and toward outcome-driven funding, learning from the academic and development sectors. A paradox also emerged: the private sector began mimicking philanthropy's social impact language, while philanthropy adopted corporate-style (e.g. ROI) metrics.

By the end of the decade, evaluation had shifted from a largely technocratic activity to a more contested, diverse and reflexive field, openly grappling with questions of power, participation and justice.

Integration, Mobilisation and Going Global (2000s)

As the new millennium dawned, integration started to characterise evaluation as it became increasingly embedded into programme management and learning frameworks aimed at strengthening evidence-based planning and decision-making. The Millennium Development Goals brought urgency to the need to show results, and results-based management (RBM) became standard in aid-driven programmes as well as among governments who followed the Western example.

New terminologies reflected this shift: from Monitoring and Evaluation (M&E) to Monitoring, Evaluation and Learning (MEL); Monitoring, Evaluation, Research and Learning (MERL) in research-heavy organisations, while NGOs stressed Reflection rather than Research; and Monitoring, Evaluation, Accountability and Learning (MEAL), especially in humanitarian settings.

Meanwhile, Indigenous and Global South practitioners raised their voices more loudly. Indigenous and African-American evaluators and early proponents of Made in Africa Evaluation (MAE) demanded that evaluation fully reflect local cultures, contexts and knowledge systems, not just imported Western frameworks and practices. NGOs like BRAC and the Landless Workers' Movement embedded participatory evaluation into broader struggles for systemic change.

At the same time, Randomised Controlled Trials (RCTs) became fashionable, aggressively promoted, especially in the Global South, by organisations like J-PAL, the World Bank and IPA as the 'gold standard' for evidence, despite growing critiques about their limitations in complex social settings.

Evaluation networks multiplied, engaging especially government, academic, bi- and multilateral, development and humanitarian funders, commissioners and evaluators. The African Evaluation Association (AfrEA) formed in 1999, while ReLAC in Latin America, the International Organisation for Cooperation in Evaluation (IOCE) as the global umbrella for all evaluation associations, and the International Development Evaluation Association (IDEAS) emerged in the early 2000s.

Technological innovations also began reshaping practice. The spread of mobile technology, digital surveys and GIS mapping made more dynamic, granular and spatially-sensitive evaluations possible.

Private sector evaluation continued to focus on outputs, compliance and reputation management, as well as standardised, low-risk reporting reflecting global standards like the Global Reporting Initiative. Meanwhile, evaluation in philanthropy became more sophisticated, with major foundations establishing dedicated impact measurement teams.

In the academic sector evaluation started to gain traction as a field of study, though formal programmes remained limited. The launch of IPDET in 2001 helped bridge capacity gaps internationally.

Complexity, Diversification and Decolonisation (2010s)

The 2010s marked a shift from static, linear models towards complexity-aware, adaptive and equity-driven approaches, reflecting an increasing recognition that change is messy, non-linear, and unpredictable. Developmental Evaluation, Principles-Focused Evaluation and Dynamic Evaluation brought concepts like emergence and feedback loops into standard practice, while adaptive management took hold in development sectors, requiring real-time data, learning and flexibility.

Technological advances raced ahead. Mobile data, remote sensing, big data analytics and real-time feedback systems expanded opportunities but also sharpened digital divides and raised ethical red flags.

Calls for equity, inclusion and decolonisation gained strength. Designs like Transformative Evaluation and movements like Culturally Responsive Evaluation, Indigenous Evaluation and Made in Africa Evaluation pushed harder against Western-centric models, interrogating power structures within evaluation itself, amplified by new networks like EvalMENA and EvalPartners with its subnetworks EvalIndigenous, EvalYouth, EvalSDGs, EvalGender+ and the Global Parliamentarians Forum for Evaluation. The UN's declaration of 2015 as the International Year of Evaluation further intensified these voices.

Meanwhile, the private sector, especially in impact investing and ESG frameworks, deepened its use of evaluation using faster, leaner approaches driven by returns on investment; tensions around accountability versus marketing continued. Practices started to converge with philanthropy and development, for example through the impact measurement framing of the Impact Management Project, although significant divides remained. Foundations moved increasingly towards participatory, equity-driven models, especially as decolonial critiques gained influence.

Academia also evolved, with more evaluation programmes at universities across the Global South and North. The establishment of the Global Evaluation Initiative (GEI) around 2020 signaled new investments in system-wide capacity strengthening, particularly for LICs and LMICs.

By the end of the decade, evaluation was no longer seen merely as a tool for evidence-informed accountability, decision-making and learning, but as a facilitator of strategic learning and inclusion, and as a political, cultural and strategic practice capable of reinforcing inequities or advancing justice depending on how it was framed and enacted.

Today's Frontiers: Polycrisis, Transformation and the AI Disruption (2020s)

In the 2020s the polycrisis, most visible in the COVID-19 pandemic, the climate and environmental crisis, global geopolitical shocks and aid system disruptions, forced us into a new era: one of urgent, messy, adaptive evaluation.

Complexity and systems thinking are by now well advanced, although they remain difficult to apply in practice. Designs and approaches such as Developmental Evaluation, Realist Evaluation, Contribution Analysis, Process Tracing and Outcome Harvesting are widely accepted, and Rapid Evaluation and Real-Time Evaluation (RTE) have become indispensable.

The call for decolonising evaluation has grown impossible to ignore. It's no longer just a critique - it's a demand for radical reframing. Initiatives like EvalIndigenous, EvalSouth and MAE are pushing the field toward systemic changes in power, knowledge hierarchies and framing.

A focus on transformation has accelerated, catalysed by the urgency of the gathering polycrisis and the emerging failure of the Sustainable Development Goals. The lines between evaluation, monitoring, research and learning are increasingly fluid. Initiatives such as Blue Marble Evaluation (BME), the International Evaluation Academy (IEAc), EvalIndigenous, the International Development Evaluation Association (IDEAS), the Climate Investment Funds (CIF), the Global Environment Facility (GEF) and the UNDP Systems Monitoring, Learning and Evaluation (SMLE) Initiative, as well as many prominent individuals, are championing evaluation as an instrument for deep systems change and transformation. This is accompanied by a rise in efforts to include specific issues like futures thinking, the environment, just transition, resilience and regeneration in evaluation design.

Technology continues to reshape practice. AI, machine learning, remote sensing and blockchain are becoming part of evaluation toolkits, bringing new possibilities but also new ethical minefields.

In philanthropy and business, the use of learning-oriented, flexible evaluation models as well as systemic approaches is growing. Impact investors demand better metrics, but often still walk the tightrope between real learning and public relations.

By 2025, evaluation is increasingly being positioned not just as a technical craft, but as a political, ethical and ecological practice, essential for navigating transformation - to be guided by humility, justice and adaptability.

This era demands that we rethink, reframe and reform what we do, how, when and for and with whom. Meeting this challenge remains an urgent, unfinished task.