An evaluation is the systematic collection and analysis of data in order to assess the strengths and weaknesses of a project, programme or policy. The aim is to determine the relevance and fulfilment of objectives, efficiency, effectiveness, impact and sustainability. An evaluation can be performed before the project has started (ex ante), while it is ongoing, or when it is completed (ex post).
In general, evaluations:
- Assess the value of something, in addition to describing it.
- Require a systematic approach and methods with in-depth descriptions.
- Imply that the evaluator stands outside what is evaluated, at a greater or lesser distance.
- Assume that there is a factual basis, or other sufficiently specific information, that defines what should be measured and what is good or bad.
Evaluations assess whether we are doing the right things, not only whether we are doing things right (as planned).
Most commonly, evaluations are performed on public sector matters. Evaluations of private sector initiatives are currently limited and are mostly linked with Corporate Social Responsibility initiatives or programmes.
Evaluation as a discipline was developed during the post-World War II era in the US, where innovations within the education system (such as new curricula) and anti-poverty programmes were evaluated. The discipline then spread to Europe and Canada. Historically, evaluations have been used more frequently in periods of reform.
Today, an increasing proportion of social research projects are evaluations. Evaluations examine the effectiveness of projects, programmes or policies. This is valuable information both in terms of decision-making on continuation of initiatives (including funding) and organisational learning.
Ipsos Point Of View
Ipsos recognises that high quality evaluations need a multi-disciplinary team with expertise in the policy area, in economics and in social research. For example, you cannot evaluate a labour market policy without understanding the background in the specific policy area, what has been tried before and in other geographies, labour market economics, and the experiences and outcomes of the policy's beneficiaries.
The need to demonstrate the effectiveness of policies and initiatives (to justify continuation/roll-out and to secure funding) means that an increasing proportion of social research projects are evaluations. Evaluation is also key to organisational learning and the provision of appropriate, efficient and well-targeted programmes of activity.
The table below provides an illustrative perspective on evaluation taxonomy.
The first stage for any evaluation is the design of the evaluation framework. This should set out the vision of success for a programme or policy, and how the evaluation team will seek to measure whether the intervention has achieved the desired outcomes and whether there have been any unintended consequences. There are many formats for evaluation frameworks, and an illustrative representation, such as a 'theory of change' or a logic model, is often used to summarise the 'vision of success'. A theory of change (ToC) outlines what should happen if the theory supporting an intervention is correct. It is a systematic and well-thought-out study of the complex links between processes, activities, outcomes and context, and of the changes that occur in the short, medium and long term.
Building the ToC from the bottom up through extensive stakeholder engagement is a core strand of the approach. Such an approach is usually required to build a sense of ownership and consensus on what changes should happen in the short, medium and long term. The ToC should be evidence-based and able to distinguish change at all levels, culminating in a clear set of outcomes. To provide an effective means of framing the evaluation research, the ToC should be plausible, achievable, testable and meaningful. Various tools can be used, including logic chain analysis (see example below), the logical framework approach, causal mapping and problem tree analysis. A ToC provides an explicit understanding of the assumptions (implicit and otherwise) and rationales underpinning complex interventions, providing a framework for generating hypotheses and questions for qualitative and quantitative evaluation research. Consequently, it helps determine what should be measured and which cause-and-effect relationships need to be explored and understood, enabling effective targeting of scarce evaluation resources.
An evaluation framework is a document which the client should sign off to demonstrate that they understand and agree with how their programme/policy will be independently evaluated and judged. This is a key difference between an evaluation and a social research project. It should cover the following:
- Background and strategic context: detail the background to the policy intervention and the reason it is required, with (at a minimum) a short literature review of what has already been tried, previously or elsewhere, to resolve the market or equity failure.
- Impact evaluation framework:
  - Identify the programme-level inputs, activities, outputs and the short- and medium-term outcomes which are expected to lead to the longer-term outcomes, both diagrammatically and in narrative form.
  - Discuss any limitations or factors which hinder the ability to measure the net impact of a policy/programme. These might include the way in which the intervention was designed (e.g. it was universally implemented, so there is no 'untreated group of beneficiaries' to provide a comparison group), or external factors which will also influence the outcomes achieved but are not measurable.
  - Identify a preferred impact evaluation approach (more below), which might include randomised controlled trials, quasi-experimental designs or non-experimental approaches such as contribution analysis and process tracing.
  - Detail the evaluation questions which will be answered through the impact evaluation and where the evidence will come from (stakeholders, methods and frequency of data collection). A good question matrix will also explain the extent to which each evidence source will contribute to answering the evaluation question. It is good practice to have more than one evidence source for each question so that findings can be triangulated.
- Process evaluation framework:
  - Provide illustrative and narrative detail of the implementation process from beginning to end, identifying the key stakeholders involved. This information will come from a desk-based review and a number of familiarisation sessions with people involved in implementing the programme/policy.
  - Discuss any limitations or factors which hinder the ability to understand the effectiveness of the intervention's implementation. These might include, for example, access to key monitoring information, access to the delivery agent's staff, or the budget available to gather views from participants.
  - Identify a preferred process evaluation approach, which might include a review of key performance or management information, consultations with key stakeholders (including those delivering a programme), mystery shopping approaches, observational research, and focus groups or surveys with beneficiaries/users.
  - Detail the evaluation questions which will be answered through the process evaluation and where the evidence will come from (stakeholders, methods and frequency of data collection). The same good practice points apply as for the impact evaluation framework.
- Economic evaluation framework:
  - Set out the extent to which a full economic evaluation will be undertaken to review the economy, efficiency and cost-effectiveness of the intervention, and how relative comparisons will be made to assess value for money.
  - Develop a cost/benefit account table which outlines the full costs and benefits, how each will be measured and where the evidence will come from.
- Evidence collection and analysis programme:
  - Detail the timings and data collection approaches which the evaluation team will use to answer the various evaluation questions. This element is more like a detailed research programme.
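As an illustration of the triangulation point made for the question matrix, the sketch below holds a matrix as a small data structure and flags questions that rely on fewer than two evidence sources. The class name, field names, questions and sources are all hypothetical and chosen only for this example; a real framework would normally hold much more detail per question.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationQuestion:
    """One row of a hypothetical evaluation question matrix."""
    question: str
    strand: str  # "impact", "process" or "economic"
    evidence_sources: list[str] = field(default_factory=list)


def needs_more_evidence(matrix, minimum_sources=2):
    """Return the questions that cannot yet be triangulated."""
    return [
        row.question
        for row in matrix
        if len(set(row.evidence_sources)) < minimum_sources
    ]


# Illustrative rows only; the questions and sources are invented.
matrix = [
    EvaluationQuestion(
        "Did participants' employment outcomes improve?",
        "impact",
        ["beneficiary survey", "monitoring data"],
    ),
    EvaluationQuestion(
        "Was the programme delivered as planned?",
        "process",
        ["stakeholder consultations"],
    ),
]

# Lists the questions that currently have a single evidence source.
print(needs_more_evidence(matrix))
```

Here only the process question is flagged, because it has a single evidence source.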
The next stage is gathering and analysing evidence in order to answer the evaluation questions. We typically use a mixed-method approach to cover the wide range of evidence that is normally required. Depending on the purpose of the evaluation and the nature of the initiative being evaluated, we might use several of the following methods: qualitative and quantitative research (with participants/service users, staff involved in delivery, managers and key stakeholders); observational research; literature reviews; analysis of monitoring data; and analysis of financial data.
Impact evaluation is defined as an assessment of the impact an intervention has in terms of benefits delivered. This requires a comparison between what actually happened (the factual) and what would have happened in the absence of the intervention (the counterfactual). The fundamental problem that all impact assessments face is that we cannot observe what would have happened to those affected by the intervention if the intervention had not happened. Therefore, impact evaluation requires a rigorous approach to establishing the counterfactual. The most robust way to do this is to compare the outcomes achieved by those who benefited from an intervention with the outcomes achieved by a group of people who are similar in every way to the beneficiaries, except that they were not subject to the intervention being evaluated, i.e. a comparison or control group.
It is usual that project beneficiaries are selected in some way either through specific targeting or through some form of self-selection. The selection process implies that beneficiaries are not selected at random, which means that any comparison group should also not be selected at random. Therefore, the comparison group should be drawn from a population that has the same characteristics as the beneficiary group. If these characteristics are readily observed, then this problem is relatively easy to resolve. However, there are characteristics of people that are not easily observed that significantly affect the achievement of the project's outcomes. Consequently, if the people in the comparison group do not possess the same unobserved characteristics then estimates of the project's impact are likely to be significantly biased because the comparison is no longer like-for-like – in other words the evaluation would be subject to selection bias.
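The counterfactual and selection bias arguments can be written compactly in potential-outcomes notation. The notation is an assumption of this sketch and is not used elsewhere in the text: D = 1 marks beneficiaries and D = 0 the comparison group, Y_1 is a person's outcome with the intervention and Y_0 their outcome without it. The block uses `align*` and therefore assumes the amsmath package.

```latex
\begin{align*}
\underbrace{E[Y \mid D=1] - E[Y \mid D=0]}_{\text{observed difference}}
  &= \underbrace{E[Y_1 \mid D=1] - E[Y_0 \mid D=1]}_{\text{impact on beneficiaries}} \\
  &\quad + \underbrace{E[Y_0 \mid D=1] - E[Y_0 \mid D=0]}_{\text{selection bias}}
\end{align*}
```

The term E[Y_0 | D=1] is the unobservable counterfactual. The selection bias term is zero only when the comparison group would, without the intervention, have had the same average outcome as the beneficiaries.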
The 'Gold Standard' approach to minimising the risk of selection bias is to use randomised controlled trials. This approach requires that the population eligible to participate in the project be identified and then a random sample of that population be selected to benefit from the project's activities. Those who were not selected then represent a valid comparison group because they are similar in every way to the project beneficiary group, the only difference being that they were not randomly selected to participate.
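A minimal sketch of the random selection step follows, using only the Python standard library. The function name, the population of 20 IDs and the seed are hypothetical. A real trial would also need to handle consent, stratification and drop-out, none of which is shown here.

```python
import random


def randomise(eligible_ids, n_treated, seed=None):
    """Randomly split an eligible population into treatment and control."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    treated = set(rng.sample(list(eligible_ids), n_treated))
    control = [pid for pid in eligible_ids if pid not in treated]
    return sorted(treated), control


eligible = list(range(1, 21))  # 20 hypothetical eligible people
treatment_group, control_group = randomise(eligible, n_treated=10, seed=42)
print(len(treatment_group), len(control_group))
```

Because every eligible person has the same chance of selection, the two groups should differ only by chance, in unobserved as well as observed characteristics.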
Whilst randomised controlled trials present the best approach to estimating impact, there are limitations to their application. Most importantly, randomisation needs to be built into the design of a project from the very start. If this is not done, the opportunity to apply this approach robustly has been lost. Additionally, randomisation requires that the evaluator maintains a degree of quality control over the execution of the experimental approach throughout the lifetime of the project, which may not be possible. These and other issues, such as political barriers, mean that randomisation is applied to only a small proportion of projects.
Therefore, for many evaluations the problem of selection bias persists. It can be managed through non-experimental techniques such as propensity score matching, which identifies a comparison group with the same observable characteristics as those in the beneficiary group. It does this by using statistical modelling to estimate the probability of someone participating in the project (i.e. their propensity to participate). This propensity score is then used to match beneficiaries with non-beneficiaries who have similar scores, enabling comparisons that estimate the additional impacts of the project.
A difference-in-differences approach, sometimes called 'double differencing', compares the changes in outcomes over time experienced by the beneficiary group and the comparison group. Comparing changes from before to after the intervention, rather than the absolute difference in the outcome itself, helps to address the problem of selection bias because any fixed differences between the two groups cancel out. However, this method still relies upon the assumption that external influences on the outcome being evaluated were the same for the treatment and comparison groups.
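The two techniques can be sketched together on toy data. The sketch assumes that propensity scores have already been estimated (for example with a logistic regression, which is not shown) and uses simple nearest-neighbour matching with replacement, which is only one of several possible matching rules. All IDs, scores and outcomes are invented for illustration.

```python
from statistics import mean


def nearest_neighbour_match(treated_scores, comparison_scores):
    """Match each beneficiary to the comparison unit with the closest score.

    Both arguments map a person ID to an already-estimated propensity score.
    Matching is done with replacement, so a comparison unit may be reused.
    """
    matches = {}
    for t_id, t_score in treated_scores.items():
        matches[t_id] = min(
            comparison_scores,
            key=lambda c_id: abs(comparison_scores[c_id] - t_score),
        )
    return matches


def difference_in_differences(treated_pre, treated_post, comp_pre, comp_post):
    """Change in the beneficiary group minus change in the comparison group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    comparison_change = mean(comp_post) - mean(comp_pre)
    return treated_change - comparison_change


# Hypothetical propensity scores and (pre, post) outcomes.
treated_scores = {"T1": 0.70, "T2": 0.40}
comparison_scores = {"C1": 0.72, "C2": 0.35, "C3": 0.10}
outcomes = {
    "T1": (10, 16), "T2": (8, 13),
    "C1": (11, 13), "C2": (7, 10), "C3": (5, 6),
}

matches = nearest_neighbour_match(treated_scores, comparison_scores)
matched_ids = list(matches.values())

did = difference_in_differences(
    [outcomes[i][0] for i in treated_scores],
    [outcomes[i][1] for i in treated_scores],
    [outcomes[i][0] for i in matched_ids],
    [outcomes[i][1] for i in matched_ids],
)
print(matches, did)
```

In this toy data the beneficiaries improve by 5.5 on average and their matched comparisons by 2.5, so the difference-in-differences estimate is 3.0. The unmatched unit C3 plays no part in the estimate.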
Multivariate analysis is another way of assessing the links between an outcome and the different factors that may have influenced changes in that outcome, particularly factors influenced through the intervention of the project. Regression-based statistical modelling is used to hold or fix (control for) all other influences or factors apart from one that is of interest. This enables the project-specific causal factor to be isolated in order to estimate its effect on the project's outcome compared to all other potential influences. This approach enables analysis of the effect of each potential factor to inform the assessment of whether or not the outcomes realised were due to the intervention of the project or other prevailing factors either inherent to the beneficiary or the project's external environment.
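The idea of controlling for other factors can be shown with a small ordinary least squares regression, written here with the standard library only so that it is self-contained. In practice a statistics package would be used. The data are invented and noise-free: the outcome is constructed as 2 + 3 × treated + 0.5 × prior score, and beneficiaries are given higher prior scores so that a naive comparison is misleading.

```python
from statistics import mean


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [
        [sum(x[i] * x[j] for x in rows) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical data: (treated flag, prior score) for six people.
people = [(1, 10), (1, 12), (1, 14), (0, 4), (0, 6), (0, 8)]
outcome = [2 + 3 * t + 0.5 * prior for t, prior in people]

# Naive comparison of group means ignores the difference in prior scores.
naive = mean(o for o, (t, _) in zip(outcome, people) if t == 1) - mean(
    o for o, (t, _) in zip(outcome, people) if t == 0
)

# Regression with an intercept, the treatment flag and the prior score.
design = [[1, t, prior] for t, prior in people]
intercept, treatment_effect, prior_effect = ols(design, outcome)
print(naive, treatment_effect)
```

The naive difference in means is 6, double the effect of 3 built into the data, because it also picks up the beneficiaries' higher prior scores. Holding the prior score fixed in the regression recovers the built-in effect, up to floating-point rounding.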
All good approaches to impact evaluation require good quality data. The most common way of sourcing quantitative data is to undertake a survey of both project beneficiaries and non-beneficiaries in the comparison group. However, surveying is costly, and it would usually be impractical and too expensive to survey the whole beneficiary population. Consequently, the approach to survey sampling needs to be carefully designed and executed to ensure that the results are reliable and representative of the whole population.
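One part of sample design is deciding how many people to survey. The sketch below uses the standard formula for estimating a proportion from a simple random sample, with a finite population correction. The function name and default values (95% confidence, a margin of error of 5 percentage points, and the most conservative proportion of 0.5) are assumptions for illustration. It does not cover non-response, stratification or the sample needed to detect a difference between groups.

```python
import math


def required_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion, with finite population correction."""
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink the requirement when the population itself is small.
    return math.ceil(n0 / (1 + (n0 - 1) / population))


print(required_sample_size(5000))  # 357 for a population of 5,000
```

Under these assumptions a beneficiary population of 5,000 would need about 357 completed responses, far fewer than a full census.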