An Introduction to Evaluation (Part 1): Randomised Controlled Trials

Michael Galley · 8 November 2024


Evaluation is an essential part of applied behavioural science. The efficacy of behavioural insights, policies, and interventions is often highly context-dependent, making it vital to test potential solutions before they are scaled up. The most rigorous way to do this is through randomised controlled trials (RCTs). In most of our projects we use RCTs to understand whether and for whom our interventions work, and to assess how cost-effective they are.

In this first blog of a series on evaluation methods, we introduce RCTs and explore various forms they can take. We also cover some of the advantages and disadvantages of each form of RCT, and briefly discuss when each might be preferable. We conclude with a non-exhaustive list of best practices to follow when conducting RCTs.



So, what are Randomised Controlled Trials?

In an ideal world, we’d like to know how a person fares when receiving an intervention versus not receiving it, or versus receiving a different intervention. However, we cannot simultaneously give a treatment to and withhold it from the same person. To overcome this challenge, we create statistically identical groups that act as valid counterfactuals for each other: people from the sample are randomly assigned to either the control group or one of the treatment groups. Done correctly and at sufficient scale, this produces groups that are statistically identical in terms of relevant characteristics such as demographics, attitudes, and behaviours. After the intervention has been administered and some time has passed, we compare the groups on a predetermined outcome metric to establish the causal effect of the treatment.
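To make that final comparison step concrete, here is a minimal sketch in Python. The data are simulated purely for illustration; the point is that, once groups have been randomised, the estimated causal effect is simply the difference in group means, with a two-sample t-test to gauge whether the difference could be due to chance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated outcomes: the treatment nudges the mean outcome up slightly.
control = rng.normal(loc=10.0, scale=2.0, size=500)
treatment = rng.normal(loc=10.4, scale=2.0, size=500)

# With randomised groups, the estimated causal effect is the difference
# in group means; a two-sample t-test gauges whether that difference
# could plausibly be due to chance alone.
effect = treatment.mean() - control.mean()
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"Estimated effect: {effect:.2f} (p = {p_value:.3f})")
```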



Lab and field RCTs

RCTs can take place in the ‘lab’ or the ‘field’. A lab setting gives the researcher more control over the trial, but the artificiality of the environment (participants are not in the real world and know they are taking part in an experiment) and the infrequent use of representative samples (participants are often university students) mean that findings may not hold up in other contexts. Field experiments, which take place in the real world and often without participants knowing they are in an experiment, have the converse implications: findings are less artificial, but the trials are costlier and more complex to run, and there is a greater risk of unaccounted-for factors (confounding variables) affecting the results. Nowadays, as discussed below, trials can also take place online or via mobile apps, and these can share characteristics with lab or field experiments depending on the design of the particular trial.


Example: The first RCT in the aviation industry

In 2014, our co-founder Rob Metcalfe, along with colleagues Greer Gosnell and John List, partnered with Virgin Atlantic to conduct the first ever RCT in the aviation industry. This field experiment tested different behavioural interventions aimed at encouraging pilots to use less fuel. A total of 335 pilots were randomly assigned to a control group or one of three treatments: receiving information on their performance, being shown performance targets and a comparison to their peers, or being offered prosocial incentives in the form of donations to the charity of their choice. The amount of fuel used by each group was observed and compared over the following seven months (more than 42,000 flights). In total, the interventions reduced CO2 emissions by 21,500 tonnes. The most effective intervention, personalised targets, improved pilots’ fuel efficiency by 19%, whilst prosocial incentives improved pilots’ job satisfaction by 6.5%. You can learn more about this case study on our website or check out the academic paper.


Mobile app-based RCTs

Mobile applications can have a major impact on users’ behaviour. As such, they are an important medium for behavioural interventions and an ideal space for running RCTs. Users interact with a different version of the app depending on whether they are in the control group or a treatment group; the difference can range from a single message or a change in the app’s visual presentation to access to entirely different app functionality. App-based experiments are a form of field experiment: they have high external validity because they take place as participants go about their daily lives. They can be difficult to run, particularly when they face behavioural obstacles of their own, such as requiring participants to download the app and allow it to run in the background. However, mobile apps allow for the collection of rich data and can be used to deliver timely, dynamic feedback and other interventions.
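As an illustration of how users might be allocated to app variants, here is a hedged sketch in Python. The function name and experiment label are hypothetical, and this is one common approach in app-based A/B testing rather than the method used in any particular trial: hashing a stable user identifier means each user consistently sees the same variant across sessions.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, n_variants: int = 2) -> int:
    """Hypothetical helper: hash a stable user ID together with the
    experiment name, so each user always sees the same variant and
    allocations across different experiments stay independent."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_variants  # 0 = control, 1..n-1 = treatments

print(assign_variant("user_123", "onboarding_message_test", n_variants=3))
```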


Example: Testing incentives through a travel app

As part of our project for the Metropolitan Transportation Commission (MTC) of the San Francisco Bay Area, we used Metropia’s ‘Go Ezy’ travel app as the medium for two RCTs. A total of 216 Metropia app users participated in the experiments, which took place over 10 months and enabled us to record data on over 7,000 unique trips. In the first experiment, users were randomly allocated to a control group or one of four treatment groups, with the treatment groups receiving various in-app messages designed to reduce trips taken by car. The second experiment had five treatment groups and tested the impact of different incentive levels and customised suggestions of alternative transport options. We found that incentives can encourage people to take alternatives to driving, and that informational nudges can change the behaviour of people who are already familiar with the alternative options. You can read more about the experiments here.


Example: Using an online game to impart recycling knowledge

We created The Waste Game for the Irish EPA and the Irish Universities Association, and used an RCT to test the game’s effectiveness at increasing key predictors of waste prevention and recycling behaviour (measured via a questionnaire). Students were randomly assigned to one of three groups: the control group did not play the game, one treatment group played a ‘simple’ version, and the other played an ‘enhanced’ version with added gamification features. The results of the trial, which demonstrated the game’s effectiveness and found that the simple version was more effective, enabled us to finalise the game and scale it with confidence; it is now a mandatory part of the orientation-week curriculum at all universities in Ireland.



Example: Re-creating Android and Windows operating systems to run an RCT on browser choice screens

We partnered with Mozilla to run an RCT testing how different versions of a browser choice screen affect users’ default browser choice. To do this, we designed highly realistic simulations of the Android and Windows operating systems to recreate the experience a user goes through when setting up a new device. This online experiment involved 12,000 participants across Germany, Spain, and Poland, who were randomised between a control group and four treatment groups. We found that users who were shown browser choice screens had greater satisfaction with their browser, felt a greater sense of control whilst setting up their device, and were more likely to choose independent browsers. The realistic nature of the interfaces made participants feel like they were choosing a browser for real, which increases the validity of the findings. Check out our previous blog on online experiments to learn more about this project and the methodology behind it.



Challenges of conducting RCTs

While RCTs are the gold standard of evaluation methods, they are not always feasible due to practical and ethical challenges. In particular, detecting the small effects that are common in behavioural science requires a large sample, which can be difficult and expensive to acquire. There can also be difficulties in randomly allocating people to groups, or ethical questions about deliberately applying or withholding treatment from certain groups.
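To give a sense of the sample sizes involved, here is a minimal sketch using the power-analysis tools in the statsmodels library; the effect sizes are illustrative assumptions, not figures from any of our trials. Even a modest effect can require thousands of participants per arm.

```python
# A rough power calculation: how many participants per arm are needed
# to detect effects of various sizes at conventional thresholds?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.05, 0.1, 0.2):  # Cohen's d; these values are illustrative
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.8,
                             alternative="two-sided")
    print(f"d = {d:.2f}: ~{n:,.0f} participants per group")
```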

If one does decide to run an RCT, here are some best practices to abide by:

  • Ensure random assignment is truly random. Customer numbers or usernames are often based on a non-random factor (such as the date the user joined), so simply splitting an ordered list at the halfway point may not be random at all. Shuffling the full list before allocating, whether with a random number generator or a free online randomiser, avoids such issues (see the sketch after this list).
  • Avoid cross-group contamination. In certain trials, if participants in the treatment group interact with those in the control group they may inadvertently pass on elements of the treatment and thereby compromise the validity of the control group. This is an important factor for researchers to consider and seek to mitigate through their trial design. 
  • Ensure the intervention being tested does not change during the trial; otherwise, the findings may be contaminated.
  • Test one intervention at a time on a given individual, so that the effect of each can be distinguished, unless the aim is to test the effectiveness of combining treatments.
  • Assess how generalisable the findings are. The generalisability of a trial is affected by many factors, including how realistic the setting is and how closely the context in which the intervention is tested, both where and with whom, matches the context in which it might be replicated in the future. One recurring criticism of behavioural science research is that it is mostly conducted in WEIRD (Western, Educated, Industrialised, Rich, Democratic) countries, so the findings may not transfer reliably to developing countries.
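Below is a minimal sketch of the randomisation point above, under illustrative assumptions (hypothetical user IDs and a fixed seed for reproducibility): shuffle the full list first, then deal participants into arms, so that any ordering hidden in the IDs cannot leak into the allocation.

```python
import numpy as np

def randomise(user_ids, n_arms=2, seed=42):
    """Shuffle the full list, then deal participants round-robin into arms.

    Shuffling first removes any ordering hidden in the IDs (e.g. join
    date), so each arm is a simple random sample; the fixed seed makes
    the allocation reproducible and auditable."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(list(user_ids))
    return {arm: shuffled[arm::n_arms].tolist() for arm in range(n_arms)}

# Hypothetical user IDs, split evenly across a control and two treatments.
groups = randomise([f"user_{i}" for i in range(1000)], n_arms=3)
print({arm: len(ids) for arm, ids in groups.items()})
```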

In circumstances where running an RCT of any form is not feasible, there are several other evaluation methods that can be used, including interviews and focus groups, regression analysis, pre-post comparisons, difference-in-differences, and non-experimental surveys. The most appropriate method will depend on a range of factors, including how certain the results must be, whether evaluation was built into the design of the intervention or is being considered ex post, and budget. We will explore non-experimental evaluation methods in more detail in the second part of this blog. That said, RCTs offer a level of rigour above and beyond alternative evaluation methods, so whenever possible we should aim to conduct them to ensure our assessments of behavioural interventions are as accurate as possible.


If you’d like to know more about RCTs, or think they might be useful for your organisation, please don’t hesitate to get in touch; we’d love to see how we could help you!