Why they should still be the standard.
A couple of weeks ago, I wrote a short piece on why Randomised Controlled Trials (RCTs) are considered the gold-standard studies for evaluating the efficacy/effectiveness of drugs in peacetime, that is, "outside of a pandemic".
In these last couple of decades that have been 2020, I often heard people arguing against considering RCTs the standard during pandemics. I strongly disagree with that take, for a multitude of reasons, and so I decided to write a piece about it. I kept procrastinating and at some point I thought it was not worth writing. But then, last week, something happened: the US Food and Drug Administration (FDA), the organisation which is supposed to ensure the safety and efficacy of drugs administered to patients in the US, gave an Emergency Use Authorization for convalescent plasma treatment in the complete absence of data from any robust RCT. Since this was the FDA, and not a random guy on Twitter, this pushed me to finally write this article.

Arguments and counterarguments
So, here we are. Why do I think RCTs should still be the standard even during pandemics? To answer this question, it is probably better to ask ourselves the opposite one: we know that RCTs are the standard in peacetime, so why do some people think they shouldn't be during pandemics?
Claim: Because we have no time; people are dying and cases often increase at an exponential rate.

Response: The fact that people are dying is no different from many other non-pandemic situations, and giving something without any evidence of efficacy can be very dangerous. In fact, precisely because of the exponential increase in cases, if the treatment we chose to give widely was found to be harmful, we could end up causing a huge number of additional deaths over and above those caused by the virus, without any opportunity to realise our mistake and counteract it.
Claim: But surely, if the treatment is very simple and we have some safety data, the risk that it is harmful is minimal, so I would prefer to take it even if I didn't know whether it was effective.
Response: It takes time (and a good study design) to really understand the safety of a specific drug used to treat a specific disease. The common perception that safety is a simple thing to study compared to efficacy is far from reality. There are thousands of examples in the medical research literature of biologically plausible mechanisms being completely disproven by RCTs, often for initially mysterious reasons: the human body is such a complex system that the effect of even a simple compound can be very difficult to predict, let alone the interaction between a drug and a condition.

Claim: Well, if a drug has been widely used for ages to treat other diseases, we surely know enough about its safety, even though we do not know about its efficacy against this specific virus.
Response: This is only true as long as safety is thought of as an inherent characteristic of the drug, rather than of the complex interaction between the drug and the disease. What is safe for a healthy individual may not be safe for an infected one. Most importantly, we tend to think of safety only in terms of big, obvious side effects appearing after taking a pill or an injection. However, even if a pill does not turn your skin green or make a sixth finger grow on your hand, it may have a small but significant effect on your body that worsens your condition slightly. And all medical researchers know that small effects are impossible to spot without conducting large studies (preferably trials), irrespective of whether these effects are protective or harmful.

Let me give you an example I worked on recently: for decades, children showing up in hospital with septic shock have been given fluid boluses as standard of care in Western countries. This is probably one of the simplest treatments you could think of: at the risk of oversimplifying, it's basically just some fluid solution, for example a saline one, administered intravenously. Surely, you wouldn't think there were any safety issues with a saline solution, innit?
Were boluses tested widely with RCTs before it was decided they would become the gold standard in treating children (or, more generally, people) with septic shock? Nope. Why? They were first given during the cholera outbreaks of the 1830s, and later suggested for the treatment of shock in the early 20th century, well before the trial era started. With a bit of imagination, I can figure some doctors may later have suggested RCTs were necessary, but they were most likely dismissed with claims such as: "There is no equipoise", "the biological mechanism through which they work is obvious and has no risks", "Children clearly look better after receiving boluses", "it's just some salt and water", "it would not be ethical not to give boluses", etc. A few small studies were conducted, but the results were anything but conclusive.
Now, back to the future: some clinicians were still sceptical in the 2000s, so they wanted to run an actual RCT. It was impossible to run it in the Western world because "it's unethical, etc etc". But boluses were not yet that common in Africa, so it was possible to run one there. If boluses had been shown to be very effective, it would have been great news, as they are a very cheap and simple treatment to recommend using more widely in Africa. This trial, called FEAST, randomised more than 3000 children with shock to receive either one of two kinds of fluid bolus or no bolus at all. The results? Mortality was 7% in the control group, and 10.5% in both bolus groups. Yes, almost 50% higher, in relative terms, in the groups receiving boluses.
Some of the doctors were so unconvinced that they accused the researchers of having swapped the groups. Fortunately, there were two different groups receiving boluses, so this was clearly not possible, and both groups showed similarly harmful effects compared to the single no-bolus group.
So here we are: we had a treatment thought to be very simple, safe and effective, that was shown to worsen mortality by around 50% on a relative scale. This trial suggests that, in all the years in which we gave a treatment without evidence from RCTs, on the grounds that "a trial would not be ethical", "it is safe" and "data from small studies are clear", we may have had 50% more deaths than if we had done nothing at all. Imagine the number of preventable deaths this may have caused... Of course there are plenty of reasons why these results may not be generalisable to different settings, but that is not the point. The point is that they might well be, and the first rule in medicine should always be "do no harm".
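Just to make the arithmetic behind "almost 50% higher" explicit, here is a minimal sketch using the rounded figures quoted above (these are the approximate numbers from this post, not the exact trial results):

```python
# Relative vs absolute risk, using the rounded FEAST figures quoted above.
mortality_control = 0.07   # no-bolus arm
mortality_bolus = 0.105    # either bolus arm

relative_risk = mortality_bolus / mortality_control             # 1.5, i.e. ~50% higher
excess_per_1000 = (mortality_bolus - mortality_control) * 1000  # 35 extra deaths

print(f"relative risk: {relative_risk:.2f}")
print(f"excess deaths per 1,000 children treated: {excess_per_1000:.0f}")
```

A 3.5 percentage-point absolute difference may look small on paper, but applied to every child treated with boluses over decades it adds up very quickly.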
Claim: These examples are very rare; strategically, it is still better to try to give something, since the rewards if it works are going to be far greater than the harm if it doesn't.
Response: To answer this claim, it is important to know both (i) what the typical effect sizes of successful treatments in medicine are, and (ii) how often treatments touted by experts on the basis of anecdotal evidence end up being confirmed as effective by RCTs.
With regard to (i), the vast majority of successful trials show effects that are far from miracles. A 20% relative decrease in mortality (that is, for example, 40% vs 50%) is considered huge in most settings. Whatever we decided to give to sick patients without evidence during a pandemic would have a nearly null probability of being a "miracle cure". Indeed, this probability would be lower than the probability that what we gave actually ended up being slightly harmful. And here we come to point (ii). It's hard to estimate the probability that a treatment will work before we run a trial. But past experience should tell us that, even when doctors claim beyond a shadow of a doubt that they have seen an effect, only rarely do trials end up finding one. This paper, for example, suggests that in certain fields, like oncology, as few as 1 in 20 drugs completes the whole journey through the different phases of experimentation and is finally given the green light. And this is mainly for drugs created for a specific purpose.
During pandemics, the first set of drugs to be tested will instead surely be re-purposed drugs. Sometimes drugs that were developed with a specific aim in mind turn out to be great at something else. For example, Viagra was originally developed to treat hypertension and turned out to be great at treating erectile dysfunction. Coca-Cola was conceived as a patent medicine by its inventor, John Pemberton, a potential substitute for morphine or even a headache remedy, before turning out to be great at making people obese. But this is very rare and, most likely, the success rate of re-purposed drugs is going to be lower than that of specifically created ones.
What if there was no harm?
Now, let's forget for a moment that I just said I believe a small harmful effect to be much more likely than a miraculously large one. Let's pretend giving a drug can only be helpful. Surely, in this (ideal and unrealistic) situation, it would make sense to give it without conducting proper trials, you will say? Not quite. There are still a number of reasons why it can be better to run trials first.
1) Because they can tell us what works and what does not. This seems a pretty obvious point, but for some it's not. If we choose a single treatment and we start giving it to everybody, we will never learn whether it worked or not. Of course we could potentially give more than one drug, but only up to a certain extent. Truth is, we might be giving something useless to thousands of people, while at the same time preventing the discovery of a real treatment. For the first wave of cases, it is likely that the results of the first trials will only be out when everything is over. But if there were more than one wave, knowledge accrued through the trials in the first one would help treat patients in the second. This is what happened with dexamethasone/steroids recently: running trials helped establish that these were useful, and could potentially prevent a lot of deaths. Had all countries just started giving HCQ because of the results of a first small study on the topic, this would probably never have been discovered.

2) Because production of the drugs has to be scaled up to meet demand. No drug can instantaneously be made available for millions of additional people compared to normal times. While appropriate actions are put in place to increase production, rather than witnessing a race to take hold of the available doses, it is wise to use the available stock within studies only, accruing additional evidence on both efficacy and safety. As usual, this is best done with RCTs, in order to make sure this information can be used in the best possible way.
[Of course, it is theoretically possible to behave like a rogue state, a hyena or a jackal. This would involve not running any RCT, securing all possible stocks of a specific drug and letting other states run trials with the few remaining ones. Some states may have done this, but I refuse to consider it a viable and/or ethical strategy. Not least because it only works as long as you're the only grasshopper among the ants.]
Of masks and lockdowns
So, summarising, if we were to give a certain treatment to everybody at the start of a pandemic, before conducting actual trials, we would:
- Have a very small probability that what we choose works at all;
- Have a similar probability that it harms;
- Have a negligible probability that it is an actual "cure";
- Have no way to know whether giving it was a good idea in the first place;
- Witness a race to get hold of the limited available stocks.
Of course, this is the scenario specifically for pharmaceutical interventions. As I said, two of the main problems are that (i) it is virtually impossible to forecast what will happen when giving a drug to a particular population, because of the complexity of human biology and of the huge variability between individuals; and (ii) there is a reasonable probability that you may be making things worse compared to doing nothing ("Do no harm").
So, should we use trials before we take any decision during pandemics? Not necessarily. The considerations I just made for drugs will have to be made for each specific, possibly non-pharmaceutical, intervention we'd want to test. Of course, depending on the situation, the trade-offs might be very different. What are some of the differences between drugs and other interventions?

1) Some interventions have no real cost, and come with almost unlimited availability. Pretty much the whole population now owns one or more masks in many countries. As long as we are talking about any cloth mask or face covering, there will never be any shortage. So, for masks, what we said about drug availability does not hold. Are there any other possible "costs"?
Some people say that there is a real risk that masks may worsen things by giving a sense of "false security". I am personally very sceptical of these claims, though of course we would need data to confidently rule them out. One way to test this could be to run cluster-randomised or stepped-wedge trials comparing disease incidence between, for example, schools randomised to different policies. This is perhaps the only reasonable way to investigate whether masks work, but it would not be an easy thing to do. Most likely, such a trial would need to enrol hundreds of schools to be able to detect any effect with confidence; the rough sketch below gives an idea of why. In my view, such trials should mainly be used to rule out any harm.
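Here is a minimal back-of-the-envelope sketch of why so many schools would be needed. All the inputs (attack rates, school size, intra-cluster correlation) are assumptions I have picked purely for illustration, not figures from any real study:

```python
# Rough sample-size sketch for a school-based cluster-randomised trial of masks.
# All inputs below are illustrative assumptions, not figures from a real study.
from scipy.stats import norm

alpha, power = 0.05, 0.80
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)

p_control, p_mask = 0.10, 0.09   # assumed attack rates (a 10% relative reduction)
pupils_per_school = 500          # assumed average cluster size
icc = 0.02                       # assumed intra-cluster correlation

# Per-arm sample size if we could randomise individual pupils (normal approximation)
variance_sum = p_control * (1 - p_control) + p_mask * (1 - p_mask)
n_individual = z ** 2 * variance_sum / (p_control - p_mask) ** 2

# Inflate by the design effect to account for pupils being clustered within schools
design_effect = 1 + (pupils_per_school - 1) * icc
n_clustered = n_individual * design_effect

schools_per_arm = n_clustered / pupils_per_school
print(f"~{schools_per_arm:.0f} schools per arm, ~{2 * schools_per_arm:.0f} overall")
```

With these (entirely assumed) numbers the answer comes out at several hundred schools in total, because the design effect grows quickly with cluster size: randomising whole schools needs far more participants than randomising individuals.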
2) RCTs may investigate either the effectiveness or the efficacy of a certain intervention. Efficacy is about whether or not something works in an ideal world where everybody can and does use it as intended. Effectiveness is about the actual real-world effect, taking into account the fact that some people will not adhere, whether because they can't or because they don't want to. Either of the two can be important to study in different settings. It is important to know what the theoretical effect of a drug is, but if somebody has a serious side effect, then we cannot ask them to keep taking the drug anyway. So it is also important to understand the effect of prescribing a certain drug, whether or not it ends up being taken as intended. One important, and often overlooked, point is that while there is in general a single true (though generally difficult to estimate) efficacy, there can be several different effectivenesses in different settings. While in clinical trials a good proportion of the reasons why people may stop adhering have nothing to do with the trial itself (e.g. side effects), with non-pharmaceutical interventions the results of a trial, and their depiction in the media, can clearly affect real-world effectiveness in either direction. Let's consider two scenarios:

A) Masks reduce transmission by 10%. We power multiple trials to detect that 10% effect. However, only 50% of people use them as intended. Because of this, if we do not account for it in the sample size calculation, our power drops significantly (even from 80% to ~30%; a back-of-the-envelope sketch after scenario B shows how). When the results are published, the message is surely "masks don't work". People get angry with those pesky scientists who keep changing their minds, and stop using masks. Only 20% of people keep using them. Effectiveness drops further. With our trial(s) we have only made things worse.
B) Masks reduce transmission by 20%. Since we expect only 50% of people to comply, we power a trial with this assumption in mind. Results are positive, and effectiveness is similar to or higher than hoped. The results are amplified in the media, but there is a hard core of people who will not wear a mask whatever happens, because "I want to be free to protect or do harm to whomever I like". Effectiveness improves, but only slightly.
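As promised in scenario A, here is a minimal sketch of how non-adherence dilutes power. The attack rates, and the simplification of ignoring clustering, are my own assumptions for illustration:

```python
# How 50% adherence dilutes the power of a trial sized for full adherence.
# Assumed attack rates; individual randomisation, clustering ignored for simplicity.
from math import sqrt
from scipy.stats import norm

alpha = 0.05
z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(0.80)

p_control = 0.10   # assumed attack rate without masks
p_mask = 0.09      # true attack rate with masks: a 10% relative reduction

# Per-arm sample size to detect the full 10% effect with 80% power
variance_sum = p_control * (1 - p_control) + p_mask * (1 - p_mask)
n = (z_a + z_b) ** 2 * variance_sum / (p_control - p_mask) ** 2

# Intention-to-treat effect when only half of the mask arm actually adheres
adherence = 0.5
p_itt = adherence * p_mask + (1 - adherence) * p_control   # 0.095

se = sqrt((p_control * (1 - p_control) + p_itt * (1 - p_itt)) / n)
power = norm.cdf(abs(p_control - p_itt) / se - z_a)

print(f"planned n per arm: {n:,.0f}")
print(f"actual power with 50% adherence: {power:.0%}")   # roughly 30%
```

With only half the arm adhering, the intention-to-treat effect is roughly half the one we powered for, and power falls far from proportionally: under these assumptions it drops from the planned 80% to somewhere around 30%.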
Given these two scenarios, is it worth running these trials? Is it really that important to know whether (cloth) masks work or not? Of course, a hardcore evidence-based person would argue it is, because these trials would at least have to prove that, if anything, masks are not dangerous. However, we might not even be able to exclude a harmful effect in scenario A. The old adage "absence of evidence is not evidence of absence" holds. Standard frequentist trials are based on testing a null hypothesis. There are broadly two possible strategies, corresponding to two different null hypotheses: (i) tell everybody to wear masks, and change policy if there is evidence of harm; or (ii) avoid masks, unless there is evidence of benefit. In (i) the null is that masks are not harmful; in (ii), that they are not effective. While in clinical trials the risks are such that a strategy like (i) might be incredibly dangerous, my view is that it is a much preferable strategy for things like masks, for which possible harms are very unlikely. Of course, if we were really worried about the potential "invincibility" effect of masks, we could launch a campaign to raise awareness of it. We clearly could not run any such campaign to persuade the human body to react as we hoped to a drug!
3) It is very difficult to explain to the general population why randomised experiments are important. Clinical trials are mainly meant to be interpreted by clinicians, who are (at least ideally) supposed to understand them and to use them wisely to make informed decisions. Clinicians are generally happy to accept randomisation (though reality might look a bit more like the mix depicted in my previous post).
Now, imagine instead the idea of randomising, for example, different regions to lockdowns or to measures of differing stringency. Imagine convincing entrepreneurs in North Yorkshire that they have to close down their businesses while those in South Yorkshire can carry on with their activities. Imagine making sure that these different policies are followed to at least such an extent that effectiveness can be decently measured. Once again, imagine how much lower effectiveness would be, compared to the ideal efficacy, if I lived in Greater London and was asked to stay at home while knowing that people in the City of London were allowed to move around freely with no restrictions...
In conclusion...
Randomisation is one of the most amazing inventions of the human mind, and it has to be used in clinical research, even during pandemics, to understand which treatments work best. This is particularly true for pharmaceutical interventions, because of the complexity of human biology, the severity of the consequences of a bad decision, and the importance of learning what works, even for future outbreaks.
When talking about other, non-pharmaceutical interventions, alternative strategies could be adopted, but these need to be properly justified and assessed. Trials could still be helpful, but should be used wisely.