In 2013 Danish pharmaceutical company Lundbeck was authorised by the European Medicines Agency (EMA) to market Selincro – their trade name for the opiate-blocking drug nalmefene – to reduce alcohol consumption among dependent (but not physically dependent) drinkers.
Authorisation paved the way for nalmefene to tackle the bulk of dependent drinking lying below the iceberg-tip of physically dependent drinkers aiming for abstinence – and opened up for its manufacturer a large and potentially lucrative market, provoking accusations of an expensive and inappropriate medicalisation of lesser degrees of dependence based on unproven effectiveness.
To grasp the essence of the controversy, first we have to understand the dubious world of the post hoc sub-sample analysis, the type of analysis on which authorisation was based.
Imagine you have carefully levelled the playing field in a study by randomly allocating patients to a medication or to an identical but inactive placebo. Then, having eliminated any further bias, you check how the patients do. It can be likened to randomly loading coins with medication or placebo, then tossing them in the air and leaving them to fall – a process over which you have no control once the coins leave your hand.
If the medication worked, you would expect to see, not an even split of heads (healthy outcome) and tails (not so good), but the medication-loaded coins tending to fall on the healthier side. That might happen, but not consistently enough to meet conventional criteria for a significant effect. However, now you have a great advantage: you can actually see how the coins have fallen. You can check the one-pences, the two-pences, the five-pences, the ten-pence coins, the 20-pences, the pounds and the two-pounds. Maybe in one of these subsets there is such an excess of heads that you can pronounce the medication effective, at least among (say) the ten-pence patients. Had you said in advance you would focus on the ten-pence patients, you would have risked another negative finding. But with the data in, now you can see what the outcome actually was.
The conventional criterion for a significant effect is that the difference between the outcomes of medication and placebo patients would have happened less than one in 20 times by chance – a result considered so unlikely that something more must have been involved. Everything else having been equalised, that ‘something’ could only have been the medication.
Now we can see that researchers have an almost sure-fire way to generate a statistically significant finding: slice up the sample in lots of ways until in one subset the magical ‘less than one in 20 by chance’ result emerges. Try more than 20 slices, and a significant finding becomes more likely than not, even if in reality the medication is ineffective.
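The arithmetic behind that claim can be sketched in a few lines of Python. This is an illustration only, resting on a simplifying assumption: that each slice is an independent test with a one in 20 chance of a false 'significant' result. Real sub-samples drawn from the same trial overlap, so the true figures would differ. The function names are invented for this sketch.

```python
import random

ALPHA = 0.05  # the conventional 'less than one in 20 by chance' threshold


def chance_of_false_positive(slices, alpha=ALPHA):
    """Chance that at least one of `slices` tests comes up 'significant'
    when the medication is in reality ineffective.
    Assumes (for illustration) that the tests are independent."""
    return 1 - (1 - alpha) ** slices


def simulate(slices, trials=20000, alpha=ALPHA, seed=1):
    """Monte Carlo check of the formula: in each simulated trial, every
    slice independently has an `alpha` chance of a fluke 'significant' result."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(slices))
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for n in (1, 7, 14, 20):
        print(n, "slices:", round(chance_of_false_positive(n), 3))
```

On these assumptions the chance of at least one fluke finding passes 50% at 14 slices and is about 64% at 20, so the trawl need not go far before a 'significant' result is the expected outcome.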
It is not enough to back-engineer good reasons for after-the-event (or post hoc) sub-sampling, and to deny trawling the data until a ‘significant’ pattern of excess heads was found. The possibility that this could have happened has to be eliminated. Otherwise the analysis can merely suggest the medication might be found effective in another trial limited to these patients, or at least where sub-sampling was planned in advance. Without this, it remains of unproven efficacy.
Authorisation to market Selincro rested on just such an analysis, undertaken in response to unconvincing initial findings in Lundbeck’s trials. Most ways of assessing the primary drinking outcomes had left nalmefene with no significant advantage over a placebo. When it was assumed patients not followed up were drinking at their pre-trial levels, none of the comparisons with a placebo reached statistical significance.
Faced with these results, Lundbeck and their research associates conducted sub-sample analyses which excluded medium-risk drinkers, and those at higher risk who had rapidly remitted even before treatment started – drinkers who tended to stay remitted, leaving Selincro little to improve on. What remained was a higher risk sub-sample who remained at high risk when treatment started. Among these patients, nalmefene had greater scope to reduce drinking, and the results were more consistently positive – but in the process, scientific credibility had been sacrificed.
The EMA’s scientific advisers admitted it was ‘not ideal’, but shrugged off post hoc sub-sampling as common in psychiatric trials due to high dropout. In this case, however, high dropout was not the rationale. Instead, sub-sampling had been ‘proposed’ by Lundbeck ‘in order to define a population where the benefit of Selincro would be greatest’. Not just the effect but, it seems, the intention was to find a slicing strategy which favoured Selincro. Sub-sampling also excluded about half the randomised patients, leaving a small and probably atypical remainder to supply the critical data. Together with the multiple reasons for excluding trial applicants, it meant the results could not be relied on as an indication of nalmefene’s likely impact among the generality of drinkers.
Once made, the EMA’s decision initiated a chain leading to nalmefene’s approval for the NHS in Britain. In self-justifying loops, during European authorisation Lundbeck conducted the sub-sampling analysis in order to maximise nalmefene’s apparent impact, which in turn justified authorisation for these kinds of drinkers. This justified a published analysis focused on these drinkers and led to cost-effectiveness analyses based on the sub-sample, leading the National Institute for Health and Care Excellence (NICE) to say the NHS must make the product available for these types of drinkers.
Each link in the chain retained the original analysis’s vulnerability to bias and its questionable applicability to patients in general. To this, NICE added acceptance of the company’s argument that it was neither appropriate nor possible to compare nalmefene with naltrexone, its cheaper parent drug. One strand in the argument (justified by the unreliable sub-sample analysis) was that nalmefene was licensed to reduce drinking, but naltrexone to promote abstinence. In fact, naltrexone usually promotes reduced drinking, and does so among the same types of drinkers.
The other argument which led NICE to discount naltrexone was the company’s assertion that required data was lacking from the naltrexone trials, and that these were so different from the nalmefene trials that comparison would have been invalid. Contradicting their own case, Lundbeck later sponsored and co-authored just such a comparison. Its findings were broadly but not always significantly in favour of nalmefene, but were undermined by the sub-sampling decision. In the three largest of the four nalmefene trials, this gifted the drug an advantage not replicated for naltrexone. The dice were loaded against naltrexone, but only a reader familiar with the source studies would have known.
Eliminating naltrexone from Selincro’s therapeutic ball-park or finding it less effective was vital to Lundbeck. Financially, the company had suffered from the expiry of patent protection, leaving its medications open to competition from cheaper, non-branded, ‘generic’ equivalents. Selincro was meant to help plug the resulting revenue gap, but this would not happen if it too faced competition from generic naltrexone. As an indication of how crucial this kind of issue was, in 2013 Lundbeck had paid a 93.8m euro fine imposed by the European Commission after being found to have paid rivals manufacturing generic antidepressants to ‘stay out of its market and delay the entry of cheaper medicines’.
Beyond naltrexone – and beyond this abridged version of the story – lies the question of whether any medication is appropriate for the kinds of drinkers at whom nalmefene is targeted. Full story and supporting citations at http://findings.org.uk/PHP/dl.php?file=Palpacuer_C_1.txt&s=dd
Mike Ashton is editor of Drug and Alcohol Findings, http://findings.org.uk