Fraud is a big problem in science. People fake academic work. In some cases, this is silly and fairly amusing, like the retracted paper that claimed that men who carry guitar cases are more attractive. In other cases, like the story of Yoshitaka Fujii, who admitted to fabricating results in trials of drugs used to treat serious surgical complications, it’s less funny and more terrifying.
And a new paper has just come out estimating that this problem is way bigger than most people believe. It’s possible that one in seven scientific papers is the result of people fabricating their results.
Scientific Misconduct
For most bad practices, you’ll see a huge grey area where different professors argue one way or another about what constitutes an issue. Some people think self-plagiarism - copying paragraphs or even large segments from one of your papers to a new one - is terrible, while some senior professors defend it. Everyone agrees that p-hacking, where scientists cherry-pick results or analyses to find statistically significant results even though their data do not really show anything, is bad, but there are degrees of bad.
Fraud is no different. There’s a wide range of things that you can do which could potentially be considered fraudulent in the right context. Deleting certain observations from a dataset and calling them “outliers”, for example. In some circumstances, that could be fraudulent practice - in others it’s totally fine. There are grey areas all the way up until people start fabricating data out of thin air or truly torturing the data until it confesses, say, throwing out half a sample.
In the past, everyone’s agreed that fraud - real, serious fabrication of the Fujii kind - is rare. In part, this is because academia works entirely on a system of trust. Everyone from journal editors to readers assumes that no one fabricates data on a regular basis, and even if they do it can’t be a real problem because of peer review.
It’s also because the most popular estimate we’ve had until now of scientific misconduct put the rate at about 2%. This estimate came from a paper that aggregated a series of surveys in which academics were asked whether they had ever done something that they would consider to be straightforward misconduct - usually plagiarism, fabrication, or similar. If only 1 in 50 scientists has ever done something dodgy, the proportion of research that is fake would likely be quite low.
But this estimate has some serious flaws. People lie. People who fake research lie more than people who don’t, by definition. There’s no guarantee that these people answer surveys correctly, if at all. We also don’t know how often these people fake findings - if it’s once per career, that’s not a problem. If it’s once a week, that’s…worse.
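To see why the frequency question matters so much, here is a rough back-of-the-envelope sketch. It is not taken from the preprint: the function name and the scenarios are made up for illustration, and it assumes every scientist publishes the same number of papers, which is certainly not true in practice.

```python
def fake_paper_share(share_of_fabricators: float,
                     fake_fraction_per_fabricator: float) -> float:
    """Share of all papers that are fabricated.

    Assumption (hypothetical, for illustration only): every scientist
    publishes the same number of papers, so the overall share is just
    the product of the two inputs.
    """
    return share_of_fabricators * fake_fraction_per_fabricator


# 2% is the survey-based rate of self-admitted misconduct quoted above.
# The per-fabricator fractions below are invented scenarios, not data.
scenarios = {
    "one paper in a 50-paper career": 1 / 50,
    "every tenth paper": 0.10,
    "every paper": 1.00,
}

for label, fraction in scenarios.items():
    share = fake_paper_share(0.02, fraction)
    print(f"{label}: {share:.2%} of all papers")
```

Under these assumptions, even if every one of the 2% faked every single paper, the literature would be only 2% fake. So the survey number caps the problem at a low level only if the survey number itself is honest, which is exactly what is in doubt.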
In this void of knowledge, infamous scientific sleuth - and a colleague of mine - Dr. James Heathers just preprinted a new paper. It looks at the current published estimates not of how many people will admit to fraud, but how many fraudulent papers have been found when sleuths dig into the literature.
The estimates are, in a word, stark. Of the systematic investigations included in this review, the lowest rate of misconduct was from a study that looked at image manipulation in a massive dataset of >20,000 papers. 3.8% of these were found to have had issues consistent with authors faking images in their papers. The highest estimate comes from a much smaller but otherwise similar investigation in which nearly 30% of the papers examined contained some measure of image manipulation.
And it’s not all issues with images. One of the more famous studies of this nature was by an anaesthetist called Dr. John Carlisle. He reviewed hundreds of randomized clinical trials - the sort that are used by doctors to decide what to do with your healthcare - and found that a staggering 14% of them contained “false data”. The number grew to the truly wild 44% in cases where Dr. Carlisle made the authors send data alongside their papers.
One in seven of the studies that are used by your doctors to treat your health. At least, if you were seeing an anaesthetist for some reason in the last decade or so.
This number - 14% - is echoed across many of the other investigations of this nature, leading the paper to conclude that it is a plausible estimate for the total proportion of fraudulent work in the scientific literature.
How Many Frauds?
As someone who has some experience in investigations of scientific misconduct, I find this question extremely interesting. I have personally been involved in dozens of retractions, in some cases where the authors admitted to fabricating data, and I agree wholeheartedly that the number of fake papers is far higher than many people realize.
I also find it hard to fault the general argument in this paper (disclaimer - I helped Dr. Heathers with the draft). The 1/7 number may be wrong for many reasons which the paper discusses at length. The new preprint probably covers most of the available estimates - there really are not that many of them - but is far from systematic, and Heathers may have missed some estimates. Most fraud investigations such as these focus on areas where we expect there to be issues - it’s likely that there are areas of research with far lower rates of misconduct.
Science is, by design, very heterogeneous. This means that there are lots of areas which are very different to one another. Mycological studies are quite different to bovine ecology, and both barely resemble studies looking at atmospheric science. It’s likely that different areas of science simply have enormously different rates of misconduct, because there are different incentives and ways to fabricate data depending on where you look.
That being said, the really scary part about this data is that some of the highest estimates of misconduct are in studies of human medicine. The exact sort of thing that defines how you and I get medical care. One professor of obstetrics and gynaecology estimates that up to 30% of published, peer-reviewed randomized trials in the field have been fabricated entirely. It’s possible that every person who has been pregnant or given birth in the last few decades has to some extent been impacted by fabrication, especially if they used alternative therapies or supplements such as probiotics.
None of this means that all hope is lost when it comes to medicine. In fact, this feels like the point where it starts to get better. There are also areas where the structure and circumstances of how the science is done make fraud much less likely. If fraud is uncomfortably common, it is almost certainly less common in the kind of large, multicentre trials that define many of the most important interventions that we use. It’s not that hard for one guy who claims to be running trials of anaesthesia drugs to fabricate datasets - it’s much harder to fake data when you’ve got hundreds of people across dozens of hospitals who have access to the data, with the system logging every entry automatically.
There are areas where fraud is almost unknown in medical science, which means that we could probably eliminate it in all areas. We just need to acknowledge that we have an issue and start working to fix it.
We, as a society, have a huge problem. Whether the number is one in seven, or even half that, we can say with a great deal of confidence that quite a lot of the studies we rely on to understand reality never happened at all.
We really do have to talk some time. This is awesome.
Nice summary. I don't see a link to the original preprint you're discussing? Also what areas do you have in mind: "There are areas where fraud is almost unknown in medical science"?