One morning last summer, a German psychologist named Mathias Kauff woke up to find that he had been reprimanded by a robot. In an email, a computer program named Statcheck informed him that a 2013 paper he had published on multiculturalism and prejudice appeared to contain a number of incorrect calculations – which the program had catalogued and then posted on the internet for anyone to see. The problems turned out to be minor – just a few rounding errors – but the experience left Kauff feeling rattled. “At first I was a bit frightened,” he said. “I felt a bit exposed.”
Kauff wasn’t alone. Statcheck had read some 50,000 published psychology papers and checked the maths behind every statistical result it encountered. In the space of 24 hours, virtually every academic active in the field in the past two decades had received an email from the program, informing them that their work had been reviewed. Nothing like this had ever been seen before: a massive, open, retroactive evaluation of scientific literature, conducted entirely by computer.
Statcheck’s method was relatively simple, more like the mathematical equivalent of a spellchecker than a thoughtful review, but some scientists saw it as a new form of scrutiny and suspicion, portending a future in which the objective authority of peer review would be undermined by unaccountable and uncredentialed critics.
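To make the spellchecker comparison concrete, here is a minimal Python sketch of that kind of consistency check: find a reported test statistic, recompute the p-value it implies, and flag any mismatch beyond rounding. It is not Statcheck's actual code. It assumes, purely for illustration, that results are reported as two-tailed z-tests in the form "z = 2.20, p = .03"; the pattern, the function names and the rounding tolerance are all hypothetical.

```python
import math
import re

# Hypothetical pattern for results written like "z = 2.20, p = .03".
# Assumption: only z-tests are handled; other test types are out of scope here.
RESULT_PATTERN = re.compile(
    r"\bz\s*=\s*(-?\d*\.?\d+)\s*,\s*p\s*([=<>])\s*(\d*\.\d+)"
)


def two_tailed_p_from_z(z):
    """Two-tailed p-value implied by a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_reported_results(text):
    """Recompute each reported p-value and record whether it is consistent."""
    findings = []
    for match in RESULT_PATTERN.finditer(text):
        z = float(match.group(1))
        comparison = match.group(2)
        reported_text = match.group(3)
        reported = float(reported_text)
        recomputed = two_tailed_p_from_z(z)

        if comparison == "=":
            # Allow for rounding to the number of decimals the author reported.
            decimals = len(reported_text.split(".")[1])
            consistent = abs(recomputed - reported) <= 0.5 * 10 ** -decimals
        elif comparison == "<":
            consistent = recomputed < reported
        else:
            consistent = recomputed > reported

        findings.append(
            {
                "reported": match.group(0),
                "recomputed_p": recomputed,
                "consistent": consistent,
            }
        )
    return findings


if __name__ == "__main__":
    sample = "Group A differed (z = 2.20, p = .03); group B did too (z = 1.50, p = .04)."
    for finding in check_reported_results(sample):
        status = "ok" if finding["consistent"] else "INCONSISTENT"
        print(f"{finding['reported']}: recomputed p = {finding['recomputed_p']:.4f} [{status}]")
```

In the sample sentence the first result passes and the second is flagged, because a z of 1.50 implies a p-value of roughly .13, not .04. A check like this can say that the numbers disagree, but not why.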
Susan Fiske, the former head of the Association for Psychological Science, wrote an op-ed accusing “self-appointed data police” of pioneering a new “form of harassment”. The German Psychological Society issued a statement condemning the unauthorised use of Statcheck. The intensity of the reaction suggested that many feared the program was accusing scientists not merely of statistical errors, but of impropriety.
The man behind all this controversy was a 25-year-old Dutch scientist named Chris Hartgerink, based at Tilburg University’s Meta-Research Center, which studies bias and error in science. Statcheck was the brainchild of Hartgerink’s colleague Michèle Nuijten, who had used the program to conduct a 2015 study that demonstrated that about half of all papers in psychology journals contained a statistical error. Nuijten’s study was written up in Nature as a valuable contribution to the growing literature acknowledging bias and error in science – but she had not published an inventory of the specific errors it had detected, or the authors who had committed them. The real flashpoint came months later, when Hartgerink modified Statcheck with some code of his own devising, which catalogued the individual errors and posted them online – sparking uproar across the scientific community.
Hartgerink is one of only a handful of researchers in the world who work full-time on the problem of scientific fraud – and he is perfectly happy to upset his peers. “The scientific system as we know it is pretty screwed up,” he told me last autumn. Sitting in the offices of the Meta-Research Center, which look out on to Tilburg’s grey, mid-century campus, he added: “I’ve known for years that I want to help improve it.”
. . . .
“Statcheck is a good example of what is now possible,” he said. The top priority, for Hartgerink, is something much more grave than correcting simple statistical miscalculations. He is now proposing to deploy a similar program that will uncover fake or manipulated results – which he believes are far more prevalent than most scientists would like to admit.
When it comes to fraud – or in the more neutral terms he prefers, “scientific misconduct” – Hartgerink is aware that he is venturing into sensitive territory. “It is not something people enjoy talking about,” he told me, with a weary grin. Despite its professed commitment to self-correction, science is a discipline that relies mainly on a culture of mutual trust and good faith to stay clean.
. . . .
If Fanelli’s estimate is correct, it seems likely that thousands of scientists are getting away with misconduct each year. Fraud – including outright fabrication, plagiarism and self-plagiarism – accounts for the majority of retracted scientific articles. But, according to Retraction Watch, which catalogues papers that have been withdrawn from the scientific literature, only 684 were retracted in 2015, while more than 800,000 new papers were published. If even just a few of the suggested 2% of scientific fraudsters – which, relying on self-reporting, is itself probably a conservative estimate – are active in any given year, the vast majority are going totally undetected. “Reviewers and editors, other gatekeepers – they’re not looking for potential problems,” Hartgerink said.
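The scale of that gap is easier to see as a back-of-the-envelope calculation. The short Python sketch below uses only the two figures quoted above and treats "more than 800,000" as exactly 800,000, so the resulting rate is, if anything, slightly overstated. The function names are illustrative. Note also that the 2% estimate counts scientists while the retraction figure counts papers, so the two are not directly comparable.

```python
def retraction_rate(retracted, published):
    """Share of published papers that were retracted."""
    return retracted / published


def papers_per_retraction(retracted, published):
    """How many papers were published for each one retracted."""
    return published / retracted


# Figures quoted in the article for 2015; 800,000 is a lower bound on papers published.
RETRACTED_2015 = 684
PUBLISHED_2015 = 800_000

if __name__ == "__main__":
    rate = retraction_rate(RETRACTED_2015, PUBLISHED_2015)
    ratio = papers_per_retraction(RETRACTED_2015, PUBLISHED_2015)
    print(f"Retraction rate: {rate:.4%}")
    print(f"Roughly one retraction per {ratio:.0f} papers")
```

On these numbers fewer than one paper in a thousand was retracted, and not all of those retractions were for fraud.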
. . . .
Even in the more mundane business of day-to-day research, scientists are constantly building on past work, relying on its solidity to underpin their own theories. If misconduct really is as widespread as Hartgerink and Van Assen think, then false results are strewn across scientific literature, like unexploded mines that threaten any new structure built over them. At the very least, if science is truly invested in its ideal of self-correction, it seems essential to know the extent of the problem.
But there is little motivation within the scientific community to ramp up efforts to detect fraud. Part of this has to do with the way the field is organised. Science isn’t a traditional hierarchy, but a loose confederation of research groups, institutions, and professional organisations. Universities are clearly central to the scientific enterprise, but they are not in the business of evaluating scientific results, and as long as fraud doesn’t become public they have little incentive to go after it. There is also the perception, questionable but widespread in the scientific community, that measures are already in place that preclude fraud. When Gore and his fellow congressmen held their hearings 35 years ago, witnesses routinely insisted that science had a variety of self-correcting mechanisms, such as peer review and replication. But, as the science journalists William Broad and Nicholas Wade pointed out at the time, the vast majority of cases of fraud are actually exposed by whistleblowers, and that holds true to this day.
And so the enormous task of keeping science honest is left to individual scientists in the hope that they will police themselves, and each other. “Not only is it not sustainable,” said Simonsohn, “it doesn’t even work. You only catch the most obvious fakers, and only a small share of them.” There is also the problem of relying on whistleblowers, who face the thankless and emotionally draining prospect of accusing their own colleagues of fraud. (“It’s like saying someone is a paedophile,” one of the students at Tilburg told me.) Neither Simonsohn nor any of the Tilburg whistleblowers I interviewed said they would come forward again. “There is no way we as a field can deal with fraud like this,” the student said. “There has to be a better way.”