Abstracts written by ChatGPT fool scientists


From Nature:

An artificial-intelligence (AI) chatbot can write such convincing fake research-paper abstracts that scientists are often unable to spot them, according to a preprint posted on the bioRxiv server in late December [1]. Researchers are divided over the implications for science.

“I am very worried,” says Sandra Wachter, who studies technology and regulation at the University of Oxford, UK, and was not involved in the research. “If we’re now in a situation where the experts are not able to determine what’s true or not, we lose the middleman that we desperately need to guide us through complicated topics,” she adds.

The chatbot, ChatGPT, creates realistic and intelligent-sounding text in response to user prompts. It is a ‘large language model’, a system based on neural networks that learn to perform a task by digesting huge amounts of existing human-generated text. Software company OpenAI, based in San Francisco, California, released the tool on 30 November, and it is free to use.

Since its release, researchers have been grappling with the ethical issues surrounding its use, because much of its output can be difficult to distinguish from human-written text. Scientists have published a preprint [2] and an editorial [3] written by ChatGPT. Now, a group led by Catherine Gao at Northwestern University in Chicago, Illinois, has used ChatGPT to generate artificial research-paper abstracts to test whether scientists can spot them.

The researchers asked the chatbot to write 50 medical-research abstracts based on a selection published in JAMA, The New England Journal of Medicine, The BMJ, The Lancet and Nature Medicine. They then compared these with the original abstracts by running them through a plagiarism detector and an AI-output detector, and they asked a group of medical researchers to spot the fabricated abstracts.
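The excerpt does not name the detectors the researchers used, but a rough sense of what "running an abstract through an AI-output detector" involves can be had from an off-the-shelf classifier. The Python sketch below uses the Hugging Face transformers pipeline with the publicly available roberta-base-openai-detector model; the model choice, the truncation length, and the sample text are assumptions for illustration, not the study's actual tooling.

# Minimal sketch: scoring a piece of text with a publicly available
# AI-output classifier via the Hugging Face transformers pipeline.
# The model chosen here is an illustrative assumption, not necessarily
# the detector the study used.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",
)

abstract = (
    "We conducted a randomized controlled trial of ..."  # candidate abstract text (placeholder)
)

# The model accepts a limited input length, so keep the sample short.
result = detector(abstract[:1000])
print(result)  # a list of {'label': ..., 'score': ...}; see the model card for label meanings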

. . . .

The ChatGPT-generated abstracts sailed through the plagiarism checker: the median originality score was 100%, which indicates that no plagiarism was detected. The AI-output detector spotted 66% of the generated abstracts. But the human reviewers didn’t do much better: they correctly identified only 68% of the generated abstracts and 86% of the genuine abstracts. They incorrectly identified 32% of the generated abstracts as being real and 14% of the genuine abstracts as being generated.
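As a quick check on how those percentages fit together, here is a small arithmetic sketch in Python. The counts below are hypothetical, chosen only so the rates match the figures reported in the excerpt; they are not the study's raw data. It also makes explicit that the 68%/32% and 86%/14% pairs are complements of one another.

# Hypothetical counts chosen to reproduce the reported percentages;
# not the study's raw data.
# "Positive" means an abstract that was actually generated by ChatGPT.

def reviewer_rates(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) as fractions."""
    sensitivity = true_pos / (true_pos + false_neg)  # share of generated abstracts correctly flagged
    specificity = true_neg / (true_neg + false_pos)  # share of genuine abstracts correctly accepted
    return sensitivity, specificity

# 50 generated abstracts: 34 flagged as fake (68%), 16 judged real (32%).
# 50 genuine abstracts: 43 judged real (86%), 7 wrongly flagged as generated (14%).
sens, spec = reviewer_rates(true_pos=34, false_neg=16, true_neg=43, false_pos=7)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")  # sensitivity = 68%, specificity = 86%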

“ChatGPT writes believable scientific abstracts,” say Gao and colleagues in the preprint. “The boundaries of ethical and acceptable use of large language models to help scientific writing remain to be determined.”

Wachter says that, if scientists can’t determine whether research is true, there could be “dire consequences”. As well as being problematic for researchers, who could be pulled down flawed routes of investigation, because the research they are reading has been fabricated, there are “implications for society at large because scientific research plays such a huge role in our society”. For example, it could mean that research-informed policy decisions are incorrect, she adds.

But Arvind Narayanan, a computer scientist at Princeton University in New Jersey, says: “It is unlikely that any serious scientist will use ChatGPT to generate abstracts.” He adds that whether generated abstracts can be detected is “irrelevant”. “The question is whether the tool can generate an abstract that is accurate and compelling. It can’t, and so the upside of using ChatGPT is minuscule, and the downside is significant,” he says.

Irene Solaiman, who researches the social impact of AI at Hugging Face, an AI company with headquarters in New York and Paris, has fears about any reliance on large language models for scientific thinking. “These models are trained on past information and social and scientific progress can often come from thinking, or being open to thinking, differently from the past,” she adds.

Link to the rest at Nature

PG decided to test ChatGPT’s capabilities for creating an abstract of a longer text. He did so by copying and pasting the text of the blog post above into ChatGPT and asking the AI to summarize it.

Here’s what the AI produced:

A preprint study found that AI chatbot, ChatGPT, can write convincing fake research-paper abstracts that are often difficult to spot. The researchers at Northwestern University in Chicago used ChatGPT to generate 50 medical research abstracts and compared them with the original abstracts using a plagiarism detector and AI-output detector. They also asked a group of medical researchers to spot the fabricated abstracts. The ChatGPT-generated abstracts sailed through the plagiarism checker and the AI-output detector spotted 66% of the generated abstracts. However, the human reviewers correctly identified only 68% of the generated abstracts and 86% of the genuine abstracts, raising concerns about the ethical implications of using AI in scientific research.
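For readers who would rather script this experiment than paste text into the web interface, here is a minimal sketch of the same exercise using the OpenAI Python client. The model name, prompt wording, and file name are illustrative assumptions; PG used the ChatGPT web interface, not this code.

# Minimal sketch of asking a model to summarize pasted text via the
# OpenAI Python client (openai >= 1.0). Model name, prompt, and file
# name are assumptions for illustration, not what PG actually ran.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("blog_post.txt", encoding="utf-8") as f:  # hypothetical file holding the article text
    article_text = f.read()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed model; the ChatGPT web UI does not expose its model name
    messages=[
        {
            "role": "user",
            "content": "Summarize the following article in one paragraph:\n\n" + article_text,
        }
    ],
)

print(response.choices[0].message.content)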

7 thoughts on “Abstracts written by ChatGPT fool scientists”

  1. When research papers were mentioned the other day I remembered this paper that came out in 2012.

    Theory of the Origin, Evolution, and Nature of Life
    https://www.researchgate.net/publication/233860963_Theory_of_the_Origin_Evolution_and_Nature_of_Life

    It caused quite a stir when it came out. I was excited while reading it until I started getting that “dizzy” feeling I get when reading artificially generated text. At the time I harvested a huge number of articles discussing the paper. This captures the event best.

    The comparison to jabberwocky is inevitable
    https://freethoughtblogs.com/pharyngula/2012/01/27/the-comparison-to-jabberwocky-is-inevitable/

    I have a folder full of the discussion that I need to read through again.

    Thanks…

    BTW, if people think that scientists will not use ChatGPT to generate abstracts for papers, they have no clue about the amount of money out there for research. They will also use ChatGPT, with extensive editing, to generate the final paper.

    Why Most Published Research Findings Are False
    https://en.wikipedia.org/wiki/Why_Most_Published_Research_Findings_Are_False

  2. Maybe I’m missing something here (wouldn’t surprise me) but doesn’t an actual human being have to post or submit the abstract to the publication or committee or other powers that be? So if someone used AI to create the abstract and it was not truthful, the submitter would have to be either deliberately deceptive or extremely careless.

  3. I’m not worried about this. Given the nature of the research paper abstracts I’ve read – admittedly mostly from physics and astronomy – I would have thought that they were one of the easier objects for ChatGPT to fabricate. It is a little amusing – and says something (I’m not quite sure what) about the formalised nature of abstracts – that the experts thought that 14% of the real abstracts were fakes.

  4. I’m not particularly concerned about this – the map of the abstract is not the territory of the paper.

    Far more worrisome, and for several years now, are the bogus papers that are generated, most of them without the benefit of what we call “AI.” Particularly prevalent in what Jerry Pournelle called the “voodoo sciences,” although the phenomenon is also slowly creeping into the hard sciences. One merely has to look at sites like https://retractionwatch.com/ to become quite concerned.

    Interesting here – although one must take the source of this article with an entire cow lick of salt – is this: https://www.theguardian.com/science/2023/jan/26/science-journals-ban-listing-of-chatgpt-as-co-author-on-papers.

    • Bogus papers are mostly a problem as opportunity cost. The usual reason to write one is to pad your C.V. But this only works if you keep a low profile. You don’t send your bogus paper to Nature. You send it to some third-tier journal in your sub-sub-specialty. This in turn only works to enhance your C.V. because no one outside that sub-sub-specialty knows whether the journal is any good or not. So it gets published, added to the C.V., and forgotten. This happens all the time, and is not a new phenomenon.

      True story: my brother was an academic chemist. A literature search for a project he was working on turned up a relevant paper in one of those obscure journals. Not being a fool, he attempted to replicate it. This was unsuccessful, but this is not unusual. It is hard to exactly replicate the conditions leading to the conclusion. So he followed the usual procedure of contacting the author of the paper to discuss the matter, and perhaps share data. The initial contact was very collegial, but when it came down to brass tacks, the guy went dark on him. After a few attempts, it dawned on my brother that the paper was bogus. It likely started out as legit research, but when nothing came of it the guy fudged the data to have something publishable. There is no real down side to this, professionally speaking. The expectation was that it would be published and forgotten, having served its purpose. And really, even in this instance, this was true. My brother could in principle have gone on a crusade, but all he really had was that he couldn’t replicate the results of a minor paper in a minor journal. This is to say, he had nothing of note, and so he just passed over it and went on with his own project.

      The point of this story is that this sort of thing goes on all the time, and has for a long time, around the margins. It is a distraction for people like my brother who might try to build on it, but even this rarely happens.

      • In the day job it was a running joke that “one data point is good for ten papers”. But that was one *good* point. 😀

        Early days, I was handed a model of a turbine blade design to simulate that kept “blowing up”, going to infinity and crashing the software. I managed to isolate the physical location of the failure point, and a model of the design was tested in a flow visualization tunnel (I was invited to sit in on the test runs. Fun stuff, really.) Lo and behold, the tests verified flow separation of the boundary layer at the exact point where the software choked. A modified model (a millimeter or so of change) tweaked the blade shape, and both the simulation and the tunnel proved that the problem went away. The blade shape came from a multimillion-dollar test facility meant to push the state of the art. Which had…malfunctioned catastrophically.
        I lost count at three derivative papers. I was third author on the first.

        Bad papers aren’t always trivial.

        A lot of bad policy stems from bad, incomplete, or outright fake data.
