For your health, avoid ChatGPT, researchers insist
Researchers at CHU Sainte-Justine and the Montreal Children's Hospital tested the famous chatbot and found a very high error rate: fabricated scientific references, bad advice, inaccurate information, erroneous quotations...
WHAT YOU NEED TO KNOW
• Researchers at CHU Sainte-Justine and the Montreal Children's Hospital tested ChatGPT.
• They found numerous falsehoods, including five "major" factual errors.
• The chatbot fabricated 70% of the references it provided.
It was already known that ChatGPT and other artificial intelligence chatbots could generate falsehoods, but the consequences are much more serious when human health is at stake.
"I was a little surprised," admitsDr. Jocelyn Gravel, an emergency physician at CHU Sainte-Justine and lead author of the study that has just been published. "ChatGPT writes very well, but sometimes he talks wrong, and it doesn't show!"
His colleague and co-author of the study, Dr. Esli Osmanlliu, an emergency physician at the Montreal Children's Hospital and a scientist at the Research Institute of the McGill University Health Centre, adds that this is a reminder that ChatGPT "isn't going to fix everything."
"These models were developed to predict that the next word is going to be x, y or z," he explains. They were not developed for factual accuracy. »
"An appearance of intelligence"To evaluate the software, the researchers proceeded in two stages. They first asked ChatGPT 20 medical questions, drawn from 20 recently published scientific papers. They asked him to respond by providing references. Then, they submitted the responses to the authors of the articles in question to assess the quality and accuracy of the responses, on a scale of 0 to 100%.
"We could have evaluated the answers ourselves,"says Dr. Gravel. But I thought that the best expert on a research topic is someone who has just published on that topic. »
In the end, five "major" factual errors were found in the responses, and 70% of the references provided were fabricated.
At the end of the exercise, ChatGPT's median score was 60%.
"There were enough mistakes to remind people how important it is to cross-check sources and verify information."
Dr. Esli Osmanlliu, emergency physician at the Montreal Children's Hospital and scientist at the Research Institute of the McGill University Health Centre
"Sometimes ChatGPT is presented to us as a machine that acts like a human,"adds Dr. Osmanlliu. But we understand that in the end, it is a brute force that gives an appearance of intelligence. Will it evolve? Will it get better? Certainly. But it is not yet a tool with intelligence. »
Falsehoods
One of the questions asked was: What is the standard treatment for a child with eosinophilic esophagitis (an inflammation of the esophagus)?
In its response, ChatGPT suggested cortisone injections. However, this drug should not be injected but swallowed, says Dr. Gravel.
Another question concerned the global mortality rate associated with Shigella infections.
"Experts say there are probably 100,000 deaths a year because of Shigella, but ChatGPT said there are a million," the researcher reports. It's still a factor of 10! Maybe ChatGPT is right, but that's not what the experts say. »
The tool gave varying explanations when asked about the accuracy of the references it provided.
In one case, it said that "references are available on PubMed," a site that indexes the world's scientific articles on a given topic. But the web link provided led to other publications with no connection to the question.
As if it knew everything
The problem, Dr. Gravel notes, is that ChatGPT behaves as if it knows the answer to every question.
"When you interact with him, you really feel like you're texting someone," he says. But if we talk to someone, we ask him a question and he doesn't know the answer, he'll tell us: I think that's it, but I'm not sure...
"ChatGPT is not that. We ask him a question, he gives us an answer. The answer may be in the field completely and you don't see the difference. And it's super well written. It sounds very convincing, but it's not based on anything. I wouldn't say he's a compulsive liar, but he almost always has an answer. He will rarely say: I don't have an answer to that question. He will give an answer, and if it is not good, he will say: "I sometimes make mistakes, it is not more serious than that." I find that a little worrying. »