
AI Better Than Humans At Healthcare Questions

The AI gave longer and more detailed responses.


Scientists have compared doctor-written and chatbot-generated responses to healthcare-related questions, and the results don’t look good for Team Human [1].

Ask your doctor?

Access to healthcare has been linked to longer lifespans, and good healthcare often starts with a good initial consultation. A group of researchers, including scientists from the University of California, San Diego and the company Human Longevity, set out to determine whether AI can handle such consultations better than human beings can. The results were published in JAMA Internal Medicine.

The researchers based their study on 195 real-life exchanges from the Reddit forum r/AskDocs. In all instances, the initial questions asked by users were answered by verified physicians; questions answered by other healthcare professionals were omitted on the premise that a response by a licensed physician constitutes a better benchmark. The researchers then posed the same questions to ChatGPT 3.5, the version that has been publicly available since November 2022. Each question was asked in a new chat session.
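
The paper does not specify the exact tooling used to query the model, but this step is easy to reproduce programmatically. Below is a minimal sketch using OpenAI's Python client; the "gpt-3.5-turbo" model name and the sample question are assumptions for illustration, since the study itself used the ChatGPT interface available at the time.

```python
# Minimal sketch: pose each question in a fresh chat session.
# Assumes OpenAI's official Python client (openai >= 1.0) and the
# "gpt-3.5-turbo" model; this approximates, not replicates, the setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_fresh_session(question: str) -> str:
    # A call with no prior messages is the equivalent of a new chat
    # session: the model sees no history from earlier questions.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# Hypothetical example question, not one taken from the study's dataset
print(ask_fresh_session("I hit my head and have a headache. Should I see a doctor?"))
```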

Both the responses provided by human physicians and those provided by the AI model were then evaluated by a team of licensed healthcare professionals using several criteria. The evaluators considered “the quality of information provided” (very poor, poor, acceptable, good, or very good) and “the empathy or bedside manner provided” (not empathetic, slightly empathetic, moderately empathetic, empathetic, or very empathetic). The responses were, of course, presented in random order, stripped of any identifying information such as “I am an AI model”, and labeled “Response 1” and “Response 2”. To decrease the possibility of bias, each case was presented to three different evaluators for a total of 585 evaluations.
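
To make the aggregation concrete, here is a minimal sketch of how such blinded ratings roll up into per-responder averages and a preference rate. The label-to-score mapping follows the five-point quality scale above; the sample evaluations and field names are hypothetical, not study data.

```python
# Sketch: each case is rated by several blinded evaluators; the
# five-point labels map to scores 1-5. Data below is illustrative only.
QUALITY = {"very poor": 1, "poor": 2, "acceptable": 3, "good": 4, "very good": 5}

# One entry per (case, evaluator): quality labels for the two blinded
# responses, plus a key recording which one was the chatbot's.
evaluations = [
    {"response_1": "good", "response_2": "very good", "chatbot": "response_2"},
    {"response_1": "acceptable", "response_2": "good", "chatbot": "response_2"},
    {"response_1": "very good", "response_2": "poor", "chatbot": "response_1"},
]

bot_scores, doc_scores, prefer_bot = [], [], 0
for e in evaluations:
    bot_key = e["chatbot"]
    doc_key = "response_1" if bot_key == "response_2" else "response_2"
    bot, doc = QUALITY[e[bot_key]], QUALITY[e[doc_key]]
    bot_scores.append(bot)
    doc_scores.append(doc)
    prefer_bot += bot > doc

print(f"chatbot mean quality:   {sum(bot_scores) / len(bot_scores):.2f}")
print(f"physician mean quality: {sum(doc_scores) / len(doc_scores):.2f}")
print(f"chatbot preferred in {prefer_bot / len(evaluations):.0%} of evaluations")
```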

The machine prevails

The differences between the human-generated and the machine-generated responses began with their length. The AI gave significantly longer answers on average (211 words vs 52 words). Human professionals were not inclined to engage in a prolonged conversation: 94% of the exchanges contained a single response from the physician.


The evaluators preferred the chatbot response in a staggering 78.6% of cases. Chatbot responses received an average score of 4.13 (better than “good”), while human responses averaged 3.26 (worse than “good”). Moreover, 27% of human responses, but only 2.6% of machine responses, were rated below “acceptable” (a score of less than 3). ChatGPT also cleanly defeated human physicians in the share of responses rated “good” or “very good”: 75.5% vs only 22% for Team Human.

As if this wasn’t enough, chatbot responses were also found to be significantly more empathetic (3.65 vs 2.15). A full 80.5% of human responses, but just 15% of chatbot responses, scored below “moderately empathetic” (less than 3). Chatbot responses were also almost 10 times likelier to be rated “empathetic” or “very empathetic”.

Let’s ask the chatbot

To explain these staggering results, we asked ChatGPT 4 for its own analysis.

[Screenshot: ChatGPT’s explanation of the results]

The chatbot then offered several important caveats:

[Screenshot: ChatGPT’s caveats]

Moreover, ChatGPT was empathetic enough to offer some comfort to us humans and show understanding of the circumstances many healthcare professionals find themselves in:


[Screenshot: ChatGPT’s message of support for healthcare professionals]

The researchers mention several limitations of their study, the most important being that an exchange on an online forum does not recapitulate a face-to-face dialogue between a patient and a physician. In such a dialogue, the physician can expand on the topic, ask follow-up questions, provide increasingly relevant information, and probably be more empathetic as well.

Additionally, the sample size was limited, and some of the co-authors were also on the evaluation team, which might have created bias despite the study’s blind design. Finally, it is possible that not all human physicians in the study were native English speakers, and the language barrier could have added to the impression of brevity and dispassion.

Conclusion

Based on the results of this study, the researchers call for evaluating the possibility of integrating chatbots into clinical settings. While chatbots cannot replace human healthcare professionals (at least for now), they might, the authors suggest, be employed to draft responses to patient messages that would then be edited and approved by human staff.

In developing countries, where people often have only limited access to human healthcare professionals, chatbots might be even more important for providing initial assessment and assistance. Last but not least, chatbot-generated responses to healthcare questions might be able to counteract the copious amounts of incomprehensible, contradictory, or plainly misleading information that a regular web search often yields.


Literature

[1] Ayers JW, Poliak A, Dredze M, et al. Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum. JAMA Intern Med. Published online April 28, 2023.


About the author

Arkadi Mazin

Arkadi is a seasoned journalist and op-ed author with a passion for learning and exploration. His interests span from politics to science and philosophy. Having studied economics and international relations, he is particularly interested in the social aspects of longevity and life extension. He strongly believes that life extension is an achievable and noble goal that has yet to take its rightful place at the very top of our civilization’s agenda – a situation he is eager to change.