AI Beats Humans in Answering Healthcare-Related Questions

The AI gave longer and more detailed responses.



Scientists have compared doctor-written and chatbot-generated responses to healthcare-related questions, and the results don’t look good for Team Human [1].

Ask your doctor?

Access to healthcare has been linked to longer lifespans, and good healthcare often starts with a good initial consultation. A group of researchers, including scientists from the University of California, San Diego and from the company Human Longevity, set out to determine whether AI might be able to handle such consultations better than human beings. The results were published in JAMA Internal Medicine.

The researchers based their study on 195 real-life exchanges from the Reddit forum r/AskDocs. In all instances, the initial questions asked by users were answered by verified physicians. Questions answered by other healthcare professionals were omitted on the premise that a response by a licensed physician constitutes a better benchmark. The researchers then posed the same questions to ChatGPT 3.5, the version publicly available since November 2022. Each question was asked in a new chat session.

Both the responses provided by human physicians and those provided by the AI model were then evaluated by a team of licensed healthcare professionals using several criteria. The evaluators considered “the quality of information provided” (very poor, poor, acceptable, good, or very good) and “the empathy or bedside manner provided” (not empathetic, slightly empathetic, moderately empathetic, empathetic, and very empathetic). The responses were, of course, randomized, stripped of any identifying information such as “I am an AI model”, and labeled “Response 1” and “Response 2”. To decrease the possibility of bias, each case was presented to three different teams of healthcare professionals for a total of 585 evaluations.

The machine prevails

The differences between the human-generated and the machine-generated responses began with their length. The AI gave significantly longer answers on average (211 words vs 52 words). Human professionals were not inclined to engage in a prolonged conversation: 94% of the exchanges contained a single response from the physician.

The evaluators preferred the chatbot response in a staggering 78.6% of cases. Chatbot responses received an average quality score of 4.13 (better than “good”), while human responses averaged 3.26 (worse than “good”). Moreover, 27% of human responses, but only 2.6% of machine responses, were rated below “acceptable” (less than 3). ChatGPT also cleanly defeated human physicians in the percentage of responses rated “good” or “very good”: 75.5% vs only 22% for Team Human.

As if this wasn’t enough, chatbot responses were also found to be significantly more empathetic (3.65 vs 2.15). A full 80.5% of human responses, but just 15% of chatbot responses, scored below “moderately empathetic” (less than 3). Chatbot responses were also almost 10 times likelier to be rated “empathetic” or “very empathetic”.

Let’s ask the chatbot

To explain these staggering results, we asked ChatGPT version 4.0 for its own analysis.

[Screenshot: ChatGPT’s analysis]

The chatbot then offered several important caveats:

[Screenshot: ChatGPT’s caveats]

Moreover, ChatGPT was empathetic enough to offer some comfort to us humans and show understanding of the circumstances many healthcare professionals find themselves in:

[Screenshot: ChatGPT’s words of comfort]

The researchers mention several limitations of their study, the most important being that an exchange on an online forum does not recapitulate a face-to-face dialogue between a patient and a physician. In such a dialogue, the physician can expand on the topic, ask follow-up questions, provide increasingly relevant information, and probably be more empathetic as well.

Additionally, the sample size was limited, and some of the co-authors were also on the evaluation team, which might have created bias despite the study’s blind design. Finally, it is possible that not all human physicians in the study were native English speakers, and the language barrier could have added to the impression of brevity and dispassion.


Based on the results of this study, the researchers call for evaluating the integration of chatbots into clinical settings. While chatbots cannot replace human healthcare professionals (at least for now), they might, the authors suggest, be employed to draft messages to patients, which would then be edited and approved by human staff.

In developing countries, where people often have only limited access to human healthcare professionals, chatbots might be even more important for providing initial assessment and assistance. Last but not least, chatbot-generated responses to healthcare questions might be able to counteract the copious amounts of incomprehensible, contradictory, or plainly misleading information that a regular web search often yields.




[1] Ayers JW, Poliak A, Dredze M, et al. Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum [published online ahead of print, 2023 Apr 28]. JAMA Intern Med. 2023

About the author
Arkadi Mazin

Arkadi is a seasoned journalist and op-ed author with a passion for learning and exploration. His interests span from politics to science and philosophy. Having studied economics and international relations, he is particularly interested in the social aspects of longevity and life extension. He strongly believes that life extension is an achievable and noble goal that has yet to take its rightful place on the very top of our civilization’s agenda – a situation he is eager to change.