Abstract
Purpose/Background:
ChatGPT, powered by the generative pre-trained transformer (GPT) version 3.5, has emerged as a single, accessible source of apparently authoritative patient information, in contrast to the multiple, often contradictory results returned by internet search engines. While ChatGPT has purported benefits as a source of patient education and information, its actual capability needs careful evaluation. Moreover, the emergence of paid subscription access to GPT-4 promises further enhanced capabilities that also require evaluation.
Methods:
A series of 50 randomly selected nuclear medicine reports was split into two groups of 25 reports. The reports covered a wide range of nuclear medicine procedures, from simple to complex and from general nuclear medicine to PET/CT. Group 1 reports were provided to GPT-3.5 and group 2 reports to GPT-4, with the express prompt of translating the clinical report into a patient-facing summary of no more than three sentences. After completion, the groups were switched, with group 2 entered into GPT-3.5 and group 1 into GPT-4 using the same prompt. The ChatGPT outputs were evaluated by three nuclear medicine physicians, each blinded to which version of GPT produced the summary and to the other observers' assessments, and were assessed for adherence to the three-sentence limit, appropriateness of patient-facing language, accuracy of translation, important omissions, use of jargon, and overall fitness for purpose. A minimal sketch of this prompting protocol is shown below.
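The sketch below illustrates one way a clinical report could be submitted with an instruction like the one described above. It assumes the OpenAI Python client and illustrative model names; the study itself entered reports into ChatGPT directly, and the exact prompt wording and the placeholder report text here are assumptions, not the authors' materials.

```python
# Minimal sketch (assumptions noted above), using the OpenAI Python client (openai >= 1.0).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = (
    "Translate the following nuclear medicine report into a patient-facing "
    "summary of no more than three sentences:\n\n{report}"
)

def patient_summary(report_text: str, model: str = "gpt-4") -> str:
    """Return a plain-language summary of a clinical report from the given model."""
    response = client.chat.completions.create(
        model=model,  # e.g. "gpt-3.5-turbo" or "gpt-4"; illustrative choices only
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(report=report_text)}],
    )
    return response.choices[0].message.content

# Example with a hypothetical placeholder report:
# print(patient_summary("Whole-body bone scan: no scintigraphic evidence of osseous metastases."))
```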
Results:
ChatGPT powered by GPT-3.5 showed low adherence to the three-sentence limit (14%), low appropriateness of patient-facing language (18%), low accuracy of translation (26%), and high use of jargon (78%). Conversely, GPT-4 demonstrated 100% adherence to the three-sentence limit, higher appropriateness of patient-facing language (76%), higher accuracy of translation (46%), and low use of jargon (2%). While GPT-4 had a tendency to introduce inappropriate language (e.g. “the good news”), it was considered more fit for purpose (90%) than GPT-3.5 (68%). Fitness for purpose was deemed poorer for both GPT-3.5 and GPT-4 in more complex procedures such as PET/CT (74% and 44%, respectively) than in less complex procedures such as bone scans (26% and 4%, respectively). While GPT-3.5 made greater use of medical jargon, GPT-4 was more prone to introducing inappropriate language (28% versus 6%), omitting key information (12% versus 8%), and misinterpreting results (6% versus 4%).
Conclusion:
While generative artificial intelligence may help bridge the gap between nuclear medicine reports and patient understanding, both GPT-3.5 and GPT-4 demonstrated concerning limitations that could create harmful misunderstanding or confusion. Although GPT-4 provided greater fitness for purpose than GPT-3.5, it remains prone to errors, which suggests that its use for report translation should be undertaken with care and with health literacy support (e.g. from a primary care physician). GPT-3.5 is the version most likely to be used for clinical report translation because it is available without a paid subscription, yet it is not fit for purpose.
| Original language | English |
| --- | --- |
| Publication status | Published - 2024 |
| Event | 2024 SNMMI Annual Meeting - Metro Toronto Convention Centre, Toronto, Canada. Duration: 08 Jun 2024 → 11 Jun 2024. Meeting website: https://web.archive.org/web/20240611155505/https://sites.snmmi.org/SNMMI-AM/Home.aspx?hkey=2f9701b4-43dd-4988-9b11-d5f2d3bc118c&WebsiteKey=2ac95882-cb13-4b7d-aabc-6f7c0ef60252 |
Conference
| Conference | 2024 SNMMI Annual Meeting |
| --- | --- |
| Country/Territory | Canada |
| City | Toronto |
| Period | 08/06/24 → 11/06/24 |
| Other | The premier educational, scientific, research, and networking event in nuclear medicine and molecular imaging, the SNMMI Annual Meeting provides physicians, technologists, pharmacists, laboratory professionals, and scientists with an in-depth view of the latest research and development in the field as well as providing insights into practical applications for the clinic. |