TY - JOUR
T1 - GPT-4 in Nuclear Medicine Education
T2 - Does It Outperform GPT-3.5?
AU - Currie, Geoffrey M.
N1 - Publisher Copyright:
© 2023 Society of Nuclear Medicine Inc. All rights reserved.
PY - 2023/12/1
Y1 - 2023/12/1
N2 - The emergence of ChatGPT has challenged academic integrity in teaching institutions, including those providing nuclear medicine training. Although previous evaluations of ChatGPT have suggested a limited scope for academic writing, the March 2023 release of generative pretrained transformer (GPT)-4 promises enhanced capabilities that require evaluation. Methods: Examinations (final and calculation) and written assignments for nuclear medicine subjects were tested using GPT-3.5 and GPT-4. GPT-3.5 and GPT-4 responses were evaluated by Turnitin software for artificial intelligence scores, marked against standardized rubrics, and compared with the mean performance of student cohorts. Results: ChatGPT powered by GPT-3.5 performed poorly in calculation examinations (31.4%), compared with GPT-4 (59.1%). GPT-3.5 failed each of 3 written tasks (39.9%), whereas GPT-4 passed each task (56.3%). Conclusion: Although GPT-3.5 poses a minimal risk to academic integrity, its usefulness as a cheating tool can be significantly enhanced by GPT-4 but remains prone to hallucination and fabrication.
KW - academic integrity
KW - artificial intelligence
KW - generative algorithms
KW - higher education
KW - patient education
KW - scientific writing
UR - http://www.scopus.com/inward/record.url?scp=85179011911&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85179011911&partnerID=8YFLogxK
U2 - 10.2967/jnmt.123.266485
DO - 10.2967/jnmt.123.266485
M3 - Article
C2 - 37852647
AN - SCOPUS:85179011911
SN - 0091-4916
VL - 51
SP - 314
EP - 317
JO - Journal of Nuclear Medicine Technology
JF - Journal of Nuclear Medicine Technology
IS - 4
ER -