TY - JOUR
T1 - Sci2Txt: Automated report generation of low-resolution SPECT bone scintigrams using spatial-position-aware and hierarchical features
AU - Song, Tao
AU - Lin, Qiang
AU - Zeng, Xianwu
AU - Cao, Yongchun
AU - Man, Zhengxing
AU - Liu, Caihong
AU - Cai, Zhengqi
AU - Gui, Chun
AU - Huang, Xiaodi
PY - 2025/7
Y1 - 2025/7
N2 - In clinical practice, writing diagnostic reports from low-resolution, large-scale bone scintigrams poses a significant burden on nuclear medicine physicians. While deep learning-based automated report generation has shown promise in reducing diagnostic oversights and alleviating physician workload, most existing methods are tailored for high-resolution X-ray images. However, bone scintigrams exhibit unpredictable characteristics in location, size, and shape, which complicates accurate report generation. To address this challenge, we introduce Sci2Txt, a novel encoder-decoder architecture for the automated generation of diagnostic reports from bone scintigrams. Sci2Txt incorporates three innovative components: (1) a Spatial-Position Visual Feature Extractor (SPVFE) that captures multi-scale spatial position information from low-resolution images; (2) a Hierarchical Fusion Encoder (HFE) that integrates low- and high-level semantic features through cross-level feature splicing and nonlinear transformations; and (3) a Memory-driven Transformer Decoder (MTD) that generates coherent and clinically accurate reports. Evaluated on a dataset of 2,091 clinical bone scintigrams, Sci2Txt outperforms existing methods in both traditional natural language generation metrics and the newly proposed hierarchical Clinical Efficacy (CE) metric. By enabling efficient and accurate automated reporting, Sci2Txt offers a practical solution for enhancing diagnostic workflows in the detection of bone metastases.
KW - Nuclear medicine
KW - Functional medical image
KW - Bone scintigram
KW - Diagnostic report generation
KW - Deep learning
UR - https://www.scopus.com/pages/publications/105011968051
UR - https://www.scopus.com/pages/publications/105011968051#tab=citedBy
U2 - 10.1016/j.bspc.2025.108359
DO - 10.1016/j.bspc.2025.108359
M3 - Article
SN - 1746-8094
VL - 111
SP - 1
EP - 15
JO - Biomedical Signal Processing and Control
JF - Biomedical Signal Processing and Control
M1 - 108359
ER -