Enhancing Radiology Report Generation via Multi-Phased Supervision

IEEE Trans Med Imaging. 2025 Jun 25:PP. doi: 10.1109/TMI.2025.3580659. Online ahead of print.

Abstract

Radiology report generation using large language models has recently produced reports with more realistic styles and better language fluency. However, their clinical accuracy remains inadequate. Considering the significant imbalance between clinical phrases and general descriptions in a report, we argue that using an entire report for supervision is problematic as it fails to emphasize the crucial clinical phrases, which require focused learning. To address this issue, we propose a multi-phased supervision method, inspired by the spirit of curriculum learning where models are trained by gradually increasing task complexity. Our approach organizes the learning process into structured phases at different levels of semantical granularity, each building on the previous one to enhance the model. During the first phase, disease labels are used to supervise the model, equipping it with the ability to identify underlying diseases. The second phase progresses to use entity-relation triples to guide the model to describe associated clinical findings. Finally, in the third phase, we introduce conventional whole-report-based supervision to quickly adapt the model for report generation. Throughout the phased training, the model remains the same and consistently operates in the generation mode. As experimentally demonstrated, this proposed change in the way of supervision enhances report generation, achieving state-of-the-art performance in both language fluency and clinical accuracy. Our work underscores the importance of training process design in radiology report generation. Our code is available on https://github.com/zailongchen/MultiP-R2Gen.