In orthodontics and maxillofacial surgery, accurate cephalometric analysis and treatment outcome prediction are critical for clinical decision-making. Traditional approaches rely on manual landmark identification, which is time-consuming and subject to inter-observer variability, while existing automated methods typically rely on a single imaging modality and achieve limited accuracy. This paper presents DeepFuse, a novel multi-modal deep learning framework that integrates information from lateral cephalograms, CBCT volumes, and digital dental models to simultaneously perform landmark detection and treatment outcome prediction. The framework employs modality-specific encoders, an attention-guided fusion mechanism, and dual-task decoders to leverage complementary information across imaging techniques. Extensive experiments on three clinical datasets demonstrate that DeepFuse achieves a mean radial error of 1.21 mm for landmark detection, a 13% improvement over state-of-the-art methods, with a clinical acceptability rate of 92.4% at the 2 mm threshold. For treatment outcome prediction, the framework attains an overall accuracy of 85.6%, significantly outperforming both conventional prediction models and experienced clinicians. The proposed approach enhances diagnostic precision and treatment planning while providing interpretable visualization of decision factors, demonstrating significant potential for clinical integration in orthodontic and maxillofacial practice.
Keywords: Attention mechanism; Cephalometric analysis; Landmark detection; Multi-modal deep learning; Orthodontics; Treatment outcome prediction.
© 2025. The Author(s).
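The abstract's pipeline (modality-specific encoders, attention-guided fusion, dual-task decoders) can be sketched in miniature. The following NumPy snippet is an illustrative toy, not the paper's implementation: the feature dimensions, the single-linear-layer "encoders", the landmark count, and the outcome class count are all hypothetical placeholders chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # shared feature dimension (hypothetical)

def encode(x, W):
    # Stand-in for a modality-specific encoder: one linear map + ReLU.
    return np.maximum(x @ W, 0.0)

def attention_fuse(feats, w_att):
    # Attention-guided fusion: score each modality's feature vector,
    # softmax the scores, and take the attention-weighted sum.
    scores = np.array([f @ w_att for f in feats])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    fused = sum(w * f for w, f in zip(weights, feats))
    return fused, weights

# Toy inputs for the three modalities: lateral cephalogram, CBCT volume,
# digital dental model (flattened to vectors of invented sizes).
x_ceph, x_cbct, x_dental = (rng.normal(size=n) for n in (64, 128, 32))
W_ceph, W_cbct, W_dental = (rng.normal(size=(n, d)) for n in (64, 128, 32))

feats = [encode(x_ceph, W_ceph),
         encode(x_cbct, W_cbct),
         encode(x_dental, W_dental)]
fused, weights = attention_fuse(feats, rng.normal(size=d))

# Dual-task decoders: landmark regression and outcome classification.
n_landmarks, n_classes = 19, 3  # hypothetical counts
landmarks = fused @ rng.normal(size=(d, 2 * n_landmarks))  # (x, y) per landmark
outcome_logits = fused @ rng.normal(size=(d, n_classes))
```

The softmax weights make the fusion interpretable: they indicate how much each modality contributed to a given prediction, which is one simple way the "interpretable visualization of decision factors" mentioned in the abstract could be realized.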