Background: Immunohistochemistry (IHC) is a critical tool for tumor diagnosis and treatment, but it is time- and tissue-consuming and highly dependent on skilled laboratory technicians. Deep learning-based IHC biomarker prediction models have recently been widely developed, but few investigations have explored their effectiveness in clinical application.
Methods: In this study, we aimed to create an automatic pipeline for constructing deep learning models that generate artificial intelligence-based IHC (AI-IHC) output from H&E whole slide images (WSIs), and to compare pathologists' reports based on AI-IHC with those based on conventional IHC. We obtained 134 WSIs, including paired H&E and IHC slides, and automatically extracted 415,463 tiles from the H&E slides for model construction by transferring annotations from the IHC slides. Five IHC biomarker prediction models (P40, Pan-CK, Desmin, P53, Ki-67) were developed to support a range of clinically relevant diagnostic applications across gastrointestinal cancer subtypes, including esophageal, gastric, and colorectal cancers. The Ki-67 proliferation index was quantitatively assessed using digital image analysis.
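As a minimal sketch of the quantity being assessed, the Ki-67 proliferation index is conventionally the percentage of tumor nuclei that stain positive. The function below assumes that nuclei have already been detected and counted by some upstream image analysis step; the function name and its inputs are hypothetical and are not taken from the study's pipeline.

```python
def ki67_index(positive_nuclei: int, total_nuclei: int) -> float:
    """Return the Ki-67 proliferation index as a percentage.

    Assumption: both counts come from an upstream nucleus detection step
    that is not shown here (hypothetical inputs, not the study's pipeline).
    """
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    if not 0 <= positive_nuclei <= total_nuclei:
        raise ValueError("positive_nuclei must lie between 0 and total_nuclei")
    # Percentage of counted tumor nuclei that are Ki-67 positive.
    return 100.0 * positive_nuclei / total_nuclei


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    print(ki67_index(30, 200))
```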
Results: The AUCs of the five IHC biomarker models ranged from 0.90 to 0.96, and the accuracies ranged from 83.04% to 90.81%. An additional 150 WSIs from 30 patients were collected to assess the effectiveness of AI-IHC in a multi-reader multi-case (MRMC) study. Each case was read by three pathologists, once on AI-IHC and once on conventional IHC, with a minimum 2-week washout period. The consistency rates between pathologists' readings of AI-IHC and conventional IHC were high for Desmin, Pan-CK, and P40 (96.67-100%) and moderate for P53 (70.00%). We also evaluated T stage based on the staining of these IHC biomarkers; the consistency rate was 86.36%. Furthermore, the Ki-67 proliferation index reported by AI-IHC showed a variability of 17.35% ± 16.2% relative to conventional IHC, with an intraclass correlation coefficient (ICC) of 0.415 (P = 0.015) between the two groups.
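The two agreement measures reported above can be illustrated with a short sketch. It assumes that the consistency rate is the percentage of paired readings with identical calls, and it uses the two-way random-effects, absolute-agreement, single-measurement form ICC(2,1); the abstract does not state which ICC form was used, so that choice, the function names, and the sample values are all assumptions.

```python
def consistency_rate(ai_calls, conventional_calls):
    """Percentage of paired readings with identical calls (assumed definition)."""
    if len(ai_calls) != len(conventional_calls) or not ai_calls:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ai_calls, conventional_calls))
    return 100.0 * matches / len(ai_calls)


def icc_2_1(scores):
    """ICC(2,1) for a table with one row per case and one column per method.

    Assumption: two-way random effects, absolute agreement, single measurement.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with >= 2 rows and >= 2 columns")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-case mean square
    msc = ss_cols / (k - 1)               # between-method mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (msr - mse) / denom


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(consistency_rate([1, 1, 0, 1], [1, 0, 0, 1]))
    print(icc_2_1([[10.0, 12.0], [35.0, 30.0], [60.0, 48.0]]))
```

Each row passed to `icc_2_1` would hold one case's Ki-67 index under the two methods (AI-IHC, conventional IHC).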
Conclusions: Here, we leveraged automatic tile-level annotations from H&E slides to efficiently develop deep learning-based IHC biomarker models, achieving AUCs between 0.90 and 0.96. AI-generated IHC showed substantial concordance with conventional IHC across most markers, supporting its potential as an assistive tool in routine diagnostics.
Keywords: Deep learning; Gastrointestinal cancers; Immunohistochemistry; Pathology; Whole-slide image.
© 2025. The Author(s).