Robust Cluster Prediction Across Data Types Validates Association of Sex and Therapy Response in GBM

Cancers (Basel). 2025 Jan 28;17(3):445. doi: 10.3390/cancers17030445.

Abstract

Background: Previous studies have described sex-specific patient subtyping in glioblastoma. We used the cluster labels from these "legacy data" to train a predictive model that can recapitulate the clustering in contemporary datasets.

Methods: We used robust ensemble machine learning to train a model on gene expression microarray data and to make predictions across platforms, including RNA-seq and potentially scRNA-seq.
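
As an illustration of the general idea only, the sketch below trains a small ensemble on microarray-scale values and predicts cluster labels for samples measured on a count-like scale. The abstract does not specify the normalization or the ensemble algorithm, so everything here is an assumption: within-sample rank normalization stands in for cross-platform harmonization, a bagged nearest-centroid classifier stands in for the ensemble, and all function names, gene columns, and values are hypothetical toy data rather than the authors' method or results.

```python
import random
from collections import Counter

def rank_normalize(sample):
    """Replace one sample's expression values with ranks scaled to [0, 1]."""
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    denom = max(len(sample) - 1, 1)
    ranks = [0.0] * len(sample)
    for r, i in enumerate(order):
        ranks[i] = r / denom
    return ranks

def fit_centroids(rows, labels, feats):
    """Mean rank profile per cluster label over a subset of genes."""
    sums, counts = {}, Counter(labels)
    for row, lab in zip(rows, labels):
        acc = sums.setdefault(lab, [0.0] * len(feats))
        for k, f in enumerate(feats):
            acc[k] += row[f]
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def nearest_label(centroids, feats, row):
    """Label whose centroid is closest in squared Euclidean distance."""
    def dist(lab):
        return sum((row[f] - c) ** 2 for f, c in zip(feats, centroids[lab]))
    return min(sorted(centroids), key=dist)

def train_ensemble(X, y, n_models=25, n_feats=4, seed=0):
    """Bagging: each member sees a stratified bootstrap and a random gene subset."""
    rng = random.Random(seed)
    ranked = [rank_normalize(s) for s in X]
    by_label = {}
    for i, lab in enumerate(y):
        by_label.setdefault(lab, []).append(i)
    models = []
    for _ in range(n_models):
        # Resample with replacement within each cluster label.
        idx = [rng.choice(m) for m in by_label.values() for _ in m]
        feats = rng.sample(range(len(ranked[0])), min(n_feats, len(ranked[0])))
        cents = fit_centroids([ranked[i] for i in idx], [y[i] for i in idx], feats)
        models.append((feats, cents))
    return models

def predict(models, sample):
    """Majority vote of all ensemble members on one rank-normalized sample."""
    row = rank_normalize(sample)
    votes = Counter(nearest_label(c, f, row) for f, c in models)
    return votes.most_common(1)[0][0]

# Hypothetical "legacy" training data on a microarray-like log-intensity scale.
X_train = [
    [12.0, 9.0, 8.0, 7.0, 4.0], [11.5, 8.0, 9.0, 7.5, 5.0], [12.2, 9.5, 8.1, 6.0, 4.4],
    [4.0, 8.0, 9.0, 7.0, 12.0], [5.0, 9.0, 8.0, 7.2, 11.0], [4.5, 7.9, 8.8, 7.0, 12.5],
]
y_train = ["A", "A", "A", "B", "B", "B"]
models = train_ensemble(X_train, y_train)

# Hypothetical new samples on an RNA-seq-like count scale.
print(predict(models, [5000, 800, 600, 300, 10]))   # A
print(predict(models, [12, 500, 700, 300, 4000]))   # B
```

Because ranks depend only on the ordering of genes within a sample, the same model can be applied to values on very different scales. Whether such a transformation is adequate for real microarray, RNA-seq, or scRNA-seq data is an empirical question that this toy example does not address.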

Results: The engineered feature set comprised many previously reported genes that are associated with patient prognosis. Interestingly, these well-known genes formed a predictive signature only for female patients; applying the same signature to male patients produced unexpected results.

Conclusions: This work demonstrates how annotated "legacy data" can be used to build robust predictive models capable of multi-target predictions across multiple platforms.

Keywords: GBM; clustering; disease subtyping; feature engineering; female; gene expression signatures; glioblastoma multiforme; machine learning.