Background: Previous studies have described sex-specific patient subtyping in glioblastoma. We used the cluster labels associated with these "legacy data" to train a predictive model capable of recapitulating this clustering in contemporary contexts.
Methods: We applied robust ensemble machine learning to gene microarray data to train a model that makes predictions across multiple platforms, including RNA-seq and potentially scRNA-seq.
Results: The engineered feature set comprised many previously reported genes associated with patient prognosis. Interestingly, these well-known genes formed a predictive signature only for female patients; applying the signature to male patients produced unexpected results.
Conclusions: This work demonstrates how annotated "legacy data" can be used to build robust predictive models capable of multi-target predictions across multiple platforms.
Keywords: GBM; clustering; disease subtyping; feature engineering; female; gene expression signatures; glioblastoma multiforme; machine learning.