Aggregating multimodal cancer data across unaligned embedding spaces maintains tumor of origin signal

bioRxiv [Preprint]. 2025 May 18:2025.05.14.653900. doi: 10.1101/2025.05.14.653900.

Abstract

AI-based embeddings offer the possibility of encoding complex biological data into low-dimensional spaces, called embedding spaces, that preserve the relationships between entities. It remains an open question whether embedding spaces created without any coordination are compatible with one another. It has been assumed that signal in such unaligned embedding spaces would be destroyed if vectors were aggregated by summation. To test this assumption, we trained embedding models across different data modalities and aggregated their embedded values. Our results show that signal from unaligned embeddings is conserved and can still be used for learning tasks such as data modality and tumor-of-origin recognition.
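The idea of summing vectors from unaligned embedding spaces can be illustrated with a toy sketch. This is not the authors' pipeline: the two "embedding models" here are hypothetical random linear maps drawn independently (so their coordinate systems are unaligned), the "tumor types" are synthetic class centers, and the nearest-centroid classifier is an assumed stand-in for the downstream learning task. All names and parameters (`run_demo`, `random_encoder`, `noise`, and so on) are illustrative.

```python
import random


def project(matrix, vec):
    """Apply a linear map (given as a list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_encoder(embed_dim, latent_dim, rng):
    """Hypothetical stand-in for a trained embedding model: a random linear map."""
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(embed_dim)]


def nearest_centroid(vec, centroids):
    """Return the label whose centroid is closest to vec (squared Euclidean distance)."""
    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[label]))
    return min(centroids, key=dist2)


def run_demo(seed=0, n_classes=3, latent_dim=5, embed_dim=16, n_per_class=40, noise=0.2):
    rng = random.Random(seed)
    # Assumed toy "tumor types": separated points in a shared latent space.
    centers = [[2.0 if i == k else 0.0 for i in range(latent_dim)] for k in range(n_classes)]
    # Two encoders drawn independently, so their embedding spaces are unaligned.
    enc_a = random_encoder(embed_dim, latent_dim, rng)
    enc_b = random_encoder(embed_dim, latent_dim, rng)

    def summed_embedding(label):
        # Each modality observes its own noisy view of the same sample.
        view_a = [c + rng.gauss(0.0, noise) for c in centers[label]]
        view_b = [c + rng.gauss(0.0, noise) for c in centers[label]]
        emb_a = project(enc_a, view_a)
        emb_b = project(enc_b, view_b)
        # Aggregate by element-wise summation, with no alignment step.
        return [a + b for a, b in zip(emb_a, emb_b)]

    def dataset():
        return [(summed_embedding(k), k) for k in range(n_classes) for _ in range(n_per_class)]

    train, test = dataset(), dataset()

    # Fit one centroid per class on the summed training vectors.
    centroids = {}
    for k in range(n_classes):
        vecs = [v for v, label in train if label == k]
        centroids[k] = [sum(col) / len(vecs) for col in zip(*vecs)]

    hits = sum(nearest_centroid(v, centroids) == label for v, label in test)
    return hits / len(test)


if __name__ == "__main__":
    print(f"accuracy on summed, unaligned embeddings: {run_demo():.2f}")
```

In this toy setting, chance accuracy is 1/3 for three classes, so any result well above that indicates that class signal survives the summation. The sketch says nothing about how real trained models behave; it only shows why summation need not erase signal when each encoder maps classes to consistently different regions.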

Keywords: Embedding; graph; multimodal; neural network.

Publication types

  • Preprint