Structure-Infused Protein Language Models

bioRxiv [Preprint]. 2024 Apr 23:2023.12.13.571525. doi: 10.1101/2023.12.13.571525.

Abstract

Embeddings from protein language models (PLMs) capture intricate patterns in protein sequences, enabling more accurate and efficient prediction of protein properties. Incorporating protein structure information as direct input into PLMs improves the predictive ability of protein embeddings on downstream tasks. In this work we demonstrate that indirectly infusing structure information into PLMs also leads to performance gains on structure-related tasks. The key difference between this framework and others is that, at inference time, the model does not require access to structure to produce its embeddings.
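The abstract describes the idea at a high level rather than the training recipe. As a purely illustrative sketch (not the authors' method), one common way to infuse structure indirectly is to add an auxiliary training loss that aligns sequence-only embeddings with embeddings derived from structure, so that structure is consumed only during training and never at inference. All module names, dimensions, and the cosine-alignment loss below are assumptions.

```python
import torch
import torch.nn as nn

class SequenceOnlyPLM(nn.Module):
    """Toy stand-in for a protein language model that sees only sequence tokens."""
    def __init__(self, vocab_size=33, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        h = self.encoder(self.embed(tokens))         # per-token representations
        return h, h.mean(dim=1)                      # (per-token, pooled per-protein)

model = SequenceOnlyPLM()
lm_head = nn.Linear(128, 33)   # stand-in for the usual language-modeling objective
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(lm_head.parameters()), lr=1e-4
)

def training_step(tokens, masked_targets, structure_embedding, alpha=0.5):
    """structure_embedding is precomputed from structures available only at training time."""
    per_token, pooled = model(tokens)
    # Standard sequence objective (masked-token prediction, details elided).
    logits = lm_head(per_token)                                  # (batch, seq_len, vocab)
    lm_loss = nn.functional.cross_entropy(logits.transpose(1, 2), masked_targets)
    # Auxiliary "structure infusion" loss: pull the sequence-only embedding
    # toward the structure-derived embedding (assumed cosine alignment).
    struct_loss = 1.0 - nn.functional.cosine_similarity(
        pooled, structure_embedding, dim=-1
    ).mean()
    loss = lm_loss + alpha * struct_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# At inference time, only sequence tokens are needed; structure is never consulted.
with torch.no_grad():
    _, embedding = model(torch.randint(0, 33, (1, 50)))
```

In this framing, the structure signal shapes the model's weights during training, while downstream users still obtain embeddings from sequence alone.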

Publication types

  • Preprint