Large Context, Deeper Insights: Harnessing Large Language Models for Advancing Protein-Protein Interaction Analysis

Kaicheng U; Sophia Meixuan Zhang; Suresh Pokharel; Pawel Pratyush; Farah Qaderi; Dongfang Liu; Junhan Zhao; Dukka B Kc; Siwei Chen

doi:10.1007/978-1-0716-4623-6_15

Methods Mol Biol. 2025:2941:243-267. doi: 10.1007/978-1-0716-4623-6_15.

Authors

Kaicheng U¹,², Sophia Meixuan Zhang³,⁴, Suresh Pokharel⁵, Pawel Pratyush⁵, Farah Qaderi⁶, Dongfang Liu⁷, Junhan Zhao⁸,⁹, Dukka B Kc¹⁰, Siwei Chen¹¹,¹²,¹³

Affiliations

¹ Tri-Institutional Computational Biology & Medicine, Weill Cornell Medicine, New York, NY, USA.
² Department of Computational Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
³ College of Agriculture and Life Sciences, Cornell University, Ithaca, NY, USA.
⁴ Harvard College, Harvard University, Cambridge, MA, USA.
⁵ Department of Computer Science, Rochester Institute of Technology, Rochester, NY, USA.
⁶ Department of Surgical Oncology, Massachusetts General Hospital, Boston, MA, USA.
⁷ Department of Computer Engineering, Rochester Institute of Technology, Rochester, NY, USA.
⁸ Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
⁹ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
¹⁰ Department of Computer Science, Golisano College of Computing and Information Sciences, Rochester Institute of Technology, Rochester, NY, USA.
¹¹ Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA. siwei@broadinstitute.org.
¹² Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. siwei@broadinstitute.org.
¹³ Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA. siwei@broadinstitute.org.

PMID: 40601262
DOI: 10.1007/978-1-0716-4623-6_15

Abstract

Protein-protein interactions (PPIs) are involved in nearly all biological processes. Understanding and analyzing PPIs is key to revealing biological networks and identifying new therapeutic targets. Various computational approaches have been proposed as alternatives to the experimental investigation of PPIs. More recently, with the advent of large language models (LLMs), many LLM-based approaches have been developed, enabling efficient analysis of interaction networks and binding sites directly from protein sequences. These models capture intricate biological patterns and offer scalability and adaptability across diverse datasets. However, challenges remain, including computational costs, data imbalance, and the integration of multimodal information. Addressing these limitations should further enhance the potential of LLMs in PPI analysis, driving deeper insights and broader applications in biological research.

Keywords: Large language models (LLMs); PPI prediction; Protein language model; Protein–protein interaction (PPI); Sequence-based models.

MeSH terms

Binding Sites
Computational Biology* / methods
Databases, Protein
Humans
Large Language Models
Protein Binding
Protein Interaction Mapping* / methods
Protein Interaction Maps*
Proteins* / chemistry
Proteins* / metabolism
Software

Substances

Proteins