Rapid in silico directed evolution by a protein language model with EVOLVEpro

Kaiyi Jiang; Zhaoqing Yan; Matteo Di Bernardo; Samantha R Sgrizzi; Lukas Villiger; Alisan Kayabolen; B J Kim; Josephine K Carscadden; Masahiro Hiraizumi; Hiroshi Nishimasu; Jonathan S Gootenberg; Omar O Abudayyeh

doi:10.1126/science.adr6006

Science. 2025 Jan 24;387(6732):eadr6006. doi: 10.1126/science.adr6006. Epub 2025 Jan 24.

Authors

Kaiyi Jiang^# ¹ ² ³ ⁴, Zhaoqing Yan^# ¹ ² ³, Matteo Di Bernardo ⁵, Samantha R Sgrizzi ¹ ² ³, Lukas Villiger ⁶, Alisan Kayabolen ¹ ² ³, B J Kim ⁷, Josephine K Carscadden ¹ ² ³, Masahiro Hiraizumi ⁸, Hiroshi Nishimasu ⁸ ⁹ ¹⁰, Jonathan S Gootenberg^# ¹ ² ³, Omar O Abudayyeh^# ¹ ² ³

Affiliations

¹ Department of Medicine, Division of Engineering in Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
² Gene and Cell Therapy Institute, Mass General Brigham, Cambridge, MA, USA.
³ Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA.
⁴ Department of Bioengineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
⁵ Whitehead Institute, Massachusetts Institute of Technology, Cambridge, MA, USA.
⁶ Department of Dermatology and Allergology, Kantonspital St. Gallen, St. Gallen, Switzerland.
⁷ Koch Institute for Integrative Cancer Research at MIT, Massachusetts Institute of Technology, Cambridge, MA, USA.
⁸ Department of Chemistry and Biotechnology, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan.
⁹ Structural Biology Division, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo, Japan.
¹⁰ Inamori Research Institute for Science, 620 Suiginya-cho, Shimogyo-ku, Kyoto, Japan.

^# Contributed equally.

PMID: 39571002
DOI: 10.1126/science.adr6006

Abstract

Directed protein evolution is central to biomedical applications but faces challenges such as experimental complexity, inefficient multiproperty optimization, and local maxima traps. Although in silico methods that use protein language models (PLMs) can provide modeled fitness landscape guidance, they struggle to generalize across diverse protein families and map to protein activity. We present EVOLVEpro, a few-shot active learning framework that combines PLMs and regression models to rapidly improve protein activity. EVOLVEpro surpasses current methods, yielding up to 100-fold improvements in desired properties. We demonstrate its effectiveness across six proteins in RNA production, genome editing, and antibody binding applications. These results highlight the advantages of few-shot active learning with minimal experimental data over zero-shot predictions. EVOLVEpro opens new possibilities for artificial intelligence-guided protein engineering in biology and medicine.
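
The few-shot active learning loop described in the abstract (embed variants, fit a regression model on the few measured activities, test the top-ranked candidates, repeat) can be illustrated with a toy sketch. This is not the EVOLVEpro implementation: `toy_embedding` is a hypothetical stand-in for a PLM embedding, `toy_assay` is a hypothetical stand-in for a wet-lab measurement, and the k-nearest-neighbour regressor is an assumption chosen only to keep the example dependency-free.

```python
import math
import random


def toy_embedding(variant_id, dim=8):
    """Hypothetical stand-in for a PLM embedding of one variant."""
    rng = random.Random(variant_id)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_assay(embedding):
    """Hypothetical stand-in for an experimental activity measurement."""
    return -sum((x - 0.5) ** 2 for x in embedding)


def knn_predict(query, train_x, train_y, k=3):
    """Predict activity as the mean over the k nearest measured variants."""
    dists = sorted((math.dist(query, x), y) for x, y in zip(train_x, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def active_learning(n_variants=200, n_rounds=5, batch_size=10, seed=0):
    """Run a toy few-shot active learning campaign over a variant library."""
    rng = random.Random(seed)
    embeddings = {v: toy_embedding(v) for v in range(n_variants)}

    # Round 0: measure a small random batch to seed the regression model.
    measured = {}
    for v in rng.sample(range(n_variants), batch_size):
        measured[v] = toy_assay(embeddings[v])
    best_per_round = [max(measured.values())]

    for _ in range(n_rounds):
        train_x = [embeddings[v] for v in measured]
        train_y = [measured[v] for v in measured]

        # Rank all unmeasured variants by predicted activity.
        candidates = [v for v in range(n_variants) if v not in measured]
        ranked = sorted(
            candidates,
            key=lambda v: knn_predict(embeddings[v], train_x, train_y),
            reverse=True,
        )

        # "Test" the top-ranked batch and add the results to the training set.
        for v in ranked[:batch_size]:
            measured[v] = toy_assay(embeddings[v])
        best_per_round.append(max(measured.values()))

    return measured, best_per_round


if __name__ == "__main__":
    measured, best_per_round = active_learning()
    print(len(measured), "variants measured")
    print("best activity after each round:", best_per_round)
```

In this sketch only a small fraction of the library is ever measured, and each round's measurements refine the model used to pick the next batch, which is the contrast with zero-shot prediction that the abstract draws.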

MeSH terms

Computer Simulation
Deep Learning*
Directed Molecular Evolution* / methods
Gene Editing
Protein Engineering / methods
Proteins* / chemistry
Proteins* / genetics
Software

Substances

Proteins
