IECata: interpretable bilinear attention network and evidential deep learning improve the catalytic efficiency prediction of enzymes

Jingjing Wang; Yanpeng Zhao; Zhijiang Yang; Ge Yao; Penggang Han; Jiajia Liu; Chang Chen; Peng Zan; Xiukun Wan; Xiaochen Bo; Hui Jiang

doi:10.1093/bib/bbaf283

Brief Bioinform. 2025 May 1;26(3):bbaf283. doi: 10.1093/bib/bbaf283.

Authors

Jingjing Wang¹, Yanpeng Zhao²,³, Zhijiang Yang¹, Ge Yao¹, Penggang Han¹, Jiajia Liu¹, Chang Chen¹, Peng Zan⁴, Xiukun Wan¹, Xiaochen Bo³, Hui Jiang¹

Affiliations

¹ State Key Laboratory of NBC Protection for Civilian, No. 37, South Central Street, Changping District, Beijing 102205, China.
² School of Medicine, Shanghai University, No. 99, Shangda Road, Baoshan District, Shanghai 200444, China.
³ Academy of Military Medical Sciences, No. 27, Taiping Road, Haidian District, Beijing 100039, China.
⁴ Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronics Engineering and Automation, Shanghai University, No. 99, Shangda Road, Baoshan District, Shanghai 200444, China.

Abstract

Enzyme catalytic efficiency (kcat/Km) is a key parameter for identifying high-activity enzymes. Recently, deep learning techniques have demonstrated the potential for fast and accurate kcat/Km prediction. However, three challenges remain: (i) the limited size of the available kcat/Km dataset hinders the development of deep learning models; (ii) the model predictions lack reliable confidence estimates; and (iii) models lack interpretable insights into enzyme-catalyzed reactions. To address these challenges, we proposed IECata, a kcat/Km prediction model that provides uncertainty estimation and interpretability. IECata collected a dataset of 11 815 kcat/Km entries from the BRENDA and SABIO-RK databases, along with an out-of-domain test dataset of 806 entries from the literature. By introducing evidential deep learning, IECata provides uncertainty estimates for kcat/Km predictions. Moreover, it uses a bilinear attention mechanism to focus on learning crucial local interactions to interpret the key residues and substrate atoms in enzyme-catalyzed reactions. Testing results indicate that the prediction performance of IECata exceeds that of state-of-the-art benchmark models. More importantly, it provides a reliable confidence assessment for these predictions. Case studies further highlight that the incorporation of uncertainty in screening for highly active enzymes can effectively increase the hit ratio, thereby improving the efficiency of experimental validation and accelerating directed enzyme evolution. To facilitate researchers' use of IECata, we have developed an online prediction platform: http://mathtc.nscc-tj.cn/cataai/.
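The abstract does not state which evidential formulation IECata uses, so the following is only a minimal sketch of how uncertainty-aware screening could work. It assumes the Normal-Inverse-Gamma parameterization commonly used in deep evidential regression, where a model head outputs (gamma, nu, alpha, beta) per prediction. The class `NIGOutput`, the function `screen`, the variant names, and all numeric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class NIGOutput:
    """Hypothetical evidential head output for one enzyme-substrate pair."""
    gamma: float  # predicted mean, e.g. log10(kcat/Km)
    nu: float     # virtual evidence supporting the mean (> 0)
    alpha: float  # shape parameter (> 1 so both variances are finite)
    beta: float   # scale parameter (> 0)


def aleatoric(o: NIGOutput) -> float:
    """Expected data noise: E[sigma^2] = beta / (alpha - 1)."""
    return o.beta / (o.alpha - 1.0)


def epistemic(o: NIGOutput) -> float:
    """Model uncertainty: Var[mu] = beta / (nu * (alpha - 1))."""
    return o.beta / (o.nu * (o.alpha - 1.0))


def screen(candidates: dict, top_k: int, max_epistemic: float) -> list:
    """Drop low-confidence predictions, then rank the rest by predicted value."""
    confident = {
        name: o for name, o in candidates.items()
        if epistemic(o) <= max_epistemic
    }
    ranked = sorted(confident, key=lambda n: confident[n].gamma, reverse=True)
    return ranked[:top_k]


# Made-up candidates: variant_A has the highest prediction but little evidence.
candidates = {
    "variant_A": NIGOutput(gamma=5.2, nu=0.1, alpha=1.5, beta=1.0),
    "variant_B": NIGOutput(gamma=4.8, nu=4.0, alpha=3.0, beta=1.0),
    "variant_C": NIGOutput(gamma=3.1, nu=5.0, alpha=3.0, beta=1.0),
}
shortlist = screen(candidates, top_k=2, max_epistemic=0.5)
print(shortlist)
```

In this toy setting, the candidate with the highest predicted value is excluded because its epistemic uncertainty exceeds the threshold, which illustrates the kind of filtering the abstract credits with raising the hit ratio. The threshold and ranking rule here are illustrative choices, not the authors' procedure.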

Keywords: kcat/Km prediction; bilinear attention mechanism; evidential deep learning; interpretability; uncertainty.

MeSH terms

Biocatalysis*
Catalysis
Computational Biology* / methods
Databases, Protein
Deep Learning*
Enzymes* / chemistry
Enzymes* / metabolism
Kinetics

Substances

Enzymes
