LPItabformer: Enhancing generalization in predicting lncRNA-protein interactions via a tabular Transformer

Qin Lin; Jie Sheng; Chang Zhou; Tao Xiao; Yilei Meng; Mingxin Lu; Junfang Zhang; Xueyun Yan; Luying Peng; Huaming Cao; Li Li

doi:10.1016/j.csbj.2025.05.050

Comput Struct Biotechnol J. 2025 May 29;27:2323-2335. doi: 10.1016/j.csbj.2025.05.050. eCollection 2025.

Authors

Qin Lin¹ ² ³, Jie Sheng¹ ² ³, Chang Zhou¹ ² ³, Tao Xiao¹ ² ³, Yilei Meng¹ ² ³, Mingxin Lu¹ ² ³, Junfang Zhang⁴, Xueyun Yan⁵, Luying Peng¹ ² ³, Huaming Cao⁵, Li Li¹ ² ³

Affiliations

¹ State Key Laboratory of Cardiovascular Diseases and Medical Innovation Center, Shanghai East Hospital, School of Medicine, Tongji University, Shanghai 200120, China.
² Shanghai Arrhythmias Research Center, Shanghai East Hospital, Tongji University School of Medicine, Shanghai 200120, China.
³ Stem Cell Research Center, Medical School, Tongji University, Shanghai 200120, China.
⁴ Teaching Laboratory Center, Tongji University School of Medicine, Shanghai 200331, China.
⁵ Department of Cardiology, Shibei Hospital, Shanghai 200435, China.

Abstract

Long non-coding RNAs (lncRNAs) play important roles in physiological and pathological processes. Accurately predicting lncRNA-protein interactions (LPIs) is a vital strategy for clarifying the functions and pathogenic mechanisms of lncRNAs. However, the utility and generalization of current computational methods for evaluating LPIs leave significant room for improvement. In this study, splitting the data with protein clusters as group information reveals that many LPI prediction methods suffer from generalization flaws, which stem from data leakage caused by ignoring the biological properties of LPIs. To address this issue, we present LPItabformer, a tabular Transformer framework for predicting LPIs that incorporates a domain shifts with uncertainty (DSU) module to enhance generalization. LPItabformer demonstrates the capacity to alleviate the generalization challenges associated with biases in LPI data and preferences in protein binding patterns. In addition, LPItabformer shows greater robustness and generalization on human and mouse LPI datasets than state-of-the-art methods. Finally, we verify that LPItabformer is capable of predicting novel LPIs. Code is available at https://github.com/Ci-TJ/LPItabformer.

Keywords: Deep learning; Generalization; LncRNA-protein interactions; Long non-coding RNA; Tabular Transformer.
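
The abstract describes splitting the data with protein clusters as group information, so that related proteins do not appear on both sides of the split. The following is a minimal sketch of that idea, not the authors' implementation. It assumes cluster labels are already available for every protein (the abstract does not say how they are obtained), and the function name `group_split`, the toy pairs, and the cluster labels are all hypothetical.

```python
import random
from collections import defaultdict


def group_split(pairs, protein_cluster, test_fraction=0.2, seed=0):
    """Split (lncRNA, protein) pairs so that no protein cluster spans both sets.

    pairs           -- list of (lncRNA_id, protein_id) tuples
    protein_cluster -- dict mapping protein_id to a cluster label (assumed given)
    """
    # Bucket the pair indices by the cluster of their protein.
    by_cluster = defaultdict(list)
    for idx, (_, protein) in enumerate(pairs):
        by_cluster[protein_cluster[protein]].append(idx)

    # Shuffle whole clusters (not individual pairs), then assign each cluster
    # to the test set until it holds roughly test_fraction of the pairs.
    clusters = sorted(by_cluster)
    random.Random(seed).shuffle(clusters)
    target = test_fraction * len(pairs)
    train_idx, test_idx = [], []
    for cluster in clusters:
        bucket = test_idx if len(test_idx) < target else train_idx
        bucket.extend(by_cluster[cluster])
    return sorted(train_idx), sorted(test_idx)


# Toy data: identifiers and cluster labels are made up for illustration.
pairs = [
    ("lnc1", "P1"), ("lnc2", "P1"), ("lnc1", "P2"),
    ("lnc3", "P3"), ("lnc2", "P4"), ("lnc3", "P4"),
]
protein_cluster = {"P1": "c0", "P2": "c0", "P3": "c1", "P4": "c2"}

train_idx, test_idx = group_split(pairs, protein_cluster, test_fraction=0.3)
train_clusters = {protein_cluster[pairs[i][1]] for i in train_idx}
test_clusters = {protein_cluster[pairs[i][1]] for i in test_idx}
```

Because whole clusters are assigned to one side, the realized test size only approximates `test_fraction`. A random split over individual pairs would instead let proteins from the same cluster appear in both training and test data, which is the kind of leakage the abstract attributes to ignoring the biological properties of LPIs.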