DrugProtAI: A machine learning-driven approach for predicting protein druggability through feature engineering and robust partition-based ensemble methods

Ankit Halder; Sabyasachi Samantaray; Sahil Barbade; Aditya Gupta; Sanjeeva Srivastava

doi:10.1093/bib/bbaf330

Brief Bioinform. 2025 Jul 2;26(4):bbaf330. doi: 10.1093/bib/bbaf330.

Authors

Ankit Halder¹, Sabyasachi Samantaray², Sahil Barbade³, Aditya Gupta⁴, Sanjeeva Srivastava¹

Affiliations

¹ Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, Maharashtra, India.
² Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, Maharashtra, India.
³ Department of Civil Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, Maharashtra, India.
⁴ Department of Mechanical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, Maharashtra, India.

Abstract

Drug design and development are central to clinical research, yet 90% of drugs fail to reach the clinic, often due to inappropriate selection of drug targets. Conventional methods for target identification lack precision and sensitivity. While various computational tools have been developed to predict the druggability of proteins, they often focus on limited subsets of the human proteome or rely solely on amino acid properties. Our study presents DrugProtAI, a tool developed by implementing a partitioning-based method and trained on the entire human protein set using both sequence- and non-sequence-derived properties. The partitioned method was evaluated using popular machine learning algorithms, of which Random Forest and XGBoost performed the best. A comprehensive analysis of 183 features, encompassing biophysical, sequence-, and non-sequence-derived properties, achieved a median Area Under Precision-Recall Curve (AUC) of 0.87 in target prediction. The model was further tested on a blinded validation set comprising recently approved drug targets. The key predictors were also identified, which we believe will help users in selecting appropriate drug targets. We believe that these insights are poised to significantly advance drug development. This version of the tool provides the probability of druggability for human proteins. The tool is freely accessible at https://drugprotai.pythonanywhere.com/.
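The abstract does not spell out how the partitioning-based method works. One plausible reading, offered here only as an illustrative sketch, is that the larger (non-druggable) class is split into disjoint partitions, one model is trained per partition against the full set of known druggable proteins, and the per-model probabilities are averaged. In the sketch below, every name (`partition_majority`, `CentroidScorer`, `train_partition_ensemble`, `ensemble_proba`) is hypothetical, the two-feature data is synthetic, and the distance-to-centroid scorer is a toy stand-in for Random Forest or XGBoost, not the published implementation.

```python
import random
from statistics import mean


def partition_majority(majority, n_parts, seed=0):
    """Shuffle the majority class and split it into n_parts disjoint chunks."""
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]


class CentroidScorer:
    """Toy base learner (assumption): a stand-in for Random Forest / XGBoost."""

    def fit(self, positives, negatives):
        # Store the per-feature mean of each class.
        self.pos_centroid = [mean(col) for col in zip(*positives)]
        self.neg_centroid = [mean(col) for col in zip(*negatives)]
        return self

    def predict_proba(self, x):
        # Score rises toward 1 as x gets closer to the positive centroid.
        d_pos = sum((a - b) ** 2 for a, b in zip(x, self.pos_centroid)) ** 0.5
        d_neg = sum((a - b) ** 2 for a, b in zip(x, self.neg_centroid)) ** 0.5
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total


def train_partition_ensemble(minority, majority, n_parts, seed=0):
    """Train one model per majority-class partition, each seeing all minority samples."""
    parts = partition_majority(majority, n_parts, seed)
    return [CentroidScorer().fit(minority, part) for part in parts]


def ensemble_proba(models, x):
    """Average the per-partition scores into one druggability-style probability."""
    return mean(m.predict_proba(x) for m in models)


# Synthetic, imbalanced data: 20 "druggable" vs 100 "non-druggable" feature vectors.
_rng = random.Random(42)
druggable = [(_rng.gauss(1, 0.5), _rng.gauss(1, 0.5)) for _ in range(20)]
non_druggable = [(_rng.gauss(-1, 0.5), _rng.gauss(-1, 0.5)) for _ in range(100)]

# Five partitions of 20 negatives each, so every model trains on balanced classes.
models = train_partition_ensemble(druggable, non_druggable, n_parts=5)
score_near_positive = ensemble_proba(models, (1.0, 1.0))
score_near_negative = ensemble_proba(models, (-1.0, -1.0))
```

Partitioning in this way lets each base model train on balanced classes without discarding any majority-class examples, which is one common motivation for partition-based ensembles on imbalanced data.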

Keywords: drug discovery; druggable targets; ensemble-based methods; feature selection; machine learning.

MeSH terms

Algorithms
Computational Biology* / methods
Databases, Protein
Drug Design*
Drug Discovery* / methods
Humans
Machine Learning*
Proteins* / chemistry
Proteins* / metabolism
Software*

Substances

Proteins
