Building a Hybrid Physical-Statistical Classifier for Predicting the Effect of Variants Related to Protein-Drug Interactions

Structure. 2019 Sep 3;27(9):1469-1481.e3. doi: 10.1016/j.str.2019.06.001. Epub 2019 Jul 3.

Abstract

A key issue in drug design is how population variation affects drug efficacy by altering binding affinity (BA) in different individuals, an essential consideration for government regulators. Ideally, we would evaluate the BA perturbations caused by millions of single-nucleotide variants (SNVs). However, only hundreds of protein-drug complexes with SNVs have experimentally characterized BAs, too small a gold standard for straightforward training of a statistical model. We therefore take a hybrid approach, using physically based calculations to bootstrap the parameterization of a full model. In particular, we perform 3D structure-based docking on ∼10,000 SNVs that modify known protein-drug complexes to construct a pseudo gold standard. We then use this augmented set of BAs to train a statistical model combining structure, ligand, and sequence features, and illustrate how it can be applied to millions of SNVs. Finally, we show that our model has good cross-validated performance (97% AUROC) and can also be validated by orthogonal ligand-binding data.
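The AUROC figure quoted in the abstract can be read as the probability that a randomly chosen disruptive variant receives a higher score than a randomly chosen non-disruptive one. The following is a minimal sketch of that metric only, not the authors' pipeline. The function name `auroc`, the 0/1 label coding, and the toy scores are all assumptions made for illustration and are not taken from the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney U).

    labels: sequence of 0/1 values; here 1 is assumed to mean the variant
            disrupts drug binding (hypothetical coding, not from the paper).
    scores: classifier scores, where higher means "more likely disruptive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(auroc(toy_labels, toy_scores), 3))  # 8 of 9 pairs ranked correctly -> 0.889
```

In a cross-validated setting, a value like this would be computed on each held-out fold and then averaged. The pairwise loop is quadratic in the number of examples, so a rank-based implementation would be preferable at the scale of thousands of variants.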

Keywords: drug resistance; machine learning; nsSNV; protein-drug interactions.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology / methods*
  • Databases, Protein
  • Drug Design
  • Humans
  • Ligands
  • Machine Learning
  • Models, Statistical
  • Molecular Docking Simulation
  • Polymorphism, Single Nucleotide*
  • Protein Binding
  • Protein Conformation
  • Proteins / chemistry*
  • Proteins / genetics*
  • Proteins / metabolism

Substances

  • Ligands
  • Proteins