Temperature-Dependent Small-Molecule Solubility Prediction Using MoE-Enhanced Directed Message Passing Neural Networks

Lixiang Guo; Yujing Zhao; Qilei Liu; Lei Zhang; Jian Du; Qingwei Meng

doi:10.1021/acs.jcim.5c00781

J Chem Inf Model. 2025 Jul 10. doi: 10.1021/acs.jcim.5c00781. Online ahead of print.

Authors

Lixiang Guo¹, Yujing Zhao¹,², Qilei Liu¹,³, Lei Zhang¹,³, Jian Du¹,³, Qingwei Meng¹,³

Affiliations

¹ State Key Laboratory of Fine Chemicals, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Department of Pharmaceutical Sciences, Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China.
² MOE Key Laboratory of Bio-Intelligent Manufacturing, School of Bioengineering, Dalian University of Technology, Dalian 116024, China.
³ Ningbo Institute of Dalian University of Technology, Ningbo 315016, China.

PMID: 40637004

Abstract

Solubility prediction is crucial for drug development and materials science, yet existing models struggle to generalize across diverse solvents and temperatures. This study develops a novel solubility prediction model, DMPNN-MoE, which integrates a directed message passing neural network (DMPNN) with a mixture-of-experts (MoE) algorithm to address these limitations. Built on a curated data set of 56,945 experimental solubility values spanning 791 solutes, 140 solvents, and temperatures from 243.15 to 403.15 K, the model effectively captures molecular structural features and solute-solvent interactions. The DMPNN enhances feature extraction through directional message passing, while the MoE dynamically allocates experts to improve adaptability across diverse solute-solvent systems. Rigorous 10-fold cross-validation demonstrates superior performance (MAE = 0.256 ± 0.010, R² = 0.863 ± 0.016): the model outperforms graph-based (DMPNN, MPNN-MoE, MPNN, GINE-MoE, GINE, GAT-MoE, and GAT) and descriptor-based (RDKit/Mol2vec-ANN/RF/SVR) benchmarks by up to 35%. The model retains high accuracy for unseen rare solvents (MAE = 0.341 on 2380 data points from low-sample solvents) and generalizes to 12 unseen solutes across 1221 data points (MAE = 0.413), indicating robust generalization with some limitations. Feature importance analysis further identifies molecular features critical to solubility, such as three-membered ring structures and bond E/Z configurations. This work establishes a robust, interpretable model for temperature-dependent solubility prediction, with broad applications in pharmaceutical discovery and chemical engineering.
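The abstract does not give the model's equations, so the following is only a rough, hypothetical sketch of how a DMPNN encoder can feed an MoE head. It assumes a Chemprop-style directed MPNN: hidden states live on directed bonds, with h0_vw = ReLU(W_i [x_v ; e_vw]), message m_vw = sum of h_kv over neighbors k of v excluding w, and update h_vw = ReLU(h0_vw + W_m m_vw). It then applies a dense softmax-gated mixture of linear experts (the paper may instead use sparse or top-k routing). The input layout (solute embedding, solvent embedding, temperature scaled by 298.15 K), the shared encoder, the toy atom features, the number of experts, and all function names are assumptions made for illustration. The weights are random and untrained, so the printed number is not a solubility prediction.

```python
import math
import random

# Tiny pure-Python linear-algebra helpers (lists of floats, no NumPy).
def matvec(W, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(z):
    m = max(z)
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def dmpnn_embed(atom_feats, bonds, bond_feats, W_i, W_m, W_a, depth=3):
    """Chemprop-style directed MPNN (assumed): hidden states live on directed bonds v->w."""
    hidden = len(W_i)
    edges, efeat = [], {}
    for (v, w), f in zip(bonds, bond_feats):
        edges += [(v, w), (w, v)]              # each bond becomes two directed edges
        efeat[(v, w)] = efeat[(w, v)] = f
    # Initial edge state: h0_vw = ReLU(W_i [x_v ; e_vw])
    h0 = {(v, w): relu(matvec(W_i, atom_feats[v] + efeat[(v, w)])) for v, w in edges}
    h = dict(h0)
    for _ in range(depth - 1):
        new_h = {}
        for v, w in edges:
            # Message into v->w: sum of h_kv over edges k->v, skipping the reverse edge w->v
            m = [0.0] * hidden
            for k, dst in edges:
                if dst == v and k != w:
                    m = add(m, h[(k, v)])
            new_h[(v, w)] = relu(add(h0[(v, w)], matvec(W_m, m)))
        h = new_h
    # Readout: each atom aggregates its incoming edge states; molecule = sum over atoms
    mol = [0.0] * len(W_a)
    for v, x_v in enumerate(atom_feats):
        m_v = [0.0] * hidden
        for k, dst in edges:
            if dst == v:
                m_v = add(m_v, h[(k, v)])
        mol = add(mol, relu(matvec(W_a, x_v + m_v)))
    return mol

def moe_predict(x, W_gate, experts):
    """Dense soft MoE head (assumed): y = sum_i g_i(x) * f_i(x), g = softmax(W_gate x)."""
    gates = softmax(matvec(W_gate, x))
    outputs = [sum(wj * xj for wj, xj in zip(w, x)) + b for w, b in experts]
    return sum(g * o for g, o in zip(gates, outputs)), gates

# Hypothetical toy setup: random (untrained) weights, one encoder shared by solute and solvent.
rng = random.Random(0)
HIDDEN, OUT, N_EXPERTS = 4, 4, 3
W_i = rand_matrix(HIDDEN, 2 + 1, rng)        # atom one-hot (C, O) + one bond feature
W_m = rand_matrix(HIDDEN, HIDDEN, rng)
W_a = rand_matrix(OUT, 2 + HIDDEN, rng)
W_gate = rand_matrix(N_EXPERTS, 2 * OUT + 1, rng)
experts = [(rand_matrix(1, 2 * OUT + 1, rng)[0], 0.0) for _ in range(N_EXPERTS)]

C, O = [1.0, 0.0], [0.0, 1.0]
ethanol = ([C, C, O], [(0, 1), (1, 2)], [[1.0], [1.0]])  # heavy atoms only
water = ([O], [], [])                                     # single atom, no bonds

def embed(mol):
    atoms, bonds, bond_feats = mol
    return dmpnn_embed(atoms, bonds, bond_feats, W_i, W_m, W_a)

T = 298.15
x = embed(ethanol) + embed(water) + [T / 298.15]  # assumed input: [solute; solvent; scaled T]
y, gates = moe_predict(x, W_gate, experts)
print(f"untrained prediction: {y:.4f}, gate weights: {[round(g, 3) for g in gates]}")
```

Because every step sums over neighbors, relabeling the atoms (for example listing ethanol as O, C, C) leaves the molecular embedding unchanged up to rounding, and the gate weights always sum to 1, so the MoE output is a convex combination of the expert outputs.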