Temperature-Dependent Small-Molecule Solubility Prediction Using MoE-Enhanced Directed Message Passing Neural Networks

J Chem Inf Model. 2025 Jul 10. doi: 10.1021/acs.jcim.5c00781. Online ahead of print.

Abstract

Solubility prediction is crucial for drug development and materials science, yet existing models struggle to generalize across diverse solvents and temperatures. This study develops a novel solubility prediction model, DMPNN-MoE, which integrates a directed message passing neural network (DMPNN) with a mixture-of-experts (MoE) algorithm to address these limitations. Leveraging a curated data set of 56,945 experimental solubility values spanning 791 solutes, 140 solvents, and temperatures from 243.15 to 403.15 K, the model effectively captures molecular structural features and solute-solvent interactions. The DMPNN enhances feature extraction through directional message passing, while the MoE dynamically allocates experts to improve adaptability across diverse solute-solvent systems. Rigorous 10-fold cross-validation demonstrates superior performance (MAE = 0.256 ± 0.010, R² = 0.863 ± 0.016), outperforming graph-based (DMPNN, MPNN-MoE, MPNN, GINE-MoE, GINE, GAT-MoE, and GAT) and descriptor-based (RDKit/Mol2vec-ANN/RF/SVR) benchmarks by up to 35%. The model retains high accuracy for unseen rare solvents (MAE = 0.341 on 2380 low-sample data points) and generalizes to 12 unseen solutes across 1221 data points (MAE = 0.413), indicating robust generalization with some limitations. Feature importance evaluation further elucidates critical molecular features influencing solubility, such as three-membered ring structures and bond E/Z configurations. This work establishes a robust, interpretable model for temperature-dependent solubility prediction, with broad applications in pharmaceutical discovery and chemical engineering.
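
The abstract does not specify how the MoE component is implemented. As a rough illustration of the general idea of letting a gate weight several experts per input, the following is a minimal sketch of a dense, softmax-gated mixture of experts in plain Python. Everything in it is an assumption made for illustration: the function names (`softmax`, `dot`, `moe_predict`), the linear experts, the random weights, and the feature vector, which merely stands in for a learned solute-solvent-temperature embedding. It is not the authors' architecture.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def moe_predict(features, gate_weights, expert_weights, expert_biases):
    """Dense softmax-gated mixture of linear experts (illustrative only).

    features:       list of floats, a hypothetical pooled embedding
    gate_weights:   one weight vector per expert for the gating scores
    expert_weights: one weight vector per expert for its linear output
    expert_biases:  one bias per expert
    """
    # Gate: one score per expert, normalized so the weights sum to 1.
    gates = softmax([dot(features, w) for w in gate_weights])
    # Each expert produces its own scalar prediction.
    expert_outputs = [
        dot(features, w) + b for w, b in zip(expert_weights, expert_biases)
    ]
    # Final prediction is the gate-weighted average of expert outputs.
    prediction = sum(g * o for g, o in zip(gates, expert_outputs))
    return prediction, gates, expert_outputs


# Hypothetical sizes and random parameters, for demonstration only.
random.seed(0)
n_features, n_experts = 6, 3
features = [random.gauss(0, 1) for _ in range(n_features)]
gate_weights = [
    [random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_experts)
]
expert_weights = [
    [random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_experts)
]
expert_biases = [random.gauss(0, 1) for _ in range(n_experts)]

prediction, gates, expert_outputs = moe_predict(
    features, gate_weights, expert_weights, expert_biases
)
print("gate weights:", [round(g, 3) for g in gates])
print("prediction:", round(prediction, 3))
```

Because the gate weights are non-negative and sum to one, the combined prediction always lies between the smallest and largest expert output. In a trained model the gate and experts would be learned jointly, and sparse (top-k) routing is a common alternative to the dense weighting shown here.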