Methodology for contamination detection and reduction in fermentation processes using machine learning

Xuan Dung James Nguyen; Y A Liu; Christopher C McDowell; Luke Dooley

doi:10.1007/s00449-025-03194-6

Bioprocess Biosyst Eng. 2025 Jun 26. doi: 10.1007/s00449-025-03194-6. Online ahead of print.

Authors

Xuan Dung James Nguyen¹, Y A Liu², Christopher C McDowell¹,³, Luke Dooley³

Affiliations

¹ Aspen Tech Center of Excellence in Process System Engineering, Department of Chemical Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA.
² Aspen Tech Center of Excellence in Process System Engineering, Department of Chemical Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA. design@vt.edu.
³ Novonesis Biological, Inc., 5400 Corporate Circle, Salem, VA, 24153, USA.

PMID: 40569455
DOI: 10.1007/s00449-025-03194-6

Abstract

This paper demonstrates an accurate and efficient methodology for detecting and reducing contamination in fermentation processes using two machine learning (ML) methods: one-class support vector machine and autoencoders. To improve model accuracy and efficiency, we optimize as many hyperparameters as possible before training the ML models, and we use Optuna, a Python platform, to run hyperparameter optimization (HPO) in parallel. We recommend Bayesian optimization with the Hyperband algorithm for HPO. Results show that we can predict contaminated fermentation batches with recall up to 1.0 without sacrificing the precision and specificity of non-contaminated batches, which reach up to 0.96 and 0.99, respectively. The one-class support vector machine outperforms autoencoders in precision and specificity, although both achieve a recall of 1.0. These models detect contamination with high accuracy without requiring labeled contaminated data and are suitable for integration into real-time fermentation monitoring systems with minimal latency and retraining needs. In addition, we benchmark our ML methods against a traditional threshold-based contamination detection approach (the mean $\pm 3\sigma$ rule) to quantify the added value of data-driven models. Finally, we identify the important independent variables contributing to contaminated batches and recommend how to regulate them to reduce the likelihood of contamination.
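As an illustration of the threshold-based baseline and the three reported metrics, the following is a minimal sketch rather than the authors' implementation. It assumes each batch is summarized by two hypothetical variables (final pH and peak dissolved oxygen), that control limits are fitted on non-contaminated batches only, and that a batch is flagged when any variable leaves its mean $\pm 3\sigma$ band. All numbers are invented for the example and say nothing about the paper's data or results.

```python
from statistics import mean, stdev

def fit_three_sigma(normal_batches):
    """Return per-variable (low, high) limits from non-contaminated batches."""
    limits = []
    for column in zip(*normal_batches):  # iterate variable by variable
        mu, sigma = mean(column), stdev(column)  # sample standard deviation
        limits.append((mu - 3 * sigma, mu + 3 * sigma))
    return limits

def flag_contaminated(batch, limits):
    """Flag a batch if any variable falls outside its mean +/- 3 sigma band."""
    return any(not (low <= x <= high) for x, (low, high) in zip(batch, limits))

def scores(y_true, y_pred):
    """Recall, precision, specificity with 'contaminated' as the positive class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    fp = sum(not t and p for t, p in zip(y_true, y_pred))
    tn = sum(not t and not p for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return recall, precision, specificity

# Hypothetical non-contaminated batches: (final pH, peak dissolved oxygen in %).
normal_batches = [
    (6.8, 41.0), (6.9, 39.5), (7.0, 40.2),
    (6.9, 40.8), (7.1, 39.9), (6.8, 40.5),
]
limits = fit_three_sigma(normal_batches)

# Hypothetical held-out batches with known labels (True = contaminated).
test_batches = [(6.9, 40.1), (7.0, 40.9), (5.9, 40.0), (6.9, 35.0), (6.7, 41.5)]
y_true = [False, False, True, True, True]

y_pred = [flag_contaminated(batch, limits) for batch in test_batches]
recall, precision, specificity = scores(y_true, y_pred)
print(f"recall={recall:.2f} precision={precision:.2f} specificity={specificity:.2f}")
```

The same `scores` helper could be applied to the predictions of any detector, which makes a like-for-like comparison between a fixed-threshold rule and a learned model straightforward.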

Keywords: Contamination; Fermentation processes; Hyperparameter optimization; Machine learning; SHAP feature importance.