This paper demonstrates an accurate and efficient methodology for detecting and reducing fermentation contamination using two machine learning (ML) methods: a one-class support vector machine and an autoencoder. To improve model accuracy and efficiency, we optimize as many hyperparameters as possible before training the ML models, and we use Optuna, a Python framework, to run hyperparameter optimization (HPO) in parallel. We recommend Bayesian optimization combined with the Hyperband algorithm for HPO. Results show that we can predict contaminated fermentation batches with recall up to 1.0 without sacrificing precision or specificity for non-contaminated batches, which reach up to 0.96 and 0.99, respectively. Both methods achieve a recall of 1.0, but the one-class support vector machine outperforms the autoencoder in precision and specificity. The models detect contamination with high accuracy without requiring labeled contaminated data, and they are suitable for integration into real-time fermentation monitoring systems with minimal latency and retraining needs. In addition, we benchmark our ML methods against a traditional threshold-based contamination detection approach (the mean ± 3σ rule) to quantify the added value of data-driven models. Finally, we identify the independent variables that contribute most to contaminated batches and give recommendations on how to regulate them to reduce the likelihood of contamination.
Keywords: Contamination; Fermentation processes; Hyperparameter optimization; Machine learning; SHAP feature importance.
© 2025. The Author(s).
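To make the threshold baseline and the reported metrics concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the threshold-based baseline is a mean ± 3σ rule applied to a single process variable, and that contaminated batches are the positive class when computing recall, precision, and specificity. The function names and the pH-like readings are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def three_sigma_flags(reference, candidates, k=3.0):
    """Flag candidate values outside mean ± k·σ of the reference values.

    `reference` holds readings from batches assumed to be non-contaminated
    (hypothetical data); `k = 3` gives the classic three-sigma rule.
    """
    mu = mean(reference)
    sigma = stdev(reference)  # sample standard deviation
    lower, upper = mu - k * sigma, mu + k * sigma
    return [not (lower <= x <= upper) for x in candidates]


def detection_metrics(y_true, y_pred):
    """Recall, precision, and specificity with True meaning 'contaminated'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return {
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical readings of one variable from clean reference batches.
reference = [7.0, 7.1, 6.9, 7.0, 7.2, 6.8, 7.0, 7.1]
# Hypothetical new batches and their true contamination status.
candidates = [7.0, 6.9, 7.3, 5.9, 8.1, 7.2]
truth = [False, False, False, True, True, True]

flags = three_sigma_flags(reference, candidates)
metrics = detection_metrics(truth, flags)
print(flags)
print(metrics)
```

In this toy data the last batch is contaminated but stays inside the three-sigma band, so the single-variable rule misses it and recall drops below 1.0. This is the kind of gap a multivariate model such as a one-class support vector machine is intended to close.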