Cluster-based human-in-the-loop strategy for improving machine learning-based circulating tumor cell detection in liquid biopsy

Patterns (N Y). 2025 May 30;6(6):101285. doi: 10.1016/j.patter.2025.101285. eCollection 2025 Jun 13.

Abstract

In liquid biopsy, detecting and differentiating circulating tumor cells (CTCs) and non-CTCs in blood samples from metastatic cancer patients remains challenging. The current gold standard often involves tedious manual examination of extensive image galleries. While machine learning (ML) offers potential automation, human expertise remains essential, particularly when ML systems are uncertain or make incorrect predictions because labeled data are limited. Combining self-supervised deep learning with an easily adaptable conventional ML classifier, we propose a human-in-the-loop approach with a targeted sampling strategy. By directing human effort toward labeling a limited set of new training samples from high-uncertainty clusters in the latent space, we iteratively reduce the system's uncertainty and improve classification performance, saving time compared to naive sampling approaches. On data from metastatic breast cancer patients, we demonstrate the feasibility of our approach and achieve better performance while reducing expert evaluation time compared to the gold standard, the FDA-approved CellSearch system.
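
The targeted sampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which uncertainty measure or clustering method is used, so the sketch assumes predictive entropy as the uncertainty score and takes cluster assignments as given, produced by any clustering of the latent embeddings. The function names `predictive_entropy` and `select_for_labeling` and all parameters are hypothetical.

```python
import numpy as np


def predictive_entropy(probs):
    """Shannon entropy (in nats) of each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)


def select_for_labeling(cluster_ids, probs, n_clusters=1, n_per_cluster=5, seed=0):
    """Pick samples from the clusters with the highest mean uncertainty.

    cluster_ids: (n_samples,) cluster assignment of each latent embedding
                 (assumed to come from any clustering algorithm).
    probs:       (n_samples, n_classes) classifier probabilities.
    Returns the selected cluster ids and the indices of samples to label.
    """
    rng = np.random.default_rng(seed)
    entropy = predictive_entropy(probs)

    # Mean uncertainty of each cluster.
    clusters = np.unique(cluster_ids)
    mean_entropy = np.array([entropy[cluster_ids == c].mean() for c in clusters])

    # Rank clusters from most to least uncertain and keep the top ones.
    ranked = clusters[np.argsort(-mean_entropy)][:n_clusters]

    # Draw a small random batch from each selected cluster for expert review.
    picked = []
    for c in ranked:
        members = np.flatnonzero(cluster_ids == c)
        k = min(n_per_cluster, members.size)
        picked.extend(rng.choice(members, size=k, replace=False).tolist())
    return ranked, np.array(picked, dtype=int)
```

In a human-in-the-loop cycle under these assumptions, the returned indices would be shown to an expert, the new labels added to the training set, the classifier refitted, and the selection repeated until the per-cluster uncertainty is acceptably low.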

Keywords: CTC; circulating tumor cells; clustering; human-in-the-loop; image classification; latent space analysis; liquid biopsy; machine learning; metastatic breast cancer; self-supervision.