Machine Learning Classifier Using Blood Count Parameters and Erythropoietin to Predict JAK2 Mutations in Patients With Erythrocytosis

Arch Pathol Lab Med. 2025 Apr 28. doi: 10.5858/arpa.2023-0262-OA. Online ahead of print.

Abstract

Context.—: Differentiating polycythemia vera from other causes of erythrocytosis is a diagnostic challenge. Although most patients with polycythemia vera have Janus kinase 2 (JAK2) mutations, testing every patient with erythrocytosis is impractical because polycythemia vera is an uncommon cause of the condition. Identifying the polycythemic patients most likely to benefit from JAK2 testing would improve use of this test.

Objective.—: To develop an artificial intelligence analysis/machine learning classifier using blood count parameters and erythropoietin to predict JAK2 results in patients with erythrocytosis.

Design.—: Results from the Veterans Affairs data warehouse were used for training and validation. Cases with JAK2 results and hemoglobin values of 15 g/dL or higher in females and 17 g/dL or higher in males were included. An erythropoietin result was optional. The highest-performing model was evaluated with an out-of-sample data set.
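
The inclusion criteria above can be sketched as a simple filter. This is an illustration only, not the study's code: the `Case` record, its field names, the sex codes "F" and "M", and the `is_eligible` helper are all hypothetical. Only the hemoglobin thresholds, the requirement for a JAK2 result, and the optional erythropoietin value come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Sex-specific hemoglobin cutoffs (g/dL) as stated in the Design section.
HGB_THRESHOLD_G_DL = {"F": 15.0, "M": 17.0}


@dataclass
class Case:
    """Hypothetical record layout; field names are assumptions."""
    sex: str                               # "F" or "M" (assumed coding)
    hemoglobin_g_dl: float
    jak2_positive: Optional[bool]          # None means no JAK2 result on file
    erythropoietin: Optional[float] = None  # optional per the Design section


def is_eligible(case: Case) -> bool:
    """Return True if the case has a JAK2 result and meets the cutoff."""
    if case.jak2_positive is None:
        return False
    threshold = HGB_THRESHOLD_G_DL.get(case.sex)
    if threshold is None:
        # Unknown sex coding: exclude rather than guess a cutoff.
        return False
    return case.hemoglobin_g_dl >= threshold


cases = [
    Case("F", 15.2, False),        # eligible, no erythropoietin value
    Case("M", 16.8, True, 3.1),    # below the male cutoff
    Case("M", 17.0, None),         # no JAK2 result
]
eligible = [c for c in cases if is_eligible(c)]
```

A missing erythropoietin value does not affect eligibility here, which mirrors its optional status in the study design.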

Results.—: Among 31 models trained on data from 8479 individuals, including 540 (6.4%) positive for JAK2, the Light Gradient Boosted Trees Classifier performed best. When applied to 330 out-of-sample cases, of which 9 (2.7%) were positive for JAK2, the classifier's sensitivity, specificity, positive predictive value, and negative predictive value were 100%, 92.8%, 28.1%, and 100%, respectively. Among a subset of 183 out-of-sample cases, the model would potentially have reduced JAK2 testing by 89%, compared with reductions of 50% to 62% using previously reported rule-based systems that similarly used blood count parameters. Platelet count had the greatest impact on the model, followed by red cell distribution width and erythropoietin.
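
The out-of-sample performance figures can be checked with a short calculation. The confusion-matrix counts below are not given in the abstract; they are inferred from the reported totals (330 cases, 9 JAK2-positive) and percentages, so they should be read as an assumption. The `diagnostic_metrics` helper is likewise hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard 2x2 diagnostic accuracy measures, as fractions."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all positives
        "specificity": tn / (tn + fp),  # true negatives among all negatives
        "ppv": tp / (tp + fp),          # positives among predicted positives
        "npv": tn / (tn + fn),          # negatives among predicted negatives
    }


# Inferred counts (assumption): 100% sensitivity with 9 positives gives
# tp=9, fn=0; a PPV of 28.1% then implies 9 / (9 + 23), so fp=23; the
# remaining 321 - 23 = 298 JAK2-negative cases are true negatives.
metrics = diagnostic_metrics(tp=9, fp=23, tn=298, fn=0)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these inferred counts, the four measures agree with the reported values to one decimal place. The low positive predictive value alongside a 100% negative predictive value reflects the low prevalence of JAK2 mutations (2.7%) in the out-of-sample set.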

Conclusions.—: These results show that a machine learning classifier may be beneficial as a decision support aid for JAK2 testing in polycythemic patients.