Detection of Early Parkinson's Disease by Leveraging Speech Foundation Models

IEEE J Biomed Health Inform. 2025 Jul;29(7):5181-5190. doi: 10.1109/JBHI.2025.3548917.

Abstract

Parkinson's disease (PD) is a progressive neurodegenerative disorder affecting millions worldwide, characterized by a wide range of motor and non-motor symptoms. Among these symptoms, alterations in speech and voice quality stand out as early and prominent indicators of the disease. Recently, the emergence of speech foundation models has revolutionized the field by providing powerful tools for speech processing and feature extraction. In this article, we investigate the capabilities of three state-of-the-art speech foundation models, wav2vec2.0, Whisper, and SeamlessM4T, to develop robust and accurate methods for PD detection from voice recordings. We experiment with both direct feature extraction and finetuning of the foundation models for the PD classification task, and validate the results against clinical and neuroimaging data. We achieve promising results with both pretrained features and finetuned models; finetuning performs better, reaching an AUC of up to 91.35%, a new state of the art on the ICEBERG dataset. The predictions of our models also correlate well with clinical and DaTSCAN scores, demonstrating the feasibility of applying speech foundation models to the detection of early PD.
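
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an AUC can be computed from classifier scores. It is not the authors' evaluation code. The function name `roc_auc`, the labels, and the scores are hypothetical and purely illustrative. The sketch uses the rank interpretation of AUC: the probability that a randomly chosen positive (PD) recording receives a higher score than a randomly chosen negative (control) recording, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 ground-truth classes (1 = positive class).
    scores: iterable of classifier scores, higher meaning "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Illustrative (made-up) labels and scores, not data from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

The pairwise loop is quadratic in the number of recordings, which is acceptable for small clinical cohorts; a sort-based computation would be preferable for large datasets.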

MeSH terms

  • Aged
  • Female
  • Humans
  • Male
  • Parkinson Disease* / diagnosis
  • Parkinson Disease* / physiopathology
  • Signal Processing, Computer-Assisted*
  • Speech* / physiology