Multimodal processing in simultaneous interpreting with text: Evidence from ear-eye-voice span and performance

PLoS One. 2025 Jul 3;20(7):e0326527. doi: 10.1371/journal.pone.0326527. eCollection 2025.

Abstract

Simultaneous interpreting (SI) with text, a hybrid modality combining auditory and visual inputs, presents greater cognitive complexity than traditional SI. This study investigates multimodal processing in Chinese-English SI with text by examining how source speech rate and professional experience modulate interpreters' interpreting performance and their Ear-Eye-Voice Span (EIVS; a temporal measure reflecting the cognitive coordination among auditory input, visual processing, and verbal output). Using eye-tracking technology, we analyzed EIVS patterns in 15 professional interpreters and 30 interpreting trainees performing three SI with text tasks at slow, intermediate, and fast speech rates. EIVS measures, including Ear-Eye Span (EIS), Eye-Voice Span (IVS), and Ear-Voice Span (EVS), were analyzed to assess the temporal coordination of listening, reading, and interpreting processes. Results indicate that faster speech rates significantly reduced EIVS across all measures, suggesting accelerated information processing and strategic cognitive adaptation. A significant interaction effect between speech rate and professional experience was observed. Professionals maintained more stable and efficient EIVS patterns, particularly under accelerated speech rates, reflecting an advantage in cross-modal attention allocation and cognitive resource management. In contrast, trainees exhibited greater reliance on visual input and struggled more with multimodal demands, manifested in longer EIVS values and greater individual variation. Both groups exhibited an ear-lead-eye coordination pattern during the fast speech rate task, though professionals achieved more efficient auditory-visual synchronization. Despite a decline in interpreting performance with increasing speech rates, professionals consistently outperformed trainees. These findings underscore the critical role of experience in multimodal coordination and highlight the importance of dedicated, skill-specific practice for improving auditory-visual coordination and optimizing interpreting performance under cognitively demanding conditions.
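The abstract describes EIVS as three temporal lags (EIS, IVS, EVS) but does not give their exact operational definitions. The sketch below is a minimal illustration under a stated assumption: each span is the onset-to-onset lag between events tied to the same source-word anchor (audio onset, first fixation on the word in the text, and onset of its rendition). The names `AnchorEvent`, `spans`, and `mean_spans` are hypothetical, and the timestamps in the usage example are invented for illustration, not data from the study. Under this assumed sign convention, a positive EIS means the ear leads the eye (an ear-lead-eye pattern), and EVS = EIS + IVS holds by construction because all three spans share the same anchor.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AnchorEvent:
    """Timestamps (ms) for one hypothetical source-word anchor (assumed definitions)."""
    ear_onset: float    # onset of the word in the source audio
    eye_onset: float    # first fixation on the same word in the text
    voice_onset: float  # onset of the interpreter's rendition of the word


def spans(event: AnchorEvent) -> dict[str, float]:
    """Return EIS, IVS and EVS for one anchor as onset-to-onset lags (assumption)."""
    eis = event.eye_onset - event.ear_onset      # Ear-Eye Span: > 0 means ear leads eye
    ivs = event.voice_onset - event.eye_onset    # Eye-Voice Span
    evs = event.voice_onset - event.ear_onset    # Ear-Voice Span (= EIS + IVS here)
    return {"EIS": eis, "IVS": ivs, "EVS": evs}


def mean_spans(events: list[AnchorEvent]) -> dict[str, float]:
    """Average each span over all anchors in one task."""
    per_event = [spans(e) for e in events]
    return {key: mean(d[key] for d in per_event) for key in ("EIS", "IVS", "EVS")}


# Illustrative (invented) timestamps for two anchors in one task.
events = [AnchorEvent(1000.0, 1400.0, 3200.0), AnchorEvent(5000.0, 5300.0, 6900.0)]
print(mean_spans(events))  # {'EIS': 350.0, 'IVS': 1700.0, 'EVS': 2050.0}
```

Averaging per task mirrors the abstract's comparison across slow, intermediate, and fast conditions; a real analysis would also need the study's own anchor selection and exclusion criteria, which are not given here.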

MeSH terms

  • Adult
  • Auditory Perception* / physiology
  • Eye-Tracking Technology
  • Female
  • Humans
  • Male
  • Reading
  • Speech Perception* / physiology
  • Translating*
  • Visual Perception / physiology
  • Voice* / physiology
  • Young Adult