Exploring the Potential of Electroencephalography Signal-Based Image Generation Using Diffusion Models: Integrative Framework Combining Mixed Methods and Multimodal Analysis

JMIR Med Inform. 2025 Jun 25;13:e72027. doi: 10.2196/72027.

Abstract

Background: Electroencephalography (EEG) has been widely used to measure brain activity, but generating accurate images from neural signals remains challenging. Most EEG-decoding research has focused on tasks such as motor imagery, emotion recognition, and brain wave classification, all of which center on EEG signal analysis and classification. Some studies have explored the correlation between EEG and images, focusing primarily on EEG-image pair classification or transformation; EEG-based image generation, however, remains underexplored.

Objective: The primary goal of this study was to extend EEG-based classification to image generation, addressing the limitations of previous methods and unlocking the full potential of EEG for image synthesis. To this end, we developed Neural-Cognitive Multimodal EEG-Informed Image (NECOMIMI), a novel framework designed to generate images directly from EEG signals.

Methods: We developed a 2-stage NECOMIMI pipeline that integrates our novel Neural Encoding Representation Vectorizer (NERV) EEG encoder with a diffusion-based generative model. The Category-Based Assessment Table (CAT) score was introduced to evaluate the semantic quality of EEG-generated images. In addition, the ThingsEEG dataset was used to validate and benchmark the CAT score, providing a standardized measure for assessing EEG-to-image generation performance.
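
The abstract describes the pipeline only at a high level; as a rough illustration of the 2-stage design, the PyTorch sketch below pairs a hypothetical contrastive EEG encoder (a stand-in for NERV, whose actual architecture is not specified here) with a placeholder conditioning step for the diffusion model. The layer sizes, the 63-channel/250-sample input shape, and all names are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch of a 2-stage EEG-to-image pipeline (assumptions, not the
# authors' implementation): stage 1 trains an EEG encoder with a CLIP-style
# contrastive loss against precomputed image embeddings; stage 2 would feed
# the resulting EEG embedding to a diffusion model as its conditioning vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEGEncoder(nn.Module):
    """Hypothetical stand-in for the NERV encoder: temporal convs + projection."""
    def __init__(self, n_channels=63, n_samples=250, embed_dim=768):
        super().__init__()
        self.temporal = nn.Sequential(
            nn.Conv1d(n_channels, 128, kernel_size=25, stride=5),
            nn.GELU(),
            nn.Conv1d(128, 128, kernel_size=9, stride=3),
            nn.GELU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, eeg):                  # eeg: (batch, channels, samples)
        h = self.temporal(eeg).squeeze(-1)   # (batch, 128)
        z = self.proj(h)                     # (batch, embed_dim)
        return F.normalize(z, dim=-1)        # unit norm for cosine similarity

def contrastive_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning EEG and image embeddings (CLIP-style)."""
    logits = eeg_emb @ img_emb.t() / temperature
    targets = torch.arange(len(eeg_emb), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Stage 1: align EEG embeddings with (precomputed) image embeddings.
encoder = EEGEncoder()
eeg_batch = torch.randn(8, 63, 250)                    # dummy EEG epochs
img_batch = F.normalize(torch.randn(8, 768), dim=-1)   # dummy image embeddings
loss = contrastive_loss(encoder(eeg_batch), img_batch)
loss.backward()

# Stage 2 (not shown): the trained encoder's output would replace the embedding
# that normally conditions a latent diffusion model's denoiser.
```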

Results: The NERV EEG encoder achieved state-of-the-art performance in several zero-shot classification tasks, with an average accuracy of 94.8% (SD 1.7%) in the 2-way task and 86.8% (SD 3.4%) in the 4-way task, outperforming models such as Natural Image Contrast EEG, Multimodal Similarity-Keeping Contrastive Learning, and Adaptive Thinking Mapper ShallowNet. This highlighted its strength as a feature extraction tool for EEG signals. In a 1-stage image generation framework, EEG embeddings often produced abstract or generalized images, such as landscapes, instead of specific objects. Our proposed 2-stage NECOMIMI architecture effectively extracted semantic information from noisy EEG signals, showing its ability to capture and represent underlying concepts derived from brain wave activity. We further conducted a perturbation study to test whether the model depended excessively on visual cortex EEG signals for scene-based image generation. Perturbing the visual cortex EEG channels led to a notable increase in Fréchet inception distance scores, suggesting that our model relied heavily on posterior brain signals to generate semantically coherent images.
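
As a concrete reading of the perturbation analysis, the sketch below replaces occipital (visual cortex) channels with matched-variance noise and computes the Fréchet inception distance between two sets of image features from their means and covariances. The channel indices, noise model, and stand-in features are assumptions; the paper's exact protocol and feature extractor may differ.

```python
# Illustrative sketch of the visual-cortex perturbation check (assumed details):
# occipital channels are replaced with Gaussian noise, images are regenerated,
# and FID is computed between features of the original and perturbed outputs.
import numpy as np
from scipy.linalg import sqrtm

def perturb_channels(eeg, channel_idx, rng=None):
    """Replace selected EEG channels with matched-variance Gaussian noise."""
    rng = rng if rng is not None else np.random.default_rng(0)
    eeg = eeg.copy()                        # eeg: (trials, channels, samples)
    std = eeg[:, channel_idx, :].std()
    eeg[:, channel_idx, :] = rng.normal(0.0, std, eeg[:, channel_idx, :].shape)
    return eeg

def frechet_distance(feats_a, feats_b):
    """FID between two feature sets: ||mu1-mu2||^2 + Tr(C1 + C2 - 2(C1 C2)^0.5)."""
    mu1, mu2 = feats_a.mean(0), feats_b.mean(0)
    c1 = np.cov(feats_a, rowvar=False)
    c2 = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):            # drop numerical imaginary residue
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2 * covmean))

# Example: occipital channel indices are placeholders, not the paper's montage.
occipital = [60, 61, 62]
eeg = np.random.randn(100, 63, 250)
eeg_perturbed = perturb_channels(eeg, occipital)
# Features of images generated from eeg vs. eeg_perturbed would go here;
# random stand-ins are used so the snippet runs on its own.
fid = frechet_distance(np.random.randn(100, 64), np.random.randn(100, 64))
print(f"FID (stand-in features): {fid:.2f}")
```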

Conclusions: NECOMIMI demonstrated the potential of EEG-to-image generation while revealing the challenges of translating noisy EEG data into accurate visual representations. The novel NERV EEG encoder for multimodal contrastive learning reached state-of-the-art performance on both n-way zero-shot classification and EEG-informed image generation. The introduction of the CAT score provided a new evaluation metric, paving the way for future research to refine generative models. In addition, this study highlighted the significant clinical potential of EEG-to-image generation, particularly in enhancing brain-machine interface systems and improving quality of life for individuals with motor impairments.

Keywords: brain-computer interface; diffusion models; electroencephalography; electroencephalography to image; multimodal generative framework.

MeSH terms

  • Adult
  • Brain* / diagnostic imaging
  • Brain* / physiology
  • Electroencephalography* / methods
  • Humans
  • Image Processing, Computer-Assisted* / methods
  • Signal Processing, Computer-Assisted*