Shapley value-driven multi-modal deep reinforcement learning for complex decision-making

Jie Zhang; Boqiang Bao; Chao Wang; Feng Zhu

doi:10.1016/j.neunet.2025.107650

Neural Netw. 2025 Jun 21:191:107650. doi: 10.1016/j.neunet.2025.107650. Online ahead of print.

Authors

Jie Zhang¹, Boqiang Bao², Chao Wang³, Feng Zhu³

Affiliations

¹ Nanjing University, China; Nanjing Research Institute of Electronic Engineering, China. Electronic address: guyuexiao95@gmail.com.
² Nanjing University, China.
³ Nanjing Research Institute of Electronic Engineering, China.

PMID: 40580626
DOI: 10.1016/j.neunet.2025.107650

Abstract

Deep Reinforcement Learning (DRL) has made significant strides in addressing various sequential decision-making problems, particularly in domains such as game simulations and robotic control. However, substantial challenges arise when DRL is applied to real-world scenarios characterized by complex multimodal environments. Traditional DRL's reliance on single-modal data limits its ability to extract rich semantic information, which is crucial for effective decision-making in intricate contexts like autonomous driving. Additionally, the scarcity of effective samples and conflicts in sample representation further complicate the training of robust DRL models. To address these limitations, this paper introduces a novel framework, Multi-Modal Deep Reinforcement Learning (MMDRL), which integrates deep reinforcement learning with multimodal learning to enhance the extraction and utilization of environmental information. We propose a knowledge-based sample augmentation technique to enrich the training dataset and improve the model's generalization capabilities. Furthermore, we conceptualize the perception of complex environmental information as a multi-agent cooperative problem, leveraging the Shapley value to optimize policy decisions by evaluating the contribution of each modality. This approach not only tackles the challenges of multimodal data integration and decision optimization in continuous action spaces but also reduces computational complexity through efficient approximation methods. Extensive experimental validations on benchmark environments such as MuJoCo and Atari demonstrate the effectiveness of our proposed method in enhancing the accuracy and efficiency of agent decision-making. These contributions advance the state of the art in DRL and provide practical solutions for complex decision-making tasks in real-world applications.

Keywords: Agent cooperation; Complex decision-making; Deep reinforcement learning; Multi-modal learning; Shapley value.
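
The abstract describes treating each modality as a cooperating agent, scoring its contribution with the Shapley value, and cutting the cost with an approximation. It does not say which approximation is used. The sketch below is therefore only a generic illustration, not the authors' method: it compares exact Shapley values, which require enumerating every coalition, against a standard permutation-sampling estimate. The modality names, the `toy_value` function, and both helper functions are hypothetical and chosen purely for illustration.

```python
import itertools
import math
import random


def exact_shapley(players, value):
    """Exact Shapley values; enumerates all coalitions, so cost is exponential."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that p joins: k!(n-k-1)!/n!
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def sampled_shapley(players, value, num_permutations=2000, seed=0):
    """Monte Carlo estimate: average marginal gains over random join orders."""
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = frozenset()
        previous = value(coalition)
        for p in order:
            coalition = coalition | {p}
            current = value(coalition)
            phi[p] += current - previous
            previous = current
    return {p: total / num_permutations for p, total in phi.items()}


def toy_value(coalition):
    """Hypothetical coalition value: additive scores plus an image+lidar bonus."""
    base = {"image": 3.0, "lidar": 2.0, "text": 1.0}
    total = sum(base[m] for m in coalition)
    if "image" in coalition and "lidar" in coalition:
        total += 1.0  # assumed synergy between the two modalities
    return total


modalities = ["image", "lidar", "text"]
exact = exact_shapley(modalities, toy_value)
approx = sampled_shapley(modalities, toy_value)
print("exact :", exact)
print("approx:", approx)
```

For this toy game the exact values are 3.5 for `image`, 2.5 for `lidar`, and 1.0 for `text`: the synergy bonus is split evenly between the two modalities that create it, and the values sum to the value of the full coalition. The sampled estimate evaluates `value` only `num_permutations * len(players)` times, which is the kind of saving that matters once the number of modalities makes full enumeration impractical.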