Few-shot event-based action recognition

Neural Netw. 2025 Jun 21;191:107750. doi: 10.1016/j.neunet.2025.107750. Online ahead of print.

Abstract

Event cameras offer clear advantages for practical vision applications (e.g., action recognition) owing to their distinctive sensing mechanism, yet existing event-based action recognition methods rely heavily on large-scale training data. Unfortunately, the high cost of camera deployment and the requirement of data privacy protection make it challenging to collect substantial data in real-world scenarios. To address this limitation, we explore a novel yet practical task, Few-Shot Event-Based Action Recognition (FSEAR), which aims to train a model on only a minimal number of difficult-to-handle event action samples and to accurately classify unlabeled data into specific categories. Accordingly, we design a new framework for FSEAR comprising a Noise-Aware Event Encoder (NAE) and a Distilled Prototypical Distance Fusion (DPDF). The former efficiently filters noise within the spatiotemporal domain while retaining information vital to action timing. The latter performs multi-scale measurements across geometric, directional, and distributional dimensions. The two modules are mutually beneficial and thus effectively exploit the latent characteristics of event data. Extensive experiments on four distinct event action recognition datasets demonstrate the significant advantages of our model over other few-shot learning methods. Our code and models will be publicly released.
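The abstract gives no implementation details for DPDF, so the following minimal sketch only illustrates the general idea of fusing distances to class prototypes along geometric (Euclidean), directional (cosine), and distributional (KL-divergence) axes, as in standard prototypical few-shot classification. The function name, the fixed fusion weights, and the specific distance terms are assumptions for illustration, not the authors' actual DPDF module.

    import torch
    import torch.nn.functional as F

    def fused_prototype_distance(query, prototypes, w=(1.0, 1.0, 1.0)):
        """Hypothetical fusion of three distances between a query embedding
        (shape (D,)) and class prototypes (shape (C, D)). The weights `w`
        and the choice of terms are illustrative assumptions only."""
        q = query.unsqueeze(0)                             # (1, D)
        # Geometric term: Euclidean distance to each prototype, (C,)
        d_geo = torch.cdist(q, prototypes).squeeze(0)
        # Directional term: 1 - cosine similarity, (C,)
        d_dir = 1.0 - F.cosine_similarity(q, prototypes)
        # Distributional term: KL(prototype_dist || query_dist) between
        # softmax-normalized feature vectors, (C,)
        log_q = F.log_softmax(q, dim=-1).expand_as(prototypes)
        p = F.softmax(prototypes, dim=-1)
        d_dist = F.kl_div(log_q, p, reduction='none').sum(-1)
        return w[0] * d_geo + w[1] * d_dir + w[2] * d_dist

    # Toy usage: a 5-way episode with 64-dim embeddings; the query is
    # assigned to the class whose prototype has the smallest fused distance.
    protos = torch.randn(5, 64)   # class prototypes from the support set
    q = torch.randn(64)           # query embedding
    pred = fused_prototype_distance(q, protos).argmin().item()

In practice the fusion weights would likely be learned rather than fixed, and the paper's "distilled" component presumably adds further structure; this sketch covers only the multi-distance measurement idea named in the abstract.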

Keywords: Action recognition; Event camera; Few-shot learning.