Deep learning methods have achieved remarkable progress in network intrusion detection. However, their performance often deteriorates significantly in real-world scenarios where attack samples are scarce and domain shifts are substantial. To address this challenge, we propose a few-shot intrusion detection method that integrates multi-domain feature fusion with a bidirectional cross-attention mechanism. Specifically, the method adopts a dual-branch feature extractor to jointly capture spatial-domain and frequency-domain characteristics of network traffic. The frequency-domain features are obtained via the two-dimensional discrete cosine transform (2D-DCT), which highlights spectral structure and improves feature discriminability. To bridge the semantic gap between support and query samples under few-shot conditions, we design a dual-domain bidirectional cross-attention module that performs task-specific alignment of support and query features in both the spatial and frequency domains. Additionally, we introduce a hierarchical feature encoding module based on a modified Mamba architecture, which leverages state space modeling to capture long-range dependencies and temporal patterns in traffic sequences. Extensive experiments on two benchmark datasets, CICIDS2017 and CICIDS2018, show that the proposed method achieves accuracies of 99.03% and 98.64%, respectively, under the 10-shot setting, outperforming state-of-the-art methods. Moreover, the method generalizes well across domains, with accuracy exceeding 95.13% in cross-domain scenarios, which indicates its robustness and practical applicability in real-world, dynamic network environments.
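To make the frequency-domain branch concrete, the following is a minimal sketch of a separable, orthonormal 2D DCT-II applied to a small matrix. It is an illustration only, not the authors' implementation: the function names `dct_1d` and `dct_2d`, the 4x4 input size, and the sample values are assumptions chosen for readability, and the abstract does not state how traffic is arranged into a two-dimensional array or which DCT normalization is used.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence (direct O(n^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        # Orthonormal scaling: the DC term uses sqrt(1/n), the rest sqrt(2/n).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(matrix):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [column][frequency]; transpose back to [row][column].
    return [list(r) for r in zip(*cols)]


def energy(matrix):
    """Sum of squared entries, used to check that the transform is orthonormal."""
    return sum(v * v for row in matrix for v in row)


# Hypothetical 4x4 "traffic image" (values are made up for illustration).
traffic = [
    [0.0, 0.2, 0.4, 0.6],
    [0.1, 0.3, 0.5, 0.7],
    [0.9, 0.8, 0.7, 0.6],
    [0.5, 0.5, 0.5, 0.5],
]

freq = dct_2d(traffic)

# Coefficient [0][0] is the DC term; higher indices hold finer variation.
for row in freq:
    print(" ".join(f"{v:8.4f}" for v in row))
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the input, so the frequency-domain view rearranges the information rather than discarding it. In practice a library routine such as an FFT-based DCT would replace the direct summation used here.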
Copyright: © 2025 Xu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.