Optimized customer churn prediction using tabular generative adversarial network (GAN)-based hybrid sampling method and cost-sensitive learning

I Nyoman Mahayasa Adiputra; Paweena Wanchai; Pei-Chun Lin

doi:10.7717/peerj-cs.2949

PeerJ Comput Sci. 2025 Jun 19:11:e2949. doi: 10.7717/peerj-cs.2949. eCollection 2025.

Authors

I Nyoman Mahayasa Adiputra¹, Paweena Wanchai¹, Pei-Chun Lin²

Affiliations

¹ College of Computing, Khon Kaen University, Khon Kaen, Thailand.
² Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan.

Abstract

Background: Imbalanced and overlapped data in customer churn prediction significantly impact classification results. Various sampling and hybrid sampling methods have demonstrated effectiveness in addressing these issues. However, these methods have not performed well with classical machine learning algorithms.

Methods: To improve the performance of classical machine learning on customer churn prediction tasks, this study introduces CostLearnGAN, an extension framework that combines a tabular generative adversarial network (GAN)-based hybrid sampling method with cost-sensitive learning. By adopting a cost-sensitive learning perspective, this research aims to enhance the performance of several classical machine learning algorithms in customer churn prediction tasks. According to the experimental results, classical machine learning algorithms exhibit shorter execution times, making them suitable for predicting churn in large customer bases.
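To make the cost-sensitive idea concrete, the following is a minimal sketch of one common cost-sensitive decision rule: predict churn whenever the expected cost of missing a churner exceeds the expected cost of a false alarm. It is not the CostLearnGAN implementation, which the abstract does not detail. The function names and the cost values `cost_fn` and `cost_fp` are assumptions chosen for illustration.

```python
# Illustrative sketch only: a generic cost-sensitive decision rule,
# not the procedure used in CostLearnGAN.

def min_cost_threshold(cost_fn, cost_fp):
    """Return the churn-probability threshold that minimizes expected cost.

    Predicting "churn" is cheaper when p * cost_fn >= (1 - p) * cost_fp,
    which rearranges to p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def predict_cost_sensitive(churn_probs, cost_fn, cost_fp):
    """Label each customer 1 (churn) or 0 (stay) using the cost-based threshold."""
    threshold = min_cost_threshold(cost_fn, cost_fp)
    return [1 if p >= threshold else 0 for p in churn_probs]


if __name__ == "__main__":
    # Hypothetical classifier outputs for three customers.
    probs = [0.1, 0.2, 0.6]
    # Assumed costs: a missed churner is five times as costly as a false alarm.
    print(predict_cost_sensitive(probs, cost_fn=5.0, cost_fp=1.0))
    # Equal costs recover the usual 0.5 threshold.
    print(predict_cost_sensitive(probs, cost_fn=1.0, cost_fp=1.0))
```

Raising the cost of a false negative lowers the threshold, so more borderline customers are flagged as likely churners. This is the usual motivation for cost-sensitive learning on imbalanced churn data.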

Results: This study ran experiments with six comparative sampling methods, six datasets, and three machine learning algorithms. CostLearnGAN performed well across all evaluation metrics, with an average mean rank of 1.44. The study also reports a robustness measurement for the algorithms, showing that CostLearnGAN outperforms the other sampling methods in improving the performance of classical machine learning models, with an average robustness value of 5.68.
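As a rough illustration of how a mean rank score can be computed, the sketch below ranks each method on every dataset (rank 1 is best, ties share the average rank) and then averages those ranks per method. The abstract does not state the exact ranking procedure, so this convention is an assumption. The method names and scores are made up and are not results from the study.

```python
# Illustrative sketch only: a common mean-rank convention, assumed here
# because the abstract does not specify the procedure.

def ranks_desc(scores):
    """Rank scores so the highest gets rank 1; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j to cover the run of scores tied with position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mean_ranks(score_table):
    """Map {method: [score per dataset]} to {method: mean rank across datasets}."""
    methods = list(score_table)
    n_datasets = len(score_table[methods[0]])
    totals = {m: 0.0 for m in methods}
    for d in range(n_datasets):
        dataset_ranks = ranks_desc([score_table[m][d] for m in methods])
        for m, r in zip(methods, dataset_ranks):
            totals[m] += r
    return {m: totals[m] / n_datasets for m in methods}


if __name__ == "__main__":
    # Hypothetical scores for three methods on three datasets.
    table = {
        "method_a": [0.9, 0.8, 0.7],
        "method_b": [0.8, 0.9, 0.6],
        "method_c": [0.7, 0.7, 0.6],
    }
    print(mean_ranks(table))
```

Under this convention a mean rank close to 1 means the method was the best or near-best performer on most datasets and metrics.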

Keywords: Cost-sensitive learning; Customer churn prediction; GAN-based hybrid sampling method.