Increasing Rigour in Online Health Surveys Through the Reduction of Fraudulent Data

Wen Zhi Ng; Sundarimaa Erdembileg; Jean Cj Liu; Joseph D Tucker; Rayner Kay Jin Tan

doi:10.2196/68092

J Med Internet Res. 2025 Jun 26. doi: 10.2196/68092. Online ahead of print.

Authors

Wen Zhi Ng¹, Sundarimaa Erdembileg¹,², Jean Cj Liu², Joseph D Tucker³,⁴, Rayner Kay Jin Tan¹

Affiliations

¹ Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, 12 Science Drive 2, #10-01, Singapore, SG.
² Yale-NUS College, National University of Singapore, Singapore, SG.
³ London School of Hygiene and Tropical Medicine, London, GB.
⁴ UNC School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, US.

PMID: 40570901

Abstract

Online surveys have become a key tool of modern health research, offering a fast, cost-effective, and convenient means of data collection. They enable researchers to reach diverse populations, including those underrepresented in traditional studies, and the greater anonymity they offer facilitates the collection of data on stigmatized or sensitive behaviours. However, the ease of participation also introduces significant challenges, particularly around data integrity and rigour. As fraudulent responses (whether from bots, repeat responders, or individuals misrepresenting themselves) become more sophisticated and pervasive, ensuring the rigour of online surveys has never been more crucial. This article provides a comprehensive synthesis of practical strategies that increase the rigour of online surveys through the detection and removal of fraudulent data. Drawing on recent literature and case studies, we outline several options that span the full research cycle, from pre-data collection strategies to post-data collection validation. For survey design, we emphasize the integration of automated screening techniques (e.g. CAPTCHAs, honeypot questions) and attention checks (e.g. trap questions). Robust recruitment procedures (e.g. concealed eligibility criteria, two-stage screening) and an appropriate incentive or compensation structure can also help to deter fraudulent participation. We examine the merits and limitations of different sampling methodologies, including river sampling, online panels, and crowdsourcing platforms, and offer guidance on selecting samples based on specific research objectives. For the post-data collection stage, we discuss metadata-based techniques to detect fraudulent data (e.g. duplicate email or IP addresses, response time analysis), alongside methods to better screen for low-quality responses (e.g. inconsistent response patterns, improbable qualitative responses).
The escalating sophistication of fraud tactics, particularly with the growth of artificial intelligence, demands that researchers continuously adapt and stay vigilant. We propose the use of dynamic protocols that combine multiple strategies into a multi-pronged approach, which can better filter out fraudulent data and evolve with the types of responses received over the course of data collection. However, these strategies still have significant room to develop, and their refinement should be a key focus of future research. As online surveys become increasingly integral to health research, investing in robust strategies to screen for fraudulent data and increase the rigour of studies is key to upholding scientific integrity.
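
As an illustration only, the post-data collection checks named in the abstract (duplicate email or IP addresses, response time analysis) and the honeypot question could be combined along the following lines. This is a minimal sketch and not the authors' procedure: the function name, the record fields (`id`, `email`, `ip`, `seconds`, `honeypot`) and the speed cut-off of one third of the median completion time are all assumptions made for the example. Flags mark responses for manual review rather than automatic exclusion, since shared IP addresses can be legitimate (e.g. households or institutions).

```python
from collections import Counter
from statistics import median


def flag_suspect_responses(responses, min_time_fraction=0.33):
    """Return {response_id: [reasons]} for responses that warrant manual review.

    `responses` is a list of dicts with hypothetical keys:
    id, email, ip, seconds (completion time), honeypot (hidden field value).
    `min_time_fraction` is an assumed cut-off, not a published threshold.
    """
    # Normalise emails so trivial case/whitespace variants count as duplicates.
    email_counts = Counter(r["email"].strip().lower() for r in responses)
    ip_counts = Counter(r["ip"] for r in responses)

    # Response-time check: flag completions far faster than the median.
    threshold = min_time_fraction * median(r["seconds"] for r in responses)

    flags = {}
    for r in responses:
        reasons = []
        if email_counts[r["email"].strip().lower()] > 1:
            reasons.append("duplicate_email")
        if ip_counts[r["ip"]] > 1:
            reasons.append("duplicate_ip")
        if r["seconds"] < threshold:
            reasons.append("too_fast")
        # A honeypot field is hidden from human respondents, so any value
        # in it suggests automated form filling.
        if r.get("honeypot"):
            reasons.append("honeypot_filled")
        if reasons:
            flags[r["id"]] = reasons
    return flags
```

Combining several weak signals in this way mirrors the multi-pronged approach the abstract proposes; in practice the thresholds would need to be tuned to the survey's length and revisited as data collection proceeds.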