Systematic review of shoulder arthroplasty outcomes: what sample size is meaningful?

J Shoulder Elbow Surg. 2025 Jan 19:S1058-2746(25)00028-X. doi: 10.1016/j.jse.2024.11.029. Online ahead of print.

Abstract

Background: Shoulder arthroplasty is increasingly performed for shoulder conditions such as arthritis, rotator cuff arthropathy, and traumatic injuries. Registries and other compilations of patient data provide the opportunity to detect meaningful differences in outcomes between alternative techniques and implants. A wide range of outcome measurements are reported after shoulder arthroplasty, but the sample sizes needed to identify meaningful differences have not been studied systematically. This review systematically analyzes common clinical outcomes reported for shoulder arthroplasty and reports the sample sizes necessary to confirm a clinically meaningful and statistically significant difference for these outcome measures.

Methods: A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Included studies evaluated outcomes of anatomic or reverse total shoulder arthroplasty (TSA) with a minimum of 2 years of follow-up data. Outcome measures reported in 3 or more studies were combined to establish an overall mean and standard deviation for each measure. Using these combined measures, the sample size needed to detect clinically meaningful and statistically significant differences was calculated from published minimal clinically important differences (MCIDs), assuming a significance level of α = 0.05 and power of 0.80 (β = 0.20).

Results: A total of 43 studies (29 anatomic TSA and 13 reverse TSA) met inclusion criteria, comprising 84,503 shoulders. Outcome measures analyzed were revision rate, range of motion (ROM), patient-reported outcomes (PROs) (American Shoulder and Elbow Surgeons, Constant-Murley, Western Ontario Osteoarthritis of the Shoulder, Disabilities of the Arm, Shoulder, and Hand, Simple Shoulder Test, and visual analog scale scores), and complication rates. The sample size needed to detect meaningful differences was much lower for continuous measures (ROM and PROs) than for dichotomous outcomes (revisions and complications). For example, ROM outcomes required a minimum of 13 patients per treatment group to demonstrate a 15° change. PROs required a minimum of 6 patients to demonstrate a minimal clinically important difference. For a 20% difference between treatment groups, revision rates required a minimum of 8527 patients and total complications required a minimum of 1854.
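The gap between continuous and dichotomous outcomes can be illustrated with the standard normal-approximation formulas for a two-sided, two-group comparison. The abstract does not state which formulas or software the authors used, so the following is only a minimal sketch under that assumption. The function names and the example inputs (a standard deviation of 20°, event rates of 5% versus 4%) are hypothetical and are not taken from the review.

```python
from math import ceil
from statistics import NormalDist


def z_sum(alpha=0.05, power=0.80):
    """Return z_(1 - alpha/2) + z_(power) for a two-sided test."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def n_continuous(delta, sd, alpha=0.05, power=0.80):
    """Patients per group to detect a mean difference `delta`,
    assuming both groups share the standard deviation `sd`."""
    return ceil(2 * (z_sum(alpha, power) * sd / delta) ** 2)


def n_binary(p1, p2, alpha=0.05, power=0.80):
    """Patients per group to detect event rates p1 vs p2,
    using the unpooled-variance normal approximation."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z_sum(alpha, power) ** 2 * variance / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only (not values from the review).
    print("continuous, 15 degree difference, SD 20:", n_continuous(15, 20))
    print("binary, 5% vs 4% event rate:", n_binary(0.05, 0.04))
```

With rare events, the absolute difference between two rates is small relative to the binomial variance, so the required group size grows with the inverse square of that difference. This is why the dichotomous outcomes call for thousands of patients while the continuous ones call for a few dozen or fewer.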

Discussion: Large patient databases provide the opportunity to improve patient outcomes based on measured evidence. However, comparative studies need to use appropriate outcome measures with adequate sample sizes to provide meaningful results. This review has identified the minimum sample size needed to provide clinically meaningful conclusions for various outcomes reported in studies. Binary outcomes (revision rate and complication rate) are less sensitive and require larger sample sizes, whereas continuous outcomes (ROM and PROs) require smaller sample sizes. This should be considered when establishing patient registries, publishing results, and analyzing studies on shoulder arthroplasty.

Keywords: Total shoulder arthroplasty; complications; outcome measures; patient-reported outcomes; power analysis; range of motion; sensitivity analysis.