Background: We assessed whether existing epidemiologic/etiologic studies can be supplemented with data on treatment and clinical outcomes by linking them to publicly available cancer registry and administrative databases.
Methods: Medical records were retrieved and abstracted for cases enrolled in a Los Angeles County case-control study of non-Hodgkin lymphoma (NHL). Cases were linked to the Los Angeles County cancer registry (Cancer Surveillance Program, CSP), the California state hospitalization discharge database (Office of Statewide Health Planning and Development, OSHPD), and the SEER-Medicare database. We assessed the sensitivity, specificity, and positive predictive value (PPV) of cancer treatment information in the linked databases, using medical record abstraction as the reference standard.
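The three performance measures named in the Methods can be illustrated with a minimal sketch, treating the abstracted medical record as the reference standard and a linked database as the source under evaluation. The function name `validate`, the `Performance` type, and the toy data are hypothetical, chosen for illustration only; they are not the study's code or data.

```python
import math
from typing import Iterable, NamedTuple


class Performance(NamedTuple):
    sensitivity: float
    specificity: float
    ppv: float


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def validate(reference: Iterable[bool], database: Iterable[bool]) -> Performance:
    """Compare a database treatment flag with the reference standard, case by case."""
    tp = fp = fn = tn = 0
    for ref, db in zip(reference, database):
        if ref and db:
            tp += 1  # treatment in the record and in the database
        elif db:
            fp += 1  # database reports treatment the record does not
        elif ref:
            fn += 1  # database misses a treatment found in the record
        else:
            tn += 1  # neither source reports the treatment
    return Performance(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
    )


# Hypothetical toy data (NOT study data): one entry per case.
medical_record = [True, True, True, True, False, False, False, False]
registry_flag = [True, True, True, False, True, False, False, False]

result = validate(medical_record, registry_flag)
print(result)
```

In this toy example each measure is 3/4, because the hypothetical database misses one treated case and flags one untreated case.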
Results: We successfully retrieved medical records for 918 of 1,004 participating NHL cases and abstracted treatment for 698. We linked 59% of cases (96% of cases >65 years old) to SEER-Medicare and 96% to OSHPD. Chemotherapy was the most common treatment and the best captured, with the highest sensitivity in SEER-Medicare (80%) and CSP (74%); combining all three data sources increased sensitivity (92%) at the cost of reduced specificity (56%). Sensitivity for radiotherapy was moderate: 77% with aggregated data. Sensitivity for bone marrow transplant (BMT) was low in the CSP (42%) but high in the administrative databases, especially OSHPD (98%). Sensitivity for surgery reached 83% when all three datasets were considered in aggregate, but PPV was 60%. In general, sensitivity and PPV for chronic lymphocytic leukemia/small lymphocytic lymphoma were low.
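The trade-off seen when sources are aggregated can be sketched as follows, assuming a case counts as treated if any linked source reports the treatment (an "any source" rule; the abstract does not state the exact combination rule, so this is an assumption). The three sources and all values below are hypothetical toy data, not study data.

```python
from typing import Sequence, Tuple


def sens_spec(reference: Sequence[bool], flagged: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `flagged` against `reference`."""
    tp = sum(1 for r, f in zip(reference, flagged) if r and f)
    tn = sum(1 for r, f in zip(reference, flagged) if not r and not f)
    positives = sum(1 for r in reference if r)
    negatives = len(reference) - positives
    return tp / positives, tn / negatives


# Hypothetical toy data (NOT study data): one entry per case.
medical_record = [True, True, True, True, False, False, False, False]
source_a = [True, True, False, False, True, False, False, False]
source_b = [False, True, True, False, False, True, False, False]
source_c = [False, False, False, True, False, False, False, False]

# Assumed rule: treated if ANY source reports the treatment.
combined = [a or b or c for a, b, c in zip(source_a, source_b, source_c)]

single = sens_spec(medical_record, source_a)
aggregate = sens_spec(medical_record, combined)
print("source_a:", single)
print("combined:", aggregate)
```

Under an "any source" rule, the combined flag is positive wherever a single source is, so sensitivity cannot fall and specificity cannot rise relative to any one source.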
Conclusions: Chemotherapy was accurately captured by all data sources. Hospitalization data yielded the highest performance values for BMTs. Performance measures for radiotherapy and surgery were moderate.
Impact: Various administrative databases can supplement epidemiologic studies, depending on treatment type and NHL subtype of interest.
©2020 American Association for Cancer Research.