BACKGROUND: Large automated electronic health records (EHRs), if brought together in a federated data model, have the potential to serve as valuable population-based tools for studying the patterns and effectiveness of treatment. The Indiana Network for Patient Care (INPC) is a unique federated EHR data repository that contains data collected from a large population across various health care settings throughout the state of Indiana. The INPC clinical data environment allows quick access to, and extraction of, information from medical charts. The purpose of this project was to evaluate 2 methods of record linkage between the Indiana State Cancer Registry (ISCR) and the INPC, to determine the match rate for linkage between ISCR and INPC data for patients diagnosed with cancer, and to assess the completeness of the ISCR based on additional validated cancer cases identified in the INPC EHRs.
METHODS: Deterministic and probabilistic algorithms were applied to link ISCR cases to the INPC. The linkage results were validated by manual review, and accuracy was assessed using positive predictive value (PPV). Medical charts of melanoma and lung cancer cases identified in the INPC but not linked to the ISCR were manually reviewed to identify true incident cancers missed by the ISCR, from which the completeness of the ISCR was estimated for each cancer.
RESULTS: Both the deterministic and the probabilistic approach to linking the ISCR and the INPC had an extremely high PPV (>99%) for identifying true matches, both for the overall cohort and for each subcohort. The combined match rate for melanoma and lung cancer cases identified in the ISCR that matched to any patient occurrence in the INPC (not by disease) was 85.5% for the complete cohort, 94.4% for melanoma, and 84.4% for lung cancer. The estimated completeness of capture by the ISCR was 84% for melanoma and 98% for lung cancer.
CONCLUSION: Cancer registries can be successfully linked, with a high match rate, to patients’ EHR data from institutions participating in a regional health information organization (RHIO). A pragmatic approach to data linkage may apply deterministic and probabilistic methods together to serve the diverse purposes of cancer control research. The RHIO has the potential to add value to the state cancer registry through the identification of additional true incident cases, but more advanced approaches, such as natural language processing, are needed.
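ILLUSTRATIVE SKETCH: The abstract does not specify the matching fields, weights, or thresholds used, so the following Python is only a minimal sketch of how a deterministic rule and a probabilistic score might be combined, and how PPV and completeness could be computed. Every name here is hypothetical: the `Record` fields, the exact-match key, the m/u probabilities in `FIELD_WEIGHTS` (in the style of Fellegi-Sunter agreement weights), the threshold, and the completeness formula (registry cases divided by registry cases plus validated missed cases) are assumptions for illustration, not the study's actual method.

```python
from dataclasses import dataclass
from math import log2


@dataclass(frozen=True)
class Record:
    """Hypothetical patient identifiers; the real linkage fields are not stated."""
    ssn: str
    last_name: str
    first_name: str
    birth_date: str  # ISO format, e.g. "1950-03-14"
    sex: str


def deterministic_match(a: Record, b: Record) -> bool:
    """Assumed deterministic rule: exact agreement on SSN and birth date."""
    return bool(a.ssn) and a.ssn == b.ssn and a.birth_date == b.birth_date


# Assumed (m, u) pairs: m = P(field agrees | true match),
# u = P(field agrees | non-match). Values are made up for illustration.
FIELD_WEIGHTS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "birth_date": (0.97, 0.003),
    "sex": (0.99, 0.5),
}


def probabilistic_score(a: Record, b: Record) -> float:
    """Sum of log2 agreement or disagreement weights over the compared fields."""
    score = 0.0
    for field, (m, u) in FIELD_WEIGHTS.items():
        if getattr(a, field) == getattr(b, field):
            score += log2(m / u)
        else:
            score += log2((1 - m) / (1 - u))
    return score


def is_link(a: Record, b: Record, threshold: float = 10.0) -> bool:
    """Accept a pair if either approach accepts it (threshold is an assumption)."""
    return deterministic_match(a, b) or probabilistic_score(a, b) >= threshold


def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value of the links confirmed by manual review."""
    return true_positives / (true_positives + false_positives)


def completeness(registry_cases: int, missed_cases: int) -> float:
    """Assumed estimator: share of all validated cases captured by the registry."""
    return registry_cases / (registry_cases + missed_cases)
```

Accepting a pair when either rule fires mirrors the pragmatic combination suggested in the conclusion: the deterministic key catches records with reliable identifiers, while the score can still link records in which an identifier is missing or a name is recorded differently.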