We developed datasets intended to support health and epidemiological studies in Canada by providing highly resolved spatiotemporal concentrations of regulated air pollutants (fine particulate matter PM2.5, nitrogen dioxide NO2, and ozone O3) across the country. Daily estimates were generated at several spatial resolutions for the years 2000 through 2020. The datasets are based on simulations with the US EPA's Community Multiscale Air Quality (CMAQ) model at 12 km horizontal resolution. To increase the accuracy and spatial resolution of the exposure estimates, especially in complex urban environments, we used statistical and machine learning (ML) methods to downscale the CMAQ outputs to finer resolutions. The downscaling relies on raw CMAQ results, high-resolution land-use datasets, existing concentration datasets, and observations from the National Air Pollution Surveillance (NAPS) network. We selected two widely used ML algorithms, random forest and gradient boosting, and both gave promising results. The datasets generated at the various spatial resolutions (census divisions, postal codes, and 12 km and 1 km grids) showed adequate statistical performance and clearly represented the spatial features associated with pollutant emissions.
Keywords: Air quality modelling; CMAQ; Hybrid modelling; Machine learning; Nitrogen dioxide; Ozone; PM2.5.
© 2025 The Authors.
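
The downscaling idea summarized in the abstract (correcting coarse model output with finer-scale covariates, trained against station observations) can be illustrated with a toy sketch. The example below is not the authors' implementation: it is a minimal gradient-boosting regressor built from one-split regression stumps in pure Python, and all data are synthetic. The feature names (a coarse CMAQ NO2 value and a road-density index), the coefficients used to generate the "observations", and the hyperparameters are assumptions chosen only for illustration.

```python
import random
from statistics import mean


def fit_stump(X, residuals):
    """Return the one-split stump (feature, threshold, left, right) with lowest squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = mean(left), mean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]


def stump_predict(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value


def fit_boosting(X, y, n_rounds=60, learning_rate=0.3):
    """Gradient boosting for squared error: each stump is fit to the current residuals."""
    base = mean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


def rmse(pred, obs):
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5


# Synthetic "monitoring sites" (hypothetical, NOT NAPS data): each has a coarse
# model value and a local land-use covariate; the "observation" depends on both.
rng = random.Random(0)
X, y = [], []
for _ in range(150):
    cmaq_no2 = rng.uniform(2.0, 30.0)    # coarse-grid model value (assumed units: ppb)
    road_density = rng.uniform(0.0, 1.0) # assumed fine-scale land-use index
    X.append([cmaq_no2, road_density])
    y.append(0.8 * cmaq_no2 + 10.0 * road_density + rng.gauss(0.0, 1.0))

X_train, y_train = X[:100], y[:100]
X_test, y_test = X[100:], y[100:]

model = fit_boosting(X_train, y_train)
rmse_raw = rmse([row[0] for row in X_test], y_test)              # raw coarse value as estimate
rmse_ml = rmse([predict(model, row) for row in X_test], y_test)  # boosted correction
print(f"RMSE raw model: {rmse_raw:.2f}  RMSE after ML correction: {rmse_ml:.2f}")
```

In practice one would more likely use an established library implementation of random forest or gradient boosting, with proper cross-validation against held-out stations; the hand-rolled version here only keeps the sketch dependency-free.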