Motivation: Interest in text mining of full-text biomedical research articles is growing. To facilitate automated processing of nearly 3 million full-text articles (in the PubMed Central® Open Access and Author Manuscript subsets) and to improve interoperability, we convert these articles to BioC, a simple, community-driven data format, available in either XML or JavaScript Object Notation (JSON), for conveniently sharing text and annotations.
Results: The resulting articles can be downloaded either via File Transfer Protocol (FTP) for bulk access or via a Web API for updates or a more focused collection. Since the Web API became available in 2017, our BioC collection has been widely used by the research community.
Availability and implementation: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/BioC-PMC/.
© Published by Oxford University Press 2019. This work is written by a US Government employee and is in the public domain in the US.
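As an illustration of the Web API access described under Results, the following is a minimal sketch rather than the service's reference client. The base URL, the `BioC_<format>/<ID>/<encoding>` path pattern, and the JSON keys (`documents`, `passages`, `offset`, `text`) are assumptions that should be checked against the documentation at the address given above. The helper names `build_bioc_url`, `fetch_bioc_json`, and `iter_passages` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed endpoint; verify against the service documentation before use.
BASE_URL = "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi"


def build_bioc_url(article_id, fmt="json", encoding="unicode"):
    """Build a request URL for one article (assumed path pattern)."""
    if fmt not in ("xml", "json"):
        raise ValueError("fmt must be 'xml' or 'json'")
    if encoding not in ("unicode", "ascii"):
        raise ValueError("encoding must be 'unicode' or 'ascii'")
    # Percent-encode the identifier so it is safe as a single path segment.
    safe_id = quote(str(article_id), safe="")
    return f"{BASE_URL}/BioC_{fmt}/{safe_id}/{encoding}"


def fetch_bioc_json(article_id, timeout=30):
    """Download one article as parsed BioC JSON (requires network access)."""
    with urlopen(build_bioc_url(article_id, "json"), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iter_passages(collection):
    """Yield (document id, offset, text) for every passage.

    Accepts a single BioC collection dict or a list of them, since the
    exact top-level shape of the response is an assumption here.
    """
    collections = collection if isinstance(collection, list) else [collection]
    for coll in collections:
        for doc in coll.get("documents", []):
            for passage in doc.get("passages", []):
                yield doc.get("id"), passage.get("offset", 0), passage.get("text", "")
```

A typical use would be `for doc_id, offset, text in iter_passages(fetch_bioc_json(some_id)): ...`, where `some_id` is a PubMed or PubMed Central identifier from the Open Access or Author Manuscript subsets. Keeping URL construction and parsing separate from the network call lets the parsing logic be exercised on locally stored files from the bulk FTP download as well.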