nf-core/pacvar: a pipeline for analyzing long-read PacBio whole genome and repeat expansion sequencing data

Tanya Jain; Claire Clelland

doi:10.1093/bioinformatics/btaf116

Bioinformatics. 2025 Mar 29;41(4):btaf116. doi: 10.1093/bioinformatics/btaf116.

Authors

Tanya Jain¹, Claire Clelland¹,²

Affiliations

¹ Weill Institute for Neurosciences, University of California, San Francisco, CA, 94158, United States.
² Department of Neurology, Memory & Aging Center, University of California, San Francisco, CA, 94158, United States.

Abstract

Motivation: Pacific Biosciences (PacBio) single-molecule, long-read sequencing enables whole-genome annotation and, through the PureTarget panel, the characterization of 20 complex repeat regions that are especially relevant to neurodegenerative diseases. Long-read whole-genome sequencing (WGS) also allows the detection of structural variants that are difficult to identify with traditional short-read sequencing. However, the raw, unaligned Binary Alignment Map (BAM) data must be processed before analysis, and an intuitive, comprehensive bioinformatic pipeline for these data is needed.

Results: We present nf-core/pacvar, a comprehensive pipeline for analyzing both PacBio single-molecule PureTarget and WGS data. It demultiplexes samples and parallelizes pre-processing, variant calling, and repeat characterization. nf-core/pacvar requires little configuration and has few dependencies, enabling rapid, end-to-end parallel processing of PacBio single-molecule whole-genome and targeted repeat expansion sequencing data.
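The demultiplex-then-parallelize design described in the Results can be illustrated with a small conceptual sketch. This is not nf-core/pacvar's implementation (the pipeline itself is written in Nextflow, which schedules per-sample tasks); here a Python thread pool stands in for that scheduler. The `Read` type, the barcode-to-sample map, and the stage functions `preprocess`, `call_variants`, and `characterize_repeats` are hypothetical placeholders, not real tools or pipeline parameters.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    barcode: str   # hypothetical: barcode already assigned to the read
    sequence: str


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample; reads with an unrecognized barcode are dropped."""
    by_sample = defaultdict(list)
    for read in reads:
        sample = barcode_to_sample.get(read.barcode)
        if sample is not None:
            by_sample[sample].append(read)
    return dict(by_sample)


# Hypothetical placeholder stages; each stands in for a real analysis step.
def preprocess(reads):
    return [r for r in reads if r.sequence]  # e.g. drop empty reads


def call_variants(reads):
    return {"n_reads": len(reads)}


def characterize_repeats(reads):
    return {"max_len": max((len(r.sequence) for r in reads), default=0)}


def process_sample(item):
    """Run the per-sample chain: pre-processing, then downstream analyses."""
    sample, reads = item
    clean = preprocess(reads)
    return sample, {
        "variants": call_variants(clean),
        "repeats": characterize_repeats(clean),
    }


def run(reads, barcode_to_sample, workers=4):
    """Demultiplex once, then process every sample independently in parallel."""
    per_sample = demultiplex(reads, barcode_to_sample)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_sample, per_sample.items()))


# Usage with toy data (barcodes and sequences are made up).
reads = [
    Read("bc01", "CAGCAGCAG"),
    Read("bc02", "GGGGCC"),
    Read("bc01", ""),
    Read("bc99", "ACGT"),  # unknown barcode, discarded during demultiplexing
]
results = run(reads, {"bc01": "sampleA", "bc02": "sampleB"})
```

Because samples share no state after demultiplexing, each per-sample chain can run concurrently; this independence is what makes the parallel fan-out safe.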

Availability and implementation: nf-core/pacvar is available on the nf-core website (https://nf-co.re/pacvar/) and on GitHub (https://github.com/nf-core/pacvar) under the MIT License (DOI: 10.5281/zenodo.14813048).

MeSH terms

Computational Biology / methods
DNA Repeat Expansion*
Genome, Human
Genomics* / methods
High-Throughput Nucleotide Sequencing* / methods
Humans
Sequence Analysis, DNA* / methods
Software*
Whole Genome Sequencing* / methods
