Comparative analysis of RAD-seq methods for SNP discovery and genetic diversity assessment in oil seed crop safflower

Sci Rep. 2025 Jul 2;15(1):22600. doi: 10.1038/s41598-025-06706-2.

Abstract

Safflower (Carthamus tinctorius L.) is an important oilseed crop with diverse uses and potential for genetic improvement. This study aimed to optimize genotyping-by-sequencing (GBS) for safflower using in silico and in vitro methods, comparing two restriction site-associated DNA sequencing (RAD-seq) approaches, single-digest RAD sequencing (sdRAD-seq) and double-digest RAD sequencing (ddRAD-seq), across three restriction enzyme combinations (ApeKI, NlaIII_MseI, and EcoRI_MseI). Forty-two safflower accessions were selected for the study. In silico testing revealed that NlaIII_MseI generated the largest number of DNA fragments, followed by ApeKI and EcoRI_MseI. The in vitro results showed that ddRAD-seq outperformed sdRAD-seq in raw read count, alignment rate, depth and breadth of coverage, and SNP detection. An alignment-free analysis based on k-mer counting and sketch-based genetic distance further confirmed the superiority of ddRAD-seq. Gene-level k-mer validation identified more core genes in the ddRAD-seq data. Variant calling yielded 6,721, 173,212, and 221,805 single-nucleotide polymorphisms (SNPs) for ApeKI, NlaIII_MseI, and EcoRI_MseI, respectively. SNP annotation and distribution analysis revealed that EcoRI_MseI captured more SNPs with fewer missing observations. Principal component analysis of the ddRAD-seq data explained 30.29% and 33.98% of the total genetic variation for NlaIII_MseI and EcoRI_MseI, respectively. This study demonstrated that ddRAD-seq with the EcoRI_MseI enzyme combination is the most suitable GBS approach for genome sampling and SNP genotyping in safflower.
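
The in silico step mentioned in the abstract can be illustrated with a minimal Python sketch of restriction digestion. This is not the authors' pipeline, which the abstract does not describe; the function names (`cut_positions`, `digest`) and the `require_both` option are hypothetical and introduced only for illustration. The recognition sites used are the standard ones for these enzymes (ApeKI G^CWGC, NlaIII CATG^, MseI T^TAA, EcoRI G^AATTC). The sketch scans one strand only, which is sufficient here because all four sites are palindromic.

```python
import re

# Recognition site (as a regex) and top-strand cut offset within the site.
ENZYMES = {
    "ApeKI": ("GC[AT]GC", 1),  # G^CWGC, W = A or T
    "NlaIII": ("CATG", 4),     # CATG^
    "MseI": ("TTAA", 1),       # T^TAA
    "EcoRI": ("GAATTC", 1),    # G^AATTC
}


def cut_positions(seq, enzyme):
    """Return (cut_index, enzyme) for every site, including overlapping ones."""
    pattern, offset = ENZYMES[enzyme]
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return [(m.start() + offset, enzyme)
            for m in re.finditer(f"(?={pattern})", seq.upper())]


def digest(seq, enzymes, require_both=False):
    """Return fragment lengths after cutting seq with the given enzymes.

    With require_both=True, keep only fragments flanked by two different
    enzymes (an assumed stand-in for the ddRAD-seq adapter-ligation rule).
    """
    cuts = sorted(c for e in enzymes for c in cut_positions(seq, e))
    # Sequence ends are boundaries that are not produced by any enzyme.
    bounds = [(0, None)] + cuts + [(len(seq), None)]
    lengths = []
    for (start, left), (end, right) in zip(bounds, bounds[1:]):
        if end == start:
            continue  # skip empty fragments
        if require_both and (left is None or right is None or left == right):
            continue
        lengths.append(end - start)
    return lengths


# Toy sequence with one EcoRI site and one MseI site (not real safflower data).
toy = "AAAGAATTCCCCCTTAAGGGG"
all_fragments = digest(toy, ["EcoRI", "MseI"])
ddrad_fragments = digest(toy, ["EcoRI", "MseI"], require_both=True)
```

Run on a reference genome, counting the fragments that fall within a chosen size-selection window would give the kind of per-enzyme fragment totals the abstract compares; the window itself is not stated in the abstract and would have to be supplied as an assumption.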

Keywords: Carthamus tinctorius; Genetic diversity; Genome sampling; RAD-seq; Restriction enzymes; SNP discovery.

Publication types

  • Comparative Study

MeSH terms

  • Carthamus tinctorius* / genetics
  • Genetic Variation*
  • Genome, Plant
  • Genotype
  • High-Throughput Nucleotide Sequencing / methods
  • Polymorphism, Single Nucleotide*
  • Seeds* / genetics
  • Sequence Analysis, DNA* / methods