Long-read cDNA sequencing identifies functional pseudogenes in the human transcriptome

Genome Biol. 2021 May 10;22(1):146. doi: 10.1186/s13059-021-02369-0.

Abstract

Pseudogenes are gene copies presumed to be mainly functionless relics of evolution due to acquired deleterious mutations or transcriptional silencing. Using deep full-length PacBio cDNA sequencing of normal human tissues and cancer cell lines, we identify hundreds of novel transcribed pseudogenes expressed in tissue-specific patterns. Some pseudogene transcripts have intact open reading frames and are translated in cultured cells, representing unannotated protein-coding genes. To assess the biological impact of noncoding pseudogenes, we delete the nucleus-enriched pseudogene PDCL3P4 using CRISPR-Cas9 and observe hundreds of perturbed genes. This study highlights pseudogenes as a complex and dynamic component of the human transcriptional landscape.

Keywords: CRISPR; Long-read; PacBio; Pseudogene; lncRNA.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cell Line
  • DNA, Complementary / genetics*
  • Gene Deletion
  • Haploidy
  • Humans
  • Promoter Regions, Genetic / genetics
  • Pseudogenes*
  • Sequence Analysis, DNA*
  • Transcriptome / genetics*

Substances

  • DNA, Complementary