Data Management and Summary Statistics with PLINK

Methods Mol Biol. 2020:2090:49-65. doi: 10.1007/978-1-0716-0199-0_3.

Abstract

PLINK is a versatile program that supports computationally efficient data management, quality control, and common statistical computations on matrices of genomic variant calls. In population genomics, it is frequently used to take care of the "basics," so that they do not need to be reimplemented each time a new type of analysis is performed on such a matrix. I describe several of these basic operations and discuss their uses and pitfalls.
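As a rough illustration of one of the "basics" computed on a matrix of variant calls, the sketch below estimates per-variant alternate-allele frequencies in plain Python. It is not PLINK's implementation. It assumes diploid autosomal genotypes coded as alternate-allele counts (0, 1, 2), with None marking a missing call; the function name and the toy matrix are hypothetical.

```python
# Minimal sketch, not PLINK code. Assumptions: diploid autosomal genotypes,
# coded as alternate-allele counts (0, 1, 2); None marks a missing call.

def alt_allele_frequencies(genotypes):
    """Return one alternate-allele frequency per variant.

    genotypes: list of rows (samples), each a list of calls (variants).
    A variant with no non-missing calls gets a frequency of None.
    """
    n_variants = len(genotypes[0])
    freqs = []
    for j in range(n_variants):
        # Keep only the non-missing calls for this variant.
        calls = [row[j] for row in genotypes if row[j] is not None]
        if not calls:
            freqs.append(None)
        else:
            # Each diploid sample contributes two alleles.
            freqs.append(sum(calls) / (2 * len(calls)))
    return freqs


# Hypothetical toy matrix: 4 samples (rows) by 3 variants (columns).
toy = [
    [0, 1, 2],
    [1, 1, None],
    [0, 2, 2],
    [0, 0, 2],
]
freqs = alt_allele_frequencies(toy)
print(freqs)
```

Missing calls are dropped from the denominator rather than counted as reference alleles; treating them as reference would bias the estimate downward.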

Keywords: Allele frequency; Hardy–Weinberg equilibrium; Linkage disequilibrium; Principal component analysis; Relationship inference; Sex inference; Variant call format.

MeSH terms

  • Algorithms*
  • Computational Biology
  • Data Management / methods*
  • Gene Frequency
  • Genetic Variation
  • Genetics, Population
  • Genomics / methods*
  • Humans
  • Linkage Disequilibrium