GENESIS provides statistical methodology for analyzing genetic data from samples with population structure and/or familial relatedness. This vignette describes how to use GENESIS for inferring population structure, as well as estimating relatedness measures such as kinship coefficients, identity by descent (IBD) sharing probabilities, and inbreeding coefficients. GENESIS uses PC-AiR for population structure inference that is robust to known or cryptic relatedness, and it uses PC-Relate for accurate relatedness estimation in the presence of population structure, admixture, and departures from Hardy-Weinberg equilibrium.
The functions in the GENESIS package read genotype data from a GenotypeData class object as created by the GWASTools package. Through the use of GWASTools, a GenotypeData class object can easily be created from an R matrix of SNP genotype data, a GDS file, or binary PLINK files (after conversion to GDS).
Example R code for creating a GenotypeData object is presented below. Much more detail can be found in the GWASTools package reference manual.
geno <- MatrixGenotypeReader(genotype = genotype, snpID = snpID, chromosome = chromosome,
                             position = position, scanID = scanID)
genoData <- GenotypeData(geno)
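As a concrete illustration of the call above, the following sketch builds a small set of toy inputs and checks that they are mutually consistent before constructing the object. All values (5 SNPs, 4 samples, the IDs and positions) are made up for illustration, and the consistency checks are a suggested precaution rather than part of the GWASTools API. The final step runs only if GWASTools is installed.

```r
# Toy inputs; every value here is invented for illustration
set.seed(1)
n_snp  <- 5L
n_scan <- 4L

# Genotypes coded 0/1/2; rows index SNPs, columns index samples
genotype <- matrix(sample(0:2, n_snp * n_scan, replace = TRUE),
                   nrow = n_snp, ncol = n_scan)

snpID      <- 1:n_snp                             # unique integer SNP IDs
chromosome <- c(1L, 1L, 2L, 2L, 3L)               # integer chromosome codes
position   <- c(1000L, 2500L, 800L, 4200L, 150L)  # integer positions
scanID     <- 101:104                             # unique individual IDs

# Consistency checks on the inputs (a suggested precaution)
stopifnot(
  nrow(genotype) == length(snpID),
  ncol(genotype) == length(scanID),
  length(chromosome) == n_snp,
  length(position) == n_snp,
  !anyDuplicated(snpID),
  !anyDuplicated(scanID),
  all(genotype %in% c(0, 1, 2, NA))
)

# Build the GenotypeData object only when GWASTools is available
if (requireNamespace("GWASTools", quietly = TRUE)) {
  geno <- GWASTools::MatrixGenotypeReader(genotype = genotype, snpID = snpID,
                                          chromosome = chromosome,
                                          position = position, scanID = scanID)
  genoData <- GWASTools::GenotypeData(geno)
}
```

Checking dimensions up front tends to give clearer error messages than letting the constructor fail on a transposed matrix or a duplicated ID.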
In the MatrixGenotypeReader call above:
genotype is a matrix of genotype values coded as 0 / 1 / 2, where rows index SNPs and columns index samples
snpID is an integer vector of unique SNP IDs
chromosome is an integer vector specifying the chromosome of each SNP
position is an integer vector specifying the position of each SNP
scanID is a vector of unique individual IDs
When the genotype data are stored in a GDS file, the reader is created with GdsGenotypeReader instead, where filename is the file path to the GDS object.
The SNPRelate package provides the snpgdsBED2GDS function to convert binary PLINK files into a GDS file.
snpgdsBED2GDS(bed.fn = "genotype.bed", bim.fn = "genotype.bim", fam.fn = "genotype.fam",
              out.gdsfn = "genotype.gds")
bed.fn is the file path to the PLINK .bed file
bim.fn is the file path to the PLINK .bim file
fam.fn is the file path to the PLINK .fam file
out.gdsfn is the file path for the output GDS file
Once the PLINK files have been converted to a GDS file, a GenotypeData object can be created as described above.
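The last step, opening the converted GDS file as a GenotypeData object, might look like the sketch below. The helper name open_genotype_gds and the path "genotype.gds" are hypothetical. The helper only wraps GdsGenotypeReader and GenotypeData from GWASTools, with an added file-existence check.

```r
# Hypothetical helper (not part of GWASTools): open a GDS file
# and wrap it in a GenotypeData object.
open_genotype_gds <- function(gds_path) {
  # Fail early with a clear message if the file is missing
  if (!file.exists(gds_path)) {
    stop("GDS file not found: ", gds_path)
  }
  reader <- GWASTools::GdsGenotypeReader(filename = gds_path)
  GWASTools::GenotypeData(reader)
}

# Usage, assuming "genotype.gds" was written by snpgdsBED2GDS:
# genoData <- open_genotype_gds("genotype.gds")
```

The returned object holds an open connection to the file, so it is worth closing it once the analysis is finished.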
To demonstrate PC-AiR and PC-Relate analyses with the GENESIS package, we analyze SNP data from the Mexican Americans in Los Angeles, California (MXL) and African American individuals in the southwestern USA (ASW) population samples of HapMap 3. Mexican Americans and African Americans have a diverse ancestral background, and familial relatives are present in these data. Genotype data at a subset of 20K autosomal SNPs for 173 individuals are provided as a GDS file.
# read in GDS data
gdsfile <- system.file("extdata", "HapMap_ASW_MXL_geno.gds", package="GENESIS")
HapMap_geno <- GdsGenotypeReader(filename = gdsfile)
# create a GenotypeData class object
HapMap_genoData <- GenotypeData(HapMap_geno)
HapMap_genoData
## An object of class GenotypeData
## | data:
## File: /tmp/RtmpSJrQVL/Rinst2522ed3f97b/GENESIS/extdata/HapMap_ASW_MXL_geno.gds (901.8K)
## + [ ] *
## |--+ sample.id { Int32,factor 173 ZIP(40.9%), 283B } *
## |--+ snp.id { Int32 20000 ZIP(34.6%), 27.1K }
## |--+ snp.position { Int32 20000 ZIP(34.6%), 27.1K }
## |--+ snp.chromosome { Int32 20000 ZIP(0.13%), 103B }
## \--+ genotype { Bit2 20000x173, 844.7K } *
## | SNP Annotation:
## NULL
## | Scan Annotation:
## NULL
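A quick sanity check after loading is to confirm that the object has the dimensions stated in the text (20,000 SNPs and 173 individuals). The sketch below is one possible way to do this. The helper name check_geno_dims is hypothetical, and it relies only on the nsnp and nscan accessors from GWASTools.

```r
# Hypothetical helper: compare the size of a GenotypeData object
# against the expected number of SNPs and scans (samples).
check_geno_dims <- function(genoData, expected_snps, expected_scans) {
  observed <- c(snps  = GWASTools::nsnp(genoData),
                scans = GWASTools::nscan(genoData))
  expected <- c(snps = expected_snps, scans = expected_scans)
  if (!all(observed == expected)) {
    stop("expected ", paste(expected, collapse = " x "),
         " (SNPs x scans), found ", paste(observed, collapse = " x "))
  }
  invisible(observed)
}

# Usage with the object created above:
# check_geno_dims(HapMap_genoData, expected_snps = 20000, expected_scans = 173)
```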
Conomos M.P., Reiner A.P., Weir B.S., & Thornton T.A. (2016). Model-free Estimation of Recent Genetic Relatedness. American Journal of Human Genetics, 98(1), 127-148.
Conomos M.P., Miller M.B., & Thornton T.A. (2015). Robust Inference of Population Structure for Ancestry Prediction and Correction of Stratification in the Presence of Relatedness. Genetic Epidemiology, 39(4), 276-293.
Gogarten S.M., Bhangale T., Conomos M.P., Laurie C.A., McHugh C.P., Painter I., … & Laurie C.C. (2012). GWASTools: an R/Bioconductor package for quality control and analysis of Genome-Wide Association Studies. Bioinformatics, 28(24), 3329-3331.
Manichaikul A., Mychaleckyj J.C., Rich S.S., Daly K., Sale M., & Chen W.M. (2010). Robust relationship inference in genome-wide association studies. Bioinformatics, 26(22), 2867-2873.