assocTestSingle {GENESIS}R Documentation

Genotype Association Testing with Mixed Models

Description

assocTestSingle performs genotype association tests using the null model fit with fitNullModel.

Usage

## S4 method for signature 'SeqVarIterator'
assocTestSingle(gdsobj, null.model, test=c("Score", "Score.SPA"),
                recalc.pval.thresh=0.05, GxE=NULL, sparse=TRUE, imputed=FALSE, male.diploid=TRUE,
                genome.build=c("hg19", "hg38"), verbose=TRUE)
## S4 method for signature 'GenotypeIterator'
assocTestSingle(gdsobj, null.model, test=c("Score", "Score.SPA"),
                recalc.pval.thresh=0.05, GxE=NULL, male.diploid=TRUE, verbose=TRUE)

Arguments

gdsobj

An object of class SeqVarIterator from the package SeqVarTools, or an object of class GenotypeIterator from the package GWASTools, containing the genotype data for the variants and samples to be used for the analysis.

null.model

A null model object returned by fitNullModel.

test

A character string specifying the type of test to be performed. The possibilities are "Score" (default) and "Score.SPA"; "Score.SPA" can only be used when the family of the null model fit with fitNullModel is binomial.

recalc.pval.thresh

If test is not "Score", recalculate p-values using the specified 'test' for variants with a Score p-value below this threshold; return the score p-value for all other variants.

GxE

A vector of character strings specifying the names of the variables for which a genotype interaction term should be included. If NULL (default) no genotype interactions are included. See 'Details' for further information.

sparse

Logical indicator of whether to read genotypes as sparse Matrix objects; the default is TRUE. Set this to FALSE if the alternate allele dosage of the genotypes in the test are not expected to be mostly 0.

imputed

Logical indicator of whether to read dosages from the "DS" field containing imputed dosages instead of counting the number of alternate alleles.

male.diploid

Logical for whether males on sex chromosomes are coded as diploid.

genome.build

A character sting indicating genome build; used to identify pseudoautosomal regions on the X and Y chromosomes.

verbose

Logical indicator of whether updates from the function should be printed to the console; the default is TRUE.

Details

All samples included in null model will be included in the association test, even if a different set of samples is present in the current filter for gdsobj.

The effect size estimate is for each copy of the alternate allele. For multiallelic variants, each alternate allele is tested separately.

Sporadic missing genotype values are mean imputed using the allele frequency calculated on all other samples at that variant.

Monomorphic variants (including variants where every sample is a heterozygote) are omitted from the results.

The input GxE can be used to perform GxE tests. Multiple interaction variables may be specified, but all interaction variables specified must have been included as covariates in fitting the null model with fitNullModel. When performing GxE analyses, assocTestSingle will report two tests: (1) the joint test of all genotype interaction terms in the model (this is the test for any genotype interaction effect), and (2) the joint test of the genotype term along with all of the genotype interaction terms (this is the test for any genetic effect). Individual genotype interaction terms can be tested by creating test statistics from the reported effect size estimates and their standard errors (Note: when GxE contains a single continuous or binary covariate, this test is the same as the test for any genotype interaction effect mentioned above).

The saddle point approximation (SPA), run by using test = "Score.SPA", implements the method described by Dey et al. (2017), which was extended to mixed models by Zhou et al. (2018) in their SAIGE software. SPA provides better calibration of p-values when fitting mixed models with a binomial family for a sample with an imbalanced case to control ratio.

For the GenotypeIterator method, objects created with GdsGenotypeReader or MatrixGenotypeReader are supported. NcdfGenotypeReader objects are not supported.

Value

A data.frame where each row refers to a different variant with the columns:

variant.id

The variant ID

chr

The chromosome value

pos

The base pair position

allele.index

The index of the alternate allele. For biallelic variants, this will always be 1.

n.obs

The number of samples with non-missing genotypes

freq

The estimated alternate allele frequency

MAC

The minor allele count. For multiallelic variants, "minor" is determined by comparing the count of the alternate allele specified by allele.index with the sum of all other alleles.

If test is "Score":

Score

The value of the score function

Score.SE

The estimated standard error of the Score

Score.Stat

The score Z test statistic

Score.pval

The score p-value

Est

An approximation of the effect size estimate for each additional copy of the alternate allele

Est.SE

An approximation of the standard error of the effect size estimate

PVE

An approximation of the proportion of phenotype variance explained

If test is "Score.SPA":

SPA.pval

The score p-value after applying the saddle point approximation (SPA)

SPA.converged

logical indiactor of whether the SPA converged; NA indicates that the SPA was not applied and the original Score.pval was returned

If GxE is not NULL:

Est.G

The effect size estimate for the genotype term

Est.G:env

The effect size estimate for the genotype*env interaction term. There will be as many of these terms as there are interaction variables, and "env" will be replaced with the variable name.

SE.G

The estimated standard error of the genotype term effect size estimate

SE.G:env

The estimated standard error of the genotype*env effect size estimate. There will be as many of these terms as there are interaction variables, and "env" will be replaced with the variable name.

GxE.Stat

The Wald Z test statistic for the test of all genotype interaction terms. When there is only one genotype interaction term, this is the test statistic for that term.

GxE.pval

The Wald p-value for the test of all genotype interaction terms; i.e. the test of any genotype interaction effect

Joint.Stat

The Wald Z test statistic for the joint test of the genotype term and all of the genotype interaction terms

Joint.pval

The Wald p-value for the joint test of the genotype term and all of the genotype interaction terms; i.e. the test of any genotype effect

Author(s)

Matthew P. Conomos, Stephanie M. Gogarten, Tamar Sofer, Ken Rice, Chaoyu Yu

References

Dey, R., Schmidt, E. M., Abecasis, G. R., & Lee, S. (2017). A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS. The American Journal of Human Genetics, 101(1), 37-49. Zhou, W., Nielsen, J. B., Fritsche, L. G., Dey, R., Gabrielsen, M. E., Wolford, B. N., ... & Bastarache, L. A. (2018). Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nature genetics, 50(9), 1335.

See Also

fitNullModel for fitting the null mixed model needed as input to assocTestSingle. SeqVarIterator for creating the input object with genotypes.

Examples

library(SeqVarTools)
library(Biobase)

# open a sequencing GDS file
gdsfile <- seqExampleFileName("gds")
gds <- seqOpen(gdsfile)

# simulate some phenotype data
set.seed(4)
data(pedigree)
pedigree <- pedigree[match(seqGetData(gds, "sample.id"), pedigree$sample.id),]
pedigree$outcome <- rnorm(nrow(pedigree))

# construct a SeqVarIterator object
seqData <- SeqVarData(gds, sampleData=AnnotatedDataFrame(pedigree))
iterator <- SeqVarBlockIterator(seqData)

# fit the null model
nullmod <- fitNullModel(iterator, outcome="outcome", covars="sex")

# run the association test
assoc <- assocTestSingle(iterator, nullmod)

seqClose(iterator)


library(GWASTools)

# open a SNP-based GDS file
gdsfile <- system.file("extdata", "HapMap_ASW_MXL_geno.gds", package="GENESIS")
gds <- GdsGenotypeReader(filename = gdsfile)

# simulate some phenotype data
set.seed(4)
pheno <- data.frame(scanID=getScanID(gds),
                    outcome=rnorm(nscan(gds)))

# construct a GenotypeIterator object
genoData <- GenotypeData(gds, scanAnnot=ScanAnnotationDataFrame(pheno))
iterator <- GenotypeBlockIterator(genoData)

# fit the null model
nullmod <- fitNullModel(iterator, outcome="outcome")

# run the association test
assoc <- assocTestSingle(iterator, nullmod)

close(iterator)

[Package GENESIS version 2.16.1 Index]