mktable {GMRP} | R Documentation |
mktable is used to choose SNPs with LG, Pv, Pc and Pd and to create a standard SNP beta table for Mendelian randomization and path analysis; see Details.
mktable(cdata, ddata, rt, varname, LG, Pv, Pc, Pd)
cdata |
causal variable GWAS data or GWAS meta-analysed data containing |
ddata |
disease GWAS data or GWAS meta-analysed data containing |
rt |
a string that specifies the type of table to return. It has two options: |
varname |
a required character vector that lists the names of undefined causal variables for Mendelian randomization and path analyses. The first name is the disease name. Here an example given is |
LG |
a numeric parameter. |
Pv |
a numeric parameter. |
Pc |
a numeric parameter. |
Pd |
a numeric parameter. Pd is a given proportion of sample size to the maximum sample size in disease data and used to choose |
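As a rough illustration of how thresholds such as Pv, Pc and Pd could act on per-SNP summaries, the following base-R sketch filters a toy table. The column names pvj, pcj and pdj follow the Details section. The toy values, the helper name select_snps and the direction of each comparison (p-value at most Pv, sample-size proportions at least Pc and Pd) are assumptions made for illustration and are not taken from the mktable source. LG is not modelled here.

```r
# Toy per-SNP summaries (values invented for illustration)
snps <- data.frame(
  rsid = c("rs1", "rs2", "rs3", "rs4"),
  pvj  = c(1e-9, 3e-8, 2e-4, 4e-10),  # p-value summary per SNP
  pcj  = c(0.99, 0.98, 0.995, 0.90),  # sample-size proportion, causal data
  pdj  = c(0.985, 0.99, 0.99, 0.99),  # sample-size proportion, disease data
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep SNPs that pass all three thresholds.
# Assumed rule: pvj <= Pv, pcj >= Pc, pdj >= Pd.
select_snps <- function(x, Pv, Pc, Pd) {
  keep <- x$pvj <= Pv & x$pcj >= Pc & x$pdj >= Pd
  x[keep, , drop = FALSE]
}

chosen <- select_snps(snps, Pv = 5e-8, Pc = 0.979, Pd = 0.979)
print(chosen$rsid)
```

With these toy values, rs3 fails the assumed p-value rule and rs4 fails the assumed causal-data sample-size rule, so only rs1 and rs2 remain.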
The standard GWAS cdata set should have the following columns: chrn, posit, rsid, a1.x1, a1.x2, ..., a1.xn, freq.x1, freq.x2, ..., freq.xn, beta.x1, beta.x2, ..., beta.xn, sd.x1, sd.x2, ..., sd.xn, pvj, N.x1, N.x2, ..., N.xn, pcj, where x1, x2, ..., xn are causal variables. The standard GWAS ddata set should have the columns hg.d, SNP.d, a1.d, freq.d, beta.d, N.case, N.ctr, freq.case. See example.
beta is a numeric vector that is a column of beta values for regression of SNPs on variable vector X = {x1, x2, ..., xn}.
freq is a numeric vector that is a column of frequencies of allele 1 with respect to variable vector X = {x1, x2, ..., xn}.
sd is a numeric vector that is a column of standard deviations of variables x1, x2, ..., xn specific to each SNP. Note that sd here is not the standard deviation of beta. If sd is not specific to SNPs, then sd.xi has the same value for all SNPs in variable i.
d denotes disease.
N is sample size.
freq.case is frequency of disease.
chrn is a numeric vector for chromosome number.
posit is a numeric vector for SNP positions on the chromosome. Sometimes chrn and posit are combined into a string vector: hg19/hg18.
pvj is defined as p-value, and pcj and pdj as proportions of sample size for SNP j to the maximum sample size in the causal variable data and in disease data, respectively.
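The quantities pvj, pcj and pdj can be sketched in base R on toy data. The calculation of pvj (row-wise minimum of the per-variable p-values) and pcj (summed sample size divided by its maximum) mirrors the Examples section. The pdj line is an assumption made by analogy, using case plus control counts. All numbers below are invented.

```r
# Toy p-values and sample sizes for two causal variables (invented numbers)
pv <- cbind(pvalue.x1 = c(1e-8, 0.02, 5e-9),
            pvalue.x2 = c(3e-4, 1e-10, 0.5))
ss <- cbind(N.x1 = c(90000, 100000, 80000),
            N.x2 = c(95000, 100000, 85000))

# pvj: smallest p-value for SNP j across the causal variables
pvj <- apply(pv, 1, min)

# pcj: total sample size for SNP j relative to the largest total
sm  <- rowSums(ss)
pcj <- sm / max(sm)

# pdj (assumed by analogy): case + control sample size relative to the maximum
N.case <- c(20000, 22000, 11000)
N.ctr  <- c(60000, 66000, 33000)
N.d    <- N.case + N.ctr
pdj    <- N.d / max(N.d)

print(data.frame(pvj, pcj, pdj))
```

A SNP genotyped in every study has a proportion of 1, while a SNP missing from many studies has a proportion well below 1.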
Returns a standard SNP beta table or SNP path table containing m SNPs chosen with LG, Pv, Pc and Pd, together with n variables and the disease, for Mendelian randomization and path analysis.
The order of column variables must be chrn, posit, rsid, a1.x1, ..., a1.xn, freq.x1, ..., freq.xn, beta.x1, ..., beta.xn, sd.x1, ..., sd.xn, ...; otherwise mktable will produce an error. See example.
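Because the column order matters, it may help to verify a table before passing it to mktable. The sketch below is only an illustration: the helpers expected_cdata_cols and check_cdata_order are hypothetical and not part of GMRP, and comparing by column name is an assumption, since mktable may rely on column position rather than on these exact names.

```r
# Hypothetical helper: expected column order for a cdata table with causal
# variables `vars`, following the order given in the Details section.
expected_cdata_cols <- function(vars) {
  c("chrn", "posit", "rsid",
    paste0("a1.", vars), paste0("freq.", vars),
    paste0("beta.", vars), paste0("sd.", vars),
    "pvj", paste0("N.", vars), "pcj")
}

# Hypothetical check: TRUE when the column names match the expected order.
check_cdata_order <- function(cdata, vars) {
  identical(names(cdata), expected_cdata_cols(vars))
}

vars <- c("x1", "x2")
cols <- expected_cdata_cols(vars)

# Toy one-row table with columns in the expected order
good <- as.data.frame(matrix(0, nrow = 1, ncol = length(cols)))
names(good) <- cols

# Same table with the first two columns swapped
bad <- good[, c(2, 1, 3:length(cols))]

print(check_cdata_order(good, vars))
print(check_cdata_order(bad, vars))
```

A check like this only catches misordered or misnamed columns. It does not validate the values in them.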
Yuan-De Tan tanyuande@gmail.com
Do, R. et al. 2013. Common variants associated with plasma triglycerides and risk for coronary artery disease. Nat Genet 45: 1345-1352.
Sheehan, N.A. et al. 2008. Mendelian randomisation and causal inference in observational epidemiology. PLoS Med 5: e177.
Sheehan, N.A. et al. 2010. Mendelian randomisation: a tool for assessing causality in observational epidemiology. Methods Mol Biol 713: 153-166.
Willer, C.J. et al. 2013. Discovery and refinement of loci associated with lipid levels. Nat Genet 45: 1274-1283.
library(GMRP)

data(lpd.data)
#lpd<-DataFrame(lpd.data)
lpd<-lpd.data
data(cad.data)
#cad<-DataFrame(cad.data)
cad<-cad.data

# step 1: calculate pvj (smallest p-value across causal variables)
pvalue.LDL<-lpd$P.value.LDL
pvalue.HDL<-lpd$P.value.HDL
pvalue.TG<-lpd$P.value.TG
pvalue.TC<-lpd$P.value.TC
pv<-cbind(pvalue.LDL,pvalue.HDL,pvalue.TG,pvalue.TC)
pvj<-apply(pv,1,min)

# step 2: construct beta table of undefined causal variables
beta.LDL<-lpd$beta.LDL
beta.HDL<-lpd$beta.HDL
beta.TG<-lpd$beta.TG
beta.TC<-lpd$beta.TC
beta<-cbind(beta.LDL,beta.HDL,beta.TG,beta.TC)

# step 3: construct a matrix for allele 1 in each undefined causal variable
a1.LDL<-lpd$A1.LDL
a1.HDL<-lpd$A1.HDL
a1.TG<-lpd$A1.TG
a1.TC<-lpd$A1.TC
alle1<-cbind(a1.LDL,a1.HDL,a1.TG,a1.TC)

# step 4: calculate sample sizes of causal variables and calculate pcj
N.LDL<-lpd$N.LDL
N.HDL<-lpd$N.HDL
N.TG<-lpd$N.TG
N.TC<-lpd$N.TC
ss<-cbind(N.LDL,N.HDL,N.TG,N.TC)
sm<-apply(ss,1,sum)
pcj<-sm/max(sm)

# step 5: construct a matrix for frequency of allele 1 in each undefined
# causal variable in 1000G.EUR
freq.LDL<-lpd$Freq.A1.1000G.EUR.LDL
freq.HDL<-lpd$Freq.A1.1000G.EUR.HDL
freq.TG<-lpd$Freq.A1.1000G.EUR.TG
freq.TC<-lpd$Freq.A1.1000G.EUR.TC
freq<-cbind(freq.LDL,freq.HDL,freq.TG,freq.TC)

# step 6: construct matrix for sd of each causal variable
# (here sd is not specific to SNP j)
# the sd values were averaged over 63 studies; see reference Willer et al. (2013)
sd.LDL<-rep(37.42,length(pvj))
sd.HDL<-rep(14.87,length(pvj))
sd.TG<-rep(92.73,length(pvj))
sd.TC<-rep(42.74,length(pvj))
sd<-cbind(sd.LDL,sd.HDL,sd.TG,sd.TC)

# step 7: retrieve SNP ID and position
hg19<-lpd$SNP_hg19.HDL
rsid<-lpd$rsid.HDL

# step 8: invoke chrp to separate chromosome number and SNP position
chr<-chrp(hg=hg19)

# step 9: get new data of causal variables
newdata<-cbind(freq,beta,sd,pvj,ss,pcj)
newdata<-cbind(chr,rsid,alle1,as.data.frame(newdata))
dim(newdata)
#[1] 120165 25

# step 10: retrieve cad data from cad and calculate pdj and frequency of cad
# in population
hg18.d<-cad$chr_pos_b36
SNP.d<-cad$SNP # SNP ID
a1.d<-tolower(cad$reference_allele)
freq.d<-cad$ref_allele_frequency
pvalue.d<-cad$pvalue
beta.d<-cad$log_odds
N.case<-cad$N_case
N.ctr<-cad$N_control
N.d<-N.case+N.ctr
freq.case<-N.case/N.d

# step 11: get new cad data
newcad<-cbind(freq.d,beta.d,N.case,N.ctr,freq.case)
newcad<-cbind(hg18.d,SNP.d,a1.d,as.data.frame(newcad))
dim(newcad)

# step 12: give variable list (the first name is the disease)
varname<-c("CAD","LDL","HDL","TG","TC")

# step 13: create beta table with function mktable
mybeta<-mktable(cdata=newdata,ddata=newcad,rt="beta",varname=varname,LG=1,
Pv=0.00000005,Pc=0.979,Pd=0.979)

beta<-mybeta[,4:8] # save beta for path analysis
snp<-mybeta[,1:3] # save snp for annotation analysis
beta<-DataFrame(beta) # DataFrame() is provided by the S4Vectors package