statgen stores each analysis object on a shared
reference SNP axis. The R package loads reference panels, summary
statistics, annotations, genotypes, and LD panels into those
coordinates.
The bundled package fixtures are intentionally tiny so that examples and CRAN checks do not require external tools or large data.
This vignette reads those installed fixtures with
system.file(..., package = "statgen"). They are
demonstration inputs only. In real analyses, replace these paths with
post-harmonization reference, summary statistics, annotation, genotype,
and LD files prepared for one analysis input set. For an end-to-end
repository workflow that prepares source inputs and then runs the same R
analysis pattern on those files, see
docs/TUTORIAL_1_PREPARE_DATA.md and
docs/TUTORIAL_4_R.md.
reference_template <- file.path(
dirname(system.file("extdata", "reference_chr1.bim", package = "statgen")),
"reference_chr@.bim"
)
reference <- load_reference(reference_template)
sumstats_path <- system.file("extdata", "traits_complete.tsv.gz", package = "statgen")
sumstats <- load_sumstats(sumstats_path, reference)
annotation_paths <- system.file(
"extdata",
c("anno1.bed", "anno2.bed"),
package = "statgen"
)
annotations <- load_annotations(annotation_paths, reference)
num_snp(reference)
#> [1] 8
head(logpvec(sumstats))
#> [1] 1.920819 0.301030 1.301030 0.698970 0.397940 2.522879
colnames(annomat(annotations))
#> [1] "anno1" "anno2"R-native object caches are RDS files. They are useful for repeated analyses in the same R runtime.
Python builds the canonical LD .npz distribution. For
package examples, ld_root points to a bundled toy
distribution that already includes the R reference-cache sidecar. For
real analyses, run prepare_ld_npz_for_r() once on the
Python-built ld_npz/ directory for the same harmonized
input set, then load that directory with load_ld().
ld_root <- system.file("extdata", "ld", package = "statgen")
ld <- load_ld(ld_root)
num_snp(ld)
#> [1] 8
a1freq(ld)
#> [1] 0.30 0.40 0.20 0.35 0.15 0.28 0.32 0.22multiply_r2() multiplies aligned vectors or matrices by
LD r^2. fast_prune() applies greedy LD pruning
to aligned scores.
Genotype panels keep PLINK metadata and fetch hardcalls on demand.
The original BED files must remain available unless a replacement
bed_path is supplied.
ref_chr1 <- load_reference(system.file("extdata", "reference_chr1.bim", package = "statgen"))
bed_chr1 <- system.file("extdata", "genotype_1.bed", package = "statgen")
genotype <- load_genotype(sub("\\.bed$", "", bed_chr1), ref_chr1)
geno <- fetch_genotypes_int8(genotype, c(1, 3))
dim(geno)
#> [1] 4 2
head(geno)
#> [,1] [,2]
#> [1,] 2 2
#> [2,] 2 2
#> [3,] 2 2
#> [4,] 2 2