library(NanoMethViz)

In order to use this package, your data must be converted from the output of methylation calling software to a special tabix format. Due to the use of the Unix sort function, this can only currently be done on a Linux or MacOS system.

We currently support output from

The conversion can be done using the create_tabix_file() function. We provide example data of nanopolish output within the package, we can look inside to see how the data looks coming out of nanopolish

methy_calls <- system.file(package = "NanoMethViz",
    c("sample1_nanopolish.tsv.gz", "sample2_nanopolish.tsv.gz"))

# have a look at the first 10 rows of methy_data
methy_calls_example <- read.table(
    methy_calls[1], sep = "\t", header = TRUE, nrows = 6)

methy_calls_example
##   chromosome strand     start       end                            read_name
## 1       chr1      - 127732476 127732476 e648c4e3-ca6a-4671-af17-86dab4c819eb
## 2      chr11      - 115423144 115423144 726dd8b5-1531-4279-9cf0-a7e4d5ea0478
## 3      chr11      +  69150806  69150814 34f9ee3e-4b27-4d2d-a203-4067f0662044
## 4       chr1      + 170484965 170484965 d8309c06-375f-4dfe-b22e-0c47af888cd9
## 5       chrY      -   4082060   4082060 f68940f6-4236-4f0f-9af7-a81b5c2911b6
## 6       chr8      + 120733312 120733312 13ae181f-b88b-4d6c-a815-553ff2e25312
##   log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands
## 1         -5.91            -100.38               -94.47                   1
## 2         -8.07            -115.21              -107.13                   1
## 3         -1.65            -183.12              -181.47                   1
## 4          2.74            -112.14              -114.88                   1
## 5         -1.78            -135.09              -133.32                   1
## 6          5.02            -129.31              -134.33                   1
##   num_motifs            sequence
## 1          1         CATTACGTTTC
## 2          1         AACTTCGTTGA
## 3          2 GGTCACGGGAATCCGGTTC
## 4          1         AGAAGCGCTAA
## 5          1         CTCACCGTATA
## 6          1         TCTGACGTTGA

We then create a temporary path to store a converted file, this will be deleted once you exit your R session. Once create_tabix_file() is run, it will create a tabix file along with its index. Because we have a small amount of data, we can read in a small portion of it to see how it looks, do not do this with large datasets as it decompresses all the data and will take very long to run.

methy_tabix <- file.path(tempdir(), "methy_data.bgz")
samples <- c("sample1", "sample2")

# you should see messages when running this yourself
create_tabix_file(methy_calls, methy_tabix, samples)

# don't do this with actual data
# we have to use gzfile to tell R that we have a gzip compressed file
methy_data <- read.table(
    gzfile(methy_tabix), col.names = methy_col_names(), nrows = 6)

methy_data
##    sample  chr      pos strand statistic                            read_name
## 1 sample2 chr1  5141050      -      6.93 3818f2e2-d520-4305-bbab-efad891f67f2
## 2 sample1 chr1  6283067      -      1.05 36e3c55f-c41f-4bd6-b371-54368d013008
## 3 sample1 chr1  7975278      -      1.39 6f6cbc59-af4c-4dfa-8e48-ef4ac4eeb13b
## 4 sample1 chr1 10230292      -      2.19 fbe53b38-e264-4c7a-824e-2651c22f8ea6
## 5 sample1 chr1 13127127      -      2.51 7660ba1f-9b44-4783-b901-ed79b2f0481b
## 6 sample1 chr1 13127134      -      2.51 7660ba1f-9b44-4783-b901-ed79b2f0481b

Now methy_tabix will be the path to a tabix object that is ready for use with NanoMethViz. Please head over to the “Introduction” vignette to see how to use this data for visualisation!