The purpose of this vignette is to provide details on how FHIR documents are transformed to graphs in BiocFHIR.
The R commands in this text work in an R installation (version 4.2 or later) in which BiocFHIR (version 0.0.14 or later) has been installed. The source code is available on GitHub, and the package may be installable by other means.
In the “Upper level FHIR concepts” vignette, we used the following code to get a peek at the information structure in a single document representing a Bundle associated with a patient.
tfile = dir(system.file("json", package="BiocFHIR"), full.names=TRUE)
peek = jsonlite::fromJSON(tfile)
names(peek)
## [1] "resourceType" "type" "entry"
## [1] "Bundle"
## [1] "fullUrl" "resource" "request"
## [1] 72
## [1] "data.frame"
## [1] 301 72
## [1] "resourceType" "id" "text" "extension" "identifier"
## [6] "name"
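To make the shape of these objects concrete, here is a minimal sketch that parses a tiny, made-up Bundle (not taken from the package data) with jsonlite. The JSON content and the name `toy` are assumptions for illustration. `fromJSON` simplifies an array of records into a data.frame whose columns are the union of the fields seen across records, so a bundle that mixes many resource types yields a wide, sparse `resource` table, which is consistent with the 301 x 72 dimensions reported above.

```r
library(jsonlite)

# A tiny, made-up Bundle with two entries (illustrative only).
bundle_json <- '{
  "resourceType": "Bundle",
  "type": "transaction",
  "entry": [
    {"fullUrl": "urn:uuid:p1",
     "resource": {"resourceType": "Patient", "id": "p1", "gender": "female"},
     "request": {"method": "POST", "url": "Patient"}},
    {"fullUrl": "urn:uuid:c1",
     "resource": {"resourceType": "Condition", "id": "c1",
                  "subject": {"reference": "urn:uuid:p1"}},
     "request": {"method": "POST", "url": "Condition"}}
  ]
}'

toy <- fromJSON(bundle_json)

names(toy)                    # top-level fields of the Bundle
toy$resourceType              # the document type
names(toy$entry)              # one column per field of an entry
class(toy$entry$resource)     # resources are simplified to a data.frame
dim(toy$entry$resource)       # one row per entry, one column per distinct field
table(toy$entry$resource$resourceType)  # resource types present
```

Fields that a given resource lacks (for example `gender` on the Condition) are filled with `NA`, which is why the table for a real bundle is mostly empty.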
We perform a first stage of transformation with process_fhir_bundle:
## BiocFHIR FHIR.bundle instance.
## resource types are:
## AllergyIntolerance CarePlan ... Patient Procedure
## BiocFHIR.FHIRgraph instance.
## A graphNEL graph with directed edges
## Number of Nodes = 120
## Number of Edges = 348
## 50 patients, 70 conditions
## [1] "graph" "patients" "conditions"
We made a new S3 class to hold the graph with some convenient metadata. Ultimately that metadata should be bound into the graph itself as nodeData and edgeData components.
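The idea can be sketched on a toy graph. This is not the BiocFHIR implementation: the constructor `new_toy_fhir_graph`, the class name `toyFHIRgraph`, the node labels, and the attribute name `kind` are all hypothetical. The sketch shows a list-based S3 wrapper with the same three components reported above, and then how the same information could instead be stored on the nodes with `nodeData` from the graph package.

```r
library(graph)

# Hypothetical constructor: a graphNEL plus patient and condition labels.
new_toy_fhir_graph <- function(graph, patients, conditions) {
  stopifnot(methods::is(graph, "graphNEL"))
  structure(list(graph = graph, patients = patients, conditions = conditions),
            class = "toyFHIRgraph")
}

# Hypothetical print method that reports the graph and the metadata counts.
print.toyFHIRgraph <- function(x, ...) {
  cat("toyFHIRgraph instance.\n")
  print(x$graph)
  cat(sprintf(" %d patients, %d conditions\n",
              length(x$patients), length(x$conditions)))
  invisible(x)
}

patients <- c("PatientA", "PatientB")
conditions <- c("Hyperlipidemia", "Sprain of wrist")

# Directed graph with edges from patients to their conditions.
g <- graphNEL(nodes = c(patients, conditions), edgemode = "directed")
g <- addEdge(from = c("PatientA", "PatientA", "PatientB"),
             to = c("Hyperlipidemia", "Sprain of wrist", "Hyperlipidemia"),
             graph = g, weights = rep(1, 3))

# Alternative to external metadata: record the node type on the graph itself.
nodeDataDefaults(g, attr = "kind") <- "condition"
nodeData(g, n = patients, attr = "kind") <- "patient"

tg <- new_toy_fhir_graph(g, patients, conditions)
names(tg)
tg
unlist(nodeData(tg$graph, attr = "kind"))
```

With the node type stored as a node attribute, the `patients` and `conditions` components become redundant, which is the direction the paragraph above describes.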
Because basic identifying information is decomposed into components in FHIR, we have a utility to acquire the patient name for a given bundle.
## [1] "Ankunding277D'Amore443"
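As an illustration of what "decomposed into components" means, a FHIR Patient resource carries its name as a list of HumanName elements, each with a `family` string and a `given` array. The following sketch assembles a display name from a made-up Patient; `toy_human_name` is a hypothetical helper and does not reproduce the package utility, whose ordering and formatting may differ.

```r
library(jsonlite)

# A made-up Patient resource with a single HumanName element.
patient_json <- '{
  "resourceType": "Patient",
  "name": [{"use": "official", "family": "Doe", "given": ["Jane", "Q"]}]
}'

# Keep the nested list structure rather than simplifying to data frames.
pt <- fromJSON(patient_json, simplifyVector = FALSE)

# Hypothetical helper: join the given names and family name of the first entry.
toy_human_name <- function(patient) {
  nm <- patient$name[[1]]
  paste(c(unlist(nm$given), nm$family), collapse = " ")
}

toy_human_name(pt)
```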
The edges emanating from the node for this patient lead to the conditions that have been recorded. Edges are retrieved using the edgeL method.
## [1] "Chronic sinusitis (disorder)"
## [2] "Normal pregnancy"
## [3] "Miscarriage in first trimester"
## [4] "Blighted ovum"
## [5] "Viral sinusitis (disorder)"
## [6] "Acute viral pharyngitis (disorder)"
## [7] "Body mass index 30+ - obesity (finding)"
## [8] "Sprain of wrist"
## [9] "Hyperlipidemia"
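For readers unfamiliar with the graph package, here is a small sketch of edge retrieval on a toy graphNEL; the node labels are made up. `edgeL` returns, for each node, the integer positions of its neighbors within `nodes(g)`, so the positions must be mapped back to labels. `edges` returns the labels directly.

```r
library(graph)

# Toy directed graph: one patient node pointing at two condition nodes.
g <- graphNEL(nodes = c("PatientA", "Hyperlipidemia", "Sprain of wrist"),
              edgemode = "directed")
g <- addEdge(from = rep("PatientA", 2),
             to = c("Hyperlipidemia", "Sprain of wrist"),
             graph = g, weights = c(1, 1))

# edgeL gives integer indices into nodes(g) for each source node.
idx <- edgeL(g)[["PatientA"]]$edges
nodes(g)[idx]

# edges gives the neighbor labels without the index lookup.
edges(g)[["PatientA"]]
```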
We have been unable so far to see how procedures can be linked directly to conditions, except by association with a given patient. We add the procedure information as follows:
## ...some bundles had no Procedure component
## BiocFHIR.FHIRgraph instance.
## A graphNEL graph with directed edges
## Number of Nodes = 214
## Number of Edges = 864
## 50 patients, 70 conditions
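The growth in node and edge counts reflects new procedure nodes and new patient-to-procedure edges. A minimal sketch of that kind of augmentation on a toy graph follows. It is not the code of add_procedures; the table `proc` and all labels are assumptions made for illustration.

```r
library(graph)

# Toy condition graph: one patient with one condition.
g <- graphNEL(nodes = c("PatientA", "Hyperlipidemia"), edgemode = "directed")
g <- addEdge("PatientA", "Hyperlipidemia", g, weights = 1)

# Made-up procedure records, one row per (patient, procedure) pair.
proc <- data.frame(patient = c("PatientA", "PatientA"),
                   procedure = c("Procedure X", "Procedure Y"),
                   stringsAsFactors = FALSE)

# Add only the procedure labels that are not already nodes.
new_nodes <- setdiff(unique(proc$procedure), nodes(g))
g2 <- addNode(new_nodes, g)

# Link each patient to each of their procedures.
g2 <- addEdge(from = proc$patient, to = proc$procedure,
              graph = g2, weights = rep(1, nrow(proc)))

c(nodes = numNodes(g), edges = numEdges(g))    # before
c(nodes = numNodes(g2), edges = numEdges(g2))  # after
```

In this sketch, procedures connect to conditions only through the shared patient node, matching the limitation described above.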
Data on additional resource types can be added using the approach taken in add_procedures. This will be carried out in future releases.
A visNetwork widget can be produced directly from a list of ingested bundles. This display can be zoomed and dragged. Procedures are green, patients are blue, conditions are red.
## ...some bundles had no Procedure component
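For orientation, a widget with this color scheme can be built directly with visNetwork from a node table and an edge table. The sketch below uses made-up nodes and is not the BiocFHIR display code; the vector `kind` and the color lookup are assumptions.

```r
library(visNetwork)

# Made-up nodes: one patient, one condition, one procedure.
ids <- c("PatientA", "Hyperlipidemia", "Procedure X")
kind <- c("patient", "condition", "procedure")

# Color scheme described in the text: patients blue, conditions red,
# procedures green.
kind_colors <- c(patient = "blue", condition = "red", procedure = "green")

nodes <- data.frame(id = ids, label = ids,
                    color = unname(kind_colors[kind]),
                    stringsAsFactors = FALSE)

# Directed edges from the patient to the condition and the procedure.
edges <- data.frame(from = c("PatientA", "PatientA"),
                    to = c("Hyperlipidemia", "Procedure X"),
                    arrows = "to",
                    stringsAsFactors = FALSE)

w <- visNetwork(nodes, edges)
w
```

Printing `w` in an interactive session or an R Markdown document renders the widget, which supports zooming and dragging by default.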
This collection of vignettes shows some approaches to working with FHIR R4 JSON using R. It is very likely that a new collection of bundles obtained from a different source would not be properly ingested or transformed by the code present in this version of BiocFHIR. Future extensions of the package will employ direct analysis of JSON structures to identify data values and relationships, which should be more adaptable to diverse collections of documents.
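One way to picture direct analysis of the JSON structure is to walk the parsed document as a nested list and read relationships from reference fields. The sketch below is an assumption about what such an approach could look like, not a description of planned package code. It uses a made-up Bundle and pulls each Condition's `subject$reference` and `code$text`, which are standard FHIR fields.

```r
library(jsonlite)

# Made-up Bundle: one Patient and two Conditions that reference the patient.
bundle_json <- '{
  "resourceType": "Bundle",
  "entry": [
    {"fullUrl": "urn:uuid:p1",
     "resource": {"resourceType": "Patient", "id": "p1"}},
    {"fullUrl": "urn:uuid:c1",
     "resource": {"resourceType": "Condition", "id": "c1",
                  "subject": {"reference": "urn:uuid:p1"},
                  "code": {"text": "Hyperlipidemia"}}},
    {"fullUrl": "urn:uuid:c2",
     "resource": {"resourceType": "Condition", "id": "c2",
                  "subject": {"reference": "urn:uuid:p1"},
                  "code": {"text": "Sprain of wrist"}}}
  ]
}'

# Keep the document as a nested list so that each resource stays intact.
b <- fromJSON(bundle_json, simplifyVector = FALSE)
resources <- lapply(b$entry, function(e) e$resource)

# Select the Condition resources.
is_cond <- vapply(resources,
                  function(r) identical(r$resourceType, "Condition"),
                  logical(1))

# One row per Condition: which patient it refers to and what it is called.
links <- do.call(rbind, lapply(resources[is_cond], function(r) {
  data.frame(patient_ref = r$subject$reference,
             condition = r$code$text,
             stringsAsFactors = FALSE)
}))
links
```

Each row of `links` is a candidate edge from a patient to a condition, obtained without relying on the column layout of a simplified data.frame.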
Relationships among resources may be represented in many different ways. This survey of the resources in the synthea bundles is surely limited, perhaps even with respect to the information available in the bundles. FHIR experts are invited to identify gaps in this implementation. We anticipate that considerable additional work will be needed to deal with other contexts such as research studies.
## R version 4.3.0 RC (2023-04-13 r84269)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] igraph_1.4.2 graph_1.78.0 BiocGenerics_0.46.0
## [4] rjsoncons_1.0.0 jsonlite_1.8.4 DT_0.27
## [7] BiocFHIR_1.2.0 BiocStyle_2.28.0
##
## loaded via a namespace (and not attached):
## [1] dplyr_1.1.2 compiler_4.3.0 BiocManager_1.30.20
## [4] BiocBaseUtils_1.2.0 promises_1.2.0.1 tidyselect_1.2.0
## [7] Rcpp_1.0.10 tidyr_1.3.0 later_1.3.0
## [10] jquerylib_0.1.4 yaml_2.3.7 fastmap_1.1.1
## [13] mime_0.12 R6_2.5.1 generics_0.1.3
## [16] knitr_1.42 htmlwidgets_1.6.2 visNetwork_2.1.2
## [19] tibble_3.2.1 bookdown_0.33 shiny_1.7.4
## [22] bslib_0.4.2 pillar_1.9.0 rlang_1.1.0
## [25] utf8_1.2.3 cachem_1.0.7 httpuv_1.6.9
## [28] xfun_0.39 sass_0.4.5 cli_3.6.1
## [31] magrittr_2.0.3 crosstalk_1.2.0 digest_0.6.31
## [34] xtable_1.8-4 lifecycle_1.0.3 vctrs_0.6.2
## [37] evaluate_0.20 glue_1.6.2 stats4_4.3.0
## [40] fansi_1.0.4 purrr_1.0.1 rmarkdown_2.21
## [43] ellipsis_0.3.2 tools_4.3.0 pkgconfig_2.0.3
## [46] htmltools_0.5.5