From the Human Cell Atlas (HCA) website:
The cell is the core unit of the human body—the key to understanding the biology of health and the ways in which molecular dysfunction leads to disease. Yet our characterization of the hundreds of types and subtypes of cells in the human body is limited, based partly on techniques that have limited resolution and classifications that do not always map neatly to each other. Genomics has offered a systematic approach, but it has largely been applied in bulk to many cell types at once—masking critical differences between cells—and in isolation from other valuable sources of data.
Recent advances in single-cell genomic analysis of cells and tissues have put systematic, high-resolution and comprehensive reference maps of all human cells within reach. In other words, we can now realistically envision a human cell atlas to serve as a basis for both understanding human health and diagnosing, monitoring, and treating disease.
At its core, a cell atlas would be a collection of cellular reference maps, characterizing each of the thousands of cell types in the human body and where they are found. It would be an extremely valuable resource to empower the global research community to systematically study the biological changes associated with different diseases, understand where genes associated with disease are active in our bodies, analyze the molecular mechanisms that govern the production and activity of different cell types, and sort out how different cell types combine and work together to form tissues.
The Human Cell Atlas facilitates queries on its data coordination platform with a RESTful API.
To install this package, use Bioconductor's BiocManager package.
# Install BiocManager only if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("HCABrowser")
library(HCABrowser)
The HCABrowser package relies on having network connectivity, and the HCA's Data Coordination Platform (DCP) must also be operational. This package is meant for users who have a working knowledge of the HCA DCP's schema, and it gives those users access to the full richness of the HCA's content. For a simpler way to access and download data from the Human Cell Atlas, please use the HCAExplorer package.
The HCABrowser object serves as the representation of the Human Cell Atlas. Upon creation, it automatically performs a cursory query and displays a small table showing the first few bundles of the entire HCA. This initial table contains the columns that we have determined are most useful to users. The output also displays the URL of the HCA DCP instance being used, the current query, whether bundles or files are being displayed, and the number of bundles in the results.
By default, ten bundles per page are displayed for “raw” output and 100 for “summary” output. The host argument dictates which HCA DCP instance the client accesses, while the api_url argument dictates which schema the client uses. These three values can be changed in the constructor. The default values are given below.
If the HCA cannot be reached, an error will be thrown displaying the status of the request.
hca <- HCABrowser(api_url = 'https://dss.data.humancellatlas.org/v1/swagger.json',
                  host = 'dss.data.humancellatlas.org/v1')
hca
## class: HCABrowser
## Using x-dcp at:
## dss.data.humancellatlas.org/v1
##
## Current Query:
##
## Current Selection:
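Upon displaying the object, multiple fields can be seen:
The class: HCABrowser
The HCA DCP host currently in use
The current query (assigned using filter())
The current selection (assigned using select())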
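Because the constructor raises an error when the HCA cannot be reached, a script may prefer to trap that error rather than stop. The following is a minimal base-R sketch of that idea. `connect_or_null()` and the stand-in `failing_constructor()` are hypothetical names used only for illustration; in practice the function passed in would be something like `function() HCABrowser()`.

```r
# Hypothetical helper: call a constructor and return NULL (with a message)
# instead of stopping when the constructor signals an error.
connect_or_null <- function(constructor) {
    tryCatch(
        constructor(),
        error = function(e) {
            message("Could not connect: ", conditionMessage(e))
            NULL
        }
    )
}

# Stand-in for a constructor that cannot reach its host (illustration only).
failing_constructor <- function() stop("request failed with status 503")

hca_or_null <- connect_or_null(failing_constructor)
is.null(hca_or_null)
```

Checking for `NULL` afterwards lets the rest of a script decide whether to retry or skip the HCA-dependent steps.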
To change how many bundles are displayed per page, the per_page() method can be used.
(Note that the HCA DCP has a maximum of 10 bundles per page that can be shown at a time.)
Since there are far more bundles in the HCA than can be shown at once, if link is True, the next set of bundles can be obtained using the nextResults() method.
hca <- nextResults(hca)
hca
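Paging through many results amounts to calling nextResults() repeatedly until no further page is available. The loop below sketches that pattern in base R against a made-up in-memory pager. `make_pager()`, `has_next()`, `next_page()`, and `current()` are hypothetical stand-ins and are not part of HCABrowser.

```r
# Hypothetical in-memory pager: splits `items` into pages of size `per_page`.
make_pager <- function(items, per_page) {
    pages <- split(items, ceiling(seq_along(items) / per_page))
    list(pages = unname(pages), index = 1L)
}
has_next  <- function(p) p$index < length(p$pages)
next_page <- function(p) { p$index <- p$index + 1L; p }
current   <- function(p) p$pages[[p$index]]

pager <- make_pager(sprintf("bundle-%02d", 1:25), per_page = 10)

# Collect the first page, then keep advancing while another page exists.
collected <- current(pager)
while (has_next(pager)) {
    pager <- next_page(pager)
    collected <- c(collected, current(pager))
}
length(collected)
```

With the real object, the stopping condition would come from whatever the object reports about a following page (the link field mentioned above).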
The HCABrowser package extends the functionality of the dplyr package's filter() and select() methods.
The filter() method allows the user to query the HCA by relating fields to certain values. Character fields can be queried using the operators:
==
!=
%in%
%startsWith%
%endsWith%
%contains%
Numeric fields can be queried with the operators:
==
!=
%in%
>
<
>=
<=
Queries can be enclosed in parentheses:
()
Queries can be negated by placing the ! symbol in front.
Combination operators can be used to combine queries:
&
|
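As a rough guide to what the character operators mean, they can be approximated on ordinary character vectors as shown below. These local definitions are illustrative stand-ins with assumed semantics, not the package's own implementation, which presumably builds a query for the HCA rather than evaluating the comparison in R.

```r
# Local stand-ins (assumed semantics) for the three string operators.
`%startsWith%` <- function(x, prefix) startsWith(x, prefix)
`%endsWith%`   <- function(x, suffix) endsWith(x, suffix)
`%contains%`   <- function(x, pattern) grepl(pattern, x, fixed = TRUE)

organs <- c("brain", "Brain", "bone marrow", "heart")

starts_b      <- organs %startsWith% "b"
ends_n        <- organs %endsWith% "n"
contains_rain <- organs %contains% "rain"

# Conditions combine with &, | and ! and can be grouped with ().
selected <- organs[(organs %startsWith% "b" | organs %startsWith% "B") &
                   !(organs %contains% "marrow")]
selected
```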
Now we see that “brain” and “Brain” are both available values. Since these values are entered by other users, there may be errors or inconsistencies. To be safe, both values can be queried with queries such as the following:
hca2 <- hca %>% filter('files.specimen_from_organism_json.organ.text' == c('Brain', 'brain'))
hca2 <- hca %>% filter('files.specimen_from_organism_json.organ.text' %in% c('Brain', 'brain'))
hca2 <- hca %>% filter('files.specimen_from_organism_json.organ.text' == Brain | 'files.specimen_from_organism_json.organ.text' == brain)
hca2
## class: HCABrowser
## Using x-dcp at:
## dss.data.humancellatlas.org/v1
##
## Current Query:
## "files.specimen_from_organism_json.organ.text" == Brain | "files.specimen_from_organism_json.organ.text" == brain
##
## Current Selection:
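On ordinary R vectors, `==` against a length-two vector and `%in%` are not interchangeable: `==` recycles the right-hand side element by element, whereas `%in%` tests membership. The base-R sketch below uses a made-up vector of organ labels to show the difference, which is one reason `%in%` may be the clearer spelling. How filter() itself treats each form is determined by the package.

```r
organs <- c("brain", "Brain", "heart", "brain")

# `==` recycles c("Brain", "brain") along `organs`, comparing position by position.
by_equality <- organs == c("Brain", "brain")

# `%in%` asks whether each element is one of the listed values.
by_membership <- organs %in% c("Brain", "brain")

# Normalizing case locally is another way to treat both spellings alike.
by_lowercase <- tolower(organs) == "brain"

rbind(by_equality, by_membership, by_lowercase)
```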
If we also wish to search for results based on the NCBI Taxon ID for mouse, 10090, as well as brain, we can perform this query in a variety of ways.
# Chained filter() calls
hca2 <- hca %>% filter('files.specimen_from_organism_json.organ.text' %in% c('Brain', 'brain')) %>%
    filter('specimen_from_organism_json.biomaterial_core.ncbi_taxon_id' == 10090)
# Comma-separated conditions in a single filter() call
hca2 <- hca %>% filter('files.specimen_from_organism_json.organ.text' %in% c('Brain', 'brain'),
                       'specimen_from_organism_json.biomaterial_core.ncbi_taxon_id' == 10090)
# Conditions joined with &
hca <- hca %>% filter('files.specimen_from_organism_json.organ.text' %in% c('Brain', 'brain') &
                      'specimen_from_organism_json.biomaterial_core.ncbi_taxon_id' == 10090)
hca
## class: HCABrowser
## Using x-dcp at:
## dss.data.humancellatlas.org/v1
##
## Current Query:
## "files.specimen_from_organism_json.organ.text" %in% c("Brain", "brain") & "specimen_from_organism_json.biomaterial_core.ncbi_taxon_id" == 10090
##
## Current Selection:
The HCABrowser package is able to handle arbitrarily complex queries on the Human Cell Atlas.
hca2 <- hca %>% filter((!organ.text %in% c('Brain', 'blood')) &
                       (files.specimen_from_organism_json.genus_species.text == "Homo sapiens" |
                        library_preparation_protocol_json.library_construction_approach.text == 'Smart-seq2')
                      )
hca2
## class: HCABrowser
## Using x-dcp at:
## dss.data.humancellatlas.org/v1
##
## Current Query:
## "files.specimen_from_organism_json.organ.text" %in% c("Brain", "brain") & "specimen_from_organism_json.biomaterial_core.ncbi_taxon_id" == 10090
## (!organ.text %in% c("Brain", "blood")) & (files.specimen_from_organism_json.genus_species.text == "Homo sapiens" | library_preparation_protocol_json.library_construction_approach.text == "Smart-seq2")
##
## Current Selection:
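The grouping in the query above follows ordinary R operator precedence, which can be checked on plain vectors: special operators such as `%in%` bind more tightly than `!`, and `&` binds more tightly than `|`. The following is a small base-R sketch with made-up values. It evaluates the same logical shape locally and does not contact the HCA.

```r
organ    <- c("Brain", "blood", "heart", "liver")
species  <- c("Homo sapiens", "Mus musculus", "Mus musculus", "Homo sapiens")
approach <- c("10X", "Smart-seq2", "Smart-seq2", "10X")

# `!x %in% y` parses as `!(x %in% y)`.
same_parse <- identical(!organ %in% c("Brain", "blood"),
                        !(organ %in% c("Brain", "blood")))

# Explicit parentheses keep the `|` clause together before it is combined with `&`.
keep <- (!organ %in% c("Brain", "blood")) &
    (species == "Homo sapiens" | approach == "Smart-seq2")
organ[keep]
```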
The HCABrowser object can undo the most recent queries run on it.
hca <- hca %>% filter('files.specimen_from_organism_json.organ.text' == heart)
hca <- hca %>% filter('files.specimen_from_organism_json.organ.text' != brain)
hca <- undoEsQuery(hca, n = 2)
hca
## class: HCABrowser
## Using x-dcp at:
## dss.data.humancellatlas.org/v1
##
## Current Query:
## "files.specimen_from_organism_json.organ.text" %in% c("Brain", "brain") & "specimen_from_organism_json.biomaterial_core.ncbi_taxon_id" == 10090
##
## Current Selection:
To start from a fresh query while retaining the other modifications made to the HCABrowser object, the resetEsQuery() method can be used.
hca <- resetEsQuery(hca)
hca
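Conceptually, undoing and resetting behave like operations on a stack of queries. The sketch below models that idea with a plain character vector. `push_query()`, `undo_query()`, and `reset_query()` are hypothetical helpers for illustration and do not reflect how HCABrowser stores its queries internally.

```r
# Hypothetical helpers that treat the query history as a stack.
push_query  <- function(history, q) c(history, q)
undo_query  <- function(history, n = 1L) head(history, max(length(history) - n, 0L))
reset_query <- function(history) character(0)

history <- character(0)
history <- push_query(history, "organ %in% c('Brain', 'brain')")
history <- push_query(history, "organ == 'heart'")
history <- push_query(history, "organ != 'brain'")

# Drop the two most recent queries, leaving only the first one.
after_undo <- undo_query(history, n = 2)
after_undo

# Start again from an empty query.
after_reset <- reset_query(after_undo)
length(after_reset)
```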
Using fields(), we can find that the fields paired_end and organ.ontology are available. These fields can be shown in our resulting HCABrowser object using the select() method.
hca2 <- hca %>% select('paired_end', 'organ.ontology')
hca2 <- hca %>% select(c('paired_end', 'organ.ontology'))
hca2
## class: HCABrowser
## Using x-dcp at:
## dss.data.humancellatlas.org/v1
##
## Current Query:
## "files.specimen_from_organism_json.organ.text" %in% c("Brain", "brain") & "specimen_from_organism_json.biomaterial_core.ncbi_taxon_id" == 10090
##
## Current Selection:
The remainder of this vignette will review applicable uses for this package.
hca <- HCABrowser()
hca <- hca %>% filter('files.specimen_from_organism_json.organ.text' == "brain")
# Run the search and parse the raw response into a SearchResult object
result <- searchBundles(hca, replica='aws', output_format='raw')
result <- parseToSearchResults(result)
# For each hit, take the part of bundle_fqid before the first "." as the UUID
# and request the corresponding bundle
bundles <- lapply(results(result), function(results) {
    uuid <- stringr::str_split(results[["bundle_fqid"]], '\\.')[[1]][[1]]
    getBundle(hca, uuid=uuid, replica='aws')
})
bundles
## [[1]]
## Response [https://dss.data.humancellatlas.org/v1/bundles/fffcea5e-2e6c-4ca1-9aa9-c23b90b2e8b8/checkout?replica=aws]
## Date: 2020-10-28 00:46
## Status: 200
## Content-Type: application/json
## Size: 59 B
## {"checkout_job_id":"c11fbd86-8fc2-4279-b629-ae8e5c1e8b61"}
##
## [[2]]
## Response [https://dss.data.humancellatlas.org/v1/bundles/fffcc997-3121-42af-80ca-33d1cb06f509/checkout?replica=aws]
## Date: 2020-10-28 00:46
## Status: 200
## Content-Type: application/json
## Size: 59 B
## {"checkout_job_id":"92f1c07d-c03b-4ec3-8ff1-814949b2ae61"}
##
## [[3]]
## Response [https://dss.data.humancellatlas.org/v1/bundles/fffb302b-1f5d-4aef-bdf3-a157d1d507c7/checkout?replica=aws]
## Date: 2020-10-28 00:46
## Status: 200
## Content-Type: application/json
## Size: 59 B
## {"checkout_job_id":"b3a1e712-f536-40c5-9302-d03b1dfdc99f"}
##
## [[4]]
## Response [https://dss.data.humancellatlas.org/v1/bundles/fff898a9-b528-4b0d-90b4-3cd507f36cc5/checkout?replica=aws]
## Date: 2020-10-28 00:46
## Status: 200
## Content-Type: application/json
## Size: 59 B
## {"checkout_job_id":"bb667a0c-bf71-4c50-b19d-645ee59d5733"}
##
## [[5]]
## Response [https://dss.data.humancellatlas.org/v1/bundles/fff57cf8-55b3-4274-886b-5d4c7353e5cc/checkout?replica=aws]
## Date: 2020-10-28 00:46
## Status: 200
## Content-Type: application/json
## Size: 59 B
## {"checkout_job_id":"70fd07bd-f710-47db-b968-48ed7ef57161"}
##
## [[6]]
## Response [https://dss.data.humancellatlas.org/v1/bundles/fff5329f-b548-4ecd-bf00-98200697a1b4/checkout?replica=aws]
## Date: 2020-10-28 00:46
## Status: 200
## Content-Type: application/json
## Size: 59 B
## {"checkout_job_id":"024d5d29-4dee-40da-850e-0d87856ba6b2"}
##
## [[7]]
## Response [https://dss.data.humancellatlas.org/v1/bundles/ffee0443-18b0-49f0-bc7a-e9355fb4e342/checkout?replica=aws]
## Date: 2020-10-28 00:46
## Status: 200
## Content-Type: application/json
## Size: 59 B
## {"checkout_job_id":"ccb52570-b56c-4d0b-9af7-8df19ccd6ffa"}
##
## [[8]]
## Response [https://dss.data.humancellatlas.org/v1/bundles/ffea02c7-b71a-4958-9fb4-ee5bb9d6ebc1/checkout?replica=aws]
## Date: 2020-10-28 00:46
## Status: 200
## Content-Type: application/json
## Size: 59 B
## {"checkout_job_id":"000a0723-046e-4032-93ca-be8db3becf19"}
##
## [[9]]
## Response [https://dss.data.humancellatlas.org/v1/bundles/ffe7dee0-24a0-4ae3-857d-a05eaca8445e/checkout?replica=aws]
## Date: 2020-10-28 00:46
## Status: 200
## Content-Type: application/json
## Size: 59 B
## {"checkout_job_id":"a6696567-bb56-4b59-9965-93733a7325f4"}
##
## [[10]]
## Response [https://dss.data.humancellatlas.org/v1/bundles/ffe61686-a91a-4421-b538-8318ee079ac6/checkout?replica=aws]
## Date: 2020-10-28 00:46
## Status: 200
## Content-Type: application/json
## Size: 59 B
## {"checkout_job_id":"638d698c-ed66-44ab-b0cb-ce1b1b33e2b7"}
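The UUID extraction in the loop above appears to rely on the bundle identifier having the form `<uuid>.<version>`. A base-R equivalent of that split is sketched here. The UUID is taken from the output above, while the version suffix is a made-up placeholder.

```r
# UUID from the output above; the version after the first "." is invented.
fqid <- "fffcea5e-2e6c-4ca1-9aa9-c23b90b2e8b8.2019-01-01T000000.000000Z"

# Split on literal dots and keep the first piece, as str_split(...)[[1]][[1]] does.
uuid <- strsplit(fqid, ".", fixed = TRUE)[[1]][[1]]
uuid

# sub() gives the same result by removing everything from the first dot onward.
uuid_via_sub <- sub("\\..*$", "", fqid)
uuid_via_sub
```

Taking only the first piece matters because the version part may itself contain a dot.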
hca <- HCABrowser()
hca <- hca %>% filter('files.specimen_from_organism_json.organ.text' == "brain")
result <- searchBundles(hca, replica='aws', output_format='raw')
result <- parseToSearchResults(result)
bundles <- lapply(results(result), function(results) {
    uuid <- stringr::str_split(results[["bundle_fqid"]], '\\.')[[1]][[1]]
    tryCatch({
        # Request a checkout of the bundle, then look up the checkout status
        checkout_id <- httr::content(checkoutBundle(hca, uuid=uuid, replica='aws'))$checkout_job_id
        bundle_checkout_status <- httr::content(getBundleCheckout(hca, checkout_job_id=checkout_id, replica='aws'))$status
        # Fetch the bundle itself and return its parsed content
        bundle <- getBundle(hca, uuid=uuid, replica='aws')
        httr::content(bundle)
    }, error = function(e) {
        # On failure, report it; this iteration then yields NULL
        message('Missing checkout request')
    })
})
files <- lapply(bundles, function(bundle) {
    uuid <- bundle[['bundle']]['uuid']
    getFile(hca, uuid=uuid, replica='aws')
})
files
## [[1]]
## Response [https://dss.data.humancellatlas.org/v1/files/{uuid}?replica=aws]
## Date: 2020-10-28 00:46
## Status: 400
## Content-Type: <unknown>
## <EMPTY BODY>
##
## [[2]]
## Response [https://dss.data.humancellatlas.org/v1/files/{uuid}?replica=aws]
## Date: 2020-10-28 00:46
## Status: 400
## Content-Type: <unknown>
## <EMPTY BODY>
##
## [[3]]
## Response [https://dss.data.humancellatlas.org/v1/files/{uuid}?replica=aws]
## Date: 2020-10-28 00:46
## Status: 400
## Content-Type: <unknown>
## <EMPTY BODY>
##
## [[4]]
## Response [https://dss.data.humancellatlas.org/v1/files/{uuid}?replica=aws]
## Date: 2020-10-28 00:46
## Status: 400
## Content-Type: <unknown>
## <EMPTY BODY>
##
## [[5]]
## Response [https://dss.data.humancellatlas.org/v1/files/{uuid}?replica=aws]
## Date: 2020-10-28 00:46
## Status: 400
## Content-Type: <unknown>
## <EMPTY BODY>
##
## [[6]]
## Response [https://dss.data.humancellatlas.org/v1/files/{uuid}?replica=aws]
## Date: 2020-10-28 00:46
## Status: 400
## Content-Type: <unknown>
## <EMPTY BODY>
##
## [[7]]
## Response [https://dss.data.humancellatlas.org/v1/files/{uuid}?replica=aws]
## Date: 2020-10-28 00:46
## Status: 400
## Content-Type: <unknown>
## <EMPTY BODY>
##
## [[8]]
## Response [https://dss.data.humancellatlas.org/v1/files/{uuid}?replica=aws]
## Date: 2020-10-28 00:46
## Status: 400
## Content-Type: <unknown>
## <EMPTY BODY>
##
## [[9]]
## Response [https://dss.data.humancellatlas.org/v1/files/{uuid}?replica=aws]
## Date: 2020-10-28 00:46
## Status: 400
## Content-Type: <unknown>
## <EMPTY BODY>
##
## [[10]]
## Response [https://dss.data.humancellatlas.org/v1/files/{uuid}?replica=aws]
## Date: 2020-10-28 00:46
## Status: 400
## Content-Type: <unknown>
## <EMPTY BODY>
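When requests are wrapped in tryCatch() as above, a failed iteration yields NULL, so it can help to drop those entries before the next step. The following is a base-R sketch with a hypothetical `fetch()` stand-in that fails for some inputs. It makes no network calls.

```r
# Hypothetical stand-in for a request that fails for some identifiers.
fetch <- function(id) {
    if (id %% 2 == 0) stop("Missing checkout request")
    list(id = id, status = 200L)
}

responses <- lapply(1:4, function(id) {
    tryCatch(fetch(id), error = function(e) {
        message("Skipping ", id, ": ", conditionMessage(e))
        NULL
    })
})

# Keep only the successful responses before using them further.
ok <- Filter(Negate(is.null), responses)
ok_ids <- vapply(ok, function(r) r$id, integer(1))
ok_ids
```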
In addition to the method shown below, the HCAMatrixBrowser package can itself be used to download expression matrix data.
# Add to filter (functions like json)
hca <- hca %>% filter('files.project_json.project_core.project_title' == 'Census of Immune Cells')
# Use ropenapi to access search function
ret <- searchBundles(hca, getEsQuery(hca), replica='aws', output_format='raw')
# Prepare SearchResult object
ret <- parseToSearchResults(ret)
res <- results(ret)
# Process SearchResult object, fetch file, save results file to BiocFileCache
for (bundles in res) {
    file <- bundles[['metadata']][['manifest']][['files']]
    # endsWith() takes the string first and the suffix second
    if (endsWith(file, 'results')) {
        file <- searchBundles(hca, uuid=file, replica='aws')
        saveToBiocFileCache(file)
    }
}
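Base R's endsWith() takes the string being tested first and the suffix second. The sketch below applies it to a made-up manifest. The nested list layout, file names, and UUID placeholders are assumptions for illustration and may not match the structure the HCA actually returns.

```r
# Made-up manifest entries (names and layout are assumptions).
manifest_files <- list(
    list(name = "sample.fastq.gz", uuid = "file-uuid-1"),
    list(name = "matrix.results",  uuid = "file-uuid-2"),
    list(name = "qc.results",      uuid = "file-uuid-3")
)

# endsWith(x, suffix): the string being tested comes first.
is_result <- vapply(manifest_files,
                    function(f) endsWith(f$name, "results"),
                    logical(1))

# UUIDs of the entries whose name ends in "results".
result_uuids <- vapply(manifest_files[is_result], function(f) f$uuid, character(1))
result_uuids
```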
The S3 object-oriented programming paradigm is used.
Methods from the dplyr package can be used to manipulate objects in the HCABrowser package.