compute.backbone.tree {cellTree}    R Documentation
Builds a ‘backbone tree’ from a fitted LDA model.
compute.backbone.tree(lda.results, grouping = NULL, start.group.label = NULL, absolute.width = 0, width.scale.factor = 1.2, outlier.tolerance.factor = 0.1, rooting.method = NULL, only.mst = FALSE, grouping.colors = NULL, merge.sequential.backbone = FALSE)
| Argument | Description |
|---|---|
| lda.results | A fitted LDA model, as returned by |
| grouping | An (optional) vector of labels for each cell in the |
| start.group.label | If a |
| absolute.width | Numeric (optional). Distance threshold below which a cell vertex is considered to be attached to a backbone vertex (see paper for more details). By default (absolute.width = 0), this threshold is computed dynamically, based on the distance distribution for each branch. |
| width.scale.factor | Numeric (optional). A scaling factor for the dynamically computed distance threshold (ignored if |
| outlier.tolerance.factor | Numeric (optional). Proportion of vertices that may be left over (tolerated as outliers) at the end of the backbone-tree-building algorithm, expressed relative to the average number of vertices per branch (the total number of vertices divided by the total number of branches). |
| rooting.method | String (optional). Method used to root the backbone tree. Must be either NULL or one of: ‘longest.path’, ‘center.start.group’ or ‘average.start.group’. ‘longest.path’ picks one end of the longest shortest path between two vertices. ‘center.start.group’ picks the vertex in the starting group with the lowest mean squared distance to the others. ‘average.start.group’ creates a new artificial vertex, defined as the average of all cells in the starting group. If no value is provided, the best method is picked based on the type of grouping and the start group information available. |
| only.mst | If |
| grouping.colors | (Optional) vector of RGB colors, one for each group. |
| merge.sequential.backbone | (Optional) whether to merge sequential backbone vertices that are close enough. This produces a more compact backbone tree, at the cost of extra computing time. |
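The three rooting strategies accepted by rooting.method can be sketched in base R as follows. This is an illustration under stated assumptions, not cellTree's implementation: the helper names (tree_distances, root_longest_path, root_center_start_group, root_average_start_group) are hypothetical, ‘longest.path’ is assumed to measure path lengths along a weighted tree supplied as a symmetric adjacency matrix, and cells are assumed to be described by a cell-by-topic matrix of topic proportions.

```r
# Hypothetical helpers sketching the three rooting strategies (not cellTree code).

# All-pairs path lengths along a weighted tree (Floyd-Warshall).
# W: symmetric weighted adjacency matrix, 0 meaning "no edge".
tree_distances <- function(W) {
  P <- ifelse(W > 0, W, Inf)
  diag(P) <- 0
  for (k in seq_len(nrow(P))) {
    # Relax every pair (i, j) through intermediate vertex k
    P <- pmin(P, outer(P[, k], P[k, ], `+`))
  }
  P
}

# 'longest.path': one endpoint of the longest shortest path in the tree
root_longest_path <- function(W) {
  P <- tree_distances(W)
  which(P == max(P), arr.ind = TRUE)[1, "row"]
}

# 'center.start.group': start-group cell with the lowest mean squared distance
# to the group (including its own zero distance does not change the argmin)
root_center_start_group <- function(D, start) {
  msd <- rowMeans(D[start, start, drop = FALSE]^2)
  start[which.min(msd)]
}

# 'average.start.group': an artificial vertex, the mean topic profile of the group
root_average_start_group <- function(X, start) {
  colMeans(X[start, , drop = FALSE])
}

# Toy tree: path 1-2-3-4 (weights 1, 2, 3) plus a side branch 2-5 (weight 1)
W <- matrix(0, 5, 5)
W[1, 2] <- W[2, 1] <- 1
W[2, 3] <- W[3, 2] <- 2
W[3, 4] <- W[4, 3] <- 3
W[2, 5] <- W[5, 2] <- 1
longest_root <- root_longest_path(W)

# Toy topic proportions for four cells; cells 1-3 form the starting group
X <- rbind(c(0.9, 0.1), c(0.8, 0.2), c(0.7, 0.3), c(0.1, 0.9))
D <- as.matrix(dist(X))
center_root <- root_center_start_group(D, 1:3)
average_root <- root_average_start_group(X, 1:3)
```

In this toy tree the longest path has length 6 and vertex 4 is one of its ends; the middle cell of the starting group is picked as its centre, and the artificial root is the topic profile (0.8, 0.2).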
To easily visualise the structural and temporal relationships between cells, we introduced a special type of tree structure, dubbed a ‘backbone tree’, defined as follows:
Consider a set of vertices V and a distance function over all pairs of vertices, d: V × V → R+. We call a graph T with backbone B a backbone tree if:
T is a tree with vertex set V and edge set E.
B is a tree with vertex set V_B ⊆ V and edge set E_B ⊆ E.
All ‘vertebrae’ vertices of T, v ∈ V \ V_B, are connected by a single edge to their closest backbone vertex v*_B ∈ V_B, i.e. v*_B = argmin_{v_B ∈ V_B} d(v_B, v).
All vertices in V \ V_B lie within distance δ of some vertex of the backbone tree B: for all v ∈ V \ V_B, there exists v_B ∈ V_B such that d(v, v_B) < δ.
In this instance, we relax the last condition so that it only needs to hold for ‘most’ non-backbone vertices, allowing a variable proportion of outliers at distance ≥ δ from every vertex in V_B.
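The attachment rule and the relaxed δ condition can be checked on a toy example. The sketch below uses base R only and assumes a symmetric distance matrix D; attach_to_backbone is a hypothetical helper, not part of cellTree, and it takes the backbone vertices and δ as given rather than searching for them.

```r
# Hypothetical helper: attach each non-backbone ("vertebra") vertex to its
# closest backbone vertex and flag it as an outlier when d(v, v*_B) >= delta.
attach_to_backbone <- function(D, backbone, delta) {
  backbone <- as.integer(backbone)
  others <- setdiff(seq_len(nrow(D)), backbone)
  # v*_B = argmin over backbone vertices v_B of d(v_B, v)
  nearest <- vapply(others, function(v) backbone[which.min(D[v, backbone])],
                    integer(1))
  d_near <- D[cbind(others, nearest)]
  data.frame(vertex = others, attached.to = nearest,
             distance = d_near, outlier = d_near >= delta)
}

# Toy 1-D data: vertices 2 and 4 (at 0.1 and 1.0) form the backbone
x <- c(0, 0.1, 0.2, 1, 1.1, 5)
res <- attach_to_backbone(as.matrix(dist(x)), backbone = c(2, 4), delta = 0.5)
outlier_share <- mean(res$outlier)  # proportion violating the strict condition
```

Here vertices 1 and 3 attach to vertex 2 and vertex 5 to vertex 4, all within δ = 0.5; vertex 6 (at 5) attaches to vertex 4 but is an outlier, so one quarter of the non-backbone vertices fall outside the strict condition. Under the relaxed definition, such a tree is acceptable as long as this share stays within the tolerated proportion.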
We can then define the ‘optimal’ backbone tree as a backbone tree whose backbone subtree has minimal total weight, i.e. the sum of the edge weights over E_B is minimal. Finding such a tree can easily be shown to be NP-complete (by reduction from the Vertex Cover problem), but we developed a fast heuristic based on the minimum spanning tree (MST) that produces a reasonable approximation.
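The MST that the heuristic relies on can be computed with Prim's algorithm. The sketch below is a textbook Prim implementation in base R on a dense distance matrix, together with a hypothetical backbone_cost helper that evaluates the objective for a chosen set of backbone vertices, counting only MST edges whose two endpoints are both backbone vertices. It does not reproduce cellTree's backbone-selection heuristic, whose details are not described here.

```r
# Prim's minimum spanning tree on a dense distance matrix (base R only).
# Returns the MST as a two-column edge matrix plus the matching edge weights.
prim_mst <- function(D) {
  n <- nrow(D)
  in_tree <- c(TRUE, rep(FALSE, n - 1))  # grow the tree from vertex 1
  best <- D[1, ]                         # cheapest known link into the tree
  parent <- rep(1L, n)                   # tree endpoint of that link
  edges <- matrix(integer(0), ncol = 2)
  weights <- numeric(0)
  for (step in seq_len(n - 1)) {
    cand <- which(!in_tree)
    v <- cand[which.min(best[cand])]     # closest vertex not yet in the tree
    edges <- rbind(edges, c(parent[v], v))
    weights <- c(weights, best[v])
    in_tree[v] <- TRUE
    # Update links for vertices now closer to v than to the rest of the tree
    closer <- !in_tree & D[v, ] < best
    best[closer] <- D[v, closer]
    parent[closer] <- v
  }
  list(edges = edges, weights = unname(weights))
}

# Hypothetical objective: total weight of MST edges joining two backbone vertices
backbone_cost <- function(mst, backbone) {
  keep <- mst$edges[, 1] %in% backbone & mst$edges[, 2] %in% backbone
  sum(mst$weights[keep])
}

x <- c(0, 1, 3, 6)
mst <- prim_mst(as.matrix(dist(x)))
cost <- backbone_cost(mst, c(2, 3))
```

On the toy points 0, 1, 3 and 6 the MST is the path 1-2-3-4 with total weight 6, and choosing vertices 2 and 3 as the backbone gives an objective of 2.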
The resulting quasi-optimal backbone tree (simply referred to as ‘the’ backbone tree hereafter) gives a clear hierarchical representation of the relationships between cells: the objective function favours a small group of prominent cells (the backbone) that are good representatives of the major steps in cell evolution (in time or space), while the remaining cells are similar enough to their closest representative for their differences to be ignored. Such a tree provides a clear visualisation of overall cell differentiation paths (including potential differentiation into sub-types).
An igraph object containing either a rooted minimum spanning tree (if only.mst is TRUE) or a quasi-optimal backbone tree connecting all input cells. Cell topic distributions, distances and branch order are added as vertex, edge and graph attributes.
```r
# Load the cellTree package and its pre-computed LDA model for skeletal
# myoblast RNA-Seq data from the HSMMSingleCell package:
library(cellTree)
data(HSMM_lda_model)

# Recover sampling time (in days) for each cell:
library(HSMMSingleCell)
data(HSMM_sample_sheet)
days.factor <- HSMM_sample_sheet$Hours
days <- as.numeric(levels(days.factor))[days.factor]

# Compute near-optimal backbone tree:
b.tree <- compute.backbone.tree(HSMM_lda_model, days)

# Plot resulting tree with sampling time as a vertex group colour:
ct.plot.grouping(b.tree)
```