Ozkan (2018) introduced a novel approach to measuring taxonomic diversity using Deng entropy — a generalization of Shannon entropy rooted in Dempster-Shafer evidence theory (Dempster, 1967; Shafer, 1976).
The key idea: at each level of the taxonomic hierarchy (genus, family, order, etc.), Deng entropy measures how evenly species are distributed across groups. The product of these level-wise entropies gives a single number that captures the entire hierarchical diversity of a community.
This approach produces 8 complementary indices through a three-stage pipeline, each answering a slightly different question about the community.
```r
library(taxdiv)

community <- c(
  Quercus_coccifera   = 25,
  Quercus_infectoria  = 18,
  Pinus_brutia        = 30,
  Pinus_nigra         = 12,
  Juniperus_excelsa   = 8,
  Juniperus_oxycedrus = 6,
  Arbutus_andrachne   = 15,
  Styrax_officinalis  = 4,
  Cercis_siliquastrum = 3,
  Olea_europaea       = 10
)

tax_tree <- build_tax_tree(
  species = names(community),
  Genus   = c("Quercus", "Quercus", "Pinus", "Pinus",
              "Juniperus", "Juniperus", "Arbutus", "Styrax",
              "Cercis", "Olea"),
  Family  = c("Fagaceae", "Fagaceae", "Pinaceae", "Pinaceae",
              "Cupressaceae", "Cupressaceae", "Ericaceae", "Styracaceae",
              "Fabaceae", "Oleaceae"),
  Order   = c("Fagales", "Fagales", "Pinales", "Pinales",
              "Pinales", "Pinales", "Ericales", "Ericales",
              "Fabales", "Lamiales")
)
```

Shannon entropy treats each species as an independent event with probability \(p_i\). But in a taxonomic hierarchy, species are grouped: two oak species share more information than an oak and a pine. Shannon entropy cannot capture this grouping.
Deng entropy solves this through the concept of focal elements from evidence theory. At each taxonomic level, a group (e.g., “Family Fagaceae”) acts as a focal element with a mass proportional to the species it contains. The entropy accounts for both the mass distribution and the size of each focal element (how many species it contains):
\[E_d = -\sum_{i=1}^{n} m(F_i) \log_2 \frac{m(F_i)}{2^{|F_i|} - 1}\]
where \(m(F_i)\) is the mass of focal element \(F_i\) and \(|F_i|\) is the number of species it contains.
The term \(2^{|F_i|} - 1\) accounts for all possible non-empty subsets of species within the group. A genus with 3 species has \(2^3 - 1 = 7\) possible subcombinations, giving it more “evidential weight” than a single-species genus.
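To make the formula concrete, here is a minimal base-R sketch that evaluates it for the example community at each level. It is not the taxdiv implementation: `deng_entropy()` is a hypothetical helper, the mass of each group is assumed to be its share of the species, and the natural logarithm is assumed as the default because it reproduces the level-wise values printed in this vignette (pass `base = 2` for the \(\log_2\) form in the equation).

```r
# Hypothetical helper: Deng entropy for one taxonomic level.
# `groups` holds the group label of every species at that level.
# Assumptions: m(F_i) = share of species in group i; natural log by default.
deng_entropy <- function(groups, base = exp(1)) {
  sizes <- as.numeric(table(groups))   # |F_i|: number of species per group
  mass  <- sizes / sum(sizes)          # m(F_i)
  -sum(mass * log(mass / (2^sizes - 1), base = base))
}

# Group labels for the 10 example species, copied from the taxonomy above
genus  <- c("Quercus", "Quercus", "Pinus", "Pinus", "Juniperus",
            "Juniperus", "Arbutus", "Styrax", "Cercis", "Olea")
family <- c("Fagaceae", "Fagaceae", "Pinaceae", "Pinaceae", "Cupressaceae",
            "Cupressaceae", "Ericaceae", "Styracaceae", "Fabaceae", "Oleaceae")
ord    <- c("Fagales", "Fagales", "Pinales", "Pinales", "Pinales",
            "Pinales", "Ericales", "Ericales", "Fabales", "Lamiales")

tax_levels <- list(
  Species = paste0("sp", 1:10),  # every species is its own focal element
  Genus   = genus,
  Family  = family,
  Order   = ord
)

ed <- vapply(tax_levels, deng_entropy, numeric(1))
round(ed, 4)
```

Under these assumptions the sketch gives 2.3026, 2.5459, 2.5459 and 2.9935 for the Species, Genus, Family and Order levels. Switching to `base = 2` only rescales each value by \(1 / \ln 2\).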
```r
result <- ozkan_pto(community, tax_tree)

cat("Deng entropy by taxonomic level:\n\n")
#> Deng entropy by taxonomic level:
for (i in seq_along(result$Ed_levels)) {
  level <- names(result$Ed_levels)[i]
  value <- result$Ed_levels[i]
  cat(sprintf(" %-10s Ed = %.4f\n", level, value))
}
#> Species Ed = 2.3026
#> Genus Ed = 2.5459
#> Family Ed = 2.5459
#> Order Ed = 2.9935
```

How to interpret:
A level with Deng entropy = 0 means all species belong to a single group at that level — it contributes no taxonomic information.
The Ozkan method produces 8 values organized in a 2 x 2 x 2 structure:
```r
cat("=== All 8 Ozkan pTO indices ===\n\n")
#> === All 8 Ozkan pTO indices ===

cat("Standard (all levels):\n")
#> Standard (all levels):
cat(" uTO =", round(result$uTO, 4), " (unweighted diversity)\n")
#> uTO = 7.4895 (unweighted diversity)
cat(" TO =", round(result$TO, 4), " (weighted diversity)\n")
#> TO = 10.6675 (weighted diversity)
cat(" uTO+ =", round(result$uTO_plus, 4), " (unweighted distance)\n")
#> uTO+ = 8.5502 (unweighted distance)
cat(" TO+ =", round(result$TO_plus, 4), " (weighted distance)\n\n")
#> TO+ = 11.7283 (weighted distance)

cat("Max-informative levels:\n")
#> Max-informative levels:
cat(" uTO_max =", round(result$uTO_max, 4), " (unweighted, informative only)\n")
#> uTO_max = 7.4895 (unweighted, informative only)
cat(" TO_max =", round(result$TO_max, 4), " (weighted, informative only)\n")
#> TO_max = 10.6675 (weighted, informative only)
cat(" uTO+_max =", round(result$uTO_plus_max, 4), " (unweighted distance, informative only)\n")
#> uTO+_max = 8.5502 (unweighted distance, informative only)
cat(" TO+_max =", round(result$TO_plus_max, 4), " (weighted distance, informative only)\n")
#> TO+_max = 11.7283 (weighted distance, informative only)
```

| Question | Index |
|---|---|
| Pure taxonomic structure (no abundance) | uTO or TO |
| Taxonomic diversity + abundance evenness | uTO+ or TO+ |
| Are some taxonomic levels uninformative? | Use **_max** variants |
| Default recommendation for most studies | TO+ (most complete) |
Run 1 uses the full community as-is and computes all 8 indices directly.
In Run 2, species are removed one at a time, starting with the least abundant. After each removal, all indices are recalculated. This “slicing” procedure reveals two things:
```r
run2 <- ozkan_pto_resample(community, tax_tree, n_iter = 101, seed = 42)

cat("Run 1 (deterministic): uTO+ =", round(run2$uTO_plus_det, 4), "\n")
#> Run 1 (deterministic): uTO+ = 8.5502
cat("Run 2 (stochastic max): uTO+ =", round(run2$uTO_plus_max, 4), "\n")
#> Run 2 (stochastic max): uTO+ = 8.5502
```

Why can the stochastic maximum exceed the deterministic value? Because some species may be taxonomically redundant. If two species from the same genus are present, removing one can increase the ratio of between-group to within-group diversity. A species whose removal increases diversity is called an “unhappy” species: it is taxonomically redundant in the community. In this example the two values coincide (8.5502), so resampling found no subcommunity more diverse than the full community.
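The removal order described above (least abundant species first) can be sketched in base R. This only illustrates how nested subcommunities could be built from that ordering; it is not how `ozkan_pto_resample()` is implemented, and `min_species` is a hypothetical stopping rule introduced for the example.

```r
# Example abundances, copied from the community defined earlier
community <- c(
  Quercus_coccifera = 25, Quercus_infectoria = 18,
  Pinus_brutia = 30, Pinus_nigra = 12,
  Juniperus_excelsa = 8, Juniperus_oxycedrus = 6,
  Arbutus_andrachne = 15, Styrax_officinalis = 4,
  Cercis_siliquastrum = 3, Olea_europaea = 10
)

# Ascending sort: the least abundant species is removed first
removal_order <- names(sort(community))

# Hypothetical stopping rule: keep at least two species in every slice
min_species <- 2
n_slices <- length(community) - min_species

# Slice k drops the k least abundant species and keeps the rest
slices <- lapply(seq_len(n_slices), function(k) {
  dropped <- removal_order[seq_len(k)]
  community[!names(community) %in% dropped]
})

slice_sizes <- vapply(slices, length, integer(1))
slice_sizes
```

Each element of `slices` is a named abundance vector, so any index function that accepts the full community could in principle be applied to every slice and compared with the full-community value.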
How to read:
Points above the red line represent subcommunities more diverse than the full community — evidence that some species are taxonomically redundant.
Some taxonomic levels carry no information. If all species belong to the same order, Deng entropy at the order level is zero — including it in the product just drags the value down without adding insight.
Run 3 repeats the calculation using only levels where Deng entropy > 0:
```r
full <- ozkan_pto_full(community, tax_tree, n_iter = 101, seed = 42)

cat("Complete pipeline summary:\n\n")
#> Complete pipeline summary:
cat(" uTO+ TO+ uTO TO\n")
#> uTO+ TO+ uTO TO
cat("Run 1:", sprintf("%9.4f %9.4f %9.4f %9.4f",
                      full$run1$uTO_plus, full$run1$TO_plus,
                      full$run1$uTO, full$run1$TO), "\n")
#> Run 1: 8.5502 11.7283 7.4895 10.6675
cat("Run 2:", sprintf("%9.4f %9.4f %9.4f %9.4f",
                      full$run2$uTO_plus_max, full$run2$TO_plus_max,
                      full$run2$uTO_max, full$run2$TO_max), "\n")
#> Run 2: 8.5502 11.7283 7.4895 10.6675
cat("Run 3:", sprintf("%9.4f %9.4f %9.4f %9.4f",
                      full$run3$uTO_plus_max, full$run3$TO_plus_max,
                      full$run3$uTO_max, full$run3$TO_max), "\n")
#> Run 3: 8.5502 11.7283 7.5029 10.6808
```

The jackknife procedure removes each species one at a time and recalculates all indices. This directly measures each species’ contribution:
```r
jk <- ozkan_pto_jackknife(community, tax_tree)

cat("Jackknife results (TO+ when each species is removed):\n\n")
#> Jackknife results (TO+ when each species is removed):
jk_df <- jk$jackknife_results
for (i in seq_len(nrow(jk_df))) {
  direction <- ifelse(jk_df$TO_plus[i] > result$TO_plus, "UNHAPPY", "happy")
  cat(sprintf(" Remove %-25s -> TO+ = %.4f [%s]\n",
              jk_df$species[i], jk_df$TO_plus[i], direction))
}
#> Remove Quercus_coccifera -> TO+ = 11.4820 [happy]
#> Remove Quercus_infectoria -> TO+ = 11.4820 [happy]
#> Remove Pinus_brutia -> TO+ = 11.6616 [happy]
#> Remove Pinus_nigra -> TO+ = 11.6616 [happy]
#> Remove Juniperus_excelsa -> TO+ = 11.6616 [happy]
#> Remove Juniperus_oxycedrus -> TO+ = 11.6616 [happy]
#> Remove Arbutus_andrachne -> TO+ = 11.3238 [happy]
#> Remove Styrax_officinalis -> TO+ = 11.3238 [happy]
#> Remove Cercis_siliquastrum -> TO+ = 11.2505 [happy]
#> Remove Olea_europaea -> TO+ = 11.2505 [happy]
cat("\nHappy species:", jk$n_happy, "\n")
#>
#> Happy species: 10
cat("Unhappy species:", jk$n_unhappy, "\n")
#> Unhappy species: 0
```

```r
degraded <- c(
  Quercus_coccifera   = 40,
  Pinus_brutia        = 35,
  Juniperus_oxycedrus = 10
)

communities <- list(
  "Intact (10 spp)"  = community,
  "Degraded (3 spp)" = degraded
)

plot_radar(communities, tax_tree,
           title = "Intact vs Degraded Forest")
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_point()`).
```

The radar chart reveals which diversity dimensions are most affected by degradation. If abundance-weighted indices (Shannon, Simpson, TO+) drop more than presence/absence indices (AvTD, uTO+), the community has lost evenness. If both drop equally, the community has lost taxonomic breadth.
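Part of this comparison can be checked by hand. The base-R sketch below computes Shannon and Simpson for the two example communities and reports the relative change of each. Several things here are assumptions rather than what `plot_radar()` does: the helper functions are hypothetical, Simpson is taken in its Gini-Simpson form \(1 - \sum p_i^2\), and species richness stands in for the presence/absence side because AvTD and uTO+ require the package.

```r
# Hypothetical helpers for two abundance-weighted indices
shannon <- function(x) { p <- x / sum(x); -sum(p * log(p)) }
simpson <- function(x) { p <- x / sum(x); 1 - sum(p^2) }  # Gini-Simpson form

# Example communities, copied from the vignette
community <- c(
  Quercus_coccifera = 25, Quercus_infectoria = 18,
  Pinus_brutia = 30, Pinus_nigra = 12,
  Juniperus_excelsa = 8, Juniperus_oxycedrus = 6,
  Arbutus_andrachne = 15, Styrax_officinalis = 4,
  Cercis_siliquastrum = 3, Olea_europaea = 10
)
degraded <- c(Quercus_coccifera = 40, Pinus_brutia = 35, Juniperus_oxycedrus = 10)

# Rows: indices; columns: communities
indices <- rbind(
  Richness = c(length(community), length(degraded)),  # presence/absence stand-in
  Shannon  = c(shannon(community), shannon(degraded)),
  Simpson  = c(simpson(community), simpson(degraded))
)
colnames(indices) <- c("Intact", "Degraded")

# Relative change from intact to degraded (negative = loss)
rel_change <- (indices[, "Degraded"] - indices[, "Intact"]) / indices[, "Intact"]
round(cbind(indices, rel_change), 3)
```

Comparing the rows of `rel_change` gives the same kind of reading as the radar chart: a larger relative drop in the abundance-weighted rows than in the presence/absence row would point to lost evenness.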