Shannon and Simpson indices are the two most widely used diversity measures in ecology. They quantify how species abundances are distributed within a community — essentially answering: “How diverse is this community based on how individuals are distributed among species?”
Both indices consider only the abundance distribution. They do not account for taxonomic, phylogenetic, or functional relationships between species. A community of 10 species from the same genus receives the same score as a community of 10 species spanning 10 different orders.
This is why taxdiv pairs them with taxonomic measures — classical indices capture the abundance structure, while taxonomic indices capture the hierarchical structure.
Shannon entropy, borrowed from information theory (Shannon, 1948), measures the uncertainty in predicting the species identity of a randomly chosen individual. High uncertainty means high diversity — if species are evenly distributed, it is hard to guess which species the next individual belongs to.
\[H' = -\sum_{i=1}^{S} p_i \ln(p_i)\]
where \(p_i\) is the proportion of species \(i\) and \(S\) is the total number of species.
# Default: natural logarithm
H <- shannon(community)
cat("Shannon H':", round(H, 4), "\n")
#> Shannon H': 2.0948
cat("Maximum possible H' for", length(community), "species:",
round(log(length(community)), 4), "\n")
#> Maximum possible H' for 10 species: 2.3026
cat("Evenness (H'/H'max):", round(H / log(length(community)), 4), "\n")
#> Evenness (H'/H'max): 0.9098When sample sizes are small, the observed Shannon index underestimates the true value because rare species are likely missing from the sample. taxdiv provides three correction methods:
cat("Uncorrected: ", round(shannon(community), 4), "\n")
#> Uncorrected: 2.0948
cat("Miller-Madow: ", round(shannon(community, correction = "miller_madow"), 4), "\n")
#> Miller-Madow: 2.1292
cat("Grassberger: ", round(shannon(community, correction = "grassberger"), 4), "\n")
#> Grassberger: 2.1338
cat("Chao-Shen: ", round(shannon(community, correction = "chao_shen"), 4), "\n")
#> Chao-Shen: 2.1014Which correction to use?
Simpson’s index (Simpson, 1949) measures the probability that two randomly chosen individuals belong to the same species. A community dominated by one species has a high probability (low diversity); an even community has a low probability (high diversity).
taxdiv provides all three common Simpson variants:
# Dominance (D): probability of same-species pair
D <- simpson(community, type = "dominance")
cat("Simpson dominance (D): ", round(D, 4), "\n")
#> Simpson dominance (D): 0.1424
# Gini-Simpson (1-D): probability of different-species pair
GS <- simpson(community, type = "gini_simpson")
cat("Gini-Simpson (1-D): ", round(GS, 4), "\n")
#> Gini-Simpson (1-D): 0.8576
# Inverse Simpson (1/D): effective number of species
inv <- simpson(community, type = "inverse")
cat("Inverse Simpson (1/D): ", round(inv, 4), "\n")
#> Inverse Simpson (1/D): 7.0246| Variant | Formula | Range | Interpretation |
|---|---|---|---|
| Dominance (D) | \(\sum p_i^2\) | 0 to 1 | Higher = less diverse (one species dominates) |
| Gini-Simpson (1-D) | \(1 - \sum p_i^2\) | 0 to 1 | Higher = more diverse (common choice) |
| Inverse Simpson (1/D) | \(1 / \sum p_i^2\) | 1 to S | Effective number of equally abundant species |
The inverse Simpson is often the most intuitive: a value of 6.5 means the community is as diverse as one with 6.5 perfectly even species.
# Even community
even <- c(sp1 = 20, sp2 = 20, sp3 = 20, sp4 = 20, sp5 = 20)
# Uneven community (same species, different abundances)
uneven <- c(sp1 = 90, sp2 = 4, sp3 = 3, sp4 = 2, sp5 = 1)
cat("=== Even community ===\n")
#> === Even community ===
cat("Shannon:", round(shannon(even), 4), "\n")
#> Shannon: 1.6094
cat("Simpson (1-D):", round(simpson(even, type = "gini_simpson"), 4), "\n\n")
#> Simpson (1-D): 0.8
cat("=== Uneven community ===\n")
#> === Uneven community ===
cat("Shannon:", round(shannon(uneven), 4), "\n")
#> Shannon: 0.4531
cat("Simpson (1-D):", round(simpson(uneven, type = "gini_simpson"), 4), "\n")
#> Simpson (1-D): 0.187Key difference: Shannon is more sensitive to rare species (because of the logarithm), while Simpson is more sensitive to dominant species (because of the squaring). When a community has many rare species, Shannon will detect them; Simpson may not.
| Scenario | Better index |
|---|---|
| Comparing sites with different rare species | Shannon |
| Detecting dominance shifts | Simpson |
| Need sample-size independence | Neither (use AvTD) |
| Need taxonomic information | Neither (use pTO or Delta) |
Classical indices treat all species as interchangeable. Consider:
# Community A: 5 species from 5 different orders
comm_A <- c(sp1 = 20, sp2 = 20, sp3 = 20, sp4 = 20, sp5 = 20)
# Community B: 5 species from the same genus
comm_B <- c(sp6 = 20, sp7 = 20, sp8 = 20, sp9 = 20, sp10 = 20)
cat("Community A (5 orders) - Shannon:", round(shannon(comm_A), 4), "\n")
#> Community A (5 orders) - Shannon: 1.6094
cat("Community B (1 genus) - Shannon:", round(shannon(comm_B), 4), "\n")
#> Community B (1 genus) - Shannon: 1.6094
cat("Identical scores, yet A is far more taxonomically diverse.\n")
#> Identical scores, yet A is far more taxonomically diverse.This is exactly why taxdiv includes Clarke & Warwick and Ozkan pTO indices — they incorporate the taxonomic hierarchy to distinguish between these communities. See the Clarke & Warwick and Ozkan pTO articles for details.