This article is a compact maintainer-oriented map of the package. It explains how safety decisions are produced without requiring a separate design document at the repository root.
policy() creates rules, thresholds, controls, and optional rate guards
scan_prompt() checks user input before it reaches a model
scan_context() checks retrieved rows before prompt assembly
scan_conversation() checks role-preserving chat histories
scan_tool_call() and scan_tool_output() guard tool boundaries
scan_stream() scans streamed output with rolling context
scan_output() checks model text before display, storage, or downstream use
secure_chat() orchestrates scanning, chat execution, output scanning, and audit
write_audit_log() persists the end-to-end evidence trail
The package keeps the safety path inspectable. Every scanner result
is based on explicit findings. Every finding has a rule id, severity,
action, optional OWASP LLM category, and optional character span.
Scanner reports resolve to allow, redact, or
block; orchestration results may also use
refuse or escalate when policy controls map a
block to those outcomes.
ellmer chat, object with
$chat(), or plain R function can be used.R/rules.R.R/policy.R.R/scan_prompt.R.R/scan_context.R.R/scan_output.R.R/secure_chat.R.shieldr_rule
id stable rule identifier
pattern regex pattern, or NULL
fn R predicate function, or NULL
owasp OWASP LLM category
severity low, medium, high, or critical
action allow, redact, or block
description human-readable explanation
shieldr_policy
name policy identifier stored in reports
rules list of shieldr_rule objects
thresholds redact_at and block_at numeric cutoffs
rate_guard optional shieldr_rate_guard environment
trusted_sources optional allowlist used by scan_context()
controls secure_chat() block/refuse/escalate/drop behavior
shieldr_report
action scanner action
text_clean normalized and possibly redacted text
findings list of finding objects
risk_score deterministic severity score
policy policy name
checks rules, nlp, llm, or both
metadata surface-specific operational metadata
Severity weights are:
| Severity | Score |
|---|---|
low |
0.1 |
medium |
0.3 |
high |
0.6 |
critical |
1.0 |
Findings are deduplicated before scoring. Overlapping span findings
from the same source, OWASP category, and action count as the strongest
single piece of evidence instead of stacking together. Distinct findings
still accumulate, and the total score is capped at 1.0.
Synthetic scanner or context findings are tracked separately and capped
before being added to normal rule evidence.
Actions are resolved conservatively:
if any finding is critical:
block
else if any finding action is block:
block
else if risk_score > block_at:
block
else if any finding action is redact:
redact
else if risk_score >= redact_at:
redact
else:
allow
The strict greater-than comparison for block_at keeps a
single high-severity redaction finding from escalating solely because
its score equals a threshold. Explicit block rules and
critical findings still block immediately.
shieldr_rule() and add_rule().scanner_options() for local scanners such as
encoded payloads, URL host policy, language allowlists, topic bans, and
token limits.redaction_strategy() for replace, mask, hash, drop,
and keep behavior.policy_controls() to choose refuse, escalate, drop,
or keep-redacted outcomes after scanner blocks.ollama_reviewer() or remote_reviewer().Before release, regenerate documentation, run the test suite, run
R CMD check --as-cran, review examples that require
external services, and update NEWS.md and
cran-comments.md.