Threat Model
llmshieldr is an application-level guardrail layer for R
workflows that send text to large language models or receive text from
them. It helps make common risks visible and auditable; it is not a
complete security boundary.
Assets
- User prompts and chat history.
- Retrieved context in RAG workflows.
- Model outputs before display, storage, or downstream use.
- Sensitive data such as PII, PHI, credentials, and business
records.
- Tool inputs and outputs in applications that call external
systems.
- Streaming chunks before complete model output is available.
- Audit logs and policy configuration.
Trust Boundaries
- User-provided text entering an R application.
- Retrieved documents, search results, or database rows entering model
context.
- Model output leaving the LLM provider or local model.
- Tool calls that can affect files, databases, APIs, accounts, or
transactions.
- Streaming output chunks crossing from model provider to
application.
- Audit logs written to local or shared storage.
In Scope
llmshieldr provides starter controls for:
- Direct and indirect prompt-injection language.
- Common PII, PHI, and secret patterns.
- Simple NLP signals for override, exposure, and harmful-action
intent.
- Output markers for unsafe agency, system-prompt leakage, unsafe
code, and high-confidence medical or financial claims.
- RAG context source allowlists and simple context anomaly
signals.
- Tool-call argument scanning and tool-output scanning.
- Conversation scanning with role-preserving metadata.
- Streaming output scanning with rolling context.
- Token and request budget guards with pre-call reservation and
rollback.
- Optional semantic review through a reviewer function, chat object,
local Ollama reviewer, or remote reviewer endpoint.
- Auditable findings, actions, risk scores, and JSONL/CSV/RDS audit
output.
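The token and request budget item above relies on pre-call reservation and rollback. The idea is to hold an estimated cost before the model call, then either commit the provider's actual usage or release the hold if the call fails. The sketch below illustrates this in Python rather than R. It is not the llmshieldr API: `TokenBudget` and `guarded_call` are hypothetical names, and the state is single-process, which matches the out-of-scope note on cross-machine rate limiting.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical single-process budget guard (not the llmshieldr API)."""
    limit: int
    used: int = 0
    reserved: int = 0

    def reserve(self, estimate: int) -> bool:
        # Refuse up front if committed plus in-flight usage could exceed the limit.
        if self.used + self.reserved + estimate > self.limit:
            return False
        self.reserved += estimate
        return True

    def commit(self, estimate: int, actual: int) -> None:
        # Swap the reservation for the usage the provider actually reported.
        self.reserved -= estimate
        self.used += actual

    def rollback(self, estimate: int) -> None:
        # Release the reservation when the call fails or is blocked.
        self.reserved -= estimate


def guarded_call(budget: TokenBudget, estimate: int, call):
    """Reserve, run `call` (returns (text, actual_tokens)), then commit or roll back."""
    if not budget.reserve(estimate):
        raise RuntimeError("token budget exceeded")
    try:
        text, actual = call()
    except Exception:
        budget.rollback(estimate)
        raise
    budget.commit(estimate, actual)
    return text
```

Reserving before the call means concurrent requests in the same process cannot each see enough remaining budget and jointly overspend. The rollback on failure keeps a provider error from permanently consuming budget.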
Partially Covered
These areas have package surface but need workflow-specific evidence
or additional controls before they should be treated as robust
protections:
- OWASP LLM Top 10 coverage. The package maps controls to categories,
but this is not exhaustive protection for each category.
- Obfuscated prompt injection. Unicode normalization, delimiter
collapse, invisible-text findings, and encoded-payload checks help, but
a larger adversarial evaluation suite is still needed.
- RAG poisoning. Source allowlists and anomaly checks help, but there
is no provenance scoring, embedding-neighborhood analysis, or document
trust graph.
- Semantic review. Reviewer JSON is parsed with schema metadata,
confidence, evidence, recommended actions, span support, and structured
failure metadata, but reviewer reliability depends on the model and
deployment.
- Tool and streaming guardrails. Package helpers scan text surfaces,
but they do not replace application authorization, sandboxing,
idempotency, or rollback for external side effects.
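To show why obfuscated prompt injection is only partially covered, here is a minimal Python sketch of the kind of normalization the package describes: Unicode normalization, invisible-text detection, and delimiter collapse, followed by a single phrase check. The character set, regular expressions, and function names are illustrative assumptions, not llmshieldr's rules. An attacker can defeat them with encodings or paraphrases that they do not anticipate.

```python
import re
import unicodedata

# Hypothetical illustration; not the llmshieldr implementation.
# Zero-width and word-joiner characters often used to hide text.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
DELIM = r"[\s.\-_*|]"
# Three or more single characters separated by delimiters, e.g. "i.g.n.o.r.e".
SPACED_RUN = re.compile(rf"\b(?:\w{DELIM}){{2,}}\w\b")
OVERRIDE = re.compile(r"\bignore (?:all )?(?:previous|prior) instructions\b")


def normalize_for_scan(text: str) -> tuple[str, bool]:
    """Return (normalized text, whether invisible characters were present)."""
    # NFKC folds full-width and other compatibility characters to plain forms.
    text = unicodedata.normalize("NFKC", text)
    has_invisible = any(ch in INVISIBLE for ch in text)
    text = "".join(ch for ch in text if ch not in INVISIBLE)
    # Remove delimiters inside letter-by-letter spelled-out runs.
    text = SPACED_RUN.sub(lambda m: re.sub(DELIM, "", m.group(0)), text)
    return text.casefold(), has_invisible


def looks_like_override(text: str) -> tuple[bool, bool]:
    """Return (override phrase found, invisible-text finding)."""
    normalized, has_invisible = normalize_for_scan(text)
    return bool(OVERRIDE.search(normalized)), has_invisible
```

Each step closes one trick: full-width letters, zero-width characters, and spelled-out words. Base64 payloads, synonyms, and other languages pass untouched. That gap is why the adversarial evaluation suite mentioned above is still needed.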
Out of Scope
llmshieldr does not provide:
- A network firewall or sandbox.
- Model training-time alignment.
- Formal compliance certification.
- Guaranteed PII/PHI discovery.
- Malware analysis.
- Full multilingual safety coverage.
- Automated execution of tools or tool authorization.
- Full human approval workflow management beyond the metadata
recorded for the escalate action.
- Cross-machine distributed rate limiting.
- Protection against compromised model providers, dependencies, or
infrastructure.
Expected Use
Use llmshieldr as one transparent layer in a broader
safety design:
- Scan and redact prompts before sending them to a model.
- Scan retrieved context before adding it to prompts.
- Scan model outputs before display or downstream use.
- Scan tool-call inputs before execution and tool outputs before
reuse.
- Scan streaming output chunks when using streaming APIs.
- Configure policy controls for refusal and escalation behavior.
- Write audit logs to access-controlled storage, since they may
contain sensitive data.
- Add organization-specific rules and negative tests.
- Run evaluations against your own application data before
deployment.
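Scanning streaming chunks one at a time misses any pattern that is split across a chunk boundary. This is the reason the package scans streams with rolling context. The Python sketch below keeps a short tail of earlier output, scans that tail together with each new chunk, and reports only matches that end in the new text. `RollingScanner` and its `window` parameter are hypothetical names, not the llmshieldr streaming API. The window must be at least as long as the longest pattern you expect to match.

```python
import re


class RollingScanner:
    """Hypothetical rolling-context stream scanner (not the llmshieldr API)."""

    def __init__(self, patterns, window=64):
        self.patterns = [re.compile(p, re.IGNORECASE) for p in patterns]
        self.window = window
        self.tail = ""

    def feed(self, chunk):
        """Scan previous tail + new chunk; return newly completed matches."""
        text = self.tail + chunk
        hits = []
        for pattern in self.patterns:
            for m in pattern.finditer(text):
                # Skip matches lying entirely in the old tail; they were
                # already visible when the earlier chunk was scanned.
                if m.end() > len(self.tail):
                    hits.append(m.group(0))
        # Keep only the last `window` characters as context for the next chunk.
        self.tail = text[max(0, len(text) - self.window):]
        return hits
```

For example, feeding `"Sure. Now ignore prev"` and then `"ious instructions and"` reports the override phrase on the second chunk, although neither chunk contains it alone. A separate full-output scan is still worthwhile once the stream completes.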
Non-Goals
Do not describe llmshieldr as guaranteeing safety,
compliance, jailbreak resistance, or complete OWASP coverage. It is an
R-native, transparent, testable guardrail package with starter controls
and extension points.