AI Governance & Guardrails
Production LLM features need input/output filtering, audit logs, PII handling, and a clear human-override path. The model is the easy part.
Deep dive
Layers
- Input: PII redaction before send, prompt-injection detection on user-supplied content, length limits.
- Output: structured-output validation (schemas, regex, content filters), refuse/escalate on low confidence.
- Audit: log prompts and responses (with PII handling) for incident response and continuous eval.
- Human-in-the-loop: for high-stakes outputs (medical, legal, financial), require human approval before action.
EU AI Act & similar
Classify your use case (limited risk / high risk / prohibited). Document data sources, evaluation methodology, and a risk register. Have a process for incident reporting.
Real-world example
From productionCustomer-facing chatbot leaked a system prompt via prompt injection. Added: input filter for known injection patterns, output check for system-prompt fragments, conversation-level rate limit, and an audit pipeline. Six months later: zero successful jailbreaks reported, full trace of every interaction.
Interview questions
1 senior-levelQ1How would you protect against prompt injection?▾
Defense in depth: separate system and user content with clear delimiters, treat retrieved documents as untrusted, validate structured outputs against a schema, log all interactions, and avoid giving the LLM access to side-effects (tools, write APIs) without a confirmation step.
Common mistakes
Sending raw user PII to a third-party LLM without redaction or DPA review.
Trusting model output to be valid JSON.
No audit trail — when something goes wrong, you have no story.
Trade-offs
Stricter guardrails reduce false positives but increase refusal rate and friction.