Guardrails in Umbraco AI

When you add AI to a CMS, you’re giving it the power to generate content, respond to visitors, and interact with editors. That’s powerful — but it means you need to be able to set boundaries. What if a model hallucinates sensitive information, or a user sends something inappropriate?

That’s what guardrails are for. They let you define safety policies — rules that evaluate content before it reaches the AI model and after it comes back, with configurable actions when something gets flagged.

Guardrails List

What Is a Guardrail?

A guardrail is a named, reusable safety policy containing one or more rules. Each rule has three parts:

  • An evaluator that checks content against specific criteria
  • A phase — either pre-generate (checks user input before it reaches the model) or post-generate (checks the model’s response after it comes back)
  • An action that determines what happens when content is flagged:
    • Block: the request or response is stopped entirely; an AIGuardrailBlockedException is thrown.
    • Warn: the content passes through, but a warning is logged.
    • Redact: the flagged content is replaced with [REDACTED] and the request continues.

You can mix phases and actions within a single guardrail. When multiple rules flag content in the same phase, the most restrictive action wins: Block > Redact > Warn.
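The precedence rule can be sketched as a simple comparison. The enum and helper below are illustrative only, not actual Umbraco.AI types; the assumption is just that actions are ordered by restrictiveness.

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative only: an enum ordered so the most restrictive
// action has the highest value, mirroring Block > Redact > Warn.
enum GuardrailAction { Warn = 0, Redact = 1, Block = 2 }

static class ActionResolver
{
    // When several rules flag content in the same phase,
    // the highest-ranked (most restrictive) action wins.
    public static GuardrailAction Resolve(IReadOnlyCollection<GuardrailAction> flagged) =>
        flagged.Max();
}
```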

Guardrail Rule

Built-in Evaluators

Umbraco.AI ships with three evaluators out of the box:

Contains — checks whether content contains a specific substring. Supports case sensitivity and redaction.

Regex — pattern-based matching using regular expressions. Supports case-insensitive and multiline modes, plus redaction of matched patterns.

LLM Judge — sends content to another AI model and asks it to evaluate against criteria you define in natural language, like “Does this response contain medical advice?” You set a safety threshold between 0 and 1, and it uses a separate chat profile for evaluation. The system automatically prevents guardrails from running on guardrail evaluation requests, so no infinite loops.
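As a rough sketch of what the Regex evaluator's redaction amounts to, the helper below replaces every match with [REDACTED]. It is illustrative only, not the shipped evaluator (which also reports match positions so the pipeline can apply redactions itself).

```csharp
using System.Text.RegularExpressions;

static class RegexRedactionSketch
{
    // Replace every match of the pattern with [REDACTED], using the
    // case-insensitive and multiline modes the evaluator supports.
    public static string Redact(string content, string pattern) =>
        Regex.Replace(content, pattern, "[REDACTED]",
            RegexOptions.IgnoreCase | RegexOptions.Multiline);
}
```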

Where Guardrails Are Configured

Guardrails are first-class entities managed in the backoffice under the AI section. You create them with a name and alias, add rules, and configure each evaluator’s settings.

Once created, guardrails are assigned to profiles. Since profiles control which provider, model, and settings are used for AI features, attaching guardrails here means they automatically apply everywhere that profile is used. The Prompt and Agent add-ons also support guardrail assignment, and the system aggregates guardrails from all sources for a given request.
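Conceptually, that aggregation is a union across sources. The sketch below is hypothetical (the names are not Umbraco.AI APIs), and the de-duplication by alias is an assumption; the point is only that one request can pick up guardrails from its profile, prompt, and agent at once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of collecting guardrails from every source that
// applies to a request. Names are illustrative, not Umbraco.AI APIs.
static class GuardrailAggregation
{
    public static IReadOnlyList<string> Aggregate(params IEnumerable<string>[] sources) =>
        // Union of profile, prompt, and agent guardrail aliases;
        // de-duplicated (an assumption) so a shared guardrail runs once.
        sources.SelectMany(s => s)
               .Distinct(StringComparer.OrdinalIgnoreCase)
               .ToList();
}
```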

Assign a Guardrail

How It Works at Runtime

When a chat request flows through Umbraco.AI, the guardrail middleware resolves which guardrails apply (from profiles, prompts, agents), then evaluates pre-generate rules against the user’s input — blocking or redacting before it reaches the model. After the AI responds, post-generate rules evaluate the response with the same logic.

For streaming responses, code-based evaluators (contains, regex) work on chunks as they arrive. Model-based evaluators (LLM judge) wait for the full response. The main limitation is post-generate redaction — once a chunk has been streamed to the client, you can’t un-send it, so redaction degrades to a warning. If you need strict content filtering on streamed responses, use Block rather than Redact for post-generate rules.
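A minimal sketch of that limitation, with names that are illustrative rather than the actual middleware: a code-based check runs per chunk, and a Block rule can stop the stream before the flagged chunk goes out, but a Redact rule cannot revisit chunks already on the wire, so it degrades to a warning.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only, not the Umbraco.AI middleware.
static class StreamingSketch
{
    public static (IReadOnlyList<string> Emitted, IReadOnlyList<string> Warnings) Run(
        IEnumerable<string> chunks, string blockedWord, bool blockOnFlag)
    {
        var emitted = new List<string>();
        var warnings = new List<string>();

        foreach (var chunk in chunks)
        {
            if (chunk.Contains(blockedWord, StringComparison.OrdinalIgnoreCase))
            {
                if (blockOnFlag)
                    break; // Block: stop before the flagged chunk is sent

                // Redact degrades to Warn: earlier chunks are already sent
                warnings.Add($"Flagged content passed through: '{blockedWord}'");
            }
            emitted.Add(chunk);
        }

        return (emitted, warnings);
    }
}
```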

Extending with Custom Evaluators

The built-in evaluators cover common cases, but the system is fully extensible. Here’s a custom evaluator:

[AIGuardrailEvaluator("profanity-filter", "Profanity Filter", Type = AIGuardrailEvaluatorType.CodeBased)]
public class ProfanityGuardrailEvaluator
    : AIGuardrailEvaluatorBase<ProfanityConfig>
{
    public override string Description =>
        "Checks content against a profanity word list";

    public ProfanityGuardrailEvaluator(IAIEditableModelSchemaBuilder schemaBuilder)
        : base(schemaBuilder)
    { }

    public override Task<AIGuardrailResult> EvaluateAsync(
        string content,
        IReadOnlyList<ChatMessage> conversationHistory,
        AIGuardrailConfig config,
        CancellationToken cancellationToken)
    {
        var evalConfig = config.Deserialize<ProfanityConfig>() ?? new ProfanityConfig();

        var found = evalConfig.BlockedWords.FirstOrDefault(w =>
            content.Contains(w, StringComparison.OrdinalIgnoreCase));

        return Task.FromResult(new AIGuardrailResult
        {
            EvaluatorId = Id,
            Flagged = found is not null,
            Score = found is not null ? 1.0 : 0.0,
            Reason = found is not null
                ? $"Content contains blocked word: {found}"
                : null
        });
    }
}

The [AIGuardrailEvaluator] attribute registers it with a unique ID and sets the evaluator type — the system auto-discovers it. AIGuardrailEvaluatorBase<TConfig> handles the schema plumbing, and the Type controls streaming behaviour (CodeBased runs on chunks, ModelBased waits for complete content). Call config.Deserialize<TConfig>() inside EvaluateAsync to get your typed config.

If your evaluator supports redaction, implement IAIRedactableGuardrailEvaluator:

public class ProfanityGuardrailEvaluator
    : AIGuardrailEvaluatorBase<ProfanityConfig>,
      IAIRedactableGuardrailEvaluator
{
    // ... constructor and EvaluateAsync as above ...

    public Task<IReadOnlyList<AIGuardrailRedactionCandidate>>
        FindRedactionCandidatesAsync(
            string content,
            AIGuardrailConfig config,
            CancellationToken cancellationToken)
    {
        var evalConfig = config.Deserialize<ProfanityConfig>() ?? new ProfanityConfig();
        var candidates = new List<AIGuardrailRedactionCandidate>();

        foreach (var word in evalConfig.BlockedWords)
        {
            // Find every occurrence of the word, not just the first,
            // so all matches get redacted.
            var index = content.IndexOf(word, StringComparison.OrdinalIgnoreCase);
            while (index >= 0)
            {
                candidates.Add(new AIGuardrailRedactionCandidate(
                    index, word.Length, content.Substring(index, word.Length)));
                index = content.IndexOf(
                    word, index + word.Length, StringComparison.OrdinalIgnoreCase);
            }
        }

        return Task.FromResult<IReadOnlyList<AIGuardrailRedactionCandidate>>(
            candidates);
    }
}

Your config class uses [AIField] attributes to generate the configuration UI in the backoffice automatically:

public class ProfanityConfig
{
    [AIField(
        Label = "Blocked Words",
        Description = "List of words to block")]
    public string[] BlockedWords { get; set; } = [];
}

Register your evaluator and it shows up as an option when adding rules to a guardrail — no frontend code required.

Wrapping Up

Guardrails give you a structured way to enforce content policies across your AI features. The combination of pre- and post-generation phases, multiple action types, and a pluggable evaluator system means you can start simple and build up as your needs evolve.

They’re available now in the Umbraco.AI backoffice — create a guardrail, add some rules, assign it to a profile, and your AI features are protected.

Until next time 👋