Friday June 26, 2026 3:30pm - 4:15pm CEST
Organizations deploying GenAI systems quickly discover that safety controls do not automatically enforce organizational policies. Real environments span many domains and operate under large, evolving sets of policies, both organization-specific and external, driven by legal requirements, industry regulations, and internal governance rules that change periodically. Enforcing these rules in production is not a one-time setup problem; it is a continuous governance and operations challenge.

Existing guardrail solutions are not designed to handle custom, large-scale, and continuously evolving organizational policies. When AI agent developers or AI security teams stretch these safety-oriented systems into general policy enforcement, the underlying design assumptions no longer hold: these systems presume a small, static policy space rather than a broad, heterogeneous one. Static rules such as regexes become unmaintainable and produce unreliable detections at scale; fine-tuned classifiers require constant retraining; and LLM-as-a-judge pipelines, even when carefully calibrated, are expensive to run, introduce non-trivial latency, and are difficult to audit.
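To make the first failure mode concrete, below is a minimal sketch of a static, regex-based guardrail. This is a generic illustration, not any specific product's implementation, and the policy names and patterns are invented. Every new or amended policy means another hand-written pattern, which is why this style of control stops scaling as the policy set grows:

```python
import re

# Illustrative only: a static guardrail where each policy is a hand-written
# regex. The policy names and patterns below are hypothetical examples.
POLICY_PATTERNS = {
    "no_pii_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                 # US SSN-like strings
    "no_internal_codenames": re.compile(r"\bproject\s+atlas\b", re.I),  # invented codename
}

def check_output(text: str) -> list[str]:
    """Return the names of policies the text appears to violate."""
    return [name for name, pattern in POLICY_PATTERNS.items() if pattern.search(text)]

print(check_output("Contact me at 123-45-6789 about Project Atlas."))
# ['no_pii_ssn', 'no_internal_codenames']
```

With dozens of evolving policies, every change requires editing patterns by hand, and paraphrases ("my social is one two three...") slip through entirely.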

This talk describes how we stress-tested existing compliance approaches, including static guardrails, fine-tuned detectors, and LLM-as-a-judge pipelines, and analyzed how they degrade under realistic policy complexity.
We present a reframing of the problem: instead of relying solely on output-level judgments, policy violations can also be detected directly in the model's internal representation space with a training-free approach. We explain what this shift enables in practice, including continuous compliance monitoring, policy updates without retraining loops, and improved auditability. We also discuss the limitations of this approach.
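As a rough illustration of what detection in a model's internal space can look like, the sketch below scores an output against policy descriptions by comparing pooled hidden states, with no fine-tuning involved. The model choice (gpt2), mean-pooling, cosine scoring, and the example policies are all assumptions made for illustration; this shows the general idea, not the speakers' actual method:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical sketch: score text against policies via hidden-state similarity.
# Model, pooling, and scoring choices here are illustrative assumptions.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden layer as a cheap summary of the text."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[-1]  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

# Invented example policies, each represented by an embedded description.
policies = {
    "no_financial_advice": "Do not provide personalized investment advice.",
    "no_legal_opinions": "Do not offer legal opinions or interpret contracts.",
}
policy_vecs = {name: embed(text) for name, text in policies.items()}

def score(output_text: str) -> dict[str, float]:
    """Cosine similarity between the output and each policy description."""
    v = embed(output_text)
    return {name: torch.cosine_similarity(v, p, dim=0).item()
            for name, p in policy_vecs.items()}

print(score("You should put your savings into this stock."))
```

Note that in a scheme like this, adding or editing a policy only means recomputing one embedding rather than retraining a model, which is the operational property (policy updates without retraining loops) that the reframing targets.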

We also address a deeper conceptual issue that emerged from our error analysis: in practice, the boundary between “policies” and “instructions” is often unclear, and treating instructions as if they were policies leads to confusing and brittle failure modes. Today, alignment boundaries and performance or business objectives are commonly expressed through the same mechanism, rules or instructions, which blurs fundamentally different concerns under a single notion of “policy.” The separation is critical: some instructions define organizational and alignment constraints, while others encode task goals and performance requirements. Conflating the two results in misaligned controls, because they require different enforcement strategies and, in many cases, different ownership and roles within the organization.

The goal of this talk is to provide AppSec and GRC teams with a clearer mental model for operating LLM policy compliance in production, a checklist of questions to ask about existing guardrail solutions, and a better understanding of what it actually takes to keep LLM systems compliant over time.
Speakers

Oren Rachmil

Senior AI Researcher, Fujitsu Research of Europe

Oren Rachmil is a Senior AI Researcher at Fujitsu Research of Europe, working on the safety, evaluation, and security of large language model systems. His recent research focuses on analyzing gaps in open-source LLM vulnerability scanners, understanding evaluator reliability, and...
Hall K1 (Level -2)
