Bring your own model: why regulated AI needs customer-controlled inference

Regulated organisations learned a hard rule about encryption: if a vendor holds the keys, the vendor holds the data, whatever the marketing says. The same rule is now arriving for AI, and most organisations have not noticed. When an application sends a prompt to a third-party model, it is shipping whatever is in that prompt, often regulated personal data, to a provider in a location and under terms the organisation did not choose. Where inference runs and which model serves it is a governance decision of exactly the same weight as who holds the keys. Treated as a vendor default, it quietly undoes the data-residency, provider-oversight, and auditability commitments the organisation makes everywhere else.

This is the case for customer-controlled inference, the AI sibling of bringing your own encryption keys (covered in customer-controlled encryption keys). The argument has three parts: sending data to a model provider is a transfer event, depending on one provider is a concentration risk, and a system that can silently switch models behind the scenes breaks the audit trail. Each points to the same answer: the customer, not the platform vendor, should decide where AI runs and what serves it.

Sending data to a model is a transfer event

A prompt is rarely empty of personal data. It carries names, case details, identifiers, internal context. Calling an external model API is therefore a processing event under GDPR, and if the provider sits outside the EEA, it is a Chapter V international transfer that needs a lawful transfer mechanism and a transfer impact assessment, which most teams calling an API have never performed. The European Data Protection Board's Opinion 28/2024, adopted in December 2024, went further: an AI model trained on personal data "cannot, in all cases, be considered anonymous," so the model itself is not a safe, data-free black box you can assume away.

The cautionary tale is already canonical. Within about three weeks of permitting ChatGPT internally in 2023, Samsung engineers leaked sensitive material three times, including semiconductor source code, and the company banned generative AI on its devices. The same year, Italy's data-protection authority temporarily blocked ChatGPT and later, in December 2024, issued a fifteen-million-euro fine over its data handling, a decision since contested in court. The durable lesson is not the euro figure but the grounds: inputs to a model are regulated data, and where they go matters.

Providers have responded with enterprise commitments not to train on customer inputs and, on request, zero-retention tiers. These are real and worth having. But they are contractual, often tier-gated or available only on request, and they reduce the exposure rather than remove it: the data still leaves your boundary to be processed by someone else's system in someone else's location. The only control that removes the egress entirely is running inference where you choose.

One provider is a concentration risk

Standardising an entire AI estate on a single external provider recreates a risk regulators already named for cloud. Under the EU's DORA, applicable since January 2025 with no transition period, AI and model providers are ICT third-party service providers: in scope for concentration-risk management, register-of-information obligations, documented and tested exit strategies, and, for the largest, direct designation as critical providers under EU oversight. A dependency you cannot exit is a finding waiting to happen.

The fragility is not theoretical. Model providers retire versions on their own schedule. OpenAI's published deprecation timeline has pulled model versions on a few months' notice, well short of the six-to-twelve-month horizon that regulated change management expects, and the resulting forced migrations can break reproducibility for anyone who pinned a specific version. ISO/IEC 42001 and the NIST AI Risk Management Framework both treat third-party AI as something to monitor continuously for exactly these changes, not to onboard once and forget. OWASP's Top 10 for LLM Applications lists supply-chain risk (LLM03) for the same reason: an opaque dependency on someone else's model is an ungoverned one.

A silent fallback breaks the audit trail

The most overlooked risk is internal. Many AI integrations are built to fail over silently to a different model or provider when the primary is unavailable or rate-limited. For a consumer app that is sensible engineering. For a regulated decision it is a governance failure, because the record no longer reflects what actually served the decision. The EU AI Act's Article 12 requires high-risk systems to "technically allow for the automatic recording of events (logs) over the lifetime of the system." An unlogged model swap defeats that, and it defeats the AI bill of materials: you cannot attest to which model, version, and provider produced an output if the system was free to change them without telling you. The control is to pin the model and provider, log which one served each inference, and forbid any fallback that is not itself approved and recorded.
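The control in the last sentence can be sketched in a few lines of Python. This is a minimal illustration, not any product's implementation: `ApprovedModel`, `PinnedRouter` and the injected `call` transport are hypothetical names, and the transport is assumed to raise `ConnectionError` when a deployment is unavailable. Fallback is permitted only to entries already on the approved list. Every attempt is logged with the provider, region, model and version it targeted, and when the list is exhausted the router raises an error instead of reaching for anything else.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class ApprovedModel:
    """A pinned deployment: exact provider, region, model and version."""
    provider: str
    region: str
    model: str
    version: str


class ModelUnavailable(RuntimeError):
    """Raised when no approved model can serve; nothing is substituted."""


@dataclass
class PinnedRouter:
    # Primary first, then only fallbacks that were themselves approved.
    approved: list[ApprovedModel]
    # Hypothetical transport: sends the prompt to one deployment and
    # raises ConnectionError if that deployment is unavailable.
    call: Callable[[ApprovedModel, str], str]
    log: list[dict] = field(default_factory=list)

    def _record(self, target: ApprovedModel, status: str) -> None:
        # One entry per attempt, naming exactly which deployment was tried.
        self.log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "provider": target.provider,
            "region": target.region,
            "model": target.model,
            "version": target.version,
            "status": status,
            "fallback": target != self.approved[0],
        })

    def infer(self, prompt: str) -> str:
        for target in self.approved:
            try:
                output = self.call(target, prompt)
            except ConnectionError:
                self._record(target, "unavailable")
                continue
            self._record(target, "served")
            return output
        # Every approved option failed: surface the condition, never improvise.
        raise ModelUnavailable("no approved model available")
```

The design choice that matters is the final `raise`. A failure becomes a visible incident instead of an unrecorded substitution, and the `fallback` flag lets a reviewer see which decisions were served by a secondary model.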

The same logic applies in the MENA region, where the binding constraint is often residency. The UAE requires health data to remain within the country and applies localisation rules in banking, and Saudi Arabia's SAMA cloud rules and data-transfer regulation keep core financial data in-Kingdom. Tellingly, major providers have begun selling in-region data residency for enterprise AI, which confirms the underlying point: a known provider in a chosen region is now a purchasable control, not a fantasy.
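Residency can be enforced the same way, as a check that runs before a prompt is sent. The sketch below assumes a hypothetical policy table that maps data classifications to permitted regions. The class names and region labels are placeholders, not a statement of what UAE or Saudi rules require, and any unclassified request is denied by default.

```python
# Hypothetical policy table: the regions in which each data class may be
# processed. Labels are placeholders; the real mapping comes from legal review.
RESIDENCY_POLICY: dict[str, frozenset[str]] = {
    "health": frozenset({"ae-region"}),
    "core_banking": frozenset({"sa-region"}),
    "general": frozenset({"ae-region", "sa-region", "eu-region"}),
}


class ResidencyViolation(PermissionError):
    """Raised before any data leaves the boundary."""


def check_residency(data_class: str, target_region: str) -> None:
    """Allow the call only if the target region is permitted for this data class."""
    allowed = RESIDENCY_POLICY.get(data_class)
    if allowed is None:
        # Unknown classification: deny by default rather than guess.
        raise ResidencyViolation(f"no residency rule for data class {data_class!r}")
    if target_region not in allowed:
        raise ResidencyViolation(
            f"{data_class} data may not be processed in {target_region}"
        )
```

Running the check before the transport call means a misconfigured region fails closed, and the refusal can be logged alongside the inference records.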

The architectural answer

Customer-controlled inference means one of two things, and a serious platform supports both: running an open-weight model inside your own infrastructure, or using your own provider account in your own chosen region with a known, approved model. In both cases the customer decides the provider, the region, and the version, and nothing falls back to an unapproved alternative without leaving a record.

This is how Novantra is built to run AI. Novantra supports bringing your own model, so inference uses the provider and region you approve rather than a vendor default, and Novantra runs with no silent fallback: if an approved model is unavailable, the system does not quietly substitute another; it surfaces the condition. Every inference is logged with the model, version, and provider that served it, feeding the same hash-chained audit log as the rest of your governance record. Because Novantra's Sovereign deployment runs inside your own infrastructure with customer-controlled keys, the prompts, the model where you self-host, and the evidence never leave your boundary. The deployment foundation is covered in sovereign by architecture, and the residency mechanics in data residency as a deployment decision.
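The general idea of a hash-chained audit log can be illustrated generically. This is not Novantra's implementation; it is a minimal sketch assuming each inference record is a JSON-serialisable dict and SHA-256 is the hash. `append_entry`, `verify_chain` and `GENESIS` are hypothetical names. Each entry's hash covers its own content plus the previous entry's hash, so altering an earlier entry breaks verification at that point.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], record: dict) -> dict:
    """Append a record whose hash covers its content and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"record": record, "prev_hash": prev_hash, "hash": _digest(prev_hash, record)}
    chain.append(entry)
    return entry


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; return False at the first one that does not match."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Chaining detects edits, deletions and reordering of earlier entries, but not truncation of the most recent ones. Detecting truncation would require anchoring the latest hash somewhere the log writer cannot rewrite.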

The question regulated buyers are starting to ask their AI vendors is the one they already learned to ask about encryption: not "is it secure," but "who decides where it runs." For AI, the right answer is the customer.
