Design Principles for LLM-based System with Zero Trust

It is a short overview of the guideline by the German Federal Office for Information Security (BSI) and French Agence nationale de la sécurité des systèmes d‘information Secrétariat général de la défense et de la sécurité nationale.

There are 6 main design principles to ensure the security of the LLM system.

Authentication and authorisation.
Input and output restrictions.
Sandboxing.
Monitoring, reporting and controlling.
Threat intelligence.
Awareness.

Let’s dive into each principle in more details.

Authentication and Authorisation

This principle implies that only the entrusted individuals have the access to the LLM system and necessary permissions to perform specific tasks.

To put it into the practice, the following strategies should be performed:

Multi-factor authentication (MFA). Do not use just password, instead use at least two different checks to prove identity.
No LLM-based authentication. LLMs are not designed to verify identity, so, it is no wise to overuse them in this task.
Restricting plug-in access. Plug-in tools should be restricted to access the conversation history, especially some sensitive data.
Least privilege principle. The access should be given based on the role, nothing more.
Dynamic access control. The access should be reviewed based on the location, time, behavioural patterns, action context, device type to track and prevent unusual access.
Attribute based access control. Access is given based on attributes (labels) like: user role, data sensitivity, time, or place
Monitoring. The system should be monitored consistently for any unusual patterns.
Documentation. All the interactions with system should be documented.
Autonomy restrictions. Do not overuse autonomous agents for simple tasks that can be easily executed with the predefined workflows and direct cod.
Multi-tenant architecture. Segregate data and agent by sensitivity level.

Input and Output Restrictions

In order to prevent prompt injections attacks, the following measures should be in place:

Gateway. A gateway functions as a control mechanism between the core LLM and its components. It applies principles similar to Zero Trust security, where no input is accepted without verification. Inputs are validated through algorithmic checks and machine learning methods to identify anomalies, such as unusual syntax, abnormal length, or the presence of prohibited terms. The purpose is to prevent malicious or manipulative prompts from being processed.
Tags. Tags are metadata labels applied to incoming data that indicate the origin of the input. By distinguishing between trusted and untrusted sources, the system can enforce restrictions accordingly. Tagged inputs allow the system to ignore or limit instructions from external or unverified systems, reducing risks such as prompt injection or evasion attacks. Furthermore, tags support fine-grained permission management, ensuring that access and functionality align with the trust level of the source.
Trust Algorithm. A trust algorithm is a structured method for evaluating the reliability of an input. It assigns a trust score based on weighted factors, including user identity, device, time, historical behaviour, and prior interactions. Multiple thresholds may be defined to determine appropriate actions, ranging from acceptance, to additional verification, to rejection. By using this scoring system, the LLM can adapt its response based on contextual risk rather than applying uniform rules.
Output control. Output control ensures that the responses generated by an LLM remain safe, accurate, and compliant with defined rules. Dedicated frameworks, such as guardrails, are used to validate outputs before they are delivered or executed. In cases where the model controls or influences system resources, outputs should be formalised into verifiable commands that can be checked against predefined rules. For critical actions, human approval (human-in-the-loop) is required. A separate LLM may also be utilised to interpret and explain generated commands in order to detect potential misuse before execution.
External tools and content. The use of external tools and retrieval of external content must be subject to strict restrictions. Automatic loading or execution of unverified content, such as embedded images or links, presents risks of data exfiltration and prompt injection. Accordingly, content from untrusted sources should not be retrieved or rendered without validation. Before transmitting or receiving data externally, the system must notify the user of the source or destination and obtain confirmation.

Sandboxing

Sandboxing refers to the isolation of the LLM system from external components, applications, or other LLMs, preventing unintended interactions. This measure protects against cascading security incidents, such as remote code execution leading to privilege escalation and full system compromise. In practice, sandboxing ensures that tasks executed by the LLM remain within a controlled environment, with no unauthorised communication beyond defined boundaries.

Memory management. LLM memory must be strictly isolated between users and sessions to prevent cross-contamination of data. Memory sanitisation, secure storage, and carefully controlled access to persistent content are required. As a compliance measure, no data should be stored unless there is explicit authorisation.
Emergency shutdown. In situations where critical security risks are detected, it must be possible to shut down the LLM system or its individual components. Backup and recovery mechanisms should be in place to preserve data integrity and allow for rapid restoration.
System isolation. Interactions with external components must be predefined and limited. LLM systems handling sensitive data should operate without internet connectivity. Users must not be permitted to open links generated by the LLM, as they may be used for exfiltration or malicious payload delivery. The use of untrusted plugins should be prohibited to avoid hidden prompt injections. Where internet access is essential, links must be restricted, validated, and preferably limited to whitelisted domains.
Session management. Each task should be executed in a new inference session, ensuring that only relevant information is carried over between sessions. Context segmentation must be applied to guarantee clear boundaries between users and agents, preventing data leakage.
Context window. Sensitive information must not be included in the context window, particularly in cases where internet access or the display of external content is enabled. Context data should be cleared at the start of each new session to minimise exposure.
Environment segregation. Development, testing, and production environments should remain separated. Testing environments are intended to identify and address weaknesses before deployment.
Network segmentation. The underlying network must be divided into isolated segments, with strict rules governing communication between them.

Monitoring, Reporting and Controlling

It is important to ensure continuous observation of threats and anomalies.

Threat detection mechanisms. Set up systems that can spot unusual activity early. This includes strange request patterns, sudden spikes in use, or behaviour that does not match normal operations. Monitoring resource use (CPU, GPU, API calls) helps detect abuse, such as brute-force attempts or excessive consumption of system capacity.
Automated responses. Prepare predefined automatic reactions to common threats. This allows the system to respond quickly without waiting for human action. Using real-time threat intelligence strengthens awareness of ongoing risks and speeds up incident resolution.
Token limits. Put limits on how many tokens each user or device can use. This prevents misuse, ensures fair access for everyone, and helps keep the system stable.
Logging and analytics. Keep detailed records of all interactions. These logs are essential for audits, investigating security incidents, and improving detection of new threats. Logs must be analysed regularly to identify suspicious behaviour.
Regular testing. Run automated security tests often to check for weaknesses. Testing confirms that security rules are working and that updates do not introduce new risks.
Real-time monitoring. Continuously check incoming prompts in real time. Block suspicious or harmful inputs immediately. This helps prevent misuse and also ensures the system runs well.

Threat Intelligence

This principle is a continuation of the previous one.

Intelligence. Use knowledge from past incidents to recognise familiar attack patterns and detect suspicious inputs before they cause harm.
Access controls. Connect the system to threat intelligence feeds that list known malicious IP addresses or agents. These can then be automatically blocked or flagged for further review.
Regular audits. Carry out security tests such as red-teaming, where simulated attacks are used to identify weaknesses and close security gaps before real attackers can exploit them.
Dynamic analysis. Work with security communities and use trusted data sources (from governments, enterprises, and security organisations) that share information on current and emerging threats. These sources include “indicators of compromise” (IOCs), which help detect malicious activity.
Restructuring. If any system component is compromised, it must be removed. The LLM system should then be reorganised to restore security and stability.

Awareness

Unlike others, this principle is about mentality of how to ensure security at human level.

Practical training and testing. Carry out simulated cyberattacks and adversarial exercises to test both the system and its users. These exercises improve awareness of risks, train stakeholders, and highlight weak points in existing defences.
Case studies and examples. Use real-world incidents, such as data theft through malicious links, in training sessions and security workshops. These practical examples help understand risks clearly.
Security communication. Provide clear and consistent messages such as “Do not trust AI systems unconditionally”. Such communication fosters a critical mindset and reduces blind trust in automated outputs.
Promote risk awareness. Distribute regular security updates, briefings, and newsletters to ensure stakeholders remain informed about new threats and evolving risks.
Explainability and transparency. Ensure that the way an LLM system makes decisions is as transparent and understandable as possible for both users and stakeholders.

Please find the complete guideline here.

British cat