AI Security · 18. September 2024

Introducing the PALIM framework to design secure GenAI applications

Building applications on top of LLMs brings enormous potential but also a number of security risks. In this post, I am happy to introduce the PALIM framework that I have developed to build secure GenAI applications on top of existing LLMs. It consists of five components to understand and look for in your planned applications design. For each component assume the worst case and then evaluate the impact.

🔣 (P) Predefined Prompts: If you have prompts for the LLM predefined in your application code that is then augmented with user input, this is susceptible to prompt attacks (e.g., “ignore all previous instructions, do instead this:…”). Your predefined prompts could be turned upside down. Look for occurrences in your code and evaluate each for its negated impact.

🕵️‍♂️ (A) Automation and Agent Plugins: Look out for any GenAI output that is directly taken into action by automation. We have to assume the LLM can produce totally wrong output, so what would be the effect of the automation then? Automation without user approval should be avoided.

🧮 (L) Logic Rules for Business or Security: Implement and enforce any important rules outside the LLM, don’t use the LLM itself. Otherwise an attacker could learn (and circumvent) the rules by the model.

🪣 (I) Isolation of Data: The LLM should only see the data it is supposed to be, and act on behalf of the user it is interacting with. If we have one, big data bucket that contains all information, the LLM could leak information to unauthorized users. But if we keep isolation of data and its access enforcement outside the LLM (data bucket A for user A, bucket B for user B), then any data access of the LLM is limited to information the authorized user has access to anyway.

📊 (M) Model of GenAI Threats: Last, but not least, to understand the full context of your application you need to look at a general threat model for GenAI LLMs. Prompt injection and data leakage are most cited security threats. But actually, the inherent ability to produce wrong information (due to hallucination or due to misleading input or other effects) is the biggest threat!