After reading so much blog posts and new services from Palo Alto, Wiz and other security vendors selling products for either AI Secure posture management or other MODERN ways too protect from prompt injections or other types of attacks that can be used against an Generative AI service. Eventually I got tired of all the different “sales pitches” and I decided to write a blog post of my own which should give you enough context and information on how to build security into your own services or what you should use.
What kind of attack vectors do we have on Generative AI services?
Well it depends on what kind of service?. If we build everything ourselves and just use an LLM from a cloud provider, the attack vectors looks like the one below
If we use PaaS services or Agent frameworks from cloud providers, well then the picture changes.
Many of these attack types are described in a OWASP top 10 attacks for LLMs OWASP Top 10 for Large Language Model Applications | OWASP Foundation, but these do not impact all the delivery methods but mostly when you are building stuff on your own.
NVIDIA even launched a tool called Garak which can be viewed as NMAP scan for LLMs NVIDIA/garak: the LLM vulnerability scanner That can test a LLM against a known set of attacks such as encoding-based injection attacks. NOTE: There are also tools like Giskard that can also be used to do security benchmarks of LLMs 📚 LLM Quickstart – Giskard Documentation
Now back to my ranting against one of the sales pitches from Palo Alto, where a Generative AI application using prompt injection is able to show information related to another user (and of course how they can protect against these types of attacks) but to secure against these types of attacks you need to understand how the work.
Generative AI services comes mostly in four different options.
- Fully SaaS based services (Like Microsoft Copilot, Perplexity, ChatGPT)
- PaaS services (Copilot Studio, Amazon Q, Google Vertex AI Builder)
- IaaS (Where you are using public cloud inference API for LLM processing or using IaaS w/GPU on Cloud) but application is hosted on a separate component)
- Private Hosted (Where you are using a private hosted LLM inference service such as Ollama, vLLM or others together with other components) t
For Private Hosted workloads the platform can look like this depending on which platform you use. For instance by using vLLM you can host multiple models running on Kubernetes. This approach uses the NVIDIA Kubernetes Device Plugin. In this stack we have full responsibility for all the layers, one of the upsides is that we can be certain that the Generative AI model is not collection anything related to user input to a cloud service.
Once a model is made available trough the inferencing API, we can then build “logic” on top of the model using orchestration or integration engines such as Langchain, Semantic Kernel or others.
In regards to cloud based LLM offerings, where most cloud providers today provide an inferencing API of them model it looks more like this. Where the cloud provider is just giving us a LLM available as an API. We can then build our logic such as a chatbot using either a orchestration such as Langchain or just pure REST APIs. In addition to these standardized LLMs from each vendor provided as a PaaS service. We can also host custom LLMs using services like Azure AI Foundry , AWS Sagemaker/Bedrock and Model Garden from Google Cloud/Vertex
Then we have PaaS services like Amazon Q or Copilot Studio which allows us to create custom virtual assistants. These services use a predefined LLM model, and does not directly provide a way to adjust the way that it interacts with the underlying model since that is managed by the platform.
Then the final example is with tools like ChatGPT, Copilot and Perplexity. Where most is managed by the provider and we only get access to the application. Which is the same as with pure SaaS services.
Securing Generative AI SaaS services
Let us start with the last example with securing Generative AI services like ChatGPT, Copilot. What kind of security risks do they pose?
Data exposure – What if we send sensitive information to the service?
Well there is a couple of things to consider, firstly services like ChatGPT and Copilot are global services so you do not know where that data is processed. Also by default specifically for ChatGPT, your data can be used to train the model but this can be turned off. It is important to note that Generative AI models are STATIC they do not directly learn from our prompts. However data can be collected to train the model from the different vendors.
The second part is the use of GPT plugins (which is now called GPTs) where you have 3.party integrations that can be used to extend the functionality of ChatGPT such as support for other file formats, like this one.
The problem with these types of GPTs is that you do not know what kind of services are being used in the backend to do the translation. So something is doing OCR on this PDF file in the backend, but I never see this I only see the prompt reply. In one case earlier with the previous version of GPTs we noticed that all uploaded PDF files were sent to a cloud service in China for processing.
Microsoft avoided this by adding a feature they call Enterprise data protection, which is a fancy term that just turns of the support for 3.party data sources.
As mentioned these services to not learn from our prompts. The models are static, but some vendors/services will collect the data for training later, so by disabling this feature where supported we avoid that services use our data for processing.
Now secondly can someone “hack” these services to get access to data stored here? Let us say that by accessing my Copilot history they can see all the data that is processed?
1: The probability of that happening is the same someone else hacking another service such as your email
2: These services have the same “limitation” with access token stealing and you should have MFA enabled
3: There could be issues with a backend service that could in theory expose information, as happened here –> OpenAI Reveals Redis Bug Behind ChatGPT User Data Exposure Incident but the risk is not higher compared to other services.
4: When you enter a prompt the content of the prompt is processed then gone from memory of the LLM engine.
So to summarize
Control the data without the use of custom plugins, control your user account with MFA/Passkeys and turn off data sharing you will be fine. The only thing missing is from a compliance perspective since many of the online SaaS services do not directly provide audit logs, so in some cases you have no insight into what kind of prompts and services that have been used. It should be noted that from a Mobil phone you can access hundreds of different GPT clones, where someone has built an application that mimics ChatGPT but with these types of unofficial apps you have no control of how data is used. \
Securing Generative AI IaaS services
In many of the base models
What if where we create our own custom services? Using integration frameworks such as Semantic Kernel / Langchain or other tools to build our own custom services? The upside with this approach is that it allows us to have full control of the entire stack, but also decide which model to use.
With these services, we can have our own data added using a RAG approach (a search engine and vector store) but that part I will get back to, the first part is integrations.
First of is that we need to understand function calling. Which is the process where you get the LLM to interact with a 3.party system (useful if you want to do live API calls to a service to get up to date information) The LLM is not the one orchestrating the API call, but is responsible for generating the correct output based upon a JSON structure, which will in turn send it to a python function or similar which does the API call.
An example of a GPT function could look like this using OpenAPI specifications.
openapi: 3.1.0
info:
title: Stripe API
description: API to interact with Stripe customers.
version: 1.0.0
servers:
- url: https://api.stripe.com/v1
description: Stripe API server
paths:
/customers:
get:
operationId: getCustomers
summary: Retrieve a list of customers
description: Returns a list of customers in Stripe.
parameters:
- name: limit
in: query
required: false
description: Limits the number of customers returned.
schema:
type: integer
This means that using this description I am providing the LLM context to trigger this specific function, given the correct prompt entered by the user.
Of course all these functions are stored in the System Prompt. So if I have a lot of different System Prompt and someone is able to get access to the System Prompt of my service it can show much of the logic.
While the issue here is the function calling and what kind of access that has. It is important to note that you should not give out sensitive information using a function call. Or worse if your function call is talking with a database, it should have only the minimum level of access rights / permissions on the data source. If the API that the GPT is triggering via the function has to wide permissions they could in theory dump the database or collect all the information stored in the database using the function call.
As an example I have here, where the GenAI engine is trying out different SQL commands against the database (based upon a chain in Langchain)
Also it is difficult to handle a user based function call, therefore functions are generic and works for all users of the service. Therefore you should ensure that the backend API/Service has restricted access.
Now what if your GenAI service or application has a bunch of different function calls? OpenAI alone supports up to 128 different functions, so that means a lot of integrations. Might also be that some of those might surface some PII or other information. Or what if my application is customer facing and people are “abusing” the service to generate malicious content?
One of my favorite tools is LLM-Guard, which sits between the application and the language model and can protect against many of these types of attacks (also falling under the category prompt injection), such as enforcing token limits protecting against DoS attacks.
from llm_guard.input_scanners import TokenLimit
scanner = TokenLimit(limit=4096, encoding_name="cl100k_base")
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)
LLM-Guard can also protect from prompt injections
from llm_guard.input_scanners import PromptInjection
from llm_guard.input_scanners.prompt_injection import MatchType
scanner = PromptInjection(threshold=0.5, match_type=MatchType.FULL)
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)
We also have cloud services like AI Gateway from Cloudflare that can also handle rate limiting but will add more latency to the LLM call AI Gateway
But also being able to anonymize data using a integration with Presidio (which is a Microsoft component Microsoft Presidio) which can both detect but also anonymize content before the information is displayed back to the user.
We also have Language models that are specifically created to evaluating the safety of content. ShieldGemma from Google for instance is a set of instruction tuned models for evaluating the safety of text prompt input and text output responses against a set of defined safety policies. This model but also other models like meta-llama/Llama-Guard-3-8B · Hugging Face can be for instance be used together with NVIDIA NeMo Guardrails which act much like LLM-Guard, and uses LLM to evaluate the content before passing it to the actual language model. However NeMo is a bit more complex to use compared to LLM-Guard.
Microsoft also has its own service called AI Content Safety, which unfortunately does not support all LLMs.
However these are just some tools that can be used together with custom built Generative AI services, to protect against PII data becoming exposed. Another issue that we can face with building services from “scratch” is the use of many different components, where many might be open-source and we might have vulnerabilities in the software stack that we use. As seen with some versions of Langchain (langchain 0.2.9 vulnerabilities | Snyk) but that might be just one component of the GenAI ecosystem. These services evolve quite fast, so it is important to have track of the versions.
What about GenAI Agent frameworks?
What if you were to build agents using tools like Autogen, Copilot Studio, Amazon Bedrock Agents or Elevenlabs? Well it is no different then other virtual assistants. We need to protect what kind of functions the agent can trigger. Who can access it, and ensuring that the memory context is only available to the user that is initiating it. Also it is important to define what kind of access the agent has to data stored. Cloud services like Amazon has filters that can be used to determine who should have access to the data Access control for vector stores using metadata filtering with Amazon Bedrock Knowledge Bases | AWS Machine Learning Blog
One should also have proper monitoring into how the agent is behaving. Depending on what kind of agent framework or orchestration framework you are using there are different ones available. If you are using Langchain you can use Langfuse or Langsmith that provides this breakdown of each activity done by the agent.
Also making sure that access to the agent is limited, that the LLM inferencing API is not directly accessible.
In this article, we dug into many different aspects on how to secure Generative AI features. At the end of the day, there is little difference from securing a GenAI service compared to other applications that you might have in your environment. Also depending on what kind of service or delivery you are using the amount of security you need to manage on your own will be different.