How does Copilot for Security work? And is it worth it?

In preparation for a talk I gave today at Microsoft Secure in Norway, about how to get the most out of Copilot for Security, I needed to do a lot of research. Therefore, I wanted to share some of that research here, based upon my findings on how Copilot for Security works.

NOTE: If you want to see the presentation, you can view it here –> msandbu/securitycopilot (github.com)

Firstly, Copilot for Security consists of three main parts.

1: The web service (for the standalone experience) or the embedded experience that is directly available as part of Sentinel, Defender, and other M365 services.

2: The orchestrator, which is responsible for RAG (retrieval-augmented generation), running functions/skills, and task orchestration. The orchestrator also has an API frontend (api.securitycopilot.microsoft.com), which is not available in the Graph API yet.

3: The LLM (or the Security Compute Unit, SCU), which is a separate service in Microsoft Azure. The compute unit is mainly an isolated instance of a fine-tuned LLM inference API within a specific region. Availability is still a bit limited (currently only four geos, including Europe and the UK). Remember that the service is not part of Microsoft's EU Data Boundary commitment.

The compute unit is a strangely priced service: it costs $4 per hour per unit. Microsoft recommends three units to use it properly, but you may need even more depending on usage, since the wrong prompts can easily exhaust an entire compute unit.

How can I calculate how much I need? After much testing, I noticed that a compute unit does not affect speed, but rather how much information it can process (i.e., the number of tokens). So for each compute unit you get X tokens of processing per hour. I do not know exactly how high X is, but it is really small.

One issue is that Copilot for Security has a quite limited number of tokens it can process, since it has a pretty long system message. A system message is a set of instructions given to the LLM; it also includes the GPT Functions that allow the LLM to interact with third-party APIs. Pro tip! Since this service has a long list of different functions, you are better off turning off the functions you are not using, to reduce the token usage in the system message. Also, do not reuse a session if you do not plan to use context from the previous prompt.

Also, when you run prompts in Copilot for Security, the LLM (the SCU) always gets everything within the same session as context, which further increases token usage.
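To make the point concrete, here is a minimal sketch (not the real SCU accounting, and with made-up numbers) of why a reused session burns tokens: every new prompt re-sends the system message plus the full history.

```python
# Rough sketch: each prompt in a reused session re-sends the system message
# plus ALL prior turns, so token usage grows with every turn.

def approx_tokens(text: str) -> int:
    # Crude rule of thumb: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def session_cost(system_message: str, turns: list[str]) -> int:
    """Total tokens sent to the LLM across a whole reused session."""
    history: list[str] = []
    total = 0
    for prompt in turns:
        history.append(prompt)
        # Each call includes the system message plus all turns so far.
        total += approx_tokens(system_message) + sum(approx_tokens(t) for t in history)
    return total

long_system = "skill definitions " * 500   # stands in for a big system message
turns = ["Summarize incident 42", "Now list the affected hosts", "Draft a report"]

reused = session_cost(long_system, turns)
fresh = sum(approx_tokens(long_system) + approx_tokens(t) for t in turns)
print(reused, fresh)   # the reused session is noticeably more expensive
```

The gap widens with every additional turn, which is why starting a fresh session when you do not need the prior context saves SCU capacity.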

This service also uses a fine-tuned LLM, meaning Microsoft has taken a baseline GPT model and trained it on their own data set, ending up with a fine-tuned model that has more knowledge within the security field.

So how can Microsoft improve the model? Through more data and by understanding how organizations use the service. The service has some options here that we should understand.
By default, data sharing is turned on, meaning that data from your organization and your prompts will be sent to Microsoft and used to improve the model. You can turn this off within the web portal.

Now, the advantage of this service compared to regular Copilot or ChatGPT is the integrations that are available. For instance, how can we allow the service to talk to an external system through APIs, or fetch data from Sentinel or Defender? By using GPT Functions.

Functions are, in simple terms, instructions that let the model recognize from keywords in the prompt that "this is going to an external API", and therefore generate the required JSON output, which another system function then uses to call the API. In this service these are called "Skills".

As an example here with GreyNoise, the orchestrator has to guess, based upon the input prompt, which function it needs to trigger (so it does an evaluation against all the skills that are available and configured).

To allow Copilot to trigger these functions, we need to give the LLM instructions via the prompt, such as "Find all active incidents in Microsoft Sentinel", where Sentinel is the keyword the LLM needs to see to understand that it should call the function. This process is handled by the orchestrator, and once the API call is done and the orchestrator gets output from it, the output is sent to the LLM for processing.

This service has a bunch of different skills and plugins.

  • Microsoft plugins (use your credentials to access data)
  • Azure AI Search (use against your own data sources)
    • Security guidelines
    • Disaster recovery procedures
    • Information security guidelines
    • Needs to be called explicitly using the term “Azure AI Search” in the prompt. This service also needs to be set up separately.
  • Third-party services such as ServiceNow
  • Copilot for Security plugins
    • These can be API calls via functions, GPT plugins, or KQL queries triggered via a function.
  • Custom uploaded data (max 20 MB)

One example is Azure AI Search, where we can make “any” type of data available, whether it is internal knowledge articles, security guidelines, or something else, which can then be used as context by Copilot for Security.

In order to trigger a function call to AI Search, we need to specify “Azure AI Search” directly in the prompt.

And we need a pre-created AI Search instance (an Azure PaaS service) with vectorized data. The vectorization requires that you use Azure OpenAI with an Ada embedding model to generate the vectors for the data.
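The retrieval step itself is just nearest-neighbour search over those vectors. Here is an offline stand-in: in production the vectors come from an Azure OpenAI ada embedding model and live in an AI Search index, but bag-of-words vectors keep this sketch runnable, and the documents are invented examples.

```python
import math
from collections import Counter

# Toy vector retrieval: embed query and documents, return the closest match,
# the way AI Search would return the best-matching chunks as context.

DOCS = {
    "dr-procedure": "disaster recovery procedure restore backups failover",
    "sec-guideline": "information security guideline password mfa policy",
}

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model call (e.g. ada via Azure OpenAI).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str) -> str:
    """Return the id of the best-matching document for the query."""
    qv = embed(query)
    return max(DOCS, key=lambda doc_id: cosine(qv, embed(DOCS[doc_id])))

print(search("what is our disaster recovery failover procedure"))
```

The retrieved chunk is what ends up in the prompt as grounding context, which again counts against the session's token budget.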

One issue related to functions is that this service does not handle parallel function calls, meaning it cannot trigger multiple functions as part of the same prompt. For example, I cannot ask the service to evaluate an IP address against multiple API endpoints at the same time.

As you can see here, it only did one function call. It should be noted that parallel function calling is only supported in the newer GPT models (https://platform.openai.com/docs/guides/function-calling/parallel-function-calling), so I am guessing that this feature will come to Copilot for Security at a later stage.
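For reference, this is roughly what parallel function calling looks like in the newer OpenAI models: a single model response can carry several tool calls, which the client then fans out to the different APIs. The response below is mocked (no real API is called), and the function names are invented.

```python
import json

# A single (mocked) model response carrying two tool calls, mirroring the
# tool_calls shape used by the newer OpenAI chat models.
mock_response = {
    "tool_calls": [
        {"function": {"name": "virustotal_lookup", "arguments": json.dumps({"ip": "1.2.3.4"})}},
        {"function": {"name": "greynoise_lookup", "arguments": json.dumps({"ip": "1.2.3.4"})}},
    ]
}

def dispatch(call: dict) -> str:
    # Stub: a real client would call the named API with the parsed arguments.
    args = json.loads(call["function"]["arguments"])
    return f'{call["function"]["name"]} checked {args["ip"]}'

results = [dispatch(c) for c in mock_response["tool_calls"]]
print(results)
```

With support like this, one prompt could fan the same IP out to several threat-intel services at once instead of needing one prompt per service.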

That means if you want to evaluate some information against multiple external services, you need to build something called a prompt book: a set of prompts that run sequentially, where information from one prompt is passed on to the next. All information within a prompt book is stored in a session (and therefore in the context window of the LLM call).
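In code terms, a prompt book is just a sequential chain where each step's output joins the shared session context. A minimal sketch, with `run_prompt` standing in for a full orchestrator-plus-LLM round trip and the prompts invented:

```python
# Sketch of a prompt book: prompts run one after another, and each prompt's
# output is appended to the shared session (and thus to the context window).

def run_prompt(prompt: str, context: list[str]) -> str:
    # Stand-in for sending `context` plus `prompt` to the SCU.
    return f"result({prompt})"

def run_promptbook(prompts: list[str]) -> list[str]:
    session: list[str] = []   # everything here occupies the context window
    outputs = []
    for prompt in prompts:
        out = run_prompt(prompt, session)
        session.extend([prompt, out])   # later steps see earlier results
        outputs.append(out)
    return outputs

book = [
    "Look up IP 1.2.3.4 in GreyNoise",
    "Check the same IP against VirusTotal",
    "Summarize both verdicts",
]
print(run_promptbook(book))
```

Because the session accumulates every prompt and result, a long prompt book also means a steadily growing token bill per step.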

Now, while the orchestrator does a good job of evaluating which skills to use, you can hardcode the choice using System Capabilities. This ensures that the orchestrator knows which skill/function to use, so you save both time and SCU tokens.

Because when you send a generic prompt, the orchestrator needs to evaluate the prompt against all the different skills.

All the skills that are configured are also sent in as part of the system prompt, meaning that you spend A LOT of tokens on a simple prompt that could be answered directly without the use of any skills.

So always try to use System Capabilities whenever possible. Also, when you are adding plugins, you have a few different options:

  • KQL
  • API
  • GPT

Adding KQL skills means that you can pre-create a KQL query that can be triggered from the portal without actually writing the query, and use it in combination with other prompts.

For instance, here I have a Kusto query that gets the cost of the Sentinel workspace, exposed by just defining a function/skill.
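A custom KQL skill is defined in a YAML plugin manifest. The sketch below shows roughly what such a manifest looks like; the field names are based on my reading of Microsoft's custom plugin documentation and should be verified against it, and all names and IDs are placeholders.

```yaml
# Sketch of a custom KQL skill manifest; field names per my reading of the
# Microsoft docs (verify before use). All IDs and names are placeholders.
Descriptor:
  Name: SentinelCostSkills
  DisplayName: Sentinel cost skills
  Description: Skills that report on Sentinel workspace ingestion cost.

SkillGroups:
  - Format: KQL
    Skills:
      - Name: GetSentinelWorkspaceCost
        DisplayName: Get Sentinel workspace cost
        Description: Returns daily billable ingestion volume for the workspace.
        Settings:
          Target: Sentinel
          TenantId: 00000000-0000-0000-0000-000000000000
          SubscriptionId: 00000000-0000-0000-0000-000000000000
          ResourceGroupName: rg-sentinel
          WorkspaceName: law-sentinel
          Template: |-
            Usage
            | where IsBillable == true
            | summarize BillableGB = sum(Quantity) / 1000 by bin(TimeGenerated, 1d)
```

The `Description` fields matter: they are what the orchestrator evaluates when deciding whether this skill matches a prompt.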

Security Copilot provides a framework to easily call and get data from different sources in the Microsoft security ecosystem (including Intune, Entra ID, and Azure services), an easy way to extend it with plugins and third-party tools using GPT functions (or skills), and a fine-tuned LLM that is trained on “something”.

The orchestration layer also provides an easy way to navigate through the different plugins, given that you provide it with enough examples and content.

Is it worth it?

I like the framework that this service provides and the easy access it gives to many of Microsoft's different services. However, there are some limitations with the current setup. Firstly, it is apparently locked to Microsoft's fine-tuned LLM, which means that we are bound to Microsoft's development cycle and choice of LLM models. I understand that they want to control the LLM version to ensure that the service handles function calls properly, but I would much rather point it to an Azure OpenAI instance where the Microsoft security LLM is made available as a model. That way we would have much better insight into token usage and also the flexibility to try out other LLMs from OpenAI when they release new features.

Secondly, the cost is really high. Since they use a fine-tuned GPT, they are tied to a specific GPT-4 version, not one of the later models running GPT-4 Turbo. Turbo is also much faster, meaning that evaluating prompts would be a lot faster as well. There I would also have the possibility to use GPT-4 Turbo with 128,000-token support.

Microsoft recommends that you use 3 SCUs, which is $12 per hour, or close to $8,640 per month. For that amount I can run 1,000 prompts per month with GPT-4 Turbo and 128,000 tokens of input and output (for only about $4,000): close to half the price, and that is with pay-as-you-go, which also gives me more control over usage and location. The only issue is that few OpenAI LLMs on Azure OpenAI support functions (including GPT-4 Turbo), so I still have to wait for that.
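A quick back-of-the-envelope check of the SCU numbers above (prices are the ones quoted in this article, not an official quote):

```python
# Sanity-check the SCU cost figures quoted above (article's prices, not a quote).

scu_price_per_hour = 4        # USD per SCU per hour
recommended_scus = 3          # Microsoft's recommended baseline
hours_per_month = 24 * 30     # SCUs bill around the clock if left running

scu_monthly = scu_price_per_hour * recommended_scus * hours_per_month
print(scu_monthly)   # 8640
```

The around-the-clock billing is the key point: scaling SCUs down outside working hours directly cuts this figure.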

I guess that what I want to say is:

  • If you have developers who understand Azure OpenAI, Semantic Kernel/LangChain, and a web framework such as Streamlit/Databutton, you can build a service faster than Copilot for Security, with more control, without being bound to the limitations that the service has today. You will not have the native experience, but you can easily build web interfaces on top of an LLM framework and also make it available through API calls and for use inside services like Logic Apps.
  • If you don't have the capacity to build your own service, this might be a good service, but make sure to use it properly in terms of which plugins you enable, use System Capabilities to reduce token usage, and, if you do not plan to use it for automation, build some automation to scale the SCU count down to a minimum when it is not in use.

A final thing: I really hope that Microsoft makes this more of a framework instead of bundling it with the current fine-tuned LLM.
