How can LLMs help with Operational IT-Security?

A couple of weeks ago, I had a conference talk about how large language models can help people working with IT security. While we are still in the early phases of LLMs they can be quite helpful within multiple areas, which I want to highlight in this blogpost. First of with all new threats emerging such as a new vulnerability, we spend a lot of time collecting information. Going back to like Print Nightmare which was initially one vulnerability which then became multiple vulnerabilities, we needed to get different workaround and methods to detect abuse of this new vulnerability.

Number of vulnerabilities each year from 2008 – 2022 (Source CVE database)

Also, we have about 25000 known vulnerabilities each year, which means that we spend a lot of time investigating and patching (but also monitoring for potential abuse or implementing countermeasures).

In terms of LLMs many vendors have already announced that they will be providing LLM functionality integrated into their security ecosystem such as Google Cloud Security AI Workbench, VirusTotal Code insight, and lastly Microsoft Security Copilot. The purpose of many of these tools is to make it easier to digest the overwhelming amount of information. However, they come at a pretty excessive cost and even Microsoft Security Copilot we don’t even know the price details yet.

But can we build or own applications to help with IT security and secondly what can it do? Well first we need to understand how a language model works, the limitations it has, and look at some of the use cases. The model first tokenizes the text and breaks it down into smaller units (words or subwords) called tokens. These symbols are then converted to numerical values (which are called embeddings) The model processes these embedding values through multiple layers of transformer processes that enable it to understand the relationships, contexts, and meanings of each token in relation to the others. It then extracts the semantic information in the text, such as what is the main essence of the text, are concepts and concepts that are explained, and other properties that are written up. It then tries to identify dependencies in the text. Once it has continued its semantics and looked at the dependency of the text, it can try to make a summary of the text.

If we look at how the models can “predict” words this picture below is a good indication on the next words in a given context.

Secondly, the issue is that the language models are often trained on data until a given date. Such as with GPT3-5 which has been trained with data until September 2021 and GPT-4 up until January 2022. Also, language models such as with OpenAI do not have internet access by default, unless you have ChatGPT Plus which gives you access to Bing, but given the data sources available to Bing, well… it’s not good.

Also, language models are unable to learn updated content directly however, we have various alternatives to facilitate their access to data or train the model. These include utilizing methods like Fine-tuning or Retrieval Augmented Generation (RAG).

One question that came up during my talk, can we add all our security data and log information access to GPT? Since this could be a time saver we could just ask our “GPT” interface if we can look for “show abnormal traffic after working hours” for instance and it would comb through our data and find a match. However, it is not that simple. First of is we cannot train GPT directly on security data, which means that we need to have some mechanism in between such a search mechanism that the language model can use to query for the right data. Or in better terms, make the data available for the language model.

This is where a mechanism called RAG (Retrieval-augmented Generation) comes in. RAG is a method that combines the power of large pre-trained models, like those in the GPT series, with external information retrieval systems to generate more informed and factually accurate responses.

  1. Components:
    • Retriever: This component searches through a large corpus of documents (like Wikipedia) to find passages that are relevant to a given query.
    • Generator: Once relevant passages are retrieved, this component generates a response using both the original query and the retrieved passages as context.
  2. How it Works:
    • When a query or question is input into the RAG system, the retriever first identifies a set of relevant passages or documents from a predefined corpus.
    • These retrieved passages are then passed along with the query to the generator, which synthesizes the information and produces a coherent answer.
  3. Advantages:
    • Dynamic Knowledge: Unlike models like GPT which have a fixed “knowledge cutoff” due to the pre-training data, RAG can access more up-to-date information if the external corpus is regularly updated.
    • Factual Accuracy: By grounding its responses in actual passages from a knowledge source, RAG can provide more factually accurate answers.
    • Efficiency: Instead of training a model on a vast corpus, RAG can be trained on a smaller set of data, relying on the retriever to access the breadth of information when needed.
  4. Applications:

Can we use this together with our own security data? Sure, but it will be a cumbersome process.

So how can we use the language model from a security perspective?

1: Help to write/design contingency plans. For organizations that need help writing plans for instance what to do in case of ransomware attacks. For any other type of plan/report, I have always found GPT useful to get started to get writing points.

2: Help to write SIEM queries based on indicators or threats. An issue might be that since GPT models do not have the latest information available, they have a pretty good understanding of query engines like Splunk/Q-Radar/Sentinel. For instance, I find it pretty useful for Sentinel queries and if It does not generate a correct answer I always try and give it more context and examples and sometimes I paste in the documentation.

3: Code analysis! Sometimes you have some obscure PowerShell code or script that you have no idea what it does and you want to get some more information on what it does. For instance I have an built a simple streamlit and langchain application that I can use to explain code for me. This app is called “Kodeanalytiker“. If I for instance upload this PowerShell script which I know is malicious and parse it trough the app I get this feedback.

So useful insight. This is nothing fancy, you can view the source code of the app here gpt-ai/ at main · msandbu/gpt-ai (

Another thing is using GPT with virtual assistant / agents for research information or collecting information. Especially now with new vulnerabilities and information being posted in many different forums and websites we always need to collect information from somwhere which we then need to summarize or organize. Luckily there are services like assafelovic/gpt-researcher: GPT based autonomous agent that does online comprehensive research on any given topic ( which can collect information on our behalf. GPT-researcher can also scrape content from websites as well and will then summarize content from information that has been collected.

Regardless of tooling, GPT can also be quite helpful in describing and explaining content which can be a useful asset when working within IT-security! and this is while we are waiting for other services like Microsoft Security Copilot.

There is also a bunch of interesting GPT security tools on this link as well cckuailong/awesome-gpt-security: A curated list of awesome security tools, experimental case or other interesting things with LLM or GPT. (

Leave a Reply

Scroll to Top