I have been reading so many bad blog posts lately about Microsoft Copilot and how it works that I decided to write a more in-depth piece on the inner workings of the different offerings, so that you can get a better sense of how they actually work. I wrote a short introduction here –> How do the different Copilot services from Microsoft actually work? – msandbu.org but I wanted to go more in-depth.
Firstly, we need to understand the ecosystem. All the different Copilot offerings from Microsoft use a service underneath called Azure OpenAI (which is a managed service), but for these Copilot offerings the instances are managed and controlled by Microsoft. Some of the Copilot offerings are provided within your geographic region, while others run as a global service. The Azure OpenAI service contains distinctive features, but most of the functionality being used is the GPT LLMs. Azure OpenAI can also be used as a standalone service that lets you consume the LLMs directly through an API or the web interface. When using Copilot, however, all these settings and APIs are abstracted away.
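To make that concrete, here is a minimal sketch of consuming Azure OpenAI directly through the official openai Python package. The endpoint, API key and deployment name are placeholders you would replace with your own; this is exactly the layer that the Copilot offerings abstract away:

```python
# Minimal sketch of calling Azure OpenAI directly with the official
# "openai" Python package. Endpoint, key and deployment name below
# are placeholders for your own Azure OpenAI resource.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder
    api_key="...",                                          # placeholder
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="my-gpt4-deployment",  # name of your model deployment
    messages=[{"role": "user", "content": "Explain what an LLM is."}],
)
print(response.choices[0].message.content)
```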
It should be noted that by default Azure OpenAI will log prompts and usage for 30 days to monitor for abuse of the models; however, for all Copilot services this Azure OpenAI abuse monitoring is disabled by default. Also note that the OpenAI service is stateless, a bit like Dory from Finding Nemo: it will process the information but then forget about it once it has been processed.
Services such as Microsoft Copilot and Copilot Pro are global, hence the information you send can be processed at any global point where the service is available. Microsoft 365 Copilot, on the other hand, is mostly operated where your Microsoft 365 organizational data resides. Note that Bing-backed connected experiences don’t fall under Microsoft’s EU Data Boundary (EUDB) commitment.
The other significant difference between the services is what kind of data sources they have access to and how those sources are presented. GitHub Copilot, for instance, only uses data (or code) within the IDE, which is then sent to the LLM together with the prompt.
Below is an overview picture showing the different services with the data sources they use for context.
Now let us look at how these different Copilot services handle prompts and actions. When you enter a prompt, let’s say we want Copilot X to write about Subject Y, an orchestrator or integration layer within the Copilot service handles the prompt. It might then run a search against the internal data to find relevant sources on Subject Y. Depending on the configuration it could also use other data sources such as Bing Search or plugins connected to the service. It then combines the data collected from the search with the prompt (this is called grounding) and sends all of it to the LLM API, which generates the content based upon the prompt and the grounding data.
This means that the prompt is not sent directly to the LLM but handled through the integration layer. This integration layer can also have different mechanisms for content filtering, rate limiting and monitoring. The LLM itself can also have a thin proxy layer on top that does content moderation.
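To illustrate the flow, here is a heavily simplified sketch of what such an orchestrator does. Every function in it is a hypothetical stand-in for the real Microsoft components, not actual code from the service:

```python
# Illustrative sketch of a Copilot-style orchestrator. Every helper
# here is a hypothetical stand-in for the real Microsoft components.

def violates_content_policy(text: str) -> bool:
    return "forbidden" in text.lower()  # toy content filter

def search_internal_data(prompt: str) -> list[str]:
    # Stand-in for a search against tenant data, Bing or plugins
    return ["Internal document snippet relevant to the topic"]

def build_grounded_prompt(prompt: str, sources: list[str]) -> str:
    # Grounding: combine retrieved data with the original prompt
    context = "\n".join(sources)
    return f"Use only this context:\n{context}\n\nUser question: {prompt}"

def call_llm_api(grounded_prompt: str) -> str:
    # Stand-in for the actual Azure OpenAI call
    return f"[LLM answer based on: {grounded_prompt[:40]}...]"

def handle_prompt(prompt: str) -> str:
    if violates_content_policy(prompt):  # content filtering in the integration layer
        return "Sorry, I can't help with that."
    sources = search_internal_data(prompt)             # retrieval step
    grounded = build_grounded_prompt(prompt, sources)  # grounding step
    return call_llm_api(grounded)  # only now does the LLM see anything

print(handle_prompt("Write about Subject Y"))
```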
GitHub Copilot
So let us put this into the context of GitHub Copilot. GitHub Copilot uses Azure OpenAI underneath and is now available as an extension in most IDEs. It is important to note that it is a global service. Copilot also has unique built-in features handled by the Copilot proxy, such as the code referencing filter that tries to avoid suggesting public code.
Also, the GitHub Copilot extension adds some predefined system prompts to optimize the LLM input. If I use the explain feature in Copilot on a piece of code, Copilot will send the following:
"{"messages":[{"role":"system","content":"\nYou are an AI programming assistant.\nWhen asked for your name, you must respond with \"GitHub Copilot\".\nFollow the user's requirements carefully & to the letter.\nYour expertise is strictly limited to software development topics.\nFollow Microsoft content policies.\nAvoid content that violates copyrights.\nFor questions not related to software development, simply give a reminder that you are an AI programming assistant.\nKeep your answers short and impersonal.\n\nYou can answer general programming questions and perform the following tasks:\n* Ask a question about the files in your current workspace\n* Explain how the selected code works\n"
So as you can see, it embeds this system prompt into the “explain” command. Secondly, Copilot also constantly streams the code you are typing to the completions API, which is what gives you the “autocomplete” feature.
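To illustrate that streaming pattern, here is a sketch of a streamed completion using the public Azure OpenAI API. GitHub Copilot goes through its own proxy and endpoints, so treat this as conceptual only; the deployment name is a placeholder:

```python
# Sketch of a streamed completion against Azure OpenAI. This is the
# public API, not GitHub Copilot's internal proxy, but the principle
# is the same: the editor sends surrounding code as context and
# renders tokens as they arrive.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder
    api_key="...",                                          # placeholder
    api_version="2024-02-01",
)

stream = client.chat.completions.create(
    model="my-gpt4-deployment",  # placeholder deployment name
    messages=[
        {"role": "system", "content": "Complete the user's code."},
        {"role": "user", "content": "def fibonacci(n):"},  # code typed so far
    ],
    stream=True,  # tokens arrive incrementally, enabling live autocomplete
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```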
While GitHub Copilot is fairly simple, since it only sends content from within the IDE or source code the IDE has access to, Microsoft 365 Copilot is a heavier service, which can be seen as a RAG application with a custom UI inside the Office applications.
Microsoft 365 Copilot
Since Microsoft 365 Copilot, like many other Copilots, uses RAG (retrieval-augmented generation), it has some extra components to it. Firstly, let us look at the service from an overview before we go into the details.
If we, for instance, ask Microsoft 365 Copilot from within Word to “Summarize” the document, it will send the content stored within the document together with the prompt to the Copilot orchestrator (the document content then makes up the grounding data). Depending on where your Microsoft 365 data is stored, the LLM processing will happen in the same region (unless there is high traffic or load, in which case the LLM traffic will be processed in another region). No search or RAG is involved here.
So, what if we use Microsoft 365 Copilot with RAG? Well then, the flow is a bit different. The first thing that happens when you buy and assign Microsoft 365 Copilot licenses is the building of the semantic index, which is another way of saying that content stored in Microsoft 365 will have vectors added to it as a form of metadata.
Vectors are a way to represent data as numerical values. While we do not see these vectors on the stored data in Microsoft 365, they are processed and stored by the search engine. So when Copilot is looking for data, it uses a hybrid search approach that looks at both keywords and vectors (to locate more relevant content), and then adds semantic ranking on top to find the most likely data sources.
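As a conceptual illustration of hybrid search (this is not Microsoft's implementation, just the general idea), combining a keyword score with vector similarity could look like this:

```python
# Conceptual sketch of hybrid search: keyword matching plus vector
# similarity. Not Microsoft's implementation, just the general idea.
import math

documents = {
    "doc1": "Quarterly sales report for the Oslo office",
    "doc2": "Recipe collection from the company cookbook",
}
# In the semantic index, each document has an embedding vector stored
# as metadata; here we use tiny made-up vectors for illustration.
embeddings = {"doc1": [0.9, 0.1, 0.3], "doc2": [0.1, 0.8, 0.2]}

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def hybrid_search(query: str, query_vector: list[float]) -> list[tuple[str, float]]:
    words = query.lower().split()
    results = []
    for doc_id, text in documents.items():
        # Keyword score: fraction of query words found in the document
        keyword_score = sum(w in text.lower() for w in words) / len(words)
        # Vector score: semantic similarity to the stored embedding
        vector_score = cosine(query_vector, embeddings[doc_id])
        # "Semantic ranking" here is just a weighted blend of the two
        results.append((doc_id, 0.4 * keyword_score + 0.6 * vector_score))
    return sorted(results, key=lambda r: r[1], reverse=True)

print(hybrid_search("sales report", [0.8, 0.2, 0.3]))
```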
Microsoft also states that they have personalization as part of the equation, but this is not documented anywhere.
So what happens when I ask Copilot to write a section about topic X? Firstly, the prompt is sent from the application to the Copilot orchestrator, which then triggers a search against the Graph API search engine, using hybrid search to find the most relevant data. If it does not find relevant content in your Microsoft 365 tenant, it can also use web search with Bing, or data from Graph connectors.
The prompt itself is also converted to vectors so that the search engine can more easily find relevant data.
Again, the Copilot orchestrator will then combine the prompt with the grounding data (from the Graph search) and send it to the Azure OpenAI service, which processes the data. This is how most RAG-based services are built and how they work.
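Conceptually, the payload the orchestrator ends up sending to Azure OpenAI is just a normal chat completion request with the grounding data injected; the wording and structure below are purely illustrative:

```python
# Conceptual shape of the grounded request the orchestrator sends to
# the LLM. The system message wording is purely illustrative.
grounding_data = [
    "Excerpt from 'Topic X strategy.docx' found via Graph search ...",
    "Excerpt from a Bing result about topic X ...",
]

messages = [
    {"role": "system", "content": "You are Microsoft 365 Copilot. Answer using the provided context."},
    {"role": "system", "content": "Context:\n" + "\n---\n".join(grounding_data)},
    {"role": "user", "content": "Write a section about topic X."},
]
# messages is then sent to the Azure OpenAI chat completions API,
# e.g. client.chat.completions.create(model=..., messages=messages)
```

Now, there are a couple of questions that I often get asked: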
- Why is there no support for language X in Copilot?
- While the underlying language model in Azure OpenAI might support your language, Microsoft needs to add the logic and optimize the system prompts to work properly with language X, and this might take some time.
- What are the limitations of the Copilot service?
- Firstly, in terms of data size, you need to understand that processing depends on which GPT version is used; the model's token limit caps how much data can be processed in one request (see the token-counting sketch right after this list).
- What kind of files are supported by Copilot?
- Adobe: PDF
- Microsoft Office (Word, PowerPoint, Excel, Loop): DOC, DOCX, FLUID, LOOP, PPT, PPTX, XLSX
- OpenOffice: ODT, ODP
- Rich Text Format: RTF
- Text and Code: ASPX, RTF, TXT
- Web / Hypertext: HTM, HTML
- When will Microsoft support the new features from OpenAI?
- While the OpenAI instance for Copilot will support newer models such as GPT-4 with vision (GPT-4V) and GPT-4 Turbo (which supports up to 128k tokens), Microsoft needs to add the UI capabilities, build the logic into the orchestrator to process them, and lastly provide actions/functions in the UI that can trigger the new features.
- Can I connect data source X or service Y?
- Almost all data can be connected using Graph connectors. Note that this is only for FETCHING data. You can also build conversational plugins that integrate with third-party APIs and define custom actions there. These conversational plugins are built in Microsoft Copilot Studio.
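On the token limits mentioned above, here is a small sketch using the tiktoken library to show how an orchestrator could check whether grounding data fits in a model's context window (the limits in the comments refer to the standard GPT-4 and GPT-4 Turbo models):

```python
# Sketch of why token limits matter: count tokens with the tiktoken
# library before deciding how much grounding data fits in the request.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")

document_text = "Some long document pulled in as grounding data. " * 2000
tokens = encoding.encode(document_text)
print(f"Document is {len(tokens)} tokens")

# A standard GPT-4 deployment fits roughly 8,192 tokens of prompt plus
# answer combined, while GPT-4 Turbo raises the input window to ~128k.
MODEL_LIMIT = 8192
if len(tokens) > MODEL_LIMIT:
    # The orchestrator would have to truncate or chunk the grounding
    # data; here we naively keep the first half of the window.
    document_text = encoding.decode(tokens[: MODEL_LIMIT // 2])
```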
Hopefully, this gave you some more insight into how the different Copilots work. Now, I said that I wanted to give you a bit more insight into the other Copilots as well, so instead of loads of text I decided to add some drawings of how they work here.
Microsoft Copilot Studio
Microsoft Copilot and Copilot Pro
One thing to note about the so-called “Copilot with commercial data protection”: it is marketed as a more secure alternative, but the only thing it actually does is disable plugins so that data cannot be sent outside of the Bing search engine.