How does Windows Recall work?

There has been a lot of focus on Windows Recall over the last week because of its security issues (which I will get back to later in this article), but few of those articles actually describe how the technology works. Therefore I want to use this article to write down my current understanding of how the different runtimes locally on the machine use generative AI to handle the processing of Recall.

Some of you might also have heard about Rewind, an app for Mac that does essentially the same as Recall: it takes screenshots locally on your device, indexes them using OCR, stores them in a compressed database, and makes them searchable.

What Microsoft introduced with Recall is a combination of LLM and OCR capabilities that does pretty much the same. Instead of only regular keyword search, it also allows you to use natural language to “search” within the data source.

It takes screenshots of what is happening on the screen and then stores them under

C:\Users\user\AppData\Local\CoreAIPlatform.00\UKP\{GUID}

All images are stored within the same folder structure, in the subfolder

.\ImageStore\

The database ukg.db is a SQLite database that stores information as text and maps it to the screenshot that was taken. As seen here, it can also store passwords…
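The text-to-screenshot mapping can be sketched with a small mock database. Note that the real ukg.db schema is not documented in this article, so the table and column names below (`capture`, `text`, `image_path`) are assumptions used purely to illustrate the idea:

```python
import sqlite3

# Illustrative sketch only: an in-memory stand-in for ukg.db. The real
# schema is undocumented here, so "capture", "text" and "image_path"
# are assumed names, not the actual Recall tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE capture (id INTEGER PRIMARY KEY, text TEXT, image_path TEXT)"
)
conn.execute(
    "INSERT INTO capture (text, image_path) VALUES (?, ?)",
    ("Invoice #1234 - Total $99.00", r".\ImageStore\example.jpg"),
)

# A plain keyword search over the OCR'd text returns the screenshot
# it was extracted from, which is the core of how Recall search works.
row = conn.execute(
    "SELECT image_path FROM capture WHERE text LIKE ?", ("%Invoice%",)
).fetchone()
print(row[0])
```

Recall's natural-language search sits on top of this same store; the LLM layer just replaces the literal `LIKE` match with semantic retrieval.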

Can we disable the feature? Yes, there are multiple ways: CSP, Group Policy, or even the old registry. The better option is to use CSP or Group Policy, since this tattoos the setting.

CSP: ./User/Vendor/MSFT/Policy/Config/WindowsAI/DisableAIDataAnalysis
Group Policy: User Configuration > Administrative Templates > Windows Components > Windows AI > Turn off saving snapshots for Windows

Registry is also an option

Disable Recall – User:
[HKEY_CURRENT_USER\Software\Policies\Microsoft\Windows\WindowsAI]
"DisableAIDataAnalysis"=dword:00000001

Disable Recall – Machine:
[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\WindowsAI]
"DisableAIDataAnalysis"=dword:00000001
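As a sketch, the per-user registry entry above can be packaged as a .reg file you can review and merge. The filename is my own choice; the key path and value come straight from the policy described above:

```python
# Sketch: emit a .reg file that sets the per-user policy value described
# above (DisableAIDataAnalysis = 1). Review it before importing with
# regedit; the machine-wide variant just swaps in the HKLM key path.
KEY = r"HKEY_CURRENT_USER\Software\Policies\Microsoft\Windows\WindowsAI"

reg_file = "\r\n".join([
    "Windows Registry Editor Version 5.00",
    "",
    f"[{KEY}]",
    '"DisableAIDataAnalysis"=dword:00000001',
    "",
])

# Version 5.00 .reg files are expected to be UTF-16 (Python's "utf-16"
# writes the byte-order mark regedit looks for).
with open("disable-recall-user.reg", "w", encoding="utf-16") as f:
    f.write(reg_file)
print(reg_file)
```

On Windows you could set the value directly with the standard-library `winreg` module instead of going through a file, but a .reg file is easier to audit and distribute.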

Recall, Studio Effects, and other GenAI services in Windows are managed by a service called the Windows Copilot Runtime. When Recall triggers a screenshot capture, it uses an action called Microsoft.Windows.Vision.RecognizedText, which uses Phi Silica as the underlying language model behind the vision API to recognize text.

Phi Silica has been custom-built and optimized for NPUs. In the initial release of the Windows integration it is limited in how you can use it standalone, but more APIs like Vector Embedding, RAG API, and Text Summarization will be coming later. It also has some limitations in terms of supported features, such as the lack of function calling:

| Tasks | Phi-3 |
| --- | --- |
| Language Tasks | Yes |
| Math & Reasoning | Yes |
| Coding | Yes |
| Function Calling | No |
| Self Orchestration (Assistant) | No |
| Dedicated Embedding Models | No |

Phi Silica falls into the existing family of Phi models from Microsoft.

You can also read the technical report of Phi-3 here: 2404.14219.pdf (arxiv.org)

The LLM model is managed by ONNX Runtime, which is a cross-platform machine-learning model accelerator. That in turn uses DirectML to abstract and run the operations across the different hardware options on the device, such as the GPU and NPU. The NPU is preferred by the library, depending on what is available on the hardware.

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning on Windows. With the release of DirectML 1.13.1 and the ONNX Runtime 1.17, Microsoft announced developer preview support for NPU acceleration in DirectML, which is something that is now part of the new Copilot+ machines.
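The hardware-preference behavior described above can be sketched as follows. ONNX Runtime is handed an ordered list of execution providers and uses the first one available on the machine; `DmlExecutionProvider` is the DirectML provider, while the exact preference order here is my own assumption for illustration:

```python
# Ordered preference list: DirectML (GPU/NPU) first, then CPU fallback.
# The specific ordering is an assumption for illustration, not the
# documented behavior of the Copilot Runtime itself.
PREFERENCE = ["DmlExecutionProvider", "CPUExecutionProvider"]

def pick_provider(available):
    """Return the first preferred execution provider present on this machine."""
    for provider in PREFERENCE:
        if provider in available:
            return provider
    return "CPUExecutionProvider"  # ONNX Runtime's always-present fallback

# With onnxruntime installed on a Copilot+ PC, session creation would
# look roughly like:
#   import onnxruntime as ort
#   session = ort.InferenceSession("model.onnx", providers=PREFERENCE)
print(pick_provider(["CPUExecutionProvider", "DmlExecutionProvider"]))
```

This is why the same model file can run on very different hardware: the provider list, not the model, decides where the operations execute.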

With the introduction of the new Copilot runtime on Windows, we are witnessing a new wave of Copilot features being run locally on Windows devices using the Phi-3 language model as the base. However, Microsoft faces a significant challenge in helping end users understand the differences between the cloud-based Copilot, which uses the GPT-4 language model, and the locally running Copilot, which uses the Phi-3 language model developed by Microsoft.

There are substantial differences between these two language models in terms of features and the amount of content they can handle. The language models running locally have a much more limited context window compared to GPT-4. For example, Phi-3 supports up to 4,000 tokens, equivalent to about 2,500 words, while GPT-4 can support up to 128,000 tokens, equivalent to approximately 100,000 words.
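The token-to-word figures above can be sanity-checked with the common rule of thumb that one English word is roughly 1.33 tokens. This is a heuristic, not a real tokenizer, so treat the results as back-of-the-envelope only:

```python
# Rough heuristic: ~1.33 tokens per English word (i.e. ~0.75 words per
# token). Real tokenizers vary by model and text, so this is approximate.
TOKENS_PER_WORD = 1 / 0.75

def fits_in_context(word_count: int, context_tokens: int) -> bool:
    """Rough check whether a text of `word_count` words fits in the window."""
    return word_count * TOKENS_PER_WORD <= context_tokens

print(fits_in_context(2_500, 4_000))     # about the local Phi-3 limit
print(fits_in_context(2_500, 128_000))   # easily within GPT-4's window
print(fits_in_context(100_000, 4_000))   # far too long for the local model
```

In practice this is the gap end users will notice first: a long document that the cloud Copilot summarizes in one pass simply does not fit into the local model's window.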

This significant difference in token support highlights the disparity in capabilities between the two models. GPT-4, used in the cloud-based Copilot, can manage much larger chunks of text and provide more comprehensive functionality, whereas the locally run Phi-3 model is constrained by the limited hardware capabilities of the user’s device.

To bridge this gap, Microsoft must effectively communicate these differences to end users, ensuring they understand the capabilities and limitations of each model.

One of the most significant security concerns with Microsoft’s Recall feature is its lack of content moderation. This means that if you enter sensitive information, such as passwords or Social Security numbers, into a website, Recall can capture a screenshot of that information and store it. These screenshots, along with any text information, are stored in a SQLite database.

The major issue here is that these texts and screenshots are not encrypted and are stored locally in the user’s app data folder. There is also a tool called Total Recall (xaitax/TotalRecall on github.com) that extracts and displays data from the Recall feature in Windows 11, making it easy to pull this information out of these files. This means that if a hacker gains access to your operating system or device, they could potentially collect all this sensitive information.

Now, consider the implications if you work in industries like healthcare, finance, or defense. If Recall starts capturing screenshots of sensitive information from the applications you use, and this information is leaked or accessed by unauthorized individuals, it could be devastating for your organization. The potential for data breaches and loss of sensitive information highlights the critical need for robust security measures when using such features.
