How to set up Azure OpenAI with ChatGPT using your own data

OpenAI’s GPT models have revolutionized the field of natural language processing, and Microsoft’s Azure platform provides an excellent infrastructure for deploying AI models. In this blog post, we will explore how to set up Azure OpenAI with ChatGPT on your own data, using a predefined demo that Microsoft has created, which combines Cognitive Search, Blob Storage, and a web app to show the results. So, let’s get started!

I had a colleague reach out to me, asking if I could set up this demo sample that Microsoft has created (Azure-Samples/azure-search-openai-demo: Demonstration of how to leverage Azure OpenAI and Cognitive Search to enable Information Search and Discovery over organizational content (github.com)), since I have access to the ChatGPT service in Azure.

This sample uses the Azure Developer CLI to deploy the environment. It comes preconfigured with some example datasets, but you can also add your own data to it, which I will come back to at the end of this blog post.

This deployment doesn’t feed your own data directly into ChatGPT. Instead, when you ask the GPT instance a question, it takes the relevant terms from the chat, runs an optimized query against Cognitive Search, and then uses the returned documents to present the answer.
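
To make the flow a bit more concrete, here is a minimal Python sketch of that retrieval pattern. This is not the sample’s actual code, and the endpoint, index, and deployment names here are assumptions you would replace with your own:

import openai
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Hypothetical resource names and keys - replace with your own values
openai.api_type = "azure"
openai.api_base = "https://myopenai.openai.azure.com"
openai.api_version = "2023-03-15-preview"
openai.api_key = "<openai-key>"
search = SearchClient("https://mysearch.search.windows.net", "gptkbindex",
                      AzureKeyCredential("<search-key>"))

question = "What does the benefits plan cover?"
# Step 1: fetch the most relevant passages from Cognitive Search
sources = "\n".join(doc["content"] for doc in search.search(question, top=3))
# Step 2: have the GPT deployment answer using only those passages
answer = openai.ChatCompletion.create(
    engine="chat",  # assumed name of your ChatGPT deployment
    messages=[
        {"role": "system", "content": "Answer using only these sources:\n" + sources},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)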

To use the built-in sample, there are a couple of things that you need to have installed on your machine first. While the requirements are fairly well documented on the GitHub page, I wanted to highlight them a bit more and show some of the mechanisms you can use to install them.

  • Azure Developer CLI can be downloaded from here: Install the Azure Developer CLI (preview) | Microsoft Learn, or installed using this CLI command
    powershell -ex AllSigned -c "Invoke-RestMethod 'https://aka.ms/install-azd.ps1' | Invoke-Expression"
  • The latest PowerShell version can easily be installed using this command
    winget install --id Microsoft.PowerShell --source winget
  • The latest Python version can also be installed using winget
    winget install Python.Python.3.11
  • Make sure that when you run the command azd login you do not use an Azure AD B2B (guest) account, but a native user account within the tenant that has the correct permissions on the subscription level. The reason is a known issue in azd version 0.7.0-beta.1: the principal id is supplied by azd and is expected to be the object id of the account you used for azd login, but azd currently reads the id from that account’s default Azure directory, which breaks multi-directory scenarios. There is a workaround mentioned in the GitHub issue: you need to manually set the principal-id if you face this problem. The fix is done on azd and will be part of the April release, version 0.8.0-beta.1, after which the principal id will be set as expected for multi-directory scenarios.
  • Node can also be installed using winget: winget install -e --id OpenJS.NodeJS, and Git as well: winget install --id Git.Git -e --source winget
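
After installing everything, you can verify from a fresh PowerShell session that the tools are available on your path:

azd version
python --version
node --version
git --version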

Once the prerequisites are in place, log in to Azure:

azd login

If you are having issues with multiple subscriptions that your service principal has access to, you can use the following to predefine the location and subscription.

azd config set defaults.subscription <subscription-id>
azd config set defaults.location eastus
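
If you want to double-check what is set, azd should also be able to read the configuration back (hedging slightly here, since azd was still in preview at the time of writing):

azd config get defaults.subscription
azd config get defaults.location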

To begin with the sample, you can just type azd init -t azure-search-openai-demo, which will download the GitHub project locally, and then do the deployment using the command azd up.
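
Put together, the whole bootstrap is just two commands (azd will prompt you for anything it still needs, such as an environment name):

azd init -t azure-search-openai-demo
azd up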

The tool is going to use a predefined Python script which contains a PDF parser that can interpret PDF files, split them up, and run them through Azure Form Recognizer, as seen in the script here.
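
As a simplified sketch of that pattern (this is not the actual prepdocs.py script, and the resource names are assumptions), the idea is to run each PDF through Form Recognizer and index the extracted text page by page:

import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.search.documents import SearchClient

# Hypothetical resource names and keys - replace with your own values
formrec = DocumentAnalysisClient("https://myformrec.cognitiveservices.azure.com",
                                 AzureKeyCredential("<formrec-key>"))
index = SearchClient("https://mysearch.search.windows.net", "gptkbindex",
                     AzureKeyCredential("<search-key>"))

for filename in os.listdir("data"):
    if not filename.lower().endswith(".pdf"):
        continue
    with open(os.path.join("data", filename), "rb") as f:
        # "prebuilt-layout" extracts text and layout from each page
        result = formrec.begin_analyze_document("prebuilt-layout", document=f).result()
    # Index one document per page so answers can point back to a source page
    index.upload_documents([
        {"id": f"{filename}-{page.page_number}".replace(".", "-"),
         "content": " ".join(line.content for line in page.lines),
         "sourcefile": filename}
        for page in result.pages
    ])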

You can also use this script to upload your own data (outside the existing data that Microsoft provides). You could of course do a full redeploy using azd up with the new files stored under the /data folder, but since that takes a long time, you can instead use .\scripts\prepdocs.ps1, which will process all the documents within the /data folder.
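
For example, assuming your own PDF files are in a local folder called mydocs:

Copy-Item .\mydocs\*.pdf .\data\
.\scripts\prepdocs.ps1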

It should be noted that, at the moment, the script builds a search index configured for English. I will update this blog post shortly once I get it working properly for other languages.
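
If you want to experiment with this yourself in the meantime, the setting to look at is the language analyzer on the index’s content field. In the Python SDK that is the analyzer_name property, shown here with example analyzer names that Cognitive Search supports:

from azure.search.documents.indexes.models import SearchableField, SearchFieldDataType

# "en.microsoft" is the English analyzer; swap in e.g. "no.microsoft" for Norwegian
field = SearchableField(name="content", type=SearchFieldDataType.String,
                        analyzer_name="en.microsoft")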

Once the deployment is done, you will be notified within the CLI with the URL to the web frontend.

The finished web frontend will look something like this.
