In December, OpenAI released ChatGPT to the public and took the internet by storm! ChatGPT reportedly reached 100 million monthly active users just two months after launch, and it took only 5 days to reach 1 million users. In addition, more than a million people signed up for the new ChatGPT-infused Bing search service within 48 hours, which might even make it worth trying Bing again? (I'll save that for another blog post…)
As expected, the high volume of users overwhelmed the OpenAI service and made ChatGPT unreliable at times. The team at OpenAI then released a new offering called ChatGPT Plus: a subscription to ChatGPT, priced at $20 per month, that provides unrestricted access even during peak hours, along with quicker response times.
In addition to ChatGPT, OpenAI also has other services available through their APIs. OpenAI has developed powerful models like GPT-3, Codex, and DALL·E. The GPT-3 models work with natural language in general, the Codex models can convert natural language into code (which many will know today as part of GitHub Copilot), while DALL·E can produce images based on a natural language description.
In addition, there are four models within the GPT-3 family: Ada, Babbage, Curie, and Davinci. Among them, Davinci is the most capable model in terms of the range of tasks it can perform with the instructions (prompts) given. Ada is the least capable model, but it is the fastest. You can read more about the different language models here –> Azure OpenAI Service models – Azure OpenAI | Microsoft Learn
Recently Microsoft released the Azure OpenAI Service, a new Azure Cognitive Service that allows customers to access OpenAI's language models such as the GPT-3, Codex, and Embeddings model series through REST APIs from Microsoft Azure. Furthermore, the Azure OpenAI Service has enterprise-grade capabilities such as security, compliance, and regional availability that the service from OpenAI does not offer.
So what is the big difference between OpenAI and Azure OpenAI?
While both services ultimately run on Azure infrastructure, there are some large differences:
- ChatGPT is only available through the Web UI from OpenAI and was initially not part of the Azure OpenAI service. However, it has since become available from Microsoft –> ChatGPT is now available in Azure OpenAI Service | Azure Blog and Updates | Microsoft Azure
- ChatGPT and OpenAI can collect a lot of data under their data usage policies – including the prompts that are typed and the output that is received – so you should never put sensitive information into ChatGPT (as also stated in their FAQ –> ChatGPT General FAQ | OpenAI Help Center)
- OpenAI does not document where data is processed when you use ChatGPT; most of the training of the language model is done within the US.
- Azure OpenAI Services can be deployed within three specific Azure regions: East US, South Central US, and West Europe. API access to Azure OpenAI is similar to what OpenAI offers for their own services.
- Azure OpenAI Services automatically encrypts data within the service with Microsoft-managed keys, and you also have the ability to encrypt the data stored within the service with your own customer-managed keys.
- Azure OpenAI Services supports additional network connectivity options, which allows us to use features like private endpoints to route all communication to the service via a centralized network.
- Azure OpenAI Services supports the use of managed identities to access the service, rather than relying only on native API keys to authenticate.
- Azure OpenAI Service utilizes prompts and completions to operate its content management systems, as well as to identify and monitor abusive behavior. Microsoft personnel with appropriate authorization may access the prompt and completion data that the automated systems flag, in particular for abuse investigations and verification. For customers using the Azure OpenAI Service within the European Union, only authorized Microsoft employees within the EU can access such data. The data collected may also be used to improve the content management systems. If a policy violation is confirmed, you may be asked to take immediate remedial measures to prevent further abuse; failure to address this may result in the suspension or termination of your access to Azure OpenAI.
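To make the API-access and authentication differences above a bit more concrete, here is a minimal Python sketch. The resource name, deployment name, and api-version are hypothetical placeholders, and the endpoint shape is based on the common Azure OpenAI REST pattern – check the API reference for your own service version before relying on it.

```python
# Minimal sketch (not a full client) of how authentication differs between
# OpenAI's own API and Azure OpenAI. The resource name, deployment name,
# and api-version used below are hypothetical placeholders.

def openai_headers(api_key: str) -> dict:
    """OpenAI's own API authenticates with a Bearer token."""
    return {"Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"}

def azure_openai_headers(api_key: str) -> dict:
    """Azure OpenAI accepts an 'api-key' header -- or, alternatively, an
    Azure AD token obtained via a managed identity, sent as a Bearer token."""
    return {"api-key": api_key,
            "Content-Type": "application/json"}

def azure_openai_url(resource: str, deployment: str, api_version: str) -> str:
    """Azure OpenAI endpoints are scoped to your own resource and deployment."""
    return (f"https://{resource}.openai.azure.com/openai/deployments/"
            f"{deployment}/completions?api-version={api_version}")

print(azure_openai_url("contoso-openai", "davinci-prod", "2022-12-01"))
```

Note that the Azure endpoint lives under your own resource name, which is what makes the regional deployment and private-endpoint controls described above possible.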
This picture below shows how data is processed within Azure OpenAI.
What are the available Customer controls for data retention in Azure OpenAI?
Training data, validation data, and training results can be stored through the Files API by uploading the training data used to fine-tune a model. This uploaded data is stored in Azure Storage, encrypted at rest by Microsoft-managed keys, and located within the same region as the resource. The data is logically isolated using the user's Azure subscription and API credentials.
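As a rough illustration of what that upload contains, here is a sketch of preparing fine-tuning training data locally. The file name and example records are made up; the JSONL shape (one JSON object with "prompt" and "completion" fields per line) is the format the fine-tuning endpoints expect, while the actual upload to the Files API is a separate network call not shown here.

```python
import json

# Sketch of preparing fine-tuning training data for the Files API.
# The records below are hypothetical examples.

def write_training_file(path: str, examples: list) -> int:
    """Write examples as JSONL, one JSON object per line; return the count."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")
    return len(examples)

examples = [
    {"prompt": "Classify sentiment: great product ->", "completion": " positive"},
    {"prompt": "Classify sentiment: broke in a day ->", "completion": " negative"},
]
print(write_training_file("train.jsonl", examples))  # 2
```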
To create our own fine-tuned version of the OpenAI models, the Fine-tunes API is used with training data uploaded via the Files API. The fine-tuned models created are also stored in Azure Storage in the same region, encrypted at rest, and logically isolated using the user's Azure subscription and API credentials. The user can delete these fine-tuned models using the DELETE API operation.
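The fine-tune lifecycle can be sketched as a pair of REST calls: one to start a fine-tuning job from an uploaded file, and one DELETE to remove the resulting model. The paths and api-version below are assumptions based on the Azure OpenAI endpoint pattern; consult the current API reference before relying on them.

```python
# Sketch of the fine-tune lifecycle as REST endpoint URLs. The paths and
# api-version are assumptions; no requests are actually sent here.

def azure_openai_base(resource: str) -> str:
    return f"https://{resource}.openai.azure.com/openai"

def create_fine_tune_url(resource: str, api_version: str) -> str:
    # POST a body like {"model": "curie", "training_file": "<file-id>"} here
    # to start a fine-tuning job from a previously uploaded training file.
    return f"{azure_openai_base(resource)}/fine-tunes?api-version={api_version}"

def delete_model_url(resource: str, model_id: str, api_version: str) -> str:
    # Issuing an HTTP DELETE against the model resource removes a
    # fine-tuned model you created.
    return f"{azure_openai_base(resource)}/models/{model_id}?api-version={api_version}"

print(create_fine_tune_url("contoso-openai", "2022-12-01"))
print(delete_model_url("contoso-openai", "curie-custom-01", "2022-12-01"))
```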
Text prompts, queries, and responses are stored temporarily by the Azure OpenAI Service for up to 30 days. The data is encrypted and can only be accessed by authorized engineers for debugging purposes in case of system failure or for investigating patterns of abuse and misuse. Additionally, prompts and completions that have been flagged for abuse or misuse can be used to improve the content filtering system.
Currently, Azure OpenAI can be considered a more enterprise-friendly version of the OpenAI APIs, providing control over where training data is processed and stored. Additionally, it allows for easy integration with other Cognitive Services within Microsoft Azure, and you can use existing network features for more control. With ChatGPT only just arriving in the Azure OpenAI ecosystem, I expect that this is only the beginning and that we will see more features here soon.