Generative AI – Summer Edition 2024 – What's new?

Even if many of us are in vacation mode, it is not exactly quiet around Generative AI, so I wanted to write a short update on what is happening on this front: what the cloud providers and LLM companies are doing, what is happening in the ecosystem, and what the future holds.

Firstly, there is a lot of development around "virtual agents", which in simple terms are just processes running on a machine or in a container, each with its own set of instructions, functions and a language model underneath. Examples of AI agent ecosystems are CrewAI and AutoGen from Microsoft.
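The pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not CrewAI or AutoGen code: the `llm()` function is a stub standing in for a real model call, and all names are made up for the example.

```python
# Minimal sketch of an "agent": instructions + tools + a language model underneath.
# llm() is a stub; a real agent would call a hosted or local model here.

def llm(prompt: str) -> str:
    # Pretend the model decides whether a tool is needed for this task.
    if "time" in prompt:
        return "CALL get_time"
    return "DONE"

def get_time() -> str:
    return "12:00"

class Agent:
    def __init__(self, instructions: str, tools: dict):
        self.instructions = instructions  # the agent's system prompt
        self.tools = tools                # functions it is allowed to call

    def run(self, task: str) -> str:
        # One step of the classic loop: ask the model, execute any tool it names.
        decision = llm(f"{self.instructions}\nTask: {task}")
        if decision.startswith("CALL "):
            tool_name = decision.split(" ", 1)[1]
            return self.tools[tool_name]()
        return decision

agent = Agent("You are a helpful assistant.", {"get_time": get_time})
print(agent.run("What is the time?"))  # → 12:00
```

Real frameworks add looping, memory and multi-agent hand-offs on top of this same core shape.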

However, there are multiple other frameworks here as well from the cloud providers, such as Microsoft Copilot Studio, AWS Bedrock Agents and Google Vertex AI Agents. Even Databricks has created its own Mosaic AI agent framework: Announcing Mosaic AI Agent Framework and Agent Evaluation | Databricks Blog

Earlier today Meta also launched Llama 3.1, consisting of different models, including the new Llama 3.1 405B, which is the largest-ever open-source LLM, according to Meta. This LLM is already available from the largest cloud providers, such as Google Vertex AI, AWS Bedrock and Azure AI Studio.

It is also available from NVIDIA NIM for running on-premises, and can be used through Ollama (as long as you have sufficient hardware). The LLM has larger context window support, excels at function calling, and also supports multi-modality and seven different languages.

Meta also published a technical paper which goes into detail on the LLM: https://scontent.forn3-1.fna.fbcdn.net/v/t39.2365-6/452387774_1036916434819166_4173978747091533306_n.pdf?_nc_cat=104&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=t6egZJ8QdI4Q7kNvgEdEGDa&_nc_ht=scontent.forn3-1.fna&oh=00_AYCTLRZtZwNDwtnohGAOscjYpOeJYDL351LGTtqHzV0_rg&oe=66A59A0D

The building blocks for creating an "enterprise" generative AI service have up until now mostly been a mix of different open-source components, and there is no standardized approach. Many use tools like Semantic Kernel from Microsoft or LangChain to build orchestration and RAG-based services.
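As a rough sketch of what such an orchestration layer does (retrieve relevant context, then assemble a grounded prompt), here is a framework-free toy version. Real LangChain or Semantic Kernel pipelines use embeddings and vector stores; the word-overlap scoring and sample documents here are stand-ins for illustration only.

```python
# Toy RAG pipeline: retrieve the most relevant document, then build a
# grounded prompt for the LLM. Word overlap stands in for vector similarity.

documents = [
    "Our refund policy allows returns within 30 days.",
    "Support is available on weekdays from 9 to 17.",
]

def _words(text: str) -> set:
    # Lowercase and strip basic punctuation before comparing.
    return set(text.lower().replace("?", "").replace(".", "").split())

def score(query: str, doc: str) -> int:
    # Crude relevance: count shared words between query and document.
    return len(_words(query) & _words(doc))

def retrieve(query: str) -> str:
    # Return the highest-scoring document as context.
    return max(documents, key=lambda d: score(query, d))

def build_prompt(query: str) -> str:
    context = retrieve(query)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is the refund policy?"))
```

The resulting prompt is what would be sent to the LLM; swapping the scoring function for an embedding model plus a vector database is essentially what the real frameworks orchestrate.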

Now Intel, together with others, wants to create a standardized approach, which they call the "Open Platform for Enterprise AI": Introducing The Open Platform for Enterprise AI (intel.com)

OpenAI also released GPT-4o mini as the standard LLM in ChatGPT. This model is a vast improvement over the old GPT-3.5 and is also a lot cheaper for token processing. It has now also become available as part of Azure OpenAI:

https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/

OpenAI's fastest model, GPT-4o mini is now available on Azure AI | Microsoft Azure Blog

Microsoft also announced fine-tuning for Phi-3-mini and Phi-3-medium: Announcing Phi-3 fine-tuning, new generative AI models, and other Azure AI updates to empower organizations to customize and scale AI applications | Microsoft Azure Blog

OpenAI also announced SearchGPT, their approach to an intelligent search engine, which will be quite similar to Perplexity: SearchGPT is a prototype of new AI search features | OpenAI

It should also be noted that Copilot for Microsoft 365 has now started to use GPT-4 Turbo, since the summary feature in Word now supports upwards of 80,000 words in a single document, and only GPT-4 Turbo and the later LLM models from OpenAI support this amount of information within the context window.
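As a back-of-the-envelope check (assuming the common rule of thumb of roughly 1.33 tokens per English word, which is a heuristic, not an exact figure), 80,000 words land just inside GPT-4 Turbo's 128k-token context window:

```python
# Rough context-window check using the ~1.33 tokens/word heuristic for English.
words = 80_000
tokens_per_word = 1.33                 # common rule of thumb, not exact
estimated_tokens = int(words * tokens_per_word)

print(estimated_tokens)                # ≈ 106,400 tokens
print(estimated_tokens <= 128_000)     # True: fits in a 128k context window
```

Earlier GPT-4 models with 8k or 32k windows would not come close to holding a document of that size.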

GitHub has now also started to use GPT-4o as the standard LLM for GitHub Copilot:

GitHub Copilot Enterprise on GPT-4o – The GitHub Blog

OpenAI has now also officially launched the ChatGPT desktop app for Mac, which can also use computer vision: ChatGPT on your desktop | OpenAI

Mistral also announced Mistral Large 2 yesterday. Mistral Large 2 has a 128k context window and supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript and Bash.

The interesting part is that Mistral Large 2 has even higher accuracy on function calling than GPT-4o, and also for some programming languages.

We already have such a large ecosystem of different language models that it gets a bit difficult to keep track of the different vendors, features, context size support, language support and other capabilities. Luckily, a new site called Artificial Analysis has created a nice chart that stays up to date on the different models and also shows benchmarks for them:

Model & API Providers Analysis | Artificial Analysis

After OpenAI's announcement of Sora earlier this year, which can create video from text, we have also seen a lot of new features and quality improvements from Runway (which also provides text-to-video) with their new release, Gen-3 Alpha:

Gen-3 Alpha | Runway (runwayml.com)

AWS also announced a wide range of new capabilities in their GenAI ecosystem during the AWS Summit in New York.

Firstly, they launched AWS App Studio, which allows you to create AWS-based applications using natural language. They also announced Amazon Q Developer, which allows you to talk directly with your source code, providing a good alternative to GitHub Copilot.

They also announced 

  • Vector search capabilities for MemoryDB
  • Amazon Bedrock has added a bunch of new data connectors that make it easy to connect third-party data sources for use in GenAI applications as part of a RAG application
  • Fine-tuning for Claude 3 Haiku

You can read more about the other announcements here:

AWS AI Top Announcements of the AWS Summit in New York, 2024 | AWS News Blog (amazon.com)

Some other releases also showcase how much development is happening here.

Stable Diffusion 3 Medium: Stable Diffusion 3 Medium — Stability AI (can be run locally)

Visual Studio Code AI Toolkit: Visual Studio Code AI Toolkit: How to Run LLMs locally (microsoft.com)

Claude 3.5 Sonnet: Introducing Claude 3.5 Sonnet \ Anthropic

GitHub Copilot knowledge bases: Managing Copilot knowledge bases – GitHub Enterprise Cloud Docs

Lamini Memory Tuning – Introducing Lamini Memory Tuning: 95% LLM Accuracy, 10x Fewer Hallucinations | Lamini – Enterprise LLM Platform

Phi-3-mini updates: microsoft/Phi-3-mini-128k-instruct · Hugging Face

Salesforce released their own LLM called xLAM, a micro LLM of only 1 billion parameters with function calling that outperforms larger LLMs on certain tasks: Marc Benioff on X: "Meet Salesforce Einstein "Tiny Giant." Our 1B parameter model xLAM-1B is now the best micro model for function calling, outperforming models 7x its size, including GPT-3.5 & Claude. On-device agentic AI is here. Congrats Salesforce Research! Paper: https://t.co/SrntYvgxR5 https://t.co/pPgIzk82xT" / X

Microsoft GraphRAG GraphRAG: New tool for complex data discovery now on GitHub – Microsoft Research

Ollama 0.2 available ollama on X: “Ollama 0.2 is here! Concurrency is now enabled by default. https://t.co/UvvgrIeCjv This unlocks 2 major features: Parallel requests Ollama can now serve multiple requests at the same time, using only a little bit of additional memory for each request. This enables use cases https://t.co/XlCXqnDGaQ” / X

Groq now supports Whisper (speech-to-text) with an insanely high rate of token generation: Groq Inc on X: "👀We quietly rolled out Whisper V3 Large in GroqCloud. Now Developers can build using the Speech-to-Text capabilities of Whisper with our speed. Try it right away yourself, it's built into GroqChat now for everyone to experience. Build on! https://t.co/z4bYbmtQBE" / X

Kyutai is a new French AI company with an open-source alternative to GPT-4o that supports multimodality: Lior⚡ on X: "Kyutai, a french AI lab with $300M in funding, just unveiled Moshi, an open-source GPT-4o competitor. Moshi is a real-time multimodal model that can listen, hear, and speak. Code, model, and paper will be release soon. @kyutai_labs https://t.co/Dyt1ik3zbZ" / X

Meta released new research to improve how LLMs generate tokens. By default, most LLMs try to "predict", or calculate, the most likely next token in a given context; Meta wants to improve quality by having LLMs predict multiple tokens at once instead of a single one.

Multi-token prediction: facebook/multi-token-prediction · Hugging Face
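To illustrate the idea (this is not Meta's actual implementation, just a toy comparison of decode steps): a standard model emits one token per forward pass, while a model with multi-token prediction heads can emit several per pass, cutting the number of passes needed for a sequence.

```python
# Toy illustration of multi-token prediction: number of forward passes
# needed to emit a sequence when predicting 1 vs 4 tokens per pass.
import math

def forward_passes(sequence_length: int, tokens_per_pass: int) -> int:
    # Each pass emits up to tokens_per_pass tokens; round up for the tail.
    return math.ceil(sequence_length / tokens_per_pass)

print(forward_passes(100, 1))  # standard next-token decoding: 100 passes
print(forward_passes(100, 4))  # 4-token prediction: 25 passes
```

The research question is whether the extra tokens per pass can be predicted without losing quality; the pass-count arithmetic above is the easy part.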

Gemma 2 from Google, their next release of open-source LLMs: Google launches Gemma 2, its next generation of open models (blog.google)

How to evaluate LLMs in enterprises: Evaluating large language models in business | Google Cloud Blog

Gemini 1.5 Flash Vertex AI offers enterprise-ready generative AI | Google Cloud Blog
