Anthropic's Claude 3 Opus Large Language Model (LLM) outperformed OpenAI's GPT-4 (the model behind ChatGPT) for the first time in Chatbot Arena, a popular platform where users evaluate the performance of chatbots. “The king is dead,” software developer Nick Dobos wrote on social network X.
Chatbot Arena users visiting the site are asked to enter a query, after which two results from unspecified language models are shown – the person must choose which result they like best. After making thousands of comparisons, Chatbot Arena populates an updated ranking table. The site is run by the Large Model Systems Organization (LMSYS ORG), a research organization dedicated to open AI models.
“For the first time, an AI model not from OpenAI is at the top of the ranking: Opus for complex tasks, Haiku for options when you need it cheap and fast. This is encouraging – everyone will benefit from competition among developers. However, GPT-4 has been around for more than a year, and competitors are only now catching up with it,” independent AI researcher Simon Willison commented on the event.
There are currently four versions of GPT-4 in the Chatbot Arena rankings, as the model's output has changed with each update, and some users prefer specific versions or use all of them for more consistent results. GPT-4 appeared in Chatbot Arena on May 10, 2023, a week after the rankings launched, and since then, various versions of GPT-4 have consistently ranked at the top.
Chatbot Arena is valued by AI researchers for its ability to more or less objectively evaluate the effectiveness of chatbots, which is not easy, and the key factor here is the number of assessments that add up to the overall picture. Subjective assessments play a significant role in the field of AI, where the modeler can select specific metrics for advertising purposes. “I recently spent a lot of time programming with the Claude 3 Opus AI model, and it absolutely crushed GPT-4,” AI software developer Anton Bacaj wrote in X.
The success of Claude 3 from Anthropic, which is rushing to the top of the rankings, has already prompted some users to switch to it from GPT-4. Meanwhile, Gemini Advanced from Google is gaining popularity. OpenAI's position has been shaken, but the company is not resting on its laurels and is preparing new models, including GPT-5.
If you notice an error, select it with the mouse and press CTRL+ENTER.