Claude 3 Opus knocks GPT-4 from first place in the ranking of language models

Anthropic's Claude 3 Opus Large Language Model (LLM) outperformed OpenAI's GPT-4 (the model behind ChatGPT) for the first time in Chatbot Arena, a popular platform where users evaluate the performance of chatbots. “The king is dead,” software developer Nick Dobos wrote on social network X.

Image source: Anthropic

Chatbot Arena users visiting the site are asked to enter a query, after which two results from unspecified language models are shown – the person must choose which result they like best. After making thousands of comparisons, Chatbot Arena populates an updated ranking table. The site is run by the Large Model Systems Organization (LMSYS ORG), a research organization dedicated to open AI models.

“For the first time, an AI model not from OpenAI is at the top of the ranking: Opus for complex tasks, Haiku for options when you need it cheap and fast. This is encouraging – everyone will benefit from competition among developers. However, GPT-4 has been around for more than a year, and competitors are only now catching up with it,” independent AI researcher Simon Willison commented on the event.

There are currently four versions of GPT-4 in the Chatbot Arena rankings, as the model's output has changed with each update, and some users prefer specific versions or use all of them for more consistent results. GPT-4 appeared in Chatbot Arena on May 10, 2023, a week after the rankings launched, and since then, various versions of GPT-4 have consistently ranked at the top.

Chatbot Arena is valued by AI researchers for its ability to more or less objectively evaluate the effectiveness of chatbots, which is not easy, and the key factor here is the number of assessments that add up to the overall picture. Subjective assessments play a significant role in the field of AI, where the modeler can select specific metrics for advertising purposes. “I recently spent a lot of time programming with the Claude 3 Opus AI model, and it absolutely crushed GPT-4,” AI software developer Anton Bacaj wrote in X.

The success of Claude 3 from Anthropic, which is rushing to the top of the rankings, has already prompted some users to switch to it from GPT-4. Meanwhile, Gemini Advanced from Google is gaining popularity. OpenAI's position has been shaken, but the company is not resting on its laurels and is preparing new models, including GPT-5.

If you notice an error, select it with the mouse and press CTRL+ENTER.

Claude 3 Opus knocks GPT-4 from first place in the ranking of language models

Will Dragon's Dogma 2 have DLC or expansions? Capcom wants fans' opinion

No Man's Sky – Orbital update released

Aroged

Related Posts

play5 06/24 mit Coverstory zu Rise of the Ronin

The best cheap gaming monitors for great gaming even in 4K

Honkai Star Rail Coming to GeForce Now

What will you play this weekend? Stellar Blade, Manor Lords, Sand Land or something else?

No 5th season of Stranger Things for me: Netflix wants to cancel me because I reject the new prices – upcoming price increase is causing trouble

No Man's Sky – Orbital update released

Leave a Reply Cancel reply

Premium Content

Among all the deaths in Game of Thrones, the showrunners have a clear preference and it is one of the most violent

Aroged: IM Motors taught an electric car to pass the “moose test” at a speed of 71 km/h without a driver

Scientists tracked down a once-in-a-lifetime evolutionary event

Browse by Category

Categories

Recent Posts

Claude 3 Opus knocks GPT-4 from first place in the ranking of language models

Will Dragon's Dogma 2 have DLC or expansions? Capcom wants fans' opinion

No Man's Sky – Orbital update released

Related Posts

Leave a Reply Cancel reply

Premium Content

Browse by Category

Browse by Tags

Categories

Browse by Tag

Recent Posts