Contractors working to improve the quality of responses from Google’s Gemini AI chatbot are comparing them against responses from Anthropic’s competing chatbot Claude, TechCrunch reports, citing internal company correspondence. Google did not answer TechCrunch’s question about whether it obtained permission to use Claude in testing Gemini.
Companies typically evaluate their AI models against competitors’ offerings using industry benchmarks, rather than instructing contractors to compare model outputs with those of rival AI systems directly.
Google’s contractors working on Gemini must rate each model response against several criteria, such as truthfulness and level of detail. According to the correspondence cited by TechCrunch, they are given up to 30 minutes per prompt to determine whose answer is better, Gemini’s or Claude’s.
The contractors report that Claude’s responses are more safety-focused than Gemini’s. “Claude’s safety settings are the strictest” among AI models, one contractor noted in an internal chat. In some cases, Claude declined to respond to prompts it considered unsafe, such as a request to role-play another AI assistant. In another case, Claude avoided answering a prompt entirely, while Gemini’s answer was flagged as a “gross violation of safety rules” because it included “nudity and bondage.”
Shira McNamara, a spokesperson for Google DeepMind, which develops Gemini, did not answer TechCrunch’s question about whether Google had received Anthropic’s permission to use Claude. She clarified that DeepMind “compares model outputs” for evaluation purposes but does not train Gemini on Anthropic’s models. “Any suggestion that we used Anthropic models to train Gemini is inaccurate,” McNamara said.