Source: AI Faner
xAI today released its new-generation large language model, Grok-3, along with a lighter version, Grok-3 mini. The latest benchmark results show Grok-3 holding a clear lead in head-to-head comparisons with DeepSeek.
On the mathematics benchmark (AIME'24), Grok-3 scored 52 points, well above DeepSeek-V3's 39. On the scientific knowledge benchmark (GPQA), Grok-3 scored 75 to DeepSeek-V3's 65. On the programming benchmark (LCB Oct-Feb), Grok-3 again came out ahead, 57 points to 36.
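
For readers who want to check the gaps themselves, the short sketch below tabulates the three scores reported above and computes the point margin on each benchmark. The names `scores` and `margins` are illustrative choices, not part of any xAI or DeepSeek tooling, and the figures are simply those quoted in this article.

```python
# Scores as reported in the article: (Grok-3, DeepSeek-V3).
scores = {
    "AIME'24": (52, 39),
    "GPQA": (75, 65),
    "LCB Oct-Feb": (57, 36),
}


def margins(table):
    """Return {benchmark: Grok-3 score minus DeepSeek-V3 score}."""
    return {name: grok - deepseek for name, (grok, deepseek) in table.items()}


for name, gap in margins(scores).items():
    print(f"{name}: Grok-3 leads by {gap} points")
```

On these figures the lead is 13 points in mathematics, 10 in science, and 21 in programming, so the widest gap is on the coding benchmark.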

On the most recent AIME 2025 test, the Grok-3 Reasoning Beta version scored 93 points on the composite measure of reasoning and compute time, and the lighter Grok-3 mini reached 90. By comparison, DeepSeek-R1 scored 75 points and Gemini-2 Flash Thinking only 54. The result further underlines Grok-3's advantage in complex mathematical reasoning and computational efficiency.

Notably, DeepSeek-R1, DeepSeek's recently released reasoning model, also failed to surpass Grok-3 on the other reasoning benchmarks. In mathematical reasoning, Grok-3 scored 93 points to DeepSeek-R1's 73; in scientific reasoning, 85 to 74; and in programming reasoning, 79 to 65.

In addition, in the LMSYS Chatbot Arena evaluation, Grok-3 scored about 1,400 points, putting it ahead not only of the DeepSeek series but also of other mainstream large models, including GPT-4 and Claude.

These figures show that although DeepSeek has gained strong momentum over the past few months, Grok-3 still leads in overall performance. Its advantage is most pronounced in mathematical reasoning and computational efficiency, which reflects xAI's technical strength in model research and development and also shows how fierce competition in the AI field has become.