
Benchmark Scale

13 LLMs × 20 languages × 3 samples = 780 total translations
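The total comes from treating each combination of model, language, and sample as one translation job in a full grid. Below is a minimal Python sketch. It uses hypothetical placeholder labels, not the benchmark's actual identifiers:

```python
from itertools import product

# Hypothetical placeholder labels standing in for the benchmark's dimensions.
models = [f"model_{i}" for i in range(13)]
languages = [f"lang_{j}" for j in range(20)]
samples = [f"sample_{k}" for k in range(3)]

# Each (model, language, sample) triple is one translation job.
jobs = list(product(models, languages, samples))
print(len(jobs))  # 780
```

The full Cartesian product means every model translates every sample into every language, so the count is simply 13 × 20 × 3.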

13 LLMs

Recent flagships and popular models.

Claude Opus 4.1
Claude Sonnet 4
DeepSeek Chat v3
Gemini 2.5 Flash
Gemini 2.5 Pro
Google Gemma 3 12B IT
GPT-4o
GPT-5
Grok 4
Grok Code Fast 1
Meta Llama 3.3 70B
Mistral Nemo
Qwen3 30B A3B

20 Languages

Global linguistic diversity spanning major world regions.

Arabic (Saudi Arabia)
Chinese (China)
Czech (Czech Republic)
Dutch (Netherlands)
French (France)
German (Germany)
Greek (Greece)
Hebrew (Israel)
Indonesian (Indonesia)
Italian (Italy)
Japanese (Japan)
Korean (South Korea)
Polish (Poland)
Portuguese (Portugal)
Russian (Russia)
Spanish (Spain)
Swedish (Sweden)
Thai (Thailand)
Turkish (Turkey)
Vietnamese (Vietnam)

3 Text Samples

6 Scoring Criteria

GPT-5 evaluates each translation on the following dimensions.

Accuracy
Fluency
Style
Completeness
Cultural
Technical
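The page does not say how the six criterion scores are recorded or combined. Below is a minimal Python sketch of one possible per-translation score record. It rests on two assumptions: the judge returns one numeric score per criterion, and the overall score is their unweighted mean. The class name `CriteriaScores`, the 0 to 10 scale, and the averaging rule are all hypothetical.

```python
from dataclasses import dataclass, fields
from statistics import mean


@dataclass
class CriteriaScores:
    """Hypothetical judge scores for one translation, one field per criterion."""
    accuracy: float
    fluency: float
    style: float
    completeness: float
    cultural: float
    technical: float

    def overall(self) -> float:
        # Assumption: an unweighted mean of all six criteria.
        return mean(getattr(self, f.name) for f in fields(self))


# Illustrative values on an assumed 0-10 scale, not real benchmark results.
scores = CriteriaScores(
    accuracy=9.0, fluency=8.0, style=7.0,
    completeness=10.0, cultural=8.0, technical=9.0,
)
print(scores.overall())  # 8.5
```

A weighted mean is an equally plausible design, for example one that weights accuracy more heavily than style. Only `overall()` would need to change.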