Writer Palmyra LLM is designed to support over 30 languages, including Arabic, French, Spanish, Hindi, Simplified Chinese, Traditional Chinese, and more. This page provides an overview of these capabilities, performance benchmarks, and example prompts that show how to use these features.

When it comes to multi-language capabilities, there are two primary categories to consider: generation and translation. Generation typically refers to the ability to understand and create content, answer questions, and converse, all within a single language. Translation typically refers to the ability to convert text into or out of English, so either the input or the output language is English.
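The distinction between the two categories can be stated as a small rule. The Python sketch below is illustrative only: `Task` and `classify_task` are hypothetical names, not part of any Writer tooling, and rejecting language pairs that involve no English is our own assumption, since the definitions above do not cover them.

```python
from enum import Enum


class Task(Enum):
    GENERATION = "generation"
    TRANSLATION = "translation"


def classify_task(input_language: str, output_language: str) -> Task:
    """Classify a request using this page's two categories (hypothetical helper).

    Generation: input and output share one language.
    Translation: the languages differ and one of them is English.
    """
    src = input_language.strip().casefold()
    dst = output_language.strip().casefold()
    if src == dst:
        return Task.GENERATION
    if "english" in (src, dst):
        return Task.TRANSLATION
    # Assumption: pairs without English fall outside both definitions.
    raise ValueError(f"{input_language} -> {output_language} does not involve English")


print(classify_task("Arabic", "Arabic"))   # Task.GENERATION
print(classify_task("English", "French"))  # Task.TRANSLATION
```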

On this page, we present two of the many benchmarks we use to evaluate the multi-language performance of our Palmyra LLMs. Writer Palmyra has the highest performance of any production LLM in the Holistic Evaluation of Language Models (HELM), an LLM evaluation framework developed by Stanford CRFM as a living benchmark for the community that is continuously updated with new scenarios, metrics, and models. Although few benchmarks exist for evaluating text generation and translation in languages other than English, Palmyra has achieved some of the highest scores on both MMLU and BLEU in those languages.

One benchmark that Writer uses to evaluate text generation performance is MMLU (Massive Multitask Language Understanding). MMLU covers 57 tasks, including elementary mathematics, U.S. history, computer science, and law. To attain high accuracy on this test, models must possess extensive world knowledge and problem-solving ability.

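One benchmark that Writer uses to evaluate text translation performance is BLEU (Bilingual Evaluation Understudy), which scores a machine translation by its n-gram overlap with one or more reference translations. On the 0 to 100 scale used in the table below, a BLEU score above 60 is commonly interpreted as translation quality that is often better than human translation.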
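For readers unfamiliar with BLEU, the Python sketch below computes a sentence-level score from the standard ingredients: clipped n-gram precisions for n = 1 to 4, their geometric mean, and a brevity penalty for candidates shorter than the reference, scaled to 0 to 100 as in the table. It assumes whitespace tokenization and applies no smoothing, and it is not the tooling Writer uses; published results are normally computed at corpus level with a standard implementation such as sacreBLEU, so numbers from this sketch are not comparable to the table.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, references: List[str], max_n: int = 4) -> float:
    """Sentence-level BLEU on a 0-100 scale (whitespace tokens, no smoothing)."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand or not refs:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts: Counter = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference length closest to the candidate's.
    c = len(cand)
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return 100 * bp * math.exp(sum(log_precisions) / max_n)


print(bleu("the cat sat on the mat", ["the cat sat on the mat"]))
print(bleu("the cat sat on the", ["the cat sat on the mat"]))
```

An exact match scores 100. The five-token candidate "the cat sat on the" keeps perfect precision against the six-token reference but takes a brevity penalty of exp(1 - 6/5), for a score of about 81.9.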

Palmyra’s core strength is text generation, but translation use cases are also possible. Exercise caution, however, in languages where benchmarks are not yet established (we are actively working on establishing them). In the interest of transparency, we want potential users to be aware of this caveat.

Therefore, all output from Writer LLMs should be reviewed by a human expert before it is used. We are continuously evaluating and refining our capabilities, and we are committed to learning with our customers.

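| Language | MMLU/MLMM | BLEU (source/English) |
| --- | --- | --- |
| Arabic | 68.9 | 61.2 |
| Bengali | 63.3 | 54.4 |
| Bulgarian | 76.3 | 64.2 |
| Chinese (Simplified) | 71.7 | 63.8 |
| Chinese (Traditional) | 73.7 | 57.0 |
| Croatian | 64.9 | 66.4 |
| Czech | - | 52.5 |
| Danish | 77.7 | 70.5 |
| Dutch | 73.6 | 73.9 |
| English | 70.2 | - |
| Finnish | - | 68.9 |
| French | 69.1 | 63.1 |
| German | 70.4 | 71.3 |
| Greek | - | 60.4 |
| Hebrew | - | 67.8 |
| Hindi | 77.9 | 68.4 |
| Hungarian | 67.7 | 65.3 |
| Indonesian | 67.8 | 63.5 |
| Italian | 72.5 | 70.9 |
| Japanese | 73.5 | 66.8 |
| Korean | - | 56.8 |
| Lithuanian | - | 59.3 |
| Polish | - | 60.6 |
| Portuguese | - | 66.2 |
| Romanian | 70.9 | 67.6 |
| Russian | 75.1 | 65.2 |
| Spanish | 72.5 | 79.3 |
| Swahili | - | 62.8 |
| Swedish | - | 63.2 |
| Thai | - | 54.7 |
| Turkish | 64.1 | 57.5 |
| Ukrainian | 75.2 | 68.0 |
| Vietnamese | 72.5 | 60.3 |

A dash (-) indicates that no score has been established for that language.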
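The caution about languages where benchmarks are not yet established maps directly onto the dashes in this table. As a small convenience, the Python sketch below copies the table into a dictionary and lists the languages that lack a score for each metric. The names `SCORES` and `languages_without` are our own illustrative choices, not part of any Writer tooling.

```python
from typing import Dict, List, Optional, Tuple

# Scores copied from the table: (MMLU/MLMM, BLEU). None stands for a dash.
SCORES: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "Arabic": (68.9, 61.2),
    "Bengali": (63.3, 54.4),
    "Bulgarian": (76.3, 64.2),
    "Chinese (Simplified)": (71.7, 63.8),
    "Chinese (Traditional)": (73.7, 57.0),
    "Croatian": (64.9, 66.4),
    "Czech": (None, 52.5),
    "Danish": (77.7, 70.5),
    "Dutch": (73.6, 73.9),
    "English": (70.2, None),
    "Finnish": (None, 68.9),
    "French": (69.1, 63.1),
    "German": (70.4, 71.3),
    "Greek": (None, 60.4),
    "Hebrew": (None, 67.8),
    "Hindi": (77.9, 68.4),
    "Hungarian": (67.7, 65.3),
    "Indonesian": (67.8, 63.5),
    "Italian": (72.5, 70.9),
    "Japanese": (73.5, 66.8),
    "Korean": (None, 56.8),
    "Lithuanian": (None, 59.3),
    "Polish": (None, 60.6),
    "Portuguese": (None, 66.2),
    "Romanian": (70.9, 67.6),
    "Russian": (75.1, 65.2),
    "Spanish": (72.5, 79.3),
    "Swahili": (None, 62.8),
    "Swedish": (None, 63.2),
    "Thai": (None, 54.7),
    "Turkish": (64.1, 57.5),
    "Ukrainian": (75.2, 68.0),
    "Vietnamese": (72.5, 60.3),
}

# Column position of each metric inside the score tuples.
_COLUMN = {"mmlu": 0, "bleu": 1}


def languages_without(metric: str) -> List[str]:
    """Return, alphabetically, the languages with no score for the metric."""
    index = _COLUMN[metric]
    return sorted(lang for lang, scores in SCORES.items() if scores[index] is None)


print("No MMLU/MLMM score:", ", ".join(languages_without("mmlu")))
print("No BLEU score:", ", ".join(languages_without("bleu")))
```

Running it lists the eleven languages with no MMLU/MLMM score (Czech through Thai) and shows English as the only language without a BLEU score.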

Dialect support

Writer Palmyra LLM can also generate output in specific regional dialects. The best results come from prompts with the following characteristics:

  1. The prompt itself is in the desired language and dialect
  2. The prompt clearly describes the dialect (e.g. “It’s essential that you use the Spanish spoken in Spain.”)
  3. The prompt gives specific examples of the dialect, covering both vocabulary and grammatical differences

The following example is written in English for simplicity’s sake, so it does not follow the first guideline; otherwise, it is a model prompt for requesting a translation into the Spanish spoken in Spain.

Hello, good afternoon! I need you to help me translate the following text. It’s essential that you use the Spanish spoken in Spain. For example, you should use words like “coche” and “patata” instead of “carro” and “papa.” Additionally, you need to pay attention to grammatical differences, such as the use of “voy a por” (Spain) instead of “voy por” (Latin America), or sentence structures like “hoy he comido una manzana” instead of “hoy comí una manzana.” I prefer that you use “vosotros” instead of “ustedes” for the plural “you,” unless it’s necessary to write very formally. Here is the text to be translated:
[text you want translated]
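If you send many dialect-specific requests, it can help to assemble the prompt from its parts so that each guideline is always covered. The Python sketch below is a hypothetical helper, not a Writer API: `build_dialect_translation_prompt` and its parameters are our own names, and the fixed connecting phrases are English only to mirror the example above. Per the first guideline, a real prompt should have every part, including those phrases, written in the target language and dialect.

```python
from typing import List, Tuple


def build_dialect_translation_prompt(
    task: str,
    dialect_instruction: str,
    vocabulary: List[Tuple[str, str]],
    grammar: List[Tuple[str, str]],
    text: str,
) -> str:
    """Assemble a translation prompt that follows the dialect guidelines.

    Hypothetical helper. Each (use, avoid) pair contrasts the target dialect's
    form with the form to avoid.
    """
    sentences = [task, dialect_instruction]  # guideline 2: name the dialect
    if vocabulary:  # guideline 3: vocabulary differences
        pairs = ", ".join(f'"{use}" instead of "{avoid}"' for use, avoid in vocabulary)
        sentences.append(f"For example, use words like {pairs}.")
    if grammar:  # guideline 3: grammatical differences
        pairs = ", ".join(f'"{use}" instead of "{avoid}"' for use, avoid in grammar)
        sentences.append(f"Also follow grammatical patterns such as {pairs}.")
    sentences.append("Here is the text to be translated:")
    return " ".join(sentences) + "\n" + text


example = build_dialect_translation_prompt(
    task="I need you to help me translate the following text.",
    dialect_instruction="It's essential that you use the Spanish spoken in Spain.",
    vocabulary=[("coche", "carro"), ("patata", "papa")],
    grammar=[("voy a por", "voy por"), ("hoy he comido una manzana", "hoy comí una manzana")],
    text="[text you want translated]",
)
print(example)
```

Keeping the vocabulary and grammar contrasts as explicit pairs makes it easy to reuse the same dialect description across requests while swapping only the text to translate.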

Basic prompt examples

Translation

Read the content of this source. Provide me with a translation of all its contents in French: https://writer.com/blog/ai-guardrails/

Text generation

Please write a blog post in Arabic about the importance of productivity for small businesses.

Native multi-language support

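人工知能の歴史と大規模言語モデルの開発について、短い段落を書いてください。読者はビジネステクノロジーニュースに興味がありますが、技術的なバックグラウンドはありません。技術的な概念を8年生の読解レベルで簡潔に説明してください。

(English translation: Please write a short paragraph about the history of artificial intelligence and the development of large language models. The readers are interested in business technology news but have no technical background. Explain technical concepts concisely at an 8th-grade reading level.)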