This guide explains the Text generation endpoint, which can be thought of as asking the Palmyra LLM a single question.

This guide is intended to help you interact with the Text Generation API effectively. Below, you'll find an overview of how to make requests to the endpoint and handle the responses.

Your API key can be generated using these steps.

Endpoint overview

This endpoint is designed to generate text based on the input parameters provided. You can specify various aspects of the completion request, such as the model, prompt, maximum number of tokens, temperature, and streaming behavior.

The response will include the generated text along with the model used. The structure differs based on the value of the stream parameter.

Usage example

Here’s how you can use the endpoint in a practical scenario:


Create a request

Use your preferred HTTP client to send a POST request to the endpoint with the JSON payload below.

curl --location '' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <token>' \
--data '{
  "model": "palmyra-x-v2",
  "prompt": "Tell me a story",
  "max_tokens": 2048,
  "temperature": 0.7,
  "stream": false

Handle the response

For non-streaming responses, parse the JSON to access the generated text. For streaming responses, handle each chunk as it arrives and concatenate or process it as needed.

  "choices": [
      "text": "Generated story here..."
  "model": "palmyra-x-v2"

Error handling

Be sure to handle potential errors, such as timeouts or model errors, gracefully.
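One common pattern is to wrap the request in retry logic for transient failures. The sketch below is illustrative: `send_request` is a hypothetical stand-in for your HTTP client call, and the exception type and retry counts are assumptions you should tune for your client library.

```python
import time

def call_with_retries(send_request, max_attempts=3, base_delay=1.0):
    """Retry transient failures (e.g. timeouts) with a pause between tries.

    `send_request` is any zero-argument callable that performs the HTTP
    request; map your client's timeout exception onto TimeoutError.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send_request()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(base_delay * attempt)  # wait a bit longer each retry
```

Non-transient errors, such as an invalid model name or a malformed payload, should surface immediately rather than being retried.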


Rate limiting

Be mindful of any rate limits that apply to your plan, and back off before retrying when requests are rate limited, to avoid service disruptions.
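A simple approach is exponential backoff on rate-limited responses. This sketch assumes the service signals rate limiting with HTTP 429, a common convention worth confirming in the API reference; `send_request` is a hypothetical callable returning a `(status_code, body)` pair.

```python
import time

def send_with_backoff(send_request, max_attempts=5, base_delay=1.0):
    """Exponentially back off whenever the service reports rate limiting."""
    delay = base_delay
    for _ in range(max_attempts):
        status, body = send_request()
        if status != 429:  # assumed rate-limit status code
            return status, body
        time.sleep(delay)
        delay *= 2  # double the wait after each rate-limited attempt
    raise RuntimeError("rate limited: exceeded max_attempts")
```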

By following this guide, you can integrate the Text Generation endpoint into your applications and use its text generation capabilities for purposes such as content creation and user interaction.