On-Prem Model Deployment (through GitHub)

Below is a sample guide on how to deploy Writer's Palmyra-X (40B) model. The same step-by-step process applies to all of our other models.

Prerequisites:

  • A virtual machine (VM) with 4x A100 80GB GPUs to run the model (you can verify the GPUs are visible with the check below)
  • Python 3.6 or higher
  • Git
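
Before you begin, you can optionally confirm that all four GPUs are visible from Python. This is only a sanity check, not part of the deployment itself; it assumes PyTorch is already installed on the VM.

import torch

print("CUDA available:", torch.cuda.is_available())
print("GPUs visible:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")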

Follow these steps to deploy the Palmyra-X model:

Step 1: Create a virtual environment
1.1. Open a terminal and navigate to your desired working directory.
1.2. Run the following commands to create a virtual environment and activate it:

python -m venv palmyra-x-env
source palmyra-x-env/bin/activate  # On Linux and macOS
palmyra-x-env\Scripts\activate     # On Windows

Step 2: Clone the Palmyra-X repository
2.1. Run the following command to clone the Palmyra-X GitHub repository:

git clone https://github.com/writerai/palmyra-x.git

2.2. Change your working directory to the cloned repository folder:

cd palmyra-x

Step 3: Install required packages
3.1. Run the following command to install the necessary packages:

pip install -r requirements.txt

Step 4: Run the Palmyra-X model
4.1. Create a Python script (e.g., run_palmyra_x.py) and paste the provided usage code into the script.
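
The exact usage code is provided with the repository, so treat that as the source of truth. As an illustration only, a minimal script built on the Hugging Face transformers library (with accelerate installed) might look like the sketch below; the checkpoint name Writer/palmyra-x, the prompt, and the generation settings are placeholder assumptions, not the repository's actual code.

# run_palmyra_x.py -- illustrative sketch only; replace with the provided usage code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Writer/palmyra-x"  # placeholder assumption, not a confirmed model ID

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision to fit the 40B weights
    device_map="auto",          # shard the model across the available GPUs
)

instruction = "Describe the benefits of deploying a model on-prem."
inputs = tokenizer(instruction, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))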

4.2. Save the script and run it using the following command:

python run_palmyra_x.py

4.3. The script will download the Palmyra-X model and generate a response for the given instruction. You should see the output in the terminal.

Step 5: (Optional) Deploy the model as a REST API
5.1. Install FastAPI and Uvicorn by running the following command:

pip install fastapi uvicorn

5.2. Create a new Python script (e.g., palmyra_x_api.py) and paste the following code into the script:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InstructionInput(BaseModel):
    instruction: str

@app.post("/generate/")
async def generate_response(input_data: InstructionInput):
    instruction = input_data.instruction
    # Replace the following line with the Palmyra-X model code
    response = f"Generated response for: {instruction}"
    return {"response": response}

5.3. Replace the response = f"Generated response for: {instruction}" line with the Palmyra-X model code.
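
As a rough illustration of what that integration could look like, the sketch below loads the model once at startup and reuses it for every request; as in Step 4, the transformers-based loading code and the Writer/palmyra-x checkpoint name are placeholder assumptions rather than the repository's actual code.

# palmyra_x_api.py -- illustrative sketch; substitute the real Palmyra-X model code.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Writer/palmyra-x"  # placeholder assumption

# Load the model once at startup rather than on every request.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)

app = FastAPI()

class InstructionInput(BaseModel):
    instruction: str

@app.post("/generate/")
async def generate_response(input_data: InstructionInput):
    inputs = tokenizer(input_data.instruction, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return {"response": response}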

5.4. Save the script and run the API server using the following command:

uvicorn palmyra_x_api:app --host 0.0.0.0 --port 8000

5.5. The REST API is now running and accessible at http://localhost:8000. You can send POST requests to the /generate/ endpoint with an instruction to receive generated responses.
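
For example, a minimal client call from Python (using the requests package) looks like this:

import requests

resp = requests.post(
    "http://localhost:8000/generate/",
    json={"instruction": "Write a short product description for a smart kettle."},
)
print(resp.json()["response"])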

With these steps, you have successfully deployed the Palmyra-X model on your virtual machine. You can further customize the deployment to suit your specific requirements.