Deploying HuggingFace Models to Amazon SageMaker

Sagemaker Notebook Setup

For this example, we’ll be working with traditional SageMaker notebook instances. First go to SageMaker on the AWS Console and click on Notebook Instances, where you’ll have the option to create one. Here you can select an instance with appropriate computing power. Since this example is not compute-intensive, a cheaper ml.t3.medium will suffice.

Next you’ll be required to create a role, which gives your instance permissions to work with other services if necessary. For this use case, keep the default execution role with SageMaker full access, as we will not be working with any other AWS services. Once the Notebook Instance is created, you’ll be given an option to open a Jupyter Lab environment to work in. There you can create a notebook with an appropriate kernel; the Python 3 kernel is adequate for this example. Now we have a notebook where we can deploy our HF model.

HF Deployment on SageMaker

To access the model that we are working with, go to the following HF Hub link. This model is pre-trained and can complete a variety of tasks; we’ll be using it specifically for text classification. What’s awesome about using pre-trained models from the Hub is that they already come with instructions for deploying on SageMaker, since the two are integrated. On the model’s Hub page, you’ll notice a Deploy button in the top right; among its options is Amazon SageMaker.

After clicking on SageMaker, you can pick the task type (text classification) and for configuration pick AWS. This will provide the boilerplate code that you can utilize in your Notebook Instance to deploy this specific model.

Import and Hub configuration

The HF Model ID and Task are how the HuggingFace container understands which model we are working with and what problem it is trying to solve. We can then define the model using the HuggingFaceModel class built into the SageMaker Python SDK.
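As a sketch, the Hub configuration is a plain dictionary of environment variables passed to the HuggingFace container. The model ID below is a placeholder assumption — substitute the ID shown on the Hub page of the model you selected.

```python
# Hub configuration telling the HuggingFace container which model to load
# from the Hub and which task pipeline to run for inference.
hub = {
    "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",  # placeholder; use your model's ID
    "HF_TASK": "text-classification",
}
```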

Create HF SageMaker Model
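A minimal sketch of creating the model object, assuming a `hub` dictionary with the HF Model ID and Task has already been defined. The framework versions below are assumptions — use a combination listed in the boilerplate from the Hub page or in the SageMaker Python SDK documentation.

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# The notebook instance's execution role, which SageMaker uses for permissions.
role = sagemaker.get_execution_role()

# Define the HuggingFace model; env points the container at our Hub model.
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.26",  # assumed versions; match a supported
    pytorch_version="1.13",       # HuggingFace DLC combination
    py_version="py39",
)
```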

After creating the HuggingFace Model, the final and simplest step is deploying it for inference with a SageMaker real-time endpoint.

Deploy HuggingFace Model to Real-Time Endpoint
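A sketch of the deployment call, assuming `huggingface_model` is the HuggingFaceModel object created in the previous step. The instance type here is an assumption — a CPU instance such as ml.m5.xlarge is a reasonable default for a model of this size.

```python
# Deploy the model to a real-time endpoint; this returns a Predictor
# we can use to send inference requests.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",  # adjust for your latency/cost needs
)
```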

After the endpoint is deployed successfully, we can quickly test some sample inferences.

Inference with SageMaker
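A sketch of a sample invocation, assuming `predictor` is the object returned by `deploy()` above. The HuggingFace container expects a JSON payload with an "inputs" key for text-classification.

```python
# Send a sample sentence to the endpoint for classification.
data = {"inputs": "I love using SageMaker to host HuggingFace models!"}

result = predictor.predict(data)
# result is a list of label/score dictionaries,
# e.g. [{"label": ..., "score": ...}]
print(result)
```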

With this, we should see a text classification result showing the highest-scoring label for the input.

Make sure to delete the endpoint when you are not using it, so you don't incur any additional costs.
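Cleanup can be done directly from the notebook, assuming `predictor` is still in scope:

```python
# Tear down the real-time endpoint to stop incurring charges.
predictor.delete_endpoint()
```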