How to Deploy Torch Models with TorchServe
A simple guide to deploying PyTorch models.
TorchServe is a flexible, easy-to-use model-serving library that makes it simple to deploy and manage PyTorch models at scale.
Overview
- Save the model binaries as a .mar file.
- Build & start the TorchServe server (with Docker).
- Make predictions by querying the inference API.
Installation
Install the CLI toolkit and the TorchServe server.
# conda
conda install torchserve torch-model-archiver torch-workflow-archiver -c pytorch
# pip
pip install torchserve torch-model-archiver torch-workflow-archiver
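As an optional sanity check, confirm that the CLI tools are on your PATH:
# Both CLIs should print their usage/help text if the install succeeded
torchserve --help
torch-model-archiver --help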
Save the model binaries
Download (or train) a PyTorch model. Here I'll use densenet161 to predict the type of house cat.
wget https://download.pytorch.org/models/densenet161-8d451a50.pth
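The archiver command below assumes a specific layout: the example model.py and index_to_name.json come from a local clone of the pytorch/serve repository, and the downloaded weights sit in a model_store/ directory. One way to set that up (a sketch, adjust the paths to taste):
# Clone the TorchServe repo for the densenet_161 example files (model.py, index_to_name.json)
git clone https://github.com/pytorch/serve.git
# Create a local model store and move the downloaded weights into it
mkdir -p model_store
mv densenet161-8d451a50.pth model_store/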
Archive the model by using the Torch model archiver.
torch-model-archiver \
  --model-name densenet161 \
  --version 1.0 \
  --model-file ./serve/examples/image_classifier/densenet_161/model.py \
  --serialized-file ./model_store/densenet161-8d451a50.pth \
  --export-path ./model_store \
  --extra-files ./serve/examples/image_classifier/index_to_name.json \
  --handler image_classifier
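If the command succeeds, the packaged archive shows up in the export path:
# The archiver writes <model-name>.mar to the export path
ls model_store
# densenet161-8d451a50.pth  densenet161.mar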
Build & Start the TorchServe server
Build the server as a Docker image.
I recommend using Docker to build/run the server because it is easier to manage the dependencies and deploy the server to the cloud. Set up a Dockerfile to build the server image.
# Use the official PyTorch image
FROM pytorch/torchserve
# Copy your .mar file into the container
COPY /model_store/densenet161.mar /home/model-server/model-store/
# Set the model store and start TorchServe
CMD ["torchserve", " - start", " - ncs", " - model-store", "/home/model-server/model-store", " - models", "densenet161.mar"]
Authorization Token
By default, TorchServe requires an authorization token to make predictions. You can set the TS_DISABLE_TOKEN_AUTHORIZATION environment variable to disable this feature. See the TorchServe token authorization API docs for an example.
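For local testing, one option is to add an ENV line to the Dockerfile above (this assumes the setting is honored as an environment override, like other TS_* config properties):
# Optional: disable token authorization for local testing (assumes the env override is honored)
ENV TS_DISABLE_TOKEN_AUTHORIZATION=true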
You can find the autogenerated runtime tokens inside the Docker container, in ./key_file.json.
Build & run the TorchServe Server (Docker image).
Build the Docker image.
docker build -t torchserve-densenet .
Run the server. Port 8080 is the default port for the inference API, and port 8081 is the default port for the management API.
docker run -p 8080:8080 -p 8081:8081 torchserve-densenet
Note: You can launch a terminal inside the container.
docker exec -u 0 -it <container_id_or_name> /bin/bash
Exit with Ctrl + D.
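You can also print the token file mentioned earlier without opening a shell (assuming it is written to the image's default working directory, /home/model-server):
# Print the autogenerated tokens (the exact path is an assumption)
docker exec <container_id_or_name> cat /home/model-server/key_file.json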
The server will start automatically because of the CMD instruction in our Dockerfile.
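Once the server is up, you can also check that the model was registered by querying the management API on port 8081 (add the management token header if token auth is enabled):
# List the registered models via the management API
curl http://localhost:8081/models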
Make predictions with the inference API.
Test that a connection can be made to the server.
# If token disabled
curl http://localhost:8080/ping
# Alternatively, with an API key
curl -H "Authorization: Bearer <INFERENCE API KEY>" http://localhost:8080/ping
Make predictions
Finally, we query the model. This model predicts the type of house cat from an image. Call the predictions API and pass the image (kitten_small.jpg) as the input.
curl -H "Authorization: Bearer <TOKEN>" http://127.0.0.1:8080/predictions/densenet161 -T kitten_small.jpg
Getting to Production
To get this model into production, you will likely serve the image on a Kubernetes cluster or an AWS ECS/EKS cluster (Fargate or Elastic Kubernetes Service).
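As a rough sketch, that usually starts with pushing the image to a container registry such as Amazon ECR (the account ID, region, and repository name below are placeholders):
# Create an ECR repository and push the image (placeholder account/region)
aws ecr create-repository --repository-name torchserve-densenet
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker tag torchserve-densenet:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/torchserve-densenet:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/torchserve-densenet:latest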