How to Deploy Torch Models with TorchServe
A simple guide to deploying PyTorch models.
TorchServe is a flexible, easy-to-use model-serving library that makes it simple to deploy and manage PyTorch models at scale.
Overview
- Save the model binaries as a .mar file.
- Build & start the TorchServe server (with Docker).
- Make predictions by querying the inference API.
Installation
Install the CLI toolkit and the TorchServe server.
# conda
conda install torchserve torch-model-archiver torch-workflow-archiver -c pytorch
# pip
pip install torchserve torch-model-archiver torch-workflow-archiver
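As an optional sanity check, confirm that the CLI tools are on your PATH:
# Both CLIs should print their usage/help text if the install succeeded
torchserve --help
torch-model-archiver --help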
Save the model binaries
Download (or train) a PyTorch model. Here I'll use densenet161 to predict the type of house cat.
wget https://download.pytorch.org/models/densenet161-8d451a50.pth
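The archiver command below assumes a specific layout: the example model.py and index_to_name.json come from a local clone of the pytorch/serve repository, and the downloaded weights sit in a model_store/ directory. One way to set that up (a sketch, adjust the paths to taste):
# Clone the TorchServe repo for the densenet_161 example files (model.py, index_to_name.json)
git clone https://github.com/pytorch/serve.git
# Create a local model store and move the downloaded weights into it
mkdir -p model_store
mv densenet161-8d451a50.pth model_store/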
Archive the model by using the Torch model archiver.
torch-model-archiver \
  --model-name densenet161 \
  --version 1.0 \
  --model-file ./serve/examples/image_classifier/densenet_161/model.py \
  --serialized-file ./model_store/densenet161-8d451a50.pth \
  --export-path ./model_store \
  --extra-files ./serve/examples/image_classifier/index_to_name.json \
  --handler image_classifier
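If the command succeeds, the packaged archive shows up in the export path:
# The archiver writes <model-name>.mar to the export path
ls model_store
# densenet161-8d451a50.pth  densenet161.mar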
Build & Start the TorchServe server
Build the server as a Docker image.
I recommend using Docker to build/run the server because it is easier to manage the dependencies and deploy the server to the cloud. Set up a Dockerfile to build the server image.
# Use the official PyTorch image
FROM pytorch/torchserve
# Copy your .mar file into the container
COPY /model_store/densenet161.mar /home/model-server/model-store/
# Set the model store and start TorchServe
CMD ["torchserve", " - start", " - ncs", " - model-store", "/home/model-server/model-store", " - models", "densenet161.mar"]
Authorization Token
By default, TorchServe requires an authorization token to make predictions. You can set the TS_DISABLE_TOKEN_AUTHORIZATION environment variable to disable this feature. See the TorchServe token authorization API docs for an example.
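For local testing, one option is to add an ENV line to the Dockerfile above (this assumes the setting is honored as an environment override, like other TS_* config properties):
# Optional: disable token authorization for local testing (assumes the env override is honored)
ENV TS_DISABLE_TOKEN_AUTHORIZATION=true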
You can find the autogenerated runtime tokens inside the Docker container, in ./key_file.json.
Build & run the TorchServe Server (Docker image).
Build the Docker image.
docker build -t torchserve-densenet .
Run the server. Port 8080 is the default port for the inference API, and port 8081 is the default port for the management API.
docker run -p 8080:8080 -p 8081:8081 torchserve-densenet
Note: You can launch a terminal inside the container.
docker exec -u 0 -it <container_id_or_name> /bin/bash
Exit with Ctrl + D.
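You can also print the token file mentioned earlier without opening a shell (assuming it is written to the image's default working directory, /home/model-server):
# Print the autogenerated tokens (the exact path is an assumption)
docker exec <container_id_or_name> cat /home/model-server/key_file.json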
The server will start automatically because of the CMD instruction in our Dockerfile.
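Once the server is up, you can also check that the model was registered by querying the management API on port 8081 (add the management token header if token auth is enabled):
# List the registered models via the management API
curl http://localhost:8081/models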
Make predictions with the inference API.
Test that a connection can be made to the server.
# If token disabled
curl http://localhost:8080/ping
# Alternatively, with an API key
curl -H "Authorization: Bearer <INFERENCE API KEY>" http://localhost:8080/ping
Make predictions
Finally, we query the model. This model predicts the type of house cat from an image. Call the predictions API and pass the image (kitten_small.jpg) as the input.
curl -H "Authorization: Bearer <TOKEN>" http://127.0.0.1:8080/predictions/densenet161 -T kitten_small.jpg
Getting to Production
To get this model into production, you will likely serve the image on a Kubernetes cluster or an AWS ECS/EKS cluster (Fargate or Elastic Kubernetes Service).
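As a rough sketch, that usually starts with pushing the image to a container registry such as Amazon ECR (the account ID, region, and repository name below are placeholders):
# Create an ECR repository and push the image (placeholder account/region)
aws ecr create-repository --repository-name torchserve-densenet
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker tag torchserve-densenet:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/torchserve-densenet:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/torchserve-densenet:latest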