Deploying PyTorch Models in C++
Running Machine Learning models on embedded systems (Part I)
Python dominates the Machine Learning ecosystem, and we all benefit from it. Computationally expensive optimisation & combinatorial algorithms are written in C++ and wrapped in a user-friendly Python frontend (PyTorch, scikit-learn, etc.).
Most ML workflows are 99% Python & SQL, but what if you’re working on embedded systems, robotics, or an ultra-low-latency (on-device) system? You may want to perform inference calls directly from an existing C++ application.
Using TorchScript, it’s possible to remove the need for the Python interpreter when loading a PyTorch model in C++.
This tutorial demonstrates how to train a Neural Network in Python and deploy the model in C++.
Overview
- Train (or download) a PyTorch model.
- Checkpoint & save the model.
- Convert the model to TorchScript (a serialisation independent of the Python interpreter).
- Install and link LibTorch to your C++ application.
- Perform inference.
Code
The code & tutorial is available here. Clone the repo and follow the steps highlighted in the Readme to follow along. For reference when debugging, I’m running macOS on Apple Silicon (M1).
1. Train & save a PyTorch model
Configure the Neural Network
We’ll train a multiclass-classifier PyTorch Neural Network to predict an item in the Fashion MNIST dataset. The repo contains a script, exe_1_download_data.py, which downloads the data for you automatically.
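Under the hood, the download step can be as simple as pulling the dataset with torchvision and serialising it with torch.save. This is a minimal sketch, not the repo’s actual script; it assumes torchvision is installed and that the file names match what the training script loads later:
import torch
from torchvision import datasets
from torchvision.transforms import ToTensor

# Download the Fashion MNIST training & test splits (hypothetical sketch;
# see exe_1_download_data.py in the repo for the actual implementation).
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())

# Serialise the datasets so the training script can reload them with torch.load.
torch.save(training_data, "data/training_data.pt")
torch.save(test_data, "data/test_data.pt")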
Our neural net is a simple feed-forward network that uses a ReLU activation layer between several linear layers.
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
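As a quick sanity check: the network maps a batch of 28×28 images to 10 class logits. A minimal usage sketch:
import torch

# (NeuralNetwork as defined above)
model = NeuralNetwork()
x = torch.rand(8, 28, 28)   # a batch of 8 dummy "images"
logits = model(x)
print(logits.shape)         # torch.Size([8, 10])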
Define a train function to execute the training loop, using autograd to perform backpropagation.
def train(train_dataloader, device, model, loss_fn, metrics_fn, optimizer, epoch, checkpoint_dir):
    model.train()
    for batch, (X, y) in enumerate(train_dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)
        accuracy = metrics_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), batch
            mlflow.log_metric("loss", loss, step=(epoch * len(train_dataloader) + batch))
            mlflow.log_metric("accuracy", accuracy.item(), step=(epoch * len(train_dataloader) + batch))
            print(
                f"loss: {loss:.3f} accuracy: {accuracy:.3f} [{current} / {len(train_dataloader)}]"
            )

        # Save a checkpoint every 500 batches
        if batch % 500 == 0:
            checkpoint = {
                'epoch': epoch,
                'batch': batch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'loss': loss,
            }
            checkpoint_path = os.path.join(checkpoint_dir, f'checkpoint_epoch_{epoch}_batch_{batch}.pth')
            torch.save(checkpoint, checkpoint_path)
            mlflow.log_artifact(checkpoint_path)
Notice we’re using MLflow to track the experiment and log artifacts, as well as to stage the model during training (checkpointing). The checkpoints are used to relaunch training and to load the model later.
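The checkpoints saved above are restored later by a load_checkpoint helper (used in both the launch and conversion scripts). The repo defines its own version; a minimal sketch, assuming the checkpoint dictionary keys used in train():
import torch

def load_checkpoint(checkpoint_path, model, optimizer):
    # Restore model & optimizer state saved by train() and return the
    # bookkeeping needed to resume (epoch, batch, last loss).
    checkpoint = torch.load(checkpoint_path)
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    return model, optimizer, checkpoint['epoch'], checkpoint['batch'], checkpoint['loss']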
Launch a training job
exe_2_launch_training.py launches a training job. The training code takes a few command-line arguments to allow for checkpointing and relaunching training from the last checkpoint. This is helpful when training on cheap (spot) cloud instances, which can be interrupted at any time.
# take args
parser = argparse.ArgumentParser(description='Train a neural network.')
parser.add_argument('-e', '--epochs', type=int, default=3, help='Number of epochs to train.')
parser.add_argument('-d', '--data_path', type=str, default='data', help='Path to training/testing data.')
# Note: argparse's type=bool treats any non-empty string as True, so use
# BooleanOptionalAction (--resume / --no-resume) instead.
parser.add_argument('-r', '--resume', action=argparse.BooleanOptionalAction, default=True, help='Resume training from checkpoint.')
parser.add_argument('-b', '--batch_size', type=int, default=64, help='Batch size for training.')
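The run block below refers to constants like EPOCHS, START_EPOCH, END_EPOCH, NEW_RUN & CHECKPOINT_PATH. Their exact derivation lives in the repo; a plausible sketch from the parsed arguments (using the repo’s get_epoch_number helper, shown later, to locate the latest checkpoint) might be:
args = parser.parse_args()

EPOCHS = args.epochs
DATA_PATH = args.data_path
NEW_RUN = not args.resume

if NEW_RUN:
    START_EPOCH = 0
else:
    # Resume from the latest checkpoint (get_epoch_number is a repo helper).
    CHECKPOINT_PATH, START_EPOCH = get_epoch_number(os.listdir('checkpoints'))
END_EPOCH = START_EPOCH + EPOCHS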
Configure the training runtime.
# load data ------------------------------------------------------------------------------++
training_data = torch.load(f'{DATA_PATH}/training_data.pt')
train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
# load data ------------------------------------------------------------------------------++

# define model architecture --------------------------------------------------------------++
device = 'cuda' if torch.cuda.is_available() else 'cpu'
loss_fn = nn.CrossEntropyLoss()
metric_fn = Accuracy(task="multiclass", num_classes=10).to(device)
model = NeuralNetwork().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# define model architecture --------------------------------------------------------------++

# mlflow model signature -----------------------------------------------------------------++
# Infer the model signature from a random sample input (moved to the model's device).
sample_input = torch.tensor(np.random.uniform(size=[1, 28, 28])).float().to(device)
signature = mlflow.models.infer_signature(
    sample_input.cpu().numpy(),
    model(sample_input).detach().cpu().numpy(),
)
# mlflow model signature -----------------------------------------------------------------++
Launch training within an MLflow run block, logging metrics, parameters & metadata. We also register the model version after the training loop completes.
# launch training
with mlflow.start_run() as run:
    params = {
        "epochs": EPOCHS,
        "start_epoch": START_EPOCH,
        "end_epoch": END_EPOCH,
        "device": device,
        "learning_rate": 1e-3,
        "batch_size": 64,
        "loss_function": loss_fn.__class__.__name__,
        "metric_function": metric_fn.__class__.__name__,
        "optimizer": "SGD",
    }
    # Log training parameters.
    mlflow.log_params(params)

    # Log model summary.
    with open("model_summary.txt", "w") as f:
        f.write(str(summary(model)))
    mlflow.log_artifact("model_summary.txt")

    checkpoint_dir = "checkpoints"
    os.makedirs(checkpoint_dir, exist_ok=True)

    if not NEW_RUN:
        model, optimizer, start_epoch, start_batch, loss = load_checkpoint(CHECKPOINT_PATH, model, optimizer)
    else:
        start_epoch, start_batch = 0, 0

    # Start the training loop from the resumed epoch
    for t in range(START_EPOCH, END_EPOCH):
        print(f"Epoch {t+1}\n-------------------------------")
        train(train_dataloader, device, model, loss_fn, metric_fn, optimizer, t, checkpoint_dir)

    # Save the trained model to MLflow.
    mlflow.pytorch.log_model(model, "model", signature=signature)

# Register the model with the latest run -------------------------------------------------++
client = MlflowClient()
run_id = run.info.run_id
model_name = "fashion_mnist_classifier"
model_version = mlflow.register_model(f"runs:/{run_id}/model", model_name)
print(f"Model registered with name: {model_name}")
print(f"Model version: {model_version.version}")
# Register the model with the latest run -------------------------------------------------++
You can view training diagnostics by launching an MLflow server in a new terminal on localhost:
mlflow ui
Perform Inference
The model is versioned and saved with MLflow. To load a model & perform inference in Python, see exe_3_perform_inference.py.
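In essence, that script pulls the registered model back out of the MLflow registry and calls it. A minimal sketch (the model name matches the registration step above; the version number & input are assumptions):
import mlflow.pytorch
import torch

# Load a specific version of the registered model from the MLflow registry.
model = mlflow.pytorch.load_model("models:/fashion_mnist_classifier/1")
model.eval()

# Classify a dummy 28x28 input (stand-in for a real Fashion MNIST image).
x = torch.rand(1, 28, 28)
with torch.no_grad():
    logits = model(x)
print("predicted class:", logits.argmax(dim=1).item())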
Usually, at this stage, we’d create a REST API to serve the model (or use TorchServe), deploy the API on a Kubernetes/AKS/Fargate/EC2 cluster with autoscaling, and query it from the app.
2. Convert the model binaries to TorchScript
The model needs to be converted to TorchScript. This creates a serialisation that is completely independent of the Python interpreter. To perform the conversion, run:
python exe_4_convert_model_to_torchscript.py --model_path <PATH-TO-MODEL>
This script loads the model and performs the conversion. Here is some of the internal code, which defines the model, loads the trained weights, and saves the model with torch.jit.script.
# define model architecture --------------------------------------------------------------++
loss_fn = nn.CrossEntropyLoss()
device = 'cuda' if torch.cuda.is_available() else 'cpu'
metric_fn = Accuracy(task="multiclass", num_classes=10).to(device)
model = NeuralNetwork().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# define model architecture --------------------------------------------------------------++

# load model weights with checkpoint -----------------------------------------------------++
# get_epoch_number (a repo helper) returns the path & epoch of the latest checkpoint.
checkpoint_files = os.listdir('checkpoints')
CHECKPOINT_PATH, LAST_EPOCH = get_epoch_number(checkpoint_files)
model, optimizer, start_epoch, start_batch, loss = load_checkpoint(CHECKPOINT_PATH, model, optimizer)
# load model weights with checkpoint -----------------------------------------------------++

# Set the model to evaluation mode
model.eval()

# save the model with TorchScript
MODEL_NAME = 'model_scripted.pt'
model_scripted = torch.jit.script(model)  # Export to TorchScript
model_scripted.save(MODEL_NAME)  # Save
print('Model converted and saved as model_scripted.pt')
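As an aside, a rough sketch of what a get_epoch_number helper could look like, assuming the checkpoint_epoch_{epoch}_batch_{batch}.pth naming used by train() (the repo’s actual implementation may differ):
import os
import re

def get_epoch_number(checkpoint_files):
    # Rank checkpoints by (epoch, batch) parsed from the file name and return
    # the path to the latest one together with its epoch number.
    def key(name):
        m = re.match(r'checkpoint_epoch_(\d+)_batch_(\d+)\.pth', name)
        return (int(m.group(1)), int(m.group(2))) if m else (-1, -1)
    latest = max(checkpoint_files, key=key)
    return os.path.join('checkpoints', latest), key(latest)[0]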
We now have a serialised model, model_scripted.pt, ready to deploy outside of the Python runtime.
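Before leaving Python, it’s worth checking that the scripted model round-trips: torch.jit.load can reload it without the original class definition. A quick sketch:
import torch

# Reload the TorchScript model; no NeuralNetwork class definition is needed.
loaded = torch.jit.load('model_scripted.pt', map_location='cpu')
loaded.eval()

# Run a dummy input through the reloaded model.
x = torch.rand(1, 28, 28)
with torch.no_grad():
    print(loaded(x).argmax(dim=1))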
3. Installing the C++ distribution of PyTorch
Before trying to deploy your model, I recommend getting a minimal application working first. I ran into several compiler errors at this stage, particularly on Apple Silicon without a VM.
- Download Libtorch.
- Write a minimal C++ application that instantiates Torch modules.
#include <torch/torch.h>
#include <iostream>

int main() {
    // Create a tensor
    torch::Tensor tensor = torch::rand({2, 3});
    std::cout << "Hello World!! Here are my torch tensors::" << tensor << "!?" << std::endl;
    return 0;
}
It is recommended to use CMake to build the application. Create a CMakeLists.txt file to link Libtorch to your application.
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(cpp-torch-deployment)
find_package(Torch REQUIRED)
add_executable(${PROJECT_NAME} main.cpp)
target_link_libraries(${PROJECT_NAME} "${TORCH_LIBRARIES}")
set_property(TARGET ${PROJECT_NAME} PROPERTY CXX_STANDARD 17)
Create & navigate to a build directory. Run CMake, specifying the location of Libtorch on your machine, then run make to build the project. -S and -B specify the source and build locations respectively.
mkdir build
cd build
cmake -S ../ -B . -DTorch_DIR=<absolute-path-to-libtorch>
make
If everything compiles successfully, you should now have an executable you can run (named after the CMake project).
./cpp-torch-deployment
Common compiler errors
I ran into several errors compiling against Libtorch; I recommend using the -DTorch_DIR CLI argument to specify the absolute path to LibTorch.
If you’re running macOS (which you’re unlikely to do in production), Gatekeeper will likely quarantine the downloaded libraries. Allow Libtorch to run by removing the com.apple.quarantine attribute:
xattr -r -d com.apple.quarantine <path-to-libtorch>
Additional resources
For additional examples, see the tutorial Installing C++ Distributions of PyTorch, or exe_2_cpp_torchlib in the repository. Here is an excellent tutorial series on how to get started with CMake files.
Deploy your Torch model in C++
Finally, we are ready to deploy our model. Use the same steps as above to link Libtorch to your application. torch::jit::load can be used to load the serialised model from the .pt file.
This script takes the path to the .pt model as a command-line argument & performs inference on a dummy input tensor (a batch of ones).
#include <torch/script.h> // One-stop header.

#include <iostream>
#include <memory>

int main(int argc, const char* argv[]) {
    if (argc != 2) {
        std::cerr << "usage: example-app <path-to-exported-script-module>\n";
        return -1;
    }

    torch::jit::script::Module module;
    try {
        // Deserialize the ScriptModule from a file using torch::jit::load().
        module = torch::jit::load(argv[1]);
    }
    catch (const c10::Error& e) {
        std::cerr << "error loading the model\n";
        return -1;
    }
    std::cout << "Model loaded successfully.\n";

    // Inference: create a batch of 64 dummy 28x28 inputs.
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::ones({64, 1, 28, 28}));

    // Execute the model and turn its output into a tensor.
    at::Tensor output = module.forward(inputs).toTensor();
    // Print the first 5 class logits for each item in the batch.
    std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';
}
Build the application with CMake as above.
Voila! We have now deployed our model in a C++ runtime.
Additional resources
- Project Repository
- Loading a TorchScript Model in C++
- Installing C++ Distributions of PyTorch
- Tutorial series on how to get started with CMake
- PyTorch within MLflow
- PyTorch C++ API
- Deep Learning with MLflow (Part 2)
- Saving and loading a general checkpoint in PyTorch
- Saving and Loading Models (PyTorch)
The complete code & tutorial is available here.