Deploying PyTorch Models in C++
Running Machine Learning models on embedded systems (Part I)
Python dominates the Machine Learning ecosystem, and we all benefit from it. Computationally expensive optimisation & combinatorial algorithms are written in C++ and wrapped in a user-friendly Python frontend (PyTorch, scikit-learn, etc.).
Most ML workflows are 99% Python & SQL, but what if you’re working on embedded systems, robotics, or an ultra-low-latency (on-device) system? You may want to perform inference calls directly from an existing C++ application.
Using TorchScript, it’s possible to remove the need for the Python interpreter when loading a PyTorch model in C++.
This tutorial demonstrates how to train a Neural Network in Python and deploy the model in C++.
Overview
- Train (or download) a PyTorch model.
- Checkpoint & save the model.
- Convert the model to TorchScript (a serialisation independent of the Python interpreter).
- Install and link LibTorch to your C++ application.
- Perform inference.
Code
The code & tutorial is available here. Clone the repo and follow the steps highlighted in the Readme to follow along. For reference when debugging, I’m running macOS on Apple Silicon (M1).
1. Train & save a PyTorch model
Configure the Neural Network
We’ll train a multiclass-classifier PyTorch Neural Network to predict an item in the Fashion MNIST dataset. The repo contains a script, exe_1_download_data.py, which downloads the data for you automatically.
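Under the hood, the download step can be as simple as pulling the dataset with torchvision and serialising it with torch.save. This is a minimal sketch, not the repo’s actual script; it assumes torchvision is installed and that the file names match what the training script loads later:
import torch
from torchvision import datasets
from torchvision.transforms import ToTensor

# Download the Fashion MNIST training & test splits (hypothetical sketch;
# see exe_1_download_data.py in the repo for the actual implementation).
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())

# Serialise the datasets so the training script can reload them with torch.load.
torch.save(training_data, "data/training_data.pt")
torch.save(test_data, "data/test_data.pt")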
Our neural net is a simple feed-forward network that uses a ReLU activation layer between several linear layers.
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
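As a quick sanity check: the network maps a batch of 28×28 images to 10 class logits. A minimal usage sketch:
import torch

# (NeuralNetwork as defined above)
model = NeuralNetwork()
x = torch.rand(8, 28, 28)   # a batch of 8 dummy "images"
logits = model(x)
print(logits.shape)         # torch.Size([8, 10])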
Define a train function to execute the training loop, using autograd to perform backpropagation.
def train(train_dataloader, device, model, loss_fn, metrics_fn, optimizer, epoch, checkpoint_dir):
    model.train()
    for batch, (X, y) in enumerate(train_dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)
        accuracy = metrics_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), batch
            mlflow.log_metric("loss", loss, step=(epoch * len(train_dataloader) + batch))
            mlflow.log_metric("accuracy", accuracy.item(), step=(epoch * len(train_dataloader) + batch))
            print(
                f"loss: {loss:.3f} accuracy: {accuracy:.3f} [{current} / {len(train_dataloader)}]"
            )

        # Save a checkpoint every 500 batches
        if batch % 500 == 0:
            checkpoint = {
                'epoch': epoch,
                'batch': batch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'loss': loss,
            }
            checkpoint_path = os.path.join(checkpoint_dir, f'checkpoint_epoch_{epoch}_batch_{batch}.pth')
            torch.save(checkpoint, checkpoint_path)
            mlflow.log_artifact(checkpoint_path)
Notice we’re using MLflow to track the experiment and log artifacts, as well as to stage the model during training (checkpointing). The checkpoints are used to relaunch training and to load the model later.
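The checkpoints saved above are restored later by a load_checkpoint helper (used in both the launch and conversion scripts). The repo defines its own version; a minimal sketch, assuming the checkpoint dictionary keys used in train():
import torch

def load_checkpoint(checkpoint_path, model, optimizer):
    # Restore model & optimizer state saved by train() and return the
    # bookkeeping needed to resume (epoch, batch, last loss).
    checkpoint = torch.load(checkpoint_path)
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    return model, optimizer, checkpoint['epoch'], checkpoint['batch'], checkpoint['loss']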
Launch a training job
exe_2_launch_training.py launches a training job. The training code takes a few command-line arguments to allow for checkpointing and relaunching training from the last checkpoint. This is helpful when training on cheap (spot) cloud instances, which can be interrupted at any time.
# take args
parser = argparse.ArgumentParser(description='Train a neural network.')
parser.add_argument('-e', '--epochs', type=int, default=3, help='Number of epochs to train.')
parser.add_argument('-d', '--data_path', type=str, default='data', help='Path to training/testing data.')
# Note: argparse's type=bool treats any non-empty string as True, so use
# BooleanOptionalAction (--resume / --no-resume) instead.
parser.add_argument('-r', '--resume', action=argparse.BooleanOptionalAction, default=True, help='Resume training from checkpoint.')
parser.add_argument('-b', '--batch_size', type=int, default=64, help='Batch size for training.')
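The run block below refers to constants like EPOCHS, START_EPOCH, END_EPOCH, NEW_RUN & CHECKPOINT_PATH. Their exact derivation lives in the repo; a plausible sketch from the parsed arguments (using the repo’s get_epoch_number helper, shown later, to locate the latest checkpoint) might be:
args = parser.parse_args()

EPOCHS = args.epochs
DATA_PATH = args.data_path
NEW_RUN = not args.resume

if NEW_RUN:
    START_EPOCH = 0
else:
    # Resume from the latest checkpoint (get_epoch_number is a repo helper).
    CHECKPOINT_PATH, START_EPOCH = get_epoch_number(os.listdir('checkpoints'))
END_EPOCH = START_EPOCH + EPOCHS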
Configure the training runtime.
# load data ------------------------------------------------------------------------------++
training_data = torch.load(f'{DATA_PATH}/training_data.pt')
train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
# load data ------------------------------------------------------------------------------++

# define model architecture --------------------------------------------------------------++
device = 'cuda' if torch.cuda.is_available() else 'cpu'
loss_fn = nn.CrossEntropyLoss()
metric_fn = Accuracy(task="multiclass", num_classes=10).to(device)
model = NeuralNetwork().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# define model architecture --------------------------------------------------------------++

# mlflow model signature -----------------------------------------------------------------++
# Infer the model signature from a random sample input (moved to the model's device).
sample_input = torch.tensor(np.random.uniform(size=[1, 28, 28])).float().to(device)
signature = mlflow.models.infer_signature(
    sample_input.cpu().numpy(),
    model(sample_input).detach().cpu().numpy(),
)
# mlflow model signature -----------------------------------------------------------------++
Launch training within an MLflow run block, logging metrics, parameters & metadata. We also register the model version after the training loop completes.
# launch training
with mlflow.start_run() as run:
    params = {
        "epochs": EPOCHS,
        "start_epoch": START_EPOCH,
        "end_epoch": END_EPOCH,
        "device": device,
        "learning_rate": 1e-3,
        "batch_size": 64,
        "loss_function": loss_fn.__class__.__name__,
        "metric_function": metric_fn.__class__.__name__,
        "optimizer": "SGD",
    }
    # Log training parameters.
    mlflow.log_params(params)

    # Log model summary.
    with open("model_summary.txt", "w") as f:
        f.write(str(summary(model)))
    mlflow.log_artifact("model_summary.txt")

    checkpoint_dir = "checkpoints"
    os.makedirs(checkpoint_dir, exist_ok=True)

    if not NEW_RUN:
        model, optimizer, start_epoch, start_batch, loss = load_checkpoint(CHECKPOINT_PATH, model, optimizer)
    else:
        start_epoch, start_batch = 0, 0

    # Start the training loop from the resumed epoch
    for t in range(START_EPOCH, END_EPOCH):
        print(f"Epoch {t+1}\n-------------------------------")
        train(train_dataloader, device, model, loss_fn, metric_fn, optimizer, t, checkpoint_dir)

    # Save the trained model to MLflow.
    mlflow.pytorch.log_model(model, "model", signature=signature)

# Register the model with the latest run -------------------------------------------------++
client = MlflowClient()
run_id = run.info.run_id
model_name = "fashion_mnist_classifier"
model_version = mlflow.register_model(f"runs:/{run_id}/model", model_name)
print(f"Model registered with name: {model_name}")
print(f"Model version: {model_version.version}")
# Register the model with the latest run -------------------------------------------------++
You can view training diagnostics by launching an MLflow server in a new terminal on localhost:
mlflow ui
Perform Inference
The model is versioned and saved with MLflow. To load a model & perform inference in Python, see exe_3_perform_inference.py.
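In essence, that script pulls the registered model back out of the MLflow registry and calls it. A minimal sketch (the model name matches the registration step above; the version number & input are assumptions):
import mlflow.pytorch
import torch

# Load a specific version of the registered model from the MLflow registry.
model = mlflow.pytorch.load_model("models:/fashion_mnist_classifier/1")
model.eval()

# Classify a dummy 28x28 input (stand-in for a real Fashion MNIST image).
x = torch.rand(1, 28, 28)
with torch.no_grad():
    logits = model(x)
print("predicted class:", logits.argmax(dim=1).item())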
Usually, at this stage, we’d create a REST API to serve the model (or use TorchServe), deploy the API on a Kubernetes/AKS/Fargate/EC2 cluster with autoscaling, and query it from the app.
2. Convert the model binaries to TorchScript
The model needs to be converted to TorchScript. This creates a serialisation that is completely independent of the Python interpreter. To perform the conversion, run:
python exe_4_convert_model_to_torchscript.py --model_path <PATH-TO-MODEL>
This script loads the model and performs the conversion. Here is some of the internal code, which defines the model, loads the trained weights, and saves the model with torch.jit.script.
# define model architecture --------------------------------------------------------------++
loss_fn = nn.CrossEntropyLoss()
device = 'cuda' if torch.cuda.is_available() else 'cpu'
metric_fn = Accuracy(task="multiclass", num_classes=10).to(device)
model = NeuralNetwork().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# define model architecture --------------------------------------------------------------++

# load model weights with checkpoint -----------------------------------------------------++
# get_epoch_number (a repo helper) returns the path & epoch of the latest checkpoint.
checkpoint_files = os.listdir('checkpoints')
CHECKPOINT_PATH, LAST_EPOCH = get_epoch_number(checkpoint_files)
model, optimizer, start_epoch, start_batch, loss = load_checkpoint(CHECKPOINT_PATH, model, optimizer)
# load model weights with checkpoint -----------------------------------------------------++

# Set the model to evaluation mode
model.eval()

# save the model with TorchScript
MODEL_NAME = 'model_scripted.pt'
model_scripted = torch.jit.script(model)  # Export to TorchScript
model_scripted.save(MODEL_NAME)  # Save
print('Model converted and saved as model_scripted.pt')
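As an aside, a rough sketch of what a get_epoch_number helper could look like, assuming the checkpoint_epoch_{epoch}_batch_{batch}.pth naming used by train() (the repo’s actual implementation may differ):
import os
import re

def get_epoch_number(checkpoint_files):
    # Rank checkpoints by (epoch, batch) parsed from the file name and return
    # the path to the latest one together with its epoch number.
    def key(name):
        m = re.match(r'checkpoint_epoch_(\d+)_batch_(\d+)\.pth', name)
        return (int(m.group(1)), int(m.group(2))) if m else (-1, -1)
    latest = max(checkpoint_files, key=key)
    return os.path.join('checkpoints', latest), key(latest)[0]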
We now have a serialised model, model_scripted.pt, ready to deploy outside of the Python runtime.
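Before leaving Python, it’s worth checking that the scripted model round-trips: torch.jit.load can reload it without the original class definition. A quick sketch:
import torch

# Reload the TorchScript model; no NeuralNetwork class definition is needed.
loaded = torch.jit.load('model_scripted.pt', map_location='cpu')
loaded.eval()

# Run a dummy input through the reloaded model.
x = torch.rand(1, 28, 28)
with torch.no_grad():
    print(loaded(x).argmax(dim=1))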
3. Installing the C++ distribution of PyTorch
Before trying to deploy your model, I recommend getting a minimal application working first. I ran into several compiler errors at this stage, particularly on Apple Silicon without a VM.
- Download Libtorch.
- Write a minimal C++ application that instantiates Torch modules.
#include <torch/torch.h>
#include <iostream>

int main() {
    // Create a tensor
    torch::Tensor tensor = torch::rand({2, 3});
    std::cout << "Hello World!! Here are my torch tensors::" << tensor << "!?" << std::endl;
    return 0;
}
It is recommended to use CMake to build the application. Create a CMakeLists.txt file to link Libtorch to your application.
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(cpp-torch-deployment)
find_package(Torch REQUIRED)
add_executable(${PROJECT_NAME} main.cpp)
target_link_libraries(${PROJECT_NAME} "${TORCH_LIBRARIES}")
set_property(TARGET ${PROJECT_NAME} PROPERTY CXX_STANDARD 17)
Create & navigate to a build directory. Run CMake, specifying the location of Libtorch on your machine, then run make to build the project. -S and -B specify the source and build locations respectively.
mkdir build
cd build
cmake -S ../ -B . -DTorch_DIR=<absolute-path-to-libtorch>
make
If everything compiles successfully, you should now have an executable you can run (named after the CMake project).
./cpp-torch-deployment
Common compiler errors
I ran into several errors compiling against Libtorch; I recommend using the -DTorch_DIR CLI argument to specify the absolute path to LibTorch.
If you’re running macOS (which you’re unlikely to do in production), Gatekeeper will likely quarantine the downloaded libraries. Allow Libtorch to run by removing the com.apple.quarantine attribute:
xattr -r -d com.apple.quarantine <path-to-libtorch>
Additional resources
For additional examples, see the tutorial Installing C++ Distributions of PyTorch, or exe_2_cpp_torchlib in the repository. Here is an excellent tutorial series on how to get started with CMake files.
Deploy your Torch model in C++
Finally, we are ready to deploy our model. Use the same steps as above to link Libtorch to your application. torch::jit::load can be used to load the serialised model from the .pt file.
This script takes the path to the .pt model as a command-line argument & performs inference on a dummy input tensor (a batch of ones).
#include <torch/script.h> // One-stop header.

#include <iostream>
#include <memory>

int main(int argc, const char* argv[]) {
    if (argc != 2) {
        std::cerr << "usage: example-app <path-to-exported-script-module>\n";
        return -1;
    }

    torch::jit::script::Module module;
    try {
        // Deserialize the ScriptModule from a file using torch::jit::load().
        module = torch::jit::load(argv[1]);
    }
    catch (const c10::Error& e) {
        std::cerr << "error loading the model\n";
        return -1;
    }
    std::cout << "Model loaded successfully.\n";

    // Inference: create a batch of 64 dummy 28x28 inputs.
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::ones({64, 1, 28, 28}));

    // Execute the model and turn its output into a tensor.
    at::Tensor output = module.forward(inputs).toTensor();
    // Print the first 5 class logits for each item in the batch.
    std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';
}
Build the application with CMake as above.
Voila! We have now deployed our model in a C++ runtime.
Additional resources
- Project Repository
- Loading a TorchScript Model in C++
- Installing C++ Distributions of PyTorch
- Tutorial series on how to get started with CMake
- PyTorch within MLflow
- PyTorch C++ API
- Deep Learning with MLflow (Part 2)
- Saving and loading a general checkpoint in PyTorch
- Saving and Loading Models (PyTorch)
The complete code & tutorial is available here.