MLOps: Model Architectures & Training/Inference Pipelines as Code
The ML landscape has changed dramatically over the last few years. Ubiquitous access to large pre-trained models has radically increased the importance of fine-tuning models & transfer learning.
Taking a large pre-trained model and configuring it for your own data is becoming extremely valuable.
In general, we want to build model pipelines that are:
- Maintainable & extendable.
- Able to utilise off-the-shelf pre-trained models.
- Re-trainable (or tunable) with new data.
- Well documented, preferably as infrastructure as code, and compliant with good MLOps practice.
We can achieve this by defining model pipelines as infrastructure as code.
Example
One excellent example of this is the ecg-sleep-staging repository:
Paper: Expert-level sleep staging using an electrocardiography-only feed-forward neural network
Project Structure
The brilliance of this codebase is the ability to launch new training/inference runs by changing the config files.
- Define architecture & parameters as config files.
- Easily modify the experiment data, model architecture & trainable parameters.
- Checkpoint/stage results.
Model Architecture & Data Config Files
Two config files comprehensively describe:
- The neural network architecture (net_params.json)
- The batch training/inference job (train_params.json)
- The training hyperparameters (train_params.json)
As a result, preparing the code for deployment, launching a new training/tuning job, or testing the pipeline on a new data source is as simple as swapping out the config file.
Similarly, swapping in a different architecture is as simple as swapping out the net_params.json config to fit the new model spec, provided the pre-trained weights are compatible. This is possible because of the modular nature of the PyTorch computational graph.
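As a rough sketch of the pattern (the config keys and the SleepNet constructor call below are assumptions for illustration, not the repository's actual schema), a config-driven entry point might look like:

import json
import torch

# Hypothetical sketch of a config-driven entry point. Key names and the
# SleepNet constructor signature are illustrative assumptions; SleepNet is
# the model class provided by the repository (import path omitted).
with open("net_params.json") as f:
    net_params = json.load(f)
with open("train_params.json") as f:
    train_params = json.load(f)

model = SleepNet(**net_params)  # architecture fully described by config

# Optionally start from pre-trained weights, provided they fit the spec
if train_params.get("pretrained_weights"):
    state_dict = torch.load(train_params["pretrained_weights"])
    model.load_state_dict(state_dict)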
Fine-Tuning & Transfer Learning
PyTorch models make use of a special data class (torch.utils.data.Dataset) and data loader (torch.utils.data.DataLoader), responsible for preparing the data for training/inference and for batch processing it, respectively.
Suppose we have a new dataset that we wish to use for fine-tuning or transfer learning.
1. Existing Data Class & Loader
If new data can be transformed to fit the existing data class & loader, it’s as easy as updating the training config, launching a batch processing job and checkpointing the results. This is the intended use during development.
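As a rough sketch (the config keys "data_path" and "checkpoint_dir" are assumptions for illustration, not the repository's actual schema), this might look like:

import json

# Hypothetical sketch: re-point the existing pipeline at a new data source
# by editing the training config, then launch the unchanged training script.
with open("train_params.json") as f:
    train_params = json.load(f)

train_params["data_path"] = "/data/new_ecg_cohort"       # new data source
train_params["checkpoint_dir"] = "checkpoints/new_run"   # stage results separately

with open("train_params_new_run.json", "w") as f:
    json.dump(train_params, f, indent=2)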
2. Data Outside of the Domain
If the new data is outside of the domain, we can write a custom data class/data loader and launch an experiment all the same. A custom Dataset only requires three dunder methods, all of which are self-explanatory. For example:
import torch


class CustomECGDataclass(torch.utils.data.Dataset):
    """
    Custom ECG Dataset (pass to a torch.utils.data.DataLoader).

    Parameters
    ----------
    input_space : list[dict]
        Pre-processed ECG records, one dict per sample.
    """

    def __init__(self, input_space: list[dict]):
        self.input_space = input_space

    def __len__(self):
        # Number of samples available for batching
        return len(self.input_space)

    def __getitem__(self, idx):
        # Return a single sample; the DataLoader handles batching
        return self.input_space[idx]
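Once defined, the custom dataset plugs straight into a standard DataLoader for batch processing. A minimal usage sketch (the record structure below is illustrative only):

import torch
from torch.utils.data import DataLoader

# Illustrative records; real samples come from whatever pre-processing
# the new data source requires.
records = [{"ecg": torch.randn(1, 5000), "label": 0} for _ in range(32)]

dataset = CustomECGDataclass(input_space=records)
loader = DataLoader(dataset, batch_size=8, shuffle=True)

for batch in loader:
    ...  # forward pass / loss / optimiser step, exactly as in the existing pipeline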
3. Granular Control
A serious added benefit is that we can go beyond adapting the model to new data. We have full access to the computational graph, so we can specify exactly which gradients to compute when optimising, and therefore control which parameters are retrained. Freeze a parameter by setting param.requires_grad = False; leave it set to True for the parameters you want to train.
def freeze_parameters(model: SleepNet, train_all: bool = True):
    """
    When train_all is False, only the dense output layers are trainable.
    Modify this function to change this behaviour.
    See the original paper & net_params.json for the network specs.

    Layer types:
    ------------
    [
        'additional_dense.bias',
        'additional_dense.weight',
        'ecg_cnns.network',
        'ecg_dense.bias',
        'ecg_dense.weight',
        'inner_dense.downsample',
        'inner_dense.network',
        'outer_dense.downsample',
        'outer_dense.network',
        'tcn.network'
    ]
    """
    if train_all:
        # Unfreeze everything and return early
        for name, param in model.named_parameters():
            param.requires_grad = True
        return

    # Otherwise, train a parameter only if it belongs to a trainable layer
    TRAINABLE_LAYERS = ['outer_dense.downsample', 'outer_dense.network']
    for name, param in model.named_parameters():
        param.requires_grad = any(layer in name for layer in TRAINABLE_LAYERS)
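In practice this pairs naturally with an optimiser built only over the parameters left trainable. A minimal sketch, continuing from the config-driven construction above (the constructor call and learning rate are illustrative):

import torch

model = SleepNet(**net_params)              # construction as sketched earlier
freeze_parameters(model, train_all=False)   # only the outer dense layers remain trainable

# Pass only the unfrozen parameters to the optimiser, so the pre-trained
# backbone is left untouched during fine-tuning.
trainable = [p for p in model.parameters() if p.requires_grad]
optimiser = torch.optim.Adam(trainable, lr=1e-4)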
In summary, we have reviewed some ideas worth considering when designing large machine learning systems, in particular taking an infrastructure-as-code approach to model architectures and training pipelines.