MLOps: Model Architectures & Training/Inference Pipelines as Code
The ML landscape has changed dramatically over the last few years. Ubiquitous access to large pre-trained models has radically increased the importance of fine-tuning models & transfer learning.
Taking a large pre-trained model and configuring it for your own data is becoming extremely valuable.
In general, we want to build model pipelines that are:
- Maintainable & extendable.
- Able to utilise off-the-shelf pre-trained models.
- Re-trainable (or tunable) with new data.
- Well documented, preferably as infrastructure as code, and compliant with good MLOps practice.
We can achieve this by defining model pipelines as infrastructure as code.
Example
One excellent example of this is the ecg-sleep-staging repository:
Paper: Expert-level sleep staging using an electrocardiography-only feed-forward neural network
Project Structure
The brilliance of this codebase is the ability to launch new training/inference runs by changing the config files.
- Define architecture & parameters as config files.
- Easily modify the experiment data, model architecture & trainable parameters.
- Checkpoint/stage results.
Model Architecture & Data Config Files
Two config files comprehensively describe:
- The neural network architecture (net_params.json)
- The batch training/inference job (train_params.json)
- The training hyperparameters (train_params.json)
As a result, preparing the code for deployment, launching a new training/tuning job, or testing the pipeline on a new data source is as simple as swapping out the config file.
Similarly, swapping in a different architecture is as simple as swapping out the net_params.json config to fit the new model spec, provided the pre-trained weights are compatible. This is possible because of the modular nature of the PyTorch computational graph.
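As a rough sketch of the pattern (the config keys and the SleepNet constructor call below are assumptions for illustration, not the repository's actual schema), a config-driven entry point might look like:

import json
import torch

# Hypothetical sketch of a config-driven entry point. Key names and the
# SleepNet constructor signature are illustrative assumptions; SleepNet is
# the model class provided by the repository (import path omitted).
with open("net_params.json") as f:
    net_params = json.load(f)
with open("train_params.json") as f:
    train_params = json.load(f)

model = SleepNet(**net_params)  # architecture fully described by config

# Optionally start from pre-trained weights, provided they fit the spec
if train_params.get("pretrained_weights"):
    state_dict = torch.load(train_params["pretrained_weights"])
    model.load_state_dict(state_dict)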
Fine-Tuning & Transfer Learning
PyTorch models make use of a special data class (torch.utils.data.Dataset) and data loader (torch.utils.data.DataLoader), responsible for preparing the data for training/inference and for batch processing it, respectively.
Suppose we have a new dataset that we wish to use for fine-tuning or transfer learning.
1. Existing Data Class & Loader
If new data can be transformed to fit the existing data class & loader, it’s as easy as updating the training config, launching a batch processing job and checkpointing the results. This is the intended use during development.
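As a rough sketch (the config keys "data_path" and "checkpoint_dir" are assumptions for illustration, not the repository's actual schema), this might look like:

import json

# Hypothetical sketch: re-point the existing pipeline at a new data source
# by editing the training config, then launch the unchanged training script.
with open("train_params.json") as f:
    train_params = json.load(f)

train_params["data_path"] = "/data/new_ecg_cohort"       # new data source
train_params["checkpoint_dir"] = "checkpoints/new_run"   # stage results separately

with open("train_params_new_run.json", "w") as f:
    json.dump(train_params, f, indent=2)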
2. Data Outside of the Domain
If the new data is outside of the domain, we can write a custom data class/data loader and launch an experiment all the same. A custom Dataset only requires three dunder methods, all of which are self-explanatory. For example:
import torch


class CustomECGDataclass(torch.utils.data.Dataset):
    """
    Custom ECG Dataset (pass to a torch.utils.data.DataLoader).

    Parameters
    ----------
    input_space : list[dict]
        Pre-processed ECG records, one dict per sample.
    """

    def __init__(self, input_space: list[dict]):
        self.input_space = input_space

    def __len__(self):
        # Number of samples available for batching
        return len(self.input_space)

    def __getitem__(self, idx):
        # Return a single sample; the DataLoader handles batching
        return self.input_space[idx]
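Once defined, the custom dataset plugs straight into a standard DataLoader for batch processing. A minimal usage sketch (the record structure below is illustrative only):

import torch
from torch.utils.data import DataLoader

# Illustrative records; real samples come from whatever pre-processing
# the new data source requires.
records = [{"ecg": torch.randn(1, 5000), "label": 0} for _ in range(32)]

dataset = CustomECGDataclass(input_space=records)
loader = DataLoader(dataset, batch_size=8, shuffle=True)

for batch in loader:
    ...  # forward pass / loss / optimiser step, exactly as in the existing pipeline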
3. Granular Control
A serious added benefit is that we can go beyond adapting the model to new data. We have full access to the computational graph, so we can specify exactly which gradients to compute when optimising, and therefore control which parameters are retrained. Freeze a parameter by setting param.requires_grad = False; leave it set to True for the parameters you want to train.
def freeze_parameters(model: SleepNet, train_all: bool = True):
    """
    When train_all is False, only the dense output layers are trainable.
    Modify this function to change this behaviour.
    See the original paper & net_params.json for the network specs.

    Layer types:
    ------------
    [
        'additional_dense.bias',
        'additional_dense.weight',
        'ecg_cnns.network',
        'ecg_dense.bias',
        'ecg_dense.weight',
        'inner_dense.downsample',
        'inner_dense.network',
        'outer_dense.downsample',
        'outer_dense.network',
        'tcn.network'
    ]
    """
    if train_all:
        # Unfreeze everything and return early
        for name, param in model.named_parameters():
            param.requires_grad = True
        return

    # Otherwise, train a parameter only if it belongs to a trainable layer
    TRAINABLE_LAYERS = ['outer_dense.downsample', 'outer_dense.network']
    for name, param in model.named_parameters():
        param.requires_grad = any(layer in name for layer in TRAINABLE_LAYERS)
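In practice this pairs naturally with an optimiser built only over the parameters left trainable. A minimal sketch, continuing from the config-driven construction above (the constructor call and learning rate are illustrative):

import torch

model = SleepNet(**net_params)              # construction as sketched earlier
freeze_parameters(model, train_all=False)   # only the outer dense layers remain trainable

# Pass only the unfrozen parameters to the optimiser, so the pre-trained
# backbone is left untouched during fine-tuning.
trainable = [p for p in model.parameters() if p.requires_grad]
optimiser = torch.optim.Adam(trainable, lr=1e-4)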
In summary, we have reviewed some ideas worth considering when designing large machine learning systems, in particular taking an infrastructure-as-code approach to model architectures and training pipelines.