Nonparametric ECG Signal Processing
A probabilistic approach to categorising ECG waveforms.
ECG offers a simple, well-studied, rich data source that can readily be used to compute a plethora of health indicators. Backed by a rigorous literature — and ubiquitous availability across medical devices — ECG is often a fundamental component of any physiological analysis.
Heart Rate Variability (HRV) is particularly predictive when modelling many health conditions. Although HRV is very easy to calculate, its estimates are often corrupted when processing noisy or distorted signals.
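HRV is derived from the RR intervals between successive heartbeats. As a point of reference, here is a minimal sketch of one standard time-domain HRV metric, RMSSD; the RR interval values are purely illustrative:

```python
import numpy as np

def rmssd(rr_intervals_ms: np.ndarray) -> float:
    """Root mean square of successive differences between RR intervals,
    a standard time-domain HRV metric."""
    diffs = np.diff(rr_intervals_ms)
    return float(np.sqrt(np.mean(diffs ** 2)))

# Illustrative RR intervals in milliseconds
rr = np.array([812, 845, 790, 860, 833, 805])
print(f"RMSSD: {rmssd(rr):.1f} ms")
```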
Signal Distortions
A full ECG signal is almost inevitably going to be corrupted at some point in the sequence; however, this can be circumvented by omitting sections of the time series that are marked as irregular.
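A rough sketch of this idea, assuming some per-window quality predicate (the amplitude-range heuristic used here is purely illustrative, not our production criterion):

```python
import numpy as np

def drop_irregular(signal: np.ndarray, window: int, is_irregular) -> list:
    """Split a signal into fixed-size windows and keep only those
    not flagged as irregular by the supplied predicate."""
    n_windows = len(signal) // window
    windows = signal[: n_windows * window].reshape(n_windows, window)
    return [w for w in windows if not is_irregular(w)]

# Toy predicate: flag windows whose peak-to-peak amplitude is implausibly large
clean = drop_irregular(np.random.randn(5000), window=250,
                       is_irregular=lambda w: np.ptp(w) > 6.0)
```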
We employ a suite of algorithms to clean and process physiological signals; this is arguably the most critical part of our ML pipeline. One such processing algorithm is tasked with catching:
- Noisy ECG signals
- Inverted ECG signals
- Corrupted ECG signals
Objective
Certain signals require more pre-processing than others, and often, once a signal's defects are identified, it is straightforward to address the distortion. For example, wavelets and Fourier analysis can smooth signals and reduce noise, while inverted signals can be normalised.
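As a concrete (if simplified) illustration, the sketch below substitutes a Butterworth low-pass filter for a full wavelet decomposition and uses a skewness heuristic to detect inversion; both choices are assumptions for this sketch, not our production pipeline:

```python
import numpy as np
from scipy.signal import butter, filtfilt
from scipy.stats import skew

def lowpass(ecg: np.ndarray, fs: float, cutoff_hz: float = 40.0) -> np.ndarray:
    """Zero-phase Butterworth low-pass filter to suppress high-frequency noise."""
    b, a = butter(N=4, Wn=cutoff_hz / (fs / 2), btype="low")
    return filtfilt(b, a, ecg)

def correct_inversion(ecg: np.ndarray) -> np.ndarray:
    """Flip the signal if the dominant deflections point downward.
    Heuristic: in an upright ECG the R-peaks push the sample
    distribution's tail upward, so skewness should be positive."""
    return -ecg if skew(ecg) < 0 else ecg
```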
If we can automatically detect the signal defect, we can disregard bad data and efficiently personalise the processing of good (but distorted) data.
To this end, we employ algorithms that:
- Capture distributional shift and nonstationarity.
- Map latent functions to matrix profiles (see the sketch after this list).
- Decouple generative functions to denoise signals.
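As an illustration of the matrix-profile item, here is a minimal sketch using the stumpy library; the synthetic signal, injected anomaly, and window length are all placeholders:

```python
import numpy as np
import stumpy

# Illustrative: a sine wave with an injected anomaly standing in for an ECG trace
t = np.linspace(0, 20 * np.pi, 4000)
signal = np.sin(t)
signal[2000:2050] += 2.0  # simulated corruption

m = 200                        # subsequence window length (arbitrary here)
mp = stumpy.stump(signal, m)   # column 0 holds the matrix profile distances

# Windows whose nearest neighbour is far away are candidate anomalies
anomaly_idx = int(np.argmax(mp[:, 0].astype(float)))
print(f"Most anomalous window starts at index {anomaly_idx}")
```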
Beyond these, we have begun to explore the utility of Kernel Density Estimates over a fixed sample to quantify nuanced distributional characteristics.
Kernel Density Estimates (KDEs)
KDEs are flexible, non-parametric estimates of the distribution of a random variable. Just as kernels are used for data smoothing elsewhere in Machine Learning, KDEs smooth the distribution of a set of data with minimal hyperparameters.
Although KDEs assume i.i.d. data — an assumption that is clearly violated in signal processing — we can mitigate the danger of this violation by working with stationary signals. The KDE allows for much greater flexibility than parametric statistical distributions: by forgoing the convenience of distributional parameterisation, one gains granularity and flexibility.
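A minimal sketch of fitting a KDE with scipy's gaussian_kde (the bandwidth defaults to Scott's rule, and the samples here are a stand-in for a stationary window of ECG data):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative stand-in for a stationary window of ECG samples
samples = np.random.randn(2000)

kde = gaussian_kde(samples)      # bandwidth via Scott's rule by default
grid = np.linspace(samples.min(), samples.max(), 512)
density = kde(grid)              # smoothed, non-parametric density estimate
```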
Modeling the KDE Output
After fitting KDEs, one can use the CDF (cumulative distribution function) to identify peaks and troughs in the data. These extrema are then used as features to compute the probability of inversion π, distributional shift, and excess noise.
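One plausible reading of this step, sketched below: locate extrema on the estimated density, then use the CDF to attach a probability mass to each. The setup is illustrative rather than a description of our exact feature set:

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import find_peaks

samples = np.random.randn(2000)   # illustrative window of samples
kde = gaussian_kde(samples)
grid = np.linspace(samples.min(), samples.max(), 512)
pdf = kde(grid)

peaks, _ = find_peaks(pdf)        # modes of the estimated density
troughs, _ = find_peaks(-pdf)     # anti-modes between them

# CDF value at each extremum: how much probability mass lies below it
cdf_at_peaks = [kde.integrate_box_1d(-np.inf, grid[p]) for p in peaks]
```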
Here are some examples of how this simple algorithm was able to detect extrema, which are then used to generate features: