Nonparametric ECG Signal Processing

A probabilistic approach to categorising ECG waveforms.

Zach Wolpe
May 25, 2022


ECG offers a simple, well-studied, rich data source that can readily be used to compute a plethora of health indicators. Backed by a rigorous literature and ubiquitous availability across medical devices, ECG is often a fundamental component of any physiological analysis.

Heart Rate Variability (HRV) is particularly predictive when modelling many health conditions. Although HRV is easy to calculate, it is often corrupted when processing noisy or distorted signals.

Signal Distortions

A full ECG signal is almost certain to be corrupt at some point in the sequence; however, this can be circumvented by omitting sections of the time series that are marked as irregular.

We employ a suite of algorithms to clean and process physiological signals; it is arguably the most critical part of our ML pipeline. One such processing algorithm is tasked with catching:

  1. Noisy ECG signals
  2. Inverted ECG signals
  3. Corrupted ECG signals
Inverted Signals: An inverted signal (right) compared to a correct counterpart (left). Note that the largest peak precedes the trough in the left signal but follows the trough in the right signal. The right signal is also excessively noisy, exhibiting excess frequencies that may be removed with Wavelets.
Noisy Signals: noisy signals exhibit excess sampling variability or larger trends that distort the true latent function. The same signal is presented on both the left and right, but at different processing stages. The green markings on the right show the underlying ECG PQRS flags, which ought to be decoupled from their noisy surroundings. When datasets are sufficiently large, deep learning can regularly circumvent this without manual intervention; often, however, it is necessary to decouple the noisy interference from the underlying signal (which may be achieved with Fourier Transforms, Wavelets and/or Matrix Profiling).
Corrupt Signals: A corrupt signal (left) is distorted beyond use. Compare it to a regular clean ECG (right). Although it is elementary to identify visually, both signals exhibit similar means and variances, and both are stationary over the sequence; that is, they exhibit similar distributional qualities. One may be tempted to design heuristics to detect such signals; however, the sheer variety of possible signal corruptions makes this approach dangerous in production.


Certain signals require more pre-processing than others, and once a signal's defects are identified it is usually straightforward to address the distortion. For example, wavelets and Fourier analysis can smooth and denoise a signal, and inverted signals can be normalised.
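As a sketch of the two fixes just mentioned, the snippet below low-pass filters a signal with a Butterworth filter (a simple stand-in for wavelet or Fourier smoothing; the sampling rate and cutoff are illustrative assumptions, not values from our pipeline) and flips a signal whose dominant deflection points downward:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass(signal, fs=250.0, cutoff=40.0, order=4):
    """Attenuate high-frequency noise with a zero-phase Butterworth low-pass filter."""
    b, a = butter(order, cutoff / (0.5 * fs))
    return filtfilt(b, a, signal)

def correct_inversion(signal):
    """Flip the signal if its dominant deflection points downward.
    In a correctly oriented ECG the R-peak is the largest excursion from
    the baseline, so a larger negative excursion suggests inversion."""
    centred = signal - np.median(signal)
    if np.abs(centred.min()) > np.abs(centred.max()):
        return -signal
    return signal
```

These are deliberately crude heuristics; real inversion detection should inspect the QRS morphology rather than a single amplitude statistic.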

If we can automatically detect a signal's defects, we can disregard bad data and efficiently personalise the processing of good (but distorted) data.

To this end, we employ algorithms that:

  1. Capture distributional shift and nonstationarity.
  2. Map latent functions to matrix profiles.
  3. Decouple generative functions to denoise signals.
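The first item, capturing distributional shift and nonstationarity, can be approximated by comparing summary statistics across consecutive windows. The article does not show the actual algorithm, so the following is only a minimal numpy sketch (the window size is an arbitrary assumption):

```python
import numpy as np

def rolling_shift_score(signal, window=250):
    """Score nonstationarity by comparing the mean and spread of
    consecutive windows: a stationary signal yields window means that
    barely move relative to the within-window variability."""
    n = len(signal) // window
    chunks = np.reshape(signal[: n * window], (n, window))
    means, stds = chunks.mean(axis=1), chunks.std(axis=1)
    # Spread of the window means, normalised by typical within-window spread.
    # Larger values => stronger drift / distributional shift.
    return np.std(means) / (np.mean(stds) + 1e-9)
```

A threshold on such a score could flag segments for exclusion; formal stationarity tests (e.g. augmented Dickey–Fuller) are the more rigorous alternative.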

Beyond these, we have begun to explore the utility of Kernel Density Estimates over a fixed sample to quantify nuanced distributional characteristics.

Kernel Density Estimates (KDEs)

KDEs are flexible, non-parametric estimates of the distribution of a random variable. Just as kernels are used for data smoothing elsewhere in Machine Learning, KDEs smooth the distribution of a set of data with minimal hyperparameters.

A KDE function f for some kernel K fit to a set of random (assumed i.i.d) samples x.
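For reference, the standard kernel density estimator over samples x_1, …, x_n takes the form below (the bandwidth symbol h is an assumption here, as the original equation image is not reproduced):

```latex
\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)
```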

Although KDEs assume i.i.d. data (an assumption that is clearly violated in signal processing), we can mitigate the danger of this violation by restricting attention to stationary signals. A KDE allows far greater flexibility than a parametric statistical distribution: by forgoing the convenience of distributional parameterisation, one gains granularity and flexibility.

Raw signal (left) and sample histogram with an overlaying KDE (right).
A KDE fit to the data (left) vs a Gaussian distribution (right). It is clear to see that the KDE captures both much greater granularity, as well as the inherent asymmetry in the data.
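The comparison in the figure can be reproduced in a few lines. This is a minimal sketch using scipy's `gaussian_kde` on a synthetic right-skewed sample (a stand-in, since the original data is not shown): on skewed data the KDE should assign higher held-out likelihood than a single Gaussian.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# A right-skewed sample, standing in for an ECG amplitude distribution.
sample = rng.gamma(shape=2.0, scale=1.0, size=1000)

# Non-parametric fit vs a single fitted Gaussian.
kde = stats.gaussian_kde(sample)
mu, sigma = sample.mean(), sample.std()

# Mean log-likelihood on held-out data drawn from the same distribution.
held_out = rng.gamma(shape=2.0, scale=1.0, size=1000)
ll_kde = np.mean(np.log(kde(held_out)))
ll_gauss = np.mean(stats.norm.logpdf(held_out, mu, sigma))
```

The gap between `ll_kde` and `ll_gauss` quantifies how much asymmetry and granularity the Gaussian misses.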

Modeling the KDE Output

After fitting KDEs, one can use the CDF (cumulative distribution function) to identify peaks and troughs in the data. These extrema are then used as features to compute the probability of inversion π, distributional shift and excess noise.
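The article does not spell out the extrema-detection step, but one plausible sketch is to evaluate the fitted density on a grid and locate its local maxima and minima (here on the density rather than the CDF, using `scipy.signal.find_peaks`; the grid size is an arbitrary choice):

```python
import numpy as np
from scipy import stats
from scipy.signal import find_peaks

def density_extrema(sample, grid_size=512):
    """Locate peaks and troughs of a KDE fit to the sample.
    Returns the grid positions of the density's local maxima and minima."""
    grid = np.linspace(sample.min(), sample.max(), grid_size)
    density = stats.gaussian_kde(sample)(grid)
    peaks, _ = find_peaks(density)       # local maxima of the density
    troughs, _ = find_peaks(-density)    # local minima of the density
    return grid[peaks], grid[troughs]
```

On a bimodal amplitude distribution this returns the two mode locations and the valley between them, which can then be fed downstream as features.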

Here are some examples of how this simple algorithm detects extrema, which are then used to generate features:


