Zach Wolpe

Mar 4, 2021

9 min read

A Reductionist Framework for Human Happiness

∞ A stochastic process that parameterizes an uncountably infinite set of random variables - to solve for happiness.

Sometimes existence feels deplorable, other times, bliss. We cannot improve what we do not measure — so why not reverse engineer human emotion?

Formalizing Happiness

Before we can capture any natural phenomenon — which is infinitely complex — with some statistical framework, we need to formalize the search space (imposing simplifying constraints). One can, in fact, consider statistical inference an attempt to learn a simplifying, smooth approximation of reality.

Suppose we accept the fallacies of the scientific doctrine, explicitly & axiomatically making simplifying assumptions about the nature of happiness. For brevity, suppose we work in the limit across individuals (looking at average happiness over some population).

First, one needs to make a case for modeling happiness as a stochastic function.

Suppose we consider happiness a latent representation that generates the distribution over a sequence of observable personal and physiological attributes — which in turn updates by querying reality (Bayesian reinforcement learning).

Suppose, further, that an individual’s happiness priors are indiscernible convolutions over one’s biological, cultural, contextual, neurological & evolutionary constituents — further distorted by some stochastic intervention. This leaves one’s prior an enigma & instead we focus primarily on the process’s corollaries — current happiness (in Bayesian terms: the evidence).

We can now make a few further assumptions, to make modeling feasible:

Sinusoidal Functions: Emotional Battery

It is not implausible that one’s happiness is the convolution of a series of periodic (perhaps sinusoidal) functions. Consider your own life: is it not reasonable to assume your happiness is driven by fluctuations in a series of personal and environmental attributes? Your happiness could be a set of basis functions:

  • Career progress.
  • The state of one’s physical & mental health.
  • The quality of one’s friendships & romantic relationships.
  • The depth of one’s philosophical or religious connections.
  • One’s sense of purpose & community.
  • One’s sense of novelty & excitement.

…as well as any other process that may interact with one’s happiness. Most of these can be considered inherently cyclical, by virtue of adjusting one’s expectations and rebalancing one’s baseline. This cyclicity also feels intuitively accurate, as our goals/status/desires/priorities adjust throughout life.


If we consider the notion of a hedonic treadmill, we arrive at an implicit assumption of stationarity by the nature of this cyclicity. Our modeling framework need not be limited to a stationary process, as we can frame one’s life as the length of the process — implying stationarity over individuals but not within individuals.


Undoubtedly an intuitive observation: our progress & satisfaction through life ebb and flow as we enter new stages, encounter new challenges, etc. A natural cyclicity governs each stage of our cosmic navigation — the only constant is change. Thus we have every reason to believe that a cyclical model would best capture the conditions that govern one’s well-being.

Interactive Complexity

Although I make my case for non-linearity here, this too finds its grounding in intuition: the human psyche is undoubtedly complex, swayed by emergent properties and butterfly effects.

Defining Happiness

Note that I avoid providing explicit definitions of happiness — an insurmountable task. I justify this avoidance by two mechanisms:

  1. We consider happiness a latent manifestation of simpler, observable phenomena (relationship satisfaction, career growth, spiritual enlightenment, level of societal acceptance, etc.) — thus quantifying these attributes correlates highly with happiness. Let’s call these attributes one’s Emotional Battery.

  2. We take a reinforcement learning approach, where one need not know the true state of the world (happiness level) but rather needs to be able to query the world (take stock of one’s emotional battery) to update our model.

Sampling Happiness

In the past this may have been a very theoretical exercise; this is no longer the case. Our cyborg-like dependence on technology allows one to gauge physiological responses almost continuously. One could also engage with psychometrics and other mechanisms to gauge elements of the emotional battery.

We can now generate data by sampling human physiology, emotional regulation, satisfaction and well-being.

Data Generation Summary

For clarity, here are the assumptions we’re making:

  1. One’s happiness can be adequately represented by a set of observable metrics (work/relationship satisfaction, health, blood pressure (think nonlinearly), level of novelty/excitement or peace/belonging, etc.) — we call these attributes one’s Emotional Battery.
  2. One’s Emotional Battery is readily observable through wearables, psychometrics, discussions, etc — thus we are able to generate data on one’s emotional state!
  3. We can represent this Battery as a stationary process of cyclical, non-linear, highly interactive basis functions (this adds few restrictions while allowing for a sound mathematical framework).
  4. We can now model this data by defining some statistical machinery ;).

________________________________ xox _______________________________

Gaussian Processes

Note: Feel free to skip to ‘Pyro Model’ if you do not care for the statistical learning theory.

A Gaussian Process (GP) is a stochastic process — a collection of random variables indexed by a continuum — such that each finite collection of random variables has a multivariate normal distribution.

The distribution of the GP is the joint distribution over infinitely many random variables — as such, it is a distribution over functions on a continuous domain.
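Concretely: restrict the GP to any finite grid of inputs and you recover an ordinary multivariate normal, whose draws are functions evaluated on that grid. A minimal NumPy sketch (the kernel & grid below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Any finite set of index points yields a multivariate normal.
x = np.linspace(0, 5, 100)
K = np.exp(-(x[:, None] - x[None, :])**2 / 2)   # RBF covariance (lengthscale 1)
K += 1e-8 * np.eye(len(x))                      # jitter for numerical stability

# Each draw is one "function", evaluated at the chosen grid points.
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
```

Plotting each row of `samples` against `x` shows three smooth random functions — finite snapshots of the distribution over functions.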

There are many mathematically sound approaches to deriving GPs, two mainstream approaches are the Weight Space View and the Function Space View.

Weight Space View

A GP can be considered a Bayesian treatment of mapping the data to a high (possibly infinite) dimensional feature space & learning the linear mapping between the features and the response. Note: the ‘kernel trick’ from applied math is used to avoid explicit, expensive computations in the high-dimensional feature space.

Model + Log-Likelihood

Suppose we take the simple linear model, with additive Gaussian noise, and wish to learn the function f(x):

$$f(\mathbf{x}) = \mathbf{x}^\top \mathbf{w}, \qquad y = f(\mathbf{x}) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma_n^2)$$

By assuming normal noise and independence, we derive the likelihood of the data:

$$p(\mathbf{y} \mid X, \mathbf{w}) = \prod_{i=1}^{n} \mathcal{N}\big(y_i \mid \mathbf{x}_i^\top \mathbf{w},\ \sigma_n^2\big) = \mathcal{N}\big(X\mathbf{w},\ \sigma_n^2 I\big)$$
Fitting the parameters of a Bayesian model requires a prior (this is where the model becomes a GP). We place a zero-mean normal prior, with covariance Σₚ, on the weights:

$$\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \Sigma_p)$$
Bayesian inference in the linear model is based on the posterior distribution over the weights:

$$p(\mathbf{w} \mid X, \mathbf{y}) = \frac{p(\mathbf{y} \mid X, \mathbf{w})\, p(\mathbf{w})}{p(\mathbf{y} \mid X)}$$

where the normalizing constant (marginal likelihood) is given by:

$$p(\mathbf{y} \mid X) = \int p(\mathbf{y} \mid X, \mathbf{w})\, p(\mathbf{w})\, d\mathbf{w}$$

which can be ignored, as it contains no information about the weights (& is intractable in general).

Instead, we solve for the likelihood × prior & derive:

$$p(\mathbf{w} \mid X, \mathbf{y}) \propto \exp\!\Big(-\tfrac{1}{2}(\mathbf{w} - \bar{\mathbf{w}})^\top A (\mathbf{w} - \bar{\mathbf{w}})\Big)$$

Under examination, one notices that this takes the form of a Gaussian with

$$A = \sigma_n^{-2} X^\top X + \Sigma_p^{-1}, \qquad \bar{\mathbf{w}} = \sigma_n^{-2} A^{-1} X^\top \mathbf{y}$$

Thus we find the parameters/weights to be normally distributed:

$$p(\mathbf{w} \mid X, \mathbf{y}) = \mathcal{N}(\bar{\mathbf{w}},\ A^{-1})$$

This is equivalent to ridge regression in the non-Bayesian sense, where the negative log prior is treated as a penalty term.

Prediction is made by integrating over all possible parameter values, rather than committing to a single point estimate:

$$p(f_* \mid \mathbf{x}_*, X, \mathbf{y}) = \int p(f_* \mid \mathbf{x}_*, \mathbf{w})\, p(\mathbf{w} \mid X, \mathbf{y})\, d\mathbf{w} = \mathcal{N}\big(\mathbf{x}_*^\top \bar{\mathbf{w}},\ \mathbf{x}_*^\top A^{-1} \mathbf{x}_*\big)$$

In the symmetric case, the mean & mode of this posterior coincide with the MAP estimate, used for prediction. The posterior also gives one credibility intervals over the function — we are more confident in data-rich regions.
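The weight-space derivation can be checked numerically. Below is a minimal NumPy sketch of the posterior over the weights — the data, true weights & noise level are all invented for illustration, and X stores one observation per row:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a known linear model: y = x^T w_true + noise.
n, d = 200, 2
w_true = np.array([1.5, -0.7])
sigma_n = 0.1                                  # observation-noise std
X = rng.normal(size=(n, d))
y = X @ w_true + sigma_n * rng.normal(size=n)

# Zero-mean Gaussian prior on the weights: w ~ N(0, Sigma_p).
Sigma_p = np.eye(d)

# Posterior over the weights: N(w_bar, A^{-1}), with
#   A     = sigma_n^{-2} X^T X + Sigma_p^{-1}
#   w_bar = sigma_n^{-2} A^{-1} X^T y
A = X.T @ X / sigma_n**2 + np.linalg.inv(Sigma_p)
A_inv = np.linalg.inv(A)
w_bar = A_inv @ X.T @ y / sigma_n**2

# Predictive distribution at a test input x*: N(x*^T w_bar, x*^T A^{-1} x*).
x_star = np.array([1.0, 1.0])
pred_mean = x_star @ w_bar
pred_var = x_star @ A_inv @ x_star
```

With enough data the posterior mean `w_bar` concentrates near `w_true`, and `pred_var` shrinks in data-rich regions — exactly the behaviour the derivation above promises.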

Non-linear Basis Functions

This applies to the simple linear model but is readily extended to flexible basis functions by transforming the data into a high-dimensional feature space. Suppose some feature map φ is applied to map the data into that space; we derive the analogous result:

$$p(f_* \mid \mathbf{x}_*, X, \mathbf{y}) = \mathcal{N}\big(\phi(\mathbf{x}_*)^\top \bar{\mathbf{w}},\ \phi(\mathbf{x}_*)^\top A^{-1} \phi(\mathbf{x}_*)\big), \qquad A = \sigma_n^{-2}\, \Phi^\top \Phi + \Sigma_p^{-1}$$

where Φ is the matrix of transformed inputs — the result simply reflects the mapping in the transformed space.
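The kernel trick in miniature: for a toy polynomial feature map (chosen arbitrarily here), the kernel evaluated directly on the raw inputs equals the inner product in feature space — without ever forming the features explicitly:

```python
import numpy as np

# Toy feature map phi(x) = [1, x, x^2] — an arbitrary choice for illustration.
def phi(x):
    return np.array([1.0, x, x**2])

# The corresponding kernel, computed directly on the raw inputs.
def k(x, xp):
    return 1.0 + x * xp + (x * xp) ** 2

x, xp = 0.3, -1.2
direct = phi(x) @ phi(xp)    # inner product in feature space
via_kernel = k(x, xp)        # the same quantity, via the kernel trick
```

For infinite-dimensional feature maps (as with the RBF kernel), the left-hand computation is impossible while the right-hand one remains cheap — which is precisely why GPs are practical.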

Function Space View

One can think of a Gaussian process as defining a distribution over functions, and inference taking place directly in the space of functions.

The function space approach is beautiful, nontrivial, & is described here.

________________________________ xox _______________________________

Pyro Model

Our GP is defined by its covariance — where K(x, x′) is some kernel function, chosen here to be the radial basis function (RBF) kernel:

$$K(x, x') = \sigma^2 \exp\!\Big(-\frac{(x - x')^2}{2\lambda^2}\Big)$$

The model is defined by applying the kernel function to construct the covariance matrix. The σ & λ parameters are hyper-parameters.
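A minimal NumPy version of the RBF kernel (the σ & λ values below are arbitrary):

```python
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0, lam=0.5):
    """K(x, x') = sigma^2 * exp(-(x - x')^2 / (2 * lam^2))."""
    diff = x1[:, None] - x2[None, :]          # all pairwise differences
    return sigma**2 * np.exp(-diff**2 / (2 * lam**2))

x = np.linspace(0, 1, 5)
K = rbf_kernel(x, x)
# K is symmetric, has sigma^2 on the diagonal, and decays with distance.
```

λ (the lengthscale) controls how quickly correlation between points decays with distance, and σ² sets the prior variance of the function values.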

Here’s a primer on working with Pyro:

Happiness Data Generation

We wish to model happiness, by decomposing our emotional fluctuations — Emotional Battery — into a series of basis functions and quantifying some confidence by learning some distribution over the data generating process.

Suppose your emotional battery comprises:

  1. Work performance
  2. Sleeping cycle
  3. Closeness of relationships
  4. Strength of faith

Whatever you like. These meta-characteristics interact to produce one’s aggregate happiness.

Sampling: In reality, one would want to sample such a battery by measuring an individual’s physical, emotional & physiological state periodically. As this is a theoretical exercise, we instead assume the underlying functions.

In the code below we generate one’s AHP (Aggregate Happiness Process) by convolving a series of Emotional Battery functions (which we represent as sinusoidal basis functions).
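The generation step can be sketched as follows — the battery components, amplitudes, frequencies & phases below are all invented for illustration, and a simple sum of the basis functions stands in for the convolution:

```python
import numpy as np

# Hypothetical Emotional Battery: each component is a sinusoidal basis
# function; amplitudes, frequencies & phases are arbitrary choices.
t = np.linspace(0, 10, 500)
battery = {
    "work performance":  0.8 * np.sin(1.0 * t + 0.0),
    "sleeping cycle":    0.5 * np.sin(2.3 * t + 1.0),
    "relationships":     1.0 * np.sin(0.6 * t + 2.0),
    "strength of faith": 0.4 * np.sin(1.7 * t + 0.5),
}

# Aggregate Happiness Process: the combined battery signals.
ahp = sum(battery.values())
```

Plotting each battery component faintly and `ahp` boldly reproduces the figure described below.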

Call the above (data instance):

Our AHP: a process that governs one’s happiness at any given time. The dark blue line is the final (convolved) Aggregate Happiness Process. The fainter, dotted, functions are one’s emotional battery (when combined produce the AHP). Access to this underlying process provides insight into one’s emotional state.

Now that we have our true (target) function, we wish to generate some function (parameterized by a Gaussian Process) that models the process. In reality, we do not have access to these underlying functions but can only sample one’s happiness — constituting one’s AHP + noise. This is done by:

  • Administering surveys
  • Measuring physical, psychological & emotional conditions, etc.

As such we only obtain samples.

We can generate data from this process (by sampling with noise). Suppose we’re studying your happiness & sample your behaviour/condition over time. We might end up with some data that looks like this:
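A sketch of this sampling step — the target function, sample size & noise level are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def ahp(t):
    # Stand-in for the Aggregate Happiness Process (any sum of sinusoids).
    return np.sin(t) + 0.5 * np.sin(3 * t)

# Sample the process at random times with observation noise — mimicking
# surveys & wearable measurements of the Emotional Battery.
n_obs, noise_std = 40, 0.2
t_obs = np.sort(rng.uniform(0, 10, n_obs))
y_obs = ahp(t_obs) + noise_std * rng.normal(size=n_obs)
```

The pairs `(t_obs, y_obs)` are all a modeler would ever observe — noisy, irregularly spaced glimpses of the underlying process.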

Sampled data from the target function. Representing what we might actually observe in an individual.

Model Fitting in Pyro!

Now that we have data, we can fit a GP (Gaussian Process) to it to approximate the target distribution.

GPs are flexible, adaptive, frameworks that allow us to model arbitrarily complex functions by mapping the covariance matrix to a higher dimensional feature space (via the kernel trick) to learn smooth function mappings.

Fun Fact: GPs were invented by a South African! Lekker math bru! 🎉

Optimization Methodology + Loss Function

The Adam optimizer is used off the shelf to minimize the loss. The loss utilized is the ELBO (Evidence Lower Bound) — an objective designed for approximating intractable posterior distributions (Variational Inference).

Here we implement the GP in Pyro:
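The original fit uses Pyro’s GP machinery, trained with Adam & the ELBO. As a self-contained sketch of what that model converges toward, here is the exact GP posterior in closed form with NumPy — kernel hyper-parameters fixed by hand rather than learned, and the target function invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, sigma=1.0, lam=0.5):
    # RBF kernel: sigma^2 * exp(-(a - b)^2 / (2 * lam^2))
    return sigma**2 * np.exp(-(a[:, None] - b[None, :])**2 / (2 * lam**2))

# Noisy samples of the (assumed) Aggregate Happiness Process.
f = lambda t: np.sin(t) + 0.5 * np.sin(3 * t)
noise = 0.1
t_train = np.sort(rng.uniform(0, 10, 50))
y_train = f(t_train) + noise * rng.normal(size=50)

# Exact GP posterior at test points:
#   mean = K_s^T (K + noise^2 I)^{-1} y
#   cov  = K_ss - K_s^T (K + noise^2 I)^{-1} K_s
t_test = np.linspace(0, 10, 200)
K = rbf(t_train, t_train) + noise**2 * np.eye(len(t_train))
K_s = rbf(t_train, t_test)
K_ss = rbf(t_test, t_test)
K_inv = np.linalg.inv(K)
mean = K_s.T @ K_inv @ y_train
cov = K_ss - K_s.T @ K_inv @ K_s
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))   # credibility band

# Far from any data the posterior reverts to the prior variance (sigma^2) —
# the "vast uncertainty" in data-sparse regions discussed below.
t_far = np.array([25.0])
var_far = (rbf(t_far, t_far)
           - rbf(t_train, t_far).T @ K_inv @ rbf(t_train, t_far))[0, 0]
```

Plotting `mean` with a band of `mean ± 2 * std` against the true function and the samples reproduces the figure described below. Pyro’s variational fit additionally learns σ, λ & the noise level, rather than fixing them.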

Blue line: the true data-generating process (target function). Red line: the predictive model (samples from our GP). Purple area: credibility (confidence) intervals around our mean estimate (the red line). Purple dots: the data.


The model fit is brilliant.

The model (red line) is smooth, yet simultaneously maps the underlying process (blue line) almost perfectly. The fit allows us to infer a full distribution over the function — capturing the credibility intervals (confidence) over the function space. One can now use the model to generate data, make predictions (with confidence estimates), deconvolve factors, conduct inference & hack the underlying structure as they see fit :P.

Cracking GPs

To build some intuition of how GPs work, why they’re advantageous & what they offer, consider the special cases below. Note the change in variance (confidence) & smoothness over the various data specifications.

Left: In a data-sparse environment, the model adequately (and intuitively) learns a smoother functional fit. Right: In areas with insufficient data (no samples) the model captures vast uncertainty.
Left: the data-generating process produces many samples in the first half of the function & very few over the second half. A single model is fit over both the data-rich & data-sparse regions: note the far smoother, less confident fit over the latter continuum.

Philosophical Consideration

If one lives with purpose — often to serve a deity or to form an integral part of some community — happiness will map accordingly. This is subsumed by a more abstract framework under which one’s self-worth is defined by a biased personal value judgment: one’s progression & status measured against some arbitrary personal/biological/cultural definition of virtue. Whilst this is inherently a moving target, one may abstract further to meta-principles that govern a level of peace & personal acceptance: via some linkage between acquiring one’s innate desires (love, happiness, acceptance, community, exploration, process, etc.) and how well one’s forecast expectations map to reality over some set of goals.

By this token one might quantify happiness as a finite sample in Hilbert space that captures the human experience.

Sit with Camus. Revel in the absurd. Dance & laugh whilst lending Sisyphus a hand.


Full implementation available here.