Intelligent Hypothesis Generating

How to use modular data architectures to efficiently generate & test hypotheses.

Here I provide a paradigm for handling complexity through modular design.

Example Problem Background

Machine Learning Engineering

One should design every element of their pipeline as an independent module, so that components can be interchanged quickly & seamlessly.
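As a rough illustration of what interchangeable components can look like, here is a minimal sketch in which every pipeline step is a plain function with the same signature, so steps can be swapped or reordered without touching the rest of the code. The names `centre`, `rescale` and `run_pipeline` are hypothetical and chosen only for this example.

```python
from typing import Callable, List, Sequence

# Assumption for this sketch: a pipeline step is any function that takes
# a list of floats and returns a list of floats.
Step = Callable[[List[float]], List[float]]


def centre(values: List[float]) -> List[float]:
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def rescale(values: List[float]) -> List[float]:
    """Scale values to the range [0, 1]; constant input maps to zeros."""
    low, high = min(values), max(values)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]


def run_pipeline(values: List[float], steps: Sequence[Step]) -> List[float]:
    """Apply each step in order; steps are independent and swappable."""
    for step in steps:
        values = step(values)
    return values


raw = [2.0, 4.0, 6.0]
centred = run_pipeline(raw, [centre])          # one module
scaled = run_pipeline(raw, [rescale])          # swapped for another
both = run_pipeline(raw, [centre, rescale])    # or chained together
```

Because every step shares one signature, trying a different transformation is a one-line change to the list passed to `run_pipeline`.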

Solution

Data Science Process

1. Exploratory Data Analysis (EDA)
2. Model-Free Analysis
3. Theoretical Discussion
4. Model-Based Analysis

Requirements

1. The ability to access the data intuitively.
2. The ability to define relationships in the data as required.
3. The ability to add functionality (methods) as required.
4. The ability to rapidly test & experiment as required.

Solution

Class Structure

- Hosts the raw data
- Provides a series of methods to transform & visualize the data
- Defines the relationships between the data types
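The three responsibilities above could be sketched roughly as follows. This is not the actual class from the project, only a minimal stand-in: the name `Dataset`, its methods, and the example columns are all assumptions made for illustration, and the "visualization" is a simple text bar chart so the example stays dependency-free.

```python
from typing import Callable, Dict, List, Tuple


class Dataset:
    """Hypothetical container: hosts data, transforms it, records relationships."""

    def __init__(self, columns: Dict[str, List[float]]) -> None:
        self.columns = dict(columns)                      # hosts the raw data
        self.relationships: List[Tuple[str, str]] = []    # declared links

    def __getitem__(self, name: str) -> List[float]:
        """Access a column by name, e.g. data["height"]."""
        return self.columns[name]

    def relate(self, source: str, target: str) -> None:
        """Record that two existing columns are related."""
        for name in (source, target):
            if name not in self.columns:
                raise KeyError(name)
        self.relationships.append((source, target))

    def transform(self, name: str, func: Callable[[float], float],
                  new_name: str) -> None:
        """Store func applied to every value of a column as a new column."""
        self.columns[new_name] = [func(v) for v in self.columns[name]]

    def bars(self, name: str, width: int = 10) -> str:
        """Draw a text bar chart; assumes non-negative values."""
        values = self.columns[name]
        top = max(values)
        lines = []
        for v in values:
            length = round(width * v / top) if top > 0 else 0
            lines.append("#" * length)
        return "\n".join(lines)


# Made-up example data, for illustration only.
data = Dataset({"height": [1.0, 2.0, 4.0], "weight": [10.0, 20.0, 40.0]})
data.transform("height", lambda v: v * 100, "height_cm")
data.relate("height", "weight")
print(data.bars("height", width=4))
```

Keeping data, methods and relationships in one object means a new transformation or plot is added as one more method, without changing how the data is accessed.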

Results

1. Test the relationship between any variable set in the data.
2. Build intuition about potential correlations in the data.
3. Visualize & express dependencies in the data.
4. Select a colour scheme - because things should be pretty ;)
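To make the first two results concrete, here is a small sketch of testing the relationship between every pair of variables. It assumes the data is available as named columns of equal length and uses the Pearson correlation as the test; the function names and the example columns are hypothetical, and any other measure of association could be swapped in.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(
    columns: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Correlate every unordered pair of columns."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Made-up columns, for illustration only.
columns = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "double_x": [2.0, 4.0, 6.0, 8.0],
    "minus_x": [4.0, 3.0, 2.0, 1.0],
}
results = pairwise_correlations(columns)
for pair, r in results.items():
    print(pair, round(r, 3))
```

Scanning all pairs like this is only a way to build intuition; a high correlation is a candidate hypothesis to examine further, not a conclusion.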

Next Steps

Afterthought

Statistician, scientist, technologist: writing about stats, data science, math, philosophy, poetry & any other flavours that occupy my mind.