How Uber uses machine learning to achieve hyper-growth in sales.

Application

Uber aims to design targeted marketing campaigns. To do so, it builds models that classify users by the likelihood of one of three events:

  • User acquisition: how likely a new individual is to purchase a product.
  • Cross-selling or up-selling: how likely an existing user is to purchase a related product.
  • User churn: how likely an existing user is to cancel the purchase.

Problem

With thousands of features per individual and a very large dataset, this seems like a simple task for deep learning. In this application, however, that approach is unsuitable for several reasons:

Mathematically

  • Parameters are indiscernible from one another.
  • The feature space is sparse, which leads to overfitting.
  • Compute needed grows exponentially.
  • Model interpretation & diagnostics are cumbersome.

Pragmatically

Marketers need to use the data to design succinct, actionable strategies based on individual attributes. Verbose, superfluous information ought to be filtered out.

Solution: mRMR

mRMR (minimum redundancy, maximum relevance) is a feature selection algorithm: it favours features that share high mutual information with the response and low mutual information with the features already selected. It takes:

  • A matrix of m features: X
  • A mutual information function: I(.)
  • A response class vector: Y

Several variants adapt the algorithm to the data at hand:

  • Replacing integrals with summations for discrete features.
  • Replacing the quotient with a subtraction where scale-invariant features are used.
  • Replacing the probability density estimates p(x,y), p(y) or p(x) with statistical tests (t-tests and/or F-tests) where severe computational limitations arise.
  • Capturing non-linear feature dependency by mapping the data to a linearly separable space with kernel functions.
  • If the downstream classification framework is pre-determined, it may be pragmatic to use its accuracy metric as a proxy for mutual information.
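To make the idea concrete, here is a minimal sketch of greedy mRMR for discrete features. It assumes the standard formulation: relevance is I(feature; Y), redundancy is the mean mutual information between a candidate and the features already selected, and the two are combined either as a quotient or as a difference. The function names, the `scheme` parameter and the toy data are all made up for illustration; this is not Uber's implementation.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    # Summation over observed (x, y) pairs replaces the integral.
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def mrmr_select(columns, y, k, scheme="quotient"):
    """Greedily pick k feature names from `columns` (name -> list of values)."""
    relevance = {name: mutual_information(col, y) for name, col in columns.items()}
    # Start with the single most relevant feature.
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(columns)):
        best_name, best_score = None, float("-inf")
        for name, col in columns.items():
            if name in selected:
                continue
            # Mean mutual information with the features already chosen.
            redundancy = sum(
                mutual_information(col, columns[s]) for s in selected
            ) / len(selected)
            if scheme == "difference":
                score = relevance[name] - redundancy
            else:
                # Small constant (an assumption) avoids division by zero.
                score = relevance[name] / (redundancy + 1e-12)
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Toy data (made up): y = a OR c, a_dup is a copy of a, d is pure noise.
toy = {
    "a":     [0, 0, 0, 0, 1, 1, 1, 1],
    "a_dup": [0, 0, 0, 0, 1, 1, 1, 1],
    "c":     [0, 0, 1, 1, 0, 0, 1, 1],
    "d":     [0, 1, 0, 1, 0, 1, 0, 1],
}
y = [0, 0, 1, 1, 1, 1, 1, 1]
picked = mrmr_select(toy, y, k=2, scheme="difference")
```

With k=2, both schemes pick a and c on this toy data: a_dup is exactly as relevant as a, but it is fully redundant once a has been chosen, so it is passed over in favour of the independent feature c.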

Empirical Findings

Many variants were tested. Feature relevance decayed exponentially, albeit at varied rates, across the different models.

Production Implementation

Uber then put this methodology into production by implementing the underlying algorithm as a module in one of its automated machine learning pipelines.

Conclusion

Next time you order your Uber, or dig into Uber Eats, just remember it probably has less to do with your decision-making & more to do with your mutual information with sales channels 😉.

Zach Wolpe