The most commonly known approach to Predictive Modeling is linear regression, where a prediction is made from one or more predictor variables, weighted by constant coefficients. The coefficients are typically fitted by Maximum Likelihood, which yields a single best estimate, and this has several downsides. Often in business, it is not sufficient to answer "what's the best prediction?"; one also needs to know how much better that prediction is than the alternatives. For example, if a coin is biased, it is more important to find out the chances of heads vs. tails than simply calling heads or tails for the next flip. In other words, a complete predictive model should yield a distribution of predictions, not a single answer.

**This brings us to the famous Bayesian statistics.**

## Bayesian Statistics

In the Bayesian philosophy, all events are governed by some underlying statistical model. The process of inference is to determine which of many possible models is the most accurate by drawing samples from experiments. Keep in mind that the model itself is invisible; only experimental data can be observed directly.

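**The Bayesian formula is shown below with its four components.**

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

Here $P(H \mid E)$ is the posterior, $P(E \mid H)$ the likelihood, $P(H)$ the prior, and $P(E)$ the evidence.

The four components have simple interpretations: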

- **Posterior:** the new inferred model with the benefit of observed experimental data; e.g., the effect of price cuts on conversion rates, taking into account a recent market study.
- **Likelihood:** the probability of observing the data (aka Evidence) for a given candidate model H (aka Hypothesis); e.g., conversion rates under the current assumption about the price cut.
- **Prior:** the presumed probability of the underlying models, usually supplied by educated guesses; e.g., the current assumption of how a price cut affects conversion.
- **Evidence:** the observed experimental data (e.g., conversion/no conversion).
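To make the four components concrete, here is a minimal sketch that applies the formula to a discrete set of candidate models. The two candidate conversion rates, the 50/50 prior, and the "3 conversions out of 100 visitors" evidence are hypothetical numbers chosen for illustration, and the binomial likelihood is an assumption rather than something prescribed above.

```python
from math import comb


def binomial_likelihood(rate, conversions, visitors):
    """P(E | H): probability of the observed conversions if the true rate is `rate`."""
    return (comb(visitors, conversions)
            * rate ** conversions
            * (1.0 - rate) ** (visitors - conversions))


def posterior(prior, likelihood):
    """Apply Bayes' formula over a discrete set of hypotheses.

    prior:      dict mapping hypothesis -> P(H)
    likelihood: dict mapping hypothesis -> P(E | H) for the observed evidence
    """
    # Numerator of the formula for each hypothesis: P(E | H) * P(H)
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    # Denominator P(E): the numerators summed over all hypotheses
    evidence = sum(unnormalized.values())
    return {h: value / evidence for h, value in unnormalized.items()}


# Hypothetical candidate models: the price cut yields a 2% or a 5% conversion rate.
rates = {"rate_2pct": 0.02, "rate_5pct": 0.05}
prior = {"rate_2pct": 0.5, "rate_5pct": 0.5}

# Hypothetical evidence from a market study: 3 conversions out of 100 visitors.
likelihood = {h: binomial_likelihood(r, 3, 100) for h, r in rates.items()}

post = posterior(prior, likelihood)
for hypothesis, probability in post.items():
    print(f"{hypothesis}: {probability:.3f}")
```

With only two hypotheses the denominator is a simple sum; the difficulty discussed later arises when the hypotheses form a continuum and the sum becomes an integral.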

This innocuous-looking formula triggered an entire field of research which consumed some of the best minds for centuries, with the single intent of solving for the posterior, whether analytically, numerically, or by means of computer simulations. With the posterior distribution solved, we can make predictions and derive any other metric of interest, such as the expectation value of some indicator and the margin of error of any given prediction.

From a practical point of view, the business advantage of adopting the Bayesian approach is plainly obvious: the model keeps getting better as one gets more feedback.

With each iteration, the posterior reflects the real-world situation more closely and leads to more accurate predictions of the outcome.
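The iteration described here can be sketched with a Beta distribution over a conversion rate, where yesterday's posterior becomes today's prior. The Beta prior, the flat Beta(1, 1) starting point, and the three feedback batches are assumptions made for this illustration; the article does not commit to a particular model.

```python
import math


def update_beta(alpha, beta, conversions, non_conversions):
    """Posterior of a Beta(alpha, beta) prior after observing new binary outcomes."""
    return alpha + conversions, beta + non_conversions


def beta_mean(alpha, beta):
    """Expected conversion rate under Beta(alpha, beta)."""
    return alpha / (alpha + beta)


def beta_sd(alpha, beta):
    """Standard deviation of Beta(alpha, beta), a rough margin of error."""
    total = alpha + beta
    return math.sqrt(alpha * beta / (total ** 2 * (total + 1)))


# Hypothetical starting point: a flat prior, i.e. no opinion about the rate.
alpha, beta = 1, 1

# Hypothetical feedback batches: (conversions, non-conversions) per campaign.
batches = [(3, 97), (5, 95), (4, 96)]

spreads = [beta_sd(alpha, beta)]
for conversions, non_conversions in batches:
    # The posterior from the previous batch serves as the prior for this one.
    alpha, beta = update_beta(alpha, beta, conversions, non_conversions)
    spreads.append(beta_sd(alpha, beta))
    print(f"mean={beta_mean(alpha, beta):.4f}  sd={beta_sd(alpha, beta):.4f}")
```

In this sketch the spread shrinks after every batch, which is one way to read "the model keeps getting better as one gets more feedback."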

## Hierarchical Bayesian Model

When it comes to business and marketing applications, the problem doesn't end with the distribution of the predictor coefficients. At one level, a company wants to know the market response to its product or service features. However, the market response is not one homogeneous distribution, but rather a distribution of distributions depending on the customer segment.

Understanding customer heterogeneity is clearly a marketing and sales priority. The best approach to capture this level of customer intelligence is by means of random effect parameters. This additional level of modeling not only considers individual customer preferences, but also supports predictions for new customers with no prior purchase history. This multi-level approach is known in the field as the Hierarchical Bayesian Model (HBM) and represents the state of the art in marketing research.
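One way to picture the "distribution of distributions" is as a two-level generative process: a population-level distribution produces each customer's personal conversion rate, and that rate in turn produces the customer's observed purchases. The sketch below simulates such a process; the Beta population distribution, its parameters, and the function names are hypothetical choices for illustration, and it only generates data rather than fitting an HBM.

```python
import random


def simulate_customers(n_customers, n_offers, pop_alpha, pop_beta, seed=0):
    """Draw (personal_rate, purchases) pairs from a two-level model."""
    rng = random.Random(seed)
    customers = []
    for _ in range(n_customers):
        # Upper level: each customer's own rate is a random effect drawn
        # from the population distribution Beta(pop_alpha, pop_beta).
        rate = rng.betavariate(pop_alpha, pop_beta)
        # Lower level: each offer converts with that customer's own rate.
        purchases = sum(rng.random() < rate for _ in range(n_offers))
        customers.append((rate, purchases))
    return customers


def new_customer_expected_rate(pop_alpha, pop_beta):
    """With no purchase history, the prediction falls back on the population mean."""
    return pop_alpha / (pop_alpha + pop_beta)


# Hypothetical population: conversion rates centered near 10%.
customers = simulate_customers(n_customers=5, n_offers=20, pop_alpha=2.0, pop_beta=18.0)
for rate, purchases in customers:
    print(f"personal rate={rate:.3f}  purchases={purchases}/20")

print("new customer:", new_customer_expected_rate(2.0, 18.0))
```

Fitting the model runs this process in reverse: the observed purchases are used to infer both the personal rates and the population parameters at once.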

While solving the standard Bayesian Model is often challenging, the multiple levels of an HBM are that much more daunting. Fortunately, we have a very powerful technique in Markov Chain Monte Carlo (MCMC) simulation, which is virtually without bound in the kinds of models it can be applied to.

## Use of MCMC Simulation

A direct calculation of the posterior involves solving the Bayesian formula. While the numerator terms are simple enough, the denominator represents an integral over all possible parameter states. Except for a narrow set of theoretical work, this integral is generally not in closed form, and not directly solvable. This difficulty is the single obstacle that kept the Bayesian approach largely dormant after its invention in the 1700s.
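Written out, with $H$ standing for a candidate model (its parameter values) and $E$ for the observed evidence, the denominator is the numerator integrated over every candidate. The notation here is an assumption chosen to match the Hypothesis/Evidence labels used earlier:

$$P(E) = \int P(E \mid H')\,P(H')\,dH'$$

When $H'$ ranges over many continuous parameters, as in a hierarchical model with one set of parameters per customer, this becomes a high-dimensional integral that rarely has a closed form.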

Finally, in the 1950s, a group of world-class nuclear physicists realized that by taking the ratio of two posterior probabilities, the denominator drops out, and you get past the integration obstacle. By repeatedly generating new parameter values and accepting or rejecting them based on the probability ratio, you end up with a collection of sample values which closely resembles the actual target distribution. This was the birth of the Metropolis algorithm, which Hastings generalized in 1970 into the Metropolis-Hastings algorithm, one of a handful of MCMC algorithms commonly used today. Unfortunately for the art, there were only a few computers in the world that could run the algorithms back then. It was not until the mid-1980s, with the advent of cheap and fast personal computers, that MCMC became the favorite tool of academic researchers all over the globe.
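The accept/reject loop described above can be sketched in a few lines. This is a minimal random-walk Metropolis sampler with a symmetric proposal, not a production implementation; the target (a conversion rate with a uniform prior and 12 conversions out of 300 visitors), the step size, and the burn-in length are hypothetical choices for illustration.

```python
import math
import random


def log_unnormalized_posterior(rate, conversions, visitors):
    """log of P(E | H) * P(H) with a uniform prior on (0, 1); P(E) is never computed."""
    if not 0.0 < rate < 1.0:
        return -math.inf  # outside the prior's support
    return (conversions * math.log(rate)
            + (visitors - conversions) * math.log(1.0 - rate))


def metropolis(log_target, start, n_samples, step=0.02, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    samples = []
    for _ in range(n_samples):
        # Generate a new parameter value near the current one.
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Ratio of the two posteriors; the shared denominator P(E) cancels.
        accept_probability = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_probability:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Hypothetical evidence: 12 conversions out of 300 visitors.
samples = metropolis(lambda r: log_unnormalized_posterior(r, 12, 300),
                     start=0.05, n_samples=20000)
kept = samples[2000:]  # discard early samples taken before the chain settles
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean of the conversion rate: {posterior_mean:.4f}")
```

For this particular target the exact posterior is known to be Beta(13, 289), so the sample mean can be checked against 13/302; the appeal of the method is that the same loop still works when no such closed form exists.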

## Hidden Markov Model

Having addressed the need to consider individual preferences, we face the next hurdle: a single consumer is still not of a consistent mindset when it comes to making a purchase. The marketing concept of a *Funnel State* captures how a consumer moves from unaware, to uninterested, to actively searching, to comparison shopping, and finally to conversion. Successful messaging to individual consumers must take into account the state of buyer readiness, which, of course, is not directly observable. Most marketing automation systems have this type of mechanism built in, but they depend on human judgement supported by manual scoring.

The Machine Learning technique devoted to detecting hidden states is the Hidden Markov Model (HMM). The accurate estimation of a consumer's psychological state is a powerful way of determining the Next Best Action (NBA), where the seller offers the optimal stimulus (ads, coupons, brochures, etc.) to reach a consumer based on their mental state in addition to the overall profile. In fact, the idea of taking proactive actions to affect the outcome moves the bar beyond Predictive Modeling to Prescriptive Modeling.
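As a rough illustration of hidden-state detection, the sketch below runs the standard HMM forward (filtering) recursion to estimate a consumer's current funnel state from observed behavior. The funnel is reduced to three states for brevity, and every probability, state name, and observation label is a hypothetical value picked for this example; in practice these would be estimated from data.

```python
STATES = ["unaware", "searching", "comparing"]

# Hypothetical probability of each state at the first contact.
INITIAL = {"unaware": 0.80, "searching": 0.15, "comparing": 0.05}

# Hypothetical TRANSITION[a][b] = P(next state is b | current state is a).
TRANSITION = {
    "unaware":   {"unaware": 0.70, "searching": 0.25, "comparing": 0.05},
    "searching": {"unaware": 0.10, "searching": 0.60, "comparing": 0.30},
    "comparing": {"unaware": 0.05, "searching": 0.15, "comparing": 0.80},
}

# Hypothetical EMISSION[s][o] = P(observed behavior o | hidden state s).
EMISSION = {
    "unaware":   {"no_visit": 0.90, "browse": 0.09, "price_check": 0.01},
    "searching": {"no_visit": 0.30, "browse": 0.60, "price_check": 0.10},
    "comparing": {"no_visit": 0.10, "browse": 0.30, "price_check": 0.60},
}


def normalize(weights):
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def filter_states(observations):
    """Forward algorithm: P(current state | all observations so far)."""
    belief = normalize({s: INITIAL[s] * EMISSION[s][observations[0]] for s in STATES})
    for obs in observations[1:]:
        # Predict: push the belief one step through the transition matrix.
        predicted = {s: sum(belief[p] * TRANSITION[p][s] for p in STATES)
                     for s in STATES}
        # Correct: reweight by how well each state explains the new observation.
        belief = normalize({s: predicted[s] * EMISSION[s][obs] for s in STATES})
    return belief


belief = filter_states(["browse", "price_check", "price_check"])
most_likely = max(belief, key=belief.get)
print(most_likely, {s: round(p, 3) for s, p in belief.items()})
```

The resulting belief over states is the kind of input a Next Best Action rule could use, for example sending a coupon only when the "comparing" probability is high.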

An advanced HMM involving this type of stimulus-driven state transition is called a Non-Homogeneous HMM. This level of complexity seems forbidding, but is very much within the capability of the MCMC technique. The combination of HMM and HBM provides the ultimate Fine-Grained Personalization solution for Marketing efforts.
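In a Non-Homogeneous HMM the transition probabilities are no longer constants but functions of the stimulus. One common way to express this, assumed here purely for illustration, is a softmax over baseline scores plus a stimulus effect; the scores, the coupon effect sizes, and the function names below are all hypothetical.

```python
import math


def softmax(logits):
    """Turn arbitrary scores into probabilities that sum to one."""
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def transition_row(base_logits, stimulus_effects, stimulus):
    """Transition probabilities out of one state, given stimulus intensity (0 = none)."""
    return softmax([b + e * stimulus
                    for b, e in zip(base_logits, stimulus_effects)])


# Hypothetical scores for leaving the "searching" state,
# ordered as [to unaware, stay searching, to comparing].
BASE = [-1.0, 1.0, 0.0]
COUPON_EFFECT = [-0.5, 0.0, 1.5]

without_coupon = transition_row(BASE, COUPON_EFFECT, stimulus=0.0)
with_coupon = transition_row(BASE, COUPON_EFFECT, stimulus=1.0)

print("without coupon:", [round(p, 3) for p in without_coupon])
print("with coupon:   ", [round(p, 3) for p in with_coupon])
```

In a full model, effect sizes like `COUPON_EFFECT` would be unknown parameters, possibly varying per customer through the hierarchical layer, and would be estimated with MCMC.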