We can also plot only the 94% high-density intervals (HDI), i.e. the shortest credible intervals containing 94% of the posterior mass, in a single figure via

az.plot_forest(unpooled_trace, var_names=['slope'], combined=True)
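ArviZ's plot_forest draws these intervals for us. To make "shortest interval containing 94% of the mass" concrete, here is a minimal NumPy sketch of the HDI computation (az.hdi does the same job; the function name hdi and the numbers here are my own illustration, not from the article):

```python
import numpy as np

def hdi(samples, prob=0.94):
    """Shortest interval containing `prob` of the sampled mass."""
    s = np.sort(samples)
    n = len(s)
    k = int(np.floor(prob * n))      # number of steps spanned by the interval
    widths = s[k:] - s[: n - k]      # width of every candidate interval
    i = int(np.argmin(widths))       # the shortest one wins
    return s[i], s[i + k]

# For a symmetric posterior, the HDI coincides with the central interval:
rng = np.random.default_rng(42)
lo, hi = hdi(rng.standard_normal(100_000))
# lo and hi land near -1.88 and +1.88, the 94% bounds of a standard normal
```

For a skewed posterior the HDI and the central interval differ, which is exactly why the HDI is the more honest summary to plot.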

We get

[Figure: forest plot of the 94% HDIs of the eight slopes. Image by the author.]

You can see that groups 0 to 6 have small slopes, while group 7 is way out there. But this is completely wrong: all of our slopes should be around a value of two. What happened? Simple: we tricked the model by introducing outliers into the smallest group.

This is the exact problem I was talking about earlier: the sub-model for group 7 has no chance, as it does not know what is going on in groups 0 to 6. It has no clue that the slopes are usually around a value of two.
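The data-generating code is not shown in this excerpt, so here is a hypothetical sketch of such a setup: all true slopes near two, a small group 7, and outliers injected into it. The group sizes, noise scales, and variable names (sizes, true_slopes) are my assumptions, not the article's actual values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Groups 0-6 get plenty of points with slopes near two; group 7 is tiny
# and polluted with outliers, which wrecks its unpooled fit.
sizes = [50] * 7 + [5]                       # group 7 is the smallest
true_slopes = rng.normal(2.0, 0.1, size=8)   # all slopes around two

x, y, groups = [], [], []
for g, n in enumerate(sizes):
    xg = rng.uniform(0, 10, size=n)
    yg = true_slopes[g] * xg + rng.normal(0, 0.1, size=n)
    if g == 7:
        yg += rng.normal(0, 20, size=n)      # inject outliers into group 7
    x.append(xg)
    y.append(yg)
    groups.append(np.full(n, g))

x, y, groups = map(np.concatenate, (x, y, groups))
```

With only five contaminated points and no knowledge of the other groups, the unpooled sub-model for group 7 can fit almost any slope.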

So, let us fix the problem.

Partially pooled aka hierarchical model

with pm.Model() as hierarchical_model:
    mu_slope = pm.Normal('mu_slope', 0, 1)           # hyperprior 1
    sigma_slope = pm.Exponential('sigma_slope', 13)  # hyperprior 2

    slope = pm.Normal('slope', mu_slope, sigma_slope, shape=8)
    noise = pm.Exponential('noise', 10)

    obs = pm.Normal('obs', slope[groups]*x, noise, observed=y)

    hierarchical_trace = pm.sample(target_accept=0.995)

So, what is this now? In the unpooled model, we made up priors for the slopes via

slope = pm.Normal('slope', 0, 20, shape=8)

We told the model that the slopes should be around zero, but with quite a large standard deviation of twenty.

In the hierarchical model, we define so-called hyperpriors to find a better mean and standard deviation for the slope priors. The important part is really only this:

mu_slope = pm.Normal('mu_slope', 0, 1)
sigma_slope = pm.Exponential('sigma_slope', 13)

slope = pm.Normal('slope', mu_slope, sigma_slope, shape=8)

We replace the zero with mu_slope and the twenty with sigma_slope, easy as that. Both of these are random variables as well, which we can learn via Bayesian inference. mu_slope and sigma_slope are called hyperpriors, in the same way that the maximum depth of a decision tree is called a hyperparameter, for example. Both of them sit one hierarchy level above slope, as they have to be evaluated before slope can be calculated.
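To see what the hierarchy buys us, we can ancestrally sample the marginal prior that the hyperpriors imply for a slope and compare it with the fixed Normal(0, 20) prior of the unpooled model. A sketch with plain NumPy (note that PyMC3's Exponential takes a rate, so lam=13 corresponds to a mean of 1/13 in NumPy's scale parameterization):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Unpooled model: every slope gets the same wide, fixed prior.
unpooled_prior = rng.normal(0, 20, size=n)

# Hierarchical model: sample the hyperpriors first, then the slope.
mu_slope = rng.normal(0, 1, size=n)            # hyperprior 1
sigma_slope = rng.exponential(1 / 13, size=n)  # hyperprior 2 (rate 13 -> mean 1/13)
slope = rng.normal(mu_slope, sigma_slope)      # one level below the hyperpriors

# The implied marginal prior on a slope is roughly twenty times tighter
# than Normal(0, 20).
print(unpooled_prior.std(), slope.std())
```

Before seeing any data, this marginal prior is centered at zero; once the hyperpriors are informed by the well-behaved groups 0 to 6, it tightens around a value of two, and that is exactly the information group 7 was missing.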

Note: I added target_accept=0.995 to the sample method to improve the sampling, because sampling from this nested posterior is no longer as easy as for the unpooled model. I can explain this further in another article.

The posteriors look like this:


Source: towardsdatascience.com