We can also plot only the 94% high-density intervals (HDIs), i.e. the shortest credible intervals containing 94% of the posterior mass, in a single figure via

`az.plot_forest(unpooled_trace, var_names=['slope'], combined=True)`

We get

You can see that groups 0 to 6 have small slopes, while group 7 is way out there. But this is totally wrong, because all of our slopes should be around a value of two. What happened? Simple: we tricked the model by introducing outliers into the smallest group.

This is exactly the problem I was talking about earlier: the sub-model for group 7 has no chance, as it does not know what is going on in groups 0 to 6. It has no clue that the slopes are usually around a value of two.

So, let us fix the problem.

Partially pooled (aka hierarchical) model

```python
with pm.Model() as hierarchical_model:
    mu_slope = pm.Normal('mu_slope', 0, 1)           # hyperprior 1
    sigma_slope = pm.Exponential('sigma_slope', 13)  # hyperprior 2
    slope = pm.Normal('slope', mu_slope, sigma_slope, shape=8)
    noise = pm.Exponential('noise', 10)
    obs = pm.Normal('obs', slope[groups]*x, noise, observed=y)
    hierarchical_trace = pm.sample(return_inferencedata=True, target_accept=0.995)

az.plot_posterior(hierarchical_trace)
```

So, what is this now? In the unpooled model, we made up priors for the slopes via

`slope = pm.Normal('slope', 0, 20, shape=8)`

We told the model that the slopes should be around zero, but with a quite large standard deviation of twenty.
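To get a feeling for just how vague that prior is, here is a quick NumPy check (my own illustration, not from the model code): under a Normal(0, 20) prior, only a small fraction of the mass lies anywhere near the true slope of two.

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw many samples from the unpooled slope prior, Normal(0, 20)
prior_draws = rng.normal(0, 20, size=100_000)

# Fraction of prior draws within 1 of the true slope value of 2
frac_near_two = np.mean(np.abs(prior_draws - 2) < 1)
print(frac_near_two)  # roughly 0.04 — the prior barely favours slopes near two
```

So the unpooled prior is essentially uninformative about where the slopes actually live, which is fine for the large groups but leaves group 7 at the mercy of its outliers.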

In the hierarchical model, we define so-called hyperpriors to find a better mean and standard deviation for the slope priors. The important part is really only this:

```python
mu_slope = pm.Normal('mu_slope', 0, 1)
sigma_slope = pm.Exponential('sigma_slope', 13)
slope = pm.Normal('slope', mu_slope, sigma_slope, shape=8)
```

We replace the zero with `mu_slope` and the twenty with `sigma_slope`, easy as that. Both of these are random variables as well, which we can learn using Bayesian inference. `mu_slope` and `sigma_slope` are called hyperpriors, in the same way that the maximum depth of a decision tree is called a hyperparameter, for example. Both of them sit one hierarchy level above `slope`, as they have to be evaluated first before `slope` can be calculated.
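To make that hierarchy concrete, here is a minimal NumPy sketch (my own, not PyMC) of a single draw from this generative story. One caveat: NumPy's `exponential` is parameterised by the scale 1/λ, while PyMC's `Exponential` takes the rate λ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Level 1: the hyperpriors are drawn first ...
mu_slope = rng.normal(0, 1)            # shared mean of the group slopes
sigma_slope = rng.exponential(1 / 13)  # shared spread (scale = 1/rate)

# Level 2: ... and only then can the 8 group slopes be drawn around them
slopes = rng.normal(mu_slope, sigma_slope, size=8)
print(slopes)  # eight slopes clustered tightly around mu_slope
```

Because all eight slopes share `mu_slope` and `sigma_slope`, information flows between the groups: the well-behaved groups pull the estimate for group 7 back toward the common mean.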

Note: I added `target_accept=0.995` to the `sample` call to improve the sampling, because sampling from this nested posterior is not as easy as it was for the unpooled model. I can explain this further in another article.

The posteriors look like this: