We can also plot only the 94% high-density intervals (HDI), i.e. the shortest credible intervals containing 94% of the posterior mass, in a single figure via

`az.plot_forest(unpooled_trace, var_names=['slope'], combined=True)`

We get:

You can see that groups 0 to 6 have small slopes, while group 7 is way out there. But this is completely wrong, because all of our slopes should be around a value of **two**. What happened? Simple: we tricked the model by introducing outliers into the smallest group.

This is the exact problem I was talking about earlier: the sub-model for group 7 has no chance, as it does not know what is going on in groups 0 to 6. It has no clue that the slopes are usually around a value of two.
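For context, here is a minimal sketch of how such a data set could be generated. This is my own reconstruction; the group sizes, noise level, and variable names `x`, `y`, `groups` are assumptions — only the true slope of two and the outliers in the smallest group come from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 8 groups, all with a true slope of 2.
n_per_group = [50, 50, 50, 50, 50, 50, 50, 5]  # group 7 is the smallest
groups = np.concatenate([np.full(n, g) for g, n in enumerate(n_per_group)])
x = rng.uniform(0, 1, size=groups.size)
y = 2 * x + rng.normal(0, 0.1, size=groups.size)

# Add outliers to the tiny group 7 so its estimated slope goes astray.
y[groups == 7] += rng.normal(0, 2, size=(groups == 7).sum())
```

With only five observations and heavy outliers, an unpooled fit for group 7 has nothing to anchor it to the other groups.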

So, let us fix the problem.

## Partially pooled aka hierarchical model

```python
with pm.Model() as hierarchical_model:
    mu_slope = pm.Normal('mu_slope', 0, 1)           # hyperprior 1
    sigma_slope = pm.Exponential('sigma_slope', 13)  # hyperprior 2
    slope = pm.Normal('slope', mu_slope, sigma_slope, shape=8)
    noise = pm.Exponential('noise', 10)
    obs = pm.Normal('obs', slope[groups]*x, noise, observed=y)

    hierarchical_trace = pm.sample(
        return_inferencedata=True,
        target_accept=0.995
    )

az.plot_posterior(hierarchical_trace)
```
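To get a feeling for what these hyperpriors imply before running any inference, one can sample from them directly with NumPy (a sketch I added; the seed and sample count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

mu_slope = rng.normal(0, 1, size=n)            # hyperprior 1
sigma_slope = rng.exponential(1 / 13, size=n)  # hyperprior 2; NumPy takes scale = 1/rate
slope = rng.normal(mu_slope, sigma_slope)      # implied prior on each group slope
```

Because an exponential with rate 13 has mean 1/13 ≈ 0.077, the model expects the group slopes to sit tightly together around `mu_slope` — exactly the kind of information sharing the unpooled model was missing.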

So, what is this now? In the unpooled model, we made up priors for the slopes via

`slope = pm.Normal('slope', 0, 20, shape=8)`

We told the model that the slopes should be around zero, but with a quite large standard deviation of twenty.

In the hierarchical model, we define so-called **hyperpriors** to find a better mean and standard deviation for the slope priors. The important part is really only this:

```python
mu_slope = pm.Normal('mu_slope', 0, 1)
sigma_slope = pm.Exponential('sigma_slope', 13)
slope = pm.Normal('slope', mu_slope, sigma_slope, shape=8)
```

We replace the zero with `mu_slope` and the twenty with `sigma_slope`, easy as that. Both of these are random variables as well, which we can learn using Bayesian inference. `mu_slope` and `sigma_slope` are called hyperpriors, the same way the maximum depth of a decision tree is called a hyperparameter, for example. Both of them are one *hierarchy level* above `slope`, as they have to be evaluated first before `slope` can be calculated.
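To see why this hierarchy helps, here is a small illustration of the resulting *shrinkage* effect using the closed-form posterior mean of a conjugate normal-normal model. This is my own sketch with made-up numbers, not code from the article:

```python
def shrunken_slope(slope_hat, n, noise_sd, mu, tau):
    """Posterior mean of a group slope in a conjugate normal-normal model:
    a precision-weighted average of the group's own estimate slope_hat and
    the shared mean mu (tau = between-group standard deviation)."""
    w = (n / noise_sd**2) / (n / noise_sd**2 + 1 / tau**2)
    return w * slope_hat + (1 - w) * mu

# A big group barely moves; the tiny, outlier-ridden group is pulled
# from its wild estimate of 8 most of the way back toward 2.
big = shrunken_slope(slope_hat=2.1, n=50, noise_sd=0.5, mu=2.0, tau=0.08)
small = shrunken_slope(slope_hat=8.0, n=5, noise_sd=0.5, mu=2.0, tau=0.08)
```

The weight `w` grows with the group size `n`, so large groups mostly keep their own estimate while small groups borrow strength from the others — which is exactly what group 7 needs.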

Note: I added `target_accept=0.995` to the `sample` method to improve the sampling, because sampling from this nested posterior is not as easy anymore as for the unpooled model. I can explain this further in another article.

The posteriors look like this:

Continue reading: https://towardsdatascience.com/bayesian-hierarchical-modeling-in-pymc3-d113c97f5149

Source: towardsdatascience.com
