To understand how Bayes’ Theorem relates to Bayesian Inference, we have to understand the theorem through probability distributions rather than just point probabilities. A probability distribution gives the probability of every possible outcome in a scenario, not just the most likely one.
A probability distribution can be continuous, as with the IQ of a randomly selected person (Normally distributed with mean = 100 and standard deviation = 10):
Or discrete, as in our previous example: the probability of a positive test result given disease, P(T=1|D), is simply a Bernoulli distribution:
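As a minimal sketch of these two cases (assuming scipy is installed), we can evaluate each distribution directly. The Normal parameters match the IQ example above; the Bernoulli parameter 0.99 is a hypothetical stand-in, since the exact value from the earlier disease-testing example isn’t restated here:

```python
from scipy.stats import norm, bernoulli

# Continuous: IQ ~ Normal(mean=100, sd=10), as in the example above.
iq = norm(loc=100, scale=10)
print(iq.pdf(100))               # density at the mean, ~0.040
print(iq.cdf(110) - iq.cdf(90))  # P(90 < IQ < 110), ~0.68

# Discrete: P(T=1|D) as a Bernoulli distribution.
# 0.99 is a hypothetical value for illustration only.
test = bernoulli(0.99)
print(test.pmf(1))  # P(T=1|D)
print(test.pmf(0))  # P(T=0|D)
```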
In Bayesian Inference, we want to learn the probability distribution of a model’s parameters given some data and our prior beliefs about those parameters. We use Bayes’ Theorem to do this inference:
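The general form, with the posterior on the left and the likelihood, prior, and evidence on the right, is:

P(parameters | data) = P(data | parameters) × P(parameters) / P(data)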
This seems a bit abstract, so let’s illustrate this inference with another example.
Suppose I find a quarter on the floor. Something strange happens when I leave quarters in my pocket and accidentally run them through the laundry, though: the coin bends in such a way that it becomes biased, flipping 90% heads and 10% tails. Unfortunately, I have no idea whether the coin I just found is loaded or fair, so I believe there is a 50% chance that it is loaded. Luckily, I can flip the coin to try to figure out whether it is fair.
In this scenario, we have only one parameter: p, the probability of flipping heads. We assign a 50% prior belief to p = 0.5 (fair) and a 50% prior belief to p = 0.9 (loaded).
Thus, our prior, P(parameters), looks like this:
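With only two candidate values of p, the prior is just two point masses:

P(p = 0.5) = 0.5,  P(p = 0.9) = 0.5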
This is a bit subjective, as choosing priors can be.
Say we flip the coin once. We get heads!
Now we want to know the updated probability that our coin is loaded: P(p=0.9 | heads).
Let’s solve this using Bayes’ theorem. Note: we will now use θ to represent parameters and X to represent data, as is common practice.
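Expanding the denominator P(X) over the two possible values of θ:

P(θ = 0.9 | X = heads) = P(X = heads | θ = 0.9) × P(θ = 0.9) / [P(X = heads | θ = 0.9) × P(θ = 0.9) + P(X = heads | θ = 0.5) × P(θ = 0.5)]

= (0.9 × 0.5) / (0.9 × 0.5 + 0.5 × 0.5) = 0.45 / 0.70 ≈ 0.64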
We are now 64% sure that the coin is loaded! To find the probability that it is fair, we can redo that equation with p = 0.5. Or we can use the fact that we are dealing with only two possibilities and realize that P(fair | data) = 1 − P(loaded | data) = 0.36.
Importantly, we could also forget about the denominator, calculate P(X|p)P(p) for both values of p, and take those as our relative posterior probabilities, since P(X) is the same for both calculations. If you read more into Bayesian statistics, you will find Bayes’ Theorem written…
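As a minimal sketch of that shortcut in plain Python (no libraries assumed), we can carry the unnormalized products P(X|p)P(p) through each flip and normalize only once at the end; the single-flip result reproduces the 64% above:

```python
# Prior over the two candidate values of p (probability of heads).
prior = {0.5: 0.5, 0.9: 0.5}

def likelihood(p, flip):
    """P(flip | p) for one flip: 'H' for heads, 'T' for tails."""
    return p if flip == "H" else 1 - p

def posterior(prior, flips):
    """Multiply the prior by the likelihood of each flip, then normalize once."""
    unnorm = dict(prior)
    for flip in flips:
        unnorm = {p: w * likelihood(p, flip) for p, w in unnorm.items()}
    total = sum(unnorm.values())  # this sum plays the role of P(X)
    return {p: w / total for p, w in unnorm.items()}

print(posterior(prior, "H"))    # {0.5: ~0.357, 0.9: ~0.643} -- 64% loaded
print(posterior(prior, "HHH"))  # more heads push belief further toward p = 0.9
```

Normalizing at the end is exactly the “drop the denominator” trick: the relative sizes of P(X|p)P(p) already determine the posterior.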