Let’s start with a simple and intuitive example to demonstrate how structure learning works. Suppose you have a sprinkler system in your backyard and, for the last 1000 days, you measured four variables, each with two states: Rain (yes or no), Cloudy (yes or no), Sprinkler system (on or off), and Wet grass (true or false). Based on these four variables and your conception of the real world, you may have an intuition about what the graph should look like, right? If not, it is good that you are reading this article, because with structure learning you will find out!

With bnlearn, determining the causal relationships takes only a few lines of code.

In the example below, we will import the bnlearn library, load the sprinkler dataset, and determine which DAG best fits the data. Note that the sprinkler dataset is already cleaned: it has no missing values, and every value is either 1 or 0.

Figure 3: Example of the best DAG for the Sprinkler system. It encodes the following logic: the probability that the grass is wet is dependent on the sprinkler and the rain. The probability that the sprinkler is on is dependent on Cloudy. The probability that it rains is dependent on Cloudy.

That’s it! We have the learned structure as shown in Figure 3. The detected DAG consists of four nodes connected through edges, and each edge indicates a causal relation. If you look carefully, you will see arrows on the edges; these give the causal direction. The state of Wet grass depends on two nodes, Rain and Sprinkler. The state of Rain is conditioned by Cloudy and, separately, the state of Sprinkler is also conditioned by Cloudy. This DAG represents the (factorized) probability distribution, where S is the random variable for sprinkler, R for rain, G for wet grass, and C for cloudy.
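Reading the parent sets off the edges in Figure 3, the factorization can be written out as follows. This is a reconstruction from the edges described above, using the same symbols S, R, G, and C:

```latex
P(C, S, R, G) = P(C)\, P(S \mid C)\, P(R \mid C)\, P(G \mid S, R)
```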

By examining the graph, you quickly see that the only variable in the model without parents is C. Each of the other variables is conditioned on Cloudy, Rain, and/or Sprinkler. In general, the joint distribution for a Bayesian Network is the product of the conditional probabilities of every node given its parents.
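In symbols, for variables X_1, ..., X_n, and writing Pa(X_i) for the set of parents of X_i in the DAG (a notation assumed here for illustration), this general product reads:

```latex
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

For the sprinkler graph, Pa(C) is empty, Pa(S) and Pa(R) both equal {C}, and Pa(G) equals {S, R}.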

The default setting in bnlearn for structure learning is the hillclimbsearch method with BIC scoring. Other search methods and scoring types can be specified through the methodtype and scoretype arguments.

Example of the various structure learning methods and scoring types in bnlearn.

Although the detected DAG for the sprinkler dataset is insightful and…

Continue reading: https://towardsdatascience.com/a-step-by-step-guide-in-detecting-causal-relationships-using-bayesian-structure-learning-in-python-c20c6b31cee5?source=rss----7f60cf5620c9---4

Source: towardsdatascience.com