In our example we will be using the popular Titanic data set. While the workflow can be as granular and complex as the user’s work requires, for the purpose of illustrating the utility of targets we will keep it simple and standard. Our workflow includes:
A. Loading the data
B. Pre-processing of the data
C. EDA markdown notebook generation
D. XGBoost model building and prediction on the test set, with model diagnostics in a Markdown report
These steps cover the standard process data scientists follow; however, there are iterations within each stage, and we will illustrate different aspects of targets through this example below.
1. Folder structure
Let us create a root folder called “Targets” with the structure below (this is an example; different data scientists may follow different conventions). Within Targets we have:
An important thing to note here is that _targets.R must be in the root folder.
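The original folder diagram did not survive extraction; a possible layout, with illustrative file and folder names, might look like this:

```
Targets/
├── _targets.R          # pipeline definition (must sit in the root)
├── functions/
│   └── functions.R     # helper functions used by the targets
├── data/
│   └── titanic.csv     # raw Titanic data
└── reports/
    └── eda.Rmd         # EDA / model-diagnostics notebooks
```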
2. Creating functions which are used in targets
Here we start by creating some functions which will become part of our workflow. As part of the example, the functions below load and pre-process the data (steps A and B).
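The original code listing was lost in extraction; a minimal sketch of what such functions might look like, assuming the Titanic data is stored as a CSV with its usual column names (the path and imputation choices are illustrative):

```r
# Step A: load the raw Titanic data from disk
# (the file path is an assumption for illustration)
load_data <- function(path = "data/titanic.csv") {
  read.csv(path, stringsAsFactors = FALSE)
}

# Step B: basic pre-processing -- impute missing ages with the
# median and encode categorical columns as factors
preprocess_data <- function(raw) {
  raw$Age[is.na(raw$Age)] <- median(raw$Age, na.rm = TRUE)
  raw$Sex <- as.factor(raw$Sex)
  raw$Embarked <- as.factor(raw$Embarked)
  raw
}
```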
3. Defining, visualizing and executing workflow
First we create a pipeline with just tasks A and B, i.e. loading and pre-processing the data.
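The pipeline definition itself did not survive extraction; a minimal sketch of what such a `_targets.R` might look like (the target names, file path, and helper-function location are assumptions):

```r
# _targets.R -- pipeline covering tasks A and B only
library(targets)

# make the helper functions available to the pipeline
# (assumes they live in a "functions/" folder)
tar_source("functions")

list(
  # task A: load the raw data
  tar_target(raw_data, load_data("data/titanic.csv")),
  # task B: pre-process it; targets tracks the dependency on raw_data
  tar_target(clean_data, preprocess_data(raw_data))
)
```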
Once the targets are defined, let’s have a look at the flow:
This gives a directed acyclic graph of the targets and does not account for metadata or progress information.
This gives a directed acyclic graph of the targets that accounts for metadata and progress information, as well as global functions and objects¹. As we can see below, targets has automatically detected the dependencies, including functions that are not used anywhere yet, for example bar_plot. We also see that all of the targets below are outdated, since we haven’t run the pipeline yet; that will be our next step.
Along with the above commands, you can also use tar_manifest() to check that you have constructed your pipeline correctly.
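For reference, the two graph views described above correspond to tar_glimpse() and tar_visnetwork(); together with tar_manifest(), they can be called from the root folder once _targets.R is in place:

```r
library(targets)

tar_glimpse()     # DAG of the targets only; no metadata or progress
tar_visnetwork()  # DAG plus global functions/objects and outdated status
tar_manifest()    # data frame listing each target and its command
```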
Now we run the pipeline:
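Running the pipeline is a single call, assuming _targets.R sits in the working directory:

```r
library(targets)

# builds every outdated target in dependency order
# and skips targets that are already up to date
tar_make()
```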
This runs the correct targets in the correct order and stores their return values by creating a new folder, _targets, in the root. This folder holds the pipeline’s metadata in _targets/meta (and the stored results in _targets/objects). Now when we visualize the pipeline using tar_visnetwork(), all targets have changed to “Up to date”.
4. Accessing files
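The body of this section was cut off in this extract. In targets, stored results are read back with tar_read() and tar_load(); a brief sketch, reusing the target names assumed in the earlier steps:

```r
library(targets)

# return the stored value of a single target
clean <- tar_read(clean_data)

# or load one or more targets into the global
# environment under their own names
tar_load(clean_data)
head(clean_data)
```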