Walk-through Example

In our example we will use the popular Titanic data set. While a workflow can be as granular and complex as the user’s work requires, for the purpose of illustrating the utility of Targets we will keep it simple and standard. Our workflow includes:

A. Loading the data
B. Pre-processing of the data
C. Generating an EDA Markdown notebook
D. Building an XGBoost model, predicting on the test set, and reporting model diagnostics in Markdown

These steps cover the standard process data scientists follow; in practice each stage involves iteration, and we will use this example to illustrate different aspects of Targets below.

1. Folder structure
Let us create a root folder called “Targets” with the structure below (this is only an example; different data scientists may follow different conventions).
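Within the Targets root, a minimal layout could look like the following (the sub-folder names are illustrative rather than required by the package):

Targets/
├── _targets.R     # pipeline definition (must sit in the root folder)
├── R/             # scripts containing the functions used by the targets
├── data/          # raw input data, e.g. the Titanic CSV files
└── reports/       # R Markdown documents for EDA and model diagnostics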

The important thing to note here is that _targets.R must be in the root folder.

2. Creating the functions used by the targets
Here we create the functions that will become part of our workflow. For this example, the functions below load and pre-process the data (steps A and B).
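A minimal sketch of what such functions could look like, saved for instance in R/functions.R (the function names, the imputation choice and the factor conversions are illustrative assumptions, not the article's exact code):

load_data <- function(file) {
  # Step A: read the raw Titanic CSV into a data frame
  read.csv(file, stringsAsFactors = FALSE)
}

preprocess_data <- function(data) {
  # Step B: simple cleaning, e.g. impute missing ages with the median
  # and convert key columns to factors
  data$Age[is.na(data$Age)] <- median(data$Age, na.rm = TRUE)
  data$Sex <- factor(data$Sex)
  data$Pclass <- factor(data$Pclass)
  data
}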

3. Defining, visualizing and executing the workflow
First we create a pipeline with just tasks A and B, i.e. loading and pre-processing the data.
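A minimal sketch of what the corresponding _targets.R could look like (the target names and the data path are assumptions for illustration):

library(targets)

# Make the custom functions available to the pipeline
source("R/functions.R")

list(
  # Track the raw file itself so changes to it invalidate downstream targets
  tar_target(raw_data_file, "data/train.csv", format = "file"),
  # Step A: load the data
  tar_target(raw_data, load_data(raw_data_file)),
  # Step B: pre-process the data
  tar_target(clean_data, preprocess_data(raw_data))
)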

Once the targets are defined, let’s have a look at the flow:

tar_glimpse() 

This produces a directed acyclic graph of the targets; it does not account for metadata or progress information.

(tar_glimpse() dependency graph; image by author)
tar_visnetwork()

This also produces a directed acyclic graph of the targets, but it does account for metadata and progress information, as well as global functions and objects¹. As we can see below, Targets has automatically detected the dependencies, and it also shows functions that are not yet used anywhere, for example bar_plot. We also see that everything is marked as outdated, since we have not run the targets yet; that is our next step.

(tar_visnetwork() dependency graph, with all targets shown as outdated; image by author)

Along with the above commands, you can also use tar_manifest() to ensure that you have constructed your pipeline correctly.
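For instance, the following returns a data frame with one row per target, including the command that builds it:

tar_manifest(fields = command)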

Now we run the pipeline:

tar_make()
(Console output of tar_make(); image by author)

This runs the correct targets in the correct order and stores the return values by creating a new folder, _targets, in the root. This folder contains _targets/objects and _targets/meta. Now when we visualize the pipeline using tar_visnetwork(), all targets are shown as “Up to date”.

(tar_visnetwork() dependency graph, with all targets now up to date; image by author)

4. Accessing files
To access…
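In general, completed targets can be read back into an R session with tar_read() and tar_load(); a minimal sketch, assuming the clean_data target from the illustrative pipeline above:

library(targets)

# Return the stored value of a single target
clean_data <- tar_read(clean_data)

# Or load one or more targets directly into the global environment
tar_load(clean_data)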

Continue reading: https://towardsdatascience.com/data-science-workflows-with-the-targets-package-in-r-end-to-end-example-with-code-1e31318074c4?source=rss—-7f60cf5620c9—4
