How to use Alteryx for Data Engineering and Science

Photo by Christopher Zarriello on Unsplash

Alteryx is known as a platform that combines analytics, data science, and process automation. It integrates easily with many other tools, which opens up a range of interesting use cases. In this walkthrough, I use Google BigQuery as both the source and the target platform.

How to get started:

  • Install Alteryx and create a GCP account with the right to create a service account.
  • Install the BigQuery tools for Alteryx.
  • Authenticate against BigQuery using either service-to-service authentication or end-user authentication (Full Guide) [1].
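Service-to-service authentication relies on a JSON key file downloaded for the service account. Before pointing Alteryx at that file, a quick sanity check can save some debugging. The following is a minimal sketch in plain Python, not part of Alteryx or any Google library; the function name `check_service_account_key` and the choice of which fields to check are my own assumptions.

```python
import json

# Fields a GCP service account key file is expected to carry.
# Which ones to check is an assumption made for this sketch.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json):
    """Return a list of problems found in the JSON text of a key file.

    An empty list means the basic structure looks fine. This does not
    verify that the key is valid or has BigQuery permissions.
    """
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)
    ]
    # End-user credentials use a different "type" value than service accounts.
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems
```

You would call it with the file contents, for example `check_service_account_key(open(path).read())`, where `path` is wherever you stored the key.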

Use Cases

Here are a few typical use cases I have come across that might give you some inspiration for what Alteryx can do.

Data Preparation

Although cloud platforms like GCP already bring their own data preparation tool (Cloud Dataprep) and services like the Data Transfer Service, you might choose Alteryx because of its variety. It also provides ESB and data analytics/data science functionality: you can pick one of the many tools to get the job done or use Python and R code.

Data Prep without coding — Image by Author

Another reason could be the on-premises version, which might be a must-have for you and your company due to data governance concerns. A typical use case would be to take data from any source, or data already loaded into BigQuery, and prepare it for further analytics.

Data Prep Workflow — Image by Author

You can use one of the many built-in data preparation tools (marked by the blue icons in the image above) or, as mentioned, use Python and R code to do the magic. In the end, you can easily load the result back into a new table in BigQuery with the BigQuery Output tool.
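To give an idea of what the code route could look like, here is a small sketch of a typical preparation step in plain Python: trimming text, dropping incomplete rows, parsing numbers, and removing duplicates. It makes no Alteryx or BigQuery calls; the column names `customer_id` and `amount` and the function `prepare_rows` are hypothetical and only serve as an illustration.

```python
def prepare_rows(rows):
    """Clean a list of dict rows whose values are strings (e.g. read from CSV).

    - trims whitespace
    - drops rows without a customer_id
    - drops rows whose amount is not numeric
    - keeps only the first row per customer_id
    """
    seen = set()
    cleaned = []
    for row in rows:
        customer_id = (row.get("customer_id") or "").strip()
        if not customer_id or customer_id in seen:
            continue  # skip incomplete rows and duplicates
        try:
            amount = float((row.get("amount") or "").strip())
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        seen.add(customer_id)
        cleaned.append({"customer_id": customer_id, "amount": amount})
    return cleaned
```

In a workflow, the same logic is what the drag-and-drop preparation tools do for you; code is mainly useful when a rule is too specific for the standard tools.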

Data Integration

Another use case is using Alteryx for your ETL/ELT processes. As described above, Alteryx offers a wide set of connectors and data integration tools.

ELT/ETL Workflow — Image by Author

Similar to the previous use case, you can extract data, in this example from an MSSQL database, transform it, and finally load it into your data warehouse. The wide range of data sources supported by the input tools (the green icons) is definitely a big plus.
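For readers who think in code, the extract, transform, and load steps can be sketched as follows. This is only an illustration under stated assumptions: SQLite from Python's standard library stands in for both the MSSQL source and the data warehouse, and the table names `orders` and `revenue_by_country` as well as the function `run_etl` are made up for the example.

```python
import sqlite3


def run_etl(source, target):
    """Copy aggregated revenue from a source DB to a target DB.

    Both arguments are sqlite3 connections, used here as stand-ins for
    the MSSQL source and the warehouse. Returns the number of rows loaded.
    """
    # Extract: read the raw orders from the source system.
    rows = source.execute(
        "SELECT order_id, country, net_amount FROM orders"
    ).fetchall()

    # Transform: normalise country codes and sum revenue per country.
    totals = {}
    for _order_id, country, net_amount in rows:
        code = country.strip().upper()
        totals[code] = totals.get(code, 0.0) + net_amount

    # Load: write the result into the target table.
    target.execute(
        "CREATE TABLE IF NOT EXISTS revenue_by_country "
        "(country TEXT PRIMARY KEY, revenue REAL)"
    )
    target.executemany(
        "INSERT OR REPLACE INTO revenue_by_country VALUES (?, ?)",
        sorted(totals.items()),
    )
    target.commit()
    return len(totals)
```

In Alteryx, each of the three commented steps corresponds to a tool on the canvas: an input tool, one or more transformation tools, and an output tool.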

Supported data sources — Image by Author

This field of application is also described in the success story of Tropical Smoothie Cafe, where the data source is AWS; see here [2].

Reports and Analytics

Also similar to the other cases…
