Original photo by Sonja Langford on Unsplash

Maybe, like me, you deal with dates a lot when processing data in Python. Maybe, also like me, you get frustrated with dealing with dates in Python, and find you consult the documentation far too often to do the same things over and over again.

Like anyone who codes and finds themselves doing the same thing more than a handful of times, I wanted to make my life easier by automating some common date processing tasks, along with some simple and frequent feature engineering, so that everything I usually need for a given date could be done with a single function call. I could then select afterwards which of the extracted features I was interested in.

This date processing is accomplished with a single Python function, which accepts only a single date string formatted as ‘YYYY-MM-DD’ (because that’s how dates are formatted), and which returns a dictionary consisting of (currently) 18 key/value feature pairs. Some of these keys are very straightforward (e.g. the parsed four-digit year) while others are engineered (e.g. whether or not the date is a public holiday). If you find this code at all useful, you should be able to figure out how to alter or extend it to suit your needs. For ideas on additional date/time-related features you may want to generate, check out this article.

Most of the functionality is accomplished using the Python datetime module, much of which relies on the strftime() method. The real benefit, however, is that there is a standard, automated approach to the same repetitive queries.
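To make the datetime and strftime() approach concrete, here is a minimal sketch of the idea. It is not the article's process_date() function: the function name basic_date_features and the five feature keys are hypothetical, chosen only for illustration.

```python
import datetime


def basic_date_features(input_str: str) -> dict:
    """Parse a 'YYYY-MM-DD' string and derive a few simple features.

    Hypothetical helper for illustration; the keys below are assumptions,
    not the 18 features returned by the article's function.
    """
    # strptime raises ValueError if the string does not match the format
    date = datetime.datetime.strptime(input_str, "%Y-%m-%d").date()
    return {
        "year": date.year,
        # %B and %A follow the current locale (English names by default)
        "month_name": date.strftime("%B"),
        "day_of_week": date.strftime("%A"),
        # %j is the zero-padded day of the year, e.g. '195'
        "day_of_year": int(date.strftime("%j")),
        # weekday() numbers Monday as 0, so 5 and 6 are Saturday and Sunday
        "is_weekend": date.weekday() >= 5,
    }


print(basic_date_features("2021-07-14"))
```

Returning everything in one dictionary means the parsing happens once, and whichever features matter for a given task can be picked out by key afterwards.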

The only non-standard library used is holidays, a “fast, efficient Python library for generating country, province and state specific sets of holidays on the fly.” While the library can accommodate a whole host of national and sub-national holidays, I have used the US national holidays for this example. With a quick glance at the project’s documentation and the code below, you will very easily determine how to change this if needed.

So, let’s first take a look at the process_date() function. The comments should provide insight into what is going on, should you need it.

import datetime, re, sys, holidays

def process_date(input_str: str) -> dict:
    """Processes and engineers simple features for date strings

      input_str (str): Date string of format '2021-07-14'

      dict: Dictionary of processed date...

Continue reading: https://www.kdnuggets.com/2021/08/engineer-date-features-python.html

Source: www.kdnuggets.com