By Dr. Varshita Sher, Data Scientist

In this article, I want to share a few tips for writing cleaner code that I have picked up over the last year, mainly from pair programming. Making them part of my everyday coding routine has helped me write higher-quality Python scripts that are easy to maintain and scale over time.

Ever wondered why a senior developer's code looks so much better than a junior developer's? Read on to bridge the gap.

Rather than giving generic examples of how to use these techniques, I will walk through real-life coding scenarios where I have actually used them! Here is the Jupyter Colab Notebook if you'd like to follow along!

1. Use tqdm when working with for loops.

 
 
Imagine looping over a large iterable (list, dictionary, tuple, set) and not knowing whether the code has finished running! Bummer, right? In such scenarios, wrap the iterable in tqdm to display a progress bar alongside the loop.

For instance, to display the progress as I read through all the files present in 44 different directories (whose paths I have already stored in a list called fpaths):

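import os
from tqdm import tqdm

files = list()
fpaths = ["dir1/subdir1", "dir2/subdir3", ......]  # remaining paths elided

# Wrap the iterable in tqdm to show a progress bar while looping
for fpath in tqdm(fpaths, desc="Looping over fpaths"):
    files.extend(os.listdir(fpath))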

Using tqdm with a "for" loop


Note: Use the desc argument to specify a small description for the loop.
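To try the pattern end to end without needing real directories, here is a minimal, self-contained sketch. The helper name list_files, the throwaway directories, and the no-op fallback used when tqdm is not installed are all assumptions made for illustration; they are not part of the original notebook.

```python
import os
import tempfile

try:
    from tqdm import tqdm
except ImportError:
    # Assumption: if tqdm is missing, fall back to a no-op wrapper
    # so the loop still runs (just without a progress bar).
    def tqdm(iterable, **kwargs):
        return iterable


def list_files(fpaths):
    """Collect the names of all entries found in each directory of fpaths."""
    files = []
    for fpath in tqdm(fpaths, desc="Looping over fpaths"):
        files.extend(os.listdir(fpath))
    return files


# Build two throwaway directories holding three empty files each.
with tempfile.TemporaryDirectory() as root:
    demo_paths = []
    for name in ("dir1", "dir2"):
        path = os.path.join(root, name)
        os.mkdir(path)
        for i in range(3):
            open(os.path.join(path, f"file{i}.txt"), "w").close()
        demo_paths.append(path)

    found = list_files(demo_paths)

print(len(found))
```

The only change needed in an existing loop is wrapping the iterable in tqdm(...); the loop body stays exactly as it was.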

2. Use type hinting when writing functions.

 
 
In simple terms, it means explicitly stating the type of every argument, and of the return value, in your Python function definition.

I wish there were specific use cases I could point to for when I use type hinting in my work, but the truth is, I use it more often than not.
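As a quick warm-up, the sketch below shows the syntax on a small function that needs only the standard library. The function mean_accuracy and its arguments are hypothetical and chosen purely for illustration. It also shows that the hints can be inspected with typing.get_type_hints, and that Python itself does not enforce them at runtime; they mainly serve readers, editors, and static checkers.

```python
from typing import List, get_type_hints


def mean_accuracy(scores: List[float], ndigits: int = 2) -> float:
    """Return the mean of scores, rounded to ndigits decimal places."""
    return round(sum(scores) / len(scores), ndigits)


# The annotations are stored on the function and can be inspected.
hints = get_type_hints(mean_accuracy)
print(hints)

result = mean_accuracy([0.8, 0.9, 1.0])
print(result)

# Hints are not enforced at runtime: a list of ints is accepted too.
from_ints = mean_accuracy([1, 2, 3])
print(from_ints)
```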

Here's a hypothetical example of a function update_df(). It updates a given data frame by appending a row containing useful information from a simulation run, such as the classifier used, the accuracy score, the train-test split size, and additional remarks for that particular run.

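from typing import List

import pandas as pd

def update_df(df: pd.DataFrame,
              clf: str,
              acc: float,
              remarks: List[str] = [],
              split: float = 0.5) -> pd.DataFrame:

    new_row = {'Classifier': clf,
               'Accuracy': acc,
               'split_size': split,
               'Remarks': remarks}

    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
    return df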



A few things to note:

  • The datatype…

Continue reading: https://www.kdnuggets.com/2021/08/data-scientist-guide-efficient-coding-python.html

Source: www.kdnuggets.com