The authentication flow has completed. You may close this window.

If all goes well, you should see a page containing the above message. This means we are good to go!

To acquire the data, we can write a simple function called get_data() to get all user messages with a specified label id.

Executing API calls

The code to get email metadata is roughly as follows:

results = service.users().messages().list(userId='me', labelIds=['XXXXXXXX'], maxResults=500).execute()

The following line of code compiles a list of the metadata for all fetched messages:

messages = results.get('messages', [])

Since a single API call of this type returns at most 500 results, we collect the next page token (nextPageToken) from each response and keep requesting pages in a simple while loop until no token comes back.
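The pagination loop described above can be sketched roughly as follows. This is a minimal sketch and not necessarily the exact get_data() used here: find_label_id is a hypothetical helper name, page_size is an assumed parameter, and the code assumes service is an authorized Gmail API client and that the label name matches exactly.

```python
def find_label_id(service, label_name):
    """Return the id of the label whose display name equals label_name.

    Hypothetical helper: it lists the user's labels and compares names.
    """
    response = service.users().labels().list(userId='me').execute()
    for label in response.get('labels', []):
        if label['name'] == label_name:
            return label['id']
    raise ValueError(f'No label named {label_name!r}')


def get_data(service, label_name, page_size=500):
    """Collect the id/threadId metadata of every message under a label."""
    label_id = find_label_id(service, label_name)
    messages = []
    page_token = None
    while True:
        request_args = {
            'userId': 'me',
            'labelIds': [label_id],
            'maxResults': page_size,
        }
        # Only send pageToken once a previous response has supplied one.
        if page_token:
            request_args['pageToken'] = page_token
        results = service.users().messages().list(**request_args).execute()
        messages.extend(results.get('messages', []))
        # A missing nextPageToken means this was the last page.
        page_token = results.get('nextPageToken')
        if not page_token:
            break
    return messages
```

Building the request arguments as a dictionary keeps the first call free of a pageToken argument, so the same loop body serves both the first page and every later one.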

To invoke this method, we can import it from the quickstart module and specify that we want to fetch Medium Daily Digest emails. Under the hood, this will locate the correct label id.

from quickstart import get_service, get_data
service = get_service()
messages = get_data(service, 'Medium Daily Digest')
print(len(messages))

And just like that, 596 messages fetched! So can we print one out? Well… not quite. Printing the first message out actually returns the following metadata:

{'id': '17b590fa1e590dfa', 'threadId': '17b590fa1e590dfa'}

This means there are more API calls to make in order to retrieve the email content itself. To get an actual message, we must make the following API call:

msg = service.users().messages().get(userId='me', id=message['id'], format='full').execute()

Ahhh… chaos! This is not exactly the cleanly extracted content we were hoping for, but it is a starting point. It turns out that our msg object is actually a dictionary with the following keys: dict_keys(['id', 'threadId', 'labelIds', 'snippet', 'payload', 'sizeEstimate', 'historyId', 'internalDate']).

Printing out msg['snippet'] is actually pretty neat, but limits us to 200 characters or so. Thus, all of the valuable info will have to be extracted using the 'payload' key.
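As a small illustration of digging into the 'payload' key, the sketch below pulls a single header value (such as the subject line) out of a message. It assumes msg was fetched with format='full', in which case the payload carries a 'headers' list of name/value pairs; get_header is a hypothetical helper name, not something defined elsewhere in this walkthrough.

```python
def get_header(msg, name):
    """Return the value of the first header called `name`, or None.

    Header names are compared case-insensitively.
    """
    for header in msg.get('payload', {}).get('headers', []):
        if header.get('name', '').lower() == name.lower():
            return header.get('value')
    return None
```

For example, get_header(msg, 'Subject') would return the subject line of the message, or None if no such header is present.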


Interestingly, the email body data lives inside a MIME message part, and the Gmail API returns that data as a base64url-encoded string. To decode it, we can use Python's base64 library. The fully fleshed-out data acquisition flow is provided below:
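A minimal sketch of the decoding step follows. It is an illustrative stand-in, not the article's original listing: the helper names decode_body_data, iter_parts, and get_body_text are hypothetical, and the code assumes the body is UTF-8 text and that the wanted content sits in a part whose mimeType is 'text/html' (or whatever type is passed in).

```python
import base64


def decode_body_data(data):
    """Decode a base64url string into text, tolerating missing padding."""
    padded = data + '=' * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode('utf-8', errors='replace')


def iter_parts(part):
    """Yield a message part and, recursively, all of its nested parts."""
    yield part
    for child in part.get('parts', []):
        yield from iter_parts(child)


def get_body_text(msg, mime_type='text/html'):
    """Return the decoded body of the first part with the given MIME type.

    Returns None when no matching part carries body data.
    """
    for part in iter_parts(msg['payload']):
        data = part.get('body', {}).get('data')
        if part.get('mimeType') == mime_type and data:
            return decode_body_data(data)
    return None
```

Walking the parts recursively matters because multipart emails nest their text and HTML bodies inside the top-level payload rather than storing them directly on it.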

Email Body Parsing with Regex

Focusing on our example email message that we were experimenting with above, let’s decode the body…

Continue reading: https://towardsdatascience.com/extracting-metadata-from-medium-daily-digest-newsletters-via-gmail-api-97eee890a439?source=rss----7f60cf5620c9---4

Source: towardsdatascience.com