When I first started out in data science, everything was about accurate modeling for me. But I quickly realized that to provide real value, models can’t exist in a vacuum. I was missing important aspects of the data needed to get reasonable performance, and it wasn’t clear how users reacted to model outcomes. So I started collecting examples from products that I thought or knew were ML-powered, to understand how different companies collect data to address these questions. In this post, I want to share some of the cases I’ve gathered, mostly from consumer-facing products, and the problems they solve for data scientists and product managers working on data-powered products.
Although many of the patterns I describe below are not specific to ML products and can apply to any digital product, they become critical when it comes to ML. Why? ML models can operate on examples they’ve never seen before or in highly personalized environments, so it is nearly impossible to test every output. Allowing for user feedback thus helps to identify experiences that didn’t work well. Also, some models can incorporate feedback dynamically and adjust the user experience almost immediately. Most importantly, ML models are built on data, so quality data collection at scale is the basis for quality models.
Disclaimer. I don’t actually know how the elements described in this post function within the real products — this review is based on my understanding and information companies shared in publicly available articles and presentations. What I describe here is my opinion — “what I would likely do”.
I’ll write about several categories of data collection:
- Pre-experience: used to tune the product functionality before use.
- Feedback: used to measure the user’s reaction to the product experience.
- Crowdsourcing: used to collect additional data that is not linked to the product experience of the specific user.
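To make the distinction concrete, here is a minimal sketch of how collected data could be tagged with one of these three categories so that it can be routed to different uses later. Everything in it is an assumption for illustration: the names `CollectionCategory`, `CollectedEvent`, and `group_by_category`, as well as the example payloads, are hypothetical and not taken from any real product.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class CollectionCategory(Enum):
    """The three categories of data collection (hypothetical naming)."""
    PRE_EXPERIENCE = "pre_experience"  # gathered before the user sees model output
    FEEDBACK = "feedback"              # reaction to a specific product experience
    CROWDSOURCING = "crowdsourcing"    # extra data not tied to this user's experience


@dataclass
class CollectedEvent:
    """One piece of collected data, tagged with its category."""
    user_id: str
    category: CollectionCategory
    payload: Dict[str, Any] = field(default_factory=dict)


def group_by_category(
    events: List[CollectedEvent],
) -> Dict[CollectionCategory, List[CollectedEvent]]:
    """Split events by category so each group can feed a different use."""
    grouped: Dict[CollectionCategory, List[CollectedEvent]] = {
        category: [] for category in CollectionCategory
    }
    for event in events:
        grouped[event.category].append(event)
    return grouped


# Illustrative, made-up events: one per category.
events = [
    CollectedEvent("u1", CollectionCategory.PRE_EXPERIENCE, {"goal": "run 5 km"}),
    CollectedEvent("u1", CollectionCategory.FEEDBACK, {"item_id": "rec-42", "thumbs_up": False}),
    CollectedEvent("u2", CollectionCategory.CROWDSOURCING, {"image_id": "img-7", "label": "cat"}),
]
grouped = group_by_category(events)
```

Tagging at collection time is one possible design choice. It keeps onboarding answers, reactions to model output, and crowdsourced labels separable, which matters because each group would likely be used differently downstream.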
Pre-experience data collection can be used to quickly personalize the product. The goal here is to collect relevant data fast, so that users are exposed to the data-powered product functionality as soon as possible. In some cases, it’s the only way to make a product work. So data collection needs to happen during onboarding to a product or feature. It can come in the form of filling in the user profile, setting up personal goals, or calibrating the…
Continue reading: https://towardsdatascience.com/data-collection-in-machine-learning-products-816c1e1951b1?source=rss—-7f60cf5620c9—4