In my prior blog, “Reframing Data Management: Data Management 2.0”, I discussed the importance of transforming data management into a business strategy that supports the sharing, reuse, and continuous refinement of data and analytics assets to derive and drive new sources of customer, product, and operational value. If data is “the world’s most valuable resource”, then we must transform data management into an offensive, “data monetization” business strategy that proactively guides organizations in applying their data to the business to drive quantifiable financial impact (Figure 1).
Figure 1: Activating Data Management
In this blog I want to drill into the importance of Machine Learning (ML) “Features”. I have always found the term “features” a bit confusing. When we describe the “features” of a car, we talk about manufacturer, model, style, color, cruise control, lane departure warnings, Bluetooth connectivity, and such. But when we talk about “features” with respect to ML, the meaning is more nuanced:
ML Features are the attributes, properties, or data variables that ML models use during training and inference to make predictions.
ML Features are the data variables that are most useful in making predictions that deliver quantifiable financial impact. For example, if we build an ML model to predict the survival of Titanic passengers, the model would learn that “Title” and “Sex” are the most important features in predicting a passenger’s likely survival (Figure 2).
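One simple way to see how some features carry more predictive signal than others is to score each one by information gain: how much knowing its value reduces uncertainty about the outcome. The sketch below is an illustration under stated assumptions, not the method used in the cited Titanic analysis. The eight rows are invented toy data shaped like the Titanic columns, and the helper names `entropy`, `information_gain`, and `rank_features` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy rows shaped like the Titanic data. The values are invented
# for illustration and are NOT real passenger records.
ROWS = [
    {"Title": "Mr",     "Sex": "male",   "Embarked": "S", "Survived": 0},
    {"Title": "Mr",     "Sex": "male",   "Embarked": "C", "Survived": 0},
    {"Title": "Mrs",    "Sex": "female", "Embarked": "S", "Survived": 1},
    {"Title": "Miss",   "Sex": "female", "Embarked": "C", "Survived": 1},
    {"Title": "Master", "Sex": "male",   "Embarked": "S", "Survived": 1},
    {"Title": "Mr",     "Sex": "male",   "Embarked": "C", "Survived": 0},
    {"Title": "Miss",   "Sex": "female", "Embarked": "S", "Survived": 1},
    {"Title": "Mr",     "Sex": "male",   "Embarked": "S", "Survived": 0},
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, feature, target="Survived"):
    """Drop in target entropy after splitting the rows on one feature."""
    base = entropy([r[target] for r in rows])
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[target])
    # Weighted average entropy of the target inside each feature value's group.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def rank_features(rows, features, target="Survived"):
    """Return (feature, gain) pairs, most informative first."""
    scores = {f: information_gain(rows, f, target) for f in features}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_features(ROWS, ["Title", "Sex", "Embarked"]):
        print(f"{name:10s} {score:.3f}")
```

On this toy data the ranking comes out Title, then Sex, then Embarked. Title scores a perfect 1.0 only because the sample is tiny; plain information gain also tends to favor features with many distinct values, which is one reason variants such as gain ratio exist.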
Figure 2: Source: “Predicting the Survival of Titanic Passengers”
Deciding which ML Features to engineer depends heavily on a deep understanding of the problem the business is trying to solve; that is, the decisions it is trying to optimize and the KPIs against which it will measure progress and success.
Feature Selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used to:
- simplify models to make them easier for users to interpret
- shorten training times
- mitigate the curse of dimensionality, which can degrade model performance
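Feature selection techniques are often grouped into filter, wrapper, and embedded methods. The sketch below shows a minimal filter method, assuming numeric (already encoded) columns: score each column by the absolute Pearson correlation with the target and keep the top k. The column names, values, and the `select_top_k` helper are hypothetical, and correlation only captures linear relationships, so treat this as an illustration rather than a recommended production approach.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Root sum of squared deviations; the 1/n factors cancel in the ratio.
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column carries no signal for this filter
    return cov / (norm_x * norm_y)


def select_top_k(columns, target, k):
    """Keep the k columns whose |correlation| with the target is largest."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical encoded columns (1 = yes, 0 = no) and a survival label.
TARGET = [0, 0, 1, 1, 1, 0]
COLUMNS = {
    "is_female":       [0, 0, 1, 1, 1, 0],
    "pclass_is_first": [0, 0, 1, 0, 1, 0],
    "constant_flag":   [1, 1, 1, 1, 1, 1],
}

if __name__ == "__main__":
    selected, scores = select_top_k(COLUMNS, TARGET, k=2)
    print(selected)
```

Here the filter keeps `is_female` and `pclass_is_first` and drops the constant column, which is the kind of simplification the list above describes: fewer inputs, faster training, and less room for dimensionality problems.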
Feature Engineering is the process of blending domain knowledge with data science to create new data variables that ML models can use to make predictions in the context of the problem being addressed. For a Retailer making decisions regarding in-store merchandising, the…
Continue reading: http://www.datasciencecentral.com/xn/detail/6448529:BlogPost:1069002
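The Titanic example also illustrates feature engineering. In the widely used Kaggle version of the Titanic data, there is no raw “Title” column; a title such as “Mr” or “Miss” is usually derived from the `Name` field (formatted like “Braund, Mr. Owen Harris”). The sketch below assumes that format. The alias map, the “Rare” bucket, and the `FamilySize` feature follow common community practice rather than anything stated in the cited source, and all helper names are hypothetical.

```python
import re

# Assumes names look like "Surname, Title. Given names".
TITLE_PATTERN = re.compile(r",\s*([^.]+)\.")

# Assumed normalization: fold variant honorifics into the common ones.
TITLE_ALIASES = {"Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs"}
COMMON_TITLES = {"Mr", "Mrs", "Miss", "Master"}


def extract_title(name):
    """Derive a Title feature from a raw passenger name string."""
    match = TITLE_PATTERN.search(name)
    if match is None:
        return "Unknown"
    title = match.group(1).strip()
    title = TITLE_ALIASES.get(title, title)
    # Group infrequent titles (e.g. "Don", "Rev") so the model sees fewer sparse values.
    return title if title in COMMON_TITLES else "Rare"


def family_size(sib_sp, parch):
    """Siblings/spouses plus parents/children aboard, plus the passenger."""
    return sib_sp + parch + 1


def engineer(passenger):
    """Turn one raw record into a dict of model-ready features."""
    return {
        "Title": extract_title(passenger["Name"]),
        "FamilySize": family_size(passenger["SibSp"], passenger["Parch"]),
        "Sex": passenger["Sex"],
    }


if __name__ == "__main__":
    raw = {"Name": "Palsson, Master. Gosta Leonard", "SibSp": 3, "Parch": 1, "Sex": "male"}
    print(engineer(raw))
```

The domain knowledge lives in the choices: knowing that titles encode age and social status, and that travelling with family may change one's odds, is what turns a free-text name and two counts into variables a model can use.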