By Siddharth (Sid) Kashiramka (Sr. Manager, Platform, Capital One),
Anshuman Guha (Principal Data Scientist, Card DS, Capital One),
DeCarlos Taylor (Director, Card DS, Capital One)

It is now well recognized across industries that predictive modeling and machine learning can provide tremendous value to organizations that make these techniques an integral part of their business model. Many organizations in both the public and private sectors have adopted a data-driven business strategy in which insights derived from comprehensive data analyses, or from highly complex machine learning algorithms, influence key business or operational decisions. Although many organizations leverage machine learning at scale and use cases abound, at a high level the machine learning life cycle has a common structure irrespective of the specific use case or application. Specifically, for any organization leveraging data science at scale, the machine learning life cycle is defined by four key components: Model Development, Model Deployment, Model Monitoring, and Model Governance (see Figure 1).

Figure 1 – Four critical steps of the machine learning life cycle.

Most data scientists are well versed in the model development part of the machine learning life cycle and are highly familiar with complex data queries (e.g., SQL), data wrangling, feature engineering, and algorithm training. The model performance monitoring component of the life cycle is also germane to a data scientist’s function: the performance of models over time, in response to changing data distributions or in new application domains, can be monitored using statistical metrics (e.g., mean squared error, precision and recall) that many data scientists are familiar with. In addition, model governance requirements, depending on the industry, are generally well defined (although not necessarily well executed!) and are often articulated at great length in organizational policy documents and in regulatory mandates issued by a governing body.
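As a rough illustration of the monitoring idea described above, the following Python sketch computes mean squared error and precision/recall, groups a classifier's predictions by time period, and flags periods where either metric has fallen noticeably below a baseline period. The function names, the record layout, and the 0.10 drop threshold are all assumptions made for this example; they are not taken from the article or from any particular monitoring tool.

```python
from collections import defaultdict


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def metrics_by_period(records):
    """Group (period, y_true, y_pred) records and score each period.

    The record layout is hypothetical; a real system would read scored
    predictions and observed outcomes from its own data store.
    """
    grouped = defaultdict(lambda: ([], []))
    for period, t, p in records:
        grouped[period][0].append(t)
        grouped[period][1].append(p)
    return {
        period: precision_recall(ts, ps)
        for period, (ts, ps) in sorted(grouped.items())
    }


def flag_degraded(metrics, baseline_period, max_drop=0.10):
    """Return periods whose precision or recall fell more than max_drop
    below the baseline period. The threshold is an illustrative assumption."""
    base_p, base_r = metrics[baseline_period]
    return [
        period
        for period, (p, r) in metrics.items()
        if base_p - p > max_drop or base_r - r > max_drop
    ]


# Made-up example data: perfect predictions in January, degraded in February.
records = [
    ("2024-01", 1, 1), ("2024-01", 1, 1), ("2024-01", 0, 0), ("2024-01", 0, 0),
    ("2024-02", 1, 1), ("2024-02", 1, 0), ("2024-02", 0, 1), ("2024-02", 0, 0),
]
metrics = metrics_by_period(records)
degraded = flag_degraded(metrics, baseline_period="2024-01")
```

In practice the choice of metric, window size, and alert threshold depends on the use case, and many teams would rely on an established metrics library rather than hand-written functions like these.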

Although the model development, monitoring, and governance components of the machine learning life cycle are complex and fraught with challenges, the model deployment (or “productionizing”) component of the process is where many organizations seem to have the most trouble. A recent…
