An in-depth look at how Yemego NGO uses machine learning techniques to improve donations

Photo Credit: Katt Yukawa; Unsplash

Business Understanding

Yemego NGO is a fictitious charity organization that provides programs and services for veterans with spinal cord injuries or diseases. Charity organizations commonly raise funds through direct mailing campaigns, and records from previous campaigns allow them to contact people who have donated in the past.

The challenge is to draw insights from past donor history and use them to make predictions. By predicting which donors will donate and estimating how much each will give, we can help save resources for the actual work of caring for the less privileged. One way for Yemego to reduce costs is to make donor outreach more efficient by identifying the donors most likely to give. Since each mailing to a donor costs $5, this project aims to save Yemego NGO time and money by targeting only the most probable donors, based on past response data.
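The $5 mailing cost suggests a simple break-even check. The sketch below is a hypothetical decision rule, not necessarily the one used in this project: assuming a classifier supplies each donor's probability of donating and a regressor supplies the expected gift, a donor is worth mailing only when the expected donation exceeds the mailing cost. The function names and the three candidate donors are illustrative assumptions.

```python
from typing import Iterable, Tuple

# Cost of mailing one donor, as stated in the article.
MAILING_COST = 5.0


def should_mail(p_donate: float, expected_amount: float,
                cost: float = MAILING_COST) -> bool:
    """Hypothetical rule: mail only if the expected gift exceeds the cost."""
    return p_donate * expected_amount > cost


def expected_net_return(donors: Iterable[Tuple[float, float]],
                        cost: float = MAILING_COST) -> float:
    """Total expected net gain over the donors the rule chooses to mail.

    `donors` holds (p_donate, expected_amount) pairs, e.g. a classifier's
    probability of donating and a regressor's predicted gift amount.
    """
    return sum(p * amt - cost for p, amt in donors if should_mail(p, amt, cost))


# Hypothetical model outputs for three donors: (probability, expected gift).
candidates = [(0.10, 20.0), (0.40, 15.0), (0.05, 200.0)]
print([should_mail(p, a) for p, a in candidates])  # → [False, True, True]
print(expected_net_return(candidates))             # → 6.0
```

The rule ranks donors by expected value rather than by probability alone: in the example, the least likely donor (5%) is still worth mailing because of the size of the expected gift, while a 10% donor with a small expected gift is not.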

The Data

The datasets used in this project are a subset of the data originally provided for the KDD Cup ’98 competition, split into a training set and a test set. The training dataset has 19,000+ records and 50 categorical, ordered, and quantitative variables, while the test dataset has 2,000+ records and 48 categorical, ordered, or quantitative variables. The data also contains two target variables: a binary one indicating whether a person donated (Target_B), and the amount the person gave to the charity in response to the campaign (Target_D). Together, these variables can be used to build models that predict whether a donor will give to the charity and estimate how much they will give.

Data Preprocessing

I always start by looking at the data: the total number of observations and features, missing values, which features need to be encoded, which features contain miscellaneous data, and so on. For ease of analysis, I renamed Target_B to Donated and Target_D to Amount_donated.
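A minimal pandas sketch of this first pass might look like the following. The toy DataFrame and its AGE and GENDER columns are hypothetical stand-ins for the real training file, which is not shown here; only the Target_B and Target_D names, and their new names, come from the article. Treating every non-numeric column as a candidate for encoding is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical toy rows standing in for the training data. Only the
# Target_B / Target_D column names come from the article.
train = pd.DataFrame({
    "AGE": [62.0, None, 45.0, 71.0],
    "GENDER": ["F", "M", None, "F"],
    "Target_B": [1, 0, 0, 1],
    "Target_D": [10.0, 0.0, 0.0, 25.0],
})

# Give the two targets more readable names.
train = train.rename(columns={"Target_B": "Donated", "Target_D": "Amount_donated"})

# Number of observations (rows) and columns.
n_obs, n_cols = train.shape

# Count missing values per column and keep only columns that have some.
missing = train.isna().sum()
missing = missing[missing > 0]

# Non-numeric columns are candidates for encoding before modeling.
to_encode = [col for col in train.columns
             if not pd.api.types.is_numeric_dtype(train[col])]

print(f"{n_obs} observations, {n_cols} columns")
print("Missing values per column:", missing.to_dict())
print("Columns to encode:", to_encode)
```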

Looking at the training dataset, I tried to…
