For this project we will attempt to use KMeans Clustering to cluster Universities into to two groups, Private and Public. It is very important to note, we actually have the labels for this data set, but we will NOT use them for the KMeans clustering algorithm, since that is an unsupervised learning algorithm.
When using the Kmeans algorithm under normal circumstances, it is because you don’t have labels. In this case we will use the labels to try to get an idea of how well the algorithm performed, but you won’t usually do this for Kmeans, so the classification report and confusion matrix at the end of this project, don’t truly make sense in a real world setting!.
We will use a data frame with 777 observations on the following 18 variables.… Read more...