Multiple Linear Regression (MLR) is a tool commonly used by data scientists. Inferential statistical tools like MLR are used to draw conclusions about a population that go beyond what the sample data alone can tell us. In the case of MLR, we are trying to predict an outcome, or dependent variable, in relation to changes in the predictors, or independent variables, for a population by determining a linear relationship between the combination of all the independent variables and the dependent variable in a sample from that population. That’s a lot to say in words, so let’s quickly look at some equations to make sure we understand. First, we’ll start with a linear regression with one predictor, where a line fit to the sample data would have the equation

y = b0 + b1*x1
where y is our outcome, x1 is our predictor, b1 is our first coefficient, such that a one-unit change in x1 leads to a change in y of b1, and b0 is our y-axis intercept, or where the line crosses the y-axis when x1 is zero. In this case, we would use this model to predict a novel outcome y from a novel value of the predictor x1.
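To make this concrete, here is a minimal sketch, not from the original article, that fits a one-predictor regression to made-up data with scikit-learn; the synthetic data, the seed, and the example prediction point are my own illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic sample: x1 is the predictor, y is the outcome.
# The data are generated from y = 2.0 + 0.5 * x1 + noise (illustrative values).
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x1 + rng.normal(scale=1.0, size=100)

model = LinearRegression().fit(x1.reshape(-1, 1), y)

print("b0 (intercept):", model.intercept_)  # where the fitted line crosses the y-axis
print("b1 (slope):    ", model.coef_[0])    # change in y for a one-unit change in x1

# Predict a novel outcome for a new predictor value x1 = 4.2
print("prediction at x1 = 4.2:", model.predict([[4.2]])[0])
```

The fitted intercept and slope should land close to the values used to generate the data, which is exactly the interpretation of b0 and b1 described above.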
We can easily extend this from a singular to a multiple linear regression by adding more coefficient * predictor terms, as in

y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x5

where we still have our axis intercept, now joined by five independent variables and their respective coefficients. Technically, we can interpret the coefficients b1 through b5 in the same way as for a singular regression; however, doing so relies on assumptions that aren’t always perfectly true for real-world data. The primary assumptions of a linear regression, multiple or singular, are listed below, followed by a short code sketch that fits a five-predictor model and checks the residual-based assumptions:
- Linearity: There is a linear relationship between the outcome and predictor variable(s).
- Normality: The residuals, or errors, calculated by subtracting the predicted value from the actual value, follow a normal distribution.
- Homoscedasticity: The variance of the residuals is constant across all values of the independent variable(s); in other words, the spread of the dependent variable around the regression line does not change as the predictors change.
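Here is a short sketch, again on synthetic data of my own invention rather than anything from the article, that fits a five-predictor model with statsmodels and runs two common diagnostics: a Shapiro–Wilk test for residual normality and a Breusch–Pagan test for homoscedasticity.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
n = 200

# Five synthetic predictors and an outcome built from known coefficients.
X = rng.normal(size=(n, 5))
true_b = np.array([1.5, -2.0, 0.0, 0.75, 3.0])
y = 4.0 + X @ true_b + rng.normal(scale=1.0, size=n)

# statsmodels expects the intercept column to be added explicitly.
X_const = sm.add_constant(X)
results = sm.OLS(y, X_const).fit()

print(results.params)        # estimates of [b0, b1, ..., b5]
# print(results.summary())   # full table with standard errors and p-values

# Normality of residuals: Shapiro-Wilk (null hypothesis: residuals are normal).
shapiro_stat, shapiro_p = stats.shapiro(results.resid)
print("Shapiro-Wilk p-value:", shapiro_p)

# Homoscedasticity: Breusch-Pagan (null hypothesis: constant residual variance).
bp_stat, bp_p, _, _ = het_breuschpagan(results.resid, X_const)
print("Breusch-Pagan p-value:", bp_p)
```

With well-behaved synthetic data like this, both p-values should be comfortably above common significance thresholds, meaning neither test gives evidence against the normality or homoscedasticity assumptions.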
Setting aside the fact that we often still build linear models for data that doesn’t quite meet these standards, once we have many independent variables, as in MLR, we run into other problems: multicollinearity, where variables that are supposed to be independent vary with one another, and the presence of categorical variables, such as an ocean temperature classified as cool, warm, or hot rather than quantified in degrees…
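To round out those two issues, here is a small illustrative sketch, again my own rather than the article’s, that flags multicollinearity with variance inflation factors and one-hot encodes a categorical ocean-temperature column with pandas; the column names, the collinear data, and the VIF rule of thumb are assumptions made for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 150

# Two nearly collinear predictors plus one genuinely independent predictor.
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=n)  # varies almost exactly with x1
x3 = rng.normal(size=n)

df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3,
                   "ocean_temp": rng.choice(["cool", "warm", "hot"], size=n)})

# Variance inflation factors: a common rule of thumb treats VIF above ~5-10 as a red flag.
exog = sm.add_constant(df[["x1", "x2", "x3"]]).to_numpy()
for i, name in enumerate(["x1", "x2", "x3"], start=1):
    print(name, round(variance_inflation_factor(exog, i), 2))

# Categorical predictor: one-hot encode, dropping one level as the baseline.
encoded = pd.get_dummies(df, columns=["ocean_temp"], drop_first=True)
print(encoded.columns.tolist())  # e.g. [..., 'ocean_temp_hot', 'ocean_temp_warm']
```

Here x1 and x2 should show very large VIFs while x3 stays near 1, and the encoded dummy columns are what would actually enter the regression in place of the raw cool/warm/hot labels.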