8. A company collects data on employees’ salaries and records their education level as a categorical variable with three levels: “High School”, “Bachelor’s”, and “Master’s”. Fit a multiple linear regression model to predict salary using education level (as a factor variable) and years of experience. Interpret the coefficients for the education levels in the regression model.
PROGRAM:
import pandas as pd
import statsmodels.formula.api as smf
# Dataset
data = {
    'Salary': [40000, 50000, 60000, 45000, 55000, 65000, 48000, 58000, 70000, 62000],
    'Education': ['High School', "Bachelor's", "Master's", 'High School', "Bachelor's",
                  "Master's", 'High School', "Bachelor's", "Master's", "Bachelor's"],
    'Experience': [2, 5, 7, 3, 6, 8, 4, 5, 9, 6]
}
df = pd.DataFrame(data)
# Fit multiple linear regression model
# Explicitly set 'High School' as the baseline category
model = smf.ols('Salary ~ Experience + C(Education, Treatment(reference="High School"))', data=df).fit()
# Print model summary
print(model.summary())
OUTPUT:
                            OLS Regression Results
==============================================================================
Dep. Variable:                 Salary   R-squared:                       0.928
Model:                            OLS   Adj. R-squared:                  0.892
Method:                 Least Squares   F-statistic:                     25.72
Date:                Wed, 03 Sep 2025   Prob (F-statistic):           0.000799
Time:                        08:50:14   Log-Likelihood:                -92.071
No. Observations:                  10   AIC:                             192.1
Df Residuals:                       6   BIC:                             193.4
Df Model:                           3
Covariance Type:            nonrobust
==================================================================================================================================
                                                                       coef    std err          t      P>|t|     [0.025     0.975]
----------------------------------------------------------------------------------------------------------------------------------
Intercept                                                         3.083e+04   4547.690      6.780      0.001   1.97e+04    4.2e+04
C(Education, Treatment(reference="High School"))[T.Bachelor's]     666.6667   4215.821      0.158      0.880  -9649.076    1.1e+04
C(Education, Treatment(reference="High School"))[T.Master's]     -1833.3333   7411.827     -0.247      0.813     -2e+04   1.63e+04
Experience                                                        4500.0000   1392.440      3.232      0.018   1092.822   7907.178
==============================================================================
Omnibus:                        0.016   Durbin-Watson:                   1.820
Prob(Omnibus):                  0.992   Jarque-Bera (JB):                0.149
Skew:                          -0.002   Prob(JB):                        0.928
Kurtosis:                       2.403   Cond. No.                         55.7
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
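INTERPRETATION:
One reading of the summary above, with "High School" as the reference level:
- Intercept (about 30,833): the predicted salary for a High School employee with 0 years of experience.
- Bachelor's (666.67): at the same years of experience, a Bachelor's holder is estimated to earn about 667 more than a High School employee. With p = 0.880 and a 95% interval that spans zero, this difference is not statistically significant.
- Master's (-1833.33): at the same years of experience, a Master's holder is estimated to earn about 1,833 less than a High School employee. With p = 0.813, this is also not significant, so the negative sign should not be read as a real penalty.
- Experience (4500, p = 0.018): each additional year of experience is associated with about 4,500 more salary, holding education fixed.
In this sample, education and experience move together (High School rows have 2 to 4 years, Master's rows 7 to 9), which is a likely reason the education coefficients carry large standard errors with only 10 observations.

The sketch below is one way to check what the education coefficients measure, without statsmodels. It assumes the same model form (a separate intercept per education level and one shared Experience slope) and uses the fact that, for such a model, the slope equals the pooled within-group slope and each group's line passes through its own means. The names `groups`, `means`, `intercepts`, and `effects` are illustrative and not part of the original program.

```python
from collections import defaultdict

salary = [40000, 50000, 60000, 45000, 55000, 65000, 48000, 58000, 70000, 62000]
education = ["High School", "Bachelor's", "Master's", "High School", "Bachelor's",
             "Master's", "High School", "Bachelor's", "Master's", "Bachelor's"]
experience = [2, 5, 7, 3, 6, 8, 4, 5, 9, 6]

# Group the (experience, salary) pairs by education level.
groups = defaultdict(list)
for edu, exp, sal in zip(education, experience, salary):
    groups[edu].append((exp, sal))


def mean(values):
    return sum(values) / len(values)


# Mean experience and mean salary within each education level.
means = {edu: (mean([x for x, _ in pts]), mean([y for _, y in pts]))
         for edu, pts in groups.items()}

# Shared slope: pooled within-group covariance over pooled within-group variance.
sxy = sum((x - means[edu][0]) * (y - means[edu][1])
          for edu, pts in groups.items() for x, y in pts)
sxx = sum((x - means[edu][0]) ** 2
          for edu, pts in groups.items() for x, _ in pts)
slope = sxy / sxx

# Each group's fitted line passes through its own means, so its intercept is
# mean salary minus slope times mean experience.
intercepts = {edu: m_y - slope * m_x for edu, (m_x, m_y) in means.items()}

# Treatment coding reports every level as a gap from the reference level.
baseline = intercepts["High School"]
effects = {edu: b - baseline for edu, b in intercepts.items() if edu != "High School"}

print(f"Experience slope: {slope:.4f}")
print(f"Intercept (High School): {baseline:.2f}")
for edu, gap in effects.items():
    print(f"{edu} vs High School: {gap:.4f}")
```

The printed slope, baseline intercept, and gaps should agree with the `coef` column of the summary up to rounding. This shows that each education coefficient is the vertical distance between that level's fitted line and the High School line at any fixed experience.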