BAD702 Program 8

8. A company collects data on employees’ salaries and records their education level as a categorical variable with three levels: “High School”, “Bachelor’s”, and “Master’s”. Fit a multiple linear regression model to predict salary using education level (as a factor variable) and years of experience. Interpret the coefficients for the education levels in the regression model.
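
The model the exercise asks for can be written out explicitly. This is a sketch assuming "High School" is the reference level, as the program sets it; the indicator names D^Bach and D^Mast are notation introduced here for illustration, not taken from the source.

```latex
\[
\text{Salary}_i = \beta_0 + \beta_1\,\text{Experience}_i
  + \beta_2\,D^{\text{Bach}}_i + \beta_3\,D^{\text{Mast}}_i + \varepsilon_i,
\qquad
D^{\text{Bach}}_i =
\begin{cases}
1 & \text{if employee } i \text{ has a Bachelor's} \\
0 & \text{otherwise}
\end{cases}
\]
```

D^Mast is defined the same way for "Master's". Under this coding, \beta_2 and \beta_3 are the expected salary differences from a High School employee with the same years of experience, and \beta_0 is the expected salary of a High School employee with zero years of experience.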

PROGRAM:

import pandas as pd
import statsmodels.formula.api as smf

# Sample dataset: salary, education level, and years of experience
data = {
    'Salary': [40000, 50000, 60000, 45000, 55000, 65000, 48000, 58000, 70000, 62000],
    'Education': ['High School', "Bachelor's", "Master's", 'High School', "Bachelor's", "Master's", 'High School', "Bachelor's", "Master's", "Bachelor's"],
    'Experience': [2, 5, 7, 3, 6, 8, 4, 5, 9, 6]
}

df = pd.DataFrame(data)

# Fit multiple linear regression model
# Explicitly set 'High School' as the baseline category
model = smf.ols('Salary ~ Experience + C(Education, Treatment(reference="High School"))', data=df).fit()

# Print model summary
print(model.summary())

OUTPUT:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Salary   R-squared:                       0.928
Model:                            OLS   Adj. R-squared:                  0.892
Method:                 Least Squares   F-statistic:                     25.72
Date:                Wed, 03 Sep 2025   Prob (F-statistic):           0.000799
Time:                        08:50:14   Log-Likelihood:                -92.071
No. Observations:                  10   AIC:                             192.1
Df Residuals:                       6   BIC:                             193.4
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==================================================================================================================================
                                                                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------------------------------------------
Intercept                                                       3.083e+04   4547.690      6.780      0.001    1.97e+04     4.2e+04
C(Education, Treatment(reference="High School"))[T.Bachelor's]   666.6667   4215.821      0.158      0.880   -9649.076     1.1e+04
C(Education, Treatment(reference="High School"))[T.Master's]   -1833.3333   7411.827     -0.247      0.813      -2e+04    1.63e+04
Experience                                                      4500.0000   1392.440      3.232      0.018    1092.822    7907.178
==============================================================================
Omnibus:                        0.016   Durbin-Watson:                   1.820
Prob(Omnibus):                  0.992   Jarque-Bera (JB):                0.149
Skew:                          -0.002   Prob(JB):                        0.928
Kurtosis:                       2.403   Cond. No.                         55.7
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
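
INTERPRETATION:

Reading the coefficient table above: holding experience constant, the predicted salary for "Bachelor's" is about 667 higher than for "High School", and for "Master's" about 1,833 lower. Neither difference is statistically distinguishable from zero (p = 0.880 and p = 0.813, with both confidence intervals spanning zero), which is not surprising with only 10 observations and education closely tied to experience in this sample. Experience is the only significant predictor: each additional year is associated with about 4,500 more in salary (p = 0.018).

The sketch below shows how the treatment coding turns these coefficients into predictions. The helper names predict_salary and education_gap are hypothetical, and the numbers are copied from the summary, so the intercept is only as precise as the printed 3.083e+04.

```python
# Coefficients copied from the OLS summary above. The intercept is printed
# there only as 3.083e+04, so it is rounded here (an assumption of this sketch).
INTERCEPT = 30830.0
EXPERIENCE_SLOPE = 4500.0
EDUCATION_OFFSET = {
    "High School": 0.0,        # reference level, absorbed into the intercept
    "Bachelor's": 666.6667,
    "Master's": -1833.3333,
}


def predict_salary(education, experience):
    """Hypothetical helper: fitted salary for one employee."""
    return INTERCEPT + EXPERIENCE_SLOPE * experience + EDUCATION_OFFSET[education]


def education_gap(level, experience=5.0):
    """Predicted salary difference from High School at equal experience."""
    return predict_salary(level, experience) - predict_salary("High School", experience)


# The gap equals the education coefficient, whatever experience value is used.
for level in EDUCATION_OFFSET:
    print(f"{level:12s} gap vs High School: {education_gap(level):10.2f}")
```

Because the model has no interaction term, the gap between two education levels is the same at every experience value; only the overall salary level shifts with experience.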