BAD702 Program 9

9. You have data on housing prices and square footage and notice that the relationship between square footage and price is nonlinear. Fit a spline regression model to allow the relationship between square footage and price to change at 2,000 square feet. Explain how spline regression can capture different behaviours of the relationship before and after 2,000 square feet.
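The model this question asks for can be written as a linear spline with a single knot at 2,000 square feet. The symbols \beta_0, \beta_1, \beta_2 and \varepsilon are notation introduced here for illustration; they do not appear in the original exercise.

```latex
\[
\text{Price} = \beta_0 + \beta_1\,\text{SqFt} + \beta_2\,\max(0,\ \text{SqFt} - 2000) + \varepsilon
\]
\[
\frac{d\,\mathbb{E}[\text{Price}]}{d\,\text{SqFt}} =
\begin{cases}
\beta_1, & \text{SqFt} < 2000,\\
\beta_1 + \beta_2, & \text{SqFt} > 2000.
\end{cases}
\]
```

The hinge term is zero up to the knot, so \beta_1 alone is the slope below 2,000 square feet. Above the knot the hinge term grows one-for-one with square footage, so the slope becomes \beta_1 + \beta_2. Because the hinge term equals zero exactly at 2,000, the two line segments meet there and the fitted curve has a kink but no jump.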

PROGRAM:

import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Sample dataset: price and square footage for 10 houses
data = {
    'Price': [200000, 250000, 300000, 320000, 350000, 400000, 420000, 450000, 500000, 550000],
    'SqFt': [1500, 1600, 1800, 1900, 2000, 2100, 2200, 2400, 2600, 2800]
}
df = pd.DataFrame(data)

# Define the spline term for SqFt with knot at 2000
df['sqft_knot'] = np.maximum(0, df['SqFt'] - 2000)

# Define independent variables (including intercept)
X = sm.add_constant(df[['SqFt', 'sqft_knot']])
y = df['Price']

# Fit linear regression with spline
model = sm.OLS(y, X).fit()

# Print summary
print(model.summary())

# Plot the fitted spline regression
plt.scatter(df['SqFt'], df['Price'], color='blue', label='Data')
plt.plot(df['SqFt'], model.predict(X), color='red', label='Spline Fit')
plt.xlabel('Square Footage')
plt.ylabel('Price')
plt.title('Spline Regression of Price on Square Footage')
plt.legend()
plt.show()
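The `sqft_knot` column is what turns the ordinary linear regression above into a spline. As a minimal NumPy-only sketch (the helper name `hinge` is hypothetical and not part of the program), the term can be inspected on its own to see that it stays at zero up to 2,000 square feet and only then starts counting the excess:

```python
import numpy as np

def hinge(x, knot):
    """Return max(0, x - knot) element-wise: zero up to the knot, linear after it."""
    return np.maximum(0, np.asarray(x, dtype=float) - knot)

# Same SqFt values as in the program above
sqft = np.array([1500, 1600, 1800, 1900, 2000, 2100, 2200, 2400, 2600, 2800])
sqft_knot = hinge(sqft, 2000)

# Houses at or below 2,000 sq ft contribute 0; larger houses contribute the excess.
for s, k in zip(sqft, sqft_knot):
    print(f"SqFt={s:5d}  sqft_knot={k:6.1f}")
```

Since the first five houses have `sqft_knot` equal to zero, only the `SqFt` coefficient affects their predicted price. The `sqft_knot` coefficient can only change predictions for the five houses above the knot.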

OUTPUT:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  Price   R-squared:                       0.992
Model:                            OLS   Adj. R-squared:                  0.990
Method:                 Least Squares   F-statistic:                     458.5
Date:                Wed, 03 Sep 2025   Prob (F-statistic):           3.78e-08
Time:                        08:53:50   Log-Likelihood:                -105.38
No. Observations:                  10   AIC:                             216.8
Df Residuals:                       7   BIC:                             217.7
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const      -2.557e+05   4.11e+04     -6.217      0.000   -3.53e+05   -1.58e+05
SqFt         308.7019     22.587     13.667      0.000     255.293     362.111
sqft_knot    -73.5822     32.468     -2.266      0.058    -150.356       3.191
==============================================================================
Omnibus:                        1.654   Durbin-Watson:                   1.972
Prob(Omnibus):                  0.437   Jarque-Bera (JB):                0.915
Skew:                           0.381   Prob(JB):                        0.633
Kurtosis:                       1.729   Cond. No.                     2.55e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.55e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
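
The summary above can be read as two slopes, one on each side of the knot. The sketch below copies the printed (rounded) coefficients into constants rather than refitting the model; the names `slopes` and `predict_price` are hypothetical helpers introduced for illustration.

```python
# Coefficients copied from the OLS summary above (rounded as printed there).
INTERCEPT = -2.557e+05
B_SQFT = 308.7019    # slope of Price on SqFt below the knot
B_KNOT = -73.5822    # change in slope once SqFt exceeds the knot
KNOT = 2000

def slopes():
    """Return (slope below the knot, slope above the knot) in price per sq ft."""
    return B_SQFT, B_SQFT + B_KNOT

def predict_price(sqft):
    """Piecewise-linear prediction; the hinge term is zero at or below the knot."""
    return INTERCEPT + B_SQFT * sqft + B_KNOT * max(0, sqft - KNOT)

below, above = slopes()
print(f"Slope up to {KNOT} sq ft: {below:.2f} per sq ft")
print(f"Slope above {KNOT} sq ft: {above:.2f} per sq ft")
```

With these estimates, each additional square foot adds about 308.70 to the predicted price up to 2,000 square feet and about 235.12 beyond it, so the fitted line flattens after the knot while staying continuous at 2,000. The change in slope has a p-value of 0.058 and a 95% confidence interval that includes zero, so with only 10 observations the evidence for a different slope above 2,000 square feet is suggestive rather than conclusive at the 5% level.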