BAIL606 Program 2

2. Develop a program to load a dataset with at least two numerical columns (e.g., Iris, Titanic). Plot a scatter plot of two variables and calculate their Pearson correlation coefficient. Then compute the covariance matrix and the correlation matrix for the dataset, and visualize the correlation matrix as a heatmap to see which variables have strong positive or negative correlations.
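
For reference, the Pearson coefficient named in the task is the covariance of the two variables divided by the product of their standard deviations. The following is a minimal pure-Python sketch of that definition, not part of the lab program. The helper names `sample_covariance` and `pearson` are hypothetical, and the sample values are only the five rows visible in the dataset preview of the OUTPUT section, so the result is not expected to match the coefficient computed on the full file.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor (the default used by pandas .cov())."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)


def pearson(xs, ys):
    """Pearson r = cov(x, y) / (std(x) * std(y))."""
    return sample_covariance(xs, ys) / math.sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys)
    )


# Illustrative values only: the five preview rows, not the whole dataset
age = [25, 28, 35, 30, 40]
salary = [30000, 35000, 50000, 45000, 70000]

r = pearson(age, salary)
print(f"Pearson r (five preview rows only): {r:.4f}")
```

The result always lies between -1 and 1: values near 1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 little linear relationship.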

PROGRAM:

# Install required packages:
# pip install pandas matplotlib

import pandas as pd
import matplotlib.pyplot as plt

# ==============================
# 1. Load Dataset
# ==============================
df = pd.read_csv("data.csv")

print("Dataset Preview:")
print(df.head())

print("\nColumns in dataset:")
print(df.columns)

# ==============================
# 2. Select Numerical Columns
# ==============================
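num_col1 = input("\nEnter first numerical column name: ").strip()
num_col2 = input("Enter second numerical column name: ").strip()

# Stop early with a clear message if a name is missing or not numerical
numeric_cols = df.select_dtypes(include=["number"]).columns
for col in (num_col1, num_col2):
    if col not in numeric_cols:
        raise SystemExit(f"'{col}' is not a numerical column in the dataset.")

x = df[num_col1]
y = df[num_col2]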

# ==============================
# 3. Scatter Plot
# ==============================
plt.figure(figsize=(8, 5))
plt.scatter(x, y)
plt.title(f"Scatter Plot: {num_col1} vs {num_col2}")
plt.xlabel(num_col1)
plt.ylabel(num_col2)
plt.grid(True)
plt.show()

# ==============================
# 4. Pearson Correlation Coefficient
# ==============================
pearson_corr = x.corr(y)

print("\n--- Pearson Correlation Coefficient ---")
print(f"Correlation between {num_col1} and {num_col2}: {pearson_corr}")

# ==============================
# 5. Covariance Matrix
# ==============================
numeric_df = df.select_dtypes(include=["number"])

cov_matrix = numeric_df.cov()

print("\n--- Covariance Matrix ---")
print(cov_matrix)

# ==============================
# 6. Correlation Matrix
# ==============================
corr_matrix = numeric_df.corr()

print("\n--- Correlation Matrix ---")
print(corr_matrix)

# ==============================
# 7. Heatmap of Correlation Matrix
# ==============================
plt.figure(figsize=(8, 6))
plt.imshow(corr_matrix, cmap="coolwarm", interpolation="nearest")
plt.colorbar(label="Correlation Coefficient")

plt.xticks(range(len(corr_matrix.columns)), corr_matrix.columns, rotation=45)
plt.yticks(range(len(corr_matrix.columns)), corr_matrix.columns)

plt.title("Correlation Matrix Heatmap")

# Show values inside heatmap
for i in range(len(corr_matrix.columns)):
    for j in range(len(corr_matrix.columns)):
        plt.text(j, i, f"{corr_matrix.iloc[i, j]:.2f}",
                 ha="center", va="center")

plt.tight_layout()
plt.show()

OUTPUT:

Dataset Preview:
    Name  Age  Salary  Department  Experience
0   Amit   25   30000          IT           2
1   Riya   28   35000          HR           3
2  Rahul   35   50000     Finance           8
3  Sneha   30   45000          IT           5
4  Arjun   40   70000  Management          12

Columns in dataset:
Index(['Name', 'Age', 'Salary', 'Department', 'Experience'], dtype='str')

Enter first numerical column name: Age
Enter second numerical column name: Salary

--- Pearson Correlation Coefficient ---
Correlation between Age and Salary: 0.9903428626629642

--- Covariance Matrix ---
                      Age        Salary     Experience
Age             62.266667  1.524571e+05      44.376190
Salary      152457.142857  3.806000e+08  109114.285714
Experience      44.376190  1.091143e+05      31.838095

--- Correlation Matrix ---
                 Age    Salary  Experience
Age         1.000000  0.990343    0.996664
Salary      0.990343  1.000000    0.991228
Experience  0.996664  0.991228    1.000000
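
To read off which variables are strongly correlated without inspecting the heatmap by eye, the correlation matrix can be scanned for pairs above a cutoff. The sketch below is one possible way to do this, under stated assumptions: `strong_pairs` is a hypothetical helper, the 0.8 threshold is an arbitrary choice rather than a fixed rule, and the nested dictionary is typed in from the correlation matrix printed above (it has the same shape that `corr_matrix.to_dict()` would produce).

```python
def strong_pairs(corr, threshold=0.8):
    """Return (col_a, col_b, r) for each distinct pair with |r| >= threshold."""
    names = list(corr)
    pairs = []
    for i, a in enumerate(names):
        # Start after position i so each pair is visited once and the diagonal is skipped
        for b in names[i + 1:]:
            r = corr[a][b]
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    # Strongest relationships first
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Values copied from the correlation matrix in the output above
corr = {
    "Age": {"Age": 1.0, "Salary": 0.990343, "Experience": 0.996664},
    "Salary": {"Age": 0.990343, "Salary": 1.0, "Experience": 0.991228},
    "Experience": {"Age": 0.996664, "Salary": 0.991228, "Experience": 1.0},
}

for a, b, r in strong_pairs(corr):
    kind = "positive" if r > 0 else "negative"
    print(f"{a} vs {b}: {r:.3f} (strong {kind})")
```

For this dataset all three pairs clear the cutoff and all are positive, with Age and Experience the most strongly related.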