3. Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality of the Iris dataset from 4 features to 2.
PROGRAM:
#install required packages
#pip install pandas matplotlib scikit-learn
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
# ==============================
# 1. Load Iris Dataset
# ==============================
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
target_names = iris.target_names
df = pd.DataFrame(X, columns=feature_names)
df["species"] = y
print("Original Dataset:")
print(df.head())
print("\nOriginal Shape:", X.shape)
# ==============================
# 2. Standardize the Data
# ==============================
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# ==============================
# 3. Apply PCA: 4 Features → 2 Components
# ==============================
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
pca_df = pd.DataFrame(X_pca, columns=["PC1", "PC2"])
pca_df["species"] = y
print("\nPCA Dataset:")
print(pca_df.head())
print("\nReduced Shape:", X_pca.shape)
print("\nExplained Variance Ratio:")
print(pca.explained_variance_ratio_)
# ==============================
# 4. Visualize PCA Result
# ==============================
plt.figure(figsize=(8, 6))
for species in range(len(target_names)):
    plt.scatter(
        pca_df[pca_df["species"] == species]["PC1"],
        pca_df[pca_df["species"] == species]["PC2"],
        label=target_names[species]
    )
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA of Iris Dataset: 4 Features Reduced to 2")
plt.legend()
plt.grid(True)
plt.show()
OUTPUT:
Original Dataset:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  species
0                5.1               3.5                1.4               0.2        0
1                4.9               3.0                1.4               0.2        0
2                4.7               3.2                1.3               0.2        0
3                4.6               3.1                1.5               0.2        0
4                5.0               3.6                1.4               0.2        0

Original Shape: (150, 4)

PCA Dataset:
        PC1       PC2  species
0 -2.264703  0.480027        0
1 -2.080961 -0.674134        0
2 -2.364229 -0.341908        0
3 -2.299384 -0.597395        0
4 -2.389842  0.646835        0

Reduced Shape: (150, 2)

Explained Variance Ratio:
[0.72962445 0.22850762]
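
The two ratios above sum to about 0.958, so the first two components retain roughly 95.8% of the variance of the standardized data. As a rough cross-check of where these numbers come from, the sketch below repeats the reduction with a plain NumPy eigendecomposition of the covariance matrix and compares it with scikit-learn. The names cov, eigvals, eigvecs, manual_ratio, manual_proj and sk_proj are illustrative and do not appear in the program above. The signs of the components may differ between the two approaches, since an eigenvector is only defined up to sign.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same preprocessing as the main program: standardize the 4 features.
X_scaled = StandardScaler().fit_transform(load_iris().data)

# Covariance matrix of the standardized features (4 x 4).
cov = np.cov(X_scaled, rowvar=False)

# eigh is intended for symmetric matrices and returns eigenvalues
# in ascending order, so reorder them from largest to smallest.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each eigenvalue divided by the total gives that component's variance share.
manual_ratio = eigvals / eigvals.sum()

# Project the data onto the two leading eigenvectors (150 x 2).
manual_proj = X_scaled @ eigvecs[:, :2]

# Reference result from scikit-learn for comparison.
pca = PCA(n_components=2).fit(X_scaled)
sk_proj = pca.transform(X_scaled)

print("Manual explained variance ratio:", manual_ratio[:2])
print("Cumulative variance retained:", np.cumsum(manual_ratio)[:2])
print("Ratios match scikit-learn:",
      np.allclose(manual_ratio[:2], pca.explained_variance_ratio_))
print("Projections match up to sign:",
      np.allclose(np.abs(manual_proj), np.abs(sk_proj), atol=1e-6))
```

If the cumulative value for two components were much lower, keeping a third component would be one option to consider.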
