BAIL606 Program 7

7. Develop a program to load the Titanic dataset. Split the data into training and test sets. Train a decision tree classifier. Visualize the tree structure. Evaluate accuracy, precision, recall, and F1-score.

PROGRAM:

# Install required packages:
#   pip install pandas matplotlib scikit-learn

import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report


# ==============================
# 1. Load Titanic Dataset
# ==============================
titanic = fetch_openml("titanic", version=1, as_frame=True)
df = titanic.frame

print("Titanic Dataset Preview:")
print(df.head())

# ==============================
# 2. Select Useful Columns
# ==============================
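df = df[["pclass", "sex", "age", "sibsp", "parch", "fare", "embarked", "survived"]]

# Remove rows with missing values. The explicit copy makes df an independent
# DataFrame, which avoids pandas' SettingWithCopyWarning on the column
# assignments below.
df = df.dropna().copy()

# ==============================
# 3. Convert Categorical Data
# ==============================
# Encode the text categories as plain integers
df["sex"] = df["sex"].map({"male": 0, "female": 1}).astype(int)
df["embarked"] = df["embarked"].map({"C": 0, "Q": 1, "S": 2}).astype(int)

# Target column must be integer
df["survived"] = df["survived"].astype(int)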

# ==============================
# 4. Features and Target
# ==============================
X = df.drop("survived", axis=1)
y = df["survived"]

# ==============================
# 5. Train-Test Split
# ==============================
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)

# ==============================
# 6. Train Decision Tree Classifier
# ==============================
model = DecisionTreeClassifier(
    criterion="gini",
    max_depth=4,
    random_state=42
)

model.fit(X_train, y_train)

# ==============================
# 7. Prediction
# ==============================
y_pred = model.predict(X_test)

# ==============================
# 8. Evaluation Metrics
# ==============================
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("\n--- Decision Tree Evaluation ---")
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# ==============================
# 9. Visualize Decision Tree
# ==============================
plt.figure(figsize=(18, 10))

plot_tree(
    model,
    feature_names=list(X.columns),  # a plain list is accepted by all scikit-learn versions
    class_names=["Not Survived", "Survived"],
    filled=True,
    rounded=True
)

plt.title("Decision Tree Classifier - Titanic Dataset")
plt.show()
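
The program trains the tree with criterion="gini". As a rough illustration of what that criterion measures, the sketch below computes Gini impurity for a node and the weighted impurity of a candidate split. The function names gini_impurity and weighted_gini and the toy label lists are hypothetical, chosen only for this example; they are not part of scikit-learn, and the sketch is not how the library implements the criterion internally.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum(p_k ** 2) over class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical toy node: 4 passengers who died (0) and 4 who survived (1)
parent = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical candidate split of that node into two children
left, right = [0, 0, 0, 1], [0, 1, 1, 1]

print("Parent impurity:", gini_impurity(parent))
print("Split impurity:", weighted_gini(left, right))
```

A split is attractive when the weighted impurity of the children is lower than the impurity of the parent, as it is for this toy split. A node containing a single class has impurity 0.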

OUTPUT:

Titanic Dataset Preview:
   pclass survived                                             name     sex      age  ...    cabin  embarked boat   body                        home.dest
0       1        1                    Allen, Miss. Elisabeth Walton  female  29.0000  ...       B5         S    2    NaN                     St Louis, MO
1       1        1                   Allison, Master. Hudson Trevor    male   0.9167  ...  C22 C26         S   11    NaN  Montreal, PQ / Chesterville, ON
2       1        0                     Allison, Miss. Helen Loraine  female   2.0000  ...  C22 C26         S  NaN    NaN  Montreal, PQ / Chesterville, ON
3       1        0             Allison, Mr. Hudson Joshua Creighton    male  30.0000  ...  C22 C26         S  NaN  135.0  Montreal, PQ / Chesterville, ON
4       1        0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)  female  25.0000  ...  C22 C26         S  NaN    NaN  Montreal, PQ / Chesterville, ON

[5 rows x 14 columns]

--- Decision Tree Evaluation ---
Accuracy: 0.7799043062200957
Precision: 0.8958333333333334
Recall: 0.5119047619047619
F1-score: 0.6515151515151515

Classification Report:
              precision    recall  f1-score   support

           0       0.75      0.96      0.84       125
           1       0.90      0.51      0.65        84

    accuracy                           0.78       209
   macro avg       0.82      0.74      0.75       209
weighted avg       0.81      0.78      0.76       209
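
The four scores above can be checked by hand from confusion-matrix counts. The sketch below assumes the counts TP=43, FP=5, FN=41, TN=120; these are not printed by the program but are inferred from the reported output (support of 125 and 84, recall 43/84, precision 43/48), so treat them as a reconstruction rather than program output.

```python
# Confusion-matrix counts for the positive class (1 = survived).
# Assumption: inferred from the reported support, precision, and recall.
tp, fp, fn, tn = 43, 5, 41, 120

accuracy = (tp + tn) / (tp + fp + fn + tn)           # correct predictions / all predictions
precision = tp / (tp + fp)                           # predicted survivors who did survive
recall = tp / (tp + fn)                              # actual survivors the model found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall

print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1-score:  {f1:.4f}")
```

Under these counts the model rarely labels a non-survivor as a survivor (5 false positives) but misses almost half of the actual survivors (41 false negatives), which is why precision is high while recall is close to 0.5.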