7. Develop a program to load the Titanic dataset. Split the data into training and test sets. Train a decision tree classifier. Visualize the tree structure. Evaluate accuracy, precision, recall, and F1-score.
PROGRAM:
# Install required packages:
#   pip install pandas matplotlib scikit-learn
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
# ==============================
# 1. Load Titanic Dataset
# ==============================
titanic = fetch_openml("titanic", version=1, as_frame=True)
df = titanic.frame
print("Titanic Dataset Preview:")
print(df.head())
# ==============================
# 2. Select Useful Columns
# ==============================
df = df[["pclass", "sex", "age", "sibsp", "parch", "fare", "embarked", "survived"]]
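# Remove rows with missing values; copy so that the column assignments
# below modify an independent DataFrame (avoids SettingWithCopyWarning)
df = df.dropna().copy()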
# ==============================
# 3. Convert Categorical Data
# ==============================
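# fetch_openml may return these columns as categoricals, so cast the
# mapped values to plain integers
df["sex"] = df["sex"].map({"male": 0, "female": 1}).astype(int)
df["embarked"] = df["embarked"].map({"C": 0, "Q": 1, "S": 2}).astype(int)
# Target column must be integer
df["survived"] = df["survived"].astype(int)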
# ==============================
# 4. Features and Target
# ==============================
X = df.drop("survived", axis=1)
y = df["survived"]
# ==============================
# 5. Train-Test Split
# ==============================
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)
# ==============================
# 6. Train Decision Tree Classifier
# ==============================
model = DecisionTreeClassifier(
    criterion="gini",
    max_depth=4,
    random_state=42
)
model.fit(X_train, y_train)
# ==============================
# 7. Prediction
# ==============================
y_pred = model.predict(X_test)
# ==============================
# 8. Evaluation Metrics
# ==============================
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print("\n--- Decision Tree Evaluation ---")
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# ==============================
# 9. Visualize Decision Tree
# ==============================
plt.figure(figsize=(18, 10))
plot_tree(
    model,
    # Pass a plain list; some scikit-learn releases reject a pandas Index here
    feature_names=list(X.columns),
    class_names=["Not Survived", "Survived"],
    filled=True,
    rounded=True
)
plt.title("Decision Tree Classifier - Titanic Dataset")
plt.show()
OUTPUT:
Titanic Dataset Preview:
   pclass  survived                                             name     sex      age  ...    cabin  embarked  boat   body                        home.dest
0       1         1                    Allen, Miss. Elisabeth Walton  female  29.0000  ...       B5         S     2    NaN                     St Louis, MO
1       1         1                   Allison, Master. Hudson Trevor    male   0.9167  ...  C22 C26         S    11    NaN  Montreal, PQ / Chesterville, ON
2       1         0                     Allison, Miss. Helen Loraine  female   2.0000  ...  C22 C26         S   NaN    NaN  Montreal, PQ / Chesterville, ON
3       1         0             Allison, Mr. Hudson Joshua Creighton    male  30.0000  ...  C22 C26         S   NaN  135.0  Montreal, PQ / Chesterville, ON
4       1         0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)  female  25.0000  ...  C22 C26         S   NaN    NaN  Montreal, PQ / Chesterville, ON
[5 rows x 14 columns]
--- Decision Tree Evaluation ---
Accuracy: 0.7799043062200957
Precision: 0.8958333333333334
Recall: 0.5119047619047619
F1-score: 0.6515151515151515
Classification Report:
              precision    recall  f1-score   support

           0       0.75      0.96      0.84       125
           1       0.90      0.51      0.65        84

    accuracy                           0.78       209
   macro avg       0.82      0.74      0.75       209
weighted avg       0.81      0.78      0.76       209
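
The scores above all derive from four confusion-matrix counts for the "Survived" class. The program does not print those counts, so the values in this minimal sketch are an assumption, inferred from the reported figures: a recall of 0.5119 on 84 actual survivors implies 43 true positives and 41 false negatives, a precision of 0.8958 implies 48 predicted survivors and therefore 5 false positives, and the remaining 120 of the 125 non-survivors are true negatives. The variable names tp, fp, fn, and tn are illustrative only.

```python
# Confusion-matrix counts for the positive class ("Survived").
# Assumption: inferred from the reported precision, recall, and supports;
# the program above does not print them directly.
tp, fp, fn, tn = 43, 5, 41, 120

total = tp + fp + fn + tn                            # size of the test set
accuracy = (tp + tn) / total                         # correct predictions / all predictions
precision = tp / (tp + fp)                           # predicted survivors who did survive
recall = tp / (tp + fn)                              # actual survivors the model found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall

print(f"Accuracy:  {accuracy:.4f}")   # 0.7799
print(f"Precision: {precision:.4f}")  # 0.8958
print(f"Recall:    {recall:.4f}")     # 0.5119
print(f"F1-score:  {f1:.4f}")         # 0.6515
```

Read this way, the tree rarely mislabels a non-survivor as a survivor (high precision) but misses roughly half of the actual survivors (low recall), which is what pulls the F1-score below the accuracy.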

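plot_tree draws the tree in a graphical window, so the tree structure does not appear in the text output. As a hedged sketch, scikit-learn's export_text can print the learned rules as plain text instead. The six-row table below is made up for illustration (it is not the Titanic data), and the names toy_X, toy_y, and toy_model are hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data (NOT the Titanic dataset): columns are [sex, age],
# with sex encoded as in the program (male = 0, female = 1).
toy_X = [[0, 22], [1, 38], [1, 26], [0, 35], [0, 54], [1, 4]]
toy_y = [0, 1, 1, 0, 0, 1]

# Same hyperparameters as the program's classifier.
toy_model = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=42)
toy_model.fit(toy_X, toy_y)

# export_text returns the decision rules as an indented, printable string.
rules = export_text(toy_model, feature_names=["sex", "age"])
print(rules)
```

For the trained Titanic model, the equivalent call would be export_text(model, feature_names=list(X.columns)).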