Loading, please wait...

BAI701 Program 5

5. Desing and implement a deep learning network for classification of textual documents.

✅ Install Required Packages

pip install tensorflow scikit-learn numpy

PROGRAM:

import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

print("Step 1: Loading dataset ...")
data = fetch_20newsgroups(subset='all')
texts, labels = data.data, data.target
num_classes = len(set(labels))
print(f"Loaded {len(texts)} documents across {num_classes} classes.")

print("\nStep 2: Preprocessing text ...")
MAX_WORDS = 10000   # vocabulary size
MAX_LEN = 300       # max tokens per document

tokenizer = Tokenizer(num_words=MAX_WORDS, oov_token="<OOV>")
tokenizer.fit_on_texts(texts)

X = tokenizer.texts_to_sequences(texts)
X = pad_sequences(X, maxlen=MAX_LEN)
y = to_categorical(labels, num_classes=num_classes)
print("Text converted to padded sequences.")

print("\nStep 3: Splitting into train and test ...")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(f"Training samples: {len(X_train)}, Testing samples: {len(X_test)}")

print("\nStep 4: Building deep learning model ...")
model = Sequential([
    Embedding(input_dim=MAX_WORDS, output_dim=128, input_length=MAX_LEN),
    LSTM(128),
    Dense(num_classes, activation="softmax")
])
model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
print(model.summary())

print("\nStep 5: Training model ...")
history = model.fit(X_train, y_train,
          epochs=5,
          batch_size=64,
          validation_split=0.1,
          verbose=2)

print("\nStep 6: Evaluating model on test set ...")
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Accuracy: {acc:.4f}")

OUTPUT:

Step 1: Loading dataset ...
Loaded 18846 documents across 20 classes.

Step 2: Preprocessing text ...
Text converted to padded sequences.

Step 3: Splitting into train and test ...
Training samples: 15076, Testing samples: 3770

Step 4: Building deep learning model ...

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding (Embedding)       (None, 300, 128)          1280000   
                                                                 
 lstm (LSTM)                 (None, 128)               131584    
                                                                 
 dense (Dense)               (None, 20)                2580      
                                                                 
=================================================================
Total params: 1414164 (5.39 MB)
Trainable params: 1414164 (5.39 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None

Step 5: Training model ...
Epoch 1/5
212/212 - 76s - loss: 2.6723 - accuracy: 0.1621 - val_loss: 2.2644 - val_accuracy: 0.3150 - 76s/epoch - 357ms/step
Epoch 2/5
212/212 - 86s - loss: 1.9426 - accuracy: 0.3839 - val_loss: 1.8901 - val_accuracy: 0.4198 - 86s/epoch - 407ms/step
Epoch 3/5
212/212 - 92s - loss: 1.5213 - accuracy: 0.5242 - val_loss: 1.9722 - val_accuracy: 0.4244 - 92s/epoch - 433ms/step
Epoch 4/5
212/212 - 92s - loss: 1.3967 - accuracy: 0.5716 - val_loss: 1.6435 - val_accuracy: 0.4894 - 92s/epoch - 433ms/step
Epoch 5/5
212/212 - 90s - loss: 1.0232 - accuracy: 0.6859 - val_loss: 1.4114 - val_accuracy: 0.5590 - 90s/epoch - 423ms/step

Step 6: Evaluating model on test set ...
Test Accuracy: 0.5804
guest
0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x