AI / Machine learning · University project · ICT206 · Oct 2024

Classifying plant diseases from leaf photographs with a lightweight CNN.

A MobileNetV2 model trained on the PlantVillage dataset to classify 38 categories of healthy and diseased plant leaves across 14 species. The goal: show that a small, efficient architecture can comfortably exceed a 90% accuracy target while remaining viable for resource-constrained deployment.

Context

Why this problem. Why this architecture.

A real food security problem

Plant diseases can devastate crop yield for small farmers, sometimes threatening entire operations. Early detection matters — but traditional methods rely on expert inspection that isn't always accessible or affordable.

Image classification as a solution

A model that classifies disease from a leaf photograph could provide an affordable, fast alternative — deployable on a compact device in the field, without specialist expertise on site.

Lightweight architecture

A detection tool is only useful if it can run where it's needed. Resource-heavy models that require server infrastructure are impractical for small farms. MobileNetV2 was chosen specifically for its efficiency under constrained resources.

54,000 images. 38 categories. 6 epochs.

The model was trained on the PlantVillage dataset — over 54,000 images across 38 healthy and diseased categories covering 14 plant species. The dataset was split 70/20/10 between training, validation, and test sets.
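A 70/20/10 split like this can be sketched in plain Python. This is a minimal sketch, assuming the images are available as a list of file paths and that a seeded random shuffle is acceptable. The function name `split_dataset`, the seed and the file names are hypothetical, and the project may have split the data differently.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    items: Sequence[str],
    train_frac: float = 0.7,
    val_frac: float = 0.2,
    seed: int = 42,
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle items and split them into train / validation / test lists.

    Whatever is left after the train and validation shares becomes the
    test set (10% with the default fractions).
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the PlantVillage images.
paths = [f"leaf_{i:05d}.jpg" for i in range(1000)]
train, val, test = split_dataset(paths)
print(len(train), len(val), len(test))  # 700 200 100
```

A purely random split does not guarantee that each of the 38 classes appears in the same proportion in every set. Passing `stratify=` to scikit-learn's `train_test_split` is one way to enforce that.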

An early attempt trained on a smaller apple-leaf-only subset (~5,000 images) produced test accuracy of around 40%. Switching to the full PlantVillage dataset made the difference: the model reached 90%+ validation accuracy by epoch 2. Training was deliberately kept short (6 epochs) so the model would not overfit to the similarities between images in the dataset.

Data augmentation — random flips, rotations, zoom, contrast adjustments — was applied to reduce the model's tendency to memorise the training images rather than learn meaningful disease features.
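In a Keras project these augmentations would typically be preprocessing layers such as `RandomFlip`, `RandomRotation`, `RandomZoom` and `RandomContrast`. The operations are simple enough to sketch in NumPy. This version is illustrative only and assumes square images scaled to [0, 1]. The function name `augment` and the parameter ranges (keeping 80 to 100% of each side when zooming, a contrast factor of 0.8 to 1.2) are hypothetical. Rotation is also simplified to multiples of 90 degrees.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly flip, rotate, zoom and adjust contrast of a square H x W x C image.

    Expects float pixels in [0, 1]; returns an array of the same shape.
    """
    out = image

    # Random horizontal and vertical flips.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    if rng.random() < 0.5:
        out = out[::-1, :, :]

    # Random rotation by a multiple of 90 degrees (keeps the shape of square images).
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(0, 1))

    # Random zoom-in: crop a central region, then resize back (nearest neighbour).
    h, w = out.shape[:2]
    scale = rng.uniform(0.8, 1.0)
    ch, cw = max(1, int(h * scale)), max(1, int(w * scale))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = out[top:top + ch, left:left + cw, :]
    rows = np.arange(h) * ch // h  # map each output row to a crop row
    cols = np.arange(w) * cw // w  # map each output column to a crop column
    out = crop[rows][:, cols]

    # Random contrast: scale each channel's deviation from its mean.
    factor = rng.uniform(0.8, 1.2)
    mean = out.mean(axis=(0, 1), keepdims=True)
    out = (out - mean) * factor + mean
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
aug = augment(img, rng)
print(aug.shape)  # (224, 224, 3)
```

Drawing fresh random parameters for every image on every epoch means the model rarely sees exactly the same pixels twice, which is what discourages memorisation.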

Training vs validation accuracy

Training and validation accuracy tracked closely, suggesting the model generalised rather than memorised the training images.

MobileNetV2 — efficient by design

MobileNetV2 uses depthwise separable convolutions — splitting a standard convolution into a depthwise step (one filter per input channel) and a pointwise step (1×1 convolution to combine channels). This dramatically reduces the number of operations required while still capturing the feature relationships that drive classification accuracy.

The practical result: a model small enough to run on a compact device, fast enough for near-real-time inference, and accurate enough to be genuinely useful, reaching ~96% on the held-out test set across all 38 categories.
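The saving from depthwise separable convolution can be made concrete by counting multiply-adds for a single layer. This is a minimal sketch that assumes stride 1, "same" padding and no biases. The layer sizes in the usage lines are hypothetical and are not taken from the project's model.

```python
def standard_conv_cost(k: int, m: int, n: int, f: int) -> int:
    """Multiply-adds for a k x k standard convolution with M input channels,
    N output channels and an F x F output feature map."""
    return k * k * m * n * f * f


def separable_conv_cost(k: int, m: int, n: int, f: int) -> int:
    """Multiply-adds for a depthwise (k x k, one filter per channel) step
    followed by a pointwise (1 x 1) step."""
    depthwise = k * k * m * f * f  # spatial filtering, channel by channel
    pointwise = m * n * f * f      # mixing M channels into N
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 32 -> 64 channels, 112 x 112 feature map.
std = standard_conv_cost(3, 32, 64, 112)
sep = separable_conv_cost(3, 32, 64, 112)
print(std, sep, round(std / sep, 2))  # 231211008 29302784 7.89
```

The separable cost works out to a fraction 1/N + 1/k² of the standard cost. With 3×3 kernels the saving therefore approaches 9 times as the number of output channels N grows.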

Why depthwise separable convolution

Standard convolution

Applies each filter across every channel simultaneously — high computational cost

Depthwise step

One filter per input channel — captures spatial features efficiently

Pointwise step

1×1 convolution combines channels — captures cross-channel relationships

Net result

Significantly fewer operations, similar feature extraction, deployable on-device
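The depthwise and pointwise steps can be sketched directly in NumPy. This is a deliberately slow, loop-based illustration that assumes "valid" padding, stride 1 and no bias. It omits the batch normalisation, activations and strides a real MobileNetV2 layer uses. The function names are hypothetical.

```python
import numpy as np


def depthwise_conv(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Depthwise step: one k x k filter per input channel.

    x: (H, W, M) input; kernels: (k, k, M). 'Valid' padding, stride 1, so the
    output is (H - k + 1, W - k + 1, M). Like deep-learning frameworks, this
    computes cross-correlation (the kernel is not flipped).
    """
    k = kernels.shape[0]
    h_out, w_out = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((h_out, w_out, x.shape[2]))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i:i + k, j:j + k, :]                     # (k, k, M)
            out[i, j, :] = (patch * kernels).sum(axis=(0, 1))  # one sum per channel
    return out


def pointwise_conv(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Pointwise step: a 1 x 1 convolution, i.e. a matrix multiply over channels.

    x: (H, W, M); weights: (M, N) -> output (H, W, N).
    """
    return x @ weights


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))   # small 8 x 8 input with 3 channels
dw = rng.standard_normal((3, 3, 3))  # one 3 x 3 filter per channel
pw = rng.standard_normal((3, 16))    # combine 3 channels into 16
y = pointwise_conv(depthwise_conv(x, dw), pw)
print(y.shape)  # (6, 6, 16)
```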

Results

Evaluated on unseen data. All targets exceeded.

Success criteria: 90%+ accuracy, 0.85+ F1-score, 85%+ confusion matrix score. All three were exceeded on the held-out test set.

Test accuracy: ~96%

Correctly classified test images, none of which the model saw during training or validation

Specificity: ~96%

Correctly identified healthy samples; important for not falsely flagging non-diseased plants

Balanced accuracy: ~94%

The mean of per-class recall, so it accounts for differences in class size; a more robust measure than raw accuracy across 38 categories

Classes: 38

Healthy and diseased categories across 14 plant species: tomato, potato, apple, corn, grape, and more
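Balanced accuracy can be read straight off a confusion matrix as the mean of per-class recall. This is why it penalises a model that does well on large classes but poorly on small ones. The sketch below assumes a NumPy confusion matrix with true classes as rows and uses a made-up three-class example, not the project's results. The function name is hypothetical. scikit-learn's `balanced_accuracy_score` and `f1_score(average="macro")` compute the same quantities from label arrays.

```python
import numpy as np


def metrics_from_confusion(cm: np.ndarray) -> dict:
    """Accuracy, balanced accuracy and macro F1 from a square confusion matrix.

    cm[i, j] counts samples whose true class is i and predicted class is j.
    """
    cm = cm.astype(float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)    # true samples per class
    predicted = cm.sum(axis=0)  # predictions per class

    # Guard against empty classes by leaving those entries at zero.
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "balanced_accuracy": float(recall.mean()),  # mean of per-class recall
        "macro_f1": float(f1.mean()),
    }


# Made-up imbalanced example: class 0 has 100 samples, classes 1 and 2 have 10 each.
cm = np.array([
    [90, 5, 5],
    [2, 8, 0],
    [1, 1, 8],
])
m = metrics_from_confusion(cm)
print(round(m["accuracy"], 3), round(m["balanced_accuracy"], 3))  # 0.883 0.833
```

Here raw accuracy is pulled up by the large, well-classified class 0. Balanced accuracy gives each class equal weight and comes out lower.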

Pipeline

From raw images to a working classifier.

Dataset

PlantVillage: 54,000+ images across 38 classes and 14 plant species. Split 70% training / 20% validation / 10% test.

Preprocessing

Images resized to 224×224 (MobileNetV2's expected input). Pixel values normalised. Augmentation applied after early epochs.
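The exact normalisation is not specified. One common choice for MobileNetV2 is the scaling performed by Keras' `tf.keras.applications.mobilenet_v2.preprocess_input`, which maps pixels from [0, 255] to [-1, 1]. The NumPy sketch below reproduces that scaling. It assumes images have already been resized to 224×224, for example by a loader such as `tf.keras.utils.image_dataset_from_directory` with `image_size=(224, 224)`. The function name `preprocess` is hypothetical.

```python
import numpy as np


def preprocess(image_uint8: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from [0, 255] to [-1, 1].

    Assumes the image is already resized to 224 x 224 x 3 (MobileNetV2's input).
    """
    if image_uint8.shape != (224, 224, 3):
        raise ValueError(f"expected (224, 224, 3), got {image_uint8.shape}")
    return image_uint8.astype(np.float32) / 127.5 - 1.0


white = np.full((224, 224, 3), 255, dtype=np.uint8)
print(preprocess(white).max())  # 1.0
```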

Architecture

MobileNetV2 — a lightweight CNN using depthwise separable convolutions to reduce computation while preserving feature extraction.

Training

6 epochs total. 90%+ validation accuracy reached by epoch 2. ReduceLROnPlateau used to lower the learning rate when validation loss stalled.
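The behaviour of ReduceLROnPlateau can be sketched in plain Python. This illustrates the logic only and assumes it monitors validation loss ("min" mode) with no cooldown. The class name `PlateauScheduler` and the factor, patience and learning-rate values are hypothetical, since the project's actual settings are not stated. In Keras itself, the equivalent is passing `tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", ...)` in the `callbacks` list of `model.fit`.

```python
class PlateauScheduler:
    """Reduce the learning rate when validation loss stops improving.

    A plain-Python sketch of the idea behind Keras' ReduceLROnPlateau
    ('min' mode only, no cooldown).
    """

    def __init__(self, lr: float, factor: float = 0.5, patience: int = 1,
                 min_delta: float = 1e-4, min_lr: float = 1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_delta = min_delta
        self.min_lr = min_lr
        self.best = float("inf")
        self.wait = 0  # epochs since the last improvement

    def step(self, val_loss: float) -> float:
        """Record one epoch's validation loss and return the learning rate to use next."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss  # improvement: reset the counter
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0  # start counting again after a reduction
        return self.lr


sched = PlateauScheduler(lr=1e-3, factor=0.5, patience=1)
losses = [0.80, 0.40, 0.35, 0.36, 0.35, 0.30]
lrs = [sched.step(loss) for loss in losses]
print(lrs)
```

The learning rate stays at its starting value while the loss keeps falling. It is halved on each of the two epochs where the loss fails to improve, then held once improvement resumes.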

Evaluation

Tested on held-out data: ~96% accuracy, ~96% specificity, ~94% balanced accuracy. Confusion matrix showed minimal cross-class confusion.

Technologies used

Python

TensorFlow

Keras

MobileNetV2

NumPy

Matplotlib

scikit-learn

Interested in machine learning for practical classification problems?

Let's talk — whether it's image classification, structured data, or building the data pipeline behind a model.
