Classifying plant diseases from leaf photographs with a lightweight CNN.
A MobileNetV2 model trained on the PlantVillage dataset to classify 38 categories of healthy and diseased plant leaves across 14 species. The goal: demonstrate that a small, efficient architecture can achieve strong accuracy — well above the 90% threshold — while remaining viable for resource-constrained deployment.
Context
Why this problem. Why this architecture.
A real food security problem
Plant diseases can devastate crop yield for small farmers, sometimes threatening entire operations. Early detection matters — but traditional methods rely on expert inspection that isn't always accessible or affordable.
Image classification as a solution
A model that classifies disease from a leaf photograph could provide an affordable, fast alternative — deployable on a compact device in the field, without specialist expertise on site.
Lightweight architecture
A detection tool is only useful if it can run where it's needed. Resource-heavy models that require server infrastructure are impractical for small farms. MobileNetV2 was chosen specifically for its efficiency under constrained resources.
54,000 images. 38 categories. 6 epochs.
The model was trained on the PlantVillage dataset — over 54,000 images across 38 healthy and diseased categories covering 14 plant species. The dataset was split 70/20/10 between training, validation, and test sets.
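The page doesn't say how the 70/20/10 split was made. Below is a minimal sketch assuming a plain seeded random shuffle over a list of image paths. The function name `split_70_20_10` and the seed are hypothetical. The split is not stratified by class, although a per-class split would also be a reasonable choice with 38 categories.

```python
import random


def split_70_20_10(items, seed=0):
    """Hypothetical helper: shuffle once with a fixed seed, then cut into
    train / validation / test partitions of roughly 70% / 20% / 10%."""
    items = list(items)
    random.Random(seed).shuffle(items)      # fixed seed -> reproducible split
    n = len(items)
    n_train = n * 70 // 100                 # integer maths avoids float rounding at the cut points
    n_val = n * 20 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]          # remainder (~10%) is the held-out test set
    return train, val, test


# Usage with placeholder file names (not the real dataset layout):
paths = [f"img_{i:05d}.jpg" for i in range(1000)]
train, val, test = split_70_20_10(paths)
print(len(train), len(val), len(test))
```

Because the test set is sliced off once and never touched during training or validation, it can serve as the "unseen data" the Results section relies on.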
An early attempt, trained on a smaller apple-leaf-only subset (~5,000 images), reached a test accuracy of only about 40%. Switching to the full PlantVillage dataset made the difference: the model passed 90% validation accuracy by epoch 2. Training was deliberately kept short (6 epochs) so the model would not overfit to the similarities between images within the dataset.
Data augmentation — random flips, rotations, zoom, contrast adjustments — was applied to reduce the model's tendency to memorise the training images rather than learn meaningful disease features.
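In Keras, augmentations like these are commonly built from preprocessing layers such as `RandomFlip`, `RandomRotation`, `RandomZoom` and `RandomContrast`. The page doesn't say which implementation was used. To show what two of these operations do to pixel values, the sketch below reimplements a random horizontal flip and a random contrast adjustment in plain Python on a single-channel image stored as nested lists. Rotation, zoom and value clipping are omitted, and `max_delta=0.2` is an assumed setting.

```python
import random


def random_horizontal_flip(img, rng, p=0.5):
    """Mirror each row with probability p; otherwise return an unchanged copy."""
    if rng.random() < p:
        return [row[::-1] for row in img]
    return [row[:] for row in img]


def adjust_contrast(img, factor):
    """Scale every pixel's distance from the image mean by `factor`
    (factor > 1 increases contrast, factor < 1 reduces it). No clipping."""
    pixels = [v for row in img for v in row]
    mean = sum(pixels) / len(pixels)
    return [[(v - mean) * factor + mean for v in row] for row in img]


def augment(img, rng, max_delta=0.2):
    """Random flip, then a random contrast factor drawn from [1 - d, 1 + d]."""
    img = random_horizontal_flip(img, rng)
    factor = rng.uniform(1.0 - max_delta, 1.0 + max_delta)
    return adjust_contrast(img, factor)


rng = random.Random(42)
leaf = [[0, 10], [20, 30]]      # toy 2x2 grayscale "image"
print(augment(leaf, rng))
```

Each epoch sees a slightly different version of every training image, so the model can't simply memorise exact pixel patterns.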
[Figure: training vs validation accuracy]
MobileNetV2 — efficient by design
MobileNetV2 uses depthwise separable convolutions — splitting a standard convolution into a depthwise step (one filter per input channel) and a pointwise step (1×1 convolution to combine channels). This dramatically reduces the number of operations required while still capturing the feature relationships that drive classification accuracy.
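To make the depthwise/pointwise split concrete, here is a plain-Python sketch (valid padding, stride 1, images as H × W × C nested lists). It also checks that a depthwise step followed by a pointwise step gives exactly the same output as a standard convolution whose kernel is the product `dw[a][b][c] * pw[c][o]`. That is the trade-off: a separable layer can only express these factorised kernels, and in return it needs far fewer operations. The sketch omits the batch normalisation, activations and 1×1 expansion layers that MobileNetV2's inverted residual blocks wrap around the depthwise convolution. All function names are illustrative.

```python
def depthwise_conv(x, dw):
    """Depthwise step: x is H x W x C, dw is K x K x C (one K x K filter per
    channel). Channel c is filtered only by its own kernel."""
    h, w, c_in, k = len(x), len(x[0]), len(x[0][0]), len(dw)
    return [[[sum(x[i + a][j + b][c] * dw[a][b][c] for a in range(k) for b in range(k))
              for c in range(c_in)]
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]


def pointwise_conv(x, pw):
    """Pointwise step: a 1x1 convolution; pw is C_in x C_out and mixes
    channels independently at every pixel."""
    c_out = len(pw[0])
    return [[[sum(px[c] * pw[c][o] for c in range(len(px))) for o in range(c_out)]
             for px in row]
            for row in x]


def separable_conv(x, dw, pw):
    return pointwise_conv(depthwise_conv(x, dw), pw)


def standard_conv(x, kernel):
    """Standard convolution: kernel is K x K x C_in x C_out; every output
    channel sums over all input channels at once."""
    h, w, c_in, k = len(x), len(x[0]), len(x[0][0]), len(kernel)
    c_out = len(kernel[0][0][0])
    return [[[sum(x[i + a][j + b][c] * kernel[a][b][c][o]
                  for a in range(k) for b in range(k) for c in range(c_in))
              for o in range(c_out)]
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]


def factorised_kernel(dw, pw):
    """The standard kernel that a depthwise + pointwise pair is equivalent to."""
    k, c_in, c_out = len(dw), len(pw), len(pw[0])
    return [[[[dw[a][b][c] * pw[c][o] for o in range(c_out)] for c in range(c_in)]
             for b in range(k)]
            for a in range(k)]


x = [[[i + 2 * j, i * j - 1] for j in range(4)] for i in range(4)]  # 4x4 image, 2 channels
dw = [[[a + b, a - b] for b in range(3)] for a in range(3)]          # one 3x3 filter per channel
pw = [[1, 2, -1], [0, 3, 1]]                                         # mixes 2 -> 3 channels
print(separable_conv(x, dw, pw) == standard_conv(x, factorised_kernel(dw, pw)))
```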
The practical result is a model small enough to run on a compact device, fast enough for near-real-time inference, and accurate enough to be genuinely useful: ~96% accuracy on the held-out test set across all 38 healthy and diseased categories.
Why depthwise separable convolution
Standard convolution
Applies each filter across every channel simultaneously — high computational cost
Depthwise step
One filter per input channel — captures spatial features efficiently
Pointwise step
1×1 convolution combines channels — captures cross-channel relationships
Net result
Significantly fewer operations, similar feature extraction, deployable on-device
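The "significantly fewer operations" claim can be checked with simple arithmetic. A standard K × K convolution producing an H × W × C_out output costs H·W·K²·C_in·C_out multiply-accumulates (MACs). The separable version costs H·W·K²·C_in for the depthwise step plus H·W·C_in·C_out for the pointwise step, so the ratio is 1/C_out + 1/K². The layer shape below is illustrative only and is not taken from this model.

```python
def standard_conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates for a standard k x k convolution (stride 1, same padding)."""
    return h * w * k * k * c_in * c_out


def separable_conv_macs(h, w, k, c_in, c_out):
    """Depthwise (k x k per channel) plus pointwise (1x1 across channels)."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


# Illustrative layer: 112x112 feature map, 3x3 kernel, 32 -> 64 channels.
std = standard_conv_macs(112, 112, 3, 32, 64)
sep = separable_conv_macs(112, 112, 3, 32, 64)
print(std, sep, round(sep / std, 4))   # ratio equals 1/64 + 1/9
```

With a 3 × 3 kernel the saving approaches 9× as the channel count grows, because the 1/K² term dominates.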
Results
Evaluated on unseen data. All targets exceeded.
Success criteria: 90%+ accuracy, 0.85+ F1-score, 85%+ confusion matrix score. All three were exceeded on the held-out test set.
Test accuracy
~96%
Correctly classified on data the model had never seen during training or validation
Specificity
~96%
Correctly identified healthy samples — important for not falsely flagging non-diseased plants
Balanced accuracy
~94%
Averages recall over every class, so large classes can't inflate the score; a more robust measure than raw accuracy across 38 categories of different sizes
Disease classes
38
Healthy and diseased categories across 14 plant species — tomato, potato, apple, corn, grape, and more
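The page doesn't say how specificity and F1 were averaged over 38 classes. The sketch below assumes macro averaging: each class is scored one-vs-rest from the confusion matrix, and the per-class values are then averaged without weighting. Balanced accuracy is the mean per-class recall. The 3-class matrix is made up for illustration, with rows as true labels and columns as predictions.

```python
def per_class_counts(cm):
    """Yield (tp, fp, fn, tn) for each class of a square confusion matrix
    (rows = true label, columns = predicted label), one-vs-rest."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # truly k, predicted as another class
        fp = sum(cm[r][k] for r in range(n)) - tp   # predicted k, truly another class
        tn = total - tp - fn - fp
        yield tp, fp, fn, tn


def safe_div(num, den):
    return num / den if den else 0.0


def summarise(cm):
    stats = list(per_class_counts(cm))
    n = len(stats)
    total = sum(sum(row) for row in cm)
    recall = [safe_div(tp, tp + fn) for tp, fp, fn, tn in stats]
    specificity = [safe_div(tn, tn + fp) for tp, fp, fn, tn in stats]
    f1 = [safe_div(2 * tp, 2 * tp + fp + fn) for tp, fp, fn, tn in stats]
    return {
        "accuracy": sum(cm[k][k] for k in range(n)) / total,
        "balanced_accuracy": sum(recall) / n,      # mean per-class recall
        "macro_specificity": sum(specificity) / n,
        "macro_f1": sum(f1) / n,
    }


print(summarise([[8, 2, 0], [1, 4, 0], [0, 0, 5]]))
```

In this toy matrix, plain accuracy (0.85) is lower than balanced accuracy (about 0.87), because the smallest classes happen to be predicted well. With imbalanced classes the two numbers can diverge in either direction, which is why both are reported.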
Pipeline
From raw images to a working classifier.
Dataset
PlantVillage: 54,000+ images across 38 classes and 14 plant species. Split 70% training / 20% validation / 10% test.
Preprocessing
Images resized to 224×224 (MobileNetV2's expected input). Pixel values normalised. Augmentation applied after early epochs.
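The exact normalisation isn't stated. One common choice for MobileNetV2 is the scaling used by Keras's `mobilenet_v2.preprocess_input`, which maps pixel values from [0, 255] to [-1, 1] via x / 127.5 − 1. The sketch below assumes that scaling and uses nearest-neighbour resizing for simplicity. Real pipelines usually resize with bilinear interpolation. Both function names are hypothetical.

```python
def resize_nearest(img, out_h=224, out_w=224):
    """Nearest-neighbour resize of an H x W grid of pixels (any pixel type)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def scale_to_unit_range(img):
    """Map RGB values in [0, 255] to [-1, 1] (x / 127.5 - 1)."""
    return [[[v / 127.5 - 1.0 for v in px] for px in row] for row in img]


tiny = [[[0, 0, 0], [255, 255, 255]],
        [[255, 0, 0], [0, 255, 0]]]          # 2x2 RGB toy image
model_input = scale_to_unit_range(resize_nearest(tiny))
print(len(model_input), len(model_input[0]), model_input[0][0], model_input[0][223])
```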
Architecture
MobileNetV2 — a lightweight CNN using depthwise separable convolutions to reduce computation while preserving feature extraction.
Training
6 epochs total. 90%+ accuracy reached by epoch 2. A ReduceLROnPlateau callback lowered the learning rate whenever validation loss stopped improving.
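Keras provides this behaviour as `tf.keras.callbacks.ReduceLROnPlateau`. The settings used here aren't given, so the plain-Python sketch below only illustrates the rule under assumed parameters (`factor=0.5`, `patience=1`). It drops Keras's `cooldown` option. The class name is hypothetical.

```python
class PlateauLRScheduler:
    """Simplified reduce-on-plateau rule: if validation loss hasn't improved by
    at least `min_delta` for `patience` epochs, multiply the LR by `factor`."""

    def __init__(self, lr, factor=0.5, patience=1, min_lr=1e-6, min_delta=1e-4):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0                      # epochs since the last improvement

    def step(self, val_loss):
        """Call once per epoch; returns the learning rate for the next epoch."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


sched = PlateauLRScheduler(1e-3)
for epoch, loss in enumerate([1.0, 0.8, 0.85, 0.7, 0.72, 0.71], start=1):
    print(epoch, loss, sched.step(loss))   # made-up validation losses
```

Halving the step size once progress stalls lets training settle into a minimum instead of oscillating around it, which matters most when the run is only a few epochs long.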
Evaluation
Tested on held-out data: ~96% accuracy, ~96% specificity, ~94% balanced accuracy. Confusion matrix showed minimal cross-class confusion.
Interested in machine learning for practical classification problems?
Let's talk — whether it's image classification, structured data, or building the data pipeline behind a model.