EcoSort

An AI-assisted waste classification project that explores how simple images of everyday trash can support better recycling, less contamination, and more sustainable cities.

CNN & Image Classification
TrashNet Dataset
Recycling & Sustainability
Scroll to begin

Understanding everyday waste decisions

What is EcoSort about?

EcoSort focuses on the simple moment when a person stands in front of a bin and tries to decide where an item should go. This everyday choice determines whether materials stay in circulation or end up as permanent waste. Many people rely on quick visual judgment, which often leads to items being placed in the wrong bin. These small errors matter because a single misplaced object can contaminate an entire batch of materials. EcoSort is concerned with these real-world decisions. By looking closely at images of common objects, the project highlights how people interpret waste based on visual cues alone. This perspective helps reveal how ordinary choices accumulate into larger patterns that affect communities and the environment.

Waste Segregation

Background: how do waste systems work?

Waste systems begin at a bin but extend through long chains of trucks, workers, and sorting facilities that handle thousands of items each day. Once something is thrown away, it enters a process designed to separate what can be reused from what must be discarded. These systems rely heavily on items being sorted correctly at the start because early mistakes are difficult to correct later. When contaminated or incorrect items enter recycling streams, entire loads can lose value and be rejected. Rejected loads often end up in landfills even when many items inside were recyclable. This outcome increases waste, energy use, and environmental impact. Understanding this background shows why small decisions at the bin can have large consequences downstream.

Waste Management System

Who is affected by mis-sorted waste?

Mis-sorted waste affects a wide range of people, even though most never see the consequences directly. Residents may believe they recycled correctly, only to learn that contamination caused an entire load to be discarded. Workers at sorting facilities face additional effort and safety risks when they must remove items that do not belong. Cities and campuses often pay higher fees when loads are rejected or require special handling. These costs ultimately fall on communities that rely on predictable and affordable waste services. Over time, mis-sorting reduces the effectiveness of recycling programs and public trust in the system. The combined impact touches households, workers, local budgets, and the environment.

Mis-sorted Waste

Why does better sorting matter?

Proper sorting helps keep valuable materials in circulation rather than sending them to landfills. When recyclables remain clean and separated, they can be turned into new products with less energy and fewer raw materials. This reduces pressure on natural resources such as metals, timber, and fossil fuels. Better sorting also lowers emissions, saves energy, and improves the overall efficiency of recycling systems. Communities and campuses that sort correctly are more likely to meet sustainability goals and reduce waste costs. Clean, well-organized waste streams also lead to clearer public spaces and more reliable waste services. These benefits show why small choices at the bin carry long-term importance.

Benefits of Good Sorting

Where do images fit into this story?

Images play an important role in understanding how people sort waste because visual appearance shapes most sorting decisions. The shape, texture, and condition of an item influence whether a person believes it belongs in recycling, compost, or landfill. These details matter because people often rely on quick visual cues rather than reading labels or instructions. By examining many photos of common waste items, EcoSort reveals patterns in how people interpret these objects. Images make it easier to compare what items look like and how easily they might be mistaken for something else. This visual perspective provides a simple entry point into understanding everyday sustainability challenges. EcoSort uses these images to explore how clearer cues could eventually support better sorting habits.

Mixed Waste Items

Data and preparation

This section describes the dataset behind EcoSort, how the images were gathered and cleaned, and how the final collection was organized for analysis.

What dataset does EcoSort use?

EcoSort is built on a public image collection known as the TrashNet dataset. TrashNet contains photographs of everyday waste items placed on simple backgrounds and grouped into six categories: cardboard, glass, metal, paper, plastic, and mixed trash. Each image shows a single object, such as a bottle, can, or piece of cardboard, which makes it easier to see the visual cues people might rely on when sorting. Because the backgrounds are clean and consistent, the images invite the reader to focus on the material itself rather than clutter in the scene. This makes TrashNet a practical starting point for exploring how common materials actually look at the moment someone decides where they belong.

Example images from each TrashNet class

What does the raw dataset look like?

The raw TrashNet dataset is organized into six folders, one for each material type. Before any cleaning or splitting, a quick look at the dataset helps understand what the model will see.

  • Total images: 2,527 files across all classes.
  • Number of classes: 6 (cardboard, glass, metal, paper, plastic, trash).
  • Raw folder structure: one folder per class, each containing various lighting, shape, and condition variations.
  • Typical image resolution: varies widely (e.g., 384×512 or 512×384) before resizing.
  • File formats: mix of JPG and PNG images.

Note: The full raw TrashNet dataset used for training is stored locally and excluded from the repository because of its size and Git ignore settings. The linked folder shows a small sample of the original folder structure; all analysis in EcoSort was run on the complete dataset.
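As a sketch of this quick inspection, the per-class counts can be gathered directly from the folder structure. The `data/raw` path and the `count_images` helper name below are illustrative, not part of the repository:

```python
from pathlib import Path

def count_images(root):
    """Count image files in each class sub-folder of a TrashNet-style layout."""
    root = Path(root)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        files = [f for f in class_dir.iterdir()
                 if f.suffix.lower() in {".jpg", ".jpeg", ".png"}]
        counts[class_dir.name] = len(files)
    return counts

# e.g. count_images("data/raw")
# -> {"cardboard": ..., "glass": ..., "metal": ..., "paper": ..., "plastic": ..., "trash": ...}
```

Summing the six values should reproduce the 2,527-image total reported above.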

How are the classes distributed?

Bar chart showing the number of images in each TrashNet category

The bar chart shows how many images are available for each category in the dataset. Some classes, such as paper and plastic, have noticeably more samples, while others like trash and metal are smaller and more varied. This imbalance matters because it shapes what the model sees most often during training: large classes provide strong, consistent patterns, while smaller classes demand more careful augmentation and evaluation to avoid the model defaulting to the majority categories.

Keeping this distribution in mind is important when interpreting EcoSort’s performance. High overall accuracy can hide weaker performance on underrepresented classes, so later sections always report results per class, not just a single aggregate score.

How were the images cleaned and prepared?

Before EcoSort can draw any conclusions, the images need to be cleaned and brought into a consistent format.

  • All images are loaded from their class folders and checked to ensure they contain valid pixel data.
  • Files are resized to a consistent resolution so each sample has the same dimensions during training.
  • Images are converted to RGB format to avoid inconsistencies from grayscale or multi-channel inputs.
  • Orientation and clarity are verified so that the main object remains visible and centered.
  • Obvious duplicates or corrupted images are filtered out to prevent noisy samples from entering the dataset.
  • Labels are standardized using folder names so each image can be traced cleanly back to its category.
  • The cleaned images are then saved into structured folders (train/val/test) for consistent processing throughout the pipeline.
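The core of the cleaning steps above can be sketched with Pillow. This is a minimal illustration of the validate/convert/resize stage; duplicate detection and the train/val/test saving step are omitted, and `clean_image` is an illustrative name:

```python
from PIL import Image

TARGET_SIZE = (224, 224)

def clean_image(path):
    """Load one image, validate it, and return a uniform 224x224 RGB copy.

    Returns None for unreadable or corrupted files so callers can filter them out.
    """
    try:
        with Image.open(path) as img:
            img.verify()                          # cheap integrity check on the file
        img = Image.open(path).convert("RGB")     # reopen: verify() invalidates the handle
        return img.resize(TARGET_SIZE)
    except Exception:
        return None                               # corrupted or non-image file
```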

Example: before and after cleaning

Before cleaning Example raw TrashNet image before cleaning
After cleaning Same image after resizing and cleaning

Original resolution

512 × 384 pixels (varies by sample)

Processed resolution

224 × 224 pixels (uniform)

These side-by-side examples illustrate how the same item looks before and after basic preparation.

The processed images look very similar to the raw images because the preprocessing step is designed to standardize the data rather than alter it. EcoSort does not apply any heavy transformations at this stage. Instead, images are cleaned, resized, and converted to a consistent RGB format while preserving their original appearance. This ensures that each sample remains visually true to the real object while still being uniform enough for the model to learn from effectively.

How was the dataset organized for exploration?

After cleaning, the dataset is divided into three groups that each serve a specific purpose in the exploration and evaluation process. This structure ensures that the model learns from one portion of the data, is checked on another, and is finally measured on completely unseen examples.

  • Training set (80%): the largest split, used to teach the model what typical examples of each category look like.
  • Validation set (10%): a checkpoint split that verifies whether the learned patterns still hold on new images the model did not train on.
  • Test set (10%): kept completely separate until final evaluation to measure true generalization on unseen samples.

Splitting the data this way mirrors a natural learning process: study a subset, periodically check your understanding, and then test that knowledge in a new setting. This approach reduces over-fitting, keeps the evaluation fair, and ensures the model is judged on its ability to generalize rather than memorize.
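As a minimal sketch, the 80/10/10 split can be expressed as follows. In practice the split is usually applied per class so every split contains all six categories; `split_dataset` is an illustrative helper, not EcoSort's actual function name:

```python
import random

def split_dataset(paths, seed=42, train=0.8, val=0.1):
    """Shuffle once, then carve a train/val/test split (default 80/10/10)."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)   # a fixed seed keeps the split reproducible
    n = len(paths)
    n_train = int(n * train)
    n_val = int(n * val)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])
```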

How does the data look after cleaning?

After preparing the dataset, each category was rechecked for image quality, duplicates, and valid file types. The cleaned dataset is slightly smaller than the raw version because blurred, unusable, or mislabeled images were removed. The bar chart below shows the new distribution across all six categories.

After-cleaning class distribution bar chart

The processed dataset preserves the same six-category structure as the original, but all images are now standardized and cleaned. Each file has been resized, converted to RGB, checked for readability, and verified to ensure the main object is clearly visible. This results in a more uniform and reliable set of samples for model training.

While the overall balance across categories remains similar, certain classes became slightly smaller after removing corrupted or low-quality files. These adjustments help reduce noise in the training data and improve the model’s ability to learn consistent visual patterns.

  • Total processed images: 2,019 files after removing duplicates, corrupted files, and unusable samples.
  • Number of classes: 6 (cardboard, glass, metal, paper, plastic, trash).
  • Processed folder structure: organized into train, val, and test splits, each containing all six categories.
  • Final image resolution: all images resized to a uniform 224 × 224 RGB format.
  • File formats: cleaned and saved as consistent RGB JPG images.

Note: Only a small sample of the processed dataset is shown here. The full processed dataset used for training is stored locally and excluded from the repository due to its size and Git ignore settings.

Data and code access

To keep EcoSort reproducible and transparent, the entire project (including the sample raw data, sample processed data, preprocessing scripts, splitting code, training pipeline, and visualizations) is available in the GitHub repository. Only small samples of the dataset are included in the repo. The full raw and processed datasets used for training are stored locally due to their size and Git ignore settings.

Teaching a model to “see” materials

EcoSort uses convolutional neural networks (CNNs) to recognize materials from images. CNNs are a natural choice for this task because they are designed to look for local patterns in pixels (edges, textures, and shapes) that are strong indicators of whether an object is made of cardboard, glass, metal, paper, plastic, or mixed trash. Rather than manually designing features, the network learns filters that highlight what matters most in each class.

Two related architectures were explored. A custom EcoSortCNN was used as a baseline, built from stacked convolution, batch normalization, and pooling layers followed by a small fully connected classifier. The final model is a ResNet-18 based network, which starts from a backbone pretrained on ImageNet and then fine-tunes it for EcoSort’s six categories. Using a pretrained backbone allows the model to reuse rich, general visual features (such as corners, textures, and object parts) and adapt them to the more specific task of distinguishing common waste items.

What is a Convolutional Neural Network (CNN)?

A Convolutional Neural Network (CNN) is a type of deep learning model that is especially good at understanding images. Instead of looking at the entire image at once, a CNN scans it using small filters that move across pixels. These filters learn to detect important visual patterns such as edges, corners, curves, and textures. As the image flows through deeper layers, the model shifts from detecting simple shapes to recognizing more complex structures like bottle rims, shiny reflections, cardboard surfaces, and so on.

  • Convolution layers: learn small patterns in small regions of the image.
  • ReLU activation: keeps only the useful parts of the signal.
  • Pooling layers: shrink the image while keeping the important regions.
  • Fully connected layer: uses all extracted features to make a final prediction.
General CNN diagram showing convolution, activation, pooling, flattening, and fully-connected layers.

Illustration: A typical CNN pipeline for image classification.

What is the Custom EcoSortCNN?

The Custom EcoSortCNN is a lightweight convolutional neural network built specifically for this project. It serves as a baseline model, showing how well a straightforward CNN can perform before introducing more advanced techniques like transfer learning. Its design follows the classic CNN structure: early layers extract patterns from the image, and the final layers turn those patterns into a prediction for one of the six material categories.

        Input Image (224 × 224 × 3)
                    ↓
        ──────────────────────────────────────────────
        Block 1:  Conv(32 filters, 3×3)
                  BatchNorm
                  ReLU
                  MaxPool(2×2)
                    ↓
        Block 2:  Conv(64 filters, 3×3)
                  BatchNorm
                  ReLU
                  MaxPool(2×2)
                    ↓
        Block 3:  Conv(128 filters, 3×3)
                  BatchNorm
                  ReLU
                  MaxPool(2×2)
                    ↓
        Block 4:  Conv(256 filters, 3×3)
                  BatchNorm
                  ReLU
                  MaxPool(2×2)
                    ↓
        ──────────────────────────────────────────────
        Flatten
                    ↓
        Fully Connected Layer (512 units)
                    ↓
        Dropout (0.5)
                    ↓
        Output Layer (6 material classes)
          
  • 4 convolution blocks: each block has Conv + BatchNorm + ReLU + MaxPool.
  • Channels grow: 32 → 64 → 128 → 256, allowing the model to detect increasingly complex patterns.
  • Dense classifier: flatten → 512-unit fully connected layer → dropout → 6-class output layer.
  • Trained from scratch: all weights are learned only from EcoSort’s dataset.
  • Role: provides a clean baseline to compare against more powerful models such as ResNet-18.
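The architecture above translates almost line-for-line into a PyTorch module. This is a sketch based on the diagram; details such as an activation between the two dense layers may differ in the actual implementation:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # One EcoSortCNN block: Conv -> BatchNorm -> ReLU -> MaxPool
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class EcoSortCNN(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 32),     # 224 -> 112
            conv_block(32, 64),    # 112 -> 56
            conv_block(64, 128),   # 56  -> 28
            conv_block(128, 256),  # 28  -> 14
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                      # 256 x 14 x 14 = 50,176 features
            nn.Linear(256 * 14 * 14, 512),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

A 224 × 224 × 3 batch passes through the four blocks and comes out as six class scores, one per material.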

What do the building blocks mean?

  • Convolution (Conv): slides small filters over the image to detect local patterns such as edges, corners, and textures.
  • Batch Normalization (BatchNorm): keeps the activations in a stable range so the network trains faster and is less likely to overfit.
  • ReLU activation: keeps only positive values and sets negative values to zero, helping the model learn non-linear patterns.
  • Max Pooling (MaxPool): reduces the spatial size by keeping only the strongest responses in each small region, which focuses the model on the most important visual cues.

The Custom EcoSortCNN essentially starts by noticing basic shapes and textures in each image and gradually builds up to more detailed features. The final layers combine these features to decide whether the object looks most like cardboard, glass, metal, paper, plastic, or mixed trash. This baseline model shows how far a standard CNN can go before adding more advanced architectures like ResNet-18.

What is ResNet-18?

ResNet-18 is a widely used deep convolutional neural network that introduced the concept of residual connections. These skip-connections act like shortcut paths, allowing information to flow forward without being lost. This makes the network much easier to train, especially as it becomes deeper. ResNet-18 is powerful, stable, and still lightweight enough for fast training and real-time applications.

In EcoSort, ResNet-18 is used through transfer learning. The model starts with weights pretrained on ImageNet, a huge dataset of everyday images. These pretrained weights already understand edges, textures, shapes, and object structures. We replace the final layer so the model predicts EcoSort’s six material categories, then fine-tune the entire network on the cleaned dataset.

        Input Image (224 × 224 × 3)
                        ↓
        Conv Layer + BatchNorm + ReLU
                        ↓
        ──────────── Residual Block (x2) ────────────
            Conv → BN → ReLU → Conv → BN
            + skip connection (input added back)
                        ↓
        ──────────── Residual Block (x2) ────────────
            Conv → BN → ReLU → Conv → BN
            + skip connection
                        ↓
        ──────────── Residual Block (x2) ────────────
                        ↓
        ──────────── Residual Block (x2) ────────────
                        ↓
        Global Average Pooling
                        ↓
        Fully Connected Layer (6 classes)
          
  • Residual blocks: solve training problems in deep networks by using skip connections.
  • Pretrained backbone: begins with ImageNet features learned from millions of images.
  • EcoSort fine-tuning: replaces the final classification layer with a 6-class head.
  • Advantages: faster training, higher accuracy, better generalization, more stable gradients.

What is a residual connection?

A residual connection allows a block to output: BlockOutput = F(input) + input. This simple addition prevents the network from “forgetting” information and makes deep networks learn far more effectively. Instead of struggling to learn everything from scratch, the block only learns the “difference” from the input.

ResNet-18 is like giving the model a shortcut so it can keep what it already knows while learning new details. This makes it extremely reliable for EcoSort, where the model must distinguish fine textures such as plastic gloss, metal shine, paper fibers, and cardboard grain.
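The formula above corresponds to a basic block like the following (a simplified sketch of ResNet's BasicBlock, without the stride and downsampling variants):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal basic block: output = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = self.bn2(self.conv2(torch.relu(self.bn1(self.conv1(x)))))
        return torch.relu(out + x)   # skip connection: the input is added back
```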

Why were these models used?

EcoSort uses two complementary models, a custom CNN and a ResNet-18 backbone, because each contributes something different to the overall system. Together, they allow the project to compare a simple, from-scratch architecture with a stronger transfer-learning model that is better suited for real-world accuracy.

  • Custom EcoSortCNN (baseline model): A lightweight, easy-to-train network designed specifically for the EcoSort dataset. It serves as a controlled reference point that shows how well a standard CNN performs without external knowledge from large datasets. This makes it ideal for understanding the dataset’s difficulty before introducing more advanced techniques.
  • ResNet-18 (transfer-learning model): A proven architecture pretrained on ImageNet, which already captures patterns such as edges, shapes, textures, and object structures. Fine-tuning this model allows EcoSort to benefit from strong visual features learned from millions of images, resulting in faster convergence and higher accuracy on small or class-imbalanced datasets.
  • Balanced comparison: Using both models allows EcoSort to measure the impact of transfer learning, identify when a simple CNN struggles, and see how much performance gain comes from deeper architectures with residual connections.
  • Practical considerations: ResNet-18 offers an excellent trade-off between speed and accuracy, making it suitable for a student project environment while still achieving production-quality insights. The custom CNN remains useful for experimentation, debugging, and lightweight deployments.

The custom CNN provides a clean, interpretable baseline, while ResNet-18 delivers the accuracy and generalization needed for reliable real-world waste classification. Comparing the two helps highlight the strengths and limitations of each approach.

What do the EcoSort models look like structurally?

The EcoSortCNN baseline processes each 224 × 224 RGB image through a sequence of four convolutional blocks. Each block applies 3×3 filters, batch normalization, a ReLU activation, and a 2×2 max-pooling layer, gradually increasing the number of channels (32 → 64 → 128 → 256) while reducing the spatial resolution. The resulting feature maps are then flattened and passed through a 512-unit dense layer with dropout before the final six-class output layer. This model is deliberately compact: it is easy to train from scratch and provides a clear reference point for how much value transfer learning adds.

The EcoSortResNet18 model replaces the hand-built feature extractor with a ResNet-18 backbone that has already been trained on ImageNet. The original final layer is removed and replaced with a new fully connected head that outputs six logits, one for each EcoSort category. In this project, the backbone is fine-tuned end-to-end rather than frozen, allowing the earlier layers to adjust slightly to the specific textures and shapes present in waste images while still benefiting from the strong initialization provided by pretraining.

  • Input: 224 × 224 RGB images, normalized with ImageNet mean and standard deviation.
  • Backbone: ResNet-18 with residual blocks that help train deeper networks reliably.
  • Output head: final fully connected layer mapping to six material classes.
  • Choice of model: ResNet-18 offers a good balance of accuracy and computational cost for a project-scale dataset like TrashNet.

How is the model trained?

EcoSort is trained on the cleaned and split dataset using the same train/validation/test structure described earlier. Batches of images and labels are loaded from disk, moved to the GPU when available, and passed through the network to produce class scores. The training loop then measures how far the predictions are from the true labels and uses this feedback to update the model.

  • Loss function: cross-entropy with a small amount of label smoothing (0.05), which slightly softens the target distribution. This helps prevent the model from becoming overly confident and improves generalization on the smaller “trash” class.
  • Optimizer: AdamW with a learning rate of 3 × 10⁻⁴ and weight decay 1 × 10⁻⁴, applied only to parameters that are marked as trainable. Weight decay acts as a gentle regularizer to discourage overly large weights.
  • Learning-rate scheduler: a ReduceLROnPlateau scheduler monitors validation accuracy and reduces the learning rate when progress stalls, allowing larger, exploratory steps early in training and finer adjustments later on.
  • Training length: up to 20 epochs, with the best model snapshot saved whenever the validation accuracy improves. This ensures that later evaluations always use the strongest version of the network rather than just the last epoch.
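The training configuration described above can be sketched in PyTorch. The loss, optimizer, and scheduler settings come from the text; the scheduler's `factor` and `patience` values and the stand-in model are illustrative assumptions:

```python
import torch
import torch.nn as nn

# stand-in model; in EcoSort this is the EcoSortCNN or the fine-tuned ResNet-18
model = nn.Linear(10, 6)

criterion = nn.CrossEntropyLoss(label_smoothing=0.05)    # softened targets
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad],  # trainable parameters only
    lr=3e-4, weight_decay=1e-4)

# reduce the learning rate when validation accuracy stops improving
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=2)

# inside the epoch loop, after computing validation accuracy:
#   scheduler.step(val_acc)
```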

During both training and validation, EcoSort tracks average loss and accuracy across each epoch. These metrics are used to compare architectures, tune hyperparameters, and check for signs of over-fitting, such as validation accuracy plateauing while training accuracy continues to rise.


Epoch 1/20  - 348.0s | Train Loss: 0.9469 | Train Acc: 0.7162 | Val Loss: 0.7927 | Val Acc: 0.8008
Epoch 2/20  - 340.1s | Train Loss: 0.6748 | Train Acc: 0.8262 | Val Loss: 0.8419 | Val Acc: 0.7729
Epoch 3/20  - 336.3s | Train Loss: 0.5755 | Train Acc: 0.8653 | Val Loss: 0.6882 | Val Acc: 0.8446
Epoch 4/20  - 337.9s | Train Loss: 0.5204 | Train Acc: 0.8975 | Val Loss: 0.6327 | Val Acc: 0.8367
Epoch 5/20  - 336.3s | Train Loss: 0.4359 | Train Acc: 0.9356 | Val Loss: 0.7124 | Val Acc: 0.8327
Epoch 6/20  - 338.9s | Train Loss: 0.4575 | Train Acc: 0.9312 | Val Loss: 0.5779 | Val Acc: 0.8845
Epoch 7/20  - 334.5s | Train Loss: 0.4033 | Train Acc: 0.9475 | Val Loss: 0.6926 | Val Acc: 0.8566
Epoch 8/20  - 335.2s | Train Loss: 0.4112 | Train Acc: 0.9455 | Val Loss: 0.5951 | Val Acc: 0.8845
Epoch 9/20  - 334.3s | Train Loss: 0.3664 | Train Acc: 0.9658 | Val Loss: 0.5065 | Val Acc: 0.8924
Epoch 10/20 - 335.8s | Train Loss: 0.3538 | Train Acc: 0.9728 | Val Loss: 0.5375 | Val Acc: 0.8924
Epoch 11/20 - 340.3s | Train Loss: 0.3316 | Train Acc: 0.9767 | Val Loss: 0.5097 | Val Acc: 0.9163
Epoch 12/20 - 342.9s | Train Loss: 0.3133 | Train Acc: 0.9861 | Val Loss: 0.5058 | Val Acc: 0.9243
Epoch 13/20 - 339.7s | Train Loss: 0.3356 | Train Acc: 0.9762 | Val Loss: 0.5007 | Val Acc: 0.9124
Epoch 14/20 - 336.1s | Train Loss: 0.3692 | Train Acc: 0.9614 | Val Loss: 0.6553 | Val Acc: 0.8367
Epoch 15/20 - 337.2s | Train Loss: 0.3634 | Train Acc: 0.9658 | Val Loss: 0.5877 | Val Acc: 0.9044
Epoch 16/20 - 337.1s | Train Loss: 0.3156 | Train Acc: 0.9886 | Val Loss: 0.4615 | Val Acc: 0.9323
Epoch 17/20 - 335.9s | Train Loss: 0.2886 | Train Acc: 0.9955 | Val Loss: 0.4603 | Val Acc: 0.9203
Epoch 18/20 - 338.4s | Train Loss: 0.2879 | Train Acc: 0.9921 | Val Loss: 0.4578 | Val Acc: 0.9283
Epoch 19/20 - 336.8s | Train Loss: 0.2803 | Train Acc: 0.9970 | Val Loss: 0.4297 | Val Acc: 0.9363
Epoch 20/20 - 335.2s | Train Loss: 0.2746 | Train Acc: 0.9965 | Val Loss: 0.4153 | Val Acc: 0.9482

Best validation accuracy: 0.9482
Best model saved to: results/model_weights/ecosort_cnn_best.pth

How is the trained model used?

After training, EcoSort loads the best saved model checkpoint and uses it to classify any new waste image. This stage is called inference. The goal is simple: take one image, process it, and return the most likely material class.

  1. Load the model checkpoint: EcoSort restores the trained weights so the model behaves exactly as it did after training.
  2. Preprocess the image: Resize to 224×224, convert to RGB, normalize with ImageNet statistics, and transform into a tensor.
  3. Switch to inference mode: The model is set to eval() so BatchNorm and Dropout behave deterministically.
  4. Run a forward pass: The image tensor is passed through the network to produce six logits - one for each EcoSort category.
  5. Convert logits to probabilities: A softmax layer transforms these scores into probabilities that sum to 1.
  6. Select the highest probability: The class with the largest probability becomes the final predicted label (e.g., “paper” or “plastic”).

This inference pipeline is how EcoSort would operate in real-world applications: classifying images one at a time in recycling systems, sustainability apps, or automated waste-sorting tools.

        Image 
          → Preprocessing (resize, RGB, normalize)
            → Model Forward Pass
              → Softmax Probabilities
                → Predicted Material Label
        

How well does EcoSort perform?

Performance on the held-out test set

After training, EcoSort is evaluated on a held-out test split that the model never sees during training or validation. The best checkpoint from the training loop is reloaded, and the model is run once over all test images to obtain final predictions. For each image, EcoSort compares the predicted material label with the true label, building up a complete picture of how often the model is correct and where mistakes occur.

The figure below shows the resulting confusion matrix, which counts how many times each true class (rows) is predicted as each possible class (columns). Darker green squares along the diagonal indicate correct predictions, while off-diagonal cells show misclassifications. This view complements overall accuracy by revealing which materials are consistently recognized and which tend to be confused with others.

EcoSort confusion matrix showing true vs predicted labels

In EcoSort’s confusion matrix, most of the mass lies on the diagonal, indicating that the model correctly distinguishes the six material types in the majority of test cases. Classes such as paper, cardboard, and plastic show very strong diagonals with almost no confusion. The few off-diagonal entries mainly occur between visually similar categories, for example glass and metal, or occasionally plastic and trash, where shape and surface cues can overlap. These patterns mirror real-world sorting challenges, where shiny or crumpled items are more easily misinterpreted.

Alongside the confusion matrix, a classification report summarizes precision, recall, and F1-score for each class. These metrics show how the model performs on both common categories and smaller, more challenging ones such as “trash,” where data is more limited and visual appearance is more varied.

        precision    recall  f1-score   support

        cardboard     0.9756    0.9756    0.9756        41
        glass         0.9423    0.9608    0.9515        51
        metal         0.9286    0.9512    0.9398        41
        paper         0.9833    0.9833    0.9833        60
        plastic       0.9787    0.9388    0.9583        49
        trash         0.9333    0.9333    0.9333        15

        accuracy                          0.9611       257
        macro avg     0.9570    0.9572    0.9570       257
        weighted avg  0.9614    0.9611    0.9611       257
          

The report confirms that EcoSort achieves around 96% overall accuracy, with high precision and recall across all six materials. Even the smallest class, trash, maintains strong scores, which suggests that the model is not simply favoring the more common categories. The close agreement between precision, recall, and F1 across classes indicates balanced behavior: EcoSort rarely over-predicts a class and is similarly unlikely to miss it when it appears.
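To make the table concrete, the per-class numbers can be reproduced from first principles. This small sketch computes precision, recall, and F1 the same way the report does (support is the number of true instances of each class); `per_class_metrics` is an illustrative helper:

```python
def per_class_metrics(y_true, y_pred, classes):
    """Precision, recall, and F1 per class from paired label lists."""
    report = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall,
                     "f1": f1, "support": tp + fn}
    return report
```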

Which materials are easiest and hardest?

The confusion matrix and classification report together show that EcoSort performs strongest on materials with distinctive visual signatures, such as clear glass containers or rigid cardboard pieces with visible corrugation. These items tend to produce confident predictions and high per-class precision and recall.

Most errors appear between visually similar categories. Thin pieces of cardboard may resemble paper, especially when folded or torn, while glossy plastic packaging can look similar to metal under strong reflections. The “trash” category is naturally diverse and absorbs many ambiguous cases, which lowers its scores relative to more uniform classes. These patterns mirror real-world sorting challenges, where even people hesitate or disagree on borderline items.

  • Clearly separated classes: items with unique textures or shapes (for example, glass bottles) are recognized most reliably.
  • Borderline pairs: confusion is most common between categories that share color, thickness, or surface finish (paper vs. cardboard).
  • Diverse “trash” class: collects many edge cases and mixed materials, which naturally lowers its precision and recall.

These class-wise patterns are important because they point to concrete opportunities for improvement, such as collecting more examples of borderline items, re-labeling confusing samples, or designing clearer signage for categories that people and models both find ambiguous.

How EcoSort interprets images (Grad-CAM heatmaps)

To move beyond raw numbers, EcoSort uses Grad-CAM visualizations to see where the fine-tuned ResNet-18 model is “looking” when it predicts each class. For a given test image, Grad-CAM produces a heatmap that highlights the regions most responsible for the chosen label. Warmer colors indicate areas that contributed strongly to the decision.

In many correctly classified examples, the heatmaps concentrate on the main object – the body of a bottle, the surface of a can, or the texture of a cardboard flap – and largely ignore the background. In harder cases, the attention sometimes spreads to shadows or clutter near the object, which helps explain why certain items are mis-sorted. These visual checks provide an additional layer of trust: they show that EcoSort’s predictions are usually driven by sensible visual cues rather than random noise.
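
The core of Grad-CAM is a short computation: each channel of the last convolutional layer is weighted by its globally average-pooled gradient, the weighted channels are summed, and a ReLU keeps only the positive evidence. A toy NumPy sketch with synthetic arrays standing in for ResNet-18's activations and gradients (real use would capture both via hooks on the network):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM core step for one image.

    feature_maps, gradients: arrays of shape (channels, H, W) taken from
    the last convolutional layer and its gradient w.r.t. the class score.
    """
    weights = gradients.mean(axis=(1, 2))              # alpha_k: global average pool
    cam = np.tensordot(weights, feature_maps, axes=1)  # weighted sum over channels
    cam = np.maximum(cam, 0)                           # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                          # scale to [0, 1] for display
    return cam

# Synthetic stand-ins for ResNet-18 activations/gradients (2 channels, 4x4 map)
rng = np.random.default_rng(0)
activations = rng.random((2, 4, 4))
grads = rng.random((2, 4, 4))
heatmap = grad_cam(activations, grads)
```

The resulting heatmap has the spatial resolution of the final feature map; for display it is upsampled to the input image size and overlaid with a warm colormap, which is where the warmer-is-stronger reading above comes from.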

What do these results mean?

Together, these performance metrics and visual explanations show that EcoSort is not only accurate but also behaves in a sensible, interpretable way. The model reliably identifies materials with distinct surface patterns such as cardboard, paper, and glass, while occasional errors arise in categories that overlap visually. The Grad-CAM heatmaps confirm that predictions are driven by meaningful regions, such as texture, edges, folds, and printed labels, rather than random background pixels.

These outcomes suggest that EcoSort is well-suited for real-world environments where items appear in varied orientations, lighting, or levels of wear. A model that consistently attends to the correct object and ignores background noise is far more dependable for downstream applications such as smart bins, educational tools, or semi-automated sorting workflows.

In short, the results demonstrate that EcoSort is both high-performing and interpretable, the two qualities that make it a strong foundation for expanding to additional materials or more complex real-world scenes.

What question did EcoSort try to answer?

Recycling Facility

EcoSort began with a simple guiding question: can everyday images of waste items help people sort materials more accurately? Many daily mistakes come from quick judgments made at a bin, and this project set out to understand those visual decisions more closely. By collecting, organizing, and examining hundreds of photos from the TrashNet dataset, EcoSort grounded the exploration in real images that resemble common situations people face. The project focused on the moments when someone must decide where an item belongs, creating a direct link between the dataset and real recycling decisions. Each step, from cleaning and standardizing the images to grouping them, served the purpose of studying how objects look when people encounter them. The overall goal was to see whether images alone carry enough information for a system to provide useful sorting guidance. EcoSort uses these observations to better understand where people succeed and where they struggle in waste sorting.

What did studying the images reveal?

Reviewing the cleaned dataset revealed important visual patterns that explain why sorting mistakes happen so often. Some materials, such as rigid cardboard boxes or smooth glass bottles, have distinct shapes and textures that make them easy to recognize. Other items, especially thin cardboard, wrinkled paper, or reflective plastic, look surprisingly similar to one another in everyday lighting. These similarities showed that many sorting mistakes are rooted in genuine visual confusion, not carelessness. The process of resizing and standardizing images also highlighted how differently materials can appear depending on shadows, angles, and background contrast. Looking at these variations helped clarify the exact cues, such as edges, folds, labels, or shine, that influence how people interpret each item. The dataset illustrated that sorting is not a simple task, and even well-intentioned people can easily misidentify borderline materials. These insights formed the foundation for understanding how EcoSort's model would behave.
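
The standardization step is not spelled out above, but a typical pipeline for images fed to a pretrained or fine-tuned ResNet scales 8-bit pixels to [0, 1] and then normalizes each channel with the standard ImageNet statistics; a minimal sketch, assuming that convention:

```python
import numpy as np

# Standard ImageNet channel statistics, the usual choice when fine-tuning
# a pretrained ResNet such as ResNet-18
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def standardize(image):
    """Scale 8-bit RGB pixels to [0, 1], then normalize per channel.

    image: uint8 array of shape (H, W, 3); returns a float array of the
    same shape, roughly centered on zero per channel.
    """
    x = image.astype(np.float64) / 255.0
    return (x - IMAGENET_MEAN) / IMAGENET_STD

# A uniform mid-gray image maps to small values near zero after normalization
gray = np.full((4, 4, 3), 128, dtype=np.uint8)
out = standardize(gray)
```

Centering every image on the same statistics is what makes shadows, angles, and background contrast comparable across photos: the model sees consistent value ranges instead of raw camera output.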

What did the results show?

When EcoSort tested its cleaned and organized dataset on new images, the model correctly identified materials in a large majority of cases. The strongest results appeared in categories with consistent textures and shapes, such as cardboard, paper, and glass. The confusion matrix revealed that most errors happened in predictable places, especially between materials that genuinely look alike in certain conditions. For example, thin cardboard was often near the boundary of the paper category, and some plastic items resembled mixed trash. These patterns reflect real sorting challenges that people face rather than random model mistakes. The results suggest that visual information alone is powerful enough to guide many everyday decisions, as long as the system has been exposed to clean and consistent examples. The analysis also showed that removing corrupted or unclear images improved the model's reliability. These outcomes demonstrated that consistent preparation and thoughtful dataset cleaning contributed directly to EcoSort's strong performance.
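
Filtering corrupted files can be as simple as a header check before training. The sketch below, assuming JPEG and PNG inputs, tests each file's magic bytes; a stricter pass would fully decode every image (for example with Pillow's `Image.verify`):

```python
from pathlib import Path

# File signatures for the formats assumed in the dataset
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
}

def looks_valid(path):
    """Cheap sanity check: does the file start with a known image signature?"""
    header = Path(path).read_bytes()[:8]
    return any(header.startswith(sig) for sig in SIGNATURES)

def filter_dataset(paths):
    """Split a list of image paths into (kept, dropped)."""
    kept, dropped = [], []
    for p in paths:
        (kept if looks_valid(p) else dropped).append(p)
    return kept, dropped
```

Running a pass like this before training keeps truncated or mislabeled downloads out of the dataset, which is the kind of cleaning step credited above with improving reliability.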

How does this help with real-world sorting?

EcoSort’s findings suggest that simple image-based systems could make everyday sorting less confusing. The model tended to focus on the main object in each photo, ignoring backgrounds and distractions, which means it mirrors the way people naturally make quick judgments. Because it performs best on materials that people also find easy, it could support smoother decisions in places where sorting mistakes frequently occur. These results point to opportunities for clearer signage, visual guides, or small digital tools that show people what each category typically looks like. EcoSort demonstrates that even a small, clean dataset can reveal which items cause the most hesitation and why. This information can shape how recycling stations are labeled or how campuses and offices communicate sorting rules. The project highlights the value of providing simple visual cues rather than technical explanations. In this way, EcoSort connects image analysis to everyday behavior in a practical and understandable way.

Where could EcoSort go next?

The project opens several meaningful paths for future exploration. One natural step is to expand the dataset to include compostables, mixed-material items, or photos taken directly at campus bins. These additions would bring the system closer to real-world environments where items rarely appear clean and centered. Another direction would be creating simple tools such as a webpage or mobile interface that allow people to check an item quickly using a photo. EcoSort also provides a foundation for exploring how different communities respond to visual sorting guidance and whether image-based reminders improve daily habits. The project demonstrated that small and thoughtful interventions can make sorting feel more intuitive rather than overwhelming. Future work could focus on identifying where people hesitate most and building targeted examples for those cases. EcoSort shows that clear images and consistent organization can make a meaningful difference in how everyday sorting decisions are made.

Moukthika Gunapaneedu

Data Scientist

Portfolio moukthikagunapaneedu@gmail.com