MAIN PROJECT IMAGE CLASSIFIER
- harishpabolu777
- Apr 30, 2023
- 3 min read
Updated: Apr 30, 2023
In this project we develop an image classifier using several algorithms and find out which one performs best on the dataset.

We use the Mango Leaf Disease dataset from Kaggle for this image classifier project. The dataset contains 8 categories.

Each category contains many related images.
Data Preprocessing:
We now have a dataset of mango leaf disease images, and on it we need to build an image classifier with multiple algorithms. The data must be preprocessed and transformed to meet the requirements of the classifiers.
As we are building an image classifier, data preprocessing involves the following steps:
Resizing images to a fixed size.
Converting the images to grayscale, which simplifies the image representation.
Normalizing the pixel values to the range 0 to 1 by dividing them by 255.
The code below performs this preprocessing.
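The original code screenshot did not survive; here is a minimal sketch of the three steps using only NumPy, with a nearest-neighbour resize. The target size of 64x64 is an assumption, since the post does not state the exact value used.

```python
import numpy as np

IMG_SIZE = 64  # assumed target size; the post does not state the exact value


def preprocess_image(img, size=IMG_SIZE):
    """Resize to a fixed size, convert to grayscale, normalize to [0, 1]."""
    img = np.asarray(img, dtype=np.float64)
    # grayscale: average the colour channels if present
    if img.ndim == 3:
        img = img.mean(axis=2)
    # nearest-neighbour resize to size x size
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    img = img[rows][:, cols]
    # normalize pixel values from [0, 255] to [0, 1]
    return img / 255.0


# stand-in for one RGB image loaded from the dataset
example = np.random.randint(0, 256, (100, 120, 3))
processed = preprocess_image(example)
```

In practice a library such as OpenCV or Pillow would do the resizing with proper interpolation; the steps are the same.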

Splitting Data:
We need to divide our data into training and testing sets. The training data is used to train the classifier, and the testing data is used to measure its accuracy. To build a good classifier we want to train it on more data, so we split the data 80% for training and 20% for testing.
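The 80/20 split can be done with scikit-learn's train_test_split; the arrays below are dummy stand-ins for the preprocessed images and labels, since the post's code is not shown.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 64 * 64)   # 100 flattened 64x64 images (assumed size)
y = np.arange(100) % 8             # dummy labels for the 8 disease categories

# 80% of samples go to training, 20% to testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```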

Implementing multiple Algorithms:
We use three algorithms in this image classifier and find which of the three gives the best accuracy.
We implement the SVM, Random Forest, and KNN algorithms.
SVM:
SVM is a supervised machine learning algorithm used for classification or regression. SVM works by mapping data to a high-dimensional feature space so that data points can be categorized even when they are not linearly separable in the original space.
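A minimal sketch of training an SVM with scikit-learn's SVC; the kernel and C value are illustrative assumptions, not the exact settings from the post, and the data is synthetic.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((80, 16))   # dummy flattened image features
y_train = np.arange(80) % 8      # dummy labels for 8 classes

# the RBF kernel implicitly maps the data into a high-dimensional feature space
svm_clf = SVC(kernel="rbf", C=1.0)
svm_clf.fit(X_train, y_train)
preds = svm_clf.predict(X_train)
```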

Random Forest:
Random Forest is an algorithm based on bagging. In bagging, a group of models is trained on different subsets of the dataset, and the final output is produced by combining the outputs of all the models. In a random forest, the base model is a decision tree.
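A sketch of the bagging idea with scikit-learn's RandomForestClassifier: each tree is trained on a bootstrap sample of the data and the forest votes over the trees. The number of trees is an illustrative assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.random((80, 16))   # dummy feature vectors
y_train = np.arange(80) % 8      # dummy labels for 8 classes

# each of the 100 decision trees sees a different bootstrap subset;
# the final prediction is a majority vote over the trees
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
```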

KNN:
KNN (K-Nearest Neighbors) is a machine learning algorithm used to solve classification and regression problems. To predict the class or value of a new data point, it locates the K nearest points in the training dataset and uses their classes, typically by majority vote, to make the prediction.
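A sketch of KNN classification with scikit-learn; k=5 is an illustrative choice, not necessarily the value used in the project.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.random((80, 16))   # dummy feature vectors
y_train = np.arange(80) % 8      # dummy labels for 8 classes

# each prediction is a majority vote among the 5 nearest training points
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
knn_preds = knn.predict(X_train[:10])
```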

We train all three algorithms on our training data and use our testing data to measure the accuracy of each.
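The train-and-compare loop can be sketched as below: fit each model on the training split and score it on the held-out test split with accuracy_score. The data is synthetic, so the numbers are not the project's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((200, 16))   # dummy features
y = np.arange(200) % 8      # dummy labels for 8 classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
}
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # accuracy on the held-out 20% test split
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
```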

After training the three algorithms, the prediction accuracies are as below.

Observing the accuracies of all three algorithms, we conclude that Random Forest is the best among the three used in this classifier.
The Classification reports of the algorithms are below.

Prediction:
Having developed a classifier for predicting mango leaf disease, we run it on an image downloaded from Google, and its prediction is correct. The image below supports this.
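Predicting on one new image looks roughly like this: the image must go through the same preprocessing used at training time, then be flattened into a single feature row. The trained model here is a dummy stand-in, since the post's code and data are not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# dummy stand-in for the trained model and data
X_train = rng.random((80, 64 * 64))
y_train = np.arange(80) % 8
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# stand-in for the downloaded image after the same 64x64 preprocessing
new_image = rng.random((64, 64))
prediction = model.predict(new_image.reshape(1, -1))  # flatten to one row
```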

Experiments:
Hyperparameter tuning: For the random forest classifier we performed hyperparameter tuning using GridSearchCV. GridSearchCV finds the best hyperparameters, and the random forest with these parameters achieves an accuracy of 82.5%, which is greater than the original accuracy.
Before hyperparameter tuning:

After hyperparameter tuning:

Hyperparameter tuning on SVM: As above, we used GridSearchCV to find the best hyperparameters, and we obtained an accuracy of 67.62%, which is greater than the previous accuracy.
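The same GridSearchCV approach applies to the SVM; the grid over C (and gamma) below is an illustrative assumption, not the exact values tried in the project.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((80, 16))
y_train = np.arange(80) % 8

# candidate regularization strengths and kernel widths (illustrative values)
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", "auto"]}
svm_search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
svm_search.fit(X_train, y_train)
svm_best = svm_search.best_params_
```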
Before hyperparameter tuning:

After hyperparameter tuning:

Contributions:
In this process I learned about hyperparameters and the use of GridSearchCV to find the best parameters.
Understanding which algorithm works best.
This helped me learn more about using built-in libraries.
Challenges:
Handling a large dataset.
Normalizing pixel values and preprocessing the images.
The time needed to find the best hyperparameters, as the dataset is large.
Overfitting.
Because the dataset is large, RAM usage is high, which blocks other work on the machine.
Ran many experiments (with different numbers of iterations and C values) to increase the accuracy of SVM with hyperparameter tuning.
Reference:
https://learn.drukinfotech.com/image-classification-using-machine-learning-algorithms/
https://www.analyticsvidhya.com/blog/2022/01/image-classification-using-machine-learning/
https://towardsdatascience.com/understanding-random-forests-hyperparameters-with-images-9b53fce32cb3
https://www.geeksforgeeks.org/random-forest-classifier-using-scikit-learn/#
Youtube Link:
Jupyter code:
Github link: