Basic Image Classifier Project

Sampurn Anand

Published in Nerd For Tech, Feb 19, 2021

The aim of this project is to:

Create a labelled dataset of Avengers images: Captain America, Iron Man, Black Widow, Hulk and Thor.
Train a classifier that can label an unseen image with reasonable accuracy.

An image classifier involves four main steps of data manipulation. These are:

Data Collection
Data Preprocessing
Feature Extraction
Model Training

Now, let's go through the implementation step by step and look at the methods available at each stage. The method best suited to the project is adopted.

Note: We are using Google Colab in this project, but any comparable environment can be used.

Step : 1 — Data Collection

In this project we need to collect data in the form of images. The images can be obtained by manually scraping individual (static and/or dynamic) websites. But since a large number of images is needed, many websites would have to be scraped. Instead of visiting different websites manually, one can scrape Google Images or Bing Images.

There are many methods for scraping images. Three popular ones are:

Using Python and web scraping tools to automate the download of a certain type of image directly from Google. This method may violate the terms of service of the sites involved, so websites and Google itself take measures to stop such crawlers from working, which is why the code needs to be updated regularly.
Using Chrome browser extensions such as Fatkun. These extensions are far more stable than the previous method. But as per the requirement of this project, the images should be scraped from the internet.
Using Python tools such as Bing Image Downloader to directly export required images to a directory.

In this project, for the sake of convenience, Bing Image Downloader is used.

First, we install the downloader and import the required library:

!pip install bing-image-downloader 
from bing_image_downloader import downloader

Now it is used to scrape images with the following line of code:

downloader.download('Captain America Chris Evans', output_dir= './drive/MyDrive/datasets/collection', limit = 400, adult_filter_off = False, force_replace = False, timeout = 6000)

Similarly, images for Iron Man, Thor, Hulk and Black Widow are scraped.
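The same call can be repeated for all five characters in a loop. The sketch below is only an illustration: the query strings other than 'Captain America Chris Evans' are inferred from the folder names that appear later in this article, and download_all is a hypothetical helper (not part of bing-image-downloader) that takes the download function as an argument so it can be tried without network access.

```python
# Hypothetical helper: run one download per character.
# `download_fn` is expected to accept the same keyword arguments as
# `downloader.download` in the call shown above.
QUERIES = [
    "Captain America Chris Evans",
    "Iron Man Tony Stark",
    "Black Widow Scarlett Johansson",
    "Hulk Mark Ruffalo",
    "Thor Chris Hemsworth",
]


def download_all(download_fn, queries=QUERIES,
                 output_dir="./drive/MyDrive/datasets/collection", limit=400):
    """Call download_fn once per query and return the queries processed."""
    done = []
    for query in queries:
        download_fn(query, output_dir=output_dir, limit=limit,
                    adult_filter_off=False, force_replace=False, timeout=6000)
        done.append(query)
    return done


# In Colab, assuming the package is installed as shown above:
# from bing_image_downloader import downloader
# download_all(downloader.download)
```

Passing the function in, instead of importing it inside the helper, keeps the loop easy to dry-run with a stand-in function before spending time on 2000 downloads.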

Step : 2 — Data Preprocessing

This step is very important for image classification. It increases the overall efficiency of the algorithm.

This step is required in this project because, after data collection, it was observed that many irrelevant images from comic books had been collected. Such images consume more system resources, and the classifier might make mistakes because of them. So those images should not be used directly.

In this project, OpenCV and a technique called Haar cascades are used for data cleaning. They detect whether a face and two eyes are clearly visible. If they are, the image is kept; otherwise it is discarded. Most of the data cleaning is done with Python code, but some of it has to be done manually. Manual checking is required to remove unwanted faces. For example, the folder for Iron Man might contain faces of other characters, which decreases the accuracy of the model.

Steps for Data Cleaning :

Faces with two eyes are extracted from the raw images using Haar cascades.
Photos with two or more faces are discarded manually, along with blurred photos and other unwanted images.

Haar cascade functioning and usage in brief:

Every image has line and edge features. A Haar cascade slides a window of such edge features across the image to detect where the eyes and the full face are.

For example, the eye region tends to be darker than the area below it. Haar cascades use this kind of mask to detect such regions.
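To make the "dark above, bright below" idea concrete, here is a minimal NumPy sketch of a single two-rectangle Haar-like feature computed with an integral image. It is an illustration under simplifying assumptions, not OpenCV's implementation: the function names and the toy 4x4 image are made up for this example, and a real cascade combines many such features at many positions and scales.

```python
import numpy as np


def integral_image(gray):
    """Cumulative sum over rows and columns, padded with a leading zero row and column."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = gray.cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y), width w, height h."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]


def two_rect_feature(ii, x, y, w, h):
    """Sum of the lower half of the window minus the sum of its upper half."""
    half = h // 2
    upper = rect_sum(ii, x, y, w, half)
    lower = rect_sum(ii, x, y + half, w, half)
    return lower - upper


# Toy 4x4 "image": a dark band (eyes) on top and a brighter band (cheeks) below.
toy = np.array([[10, 10, 10, 10],
                [10, 10, 10, 10],
                [200, 200, 200, 200],
                [200, 200, 200, 200]], dtype=np.float64)

ii = integral_image(toy)
response = two_rect_feature(ii, 0, 0, 4, 4)
print(response)  # 1520.0: a large positive value means "darker above, brighter below"
```

The integral image is what makes the sliding window cheap: once it is built, the sum of any rectangle costs four lookups regardless of the rectangle's size.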

OpenCV has ready-made APIs to detect faces, eyes, etc. The 17 XML files needed to run these APIs were uploaded manually so that the Haar cascade functions can detect the various features.

Now, let's import the required libraries and load the face cascade and eye cascade classifiers:

import numpy as np
import cv2
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline

face_cascade = cv2.CascadeClassifier("./drive/MyDrive/Colab Notebooks/opencv/haarcascades/haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier("./drive/MyDrive/Colab Notebooks/opencv/haarcascades/haarcascade_eye.xml")

Let’s have a trial run to see if the functions are working properly or not:

img = cv2.imread('/content/drive/My Drive/datasets/collection/Thor Chris Hemsworth/Image_8.jpg')
#img.shape
plt.imshow(img)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray.shape
plt.imshow(gray, cmap='gray')

face_cascade = cv2.CascadeClassifier('./drive/MyDrive/Colab Notebooks/opencv/haarcascades/haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('./drive/MyDrive/Colab Notebooks/opencv/haarcascades/haarcascade_eye.xml')
faces = face_cascade.detectMultiScale(gray)
faces

These lines of code check whether the image contains a face. If no face is found, detectMultiScale returns an empty result rather than raising an error. If faces are found, it returns an array with one row per face, each row holding the x, y, width and height of the bounding box.
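Because the result of detectMultiScale can be empty or can hold several boxes, code that consumes it usually checks its length first. The sketch below shows one way to do that; largest_face is a hypothetical helper introduced here for illustration, and the box values are made up rather than real detector output.

```python
import numpy as np


def largest_face(faces):
    """Return the (x, y, w, h) box with the largest area, or None if there are no boxes."""
    if len(faces) == 0:
        return None
    boxes = np.asarray(faces)
    areas = boxes[:, 2] * boxes[:, 3]          # width * height of each box
    x, y, w, h = boxes[int(np.argmax(areas))]
    return int(x), int(y), int(w), int(h)


# Made-up boxes in the (x, y, w, h) layout used by detectMultiScale.
example = np.array([[30, 40, 50, 50], [120, 60, 90, 90]])
print(largest_face(example))  # (120, 60, 90, 90)
print(largest_face(()))       # None
```

A check like this can also flag images with more than one detected face, which are the ones this project removes by hand.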

Now that the functions are working properly, let's write code to check all the images in the dataset. For each image that meets the requirement, the face region is cropped (detection runs on a grayscale copy, and the crop is taken from the color image). The cropped images are saved in a separate folder for future use.

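def get_cropped_image_if_2_eyes(image_path):
    img = cv2.imread(image_path)
    if img is None:  # unreadable or non-image file
        return None
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)  # scaleFactor=1.3, minNeighbors=5
    for (x,y,w,h) in faces:
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = img[y:y+h, x:x+w]
        eyes = eye_cascade.detectMultiScale(roi_gray)
        if len(eyes) >= 2:
            return roi_color  # color crop of the first face with at least two eyes
    return None  # no face with two visible eyes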

path_to_data = "./drive/My Drive/datasets/collection/"
path_to_cr_data = "./drive/My Drive/datasets/cropped/"

import os
img_dirs = []
for entry in os.scandir(path_to_data):
    if entry.is_dir():
        img_dirs.append(entry.path)

import shutil
if os.path.exists(path_to_cr_data):
     shutil.rmtree(path_to_cr_data)
os.mkdir(path_to_cr_data)

cropped_image_dirs = []
celebrity_file_names_dict = {}

for img_dir in img_dirs:
    count = 1
    celebrity_name = img_dir.split('/')[-1]
    print(celebrity_name)
    
    celebrity_file_names_dict[celebrity_name] = []
    
    for entry in os.scandir(img_dir):
        roi_color = get_cropped_image_if_2_eyes(entry.path)
        if roi_color is not None:
            cropped_folder = path_to_cr_data + celebrity_name
            if not os.path.exists(cropped_folder):
                os.makedirs(cropped_folder)
                cropped_image_dirs.append(cropped_folder)
                print("Generating cropped images in folder: ",cropped_folder) #Checking whether the code is running successfully or not
                
            cropped_file_name = celebrity_name + str(count) + ".png" #changing file type of every image to png
            cropped_file_path = cropped_folder + "/" + cropped_file_name 
            
            cv2.imwrite(cropped_file_path, roi_color)
            celebrity_file_names_dict[celebrity_name].append(cropped_file_path)
            count += 1

Step: 3 — Feature Extraction using Wavelet Transform

This step matters because color images are hard to use directly for feature extraction. They can contain a wide variety of shades and colors, which makes it difficult for the classifier to identify the image.

To avoid those errors, the images are transformed to black and white, with different contrast in different areas. Wavelet transformation allows the important features to be extracted from an image. In general, in the wavelet-transformed image the eye region is differentiated from the forehead, the nose is distinct, and so on.

While going through the image processing literature, it was found that wavelet transforms are often the most effective way of extracting features. So wavelet transformation is used in this project.

The function takes an image, performs the wavelet transformation on it using PyWavelets (imported as pywt) and returns a new image, which is the wavelet transform. Concepts from signal processing, such as the time domain, the frequency domain and the Fourier transformation, underlie the wavelet transformation used in the main code. A few of these concepts are explained briefly below:

An image, like an audio signal, can be treated as a signal. A signal can be represented in two kinds of domain. An image can be represented in the spatial domain (x and y) or in the frequency domain. An audio signal can be represented in the time domain or in the frequency domain.
Fourier transformation takes a complex signal and returns the basic signals that make it up. For example, consider a dish, say dosa. If dosa is reverse engineered, the basic ingredients are obtained: water, rice flour, urad dal and maybe more.
The case is similar with a complex signal in which different instruments are playing and there is also noise. How do noise cancellation devices actually cancel the noise? This can be done with Fourier transformation, because it can separate the voice from the noise. It splits the signal into its component frequencies, and frequency filters can then suppress or amplify some of them. In some audio devices, treble or bass can be increased. All of this is possible because of Fourier transformation.
Wavelet transformation is similar to Fourier transformation and amplifies certain features of the image.
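The noise-filtering idea described above can be tried on a toy signal with NumPy's FFT. This is a minimal sketch under made-up assumptions (two sine waves at 5 Hz and 12 Hz, Gaussian noise, a hard cut-off at 20 Hz); real noise cancellation is considerably more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
t = np.arange(n) / n                       # one second sampled at 1024 Hz

# A "complex" signal: two basic sine waves plus random noise.
clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
noisy = clean + 0.4 * rng.standard_normal(n)

spectrum = np.fft.rfft(noisy)              # time domain -> frequency domain
freqs = np.fft.rfftfreq(n, d=1 / n)        # frequency of each bin in Hz

# The two strongest bins are the two "ingredients" of the signal.
strongest = sorted(freqs[np.argsort(np.abs(spectrum))[-2:]])

filtered_spectrum = spectrum.copy()
filtered_spectrum[freqs > 20] = 0          # frequency filter: suppress everything above 20 Hz
filtered = np.fft.irfft(filtered_spectrum, n=n)   # back to the time domain


def rmse(a, b):
    """Root mean squared difference between two signals."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


print("strongest frequencies (Hz):", strongest)
print("error before filtering:", rmse(noisy, clean))
print("error after filtering: ", rmse(filtered, clean))
```

The filtered signal ends up closer to the clean one because most of the noise energy sits in the frequencies that were zeroed, while the two sine waves lie below the cut-off.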

For the further steps, the input is the color image stacked vertically with its wavelet-transformed image. The code for the wavelet transform is as follows:

import numpy as np
import pywt
import cv2

def w2d(img, mode='haar', level=1):
    imArray = img
    # Datatype conversions
    # convert to grayscale (cv2.imread returns images in BGR order)
    imArray = cv2.cvtColor(imArray, cv2.COLOR_BGR2GRAY)
    # convert to float in the range [0, 1]
    imArray = np.float32(imArray)
    imArray /= 255
    # compute coefficients
    coeffs = pywt.wavedec2(imArray, mode, level=level)
    # process coefficients: zero the approximation, keep only the details
    coeffs_H = list(coeffs)
    coeffs_H[0] *= 0
    # reconstruction
    imArray_H = pywt.waverec2(coeffs_H, mode)
    imArray_H *= 255
    imArray_H = np.uint8(imArray_H)
    return imArray_H
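To see what zeroing coeffs[0] in w2d achieves, here is a NumPy-only sketch of a single-level 2D Haar transform. It is a simplified illustration and not how PyWavelets is implemented: the function names are made up, the sign and naming conventions of the detail bands may differ from pywt, and w2d itself can apply several levels.

```python
import numpy as np


def haar2d(img):
    """One-level 2D Haar transform of an array with even height and width."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = img[1::2, 0::2], img[1::2, 1::2]   # bottom-left, bottom-right
    approx = (a + b + c + d) / 2              # smooth, low-frequency content
    horiz = (a + b - c - d) / 2               # top-versus-bottom differences
    vert = (a - b + c - d) / 2                # left-versus-right differences
    diag = (a - b - c + d) / 2                # diagonal differences
    return approx, horiz, vert, diag


def inverse_haar2d(approx, horiz, vert, diag):
    """Rebuild the image from the four coefficient bands."""
    out = np.empty((approx.shape[0] * 2, approx.shape[1] * 2))
    out[0::2, 0::2] = (approx + horiz + vert + diag) / 2
    out[0::2, 1::2] = (approx + horiz - vert - diag) / 2
    out[1::2, 0::2] = (approx - horiz + vert - diag) / 2
    out[1::2, 1::2] = (approx - horiz - vert + diag) / 2
    return out


# Toy image: flat dark area with one bright column on the right.
img = np.zeros((4, 4))
img[:, 3] = 1.0

approx, horiz, vert, diag = haar2d(img)
roundtrip = inverse_haar2d(approx, horiz, vert, diag)          # equals img again
edges_only = inverse_haar2d(np.zeros_like(approx), horiz, vert, diag)
print(edges_only)
```

With the approximation band set to zero, the flat left half of the toy image reconstructs to zeros and only the columns around the brightness change stay non-zero, which is the sense in which the transformed image keeps edges such as eyes and nose outlines.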

Let's assign a number (or key) to each of the 5 characters.

class_dict = {}
count = 0
for celebrity_name in celebrity_file_names_dict.keys():
    class_dict[celebrity_name] = count
    count = count + 1
class_dict

{'Black Widow Scarlett Johansson': 2,
 'Captain America Chris Evans': 0,
 'Hulk Mark Ruffalo': 3,
 'Iron Man Tony Stark': 1,
 'Thor Chris Hemsworth': 4}

Creating a dictionary to refer to path of all the cropped images of the respective characters:

celebrity_file_names_dict = {}
for img_dir in cropped_image_dirs:
    celebrity_name = img_dir.split('/')[-1]
    file_list = []
    for entry in os.scandir(img_dir):
        file_list.append(entry.path)
    celebrity_file_names_dict[celebrity_name] = file_list

Now, let's build the feature matrix X and the label list y, where each row of X holds a color image stacked vertically with its wavelet-transformed version.

X, y = [], []
for celebrity_name, training_files in celebrity_file_names_dict.items():
    for training_image in training_files:
        img = cv2.imread(training_image)
        if img is None:
            continue
        scalled_raw_img = cv2.resize(img, (32, 32)) #resizing using openCV as images maybe of different sizes
        img_har = w2d(img,'db1',5) #getting the wavelet transformed image
        scalled_img_har = cv2.resize(img_har, (32, 32)) #resizing wavelet transformed image
        combined_img = np.vstack((scalled_raw_img.reshape(32*32*3,1),scalled_img_har.reshape(32*32,1))) #vertically stacking both the images
        X.append(combined_img)
        y.append(class_dict[celebrity_name])
X = np.array(X).reshape(len(X),4096).astype(float)
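The number 4096 in the final reshape comes from 32*32*3 values for the color image plus 32*32 values for the wavelet image. The sketch below checks that layout with random stand-in arrays, so it needs neither OpenCV nor the dataset; the names raw, har and X_demo are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in for the resized color crop
har = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)     # stand-in for the resized wavelet image

# Same stacking as in the loop above: 3072 color values followed by 1024 wavelet values.
combined = np.vstack((raw.reshape(32 * 32 * 3, 1), har.reshape(32 * 32, 1)))
print(combined.shape)   # (4096, 1)

# Two identical samples, reshaped the same way as X.
X_demo = np.array([combined, combined]).reshape(2, 4096).astype(float)
print(X_demo.shape)     # (2, 4096)
```

Each row of X is therefore one flat vector per image, which is the input shape scikit-learn estimators expect.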

Step 4 — Model Training: Using SVM with heuristic finetuning

In this project, an SVM is first used to train the main model.

Then other models are compared using grid search to decide which model is the best fit for the project.

GridSearchCV is used for hyperparameter tuning. It also helps in deciding which model performs best.

In our project, the candidate models are defined as follows for comparison:

SVM: values of C are 1, 10, 100, 1000 and kernel values are rbf and linear.
Random Forest: number of estimators (or decision trees) is 1, 5, 10.
Logistic Regression: values of C are 1, 5, 10.
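A grid search tries every combination of the listed values, so it helps to know how many model fits that implies. The sketch below counts them with itertools only; expand is a hypothetical helper written for this example, and the figure of 5 folds is taken from the cv=5 setting used in the training code later in this article.

```python
from itertools import product

# Parameter grids as listed above (plain names, without the pipeline prefixes).
param_grids = {
    "svm": {"C": [1, 10, 100, 1000], "kernel": ["rbf", "linear"]},
    "random_forest": {"n_estimators": [1, 5, 10]},
    "logistic_regression": {"C": [1, 5, 10]},
}


def expand(grid):
    """Return every combination of parameter values as a list of dicts."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


cv_folds = 5
total_fits = 0
for name, grid in param_grids.items():
    combos = expand(grid)
    total_fits += len(combos) * cv_folds
    print(name, len(combos), "combinations,", len(combos) * cv_folds, "cross-validation fits")

print("total:", total_fits)
```

By default GridSearchCV also refits the best combination once on the whole training set, so the actual number of fits per model is one more than the count printed here.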

Finally, the best model is stored in saved_model.pkl, and the class dictionary is also saved.

Code for training SVM:

from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

#Pipeline is created to scale the Data.  
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel = 'rbf', C = 10))])
pipe.fit(X_train,y_train)
pipe.score(X_test,y_test)

This gave a score of 0.9770992366412213

Now, let’s get a complete classification report.

print(classification_report(y_test, pipe.predict(X_test)))

This gave the following Output:

Training and testing the other models as mentioned earlier:

from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
model_params = {
    'svm': {
        'model': svm.SVC(gamma='auto',probability=True),
        'params' : {
            'svc__C': [1,10,100,1000],
            'svc__kernel': ['rbf','linear']
        }  
    },
    'random_forest': {
        'model': RandomForestClassifier(),
        'params' : {
            'randomforestclassifier__n_estimators': [1,5,10]
        }
    },
    'logistic_regression' : {
        'model': LogisticRegression(solver='liblinear',multi_class='auto'),
        'params': {
            'logisticregression__C': [1,5,10]
        }
    }
}
scores = []
best_estimators = {}
import pandas as pd
for algo, mp in model_params.items():
    pipe = make_pipeline(StandardScaler(), mp['model'])
    clf =  GridSearchCV(pipe, mp['params'], cv=5, return_train_score=False) 
    # cv=5 => 5-fold cross-validation; the scores of the
    # 5 folds are averaged
    clf.fit(X_train, y_train)
    # Scores are appended and a data frame is created from it
    scores.append({
        'model': algo,
        'best_score': clf.best_score_,
        'best_params': clf.best_params_
    })
    best_estimators[algo] = clf.best_estimator_
    
df = pd.DataFrame(scores,columns=['model','best_score','best_params'])
df

The Final report of scores obtained is as follows:

These are the mean cross-validation scores on the training data. Now, let's get the scores of the models on the test data:

best_estimators['svm'].score(X_test,y_test)
best_estimators['random_forest'].score(X_test,y_test)
best_estimators['logistic_regression'].score(X_test,y_test)

The scores obtained were as follows:

SVM: 0.9770992366412213
Random Forest: 0.9389312977099237
Logistic Regression: 0.9847328244274809

As noted, SVM performs well on both the training data and the test data, so SVM is used in this project.

best_clf = best_estimators['svm']

Now drawing the Confusion Matrix:

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, best_clf.predict(X_test))
import seaborn as sn
plt.figure(figsize = (10,7))
sn.heatmap(cm, annot=True)
plt.xlabel('Predicted')
plt.ylabel('Truth')
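The per-class precision and recall shown in a classification report can be read directly off a confusion matrix. The sketch below uses a made-up 3-class matrix, not this project's results, laid out as scikit-learn does it, with rows for the true class and columns for the predicted class.

```python
import numpy as np

# Made-up confusion matrix for illustration (rows: truth, columns: predicted).
cm_demo = np.array([[8, 1, 1],
                    [0, 9, 1],
                    [2, 0, 8]])

true_pos = np.diag(cm_demo)                   # correctly classified samples per class
recall = true_pos / cm_demo.sum(axis=1)       # correct / all samples of that true class
precision = true_pos / cm_demo.sum(axis=0)    # correct / all samples predicted as that class
accuracy = true_pos.sum() / cm_demo.sum()     # correct / all samples

print("recall:   ", recall)
print("precision:", precision)
print("accuracy: ", accuracy)
```

Off-diagonal cells show which characters get confused with each other, which the single accuracy number hides.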

Saving the model in a pkl file for future use in web apps etc.

!pip install joblib
import joblib 
# Save the model as a pickle in a file 
joblib.dump(best_clf, 'saved_model.pkl')

Saving the dictionary as a JSON file for future use:

import json
with open("class_dictionary.json","w") as f:
    f.write(json.dumps(class_dict))
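When the saved files are used later, the model's numeric predictions have to be mapped back to character names. A minimal sketch of that round trip is shown below; it uses the class numbers printed earlier in this article, writes to a temporary folder so that no real file is touched, and the names class_dict_demo and number_to_name are made up for this example.

```python
import json
import os
import tempfile

# Same name-to-number mapping as the class_dict printed earlier.
class_dict_demo = {
    "Captain America Chris Evans": 0,
    "Iron Man Tony Stark": 1,
    "Black Widow Scarlett Johansson": 2,
    "Hulk Mark Ruffalo": 3,
    "Thor Chris Hemsworth": 4,
}

# Save to a temporary folder, then read the file back.
path = os.path.join(tempfile.mkdtemp(), "class_dictionary.json")
with open(path, "w") as f:
    json.dump(class_dict_demo, f)

with open(path) as f:
    loaded = json.load(f)

# Invert the mapping: class number -> character name.
number_to_name = {number: name for name, number in loaded.items()}
print(number_to_name[4])  # Thor Chris Hemsworth
```

The inverted dictionary is what a web app would use to turn the output of the model's predict call into a readable label.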

A model has now been built successfully, and it can be used further to build websites.

The complete code is available on GitHub.

Thank You for spending your valuable time in reading this article. Do let me know your views and suggestions in comments.