Preprocessing in Machine Learning

Jul 14, 2021

If we want someone else to do a job for us, they first have to be trained for that kind of work. A machine learns a task in much the same way, and how well it has learned is measured with metrics. The most commonly used metrics are accuracy and precision.
As we discussed in our previous articles, a human learns quickly but needs a lot of time to get through a huge amount of work. A machine, on the other hand, learns slowly but then does the work quickly.

Coming to preprocessing: it makes the data clean enough for a model to learn the patterns and edges it needs to make a correct prediction. So it is a must before passing data to a neural network for training and testing. Some of the most commonly used preprocessing techniques for image classification are:

1. Image Resizing :
Resizing changes the size of the original image. It is necessary before training because the images in a dataset come from different sources (collected online, taken with a mobile phone, etc.) and so may vary in size. To give all the images a common base size, we resize every one of them. We mostly convert images to 224*224, which is large enough to keep the shapes and patterns that have to be learnt while making the computation less complex (as the number of pixels decreases, the computational cost decreases). But if we use a much smaller size, the model may not perform well, because fine details are lost.
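To make the idea concrete, here is a minimal sketch of resizing in plain Python. It assumes a grayscale image stored as a list of rows and uses nearest-neighbour sampling, which is only one of several possible interpolation choices. The function name `resize_nearest` is made up for this illustration; in practice a library call such as `cv2.resize` would normally do this job.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2D grayscale image (list of rows) with nearest-neighbour sampling."""
    old_height = len(image)
    old_width = len(image[0])
    resized = []
    for y in range(new_height):
        # Map each output row back to its corresponding source row.
        src_y = min(old_height - 1, y * old_height // new_height)
        row = []
        for x in range(new_width):
            # Same mapping for the columns.
            src_x = min(old_width - 1, x * old_width // new_width)
            row.append(image[src_y][src_x])
        resized.append(row)
    return resized


# A tiny 2x2 "image" enlarged to 4x4 so the result is easy to inspect.
small = [[0, 255],
         [255, 0]]
big = resize_nearest(small, 4, 4)

# The same call brings an arbitrary input to the 224*224 base size.
base = resize_nearest([[10, 20, 30], [40, 50, 60]], 224, 224)
```

Every image ends up with the same shape, whatever its original size was, which is what the network needs at its input.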

2. Removing Noise :
This is the process of removing noise from the image to make learning easier. We use this preprocessing function to smooth the image and reduce unwanted noise. It is mostly done using Gaussian blur, which replaces each pixel with a weighted average of its neighbours. The visual effect is as if the image were seen through a translucent screen.
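As a rough sketch of what a Gaussian blur does under the hood, the plain-Python code below builds a small Gaussian kernel and applies it first along the rows and then along the columns. The kernel size, the sigma value, the edge handling (border pixels are repeated) and all function names are assumptions chosen for this example. Libraries such as cv2 offer the same operation ready-made, for instance through `cv2.GaussianBlur`.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a normalised 1D Gaussian kernel of odd length `size`."""
    half = size // 2
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Blur each row with the kernel, repeating edge pixels at the borders."""
    half = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                src_x = min(max(x + k - half, 0), width - 1)
                acc += w * row[src_x]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    """Swap rows and columns so the row blur can be reused for columns."""
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, size=5, sigma=1.0):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(size, sigma)
    blurred_rows = convolve_rows(image, kernel)
    blurred = convolve_rows(transpose(blurred_rows), kernel)
    return transpose(blurred)


# A flat gray image with one bright noisy pixel in the middle.
noisy = [[100] * 7 for _ in range(7)]
noisy[3][3] = 255
smooth = gaussian_blur(noisy, size=5, sigma=1.0)
```

After the blur, the bright pixel is spread over its neighbours, so it stands out far less than before.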

3. Binarization :
It is nothing but converting an RGB image into a black and white image. This is done to keep the model from focusing too much on coloured patterns, where the pixel values are higher and the model tends to give them more weight. So this preprocessing step separates all the pixels into white and black based on a threshold. The important thing to notice here is choosing the best threshold.
The white pixel value is 255 and that of black is 0.
With a threshold of, let's say, 127, pixels with a value less than 127 are treated as black and the remaining ones as white. This is mostly helpful in text detection and recognition type problems.
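A small plain-Python sketch of this step is shown below. It assumes the usual convention that pixels above the threshold become white (255) and the rest become black (0), and it converts RGB to grayscale with a plain channel average, which is a simplification (weighted luminance formulas are also common). The function names are made up for this example.

```python
def to_grayscale(rgb):
    """Convert rows of (R, G, B) tuples to gray values with a plain channel average."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in rgb]


def binarize(gray, threshold=127):
    """Map pixels above `threshold` to 255 (white) and the rest to 0 (black)."""
    return [[255 if pixel > threshold else 0 for pixel in row] for row in gray]


# A 2x2 RGB image: white, dark blue-gray, reddish, black.
rgb = [[(255, 255, 255), (10, 20, 30)],
       [(200, 100, 90), (0, 0, 0)]]
gray = to_grayscale(rgb)
binary = binarize(gray, threshold=127)
```

Every pixel of `binary` is now either 0 or 255, so the model only sees the shapes and not the colours.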

But there is a stumbling block here: if the lighting varies across the image, it becomes difficult to set one common threshold for all the pixels of that image, and in that case this step may not be useful. You can see such a case in the figure below.

There are many methods for setting the threshold in this preprocessing step. The most commonly used one is Otsu's threshold, which picks the threshold automatically from the image's own intensity histogram, so it adapts to the brightness and contrast of that particular image.
We also have adaptive thresholding where, instead of using a single threshold for the whole image, a different one is computed for each pixel based on the characteristics of its neighbouring pixels. These thresholding methods are predefined in computer vision libraries like cv2, and simple thresholding can also be done with image libraries like PIL.
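For readers who want to see how such a threshold can be chosen automatically, here is a minimal plain-Python sketch of Otsu's method. It assumes 8-bit gray values (0 to 255) and tries every possible threshold, keeping the one that best separates the dark group of pixels from the bright group (the largest between-class variance). The function name and the sample values are made up for this example.

```python
def otsu_threshold(gray):
    """Return the threshold that maximises the between-class variance."""
    histogram = [0] * 256
    for row in gray:
        for pixel in row:
            histogram[pixel] += 1

    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        # Pixels with value <= level form the dark (background) class.
        weight_bg += histogram[level]
        sum_bg += level * histogram[level]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance = variance
            best_threshold = level
    return best_threshold


# Dark pixels around 50 and bright pixels around 200.
gray = [[50, 52, 48, 200],
        [51, 49, 198, 202]]
threshold = otsu_threshold(gray)
binary = [[255 if p > threshold else 0 for p in row] for row in gray]
```

The chosen threshold falls between the dark group and the bright group, so no value has to be picked by hand.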
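The per-pixel idea can be sketched in plain Python as follows. This is a simplified version that compares each pixel with the mean of its neighbourhood minus a small offset; the block size, the offset value and the function name are assumptions chosen for this example. The sample image gets brighter from left to right, with one dark "ink" pixel on the dark side and one on the bright side.

```python
def adaptive_threshold(gray, block_size=3, offset=20):
    """Binarise each pixel against the mean of its neighbourhood minus `offset`."""
    height, width = len(gray), len(gray[0])
    half = block_size // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbours that fall inside the image.
            neighbours = [
                gray[ny][nx]
                for ny in range(max(0, y - half), min(height, y + half + 1))
                for nx in range(max(0, x - half), min(width, x + half + 1))
            ]
            local_threshold = sum(neighbours) / len(neighbours) - offset
            row.append(255 if gray[y][x] > local_threshold else 0)
        result.append(row)
    return result


# Background brightens from left to right; ink pixels at (1, 1) and (1, 4).
uneven = [[40, 70, 100, 130, 160, 190],
          [40, 10, 100, 130, 110, 190],
          [40, 70, 100, 130, 160, 190]]
local_binary = adaptive_threshold(uneven, block_size=3, offset=20)

# For comparison: one global threshold of 127 for the whole image.
global_binary = [[255 if p > 127 else 0 for p in row] for row in uneven]
```

With the global threshold the whole dark side turns black and the ink pixel there is lost, while the per-pixel version marks only the two ink pixels as black.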

4. Noise Removal :
As we know, noise hides important information in an image and makes it harder to detect the patterns and edges present in it. So removing that noise is the primary objective of this preprocessing step. This is done by smoothing the image and removing the small dots/patches that have a higher intensity than the rest of the image. Noise removal is performed for both coloured and binary images.
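The article does not name a specific method for removing small dots, so the sketch below uses a median filter as one possible choice: each pixel is replaced by the median of its neighbourhood, which wipes out isolated bright specks. The window size and the function name are assumptions made for this example, and the code is plain Python on a single-channel image.

```python
def median_filter(image, size=3):
    """Replace each pixel with the median of its size x size neighbourhood."""
    half = size // 2
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Sort the neighbours that fall inside the image and take the middle one.
            window = sorted(
                image[ny][nx]
                for ny in range(max(0, y - half), min(height, y + half + 1))
                for nx in range(max(0, x - half), min(width, x + half + 1))
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out


# A black binary image with a single bright speck in the middle.
speckled = [[0] * 5 for _ in range(5)]
speckled[2][2] = 255
cleaned = median_filter(speckled, size=3)
```

The same function works on binary and grayscale images alike; for a colour image it could be applied to each channel separately.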

These are some of the most commonly used preprocessing techniques for image classification. Data augmentation is a pre-training step: if we are short of data to train a model, we use this technique to increase the data count. It includes many techniques like flip, rotate, zoom, crop, adding noise, etc. So, let's discuss that in our next article.

Thank you for reading! Follow me and stay tuned for more interesting topics.

Written by Sai Chandra Nerella
