Digit Recognition Using SK-Learn
Next step in AI
In this step we will dig into machine learning algorithms from a more practical point of view.
Don’t forget to read the last post ->
Today we will use Scikit-Learn (a Python library), so we don’t have to write any machine learning algorithm from scratch. You only need to know basic Python syntax, that’s all. Here we go.
If you have never used scikit-learn, I think you should start with a hello world program in scikit-learn to get familiar with its model-building pipeline. There you can find the basics, like installation and some of the models scikit-learn provides.
Today we will tackle a multi-class image classification problem. What is multi-class? It means we have more than two classes, rather than a binary problem (e.g. dog vs. cat, only two classes). We will try to recognize handwritten digits.
Many open source data sets are available that contain handwritten digits from 0 to 9 with their true labels, for example a number of images of a handwritten one (1) together with the label 1. So this is supervised learning.
What can we do? We can train any supervised algorithm to find the mapping between images and labels, and then take a new digit image and predict its label. Let’s start. I am using Jupyter Notebook, but you can use whatever you want.
You can download the code with the notebook from GitHub: https://github.com/parthvadhadiya/Digit-Recognition-using-SK-Learn
It’s time to import the libraries into your current environment. Yes, it’s this easy.
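The import cell is not reproduced in this text, so here is a minimal sketch of what it could look like. The exact set of imports is my assumption, based on the tools mentioned in this post (NumPy, matplotlib, svm and train_test_split).

```python
# Assumed imports for this walkthrough (based on the tools named in the post)
import numpy as np                                     # array handling and reshaping
import matplotlib.pyplot as plt                        # plotting the digit images
from sklearn import datasets, svm                      # bundled data sets and the SVM classifier
from sklearn.model_selection import train_test_split   # splitting into train and test sets
```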
scikit-learn ships with many data sets for practice, so we just pull the digits data set with its load_digits() function.
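A minimal sketch of loading the data set could look like this. The variable name digits matches the digits.images and digits.target names used in this post.

```python
from sklearn import datasets

# Load the bundled handwritten-digits data set (no download needed)
digits = datasets.load_digits()

# The returned object behaves like a dictionary; list what it contains
print(digits.keys())
```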
Let’s explore the dataset.
Here, the dataset is divided into two parts: digits.images contains the handwritten digits (X) and digits.target contains the labels (Y). Both are NumPy arrays of length 1797. That means we have 1797 images and their true labels.
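To check this yourself, something like the following sketch should work. The names X and Y are only illustrative.

```python
from sklearn import datasets

digits = datasets.load_digits()

X = digits.images   # handwritten digit images
Y = digits.target   # true label (0-9) for each image

print(type(X), type(Y))   # both are NumPy arrays
print(len(X), len(Y))     # 1797 1797
```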
Hmm, what does it look like?
The shape attribute returns the shape of a matrix or vector. As we can see, the images are stored in a 3-dimensional array: the first index is the length of the dataset (total samples), followed by the height and width of the images, 8*8. Most algorithms expect each sample as a one-dimensional array, and scikit-learn estimators in particular expect a 2-dimensional array of shape (samples, features), so we have to flatten each 8*8 image into 64 values. For that we use NumPy’s reshape function. As a result we have a (1797, 64) matrix.
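As a rough sketch, the reshape step could be written like this. The names n_samples and X are my own choice, not necessarily the ones used in the notebook.

```python
from sklearn import datasets

digits = datasets.load_digits()

# 3-D array: (number of samples, image height, image width)
print(digits.images.shape)   # (1797, 8, 8)

# Flatten every 8x8 image into a row of 64 features;
# -1 lets NumPy work out the second dimension (8 * 8 = 64)
n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))

print(X.shape)               # (1797, 64)
```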
Here, we print the first image. If you look at where the zeros and the other values are, you can see the digit zero. For this particular image, the zero values are in the middle and around the outside, and the values between them are non-zero. Let’s see this in a more realistic way.
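Printing the first image as raw numbers can be sketched as follows. Each value is a pixel intensity, with 0 meaning an empty pixel.

```python
from sklearn import datasets

digits = datasets.load_digits()

# The first sample as an 8x8 grid of pixel intensities
first_image = digits.images[0]
print(first_image)
```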
Here, you can easily recognize the digit. I used matplotlib to plot the image from the NumPy array, but you can use whatever you are comfortable with. We also print the label of the digit.
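One possible way to draw the image with matplotlib is sketched below. The colour map and the title are my own choices, not something taken from the notebook.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# Draw the first 8x8 image; "gray_r" shows high values as dark ink (assumed styling)
fig, ax = plt.subplots()
ax.imshow(digits.images[0], cmap="gray_r", interpolation="nearest")
ax.set_title(f"Label: {digits.target[0]}")
ax.axis("off")

# In Jupyter the figure renders at the end of the cell;
# in a plain script, call plt.show() to open the window.
print(digits.target[0])   # the true label of this image
```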
I hope you now have a clear picture of the data set. Now we will build a classification model with sklearn. I hope you visited the Medium link that I mentioned earlier, so you have an understanding of how classification works. There I illustrated binary classification using a fake dog vs. cat data set. Here we have 10 classes with 64 features.
So let’s start. We import svm from sklearn, then initialize the classifier with a gamma of 0.001.
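A minimal sketch of this step is shown below. I am assuming the SVC class here, since the text only says svm with a gamma of 0.001. The name classifier is illustrative too.

```python
from sklearn import svm

# Support vector classifier with gamma=0.001 (SVC is an assumption;
# the post only mentions svm and the gamma value)
classifier = svm.SVC(gamma=0.001)

print(classifier)
```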
As in every machine learning program, we divide the data into training and testing sets so we can evaluate the model after training. For that, scikit-learn has an excellent function, train_test_split(). Here we split the data into training and testing sets with this function.
Next we check how many samples are in the training set and the testing set respectively. Note that we have kept the testing size at 33%.
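The split and the size check could be sketched like this. The random_state value is my own addition so the split is repeatable. The variable names are illustrative.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))   # (1797, 64)
Y = digits.target

# Keep 33% of the samples for testing; random_state is an assumption
# added here only to make the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42
)

print(len(X_train), len(X_test))   # training samples, testing samples
```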
Next we fit the data to the svm classifier that we initialized earlier. Lastly we check the accuracy of the trained model. Here we get a result of 0.9898989898989.
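Fitting and scoring can be sketched as below. Scoring on the testing set and the random_state value are my assumptions, so the exact number you get may differ from the one quoted in this post.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.33, random_state=42   # random_state is an assumption
)

# Train the classifier on the training set
classifier = svm.SVC(gamma=0.001)
classifier.fit(X_train, y_train)

# Fraction of correctly classified samples on the held-out testing set
accuracy = classifier.score(X_test, y_test)
print(accuracy)
```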
Finally we are ready to predict. We take one sample from the testing set, reshape it for plotting, and plot it with its true label. Then we pass that image into our classifier to predict, and compare the prediction with the true label.
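Putting the prediction step together, a rough sketch could look like this. Picking index 0 of the testing set, the random_state value and all variable names are my own choices for illustration.

```python
import matplotlib.pyplot as plt
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.33, random_state=42   # random_state is an assumption
)
classifier = svm.SVC(gamma=0.001).fit(X_train, y_train)

# Take one sample from the testing set (index 0 is an arbitrary choice)
sample, true_label = X_test[0], y_test[0]

# Reshape the 64 features back to 8x8 only for plotting
image = sample.reshape(8, 8)
fig, ax = plt.subplots()
ax.imshow(image, cmap="gray_r", interpolation="nearest")
ax.set_title(f"True label: {true_label}")

# predict() expects a 2-D array, so pass the sample as one row of 64 features
predicted = classifier.predict(sample.reshape(1, -1))[0]
print("predicted:", predicted, "true:", true_label)
```

Change the index to try other samples from the testing set.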
We get the correct answer with the simplest model! You can also try other samples.
=> NOTE:- This post was originally published at http://www.programmings4beginners.com/digit-recognition-using-sk-learn/