Recognizing Handwritten Digits With Scikit-Learn

Dilnaz N
4 min read · Jun 11, 2021

Handwritten Digits

Recognizing handwritten text has been an issue since the first automatic computers needed to distinguish individual characters in handwritten documents.

Consider the ZIP codes on mail at the post office and the technology required to recognize these five digits. To sort mail automatically and efficiently, these codes must be perfectly recognized.

The scikit-learn library for Python provides a good example for exploring this problem: it illustrates the technique, the issues involved, and how predictions can be made.

Objective

The aim is to predict handwritten digits and validate the model using several different training and test sets.
We stated the hypothesis to be tested as follows:

Some data scientists claim that a model trained on the Digits dataset from the scikit-learn library predicts the digit correctly 95% of the time.

Digits Dataset

The scikit-learn library provides numerous datasets that are useful for testing data analysis and prediction problems. One of them is a dataset of images called Digits.
This dataset consists of 1,797 images that are 8x8 pixels in size. Each image is a handwritten digit in grayscale.

So, let’s get started

Source Code

The source code is available on GitHub.

Implementation

Importing the Python Libraries

Importing Libraries
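The import cell did not survive as text, so here is a minimal sketch of the imports the later steps would need. The exact set is an assumption based on what the article describes (loading Digits, plotting, SVC, splitting, and reporting).

```python
# Assumed imports for this walkthrough; the original import cell is not shown.
import matplotlib.pyplot as plt                      # drawing the digit images
from sklearn import datasets, svm                    # Digits dataset and the SVC estimator
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
```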

Loading the Dataset

After loading the dataset, we can read its description through the DESCR attribute.
The output contains a textual description of the dataset, the authors who contributed to it, and the references.

Dataset Information
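A minimal sketch of loading the dataset and printing its description. The variable name `digits` is a choice made here; `load_digits` and `DESCR` are part of scikit-learn.

```python
from sklearn import datasets

# Load the Digits dataset that ships with scikit-learn (no download needed).
digits = datasets.load_digits()

# DESCR holds the plain-text description of the dataset.
print(digits.DESCR)
```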

The digits.images array holds the images of the handwritten digits. Each element of this array is an image represented by an 8x8 matrix of numeric values corresponding to a grayscale ranging from white (value 0) to black (value 16).

Visualization Of the Array
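To see what one of these matrices looks like, a short sketch can print the first image together with its label. It assumes the dataset is loaded into a variable named `digits`.

```python
from sklearn import datasets

digits = datasets.load_digits()

print(digits.images.shape)   # (number of images, height, width)
print(digits.images[0])      # the 8x8 matrix of pixel intensities for the first image
print(digits.target[0])      # the label stored for that image
```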

Visualization Of the 10 Digits
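One way to draw the first ten images is a small matplotlib grid. This is a sketch, not necessarily the plotting code used for the original figure; the 2x5 layout and the `gray_r` colormap are choices made here.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# A 2x5 grid: one panel per image, titled with its label.
fig, axes = plt.subplots(2, 5, figsize=(8, 4))
for ax, image, label in zip(axes.ravel(), digits.images, digits.target):
    # gray_r maps low values to white and high values to black.
    ax.imshow(image, cmap="gray_r", interpolation="nearest")
    ax.set_title(f"Digit: {label}")
    ax.axis("off")
plt.show()
```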

The images used as inputs are 8x8 grayscale images. To feed them to a classifier, we flatten each image into an array of 64 pixel values, so that each pixel becomes one feature (column) for the classifier.
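The flattening step can be sketched with a NumPy reshape. The names `X` and `y` are conventions chosen here, not taken from the article.

```python
from sklearn import datasets

digits = datasets.load_digits()

n_samples = len(digits.images)
# Reshape (n_samples, 8, 8) into (n_samples, 64): one row per image, one column per pixel.
X = digits.images.reshape((n_samples, -1))
y = digits.target

print(X.shape, y.shape)
```

scikit-learn also exposes the same flattened form directly as `digits.data`, so either can be passed to the classifier.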

Defining the Model

For this task, sklearn.svm.SVC, which implements Support Vector Classification (SVC), is an effective estimator.
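Creating the estimator takes one line. The article does not state which hyperparameters were used, so `gamma=0.001` below is an assumption borrowed from scikit-learn's own digits example; everything else is left at the library defaults.

```python
from sklearn import svm

# gamma=0.001 is an assumed setting, not a value confirmed by this article.
svc = svm.SVC(gamma=0.001)
print(svc)
```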

Here, we consider three cases. Each case uses a different split of the data into training and test sets.
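The three cases differ only in the test size, so the whole experiment can be sketched as one loop. The helper `evaluate`, the seed `random_state=0`, and `gamma=0.001` are all assumptions made for this illustration, so the accuracies it prints will not necessarily match the figures reported in this article.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X, y = digits.data, digits.target


def evaluate(test_size, random_state=0):
    """Hypothetical helper: split, train an SVC, and score it on the held-out part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = svm.SVC(gamma=0.001)  # assumed hyperparameter
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return len(X_train), len(X_test), accuracy


# One entry per case: test sizes 0.01, 0.7, and 0.9.
results = {test_size: evaluate(test_size) for test_size in (0.01, 0.7, 0.9)}
for test_size, (n_train, n_test, accuracy) in results.items():
    print(f"test_size={test_size}: train={n_train}, test={n_test}, accuracy={accuracy:.3f}")
```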

Case-1

Splitting the dataset with a test size of 0.01 (1% of the images held out for testing)
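A sketch of this split using `train_test_split`. The seed `random_state=0` is an assumption added only to make the split repeatable; the article does not say whether a seed was used.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X, y = digits.data, digits.target   # digits.data is the flattened 64-feature form

# Hold out 1% of the images for testing; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.01, random_state=0
)
print(len(X_train), "training images,", len(X_test), "test images")
```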

Training and Prediction Using SVC
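Training and prediction follow the usual `fit` / `predict` pattern. The sketch below assumes the 0.01 split, a fixed seed, and `gamma=0.001`, none of which are confirmed by the article.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.01, random_state=0  # assumed seed
)

svc = svm.SVC(gamma=0.001)        # assumed hyperparameter
svc.fit(X_train, y_train)         # learn from the training images
predicted = svc.predict(X_test)   # predict labels for the held-out images

print("Predicted:", predicted)
print("Actual:   ", y_test)
```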

The Accuracy and the Classification Report
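The accuracy and the per-class report can be computed with `accuracy_score` and `classification_report` from `sklearn.metrics`. As before, the seed and `gamma` value are assumptions, so the printed numbers may differ from those quoted in this article.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.01, random_state=0  # assumed seed
)

svc = svm.SVC(gamma=0.001).fit(X_train, y_train)  # assumed hyperparameter
predicted = svc.predict(X_test)

accuracy = accuracy_score(y_test, predicted)        # fraction of correct predictions
report = classification_report(y_test, predicted)   # precision, recall, f1 per digit
print(f"Accuracy: {accuracy:.3f}")
print(report)
```

With a test size of 0.01 the test set holds only about 18 images, so a single misclassification would shift the accuracy by more than five percentage points.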

For Test Case-1, the accuracy is found to be 100%.

Case-2

Splitting the dataset with a test size of 0.7 (70% of the images held out for testing)

The Accuracy and the Classification Report

For Test Case-2, the accuracy is found to be 98.4%.

Case-3

Splitting the dataset with a test size of 0.9 (90% of the images held out for testing)

The Accuracy and the Classification Report

For Test Case-3, the accuracy is found to be 96.8%.

Conclusion

After evaluating the model on three separate test cases, the accuracy was 100%, 98.4%, and 96.8%, all above 95%. We can therefore conclude that the results support the stated hypothesis, i.e., the model correctly predicts the digit at least 95% of the time.

"I am thankful to mentors at https://internship.suvenconsultants.com for providing awesome problem statements and giving many of us a Coding Internship Experience. Thank you www.suvenconsultants.com"
