Chessboard image recognition based on OpenCV

Posted by itsjareds on Mon, 14 Feb 2022 03:46:30 +0100

In this issue, we will learn how to use computer vision technology to recognize chess pieces and their positions on the chessboard

We use computer vision technology and convolutional neural network (CNN) to create a classification algorithm for this project and determine the position of chess pieces on the chessboard. The final application will save the whole image and display it visually, and output the 2D image of the chessboard to view the results.

(left) the frame fed by the real-time camera and the (right) two-dimensional image of the chessboard

01. Data

We have high requirements for the data set of the project, because it will eventually affect our experimental results. The chess data sets we can find on the Internet are taken with different chess sets and different cameras, which leads us to create our own data sets. I generated a custom dataset using chess and a camera (GoPro Hero6 Black from a "first person perspective"), which made my model more accurate. The dataset contains 2406 images, divided into 13 categories (see below). Summary: this takes us a lot of time, but it makes the training image as close as possible to the image we see when we use it in the application.

Subdivisions of custom datasets

To build this dataset, I first created, when you click the S key, the video will get a frame from the video stream and save it. This program enables me to seamlessly change the pieces on the chessboard and capture the image of the chessboard again and again until I have established a large number of different chessboard configurations. Next, I created to cut it into individual pieces using the detection techniques discussed in the next section. Finally, I classify the cropped images by dividing them into labeled folders.

02. Chessboard detection

For chessboard detection, what I want to do is much more complex than using the OpenCV function findchess board corners, but it is not as advanced as CNN. Use low-level and intermediate computer vision technology to find the features of the chessboard, and then convert these features into the coordinates of the outer boundary and 64 independent squares. This process takes the intersection of horizontal and vertical lines generated by Canny edge detection and Hough transform as the center. Hierarchical clustering is used to group intersections by distance and average each group to create final coordinates (see below).

Complete chessboard detection process

03. Chessboard classification

At the beginning of the project, we want to use Keras / TensorFlow to create CNN model and classify the pieces. However, after creating the dataset, only considering the size of CNN, CNN alone can not obtain the desired results. To overcome this obstacle, I used ImageDataGenerator and transfer learning, which added my data and used other pre trained models as the basis.

Create CNN model

In order to use GPU, I created and trained CNN model in the cloud, which greatly reduced the training time. Quick tip: Google Colab is an easy way to get started using GPU. To improve the effectiveness of the data, I used the ImageDataGenerator to expand the original image and expose the model to different versions of the data. The ImageDataGenerator function randomly rotates, rescales, and flips (horizontally) the training data for each period, essentially creating more data. Although there are more conversion options, these conversion options are most effective for the project.

from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
test_datagen = ImageDataGenerator(rescale=1./255)
train_gen = datagen.flow_from_directory(
    folder + '/train',
    target_size = image_size,
    batch_size = batch_size,
    class_mode = 'categorical',
    color_mode = 'rgb',
test_gen = test_datagen.flow_from_directory(
    folder + '/test',
    target_size = image_size,
    batch_size = batch_size,
    class_mode = 'categorical',
    color_mode = 'rgb',

Instead of training the model from scratch, we realized transfer learning by using the pre trained model and adding the top-level model trained with my custom dataset. I followed a typical transfer learning workflow:

1. Obtain layers from the previously trained model (VGG16).

from keras.applications.vgg16 import VGG16
model = VGG16(weights='imagenet')

2. Freeze them to avoid damaging any information they contain in the training round.

3. Add a new trainable layer at the top of the frozen layer.

from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Flatten
from keras.models import Model
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224,224,3)) 
# Freeze convolutional layers from VGG16
for layer in base_model.layers:
    layer.trainable = False
# Establish new fully connected block
x = base_model.output
x = Flatten()(x) 
x = Dense(500, activation='relu')(x) 
x = Dense(500, activation='relu')(x)
predictions = Dense(13, activation='softmax')(x)
# This is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['categorical_accuracy'])

4. Train a new layer on a custom dataset.

epochs = 10
history =
    verbose = 1,

When we use VGG16 or VGG19 as the pre training model to create the model, we choose the model using VGG16 because of the higher verification accuracy. In addition, the best epochs is 10. Any number greater than 10 will not improve the accuracy of verification, nor will it increase the difference between training and verification accuracy. Conclusion: transfer learning enables us to make full use of the advantages of deep learning in image classification without large data sets.

04. Results

In order to better visually verify the accuracy, I created the confusion matrix of model prediction. With this chart, you can easily evaluate the advantages and disadvantages of the model. Advantages: the empty - accuracy rate is 99%, and the recall rate is 100%; White and black (WP and BP) - F1 scores about 95%. Disadvantages: White Knight (WN) - high recall rate (98%), but low accuracy (65%); White bishop (WB) - lowest recall rate, 74%.

Confusion matrix of test data

05. Application

The goal of the application is to use the CNN model and visualize the performance of each step. We created, which clearly shows the steps and creates, which shows the details of each step. This application saves the original frame in the real-time video stream, 64 cropped images of each square and the final 2D image of the chessboard.

# Save the frame to be analyzed        
cv2.imwrite('frame.jpeg', frame)  
# Low-level CV techniques (grayscale & blur)        
img, gray_blur = read_img('frame.jpeg') 
# Canny algorithm        
edges = canny_edge(gray_blur) 
# Hough Transform        
lines = hough_line(edges)   
# Separate the lines into vertical and horizontal lines        h_lines, v_lines = h_v_lines(lines) 
# Find and cluster the intersecting        
intersection_points = line_intersections(h_lines, v_lines)        points = cluster_points(intersection_points)   
# Final coordinates of the board        
points = augment_points(points)       
# Crop the squares of the board a organize into a sorted list        x_list = write_crop_images(img, points, 0)        
img_filename_list = grab_cell_files()        img_filename_list.sort(key=natural_keys)   
# Classify each square and output the board in Forsyth-Edwards Notation (FEN)        
fen = classify_cells(model, img_filename_list) 
# Create and save the board image from the FEN        
board = fen_to_image(fen)      
# Display the board in ASCII        
# Display and save the chessboard image        
board_image = cv2.imread('current_board.png')        cv2.imshow('current board', board_image)  

Code link: