Compare the similarity of two pictures

Posted by cosmoparty on Sat, 18 Dec 2021 00:07:10 +0100

1. Cosine similarity

from PIL import Image
from numpy import average, linalg, dot


def get_thumbnail(image, size=(30, 30), greyscale=False):
    # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter
    image = image.resize(size, Image.LANCZOS)
    if greyscale:
        image = image.convert('L')
    return image


def image_similarity_vectors_via_numpy(image1, image2):
    image1 = get_thumbnail(image1)
    image2 = get_thumbnail(image2)
    images = [image1, image2]
    vectors = []
    norms = []
    for image in images:
        vector = []
        for pixel_tuple in image.getdata():
            vector.append(average(pixel_tuple))
        vectors.append(vector)
        norms.append(linalg.norm(vector, 2))
    a, b = vectors
    a_norm, b_norm = norms
    res = dot(a / a_norm, b / b_norm)
    return res

import os

# Folders holding the two sets of images to compare
path = './duibi/voucher/'
path2 = './duibi/inkpad/'
files2 = os.listdir(path2)

# Compare every image under `path` with every image in `path2`
lst_data = []
for root, dirs, files in os.walk(path):
    for file in files:
        image1 = Image.open(os.path.join(root, file))
        for f in files2:
            image2 = Image.open(os.path.join(path2, f))
            cosin = image_similarity_vectors_via_numpy(image1, image2)
            lst_data.append(cosin)

# All scores, then the lowest and highest similarity found
print(lst_data)
print(min(lst_data))
print(max(lst_data))

Picture 1:
shape:(26, 86)
Picture 2:
shape:(25, 85)
The two pictures differ by one pixel in each dimension, and the cosine similarity is 0.9991.
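
For reference, the score computed by the script boils down to a few lines. This is a minimal sketch in plain Python, assuming both images have already been flattened into equal-length lists of greyscale values; the name `cosine_similarity` is illustrative and does not appear in the original script.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Scaling every pixel by the same factor leaves the angle unchanged,
# so a uniformly brighter copy still scores (almost exactly) 1.
dark = [10, 20, 30, 40]
bright = [20, 40, 60, 80]
print(cosine_similarity(dark, bright))
```

Because pixel values are never negative, the score for two real images always lies in [0, 1] and tends to sit near the top of that range, so small differences in the score can correspond to large visual differences.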

2. SSIM (structural similarity measure)

SSIM is a full-reference image quality metric: it needs an undistorted reference image, and it measures similarity in three respects: luminance, contrast and structure.

SSIM values lie in [-1, 1], and in practice usually fall within [0, 1]. A value of 1 means the two images are identical; the larger the value, the smaller the image distortion.

In practice, a sliding window divides the image into N blocks. To reduce the influence of the window shape on each block, Gaussian weighting is used when computing the mean, variance and covariance of each window, and the SSIM of the corresponding block is then calculated. Finally, the average over all blocks is taken as the structural similarity of the two images, that is, the mean SSIM.
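
The per-window calculation can be sketched with NumPy. This is a simplified illustration, not the library implementation: it assumes a single window covering the whole image (no sliding window, no Gaussian weighting), 8-bit greyscale data, and the commonly used constants K1 = 0.01 and K2 = 0.03. The name `global_ssim` is made up for this example.

```python
import numpy as np


def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM of two equally shaped greyscale arrays, treated as one window."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("inputs must have the same shape")
    c1 = (k1 * data_range) ** 2  # stabilises the luminance term
    c2 = (k2 * data_range) ** 2  # stabilises the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32))
print(global_ssim(img, img))       # identical inputs score 1 (up to rounding)
print(global_ssim(img, img // 2))  # a darker, lower-contrast copy scores noticeably below 1
```

A full implementation would evaluate the same expression inside each window and average the results.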

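from skimage.metrics import structural_similarity  # formerly skimage.measure.compare_ssim
from skimage.io import imread  # scipy.misc.imread was removed in SciPy 1.2
import numpy as np

img1 = imread('1.jpg')
img2 = imread('2.jpg')

# Force img2 to img1's shape. Note that np.resize does not interpolate:
# it truncates or repeats the flattened pixel data.
img2 = np.resize(img2, (img1.shape[0], img1.shape[1], img1.shape[2]))

print(img2.shape)
print(img1.shape)
# channel_axis=-1 replaces the older multichannel=True argument
ssim = structural_similarity(img1, img2, channel_axis=-1)

print(ssim)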

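Using the same two pictures, the result is 0.146, far lower than the cosine similarity score.
Part of the gap probably comes from np.resize, which reshuffles the flattened pixel data instead of resampling the image, so the two arrays are no longer spatially aligned.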
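
The effect of `np.resize` on pixel alignment is easy to see on a tiny array. The sketch below compares it with a simple nearest-neighbour resampler; `resize_nearest` is a hypothetical helper written for this illustration, and in real code a library resampler such as Pillow's `Image.resize` would be the more usual choice.

```python
import numpy as np


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resampling: pick the source pixel for each target pixel."""
    rows = np.arange(new_h) * img.shape[0] // new_h
    cols = np.arange(new_w) * img.shape[1] // new_w
    return img[rows[:, None], cols]


a = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

flat_cut = np.resize(a, (2, 3))        # just the first six values, re-wrapped
resampled = resize_nearest(a, 2, 3)    # rows and columns stay aligned

print(flat_cut.tolist())   # [[0, 1, 2], [3, 4, 5]]
print(resampled.tolist())  # [[0, 1, 2], [4, 5, 6]]
```

In the `np.resize` output the second row starts with 3, which was the end of the first row in the source, so every row after the first is shifted relative to the original image.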

3. Histogram based

A histogram describes the global distribution of colors in an image. Comparing histograms is an entry-level method for computing image similarity.

from PIL import Image


def make_regular_image(img, size=(256, 256)):
    # Bring every image to the same size and color mode before comparing
    return img.resize(size).convert('RGB')


def hist_similar(lh, rh):
    # Average per-bin agreement: 1 for equal counts, less as the counts diverge
    assert len(lh) == len(rh)
    return sum(1 - (0 if l == r else float(abs(l - r)) / max(l, r)) for l, r in zip(lh, rh)) / len(lh)


def calc_similar(li, ri):
    return hist_similar(li.histogram(), ri.histogram())


if __name__ == '__main__':
    img1 = Image.open('1.jpg')
    img1 = make_regular_image(img1)
    img2 = Image.open('2.jpg')
    img2 = make_regular_image(img2)
    print(calc_similar(img1, img2))

The result is 0.845
A histogram is too coarse a description: it captures only the color distribution and ignores where in the image each color appears. Any two images with similar color distributions are judged highly similar, whatever their content, which is clearly unreasonable.
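
This weakness can be shown without any image files. The sketch below uses plain lists of greyscale values as stand-ins for images and reuses the per-bin formula from `hist_similar`; the `histogram` helper is an assumption made for this example and only mimics what `Image.histogram()` returns for a single-channel image.

```python
def histogram(pixels, bins=256):
    """Count how many pixels fall on each intensity level."""
    counts = [0] * bins
    for value in pixels:
        counts[value] += 1
    return counts


def hist_similar(lh, rh):
    """Mean per-bin agreement, following the formula used in the article."""
    assert len(lh) == len(rh)
    return sum(
        1 - (0 if l == r else abs(l - r) / max(l, r))
        for l, r in zip(lh, rh)
    ) / len(lh)


# A left-to-right gradient and its mirror image look different,
# but they contain exactly the same pixel values.
gradient = list(range(256))
mirrored = gradient[::-1]
print(hist_similar(histogram(gradient), histogram(mirrored)))  # 1.0

# An all-black strip shares almost nothing with the gradient.
black = [0] * 256
print(hist_similar(histogram(gradient), histogram(black)))
```

The mirrored gradient gets a perfect score even though no pixel is in the same place, because the histogram discards all position information.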

4. Based on Mutual Information

The similarity between the two images is characterized by calculating the mutual information of the two images.

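from sklearn import metrics as mr
from skimage.io import imread  # scipy.misc.imread was removed in SciPy 1.2
import numpy as np

img1 = imread('1.jpg')
img2 = imread('2.jpg')

# np.resize truncates or repeats the flattened data; it does not interpolate
img2 = np.resize(img2, (img1.shape[0], img1.shape[1], img1.shape[2]))

# mutual_info_score expects 1-D label arrays, so flatten both images
img1 = np.reshape(img1, -1)
img2 = np.reshape(img2, -1)
print(img2.shape)
print(img1.shape)
mutual_infor = mr.mutual_info_score(img1, img2)

print(mutual_infor)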

Result: 2.163
When the two pictures have the same size, mutual information reflects their similarity to some extent. In most cases, however, the sizes differ, and resizing one picture to match the other discards much of the original information, which makes the score hard to interpret. In practice, this method proved difficult to use reliably.
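
To make the quantity concrete, mutual information can be computed directly from co-occurrence counts. This is a minimal plain-Python sketch, assuming each pixel value is treated as a discrete label and the natural logarithm is used (as `mutual_info_score` does); `mutual_information` is an illustrative name, not a library function.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) of two equal-length label sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), n_xy in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (n_xy / n) * math.log(n_xy * n / (count_x[x] * count_y[y]))
    return mi


pixels = [0, 0, 255, 255]
print(mutual_information(pixels, pixels))            # equals the entropy, ln 2
print(mutual_information(pixels, [0, 255, 0, 255]))  # unrelated layout gives 0.0
```

The score is bounded above by the entropy of the images rather than by 1, which is why a value such as 2.163 is possible and why the raw number is hard to read as a similarity percentage.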
