Font reading and display of dot matrix Chinese characters

Posted by poe on Fri, 19 Nov 2021 02:06:57 +0100

1, Chinese character coding rules

1) Location code

The national standard GB2312-80 stipulates that all national standard Chinese characters and symbols are arranged in a square matrix of 94 rows and 94 columns. Each row of the matrix is called an "area", numbered from 01 to 94, and each column is called a "bit", numbered from 01 to 94. The area code and bit code of a Chinese character or symbol in the matrix are combined into four Arabic numerals, which form its "location code".
The first two digits of the location code are the area code and the last two digits are the bit code. A location code uniquely determines a Chinese character or symbol, and conversely every Chinese character or symbol corresponds to a unique location code.

For example, the location code of the Chinese character "母" (mother) is 3624, which means it is at bit 24 of area 36 in the matrix. The location code of the full-width question mark "？" is 0331, so it is at bit 31 of area 03.

2) Internal code

The internal code of a Chinese character is the code that represents the character inside the computer.
The internal code differs slightly from the location code. As mentioned above, the area code and bit code are both in the range 1 ~ 94. If the location code were used directly as the internal code, it would be confused with basic ASCII. To avoid this conflict, the internal code must first avoid the control codes (00H ~ 1FH) of basic ASCII, and second be distinguishable from the ordinary characters of basic ASCII.
To achieve these two points, 20H is added to the area code and to the bit code, and then 80H is added on top of that (here "H" means that the preceding digits are hexadecimal). After this processing, the internal code of a Chinese character takes two bytes, called the high byte and the low byte, which are formed by the following rules:

High byte = area code + 20H + 80H (or area code + A0H)
Low byte = bit code + 20H + 80H (or bit code + A0H)

Since the area code and bit code range from 01H to 5EH in hexadecimal (01 ~ 94 in decimal), the high byte and the low byte of the internal code both range from A1H to FEH (161 ~ 254 in decimal).
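
The rule above can be sketched in a few lines of C++. This is only an illustration: the names `InternalCode` and `to_internal_code` are hypothetical helpers, not part of the program shown later, and the sketch assumes that the area code and bit code are both within 1 ~ 94.

```cpp
#include <cstdint>

// Hypothetical helper type: the two bytes of an internal code.
struct InternalCode {
    std::uint8_t high;  // area code + A0H
    std::uint8_t low;   // bit code + A0H
};

// Convert a four-digit location code (for example 3624) to its internal code.
// Assumption: area code and bit code are both in the range 1..94.
constexpr InternalCode to_internal_code(int location_code) {
    return InternalCode{
        static_cast<std::uint8_t>(location_code / 100 + 0xA0),  // first two digits: area code
        static_cast<std::uint8_t>(location_code % 100 + 0xA0)   // last two digits: bit code
    };
}
```

For the location code 3624 this gives the bytes C4H and B8H. When calling it for the code 0331, write `to_internal_code(331)`, because a leading zero makes a C++ integer literal octal.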

3) Font code

The font code (the output code of a Chinese character) supplies the glyph needed when a Chinese character is output; it is used to turn the internal code back into a visible character.
Because Chinese characters are square characters composed of strokes, every character, no matter how many strokes it has, fits in a box of the same size. If the box is a grid of small dots with M rows and N columns (called the font lattice, or dot matrix, of the character), each character can be formed from some of the dots in the grid. Each dot is represented by one binary bit: 1 if a stroke passes through it, 0 otherwise. The resulting bit pattern is the font code of the character.
The collection of the font codes of all Chinese characters is called a Chinese character library (font library).

2, Font data storage format

Font libraries can be divided into display font libraries and print font libraries according to the output mode. A font library used for display output is called a display font library and must be loaded into memory while in use. A font library used for printed output is called a print font library and does not need to be loaded into memory while in use.

Font libraries can also be divided into soft font libraries and hard font libraries according to how they are stored. A soft font library is stored on the hard disk as a font file, which is the usual approach today. A hard font library is burned into a separate memory chip, which together with other necessary components forms an interface card that is plugged into the computer, usually called a Chinese character card (Hanka). This approach is now obsolete.

3, Chinese character dot matrix acquisition

In a dot matrix font, each bit of each byte represents one dot of a Chinese character, and each character is a rectangular matrix of dots: 0 means no dot and 1 means a dot. Drawing the 0s and 1s in different colors forms the character. Three dot matrix sizes are commonly used: 12 × 12, 14 × 14 and 16 × 16.

For example, when a 16 × 16 dot matrix represents a Chinese character, the character consists of 16 rows of 16 dots each. One dot needs 1 bit, so the 16 dots of a row need 16 bits (2 bytes), and the whole character needs 16 rows × 2 bytes/row = 32 bytes. In other words, the font code of a 16 × 16 dot matrix character takes 32 bytes.

Therefore, number of bytes = number of rows × (number of columns / 8), where the number of bytes per row is rounded up to a whole byte when the number of columns is not a multiple of 8.
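
As a small sketch of this calculation, the functions below compute the size of one glyph and test a single dot. The names `row_bytes`, `glyph_bytes` and `dot_is_set` are hypothetical. The sketch assumes that each row is padded to a whole number of bytes and that the rows are stored one after another with the most significant bit as the leftmost dot; some font files use a different layout (the 24 * 24 font used later in this post is handled with a transposition, for example).

```cpp
#include <cstddef>
#include <cstdint>

// Bytes per row: number of columns divided by 8, rounded up to a whole byte.
constexpr std::size_t row_bytes(std::size_t cols) {
    return (cols + 7) / 8;
}

// Bytes needed for the font code of one character of rows x cols dots.
constexpr std::size_t glyph_bytes(std::size_t rows, std::size_t cols) {
    return rows * row_bytes(cols);
}

// True if the dot at (row, col) is 1.
// Assumption: row-major storage, most significant bit = leftmost dot.
constexpr bool dot_is_set(const std::uint8_t* glyph, std::size_t cols,
                          std::size_t row, std::size_t col) {
    return (glyph[row * row_bytes(cols) + col / 8] & (0x80 >> (col % 8))) != 0;
}
```

With these definitions a 16 × 16 character takes 32 bytes and a 24 × 24 character takes 72 bytes, which matches the buffer size used in the program later in this post.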

1) Using location code to obtain Chinese characters

The glyphs in a dot matrix font file are stored in location code order, so the dot matrix of a character can be located from its location code with the following formula:
Starting position of dot matrix = ((area code - 1) * 94 + (bit code - 1)) * number of bytes per character

After obtaining the starting position, we can read the dot matrix of the character from that position in the font file.
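
A minimal sketch of this lookup follows. The helper names `glyph_offset` and `read_glyph` are hypothetical, and the sketch assumes a font file that contains all 94 areas starting from area 01; it uses only the standard `fseek` and `fread` calls.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Starting position of a glyph, following the formula above.
// Assumption: the font file starts at area 01, bit 01.
constexpr std::size_t glyph_offset(int area, int bit, std::size_t bytes_per_glyph) {
    return (static_cast<std::size_t>(area - 1) * 94 +
            static_cast<std::size_t>(bit - 1)) * bytes_per_glyph;
}

// Read one glyph from an already opened font file.
// Returns an empty vector if seeking or reading fails.
inline std::vector<unsigned char> read_glyph(std::FILE* font, int area, int bit,
                                             std::size_t bytes_per_glyph) {
    std::vector<unsigned char> glyph(bytes_per_glyph);
    const long pos = static_cast<long>(glyph_offset(area, bit, bytes_per_glyph));
    if (std::fseek(font, pos, SEEK_SET) != 0 ||
        std::fread(glyph.data(), 1, glyph.size(), font) != glyph.size()) {
        glyph.clear();
    }
    return glyph;
}
```

For a 16 × 16 font (32 bytes per character), the character with location code 3624 starts at byte (35 * 94 + 23) * 32 = 106016.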

2) Acquiring Chinese characters by using Chinese character internal code

As described earlier, the location code and the internal code of a Chinese character are related as follows:

High byte of internal code = area code + 20H + 80H (or area code + A0H)
Low byte of internal code = bit code + 20H + 80H (or bit code + A0H)

Conversely, the location code can be obtained from the internal code:

Area code = high byte of internal code - A0H
Bit code = low byte of internal code - A0H

By combining this formula with the formula for obtaining the Chinese character dot matrix, the position of the Chinese character dot matrix can be obtained.
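
Combined, the two formulas might look like the sketch below. The function name `offset_from_internal_code` and the parameter `first_area` are hypothetical. `first_area` is there because some font files do not start at area 01; the program later in this post, for instance, uses a 24 * 24 font that starts with the Chinese characters of area 16. Both bytes are assumed to be valid internal code bytes (A1H ~ FEH).

```cpp
#include <cstddef>
#include <cstdint>

// Starting position of a glyph computed directly from the internal code.
// first_area is the area code of the first glyph stored in the font file
// (1 for a complete font; an assumption that depends on the file used).
constexpr std::size_t offset_from_internal_code(std::uint8_t high, std::uint8_t low,
                                                std::size_t bytes_per_glyph,
                                                int first_area = 1) {
    return (static_cast<std::size_t>(high - 0xA0 - first_area) * 94 +  // area code - first area
            static_cast<std::size_t>(low - 0xA0 - 1)) * bytes_per_glyph;  // bit code - 1
}
```

For the internal code D5D4 (location code 5352) and 72 bytes per character, the offset is 355608 in a complete font and 254088 in a font that starts at area 16.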

4, Call OpenCV to display a picture and draw Chinese characters

1) Prepare

Task: under Ubuntu, use C/C++ (or Python) to call the OpenCV library to display a picture, open a text file named "logo.txt" (the file contains a single line of text with your own name and student number), read the font data of the corresponding characters from the 24 * 24 dot matrix Chinese font library (file HZKf2424.hz in the compressed package), and overlay the name and student number on the lower right of the picture.

First, prepare a picture called blacky.jpg;

Next, download the relevant files:
Extraction code: luha
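① 24 * 24 dot matrix font library: HZKf2424.hz
② ASCII font: Asci0816.zf
③ Write the content to be displayed into the logo.txt file (the code below reads the bytes of this file as GB2312 internal codes, so the file needs to be saved in GB2312 encoding rather than UTF-8)

Then open the virtual machine, create a folder named little, and put the above files into it;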
See the following figure for details:

2) Code

Enter the folder little (cd little), then run gedit test.cpp to create a .cpp source file;
The code is as follows:

#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <opencv2/opencv.hpp>

using namespace cv;
using namespace std;

// Draw one 24 * 24 Chinese character whose glyph starts at byte offset of the font file
void paint_chinese(Mat& image,int x_offset,int y_offset,unsigned long offset);
// Draw one 8 * 16 ASCII character (enlarged to 16 * 32) whose glyph starts at byte offset
void paint_ascii(Mat& image,int x_offset,int y_offset,unsigned long offset);
// Draw the text stored in the file logo_path onto the picture image_path
void put_text_to_image(int x_offset,int y_offset,String image_path,char* logo_path);
 
int main(){
    String image_path="/home/zhan/little/blacky.jpg";
    char* logo_path=(char*)"/home/zhan/little/logo.txt";
    put_text_to_image(10,10,image_path,logo_path);
    return 0;
}
 
 
 
void paint_ascii(Mat& image,int x_offset,int y_offset,unsigned long offset){
    // Coordinates of the starting point of the drawing
    Point p;
    p.x = x_offset;
    p.y = y_offset;
    // Buffer for one 8 * 16 ASCII glyph: 16 rows, one byte per row
    unsigned char buff[16];
    // Open the ASCII font file
    FILE *ASCII;
    if ((ASCII = fopen("/home/zhan/little/Asci0816.zf", "rb")) == NULL){
        printf("Can't open Asci0816.zf, please check the path!\n");
        exit(1);
    }
    // Read the 16 bytes of the glyph, then close the file again
    fseek(ASCII, offset, SEEK_SET);
    size_t got = fread(buff, 1, 16, ASCII);
    fclose(ASCII);
    if (got != 16){
        printf("Can't read the ASCII glyph at offset %lu!\n", offset);
        return;
    }
    int i, j;
    Point p1 = p;
    for (i = 0; i<16; i++)                  // 16 rows, one byte each
    {
        p.x = x_offset;
        for (j = 0; j < 8; j++)              // 8 bits per byte, leftmost dot first
        {
            p1 = p;
            if (buff[i] & (0x80 >> j))    /* Test whether the current bit is 1 */
            {
                /*
                    The original 8 * 16 ASCII glyph is too small,
                    so each dot is drawn as a 2 * 2 block of pixels,
                    which gives 16 * 32 pixels per character.
                    PS: there is probably a neater way to write this, but this is the only method I have thought of for now
                */
                circle(image, p1, 0, Scalar(0, 0, 255), -1);
                p1.x++;
                circle(image, p1, 0, Scalar(0, 0, 255), -1);
                p1.y++;
                circle(image, p1, 0, Scalar(0, 0, 255), -1);
                p1.x--;
                circle(image, p1, 0, Scalar(0, 0, 255), -1);
            }
            p.x+=2;            // One dot becomes a 2 * 2 block, so x and y both advance by 2
        }
        p.y+=2;
    }
}
 
 
 
void paint_chinese(Mat& image,int x_offset,int y_offset,unsigned long offset){// Draw one Chinese character on the picture
    Point p;
    p.x=x_offset;
    p.y=y_offset;
    FILE *HZK;
    unsigned char buff[72];// 72 bytes hold one 24 * 24 Chinese character
    if((HZK=fopen("/home/zhan/little/HZKf2424.hz","rb"))==NULL){
        printf("Can't open HZKf2424.hz, please check the path!\n");
        exit(1);// Quit with an error status
    }
    fseek(HZK, offset, SEEK_SET);/* Move the file pointer to the offset position */
    size_t got = fread(buff, 1, 72, HZK);/* Read the 72 bytes that one Chinese character occupies */
    fclose(HZK);
    if (got != 72){
        printf("Can't read the Chinese glyph at offset %lu!\n", offset);
        return;
    }
    bool mat[24][24];// Matrix that stores the transposed glyph
    int i,j,k;
    for (i = 0; i<24; i++)                 /* 24 * 24 dot matrix: 24 groups of 3 bytes */
    {
        for (j = 0; j<3; j++)                /* 3 bytes per group */
            for (k = 0; k<8; k++)              /* 8 bits per byte; check each bit in turn */
                if (buff[i * 3 + j] & (0x80 >> k))    /* Test whether the current bit is 1 */
                {
                    mat[j * 8 + k][i] = true;          /* Store the dot in the transposed glyph */
                }
                else {
                    mat[j * 8 + k][i] = false;
                }
    }
    for (i = 0; i < 24; i++)
    {
        p.x = x_offset;
        for (j = 0; j < 24; j++)
        {
            if (mat[i][j])
                circle(image, p, 1, Scalar(255, 0, 0), -1);    // Draw the dot as a small filled circle
            p.x++;                                             // Move right one pixel
        }
        p.y++;                                                 // Move down one pixel
    }
}
 
 
 
void put_text_to_image(int x_offset,int y_offset,String image_path,char* logo_path){// Draw the text on the picture
    // x_offset and y_offset are the starting coordinates of the first character on the picture
    // Load the picture from its path
    Mat image=imread(image_path);
    if (image.empty()){
        printf("Can't open the picture, please check the path!\n");
        exit(1);
    }
    int length=18;// Maximum number of bytes to read and print
    unsigned char qh,wh;// Area code and bit code
    unsigned long offset;// Offset into the font file
    unsigned char hexcode[30];// Bytes read from the text file. Remember to use unsigned
    FILE* file_logo;
    if ((file_logo = fopen(logo_path, "rb")) == NULL){
        printf("Can't open the text file, please check the path!\n");
        exit(1);
    }
    fseek(file_logo, 0, SEEK_SET);
    length=(int)fread(hexcode, 1, length, file_logo);// Keep only the bytes that were actually read
    fclose(file_logo);
    int x =x_offset,y = y_offset;// x, y: where the next character is drawn on the picture
    for(int m=0;m<length;){
        if(hexcode[m]==0x23||hexcode[m]=='\n'||hexcode[m]=='\r'){
            break;// Stop at the # sign or at the end of the line
        }
        else if(hexcode[m]>0xaf){
            if(m+1>=length){
                break;// The second byte of the Chinese character is missing
            }
            qh=hexcode[m]-0xaf;// The font used starts with the Chinese characters (area 16), not the symbols
            wh=hexcode[m+1] - 0xa0;// Calculate the bit code
            offset=(94*(qh-1)+(wh-1))*72L;
            paint_chinese(image,x,y,offset);
            /*
            Calculating the offset in the Chinese character library:
            each Chinese character is a 24 * 24 dot matrix,
            3 bytes per line and 24 lines, so 72 bytes are required.
            Take the character 赵 (Zhao) as an example:
            the location code is 5352,
            which is 35 34 in hexadecimal,
            and the internal code is d5d4.
            d5-af=38 (decimal). Because the font starts with the Chinese characters, af is subtracted instead of a0; 38 + 15 equals 53, which is the area code.
            d4-a0=52, which is the bit code.
            */
            m=m+2;// The internal code of a Chinese character occupies two bytes
            x+=24;// A Chinese character is 24 * 24 pixels and the text runs horizontally, so move 24 pixels to the right
        }
        else{// The character is ASCII
            wh=hexcode[m];
            offset=wh*16l;// Calculate the offset of the ASCII character
            paint_ascii(image,x,y,offset);
            m++;// An ASCII character occupies only one byte in the file, so advance by one
            x+=16;
        }
    }
    cv::imshow("image", image);
    cv::waitKey();
}

Remember to change the path in the above code to your own path!!! Then click save.
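
The listing above starts drawing at (10, 10), while the task asks for the text at the lower right of the picture. One possible way to compute a starting point is sketched below. The names `text_width`, `Origin`, `lower_right_origin` and `kSample` are hypothetical, and the sizes are taken from the listing: 24 pixels per Chinese character, 16 pixels per ASCII character, and a text height of 32 pixels because the ASCII glyphs are enlarged to 16 * 32.

```cpp
#include <cstddef>

// Width in pixels of a GB2312 byte string as the listing draws it:
// a byte above AFH starts a two-byte Chinese character (24 pixels),
// anything else is treated as one ASCII character (16 pixels).
constexpr int text_width(const unsigned char* text, std::size_t length) {
    return length == 0 ? 0
         : (text[0] > 0xAF && length >= 2) ? 24 + text_width(text + 2, length - 2)
         : 16 + text_width(text + 1, length - 1);
}

// Hypothetical helper type: top-left corner where drawing starts.
struct Origin {
    int x;
    int y;
};

// Place a text block of the given size in the lower right corner,
// leaving margin pixels to the right and bottom edges of the picture.
constexpr Origin lower_right_origin(int image_cols, int image_rows,
                                    int width, int height, int margin) {
    return Origin{image_cols - width - margin, image_rows - height - margin};
}

// Sample text: one Chinese character (internal code D5D4) followed by "12".
constexpr unsigned char kSample[] = {0xD5, 0xD4, '1', '2'};
```

In the program, the picture size would come from image.cols and image.rows after imread, and the resulting x and y would replace the two 10s passed to put_text_to_image. For a 640 * 480 picture and the sample text, a margin of 10 gives the starting point (574, 438).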

3) Operation results

Enter the following commands to compile and run the code:

g++ test.cpp -o test `pkg-config --cflags --libs opencv`
./test

The operation results are as follows:

5, Summary

Reading and displaying Chinese characters in the Ubuntu environment is much more complex and difficult than expected. Please see the reference content for more details!

6, Reference content

Chinese character coding
Font reading and display of dot matrix Chinese characters
