# Implementation and analysis of LZW codec algorithm

Posted by intergroove on Fri, 04 Mar 2022 08:56:14 +0100

This paper mainly explains the code and clarifies the code structure and ideas

# 1, Experimental purpose

Master the basic principle of dictionary coding, program LZW decoder with C/C + + language, and analyze the encoding and decoding algorithm.

# 2, Experimental principle

## 1. Code

(1) Code 0-255 is used to store the characters with Ascii code [0255] and put them in the dictionary.
(2) The encoding starts at 256 and the characters that appear are included in the dictionary
(3) Core idea: using the reusability of characters, whenever a code is output to the result, a new string is stored in the dictionary

## 2. Decoding

The process of establishing and decompressing a dictionary is not the same as the process of decompressing a dictionary. After decompression, the dictionary created when compressing the file is exactly the same as the dictionary created by decompression.

# 3, Experimental steps

## 1. Code

Step 1: initialize the dictionary to include all possible single characters, and initialize the current prefix P to null.
Step 2: the current character C = the next character in the character stream.
Step 3: judge whether P + C is in the dictionary
(1) If "yes", expand P with C, that is, let P=P + C, and return to step 2.
(2) If "no", the codeword W corresponding to the current prefix P is output; Add P + C to the dictionary; Let P=C and return to step 2

## 2. Decoding

Step 1: at the beginning of decoding, the dictionary contains all possible prefix roots.
3
Step 2: make CW: = the first codeword in the codeword stream.
Step 3: output the current suffix string CW to codeword stream.
Step 4: Previous codeword PW: = current codeword CW.
Step 5: current codeword CW: = next codeword of codeword stream.
Step 6: determine the prefix string Whether CW is in the dictionary.
(1) If "yes", the prefix string string CW output to character stream; Current prefix P: = prefix string first PW； Current character c: = current prefix string The first character of CW; Add the suffix string P+C to the dictionary.
(2) If "no", the current prefix P: = prefix string first PW； Prefix string - current character string The first character of CW; Output the suffix string P+C to the character stream, and then add it to the dictionary.
Step 7: judge whether there are still codewords to be translated in the codeword stream.
(1) If yes, go back to step 4.
(2) If "no", end

# 4, Code implementation and comments

```typedef struct {//A custom structure variable used as a file stream
FILE* fp;
int rack;
}BITFILE;

BITFILE* OpenBitFileInput(char* filename);//Structure variables are used to store input files
BITFILE* OpenBitFileOutput(char* filename);//Structure variables are used to store the output file
void CloseBitFileInput(BITFILE* bf); //Close bitstream file
void CloseBitFileOutput(BITFILE* bf);//Close bitstream file
int BitInput(BITFILE* bf);
unsigned long BitsInput(BITFILE* bf, int count);//Input bitstream
void BitOutput(BITFILE* bf, int bit);
void BitsOutput(BITFILE* bf, unsigned long code, int count);//Map the encoding corresponding to the character to a fixed length encoding
```

Source file:

```struct {
int suffix;
int parent, firstchild, nextsibling;
} dictionary[MAX_CODE + 1];
int next_code;
int d_stack[MAX_CODE]; // stack for decoding a phrase
```
```#define input(f) ((int)BitsInput( f, 16))
#define output(f, x) BitsOutput( f, (unsigned long)(x), 16)
```
```int DecodeString(int start, int code);
void InitDictionary(void);
void PrintDictionary(void) {
//Print out a custom dictionary
int n;
int count;
for (n = 256; n < next_code; n++) {
count = DecodeString(0, n);
printf("%4d->", n);
while (0 < count--) printf("%c", (char)(d_stack[count]));
printf("\n");
}
}

int DecodeString(int start, int code) {
//Decode the string from the code to d_stack in this stack
int count;
count = start;
while (0 <= code) {
d_stack[count] = dictionary[code].suffix;
code = dictionary[code].parent;
count++;
}
return count;
}
void InitDictionary(void) {
//Initialize the dictionary, set the structure with the logical data structure of the tree, and the storage structure of the array
int i;

for (i = 0; i < 256; i++) {
dictionary[i].suffix = i;
dictionary[i].parent = -1;
dictionary[i].firstchild = -1;
dictionary[i].nextsibling = i + 1;
}
dictionary[255].nextsibling = -1;
next_code = 256;
}
/*
* Input: string represented by string_code in dictionary,
* Output: the index of character+string in the dictionary
*/
int InDictionary(int character, int string_code) {
//Query whether the string added with the new character is in the dictionary
int sibling;
if (0 > string_code) return character;
sibling = dictionary[string_code].firstchild;
while (-1 < sibling) {
if (character == dictionary[sibling].suffix) return sibling;
sibling = dictionary[sibling].nextsibling;
}
return -1;
}

void AddToDictionary(int character, int string_code) {
//Adds a newly formed string to the dictionary
int firstsibling, nextsibling;
if (0 > string_code) return;
dictionary[next_code].suffix = character;
dictionary[next_code].parent = string_code;
dictionary[next_code].nextsibling = -1;
dictionary[next_code].firstchild = -1;
firstsibling = dictionary[string_code].firstchild;
if (-1 < firstsibling) {	// the parent has child
nextsibling = firstsibling;
while (-1 < dictionary[nextsibling].nextsibling)
nextsibling = dictionary[nextsibling].nextsibling;
dictionary[nextsibling].nextsibling = next_code;
}
else {// no child before, modify it to be the first
dictionary[string_code].firstchild = next_code;
}
next_code++;
}

void LZWEncode(FILE* fp, BITFILE* bf) {
int character;
int string_code;//Dictionary encoding corresponding to character or string
int index;
unsigned long file_length;

fseek(fp, 0, SEEK_END);
file_length = ftell(fp);
fseek(fp, 0, SEEK_SET);
BitsOutput(bf, file_length, 4 * 8);
InitDictionary();//Initialize dictionary
string_code = -1;
while (EOF != (character = fgetc(fp))) {//Read characters from files
index = InDictionary(character, string_code);
if (0 <= index) {	// string+character in dictionary
string_code = index;
}
else {
output(bf, string_code);// string+character not in dictionary
if (MAX_CODE > next_code) {	//When there is room in the dictionary
}
string_code = character;//Put the code of the current character after the existing code string
}
}
output(bf, string_code);// Write the code string to the corresponding binary file
}

void LZWDecode(BITFILE* bf, FILE* fp) {
int character;//Character code
int new_code, last_code;
int phrase_length;
unsigned long file_length;//file length

file_length = BitsInput(bf, 4 * 8);
if (-1 == file_length) file_length = 0;
/*Need to fill*/
InitDictionary();//Initialize dictionary
last_code = -1;
while (0 < file_length) {
new_code = input(bf);//Read in a character
if (new_code >= next_code) { // When the character code is greater than the character code in the dictionary
d_stack[0] = character;
phrase_length = DecodeString(1, last_code); //Solve the characters and store them in d_stack stack
}
else {//When the character code is smaller than the character code in the dictionary
phrase_length = DecodeString(0, new_code);//Solve the characters and store them in d_stack stack
}
character = d_stack[phrase_length - 1];
while (0 < phrase_length) {//Output string to text file
phrase_length--;
fputc(d_stack[phrase_length], fp);
file_length--;
}
if (MAX_CODE > next_code) {//When there is room in the dictionary
}
last_code = new_code;//Number of updated dictionaries last_code is the latest new_code
}
}

```
```int main(int argc, char** argv) {
FILE* fp;//File pointer entered
BITFILE* bf; //Output binary stream pointer

if (4 > argc) {//Error in input parameters
fprintf(stdout, "usage: \n%s <o> <ifile> <ofile>\n", argv[0]);
fprintf(stdout, "\t<o>: E or D reffers encode or decode\n");
fprintf(stdout, "\t<ifile>: input file name\n");
fprintf(stdout, "\t<ofile>: output file name\n");
return -1;
}
if ('E' == argv[1][0]) { // do encoding
fp = fopen(argv[2], "rb");//Read in file
bf = OpenBitFileOutput(argv[3]);//Create output file
if (NULL != fp && NULL != bf) {
LZWEncode(fp, bf);//code
fclose(fp);//Close file pointer
CloseBitFileOutput(bf);//Close file stream pointer
fprintf(stdout, "encoding done\n");
}
}
else if ('D' == argv[1][0]) {	// do decoding
bf = OpenBitFileInput(argv[2]);//Read in file stream
fp = fopen(argv[3], "wb");//Create output file
if (NULL != fp && NULL != bf) {
LZWDecode(bf, fp);//decode
fclose(fp);//Close file pointer
CloseBitFileInput(bf);//Close file stream pointer
fprintf(stdout, "decoding done\n");
}
}
else {	// otherwise
fprintf(stderr, "not supported operation\n");
}
return 0;
}
```

# 5, Experimental results and analysis

After comparing more than ten types of documents, it is concluded that:

The compression effect of text file is good, and the compression effect of image and video file is poor.

LZW coding principle is based on repeated characters, so for text files with low repetition rate in data, it is necessary to create a new string and store it in the dictionary, which will also lead to low compression efficiency.

However, this coding is mainly used for image data compression. For the signal source with simple image, smooth and low noise, it has high compression ratio and high compression and decompression speed.

There is no change when compressing yuv files.

# summary

lzw compression is a lossless compression method that compresses files into small files based on table lookup algorithm.