Custom types: structure, enumeration, union

Posted by mmoore on Sun, 23 Jan 2022 22:44:43 +0100

Key points of this chapter
Structure

  • Declaration of structure type
  • Self reference of structure
  • Definition and initialization of structure variables
  • Structure memory alignment
  • Passing structures as parameters
  • Implementing bit segments with structures (padding and portability of bit segments)

Enumeration

  • Definition of enumeration type
  • Advantages of enumeration
  • Use of enumerations

Union

  • Definition of union type
  • Characteristics of unions
  • Calculation of union size

Contents

I Structure

1. Declaration of structure type

2. Definition and initialization of structure variables

3. Structure memory alignment

4. Passing structures as parameters

5. Implementing bit segments with structures

II Enumeration

III Union

I Structure

1. Declaration of structure type

A structure is a collection of values called member variables. Each member of the structure can be of a different type.

Declaring a structure

struct tag
{
member-list;
}variable-list;

(Note: tag is the structure tag, the name of the structure type)

For example, use a structure to describe a student:

struct Stu
{
char name[20];//name
int age;//Age
char sex[5];//Gender
char id[20];//Student number
}; //The semicolon cannot be omitted

Incomplete declaration

//Anonymous structure type
struct
{
int a;
char b;
float c;
}x;


struct
{
int a;
char b;
float c;
}a[20], *p;

Both declarations above omit the structure tag.

So here comes the question...

//Based on the above code, is the following code legal?
p = &x;

No, it is illegal.

Warning:
The compiler treats the two declarations above as two completely different types, even though their member lists are identical.
So the assignment is illegal.
 

Self reference of structure

Can a structure contain a member whose type is the structure itself?

//Code 1
struct Node
{
int data;
struct Node next;
};

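Note: This is wrong.

If it were allowed, what would sizeof(struct Node) be? Every Node would contain another complete Node, so the size could never be determined.

The correct way to self-reference is through a pointer: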

//Code 2
struct Node
{
int data;
struct Node* next;
};

There is one more point to add: what happens if typedef is used?

//Code 3
typedef struct
{
int data;
Node* next;
}Node;
//Is it feasible to write code like this?

This is not feasible, because the typedef name Node does not exist yet when the compiler reaches the line Node* next; the renaming is only complete at the end of the declaration.

//Solution:
typedef struct Node
{
int data;
struct Node* next;
}Node;

2. Definition and initialization of structure variables

Once a structure type exists, defining variables of that type is very simple.
 

struct Point
{
int x;
int y;
}p1; //Define the variable p1 while declaring the type
struct Point p2; //Define structure variable p2
//Initialization: define variables and assign initial values at the same time.
struct Point p3 = {1, 2}; //x is set to 1 and y to 2
struct Stu //Type declaration
{
char name[15];//name
int age; //Age
};
struct Stu s = {"zhangsan", 20};//initialization
struct Node
{
int data;
struct Point p;
struct Node* next;
}n1 = {10, {4,5}, NULL}; //Structure nesting initialization
struct Node n2 = {20, {5, 6}, NULL};//Structure nesting initialization

3. Structure memory alignment

Now that we have covered the basic use of structures, let's look at a deeper question:

How is the size of a structure calculated?

This is also a very popular exam topic: structure memory alignment.

//Exercise 1
struct S1
{
char c1;
int i;
char c2;
};
printf("%zu\n", sizeof(struct S1)); //sizeof yields a size_t, printed with %zu

The answer is 12 (with a 4-byte int), not 6 as one might expect.

Why?

Before exploring, we need to understand a macro: offsetof, declared in <stddef.h>.

It calculates the offset, in bytes, of a structure member from the start of the structure.

Next, we use offsetof to print the offset of each member of S1.

#include <stdio.h>
#include <stddef.h> //offsetof

struct S1
{
	char c1;
	int i;
	char c2;
};
int main()
{
	printf("%zu\n", offsetof(struct S1, c1));
	printf("%zu\n", offsetof(struct S1, i));
	printf("%zu\n", offsetof(struct S1, c2));
	return 0;
}

 

We get the offset of each member: 0 for c1, 4 for i and 8 for c2.

Let's picture the memory layout of the structure.

Some bytes are "wasted": the 3 bytes between c1 and i, and the 3 bytes after c2.

At this point, we introduce the concept of memory alignment.

Structure memory alignment

Concept of the alignment number:

The alignment number of a member is the smaller of the member's own size and the compiler's default alignment number.

The VS (Visual Studio) environment has a default alignment number of 8.

gcc on Linux has no default alignment number, so the alignment number is the size of the member itself.

Memory alignment

  1. The first member of the structure is stored at offset 0 from the start of the structure variable.
  2. Every following member is stored at an offset that is an integer multiple of its alignment number.
  3. The total size of the structure must be an integer multiple of the maximum alignment number, that is, the largest alignment number among its members.
  4. If a structure is nested, the nested structure is aligned to an integer multiple of its own maximum alignment number, and the total size of the outer structure is an integer multiple of the largest alignment number among all members (including the members of nested structures).

    

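With these rules in hand, look at struct S1 again: c1 is stored at offset 0, i is aligned to offset 4, c2 follows at offset 8, and the total of 9 bytes is rounded up to 12, an integer multiple of the maximum alignment number 4.

Now let's practice with another piece of code.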

struct S2
{
char c1;
char c2;
int i;
};
printf("%zu\n", sizeof(struct S2));

The size of this structure is 8 bytes

Look at another one

struct S3
{
double d;
char c;
int i;
};
printf("%zu\n", sizeof(struct S3));

 

The size of this structure is 16 bytes

Let's look at a nested structure

struct S4
{
char c1;
struct S3 s3;
double d;
};
printf("%zu\n", sizeof(struct S4));

 

Now that we understand memory alignment, let's ask:

Why does memory alignment exist?

Most references give two reasons:
1. Platform reason (portability):
Not all hardware platforms can access any data at any address. Some platforms can only fetch certain types of data at certain addresses; otherwise a hardware exception is raised.

2. Performance reason:
Data structures (especially stacks) should be aligned on natural boundaries as much as possible.
To access misaligned memory, the processor may need two memory accesses; an aligned access needs only one.


For example, a 32-bit processor typically reads four bytes at a time, so a 4-byte int that straddles two 4-byte words costs two reads.

In summary:
Structure memory alignment trades space for time.
When designing a structure, we want to satisfy alignment and still save space. How?
Keep the members that occupy little space together as much as possible.

//For example:
struct S1
{
char c1;
int i;
char c2;
};
struct S2
{
char c1;
char c2;
int i;
};

S1 and S2 have exactly the same members, but S1 occupies 12 bytes while S2 occupies only 8.

Modifying the default alignment number


We have seen the #pragma preprocessing directive before. Here we use it to change the default alignment number.
 

#include<stdio.h>
#pragma pack(8) //set the default alignment number to 8
struct S1
{
	char c1;
	int i;
	char c2;
};
#pragma pack() //cancel the setting and restore the default alignment number

#pragma pack(1) //set the default alignment number to 1
struct S2
{
	char c1;
	int i;
	char c2;
};
#pragma pack() //cancel the setting and restore the default alignment number
int main()
{
	//What is the output?
	printf("%zu\n", sizeof(struct S1));
	printf("%zu\n", sizeof(struct S2));
	return 0;
}

 

4. Passing structures as parameters

#include <stdio.h>

struct S
{
int data[1000];
int num;
};
struct S s = {{1,2,3,4}, 1000};
//Pass the structure by value
void print1(struct S s)
{
printf("%d\n", s.num);
}
//Pass the address of the structure
void print2(struct S* ps)
{
printf("%d\n", ps->num);
}
int main()
{
print1(s); //Pass the structure itself (the whole object is copied)
print2(&s); //Pass only the address
return 0;
}

 

Which of the print1 and print2 functions above is better?
The answer is: the print2 function is preferred.
 

Reason:
When a function is called, its arguments are pushed onto the stack, which costs both time and space.
If a structure object is passed by value and the structure is large, pushing the whole object is expensive and degrades performance.
So when a structure is passed to a function, pass its address.
 

 5. Implementing bit segments with structures

Having covered structures, we now turn to their ability to implement bit segments.

First, what is a bit segment?

C allows a structure to specify the length of a member in bits. Such a member is called a "bit segment" or "bit field" (the standard C term is bit-field). Using bit segments, data can be stored in fewer bits.

Definition: information is generally accessed in units of bytes. In practice, some information does not need a whole byte. For example, "true" or "false" is represented by 0 or 1 and needs only 1 bit. In process control, parameter detection and data communication, control information often occupies only one or a few bits of a byte, and several pieces of information are often packed into one byte.

A bit segment is a data structure that stores data compactly in units of bits and lets the programmer operate on those bits. Benefits of this data structure:

  • It saves storage space for data units. This matters most when the program needs thousands of data units.

  • It makes it easy to access part of an integer value, which can simplify the program source code.

The disadvantage of bit segments is that their memory allocation and alignment depend on the specific machine and compiler and may differ between platforms, so bit segments are inherently not portable.

The declaration of a bit segment is similar to that of a structure member, with two differences:

  • 1. The member must have an integer type: int, unsigned int or signed int (many compilers also accept other integer types such as char).
  • 2. The member name is followed by a colon and a number, the width in bits.
     

For example:

struct A
{
int _a:2;
int _b:5;
int _c:10;
int _d:30;
};

A is a bit segment type.
What is the size of struct A?
 

printf("%zu\n", sizeof(struct A));

The answer is 8 (with a 4-byte int).

Memory allocation for bit segments

1. The members of a bit segment can be int, unsigned int, signed int or char (types of the integer family).
2. The space for a bit segment is opened up 4 bytes (int) or 1 byte (char) at a time, as needed.
3. Bit segments involve many implementation-defined details and are not cross-platform. Programs that must be portable should avoid them.

 

In detail:

The numbers 2, 5, 10 and 30 are the number of bits each member needs (a width cannot exceed the width of the member's type, 32 bits for int here); 1 byte = 8 bits.

The space for a bit segment is opened up 4 bytes (int) or 1 byte (char) at a time.

So when the compiler sees int _a, it first opens up four bytes, that is, 4*8 = 32 bits.

_a, _b and _c use 17 of those bits. _d needs 30 bits, more than the 15 bits that remain, so another 4 bytes are opened up (in total the members need 47 bits, more than 32).

So the size of struct A is 8 bytes.

 

Let's take another example

//An example
struct S
{
char a:3;
char b:4;
char c:5;
char d:4;
};

What is sizeof(struct S)? The answer is 3.

Let's analyze it again.

Because the members are of type char, 1 byte (8 bits) is opened up first. a and b use 7 bits, leaving 1 bit.

In the VS compilation environment, the remaining bit is not used for c, which needs 5 bits; a new byte is opened up instead. After c, 3 bits remain, which is not enough for the 4 bits of d, so a third byte is opened up. The final result is 3.

Look at the code below

 

//An example
struct S
{
char a:3;
char b:4;
char c:5;
char d:4;
};
struct S s = {0};
s.a = 10;
s.b = 12;
s.c = 3;
s.d = 4;
//How is space opened up?

 

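A memory diagram makes the allocation visible. In VS, members are allocated starting from the low bits of each byte: a keeps only the low 3 bits of 10 (binary 010) and b stores 12 (binary 1100) above it, so the first byte is 0110 0010 (0x62); c is stored in the second byte (0x03) and d in the third (0x04).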

Cross-platform problems of bit segments

  1. It is implementation-defined whether an int bit segment is treated as a signed or an unsigned number.
  2. The maximum number of bits in a bit segment is not fixed (at most 16 on a 16-bit machine and 32 on a 32-bit machine; a width of 27 causes problems on a 16-bit machine).
  3. Whether the members of a bit segment are allocated from left to right or from right to left in memory is not defined by the standard.
  4. When a structure contains two bit segments and the second is too large to fit in the remaining bits of the first, it is not defined whether the remaining bits are discarded or used.

Summary:
A bit segment can achieve the same effect as an ordinary structure member while saving space, but it has cross-platform problems.
 

II Enumeration

Enumeration, as the name suggests, means listing the possible values one by one.
For example, in real life:
A week has exactly seven days, Monday to Sunday, which can be listed one by one.
Gender can be male, female or undisclosed, which can also be listed one by one.
A year has 12 months, which can also be listed one by one.
 

An enumeration type is a type whose values are listed ("enumerated") by the programmer, and the programmer must give each value a name (an enumeration constant).

Definition of enumeration type

enum Day//days of the week
{
Mon,
Tues,
Wed,
Thur,
Fri,
Sat,
Sun
};
enum Sex//Gender
{
MALE,
FEMALE,
SECRET
};
enum Color//colour
{
RED,
GREEN,
BLUE
};

enum Day, enum Sex and enum Color defined above are all enumeration types.
The contents of {} are the possible values of the enumeration type, also known as enumeration constants.
 

These constants all have values. By default they start from 0 and increase by 1 each time. Initial values can also be assigned in the definition.

For example:

enum Color//colour
{
RED=1,
GREEN=2,
BLUE=4
};

Advantages of enumeration

We can already define constants with #define. Why use enumerations?
Advantages of enumeration:

  1. It increases the readability and maintainability of the code.
  2. Unlike identifiers defined with #define, enumeration constants belong to a type, which is more rigorous (C++ enforces this check; C compilers still accept plain integers).
  3. It helps prevent name pollution (encapsulation).
  4. It is easy to debug, because the names survive preprocessing.
  5. It is easy to use: multiple constants can be defined at once.

Use of enumerations

enum Color//colour
{
RED=1,
GREEN=2,
BLUE=4
};
enum Color clr = GREEN;//Assign enumeration constants to enumeration variables so that the types match (C also accepts a plain integer here; C++ does not).

 

III Union

Like a structure, a union consists of one or more members, and these members may have different types.

However, the compiler allocates only enough memory for the largest member of the union. The members of the union overlap each other in this space.

As a result, assigning a new value to one member also changes the values of the other members.

To illustrate the basic nature of a union, we declare a union variable u that has two members:

union {
		int i;
		double d;
	}u;

Note that the declaration of a union is very similar to that of a structure

struct {
		int i;
		double d;
	}s;

In fact, there is only one difference between the structure variable s and the union variable u: the members of s are stored at different memory addresses, while the members of u are stored at the same memory address. The following describes how s and u are stored in memory (assuming that an int takes 4 bytes and a double takes 8 bytes).

 

In the structure variable s, members i and d occupy different memory. They hold 12 bytes of data, and with 4 bytes of alignment padding after i, s typically takes up 16 bytes in total.

In the union variable u, members i and d overlap (i occupies the first 4 bytes of d), so u uses only 8 bytes.

In addition, i and d have the same address.

Definition of union type

//Declaration of union type
union Un
{
char c;
int i;
};
//Definition of a union variable
union Un un;

 

union Un
{
int i;
char c;
};
union Un un;
// Is the result of the following output the same?
printf("%p\n", (void*)&(un.i)); //addresses are printed with %p
printf("%p\n", (void*)&(un.c));
//What is the output below?
un.i = 0x11223344;
un.c = 0x55;
printf("%x\n", un.i);

 

After learning how to use unions, let's review a problem we have met before:

Determine the byte order (endianness) of the current machine

 

First, what are big-endian and little-endian storage?

Storing the low-order byte of a value at the low address is little-endian storage.

Storing the low-order byte of a value at the high address is big-endian storage.


 

Method 1: convert the address of a to char* and read only the first byte to see whether it is 1 or 0

#include <stdio.h>

int check_sys()
{
	int a = 1;
	return (*(char*)&a); //1 if the low-order byte is stored at the lowest address
}

int main()
{
	int ret = check_sys();
	if (1 == ret)
		printf("Little-endian\n");
	else
		printf("Big-endian\n");
	return 0;
}

 

Method 2: use a union

#include <stdio.h>

int check_sys()
{
	union Un
	{
		char c;
		int i;
	}u;
	u.i = 1;
	return u.c; //c shares the lowest-addressed byte of i
}
int main()
{
	int ret = check_sys();
	if (1 == ret)
		printf("Little-endian\n");
	else
		printf("Big-endian\n");
	return 0;
}

 

Calculation of union size

  • The size of a union is at least the size of its largest member.
  • When the size of the largest member is not an integer multiple of the maximum alignment number, the union size is rounded up to an integer multiple of the maximum alignment number.

 

union Un1
{
char c[5];
int i;
};
union Un2
{
short c[7];
int i;
};
//What is the output below?
printf("%zu\n", sizeof(union Un1));
printf("%zu\n", sizeof(union Un2));

 
