FastDFS FileID and Trunk File information

Posted by zahadum on Fri, 18 Feb 2022 18:32:41 +0100

Installing FastDFS with Docker:

https://registry.hub.docker.com/r/ygqygq2/fastdfs-nginx
https://github.com/ygqygq2/fastdfs-nginx

docker network create fastdfs-net
docker run -dit --network=fastdfs-net --name tracker -v /var/fdfs/tracker:/var/fdfs ygqygq2/fastdfs-nginx:latest tracker
docker run -dit --network=fastdfs-net --name storage0 -e TRACKER_SERVER=tracker:22122 -v /var/fdfs/storage0:/var/fdfs ygqygq2/fastdfs-nginx:latest storage
docker run -dit --network=fastdfs-net --name storage1 -e TRACKER_SERVER=tracker:22122 -v /var/fdfs/storage1:/var/fdfs ygqygq2/fastdfs-nginx:latest storage

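Alternatively, run on the default bridge network and publish the ports, pointing TRACKER_SERVER at an address where the tracker is reachable (172.23.101.154 in this example):

docker run -dit --net=bridge -p 22122:22122 --name tracker -v /var/fdfs/tracker:/var/fdfs ygqygq2/fastdfs-nginx:latest tracker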
docker run -dit --net=bridge -p 23000:23000 -p 8080:8080 --name storage0 -e TRACKER_SERVER=172.23.101.154:22122 -v /var/fdfs/storage0:/var/fdfs ygqygq2/fastdfs-nginx:latest storage

Check the version:

$ fdfs_trackerd --help
FastDFS server v6.06

Merge storage:

FastDFS provides a merge storage feature: it creates large files (64MB by default) and stores many small files inside each of them. The space a small file occupies inside a large file is called a slot. A slot is at least 256 bytes and at most 16MB, so a file smaller than 256 bytes still occupies 256 bytes, and a file larger than 16MB is not merged but stored as an independent file.

1. Merge storage configuration:

FastDFS implements merge storage itself, and all of its settings live in the tracker.conf file. The trunk feature is enabled and configured there through the following items:

use_trunk_file = true                    # Whether to enable trunk (merge) storage: true enables it; the default is false
slot_min_size = 256                      # Minimum allocation unit in a trunk file, in bytes; a smaller file still occupies 256 bytes
slot_max_size = 1MB                      # Largest file stored inside a trunk file; larger files are stored independently. The default is 16MB
trunk_file_size = 64MB                   # Size of each trunk file
trunk_create_file_advance = false        # Whether to create trunk files in advance
trunk_create_file_time_base = 02:00      # Base time of day for creating trunk files in advance
trunk_create_file_interval = 86400       # Interval between advance creations of trunk files, in seconds
trunk_create_file_space_threshold = 20G  # Free-space threshold: trunk files are created in advance while free trunk space is below this value
trunk_init_check_occupying = false       # Whether to check at startup if each free-space list entry is already occupied
trunk_init_reload_from_binlog = false    # Whether to rebuild the free-space list purely from the trunk binlog
trunk_compress_binlog_min_interval = 0   # Minimum interval between trunk binlog compressions, in seconds; 0 means never compress

2. Merge storage file naming and file structure

When a file is uploaded to FastDFS successfully, the server returns an access ID for the file, called the fileid. Without merge storage, each fileid corresponds one-to-one to a file actually stored on disk. With merge storage the correspondence is no longer one-to-one: the files behind multiple fileids are stored together in one large file.

Note: below, the large files produced by merge storage are called Trunk files, and files stored without merge storage are called source files.
Three concepts need to be kept apart:
1) Trunk file: the actual file stored on the storage server's disk. The default size is 64MB.
2) FileId with merge storage: the FileId returned to the client for each upload when the server has merge storage enabled. In this case there is no one-to-one correspondence between the FileId and a file on disk.
3) FileId without merge storage: the FileId returned for an upload when the server does not have merge storage enabled.

Trunk file name format: fdfs_storage1/data/00/00/000001. The file name is an int that increments from 1.

The fileid is encoded with URL-safe Base64.

1. When merge storage is enabled, the fileid returned by the server to the client also changes

1) fileid without merge storage:

The file name (excluding the suffix) is Base64 encoded and contains the fields listed below the following example:

group1/M00/00/00/rBEAAWCHwpKAG_IaAAE2xZYv3yo399.tar.gz

In this file name, tar.gz is the file suffix and rBEAAWCHwpKAG_IaAAE2xZYv3yo399 is a Base64-encoded buffer composed of:
storage_id (the source storage server ID, or its IP address in numeric form)
timestamp (file creation timestamp)
file_size (if the original value fits in 32 bits, the high bits are filled with a random value so that the final value is 64 bits)
crc32 (checksum of the file content)
random number (introduced to prevent duplicate file names)

rBEAAWCHwpKAG_IaAAE2xZYv3yo399
| 4 bytes | 4 bytes   | 8 bytes   | 4 bytes | 2 bytes     |
| ip      | timestamp | file_size | crc32   | check value |

2) fileid when merging storage:

With merge storage, the generated file ID is longer: 16 Base64 characters (12 bytes) are appended to the file name. This part is also Base64 encoded and contains three fields, each a 4-byte integer: the trunk file ID, the offset, and the allocated size.

group1/M00/00/00/rBEAAWCHwtmIWTjVAAFls5d0ZtEAAAAAQAAAAAAAWYA081.conf

The file ID is longer with merge storage because it must also carry the ID of the large file and the offset inside it. The following information is involved:
file_size: size of the file content
mtime: file modification time
crc32: CRC32 checksum of the file content
formatted_ext_name: file extension
alloc_size: space allocated in the large file, greater than or equal to the file size (aligned to the minimum slot size, 256 bytes)
trunk file ID: ID of the large file, such as 000001
offset: offset of the file content within the trunk file
size: file size

rBEAAWCHwtmIWTjVAAFls5d0ZtEAAAAAQAAAAAAAWYA081
| 4 bytes | 4 bytes   | 8 bytes   | 4 bytes | 4 bytes  | 4 bytes | 4 bytes    | 2 bytes     |
| ip      | timestamp | file_size | crc32   | trunk ID | offset  | alloc_size | check value |
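
The two layouts above can be checked without the FastDFS client library. The following is a minimal sketch that uses only java.util.Base64. The class name FileIdDecoder and the layout of the returned array are made up for illustration. It assumes that the first 27 characters carry the 20-byte base part, that the next 16 characters (when present) carry the trunk part, and that the real file size sits in the low 32 bits of file_size. The sample IDs in this article decode consistently under those assumptions, but this is not taken from the FastDFS source.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

public class FileIdDecoder {
    static final int BASE_LEN = 27;   // Base64 characters that encode the 20-byte base part
    static final int TRUNK_LEN = 16;  // Base64 characters that encode the 12-byte trunk part

    /**
     * Decodes a file name given without directory and extension.
     * Returns {timestamp, fileSize, crc32, trunkId, offset, allocSize};
     * the last three are -1 when the name carries no trunk part.
     */
    static long[] decode(String name) {
        Base64.Decoder dec = Base64.getUrlDecoder();   // URL-safe alphabet, padding optional
        ByteBuffer base = ByteBuffer.wrap(dec.decode(name.substring(0, BASE_LEN)));
        base.getInt();                                 // storage ID or IP, not used here
        long timestamp = base.getInt() & 0xFFFFFFFFL;  // seconds since the epoch
        long fileSize = base.getLong() & 0xFFFFFFFFL;  // assumption: size is in the low 32 bits
        long crc32 = base.getInt();                    // kept signed, as in the listings here
        long trunkId = -1, offset = -1, allocSize = -1;
        if (name.length() >= BASE_LEN + TRUNK_LEN) {   // assumption: long names carry trunk info
            ByteBuffer trunk = ByteBuffer.wrap(
                    dec.decode(name.substring(BASE_LEN, BASE_LEN + TRUNK_LEN)));
            trunkId = trunk.getInt();
            offset = trunk.getInt();
            allocSize = trunk.getInt();
        }
        return new long[] {timestamp, fileSize, crc32, trunkId, offset, allocSize};
    }

    public static void main(String[] args) {
        long[] f = decode("rBEAAWCIuzmIeQCWAAE2xZYv3yoAAAAAQABZgAAATcA621");
        // prints: file_size=79557 trunk ID=1 offset=91648 alloc_size=79616
        System.out.println("file_size=" + f[1] + " trunk ID=" + f[3]
                + " offset=" + f[4] + " alloc_size=" + f[5]);
    }
}
```

The sketch ignores the characters after the Base64 parts, so the trailing digits of the sample names do not affect the result.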

Internal structure of Trunk file

The trunk is composed of several small files. Each small file will have a trunk header and the real data immediately following it. The structure is as follows:

|<------------------------------- 24 bytes ------------------------------->|
| 1 byte   | 4 bytes    | 4 bytes  | 4 bytes | 4 bytes | 7 bytes            |
| filetype | alloc_size | filesize | crc32   | mtime   | formatted_ext_name |
|<---------------------- file_data (filesize bytes) ---------------------->|

As can be seen from the above figure, each trunk header occupies 24 bytes.
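
The header layout maps directly onto sequential reads from a byte buffer. The sketch below is one way to express it with java.nio.ByteBuffer. The class TrunkHeader and its field names are made up for illustration. It assumes the integers are big-endian and that unused bytes of formatted_ext_name are NUL padding. The main method runs against a synthetic header built in memory, not against real FastDFS output.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class TrunkHeader {
    static final int HEADER_SIZE = 24;
    static final int EXT_NAME_LEN = 7;

    final byte fileType;
    final int allocSize;
    final int fileSize;
    final int crc32;
    final int mtime;
    final String formattedExtName;

    /** Reads the 24-byte header that starts at the given offset of a trunk file image. */
    TrunkHeader(byte[] trunk, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(trunk, offset, HEADER_SIZE); // big-endian by default
        fileType = buf.get();
        allocSize = buf.getInt();
        fileSize = buf.getInt();
        crc32 = buf.getInt();
        mtime = buf.getInt();
        byte[] ext = new byte[EXT_NAME_LEN];
        buf.get(ext);
        // trim() drops trailing NUL bytes (assumed padding)
        formattedExtName = new String(ext, StandardCharsets.US_ASCII).trim();
    }

    /** Copies the file data that immediately follows the header. */
    byte[] readData(byte[] trunk, int offset) {
        byte[] out = new byte[fileSize];
        System.arraycopy(trunk, offset + HEADER_SIZE, out, 0, fileSize);
        return out;
    }

    public static void main(String[] args) {
        // Synthetic image for demonstration only; the filetype value 1 is arbitrary.
        byte[] image = new byte[256];
        ByteBuffer w = ByteBuffer.wrap(image);
        w.put((byte) 1).putInt(256).putInt(5).putInt(0).putInt(0);
        w.put(new byte[] {'.', 't', 'x', 't', 0, 0, 0});
        w.put("hello".getBytes(StandardCharsets.US_ASCII));

        TrunkHeader h = new TrunkHeader(image, 0);
        System.out.println(h.fileSize + " " + h.formattedExtName);  // prints: 5 .txt
        System.out.println(new String(h.readData(image, 0), StandardCharsets.US_ASCII)); // prints: hello
    }
}
```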

How to handle existing files

Existing files remain unchanged.

How to modify the client

The Base64 part of the FileID grows by 16 characters, so the length of any field that stores the FileID has to be extended accordingly.

Get FileID information

View file information

docker exec -it storage0 /usr/bin/fdfs_file_info /etc/fdfs/client.conf group1/M00/00/00/rBEAAWCHfWGIFxaOAADq7m4niBIAAAAAQABNwAAAOwA731.pdf
GET FROM SERVER: false

file type: normal
source storage id: 0
source ip address: 172.17.0.1
file create timestamp: 2021-04-27 02:56:33
file size: 60142
file crc32: 1848084498 (0x6e278812)

Getting file ID information in Java

// Uses FileInfo, StorageClient and ProtoCommon from the FastDFS Java client
FileInfo f2 = storageClient.get_file_info("group1", "M00/00/00/rBEAAWCIuzmIeQCWAAE2xZYv3yoAAAAAQABZgAAATcA621.png");
LOG.info(f2.toString());
// Sample file names (without extension):
//rBEAAWCHwtmIWTjVAAFls5d0ZtEAAAAAQAAAAAAAWYA081
//rBEAAWCIuzmIeQCWAAE2xZYv3yoAAAAAQABZgAAATcA621
// Skip the 27-character base part and decode the 16 characters that follow:
// trunk ID, offset and alloc_size, 4 bytes each
byte[] fileId = StorageClient.base64.decodeAuto("rBEAAWCIuzmIeQCWAAE2xZYv3yoAAAAAQABZgAAATcA621".substring(27, 43));
int trunkID = ProtoCommon.buff2int(fileId, 0);
LOG.info("trunk ID:{}", trunkID);
int offset = ProtoCommon.buff2int(fileId, 4);
LOG.info("offset:{}", offset);
int alloc_size = ProtoCommon.buff2int(fileId, 8);
LOG.info("alloc_size:{}", alloc_size);

Restoring source files from a Trunk file

Read the Trunk header in Java and restore the file

// Read the whole trunk file into memory
byte[] in = FileUtil.readBytes("C:\\Users\\zkl\\Desktop\\000001");

// The 16 characters after the 27-character base part hold trunk ID, offset and alloc_size
byte[] fileId = StorageClient.base64.decodeAuto("rBEAAWCIuzmIeQCWAAE2xZYv3yoAAAAAQABZgAAATcA621".substring(27, 43));
int offset = ProtoCommon.buff2int(fileId, 4);
LOG.info("offset:{}", offset);

// Trunk header: filetype (1 byte), alloc_size (4), filesize (4), crc32 (4), mtime (4), formatted_ext_name (7)
int len = ProtoCommon.buff2int(in, offset + 5);
LOG.info("file length:{}", len);

int crc32 = ProtoCommon.buff2int(in, offset + 9);
LOG.info("crc32:{}", crc32);

// trim() drops any padding bytes left in the 7-byte extension field
String ext = new String(in, offset + 17, 7, StandardCharsets.UTF_8).trim();
LOG.info("Extension:{}", ext);

// Copy the file data that follows the 24-byte header
byte[] out = new byte[len];
System.arraycopy(in, offset + 24, out, 0, len);
FileUtil.writeBytes(out, "C:\\Users\\zkl\\Desktop\\" + crc32 + "." + ext);
LOG.info("Restore file complete: C:\\Users\\zkl\\Desktop\\{}.{}", crc32, ext);

FileID resolution

File ID without merge storage

rBEAAWCHwpKAG_IaAAE2xZYv3yo399
 fetch_from_server = false
 file_type = 1
 source_ip_addr = 172.17.0.1 
 file_size = 79557
 create_timestamp = 2021-04-27 15:51:46
 crc32 = -1775247574

Merged file ID

First file fileID: rBEAAWCHwtmIWTjVAAFls5d0ZtEAAAAAQAAAAAAAWYA081

fetch_from_server = false
file_type = 1
source_ip_addr = 172.17.0.1
file_size = 91571
create_timestamp = 2021-04-27 15:52:57
crc32 = -1753979183
trunk ID: 1
offset: 0
alloc_size: 91648

Second file fileID: rBEAAWCIuzmIeQCWAAE2xZYv3yoAAAAAQABZgAAATcA621
fetch_from_server = false
file_type = 1
source_ip_addr = 172.17.0.1
file_size = 79557
create_timestamp = 2021-04-28 09:32:41
crc32 = -1775247574
trunk ID: 1
offset: 91648
alloc_size: 79616

The offset of the second file is exactly the alloc_size of the first file.
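
The sample values are also consistent with a simple allocation rule: alloc_size equals the file size plus the 24-byte trunk header, rounded up to a multiple of slot_min_size (256 bytes). This rule is inferred from the numbers in this article, not taken from the FastDFS source, so treat the following as a sketch. SlotMath and allocSize are names made up for illustration.

```java
public class SlotMath {
    static final int HEADER_SIZE = 24;     // trunk header size, from the layout above
    static final int SLOT_MIN_SIZE = 256;  // slot_min_size from tracker.conf

    /** Assumed rule: (file size + header) rounded up to a multiple of slot_min_size. */
    static int allocSize(int fileSize) {
        int needed = fileSize + HEADER_SIZE;
        return (needed + SLOT_MIN_SIZE - 1) / SLOT_MIN_SIZE * SLOT_MIN_SIZE;
    }

    public static void main(String[] args) {
        int first = allocSize(91571);   // first file stored in the trunk file
        int second = allocSize(79557);  // second file stored in the trunk file
        System.out.println(first);      // prints 91648
        System.out.println(second);     // prints 79616
        // The first file starts at offset 0, so the second one starts at 0 + first = 91648.
        System.out.println(0 + first);  // prints 91648
    }
}
```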

Metadata information

If NameValuePair metadata is supplied with the upload, an additional file named fileid + '-m' is created alongside the uploaded file to store the metadata.

zkl@DESKTOP-9S8GQOE:/var/fdfs/storage0/data/00/00$ ll
total 364
drwxr-xr-x   2 root root     4096 Apr 28 09:32 ./
drwxr-xr-x 258 root root     4096 Apr 27 15:48 ../
-rw-r--r--   1 root root 67108864 Apr 28 16:40 000001
-rw-r--r--   1 root root    79557 Apr 27 15:51 rBEAAWCHwpKAG_IaAAE2xZYv3yo399.png
-rw-r--r--   1 root root       14 Apr 27 15:51 rBEAAWCHwpKAG_IaAAE2xZYv3yo399.png-m
-rw-r--r--   1 root root       14 Apr 27 15:52 rBEAAWCHwtmIWTjVAAFls5d0ZtEAAAAAQAAAAAAAWYA081.png-m
-rw-r--r--   1 root root       14 Apr 28 09:32 rBEAAWCIuzmIeQCWAAE2xZYv3yoAAAAAQABZgAAATcA621.png-m
$ cat rBEAAWCIuzmIeQCWAAE2xZYv3yoAAAAAQABZgAAATcA621.png-m
fileName1.png
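
The -m file is 14 bytes long, while the printed content shows only 13 characters, which suggests one non-printable separator byte. The sketch below parses such content under the assumption that FastDFS separates a name from its value with byte 0x02 and separates pairs with byte 0x01. The class MetaFileParser is made up for illustration, and the split of the sample into name fileName and value 1.png is a guess that happens to match the 14-byte size.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetaFileParser {
    static final String RECORD_SEP = "\u0001"; // assumed separator between name/value pairs
    static final String FIELD_SEP = "\u0002";  // assumed separator between name and value

    /** Splits the raw content of a -m file into name/value pairs, keeping their order. */
    static Map<String, String> parse(String content) {
        Map<String, String> meta = new LinkedHashMap<>();
        if (content.isEmpty()) {
            return meta;
        }
        for (String record : content.split(RECORD_SEP)) {
            String[] kv = record.split(FIELD_SEP, 2);
            meta.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return meta;
    }

    public static void main(String[] args) {
        String content = "fileName" + FIELD_SEP + "1.png"; // guessed content of the sample file
        System.out.println(content.length());  // prints 14
        System.out.println(parse(content));    // prints {fileName=1.png}
    }
}
```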

