[Ceph related] Bucket dynamic resharding
1. Background description
Reference notes:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/object_gateway_guide_for_ubuntu/administration_cli#configuring-bucket-index-sharding
https://ceph.com/community/new-luminous-rgw-dynamic-bucket-sharding/
1.1 Problem description
During performance testing, throughput plummeted (intermittent performance fluctuations) and writes failed for a period of time (latency of more than 100 seconds).
1.2 Troubleshooting
Checking the RGW log shows that too many objects had been written to a single bucket, which triggered the automatic resharding operation:
```
[root@node113 ~]# cat /var/log/ceph/ceph-client.rgw.node113.7480.log | grep reshard
2020-09-16 04:51:50.239505 7fe71d0a7700 0 RGWReshardLock::lock failed to acquire lock on reshard.0000000009 ret=-16
2020-09-16 06:11:56.304955 7fe71d0a7700 0 RGWReshardLock::lock failed to acquire lock on reshard.0000000013 ret=-16
2020-09-16 06:41:58.919390 7fe71d0a7700 0 RGWReshardLock::lock failed to acquire lock on reshard.0000000004 ret=-16
2020-09-16 08:02:00.619906 7fe71d0a7700 0 RGWReshardLock::lock failed to acquire lock on reshard.0000000002 ret=-16
2020-09-16 08:22:01.038502 7fe71d0a7700 0 RGWReshardLock::lock failed to acquire lock on reshard.0000000012 ret=-16
2020-09-16 08:31:58.229956 7fe71d0a7700 0 RGWReshardLock::lock failed to acquire lock on reshard.0000000000 ret=-16
2020-09-16 08:52:06.020018 7fe71d0a7700 0 RGWReshardLock::lock failed to acquire lock on reshard.0000000006 ret=-16
2020-09-16 09:22:12.882771 7fe71d0a7700 0 RGWReshardLock::lock failed to acquire lock on reshard.0000000000 ret=-16
```
Checking the related RGW configuration: the cluster allows at most 100,000 objects per shard, and each bucket is currently created with 8 shards, so once more than 800,000 objects are written to a single bucket the automatic reshard operation is triggered:
```
[root@node111 ~]# ceph --show-config | grep rgw_dynamic_resharding
rgw_dynamic_resharding = true
[root@node111 ~]# ceph --show-config | grep rgw_max_objs_per_shard
rgw_max_objs_per_shard = 100000
[root@node111 ~]# ceph --show-config | grep rgw_override_bucket_index_max_shards
rgw_override_bucket_index_max_shards = 8
[root@node111 ~]# radosgw-admin bucket limit check
"user_id": "lifecycle01",
"buckets": [
    {
        "bucket": "cosbench-test-pool11",
        "tenant": "",
        "num_objects": 31389791,
        "num_shards": 370,
        "objects_per_shard": 84837,
        "fill_status": "OK"
    },
    {
        "bucket": "cycle-1",
        "tenant": "",
        "num_objects": 999,
        "num_shards": 8,
        "objects_per_shard": 124,
        "fill_status": "OK"
    },
```
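The reshard queue itself can also be inspected with radosgw-admin. A minimal check, assuming it is run on a node with an admin keyring and reusing the bucket name from the output above:

```
# Buckets the dynamic reshard thread has scheduled for resharding
radosgw-admin reshard list

# Progress of an in-flight reshard for a specific bucket
radosgw-admin reshard status --bucket=cosbench-test-pool11
```

The ret=-16 in the log corresponds to EBUSY, i.e. another RGW thread or instance already held the lock on that reshard log shard at that moment; such messages are expected while resharding work is pending.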
Parameter description
- rgw_dynamic_resharding
  Introduced in the Luminous release (see the official introduction: https://ceph.com/community/new-luminous-rgw-dynamic-bucket-sharding/) and enabled by default. When a bucket's fill_status reaches OVER 100.000000% (objects_per_shard > rgw_max_objs_per_shard), RGW reshards the bucket dynamically (splitting off new shards and rebalancing the index data). The fatal drawback of this parameter is that the bucket cannot be read from or written to while resharding is in progress, because the index metadata is being redistributed and consistency must be guaranteed; the larger the amount of data, the longer this takes.
- rgw_override_bucket_index_max_shards
  The number of index shards created for a single bucket. The default value is 0 and the maximum is 7877 (i.e. a single bucket can hold at most 7877 × 100,000 objects).
  The shard count is calculated as the expected number of objects in the bucket divided by 100,000. For example, if a bucket is expected to hold 3,000,000 objects, set the shard count to 30 (3,000,000 / 100,000; a small sizing sketch follows this list).
  In the sample cluster this value is 8, i.e. 8 shards are created by default when a bucket is created; once each shard holds more than 100,000 objects, resharding creates additional shards and rebalances the index across all of them.
- rgw_max_objs_per_shard
  The maximum number of objects stored in a single shard; the default value is 100,000.
- rgw_reshard_thread_interval
  The scan interval of the automatic resharding thread; the default is ten minutes.
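As a quick illustration of the sizing rule above, the shard count can be computed from the expected object count. A minimal bash sketch using the 3,000,000-object example (variable names are illustrative):

```
# Rough shard-count sizing: expected objects in the bucket divided by
# rgw_max_objs_per_shard, rounded up
expected_objects=3000000
max_objs_per_shard=100000
echo $(( (expected_objects + max_objs_per_shard - 1) / max_objs_per_shard ))   # prints 30
```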
1.3 Sharding description
- Index object
RGW maintains an index for each bucket that stores metadata for all of the objects in the bucket. RADOS itself has no efficient way to list objects, so the bucket index is involved in object writes, modifications and listing (but not in reads).
The bucket index also serves other purposes, such as keeping the log for versioned buckets, bucket quota metadata, and the change log used for cross-region synchronization.
By default each bucket has a single index object. When that index object grows too large it causes the following problems, which is why the number of objects a bucket can store is limited:
– reliability problems: recovering a huge index object is slow, and in extreme cases the OSD process may hang because of it
– performance problems: every write to the bucket modifies the same index object, so writes are serialized on it
- Bucket sharding
Bucket index sharding was added in the Hammer release to address storing large amounts of data in a single bucket: the bucket index data can be spread across multiple RADOS objects, and the number of objects a bucket can store grows with the number of index shards.
However, this only applies to newly created buckets, so the shard count has to be planned in advance based on the final amount of data expected in the bucket. Once the objects written exceed what the configured shards can carry, write performance plummets, and the shard count then has to be increased manually to absorb further writes (as sketched after this list).
- Dynamic bucket resharding
Dynamic bucket resharding was added in the Luminous release: as the number of stored objects grows, the radosgw process automatically detects buckets that need resharding and schedules the reshard for them.
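Both the index objects and the reshard operations can be examined with standard tooling. A hedged sketch, assuming the default index pool name default.rgw.buckets.index, the bucket cycle-1 from section 1.2, an illustrative target of 16 shards, and the admin socket path of the node113 gateway seen in the log above:

```
# Look up the bucket instance id (the "id" field in the JSON output)
radosgw-admin bucket stats --bucket=cycle-1

# With N shards, the bucket index consists of N RADOS objects named
# .dir.<bucket_id>.0 ... .dir.<bucket_id>.<N-1> in the index pool
rados -p default.rgw.buckets.index ls | grep <bucket_id>

# Manual (offline) reshard to a larger shard count; the bucket cannot be
# written to while this runs
radosgw-admin bucket reshard --bucket=cycle-1 --num-shards=16

# Check whether dynamic resharding is active on a running gateway
ceph daemon /var/run/ceph/ceph-client.rgw.node113.7480.asok config get rgw_dynamic_resharding
```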
2. Solutions
There are two main cases.
2.1 The final number of objects written to a single bucket can be determined
Turn off dynamic resharding (to avoid the performance slump while a reshard is in progress) and set the shard count according to the final number of objects that will be written to the bucket.
For example, if a single bucket will eventually hold 3,000,000 objects, set the shard count to 30. Add the parameters to the [global] section of the /etc/ceph/ceph.conf configuration file:

```
[root@node45 ~]# cat /etc/ceph/ceph.conf
[global]
rgw_dynamic_resharding = false
rgw_override_bucket_index_max_shards = 30
```

Restart the RGW service process:

```
[root@node45 ~]# systemctl restart ceph-radosgw.target
```
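Note that rgw_override_bucket_index_max_shards only applies to buckets created after the change (as described in section 1.3, the shard count is fixed at bucket creation); existing buckets keep their current shard count and would need a manual reshard. A quick check after the restart, using the same command as in section 1.2:

```
# Newly created buckets should now report num_shards = 30 and fill_status "OK"
radosgw-admin bucket limit check
```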
2.2 The final number of objects written to a single bucket is uncertain
Leave dynamic resharding enabled and set an approximate initial shard count.
Add the parameter to the [global] section of the /etc/ceph/ceph.conf configuration file:

```
[root@node45 ~]# cat /etc/ceph/ceph.conf
[global]
rgw_override_bucket_index_max_shards = 8
```

Restart the RGW service process:

```
[root@node45 ~]# systemctl restart ceph-radosgw.target
```
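Since dynamic resharding stays enabled in this case, each reshard will still briefly stall writes to the affected bucket, so it is worth keeping an eye on shard fill levels. A simple check that could be run periodically:

```
# Bucket names next to their shard fill status; anything other than "OK"
# (e.g. "OVER 100.000000%") is about to be resharded automatically
radosgw-admin bucket limit check | grep -E '"bucket"|"fill_status"'

# Buckets already queued by the dynamic reshard thread
radosgw-admin reshard list
```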