Ceph version: 14.2.22
Linux version: Ubuntu server 18.04
Part I: downloading the Ceph source code
1.1 Configure a GitHub mirror source
The Ceph source code is hosted on GitHub. Access to GitHub can be very slow from mainland China, so it is better to fetch the source code from a mirror. Several commonly used GitHub mirror sites are available:
Modify the local ~/.gitconfig file so that the source code is fetched from the mirror site. The relevant configuration is as follows:
# GitHub mirror source
[url "https://hub.fastgit.org/"]
    insteadOf = https://github.com/
Note: there are also mirrors of the Ceph source code in China, such as Gitee and GitCode, but obtaining the code from them is not recommended. Ceph pulls in a large number of third-party projects as git submodules, and Gitee and GitCode do not necessarily mirror all of those submodules. The mirror sites above, by contrast, are fully synchronized with GitHub and can be used safely.
1.2 Clone the Ceph source code
The Ceph source tree is large, so choose the version or branch you need. This walkthrough pulls the v14.2.22 tag. The difference between a tagged version and a branch: the code of a tagged version does not change over time, it is fixed at the moment the tag is created; the code of a branch keeps changing as development continues.
# Replace v14.2.22 with the version you need
git clone -b v14.2.22 --depth=1 https://github.com/ceph/ceph.git
1.3 Synchronize the submodule source code
The Ceph source code uses a large number of submodules, all of which are listed in the ceph/.gitmodules file. When the do_cmake.sh script later generates the build directory, it first synchronizes the submodule sources into their directories. In practice, submodule synchronization is prone to failing or completing only partially, which directly breaks the creation of the build directory. To avoid this, it is recommended to synchronize the submodule sources manually in advance.
git submodule update --init --recursive
Note: if synchronizing the submodules fails, simply repeat the command above. If you interrupt the synchronization, however, you must delete all files of the affected submodule in its directory, especially the .git file. If .git is not deleted, re-running the command will skip that submodule, leaving its source code missing. This problem is hard to detect, because the command still reports success and does not indicate which submodule was skipped.
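As a minimal recovery sketch, run the following from the ceph source root (the submodule path src/rocksdb is only a hypothetical example; substitute whichever submodule actually failed):
# Hypothetical example: re-sync the src/rocksdb submodule after an interrupted sync
rm -rf src/rocksdb                                  # remove the partial files, including the .git file
git submodule update --init --recursive src/rocksdb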
Part II: compiling the source code
2.1 Install dependencies
Installing the build dependencies for Ceph is straightforward: run the install-deps.sh script in the root directory of the source tree. In practice the script has a few problems and needs to be modified slightly first.
2.1.1 Modify the Launchpad source
The script installs the GCC toolchain; only one source URL for the packages needs to be kept. Modify the ensure_decent_gcc_on_ubuntu function in install-deps.sh:
deb [lang=none] http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu $codename main
#deb [arch=amd64 lang=none] http://mirror.cs.uchicago.edu/ubuntu-toolchain-r $codename main
#deb [arch=amd64,i386 lang=none] http://mirror.yandex.ru/mirrors/launchpad/ubuntu-toolchain-r $codename main
2.1.2 Disable the libboost installation
The script installs the libboost packages, but the boost source package is downloaded again anyway while compiling the source code. Therefore libboost should not be installed by the script; comment out the call in the following two places in install-deps.sh:
*Bionic*)
    #install_boost_on_ubuntu bionic
    ;;
2.1.3 Set a PyPI mirror source
The script also installs packages from PyPI. Downloads from the default index URL are slow, so set a PyPI mirror. Create the ~/.pip/pip.conf file and add the following content:
[global]
index-url = https://mirrors.aliyun.com/pypi/simple/
[install]
trusted-host = mirrors.aliyun.com
2.1.4 Install other dependencies
Many places in the source code use the zstd library. By default Ubuntu 18.04 only ships libzstd1, which is not sufficient for compilation; libzstd1-dev needs to be installed:
sudo apt install libzstd1-dev
2.1.5 Run the script
./install-deps.sh
2.2 Compile the Ceph source code
2.2.1 Enable debug mode
To debug the Ceph source code, the build type must be set to Debug; the default is Release, which cannot be debugged. Add the following lines after set(VERSION 14.2.22) in the ceph/CMakeLists.txt file:
set(CMAKE_BUILD_TYPE "Debug")
set(CMAKE_CXX_FLAGS_DEBUG "-O0 -Wall -g")
set(CMAKE_CXX_FLAGS "-O0 -Wall -g")
set(CMAKE_C_FLAGS "-O0 -Wall -g")
2.2.2 Generate the build directory
Run the do_cmake.sh script directly. It performs a series of checks, including whether the source tree is complete and whether the dependencies are installed. If anything goes wrong, the resulting build directory will be incomplete; the most direct consequence is that the Makefile cannot be generated and compilation fails.
./do_cmake.sh
2.2.3 Download the boost source package
During make, the build automatically downloads boost_1_72_0.tar.bz2. Because of the download address and network conditions, this download is very slow. To save time, download it manually in advance from https://download.ceph.com/qa/boost_1_72_0.tar.bz2 and put the package in ceph/build/boost/src.
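A minimal sketch of the manual download, assuming wget is available and the build directory has already been generated:
# Download the boost source package manually and place it where the build expects it
wget https://download.ceph.com/qa/boost_1_72_0.tar.bz2
mkdir -p ceph/build/boost/src
mv boost_1_72_0.tar.bz2 ceph/build/boost/src/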
2.2.4 Compile
make must be run from the ceph/build directory. The Ceph source can be compiled as a whole or one module at a time. Use make's -j option for multi-threaded compilation to speed things up, but allocate the thread count sensibly; compiling with 4 threads is recommended.
# Method 1: compile everything
make all -j4
# Method 2: compile a single module, e.g. the OSD
make ceph-osd -j4
# View all available targets
make help
Note: compilation produces many library files and binaries, which are placed in the ceph/build/lib and ceph/build/bin directories respectively.
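As a quick sanity check (a hypothetical step, not part of the original instructions), you can confirm that a freshly built binary runs and reports the expected version:
cd ceph/build
./bin/ceph-osd --version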
Part III: deploying a debug cluster
3.1 Deploy the cluster
The Ceph source code provides a script for deploying a development cluster: vstart.sh. The script uses the local IP with different ports to configure the MON, MGR, OSD, and other daemons. Switch to the build directory and run the following command to deploy a new cluster:
MON=1 OSD=6 MDS=0 MGR=1 RGW=0 ../src/vstart.sh -d -n -x --without-dashboard
Parameter interpretation:
- MON, OSD, MDS, MGR: the number of each daemon to start
- -d: debug, turn on debug mode
- -n: new, create a new cluster
- -x: cephx, enable cephx authentication
- --without-dashboard: disable the mgr dashboard module; testing showed that if it is not disabled, the deployment reports an error
3.2 View the cluster status
Switch to the build directory and execute the following command to view the cluster status:
./bin/ceph -s
The result is as follows:
  cluster:
    id:     88b11a21-7dd1-49d8-bb24-c18821ff09ae
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a (age 5m)
    mgr: x(active, since 5m)
    osd: 6 osds: 6 up (since 4m), 6 in (since 4m)

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   12 GiB used, 594 GiB / 606 GiB avail
    pgs:
Note: in version 14.2.22 the vstart.sh script does not add the ceph executables to the system PATH, so all ceph commands must be executed from the build directory.
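If prefixing every command with ./bin/ becomes tedious, one convenience (an assumption, not part of the original workflow) is to add the build bin directory to PATH for the current shell:
# Only affects the current shell session; path taken from the process list shown later
export PATH=$PATH:/home/admins/code/ceph/build/bin
ceph -s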
3.3 Deploy a Ceph tiered storage structure
This case needs to debug Ceph's tiered storage feature, so a simple tiered storage layout is built: the cluster has 6 OSDs, and two pools are created, a cache pool and an EC pool, with 3 OSDs allocated to each.
For the detailed deployment steps, please refer to the dedicated article (still in preparation); a rough sketch is given below.
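The sketch shows one possible set of commands for the cache-tier layout; the pool name ec_pool, the EC profile, and the PG counts are assumptions, and the CRUSH rules needed to pin each pool to its own 3 OSDs are omitted:
# Run from the build directory; names and PG counts are only illustrative
./bin/ceph osd erasure-code-profile set ec_profile k=2 m=1
./bin/ceph osd pool create ec_pool 8 8 erasure ec_profile
./bin/ceph osd pool create cache_pool 8 8
./bin/ceph osd tier add ec_pool cache_pool
./bin/ceph osd tier cache-mode cache_pool writeback
./bin/ceph osd tier set-overlay ec_pool cache_pool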
Part IV: code debugging
4.1 View the PG-to-OSD mapping
If you read the source code carefully, you will find that the tiered storage logic is mainly executed by the primary OSD of a PG; if you attach to an OSD that is not the primary, the debugger will never reach that code. Therefore you first need to look up the PG mapping of the cache pool used by the tiered storage.
# Switch to the build directory and execute the following command
./bin/ceph pg ls-by-pool cache_pool
PG  OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG STATE        SINCE VERSION REPORTED UP        ACTING    SCRUB_STAMP                DEEP_SCRUB_STAMP
5.0       0        0         0       0     0           0          0  18 active+clean   22h  323'18   323:76 [2,4,0]p2 [2,4,0]p2 2021-09-25 16:55:28.572062 2021-09-24 11:30:14.717641
From the output, the primary OSD of PG 5.0 is osd.2.
4.2 Find the primary OSD process
Execute the following command:
ps -ef | grep ceph
The result is as follows:
admins   10961 19680  0 15:12 pts/0    00:00:00 grep --color=auto ceph
admins   18474     1  1 Sep24 ?        01:02:09 /home/admins/code/ceph/build/bin/ceph-mon -i a -c /home/admins/code/ceph/build/ceph.conf
admins   18582     1  1 Sep24 ?        00:33:41 /home/admins/code/ceph/build/bin/ceph-mgr -i x -c /home/admins/code/ceph/build/ceph.conf
admins   18806     1  1 Sep24 ?        00:41:15 /home/admins/code/ceph/build/bin/ceph-osd -i 1 -c /home/admins/code/ceph/build/ceph.conf
admins   19096     1  1 Sep24 ?        00:41:06 /home/admins/code/ceph/build/bin/ceph-osd -i 3 -c /home/admins/code/ceph/build/ceph.conf
admins   19242     1  1 Sep24 ?        00:40:37 /home/admins/code/ceph/build/bin/ceph-osd -i 4 -c /home/admins/code/ceph/build/ceph.conf
admins   19415     1  1 Sep24 ?        00:41:00 /home/admins/code/ceph/build/bin/ceph-osd -i 5 -c /home/admins/code/ceph/build/ceph.conf
admins   20385     1  1 Sep24 ?        00:39:47 /home/admins/code/ceph/build/bin/ceph-osd -i 0 -c /home/admins/code/ceph/build/ceph.conf
admins   22235     1  1 Sep24 ?        00:40:24 /home/admins/code/ceph/build/bin/ceph-osd -i 2 -c /home/admins/code/ceph/build/ceph.conf
As can be seen from the output, the process ID of the primary OSD (osd.2) is 22235.
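A more targeted variant of the lookup (hypothetical, not in the original) filters directly on the OSD id, and the bracket trick keeps the grep process itself out of the result:
ps -ef | grep "[c]eph-osd -i 2"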
4.3 GDB multi-threaded debugging
The general usage of GDB multi-threaded debugging on Linux is not covered here; consult other references if you need the background. Below are only the debugging steps for this case.
4.3.1 Enter gdb
gdb needs to be run with administrator privileges; execute the following command to enter gdb:
sudo gdb
The result is as follows:
[sudo] password for admins:
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb)
4.3.2 Attach to the osd.2 process
(gdb) attach 22235
Attaching to process 22235
[New LWP 22237]
[New LWP 22238]
[New LWP 22239]
[New LWP 22248]
[New LWP 22249]
[New LWP 22250]
[New LWP 22251]
[New LWP 22254]
[New LWP 22255]
[New LWP 22256]
[New LWP 22257]
[New LWP 22258]
[New LWP 22259]
[New LWP 22260]
[New LWP 22269]
[New LWP 22270]
[New LWP 22271]
........
........
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fd026a7dad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55b3123d8910)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
88      ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb)
4.3.3 Set breakpoints
# In this example, the breakpoint is set at the start of the PrimaryLogPG::do_op function
(gdb) b PrimaryLogPG.cc:1952
Breakpoint 1 at 0x55b305d28af2: file /home/admins/code/ceph/src/osd/PrimaryLogPG.cc, line 1952.
# After setting the breakpoint, continue execution
(gdb) c
Continuing.
4.3.4 Test
Write data to the storage pool; the result is as follows:
[Switching to Thread 0x7fd0034cb700 (LWP 22364)]

Thread 57 "tp_osd_tp" hit Breakpoint 1, PrimaryLogPG::do_op (this=0x55b312519400, op=...)
    at /home/admins/code/ceph/src/osd/PrimaryLogPG.cc:1952
1952    {
As the output shows, when data is written the process stops at line 1952 of the code. From here you can debug with the usual gdb commands, just as with any program. Note, however, that because of the Ceph OSD heartbeat mechanism, if an OSD stays stopped in the debugger for too long it will be marked down and debugging cannot continue; you then have to re-enter gdb.
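One possible mitigation (an assumption, not part of the original steps) is to set the nodown and noout flags while debugging so the cluster does not react to the stalled OSD, and to clear them afterwards:
# Before debugging: keep the cluster from marking OSDs down or out
./bin/ceph osd set nodown
./bin/ceph osd set noout
# After debugging: restore normal behaviour
./bin/ceph osd unset nodown
./bin/ceph osd unset noout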