Ceph source code compilation and GDB debugging under Linux

Posted by mgs019 on Fri, 28 Jan 2022 07:03:15 +0100

Ceph version: 14.2.22
Linux version: Ubuntu server 18.04

 
 

Part I: downloading the Ceph source code

1.1 Configure a GitHub mirror source

The Ceph source code is hosted on GitHub. For various reasons, access to GitHub from mainland China is very slow, so we need a faster way to fetch the source. Two commonly used GitHub mirror sites:

  1. https://github.com.cnpmjs.org/
  2. https://hub.fastgit.org/

To fetch the source from one of the mirror sites above, modify the local ~/.gitconfig file. The relevant configuration is as follows:

# GitHub mirror source
[url "https://hub.fastgit.org/"]
        insteadOf = https://github.com/
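To confirm that git actually parses the rewrite, you can query the insteadOf key back. A quick check (a sketch; the temporary file below stands in for ~/.gitconfig so nothing is overwritten):

```shell
# Write the mirror rewrite to a throwaway config file, then ask git to
# resolve the insteadOf key; it should print https://github.com/
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[url "https://hub.fastgit.org/"]
        insteadOf = https://github.com/
EOF
git config --file "$cfg" --get 'url.https://hub.fastgit.org/.insteadOf'
```

Once the same stanza is in ~/.gitconfig, any clone of a https://github.com/ URL is transparently fetched from the mirror instead.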

Note: there are also mirrors of the Ceph source in China, such as Gitee and GitCode, but fetching from them is not recommended. Ceph uses a large number of third-party projects as submodules, and Gitee/GitCode do not necessarily mirror all of them. The two mirror sites above, by contrast, are fully synchronized with GitHub and are safe to use.

1.2 Clone the Ceph source code

The Ceph source tree is large; download whichever version or branch you need. This article pulls the v14.2.22 tag. The difference between a version tag and a branch: a tag's code never changes, being fixed at the moment it is created, while a branch's code keeps evolving as development continues.

# Replace v14.2.22 with the version you need
git clone -b v14.2.22 --depth=1 https://github.com/ceph/ceph.git

1.3 Synchronize the submodule source code

The Ceph source uses a large number of submodules, all listed in the ceph/.gitmodules file. Later, when the do_cmake.sh script generates the build directory, it first synchronizes the submodule sources into their target directories. Experience shows that submodule synchronization easily ends up incomplete or fails outright, which directly causes the build-directory generation to fail. To prevent this, it is recommended to synchronize the submodule sources manually in advance.

git submodule update --init --recursive

Note: if synchronizing the submodules fails, simply repeat the command above. If you interrupt the synchronization, however, you must delete all files of the affected submodule in its directory, especially the .git file. If .git is not deleted, repeating the command will silently skip that submodule, and its source will be missing. This is hard to detect: the command still reports success and gives no hint about which submodule was skipped.
 
 

Part II: compiling the source code

2.1 Install dependencies

Installing Ceph's build dependencies is simple: run the install-deps.sh script in the root of the source tree. Experience shows the script has a few problems and needs minor modifications first.

2.1.1 Modify the launchpad source

The script installs the GCC toolchain; only one source URL for the packages needs to be kept. Modify the ensure_decent_gcc_on_ubuntu function in the install-deps.sh script:

deb [lang=none] http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu $codename main
#deb [arch=amd64 lang=none] http://mirror.cs.uchicago.edu/ubuntu-toolchain-r $codename main
#deb [arch=amd64,i386 lang=none] http://mirror.yandex.ru/mirrors/launchpad/ubuntu-toolchain-r $codename main

2.1.2 Skip the libboost installation

The script installs the libboost packages, but the boost source package is downloaded and built again anyway while compiling Ceph, so the script should not install libboost. Comment out the install_boost_on_ubuntu calls in install-deps.sh, for example:

        *Bionic*)
                #install_boost_on_ubuntu bionic
                ;;

2.1.3 Set a PyPI mirror source

The script installs packages from PyPI, and the default index URL is slow to download from. Set a PyPI mirror: create the ~/.pip/pip.conf file and add the following:

[global]
index-url = https://mirrors.aliyun.com/pypi/simple/
[install]
trusted-host=mirrors.aliyun.com

2.1.4 Install other dependencies

While compiling the source you will run into many places that use the zstd library. By default Ubuntu 18.04 ships only libzstd1, which is not enough for building; the development package needs to be installed as well:

sudo apt install libzstd1-dev

2.1.5 Run the script

./install-deps.sh

 

2.2 Compile the Ceph source code

2.2.1 Enable Debug mode

To debug the Ceph source code, the build must be configured in Debug mode; the default Release mode cannot be debugged. Add the following lines after set(VERSION 14.2.22) in the ceph/CMakeLists.txt file:

set(CMAKE_BUILD_TYPE "Debug")
set(CMAKE_CXX_FLAGS_DEBUG "-O0 -Wall -g")
set(CMAKE_CXX_FLAGS "-O0 -Wall -g")
set(CMAKE_C_FLAGS "-O0 -Wall -g ")

2.2.2 Generate the build directory

Run the do_cmake.sh script directly. It performs a series of checks, including whether the source is complete and whether the dependencies are installed. If anything is wrong, the generated build directory is incomplete; the most direct impact is that the Makefile cannot be generated, so compilation fails.

./do_cmake.sh

2.2.3 Download the boost source package

During make, the build automatically downloads boost_1_72_0.tar.bz2; because of the download address and network conditions this is very slow. To save time, download it manually in advance from https://download.ceph.com/qa/boost_1_72_0.tar.bz2 and put the package in ceph/build/boost/src.

2.2.4 Compile

make must be run in the ceph/build directory. The Ceph source can be compiled one module at a time or all at once. Pass -j to make for multi-threaded compilation to speed things up, but allocate the thread count reasonably; 4 threads are recommended.

# Method 1: compile everything
make all -j4
# Method 2: compile the osd module separately
make ceph-osd -j4
# View all available targets
make help

Note: compilation generates many library files and binaries, placed in the ceph/build/lib and ceph/build/bin directories respectively.
 
 

Part III: deploying a Debug-build cluster

3.1 Deploy the cluster

The Ceph source provides a script for deploying a development cluster: vstart.sh. The script configures the MON, MGR, OSDs, etc. on the local IP using different ports. Switch to the build directory and run the following command to deploy a new cluster:

MON=1 OSD=6 MDS=0 MGR=1 RGW=0 ../src/vstart.sh -d -n  -x  --without-dashboard

Parameter interpretation:

  1. MON, OSD, MDS and MGR set how many of each daemon to configure
  2. -d: debug, turns on debug mode
  3. -n: new, creates a new cluster
  4. -x: cephx, enables cephx authentication
  5. --without-dashboard: disables the mgr dashboard module; testing showed that deployment fails with an error unless the dashboard is disabled

3.2 View the cluster status

Switch to the build directory and run the following command to view the cluster status:

./bin/ceph -s 

The output is as follows:

  cluster:
    id:     88b11a21-7dd1-49d8-bb24-c18821ff09ae
    health: HEALTH_OK
 
  services:
    mon: 1 daemons, quorum a (age 5m)
    mgr: x(active, since 5m)
    osd: 6 osds: 6 up (since 4m), 6 in (since 4m)
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   12 GiB used, 594 GiB / 606 GiB avail
    pgs:   

Note: in Ceph 14.2.22 the vstart.sh script does not add the ceph executables to the system PATH, so all ceph commands must be run from the build directory.
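If running everything from the build directory gets tedious, the bin directory can be put on PATH for the current shell session (a convenience sketch; the checkout path below matches this article's setup and is an assumption):

```shell
# Prepend the build's bin directory to PATH so the "./bin/" prefix
# is no longer needed in this shell session.
CEPH_BUILD="$HOME/code/ceph/build"   # adjust to your own tree
export PATH="$CEPH_BUILD/bin:$PATH"
# quick confirmation that the directory is now on PATH
case ":$PATH:" in *":$CEPH_BUILD/bin:"*) echo "on PATH" ;; esac
```

This only affects the current shell; add the export line to ~/.bashrc to make it permanent.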

3.3 Deploy a Ceph tiered-storage structure

This article debugs Ceph's tiered-storage feature, so a simple tiered structure is built: the cluster's 6 OSDs are split across 2 pools, a cache pool and an EC pool, with 3 OSDs allocated to each.
For detailed deployment steps, please refer to (the article is still under preparation)
 
 

Part IV: code debugging

4.1 View the PG-OSD mapping

If you read the source code carefully, you will find that in Ceph tiered storage the primary OSD process does the main work; attached to a non-primary OSD, you will never step into the relevant code. So first look up the PG mapping of the cache pool used for tiering.

# Switch to the build directory and run the following command
./bin/ceph pg ls-by-pool cache_pool

PG  OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG STATE        SINCE VERSION REPORTED UP        ACTING    SCRUB_STAMP                DEEP_SCRUB_STAMP           
5.0       0        0         0       0     0           0          0  18 active+clean   22h  323'18   323:76 [2,4,0]p2 [2,4,0]p2 2021-09-25 16:55:28.572062 2021-09-24 11:30:14.717641 

The output shows that the primary OSD of pg 5.0 is osd.2 (the p2 suffix in the UP/ACTING columns).
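The primary can also be pulled out of the listing mechanically: the p<N> suffix on the ACTING column names the primary OSD. An awk sketch against the sample row above (the field position is taken from this particular output format and is an assumption):

```shell
# "[2,4,0]p2" in the ACTING column (field 15 here) means the acting set
# is osds 2, 4, 0 and the primary is osd.2; split on "p" to extract it.
echo "5.0 0 0 0 0 0 0 0 18 active+clean 22h 323'18 323:76 [2,4,0]p2 [2,4,0]p2" |
awk '{ split($15, a, "p"); print "primary: osd." a[2] }'
```

In practice you would pipe the output of ./bin/ceph pg ls-by-pool cache_pool into the awk instead of the echo.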

4.2 View the primary OSD process

Run the following command:

ps -ef | grep ceph

The output is as follows:

admins   10961 19680  0 15:12 pts/0    00:00:00 grep --color=auto ceph
admins   18474     1  1 Sep24 ?        01:02:09 /home/admins/code/ceph/build/bin/ceph-mon -i a -c /home/admins/code/ceph/build/ceph.conf
admins   18582     1  1 Sep24 ?        00:33:41 /home/admins/code/ceph/build/bin/ceph-mgr -i x -c /home/admins/code/ceph/build/ceph.conf
admins   18806     1  1 Sep24 ?        00:41:15 /home/admins/code/ceph/build/bin/ceph-osd -i 1 -c /home/admins/code/ceph/build/ceph.conf
admins   19096     1  1 Sep24 ?        00:41:06 /home/admins/code/ceph/build/bin/ceph-osd -i 3 -c /home/admins/code/ceph/build/ceph.conf
admins   19242     1  1 Sep24 ?        00:40:37 /home/admins/code/ceph/build/bin/ceph-osd -i 4 -c /home/admins/code/ceph/build/ceph.conf
admins   19415     1  1 Sep24 ?        00:41:00 /home/admins/code/ceph/build/bin/ceph-osd -i 5 -c /home/admins/code/ceph/build/ceph.conf
admins   20385     1  1 Sep24 ?        00:39:47 /home/admins/code/ceph/build/bin/ceph-osd -i 0 -c /home/admins/code/ceph/build/ceph.conf
admins   22235     1  1 Sep24 ?        00:40:24 /home/admins/code/ceph/build/bin/ceph-osd -i 2 -c /home/admins/code/ceph/build/ceph.conf

As the output shows, the primary OSD process (ceph-osd -i 2) has PID 22235.
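The PID lookup can also be done in one pipeline (a sketch; the echo below stands in for ps -ef so the example is self-contained):

```shell
# Print the PID (second field of ps -ef output) of the ceph-osd process
# started with "-i 2"; in real use, replace the echo line with: ps -ef
echo "admins 22235 1 1 Sep24 ? 00:40:24 /home/admins/code/ceph/build/bin/ceph-osd -i 2 -c /home/admins/code/ceph/build/ceph.conf" |
awk '/ceph-osd -i 2 /{ print $2 }'
```

On a live cluster, pgrep -f 'ceph-osd -i 2' is a shorter alternative that also avoids accidentally matching the filter process itself in the ps listing.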

4.3 GDB multithreaded debugging

The general usage of gdb multithreaded debugging on Linux is not covered here; look it up if you need the background. Below are only the debugging steps for this case.

4.3.1 Enter gdb mode

gdb debugging requires administrator privileges. Run the following command to enter gdb mode:

sudo gdb

The output is as follows:

[sudo] password for admins: 
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) 

4.3.2 Attach to the osd.2 process

(gdb) attach 22235
Attaching to process 22235
[New LWP 22237]
[New LWP 22238]
[New LWP 22239]
[New LWP 22248]
[New LWP 22249]
[New LWP 22250]
[New LWP 22251]
[New LWP 22254]
[New LWP 22255]
[New LWP 22256]
[New LWP 22257]
[New LWP 22258]
[New LWP 22259]
[New LWP 22260]
[New LWP 22269]
[New LWP 22270]
[New LWP 22271]
........
........
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fd026a7dad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55b3123d8910) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
88        ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb)

4.3.3 Set a breakpoint

# In this example the breakpoint is set at the start of the PrimaryLogPG::do_op function
(gdb) b PrimaryLogPG.cc:1952
Breakpoint 1 at 0x55b305d28af2: file /home/admins/code/ceph/src/osd/PrimaryLogPG.cc, line 1952.

# After setting the breakpoint, continue execution
(gdb) c
Continuing.

4.3.4 Test

Write data to the storage pool; the result is as follows:

[Switching to Thread 0x7fd0034cb700 (LWP 22364)]
Thread 57 "tp_osd_tp" hit Breakpoint 1, PrimaryLogPG::do_op (this=0x55b312519400, op=...) 
at /home/admins/code/ceph/src/osd/PrimaryLogPG.cc:1952
1952        {

As the results above show, when data is written the process stops at line 1952 of the code, and from here you can debug with the usual gdb commands. Note, however, that because of the ceph-osd heartbeat mechanism, if a debugged OSD stays stopped for too long it will be marked down and debugging cannot continue; you will have to re-enter gdb mode!