[Ceph] /etc/init.d/ceph: osd.1 not found (/etc/ceph/ceph.conf defines, /var/lib/ceph defines) solution

Posted by olsrey on Fri, 17 Sep 2021 14:41:42 +0200

Error description

One of the OSDs in the cluster is down, but when we try to start it, the following error is reported [even though this OSD definitely exists]:

[root@stor-21 ~]# service ceph start osd.1
/etc/init.d/ceph: osd.1 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )

Information verification

osd status

Command: ceph osd tree
This checks whether the status is down; more importantly, it confirms that this host and its OSDs are present in the cluster's OSD tree.

[root@stor-21 ~]# ceph osd tree | grep -A 9 stor-21
-3 18.20000     host stor-21                                   
 1  1.81999         osd.1         down  0.88808          1.00000 
 3  1.81999         osd.3         down  1.00000          1.00000 
 6  1.81999         osd.6         down  1.00000          1.00000 
 9  1.81999         osd.9         down  0.96855          1.00000 
12  1.81999         osd.12        down  1.00000          1.00000 
15  1.81999         osd.15        down  0.90279          1.00000 
18  1.81999         osd.18        down  1.00000          1.00000 
21  1.81999         osd.21        down  0.86520          1.00000 
27  1.81999         osd.27        down  0.89455          1.00000 
[root@stor-21 ~]# 

Do the OSD directories exist?

The path is /var/lib/ceph/osd. It contains this host's OSD directories, which correspond to the OSD tree above.

[root@stor-21 osd]# pwd
/var/lib/ceph/osd
[root@stor-21 osd]# 
[root@stor-21 osd]# ls
ceph-1  ceph-12  ceph-15  ceph-18  ceph-21  ceph-24  ceph-27  ceph-3  ceph-6  ceph-9
[root@stor-21 osd]# 

Do the hard disks exist?

Use lsblk to check whether the number of hard disks is correct. If the mount path next to a disk is missing at this point, that is normal, because the OSD is down [the output below was captured after everything was back to normal].

[root@stor-21 osd]# lsblk | tail -n 15
└─sdp1                   8:241  1  1.8T  0 part /var/lib/ceph/osd/ceph-12
sdq                     65:0    1  1.8T  0 disk 
└─sdq1                  65:1    1  1.8T  0 part /var/lib/ceph/osd/ceph-15
sdr                     65:16   1  1.8T  0 disk 
└─sdr1                  65:17   1  1.8T  0 part /var/lib/ceph/osd/ceph-18
sds                     65:32   1  1.8T  0 disk 
└─sds1                  65:33   1  1.8T  0 part /var/lib/ceph/osd/ceph-21
sdt                     65:48   1  1.8T  0 disk 
└─sdt1                  65:49   1  1.8T  0 part /var/lib/ceph/osd/ceph-24
sdu                     65:64   1  1.8T  0 disk 
└─sdu1                  65:65   1  1.8T  0 part /var/lib/ceph/osd/ceph-27
sr0                     11:0    1 1024M  0 rom  
sr1                     11:1    1 1024M  0 rom  
sr2                     11:2    1 1024M  0 rom  
sr3                     11:3    1 1024M  0 rom  
[root@stor-21 osd]# 

mon node status

  • Whether the current host is a monitor node can be verified in any number of ways; I won't go into them here.
  • In my case, host stor-21 is a mon node.
    The command to view the mon node status is: service ceph status mon.<hostname>
[root@stor-21 osd]# service ceph status mon.stor-21
=== mon.stor-21 === 
mon.stor-21: running {"version":"0.94.6"}
[root@stor-21 osd]# 
  • If the status is not running, start it.
    Command: service ceph start mon.<hostname>, as in the sketch below.
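
A minimal sketch of that check-and-start step, assuming the mon id matches the short hostname as it does on stor-21 (adjust for your host):

# Check the mon status on this host; if it is not running, start it.
# Assumes the mon id equals the short hostname (e.g. stor-21).
host=$(hostname -s)
if ! service ceph status mon.$host | grep -q running; then
    service ceph start mon.$host
fi
service ceph status mon.$host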

Solution

  • Again, if starting the osd with the service command reports the error above, use the following method.

  • There are two forms of the startup command [run them on the host the osd belongs to]:

    • 1: service ceph start osd.<id> [the id can be seen in ceph osd tree]
    • 2: /etc/init.d/ceph start osd.<id> [the id can be seen in ceph osd tree]
  • In this case, we can start the osd by activating it instead. The command is:
    ceph-disk activate /dev/<data partition> [lsblk shows the partition name]
    See the details below, and the sketch after the lsblk output.

# Here are the historical commands
  944  ceph-disk activate /dev/sdm1
  945  ceph-disk activate /dev/sdn1
  946  service ceph status
  947  ceph-disk activate /dev/sdo1
  948  ceph-disk activate /dev/sdp1
  949  service ceph status
  950  service ceph start mon.stor-21
  951  service ceph status
  952  ceph-disk activate /dev/sdq1
  953  ceph-disk activate /dev/sdr1
  954  ceph-disk activate /dev/sds1
  955  ceph-disk activate /dev/sdt1
  956  lsblk


# The following is the disk-to-OSD mapping seen by lsblk
sdl                      8:176  1  1.8T  0 disk 
└─sdl1                   8:177  1  1.8T  0 part /var/lib/ceph/osd/ceph-1
sdm                      8:192  1  1.8T  0 disk 
└─sdm1                   8:193  1  1.8T  0 part /var/lib/ceph/osd/ceph-3
sdn                      8:208  1  1.8T  0 disk 
└─sdn1                   8:209  1  1.8T  0 part /var/lib/ceph/osd/ceph-6
sdo                      8:224  1  1.8T  0 disk 
└─sdo1                   8:225  1  1.8T  0 part /var/lib/ceph/osd/ceph-9
sdp                      8:240  1  1.8T  0 disk 
└─sdp1                   8:241  1  1.8T  0 part /var/lib/ceph/osd/ceph-12
sdq                     65:0    1  1.8T  0 disk 
└─sdq1                  65:1    1  1.8T  0 part /var/lib/ceph/osd/ceph-15
sdr                     65:16   1  1.8T  0 disk 
└─sdr1                  65:17   1  1.8T  0 part /var/lib/ceph/osd/ceph-18
sds                     65:32   1  1.8T  0 disk 
└─sds1                  65:33   1  1.8T  0 part /var/lib/ceph/osd/ceph-21
sdt                     65:48   1  1.8T  0 disk 
└─sdt1                  65:49   1  1.8T  0 part /var/lib/ceph/osd/ceph-24
sdu                     65:64   1  1.8T  0 disk 
└─sdu1                  65:65   1  1.8T  0 part /var/lib/ceph/osd/ceph-27
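
Rather than activating each partition by hand, the same idea can be scripted. A minimal sketch, assuming the /dev/sdl1..sdu1 device range shown above (change it for your host):

# Activate every ceph data partition that is not currently mounted.
# The device range matches the lsblk output above - adjust it for your host.
for part in /dev/sd{l..u}1; do
    if [ -z "$(lsblk -no MOUNTPOINT "$part")" ]; then
        echo "activating $part"
        ceph-disk activate "$part"
    fi
done
service ceph status    # the OSDs should now report as running
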
  • As a last resort, if the method above still fails to mount the disks: we know the partitions are normally mounted onto the OSD directories by ceph-disk (or the steps above), so we can mount the partition onto the OSD directory directly with mount, as sketched below. [The mount relationship is fixed. Don't do this if you have no record of the previous mounts; if you do have a record, you can try it. At worst only one osd is lost, and in the worst case it can simply rejoin the cluster.]

  • Standard approach: if none of the above works, you can consider removing the osd from the cluster and re-adding it. The drawback is that more data will have to be resynchronized, that's all.
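
A minimal sketch of that manual mount, using the sdl1 -> ceph-1 mapping shown in the lsblk output above (substitute the partition and OSD id recorded for your own host):

# Only do this if you have a record that this partition belongs to this OSD.
mount /dev/sdl1 /var/lib/ceph/osd/ceph-1    # mapping taken from the earlier lsblk output
service ceph start osd.1                    # then start the OSD the normal way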

Viewing osd status

Command: service ceph status
This shows the status information of every daemon (mon and osds) on the host.

[root@stor-21 ~]# service ceph status
=== mon.stor-21 === 
mon.stor-21: running {"version":"0.94.6"}
=== osd.1 === 
osd.1: running {"version":"0.94.6"}
=== osd.3 === 
osd.3: running {"version":"0.94.6"}
=== osd.6 === 
osd.6: running {"version":"0.94.6"}
=== osd.9 === 
osd.9: running {"version":"0.94.6"}
=== osd.12 === 
osd.12: running {"version":"0.94.6"}
=== osd.15 === 
osd.15: running {"version":"0.94.6"}
=== osd.18 === 
osd.18: running {"version":"0.94.6"}
=== osd.21 === 
osd.21: running {"version":"0.94.6"}
=== osd.24 === 
osd.24: running {"version":"0.94.6"}
=== osd.27 === 
osd.27: running {"version":"0.94.6"}
[root@stor-21 ~]# 

Topics: Linux Ceph OpenStack