Oracle19c PDB level Failover error scenario test

Posted by Jonob on Wed, 19 Jan 2022 14:20:30 +0100

First of all, don't use the default service for the client connection, that is, the connection used by the application.

The source of the original text is Real Application Clusters Administration and Deployment Guide:

All of these services are used for internal management. You cannot stop or disable any of these internal services to do planned outages or to failover to Oracle Data Guard. Do not use these services for client connections.

Previously, in the user's RAC environment, a PDB was in mount state after shutdown, and there was no failover. The reason is that it uses the default service automatically established by the database for PDB. I found this article by Alfred Zhao at that time: Test 12.2.0.1 RAC PDB level Failover . This article gives the success scenario, but does not demonstrate the failure scenario, which is the purpose of this article.

My environment is to establish a 2-node RAC on the Oracle public cloud. The database version is 19.11, and one PDB is named PDB1:

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE NO

SQL> alter session set container = pdb1;

Session altered.

SQL> select name from v$active_services;

NAME
----------------------------------------------------------------
pdb1

This service is created automatically by the database when the PDB is created. It is an internal service (such as the internal services mentioned above) and is only used for management.
You cannot stop this service unless you forcibly stop the instance or kill the process:

SQL> exec DBMS_SERVICE.STOP_SERVICE('pdb1');
BEGIN DBMS_SERVICE.STOP_SERVICE('pdb1'); END;

*
ERROR at line 1:
ORA-44793: cannot stop internal services
ORA-06512: at "SYS.DBMS_SERVICE_ERR", line 95
ORA-06512: at "SYS.DBMS_SERVICE", line 519
ORA-06512: at line 1

On Oracle public cloud service, CDB and PDB will provide spelled connection strings on the console, as follows:

# CDB
(DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS_LIST=(LOAD_BALANCE=on)(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.74)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.229)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.213)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=racMJMWZ_icn189.pub.racdblab.oraclevcn.com)))

# PDB
(DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS_LIST=(:q=on)(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.74)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.229)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.213)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=pdb1.pub.racdblab.oraclevcn.com)))

There are three addresses written in it, which are actually three scan VIPs:

$ srvctl config scan
SCAN name: lvracdb-s01-2022-01-14-123012-scan.pub.racdblab.oraclevcn.com, Network: 1
Subnet IPv4: 10.0.0.0/255.255.255.0/ens3, static
Subnet IPv6:
SCAN 1 IPv4 VIP: 10.0.0.213
SCAN VIP is enabled.
SCAN 2 IPv4 VIP: 10.0.0.229
SCAN VIP is enabled.
SCAN 3 IPv4 VIP: 10.0.0.74
SCAN VIP is enabled.

Test preparation

No special client is established. RAC node 1 is used as the client and SQL Plus is used as the client test program.

Configure 3 Net service names on node 1:

# The default service uses 3 scan VIPs
PDB1-SCANVIP =
(DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS_LIST=(LOAD_BALANCE=off)(FAILOVER=yes)(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.74)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.229)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.213)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=pdb1.pub.racdblab.oraclevcn.com)))

# The default service uses 2 host VIPs, and the VIP of node 1 is in front
PDB1-VIP=
(DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS_LIST=(LOAD_BALANCE=off)(FAILOVER=yes)(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.116)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.192)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=pdb1.pub.racdblab.oraclevcn.com)))

# The default service uses 2 host IPS, and the IP of node 1 is in front
PDB1-NODEIP=
(DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS_LIST=(LOAD_BALANCE=off)(FAILOVER=yes)(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.199)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.113)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=pdb1.pub.racdblab.oraclevcn.com)))

The test statement format is as follows:

sqlplus sh/$USERPWD@$SERVICE_NAME

Stop instance testing

Force close instance 1:

srvctl stop instance -d $DBNAME -i $INST1 -force

Confirm closed:

$ srvctl status database -d $DBNAME
Instance racMJMWZ1 is not running on node lvracdb-s01-2022-01-14-1230121
Instance racMJMWZ2 is running on node lvracdb-s01-2022-01-14-1230122

The test results are as follows:

Test IDTest statementresult
1sqlplus sh@PDB1-SCANVIPsuccess
2sqlplus sh@PDB1-VIPsuccess
3sqlplus sh@PDB1-NODEIPsuccess

It all worked. Test 1 was successful and it is easy to understand why the remaining 2 did not go wrong.

Close database test

After discussing the following with Alfred, we found that the database was closed when the error occurred, so we decided to simulate this scenario.
Restore the instance first to ensure that the whole environment is normal:

$ srvctl start instance -d $DBNAME -i $INST1

$ srvctl status database -d $DBNAME
Instance racMJMWZ1 is running on node lvracdb-s01-2022-01-14-1230121
Instance racMJMWZ2 is running on node lvracdb-s01-2022-01-14-1230122

Log in to the database on node 1 and close PDB PDB1:

connect / as sysdba
alter pluggable database pdb1 close immediate;

Confirm that the database is closed:

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           MOUNTED

The test results are as follows:

Test IDTest statementresult
1sqlplus sh@PDB1-SCANVIPsuccess
2sqlplus sh@PDB1-VIPfail
3sqlplus sh@PDB1-NODEIPfail

The error messages in case of failure are as follows:

$ sqlplus sh/$USERPWD@$PDB1-VIP

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Jan 18 04:37:20 2022
Version 19.11.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

ERROR:
ORA-01109: database not open


Enter user-name:

conclusion

  1. Do not use the default service for client connections
  2. SCAN NAME should be used in the connection string. The writing method is the simplest. High availability can be provided for the default service. There is no need to change when adding or deleting nodes.
  3. It's OK to write three scan VIPs instead of SCAN NAME, but the writing method is too complicated.
  4. The default service does not provide fine-grained control, such as setting Application Continuity.

Supplementary notes

During the above test, since node 1 is not shut down, its host VIP will not drift.
You can verify by the following command:

$ srvctl config nodeapps
Network 1 exists
Subnet IPv4: 10.0.0.0/255.255.255.0/ens3, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:
VIP exists: network number 1, hosting node lvracdb-s01-2022-01-14-1230121
VIP Name: lvracdb-s01-2022-01-14-1230121-vip.pub.racdblab.oraclevcn.com
VIP IPv4 Address: 10.0.0.116
VIP IPv6 Address:
VIP is enabled.
VIP is individually enabled on nodes:
VIP is individually disabled on nodes:
VIP exists: network number 1, hosting node lvracdb-s01-2022-01-14-1230122
VIP Name: lvracdb-s01-2022-01-14-1230122-vip.pub.racdblab.oraclevcn.com
VIP IPv4 Address: 10.0.0.192
VIP IPv6 Address:
VIP is enabled.
VIP is individually enabled on nodes:
VIP is individually disabled on nodes:
ONS exists: Local port 6100, remote port 6200, EM port 2016, Uses SSL true
ONS is enabled
ONS is individually enabled on nodes:
ONS is individually disabled on nodes:

# perhaps
$ olsnodes
lvracdb-s01-2022-01-14-1230121
lvracdb-s01-2022-01-14-1230122

$ srvctl status vip -node $NODE1
VIP 10.0.0.116 is enabled
VIP 10.0.0.116 is running on node: lvracdb-s01-2022-01-14-1230121

$ srvctl status vip -node $NODE2
VIP 10.0.0.192 is enabled
VIP 10.0.0.192 is running on node: lvracdb-s01-2022-01-14-1230122

Under normal circumstances, the default service is started on all nodes:

$ sudo su - grid

# No output
$ crsctl stat res -t|grep -i pdb1

$ ps -ef|grep tns
root        36     2  0 Jan14 ?        00:00:00 [netns]
oracle   17446 17161  0 07:03 pts/0    00:00:00 grep --color=auto tns
grid     58936     1  0 Jan14 ?        00:00:13 /u01/app/19.0.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid     59034     1  0 Jan14 ?        00:02:04 /u01/app/19.0.0.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
grid     62964     1  0 Jan16 ?        00:00:06 /u01/app/19.0.0.0/grid/bin/tnslsnr LISTENER_SCAN3 -no_crs_notify -inherit
grid     62977     1  0 Jan16 ?        00:00:06 /u01/app/19.0.0.0/grid/bin/tnslsnr LISTENER_SCAN2 -no_crs_notify -inherit

$ lsnrctl services LISTENER_SCAN2
...
Service "pdb1.pub.racdblab.oraclevcn.com" has 2 instance(s).
  Instance "racMJMWZ1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:6 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.116)(PORT=1521))
  Instance "racMJMWZ2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:6 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.192)(PORT=1521))
...

To obtain public IP, you can refer to the following:

$ olsnodes
lvracdb-s01-2022-01-14-1230121
lvracdb-s01-2022-01-14-1230122

$ hostname -f
lvracdb-s01-2022-01-14-1230121.pub.racdblab.oraclevcn.com
[grid@lvracdb-s01-2022-01-14-1230121 ~]$ nslookup $(hostname -f)
Server:         169.254.169.254
Address:        169.254.169.254#53

Non-authoritative answer:
Name:   lvracdb-s01-2022-01-14-1230121.pub.racdblab.oraclevcn.com
Address: 10.0.0.199

$ hostname -i
10.0.0.199

reference resources

  1. https://www.cnblogs.com/jyzhao/p/10458233.html#4
  2. https://mikedietrichde.com/2017/08/16/ora-44787-dont-mess-with-the-default-oracle-service/
  3. How To Configure Server Side Transparent Application Failover (Doc ID 460982.1)
  4. https://logic.edchen.org/how-oracle-rac-vip-failover-to-another-node/
  5. https://emersontech.com.br/en/checking-network-ips-rac/
  6. https://satya-racdba.blogspot.com/2012/07/virtual-ip-vip-address-rac-oracle.html
  7. http://ermanarslan.blogspot.com/p/oracle-rac.html

Topics: Database Oracle RAC