RabbitMQ multi-machine, multi-node cluster configuration

Posted by raan79 on Tue, 30 Nov 2021 21:37:34 +0100

Once a project adopts message middleware such as RabbitMQ, its traffic has usually grown to the point where a single stand-alone node is stretched thin, so a cluster is built instead. Let's build a cluster of three machines.

Node information

Node 1: 192.168.0.116 centos1

Node 2: 192.168.0.117 centos2

Node 3: 192.168.0.118 centos3

Configure hosts file

Configure the hosts file on every node so that the nodes can resolve one another by hostname.

# /etc/hosts

192.168.0.116 centos1
192.168.0.117 centos2
192.168.0.118 centos3
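
To confirm that every machine really has all three entries, a quick check along the following lines can be run on each node. This is only a minimal sketch: the function name missing_hosts is made up for illustration, and it inspects the file only; it does not test real network connectivity.

```shell
# Print every hostname from the argument list that has no entry in the
# given hosts file. missing_hosts is a hypothetical helper, not a RabbitMQ tool.
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Look for the name as a whole field on a non-comment line.
        if ! awk -v n="$name" '
                /^[[:space:]]*#/ { next }
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit found ? 0 : 1 }' "$hosts_file"; then
            echo "$name"
        fi
    done
}
```

Running missing_hosts /etc/hosts centos1 centos2 centos3 prints nothing when all three names are present, and prints the missing names otherwise.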

Copy cookie file

Make sure the RabbitMQ cookie file is identical on every node. Here the cookie file from centos1 is used: copy it to /var/lib/rabbitmq/.erlang.cookie or $HOME/.erlang.cookie on centos2 and centos3.

Cookie file location

The default path of the cookie file is /var/lib/rabbitmq/.erlang.cookie or $HOME/.erlang.cookie. The cookie acts as a shared secret: nodes in the cluster present it to each other to authenticate.

  • If you installed from an unpacked archive (binary install or compiled from source), the file is in the $HOME directory of the user that runs RabbitMQ, that is, $HOME/.erlang.cookie. For root this is /root/.erlang.cookie; for other users it is /home/<username>/.erlang.cookie.
  • If you installed from the rpm package, the file is in the /var/lib/rabbitmq directory.
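
The copy step could be sketched as follows for the root, unpacked-archive layout used in this article. The helper names are hypothetical, and root SSH access from centos1 to the other nodes is an assumption; adjust COOKIE for an rpm install. Erlang refuses to use a cookie file that other users can read, so the permissions are tightened to 400 after copying.

```shell
# Cookie path for a root, unpacked-archive install (assumption; override as needed).
COOKIE=${COOKIE:-/root/.erlang.cookie}

# Copy the local cookie to one remote host and make it owner-read-only.
# copy_cookie is a hypothetical helper that assumes root SSH access.
copy_cookie() {
    host=$1
    scp "$COOKIE" "root@$host:$COOKIE" &&
        ssh "root@$host" "chmod 400 '$COOKIE'"
}

# Succeed only when two cookie files have identical contents.
same_cookie() {
    cmp -s "$1" "$2"
}
```

On centos1 this would be called once per target, for example copy_cookie centos2 and then copy_cookie centos3, before RabbitMQ is started on those nodes.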

Find the .erlang.cookie file

You can check the RabbitMQ startup log, as shown below. The home dir is /root, so my .erlang.cookie file is /root/.erlang.cookie.

 [info] <0.270.0> 
 node           : rabbit@centos1
 home dir       : /root    (the node was started by the root user)
 config file(s) : (none)
 cookie hash    : tCXB8mlCcGEGGV1cYRkQCg==
 log(s)         : /usr/local/rabbitmq_server/var/log/rabbitmq/rabbit@centos1.log
                : /usr/local/rabbitmq_server/var/log/rabbitmq/rabbit@centos1_upgrade.log
 database dir   : /usr/local/rabbitmq_server/var/lib/rabbitmq/mnesia/rabbit@centos1

Configure cluster

There are three ways to configure a cluster. This article uses the first one, the rabbitmqctl tool:

  1. Configure through the rabbitmqctl tool;
  2. Configure through the rabbitmq.config configuration file;
  3. Configure through the rabbitmq-autocluster plugin.

Start the RabbitMQ service on all three nodes:

[root@centos1 ~]# rabbitmq-server -detached
[root@centos2 ~]# rabbitmq-server -detached
[root@centos3 ~]# rabbitmq-server -detached

After startup, the three nodes are still independent of one another. You can run rabbitmqctl cluster_status on each node to check its status.

Join cluster

Using centos1 as the base node, join centos2 and centos3 to it. Taking centos2 as an example (repeat the same steps on centos3):

[root@centos2 ~]# rabbitmqctl stop_app
Stopping rabbit application on node rabbit@centos2 ...
[root@centos2 ~]# rabbitmqctl reset
Resetting node rabbit@centos2 ...
[root@centos2 ~]# rabbitmqctl join_cluster rabbit@centos1
Clustering node rabbit@centos2 with rabbit@centos1
[root@centos2 ~]# rabbitmqctl start_app
Starting node rabbit@centos2 ...
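
Since the same four commands have to be repeated on every joining node, they could be wrapped in a small function. This is a sketch only; join_cluster is a hypothetical wrapper name, and it stops at the first command that fails. Note that rabbitmqctl reset wipes the data of the local node.

```shell
# Join the local node to an existing cluster member.
# join_cluster is a hypothetical wrapper around the four commands above.
# WARNING: "rabbitmqctl reset" removes all data on the local node.
join_cluster() {
    seed=$1    # node to join, e.g. rabbit@centos1
    rabbitmqctl stop_app &&
        rabbitmqctl reset &&
        rabbitmqctl join_cluster "$seed" &&
        rabbitmqctl start_app
}
```

On centos2 and centos3 the call would be join_cluster rabbit@centos1.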

Run rabbitmqctl cluster_status on any node to check the result:

[
    {nodes,[{disc,[rabbit@centos1,rabbit@centos2,rabbit@centos3]}]},
    {running_nodes,[rabbit@centos1,rabbit@centos2,rabbit@centos3]},
    {cluster_name,<<"rabbit@centos1">>},
    {partitions,[]},
    {alarms,[{rabbit@centos1,[]},{rabbit@centos2,[]},{rabbit@centos3,[]}]}
]

[Screenshot: the cluster nodes as shown in the web management UI]

Stop a node

The steps above build a three-node cluster. What happens if a node is stopped and started again? Stop centos2 to find out.

[root@centos2 ~]# rabbitmqctl stop_app

Check with rabbitmqctl cluster_status:

[
    {nodes,[{disc,[rabbit@centos1,rabbit@centos2,rabbit@centos3]}]},
    {running_nodes,[rabbit@centos1,rabbit@centos3]},
    {cluster_name,<<"rabbit@centos1">>},
    {partitions,[]},
    {alarms,[{rabbit@centos1,[]},{rabbit@centos3,[]}]}
]
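
The stopped node is the one that appears under nodes but not under running_nodes. A minimal sketch of that comparison is given here, assuming the output keeps the Erlang-term layout shown above (other rabbitmqctl versions may print a different layout, in which case the patterns need adapting). Both function names are hypothetical.

```shell
# Read cluster_status output (Erlang-term layout) on stdin and print the
# members of one list, one per line. list_nodes is a hypothetical helper.
list_nodes() {
    key=$1    # "disc" or "running_nodes"
    sed -n "s/.*{$key,\[\([^]]*\)\]}.*/\1/p" | tr ',' '\n'
}

# Print nodes that are cluster members but are not currently running.
stopped_nodes() {
    status=$(cat)
    for node in $(printf '%s\n' "$status" | list_nodes disc); do
        printf '%s\n' "$status" | list_nodes running_nodes |
            grep -qxF "$node" || echo "$node"
    done
}
```

Piping the output above into stopped_nodes prints rabbit@centos2.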

[Screenshot: the web management UI with centos2 stopped]

Cluster node shutdown and startup

If all nodes in the cluster have been shut down, make sure that the node that was shut down last is the first one to be started. If the first node started is not the one that was shut down last, it waits for the last-stopped node to come up. Each wait lasts 30 seconds; if the last-stopped node does not appear in time, the node that was started first fails to start as well.
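
The ordering rule can be written down as a tiny helper. This is an illustrative sketch only: start_order is a hypothetical name, and the function just prints node names; it does not start anything.

```shell
# Print cluster nodes in a safe start order: the node that was shut down
# last comes first, the remaining nodes follow in the order given.
# start_order is a hypothetical helper; it only prints names.
start_order() {
    last_stopped=$1
    shift
    echo "$last_stopped"
    for node in "$@"; do
        [ "$node" = "$last_stopped" ] || echo "$node"
    done
}
```

For example, if centos3 was the last node to go down, start_order centos3 centos1 centos2 centos3 prints centos3, centos1, centos2, one per line, and rabbitmq-server -detached would be run on the machines in that order.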

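There is a retry mechanism. By default a node retries 10 times, waiting 30 seconds each time, for the last-stopped node to start. In the log below this cycle runs twice (once during the feature flag migration and once during the boot itself), so about 20 attempts in total. Version used here: RabbitMQ 3.8.9 on Erlang 23.0.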
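
Going by the log below, one retry cycle is 10 attempts of 30000 ms each, so the longest wait per cycle can be worked out with simple arithmetic. The sketch below does just that; max_wait_seconds is a hypothetical helper, and its two arguments mirror the rabbitmq.conf keys mnesia_table_loading_retry_limit and mnesia_table_loading_retry_timeout.

```shell
# Longest time (in seconds) a node waits during one retry cycle.
# $1 mirrors mnesia_table_loading_retry_limit (number of attempts),
# $2 mirrors mnesia_table_loading_retry_timeout (milliseconds per attempt).
max_wait_seconds() {
    retry_limit=$1
    timeout_ms=$2
    echo $(( retry_limit * timeout_ms / 1000 ))
}

# max_wait_seconds 10 30000 prints 300, i.e. five minutes per cycle.
```

This matches the timestamps in the log: the first cycle starts at 17:12:21 and gives up at 17:17:21.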

Once the retries are exhausted, the node gives up and stops its own application.

Retry log:

2021-11-27 17:12:21.783 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2021-11-27 17:12:22.206 [debug] <0.2664.0> Lager installed handler lager_backend_throttle into lager_event
2021-11-27 17:12:51.784 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_durable_queue]}
2021-11-27 17:12:51.784 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 8 retries left
2021-11-27 17:13:21.785 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_durable_queue]}
2021-11-27 17:13:21.785 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 7 retries left
2021-11-27 17:13:51.786 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_durable_queue]}
2021-11-27 17:13:51.786 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
2021-11-27 17:14:21.787 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_durable_queue]}
2021-11-27 17:14:21.787 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 5 retries left
2021-11-27 17:14:51.788 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_durable_queue]}
2021-11-27 17:14:51.788 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 4 retries left
2021-11-27 17:15:21.789 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_durable_queue]}
2021-11-27 17:15:21.789 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 3 retries left
2021-11-27 17:15:51.790 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_durable_queue]}
2021-11-27 17:15:51.790 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 2 retries left
2021-11-27 17:16:21.791 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_durable_queue]}
2021-11-27 17:16:21.791 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 1 retries left
2021-11-27 17:16:51.792 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_durable_queue]}
2021-11-27 17:16:51.792 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 0 retries left
2021-11-27 17:17:21.793 [error] <0.2656.0> Feature flag `quorum_queue`: migration function crashed: {error,{timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_durable_queue]}}
[{rabbit_table,wait,3,[{file,"src/rabbit_table.erl"},{line,120}]},{rabbit_core_ff,quorum_queue_migration,3,[{file,"src/rabbit_core_ff.erl"},{line,60}]},{rabbit_feature_flags,run_migration_fun,3,[{file,"src/rabbit_feature_flags.erl"},{line,1602}]},{rabbit_feature_flags,'-verify_which_feature_flags_are_actually_enabled/0-fun-2-',3,[{file,"src/rabbit_feature_flags.erl"},{line,2269}]},{maps,fold_1,3,[{file,"maps.erl"},{line,233}]},{rabbit_feature_flags,verify_which_feature_flags_are_actually_enabled,0,[{file,"src/rabbit_feature_flags.erl"},{line,2267}]},{rabbit_feature_flags,sync_feature_flags_with_cluster,3,[{file,"src/rabbit_feature_flags.erl"},{line,2082}]},{rabbit_mnesia,ensure_feature_flags_are_in_sync,2,[{file,"src/rabbit_mnesia.erl"},{line,647}]}]
2021-11-27 17:17:21.793 [warning] <0.2656.0> Feature flags: the previous instance of this node must have failed to write the `feature_flags` file at `/usr/local/rabbitmq_server/var/lib/rabbitmq/mnesia/rabbit@centos2-feature_flags`:
2021-11-27 17:17:21.793 [warning] <0.2656.0> Feature flags:   - list of previously disabled feature flags now marked as such: [empty_basic_get_metric]
2021-11-27 17:17:21.800 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2021-11-27 17:17:51.801 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2021-11-27 17:17:51.801 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 8 retries left
2021-11-27 17:18:21.802 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2021-11-27 17:18:21.802 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 7 retries left
2021-11-27 17:18:51.803 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2021-11-27 17:18:51.803 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
2021-11-27 17:19:21.804 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2021-11-27 17:19:21.804 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 5 retries left
2021-11-27 17:19:51.805 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2021-11-27 17:19:51.805 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 4 retries left
2021-11-27 17:20:21.806 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2021-11-27 17:20:21.806 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 3 retries left
2021-11-27 17:20:51.807 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2021-11-27 17:20:51.807 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 2 retries left
2021-11-27 17:21:21.808 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2021-11-27 17:21:21.808 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 1 retries left
2021-11-27 17:21:51.809 [warning] <0.2656.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2021-11-27 17:21:51.809 [info] <0.2656.0> Waiting for Mnesia tables for 30000 ms, 0 retries left
2021-11-27 17:22:21.813 [info] <0.44.0> Application mnesia exited with reason: stopped
2021-11-27 17:22:21.813 [info] <0.44.0> Application mnesia exited with reason: stopped
2021-11-27 17:22:21.817 [error] <0.2656.0> 
2021-11-27 17:22:21.818 [error] <0.2656.0> BOOT FAILED
2021-11-27 17:22:21.818 [error] <0.2656.0> ===========
2021-11-27 17:22:21.818 [error] <0.2656.0> Timeout contacting cluster nodes: [rabbit@centos3,rabbit@centos1].
2021-11-27 17:22:21.818 [error] <0.2656.0> 
2021-11-27 17:22:21.818 [error] <0.2656.0> BACKGROUND
2021-11-27 17:22:21.818 [error] <0.2656.0> ==========
2021-11-27 17:22:21.818 [error] <0.2656.0> 
2021-11-27 17:22:21.818 [error] <0.2656.0> This cluster node was shut down while other nodes were still running.
2021-11-27 17:22:21.818 [error] <0.2656.0> To avoid losing data, you should start the other nodes first, then
2021-11-27 17:22:21.818 [error] <0.2656.0> start this one. To force this node to start, first invoke
2021-11-27 17:22:21.818 [error] <0.2656.0> "rabbitmqctl force_boot". If you do so, any changes made on other
2021-11-27 17:22:21.818 [error] <0.2656.0> cluster nodes after this one was shut down may be lost.
2021-11-27 17:22:21.818 [error] <0.2656.0> 
2021-11-27 17:22:21.818 [error] <0.2656.0> DIAGNOSTICS
2021-11-27 17:22:21.818 [error] <0.2656.0> ===========
2021-11-27 17:22:21.819 [error] <0.2656.0> 
2021-11-27 17:22:21.819 [error] <0.2656.0> attempted to contact: [rabbit@centos3,rabbit@centos1]
2021-11-27 17:22:21.819 [error] <0.2656.0> 
2021-11-27 17:22:21.819 [error] <0.2656.0> rabbit@centos3:
2021-11-27 17:22:21.819 [error] <0.2656.0>   * connected to epmd (port 4369) on centos3
2021-11-27 17:22:21.819 [error] <0.2656.0>   * node rabbit@centos3 up, 'rabbit' application not running
2021-11-27 17:22:21.819 [error] <0.2656.0>   * running applications on rabbit@centos3: [lager,observer_cli,
2021-11-27 17:22:21.819 [error] <0.2656.0>                                              stdout_formatter,
2021-11-27 17:22:21.819 [error] <0.2656.0>                                              gen_batch_server,aten,cuttlefish,
2021-11-27 17:22:21.819 [error] <0.2656.0>                                              inets,credentials_obfuscation,
2021-11-27 17:22:21.820 [error] <0.2656.0>                                              recon,ranch,jsx,goldrush,xmerl,
2021-11-27 17:22:21.820 [error] <0.2656.0>                                              tools,syntax_tools,ssl,
2021-11-27 17:22:21.820 [error] <0.2656.0>                                              public_key,asn1,crypto,compiler,
2021-11-27 17:22:21.820 [error] <0.2656.0>                                              sasl,stdlib,kernel]
2021-11-27 17:22:21.820 [error] <0.2656.0>   * suggestion: use rabbitmqctl start_app on rabbit@centos3
2021-11-27 17:22:21.820 [error] <0.2656.0> rabbit@centos1:
2021-11-27 17:22:21.820 [error] <0.2656.0>   * connected to epmd (port 4369) on centos1
2021-11-27 17:22:21.923 [error] <0.2656.0>   * node rabbit@centos1 up, 'rabbit' application not running
2021-11-27 17:22:21.924 [error] <0.2656.0>   * running applications on rabbit@centos1: [lager,observer_cli,
2021-11-27 17:22:21.924 [error] <0.2656.0>                                              stdout_formatter,
2021-11-27 17:22:21.924 [error] <0.2656.0>                                              gen_batch_server,aten,cuttlefish,
2021-11-27 17:22:21.924 [error] <0.2656.0>                                              inets,credentials_obfuscation,
2021-11-27 17:22:21.924 [error] <0.2656.0>                                              recon,ranch,jsx,goldrush,xmerl,
2021-11-27 17:22:21.924 [error] <0.2656.0>                                              tools,syntax_tools,ssl,
2021-11-27 17:22:21.924 [error] <0.2656.0>                                              public_key,asn1,crypto,compiler,
2021-11-27 17:22:21.924 [error] <0.2656.0>                                              sasl,stdlib,kernel]
2021-11-27 17:22:21.924 [error] <0.2656.0>   * suggestion: use rabbitmqctl start_app on rabbit@centos1
2021-11-27 17:22:21.924 [error] <0.2656.0> 
2021-11-27 17:22:21.924 [error] <0.2656.0> Current node details:
2021-11-27 17:22:21.924 [error] <0.2656.0>  * node name: rabbit@centos2
2021-11-27 17:22:21.924 [error] <0.2656.0>  * effective user's home directory: /root
2021-11-27 17:22:21.924 [error] <0.2656.0>  * Erlang cookie hash: tCXB8mlCcGEGGV1cYRkQCg==
2021-11-27 17:22:21.924 [error] <0.2656.0> 
2021-11-27 17:22:21.924 [error] <0.2656.0> 
2021-11-27 17:22:22.926 [info] <0.2655.0> [{initial_call,{application_master,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{pid,<0.2655.0>},{registered_name,[]},{error_info,{exit,{{timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]},{rabbit,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,138}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}},{ancestors,[<0.2654.0>]},{message_queue_len,1},{messages,[{'EXIT',<0.2656.0>,normal}]},{links,[<0.2654.0>,<0.44.0>]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,1598},{stack_size,28},{reductions,368}], []
2021-11-27 17:22:22.926 [error] <0.2655.0> CRASH REPORT Process <0.2655.0> with 0 neighbours exited with reason: {{timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]},{rabbit,start,[normal,[]]}} in application_master:init/4 line 138
2021-11-27 17:22:22.927 [info] <0.44.0> Application rabbit exited with reason: {{timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]},{rabbit,start,[normal,[]]}}
2021-11-27 17:22:22.927 [info] <0.44.0> Application rabbit exited with reason: {{timeout_waiting_for_tables,[rabbit@centos3,rabbit@centos2,rabbit@centos1],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]},{rabbit,start,[normal,[]]}}
2021-11-27 17:22:22.928 [info] <0.44.0> Application sysmon_handler exited with reason: stopped
2021-11-27 17:22:22.928 [info] <0.44.0> Application sysmon_handler exited with reason: stopped
2021-11-27 17:22:22.932 [info] <0.44.0> Application ra exited with reason: stopped
2021-11-27 17:22:22.932 [info] <0.44.0> Application ra exited with reason: stopped
2021-11-27 17:22:22.933 [info] <0.44.0> Application os_mon exited with reason: stopped
2021-11-27 17:22:22.933 [info] <0.44.0> Application os_mon exited with reason: stopped

Error returned by the command

[root@centos2 ~]# rabbitmqctl start_app
Starting node rabbit@centos2 ...
Error:
{:rabbit, {{:timeout_waiting_for_tables, [:rabbit@centos3, :rabbit@centos2, :rabbit@centos1], [:rabbit_user, :rabbit_user_permission, :rabbit_topic_permission, :rabbit_vhost, :rabbit_durable_route, :rabbit_durable_exchange, :rabbit_runtime_parameters, :rabbit_durable_queue]}, {:rabbit, :start, [:normal, []]}}}

Remove a node

If the node that was shut down last can no longer be started because of some fault, you can remove it from the cluster with the rabbitmqctl forget_cluster_node command.
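
As a sketch of how the removal might look, the wrapper below covers two cases: the normal one, run on a node that is still a running cluster member, and the --offline variant of forget_cluster_node, which is meant for the situation where every node is down and the last-stopped node cannot be brought back. The name remove_node and its second argument are hypothetical.

```shell
# Remove a dead node from the cluster.
# remove_node is a hypothetical wrapper; pass "offline" as the second
# argument when no cluster node is running and this node is stopped.
remove_node() {
    dead=$1              # e.g. rabbit@centos2
    mode=${2:-online}
    if [ "$mode" = offline ]; then
        rabbitmqctl forget_cluster_node --offline "$dead"
    else
        rabbitmqctl forget_cluster_node "$dead"
    fi
}
```

For example, remove_node rabbit@centos2 run on centos1 would drop centos2 from the cluster.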

If all nodes in the cluster went down abnormally (for example, because of a power failure), every node will believe that it was not the last one to shut down. In that case, run the rabbitmqctl force_boot command on one node and then start it; after that the cluster can start normally.

[root@centos2 ~]# rabbitmqctl force_boot
Forcing boot for Mnesia dir /usr/local/rabbitmq_server/var/lib/rabbitmq/mnesia/rabbit@centos2
[root@centos2 ~]# rabbitmq-server -detached

Summary

The above is the process of configuring a multi-machine, multi-node cluster. Above all, keep the startup order of the cluster nodes in mind: start the node that was shut down last first, to avoid startup failures.
