High availability cluster fault tolerance & integration of Hystrix

Posted by smsharif on Sat, 15 Jan 2022 06:34:53 +0100

1, Cluster fault tolerance

When a cluster call fails, Dubbo provides several fault tolerance strategies. The default is Failover, which retries the call on another server.



How the nodes relate to each other:

1. An Invoker is an abstraction of a callable Service on a Provider. It encapsulates the Provider address and the Service interface information;
2. A Directory represents multiple Invokers and can be regarded as a List<Invoker>. Unlike a List, its contents may change dynamically, for example when the registry pushes a change;
3. The Cluster disguises the multiple Invokers in a Directory as a single Invoker, so the upper layer sees only one. The disguise includes the fault tolerance logic: after a call fails, it retries on another Invoker;
4. The Router selects a subset of the Invokers according to routing rules, for example read-write separation or application isolation;
5. LoadBalance selects the specific Invoker to use for this call. The selection includes the load balancing algorithm, and after a failed call a new Invoker must be selected.
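The relationships above can be sketched in plain Java. The interfaces below are simplified stand-ins written for this article (the method signatures, `FixedInvoker`, and `SimpleClusterInvoker` are assumptions, not Dubbo's real API); they only show how a Cluster hides the Directory, Router, and LoadBalance behind a single Invoker.

```java
import java.util.List;

// Simplified stand-ins for illustration only; Dubbo's real interfaces differ.
interface Invoker {
    String address();
    String invoke(String request);
}

// Directory: a view over the current Invokers; the list may change between calls.
interface Directory {
    List<Invoker> list();
}

// Router: narrows the candidates according to a routing rule.
interface Router {
    List<Invoker> route(List<Invoker> invokers, String request);
}

// LoadBalance: picks one Invoker for this call.
interface LoadBalance {
    Invoker select(List<Invoker> invokers, String request);
}

// A hypothetical Invoker that always answers with a fixed reply.
final class FixedInvoker implements Invoker {
    private final String address;
    private final String reply;

    FixedInvoker(String address, String reply) {
        this.address = address;
        this.reply = reply;
    }

    public String address() { return address; }
    public String invoke(String request) { return reply; }
}

// Cluster: presents many Invokers to the caller as a single Invoker.
final class SimpleClusterInvoker implements Invoker {
    private final Directory directory;
    private final Router router;
    private final LoadBalance loadBalance;

    SimpleClusterInvoker(Directory directory, Router router, LoadBalance loadBalance) {
        this.directory = directory;
        this.router = router;
        this.loadBalance = loadBalance;
    }

    public String address() { return "cluster"; }

    public String invoke(String request) {
        // Directory -> Router -> LoadBalance -> the chosen Invoker
        List<Invoker> candidates = router.route(directory.list(), request);
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }
        return loadBalance.select(candidates, request).invoke(request);
    }
}
```

The fault tolerance logic is left out of this sketch on purpose; it would live inside `invoke`, around the call to the selected Invoker.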


Cluster fault tolerance modes

    Failover Cluster

Automatic failover. When a call fails, Dubbo retries on another server. It is usually used for read operations, but retries add latency. Set the number of retries (not counting the first call) with retries="2".

The number of retries can be configured on the provider, on the consumer, or for a single method:

<dubbo:service retries="2" />

or

<dubbo:reference retries="2" />

or

<dubbo:reference>
    <dubbo:method name="findFoo" retries="2" />
</dubbo:reference>

Tip: Failover is the default cluster mode.
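A minimal sketch of the failover idea, assuming each provider can be modeled as a plain `Function` and that round-robin stands in for load balancing. `FailoverSketch` is a hypothetical helper, not Dubbo code.

```java
import java.util.List;
import java.util.function.Function;

final class FailoverSketch {
    // Tries the call up to retries + 1 times, moving to another provider after each failure.
    static <Q, R> R invoke(List<Function<Q, R>> providers, Q request, int retries) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }
        int attempts = retries + 1; // retries does not count the first call
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            // Round-robin is an assumption; it guarantees a different provider on the next attempt.
            Function<Q, R> provider = providers.get(i % providers.size());
            try {
                return provider.apply(request);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next provider
            }
        }
        throw new IllegalStateException("failed after " + attempts + " attempts", last);
    }
}
```

With retries set to 2, a request is attempted at most three times, which is why this mode suits reads better than non-idempotent writes.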


    Failfast Cluster

Fail fast. Only one call is made, and an error is reported immediately on failure. It is usually used for non-idempotent write operations, such as adding a new record.


    Failsafe Cluster

Fail safe. When an exception occurs, it is simply ignored. It is usually used for operations such as writing audit logs.
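Failfast and Failsafe both make a single attempt and differ only in what happens to the exception. The sketch below illustrates that difference with a hypothetical `FailModes` helper; the default result returned by `failsafe` is an assumption made for the example.

```java
import java.util.function.Supplier;

final class FailModes {
    // Failfast: one attempt; any exception propagates to the caller unchanged.
    static <R> R failfast(Supplier<R> call) {
        return call.get();
    }

    // Failsafe: one attempt; an exception is logged and replaced by a default result.
    static <R> R failsafe(Supplier<R> call, R defaultResult) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            System.err.println("ignored failure: " + e.getMessage());
            return defaultResult;
        }
    }
}
```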


    Failback Cluster

Automatic recovery on failure. Failed requests are recorded in the background and resent at regular intervals. It is typically used for message notification operations.
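One way to picture failback is a queue of failed requests that a background task drains periodically. The sketch below assumes the remote call is a `Consumer` and uses a hypothetical `FailbackSketch` class; it is not how Dubbo implements the mode internally.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class FailbackSketch<Q> {
    private final Consumer<Q> provider; // the remote call, modeled as a Consumer
    private final Queue<Q> failed = new ConcurrentLinkedQueue<>();

    FailbackSketch(Consumer<Q> provider) {
        this.provider = provider;
    }

    // Never throws to the caller: a failed request is queued for later redelivery.
    void invoke(Q request) {
        try {
            provider.accept(request);
        } catch (RuntimeException e) {
            failed.add(request);
        }
    }

    // One retry pass: resend every request that was pending when the pass started.
    void retryFailed() {
        int pending = failed.size();
        for (int i = 0; i < pending; i++) {
            Q request = failed.poll();
            if (request == null) {
                break;
            }
            invoke(request); // a request that fails again goes back into the queue
        }
    }

    int pendingCount() {
        return failed.size();
    }

    // Runs retryFailed() at a fixed interval on a background thread.
    ScheduledExecutorService startRetrying(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::retryFailed, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

The caller should shut down the returned scheduler when the service stops.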


    Forking Cluster

Calls multiple servers in parallel and returns as soon as one succeeds. It is usually used for read operations with strict real-time requirements, at the cost of more service resources. Set the maximum number of parallel calls with forks="2".
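The forking behavior maps closely onto `ExecutorService.invokeAny` from the JDK, which returns the result of any task that completes successfully. The sketch below relies on that; `ForkingSketch` and the choice to take the first `forks` providers are assumptions for illustration.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ForkingSketch {
    // Calls up to `forks` providers in parallel and returns the first successful result.
    static <R> R invoke(List<Callable<R>> providers, int forks) {
        if (providers.isEmpty() || forks < 1) {
            throw new IllegalArgumentException("need at least one provider and forks >= 1");
        }
        // Assumption: simply take the first `forks` providers instead of load balancing.
        List<Callable<R>> selected = providers.subList(0, Math.min(forks, providers.size()));
        ExecutorService pool = Executors.newFixedThreadPool(selected.size());
        try {
            return pool.invokeAny(selected); // succeeds if any one task succeeds
        } catch (ExecutionException e) {
            throw new IllegalStateException("all forked calls failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdownNow(); // cancel the calls that are still running
        }
    }
}
```

Every forked call consumes a thread and a provider, which is the extra resource cost mentioned above.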


    Broadcast Cluster

Broadcasts the call to all providers, one by one. If any provider reports an error, the call reports an error. It is usually used to notify all providers to update local resources such as caches or logs.

In a broadcast call, broadcast.fail.percent configures the proportion of node call failures. Once this proportion is reached, BroadcastClusterInvoker stops calling the remaining nodes and throws an exception directly. The value of broadcast.fail.percent ranges from 0 to 100. By default, an exception is thrown once all calls have failed. broadcast.fail.percent only controls whether the remaining nodes are still called after a failure; it does not change the result (if any node reports an error, the call reports an error). The broadcast.fail.percent parameter takes effect in Dubbo 2.7.10 and above.

Configuring broadcast.fail.percent for Broadcast Cluster:

broadcast.fail.percent=20 means that once 20% of the nodes have failed, an exception is thrown and the remaining nodes are not called.

@Reference(cluster = "broadcast", parameters = {"broadcast.fail.percent", "20"})

Tip: supported since version 2.1.0.
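The effect of a failure percentage can be sketched as a loop that stops early once enough calls have failed. This is a simplified illustration, not the BroadcastClusterInvoker source: `BroadcastSketch` is hypothetical, and rounding the threshold up (with a minimum of one failure) is an assumption.

```java
import java.util.List;
import java.util.function.Consumer;

final class BroadcastSketch {
    // Calls providers one by one; stops early once the failure threshold is reached.
    // Throws if any call failed, regardless of whether the loop stopped early.
    static <Q> void invoke(List<Consumer<Q>> providers, Q request, int failPercent) {
        // Assumption: round the threshold up and require at least one failure.
        int threshold = Math.max(1, (providers.size() * failPercent + 99) / 100);
        int failures = 0;
        RuntimeException last = null;
        for (Consumer<Q> provider : providers) {
            try {
                provider.accept(request);
            } catch (RuntimeException e) {
                last = e;
                if (++failures >= threshold) {
                    break; // enough failures: skip the remaining providers
                }
            }
        }
        if (last != null) {
            throw new IllegalStateException(failures + " broadcast call(s) failed", last);
        }
    }
}
```

With ten providers and a percentage of 20, the loop in this sketch stops after the second failure; with 100 it keeps calling the rest and still ends with an exception.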


Cluster mode configuration:

Configure the cluster mode on the service provider or the consumer as in the example below:

<dubbo:service cluster="failsafe" />


<dubbo:reference cluster="failsafe" />


2, Integrate Hystrix

Hystrix is designed to provide greater tolerance of latency and failure by controlling the points at which an application accesses remote systems, services and third-party libraries. Hystrix offers thread and semaphore isolation with fallback and circuit breaker mechanisms, request caching and request collapsing, as well as monitoring and configuration.
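To make the circuit breaker and fallback ideas concrete, here is a deliberately small sketch in plain Java. It is not Hystrix's algorithm (Hystrix works on a rolling window with a request volume threshold and an error percentage); this version simply counts consecutive failures, and `SimpleCircuitBreaker` with its parameters is hypothetical.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before the circuit opens
    private final long openMillis;      // how long the circuit stays open
    private final LongSupplier clock;   // injected so the timing can be tested
    private int consecutiveFailures = 0;
    private long openedAt = -1;         // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized <R> R call(Supplier<R> command, Supplier<R> fallback) {
        if (openedAt >= 0 && clock.getAsLong() - openedAt < openMillis) {
            return fallback.get(); // open: skip the remote call entirely
        }
        try {
            R result = command.get(); // closed, or a trial call after the open window
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            if (++consecutiveFailures >= failureThreshold) {
                openedAt = clock.getAsLong(); // open (or re-open) the circuit
            }
            return fallback.get();
        }
    }
}
```

A caller would typically pass `System::currentTimeMillis` as the clock. While the circuit is open the failing dependency receives no traffic, which gives it time to recover.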

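1. Configure spring-cloud-starter-netflix-hystrix

(1) Add the dependency to both the service provider and the service consumer

Spring Boot officially provides Hystrix integration. Add the dependency directly to pom.xml (the version is omitted here; use the one that matches your Spring Cloud release):

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>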



(2) Then add @EnableHystrix to the Application class to enable the Hystrix starter

@SpringBootApplication
@EnableHystrix   // Enable Hystrix fault tolerance
@EnableDubbo     // Enable annotation-based Dubbo support
public class BootUserServiceProviderApplication {

    public static void main(String[] args) {
        SpringApplication.run(BootUserServiceProviderApplication.class, args);
    }
}


2. Configure the Provider side

Add the @HystrixCommand configuration to the Dubbo Provider so that calls to the method go through the Hystrix proxy.

@Service(version = "1.0.0")
public class HelloServiceImpl implements HelloService {
    // Hystrix handles any exception thrown by this method
    @HystrixCommand(commandProperties = {
        @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "10"),
        @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "2000") })
    public String sayHello(String name) {
        // System.out.println("async provider received: " + name);
        // return "annotation: hello, " + name;
        throw new RuntimeException("Exception to show hystrix enabled.");
    }
}


3. Configure the Consumer side

On the Consumer side, you can add one more layer of method calls and put @HystrixCommand on that method. When the call fails, Hystrix invokes the method named by fallbackMethod, which is "hello" in the example below.

public class OrderServiceImpl implements OrderService {

    @Reference(version = "1.0.0")
    UserService userService;

    // Specifies the fallback method to call when an error occurs
    @HystrixCommand(fallbackMethod = "hello")
    public List<UserAddress> initOrder(String userId) {
        System.out.println("user ID: " + userId);
        // 1. Query the user's delivery addresses
        List<UserAddress> userAddressList = userService.getUserAddressList(userId);
        System.out.println("userAddressList = " + userAddressList);

        userAddressList.forEach(t -> System.out.println(t.getUserAddress()));
        System.out.println("Call complete!");
        return userAddressList;
    }

    // Fallback: called when initOrder fails
    public List<UserAddress> hello(String userId) {
        // Return mock data, etc.
        return Collections.emptyList();
    }
}