Alibaba second-round interview: how do you stop a flood of external-interface timeouts from dragging down the whole system and causing an avalanche? Circuit breaking

Posted by aouriques on Sat, 26 Feb 2022 08:43:43 +0100

Hello, everyone~

In the Internet Plus era, business digitization has reached almost every industry you can think of. Business functions and marketing methods keep multiplying, and systems keep growing more complex.

Faced with increasingly complex business systems, one person's brain can no longer keep up.

So smart people came up with the design idea of "microservices".

Following the principle of "making complex things simple", we split a large system into several subsystems. Each subsystem has a single responsibility and, in line with DDD (domain-driven design), owns the business of one subdomain.

This lets people focus their energy on building one business area in depth.

The microservices are connected through an RPC framework (such as Dubbo, Spring Cloud, or gRPC). As requirements grow, however, the stability of the calls between services becomes more and more important.

For example:

  • Service D fails and responds slowly

  • Service G and Service F both depend on Service D, so they are affected too and their own responses slow down

  • The impact propagates upward layer by layer, and Service A and Service B are dragged down as well

  • Finally an avalanche effect is triggered, and the area affected by the fault keeps growing

To solve this problem, we need to introduce a circuit-breaking mechanism. As the saying goes: "Cut when you must and you avoid the chaos; hesitate to cut and you will suffer for it."

What is circuit breaking?

Circuit breaking (sometimes translated as "fusing") restricts calls to a resource when that resource becomes unstable somewhere in the call chain, for example when calls time out or the exception ratio rises. Requests then fail fast, which avoids affecting other resources and causing cascading errors.

Once a resource has been degraded, calls to it are automatically rejected for the duration of the next "degrade time window" (by default a BlockException is thrown).

There are many circuit-breaker frameworks available today, such as Sentinel, Hystrix, and Resilience4j. Their design ideas are much the same.

This article focuses on how to use Sentinel in a project.

Sentinel (the "traffic guard" for distributed systems) is a comprehensive service fault-tolerance solution open-sourced by Alibaba. Taking traffic as its entry point, it protects service stability along several dimensions, such as flow control, circuit breaking and degradation, and system load protection.

The core is divided into two parts:

1. Core library (Java client): it can run in all Java environments and has good support for Dubbo, Spring Cloud and other frameworks.

2. Dashboard: it is developed based on Spring Boot and can be run directly after packaging.

Sentinel circuit-breaking strategies:

  • Slow call ratio (based on response time, RT)

  • Exception count

  • Exception ratio

Sentinel installation

First, download the Sentinel console package from the official releases page.

Download address: https://github.com/alibaba/Sentinel/releases

After downloading the JAR, open a terminal and run the command:

java -Dserver.port=8180 -Dcsp.sentinel.dashboard.server=localhost:8180 -Dproject.name=sentinel-dashboard -jar sentinel-dashboard-1.8.1.jar

Log in to the Sentinel console:

The default username and password are both sentinel. After logging in you will see the dashboard. Take a moment to get an intuitive feel for it first.

Configuring rules in the console:

Here the circuit-breaking strategy selected is "slow call ratio". A request whose response time exceeds 200 milliseconds is marked as a slow request. If, within one statistics window of 1000 ms (adjustable), the proportion of slow requests exceeds 30% and at least the minimum number of requests (3 here) has been seen, subsequent requests are rejected for 10 seconds, after which normal handling resumes.
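To make the rule concrete, here is a minimal sketch in plain Java of the decision taken for one statistics window. It does not use Sentinel's classes: `SlowCallRule` and `shouldTrip` are hypothetical names, and the sketch assumes the breaker trips when the window holds at least the minimum number of requests and the slow ratio is strictly above the threshold.

```java
import java.util.List;

// Hypothetical illustration of a slow-call-ratio rule; not Sentinel's implementation.
final class SlowCallRule {
    final long maxRtMs;              // calls slower than this count as slow (200 in the example)
    final double slowRatioThreshold; // fraction of slow calls that trips the breaker (0.3)
    final int minRequestAmount;      // minimum calls in a window before judging (3)

    SlowCallRule(long maxRtMs, double slowRatioThreshold, int minRequestAmount) {
        this.maxRtMs = maxRtMs;
        this.slowRatioThreshold = slowRatioThreshold;
        this.minRequestAmount = minRequestAmount;
    }

    /** Decides whether the response times seen in one statistics window should open the breaker. */
    boolean shouldTrip(List<Long> responseTimesMs) {
        int total = responseTimesMs.size();
        if (total < minRequestAmount) {
            return false; // too few samples to judge
        }
        long slow = responseTimesMs.stream().filter(rt -> rt > maxRtMs).count();
        return (double) slow / total > slowRatioThreshold;
    }
}
```

With `new SlowCallRule(200, 0.3, 3)`, a window of four calls taking 50, 250, 300 and 100 ms has two slow calls out of four, so the ratio of 0.5 trips the breaker. A window with only two calls never does, however slow they are.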

Annotation-based integration

Integration is very simple. Configure the rule for the resource on the console in advance, then add the @SentinelResource annotation to the code.

// The resource name is handle1
@RequestMapping("/handle1")
@SentinelResource(value = "handle1", blockHandler = "blockHandlerTestHandler")
public String handle1(String params) {
    // Business logic
    return "success";
}

// Block handler for handle1: called when the request is rate-limited or circuit-broken
public String blockHandlerTestHandler(String params, BlockException blockException) {
    return "Fallback response";
}

Once the threshold is reached, the system's default response is a block of English text, which is unfriendly to users. We can customize the fallback by configuring the blockHandler and fallback attributes of the @SentinelResource annotation.

  • blockHandler: covers Sentinel's own decisions. If the call is rate-limited or circuit-broken, this method is called to provide the fallback

  • fallback: covers business exceptions. If an exception is thrown during execution, this method is called to provide the fallback

With these two layers in place, the Sentinel integration becomes friendlier and gives users a better experience.

Note: the annotation has to be placed on a method, so its scope is fairly fixed. In the project practice below we use the explicit (programmatic) form instead, which lets us flexibly delimit the guarded code block.

Project practice

We have a project where, given the customer's deployment cost, we want a lightweight solution. The requirements are:

  • Use the framework's circuit-breaking feature without deploying the console

  • The interception points are fairly concentrated. Similar to remote calls from a Dubbo consumer, interception happens in the proxy class at the point of remote communication

Design outline (flow):

1. We use Proxy.newProxyInstance to create a proxy class for each interface

2. All method calls on the proxy are funneled into the InvocationHandler

3. We concatenate the class name and method name, then look the result up in the "rule table" to see whether a rule is configured

4. If not, make the remote call through the normal path

5. If so, bring the remote call under Sentinel's supervision

6. If the circuit breaker trips, a BlockException is thrown directly. The upper business layer catches the exception and handles it specially, for example by showing the user a more suitable message.
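The flow above can be sketched with the JDK's dynamic proxy alone. This is only an illustration under stated assumptions: `GuardedProxySketch`, `OrderApi`, `CallBlockedException`, `breakerOpen` and `remoteCall` are hypothetical names, the `breakerOpen` predicate stands in for Sentinel's check, and `CallBlockedException` stands in for BlockException.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of the proxy flow; not the project's actual code.
final class GuardedProxySketch {

    /** Hypothetical remote interface, used only for this illustration. */
    interface OrderApi {
        String query(String orderId);
    }

    /** Thrown when the breaker rejects a call; stands in for Sentinel's BlockException. */
    static final class CallBlockedException extends RuntimeException {
        CallBlockedException(String resource) {
            super("blocked: " + resource);
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> api,
                        Set<String> guardedResources,        // resource names that have a rule
                        Predicate<String> breakerOpen,       // assumed breaker check
                        Function<String, Object> remoteCall) // assumed remote call
    {
        // Step 2: every method call on the proxy lands in this handler
        InvocationHandler handler = (proxy, method, args) -> {
            // Step 3: resource name = interface name + "#" + method name
            String resource = api.getName() + "#" + method.getName();
            // Steps 5 and 6: guarded resources fail fast while the breaker is open
            if (guardedResources.contains(resource) && breakerOpen.test(resource)) {
                throw new CallBlockedException(resource);
            }
            // Step 4: otherwise make the remote call as usual
            return remoteCall.apply(resource);
        };
        // Step 1: create the proxy for the interface
        return (T) Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api}, handler);
    }
}
```

A caller in the business layer would wrap `api.query(...)` in a try/catch for `CallBlockedException` and return a friendlier message to the user, which corresponds to step 6.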

Circuit breaker state machine:
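As a minimal sketch of such a state machine, assuming the usual three states (closed, open, half-open) and a single probe request after the recovery timeout: the class and method names below are hypothetical and are not Sentinel's own classes.

```java
// Hypothetical three-state circuit breaker; timestamps are passed in to keep it deterministic.
final class BreakerStateMachine {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final long recoveryTimeoutMs; // how long the breaker stays open
    private State state = State.CLOSED;
    private long openedAtMs;

    BreakerStateMachine(long recoveryTimeoutMs) {
        this.recoveryTimeoutMs = recoveryTimeoutMs;
    }

    /** Returns true if a call may proceed at time nowMs. */
    synchronized boolean tryPass(long nowMs) {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && nowMs - openedAtMs >= recoveryTimeoutMs) {
            state = State.HALF_OPEN; // let exactly one probe request through
            return true;
        }
        return false; // still OPEN, or HALF_OPEN with a probe already in flight
    }

    /** Called when the rule's threshold is exceeded. */
    synchronized void trip(long nowMs) {
        state = State.OPEN;
        openedAtMs = nowMs;
    }

    /** Reports the outcome of the probe request sent while HALF_OPEN. */
    synchronized void onProbeResult(boolean healthy, long nowMs) {
        if (state != State.HALF_OPEN) {
            return;
        }
        if (healthy) {
            state = State.CLOSED;
        } else {
            state = State.OPEN; // probe failed: start another recovery timeout
            openedAtMs = nowMs;
        }
    }

    synchronized State state() {
        return state;
    }
}
```

In this sketch the recovery timeout plays the role of the rule's time window: while the breaker is open every call is rejected immediately, and only the probe's outcome decides whether traffic resumes.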

The core code logic follows.

First, add the Sentinel dependency:

<!-- Rate limiting and circuit breaking framework -->
<dependency>
    <groupId>com.alibaba.csp</groupId>
    <artifactId>sentinel-core</artifactId>
    <version>1.8.3</version>
</dependency>

Circuit-breaker rule table design:

CREATE TABLE `degrade_rule` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
  `resource_name` varchar(256) NOT NULL COMMENT 'Resource name',
  `count` double NOT NULL COMMENT 'Slow call RT threshold, in milliseconds',
  `slow_ratio_threshold` double NOT NULL COMMENT 'Slow call ratio threshold',
  `min_request_amount` int NOT NULL COMMENT 'Minimum number of requests required to trigger circuit breaking',
  `stat_interval` int NOT NULL COMMENT 'Statistics window, in milliseconds',
  `time_window` int NOT NULL COMMENT 'Circuit-breaking duration, in seconds',
  `created_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
  `updated_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Modification time',
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE KEY `uk_resource_name` (`resource_name`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb3 COMMENT='Circuit-breaker rule table';

Because we are not deploying the console, we have to manage the rule attributes ourselves. You can build pages for managing these rules in the style of your company's internal admin backend.

Of course, early on you can take the simpler route of initializing the table data manually and adjusting rules with SQL updates.

To pick up changes to the rule table in near real time, we wrote a scheduled task that runs every 10 seconds.

@Scheduled(cron = "0/10 * * * * ?")
public void loadDegradeRule() {

    List<DegradeRuleDO> degradeRuleDOList = degradeRuleDao.queryAllRule();
    if (CollectionUtils.isEmpty(degradeRuleDOList)) {
        return;
    }

    // Skip the reload when the rule table has not changed since the last run
    String newMd5Hex = DigestUtils.md5Hex(JSON.toJSONString(degradeRuleDOList));
    if (StringUtils.isBlank(newMd5Hex) || StringUtils.equals(lastMd5Hex, newMd5Hex)) {
        return;
    }

    List<String> resourceNameList = new ArrayList<>();
    List<DegradeRule> rules = degradeRuleDOList.stream().map(degradeRuleDO -> {
        // Resource name, i.e. the target the rule applies to
        DegradeRule rule = new DegradeRule(degradeRuleDO.getResourceName())
                // Strategy: slow call ratio / exception ratio / exception count
                .setGrade(CircuitBreakerStrategy.SLOW_REQUEST_RATIO.getType())
                // In slow call ratio mode: the RT above which a call counts as slow.
                // In exception ratio / exception count mode: the corresponding threshold
                .setCount(degradeRuleDO.getCount())
                // Circuit-breaking duration, in seconds
                .setTimeWindow(degradeRuleDO.getTimeWindow())
                // Slow call ratio threshold
                .setSlowRatioThreshold(degradeRuleDO.getSlowRatioThreshold())
                // Minimum number of requests; below it the breaker never trips,
                // even if the ratio exceeds the threshold
                .setMinRequestAmount(degradeRuleDO.getMinRequestAmount())
                // Statistics window, in milliseconds
                .setStatIntervalMs(degradeRuleDO.getStatInterval());
        resourceNameList.add(degradeRuleDO.getResourceName());
        return rule;
    }).collect(Collectors.toList());

    if (CollectionUtils.isNotEmpty(rules)) {
        DegradeRuleManager.loadRules(rules);
        ConsumerProxyFactory.resourceNameList = resourceNameList;
        lastMd5Hex = newMd5Hex;
    }

    // Informational message, so log at info level rather than error
    log.info("[DegradeRuleConfig] Circuit-breaker rules loaded: {}", rules);
}

Since the rules will not change very often, there is no need to call DegradeRuleManager.loadRules on every run. Here is a small trick:

DigestUtils.md5Hex(JSON.toJSONString(degradeRuleDOList));

Serialize the queried rules to JSON, then compute the MD5 digest of the result. If it matches the previous digest, nothing has changed in the meantime, so return directly without doing anything.
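The same change check can be sketched with the JDK alone. This is an illustration, not the article's code: it swaps in `java.security.MessageDigest` for `DigestUtils.md5Hex`, takes the already serialized rules as a string, and `RuleChangeDetector` and `hasChanged` are hypothetical names.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: remembers the digest of the last applied rule snapshot.
final class RuleChangeDetector {
    private String lastMd5Hex; // null until the first snapshot is seen

    /** Returns the MD5 digest of the text as 32 lowercase hex characters. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Returns true, and remembers the digest, only when the snapshot differs from the last one. */
    synchronized boolean hasChanged(String serializedRules) {
        String md5 = md5Hex(serializedRules);
        if (md5.equals(lastMd5Hex)) {
            return false; // same content as last time: skip the reload
        }
        lastMd5Hex = md5;
        return true;
    }
}
```

The check only works if the serialization is deterministic, for example if the query returns rows in a stable order. Otherwise identical rules could produce different digests and trigger needless reloads.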

Define a class that implements the InvocationHandler interface, and use Proxy.newProxyInstance to create a proxy for the target interface.

This way, every call to an interface method actually goes through the invoke method.

@Override
public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    Class<?> clazz = proxy.getClass().getInterfaces()[0];
    // Resource name: interface name + "#" + method name
    String urlCode = clazz.getName() + "#" + method.getName();
    String responseString = null;
    if (resourceNameList.contains(urlCode)) {
        // A rule is configured: guard the call with the circuit breaker
        Entry entry = null;
        try {
            entry = SphU.entry(urlCode);
            // Remote network call to get the result
            responseString = HttpClientUtil.postJsonRequest(url, header, body);
        } catch (BlockException blockException) {
            // The circuit breaker has tripped
            log.error("degrade trigger !  remote url :{} ", urlCode);
            throw new DegradeBlockExcetion(urlCode);
        } finally {
            if (entry != null) {
                entry.exit();
            }
        }
    } else {
        // Normal processing without circuit-breaker logic
        // (omitted)
    }
    // Converting responseString to the method's return type is omitted
    return responseString;
}

Experimental data:
