How to avoid enabling Hystrix
Because the annotation on our entry class was changed from @SpringCloudApplication to @SpringBootApplication, Spring Cloud Circuit Breaker is not enabled, and the Hystrix dependency that gets pulled in has no effect. See Part 2 of this series: The way to upgrade Spring Cloud - Hoxton - 2. Annotation modification of entry class and transformation of OpenFeign
Using Resilience4j for instance-level isolation and circuit breaking
Why do we need instance-level circuit breaking? Some instances of a microservice may be temporarily unavailable, and when we retry a request we do not want to hit those instances again. By default, Spring Cloud Circuit Breaker breaks the circuit at the microservice level, so when only some instances are unavailable, the whole microservice is likely to be cut off. A typical case is a rolling release: if it is handled badly, a microservice-level circuit breaker makes the whole microservice unavailable even though some instances are still healthy. So we need circuit breaking at the instance level, not at the microservice level.
Why do we need instance-level thread isolation? To prevent one slow or faulty instance from blocking all business threads.
Spring Cloud Circuit Breaker exposes only a limited part of Resilience4j, and we want to use more of its features, such as thread isolation. It also makes microservice-level circuit breaking easy but instance-level circuit breaking hard: its configuration is keyed by microservice name and offers no extension point, so implementing instance-level behavior would mean changing code in too many places. We therefore dropped Spring Cloud Circuit Breaker.
Fortunately, Resilience4j provides its own official Spring Cloud starter, which configures the core beans for all of its features and is easy to use. We use this starter and its configuration mechanism to implement instance-level isolation and circuit breaking.
Add the dependency:
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-cloud2</artifactId>
    <version>${resilience4j-spring-cloud2.version}</version>
</dependency>
After that, the following beans are registered automatically: BulkheadRegistry, ThreadPoolBulkheadRegistry, CircuitBreakerRegistry, RateLimiterRegistry, and RetryRegistry. Their configuration properties classes and prefixes are:
- io.github.resilience4j.bulkhead.autoconfigure.BulkheadProperties : prefix resilience4j.bulkhead
- io.github.resilience4j.bulkhead.autoconfigure.ThreadPoolBulkheadProperties : prefix resilience4j.thread-pool-bulkhead
- io.github.resilience4j.circuitbreaker.autoconfigure.CircuitBreakerProperties : prefix resilience4j.circuitbreaker
- io.github.resilience4j.ratelimiter.autoconfigure.RateLimiterProperties : prefix resilience4j.ratelimiter
- io.github.resilience4j.retry.autoconfigure.RetryProperties : prefix resilience4j.retry
We mainly use two of these: CircuitBreaker and ThreadPoolBulkhead. CircuitBreaker provides instance-level circuit breaking, and ThreadPoolBulkhead provides instance-level thread isolation.
How to configure and use
Circuit breaker related configuration: CircuitBreaker
A circuit breaker has three normal states: CLOSED, OPEN and HALF_OPEN. Two further special states, DISABLED and FORCED_OPEN, are set manually and are not used here. CLOSED means the circuit breaker is closed and requests are processed as usual. OPEN means the circuit breaker is open: any request is rejected with a CallNotPermittedException. HALF_OPEN lets a limited number of trial requests through to decide whether the circuit breaker closes again or reopens.
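The cycle between the three normal states can be sketched in plain Java. This is a simplified model, not Resilience4j code: the class `TinyBreaker` and its methods are hypothetical, time is passed in explicitly to keep the example deterministic, a rejected call returns `false` instead of throwing CallNotPermittedException, and a single trial call decides the HALF_OPEN outcome (Resilience4j evaluates a configurable number of trial calls).

```java
import java.time.Duration;

/** Hypothetical, simplified model of the CLOSED -> OPEN -> HALF_OPEN cycle. */
class TinyBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final long waitInOpenMillis;
    private State state = State.CLOSED;
    private long openedAt;

    TinyBreaker(Duration waitDurationInOpenState) {
        this.waitInOpenMillis = waitDurationInOpenState.toMillis();
    }

    /** Called when the failure rate crosses the threshold (rate calculation omitted here). */
    void trip(long nowMillis) {
        state = State.OPEN;
        openedAt = nowMillis;
    }

    /** Returns true if a call may proceed; an OPEN breaker rejects until the wait time has elapsed. */
    boolean tryAcquire(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= waitInOpenMillis) {
            state = State.HALF_OPEN; // start letting trial calls through
        }
        return state != State.OPEN;
    }

    /** Reports the outcome of a trial call made while HALF_OPEN. */
    void onTrialResult(boolean success, long nowMillis) {
        if (state != State.HALF_OPEN) {
            return;
        }
        if (success) {
            state = State.CLOSED;
        } else {
            trip(nowMillis);
        }
    }

    State state() {
        return state;
    }
}

class TinyBreakerDemo {
    public static void main(String[] args) {
        TinyBreaker breaker = new TinyBreaker(Duration.ofSeconds(2));
        breaker.trip(0);
        System.out.println(breaker.tryAcquire(1_000)); // false: still OPEN
        System.out.println(breaker.tryAcquire(2_000)); // true: now HALF_OPEN
        breaker.onTrialResult(true, 2_100);
        System.out.println(breaker.state());           // CLOSED
    }
}
```

A failed trial call sends the breaker straight back to OPEN, and the wait time starts over.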
The circuit breaker uses a sliding window to count successful and failed requests, and decides from those counts whether to open or close. There are two kinds of sliding window:
- Count-based sliding window: uses a circular array of size N to record the results of the latest N requests.
- Time-based sliding window: records the results of the requests made in the last N seconds.
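To make the count-based variant concrete, here is a minimal sketch of a circular array that holds the last N outcomes and keeps a running failure count. The class `CountWindow` is hypothetical and only illustrates the idea; it is not how Resilience4j lays out its window internally.

```java
/** Hypothetical count-based sliding window: a circular array holding the last N outcomes. */
class CountWindow {
    private final boolean[] failed; // true = the call stored in this slot failed
    private int next;               // slot that the next outcome overwrites
    private int recorded;           // number of outcomes stored so far, capped at N
    private int failures;           // running count of failed slots

    CountWindow(int size) {
        this.failed = new boolean[size];
    }

    void record(boolean success) {
        if (recorded == failed.length && failed[next]) {
            failures--; // the oldest outcome is about to be overwritten
        }
        failed[next] = !success;
        if (!success) {
            failures++;
        }
        next = (next + 1) % failed.length;
        recorded = Math.min(recorded + 1, failed.length);
    }

    /** Failure rate in percent over the outcomes currently held in the window. */
    double failureRate() {
        return recorded == 0 ? 0.0 : 100.0 * failures / recorded;
    }
}

class CountWindowDemo {
    public static void main(String[] args) {
        CountWindow window = new CountWindow(4);
        window.record(false);
        window.record(false);
        window.record(true);
        window.record(true);
        System.out.println(window.failureRate()); // 50.0
        window.record(true); // overwrites the oldest failure
        System.out.println(window.failureRate()); // 25.0
    }
}
```

Because the failure count is updated when a slot is overwritten, reading the rate never requires scanning the whole array.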
Configuration item | Default | Description |
---|---|---|
failureRateThreshold | 50 | Failure rate threshold, in percent. When the failure rate reaches or exceeds this value, the circuit breaker becomes OPEN |
slowCallDurationThreshold | 60000[ms] | Slow call duration. A call that takes longer than this is recorded as a slow call |
slowCallRateThreshold | 100 | Slow call rate threshold, in percent. When the percentage of slow calls reaches this value, the circuit breaker becomes OPEN |
permittedNumberOfCallsInHalfOpenState | 10 | Number of requests allowed through while the circuit breaker is in the HALF_OPEN state |
slidingWindowType | COUNT_BASED | Sliding window type. COUNT_BASED is a count-based sliding window, TIME_BASED is a time-based sliding window |
slidingWindowSize | 100 | Sliding window size. With COUNT_BASED, the default of 100 means the last 100 requests. With TIME_BASED, the default of 100 means the requests of the last 100 seconds. |
minimumNumberOfCalls | 100 | Minimum number of requests. The circuit breaker is evaluated only once the sliding window holds at least this many requests. |
waitDurationInOpenState | 60000[ms] | Time to wait before going from OPEN to HALF_OPEN |
automaticTransitionFromOpenToHalfOpenEnabled | false | If true, the circuit breaker moves from OPEN to HALF_OPEN automatically once the wait time has elapsed, even if no request arrives. |
recordExceptions | empty | List of exceptions that count as failures. Any listed exception, or a subclass of one, thrown during a call is recorded as a failure, unless it is listed in ignoreExceptions. When a list is specified, other exceptions are not considered failures. When the list is empty (the default), all exceptions are considered failures. |
ignoreExceptions | empty | List of exceptions to ignore. A listed exception, or a subclass of one, is never counted as a failed request, even if it also appears in recordExceptions. The list is empty by default. |
The default configuration we implement here is:
resilience4j.circuitbreaker:
  configs:
    default:
      # Whether to register with health indicator of Actuator
      registerHealthIndicator: true
      slidingWindowSize: 10
      minimumNumberOfCalls: 5
      slidingWindowType: TIME_BASED
      permittedNumberOfCallsInHalfOpenState: 3
      automaticTransitionFromOpenToHalfOpenEnabled: true
      waitDurationInOpenState: 2s
      failureRateThreshold: 30
      recordExceptions:
        - java.lang.Exception
With this configuration, java.lang.Exception and all of its subclasses are recorded as failures. The sliding window is time-based and covers the requests of the last 10 seconds. The circuit breaker is evaluated only once at least 5 requests have been made within those 10 seconds. When the failure rate reaches 30% or more, the circuit breaker becomes OPEN. Once OPEN, it moves to HALF_OPEN automatically after 2 seconds, where 3 trial requests are permitted.
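The rules described here (10-second window, at least 5 calls, 30% failure rate) can be traced with a small sketch. The class `TimeWindow` is hypothetical: it stores every outcome with a timestamp, which is simpler than whatever aggregation Resilience4j uses internally, and it treats the threshold as inclusive, which is an assumption about the boundary case.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical time-based window applying the minimum-calls and failure-rate rules. */
class TimeWindow {
    private static final class Outcome {
        final long atMillis;
        final boolean success;

        Outcome(long atMillis, boolean success) {
            this.atMillis = atMillis;
            this.success = success;
        }
    }

    private final Deque<Outcome> outcomes = new ArrayDeque<>();
    private final long windowMillis;
    private final int minimumNumberOfCalls;
    private final double failureRateThreshold;

    TimeWindow(long windowMillis, int minimumNumberOfCalls, double failureRateThreshold) {
        this.windowMillis = windowMillis;
        this.minimumNumberOfCalls = minimumNumberOfCalls;
        this.failureRateThreshold = failureRateThreshold;
    }

    void record(long nowMillis, boolean success) {
        outcomes.addLast(new Outcome(nowMillis, success));
    }

    /** True if the breaker should open, judged only on calls still inside the window. */
    boolean shouldOpen(long nowMillis) {
        // Drop outcomes that have fallen out of the window.
        while (!outcomes.isEmpty() && nowMillis - outcomes.peekFirst().atMillis >= windowMillis) {
            outcomes.removeFirst();
        }
        if (outcomes.size() < minimumNumberOfCalls) {
            return false; // too few calls to judge
        }
        long failures = outcomes.stream().filter(o -> !o.success).count();
        return 100.0 * failures / outcomes.size() >= failureRateThreshold;
    }
}

class TimeWindowDemo {
    public static void main(String[] args) {
        TimeWindow window = new TimeWindow(10_000, 5, 30.0);
        window.record(0, false);
        window.record(1_000, false);
        window.record(2_000, true);
        window.record(3_000, true);
        System.out.println(window.shouldOpen(3_000));  // false: only 4 calls so far
        window.record(4_000, true);
        System.out.println(window.shouldOpen(4_000));  // true: 2 of 5 failed (40%)
        System.out.println(window.shouldOpen(11_500)); // false: both failures have expired
    }
}
```

Note the first check: four calls with a 50% failure rate do not open the breaker, because the minimum of 5 calls has not been reached.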
Configuration related to ThreadPoolBulkhead: Create and configure a ThreadPoolBulkhead
Configuration item | Default | Description |
---|---|---|
maxThreadPoolSize | Runtime.getRuntime().availableProcessors() | Maximum thread pool size |
coreThreadPoolSize | Runtime.getRuntime().availableProcessors() - 1 | Core thread pool size |
queueCapacity | 100 | Queue size |
keepAliveDuration | 20[ms] | When there are more threads than the core size, the maximum time that excess idle threads wait for new tasks before terminating |
The default configuration we implement here is:
resilience4j.thread-pool-bulkhead:
  configs:
    default:
      maxThreadPoolSize: 50
      coreThreadPoolSize: 10
      queueCapacity: 1
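To get a feel for how core size, maximum size and queue capacity interact, here is a sketch with a plain JDK ThreadPoolExecutor. It assumes that ThreadPoolBulkhead behaves like a thread pool with a bounded queue; the class `BoundedPoolDemo` and the small numbers are made up for the illustration. With such a pool, extra threads beyond the core size are only created once the queue is full, so a small queue makes the pool grow early and reject quickly once it is saturated.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {
    /** Submits `tasks` blocking tasks and returns how many of them were rejected. */
    static int countRejected(int core, int max, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 20, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(queueCapacity));
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // simulate a slow downstream instance
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // pool and queue are both full
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 2 core threads + 1 queued task + 1 extra thread = 4 accepted, the rest rejected
        System.out.println(countRejected(2, 3, 1, 6)); // 2
    }
}
```

With one pool per instance, a slow instance can only exhaust its own threads and queue slot; calls to other instances keep their own capacity.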
Integrating with OpenFeign
We need to apply the CircuitBreaker and ThreadPoolBulkhead after the FeignClient has been invoked and the load balancer has selected the instance that will receive the request. In other words, we need to get the instance chosen for this call and the microservice name, load the matching CircuitBreaker and ThreadPoolBulkhead, wrap the request in them, and then execute the call.
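The lookup described above boils down to a registry keyed by instance rather than by service. The sketch below shows only that keying and the fallback to a default configuration; `InstanceRegistry` and `Guard` are hypothetical stand-ins, where a real setup would hold a CircuitBreaker and a ThreadPoolBulkhead per instance.

```java
import java.net.URI;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry that keeps one guard object per service instance. */
class InstanceRegistry {
    /** Minimal per-instance state standing in for a circuit breaker plus bulkhead. */
    static final class Guard {
        final String instanceId;
        final String configName;

        Guard(String instanceId, String configName) {
            this.instanceId = instanceId;
            this.configName = configName;
        }
    }

    private final Map<String, Guard> guards = new ConcurrentHashMap<>();
    private final Set<String> configuredServices; // services that have their own configuration

    InstanceRegistry(Set<String> configuredServices) {
        this.configuredServices = configuredServices;
    }

    /** Key format: serviceName:host:port of the instance the load balancer picked. */
    static String instanceId(String serviceName, URI resolved) {
        return serviceName + ":" + resolved.getHost() + ":" + resolved.getPort();
    }

    /** All instances of a service share that service's configuration, or "default" if it has none. */
    Guard guardFor(String serviceName, URI resolved) {
        String configName = configuredServices.contains(serviceName) ? serviceName : "default";
        return guards.computeIfAbsent(instanceId(serviceName, resolved),
                id -> new Guard(id, configName));
    }
}

class InstanceRegistryDemo {
    public static void main(String[] args) {
        InstanceRegistry registry = new InstanceRegistry(Collections.singleton("order-service"));
        URI first = URI.create("http://10.0.0.1:8080/orders");
        URI second = URI.create("http://10.0.0.2:8080/orders");
        System.out.println(registry.guardFor("order-service", first).instanceId);
        // Same instance -> same guard; different instance -> separate guard
        System.out.println(registry.guardFor("order-service", first)
                == registry.guardFor("order-service", first));   // true
        System.out.println(registry.guardFor("order-service", first)
                == registry.guardFor("order-service", second));  // false
    }
}
```

Because each instance has its own guard, failures on one instance never count against another instance of the same service.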
The core FeignClient implementation is org.springframework.cloud.openfeign.loadbalancer.FeignBlockingLoadBalancerClient, as can be seen in org.springframework.cloud.openfeign.loadbalancer.DefaultFeignLoadBalancerConfiguration:
@Bean
@ConditionalOnMissingBean
public Client feignClient(BlockingLoadBalancerClient loadBalancerClient) {
    return new FeignBlockingLoadBalancerClient(new Client.Default(null, null), loadBalancerClient);
}
Check the source code of FeignBlockingLoadBalancerClient:
@Override
public Response execute(Request request, Request.Options options) throws IOException {
    final URI originalUri = URI.create(request.url());
    // Microservice name
    String serviceId = originalUri.getHost();
    Assert.state(serviceId != null, "Request URI does not contain a valid hostname: " + originalUri);
    // Select an instance from the load balancer
    ServiceInstance instance = loadBalancerClient.choose(serviceId);
    if (instance == null) {
        String message = "Load balancer does not contain an instance for the service " + serviceId;
        if (LOG.isWarnEnabled()) {
            LOG.warn(message);
        }
        return Response.builder().request(request)
                .status(HttpStatus.SERVICE_UNAVAILABLE.value())
                .body(message, StandardCharsets.UTF_8).build();
    }
    // Modify the original url
    String reconstructedUrl = loadBalancerClient.reconstructURI(instance, originalUri)
            .toString();
    // Build a new Request
    Request newRequest = Request.create(request.httpMethod(), reconstructedUrl,
            request.headers(), request.body(), request.charset(),
            // This RequestTemplate can get the microservice name
            request.requestTemplate());
    return delegate.execute(newRequest, options);
}
So we can replace the default implementation by providing our own FeignBlockingLoadBalancerClient whose delegate client proxies the call. However, Sleuth is also present and has a small bug that loses the RequestTemplate, so we cannot get the microservice name. For this, see my PR: replace method for deprecation and keep reference of requestTemplate. It will not be merged into the Hoxton release, however, so we need to shadow the class by creating one with the same package and name: org.springframework.cloud.sleuth.instrument.web.client.feign.TracingFeignClient
Request build() {
    if (headers == null) {
        return delegate;
    }
    String url = delegate.url();
    byte[] body = delegate.body();
    Charset charset = delegate.charset();
    // Keep requestTemplate
    return Request.create(delegate.httpMethod(), url, headers, body, charset, delegate.requestTemplate());
}
After that, we build a FeignBlockingLoadBalancerClient that applies CircuitBreaker and ThreadPoolBulkhead, and tune the HttpClient:
@Bean
public HttpClient getHttpClient() {
    // Pooled connections have a time to live of 5 minutes
    PoolingHttpClientConnectionManager pollingConnectionManager = new PoolingHttpClientConnectionManager(5, TimeUnit.MINUTES);
    // Total connections
    pollingConnectionManager.setMaxTotal(1000);
    // Maximum concurrent connections per route
    pollingConnectionManager.setDefaultMaxPerRoute(1000);
    HttpClientBuilder httpClientBuilder = HttpClients.custom();
    httpClientBuilder.setConnectionManager(pollingConnectionManager);
    // To keep connections alive, keep-alive needs to be added to the header
    httpClientBuilder.setKeepAliveStrategy(new DefaultConnectionKeepAliveStrategy());
    return httpClientBuilder.build();
}

@Bean
public FeignBlockingLoadBalancerClient feignBlockingLoadBalancerCircuitBreakableClient(
        HttpClient httpClient, BlockingLoadBalancerClient loadBalancerClient,
        BulkheadRegistry bulkheadRegistry, ThreadPoolBulkheadRegistry threadPoolBulkheadRegistry,
        CircuitBreakerRegistry circuitBreakerRegistry, RateLimiterRegistry rateLimiterRegistry,
        RetryRegistry retryRegistry, Tracer tracer) {
    return new FeignBlockingLoadBalancerClient(
            new CircuitBreakableClient(httpClient, bulkheadRegistry, threadPoolBulkheadRegistry,
                    circuitBreakerRegistry, rateLimiterRegistry, retryRegistry, tracer),
            loadBalancerClient);
}

@Log4j2
public static class CircuitBreakableClient extends feign.httpclient.ApacheHttpClient {
    private final BulkheadRegistry bulkheadRegistry;
    private final ThreadPoolBulkheadRegistry threadPoolBulkheadRegistry;
    private final CircuitBreakerRegistry circuitBreakerRegistry;
    private final RateLimiterRegistry rateLimiterRegistry;
    private final RetryRegistry retryRegistry;
    private final Tracer tracer;

    public CircuitBreakableClient(HttpClient httpClient, BulkheadRegistry bulkheadRegistry,
            ThreadPoolBulkheadRegistry threadPoolBulkheadRegistry,
            CircuitBreakerRegistry circuitBreakerRegistry, RateLimiterRegistry rateLimiterRegistry,
            RetryRegistry retryRegistry, Tracer tracer) {
        super(httpClient);
        this.bulkheadRegistry = bulkheadRegistry;
        this.threadPoolBulkheadRegistry = threadPoolBulkheadRegistry;
        this.circuitBreakerRegistry = circuitBreakerRegistry;
        this.rateLimiterRegistry = rateLimiterRegistry;
        this.retryRegistry = retryRegistry;
        this.tracer = tracer;
    }

    @Override
    public Response execute(Request request, Request.Options options) throws IOException {
        String serviceName = request.requestTemplate().feignTarget().name();
        URL url = new URL(request.url());
        String instanceId = serviceName + ":" + url.getHost() + ":" + url.getPort();

        // Each instance gets its own Resilience4j circuit breaker and bulkhead, so circuit
        // breaking happens per instance. All instances of a service share the configuration
        // registered under the service name, falling back to the default configuration.
        ThreadPoolBulkhead threadPoolBulkhead;
        CircuitBreaker circuitBreaker;
        try {
            threadPoolBulkhead = threadPoolBulkheadRegistry.bulkhead(instanceId, serviceName);
        } catch (ConfigurationNotFoundException e) {
            threadPoolBulkhead = threadPoolBulkheadRegistry.bulkhead(instanceId);
        }
        try {
            circuitBreaker = circuitBreakerRegistry.circuitBreaker(instanceId, serviceName);
        } catch (ConfigurationNotFoundException e) {
            circuitBreaker = circuitBreakerRegistry.circuitBreaker(instanceId);
        }

        // Keep traceId: capture the current span so it can be restored on the bulkhead thread
        Span span = tracer.currentSpan();
        Supplier<CompletionStage<Response>> completionStageSupplier = ThreadPoolBulkhead.decorateSupplier(threadPoolBulkhead,
                CircuitBreaker.decorateSupplier(circuitBreaker, () -> {
                    try (Tracer.SpanInScope cleared = tracer.withSpanInScope(span)) {
                        log.info("call url: {} -> {}", request.httpMethod(), request.url());
                        Response execute = super.execute(request, options);
                        // Treat any non-200 response as a failure so the circuit breaker records it
                        if (execute.status() != HttpStatus.OK.value()) {
                            throw new ResponseWrapperException(execute.toString(), execute);
                        }
                        return execute;
                    } catch (ResponseWrapperException e) {
                        // Rethrow unchanged so the wrapped Response is not wrapped a second time
                        throw e;
                    } catch (Exception e) {
                        throw new ResponseWrapperException(e.getMessage(), e);
                    }
                })
        );

        try {
            return Try.ofSupplier(completionStageSupplier).get().toCompletableFuture().join();
        } catch (CompletionException e) {
            Throwable cause = e.getCause();
            // If the failure carries the original Response, hand it back to Feign
            if (cause instanceof ResponseWrapperException) {
                ResponseWrapperException responseWrapperException = (ResponseWrapperException) cause;
                if (responseWrapperException.getResponse() != null) {
                    return (Response) responseWrapperException.getResponse();
                }
            }
            throw new ResponseWrapperException(cause.getMessage(), cause);
        }
    }
}
In this way, we have integrated OpenFeign and added instance-level circuit breaking and thread isolation.