Spring cloud upgrade 2020.0.x - 43. Why is there a link information loss in spring cloud gateway

Posted by CMC on Sun, 28 Nov 2021 03:48:48 +0100

Code address of this series: https://github.com/JoJoTec/sp...

Before we start to write our own log Filter, there is another problem I want to share with you here, that is, the loss of link information may occur in the Spring Cloud Gateway.

Main conflict - design conflict between Project Reactor and Java Logger MDC

Project reactor is the implementation of the programming mode based on asynchronous response design. Its main implementation idea is to write the execution link first, and finally sub executes the whole link. However, it is uncertain which thread executes each part of the link.

The design of Java's log framework and its context MDC (Mapped Diagnostic Context) information are designed based on threads. In fact, it can be simply understood as a ThreadLocal Map. The link information of the log is saved in this MDC.

In fact, it can be seen that Project Reactor is incompatible with the MDC of the log framework by default. As long as asynchronous thread switching occurs, the MDC will change. Spring Cloud Sleuth has added a lot of bonding code for this, but a wise man's thoughtfulness must have a mistake. Project Reactor application scenarios and libraries are also developing and growing. Spring Cloud Sleuth may also miss some scenarios, resulting in the loss of link information.

A common scenario of link information loss in Spring Cloud Gateway

We write a simple test project( Project address):

Import dependency:

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.4.6</version>
</parent>

<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-gateway</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-log4j2</artifactId>
    </dependency>
    <!--log4j2 Asynchronous logging requires dependencies that all projects must use log4j2 And asynchronous log configuration-->
    <dependency>
        <groupId>com.lmax</groupId>
        <artifactId>disruptor</artifactId>
        <version>${disruptor.version}</version>
    </dependency>
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>2020.0.3</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

Turn on AdaptCachedBodyGlobalFilter for all paths:

@Configuration(proxyBeanMethods = false)
public class ApiGatewayConfiguration {
    @Autowired
    private AdaptCachedBodyGlobalFilter adaptCachedBodyGlobalFilter;
    @Autowired
    private GatewayProperties gatewayProperties;

    @PostConstruct
    public void init() {
        gatewayProperties.getRoutes().forEach(routeDefinition -> {
            //Enable AdaptCachedBodyGlobalFilter for each route in the spring cloud gateway routing configuration 
            EnableBodyCachingEvent enableBodyCachingEvent = new EnableBodyCachingEvent(new Object(), routeDefinition.getId());
            adaptCachedBodyGlobalFilter.onApplicationEvent(enableBodyCachingEvent);
        });
    }
}

Configuration (we only have one route to forward the request to httpbin.org, the http request test website):

server:
  port: 8181
spring:
  application:
    name: apiGateway
  cloud:
    gateway:
      httpclient:
        connect-timeout: 500
        response-timeout: 60000
      routes:
        - id: first_route
          uri: http://httpbin.org
          predicates:
              - Path=/httpbin/**
          filters:
              - StripPrefix=1

Add two global filters, one before the AdaptCachedBodyGlobalFilter and one after the AdaptCachedBodyGlobalFilter. These two filters are very simple, just make a log.

@Log4j2
@Component
public class PreLogFilter implements GlobalFilter, Ordered {

    public static final int ORDER = new AdaptCachedBodyGlobalFilter().getOrder() - 1;

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        log.info("before AdaptCachedBodyGlobalFilter");
        return chain.filter(exchange);
    }

    @Override
    public int getOrder() {
        return ORDER;
    }
}

@Log4j2
@Component
public class PostLogFilter implements GlobalFilter, Ordered {

    public static final int ORDER = new AdaptCachedBodyGlobalFilter().getOrder() + 1;

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        log.info("after AdaptCachedBodyGlobalFilter");
        return chain.filter(exchange);
    }

    @Override
    public int getOrder() {
        return ORDER;
    }
}

Finally, specify that the output format of Log4j2 contains link information, as specified at the beginning of the series.

Start the application and then access http://127.0.0.1:8181/httpbin/anything , check the log and find that there is no link information in the PostLogFilter:

2021-09-08 06:32:35.457  INFO [service-apiGateway,51063d6f1fe264d0,51063d6f1fe264d0] [30600] [reactor-http-nio-2][?:]: before AdaptCachedBodyGlobalFilter
2021-09-08 06:32:35.474  INFO [service-apiGateway,,] [30600] [reactor-http-nio-2][?:]: after AdaptCachedBodyGlobalFilter

How does Spring Cloud Sleuth add link information

Through the source code analysis before the series, we know that at the beginning of TraceWebFilter, we encapsulated Mono into a MonoWebFilterTrace. Its core source code is:

@Override
public void subscribe(CoreSubscriber<? super Void> subscriber) {
    Context context = contextWithoutInitialSpan(subscriber.currentContext());
    Span span = findOrCreateSpan(context);
    //Putting the Span into the execution context is actually putting the link information into org.slf4j.MDC for the log
    //The MDC of logs is generally ThreadLocal Map. The implementation class of Log4j2 is org.apache.logging.log4j.ThreadContext, and its core contextMap is a Map based on ThreadLocal
    //The simple understanding is to put the link information into a ThreadLocal Map, and each thread accesses its own Map to obtain the link information
    try (CurrentTraceContext.Scope scope = this.currentTraceContext.maybeScope(span.context())) {
        //Wrap the actual subscription with the Context where the Span is located, and close the Span at the end
        this.source.subscribe(new WebFilterTraceSubscriber(subscriber, context, span, this));
    }
    //After scope.close(), the link information will be removed from the ThreadLocal Map
}

@Override
public Object scanUnsafe(Attr key) {
    if (key == Attr.RUN_STYLE) {
        //The execution mode must not switch threads, that is, synchronous
        //Because the link information of the log is placed in the ThreadLocal object. When the thread is switched, the link information is gone
        return Attr.RunStyle.SYNC; 
    }
    return super.scanUnsafe(key);
}

What does WebFilterTraceSubscriber do? When an exception occurs and the http request ends, we may want to record the response information and exception information into the Span through this class encapsulation.

After the encapsulation of MonoWebFilterTrace, spring Webflux processes the request, which is actually encapsulated into the request for subscription processing after Mono. Therefore, the publish link and subscribe link of the entire internal Mono are wrapped by the scope in WebFilterTraceSubscriber. As long as we don't transfer in gateway filter ourselves If some forced asynchronous Mono or Flux causes thread switching, the link information will not be lost.

Why is the link information lost in the above test items

Let's see what the Mono link will look like after the adaptcachedbody globalfilter:

return Mono.defer(() ->
    new MonoWebFilterTrace(source, 
        RoutePredicateHandlerMapping.this.lookupRoute(exchange) //Find route on request
                .flatMap((Function<Route, Mono<?>>) r -> {
                    exchange.getAttributes().put(GATEWAY_ROUTE_ATTR, r); //Put the route into Attributes, which we will use later
                    return Mono.just(RoutePredicateHandlerMapping.this.webHandler); //Returns the FilteringWebHandler of RoutePredicateHandlerMapping
                }).switchIfEmpty( //If it is Mono.empty(), the route is not found
                    Mono.empty() 
                    .then(Mono.fromRunnable(() -> { //After returning Mono.empty(), log
                        if (logger.isTraceEnabled()) {
                            logger.trace("No RouteDefinition found for [" + getExchangeDesc(exchange) + "]");
                    }
                })))
            .switchIfEmpty(DispatcherHandler.this.createNotFoundError()) //If no handlerMapping other than Mono.empty() is returned, 404 is returned directly
            .then(
                Mono.defer(() -> {
                    //Omit link nesting before AdaptCachedBodyGlobalFilter
                    //Read the Body. Since TCP unpacks, they need to be spliced together
                    DataBufferUtils.join(exchange.getRequest().getBody())
                        //If there is no Body, an empty DataBuffer will be returned directly
                        .defaultIfEmpty(factory.wrap(new EmptyByteBuf(factory.getByteBufAllocator())))
                        //In the modify method, the dataBuffer is placed in the Attributes list of exchange only to prevent duplicate cache requests from entering the 'adaptcachedbody globalfilter' repeatedly
                        //After that, the new body and the original request are encapsulated into new requests to continue the gateway filters link
                        .map(dataBuffer -> decorate(exchange, dataBuffer, cacheDecoratedRequest))
                        .switchIfEmpty(Mono.just(exchange.getRequest())).flatMap(function);
                })
                .then(Mono.empty()))
            ), //Call the corresponding Handler
    TraceWebFilter.this.isTracePresent(), TraceWebFilter.this, TraceWebFilter.this.spanFromContextRetriever()).transformDeferred((call) -> {
        //The processing related to MetricsWebFilter is given in the previous code and omitted here
    });
);

Where DataBufferUtils.join(exchange.getRequest().getBody()) In fact, it is a FluxReceive, which we can understand here: submit a task to try to read the request Body, add the link processing of the gateway filter to the callback after reading the Body, and return immediately after submitting the task. This may be complex. Let's use a similar example:

//First, we create a new Span
Span span = tracer.newTrace();
//Declare a MonoOperator similar to the MonoWebFilterTrace encapsulated in TraceWebFilter
class MonoWebFilterTrace<T> extends MonoOperator<T, T> {
    protected MonoWebFilterTrace(Mono<? extends T> source) {
        super(source);
    }

    @Override
    public void subscribe(CoreSubscriber<? super T> actual) {
        //Wrap the subscription with span
        try (Tracer.SpanInScope spanInScope = tracer.withSpanInScope(span)) {
            source.subscribe(actual);
            //When spanInScope is about to be closed (that is, remove the link information from the ThreadLocal Map), print the log
            log.info("stopped");
        }
    }
}

Mono.defer(() -> new MonoWebFilterTrace(
        Mono.fromRunnable(() -> {
            log.info("first");
        })
        //Simulate FluxReceive
        .then(Mono.delay(Duration.ofSeconds(1))
        .doOnSuccess(longSignal -> log.info(longSignal))))
).subscribe(aLong -> log.info(aLong));

Mono.delay is similar to FluxReceive in performance. Both are executed by asynchronously switching thread pools. After executing the above code, we can see from the log:

2021-09-08 07:12:45.236  INFO [service-apiGateway,7b2f5c190e1406cb,7b2f5c190e1406cb] [31868] [reactor-http-nio-2][?:]: first
2021-09-08 07:12:45.240  INFO [service-apiGateway,7b2f5c190e1406cb,7b2f5c190e1406cb] [31868] [reactor-http-nio-2][?:]: stopped
2021-09-08 07:12:46.241  INFO [service-apiGateway,,] [31868] [parallel-1][?:]: doOnEach_onNext(0)
2021-09-08 07:12:46.242  INFO [service-apiGateway,,] [31868] [parallel-1][?:]: onComplete()
2021-09-08 07:12:46.242  INFO [service-apiGateway,,] [31868] [parallel-1][?:]: 0

In the Spring Cloud Gateway, the thread pool used by the FluxReceive of the Request Body and the thread pool calling the GatewayFilter are the same, so the thread may still be the same. However, since the Span has ended and the link information has been removed from the ThreadLocal Map, there is still no link information in the log.

WeChat search "my programming meow" attention to the official account, daily brush, easy to upgrade technology, and capture all kinds of offer:

Topics: spring-cloud