How to add some custom mechanisms to Spring Boot graceful shutdown

Posted by cmccomas on Sat, 05 Mar 2022 13:16:58 +0100


Starting with Spring Boot 2.3.x, a graceful shutdown mechanism is built in. We have deployed it in production to improve the user experience. These days most services keep their business data correct even without a graceful shutdown, relying on eventual consistency, transactions, and similar mechanisms. Even so, an abrupt shutdown still leaves data inconsistent for a short time and hurts the user experience. Graceful shutdown addresses this by making sure the requests currently in flight are processed before the beans in the ApplicationContext are destroyed.

Problems with graceful shutdown

The shutdown of an ApplicationContext can be divided into the following steps (see the doClose method of AbstractApplicationContext):

  1. Unregister the current ApplicationContext from LiveBeansView (at present this only means unregistering it from JMX)
  2. Publish the ContextClosedEvent and run all listeners of this event synchronously
  3. Process all beans that implement the Lifecycle interface, work out their shutdown order, and call their stop method
  4. Destroy all beans in ApplicationContext
  5. Close BeanFactory

Put simply, graceful shutdown is implemented as a Lifecycle that runs in step 3 above. It does two things:

  1. Cut off the external traffic entry point: the Spring Boot web container stops processing new requests and rejects any that arrive, for example by returning 503 directly.
  2. Wait for the thread pool behind the Dispatcher to finish all in-flight requests: for the synchronous Servlet stack this is the thread pool that handles Servlet requests; for the asynchronous, reactive WebFlux stack it is the Reactor thread pool that processes all events emitted by the Publishers of the web requests.

Cutting off the external traffic first guarantees that no new requests arrive. While the thread pool drains the remaining requests, the normal business logic still runs as usual. Only after that can the other components start shutting down.

For this to work, the graceful shutdown Lifecycle must be the first of all Lifecycles to stop, so that every request has been processed before any other Lifecycle begins stopping. What goes wrong otherwise? Suppose one Lifecycle is a load balancer whose stop method shuts the load balancer down. If it stops before the graceful shutdown Lifecycle, some in-flight requests may still need the load balancer to call other microservices after it has stopped, and those calls fail.

Another problem is that the built-in graceful shutdown is not comprehensive, and sometimes we need to add more shutdown logic on top of it. For example, a project usually has other thread pools besides the one the web container uses to process requests, and they can be intertwined: one pool submits tasks to another, pools submit to each other, and so on. After the web container's thread pool has processed all requests, we need to wait for these pools to finish all their tasks too before shutting down. Another example is an MQ consumer: on graceful shutdown it should stop consuming new messages and wait until all messages currently being handled are processed. These problems can be seen in the figure below:

Source code analysis of the extension points - Spring Boot + Undertow & synchronous Servlet environment

Starting from the source code, we analyze where custom logic can be hooked in when Spring Boot uses Undertow as the web container in a synchronous Servlet environment. First, add the Spring Boot dependencies and configure graceful shutdown:

pom.xml

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <exclusions>
        <!--Do not use default tomcat container-->
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-tomcat</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!--use undertow container-->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-undertow</artifactId>
</dependency>

application.yml

server:
  # Set the shutdown mode to graceful
  shutdown: graceful
  
management:
  endpoint:
    health:
      show-details: always
    # Actuator exposes the /actuator/shutdown endpoint for shutting the application down (the shutdown is graceful because graceful shutdown is enabled above)
    shutdown:
      enabled: true
  endpoints:
    jmx:
      exposure:
        exclude: '*'
    web:
      exposure:
        include: '*'

With the shutdown mode set to graceful, Spring Boot adds a graceful shutdown handler when it creates the Undertow-based WebServer at startup. See the source code:

UndertowWebServerFactoryDelegate

static List<HttpHandlerFactory> createHttpHandlerFactories(Compression compression, boolean useForwardHeaders,
			String serverHeader, Shutdown shutdown, HttpHandlerFactory... initialHttpHandlerFactories) {
	List<HttpHandlerFactory> factories = new ArrayList<>(Arrays.asList(initialHttpHandlerFactories));
	if (compression != null && compression.getEnabled()) {
		factories.add(new CompressionHttpHandlerFactory(compression));
	}
	if (useForwardHeaders) {
		factories.add(Handlers::proxyPeerAddress);
	}
	if (StringUtils.hasText(serverHeader)) {
		factories.add((next) -> Handlers.header(next, "Server", serverHeader));
	}
	//If graceful shutdown is specified, add gracefulShutdown
	if (shutdown == Shutdown.GRACEFUL) {
		factories.add(Handlers::gracefulShutdown);
	}
	return factories;
}

The handler added here is Undertow's GracefulShutdownHandler. GracefulShutdownHandler is an HttpHandler, and that interface is simple:

public interface HttpHandler {
    void handleRequest(HttpServerExchange exchange) throws Exception;
}

Every HTTP request received passes through the handleRequest method of each HttpHandler in the chain. The idea behind GracefulShutdownHandler is simple: since every request passes through its handleRequest method, it atomically increments a counter when a request arrives and atomically decrements it when the request completes. Completion here means the response has been sent, not that the method has returned, because the request may be asynchronous; that is why the decrement is done in a callback. When the counter is zero, no requests are being processed. The source code is:

GracefulShutdownHandler:

@Override
public void handleRequest(HttpServerExchange exchange) throws Exception {
    //Atomically increment the request counter; the returned snapshot also contains the shutdown status bit
    long snapshot = stateUpdater.updateAndGet(this, incrementActive);
    //Use the status bit to check whether shutdown is in progress
    if (isShutdown(snapshot)) {
        //Shutting down: atomically decrement the counter again right away
        decrementRequests();
        //Set the response code to 503
        exchange.setStatusCode(StatusCodes.SERVICE_UNAVAILABLE);
        //Mark the request as complete
        exchange.endExchange();
        //Return directly without passing the request to the remaining HttpHandlers
        return;
    }
    //Register a listener that is called once the request has completed and the response has been returned; it atomically decrements the counter
    exchange.addExchangeCompleteListener(listener);
    //Continue to the next HttpHandler
    next.handleRequest(exchange);
}

So when is shutdown called? As mentioned earlier, step 3 of the ApplicationContext shutdown process handles all beans that implement the Lifecycle interface, works out their shutdown order, and calls their stop methods. Graceful shutdown is invoked there. When a Spring Boot + Undertow & synchronous Servlet application starts and reaches the step that creates the WebServer, it also creates a graceful shutdown Lifecycle. The corresponding source code:

ServletWebServerApplicationContext

private void createWebServer() {
	WebServer webServer = this.webServer;
	ServletContext servletContext = getServletContext();
	if (webServer == null && servletContext == null) {
		StartupStep createWebServer = this.getApplicationStartup().start("spring.boot.webserver.create");
		ServletWebServerFactory factory = getWebServerFactory();
		createWebServer.tag("factory", factory.getClass().toString());
		this.webServer = factory.getWebServer(getSelfInitializer());
		createWebServer.end();
		//Create the WebServerGracefulShutdownLifecycle for this WebServer and register it in the BeanFactory
		getBeanFactory().registerSingleton("webServerGracefulShutdown",
				new WebServerGracefulShutdownLifecycle(this.webServer));
		getBeanFactory().registerSingleton("webServerStartStop",
				new WebServerStartStopLifecycle(this, this.webServer));
	}
	else if (servletContext != null) {
		try {
			getSelfInitializer().onStartup(servletContext);
		}
		catch (ServletException ex) {
			throw new ApplicationContextException("Cannot initialize servlet context", ex);
		}
	}
	initPropertySources();
}

Step 3 of the ApplicationContext shutdown process calls the stop method of every Lifecycle, which here includes the stop method of WebServerGracefulShutdownLifecycle:

WebServerGracefulShutdownLifecycle

@Override
public void stop(Runnable callback) {
	this.running = false;
	this.webServer.shutDownGracefully((result) -> callback.run());
}

Because we use Undertow, the webServer here is an UndertowWebServer. Take a look at its shutDownGracefully implementation:

UndertowWebServer

//The GracefulShutdownHandler here is the GracefulShutdownHandler added during startup
private volatile GracefulShutdownHandler gracefulShutdown;

@Override
public void shutDownGracefully(GracefulShutdownCallback callback) {
    //gracefulShutdown is non-null only when graceful shutdown is enabled (server.shutdown=graceful)
	if (this.gracefulShutdown == null) {
	    //null means graceful shutdown is not enabled, so there is nothing to wait for
		callback.shutdownComplete(GracefulShutdownResult.IMMEDIATE);
		return;
	}
	//Otherwise wait for the in-flight requests to finish before shutting down
	logger.info("Commencing graceful shutdown. Waiting for active requests to complete");
	this.gracefulShutdownCallback.set(callback);
	//Call shutdown of GracefulShutdownHandler to begin the graceful shutdown
	this.gracefulShutdown.shutdown();
	//Call addShutdownListener of GracefulShutdownHandler to register the action to run once shutdown completes, here notifyGracefulCallback,
	//which in turn invokes the callback passed in as the method parameter (the external callback)
	this.gracefulShutdown.addShutdownListener((success) -> notifyGracefulCallback(success));
}

private void notifyGracefulCallback(boolean success) {
	GracefulShutdownCallback callback = this.gracefulShutdownCallback.getAndSet(null);
	if (callback != null) {
		if (success) {
			logger.info("Graceful shutdown complete");
			callback.shutdownComplete(GracefulShutdownResult.IDLE);
		}
		else {
			logger.info("Graceful shutdown aborted with one or more requests still active");
			callback.shutdownComplete(GracefulShutdownResult.REQUESTS_ACTIVE);
		}
	}
}
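
The getAndSet(null) call in notifyGracefulCallback is worth a closer look: it takes the stored callback and clears it in one atomic step, so the result is delivered at most once even if more than one code path tries to report it. A minimal sketch of that pattern in plain Java follows. OnceOnlyCallback and its methods are hypothetical names used for illustration only; they are not part of Spring Boot.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical holder that hands a completion callback out at most once. */
class OnceOnlyCallback {
    private final AtomicReference<Consumer<Boolean>> callback = new AtomicReference<>();

    /** Stores the callback that should receive the shutdown result. */
    void set(Consumer<Boolean> cb) {
        callback.set(cb);
    }

    /** Delivers the result; returns true only if this call actually invoked the callback. */
    boolean complete(boolean success) {
        // getAndSet(null) atomically takes the callback, so any later caller sees null.
        Consumer<Boolean> cb = callback.getAndSet(null);
        if (cb == null) {
            return false;
        }
        cb.accept(success);
        return true;
    }
}
```

With this shape, a second call to complete is a no-op, whichever thread makes it.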

Take another look at the shutdown method of GracefulShutdownHandler and the addShutdownListener method:

GracefulShutdownHandler:

public void shutdown() {
    //Set the shutdown status bit and atomically increment the counter
    stateUpdater.updateAndGet(this, incrementActiveAndShutdown);
    //Atomically decrement the counter again, which also runs the completion check
    decrementRequests();
}

private void decrementRequests() {
    long snapshot = stateUpdater.updateAndGet(this, decrementActive);
    // Shutdown has completed when the activeCount portion is zero, and shutdown is set.
    //If the snapshot equals SHUTDOWN_MASK exactly, all other bits are 0, i.e. no requests are still being processed
    if (snapshot == SHUTDOWN_MASK) {
        //Call shutdownComplete
        shutdownComplete();
    }
}

private void shutdownComplete() {
    synchronized (lock) {
        lock.notifyAll();
        //Call the shutdown method of each ShutdownListener
        for (ShutdownListener listener : shutdownListeners) {
            listener.shutdown(true);
        }
        shutdownListeners.clear();
    }
}

/**
 * This method does more than its name suggests. First, a ShutdownListener cannot be added unless shutdown has already begun.
 * Then, if no requests are active, the shutdown method of the given shutdownListener is called directly.
 * If requests are still active, the listener is added to shutdownListeners; when shutdownComplete is called later, it iterates over shutdownListeners and calls shutdown on each.
 * The lock mainly guards access to shutdownListeners from addShutdownListener and shutdownComplete.
 * The wait/notify on lock implements the awaitShutdown mechanism, which is not covered here.
 */
public void addShutdownListener(final ShutdownListener shutdownListener) {
    synchronized (lock) {
        if (!isShutdown(stateUpdater.get(this))) {
            throw UndertowMessages.MESSAGES.handlerNotShutdown();
        }
        long count = activeCount(stateUpdater.get(this));
        if (count == 0) {
            shutdownListener.shutdown(true);
        } else {
            shutdownListeners.add(shutdownListener);
        }
    }
}
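
To make the status bit and the counter more concrete, here is a simplified model in plain Java. It assumes, as the comments above suggest, that a single long holds both a shutdown flag and the number of active requests. The class RequestGate and the exact bit layout (top bit as the flag) are assumptions made for illustration, not Undertow's actual code, and the listeners and awaitShutdown are left out.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Simplified model: top bit = shutdown flag, remaining 63 bits = active request count. */
class RequestGate {
    private static final long SHUTDOWN_MASK = 1L << 63;
    private static final long ACTIVE_COUNT_MASK = ~SHUTDOWN_MASK;

    private final AtomicLong state = new AtomicLong();

    /** Tries to admit a request; returns false (think "503") once shutdown has begun. */
    boolean tryEnter() {
        long snapshot = state.incrementAndGet();
        if ((snapshot & SHUTDOWN_MASK) != 0) {
            state.decrementAndGet(); // undo the increment, the request is rejected
            return false;
        }
        return true;
    }

    /** Called when an admitted request has completed. */
    void exit() {
        state.decrementAndGet();
    }

    /** Sets the shutdown flag without touching the active count. */
    void shutdown() {
        state.updateAndGet(s -> s | SHUTDOWN_MASK);
    }

    long activeCount() {
        return state.get() & ACTIVE_COUNT_MASK;
    }

    /** True when the flag is set and no request is active, i.e. state == SHUTDOWN_MASK. */
    boolean isDrained() {
        return state.get() == SHUTDOWN_MASK;
    }
}
```

Packing both values into one word means that the flag and the count are always read and updated together, which is why a single comparison against SHUTDOWN_MASK is enough to detect completion.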

That is the underlying principle of graceful shutdown. What we have not yet analyzed is step 3 of the ApplicationContext shutdown process: the order in which the graceful shutdown Lifecycle and the other Lifecycle beans are stopped. Let's clarify that here, starting with the entry point where the Lifecycle beans begin to stop:

DefaultLifecycleProcessor

private void stopBeans() {
    //Read all Lifecycle beans into a LinkedHashMap, whose iteration order equals its insertion order
    //The insertion order is the order in which the BeanFactory returns the Lifecycle beans. It depends on the bean loading order, is not controllable, and may change when a dependency is upgraded
	Map<String, Lifecycle> lifecycleBeans = getLifecycleBeans();
	//Group by the phase value of each Lifecycle
	//If the bean implements the Phased interface, the phase is the value returned by its getPhase method
	//If it does not implement Phased, the phase is considered to be 0
	Map<Integer, LifecycleGroup> phases = new HashMap<>();
	lifecycleBeans.forEach((beanName, bean) -> {
		int shutdownPhase = getPhase(bean);
		LifecycleGroup group = phases.get(shutdownPhase);
		if (group == null) {
			group = new LifecycleGroup(shutdownPhase, this.timeoutPerShutdownPhase, lifecycleBeans, false);
			phases.put(shutdownPhase, group);
		}
		group.add(beanName, bean);
	});
	//If it is not empty, there are Lifecycle beans to stop, so start stopping them
	if (!phases.isEmpty()) {
	    //Sort the phase values in descending order
		List<Integer> keys = new ArrayList<>(phases.keySet());
		keys.sort(Collections.reverseOrder());
		//Stop the groups one by one
		for (Integer key : keys) {
			phases.get(key).stop();
		}
	}
}

To sum up, it is actually:

  1. Get all beans that implement the Lifecycle interface from the BeanFactory of the current ApplicationContext.
  2. Read the phase of each bean: if the bean implements the Phased interface, use the value returned by getPhase; otherwise the phase is 0.
  3. Group the beans by phase.
  4. Traverse the groups in descending order of phase and stop each group in turn.
  5. We won't go through the code that stops a single group in detail. The point to know is that, when stopping a bean, Spring also checks whether other Lifecycle beans depend on it; if so, those dependent beans are stopped first.
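
The grouping and ordering steps above can be sketched without Spring. The following is a rough illustration only: PhaseOrderSketch and the bean names are hypothetical, the map stands in for the result of getPhase, and neither the dependency handling from step 5 nor the per-phase timeout is modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PhaseOrderSketch {
    /** Returns bean names in the order they would be stopped: highest phase first. */
    static List<String> stopOrder(Map<String, Integer> phaseByBean) {
        // A TreeMap with a reversed comparator iterates the phases from largest to smallest.
        Map<Integer, List<String>> groups = new TreeMap<>(Comparator.reverseOrder());
        phaseByBean.forEach((name, phase) ->
                groups.computeIfAbsent(phase, p -> new ArrayList<>()).add(name));

        // Flatten the groups; beans within one phase keep the map's iteration order.
        List<String> order = new ArrayList<>();
        groups.values().forEach(order::addAll);
        return order;
    }
}
```

For example, with a bean at Integer.MAX_VALUE, one at Integer.MAX_VALUE - 1, and one at 0, the beans are stopped in exactly that order regardless of the order in which they were registered.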

Now look at the phase of the WebServerGracefulShutdownLifecycle mentioned earlier:

class WebServerGracefulShutdownLifecycle implements SmartLifecycle {
    ....
}

SmartLifecycle extends the Phased interface and provides a default implementation:

public interface SmartLifecycle extends Lifecycle, Phased {
    int DEFAULT_PHASE = Integer.MAX_VALUE;
    @Override
	default int getPhase() {
		return DEFAULT_PHASE;
	}
}

So any bean that implements SmartLifecycle has the maximum phase by default. The phase of the graceful shutdown Lifecycle, WebServerGracefulShutdownLifecycle, is therefore the maximum value, which puts it in the group that is stopped first.

Summary of the extension points - Spring Boot + Undertow & synchronous Servlet environment

1. Extension point 1 - add a bean that implements the SmartLifecycle interface, with a phase smaller than that of WebServerGracefulShutdownLifecycle

From the analysis above we know that the phase of WebServerGracefulShutdownLifecycle is the maximum value, so it belongs to the group that is stopped first. What we want is to run our own graceful shutdown logic after it but before the beans are destroyed (step 4 of the ApplicationContext shutdown described earlier). Once destruction begins, some beans can no longer be used, for example the beans involved in microservice calls, and any task still running at that point that needs them will throw exceptions. The first idea that comes to mind is to add a Lifecycle with a suitable phase that hosts our graceful shutdown logic, for example:

@Log4j2
@Component
public class BizThreadPoolShutdownLifecycle implements SmartLifecycle {
    private volatile boolean running = false;
    
    @Override
    public int getPhase() {
        //Stopped right after the group that contains WebServerGracefulShutdownLifecycle
        return SmartLifecycle.DEFAULT_PHASE - 1;
    }

    @Override
    public void start() {
        this.running = true;
    }

    @Override
    public void stop() {
        //Write the graceful shutdown logic here
        this.running = false;
    }

    @Override
    public boolean isRunning() {
        return running;
    }
}

This approach has better compatibility: upgrading the underlying framework dependencies basically requires no changes. The catch is that some framework may bring in its own Lifecycle bean. Its phase may be correct in that it is smaller than that of WebServerGracefulShutdownLifecycle, yet be SmartLifecycle.DEFAULT_PHASE - 1, the same as our custom Lifecycle. If that framework Lifecycle needs to wait for our graceful shutdown logic to finish but, because of the bean loading order, is stopped before our custom Lifecycle, problems will occur. The probability of this is low, though.
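
As an illustration of what the stop method of such a Lifecycle might contain in the simplest case, the sketch below drains a single business thread pool using only JDK calls. PoolDrainer and the timeout parameter are hypothetical names chosen for this example. It assumes the pool does not receive tasks from other pools during shutdown, because shutdown() makes the pool reject new submissions; for pools that submit to each other, the polling approach described in the thread pool section later in this article fits better.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

class PoolDrainer {
    /**
     * Stops intake and waits up to timeoutMillis for queued and running tasks.
     * Returns true if the pool terminated in time, false otherwise.
     */
    static boolean drain(ExecutorService pool, long timeoutMillis) {
        pool.shutdown(); // no new tasks are accepted; already submitted tasks still run
        try {
            if (pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        pool.shutdownNow(); // give up waiting: interrupt whatever is still running
        return false;
    }
}
```

A call such as PoolDrainer.drain(bizPool, 10_000) could then sit in the stop method, with the timeout chosen to stay below the time the platform allows for the whole shutdown.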

2. Extension point 2 - use reflection to add a ShutdownListener to the List<ShutdownListener> shutdownListeners field of Undertow's GracefulShutdownHandler

This approach obviously requires the container to be Undertow, and it may break on upgrades. In return, our graceful shutdown logic runs immediately after the HTTP thread pool has shut down gracefully, and we need not worry that a newly introduced dependency will disturb the order of our custom shutdown logic. Weigh for yourself how it compares with the first approach. A simple implementation is:

@Log4j2
@Component
//Load only when the class Undertow is included
@ConditionalOnClass(name = "io.undertow.Undertow")
public class ThreadPoolFactoryGracefulShutDownHandler implements ApplicationListener<ApplicationEvent> {
    
    //VarHandle for the gracefulShutdown field of UndertowWebServer
    private static VarHandle undertowGracefulShutdown;
    //VarHandle for the shutdownListeners field of GracefulShutdownHandler
    private static VarHandle undertowShutdownListeners;

    static {
        try {
            undertowGracefulShutdown = MethodHandles
                    .privateLookupIn(UndertowWebServer.class, MethodHandles.lookup())
                    .findVarHandle(UndertowWebServer.class, "gracefulShutdown",
                            GracefulShutdownHandler.class);
            undertowShutdownListeners = MethodHandles
                    .privateLookupIn(GracefulShutdownHandler.class, MethodHandles.lookup())
                    .findVarHandle(GracefulShutdownHandler.class, "shutdownListeners",
                            List.class);
        } catch (Exception e) {
            log.warn("ThreadPoolFactoryGracefulShutDownHandler undertow not found, ignore fetch var handles");
        }
    }

    @Override
    public void onApplicationEvent(ApplicationEvent event) {
        //Only handle WebServerInitializedEvent, which is published after the WebServer has been created and initialized
        if (event instanceof WebServerInitializedEvent) {
            WebServer webServer = ((WebServerInitializedEvent) event).getWebServer();
            //Check whether the current web container is Undertow
            if (webServer instanceof UndertowWebServer) {
                GracefulShutdownHandler gracefulShutdownHandler = (GracefulShutdownHandler) undertowGracefulShutdown.getVolatile(webServer);
                //If graceful shutdown is enabled, gracefulShutdownHandler is not null
                if (gracefulShutdownHandler != null) {
                    var shutdownListeners = (List<GracefulShutdownHandler.ShutdownListener>) undertowShutdownListeners.getVolatile(gracefulShutdownHandler);
                    shutdownListeners.add(shutdownSuccessful -> {
                        if (shutdownSuccessful) {
                            //Add your graceful shutdown logic here
                        } else {
                            log.info("ThreadPoolFactoryGracefulShutDownHandler-onApplicationEvent shutdown failed");
                        }
                    });
                }
            }
        }
    }
}

How to gracefully shut down additional thread pools

Now that we know where to hook in, how do we shut down the custom thread pools in the project? First we need to get hold of all the thread pools to be checked. How to do that differs by environment and is fairly simple, so we won't go into it here. Assume we already have all the thread pools and that they use only the following two implementations (the two thread pools in the JDK, ignoring scheduled-task thread pools):

  • java.util.concurrent.ThreadPoolExecutor: the most commonly used thread pool
  • java.util.concurrent.ForkJoinPool: a thread pool in the form of ForkJoin

For these two kinds of thread pools, how do we tell that they have no tasks left to execute? Reference code:

public static boolean isCompleted(ExecutorService executorService) {
    if (executorService instanceof ThreadPoolExecutor) {
        ThreadPoolExecutor threadPoolExecutor = (ThreadPoolExecutor) executorService;
        //For ThreadPoolExecutor, check that there are no active threads
        return threadPoolExecutor.getActiveCount() == 0;
    } else if (executorService instanceof ForkJoinPool) {
        //ForkJoinPool is more involved: check that there are no active threads, no running threads, no queued tasks, and no queued submissions
        ForkJoinPool forkJoinPool = (ForkJoinPool) executorService;
        return forkJoinPool.getActiveThreadCount() == 0
                && forkJoinPool.getRunningThreadCount() == 0
                && forkJoinPool.getQueuedTaskCount() == 0
                && forkJoinPool.getQueuedSubmissionCount() == 0;
    }
    return true;
}

How do we tell that all thread pools are idle? In a real application tasks may flow between pools: thread pool A may submit tasks to thread pool B, B may submit to C, and C may submit back to A and B. So even if we traverse all the pools once and isCompleted returns true for each, that does not guarantee that all of them have really finished. For example, suppose we check A, B, and C in turn; just before C is checked, C submits tasks to A and B and finishes. C is found idle, but A and B, which were already checked, now have unfinished tasks. My solution is to shuffle the thread pools, traverse them, and check whether each one is completed. If all are completed, a counter is incremented by 1; if any is not, the counter is reset to zero and we sleep for 1 second before trying again. Keep looping until the counter reaches 3, that is, until all thread pools have been found idle in random order three times in a row:

List<ExecutorService> executorServices = getAllThreadPools(); //hypothetical helper: collect the thread pools in whatever way fits your environment
for (int i = 0; i < 3; ) {
    //All thread pools must be found completed, in random order, three times in a row before they are considered truly completed
    Collections.shuffle(executorServices);
    if (executorServices.stream().allMatch(ThreadPoolFactory::isCompleted)) {
        i++;
        log.info("all thread pools are completed, i: {}", i);
    } else {
        //Reset the counter, because the three successful checks must be consecutive
        i = 0;
        log.info("not all thread pools are completed, wait for 1s");
        try {
            TimeUnit.SECONDS.sleep(1);
        } catch (InterruptedException ignored) {
        }
    }
}
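
The loop above waits indefinitely if some pool never becomes idle. As a hedged variation, the sketch below adds a deadline and pauses between every pass, not only after a failed one. QuiescenceWaiter, its parameters, and the use of BooleanSupplier for the idle checks are assumptions made for this example; in the context of this article, each check could be something like () -> ThreadPoolFactory.isCompleted(pool).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BooleanSupplier;

class QuiescenceWaiter {
    /**
     * Polls until every check returns true for requiredPasses consecutive shuffled passes,
     * or until timeoutMillis elapses. Returns true on quiescence, false on timeout or interrupt.
     */
    static boolean await(List<BooleanSupplier> idleChecks, int requiredPasses,
                         long pollMillis, long timeoutMillis) {
        List<BooleanSupplier> checks = new ArrayList<>(idleChecks); // shuffle a copy, not the caller's list
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        int passes = 0;
        while (passes < requiredPasses) {
            Collections.shuffle(checks);
            if (checks.stream().allMatch(BooleanSupplier::getAsBoolean)) {
                passes++;
            } else {
                passes = 0; // the streak is broken, start counting again
            }
            if (passes < requiredPasses) {
                if (System.nanoTime() - deadline >= 0) {
                    return false; // out of time
                }
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return true;
    }
}
```

Returning a boolean lets the caller decide what to do on timeout, for example log the pools that are still busy and continue with the shutdown anyway.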

How it is handled in rocketmq-spring-boot-starter

The official Spring Boot starter of RocketMQ: https://github.com/apache/rocketmq-spring

It uses the first extension point described above: the consumer container is a SmartLifecycle (its phase is the maximum value, so it belongs to the group that is stopped first), and the shutdown logic is placed inside it:

DefaultRocketMQListenerContainer

@Override
public int getPhase() {
    // Returning Integer.MAX_VALUE only suggests that
    // we will be the first bean to shutdown and last bean to start
    return Integer.MAX_VALUE;
}
@Override
public void stop(Runnable callback) {
    stop();
    callback.run();
}
@Override
public void stop() {
    if (this.isRunning()) {
        if (Objects.nonNull(consumer)) {
            //Close consumer
            consumer.shutdown();
        }
        setRunning(false);
    }
}


Topics: Spring Boot