Practice of lossless release of microservices on the elastic cloud

Posted by nublet on Wed, 26 Jan 2022 21:19:06 +0100

1. Background

The original system was a single monolithic service. Its logic grew more and more complex, and a change in one place could affect the whole system. To improve scalability, we split the monolith into separate microservices by function.

2. Elastic cloud configuration

All our microservices are deployed on the elastic cloud, and we want releases to be lossless, that is, no request should fail while a service is being deployed. To achieve this, the following steps are needed:

  1. Before the container is destroyed, the service process actively removes itself from the eureka registry list;
  2. After the instance has been removed from the eureka registry list, it must keep serving traffic for a certain period of time, because other eureka clients still hold the instance in their cache;
  3. Finally, the process waits for its remaining threads to finish their work, and only then is the container destroyed.
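
To see why step 2 matters, here is a minimal sketch of a client that caches the registry. It is a toy model, not the Eureka API: Registry, CachingClient and the instance name are made up for illustration, and a real client refreshes its cache on a timer rather than on demand.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model (not the Eureka API): a registry plus a client that caches its contents. */
class StaleCacheDemo {

    /** Stands in for the registry's list of registered instances. */
    static class Registry {
        private final Set<String> instances = new HashSet<>();

        void register(String instance) { instances.add(instance); }
        void deregister(String instance) { instances.remove(instance); }
        Set<String> snapshot() { return new HashSet<>(instances); }
    }

    /** Stands in for another microservice that only sees its local copy of the registry. */
    static class CachingClient {
        private final Registry registry;
        private Set<String> cache = new HashSet<>();

        CachingClient(Registry registry) { this.registry = registry; }

        /** Replaces the local copy with the registry's current contents. */
        void refresh() { cache = registry.snapshot(); }

        /** True if this client would still send requests to the instance. */
        boolean wouldRoute(String instance) { return cache.contains(instance); }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.register("order-service-1"); // hypothetical instance name
        CachingClient client = new CachingClient(registry);
        client.refresh();

        registry.deregister("order-service-1");
        System.out.println("routes before refresh: " + client.wouldRoute("order-service-1"));
        client.refresh();
        System.out.println("routes after refresh:  " + client.wouldRoute("order-service-1"));
    }
}
```

Until the client refreshes, it keeps routing to the removed instance, so the instance has to stay alive for at least that long.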

Let's see how to realize the above requirements.

2.1 Taking an instance offline from eureka

There are several ways to take a service instance offline from the eureka registry:

  1. Kill the service directly

    This method is simple and crude. However, although the process has stopped, the instance still exists in the registry list, so calls from other modules to it fail. This approach is therefore ruled out.

  2. Send a DELETE request to the Eureka server

    http://{eureka-server:port}/eureka/apps/{application.name}/{instance.name}

    This only cancels the registration. As soon as the next heartbeat from the still-running instance reaches the Eureka server, the instance ends up registered again, so this approach is also ruled out.

  3. The client notifies the Eureka server that it is going offline

    DiscoveryManager.getInstance().shutdownComponent();

    With this line of code, the eureka client actively tells the registry that it is going offline, and it does not register again afterwards. This meets our requirements, but one question remains: when should this line be called?
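
The difference between options 2 and 3 can be sketched with a toy model. It assumes, as described above, that a running client whose heartbeat finds no registration ends up registered again; the Registry and Client classes are hypothetical and are not Eureka classes.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the register/heartbeat cycle; none of these classes are Eureka APIs. */
class HeartbeatReRegisterDemo {

    static class Registry {
        private final Set<String> instances = new HashSet<>();

        void register(String id) { instances.add(id); }
        void delete(String id) { instances.remove(id); }
        boolean contains(String id) { return instances.contains(id); }

        /** Returns false when the instance is unknown to the registry. */
        boolean renew(String id) { return instances.contains(id); }
    }

    static class Client {
        private final Registry registry;
        private final String id;
        private boolean shutDown = false;

        Client(Registry registry, String id) {
            this.registry = registry;
            this.id = id;
            registry.register(id);
        }

        /** One heartbeat: if the registry no longer knows the instance, register again. */
        void heartbeat() {
            if (shutDown) {
                return; // no heartbeats after a client-side shutdown
            }
            if (!registry.renew(id)) {
                registry.register(id);
            }
        }

        /** Client-initiated offline: stop heartbeats, then cancel the registration. */
        void shutdown() {
            shutDown = true;
            registry.delete(id);
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        Client client = new Client(registry, "order-service-1"); // hypothetical id

        registry.delete("order-service-1"); // option 2: removal from outside
        client.heartbeat();
        System.out.println("after external delete: " + registry.contains("order-service-1"));

        client.shutdown();                  // option 3: the client goes offline itself
        client.heartbeat();
        System.out.println("after client shutdown: " + registry.contains("order-service-1"));
    }
}
```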

2.2 When to go offline

First we need to decide at which point the instance is removed from the eureka registry. We considered the following options:

1. A custom controller endpoint

@GetMapping("/shutdown")
public void shutdown() {
  DiscoveryManager.getInstance().shutdownComponent();
}

The idea is to call this endpoint before each deployment so that the instance goes offline first. But this has serious drawbacks: (1) the endpoint must not be exposed, and some authentication has to be added to prevent malicious calls by others; (2) it cannot be integrated into the deployment script. We learned from colleagues on the elastic cloud team that the stop method in control.sh is not executed before the container is destroyed; a SIGTERM signal is sent instead, so there is nowhere in the deployment script to put the call. With this approach the endpoint would have to be called manually before every release, which is too risky, so this option is not suitable.

2. A custom Shutdown Hook

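Runtime.getRuntime().addShutdownHook(new Thread(() -> {
  // Remove the instance from the eureka registry list
  DiscoveryManager.getInstance().shutdownComponent();
  // Sleep 120s so that the caches of other clients can expire
  try {
    Thread.sleep(120 * 1000);
  } catch (InterruptedException e) {
    // Restore the interrupt flag and let the hook finish
    Thread.currentThread().interrupt();
  }
}));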

After receiving the SIGTERM signal from the system, the JVM runs the registered Shutdown Hooks. Is registering such a hook enough?

Testing shows that it is not. Eureka is notified promptly that the instance is going offline, but at the same time Tomcat starts refusing new requests and the Druid connection pool is closed. Because other microservices still have this instance in their cache, they keep sending requests to it, and those requests fail.

3. Spring Shutdown Hook

What causes this? The Spring source code shows that SpringBoot automatically registers a Shutdown Hook of its own during startup. The source code is as follows:

// org.springframework.boot.SpringApplication#refreshContext
private void refreshContext(ConfigurableApplicationContext context) {
  this.refresh((ApplicationContext)context);
  if (this.registerShutdownHook) {
    try {
      // Register shutdown hook
      context.registerShutdownHook();
    } catch (AccessControlException var3) {
    }
  }
}

During SpringBoot startup, after the Context has been refreshed, a Shutdown Hook is registered unless registerShutdownHook has been turned off manually (it is on by default).

// org.springframework.context.support.AbstractApplicationContext#registerShutdownHook
@Override
public void registerShutdownHook() {
  if (this.shutdownHook == null) {
    // No shutdown hook registered yet.
    this.shutdownHook = new Thread(SHUTDOWN_HOOK_THREAD_NAME) {
      @Override
      public void run() {
        synchronized (startupShutdownMonitor) {
          // The logic that shutdown hook really needs to execute
          doClose();
        }
      }
    };
    // Register shutdown hook
    Runtime.getRuntime().addShutdownHook(this.shutdownHook);
  }
}

What the Spring Shutdown Hook actually executes is analyzed later. First, if multiple Shutdown Hooks are registered with the JVM, in what order do they run?

// java.lang.Runtime#addShutdownHook
public void addShutdownHook(Thread hook) {
  SecurityManager sm = System.getSecurityManager();
  if (sm != null) {
    sm.checkPermission(new RuntimePermission("shutdownHooks"));
  }
  ApplicationShutdownHooks.add(hook);
}
// java.lang.ApplicationShutdownHooks

/* The set of registered hooks */
private static IdentityHashMap<Thread, Thread> hooks;

static synchronized void add(Thread hook) {
  if(hooks == null)
    throw new IllegalStateException("Shutdown in progress");

  if (hook.isAlive())
    throw new IllegalArgumentException("Hook already running");

  if (hooks.containsKey(hook))
    throw new IllegalArgumentException("Hook previously registered");

  hooks.put(hook, hook);
}

As you can see, adding a Shutdown Hook calls ApplicationShutdownHooks.add(hook), which puts the hook into the static variable private static IdentityHashMap<Thread, Thread> hooks of the ApplicationShutdownHooks class. The hook itself is a Thread object.
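
A small sketch of the last check in add(): registering the same Thread object twice is rejected. The class and method names are ours, and the hook is removed again so that it never actually runs.

```java
/** Shows that the JVM rejects registering the same hook thread twice. */
class DuplicateHookDemo {

    /** Returns true if the second registration of the same hook is rejected. */
    static boolean secondRegistrationRejected() {
        Thread hook = new Thread(() -> System.out.println("shutting down"));
        Runtime runtime = Runtime.getRuntime();
        runtime.addShutdownHook(hook);
        try {
            runtime.addShutdownHook(hook); // same Thread object again
            return false;
        } catch (IllegalArgumentException expected) {
            // Thrown by the containsKey check shown above
            return true;
        } finally {
            runtime.removeShutdownHook(hook); // clean up so the demo hook never runs
        }
    }

    public static void main(String[] args) {
        System.out.println("second registration rejected: " + secondRegistrationRejected());
    }
}
```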

// java.lang.ApplicationShutdownHooks#runHooks

/* Iterates over all application hooks creating a new thread for each
 * to run in. Hooks are run concurrently and this method waits for
 * them to finish.
 */
static void runHooks() {
  Collection<Thread> threads;
  synchronized(ApplicationShutdownHooks.class) {
    threads = hooks.keySet();
    hooks = null;
  }

  for (Thread hook : threads) {
    hook.start();
  }
  for (Thread hook : threads) {
    while (true) {
      try {
        hook.join();
        break;
      } catch (InterruptedException ignored) {
      }
    }
  }
}

The source code above is the execution logic for application-level hooks. Each hook is run by calling the start method of the Thread class, so multiple hooks execute concurrently, and the JVM does not exit until all of them have finished.

Now we can identify the cause of the problem with option 2: although the custom Shutdown Hook sleeps for 120s, it does not run in sequence with the Spring Shutdown Hook. While the custom hook is sleeping, Spring is already doing its clean-up work, so requests that reach this instance in the meantime fail.
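
The effect can be reproduced with plain threads. The sketch below mirrors the start-all-then-join-all structure of runHooks(); the two hooks and the shortened sleep times are stand-ins, not the real Spring or eureka logic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Two stand-in hooks run the way runHooks() runs them: start all, then join all. */
class ConcurrentHooksDemo {

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs both hooks and returns the events in the order they were recorded. */
    static List<String> runHooks() {
        List<String> events = new CopyOnWriteArrayList<>();

        // Stand-in for the custom hook: deregister, then sleep (300 ms instead of 120 s).
        Thread customHook = new Thread(() -> {
            events.add("custom: deregistered");
            pause(300);
            events.add("custom: sleep finished");
        });
        // Stand-in for the Spring hook: it closes the context shortly after starting.
        Thread springHook = new Thread(() -> {
            pause(50);
            events.add("spring: context closed");
        });

        List<Thread> hooks = Arrays.asList(customHook, springHook);
        for (Thread hook : hooks) {
            hook.start(); // all hooks start without waiting for each other
        }
        for (Thread hook : hooks) {
            try {
                hook.join(); // the caller only waits for them to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return events;
    }

    public static void main(String[] args) {
        runHooks().forEach(System.out::println);
    }
}
```

The context is closed long before the custom hook's sleep ends, which is the failure seen in option 2.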

Since a custom Shutdown Hook does not work, can we hook into the Spring Shutdown Hook instead? Let's look at what the Spring Shutdown Hook actually does:

// org.springframework.context.support.AbstractApplicationContext#doClose
protected void doClose() {
  if (this.active.get() && this.closed.compareAndSet(false, true)) {
    
    LiveBeansView.unregisterApplicationContext(this);

    // 1. Publish shutdown event. 
    publishEvent(new ContextClosedEvent(this));

    // 2. Stop all Lifecycle beans, to avoid delays during individual destruction.
    if (this.lifecycleProcessor != null) {
      this.lifecycleProcessor.onClose();
    }

    // 3. Destroy all cached singletons in the context's BeanFactory.
    destroyBeans();

    // 4. Close the state of this context itself.
    closeBeanFactory();

    // 5. Let subclasses do some final clean-up if they wish...
    onClose();

    // 6. Reset local application listeners to pre-refresh state.
    if (this.earlyApplicationListeners != null) {
      this.applicationListeners.clear();
      this.applicationListeners.addAll(this.earlyApplicationListeners);
    }

    this.active.set(false);
  }
}

The source code above keeps only the key parts. You can see that the Spring Shutdown Hook does the following:

  1. Publishes ContextClosedEvent, so that listeners of this event can run custom logic before the application is closed;
  2. Calls the onClose method of the LifecycleProcessor;
  3. Destroys all cached singletons in the Context's BeanFactory;
  4. Closes the state of the current context itself (closeBeanFactory);
  5. Lets subclasses override the onClose method to do their own clean-up;
  6. Resets the local application listeners to their pre-refresh state.

Since the first step of the Spring Shutdown Hook is to publish ContextClosedEvent, we can create a listener for this event and remove the instance from the eureka registry list in its callback. The implementation is as follows:

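import com.netflix.discovery.DiscoveryManager;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.core.PriorityOrdered;
import org.springframework.stereotype.Component;

// LogUtil is a logging helper that is not shown in this article.
@Component
public class EurekaShutdownConfig implements ApplicationListener<ContextClosedEvent>, PriorityOrdered {
    private static final Logger log = LoggerFactory.getLogger(EurekaShutdownConfig.class);

    @Override
    public void onApplicationEvent(ContextClosedEvent event) {
        try {
            log.info(LogUtil.logMsg("_shutdown", "msg", "eureka instance offline begin!"));
            // Remove the instance from the eureka registry list
            DiscoveryManager.getInstance().shutdownComponent();
            log.info(LogUtil.logMsg("_shutdown", "msg", "eureka instance offline end!"));
            log.info(LogUtil.logMsg("_shutdown", "msg", "start sleep 120S for cache!"));
            // Keep serving requests until the caches of other clients have expired
            Thread.sleep(120 * 1000);
            log.info(LogUtil.logMsg("_shutdown", "msg", "stop sleep 120S for cache!"));
        } catch (InterruptedException e) {
            // Restore the interrupt flag and continue with the shutdown
            Thread.currentThread().interrupt();
        } catch (Exception e) {
            // A failure here must not abort the rest of the shutdown
            log.warn(LogUtil.logMsg("_shutdown", "msg", "eureka instance offline failed!"), e);
        }
    }

    @Override
    public int getOrder() {
        return 0;
    }
}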

At this point, the timing for actively removing the instance from the eureka registry has been settled.
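
Why does a sleep inside the listener hold back the rest of the shutdown, when a sleep inside a separate hook did not? Assuming events are delivered synchronously on the publishing thread (Spring's default when no executor is configured on the event multicaster), doClose() cannot move on until the listener returns. A toy model, with MiniContext standing in for the real context and a shortened sleep:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for the close sequence; MiniContext is not a Spring class. */
class CloseOrderDemo {

    static class MiniContext {
        private final List<Runnable> closeListeners = new ArrayList<>();
        private final List<String> log;

        MiniContext(List<String> log) { this.log = log; }

        void addCloseListener(Runnable listener) { closeListeners.add(listener); }

        void doClose() {
            // Step 1: listeners run on the calling thread, one after another.
            for (Runnable listener : closeListeners) {
                listener.run();
            }
            // Later steps are reached only after every listener has returned.
            log.add("lifecycle beans stopped");
            log.add("singletons destroyed");
        }
    }

    /** Closes a context with one sleeping listener and returns the recorded order. */
    static List<String> closeWithListener() {
        List<String> log = new ArrayList<>();
        MiniContext context = new MiniContext(log);
        context.addCloseListener(() -> {
            log.add("deregistered from registry");
            try {
                Thread.sleep(100); // stands in for the 120s wait
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            log.add("drain wait finished");
        });
        context.doClose();
        return log;
    }

    public static void main(String[] args) {
        closeWithListener().forEach(System.out::println);
    }
}
```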

2.3 Other configuration

application.yml

server:
  # Graceful shutdown strategy
  shutdown: graceful
  # Other configurations
  ...

Tomcat performs its graceful shutdown inside lifecycleProcessor.onClose(). The details are not covered here; you can browse the source code yourself.

Custom thread pool

@Configuration
public class MyThreadTaskExecutor {

    @Bean
    public Executor taskExecutor() {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();

        // Thread pool parameters
        taskExecutor.setCorePoolSize(8);
        taskExecutor.setMaxPoolSize(32);
        taskExecutor.setQueueCapacity(9999);
        taskExecutor.setKeepAliveSeconds(60);

        taskExecutor.setThreadNamePrefix("async-");

        taskExecutor.setTaskDecorator(new TraceIdTaskDecorator());

        // Wait for asynchronous thread execution to complete before service deactivation
        taskExecutor.setWaitForTasksToCompleteOnShutdown(true);
        // Forced shutdown after 60S
        taskExecutor.setAwaitTerminationSeconds(60);

        taskExecutor.initialize();

        return taskExecutor;
    }
}

The custom thread pool and the database connection pool are both shut down when their beans are destroyed.
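
In plain java.util.concurrent terms, the two shutdown settings in the configuration above correspond roughly to the sequence below. This is a sketch of the idea, not Spring's actual implementation; the helper name and the timeout values are ours.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Wait-then-force shutdown of a thread pool using only the JDK. */
class GracefulPoolShutdownDemo {

    /**
     * Stops accepting new tasks, waits up to the timeout for submitted tasks,
     * and forces a shutdown if they have not finished by then.
     * Returns true if the pool terminated within the timeout.
     */
    static boolean shutdownGracefully(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // already submitted tasks keep running
        try {
            if (pool.awaitTermination(timeout, unit)) {
                return true;
            }
            pool.shutdownNow(); // timeout reached: interrupt what is still running
            return false;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Submits one slow task, shuts down, and reports whether the task completed. */
    static boolean demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicBoolean finished = new AtomicBoolean(false);
        pool.execute(() -> {
            try {
                Thread.sleep(200); // an in-flight asynchronous task
                finished.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        boolean terminatedInTime = shutdownGracefully(pool, 5, TimeUnit.SECONDS);
        return terminatedInTime && finished.get();
    }

    public static void main(String[] args) {
        System.out.println("task completed before shutdown: " + demo());
    }
}
```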

3. Summary

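To summarize, the processing logic after the service receives the SIGTERM signal is as follows:

  1. The JVM starts all registered Shutdown Hooks, including the one registered by Spring;
  2. The Spring Shutdown Hook calls doClose() and publishes ContextClosedEvent;
  3. Our listener removes the instance from the eureka registry list and then sleeps 120s, during which the instance keeps serving requests from clients that still have it in their cache;
  4. lifecycleProcessor.onClose() runs, and Tomcat shuts down gracefully;
  5. The beans are destroyed, which shuts down the custom thread pool (waiting up to 60s for its tasks) and the database connection pool;
  6. The remaining steps of doClose() complete, and the JVM exits once all hooks have finished.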

If you find any mistakes, corrections are welcome.

Topics: Java Spring Spring Cloud