Monitoring service health status with Eureka Server and Prometheus

Posted by gmbot on Sat, 02 Nov 2019 22:32:33 +0100

Background

Service process monitoring is normally handled by dedicated components. In an earlier incident, the DB resources used by certain services exceeded their quota, health checks started to fail, and the services dropped out of Eureka one after another. If the service-level monitoring request is not routed to an affected node, or is routed to one that has not yet hit the failure threshold, no alarm is triggered: the service briefly looks normal while its instances keep going offline. Eureka Server, as the registry, sees the registration state earlier. It can detect both that instances have hung (fewer registered instances) and that a node's status is not UP.

Monitoring scheme

  • Eureka Server periodically collects registration information: the number of instance nodes and the status of each node
  • Prometheus periodically scrapes the data collected by Eureka Server
  • Grafana queries the data and raises alarms

Eureka registration information data collection

Metric data structure definition

  • Statistics node status

    type: Gauge

eureka_instance_status{client="{client}",status="{status}"}

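client: Eureka client application name

status: numeric status code of the instance, as enumerated below

Status enumeration

Status          Code
UP              1
STARTING        2
OUT_OF_SERVICE  3
UNKNOWN         4
DOWN            5

If the average value over the most recent time window is greater than 1, at least one sample in that window was not UP. This is treated as abnormal and an alarm is raised.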
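The status-to-code mapping and the alarm rule above can be sketched in plain Java. This is only an illustrative sketch, not the article's implementation: `StatusCodes`, its `Status` enum (a stand-in for Eureka's `InstanceInfo.InstanceStatus`) and `shouldAlarm` are hypothetical names, and treating an empty window as healthy is an assumption.

```java
import java.util.Arrays;
import java.util.List;

final class StatusCodes {

    // Hypothetical stand-in for Eureka's InstanceInfo.InstanceStatus.
    enum Status { UP, STARTING, OUT_OF_SERVICE, UNKNOWN, DOWN }

    // Numeric code for each status, following the enumeration in the text.
    static int codeOf(Status status) {
        switch (status) {
            case UP: return 1;
            case STARTING: return 2;
            case OUT_OF_SERVICE: return 3;
            case UNKNOWN: return 4;
            case DOWN: return 5;
            default: throw new IllegalArgumentException("unexpected status: " + status);
        }
    }

    // True when the average status code in the window is greater than 1,
    // i.e. at least one sample was not UP.
    // Assumption: an empty window is treated as healthy.
    static boolean shouldAlarm(List<Integer> recentCodes) {
        double average = recentCodes.stream()
                .mapToInt(Integer::intValue)
                .average()
                .orElse(1.0);
        return average > 1.0;
    }

    public static void main(String[] args) {
        List<Integer> healthy = Arrays.asList(codeOf(Status.UP), codeOf(Status.UP));
        List<Integer> degraded = Arrays.asList(codeOf(Status.UP), codeOf(Status.DOWN));
        System.out.println(shouldAlarm(healthy));  // false
        System.out.println(shouldAlarm(degraded)); // true
    }
}
```

Because UP is the lowest code, any other status in the window pushes the average above 1, which is why a single threshold is enough for this rule.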

  • Count the number of nodes

    type: Gauge

eureka_instance_count{client="{client}",count="{count}"}

client: Eureka client application name

count: number of instances currently registered for the client
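To make the gauge definitions more concrete, the sketch below renders one sample line in the Prometheus text exposition format, which is what Prometheus reads when it scrapes the endpoint. It assumes the form used by the Java code later in this post, where the metric name carries a `mall_` prefix, `client` is the only label and the count is the sample value. `ExpositionLine` and its methods are hypothetical helpers written for illustration; in the real service the Prometheus client library produces this output.

```java
final class ExpositionLine {

    // Escapes backslash, double quote and newline, as the text exposition
    // format requires for label values.
    static String escapeLabelValue(String value) {
        return value.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n");
    }

    // Renders one gauge sample with a single "client" label.
    static String gaugeLine(String metricName, String client, double value) {
        return metricName + "{client=\"" + escapeLabelValue(client) + "\"} " + value;
    }

    public static void main(String[] args) {
        // prints: mall_eureka_instance_count{client="MGMALL-CONFIG"} 2.0
        System.out.println(gaugeLine("mall_eureka_instance_count", "MGMALL-CONFIG", 2));
    }
}
```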

Java pom dependencies

<!-- boot2.x compatible-->
<!-- The client -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Hotspot JVM metrics-->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_hotspot</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Exposition HTTPServer-->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_httpserver</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Pushgateway exposition-->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_pushgateway</artifactId>
    <version>0.6.0</version>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.1.4</version>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
    <version>1.1.4</version>
</dependency>

Java code

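// InstanceStateCollector.java
import com.netflix.appinfo.InstanceInfo;
import com.netflix.discovery.shared.Applications;
import com.netflix.eureka.registry.PeerAwareInstanceRegistry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class InstanceStateCollector {

    private static final Logger log = LoggerFactory.getLogger(InstanceStateCollector.class);

    @Autowired
    PeerAwareInstanceRegistry registry;

    @Autowired
    PrometheusMetricsService metricsService;

    // Runs every 5 seconds; scheduling must be enabled with @EnableScheduling.
    @Scheduled(cron = "*/5 * * * * ?")
    public void collect() {
        Applications applications = registry.getApplications();

        applications.getRegisteredApplications().forEach(registeredApplication -> {
            int count = registeredApplication.size();
            String client = registeredApplication.getName();

            log.debug("client :{}, count :{}", client, count);
            metricsService.metricInstanceCount(client, count);

            // Assumption: the status gauge is labelled by client only, so the
            // worst (highest) status code among its instances is reported.
            int worstStatus = statusCode(InstanceInfo.InstanceStatus.UP);
            for (InstanceInfo instance : registeredApplication.getInstances()) {
                log.debug("client :{}, instance :{}, status :{}",
                        client, instance.getInstanceId(), instance.getStatus());
                worstStatus = Math.max(worstStatus, statusCode(instance.getStatus()));
            }
            metricsService.metricInstanceStatus(client, worstStatus);
        });
    }

    // Maps the Eureka status to the numeric code from the status enumeration.
    private static int statusCode(InstanceInfo.InstanceStatus status) {
        switch (status) {
            case UP: return 1;
            case STARTING: return 2;
            case OUT_OF_SERVICE: return 3;
            case DOWN: return 5;
            default: return 4; // UNKNOWN
        }
    }
}

// PrometheusMetricsService.java
import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Gauge;
import org.springframework.stereotype.Service;

@Service
public class PrometheusMetricsService {

    /**
     * Instance status gauge; the sample value is the numeric status code.
     * mall_eureka_instance_status{client="{client}"}
     */
    private static final String EUREKA_INSTANCE_STATUS = "mall_eureka_instance_status";

    /**
     * Instance count gauge; the sample value is the number of instances.
     * mall_eureka_instance_count{client="{client}"}
     */
    private static final String EUREKA_INSTANCE_COUNT = "mall_eureka_instance_count";

    private static final String LABEL_CLIENT = "client";

    private final Gauge instanceStatusGauge;
    private final Gauge instanceCountGauge;

    public PrometheusMetricsService(CollectorRegistry registry) {
        instanceStatusGauge = Gauge
                .build(EUREKA_INSTANCE_STATUS, "instance status")
                .labelNames(LABEL_CLIENT)
                .register(registry);

        instanceCountGauge = Gauge
                .build(EUREKA_INSTANCE_COUNT, "instance count")
                .labelNames(LABEL_CLIENT)
                .register(registry);
    }

    /**
     * Records the instance status of a client.
     *
     * @param client      client name (application name)
     * @param statusValue numeric status code
     */
    void metricInstanceStatus(String client, int statusValue) {
        instanceStatusGauge.labels(client).set(statusValue);
    }

    /**
     * Records the number of registered instances of a client.
     *
     * @param client client name (application name)
     * @param count  number of instances
     */
    void metricInstanceCount(String client, int count) {
        instanceCountGauge.labels(client).set(count);
    }
}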

Prometheus collects Eureka server data

prometheus.yml

  - job_name: 'mgmall-eureka'
    scrape_interval: 10s 
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['10.124.129.42:19110']

Grafana dashboard and alert configuration

Dashboard panel query

mall_eureka_instance_count{client="MGMALL-CONFIG"}
.....

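Alert rule

WHEN avg() OF query(A, 10s, now) IS BELOW 1, that is, an alarm fires when the average instance count over the last 10 seconds drops below 1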
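The logic of this alert condition can be sketched as a windowed average over timestamped samples. This is only an approximation for illustration, not Grafana's actual evaluation code: `WindowedAverageAlert`, `Sample` and `isBelow` are hypothetical names, and returning `false` when the window holds no samples is an assumption (Grafana handles the no-data case with a separate setting).

```java
import java.util.Arrays;
import java.util.List;

final class WindowedAverageAlert {

    // One scraped value with the time it was recorded (hypothetical type).
    static final class Sample {
        final long timestampMillis;
        final double value;

        Sample(long timestampMillis, double value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    // True when the average of the samples inside (now - window, now]
    // is below the threshold. Assumption: no samples means no alarm.
    static boolean isBelow(List<Sample> samples, long nowMillis,
                           long windowMillis, double threshold) {
        double sum = 0;
        int count = 0;
        for (Sample sample : samples) {
            boolean inWindow = sample.timestampMillis > nowMillis - windowMillis
                    && sample.timestampMillis <= nowMillis;
            if (inWindow) {
                sum += sample.value;
                count++;
            }
        }
        return count > 0 && sum / count < threshold;
    }

    public static void main(String[] args) {
        // The instance count was 2 long ago, then 0 during the last 10 seconds.
        List<Sample> samples = Arrays.asList(
                new Sample(0, 2), new Sample(5_000, 0), new Sample(9_000, 0));
        System.out.println(isBelow(samples, 10_000, 10_000, 1.0)); // true
    }
}
```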
