Background
Service process monitoring is generally handled by dedicated components. In an earlier incident, the DB resources used by certain services exceeded their allocated quota, health checks failed as a result, and the services went offline from Eureka one after another. If a monitoring request is not routed to the affected node, or is routed to it but does not hit a threshold condition, no alarm is triggered, so the service looks healthy for a moment while its instances keep dropping out. The Eureka server, as the registry, can detect the problem earlier from the registration state: an instance node has gone down (fewer registered instances), or a node's status is not UP.
Monitoring scheme
- The Eureka server periodically collects registration information: the number of instance nodes and the status of each instance node
- Prometheus periodically scrapes the data collected by the Eureka server
- Grafana queries the data and raises alarms
Eureka registration information data collection
Metric data structure definition
- Instance node status
  type: Gauge
  eureka_instance_status{client="{client}",status="{status}"}
  client: Eureka client application name
  status: enumerated as follows
status | value |
---|---|
UP | 1 |
STARTING | 2 |
OUT_OF_SERVICE | 3 |
UNKNOWN | 4 |
DOWN | 5 |
If the average value over the most recent time window is greater than 1, at least one sampled status was not UP; the state is treated as abnormal and an alarm is raised.
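The status encoding and the alarm rule above can be sketched in plain Java. This is a minimal illustration rather than the code used in this setup: the class `StatusAlarmSketch` and its method names are hypothetical, and it assumes the rule is evaluated over the status samples of a single client within the window.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: numeric status encoding and the "average greater than 1" rule. */
final class StatusAlarmSketch {

    // Values copied from the status table above.
    private static final Map<String, Integer> STATUS_VALUES = Map.of(
            "UP", 1,
            "STARTING", 2,
            "OUT_OF_SERVICE", 3,
            "UNKNOWN", 4,
            "DOWN", 5);

    /** Returns the gauge value for a status name; unrecognised names map to UNKNOWN (assumption). */
    static int toValue(String status) {
        return STATUS_VALUES.getOrDefault(status, 4);
    }

    /**
     * Returns true when the mean of the sampled values exceeds 1.
     * Since every encoded value is at least 1, this happens exactly when
     * at least one sample in the window was not UP.
     */
    static boolean shouldAlarm(List<Integer> samples) {
        if (samples.isEmpty()) {
            return false; // no data: this sketch does not alarm (assumption)
        }
        double avg = samples.stream().mapToInt(Integer::intValue).average().orElse(0.0);
        return avg > 1.0;
    }
}
```

For example, a window with the samples UP, DOWN, UP has a mean above 1 and would alarm, while a window of only UP samples has a mean of exactly 1 and would not.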
- Instance node count
  type: Gauge
  eureka_instance_count{client="{client}",count="{count}"}
  client: Eureka client application name
  count: number of registered instances of the client
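To make the count metric concrete, the sketch below counts instances per client and renders them in the Prometheus text exposition format. It is an illustration under assumptions: the class `InstanceCountSketch` and its methods are hypothetical, client names are assumed to contain no characters that need escaping in a label value, and the count is written as the sample value (which is how the Grafana query on `mall_eureka_instance_count` in this document reads it) rather than as a `count` label.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: per-client instance counts rendered as Prometheus gauge samples. */
final class InstanceCountSketch {

    /** Counts instances per application name; the input has one entry per registered instance. */
    static Map<String, Integer> countByClient(List<String> instanceAppNames) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (String app : instanceAppNames) {
            counts.merge(app, 1, Integer::sum);
        }
        return counts;
    }

    /** Renders a TYPE line followed by one sample line per client. */
    static String render(String metricName, Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        sb.append("# TYPE ").append(metricName).append(" gauge\n");
        counts.forEach((client, count) -> sb
                .append(metricName)
                .append("{client=\"").append(client).append("\"} ")
                .append(count)
                .append('\n'));
        return sb.toString();
    }
}
```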
Java pom dependencies
<!-- Compatible with Spring Boot 2.x -->
<!-- The client -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Hotspot JVM metrics -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_hotspot</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Exposition HTTPServer -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_httpserver</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Pushgateway exposition -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_pushgateway</artifactId>
    <version>0.6.0</version>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.1.4</version>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
    <version>1.1.4</version>
</dependency>
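Java code
import com.netflix.appinfo.InstanceInfo;
import com.netflix.discovery.shared.Applications;
import com.netflix.eureka.registry.PeerAwareInstanceRegistry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

/** Reads the Eureka registry on a schedule (needs @EnableScheduling) and records the metrics. */
@Component
public class InstanceStateCollector {
    private static final Logger log = LoggerFactory.getLogger(InstanceStateCollector.class);
    @Autowired
    PeerAwareInstanceRegistry registry;
    @Autowired
    PrometheusMetricsService metricsService;

    /** Runs every 5 seconds. */
    @Scheduled(cron = "*/5 * * * * ?")
    public void collect() {
        Applications applications = registry.getApplications();
        applications.getRegisteredApplications().forEach(application -> {
            String client = application.getName();
            int count = application.size();
            log.debug("client :{}, count :{}", client, count);
            metricsService.metricInstanceCount(client, count);
            // The gauge only has a client label, so the last instance written wins per client.
            application.getInstances().forEach(instance -> {
                log.debug("client :{}, instance :{}, status :{}", client, instance.getInstanceId(), instance.getStatus());
                metricsService.metricInstanceStatus(client, toStatusValue(instance.getStatus()));
            });
        });
    }

    /** Maps an instance status to the numeric value from the status table. */
    private static int toStatusValue(InstanceInfo.InstanceStatus status) {
        switch (status) {
            case UP: return 1;
            case STARTING: return 2;
            case OUT_OF_SERVICE: return 3;
            case DOWN: return 5;
            default: return 4; // UNKNOWN
        }
    }
}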
import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Gauge;
import org.springframework.stereotype.Service;

@Service
public class PrometheusMetricsService {

    /**
     * Instance status gauge, exported as mall_eureka_instance_status{client="{client}"};
     * the sample value is the numeric status.
     */
    private static final String EUREKA_INSTANCE_STATUS = "mall_eureka_instance_status";

    /**
     * Instance count gauge, exported as mall_eureka_instance_count{client="{client}"};
     * the sample value is the number of instances.
     */
    private static final String EUREKA_INSTANCE_COUNT = "mall_eureka_instance_count";

    private static final String LABEL_CLIENT = "client";

    private final Gauge instanceStatusGauge;
    private final Gauge instanceCountGauge;

    public PrometheusMetricsService(CollectorRegistry registry) {
        instanceStatusGauge = Gauge
                .build(EUREKA_INSTANCE_STATUS, "instance status")
                .labelNames(LABEL_CLIENT)
                .register(registry);
        instanceCountGauge = Gauge
                .build(EUREKA_INSTANCE_COUNT, "instance count")
                .labelNames(LABEL_CLIENT)
                .register(registry);
    }

    /**
     * Records the status of an instance.
     *
     * @param client      client name (application name)
     * @param statusValue numeric status value
     */
    void metricInstanceStatus(String client, Integer statusValue) {
        instanceStatusGauge.labels(client).set(statusValue);
    }

    /**
     * Records the number of instances.
     *
     * @param client client name (application name)
     * @param count  instance count
     */
    void metricInstanceCount(String client, Integer count) {
        instanceCountGauge.labels(client).set(count);
    }
}
Prometheus collects Eureka server data
prometheus.yml
- job_name: 'mgmall-eureka'
  scrape_interval: 10s
  metrics_path: '/actuator/prometheus'
  static_configs:
    - targets: ['10.124.129.42:19110']
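Before wiring up Grafana, it can help to check what Prometheus will read from this endpoint. The sketch below parses a scraped response body and extracts the samples of one gauge per client. It is a simplified illustration, not a full parser of the exposition format: the class `ScrapeParseSketch` is hypothetical, and it assumes each sample line carries a single `client` label, a finite numeric value, and no timestamp. The optional trailing comma after the label is accepted because some versions of the Java client emit it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extract per-client samples of one gauge from scraped text. */
final class ScrapeParseSketch {

    // Matches lines such as: metric_name{client="APP",} 2.0
    private static final Pattern SAMPLE = Pattern.compile(
            "^(\\w+)\\{client=\"([^\"]*)\",?\\}\\s+(-?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?)$");

    /** Returns client name to sample value for the given metric, in order of appearance. */
    static Map<String, Double> parse(String body, String metricName) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String line : body.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and HELP/TYPE comments
            }
            Matcher m = SAMPLE.matcher(trimmed);
            if (m.matches() && m.group(1).equals(metricName)) {
                result.put(m.group(2), Double.parseDouble(m.group(3)));
            }
        }
        return result;
    }
}
```

The response body itself could be fetched from the target's `/actuator/prometheus` path with any HTTP client, for example `java.net.http.HttpClient` on Java 11 or later.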
Grafana report maintenance
Report query
mall_eureka_instance_count{client="MGMALL-CONFIG"} .....
![image-20190531140258528](/Users/yugj/Library/Application Support/typora-user-images/image-20190531140258528.png)
Alert rule
WHEN avg() OF query(A, 10s, now) IS BELOW 1
![image-20190531140319350](/Users/yugj/Library/Application Support/typora-user-images/image-20190531140319350.png)
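The alert condition above averages the samples of a query over the last 10 seconds and fires when that average is below 1. The following sketch shows the same evaluation in plain Java. It is an approximation for illustration only: the class `AlertRuleSketch` and its `Sample` type are hypothetical, and returning "not firing" when the window contains no samples is an assumption of this sketch, since Grafana handles the no-data case through a separate setting.

```java
import java.util.List;

/** Hypothetical sketch: "average over a time window is below a threshold" evaluation. */
final class AlertRuleSketch {

    /** One scraped sample: timestamp in epoch milliseconds and the gauge value. */
    static final class Sample {
        final long timestampMillis;
        final double value;

        Sample(long timestampMillis, double value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    /** Returns true when the mean of the samples in (now - window, now] is below the threshold. */
    static boolean isFiring(List<Sample> samples, long nowMillis, long windowMillis, double threshold) {
        double sum = 0.0;
        int n = 0;
        for (Sample s : samples) {
            boolean inWindow = s.timestampMillis > nowMillis - windowMillis
                    && s.timestampMillis <= nowMillis;
            if (inWindow) {
                sum += s.value;
                n++;
            }
        }
        if (n == 0) {
            return false; // no samples in the window: not firing in this sketch (assumption)
        }
        return sum / n < threshold;
    }
}
```

With a 10 second window and a threshold of 1, an instance count that stays at 1 keeps the rule quiet, while a count that drops to 0 for most of the window pulls the average below 1 and fires.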