Non-intrusive JVM monitoring in a k8s cluster with SkyWalking
SkyWalking is an open source APM project started by Chinese developers, and it has been accepted into the Apache Incubator. See the Git repository for details.
git address:
Many large companies build on the project through secondary development to serve their own clouds. The ones I know of are Aliyun, Tencent, Qiniuyun and Huawei [ps: the developers are senior engineers at Huawei]. This is recognition for the project, but those offerings are expensive, and I will say no more about that.
The open source version on GitHub is suited to local deployment; running it on the cloud still takes some work. There is a Helm chart on Git for cloud deployment, but it is reportedly unmaintained and has many pitfalls.
I don't know how other people deploy it. I split the SkyWalking deployment into two parts: server + agent.
server: busybox + elasticsearch + oap + ui
agent: I put the agent into the service I want to monitor by means of an injection (init) container. [ps: others build the probe package and their own service's war package directly into one image]
server deployment
Everything is deployed in the skywalking namespace; otherwise errors occur. To deploy in another namespace, most of the configuration files have to be modified.
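Since everything below depends on the skywalking namespace, it has to exist before the manifests are applied. A minimal sketch of the namespace object follows; only the name is taken from the text above.

```yaml
# Namespace that the server-side manifests in this post expect
apiVersion: v1
kind: Namespace
metadata:
  name: skywalking
```

Most manifests in this post omit metadata.namespace, so they would presumably be applied with the namespace given on the command line, for example `kubectl apply -n skywalking -f <file>`.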
I set a long sleep for busybox here to reduce how often it restarts.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
spec:
  selector:
    matchLabels:
      app: busybox
  replicas: 1
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox:1.30
        resources:
          requests:
            cpu: 100m
            memory: 1Gi
          limits:
            cpu: 100m
            memory: 1Gi
        command:
        - sleep
        - "36000"
        imagePullPolicy: Always
        ports:
        - containerPort: 80
          protocol: TCP
Storage uses es (Elasticsearch), as officially recommended. On Git I also saw someone using MySQL for storage. Other backends should work as well; read the source code if you need one.
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  labels:
    service: elasticsearch
spec:
  clusterIP: None
  ports:
  - port: 9200
    name: serving
  - port: 9300
    name: node-to-node
  selector:
    service: elasticsearch
Set the resource limits for es according to your own needs. I do not yet know how much memory one agent requires.
PV and PVC are provided by Aliyun.
As I recall, the official setup runs multiple replicas. I don't need data retention here: this is for internal use and losing data is acceptable, so I run only one.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  labels:
    service: elasticsearch
spec:
  serviceName: elasticsearch
  # NOTE: This is number of nodes that we want to run
  # you may update this
  replicas: 1
  selector:
    matchLabels:
      service: elasticsearch
  template:
    metadata:
      labels:
        service: elasticsearch
    spec:
      terminationGracePeriodSeconds: 300
      initContainers:
      # NOTE:
      # This is to fix the permission on the volume
      # By default elasticsearch container is not run as
      # non root user.
      #
      - name: fix-the-volume-permission
        image: busybox
        imagePullPolicy: IfNotPresent
        command:
        - sh
        - -c
        - chown -R 1000:1000 /usr/share/elasticsearch/data
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      # NOTE:
      # To increase the default vm.max_map_count to 262144
      #
      - name: increase-the-vm-max-map-count
        image: busybox
        imagePullPolicy: IfNotPresent
        command:
        - sysctl
        - -w
        - vm.max_map_count=262144
        securityContext:
          privileged: true
      # To increase the ulimit
      #
      - name: increase-the-ulimit
        image: busybox
        imagePullPolicy: IfNotPresent
        command:
        - sh
        - -c
        - ulimit -n 65536
        securityContext:
          privileged: true
      containers:
      - name: elasticsearch
        image:
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9200
          name: http
        - containerPort: 9300
          name: tcp
        # NOTE: you can increase this resources
        resources:
          requests:
            memory: 4Gi
          limits:
            memory: 4Gi
        env:
        # NOTE: the cluster name; update this
        - name:
          value: elasticsearch-cluster
        - name:
          valueFrom:
            fieldRef:
              fieldPath:
        # NOTE: This will tell the elasticsearch node where to connect to other nodes to form a cluster
        - name:
          value: elasticsearch:9300
        # NOTE: You can increase the heap size
        - name: ES_JAVA_OPTS
          value: -Xms3g -Xmx3g
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: alicloud-disk-efficiency-shanghai-bdf
      # NOTE: You can increase the storage size
      resources:
        requests:
          storage: 50Gi
oap is the core of skywalking.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: skywalking-oap
  namespace: skywalking
---
apiVersion:
kind: RoleBinding
metadata:
  name: skywalking-oap
  namespace: skywalking
roleRef:
  apiGroup:
  kind: Role
  name: skywalking-oap
subjects:
- kind: ServiceAccount
  name: skywalking-oap
  namespace: skywalking
---
kind: Role
apiVersion:
metadata:
  name: skywalking-oap
  namespace: skywalking
rules:
- apiGroups: [""]
  resources: ["pods","endpoints","services","nodes"]
  verbs: ["get", "watch", "list","create","update"]
apiVersion: v1 kind: ConfigMap metadata: name: oap-config data: application.yml: |- cluster: # standalone: # Please check your ZooKeeper is 3.5+, However, it is also compatible with ZooKeeper 3.4.x. Replace the ZooKeeper 3.5+ # library the oap-libs folder with your ZooKeeper 3.4.x library. # zookeeper: # nameSpace: ${SW_NAMESPACE:""} # hostPort: ${SW_CLUSTER_ZK_HOST_PORT:localhost:2181} # #Retry Policy # baseSleepTimeMs: ${SW_CLUSTER_ZK_SLEEP_TIME:1000} # initial amount of time to wait between retries # maxRetries: ${SW_CLUSTER_ZK_MAX_RETRIES:3} # max number of times to retry kubernetes: watchTimeoutSeconds: ${SW_CLUSTER_K8S_WATCH_TIMEOUT:60} namespace: ${SW_CLUSTER_K8S_NAMESPACE:skywalking} labelSelector: ${SW_CLUSTER_K8S_LABEL:app=oap,release=skywalking} uidEnvName: ${SW_CLUSTER_K8S_UID:SKYWALKING_COLLECTOR_UID} # consul: # serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"} # Consul cluster nodes, example:,, # hostPort: ${SW_CLUSTER_CONSUL_HOST_PORT:localhost:8500} core: default: # Mixed: Receive agent data, Level 1 aggregate, Level 2 aggregate # Receiver: Receive agent data, Level 1 aggregate # Aggregator: Level 2 aggregate role: ${SW_CORE_ROLE:Mixed} # Mixed/Receiver/Aggregator restHost: ${SW_CORE_REST_HOST:} restPort: ${SW_CORE_REST_PORT:12800} restContextPath: ${SW_CORE_REST_CONTEXT_PATH:/} gRPCHost: ${SW_CORE_GRPC_HOST:} gRPCPort: ${SW_CORE_GRPC_PORT:11800} downsampling: - Hour - Day - Month # Set a timeout on metric data. After the timeout has expired, the metric data will automatically be deleted. 
recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:90} # Unit is minute minuteMetricsDataTTL: ${SW_CORE_MINUTE_METRIC_DATA_TTL:90} # Unit is minute hourMetricsDataTTL: ${SW_CORE_HOUR_METRIC_DATA_TTL:36} # Unit is hour dayMetricsDataTTL: ${SW_CORE_DAY_METRIC_DATA_TTL:45} # Unit is day monthMetricsDataTTL: ${SW_CORE_MONTH_METRIC_DATA_TTL:18} # Unit is month storage: elasticsearch: nameSpace: ${SW_NAMESPACE:""} clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:elasticsearch:9200} user: ${SW_ES_USER:""} password: ${SW_ES_PASSWORD:""} indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:2} indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:0} # Batch process setting, refer to bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:2000} # Execute the bulk every 2000 requests bulkSize: ${SW_STORAGE_ES_BULK_SIZE:20} # flush the bulk every 20mb flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:10} # flush the bulk every 10 seconds whatever the number of requests concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000} segmentQueryMaxSize: ${SW_STORAGE_ES_QUERY_SEGMENT_SIZE:200} # h2: # driver: ${SW_STORAGE_H2_DRIVER:org.h2.jdbcx.JdbcDataSource} # url: ${SW_STORAGE_H2_URL:jdbc:h2:mem:skywalking-oap-db} # user: ${SW_STORAGE_H2_USER:sa} # metadataQueryMaxSize: ${SW_STORAGE_H2_QUERY_MAX_SIZE:5000} # mysql: # metadataQueryMaxSize: ${SW_STORAGE_H2_QUERY_MAX_SIZE:5000} receiver-sharing-server: default: receiver-register: default: receiver-trace: default: bufferPath: ${SW_RECEIVER_BUFFER_PATH:../trace-buffer/} # Path to trace buffer files, suggest to use absolute path bufferOffsetMaxFileSize: ${SW_RECEIVER_BUFFER_OFFSET_MAX_FILE_SIZE:100} # Unit is MB bufferDataMaxFileSize: ${SW_RECEIVER_BUFFER_DATA_MAX_FILE_SIZE:500} # Unit is MB bufferFileCleanWhenRestart: ${SW_RECEIVER_BUFFER_FILE_CLEAN_WHEN_RESTART:false} sampleRate: ${SW_TRACE_SAMPLE_RATE:10000} # The sample rate precision is 1/10000. 
10000 means 100% sample in default. slowDBAccessThreshold: ${SW_SLOW_DB_THRESHOLD:default:200,mongodb:100} # The slow database access thresholds. Unit ms. receiver-jvm: default: receiver-clr: default: service-mesh: default: bufferPath: ${SW_SERVICE_MESH_BUFFER_PATH:../mesh-buffer/} # Path to trace buffer files, suggest to use absolute path bufferOffsetMaxFileSize: ${SW_SERVICE_MESH_OFFSET_MAX_FILE_SIZE:100} # Unit is MB bufferDataMaxFileSize: ${SW_SERVICE_MESH_BUFFER_DATA_MAX_FILE_SIZE:500} # Unit is MB bufferFileCleanWhenRestart: ${SW_SERVICE_MESH_BUFFER_FILE_CLEAN_WHEN_RESTART:false} istio-telemetry: default: envoy-metric: default: #receiver_zipkin: # default: # host: ${SW_RECEIVER_ZIPKIN_HOST:} # port: ${SW_RECEIVER_ZIPKIN_PORT:9411} # contextPath: ${SW_RECEIVER_ZIPKIN_CONTEXT_PATH:/} query: graphql: path: ${SW_QUERY_GRAPHQL_PATH:/graphql} alarm: default: telemetry: none: log4j2.xml: |- <Configuration status="INFO"> <Appenders> <Console name="Console" target="SYSTEM_OUT"> <PatternLayout charset="UTF-8" pattern="%d - %c -%-4r [%t] %-5p %x - %m%n"/> </Console> </Appenders> <Loggers> <logger name="org.eclipse.jetty" level="INFO"/> <logger name="org.apache.zookeeper" level="INFO"/> <logger name="" level="INFO"/> <logger name="io.grpc.netty" level="INFO"/> <logger name="org.apache.skywalking.oap.server.receiver.istio.telemetry" level="DEBUG"/> <Root level="INFO"> <AppenderRef ref="Console"/> </Root> </Loggers> </Configuration> alarm-settings.yml: |- rules: service_resp_time_rule: indicator-name: service_resp_time include-names: - dubbox-provider - dubbox-consumer threshold: 1000 op: ">" period: 10 count: 1 webhooks: component-libraries.yml: |- Tomcat: id: 1 languages: Java HttpClient: id: 2 languages: Java,C#,Node.js Dubbo: id: 3 languages: Java H2: id: 4 languages: Java Mysql: id: 5 languages: Java,C#,Node.js ORACLE: id: 6 languages: Java Redis: id: 7 languages: Java,C#,Node.js Motan: id: 8 languages: Java MongoDB: id: 9 languages: Java,C#,Node.js Resin: id: 10 
languages: Java Feign: id: 11 languages: Java OKHttp: id: 12 languages: Java SpringRestTemplate: id: 13 languages: Java SpringMVC: id: 14 languages: Java Struts2: id: 15 languages: Java NutzMVC: id: 16 languages: Java NutzHttp: id: 17 languages: Java JettyClient: id: 18 languages: Java JettyServer: id: 19 languages: Java Memcached: id: 20 languages: Java ShardingJDBC: id: 21 languages: Java PostgreSQL: id: 22 languages: Java,C#,Node.js GRPC: id: 23 languages: Java ElasticJob: id: 24 languages: Java RocketMQ: id: 25 languages: Java httpasyncclient: id: 26 languages: Java Kafka: id: 27 languages: Java ServiceComb: id: 28 languages: Java Hystrix: id: 29 languages: Java Jedis: id: 30 languages: Java SQLite: id: 31 languages: Java,C# h2-jdbc-driver: id: 32 languages: Java mysql-connector-java: id: 33 languages: Java ojdbc: id: 34 languages: Java Spymemcached: id: 35 languages: Java Xmemcached: id: 36 languages: Java postgresql-jdbc-driver: id: 37 languages: Java rocketMQ-producer: id: 38 languages: Java rocketMQ-consumer: id: 39 languages: Java kafka-producer: id: 40 languages: Java kafka-consumer: id: 41 languages: Java mongodb-driver: id: 42 languages: Java SOFARPC: id: 43 languages: Java ActiveMQ: id: 44 languages: Java activemq-producer: id: 45 languages: Java activemq-consumer: id: 46 languages: Java Elasticsearch: id: 47 languages: Java transport-client: id: 48 languages: Java http: id: 49 languages: Java,C#,Node.js rpc: id: 50 languages: Java,C#,Node.js RabbitMQ: id: 51 languages: Java rabbitmq-producer: id: 52 languages: Java rabbitmq-consumer: id: 53 languages: Java Canal: id: 54 languages: Java Gson: id: 55 languages: Java Redisson: id: 56 languages: Java AspNetCore: id: 3001 languages: C# EntityFrameworkCore: id: 3002 languages: C# SqlClient: id: 3003 languages: C# CAP: id: 3004 languages: C# StackExchange.Redis: id: 3005 languages: C# SqlServer: id: 3006 languages: C# Npgsql: id: 3007 languages: C# MySqlConnector: id: 3008 languages: C# 
EntityFrameworkCore.InMemory: id: 3009 languages: C# EntityFrameworkCore.SqlServer: id: 3010 languages: C# EntityFrameworkCore.Sqlite: id: 3011 languages: C# Pomelo.EntityFrameworkCore.MySql: id: 3012 languages: C# Npgsql.EntityFrameworkCore.PostgreSQL: id: 3013 languages: C# InMemoryDatabase: id: 3014 languages: C# AspNet: id: 3015 languages: C# # NoeJS components # [4000, 5000) for Node.js agent HttpServer: id: 4001 languages: Node.js express: id: 4002 languages: Node.js Egg: id: 4003 languages: Node.js Koa: id: 4004 languages: Node.js # Component Server mapping defines the server display names of some components # e.g. # Jedis is a client library in Java for Redis server Component-Server-Mappings: mongodb-driver: MongoDB rocketMQ-producer: RocketMQ rocketMQ-consumer: RocketMQ kafka-producer: Kafka kafka-consumer: Kafka activemq-producer: ActiveMQ activemq-consumer: ActiveMQ rabbitmq-producer: RabbitMQ rabbitmq-consumer: RabbitMQ postgresql-jdbc-driver: PostgreSQL Xmemcached: Memcached Spymemcached: Memcached h2-jdbc-driver: H2 mysql-connector-java: Mysql Jedis: Redis StackExchange.Redis: Redis Redisson: Redis SqlClient: SqlServer Npgsql: PostgreSQL MySqlConnector: Mysql EntityFrameworkCore.InMemory: InMemoryDatabase EntityFrameworkCore.SqlServer: SqlServer EntityFrameworkCore.Sqlite: SQLite Pomelo.EntityFrameworkCore.MySql: Mysql Npgsql.EntityFrameworkCore.PostgreSQL: PostgreSQL transport-client: Elasticsearch
apiVersion: v1
kind: Service
metadata:
  name: oap
  labels:
    service: oap
spec:
  ports:
  - port: 12800
    name: rest
  - port: 11800
    name: grpc
  - port: 1234
    name: page
  selector:
    app: oap
The official recommendation here is to run multiple replicas.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oap
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oap
  template:
    metadata:
      labels:
        app: oap
        release: skywalking
    spec:
      serviceAccountName: skywalking-oap
      initContainers:
      - name: sidecar-init
        image: evanxuhe/skywalking-agent-sidecar:6.1.0   # container image containing the static resource files
        command: ["cp", "-r", "/data/agent", "/sidecar"]
        volumeMounts:
        - name: sidecar
          mountPath: /sidecar
      containers:
      - name: oap
        image: evanxuhe/skywalking-oap-server:6.1.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 11800
          name: grpc
        - containerPort: 12800
          name: rest
        resources:
          requests:
            memory: 1Gi
          limits:
            memory: 2Gi
        env:
        - name: JAVA_OPTS
          value: -Xms256M -Xmx512M
        # Load the configuration from the volume; otherwise a "read only file system"
        # error is reported, because the mounted config collides with the path inside the image
        - name: SW_L0AD_CONFIG_FILE_FROM_VOLUME
          value: "true"
        # - name: SW_STORAGE
        #   value: elasticsearch
        # - name: SW_STORAGE_ES_CLUSTER_NODES
        #   value: ""
        # - name: SW_CLUSTER
        #   value: kubernetes
        # - name: SW_CLUSTER_K8S_NAMESPACE
        #   value: skywalking
        # - name: SW_SERVICE_MESH_OFFSET_MAX_FILE_SIZE
        #   value: "200"
        # - name: SW_RECEIVER_BUFFER_DATA_MAX_FILE_SIZE
        #   value: "800"
        # - name: SW_RECEIVER_BUFFER_FILE_CLEAN_WHEN_RESTART
        #   value: "true"
        # Note: application.yml in the oap-config ConfigMap names SKYWALKING_COLLECTOR_UID as the default uidEnvName
        - name: SKYWALKING_oap_UID
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid
        volumeMounts:
        - name: sidecar
          mountPath: /sidecar
        - name: config
          mountPath: /skywalking/config
      volumes:
      - name: sidecar
        emptyDir: {}
      - name: config
        configMap:
          name: oap-config
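Following the multi-replica recommendation would, as a minimal sketch, only mean raising replicas on the oap Deployment. The value 2 is an arbitrary assumption. The kubernetes cluster mode configured in application.yml selects pods by the labels app=oap,release=skywalking, which the pod template of this Deployment already carries.

```yaml
# Fragment of the oap Deployment; only the changed field is shown
spec:
  replicas: 2   # assumed value, size it to your own load
```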
web UI
apiVersion: v1
kind: Service
metadata:
  name: ui
  labels:
    service: ui
spec:
  ports:
  - port: 8080
    name: page
  type: ClusterIP
  selector:
    app: ui
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ui-deployment
  labels:
    app: ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ui
  template:
    metadata:
      labels:
        app: ui
    spec:
      containers:
      - name: ui
        image: apache/skywalking-ui:6.1.0
        ports:
        - containerPort: 8080
          name: page
        resources:
          requests:
            memory: 1Gi
          limits:
            memory: 2Gi
        env:
        - name: SW_OAP_ADDRESS
          value: oap:12800
        volumeMounts:
        - name: skywalking-web
          mountPath: /skywalking/webapp/webapp.yml
          subPath: webapp.yml
      volumes:
      - name: skywalking-web
        configMap:
          name: skywalking-web-key
Default user: admin; default password: admin
Mount a ConfigMap to change the user name and password.
apiVersion: v1
kind: ConfigMap
metadata:
  name: skywalking-web-key
  labels:
    app: skywalking-web-key
data:
  webapp.yml: |
    server:
      port: 8080
    collector:
      path: /graphql
      ribbon:
        ReadTimeout: 10000
        # Point to all backend's restHost:restPort, split by
        listOfServers:
    security:
      user:
        # username
        admin:
          # password
          password: xxx
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: skywalking
spec:
  tls:
  - hosts:
    -
    secretName: xxxxx-tls
  rules:
  - host:
    http:
      paths:
      - backend:
          serviceName: ui
          servicePort: 8080
        path: /
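The extensions/v1beta1 Ingress API used above was removed in Kubernetes 1.22. On newer clusters the same rule would need the networking.k8s.io/v1 form, roughly as sketched below. The host skywalking.example.com is a hypothetical placeholder, since the original host is not given; the secret name is kept as the author's placeholder.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: skywalking
spec:
  tls:
  - hosts:
    - skywalking.example.com      # hypothetical host
    secretName: xxxxx-tls         # placeholder from the original manifest
  rules:
  - host: skywalking.example.com  # hypothetical host
    http:
      paths:
      - path: /
        pathType: Prefix          # required in networking.k8s.io/v1
        backend:
          service:
            name: ui              # the ui Service defined earlier
            port:
              number: 8080
```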
agent deployment
Building the docker image
- Download the component package from the official website.
- Unzip the component package and copy out the agent directory inside it.
- Write the Dockerfile
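FROM busybox:1.31
MAINTAINER daniel ""
WORKDIR /skywalking
# ADD copies the contents of a source directory, not the directory itself,
# so each directory gets an explicit destination to keep the agent layout intact
ADD activations /skywalking/activations/
ADD config /skywalking/config/
ADD logs /skywalking/logs/
ADD optional-plugins /skywalking/optional-plugins/
ADD plugins /skywalking/plugins/
ADD skywalking-agent.jar /skywalking/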
Embedding the agent
The probe can be added through an injection (init) container. It can also be built directly into the service image.
####
containers:
- args:
  - -Xms1024m
  - -Xmx1024m
  -
  - -javaagent:/sidecar/skywalking/skywalking-agent.jar   ## Startup probe
  - -jar
  - /app.war   ## The path where the service war package is located
####
initContainers:   ## Injection container
- name: sidecar
  image: skywalking-agent-sidecar:6.1.0   # container image containing the static resource files
  imagePullPolicy: IfNotPresent
  command: ["cp", "-r", "/skywalking", "/sidecar"]
  volumeMounts:
  - name: sidecar
    mountPath: /sidecar
####
- name: SW_AGENT_NAME
  value: XXXXXX   ## Service name monitored by jvm
- name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
  value:   ## CLUSTER-IP: Port of oap
####
volumeMounts:   ## Mounted on the service
- mountPath: /sidecar
  name: sidecar
####
- emptyDir: {}
  name: sidecar
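To show how the fragments above fit together, here is a minimal sketch of one complete Deployment. The names demo-service and registry.example.com/demo-service:1.0 are hypothetical, `command: ["java"]` assumes the service image has java on its PATH, and the backend address oap.skywalking:11800 is an assumption based on the oap Service and its grpc port defined earlier (the original uses the CLUSTER-IP and port of oap instead).

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-service                      # hypothetical service name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-service
  template:
    metadata:
      labels:
        app: demo-service
    spec:
      initContainers:
      # Copies the agent from the sidecar image into the shared volume,
      # producing /sidecar/skywalking/skywalking-agent.jar
      - name: sidecar
        image: skywalking-agent-sidecar:6.1.0
        imagePullPolicy: IfNotPresent
        command: ["cp", "-r", "/skywalking", "/sidecar"]
        volumeMounts:
        - name: sidecar
          mountPath: /sidecar
      containers:
      - name: demo-service
        image: registry.example.com/demo-service:1.0   # hypothetical image
        command: ["java"]                 # assumes java is on the image's PATH
        args:
        - -Xms1024m
        - -Xmx1024m
        - -javaagent:/sidecar/skywalking/skywalking-agent.jar
        - -jar
        - /app.war
        env:
        - name: SW_AGENT_NAME
          value: demo-service             # name shown in the SkyWalking UI
        - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
          value: oap.skywalking:11800     # assumption: oap Service DNS name and grpc port
        volumeMounts:
        - name: sidecar
          mountPath: /sidecar
      volumes:
      # Shared scratch volume between the init container and the service
      - name: sidecar
        emptyDir: {}
```

The emptyDir volume is the only coupling between the two containers, so the service image itself stays unchanged; upgrading the agent only means changing the sidecar image tag.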
