ELK+Filebeat+Kafka+Zk log collection, analysis and statistics system

Posted by mattcooper on Fri, 28 Jan 2022 17:33:31 +0100

The following are the detailed notes I took the last time I built ELK (Elasticsearch + Logstash + Kibana) for the company, shared here so you can save them for future use.

abstract

Analyzing logs is the main means of finding and solving system faults. Logs come in many types, including program logs, system logs, security logs, and so on. By analyzing logs we can not only prevent faults from happening, but also find clues when a fault does occur, quickly locate the fault point, and solve the problem in time.

In a distributed system where a back-end service is deployed across dozens of nodes, viewing and analyzing logs node by node is very troublesome, so a platform that collects the logs of all systems in one place is badly needed. For this scenario, the most popular solution in the industry is the ELK stack.

Architecture diagram

(Architecture diagram image not available. Data flow: application logs → Filebeat → Kafka (coordinated by ZooKeeper) → Logstash → Elasticsearch → Kibana.)

Component introduction

Filebeat

Part of the Beats family, a lightweight data collection agent based on the original logstash-forwarder source code. In other words, Filebeat is the new version of logstash-forwarder and is the first choice for the agent side of the ELK Stack.

Kafka

A data buffer queue. As a message queue it decouples the processing pipeline and improves scalability. With its peak-handling capability, a message queue lets key components withstand sudden bursts of requests without collapsing under the overload. (Kafka is used here to collect logs asynchronously because the log volume is very large.)

Zookeeper

ZooKeeper is a distributed coordination service whose main job is to provide consistency services for distributed systems. (Kafka depends on it to run.)

Logstash

Mainly used to collect, parse and filter logs; a tool for managing logs and events. You can use it to collect logs, transform and parse them, and then hand the data to other modules for searching, storage, and so on.

Elasticsearch

A search server based on Lucene. It collects, analyzes and stores data, and exposes a distributed, multi-tenant full-text search engine through a RESTful web interface. Elasticsearch is developed in Java and released as open source under the Apache license. It is a popular enterprise search engine, designed for the cloud: real-time search, stable, reliable, fast, and easy to install and use.

Kibana

An excellent front-end log display framework. It can render logs into a rich variety of charts and provides strong data visualization support. It can search and display the index data stored in Elasticsearch; with it you can easily use charts, tables and maps to display and analyze data.

Environment construction

Filebeat installation

  1. Copy filebeat-6.6.0-linux-x86_64.tar.gz to the application server (each application server needs its own Filebeat)

  2. Unzip filebeat-6.6.0-linux-x86_64.tar.gz to the specified directory

tar -zxvf filebeat-6.6.0-linux-x86_64.tar.gz -C /home/app
  3. Remove group write permission from filebeat.yml
chmod go-w /home/app/filebeat-6.6.0-linux-x86_64/filebeat.yml
  4. Enter the filebeat-6.6.0-linux-x86_64 folder and modify the filebeat.yml configuration file
vim /home/app/filebeat-6.6.0-linux-x86_64/filebeat.yml

Open it and edit it as follows (note that there must be a space after each colon, otherwise an error will be reported). An output section for Kafka, sketched after the inputs, is also needed so Filebeat can ship events to Kafka.

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /home/application/gateway.log    # Path of this application's log
  fields:
    type: gateway     # Distinguishes logs from different applications; usually the application name
  fields_under_root: true
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'  # Merge multi-line error logs (e.g. stack traces) into one event; the regex matches the prefix of a normal log line
  multiline.negate: true
  multiline.match: after
- type: log
  enabled: true
  paths:
    - /home/application/sso.log    # Path of this application's log
  fields:
    type: sso     # Distinguishes logs from different applications; usually the application name
  fields_under_root: true
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'  # Merge multi-line error logs (e.g. stack traces) into one event; the regex matches the prefix of a normal log line
  multiline.negate: true
  multiline.match: after
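
The snippet above only defines the inputs. For the Filebeat → Kafka flow in the architecture, filebeat.yml also needs a Kafka output section. Below is a minimal sketch; the broker address and the topic name test are taken from the Kafka and Logstash configuration later in this guide, so adjust both to your environment.

output.kafka:
  enabled: true
  hosts: ["128.33.18.12:9092"]   # Kafka broker configured in the Kafka section below
  topic: "test"                  # Must match the topic Logstash subscribes to
  compression: gzip              # Optional: compress messages to save bandwidth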
  5. Startup script
nohup /home/app/filebeat-6.6.0-linux-x86_64/filebeat -e -c /home/app/filebeat-6.6.0-linux-x86_64/filebeat.yml > /home/app/filebeat-6.6.0-linux-x86_64/filebeat.log 2>&1 &
  6. If you simply close the terminal window the program will stop; leave the session with the exit command instead
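
To confirm that Filebeat is running and shipping events, a simple sanity check is to look for the process and tail the log file written by the startup command above:

ps -ef | grep filebeat
tail -f /home/app/filebeat-6.6.0-linux-x86_64/filebeat.log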

Kafka installation

  1. Copy kafka_2.12-2.1.1.tgz to the target server

  2. Unzip kafka_2.12-2.1.1.tgz to the specified directory

tar -zxvf kafka_2.12-2.1.1.tgz -C /home/app
  3. Enter kafka_2.12-2.1.1/config/ and modify the server.properties configuration file
vim /home/app/kafka_2.12-2.1.1/config/server.properties

Modify the following configuration

broker.id=1
# This machine's IP; the port keeps the default 9092
advertised.listeners=PLAINTEXT://128.33.18.12:9092
# Directory where Kafka stores its log segments (data)
log.dirs=/data/kafka-logs
  4. Startup script
nohup /home/app/kafka_2.12-2.1.1/bin/kafka-server-start.sh /home/app/kafka_2.12-2.1.1/config/server.properties > /home/app/kafka_2.12-2.1.1/kafka.log 2>&1 &
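
After the broker is up, the topic that Filebeat writes to and Logstash reads from can be created and checked with the scripts shipped with Kafka. This is a minimal sketch for a single broker, assuming ZooKeeper runs on the same host (port 2181) and using the topic name test from the rest of this guide:

# Create the topic (single partition, single replica, for a one-broker setup)
/home/app/kafka_2.12-2.1.1/bin/kafka-topics.sh --create --zookeeper 128.33.18.12:2181 --replication-factor 1 --partitions 1 --topic test
# Consume from the beginning to confirm that Filebeat messages are arriving
/home/app/kafka_2.12-2.1.1/bin/kafka-console-consumer.sh --bootstrap-server 128.33.18.12:9092 --topic test --from-beginning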

Zookeeper installation

  1. Copy zookeeper-3.4.13.tar.gz to the target server

  2. Unzip zookeeper-3.4.13.tar.gz to the specified directory

tar -zxvf zookeeper-3.4.13.tar.gz -C /home/app
  3. Create the configuration file (copy the sample in the conf directory)
cp zoo_sample.cfg zoo.cfg
  4. Edit the configuration file zoo.cfg
# ZK data directory
dataDir=/usr/local/zookeeper/data
# Port for receiving client requests
clientPort=2181
# ZK transaction log directory
dataLogDir=/usr/local/zookeeper/logs
  5. Startup script (run from the bin directory of the extracted zookeeper folder)
./zkServer.sh start
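
To confirm ZooKeeper started correctly, the same script can report its status (for a single node it should report standalone mode):

./zkServer.sh status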

Logstash installation

  1. Copy logstash-6.6.0.tar.gz to the target server

  2. Unzip logstash-6.6.0.tar.gz to the specified directory

tar -zxvf logstash-6.6.0.tar.gz -C /home/app
  3. Configure the jvm.options file
vim /home/app/logstash-6.6.0/config/jvm.options

Configure the following properties

-Xms256m
-Xmx256m
  4. Modify the main configuration file (the data source, filter and output sections are shown below, followed by a quick syntax check)
vim /home/app/logstash-6.6.0/config/logstash.conf

Data source configuration

input {
    kafka{
        bootstrap_servers => "128.33.18.12:9092"
        topics => ["test"]    #Topic to subscribe to
        group_id => "logstash"
        codec => "json"
    }
}

Filter configuration

filter {
    if [type] == "sso" {    #type corresponds to the field set in filebeat
        grok {    #By default the @timestamp stored in es is the time Logstash consumed the event, so extract the time the log line was actually generated
            match => {"message" => "(?<datetime>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})"}
        }
        date {
            match => ["datetime","yyyy-MM-dd HH:mm:ss"]
            target => "@timestamp"    #Overwrite @timestamp with the log's own timestamp
        }
        mutate {    #Remove fields that do not need to be stored in es
            remove_field => ["beat","offset","source","input","log","prospector","@version","ecs","agent","datetime"]
        }
    }
}

Output configuration

output {
     if [type] == "sso" {
        elasticsearch {
            hosts => ["128.33.18.12:9200"]
            index => "sso-%{+YYYY.MM.dd}"    #One index per day
        }
     }
}
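
Before starting Logstash, you can ask it to validate the pipeline syntax without running it (paths follow the layout used above):

/home/app/logstash-6.6.0/bin/logstash -f /home/app/logstash-6.6.0/config/logstash.conf --config.test_and_exit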
  5. Startup script
nohup /home/app/logstash-6.6.0/bin/logstash -f /home/app/logstash-6.6.0/config/logstash.conf > /home/app/logstash-6.6.0/logstash.log 2>&1 &

Elasticsearch installation

  1. Copy elasticsearch-6.6.0.tar.gz to the target server

  2. Unzip elasticsearch-6.6.0.tar.gz to the specified directory

tar -zxvf elasticsearch-6.6.0.tar.gz -C /home/app
  3. Modify the system configuration parameters (requires the root user)
vim /etc/security/limits.conf

Modify the following configuration

    *         soft    nofile          65536
    *         hard    nofile          65536
    *         soft    nproc           4096
    *         hard    nproc           4096

Modify kernel parameters

vim /etc/sysctl.conf

Modify as follows

vm.max_map_count=655360

Make the command modification effective

sysctl -p
  4. Modify the main configuration files (elasticsearch.yml and jvm.options)
vim /home/app/elasticsearch-6.6.0/config/elasticsearch.yml 

Modify as follows

path.data: /home/app/elasticsearch-6.6.0/data
path.logs: /home/app/elasticsearch-6.6.0/logs
network.host: 0.0.0.0
http.port: 9200
node.name: es-node1
discovery.zen.minimum_master_nodes: 1
vim /home/app/elasticsearch-6.6.0/config/jvm.options

Modify as follows

-Xms2g
-Xmx2g
  5. Startup script
/home/app/elasticsearch-6.6.0/bin/elasticsearch -d
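
Once Elasticsearch is up, you can verify it over its RESTful interface and, after Logstash has delivered some data, list the daily indices (the address matches the rest of this guide):

curl http://128.33.18.12:9200
curl http://128.33.18.12:9200/_cat/indices?v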

Kibana installation

  1. Copy kibana-6.6.0.tar.gz to the target server

  2. Unzip kibana-6.6.0.tar.gz to the specified directory

tar -zxvf kibana-6.6.0.tar.gz -C /home/app
  3. Modify the configuration file
vim /home/app/kibana-6.6.0/config/kibana.yml

Modify as follows

#Port the Kibana service listens on
server.port: 5601
#Host address; can be an IP or a host name
server.host: 128.33.18.12
#URLs of the es servers that Kibana accesses, separated by commas
elasticsearch.hosts: ["http://128.33.18.12:9200"]
#Chinese locale for the UI
i18n.locale: "zh-CN"
  4. Startup script
nohup /home/app/kibana-6.6.0/bin/kibana > /home/app/kibana-6.6.0/kibana.log 2>&1 &
  5. Open the management interface at http://128.33.18.12:5601 to view the logs
