ELK+Filebeat+Kafka+Zk log collection, analysis and statistics system

Posted by mattcooper on Fri, 28 Jan 2022 17:33:31 +0100

The following are the detailed notes I took the last time I built ELK (Elasticsearch + Logstash + Kibana) for the company, shared here so you can save them for future use.

abstract

Analyzing logs is the main means of finding and solving system faults. Logs come in many types, including program logs, system logs, security logs, and so on. By analyzing logs we can not only prevent faults from happening, but also find clues when a fault does occur, quickly locate the fault point, and solve the problem in time.

In a distributed system where a back-end service is deployed across dozens of nodes, viewing and analyzing logs node by node is very troublesome, so a platform that collects the logs of all systems in one place is badly needed. For this scenario, the most popular solution in the industry is the ELK stack.

Architecture diagram

(Architecture diagram image not available. Data flow: application logs → Filebeat → Kafka (coordinated by ZooKeeper) → Logstash → Elasticsearch → Kibana.)

Component introduction

Filebeat

Part of the Beats family, a lightweight data collection agent based on the original logstash-forwarder source code. In other words, Filebeat is the new version of logstash-forwarder and is the first choice for the agent side of the ELK Stack.

Kafka

A data buffer queue. As a message queue it decouples the processing pipeline and improves scalability. With its peak-handling capability, a message queue lets key components withstand sudden bursts of requests without collapsing under the overload. (Kafka is used here to collect logs asynchronously because the log volume is very large.)

Zookeeper

ZooKeeper is a distributed coordination service whose main job is to provide consistency services for distributed systems. (Kafka depends on it to run.)

Logstash

Mainly used to collect, parse and filter logs; a tool for managing logs and events. You can use it to collect logs, transform and parse them, and then hand the data to other modules for searching, storage, and so on.

Elasticsearch

A search server based on Lucene. It collects, analyzes and stores data, and exposes a distributed, multi-tenant full-text search engine through a RESTful web interface. Elasticsearch is developed in Java and released as open source under the Apache license. It is a popular enterprise search engine, designed for the cloud: real-time search, stable, reliable, fast, and easy to install and use.

Kibana

An excellent front-end log display framework. It can render logs into a rich variety of charts and provides strong data visualization support. It can search and display the index data stored in Elasticsearch; with it you can easily use charts, tables and maps to display and analyze data.

Environment construction

Filebeat installation

  1. Copy filebeat-6.6.0-linux-x86_64.tar.gz to the application server (each application server needs its own Filebeat)

  2. Unzip filebeat-6.6.0-linux-x86_64.tar.gz to the specified directory

tar -zxvf filebeat-6.6.0-linux-x86_64.tar.gz -C /home/app
  3. Remove group write permission from filebeat.yml
chmod go-w /home/app/filebeat-6.6.0-linux-x86_64/filebeat.yml
  4. Enter the filebeat-6.6.0-linux-x86_64 folder and modify the filebeat.yml configuration file
vim /home/app/filebeat-6.6.0-linux-x86_64/filebeat.yml

Open it and edit it as follows (note that there must be a space after each colon, otherwise an error will be reported). An output section for Kafka, sketched after the inputs, is also needed so Filebeat can ship events to Kafka.

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /home/application/gateway.log    # Path of this application's log
  fields:
    type: gateway     # Distinguishes logs from different applications; usually the application name
  fields_under_root: true
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'  # Merge multi-line error logs (e.g. stack traces) into one event; the regex matches the prefix of a normal log line
  multiline.negate: true
  multiline.match: after
- type: log
  enabled: true
  paths:
    - /home/application/sso.log    # Path of this application's log
  fields:
    type: sso     # Distinguishes logs from different applications; usually the application name
  fields_under_root: true
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'  # Merge multi-line error logs (e.g. stack traces) into one event; the regex matches the prefix of a normal log line
  multiline.negate: true
  multiline.match: after
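
The snippet above only defines the inputs. For the Filebeat → Kafka flow in the architecture, filebeat.yml also needs a Kafka output section. Below is a minimal sketch; the broker address and the topic name test are taken from the Kafka and Logstash configuration later in this guide, so adjust both to your environment.

output.kafka:
  enabled: true
  hosts: ["128.33.18.12:9092"]   # Kafka broker configured in the Kafka section below
  topic: "test"                  # Must match the topic Logstash subscribes to
  compression: gzip              # Optional: compress messages to save bandwidth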
  5. Startup script
nohup /home/app/filebeat-6.6.0-linux-x86_64/filebeat -e -c /home/app/filebeat-6.6.0-linux-x86_64/filebeat.yml > /home/app/filebeat-6.6.0-linux-x86_64/filebeat.log 2>&1 &
  6. If you simply close the terminal window the program will stop; leave the session with the exit command instead
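
To confirm that Filebeat is running and shipping events, a simple sanity check is to look for the process and tail the log file written by the startup command above:

ps -ef | grep filebeat
tail -f /home/app/filebeat-6.6.0-linux-x86_64/filebeat.log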

Kafka installation

  1. Copy kafka_2.12-2.1.1.tgz to the target server

  2. Unzip kafka_2.12-2.1.1.tgz to the specified directory

tar -zxvf kafka_2.12-2.1.1.tgz -C /home/app
  3. Enter kafka_2.12-2.1.1/config/ and modify the server.properties configuration file
vim /home/app/kafka_2.12-2.1.1/config/server.properties

Modify the following configuration

broker.id=1
# This machine's IP; the port keeps the default 9092
advertised.listeners=PLAINTEXT://128.33.18.12:9092
# Directory where Kafka stores its log segments (data)
log.dirs=/data/kafka-logs
  4. Startup script
nohup /home/app/kafka_2.12-2.1.1/bin/kafka-server-start.sh /home/app/kafka_2.12-2.1.1/config/server.properties > /home/app/kafka_2.12-2.1.1/kafka.log 2>&1 &
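
After the broker is up, the topic that Filebeat writes to and Logstash reads from can be created and checked with the scripts shipped with Kafka. This is a minimal sketch for a single broker, assuming ZooKeeper runs on the same host (port 2181) and using the topic name test from the rest of this guide:

# Create the topic (single partition, single replica, for a one-broker setup)
/home/app/kafka_2.12-2.1.1/bin/kafka-topics.sh --create --zookeeper 128.33.18.12:2181 --replication-factor 1 --partitions 1 --topic test
# Consume from the beginning to confirm that Filebeat messages are arriving
/home/app/kafka_2.12-2.1.1/bin/kafka-console-consumer.sh --bootstrap-server 128.33.18.12:9092 --topic test --from-beginning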

Zookeeper installation

  1. Copy zookeeper-3.4.13.tar.gz to the target server

  2. Unzip zookeeper-3.4.13.tar.gz to the specified directory

tar -zxvf zookeeper-3.4.13.tar.gz -C /home/app
  3. Create the configuration file (copy the sample in the conf directory)
cp zoo_sample.cfg zoo.cfg
  4. Edit the configuration file zoo.cfg
# ZK data directory
dataDir=/usr/local/zookeeper/data
# Port for receiving client requests
clientPort=2181
# ZK transaction log directory
dataLogDir=/usr/local/zookeeper/logs
  5. Startup script (run from the bin directory of the extracted zookeeper folder)
./zkServer.sh start
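
To confirm ZooKeeper started correctly, the same script can report its status (for a single node it should report standalone mode):

./zkServer.sh status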

Logstash installation

  1. Copy logstash-6.6.0.tar.gz to the target server

  2. Unzip logstash-6.6.0.tar.gz to the specified directory

tar -zxvf logstash-6.6.0.tar.gz -C /home/app
  3. Configure the jvm.options file
vim /home/app/logstash-6.6.0/config/jvm.options

Configure the following properties

-Xms256m
-Xmx256m
  4. Modify the main configuration file (the data source, filter and output sections are shown below, followed by a quick syntax check)
vim /home/app/logstash-6.6.0/config/logstash.conf

Data source configuration

input {
    kafka{
        bootstrap_servers => "128.33.18.12:9092"
        topics => ["test"]    #Topic to subscribe to
        group_id => "logstash"
        codec => "json"
    }
}

Filter configuration

filter {
    if [type] == "sso" {    #type corresponds to the field set in filebeat
        grok {    #By default the @timestamp stored in es is the time Logstash consumed the event, so extract the time the log line was actually generated
            match => {"message" => "(?<datetime>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})"}
        }
        date {
            match => ["datetime","yyyy-MM-dd HH:mm:ss"]
            target => "@timestamp"    #Overwrite @timestamp with the log's own timestamp
        }
        mutate {    #Remove fields that do not need to be stored in es
            remove_field => ["beat","offset","source","input","log","prospector","@version","ecs","agent","datetime"]
        }
    }
}

Output configuration

output {
     if [type] == "sso" {
        elasticsearch {
            hosts => ["128.33.18.12:9200"]
            index => "sso-%{+YYYY.MM.dd}"    #One index per day
        }
     }
}
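
Before starting Logstash, you can ask it to validate the pipeline syntax without running it (paths follow the layout used above):

/home/app/logstash-6.6.0/bin/logstash -f /home/app/logstash-6.6.0/config/logstash.conf --config.test_and_exit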
  5. Startup script
nohup /home/app/logstash-6.6.0/bin/logstash -f /home/app/logstash-6.6.0/config/logstash.conf > /home/app/logstash-6.6.0/logstash.log 2>&1 &

Elasticsearch installation

  1. Copy elasticsearch-6.6.0.tar.gz to the target server

  2. Unzip elasticsearch-6.6.0.tar.gz to the specified directory

tar -zxvf elasticsearch-6.6.0.tar.gz -C /home/app
  3. Modify the system configuration parameters (requires the root user)
vim /etc/security/limits.conf

Modify the following configuration

    *         soft    nofile          65536
    *         hard    nofile          65536
    *         soft    nproc           4096
    *         hard    nproc           4096

Modify kernel parameters

vim /etc/sysctl.conf

Modify as follows

vm.max_map_count=655360

Make the command modification effective

sysctl -p
  4. Modify the main configuration files (elasticsearch.yml and jvm.options)
vim /home/app/elasticsearch-6.6.0/config/elasticsearch.yml 

Modify as follows

path.data: /home/app/elasticsearch-6.6.0/data
path.logs: /home/app/elasticsearch-6.6.0/logs
network.host: 0.0.0.0
http.port: 9200
node.name: es-node1
discovery.zen.minimum_master_nodes: 1
vim /home/app/elasticsearch-6.6.0/config/jvm.options

Modify as follows

-Xms2g
-Xmx2g
  5. Startup script
/home/app/elasticsearch-6.6.0/bin/elasticsearch -d
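
Once Elasticsearch is up, you can verify it over its RESTful interface and, after Logstash has delivered some data, list the daily indices (the address matches the rest of this guide):

curl http://128.33.18.12:9200
curl http://128.33.18.12:9200/_cat/indices?v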

Kibana installation

  1. Copy kibana-6.6.0.tar.gz to the target server

  2. Unzip kibana-6.6.0.tar.gz to the specified directory

tar -zxvf kibana-6.6.0.tar.gz -C /home/app
  3. Modify the configuration file
vim /home/app/kibana-6.6.0/config/kibana.yml

Modify as follows

#Port the Kibana service listens on
server.port: 5601
#Host address; can be an IP or a host name
server.host: 128.33.18.12
#URLs of the es servers that Kibana accesses, separated by commas
elasticsearch.hosts: ["http://128.33.18.12:9200"]
#Chinese locale for the UI
i18n.locale: "zh-CN"
  4. Startup script
nohup /home/app/kibana-6.6.0/bin/kibana > /home/app/kibana-6.6.0/kibana.log 2>&1 &
  5. Open the management interface at http://128.33.18.12:5601 to view the logs
