1, ELK
1. ELK introduction
- The ELK platform is a complete centralized log-processing solution. It combines Elasticsearch, Logstash and Kibana to meet demanding requirements for log querying, sorting and statistics.
- Elasticsearch: a distributed storage and search engine based on Lucene (a full-text search engine library), used to store logs of all kinds.
- Elasticsearch is developed in Java and lets users communicate with it from a browser through a RESTful web interface.
- Elasticsearch is a distributed search and analytics engine. Its strength is that it can store, search and analyze large volumes of data in near real time.
- Logstash: the data collection engine. It supports dynamic collection from a variety of data sources; it filters, parses, enriches and normalizes the data, then ships it to a destination of the user's choice, usually Elasticsearch.
- Logstash is written in JRuby and runs on the Java Virtual Machine (JVM). It is a powerful data-processing tool that handles data transport, format processing and formatted output. Logstash has a rich plug-in ecosystem and is commonly used for log processing.
- Kibana: a display tool built on Node.js. It provides a graphical web interface for analyzing the logs handled by Logstash and Elasticsearch, and can summarize, analyze and search important log data.
- Filebeat: a lightweight open-source log-file data collector. You usually install Filebeat on each client whose data needs collecting and point it at the log directories and formats. Filebeat collects data quickly and sends it to Logstash for parsing, or directly to Elasticsearch for storage. Performance-wise it has a clear advantage over a JVM-hosted Logstash agent and is a common substitute for it; a minimal configuration sketch follows this list.
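As a rough illustration of the Filebeat setup just described, the sketch below writes a minimal configuration that ships one log file to Logstash. The Logstash host is a placeholder, and the keys shown are Filebeat 5.x syntax (newer releases use `filebeat.inputs` with `- type: log`); treat this as a sketch, not the deployment used later in this article.

```bash
# Hypothetical minimal Filebeat configuration (5.x syntax)
cat > /etc/filebeat/filebeat.yml <<'EOF'
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/messages              # files to collect (illustrative)
output.logstash:
  hosts: ["logstash-host:5044"]      # placeholder: your Logstash server
EOF
```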
2. Role of ELK
- Logs mainly include system logs, application logs and security logs. Operations staff and developers can use logs to learn about a server's software and hardware, and to spot configuration errors and their causes. Analyzing logs regularly gives insight into a server's load, performance and security, so that timely corrective action can be taken.
- The logs of a single machine can usually be analyzed with grep, awk and similar tools, but this breaks down once logs are scattered across many devices. If you manage dozens or hundreds of servers, logging in to each machine in turn feels cumbersome and inefficient. Centralized log management becomes essential, for example using the open-source syslog to collect and aggregate the logs from all servers. Once logs are centralized, however, searching and computing statistics over them is the next problem. grep, awk, wc and other Linux commands can still handle basic retrieval and counting (see the sketch after this list), but they fall short for more demanding querying, sorting and statistics across a large number of machines.
- A large-scale system is usually deployed as a distributed architecture, with different service modules on different servers. When a problem occurs, you have to locate the specific server and service module from the key information the problem exposes; a centralized log system greatly improves the efficiency of this kind of troubleshooting.
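For concreteness, the traditional single-machine approach mentioned above might look like the sketch below; the log path and field positions depend on your distribution's syslog format, so treat them as illustrative.

```bash
# Classic single-machine log analysis with grep/awk/wc
grep -c "error" /var/log/messages        # count lines containing "error"
awk '{print $5}' /var/log/messages | sort | uniq -c | sort -rn | head
                                         # rough top-10 of field 5 (often the program tag)
wc -l /var/log/messages                  # total number of log lines
```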
3. Basic characteristics of complete log system
- Collection: collect log data from multiple sources
- Transmission: parse, filter and reliably transmit log data to the storage system
- Storage: store the log data
- Analysis: support analysis through a UI
- Alerting: provide error reporting and monitoring mechanisms
4. How ELK works
- (1) Deploy Logstash on every server whose logs need to be collected; alternatively, centralize log management on a dedicated log server and deploy Logstash there.
- (2) Logstash collects the logs, formats them and outputs them to the Elasticsearch cluster.
- (3) Elasticsearch indexes and stores the formatted data.
- (4) Kibana queries the data from the ES cluster, generates charts and presents them in the front end.
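To make steps (2) and (3) concrete, a minimal Logstash pipeline might look like the sketch below. The input path and index name are illustrative, and the output points at the node1 address used later in this article.

```bash
# Hypothetical minimal pipeline: read one log file, ship it to Elasticsearch
cat > /etc/logstash/conf.d/system.conf <<'EOF'
input {
  file {
    path => "/var/log/messages"        # source log file (illustrative)
    type => "system"
    start_position => "beginning"      # read the file from the start
  }
}
output {
  elasticsearch {
    hosts => ["192.168.239.101:9200"]  # Elasticsearch node
    index => "system-%{+YYYY.MM.dd}"   # one index per day
  }
}
EOF
```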
2, Deploy ELK log analysis system
1. Experimental requirements
1. Configure an ELK log analysis cluster
2. Collect logs with Logstash
3. View and analyze logs with Kibana
2. Environment configuration
| host | operating system | IP address | installation packages / software / tools |
| --- | --- | --- | --- |
| node1 | CentOS 7 | 192.168.239.101 | Elasticsearch |
| node2 | CentOS 7 | 192.168.239.102 | Elasticsearch |
| apache | CentOS 7 | 192.168.239.103 | httpd / Logstash |
| real machine | Windows | 192.168.239.100 | - |
On all nodes, turn off the system firewall and SELinux:

```bash
systemctl stop firewalld.service
setenforce 0
```
node1 (192.168.239.101):

```bash
hostnamectl set-hostname node1
su -
```

node2 (192.168.239.102):

```bash
hostnamectl set-hostname node2
su -
```

apache (192.168.239.103):

```bash
hostnamectl set-hostname apache
su -
```
3. Configure the Elasticsearch environment (node1, node2)
Perform the following on both node1 (192.168.239.101) and node2 (192.168.239.102); only the node name differs, as marked.
```bash
echo '192.168.239.101 node1' >> /etc/hosts
echo '192.168.239.102 node2' >> /etc/hosts
cd /opt
rz                                        # upload elasticsearch-5.5.0.rpm
rpm -ivh elasticsearch-5.5.0.rpm
systemctl daemon-reload
systemctl enable elasticsearch.service
```
```bash
cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.bak
vim /etc/elasticsearch/elasticsearch.yml
```

Edit the following lines (==N== marks the line number inside elasticsearch.yml):

==17== cluster.name: my-elk-cluster
==23== node.name: node1 #set node.name: node2 on the node2 host
==33== path.data: /data/elk_data
==37== path.logs: /var/log/elasticsearch/
==43== bootstrap.memory_lock: false
==55== network.host: 0.0.0.0
==59== http.port: 9200
==68== discovery.zen.ping.unicast.hosts: ["node1", "node2"]
grep -v "^#" /etc/elasticsearch/elasticsearch.yml mkdir -p /data/elk_data chown elasticsearch:elasticsearch /data/elk_data/ systemctl start elasticsearch netstat -antp |grep 9200
1. You can now view the node information from the real machine:

http://192.168.239.101:9200
http://192.168.239.102:9200
2. From the real machine, check the cluster health and view the cluster state:

http://192.168.239.101:9200/_cluster/health?pretty
http://192.168.239.102:9200/_cluster/health?pretty
http://192.168.239.101:9200/_cluster/state?pretty
http://192.168.239.102:9200/_cluster/state?pretty
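The same checks can also be run from a shell with curl; a "status" of green means all primary and replica shards are allocated.

```bash
# Command-line equivalent of the browser checks above
curl -s http://192.168.239.101:9200/_cluster/health?pretty
curl -s http://192.168.239.101:9200/_cluster/state?pretty | head -n 20   # state output is long
```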
4. Install the elasticsearch-head plug-in
- Viewing the cluster this way is inconvenient; installing the elasticsearch-head plug-in gives us a web interface for managing the cluster.
Perform the following on both node1 (192.168.239.101) and node2 (192.168.239.102):
```bash
yum -y install gcc gcc-c++ make
cd /opt
rz                                        # upload node-v8.2.1.tar.gz
tar xzvf node-v8.2.1.tar.gz
cd node-v8.2.1/
./configure
make -j2 && make install
```
```bash
cd /usr/local/src
rz                                        # upload phantomjs-2.1.1-linux-x86_64.tar.bz2
tar xjvf phantomjs-2.1.1-linux-x86_64.tar.bz2
cd phantomjs-2.1.1-linux-x86_64/bin
cp phantomjs /usr/local/bin
cd /usr/local/src/
rz                                        # upload elasticsearch-head.tar.gz
tar xzvf elasticsearch-head.tar.gz
```
```bash
cd elasticsearch-head/
npm install
vim /etc/elasticsearch/elasticsearch.yml
```

Append the following at the end of elasticsearch.yml:

http.cors.enabled: true
http.cors.allow-origin: "*"

```bash
systemctl restart elasticsearch
cd /usr/local/src/elasticsearch-head/
npm run start &
netstat -lnupt | grep 9100
netstat -lnupt | grep 9200
```
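Before switching to the browser, you can optionally confirm from the shell that both services answer; this quick sanity check is not part of the original procedure.

```bash
# head plugin should return HTTP 200 on 9100; Elasticsearch returns its JSON banner on 9200
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.239.101:9100
curl -s http://192.168.239.101:9200
```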
Open a browser on the real machine and visit:

http://192.168.239.101:9100
http://192.168.239.102:9100
You can see that the cluster health status is green.
In the field next to "Elasticsearch" at the top of the head interface, enter:

http://192.168.239.101:9200 or http://192.168.239.102:9200

Then click Connect, and you will see: cluster health value: green (0 of 0)
● node1 information action
★ node2 information action (the star marks the elected master node)
node1 (192.168.239.101):
Create an index named index-demo with type test; the response shows that the document was created successfully:
```bash
curl -XPUT 'localhost:9200/index-demo/test/1?pretty&pretty' -H 'Content-Type: application/json' -d '{"user":"zhangsan","mesg":"hello world"}'
```
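To confirm the document is retrievable, you can fetch it back with a standard Elasticsearch document GET:

```bash
# Fetch the document that was just indexed
curl -XGET 'localhost:9200/index-demo/test/1?pretty'
```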
View the index information from the real machine:

http://192.168.239.101:9100