ELK log analysis system

Posted by landung on Tue, 23 Nov 2021 03:36:09 +0100

Contents

1, Introduction to ELK log analysis system

1. Advantages and disadvantages of a log server

2. What is ELK?

2.1 Four collection tools that work with Logstash

2.2 Log processing steps

2, Basic core concepts of Elasticsearch

3, Logstash introduction

4, Introduction to Kibana

5, Configure ELK log analysis system

1. Install elasticsearch cluster

1.1. Configure elasticsearch environment

1.2. Deploy elasticsearch software

1.3. Install elasticsearch head plug-in

1.4. Install the phantomjs front-end framework

1.5. Install elasticsearch head data visualization tool

2. Install logstash

2.2. Install the apache service and jdk environment

2.3 Install logstash

2.4 Test logstash

2.5 Standard input/output test

3. Install kibana on the node1 host

3.1. Connect to the apache logs (access and error)

6, Summary

1, Introduction to ELK log analysis system

1. Advantages and disadvantages of a log server

  • Advantages
    • Improved security
    • Centralized storage of logs
  • Disadvantage
    • Logs are difficult to analyze

2. What is ELK?

ELK is a suite of three open source tools for simplified log analysis and management: Elasticsearch (ES), Logstash and Kibana. The official site is: https://www.elastic.co/products

  • ES (a NoSQL, non-relational database): provides storage and indexing
  • Logstash (log collection): takes logs from the application servers, converts their format, and outputs them to ES
      input: collects the logs
      filter: formats the data
      output: writes the logs to the ES database
  • Kibana (display tool): displays the data held in ES through a browser UI (you can process the logs to suit your own needs, making them easy to view and read)

2.1 Four collection tools that work with Logstash

  • Packetbeat (collects network traffic data)
  • Topbeat (collects CPU and memory usage data at the system, process and file-system levels)
  • Filebeat (collects file data): a lightweight tool compared with Logstash
  • Winlogbeat (collects Windows event log data)

2.2 Log processing steps

1. Logstash collects the logs generated by the AppServers and centralizes log management
2. Logstash formats the logs and stores them in the Elasticsearch cluster
3. Elasticsearch indexes and stores the formatted data
4. Kibana queries the data from the ES cluster, generates charts, and returns them to the browser (see the sketch below)
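
A minimal sketch of step 4: Kibana ultimately issues search requests against the Elasticsearch REST API, which you can reproduce by hand with curl. Here the logstash-* pattern and the message field are assumptions, matching Logstash's default daily indices and the default field it stores the raw log line in:

## search all Logstash daily indices for entries whose message contains "error"
curl -XGET 'http://192.168.35.40:9200/logstash-*/_search?q=message:error&pretty'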

2, Basic core concepts of Elasticsearch

Relationship between relational database and Elasticsearch

MySQL               Elasticsearch
database            index
table               type
row                 document
column (field)      attribute (field)

 

  1. Near real time (NRT)
    Elasticsearch is a near-real-time search platform: there is a slight delay (normally about 1 second) between indexing a document and that document becoming searchable
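
    A quick way to observe this, sketched with curl (the index name nrt-test is hypothetical):

    curl -XPUT 'localhost:9200/nrt-test/test/1?pretty' -H 'Content-Type: application/json' -d '{"msg":"hello"}'
    curl -XGET 'localhost:9200/nrt-test/_search?q=msg:hello&pretty'    ##may return 0 hits within the first second
    curl -XPOST 'localhost:9200/nrt-test/_refresh?pretty'              ##force a refresh instead of waiting
    curl -XGET 'localhost:9200/nrt-test/_search?q=msg:hello&pretty'    ##the document is now searchable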

  2. Cluster
    A cluster has a unique name as its identifier, which is elasticsearch by default;
    A cluster is organized from one or more nodes, which together hold the entire data set and jointly provide indexing and search;
    One of the nodes is the master node, chosen by election, and the cluster provides joint indexing and search across nodes;
    The cluster name is very important: each node joins its cluster based on the cluster name

  3. Node
    A node is a single server that is part of the cluster; it stores data and participates in the cluster's indexing and search;
    Like clusters, nodes are identified by name. By default a random name is assigned when the node starts, and you can also define it yourself;
    The name is used to identify which server in the cluster corresponds to which node.

  4. Index
    An index is a collection of documents with somewhat similar characteristics;
    An index is identified by a name (which must be all lowercase letters), and we should use this name when we want to index, search, update and delete the documents corresponding to the index.

  5. Type
    Within an index you can define one or more types. A type is a logical classification/partition of your index; typically, a type is defined for documents that share a common set of fields

  6. Document
    A document is represented in JSON (JavaScript Object Notation) format, a ubiquitous Internet data interchange format.
    A document physically lives in an index; in fact, a document must be indexed into, and assigned a type within, an index.

  7. Shards
    Sharding is a large part of why ES is fast as a search engine:
    In practice, the data stored in an index may exceed the hardware limits of a single node. For example, an index of 1 billion documents needing 1 TB of space may not fit on a single node's disk, or search requests against a single node may be too slow. To solve this, Elasticsearch can divide an index into multiple shards. When creating an index, you can define the number of shards you want. Each shard is a fully functional, independent index and can live on any node in the cluster.
    Benefits of sharding:
    ① : horizontal splitting and scaling, increasing storage capacity
    ② : distributed, parallel operations across shards, improving performance and throughput

  8. Replica
    To guard against data loss from network and other failures, a failover mechanism is needed. Elasticsearch therefore lets us make one or more copies of an index's shards, called shard replicas, or replicas for short.
    There are two main reasons for replicas (a sketch follows this list):
    ① : high availability, to cope with shard or node failure; a replica must live on a different node than its primary
    ② : better performance and higher throughput, since searches can run on all replicas in parallel
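
    As a sketch (the index name shard-demo is hypothetical), both counts are set when an index is created, and _cat/indices shows the result:

    ## create an index with three primary shards, each with one replica
    curl -XPUT 'localhost:9200/shard-demo?pretty' -H 'Content-Type: application/json' -d '{"settings": {"number_of_shards": 3, "number_of_replicas": 1}}'
    ## the pri and rep columns confirm the settings
    curl -XGET 'localhost:9200/_cat/indices?v'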
     

3, Logstash introduction

Logstash is written in JRuby, built on a simple message-based architecture, and runs on the Java virtual machine (JVM). A single Logstash agent can be configured to combine with other open source software to provide different functions.
The concept of Logstash is very simple. It does only three things: Collect (data input), Enrich (data processing, such as filtering and modification), and Transport (data output, invoked by other modules)
 

4, Introduction to Kibana

Kibana is an open source analysis and visualization platform for Elasticsearch, used to interactively search and view data stored in Elasticsearch indices. With Kibana, advanced data analysis and display can be done through a variety of charts. It is easy to operate, and its browser-based user interface can quickly create dashboards that display Elasticsearch query results in real time. Setting up Kibana is very simple: installation takes only a few minutes, with no code to write, before Elasticsearch index monitoring can start.
 

5, Configure ELK log analysis system

Environment preparation

Host      Operating system   Hostname   IP address       Main software
Server    CentOS 7.4         node1      192.168.35.40    Elasticsearch, Kibana
Server    CentOS 7.4         node2      192.168.35.10    Elasticsearch
Server    CentOS 7.4         apache     192.168.35.20    Apache httpd, Logstash

1. Install elasticsearch cluster  

1.1. Configure elasticsearch environment

Change the host names, configure domain name resolution, and check the Java environment
hostnamectl set-hostname node1
hostnamectl set-hostname node2
hostnamectl set-hostname apache
vim /etc/hosts
192.168.35.40 node1
192.168.35.10 node2
Upload the JDK tarball to the /opt directory
tar xzvf jdk-8u91-linux-x64.tar.gz -C /usr/local/
cd /usr/local/
mv jdk1.8.0_91 jdk
vim /etc/profile
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
source /etc/profile
java -version

1.2. Deploy elasticsearch software

Upload the elasticsearch package to the /opt directory

rpm -ivh elasticsearch-5.5.0.rpm
systemctl daemon-reload 	##Reload systemd units
systemctl enable elasticsearch	##Enable the service at boot

Modify the elasticsearch configuration file

cd /etc/elasticsearch/
cp elasticsearch.yml elasticsearch.yml.bak
vim elasticsearch.yml
	17 cluster.name: my-elk-cluster		##Change the cluster name
	23 node.name: node1		##Change the node name
	33 path.data: /data/elk_data		##Change the data storage path; elk_data must be created manually
	37 path.logs: /var/log/elasticsearch		##Change the log directory
	43 bootstrap.memory_lock: false	##Do not lock physical memory here; setting true locks ES memory to keep it from being swapped out, since frequent swapping drives up IOPS (I/O operations per second)
	55 network.host: 0.0.0.0		##Listen on all interfaces
	59 http.port: 9200		##Listening port
	68 discovery.zen.ping.unicast.hosts: ["node1", "node2"]  		##Unicast discovery host list
grep -v '^#' /etc/elasticsearch/elasticsearch.yml

 

mkdir -p /data/elk_data		##Create the data storage path
chown elasticsearch:elasticsearch /data/elk_data/		##Change the owner and group
systemctl start elasticsearch		##Start the service

 

 

The node2 server gets the same configuration, except that node.name is set to node2

 

Open in a browser on the physical host:

192.168.35.40:9200/_cluster/health?pretty		##Check cluster health
192.168.35.40:9200/_cluster/state?pretty		##View cluster status
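
The same checks can be scripted from any node with curl; once both nodes have joined, the health output should report "status" : "green" and "number_of_nodes" : 2:

curl -XGET 'http://192.168.35.40:9200/_cluster/health?pretty'		##cluster health summary
curl -XGET 'http://192.168.35.40:9200/_cluster/state?pretty'		##full cluster state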

1.3. Install elasticsearch head plug-in

Viewing the cluster this way is inconvenient; installing the elasticsearch-head plug-in gives us a way to manage the cluster
Log in to the node1 host at 192.168.35.40

Upload node-v8.2.1.tar.gz to /opt
yum -y install gcc gcc-c++ make
Compiling and installing the node component's dependencies is slow (took about 47 minutes here)
cd /opt
tar -xzvf node-v8.2.1.tar.gz
cd node-v8.2.1
./configure
make -j3		##takes roughly 10-30 minutes, depending on your machine's configuration
make install

1.4. Install the phantomjs front-end framework

Upload the package to /usr/local/src/
cd /usr/local/src/
tar xjvf phantomjs-2.1.1-linux-x86_64.tar.bz2
cd phantomjs-2.1.1-linux-x86_64/bin 
cp phantomjs /usr/local/bin

1.5. Install elasticsearch head data visualization tool

cd /usr/local/src/
tar xzvf elasticsearch-head.tar.gz
cd elasticsearch-head/
npm install

 

vim /etc/elasticsearch/elasticsearch.yml		##Modify master profile
 Insert the following two lines at the end of the configuration file
	http.cors.enabled: true    		##Enable cross domain access support. The default value is false
	http.cors.allow-origin: "*"		##Allowed domain names and addresses for cross domain access
systemctl restart elasticsearch
cd /usr/local/src/elasticsearch-head/
npm run start &		##Start the elasticsearch-head server in the background

 

On Windows, browse to 192.168.35.40:9100 and 192.168.35.10:9100
Changing localhost in the connection box to a node IP displays that node's status information

By default, an index is divided into 5 shards and has one replica

 

curl -XPUT 'localhost:9200/klj/test/1?pretty' -H 'Content-Type: application/json' -d '{"user":"zs","mesg":"happy"}'		##Index a document with id 1 into the index klj with type test; the index is created automatically
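
As a quick follow-up sketch, read the document back and list the indices; klj should now also appear in elasticsearch-head:

curl -XGET 'localhost:9200/klj/test/1?pretty'		##fetch the document just indexed
curl -XGET 'localhost:9200/_cat/indices?v'		##klj appears in the index list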

 

2. Install logstash

2.2. Install the apache service and jdk environment

yum -y install httpd
systemctl start httpd
Upload the JDK tarball to the /opt directory
tar xzvf jdk-8u91-linux-x64.tar.gz -C /usr/local/
cd /usr/local/
mv jdk1.8.0_91 jdk
vim /etc/profile
	export JAVA_HOME=/usr/local/jdk
	export JRE_HOME=${JAVA_HOME}/jre
	export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
	export PATH=${JAVA_HOME}/bin:$PATH
source /etc/profile
java -version

2.3 Install logstash

Upload the installation package to the /opt directory

cd /opt
rpm -ivh logstash-5.5.1.rpm
systemctl start logstash.service 
ln -s /usr/share/logstash/bin/logstash /usr/local/bin/

2.4 Test logstash

Logstash command-line options used for testing:
-f: specify a logstash configuration file; logstash is configured according to that file
-e: followed by a string that is used as the logstash configuration (if the string is empty, stdin is used as the input and stdout as the output by default)
-t: test that the configuration file is correct, then exit
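
For instance (the configuration file path is an assumption matching the file created in section 3.1):

logstash -e 'input { stdin{} } output { stdout{} }'		##inline configuration on the command line
logstash -f /etc/logstash/conf.d/apache_log.conf -t		##only syntax-check the configuration file, then exit
logstash -f /etc/logstash/conf.d/apache_log.conf		##run with the configuration file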
 

2.5 Standard input/output test

Plug-in stages of the logstash agent:
①input
②filter
③output
logstash -e 'input { stdin{} } output { stdout{} }'

With the input session still running, open the data-browsing view of the elasticsearch-head plug-in in a local browser window
The overview now shows an additional logstash-2021.8.14 index
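
That index only appears once stdin input is shipped to Elasticsearch instead of stdout, presumably from a run along these lines (a sketch; the hosts value is node1's address):

## each line typed on stdin becomes a document in a logstash-YYYY.MM.dd index
logstash -e 'input { stdin{} } output { elasticsearch { hosts => ["192.168.35.40:9200"] } }'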

 

3. Install kibana on the node1 host

Upload kibana-5.5.1-x86_64.rpm to the /usr/local/src directory
cd /usr/local/src
rpm -ivh kibana-5.5.1-x86_64.rpm
cd /etc/kibana/
cp kibana.yml kibana.yml.bak
vim kibana.yml
	 2 server.port: 5601		##Port kibana listens on
	 7 server.host: "0.0.0.0"		##Address kibana listens on
	21 elasticsearch.url: "http://192.168.35.40:9200" 		##Elasticsearch instance kibana queries
	30 kibana.index: ".kibana"		##Kibana stores its state in the .kibana index in elasticsearch
systemctl start kibana.service		##Start kibana service
Then access port 5601 in a browser: http://192.168.35.40:5601/
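
A quick sanity check from the node itself (a sketch; kibana creates its .kibana state index in elasticsearch on first start):

netstat -natp | grep 5601		##kibana should be listening
curl -s 'http://192.168.35.40:9200/_cat/indices?v' | grep kibana		##the .kibana index appears after first start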

3.1. Connect to the apache logs (access and error)

cd /etc/logstash/conf.d/
vim apache_log.conf
input {
    file {
        path => "/etc/httpd/logs/access_log"
        type => "access"
        start_position => "beginning"
    }
    file {
        path => "/etc/httpd/logs/error_log"
        type => "error"
        start_position => "beginning"
    }
}
output {
    if [type] == "access" {
        elasticsearch {
            hosts => ["192.168.35.40:9200"]
            index => "apache_access-%{+YYYY.MM.dd}"
        }
    }
    if [type] == "error" {
        elasticsearch {
            hosts => ["192.168.35.40:9200"]
            index => "apache_error-%{+YYYY.MM.dd}"
        }
    }
}
logstash -f apache_log.conf		##Run logstash with the apache_log.conf configuration file
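
Note that the apache_access-* index is only created once access_log has entries, so it can help to hit the web server first (the address is the apache host above):

curl http://192.168.35.20/		##generate an entry in /etc/httpd/logs/access_log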

In kibana, create the Apache index patterns apache_access-* and apache_error-*
Home page: Management – Index Patterns – Create Index Pattern – enter the index name or pattern

Kibana can now be used to query and view the apache_access-*, apache_error-* and system-* indices

6, Summary

This chapter explained ELK (a suite of management tools for simplified log analysis), composed of Elasticsearch, Logstash and Kibana. Their functions are as follows:

  • ES (a NoSQL, non-relational database): provides storage and indexing
  • Logstash (log collection): takes logs from the application servers, converts their format, and outputs them to ES
      input: collects the logs
      filter: formats the data
      output: writes the logs to the ES database
  • Kibana (display tool): displays the data held in ES through a browser UI (you can process the logs to suit your own needs, making them easy to view and read)
