ELK log system theory and several schemes

Posted by mschrank on Sat, 13 Jun 2020 13:10:05 +0200

Log system

Scenarios

  • In general, for log analysis you can get the information you need directly from log files with grep and awk. At scale, however, this approach is inefficient: it raises problems such as how to archive large volumes of logs, how to avoid slow text searches, and how to query across multiple dimensions. Centralized log management is needed, with the logs from all servers collected and aggregated. The common solution is to build a centralized log collection system that collects, manages and provides access to the logs on all nodes.
  • A large system is usually deployed as a distributed architecture, with different service modules on different servers. When a problem occurs, you usually need to locate the specific server and service module from the key information the problem exposes; a centralized log system makes this troubleshooting much more efficient.
  • Analyzing large volumes of log data supports business metrics such as platform PV, UV, IP, PageTOP and other dimensions. In addition, security auditing, data mining and behavior analysis all rely on logs.

Functions

  • Information search: search the log information to locate the corresponding bugs and find solutions.
  • Service diagnosis: by collecting statistics on and analyzing log information, you can understand server load and service status, and find time-consuming requests that need optimization, etc.
  • Data analysis: formatted logs can be analyzed further, and meaningful information can be calculated and aggregated. For example, based on the commodity id in each request, the top 10 products users are interested in can be found, as illustrated below.
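A terms aggregation in Elasticsearch is one way to compute such a top-10 list. The sketch below is only illustrative: the index name app-logs and the field product_id are hypothetical and not part of the deployment described later.

# Hypothetical example: the 10 most requested product_id values
curl -XGET 'http://localhost:9200/app-logs/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggs": {
    "top_products": {
      "terms": { "field": "product_id", "size": 10 }
    }
  }
}'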

Architecture

There are three basic components of the log system:

  • Collection side: agent (collects raw log data, encapsulates the data source, and sends data from the data source to the collector)
  • Aggregation side: collector (processes data according to certain rules, receives data from multiple agents, and summarizes and imports it into the back-end store)
  • Storage side: store (the log storage system, which should be scalable and reliable, such as HDFS, ES, etc.)

Common solutions

ELK/EFK

graylog

Flow analysis

ELK scheme

Components

  • Elasticsearch: a log storage and search engine. Its features include: distributed, zero configuration, automatic discovery, automatic index sharding, an index replica mechanism, a RESTful interface, multiple data sources, automatic search load balancing, etc.
  • Logstash: a fully open-source tool that can collect and filter your logs and store them for later use (it supports dynamic data collection from various data sources, as well as filtering, analysis, enrichment, format unification and other operations).
  • Kibana: also an open-source, free tool. Kibana provides a friendly web interface for log analysis on top of Logstash and Elasticsearch, helping you summarize, analyze and search important log data.
  • Filebeat: like Logstash, a log collection and processing tool, based on the original Logstash Forwarder source code. Compared with Logstash, Filebeat is lighter and consumes fewer resources.

Workflow

AppServer -> Logstash -> Elasticsearch -> Kibana -> Browser
Logstash collects the logs generated by AppServer and stores them in the Elasticsearch cluster, while Kibana queries data from the Elasticsearch cluster, generates charts and returns them to the browser.
Considering the load on the aggregation side (log processing, cleaning, etc.) and the transmission efficiency of the collection side, a queue is usually added between the collection side and the aggregation side when the log volume is large, to smooth out log peaks.

An ELK log pipeline can be built in many ways (the components can be combined freely according to your own business). Common schemes are as follows:

  • Logstash (collection, processing) -> Elasticsearch (storage) -> Kibana (display)
  • Logstash (collection) -> Logstash (aggregation, processing) -> Elasticsearch (storage) -> Kibana (display)
  • Filebeat (collection, processing) -> Elasticsearch (storage) -> Kibana (display)
  • Filebeat (collection) -> Logstash (aggregation, processing) -> Elasticsearch (storage) -> Kibana (display)
  • Filebeat (collection) -> Kafka/Redis (buffering, peak shaving) -> Logstash (aggregation, processing) -> Elasticsearch (storage) -> Kibana (display)

Deployment

Overall server planning

Host     Role
Host 1   elasticsearch
Host 2   kibana, zookeeper, kafka
Host 3   logstash
Host 4   nginx, filebeat

JDK configuration

[root@localhost ~]# tar -zxf jdk-8u201-linux-x64.tar.gz -C /usr/local/
[root@localhost ~]# ls /usr/local/
bin  etc  games  include  jdk1.8.0_201  lib  lib64  libexec  sbin  share  src
[root@localhost ~]# vim /etc/profile    # add the following at the end of the file
export JAVA_HOME=/usr/local/jdk1.8.0_201
export JRE_HOME=/usr/local/jdk1.8.0_201/jre
export CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
[root@localhost ~]# source /etc/profile
[root@localhost ~]# java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

Elasticsearch deployment

[root@localhost ~]# wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.tar.gz
[root@localhost ~]# tar -zxf elasticsearch-6.3.2.tar.gz -C /usr/local/
[root@localhost ~]# ln -s /usr/local/elasticsearch-6.3.2/ /usr/local/es
groupadd es
useradd es -g es -p es
mkdir -p /es/{data,log}
chown -R es:es /usr/local/es/
chown -R es:es /es/data /es/log

Modify system parameters

echo '* soft nofile 819200' >> /etc/security/limits.conf
echo '* hard nofile 819200' >> /etc/security/limits.conf
echo '* soft nproc 2048' >> /etc/security/limits.conf
echo '* hard nproc 4096' >> /etc/security/limits.conf
echo '* soft memlock unlimited' >> /etc/security/limits.conf
echo '* hard memlock unlimited' >> /etc/security/limits.conf
echo 'vm.max_map_count=655360' >> /etc/sysctl.conf
sysctl -p
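A quick sanity check after applying these settings (open a new login shell first, since the limits.conf changes only apply to new sessions); the expected values in the comments assume the settings above were applied unchanged:

sysctl vm.max_map_count   # should print vm.max_map_count = 655360
ulimit -Hn                # hard open-file limit, should be 819200
ulimit -l                 # max locked memory, should be unlimited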
es parameter setting
vim /usr/local/es/config/elasticsearch.yml 
cluster.name: bdqn
node.name: es
node.master: true
node.data: true
path.data: /es/data
path.logs: /es/log
network.host: 0.0.0.0
http.port: 9200
discovery.zen.minimum_master_nodes: 1
bootstrap.memory_lock: false
bootstrap.system_call_filter: false #these two lines must both be set

In the configuration file, everything other than the items modified above stays commented out. If the last item (bootstrap.system_call_filter) is left as true, some of ES's system calls will be restricted, which may cause ES to fail to start.

Seccomp (short for secure computing mode) is a security mechanism supported by the Linux kernel since version 2.6.23. In Linux, a large number of system calls are exposed directly to user-space programs, but not all of them are needed, and unsafe code abusing system calls poses a security threat to the system. With seccomp, we can restrict the system calls a program is allowed to use, which reduces the kernel's exposed surface and puts the program into a "secure" state.

#Adjust the JVM heap size according to the machine configuration

vim /usr/local/es/config/jvm.options 
-Xms1g
-Xmx1g
es control
[root@es1 ~]# su es
[es@es1 root]$ /usr/local/es/bin/elasticsearch -d
[es@es1 root]$ curl 192.168.43.249:9200
{
  "name" : "es",
  "cluster_name" : "bdqn",
  "cluster_uuid" : "KvsP1lh5Re2cxdgfNbk_eA",
  "version" : {
    "number" : "6.3.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "053779d",
    "build_date" : "2018-07-20T05:20:23.451332Z",
    "build_snapshot" : false,
    "lucene_version" : "7.3.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

[root@es1 ~]# iptables -F
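As an additional, optional check, the cluster health API should report a green or yellow status for this single-node setup (same host and port as above):

[root@es1 ~]# curl 192.168.43.249:9200/_cluster/health?pretty   # "status" should be green or yellow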

kibana deployment

#  Download and install
wget https://artifacts.elastic.co/downloads/kibana/kibana-6.3.2-linux-x86_64.tar.gz
tar zxvf kibana-6.3.2-linux-x86_64.tar.gz  -C  /usr/local/
ln -s /usr/local/kibana-6.3.2-linux-x86_64/ /usr/local/kibana

# Configure kibana
vim /usr/local/kibana/config/kibana.yml

server.port: 5601
server.host: "0.0.0.0"
elasticsearch.url: "http://192.168.43.249:9200"

#Start kibana
/usr/local/kibana/bin/kibana

kafka deployment

Deploy zookeeper
wget http://archive.apache.org/dist/zookeeper/zookeeper-3.4.12/zookeeper-3.4.12.tar.gz
tar zxvf zookeeper-3.4.12.tar.gz -C /usr/local
cd /usr/local/zookeeper-3.4.12
cp conf/zoo_sample.cfg conf/zoo.cfg

# edit zoo.cfg file
vim /usr/local/zookeeper-3.4.12/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/zkdata
dataLogDir=/data/zookeeper/zkdatalog
clientPort=2181

#Create directory
mkdir -p /data/zookeeper/{zkdata,zkdatalog}

#Configure ZK node ID
echo 1  > /data/zookeeper/zkdata/myid
 
 #start-up
/usr/local/zookeeper-3.4.12/bin/zkServer.sh start

#stop it
/usr/local/zookeeper-3.4.12/bin/zkServer.sh stop

#View status
/usr/local/zookeeper-3.4.12/bin/zkServer.sh status
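Optionally, a quick connectivity check using ZooKeeper's built-in four-letter commands (this assumes nc/netcat is installed on the host):

echo ruok | nc 127.0.0.1 2181   # replies "imok" if the server is running
echo stat | nc 127.0.0.1 2181   # prints version, connection and mode information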
kafka deployment
#Download version 2.12-2.1.0
wget http://mirrors.hust.edu.cn/apache/kafka/2.1.0/kafka_2.12-2.1.0.tgz

#Extract to the specified location
tar zxvf kafka_2.12-2.1.0.tgz  -C /usr/local/
ln -s /usr/local/kafka_2.12-2.1.0/ /usr/local/kafka


# vim  /usr/local/kafka/config/server.properties
broker.id=2
listeners=PLAINTEXT://192.168.43.74:9092
num.network.threads=5
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/data/kafka/kfdata
#newly added
delete.topic.enable=true
num.partitions=2
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
#the next two lines are uncommented (they are commented out in the default file)
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.hours=72 
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=192.168.43.74:2181
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
kafka management
#Start service:
/usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties 
#Stop service:
/usr/local/kafka/bin/kafka-server-stop.sh
kafka queue

Create topic

/usr/local/kafka/bin/kafka-topics.sh --create \
--zookeeper 192.168.43.74:2181 \
--replication-factor 1 --partitions 2 \
--topic elktest

View kafka queues
/usr/local/kafka/bin/kafka-topics.sh --zookeeper 192.168.43.74:2181 --topic "elktest" --describe
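Optionally, list all topics registered in this ZooKeeper to confirm the topic was created (same ZooKeeper address as above):

/usr/local/kafka/bin/kafka-topics.sh --zookeeper 192.168.43.74:2181 --list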

kafka communication verification

Produce a message
/usr/local/kafka/bin/kafka-console-producer.sh --broker-list 192.168.43.74:9092 --topic elktest
Type hello at the prompt and press Enter

Consume messages
/usr/local/kafka_2.12-2.1.0/bin/kafka-console-consumer.sh --bootstrap-server 192.168.43.74:9092 --topic elktest --from-beginning
After running this command you should see the message you just produced


After verification, you can exit both terminals

filebeat deployment

working principle

In any environment, an application can go down. Filebeat reads and forwards log lines, and if it is interrupted it remembers its position, so it resumes from where it left off once everything is back online.
Filebeat ships with internal modules (audit, Apache, Nginx, System and MySQL) that simplify the collection, parsing and visualization of common log formats with a single command.

Filebeat will not overload your pipeline. If Filebeat sends data to Logstash and Logstash is busy processing data, Logstash tells Filebeat to slow down its reads. Once the congestion is resolved, Filebeat returns to its original speed and keeps shipping.

Filebeat keeps the state of each file and frequently flushes this state to a registry file on disk. The state remembers the last offset a harvester was reading and ensures that every log line is sent. Because Filebeat stores the delivery status of each event in the registry file, it can guarantee that events reach the configured output at least once, with no data loss.


Filebeat has two main components:

  • Harvester: a harvester is responsible for reading the content of a single file. It reads the file line by line and sends the content to the output. One harvester is started for each file. The harvester is responsible for opening and closing the file, which means the file descriptor stays open while the harvester is running; if the file is deleted or renamed while its harvester is reading it, Filebeat will keep reading it. The side effect is that the disk space cannot be released until the harvester for that file is closed. By default, Filebeat keeps the file open until close_inactive is reached.

  • Input: an input manages the harvesters and finds all the sources to read from. If the input type is log, the input looks for all files on disk that match the defined glob paths and starts a harvester for each file (see the sketch after this list).
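A minimal input sketch illustrating the harvester/input behaviour described above; the log path is hypothetical, and the close_inactive and scan_frequency values are just example settings:

filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log     # one harvester is started for each file matching this glob
    close_inactive: 5m         # close the harvester (and release the file handle) after 5 minutes without new lines
    scan_frequency: 10s        # how often the input checks the glob for new files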

Installation configuration
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.3.2-linux-x86_64.tar.gz

tar zxvf filebeat-6.3.2-linux-x86_64.tar.gz -C /usr/local/
ln -s /usr/local/filebeat-6.3.2-linux-x86_64/ /usr/local/filebeat

Filebeat must be configured before it is started. It has not been configured yet, so do not run the startup command below for now.

start-up
/usr/local/filebeat/filebeat -c  /usr/local/filebeat/filebeat.yml
parameter
  • input parameters
    paths
        The log paths to monitor
    
    encoding
        Encoding of the file; plain performs no validation or conversion of the input
    
    exclude_lines
        A set of regular expressions matching the lines you want to exclude. Filebeat drops lines that match this set of regular expressions. By default, no lines are dropped. Blank lines are ignored
    
    include_lines
        A set of regular expressions matching the lines you want to include. Filebeat only exports lines that match this set of regular expressions. By default, all lines are exported. Blank lines are ignored
    
    harvester_buffer_size
        The buffer size in bytes used by each harvester when reading a file. 16384 by default
    
    max_bytes
        The maximum number of bytes allowed for a single log message. Bytes beyond max_bytes are discarded and not sent. This setting is useful for multiline log messages, which tend to be large. The default is 10MB (10485760)
    
    json
        Options enabling Filebeat to parse the log as JSON messages
    
    multiline
        Controls how Filebeat combines log messages that span multiple lines
    
    exclude_files
        A set of regular expressions matching the files you want to ignore. No files are excluded by default
    
    ignore_older
        Tells Filebeat to ignore files last modified before the given time span, e.g. 2h (2 hours) or 5m (5 minutes)
    
    scan_frequency
        How often to check the configured paths for new files
  • Multiline multiline parameters
    multiline.pattern
        Specifies a regular expression to match multiple lines
    
    multiline.negate
        Defines whether a pattern is negated. The default is false.
    
    multiline.match
        Specifies how Filebeat merges multiple rows into one event. The optional values are after or before.
    
    multiline.max_lines
        The maximum number of lines that can be combined into one event. If a multiline message contains more than max_lines lines, the extra lines are discarded. The default is 500.
  • output parameters (a combined example of the input and output settings follows this list)
    enabled
        Enable or disable the output. Default true
    
    hosts
        List of nodes. Events are distributed to these nodes in round-robin order. If one node becomes unreachable, events are automatically sent to the next node. Each node can be a URL or IP:PORT
    
    username
        User name for authentication
    
    password
        Password for user authentication
    
    protocol
        The optional values are: http or https. The default is http
    
    headers
        Add custom HTTP headers to each request output
    
    index
        Index name, to which index
    
    timeout
        Request timeout. The default is 90 seconds.
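A combined sketch that puts several of the input and output parameters above together; the log path, the line filters and the multiline rule are illustrative values only, not part of the deployment below:

filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log               # hypothetical log path
    encoding: plain
    exclude_lines: ['^DEBUG']            # drop DEBUG lines
    include_lines: ['^ERROR', '^WARN']   # keep only ERROR and WARN lines
    ignore_older: 2h                     # skip files not modified in the last 2 hours
    multiline.pattern: '^[[:space:]]'    # lines starting with whitespace...
    multiline.negate: false
    multiline.match: after               # ...are appended to the previous line (e.g. stack traces)

output.elasticsearch:
  hosts: ["192.168.43.249:9200"]
  protocol: "http"
  timeout: 90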
Transmission scheme
  • output.elasticsearch

If you want Filebeat to output data directly to Elasticsearch, you need to configure output.elasticsearch

output.elasticsearch:
  hosts: ["192.168.43.249:9200"]
  • output.logstash

If you use Filebeat to output data to Logstash, and Logstash then outputs it to Elasticsearch, you need to configure output.logstash. When Logstash and Filebeat work together, if Logstash is busy processing data it tells Filebeat to slow down its reads; once the congestion is resolved, Filebeat returns to its original speed and continues shipping. This reduces pipeline overload.

output.logstash:
  hosts: ["192.168.199.5:5044"]  #logstaship
  • output.kafka

If you use Filebeat to output data to Kafka, with Logstash as the consumer pulling logs from Kafka and then outputting them to Elasticsearch, you need to configure output.kafka

output.kafka:
  enabled: true
  hosts: ["192.168.43.74:9092"]
  topic: elktest

logstash deployment

working principle

Logstash has two required elements, input and output, and one optional element, filter. These three elements represent the three stages of Logstash event processing: input -> filter -> output

  • input collects data from the data source.
  • filter transforms the data into the format or content you specify.
  • output transfers the data to its destination.

In real-world scenarios there is usually more than one input, output and filter. Logstash manages these three elements as plugins, so you can flexibly choose the plugins needed at each stage according to the application's needs and combine them, as in the sketch below.
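A minimal three-stage pipeline sketch (the file path is hypothetical, and the grok pattern assumes logs in the standard combined access-log format):

input {
  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }   # parse each line into structured fields
  }
}

output {
  elasticsearch {
    hosts => ["192.168.43.249:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}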

configuration file
  • logstash.yml: the default Logstash configuration file, with settings such as node.name, path.data, pipeline.workers and queue.type. These settings can be overridden by the corresponding command-line parameters.
  • jvm.options: the JVM configuration file for Logstash.
  • startup.options (Linux): contains the options used by the system installation script in /usr/share/logstash/bin to build the appropriate startup script for your system. When the Logstash package is installed, the script runs at the end of the installation process and uses startup.options to set the user, group, service name and service description.
  • pipelines.yml: the file that defines the data processing pipelines.
Common input modules

Logstash supports a variety of inputs and can capture events from many common sources at the same time. It can continuously collect data in a streaming fashion from logs, metrics, web applications, data stores and various AWS services.

  • File: read from a file on the file system
  • syslog: listen for system log messages on well-known port 514 and parse them according to RFC3164 format
  • Redis: read from the redis server, and use the redis channel and redis list. Redis is often used as an "agent" in a centralized Logstash installation, which queues Logstash events received from remote Logstash "shippers.".
  • beats: handles events sent by Filebeat.
Common filter modules

A filter is an intermediate processing device in the Logstash pipeline. You can combine filters with conditionals to perform actions on events that meet certain criteria.

  • Grok: parse and structure arbitrary text. Grok is currently the best way for Logstash to parse unstructured log data into structured and queryable data.
  • mutate: performs a general conversion on an event field. You can rename, delete, replace, and modify fields in an event.
  • drop: completely discards an event, such as a debug event.
  • clone: make a copy of an event, which may add or remove fields.
  • geoip: add information about the geographic location of the IP address
Common output
  • Elasticsearch: sends event data to Elasticsearch (the recommended approach).
  • File: writes event data to a file or disk.
  • Graphite: send event data to graphite (a popular open source tool for storing and drawing metrics, http://graphite.readthedocs.io/en/latest/ ).
  • Statsd: sends event data to statsd (a service that listens for statistics, such as counters and timers, sent over UDP and aggregates to one or more pluggable back-end services).
Common codec plugins
  • JSON: encode or decode data in JSON format.
  • Multiline: combines multi-line text events, such as Java exceptions and stack trace messages, into a single event (see the sketch below).
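A sketch of the multiline codec attached to a file input, merging indented continuation lines (such as Java stack traces) into the preceding event; the Tomcat log path is only illustrative here:

input {
  file {
    path => "/usr/local/apache-tomcat-8.0.53/logs/catalina.out"
    codec => multiline {
      pattern => "^\s"
      what => "previous"     # lines matching the pattern belong to the previous line
    }
  }
}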
Transmission scheme
  • Transfer to ES using beats:
input {
  beats {
    port => 5044 # must match the port configured in filebeat.yml
  }
}


output {
  elasticsearch {
    hosts => "192.168.199.4:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}" 
    document_type => "%{[@metadata][type]}" 
  }
}
  • Read from Kafka and transfer to ES:
input {
      kafka {
        bootstrap_servers => "192.168.199.6:9092"
        auto_offset_reset => "latest"
        consumer_threads => 5
        decorate_events => true
        topics => ["elktest"]
      }
}
          
output { 
    elasticsearch { 
        hosts => ["192.168.199.4:9200"]
        index => "elktest-%{+YYYY.MM.dd}"
        }
}            
Installation test
# Download and install
wget https://artifacts.elastic.co/downloads/logstash/logstash-6.3.2.tar.gz
tar zxvf logstash-6.3.2.tar.gz  -C /usr/local/
ln -s /usr/local/logstash-6.3.2/ /usr/local/logstash


vim /usr/local/logstash/config/logstash.yml
#Enable automatic reloading of configuration files
config.reload.automatic: true

#Interval at which configuration files are checked for changes
config.reload.interval: 10    

#Define access host name, generally domain name or IP
http.host: "192.168.199.5"    

#Configuration file directory
path.config: /usr/local/logstash/config/*.conf   #Note: there must be a space between the colon and the path

#Log output path
path.logs: /var/log/logstash

#Test (after the command starts, type any message and it is echoed back as a structured event). There is no .conf file yet, so skip this command for now
/usr/local/logstash/bin/logstash -e  'input{stdin{}}output{stdout{}}'

Usage

Experiment 1

The key point of this experiment is to understand how the components work together and to learn how to use Beats (Filebeat)

NginxLog —> Filebeat —> Logstash —> ES —> Kibana

Node description
Hostname   Address          Role             Function                        Version
es         192.168.43.249   Elasticsearch    Log storage                     elasticsearch-6.3.2
logstash   192.168.43.147   Logstash         Log processing                  logstash-6.3.2
zkk        192.168.43.74    Kibana           Log display                     kibana-6.3.2
app1       192.168.43.253   Nginx, Filebeat  Log production, log collection  filebeat-6.3.2
Prerequisites
  1. ES installed, configured and started
  2. Filebeat installed
  3. Logstash installed
  4. Kibana installed and configured
  5. Related ports opened in the firewall
  6. SELinux disabled
Service configuration
  • filebeat configuration
[root@es3 ~]# vim /usr/local/filebeat/filebeat.yml 
#=========================== Filebeat inputs =============================

#filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
filebeat:
  prospectors:
    - type: log
      # Change to true to enable this input configuration.
      enabled: true
      paths:
        - /usr/local/nginx/logs/access.log
      tags: ["nginx"]

#----------------------------- Logstash output --------------------------------
output.logstash:
    hosts: ["192.168.43.147:5044"]

# In the default filebeat.yml, the elasticsearch output section above the logstash output must be commented out

#Run
/usr/local/filebeat/filebeat -c  /usr/local/filebeat/filebeat.yml
  • Install and start Nginx on the Filebeat host (steps omitted)
  • logstash configuration
[root@localhost ~]# vim /usr/local/logstash/config/nginx-access.conf
input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
        hosts => "192.168.43.249:9200"
        index => "nginx-%{+YYYY.MM.dd}"
  }
}

#Run
/usr/local/logstash/bin/logstash -f /usr/local/logstash/config/nginx-access.conf

  • kibana display
    Access port 5601 locally on the Kibana host
    firefox 192.168.43.74:5601

    Click Discover in the left menu, enter nginx-* as the index pattern in the input box on the new page, then click Next

    After adding the index pattern, click Visualize in the left menu to create a visualization, and choose Vertical Bar on the new page

    Select the index pattern just created and you should see the data. If the chart is sparse, access Nginx a few more times to generate more log entries


View index list on es for verification

[root@es1 ~]# curl 192.168.43.249:9200/_cat/indices?v
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana          kT2cHcPUQEuRZ4bNr4bhZQ   1   0          2            0     10.6kb         10.6kb
red    open   index            WgQXJcVDSf6BiVAogsKk5g   5   1          1            0      3.6kb          3.6kb
yellow open   nginx-2019.02.08 qKww-bo7SOqGSjEb3wAj8A   5   1         26            0    229.2kb        229.2kb
red    open   bdqntest         BnAC8m91TzKWGPsxwj23nA   5   1          0            0       781b           781b
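Optionally, a quick search against the new index confirms that documents are arriving (same host and port):

[root@es1 ~]# curl '192.168.43.249:9200/nginx-*/_search?pretty&size=1'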

Experiment 2

The focus of this experiment is to understand how the components work together in an architecture that adds a message queue

NginxLog —> Filebeat —> Kafka —> Logstash —> ES —>Kibana

Node description
Hostname   Address          Role                      Function                        Version
es         192.168.43.249   Elasticsearch             Log storage                     elasticsearch-6.3.2
logstash   192.168.43.147   Logstash                  Log processing                  logstash-6.3.2
zkk        192.168.43.74    Kibana, Kafka, ZooKeeper  Log display                     kibana-6.3.2
app1       192.168.43.253   Nginx, Filebeat           Log production, log collection  filebeat-6.3.2
Prerequisites
  1. ES installed, configured and started
  2. Filebeat installed
  3. Logstash installed
  4. Kibana installed and configured
  5. Related ports opened in the firewall
  6. SELinux disabled
  7. ZooKeeper running
  8. Kafka configured and running
Service configuration
  • kafka configuration
#Create a topic and verify that producing/consuming works (in the same way as shown earlier)
/usr/local/kafka_2.12-2.1.0/bin/kafka-topics.sh --create \
--zookeeper 192.168.43.74:2181 \
--replication-factor 1 --partitions 2 \
--topic elktest1
  • filebeat configuration
[root@es3 nginx-1.15.4]# vim /usr/local/filebeat/filebeat.yml 
Building on the previous experiment, comment out the logstash output and add the kafka output below
output.kafka:
  enabled: true
  hosts: ["192.168.43.74:9092"]
  topic: "elktest1"

#Running filebeat
/usr/local/filebeat/filebeat -c  /usr/local/filebeat/filebeat.yml

  • logstash configuration
[root@localhost ~]# vim /usr/local/logstash/config/nginx-access.conf
input{
  kafka{
    bootstrap_servers => "192.168.43.74:9092"
    topics => "elktest"
    consumer_threads => 1
    decorate_events => true
    auto_offset_reset => "latest"

  }
}

output {
   elasticsearch {
         hosts => ["192.168.43.249:9200"]
         index => "nginx-kafka-%{+YYYY-MM-dd}"
   }
}

#Run
/usr/local/logstash/bin/logstash -f /usr/local/logstash/config/nginx-access.conf
  • kibana show

Start Kafka on the Kibana host (skip this step if it has already been started)

 #Start zookeeper
/usr/local/zookeeper-3.4.12/bin/zkServer.sh start
#Start kafka
/usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties 

Access port 5601 locally on the Kibana host
firefox 192.168.43.74:5601





Experiment 3

The key point of this experiment is to understand how the components work together and to learn how to merge multi-line logs

Tomcat(Catalina.out) —> Filebeat —> Kafka —> Logstash —> ES —>Kibana

Node description
Hostname   Address          Role                      Function                        Version
es         192.168.43.249   Elasticsearch             Log storage                     elasticsearch-6.3.2
logstash   192.168.43.147   Logstash                  Log processing                  logstash-6.3.2
zkk        192.168.43.74    Kibana, Kafka, ZooKeeper  Log display                     kibana-6.3.2
app1       192.168.43.253   Tomcat, Filebeat          Log production, log collection  filebeat-6.3.2
Prerequisites
  1. ES installed, configured and started
  2. Filebeat installed
  3. Logstash installed
  4. Kibana installed and configured
  5. Related ports opened in the firewall
  6. SELinux disabled
  7. ZooKeeper running
  8. Kafka configured and running
  9. Tomcat 8 installed and configured, with the default log format
Service configuration
  • kafka configuration
#Create a topic and verify that producing/consuming works (in the same way as shown earlier)
/usr/local/kafka_2.12-2.1.0/bin/kafka-topics.sh --create \
--zookeeper 192.168.43.74:2181 \
--replication-factor 1 --partitions 2 \
--topic elktest1
  • filebeat configuration
# Merge multi-line Tomcat log entries: lines that do not start with a timestamp (format: 16-Jan-2019 00:54:38.599 ...) are appended to the preceding line

[root@app1 filebeat]# cat filebeat.yml 
filebeat:
  prospectors:
    - type: log
      paths:
        - /usr/local/apache-tomcat-8.0.53/logs/catalina.out
      multiline.pattern: '^[0-9]{2}-[A-Za-z]*-[0-9]{4}'
      multiline.negate: true
      multiline.match: after

output.kafka:
  enabled: true
  hosts: ["192.168.43.74:9092"]
  topic: "elktest1"

#Running filebeat
/usr/local/filebeat/filebeat -c  /usr/local/filebeat/filebeat.yml
  • logstash configuration
[root@logstash config]# cat tomcat-catalina.conf 
input{
  kafka{
    bootstrap_servers => "192.168.43.74:9092"
    topics => "elktest"
    consumer_threads => 1
    decorate_events => true
    auto_offset_reset => "latest"
  }
}


filter {
  json {
    source => "message"
  }        
}

output {
   elasticsearch {
         hosts => ["192.168.43.249:9200"]
         index => "tomcat-kafka-%{+YYYY-MM-dd}"
   }
}

#Run
/usr/local/logstash/bin/logstash -f /usr/local/logstash/config/tomcat-catalina.conf
Result verification
  1. Check the original log format
  2. Restart Tomcat several times to simulate errors
  3. Kibana shows the merged multi-line events
  4. Expand an error log entry to confirm the lines were merged correctly

Topics: kafka ElasticSearch Zookeeper Nginx