ELKB log system

Posted by Servyces on Fri, 07 Jan 2022 10:43:59 +0100


Background

Why build this?

"Inspiration source" is the blogger's own project. The blogger of this project writes too freely. There are miscellaneous functions. He always reports some strange errors... Errors. Let's look at the log and find out how to make mistakes, right. However, this local report is wrong. It's ok. I'll take a direct look at the console. ok, it can be solved quickly. The fault is that there is a test environment. jenkins is used for pipeline deployment and runs on the server. If an error is reported, you can only look at the log file. This is the log, right? Tail - N 300 XXX Log or tail - f XXX Log, isn't this pure xx. ok, just build a log system by yourself and integrate jinkens by the way. This is all later

My logs live on a remote server, so I will install Beats there (it is lighter than Logstash), use Redis as a message queue, and deploy the real ELK stack on a virtual machine.

Why add Redis?

Because my virtual machine is on the intranet, I can't guarantee it will always be on. My first idea was an FTP service, pulling the logs down by FTP every time I wanted to check them, but that means downloading manually every single time, which is very annoying. Later I learned about Beats, but then I wondered: what if Filebeat is sending to Logstash right when my virtual machine is down? It turns out (from a post I happened to read) that Filebeat has a persistence mechanism: if it finds the destination unavailable, it keeps the data and re-sends it once the destination comes back. Still, if events get lost inside the message middleware itself, I could really cry. The chat system hasn't formally entered development yet, it's just a demo, so no proper message queue is in place; Redis will serve as the message queue for now. I'm still wondering whether it will blow through the remaining 1 GB of memory once things really take off. We'll deal with that when we get there. Who hasn't had a few accidents, right?

How to do it, and with what?

ES is a must; in fact I already have ES and Kibana running. The L and B are Logstash and Beats respectively.

Logstash is an open source data collection engine with good real-time performance. It can dynamically unify data from different sources and ship it to whatever destination you want, including straight into ES. I remember when I first learned ES I hand-wrote Java code to pull the data out of MySQL and assemble it into ES myself. Logstash can replace that step entirely, so the Java side only has to care about the core retrieval business.
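As a hedged illustration of that replacement (the database, table, credentials and index name below are all made up, and the MySQL JDBC driver jar has to be downloaded separately), a Logstash pipeline pulling rows from MySQL into ES could look roughly like this:

input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-8.0.28.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/demo"
    jdbc_user => "demo"
    jdbc_password => "changeme"
    statement => "SELECT id, title, content FROM article"
    schedule => "*/5 * * * *"   # poll every 5 minutes
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # placeholder ES address
    index => "article"
    document_id => "%{id}"               # reuse the MySQL primary key as the document id
  }
}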

Beats is a family of open source data shippers. They are installed on the machine being monitored and can even talk to ES directly. There are several kinds; since log collection here just means reading log files, Filebeat is fully up to the job.

Filebeat

Download and install

https://www.elastic.co/guide/en/beats/filebeat/7.16/setup-repositories.html#_yum

I just use the yum installation method from the official website (my server runs CentOS 8.2)

# Import the Elastic GPG key
sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
# Configure the yum repository
vim /etc/yum.repos.d/elastic.repo
# Paste the following, then save and quit with :wq
[elastic-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
# install
sudo yum install filebeat
# Enable start on boot
sudo systemctl enable filebeat
# On CentOS 7.x, enable start on boot with
sudo chkconfig --add filebeat
# The directory structure is very similar to nginx
# home -- /usr/share/filebeat
# bin -- /usr/share/filebeat/bin
# conf -- /etc/filebeat
# data -- /var/lib/filebeat
# logs -- /var/log/filebeat
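
A quick sanity check before configuring anything (these are standard Filebeat 7.x subcommands under the rpm layout above):

# print the installed version
filebeat version
# validate the configuration file
sudo filebeat test config -c /etc/filebeat/filebeat.yml
# once an output is configured, connectivity to it can be checked as well
sudo filebeat test output -c /etc/filebeat/filebeat.yml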

The documentation home page is below; other installation methods are described there.

https://www.elastic.co/guide/en/beats/filebeat/current/index.html

Simple configuration

# ============================== Filebeat inputs ===============================

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/*.log

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1

# ================================== Outputs ===================================

# -------------------------------- Redis Output --------------------------------
output.redis:
  hosts: ["47.96.5.71:9736"]
  password: "Tplentiful"
  key: "filebeat:"
  db: 0
  timeout: 5
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~


Data sample

Here I output the data to Redis and view it with a visualization tool.

input configuration

Since I only collect the log files of my own project, I don't need any other complicated Input configuration for Filebeat. Anyone interested can check the official documentation.

filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  paths:
    - /data/family_doctor/logs/article/*.log
    - /data/family_doctor/logs/article/*/*.log
  fields:
    filename: article_log
  fields_under_root: true
  multiline:
    type: pattern
    pattern: '^\[[0-9]{4}-[0-9]{2}-[0-9]{2}'
    negate: true
    match: after
  
- type: log
  enabled: true
  backoff: "1s"
  paths:
    - /data/family_doctor/logs/other/*.log
    - /data/family_doctor/logs/other/*/*.log
  # Add a custom field so the output can be routed on it
  fields:
    filename: other_log
  fields_under_root: true
  # Use a regular expression matching the leading timestamp to merge multi-line log entries (the example from the official website)
  multiline:
    type: pattern
    # the type can also be "count", which merges a fixed number of lines
    pattern: '^\[[0-9]{4}-[0-9]{2}-[0-9]{2}'
    negate: true
    match: after
...
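
To make the multiline rule concrete, here is a made-up log fragment: only the first line matches the timestamp pattern, and with negate: true and match: after the following lines are appended to it, so the whole stack trace ends up as a single event.

[2022-01-07 10:43:59] ERROR ArticleService - query failed
java.lang.NullPointerException: null
    at com.example.ArticleService.get(ArticleService.java:42)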

output configuration

output.redis:
  hosts: ["host:port"]
  password: "Tplentiful"
  # Default key (used when none of the conditions below match)
  key: "default_log"
  db: 0
  timeout: 5
  keys:
  # Route events to different keys based on the custom filename field
    - key: "article_log"
      when.equals:
        filename: article_log
    - key: "other_log"
      when.equals:
        filename: other_log
    - key: "user_log"
      when.equals:
        filename: user_log
    - key: "sys_log"
      when.equals:
        filename: sys_log
    - key: "rtc_log"
      when.equals:
        filename: rtc_log

filebeat startup script

#!/bin/bash
agent="/usr/share/filebeat/bin/filebeat"
args="-c /etc/filebeat/filebeat.yml"
# Validate the configuration before starting ("filebeat test config" exits after the check)
check_config() {
    $agent test config $args
}
start() {
    pid=`ps -ef | grep $agent | grep -v grep | awk '{print $2}'`
    if [ ! "$pid" ];then
        echo "Starting filebeat: "
        check_config
        if [ $? -ne 0 ]; then
            echo "filebeat configuration check failed"
            exit 1
        fi
        $agent $args &
        if [ $? -eq 0 ];then
            echo "filebeat started successfully"
        else
            echo "filebeat failed to start"
        fi
    else
        echo "filebeat is already running"
        exit
    fi
}
stop() {
    echo -n $"Stopping filebeat: "
    pid=`ps -ef | grep $agent | grep -v grep | awk '{print $2}'`
    if [ ! "$pid" ];then
        echo "filebeat is already stopped"
    else
        kill $pid
        echo "filebeat stopped"
    fi
}
restart() {
    stop
    start
}
status(){
    pid=`ps -ef | grep $agent | grep -v grep | awk '{print $2}'`
    if [ ! "$pid" ];then
        echo "filebeat is stopped"
    else
        echo "filebeat is running"
    fi
}
case "$1" in
    start)
        start
    ;;
    stop)
        stop
    ;;
    restart)
        restart
    ;;
    status)
        status
    ;;
    *)
        echo $"Usage: $0 {start|stop|restart|status}"
        exit 1
esac
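
Usage is then just (the file name filebeat.sh is my own choice):

chmod +x filebeat.sh
sudo ./filebeat.sh start
sudo ./filebeat.sh status
sudo ./filebeat.sh stop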

Logstash

Download and install

https://www.elastic.co/guide/en/logstash/current/installing-logstash.html

# Logstash needs a JDK; install one first (there are plenty of guides online, so I won't repeat them here)
# Import the GPG key (rpm package management)
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
# Create and edit the repo file
vim /etc/yum.repos.d/logstash.repo
# Paste the following
[logstash-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
# install
sudo yum install logstash
# Directory structure and default location
# home -- /usr/share/logstash
# bin -- /usr/share/logstash/bin
# settings -- /etc/logstash
# conf -- /etc/logstash/conf.d/*.conf 
# logs -- /var/log/logstash
# plugins -- /usr/share/logstash/plugins
# data -- /var/lib/logstash

Basic knowledge

https://www.elastic.co/guide/en/logstash/current/pipeline.html

Input

Data sources collected

There are four common forms of Logstash input:

  1. File: a standard file system; collecting data works much like tail -fn 100
  2. syslog: the system's own logs (I don't know much about this one)
  3. Redis: reads data from Redis (really message middleware)
  4. Beats: reads data directly from Beats

There are more: ES, HTTP, JDBC, Kafka, Log4j, RabbitMQ, WebSocket, and so on

Filters

Filters are the event-processing stage of the Logstash pipeline; they filter and process different events according to the conditions you specify

  1. grok: according to the official site, currently the best way to parse unstructured data into something structured and searchable (a minimal sketch follows this list)
  2. mutate: works on event fields, e.g. renaming, deleting, modifying and replacing them
  3. drop: drops the event entirely
  4. clone: copies an event, optionally adding or removing fields
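
As a minimal sketch (not my actual config), assuming log lines that start with the bracketed timestamp used in the Filebeat multiline pattern above, a grok plus mutate filter could look like this:

filter {
  grok {
    # parse "[2022-01-07 10:43:59] INFO some message" style lines
    match => { "message" => "\[%{TIMESTAMP_ISO8601:log_time}\] %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
  mutate {
    # drop a couple of Filebeat metadata fields I don't need in ES
    remove_field => ["agent", "ecs"]
  }
}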

Output

This is the last stage of the Logstash pipeline. An event can pass through several different outputs; once all of them have completed, the event is done

  1. ElasticSearch: direct docking with ES
  2. file: write data to disk
  3. email: send by mail
  4. Http: in the form of HTTP request
  5. Message middleware: Kafka, RabbitMQ, Redis (standing in as message middleware here)
  6. WebSocket: sent in the form of socket

Codecs

Codecs let data be encoded or decoded into different formats as it enters or leaves the pipeline

  1. JSON: encode/decode in JSON format
  2. Multiline: merges multiple lines into a single event

configuration file

logstash.yml

Parameters passed on the command line override the settings in this yml

# Batch settings, roughly an upper bound on collection: how many events go into a batch and how long to wait before flushing an incomplete one
pipeline:
  batch:
    size: 50
    delay: 5
# Node name
node.name: logstash1
# Data path
path.data: /var/lib/logstash
# Automatically reload the pipeline configuration. Don't force-kill Logstash: if buffered events have not been flushed yet, data will be lost
config.reload.automatic: true
# How often to check the configuration for changes
config.reload.interval: 10s
# host and port
api.http.host: 192.168.5.128
api.http.port: 9600
# Log path
path.logs: /var/log/logstash

 

pipelines.yml

A Logstash instance can run multiple pipeline configurations

I haven't configured anything here, so I'm not using this more advanced feature. The official docs say that settings here override the pipeline settings in the logstash.yml above (a minimal sketch follows below).
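
For reference, a minimal sketch of what /etc/logstash/pipelines.yml could look like if I did use it (the id and worker count are made up):

- pipeline.id: filebeat-logs
  path.config: "/etc/logstash/conf.d/*.conf"
  pipeline.workers: 1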

jvm.options

JVM parameters for Logstash

## JVM configuration

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms384m
-Xmx384m

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly

## Locale
# Set the locale language
#-Duser.language=en

# Set the locale country
#-Duser.country=US

# Set the locale variant, if any
#-Duser.variant=

## basic

# set the I/O temp directory
#-Djava.io.tmpdir=$HOME

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
#-Djna.nosys=true

# Turn on JRuby invokedynamic
-Djruby.compile.invokedynamic=true
# Force Compilation
-Djruby.jit.threshold=0
# Make sure joni regexp interruptability is enabled
-Djruby.regexp.interruptible=true

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${LOGSTASH_HOME}/heapdump.hprof

## GC logging
#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime

# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${LS_GC_LOG_FILE}

# Entropy source for randomness
-Djava.security.egd=file:/dev/urandom

# Copy the logging context from parent threads to children
-Dlog4j2.isThreadContextMapInheritable=true

17-:--add-opens java.base/sun.nio.ch=ALL-UNNAMED
17-:--add-opens java.base/java.io=ALL-UNNAMED

 

log4j2.properties

Logstash's own logging configuration file; I just use the defaults. This log is only something we read ourselves when troubleshooting Logstash.

startup.options

Startup parameter configuration, some paths and working directories, and startup user configuration

The Logstash input/output configuration itself is based on the logstash-sample.conf that ships under /etc/logstash/ and goes into the conf.d configuration directory.
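
The actual conf.d file isn't shown here, so below is a minimal sketch of a Redis-to-ES pipeline matching the Filebeat Redis keys above (the file name and index naming scheme are my own choices, not the real config). The other keys (other_log, user_log, ...) would each get their own redis input block or their own conf file.

# /etc/logstash/conf.d/article_log.conf -- a minimal sketch
input {
  redis {
    host => "47.96.5.71"
    port => 9736
    password => "Tplentiful"
    db => 0
    data_type => "list"
    key => "article_log"
  }
}
output {
  elasticsearch {
    hosts => ["http://192.168.5.128:9200"]
    # index naming scheme is my own choice
    index => "article_log-%{+YYYY.MM.dd}"
  }
}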

ElasticSearch

Download and install

# Classic three piece set
rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
vim /etc/yum.repos.d/elasticsearch.repo
# Copy it in
[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md
# Execute
sudo yum install --enablerepo=elasticsearch elasticsearch
# Directory structure and default paths
# home -- /usr/share/elasticsearch
# bin -- /usr/share/elasticsearch/bin  (the startup binaries live here; in practice the service script under /etc/init.d/ is what gets used)
# logs -- /var/log/elasticsearch  (log file directory)
# conf -- /etc/elasticsearch  (ES configuration file directory)
# env conf -- /etc/sysconfig/elasticsearch  (ES environment variables, JVM parameters, etc.)
# data -- /var/lib/elasticsearch  (data directory)
# jdk -- /usr/share/elasticsearch/jdk  (the bundled JDK; which JDK is used can be changed in the env conf above)
# plugins -- /usr/share/elasticsearch/plugins  (plugins, e.g. an analyzer plugin)

For more usage of ES, please refer to the official website documents or the Chinese community

https://www.elastic.co/guide/en/elasticsearch/reference/7.16/index.html

configuration file

elasticsearch.yml

# Path: /etc/elasticsearch/elasticsearch.yml
# Node name
node.name: loges1
# Data directory
path.data: /var/lib/elasticsearch
# Log directory
path.logs: /var/log/elasticsearch
# IP address of your own server
network.host: 192.168.5.128
# ES port; remember to open it in the firewall
http.port: 9200
# Maximum amount of data per request
http.max_content_length: 50mb
# ES forms a cluster even on one node; configure the initial master node(s)
cluster.initial_master_nodes: ["loges1"]

log4j2.properties

# Left as it was; I didn't change it. It configures ES's built-in Java logging and there isn't much to say about it

jvm.options

# JVM parameters used when ES starts. The prefix on a line is the JDK version range it applies to, and the rest are settings: garbage collector, whether to print GC logs, and so on. I only changed the heap size; I shouldn't need much space, so I set -Xms512m -Xmx512m

elasticsearch

# Path / etc/sysconfig/elasticsearch
# Environment variable parameters. Setting ES_JAVA_HOME here didn't seem to take effect for me, so I set it directly in /etc/profile and ran source /etc/profile
ES_JAVA_HOME=/usr/share/elasticsearch/jdk
ES_PATH_CONF=/etc/elasticsearch
ES_JAVA_OPTS="-Xms512m -Xmx512m"
ES_STARTUP_SLEEP_TIME=5
# Everything else is commented out; if problems show up later I'll fix and add them one by one

Note: ES cannot be started directly with the root account (it will refuse and report an error), so it's best to use a separate account. In fact, the installation already created a user called elasticsearch and a group of the same name for us. The user name is a bit long, so rather than creating a new account I just add my own user to that group. Don't forget to give that user permission on the directories above.

Then just start ES: /usr/share/elasticsearch/bin/elasticsearch -d
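
A hedged sketch of those last steps under the rpm layout above (the user name myuser is made up; alternatively ES can simply be run as the packaged elasticsearch user):

# add my own user to the group the rpm install created (user name is made up)
sudo usermod -aG elasticsearch myuser
# the data/log/config directories must be writable by whichever non-root user runs ES;
# the simplest option is to run as the packaged elasticsearch user, pointing at the rpm config directory
sudo -u elasticsearch env ES_PATH_CONF=/etc/elasticsearch /usr/share/elasticsearch/bin/elasticsearch -d
# quick check from another shell: should print the cluster name and version as JSON
curl http://192.168.5.128:9200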

Kibana

Download and install

# Import the GPG key
rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
vim /etc/yum.repos.d/kibana.repo
# Copy the following
[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
# Install
sudo yum install kibana
# Enable start on boot
sudo systemctl daemon-reload
sudo systemctl enable kibana.service
# Start / stop
sudo systemctl start kibana.service
sudo systemctl stop kibana.service
# directory structure
# home -- /usr/share/kibana
# bin -- /usr/share/kibana/bin
# conf -- /etc/kibana
# data -- /var/lib/kibana
# logs -- /var/log/kibana
# plugins -- /usr/share/kibana/plugins

configuration file

server.port: 5601
server.host: "192.168.5.128"
server.name: "kibana-1"
elasticsearch.hosts: ["http://192.168.5.128:9200"]
# Then just start it as a non-root user

There's not much else to say about Kibana. I mainly use it to write some request templates; the real calls are ultimately made from Java, and in Kibana the nesting of the request parameters is clear at a glance.
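
For instance, a request template I might draft in Dev Tools to pull the latest errors out of a hypothetical article_log index (the index name and fields are assumptions based on the setup above) could look like this:

GET article_log-*/_search
{
  "size": 20,
  "sort": [ { "@timestamp": "desc" } ],
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "message": "ERROR" } },
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ]
    }
  }
}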

Updates will follow over time; for now I'm just posting the outline first.