Using Logstash plugins (input, filter, and output)

Posted by vineld on Tue, 11 Jan 2022 12:31:24 +0100

1: Input plugin (input)

The official documentation for input plugins: https://www.elastic.co/guide/en/logstash/current/input-plugins.html

Three examples follow:

  1. Standard input
  2. Reading a file
  3. Reading TCP network data

1 Standard input (stdin)

With stdin {} as the input, type something such as "Hello World" on the console and press Enter; the event is printed back on the same console by the stdout output with codec => rubydebug.

input{
    stdin{
       
    }
}
output {
    stdout{
        codec=>rubydebug    
    }
}

Test:

[root@localhost config]# ls
java_pid12011.hprof  jvm.options        logstash-sample.conf  my-app.conf    startup.options
java_pid21521.hprof  log4j2.properties  logstash.yml          pipelines.yml
[root@localhost config]# vim test1.conf
input{
    stdin{
       
    }
}
output {
    stdout{
        codec=>rubydebug    
    }
}
~
~
~
~
~
~
"test1.conf" [new] 17L, 167C Written                                                           
[root@localhost config]# ls
java_pid12011.hprof  jvm.options        logstash-sample.conf  my-app.conf    startup.options
java_pid21521.hprof  log4j2.properties  logstash.yml          pipelines.yml  test1.conf
[root@localhost config]# 

[root@localhost config]# ../bin/logstash -f test1.conf 
Using JAVA_HOME defined java: /opt/elasticsearch-7.6.1/jdk
WARNING: Using JAVA_HOME while Logstash distribution comes with a bundled JDK.
DEPRECATION: The use of JAVA_HOME is now deprecated and will be removed starting from 8.0. Please configure LS_JAVA_HOME instead.
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Sending Logstash logs to /usr/local/logstash/logs which is now configured via log4j2.properties
[2022-01-11T15:49:40,297][INFO ][logstash.runner          ] Log4j configuration path used is: /usr/local/logstash/config/log4j2.properties
[2022-01-11T15:49:40,315][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.16.2", "jruby.version"=>"jruby 9.2.20.1 (2.5.8) 2021-11-30 2a2962fbd1 OpenJDK 64-Bit Server VM 13.0.2+8 on 13.0.2+8 +indy +jit [linux-x86_64]"}
[2022-01-11T15:49:40,652][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2022-01-11T15:49:42,009][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600, :ssl_enabled=>false}
[2022-01-11T15:49:42,568][INFO ][org.reflections.Reflections] Reflections took 84 ms to scan 1 urls, producing 119 keys and 417 values 
[2022-01-11T15:49:43,491][INFO ][logstash.javapipeline    ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>250, "pipeline.sources"=>["/usr/local/logstash/config/test1.conf"], :thread=>"#<Thread:0x772c1831 run>"}
[2022-01-11T15:49:44,165][INFO ][logstash.javapipeline    ][main] Pipeline Java execution initialization time {"seconds"=>0.67}
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.jrubystdinchannel.StdinChannelLibrary$Reader (file:/usr/local/logstash/vendor/bundle/jruby/2.5.0/gems/jruby-stdin-channel-0.2.0-java/lib/jruby_stdin_channel/jruby_stdin_channel.jar) to field java.io.FilterInputStream.in
WARNING: Please consider reporting this to the maintainers of com.jrubystdinchannel.StdinChannelLibrary$Reader
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[2022-01-11T15:49:44,268][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
The stdin plugin is now waiting for input:
[2022-01-11T15:49:44,327][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
sss
{
       "message" => "sss",
          "host" => "localhost.localdomain",
    "@timestamp" => 2022-01-11T07:49:47.049Z,
      "@version" => "1"
}
hello world
{
       "message" => "hello world",
          "host" => "localhost.localdomain",
    "@timestamp" => 2022-01-11T07:49:57.473Z,
      "@version" => "1"
}
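
In the output above, message holds the typed text, and logstash has added the remaining fields automatically: host (the machine the event was read on), @timestamp (the time the event was received, in UTC), and @version.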

2 Reading a file (file)

How it works:

logstash uses a Ruby gem called filewatch to listen for changes to files, and a database file called sincedb to record the reading progress of each monitored log file.

  1. The default path of the sincedb data file is <path.data>/plugins/inputs/file, with a file name similar to sincedb_123456; the location can be overridden with the sincedb_path option (see the sketch below).
  2. path.data is the directory where logstash plugins store their data. It defaults to LOGSTASH_HOME/data.
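
A minimal sketch of overriding that location, assuming /tmp/my-sincedb is just an example path (path, start_position, and sincedb_path are all real file-input options; point sincedb_path at /dev/null if you never want the progress persisted across restarts):

input {
    file {
        path => ["/var/log/messages"]
        start_position => "beginning"
        # record the reading progress here instead of the default
        # <path.data>/plugins/inputs/file location
        sincedb_path => "/tmp/my-sincedb"
    }
}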

For example:

  1. path: the location of the logs; once this configuration is started, logstash reads the contents of the logs at that location
  2. start_position: tells logstash where to start when reading a file (in the configuration below, it starts from the beginning)
  3. Data is read line by line
  4. logstash keeps watching the files and reading new lines until it is shut down
input {
    file {
        path => ["/var/*/*"]
        start_position => "beginning"
    }
}
output {
    stdout{
        codec=>rubydebug    
    }
}

Without start_position => "beginning", logstash reads from the end of the file by default; that is, the logstash process picks up data line by line as it is appended, much like the tail -f command.

3 Reading TCP network data (tcp)

This reads data over the network. It was demonstrated earlier, so only the configuration is shown here.

input {
  tcp {
    port => "1234"
  }
}

filter {
  grok {
    match => { "message" => "%{SYSLOGLINE}" }
  }
}

output {
    stdout{
        codec=>rubydebug
    }
}
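
Assuming the pipeline above is running, a quick way to exercise it from another shell is to push a syslog-style line into port 1234 with netcat; grok's SYSLOGLINE pattern should then split it into fields such as timestamp, logsource, program, and pid:

echo 'Jan 11 16:00:00 myhost myapp[123]: hello tcp input' | nc localhost 1234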

2: Filter plugin (filter)

See the official documentation for details: https://www.elastic.co/guide/en/logstash/current/filter-plugins.html

1 Grok regular-expression capture (grok)

Grok parses arbitrary text with regular expressions, turning unstructured log data into a structured, easily queried form. It is the best way to parse unstructured log data in logstash.

When logstash reads a line of input, the whole line lands in the message field. To know exactly what each part of the line means, and to make querying easier, grok can split that one line into multiple named fields.

Grok's syntax rule is:

%{SYNTAX:SEMANTIC}

where SYNTAX is the name of the pattern to match against and SEMANTIC is the name of the field that stores the matched text.

For example:

  1. For example, the input content is:

    172.16.213.132 [07/Feb/2019:16:24:19 +0800] "GET / HTTP/1.1" 403 5039
    
    • %{IP:clientip} matches and yields: clientip: 172.16.213.132
    • %{HTTPDATE:timestamp} matches and yields: timestamp: 07/Feb/2019:16:24:19 +0800
    • %{QS:referrer} matches and yields: referrer: "GET / HTTP/1.1"
  2. The following combined matching pattern captures all of the content entered above:

    %{IP:clientip}\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:response}\ %{NUMBER:bytes}
    

    This combined pattern divides the input into five parts, i.e. five data fields. Splitting the input into distinct named fields like this makes parsing and querying the log data far easier later on, which is exactly the purpose of using grok.

  3. Complete example

    input{
        stdin{}
    }
    filter{
        grok{
            match => ["message","%{IP:clientip}\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:response}\ %{NUMBER:bytes}"]
        }
    }
    output{
        stdout{
            codec => "rubydebug"
        }
    }
    

    Enter log content:

    172.16.213.132 [07/Feb/2019:16:24:19 +0800] "GET / HTTP/1.1" 403 5039
    
  4. Execution process

    [root@localhost config]# ls
    java_pid12011.hprof  logstash-sample.conf  startup.options
    java_pid21521.hprof  logstash.yml          test1.conf
    jvm.options          my-app.conf
    log4j2.properties    pipelines.yml
    [root@localhost config]# vim test2.conf
    input {
        stdin {
        }
    }
    
    filter{
        grok{
            match => ["message","%{IP:clientip}\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:response}\ %{NUMBER:bytes}"]
        }
    }
    
    
    output {
        stdout {
            codec=>rubydebug
        }
    }
    ~
    ~
    ~
    ~
    "test2.conf" [new] 17L, 237C Written                
    [root@localhost config]# ../bin/logstash -f test2.conf 
    Using JAVA_HOME defined java: /opt/elasticsearch-7.6.1/jdk
    WARNING: Using JAVA_HOME while Logstash distribution comes with a bundled JDK.
    DEPRECATION: The use of JAVA_HOME is now deprecated and will be removed starting from 8.0. Please configure LS_JAVA_HOME instead.
    OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
    Sending Logstash logs to /usr/local/logstash/logs which is now configured via log4j2.properties
    [2022-01-11T16:31:45,624][INFO ][logstash.runner          ] Log4j configuration path used is: /usr/local/logstash/config/log4j2.properties
    [2022-01-11T16:31:45,636][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.16.2", "jruby.version"=>"jruby 9.2.20.1 (2.5.8) 2021-11-30 2a2962fbd1 OpenJDK 64-Bit Server VM 13.0.2+8 on 13.0.2+8 +indy +jit [linux-x86_64]"}
    [2022-01-11T16:31:45,919][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
    [2022-01-11T16:31:47,168][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600, :ssl_enabled=>false}
    [2022-01-11T16:31:47,858][INFO ][org.reflections.Reflections] Reflections took 70 ms to scan 1 urls, producing 119 keys and 417 values 
    [2022-01-11T16:31:49,066][INFO ][logstash.javapipeline    ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>250, "pipeline.sources"=>["/usr/local/logstash/config/test2.conf"], :thread=>"#<Thread:0xfabd406 run>"}
    [2022-01-11T16:31:49,824][INFO ][logstash.javapipeline    ][main] Pipeline Java execution initialization time {"seconds"=>0.75}
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by com.jrubystdinchannel.StdinChannelLibrary$Reader (file:/usr/local/logstash/vendor/bundle/jruby/2.5.0/gems/jruby-stdin-channel-0.2.0-java/lib/jruby_stdin_channel/jruby_stdin_channel.jar) to field java.io.FilterInputStream.in
    WARNING: Please consider reporting this to the maintainers of com.jrubystdinchannel.StdinChannelLibrary$Reader
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
    [2022-01-11T16:31:49,910][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
    [2022-01-11T16:31:49,985][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
    The stdin plugin is now waiting for input:
    172.16.213.132 [07/Feb/2022:16:24:19 +0800] "GET / HTTP/1.1" 403 5039
    {
          "response" => "403",
          "clientip" => "172.16.213.132",
        "@timestamp" => 2022-01-11T08:32:38.109Z,
             "bytes" => "5039",
          "referrer" => "\"GET / HTTP/1.1\"",
           "message" => "172.16.213.132 [07/Feb/2022:16:24:19 +0800] \"GET / HTTP/1.1\" 403 5039",
         "timestamp" => "07/Feb/2022:16:24:19 +0800",
              "host" => "localhost.localdomain",
          "@version" => "1"
    }
    
    

2 Time processing (date)

The date plugin is especially important for sorting events and backfilling old data. It converts the time field in a log record into a LogStash::Timestamp object and then stores it in the @timestamp field.
Example (only the filter part is listed):

filter {
    grok {
        match => ["message", "%{HTTPDATE:timestamp}"]
    }
    date {
        match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    }
}
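
By default the parsed time overwrites @timestamp. A minimal sketch of keeping @timestamp untouched instead, using the date filter's target option (parsed_time is an arbitrary example field name):

filter {
    grok {
        match => ["message", "%{HTTPDATE:timestamp}"]
    }
    date {
        match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
        # store the parsed time in a custom field instead of @timestamp
        target => "parsed_time"
    }
}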

3 Data modification (mutate)

3.1 Regular-expression replacement within a field (gsub)

gsub replaces the portion of a field's value that matches a regular expression; it only works on string fields.

For example:

  1. This example replaces every "/" character in the filed_name_1 field with "_" (for example, "a/b/c" becomes "a_b_c"):
    filter {
        mutate {
            gsub => ["filed_name_1", "/" , "_"]
        }
    }
    

3.2 Splitting a string into an array by a delimiter (split)

split splits the string in a field into an array on a specified delimiter.
For example:

  1. Split the filed_name_2 field into an array on the "|" delimiter (for example, "a|b|c" becomes ["a", "b", "c"]):
    filter {
        mutate {
            split => ["filed_name_2", "|"]
        }
    }
    

3.3 Renaming a field (rename)

rename renames a field.

For example:

  1. Rename the old_field field to new_field:
    filter {
        mutate {
            rename => { "old_field" => "new_field" }
        }
    }
    

3.4 Deleting a field (remove_field)

remove_field deletes a field.

For example:

  1. Delete the timestamp field.
    filter {
        mutate {
            remove_field  =>  ["timestamp"]
        }
    }
    

3.5 GeoIP address lookup (geoip)

The geoip filter looks up the IP address in the source field against a GeoIP database and adds the resulting location data (country, city, coordinates, and so on) to the event:

filter {
    geoip {
        source => "ip_field"
    }
}
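
A hedged sketch of trimming the lookup results with the geoip filter's fields option, assuming ip_field holds a plain IP address (the listed field names are examples of GeoIP lookup results):

filter {
    geoip {
        source => "ip_field"
        # add only these lookup results to the event instead of all of them
        fields => ["country_name", "city_name", "location"]
    }
}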

3.6 Comprehensive example

Configuration file content:

  1. convert => ["response", "float"]: converts response to the float type (grok's NUMBER pattern captures it as a string); see the note after the execution output below
  2. rename => { "response" => "response_new" }: renames response to response_new
  3. input {
        stdin {}
    }
    filter {
        grok {
            match => { "message" => "%{IP:clientip}\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:response}\ %{NUMBER:bytes}" }
            remove_field => [ "message" ]
        }
        date {
            match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
        }
        mutate {
            convert => [ "response", "float" ]
            rename => { "response" => "response_new" }
            gsub => ["referrer", "\"", ""]
            split => ["clientip", "."]
        }
    }
    output {
        stdout {
            codec => "rubydebug"
        }
    }
    
  4. Test log
    172.16.213.132 [07/Feb/2019:16:24:19 +0800] "GET / HTTP/1.1" 403 5039
    
  5. Operation process
    [root@localhost config]# ls
    java_pid12011.hprof  log4j2.properties     my-app.conf      test1.conf
    java_pid21521.hprof  logstash-sample.conf  pipelines.yml    test2.conf
    jvm.options          logstash.yml          startup.options
    [root@localhost config]# 
    [root@localhost config]# vim test3.conf
    input {
        stdin {}
    }
    filter {
        grok {
            match => { "message" => "%{IP:clientip}\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:response}\ %{NUMBER:bytes}" }
            remove_field => [ "message" ]
       }
    date {
            match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
        }
    mutate {
              convert => [ "response","float" ]
               rename => { "response" => "response_new" }
               gsub => ["referrer","\"",""]
               split => ["clientip", "."]
            }
    }
    output {
        stdout {
            codec => "rubydebug"
        }
    }
    "test3.conf" [new] 23L, 556C Written                                               
    [root@localhost config]# ../bin/logstash -f test3.conf 
    Using JAVA_HOME defined java: /opt/elasticsearch-7.6.1/jdk
    WARNING: Using JAVA_HOME while Logstash distribution comes with a bundled JDK.
    DEPRECATION: The use of JAVA_HOME is now deprecated and will be removed starting from 8.0. Please configure LS_JAVA_HOME instead.
    OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
    Sending Logstash logs to /usr/local/logstash/logs which is now configured via log4j2.properties
    [2022-01-11T17:57:16,417][INFO ][logstash.runner          ] Log4j configuration path used is: /usr/local/logstash/config/log4j2.properties
    [2022-01-11T17:57:16,430][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.16.2", "jruby.version"=>"jruby 9.2.20.1 (2.5.8) 2021-11-30 2a2962fbd1 OpenJDK 64-Bit Server VM 13.0.2+8 on 13.0.2+8 +indy +jit [linux-x86_64]"}
    [2022-01-11T17:57:16,693][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
    [2022-01-11T17:57:17,938][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600, :ssl_enabled=>false}
    [2022-01-11T17:57:18,676][INFO ][org.reflections.Reflections] Reflections took 86 ms to scan 1 urls, producing 119 keys and 417 values 
    [2022-01-11T17:57:19,987][INFO ][logstash.javapipeline    ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>250, "pipeline.sources"=>["/usr/local/logstash/config/test3.conf"], :thread=>"#<Thread:0x73426cc8 run>"}
    [2022-01-11T17:57:20,809][INFO ][logstash.javapipeline    ][main] Pipeline Java execution initialization time {"seconds"=>0.81}
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by com.jrubystdinchannel.StdinChannelLibrary$Reader (file:/usr/local/logstash/vendor/bundle/jruby/2.5.0/gems/jruby-stdin-channel-0.2.0-java/lib/jruby_stdin_channel/jruby_stdin_channel.jar) to field java.io.FilterInputStream.in
    WARNING: Please consider reporting this to the maintainers of com.jrubystdinchannel.StdinChannelLibrary$Reader
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
    [2022-01-11T17:57:20,882][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
    The stdin plugin is now waiting for input:
    [2022-01-11T17:57:20,933][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
    172.16.213.132 [07/Feb/2019:16:24:19 +0800] "GET / HTTP/1.1" 403 5039
    {
            "referrer" => "GET / HTTP/1.1",
        "response_new" => "403",
               "bytes" => "5039",
           "timestamp" => "07/Feb/2019:16:24:19 +0800",
            "clientip" => [
            [0] "172",
            [1] "16",
            [2] "213",
            [3] "132"
        ],
          "@timestamp" => 2019-02-07T08:24:19.000Z,
                "host" => "localhost.localdomain",
            "@version" => "1"
    }
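
Note that in the output above response_new is still the string "403", not a float. Within a single mutate block, the mutate filter applies its operations in a fixed internal order in which rename runs before convert, whatever order they are written in; by the time convert looks for "response", the field has already been renamed, so the conversion silently does nothing. A hedged sketch of one way around this, splitting the work across two mutate blocks (separate filter blocks run in the order written):

filter {
    # convert first, while the field is still called "response"
    mutate {
        convert => [ "response", "float" ]
    }
    # then rename the already-converted field
    mutate {
        rename => { "response" => "response_new" }
    }
}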
    
    

3: Output plugin (output)

The official documentation for output plugins: https://www.elastic.co/guide/en/logstash/current/output-plugins.html
output is the last stage of the Logstash pipeline. An event can pass through multiple outputs; once all of its outputs have been processed, the event has finished executing. Some common outputs include:

  • file: writes log data to a file on disk.
  • elasticsearch: sends log data to Elasticsearch, which stores the data efficiently and makes it easy to search.

3.1 Output to standard output (stdout)

Print events on the console:

output {
    stdout {
        codec => rubydebug
    }
}

3.2 Saving to a file (file)

Store the output data in a file on disk. The path supports sprintf references: %{+yyyy-MM-dd} expands to the event's date and %{host} to the event's host field, so events land in per-day, per-host files:

output {
    file {
        path => "/data/log/%{+yyyy-MM-dd}/%{host}_%{+HH}.log"
    }
}

3.3 Output to Elasticsearch (elasticsearch)

Send the data to Elasticsearch:

output {
    elasticsearch {
        hosts => ["192.168.1.1:9200","172.16.213.77:9200"]
        index => "logstash-%{+YYYY.MM.dd}"       
    }
}
  • hosts: an array of Elasticsearch node addresses (host:port). The default port is 9200, and multiple addresses can be listed.
  • index: the name of the index written to in Elasticsearch. Variables can be used here: when Logstash sees a string beginning with a + sign, such as %{+YYYY.MM.dd}, it treats what follows as a time format and renders the event's timestamp with it. Splitting indices by day like this makes it easy to delete old data or search within a given time range. Also note that index names cannot contain uppercase letters.
  • manage_template: controls Logstash's automatic template management. Set it to false to turn automatic management off, which is what you want when you have defined a custom template (see the sketch after this list).
  • template_name: sets the name of the template used in Elasticsearch.
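
A hedged sketch of the custom-template options in use, assuming a template named my_template has already been loaded into Elasticsearch (the name is an arbitrary example):

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "logstash-%{+YYYY.MM.dd}"
        # a custom template is managed by hand, so disable automatic management
        manage_template => false
        template_name => "my_template"
    }
}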

Comprehensive test

Monitor the nginx log: read data from the nginx log file, transform it with the filters, and finally write it to the corresponding index in Elasticsearch:

input {
    file {
        path => ["D:/ES/logstash-7.3.0/nginx.log"]
        start_position => "beginning"
    }
}

filter {
    grok {
        match => { "message" => "%{IP:clientip}\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:response}\ %{NUMBER:bytes}" }
        remove_field => [ "message" ]
    }
    date {
        match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    }
    # convert runs in its own block first: within a single mutate block,
    # rename would be applied before convert and the conversion would be skipped
    mutate {
        convert => [ "response", "float" ]
        gsub => ["referrer", "\"", ""]
        remove_field => ["timestamp"]
        split => ["clientip", "."]
    }
    mutate {
        rename => { "response" => "response_new" }
    }
}

output {
    stdout {
        codec => "rubydebug"
    }

    elasticsearch {
        hosts => ["localhost:9200"]
        index => "logstash-%{+YYYY.MM.dd}"
    }
}
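
After starting this pipeline (for example ../bin/logstash -f nginx.conf, where nginx.conf is whatever name the configuration was saved under), append the sample log line from section 3.6 to nginx.log: the parsed event is printed on the console and also written to the daily logstash-* index in Elasticsearch.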

Topics: Java Back-end