Flume13: flume optimization

Posted by mcfmullen on Fri, 04 Mar 2022 03:22:49 +0100

1, Flume optimization

1. Adjust the memory size of the Flume process

Flume runs as a Java process, so its heap size has to be configured. As a rule of thumb, give a single Flume process (that is, a single Agent) 1G~2G of memory. If the memory is too small, GC will run frequently and reduce the Agent's throughput.
What exact value is appropriate?

That depends on the volume and rate of the data the Agent reads, so it has to be analyzed case by case. Starting a Flume Agent starts one corresponding process. We can watch the GC statistics of that process with jstat -gcutil PID 1000, which prints a new sample every second (1000 ms). If the GC counts climb quickly, the process does not have enough memory.

Use jps to find the Flume process that is currently running (it is listed as Application):

[root@bigdata04 ~]# jps
2957 Jps
2799 Application

Execute jstat -gcutil PID 1000

[root@bigdata04 ~]# jstat -gcutil 2799 1000
S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT   
100.00   0.00  17.54  42.80  96.46  92.38      8    0.029     0    0.000    0.029
100.00   0.00  17.54  42.80  96.46  92.38      8    0.029     0    0.000    0.029
100.00   0.00  17.54  42.80  96.46  92.38      8    0.029     0    0.000    0.029
100.00   0.00  17.54  42.80  96.46  92.38      8    0.029     0    0.000    0.029
100.00   0.00  17.54  42.80  96.46  92.38      8    0.029     0    0.000    0.029
100.00   0.00  17.54  42.80  96.46  92.38      8    0.029     0    0.000    0.029
100.00   0.00  17.54  42.80  96.46  92.38      8    0.029     0    0.000    0.029
100.00   0.00  17.54  42.80  96.46  92.38      8    0.029     0    0.000    0.029

Here we mainly look at YGC, YGCT, FGC, FGCT and GCT:

YGC: the number of young-generation GCs. One every few tens of seconds is acceptable. If a YGC occurs every second, the memory needs to be increased.
YGCT: the total time spent in young-generation GC.
FGC: the number of Full GCs. During a Full GC the Flume process is paused, and it resumes work only after the Full GC has finished, so Full GC has a large impact on efficiency. The lower this value the better; ideally it stays at 0.
FGCT: the total time spent in Full GC.
GCT: the total time spent in all types of GC.
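
Reading the counters by eye works for a quick check. For a longer observation window, a small helper can report how much YGC and FGC grew. The following is only a sketch: the function name gc_delta is made up for this example, and it assumes the jstat -gcutil header row starts with S0, as in the output above.

```shell
#!/usr/bin/env bash
# gc_delta (hypothetical helper): summarise `jstat -gcutil` samples from stdin.
# Prints three numbers: <samples> <YGC increase> <FGC increase>.
# The YGC and FGC columns are located via the header row, not hard-coded.
gc_delta() {
  awk '
    # Header row: remember which columns hold YGC and FGC.
    $1 == "S0" {
      for (i = 1; i <= NF; i++) {
        if ($i == "YGC") y = i
        if ($i == "FGC") f = i
      }
      next
    }
    # Data rows: keep the first and the latest counter values.
    y && f && NF >= f {
      if (n == 0) { y0 = $y; f0 = $f }
      y1 = $y; f1 = $f; n++
    }
    END { if (n) print n, y1 - y0, f1 - f0 }
  '
}
```

A possible call is jstat -gcutil 2799 1000 30 | gc_delta, which takes 30 samples one second apart. Fed the eight samples shown above, it prints 8 0 0: no young-generation or Full GC happened during the window, so this Agent is not short of memory.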

To adjust the Flume process memory, change the JAVA_OPTS parameter in the flume-env.sh script.
The export JAVA_OPTS line takes effect only after the # in front of it is removed.

export JAVA_OPTS="-Xms1024m -Xmx1024m -Dcom.sun.management.jmxremote"

It is recommended to set Xms and Xmx to the same size here, so that the JVM does not have to grow or shrink the heap at runtime, which also costs performance.

2. When one server runs multiple Agents, modify the configuration so that each Agent writes to its own log file

The conf directory contains log4j.properties, which specifies the name and location of the log file. All Agents started with the configuration in that conf directory therefore write to the same log file. If we start more than 10 Agents on one machine and later find that one of them has hung, analyzing the problem from the log is close to impossible, because the output of all Agents is mixed together.

Therefore, it is recommended to make one copy of the conf directory per Agent and, in each copy, change the log file name in log4j.properties (so that the logs of the Agents are stored separately) and set the log level to WARN (to reduce useless log output). The default INFO level records a lot of log information.

Then, when starting each Agent, point it at its own conf directory with the --conf parameter. Each Agent has a separate log file, which makes later log analysis much easier.

Take the bigdata04 machine as an example:
Copy the conf directory to conf-failover, and use that directory from now on when starting the sink failover task.
Modify the log level and the log file name in log4j.properties. The log directory does not need to change; all Agents can share the logs directory.

[root@bigdata04 apache-flume-1.9.0-bin]# cp -r conf/ conf-failover
[root@bigdata04 apache-flume-1.9.0-bin]# cd conf-failover/
[root@bigdata04 conf-failover]# vi log4j.properties 
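
The same edit can be scripted instead of done in vi. This is a sketch under one assumption: that log4j.properties is the stock file shipped with Flume 1.9.0, which sets the level through flume.root.logger and the file name through flume.log.file. Check the property names in your own copy first. The function name set_agent_log is made up for this example.

```shell
#!/usr/bin/env bash
# set_agent_log (hypothetical helper): set log level and log file name.
#   $1 = path to log4j.properties
#   $2 = log file name, e.g. flume-failover.log
#   $3 = log level (defaults to WARN)
# Assumes the stock property names flume.root.logger and flume.log.file.
set_agent_log() {
  props="$1"
  file="$2"
  level="${3:-WARN}"
  # -i.bak edits in place and keeps the original as <file>.bak
  sed -i.bak \
    -e "s|^flume\.root\.logger=.*|flume.root.logger=${level},LOGFILE|" \
    -e "s|^flume\.log\.file=.*|flume.log.file=${file}|" \
    "$props"
}
```

For the example in this post the call would be set_agent_log conf-failover/log4j.properties flume-failover.log WARN, run from the Flume installation directory.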

Then restart the Agent with the new conf directory:

[root@bigdata04 apache-flume-1.9.0-bin]# nohup bin/flume-ng agent --name a1 --conf conf-failover --conf-file conf/failover.conf &

Note: in this command, conf-failover/failover.conf can also be given after the --conf-file parameter, because conf-failover was copied from the conf directory and the failover.conf files in the two directories are identical.

Flume loads configuration such as log4j.properties from the directory given after --conf, while --conf-file independently gives the path of the specific Agent configuration file.

In theory either failover.conf works, but to avoid ambiguity it is best to use the one under the newly copied conf-failover directory.
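
With more Agents, a fixed naming convention keeps the two parameters in step. The sketch below assumes a hypothetical layout in which every task has its own conf-<task> directory containing <task>.conf, and it reuses the agent name a1 from the command above. The function agent_start_cmd is made up for this example and only prints the command; it does not start anything.

```shell
#!/usr/bin/env bash
# agent_start_cmd (hypothetical helper): print the flume-ng start command.
#   $1 = task name, e.g. failover
#   $2 = agent name (defaults to a1)
# Assumes the layout conf-<task>/<task>.conf, which is a convention chosen
# for this sketch and not something Flume requires.
agent_start_cmd() {
  task="$1"
  agent="${2:-a1}"
  printf 'bin/flume-ng agent --name %s --conf conf-%s --conf-file conf-%s/%s.conf\n' \
    "$agent" "$task" "$task" "$task"
}
```

Calling agent_start_cmd failover prints the command with both --conf and --conf-file pointing into conf-failover, so the log settings and the Agent definition always come from the same directory.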

As a result, a flume-failover.log file is generated in Flume's logs directory. It records only WARN and ERROR level messages, which makes later troubleshooting much clearer.

[root@bigdata04 apache-flume-1.9.0-bin]# cd logs/
[root@bigdata04 logs]# ll
total 4
-rw-r--r--. 1 root root 478 May  3 16:25 flume-failover.log
[root@bigdata04 logs]# more flume-failover.log 
03 May 2020 16:25:38,992 ERROR [SinkRunner-PollingRunner-FailoverSinkP
rocessor] (org.apache.flume.SinkRunner$PollingRunner.run:158)  - Unabl
e to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: All sinks failed to process, 
nothing left to failover to
        at org.apache.flume.sink.FailoverSinkProcessor.process(Failove
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.ja
        at java.lang.Thread.run(Thread.java:748)

Topics: Big Data Hadoop flume