How does Flink analyze and handle back pressure?

Posted by blankextacy on Tue, 23 Nov 2021 14:12:55 +0100

1. Concept

Backpressure is a very common problem in stream processing. It means that a node in the data pipeline has become a bottleneck: its processing rate cannot keep up with the rate at which the upstream sends data, so the upstream has to be throttled. Since real-time computing applications usually use message queues to decouple the producer side from the consumer side, and the consumer-side data source is pull-based, back pressure usually propagates from the bottleneck node back to the data source and reduces the source's intake rate (for example, that of a Kafka consumer).

① A node can become a performance bottleneck for many reasons: a fault on the machine hosting it (network, disk, etc.), network latency, insufficient disk space, frequent GC, data hotspots, and so on.

② With most message-oriented middleware, such as Kafka, the consumer pulls data from the broker to the local side, while the producer pushes data to the broker.

2. Influence of back pressure

Backpressure does not directly affect the availability of a job. It indicates that the job is in a sub-healthy state, has a potential performance bottleneck, and may suffer greater data processing latency. Generally speaking, for applications with loose latency requirements or small data volumes, the impact of back pressure may not be noticeable. For large-scale Flink jobs, however, backpressure can cause serious problems.

Back pressure affects checkpoints

① Checkpoint duration: checkpoint barriers flow along with the normal data stream. If data processing is blocked, the barriers take longer to travel through the entire data pipeline, so the checkpoint as a whole takes longer.

② State size: to guarantee Exactly-Once semantics, an Operator with two or more input channels must align checkpoint barriers. That is, once the barrier has arrived on a faster input channel, the data behind it is cached but not processed until the barrier on the slower input channel has also arrived. The cached data is stored in state, which makes the checkpoint larger.

Checkpoints are the key to guaranteeing data accuracy. A longer checkpoint duration may cause checkpoints to time out and fail, while a larger state may slow checkpoints down further or even lead to OOM.
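
The barrier alignment described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the `align` function, the `BARRIER` marker and the event list are hypothetical names invented for the illustration, not Flink APIs. It shows that the longer the slower channel takes to deliver its barrier, the more records from the faster channel have to be held back.

```python
from collections import deque

BARRIER = "BARRIER"  # hypothetical marker standing in for a checkpoint barrier


def align(events, num_channels=2):
    """Replay (channel, item) events through a simplified aligned operator.

    Returns (processed, max_buffered): the records in processing order and
    the largest number of records held back while waiting for a barrier.
    """
    processed = []
    blocked = set()   # channels whose barrier has already arrived
    held = deque()    # records cached from blocked channels
    max_buffered = 0
    for channel, item in events:
        if item == BARRIER:
            blocked.add(channel)
            if len(blocked) == num_channels:
                # All barriers have arrived: the snapshot would be taken here,
                # then the cached records are released in arrival order.
                processed.extend(held)
                held.clear()
                blocked.clear()
        elif channel in blocked:
            held.append(item)
            max_buffered = max(max_buffered, len(held))
        else:
            processed.append(item)
    return processed, max_buffered


events = [
    (0, "a1"), (1, "b1"),
    (0, BARRIER),          # the fast channel delivers its barrier first
    (0, "a2"), (0, "a3"),  # cached, not processed
    (1, "b2"),             # the slow channel is still ahead of its barrier
    (1, BARRIER),          # alignment completes
    (0, "a4"),
]
processed, max_buffered = align(events)
print(processed)
print(max_buffered)
```

In this toy run two records are cached during alignment; under back pressure the slow barrier arrives later, so the cache (and with it the checkpointed state) grows.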

3. Flink's back pressure mechanism

Implementing network flow control: dynamic feedback / automatic back pressure

The Consumer needs to give the Producer timely feedback, telling it what rate it can currently accept. There are two types of dynamic feedback:

Negative feedback: occurs when the receiving rate is lower than the sending rate; the Producer is told to reduce its sending rate.

Positive feedback: occurs when the sending rate is lower than the receiving rate; the Producer is told that it may increase its sending rate.

3.1 Flink backpressure mechanism

Flink has three types of data exchange:

Data exchange within the same Task;
Data exchange between different Tasks in the same JVM (the same TaskManager);
Data exchange between different Tasks on different TaskManagers.

3.1.1 Data exchange within the same Task

Multiple operators are chained together into an operator chain, mainly to avoid the overhead of serialization and network communication.

Conditions for chaining multiple operators into an operator chain:

① The parallelism of upstream and downstream is consistent

② The in-degree of the downstream node is 1 (it has only one input)

③ The upstream and downstream nodes share the same slot

④ The chain strategy of the downstream node is ALWAYS (for example, map, flatMap and filter default to ALWAYS)

⑤ The chain strategy of the upstream node is ALWAYS or HEAD (source defaults to HEAD)

⑥ The data partitioning method between the two nodes is forward

⑦ The user has not disabled chaining
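
The seven conditions above can be read as a single predicate. The following is a minimal sketch, not Flink's actual implementation: the `Node` class, its fields and the `can_chain` function are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of one operator in the job graph."""
    parallelism: int
    in_degree: int
    slot_group: str
    chain_strategy: str  # "ALWAYS", "HEAD" or "NEVER"


def can_chain(up, down, partitioner, chaining_enabled=True):
    """Return True only if all seven chaining conditions hold."""
    return (
        up.parallelism == down.parallelism            # condition 1
        and down.in_degree == 1                       # condition 2
        and up.slot_group == down.slot_group          # condition 3
        and down.chain_strategy == "ALWAYS"           # condition 4
        and up.chain_strategy in ("ALWAYS", "HEAD")   # condition 5
        and partitioner == "forward"                  # condition 6
        and chaining_enabled                          # condition 7
    )


source = Node(parallelism=4, in_degree=0, slot_group="default", chain_strategy="HEAD")
mapper = Node(parallelism=4, in_degree=1, slot_group="default", chain_strategy="ALWAYS")

print(can_chain(source, mapper, "forward"))  # all conditions hold
print(can_chain(source, mapper, "hash"))     # a keyBy-style repartition breaks the chain
```

Any single failing condition, such as a different parallelism or a non-forward partitioner, forces the two operators into separate Tasks, and their data exchange then falls into one of the other two categories.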

3.1.2 Data exchange between different Tasks in the same TaskManager

In TaskA, the data output by the operator is first serialized by the RecordWriter and then passed to the ResultPartition. The data is then handed through a local channel to the InputGate of TaskB, and finally to the RecordReader for deserialization.

3.1.3 Data exchange between different Tasks on different TaskManagers

The difference from 3.1.2 above is that the data is first handed to Netty, which pushes it to the Task on the remote end.

3.2 TCP based backpressure mechanism of Flink (before V1.5)

Before version 1.5, Flink relied on TCP's flow control mechanism rather than on a feedback mechanism of its own.

Figure: TCP-based back pressure mechanism before Flink 1.5

On the sending side, Flink has a layer of Network Buffers. Underneath, Netty is used for communication and adds its own layer of Channel Buffers, and finally the Socket also has a Buffer. The receiving side has the corresponding three layers of Buffers. Flink (before V1.5) essentially relies on TCP's flow control mechanism to provide the feedback.

TCP implements flow control with a sliding window

The header of a TCP segment contains a 16-bit window field. When the receiver acknowledges data from the sender, it writes the remaining size of its own receive buffer into the window field of the ACK. The window value changes as transmission proceeds; the larger the window, the higher the network throughput.

References:
1. [Computer Network] 3.1 Transport layer: TCP/UDP protocol
2. Apache Flink advanced tutorial (7): network flow control and back pressure analysis

Example: TCP uses sliding windows to limit traffic

Step 1: the sender sends packets 4, 5 and 6, and the receiver is able to receive all of them.

Step 2: when the consumer consumes packet 2, the window on the receiving side slides forward by one slot, leaving 1 free slot in the window. The receiver then replies to the sender with ACK = 7, window = 1.

Step 3: the sender sends packet 7 and the receiver receives it, but the consumer on the receiving side has stalled and cannot consume any data. The receiver therefore replies with ACK = 8 and window = 0. Since window = 0, the sender cannot send any more data, so its sending rate drops to 0.
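
The three steps can be replayed with a toy receiver that advertises its free buffer space as the window. This is a sketch under assumptions that the text does not state: the receive buffer is taken to hold 5 packets, and packets 1 to 3 are assumed to have been received (with packet 1 already consumed) before Step 1. The `Receiver` class is a hypothetical name, not part of any TCP library.

```python
from collections import deque


class Receiver:
    """Toy receiver that advertises its free buffer space as the window."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()
        self.next_expected = 1

    def receive(self, packets):
        for seq in packets:
            if len(self.buffer) >= self.capacity:
                raise OverflowError("sender ignored the advertised window")
            self.buffer.append(seq)
            self.next_expected = seq + 1

    def consume(self, count=1):
        for _ in range(min(count, len(self.buffer))):
            self.buffer.popleft()

    def ack(self):
        """Return (ACK number, window size)."""
        return self.next_expected, self.capacity - len(self.buffer)


receiver = Receiver(capacity=5)  # assumed buffer size
receiver.receive([1, 2, 3])      # assumed state before Step 1
receiver.consume()               # the consumer takes packet 1

receiver.receive([4, 5, 6])      # Step 1
receiver.consume()               # Step 2: the consumer takes packet 2
step2 = receiver.ack()
receiver.receive([7])            # Step 3: the consumer is stalled
step3 = receiver.ack()
print(step2, step3)
```

With these assumptions the receiver answers ACK = 7, window = 1 after Step 2 and ACK = 8, window = 0 after Step 3, matching the walkthrough.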

Disadvantages of TCP based backpressure mechanism

① Back pressure on a single Task blocks the socket of the entire TaskManager, so checkpoint barriers cannot propagate. This eventually makes checkpoints take longer or even time out and fail.

② The back pressure path is too long, so back pressure takes effect with a delay.

3.3 credit based backpressure mechanism of Flink (since V1.5)

The back pressure mechanism is implemented at the Flink level, and the feedback is transmitted between ResultPartition and InputGate.

Steps of credit-based feedback:

① Each time the ResultPartition sends data to the InputGate, it also sends a backlog size that tells the downstream how many messages are waiting to be sent. The downstream then calculates how many Buffers it needs to receive them. (The purpose of the backlog is to let the consumer side perceive the situation on the producer side.)

② If the downstream has enough Buffers, it returns a Credit (the number of remaining Buffers) to the upstream, telling it that it may send messages. (The two dashed lines in the figure indicate that the communication still goes through Netty and Socket.)

The producer side sends backlog=1

The consumer side returns credit=3

When the consumer side runs out of Buffers, it returns credit=0

Data then also backs up on the producer side
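
The backlog/credit exchange can be sketched as follows. This is a deliberately simplified model, not Flink's network stack: `ConsumerSide` and `exchange` are hypothetical names, credit is taken to be exactly the number of free Buffers, and the example assumes a consumer with 3 Buffers that never consumes.

```python
class ConsumerSide:
    """Toy InputGate side: credit is simply the number of free Buffers."""

    def __init__(self, num_buffers):
        self.num_buffers = num_buffers
        self.used = 0

    def credit(self):
        return self.num_buffers - self.used

    def accept(self, count):
        if count > self.credit():
            raise OverflowError("producer sent more than the granted credit")
        self.used += count

    def consume(self, count):
        self.used -= min(count, self.used)


def exchange(backlog, consumer):
    """One feedback round. Returns (credit, sent, remaining_backlog)."""
    credit = consumer.credit()   # the consumer announces its free Buffers
    sent = min(backlog, credit)  # the producer never exceeds the credit
    consumer.accept(sent)
    return credit, sent, backlog - sent


consumer = ConsumerSide(num_buffers=3)
round1 = exchange(backlog=1, consumer=consumer)
round2 = exchange(backlog=4, consumer=consumer)
round3 = exchange(backlog=round2[2], consumer=consumer)
print(round1, round2, round3)
```

In the last round the credit is 0, nothing is sent, and the remaining backlog stays on the producer side. Only that one channel is throttled, which is the point of moving flow control from the shared TCP connection up to the Flink level.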

4. Locate the back pressure node

4.1 Back pressure monitoring in the Flink Web UI - direct method

The back pressure monitor in the Flink Web UI works at Subtask level. It samples the stack traces of all threads running on the TaskManager through Thread.getStackTrace(), counts the threads that are blocked on buffer requests (which means the downstream is blocking), and computes the ratio (rate) of blocked threads to the total number of threads. A rate < 0.1 is OK, 0.1 <= rate <= 0.5 is LOW, and rate > 0.5 is HIGH.
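
The thresholds above translate directly into a small classification function. This is a sketch of the rule as stated in this article, not the Web UI's source code; `backpressure_level` is a hypothetical name.

```python
def backpressure_level(blocked_threads, total_threads):
    """Classify the back pressure ratio using the thresholds given above."""
    if total_threads <= 0:
        raise ValueError("total_threads must be positive")
    rate = blocked_threads / total_threads
    if rate < 0.1:
        return "OK"
    if rate <= 0.5:
        return "LOW"
    return "HIGH"


for blocked in (0, 3, 8):
    print(blocked, backpressure_level(blocked, 10))
```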

The following two scenarios may cause backpressure:

① The sending rate of the node cannot keep up with the rate at which it produces data. This typically happens with single-input, multiple-output operators such as FlatMap. How to locate it: the node is the first one showing backpressure on the path from the Source Task to the Sink Task, so it is the root of the backpressure.

② The downstream node processes data slowly, and the sending rate of this node is limited by backpressure. How to locate it: keep checking the downstream nodes, starting from this node.

Notes:

① Because the back pressure panel of the Flink Web UI monitors the sending side, the root node of the backpressure does not necessarily show high backpressure on the panel. A node that is a performance bottleneck does not show high backpressure itself; it causes high backpressure in its upstream. In general, once the first node showing backpressure has been found, the source of the backpressure is either that node or its immediate downstream node.

② The two scenarios above cannot be told apart from the back pressure panel alone, so the panel has to be combined with Metrics and other monitoring tools. In addition, if a job has many nodes or a high degree of parallelism, stack information has to be collected from all Tasks, so the load caused by the back pressure panel becomes very high and the panel may even become unavailable.

4.2 Flink Task Metrics - indirect method

(1) Review of Flink's credit-based network

① Data transmission between TaskManagers

Two Subtasks on different TaskManagers are usually connected by several channels; the number of channels equals the number of grouping keys or the operator parallelism. These channels multiplex the same TaskManager-level TCP connection and share a Subtask-level Buffer Pool on the receiving side.

② Receiving end

Each channel is initially allocated a fixed number of Exclusive Buffers, which store the received data. Once the Operator has used the data, the Exclusive Buffers are released again. Note: the number of idle Buffers on the receiving side of a channel is called Credit. Credit is periodically synchronized to the sender, which uses it to decide how many Buffers of data to send.

③ High-traffic scenarios

On the receiving side, once a channel has filled its Exclusive Buffers, Flink requests the remaining Floating Buffers from the Buffer Pool. On the sending side, all Channels of a Subtask share the same Buffer Pool, so no distinction is made between Exclusive Buffers and Floating Buffers.
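
The order in which a receiving channel uses its Buffers can be sketched as below. This is a toy model under assumed sizes (2 Exclusive Buffers per channel, 1 Floating Buffer in the pool); `BufferPool` and `Channel` are hypothetical names and do not mirror Flink's classes.

```python
class BufferPool:
    """Shared pool of Floating Buffers on the receiving Subtask."""

    def __init__(self, floating):
        self.floating = floating

    def request(self):
        if self.floating > 0:
            self.floating -= 1
            return True
        return False


class Channel:
    """Receiving channel: Exclusive Buffers first, then Floating Buffers."""

    def __init__(self, exclusive, pool):
        self.exclusive_free = exclusive
        self.pool = pool
        self.floating_used = 0

    def acquire(self):
        """Return which kind of Buffer was taken, or None if none is left."""
        if self.exclusive_free > 0:
            self.exclusive_free -= 1
            return "exclusive"
        if self.pool.request():
            self.floating_used += 1
            return "floating"
        return None  # no Buffer left: no credit can be granted to the sender


pool = BufferPool(floating=1)
channel = Channel(exclusive=2, pool=pool)
taken = [channel.acquire() for _ in range(4)]
print(taken)
```

Once both kinds of Buffers are exhausted the channel has nothing left to offer, so its Credit drops to zero and the sender is held back.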

(2) Flink Task Metrics monitors back pressure

Network and task I/O Metrics are lightweight back pressure monitors suited to continuously running jobs. The most useful back pressure metrics are outPoolUsage (Buffer usage on the sending side) and inPoolUsage (Buffer usage on the receiving side), with inPoolUsage further broken down into floatingBuffersUsage and exclusiveBuffersUsage.

The idea behind using Metrics to analyze backpressure: if the Buffer usage on the sending side of a Subtask is very high, the Subtask is being limited by downstream back pressure; if the Buffer usage on the receiving side of a Subtask is very high, the Subtask is passing back pressure on to its upstream.

Explanation:

① If outPoolUsage and inPoolUsage are both low, the current Subtask is normal; if both are high, the current Subtask is being back-pressured by the downstream.

② If the outPoolUsage of a Subtask is high, it is usually being affected by a downstream Task, so it can generally be ruled out as the root of the back pressure.

③ If the outPoolUsage of a Subtask is low but its inPoolUsage is high, it may be the root of the backpressure, because back pressure is usually passed on to the upstream and makes the outPoolUsage of some upstream Subtasks high.

Note: backpressure is sometimes transient and has little effect, for example a short network delay on one channel or a normal GC on a TaskManager. In such cases it can be left alone.
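
The three rules above can be combined into a small decision function. This is a sketch only: `diagnose` is a hypothetical helper, and the cut-off of 0.8 for "high" usage is an assumed value, since the text does not define one.

```python
def diagnose(in_pool_usage, out_pool_usage, high=0.8):
    """Classify a Subtask from its inPoolUsage and outPoolUsage.

    `high` is an assumed threshold; tune it for the job at hand.
    """
    in_high = in_pool_usage >= high
    out_high = out_pool_usage >= high
    if out_high and in_high:
        return "back-pressured by downstream and passing it upstream"
    if out_high:
        return "back-pressured by downstream"
    if in_high:
        return "possible root of back pressure"
    return "normal"


subtasks = {"source": (0.0, 0.95), "map": (0.9, 0.1), "sink": (0.1, 0.0)}
for name, (in_usage, out_usage) in subtasks.items():
    print(name, "->", diagnose(in_usage, out_usage))
```

In this made-up reading the source has a full output pool while the map has a full input pool and an almost empty output pool, which points at the map as the likely root.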

inPoolUsage can be split into floatingBuffersUsage and exclusiveBuffersUsage. Relating these two metrics to the outPoolUsage of the upstream Tasks allows a finer analysis of the backpressure on a Subtask and its upstream Subtasks.

Interpretation:

① A high floatingBuffersUsage indicates that back pressure is being passed on to the upstream.

② exclusiveBuffersUsage indicates whether the back pressure is skewed. If floatingBuffersUsage is high and exclusiveBuffersUsage is low, there is skew, because a few channels are occupying most of the Floating Buffers (each channel has its own Exclusive Buffers and only uses Floating Buffers once its Exclusive Buffers are exhausted).
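
The two rules can be written as a short check. As before this is only a sketch: `inspect_input_buffers` is a hypothetical name and the `high` and `low` cut-offs are assumed values that the text does not specify.

```python
def inspect_input_buffers(floating_usage, exclusive_usage, high=0.8, low=0.3):
    """Return (propagating_upstream, skewed) for one Subtask.

    `high` and `low` are assumed thresholds for illustration only.
    """
    propagating_upstream = floating_usage >= high
    skewed = propagating_upstream and exclusive_usage <= low
    return propagating_upstream, skewed


print(inspect_input_buffers(0.9, 0.2))  # a few hot channels hold the floating Buffers
print(inspect_input_buffers(0.9, 0.9))  # all channels are busy, no skew
print(inspect_input_buffers(0.1, 0.1))  # no back pressure passed upstream
```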

5. How does Flink analyze back pressure

The sections above locate the back-pressured node. Analyzing the cause of the backpressure is then similar to analyzing the performance bottleneck of an ordinary program, and mainly involves examining the Task Thread.

(1) Data skew

Data skew can be confirmed through the Records Sent and Records Received of each SubTask in the Web UI. The State size of the different SubTasks in the Checkpoint detail view is another useful indicator. The solution is to perform local aggregation / pre-aggregation on the grouping key to eliminate or reduce the skew.
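
Both ideas, spotting skew from per-SubTask record counts and reducing it with local pre-aggregation, can be sketched in a few lines. This is plain Python rather than a Flink job, and `skew_ratio`, `pre_aggregate` and `merge` are hypothetical helpers; the record counts and batches are made-up sample data.

```python
from collections import Counter


def skew_ratio(records_per_subtask):
    """Ratio of the busiest SubTask to the average; 1.0 means no skew."""
    mean = sum(records_per_subtask) / len(records_per_subtask)
    return max(records_per_subtask) / mean


def pre_aggregate(batch):
    """Local aggregation: collapse a batch into one partial count per key."""
    return Counter(batch)


def merge(partials):
    """Downstream aggregation of the partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Made-up Records Received values for four SubTasks.
ratio = skew_ratio([100, 100, 100, 700])

# Two upstream SubTasks, both dominated by the hot key "hot".
batches = [["hot"] * 6 + ["a"], ["hot"] * 5 + ["b", "b"]]
partials = [pre_aggregate(batch) for batch in batches]
shuffled_without = sum(len(batch) for batch in batches)
shuffled_with = sum(len(partial) for partial in partials)
result = merge(partials)
print(ratio, shuffled_without, shuffled_with, result["hot"])
```

With pre-aggregation each upstream SubTask sends at most one record per key, so the SubTask that owns the hot key receives far fewer records while the final counts stay the same.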

(2) Execution efficiency of user code

Run a CPU profile on the TaskManager and check whether the Task Thread is saturating a CPU core. If it is, analyze which functions the CPU time is mainly spent in, for example a user function that occasionally gets stuck in a regular expression (ReDoS) in the production environment. If it is not, check where the Task Thread is blocked: it may be a synchronous call in the user function itself, or system activity such as checkpointing or GC.

(3) TaskManager memory and GC

Unreasonable sizing of the memory regions of the TaskManager JVM can cause frequent Full GC or even loss of contact with the TaskManager. You can add -XX:+PrintGCDetails to print GC logs and observe GC problems. It is also recommended to enable the G1 garbage collector on the TaskManager to optimize GC.

