Flink parsing: time window

Posted by tharagleb on Mon, 17 Jan 2022 22:58:46 +0100

Catalogue

Time concept

WaterMarks and window concept

Watermark data structure

Multi-source watermark processing

Window

Tumbling Windows

Sliding Windows

Session Windows

Global Windows

Triggers

Fire and purge

Default Triggers of WindowAssigners

Built-in Triggers and custom Triggers

Window Functions

ReduceFunction

AggregateFunction

ProcessWindowFunction

ProcessWindowFunction with incremental aggregation

Incremental aggregation using ReduceFunction

Incremental aggregation using AggregateFunction

Using per-window state in ProcessWindowFunction

Evictors

Late data

Allowed Lateness

Side output

Some considerations about late data

Considerations about state size

Reference resources

Time concept

Time plays an important role in Flink's real-time stream processing, for example in time-series analysis, in aggregations over a specific time period (a window), or in event processing where timing matters. Flink's DataStream API supports three notions of time: EventTime, IngestionTime and ProcessingTime, and offers a large number of time-based operators.

Compare these three times:

  • EventTime
    • The event's timestamp is produced before the event enters Flink and can be extracted from a field of the event
    • You must specify how watermarks are generated
    • Strength: deterministic; gives correct results even with out-of-order, delayed or duplicated data
    • Weakness: performance and latency suffer when handling out-of-order events
  • IngestionTime (rarely used)
    • The time at which the event enters Flink, i.e. the current system time taken at the source, used uniformly by all subsequent operations
    • There is no need to specify how watermarks are generated (they are generated automatically)
    • Weakness: cannot handle out-of-order events or late data
  • ProcessingTime
    • The current system time of the machine executing the operation (different for each operator)
    • No coordination between the stream and the machines is required
    • Strength: best performance and lowest latency
    • Weakness: non-deterministic; easily affected by many factors (the speed at which events are produced, the speed at which they reach Flink, transmission speed between operators, etc.), and disorder and lateness are not handled at all

In summary:

  • Performance: ProcessingTime > IngestionTime > EventTime
  • Latency: ProcessingTime < IngestionTime < EventTime
  • Determinism: ProcessingTime < IngestionTime < EventTime

If the time type is not set, the default is ProcessingTime. In practice, most projects use EventTime; to do so, you need to specify a timestamp assigner and a watermark generator right after the source.
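A minimal sketch of what that looks like with the WatermarkStrategy API (Flink 1.11+); the event type MyEvent and its getTimestamp() accessor returning epoch milliseconds are hypothetical names:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import java.time.Duration;

DataStream<MyEvent> withTimestampsAndWatermarks = stream.assignTimestampsAndWatermarks(
    WatermarkStrategy
        .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5)) // tolerate up to 5s of disorder
        .withTimestampAssigner((event, previousTimestamp) -> event.getTimestamp()));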

WaterMarks and window concept

Before talking about watermarks, consider the problem they solve. In real streaming workloads, the order of events affects the correctness of the results, yet data arrives late and out of order because of network delay or the storage layer itself; for example, data produced in the first second may arrive only in the fifth second.

To solve this problem, Flink introduced the watermark, which is used specifically for EventTime window computation. In essence it is a timestamp. Late elements cannot be waited for indefinitely: there must be a mechanism that guarantees that after a specific time the window is triggered and computed, and that mechanism is the watermark. You can think of a watermark as a way of telling Flink how late messages may be and how long to wait for late data. Watermarks are generally produced by the Flink source, or by a custom watermark generator written for the requirements, and then flow downstream along with the ordinary data. An operator that receives a watermark takes the max of the newly arrived watermark and its current one.

Watermark data structure

Many different elements flow through a Flink DataStream; collectively they are called StreamElement. A StreamElement can be a StreamRecord, Watermark, StreamStatus or LatencyMarker. StreamElement itself is an abstract class (the base class Flink uses to carry messages), and the four concrete types inherit from it.

public abstract class StreamElement {
  // Check whether this element is a Watermark
  public final boolean isWatermark() {
    return getClass() == Watermark.class;
  }
  // Check whether this element is a StreamStatus
  public final boolean isStreamStatus() {
    return getClass() == StreamStatus.class;
  }
  // Check whether this element is a StreamRecord
  public final boolean isRecord() {
    return getClass() == StreamRecord.class;
  }
  // Check whether this element is a LatencyMarker
  public final boolean isLatencyMarker() {
    return getClass() == LatencyMarker.class;
  }
  // Cast to StreamRecord
  public final <E> StreamRecord<E> asRecord() {
    return (StreamRecord<E>) this;
  }
  // Cast to Watermark
  public final Watermark asWatermark() {
    return (Watermark) this;
  }
  // Cast to StreamStatus
  public final StreamStatus asStreamStatus() {
    return (StreamStatus) this;
  }
  // Cast to LatencyMarker
  public final LatencyMarker asLatencyMarker() {
    return (LatencyMarker) this;
  }
}

Watermark inherits from StreamElement and sits at the same level of abstraction as events. It contains a single member variable, timestamp, which marks the time progress of the current data flow. A watermark actually travels with the data stream as part of it.

At present, Flink has two ways to generate watermarks:

  • Punctuated: a new watermark is triggered by special marker events in the data stream. Window firing is then independent of time and depends on when the marker event is received; in the extreme, every event with an increasing eventTime generates a watermark. In production, the punctuated mode generates a huge number of watermarks under high TPS, which puts pressure on downstream operators, so it is chosen only when real-time requirements are very strict.
  • Periodic: a watermark is generated periodically (for example at a certain time interval or after a certain number of records). In production, the periodic mode should combine both dimensions, time and record count, to keep emitting watermarks periodically; otherwise the delay can be very large in extreme cases.

The watermark generation method therefore has to be chosen according to the business scenario. A sketch of a periodic generator follows.
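The sketch below implements the periodic style on the WatermarkGenerator interface (Flink 1.11+); the event type MyEvent is hypothetical. onEvent() only tracks the maximum timestamp seen; the watermark is emitted in onPeriodicEmit(), whose frequency is controlled by pipeline.auto-watermark-interval. A punctuated generator would instead call output.emitWatermark() directly inside onEvent().

import org.apache.flink.api.common.eventtime.Watermark;
import org.apache.flink.api.common.eventtime.WatermarkGenerator;
import org.apache.flink.api.common.eventtime.WatermarkOutput;

public class BoundedOutOfOrdernessGenerator implements WatermarkGenerator<MyEvent> {

    private final long maxOutOfOrderness = 3500; // 3.5 seconds

    private long currentMaxTimestamp = Long.MIN_VALUE + maxOutOfOrderness + 1;

    @Override
    public void onEvent(MyEvent event, long eventTimestamp, WatermarkOutput output) {
        // Track the highest timestamp seen so far; nothing is emitted here
        currentMaxTimestamp = Math.max(currentMaxTimestamp, eventTimestamp);
    }

    @Override
    public void onPeriodicEmit(WatermarkOutput output) {
        // Emitted watermark = max timestamp seen - tolerated out-of-orderness
        output.emitWatermark(new Watermark(currentMaxTimestamp - maxOutOfOrderness - 1));
    }
}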

Multi-source watermark processing

In actual stream processing a job may consume data from multiple sources; after a keyBy, for example, records with the same key are shuffled to the same node and arrive carrying different watermarks. Internally, to keep the watermark monotonically increasing, Flink forwards the smallest of all incoming watermarks downstream. This preserves both the monotonicity of the watermark and the completeness of the data, as the following sketch illustrates.
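This is an illustrative sketch of the min-combination idea, not Flink's internal code: each input channel's watermark only ever increases, and the operator's own watermark is the minimum across all channels.

public class MinWatermarkTracker {

    private final long[] channelWatermarks;

    public MinWatermarkTracker(int numChannels) {
        channelWatermarks = new long[numChannels];
        java.util.Arrays.fill(channelWatermarks, Long.MIN_VALUE);
    }

    /** Record a watermark from one channel and return the operator-wide watermark. */
    public long onWatermark(int channel, long watermark) {
        // Per-channel watermarks are monotonically increasing
        channelWatermarks[channel] = Math.max(channelWatermarks[channel], watermark);
        long min = Long.MAX_VALUE;
        for (long w : channelWatermarks) {
            min = Math.min(min, w); // the operator advances only as fast as its slowest input
        }
        return min;
    }
}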

Window

Windows in Flink can be divided into: tumbling windows (no overlap), sliding windows (may overlap), session windows (separated by gaps of inactivity) and global windows.

After keying the stream, the program defines a window assigner, which determines how elements of the stream are distributed into windows. You specify it with window(...) (for keyed streams) or windowAll(...) (for non-keyed streams). The WindowAssigner is responsible for assigning each element of the stream to one or more windows. Flink provides predefined window assigners for the most common cases, namely tumbling windows, sliding windows, session windows and global windows; you can also implement a custom window assigner by extending the WindowAssigner class. All built-in window assigners (except the global window) distribute data based on time, which can be either processing time or event time.

Time-based windows describe their extent with [start timestamp, end timestamp). In the Flink code, the class TimeWindow represents time-based windows; it has methods for querying the start and end timestamps, plus maxTimestamp(), which returns the largest timestamp the window can contain.
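A quick illustration of the [start, end) semantics using TimeWindow directly (constructing a TimeWindow by hand is unusual in application code, but the accessors are real):

TimeWindow w = new TimeWindow(0L, 5000L); // a 5-second window starting at the epoch
long start = w.getStart();     // 0, inclusive
long end = w.getEnd();         // 5000, exclusive
long maxTs = w.maxTimestamp(); // 4999, the largest timestamp the window can contain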

Tumbling Windows

The assigner of a tumbling window distributes elements into windows of a specified size. Tumbling windows have a fixed size and do not overlap.

The time interval can be specified with Time.milliseconds(x), Time.seconds(x), Time.minutes(x), and so on. Below is the official sample code.

DataStream<T> input = ...;

// Tumbling event-time windows
input
    .keyBy(<key selector>)
    .window(TumblingEventTimeWindows.of(Time.seconds(5)))
    .<windowed transformation>(<window function>);

// Tumbling processing-time windows
input
    .keyBy(<key selector>)
    .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
    .<windowed transformation>(<window function>);

// Daily tumbling event-time windows offset by -8 hours
input
    .keyBy(<key selector>)
    .window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8)))
    .<windowed transformation>(<window function>);

As the example above shows, tumbling window assigners also accept an optional offset parameter, which can be used to align the windows. Without an offset, a one-hour tumbling window aligns with the Linux epoch, so you get windows such as 1:00:00.000 - 1:59:59.999, 2:00:00.000 - 2:59:59.999, and so on. To change the alignment, set an offset: with a 15-minute offset you get 1:15:00.000 - 2:14:59.999, 2:15:00.000 - 3:14:59.999, and so on. An important use case for the offset is adjusting windows to a time zone other than UTC-0; in China, for example, you may set the offset to Time.hours(-8).

Sliding Windows

The assigner of a sliding window distributes elements into windows of a specified size, set with the window size parameter. A sliding window additionally requires a sliding distance (window slide) parameter that controls how frequently new windows start. If the slide is smaller than the window size, sliding windows overlap, and an element may be assigned to multiple windows.

For example, a window of size 10 minutes with a slide of 5 minutes gives you a new window every 5 minutes containing the data that arrived during the previous 10 minutes.

The example code is as follows:

DataStream<T> input = ...;

// Sliding event-time windows
input
    .keyBy(<key selector>)
    .window(SlidingEventTimeWindows.of(Time.seconds(10), Time.seconds(5)))
    .<windowed transformation>(<window function>);

// Sliding processing-time windows
input
    .keyBy(<key selector>)
    .window(SlidingProcessingTimeWindows.of(Time.seconds(10), Time.seconds(5)))
    .<windowed transformation>(<window function>);

// Sliding processing-time windows offset by -8 hours
input
    .keyBy(<key selector>)
    .window(SlidingProcessingTimeWindows.of(Time.hours(12), Time.hours(1), Time.hours(-8)))
    .<windowed transformation>(<window function>);

Session Windows

The assigner of a session window groups data into sessions by activity. Unlike tumbling and sliding windows, session windows do not overlap and have no fixed start or end time. A session window closes after a period in which no data is received, i.e. after a gap of inactivity. The session window assigner can be configured with a fixed session gap, or with a session gap extractor function that dynamically determines how long counts as inactive. When the inactivity gap is exceeded, the current session closes and subsequent data is assigned to a new session window.

A dynamic gap can be specified by implementing the SessionWindowTimeGapExtractor interface.

DataStream<T> input = ...;

// Event time session window with fixed interval set
input
    .keyBy(<key selector>)
    .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
    .<windowed transformation>(<window function>);

// Event time session window with dynamic interval set
input
    .keyBy(<key selector>)
    .window(EventTimeSessionWindows.withDynamicGap((element)-> {
        // Determines and returns the session interval
    }))
    .<windowed transformation>(<window function>);

// Processing time session window with fixed interval
input
    .keyBy(<key selector>)
    .window(ProcessingTimeSessionWindows.withGap(Time.minutes(10)))
    .<windowed transformation>(<window function>);
    
// Processing time session window with dynamic interval set
input
    .keyBy(<key selector>)
    .window(ProcessingTimeSessionWindows.withDynamicGap((element) -> {
        // Determines and returns the session interval
    }))
    .<windowed transformation>(<window function>);

Session windows have no fixed start or end time, so they are evaluated differently from tumbling and sliding windows. Internally, the session window operator creates a window for every arriving element and then merges windows whose gap is smaller than the configured interval. To be mergeable, a session window needs a Trigger and a Window Function that support merging, for example ReduceFunction, AggregateFunction, or ProcessWindowFunction.
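For the dynamic gap, here is a concrete sketch of a SessionWindowTimeGapExtractor; the event type MyEvent and its isPriority() flag are hypothetical, and the placeholders follow the style of the snippets above:

input
    .keyBy(<key selector>)
    .window(EventTimeSessionWindows.withDynamicGap(
        new SessionWindowTimeGapExtractor<MyEvent>() {
            @Override
            public long extract(MyEvent element) {
                // Gap in milliseconds: short sessions for priority events
                return element.isPriority() ? 60_000L : 600_000L;
            }
        }))
    .<windowed transformation>(<window function>);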

Global Windows

The assigner of the global window assigns all data with the same key to a single global window. This windowing scheme is only useful if you also specify a custom trigger; otherwise no computation ever happens, because the global window has no natural end at which the accumulated data could be processed.

The example code is as follows:

DataStream<T> input = ...;

input
    .keyBy(<key selector>)
    .window(GlobalWindows.create())
    .<windowed transformation>(<window function>);

Triggers

The trigger determines when a window (as formed by the window assigner) is ready to be processed by the window function. Generally speaking, a window fires once the watermark timestamp >= the window end time and the window contains data. Every WindowAssigner comes with a default trigger; if the default trigger does not meet your needs, you can specify a custom trigger with trigger(...).

The Trigger interface provides five methods to respond to different events:

  • The onElement() method is called when each element is added to the window.
  • The onEventTime() method is called when the registered event time timer is triggered.
  • The onProcessingTime() method is called when a registered processing-time timer fires.
  • The onMerge() method is relevant for stateful triggers: it merges the trigger states of two windows when those windows merge, for example when session windows are used.
  • Finally, the clear() method handles the logic required when the corresponding window is removed.

There are two points to note:

1. The first three methods decide how the trigger reacts to the event that invoked them by returning a TriggerResult. The possible results are:

  • CONTINUE: do nothing
  • FIRE: trigger calculation
  • PURGE: clears the elements in the window
  • FIRE_AND_PURGE: trigger calculation and clear the elements in the window after calculation

2. Any of the above methods can be used to register processing-time or event-time timers.

Fire and purge

When the trigger determines that a window is ready for computation, it fires, i.e. it returns FIRE or FIRE_AND_PURGE. This is the signal for the window operator to emit the result of the current window. If the window uses a ProcessWindowFunction, all elements are passed to the ProcessWindowFunction; if it uses a ReduceFunction or AggregateFunction, only the aggregated result is emitted.

When the trigger fires, it can return FIRE or FIRE_AND_PURGE. FIRE retains the contents of the window, while FIRE_AND_PURGE deletes them. Flink's built-in triggers use FIRE by default and do not clear the window state.

Purging only removes the contents of the window; the window's meta information and the trigger state are kept.

Default Triggers of WindowAssigners

The default trigger of a WindowAssigner is appropriate for many situations. For example, all event-time window assigners use EventTimeTrigger by default, which fires once the watermark passes the end of the window.

GlobalWindow's default trigger is never triggered. Therefore, when using GlobalWindow, you must define a trigger yourself.

When you specify a trigger with trigger(), you override the WindowAssigner's default trigger. For example, if you attach a CountTrigger to a tumbling event-time window, the window no longer fires based on time but only on the element count. If you want to react to both time and count, you have to write a custom trigger.
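For example (a sketch in the placeholder style used above), replacing the default EventTimeTrigger with a CountTrigger:

input
    .keyBy(<key selector>)
    .window(TumblingEventTimeWindows.of(Time.seconds(5)))
    .trigger(CountTrigger.of(100)) // fire every 100 elements; the watermark no longer fires this window
    .<windowed transformation>(<window function>);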

Built-in Triggers and custom Triggers

Flink ships with several built-in triggers.

  • EventTimeTrigger: mentioned above; fires based on event time as measured by the watermark.
  • ProcessingTimeTrigger: fires based on processing time.
  • CountTrigger: fires when the number of elements in the window exceeds the preset limit.
  • PurgingTrigger: wraps another trigger and turns it into a purging trigger.

If you need to implement a custom trigger, study the abstract class Trigger. Note that this API is still evolving and may change in future versions of Flink.
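As a minimal sketch of the shape of a custom trigger, here is one that fires on every element (similar in effect to CountTrigger.of(1)); it keeps no state, so onMerge() is not overridden:

import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class EveryElementTrigger<T> extends Trigger<T, TimeWindow> {

    @Override
    public TriggerResult onElement(T element, long timestamp, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.FIRE; // evaluate the window for each arriving element
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE; // no event-time timers registered
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE; // no processing-time timers registered
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) {
        // nothing to clean up: this trigger keeps no state
    }
}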

Window Functions

After defining the window assigner, we need to specify how to compute over the data in each window once the window fires. That is the responsibility of the window function.

There are three kinds of window functions: ReduceFunction, AggregateFunction and ProcessWindowFunction. The first two are more efficient (because they pre-aggregate; see the state-size considerations below), since Flink can aggregate incrementally as each element arrives in the window. A ProcessWindowFunction instead receives an Iterable over all the data in the window, together with meta information about the window.

A windowed transformation using ProcessWindowFunction is less efficient than the other two, because Flink must buffer all elements of the window before firing it. A ProcessWindowFunction can, however, be combined with a ReduceFunction or AggregateFunction to regain efficiency: the window's data is then aggregated incrementally, while the ProcessWindowFunction still receives the window metadata. Let's look at examples of each function.

ReduceFunction

A ReduceFunction specifies how two input elements are combined into one output element of the same type. Flink uses a ReduceFunction to incrementally aggregate the elements of a window.

ReduceFunction can be defined as follows:

DataStream<Tuple2<String, Long>> input = ...;
//The following example sums the second field of the tuples in the window.
input
    .keyBy(<key selector>)
    .window(<window assigner>)
    .reduce(new ReduceFunction<Tuple2<String, Long>>() {
      public Tuple2<String, Long> reduce(Tuple2<String, Long> v1, Tuple2<String, Long> v2) {
        return new Tuple2<>(v1.f0, v1.f1 + v2.f1);
      }
    });

AggregateFunction

ReduceFunction is a special case of AggregateFunction. An AggregateFunction has three type parameters: the input type (IN), the accumulator type (ACC) and the output type (OUT), where IN is the element type of the input stream. The interface has methods to create an initial accumulator, add an element to an accumulator, merge two accumulators, and extract the output (of type OUT) from an accumulator. The example below illustrates this.

Like a ReduceFunction, Flink aggregates incrementally as input elements arrive in the window.

AggregateFunction can be defined as follows:

/**
 * The accumulator is used to keep a running sum and a count. The {@code getResult} method
 * computes the average.
 */

// The following example computes the average of the second field of all elements in the window.
private static class AverageAggregate
    implements AggregateFunction<Tuple2<String, Long>, Tuple2<Long, Long>, Double> {
  @Override
  public Tuple2<Long, Long> createAccumulator() {
    return new Tuple2<>(0L, 0L);
  }

  @Override
  public Tuple2<Long, Long> add(Tuple2<String, Long> value, Tuple2<Long, Long> accumulator) {
    return new Tuple2<>(accumulator.f0 + value.f1, accumulator.f1 + 1L);
  }

  @Override
  public Double getResult(Tuple2<Long, Long> accumulator) {
    return ((double) accumulator.f0) / accumulator.f1;
  }

  @Override
  public Tuple2<Long, Long> merge(Tuple2<Long, Long> a, Tuple2<Long, Long> b) {
    return new Tuple2<>(a.f0 + b.f0, a.f1 + b.f1);
  }
}

DataStream<Tuple2<String, Long>> input = ...;

input
    .keyBy(<key selector>)
    .window(<window assigner>)
    .aggregate(new AverageAggregate());

ProcessWindowFunction

ProcessWindowFunction is more flexible than the other window functions: it receives an Iterable containing all elements of the window plus a Context object that gives access to time and state information. This flexibility comes at the cost of performance and resource consumption, because the window's data cannot be aggregated incrementally and must all be buffered until the window fires.

The signature of ProcessWindowFunction is as follows:

public abstract class ProcessWindowFunction<IN, OUT, KEY, W extends Window> implements Function {

    /**
     * Evaluates the window and outputs none or several elements.
     *
     * @param key The key for which this window is evaluated.
     * @param context The context in which the window is being evaluated.
     * @param elements The elements in the window being evaluated.
     * @param out A collector for emitting elements.
     *
     * @throws Exception The function may throw exceptions to fail the program and trigger recovery.
     */
    public abstract void process(
            KEY key,
            Context context,
            Iterable<IN> elements,
            Collector<OUT> out) throws Exception;

   	/**
   	 * The context holding window metadata.
   	 */
   	public abstract class Context implements java.io.Serializable {
   	    /**
   	     * Returns the window that is being evaluated.
   	     */
   	    public abstract W window();

   	    /** Returns the current processing time. */
   	    public abstract long currentProcessingTime();

   	    /** Returns the current event-time watermark. */
   	    public abstract long currentWatermark();

   	    /**
   	     * State accessor for per-key and per-window state.
   	     *
   	     * <p><b>NOTE:</b>If you use per-window state you have to ensure that you clean it up
   	     * by implementing {@link ProcessWindowFunction#clear(Context)}.
   	     */
   	    public abstract KeyedStateStore windowState();

   	    /**
   	     * State accessor for per-key global state.
   	     */
   	    public abstract KeyedStateStore globalState();
   	}

}

The key parameter is the key extracted by the KeySelector specified in keyBy(). If you specified the key by tuple index or by an attribute-name string, the key type is always Tuple, and you must manually cast it to a tuple of the correct arity to extract the key fields, as sketched below.
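A short sketch of that cast, assuming the stream was keyed positionally (e.g. keyBy(0)) on a Tuple2<String, Long>, so the key has one field and arrives as a Tuple:

public class PositionallyKeyedFunction
    extends ProcessWindowFunction<Tuple2<String, Long>, String, Tuple, TimeWindow> {

  @Override
  public void process(Tuple key, Context context,
                      Iterable<Tuple2<String, Long>> elements, Collector<String> out) {
    // One key field, so the runtime key type is Tuple1
    String actualKey = ((Tuple1<String>) key).f0;
    out.collect("key = " + actualKey);
  }
}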

ProcessWindowFunction can be defined as follows:

DataStream<Tuple2<String, Long>> input = ...;

input
  .keyBy(t -> t.f0)
  .window(TumblingEventTimeWindows.of(Time.minutes(5)))
  .process(new MyProcessWindowFunction());

/* ... */

public class MyProcessWindowFunction 
    extends ProcessWindowFunction<Tuple2<String, Long>, String, String, TimeWindow> {

  @Override
  public void process(String key, Context context, Iterable<Tuple2<String, Long>> input, Collector<String> out) {
    long count = 0;
    for (Tuple2<String, Long> in: input) {
      count++;
    }
    out.collect("Window: " + context.window() + "count: " + count);
  }
}

The above example uses ProcessWindowFunction to count the elements in the window and output the information of the window itself.

Note that using ProcessWindowFunction for simple aggregation tasks such as counting is very inefficient.

ProcessWindowFunction with incremental aggregation

A ProcessWindowFunction can be combined with a ReduceFunction or AggregateFunction so that elements are aggregated incrementally as they reach the window; when the window closes, the ProcessWindowFunction receives the aggregated result. This way you get incremental aggregation and still have access to the window metadata of the ProcessWindowFunction.

Incremental aggregation can also be used with the legacy WindowFunction.

Incremental aggregation using ReduceFunction

The following example shows how to combine a ReduceFunction with a ProcessWindowFunction to return the smallest element in the window together with the window's start time.

DataStream<SensorReading> input = ...;

input
  .keyBy(<key selector>)
  .window(<window assigner>)
  .reduce(new MyReduceFunction(), new MyProcessWindowFunction());

// Function definitions

private static class MyReduceFunction implements ReduceFunction<SensorReading> {

  public SensorReading reduce(SensorReading r1, SensorReading r2) {
      return r1.value() > r2.value() ? r2 : r1;
  }
}

private static class MyProcessWindowFunction
    extends ProcessWindowFunction<SensorReading, Tuple2<Long, SensorReading>, String, TimeWindow> {

  public void process(String key,
                    Context context,
                    Iterable<SensorReading> minReadings,
                    Collector<Tuple2<Long, SensorReading>> out) {
      SensorReading min = minReadings.iterator().next();
      out.collect(new Tuple2<Long, SensorReading>(context.window().getStart(), min));
  }
}

Incremental aggregation using AggregateFunction

The following example shows how to combine an AggregateFunction with a ProcessWindowFunction to compute the average and emit it together with the window's key.

DataStream<Tuple2<String, Long>> input = ...;

input
  .keyBy(<key selector>)
  .window(<window assigner>)
  .aggregate(new AverageAggregate(), new MyProcessWindowFunction());

// Function definitions

/**
 * The accumulator is used to keep a running sum and a count. The {@code getResult} method
 * computes the average.
 */
private static class AverageAggregate
    implements AggregateFunction<Tuple2<String, Long>, Tuple2<Long, Long>, Double> {
  @Override
  public Tuple2<Long, Long> createAccumulator() {
    return new Tuple2<>(0L, 0L);
  }

  @Override
  public Tuple2<Long, Long> add(Tuple2<String, Long> value, Tuple2<Long, Long> accumulator) {
    return new Tuple2<>(accumulator.f0 + value.f1, accumulator.f1 + 1L);
  }

  @Override
  public Double getResult(Tuple2<Long, Long> accumulator) {
    return ((double) accumulator.f0) / accumulator.f1;
  }

  @Override
  public Tuple2<Long, Long> merge(Tuple2<Long, Long> a, Tuple2<Long, Long> b) {
    return new Tuple2<>(a.f0 + b.f0, a.f1 + b.f1);
  }
}

private static class MyProcessWindowFunction
    extends ProcessWindowFunction<Double, Tuple2<String, Double>, String, TimeWindow> {

  public void process(String key,
                    Context context,
                    Iterable<Double> averages,
                    Collector<Tuple2<String, Double>> out) {
      Double average = averages.iterator().next();
      out.collect(new Tuple2<>(key, average));
  }
}

Using per-window state in ProcessWindowFunction

In addition to accessing keyed state (as any rich function can), a ProcessWindowFunction can also use keyed state that is scoped to the window the function is currently processing. Here it is important to understand what "window" means in "per-window state". There are two notions involved:

  • The window definition of the windowed operation: for example, a tumbling window of one hour, or a sliding window of two hours that slides by one hour.
  • A window instance for a given key: for example, the time window from 12:00 to 13:00 for user ID XYZ. Depending on the window definition, many different window instances arise for specific keys and time spans.

Per-window state is tied to the latter. That is, if we process events for 1000 different keys and all of them fall into the [12:00, 13:00) time window, we get 1000 window instances, each with its own per-window keyed state.

The Context object passed to process() offers two methods for accessing these two kinds of state:

  • globalState(), which accesses keyed state that is not scoped to a window
  • windowState(), which accesses keyed state scoped to the current window instance

This feature is useful whenever the same window may fire multiple times, for example when late data re-triggers the window computation, or when a custom trigger fires speculatively ahead of time. In such cases per-window state can store information about previous firings or their total number.

When using per-window state, be sure to clear it when the window is removed; the cleanup belongs in the clear() method, as the following sketch shows.
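A hedged sketch of per-window state, counting how many times a particular window instance has fired (the main firing plus any late firings); the descriptor name is arbitrary:

private static class FireCountingFunction
    extends ProcessWindowFunction<Tuple2<String, Long>, String, String, TimeWindow> {

  private final ValueStateDescriptor<Integer> fireCountDesc =
      new ValueStateDescriptor<>("fire-count", Integer.class);

  @Override
  public void process(String key, Context context,
                      Iterable<Tuple2<String, Long>> elements, Collector<String> out) throws Exception {
    // windowState() scopes this state to the current key AND window instance
    ValueState<Integer> fireCount = context.windowState().getState(fireCountDesc);
    int fires = (fireCount.value() == null) ? 1 : fireCount.value() + 1;
    fireCount.update(fires);
    out.collect("window " + context.window() + " fired " + fires + " time(s)");
  }

  @Override
  public void clear(Context context) throws Exception {
    // Per-window state must be cleaned up when the window is purged
    context.windowState().getState(fireCountDesc).clear();
  }
}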

Evictors

Besides the WindowAssigner and the Trigger, Flink's window model allows specifying an optional Evictor, which is passed in via the evictor(...) method. The evictor can remove elements from the window after the trigger fires and before and/or after the window function runs. The Evictor interface provides two methods for this:

/**
 * Optionally evicts elements. Called before windowing function.
 *
 * @param elements The elements currently in the pane.
 * @param size The current number of elements in the pane.
 * @param window The {@link Window}
 * @param evictorContext The context for the Evictor
 */
void evictBefore(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext evictorContext);

/**
 * Optionally evicts elements. Called after windowing function.
 *
 * @param elements The elements currently in the pane.
 * @param size The current number of elements in the pane.
 * @param window The {@link Window}
 * @param evictorContext The context for the Evictor
 */
void evictAfter(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext evictorContext);

evictBefore() contains the eviction logic applied before the window function is called, while evictAfter() contains the logic applied afterwards. Elements evicted before the window function runs are not processed by it.

Flink ships with three built-in evictors:

  • CountEvictor: keeps a user-specified number of elements in the window and evicts the surplus from the beginning of the window buffer.
  • DeltaEvictor: takes a DeltaFunction and a threshold, computes the delta between the last element and each element in the window buffer, and removes all elements whose delta is greater than or equal to the threshold.
  • TimeEvictor: takes an interval in milliseconds, finds the maximum timestamp max_ts among the elements in the window, and removes all elements whose timestamps are smaller than max_ts - interval.

By default, all built-in evictor logic is executed before calling the window function.

Specifying an evictor prevents pre-aggregation, because every element of the window has to pass through the evictor before the computation.

Flink does not guarantee the order of elements within a window: an element evicted from the beginning of the window buffer is not necessarily the one that arrived first or last.
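For example (a sketch in the placeholder style used above), keeping only the 10 most recent elements of each window with the built-in CountEvictor:

input
    .keyBy(<key selector>)
    .window(<window assigner>)
    .evictor(CountEvictor.of(10)) // evict surplus elements before the window function runs
    .<windowed transformation>(<window function>);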

Late data

When working with event-time windows, data can be late: the watermark Flink uses to track event-time progress has already passed the end timestamp of a window by the time the data arrives. Late data is really a special case of out-of-order data: it arrives so much later than the watermark anticipated that the window has already been closed.

Generally, for late data, three methods are adopted:

  • Reactivate the closed window and recalculate to correct the result
  • Collect the late data and process it separately
  • Discard the late data as an error message

Flink's default behaviour is to discard late data outright. The other two approaches are Side Output and Allowed Lateness.

The Side Output mechanism routes late events into a separate branch of the data stream, a by-product of the window computation that users can fetch and handle specially.

The Allowed Lateness mechanism lets users set a maximum allowed lateness. Flink keeps the state of a window after the window closes, until the allowed lateness expires; late events arriving within this period are not discarded but, by default, trigger a recomputation of the window. Because keeping the window state needs extra memory, and because with the ProcessWindowFunction API every late event may trigger a full recomputation of the window, which is expensive, the allowed lateness should not be set too long and late events should be rare; otherwise, consider slowing down the watermark advance or adjusting the algorithm.

Allowed Lateness

By default, late data is discarded as soon as the watermark passes the end timestamp of the window. Flink, however, lets you specify a maximum allowed lateness for window operators. Allowed lateness defines how long an element may be late without being discarded; it defaults to 0. Elements that arrive after the watermark has passed the window end but before it passes the window end plus the allowed lateness are still added to the window. Depending on the trigger, a late but not discarded element may fire the window again, as with EventTimeTrigger.

To support this, Flink keeps the window's state until the allowed lateness expires; only then does it remove the window and delete its state (as described in Window Lifecycle).

By default the allowed lateness is 0, i.e. elements arriving behind the watermark are discarded.

You can specify allowed lateness as follows:

DataStream<T> input = ...;

input
    .keyBy(<key selector>)
    .window(<window assigner>)
    .allowedLateness(<time>)
    .<windowed transformation>(<window function>);

When using GlobalWindows, no data is ever considered late, because the end timestamp of the global window is Long.MAX_VALUE.

Side output

Flink's side output (bypass stream) feature gives you a data stream containing the late data.

First call sideOutputLateData(OutputTag) on the windowed stream to declare that you want to capture late data; then you can obtain the side output stream from the result of the windowed operation.

final OutputTag<T> lateOutputTag = new OutputTag<T>("late-data"){};

DataStream<T> input = ...;

SingleOutputStreamOperator<T> result = input
    .keyBy(<key selector>)
    .window(<window assigner>)
    .allowedLateness(<time>)
    .sideOutputLateData(lateOutputTag)
    .<windowed transformation>(<window function>);

DataStream<T> lateStream = result.getSideOutput(lateOutputTag);

Some considerations about late data

When an allowed lateness greater than 0 is specified, the window and its contents are kept after the watermark passes the window end. A late but not discarded element arriving then may fire the window again. Such a firing is called a "late firing", as opposed to the "main firing", the first time the window fires. With session windows, late firings can additionally lead to window merging, since they may bridge the gap between two existing, not yet merged windows.

You should treat the elements emitted by a late firing as updates to the previous result, i.e. your data stream will contain multiple results for the same computation. Your application must either tolerate these duplicate results or deduplicate them.

Considerations about state size

Windows can be defined over long periods (such as days, weeks, or months) and can therefore accumulate very large state. A few rules to keep in mind when estimating the storage requirements of a window computation:

  1. Flink creates one copy of each element per window it belongs to. With tumbling windows, each element therefore exists in exactly one copy (an element belongs to exactly one window unless it is late). With sliding windows, by contrast, an element may be copied into several windows, as described under Window Assigners; so a sliding window of one day with a slide of one second is probably a bad idea.

  2. ReduceFunction and AggregateFunction can greatly reduce the storage requirements, because they eagerly aggregate arriving elements and store only one value per window. Using ProcessWindowFunction, by contrast, requires buffering all elements of the window.

  3. Using an Evictor prevents any pre-aggregation, because every element of the window has to pass through the evictor before the computation.

Reference resources

Windows | Apache Flink

[Vernacular analysis] Flink's Watermark mechanism - Rossi's thinking - cnblogs

Apache Flink Talk Series (03) - Watermark - Alibaba Cloud developer community

Topics: Big Data flink