Flink_06_ProcessAPI (personal summary)

Posted by springo on Thu, 24 Feb 2022 06:54:29 +0100

Statement: 1 ***
              2. Because this is a personal summary, the article is written as concisely as possible
              3. If anything is wrong or inappropriate, please point it out

Side output

A side output is a secondary stream that branches off the main stream. It can be used to receive late data or to split data into several streams

With sliding windows, one element belongs to several overlapping windows. A late element enters the side output stream only when none of those windows can still accept it
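The sliding-window rule above can be sketched in plain Java, without Flink on the classpath. This is a simplified model rather than Flink's implementation: the class and method names are hypothetical, windows are assumed to have no offset, and a window is assumed to expire once the watermark reaches its last timestamp plus the allowed lateness.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: models which sliding windows an
// element belongs to and whether the element would go to the side output.
class SlidingLateness {

    // Start times of every window [start, start + size) that contains ts.
    static List<Long> windowStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - Math.floorMod(ts, slide);
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    // Assumption: a window has expired once watermark >= (end - 1) + allowedLateness.
    static boolean isExpired(long start, long size, long allowedLateness, long watermark) {
        return (start + size - 1) + allowedLateness <= watermark;
    }

    // The element is sent to the side output only if every window it belongs to has expired.
    static boolean goesToSideOutput(long ts, long size, long slide,
                                    long allowedLateness, long watermark) {
        for (long start : windowStarts(ts, size, slide)) {
            if (!isExpired(start, size, allowedLateness, watermark)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 10 s windows sliding every 5 s; element with timestamp 7000 ms.
        System.out.println(windowStarts(7000, 10000, 5000));
        System.out.println(goesToSideOutput(7000, 10000, 5000, 0, 12000));
        System.out.println(goesToSideOutput(7000, 10000, 5000, 0, 15000));
    }
}
```

In this model the element with timestamp 7000 belongs to the windows starting at 5000 and at 0. With a watermark of 12000 the window starting at 5000 is still open, so the element is accepted there; with a watermark of 15000 both windows have expired and the element would be routed to the side output.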

Only the process functions, the lowest-level API, can write to a side output stream through the context (ctx.output)

Case: send readings with a temperature below 30 degrees to a side output

// Define the side output tag. Note the trailing { }: it creates an anonymous subclass
// so that the generic type is preserved. A plain new OutputTag<SensorReading>("lowTemp") cannot be used directly
final OutputTag<SensorReading> lowTempTag = new OutputTag<SensorReading>("lowTemp") { };

SingleOutputStreamOperator<SensorReading> highTempStream = dataStream.process(new ProcessFunction<SensorReading, SensorReading>( ) {
    @Override
    public void processElement(SensorReading value, Context ctx, Collector<SensorReading> out) {
        if (value.getTemperature( ) < 30) {
            ctx.output(lowTempTag, value);
        } else {
            out.collect(value);
        }
    }
});
DataStream<SensorReading> lowTempStream = highTempStream.getSideOutput(lowTempTag);
highTempStream.print("high");
lowTempStream.print("low");

8 process APIs:

  1. ProcessFunction

  2. KeyedProcessFunction

    You have to call keyBy first.

    Each element of the stream is processed, and any number of elements can be emitted through out.collect(xxx)

    • processElement(I value, Context ctx, Collector<O> out)

      ctx can

      1. Access the timestamp of the element

      2. Access the key of the element

      3. Access the TimerService (ctx.timerService())

        TimerService:

        method:

        1. EventTime related
          • long currentWatermark() returns the current event-time watermark
          • void registerEventTimeTimer(long timestamp) registers an event-time timer for the current key
          • void deleteEventTimeTimer(long timestamp) deletes the timer; if no such timer exists, nothing happens
        2. ProcessingTime related
          • long currentProcessingTime() returns the current processing time
          • void registerProcessingTimeTimer(long timestamp) registers a processing-time timer for the current key
          • void deleteProcessingTimeTimer(long timestamp) deletes the timer; if no such timer exists, nothing happens
        • When a timer fires, the callback function onTimer() is executed

        • If a timer should fire when a window closes, it is better to register it with a small delay after WindowEndTime (1s here);

          At the boundary, both the window calculation and the timer have to be triggered;

          The timer task relies on the window result being computed first, so the delay makes sure the window has already fired
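The timer behaviour listed above can be modelled with a small plain-Java class. This is a hypothetical toy, not Flink's TimerService: it passes the key explicitly (Flink takes it from the keyed context), covers event time only, and returns the fired timers as strings in place of calling onTimer().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model of per-key event-time timers; not Flink's TimerService.
class ToyTimerService {
    // key -> timer timestamps. A set, so the same (key, timestamp) pair is stored once.
    private final Map<String, TreeSet<Long>> timers = new TreeMap<>();
    private long watermark = Long.MIN_VALUE;

    long currentWatermark() {
        return watermark;
    }

    void registerEventTimeTimer(String key, long timestamp) {
        timers.computeIfAbsent(key, k -> new TreeSet<>()).add(timestamp);
    }

    // Deleting a timer that was never registered does nothing.
    void deleteEventTimeTimer(String key, long timestamp) {
        TreeSet<Long> perKey = timers.get(key);
        if (perKey != null) {
            perKey.remove(timestamp);
        }
    }

    // Advances the watermark and returns "key@timestamp" for every timer that
    // fires; each entry stands in for one onTimer() callback.
    List<String> advanceWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        List<String> fired = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Long>> entry : timers.entrySet()) {
            TreeSet<Long> perKey = entry.getValue();
            while (!perKey.isEmpty() && perKey.first() <= watermark) {
                fired.add(entry.getKey() + "@" + perKey.pollFirst());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        ToyTimerService service = new ToyTimerService();
        service.registerEventTimeTimer("sensor_1", 1000L);
        service.registerEventTimeTimer("sensor_1", 1000L); // duplicate, stored once
        service.deleteEventTimeTimer("sensor_1", 2000L);   // never registered, no effect
        System.out.println(service.advanceWatermark(999L));
        System.out.println(service.advanceWatermark(1000L));
    }
}
```

Registering the same timestamp twice for one key produces a single callback in this model, which matches Flink's documented deduplication of timers per key and timestamp.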

        Case requirement: raise an alarm if the temperature keeps rising for 10 seconds (processing time)

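        import org.apache.flink.api.common.state.ValueState;
        import org.apache.flink.api.common.state.ValueStateDescriptor;
        import org.apache.flink.configuration.Configuration;
        import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
        import org.apache.flink.util.Collector;

        // SensorReading is the POJO used throughout these notes
        public class TempIncreaseWarning extends KeyedProcessFunction<String, SensorReading, String> {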
            private Integer interval;
        
            public TempIncreaseWarning(Integer interval) {
                this.interval = interval;
            }
        
            // Record the last temperature
            private ValueState<Double> lastTempState;
            // Record timer trigger time
            private ValueState<Long> timerTsState;
        
            @Override
            public void open(Configuration parameters) throws Exception {
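                // The default is Double.MIN_VALUE (the smallest positive double), so the first positive reading counts as a rise
                lastTempState = getRuntimeContext( ).getState(new ValueStateDescriptor<Double>("last-temp", Double.class, Double.MIN_VALUE));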
                timerTsState = getRuntimeContext( ).getState(new ValueStateDescriptor<Long>("timer-ts", Long.class));
            }
        
        
            @Override
            public void processElement(SensorReading value, Context ctx, Collector<String> out) throws Exception {
                // Take out status
                Double lastTemp = lastTempState.value( );
                Long timerTs = timerTsState.value( );
        
                // Update the temperature state
                lastTempState.update(value.getTemperature( ));
                // The temperature rose and no timer is registered yet: register one
                if (value.getTemperature( ) > lastTemp && timerTs == null) {
                    long ts = ctx.timerService( ).currentProcessingTime( ) + interval * 1000L;
                    // Register the timer
                    ctx.timerService( ).registerProcessingTimeTimer(ts);
                    // Save the timestamp so that the timer can be deleted later
                    timerTsState.update(ts);
                }
                // The temperature did not rise and a timer exists: delete it
                else if (value.getTemperature( ) <= lastTemp && timerTs != null) {
                    // Delete the timer. Use the saved timestamp timerTs here, not a newly computed ts
                    ctx.timerService( ).deleteProcessingTimeTimer(timerTs);
                    timerTsState.clear( );
                }
            }
        
            @Override
            public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
                out.collect("Sensor " + ctx.getCurrentKey( ) + ": temperature has risen continuously for " + interval + " s");
                timerTsState.clear( );
            }
        }
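To check the alarm logic by hand, the same decisions can be traced in plain Java without Flink. This is only a sketch under assumptions: the class TempRiseTrace and its method are hypothetical, processing time is simulated by the arrival times passed in, and a due timer is assumed to fire before the next reading is handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone trace of the alarm logic for a single key; no Flink involved.
class TempRiseTrace {

    // temps[i] arrives at simulated processing time times[i] (ms, ascending).
    // Returns the timestamps of the timers that fire up to endTime.
    static List<Long> alarms(double[] temps, long[] times, long intervalMs, long endTime) {
        List<Long> fired = new ArrayList<>();
        double lastTemp = Double.MIN_VALUE; // same default as the "last-temp" state
        Long timerTs = null;                // null means no timer is registered

        for (int i = 0; i < temps.length; i++) {
            // Processing time has advanced to times[i]: fire a due timer first.
            if (timerTs != null && timerTs <= times[i]) {
                fired.add(timerTs);
                timerTs = null;
            }
            if (temps[i] > lastTemp && timerTs == null) {
                // Rise without a timer: register one.
                timerTs = times[i] + intervalMs;
            } else if (temps[i] <= lastTemp && timerTs != null) {
                // No rise while a timer exists: delete it.
                timerTs = null;
            }
            lastTemp = temps[i];
        }
        // Fire a timer that becomes due before the simulation ends.
        if (timerTs != null && timerTs <= endTime) {
            fired.add(timerTs);
        }
        return fired;
    }

    public static void main(String[] args) {
        double[] temps = {35.0, 36.0, 37.0, 38.0};
        long[] times = {0L, 2000L, 5000L, 11000L};
        System.out.println(alarms(temps, times, 10000L, 12000L));
    }
}
```

The trace makes one property of the design visible: because the stored default is Double.MIN_VALUE, the very first positive reading already counts as a rise and starts the 10-second timer.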
        
      4. Output data to side output stream

    • onTimer(long timestamp, OnTimerContext ctx, Collector<O> out) is a callback function that is called when a previously registered timer fires.

    timestamp is the time the timer was registered to fire at

    If you register a timer for a time that has already passed, it still fires: an event-time timer fires the next time the watermark advances, which usually means when the next data arrives

  3. CoProcessFunction

    Used to process the connected stream produced by connect()

    It has processElement1() and processElement2(), one for each input stream

  4. ProcessJoinFunction

  5. BroadcastProcessFunction

    Suppose stream A has one partition and stream B has four. Every partition of stream B needs the data of stream A, so the data in stream A's single partition must be broadcast to all four partitions of stream B

    Call process after broadcasting
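The partition picture above can be sketched as a toy model in plain Java. Nothing here is Flink API: ToyBroadcast and Subtask are hypothetical names, and only the two method names processElement and processBroadcastElement are borrowed from the callbacks of BroadcastProcessFunction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of broadcasting stream A to the subtasks of stream B.
class ToyBroadcast {

    // One parallel subtask of stream B, holding its own copy of stream A's data.
    static class Subtask {
        final List<String> rules = new ArrayList<>();
        final List<String> output = new ArrayList<>();

        // Called for every element of the broadcast stream A.
        void processBroadcastElement(String rule) {
            rules.add(rule);
        }

        // Called for an element of stream B routed to this subtask.
        void processElement(String value) {
            output.add(value + " checked against " + rules);
        }
    }

    final List<Subtask> subtasks = new ArrayList<>();

    ToyBroadcast(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            subtasks.add(new Subtask());
        }
    }

    // An element of stream A is copied to every subtask.
    void broadcast(String rule) {
        for (Subtask subtask : subtasks) {
            subtask.processBroadcastElement(rule);
        }
    }

    // An element of stream B is handled by exactly one subtask.
    void send(int partition, String value) {
        subtasks.get(partition).processElement(value);
    }

    public static void main(String[] args) {
        ToyBroadcast job = new ToyBroadcast(4);
        job.broadcast("rule-1");
        job.send(2, "event-x");
        System.out.println(job.subtasks.get(2).output);
    }
}
```

Each of the four subtasks ends up with its own copy of the broadcast data, while an ordinary element is seen by one subtask only.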

  6. KeyedBroadcastProcessFunction

  7. ProcessWindowFunction

    For example: aggregate(AggregateFunction<IN, ACC, OUT> aggFunction, ProcessWindowFunction<IN, OUT, KEY, W> windowFunction)

  8. ProcessAllWindowFunction
