Using timers in a Flink program
Sometimes a computing task needs a timer to support the business logic, for example automatic settlement of orders, automatic praise (a default positive review), or periodic collection.
Note, however, that a Flink timer cannot be configured with a CRON expression. We can only specify the point in time at which it should fire.
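Because a timer only accepts an absolute trigger time, a recurring schedule has to be approximated by computing the next trigger timestamp ourselves and registering a new timer each time the previous one fires. The following is a minimal plain-Java sketch of that calculation. The class NextTrigger and the method nextAligned are hypothetical names used for illustration and are not part of Flink.

```java
public class NextTrigger {

    /**
     * Returns the first timestamp strictly after nowMillis that is a
     * multiple of intervalMillis (for example, the next full minute).
     * Hypothetical helper, not a Flink API.
     */
    public static long nextAligned(long nowMillis, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        // floorMod keeps the remainder non-negative even for negative timestamps
        return nowMillis - Math.floorMod(nowMillis, intervalMillis) + intervalMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // The result is the value one would pass when registering the next timer
        System.out.println("now=" + now + ", next full minute=" + nextAligned(now, 60_000L));
    }
}
```

In a Flink job, the returned value would be handed to the timer registration call inside the timer callback, so that each firing schedules the next one.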
1, What kind of Flink job can use timers
A job that needs to register timers must process its data with the low-level KeyedProcessFunction on a keyed stream, not with a Window.
We run our data processing logic and register the timer in the processElement method.
The onTimer method is the callback that is executed when the timer fires.
2, Timer demo
Result display
Order time and automatic praise time
When the automatic praise time of an order is reached, check whether the order has been evaluated. If it has not, apply the automatic praise.
3, Implementation
(1) Custom data source
The data is hard-coded here. You can extend RichSourceFunction in whatever way suits your own needs.
    package com.leilei;

    import cn.hutool.core.util.RandomUtil;
    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

    import java.util.UUID;
    import java.util.concurrent.atomic.AtomicInteger;

    /**
     * @author lei
     * @version 1.0
     * @date 2021/3/21 22:00
     * @desc Simulated order source
     */
    public class MyOrderSource extends RichSourceFunction<Order> {
        // volatile, because cancel() is called from a different thread than run()
        private volatile boolean flag = true;
        private final String[] products = new String[]{"Braised chicken and rice", "Beijing Roast Duck", "Bridgehead spareribs"};
        private final String[] users = new String[]{"Ma bond", "Huang sirang", "Zhang Mazi"};
        // Counts emitted orders; initialized here so that run() does not throw a NullPointerException
        private final AtomicInteger num = new AtomicInteger();

        @Override
        public void run(SourceContext<Order> ctx) throws Exception {
            while (flag) {
                Order order = Order.builder()
                        .product(products[RandomUtil.randomInt(products.length)])
                        .username(users[RandomUtil.randomInt(users.length)])
                        .orderId(UUID.randomUUID().toString().replace("-", "."))
                        .orderTime(System.currentTimeMillis())
                        .build();
                Thread.sleep(5000);
                // The commented-out code is for simulating a single timer per key,
                // where a later registration overwrites the earlier one
                //if (num.get() < 4) {
                //    Thread.sleep(5000);
                //} else {
                //    Thread.sleep(500000);
                //}
                num.incrementAndGet();
                ctx.collect(order);
            }
        }

        @Override
        public void cancel() {
            flag = false;
        }
    }
    package com.leilei;

    import lombok.AllArgsConstructor;
    import lombok.Builder;
    import lombok.Data;
    import lombok.NoArgsConstructor;

    /**
     * @author lei
     * @version 1.0
     * @date 2021/3/21 22:00
     * @desc Order simulation object
     */
    @NoArgsConstructor
    @AllArgsConstructor
    @Data
    @Builder
    public class Order {
        private String product;
        private String username;
        private String orderId;
        private Long orderTime;
    }
(2) Custom order KEY generator
    package com.leilei;

    import java.text.SimpleDateFormat;

    /**
     * @author lei
     * @version 1.0
     * @desc Builds the grouping key for an order
     * @date 2021-03-23 11:16
     */
    public class KeyUtil {
        public static String buildKey(Order order) {
            String date = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(order.getOrderTime());
            return order.getUsername() + "--" + order.getProduct() + "--" + order.getOrderId() + "--Order time :" + date;
            // The commented-out code is for simulating a single timer per key,
            // where a later registration overwrites the earlier one
            //return order.getUsername();
        }
    }
(3) ProcessFunction logic and timers
(1) Define the low-level process function
We need to extend the abstract class KeyedProcessFunction, because the computing task is grouped by KEY.
(2) Write the calculation logic and the timer registration in the process function
With a ProcessFunction, each element triggers the processElement method exactly once. This is true stream processing: elements are handled one at a time. A window, by contrast, collects elements into groups whose size we define (the window is the bridge between stream and batch processing).
value: the input element passed to processElement
ctx: the execution context, through which we can register timers, get the current KEY, emit to a side output (OutputTag), and so on
out: the collector for the output elements
(1) Register timer
A timer can be triggered by processing time or by event time. We register whichever kind fits our use case.
(2) What is the difference between a processing-time timer and an event-time timer?
Processing time: the system time of the machine on which the Flink operator processes the data. It advances with the machine clock, not with the data.
Example: three elements arrive in the order C, B, A.
C enters the operator when the clock on the machine running the Flink job shows 12:00:00, so the processing time of the job is 12:00:00. C registers a processing-time timer for 12:02:00.
B enters the operator when the machine clock shows 12:01:00, so the processing time of the job is 12:01:00.
A enters the operator when the machine clock shows 12:02:00, so the processing time of the job is 12:02:00.
The timer registered by C fires once the processing time is greater than or equal to 12:02:00. Because processing time follows the machine clock, the timer would fire at 12:02:00 even if A had never arrived.
Event time: the time attribute carried by the data itself. Whether it advances depends on the timestamps in the source data, and Flink tracks it with the watermark. How the event time advances is also affected by whether the stream has a KEY; see the earlier TimeWindow article for details.
Example: three elements arrive in the order C, B, A. They reach the Flink program a little late, but each carries its own time attribute: C (12:00:00), B (12:01:00), A (12:01:00). For simplicity, assume the event time of the job follows the largest timestamp seen so far.
C enters the operator and registers an event-time timer for 12:02:00. The timestamp of C (12:00:00) becomes the latest event time of the Flink job.
B enters the operator. The event time of B is 12:01:00, which becomes the latest event time of the Flink job.
A enters the operator. The event time of A is also 12:01:00, equal to the current event time of the job, so the event time of the job stays at 12:01:00.
After A has been processed, the timer registered by C (12:02:00) still does not fire, because the event time of the job has only reached 12:01:00. The timer waits until data with an event time greater than or equal to 12:02:00 arrives.
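The two walkthroughs above can be replayed with a small plain-Java simulation. This is only a sketch of the firing rule (a timer fires once the current time is greater than or equal to its timestamp), not Flink's implementation. The class TimerClockSimulation, its methods, and the extra element D are hypothetical and exist only for illustration. For event time it assumes, as a simplification, that time follows the largest element timestamp seen so far.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class TimerClockSimulation {
    // Pending timers, smallest timestamp first
    private final PriorityQueue<Long> timers = new PriorityQueue<>();
    private long currentTime = Long.MIN_VALUE;

    public void registerTimer(long timestamp) {
        timers.add(timestamp);
    }

    /** Moves time forward (never backwards) and returns the timers that fire. */
    public List<Long> advanceTo(long time) {
        currentTime = Math.max(currentTime, time);
        List<Long> fired = new ArrayList<>();
        while (!timers.isEmpty() && timers.peek() <= currentTime) {
            fired.add(timers.poll());
        }
        return fired;
    }

    /** Converts a time of day to milliseconds since midnight. */
    public static long millis(int hour, int minute) {
        return (hour * 60L + minute) * 60_000L;
    }

    public static void main(String[] args) {
        // Processing time: the machine clock moves on its own
        TimerClockSimulation processing = new TimerClockSimulation();
        processing.advanceTo(millis(12, 0));
        processing.registerTimer(millis(12, 2));                  // registered by C
        System.out.println(processing.advanceTo(millis(12, 1)));  // nothing fires yet
        System.out.println(processing.advanceTo(millis(12, 2)));  // the timer fires

        // Event time: time moves only when an element carries a later timestamp
        TimerClockSimulation event = new TimerClockSimulation();
        event.advanceTo(millis(12, 0));                           // C (12:00:00)
        event.registerTimer(millis(12, 2));                       // registered by C
        System.out.println(event.advanceTo(millis(12, 1)));       // B: nothing fires
        System.out.println(event.advanceTo(millis(12, 1)));       // A: nothing fires
        System.out.println(event.advanceTo(millis(12, 2)));       // hypothetical D (12:02:00): fires
    }
}
```

The same class serves both cases because the firing rule is identical. What differs is the source of the time: the machine clock for processing time, the data for event time.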
(3) Complete code of processing function
    package com.leilei;

    import org.apache.flink.api.common.state.MapState;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    import java.text.SimpleDateFormat;
    import java.util.Iterator;
    import java.util.Map;

    /**
     * @author lei
     * @version 1.0
     * @date 2021/3/21 22:17
     * @desc Simulates the automatic evaluation of orders
     */
    public class OrderSettlementProcess extends KeyedProcessFunction<String, Order, Object> {
        // Delay in milliseconds between the order time and the automatic evaluation
        private final Long overTime;
        // Keyed state: grouping key -> timestamp of the registered timer
        MapState<String, Long> productState;
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

        public OrderSettlementProcess(Long overTime) {
            this.overTime = overTime;
        }

        /**
         * Data processing
         *
         * @param currentOrder the incoming order
         * @param ctx          the context, used to register timers and get the current key
         * @param out          the output collector
         * @throws Exception
         */
        @Override
        public void processElement(Order currentOrder, Context ctx, Collector<Object> out) throws Exception {
            long time = currentOrder.getOrderTime() + this.overTime;
            // Register a processing-time timer that fires at currentOrder.getOrderTime() + this.overTime
            ctx.timerService().registerProcessingTimeTimer(time);
            // The commented-out code is for simulating a single timer per key:
            // the timer registered earlier for this key is deleted
            //if (productState.contains(ctx.getCurrentKey())) {
            //    ctx.timerService().deleteProcessingTimeTimer(productState.get(ctx.getCurrentKey()));
            //}
            productState.put(ctx.getCurrentKey(), time);
            System.out.println(KeyUtil.buildKey(currentOrder) + " Order expires on:" + time + " :" + df.format(time));
        }

        /**
         * Timer callback
         *
         * @param timestamp the timestamp passed to ctx.timerService().registerProcessingTimeTimer(time) above
         * @param ctx       the context, used to get the grouping key of the current process, register timers, etc.
         * @param out       the output collector
         * @throws Exception
         */
        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<Object> out) throws Exception {
            super.onTimer(timestamp, ctx, out);
            System.out.println("Scheduled task execution:" + timestamp + ":" + ctx.getCurrentKey());
            Iterator<Map.Entry<String, Long>> orderIterator = productState.iterator();
            if (orderIterator.hasNext()) {
                Map.Entry<String, Long> orderEntry = orderIterator.next();
                String key = orderEntry.getKey();
                Long expire = orderEntry.getValue();
                // Simulate a call that queries the order status
                if (!isEvaluation(key) && expire == timestamp) {
                    //todo data collection
                    System.err.println(key + ">>>>> If the order is not evaluated and the maximum evaluation time is exceeded, the default setting is five-star praise!");
                } else {
                    System.out.println(key + "Order has been evaluated!");
                }
            }
        }

        @Override
        public void open(Configuration parameters) throws Exception {
            super.open(parameters);
            // Define the map state
            MapStateDescriptor<String, Long> mapStateDescriptor =
                    new MapStateDescriptor<>("productState", TypeInformation.of(String.class), TypeInformation.of(Long.class));
            productState = getRuntimeContext().getMapState(mapStateDescriptor);
        }

        public Boolean isEvaluation(String orderKey) {
            //todo query order status
            return false;
        }
    }
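The commented-out block in processElement keeps at most one pending timer per key by deleting the previously registered one before remembering the new timestamp. Flink already deduplicates timers that share both key and timestamp, but two different timestamps for the same key produce two timers unless the older one is deleted. The following plain-Java sketch models that behavior. The class SingleTimerPerKey and its methods are hypothetical stand-ins for the timer service and the map state, not Flink APIs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SingleTimerPerKey {
    // Stand-in for the timer service: pending timer timestamps per key
    private final Map<String, Set<Long>> timers = new HashMap<>();
    // Stand-in for the map state: last registered timestamp per key
    private final Map<String, Long> lastTimer = new HashMap<>();

    /**
     * Registers a timer for the key. With replacePrevious set, the timer
     * remembered for this key is deleted first, as in the commented-out code.
     */
    public void register(String key, long timestamp, boolean replacePrevious) {
        Set<Long> pending = timers.computeIfAbsent(key, k -> new TreeSet<>());
        Long previous = lastTimer.get(key);
        if (replacePrevious && previous != null) {
            pending.remove(previous);
        }
        // A Set models the deduplication of identical (key, timestamp) pairs
        pending.add(timestamp);
        lastTimer.put(key, timestamp);
    }

    public Set<Long> pendingTimers(String key) {
        return timers.getOrDefault(key, Collections.emptySet());
    }

    public static void main(String[] args) {
        SingleTimerPerKey keepAll = new SingleTimerPerKey();
        keepAll.register("Zhang Mazi", 1_000L, false);
        keepAll.register("Zhang Mazi", 2_000L, false);
        System.out.println(keepAll.pendingTimers("Zhang Mazi"));     // both timers stay pending

        SingleTimerPerKey keepLatest = new SingleTimerPerKey();
        keepLatest.register("Zhang Mazi", 1_000L, true);
        keepLatest.register("Zhang Mazi", 2_000L, true);
        System.out.println(keepLatest.pendingTimers("Zhang Mazi"));  // only the later timer remains
    }
}
```

With the per-order key built by KeyUtil.buildKey every key is unique, so the distinction only matters when the key is coarser, such as the username alone.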
(4) Flink timer DEMO main startup class
    package com.leilei;

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.DataStreamSource;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    /**
     * @author lei
     * @version 1.0
     * @date 2021/3/21 22:00
     * @desc Flink timer demo (automatic praise, overtime settlement, etc.)
     */
    public class FlinkTimer {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            DataStreamSource<Order> streamSource = env.addSource(new MyOrderSource());
            DataStream<Object> stream = streamSource
                    // Group by order
                    .keyBy(KeyUtil::buildKey)
                    // Orders are evaluated automatically after 120 seconds
                    .process(new OrderSettlementProcess(120 * 1000L));
            stream.printToErr();
            try {
                env.execute();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }