FlinkCEP introduction and getting-started example (based on the Flink 1.13.x source code)

Posted by tgavin on Thu, 09 Dec 2021 06:47:45 +0100

1. What is FlinkCEP?

1.1. CEP

CEP (complex event processing) uses rules to find relevant combinations of events across one or more event streams, and then processes the combinations it finds.

CEP first captures fine-grained events (basic or simple events), then analyzes them against event patterns to find more meaningful events (composite events), and finally decides what action to take. Analyzing events to find the more meaningful ones is the core of CEP and also its hardest part. For background on the CEP concept, please refer to Easy to understand CEP Technology.

CEP is an analysis technique for event streams in a dynamic environment. An event here usually means a collected data record, such as a transaction record, and events arrive continuously. Detection rules are formulated from the timing and aggregation relationships between events, using techniques such as filtering, correlation, and aggregation. The event stream is then queried continuously for event sequences that satisfy these rules, and more complex composite events are derived from them.

1.2. FlinkCEP

FlinkCEP (complex event processing for Flink) is a complex event processing library built on top of Flink.
It detects event patterns in unbounded or bounded streams so that the value in the data can be extracted.

1.3. Features

  • Objective: find composite events, i.e. matches of the defined event patterns, in ordered streams of simple events
  • Input: one or more event streams composed of simple events (a watermark strategy must be specified so that timing relationships can be analyzed)
  • Handling: identify the relationships between simple events; multiple simple events that together satisfy a rule form a complex event
  • Output: the complex events that satisfy the rules
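
The watermark requirement on the input can be illustrated without Flink. The following is a minimal plain-Java sketch, not Flink's own code, of the bounded-out-of-orderness idea: the watermark trails the largest timestamp seen so far by a fixed delay, and timing rules are only evaluated up to the watermark. The class and method names are hypothetical; the extra millisecond that is subtracted follows what Flink's built-in bounded-out-of-orderness generator does.

```java
// Plain-Java sketch of a bounded-out-of-orderness watermark (hypothetical class, not Flink code).
class BoundedWatermarkSketch {
    private final long maxOutOfOrdernessMillis;
    private long maxTimestampSeen;

    BoundedWatermarkSketch(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
        // Start low enough that currentWatermark() cannot underflow before the first event.
        this.maxTimestampSeen = Long.MIN_VALUE + maxOutOfOrdernessMillis + 1;
    }

    /** Records the timestamp of an arriving event; late events never move the maximum back. */
    void onEvent(long eventTimestampMillis) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMillis);
    }

    /** Events with a timestamp less than or equal to this value are assumed to have arrived. */
    long currentWatermark() {
        return maxTimestampSeen - maxOutOfOrdernessMillis - 1;
    }

    public static void main(String[] args) {
        BoundedWatermarkSketch wm = new BoundedWatermarkSketch(3_000L);
        wm.onEvent(10_000L);
        wm.onEvent(8_000L);   // arrives out of order, but within the 3-second bound
        wm.onEvent(12_000L);
        System.out.println(wm.currentWatermark()); // prints 8999
    }
}
```

A larger delay tolerates more disorder but postpones the moment at which a match can be reported.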

1.4. FlinkCEP application scenarios

CEP has many application scenarios, such as stock trend prediction, network intrusion detection, logistics order tracking, e-commerce orders, and IoT.
They generally fall into the following three categories:

  • Risk control: detect abnormal user behavior patterns in real time. When a user does something that should not happen, judge whether the user is suspected of an illegal operation. For example, when the same bank card is swiped in two different places within 10 minutes, an alarm is triggered to help detect card theft.
  • Strategy marketing: use predefined rules to track users' behavior in real time, and push the corresponding promotion to users whose behavior matches the rules.
  • Operation and maintenance monitoring: flexibly combine multiple metrics and the dependencies between them to implement more complex monitoring patterns.
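
The bank card rule from the risk control category can be sketched in a few lines of plain Java. This is only an illustration of the rule itself, not a FlinkCEP job: the Swipe type, its fields, and the sample data are hypothetical, the input is assumed to be ordered by time, and each swipe is compared only with the previous swipe of the same card.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the "same card, two places, 10 minutes" rule (hypothetical names).
class CardAlertSketch {
    /** A simple event: one card swipe. */
    static final class Swipe {
        final String cardId;
        final String city;
        final Instant time;

        Swipe(String cardId, String city, Instant time) {
            this.cardId = cardId;
            this.city = city;
            this.time = time;
        }
    }

    static final Duration WINDOW = Duration.ofMinutes(10);

    /** Returns one alert per swipe that follows a swipe of the same card in another city within WINDOW. */
    static List<String> detect(List<Swipe> swipesOrderedByTime) {
        Map<String, Swipe> lastSwipeByCard = new HashMap<>();
        List<String> alerts = new ArrayList<>();
        for (Swipe current : swipesOrderedByTime) {
            // put() returns the previous swipe of this card, or null for the first one
            Swipe previous = lastSwipeByCard.put(current.cardId, current);
            if (previous != null
                    && !previous.city.equals(current.city)
                    && Duration.between(previous.time, current.time).compareTo(WINDOW) <= 0) {
                alerts.add(current.cardId + ": " + previous.city + " -> " + current.city);
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2021-12-09T06:00:00Z");
        List<Swipe> swipes = Arrays.asList(
                new Swipe("card-1", "Beijing", t0),
                new Swipe("card-2", "Shanghai", t0.plusSeconds(60)),
                new Swipe("card-1", "Shenzhen", t0.plusSeconds(300)),   // 5 minutes later, other city
                new Swipe("card-2", "Shanghai", t0.plusSeconds(400)));  // same city, no alert
        detect(swipes).forEach(System.out::println); // prints card-1: Beijing -> Shenzhen
    }
}
```

In a CEP library the same rule would be written declaratively as a pattern over a stream keyed by card, so that state handling and out-of-order events are taken care of by the engine.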

2. FlinkCEP getting-started example

2.1. Import dependency

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-cep_2.11</artifactId>
    <version>1.13.3</version>
</dependency>

2.2. Development process

  1. Read the event stream and convert it to a DataStream
  2. Specify a watermark strategy (required)
  3. Define the event pattern
  4. Apply the event pattern to the event stream
  5. Select the matched events and generate alerts

The following snippet shows the first step:

// Read the event stream from a file ...
DataStreamSource<String> source = env.readTextFile("/data/input/events.txt");
// ... or from a socket instead (use only one of the two sources)
// DataStreamSource<String> source = env.socketTextStream("bigdata01", 10088);

// Parse each comma-separated line into an Event
SingleOutputStreamOperator<Event> flatMapStream = source
        .flatMap((FlatMapFunction<String, Event>) (v, out) -> out.collect(new Event(v.split(","))))
        .returns(Types.POJO(Event.class));
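
The snippet above relies on an Event class with a constructor that takes the split fields, which the article does not show. A minimal sketch of what such a class might look like follows. The column layout "id,name,type" and all field names are assumptions made for illustration; the public no-argument constructor and public fields follow Flink's POJO rules.

```java
// Hypothetical Event type for the parsing step; the "id,name,type" layout is an assumption.
class EventParsingSketch {
    public static class Event {
        public long id;
        public String name;
        public String type;

        /** Public no-argument constructor, as Flink's POJO rules require. */
        public Event() {
        }

        /** Builds an event from the fields of one comma-separated line. */
        public Event(String[] fields) {
            if (fields.length != 3) {
                throw new IllegalArgumentException("expected 3 fields but got " + fields.length);
            }
            this.id = Long.parseLong(fields[0].trim());
            this.name = fields[1].trim();
            this.type = fields[2].trim();
        }

        @Override
        public String toString() {
            return id + "-" + name + "-" + type;
        }
    }

    public static void main(String[] args) {
        Event e = new Event("1, A, error".split(","));
        System.out.println(e); // prints 1-A-error
    }
}
```

Because the constructor throws on malformed lines, a real job would probably catch the exception inside the flatMap function and skip or log the bad record instead of failing.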

2.3. Getting started code example

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Map;

public class MyCEPTest {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        final DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        DataStream<EventMsg> dataStream =
                env.fromElements(
                    new EventMsg(1L, LocalDateTime.parse("2020-04-15 08:05:01", dateTimeFormatter), "A", "INFO"),
                    new EventMsg(2L, LocalDateTime.parse("2020-04-15 08:06:11", dateTimeFormatter), "A", "error"),
                    new EventMsg(3L, LocalDateTime.parse("2020-04-15 08:07:21", dateTimeFormatter), "A", "critical"),
                    new EventMsg(4L, LocalDateTime.parse("2020-04-15 08:08:21", dateTimeFormatter), "A", "INFO"),
                    new EventMsg(5L, LocalDateTime.parse("2020-04-15 08:09:21", dateTimeFormatter), "B", "INFO"),
                    new EventMsg(6L, LocalDateTime.parse("2020-04-15 08:11:51", dateTimeFormatter), "B", "error"),
                    new EventMsg(7L, LocalDateTime.parse("2020-04-15 08:12:20", dateTimeFormatter), "B", "critical"),
                    new EventMsg(8L, LocalDateTime.parse("2020-04-15 08:15:22", dateTimeFormatter), "B", "INFO"),
                    new EventMsg(9L, LocalDateTime.parse("2020-04-15 08:17:34", dateTimeFormatter), "B", "error"));

        SingleOutputStreamOperator<EventMsg> watermarks = dataStream.assignTimestampsAndWatermarks(
                // Allow events to arrive up to 3 seconds out of order
                WatermarkStrategy.<EventMsg>forBoundedOutOfOrderness(Duration.ofSeconds(3))
                        .withTimestampAssigner(
                                (SerializableTimestampAssigner<EventMsg>) (element, recordTimestamp) -> toEpochMilli(element.getEventTime()))
        );
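        // Pattern: any event ("start"), immediately followed by an "error" event ("middle"),
        // followed later by a "critical" event ("end"), all within 180 seconds.
        Pattern<EventMsg, ?> pattern = Pattern.<EventMsg>begin("start")
                // next(): strict contiguity, no other event may occur in between
                .next("middle").where(new SimpleCondition<EventMsg>() {
                    @Override
                    public boolean filter(EventMsg value) throws Exception {
                        return "error".equals(value.getEventType());
                    }
                })
                // followedBy(): relaxed contiguity, non-matching events in between are skipped
                .followedBy("end").where(new SimpleCondition<EventMsg>() {
                    @Override
                    public boolean filter(EventMsg value) throws Exception {
                        return "critical".equals(value.getEventType());
                    }
                }).within(Time.seconds(180));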

        PatternStream<EventMsg> patternStream = CEP.pattern(watermarks, pattern);

        // Turn each match into an alert string: one line per pattern stage with its matched events
        DataStream<String> alerts = patternStream.select(new PatternSelectFunction<EventMsg, String>() {
            @Override
            public String select(Map<String, List<EventMsg>> msgs) throws Exception {
                StringBuilder sb = new StringBuilder();
                msgs.forEach((stage, events) -> sb.append(stage).append(',').append(events).append('\n'));
                return sb.toString();
            }
        });

        alerts.print();
        env.execute("Flink CEP Test");
    }

    // The sample timestamps are interpreted as local times in UTC+8
    public static final ZoneOffset ZONE_OFFSET_8 = ZoneOffset.of("+8");

    public static long toEpochMilli(LocalDateTime dt) {
        return dt.toInstant(ZONE_OFFSET_8).toEpochMilli();
    }

    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public static class EventMsg {
        public long eventId;
        public LocalDateTime eventTime;
        public String eventName;
        public String eventType;

        @Override
        public String toString(){
            return String.format("%s-%s-%s-%s",eventId,eventName,eventType,eventTime);
        }
    }
}
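
To see which of the nine sample events the pattern should match, the rule can be traced by hand. The following plain-Java sketch does that trace. It is a simplified simulation under stated assumptions, not Flink's matching engine and not output captured from the job: "start" accepts any event, "middle" must be the very next event and have type error, "end" is the first later event of type critical, and the time from "start" to "end" must be less than the window. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of the pattern above (hypothetical names, not Flink's matching engine).
class PatternSimulationSketch {
    /** Reduced copy of EventMsg: only the fields the pattern looks at. */
    static final class Ev {
        final long id;
        final LocalDateTime time;
        final String type;

        Ev(long id, int minute, int second, String type) {
            this.id = id;
            this.time = LocalDateTime.of(2020, 4, 15, 8, minute, second);
            this.type = type;
        }
    }

    /** The nine sample events from the example, in the same order. */
    static List<Ev> sampleEvents() {
        return Arrays.asList(
                new Ev(1, 5, 1, "INFO"), new Ev(2, 6, 11, "error"), new Ev(3, 7, 21, "critical"),
                new Ev(4, 8, 21, "INFO"), new Ev(5, 9, 21, "INFO"), new Ev(6, 11, 51, "error"),
                new Ev(7, 12, 20, "critical"), new Ev(8, 15, 22, "INFO"), new Ev(9, 17, 34, "error"));
    }

    /** Returns "startId->middleId->endId" for every match in a time-ordered list. */
    static List<String> findMatches(List<Ev> events, Duration window) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i + 1 < events.size(); i++) {
            Ev start = events.get(i);        // "start": any event
            Ev middle = events.get(i + 1);   // "middle": the very next event (strict contiguity)
            if (!"error".equals(middle.type)) {
                continue;
            }
            for (int j = i + 2; j < events.size(); j++) {   // "end": relaxed contiguity
                Ev end = events.get(j);
                if (Duration.between(start.time, end.time).compareTo(window) >= 0) {
                    break;                   // window exceeded, give up on this start event
                }
                if ("critical".equals(end.type)) {
                    matches.add(start.id + "->" + middle.id + "->" + end.id);
                    break;                   // only the first later critical event is taken
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // prints 1->2->3 and 5->6->7, one per line
        findMatches(sampleEvents(), Duration.ofSeconds(180)).forEach(System.out::println);
    }
}
```

The second match spans 08:09:21 to 08:12:20, which is 179 seconds, so it only just fits into the 180-second window. Events 8 and 9 begin a partial match that never completes because no critical event follows.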

Topics: flink