This article is based on Flink version 1.14.0
Background knowledge
If you only want a general understanding of Flink, you can skip this part. If you plan to start development, you should master this background knowledge first.
There are many articles about Calcite and Flink SQL on the Internet. You can refer to:
Flink SQL knows why (6) | Flink SQL meets Calcite (reading this article is enough) - Juejin (Nuggets)
[Source code analysis] A walkthrough of the internal execution process of Flink SQL / Table API
Flink Join
Regular Join
For example, common inner joins:
SELECT * FROM Orders JOIN Product ON Orders.productId = Product.id
This kind of JOIN requires the data on both sides of the JOIN to be retained in Flink state indefinitely to guarantee correct results, which leads to unbounded state growth.
You can configure a state TTL (time-to-live: table.exec.state.ttl) to avoid unbounded growth, but note that this may affect the correctness of the query results.
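For illustration, here is a minimal sketch of setting that option programmatically before running such a query (the one-hour value is arbitrary; the configuration key is the `table.exec.state.ttl` option mentioned above):

```scala
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

val tEnv = TableEnvironment.create(
  EnvironmentSettings.newInstance().inStreamingMode().build())

// Expire join state that has been idle for one hour; rows arriving after the
// matching state has expired may then produce incomplete join results.
tEnv.getConfig.getConfiguration.setString("table.exec.state.ttl", "1 h")
```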
Interval Join
A JOIN based on join conditions plus a time constraint. Given two keyed streams, it joins each record on one stream with the records that fall into a relative time window on the other stream, according to the join condition.
For example, query each order together with its payment, where the payment happens within 1 hour before or after the order time:
SELECT ...
FROM Orders AS o
JOIN Payment AS p
ON o.orderId = p.orderId
AND p.payTime BETWEEN orderTime - INTERVAL '1' HOUR AND orderTime + INTERVAL '1' HOUR
Temporal Join
First, the concept of a temporal table: a dynamic table that changes over time and may contain multiple snapshots of its data.
A temporal table whose historical versions can be tracked and accessed is called a versioned table, for example the changelog of a database;
a temporal table that only tracks and exposes its latest version is called a regular table, for example an ordinary database table.
In Flink, a table that defines both a primary key constraint and an event-time attribute is a versioned table.
A temporal join joins against a versioned table: the main table can enrich its rows with the version of the right table that was valid at the corresponding time, based on a time attribute plus equality join conditions. Both sides must use the same kind of time attribute (both event time or both processing time).
- When event time is used, the version table retains all versions between the last watermark and the current time, and watermarks must be defined on both the left and right tables. The right table must be CDC data, its primary key must be defined correctly, and the primary key must appear in the equality join condition. For example:
-- The left table is an ordinary append-only table
CREATE TABLE orders (
    order_id STRING,
    price DECIMAL(32,2),
    currency STRING,
    order_time TIMESTAMP(3),
    WATERMARK FOR order_time AS order_time
) WITH (/* ... */);

-- The right table is the versioned table of exchange rates (CDC data)
CREATE TABLE currency_rates (
    currency STRING,
    conversion_rate DECIMAL(32, 2),
    update_time TIMESTAMP(3) METADATA FROM `values.source.timestamp` VIRTUAL,
    WATERMARK FOR update_time AS update_time,
    PRIMARY KEY(currency) NOT ENFORCED
) WITH (
    'connector' = 'kafka',
    'value.format' = 'debezium-json',
    /* ... */
);

SELECT order_id, price, currency, conversion_rate, order_time
FROM orders
LEFT JOIN currency_rates FOR SYSTEM_TIME AS OF orders.order_time
-- The primary key must be in the join condition
ON orders.currency = currency_rates.currency;

order_id  price  currency  conversion_rate  order_time
========  =====  ========  ===============  ==========
o_001     11.11  EUR       1.14             12:00:00
o_002     12.51  EUR       1.10             12:06:00
- When processing time is used, the lookup table (right table) can be thought of as an ordinary HashMap that holds the latest full snapshot of the data. Flink can directly join against a table of an external database system without storing any versioned state. For example:
SELECT o.amount, o.currency, r.rate, o.amount * r.rate
FROM Orders AS o
JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r
ON r.currency = o.currency;

-- or join a table function
SELECT o_amount, r_rate
FROM Orders,
LATERAL TABLE (Rates(o_proctime))
WHERE r_currency = o_currency
Note: "FOR SYSTEM_TIME AS OF" syntax does not support VIEW / any latest table, because considering that the implementation of Flink is not consistent with its semantics, the JOIN processing of the left stream will not wait for the version table on the right (VIEW / table function) to complete the snapshot. Personal understanding may cause that the right table on the left table JOIN is not necessarily the latest data.
Lookup Join
In contrast to the temporal join based on event time, a lookup join queries the right table's data as of the point in time when the join operator executes. It is generally used for dimension table enrichment and only supports equality join conditions. For example:
SELECT o.amount, o.currency, r.rate, o.amount * r.rate
FROM Orders AS o
JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r
ON r.currency = o.currency;
Lookup Join execution process
This section walks through a Flink unit test as an example; newcomers can use it as a starting point for developing custom rules.
Preparation
Compile the flink-table module
Execute the following in the flink-table directory:
mvn clean package -Pfast,hive-2.1.1,scala-2.12 -DskipTests
Open the unit test files
The unit tests for Flink planner rules include:
- Logical plan tests: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/rules/logical
- Physical plan tests: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/stream/sql (and .../batch/sql)
- Integration tests: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/runtime/stream/sql (and .../batch/sql)
These are also the unit tests required when submitting rule-related PRs to the community; a rough sketch of such a test is shown below.
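For orientation only, a plan test in the planner module typically looks roughly like the following. The class name MyLookupJoinTest is made up; the helpers streamTestUtil(), addDataStream and verifyExecPlan come from the planner's TableTestBase test framework, and their exact names and signatures may differ between Flink versions:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.table.api._
import org.apache.flink.table.planner.utils.TableTestBase
import org.junit.Test

// Hypothetical test class; the structure mirrors existing planner tests
// such as LookupJoinTest under plan/stream/sql/join.
class MyLookupJoinTest extends TableTestBase {

  private val util = streamTestUtil()

  // A source table with a processing-time attribute ...
  util.addDataStream[(Int, String, Long)]("MyTable", 'a, 'b, 'c, 'proctime.proctime)

  // ... and a lookup table backed by the testing 'values' connector.
  util.tableEnv.executeSql(
    """
      |CREATE TABLE LookupTable (
      |  id INT,
      |  name STRING,
      |  age INT
      |) WITH ('connector' = 'values')
      |""".stripMargin)

  @Test
  def testJoinTemporalTable(): Unit = {
    val sql =
      """
        |SELECT * FROM MyTable AS T
        |JOIN LookupTable FOR SYSTEM_TIME AS OF T.proctime AS D
        |ON T.a = D.id
        |""".stripMargin
    // Compares the optimized plan with the expected plan stored in the test resources.
    util.verifyExecPlan(sql)
  }
}
```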
Raise the log level
Before the test code you want to trace, add: Configurator.setAllLevels("", Level.TRACE)
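For example, a minimal sketch (this assumes Log4j 2 on the test classpath, which is what Flink 1.14 uses; placing it at the start of the test method is just one option):

```scala
import org.apache.logging.log4j.Level
import org.apache.logging.log4j.core.config.Configurator

// Raise all loggers to TRACE so the planner logs every rule attempt
// and the intermediate plans produced during optimization.
Configurator.setAllLevels("", Level.TRACE)
```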
Tracing SQL execution
- The following analysis is based on the file flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/stream/sql/join/LookupJoinTest.scala.
- Run the unit test testJoinTemporalTable:
SELECT * FROM MyTable AS T JOIN LookupTable
FOR SYSTEM_TIME AS OF T.proctime AS D ON T.a = D.id
SQL parsing
The parser (using Calcite's syntax support) parses the "FOR SYSTEM_TIME AS OF" clause into a SqlSnapshot (a SqlNode); validate() then converts it into a LogicalSnapshot (a RelNode). The resulting logical plan is:
LogicalProject(a=[$0], b=[$1], c=[$2], proctime=[$3], rowtime=[$4], id=[$5], name=[$6], age=[$7])
+- LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0, 3}])
   :- LogicalTableScan(table=[[default_catalog, default_database, MyTable]])
   +- LogicalFilter(condition=[=($cor0.a, $0)])
      +- LogicalSnapshot(period=[$cor0.proctime])
         +- LogicalTableScan(table=[[default_catalog, default_database, LookupTable]])
Optimizer optimization
FlinkStreamProgram / FlinkBatchProgram define a series of rule sets that transform and optimize the logical / physical plan.
This case goes through the following important conversions:
- LogicalCorrelateToJoinFromLookupTableRuleWithFilter:
// From the class definition we can see that the logical plan above matches this rule
class LogicalCorrelateToJoinFromLookupTableRuleWithFilter
  extends LogicalCorrelateToJoinFromLookupTemporalTableRule(
    operand(classOf[LogicalCorrelate],
      operand(classOf[RelNode], any()),
      operand(classOf[LogicalFilter],
        operand(classOf[LogicalSnapshot],
          operand(classOf[RelNode], any())))),
    "LogicalCorrelateToJoinFromLookupTableRuleWithFilter"
  ) {

  override def matches(call: RelOptRuleCall): Boolean = {
    val snapshot: LogicalSnapshot = call.rel(3)
    val snapshotInput: RelNode = trimHep(call.rel(4))
    isLookupJoin(snapshot, snapshotInput)
  }
  // ......
}
// Called after the rule matches, to decide whether this is a lookup join
protected def isLookupJoin(snapshot: LogicalSnapshot, snapshotInput: RelNode): Boolean = {
  // ......
  // True if the period is processing time and the snapshot's table is a LookupTableSource
  isProcessingTime && snapshotOnLookupSource
}
After matching, the rule converts the LogicalCorrelate into a LogicalJoin:
LogicalProject(a=[$0], b=[$1], c=[$2], proctime=[$3], rowtime=[$4], id=[$5], name=[$6], age=[$7])
+- LogicalJoin(condition=[=($0, $5)], joinType=[inner])
   :- LogicalTableScan(table=[[default_catalog, default_database, MyTable]])
   +- LogicalSnapshot(period=[$cor0.proctime])
      +- LogicalTableScan(table=[[default_catalog, default_database, LookupTable]])
- FlinkProjectJoinTransposeRule + ProjectRemoveRule: push the Project operator down and prune it
// Swap the Project with the Join below it, i.e. push the Project down
public FlinkProjectJoinTransposeRule(
        PushProjector.ExprCondition preserveExprCondition,
        RelBuilderFactory relFactory) {
    super(operand(Project.class, operand(Join.class, any())), relFactory, null);
    this.preserveExprCondition = preserveExprCondition;
}
After optimization:
LogicalJoin(condition=[=($0, $5)], joinType=[inner])
:- LogicalTableScan(table=[[default_catalog, default_database, MyTable]])
+- LogicalSnapshot(period=[$cor0.proctime])
   +- LogicalTableScan(table=[[default_catalog, default_database, LookupTable]])
- Next, the Volcano planner rules combine and optimize the logical plan to produce the cheapest plan. After they run, the best result is:
12129 [main] DEBUG org.apache.calcite.plan.RelOptPlanner [] - Cheapest plan:
FlinkLogicalJoin(condition=[=($0, $5)], joinType=[inner]): rowcount = 3.0E7, cumulative cost = {4.0E8 rows, 5.0E8 cpu, 1.37E10 io, 0.0 network, 0.0 memory}, id = 403
  FlinkLogicalDataStreamTableScan(table=[[default_catalog, default_database, MyTable]], fields=[a, b, c, proctime, rowtime]): rowcount = 1.0E8, cumulative cost = {1.0E8 rows, 1.0E8 cpu, 4.8E9 io, 0.0 network, 0.0 memory}, id = 378
  FlinkLogicalSnapshot(period=[$cor0.proctime]): rowcount = 1.0E8, cumulative cost = {2.0E8 rows, 2.0E8 cpu, 4.0E9 io, 0.0 network, 0.0 memory}, id = 402
    FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, LookupTable]], fields=[id, name, age]): rowcount = 1.0E8, cumulative cost = {1.0E8 rows, 1.0E8 cpu, 2.0E9 io, 0.0 network, 0.0 memory}, id = 381

// Final result:
FlinkLogicalJoin(condition=[=($0, $5)], joinType=[inner])
:- FlinkLogicalDataStreamTableScan(table=[[default_catalog, default_database, MyTable]], fields=[a, b, c, proctime, rowtime])
+- FlinkLogicalSnapshot(period=[$cor0.proctime])
   +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, LookupTable]], fields=[id, name, age])
Rule attempts:
Rules                                                          Attempts  Time (us)
FlinkJoinPushExpressionsRule                                          2        553
JoinConditionPushRule                                                 2        152
FlinkLogicalTableSourceScanConverter(in:NONE,out:LOGICAL)             1     54,956
FlinkLogicalJoinConverter(in:NONE,out:LOGICAL)                        1      4,787
FlinkLogicalSnapshotConverter(in:NONE,out:LOGICAL)                    1      3,162
FlinkLogicalDataStreamTableScanConverter(in:NONE,out:LOGICAL)         1      1,403
SimplifyJoinConditionRule                                             1        249
* Total                                                               9     65,262
Among them, the several converter rules are registered in LOGICAL_CONVERTERS, a rule set that contains the logical rules for converting Calcite nodes into Flink nodes.
- For example: FlinkLogicalSnapshotConverter:
// Convert LogicalSnapshot to FlinkLogicalSnapshot
class FlinkLogicalSnapshotConverter
  extends ConverterRule(
    // Matches a LogicalSnapshot with Convention.NONE; the output convention is FlinkConventions.LOGICAL
    classOf[LogicalSnapshot],
    Convention.NONE,
    FlinkConventions.LOGICAL,
    "FlinkLogicalSnapshotConverter") {

  def convert(rel: RelNode): RelNode = {
    val snapshot = rel.asInstanceOf[LogicalSnapshot]
    val newInput = RelOptRule.convert(snapshot.getInput, FlinkConventions.LOGICAL)
    FlinkLogicalSnapshot.create(newInput, snapshot.getPeriod)
  }
}
- Adding the operator that materializes processing time
// Convert time indicators
chainedProgram.addLast(TIME_INDICATOR, new FlinkRelTimeIndicatorProgram)
// If the attribute is a processing-time indicator and needs to be materialized,
// a call to the PROCTIME_MATERIALIZE function is created
rexBuilder.makeCall(FlinkSqlOperatorTable.PROCTIME_MATERIALIZE, expr)
After conversion:
FlinkLogicalCalc(select=[a, b, c, PROCTIME_MATERIALIZE(proctime) AS proctime, rowtime, id, name, age])
+- FlinkLogicalJoin(condition=[=($0, $5)], joinType=[inner])
   :- FlinkLogicalDataStreamTableScan(table=[[default_catalog, default_database, MyTable]], fields=[a, b, c, proctime, rowtime])
   +- FlinkLogicalSnapshot(period=[$cor0.proctime])
      +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, LookupTable]], fields=[id, name, age])
Physical rule optimization
After being processed by the following Volcano rules in the physical optimization phase:
FlinkJoinPushExpressionsRule
JoinConditionPushRule
StreamPhysicalTableSourceScanRule(in:LOGICAL,out:STREAM_PHYSICAL)
FlinkLogicalTableSourceScanConverter(in:NONE,out:LOGICAL)
StreamPhysicalSnapshotOnTableScanRule
StreamPhysicalCalcRule(in:LOGICAL,out:STREAM_PHYSICAL)
FlinkLogicalJoinConverter(in:NONE,out:LOGICAL)
StreamPhysicalDataStreamScanRule(in:LOGICAL,out:STREAM_PHYSICAL)
FlinkLogicalSnapshotConverter(in:NONE,out:LOGICAL)
FlinkLogicalDataStreamTableScanConverter(in:NONE,out:LOGICAL)
SimplifyJoinConditionRule
the best plan is obtained:
Calc(select=[a, b, c, PROCTIME_MATERIALIZE(proctime) AS proctime, rowtime, id, name, age])
+- LookupJoin(table=[default_catalog.default_database.LookupTable], joinType=[InnerJoin], async=[false], lookup=[id=a], select=[a, b, c, proctime, rowtime, id, name, age])
   +- DataStreamScan(table=[[default_catalog, default_database, MyTable]], fields=[a, b, c, proctime, rowtime])
- StreamPhysicalCalcRule: converts FlinkLogicalCalc to StreamPhysicalCalc
- SnapshotOnTableScanRule: converts
FlinkLogicalJoin
:- FlinkLogicalDataStreamTableScan
+- FlinkLogicalSnapshot
   +- FlinkLogicalTableSourceScan
into
StreamPhysicalLookupJoin
+- StreamPhysicalDataStreamScan
Here is the key transformation logic of LookupJoin:
// This rule reuses the matching criteria of its parent class
class SnapshotOnTableScanRule
  extends BaseSnapshotOnTableScanRule("StreamPhysicalSnapshotOnTableScanRule") {
}
// As you can see, it matches the optimized FlinkLogical* plan shown above
abstract class BaseSnapshotOnTableScanRule(description: String)
  extends RelOptRule(
    operand(classOf[FlinkLogicalJoin],
      operand(classOf[FlinkLogicalRel], any()),
      operand(classOf[FlinkLogicalSnapshot],
        operand(classOf[TableScan], any()))),
    description)
  with CommonLookupJoinRule

private def doTransform(
    join: FlinkLogicalJoin,
    input: FlinkLogicalRel,
    temporalTable: RelOptTable,
    calcProgram: Option[RexProgram]): StreamPhysicalLookupJoin = {

  val joinInfo = join.analyzeCondition
  val cluster = join.getCluster

  val providedTrait = join.getTraitSet.replace(FlinkConventions.STREAM_PHYSICAL)
  val requiredTrait = input.getTraitSet.replace(FlinkConventions.STREAM_PHYSICAL)
  // Convert the (left) input from a logical node to a physical node. This triggers
  // StreamPhysicalDataStreamScanRule, which converts FlinkLogicalDataStreamTableScan
  // into StreamPhysicalDataStreamScan.
  val convInput = RelOptRule.convert(input, requiredTrait)

  new StreamPhysicalLookupJoin(
    cluster,
    providedTrait,
    convInput,
    temporalTable,
    calcProgram,
    joinInfo,
    join.getJoinType)
}
This completes the conversion to the physical plan. For reference, the same optimized plan can also be inspected from user code, as sketched below.
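A minimal sketch using TableEnvironment.explainSql. The table names, schemas and the JDBC connector options here are made up for illustration; a lookup join needs a connector that implements LookupTableSource (e.g. JDBC or HBase) with its jars on the classpath, and the exact plan nodes printed depend on the connectors used:

```scala
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

object ExplainLookupJoin {
  def main(args: Array[String]): Unit = {
    val tEnv = TableEnvironment.create(
      EnvironmentSettings.newInstance().inStreamingMode().build())

    // Append-only main table with a processing-time attribute (hypothetical schema).
    tEnv.executeSql(
      """
        |CREATE TABLE Orders (
        |  amount DOUBLE,
        |  currency STRING,
        |  proctime AS PROCTIME()
        |) WITH ('connector' = 'datagen')
        |""".stripMargin)

    // Dimension table backed by a lookup-capable connector (hypothetical JDBC options).
    tEnv.executeSql(
      """
        |CREATE TABLE LatestRates (
        |  currency STRING,
        |  rate DOUBLE
        |) WITH (
        |  'connector' = 'jdbc',
        |  'url' = 'jdbc:mysql://localhost:3306/db',
        |  'table-name' = 'rates'
        |)
        |""".stripMargin)

    // Prints the abstract syntax tree, the optimized physical plan
    // (Calc + LookupJoin + source scan) and the execution plan.
    println(tEnv.explainSql(
      """
        |SELECT o.amount, o.currency, r.rate, o.amount * r.rate
        |FROM Orders AS o
        |JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r
        |ON r.currency = o.currency
        |""".stripMargin))
  }
}
```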
Translating the physical plan
planner.translate()
These include:
val execGraph = translateToExecNodeGraph(optimizedRelNodes)
val transformations = translateToPlan(execGraph)
In translateToExecNodeGraph, the translateToExecNode method of each node in the final physical plan is called. For example:
- StreamPhysicalLookupJoin will be converted to StreamExecLookupJoin
In translateToPlan, the translateToPlanInternal method of each ExecNode is called. Take CommonExecLookupJoin as an example:
// In CommonExecLookupJoin; the validation and the asynchronous lookup-function logic are omitted here
public Transformation<RowData> translateToPlanInternal(PlannerBase planner) {
    // ----------- Create the operator factory for the lookup function ---------------
    RelOptTable temporalTable = temporalTableSourceSpec.getTemporalTable(planner);
    UserDefinedFunction userDefinedFunction =
            LookupJoinUtil.getLookupFunction(temporalTable, lookupKeys.keySet());
    UserDefinedFunctionHelper.prepareInstance(
            planner.getTableConfig().getConfiguration(), userDefinedFunction);

    boolean isLeftOuterJoin = joinType == FlinkJoinType.LEFT;
    StreamOperatorFactory<RowData> operatorFactory;
    operatorFactory =
            createSyncLookupJoin(
                    temporalTable,
                    planner.getTableConfig(),
                    lookupKeys,
                    (TableFunction<Object>) userDefinedFunction,
                    planner.getRelBuilder(),
                    inputRowType,
                    tableSourceRowType,
                    resultRowType,
                    isLeftOuterJoin,
                    planner.getExecEnv().getConfig().isObjectReuseEnabled());
    // -------------------------------------------------------

    // Convert to a Transformation
    Transformation<RowData> inputTransformation =
            (Transformation<RowData>) inputEdge.translateToPlan(planner);
    return new OneInputTransformation<>(
            inputTransformation,
            getDescription(),
            operatorFactory,
            InternalTypeInfo.of(resultRowType),
            inputTransformation.getParallelism());
}
// Only the core logic is shown; it is mainly divided into three parts
private StreamOperatorFactory<RowData> createSyncLookupJoin() {
    // 1. Generate the lookup function through the code generator and wrap it into a FlatMapFunction
    GeneratedFunction<FlatMapFunction<RowData, RowData>> generatedFetcher =
            LookupJoinCodeGenerator.generateSyncLookupFunction();

    // 2. Generate the Collector that collects the output of the table function
    GeneratedCollector<TableFunctionCollector<RowData>> generatedCollector =
            LookupJoinCodeGenerator.generateCollector();

    // 3. Finally, the ProcessFunction of the LookupJoinRunner is created.
    // If there is a Calc on the lookup (right-table) side, the runner also contains the Calc logic.
    // For example: SELECT * FROM T JOIN DIM FOR SYSTEM_TIME AS OF T.proctime AS D ON T.a = D.b + 1
    // The fetcher reads the raw data through the LookupFunction, applies the Calc,
    // and then joins the result with the data from the main (left) stream.
    GeneratedFunction<FlatMapFunction<RowData, RowData>> generatedCalc =
            LookupJoinCodeGenerator.generateCalcMapFunction(
                    config,
                    JavaScalaConversionUtil.toScala(projectionOnTemporalTable),
                    filterOnTemporalTable,
                    temporalTableOutputType,
                    tableSourceRowType);

    ProcessFunction<RowData, RowData> processFunc =
            new LookupJoinWithCalcRunner(
                    generatedFetcher,
                    generatedCalc,
                    generatedCollector,
                    isLeftOuterJoin,
                    rightRowType.getFieldCount());
}
Finally, the Transformations -> StreamGraph -> JobGraph path is the same as for the DataStream API.
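For completeness, a minimal sketch of what triggers that path from user code, continuing the hypothetical tables from the earlier sketch (the 'print' sink table Sink is also made up): executing an INSERT statement runs translate(), builds the Transformations, and then builds and submits the StreamGraph/JobGraph like any DataStream job.

```scala
// Hypothetical sink; 'print' is a built-in connector that is handy for demos.
tEnv.executeSql(
  "CREATE TABLE Sink (amount DOUBLE, rate DOUBLE) WITH ('connector' = 'print')")

// executeSql on INSERT: parse -> validate -> optimize -> translate to Transformations,
// then StreamGraph -> JobGraph -> submission, just like a DataStream API job.
val result = tEnv.executeSql(
  """
    |INSERT INTO Sink
    |SELECT o.amount, r.rate
    |FROM Orders AS o
    |JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r
    |ON r.currency = o.currency
    |""".stripMargin)

// Blocks until the (streaming) job terminates.
result.await()
```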