This article is based on Flink version 1.14.0
Background knowledge
If you only want a general understanding of Flink, you can skip this part. If you plan to start development, you should master this background knowledge first.
There are many articles about Calcite and Flink SQL on the Internet. You can refer to:
Flink SQL knows why (6) | Flink SQL meets Calcite (reading this article is enough) - Juejin (Nuggets)
[Source code analysis] A walkthrough of the internal execution process of Flink SQL / Table API
Flink Join
Regular Join
For example, common inner joins:
SELECT * FROM Orders JOIN Product ON Orders.productId = Product.id
This kind of JOIN requires the data on both sides of the JOIN to be retained in Flink state indefinitely to guarantee correct results, which leads to unbounded state growth.
You can configure a state TTL (time-to-live: table.exec.state.ttl) to avoid unbounded growth, but note that this may affect the correctness of the query results.
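For illustration, here is a minimal sketch of setting that option programmatically before running such a query (the one-hour value is arbitrary; the configuration key is the `table.exec.state.ttl` option mentioned above):

```scala
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

val tEnv = TableEnvironment.create(
  EnvironmentSettings.newInstance().inStreamingMode().build())

// Expire join state that has been idle for one hour; rows arriving after the
// matching state has expired may then produce incomplete join results.
tEnv.getConfig.getConfiguration.setString("table.exec.state.ttl", "1 h")
```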
Interval Join
A JOIN based on join conditions plus a time constraint. Given two keyed streams, it joins each record on one stream with the records that fall into a relative time window on the other stream, according to the join condition.
For example, query each order together with its payment, where the payment happens within 1 hour before or after the order time:
SELECT ...
FROM Orders AS o
JOIN Payment AS p
ON o.orderId = p.orderId
AND p.payTime BETWEEN orderTime - INTERVAL '1' HOUR AND orderTime + INTERVAL '1' HOUR
Temporal Join
First, the concept of a temporal table: a dynamic table that changes over time and may contain multiple snapshots of its data.
A temporal table whose historical versions can be tracked and accessed is called a versioned table, for example the changelog of a database;
a temporal table that only tracks and exposes its latest version is called a regular table, for example an ordinary database table.
In Flink, a table that defines both a primary key constraint and an event-time attribute is a versioned table.
A temporal join joins against a versioned table: the main table can enrich its rows with the version of the right table that was valid at the corresponding time, based on a time attribute plus equality join conditions. Both sides must use the same kind of time attribute (both event time or both processing time).
- When event time is used, the version table retains all versions between the last watermark and the current time, and watermarks must be defined on both the left and right tables. The right table must be CDC data, its primary key must be defined correctly, and the primary key must appear in the equality join condition. For example:
-- The left table is an ordinary append-only table
CREATE TABLE orders (
    order_id STRING,
    price DECIMAL(32,2),
    currency STRING,
    order_time TIMESTAMP(3),
    WATERMARK FOR order_time AS order_time
) WITH (/* ... */);

-- The right table is the versioned table of exchange rates (CDC data)
CREATE TABLE currency_rates (
    currency STRING,
    conversion_rate DECIMAL(32, 2),
    update_time TIMESTAMP(3) METADATA FROM `values.source.timestamp` VIRTUAL,
    WATERMARK FOR update_time AS update_time,
    PRIMARY KEY(currency) NOT ENFORCED
) WITH (
    'connector' = 'kafka',
    'value.format' = 'debezium-json',
    /* ... */
);

SELECT order_id, price, currency, conversion_rate, order_time
FROM orders
LEFT JOIN currency_rates FOR SYSTEM_TIME AS OF orders.order_time
-- The primary key must be in the join condition
ON orders.currency = currency_rates.currency;

order_id  price  currency  conversion_rate  order_time
========  =====  ========  ===============  ==========
o_001     11.11  EUR       1.14             12:00:00
o_002     12.51  EUR       1.10             12:06:00
- When processing time is used, the lookup table (right table) can be thought of as an ordinary HashMap that holds the latest full snapshot of the data. Flink can directly join against a table of an external database system without storing any versioned state. For example:
SELECT o.amount, o.currency, r.rate, o.amount * r.rate
FROM Orders AS o
JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r
ON r.currency = o.currency;

-- or join a table function
SELECT o_amount, r_rate
FROM Orders,
LATERAL TABLE (Rates(o_proctime))
WHERE r_currency = o_currency
Note: "FOR SYSTEM_TIME AS OF" syntax does not support VIEW / any latest table, because considering that the implementation of Flink is not consistent with its semantics, the JOIN processing of the left stream will not wait for the version table on the right (VIEW / table function) to complete the snapshot. Personal understanding may cause that the right table on the left table JOIN is not necessarily the latest data.
Lookup Join
In contrast to the temporal join based on event time, a lookup join queries the right table's data as of the point in time when the join operator executes. It is generally used for dimension table enrichment and only supports equality join conditions. For example:
SELECT o.amount, o.currency, r.rate, o.amount * r.rate
FROM Orders AS o
JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r
ON r.currency = o.currency;
Lookup Join execution process
This section walks through a Flink unit test as an example; newcomers can use it as a starting point for developing custom rules.
Preparation
Compile the flink-table module
Execute the following in the flink-table directory:
mvn clean package -Pfast,hive-2.1.1,scala-2.12 -DskipTests
Open the unit test files
The unit tests for Flink planner rules include:
- Logical plan tests: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/rules/logical
- Physical plan tests: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/stream/sql (and .../batch/sql)
- Integration tests: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/runtime/stream/sql (and .../batch/sql)
These are also the unit tests required when submitting rule-related PRs to the community; a rough sketch of such a test is shown below.
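For orientation only, a plan test in the planner module typically looks roughly like the following. The class name MyLookupJoinTest is made up; the helpers streamTestUtil(), addDataStream and verifyExecPlan come from the planner's TableTestBase test framework, and their exact names and signatures may differ between Flink versions:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.table.api._
import org.apache.flink.table.planner.utils.TableTestBase
import org.junit.Test

// Hypothetical test class; the structure mirrors existing planner tests
// such as LookupJoinTest under plan/stream/sql/join.
class MyLookupJoinTest extends TableTestBase {

  private val util = streamTestUtil()

  // A source table with a processing-time attribute ...
  util.addDataStream[(Int, String, Long)]("MyTable", 'a, 'b, 'c, 'proctime.proctime)

  // ... and a lookup table backed by the testing 'values' connector.
  util.tableEnv.executeSql(
    """
      |CREATE TABLE LookupTable (
      |  id INT,
      |  name STRING,
      |  age INT
      |) WITH ('connector' = 'values')
      |""".stripMargin)

  @Test
  def testJoinTemporalTable(): Unit = {
    val sql =
      """
        |SELECT * FROM MyTable AS T
        |JOIN LookupTable FOR SYSTEM_TIME AS OF T.proctime AS D
        |ON T.a = D.id
        |""".stripMargin
    // Compares the optimized plan with the expected plan stored in the test resources.
    util.verifyExecPlan(sql)
  }
}
```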
Raise the log level
Before the test code you want to trace, add: Configurator.setAllLevels("", Level.TRACE)
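For example, a minimal sketch (this assumes Log4j 2 on the test classpath, which is what Flink 1.14 uses; placing it at the start of the test method is just one option):

```scala
import org.apache.logging.log4j.Level
import org.apache.logging.log4j.core.config.Configurator

// Raise all loggers to TRACE so the planner logs every rule attempt
// and the intermediate plans produced during optimization.
Configurator.setAllLevels("", Level.TRACE)
```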
Tracing SQL execution
- The following analysis is based on the file flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/plan/stream/sql/join/LookupJoinTest.scala.
- Run the unit test testJoinTemporalTable:
SELECT * FROM MyTable AS T JOIN LookupTable
FOR SYSTEM_TIME AS OF T.proctime AS D ON T.a = D.id
SQL parsing
The parser (using Calcite's syntax support) parses the "FOR SYSTEM_TIME AS OF" clause into a SqlSnapshot (a SqlNode); validate() then converts it into a LogicalSnapshot (a RelNode). The resulting logical plan is:
LogicalProject(a=[$0], b=[$1], c=[$2], proctime=[$3], rowtime=[$4], id=[$5], name=[$6], age=[$7])
+- LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0, 3}])
   :- LogicalTableScan(table=[[default_catalog, default_database, MyTable]])
   +- LogicalFilter(condition=[=($cor0.a, $0)])
      +- LogicalSnapshot(period=[$cor0.proctime])
         +- LogicalTableScan(table=[[default_catalog, default_database, LookupTable]])
Optimizer optimization
FlinkStreamProgram / FlinkBatchProgram define a series of rule sets that transform and optimize the logical / physical plan.
This case goes through the following important conversions:
- LogicalCorrelateToJoinFromLookupTableRuleWithFilter:
// From the class definition we can see that the logical plan above matches this rule
class LogicalCorrelateToJoinFromLookupTableRuleWithFilter
  extends LogicalCorrelateToJoinFromLookupTemporalTableRule(
    operand(classOf[LogicalCorrelate],
      operand(classOf[RelNode], any()),
      operand(classOf[LogicalFilter],
        operand(classOf[LogicalSnapshot],
          operand(classOf[RelNode], any())))),
    "LogicalCorrelateToJoinFromLookupTableRuleWithFilter"
  ) {

  override def matches(call: RelOptRuleCall): Boolean = {
    val snapshot: LogicalSnapshot = call.rel(3)
    val snapshotInput: RelNode = trimHep(call.rel(4))
    isLookupJoin(snapshot, snapshotInput)
  }
  // ......
}
// Called after the rule matches, to decide whether this is a lookup join
protected def isLookupJoin(snapshot: LogicalSnapshot, snapshotInput: RelNode): Boolean = {
  // ......
  // True if the period is processing time and the snapshot's table is a LookupTableSource
  isProcessingTime && snapshotOnLookupSource
}
After matching, the rule converts the LogicalCorrelate into a LogicalJoin:
LogicalProject(a=[$0], b=[$1], c=[$2], proctime=[$3], rowtime=[$4], id=[$5], name=[$6], age=[$7])
+- LogicalJoin(condition=[=($0, $5)], joinType=[inner])
   :- LogicalTableScan(table=[[default_catalog, default_database, MyTable]])
   +- LogicalSnapshot(period=[$cor0.proctime])
      +- LogicalTableScan(table=[[default_catalog, default_database, LookupTable]])
- FlinkProjectJoinTransposeRule + ProjectRemoveRule: push the Project operator down and prune it
// Swap the Project with the Join below it, i.e. push the Project down
public FlinkProjectJoinTransposeRule(
        PushProjector.ExprCondition preserveExprCondition,
        RelBuilderFactory relFactory) {
    super(operand(Project.class, operand(Join.class, any())), relFactory, null);
    this.preserveExprCondition = preserveExprCondition;
}
After optimization:
LogicalJoin(condition=[=($0, $5)], joinType=[inner])
:- LogicalTableScan(table=[[default_catalog, default_database, MyTable]])
+- LogicalSnapshot(period=[$cor0.proctime])
   +- LogicalTableScan(table=[[default_catalog, default_database, LookupTable]])
- Next, the Volcano planner rules combine and optimize the logical plan to produce the cheapest plan. After they run, the best result is:
12129 [main] DEBUG org.apache.calcite.plan.RelOptPlanner [] - Cheapest plan:
FlinkLogicalJoin(condition=[=($0, $5)], joinType=[inner]): rowcount = 3.0E7, cumulative cost = {4.0E8 rows, 5.0E8 cpu, 1.37E10 io, 0.0 network, 0.0 memory}, id = 403
  FlinkLogicalDataStreamTableScan(table=[[default_catalog, default_database, MyTable]], fields=[a, b, c, proctime, rowtime]): rowcount = 1.0E8, cumulative cost = {1.0E8 rows, 1.0E8 cpu, 4.8E9 io, 0.0 network, 0.0 memory}, id = 378
  FlinkLogicalSnapshot(period=[$cor0.proctime]): rowcount = 1.0E8, cumulative cost = {2.0E8 rows, 2.0E8 cpu, 4.0E9 io, 0.0 network, 0.0 memory}, id = 402
    FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, LookupTable]], fields=[id, name, age]): rowcount = 1.0E8, cumulative cost = {1.0E8 rows, 1.0E8 cpu, 2.0E9 io, 0.0 network, 0.0 memory}, id = 381

// Final result:
FlinkLogicalJoin(condition=[=($0, $5)], joinType=[inner])
:- FlinkLogicalDataStreamTableScan(table=[[default_catalog, default_database, MyTable]], fields=[a, b, c, proctime, rowtime])
+- FlinkLogicalSnapshot(period=[$cor0.proctime])
   +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, LookupTable]], fields=[id, name, age])
Rule attempts:
Rules                                                          Attempts  Time (us)
FlinkJoinPushExpressionsRule                                          2        553
JoinConditionPushRule                                                 2        152
FlinkLogicalTableSourceScanConverter(in:NONE,out:LOGICAL)             1     54,956
FlinkLogicalJoinConverter(in:NONE,out:LOGICAL)                        1      4,787
FlinkLogicalSnapshotConverter(in:NONE,out:LOGICAL)                    1      3,162
FlinkLogicalDataStreamTableScanConverter(in:NONE,out:LOGICAL)         1      1,403
SimplifyJoinConditionRule                                             1        249
* Total                                                               9     65,262
Among them, the several converter rules are registered in LOGICAL_CONVERTERS, a rule set that contains the logical rules for converting Calcite nodes into Flink nodes.
- For example: FlinkLogicalSnapshotConverter:
// Convert LogicalSnapshot to FlinkLogicalSnapshot
class FlinkLogicalSnapshotConverter
  extends ConverterRule(
    // Matches a LogicalSnapshot with Convention.NONE; the output convention is FlinkConventions.LOGICAL
    classOf[LogicalSnapshot],
    Convention.NONE,
    FlinkConventions.LOGICAL,
    "FlinkLogicalSnapshotConverter") {

  def convert(rel: RelNode): RelNode = {
    val snapshot = rel.asInstanceOf[LogicalSnapshot]
    val newInput = RelOptRule.convert(snapshot.getInput, FlinkConventions.LOGICAL)
    FlinkLogicalSnapshot.create(newInput, snapshot.getPeriod)
  }
}
- Adding the operator that materializes processing time
// Convert time indicators
chainedProgram.addLast(TIME_INDICATOR, new FlinkRelTimeIndicatorProgram)
// If the attribute is a processing-time indicator and needs to be materialized,
// a call to the PROCTIME_MATERIALIZE function is created
rexBuilder.makeCall(FlinkSqlOperatorTable.PROCTIME_MATERIALIZE, expr)
After conversion:
FlinkLogicalCalc(select=[a, b, c, PROCTIME_MATERIALIZE(proctime) AS proctime, rowtime, id, name, age])
+- FlinkLogicalJoin(condition=[=($0, $5)], joinType=[inner])
   :- FlinkLogicalDataStreamTableScan(table=[[default_catalog, default_database, MyTable]], fields=[a, b, c, proctime, rowtime])
   +- FlinkLogicalSnapshot(period=[$cor0.proctime])
      +- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, LookupTable]], fields=[id, name, age])
Physical rule optimization
After being processed by the following Volcano rules in the physical optimization phase:
FlinkJoinPushExpressionsRule
JoinConditionPushRule
StreamPhysicalTableSourceScanRule(in:LOGICAL,out:STREAM_PHYSICAL)
FlinkLogicalTableSourceScanConverter(in:NONE,out:LOGICAL)
StreamPhysicalSnapshotOnTableScanRule
StreamPhysicalCalcRule(in:LOGICAL,out:STREAM_PHYSICAL)
FlinkLogicalJoinConverter(in:NONE,out:LOGICAL)
StreamPhysicalDataStreamScanRule(in:LOGICAL,out:STREAM_PHYSICAL)
FlinkLogicalSnapshotConverter(in:NONE,out:LOGICAL)
FlinkLogicalDataStreamTableScanConverter(in:NONE,out:LOGICAL)
SimplifyJoinConditionRule
the best plan is obtained:
Calc(select=[a, b, c, PROCTIME_MATERIALIZE(proctime) AS proctime, rowtime, id, name, age])
+- LookupJoin(table=[default_catalog.default_database.LookupTable], joinType=[InnerJoin], async=[false], lookup=[id=a], select=[a, b, c, proctime, rowtime, id, name, age])
   +- DataStreamScan(table=[[default_catalog, default_database, MyTable]], fields=[a, b, c, proctime, rowtime])
- StreamPhysicalCalcRule: converts FlinkLogicalCalc to StreamPhysicalCalc
- SnapshotOnTableScanRule: converts
FlinkLogicalJoin
:- FlinkLogicalDataStreamTableScan
+- FlinkLogicalSnapshot
   +- FlinkLogicalTableSourceScan
into
StreamPhysicalLookupJoin
+- StreamPhysicalDataStreamScan
Here is the key transformation logic of LookupJoin:
// This rule reuses the matching criteria of its parent class
class SnapshotOnTableScanRule
  extends BaseSnapshotOnTableScanRule("StreamPhysicalSnapshotOnTableScanRule") {
}
// As you can see, it matches the optimized FlinkLogical* plan shown above
abstract class BaseSnapshotOnTableScanRule(description: String)
  extends RelOptRule(
    operand(classOf[FlinkLogicalJoin],
      operand(classOf[FlinkLogicalRel], any()),
      operand(classOf[FlinkLogicalSnapshot],
        operand(classOf[TableScan], any()))),
    description)
  with CommonLookupJoinRule

private def doTransform(
    join: FlinkLogicalJoin,
    input: FlinkLogicalRel,
    temporalTable: RelOptTable,
    calcProgram: Option[RexProgram]): StreamPhysicalLookupJoin = {

  val joinInfo = join.analyzeCondition
  val cluster = join.getCluster

  val providedTrait = join.getTraitSet.replace(FlinkConventions.STREAM_PHYSICAL)
  val requiredTrait = input.getTraitSet.replace(FlinkConventions.STREAM_PHYSICAL)
  // Convert the (left) input from a logical node to a physical node. This triggers
  // StreamPhysicalDataStreamScanRule, which converts FlinkLogicalDataStreamTableScan
  // into StreamPhysicalDataStreamScan.
  val convInput = RelOptRule.convert(input, requiredTrait)

  new StreamPhysicalLookupJoin(
    cluster,
    providedTrait,
    convInput,
    temporalTable,
    calcProgram,
    joinInfo,
    join.getJoinType)
}
This completes the conversion to the physical plan. For reference, the same optimized plan can also be inspected from user code, as sketched below.
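A minimal sketch using TableEnvironment.explainSql. The table names, schemas and the JDBC connector options here are made up for illustration; a lookup join needs a connector that implements LookupTableSource (e.g. JDBC or HBase) with its jars on the classpath, and the exact plan nodes printed depend on the connectors used:

```scala
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

object ExplainLookupJoin {
  def main(args: Array[String]): Unit = {
    val tEnv = TableEnvironment.create(
      EnvironmentSettings.newInstance().inStreamingMode().build())

    // Append-only main table with a processing-time attribute (hypothetical schema).
    tEnv.executeSql(
      """
        |CREATE TABLE Orders (
        |  amount DOUBLE,
        |  currency STRING,
        |  proctime AS PROCTIME()
        |) WITH ('connector' = 'datagen')
        |""".stripMargin)

    // Dimension table backed by a lookup-capable connector (hypothetical JDBC options).
    tEnv.executeSql(
      """
        |CREATE TABLE LatestRates (
        |  currency STRING,
        |  rate DOUBLE
        |) WITH (
        |  'connector' = 'jdbc',
        |  'url' = 'jdbc:mysql://localhost:3306/db',
        |  'table-name' = 'rates'
        |)
        |""".stripMargin)

    // Prints the abstract syntax tree, the optimized physical plan
    // (Calc + LookupJoin + source scan) and the execution plan.
    println(tEnv.explainSql(
      """
        |SELECT o.amount, o.currency, r.rate, o.amount * r.rate
        |FROM Orders AS o
        |JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r
        |ON r.currency = o.currency
        |""".stripMargin))
  }
}
```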
Translating the physical plan
planner.translate()
These include:
val execGraph = translateToExecNodeGraph(optimizedRelNodes)
val transformations = translateToPlan(execGraph)
In translateToExecNodeGraph, the translateToExecNode method of each node in the final physical plan is called. For example:
- StreamPhysicalLookupJoin will be converted to StreamExecLookupJoin
In translateToPlan, the translateToPlanInternal method of each ExecNode is called. Take CommonExecLookupJoin as an example:
// In CommonExecLookupJoin; the validation and the asynchronous lookup-function logic are omitted here
public Transformation<RowData> translateToPlanInternal(PlannerBase planner) {
    // ----------- Create the operator factory for the lookup function ---------------
    RelOptTable temporalTable = temporalTableSourceSpec.getTemporalTable(planner);
    UserDefinedFunction userDefinedFunction =
            LookupJoinUtil.getLookupFunction(temporalTable, lookupKeys.keySet());
    UserDefinedFunctionHelper.prepareInstance(
            planner.getTableConfig().getConfiguration(), userDefinedFunction);

    boolean isLeftOuterJoin = joinType == FlinkJoinType.LEFT;
    StreamOperatorFactory<RowData> operatorFactory;
    operatorFactory =
            createSyncLookupJoin(
                    temporalTable,
                    planner.getTableConfig(),
                    lookupKeys,
                    (TableFunction<Object>) userDefinedFunction,
                    planner.getRelBuilder(),
                    inputRowType,
                    tableSourceRowType,
                    resultRowType,
                    isLeftOuterJoin,
                    planner.getExecEnv().getConfig().isObjectReuseEnabled());
    // -------------------------------------------------------

    // Convert to a Transformation
    Transformation<RowData> inputTransformation =
            (Transformation<RowData>) inputEdge.translateToPlan(planner);
    return new OneInputTransformation<>(
            inputTransformation,
            getDescription(),
            operatorFactory,
            InternalTypeInfo.of(resultRowType),
            inputTransformation.getParallelism());
}
// Only the core logic is shown; it is mainly divided into three parts
private StreamOperatorFactory<RowData> createSyncLookupJoin() {
    // 1. Generate the lookup function through the code generator and wrap it into a FlatMapFunction
    GeneratedFunction<FlatMapFunction<RowData, RowData>> generatedFetcher =
            LookupJoinCodeGenerator.generateSyncLookupFunction();

    // 2. Generate the Collector that collects the output of the table function
    GeneratedCollector<TableFunctionCollector<RowData>> generatedCollector =
            LookupJoinCodeGenerator.generateCollector();

    // 3. Finally, the ProcessFunction of the LookupJoinRunner is created.
    // If there is a Calc on the lookup (right-table) side, the runner also contains the Calc logic.
    // For example: SELECT * FROM T JOIN DIM FOR SYSTEM_TIME AS OF T.proctime AS D ON T.a = D.b + 1
    // The fetcher reads the raw data through the LookupFunction, applies the Calc,
    // and then joins the result with the data from the main (left) stream.
    GeneratedFunction<FlatMapFunction<RowData, RowData>> generatedCalc =
            LookupJoinCodeGenerator.generateCalcMapFunction(
                    config,
                    JavaScalaConversionUtil.toScala(projectionOnTemporalTable),
                    filterOnTemporalTable,
                    temporalTableOutputType,
                    tableSourceRowType);

    ProcessFunction<RowData, RowData> processFunc =
            new LookupJoinWithCalcRunner(
                    generatedFetcher,
                    generatedCalc,
                    generatedCollector,
                    isLeftOuterJoin,
                    rightRowType.getFieldCount());
}
Finally, the Transformations -> StreamGraph -> JobGraph path is the same as for the DataStream API.
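For completeness, a minimal sketch of what triggers that path from user code, continuing the hypothetical tables from the earlier sketch (the 'print' sink table Sink is also made up): executing an INSERT statement runs translate(), builds the Transformations, and then builds and submits the StreamGraph/JobGraph like any DataStream job.

```scala
// Hypothetical sink; 'print' is a built-in connector that is handy for demos.
tEnv.executeSql(
  "CREATE TABLE Sink (amount DOUBLE, rate DOUBLE) WITH ('connector' = 'print')")

// executeSql on INSERT: parse -> validate -> optimize -> translate to Transformations,
// then StreamGraph -> JobGraph -> submission, just like a DataStream API job.
val result = tEnv.executeSql(
  """
    |INSERT INTO Sink
    |SELECT o.amount, r.rate
    |FROM Orders AS o
    |JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r
    |ON r.currency = o.currency
    |""".stripMargin)

// Blocks until the (streaming) job terminates.
result.await()
```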