background
A transform I designed earlier has a bug that can be fixed by pulling a limit up, so I am studying the pull-up rules in Calcite. It also lays the groundwork for common subexpression elimination later.
rule
UnionPullUpConstantsRule
What does it do?
As the name suggests, this rule pulls constants up above a union. For example, the following SQL selects the constant 2 in both branches:
select 2, deptno, job from emp as e1 union all select 2, deptno, job from emp as e2
The plan before optimization is as follows
LogicalUnion(all=[true])
  LogicalProject(EXPR$0=[2], DEPTNO=[$7], JOB=[$2])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
  LogicalProject(EXPR$0=[2], DEPTNO=[$7], JOB=[$2])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
The optimized plan is as follows. The projection of the constant 2 now happens after the union:
LogicalProject(EXPR$0=[2], DEPTNO=[$0], JOB=[$1])
  LogicalUnion(all=[true])
    LogicalProject(DEPTNO=[$7], JOB=[$2])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalProject(DEPTNO=[$7], JOB=[$2])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
This case alone is not enough to show the benefit of the optimization. After changing union all to union (distinct), however, the rewrite reduces the work the distinct has to do, because the constant column no longer takes part in deduplication. Combined with the push-down of other filters, the effect is more obvious.
As the example shows, this rule modifies the two LogicalProjects below the LogicalUnion and adds a new LogicalProject above it, so the code behind the rule is unlikely to be trivial.
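To make the rewrite concrete, here is a minimal sketch in plain Java with no Calcite dependency. All class and method names are hypothetical, rows are modelled as lists of objects, and column 0 is assumed to hold the same constant in both inputs. It only checks that deduplicating the narrower rows and re-adding the constant gives the same result as a direct UNION (distinct).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; none of these names come from Calcite.
class UnionConstantSketch {
  // Builds one row from its column values.
  static List<Object> row(Object... values) {
    return new ArrayList<>(Arrays.asList(values));
  }

  // UNION (distinct) evaluated directly on the full rows.
  static Set<List<Object>> unionDistinct(
      List<List<Object>> left, List<List<Object>> right) {
    Set<List<Object>> result = new LinkedHashSet<>(left);
    result.addAll(right);
    return result;
  }

  // Rewritten form: drop constant column 0 below the union, deduplicate
  // the narrower rows, then add the constant back on top.
  static Set<List<Object>> unionDistinctPulledUp(
      List<List<Object>> left, List<List<Object>> right, Object constant) {
    Set<List<Object>> narrow = new LinkedHashSet<>();
    for (List<Object> r : left) {
      narrow.add(new ArrayList<>(r.subList(1, r.size())));
    }
    for (List<Object> r : right) {
      narrow.add(new ArrayList<>(r.subList(1, r.size())));
    }
    Set<List<Object>> result = new LinkedHashSet<>();
    for (List<Object> r : narrow) {
      List<Object> wide = new ArrayList<>();
      wide.add(constant); // the "top Project" re-adds the constant
      wide.addAll(r);
      result.add(wide);
    }
    return result;
  }

  public static void main(String[] args) {
    List<List<Object>> e1 = Arrays.asList(row(2, 10, "CLERK"), row(2, 20, "MANAGER"));
    List<List<Object>> e2 = Arrays.asList(row(2, 10, "CLERK"), row(2, 30, "SALESMAN"));
    System.out.println(unionDistinct(e1, e2).equals(unionDistinctPulledUp(e1, e2, 2)));
  }
}
```

The sample rows are made up for illustration; they are not data from the EMP table.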
How?
Like most rules, it is handled in the onMatch method
- Test constant
First, the constants are extracted to decide whether any optimization is needed
final Union union = call.rel(0);
// The RexBuilder (row-expression builder) could be created further down; it is not used in this step
final RexBuilder rexBuilder = union.getCluster().getRexBuilder();
// The metadata query records information about the query
final RelMetadataQuery mq = call.getMetadataQuery();
// Get the predicates through getPulledUpPredicates
final RelOptPredicateList predicates = mq.getPulledUpPredicates(union);
// If there are no predicates, there is nothing to optimize
if (predicates == null) {
  return;
}
// Otherwise, extract the constants from the predicates
final Map<Integer, RexNode> constants = new HashMap<>();
for (Map.Entry<RexNode, RexNode> e : predicates.constantMap.entrySet()) {
  if (e.getKey() instanceof RexInputRef) {
    constants.put(((RexInputRef) e.getKey()).getIndex(), e.getValue());
  }
}
// None of the expressions are constant. Nothing to do.
if (constants.isEmpty()) {
  return;
}
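The filtering step above can be sketched without Calcite. In this simplified model, which is an assumption for illustration only, strings stand in for RexNode keys: a key such as "$0" plays the role of a RexInputRef, and anything else is some other expression that the rule ignores. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; strings stand in for Calcite's RexNode keys.
class ConstantExtractionSketch {
  // Keeps only the entries whose key is a plain input reference like "$0",
  // and re-keys them by column index.
  static Map<Integer, Object> inputRefConstants(Map<String, Object> constantMap) {
    Map<Integer, Object> constants = new HashMap<>();
    for (Map.Entry<String, Object> e : constantMap.entrySet()) {
      if (e.getKey().matches("\\$\\d+")) {
        constants.put(Integer.parseInt(e.getKey().substring(1)), e.getValue());
      }
    }
    return constants;
  }

  // Mirrors the two early returns: no predicates, or no constant columns.
  static boolean shouldRewrite(Map<String, Object> constantMap) {
    return constantMap != null && !inputRefConstants(constantMap).isEmpty();
  }

  public static void main(String[] args) {
    Map<String, Object> constantMap = new LinkedHashMap<>();
    constantMap.put("$0", 2);          // column 0 is always 2
    constantMap.put("UPPER($2)", "X"); // not an input reference, ignored
    System.out.println(inputRefConstants(constantMap));
    System.out.println(shouldRewrite(constantMap));
  }
}
```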
- Build the expression for the top LogicalProject
Next, the real optimization begins. The RexBuilder created above is used here
// Create expressions for Project operators before and after the Union
List<RelDataTypeField> fields = union.getRowType().getFieldList(); // fields of the union
List<RexNode> topChildExprs = new ArrayList<>(); // expressions or constants
List<String> topChildExprsFields = new ArrayList<>(); // field name for each expression
List<RexNode> refs = new ArrayList<>(); // input references (the non-constant expressions)
// Builder for the set of referenced field indexes
ImmutableBitSet.Builder refsIndexBuilder = ImmutableBitSet.builder();
for (RelDataTypeField field : fields) {
  final RexNode constant = constants.get(field.getIndex());
  // A constant can be added directly
  if (constant != null) {
    topChildExprs.add(constant);
    topChildExprsFields.add(field.getName());
  } else {
    // For a non-constant, build an input reference
    final RexNode expr = rexBuilder.makeInputRef(union, field.getIndex());
    topChildExprs.add(expr);
    topChildExprsFields.add(field.getName());
    refs.add(expr);
    refsIndexBuilder.set(field.getIndex());
  }
}
ImmutableBitSet refsIndex = refsIndexBuilder.build();

// Update top Project positions
final Mappings.TargetMapping mapping =
    RelOptUtil.permutation(refs, union.getInput(0).getRowType()).inverse();
topChildExprs = ImmutableList.copyOf(RexUtil.apply(mapping, topChildExprs));
This whole step builds the expressions of the top Project from the union's fields
(Picture: the unit test testPullConstantThroughUnion.)
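The effect of this step on the example can be sketched in plain Java. The names below are hypothetical, and the sketch takes a shortcut: Calcite first builds references against the old field positions and then applies the inverse of a permutation mapping, whereas this version simply numbers the non-constant columns with a counter. For the example with constant column 0 it yields the same top expressions as the optimized plan (2, $0, $1).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; strings stand in for RexNode expressions.
class TopProjectSketch {
  // Indexes of the non-constant fields, i.e. the columns the inputs must keep.
  static BitSet refsIndex(int fieldCount, Map<Integer, Object> constants) {
    BitSet refs = new BitSet();
    for (int i = 0; i < fieldCount; i++) {
      if (constants.get(i) == null) {
        refs.set(i);
      }
    }
    return refs;
  }

  // Expressions of the top Project: a literal for each constant field,
  // otherwise a reference to the column's new position in the narrowed union.
  static List<String> topExprs(int fieldCount, Map<Integer, Object> constants) {
    List<String> exprs = new ArrayList<>();
    int next = 0; // next free position in the narrowed union output
    for (int i = 0; i < fieldCount; i++) {
      Object constant = constants.get(i);
      if (constant != null) {
        exprs.add(String.valueOf(constant));
      } else {
        exprs.add("$" + next);
        next++;
      }
    }
    return exprs;
  }

  public static void main(String[] args) {
    Map<Integer, Object> constants = new HashMap<>();
    constants.put(0, 2); // EXPR$0 is the constant 2
    System.out.println(topExprs(3, constants));
    System.out.println(refsIndex(3, constants));
  }
}
```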
- Build a new plan
As mentioned earlier, the two LogicalProjects below the Union must be changed and a LogicalProject must be added above the Union. That happens here
// Create new Project-Union-Project sequences
final RelBuilder relBuilder = call.builder();
for (RelNode input : union.getInputs()) {
  List<Pair<RexNode, String>> newChildExprs = new ArrayList<>();
  for (int j : refsIndex) {
    newChildExprs.add(
        Pair.of(rexBuilder.makeInputRef(input, j),
            input.getRowType().getFieldList().get(j).getName()));
  }
  if (newChildExprs.isEmpty()) {
    // At least a single item in project is required.
    newChildExprs.add(
        Pair.of(topChildExprs.get(0), topChildExprsFields.get(0)));
  }
  // Add the input with project on top
  relBuilder.push(input);
  relBuilder.project(Pair.left(newChildExprs), Pair.right(newChildExprs));
}
relBuilder.union(union.all, union.getInputs().size());
// Create top Project fixing nullability of fields
relBuilder.project(topChildExprs, topChildExprsFields);
relBuilder.convert(union.getRowType(), false);
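The shape of the plan that this loop produces can be sketched with a tiny stand-in for plan nodes. Everything here is hypothetical (the Node class, the labels, the method names); the inputs are treated as opaque leaves, and expressions are plain strings. The sketch also mirrors the fallback for the case where every column is constant, in which the child Project keeps the first top expression so that it is not empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the Project-Union-Project shape.
class PlanShapeSketch {
  static final class Node {
    final String label;
    final List<Node> inputs;

    Node(String label, List<Node> inputs) {
      this.label = label;
      this.inputs = inputs;
    }

    // Renders the tree with two spaces of indentation per level.
    String explain() {
      StringBuilder sb = new StringBuilder();
      explain(sb, 0);
      return sb.toString();
    }

    private void explain(StringBuilder sb, int depth) {
      for (int i = 0; i < depth; i++) {
        sb.append("  ");
      }
      sb.append(label).append('\n');
      for (Node input : inputs) {
        input.explain(sb, depth + 1);
      }
    }
  }

  static Node leaf(String label) {
    return new Node(label, new ArrayList<>());
  }

  static Node pullUp(List<Node> inputs, List<String> keptExprs,
      List<String> topExprs, boolean all) {
    List<String> childExprs = new ArrayList<>(keptExprs);
    if (childExprs.isEmpty()) {
      // At least one item is required in the child Project.
      childExprs.add(topExprs.get(0));
    }
    List<Node> projected = new ArrayList<>();
    for (Node input : inputs) {
      projected.add(new Node("Project" + childExprs, Arrays.asList(input)));
    }
    Node union = new Node("Union(all=" + all + ")", projected);
    return new Node("Project" + topExprs, Arrays.asList(union));
  }

  public static void main(String[] args) {
    Node plan = pullUp(
        Arrays.asList(leaf("Input0"), leaf("Input1")),
        Arrays.asList("$1", "$2"),
        Arrays.asList("2", "$0", "$1"),
        true);
    System.out.print(plan.explain());
  }
}
```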
- Finishing
All RelOptRules submit the result built by relBuilder through transformTo
call.transformTo(relBuilder.build());
AggregateProjectPullUpConstantsRule
What does it do?
As the name suggests, this rule concerns Aggregate. There is also a fairly simple unit test for it, as follows
select job, empno, sal, sum(sal) as s from emp where empno = 10 group by job, empno, sal
Before optimization, EMPNO is part of the group key during aggregation
LogicalAggregate(group=[{0, 1, 2}], S=[SUM($2)])
  LogicalProject(JOB=[$2], EMPNO=[$0], SAL=[$5])
    LogicalFilter(condition=[=($0, 10)])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
After optimization, EMPNO no longer takes part in the aggregation, which saves some computation
LogicalProject(JOB=[$0], EMPNO=[10], SAL=[$1], S=[$2])
  LogicalAggregate(group=[{0, 1}], S=[SUM($1)])
    LogicalProject(JOB=[$2], SAL=[$5])
      LogicalFilter(condition=[=($0, 10)])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
As with UnionPullUpConstantsRule, the LogicalProject below the LogicalAggregate is changed and a LogicalProject is added above it. In addition, the LogicalAggregate itself is changed
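The equivalence behind this rewrite can be checked on in-memory rows. The following is a minimal sketch in plain Java under stated assumptions: rows are (job, empno, sal) and have already been filtered to empno = 10, the sample values are made up, and all names are hypothetical. Grouping by (job, sal) and re-inserting the constant gives the same result as grouping by (job, empno, sal).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not Calcite code.
class AggregateConstantSketch {
  static List<Object> row(Object... values) {
    return new ArrayList<>(Arrays.asList(values));
  }

  // SUM(sumCol) grouped by the given key columns.
  static Map<List<Object>, Integer> sumBy(
      List<List<Object>> rows, int[] keyCols, int sumCol) {
    Map<List<Object>, Integer> out = new LinkedHashMap<>();
    for (List<Object> r : rows) {
      List<Object> key = new ArrayList<>();
      for (int c : keyCols) {
        key.add(r.get(c));
      }
      out.merge(key, (Integer) r.get(sumCol), Integer::sum);
    }
    return out;
  }

  // The "top Project": puts the constant back at position pos of each key.
  static Map<List<Object>, Integer> reinsertConstant(
      Map<List<Object>, Integer> grouped, int pos, Object constant) {
    Map<List<Object>, Integer> out = new LinkedHashMap<>();
    for (Map.Entry<List<Object>, Integer> e : grouped.entrySet()) {
      List<Object> key = new ArrayList<>(e.getKey());
      key.add(pos, constant);
      out.put(key, e.getValue());
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<Object>> rows = Arrays.asList(
        row("CLERK", 10, 800), row("CLERK", 10, 800), row("MANAGER", 10, 2500));
    Map<List<Object>, Integer> before = sumBy(rows, new int[] {0, 1, 2}, 2);
    Map<List<Object>, Integer> after =
        reinsertConstant(sumBy(rows, new int[] {0, 2}, 2), 1, 10);
    System.out.println(before.equals(after));
  }
}
```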
How?
- Test constant
It is very similar to UnionPullUpConstantsRule. If there is no constant, you can return directly without optimization
final Aggregate aggregate = call.rel(0);
final RelNode input = call.rel(1);

final int groupCount = aggregate.getGroupCount();
if (groupCount == 1) {
  // No room for optimization since we cannot convert from non-empty
  // GROUP BY list to the empty one.
  return;
}

final RexBuilder rexBuilder = aggregate.getCluster().getRexBuilder();
final RelMetadataQuery mq = call.getMetadataQuery();
final RelOptPredicateList predicates =
    mq.getPulledUpPredicates(aggregate.getInput());
if (predicates == null) {
  return;
}
final NavigableMap<Integer, RexNode> map = new TreeMap<>();
for (int key : aggregate.getGroupSet()) {
  final RexInputRef ref =
      rexBuilder.makeInputRef(aggregate.getInput(), key);
  if (predicates.constantMap.containsKey(ref)) {
    map.put(key, predicates.constantMap.get(ref));
  }
}

// None of the group expressions are constant. Nothing to do.
if (map.isEmpty()) {
  return;
}
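As a rough model of this check, the sketch below uses a java.util.BitSet for the group set and a map from input column index to constant value in place of predicates.constantMap. Both choices, and all names, are assumptions made for illustration. An empty result corresponds to the rule returning without changing anything.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration; plain collections replace Calcite's types.
class GroupKeyConstantsSketch {
  static BitSet bits(int... indexes) {
    BitSet set = new BitSet();
    for (int i : indexes) {
      set.set(i);
    }
    return set;
  }

  // Returns the group keys that are known to be constant, in key order.
  static NavigableMap<Integer, Object> constantGroupKeys(
      BitSet groupSet, Map<Integer, Object> constantMap) {
    NavigableMap<Integer, Object> map = new TreeMap<>();
    if (groupSet.cardinality() == 1) {
      // A single group key cannot be removed, so there is nothing to do.
      return map;
    }
    for (int key = groupSet.nextSetBit(0); key >= 0;
        key = groupSet.nextSetBit(key + 1)) {
      if (constantMap.containsKey(key)) {
        map.put(key, constantMap.get(key));
      }
    }
    return map;
  }

  public static void main(String[] args) {
    Map<Integer, Object> constantMap = new HashMap<>();
    constantMap.put(1, 10); // EMPNO (input column 1 of the Project) is 10
    System.out.println(constantGroupKeys(bits(0, 1, 2), constantMap));
    System.out.println(constantGroupKeys(bits(1), constantMap));
  }
}
```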
- Build Aggregate
Because fewer columns are grouped, the group key of the LogicalAggregate also has to be adjusted
if (groupCount == map.size()) {
  // At least a single item in group by is required.
  // Otherwise "GROUP BY 1, 2" might be altered to "GROUP BY ()".
  // Removing of the first element is not optimal here,
  // however it will allow us to use fast path below (just trim
  // groupCount).
  map.remove(map.navigableKeySet().first());
}

ImmutableBitSet newGroupSet = aggregate.getGroupSet();
for (int key : map.keySet()) {
  newGroupSet = newGroupSet.clear(key);
}
final int newGroupCount = newGroupSet.cardinality();

// If the constants are on the trailing edge of the group list, we just
// reduce the group count.
final RelBuilder relBuilder = call.builder();
relBuilder.push(input);

// Clone aggregate calls.
final List<AggregateCall> newAggCalls = new ArrayList<>();
for (AggregateCall aggCall : aggregate.getAggCallList()) {
  newAggCalls.add(
      aggCall.adaptTo(input, aggCall.getArgList(), aggCall.filterArg,
          groupCount, newGroupCount));
}
relBuilder.aggregate(relBuilder.groupKey(newGroupSet), newAggCalls);
Note that relBuilder.aggregate() also trims the Project below the Aggregate
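The group-set adjustment can be sketched with java.util.BitSet standing in for ImmutableBitSet. The names are hypothetical, and there is one deliberate difference from the code above: the sketch works on a copy of the constants map, whereas the rule removes the first key from map itself so that the later projection step sees the same reduced map.

```java
import java.util.BitSet;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration; java.util.BitSet replaces ImmutableBitSet.
class NewGroupSetSketch {
  static BitSet bits(int... indexes) {
    BitSet set = new BitSet();
    for (int i : indexes) {
      set.set(i);
    }
    return set;
  }

  // Removes the constant keys from the group set, keeping at least one key.
  static BitSet newGroupSet(BitSet groupSet, NavigableMap<Integer, Object> constants) {
    NavigableMap<Integer, Object> toRemove = new TreeMap<>(constants);
    if (groupSet.cardinality() == toRemove.size()) {
      // Every key is constant: keep the first one so GROUP BY is not emptied.
      toRemove.remove(toRemove.firstKey());
    }
    BitSet result = (BitSet) groupSet.clone();
    for (int key : toRemove.keySet()) {
      result.clear(key);
    }
    return result;
  }

  public static void main(String[] args) {
    NavigableMap<Integer, Object> constants = new TreeMap<>();
    constants.put(1, 10); // EMPNO = 10
    System.out.println(newGroupSet(bits(0, 1, 2), constants));
  }
}
```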
- Build Project
This step collects the expressions needed by the top Project and then builds it
// Create a projection back again.
List<Pair<RexNode, String>> projects = new ArrayList<>();
int source = 0;
for (RelDataTypeField field : aggregate.getRowType().getFieldList()) {
  RexNode expr;
  final int i = field.getIndex();
  if (i >= groupCount) {
    // Aggregate expressions' names and positions are unchanged.
    expr = relBuilder.field(i - map.size());
  } else {
    int pos = aggregate.getGroupSet().nth(i);
    if (map.containsKey(pos)) {
      // Re-generate the constant expression in the project.
      RelDataType originalType =
          aggregate.getRowType().getFieldList().get(projects.size()).getType();
      if (!originalType.equals(map.get(pos).getType())) {
        expr = rexBuilder.makeCast(originalType, map.get(pos), true);
      } else {
        expr = map.get(pos);
      }
    } else {
      // Project the aggregation expression, in its original
      // position.
      expr = relBuilder.field(source);
      ++source;
    }
  }
  projects.add(Pair.of(expr, field.getName()));
}
relBuilder.project(Pair.left(projects), Pair.right(projects)); // inverse
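The index arithmetic in this loop is easier to follow on the example. The sketch below is a simplification with hypothetical names: expressions are strings, the group keys are given as an int array in place of getGroupSet().nth(i), and the type cast is left out. For the fields JOB, EMPNO, SAL, S with EMPNO constant, it reproduces the expressions of the top Project in the optimized plan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; strings stand in for RexNode expressions.
class ProjectBackSketch {
  static List<String> projectBack(
      List<String> fieldNames, int[] groupKeys, Map<Integer, Object> constants) {
    int groupCount = groupKeys.length;
    List<String> projects = new ArrayList<>();
    int source = 0; // next group column of the new, narrower Aggregate
    for (int i = 0; i < fieldNames.size(); i++) {
      String expr;
      if (i >= groupCount) {
        // Aggregate calls shift left by the number of removed group keys.
        expr = "$" + (i - constants.size());
      } else if (constants.containsKey(groupKeys[i])) {
        // A removed group key is re-generated as a literal.
        expr = String.valueOf(constants.get(groupKeys[i]));
      } else {
        // A remaining group key keeps its relative order.
        expr = "$" + source;
        source++;
      }
      projects.add(fieldNames.get(i) + "=[" + expr + "]");
    }
    return projects;
  }

  public static void main(String[] args) {
    Map<Integer, Object> constants = new HashMap<>();
    constants.put(1, 10); // EMPNO = 10
    System.out.println(projectBack(
        Arrays.asList("JOB", "EMPNO", "SAL", "S"), new int[] {0, 1, 2}, constants));
  }
}
```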
- Finishing
call.transformTo(relBuilder.build()); // submit the optimized plan