Calcite's PullUp rule

Posted by pk-uk on Tue, 01 Feb 2022 03:15:11 +0100

background

A transform I designed earlier has a bug that can be fixed by pulling up the limit, so I studied the pull-up rules in Calcite. This also lays a foundation for eliminating common subexpressions later.

rule

UnionPullUpConstantsRule

What does it do?

As the name suggests, this rule pulls the constants of a union up above the union. For example, the following SQL projects the constant 2 in both branches

select 2, deptno, job from emp as e1
union all
select 2, deptno, job from emp as e2

The plan before optimization is as follows

LogicalUnion(all=[true])
  LogicalProject(EXPR$0=[2], DEPTNO=[$7], JOB=[$2])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
  LogicalProject(EXPR$0=[2], DEPTNO=[$7], JOB=[$2])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])

This case alone does not show much benefit, but if union all is changed to a distinct union, the rewrite reduces the work the distinct has to do, because the constant column no longer takes part in it (combined with the push-down of filters, the effect is more obvious)

The optimized plan is as follows. The projection of the constant 2 now happens after the union

LogicalProject(EXPR$0=[2], DEPTNO=[$0], JOB=[$1])
  LogicalUnion(all=[true])
    LogicalProject(DEPTNO=[$7], JOB=[$2])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalProject(DEPTNO=[$7], JOB=[$2])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])

As can be seen from the above example, this rule modifies the two LogicalProjects below the LogicalUnion and adds a LogicalProject on top of it. The code for the rule is therefore unlikely to be trivial
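To make the benefit for a distinct union concrete, here is a minimal sketch in plain Java. It does not use Calcite: rows are modelled as lists of strings, and the class and method names (UnionPullUpSketch, before, after) are hypothetical. It only illustrates that deduplicating the narrower rows and re-adding the constant afterwards gives the same result as deduplicating rows that still carry the constant.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only; this is not how Calcite represents plans or rows.
final class UnionPullUpSketch {
  // Prepends the constant column to a (deptno, job) row.
  private static List<String> withConstant(String constant, List<String> row) {
    List<String> out = new ArrayList<>();
    out.add(constant);
    out.addAll(row);
    return out;
  }

  // Before the rewrite: each branch projects (constant, deptno, job),
  // so the distinct union compares three columns per row.
  static Set<List<String>> before(List<List<String>> left,
      List<List<String>> right, String constant) {
    Set<List<String>> result = new LinkedHashSet<>();
    for (List<String> row : left) {
      result.add(withConstant(constant, row));
    }
    for (List<String> row : right) {
      result.add(withConstant(constant, row));
    }
    return result;
  }

  // After the rewrite: the distinct union sees only (deptno, job);
  // the constant is added by a projection on top.
  static Set<List<String>> after(List<List<String>> left,
      List<List<String>> right, String constant) {
    Set<List<String>> narrow = new LinkedHashSet<>(left);
    narrow.addAll(right);
    Set<List<String>> result = new LinkedHashSet<>();
    for (List<String> row : narrow) {
      result.add(withConstant(constant, row));
    }
    return result;
  }
}
```

Both methods return the same set for any pair of inputs; the second one simply hashes and compares shorter rows.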

How?

Like most rules, it is handled in the onMatch method

  1. Check for constants
    First, the constants are extracted to decide whether any optimization is needed
final Union union = call.rel(0);
// The RexBuilder (row-expression builder) could be declared further down; it is not used in this step
final RexBuilder rexBuilder = union.getCluster().getRexBuilder();
// Metadata records some query information
final RelMetadataQuery mq = call.getMetadataQuery();
// Get the predicate through getPulledUpPredicates
final RelOptPredicateList predicates = mq.getPulledUpPredicates(union);
// If no predicates are available, there is nothing to optimize
if (predicates == null) {
  return;
}
// Otherwise, extract the constants from the predicates, keyed by field index
final Map<Integer, RexNode> constants = new HashMap<>();
for (Map.Entry<RexNode, RexNode> e : predicates.constantMap.entrySet()) {
  if (e.getKey() instanceof RexInputRef) {
    constants.put(((RexInputRef) e.getKey()).getIndex(), e.getValue());
  }
}
// If none of the expressions are constant, there is nothing to do; return directly
if (constants.isEmpty()) {
  return;
}
  2. Build the expressions for the top LogicalProject
    From here on the real optimization starts. The RexBuilder created earlier is used here
// Create expressions for Project operators before and after the Union
List<RelDataTypeField> fields = union.getRowType().getFieldList(); // Fields of the union
List<RexNode> topChildExprs = new ArrayList<>(); // Expressions of the top Project (constants or references)
List<String> topChildExprsFields = new ArrayList<>(); // Field name for each expression
List<RexNode> refs = new ArrayList<>(); // References to the non-constant fields
// Builder for the set of referenced field indexes
ImmutableBitSet.Builder refsIndexBuilder = ImmutableBitSet.builder();
for (RelDataTypeField field : fields) {
  final RexNode constant = constants.get(field.getIndex());
  // Constants can be added directly
  if (constant != null) {
    topChildExprs.add(constant);
    topChildExprsFields.add(field.getName());
  } else {
    // For a non-constant field, build an input reference
    final RexNode expr = rexBuilder.makeInputRef(union, field.getIndex());
    topChildExprs.add(expr);
    topChildExprsFields.add(field.getName());
    refs.add(expr);
    refsIndexBuilder.set(field.getIndex());
  }
}
ImmutableBitSet refsIndex = refsIndexBuilder.build();
// Update top Project positions
final Mappings.TargetMapping mapping =
    RelOptUtil.permutation(refs, union.getInput(0).getRowType()).inverse();
topChildExprs = ImmutableList.copyOf(RexUtil.apply(mapping, topChildExprs));

The whole step builds the expressions of the top Project from the fields of the union

[Figure: the unit test testPullConstantThroughUnion]


  3. Build a new plan
    As mentioned earlier, the two LogicalProjects below the Union are changed and a LogicalProject is added on top of the Union. That happens here

// Create new Project-Union-Project sequences
final RelBuilder relBuilder = call.builder();
for (RelNode input : union.getInputs()) {
  List<Pair<RexNode, String>> newChildExprs = new ArrayList<>();
  for (int j : refsIndex) {
    newChildExprs.add(
        Pair.of(rexBuilder.makeInputRef(input, j),
            input.getRowType().getFieldList().get(j).getName()));
  }
  if (newChildExprs.isEmpty()) {
    // At least a single item in project is required.
    newChildExprs.add(
        Pair.of(topChildExprs.get(0), topChildExprsFields.get(0)));
  }
  // Add the input with project on top
  relBuilder.push(input);
  relBuilder.project(Pair.left(newChildExprs), Pair.right(newChildExprs));
}
relBuilder.union(union.all, union.getInputs().size());
// Create top Project fixing nullability of fields
relBuilder.project(topChildExprs, topChildExprsFields);
relBuilder.convert(union.getRowType(), false);
  4. Finishing
    All RelOptRules submit the result built by relBuilder through transformTo
    call.transformTo(relBuilder.build());
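
The index bookkeeping in the two Project-building steps above is easy to lose track of, so here is a minimal sketch of it in plain Java. It is not Calcite code: expressions are modelled as strings ("$1" stands for an input reference to field 1, anything else is a constant literal), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of the expression layout; strings stand in for RexNode.
final class UnionProjectLayoutSketch {
  // Expressions of the new Project placed on each union input:
  // references to the non-constant fields only.
  static List<String> childExprs(int fieldCount, Map<Integer, String> constants) {
    List<String> exprs = new ArrayList<>();
    for (int i = 0; i < fieldCount; i++) {
      if (!constants.containsKey(i)) {
        exprs.add("$" + i);
      }
    }
    if (exprs.isEmpty()) {
      // Every field is constant: a Project still needs one item,
      // so keep the first constant, as the rule does.
      exprs.add(constants.get(0));
    }
    return exprs;
  }

  // Expressions of the Project placed on top of the narrowed union:
  // constants are emitted as-is, other fields are re-pointed at their
  // new position among the surviving columns.
  static List<String> topExprs(int fieldCount, Map<Integer, String> constants) {
    List<String> exprs = new ArrayList<>();
    int next = 0; // position of the next surviving column
    for (int i = 0; i < fieldCount; i++) {
      String constant = constants.get(i);
      if (constant != null) {
        exprs.add(constant);
      } else {
        exprs.add("$" + next);
        next++;
      }
    }
    return exprs;
  }
}
```

For the example query, the union has three fields and field 0 is the constant 2. With that input, childExprs yields [$1, $2] and topExprs yields [2, $0, $1], which matches the shape of the optimized plan shown earlier (the child references are expressed against the existing Project, which is why the final plan shows $7 and $2 after the Projects are merged).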

AggregateProjectPullUpConstantsRule

What does it do?

As the name suggests, this rule is about aggregates. There is also a fairly simple unit test for it, as follows

select job, empno, sal, sum(sal) as s
from emp where empno = 10
group by job, empno, sal

Before optimization, EMPNO is one of the grouping keys of the aggregation

LogicalAggregate(group=[{0, 1, 2}], S=[SUM($2)])
  LogicalProject(JOB=[$2], EMPNO=[$0], SAL=[$5])
    LogicalFilter(condition=[=($0, 10)])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])

After optimization, EMPNO is no longer a grouping key, which saves part of the computation

LogicalProject(JOB=[$0], EMPNO=[10], SAL=[$1], S=[$2])
  LogicalAggregate(group=[{0, 1}], S=[SUM($1)])
    LogicalProject(JOB=[$2], SAL=[$5])
      LogicalFilter(condition=[=($0, 10)])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])

As with UnionPullUpConstantsRule, the LogicalProject below the LogicalAggregate is changed and a LogicalProject is added on top of it. In addition, the LogicalAggregate itself is changed
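
The following plain-Java sketch illustrates why dropping the constant key is safe. It does not use Calcite; the Emp class and the before/after methods are hypothetical, and the data is made up for illustration. When every surviving row has the same EMPNO, grouping by (JOB, EMPNO, SAL) and grouping by (JOB, SAL) produce the same groups, so EMPNO can be re-added as a literal afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only; rows are plain objects, not Calcite relational expressions.
final class AggregatePullUpSketch {
  static final class Emp {
    final String job;
    final int empno;
    final int sal;

    Emp(String job, int empno, int sal) {
      this.job = job;
      this.empno = empno;
      this.sal = sal;
    }
  }

  // Before: filter on empno, GROUP BY job, empno, sal; rows are [job, empno, sal, sum].
  static List<List<Object>> before(List<Emp> emps, int empno) {
    Map<List<Object>, Integer> sums = new LinkedHashMap<>();
    for (Emp e : emps) {
      if (e.empno == empno) {
        sums.merge(Arrays.<Object>asList(e.job, e.empno, e.sal), e.sal, Integer::sum);
      }
    }
    List<List<Object>> rows = new ArrayList<>();
    for (Map.Entry<List<Object>, Integer> entry : sums.entrySet()) {
      List<Object> row = new ArrayList<>(entry.getKey());
      row.add(entry.getValue());
      rows.add(row);
    }
    return rows;
  }

  // After: GROUP BY job, sal only; the top projection re-inserts empno as a constant.
  static List<List<Object>> after(List<Emp> emps, int empno) {
    Map<List<Object>, Integer> sums = new LinkedHashMap<>();
    for (Emp e : emps) {
      if (e.empno == empno) {
        sums.merge(Arrays.<Object>asList(e.job, e.sal), e.sal, Integer::sum);
      }
    }
    List<List<Object>> rows = new ArrayList<>();
    for (Map.Entry<List<Object>, Integer> entry : sums.entrySet()) {
      List<Object> key = entry.getKey();
      rows.add(Arrays.<Object>asList(key.get(0), empno, key.get(1), entry.getValue()));
    }
    return rows;
  }
}
```

Calling before and after with the same rows and the same filter value returns equal lists; the second version just hashes a shorter key.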

How?

  1. Check for constants
    This is very similar to UnionPullUpConstantsRule. If there are no constants, the rule returns directly without optimizing
 final Aggregate aggregate = call.rel(0);
 final RelNode input = call.rel(1);

 final int groupCount = aggregate.getGroupCount();
 if (groupCount == 1) {
   // No room for optimization since we cannot convert from non-empty
   // GROUP BY list to the empty one.
   return;
 }

 final RexBuilder rexBuilder = aggregate.getCluster().getRexBuilder();
 final RelMetadataQuery mq = call.getMetadataQuery();
 final RelOptPredicateList predicates =
     mq.getPulledUpPredicates(aggregate.getInput());
 if (predicates == null) {
   return;
 }
 final NavigableMap<Integer, RexNode> map = new TreeMap<>();
 for (int key : aggregate.getGroupSet()) {
   final RexInputRef ref =
       rexBuilder.makeInputRef(aggregate.getInput(), key);
   if (predicates.constantMap.containsKey(ref)) {
     map.put(key, predicates.constantMap.get(ref));
   }
 }

 // None of the group expressions are constant. Nothing to do.
 if (map.isEmpty()) {
   return;
 }
  2. Build the Aggregate
    Because fewer columns are grouped, the group keys of the LogicalAggregate must be adjusted as well
if (groupCount == map.size()) {
  // At least a single item in group by is required.
  // Otherwise "GROUP BY 1, 2" might be altered to "GROUP BY ()".
  // Removing of the first element is not optimal here,
  // however it will allow us to use fast path below (just trim
  // groupCount).
  map.remove(map.navigableKeySet().first());
}

ImmutableBitSet newGroupSet = aggregate.getGroupSet();
for (int key : map.keySet()) {
  newGroupSet = newGroupSet.clear(key);
}
final int newGroupCount = newGroupSet.cardinality();

// If the constants are on the trailing edge of the group list, we just
// reduce the group count.
final RelBuilder relBuilder = call.builder();
relBuilder.push(input);

// Clone aggregate calls.
final List<AggregateCall> newAggCalls = new ArrayList<>();
for (AggregateCall aggCall : aggregate.getAggCallList()) {
  newAggCalls.add(
      aggCall.adaptTo(input, aggCall.getArgList(), aggCall.filterArg,
          groupCount, newGroupCount));
}
relBuilder.aggregate(relBuilder.groupKey(newGroupSet), newAggCalls);

Note that relBuilder.aggregate() also trims the Project below the aggregate

  3. Build the Project
    This step mainly collects the expressions required by the top Project and then builds it
// Create a projection back again.
List<Pair<RexNode, String>> projects = new ArrayList<>();
int source = 0;
for (RelDataTypeField field : aggregate.getRowType().getFieldList()) {
  RexNode expr;
  final int i = field.getIndex();
  if (i >= groupCount) {
    // Aggregate expressions' names and positions are unchanged.
    expr = relBuilder.field(i - map.size());
  } else {
    int pos = aggregate.getGroupSet().nth(i);
    if (map.containsKey(pos)) {
      // Re-generate the constant expression in the project.
      RelDataType originalType =
          aggregate.getRowType().getFieldList().get(projects.size()).getType();
      if (!originalType.equals(map.get(pos).getType())) {
        expr = rexBuilder.makeCast(originalType, map.get(pos), true);
      } else {
        expr = map.get(pos);
      }
    } else {
      // Project the aggregation expression, in its original
      // position.
      expr = relBuilder.field(source);
      ++source;
    }
  }
  projects.add(Pair.of(expr, field.getName()));
}
relBuilder.project(Pair.left(projects), Pair.right(projects)); // inverse
  4. Finishing
    call.transformTo(relBuilder.build()); // submit the optimized plan
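
The group-set adjustment and the position arithmetic of the top Project can be sketched in plain Java as follows. This is a simplified model, not Calcite code: java.util.BitSet stands in for ImmutableBitSet, expressions are strings ("$0" is a reference to output field 0 of the new aggregate, anything else is a constant literal), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.NavigableMap;

// Toy model of the bookkeeping; strings stand in for RexNode.
final class AggregateLayoutSketch {
  // Drops the constant keys from the group set. If every key is constant,
  // the first one is kept (and removed from the map) so that the GROUP BY
  // list never becomes empty. Note that this mutates the map, like the rule.
  static BitSet newGroupSet(BitSet groupSet, NavigableMap<Integer, String> constants) {
    if (groupSet.cardinality() == constants.size()) {
      constants.remove(constants.firstKey());
    }
    BitSet result = (BitSet) groupSet.clone();
    for (int key : constants.keySet()) {
      result.clear(key);
    }
    return result;
  }

  // Lays out the top Project: constants are re-generated in their original
  // position, surviving keys are read in order, and aggregate results shift
  // left by the number of removed keys.
  static List<String> topProjection(BitSet groupSet,
      NavigableMap<Integer, String> constants, int aggCallCount) {
    List<String> exprs = new ArrayList<>();
    int source = 0; // next surviving group key in the new aggregate
    for (int pos = groupSet.nextSetBit(0); pos >= 0; pos = groupSet.nextSetBit(pos + 1)) {
      if (constants.containsKey(pos)) {
        exprs.add(constants.get(pos));
      } else {
        exprs.add("$" + source);
        source++;
      }
    }
    int groupCount = groupSet.cardinality();
    for (int i = groupCount; i < groupCount + aggCallCount; i++) {
      exprs.add("$" + (i - constants.size()));
    }
    return exprs;
  }
}
```

For the example query, the group set is {0, 1, 2}, key 1 (EMPNO) is the constant 10 and there is one aggregate call. newGroupSet returns {0, 2} and topProjection returns [$0, 10, $1, $2], which lines up with the top LogicalProject of the optimized plan. The plan prints the group set as {0, 1} rather than {0, 2} because the Project below the aggregate has been trimmed as well.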
