PostgreSQL -- Query Optimization -- organize plan tree

Posted by bampot on Sun, 26 Dec 2021 09:47:40 +0100

2021SC@SDUSC

summary

PostgreSQL code I am responsible for: query compilation and execution
Analysis content of this blog: Query Optimization - generate plan
The whole process of query optimization can be divided into three stages: preprocessing, generating path and generating plan. In the last blog, I analyzed the process of generating the complete plan and the functions I invoked. In this blog, I will analyze the last step of query optimization, and also the last step of generating the plan.
After the generated complete plan is sorted out by the plan tree, it can be handed over to the query executor for execution. The main function responsible for sorting is set_plan_references function. Sorting out the plan tree is the last step of the query optimizer. It is mainly to facilitate the execution of the executor and make final adjustments to some expression details of the plan tree. For example, change the upper Var structure into a reference to the output result of the sub plan, obtain the OID of the operator, etc. At the same time, this step will also delete those useless sub query scan plan nodes. Different functions will be used to perform different sorting operations during sorting. The main sorting functions are as follows:

Main functionFunction introduction
fix_expr_referencesBy calling fix_expr_references_walker to clean up the expression (target attribute table or condition expression)
set_subqueryscan_referencesSet on SubqueryScan_ plan_ References, that is, trying to remove the SubqueryScan node
fix_opfuncidsBy calling fix_opfuncids_walker, which complements the value of the expression node called by the operator in an expression tree
trivial_subqueryscanCheck whether a SubqueryScan can be deleted from the plan tree
adjust_plan_varnosAdjust the offset of the indexes of varnos and other related tables in the plan tree through rtoffset
adjust_expr_varnosAdjust the offset of varnos in a variable in an expression through rtoffset
fix_expr_referencesDo the final cleanup of expressions (such as target attribute table or constraint expression)
set_join_referencesModify the target attribute table and constraint expression of the join node by setting varnos as the inner connection or outer connection and setting the value of attno as the number of tuples of the result of inner connection or outer connection
set_inner_join_referencesThe join node that handles the expression that appears in the internal index join
set_uppernode_referencesUpdate the target variable list and expression of the upper layer according to the return tuple of the left subtree sub plan
build_tlist_indexThe result of building an index for the target attribute table of a child node
search_indexed_tlist_for_varFind a variable var in the list of an index. If found, return a copy of it. If not found, return null
search_indexed_tlist_for_non_varFind a non variable node in the list of an index. If it exists, return a Var to point to the item. If it does not exist, return null
join_referencesCreate a join expression or a collection of target attribute tables by modifying the value of varno/varattno in the sentence to the target attribute table of external connection or internal connection
replace_vars_with_subplan_refsThis program modifies the expression tree so that all variable nodes are associated with the target node of the sub plan. It is mainly used to deal with non connected expressions or target attribute tables in upper level plan nodes

Next, I will select some functions for code analysis and function explanation

set_plan_references function

The functions of this function are
1: Reduce the range table of each sub query to a single list, and eliminate the rangetbentry field that is useless to the performer
2: Adjust the variables in the scan node to match the flat range table
3: Adjust the variables in the top-level plan node so that they refer to output sub plans
4: Optimize the local aggregation or minimum aggregation of the summary in the ag plan node
5:PARAM_ The multixpr parameter is used by the normal param_ Replaced by exec parameter
6: Calculate the regproc OIDs of the operator (that is, find the function that implements each operator)
7: Create a list of specific objects on which the plan depends. These lists are called plancache C is used to control the invalidation of cache plan
8: Assign a unique ID to each plan node in the plan tree

set_plan_references(PlannerInfo *root, Plan *plan)
{
	PlannerGlobal *glob = root->glob;
	int			rtoffset = list_length(glob->finalrtable);
	ListCell   *lc;
//Add the RTE of all queries to the flattened range table. RTE will increase their rangetable indexes through rtoffset (additional RTS, which are not referenced by the Plan tree, may be added after these RTs)
	add_rtes_to_flat_rtable(root, false);
//Adjust the RT index of PlanRowMarks and add it to the final row mark list

	foreach(lc, root->rowMarks)//ergodic
	{
		PlanRowMark *rc = lfirst_node(PlanRowMark, lc);
		PlanRowMark *newrc;
    //Because all fields are scalars, a flat copy is sufficient 
		newrc = (PlanRowMark *) palloc(sizeof(PlanRowMark));
		memcpy(newrc, rc, sizeof(PlanRowMark));
    //Adjust index But not the index of the row flag
		newrc->rti += rtoffset;
		newrc->prti += rtoffset;
		//Add to final rowmarks field
		glob->finalrowmarks = lappend(glob->finalrowmarks, newrc);
	}
	//Modify plan tree
	return set_plan_refs(root, plan, rtoffset);
}

set_subqueryscan_references function

Set on SubqueryScan_ plan_ References, that is, trying to remove the SubqueryScan node. If not, it must be processed normally.

set_subqueryscan_references(PlannerInfo *root,
							SubqueryScan *plan,
							int rtoffset)
{
	RelOptInfo *rel;
	Plan	   *result;

	 //The reoptinfo of the subquery needs to be found because its subroot is required 
	 	rel = find_base_rel(root, plan->scan.scanrelid);

	//Process sub plans recursively
	plan->subplan = set_plan_references(rel->subroot, plan->subplan);

	if (trivial_subqueryscan(plan))
	{
		// We can omit the SubqueryScan node and directly pull out the sub plan
		result = clean_up_removed_plan_level((Plan *) plan, plan->subplan);
	}
	else
	{
		 //Set is required to reserve SubqueryScan nodes_ plan_ References is what you were supposed to do. Note that there is no set here_ upper_ References (), because SubqueryScan always correctly references the output of its sub plan when it is created
		
		plan->scan.scanrelid += rtoffset;
		plan->scan.plan.targetlist =
			fix_scan_list(root, plan->scan.plan.targetlist, rtoffset);
		plan->scan.plan.qual =
			fix_scan_list(root, plan->scan.plan.qual, rtoffset);

		result = (Plan *) plan;
	}
   //Return query tree
	return result;
}

set_join_references function

By setting varnos to OUTER_VAR or INNER_VAR, and set the attno value to the result field number of the corresponding external connection or internal connection tuple item to modify the target list and quals of the connection node to reference its sub plan

set_join_references(PlannerInfo *root, Join *join, int rtoffset)
{
	Plan	   *outer_plan = join->plan.lefttree;
	Plan	   *inner_plan = join->plan.righttree;
	indexed_tlist *outer_itlist;
	indexed_tlist *inner_itlist;

	outer_itlist = build_tlist_index(outer_plan->targetlist);
	inner_itlist = build_tlist_index(inner_plan->targetlist);

	//Handle join expressions (including merge or hash clauses) first. These are logically below the join, so they can always use all available values in the input list. You can also handle NestLoopParams, because these cannot refer to nullable subexpressions
	join->joinqual = fix_join_expr(root,join->joinqual,
								   outer_itlist,
								   inner_itlist,
								   (Index) 0,
								   rtoffset);

	//Do specific connection types
	if (IsA(join, NestLoop))
	{
		NestLoop   *nl = (NestLoop *) join;
		ListCell   *lc;

		foreach(lc, nl->nestParams)
		{
			NestLoopParam *nlp = (NestLoopParam *) lfirst(lc);

			nlp->paramval = (Var *) fix_upper_expr(root,
												   (Node *) nlp->paramval,												   outer_itlist,OUTER_VAR,rtoffset);
//Check that any PlaceHolderVar is replaced with a simple Var
			if (!(IsA(nlp->paramval, Var) &&
				  nlp->paramval->varno == OUTER_VAR))
				elog(ERROR, "NestLoopParam was not reduced to a simple Var");
		}
	}
	else if (IsA(join, MergeJoin))//Judge whether the link type is MergeJoin
	{
		MergeJoin  *mj = (MergeJoin *) join;

		mj->mergeclauses = fix_join_expr(root,
										 mj->mergeclauses,
										 outer_itlist,
										 inner_itlist,
										 (Index) 0,
										 rtoffset);
	}
	else if (IsA(join, HashJoin))//Judge whether it is a connection or a HashJoin
	{
		HashJoin   *hj = (HashJoin *) join;

		hj->hashclauses = fix_join_expr(root,
										hj->hashclauses,
										outer_itlist,
										inner_itlist,
										(Index) 0,
										rtoffset);

//The hash key of HashJoin is used to find the matching entity of hash table from its external plan.
		hj->hashkeys = (List *) fix_upper_expr(root,
											   (Node *) hj->hashkeys, outer_itlist,OUTER_VAR,rtoffset);
	}

	 //targetlist and qpqual need to be fixed. They are logically above the connection, which means that they should not reuse any input expression that can be evaluated on the empty side of the outer connection
	 //Determine the type of connection
	switch (join->jointype)
	{
		case JOIN_LEFT:
		case JOIN_SEMI:
		case JOIN_ANTI:
			inner_itlist->has_non_vars = false;
			break;
		case JOIN_RIGHT:
			outer_itlist->has_non_vars = false;
			break;
		case JOIN_FULL:
			outer_itlist->has_non_vars = false;
			inner_itlist->has_non_vars = false;
			break;
		default:
			break;
	}

	join->plan.targetlist = fix_join_expr(root,
										  join->plan.targetlist,
										  outer_itlist,
										  inner_itlist,
										  (Index) 0,
										  rtoffset);
	join->plan.qual = fix_join_expr(root,
									join->plan.qual,
									outer_itlist,
									inner_itlist,
									(Index) 0,
									rtoffset);

	pfree(outer_itlist);
	pfree(inner_itlist);
}

search_indexed_tlist_for_non_var function

search_ indexed_ tlist_ for_ non_ The Var function looks for a non Var in an indexed TList. If a matching one is found, it returns a Var constructed to refer to the TList item. If there is no match, NULL is returned. Note: unless itlist - > has_ ph_ Vars or itlist > has_ non_ Vars, otherwise calling this is a waste of time. In addition, set_join_references() depends on the ability to clear itlist - > has_ non_ Vars to prevent non vars matching.

search_indexed_tlist_for_non_var(Expr *node,
								 indexed_tlist *itlist, Index newvarno)
{
	TargetEntry *tle;
//If it is a simple constant, it is foolish to replace it with Var, even if there is exactly the same constant below; Var is more expensive to execute than constants. More importantly, replacing it may confuse some places in the actuator where you want to see simple Consts, such as abandoned columns
	if (IsA(node, Const))//Judge whether the node type is const
		return NULL;

	tle = tlist_member(node, itlist->tlist);
	if (tle)
	{
		//A matching sub plan output representation was found
		Var		   *newvar;

		newvar = makeVarFromTargetEntry(newvarno, tle);
		newvar->varnoold = 0;	//There is no ordinary Var
		newvar->varoattno = 0;
		return newvar;
	}
	return NULL;				//Mismatch returns null
}

trivial_subqueryscan function

Detect whether a SubqueryScan can be deleted from the plan tree. If it is not eligible for inspection, it can be deleted, and the target list just re digests the output of the sub plan.

trivial_subqueryscan(SubqueryScan *plan)
{
	int			attrno;
	ListCell   *lp,
			   *lc;
	//Returns false if the plan expression is not empty
	if (plan->scan.plan.qual != NIL)
		return false;
/// / if the length of the list is different, false is returned
	if (list_length(plan->scan.plan.targetlist) !=
		list_length(plan->subplan->targetlist))
		return false;			

	attrno = 1;
	forboth(lp, plan->scan.plan.targetlist, lc, plan->subplan->targetlist)
	{
		TargetEntry *ptle = (TargetEntry *) lfirst(lp);
		TargetEntry *ctle = (TargetEntry *) lfirst(lc);
     //If tlist does not match the garbage state, false is returned
		if (ptle->resjunk != ctle->resjunk)
			return false;	

//Accept a Var that references the corresponding element in the sub plan tlist, or a Const that is equivalent to the sub plan element
		if (ptle->expr && IsA(ptle->expr, Var))
		{
			Var		   *var = (Var *) ptle->expr;

			Assert(var->varno == plan->scan.scanrelid);
			Assert(var->varlevelsup == 0);
			//false if out of order
			if (var->varattno != attrno)
				return false;	
		}
		else if (ptle->expr && IsA(ptle->expr, Const))
		{
			if (!equal(ptle->expr, ctle->expr))
				return false;
		}
		else
			return false;

		attrno++;
	}
	return true;
}

build_tlist_index function

build_tlist_index - creates an index data structure for a sublist.
In most cases, the sub plan tlist will be a "flat" tlist with only Vars.
Therefore, we try to optimize this situation by extracting the information of Vars. Matching the parent list with the child list is still an O(N^2) operation, but it is at least much smaller than the ordinary constant.

build_tlist_index(List *tlist)
{
	indexed_tlist *itlist;
	tlist_vinfo *vinfo;
	ListCell   *l;//Temporary variable

	//Create a data structure with enough slots to accommodate all tlist entries
	itlist = (indexed_tlist *)
		palloc(offsetof(indexed_tlist, vars) +
			   list_length(tlist) * sizeof(tlist_vinfo));
    //itlist initialization
	itlist->tlist = tlist;
	itlist->has_ph_vars = false;
	itlist->has_non_vars = false;

	//Find Vars and fill in the index array
	vinfo = itlist->vars;
	foreach(l, tlist)//Traverse tlist linked list
	{
		TargetEntry *tle = (TargetEntry *) lfirst(l);

		if (tle->expr && IsA(tle->expr, Var))
		{
			Var		   *var = (Var *) tle->expr;
           //Assign a value to the varno field
			vinfo->varno = var->varno;
			 //Assign a value to the varattno field
			vinfo->varattno = var->varattno;
			 //Assign a value to the resno field
			vinfo->resno = tle->resno;
			vinfo++;
		}
		//If the expression is of type PlaceHolderVar, it will have_ ph_ Vars set to true
		else if (tle->expr && IsA(tle->expr, PlaceHolderVar))
			itlist->has_ph_vars = true;
		//Otherwise, if not, it will have_ non_ The vars field is set to true
		else
			itlist->has_non_vars = true;
	}

	itlist->num_vars = (vinfo - itlist->vars);

	return itlist;
}

summary

Through this blog, I explained the main functions and functions of the last step of query optimization, that is, sorting out the plan tree (after this step, the plan tree can be sorted out and handed over to the query executor for execution), as well as the code analysis of some of the functions. Thank you for your criticism and correction!

Topics: Database PostgreSQL