05 SQL statement execution process parsing 1 - query analysis and optimization rewriting

Posted by refiking on Sat, 04 Dec 2021 04:35:19 +0100

sql execution statement flow analysis

The whole processing flow is in exec_simple_query function. The code structure is as follows:

/*
 * exec_simple_query
 *
 * Execute a "simple Query" protocol message.
 */
static void
exec_simple_query(const char *query_string)
{
	...
	//Original syntax tree acquisition
	/*
	 * Do basic parsing of the query or queries (this should be safe even if
	 * we are in aborted transaction state!)
	 */
	parsetree_list = pg_parse_query(query_string);

	...
	//Circular processing of sql statements
	/*
	 * Run through the raw parsetree(s) and process each one.
	 */
	foreach(parsetree_item, parsetree_list)
	{
		...
		
		//Analyze and rewrite the original syntax tree to generate a query syntax tree
		querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
												NULL, 0, NULL);
		//Optimize the query syntax tree to generate an execution plan
		plantree_list = pg_plan_queries(querytree_list,
										CURSOR_OPT_PARALLEL_OK, NULL);

		...
		
		//Execute statement
		/*
		 * Run the portal to completion, and then drop it (and the receiver).
		 */
		(void) PortalRun(portal,
						 FETCH_ALL,
						 true,	/* always top level */
						 true,
						 receiver,
						 receiver,
						 completionTag);

		...
	}
	...
}

Query analysis and optimization rewriting

Lexical and grammatical analysis

Use FLEX and BISON for syntax parsing. See https://my.oschina.net/Greedxuji/blog/4290160

Query analysis and optimization rewriting

After lexical and grammatical parsing of sql statements, an original syntax tree will be obtained. The function of query analysis is to analyze and rewrite the original syntax tree and convert the original tree into one or more query syntax trees.

The functions of this part are mainly in PG_ analyze_ and_ The main operation steps are syntax analysis and optimized rewriting.

The code framework is as follows:

/*
 * Given a raw parsetree (gram.y output), and optionally information about
 * types of parameter symbols ($n), perform parse analysis and rule rewriting.
 *
 * A list of Query nodes is returned, since either the analyzer or the
 * rewriter might expand one query to several.
 *
 * NOTE: for reasons mentioned above, this must be separate from raw parsing.
 */
List *
pg_analyze_and_rewrite(RawStmt *parsetree, const char *query_string,
					   Oid *paramTypes, int numParams,
					   QueryEnvironment *queryEnv)
{
	Query	   *query;
	List	   *querytree_list;

	TRACE_POSTGRESQL_QUERY_REWRITE_START(query_string);

	/*
	 * (1) Perform parse analysis.
	 */
	if (log_parser_stats)
		ResetUsage();

	//Primitive syntax tree analysis
	query = parse_analyze(parsetree, query_string, paramTypes, numParams,
						  queryEnv);

	if (log_parser_stats)
		ShowUsage("PARSE ANALYSIS STATISTICS");

	//Optimized rewriting of original syntax tree
	/*
	 * (2) Rewrite the queries, as necessary
	 */
	querytree_list = pg_rewrite_query(query);

	TRACE_POSTGRESQL_QUERY_REWRITE_DONE(query_string);

	return querytree_list;
}

Query analysis parse_analyze

Query analysis is to transform the original syntax tree into a query syntax tree. Because the element syntax tree is a tree structure, the nodes traversing the tree perform the corresponding processing.

The basic call stack is as follows. It can be seen that the relevant processing of select has been fully included; The corresponding sql statements can be executed according to the corresponding execution node.

parse_analyze
	->transformTopLevelStmt
		->transformOptionalSelectInto
			->transformStmt
				->transformInsertStmt
				->transformDeleteStmt
				->transformUpdateStmt
				->transformSelectStmt
				->transformDeclareCursorStmt
				->transformExplainStmt
				->transformCreateTableAsStmt
				->transformCallStmt

Main function analysis

Take "SELECT * FROM A_TBL,B_TBL WHERE xx == xx" as an example.

The transformSelectStmt function is called first when the command is executed.

The SELECT command contains 7 kinds of information processing: with. From. Target. Where. Having. Order by. Group by. Distinct. Each information processing corresponds to a processing function. The specific codes are as follows:

static Query *
transformSelectStmt(ParseState *pstate, SelectStmt *stmt)
{
	Query	   *qry = makeNode(Query);
	Node	   *qual;
	ListCell   *l;

	qry->commandType = CMD_SELECT;

	/* process the WITH clause independently of all else */
	if (stmt->withClause)
	{
		qry->hasRecursive = stmt->withClause->recursive;
		qry->cteList = transformWithClause(pstate, stmt->withClause);
		qry->hasModifyingCTE = pstate->p_hasModifyingCTE;
	}

	/* Complain if we get called from someplace where INTO is not allowed */
	if (stmt->intoClause)
		ereport(ERROR,
				(errcode(ERRCODE_SYNTAX_ERROR),
				 errmsg("SELECT ... INTO is not allowed here"),
				 parser_errposition(pstate,
									exprLocation((Node *) stmt->intoClause))));

	/* make FOR UPDATE/FOR SHARE info available to addRangeTableEntry */
	pstate->p_locking_clause = stmt->lockingClause;

	/* make WINDOW info available for window functions, too */
	pstate->p_windowdefs = stmt->windowClause;

	/* process the FROM clause */
	transformFromClause(pstate, stmt->fromClause);

	/* transform targetlist */
	qry->targetList = transformTargetList(pstate, stmt->targetList,
										  EXPR_KIND_SELECT_TARGET);

	/* mark column origins */
	markTargetListOrigins(pstate, qry->targetList);

	/* transform WHERE */
	qual = transformWhereClause(pstate, stmt->whereClause,
								EXPR_KIND_WHERE, "WHERE");

	/* initial processing of HAVING clause is much like WHERE clause */
	qry->havingQual = transformWhereClause(pstate, stmt->havingClause,
										   EXPR_KIND_HAVING, "HAVING");

	/*
	 * Transform sorting/grouping stuff.  Do ORDER BY first because both
	 * transformGroupClause and transformDistinctClause need the results. Note
	 * that these functions can also change the targetList, so it's passed to
	 * them by reference.
	 */
	qry->sortClause = transformSortClause(pstate,
										  stmt->sortClause,
										  &qry->targetList,
										  EXPR_KIND_ORDER_BY,
										  false /* allow SQL92 rules */ );

	qry->groupClause = transformGroupClause(pstate,
											stmt->groupClause,
											&qry->groupingSets,
											&qry->targetList,
											qry->sortClause,
											EXPR_KIND_GROUP_BY,
											false /* allow SQL92 rules */ );

	if (stmt->distinctClause == NIL)
	{
		qry->distinctClause = NIL;
		qry->hasDistinctOn = false;
	}
	else if (linitial(stmt->distinctClause) == NULL)
	{
		/* We had SELECT DISTINCT */
		qry->distinctClause = transformDistinctClause(pstate,
													  &qry->targetList,
													  qry->sortClause,
													  false);
		qry->hasDistinctOn = false;
	}
	else
	{
		/* We had SELECT DISTINCT ON */
		qry->distinctClause = transformDistinctOnClause(pstate,
														stmt->distinctClause,
														&qry->targetList,
														qry->sortClause);
		qry->hasDistinctOn = true;
	}

	/* transform LIMIT */
	qry->limitOffset = transformLimitClause(pstate, stmt->limitOffset,
											EXPR_KIND_OFFSET, "OFFSET");
	qry->limitCount = transformLimitClause(pstate, stmt->limitCount,
										   EXPR_KIND_LIMIT, "LIMIT");

	/* transform window clauses after we have seen all window functions */
	qry->windowClause = transformWindowDefinitions(pstate,
												   pstate->p_windowdefs,
												   &qry->targetList);

	/* resolve any still-unresolved output columns as being type text */
	if (pstate->p_resolve_unknowns)
		resolveTargetListUnknowns(pstate, qry->targetList);

	qry->rtable = pstate->p_rtable;
	qry->jointree = makeFromExpr(pstate->p_joinlist, qual);

	qry->hasSubLinks = pstate->p_hasSubLinks;
	qry->hasWindowFuncs = pstate->p_hasWindowFuncs;
	qry->hasTargetSRFs = pstate->p_hasTargetSRFs;
	qry->hasAggs = pstate->p_hasAggs;

	foreach(l, stmt->lockingClause)
	{
		transformLockingClause(pstate, qry,
							   (LockingClause *) lfirst(l), false);
	}

	assign_query_collations(pstate, qry);

	/* this must be done after collations, for reliable comparison of exprs */
	if (pstate->p_hasAggs || qry->groupClause || qry->groupingSets || qry->havingQual)
		parseCheckAggregates(pstate, qry);

	return qry;
}

FROM processing:

transformFromClause

During FROM processing, traverse fromlist and transfer each "base table" to transformFromClauseItem for processing. The base table processed by transformFromClauseItem may directly process the base table or query table, for example: select * from aa,(select * from bb) as BB; Therefore, the processing is divided into the following types:

  • RangeVar calls transformTableEntry: a common type base table. The base table information is directly stored in pstate - > P_ In the rtable linked list, the subsequent results are displayed in the order of the linked list
  • RangeSubselect calls transformRangeSubselect: the base table of the subquery type. Because it is a complete select statement, it finally calls the transformStmt function for analysis; The parsed results are stored in pstate - > P_ In the rtable linked list, the rtekind field is set to RTE as a difference_ SUBQUERY.
  • RangeFunction calls transformRangeFunction: query the function and finally call addRangeTableEntryForFunction function to store the result in pstate - > P_ In the rtable linked list, the rtekind field is set to RTE as a difference_ FUNCTION.
  • RangeTableFunc call transformRangeTableFunc: call XMLTABLE related functions and store the results in pstate - > P_ In the rtable linked list, the rtekind field is set to RTE as a difference_ TABLEFUNC.
  • RangeTableSample calls transformFromClauseItem:
  • JoinExpr calls the transformFromClauseItem: join connection statement to parse the left and right nodes, obtain the base table information, create a new RTE, and store the result in pstate - > P_ Rtable linked list. As a difference, the rtekind field is set to RTE_JOIN.

After processing is complete, add all indications to pstate - > P_ In namespace, this value is used for subsequent parsing of column names in select * to query all column names; Or judge whether the column name of the query exists.

Relevant codes are as follows:

/*
 * transformFromClause -
 *	  Process the FROM clause and add items to the query's range table,
 *	  joinlist, and namespace.
 *
 * Note: we assume that the pstate's p_rtable, p_joinlist, and p_namespace
 * lists were initialized to NIL when the pstate was created.
 * We will add onto any entries already present --- this is needed for rule
 * processing, as well as for UPDATE and DELETE.
 */
void
transformFromClause(ParseState *pstate, List *frmList)
{
	ListCell   *fl;

	/*
	 * The grammar will have produced a list of RangeVars, RangeSubselects,
	 * RangeFunctions, and/or JoinExprs. Transform each one (possibly adding
	 * entries to the rtable), check for duplicate refnames, and then add it
	 * to the joinlist and namespace.
	 *
	 * Note we must process the items left-to-right for proper handling of
	 * LATERAL references.
	 */
	foreach(fl, frmList)
	{
		Node	   *n = lfirst(fl);
		RangeTblEntry *rte;
		int			rtindex;
		List	   *namespace;

		n = transformFromClauseItem(pstate, n,
									&rte,
									&rtindex,
									&namespace);

		checkNameSpaceConflicts(pstate, pstate->p_namespace, namespace);

		/* Mark the new namespace items as visible only to LATERAL */
		setNamespaceLateralState(namespace, true, true);

		pstate->p_joinlist = lappend(pstate->p_joinlist, n);
		pstate->p_namespace = list_concat(pstate->p_namespace, namespace);
	}

	/*
	 * We're done parsing the FROM list, so make all namespace items
	 * unconditionally visible.  Note that this will also reset lateral_only
	 * for any namespace items that were already present when we were called;
	 * but those should have been that way already.
	 */
	setNamespaceLateralState(pstate->p_namespace, false, true);
}
static Node *
transformFromClauseItem(ParseState *pstate, Node *n,
						RangeTblEntry **top_rte, int *top_rti,
						List **namespace)
{
	if (IsA(n, RangeVar))
	{
		...
	}
	else if (IsA(n, RangeSubselect))
	{
		...
	}
	else if (IsA(n, RangeFunction))
	{
		...
	}
	else if (IsA(n, RangeTableFunc))
	{
		...
	}
	else if (IsA(n, RangeTableSample))
	{
		...
	}
	else if (IsA(n, JoinExpr))
	{
		...
	}
}

Query target column to get:

transformTargetList

When querying all column names, you need to convert * to all column names, such as "SELECT * FROM A_TBL,B_TBL WHERE xx == xx". When parsing column names, use pstate - > P_ Verify whether the column name exists in the incoming table in the namespace. If it does not exist, an error will be reported.

When obtaining, it is divided into string type column name and period type column name.

  • All column names: stored directly in qry - > targetlist.
  • If * asterisk exists in the character, call ExpandColumnRefStar for processing: if * exists, it will be extended to all column names (SELECT *, dname FROM emp, dept). For * column names with table names, you need to verify that there are no more than 4 column names (SELECT emp.*, dname FROM emp, dept). Store the results in qry - > targetlist.
  • Period. Type: call ExpandIndirectionStar to process: parse the expression, verify whether the column name exists, and store it in qry - > targetlist.

The corresponding codes are as follows:

List *
transformTargetList(ParseState *pstate, List *targetlist,
					ParseExprKind exprKind)
{
	List	   *p_target = NIL;
	bool		expand_star;
	ListCell   *o_target;

	/* Shouldn't have any leftover multiassign items at start */
	Assert(pstate->p_multiassign_exprs == NIL);

	/* Expand "something.*" in SELECT and RETURNING, but not UPDATE */
	expand_star = (exprKind != EXPR_KIND_UPDATE_SOURCE);

	foreach(o_target, targetlist)
	{
		ResTarget  *res = (ResTarget *) lfirst(o_target);

		/*
		 * Check for "something.*".  Depending on the complexity of the
		 * "something", the star could appear as the last field in ColumnRef,
		 * or as the last indirection item in A_Indirection.
		 */
		if (expand_star)
		{
			if (IsA(res->val, ColumnRef))
			{
				ColumnRef  *cref = (ColumnRef *) res->val;

				if (IsA(llast(cref->fields), A_Star))
				{
					/* It is something.*, expand into multiple items */
					p_target = list_concat(p_target,
										   ExpandColumnRefStar(pstate,
															   cref,
															   true));
					continue;
				}
			}
			else if (IsA(res->val, A_Indirection))
			{
				A_Indirection *ind = (A_Indirection *) res->val;

				if (IsA(llast(ind->indirection), A_Star))
				{
					/* It is something.*, expand into multiple items */
					p_target = list_concat(p_target,
										   ExpandIndirectionStar(pstate,
																 ind,
																 true,
																 exprKind));
					continue;
				}
			}
		}

		/*
		 * Not "something.*", or we want to treat that as a plain whole-row
		 * variable, so transform as a single expression
		 */
		p_target = lappend(p_target,
						   transformTargetEntry(pstate,
												res->val,
												NULL,
												exprKind,
												res->name,
												false));
	}

	...
}

WHERE processing:

transformWhereClause

The where statement is processed in this function. There is no specific function to process this statement. The transformExpr function is still used for processing. When there is only one expression in where, the transformExpr function processes T_ColumnRef branch; When there are multiple expressions in where, the transformExpr function handles t_ The boolexpr branch is split into t in the transformBoolExpr function_ Columnref branch processing.

The final result of WHERE will be stored in jointree. Qry - > jointree = make from expr (pstate - > p_joinlist, qual);. Therefore, the jointree will be optimized during subsequent plan tree optimization.

The code is as follows:

Node *
transformWhereClause(ParseState *pstate, Node *clause,
					 ParseExprKind exprKind, const char *constructName)
{
	Node	   *qual;

	if (clause == NULL)
		return NULL;

	qual = transformExpr(pstate, clause, exprKind);

	qual = coerce_to_boolean(pstate, qual, constructName);

	return qual;
}

HAVING treatment:

transformWhereClause

Follow the where statement

/* initial processing of HAVING clause is much like WHERE clause */
	qry->havingQual = transformWhereClause(pstate, stmt->havingClause,
										   EXPR_KIND_HAVING, "HAVING");

GROUP BY processing:

transformGroupClause

When the group by statement is processed, it needs to be processed together with the order by statement. Order by sorting is required before group by grouping.

ORDER BY processing:

DISTINCT processing:

The above two are not introduced

Optimize rewrite pg_rewrite_query

According to pg_rewrite the rules defined in rewrite.

Topics: Database PostgreSQL SQL