Detailed explanation of kernel and jdbc Driver implementation of postgresql/lightdb prepare, bind, execute and fetch cursor

Posted by forum on Fri, 28 Jan 2022 13:19:21 +0100

preparedstatement separates PREPARE, BIND and EXECUTE. Its advantage is to avoid repeated syntax analysis, semantic analysis and rewriting. For complex SQL, its effect is more obvious.

In the extended-query protocol, execution of SQL commands is divided into multiple steps. The state retained between steps is represented by two types of objects: prepared statements and portals. A prepared statement represents the result of parsing and semantic analysis of a textual query string. A prepared statement is not in itself ready to execute, because it might lack specific values for parameters. A portal represents a ready-to-execute or already-partially-executed statement, with any missing parameter values filled in. (For SELECT statements, a portal is equivalent to an open cursor, but we choose to use a different term since cursors don't handle non-SELECT statements.)

The overall execution cycle consists of a parse step, which creates a prepared statement from a textual query string; a bind step, which creates a portal given a prepared statement and values for any needed parameters; and an execute step that runs a portal's query. In the case of a query that returns rows (SELECT, SHOW, etc), the execute step can be told to fetch only a limited number of rows, so that multiple execute steps might be needed to complete the operation.

The backend can keep track of multiple prepared statements and portals (but note that these exist only within a session, and are never shared across sessions). Existing prepared statements and portals are referenced by names assigned when they were created. In addition, an "unnamed" prepared statement and portal exist. Although these behave largely the same as named objects, operations on them are optimized for the case of executing a query only once and then discarding it, whereas operations on named objects are optimized on the expectation of multiple uses.

Unlike SIMPLE QUERY, the entry of P B E is different:

        switch (firstchar)
        {
            case 'Q':            /* simple query */
                {
                    const char *query_string;

                    /* Set statement_timestamp() */
                    SetCurrentStatementStartTimestamp();

                    query_string = pq_getmsgstring(&input_message);
                    pq_getmsgend(&input_message);

                    if (am_walsender)
                    {
                        if (!exec_replication_command(query_string))
                            exec_simple_query(query_string);
                    }
                    else
                        exec_simple_query(query_string);

                    send_ready_for_query = true;
                }
                break;

            case 'P':            /* parse */
                {
                    const char *stmt_name;
                    const char *query_string;
                    int            numParams;
                    Oid           *paramTypes = NULL;

                    forbidden_in_wal_sender(firstchar);

                    /* Set statement_timestamp() */
                    SetCurrentStatementStartTimestamp();

                    stmt_name = pq_getmsgstring(&input_message);
                    query_string = pq_getmsgstring(&input_message);
                    numParams = pq_getmsgint(&input_message, 2);
                    if (numParams > 0)
                    {
                        paramTypes = (Oid *) palloc(numParams * sizeof(Oid));
                        for (int i = 0; i < numParams; i++)
                            paramTypes[i] = pq_getmsgint(&input_message, 4);
                    }
                    pq_getmsgend(&input_message);

                    exec_parse_message(query_string, stmt_name,
                                       paramTypes, numParams);
                }
                break;

            case 'B':            /* bind */
                forbidden_in_wal_sender(firstchar);

                /* Set statement_timestamp() */
                SetCurrentStatementStartTimestamp();

                /*
                 * this message is complex enough that it seems best to put
                 * the field extraction out-of-line
                 */
                exec_bind_message(&input_message);
                break;

            case 'E':            /* execute */
                {
                    const char *portal_name;
                    int            max_rows;

                    forbidden_in_wal_sender(firstchar);

                    /* Set statement_timestamp() */
                    SetCurrentStatementStartTimestamp();

                    portal_name = pq_getmsgstring(&input_message);
                    max_rows = pq_getmsgint(&input_message, 4);
                    pq_getmsgend(&input_message);

                    exec_execute_message(portal_name, max_rows);
                }
                break;

prepare processing flow:

StorePreparedStatement prepare.c:432
PrepareQuery prepare.c:173
standard_ProcessUtility utility.c:737
pgss_ProcessUtility pg_stat_statements.c:1201
pgaudit_ProcessUtility_hook pgaudit.c:1412
ProcessUtility utility.c:521
PortalRunUtility pquery.c:1157
PortalRunMulti pquery.c:1303
PortalRun pquery.c:779
exec_simple_query postgres.c:1326
PostgresMain postgres.c:4445
BackendRun postmaster.c:4883
BackendStartup postmaster.c:4567
ServerLoop postmaster.c:1854
PostmasterMain postmaster.c:1487
main main.c:231
__libc_start_main 0x00007f32f566f555
_start 0x0000000000484799

When PREPARE, the statement will be parsed, analyzed, and rewritten. BIND is plan ned and EXECUTE is executed.

Finally, save it in per backend prepared by calling StorePreparedStatement_ Queries hash (in fact, if it is placed in a global variable, it can be independent of backend).

FetchPreparedStatement prepare.c:477
UtilityReturnsTuples utility.c:2020
ChoosePortalStrategy pquery.c:258
PortalStart pquery.c:464
exec_simple_query postgres.c:1287
PostgresMain postgres.c:4445
BackendRun postmaster.c:4883
BackendStartup postmaster.c:4567
ServerLoop postmaster.c:1854
PostmasterMain postmaster.c:1487
main main.c:231
__libc_start_main 0x00007f32f566f555
_start 0x0000000000484799

GetCachedPlan plancache.c:1157
ExecuteQuery prepare.c:233
standard_ProcessUtility utility.c:742
pgss_ProcessUtility pg_stat_statements.c:1201
pgaudit_ProcessUtility_hook pgaudit.c:1412
ProcessUtility utility.c:521
PortalRunUtility pquery.c:1157
PortalRunMulti pquery.c:1303
PortalRun pquery.c:779
exec_simple_query postgres.c:1326
PostgresMain postgres.c:4445
BackendRun postmaster.c:4883
BackendStartup postmaster.c:4567
ServerLoop postmaster.c:1854
PostmasterMain postmaster.c:1487
main main.c:231
__libc_start_main 0x00007f32f566f555
_start 0x0000000000484799

BIND:

The generation of execution plan occurs in the BIND phase, because the generation of execution plan by CBO depends on parameters. Selecting custom or generic execution plan is also in this step. Obtain the CachedPlanSource through FetchPreparedStatement (not planned, so it is called plansource. For those that have been planned, it is called generic plan, which is determined by choose_custom_plan and parameter plan_cache_mode), as follows:

/*
 * Lookup an existing query in the hash table. If the query does not
 * actually exist, throw ereport(ERROR) or return NULL per second parameter.
 *
 * Note: this does not force the referenced plancache entry to be valid,
 * since not all callers care.
 */
PreparedStatement *
FetchPreparedStatement(const char *stmt_name, bool throwError)
{
    PreparedStatement *entry;

    /*
     * If the hash table hasn't been initialized, it can't be storing
     * anything, therefore it couldn't possibly store our plan.
     */
    if (prepared_queries)
        entry = (PreparedStatement *) hash_search(prepared_queries,
                                                  stmt_name,
                                                  HASH_FIND,
                                                  NULL);
    else
        entry = NULL;

    if (!entry && throwError)
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_PSTATEMENT),
                 errmsg("prepared statement \"%s\" does not exist",
                        stmt_name)));

    return entry;
}

Decide whether to generate a new plan or reuse generic_ The logic of the plan is as follows:

CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
              bool useResOwner, QueryEnvironment *queryEnv)
{
......
    /* Make sure the querytree list is valid and we have parse-time locks */
    qlist = RevalidateCachedQuery(plansource, queryEnv);

    /* Decide whether to use a custom plan */
    customplan = choose_custom_plan(plansource, boundParams);

    if (!customplan)
    {
        if (CheckCachedPlan(plansource))
        {
            /* We want a generic plan, and we already have a valid one */
            plan = plansource->gplan;
            Assert(plan->magic == CACHEDPLAN_MAGIC);
        }
        else
        {
            /* Build a new generic plan */
            plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
            /* Just make real sure plansource->gplan is clear */
            ReleaseGenericPlan(plansource);
            /* Link the new generic plan into the plansource */
            plansource->gplan = plan;
            plan->refcount++;
            /* Immediately reparent into appropriate context */
            if (plansource->is_saved)
            {
                /* saved plans all live under CacheMemoryContext */
                MemoryContextSetParent(plan->context, CacheMemoryContext);
                plan->is_saved = true;
            }
            else
            {
                /* otherwise, it should be a sibling of the plansource */
                MemoryContextSetParent(plan->context,
                                       MemoryContextGetParent(plansource->context));
            }
            /* Update generic_cost whenever we make a new generic plan */
            plansource->generic_cost = cached_plan_cost(plan, false);

            /*
             * If, based on the now-known value of generic_cost, we'd not have
             * chosen to use a generic plan, then forget it and make a custom
             * plan.  This is a bit of a wart but is necessary to avoid a
             * glitch in behavior when the custom plans are consistently big
             * winners; at some point we'll experiment with a generic plan and
             * find it's a loser, but we don't want to actually execute that
             * plan.
             */
            customplan = choose_custom_plan(plansource, boundParams);

            /*
             * If we choose to plan again, we need to re-copy the query_list,
             * since the planner probably scribbled on it.  We can force
             * BuildCachedPlan to do that by passing NIL.
             */
            qlist = NIL;
        }
    }

    if (customplan)
    {
        /* Build a custom plan */
        plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
        /* Accumulate total costs of custom plans, but 'ware overflow */
        if (plansource->num_custom_plans < INT_MAX)
        {
            plansource->total_custom_cost += cached_plan_cost(plan, true);
            plansource->num_custom_plans++;
        }
    }
......

It should be noted that, BIND It distinguishes between statements and portals (although portal is the runtime representation and portal is the prerequisite for execute). The difference between named portal and unnamed portal is that named portal will last until the end of the transaction or be displayed for destruction, unnamed portal will be automatically destroyed in the next BIND execution, and named portal must be displayed for destruction before the next BIND.

https://jdbc.postgresql.org/documentation/head/server-prepare.html

Note: the above figure also shows that the PG server supports receiving multiple continuous request commands at one time, such as PARSE and BIND above.

The second request does not contain PARSE, as follows:

This is processed on the client side.

　　https://www.postgresql.org/docs/current/sql-prepare.html

　　https://www.postgresql.org/docs/current/view-pg-prepared-statements.html

Show the executed PREPARE in PG_ prepared_ Precompiled statements are displayed in real time in statements, and protocol level PREPARE is not reflected here. Because the postcache of postgresql is per backend, you have to query PG in java to verify_ prepared_ statements.

zjh@postgres=# PREPARE fooplan (int, text, bool, numeric) AS
zjh@postgres-#     INSERT INTO foo VALUES($1, $2, $3, $4);
PREPARE
zjh@postgres=# select * from pg_prepared_statements ;
  name   |                   statement                   |         prepare_time         |        parameter_types         | from_sql 
---------+-----------------------------------------------+------------------------------+--------------------------------+----------
 fooplan | PREPARE fooplan (int, text, bool, numeric) AS+| 2022-01-26 10:04:10.81974+00 | {integer,text,boolean,numeric} | t
         |     INSERT INTO foo VALUES($1, $2, $3, $4);   |                              |                                | 
(1 row)

According to https://www.postgresql.org/message-id/CAL454F2yiTPqnTAVw78teOCnHvYxMSjzSekH8wjOPxVNTLFejw%40mail.gmail.com JDBC only needs to set setPrepareThreshold(1). javadoc does say so, as follows:

  /**
   * Turn on the use of prepared statements in the server (server side prepared statements are
   * unrelated to jdbc PreparedStatements) As of build 302, this method is equivalent to
   * <code>setPrepareThreshold(1)</code>.
   *
   * @param flag use server prepare
   * @throws SQLException if something goes wrong
   * @since 7.3
   * @deprecated As of build 302, replaced by {@link #setPrepareThreshold(int)}
   */
  @Deprecated
  void setUseServerPrepare(boolean flag) throws SQLException;

The code determines whether oneshotquery is through org postgresql. jdbc. In PgStatement#executeInternal, isOneShotQuery(cachedQuery) is invoked, and it is judged by mPrepareThreshold==0. pgjdbc The client has several chicken ribs.

Programmer Think

Detailed explanation of kernel and jdbc Driver implementation of postgresql/lightdb prepare, bind, execute and fetch cursor

Hot Topics