Introduction to PostgreSQL architecture

Posted by steekyjim on Thu, 09 Dec 2021 14:43:08 +0100

PostgreSQL is the most open source database like oracle. We can compare Oracle and learn its architecture, which is easier to understand. The main structure of PostgreSQL is as follows:

1. Storage structure

PG data storage structure is divided into logical storage structure and physical storage structure. Among them: logical storage structure is the internal way to organize and manage data; Physical storage structure is the way to organize and manage data in the operating system.

1.1 logical storage structure

A database cluster is a collection of database objects.

In relational database theory, database objects are data structures used to store or reference data.

Tables are an example, as well as indexes, sequences, views, functions, and so on.

In PostgreSQL, the database itself is also a database object and is logically separated from each other.

Other database objects (such as tables, indexes, etc.) except databases belong to their respective databases. Although they belong to the same database cluster, they cannot directly access another database object in the cluster from one database in the cluster.

The Database itself is also a Database object. A Database cluster can contain multiple databases, multiple users, and each Database and all objects in the Database have their owner: User.

  • When creating a Database, a default Schema named public will be created for the Database. Each Database can have multiple schemas.

  • When creating other database objects in this database, if no Schema is specified, they will be in the public Schema.

  • Schema can be understood as a namespace in a database. All objects created in the database are created in schema.

  • A user can access different schemas from the same client connection.

  • Different schemas can have multiple database objects with the same name, such as Table, Index, View, Sequence, Function, etc.

1.2 physical storage structure

The database files are saved in the data directory created at initdb by default.
There are many directories and files with different types and functions in the data directory.
In addition to data files, there are parameter files, control files, database operation logs and pre write logs.

1.2. 1 data directory

The data directory is used to store PostgreSQL persistent data. Generally, the data directory path can be configured as PGDATA environment variable.
PGDATA="/var/lib/postgresql/data"


The physical storage structure of PostgreSQL mainly refers to the files stored on the hard disk, including data files, log files, parameter files, control files and redo logs (WAL)

1.2. 2 data file (table file)

First, learn about two basic database objects: OID and tablespace.

OID

All database objects in PostgreSQL are internally managed by their own object identifiers (OID s).
They are unsigned 4-byte integers.
The relationship between database objects and individual OID s is stored in the appropriate system directory, depending on the type of object.
The OID of the database is stored in PG_ In the database system table, you can query through the following code.

select oid, datname from pg_database where datname = '';

The OIDs of tables, indexes, sequences and other objects in the database are stored in PG_ In the class system table, you can obtain the oid of these objects through the following code

select oid, relname, relkind from pg_class where relname ~ 'tb1';

Tablespace

In PostgresSql, the largest logical storage unit is table space.
The objects created in the database are saved in the table space. For example, tables, indexes and the whole database can be allocated to a specific table space.
When creating a database object, you can specify the table space of the database object. If not specified, the default table space is used, that is, the location of the file of the database object.

PG is automatically created when the database directory is initialized_ Default and pg_global tablespace.

1.2. 3 log file

logging_collector = on
log_destination = 'csvlog'
log_directory = '/home/postgres/pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_connections = on
log_disconnections = on
log_rotation_age = 1d
log_rotation_size = 20MB

# The postgres log records various operations after the user logs into the database. The postgres log is divided into three categories through the parameter pg_statement to control,
# Default PG_ The statement parameter value is none, i.e. no record. DDL (record create,drop and alter) can be set
# Mod (record ddl+insert,delete,update and truncate) and all(mod+select).

log_statement = 'all'                   # none, ddl, mod, all

The effect is as follows

[root@k8slys01 pgsmaster]# tail -f log/postgresql-2021-12-09_121245.csv
2021-12-09 12:20:06.077 UTC [47] LOG:  statement: select pg_switch_wal()
2021-12-09 12:20:06.158 UTC [47] LOG:  statement: select oid, datname from pg_database
2021-12-09 12:20:06.161 UTC [38] LOG:  statement: SELECT c.conkey FROM pg_constraint c WHERE c.contype = 'p' and c.conrelid = 1262
2021-12-09 12:20:06.167 UTC [47] LOG:  statement: SELECT * from pg_class
2021-12-09 12:20:06.182 UTC [38] LOG:  statement: SELECT c.conkey FROM pg_constraint c WHERE c.contype = 'p' and c.conrelid = 1259
2021-12-09 12:20:06.184 UTC [47] LOG:  statement: select pg_current_wal_lsn(),
        pg_walfile_name(pg_current_wal_lsn()),
        pg_walfile_name_offset(pg_current_wal_lsn())
2021-12-09 12:21:06.397 UTC [38] LOG:  statement: INSERT INTO "public"."tb_user"("name") VALUES ('33333') RETURNING *
2021-12-09 12:21:06.405 UTC [38] LOG:  statement: SELECT * FROM "public"."tb_user" WHERE "name" = '33333'

More complex later independent writing

Topics: Database PostgreSQL architecture postgres