Postgresql's pageinspect plug-in framework and data recovery exploration

Posted by lrsdsout on Tue, 01 Mar 2022 16:50:55 +0100

1. Plug in code analysis

pageinspect is a very typical and simple plug-in. Most processes can be learned and reused.

If you need to write a new plug-in, you can directly copy and modify the following two functions, and most of the framework code can be used directly.

get_raw_page the classic process of reading a page

get_raw_page only returns one line of data, so there is no need to go through the process of generating tuples in the plug-in loop.

Classic process of reading pages:

1. schema+table name wrapper: makeRangeVarFromNameList
2. Open table: relation_openrv
3. Read page (return to the page where the pin lives): ReadBufferExtended
4. Page locking: LockBuffer
5. Read data: memcpy
6. Page unlock: LockBuffer
7. unpin: ReleaseBuffer
8. Close table: relation_close

get_raw_page
  get_raw_page_internal(relname, MAIN_FORKNUM, blkno)
    (1) relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname))
      textToQualifiedNameList: Add before table name public. Or direct use schema.tablename
      makeRangeVarFromNameList: {type = T_RangeVar, catalogname = 0x0, schemaname = 0x2ae57c8 "public", relname = 0x2ae58f8 "t81", inh = 1 '\001', relpersistence = 112 'p', alias = 0x0, location = -1}
    (2) rel = relation_openrv(relrv, AccessShareLock)
      relation_openrv
        RangeVarGetRelid: RangeVar turn OID
          RangeVarGetRelidExtended
            LookupExplicitNamespace: Found public of id2200
            get_relname_relid(relation->relname, namespaceId)
              // Typical read flow of system table cache
              GetSysCacheOid2(RELNAMENSP, PointerGetDatum(relname), ObjectIdGetDatum(relnamespace))
                GetSysCacheOid (cacheId=49, key1=44980472, key2=2200, key3=0, key4=0)
                  SearchSysCache (cacheId=49, key1=44980472, key2=2200, key3=0, key4=0)
                    SearchCatCache (cache=0x2a97f80, v1=44980472, v2=2200, v3=0, v4=0)
                  HeapTupleGetOid(tuple)
    (3) buf = ReadBufferExtended(rel, forknum, blkno, RBM_NORMAL, NULL)
    (4) LockBuffer(buf, BUFFER_LOCK_SHARE)
    (5) memcpy(raw_page_data, BufferGetPage(buf), BLCKSZ)
    (6) LockBuffer(buf, BUFFER_LOCK_UNLOCK)
    (7) ReleaseBuffer(buf)
    (8) relation_close(rel, AccessShareLock)

heap_page_items

If the plug-in needs to return multiple rows of data, here is a typical PG plug-in framework:

Datum
heap_page_items(PG_FUNCTION_ARGS)
{
// Once again, put the initialization here
if (SRF_IS_FIRSTCALL())
{
    ...
    fctx = SRF_FIRSTCALL_INIT();
		mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
		...
		fctx->max_calls = PageGetMaxOffsetNumber(inter_call_data->page);
		fctx->user_fctx = Custom data to be used later;
		MemoryContextSwitchTo(mctx);
}

fctx = SRF_PERCALL_SETUP();
Custom data to be used later = fctx->user_fctx;

// The outer layer calls max_ Call times, spell one line at a time, and return the data to SRF_RETURN_NEXT
if (fctx->call_cntr < fctx->max_calls)
{
  Assembly data values
  
  resultTuple = heap_form_tuple(tupdesc, values, nulls);
  result = HeapTupleGetDatum(resultTuple);
  SRF_RETURN_NEXT(fctx, result);
}
else
  SRF_RETURN_DONE(fctx);
}

The data assembly is relatively simple. It basically uses macros to get data in the page.

2 recovery data exploration

When a customer just deleted a piece of data, he asked if there was a quick way to find a field of the deleted data?
Here we try to use pageinspect to do recovery experiments.

Conclusion first:

  1. The data field using pageinspect can be reversed to the original data, but only if the data field is still there.
  2. Even if there is no vacuum, the next reading and writing of the page where the deleted data is located may purne the page, resulting in that the deleted data can no longer be found by pageinspect.
  3. Using pageinspect to recover data is not feasible.

experimental data

create table t21(id int, d1 int, d2 varchar(8), d3 text);
alter table t21 set ( autovacuum_enabled = false, toast.autovacuum_enabled = false);
insert into t21 select a, a+100, 'aA012345',md5(random()::text) from generate_series(1,10000) a;
select * from t21;
  id   |  d1   |    d2    |                d3                
-------+-------+----------+----------------------------------
     1 |   101 | aA012345 | e10022e0fa6ecece950a1ab20caac824
     2 |   102 | aA012345 | 7c963b472d452c4dde9d36760b41a8de
     3 |   103 | aA012345 | 8142f1299d5cf7e384e7b3c43076710d
     4 |   104 | aA012345 | c3c76a8c0cf98e6004051b3bfe415310
     5 |   105 | aA012345 | 163214ae0c27f387d0207e89210e5a57
     6 |   106 | aA012345 | 302cb39b8ebc649d045f2a1123d379bb
     7 |   107 | aA012345 | be22d7234058e2e9247911c011eef47f
     8 |   108 | aA012345 | 7fe88087d8d6fea4470eece997f49360
     9 |   109 | aA012345 | 279eb01c225b71b083b5df2d9bd87d7b
    10 |   110 | aA012345 | 0463ef122689cb78f795c6d4309f3565
    11 |   111 | aA012345 | 1587dd3d027b6fda21d97085369434d7

Experimental process

Delete 5 page data

delete from t21 where ctid='(5,3)';
DELETE 1

Observe the current page status, and the data is still there

postgres=# SELECT lp, t_xmin, t_xmax, t_infomask,substring(t_data from 0 for 8) from heap_page_items(get_raw_page('t21', 5));
 lp |   t_xmin   |   t_xmax   | t_infomask |    substring     
----+------------+------------+------------+------------------
  1 | 1477542367 |          0 |       2306 | \xe60100004a0200
  2 | 1477542367 |          0 |       2306 | \xe70100004b0200
  3 | 1477542367 | 1477542374 |        258 | \xe80100004c0200
  4 | 1477542367 |          0 |       2306 | \xe90100004d0200
  5 | 1477542367 |          0 |       2306 | \xea0100004e0200
  6 | 1477542367 |          0 |       2306 | \xeb0100004f0200

When parsing the third item, you can reverse parse the original data according to the data type.

postgres=# SELECT t_attrs FROM heap_page_item_attrs(get_raw_page('t21', 5), 't21'::regclass);
                                                            t_attrs                                                            
-------------------------------------------------------------------------------------------------------------------------------
 {"\\xe6010000","\\x4a020000","\\x136141303132333435","\\x433333623539396461646233336235336137356138313065653162353035386262"}
 {"\\xe7010000","\\x4b020000","\\x136141303132333435","\\x436330383466376163646361386438356432326566643537633137623161396561"}
 {"\\xe8010000","\\x4c020000","\\x136141303132333435","\\x436566386436353365306437333439613639623161613835383236386531376430"}
 {"\\xe9010000","\\x4d020000","\\x136141303132333435","\\x433134356539636636336536393231653137353661616130303438633865363364"}
 {"\\xea010000","\\x4e020000","\\x136141303132333435","\\x433262663333653063663735643664326335303538323137306136306635303133"}

Next, make a query. On the fifth page of the query, it is found that pageinspect can no longer see the deleted data. If you can't see it, you can't parse it.

select * from t21;

SELECT lp, t_xmin, t_xmax, t_infomask,substring(t_data from 0 for 8) from heap_page_items(get_raw_page('t21', 5));
 lp |   t_xmin   | t_xmax | t_infomask |    substring     
----+------------+--------+------------+------------------
  1 | 1477542367 |      0 |       2306 | \xe60100004a0200
  2 | 1477542367 |      0 |       2306 | \xe70100004b0200
  3 |            |        |            | 
  4 | 1477542367 |      0 |       2306 | \xe90100004d0200
  5 | 1477542367 |      0 |       2306 | \xea0100004e0200
  6 | 1477542367 |      0 |       2306 | \xeb0100004f0200
  7 | 1477542367 |      0 |       2306 | \xec010000500200

What information is useful for DELETE XLOG?

If you only focus on the XLOG of DELETE, you can execute the following command:

pg_waldump 000000010000002F000000E7 -r heap | grep DELETE

rmgr: Heap        len (rec/tot):     59/  8231, tx: 1477542368, lsn: 2F/E7CE5D70, prev 2F/E7CE5D38, desc: DELETE off 1 KEYS_UPDATED , blkref #0: rel 1663/13212/143325 blk 1 FPW
rmgr: Heap        len (rec/tot):     59/  8231, tx: 1477542369, lsn: 2F/E7CE7E50, prev 2F/E7CE7E18, desc: DELETE off 1 KEYS_UPDATED , blkref #0: rel 1663/13212/143325 blk 4 FPW
rmgr: Heap        len (rec/tot):     59/  8151, tx: 1477542370, lsn: 2F/E7CE9FD8, prev 2F/E7CE9FA0, desc: DELETE off 2 KEYS_UPDATED , blkref #0: rel 1663/13212/143325 blk 1 FPW
rmgr: Heap        len (rec/tot):     54/    54, tx: 1477542371, lsn: 2F/E7CF0080, prev 2F/E7CEE0E0, desc: DELETE off 3 KEYS_UPDATED , blkref #0: rel 1663/13212/143325 blk 1
rmgr: Heap        len (rec/tot):     54/    54, tx: 1477542372, lsn: 2F/E7CF0158, prev 2F/E7CF0120, desc: DELETE off 5 KEYS_UPDATED , blkref #0: rel 1663/13212/143325 blk 1
rmgr: Heap        len (rec/tot):     59/  8151, tx: 1477542373, lsn: 2F/E7CF22F0, prev 2F/E7CF22B8, desc: DELETE off 2 KEYS_UPDATED , blkref #0: rel 1663/13212/143325 blk 4 FPW
rmgr: Heap        len (rec/tot):     59/  8231, tx: 1477542374, lsn: 2F/E7CF4428, prev 2F/E7CF43F0, desc: DELETE off 3 KEYS_UPDATED , blkref #0: rel 1663/13212/143325 blk 5 FPW

XLOG records the offset of tuple in the page and the page ID, which should play a role in confirming the deletion behavior and recovering data.

rmgr: Heap       

 len (rec/tot):     59/  8231, 

tx: 1477542374, 

lsn: 2F/E7CF4428, 

prev 2F/E7CF43F0, 

desc: DELETE 

off 3 KEYS_UPDATED , (Offset within page)

blkref #0: rel 1663/13212/143325 

blk 5 (page ID)

FPW

Topics: Database PostgreSQL pgsql