1. Opening remarks
Environment:
Processor architecture: arm64
Kernel source code: linux-5.10.50
Ubuntu version: 20.04.1
Code reading tool: vim+ctags+cscope
Because the Linux kernel has a page cache, modified file data is not synchronized to disk immediately; it is first held in the page cache in memory. A page whose contents differ from the data on disk is called a dirty page, and dirty pages are synchronized to disk at a suitable later time. For the kernel to write a page cache page back, the page must first be marked dirty.
Dirty page tracking refers to how the kernel records, at the appropriate time, that a file page is dirty, so that it knows which pages to write to disk during writeback. Anonymous pages do not need dirty tracking because they are never synchronized to a file on disk. Private file pages do not need it either: a writable private mapping is installed read-only, and a write access triggers copy-on-write, which turns the page into an anonymous page. Therefore only shared file pages need dirty tracking. Tracking happens at two levels: in the page table entry and in the page descriptor.
There are two ways to write to a file page: map the file with mmap, or operate on the file through the file system's write interface. This article covers both. Because dirty page tracking involves file writeback, page faults, reverse mapping and other mechanisms, the article concentrates on how the Linux kernel combines them to track dirty pages.
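From user space, the two access paths look roughly like the sketch below. It is only an illustration under stated assumptions: the function names (`dirty_via_mmap`, `dirty_via_write`, `demo_two_write_paths`), the scratch file and the 4096-byte length are made up for this example, and `msync`/`fsync` are used simply to request writeback of the pages dirtied by each path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_LEN 4096 /* assumed mapping length for the demo */

/* Path 1: dirty a shared file page by storing through an mmap mapping. */
static int dirty_via_mmap(int fd, const char *msg)
{
    char *p = mmap(NULL, DEMO_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, msg, strlen(msg));            /* plain store into the mapped page */
    int ret = msync(p, DEMO_LEN, MS_SYNC);  /* request writeback of the range */
    munmap(p, DEMO_LEN);
    return ret;
}

/* Path 2: dirty a file page through the write interface. */
static int dirty_via_write(int fd, off_t off, const char *msg)
{
    size_t len = strlen(msg);
    if (pwrite(fd, msg, len, off) != (ssize_t)len)
        return -1;
    return fsync(fd);                       /* request writeback of the file */
}

/* Runs both paths on a scratch file, then reads outlen-1 bytes back. */
int demo_two_write_paths(const char *path, char *out, size_t outlen)
{
    assert(outlen > 1);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    int ret = -1;
    if (ftruncate(fd, DEMO_LEN) == 0 &&
        dirty_via_mmap(fd, "mmap") == 0 &&
        dirty_via_write(fd, 4, "write") == 0 &&
        pread(fd, out, outlen - 1, 0) == (ssize_t)(outlen - 1)) {
        out[outlen - 1] = '\0';
        ret = 0;
    }
    close(fd);
    unlink(path);
    return ret;
}
```

Calling `demo_two_write_paths` with a writable scratch path and a 10-byte buffer exercises both paths on the same file page; the rest of the article describes what the kernel does behind each of them.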
2. File pages mapped with mmap
The basic process is as follows:
1) Map the file as a shared mapping with mmap.
2) On the first access to a file page, a page fault occurs and the file page is read into the page cache. If it is a write access, the page table entry of the faulting process is set dirty and writable.
3) When the dirty page is written back, the kernel uses the reverse mapping mechanism to find every VMA that maps the page, makes the corresponding page table entries read-only and clears their dirty bits.
4) On a later write access to the file page, there are two cases:
- The file page in the page cache has not yet been written back to disk (before step 3). The file page is still dirty, and because the page table entry of the process is dirty and writable, the process can write to the page directly.
- The file page in the page cache has already been written back to disk (after step 3). The file page is no longer dirty, and because the page table entry is read-only, a write access triggers a write-protect page fault. The fault handler processes the shared file mapping and sets the page table entry of the process dirty and writable again.
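The four steps above can be condensed into a small state machine. The following is a toy user-space model, not kernel code: `struct toy_mapping`, `toy_store` and `toy_writeback` are hypothetical names, and one mapping of one page stands in for the real page table and page descriptor.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (assumption): one shared file page mapped by one process. */
struct toy_mapping {
    bool pte_present; /* page table entry has been filled in */
    bool pte_write;   /* page table entry is writable */
    bool pte_dirty;   /* dirty bit in the page table entry */
    bool page_dirty;  /* dirty flag in the page descriptor */
    int  faults;      /* number of page faults taken so far */
};

/* A store to the mapped page. It faults when the entry is missing
 * (step 2) or read-only (step 4, second case). */
void toy_store(struct toy_mapping *m)
{
    if (!m->pte_present || !m->pte_write) {
        m->faults++;
        m->page_dirty = true;  /* page_mkwrite path marks the descriptor */
        m->pte_present = true;
        m->pte_write = true;   /* entry becomes dirty and writable */
        m->pte_dirty = true;
    }
    /* Otherwise (step 4, first case) the store needs no fault at all. */
    assert(m->pte_dirty && m->page_dirty);
}

/* Step 3: writeback write-protects and cleans the entry, then clears
 * the dirty flag in the page descriptor. */
void toy_writeback(struct toy_mapping *m)
{
    if (!m->page_dirty)
        return;                /* nothing to write back */
    m->pte_write = false;
    m->pte_dirty = false;
    m->page_dirty = false;
}
```

Starting from a zeroed `struct toy_mapping`, the sequence store, store, writeback, store takes two faults in this model: one for the first access and one for the first write after writeback.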
The analysis is as follows:
2.1 First access to a file page
For a file page mapped with mmap, the first access happens before the page table entry has been filled in, so it triggers a translation fault.
//mm/memory.c
handle_pte_fault
->do_fault
    ->do_shared_fault
        ->__do_fault    //Read the file page into the page cache
        ->do_page_mkwrite
            ->vmf->vma->vm_ops->page_mkwrite()
                ->filemap_page_mkwrite    //For ext2
                    ->set_page_dirty(page)
                        ->__set_page_dirty_buffers
                            ->TestSetPageDirty(page)    //Set the dirty flag in the page descriptor
                            ->__set_page_dirty    //Mark the page dirty in the page cache
        ->finish_fault    //Set the page table entry
            ->alloc_set_pte
                ->if (write) entry = maybe_mkwrite(pte_mkdirty(entry), vma)    //Make the page table entry dirty and writable
2.2 Writing back the dirty page
//mm/page-writeback.c
write_cache_pages
->clear_page_dirty_for_io(page)    //For each page being written back
    ->page_mkclean(page)    //Clean the mappings of the page, mm/rmap.c
        ->page_mkclean_one    //Reverse mapping visits each VMA that maps the page and write-protects and cleans its entry
            ->entry = pte_wrprotect(entry);    //Write-protect: make the entry read-only
              entry = pte_mkclean(entry);    //Clear the dirty bit
              set_pte_at(vma->vm_mm, address, pte, entry)    //Install the updated page table entry
    ->TestClearPageDirty(page)    //Clear the dirty flag in the page descriptor
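To make the role of reverse mapping concrete, here is a toy user-space model of the writeback step for a page mapped by several processes. It is a sketch under assumptions: the `toy_` types and functions are hypothetical, and a fixed array of pointers stands in for the kernel's rmap walk. The shape follows the call chain above: clean every mapping first, then test-and-clear the dirty flag in the page descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_MAPPERS 4 /* assumed upper bound for the demo */

struct toy_pte { bool write; bool dirty; };

struct toy_page {
    bool dirty;                               /* page descriptor dirty flag */
    struct toy_pte *mappers[TOY_MAX_MAPPERS]; /* stand-in for reverse mapping */
    size_t nr_mappers;
};

void toy_add_mapper(struct toy_page *page, struct toy_pte *pte)
{
    assert(page->nr_mappers < TOY_MAX_MAPPERS);
    page->mappers[page->nr_mappers++] = pte;
}

/* Analogue of page_mkclean: write-protect and clean every entry that maps
 * the page; returns how many entries were changed. */
int toy_page_mkclean(struct toy_page *page)
{
    int cleaned = 0;
    for (size_t i = 0; i < page->nr_mappers; i++) {
        struct toy_pte *pte = page->mappers[i];
        if (!pte->write && !pte->dirty)
            continue;           /* already read-only and clean */
        pte->write = false;     /* pte_wrprotect */
        pte->dirty = false;     /* pte_mkclean */
        cleaned++;
    }
    return cleaned;
}

/* Analogue of clear_page_dirty_for_io: returns true when the page was
 * dirty and therefore has to be written to disk. */
bool toy_clear_page_dirty_for_io(struct toy_page *page)
{
    if (toy_page_mkclean(page) > 0)
        page->dirty = true;     /* a mapping was dirty or writable */
    bool was_dirty = page->dirty;
    page->dirty = false;        /* test-and-clear the descriptor flag */
    return was_dirty;
}
```

After one call, every mapping is read-only and clean, so the next store from any of the mapping processes must fault before it can dirty the page again.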
2.3 Second access to the file page
1) If the dirty page has not been written back yet (specifically, before clear_page_dirty_for_io is called), the dirty flag in the page descriptor is still set and the page table entry is still dirty and writable.
In this case the process can write to the file page directly, without taking a page fault.
2) If the dirty page has been written back (specifically, after clear_page_dirty_for_io has been called), the dirty flag in the page descriptor has been cleared, and the page table entry has been cleaned and made read-only.
In this case a write access to the file page triggers a write-protect page fault (a permission fault).
The call chain is as follows:
//mm/memory.c
handle_pte_fault
->if (vmf->flags & FAULT_FLAG_WRITE) {    //Write access
      if (!pte_write(entry))    //Page table entry is not writable
          return do_wp_page(vmf)    //Write-protect fault handling
  do_wp_page
  ->} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED))) {    //VMA is a shared writable file mapping
        return wp_page_shared(vmf);
        ->do_page_mkwrite
            ->vmf->vma->vm_ops->page_mkwrite()
                ->filemap_page_mkwrite    //For ext2
                    ->set_page_dirty(page)
                        ->__set_page_dirty_buffers    //Mark the page dirty in the page cache
                            ->TestSetPageDirty(page)    //Set the dirty flag in the page descriptor
        ->finish_mkwrite_fault
            ->wp_page_reuse
                ->entry = maybe_mkwrite(pte_mkdirty(entry), vma)    //Make the page table entry dirty and writable again
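The two call chains in sections 2.1 and 2.3 differ only in which branch the fault handler takes. The sketch below is a simplified user-space classifier, not the kernel's logic in full: the enum, the `TOY_VM_*` flag values and `toy_classify_fault` are hypothetical, and the model assumes a file-backed VMA and ignores everything else the real handler checks.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_VM_WRITE  0x1u /* hypothetical stand-in for VM_WRITE */
#define TOY_VM_SHARED 0x2u /* hypothetical stand-in for VM_SHARED */

enum toy_fault_path {
    TOY_READ_FAULT,    /* entry missing, read access */
    TOY_COW_FAULT,     /* entry missing, write access, private mapping */
    TOY_SHARED_FAULT,  /* entry missing, write access, shared mapping (2.1) */
    TOY_WP_SHARED,     /* entry read-only, shared writable mapping (2.3) */
    TOY_WP_COPY,       /* entry read-only, private mapping: copy on write */
    TOY_UPDATE_FLAGS   /* entry already allows the access */
};

enum toy_fault_path toy_classify_fault(bool pte_present, bool pte_writable,
                                       bool write_access, unsigned vm_flags)
{
    /* The model assumes write faults only reach it on writable VMAs. */
    assert(!write_access || (vm_flags & TOY_VM_WRITE));

    if (!pte_present) {
        if (!write_access)
            return TOY_READ_FAULT;
        return (vm_flags & TOY_VM_SHARED) ? TOY_SHARED_FAULT : TOY_COW_FAULT;
    }
    if (write_access && !pte_writable) {
        unsigned both = TOY_VM_WRITE | TOY_VM_SHARED;
        return ((vm_flags & both) == both) ? TOY_WP_SHARED : TOY_WP_COPY;
    }
    return TOY_UPDATE_FLAGS;
}
```

In this model the first write to a shared mapping is classified as `TOY_SHARED_FAULT`, and the first write after writeback as `TOY_WP_SHARED`; both end by making the page table entry dirty and writable.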
2.4 Subsequent write accesses
The steps above repeat.
3. File pages accessed through the write interface
When a file page is accessed through the write interface, it is read into the page cache but is not mapped into any process address space. Dirty pages are therefore tracked solely by setting and clearing the dirty flag in the page descriptor.
3.1 First access to a file page
The file page is first read into the page cache, and the data in the user-space write buffer is then copied into it. The call chain is as follows:
ext2_file_write_iter    //fs/ext2/file.c
->generic_file_write_iter    //mm/filemap.c
    ->__generic_file_write_iter
        ->generic_perform_write
            ->a_ops->write_begin()    //Prepare (allocate) the page cache page before writing
            ->iov_iter_copy_from_user_atomic    //Copy the user-space write buffer into the page cache
            ->a_ops->write_end()    //Post-write processing
                ->block_write_end
                    ->__block_commit_write
                        ->mark_buffer_dirty
                            if (!TestSetPageDirty(page)) {    //Set the dirty flag in the page descriptor
                            ->__set_page_dirty    //Mark the page dirty (the page descriptor dirty flag is now set)
3.2 Writing back the dirty page
write_cache_pages    //mm/page-writeback.c
->clear_page_dirty_for_io
    ->TestClearPageDirty(page)    //Clear the dirty flag in the page descriptor
3.3 Second access to the file page
Before the dirty page is written back, the dirty flag in the page descriptor is still set and the page is waiting for writeback, so the flag does not need to be set again.
After the dirty page has been written back, the dirty flag in the page descriptor has been cleared, and the file write call chain sets it again.
4. Summary
1) A shared file page mapped with mmap may be mapped by several processes into several VMAs, so its dirty state is tracked through the dirty bit of the page table entries. On the first write access a page fault occurs, the file page is read into the page cache, and the dirty bit of the process's page table entry is set. Before writeback (before clear_page_dirty_for_io completes) the dirty bit of the page table entry stays set. During writeback (inside the call to clear_page_dirty_for_io) the reverse mapping mechanism clears the dirty bit of every page table entry that maps the page and makes those entries read-only. After writeback (after clear_page_dirty_for_io completes) the next write access triggers a write-protect page fault, which sets the dirty bit of the page table entry again. This cycle repeats, and that is how the dirty page is tracked.
2) A file page accessed directly through the write interface is only read into the page cache and is not mapped into any process address space; the process writes to it by copying from user space, so its dirty state is recorded in the page descriptor. Before writeback (before clear_page_dirty_for_io completes), writing the file sets the dirty flag in the page descriptor through the file system's write call chain. During writeback (inside the call to clear_page_dirty_for_io) the dirty flag in the page descriptor is cleared. After writeback (after clear_page_dirty_for_io completes), the next write through the write interface sets the dirty flag in the page descriptor again through the same call chain. This cycle repeats, and that is how the dirty page is tracked.