Linux memory management: page swapping

Posted by Heywood on Mon, 18 Oct 2021 19:11:13 +0200

When an i386 CPU maps a linear address to a physical address, it may find that the mapping has been established but the P (present) flag in the corresponding page table or page directory entry is 0. In that case the physical page is not in memory, and the access cannot complete. Strictly speaking, this situation should be called a disconnection rather than a failure. The mapping does exist, which makes this case different from one where no mapping has been set up at all. The CPU's MMU hardware, however, does not distinguish between the two cases. Whenever the P flag is 0, it treats the access as a failed page mapping and raises a page fault. In fact, the first thing the CPU examines during translation is the P flag of the page directory or page table entry. As long as that flag is 0, the values of the remaining bit fields mean nothing to the hardware. Software, namely the page fault handler, must decide whether the page has merely been moved out of memory or was never mapped. We already saw this in the out-of-bounds access scenario, in the first few lines of handle_pte_fault():

do_page_fault=>handle_mm_fault=>handle_pte_fault

static inline int handle_pte_fault(struct mm_struct *mm,
	struct vm_area_struct * vma, unsigned long address,
	int write_access, pte_t * pte)
{
	pte_t entry;

	/*
	 * We need the page table lock to synchronize with kswapd
	 * and the SMP-safe atomic PTE updates.
	 */
	spin_lock(&mm->page_table_lock);
	entry = *pte;
	if (!pte_present(entry)) {
		/*
		 * If it truly wasn't present, we know that kswapd
		 * and the PTE updates will not touch it later. So
		 * drop the lock.
		 */
		spin_unlock(&mm->page_table_lock);
		if (pte_none(entry))
			return do_no_page(mm, vma, address, write_access, pte);
		return do_swap_page(mm, vma, address, pte, pte_to_swp_entry(entry), write_access);
	}
......
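
The division of labor described above can be sketched in user-space C. This is a simplified model, not kernel code. It assumes a single 32-bit entry with P in bit 0 and the frame address in bits 12 to 31 (4 KiB pages). The names mmu_translate(), classify_pte() and the ACT_* values are hypothetical. The sketch tests only bit 0, whereas the real i386 pte_present() also treats entries marked _PAGE_PROTNONE as present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified i386-style entry: bit 0 is P (present); when P is set,
 * bits 12..31 hold the physical frame address (4 KiB pages). */
#define PTE_PRESENT 0x001u
#define PAGE_SHIFT  12
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

static_assert(PTE_PRESENT == 1u, "P is bit 0 of an i386 entry");

/* Hardware side (hypothetical model): a clear P bit means a page fault,
 * whatever the other 31 bits of the entry contain. */
static bool mmu_translate(uint32_t pte, uint32_t linear, uint32_t *phys)
{
    if (!(pte & PTE_PRESENT))
        return false;                          /* raise #PF */
    *phys = (pte & ~OFFSET_MASK) | (linear & OFFSET_MASK);
    return true;
}

/* Software side: the order of the tests at the top of handle_pte_fault(). */
enum fault_action { ACT_PRESENT, ACT_NO_PAGE, ACT_SWAP_PAGE };

static enum fault_action classify_pte(uint32_t entry)
{
    if (entry & PTE_PRESENT)   /* pte_present(): go on to the other checks */
        return ACT_PRESENT;
    if (entry == 0)            /* pte_none(): never mapped -> do_no_page() */
        return ACT_NO_PAGE;
    return ACT_SWAP_PAGE;      /* mapped but swapped out -> do_swap_page() */
}
```

An entry such as 0x00123404 has P clear but other bits set. The hardware faults on it exactly as it would on an all-zero entry, and only the handler's second test tells the two apart.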

The first test here is pte_present(), which checks the P flag in the entry to see whether the physical page is in memory. If the page is not in memory, pte_none() checks whether the entry is empty, that is, all zeros. An empty entry means the mapping has never been established, so do_no_page() is called. We have seen that function in earlier scenarios. Otherwise the mapping exists but the physical page is not in memory, so do_swap_page() is called to swap the page in from the swap device. In this scenario, everything up to the start of handle_pte_fault() proceeds as in the out-of-bounds access scenario, so we go straight to do_swap_page(). Its code is as follows:

do_page_fault=>handle_mm_fault=>handle_pte_fault=>do_swap_page

static int do_swap_page(struct mm_struct * mm,
	struct vm_area_struct * vma, unsigned long address,
	pte_t * page_table, swp_entry_t entry, int write_access)
{
	struct page *page = lookup_swap_cache(entry);
	pte_t pte;

	if (!page) {
		lock_kernel();
		swapin_readahead(entry);
		page = read_swap_cache(entry);
		unlock_kernel();
		if (!page)
			return -1;

		flush_page_to_ram(page);
		flush_icache_page(vma, page);
	}

	mm->rss++;

	pte = mk_pte(page, vma->vm_page_prot);

	/*
	 * Freeze the "shared"ness of the page, ie page_count + swap_count.
	 * Must lock page before transferring our swap count to already
	 * obtained page count.
	 */
	lock_page(page);
	swap_free(entry);
	if (write_access && !is_page_shared(page))
		pte = pte_mkwrite(pte_mkdirty(pte));
	UnlockPage(page);

	set_pte(page_table, pte);
	/* No need to invalidate - it was non-present before */
	update_mmu_cache(vma, address, pte);
	return 1;	/* Minor fault */
}

Let's first look at the parameters passed in. Readers are encouraged to go back to the earlier scenario, in which an access beyond the stack boundary caused the stack to expand, and to trace the CPU's execution path once more to see where these parameters come from. The parameters mm, vma and address are self-explanatory. They are, respectively, a pointer to the current process's mm_struct, a pointer to the vm_area_struct of the virtual memory area containing the address, and the linear address whose mapping failed.

The parameter page_table points to the page table entry whose mapping failed, and entry is the content of that entry. As mentioned before, when the physical page is in memory, the entry is a pte_t that points to a memory page. When the page is not in memory, the entry is a swp_entry_t that points to a page on disk. Both are in fact 32-bit unsigned integers. Note that "not in memory" is meant in a logical sense, from the point of view of the CPU's page-mapping hardware. In reality, the page may well still be on the inactive page queue, or even on the active page queue.
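
A page can be "not in memory" for the MMU while still sitting in RAM, and lookup_swap_cache() at the start of do_swap_page() exploits exactly this. The toy model below imitates that control flow in user space. It is an illustration under stated assumptions, not kernel code:

- swap slots are array indices and pages are small heap objects;
- swap_map counts the references to each slot (page table entries plus the swap cache);
- read-ahead and cache flushing are left out;
- is_page_shared() is reduced to its reference-counting idea.

Every toy_ function and every type here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NSLOTS     8
#define PAGE_BYTES 16                       /* toy page size */

struct page { int count; int slot; char data[PAGE_BYTES]; };
struct pte  { struct page *page; bool writable; bool dirty; };

static char swap_disk[NSLOTS][PAGE_BYTES];  /* backing store on "disk"            */
static int swap_map[NSLOTS];                /* refs to each slot: ptes + cache    */
static struct page *swap_cache[NSLOTS];     /* slot -> page still in RAM, or NULL */
static int disk_reads, rss;

static struct page *toy_lookup_swap_cache(int slot)
{
    struct page *p = swap_cache[slot];
    if (p)
        p->count++;                         /* caller gets its own reference */
    return p;
}

static struct page *toy_read_swap_cache(int slot)
{
    struct page *p = calloc(1, sizeof *p);  /* toy: pages are never freed */
    if (!p)
        return NULL;
    memcpy(p->data, swap_disk[slot], PAGE_BYTES);
    disk_reads++;
    p->slot = slot;
    p->count = 2;                           /* swap cache + caller */
    swap_cache[slot] = p;
    swap_map[slot]++;                       /* the cache also holds the slot */
    return p;
}

/* Page refs + slot refs, minus the one page ref and one slot ref owned by
 * the swap cache, counts us plus every other user; more than 1 = shared. */
static bool toy_is_page_shared(const struct page *p)
{
    return p->count + swap_map[p->slot] - 2 > 1;
}

static int toy_do_swap_page(struct pte *pte, int slot, bool write_access)
{
    assert(slot >= 0 && slot < NSLOTS && swap_map[slot] > 0);
    struct page *page = toy_lookup_swap_cache(slot);
    if (!page) {                            /* miss: must go to the disk */
        page = toy_read_swap_cache(slot);
        if (!page)
            return -1;
    }
    rss++;
    swap_map[slot]--;                       /* swap_free(): this pte drops the slot */
    pte->page = page;
    pte->writable = pte->dirty = write_access && !toy_is_page_shared(page);
    return 1;                               /* "minor fault", as in the kernel */
}
```

Suppose a slot is referenced by one PTE. A write fault then costs one disk read and yields a writable, dirty entry. Now suppose the slot is referenced by two PTEs. The first fault reads the page but leaves it read-only, because the other process still holds the slot. The second fault finds the page in the cache and does no I/O at all.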

One more parameter, write_access, records whether the failed access was a read or a write. It is set in do_page_fault() from bit 1 of the error code error_code supplied by the CPU, inside a switch statement. Note that this switch statement has no break between default: and case 2:. From there, write_access is passed down from one function to the next.
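
The derivation of write_access can be sketched in user-space C. The error-code bit meanings are those defined for the i386 page-fault exception. The function decode_access() and the VMA_* flags are hypothetical stand-ins for the vm_flags checks in do_page_fault(), and a return of -1 stands for the bad_area path. The case labels follow the switch described above, including the missing break after default:.

```c
#include <assert.h>

/* Low bits of the i386 page-fault error code pushed by the CPU:
 *   bit 0: 0 = page not present, 1 = protection violation
 *   bit 1: 0 = read,             1 = write
 *   bit 2: 0 = kernel mode,      1 = user mode (not used here) */
#define PF_PROT  0x1u
#define PF_WRITE 0x2u

/* Hypothetical VMA permission bits standing in for vm_flags. */
#define VMA_READ  0x1u
#define VMA_WRITE 0x2u
#define VMA_EXEC  0x4u

static_assert((PF_PROT | PF_WRITE) == 3u, "the switch looks at bits 0 and 1");

/* Returns write_access (1 or 0), or -1 where the kernel would take the
 * bad_area path. */
static int decode_access(unsigned error_code, unsigned vm_flags)
{
    int write = 0;

    switch (error_code & (PF_PROT | PF_WRITE)) {
    default:            /* 3: write to a present, write-protected page */
        /* fall through */
    case 2:             /* write to a non-present page */
        if (!(vm_flags & VMA_WRITE))
            return -1;
        write = 1;
        break;
    case 1:             /* read of a present page: protection fault */
        return -1;
    case 0:             /* read of a non-present page */
        if (!(vm_flags & (VMA_READ | VMA_EXEC)))
            return -1;
        break;
    }
    return write;
}
```

A write fault on a present page (error code 3) lands in default:, falls through into the case 2 checks, and is therefore also reported with write_access set to 1.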

Since the physical page is not in memory, entry is an index, together with a few flag bits, that identifies a page on disk. It plays a role similar to a pointer.
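
The packing of such an index into a 32-bit word can be sketched as follows. The layout is assumed to match the Linux 2.4 i386 SWP_TYPE/SWP_OFFSET/SWP_ENTRY macros: type in bits 1 to 6, offset in bits 8 to 31, bit 0 clear. Treat this layout as an assumption. The helper functions are user-space stand-ins, not kernel APIs, and the struct merely mirrors the kernel's swp_entry_t.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed i386 layout of a swapped-out entry:
 *   bit 0      : P, always 0 so that the MMU faults
 *   bits 1..6  : swap type (which swap device or file)
 *   bit 7      : left clear (used by another PTE flag)
 *   bits 8..31 : page offset within that swap area */
typedef struct { uint32_t val; } swp_entry_t;

static swp_entry_t swp_entry(unsigned type, uint32_t offset)
{
    assert(type < 64 && offset < (1u << 24));   /* must fit the fields */
    swp_entry_t e = { (type << 1) | (offset << 8) };
    return e;
}

static unsigned swp_type(swp_entry_t e)   { return (e.val >> 1) & 0x3f; }
static uint32_t swp_offset(swp_entry_t e) { return e.val >> 8; }
```

Because bit 0 is always clear, such a value fails the present test. The first page of a swap area holds the swap header, so a valid entry always has a non-zero offset and can never be mistaken for an empty (pte_none) entry.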

. . . . . .
