Path to the file system mount point

Posted by plisken on Tue, 04 Jan 2022 17:03:28 +0100

1. About pathname lookup

1.1 lookup process

VFS pathname lookup is the process of deriving the corresponding inode from a file pathname. Briefly: the kernel examines the directory entry matching the first name to obtain the corresponding inode. It then reads the directory file that has that inode from disk and examines the entry matching the second name to obtain the next inode. This is repeated for each name in the path.
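The component-by-component walk can be sketched in user space. This is a minimal illustration over a toy in-memory tree, not kernel code: `toy_node`, `toy_child` and `toy_walk` are hypothetical names, and the fixed-size child array is an assumption made to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory directory entry: a name, some children, an inode number. */
struct toy_node {
    const char *name;
    struct toy_node *children[4];   /* unused slots are NULL */
    int ino;                        /* stands in for the inode */
};

/* Find the child of dir whose name equals the len-byte component comp. */
static struct toy_node *toy_child(struct toy_node *dir, const char *comp, size_t len)
{
    for (size_t i = 0; i < 4 && dir->children[i]; i++) {
        const char *n = dir->children[i]->name;
        if (strlen(n) == len && memcmp(n, comp, len) == 0)
            return dir->children[i];
    }
    return NULL;
}

/* Resolve path one component at a time, starting at start.
 * Returns NULL as soon as a component is missing. */
static struct toy_node *toy_walk(struct toy_node *start, const char *path)
{
    struct toy_node *cur = start;

    while (cur) {
        while (*path == '/')            /* skip separators */
            path++;
        if (*path == '\0')              /* no components left */
            break;
        size_t len = strcspn(path, "/");
        cur = toy_child(cur, path, len);
        path += len;
    }
    return cur;
}
```

Each pass of the loop corresponds to one "examine the entry, obtain the inode" step; the real kernel additionally deals with permissions, symbolic links and mount points at every step.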

1.2 absolute path and relative path

If the first character of the pathname is "/", the path is absolute, and the search starts from the directory identified by current->fs->root (the process root directory). Otherwise the path is relative, and the search starts from the directory identified by current->fs->pwd (the current working directory of the process).

1.3 cache

The directory entry cache greatly speeds up the lookup because it keeps the most recently used directory entry objects in memory. In many cases, pathname resolution can avoid reading intermediate directories from disk altogether.

1.4 special cases

Several Unix and VFS features must also be taken into account:

  • The access rights of each directory must be checked to verify that the process is allowed to read its contents.
  • A file name may be a symbolic link to another pathname, in which case the lookup has to continue along that pathname.
  • Symbolic links can form loops, which the kernel must detect and break.
  • A file name may be the mount point of a mounted file system, in which case the lookup has to continue in the new file system.
  • Pathname lookup is done in the namespace of the process that issued the system call. The same pathname used by two processes in different namespaces may refer to different files.
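One of these cases, symbolic link loops, is easy to illustrate outside the kernel. The sketch below follows links through a toy table and gives up after a fixed number of hops. The table layout and every `toy_` name are hypothetical, and the limit of 40 is an assumption modeled on the cap that 2.6-era kernels apply to current->total_link_count.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_LINKS 40   /* assumed cap on links followed in one lookup */

struct toy_link {
    const char *name;
    const char *target;    /* NULL means "not a symbolic link" */
};

static const struct toy_link *toy_find(const struct toy_link *tab, size_t n,
                                       const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Follow symbolic links starting at name.
 * Returns the final non-link name, or NULL if the name is unknown
 * or more than TOY_MAX_LINKS links had to be followed. */
static const char *toy_follow(const struct toy_link *tab, size_t n,
                              const char *name)
{
    int followed = 0;

    for (;;) {
        const struct toy_link *e = toy_find(tab, n, name);
        if (!e)
            return NULL;
        if (!e->target)
            return e->name;         /* reached a regular entry */
        if (++followed > TOY_MAX_LINKS)
            return NULL;            /* the kernel would fail with -ELOOP */
        name = e->target;
    }
}
```

A counter is cheaper than true cycle detection and also bounds the cost of long but finite chains, which is presumably why the kernel takes the same approach.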

1.5 focus of this analysis

The full lookup involves many checks, more than can be covered here. This article looks at one question only: while walking a path, how does the kernel know that a component is a mount point with a file system mounted on it, and how does it switch to that file system so that the directories and files inside it can be reached?

2. path_lookup()

Pathname lookup is performed by path_lookup().

/**
 * @name:		pathname of the file
 * @flags:		lookup flags
 * @nd:			receives the lookup result
 */
int fastcall path_lookup(const char *name, unsigned int flags, struct nameidata *nd)
		do_path_lookup(AT_FDCWD, name, flags, nd);
				/* Initialize some fields of nd */
				nd->last_type = LAST_ROOT;
				nd->flags = flags;
				nd->depth = 0;

				if (*name=='/') {
					/* Absolute path: the lookup starts at the process root directory (fs is current->fs) */
					/* Take references to the mounted file system descriptor and dentry of the root directory */
					nd->mnt = mntget(fs->rootmnt);
					nd->dentry = dget(fs->root);
				} else if (dfd == AT_FDCWD) {
					/* Relative path: the lookup starts at the current working directory */
					/* Take references to the mounted file system descriptor and dentry of the working directory */
					nd->mnt = mntget(fs->pwdmnt);
					nd->dentry = dget(fs->pwd);
				} else {
					/* dfd is an open directory descriptor: start from that directory (omitted here) */
				}

				path_walk(name, nd);
						current->total_link_count = 0;
						/* Core of the lookup operation, see section 3 */
						link_path_walk(name, nd);
  • The nd variable holds the result of the lookup. Its dentry and mnt fields point to the directory entry object of the last resolved path component and to the corresponding mounted file system descriptor, respectively.
  • Because the dentry object and mounted file system object in the nameidata structure returned by path_lookup() represent the result of the lookup, neither may be freed until the caller has finished with the result. path_lookup() therefore increments the reference counters of both objects. To release them, call path_release(), passing it the address of the nameidata structure.
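The get/release pairing described above can be modeled with plain counters. This is only a sketch of the ownership rule: `toy_obj`, `toy_nameidata` and the `toy_` functions are hypothetical stand-ins, and the real dget()/mntget() helpers use atomic counters rather than a bare int.

```c
/* Hypothetical reference-counted object (stands in for a dentry or vfsmount). */
struct toy_obj {
    int refcount;
};

struct toy_nameidata {
    struct toy_obj *dentry;
    struct toy_obj *mnt;
};

static struct toy_obj *toy_get(struct toy_obj *o)
{
    o->refcount++;
    return o;
}

static void toy_put(struct toy_obj *o)
{
    o->refcount--;
}

/* Like path_lookup(): hand both objects to the caller with one extra
 * reference each, so they cannot disappear while the result is in use. */
static void toy_path_lookup(struct toy_obj *dentry, struct toy_obj *mnt,
                            struct toy_nameidata *nd)
{
    nd->dentry = toy_get(dentry);
    nd->mnt = toy_get(mnt);
}

/* Like path_release(): drop the references taken by the lookup. */
static void toy_path_release(struct toy_nameidata *nd)
{
    toy_put(nd->dentry);
    toy_put(nd->mnt);
}
```

Every successful toy_path_lookup() must be balanced by exactly one toy_path_release(); forgetting the release pins both objects in memory indefinitely.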

3. link_path_walk()

link_path_walk() is the core of the pathname lookup operation.

int fastcall link_path_walk(const char *name, struct nameidata *nd)
		__link_path_walk(name, nd);
				struct inode *inode;
				
				/* Skip slash before first component of path name */
				while (*name=='/')
					name++;
				/* Get the inode of the starting dentry (root or current working directory); the lookup starts here */
				inode = nd->dentry->d_inode;
				
				/* Decompose the path name into components, and perform the operation in the for loop for each component */
				for(;;) {
					unsigned long hash;
					struct qstr this;
					unsigned int c;
					
					/* 1. Access rights check */
					err = exec_permission_lite(inode, nd);
							if (inode->i_op && inode->i_op->permission)
								return -EAGAIN;
					if (err == -EAGAIN)
						err = vfs_permission(nd, MAY_EXEC);
					
					/* Consider the next component to parse and calculate the hash value from the name */
					this.name = name;
					c = *(const unsigned char *)name;
					hash = init_name_hash();	//hash = 0;
					do {
						name++;
						hash = partial_name_hash(c, hash);
						c = *(const unsigned char *)name;
					} while (c && (c != '/'));
					this.len = name - (const char *) this.name;
					this.hash = end_name_hash(hash);	//return hash;
					
					/* Handle the special components "." (stay in the current directory) and ".." (move to the parent directory) */
					if (this.name[0] == '.') switch (this.len) {
						default:
							break;
						case 2:
							if (this.name[1] != '.')
								break;
							/* See 3.1 */
							follow_dotdot(nd);
							/* Point inode at the inode of the directory we moved to */
							inode = nd->dentry->d_inode;
							/* Deliberate fall-through: there is no break here */
						case 1:
							continue;
					}
					
					/* Any component other than "." and ".." must be looked up, starting with the dentry cache */
					/* If the file system defines its own d_hash method, call it to adjust the hash computed above */
					if (nd->dentry->d_op && nd->dentry->d_op->d_hash)
						nd->dentry->d_op->d_hash(nd->dentry, &this);
					/* See 3.3 */
					do_lookup(nd, &this, &next);
					inode = next.dentry->d_inode;
					path_to_nameidata(&next, nd);
							nd->mnt = path->mnt;
							nd->dentry = path->dentry;
					continue;
				}
  1. exec_permission_lite() performs a quick permission check: traversing a directory requires execute permission on it. If the inode defines its own permission method, exec_permission_lite() returns -EAGAIN and the full check is done by vfs_permission().
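The component parsing and hashing step of the loop can be tried out in user space. The sketch below fills a small qstr-like structure for each component. `toy_qstr` and `toy_next_component` are hypothetical names, and the mixing function is written from memory of the 2.6-era <linux/dcache.h>, so it should be treated as an assumption rather than as the kernel's definition.

```c
#include <stddef.h>

/* Hypothetical counterpart of struct qstr: component name, length and hash. */
struct toy_qstr {
    const char *name;   /* points into the original path, not NUL-terminated */
    size_t len;
    unsigned int hash;
};

/* Assumed to mirror partial_name_hash(): fold one character into the hash. */
static unsigned long toy_partial_hash(unsigned long c, unsigned long prev)
{
    return (prev + (c << 4) + (c >> 4)) * 11;
}

/* Parse the next component of *pathp into out and advance *pathp past it.
 * Returns 1 if a component was found, 0 if the path is exhausted. */
static int toy_next_component(const char **pathp, struct toy_qstr *out)
{
    const char *p = *pathp;
    unsigned long hash = 0;             /* init_name_hash() */

    while (*p == '/')                   /* skip leading separators */
        p++;
    if (*p == '\0') {
        *pathp = p;
        return 0;
    }

    out->name = p;
    while (*p && *p != '/') {
        hash = toy_partial_hash((unsigned char)*p, hash);
        p++;
    }
    out->len = (size_t)(p - out->name);
    out->hash = (unsigned int)hash;     /* end_name_hash() truncates */
    *pathp = p;
    return 1;
}
```

Calling toy_next_component() repeatedly on "/usr//bin" yields "usr" and then "bin"; the hash depends only on the component's characters, which is what allows it to be used as a cache key later on.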

3.1 follow_dotdot()

When the component is "..", follow_dotdot() is called.

static __always_inline void follow_dotdot(struct nameidata *nd)
		while(1) {
			/* The loop handles several file systems stacked on the same mount point */
			/* (locking and the dput()/mntput() calls on the old objects are omitted in this excerpt) */
			struct vfsmount *parent;
			struct dentry *old = nd->dentry;
			/* If the current directory is the process root directory, we cannot go any higher */
			if (nd->dentry == fs->root && nd->mnt == fs->rootmnt)
				break;
			/* If the current directory is not the root of its file system, simply move to its parent */
			if (nd->dentry != nd->mnt->mnt_root) {
				nd->dentry = dget(nd->dentry->d_parent);
				break;
			}

			parent = nd->mnt->mnt_parent;
			/* This file system is not mounted on any other one: it is the root file system of the namespace */
			if (parent == nd->mnt)
				break;
			/* Switch file systems: continue from the mount point in the parent file system */
			nd->dentry = dget(nd->mnt->mnt_mountpoint);
			nd->mnt = parent;
			/* dentry and mnt may change on every pass; the loop ends once the parent directory of the mount point is found */
		}
		/* Check whether the resulting directory is itself a mount point and update mnt and dentry, see 3.2 */
		follow_mount(&nd->mnt, &nd->dentry);
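The upward walk is easier to follow on a toy model. The sketch below keeps only the three fields the loop depends on and leaves out locking, reference counting and the final follow_mount() call. All `toy_` types and names are hypothetical.

```c
#include <stddef.h>

struct toy_dentry {
    const char *name;
    struct toy_dentry *parent;      /* like d_parent; a root points to itself */
};

struct toy_mnt {
    struct toy_mnt *parent;         /* like mnt_parent; the top mount points to itself */
    struct toy_dentry *mountpoint;  /* like mnt_mountpoint: dentry in the parent fs */
    struct toy_dentry *root;        /* like mnt_root: root dentry of this fs */
};

/* A position in the mount tree: which file system, and which dentry in it. */
struct toy_pos {
    struct toy_mnt *mnt;
    struct toy_dentry *dentry;
};

/* Move pos one directory up, crossing mount boundaries where needed.
 * proc_root is the process root directory, which bounds the climb. */
static void toy_dotdot(struct toy_pos *pos, const struct toy_pos *proc_root)
{
    for (;;) {
        if (pos->dentry == proc_root->dentry && pos->mnt == proc_root->mnt)
            break;                              /* already at the process root */
        if (pos->dentry != pos->mnt->root) {
            pos->dentry = pos->dentry->parent;  /* ordinary case: same fs */
            break;
        }
        if (pos->mnt->parent == pos->mnt)
            break;                              /* top of the namespace */
        pos->dentry = pos->mnt->mountpoint;     /* step onto the mount point */
        pos->mnt = pos->mnt->parent;            /* in the parent fs, then retry */
    }
}
```

Starting at the root of a file system mounted on /mnt/usb, one call first hops to the usb mount point in the parent file system and then, on the next pass, moves to its parent /mnt. This is why the kernel uses a loop rather than a single step.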

3.2 follow_mount()

This is the key function for detecting a mount point and switching file systems. It checks whether the directory entry is the mount point of a file system; if so, it switches to that file system and moves the directory entry to the root directory of the mounted file system. __follow_mount() does the same job.

static void follow_mount(struct vfsmount **mnt, struct dentry **dentry)
{
	/* Is anything mounted on this directory? The loop handles several file systems stacked on the same mount point */
	while (d_mountpoint(*dentry)) {
		/* Search the mount hash table for the file system mounted on this dentry, see 3.3.3 */
		struct vfsmount *mounted = lookup_mnt(*mnt, *dentry);
		/* No file system is mounted here in this mount tree: stop. Otherwise switch file systems */
		if (!mounted)
			break;
		dput(*dentry);
		mntput(*mnt);
		*mnt = mounted;
		/* Move the dentry to the root directory of the mounted file system */
		*dentry = dget(mounted->mnt_root);
	}
}
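The same loop can be exercised on a toy model with two file systems stacked on one directory. The sketch replaces lookup_mnt() with a linear search and drops reference counting. The `toy_` names and the `mounted` flag (standing in for what d_mountpoint() tests) are assumptions made for illustration.

```c
#include <stddef.h>

struct toy_dentry {
    const char *name;
    int mounted;                    /* nonzero if something is mounted on it */
};

struct toy_mnt {
    struct toy_mnt *parent;         /* file system containing the mount point */
    struct toy_dentry *mountpoint;  /* dentry the file system is mounted on */
    struct toy_dentry *root;        /* root dentry of the mounted file system */
};

/* Linear stand-in for lookup_mnt(): find the mount whose parent and
 * mount point match the given pair. */
static struct toy_mnt *toy_lookup_mnt(struct toy_mnt **tab, size_t n,
                                      struct toy_mnt *mnt,
                                      struct toy_dentry *dentry)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i]->parent == mnt && tab[i]->mountpoint == dentry)
            return tab[i];
    return NULL;
}

/* Descend through every file system mounted on *dentry, updating both
 * *mnt and *dentry to the topmost one. */
static void toy_follow_mount(struct toy_mnt **tab, size_t n,
                             struct toy_mnt **mnt, struct toy_dentry **dentry)
{
    while ((*dentry)->mounted) {
        struct toy_mnt *m = toy_lookup_mnt(tab, n, *mnt, *dentry);
        if (!m)
            break;
        *mnt = m;
        *dentry = m->root;          /* continue from the new fs root */
    }
}
```

When a second file system is mounted over the first, its mount point is the root of the first one, so the loop runs twice and ends at the root of the topmost file system.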

3.3 do_lookup()

static int do_lookup(struct nameidata *nd, struct qstr *name, struct path *path)
		struct vfsmount *mnt = nd->mnt;
		/* Search the dentry cache for the component, see 3.3.1 */
		struct dentry *dentry = __d_lookup(nd->dentry, name);

		if (!dentry)
			goto need_lookup;
done:
		path->mnt = mnt;
		path->dentry = dentry;
		/* Same job as follow_mount() in 3.2, taking a struct path instead */
		__follow_mount(path);
		return 0;

need_lookup:
	/* Cache miss: ask the file system, see 3.3.2 (error handling omitted) */
	dentry = real_lookup(nd->dentry, name, nd);
	goto done;
		

3.3.1 __d_lookup()

Searches the dentry cache for the directory entry object of a component. Note that the dentry parameter here is the parent directory entry.

struct dentry * __d_lookup(struct dentry * parent, struct qstr * name)
		unsigned int len = name->len;
		unsigned int hash = name->hash;
		const unsigned char *str = name->name;
		/* Pick the hash bucket from the parent dentry address and the name hash (shown simplified below) */
		struct hlist_head *head = d_hash(parent,hash);
				dentry_hashtable + (hash & D_HASHMASK);
		struct dentry *found = NULL;
		struct hlist_node *node;
		struct dentry *dentry;
		
		hlist_for_each_entry_rcu(dentry, node, head, d_hash) {
			struct qstr *qstr;

			if (dentry->d_name.hash != hash)
				continue;
			if (dentry->d_parent != parent)
				continue;
			qstr = &dentry->d_name;
			if (parent->d_op && parent->d_op->d_compare) {
				/* The file system supplies its own name comparison */
				if (parent->d_op->d_compare(parent, qstr, name))
					goto next;
			} else {
				if (qstr->len != len)
					goto next;
				if (memcmp(qstr->name, str, len))
					goto next;
			}
			/* A match counts only if the dentry is still hashed (locking and refcounting omitted) */
			if (!d_unhashed(dentry))
				found = dentry;
			break;
next:
			continue;
		}
		return found;
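The order of the checks in the loop (hash first, then parent, then length and bytes) can be reproduced with a small chained hash table. This is a user-space sketch only: the bucket count, the `toy_` names and the singly linked chain are assumptions, and the bucket is chosen from the hash alone, as in the simplified d_hash() shown in the excerpt.

```c
#include <stddef.h>
#include <string.h>

#define TOY_BUCKETS 8   /* assumed table size, must be a power of two */

struct toy_dentry {
    struct toy_dentry *parent;
    const char *name;
    size_t len;
    unsigned int hash;
    struct toy_dentry *next;    /* next entry in the same bucket */
};

static struct toy_dentry *toy_table[TOY_BUCKETS];

static unsigned int toy_bucket(unsigned int hash)
{
    return hash & (TOY_BUCKETS - 1);
}

/* Insert d at the head of its bucket chain. */
static void toy_d_add(struct toy_dentry *d)
{
    unsigned int b = toy_bucket(d->hash);
    d->next = toy_table[b];
    toy_table[b] = d;
}

/* Find the child of parent with the given name, or NULL on a cache miss.
 * Cheap comparisons come first; the byte comparison runs last. */
static struct toy_dentry *toy_d_lookup(struct toy_dentry *parent,
                                       const char *name, size_t len,
                                       unsigned int hash)
{
    for (struct toy_dentry *d = toy_table[toy_bucket(hash)]; d; d = d->next) {
        if (d->hash != hash)
            continue;
        if (d->parent != parent)
            continue;
        if (d->len != len)
            continue;
        if (memcmp(d->name, name, len) != 0)
            continue;
        return d;
    }
    return NULL;
}
```

Comparing the parent pointer is what keeps /usr/bin and /opt/bin apart even though both components hash to the same value and land in the same chain.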

3.3.2 real_lookup()

real_lookup() executes the lookup method of the directory's inode, which reads the directory from disk, creates a new directory entry object and inserts it into the dentry cache, then creates a new inode object and inserts it into the inode cache.

static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, struct nameidata *nd)
		struct dentry * result;
		struct inode *dir = parent->d_inode;
		/* Look again: the dentry may have been created while we waited for the directory lock */
		result = d_lookup(parent, name);
				/* Essentially a call to __d_lookup(), protected by a seqlock against concurrent renames */
				dentry = __d_lookup(parent, name);
		if (!result) {
			struct dentry * dentry = d_alloc(parent, name);
			/* Ask the file system to read the directory and fill in the new dentry */
			result = dir->i_op->lookup(dir, dentry, nd);
			if (result)
				dput(dentry);
			else
				result = dentry;
		}
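The division of labor between the cache and the file system's lookup method can be shown with a counter that records how often the "disk" is consulted. This is a deliberately small sketch with one flat directory and hypothetical `toy_` names. Unlike the kernel, it does not cache negative results and does no locking.

```c
#include <stddef.h>
#include <string.h>

#define TOY_CACHE_MAX 16

static const char *toy_cache[TOY_CACHE_MAX];    /* names already resolved */
static size_t toy_cached;
static int toy_disk_reads;                      /* simulated directory reads */

/* Stand-in for the file system's lookup method: scans the on-disk directory. */
static int toy_fs_lookup(const char **dir, size_t n, const char *name)
{
    toy_disk_reads++;
    for (size_t i = 0; i < n; i++)
        if (strcmp(dir[i], name) == 0)
            return 1;
    return 0;
}

static int toy_cache_has(const char *name)
{
    for (size_t i = 0; i < toy_cached; i++)
        if (strcmp(toy_cache[i], name) == 0)
            return 1;
    return 0;
}

/* Returns 1 if name exists in dir. The cache is tried first; on a miss
 * the file system is asked and a positive answer is remembered. */
static int toy_do_lookup(const char **dir, size_t n, const char *name)
{
    if (toy_cache_has(name))
        return 1;                       /* fast path: no disk access */
    if (!toy_fs_lookup(dir, n, name))
        return 0;
    if (toy_cached < TOY_CACHE_MAX)
        toy_cache[toy_cached++] = name; /* slow path fills the cache */
    return 1;
}
```

Looking up the same name twice touches the simulated disk only once, which is the effect the dentry cache has on repeated pathname lookups.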

3.3.3 lookup_mnt()

Searches the mount hash table for the child file system mounted on a given mount point of a given parent file system, and returns its mounted file system descriptor.

struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
		struct vfsmount *child_mnt;
		/* dir == 1: walk the hash chain forwards (tmp->next) */
		child_mnt = __lookup_mnt(mnt, dentry, 1);
				struct list_head *head = mount_hashtable + hash(mnt, dentry);
				struct list_head *tmp = head;
				struct vfsmount *p, *found = NULL;
			
				for (;;) {
					tmp = dir ? tmp->next : tmp->prev;
					p = NULL;
					if (tmp == head)
						break;
					p = list_entry(tmp, struct vfsmount, mnt_hash);
					if (p->mnt_parent == mnt && p->mnt_mountpoint == dentry) {
						found = p;
						break;
					}
				}
				return found;		
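A hash table keyed on the pair (parent mount, mount point dentry) can be sketched as follows. The hash function here is a hypothetical one that merely mixes the two pointer values; it is not the kernel's hash(). The `toy_` names and the singly linked chain, used in place of the kernel's list_head, are likewise assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MNT_BUCKETS 16

struct toy_mnt {
    const void *parent;         /* stands in for mnt_parent */
    const void *mountpoint;     /* stands in for mnt_mountpoint */
    struct toy_mnt *next;       /* bucket chain */
};

static struct toy_mnt *toy_mount_table[TOY_MNT_BUCKETS];

/* Hypothetical hash over both key pointers. */
static size_t toy_mnt_hash(const void *mnt, const void *dentry)
{
    uintptr_t h = (uintptr_t)mnt / 16 + (uintptr_t)dentry / 16;
    return (size_t)(h % TOY_MNT_BUCKETS);
}

/* Record that m is mounted on (m->parent, m->mountpoint). */
static void toy_mnt_add(struct toy_mnt *m)
{
    size_t b = toy_mnt_hash(m->parent, m->mountpoint);
    m->next = toy_mount_table[b];
    toy_mount_table[b] = m;
}

/* Return the file system mounted on dentry within mnt, or NULL if none. */
static struct toy_mnt *toy_lookup_mnt(const void *mnt, const void *dentry)
{
    struct toy_mnt *p = toy_mount_table[toy_mnt_hash(mnt, dentry)];

    for (; p; p = p->next)
        if (p->parent == mnt && p->mountpoint == dentry)
            return p;
    return NULL;
}
```

Both halves of the key are needed: the same dentry can be visible through several mounts, and only the pair identifies which child file system sits on top of it.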

Topics: C Linux kernel