002 Learning Rust through linked lists: the layout of a linked list

Posted by Kyrst on Fri, 28 Jan 2022 09:04:37 +0100

Introduction

Video address: www.bilibili.com/video/av78062009/
Relevant source code: github.com/anonymousGiga/Rust-link...

The most common linked list in Rust

In functional-language syntax, a linked list is defined as follows:

List a  = Empty | Elem a (List a)

A linked list is either empty or a value followed by another linked list. This is a recursively defined type, expressed as a sum type, and in Rust the enum is the type system's sum type. Therefore, the most common definition of a linked list in Rust is as follows:

#[derive(Debug)]
enum List<T> {
    Cons(T, Box<List<T>>),
    Nil,
}

fn main() {
    let list: List<i32> = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    println!("{:?}", list);
}
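To show how this recursive sum type is consumed, here is a small sketch. The helpers len and sum are illustrative and not part of the original notes: each one matches on the two variants and recurses into the Box.

```rust
// Same generic definition as above.
#[derive(Debug)]
enum List<T> {
    Cons(T, Box<List<T>>),
    Nil,
}

// Count the elements: a List is either Nil or Cons(value, rest).
fn len<T>(list: &List<T>) -> usize {
    match list {
        List::Nil => 0,
        // `rest` is a &Box<List<T>>; deref coercion turns it into &List<T>.
        List::Cons(_, rest) => 1 + len(rest),
    }
}

// Sum an i32 list with the same recursion.
fn sum(list: &List<i32>) -> i32 {
    match list {
        List::Nil => 0,
        List::Cons(x, rest) => x + sum(rest),
    }
}

fn main() {
    let list: List<i32> = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    println!("{:?}: len = {}, sum = {}", list, len(&list), sum(&list));
    // prints "Cons(1, Cons(2, Nil)): len = 2, sum = 3"
}
```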

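Note: the definition above uses a generic enum. Specialized to i32, and with the variants renamed, it is equivalent to the following (the form used in the rest of these notes):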

pub enum List {
    Empty,
    Elem(i32, Box<List>),
}

Existing problems

However, this is a very poor definition of a linked list.
Consider a linked list with two elements:

// Layout 1: the layout of the definition above
[] = stack  // allocated on the stack
() = heap   // allocated on the heap
[Elem A, ptr] -> (Elem B, ptr) -> (Empty, junk)

There are two problems with this approach:

  • The first element, A, is stored on the stack rather than on the heap;
  • The trailing Empty still needs its own heap allocation.
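Both problems can be checked directly. The sketch below uses the i32 version of layout 1 and a hypothetical helper, heap_contents, that walks the Boxes and reports how many elements live on the heap and how many heap nodes hold only Empty.

```rust
// Layout 1 (the i32 version from above): the first element is stored inline
// in the `List` value itself, and each `Box` points to another `List`.
#[allow(dead_code)]
pub enum List {
    Empty,
    Elem(i32, Box<List>),
}

// Hypothetical helper. Returns (elements stored on the heap, heap nodes that
// hold only `Empty`). The top-level value lives wherever the caller keeps it
// (here: on the stack), so counting starts at the first Box.
fn heap_contents(list: &List) -> (usize, usize) {
    let mut cur: &Box<List> = match list {
        List::Empty => return (0, 0),
        List::Elem(_, next) => next,
    };
    let mut elems_on_heap = 0;
    loop {
        match &**cur {
            // A non-empty list always ends in a boxed `Empty`: one junk node.
            List::Empty => return (elems_on_heap, 1),
            List::Elem(_, next) => {
                elems_on_heap += 1;
                cur = next;
            }
        }
    }
}

fn main() {
    // A = 1 is inline on the stack; B = 2 and the trailing Empty are boxed.
    let list = List::Elem(1, Box::new(List::Elem(2, Box::new(List::Empty))));
    let (elems, junk) = heap_contents(&list);
    println!("elements on heap: {}, junk Empty nodes: {}", elems, junk);
    // prints "elements on heap: 1, junk Empty nodes: 1"
}
```

For the two-element list, only B is on the heap, A never is, and one heap allocation exists just to hold Empty.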

1. Extra space

Let's consider the following layout:

// Layout 2
[ptr] -> (Elem A, ptr) -> (Elem B, null)

In layout 2, the end of the list is just a null pointer, so no node has to be allocated for the empty end.
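A sketch of layout 2, assuming the standard Option<Box<Node>> as the "pointer or null" type (these notes build their own enum for that role further below). The list value itself is a single pointer, and every element lives in its own heap node. The helpers heap_nodes and to_vec are illustrative names.

```rust
// Layout 2, assuming `Option<Box<Node>>` as the "pointer or null" type.
struct Node {
    elem: i32,
    next: Option<Box<Node>>, // `None` plays the role of the null pointer
}

// The list value itself is a single (possibly null) pointer.
struct List {
    head: Option<Box<Node>>,
}

// One heap node per element; the end of the list costs no allocation.
fn heap_nodes(list: &List) -> usize {
    let mut count = 0;
    let mut cur = &list.head;
    while let Some(node) = cur {
        count += 1;
        cur = &node.next;
    }
    count
}

// Read the elements front to back.
fn to_vec(list: &List) -> Vec<i32> {
    let mut out = Vec::new();
    let mut cur = &list.head;
    while let Some(node) = cur {
        out.push(node.elem);
        cur = &node.next;
    }
    out
}

fn main() {
    // [ptr] -> (Elem 1, ptr) -> (Elem 2, null)
    let list = List {
        head: Some(Box::new(Node {
            elem: 1,
            next: Some(Box::new(Node { elem: 2, next: None })),
        })),
    };
    println!("{:?} uses {} heap nodes", to_vec(&list), heap_nodes(&list));
    // prints "[1, 2] uses 2 heap nodes"
}
```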

2. Easier splitting

In layout 1, the first node is not allocated on the heap, which is not ideal. It makes little difference for push and pop, but it matters more when splitting a linked list.

Consider splitting the following example:

Layout 1:

[Elem A, ptr] -> (Elem B, ptr) -> (Elem C, ptr) -> (Empty, *junk*)

split off C:

[Elem A, ptr] -> (Elem B, ptr) -> (Empty, *junk*)

[Elem C, ptr] -> (Empty, *junk*)

Layout 2:

[ptr] -> (Elem A, ptr) -> (Elem B, ptr) -> (Elem C, *null*)

split off C:

[ptr] -> (Elem A, ptr) -> (Elem B, *null*)

[ptr] -> (Elem C, *null*)

Splitting layout 2 only requires copying B's pointer onto the stack and then replacing the old value with null. Layout 1 does the same, but must also copy C from heap memory to stack memory.
Merging two linked lists is the reverse of splitting them.
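A minimal sketch of splitting and merging in layout 2, again assuming Option<Box<Node>> links. The names from_slice, to_vec, split_off and append are hypothetical helpers, not from the original notes. In both operations only a Box pointer moves: Option::take copies it out and leaves None (null) behind, and no node is copied.

```rust
// Layout 2 again, assuming `Option<Box<Node>>` links.
struct Node {
    elem: i32,
    next: Option<Box<Node>>,
}

struct List {
    head: Option<Box<Node>>,
}

impl List {
    // Build a list front to back from a slice (hypothetical helper).
    fn from_slice(items: &[i32]) -> List {
        let mut head = None;
        for &elem in items.iter().rev() {
            head = Some(Box::new(Node { elem, next: head }));
        }
        List { head }
    }

    fn to_vec(&self) -> Vec<i32> {
        let mut out = Vec::new();
        let mut cur = &self.head;
        while let Some(node) = cur {
            out.push(node.elem);
            cur = &node.next;
        }
        out
    }

    // Keep the first `at` elements in `self` and return the rest as a new
    // list. `take()` moves the Box pointer out of the link and leaves `None`.
    fn split_off(&mut self, at: usize) -> List {
        let mut link = &mut self.head;
        for _ in 0..at {
            match link {
                Some(node) => link = &mut node.next,
                None => break,
            }
        }
        List { head: link.take() }
    }

    // Merging is the reverse: walk to the final `None` and store the other
    // list's head pointer there.
    fn append(&mut self, other: &mut List) {
        let mut link = &mut self.head;
        while let Some(node) = link {
            link = &mut node.next;
        }
        *link = other.head.take();
    }
}

fn main() {
    let mut list = List::from_slice(&[1, 2, 3]);
    let mut tail = list.split_off(2); // split off C
    println!("{:?} {:?}", list.to_vec(), tail.to_vec()); // prints "[1, 2] [3]"
    list.append(&mut tail);
    println!("{:?} {:?}", list.to_vec(), tail.to_vec()); // prints "[1, 2, 3] []"
}
```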

One nice property of a linked list is that an element can be constructed inside its node and then moved freely around the list without moving the node itself; only the pointers change. This property is illustrated in the following diagram:

In the figure above, we delete the middle node. The middle node's data is left untouched; we only change the address stored in the first node's next pointer. In other words, the list is rearranged without the node itself moving in memory.

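Clearly, layout 1 breaks this property: its first element is stored inline in the list value on the stack, so an element that becomes, or stops being, the head has to be copied between the stack and the heap.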
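The pointer-only rearrangement can be observed directly. This sketch assumes layout-2 style nodes with Option<Box<Node>> links and uses hypothetical helpers node_addr and remove_after. It unlinks the middle node B from A -> B -> C and checks that B is still at the same heap address afterwards.

```rust
// Assuming layout-2 style nodes with `Option<Box<Node>>` links.
struct Node {
    elem: i32,
    next: Option<Box<Node>>,
}

// Heap address of the node a link points to (`None` for a null link).
fn node_addr(link: &Option<Box<Node>>) -> Option<*const Node> {
    link.as_ref().map(|node| &**node as *const Node)
}

// Unlink the node after `prev` and hand it back (hypothetical helper).
// Only pointers change: `prev.next` is rewired to the removed node's
// successor, and the removed node's Box is returned to the caller.
fn remove_after(prev: &mut Node) -> Option<Box<Node>> {
    let mut removed = prev.next.take()?;
    prev.next = removed.next.take();
    Some(removed)
}

fn main() {
    // A -> B -> C
    let c = Box::new(Node { elem: 3, next: None });
    let b = Box::new(Node { elem: 2, next: Some(c) });
    let mut a = Node { elem: 1, next: Some(b) };

    let before = node_addr(&a.next); // where B lives on the heap
    let b = remove_after(&mut a).expect("A has a successor");

    // B was not copied: it is still at the same heap address.
    assert_eq!(before, Some(&*b as *const Node));
    // A now points directly at C.
    assert_eq!(a.next.as_ref().map(|n| n.elem), Some(3));
    println!("A = {}, removed B = {} at {:p}", a.elem, b.elem, &*b);
}
```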

A more complex enum

Could we define it as follows instead?

pub enum List {
    Empty,
    ElemThenEmpty(i32),
    ElemThenNotEmpty(i32, Box<List>),
}
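A short sketch of the extra complexity this brings. Every function now has to handle three cases, and the same list can be encoded in more than one way. The to_vec helper is illustrative.

```rust
// The three-variant definition from above.
pub enum List {
    Empty,
    ElemThenEmpty(i32),
    ElemThenNotEmpty(i32, Box<List>),
}

// Every operation now has three cases to handle instead of two.
fn to_vec(list: &List) -> Vec<i32> {
    match list {
        List::Empty => Vec::new(),
        List::ElemThenEmpty(x) => vec![*x],
        List::ElemThenNotEmpty(x, rest) => {
            let mut out = vec![*x];
            out.extend(to_vec(rest));
            out
        }
    }
}

fn main() {
    // [1, 2]: the last element must use `ElemThenEmpty`, so no Empty is boxed.
    let list = List::ElemThenNotEmpty(1, Box::new(List::ElemThenEmpty(2)));
    // Nothing prevents this second encoding of the one-element list [1],
    // which `ElemThenEmpty(1)` also represents.
    let odd = List::ElemThenNotEmpty(1, Box::new(List::Empty));
    println!("{:?} {:?} {:?}", to_vec(&list), to_vec(&odd), to_vec(&List::ElemThenEmpty(1)));
    // prints "[1, 2] [1] [1]"
}
```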

With this definition, the trailing empty element no longer needs a heap allocation of its own. However, it complicates the code, and it also runs into the way an enum is laid out in memory:

// Consider the following enum
enum Foo {
    A(u32),
    B(u64),
    C(u8),
}
// Conceptually, it is laid out like this (an illustration, not the exact layout rustc chooses)
struct FooRepr {
    data: u64, // Depending on the tag, this holds a u64, u32, or u8
    tag: u8,   // 0 = A, 1 = B, 2 = C
}

Because of this tag, the three-variant List above cannot use the null pointer optimization to represent its empty case for free. (See the Rustonomicon: if an enum consists of a single variant without data, such as None, and a variant that (possibly through nesting) contains a non-null pointer, such as &T, the tag is unnecessary, because the null pointer value can stand for the data-less variant.)
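The effect of the tag can be measured with std::mem::size_of. This sketch compares Foo, which needs a tag, with Option<Box<i32>> and Option<&i32>, for which the standard library documents the null pointer optimization. Exact sizes depend on the target; on typical 64-bit targets Foo takes 16 bytes and Option<Box<i32>> takes 8.

```rust
use std::mem::size_of;

// The example enum from above: none of the payload types has a spare bit
// pattern, so the compiler must store a separate tag next to the data.
#[allow(dead_code)]
enum Foo {
    A(u32),
    B(u64),
    C(u8),
}

fn main() {
    // Foo needs more room than its largest payload (u64) because of the tag.
    assert!(size_of::<Foo>() > size_of::<u64>());

    // Null pointer optimization, documented for Option<Box<T>> and Option<&T>:
    // `None` is stored as the null pointer, so no tag is needed.
    assert_eq!(size_of::<Option<Box<i32>>>(), size_of::<Box<i32>>());
    assert_eq!(size_of::<Option<&i32>>(), size_of::<&i32>());

    println!("Foo: {} bytes", size_of::<Foo>());
    println!("Option<Box<i32>>: {} bytes", size_of::<Option<Box<i32>>>());
}
```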

C-style linked list

The analysis above shows that what we actually want is a linked list like the one we would write in C.
We define List as follows:

#[derive(Debug)]
pub struct Node {
    elem: i32,
    next: List,
}

#[derive(Debug)]
pub enum List {
    Empty,
    More(Box<Node>),
}

fn main() {
    // A one-element list: More points to a heap node whose next is Empty.
    let node1 = Node { elem: 1, next: List::Empty };
    let list = List::More(Box::new(node1));
    println!("{:?}", list);
}

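Node has to be declared pub here, because the public enum List exposes it through More(Box<Node>), and the compiler complains about a private type in a public interface otherwise. In practice, though, we would rather keep Node private. So we refine the design one more step, wrapping a private Link enum in a public List struct: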

#[derive(Debug)]
pub struct List{
    head: Link,
}

#[derive(Debug)]
enum Link {
    Empty,
    More(Box<Node>),
}

#[derive(Debug)]
struct Node {
    elem: i32,
    next: Link,
}

fn main() {
    let node1 = Node { elem: 1, next: Link::Empty};
    let list = List {head: Link::More(Box::new(node1))};
    println!("{:?}", list);
}
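As a hedged sketch of how this layout gets used (push and pop are illustrative method names here, not confirmed by these notes), std::mem::replace lets a method temporarily move the head out of &mut self by leaving Link::Empty in its place.

```rust
use std::mem;

pub struct List {
    head: Link,
}

enum Link {
    Empty,
    More(Box<Node>),
}

struct Node {
    elem: i32,
    next: Link,
}

impl List {
    pub fn new() -> Self {
        List { head: Link::Empty }
    }

    // Put a new node in front. `mem::replace` moves the old head out of
    // `self.head` (leaving `Empty` there briefly) so it can become `next`.
    pub fn push(&mut self, elem: i32) {
        let new_node = Box::new(Node {
            elem,
            next: mem::replace(&mut self.head, Link::Empty),
        });
        self.head = Link::More(new_node);
    }

    // Remove the front node and return its element, if there is one.
    pub fn pop(&mut self) -> Option<i32> {
        match mem::replace(&mut self.head, Link::Empty) {
            Link::Empty => None,
            Link::More(node) => {
                self.head = node.next;
                Some(node.elem)
            }
        }
    }
}

fn main() {
    let mut list = List::new();
    list.push(1);
    list.push(2);
    // Last in, first out: the most recently pushed element comes back first.
    let first = list.pop();
    let second = list.pop();
    let third = list.pop();
    println!("{:?} {:?} {:?}", first, second, third); // prints "Some(2) Some(1) None"
}
```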

With this, we have a reasonably satisfactory layout for our linked list!

Topics: Rust