19 Advanced Features
In this chapter we look at some of Rust's more advanced features.
19.1 Unsafe Rust
So far, all the code we have written has been subject to Rust's compile-time checks, which guarantee memory safety. However, Rust also has a second mode: unsafe Rust. Unsafe code is written inside unsafe blocks. It looks just like ordinary Rust code, but it gives us extra capabilities for things that safe Rust cannot express.
Why does unsafe Rust exist? There are two reasons:
1. Static analysis is conservative by nature. When the compiler checks whether a piece of code upholds Rust's guarantees and cannot be sure, it rejects the code, even if the code is actually safe. As a result, some valid programs are rejected.
We can write the code that the compiler cannot verify in unsafe Rust, but the cost is that we become responsible for its safety ourselves.
2. The underlying computer hardware is inherently unsafe. If unsafe Rust were not allowed, some tasks would be impossible. Rust needs to support low-level work such as interacting directly with the operating system, or even writing an operating system of our own. Systems programming of this kind is one of Rust's goals and one of its strengths.
Unsafe Rust gives us the following five "superpowers":
1. Dereference a raw pointer
2. Call an unsafe function or method
3. Access or modify a mutable static variable
4. Implement an unsafe trait
5. Access the fields of a union
Note that unsafe does not turn off the borrow checker or disable any of Rust's other safety checks: if you use a reference in unsafe code, it is still checked. The unsafe keyword only gives access to the five features above, which the compiler does not check for memory safety. So we still get some degree of safety inside unsafe blocks.
Also, code in an unsafe block is not necessarily dangerous. As noted above, the compiler may simply be unable to prove that it is safe.
When we write unsafe code, we want to isolate it. Wrapping it in a safe abstraction that exposes a safe API is a good approach, because it keeps the unsafe code from leaking into the rest of the program.
Let's look at each of these features in turn.
Dereferencing a raw pointer
In regular code, the compiler always ensures that references are valid.
Unsafe Rust has two new types, similar to references, called raw pointers. Like references, raw pointers can be immutable or mutable, written as *const T and *mut T respectively. The asterisk here is part of the type name, not the dereference operator. Immutable means that the pointer cannot be assigned through directly after it is dereferenced.
Raw pointers differ from references and smart pointers in the following ways:
1. They are allowed to ignore the borrowing rules: you can have both immutable and mutable pointers, or multiple mutable pointers, to the same location
2. They are not guaranteed to point to valid memory
3. They are allowed to be null
4. They do not implement any automatic cleanup
By giving up Rust's safety guarantees, we gain performance or the ability to interface with another language or with hardware.
fn main() {
    let mut num = 5;

    // Casting references with `as` produces raw pointers; no unsafe is needed yet.
    let r1 = &num as *const i32;
    let r2 = &mut num as *mut i32;

    println!("{:?},{:?}", r1, r2)
}
Running `target/debug/advancedfunction`
0x7ffeed3519fc,0x7ffeed3519fc
Here we create an immutable and a mutable raw pointer from references and print them. A raw pointer created directly from a reference that is guaranteed to be valid is itself valid. The as keyword casts the immutable and mutable references to the corresponding raw pointer types.
We can create raw pointers outside an unsafe block; we just cannot dereference them there.
Next, let's create a raw pointer whose validity we cannot be sure of. Trying to use arbitrary memory is undefined behavior: there may or may not be data at that address, the compiler may optimize the memory access away, or the program may crash with a segmentation fault. There is usually no good reason to write code like this, but it is possible.
fn main() {
    let address = 0x012345usize;
    // Only the pointer is created here; it is never dereferenced.
    let r = address as *const i32;
    println!("{:?}", r)
}
Creating a raw pointer to an arbitrary memory address
Now we dereference the raw pointers inside an unsafe block, because raw pointers cannot be dereferenced outside one.
fn main() {
    let mut num = 5;

    let r1 = &num as *const i32;
    let r2 = &mut num as *mut i32;
    println!("{:?},{:?}", r1, r2);

    // Dereferencing a raw pointer is only allowed inside an unsafe block.
    unsafe {
        println!("r1 is: {}", *r1);
        println!("r2 is: {}", *r2)
    }
}
Running `target/debug/advancedfunction`
0x7ffeec00695c,0x7ffeec00695c
r1 is: 5
r2 is: 5
Creating a raw pointer does no harm by itself. Only when we access the value it points to might we end up dealing with an invalid value.
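Since a raw pointer may be null, a common precaution is to check it before reading through it. The following is a minimal sketch of that idea, not a complete validity check (a non-null pointer can still be dangling). The helper name read_valid_and_null is made up for this illustration.

```rust
use std::ptr;

// Hypothetical helper for this sketch: reads through a valid pointer and
// declines to read through a null one.
fn read_valid_and_null() -> (i32, Option<i32>) {
    let num = 5;
    let valid: *const i32 = &num;       // derived from a live reference
    let null: *const i32 = ptr::null(); // creating a null raw pointer is safe

    // Dereferencing is the unsafe step; `valid` points at `num`, which is alive.
    let a = unsafe { *valid };

    // Check for null first and only read when the pointer is non-null.
    let b = if null.is_null() {
        None
    } else {
        Some(unsafe { *null })
    };

    (a, b)
}

fn main() {
    let (a, b) = read_valid_and_null();
    println!("valid pointer read: {}, null pointer read: {:?}", a, b);
}
```

Note that is_null is an ordinary safe method; only the two dereferences need unsafe.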
With raw pointers we can create a mutable pointer and an immutable pointer to the same location at the same time. If we then modify the data through the mutable pointer, we can cause a data race, so be careful.
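To make the aliasing concrete, here is a small single-threaded sketch in which a write through a mutable raw pointer is observed through an immutable raw pointer to the same location. The function name write_then_read is hypothetical. Deriving both pointers from one mutable borrow is a choice made for this sketch, so that neither pointer is invalidated by creating the other.

```rust
// Hypothetical helper: returns the value seen through the immutable pointer
// before and after a write through the mutable pointer.
fn write_then_read() -> (i32, i32) {
    let mut num = 5;

    // Both pointers come from the same mutable borrow (an assumption of this sketch).
    let writer: *mut i32 = &mut num;
    let reader = writer as *const i32;

    unsafe {
        let before = *reader;
        *writer = 10; // the borrow checker does not track this write
        let after = *reader;
        (before, after)
    }
}

fn main() {
    let (before, after) = write_then_read();
    println!("before: {}, after: {}", before, after);
}
```

In a single thread this is well defined, but if the two pointers were used from different threads without synchronization, the same pattern would be a data race.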
With all of these dangers, why use raw pointers at all? One reason is to interface with C code. Another is to build safe abstractions that the borrow checker does not understand.
Calling an unsafe function or method
Unsafe functions and methods look exactly like regular functions and methods, except that they have the unsafe keyword in front of their definition. The keyword indicates that the function has requirements we must uphold when we call it, because Rust cannot verify them for us.
unsafe fn dangerous() {}

unsafe {
    dangerous();
}
An unsafe function must be called inside an unsafe block; otherwise the compiler reports an error:
  --> src/main.rs:14:5
   |
14 |     dangerous()
   |     ^^^^^^^^^^^ call to unsafe function
   |
   = note: consult the function's documentation for information on how to avoid undefined behavior
Wrapping the call in an unsafe block tells the compiler that we have read the function's documentation and take responsibility for upholding its contract. The body of an unsafe function is itself treated as an unsafe block, so no extra unsafe block is needed to perform other unsafe operations inside it.
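As an illustration of what "being responsible" looks like in practice, the sketch below documents the caller's obligation in a Safety section and upholds it at the call site. The function element_unchecked is a made-up example, not a standard library API. The inner unsafe block is optional in older editions and is written out here only to keep the unsafe operation visible.

```rust
/// Returns the element at `index` without a bounds check.
///
/// # Safety
/// `index` must be less than `data.len()`.
unsafe fn element_unchecked(data: &[i32], index: usize) -> i32 {
    // The caller promised that `index` is in bounds, so this read is valid.
    unsafe { *data.as_ptr().add(index) }
}

fn main() {
    let values = [10, 20, 30];

    // SAFETY: 1 < values.len(), so the contract above is upheld.
    let second = unsafe { element_unchecked(&values, 1) };
    println!("second = {}", second);
}
```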
Creating a safe abstraction over unsafe code
Just because a function contains unsafe code does not mean the whole function must be marked unsafe. Wrapping unsafe code in a safe function is a common abstraction.
fn main() {
    let mut v = vec![1, 2, 3, 4, 5, 6];
    let r = &mut v[..];

    // Split one mutable slice into two non-overlapping mutable slices.
    let (a, b) = r.split_at_mut(3);

    assert_eq!(a, &mut [1, 2, 3]);
    assert_eq!(b, &mut [4, 5, 6]);
}
Here we use the split_at_mut function to split one slice into two.
An implementation using only safe Rust might look like the following. For simplicity it is written only for slices of i32 rather than for a generic type T.
pub fn split_at_mut(slice: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = slice.len();
    assert!(mid <= len);

    // Rejected by the borrow checker: two mutable borrows of the same slice.
    (&mut slice[..mid], &mut slice[mid..])
}
error[E0499]: cannot borrow `*slice` as mutable more than once at a time
 --> src/main.rs:9:11
  |
4 |   pub fn split_at_mut(slice: &mut [i32],mid:usize) -> (&mut [i32],&mut [i32]) {
  |                              - let's call the lifetime of this reference `'1`
...
8 |       (&mut slice[..mid],
  |       -     ----- first mutable borrow occurs here
  |  _____|
  | |
9 | |      &mut slice[mid..])
  | |___________^^^^^_______- returning this value requires that `*slice` is borrowed for `'1`
  |             |
  |             second mutable borrow occurs here
The code borrows two different parts of the slice. This is fundamentally safe because the two parts do not overlap, but Rust is not smart enough to know that. It only sees that we borrow from the same slice twice, so it rejects the code. We have to use unsafe Rust to implement this.
use std::slice;

pub fn split_at_mut(slice: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = slice.len();
    let ptr = slice.as_mut_ptr();

    // Guarantees that both halves stay inside the original slice.
    assert!(mid <= len);

    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
Let's look at the details.
Because we need to borrow from the same slice twice (really two different parts of it), we use a raw pointer. as_mut_ptr returns a raw pointer to the slice's buffer; in the standard library it is essentially a cast from the mutable reference:
pub const fn as_mut_ptr(&mut self) -> *mut T {
    self as *mut [T] as *mut T
}
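Once we have a raw pointer, we can use it several times inside the unsafe block. So what does from_raw_parts_mut do? It builds a mutable slice from a raw pointer and a length. Calling it twice, once with ptr and mid and once with ptr.add(mid) and len - mid, gives us the two halves of the original slice.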
pub unsafe fn from_raw_parts_mut<'a, T>(data: *mut T, len: usize) -> &'a mut [T] {
    debug_assert!(is_aligned_and_not_null(data), "attempt to create unaligned or null slice");
    debug_assert!(
        mem::size_of::<T>().saturating_mul(len) <= isize::MAX as usize,
        "attempt to create slice covering at least half the address space"
    );
    unsafe { &mut *ptr::slice_from_raw_parts_mut(data, len) }
}
Note that we do not need to mark the resulting split_at_mut function as unsafe, and we can call it from safe Rust. We have created a safe abstraction over the unsafe code: the function uses unsafe code in a safe way, because it creates only valid raw pointers from data it has access to (its own parameters, which are valid), and the assertion guarantees that mid <= len.
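The i32 restriction in the version above is only for simplicity. As a sketch, the same body works unchanged for a generic element type T. The name split_at_mut_generic is chosen here to avoid confusion with the standard library method and is not part of any API.

```rust
use std::slice;

// Generic variant of the function above; only the element type changes.
fn split_at_mut_generic<T>(slice: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = slice.len();
    let ptr = slice.as_mut_ptr();

    // Without this check, the second slice could extend past the original one.
    assert!(mid <= len);

    // SAFETY: [0, mid) and [mid, len) are disjoint and both lie inside `slice`.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut letters = vec!['a', 'b', 'c', 'd'];
    let (left, right) = split_at_mut_generic(&mut letters, 1);

    // Both halves can be mutated at the same time because they do not overlap.
    left[0] = 'z';
    right[0] = 'y';

    println!("{:?}", letters);
}
```

Callers never write unsafe themselves; the assertion is what makes the safe signature honest.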
In contrast, the following use of slice::from_raw_parts_mut would likely crash when the slice is used:
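use std::slice;

fn main() {
    let address = 0x012345usize;
    // from_raw_parts_mut takes a *mut T, so cast to *mut i32 rather than *const i32.
    let r = address as *mut i32;

    // Undefined behavior: we do not own 10000 i32 values at this address.
    let slice: &[i32] = unsafe { slice::from_raw_parts_mut(r, 10000) };
}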
Creating a slice from an arbitrary memory address
We do not own the memory at this arbitrary address, and there is no guarantee that the slice this code creates contains valid i32 values. Attempting to use the slice as though it were valid results in undefined behavior.
Using extern functions to call external code
Rust's extern keyword lets Rust code interact with code written in other languages by creating and using a Foreign Function Interface (FFI). An FFI is a way for a programming language to define functions that a different (foreign) programming language can call.
extern "C" {
    fn abs(input: i32) -> i32;
}

fn main() {
    unsafe {
        println!("Absolute value of -3 according to C: {}", abs(-3));
    }
}
Declaring and calling an extern function defined in another language
Functions declared within extern blocks are always unsafe to call from Rust code, because other languages do not enforce Rust's rules and Rust cannot check them, so it is the programmer's responsibility to ensure safety.
Within the extern "C" block, we list the names and signatures of the functions from another language that we want to call. The "C" part defines which application binary interface (ABI) the external function uses; the ABI defines how to call the function at the assembly level. The "C" ABI is the most common and follows the C programming language's ABI.
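A common way to contain FFI calls is to wrap each one in a small safe function that prepares valid arguments. The sketch below does this for the C library's strlen, using std::ffi::CString to build a NUL-terminated string. The wrapper name c_string_len is hypothetical, and the sketch assumes the platform's C library is linked by default and that size_t matches usize. It uses the same extern "C" syntax as the example above; the 2024 edition requires writing unsafe extern "C" instead.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Declaration of the C function; the signature is our promise to the compiler.
// (In the 2024 edition this block must be written `unsafe extern "C"`.)
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

// Hypothetical safe wrapper: returns None if `text` contains an interior NUL byte.
fn c_string_len(text: &str) -> Option<usize> {
    let c_text = CString::new(text).ok()?;

    // SAFETY: `c_text` is a valid NUL-terminated string that outlives the call.
    Some(unsafe { strlen(c_text.as_ptr()) })
}

fn main() {
    println!("strlen says: {:?}", c_string_len("hello"));
}
```

The unsafe block stays inside the wrapper, so callers only see an ordinary safe function.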
Calling Rust functions from other languages
We can also use extern to create an interface that allows other languages to call Rust functions.
Instead of an extern block, we add the extern keyword just before the fn keyword and specify the ABI to use. We also need to add the #[no_mangle] annotation to tell the Rust compiler not to mangle the name of this function. Mangling is when a compiler changes the name we gave a function to a different name that carries more information for other parts of the compilation process but is less human-readable. Every language's compiler mangles names slightly differently, so for a Rust function to be nameable by other languages, we must disable the Rust compiler's name mangling.
In the following example, the call_from_c function becomes accessible from C code once it is compiled to a shared library and linked from C.
#[no_mangle]
pub extern "C" fn call_from_c() {
    println!("Just called a Rust function from C!");
}
This use of extern does not require unsafe.
Accessing or modifying a mutable static variable
Global variables are called static variables in Rust. Rust supports them, but they can be problematic with Rust's ownership rules: if two threads access the same mutable global variable, a data race can result.
static HELLO_WORLD: &str = "Hello, world!";

fn main() {
    println!("name is: {}", HELLO_WORLD);
}
Defining and using a global variable
Static variables are similar to constants. By convention their names are written in uppercase with underscores between words. Accessing an immutable static variable is safe. One difference from constants is that the value of a static variable has a fixed address in memory, whereas a constant may have its data duplicated wherever it is used. Another difference is that static variables can be mutable. Accessing or modifying a mutable static variable is unsafe.
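static mut COUNTER: u32 = 0;

fn add_to_count(inc: u32) {
    // Writing to a mutable static requires unsafe.
    unsafe {
        COUNTER += inc;
    }
}

fn main() {
    add_to_count(3);

    // Reading a mutable static requires unsafe as well.
    unsafe {
        println!("COUNTER:{}", COUNTER);
    }
}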
Because of this data race problem, prefer concurrency techniques and thread-safe smart pointers where possible, so that the compiler can check that data accessed from different threads is accessed safely.
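One thread-safe alternative, shown here as a sketch rather than the only option, is to replace static mut with an atomic from std::sync::atomic. The helper count_from_threads is made up for this illustration; it reports how much the counter grew while its threads ran.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// An immutable static holding an atomic: no unsafe is needed to update it.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn add_to_count(inc: u32) {
    COUNTER.fetch_add(inc, Ordering::SeqCst);
}

// Hypothetical helper: spawns `threads` threads that each add `inc`,
// then returns the increase in COUNTER.
fn count_from_threads(threads: u32, inc: u32) -> u32 {
    let before = COUNTER.load(Ordering::SeqCst);

    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(move || add_to_count(inc)))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }

    COUNTER.load(Ordering::SeqCst).wrapping_sub(before)
}

fn main() {
    println!("COUNTER grew by {}", count_from_threads(4, 3));
}
```

Because every update goes through fetch_add, concurrent increments cannot be lost, and the code contains no unsafe block.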
Implementing an unsafe trait
A trait is unsafe when at least one of its methods has some invariant that the compiler cannot verify. We declare a trait unsafe by adding the unsafe keyword before trait, and any implementation of the trait must be marked unsafe as well.
unsafe trait Foo {
    // methods go here
}

unsafe impl Foo for i32 {
    // method implementations go here
}
Defining and implementing an unsafe trait
As discussed in the section "Extensible Concurrency with the Sync and Send Traits" in Chapter 16, the compiler implements Send and Sync automatically for types composed entirely of Send and Sync types. If we implement a type that contains a type that is not Send or Sync, such as a raw pointer, and we want to mark that type as Send or Sync, we must use unsafe.
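As a sketch of that last case, the wrapper below holds a raw pointer, so the compiler does not implement Send for it automatically. We assert Send ourselves, on the grounds that the wrapper is the sole owner of its heap allocation. The type OwnedPtr and the function double_on_other_thread are invented for this example. The sketch leaks the allocation if into_box is never called.

```rust
use std::thread;

// Hypothetical wrapper that owns a heap allocation through a raw pointer.
struct OwnedPtr(*mut i32);

// SAFETY (our claim, not the compiler's): OwnedPtr is the only owner of its
// allocation, so moving it to another thread cannot create shared access.
unsafe impl Send for OwnedPtr {}

impl OwnedPtr {
    fn new(value: i32) -> Self {
        OwnedPtr(Box::into_raw(Box::new(value)))
    }

    fn into_box(self) -> Box<i32> {
        // SAFETY: the pointer came from Box::into_raw in `new`, and `self` is
        // consumed here, so the allocation is reclaimed exactly once.
        unsafe { Box::from_raw(self.0) }
    }
}

fn double_on_other_thread(value: i32) -> i32 {
    let owned = OwnedPtr::new(value);

    // Moving `owned` into the closure compiles only because of the unsafe impl.
    thread::spawn(move || *owned.into_box() * 2)
        .join()
        .unwrap()
}

fn main() {
    println!("{}", double_on_other_thread(21));
}
```

If the unsafe impl line is removed, thread::spawn rejects the closure because *mut i32 is not Send.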
Accessing fields of a union
A union is similar to a struct, but only one of its declared fields is in use in a particular instance at any one time.
Unions are primarily used to interface with unions in C code. Accessing a union's fields is unsafe because Rust cannot guarantee the type of the data currently stored in the union instance. See the Rust Reference for more information.
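A minimal sketch of a union follows. The type IntOrFloat and the function float_bits are made up for this illustration. The union stores an f32 and then reads the same four bytes back as a u32, which is sound here because both fields have the same size and every bit pattern is a valid u32.

```rust
// Hypothetical C-compatible union with two views of the same four bytes.
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

fn float_bits(x: f32) -> u32 {
    // Writing a field (here via initialization) is safe.
    let value = IntOrFloat { f: x };

    // Reading a field is unsafe: the compiler does not know which field is active.
    // SAFETY: both fields are 4 bytes and any bit pattern is a valid u32.
    unsafe { value.i }
}

fn main() {
    println!("bits of 1.0f32: {:#x}", float_bits(1.0));
}
```

In ordinary Rust code the safe method f32::to_bits does the same job; the union is used here only to show the field access rules.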
When to use unsafe code
Use unsafe when it is necessary and when we can make sure ourselves that the code is safe, because in these cases the compiler cannot help uphold memory safety.