eBPF is a complex and powerful technology provided by modern Linux kernels. It makes the kernel programmable, so it is no longer a complete black box. As eBPF has matured, it has been applied more and more widely. This article shows how to use eBPF to trace the underlying code of Node.js.
Introduction
Although the design idea behind eBPF is simple, implementing and using it is complex. eBPF is essentially a virtual machine inside the kernel: users write a program (typically in C, compiled to eBPF bytecode), load it into the kernel, and have it run as part of the kernel's processing logic. On the implementation side, the kernel must rigorously verify the loaded code to ensure safety, implement the virtual machine that executes it, and add the logic that supports the eBPF mechanism to the kernel code. On the usage side, the cost of writing eBPF code is high: we need to set up the environment, learn how to compile eBPF programs, and understand some Linux kernel internals. After years of development, this situation has improved considerably. There are many introductions to eBPF online, so this article does not go into further detail.
Usage
Let's look at how to write an eBPF program based on libbpf. An eBPF program has two parts. The first part is the eBPF code itself, hello.bpf.c:
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tracepoint/syscalls/sys_enter_execve")
int handle_tp(void *ctx)
{
    int pid = bpf_get_current_pid_tgid() >> 32;
    char fmt[] = "BPF triggered from PID %d.\n";

    bpf_trace_printk(fmt, sizeof(fmt), pid);
    return 0;
}

char LICENSE[] SEC("license") = "Dual BSD/GPL";
The above is the code that is loaded into the kernel and executed. It uses the kernel's tracepoint mechanism to attach a hook to the sys_enter_execve tracepoint; each time the execve system call is entered, the hook function runs. The other part is the code responsible for loading the eBPF code into the kernel, hello.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/resource.h>
#include <bpf/libbpf.h>
#include "hello.skel.h"

int main(int argc, char **argv)
{
    struct hello_bpf *skel;
    int err;

    /* Open BPF application */
    skel = hello_bpf__open();
    if (!skel) {
        fprintf(stderr, "Failed to open BPF skeleton\n");
        return 1;
    }

    /* Load & verify BPF programs */
    err = hello_bpf__load(skel);
    if (err)
        goto cleanup;

    /* Attach tracepoint handler */
    err = hello_bpf__attach(skel);
    if (err)
        goto cleanup;

    printf("Hello BPF started, hit Ctrl+C to stop!\n");

    /* Print the trace output (helper not listed here) */
    read_trace_pipe();

cleanup:
    hello_bpf__destroy(skel);
    return -err;
}
Only the core code is listed here. The logic of hello.c is simple: open the eBPF program, load it into the kernel, attach it, and then read the output of the eBPF program. Every eBPF program follows this overall flow; what changes is what we want to observe, and therefore the code we write. Finally, when tracing is no longer needed, the eBPF program can be destroyed.
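The read_trace_pipe() helper called in hello.c is not part of the listing. As a rough sketch of what such a helper might do, the code below copies lines from the kernel's trace pipe to standard output. The function names copy_lines and read_trace_pipe_from are hypothetical, and the path assumes debugfs is mounted at /sys/kernel/debug (some systems expose the same file at /sys/kernel/tracing/trace_pipe).

```c
#include <assert.h>
#include <stdio.h>

/* Assumed location of the trace pipe; it depends on where tracefs/debugfs
 * is mounted on the machine. */
#define TRACE_PIPE_PATH "/sys/kernel/debug/tracing/trace_pipe"

/* Copy text from `in` to `out` until EOF. Returns the number of chunks
 * read, which is one per line for lines shorter than the buffer. */
long copy_lines(FILE *in, FILE *out)
{
    char line[4096];
    long count = 0;

    assert(in != NULL && out != NULL);
    while (fgets(line, sizeof(line), in)) {
        fputs(line, out);
        fflush(out); /* show each trace line as soon as it arrives */
        count++;
    }
    return count;
}

/* Hypothetical stand-in for read_trace_pipe(): opens `path` and streams it
 * to stdout. Reading the real trace pipe blocks until data arrives and
 * usually requires root. Returns -1 if the file cannot be opened. */
long read_trace_pipe_from(const char *path)
{
    FILE *pipe = fopen(path, "r");
    long count;

    if (!pipe) {
        perror("fopen trace pipe");
        return -1;
    }
    count = copy_lines(pipe, stdout);
    fclose(pipe);
    return count;
}
```

In a loader like hello.c, this would be called as read_trace_pipe_from(TRACE_PIPE_PATH) after the attach step.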
Application
Before eBPF, the kernel was a black box to us; with eBPF it is much more transparent. But software is layered, and we usually do not deal with the kernel directly. We care more about the software above it. Specifically, when we use Node.js, we need to care about Node.js itself in addition to our business code. Yet Node.js is also a black box: we do not know what it has done or what state it is in at a given moment, which makes it hard to troubleshoot problems or understand how the system is running. With eBPF we can do more. The Linux kernel provides several tracing mechanisms, including uprobe, which dynamically traces application code. For example, if we want to know when the uv_tcp_listen function in libuv (the library underlying Node.js) is called, we can find out through eBPF. With this ability, we can collect more data and information about the system.
Implementation
Tracing application-layer code uses uprobe, which is more involved than kprobe. kprobe traces kernel functions: because the kernel knows the virtual addresses of its own functions, we only need to give it the function name. uprobe traces application-layer code, and the kernel does not know (and should not need to know) the address of a function inside a user program, so the application layer has to resolve it. Let's look at the implementation, starting with uprobe.bpf.c:
#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "uv.h"

char LICENSE[] SEC("license") = "Dual BSD/GPL";

SEC("uprobe/uv_tcp_listen")
int BPF_KPROBE(uprobe, uv_tcp_t* tcp, int backlog, uv_connection_cb cb)
{
    bpf_printk("uv_tcp_listen start %d \n", backlog);
    return 0;
}

SEC("uretprobe/uv_tcp_listen")
int BPF_KRETPROBE(uretprobe, int ret)
{
    bpf_printk("uv_tcp_listen end %d \n", ret);
    return 0;
}
Here we trace libuv's uv_tcp_listen function at two points: the start and the end of its execution. With the eBPF program defined, let's look at how to load it into the kernel, in uprobe.c:
int main(int argc, char **argv)
{
    struct uprobe_bpf *skel;
    long base_addr, uprobe_offset;
    int err = 0, i;

    // Executable to trace
    char execpath[50] = "/usr/bin/node";
    char *func = "uv_tcp_listen";

    // Calculate the offset of the function within the executable file
    uprobe_offset = get_elf_func_offset(execpath, func);

    /* Load and verify BPF application */
    skel = uprobe_bpf__open_and_load();
    if (!skel) {
        fprintf(stderr, "Failed to open and load BPF skeleton\n");
        return 1;
    }

    /* Attach the uprobe at the function entry */
    skel->links.uprobe = bpf_program__attach_uprobe(skel->progs.uprobe,
                                                    false /* not uretprobe */,
                                                    -1 /* any pid */,
                                                    execpath,
                                                    uprobe_offset);

    /* Attach the uretprobe; it is registered with the same entry offset */
    skel->links.uretprobe = bpf_program__attach_uprobe(skel->progs.uretprobe,
                                                       true /* uretprobe */,
                                                       -1 /* any pid */,
                                                       execpath,
                                                       uprobe_offset);

    // ...

cleanup:
    uprobe_bpf__destroy(skel);
    return -err;
}
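The get_elf_func_offset helper used above is not listed. The following is a minimal sketch of one way such a lookup could work, not necessarily how the repository implements it. It assumes a well-formed 64-bit ELF file on Linux, searches the symbol table for a function by name, and converts the symbol's virtual address into a file offset through the section that contains it. The helper lookup_in_table is a hypothetical name, and the code does no bounds checking against corrupt files.

```c
#include <assert.h>
#include <elf.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Search one symbol table section for `func`.
 * Returns the function's file offset, or -1 if it is not found. */
static long lookup_in_table(const unsigned char *base, const Elf64_Ehdr *eh,
                            const Elf64_Shdr *sh, const Elf64_Shdr *tab,
                            const char *func)
{
    const Elf64_Sym *syms = (const Elf64_Sym *)(base + tab->sh_offset);
    /* sh_link of a symbol table points at its string table section. */
    const char *names = (const char *)(base + sh[tab->sh_link].sh_offset);
    size_t n = tab->sh_size / sizeof(Elf64_Sym);

    for (size_t i = 0; i < n; i++) {
        const Elf64_Sym *s = &syms[i];

        if (ELF64_ST_TYPE(s->st_info) != STT_FUNC)
            continue;
        if (s->st_shndx == SHN_UNDEF || s->st_shndx >= eh->e_shnum)
            continue; /* imported or special symbol, not defined here */
        if (strcmp(names + s->st_name, func) != 0)
            continue;

        /* Virtual address -> file offset via the containing section. */
        const Elf64_Shdr *sec = &sh[s->st_shndx];
        return (long)(s->st_value - sec->sh_addr + sec->sh_offset);
    }
    return -1;
}

/* Sketch of get_elf_func_offset(): returns the file offset of `func`
 * inside the ELF file at `path`, or -1 on any failure. */
long get_elf_func_offset(const char *path, const char *func)
{
    long result = -1;
    struct stat st;
    int fd;

    assert(path != NULL && func != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size < (off_t)sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }
    unsigned char *base = mmap(NULL, (size_t)st.st_size, PROT_READ,
                               MAP_PRIVATE, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64 && eh->e_shoff != 0) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);
        /* Prefer the full symbol table, then fall back to the dynamic one. */
        Elf64_Word types[] = { SHT_SYMTAB, SHT_DYNSYM };

        for (int t = 0; t < 2 && result < 0; t++)
            for (int i = 0; i < eh->e_shnum && result < 0; i++)
                if (sh[i].sh_type == types[t])
                    result = lookup_in_table(base, eh, sh, &sh[i], func);
    }
    munmap(base, (size_t)st.st_size);
    return result;
}
```

The lookup only succeeds if the binary still carries the symbol, so a fully stripped executable would need a different approach.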
uprobe.c focuses on calculating the location of a function within an executable file. This is derived from the ELF file, the executable format produced by compilation, which records metadata about the executable (you can view it with readelf -Ws exec_file). For example, the symbol table records function information. Once we have this information, we can set up the uprobe and uretprobe. With the eBPF code above, we can trace calls to uv_tcp_listen, and in the same way any other function we are interested in. Besides uprobe, we can also use the kernel's kprobe to monitor kernel functions. For example, the following eBPF code traces the execve system call, which is invoked when a process runs a new program.
SEC("kprobe/__x64_sys_execve")
int BPF_KPROBE(__x64_sys_execve)
{
    pid_t pid;

    pid = bpf_get_current_pid_tgid() >> 32;
    bpf_printk("KPROBE ENTRY pid = %d", pid);
    return 0;
}

SEC("kretprobe/__x64_sys_execve")
int BPF_KRETPROBE(__x64_sys_execve_exit)
{
    pid_t pid;

    pid = bpf_get_current_pid_tgid() >> 32;
    bpf_printk("KPROBE EXIT: pid = %d\n", pid);
    return 0;
}
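Both the tracepoint and kprobe examples shift the result of bpf_get_current_pid_tgid() right by 32 bits. The helper returns one 64-bit value with the thread group ID (what user space calls the PID) in the upper half and the thread ID in the lower half. The plain C sketch below mirrors that packing in user space to show what the shift extracts; the function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack IDs the way bpf_get_current_pid_tgid() reports them:
 * thread group ID in the upper 32 bits, thread ID in the lower 32 bits. */
uint64_t pack_pid_tgid(uint32_t tgid, uint32_t tid)
{
    return ((uint64_t)tgid << 32) | tid;
}

/* Upper half: the process ID as seen from user space. */
uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower half: the ID of the individual thread. */
uint32_t tid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}

/* Usage example: thread 5678 of process 1234 triggered the probe. */
void pid_tgid_example(void)
{
    uint64_t v = pack_pid_tgid(1234, 5678);

    assert(tgid_of(v) == 1234);
    assert(tid_of(v) == 5678);
}
```

For a single-threaded process the two halves are equal, which is why the distinction is easy to overlook.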
Summary
This article briefly introduced eBPF and its application to Node.js, but it is only a simple example. There is still much to do, such as whether it can be combined with addons, how to support dynamic capabilities, and so on. In addition, because C++ function names are changed (mangled) during compilation, we may not be able to find a function's address by its source-level name. There is still a lot to study here. In general, eBPF is valuable not only for Node.js but equally for other application-layer software. It is a technical direction worth exploring.
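To make the C++ naming problem concrete, here is a rough sketch of how a lookup might match mangled symbols. It assumes the Itanium C++ ABI naming scheme used by GCC and Clang on Linux and handles only plain names and namespace or class qualified names; templates, operators, constructors, const member functions, and the abbreviations the ABI uses for std are not covered. The functions mangled_prefix and symbol_matches are hypothetical, and the symbol _ZN4node5StartEiPPc is only an illustration of the format. C functions such as uv_tcp_listen are not mangled, so they need none of this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the mangled-name prefix for a function name under the Itanium
 * C++ ABI, e.g. "node::Start" -> "_ZN4node5StartE", "foo" -> "_Z3foo".
 * The encoded parameter types that follow the prefix are not generated.
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
int mangled_prefix(const char *qualified, char *out, size_t out_len)
{
    assert(qualified != NULL && out != NULL);
    if (*qualified == '\0')
        return -1;

    int nested = strstr(qualified, "::") != NULL;
    int n = snprintf(out, out_len, "%s", nested ? "_ZN" : "_Z");
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    size_t used = (size_t)n;

    const char *p = qualified;
    while (*p) {
        const char *sep = strstr(p, "::");
        size_t len = sep ? (size_t)(sep - p) : strlen(p);

        if (len == 0)
            return -1; /* empty component such as "::foo" or "a::::b" */
        /* Each component is written as <length><name>. */
        n = snprintf(out + used, out_len - used, "%zu%.*s", len, (int)len, p);
        if (n < 0 || (size_t)n >= out_len - used)
            return -1;
        used += (size_t)n;

        if (sep) {
            p = sep + 2;
            if (*p == '\0')
                return -1; /* trailing "::" */
        } else {
            p += len;
        }
    }
    if (nested) {
        if (used + 2 > out_len)
            return -1;
        out[used++] = 'E'; /* closes the nested name */
        out[used] = '\0';
    }
    return 0;
}

/* True if `symbol` (as printed by readelf) starts with the mangled prefix
 * of `qualified`, that is, it is some overload of that function. */
int symbol_matches(const char *symbol, const char *qualified)
{
    char prefix[256];

    if (mangled_prefix(qualified, prefix, sizeof(prefix)) != 0)
        return 0;
    return strncmp(symbol, prefix, strlen(prefix)) == 0;
}
```

A symbol-table scan could call symbol_matches on each entry instead of comparing names with strcmp, at the cost of possibly matching several overloads of the same function.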
Code repository:
https://github.com/theanarkh/libbpf-code