Tracing with BPF

Posted by romzer on Mon, 07 Mar 2022 17:56:48 +0100


Source: Linux Observability with BPF

My notes on the third chapter of the book above: Tracing with BPF

Recommended reading: eBPF article translation (2) -- BCC Introduction (with experimental environment)

With BPF you can see what is happening anywhere in your system. Cool, isn't it? The manga Cells at Work! shows what the cells in the body go through, and the two seem to have something in common.


Depending on where it runs, code can be divided into kernel-space code and user-space code.

  1. There are two ways to instrument code in kernel space:

    • kprobes: a probe program inserted before a kernel function executes.
    • kretprobes: a probe program that runs after a kernel function returns.

    However, kernel interfaces may change, so a probe that works today may fail on the next kernel version. The kernel therefore provides tracepoints: static markers in the kernel code that can be used to attach code to a running kernel. The main difference from k(ret)probes is that kernel developers write tracepoints into the kernel when they make changes to it, which is why they are called static. All available tracepoints are listed in /sys/kernel/debug/tracing/events. [On kernel 5.4.0-70 there is no bpf directory there.]

  2. There are two ways to instrument code in user space:

    • uprobes: a probe program inserted before a user-space function executes.
    • uretprobes: a probe program that runs after a user-space function returns.

    Similarly, user-space interfaces may change and break the probe code. Programmers can therefore insert tracepoints into user-space code when they write it. Whether those tracepoints are enabled can then be controlled from outside the program.
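The tracepoint list in /sys/kernel/debug/tracing/events can also be read programmatically. Each category is a directory, and each tracepoint is a subdirectory containing a "format" file. The sketch below turns that layout into the "category:event" strings that BCC's attach_tracepoint(tp=...) expects.

The helper name list_tracepoints is hypothetical. Reading the real directory usually requires root, and on some systems tracefs is also mounted at /sys/kernel/tracing.

```python
import os
from typing import List

# Events directory used in this post (an assumption; tracefs may live elsewhere).
DEFAULT_EVENTS_DIR = "/sys/kernel/debug/tracing/events"


def list_tracepoints(events_dir: str = DEFAULT_EVENTS_DIR) -> List[str]:
    """Return tracepoints as sorted "category:event" strings."""
    tracepoints = []
    for category in sorted(os.listdir(events_dir)):
        category_dir = os.path.join(events_dir, category)
        if not os.path.isdir(category_dir):
            continue  # skip control files such as "enable" or "header_page"
        for event in sorted(os.listdir(category_dir)):
            # A tracepoint directory contains a "format" file describing its fields;
            # per-category control files like "enable" do not.
            if os.path.isfile(os.path.join(category_dir, event, "format")):
                tracepoints.append(f"{category}:{event}")
    return tracepoints
```

For example, list_tracepoints() on a typical system would include entries such as "sched:sched_switch".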

Tracing with BPF

In software engineering, tracing is a method of collecting data for analysis and debugging. The goal is to collect information at run time that will be useful for later analysis. The main advantage of using BPF for tracing is that you can access almost any information in the Linux kernel and in your applications. Compared with other tracing technologies, BPF adds the least overhead to the performance and latency of the system, and it does not require developers to modify their applications just to collect data from them.

Starting with this chapter, we will write BPF programs with a powerful toolkit, the BPF Compiler Collection (BCC). BCC is a set of components that make building BPF programs more predictable. Even if you have mastered Clang and LLVM, you probably don't want to spend more time than necessary building the same utilities and making sure the BPF verifier doesn't reject your programs. BCC provides reusable components for common structures, such as Perf event maps, and integrates with the LLVM backend to provide better debugging options. BCC also includes bindings for several programming languages; we will use Python in the examples. These bindings let you write the user-space part of a BPF program in a high-level language, which results in more useful programs.

I'm not yet familiar with BCC and haven't read the documentation in its repository, so the following only covers basic BCC usage.

For now, a simple installation is enough. The tool and its installation involve many details, and I'll write a separate post about them later.

# sudo apt upgrade
sudo apt install bpfcc-tools linux-headers-$(uname -r)


Kprobes

Kprobes allow you to insert BPF programs before any kernel instruction is executed.

The following program prints the name of the current command every time the execve system call runs.

The program is saved as example.py:

from bcc import BPF

bpf_source = """
#include <uapi/linux/ptrace.h>

int do_sys_execve(struct pt_regs *ctx) {
    char comm[16];
    bpf_get_current_comm(&comm, sizeof(comm));
    bpf_trace_printk("executing program: %s\\n", comm);
    return 0;
}
"""

bpf = BPF(text=bpf_source)  # Load the BPF program into the kernel.

# Attach the program to the execve syscall.
# The kernel symbol for execve differs between kernel versions; BCC's
# get_syscall_fnname() returns the right name for the running kernel.
execve_function = bpf.get_syscall_fnname("execve")
bpf.attach_kprobe(event=execve_function, fn_name="do_sys_execve")

# Print the lines the BPF program writes to the kernel trace pipe.
bpf.trace_print()

Run the program and look at the output:

➜  sudo python3 example.py
b' sogoupinyinServ-9315    [002] ....  4785.168205: 0: executing program: sogoupinyinServ
zsh-9320    [003] ....  4785.419445: 0: executing program: zsh
env-9370    [001] ....  4786.090141: 0: executing program: env

Take the line zsh-9320 [003] .... 4785.419445: 0: executing program: zsh as an example. My analysis follows eBPF article translation (2) -- BCC Introduction (with experimental environment). I haven't yet found the official documentation that describes this format.

  • zsh is the name of the application that triggered execve
  • 9320 is the PID of this application
  • [003] is the CPU it ran on (CPU 3, counting from 0)
  • .... is the flags field (irqs-off, need-resched, hardirq/softirq, preempt-depth)
  • 4785.419445 is the event timestamp, in seconds
  • executing program: zsh is what we printed
  • I'm not sure what the 0: field means.
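The fields listed above can be split out of a trace line with a regular expression. BCC's own BPF.trace_fields() returns a similar (task, pid, cpu, flags, ts, msg) tuple.

The parser below is a hypothetical stand-alone sketch for working with saved output. It assumes the default trace_pipe layout shown in this post.

```python
import re
from typing import NamedTuple, Optional


class TraceLine(NamedTuple):
    task: str   # command name of the task that hit the probe
    pid: int    # number after the last "-" of the task column
    cpu: int    # CPU number, counted from 0
    flags: str  # irqs-off / need-resched / hardirq-softirq / preempt-depth
    ts: float   # event timestamp in seconds
    msg: str    # text passed to bpf_trace_printk()


# Layout: TASK-PID [CPU] FLAGS TIMESTAMP: <field>: MESSAGE
# The non-greedy task group stops at the first "-<digits> [" so task names
# containing "-" (such as "hello-bpf") stay intact.
_TRACE_RE = re.compile(
    r"^\s*(?P<task>.+?)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+\S+:\s+(?P<msg>.*)$"
)


def parse_trace_line(line: str) -> Optional[TraceLine]:
    """Parse one trace_pipe line; return None if it doesn't match the layout."""
    m = _TRACE_RE.match(line)
    if m is None:
        return None
    return TraceLine(m["task"], int(m["pid"]), int(m["cpu"]),
                     m["flags"], float(m["ts"]), m["msg"])
```

For the zsh line above, this yields task "zsh", pid 9320, cpu 3 and the message "executing program: zsh".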


Kretprobes

A kretprobe is invoked when a kernel function returns.

The code is similar to the previous example:

from bcc import BPF

# A kretprobe runs when the kernel function returns,
# so the function's return value is available.
bpf_source = """
#include <uapi/linux/ptrace.h>

int ret_sys_execve(struct pt_regs *ctx) {
  int return_value;
  char comm[16];
  bpf_get_current_comm(&comm, sizeof(comm));
  return_value = PT_REGS_RC(ctx);  // read the return value from the registers

  bpf_trace_printk("program: %s, return: %d\\n", comm, return_value);
  return 0;
}
"""

bpf = BPF(text=bpf_source)
execve_function = bpf.get_syscall_fnname("execve")
bpf.attach_kretprobe(event=execve_function, fn_name="ret_sys_execve")
bpf.trace_print()
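PT_REGS_RC(ctx) in the kretprobe gives the raw return value of the kernel function. Linux system calls report failure by returning a negative errno, so a printed return value of -2 would mean ENOENT.

The hypothetical helper below is a small sketch for interpreting those numbers in user space. It uses only the standard errno and os modules.

```python
import errno
import os


def describe_syscall_return(rc: int) -> str:
    """Interpret a raw syscall return value such as the one read by PT_REGS_RC."""
    if rc >= 0:
        return "success" if rc == 0 else f"ok ({rc})"
    code = -rc  # failures are reported as -errno
    name = errno.errorcode.get(code, f"errno {code}")
    return f"{name}: {os.strerror(code)}"
```

For example, describe_syscall_return(-2) returns "ENOENT: No such file or directory" on Linux.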


Tracepoints

From the kernel documentation Event Tracing: tracepoints can be used without creating a custom kernel module to register probe functions, by using the event tracing infrastructure.

From Notes on Analysing Behaviour Using Events and Tracepoints: tracepoints can also be combined with each other. All available tracepoints are listed in /sys/kernel/debug/tracing/events.

The book's sample code is a BPF program that traces every application on the system that loads other BPF programs:

from bcc import BPF

bpf_source = """
int trace_bpf_prog_load(void *ctx) {
  char comm[16];
  bpf_get_current_comm(&comm, sizeof(comm));

  bpf_trace_printk("%s is loading a BPF program\\n", comm);
  return 0;
}
"""

bpf = BPF(text=bpf_source)
bpf.attach_tracepoint(tp="bpf:bpf_prog_load", fn_name="trace_bpf_prog_load")
bpf.trace_print()

There is no bpf directory in my events directory, so this program can't run. I wondered whether I could add it myself, so I looked through the two links above, but I didn't find an answer. For now, I don't know how to solve this problem.
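A program can at least check whether a tracepoint exists before attaching to it, and fail with a clear message instead. As far as I know, the bpf:* tracepoints were removed from the mainline kernel around 4.18, which would explain the missing directory on 5.4. Treat that as an assumption to verify.

I believe recent BCC versions also provide a BPF.tracepoint_exists(category, event) helper, but the sketch below does not depend on it. The function name tracepoint_exists and the directory path are assumptions for illustration.

```python
import os

# Assumed events directory, same as used elsewhere in this post.
EVENTS_DIR = "/sys/kernel/debug/tracing/events"


def tracepoint_exists(tp: str, events_dir: str = EVENTS_DIR) -> bool:
    """Check whether a "category:event" tracepoint exists on this kernel."""
    category, sep, event = tp.partition(":")
    if not sep or not category or not event:
        raise ValueError(f"expected 'category:event', got {tp!r}")
    # Each tracepoint is a directory: <events_dir>/<category>/<event>
    return os.path.isdir(os.path.join(events_dir, category, event))
```

In the example above, one could call tracepoint_exists("bpf:bpf_prog_load") first and print an explanation when it returns False.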


Uprobes

Uprobes are probes on user-space functions. In BCC they are attached with attach_uprobe().

The book's example probes the main function of a Go program.

First install Go (see Download and install, and Tutorial: Get started with Go):

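# Assumes go1.16.3.linux-amd64.tar.gz has already been downloaded to ~
cd ~
sudo rm -rf /usr/local/go
sudo tar -C /usr/local -xzf go1.16.3.linux-amd64.tar.gz

# Single quotes keep $PATH from being expanded when the line is written
echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.profile
source ~/.profile

go version
rm -f ~/go1.16.3.linux-amd64.tar.gz
# Create a go.mod file to track the code's dependencies
mkdir -p uprobes
cd uprobes
go mod init uprobes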

Then write a simple Go program. We will probe its main function with BPF.

package main // Declare a main package

import "fmt" // Import the popular fmt package

func main() { // Implement a main function to print a message to the console
    fmt.Println("Hello, BPF")
}

Save it as main.go, then run or build it:

# go run .

# Compile to generate the executable
go build -o hello-bpf main.go

If the build reports the fatal error "sys/sdt.h: No such file or directory", install systemtap-sdt-dev:

sudo apt-get install systemtap-sdt-dev


The package includes header files and an executable (dtrace) that can be used to add static probes to user-space applications.

Next, we write a BPF program to probe the main function of the Go program above.

from bcc import BPF

bpf_source = """
int trace_go_main(struct pt_regs *ctx) {
  // The upper 32 bits of bpf_get_current_pid_tgid() hold the process ID.
  u32 pid = bpf_get_current_pid_tgid() >> 32;
  bpf_trace_printk("New hello-bpf process running with PID: %d\\n", pid);
  return 0;
}
"""

bpf = BPF(text=bpf_source)
bpf.attach_uprobe(name="./hello-bpf", sym="main.main", fn_name="trace_go_main")
bpf.trace_print()
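bpf_get_current_pid_tgid() returns one 64-bit value that packs two IDs. The upper 32 bits hold the thread group ID, which is what user space calls the PID. The lower 32 bits hold the ID of the current thread. This is why BPF code commonly shifts the value right by 32 to get the process ID.

The hypothetical Python helper below shows the same split on the user-space side.

```python
from typing import Tuple


def split_pid_tgid(pid_tgid: int) -> Tuple[int, int]:
    """Split the u64 from bpf_get_current_pid_tgid() into (tgid, tid)."""
    tgid = pid_tgid >> 32        # upper 32 bits: process (thread group) ID
    tid = pid_tgid & 0xFFFFFFFF  # lower 32 bits: thread ID
    return tgid, tid
```

For a single-threaded process, both halves are equal.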

Let's take a brief look at the parameters of attach_uprobe:

  • name="./hello-bpf": the path of the binary to probe. Libraries can be probed too; see the official documentation for details.

  • sym="main.main": the symbol (function name) to probe, which BCC resolves to an address in the binary.

    You can list the symbols of the compiled Go binary with nm:

    ➜  uprobes git:(master) ✗ nm hello-bpf | grep main
    0000000000535ec0 D main..inittask
    0000000000497660 T main.main
    0000000000434d20 T runtime.main
    000000000045e460 T runtime.main.func1
    000000000045e4c0 T runtime.main.func2
    000000000054ab50 B runtime.main_init_done
    00000000004d8828 R runtime.mainPC
    0000000000578210 B runtime.mainStarted
  • fn_name="trace_go_main": the name of the BPF function to run when the probe fires.
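sym is looked up by name, so the nm listing above is a handy way to check candidates. Symbols of type T (or t) live in the text (code) section, which is where functions like main.main are. Data symbols such as B runtime.main_init_done are not functions.

The helpers parse_nm and is_function are hypothetical sketches for filtering that output. They assume nm's default three-column format.

```python
from typing import Dict, Tuple


def parse_nm(output: str) -> Dict[str, Tuple[int, str]]:
    """Map symbol name -> (address, type letter) from default `nm` output."""
    symbols: Dict[str, Tuple[int, str]] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols have no address column; skip them
        addr, kind, name = parts
        symbols[name] = (int(addr, 16), kind)
    return symbols


def is_function(symbols: Dict[str, Tuple[int, str]], name: str) -> bool:
    # "T"/"t" mark symbols in the text (code) section.
    return symbols.get(name, (0, ""))[1] in ("T", "t")
```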

Run the Python BPF program first, then run hello-bpf. The Python program prints:

b'       hello-bpf-13691   [002] ....  8925.396573: 0: New hello-bpf process running with PID: 13691'


Uretprobes

Combining uprobes and uretprobes lets you write more complex BPF programs, which give you a more complete picture of the applications running on the system. When you can inject tracing code right before a function runs and right after it completes, you can start collecting more data and measuring application behavior. A common use case is measuring how long a function takes to execute without changing a single line of the application's code.

from bcc import BPF

bpf_source = """
// Hash map keyed by pid_tgid of the calling thread; value = start time in ns.
BPF_HASH(cache, u64, u64);

int trace_start_time(struct pt_regs *ctx) {
  u64 pid = bpf_get_current_pid_tgid();
  u64 start_time_ns = bpf_ktime_get_ns();
  cache.update(&pid, &start_time_ns);
  return 0;
}
"""

bpf_source += """
int print_duration(struct pt_regs *ctx) {
  u64 pid = bpf_get_current_pid_tgid();
  u64 *start_time_ns = cache.lookup(&pid);
  if (start_time_ns == 0) {
    return 0;  // no matching entry was recorded
  }
  u64 duration_ns = bpf_ktime_get_ns() - *start_time_ns;
  cache.delete(&pid);  // free the entry once it has been used
  bpf_trace_printk("Function call duration: %llu\\n", duration_ns);
  return 0;
}
"""

bpf = BPF(text=bpf_source)
bpf.attach_uprobe(name="../uprobes/hello-bpf", sym="main.main", fn_name="trace_start_time")
bpf.attach_uretprobe(name="../uprobes/hello-bpf", sym="main.main", fn_name="print_duration")
bpf.trace_print()

The output is as follows (the duration is in nanoseconds):

b'       hello-bpf-14768   [001] ....  9515.517235: 0: Function call duration: 29605'
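The entry/return pairing above can be modeled in plain Python to make the logic easy to follow. The entry probe stores a start timestamp keyed by pid_tgid. The return probe looks that timestamp up, subtracts it from the current time, and removes the entry.

The class DurationTracker is a hypothetical user-space model, not BCC code. Timestamps are passed in explicitly so the behavior is deterministic. The printed value 29605 ns corresponds to about 29.6 µs.

```python
from typing import Dict, Optional


class DurationTracker:
    """Model of the BPF_HASH-based entry/return timing pattern."""

    def __init__(self) -> None:
        # Plays the role of BPF_HASH(cache, u64, u64): pid_tgid -> start time (ns).
        self._start_ns: Dict[int, int] = {}

    def on_entry(self, pid_tgid: int, now_ns: int) -> None:
        self._start_ns[pid_tgid] = now_ns

    def on_return(self, pid_tgid: int, now_ns: int) -> Optional[int]:
        start = self._start_ns.pop(pid_tgid, None)  # lookup + delete
        if start is None:
            return None  # like the start_time_ns == 0 check: no entry seen
        return now_ns - start
```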

User statically defined tracepoints (USDTs)

User Statically Defined Tracepoints (USDTs) provide static tracepoints for applications in user space. They are a convenient way to instrument applications because they give you a low-overhead entry point into the tracing capabilities that BPF offers. USDTs were popularized by DTrace, a tool originally developed by Sun Microsystems for dynamic instrumentation of Unix systems. Due to licensing issues, DTrace was not available on Linux until recently.

For example, the following program:

#include <sys/sdt.h>

// Build with, e.g.: gcc -o hello_usdt hello_usdt.c (the source file name is illustrative)
int main(int argc, char const *argv[]) {
    // sys/sdt.h stringifies its arguments itself, so the provider and probe
    // names are written as bare tokens rather than string literals.
    DTRACE_PROBE(hello-usdt, probe-main);
    return 0;
}

(I did not manage to run this program successfully. You can refer to: bpf tracking function - get started with USDT. It looks tricky at a glance and I'm not sure about it, so I'll skip it.)

USDTs require developers to add instructions to their code, which the kernel uses as traps to execute BPF programs. DTRACE_PROBE registers the tracepoint, and the kernel uses it to inject the BPF callback function. The first argument of the macro is the program reporting the trace. The second is the name of the trace being reported. This way, trace data can be accessed while the program is running.

A BPF program can use these probes as follows:

from bcc import BPF, USDT

# Once you know which tracepoints the binary supports, you can attach
# BPF programs to them in a similar way to the previous examples:

bpf_source = """
#include <uapi/linux/ptrace.h>
int trace_binary_exec(struct pt_regs *ctx) {
  u32 pid = bpf_get_current_pid_tgid() >> 32;
  bpf_trace_printk("New hello_usdt process running with PID: %d\\n", pid);
  return 0;
}
"""

# Create a USDT object for the binary. The previous examples didn't need this:
# USDT support is provided by a separate class, not by BPF itself.
usdt = USDT(path="./hello_usdt")

# Attach the BPF function to the application's probe-main probe.
usdt.enable_probe(probe="probe-main", fn_name="trace_binary_exec")

# Initialize the BPF environment with the tracepoint definition we just created.
bpf = BPF(text=bpf_source, usdt_contexts=[usdt])
bpf.trace_print()


Visualizing tracing data

The sections above show how to collect tracing data. Visualizing that data as images makes it easier to read, and the book presents three approaches.

Given how bumpy my experience with BCC has been, I'll skip the data visualization part for now.
