[kernel doc] [trace] kprobetrace translation

Posted by hacksurfin on Mon, 24 Jan 2022 20:17:30 +0100

Original link:
Kprobe-based Event Tracing

Overview

These events are similar to tracepoint-based events, but they are built on kprobes (kprobe and kretprobe) instead of tracepoints. They can therefore probe wherever kprobes can probe, which means all functions except those with the __kprobes/nokprobe_inline annotation and those marked NOKPROBE_SYMBOL.
Unlike tracepoint-based events, they can be added and removed dynamically, on the fly.

To enable this feature, build the kernel with CONFIG_KPROBE_EVENTS=y.

Similar to the event tracer, this does not need to be activated via current_tracer. Instead, add probe points via
/sys/kernel/debug/tracing/kprobe_events and enable them via
/sys/kernel/debug/tracing/events/kprobes/<EVENT>/enable.

Synopsis of kprobe_events

p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS] : Set a probe
r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS] : Set a return probe
-:[GRP/]EVENT : Clear a probe

GRP: Group name. If omitted, "kprobes" is used.
EVENT: Event name. If omitted, the event name is generated from SYM+offs or MEMADDR.
MOD: Name of the module that contains SYM.
SYM[+offs]: Symbol plus offset where the probe is inserted.
MEMADDR: Address where the probe is inserted.
MAXACTIVE: Maximum number of instances of the specified function that can be probed simultaneously, or 0 for the default value as defined in Documentation/kprobes.txt, section 1.3.1.

FETCHARGS: Arguments. Each probe can have up to 128 arguments.

 %REG : Fetch register REG.
 @ADDR : Fetch memory at ADDR (ADDR should be in the kernel).
 @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol).
 $stackN : Fetch the Nth entry of the stack (N >= 0).
 $stack : Fetch the stack address.
 $retval : Fetch the return value.(*)
 $comm : Fetch the current task's comm.
 +|-offs(FETCHARG) : Fetch memory at the address FETCHARG +|- offs.(**)
 NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
 FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types (x8/x16/x32/x64), "string" and bitfield are supported.
 (*) Only for return probes.
 (**) This is useful for fetching a field of a data structure.
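
As a rough illustration of how the synopsis and the fetch arguments fit together, the sketch below assembles definition strings and only prints them; nothing is written to kprobe_events. The helper name kprobe_def and the group and event names (mygroup, myopen, myopen_ret) are hypothetical and chosen for this example only.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel or any tool): build a
# kprobe_events definition of the form
#   KIND:GROUP/EVENT SYMBOL [FETCHARG...]
# and print it to stdout.
# Usage: kprobe_def KIND GROUP EVENT SYMBOL [FETCHARG...]
kprobe_def() {
    kind=$1
    group=$2
    event=$3
    sym=$4
    shift 4
    def="${kind}:${group}/${event} ${sym}"
    # Append each fetch argument, separated by a single space.
    for arg in "$@"; do
        def="${def} ${arg}"
    done
    printf '%s\n' "$def"
}

# Entry probe with a named register argument and a named stack dereference.
kprobe_def p mygroup myopen do_sys_open 'dfd=%ax' 'mode=+4($stack)'
# Return probe recording the return value under the name "ret".
kprobe_def r mygroup myopen_ret do_sys_open 'ret=$retval'
```

The printed strings could then be appended to kprobe_events with >>, which requires root.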

Types

Several types are supported for fetch args. The kprobe tracer accesses memory according to the given type. The prefixes 's' and 'u' mean the type is signed or unsigned, respectively; the 'x' prefix implies unsigned. Traced arguments are shown in decimal ('s' and 'u') or hexadecimal ('x'). Without a type cast, 'x32' or 'x64' is used depending on the architecture (for example, x86-32 uses x32 and x86-64 uses x64).
The string type is a special type that fetches a null-terminated string from kernel space. This means that if the memory holding the string has been paged out, the fetch fails and NULL is stored.
Bitfield is another special type, which takes three parameters: bit width, bit offset and container size (usually 32). The syntax is:

b<bit-width>@<bit-offset>/<container-size>

For $comm, the default type is "string"; any other type is invalid.
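
To make the bitfield parameters concrete, here is a small worked example in shell arithmetic. It assumes that the bit offset counts from the least significant bit of the container; the container size is not modeled. The function name bitfield is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract WIDTH bits starting at bit OFFSET
# (counted from the least significant bit, an assumption of this sketch)
# from VALUE, and print the result in decimal.
# Usage: bitfield WIDTH OFFSET VALUE
bitfield() {
    mask=$(( (1 << $1) - 1 ))
    echo $(( ($3 >> $2) & mask ))
}

# Corresponds to a type such as b4@8/32 applied to the value 0x1234:
# bits 8..11 of 0x1234 are 0x2.
bitfield 4 8 0x1234   # prints 2
```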

Per-Probe Event Filtering

Per-probe event filtering lets you set a different filter on each probe and choose which arguments are shown in the trace buffer. If an event name is specified right after 'p:' or 'r:' in kprobe_events, an event is added under tracing/events/kprobes/<EVENT>, and in that directory you can see 'id', 'enable', 'format' and 'filter'.

enable:
You can enable or disable the probe by writing 1 or 0 to it.

format:
This shows the format of this probe event.

filter:
You can write filtering rules for this event.

id:
This shows the ID of this probe event.
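
A minimal sketch of setting a filter on one probe event follows. It assumes the debugfs mount point used throughout this article and the general event filter syntax (field, relational operator, value); the function name set_kprobe_filter and the example expression are illustrative only.

```shell
#!/bin/sh
# Assumption: tracefs is reachable at the path used in this article.
# Override TRACING if it is mounted elsewhere.
TRACING=${TRACING:-/sys/kernel/debug/tracing}

# Hypothetical helper: write a filter expression to the 'filter' file
# of a kprobe event in the default "kprobes" group.
# Usage: set_kprobe_filter EVENT EXPRESSION
set_kprobe_filter() {
    printf '%s\n' "$2" > "$TRACING/events/kprobes/$1/filter"
}

# Example (requires root and an existing "myprobe" event that has an
# argument named "flags"):
#   set_kprobe_filter myprobe 'flags == 0x8000'
```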

Event Profiling

You can check the total number of probe hits and probe misses via
/sys/kernel/debug/tracing/kprobe_profile.
The first column is the event name, the second column is the number of probe hits, and the third column is the number of probe misses.
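
The three-column layout can be turned into a labeled report with a short awk filter. This is only a sketch: the function name profile_report is hypothetical, and the sample rows fed to it are made-up numbers, not real measurements.

```shell
#!/bin/sh
# Hypothetical helper: read kprobe_profile-style lines on stdin
# (event name, hit count, miss count) and label each column.
profile_report() {
    awk 'NF >= 3 { printf "%s hits=%s misses=%s\n", $1, $2, $3 }'
}

# Made-up sample rows in the layout described above.
printf '%s\n' '  myprobe  3  0' '  myretprobe  3  1' | profile_report
```

On a live system the same function could read the real file, for example profile_report < /sys/kernel/debug/tracing/kprobe_profile (root required).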

Usage examples

To add a probe as a new event, write a new definition to kprobe_events
as follows.

   echo 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)' \
   > /sys/kernel/debug/tracing/kprobe_events
   (Here ax, dx and cx are all register names.)

This sets a kprobe at the top of the do_sys_open() function and records the 1st to 4th arguments as the "myprobe" event. Note that which register or stack entry is assigned to each function argument depends on the architecture-specific ABI. If you are unsure of the ABI, try the probe subcommand of perf-tools (you can find it under tools/perf/).
As this example shows, users can choose more familiar names for each argument.

   echo 'r:myretprobe do_sys_open $retval' >> /sys/kernel/debug/tracing/kprobe_events

This sets a kretprobe at the return point of the do_sys_open() function and records the return value as the "myretprobe" event.
You can see the format of these events via /sys/kernel/debug/tracing/events/kprobes/<EVENT>/format.

cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
name: myprobe
ID: 780
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3; size:1;signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:unsigned long __probe_ip; offset:12;      size:4; signed:0;
        field:int __probe_nargs;        offset:16;      size:4; signed:1;
        field:unsigned long dfd;        offset:20;      size:4; signed:0;
        field:unsigned long filename;   offset:24;      size:4; signed:0;
        field:unsigned long flags;      offset:28;      size:4; signed:0;
        field:unsigned long mode;       offset:32;      size:4; signed:0;
     
print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->__probe_ip,
REC->dfd, REC->filename, REC->flags, REC->mode

You can see that the event has four arguments, matching the expressions you specified.

echo > /sys/kernel/debug/tracing/kprobe_events

This will clear all probe points.

Or,

echo -:myprobe >> kprobe_events

This clears only the specified probe point.
Right after definition, each event is disabled by default. To trace these events, you need to enable them.

   echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
   echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable

You can view the traced information via /sys/kernel/debug/tracing/trace.

  cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
           <...>-1447  [001] 1038282.286875: myprobe: (do_sys_open+0x0/0xd6) dfd=3 filename=7fffd1ec4440 flags=8000 mode=0
           <...>-1447  [001] 1038282.286878: myretprobe: (sys_openat+0xc/0xe <- do_sys_open) $retval=fffffffffffffffe
           <...>-1447  [001] 1038282.286885: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=40413c flags=8000 mode=1b6
           <...>-1447  [001] 1038282.286915: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
           <...>-1447  [001] 1038282.286969: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=4041c6 flags=98800 mode=10
           <...>-1447  [001] 1038282.286976: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3

Each line shows when the kernel hits an event, and <- SYMBOL means the kernel returns from SYMBOL (for example, "sys_open+0x1b/0x1d <- do_sys_open" means the kernel returns from do_sys_open to sys_open+0x1b).
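
As a small sketch of post-processing this output, the sed filter below prints only the return values recorded by the "myretprobe" event. The function name retvals is hypothetical, and the input lines are copied from the trace shown above.

```shell
#!/bin/sh
# Hypothetical helper: read trace lines on stdin and print the
# hexadecimal $retval of every "myretprobe" event, one per line.
retvals() {
    sed -n 's/.*myretprobe: .*\$retval=\([0-9a-f]*\)$/\1/p'
}

# Three lines taken from the trace output above; the quoted heredoc
# delimiter keeps the shell from expanding $retval.
retvals <<'EOF'
           <...>-1447  [001] 1038282.286875: myprobe: (do_sys_open+0x0/0xd6) dfd=3 filename=7fffd1ec4440 flags=8000 mode=0
           <...>-1447  [001] 1038282.286878: myretprobe: (sys_openat+0xc/0xe <- do_sys_open) $retval=fffffffffffffffe
           <...>-1447  [001] 1038282.286915: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
EOF
```

On a live system the same filter could read /sys/kernel/debug/tracing/trace directly (root required).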