Allwinner (Quanzhi) H64 platform: debugging analysis of an el1_sync exception entry

Posted by learning_php_mysql on Tue, 22 Feb 2022 03:08:17 +0100

1. Log analysis

[ 3537.282130] PC is at do_page_fault+0x40/0x2e0
[ 3537.282130] LR is at do_translation_fault+0x5c/0xd4
[ 3537.282130] pc : [<ffffffc000095704>] lr : [<ffffffc000095a00>] pstate: 800001c5
[ 3537.282130] sp : ffffffc027b38130
[ 3537.282130] x29: ffffffc027b38130 x28: ffffffc027bdc000 
[ 3537.282130] x27: ffffffc0009fa0e9 x26: 0000000000000000 
[ 3537.282130] x25: 0000000096000005 x24: 0000000000000025 
[ 3537.282130] x23: 0000000000000000 x22: ffffffc027b38390 
[ 3537.282130] x21: 00000000000002b0 x20: 00000000000002b0 
[ 3537.282130] x19: ffffffc027b38390 x18: 000000000000001e 
[ 3537.282130] x17: 00000000000101d0 x16: ffffffc0111dccf4 
[ 3537.282130] x15: ffffffc0111dcc04 x14: 0000000000000003 
[ 3537.282130] x13: 000000004437411e x12: ffffffc000822000 
[ 3537.282130] x11: 0000000000000006 x10: 0000000000000007 
[ 3537.282130] x9 : 000000000000000e x8 : 00125bbb859b6f00 
[ 3537.282130] x7 : 0000000000000012 x6 : ffffffc000cf77d0 
[ 3537.282130] x5 : ffffffc00079e3d8 x4 : ffffffc00079e3d8 
[ 3537.282130] x3 : ffffffc0000959a4 x2 : ffffffc027b38390 
[ 3537.282130] x1 : 0000000096000005 x0 : 00000000800001c5 

do_page_fault prologue assembly (stack setup):

ffffffc0000956c4 <do_page_fault>:
ffffffc0000956c4:   a9a87bfd    stp x29, x30, [sp,#-384]!
ffffffc0000956c8:   910003fd    mov x29, sp

Crash site sp and x29

sp :        ffffffc027b38130
x29:        ffffffc027b38130
x30(lr):ffffffc000095a00

x30 holds the caller's LR. The stp at function entry stores x30 at [sp-384+8], that is, at address ffffffc027b38138 (derived by working back from the current sp).

The data at memory address ffffffc027b38138 is ffffffc000095a00.

The lr saved on the stack is consistent with the lr register at the time of the crash, which indicates that the CPU data is normal.
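The slot arithmetic above can be cross-checked with a short Python sketch. It decodes the pre-indexed stp word from the listing (a9a87bfd) to recover the -384 offset, then computes where x29 and x30 land. The helper name decode_stp_pre is hypothetical, and the sketch assumes the word really is a 64-bit pre-indexed STP (imm7 in bits 21:15, scaled by 8).

```python
def decode_stp_pre(insn):
    """Decode a 64-bit pre-indexed STP word: (rt, rt2, rn, byte_offset)."""
    imm7 = (insn >> 15) & 0x7F
    if imm7 & 0x40:              # sign-extend the 7-bit immediate
        imm7 -= 0x80
    rt = insn & 0x1F
    rn = (insn >> 5) & 0x1F      # 31 means sp in this encoding
    rt2 = (insn >> 10) & 0x1F
    return rt, rt2, rn, imm7 * 8  # the 64-bit form scales by 8

rt, rt2, rn, offset = decode_stp_pre(0xA9A87BFD)  # stp x29, x30, [sp,#-384]!
sp_after = 0xFFFFFFC027B38130    # sp at the crash site (after the stp)
x29_slot = sp_after              # x29 is stored at [sp]
x30_slot = sp_after + 8          # x30 is stored at [sp + 8]
sp_before = sp_after - offset    # sp on entry to do_page_fault

print(rt, rt2, rn, offset)
print(hex(x29_slot), hex(x30_slot), hex(sp_before))
```

The x30 slot comes out at the same address that the text reads the saved lr from.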

2. Analysis with DS5

Connect DS5 to the cores in the order cpu0 --> cpu1 --> cpu2 --> cpu3.

DS5 stops each online CPU in turn and loads vmlinux for each one.

After that, the stack of every CPU can be viewed, and the current thread information of each CPU can be dumped:

info stack

The stack shows that the crash is caused by the el1_sync exception, which is triggered after the CPU accesses an illegal address.

During exception handling, the cause is identified as a data abort in EL1,

so the handler jumps to do_mem_abort to process the page fault.

do_page_fault then detects that the illegal address was accessed in kernel space, which leads to a panic.

So far the crash scene is the same across multiple crashes, while the addr parameter passed in when the problem occurs is fairly random.

The current suspicion is that a pointer in a parameter passed from 32-bit user space to the 64-bit kernel is abnormal, so the CPU takes an exception when it accesses that address in kernel space.

The main difficulty at present is that DS5 can only capture the crash flow after the el1_sync exception. The SPSR and SP of the CPU before the exception have to be derived from the assembly.

el1_sync
    kernel_entry el=1:
        sp  = sp - (288-240)
        sp  = sp - (15*16)
        x21 = sp + 288
        x22 = el1 lr (ELR_EL1)
        x23 = el1 spsr (SPSR_EL1)
        store lr  to [sp + 240]        // LR
        store x21 to [sp + 240 + 8]
        store x22 to [sp + 256]        // PC
        store x23 to [sp + 256 + 8]
    el1_da:
        x2 = sp
        do_mem_abort:
            x29 --> [sp-176]
            x30 --> [sp-176+8]
            sp = sp - 176
            x29 = sp
        do_translation_fault:
            x29 --> [sp-48]
            x30 --> [sp-48+8]
            sp = sp - 48
        do_page_fault:
            x29 --> [sp-384]
            x30 --> [sp-384+8]
            sp = sp - 384
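As a rough cross-check of this flow, the frame sizes can be added back onto the crash-site sp from the section 1 log. The sketch below assumes each function has moved sp only by its prologue frame size at the point where it called the next one; the names FRAMES and unwind are made up for illustration.

```python
# Frame sizes in bytes, taken from the flow above (innermost first).
FRAMES = [
    ("do_page_fault", 384),
    ("do_translation_fault", 48),
    ("do_mem_abort", 176),
    ("kernel_entry (pt_regs)", 288),
]

def unwind(sp, frames):
    """Return [(name, sp_above_that_frame)], adding each frame size in turn."""
    chain = []
    for name, size in frames:
        sp += size
        chain.append((name, sp))
    return chain

chain = unwind(0xFFFFFFC027B38130, FRAMES)   # sp from the section 1 log
for name, sp_above in chain:
    print(f"sp above {name}: {sp_above:#x}")

pt_regs_base = chain[2][1]          # sp inside el1_sync, passed on as x2
sp_before_exception = chain[3][1]   # sp of the interrupted context
```

Under these assumptions the sp inside el1_sync comes out as 0xffffffc027b38390, which is the value held by x2, x19 and x22 in the section 1 log. That agrees with x2 = sp in el1_da.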

Because the crash scene from sections 1 ~ 3 had already been destroyed, its memory could not be read.

When the problem reproduced, the following valid data was captured:

#0 arch_counter_get_cntvct() at arch_timer.h:153
#1 __delay(cycles = 24000) at delay.c:31
#2 __const_udelay(xloops = <Value currently has no location>) at delay.c:42
#3 panic(fmt = <Value currently has no location>) at panic.c:187
#4 die(str = <Value currently has no location>, regs = (struct pt_regs*) 0xFFFFFFC0297FC050, err = -1778384891) at traps.c:247
#5 __do_kernel_fault(mm = (struct mm_struct*) 0xFFFFFFC029BF8680, addr = 18446743833205608120, esr = 2516582405, regs = (struct pt_regs*) 0xFFFFFFC0297FC050) at fault.c:102
#6 do_translation_fault(addr = 18446743833205608120, esr = 2516582405, regs = (struct pt_regs*) 0xFFFFFFC0297FC050) at fault.c:362
#7 do_mem_abort(addr = 18446743833205608120, esr = 2516582405, regs = (struct pt_regs*) 0xFFFFFFC0297FC050) at fault.c:459
#8 [el1_sync+0xB0]
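DS5 prints addr, esr and err in decimal, which hides their structure. The sketch below converts them to hex and splits esr into the standard ESR_EL1 fields (EC in bits 31:26, IL in bit 25, fault status code in the low six bits). The helper name decode_esr is made up for illustration.

```python
def decode_esr(esr):
    """Split an ESR_EL1 value into the fields relevant to a data abort."""
    return {
        "ec": (esr >> 26) & 0x3F,    # exception class
        "il": (esr >> 25) & 0x1,     # instruction length bit
        "iss": esr & 0x1FFFFFF,      # instruction specific syndrome
        "dfsc": esr & 0x3F,          # data fault status code
    }

esr = 2516582405
addr = 18446743833205608120
err = -1778384891

fields = decode_esr(esr)
print(hex(esr), {k: hex(v) for k, v in fields.items()})
print(hex(addr))
print(hex(err & 0xFFFFFFFF))   # err is the same 32-bit value printed as signed
```

esr is 0x96000005: EC 0x25 is a data abort taken without a change in exception level, and fault status code 0x05 is a level 1 translation fault. addr is 0xffffffc800d90eb8, and err is the same esr value printed as a signed int.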

(1)#11: try_to_wake_up

In #10, the sp change and the x29/x30 push operations are as follows:
     #11-x29 --> [#11-sp - 80]
     #11-x30 --> [#11-sp - 80 + 8]
     #10-sp = #11-sp - 80 = 0xFFFFFFC0297FC170

     #11-x29 = 0xFFFFFFC0297FC1C0 (captured by DS5)
     #11-x30 = 0xFFFFFFC0000CEC7C (captured by DS5)
     #11-SP = 0xFFFFFFC0297FC1C0

     LR = X30 = 0xFFFFFFC0000CEC7C
     The assembly code is:
     ffffffc0000cea74 <try_to_wake_up>:
     ... ...
     ... ...
   ffffffc0000cec78:   97ffecd8    bl  ffffffc0000c9fd8 <ttwu_stat>
-->ffffffc0000cec7c:   14000020    b   ffffffc0000cecfc <try_to_wake_up+0x288> 
     ... ...
     ... ...  

x19 register: 0xFFFFFFC012DD3440
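The link between the captured x30 and the bl in the listing can be checked by decoding the instruction word. This is a sketch with a hypothetical helper name (bl_target). It relies on the standard A64 BL encoding: a signed 26-bit word offset in the low bits.

```python
MASK64 = (1 << 64) - 1

def bl_target(pc, insn):
    """Return the branch target of an A64 BL instruction located at pc."""
    assert (insn >> 26) == 0b100101, "not a BL encoding"
    imm26 = insn & 0x3FFFFFF
    if imm26 & (1 << 25):        # sign-extend the 26-bit word offset
        imm26 -= 1 << 26
    return (pc + imm26 * 4) & MASK64

bl_pc = 0xFFFFFFC0000CEC78
target = bl_target(bl_pc, 0x97FFECD8)   # bl ... <ttwu_stat>
return_addr = bl_pc + 4                 # value BL writes into x30
sp_10 = 0xFFFFFFC0297FC1C0 - 80         # #10-sp = #11-sp - 80

print(hex(target), hex(return_addr), hex(sp_10))
```

The decoded target is the entry address of ttwu_stat, and the return address is the x30 value captured by DS5.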

(2)#10: ttwu_stat

        #10-cpsr = #9-spsr  = 0x00000000800001C5
        M[4:0] = 0b00101: AArch64 EL1h exception mode; M[0] = 0b1 selects SP_EL1 as SP
        #10-sp   = #9-sp    = 0xFFFFFFC0297FC170
        Code location derived from #9-lr 0xFFFFFFC0000CA014:
        ffffffc0000c9fd8 <ttwu_stat>:
        ffffffc0000c9fd8:   a9bb7bfd    stp x29, x30, [sp,#-80]!
        ffffffc0000c9fdc:   910003fd    mov x29, sp
        ffffffc0000c9fe0:   a90153f3    stp x19, x20, [sp,#16]
        ffffffc0000c9fe4:   a9025bf5    stp x21, x22, [sp,#32]
        ffffffc0000c9fe8:   a90363f7    stp x23, x24, [sp,#48]
        ffffffc0000c9fec:   f90023f9    str x25, [sp,#64]
        ffffffc0000c9ff0:   90006656    adrp    x22, ffffffc000d91000 <__key.22563>
        ffffffc0000c9ff4:   aa0003f3    mov x19, x0
        ffffffc0000c9ff8:   9102e2d6    add x22, x22, #0xb8
        ffffffc0000c9ffc:   aa1e03e0    mov x0, x30
        ffffffc0000ca000:   2a0103f8    mov w24, w1
        ffffffc0000ca004:   2a0203f7    mov w23, w2
        ffffffc0000ca008:   97ff185a    bl  ffffffc000090170 <_mcount>
        ffffffc0000ca00c:   b00054b5    adrp    x21, ffffffc000b5f000 <cpu_worker_pools+0x440>
        ffffffc0000ca010:   940a6f4c    bl  ffffffc000365d40 <debug_smp_processor_id>
--->ffffffc0000ca014:   f8605ad4    ldr x20, [x22,w0,uxtw #3]
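The spsr value 0x800001C5 quoted at the start of this step can be decoded field by field. The sketch below uses the standard AArch64 SPSR layout; the function name decode_spsr and the dictionary keys are made up for illustration.

```python
def decode_spsr(spsr):
    """Decode the AArch64 SPSR_EL1 fields used in this analysis."""
    mode = spsr & 0x1F
    return {
        "aarch32": bool(spsr & (1 << 4)),   # M[4]: 0 means AArch64
        "el": (mode >> 2) & 0x3,            # M[3:2]: exception level
        "sp_elx": bool(mode & 0x1),         # M[0]: 1 means SP_ELx (the "h" suffix)
        "daif": (spsr >> 6) & 0xF,          # D, A, I, F mask bits
        "nzcv": (spsr >> 28) & 0xF,         # condition flags
    }

state = decode_spsr(0x800001C5)
print(state)
```

The interrupted context was AArch64 at EL1 using SP_EL1, with A, I and F masked (daif = 0b0111) and the N flag set.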

In EL1 exception mode the CPU goes from el1_sync --> el1_da, and the x0 register passed to do_mem_abort is loaded as follows: mrs x0, far_el1 // the faulting address from the EL1 fault address register (FAR). The faulting address in x0 is 0x199999940015B4AC.

In day-to-day testing this address value turns out to be very random.

The current suspicion is that the exception occurs while the CPU executes the instruction ldr x20, [x22,w0,uxtw #3] and accesses the address formed from those registers. After the exception is generated, lr points to the instruction that triggered it.

1). Check x22 register data:

The caller's x22 saved in stack frame #10 is at [0xFFFFFFC0297FC170 + 32 + 8] = [0xFFFFFFC0297FC198]; the value captured by DS5 is x22: 0x0000000000000000.
The x22 register value saved in stack frame #9, as captured by DS5, is 0xFFFFFFC000D910B8.

First, use the #10 code to calculate the x22 value:
    adrp    x22, ffffffc000d91000 <__key.22563>   // calculated x22 = ffffffc000d91000
    add x22, x22, #0xb8                           // calculated x22 = ffffffc000d910b8
 The calculated x22 is ffffffc000d910b8, which is consistent with the saved #9-x22 value.
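The same x22 value can be reproduced directly from the two instruction words in the ttwu_stat listing. This sketch assumes the standard A64 encodings of ADRP (21-bit page offset split into immlo/immhi) and ADD immediate with no shift. The helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def adrp_result(pc, insn):
    """Return the page address that an A64 ADRP at pc writes to its register."""
    immlo = (insn >> 29) & 0x3
    immhi = (insn >> 5) & 0x7FFFF
    imm = (immhi << 2) | immlo
    if imm & (1 << 20):          # sign-extend the 21-bit page offset
        imm -= 1 << 21
    return ((pc & ~0xFFF) + (imm << 12)) & MASK64

def add_imm12(insn):
    """Return the 12-bit immediate of an A64 ADD (immediate, unshifted)."""
    return (insn >> 10) & 0xFFF

page = adrp_result(0xFFFFFFC0000C9FF0, 0x90006656)   # adrp x22, ...
x22 = page + add_imm12(0x9102E2D6)                   # add x22, x22, #0xb8

print(hex(page), hex(x22))
```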

2). Check x22,w0,uxtw #3

    x22 = 0xffffffc000d910b8
    offset = ((unsigned long)w0) << 3
    x22 + offset = 0x199999940015B4AC ?
    Working backwards: offset = 0x199999D3FF3CA3F4 ?
                       offset >> 3 = 0x333333A7FE7947E
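The question marks above can be examined numerically. With uxtw #3 the offset is the zero-extended 32-bit w0 shifted left by 3, so it is always a multiple of 8 and never larger than 0xFFFFFFFF << 3. The sketch below repeats the back-calculation and tests the result against those two limits. The names required_offset and reachable are made up for illustration.

```python
MASK64 = (1 << 64) - 1

def required_offset(base, target):
    """Byte offset that must be added to base to reach target (mod 2**64)."""
    return (target - base) & MASK64

x22 = 0xFFFFFFC000D910B8
target = 0x199999940015B4AC

offset = required_offset(x22, target)
max_offset = 0xFFFFFFFF << 3     # largest offset [x22, w0, uxtw #3] can form
reachable = offset % 8 == 0 and offset <= max_offset

print(hex(offset), hex(offset >> 3), reachable)
```

With these inputs the required offset is neither 8-byte aligned nor within the range of a 32-bit index. Arithmetically, this x22 cannot produce that address through this addressing mode, whatever w0 holds.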

#8: CPU register state under el1_sync:
PC          0xFFFFFFC000083C30
SP          0xFFFFFFC0297FC050
W0          0x00001317          // abnormal data
W1          0xCBD6EEA0
W2          0x0000000C
W3          0xCBD701B6
W4          0x00000001
W5          0x0035EEBC
W6          0x00CD2E21
W7          0x2064656C
W8          0x20706F74
W9          0x7F7F7F7F
W10         0xFEFEFEFF
W11         0x7F7F7F7F
W12         0x01010101
W13         0x00000038
W14         0xFFFFFFFE
W15         0x00000000
W16         0x001E1B30
W17         0x00000000
W18         0x00000000
W19         0x00005DC0
W20         0x001DC004
W21         0x00000001
W22         0x001DC068
W23         0x00000056
W24         0x96000005
W25         0x00D91000
W26         0x00B5F000
W27         0x009FA0E9
W28         0x297FC000
W29         0x297FBE10
W30         0x00353AEC

(3) Derivation from the #8 data, combining the stack data saved in EL1 mode with the el1_sync code flow

In EL1 mode:

#9-sp = #8-sp + (15*16) + (288-240) = #8-sp + 288 = 0xFFFFFFC0297FC170. From the code: #8-x21 = #8-sp + 288; on-site value: #8-x21 = 0xFFFFFFC0297FC170.

The code derivation is consistent with the on-site CPU data,

and the derived #9-sp is consistent with the on-site CPU state, so #9-sp is correct.

#9-lr = [#8-sp + 240] = [0xFFFFFFC0297FC050 + 240] = [0xFFFFFFC0297FC140] = (DS5 memory dump) 0xFFFFFFC0000CA014

#9-el1 lr = [#8-sp + 256] = [0xFFFFFFC0297FC050 + 256] = [0xFFFFFFC0297FC150] = (DS5 memory dump) 0xFFFFFFC0000CA014. From the code: x22 = el1 lr; on-site value: #9-x22 = 0xFFFFFFC0000CA014, consistent with the el1 lr data.

#9-spsr = [0xFFFFFFC0297FC050 + 256 + 8] = [0xFFFFFFC0297FC158] = (DS5 memory dump) 0x00000000800001C5. From the code: x23 = el1 spsr; on-site value: x23 = 0x00000000800001C5. The code derivation and the CPU state data agree.
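The slot addresses used in this step all follow from the el1_sync sp and the offsets that kernel_entry uses (240, 240+8, 256, 256+8, frame size 288). A small sketch makes the arithmetic explicit. The constant names mirror the kernel's S_* naming but are defined here only for illustration.

```python
# Offsets used by kernel_entry in the flow described earlier.
S_LR = 240           # saved x30
S_SP = 248           # saved x21 (sp before the exception)
S_PC = 256           # saved x22 (el1 lr)
S_PSTATE = 264       # saved x23 (el1 spsr)
S_FRAME_SIZE = 288

def pt_regs_slots(regs_base):
    """Return the addresses of the LR/SP/PC/PSTATE slots and the old sp."""
    return {
        "lr": regs_base + S_LR,
        "sp": regs_base + S_SP,
        "pc": regs_base + S_PC,
        "pstate": regs_base + S_PSTATE,
        "sp_before_exception": regs_base + S_FRAME_SIZE,
    }

slots = pt_regs_slots(0xFFFFFFC0297FC050)   # #8-sp
for name, address in slots.items():
    print(f"{name:>20}: {address:#x}")
```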

In EL1h mode, kernel_entry saves the x0~x29 registers of the context that was running before the exception with the following assembly code:

    sp = sp - (288-240)// = 0xFFFFFFC0297FC140
    push    x28, x29   // stp \xreg1,\xreg2,[sp,#-16]!
    push    x26, x27
    push    x24, x25
    push    x22, x23
    push    x20, x21
    push    x18, x19
    push    x16, x17
    push    x14, x15
    push    x12, x13
    push    x10, x11
    push    x8, x9
    push    x6, x7
    push    x4, x5
    push    x2, x3
    push    x0, x1

Stack range captured by DS5: [#9-sp - 48] ~ [#9-sp - 48 - 240]

Memory addresses: [0xFFFFFFC0297FC140] ~ [0xFFFFFFC0297FC050]

Working back through the stack using the register layout of kernel_entry:
x28 --> [sp-16]: [0xFFFFFFC0297FC130] = 0xFFFFFFC0297FC000
x29 --> [sp-8]:  [0xFFFFFFC0297FC138] = 0xFFFFFFC0297FC170

The resulting register layout is as follows:

EL1N:0xFFFFFFC0297FC050:    X0 0x00000000FFFFFFC0   X1 0x0000000000000000   X2 0x0000000000000000   X3 0x0000000000000200   X4 0x0000000000000000   X5 0x0000000000000044   X6 0xFFFFFFC000CDB33C   
EL1N:0xFFFFFFC0297FC088:    X7 0x0000000000000000   X8 0xFFFFFFC000CDB33C   X9 0x7F7F7F7F7F7F7F7F   X10 0x67531F534F4C4444  X11 0x7F7F7F7F7F7F7F7F  X12 0x0101010101010101  X13 0x0000000000000028  
EL1N:0xFFFFFFC0297FC0C0:    X14 0xFFFFFFFFFFFFFFFF  X15 0x0000000000000000  X16 0xFFFFFFC0001E1B30  X17 0x0000000000000000  X18 0x0000000000000000  X19 0xFFFFFFC012DD3440  X20 0x0000000000000001  
EL1N:0xFFFFFFC0297FC0F8:    X21 0xFFFFFFC000B5F000  X22 0xFFFFFFC000D910B8  X23 0x0000000000000000  X24 0x0000000000000000  X25 0xFFFFFFC000D91000  X26 0xFFFFFFC000B5F000  X27 0xFFFFFFC0009FA0E9  
EL1N:0xFFFFFFC0297FC130:    X28 0xFFFFFFC0297FC000  X29 0xFFFFFFC0297FC170  
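One further arithmetic observation can be made on this dump. This is a sketch under one assumption: that the saved X0 slot still holds the value w0 had when ldr x20, [x22,w0,uxtw #3] executed. The names REGS_BASE and slot are made up for illustration.

```python
MASK64 = (1 << 64) - 1
REGS_BASE = 0xFFFFFFC0297FC050       # address of the saved X0 slot

def slot(n):
    """Address of the saved Xn slot (8 bytes per register)."""
    return REGS_BASE + 8 * n

saved_x0 = 0x00000000FFFFFFC0        # X0 from the dump
saved_x22 = 0xFFFFFFC000D910B8       # X22 from the dump

w0 = saved_x0 & 0xFFFFFFFF           # uxtw: zero-extend the low 32 bits
effective = (saved_x22 + (w0 << 3)) & MASK64
fault_addr = 18446743833205608120    # addr argument shown in the backtrace

print(hex(slot(22)), hex(slot(28)))
print(hex(effective), effective == fault_addr)
```

With the dumped values, x22 + (w0 << 3) equals 0xffffffc800d90eb8, the same number as the addr argument in the captured backtrace. If the assumption holds, the value left in w0 before the ldr is the one to examine.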

(4)

el1_sync --> kernel_entry
    el=1
    sp  = sp - (288-240)
    sp  = sp - (15*16)             // push the x0-x29 registers
    x21 = sp + 288
    x22 = el1 lr (ELR_EL1)
    x23 = el1 spsr (SPSR_EL1)
    store lr  to [sp + 240]        // LR
    store x21 to [sp + 240 + 8]
    store x22 to [sp + 256]        // PC
    store x23 to [sp + 256 + 8]
el1_da:
    x2 = sp

On-site stack data:

X19         0xFFFFFFC012DD3440
X20         0x0000000000000001
X21         0xFFFFFFC0297FC170
X22         0xFFFFFFC0000CA014
X23         0x00000000800001C5
X24         0x0000000000000025
X25         0xFFFFFFC000D91000
X26         0xFFFFFFC000B5F000
X27         0xFFFFFFC0009FA0E9
X28         0xFFFFFFC0297FC000
X29         0xFFFFFFC0297FC170
PC          0xFFFFFFC000083C30
SP          0xFFFFFFC0297FC050

Code derivation: #8-sp = #7-sp + 176 = 0xFFFFFFC0297FBFA0 + 176 = 0xFFFFFFC0297FC050. The theoretical #8-sp derived from #7-sp is consistent with the on-site CPU SP, and #8-sp is consistent with #8-x21 - 288, so the data is normal.

(5)

data_bad-->do_mem_abort
                    x29 --> [sp-176]
                    x30 --> [sp-176+8]
                    sp = sp - 176               
                    x29 = sp

On-site stack data:

X19         0x0000000096000005
X20         0xFFFFFFC800D90EB8
X21         0xFFFFFFC000B70E90
X22         0xFFFFFFC0297FC050
X23         0x00000000800001C5
X24         0x0000000000000025
X25         0xFFFFFFC000D91000
X26         0xFFFFFFC000B5F000
X27         0xFFFFFFC0009FA0E9
X28         0xFFFFFFC0297FC000
X29         0xFFFFFFC0297FBFA0
PC          0xFFFFFFC000081238
SP          0xFFFFFFC0297FBFA0

Code derivation: #7-sp = #6-sp + 48 = 0xFFFFFFC0297FBF70 + 48 = 0xFFFFFFC0297FBFA0. The theoretical #7-sp derived from #6-sp is consistent with the on-site CPU SP, and #7-sp is consistent with #7-x29, so the data is normal.

(6)

do_mem_abort-->do_translation_fault
                    x29-->[sp-48]
                    x30-->[sp-48+8]
                    sp = sp - 48
                    x29 = sp

On-site stack data:

X19         0xFFFFFFC0297FC050
X20         0xFFFFFFC800D90EB8
X21         0x0000000096000005
X22         0xFFFFFFC029BF8680
X23         0x00000000800001C5
X24         0x0000000000000025
X25         0xFFFFFFC000D91000
X26         0xFFFFFFC000B5F000
X27         0xFFFFFFC0009FA0E9
X28         0xFFFFFFC0297FC000
X29         0xFFFFFFC0297FBF70
PC          0xFFFFFFC000095A64
SP          0xFFFFFFC0297FBF70

Conclusion:

#6-sp = #5-sp + 48 = 0xFFFFFFC0297FBF40 + 48 = 0xFFFFFFC0297FBF70

The theoretical #6-sp derived from #5-sp is consistent with the on-site stack data,

and #6-sp is consistent with #6-x29, so the data is normal.

 
