AFF A250: Uncorrectable error detected at PCIE:Bus:100
Applies to
AFF A250
Issue
- Node halts with:
Mar 26 11:00:00 [node_name:monitor.shutdown.brokenDisk:EMERGENCY]: data disk,parity disk in RAID group "/aggregate/plex0/rg0" are broken. Halting system now.
- Boot loop with error:
Uncorrectable error detected at PCIE:Bus:100 Dev:0 Fun:0 for 2 time(s)
!!!!!!!!Machine Check MC-Bank:6 - Status: 0xBB80000000000E0B, ADDR: 0x0000000000000000, MISC: 0x0000000064000000 !!!!
!!!! X64 Exception Type - 12(#MC - Machine-Check) CPU Apic ID - 00000002 !!!!
RIP - 0000000077B708DE, CS - 0000000000000038, RFLAGS - 0000000000000002
RAX - 0000000000000000, RCX - 0000000077B12500, RDX - 0000000000000005
RBX - 0000000077B62300, RSP - 0000000077B31A40, RBP - 0000000000000001
RSI - 000000000003E2B4, RDI - 0000000077B52800
R8 - 0000000000000005, R9 - 0000000000000001, R10 - 0000000000000000
R11 - 0000000077B12500, R12 - 0000000000000000, R13 - 0000000000000000
R14 - 0000000000000000, R15 - 0000000000000000
DS - 0000000000000020, ES - 0000000000000020, FS - 0000000000000020
GS - 0000000000000020, SS - 0000000000000020
CR0 - 0000000080000033, CR2 - 0000000000000000, CR3 - 0000000077B14000
CR4 - 0000000000000668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 0000000077B29128 000000000000004F, LDTR - 0000000000000000
IDTR - 0000000077B88300 00000000000001FF, TR - 0000000000000040
FXSAVE_STATE - 0000000077B316A0
!!!! Find PE image f:\jb\bddo0\Build\YubaCity\DEBUG_MYTOOLS\X64\PurleySktPkg\Override\IA32FamilyCpuPkg\PiSmmCpuDxeSmm\PiSmmCpuDxeSmm\DEBUG\PiSmmCpuDxeSmm.pdb (ImageBase=0000000077B68000, EntryPoint=0000000077B68340) !!!!
Copyright(c) 2020 American Megatrends, Inc.
- PCIe link errors:
Tue Mar 30 09:53:23 +0000 [node_name: kernel: nvme.link.error:error]: PCIe link initialization error for NVMe SSD in slot 4.
Mon Mar 22 19:44:01 +0000 [node_name: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 4 due to excessive errors.
Thu Mar 25 00:07:05 +0000 [node_name: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 2 due to excessive errors.
Thu Mar 25 10:59:12 +0000 [node_name: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 1 due to excessive errors.
Tue Mar 30 11:45:14 +0000 [node_name: SKL cerror: pcie.stealth.errors:debug]: params: {'pcie_errors': 'IIO2: RPT(100,0,0): RPT(100,0,0): SecStatus(RcvMstAbt); PLX PCIE 9797 switch on Controller, Br[9797](102,4,0): RcvErr(P7(255)), Br[9797](102,7,0): BadTLP(8), BadDLLP(3470); '}
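
When several NVMe slots report link errors, as in the messages above, it can help to tally the events per slot before deciding whether a single drive or a shared component is the likelier suspect. The following is a minimal Python sketch for doing that on saved EMS log text. It is not a NetApp tool: the names `NVME_LINK_EVENT` and `count_nvme_link_events` are hypothetical, and the pattern assumes the messages keep the exact wording shown in this article.

```python
import re
from collections import Counter

# Matches nvme.link.* EMS messages worded like the ones in this article, e.g.
# "... nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 4 ..."
# Assumption: event name, severity, and "NVMe SSD in slot N" appear in this order.
NVME_LINK_EVENT = re.compile(
    r"(?P<event>nvme\.link\.[\w.]+?):\w+\]: .*?NVMe SSD in slot (?P<slot>\d+)"
)


def count_nvme_link_events(log_text):
    """Return a Counter keyed by (slot number, event name)."""
    counts = Counter()
    for match in NVME_LINK_EVENT.finditer(log_text):
        key = (int(match.group("slot")), match.group("event"))
        counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Tue Mar 30 09:53:23 +0000 [node_name: kernel: nvme.link.error:error]: "
        "PCIe link initialization error for NVMe SSD in slot 4.\n"
        "Mon Mar 22 19:44:01 +0000 [node_name: kernel: nvme.link.disabled.error:error]: "
        "PCIe link disabled for NVMe SSD in slot 4 due to excessive errors.\n"
    )
    for (slot, event), n in sorted(count_nvme_link_events(sample).items()):
        print(f"slot {slot}: {event} x{n}")
```

Errors spread across slots 1, 2, and 4, together with the `pcie.stealth.errors` entry naming the PLX switch on the controller, point away from a single faulty SSD; the tally only makes that pattern easier to see in longer logs.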