Panic occurred on the PLX PCIe 8732 switch on the IO Expansion port
Environment
- ONTAP 9
- FAS8080
Issue
- On a FAS8080 system, the Service Processor (SP) system log records an NMI panic such as the following:
Example:
PANIC: PCI Error NMI from device(s):ErrSrcID(CorrSrc(0),UCorrSrc(0x8010)), RPT(128,2,0):PLX PCIE 8732 switch on IO Expansion, PLX PCIE 8732 switch on IO Expansion, ErrSrcID(CorrSrc(0),UCorrSrc(0x8018)), RPT(128,3,0):PLX PCIE 8732 switch on IO Expansion, PLX PCIE 8732 switch on IO Expansion.  in SK process wafl_exempt18 on release 9.3P18 (C)
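The two ErrSrcID/RPT pairs in this panic string can be cross-checked with a short script. This is a minimal sketch that assumes UCorrSrc holds a standard 16-bit PCIe source (requester) ID, with the bus number in bits 15:8, the device number in bits 7:3 and the function number in bits 2:0, and that RPT(...) lists a bus, device and function in decimal. The article does not state either assumption, but the values in this log are consistent with it. The function names are hypothetical.

```python
import re

# Panic string from the SP system log example above (trimmed).
PANIC = (
    "PANIC: PCI Error NMI from device(s):ErrSrcID(CorrSrc(0),UCorrSrc(0x8010)), "
    "RPT(128,2,0):PLX PCIE 8732 switch on IO Expansion, PLX PCIE 8732 switch on IO Expansion, "
    "ErrSrcID(CorrSrc(0),UCorrSrc(0x8018)), RPT(128,3,0):PLX PCIE 8732 switch on IO Expansion"
)


def decode_source_id(src_id: int) -> tuple:
    """Split a 16-bit PCIe source ID into (bus, device, function)."""
    bus = (src_id >> 8) & 0xFF      # bits 15:8
    device = (src_id >> 3) & 0x1F   # bits 7:3
    function = src_id & 0x7         # bits 2:0
    return bus, device, function


def parse_panic(text: str) -> list:
    """Return (source ID, decoded bus/dev/fn, RPT tuple) for each uncorrectable error."""
    pattern = re.compile(r"UCorrSrc\((0x[0-9a-fA-F]+)\)\),\s*RPT\((\d+),(\d+),(\d+)\)")
    results = []
    for m in pattern.finditer(text):
        src = int(m.group(1), 16)
        rpt = tuple(int(g) for g in m.group(2, 3, 4))
        results.append((src, decode_source_id(src), rpt))
    return results


for src, bdf, rpt in parse_panic(PANIC):
    print(f"UCorrSrc=0x{src:04x} -> bus/dev/fn={bdf}, RPT={rpt}, match={bdf == rpt}")
```

Both source IDs decode to bus 128, which suggests the two errors come from adjacent functions behind the same bus, in line with the article's attribution to the PLX switch on IO Expansion.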
- After the error, the SP console command "events all" shows the following:
Example:
Record 28: Mon Jul 06 14:40:29.620402 2020 [Agent.notice]: 990.526: 160 : IOXM Fan_B1 Present de-asserted
Record 29: Mon Jul 06 14:40:29.634230 2020 [Agent.notice]: 990.526: 161 : IOXM Fan_B2 Present de-asserted
Record 30: Mon Jul 06 14:40:29.634821 2020 [Agent.notice]: 990.526: 162 : IOXM Fan_B3 Present de-asserted
Record 31: Mon Jul 06 14:40:29.691246 2020 [Agent.notice]: 062.792: 43 : CPU 1 Correctable Error 3 asserted
Record 32: Mon Jul 06 14:40:29.691429 2020 [Agent.notice]: 062.792: 29 : Non-maskable Interrupt from PCH to CPU asserted
Record 33: Mon Jul 06 14:40:29.742088 2020 [Agent.notice]: 113.649: 42 : CPU 1 Correctable Error 2 asserted
Record 34: Mon Jul 06 14:40:29.814207 2020 [Agent.notice]: 185.883: 42 : CPU 1 Correctable Error 2 de-asserted
Record 35: Mon Jul 06 14:40:29.814382 2020 [Agent.notice]: 185.883: 43 : CPU 1 Correctable Error 3 de-asserted
Record 36: Mon Jul 06 14:40:29.814523 2020 [Agent.notice]: 185.889: 29 : Non-maskable Interrupt from PCH to CPU de-asserted
Record 37: Mon Jul 06 14:40:30.041525 2020 [IPMI.warning]: Error while reading sensor number : 44
Record 38: Mon Jul 06 14:40:30.053568 2020 [IPMI.notice]: 0202 | c0 | OEM: f9ff7020ff2c | ManufId: 150300 | Undefined
Record 39: Mon Jul 06 14:40:30.917594 2020 [IPMI.warning]: Error while reading sensor number : 45
Record 40: Mon Jul 06 14:40:30.931970 2020 [IPMI.notice]: 0302 | c0 | OEM: f9ff7020ff2d | ManufId: 150300 | Undefined
Record 41: Mon Jul 06 14:40:32.325609 2020 [IPMI.warning]: Error while reading sensor number : 189
Record 42: Mon Jul 06 14:40:32.342545 2020 [IPMI.notice]: 0402 | c0 | OEM: f9ff7020ffbd | ManufId: 150300 | Undefined
Record 43: Mon Jul 06 14:40:32.749519 2020 [IPMI.warning]: Error while reading sensor number : 190
Record 44: Mon Jul 06 14:40:32.757665 2020 [IPMI.notice]: 0502 | c0 | OEM: f9ff7020ffbe | ManufId: 150300 | Undefined
Record 45: Mon Jul 06 14:40:32.778577 2020 [IPMI.notice]: 0602 | 02 | EVT: 6f01ffff | IOfan1_Present | Assertion Event, "Absent"
Record 46: Mon Jul 06 14:40:32.793519 2020 [IPMI.notice]: 0702 | 02 | EVT: 6f01ffff | IOfan2_Present | Assertion Event, "Absent"
Record 47: Mon Jul 06 14:40:32.809474 2020 [IPMI.notice]: 0802 | 02 | EVT: 6f01ffff | IOfan3_Present | Assertion Event, "Absent"
Record 48: Mon Jul 06 14:40:32.898375 2020 [Agent.notice]: 269.611: 14 : Attention LED (at Midplane) asserted
Record 49: Mon Jul 06 14:40:35.640291 2020 [Agent.notice]: 005.571: 42 : CPU 1 Correctable Error 2 asserted
Record 50: Mon Jul 06 14:40:35.640474 2020 [Agent.notice]: 005.571: 29 : Non-maskable Interrupt from PCH to CPU asserted
Record 51: Mon Jul 06 14:40:36.061511 2020 [IPMI.warning]: Error while reading sensor number : 14
Record 52: Mon Jul 06 14:40:36.067951 2020 [IPMI.notice]: 0902 | c0 | OEM: f9ff7020ff0e | ManufId: 150300 | Undefined
Record 53: Mon Jul 06 14:40:36.477533 2020 [IPMI.warning]: Error while reading sensor number : 15
Record 54: Mon Jul 06 14:40:36.484003 2020 [IPMI.notice]: 0a02 | c0 | OEM: f9ff7020ff0f | ManufId: 150300 | Undefined
Record 55: Mon Jul 06 14:40:39.797520 2020 [IPMI.warning]: Error while reading sensor number : 42
Record 56: Mon Jul 06 14:40:39.803964 2020 [IPMI.notice]: 0b02 | c0 | OEM: f9ff7020ff2a | ManufId: 150300 | Undefined
Record 57: Mon Jul 06 14:40:40.213512 2020 [IPMI.warning]: Error while reading sensor number : 43
Record 58: Mon Jul 06 14:40:40.220325 2020 [IPMI.notice]: 0c02 | c0 | OEM: f9ff7020ff2b | ManufId: 150300 | Undefined
Record 59: Mon Jul 06 14:40:29.000000 2020 [Controller.notice]: Appliance panic. See logs for cause of panic.
Record 60: Mon Jul 06 14:40:55.945497 2020 [IPMI.notice]: 0d02 | 02 | EVT: 6f406fff | Sensor 255 | Assertion Event, "Storage OS stop/shutdown"
Record 61: Mon Jul 06 14:40:56.226475 2020 [Agent.notice]: 597.574: 11 : Controller Attention LED asserted
Record 62: Mon Jul 06 14:40:56.830983 2020 [Agent.notice]: 202.552: 49 : PCH Platform Reset asserted
Record 63: Mon Jul 06 14:40:56.831147 2020 [Agent.notice]: 202.552: 29 : Non-maskable Interrupt from PCH to CPU de-asserted
Record 64: Mon Jul 06 14:40:56.831291 2020 [Agent.notice]: 202.612: 63 : BIOS Complete from PCH de-asserted
Record 65: Mon Jul 06 14:40:56.831884 2020 [Agent.notice]: 203.592: 42 : CPU 1 Correctable Error 2 de-asserted
Record 66: Mon Jul 06 14:40:56.839826 2020 [Agent.notice]: 211.285: 49 : PCH Platform Reset de-asserted
Record 67: Mon Jul 06 14:40:56.908653 2020 [SP.critical]: Filer Reboots
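To pick the IOXM-related records out of a longer "events all" capture, a simple text filter is enough. This is a minimal sketch that assumes the capture has been saved as plain text; the record pattern is inferred from the sample above, and the keyword list and function name are hypothetical choices for this symptom, not part of any NetApp tool.

```python
import re

# A few records from the "events all" example above (Record 37 is included
# to show a line that the filter skips).
EVENTS = """\
Record 28: Mon Jul 06 14:40:29.620402 2020 [Agent.notice]: 990.526: 160 : IOXM Fan_B1 Present de-asserted
Record 32: Mon Jul 06 14:40:29.691429 2020 [Agent.notice]: 062.792: 29 : Non-maskable Interrupt from PCH to CPU asserted
Record 37: Mon Jul 06 14:40:30.041525 2020 [IPMI.warning]: Error while reading sensor number : 44
Record 45: Mon Jul 06 14:40:32.778577 2020 [IPMI.notice]: 0602 | 02 | EVT: 6f01ffff | IOfan1_Present | Assertion Event, "Absent"
Record 59: Mon Jul 06 14:40:29.000000 2020 [Controller.notice]: Appliance panic. See logs for cause of panic.
Record 67: Mon Jul 06 14:40:56.908653 2020 [SP.critical]: Filer Reboots
"""

# Record number, timestamp ending in the year, [source.severity], message.
RECORD = re.compile(r"^\s*Record (\d+): (.+? \d{4}) \[([\w.]+)\]: (.*)$")

# Hypothetical keyword list for this symptom.
KEYWORDS = ("IOXM", "IOfan", "Non-maskable Interrupt", "panic", "Reboots")


def matching_records(text):
    """Yield (record number, source.severity, message) for records that mention a keyword."""
    for line in text.splitlines():
        m = RECORD.match(line)
        if m and any(k in m.group(4) for k in KEYWORDS):
            yield int(m.group(1)), m.group(3), m.group(4)


for num, source, msg in matching_records(EVENTS):
    print(num, source, msg)
```

The filter keeps the records in log order rather than sorting by timestamp, because the log itself is not strictly chronological (Record 59 carries an earlier timestamp than the records around it).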
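- The SP console command "system sensors" cannot read the status of the sensors around the IOXM; the readings show "na" and the IO fans show "Absent":
Example: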
Sensor Name      | Current    | Unit       | Status     | LCR       | LNC       | UNC       | UCR
-----------------+------------+------------+------------+-----------+-----------+-----------+-----------
IO_InFlow_Temp   | na         | degrees C  | na         | 0.000     | 10.000    | 53.000    | 63.000
IO_OutFlow_Temp  | na         | degrees C  | na         | 0.000     | 10.000    | 62.000    | 72.000
IO_Riser_R_Temp  | na         | degrees C  | na         | 0.000     | 10.000    | 54.000    | 64.000
IO_Riser_L_Temp  | na         | degrees C  | na         | 0.000     | 10.000    | 53.000    | 63.000
IO_12V           | na         | Volts      | na         | 0.000     | 0.000     | 32.130    | 32.130
IO_12V_Curr      | na         | Amps       | na         | 0.000     | 0.000     | 63.750    | 63.750
IO_STDBY_12V     | na         | Volts      | na         | 0.000     | 0.000     | 32.130    | 32.130
IO_STDBY_12V_Cur | na         | Amps       | na         | na        | na        | 3.188     | 3.188
IOfan1_Present   | 0x0        | discrete   | Absent     | na        | na        | na        | na
IOfan1_Fault     | na         | discrete   | na         | na        | na        | na        | na
IOfan1_F1_Speed  | na         | RPM        | na         | 1950.000  | 2040.000  | na        | na
IOfan1_F2_Speed  | na         | RPM        | na         | 1950.000  | 2040.000  | na        | na
IOfan2_Present   | 0x0        | discrete   | Absent     | na        | na        | na        | na
IOfan2_Fault     | na         | discrete   | na         | na        | na        | na        | na
IOfan2_F1_Speed  | na         | RPM        | na         | 1950.000  | 2040.000  | na        | na
IOfan2_F2_Speed  | na         | RPM        | na         | 1950.000  | 2040.000  | na        | na
IOfan3_Present   | 0x0        | discrete   | Absent     | na        | na        | na        | na
IOfan3_Fault     | na         | discrete   | na         | na        | na        | na        | na
IOfan3_F1_Speed  | na         | RPM        | na         | 1950.000  | 2040.000  | na        | na
IOfan3_F2_Speed  | na         | RPM        | na         | 1950.000  | 2040.000  | na        | na
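The sensor output can also be screened with a short script. This is a minimal sketch that assumes the "system sensors" output has been saved as text in the pipe-separated layout shown above; it flags IO* sensors whose current reading is "na" or whose status is "Absent". The function names and the flagging rule are hypothetical choices for this symptom, not a NetApp tool.

```python
# Subset of rows from the "system sensors" example above.
SENSORS = """\
Sensor Name      | Current    | Unit       | Status     | LCR       | LNC       | UNC       | UCR
-----------------+------------+------------+------------+-----------+-----------+-----------+-----------
IO_InFlow_Temp   | na         | degrees C  | na         | 0.000     | 10.000    | 53.000    | 63.000
IO_12V           | na         | Volts      | na         | 0.000     | 0.000     | 32.130    | 32.130
IOfan1_Present   | 0x0        | discrete   | Absent     | na        | na        | na        | na
IOfan1_Fault     | na         | discrete   | na         | na        | na        | na        | na
"""


def parse_sensors(text):
    """Turn the pipe-separated sensor table into a list of {column: value} dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [h.strip() for h in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        if set(line.strip()) <= set("-+"):
            continue  # skip the separator row
        cells = [c.strip() for c in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def unreadable_io_sensors(rows):
    """Names of IO* sensors with no reading (na) or an Absent status."""
    return [
        r["Sensor Name"]
        for r in rows
        if r["Sensor Name"].startswith("IO")
        and (r["Current"] == "na" or r["Status"] == "Absent")
    ]


print(unreadable_io_sensors(parse_sensors(SENSORS)))
```

Matching on column names from the header row, rather than on fixed character positions, keeps the parser working even when the column spacing of the captured output varies.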