Detecting Double-Bit Errors
If ECC detects a double-bit error, it logs the error, stops the main processor on
the controller, and takes the SRP module offline.
The following message appears on the console when ECC detects a double-bit error:
ALERT 05/10/2004 13:10:33 os: failed: ECC DOUBLE BIT ERROR OCCURRED
ALERT 05/10/2004 13:10:34 os: PROCESSOR EXCEPTION: 0x200n
Address = 0xe95db10
Data (Upper 32Bits) = 0xe95db20
Data (Lower 32Bits) = 0x55d06c
ECC Data Bits = 0x2b
ECC 1Bit Error Counter = 0x0
*** YOU MUST PERFORM A HARD RESET TO CONTINUE ***
When ECC detects a double-bit error in a system that contains a redundant SRP
module, the redundant module becomes active and the system continues to operate.
However, you must still troubleshoot the SRP module with the double-bit error. When
ECC detects a double-bit error in a system that does not contain a redundant SRP
module, you must troubleshoot the SRP module immediately. See "Fixing Double-Bit
Errors" on page 87.
Fixing Double-Bit Errors
To fix a double-bit error:
1. Remove the second SRP module, if there is one.
2. Reboot the system with the module reset button on the primary SRP module.
(See Figure 5 on page 8.)
These actions attempt to correct a transient double-bit error. However, if the console
displays a memory test failure for the SRP module after you reboot, or if the FAIL
LED on the SRP module stays on during rebooting, the SDRAM is permanently
damaged and needs replacing. In this event, call the Juniper Networks Technical
Assistance Center to arrange for repair.
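If you capture console output to a file, the double-bit error message described under "Detecting Double-Bit Errors" can be picked out automatically. The following Python sketch is illustrative only and is not part of the system software. It assumes the message text and the "name = hex value" field layout match the sample console output in this section; the function name parse_double_bit_error and the variable names are hypothetical.

```python
import re

# Marker text copied from the sample console output in this section.
# Assumption: other software releases may word the alert differently.
ECC_MARKER = "ECC DOUBLE BIT ERROR OCCURRED"

# Matches diagnostic lines of the form "Name = 0x1234".
FIELD_RE = re.compile(r"^\s*(?P<name>[^=]+?)\s*=\s*(?P<value>0x[0-9a-fA-F]+)\s*$")


def parse_double_bit_error(console_text):
    """Return the diagnostic fields as a dict, or None if no double-bit error is reported."""
    if ECC_MARKER not in console_text:
        return None
    fields = {}
    for line in console_text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            # Store each hexadecimal value as an integer.
            fields[match.group("name")] = int(match.group("value"), 16)
    return fields


# Sample text taken from the console message shown in this section.
sample = """\
ALERT 05/10/2004 13:10:33 os: failed: ECC DOUBLE BIT ERROR OCCURRED
ALERT 05/10/2004 13:10:34 os: PROCESSOR EXCEPTION: 0x200n
Address = 0xe95db10
Data (Upper 32Bits) = 0xe95db20
Data (Lower 32Bits) = 0x55d06c
ECC Data Bits = 0x2b
ECC 1Bit Error Counter = 0x0
*** YOU MUST PERFORM A HARD RESET TO CONTINUE ***
"""

report = parse_double_bit_error(sample)
if report is not None:
    for name, value in report.items():
        print(f"{name}: {value:#x}")
```

A result other than None means the SRP module needs the troubleshooting steps in "Fixing Double-Bit Errors". The parsed address and data values may be useful to have on hand if you contact the Juniper Networks Technical Assistance Center.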