DEC AlphaServer 8200 Technical Manual page 72

The CSR registers contain information about the error. The commander's
TLBER register contains either correctable or uncorrectable error status,
and the TLFADRn registers contain the command code, bank number, and
possibly the address. If TLSB_DATA_ERROR is asserted, the node that
transmitted the data will have set <DTDE>. If <DTDE> is set in a
memory node, only two nodes were involved in the data transfer. If
<DTDE> is set in a node with cache, that node is a third node that
transmitted dirty data; in this case <DTDE> is not set in the memory
node. Error bits in the node that transmitted the data provide
information about where the error originated.
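
The <DTDE> rule above can be sketched in Python. This is only an
illustration: the Node class, its fields, and find_transmitter are
hypothetical names, and the <DTDE> state is assumed to have already been
read from each node's CSRs by some other means.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical snapshot of one TLSB node (not a structure from the manual)."""
    name: str
    is_memory: bool  # True for a memory node, False for a node with cache
    dtde: bool       # state of <DTDE> as read from the node's CSRs


def find_transmitter(nodes: List[Node]) -> Optional[Tuple[Node, str]]:
    """Return the node that has <DTDE> set and the kind of transfer it implies."""
    for node in nodes:
        if node.dtde:
            if node.is_memory:
                # <DTDE> in memory: only the commander and memory were involved.
                return node, "two-node transfer"
            # <DTDE> in a node with cache: a third node supplied dirty data.
            return node, "third node transmitted dirty data"
    return None  # no node reports <DTDE>
```

For example, given a CPU node with <DTDE> clear and a memory node with
<DTDE> set, the sketch reports the memory node and a two-node transfer.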
Correctable read data error interrupts may be disabled. This is usually
done after the system has logged a number of these errors and could
discontinue logging, but software prefers to continue collecting error
information. The system can continue to operate reliably while software
polls for error information, because the data will be corrected and
multiple-bit errors will still cause interrupts. Excessive single-bit read
data errors usually indicate a failing memory, which should eventually be
replaced. The system has probably already logged enough errors to identify
the faulty memory module.
Disabling correctable read data errors involves setting <CRDD> in the
TLCNR register of all nodes in the system. The <CRDD> bit tells all nodes
not to assert TLSB_DATA_ERROR on correctable read data errors.
Commander nodes must also provide a means to disable any other actions
they would normally take to inform the data requester of the error,
usually an interrupt to a CPU.
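
A minimal sketch of the "set <CRDD> in every node" step, using a plain
dictionary in place of real register access. The bit position in
CRDD_MASK is a placeholder chosen for illustration and is NOT taken from
this manual; the function names are also hypothetical.

```python
from typing import Dict

# Placeholder bit position for <CRDD> in TLCNR. This value is an assumption
# for illustration only; consult the TLCNR register description for the real one.
CRDD_MASK = 1 << 0


def disable_crd_reporting(tlcnr_by_node: Dict[int, int]) -> Dict[int, int]:
    """Return new TLCNR values with <CRDD> set for every node, other bits unchanged."""
    return {node: value | CRDD_MASK for node, value in tlcnr_by_node.items()}


def crd_reporting_disabled(tlcnr_by_node: Dict[int, int]) -> bool:
    """True only if <CRDD> is set in the TLCNR value of every node."""
    return all(value & CRDD_MASK for value in tlcnr_by_node.values())
```

The check over all nodes reflects the requirement that the bit be set
system-wide; a single node left unchanged could still assert
TLSB_DATA_ERROR on a correctable read data error.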
Error detection is not disabled. Error bits will still be set in the CSR
registers of all nodes that detect a correctable read data error. Memory
nodes will still latch the address of the first such error in the TLFADRn
registers. A CPU may poll these CSR registers to see if the errors are
still occurring. If a correctable data error occurs on a write, or any
uncorrectable data error occurs, the status registers are overwritten and
the requester is interrupted.
Double-bit error interrupts cannot be disabled.
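
The polling described above might look like the following sketch. The
read_status callback, the NodeStatus class, and its fields are
hypothetical stand-ins for whatever CSR access mechanism the software
actually uses; only the idea of checking <CRDE> and the latched address
comes from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class NodeStatus:
    """Hypothetical decoded view of one node's error CSRs."""
    crde: bool                      # <CRDE> observed in TLBER
    latched_address: Optional[int]  # address from TLFADRn, if the node latched one


def poll_correctable_errors(
    read_status: Callable[[int], NodeStatus],
    node_ids: Iterable[int],
) -> Dict[int, Optional[int]]:
    """Return {node_id: latched address} for every node reporting <CRDE>."""
    found = {}
    for node_id in node_ids:
        status = read_status(node_id)  # caller-supplied CSR read (assumed)
        if status.crde:
            found[node_id] = status.latched_address
    return found
```

A CPU could call such a routine periodically; an empty result suggests
the correctable errors are no longer occurring.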
The error bits in the transmitting node are interpreted as follows:
1. If the transmitting node has no error bits set, the data became
   corrupted either in the commander's receivers or on the bus between
   the two nodes.
2. If the transmitting node has <CRDE> (correctable read data error) or
   <UDE> (uncorrectable data error) set in the TLBER register, the data
   was corrupted at the transmitting node, but analysis of the TLESRn
   registers is necessary to learn more. Which of the four TLESRn
   registers to look at can be determined by which DSn bits are set in
   the TLBER register. If <TCE> is set, the node failed while writing the
   data to the bus. This is most likely a hardware failure on the module,
   but could also be the result of another node driving data at the same
   time or a bus failure.
3. If the transmitting node has <CRDE> or <UDE> set in the TLBER
   register but not <TCE> in the TLESRn register, the data is most likely
   corrupted in storage (cache or memory). If the transmitting node is a
   memory, the address is definitely latched in the node's TLFADRn
   registers and that physical address could be tested and possibly
   mapped as bad and not used again.
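
The three cases for the transmitting node can be summarized as a small
decision function. This is a sketch under stated assumptions: the
TransmitterStatus class and the returned labels are hypothetical, the
flags are assumed to be already decoded from TLBER and from the TLESRn
register selected by the DSn bits, and the combination of <TCE> without
<CRDE> or <UDE> is not covered by the text, so it is reported as
unclassified.

```python
from dataclasses import dataclass


@dataclass
class TransmitterStatus:
    """Hypothetical decoded error state of the node that transmitted the data."""
    crde: bool       # <CRDE> in TLBER
    ude: bool        # <UDE> in TLBER
    tce: bool        # <TCE> in the TLESRn register selected by the DSn bits
    is_memory: bool  # True if the transmitting node is a memory node


def diagnose(status: TransmitterStatus) -> str:
    """Map the transmitting node's error bits to the likely origin of the error."""
    data_error = status.crde or status.ude
    if not data_error and not status.tce:
        # Case 1: nothing flagged at the transmitter.
        return "corrupted on the bus or in the commander's receivers"
    if not data_error:
        # <TCE> alone is not described in the text.
        return "unclassified"
    if status.tce:
        # Case 2 with <TCE>: failure while writing the data to the bus.
        return "transmitting node failed while driving the bus"
    # Case 3: data error without <TCE>.
    if status.is_memory:
        return "corrupted in memory; test the address latched in TLFADRn"
    return "corrupted in cache"
```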

This manual is also suitable for:

Alphaserver 8400
