One embodiment of the present invention provides a system that determines
the cause of a correctable memory error. First, the system detects a
correctable error during an access to a memory location in a main memory
by a first processor, wherein the correctable error is detected by error
detection and correction circuitry. Next, the system reads tag bits for a
cache line associated with the memory location, wherein the tag bits
contain address information for the cache line, as well as state
information indicating a coherency protocol state for the cache line. The
system then tests the memory location by causing the first processor to
perform read and write operations to the memory location to produce test
results. Finally, the system uses the test results and the tag bits to
determine the cause of the correctable error, if possible.
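
The four steps above (detect, read tag bits, test, determine cause) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the claimed implementation: the `SimMemory` fault model, the three coherency states, the test patterns, and above all the decision rules in `diagnose` are hypothetical and chosen only to show how test results and tag bits might be combined.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # Assumed subset of coherency protocol states; the text names none.
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


@dataclass
class TagBits:
    address: int   # address information for the cache line
    state: State   # coherency protocol state for the cache line


class SimMemory:
    """Toy 8-bit memory; ECC is modeled as an error flag on each read."""

    def __init__(self, stuck_at_zero=None):
        self.cells = {}
        # Hypothetical fault model: {address: index of a bit stuck at 0}.
        self.stuck_at_zero = stuck_at_zero or {}

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        """Return (corrected value, correctable-error flag)."""
        value = self.cells.get(addr, 0)
        bit = self.stuck_at_zero.get(addr)
        # The stuck bit is "corrected", but the event is still reported.
        error = bit is not None and bool(value & (1 << bit))
        return value, error


def probe_location(mem, addr, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write and read back test patterns; return the number of errors seen."""
    original, _ = mem.read(addr)
    errors = 0
    for pattern in patterns:
        mem.write(addr, pattern)
        _, err = mem.read(addr)
        errors += int(err)
    mem.write(addr, original)  # restore the original contents
    return errors


def diagnose(mem, tags):
    """Combine test results with tag bits (illustrative rules only)."""
    errors = probe_location(mem, tags.address)
    if errors > 0:
        return "persistent fault at memory location"
    if tags.state is State.MODIFIED:
        # Assumed rule: the line's state leaves the source ambiguous.
        return "undetermined"
    return "transient error"
```

For example, `diagnose(SimMemory(stuck_at_zero={0x1000: 3}), TagBits(0x1000, State.SHARED))` reports a persistent fault, because two of the four patterns set the stuck bit and so reproduce the error. The "undetermined" result mirrors the "if possible" qualifier in the text: in this sketch, some combinations of test results and tag state do not identify a cause.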