Many computing system environments require continuous availability and
high operational readiness. The ability to find, diagnose, and correct
actual faults and potential faults in these systems is a high priority.
Combining a continually updated database of computing system performance
with the ability to analyze that information to detect faults, and then
communicating that fault information either to correct the fault or to
provide appropriate notification of it, achieves the goals of high
availability and operational readiness. FIG. 1 shows how the data
collectors, fault detectors, and policy actions are combined to meet those
goals.
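The following is a minimal sketch of that arrangement. Data collectors feed a continually updated performance database, fault detectors analyze it, and a policy either corrects each fault or issues a notification. All names (`PerformanceDatabase`, `threshold_detector`, `make_policy`, `run_cycle`, the metric names) are hypothetical. The simple threshold test stands in for whatever fault analysis an actual system performs and is not meant to represent the method of FIG. 1.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Fault:
    metric: str
    value: float
    description: str


@dataclass
class PerformanceDatabase:
    """Continually updated store of performance samples, keyed by metric name."""
    samples: Dict[str, List[float]] = field(default_factory=dict)

    def record(self, metric: str, value: float) -> None:
        self.samples.setdefault(metric, []).append(value)

    def latest(self, metric: str) -> Optional[float]:
        values = self.samples.get(metric)
        return values[-1] if values else None


# Hypothetical component signatures for collectors, detectors, and policy actions.
Collector = Callable[[], Dict[str, float]]
Detector = Callable[[PerformanceDatabase], List[Fault]]
PolicyAction = Callable[[Fault], str]


def threshold_detector(metric: str, limit: float) -> Detector:
    """Assumed example detector: flags a fault when the latest sample exceeds a limit."""
    def detect(db: PerformanceDatabase) -> List[Fault]:
        value = db.latest(metric)
        if value is not None and value > limit:
            return [Fault(metric, value, f"{metric}={value} exceeds {limit}")]
        return []
    return detect


def make_policy(corrective: Dict[str, Callable[[Fault], str]]) -> PolicyAction:
    """Run a corrective action if one is registered for the metric; otherwise notify."""
    def act(fault: Fault) -> str:
        action = corrective.get(fault.metric)
        if action is not None:
            return action(fault)
        return f"NOTIFY: {fault.description}"
    return act


def run_cycle(db: PerformanceDatabase, collectors: List[Collector],
              detectors: List[Detector], policy: PolicyAction) -> List[str]:
    # 1. Data collectors update the performance database.
    for collect in collectors:
        for metric, value in collect().items():
            db.record(metric, value)
    # 2. Fault detectors analyze the accumulated data.
    faults = [fault for detect in detectors for fault in detect(db)]
    # 3. Policy actions correct each fault or provide notification of it.
    return [policy(fault) for fault in faults]


if __name__ == "__main__":
    db = PerformanceDatabase()
    collectors = [lambda: {"cpu_util": 97.0, "disk_used_pct": 92.0}]
    detectors = [threshold_detector("cpu_util", 90.0),
                 threshold_detector("disk_used_pct", 85.0)]
    policy = make_policy({"disk_used_pct": lambda f: f"CORRECT: purge temp files ({f.description})"})
    print(run_cycle(db, collectors, detectors, policy))
```

In this usage, the CPU fault has no registered corrective action, so it produces a notification. The disk fault maps to a (hypothetical) corrective action. Keeping detection and policy as separate, pluggable callables mirrors the separation of fault detectors from policy actions described above.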