A system and method for automated recovery of processing of a unit of work
during an error in a batch processing system is disclosed. Generally, at
least a portion of a unit of work, together with instructions specifying
the operations to perform on that portion, is sent to a worker
data structure. A periodic heartbeat is received from the worker data
structure indicating the worker data structure is processing the at least
a portion of the unit of work. If an unexpected termination of the worker
data structure is detected, a signal is sent to a crash handler data
structure instructing the crash handler data structure to detect and
store a current input location of the at least a portion of the unit of work.
The records at the input location stored at the time of the crash are
skipped when the unit of work is reprocessed, increasing the likelihood
that reprocessing succeeds.
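
The recovery flow described above can be illustrated with a minimal, single-process sketch. It is not the disclosed implementation: the names `WorkerCrash`, `CrashHandler`, `run_worker`, `process_unit_of_work`, and `parse` are hypothetical, an exception stands in for an unexpected worker termination, and the heartbeat is assumed to carry the worker's current input location so that the crash handler can store it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Set, Tuple


class WorkerCrash(Exception):
    """Stands in for an unexpected termination of a worker (assumption)."""


@dataclass
class CrashHandler:
    """Stores the input locations that were current when a worker crashed."""
    bad_locations: Set[int] = field(default_factory=set)

    def record(self, location: int) -> None:
        self.bad_locations.add(location)


def run_worker(records: Sequence[str],
               operation: Callable[[str], int],
               skip: Set[int],
               heartbeat: Callable[[int], None]) -> List[int]:
    """Process each record, reporting a heartbeat before every record."""
    results = []
    for location, record in enumerate(records):
        heartbeat(location)      # tell the sender which record is current
        if location in skip:     # skip records stored by the crash handler
            continue
        results.append(operation(record))
    return results


def process_unit_of_work(records: Sequence[str],
                         operation: Callable[[str], int],
                         max_attempts: int = 3) -> Tuple[List[int], Set[int]]:
    """Send the work to a worker; on a crash, store the location and retry."""
    handler = CrashHandler()
    for _ in range(max_attempts):
        last_seen: List[Optional[int]] = [None]

        def heartbeat(location: int) -> None:
            last_seen[0] = location

        try:
            results = run_worker(records, operation,
                                 handler.bad_locations, heartbeat)
            return results, handler.bad_locations
        except WorkerCrash:
            if last_seen[0] is None:
                raise            # crashed before any heartbeat was received
            handler.record(last_seen[0])
    raise RuntimeError("unit of work failed after all reprocessing attempts")


def parse(record: str) -> int:
    """Hypothetical operation that crashes on a malformed record."""
    if not record.isdigit():
        raise WorkerCrash(f"cannot parse {record!r}")
    return int(record)


if __name__ == "__main__":
    results, skipped = process_unit_of_work(["1", "2", "x", "4"], parse)
    print(results, skipped)
```

In this sketch the first attempt crashes on the record at location 2, the handler stores that location, and the second attempt skips it and completes. A real batch system would detect termination through missed heartbeats from a separate process or machine rather than through an exception.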