Integrated Management Log
The Integrated Management Log records system events, critical errors,
power-on message errors, and memory errors. The log also records
catastrophic hardware and software errors that typically cause a system
to fail. This information helps you quickly identify and correct the
problem and minimize downtime.
Each event log entry has a status that identifies the severity of the event
(OID: 1.3.6.1.4.1.232.6.2.11.3.1.2 - cpqHeEventLogEntrySeverity):
Informational - General information about a system event.
Repaired - Indication that this entry has been repaired. Users must mark entries as repaired.
Caution - Indication that a non-fatal error condition has occurred.
Critical - A component of the system has failed.
If any events in the log have a condition of Caution, the overall log
condition is marked as degraded. If Critical events exist in the log, the
overall log condition is marked as failed.
To clear a degraded or failed event log, mark the log entry as repaired
after you have repaired the condition that caused the log entry to be
generated. Perform the following steps:
1. Highlight the log entries in the Integrated Management Log.
2. Click the [Mark Repaired] button. This button is located at the bottom
of the Integrated Management Log section of the Web browser.
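
The rule for the overall log condition described above can be sketched in a few lines of Python. This is only an illustration, not the agent's actual implementation: the names Severity and overall_log_condition are hypothetical, and the "ok" result for a log with no Caution or Critical entries is an assumption, since the text names only the degraded and failed conditions.

```python
from enum import Enum


class Severity(Enum):
    """The four entry severities listed for the Integrated Management Log."""
    INFORMATIONAL = "Informational"
    REPAIRED = "Repaired"
    CAUTION = "Caution"
    CRITICAL = "Critical"


def overall_log_condition(severities):
    """Derive the overall log condition from the severities of its entries.

    Critical takes precedence over Caution. The "ok" value is an assumed
    name for a log that is neither degraded nor failed.
    """
    severities = list(severities)
    if Severity.CRITICAL in severities:
        return "failed"
    if Severity.CAUTION in severities:
        return "degraded"
    return "ok"


# Marking the Caution entry as Repaired clears the degraded condition.
before = [Severity.INFORMATIONAL, Severity.CAUTION]
after = [Severity.INFORMATIONAL, Severity.REPAIRED]
print(overall_log_condition(before))  # degraded
print(overall_log_condition(after))   # ok
```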
The description column gives a brief description of the error or event
(OID: 1.3.6.1.4.1.232.6.2.11.3.1.8 - cpqHeEventLogErrorDesc). The update
time column contains the last time this log entry was updated
(OID: 1.3.6.1.4.1.232.6.2.11.3.1.7 - cpqHeEventLogUpdateTime); the time
the entry was first logged is available as
OID: 1.3.6.1.4.1.232.6.2.11.3.1.6 - cpqHeEventLogInitialTime. The status
column contains the status of the log entry
(OID: 1.3.6.1.4.1.232.6.2.11.3.1.2 - cpqHeEventLogEntrySeverity). Refer to
the Integrated Management Log User Guide for more information.
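
When the log is read over SNMP rather than through the Web browser, each value is addressed by a column OID followed by the index of the log entry. The sketch below builds such instance OIDs from the OID and object-name pairs quoted in this section. It assumes the table is indexed by a single integer entry number; the helper name instance_oid is hypothetical, and no SNMP request is actually sent.

```python
# Column OIDs and object names as quoted in this section.
EVENT_LOG_COLUMNS = {
    "cpqHeEventLogEntrySeverity": "1.3.6.1.4.1.232.6.2.11.3.1.2",
    "cpqHeEventLogInitialTime": "1.3.6.1.4.1.232.6.2.11.3.1.6",
    "cpqHeEventLogUpdateTime": "1.3.6.1.4.1.232.6.2.11.3.1.7",
    "cpqHeEventLogErrorDesc": "1.3.6.1.4.1.232.6.2.11.3.1.8",
}


def instance_oid(column_name, entry_index):
    """Return the OID of one column for one log entry.

    Assumes a single integer index per entry (an assumption about the
    table layout, not something this section states).
    """
    if entry_index < 0:
        raise ValueError("entry_index must be non-negative")
    return f"{EVENT_LOG_COLUMNS[column_name]}.{entry_index}"


# OID of the description of log entry 3.
print(instance_oid("cpqHeEventLogErrorDesc", 3))
```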
Critical Errors
The Critical Error Log records non-correctable memory errors, as well as
catastrophic hardware and software errors that cause a system to fail.
This information helps you quickly identify and correct the problem,
minimizing downtime.
This section displays a description of critical errors. The date and time
of each error are followed by a brief description of the error. The time
shown is rounded to the nearest hour.
If critical errors are marked with an exclamation point (!), indicating
corrective action is required, the log condition is degraded. To eliminate
the exclamation point and indicate that an entry has been corrected,
select the entries you wish to clear and click the [Correct Marked Entries]
button or run Diagnostics on the device. An asterisk (*) indicates the log
entry to which the Last Failure Message applies.
The following list displays errors that may be logged. If you receive any
of these errors, run Diagnostics on your system or consult your software
documentation.
Abnormal Program Termination - The device has detected a fatal software error resulting in a device failure.
ASR Base Memory Parity Error - The system detected a data error in base memory following a reset due to an ASR timeout.
ASR Extended Memory Parity Error - The system detected a data error in extended memory following a reset due to an ASR timeout.
ASR Memory Parity Error - The system ROM was unable to allocate enough memory to create a stack. It was unable to put a message on the screen or continue booting the server.
ASR Reset Limit Reached - The maximum number of system resets has been reached. The HP System Configuration Utility will be loaded.
ASR Reset Occurred - No error data is logged.
ASR Test Event - An ASR Test event was generated by the user through the system utilities. No action is required since the event was user-generated to test the ASR configuration.
ASR Timeout NMI - The server has generated an ASR NMI because the ASR timer has not been refreshed. This generally indicates a driver has not relinquished control of the processor causing a server failure. The resulting ASR NMI was generated to log this event. Note the module that was executing.
CPU Internal Corrected Error Threshold Exceeded - The system has detected that a CPU has exceeded the threshold for the number of internal ECC cache errors.
CPU Processor Power Module Failed - The system has detected that a processor's power module has failed.
Critical Temperature - The system's critical temperature has been exceeded and auto shutdown has been initiated.
Error Detected On Bootup - The system detected an error during the Power-On Self-Test.
Exception - The processor has detected a critical exception resulting in a device failure.
Fan Failure - The system or processor fan failed.
NMI - CPU Local Error - The processor experienced a fatal error resulting in a device failure.
NMI - Expansion Board Error - A board on the expansion bus indicated an error condition causing a device failure.
NMI - Expansion Bus Arbitration Error - Memory refresh cycles were delayed, potentially leading to data loss. The error results in a system failure.
NMI - Expansion Bus Master Time-out - A bus master expansion board in the indicated slot did not release the bus after its maximum time resulting in a device failure.
NMI - Expansion Bus Slave Time-out - A board on the expansion bus delayed a bus cycle beyond the maximum time resulting in a device failure.
NMI - Failsafe Timer Expiration - The software was unable to reset the system failsafe timer, resulting in a system failure.
NMI - Processor Address Error 1 - A processor internal address parity checking error occurred, resulting in a device failure.
NMI - Processor Address Error 2 - The processor detected an address parity error during an inquire cycle.
NMI - Processor Cache Parity Error - A data error occurred within the processor cache, resulting in a system failure.
NMI - Processor Internal Error 1 - A processor internal parity error occurred, resulting in a device failure.
NMI - Processor Internal Error 2 - The processor detected an internal parity error or a functional redundancy error.
NMI - Processor Parity Error - The processor detected a data error resulting in a device failure.
NMI - Software Generated Interrupt - Software indicated a system error resulting in a system failure.
NMI - System Concurrency Error - A potential error condition was detected within the Data Flow Manager, resulting in a system failure.
NMI - Uncorrectable Memory Error - The device experienced an uncorrectable memory parity error resulting in a device failure.
NMI - Unknown Error Type - The device driver does not recognize this NMI. You may need to upgrade your health driver.
Processor Failure - The processor failed during the Power-On Self-Test.
Server Manager Failure - An error occurred in the server interface with the Server Manager/R.
UPS A/C Line Failure/Shutdown or Battery Low - The device has initiated a UPS or operating system shutdown, or the battery is almost depleted after an AC line failure.
The Last Failure Message on this window displays
the last failure message associated with a critical error.
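
The Critical Errors section notes that the time shown for each error is rounded to the nearest hour. As an illustration of what that rounding means for a given timestamp, here is a minimal Python sketch. The function name is hypothetical, and rounding an exact half hour upward is an assumption, since the text does not say how ties are handled.

```python
from datetime import datetime, timedelta


def round_to_nearest_hour(timestamp):
    """Round a datetime to the nearest whole hour.

    Assumption: a time exactly 30 minutes past the hour rounds up.
    """
    floor = timestamp.replace(minute=0, second=0, microsecond=0)
    if timestamp - floor >= timedelta(minutes=30):
        return floor + timedelta(hours=1)
    return floor


# An error at 14:29:59 is shown as 14:00; one at 23:45 rolls to the next day.
print(round_to_nearest_hour(datetime(2004, 5, 17, 14, 29, 59)))
print(round_to_nearest_hour(datetime(2004, 5, 17, 23, 45, 0)))
```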
Correctable Errors
This alarm indicates that a block of memory has failed or is failing and
may need to be replaced. This condition is generally non-critical since
the memory controller can correct the problem. However, this type of error
indicates that a memory component is failing or has failed in the system
issuing the alarm. The system continues to correct any errors it can.
Memory errors are corrected by the ECC memory subsystem when they occur.
If you notice an increase in these errors, correct the problems as soon as
possible. Further degradation of the memory components may occur, and then
errors may no longer be correctable.
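
One way to notice an increase in correctable errors is to record the cumulative error count at regular intervals and compare the newest interval against the earlier ones. The sketch below illustrates that idea only: the function name, the sampling scheme, and the comparison against the average of earlier intervals are all assumptions, not behavior of the management agents.

```python
def error_rate_increased(cumulative_counts):
    """Return True if the latest interval logged more corrected errors
    than the average of the earlier intervals.

    cumulative_counts holds the running total of corrected memory errors,
    sampled at regular intervals (hypothetical data supplied by the caller).
    """
    deltas = [later - earlier
              for earlier, later in zip(cumulative_counts, cumulative_counts[1:])]
    if len(deltas) < 2:
        return False  # not enough history to compare against
    baseline = sum(deltas[:-1]) / len(deltas[:-1])
    return deltas[-1] > baseline


# One new error per interval, then five in the latest interval.
print(error_rate_increased([0, 1, 2, 3, 8]))  # True
```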
Power On Messages
This section displays the Power-On messages
logged when the device was turned on. Refer to your device documentation
for a listing of possible Power-On error messages and their meanings.
Click the [Clear Power-On Message] button to clear the power-on message
log. This button is only available if there are messages to clear.
Remote Insight Log
The Event Log section displays the list of events stored in the Remote
Insight Board event log. A user with the appropriate authority can clear
these events. Each event includes the following information:
Index displays a numeric index for each event.
Time of Event displays the time the event occurred. OID: 1.3.6.1.4.1.232.9.2.3.2.1.3 - cpqSm2EventLogDate
Description displays a text description of the event. OID: 1.3.6.1.4.1.232.9.2.3.2.1.4 - cpqSm2EventLogMessage
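
The three fields above map naturally onto a small record type. The following Python sketch groups values keyed by instance OID into one record per event, using the two column OIDs quoted in this section. It assumes each event is indexed by a single integer appended to the column OID and that values arrive as strings. The names RemoteInsightEvent and events_from_walk are hypothetical, and the sample values are made up for illustration.

```python
from dataclasses import dataclass

# Column OIDs as quoted in this section.
DATE_COLUMN = "1.3.6.1.4.1.232.9.2.3.2.1.3"      # cpqSm2EventLogDate
MESSAGE_COLUMN = "1.3.6.1.4.1.232.9.2.3.2.1.4"   # cpqSm2EventLogMessage


@dataclass
class RemoteInsightEvent:
    index: int
    time_of_event: str
    description: str


def events_from_walk(walk):
    """Group {instance OID: value} pairs into events, ordered by index.

    Assumes the last OID component is the integer event index.
    """
    dates, messages = {}, {}
    for oid, value in walk.items():
        column, _, suffix = oid.rpartition(".")
        if column == DATE_COLUMN:
            dates[int(suffix)] = value
        elif column == MESSAGE_COLUMN:
            messages[int(suffix)] = value
    return [RemoteInsightEvent(i, dates[i], messages.get(i, ""))
            for i in sorted(dates)]


# Made-up sample data, not output from a real Remote Insight Board.
sample = {
    DATE_COLUMN + ".2": "sample time 2",
    MESSAGE_COLUMN + ".2": "sample message 2",
    DATE_COLUMN + ".1": "sample time 1",
    MESSAGE_COLUMN + ".1": "sample message 1",
}
for event in events_from_walk(sample):
    print(event.index, event.time_of_event, event.description)
```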