IBM Power 750 Technical Overview And Introduction page 180

Data that contains information about the effect that the repair will have on the system is also
included. Error log routines in the operating system and FSP can then use this information
and decide whether the fault is a call-home candidate. If the fault requires support
intervention, a call is placed with service and support, and a notification is sent to
the contact that is defined in the ESA guided setup wizard.
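The call-home decision described above can be sketched roughly as follows. This is a minimal illustration only: the `ErrorLogEntry` fields, the `handle_fault` function, and the contact string are hypothetical names chosen for the example, and the real error log format and routines in the operating system and FSP are not described in this text.

```python
from dataclasses import dataclass


@dataclass
class ErrorLogEntry:
    # Hypothetical fields; the actual error log layout is not given here.
    description: str
    requires_support: bool  # True if the fault needs support intervention
    repair_impact: str      # effect the repair will have on the system


def handle_fault(entry: ErrorLogEntry, contact: str) -> list:
    """Return the actions taken for one logged fault (illustrative only)."""
    actions = []
    if entry.requires_support:
        # Call-home candidate: place the call and notify the defined contact.
        actions.append("place call with service and support")
        actions.append(f"notify {contact}")
    return actions
```

A fault that does not require support intervention produces no actions in this sketch; it is simply left in the log.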
Remote support
The Resource Monitoring and Control (RMC) subsystem is delivered as part of the base
operating system, including the operating system that runs on the Hardware Management
Console. RMC provides a secure transport mechanism across the LAN interface between the
operating system and the Hardware Management Console and is used by the operating
system diagnostic application for transmitting error information. RMC also performs several
other functions, but these are not used by the service infrastructure.
Service Focal Point (SFP)
A critical requirement in a logically partitioned environment is to ensure that errors are not lost
before being reported for service, and that each error is reported only once, regardless
of how many logical partitions experience the potential effect of the error. The Manage
Serviceable Events task on the management console is responsible for aggregating duplicate
error reports, and it ensures that all errors are recorded for review and management.
When a locally or globally reported service request is made to the operating system, the
operating system diagnostic subsystem uses the RMC subsystem to relay error information
to the Hardware Management Console. For global events (platform unrecoverable errors,
for example), the service processor also forwards error notification of these events to
the Hardware Management Console, providing a redundant error-reporting path in case of
errors in the RMC subsystem network.
The first occurrence of each failure type is recorded in the Manage Serviceable Events task
on the management console. This task then filters and maintains a history of duplicate
reports from other logical partitions on the service processor. It then looks at all active service
event requests, analyzes the failure to ascertain the root cause, and, if enabled, initiates a call
home for service. This methodology ensures that all platform errors will be reported through
at least one functional path, ultimately resulting in a single notification for a single problem.
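The "record the first occurrence, keep duplicates as history, notify once" behavior can be illustrated with a small sketch. Everything here is an assumption made for the example: the `EventAggregator` class, the `failure_key` string used to identify a failure type, and the reporter names are hypothetical, and the root-cause analysis step is omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceableEvent:
    failure_key: str      # hypothetical identifier for a failure type
    first_reporter: str   # partition or service processor that reported first
    duplicates: list = field(default_factory=list)  # later reporters


class EventAggregator:
    """Illustrative stand-in for the aggregation done on the management console."""

    def __init__(self, call_home_enabled: bool = True):
        self.call_home_enabled = call_home_enabled
        self.events = {}        # failure_key -> ServiceableEvent
        self.calls_placed = []  # one entry per distinct problem

    def report(self, failure_key: str, reporter: str) -> bool:
        """Record a report; return True only for the first occurrence."""
        event = self.events.get(failure_key)
        if event is not None:
            # Duplicate from another path: keep it as history, do not re-notify.
            event.duplicates.append(reporter)
            return False
        self.events[failure_key] = ServiceableEvent(failure_key, reporter)
        if self.call_home_enabled:
            self.calls_placed.append(failure_key)
        return True
```

Reporting the same failure from two partitions and from the service processor leaves one serviceable event with two duplicates in its history and a single call-home entry, which mirrors the single notification for a single problem described above.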
Extended error data
Extended error data (EED) is additional data that is collected either automatically at the time
of a failure or manually at a later time. The data that is collected depends on the
invocation method but includes information such as firmware levels, operating system levels,
additional fault isolation register values, recoverable error threshold register values, system
status, and any other pertinent data.
The data is formatted and prepared for transmission back to IBM either to assist the service
support organization with preparing a service action plan for the service representative or for
additional analysis.
System-dump handling
In certain circumstances, an error might require a dump to be automatically or manually
created. In this event, the dump is off-loaded to the management console. Specific management
console information is included as part of the information that can optionally be sent to IBM
support for analysis. If additional information relating to the dump is required, or if viewing the
dump remotely becomes necessary, the management console dump record tells the IBM
support center which management console holds the dump.
IBM Power 750 and 760 Technical Overview and Introduction