CICS Transaction Server for z/OS Version 4 Release 1 Recovery and Restart Guide SC34-7012-02...
Before using this information and the product it supports, read the information in “Notices” on page 243. This edition applies to Version 4 Release 1 of CICS Transaction Server for z/OS (product number 5655-S97) and to all subsequent releases and modifications until otherwise indicated in new editions.
The COVR transaction
Messages associated with automatic restart
Automatic restart of CICS data-sharing servers
Server ARM processing
Chapter 8. Unit of work recovery and abend processing
Unit of work recovery
Input extrapartition data sets
Output extrapartition data sets
Using post-initialization (PLTPI) programs
Recovery for temporary storage
Backward recovery
Forward recovery
Recovery for Web services
Configuring CICS to support persistent messages
Defining local queues in a service provider
Persistent message processing
Bibliography
CICS books for CICS Transaction Server for z/OS
CICSPlex SM books for CICS Transaction Server for z/OS
What's New in the information center, or the following publications:
• CICS Transaction Server for z/OS What's New
• CICS Transaction Server for z/OS Upgrading from CICS TS Version 3.2
• CICS Transaction Server for z/OS Upgrading from CICS TS Version 3.1
• CICS Transaction Server for z/OS Upgrading from CICS TS Version 2.3...
In general, forward recovery is applicable to data set failures, or failures in similar data resources, which cause data to become unusable because it has been corrupted or because the physical storage medium has been damaged.
Minimizing the effect of failures
An online system should limit the effect of any failure.
Another way is to shut down CICS with an immediate shutdown and perform the forward recovery, after which a CICS emergency restart performs the backward recovery.
Recoverable resources
In CICS, a recoverable resource is any resource with recorded recovery information that can be recovered by backout.
v In the event of an emergency restart, when CICS backs out all those transactions that were in-flight at the time of the CICS failure (emergency restart backout). Although these occur in different situations, CICS uses the same backout process in each case.
The recovery manager also drives:
• The backout processing for any units of work that were in a backout-failed state at the time of the CICS failure
• The commit processing for any units of work that had not finished commit processing at the time of failure (for example, for resource definitions that were being installed when CICS failed)
• The commit processing for any units of work that were in a commit-failed state...
Forward recovery journal names are of the form DFHJnn where nn is a number in the range 1–99 and is obtained from the forward recovery log id (FWDRECOVLOG) in the FILE resource definition. In this case, CICS creates a journal entry for the forward recovery log, which can be mapped by a JOURNALMODEL resource definition.
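The naming rule above can be illustrated with a short sketch. The function name is hypothetical, and zero-padding of single-digit identifiers to two digits is an assumption made here to fit the DFHJnn pattern; the text only states that nn is a number in the range 1–99 taken from FWDRECOVLOG.

```python
def forward_recovery_journal_name(fwdrecovlog: int) -> str:
    """Build a DFHJnn journal name from a forward recovery log id.

    Hypothetical helper for illustration only; it is not a CICS API.
    """
    if not 1 <= fwdrecovlog <= 99:
        raise ValueError("FWDRECOVLOG must be in the range 1-99")
    # Assumption: single-digit ids are zero-padded to two digits (nn).
    return f"DFHJ{fwdrecovlog:02d}"


if __name__ == "__main__":
    print(forward_recovery_journal_name(7))
    print(forward_recovery_journal_name(42))
```

The resulting name is the journal entry that a JOURNALMODEL resource definition could then map to a log stream.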
2. If the failure occurs during the execution of a CICS syncpoint, where the conversation is with another resource manager (perhaps in another CICS region), CICS handles the resynchronization. This is described in the CICS Intercommunication Guide. If the link fails and is later reestablished, CICS and its partners use the SNA set-and-test-sequence-numbers (STSN) command to find out what they were doing (backout or commit) at the time of link failure.
When the operator replies to IXC402D, the CICS interregion communication program, DFHIRP, is notified, the suspended tasks are abended, and MRO connections are closed. Until the reply is issued to IXC402D, an INQUIRE CONNECTION command continues to show connections to regions in the failed MVS as in service and normal.
The CICS recovery manager then uses the information retrieved from the system log to:
• Back out recoverable resources.
• Recover changes to terminal resource definitions. (All resource definitions installed at the time of the CICS failure are initially restored from the CICS global catalog.)
A special case of CICS processing following a system failure is covered in Chapter 6, “CICS emergency restart,”...
For coupling facility data tables updated under the locking model, the coupling facility data table server stores the lock with its record in the CFDT. As in the case of RLS locks, storing the lock with its record in the coupling facility list structure that holds the coupling facility data table ensures sysplex-wide locking at record level.
– EXEC CICS CREATE CONNECTION COMPLETE
– EXEC CICS DISCARD CONNECTION
– EXEC CICS DISCARD TERMINAL
A UOW that does not change a recoverable resource has no meaningful effect for the CICS recovery mechanisms. Nonrecoverable resources are never backed out. A unit of work can also be ended by backout, which causes a syncpoint in one of the following ways:
• Implicitly when a transaction terminates abnormally, and CICS performs...
Figure 2. Backout of units of work (the figure shows tasks A, B, and C, their units of work and syncpoints, and the moment of system failure. Abbreviations: EOT: End of task; UOW: Unit of work; Mod: Modification to database; SOT: Start of task)
CICS recovery manager
The recovery manager ensures the integrity and consistency of resources (such as files and databases) both within a single CICS region and distributed over...
• Premature termination of the UOW because of transaction failure
• Receipt of a syncpoint request
• Entry into the indoubt period during two-phase commit processing (see the CICS Transaction Server for z/OS Glossary for a definition of two-phase commit)
• Notification that the resource is not available, requiring temporary suspension (shunting) of the UOW
• Notification that the resource is available, enabling retry of shunted UOWs
• Notification that a connection is reestablished, and can deliver a commit or rollback (backout) decision
• Syncpoint rollback
• Normal termination of the UOW...
others. This can happen, for example, if two data sets are updated and the UOW has to be backed out, and the following happens:
• One resource backs out successfully
• While committing this successful backout, the commit fails
• The other resource fails to back out
These events leave one data set commit-failed, and the other backout-failed.
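The two-data-set scenario can be modeled as a small sketch. The names `Outcome`, `resource_outcome`, and `uow_outcomes`, and the data set names used in the example, are hypothetical and exist only to show how one unit of work can end up with different failure states per resource; this is not how CICS represents the states internally.

```python
from enum import Enum


class Outcome(Enum):
    BACKED_OUT = "backed out"
    COMMIT_FAILED = "commit-failed"
    BACKOUT_FAILED = "backout-failed"


def resource_outcome(backout_ok: bool, commit_ok: bool) -> Outcome:
    """Classify one resource after an attempted backout of the UOW."""
    if not backout_ok:
        # The backout itself failed, so there is nothing to commit yet.
        return Outcome.BACKOUT_FAILED
    if not commit_ok:
        # The backout worked, but committing that backout failed.
        return Outcome.COMMIT_FAILED
    return Outcome.BACKED_OUT


def uow_outcomes(resources: dict) -> dict:
    """Map each resource name to its outcome.

    `resources` maps a name to a (backout_ok, commit_ok) pair.
    """
    return {name: resource_outcome(*flags) for name, flags in resources.items()}


if __name__ == "__main__":
    # Hypothetical data set names.
    result = uow_outcomes({"DATASET.A": (True, False), "DATASET.B": (False, True)})
    for name, outcome in result.items():
        print(name, "->", outcome.value)
```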
Resynchronization after system or connection failure Units of work that fail while in an indoubt state remain shunted until the indoubt state can be resolved following successful resynchronization with the coordinator. Resynchronization takes place automatically when communications are next established between subordinate and coordinator. Any decisions held by the coordinator are passed to the subordinate, and indoubt units of work complete normally.
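As a rough illustration of the resynchronization flow described above, the following sketch keeps indoubt units of work shunted until the coordinator supplies a decision. The `Subordinate` class, its method names, and the UOW identifiers are all hypothetical; the real exchange between CICS regions is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Subordinate:
    """Toy model of a subordinate region holding shunted indoubt UOWs."""
    shunted: dict = field(default_factory=dict)    # uow_id -> "indoubt"
    completed: dict = field(default_factory=dict)  # uow_id -> decision applied

    def fail_indoubt(self, uow_id: str) -> None:
        # A UOW that fails while indoubt is shunted, not resolved.
        self.shunted[uow_id] = "indoubt"

    def resynchronize(self, coordinator_decisions: dict) -> None:
        """Apply any decisions the coordinator holds for shunted UOWs."""
        for uow_id in list(self.shunted):
            decision = coordinator_decisions.get(uow_id)
            if decision in ("commit", "backout"):
                del self.shunted[uow_id]
                self.completed[uow_id] = decision
            # With no decision available, the UOW stays shunted.


if __name__ == "__main__":
    sub = Subordinate()
    sub.fail_indoubt("UOW1")
    sub.fail_indoubt("UOW2")
    sub.resynchronize({"UOW1": "commit"})
    print("completed:", sub.completed)
    print("still shunted:", list(sub.shunted))
```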
CICS also writes “backout-failed” records to the system log if a failure occurs in backout processing of a VSAM data set during dynamic backout or emergency restart backout. Records on the system log are used for cold, warm, and emergency restarts of a CICS region.
• User journaling is entirely under your application programs’ control. You write records for your own purpose using EXEC CICS WRITE JOURNALNAME commands. See “Flushing journal buffers” on page 28 for information about CICS shutdown considerations.
• Automatic journaling means that CICS automatically writes records to a log stream, referenced by the journal name specified in a journal model definition, as a result of: –...
v The DFHCESD program started by the CICS-supplied transaction, CESD, attempts to purge and back out long-running tasks using increasingly stronger methods (see “The shutdown assist transaction” on page 30). v Tasks that are automatically initiated are run—if they start before the second quiesce stage.
this indicator to determine the type of startup it is to perform. See “How the state of the CICS region is reconstructed” on page 34. v CICS writes warm keypoint records to: – The global catalog for terminal control and profiles –...
Flushing journal buffers
During a successful normal shutdown, CICS calls the log manager domain to flush all journal buffers, ensuring that all journal records are written to their corresponding MVS system logger log streams. During an immediate shutdown, the call to the log manager domain is bypassed and journal records are not flushed.
2. If the default shutdown assist transaction CESD is run, it allows as many tasks as possible to commit or back out cleanly, but within a shorter time than that allowed on a normal shutdown. See “The shutdown assist transaction” on page 30 for more information about CESD, which runs the CICS-supplied program DFHCESD.
The next initialization of CICS must be an emergency restart, in order to preserve data integrity. An emergency restart is ensured if the next initialization of CICS specifies START=AUTO. This is because the recovery manager’s type-of-restart indicator is set to “emergency-restart-needed” during initialization, and is not reset in the event of an immediate or uncontrolled shutdown.
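The behavior of the type-of-restart indicator under START=AUTO can be sketched as a tiny state model. The `Region` class and its methods are hypothetical. The value "emergency-restart-needed" comes from the text; the value "warm-start-possible" used for the state after a normal shutdown is an assumption made for this illustration.

```python
EMERGENCY = "emergency-restart-needed"
WARM = "warm-start-possible"  # assumed label for the state after a normal shutdown


def start_type_for_auto(indicator: str) -> str:
    """Choose the start type that START=AUTO would lead to in this model."""
    return "emergency" if indicator == EMERGENCY else "warm"


class Region:
    """Toy model of the recovery manager's type-of-restart indicator."""

    def __init__(self, indicator: str = WARM) -> None:
        self.indicator = indicator

    def initialize(self) -> str:
        start = start_type_for_auto(self.indicator)
        # During initialization the indicator is set to emergency-restart-needed.
        self.indicator = EMERGENCY
        return start

    def shutdown(self, kind: str) -> None:
        # Only a normal shutdown resets the indicator; an immediate or
        # uncontrolled shutdown leaves it unchanged.
        if kind == "normal":
            self.indicator = WARM


if __name__ == "__main__":
    region = Region()
    print(region.initialize())
    region.shutdown("immediate")
    print(region.initialize())
```

Because the indicator is set early and only cleared by a clean shutdown, any other way of stopping leads to an emergency restart without operator intervention.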
You are recommended always to use the CESD shutdown-assist transaction when shutting down your CICS regions. You can use the DFHCESD program “as is”, or use the supplied source code as the basis for your own customized version (CICS supplies versions in assembler, COBOL, and PL/I). For more information about the operation of the CICS-supplied shutdown assist program, see the CICS Operations and Utilities Guide.
- File control recovery blocks (only if a SHCDS NONRLSUPDATEPERMITTED command has been used). – Transient data queue definitions – Dump table information – Interval control elements and automatic initiate descriptors at shutdown – APPC connection information so that relevant values can be restored during a persistent sessions restart –...
If you ever need to redefine and reinitialize the CICS local catalog, you should also reinitialize the global catalog. After reinitializing both catalog data sets, you must perform an initial start.
Shutdown initiated by CICS log manager
The CICS log manager initiates a shutdown of the region if it encounters an error in the system log that indicates previously logged data has been lost.
and therefore recovery of the most recent units of work cannot be carried out. However, data might be missing from any part of the system log and CICS cannot identify what is missing. CICS cannot examine the log and determine exactly what data is missing, because the log data might appear consistent in itself even when CICS has detected that some data is missing.
Overriding the type of start indicator
The operation of the recovery manager's control record can be modified by running the recovery manager utility program, DFHRMUTL.
About this task
This can set an autostart record that determines the type of start CICS is to perform, effectively overriding the type of start indicator in the control record.
– APPC – MRO connections to regions running under CICS Transaction Server – The resource manager interface (RMI); for example, to DB2 and DBCTL. v CICS does not preserve any information in the global catalog or the system log that relates to local units of work.
Dynamic RLS restart
If a CICS region is connected to an SMSVSAM server when the server fails, CICS continues running, and recovers using a process known as dynamic RLS restart. An SMSVSAM server failure does not cause CICS to fail, and does not affect any resource other than data sets opened in RLS mode.
NOPS, no persistent sessions For single-node persistent sessions support, you require VTAM V3.4.1 or later, which supports persistent LU-LU sessions. CICS Transaction Server for z/OS, Version 4 Release 1 functions with releases of VTAM earlier than V3.4.1, but in the earlier releases sessions are not retained in a bound state if CICS fails.
During an emergency restart of CICS, CICS restores those sessions pending recovery from the CICS global catalog and the CICS system log to an in-session state. This process of persistent sessions recovery takes place when CICS opens its VTAM ACB. With multinode persistent sessions support, if VTAM or z/OS fails, sessions are restored when CICS reopens its VTAM ACB, either automatically by the COVR transaction, or by a CEMT or EXEC CICS SET VTAM OPEN command.
v If CICS determines that it cannot recover the session without unbinding and rebinding it. The result in each case is as if CICS has restarted following a failure without VTAM persistent sessions support. In some other situations APPC sessions are unbound. For example, if a bind was in progress at the time of the failure, sessions are unbound.
You can then start further CICS regions with or without persistent sessions support as appropriate, provided that you do not exceed the limit for the number of regions that do have persistent sessions support. If you specify NOPS (no persistent session support) for the PSTYPE system initialization parameter, a zero value is required for the PSDINT (persistent session delay interval) system initialization parameter.
– CICS requests the SMSVSAM server, if connected, to release all RLS retained locks.
– CICS does not rebuild the non-RLS retained locks.
• CICS requests the SMSVSAM server to clear the RLS sharing control status for the region.
• CICS does not restore the dump table, which may contain entries controlling system and transaction dumps.
TS pool, unless you clear the coupling facility structure in which the pool resides. If you want to cause a server to reinitialize its pool, use the MVS SETXCF FORCE command to clean up the structure:...
Transient data resource definitions are installed from resource groups defined in the CSD, as specified in the CSD group list (named on the GRPLIST system initialization parameter). Any extrapartition TD queues that require opening are opened; that is, any that specify OPEN(INITIAL). All the newly-installed TD queue definitions are written to the global catalog.
If you define new resource definitions and install them dynamically, ensure the group containing the resources is added to the appropriate group list.
Monitoring and statistics
The initial status of CICS monitoring is determined by the monitoring system initialization parameters (MN and MNxxxx). The initial recording status for CICS statistics is determined by the statistics system initialization parameter (STATRCD).
This is effective only if both the system log stream and the global catalog from the previous run of CICS are available at restart. See the CICS Transaction Server for z/OS Installation Guide for information about recovery of distributed units of work.
information saved in the system log from a previous run. The primary and secondary system log streams are purged and CICS begins writing a new system log. v Because CICS is starting a new catalog, it uses a new logname token in the “exchange lognames”...
Reconnecting to SMSVSAM for RLS access
CICS connects to the SMSVSAM server, if present, and exchanges RLS recovery information. In this exchange, CICS finds out whether SMSVSAM has lost any retained locks while CICS has been shut down.
(which is not the case on an emergency restart). CICS opens the auxiliary temporary storage data set for update. Temporary storage data sharing server Any queues written to a shared temporary storage pool, even though non-recoverable, persist across a warm restart.
v All intrapartition TD queues are initialized empty. v The queue resource definitions are installed from the global catalog, but they are not updated by any log records or keypoint data. They are always installed enabled. This option is intended for use when initiating remote site recovery (see Chapter 6, “CICS emergency restart,”...
Autoinstall for programs If program autoinstall is enabled (PGAIPGM=ACTIVE), program, mapset, and partitionset resource definitions are installed from the CSD only if they were cataloged; otherwise they are installed at first reference by the autoinstall process. All definitions installed from the CSD are updated with information from the warm keypoint in the system log.
Journal names and journal models The CICS log manager restores the journal name and journal model definitions from the global catalog. Journal name entries contain the names of the log streams used in the previous run, and the log manager reconnects to these during the warm restart.
v Different TCT from last run. CICS installs the TCT only, and does not apply the warm keypoint information, effectively making this a cold start for these devices. Note: CICS TS for z/OS, Version 4.1 supports only remote TCAM terminals—that is, the only TCAM terminals you can define are those attached to a remote, pre-CICS TS 3.1, terminal-owning region by TCAM/DCB.
Any non-RLS locks associated with in-flight (and other failed) transactions are acquired as active locks for the tasks attached to perform the backouts. This means that, if any new transaction attempts to access non-RLS data that is locked by a backout task, it waits normally rather than receiving the LOCKED condition.
Reconnecting to SMSVSAM for RLS access
As on a warm restart, CICS connects to the SMSVSAM server. In addition to notifying CICS about lost locks, VSAM also informs CICS of the units of work belonging to the CICS region for which it holds retained locks. See “Lost locks recovery”...
Start requests
In general, start requests are recovered only when they are associated with recoverable data or are protected and the issuing unit of work is indoubt. However, recovery can be further limited by the use of the specific COLD option on the system initialization parameter for TS, ICP, or BMS.
is successful, but CICS abnormally terminates before the catalog can be updated, CICS recovers the information from the forward recovery records on the system log. If the installation or deletion of installable sets or individual resources is unsuccessful, or has not reached commit point when CICS abnormally terminates, CICS does not recover the changes.
• Restarts all the elements of a workload (for example, CICS TORs, AORs, FORs, DB2, and so on) on another MVS image after an MVS failure
• Restarts CICS data sharing servers in the event of a server failure
• Restarts a failed MVS image
CICS reconnects to DBCTL and VTAM automatically if either of these subsystems restart after a failure.
CICS startup JCL used to restart a CICS region is suitable for ARM.
Before you begin
The implementation of ARM is part of setting up your MVS environment to support CICS. See the CICS Transaction Server for z/OS Installation Guide for details.
About this task
During initialization CICS registers with ARM automatically.
CANCEL, CICS de-registers from ARM before terminating, because if CICS remained registered, an automatic restart would probably encounter the same error condition. For other error situations, CICS does not de-register, and automatic restarts follow. To control the number of restarts, specify in your ARM policy the number of times ARM is to restart a failed CICS region.
CICS START options
You are recommended to specify START=AUTO, which causes a warm start after a normal shutdown and an emergency restart after failure. You are also recommended always to use the same JCL, even if it specifies START=COLD or START=INITIAL, to ensure that CICS restarts correctly when restarted by the MVS automatic restart manager after a failure.
CANCEL RESTART=YES. This terminates the existing connection, closes the server and its old job, and starts a new instance of the server job.
You can also restart a server explicitly using either the server command CANCEL RESTART=YES, or the MVS command CANCEL jobname,ARMRESTART
By default, the server uses an ARM element type of SYSCICSS, and an ARM element identifier of the form DFHxxnn_poolname where xx is the server type (XQ, CF or NC) and nn is the one- or two-character &SYSCLONE identifier of the MVS...
See “Commit-failed recovery” on page 83. Backout-failed A unit of work fails while backing out updates to file control recoverable resources. (The concept of backout-failed applies in principle to any resource that performs backout recovery, but CICS file control is the only resource manager to provide backout failure support.) A partial copy of the unit of work is shunted to await retry of the backout process when the problem is resolved.
terminating transaction takes place immediately. Therefore, it does not cause any active locks to be converted into retained locks. In the case of a CICS region abend, in-flight tasks have to wait to be backed out when CICS is restarted, during which time the locks are retained to protect uncommitted resources.
CICS does not back out changes to temporary storage queues held in main storage or in a TS server temporary storage pool.
START requests
Recovery of EXEC CICS START requests during transaction backout depends on some of the options specified on the request.
intended for the started task, but does not back out the START request itself. Thus the new task will start at its specified time, but the data will not be available to the started task, to which CICS will return a NOTFND condition in response to the RETRIEVE command.
Table 1. Effect of RESTART option on started transactions (continued)
Description of non-terminal START command: Specifies nonrecoverable data; Without data
¹ n is defined in the transaction restart program, DFHREST, where the CICS-supplied default is 20.
Note: Channel data is not recoverable, therefore channels are not available on restart.
• PROTECT operand in the CMSG transaction
Note: If backout fails, CICS does not try to restart regardless of the setting of the restart program.
Backout-failed recovery
Backout failure support is currently provided only by CICS file control. If backout to a VSAM data set fails for any reason, CICS performs the following processing:
• Invokes the backout failure global user exit program at XFCBFAIL, if this exit is enabled.
Auxiliary temporary storage
All updates to recoverable auxiliary temporary storage queues are managed in main storage until syncpoint. TS always commits forwards; therefore TS can never suffer a backout failure.
Transient data
All updates to logically recoverable intrapartition queues are managed in main storage until syncpoint, or until a buffer must be flushed because all buffers are in use.
It is not possible to open the data set in RLS mode because the SMSVSAM server is not available, in which case the backout is automatically retried when the SMSVSAM server becomes available.
SMSVSAM server might be detected by the backout request, in which case CICS file control starts to close the failed SMSVSAM control ACB and issues a console message. If the failure has already been detected by some other (earlier) request, CICS has already started to close the SMSVSAM control ACB when the backout request fails.
distinguishes between a commit failure where recoverable work was performed, and one for which only repeatable read locks were held.
Indoubt failure recovery
The CICS recovery manager is responsible for maintaining the state of each unit of work in a CICS region. For example, typical events that cause a change in the state of a unit of work are temporary suspension and resumption, receipt of syncpoint requests, and entry into the indoubt period during two-phase commit processing.
reads against VSAM data sets and has made no updates to other resources, it is safe to force the unit of work using the SET DSNAME or SET UOW commands. CICS saves enough information about the unit of work to allow it to be either committed or backed out when the indoubt unit of work is unshunted when the coordinator provides the resolution (or when the transaction wait time expires).
To retrieve information about a unit of work (UOW), you can use either the CEMT, or EXEC CICS, INQUIRE UOW command. For the purposes of this illustration, the CEMT method is used. You can filter the command to show only UOWs that are associated with a particular transaction.
When a UOW has been shunted indoubt, CICS retains locks on the recoverable resources that the UOW has updated. This prevents further tasks from changing the resource updates while they are indoubt. To display CICS locks held by a UOW that has been shunted indoubt, use the CEMT INQUIRE UOWENQ command.
Recovery from failures associated with the coupling facility
This topic deals with recovery from failures arising from the use of the coupling facility, and which affect CICS units of work. It covers:
• SMSVSAM cache structure failures
• SMSVSAM lock structure failures (lost locks)
• Connection failure to a coupling facility cache structure
• Connection failure to a coupling facility lock structure
• MVS system recovery and sysplex recovery...
Notifying CICS of SMSVSAM restart
When the SMSVSAM servers are able to connect to a new lock structure, they use the MVS ENF to notify the CICS regions that their SMSVSAM server is available again. CICS is informed during dynamic RLS restart about those data sets for which it must perform lost locks recovery.
They are not backed out until they make a further request to RLS. In the case of an SMSVSAM server failure that is caused by a lock structure failure, this would mean that in-flight units of work could delay the recovery from the lost locks condition until the units of work make further RLS updates.
CICS regions in the failed MVS image receive the LOCKED response. If all the MVS images in a sysplex fail, the first SMSVSAM server to restart reconnects to the lock structure in the coupling facility and converts all the locks into retained locks for the whole sysplex.
Recovery from the failure of a sysplex is just the equivalent of multiple MVS failure recoveries.
Transaction abend processing
If, during transaction abend processing, another abend occurs and CICS continues, there is a risk of a transaction abend loop and further processing of a resource that has lost integrity, because of uncompleted recovery.
The exit code then executes as an extension of the abending task, and runs at the same level as the program that issued the HANDLE ABEND command that activated the exit. After any program-level abend exit code has been executed, the next action depends on how the exit code ends: v If the exit code ends with an ABEND command, CICS gives control to the next higher level exit code that is active.
1. CICS invokes DFHREST only when RESTART(YES) is specified in a transaction’s resource definition. 2. Ensure that resources used by restartable transactions, such as files, temporary storage, and intrapartition transient data queues, are defined as recoverable. 3. When transaction restart occurs, a new task is attached that invokes the initial program of the transaction.
v CICS remains operational, but the task currently in control terminates. v CICS terminates (see “Shutdown requested by the operating system” on page 29). If a program check occurs when a user task is processing, the task abends with an abend code of ASRA.
The TEP is entered once for each terminal error, and therefore should be designed to process only one error for each invocation.
Intersystem communication failures
An intersystem communication failure can be caused by the failure of a CICS region, or the remote system to which it is connected. A network failure can also cause the loss of the connection between CICS and a remote system.
Question 5: If a data set becomes unusable, should all applications be terminated while recovery is performed? If degraded service to any application must be preserved while recovery of the data set takes place, you will need to include procedures to do this.
Before any design or programming work begins, all interested parties should agree on the statement, including:
• Those responsible for business management
• Those responsible for data management
• Those who are to use the application, including the end users, and those responsible for computer and online system operation
Designing the end user’s restart procedure
Decide how the user is to restart work on the application after a system failure.
• If a user’s printer becomes unusable (because of hardware or communication problems), consider the use of alternatives, such as the computer center’s printer, as a standby.
Security
Decide the security procedures for an emergency restart or a break in communications.
and general log data to log streams defined to the MVS system logger. For more information, see Chapter 11, “Defining system and general log streams,” on page 107.
Files
For VSAM files defined to be accessed in RLS mode, define the recovery attributes in the ICF catalog, using IDCAMS.
normal conditions. They should, nevertheless, be tested as far as possible, to ensure that they handle the functions for which they are designed. CICS facilities, such as the execution diagnostic facility (CEDF) and command interpreter (CECI), can help to create exception conditions and to interpret program and system reactions to those conditions.
IXCMIAPU utility. For information about defining coupling facility log streams and DASD-only log streams, see the CICS Transaction Server for z/OS Installation Guide. For more information about the coupling facility and defining log structures generally, see z/OS MVS Setting Up a Sysplex.
CICS log manager connects to its log stream automatically during system initialization, unless it is defined as TYPE(DUMMY) in a CICS JOURNALMODEL resource definition. Although the CICS system log is logically a single log stream, it is written to two physical log streams: a primary and a secondary. In general, it is not necessary to distinguish between these, and most references are to the system log stream.
Model log streams for CICS system logs
If CICS fails to connect to its system log streams because they have not been defined, CICS attempts to have them created dynamically using model log streams. To create a log stream dynamically, CICS must specify to the MVS system logger all the log stream attributes needed for a new log stream.
However, log streams created from model log streams that are defined with the CICS default name are always assigned to the same structure within an MVS image. This may not give you the best allocation in terms of recovery considerations if you are using structures defined across two or more coupling facilities.
Figure 9. Sharing system logger structures between 4 MVS images (the figure shows MVS images including MVSA and MVSC)
Varying the model log stream name: To balance log streams across log structures when you use model log streams, you must customize the model log stream names. You cannot achieve the distribution of log streams shown in this scenario using the CICS default model name.
– Short enough to avoid large volumes of data between keypoints. For information about calculating system log stream structure sizes, see the CICS Transaction Server for z/OS Installation Guide. v Except for your own recovery records that you need during emergency restart, do not write your own data to the system log (for example, audit trail data).
If a system log stream exceeds the primary storage space allocated, it spills onto secondary storage. (For a definition of primary and secondary storage, see the CICS Transaction Server for z/OS Installation Guide.) The resulting I/O can adversely affect system performance.
Page 127
Writing user-recovery data About this task You should write only recovery-related records to the system log stream. You can do this using the commands provided by the application programming interface (API) or the exit programming interfaces (XPI). This is important because user recovery records are presented to a global user exit program enabled at the XRCINPT exit point.
About this task The dddd value specifies the minimum number of days for which data is to be retained on the log. You are strongly recommended not to use the system log for records that need to be kept. Any log and journal data that needs to be preserved should be written to a general log stream.
2. Define a general log stream for forward recovery data. If you do not define a general log stream, CICS attempts to create a log stream dynamically. See “Model log streams for CICS general logs” for details. 3. Decide how you want to merge forward recovery data from different CICS regions into one or more log streams.
(The limit is several million and in normal circumstances it is not likely to be exceeded.) See the CICS Transaction Server for z/OS Installation Guide for information about managing log stream data sets.
About this task The CICS-supplied group, DFHLGMOD, includes a JOURNALMODEL for the log of logs, called DFHLGLOG, which has a log stream name of &USERID..CICSVR.DFHLGLOG. Note that &USERID resolves to the CICS region userid, and if your CICS regions run under different RACF user IDs, the DFHLGLOG definition resolves to a unique log of logs log stream name for each region.
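The effect of the &USERID symbol can be illustrated with a short sketch. The following Python function is only a model of the substitution described above, not CICS code. The region user IDs are hypothetical, and the sketch assumes that the first period after &USERID ends the symbol while the second period is the qualifier separator.

```python
# Stream name template from the DFHLGLOG journal model, as quoted above.
TEMPLATE = "&USERID..CICSVR.DFHLGLOG"


def resolve_stream_name(template: str, region_userid: str) -> str:
    """Substitute the region user ID for the &USERID symbol.

    Assumption: the period immediately after the symbol terminates it,
    so "&USERID.." yields the user ID followed by a single period.
    """
    return template.replace("&USERID.", region_userid)


# Hypothetical region user IDs, one per CICS region.
names = {uid: resolve_stream_name(TEMPLATE, uid)
         for uid in ("CICSHA01", "CICSHA02")}
```

Under these assumptions, two regions that run under different user IDs resolve to two different log of logs stream names, while regions that share a user ID would share one.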
v In a format compatible with utility programs written for versions of CICS that use the log manager for logging and journaling. See the CICS Operations and Utilities Guide for more information about using the LOGR SSI to access log stream data, and for sample JCL. If you plan to write your own utility program to read log stream data, see the CICS Customization Guide for information about log stream record formats.
Page 133
Operating a recovery process that is independent of time-stamps in the system log data ensures that CICS can restart successfully after an abnormal termination, even if the failure occurs shortly after local time has been put back. Offline utility program, DFHJUP Changing the local time forward has no effect on the processing of system log streams or general log streams by the CICS utility program, DFHJUP.
Page 136
SPURGE({NO|YES}) This option indicates whether the transaction is initially system-purgeable; that is, whether CICS can purge the transaction as a result of: v Expiry of a deadlock timeout (DTIMOUT) delay interval v A CEMT, or EXEC CICS, SET TASK(id) PURGE|FORCEPURGE command. The default is SPURGE(NO).
RLS mode are all regarded as local to each CICS region. The SMSVSAM server takes the place of the FOR. However, there are special recovery considerations for data sets accessed in RLS mode, in connection with the role of SMSVSAM and its lock management.
Forward recovery For VSAM files, you can use a forward recovery utility, such as CICSVR, when online backout processing has failed as a result of some physical damage to the data set. For forward recovery: v Create backup copies of data sets. v Record after-images of file changes in a forward recovery stream.
Page 139
uses the ICF catalog entry recovery attributes instead of the FILE resource. To force CICS to use the FILE resource attributes instead of the catalog, set the NONRLSRECOV system initialization parameter to FILEDEF. v You define the recovery attributes for BDAM files in file entries in the file control table (FCT).
VSAM files accessed in RLS mode If you specify file definitions that open a data set in RLS mode, specify the recovery options in the ICF catalog. The recovery options on the CICS file resource definitions (RECOVERY, FWDRECOVLOG, and BACKUPTYPE) are ignored if the file definition specifies RLS access.
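The rules for where CICS takes the recovery attributes from can be summarized in a small decision function. This is a sketch in Python, not CICS logic: the function and parameter names are hypothetical, and it assumes that for non-RLS access the FILE resource is used whenever the ICF catalog holds no recovery attributes for the data set.

```python
def recovery_attribute_source(rls_access: bool,
                              catalog_has_attributes: bool,
                              nonrlsrecov_filedef: bool = False) -> str:
    """Return which definition supplies the recovery attributes.

    nonrlsrecov_filedef models NONRLSRECOV being set to FILEDEF.
    """
    if rls_access:
        # RLS access: the options on the FILE resource are ignored.
        return "ICF catalog"
    if catalog_has_attributes and not nonrlsrecov_filedef:
        # Non-RLS access: the catalog entry takes precedence by default.
        return "ICF catalog"
    # Assumption: no catalog attributes, or FILEDEF forces the FILE resource.
    return "FILE resource"
```

For example, a non-RLS file whose data set has recovery attributes in the catalog uses the catalog values unless FILEDEF is in effect.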
INQUIRE DSNAME command returns values from the VSAM base cluster block (BCB). However, because base cluster block (BCB) recovery values are not set until the first open, if you issue an INQUIRE DSNAME command before the first file is opened, CICS returns NOTAPPLIC for RECOVSTATUS. BDAM files You can specify CICS support for backward recovery for BDAM files using the LOG parameter on the DFHFCT TYPE=FILE macro.
Page 142
About this task If you use XFCNREC to suppress open failures that are a result of inconsistencies in the backout settings, CICS issues a message to warn you that the integrity of the data set can no longer be guaranteed. Any INQUIRE DSNAME RECOVSTATUS command that is issued from this point onward will return NOTRECOVABLE, regardless of the recovery attribute that CICS has previously enforced on the base cluster.
- File is defined with RECOVERY(ALL): the open fails. – Base cluster has RECOVERY(ALL): - File is defined with RECOVERY(NONE): the open fails. - File is defined with RECOVERY(BACKOUTONLY): the open fails. - File is defined with RECOVERY(ALL): the open proceeds unless FWDRECOVLOG specifies a different journal id from the base cluster, in which case the open fails.
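The consistency check in the list above can be modeled as a comparison of two sets of attributes. The following Python sketch is illustrative only. The class and function names are hypothetical, it assumes that matching RECOVERY(NONE) or RECOVERY(BACKOUTONLY) settings allow the open to proceed, and it does not model an XFCNREC exit program that suppresses the failure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecoveryAttrs:
    recovery: str                      # "NONE", "BACKOUTONLY", or "ALL"
    fwdrecovlog: Optional[str] = None  # journal id, relevant for "ALL"


def open_proceeds(base: RecoveryAttrs, file: RecoveryAttrs) -> bool:
    """Return True if the file's attributes agree with the base cluster's."""
    if base.recovery != file.recovery:
        # Any mismatch in the RECOVERY setting fails the open.
        return False
    if base.recovery == "ALL" and base.fwdrecovlog != file.fwdrecovlog:
        # Both are forward recoverable but name different journals.
        return False
    return True
```

With a base cluster that has RECOVERY(ALL), only a file that is also defined with RECOVERY(ALL) and the same FWDRECOVLOG journal id passes the check.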
Page 144
For more information about allocation and space requirements, see the CICS System Definition Guide.) For extrapartition transient data considerations, see “Recovery for extrapartition transient data” on page 134. You must specify the name of every intrapartition transient data queue that you want to be recoverable in the queue definition.
Figure 12. Illustration of recovery of a physically recoverable TD queue. (The figure shows the queue items and the next READ pointer before the failure and after emergency restart.)
Making intrapartition TD physically recoverable can be useful in the case of some CICS queues.
Recovery for extrapartition transient data CICS does not recover extrapartition data sets. If you depend on extrapartition data, you will need to develop procedures to recover data for continued execution on restart following either a controlled or an uncontrolled shutdown of CICS. There are two areas to consider in recovering extrapartition data sets: v Input extrapartition data sets v Output extrapartition data sets...
Output extrapartition data sets The recovery of output extrapartition data sets is somewhat different from the recovery of input data sets. For a tape output data set, use a new output tape on restart. You can then use the previous output tape if you need to recover information recorded before termination.
Define temporary storage queues as recoverable using temporary storage model resource definitions as shown in the following example define statements:
CEDA DEFINE DESCRIPTION(Recoverable TS queues for START requests) TSMODEL(RECOV1)
CEDA DEFINE DESCRIPTION(Recoverable TS queues for BMS) TSMODEL(RECOV2)
CEDA DEFINE DESCRIPTION(Recoverable TS queues for BMS) TSMODEL(RECOV3)
CICS continues to support temporary storage tables (TST) and you could define recoverable TS queues in a TST, as shown in the following example:
DFHTST TYPE=RECOVERY,...
About this task CICS uses Business Transaction Services (BTS) to ensure that persistent messages are recovered in the event of a CICS system failure. For this to work correctly, follow these steps: Procedure 1. Use IDCAMS to define the local request queue and repository file to MVS. You must specify a suitable value for STRINGS for the file definition.
2. For each local request queue, define a QLOCAL object. Use the following command:
DEFINE QLOCAL('queuename') DESCR('description') PROCESS(processname) INITQ('initiation_queue') TRIGGER TRIGTYPE(FIRST) TRIGDATA('default_target_service') BOTHRESH(nnn) BOQNAME('requeuename')
where:
v queuename is the local queue name.
v processname is the name of the process instance that identifies the application started by the queue manager when a trigger event occurs.
Page 151
not usable, message DFHPI0117 is issued, and CICS continues without BTS, using the existing channel-based container mechanism. If a CICS failure occurs before the Web service starts or completes processing, BTS recovery ensures that the process is rescheduled when CICS is restarted. If the Web service abends and backs out, the BTS process is marked complete with an ABENDED status.
Page 154
v Progress transaction, to check on progress through the application. Such a function could be used after a transaction failure or after an emergency restart, as well as at any time during normal operation. For example, it could be designed to find the correct restart point at which the terminal user should recommence the interrupted work.
SAA-compatible applications The resource recovery element of the Systems Application Architecture common programming interface (CPI) provides an alternative to the standard CICS application program interface (API) if you need to implement SAA-compatible applications. The resource recovery facilities provided by the CICS implementation of the SAA resource recovery interface are the same as those provided by the CICS API.
committed in one unit of work, but the transaction is to continue with one or more units of work for further processing. 3. Where file or database updates must be kept in step, make sure that your application does them in the same unit of work. This approach ensures that those updates will all be committed together or, in the event of the unit of work being interrupted, the updates will back out together to a consistent state.
back out only the updates made during that individual step; the application is responsible for restarting at the appropriate point in the conversation, which might involve recreating a screen format. However, other tasks might try to update the database between the time when update information is accepted and the time when it is applied to the database.
Unlike UMTs, coupling facility data tables are recoverable in the event of a CICS failure, a CFDT server failure, or an MVS failure. However, they are not forward recoverable. Designing to avoid transaction deadlocks You must design your program to avoid transaction deadlocks.
About this task Consider using the following techniques: Procedure v Arrange for all transactions to access files in a sequence agreed in advance. This could be a suitable subject for installation standards. Be extra careful if you allow updates through multiple paths. v Enforce explicit installation enqueueing standards so that all applications do the following: 1.
The PROTECT option on a START request ensures that, if the task issuing the START fails during the unit of work, the new task will not be initiated, even though its start time may have passed. (See “START requests” on page 76 for more information about the PROTECT option.) Consider also the possibility of a started task that fails.
Using transient data queues When a number of tasks direct large amounts of data to a single terminal (for example, a printer receiving multipage reports initiated by the users), it may be necessary to queue the data (on disk) until the terminal is ready to receive it. About this task Such queuing can be done on a transient data queue associated with a terminal.
Page 162
The RESP option on a command returns a condition ID that can be tested. Alternatively, a HANDLE CONDITION command is used in the local context of a transaction program to name a label where control is passed if certain conditions occur.
Remember that: v For transactions that access a recoverable resource, DTB helps to preserve logical data integrity. v Resources that are to be updated should be made recoverable. v DTB takes place only after program level abend exits (if any) have attempted cleanup or logical recovery.
Page 164
If you want to initiate a dump, do so in the exit code at the same program level as the abend. If you initiate the dump at a program level higher than where the abend occurred, you may lose valuable diagnostic information. v Attempt local recovery, and then continue running the program.
Processing the IOERR condition Any program that attempts to process an IOERR condition for a recoverable resource must not issue a RETURN or SYNCPOINT command, but must terminate by issuing an ABEND command. A RETURN or SYNCPOINT command causes the recovery manager to complete the unit of work and commit changes to recoverable resources.
For details of PL/I coding restrictions in a CICS environment, see the appropriate PL/I programmer’s guide for your compiler. Locking (enqueuing on) resources in application programs This topic describes locking (enqueuing) functions provided by CICS (and access methods) to protect data integrity. About this task There are two forms of locking: 1.
Page 167
Figure 13. Locking during updates to nonrecoverable files. This figure illustrates two tasks updating the same record or control interval. (In the figure, task A holds the lock during its READ UPDATE while the READ UPDATE issued by task B waits. SOT: Start of task.)
The extended period of locking is needed to avoid an update committed by one task being backed out by another. (Consider what could happen if the nonextended locking action shown in Figure 13 on page 155 was used when updating a recoverable file. If task A abends just after task B has reached syncpoint and has thus committed its changes, the subsequent backout of task A returns the file to the state it was in at the beginning of task A, and task B’s committed update is lost.)
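The lost update described in parentheses above can be reproduced with a few lines of arithmetic. The following Python sketch is a simplified model, not CICS behavior in detail: the record, the amounts, and the function names are invented for illustration, and backout is modeled as restoring the before-image that task A logged.

```python
from typing import Tuple


def short_lock_scenario() -> Tuple[int, int]:
    """Nonextended locking applied (wrongly) to a recoverable record."""
    record = {"balance": 100}
    before_image_a = dict(record)    # task A logs the before-image
    record["balance"] += 10          # task A updates; its lock is released
    record["balance"] += 5           # task B updates and reaches syncpoint
    committed_by_b = record["balance"]
    record.update(before_image_a)    # task A abends; backout restores 100
    return committed_by_b, record["balance"]


def extended_lock_scenario() -> int:
    """Task B waits until task A's unit of work has ended."""
    record = {"balance": 100}
    before_image_a = dict(record)    # task A logs the before-image
    record["balance"] += 10          # task A updates and keeps the lock
    record.update(before_image_a)    # task A abends and is backed out
    record["balance"] += 5           # only now does task B get the lock
    return record["balance"]
```

In the first scenario task B commits a balance of 115, but the record ends at 100 after task A is backed out, so the committed update is lost. In the second scenario the record ends at 105, which preserves task B's update.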
The backout fails because a duplicate key is detected in the AIX, as indicated by message DFHFC4701 with a failure code of X'F0'. There is no locking on the AIX key to prevent the second task taking the key before the end of the first task’s unit of work.
enqueuing on temporary storage queues where concurrently executing tasks can read and change queue(s) with the same temporary storage identifier. (See “Explicit enqueuing (by the application programmer).”) Temporary storage control commands that invoke implicit enqueuing are: v WRITEQ TS v DELETEQ TS Implicit enqueuing on DL/I databases with DBCTL...
After a task has issued an ENQ RESOURCE(data-area) command, any other task that issues an ENQ RESOURCE command with the same data-area parameter is suspended until the task issues a matching DEQ RESOURCE(data-area) command, or until the unit of work ends. Note: Enqueueing on more than one resource concurrently might create a deadlock between transactions.
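One common way to avoid the deadlock mentioned in the note is to have every task enqueue on its resources in the same agreed order. The sketch below shows the idea with Python threading locks rather than EXEC CICS ENQ commands. The resource names are hypothetical, and sorting by name is just one possible agreed sequence.

```python
import threading
from contextlib import ExitStack

# Hypothetical resource names, each protected by its own lock.
LOCKS = {name: threading.Lock() for name in ("ACCOUNTS", "CUSTOMER")}


def update_in_agreed_order(needed, log):
    """Acquire every needed lock in sorted-name order, then do the work."""
    with ExitStack() as stack:
        for name in sorted(needed):        # the agreed sequence
            stack.enter_context(LOCKS[name])
        log.append(tuple(sorted(needed)))  # stand-in for the real updates
    # Leaving the with block releases all the locks.


def run_two_tasks():
    log = []
    # The two tasks name the resources in opposite orders.
    t1 = threading.Thread(target=update_in_agreed_order,
                          args=(["ACCOUNTS", "CUSTOMER"], log))
    t2 = threading.Thread(target=update_in_agreed_order,
                          args=(["CUSTOMER", "ACCOUNTS"], log))
    t1.start()
    t2.start()
    t1.join(timeout=5)
    t2.join(timeout=5)
    return log
```

Because both tasks request ACCOUNTS before CUSTOMER regardless of how the caller listed them, neither can hold one resource while waiting for the other in the opposite order.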
v If both deadlocked resources are CICS resources (but not both VSAM resources), or one is CICS and the other DL/I, CICS abends the task whose DTIMOUT period elapses first. It is possible for both tasks to time out simultaneously. If neither task has a DTIMOUT period specified, they both remain suspended indefinitely, unless one of them is canceled by a master terminal command.
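The DTIMOUT rule in this item can be expressed as a small calculation. The following Python sketch is a model only: the task names and times are invented, and it assumes that each task's DTIMOUT interval is measured from the moment that the task was suspended.

```python
def tasks_abended_first(tasks):
    """Return the task or tasks whose DTIMOUT period elapses first.

    tasks maps a task name to (suspend_time, dtimout), both in seconds;
    dtimout is None when no DTIMOUT period is specified.
    """
    expiries = {name: start + dtimout
                for name, (start, dtimout) in tasks.items()
                if dtimout is not None}
    if not expiries:
        return []  # neither task times out; both stay suspended
    earliest = min(expiries.values())
    return sorted(name for name, when in expiries.items() if when == earliest)
```

An empty result corresponds to the case where both tasks remain suspended until one of them is canceled, and a result with two names corresponds to both tasks timing out simultaneously.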
Procedure v Enable them in PLT programs in the first part of PLT processing. v Specify them on the system initialization parameter, TBEXITS. This takes the form TBEXITS=(name1,name2,name3,name4,name5,name6), where name1, name2, name3, name4, name5, and name6 are the names of your global user exit programs for the XRCINIT, XRCINPT, XFCBFAIL, XFCLDEL, XFCBOVER, XFCBOUT exit points.
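Because the program names in TBEXITS are positional, it can help to see the mapping spelled out. The following Python sketch parses a TBEXITS value into a table of exit points. It is an illustration, not a CICS utility: the program names are hypothetical, and the handling of omitted positions (an empty name between commas) is an assumption.

```python
# Exit points in the positional order given above.
EXIT_POINTS = ("XRCINIT", "XRCINPT", "XFCBFAIL",
               "XFCLDEL", "XFCBOVER", "XFCBOUT")


def parse_tbexits(value: str) -> dict:
    """Map each exit point to the program named in its position."""
    inner = value.strip()
    if inner.upper().startswith("TBEXITS="):
        inner = inner[len("TBEXITS="):]
    names = inner.strip("()").split(",")
    if len(names) > len(EXIT_POINTS):
        raise ValueError("TBEXITS accepts at most six program names")
    # Pad the trailing positions that were not coded.
    names += [""] * (len(EXIT_POINTS) - len(names))
    return {point: (name.strip() or None)
            for point, name in zip(EXIT_POINTS, names)}
```

For example, `parse_tbexits("TBEXITS=(,,EXITF)")` associates the hypothetical program EXITF with XFCBFAIL and leaves the other five exit points without a program.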
XFCLDEL global user exit XFCLDEL is invoked when backing out a unit of work that performed a write operation to a VSAM ESDS, or a BDAM data set. XFCBOVER global user exit XFCBOVER is invoked whenever CICS is about to decide not to back out an uncommitted update, because the record could have been updated by a non-RLS batch program.
7. The CICS transaction failure program, DFHTFP, links to DFHPEP before transaction backout is performed. This means resources used by the abending transaction may not have been released. DFHPEP needs to be aware of this, and might need logic to handle resources that are still locked. 8.
When you have corrected the error, you can re-enable the relevant installed transaction definition to allow terminals to use it. You can also disable transaction identifiers when transactions are not to be accepted for application-dependent reasons, and can enable them again later. The CICS Resource Definition Guide tells you more about the master terminal operator functions.
The RLS quiesce and unquiesce functions The RLS quiesce and unquiesce functions are initiated by a CICS command in one region, and propagated by the VSAM RLS quiesce interface to other CICS regions in the sysplex. When these functions are complete, the ICF catalog shows the quiesce state of the target data set.
Page 181
Figure 16. The CICS RLS quiesce operation with the CICS quiesce exit program. (The figure shows a user transaction in CICS AOR1 on MVS1 issuing EXEC CICS SET DSNAME(ds_name) QUIESCED, the CICS tasks CFQS and CFQR, the CICS RLS quiesce exit, and SMSVSAM.)
Note: 1. A suitably-authorized user application program (AOR1 in the diagram) issues an EXEC CICS SET DSNAME(...) QUIESCED command (or a terminal operator issues the equivalent CEMT command).
Page 182
SMSVSAM through the control ACB interface. 7. When all CICS regions have replied with the IDAQUIES macro QUICMP function, the SMSVSAM server that handled the original request (from AOR1 in the diagram) sets the quiesced flag in the ICF catalog. This prevents any files being opened in RLS mode for the data set, but allows non-RLS open requests that are initiated, either implicitly or explicitly, by user transactions.
Page 183
With the new RLS quiesce mechanism, you do not have to close a data set to take a non-BWO backup. However, because this causes new transactions to be abended, you may prefer to quiesce your data sets before taking a non-BWO backup.
Lost locks recovery complete A quiesce interface function initiated by VSAM. VSAM takes action associated with a sphere having completed lost locks recovery on all CICS regions that were sharing the data set. SMSVSAM invokes the CICS RLS quiesce exit program in each region that is registered with an SMSVSAM control ACB.
Note: If your file definitions specify an LSR pool id that is built dynamically by CICS, consider using the RLSTOLSR system initialization parameter. v Open the files in non-RLS read-only mode in CICS. v Concurrently, run batch non-RLS. v When batch work is finished: –...
The remainder of this topic on switching to non-RLS access mode describes the options that are available if you need to switch to non-RLS mode and are prevented from doing so by retained locks. Resolving retained locks before opening data sets in non-RLS mode VSAM sets an ‘RLS-in-use’...
Page 187
– A number of data sets bound to a failing cache (which returns a CAUSE of CACHE) – All the RLS data sets updated in the unit of work when the SMSVSAM server fails (which returns a CAUSE of RLSSERVER with REASON RLSGONE).
This failure is caused by a failure of the SMSVSAM server, which is returned as RLSSERVER on the CAUSE option and COMMITFAIL or RRCOMMITFAIL on the REASON option of the INQUIRE UOWDSNFAIL command.
4. If a unit of work has been shunted with a different CAUSE and REASON, review the descriptions of these values in the INQUIRE UOWDSNFAIL command to determine what action to take to allow the shunted unit of work to complete. Choosing data availability over data integrity There may be times when you cannot resolve all the retained locks correctly, either because you cannot easily remedy the situations preventing the changes from...
Diagnostic messages DFHFC3003 and DFHFC3010 are issued for each log record. If a data set has both indoubt-failed and other (backout- or commit-) failed units of work, deal with the indoubt UOWs first, using SET DSNAME UOWACTION, because this might result in other failures which can then be cleared by the SET DSNAME RESETLOCKS command.
Page 191
CEMT INQUIRE UOWDSNFAIL DSN('RLS.ACCOUNTS.ESDS.DBASE1')
STATUS: RESULTS
Dsn(RLS.ACCOUNTS.ESDS.DBASE1 Uow(AA6DB080C40CEE01)
Dsn(RLS.ACCOUNTS.ESDS.DBASE1 Uow(AA6DB08AC66B4000)
The display shows a REASON code of DELEXITERROR (Del) for one unit of work, and INDEXRECFULL (Ind) for the other.
2. A CEMT SET DSNAME(...) RETRY command, after fixing the delete exit error (caused because an XFCLDEL exit program was not enabled), invokes a retry of both units of work with the following result:
SET DSNAME('RLS.ACCOUNTS.ESDS.DBASE1') RETRY...
A special case: lost locks If a lost locks condition occurs, any affected data set remains in a lost locks state until all CICS regions have completed lost locks recovery for the data set. Lost locks recovery is complete when all uncommitted changes, which were protected by the locks that were lost, have been committed.
Page 193
v Do not use DENYNONRLSUPDATE if you run non-RLS work after specifying PERMITNONRLSUPDATE. The permit status is automatically reset by the CICS regions that hold retained locks when they open the data set in RLS mode. Post-batch processing After a non-RLS program has been permitted to override retained locks, the uncommitted changes that were protected by those locks must not normally be allowed to back out.
The CICS region cannot resolve the UOW because its CFDT server has failed. v The CICS region that owns the UOW has failed to connect to its CFDT server. The CICS region that owns the UOW cannot resynchronize with its CFDT server.
Page 196
The following access method services examples assume that CICS.DATASET.A needs to be redefined and the data moved to a data set named CICS.DATASET.B, which is then renamed:
DEFINE CLUSTER (NAME(CICS.DATASET.B) ...
REPRO INDATASET(CICS.DATASET.A) OUTDATASET(CICS.DATASET.B)
DELETE CICS.DATASET.A
ALTER CICS.DATASET.B NEWNAME(CICS.DATASET.A)
If the recoverable data set has associated RLS locks, these steps are not sufficient because:
v The REPRO command copies the data from CICS.DATASET.A to CICS.DATASET.B but leaves the locks associated with the original data set...
This makes the data set unavailable while the move from old to new is in progress, and also allows the following unbind operation to succeed.
4. Issue the SHCDS FRUNBIND subcommand to unbind any retained locks against the old data set. For example:
SHCDS FRUNBIND(CICS.DATASET.A)
This enables SMSVSAM to preserve the locks ready for rebinding later to the new data set.
v Create a new empty data set into which the copy is to be restored, and use IMPORT to copy the data from the exported version of the data set to the new empty data set. v Use SHCDS FRSETRR to mark the original data set as being under maintenance. v Use SHCDS FRUNBIND to unbind the locks from the original data set.
Recovery of data set with volume still available The procedure described here is necessary to preserve any retained locks that are held by SMSVSAM against the data in the old data set. Unless you follow all the steps of this procedure, the locks will not be valid for the new data set, with potential loss of data integrity.
9. Alter the new data set name
Use access method services to rename the new data set to the name of the old data set.
ALTER CICS.DATASETB NEWNAME(CICS.DATASETA)
You must give the restored data set the name of the old data set to enable the following bind operation to succeed.
Page 202
The automatic restart facility is also reenabled. Each CICS region detects that its SMSVSAM server is down as a result of the TERMINATESERVER command, and waits for the server event indicating the server has restarted before it can resume RLS-mode processing.
Page 203
This is because CICS cannot run the lost locks recovery process until the data sets are available, and the data sets are made available only after the CICSVR recovery jobs are finished. If you physically restore the volume, however, the data sets that need to be forward recovered are immediately available for backout.
Page 204
This clears the SMSVSAM CFVOL-QUIESCED state and allows SMSVSAM RLS access to the volume. CICS ensures that access is not allowed to the data sets that will eventually be forward recovered, but the volume is available for other data sets. 6.
Page 205
IGW572I REQUEST TO TERMINATE SMSVSAM ADDRESS SPACE IS ACCEPTED: SMSVSAM SERVER TERMINATION SCHEDULED. In our example, terminating the servers caused abends of all in-flight tasks that were updating RLS-mode data sets. This, in turn, caused backout failures and shunted UOWs, which were reported by CICS messages. For example, the...
Page 206
IGW453I SMSVSAM ADDRESS SPACE HAS SUCCESSFULLY The SMSVSAM server reported that there were no longer any retained locks but that instead there were data sets in the “lost locks” condition: IGW414I SMSVSAM SERVER ADDRESS SPACE IS NOW ACTIVE.
Page 207
work. Assuming that all CICS regions are active, and there are no indoubt UOWs, lost locks processing, for all data sets except the ones on the failed volume, should complete quickly. 9. In this example, CEMT INQUIRE UOWDSNFAIL on CICS region ADSWA01D showed UOW failures only for the RLSADSW.VF04D.TELLCTRL and RLSADSW.VF04D.DATAENDB data sets: INQUIRE UOWDSNFAIL...
Page 208
waits for indoubt resolution before allowing general access to the data set. In such a situation you can still release the locks immediately, using the SET DSNAME command, although in most cases you will lose data integrity. See “Lost locks recovery”...
Page 209
ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER
8. When all SMSVSAM servers were down, we deleted the IGWLOCK00 lock structure with the MVS command:
VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE
9. We restarted the SMSVSAM servers with the MVS command:
ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE
CICS was informed during dynamic RLS restart about the data sets for which it must perform lost locks recovery.
that before running SHCDS CFREPAIR, the restored user catalog must be import connected to the master catalog on all systems (see the “Recovering Shared Catalogs” topic in DFSMS/MVS Managing Catalogs). Forward recovery of data sets accessed in non-RLS mode For data sets accessed in non-RLS mode, use the following forward recovery procedure: 1.
Page 211
In these cases, you can resolve the cause of the failure and try the whole process again. This topic describes what to do when the failure in forward recovery cannot be resolved. In this case, where you are unsuccessful in applying all the forward recovery log data to a restored backup, you are forced to abandon the forward recovery, and revert to your most recent full backup.
Page 212
1) Force shunted indoubt units of work using SET DSNAME(...) 2) Reset locks for backout-failed units of work using SET DSNAME(...) The DFH0BAT3 sample program provides an example of how you can do this. There should not now be any shunted units of work on any CICS region with locks on the data set.
Procedure for failed non-RLS mode forward recovery operation If you are not successful in applying all the forward recovery log data to a restored backup, you are forced to abandon the forward recovery, and revert to your most recent full backup. However, during its recovery processing, CICS assumes that it is operating on a data set that has been correctly forward recovered, as in the case of recovery of a data set accessed in RLS mode (see “Procedure for failed RLS mode forward...
forward-recovery logs. Long-running transactions, automated teller machines, and continuously available applications require the database to be up and running when the backup is being taken. The concurrent copy function used along with BWO by DFSMSdss allows backups to be taken with integrity even when control-area and control-interval splits and data set additions (new extents or add-to-end) are occurring for VSAM key sequenced data sets.
Hardware requirements The concurrent copy function is supported by the IBM extended platform and the IBM 3990 Model 6 control units. Which data sets are eligible for BWO You can use BWO only for: v Data sets that are on SMS-managed storage and that have an integrated catalog facility (ICF) catalog.
Do not define BWO for the CICSPlex SM data repository using the IDCAMS DEFINE CLUSTER definition in the ICF catalog because performance is degraded. See the CICS Transaction Server for z/OS Installation Guide for information on taking backups of the CICSPlex SM data repository.
v But if you specify BWO(TYPECICS), and the PTF has not been applied, and you have not specified LOG(ALL) and a forward recovery log stream name, BWO processing for RLS remains disabled for such files. To achieve BWO for the file, you must either: –...
Removing BWO attributes If you want to remove BWO attributes from your data sets, you must follow the correct procedure to avoid problems when taking subsequent backups. Procedure 1. Close the VSAM data set either by shutting down CICS normally or issuing the command CEMT SET FILE CLOSED.
After an uncontrolled or immediate shutdown, further BWO backups might be taken by DFSMShsm, because the BWO status in the ICF catalog is not reset. These backups should be discarded; only the non-BWO backups taken at the end of the batch window should be used during forward recovery, together with the CICS forward recovery logs.
Each of these operations is discussed in the following sections. File opening Different processing is done for each of the three cases when a file is opened for an update. The following processing takes place: v First file opened for update against a cluster v Subsequent file opened for update against a cluster while the previous file is still open (that is, the update use count in the DSNB is not zero) v Subsequent file opened for update against a cluster after all previous files have...
Page 223
v If the file was defined with BACKUPTYPE(STATIC) and the ICF catalog indicates that the data set is already ineligible for BWO, CICS sets the BACKUPTYPE attribute in the DSNB to indicate ineligibility for BWO. However, if the ICF catalog indicates that the data set is currently eligible for BWO, IGWABWO makes it ineligible for BWO and sets the recovery point to the current time.
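The two outcomes for a BACKUPTYPE(STATIC) file can be sketched as follows. This Python model is illustrative only. The class, field, and function names are hypothetical, the catalog update stands in for the IGWABWO call, and the sketch assumes that the DSNB is marked ineligible for BWO in both cases.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CatalogEntry:
    bwo_eligible: bool
    recovery_point: Optional[datetime] = None


def open_static_file(catalog: CatalogEntry, now: datetime) -> str:
    """Return the eligibility recorded in the DSNB for a STATIC file."""
    if catalog.bwo_eligible:
        # Stands in for IGWABWO: make the data set ineligible for BWO
        # and set the recovery point to the current time.
        catalog.bwo_eligible = False
        catalog.recovery_point = now
    # If the catalog already showed ineligible, it is left untouched.
    return "INELIGIBLE"
```

The point of the sketch is that the catalog is updated only when it disagrees with the STATIC definition, and that the recovery point then moves to the time of the open.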
v If the file was defined with BACKUPTYPE(STATIC) and the ICF catalog indicates that the data set is already ineligible for BWO, the ICF catalog is not updated. However, if the ICF catalog indicates that the data set is currently eligible for BWO, IGWABWO makes it ineligible for BWO and sets the recovery point to the current time.
Shutdown and restart The way CICS closes files is determined by whether the shutdown is controlled, immediate, or uncontrolled. Controlled shutdown During a controlled shutdown, CICS closes all open files defined in the FCT. This ensures that, for files that are open for update and eligible for BWO, the BWO attributes in the ICF catalog are set to a ‘BWO disabled’...
Page 226
For DFSMS 1.2 onward, access method services supports the import and export of BWO attributes. Invalid state changes for BWO attributes CICS, DFSMSdfp, DFSMSdss, and an SMSVSAM server can all update the BWO attributes in the ICF catalog. To prevent errors, DFSMSdss fails a BWO backup if one of the following state changes is attempted during the backup: v From ‘BWO enabled and VSAM split in progress’...
DFSMSdfp must now disallow the pending change to ‘BWO enabled’ (and DFSMSdss must fail the backup) because, if the split did not finish before the end of the backup, the invalid backup would not be discarded. v From ‘BWO disabled and VSAM split occurred’ to ‘BWO enabled’. This state change could be attempted if: 1.
each CICS region allows all units of work with updates for the data set to complete, then writes the tie-up records to the forward recovery log and the log of logs, and replies to DFSMSdss. For BWO backups, it is usually not necessary for the forward recovery utility to process a log from file-open time.
The forward recovery utility should ALLOCATE, with DISP=OLD, the data set that is to be recovered. This prevents other jobs from accessing a back-level data set and ensures that data managers such as CICS are not still using the data set. Before the data set is opened, the forward recovery utility should set the BWO attribute flags to the ‘Forward recovery started but not ended’...
An assembler program that calls DFSMS callable services
*ASM XOPTS(CICS,NOEPILOG,SP)
A program that can be run as a CICS transaction to Read and Set the BWO Indicators and BWO Recovery Point via DFSMS Callable Services (IGWABWO). Invoke the program via a CICS transaction as follows:
Rxxx 'data_set_name'...
         DS    30C
DATEVAL  DS    8C            Date value from BWO recovery point
SUCMSG1  DS    8C            Message text
TIMEVAL  DS    8C            Time value from BWO recovery point
SUCMSG2  DS    C             Message text
READMSG  DS    0CL11         If function = READ put out BWO flags
         DS    7C            Message text
BWOVAL1  DS    C...
MVC BWOFLAGS(12),ZEROES CLI BWOC1,C’0’ PRGBIT2 DS CLI BWOC2,C’0’ PRGBIT3 DS CLI BWOC3,C’0’ PRGREAD DS CLI TRANFUNC,C’R’ BNE PRGABORT * Set up the parameters for a read call MVC DSN(44),DSNAMER PRGCONT DS * OK, our parameters are set up, so create the address list, and make * the call LOAD EP=IGWABWO,ERRET=PRGABORT BALR 14,15...
         MVC   SUCMSG1(8),SUCTXT1
         MVC   SUCMSG2(1),SUCTXT2
         UNPK  KEYWORK(9),BWOTIME(5)
         TR    KEYWORK(8),HEXTAB-C'0'
         MVC   DATEVAL(8),KEYWORK
         UNPK  KEYWORK(9),BWOTIME+4(5)     Make time printable
         TR    KEYWORK(8),HEXTAB-C'0'
         MVC   TIMEVAL(8),KEYWORK
         CLI   TRANFUNC,C'S'
         BNE   PRGREADO
* Got all the info we need, so put it out and exit
         EXEC CICS SEND TEXT FROM(SUCMSG) LENGTH(55) ERASE WAIT
         B     PRGEXIT
* It's a read so we also need the BWO flags for output
PRGREADO DS...
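The UNPK steps against HEXTAB in the sample make the BWO recovery point printable: each of the two 4-byte halves of BWOTIME becomes eight hexadecimal characters, which are moved to DATEVAL and TIMEVAL. The following Python sketch shows the same conversion. It is illustrative only: split_recovery_point is a hypothetical name, and the sketch assumes that BWOTIME is an 8-byte field and that HEXTAB maps to uppercase hexadecimal digits.

```python
from typing import Tuple


def printable_hex(field: bytes) -> str:
    """Render each byte as two uppercase hexadecimal characters."""
    return field.hex().upper()


def split_recovery_point(bwotime: bytes) -> Tuple[str, str]:
    """Split an 8-byte recovery point into printable date and time strings.

    Assumption: the first four bytes hold the date value and the last
    four bytes hold the time value, as the sample's DATEVAL and TIMEVAL
    fields suggest.
    """
    if len(bwotime) != 8:
        raise ValueError("recovery point must be exactly 8 bytes")
    return printable_hex(bwotime[:4]), printable_hex(bwotime[4:])
```

The sketch does not interpret the digits as a calendar date or clock time. Like the sample program, it only makes the stored bytes readable.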
acceptable. If you are located in an area prone to hurricanes or earthquakes, for example, a disaster recovery site next door would be pointless. When you are planning for disaster recovery, consider the cost of being unable to operate your business for a period of time. You have to consider the number of lost transactions, and the future loss of business as your customers go elsewhere.
v How critical and sensitive your business processes are: the more critical they are, the more frequently testing may be required.
Six tiers of solutions for off-site recovery
One blueprint for recovery planning describes a scheme consisting of six tiers of off-site recoverability (tiers 1-6), with a seventh tier (tier 0) that relies on local recovery only, with no off-site backup.
Approach:
v Backups kept offsite
v Procedures and inventory offsite
v Recovery: install required hardware, restore system and data, reconnect to network
Figure 18. Disaster recovery tier 1: physical removal
Your disaster recovery plan has to include information to guide the staff responsible for recovering your system, from hardware requirements to day-to-day operations.
Tier 1
Tier 1 provides a very basic level of disaster recovery. You will lose data in the disaster, perhaps a considerable amount. However, tier 1 allows you to recover and provide some form of service at low cost. You must assess whether the loss of data and the time taken to restore a service will prevent your company from continuing in business.
The advantage of tier 3 is that you should be able to provide a service to your users quite rapidly. You must assess whether the loss of data will prevent your company from continuing in business. Figure 20 summarizes the tier 3 solution.
Approach:
v Backups kept off-site
v Procedures and inventory...
Figure 21. Disaster recovery tier 0-3: summary of solutions (diagram labels: no offsite data; truck access method, tapes to cold site; DFSMSdss; DFSMShsm (ABARS))
The advantage of these methods is their low cost. The disadvantages of these methods are:
v Recovery is slow, and it can take days or weeks to recover.
Approach:
v Workload may be shared
v Site one backs up site two and the reverse
v Critical applications and data are online
v Switch network
v Recover other applications
Figure 22. Disaster recovery tier 4: active secondary site (diagram labels: Site One, VTAM, 3745, channel extender, ESCON, 3990)
Tier 4 closes the gap between the point-in-time backups and current online processing recovery.
v Cost is higher than for the tier 1 to 3 solutions, because you need dedicated hardware, software, and communication links.
Tier 5 - two-site, two-phase commit
A tier 5 solution is appropriate for a custom-designed recovery plan with special applications.
IDMS, CPCS, ADABAS, and SuperMICR database management systems, collecting real-time log and journal data from them. RRDF is supplied by E-Net Corporation and is available from IBM as part of the IBM Cooperative Software Program. The benefits of tier 6 are:
v No loss of data.
support the XRC DFSMS/MVS host, and one for the recovery 3990; this allows a total of 86 km (53.4 miles) between the 3990s. If you use channel extenders with XRC, there is no limit on the distance between your primary and remote sites. For RRDF, there is no limit to the distance between the primary and secondary sites.
Disaster recovery and high availability
This topic describes the tier 6 solutions for high availability and data currency when recovering from a disaster.
Peer-to-peer remote copy (PPRC) and extended remote copy (XRC)
PPRC and XRC are both 3990-6 hardware solutions that provide data currency to secondary, remote volumes.
PPRC and XRC do not handle it. For more information on PPRC and XRC, see Planning for IBM Remote Copy, SG24-2595-00, and DFSMS/MVS Remote Copy Administrator's Guide and Reference.
PPRC or XRC?
You need to choose between PPRC and XRC for transmitting data to your backup site.
where there is a high volume of transactions, but each transaction is typically less than 200 dollars in value.
Other benefits of PPRC and XRC
PPRC or XRC may eliminate the need for disaster recovery backups to be taken at the primary site, or to be taken at all.
At least two RRDF licenses are required to support the remote site recovery function, one for the primary site and one for the remote site. For details of RRDF support needed for the CICS Transaction Server, see “Remote Recovery Data Facility support” on page 239.
You should ensure that a senior manager is designated as the disaster recovery manager. The recovery manager must make the final decision whether to switch to a remote site, or to try to rebuild the local system (this is especially true if you have adopted a solution that does not have a warm or hot standby site).
This support is available at the remote site only if the system log is transmitted and the CICS regions at the remote site are running under CICS Transaction Server. It is possible for system log records to be transmitted to the remote site for units of work that subsequently become indoubt-failed or backout-failed.
If a disaster occurs at the primary site, your disaster recovery procedures should include recovery of VSAM data sets at the designated remote recovery site. You can then emergency restart the CICS regions at the remote site so that they can back out any uncommitted data.
Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead.
The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Programming License Agreement, or any equivalent agreement between us.
CICS Transaction Server for z/OS Program Directory, GI13-0536 CICS Transaction Server for z/OS What's New, GC34-6994 CICS Transaction Server for z/OS Upgrading from CICS TS Version 2.3, GC34-6996 CICS Transaction Server for z/OS Upgrading from CICS TS Version 3.1, GC34-6997 CICS Transaction Server for z/OS Upgrading from CICS TS Version 3.2, GC34-6998...
CICSPlex SM Problem Determination, GC34-7037 Other CICS publications The following publications contain further information about CICS, but are not provided as part of CICS Transaction Server for z/OS, Version 4 Release 1. Designing and Programming CICS Applications, SR23-9692 CICS Application Migration Aid Guide, SC33-0768...
3270 emulator logged on to TSO v using a 3270 emulator as an MVS system console IBM Personal Communications provides 3270 emulation with accessibility features for people with disabilities. You can use this product to provide the accessibility features you need in your CICS system.
DL/I (continued) implicit enqueuing upon 158 intertransaction communication 146 scheduling program isolation scheduling 158 documenting recovery and restart programs 105 DSNBs, data set name blocks recovery 54 DTIMOUT option (DEFINE TRANSACTION) 123 dump table 50 dynamic RLS restart 37 dynamic transaction backout 6 basic mapping support 78 decision to use 150 emergency restart backout 6...
locking (continued) implicit locking on recoverable files 156 in application programs 154 locks 14 log of logs failures 119 logical levels, application program 92 logical recovery 132 lost locks recovery from 89 managing UOW state 18 MNPS 38, 39, 40 moving data sets using EXPORT and IMPORT commands 185...
system log stream basic definition 104 system logs log-tail deletion 114 system or abend exit creation 95 system recovery table (SRT) definition of 104 user extensions to 95 system warm keypoints 27 systems administration for BWO 208 tables for recovery 104 task termination, abnormal 94 DFHPEP execution 94 DFHREST execution 93...
IBM business partner, or your authorized remarketer. When you send comments to IBM, you grant IBM a nonexclusive right to use or distribute your comments in any way it believes appropriate without incurring any obligation to you. IBM or any other organizations will only use the personal information that you supply to contact you about the issues that you state on this form.