"Request for a written description of detailed information about relationship between OpState of resources and OpState of resource group, in Administrator's and User's Guide"
This request originates from PMR 79859,6XM,760 (E54-z-79859):
TSA V3.2.1 FP3 design (internal behavior) change causes the lsrpdomain command to fail during TSA startup.
In our customer's environment, daily operations such as starting and stopping the domain or a resource group are performed by user scripts.
While a resource group was stopping, the MonitorCommand that runs after the StopCommand was killed, and the OpState of that resource became "Unknown".
At that time the OpState of the resource group remained "PendingOffline" because other resources were still in the middle of stopping.
After all resources were stopped, the OpState of the resource group changed to "StuckOnline" and then momentarily to "FailedOffline".
For the resource whose OpState had become "Unknown", the MonitorCommand is executed again after MonitorCommandPeriod.
The MonitorCommand then detects OpState "Offline", and the OpState of the resource group becomes "Offline".
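The sequence above suggests that a stop script should not treat the first "StuckOnline" or "FailedOffline" it sees as final. A minimal sketch in Python, under stated assumptions: get_opstate is a hypothetical callable supplied by the script author that returns the current OpState name as a string (how it obtains the value is not shown here), and the function name, the settle_polls parameter and the default values are illustrative only, not part of TSA. The polling interval would normally be chosen to be at least as long as MonitorCommandPeriod.

```python
import time


def wait_for_stable_offline(get_opstate, settle_polls=2, max_polls=30, interval=5.0):
    """Poll until "Offline" is seen on settle_polls consecutive polls.

    get_opstate is a hypothetical callable returning an OpState name such as
    "PendingOffline", "StuckOnline", "FailedOffline", "Unknown" or "Offline".
    Any state other than "Offline" resets the counter, so a momentary
    "StuckOnline" or "FailedOffline" is not reported as a final result.
    Returns True if the state settled at "Offline", False if max_polls
    polls were used up first.
    """
    consecutive_offline = 0
    for _ in range(max_polls):
        state = get_opstate()
        if state == "Offline":
            consecutive_offline += 1
            if consecutive_offline >= settle_polls:
                return True
        else:
            # Transient or unexpected state: start counting again.
            consecutive_offline = 0
        time.sleep(interval)
    return False


if __name__ == "__main__":
    # Replay of the sequence described in this request, without real polling delays.
    observed = iter(["PendingOffline", "StuckOnline", "FailedOffline", "Offline", "Offline"])
    print(wait_for_stable_offline(lambda: next(observed), interval=0))
```

Whether two consecutive "Offline" readings are enough depends on the environment; the sketch only shows the idea of waiting out intermediate states instead of reacting to them.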
The TSA manuals do not describe behavior like this.
Many other users also need to start and stop the domain with user scripts.
Unless all the transition conditions of OpState are known, such scripts will abend because they cannot accurately detect and handle the exact state of the TSA resources.
On this point, I received the following answer:
"Customer was using IBM.ERRM to monitor the OpState change at the resource group level which can and does report internal OpState changes, not just the "true" OpState change for the resources.
Recommended that customer change monitoring structure to watch either the aggregate resources or their constituent resources in hopes of much greater granularity of problem detection and reaction."
According to this answer, user scripts should not refer to the OpState of resource groups.
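One way a script could follow that recommendation is to decide from the OpState of the constituent resources rather than from the resource group. A rough sketch in Python, with assumptions stated plainly: member_states is a hypothetical mapping from resource name to OpState name that the script has already collected (the collection step is not shown), and the three result labels are invented for this illustration and are not TSA terms.

```python
# OpState names that, per the case described in this request, may still
# change once the MonitorCommand has run again.
NOT_YET_KNOWN = {"Unknown", "PendingOffline"}


def summarize_members(member_states):
    """Summarize the OpState names of a group's member resources.

    member_states: hypothetical dict of resource name -> OpState name.
    Returns one of three illustrative labels:
      "stopped"      every member reports "Offline"
      "undetermined" no members given, or at least one member is still
                     "Unknown" or "PendingOffline" (checked before errors)
      "attention"    anything else, for example a member in "FailedOffline"
    """
    states = set(member_states.values())
    if not states:
        return "undetermined"
    if states == {"Offline"}:
        return "stopped"
    if states & NOT_YET_KNOWN:
        return "undetermined"
    return "attention"
```

In the case described above, the member whose MonitorCommand was killed would yield "undetermined" until the next monitor run reports "Offline", so the script would keep waiting instead of failing on the intermediate state of the resource group.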
For error handling, we want detailed information about the relationship between the OpState of resource groups and the OpState of resources, so that we can avoid unexpected behavior.
The Administrator's and User's Guide (SC34-2583-01, p. 42) explains this behavior with the statement "A resource group can reach the status Failed Offline as a result of a binder run" (see below).
We want the manual to describe the details of the binder's decision.
Failed Offline 3
Specifies that one or more member resources contained in the resource group, are FailedOffline. In this case, all resources contained in the resource group will be set to Offline.
A resource group can reach the status Failed Offline "as a result of a binder run" if a resource group member could not be placed. See also “Assigning node location” on page 87. If this is the case, the automation details show a BindingState of Sacrificed.
Due to processing by IBM, this request was reassigned to have the following updated attributes:
Brand - Cloud
Product family - Workload Automation and Control Desk
Product - Tivoli System Automation for Multiplatforms (SA MP)
For record keeping, the previous attributes were:
Brand - WebSphere
Product family - ITSM Automation and Control Desk
Product - Tivoli System Automation for Multiplatforms (SA MP)
Due to processing by IBM, this request was reassigned to have the following updated attributes:
Brand - WebSphere
Product family - ITSM Automation and Control Desk
Product - Tivoli System Automation for Multiplatforms (SA MP)
For record keeping, the previous attributes were:
Brand - Tivoli
Product family - Automation
Product - Tivoli System Automation for Multiplatforms (SA MP)