This is an IBM Automation portal for Cloud Management and AIOps products. To view all of your ideas submitted to IBM, create and manage groups of Ideas, or create an idea explicitly set to be either visible by all (public) or visible only to you and IBM (private), use the IBM Unified Ideas Portal (https://ideas.ibm.com).
ITCAM MS SQL Server Agent should self-recover the Collector on SQL Server recycle
The background and the issue to solve: Currently, MS SQL agent version V220.127.116.11 fails to return data to TEPS after the SQL Server instance has been stopped and then started again. This occurs if the SQL Server instance has been stopped for longer than the default 3 minutes. It happens because the Collector part of the agent needs the SQL instance to be available to connect to. By default, the Collector attempts to start and connect 3 times at 1-minute intervals and then shuts down.
Possible manual mitigation
It is possible to mitigate this by extending the retry interval and/or retry count via an Override Local Value setting of the COLL_MSSQL_RETRY_INTERVAL and/or COLL_MSSQL_RETRY_CNT environment variables. However, these must be set via the MTEMS UI or by running TACMD ConfigureSystem after installation, and it is not obvious how long the interval should be. Alternatively, a situation can be created to detect that the Collector is stopped while the SQL instance is running and take an action to start the Collector. Both of these help ease the issue, but in a large enterprise organization with several hundred instances to monitor, neither is optimal.
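To make the "how long?" question concrete, here is a minimal Python sketch of the arithmetic. It assumes the recovery window is simply retry count multiplied by retry interval, and that the interval is expressed in minutes; both are assumptions inferred from the 3 attempts at 1-minute intervals described above, not confirmed agent behaviour. The function names are hypothetical.

```python
import math

# Defaults taken from the description above: 3 attempts, 1 minute apart.
DEFAULT_RETRY_CNT = 3
DEFAULT_RETRY_INTERVAL_MIN = 1


def retry_window_minutes(retry_cnt=DEFAULT_RETRY_CNT,
                         retry_interval_min=DEFAULT_RETRY_INTERVAL_MIN):
    """Longest outage (in minutes) the Collector keeps retrying through.

    Assumption: window = count * interval.
    """
    return retry_cnt * retry_interval_min


def retries_needed(outage_minutes, retry_interval_min=DEFAULT_RETRY_INTERVAL_MIN):
    """Smallest retry count whose window covers an outage of the given length."""
    return math.ceil(outage_minutes / retry_interval_min)


if __name__ == "__main__":
    print(retry_window_minutes())   # default window in minutes
    print(retries_needed(45))       # retries for a 45-minute outage, 1-minute interval
    print(retries_needed(45, 10))   # same outage with a 10-minute interval
```

Any fixed values chosen this way still only cover outages up to the chosen window, which is why tuning alone does not scale to several hundred instances with different maintenance patterns.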
The Requested change/Enhancement
The core agent KOQAgent_<Instance_name> will respond to the scenario where the collector KOQCOLL_<instance_name> is stopped, has exhausted its retries (reached the default or overridden retry count) and the SQL instance <instance_name> is running. The response will be to attempt to restart KOQCOLL_<instance_name>. The default behaviour will be to run this process once and then fire a situation to the TEPS so that the condition can be alerted on to the installation's ITSM tooling.
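The requested decision logic could be sketched as follows. This is an illustration of the request, not agent code: the `CollectorState` type, the `next_action` function and the action names are hypothetical, and the sketch assumes the situation fires only once the allowed restart attempts have been used and the Collector is still stopped.

```python
from dataclasses import dataclass


@dataclass
class CollectorState:
    """Hypothetical snapshot of what the core agent would need to observe."""
    collector_running: bool      # is KOQCOLL_<instance_name> running?
    retries_exhausted: bool      # has the Collector used up its retry count?
    sql_instance_running: bool   # is the SQL instance itself running?


def next_action(state, restart_attempts_made, max_restart_attempts=1):
    """Return "restart", "alert" or "none" for the current state.

    The core agent acts only when all three conditions from the request hold:
    the Collector is stopped, its retries are exhausted, and the SQL instance
    is running again.
    """
    needs_recovery = (
        not state.collector_running
        and state.retries_exhausted
        and state.sql_instance_running
    )
    if not needs_recovery:
        return "none"
    if restart_attempts_made < max_restart_attempts:
        return "restart"
    # Allowed restarts are used up and the Collector is still stopped.
    return "alert"


if __name__ == "__main__":
    stuck = CollectorState(collector_running=False,
                           retries_exhausted=True,
                           sql_instance_running=True)
    print(next_action(stuck, restart_attempts_made=0))
    print(next_action(stuck, restart_attempts_made=1))
```

Keeping the check in the core agent means no extra situation has to be authored and distributed for each monitored instance.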
The default behaviour should be configurable via silent_install response files or directly in config files, such that changes can be picked up by a recycle of the agent or pushed via TACMD. It should be possible to change the default of 1 attempt before alerting, either to switch the restart off or to make a larger number of attempts, as specified by the installation. It should be possible to toggle the alert on or off, or to set it to fire every 'n' tries, where 'n' is not larger than the maximum retries set. The alert should include details of the instance, hostname and number of attempts made to start the Collector.
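The configuration rules in this paragraph can be expressed as a small validated policy. The sketch below is only an illustration: `RestartPolicy`, its field names, and the alert text format are hypothetical and do not correspond to any existing agent setting. It assumes a maximum of 0 attempts means the restart is switched off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestartPolicy:
    """Hypothetical settings for the requested Collector self-recovery."""
    max_attempts: int = 1       # default: 1 attempt before alerting; 0 = switched off
    alert_enabled: bool = True  # the alert can be toggled on or off
    alert_every: int = 1        # alert every 'n' tries

    def __post_init__(self):
        if self.max_attempts < 0:
            raise ValueError("max_attempts must be 0 (off) or a positive number")
        # 'n' must not be larger than the maximum retries set.
        if self.alert_enabled and self.max_attempts > 0:
            if not 1 <= self.alert_every <= self.max_attempts:
                raise ValueError("alert_every must be between 1 and max_attempts")


def should_alert(policy, attempts_made):
    """True when an alert is due after the given number of restart attempts."""
    return (
        policy.alert_enabled
        and attempts_made > 0
        and attempts_made % policy.alert_every == 0
    )


def alert_message(instance, hostname, attempts_made):
    """Build alert text carrying the three details the request asks for."""
    return (f"Collector restart failed: instance={instance} "
            f"host={hostname} attempts={attempts_made}")


if __name__ == "__main__":
    policy = RestartPolicy(max_attempts=10, alert_every=5)
    for attempt in range(1, policy.max_attempts + 1):
        if should_alert(policy, attempt):
            # "PAYROLL" and "sqlhost01" are placeholder values.
            print(alert_message("PAYROLL", "sqlhost01", attempt))
```

Validating the settings when they are loaded would let a bad value in a response file be rejected at agent recycle rather than surfacing later as a missing alert.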