ITCAM MS SQL Server Agent should self-recover the Collector on SQL Server recycle
The background and the issue to solve
Currently, MS SQL agent version V184.108.40.206 stops returning data to the TEPS (Tivoli Enterprise Portal Server) after the SQL Server instance has been stopped and then started again, if the instance was down for longer than the default 3 minutes. This happens because the Collector component of the agent needs the SQL Server instance to be available to connect to. By default, the Collector attempts to start and connect 3 times at 1-minute intervals and then shuts down.
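To make the 3-minute window concrete, here is a minimal Python model of the default retry behaviour described above. It is a sketch, not the agent's code: the `RetryPolicy` class and both functions are hypothetical, and it assumes the attempts happen at 1x, 2x, ... Nx the interval after the instance goes down and that the interval is measured in minutes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    # Hypothetical model of the Collector's reconnect settings.
    # retry_cnt mirrors COLL_MSSQL_RETRY_CNT; interval_min mirrors
    # COLL_MSSQL_RETRY_INTERVAL, assumed here to be in minutes.
    retry_cnt: int = 3
    interval_min: float = 1.0


def attempt_times(policy: RetryPolicy) -> list[float]:
    # Assumption: attempts happen at 1x, 2x, ..., Nx the interval
    # after the SQL Server instance goes down.
    return [policy.interval_min * k for k in range(1, policy.retry_cnt + 1)]


def collector_survives(outage_min: float, policy: RetryPolicy = RetryPolicy()) -> bool:
    # The Collector reconnects only if at least one attempt happens
    # once the instance is back up; otherwise it shuts down for good.
    return any(t >= outage_min for t in attempt_times(policy))


if __name__ == "__main__":
    print(collector_survives(2.5))                             # True: back before the 3rd attempt
    print(collector_survives(4.0))                             # False: Collector has shut down
    print(collector_survives(4.0, RetryPolicy(retry_cnt=10)))  # True
```

Under this model any SQL Server outage longer than retry count x interval leaves the Collector stopped until someone intervenes, which is exactly the gap this idea asks to close.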
Possible manual mitigation
The issue can be mitigated by extending the retry window through Override Local Value settings for the COLL_MSSQL_RETRY_INTERVAL and/or COLL_MSSQL_RETRY_CNT environment variables. However, these overrides must be applied through the MTEMS (Manage Tivoli Enterprise Monitoring Services) UI or by running tacmd configureSystem after installation, and it is not obvious how long the interval should be. Alternatively, a situation can be created that detects when the Collector is stopped while the SQL Server instance is running, with an action that starts the Collector. Both approaches ease the issue, but neither is practical in a large enterprise with several hundred instances to monitor.
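The "how long do you set the interval to?" problem is simple arithmetic, but it depends on guessing the longest outage in advance. The sketch below is a hypothetical helper (not part of the product) that computes the smallest retry count covering a given outage, assuming attempts at interval, 2 x interval, and so on, with the interval in minutes.

```python
import math


def min_retry_count(max_outage_min: float, interval_min: float = 1.0) -> int:
    # Smallest retry count that still covers an outage of max_outage_min
    # minutes, assuming attempts at interval_min, 2*interval_min, ...
    # (consistent with the default 3 attempts x 1 minute = 3 minutes).
    if interval_min <= 0:
        raise ValueError("interval_min must be positive")
    return max(1, math.ceil(max_outage_min / interval_min))


if __name__ == "__main__":
    print(min_retry_count(3))        # 3: the default window
    print(min_retry_count(45))       # 45 retries at 1-minute intervals
    print(min_retry_count(45, 5.0))  # 9 retries at 5-minute intervals
```

Whatever value is chosen, an outage one interval longer than the guess still leaves the Collector stopped, and the override has to be applied to every instance individually.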
The requested change/enhancement
The core agent KOQAgent_<instance_name> should respond to the scenario where the Collector KOQCOLL_<instance_name> is stopped, has exhausted its retries (reached the default or overridden retry count), and the SQL Server instance <instance_name> is running. The response is to attempt to restart KOQCOLL_<instance_name>. By default, the agent makes one restart attempt and then fires a situation to the TEPS so that the condition can be forwarded as an alert to the installation's ITSM tooling.
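The trigger condition and the default behaviour can be sketched as follows. This is an illustration of the requested logic only: `CollectorStatus`, `should_restart_collector`, `self_recover`, and the callbacks are hypothetical names, not agent APIs, and the sketch assumes the situation is fired only when the single restart attempt does not bring the Collector back.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CollectorStatus:
    # Hypothetical snapshot that KOQAgent_<instance_name> would inspect.
    collector_running: bool     # is KOQCOLL_<instance_name> up?
    retries_exhausted: bool     # default or overridden retry count reached
    sql_instance_running: bool  # is the SQL Server instance itself up?


def should_restart_collector(s: CollectorStatus) -> bool:
    # Restart only when the Collector gave up even though SQL Server is back.
    return (not s.collector_running) and s.retries_exhausted and s.sql_instance_running


def self_recover(s: CollectorStatus,
                 start_collector: Callable[[], bool],
                 fire_situation: Callable[[str], None]) -> bool:
    # Requested default: one restart attempt, then alert (assumed: on failure).
    # Returns True if the Collector is running afterwards.
    if not should_restart_collector(s):
        return s.collector_running
    if start_collector():
        return True
    fire_situation("KOQCOLL restart failed after 1 attempt")
    return False


if __name__ == "__main__":
    alerts: list[str] = []
    status = CollectorStatus(collector_running=False, retries_exhausted=True,
                             sql_instance_running=True)
    ok = self_recover(status, start_collector=lambda: False, fire_situation=alerts.append)
    print(ok, alerts)  # False ['KOQCOLL restart failed after 1 attempt']
```

Checking that the SQL Server instance is running keeps the agent from repeatedly restarting a Collector that cannot connect anyway.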
The default behaviour should be configurable through silent install response files or directly in configuration files, so that changes are picked up when the agent is recycled or pushed out via tacmd. It should be possible to change the default of 1 attempt before alerting, either to switch self-recovery off or to make as many attempts as the installation specifies. Alerting should be able to be toggled on or off, or set to alert every n attempts, where n is not larger than the maximum number of retries. The alert should include the instance name, the hostname, and the number of attempts made to start the Collector.
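The configuration rules above can be expressed as a small validated settings object plus an attempt loop. This is a sketch under stated assumptions: none of these setting names exist in the product today, the instance and host names in the usage are made up, and it assumes no further alerts are raised once the Collector starts successfully.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RecoveryConfig:
    # Hypothetical settings, e.g. read from a silent install response file.
    max_attempts: int = 1      # 0 switches self-recovery off
    alert_enabled: bool = True
    alert_every: int = 1       # alert on every n-th failed attempt

    def __post_init__(self) -> None:
        if self.max_attempts < 0:
            raise ValueError("max_attempts must be >= 0")
        # n must not be larger than the maximum number of retries.
        if self.alert_enabled and not (1 <= self.alert_every <= max(self.max_attempts, 1)):
            raise ValueError("alert_every must be between 1 and max_attempts")


def run_recovery(cfg: RecoveryConfig, instance: str, hostname: str,
                 start_collector: Callable[[], bool]) -> list[dict]:
    # Try to restart the Collector; return the alerts that would go to TEPS.
    alerts: list[dict] = []
    for attempt in range(1, cfg.max_attempts + 1):
        if start_collector():
            break  # assumption: no alert once the Collector is back
        if cfg.alert_enabled and attempt % cfg.alert_every == 0:
            alerts.append({"instance": instance, "hostname": hostname, "attempts": attempt})
    return alerts


if __name__ == "__main__":
    cfg = RecoveryConfig(max_attempts=6, alert_every=3)
    for alert in run_recovery(cfg, "SQLPROD01", "dbhost01", start_collector=lambda: False):
        print(alert)
    # {'instance': 'SQLPROD01', 'hostname': 'dbhost01', 'attempts': 3}
    # {'instance': 'SQLPROD01', 'hostname': 'dbhost01', 'attempts': 6}
```

With the defaults (`RecoveryConfig()`), a Collector that fails to start produces exactly one alert after one attempt, matching the requested default behaviour.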
Why is it useful?
It will increase the potential for automated recovery from failures of the OQ product Collector, reducing the need for:
1) manual intervention to recover stopped monitoring
2) guesswork about how long to set the retry interval
3) extra build complexity on the customer's side, such as adding steps to configure the retry count override or creating a situation so that the monitoring agent monitors itself
It will make the product more reliable and stable in large installations.