ITCAM MS SQL Server Agent should self-recover the Collector on SQL Server recycle

The background and the issue to solve:
Currently, MS SQL agent version V6.3.1.20 fails to return data to TEPS after the SQL Server instance has been stopped and then started again. This occurs when the SQL Server instance has been stopped for more than the default 3 minutes. It happens because the Collector part of the agent needs the SQL Server instance to be available to connect to. By default, the Collector attempts to start and connect 3 times at 1-minute intervals and then shuts down.
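The arithmetic behind the 3-minute window can be sketched as follows. This is a minimal Python illustration, not agent code: the function names are hypothetical, and it assumes the window is simply the retry count multiplied by the retry interval.

```python
def retry_window_minutes(retry_count: int = 3, interval_minutes: int = 1) -> int:
    """Total time the Collector keeps trying before it shuts down.

    Assumption: the window is retry_count * interval_minutes, using the
    defaults described above (3 attempts, 1 minute apart).
    """
    return retry_count * interval_minutes


def collector_survives(outage_minutes: float, retry_count: int = 3,
                       interval_minutes: int = 1) -> bool:
    """True if the SQL Server instance comes back within the retry window."""
    return outage_minutes <= retry_window_minutes(retry_count, interval_minutes)


print(retry_window_minutes())   # 3
print(collector_survives(2))    # True
print(collector_survives(10))   # False
```

Under these assumptions, any outage longer than 3 minutes, such as a patching window or a server reboot, leaves the Collector stopped until someone restarts it.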

Possible manual mitigation
It is possible to mitigate this by extending the retry window via an Override Local Value setting of the COLL_MSSQL_RETRY_INTERVAL and/or COLL_MSSQL_RETRY_CNT environment variables. However, these must be set via the MTEMS UI or by running TACMD ConfigureSystem post install, and it is not obvious how long the interval should be. Alternatively, a situation can be created that detects when the Collector is stopped while the SQL Server instance is running, and takes an action to start the Collector. Both of these help ease the issue, but in a large enterprise organization with several hundred instances to monitor, neither is optimal.

The Requested change/Enhancement

The core agent KOQAgent_<Instance_name> will respond to the scenario where the collector KOQCOLL_<instance_name> is stopped, has exhausted its retries (reached the default or overridden retry count), and the SQL instance <instance_name> is running. The response will be to attempt to restart KOQCOLL_<instance_name>. The default behaviour will be to run this process once and then fire a situation to the TEPS, so that the condition can be alerted on to the installation's ITSM tooling.
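The three conditions above can be expressed as a single check. The following Python sketch is illustrative only: `CollectorState` and `should_restart_collector` are hypothetical names, and how the core agent would actually obtain each value is outside the scope of this request.

```python
from dataclasses import dataclass


@dataclass
class CollectorState:
    """Hypothetical snapshot of what the core agent would need to know."""
    collector_running: bool      # is KOQCOLL_<instance_name> up?
    retries_exhausted: bool      # default or overridden retry count reached?
    sql_instance_running: bool   # is SQL instance <instance_name> up?


def should_restart_collector(state: CollectorState) -> bool:
    """Restart only when all three conditions from the request hold."""
    return (not state.collector_running
            and state.retries_exhausted
            and state.sql_instance_running)


# Collector gave up during an outage and the instance is back: restart it.
print(should_restart_collector(CollectorState(False, True, True)))   # True
# Instance is still down: restarting the Collector would only fail again.
print(should_restart_collector(CollectorState(False, True, False)))  # False
```

Checking that the SQL instance is running before restarting avoids burning restart attempts while the instance is still down.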

The default behaviour should be configurable via silent_install response files or directly in config files, such that changes can be picked up by a recycle of the agent or pushed via TACMD. It should be possible to change the default of 1 attempt before alerting, either to switch the restart off or to allow a larger number of attempts as specified by the installation. Alerting should be able to be toggled on or off, or set to alert every 'n' tries, where 'n' is not larger than the maximum retries set. The alert should include details of the instance, the hostname, and the number of attempts made to start the Collector.
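One way to picture the requested settings is as a small policy object. This Python sketch is an assumption about how the options might fit together, not a proposed implementation: the names `RecoveryPolicy`, `max_attempts`, `alert_enabled`, and `alert_every`, and the alert text format, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPolicy:
    """Hypothetical settings for the requested self-recovery behaviour."""
    max_attempts: int = 1        # default: 1 restart attempt; 0 switches it off
    alert_enabled: bool = True   # alerting can be toggled on or off
    alert_every: int = 1         # alert every n tries, n <= max_attempts

    def __post_init__(self) -> None:
        if self.max_attempts < 0:
            raise ValueError("max_attempts must be 0 or greater")
        if self.max_attempts > 0 and not 1 <= self.alert_every <= self.max_attempts:
            raise ValueError("alert_every must be between 1 and max_attempts")


def should_alert(policy: RecoveryPolicy, attempts_made: int) -> bool:
    """True when the attempt count lands on an alerting boundary."""
    if not policy.alert_enabled or attempts_made < 1:
        return False
    return attempts_made % policy.alert_every == 0


def alert_text(instance: str, hostname: str, attempts_made: int) -> str:
    """Alert carrying the three details the request asks for."""
    return (f"Collector restart failed: instance={instance} "
            f"hostname={hostname} attempts={attempts_made}")


policy = RecoveryPolicy(max_attempts=10, alert_every=5)
for attempt in range(1, policy.max_attempts + 1):
    if should_alert(policy, attempt):
        # Instance and host names here are placeholders.
        print(alert_text("INSTANCE1", "sqlhost01", attempt))
```

With `max_attempts=10` and `alert_every=5`, this loop prints an alert on attempts 5 and 10 only. Rejecting an `alert_every` larger than `max_attempts` at construction time enforces the constraint stated above.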

Idea priority

High
