Cloud Management and AIOps


Workspace Instana
Categories Alert
Created by Guest
Created on May 13, 2024

Would like to define Warning and Critical threshold ranges within a single Event or SmartAlert

If a customer wants both a Warning and a Critical threshold for the same underlying metric, they must create either 2 Smart Alerts or 2 Events. For example, create a Warning Event when CPU usage is between 80% and 90% utilization and a Critical Event when CPU usage is greater than 90%. Smart Alerts can be handled the same way: create 2 Smart Alerts, one with Warning when latency is between 500ms and 1000ms and one with Critical when latency is greater than 1000ms.

This works, but it can cause problems from an Alerting perspective. If the metric moves back and forth between the Warning and Critical ranges, multiple Events and Alerts can trigger. Ultimately, this can lead to multiple Tickets in a ticketing system. Here is an example:

CPU thresholds are Warning from 80% to 90% and Critical above 90%:

  • CPU usage is 88%: an Event triggers and an Alert is sent.
  • CPU increases to 92%: another Event triggers and another Alert is sent. The original Event MAY also close, depending on its grace period.
  • CPU decreases to 87%: if the grace period for the first Event has passed, a NEW Event and Alert open. Depending on the grace period of the Critical Event, the second Event closes.
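The sequence above can be reproduced with a small simulation. This is only an illustrative sketch, not Instana's rule engine: `RangeRule` and `simulate` are hypothetical names, the boundary handling (exclusive lower bound, inclusive upper bound) is an assumption, and grace periods are assumed to be zero so that an Event closes as soon as its condition stops matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeRule:
    """Hypothetical range-based rule: matches when low < value <= high."""
    name: str
    low: float
    high: float

    def matches(self, value: float) -> bool:
        return self.low < value <= self.high


def simulate(rules, samples):
    """Return a log of (sample, action, rule name), assuming no grace period."""
    open_rules = set()
    log = []
    for value in samples:
        for rule in rules:
            active = rule.matches(value)
            if active and rule.name not in open_rules:
                open_rules.add(rule.name)
                log.append((value, "open", rule.name))
            elif not active and rule.name in open_rules:
                open_rules.remove(rule.name)
                log.append((value, "close", rule.name))
    return log


RULES = [
    RangeRule("warning", 80.0, 90.0),
    RangeRule("critical", 90.0, float("inf")),
]

log = simulate(RULES, [88, 92, 87])
opened = [entry for entry in log if entry[1] == "open"]
for entry in log:
    print(entry)
```

Under these assumptions, three samples open three separate Events (warning, critical, warning), each of which could produce its own Alert and Ticket.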

It would be MUCH better if a single Event and Alert could be updated to reflect the severity of the underlying condition. This could be accomplished with a single Event or Smart Alert definition that has 2 different severities and value ranges associated with it.
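As a rough sketch of the requested behavior, the following models one Event whose severity is updated in place instead of being closed and reopened. The names `LEVELS`, `severity_for`, and `track_single_event` are hypothetical, and the "open"/"update"/"close" actions are an assumption about how such an Event might be reported, not an existing Instana feature.

```python
from typing import Optional

# Hypothetical severity levels, highest first: (severity, threshold in percent).
LEVELS = [("critical", 90.0), ("warning", 80.0)]


def severity_for(value: float) -> Optional[str]:
    """Return the highest severity whose threshold the value exceeds."""
    for severity, threshold in LEVELS:
        if value > threshold:
            return severity
    return None


def track_single_event(samples):
    """Log lifecycle changes of one Event whose severity changes in place."""
    current = None
    log = []
    for value in samples:
        severity = severity_for(value)
        if severity == current:
            continue  # nothing changed, no new notification
        if current is None:
            log.append((value, "open", severity))
        elif severity is None:
            log.append((value, "close", current))
        else:
            log.append((value, "update", severity))
        current = severity
    return log


single_log = track_single_event([88, 92, 87, 70])
for entry in single_log:
    print(entry)
```

With the same CPU values as in the example (plus a final drop to 70%), only one Event is opened; the moves to 92% and back to 87% become severity updates on that Event.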

Idea priority Low
  • Guest
    May 20, 2024

    Just for completeness, this is a follow-up of https://automation-management.ideas.ibm.com/ideas/INSTANA-I-1773. It was identified as a nice-to-have, but not an ITM blocker, due to the existing workaround of creating 2 alert configurations with different thresholds/severities.

    In contrast to what is described in the Idea description, there is however one thing that is different. We don't allow defining a range (e.g. WARN from 80%-90%), and we also don't intend to do it that way. Instead, we intend to define an escalation of different thresholds that all use the same threshold operator, to ensure there are no gaps or conflicts between the thresholds, which are what cause the problems described in the Idea description.

    > And, the original Event MAY close depending on the grace period

    This would therefore not happen, not even in the current (workaround) solution of 2 Custom Events, because the rule would not be defined as WARNING in the range of 80-90%, but as WARNING > 80%, which would still be the case when the metric changes to 92%. Both defined Custom Events (WARN if > 80%; CRITICAL if > 90%) would be active at that time, as both conditions are met.
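    To illustrate the point about using the same operator for both thresholds, here is a minimal sketch. `THRESHOLD_RULES` and `active_rules` are hypothetical names used only for this illustration and do not reflect Instana's configuration format.

```python
# Hypothetical Custom Event rules that all use the same ">" operator.
THRESHOLD_RULES = {"warning": 80.0, "critical": 90.0}


def active_rules(value: float) -> list:
    """Return the names of all rules whose '>' condition the value satisfies."""
    return [name for name, threshold in THRESHOLD_RULES.items() if value > threshold]


for cpu in (88, 92, 87):
    print(cpu, active_rules(cpu))
```

    In this sketch the warning rule stays active across 88%, 92%, and 87%, so it never closes and reopens; the critical rule is simply active in addition while the value is above 90%.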