Cloud Management and AIOps



Status Delivered
Workspace Instana
Categories Alert
Created by Guest
Created on May 13, 2024

Would like to define Warning and Critical threshold ranges within a single Event or SmartAlert

If a customer wants both a Warning and a Critical threshold for the same underlying metric, they must create either 2 Smart Alerts or 2 Events. For example, create a Warning Event when CPU usage is between 80% and 90% and a Critical Event when CPU usage is greater than 90%. The same can be done with Smart Alerts: for example, create 2 Smart Alerts, with Warning when latency is between 500ms and 1000ms and Critical when latency is greater than 1000ms.

This works, but it can cause problems from an Alerting perspective. If the metric moves back and forth between the Warning and Critical ranges, multiple Events and Alerts can trigger. Ultimately, this can lead to multiple Tickets in a ticketing system. Here is an example:

CPU thresholds are Warning from 80% to 90% and Critical greater than 90%

  • CPU usage is 88%: an Event triggers and an Alert is sent.
  • CPU increases to 92%: another Event triggers and another Alert is sent. The original Event MAY also close, depending on the grace period.
  • CPU decreases to 87%: if the grace period for the first Event has passed, a NEW Event and Alert open. Depending on the grace period of the Critical Event, the second Event closes.

It would be MUCH better if a single Event and Alert could be updated to reflect the severity of the underlying conditions. This could be accomplished with a single Event or Smart Alert definition that has 2 different severities and value ranges associated with it.
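
The requested behavior can be sketched in a few lines of Python. This is only an illustration of the idea, not Instana's implementation or API: the names `severity_for` and `SingleEvent` are hypothetical, the thresholds are the ones from the CPU example above, and grace periods are ignored for simplicity.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical thresholds taken from the CPU example above; not an Instana API.
WARNING_FROM = 80.0    # Warning applies from 80% up to and including 90%
CRITICAL_ABOVE = 90.0  # Critical applies above 90%


def severity_for(cpu_percent: float) -> Optional[str]:
    """Map one CPU sample to a severity, or None when no threshold is breached."""
    if cpu_percent > CRITICAL_ABOVE:
        return "critical"
    if cpu_percent >= WARNING_FROM:
        return "warning"
    return None


@dataclass
class SingleEvent:
    """One event whose severity is updated in place instead of opening a new event."""
    severity: Optional[str] = None
    opened: int = 0
    history: List[str] = field(default_factory=list)

    def observe(self, cpu_percent: float) -> None:
        new = severity_for(cpu_percent)
        if new == self.severity:
            return  # nothing changed, so no new notification
        if self.severity is None:
            self.opened += 1
            self.history.append(f"open:{new}")
        elif new is None:
            self.history.append("close")
        else:
            self.history.append(f"update:{new}")
        self.severity = new


# Replay the sequence from the example: 88% -> 92% -> 87%.
event = SingleEvent()
for sample in (88, 92, 87):
    event.observe(sample)
print(event.opened, event.history)
```

With this model the 88%, 92%, 87% sequence opens one event and updates it twice, so a ticketing system would see a single ticket whose severity changes.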

Idea priority Low
  • Guest
    Sep 4, 2024

    The response states that this ask was not delivered and that the idea was relegated to a nice-to-have. In version 2 81, there is an option to configure an event or Smart Alert as critical or warning, but each one is independent of the other. So yes, the warning will remain open until the metric value decreases below its threshold, and the same goes for the critical.

    Secondly, yes, there will be multiple events opened and closed if the value bounces above and then below the thresholds. Using the grace period and "ignore for" settings is a novel way of flattening out the event cycles.

    To conclude, this was not delivered and was shuffled to "nice to have". Thus, to get functionality that all other APM software has, we will need to use the workaround, which is two events or two Smart Alerts. We will then have to contend with the chatter, with trying to line up the critical close with the critical open and the warning open with the warning close, and with the possibility of multiple opens and closes.

    We are staring down over 2,000 alerts in our current APM, and to convert them we will need to create 4,000+ Smart Alerts manually, since Smart Alerts do not have a Terraform API. Or we could use 4,000 events, but since there is no placeholder for the affected server/instance, we would basically need to create the event set for each context. So instead of the base 2,000, we are looking at (5 metrics x 500 servers = 2,500) + (5 metrics x 400 applications/services = 2,000) = 4,500, then x2 for critical/warning, which is 9,000. Yeah, the workaround adds far more than a "nice to have".

  • Guest
    May 20, 2024

    Just for completeness, this is a follow-up of https://automation-management.ideas.ibm.com/ideas/INSTANA-I-1773. It was identified as a nice-to-have, but not an ITM blocker, due to the existing workaround of creating 2 alert configurations with different thresholds/severities.

    There is, however, one difference from what is described in the Idea description. We don't allow defining a range (e.g. WARN from 80%-90%), and we also don't intend to do it that way. Instead, we intend to define an escalation of different thresholds, all using the same threshold operator, to ensure there are no gaps or conflicts between the thresholds, which would otherwise cause the problems described in the Idea description.

    > And, the original Event MAY close depending on the grace period

    This would therefore not happen, not even in the current (workaround) solution of 2 Custom Events, because the rule would not be defined as WARNING in the range 80-90%, but as WARNING > 80%, which would still hold when the metric changes to 92%. Both defined Custom Events (WARN if > 80%; CRITICAL if > 90%) would be active at that time, as both conditions are met.
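
The escalation described in this reply can be illustrated with a short Python sketch. It assumes only what the reply states, namely that every rule uses the same "greater than" operator; the names `RULES` and `active_rules` are hypothetical and do not correspond to any Instana API.

```python
from typing import Dict, List

# Hypothetical rule set mirroring the two Custom Events in the reply above.
# Every rule uses the same operator: it is active when value > threshold.
RULES: Dict[str, float] = {"WARNING": 80.0, "CRITICAL": 90.0}


def active_rules(value: float) -> List[str]:
    """Return the names of all rules whose threshold the value exceeds."""
    return [name for name, threshold in RULES.items() if value > threshold]


# Replay the sequence from the Idea description, plus a recovered sample.
for sample in (88, 92, 87, 75):
    print(sample, active_rules(sample))
```

Because the thresholds overlap rather than form ranges, the WARNING rule stays active while the value is at 92%, so it neither closes nor reopens when the value later drops back to 87%.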