Cloud Management and AIOps


Status Submitted
Workspace Cloudability
Created by Guest
Created on Nov 6, 2025

Cloudability feature that analyzes Azure Event Hubs and reports over-provisioning and rightsizing recommendations

What We Need

A Cloudability feature that analyzes Azure Event Hubs usage and tells us when we're paying for more capacity than we need (or when we're about to hit limits). Since tier and capacity are set at the namespace level, we need rightsizing recommendations at that level - but with visibility into what each individual event hub is consuming.


 

Why This Matters

Event Hubs can get expensive fast, especially Premium tier. We need to know:

  • Are we overprovisioned? (wasting money)
  • Are we about to hit limits? (performance risk)
  • Should we split or consolidate namespaces?
  • Should we switch between Standard and Premium tiers?

Analysis Scope - What to Support

Multi-Level Analysis (Priority Order)

  1. Single Namespace Analysis - Deep dive with individual event hub breakdown 
    • Shows total namespace utilization
    • Breaks down which event hubs are consuming what percentage
    • Identifies "noisy neighbors" hogging resources
  2. Batch Namespace Analysis - Multiple namespaces in one report 
    • Summary view: "15 namespaces analyzed, 8 optimization opportunities, $12K/month savings"
    • Drill down into any namespace for details
  3. Subscription-Level - All namespaces in a subscription
  4. Cross-Subscription - All namespaces across multiple subscriptions
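The batch summary line could be produced by a simple roll-up over per-namespace results. This is only a sketch: the record fields `action` and `monthly_savings` are hypothetical names, not part of any existing Cloudability schema.

```python
def batch_summary(results):
    """Roll per-namespace results up into a one-line batch summary.

    Each result is a dict with hypothetical keys:
      "action"          - e.g. "downsize", "upsize", or "no_change"
      "monthly_savings" - estimated savings in dollars (0 if none)
    """
    opportunities = [r for r in results if r["action"] != "no_change"]
    total_savings = sum(r["monthly_savings"] for r in opportunities)
    return (
        f"{len(results)} namespaces analyzed, "
        f"{len(opportunities)} optimization opportunities, "
        f"${total_savings:,.0f}/month savings"
    )


example_results = [
    {"namespace": "prod-events-ns", "action": "downsize", "monthly_savings": 1200},
    {"namespace": "dev-events-ns", "action": "downsize", "monthly_savings": 800},
    {"namespace": "qa-events-ns", "action": "no_change", "monthly_savings": 0},
]
print(batch_summary(example_results))
```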

Why Individual Event Hub Visibility Matters

Even though we rightsize at the namespace level, we need to see individual event hub usage to make smart decisions:

Example: prod-eventhub-namespace (Premium, 2 PUs, $2,400/month)
├─ orders-hub: 85% of throughput → Maybe needs its own namespace
├─ inventory-hub: 10% of throughput ┐
├─ logging-hub: 3% of throughput    ├ Could consolidate these three into one Standard namespace
└─ analytics-hub: 2% of throughput  ┘
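A breakdown like the one above could be derived from per-hub throughput totals. The sketch below is illustrative only: the function names and the 50% "noisy neighbor" cutoff are assumptions, not values defined in this request.

```python
def hub_shares(throughput_by_hub):
    """Return each event hub's share of total namespace throughput, in percent."""
    total = sum(throughput_by_hub.values())
    if total == 0:
        return {hub: 0.0 for hub in throughput_by_hub}
    return {hub: 100.0 * value / total for hub, value in throughput_by_hub.items()}


def noisy_neighbors(throughput_by_hub, threshold_pct=50.0):
    """List hubs whose share meets or exceeds threshold_pct (assumed cutoff)."""
    shares = hub_shares(throughput_by_hub)
    return [hub for hub, share in shares.items() if share >= threshold_pct]


# Throughput per hub in arbitrary but consistent units (e.g. KB/s).
example_hubs = {
    "orders-hub": 850,
    "inventory-hub": 100,
    "logging-hub": 30,
    "analytics-hub": 20,
}
for hub, share in hub_shares(example_hubs).items():
    print(f"{hub}: {share:.0f}% of throughput")
print("Noisy neighbors:", noisy_neighbors(example_hubs))
```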


 

What to Collect

Basic Info

  • Subscription/Account Name
  • Vendor: Azure
  • Resource Group
  • Namespace Name
  • Current Tier: Standard or Premium
  • Current Capacity: # of TUs (Standard) or PUs (Premium)
  • Auto-Inflate Status (Available on Standard Tier only): Enabled/Disabled + max units
  • Date Range: Minimum 30 days recommended

Metrics to Track

Throughput (Most Important for Rightsizing)

  • Incoming bytes/sec (ingress) - converted to MB/s
  • Outgoing bytes/sec (egress) - converted to MB/s
  • Incoming messages/sec
  • Outgoing messages/sec
  • Track for each: Average, Peak, P95, P99
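As a rough illustration of the statistics requested above, the sketch below summarizes a series of throughput samples. It assumes the nearest-rank percentile method and decimal megabytes (1 MB = 1,000,000 bytes). Both are assumptions, since the request does not specify either.

```python
import math

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_mb_per_sec(bytes_per_sec_samples):
    """Convert bytes/sec samples to MB/s and return Average, Peak, P95, and P99."""
    mb = [b / BYTES_PER_MB for b in bytes_per_sec_samples]
    return {
        "average": sum(mb) / len(mb),
        "peak": max(mb),
        "p95": percentile(mb, 95),
        "p99": percentile(mb, 99),
    }


# 100 synthetic samples: 1 MB/s, 2 MB/s, ..., 100 MB/s
example_samples = [n * BYTES_PER_MB for n in range(1, 101)]
print(summarize_mb_per_sec(example_samples))
```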

Why these matter:

  • Standard: 1 TU = 1 MB/s (or 1,000 events/s) ingress and 2 MB/s (or 4,096 events/s) egress; the two limits apply independently
  • Premium: 1 PU ≈ 8 MB/s combined throughput
  • Your bottleneck is whichever limit you hit first (usually egress on Standard)
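The capacity rules above translate into a utilization figure roughly as follows. This is a sketch under the figures stated in this request; in particular, the 8 MB/s per PU value is the approximation used here, not a guaranteed limit, and the function name is hypothetical.

```python
STANDARD_INGRESS_MB_PER_TU = 1.0   # MB/s ingress per throughput unit
STANDARD_EGRESS_MB_PER_TU = 2.0    # MB/s egress per throughput unit
PREMIUM_COMBINED_MB_PER_PU = 8.0   # approximate figure taken from this request


def throughput_utilization(tier, units, ingress_mb_s, egress_mb_s):
    """Return throughput utilization in percent for a namespace.

    Standard: the higher of ingress and egress utilization (the constraining side).
    Premium: combined ingress + egress against the approximate per-PU capacity.
    """
    if units <= 0:
        raise ValueError("units must be positive")
    if tier == "Standard":
        ingress_util = ingress_mb_s / (units * STANDARD_INGRESS_MB_PER_TU)
        egress_util = egress_mb_s / (units * STANDARD_EGRESS_MB_PER_TU)
        return 100.0 * max(ingress_util, egress_util)
    if tier == "Premium":
        combined = ingress_mb_s + egress_mb_s
        return 100.0 * combined / (units * PREMIUM_COMBINED_MB_PER_PU)
    raise ValueError(f"unsupported tier: {tier!r}")


# 10 TUs with 7.2 MB/s egress: 7.2 / 20 MB/s capacity
print(round(throughput_utilization("Standard", 10, 3.0, 7.2), 1))
# 3 PUs with 2.1 MB/s ingress and 4.3 MB/s egress: 6.4 / 24 MB/s capacity
print(round(throughput_utilization("Premium", 3, 2.1, 4.3), 1))
```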

Performance Issues

  • Throttled requests (send + consumer)
  • Server errors
  • User errors
  • Success rate

Connections

  • Active connections (current, peak, average)
  • Connections opened/closed per period

Premium Tier Only

  • CPU usage % (average, peak)
  • Memory usage % (average, peak)

Additional

  • Namespace storage utilization
  • Capture backlog (if using Capture feature)

Rightsizing Thresholds - When to Recommend Changes

Important: All utilization percentages below refer to throughput utilization - the percentage of ingress/egress capacity being used based on the calculations above. Always use the higher of ingress or egress as your constraining metric.

Safe to Downsize (High Confidence)

  • P95 throughput utilization < 45%
  • Peak throughput utilization < 65%
  • Throttling < 0.1% of requests
  • Sustained for 70%+ of analysis period
  • Action: Reduce capacity by 20-30%

Example: 10 TUs, P95 egress 7.2 MB/s → 36% utilization → Reduce to 7 TUs

Critical Downsize (Very Safe)

  • P95 throughput utilization < 30%
  • Peak throughput utilization < 50%
  • Zero throttling
  • Action: Reduce capacity by 30-40%

Example: 10 TUs, P95 egress 4.5 MB/s → 22.5% utilization → Reduce to 6 TUs

Needs Upsize (Performance Risk)

  • P95 throughput utilization > 75%
  • Peak throughput utilization > 90%
  • Throttling > 1% of requests
  • Action: Increase capacity by 20-30%

Example: 10 TUs, P95 egress 16.2 MB/s → 81% utilization → Increase to 13 TUs

Critical Upsize (Act Now)

  • P95 throughput utilization > 85%
  • Peak throughput utilization > 95%
  • Throttling > 5% of requests
  • Sustained high usage > 1 hour
  • Action: Increase capacity by 40-50% immediately

Example: 10 TUs, Peak egress 19.5 MB/s → 97.5% utilization → Increase to 15 TUs NOW

Optimal Range (No Changes)

  • P95 throughput: 55-70%
  • Peak throughput: 75-85%
  • Throttling: < 1%

This range provides:

  • Enough headroom for traffic spikes
  • Cost efficiency (not grossly overprovisioned)
  • Performance safety margin
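One possible way to encode the threshold bands above is sketched below. It makes assumptions the request does not spell out: any single upsize signal is enough to trigger an upsize (matching the examples, which each cite one metric), all downsize conditions must hold together, and the "sustained" duration checks are left out. Anything that fits no band, including the gaps between bands, is reported as no change.

```python
def classify(p95_util, peak_util, throttled_pct):
    """Map utilization (percent) and throttling (percent of requests) to a band.

    Assumptions: upsize bands fire on ANY signal, downsize bands require ALL
    conditions, and the sustained-duration criteria are not evaluated here.
    """
    if p95_util > 85 or peak_util > 95 or throttled_pct > 5:
        return "critical_upsize"      # increase capacity by 40-50%
    if p95_util > 75 or peak_util > 90 or throttled_pct > 1:
        return "needs_upsize"         # increase capacity by 20-30%
    if p95_util < 30 and peak_util < 50 and throttled_pct == 0:
        return "critical_downsize"    # reduce capacity by 30-40%
    if p95_util < 45 and peak_util < 65 and throttled_pct < 0.1:
        return "safe_to_downsize"     # reduce capacity by 20-30%
    return "no_change"                # optimal range or between bands


print(classify(36.0, 60.0, 0.0))   # safe_to_downsize
print(classify(22.5, 40.0, 0.0))   # critical_downsize
print(classify(81.0, 88.0, 0.5))   # needs_upsize
print(classify(80.0, 97.5, 0.0))   # critical_upsize
print(classify(60.0, 80.0, 0.2))   # no_change
```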

Tier Change Recommendations

Premium → Standard:

  • P95 < 35% consistently
  • Predictable, steady workload
  • No dedicated resource requirements
  • Potential 40-60% cost savings

Standard → Premium:

  • Frequent throttling at max TUs
  • Need predictable performance
  • CPU/memory constraints on Standard

What Each Recommendation Should Include

Summary View

Current State:
- Namespace: prod-events-ns
- Tier: Premium, 3 PUs
- Current Cost: $3,600/month
- Ingress: 2.1 MB/s (P95), 3.2 MB/s (Peak)
- Egress: 4.3 MB/s (P95), 6.8 MB/s (Peak)
- Combined: 6.4 MB/s (P95), 10.0 MB/s (Peak)
- Capacity: 24 MB/s (3 PUs × 8 MB/s)
- Utilization: 27% (P95), 42% (Peak)

Recommendation:
- Reduce to 2 PUs (Premium)
- New Capacity: 16 MB/s
- New Utilization: 40% (P95), 63% (Peak)
- New Cost: $2,400/month
- Savings: $1,200/month ($14,400/year)
- Confidence: High (95%)
- Buffer: 37% headroom above Peak (60% above P95)
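The numbers in a recommendation like this could be derived as in the sketch below. The function and field names are hypothetical. The $1,200 per PU monthly price is inferred from the example ($3,600 for 3 PUs) and is illustrative only, not actual Azure pricing.

```python
def project_resize(p95_mb_s, peak_mb_s, new_units, mb_per_unit,
                   current_monthly_cost, monthly_cost_per_unit):
    """Project capacity, utilization, headroom, and savings after a resize."""
    new_capacity = new_units * mb_per_unit
    new_cost = new_units * monthly_cost_per_unit
    monthly_savings = current_monthly_cost - new_cost
    peak_util = 100.0 * peak_mb_s / new_capacity
    return {
        "new_capacity_mb_s": new_capacity,
        "new_p95_utilization_pct": 100.0 * p95_mb_s / new_capacity,
        "new_peak_utilization_pct": peak_util,
        "peak_headroom_pct": 100.0 - peak_util,
        "new_monthly_cost": new_cost,
        "monthly_savings": monthly_savings,
        "annual_savings": 12 * monthly_savings,
    }


# prod-events-ns: combined 6.4 MB/s (P95) and 10.0 MB/s (Peak), 3 PUs -> 2 PUs
projection = project_resize(
    p95_mb_s=6.4, peak_mb_s=10.0, new_units=2, mb_per_unit=8.0,
    current_monthly_cost=3600, monthly_cost_per_unit=1200,
)
for field, value in projection.items():
    print(f"{field}: {value}")
```

The projected peak utilization comes out at 62.5%, which the summary above rounds to 63%.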

Idea priority High