This is an IBM Automation portal for Cloud Management and AIOps products. To view all of your ideas submitted to IBM, create and manage groups of Ideas, or create an idea explicitly set to be either visible by all (public) or visible only to you and IBM (private), use the IBM Unified Ideas Portal (https://ideas.ibm.com).
With cgroup v2 starting to emerge, it may be a good idea to start collecting and using the PSI metrics available on machines and containers that have cgroup v2 enabled. The metrics are valid for both servers and containers.
Looking at load average (the existing metric) doesn't give you the full picture of what is happening in your systems. Its shortest window is a 1-minute average, which is too long in the container world. You also need to correlate load average with other metrics to understand whether you have a problem.
PSI (Pressure Stall Information) identifies and quantifies the disruptions caused by resource shortages and the time impact they have on complex workloads or even entire systems.
It exposes 3 different files, one per resource: cpu, memory, and io under /proc/pressure.
Each file reports the pressure metrics as averages over the last 10, 60, and 300 seconds.
Example for CPU:
root:~# cat /proc/pressure/cpu
some avg10=0.03 avg60=0.07 avg300=0.06 total=5376072182
avg10: Percentage of the last 10 seconds during which processes were stalled
avg60: Percentage of the last 60 seconds during which processes were stalled
avg300: Percentage of the last 300 seconds during which processes were stalled
total: Cumulative time (in microseconds) that processes have been stalled since the server booted
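To illustrate how a collector might read these fields, here is a minimal Python sketch that parses one PSI line into a dictionary. The function name parse_psi_line is hypothetical, and the sample string is the CPU output shown above; this is only one possible way to do it, not a description of how any existing agent works.

```python
def parse_psi_line(line: str) -> dict:
    """Parse one PSI line such as 'some avg10=0.03 ... total=123'.

    Hypothetical helper for illustration only.
    """
    kind, *fields = line.split()
    metrics = {"kind": kind}
    for field in fields:
        key, _, value = field.partition("=")
        # 'total' is an integer counter; the avg* fields are percentages.
        metrics[key] = int(value) if key == "total" else float(value)
    return metrics


# Sample taken from the CPU example in this idea.
sample = "some avg10=0.03 avg60=0.07 avg300=0.06 total=5376072182"
parsed = parse_psi_line(sample)
print(parsed)
```

On a host with PSI enabled, the same function could be applied to each line read from /proc/pressure/cpu.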
For example, if processes were starved of the CPU for 5 of the last 10 seconds, avg10 will read about 50, meaning 50% of the last 10 seconds.
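The same percentage can be derived from the total counter by sampling it twice. The sketch below assumes total is reported in microseconds, as the Linux kernel PSI documentation describes; the function name stall_percent and its parameters are hypothetical and chosen only for this example.

```python
def stall_percent(total_start_us: int, total_end_us: int, interval_s: float) -> float:
    """Percentage of an interval spent stalled, from two 'total' readings.

    Assumes 'total' is a cumulative stall time in microseconds.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    stalled_s = (total_end_us - total_start_us) / 1_000_000
    return 100.0 * stalled_s / interval_s


# 5 seconds of stall (5,000,000 microseconds) within a 10-second window.
result = stall_percent(0, 5_000_000, 10.0)
print(result)  # 50.0
```

Sampling the counter this way lets a collector pick windows shorter than 10 seconds, which the built-in averages do not offer.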