This is an IBM Automation portal for Cloud Management, Technology Cost Management, Network Automation and AIOps products. To view all of your ideas submitted to IBM, create and manage groups of Ideas, or create an idea explicitly set to be either visible by all (public) or visible only to you and IBM (private), use the IBM Unified Ideas Portal (https://ideas.ibm.com).
The Instana agent has the following default Java heap settings:
JAVA_MIN_MEM=64m
export JAVA_MIN_MEM
JAVA_MAX_MEM=160m
export JAVA_MAX_MEM
Even with JAVA_MAX_MEM set to more than three times the nominal memory usage (e.g. 512m against an average below 140m, as observed in a production environment), the agent can still run out of memory (OOM) under peak load or in certain other circumstances. To recover from this, a manual systemctl restart of the service restores "normal" resource consumption.
It is worth noting that automatic restart depends on how the agent was started. An agent running on Kubernetes can be restarted based on its configuration. But when the agent is extracted and/or installed from a package, it is not able to recover from a hard crash such as an OOM.
The proposal is for the agent to detect an OOM condition and recover from the OOM crash by itself. The systemctl service unit supports auto-restart, but the agent does not seem to return the correct failure code in an OOM situation. It would be desirable for the agent to have a means of auto-recovery.
The benefits of doing so would be:
avoid manual restart operations (toil)
avoid "out-of-band" automation scripts for recovery (i.e. clean self-recovery)
improve the reliability of the service
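Until the agent handles this itself, one possible interim workaround is sketched below. It writes a systemd drop-in that restarts the service whenever the process exits, regardless of its exit code, which sidesteps the incorrect failure code described above. This is only a sketch under assumptions that do not come from the post: the unit name `instana-agent.service`, the drop-in file name, and the 10-second delay are all illustrative, and the function name `write_restart_dropin` is hypothetical. It also only helps if the JVM actually exits on OOM. HotSpot JVMs offer `-XX:+ExitOnOutOfMemoryError` for that purpose, but whether and how the agent accepts extra JVM options is not covered here.

```shell
#!/bin/sh
# Sketch only, not an official Instana procedure.
# Assumptions: the unit name and drop-in file name below are illustrative.

UNIT_NAME="${UNIT_NAME:-instana-agent.service}"    # assumed unit name
DROPIN_ROOT="${DROPIN_ROOT:-/etc/systemd/system}"  # standard systemd drop-in root

# Write a drop-in that makes systemd restart the unit whenever it exits.
write_restart_dropin() {
    dropin_dir="$DROPIN_ROOT/$UNIT_NAME.d"
    mkdir -p "$dropin_dir" || return 1
    # Restart=always restarts on any exit code (a manual "systemctl stop"
    # is still respected); RestartSec delays the restart by 10 seconds.
    cat > "$dropin_dir/restart-always.conf" <<'EOF'
[Service]
Restart=always
RestartSec=10
EOF
}
```

To apply it, run `write_restart_dropin` as root, followed by `systemctl daemon-reload` and a restart of the service. Restarting unconditionally can hide the underlying cause, so it is worth keeping the agent logs from each OOM event.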
Idea priority: High
Hi,
Doing automatic restarts for OOM errors is a double-edged sword. It will improve reliability when the OOM happens only in very rare cases caused by an abnormal system state.
On the other hand, an automatic restart will cover up bugs in the agent (code, configuration, sensors), which will then go unnoticed.
I urge you to open a support ticket for every OOM occurrence you encounter with the agent. This gives us the ability to improve agent and sensor quality, and other customers will benefit as well.
Don't get me wrong, this is not meant to put quality control on your side! We do test the agent and our sensors on a lot of different environments and system combinations. However, our testing will never mirror all of our customers' setups, which may introduce such issues.
Best regards
Henning Treu - Product Manager Agent & Application Perspectives