Overview:
Our current Instana environment does not reliably persist traces and calls beyond the default 7-day retention period. According to the documentation, when a trace meets specific conditions (such as being viewed or containing critical errors), it should be persisted and retrievable for up to 365 days. However, we are seeing unpredictable behavior where traces that meet these conditions still expire after 7 days, which impacts long-term analysis and troubleshooting.
Problem Statement:
Traces and calls that are either critical (e.g., containing errors) or have been viewed should be retained for 365 days.
Currently, even when these conditions are met, the traces and calls are not consistently persisted beyond 7 days.
This behavior:
- Limits our ability to conduct thorough long-term error analysis.
- Prevents effective post-incident review and troubleshooting.
- Reduces confidence in the observability solution, as the historical context is lost unexpectedly.
Proposed Enhancement:
- Guaranteed 365-Day Retention:
Automatically persist and ensure accessibility of traces and calls for 365 days once they meet one of the following criteria:
- Viewed Traces: Traces or calls that have been viewed in the UI or retrieved through an API call (GET by trace ID).
The size / number of calls should not be relevant for persisted / extended retention of traces here.
- Traces with a generated link: When a link has been created for a trace or call via the Link button, it must be persisted.
The size / number of calls should not be relevant for persisted / extended retention of traces here.
- Critical Traces: Any trace or call that includes critical errors or log messages with severity ERROR or FATAL. (Since this can amount to a lot of traces in an error scenario, sampling can be applied to those that are not viewed. It would be fantastic to have sampling based on service / error type for the errors that go into long-term retention.)
The size / number of calls can be relevant for persisted / extended retention of traces here, so as not to cause too much data storage. Alternatively, more aggressive sampling could be used here.
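To make the requested behavior concrete, the criteria above could be sketched roughly as follows. This is only an illustration of the request, not Instana's implementation: the `Trace` fields, the `retention_days` function, and the per-(service, error type) sample rates are all hypothetical names chosen for this example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

DEFAULT_RETENTION_DAYS = 7
EXTENDED_RETENTION_DAYS = 365


@dataclass(frozen=True)
class Trace:
    """Hypothetical trace record; field names are assumptions for this sketch."""
    trace_id: str
    service: str
    viewed: bool = False              # viewed in the UI or fetched by trace ID
    has_link: bool = False            # a link was created via the Link button
    error_type: Optional[str] = None  # set if a call/log has severity ERROR or FATAL


def in_sample(trace: Trace, rate: float) -> bool:
    """Deterministic sampling: the same trace ID always gets the same decision."""
    digest = hashlib.sha256(trace.trace_id.encode()).digest()
    value = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return value < rate * 2**64


def retention_days(
    trace: Trace,
    error_rates: Optional[Dict[Tuple[str, str], float]] = None,
    default_error_rate: float = 0.1,
) -> int:
    # Viewed or linked traces are always kept, regardless of size / call count.
    if trace.viewed or trace.has_link:
        return EXTENDED_RETENTION_DAYS
    # Unviewed critical traces are kept according to a sample rate that can be
    # tuned per (service, error type) to limit storage.
    if trace.error_type is not None:
        rates = error_rates or {}
        rate = rates.get((trace.service, trace.error_type), default_error_rate)
        if in_sample(trace, rate):
            return EXTENDED_RETENTION_DAYS
    return DEFAULT_RETENTION_DAYS
```

A more aggressive sampling for a noisy service would then just be a lower rate for that (service, error type) pair, while viewed and linked traces stay unaffected.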
Side note:
- Make the persisted traces visible only in the Analyze trace / call view or via the persisted link. If a user is looking for statistics, such as APM metrics, those traces should not be the foundation of the metrics once they have passed the 7-day retention period. After the 7-day trace retention, metrics are the source of truth for APM views.
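As a rough sketch of this visibility rule (the view names and the `usable_for` function are hypothetical and only illustrate the request):

```python
from datetime import timedelta

DEFAULT_RETENTION = timedelta(days=7)

# Hypothetical view names: the only places where a persisted trace
# older than 7 days would still show up.
LONG_TERM_VIEWS = {"analyze", "link"}


def usable_for(view: str, trace_age: timedelta, persisted: bool) -> bool:
    """Return True if a trace of the given age may be used by the given view."""
    if trace_age <= DEFAULT_RETENTION:
        return True           # within standard retention: available everywhere
    if not persisted:
        return False          # expired and not persisted: gone
    return view in LONG_TERM_VIEWS  # persisted: never feeds "metrics"
```

Under this rule, a persisted 30-day-old trace is still reachable through the Analyze view or its link, but APM metric views never count it.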
Configuration and Documentation:
Update the Instana documentation to accurately reflect when and how traces are persisted for 365 days.
Business Impact:
Ensuring that critical or frequently viewed traces and calls are retained for 365 days is vital for:
- Long-term error analysis and efficient troubleshooting.
- Maintaining a comprehensive historical record for performance and reliability investigations.
- Enabling teams to review past incidents, identify recurring issues, and improve overall service quality.
Use Case Example:
A) During a critical incident, a development team generates a short link for a trace that meets the criteria for extended retention. The expectation is that this trace remains accessible for 365 days, allowing the team to perform in-depth post-incident analysis. With the current 7-day retention, the loss of this trace compromises the ability to diagnose recurring issues and delays resolution efforts.
B) During the technical review of a 2-week sprint, the team finds suspicious metrics (errors, high latency, error log messages) that did not cause an incident. The development team wants to understand the issue in more detail and, at the end of the sprint, looks for traces that meet the criteria. The expectation is that they can still see some traces for the specific error beyond the 7 days, covering the 2-week sprint, allowing the team to perform in-depth analysis to understand the nature of the problem. With the current 7-day retention, the loss of these traces compromises the ability to diagnose infrequent issues and delays resolution efforts.
Conclusion:
We request that Instana implement a solution that automatically persists traces and calls that are either critical or have been viewed, and guarantees their retention for 365 days. This enhancement will improve long-term observability, support effective troubleshooting, and enhance overall operational reliability.
Thank you for considering this enhancement request. I look forward to your feedback.
Hello Alireza,
Thank you for taking the time to provide your ideas to IBM. Your request may not be delivered within the release currently under development, but the theme aligns with our current roadmap and, as such, is being tagged for future consideration. IBM may consider and evaluate any community feedback for your request through activities such as voting and we may reach out to you about this request to discuss additional details with you in the future.
We truly value our relationship with you and appreciate your willingness to share details about your experience, your recommendations, and ideas.
If you have any additional feedback or thoughts, or if there is anything else I can do, please do not hesitate to reply to this message to continue the conversation.
Please note: IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion.