prometheus-native-histograms-in-production
created : 2023-09-03T16:05:34+00:00
modified : 2023-09-03T16:32:44+00:00
Links
- https://www.youtube.com/watch?v=TgINvIK9SYc&list=PLj6h78yzYM2ORxwcjTn4RLAOQOYjvQ2A3&index=6
Disclaimer
- Native Histograms are an experimental feature!
- Everything described here can still change!
- Things might break or behave weirdly!
prometheus --enable-feature=native-histograms
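On the instrumentation side, a classic histogram turns into a native one by setting a bucket factor in client_golang (v1.15+). A minimal sketch, assuming the metric name from the experiment; the values, port, and simulated observations are illustrative, not the talk's actual code:

```go
package main

import (
	"log"
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Setting NativeHistogramBucketFactor opts the histogram into the native
	// (sparse, exponential-bucket) representation; no manual bucket boundaries
	// are configured. 1.1 (at most ~10% growth per bucket) is an illustrative choice.
	reqDuration := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:                        "cortex_request_duration_seconds",
		Help:                        "Request duration in seconds.",
		NativeHistogramBucketFactor: 1.1,
	})
	prometheus.MustRegister(reqDuration)

	// Simulated observations standing in for real request latencies.
	go func() {
		for {
			reqDuration.Observe(rand.ExpFloat64() / 10)
			time.Sleep(100 * time.Millisecond)
		}
	}()

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```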
Wishlist
- Everything that works well now should continue to work well.
- I never want to configure buckets again.
- All histograms should always be aggregatable with each other, across time and space.
- I want accurate quantile and percentage estimations across the whole range of observations.
- I want all of that at a lower cost than current histograms so that I can finally partition histograms at will.
1. Resource consumption of the instrumented binary
2. Frequency of resets and resolution reduction
- Scraping 15 instances of the cloud-backend-gateway.
- Drop everything but the cortex_request_duration_seconds histograms.
- Scraping classic histograms:
  - 964 histograms (peak)
  - 16388 series (964 * 17: 15 bucket series + _sum + _count per histogram)
  - 14460 buckets (964 * 15)
- Scraping native histograms:
  - 964 histograms (peak)
  - 964 series (one series per native histogram)
Frequency of resets to reduce bucket count
- Top 10 reset histograms are all:
{route=~"api_prom_api_v1_query(_range)", status_code="200"}
- Even among those, typically just a handful of resets per day.
- Worst offender during the 15d of the experiment: 8 resets per day. (Please check the original video)
- Rarely touching the configured 1h limit.
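For context, a sketch of the client_golang options that govern this behaviour, with assumed values rather than the experiment's exact configuration (the "1h limit" above presumably maps to NativeHistogramMinResetDuration):

```go
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// Sketch with assumed values. If the number of live buckets would exceed
	// the configured maximum, client_golang reduces the histogram's cost
	// (widening the zero bucket, reducing resolution, or resetting it; see the
	// library docs for the exact order). A reset is only permitted once the
	// minimum reset duration has elapsed.
	reqDuration := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:                            "cortex_request_duration_seconds",
		Help:                            "Request duration in seconds.",
		NativeHistogramBucketFactor:     1.1,
		NativeHistogramMaxBucketNumber:  160,       // cap on live buckets (assumed value)
		NativeHistogramMinResetDuration: time.Hour, // at most one reset per hour
	})
	prometheus.MustRegister(reqDuration)
}
```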
Frequency of resolution reduction
- Only ever happened one step (from growth factor 1.0905… to 1.1892…).
- Happens “occasionally”…
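For reference (not from the talk): those two factors are the standard exponential schemas 3 and 2. With schema s the bucket growth factor is 2^(2^-s), so one resolution-reduction step (s → s-1) squares the growth factor. A small sketch:

```go
package main

import (
	"fmt"
	"math"
)

// Native histograms use standard exponential bucketing: with schema s, each
// bucket boundary is the previous one multiplied by 2^(2^-s).
func growthFactor(schema int) float64 {
	return math.Pow(2, math.Pow(2, -float64(schema)))
}

func main() {
	fmt.Printf("schema 3: %.4f\n", growthFactor(3)) // 1.0905 = 2^(1/8)
	fmt.Printf("schema 2: %.4f\n", growthFactor(2)) // 1.1892 = 2^(1/4), i.e. 1.0905 squared
}
```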