prometheus-native-histograms-in-production
created : Mon, 04 Sep 2023 01:05:34 +0900
modified : Mon, 04 Sep 2023 01:32:44 +0900
Links
Disclaimer
- Native Histograms are an experimental feature!
- Everything described here can still change!
- Things might break or behave weirdly!
prometheus --enable-feature=native-histograms
Wishlist
- Everything that works well now should continue to work well.
- I never want to configure buckets again.
- All histograms should always be aggregatable with each other, across time and space.
- I want accurate quantile and percentage estimations across the whole range of observations.
- I want all of that at a lower cost than current histograms so that I can finally partition histograms at will.
1. Resource consumption of the instrumented binary
2. Frequency of resets and resolution reduction
Scraping 15 instances of the cloud-backend-gateway.
Drop everything but the cortex_request_duration_seconds histograms.
Scraping classic histograms:
- 964 histograms (peak)
- 16388 series (964 * 17)
- 14460 buckets (964 * 15)
Scraping native histograms:
- 964 histograms (peak)
- 964 series
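The series counts above follow from simple arithmetic: a classic histogram exposes one series per bucket plus the _sum and _count series, while a native histogram is a single series. A minimal sketch of that calculation, assuming the 15 buckets per classic histogram already include the +Inf bucket (which is what makes 15 + 2 = 17 series). The function names are hypothetical.

```python
# Series needed per histogram in each representation (illustrative sketch).

def classic_series(histograms: int, buckets: int) -> int:
    """Classic: one series per bucket (incl. +Inf) plus _sum and _count."""
    return histograms * (buckets + 2)


def native_series(histograms: int) -> int:
    """Native: buckets, sum and count all live in one series."""
    return histograms


HISTOGRAMS = 964  # peak number of histograms in the experiment
BUCKETS = 15      # buckets per classic histogram (assumed to include +Inf)

if __name__ == "__main__":
    print("classic buckets:", HISTOGRAMS * BUCKETS)                  # 14460
    print("classic series: ", classic_series(HISTOGRAMS, BUCKETS))   # 16388
    print("native series:  ", native_series(HISTOGRAMS))             # 964
```

The native count stays at one series per histogram no matter how many buckets are populated, which is why partitioning histograms by more labels becomes affordable.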
Frequency of resets to reduce bucket count
- The top 10 most frequently reset histograms all match:
{route=~"api_prom_api_v1_query(_range)?", status_code="200"}
- Even among those, typically just a handful of resets per day.
- Worst offender during the 15 days of the experiment: 8 resets per day (please check the original video).
- Rarely hitting the configured 1h limit.
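The interplay between resets and resolution reduction can be sketched as a small decision function. This is a simplified model based on how the Go client documents its `NativeHistogramMaxBucketNumber` and `NativeHistogramMinResetDuration` options: when the bucket limit is exceeded, the histogram is reset if the last reset is old enough, otherwise its resolution is reduced. It leaves out the intermediate step of widening the zero bucket, and the class name, field names and the 160-bucket limit are hypothetical, not the configuration used in the experiment.

```python
from dataclasses import dataclass


@dataclass
class BucketLimitPolicy:
    """Simplified, hypothetical model of the bucket-limit decision."""

    max_buckets: int          # bucket-count limit per histogram
    min_reset_seconds: float  # minimum time between resets, e.g. 3600 for 1h

    def decide(self, bucket_count: int, seconds_since_reset: float) -> str:
        """Return the action taken once an observation creates a new bucket."""
        if bucket_count <= self.max_buckets:
            return "keep"  # still within the limit, nothing to do
        # A zero minimum disables resets entirely (as documented for the Go client).
        if self.min_reset_seconds > 0 and seconds_since_reset >= self.min_reset_seconds:
            return "reset"  # limit exceeded and the last reset is old enough
        # Limit exceeded too soon after the last reset: merge adjacent
        # buckets (schema - 1) instead of resetting.
        return "reduce_resolution"


if __name__ == "__main__":
    policy = BucketLimitPolicy(max_buckets=160, min_reset_seconds=3600)
    print(policy.decide(bucket_count=120, seconds_since_reset=600))   # keep
    print(policy.decide(bucket_count=161, seconds_since_reset=7200))  # reset
    print(policy.decide(bucket_count=161, seconds_since_reset=600))   # reduce_resolution
```

Under this model, rarely hitting the 1h limit means that resolution reductions should also be rare, which matches the observations below.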
Frequency of resolution reduction
- Only ever happened by one step (from growth factor 1.0905… to 1.1892…).
- Happens “occasionally”…
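The two growth factors correspond to the exponential bucket schemas 3 and 2: at schema s, consecutive bucket boundaries differ by a factor of 2^(2^-s), and reducing the resolution by one step means going from schema s to s-1, which merges each pair of adjacent buckets into one. A minimal sketch of both, assuming integer bucket indices where bucket i at schema s covers (base^(i-1), base^i]; the function names are hypothetical.

```python
from collections import Counter


def growth_factor(schema: int) -> float:
    """Ratio between consecutive bucket boundaries at the given schema."""
    return 2.0 ** (2.0 ** -schema)


def reduce_resolution(buckets: dict[int, int]) -> dict[int, int]:
    """Go from schema s to s-1 by merging pairs of adjacent buckets.

    Bucket i at schema s lies inside bucket ceil(i / 2) at schema s-1.
    """
    merged: Counter = Counter()
    for index, count in buckets.items():
        merged[-(-index // 2)] += count  # integer ceil(index / 2)
    return dict(merged)


if __name__ == "__main__":
    print(f"{growth_factor(3):.4f}")                # 1.0905
    print(f"{growth_factor(2):.4f}")                # 1.1892
    print(reduce_resolution({1: 3, 2: 5, 3: 1}))    # {1: 8, 2: 1}
```

Because every coarser schema's boundaries are a subset of the finer schema's boundaries, no observation ever straddles a new boundary. The same merge also lets two histograms with different schemas be aggregated by first reducing the finer one to the coarser schema.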