Beyond Tracing - What do we do with all this data

created : 2023-03-05T17:28:31+00:00
modified : 2023-03-05T17:56:19+00:00

tempo grafana
  • https://www.youtube.com/watch?v=zVHHeO8tAWQ

Overview

  • Metrics-generator
  • Parquet
  • TraceQL

Metrics-generator

Why metrics if you have traces?

  • Traces are transaction-oriented : Highly structured
  • Metrics are service-oriented : Aggregated, historical

  • Span metrics:
    • Rate, Error, Duration
  • Service graph metrics:
    • Extract service topology
  • Tempo launched Oct 2020
  • Tempo 1.0 Jun 2021
  • Search over recent data Nov 2021
  • Full backend search Jan 2022
  • Parquet storage format Dec 2021
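
To make the two metric families above concrete for myself, here is a minimal Python sketch that derives span metrics (rate, error, duration) and service graph edges from a handful of spans. It is not how the Tempo metrics-generator is implemented; the `Span` class, its field names, and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical, simplified span record (not Tempo's data model).
    span_id: str
    parent_id: Optional[str]
    service: str
    name: str
    duration_ms: float
    error: bool = False


def span_metrics(spans, window_seconds):
    """Aggregate rate, error ratio and average duration per (service, span name)."""
    grouped = defaultdict(list)
    for s in spans:
        grouped[(s.service, s.name)].append(s)
    return {
        key: {
            "rate_per_s": len(group) / window_seconds,
            "error_ratio": sum(s.error for s in group) / len(group),
            "avg_duration_ms": sum(s.duration_ms for s in group) / len(group),
        }
        for key, group in grouped.items()
    }


def service_graph(spans):
    """Count caller -> callee edges: a child span whose service differs from its parent's."""
    by_id = {s.span_id: s for s in spans}
    edges = defaultdict(int)
    for s in spans:
        parent = by_id.get(s.parent_id)
        if parent is not None and parent.service != s.service:
            edges[(parent.service, s.service)] += 1
    return dict(edges)


spans = [
    Span("a", None, "frontend", "GET /cart", 120.0),
    Span("b", "a", "cart", "GetCart", 80.0),
    Span("c", "b", "postgres", "SELECT", 30.0, error=True),
    Span("d", "b", "postgres", "SELECT", 10.0),
]
metrics = span_metrics(spans, window_seconds=60)
edges = service_graph(spans)
print(metrics[("postgres", "SELECT")])
print(edges)
```

The point of the sketch is that both outputs are aggregates: once computed, the individual spans are no longer needed to answer "how is this service doing over time".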

What is Parquet?

  • Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval.
  • What does this mean?:
    • Tempo can store and access data more efficiently
    • So can you - large ecosystem of tools
    • No new infrastructure - just a new file format

Schema

TraceID
Duration
  Span #1
    Name
    ServiceName
    Tag #1
    Tag #2
    .
    Duration
  Span #2
    Name
    ServiceName
    Tag #1
    Event#1
    .
    Duration
    1. Encodings:
      • traceID into dictionary
      • duration into delta
      • tags into dictionary
      • events into snappy
    2. FindTraceByID
    3. Attribute search:
      • cluster="foo", namespace="bar"
      • It matches on the spans' tags
    4. Flexible schema:
      • easily add new columns (e.g. cluster, http.url)
      • This makes it easy to find trace data using custom columns.
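
The dictionary and delta encodings listed above are easier to picture with a toy version. This is a sketch of the general techniques only, not Parquet's actual on-disk encoding; the function names and sample values are made up for illustration.

```python
def dictionary_encode(values):
    """Replace repeated values with small integer codes into a dictionary."""
    dictionary = []
    index = {}
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    return [dictionary[c] for c in codes]


def delta_encode(values):
    """Store the first value, then only the difference to the previous value."""
    if not values:
        return []
    out = [values[0]]
    for prev, cur in zip(values, values[1:]):
        out.append(cur - prev)
    return out


def delta_decode(deltas):
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# Hypothetical column contents.
service_names = ["cart", "cart", "postgres", "cart"]
durations_ms = [100, 102, 105, 105]

dictionary, codes = dictionary_encode(service_names)
deltas = delta_encode(durations_ms)
print(dictionary, codes)  # ['cart', 'postgres'] [0, 0, 1, 0]
print(deltas)             # [100, 2, 3, 0]
```

Dictionary encoding pays off when a column has few distinct values (tags, service names); delta encoding pays off when neighbouring values are close together.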

Inside a block

  • Parquet:
    • Open file format - use existing tools
    • parquet-tools head data.parquet

TraceQL

Selecting Traces - Basics

{ duration > 2s }
{ name = "GET /:endpoint" }
{ .http.status = 200 }
{ span.http.url =~ "/api/v1/.*" }
{ resource.namespace = "prod" }
{ .http.url="/:endpoint" && .http.status = 200 }
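
The queries above use three kinds of attribute reference: unscoped (`.http.status`), span-scoped (`span.http.url`) and resource-scoped (`resource.namespace`), plus intrinsics such as `name` and `duration`. A rough Python sketch of how I understand the lookup, assuming a hypothetical dict-based span; checking the span before the resource for unscoped attributes is my assumption, and regex anchoring is not modelled.

```python
import re


def lookup(span, attr):
    """Resolve a TraceQL-style attribute reference against one span (toy model)."""
    if attr.startswith("span."):
        return span["span_attrs"].get(attr[len("span."):])
    if attr.startswith("resource."):
        return span["resource_attrs"].get(attr[len("resource."):])
    if attr.startswith("."):
        key = attr[1:]
        # Unscoped: look in both places (span first is an assumption).
        if key in span["span_attrs"]:
            return span["span_attrs"][key]
        return span["resource_attrs"].get(key)
    # Anything else is treated as an intrinsic such as name or duration.
    return span.get(attr)


# Hypothetical span; the field layout is made up for this sketch.
span = {
    "name": "GET /:endpoint",
    "duration": 2.5,  # seconds
    "span_attrs": {"http.status": 200, "http.url": "/api/v1/users"},
    "resource_attrs": {"namespace": "prod"},
}

matches = {
    "{ duration > 2s }": lookup(span, "duration") > 2,
    '{ name = "GET /:endpoint" }': lookup(span, "name") == "GET /:endpoint",
    "{ .http.status = 200 }": lookup(span, ".http.status") == 200,
    '{ span.http.url =~ "/api/v1/.*" }': re.search("/api/v1/.*", lookup(span, "span.http.url")) is not None,
    '{ resource.namespace = "prod" }': lookup(span, "resource.namespace") == "prod",
}
for query, ok in matches.items():
    print(query, ok)
```

The scoped forms are stricter: in this model `span.namespace` finds nothing, because `namespace` lives on the resource.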

TraceQL - Aggregates

{ .db.system = "postgres" } | count() > 3
{ name = "dns.lookup" } | avg(duration) > 500ms
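
As I understand the two aggregate queries above, the part in braces first selects the matching spans of each trace (a spanset), and the part after the pipe then keeps only the traces whose spanset passes the aggregate condition. A minimal sketch of that reading, with hypothetical dict-based spans and helper names:

```python
from collections import defaultdict


def select_traces(spans, predicate, aggregate, threshold):
    """Filter spans per trace, then keep traces whose aggregate exceeds the threshold."""
    spansets = defaultdict(list)
    for s in spans:
        if predicate(s):
            spansets[s["trace_id"]].append(s)
    return [tid for tid, matched in spansets.items() if aggregate(matched) > threshold]


def count(spanset):
    return len(spanset)


def avg_duration_ms(spanset):
    return sum(s["duration_ms"] for s in spanset) / len(spanset)


# Hypothetical sample data: two traces, t1 and t2.
spans = (
    [{"trace_id": "t1", "name": "query", "db.system": "postgres", "duration_ms": 5}] * 4
    + [{"trace_id": "t2", "name": "query", "db.system": "postgres", "duration_ms": 5}] * 2
    + [
        {"trace_id": "t1", "name": "dns.lookup", "duration_ms": 600},
        {"trace_id": "t1", "name": "dns.lookup", "duration_ms": 700},
        {"trace_id": "t2", "name": "dns.lookup", "duration_ms": 100},
        {"trace_id": "t2", "name": "dns.lookup", "duration_ms": 800},
    ]
)

# { .db.system = "postgres" } | count() > 3
many_queries = select_traces(spans, lambda s: s.get("db.system") == "postgres", count, 3)

# { name = "dns.lookup" } | avg(duration) > 500ms
slow_dns = select_traces(spans, lambda s: s["name"] == "dns.lookup", avg_duration_ms, 500)

print(many_queries)  # ['t1']
print(slow_dns)      # ['t1']
```

Note that t2 contains one 800 ms lookup but is not selected by the second query, because the condition is on the average over the spanset, not on a single span.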

TraceQL - Pipelines of Spansets

Selecting Traces - Structural

TraceQL - Structural

{ .service.name = "foo" } >> { .service.name = "bar" }
{ name = "tcp.connect" } ~ { name = "dns.lookup" }
{ .service.name != parent.service.name }
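
To check my reading of the three structural queries above: `>>` returns right-hand spans that have a left-hand match somewhere among their ancestors, `~` returns right-hand spans that share a parent with a left-hand match, and `parent.` refers to the direct parent span. A small sketch over one hypothetical trace (dict-based spans and helper names are mine):

```python
def ancestors(span, index):
    """Yield the parent, grandparent, ... of a span."""
    parent = index.get(span["parent"])
    while parent is not None:
        yield parent
        parent = index.get(parent["parent"])


def descendant(spans, left, right):
    """{left} >> {right}: right-matching spans with a left-matching ancestor."""
    index = {s["id"]: s for s in spans}
    return [s for s in spans if right(s) and any(left(a) for a in ancestors(s, index))]


def sibling(spans, left, right):
    """{left} ~ {right}: right-matching spans with a left-matching sibling."""
    return [
        s for s in spans
        if right(s) and s["parent"] is not None
        and any(left(o) for o in spans if o is not s and o["parent"] == s["parent"])
    ]


def crosses_service(spans):
    """{ .service.name != parent.service.name }: spans whose parent is in another service."""
    index = {s["id"]: s for s in spans}
    return [
        s for s in spans
        if s["parent"] in index and index[s["parent"]]["service"] != s["service"]
    ]


# Hypothetical single trace.
spans = [
    {"id": "1", "parent": None, "service": "foo", "name": "GET /"},
    {"id": "2", "parent": "1", "service": "foo", "name": "handler"},
    {"id": "3", "parent": "2", "service": "bar", "name": "tcp.connect"},
    {"id": "4", "parent": "2", "service": "bar", "name": "dns.lookup"},
]

foo_then_bar = descendant(spans, lambda s: s["service"] == "foo", lambda s: s["service"] == "bar")
dns_next_to_tcp = sibling(spans, lambda s: s["name"] == "tcp.connect", lambda s: s["name"] == "dns.lookup")
boundaries = crosses_service(spans)

print([s["id"] for s in foo_then_bar])     # ['3', '4']
print([s["id"] for s in dns_next_to_tcp])  # ['4']
print([s["id"] for s in boundaries])       # ['3', '4']
```

The third query is the interesting one for service graphs: it finds exactly the spans where a request crosses from one service into another.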

Personal Notes

  • How do we connect metrics to the trace graph? Is the trace ID alone enough?:
    • Prometheus is known to be troublesome to operate at scale because its architecture does not scale horizontally.
    • To work around this, many companies run Thanos and keep low-resolution data.
    • In that setup, trace data is a new kind of data that also has to be stored.
    • In my opinion, this needs to be examined from a storage perspective, especially retention and resolution, because the traces are linked to metrics that may be stored with short retention and low resolution.