Mastering-OpenTelemetry-And-Observability

created : Sat, 12 Apr 2025 22:55:34 +0900
modified : Fri, 18 Apr 2025 02:26:47 +0900

Mastering OpenTelemetry and Observability

Chapter 1. What Is Observability?

Definition

Cloud Native Era

Monitoring Compared to Observability

Types of Monitoring


Metadata

Dimensionality

Cardinality

Semantic Conventions

Data Sensitivity

Signals

Metrics

Logs

Traces

Other Signals

Collecting Signals

Instrumentation

Push Versus Pull Collection

Data Collection

Sampling Signals

Observability

Application Performance Monitoring

The Bottom Line

Chapter 2. Introducing OpenTelemetry

Background

Observability Pain Points

The Rise of Open Source Software

Specification

Data Collection

Instrumentation

OpenTelemetry Concepts

Distributions

Pipelines

Resources

Registry

Roadmap

The Bottom Line

Chapter 3. Getting Started with the Astronomy Shop

Chapter 4. Understanding the OpenTelemetry Specification

API Specification

API Definition

API Context

API Signals

API Implementation

SDK Specification

SDK Definition

SDK Signals

SDK Implementation

Data Specification

Data Models

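| Instrument              | Properties  | Type         | Default Aggregation |
|-------------------------|-------------|--------------|---------------------|
| Counter                 | Monotonic   | Synchronous  | Sum                 |
| UpDownCounter           | Additive    | Synchronous  | Sum                 |
| ObservableCounter       | Monotonic   | Asynchronous | Sum                 |
| ObservableUpDownCounter | Additive    | Asynchronous | Sum                 |
| Gauge                   | Nonadditive | Synchronous  | Last Value          |
| ObservableGauge         | Nonadditive | Asynchronous | Last Value          |
| Histogram               | Grouped     | Synchronous  | Histogram           |
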
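
The default aggregations in the instrument table can be illustrated with a small pure-Python sketch. This is not the OpenTelemetry SDK; the function names and sample measurements are hypothetical, and the histogram bucketing assumes upper-inclusive boundaries.

```python
from bisect import bisect_left


def aggregate_sum(measurements):
    """Sum aggregation: the default for Counter and UpDownCounter instruments."""
    return sum(measurements)


def aggregate_last_value(measurements):
    """Last Value aggregation: the default for Gauge; only the newest value is kept."""
    return measurements[-1]


def aggregate_histogram(measurements, boundaries):
    """Histogram aggregation: count measurements per bucket.

    Assumption: bucket i holds values v with boundaries[i-1] < v <= boundaries[i],
    and the final bucket holds everything above the last boundary.
    """
    counts = [0] * (len(boundaries) + 1)
    for value in measurements:
        counts[bisect_left(boundaries, value)] += 1
    return counts


# Hypothetical measurements for each kind of instrument.
requests = [1, 1, 1]                  # Counter: monotonic, only increases
queue_depth_changes = [3, -1, 2]      # UpDownCounter: additive, may decrease
temperatures = [20.5, 21.0, 19.8]     # Gauge: nonadditive, summing is meaningless
latencies_ms = [4, 12, 12, 250]       # Histogram: grouped into buckets

print(aggregate_sum(requests))
print(aggregate_sum(queue_depth_changes))
print(aggregate_last_value(temperatures))
print(aggregate_histogram(latencies_ms, [5, 10, 100]))
```

The split between Sum and Last Value follows the Properties column: additive measurements can be summed across time, while nonadditive ones only make sense as a current reading.
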
| Field Name           | Description                       | Notes                           |
|----------------------|-----------------------------------|---------------------------------|
| Timestamp            | When the event occurred           | Common syslog concepts          |
| ObservedTimestamp    | When the event was observed       |                                 |
| SeverityText         | Log level                         |                                 |
| SeverityNumber       | Numeric value of log level        |                                 |
| Body                 | The message of the log record     |                                 |
| Resource             | Source information                | OTel concept; metadata          |
| Attributes           | Additional information            |                                 |
| InstrumentationScope | Scope that emitted the log record |                                 |
| TraceId              | Request trace ID                  | Used to enable trace correlation |
| SpanId               | Request span ID                   |                                 |
| TraceFlags           | W3C trace flags                   |                                 |
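
The log record fields above can be pictured as a simple data structure. The sketch below is an illustration only: the Python field types, the defaults, the `is_correlated` helper, and the sample values (service name, scope name, IDs) are assumptions, not the SDK's actual classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class LogRecord:
    # Common syslog-style concepts
    timestamp: int                 # when the event occurred (assumed: ns since epoch)
    observed_timestamp: int        # when the event was observed
    severity_text: str             # log level, e.g. "ERROR"
    severity_number: int           # numeric value of the log level
    body: Any                      # the message of the log record
    # OTel metadata
    resource: Dict[str, Any] = field(default_factory=dict)    # source information
    attributes: Dict[str, Any] = field(default_factory=dict)  # additional information
    instrumentation_scope: str = ""                            # scope that emitted the record
    # Trace correlation
    trace_id: Optional[str] = None
    span_id: Optional[str] = None
    trace_flags: int = 0           # W3C trace flags

    def is_correlated(self) -> bool:
        """Hypothetical helper: True when the record can be linked to a span."""
        return self.trace_id is not None and self.span_id is not None


record = LogRecord(
    timestamp=1_744_466_134_000_000_000,
    observed_timestamp=1_744_466_134_500_000_000,
    severity_text="ERROR",
    severity_number=17,
    body="payment declined",
    resource={"service.name": "checkout"},          # hypothetical service
    attributes={"order.id": "A-1001"},              # hypothetical attribute
    instrumentation_scope="example.logging",        # hypothetical scope name
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
    trace_flags=1,
)
print(record.is_correlated())
```

Because TraceId and SpanId travel with the record, a backend can jump from a log line to the span that was active when it was emitted.
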

Data Protocols

Data Semantic Conventions

Data Compatibility

General Specification

The Bottom Line

Chapter 5. Managing the OpenTelemetry Collector

Deployment Modes

Agent Mode

Gateway Mode

Flow: Instrumentation to observability platform
Pros:
- Quickest time to value; simplicity.
- Lowest latency.
Cons:
- Less data processing flexibility and requires language-specific components, such as resource detection and configuration.
- Operational complexity as each language and possibly each application needs to be independently configured.
- Added resource requirements to handle processing, and buffer and retry logic.
- Decentralized security controls.

Flow: Instrumentation to agent to observability platform
Pros:
- Quick time to value, especially given that instrumentation sends data to a local OTLP destination by default.
- Separates telemetry generation from transmission, reducing application load.
- Enhanced data processing capabilities and dynamic configuration without redeploying applications.
Cons:
- Agent is a single point of failure and must be sized and monitored properly.

Flow: Instrumentation to gateway to observability platform
Pros:
- When deployed as a cluster, separates telemetry generation from transmission without a single point of failure.
- Supports advanced data processing capabilities, including metric aggregation and tail-based sampling.
- Useful in certain environments, such as serverless, where an agent deployment may not be possible.
Cons:
- Cannot offload all application processing capabilities, including resource detection.
- Requires thought when configuring pull-based receivers to ensure proper load balancing and no data duplication.
- May introduce unacceptable latency, impacting applications.

Flow: Instrumentation to agent to gateway to observability platform
Pros:
- The pros of agent and gateway mode.
- Supports the most use cases and requirements while providing the most data flexibility and portability.
Cons:
- Complex configuration and highest management costs.
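
As a rough sketch of the agent-to-gateway flow, the snippet below builds an agent-mode Collector configuration as a Python dictionary and prints it as JSON for inspection. The Collector itself is normally configured with YAML. The gateway hostname is hypothetical, and the pipeline is deliberately minimal (`otlp` receiver, `batch` processor, `otlp` exporter).

```python
import json

# Hypothetical address of the gateway Collector cluster.
GATEWAY_ENDPOINT = "otel-gateway.example.internal:4317"

agent_config = {
    # Applications send OTLP to the local agent over gRPC or HTTP.
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    # Batch telemetry before forwarding it.
    "processors": {"batch": {}},
    # The agent forwards everything to the gateway rather than to the platform directly.
    "exporters": {"otlp": {"endpoint": GATEWAY_ENDPOINT}},
    "service": {
        "pipelines": {
            signal: {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["otlp"],
            }
            for signal in ("traces", "metrics", "logs")
        }
    },
}

print(json.dumps(agent_config, indent=2))
```

In this layout the application only needs to know about the local agent, while decisions such as tail-based sampling or metric aggregation can live in the gateway tier.
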

Sizing

Components

Configuration

| Category                                      | Examples                                                                                              |
|-----------------------------------------------|-------------------------------------------------------------------------------------------------------|
| Metadata processing                           | k8sattributesprocessor, resourceprocessor                                                             |
| Filtering, routing, and sampling              | filterprocessor, routingprocessor (deprecated in favor of the routing connector), tailsamplingprocessor |
| Enriching                                     | k8sattributesprocessor, resourcedetectionprocessor                                                    |
| Generating (primarily metrics)                | metricsgenerationprocessor, spanmetricsprocessor                                                      |
| Grouping (helpful in batching and processing) | groupbyattrprocessor, groupbytraceprocessor (valid for tail-based sampling)                           |
| Transforming (primarily metrics)              | cumulativetodeltaprocessor, deltatorateprocessor, schemaprocessor                                     |
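
To make the "Transforming" category concrete, here is a minimal sketch of the idea behind a cumulative-to-delta conversion. It is not the cumulativetodeltaprocessor's code: the function name is hypothetical, and two behaviors are assumptions of this sketch (the first point is dropped because there is nothing to diff against, and a decrease is treated as a counter reset).

```python
def cumulative_to_delta(points):
    """Convert a series of cumulative counter values into per-interval deltas."""
    deltas = []
    previous = None
    for value in points:
        if previous is not None:
            if value >= previous:
                deltas.append(value - previous)
            else:
                # Assumption: a lower value means the counter restarted from zero,
                # so the whole new value counts as the delta.
                deltas.append(value)
        previous = value
    return deltas


# Hypothetical cumulative request counts; the drop from 22 to 3 simulates a restart.
cumulative = [10, 15, 15, 22, 3, 8]
print(cumulative_to_delta(cumulative))
```

Delta values are often easier for a backend to aggregate across many sources, which is one reason this kind of transformation may be applied in the Collector rather than in each application.
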

Extensions

| Category                                                               | Examples                                                                |
|------------------------------------------------------------------------|-------------------------------------------------------------------------|
| Authentication (used by receivers and exporters)                       | basicauthextension, bearertokenauthextension, oidcauthextension         |
| Health and troubleshooting                                             | healthcheckextension, pprofextension, remotetapextension, zpagesextension |
| Observers (used by receivers to discover and collect data dynamically) | dockerobserver, hostobserver, k8sobserver                               |
| Persistence (via a database or filesystem)                             | storage/dbstorage, storage/filestorage                                  |

Connectors

Observing

Relevant Metrics

Troubleshooting

Out of Memory Crashes

Data Not Being Received or Exported

Performance Issues

Beyond the Basics

Distributions

The Bottom Line

Chapter 6. Leveraging OpenTelemetry Instrumentation

Distributions

The Bottom Line

Chapter 7. Adopting OpenTelemetry

The Basics

Why OTel and Why Now?

Instrumentation

Production Readiness

Maturity Framework

Brownfield Deployment

Data Collection