#OTel Ops Playbook

Version: 0.33.0 | Updated: 2026-03-15 | Applies to: ranvier-inspector | Category: Guides


#1. Purpose

This playbook bridges the gap between technical OTel integration (M149, validated smoke tests) and operational deployment in real organizations. It covers credential mapping, policy enforcement, vendor configuration, environment strategy, security hardening, multi-tenant isolation, and troubleshooting.

Prerequisite: You have completed the basic OTel interop validation described in otel_interop_matrix.md. This playbook assumes your Ranvier application already emits spans via ranvier-inspector.


#2. Credential Mapping

#2.1 Pattern Overview

Service Account  →  OTLP Auth Header  →  Collector  →  Backend

Ranvier exports traces via OTLP (gRPC or HTTP/protobuf). Authentication is set at the Collector exporter level, not in the Ranvier application code. This decouples secret management from application deployments.

#2.2 API Key / Bearer Token Mapping

Preferred pattern: Environment variable injection

# otel-collector.yaml (exporter section)
exporters:
  otlp/backend:
    endpoint: "https://otlp.backend.example.com:4317"
    headers:
      Authorization: "${BACKEND_OTLP_TOKEN}"   # injected at runtime; the value must carry any scheme prefix (e.g. "Bearer ") the backend expects
    tls:
      insecure: false

Inject the token via:

  • Kubernetes Secret → mounted as env var in the Collector deployment
  • Docker Compose → env_file: or secrets: block
  • CI/CD → injected into the Collector container at deploy time

Never embed tokens in config files committed to source control.
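
As a sketch of the Kubernetes option, the Deployment fragment below reads the token from a Secret and exposes it as the BACKEND_OTLP_TOKEN variable that the exporter config expands. The Secret name (otel-backend-credentials), its key (otlp-token), and the labels are hypothetical; adapt them to your cluster.

```yaml
# Collector Deployment fragment (names are illustrative assumptions)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.96.0
          env:
            - name: BACKEND_OTLP_TOKEN             # expanded by ${BACKEND_OTLP_TOKEN} in the exporter
              valueFrom:
                secretKeyRef:
                  name: otel-backend-credentials   # hypothetical Secret name
                  key: otlp-token                  # hypothetical key inside the Secret
```

Because the token lives only in the Secret, rotating it means updating the Secret and restarting the Collector pods; the committed config file does not change.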

#2.3 Per-Environment Account Separation

| Environment | Service Account | Token Source |
|---|---|---|
| dev | svc-ranvier-dev | .env.local (not committed) |
| staging | svc-ranvier-staging | CI secret OTEL_STAGING_TOKEN |
| prod | svc-ranvier-prod | Vault / KMS / cloud secret manager |

#2.4 Multi-Backend Credential Isolation

If sending to multiple backends (e.g., Datadog for prod, Jaeger for staging debug):

exporters:
  otlphttp/datadog:
    endpoint: "https://trace.agent.datadoghq.com"
    headers:
      DD-API-KEY: "${DATADOG_API_KEY}"
  otlp/jaeger:
    endpoint: "jaeger:4317"
    tls:
      insecure: true

Use separate pipelines per backend to avoid credential leakage.
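
One way to express that separation, assuming an otlp receiver and a batch processor are defined elsewhere in the same file, is to give each exporter its own named traces pipeline. The pipeline names (traces/prod, traces/staging_debug) are placeholders.

```yaml
service:
  pipelines:
    traces/prod:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/datadog]   # DD-API-KEY is attached only by this exporter
    traces/staging_debug:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]        # no credentials configured on this path
```

In this sketch both pipelines receive the same spans from the shared receiver. If each backend should see only part of the traffic, add a filter processor to the relevant pipeline or run a separate Collector per environment.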


#3. Redaction Policy Selection

Ranvier's ranvier-inspector extension supports three redaction modes, enforced at the OTLP exporter level:

| Mode | Description | Use Case |
|---|---|---|
| public | Strip all user-identifiable fields | Public-facing export (CDN logs, 3rd-party SaaS) |
| internal | Retain service/trace metadata, strip PII | Internal observability platforms |
| strict | Retain everything (raw spans) | Local dev, Jaeger smoke testing |

#3.1 Selecting the Mode

Enforce the mode on the Collector with an attributes processor (the example below implements public):

processors:
  attributes/redact_public:
    actions:
      - key: user.id
        action: delete
      - key: user.email
        action: delete
      - key: http.request.header.authorization
        action: delete

Or via Ranvier's inspector extension:

// In your ranvier-inspector setup
Inspector::builder()
    .redaction_mode(RedactionMode::Public)   // or Internal, Strict

#3.2 Enforcement in Production

Apply redaction at two boundaries for defense-in-depth:

  1. Application level (ranvier-inspector): strip at span creation
  2. Collector level (attribute processor): final gate before export

This prevents accidental leakage if the application-level filter is misconfigured.
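
A minimal sketch of the Collector-side gate, assuming the attributes/redact_public processor from section 3.1, memory_limiter and batch processors, an otlp receiver, and the otlp/backend exporter are all defined in the same config. A processor has no effect until it is listed in a pipeline, so the wiring is the part that is easy to miss.

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      # Processors run in the listed order; redaction sits before batching
      # so every span leaving this pipeline has passed the filter.
      processors: [memory_limiter, attributes/redact_public, batch]
      exporters: [otlp/backend]
```

If several pipelines export to external backends, the redaction processor needs to appear in each of them.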

#3.3 Validated Redaction Path

Ranvier → OTel Collector (redaction processor) → OTLP exporter

Evidence: docs/03_guides/otel_collector_otlp_redaction_smoke.md


#4. Environment-Based Configuration

#4.1 Configuration Strategy

dev       → stdout/console exporter (no network dependency)
staging   → OTel Collector → Jaeger (local Docker)
prod      → OTel Collector → Backend SaaS (Datadog / New Relic)

#4.2 Dev (Local Console)

# config/otel.dev.toml
[otel]
exporter = "console"
sampler  = "always_on"
service_name = "${SERVICE_NAME:-my-service}-dev"

No Collector needed. Spans appear in stdout during development.

#4.3 Staging (Jaeger via Docker Compose)

# docker-compose.staging.yml
services:
  jaeger:
    image: jaegertracing/all-in-one:1.55
    ports:
      - "16686:16686"  # UI
      - "4317:4317"    # OTLP gRPC
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.96.0
    volumes:
      # The contrib image loads /etc/otelcol-contrib/config.yaml by default
      - ./docs/03_guides/otel_collector_jaeger_config.yaml:/etc/otelcol-contrib/config.yaml
    depends_on: [jaeger]

Reference config: docs/03_guides/otel_collector_jaeger_config.yaml

#4.4 Production (SaaS Backend)

Use the vendor-specific configs in docs/03_guides/otel_vendor_configs/.

Key principles for production:

  • Always use TLS (insecure: false)
  • Set retry_on_failure and sending_queue for resilience
  • Use sampler = "parentbased_traceidratio" with an appropriate ratio (e.g., 0.1 for high-traffic services)

# Recommended production sampler
sampler:
  type: parentbased_traceidratio
  argument: "0.1"   # sample 10% of root spans
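
The resilience settings named in the principles above might look like the following on the otlp/backend exporter. The keys are standard Collector exporter options; the numeric values are illustrative starting points, not tuned recommendations.

```yaml
exporters:
  otlp/backend:
    endpoint: "https://otlp.backend.example.com:4317"
    tls:
      insecure: false
    retry_on_failure:
      enabled: true
      initial_interval: 5s     # wait before the first retry
      max_interval: 30s        # upper bound for backoff between retries
      max_elapsed_time: 300s   # give up on a batch after this long
    sending_queue:
      enabled: true
      num_consumers: 10        # parallel senders draining the queue
      queue_size: 1000         # batches buffered while the backend is unavailable
```

The queue is held in memory in this sketch, so buffered spans are lost if the Collector restarts during a backend outage.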

#5. Vendor-Specific Quick Reference

| Vendor | Endpoint | Auth | Config File |
|---|---|---|---|
| Datadog | https://trace.agent.datadoghq.com | DD-API-KEY header | otel_collector_datadog_class_backend_config.yaml |
| New Relic | https://otlp.nr-data.net:4317 | api-key header | otel_vendor_configs/new_relic.yaml |
| Honeycomb | https://api.honeycomb.io:443 | x-honeycomb-team header | otel_vendor_configs/honeycomb.yaml |
| Jaeger | jaeger:4317 (gRPC, local) | none (internal) | otel_collector_jaeger_config.yaml |
| Tempo | tempo:4317 (gRPC, local) | none (internal) | otel_collector_tempo_config.yaml |

See docs/03_guides/otel_vendor_configs/README.md for quick-start instructions.


#6. Security Hardening

#6.1 Token Rotation

  1. Rotate backend API keys on a schedule (30–90 days)
  2. Use short-lived tokens where supported (OIDC workload identity)
  3. Store secrets in Vault / AWS Secrets Manager / GCP Secret Manager
  4. Never log the OTLP auth header; configure Collector log level to warn
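
The log-level guidance in item 4 maps to the Collector's own telemetry settings. A sketch, to be merged into the existing service section of the Collector config:

```yaml
service:
  telemetry:
    logs:
      level: warn   # suppresses info/debug output from the Collector itself
```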

#6.2 TLS Configuration

All production OTLP connections must use TLS:

exporters:
  otlp/backend:
    tls:
      insecure: false
      ca_file: /etc/ssl/certs/ca-certificates.crt   # or vendor CA bundle

For self-hosted backends (Jaeger/Tempo) in staging, mTLS is recommended:

tls:
  cert_file: /certs/client.crt
  key_file: /certs/client.key
  ca_file: /certs/ca.crt

#6.3 Network Policy (Kubernetes)

Restrict egress from the application pod to only the Collector:

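# NetworkPolicy: allow app → collector only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-otel-collector
spec:
  podSelector:
    matchLabels:
      app: ranvier-service
  policyTypes:
    - Egress   # if omitted, Ingress is implied too and inbound traffic to the app is blocked
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: otel-collector
      ports:
        - protocol: TCP
          port: 4317
    # Assumption: the app resolves the Collector by DNS name, so lookups must be allowed
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53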

#7. Multi-Tenant Observability Isolation

When multiple tenants use the same Ranvier deployment:

#7.1 Attribute-Based Isolation

Tag every span with a tenant.id attribute:

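// In your Ranvier transition
// Assumption: spans are bridged to OTel via the tracing-opentelemetry crate,
// whose extension trait provides set_attribute on tracing::Span.
use tracing_opentelemetry::OpenTelemetrySpanExt;

#[transition]
async fn handle(_: (), _: &(), bus: &mut Bus) -> Outcome<String, Error> {
    // Falls back to an empty tenant id when no TenantContext is on the bus
    let tenant = bus.read::<TenantContext>().map(|t| t.id.clone()).unwrap_or_default();
    tracing::Span::current().set_attribute("tenant.id", tenant);
    Outcome::Next("ok".to_string())
}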

#7.2 Collector-Level Routing

Route spans to different backends based on tenant.id:

processors:
  filter/tenant_a:
    spans:
      include:
        match_type: strict
        attributes:
          - key: tenant.id
            value: "tenant-a"

exporters:
  otlp/tenant_a:
    endpoint: "https://tenant-a.observability.example.com:4317"
    headers:
      Authorization: "${TENANT_A_TOKEN}"
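
The filter and exporter above only take effect once they share a pipeline. A sketch of that wiring, assuming an otlp receiver and a batch processor exist in the same config and using a hypothetical pipeline name:

```yaml
service:
  pipelines:
    traces/tenant_a:
      receivers: [otlp]
      processors: [filter/tenant_a, batch]   # keep only tenant-a spans, then batch
      exporters: [otlp/tenant_a]
```

Each further tenant would get its own filter, exporter, and pipeline in the same shape, so one tenant's token is never attached to another tenant's spans.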

#7.3 Dataset Isolation (Honeycomb / New Relic)

Both Honeycomb and New Relic support dataset/team-level isolation natively via the API key. Use per-tenant API keys and inject via the Collector routing pattern above.


#8. Troubleshooting Runbook

#8.1 No Spans Appearing

| Symptom | Likely Cause | Action |
|---|---|---|
| No output in Jaeger UI | Collector not running | docker ps / podman ps, then start the Collector |
| Collector running, no spans | OTLP endpoint misconfigured | Check OTEL_EXPORTER_OTLP_ENDPOINT env var |
| Spans visible in Collector log but not backend | Backend auth failure | Check API key, test with curl |
| Sampler dropping all spans | traceidratio too low | Set to 1.0 for debugging |

# Quick connectivity test (gRPC)
grpcurl -plaintext localhost:4317 list

# Quick connectivity test (HTTP/protobuf)
curl -I http://localhost:4318/v1/traces

#8.2 Redaction Not Working

  • Verify ranvier-inspector version ≥ 0.26.0
  • Check that both application-level AND Collector-level redaction are configured
  • Enable Collector debug logs temporarily: service.telemetry.logs.level: debug
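
To see whether an attribute survives the Collector, one option is to temporarily route traces to the debug exporter and inspect the printed span attributes. This sketch assumes a Collector release that ships the debug exporter (older releases call it logging) and reuses the processor name from section 3.1; the receiver and pipeline layout are assumptions.

```yaml
exporters:
  debug:
    verbosity: detailed   # prints every span with its attributes to the Collector log

service:
  telemetry:
    logs:
      level: debug        # temporary; revert to warn afterwards
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes/redact_public]
      exporters: [debug]
```

Run this only against test traffic, since a failed redaction would print the sensitive values to the log.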

#8.3 High Cardinality / Sampling

For high-traffic services (>1000 RPS), use parent-based ratio sampling:

OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.05   # 5%

#8.4 Memory / CPU Spike in Collector

  • Enable memory_limiter processor (always first in pipeline)
  • Use batch processor with send_batch_size: 1024 and timeout: 10s
  • Scale Collector horizontally behind a load balancer

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 75
    spike_limit_percentage: 20
  batch:
    send_batch_size: 1024
    timeout: 10s

#8.5 Clock Skew / Span Order Issues

If spans appear out-of-order in Jaeger/Tempo:

  • Ensure all services use NTP synchronization
  • Confirm the SDK is enabled (OTEL_SDK_DISABLED unset or false) and verify the clock source in containers

#9. Related Documents

| Document | Purpose |
|---|---|
| `otel_interop_matrix.md` | Validated interop paths (M149) |
| `otel_collector_smoke_baseline.md` | Debug exporter smoke test |
| `otel_collector_jaeger_smoke.md` | Jaeger smoke test |
| `otel_collector_tempo_smoke.md` | Tempo smoke test |
| `otel_collector_datadog_class_smoke.md` | Datadog-class relay smoke test |
| `otel_collector_otlp_redaction_smoke.md` | Redaction adapter smoke test |
| `otel_vendor_configs/README.md` | Vendor config catalog |
| `ranvier/examples/otel-ops-demo/` | Policy enforcement demo |