#OTel Ops Playbook
Version: 0.33.0 Updated: 2026-03-15 Applies to: ranvier-inspector Category: Guides
#1. Purpose
This playbook bridges the gap between technical OTel integration (M149, validated smoke tests) and operational deployment in real organizations. It covers credential mapping, policy enforcement, vendor configuration, environment strategy, security hardening, multi-tenant isolation, and troubleshooting.
Prerequisite: You have completed the basic OTel interop validation described in `otel_interop_matrix.md`. This playbook assumes your Ranvier application already emits spans via `ranvier-inspector`.
#2. Credential Mapping
#2.1 Pattern Overview
```
Service Account → OTLP Auth Header → Collector → Backend
```

Ranvier exports traces via OTLP (gRPC or HTTP/protobuf). Authentication is set at the Collector exporter level, not in the Ranvier application code. This decouples secret management from application deployments.
#2.2 API Key / Bearer Token Mapping
Preferred pattern: Environment variable injection
```yaml
# otel-collector.yaml: exporter section
exporters:
  otlp/backend:
    endpoint: "https://otlp.backend.example.com:4317"
    headers:
      Authorization: "${BACKEND_OTLP_TOKEN}" # injected at runtime
    tls:
      insecure: false
```

Inject the token via:
- Kubernetes Secret: mounted as env var in the Collector deployment
- Docker Compose: `env_file:` or `secrets:` block
- CI/CD: injected into the Collector container at deploy time
Never embed tokens in config files committed to source control.
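For the Kubernetes option above, the injection could look like the following sketch. The Secret name `otel-backend-credentials` and its `token` key are hypothetical, and the Deployment layout is an assumption rather than a validated Ranvier manifest; only the `BACKEND_OTLP_TOKEN` variable name comes from the exporter config in this section.

```yaml
# Sketch only: Secret name and key are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.96.0
          env:
            # Exposes the Secret value under the name the exporter config expands.
            - name: BACKEND_OTLP_TOKEN
              valueFrom:
                secretKeyRef:
                  name: otel-backend-credentials # hypothetical Secret
                  key: token                     # hypothetical key
```

Because the token lives only in the Secret, rotating it means updating the Secret and restarting the Collector pods; the committed config file never changes.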
#2.3 Per-Environment Account Separation
| Environment | Service Account | Token Source |
|---|---|---|
| dev | `svc-ranvier-dev` | `.env.local` (not committed) |
| staging | `svc-ranvier-staging` | CI secret `OTEL_STAGING_TOKEN` |
| prod | `svc-ranvier-prod` | Vault / KMS / cloud secret manager |
#2.4 Multi-Backend Credential Isolation
If sending to multiple backends (e.g., Datadog for prod, Jaeger for staging debug):
```yaml
exporters:
  otlphttp/datadog:
    endpoint: "https://trace.agent.datadoghq.com"
    headers:
      DD-API-KEY: "${DATADOG_API_KEY}"
  otlp/jaeger:
    endpoint: "jaeger:4317"
    tls:
      insecure: true
```

Use separate pipelines per backend to avoid credential leakage.
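A minimal sketch of what separate pipelines per backend might look like, assuming a single `otlp` receiver and a default `batch` processor (neither is specified in this playbook). The pipeline names `traces/datadog` and `traces/jaeger` are illustrative.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}

service:
  pipelines:
    # Each pipeline references exactly one exporter, so each set of
    # credentials is used only by the pipeline that needs it.
    traces/datadog:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/datadog]
    traces/jaeger:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
```

As written, both pipelines receive every span from the shared receiver. If prod and staging traffic must not mix, run one Collector per environment or add a filtering step in front of each exporter.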
#3. Redaction Policy Selection
Ranvier's ranvier-inspector extension supports three redaction modes, enforced at the OTLP exporter level:
| Mode | Description | Use Case |
|---|---|---|
| `public` | Strip all user-identifiable fields | Public-facing export (CDN logs, 3rd-party SaaS) |
| `internal` | Retain service/trace metadata, strip PII | Internal observability platforms |
| `strict` | Retain everything (raw spans) | Local dev, Jaeger smoke testing |
#3.1 Selecting the Mode
Enforce the mode on the Collector with an attributes processor. The example below implements `public` mode by deleting user-identifiable keys:

```yaml
processors:
  attributes/redact_public:
    actions:
      - key: user.id
        action: delete
      - key: user.email
        action: delete
      - key: http.request.header.authorization
        action: delete
```

Or via Ranvier's inspector extension:
```rust
// In your ranvier-inspector setup
Inspector::builder()
    .redaction_mode(RedactionMode::Public) // or Internal, Strict
```

#3.2 Enforcement in Production
Apply redaction at two boundaries for defense-in-depth:
- Application level (`ranvier-inspector`): strip at span creation
- Collector level (attribute processor): final gate before export
This prevents accidental leakage if the application-level filter is misconfigured.
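The Collector-level gate only takes effect once the processor is listed in a pipeline. A minimal sketch, assuming an `otlp` receiver and a `batch` processor are defined elsewhere in the same config, and reusing the `attributes/redact_public` and `otlp/backend` names from earlier sections:

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      # Redaction runs before batching, so every span handed to the
      # exporter has already passed through the delete actions.
      processors: [attributes/redact_public, batch]
      exporters: [otlp/backend]
```

A processor that is defined but not referenced in any pipeline is never applied, which is a common reason for redaction appearing to "not work".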
#3.3 Validated Redaction Path
```
Ranvier → OTel Collector (redaction processor) → OTLP exporter
```

Evidence: `docs/03_guides/otel_collector_otlp_redaction_smoke.md`
#4. Environment-Based Configuration
#4.1 Configuration Strategy
```
dev     → stdout/console exporter (no network dependency)
staging → OTel Collector → Jaeger (local Docker)
prod    → OTel Collector → Backend SaaS (Datadog / New Relic)
```

#4.2 Dev (Local Console)
```toml
# config/otel.dev.toml
[otel]
exporter = "console"
sampler = "always_on"
service_name = "${SERVICE_NAME:-my-service}-dev"
```

No Collector needed. Spans appear in stdout during development.
#4.3 Staging (Jaeger via Docker Compose)
```yaml
# docker-compose.staging.yml
services:
  jaeger:
    image: jaegertracing/all-in-one:1.55
    ports:
      - "16686:16686" # UI
      - "4317:4317" # OTLP gRPC
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.96.0
    volumes:
      # The contrib image loads /etc/otelcol-contrib/config.yaml by default;
      # mounting elsewhere requires an explicit --config argument.
      - ./docs/03_guides/otel_collector_jaeger_config.yaml:/etc/otelcol-contrib/config.yaml
    depends_on: [jaeger]
```

Reference config: `docs/03_guides/otel_collector_jaeger_config.yaml`
#4.4 Production (SaaS Backend)
Use the vendor-specific configs in docs/03_guides/otel_vendor_configs/.
Key principles for production:
- Always use TLS (`insecure: false`)
- Set `retry_on_failure` and `sending_queue` for resilience
- Use `sampler = "parentbased_traceidratio"` with appropriate ratio (e.g., 0.1 for high-traffic services)
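The resilience settings named in the list above could be sketched as follows on the `otlp/backend` exporter from section 2.2. The numeric values are illustrative starting points, not tuned recommendations from the Ranvier project; adjust them to your traffic and your backend's rate limits.

```yaml
exporters:
  otlp/backend:
    endpoint: "https://otlp.backend.example.com:4317"
    tls:
      insecure: false
    # Retry failed exports with exponential backoff.
    retry_on_failure:
      enabled: true
      initial_interval: 5s   # wait before the first retry
      max_interval: 30s      # upper bound on the backoff
      max_elapsed_time: 300s # give up on a batch after this long
    # Buffer batches in memory while the backend is slow or unreachable.
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 1000
```

The queue is held in memory, so a larger `queue_size` trades Collector memory for tolerance of longer backend outages.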
```yaml
# Recommended production sampler
sampler:
  type: parentbased_traceidratio
  argument: "0.1" # sample 10% of root spans
```

#5. Vendor-Specific Quick Reference
| Vendor | Endpoint | Auth | Config File |
|---|---|---|---|
| Datadog | `https://trace.agent.datadoghq.com` | `DD-API-KEY` header | `otel_collector_datadog_class_backend_config.yaml` |
| New Relic | `https://otlp.nr-data.net:4317` | `api-key` header | `otel_vendor_configs/new_relic.yaml` |
| Honeycomb | `https://api.honeycomb.io:443` | `x-honeycomb-team` header | `otel_vendor_configs/honeycomb.yaml` |
| Jaeger | `jaeger:4317` (gRPC, local) | none (internal) | `otel_collector_jaeger_config.yaml` |
| Tempo | `tempo:4317` (gRPC, local) | none (internal) | `otel_collector_tempo_config.yaml` |
See docs/03_guides/otel_vendor_configs/README.md for quick-start instructions.
#6. Security Hardening
#6.1 Token Rotation
- Rotate backend API keys on a schedule (every 30 to 90 days)
- Use short-lived tokens where supported (OIDC workload identity)
- Store secrets in Vault / AWS Secrets Manager / GCP Secret Manager
- Never log the OTLP auth header; set the Collector log level to `warn`
#6.2 TLS Configuration
All production OTLP connections must use TLS:
```yaml
exporters:
  otlp/backend:
    tls:
      insecure: false
      ca_file: /etc/ssl/certs/ca-certificates.crt # or vendor CA bundle
```

For self-hosted backends (Jaeger/Tempo) in staging, mTLS is recommended:
```yaml
tls:
  cert_file: /certs/client.crt
  key_file: /certs/client.key
  ca_file: /certs/ca.crt
```

#6.3 Network Policy (Kubernetes)
Restrict egress from the application pod to only the Collector:
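```yaml
# NetworkPolicy: allow app → collector only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-otel-collector
spec:
  podSelector:
    matchLabels:
      app: ranvier-service
  # Restrict egress only. Without policyTypes, Kubernetes also applies the
  # policy to ingress and, with no ingress rules, blocks inbound traffic.
  policyTypes:
    - Egress
  # Note: this also blocks DNS egress; add a rule for the cluster DNS
  # service if the app resolves the Collector by service name.
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: otel-collector
      ports:
        - protocol: TCP
          port: 4317
```

#7. Multi-Tenant Observability Isolation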
When multiple tenants use the same Ranvier deployment:
#7.1 Attribute-Based Isolation
Tag every span with a tenant.id attribute:
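```rust
// In your Ranvier transition
// Assumption: tracing-opentelemetry is the tracing bridge in use; its
// OpenTelemetrySpanExt trait provides `set_attribute` on `tracing::Span`.
use tracing_opentelemetry::OpenTelemetrySpanExt;

#[transition]
async fn handle(_: (), _: &(), bus: &mut Bus) -> Outcome<String, Error> {
    // Fall back to an empty tenant ID when no TenantContext is on the bus.
    let tenant = bus.read::<TenantContext>().map(|t| t.id.clone()).unwrap_or_default();
    tracing::Span::current().set_attribute("tenant.id", tenant);
    Outcome::Next("ok".to_string())
}
```

#7.2 Collector-Level Routing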
Route spans to different backends based on tenant.id:
```yaml
processors:
  filter/tenant_a:
    spans:
      include:
        match_type: strict
        attributes:
          - key: tenant.id
            value: "tenant-a"
exporters:
  otlp/tenant_a:
    endpoint: "https://tenant-a.observability.example.com:4317"
    headers:
      Authorization: "${TENANT_A_TOKEN}"
```

#7.3 Dataset Isolation (Honeycomb / New Relic)
Both Honeycomb and New Relic support dataset/team-level isolation natively via the API key. Use per-tenant API keys and inject via the Collector routing pattern above.
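To tie the routing pattern together, each tenant's filter and exporter would sit in their own pipeline. A minimal sketch, assuming an `otlp` receiver and a `batch` processor are defined elsewhere; the pipeline name `traces/tenant_a` is illustrative, while `filter/tenant_a` and `otlp/tenant_a` are the components from section 7.2.

```yaml
service:
  pipelines:
    # One pipeline per tenant: the filter keeps only that tenant's spans,
    # and the exporter carries only that tenant's API key.
    traces/tenant_a:
      receivers: [otlp]
      processors: [filter/tenant_a, batch]
      exporters: [otlp/tenant_a]
    # Additional tenants would repeat this shape with their own
    # filter/<tenant> processor and otlp/<tenant> exporter.
```

Spans without a matching `tenant.id` are dropped by every tenant pipeline in this layout, so consider whether a catch-all pipeline is needed for untagged traffic.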
#8. Troubleshooting Runbook
#8.1 No Spans Appearing
| Symptom | Likely Cause | Action |
|---|---|---|
| No output in Jaeger UI | Collector not running | Run `docker ps` / `podman ps`, then start the Collector |
| Collector running, no spans | OTLP endpoint misconfigured | Check `OTEL_EXPORTER_OTLP_ENDPOINT` env var |
| Spans visible in Collector log but not backend | Backend auth failure | Check API key, test with curl |
| Sampler dropping all spans | `traceidratio` too low | Set to 1.0 for debugging |
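```shell
# Quick connectivity test (gRPC)
grpcurl -plaintext localhost:4317 list
# Quick connectivity test (HTTP/protobuf)
# -I sends a HEAD request; any HTTP response, even an error status,
# shows that the port is reachable.
curl -I http://localhost:4318/v1/traces
```

#8.2 Redaction Not Working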
- Verify `ranvier-inspector` version ≥ 0.26.0
- Check that both application-level AND Collector-level redaction are configured
- Enable Collector debug logs temporarily: `service.telemetry.logs.level: debug`
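The dotted path in the last step expands to the following nesting in the Collector config file. This fragment would be merged into the existing `service:` block rather than added as a second one.

```yaml
service:
  telemetry:
    logs:
      # Temporary: revert to warn after troubleshooting (see section 6.1).
      level: debug
```

Debug output can be verbose, so keep the window short and switch back to `warn` as soon as the redaction path is confirmed.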
#8.3 High Cardinality / Sampling
For high-traffic services (>1000 RPS), use parent-based ratio sampling:
```shell
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.05 # 5%
```

#8.4 Memory / CPU Spike in Collector
- Enable `memory_limiter` processor (always first in pipeline)
- Use `batch` processor with `send_batch_size: 1024` and `timeout: 10s`
- Scale Collector horizontally behind a load balancer
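Processor order is set by the list in the pipeline definition, not by the order of entries under `processors:`. A minimal sketch of the ordering from the first bullet, assuming an `otlp` receiver and the `otlp/backend` exporter from section 2.2:

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter comes first so it can refuse data before any other
      # processor allocates memory for it; batch comes last.
      processors: [memory_limiter, batch]
      exporters: [otlp/backend]
```

Any other processors, such as redaction or tenant filters, would sit between `memory_limiter` and `batch`.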
```yaml
processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 75
    spike_limit_percentage: 20
  batch:
    send_batch_size: 1024
    timeout: 10s
```

#8.5 Clock Skew / Span Order Issues
If spans appear out-of-order in Jaeger/Tempo:
- Ensure all services use NTP synchronization
- Set `OTEL_SDK_DISABLED=false` and verify clock source in containers
#9. Reference Links
| Document | Purpose |
|---|---|
| `otel_interop_matrix.md` | Validated interop paths (M149) |
| `otel_collector_smoke_baseline.md` | Debug exporter smoke test |
| `otel_collector_jaeger_smoke.md` | Jaeger smoke test |
| `otel_collector_tempo_smoke.md` | Tempo smoke test |
| `otel_collector_datadog_class_smoke.md` | Datadog-class relay smoke test |
| `otel_collector_otlp_redaction_smoke.md` | Redaction adapter smoke test |
| `otel_vendor_configs/README.md` | Vendor config catalog |
| `ranvier/examples/otel-ops-demo/` | Policy enforcement demo |