Observability
AuthPlane exposes Prometheus metrics on the admin port, structured JSON logs to stdout, and optional OTLP export for logs, metrics, and traces.
| Surface | Where | Configured by |
|---|---|---|
| Prometheus metrics | GET /metrics on the admin port (default :9001) | observability.metrics.provider: prometheus (default), observability.metrics.path: /metrics |
| Structured logs | stdout, JSON | observability.logging.format: json (default), level via AUTHPLANE_LOG_LEVEL |
| OTLP logs | observability.logging.outputs.otel: true + endpoint | AUTHPLANE_LOG_OTEL, AUTHPLANE_LOG_OTEL_ENDPOINT |
| OTLP traces | observability.tracing.enabled: true + endpoint | AUTHPLANE_TRACING_ENABLED, AUTHPLANE_TRACING_ENDPOINT |
| OTLP metrics (in parallel with Prometheus) | observability.metrics.provider: both | AUTHPLANE_METRICS_PROVIDER, AUTHPLANE_METRICS_OTEL_ENDPOINT |
/metrics is bound on the admin port so it is not publicly exposed by default. Make sure your scraper can reach it — same cluster network in Kubernetes, loopback for systemd.
Scrape Prometheus metrics
    scrape_configs:
      - job_name: authserver
        scrape_interval: 15s
        metrics_path: /metrics
        static_configs:
          - targets: ["authserver:9001"]  # admin port

In Kubernetes via the Helm chart, serviceMonitor.enabled: true already points at the admin port. The metrics provider has three modes: prometheus (default, pull-based), otel (push over OTLP), or both.
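As a sketch of the two settings just mentioned, the first fragment is a Helm values override and the second is the AuthPlane config that serves Prometheus metrics and pushes OTLP metrics side by side. The file name is an assumption. The keys are the ones named on this page, and the OTLP metrics endpoint appears only as an environment variable because this page does not give its YAML key.

```yaml
# values.yaml (hypothetical file name): let the chart create the ServiceMonitor
serviceMonitor:
  enabled: true        # per this page, it already targets the admin port

---
# AuthPlane config: keep GET /metrics and also push metrics over OTLP
observability:
  metrics:
    provider: both     # prometheus | otel | both (AUTHPLANE_METRICS_PROVIDER)
    path: /metrics
# Set the push target with AUTHPLANE_METRICS_OTEL_ENDPOINT,
# for example otel-collector:4317 (assumed collector address).
```

With provider: both, the existing scrape job keeps working while the collector starts receiving the same instruments over OTLP.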
Enable OTLP traces
    observability:
      tracing:
        enabled: true                  # AUTHPLANE_TRACING_ENABLED
        endpoint: otel-collector:4317  # AUTHPLANE_TRACING_ENDPOINT
        insecure: true                 # AUTHPLANE_TRACING_INSECURE (TLS off for local)
        sample_rate: 1.0               # AUTHPLANE_TRACING_SAMPLE_RATE; lower for prod

Drop the sample rate (e.g. 0.1) once you’ve validated traces. Exporting every trace at full sample rate is expensive to ship and store.
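A production-leaning variant of the same block might look like the sketch below. It assumes the collector serves TLS in production and that sample_rate is a probability between 0 and 1; neither is confirmed on this page, so check both against your deployment.

```yaml
observability:
  tracing:
    enabled: true
    endpoint: otel-collector:4317   # assumption: same collector address as the local setup
    insecure: false                 # assumption: the production collector terminates TLS
    sample_rate: 0.1                # assumption: ratio, so roughly 1 in 10 requests is traced
```

The matching environment variables are AUTHPLANE_TRACING_INSECURE and AUTHPLANE_TRACING_SAMPLE_RATE.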
Enable OTLP logs
    observability:
      logging:
        level: info                # AUTHPLANE_LOG_LEVEL
        format: json
        outputs:
          stdout: true             # keep stdout; pod-logs are forever
          otel: true               # AUTHPLANE_LOG_OTEL
          otel_endpoint: otel-collector:4317
          insecure: true

Each log line includes trace_id, span_id, request_id, client_id, and the grant name, so you can click a trace in Grafana Tempo and pivot to the matching logs in Loki.
The bundled LGTM stack
The repo ships a Grafana LGTM overlay at deploy/observability/docker-compose.observability.yml running Alloy (OTLP gateway on :4317), Tempo (traces), Loki (logs), Mimir (metrics), Prometheus, and Grafana. Run it standalone:

    docker compose -f deploy/observability/docker-compose.observability.yml up -d
    # Grafana at http://localhost:3000 (admin/admin)

Or pull it into your own stack with Compose’s include: directive and point logs, traces, and metrics at alloy:4317. This is exactly what the reference deploy/docker-compose.yml does.
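A minimal sketch of the include: approach follows. The service name and image are placeholders, not taken from the repo, and the include path is resolved relative to the Compose file that declares it. The environment variables are the ones listed in the table at the top of this page. The top-level include element needs a reasonably recent Docker Compose release.

```yaml
# docker-compose.yml in your own project (sketch, not the reference file)
include:
  - deploy/observability/docker-compose.observability.yml

services:
  authserver:                          # hypothetical service name
    image: example/authplane:latest    # placeholder image
    environment:
      # Logs over OTLP
      AUTHPLANE_LOG_OTEL: "true"
      AUTHPLANE_LOG_OTEL_ENDPOINT: alloy:4317
      # Traces over OTLP
      AUTHPLANE_TRACING_ENABLED: "true"
      AUTHPLANE_TRACING_ENDPOINT: alloy:4317
      AUTHPLANE_TRACING_INSECURE: "true"   # assumption: plain gRPC inside the Compose network
      # Metrics: keep /metrics and also push over OTLP
      AUTHPLANE_METRICS_PROVIDER: both
      AUTHPLANE_METRICS_OTEL_ENDPOINT: alloy:4317
```

Compare against deploy/docker-compose.yml in the repo before relying on this; that file is the authoritative wiring.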
Metrics worth watching
The prefix is currently mixed: authserver_* for the OAuth core, authplane_* for newer subsystems (DPoP, token exchange, client credentials, XAA). Both are live.
| Metric | Why operators care |
|---|---|
| authserver_tokens_issued_total{grant_type} | Throughput baseline, anomaly detection |
| authserver_refresh_token_reuse_total | Stolen-token reuse (RFC 6749 §10.4); page on any non-zero increase |
| authserver_auth_denied_total{reason} | locked_out = brute-force; invalid_client = misconfigured caller |
| authplane_dpop_proofs_rejected_total | Token-binding violations |
| authplane_token_exchange_denied_total | Cross-client policy denials |
| authserver_token_issuance_duration_seconds{grant_type} | Per-grant p99 latency |
| authserver_http_request_duration_seconds{method,path,status} | Full HTTP-surface SLO |
| authserver_active_token_families | Outstanding token families; sizing input for the purge schedule |
The exhaustive instrument list lives in docs/reference/metrics.md, generated from the metrics source of truth.
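To turn a few of the metrics above into dashboard-ready series, one option is a set of Prometheus recording rules. The sketch below uses only metric and label names from the table; the group name and the recorded series names are hypothetical, chosen to follow the usual level:metric:operation naming pattern.

```yaml
groups:
  - name: authserver-recording          # hypothetical group name
    rules:
      # p99 token issuance latency, one series per grant type
      - record: authserver:token_issuance_duration_seconds:p99_5m
        expr: |
          histogram_quantile(
            0.99,
            sum by (le, grant_type) (
              rate(authserver_token_issuance_duration_seconds_bucket[5m])
            )
          )
      # Denials per second, split by reason (locked_out, invalid_client, ...)
      - record: authserver:auth_denied:rate5m
        expr: sum by (reason) (rate(authserver_auth_denied_total[5m]))
      # Tokens issued per second, per grant type (throughput baseline)
      - record: authserver:tokens_issued:rate5m
        expr: sum by (grant_type) (rate(authserver_tokens_issued_total[5m]))
```

Aggregating with sum by (...) collapses per-instance series, so the recorded values describe the whole deployment rather than a single replica.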
Alerts you should never deploy without
    groups:
      - name: authserver
        rules:
          - alert: RefreshTokenReuse      # page-worthy
            expr: increase(authserver_refresh_token_reuse_total[5m]) > 0
            for: 1m
            labels: { severity: critical }
            annotations:
              summary: "Refresh token reuse: possible theft"

          - alert: HighAuthDenialRate     # likely an attack or a regression
            expr: rate(authserver_auth_denied_total[5m]) > 10
            for: 5m
            labels: { severity: warning }
            annotations:
              summary: "auth_denied rate >10/s for 5m"

          - alert: TokenP99Slow           # something's wrong with signing or DB
            expr: histogram_quantile(0.99, rate(authserver_token_issuance_duration_seconds_bucket[5m])) > 2
            for: 5m
            labels: { severity: warning }
            annotations:
              summary: "Token issuance p99 > 2s"

Verify
    # Scrape the admin port and sanity-check metric names
    curl -fsS http://localhost:9001/metrics | grep -E '^authserver_|^authplane_' | head -5

    # Confirm tracing is exporting (look for the OTLP grpc connection log line)
    kubectl logs deploy/authplane | grep -iE 'otlp|tracer'

Next: Docker · Kubernetes · Configuration