Observability

AuthPlane exposes Prometheus metrics on the admin port, structured JSON logs to stdout, and optional OTLP export for logs, metrics, and traces.

Surface	Where / how enabled	Config keys and environment variables
Prometheus metrics	`GET /metrics` on the admin port (default `:9001`)	`observability.metrics.provider: prometheus` (default), `observability.metrics.path: /metrics`
Structured logs	stdout, JSON	`observability.logging.format: json` (default), level via `AUTHPLANE_LOG_LEVEL`
OTLP logs	`observability.logging.outputs.otel: true` + endpoint	`AUTHPLANE_LOG_OTEL`, `AUTHPLANE_LOG_OTEL_ENDPOINT`
OTLP traces	`observability.tracing.enabled: true` + endpoint	`AUTHPLANE_TRACING_ENABLED`, `AUTHPLANE_TRACING_ENDPOINT`
OTLP metrics (in parallel with Prometheus)	`observability.metrics.provider: both`	`AUTHPLANE_METRICS_PROVIDER`, `AUTHPLANE_METRICS_OTEL_ENDPOINT`

/metrics is bound on the admin port so it is not publicly exposed by default. Make sure your scraper can reach it — same cluster network in Kubernetes, loopback for systemd.

Scrape Prometheus metrics

scrape_configs:
  - job_name: authserver
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets: ["authserver:9001"]   # admin port

In Kubernetes via the Helm chart, serviceMonitor.enabled: true already points at the admin port. The metrics provider has three modes — prometheus (default pull-based), otel (push over OTLP), or both.

Enable OTLP traces

observability:
  tracing:
    enabled: true                 # AUTHPLANE_TRACING_ENABLED
    endpoint: otel-collector:4317 # AUTHPLANE_TRACING_ENDPOINT
    insecure: true                # AUTHPLANE_TRACING_INSECURE (TLS off for local)
    sample_rate: 1.0              # AUTHPLANE_TRACING_SAMPLE_RATE; lower for prod

Drop the sample rate (e.g. 0.1) once you’ve validated traces: exporting and storing every trace at full sampling gets expensive under production traffic.
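The config comments above name an environment variable for the sample rate, so it can plausibly be lowered per environment without editing the file. A minimal sketch, assuming the environment variable takes precedence over the config file and that the rate is a fraction between 0 and 1; the 0.1 value is the example from the text, not a tuned recommendation:

```shell
# Override only the sample rate for a production-like environment.
# Assumption: AUTHPLANE_TRACING_SAMPLE_RATE overrides observability.tracing.sample_rate.
export AUTHPLANE_TRACING_SAMPLE_RATE=0.1   # keep roughly 1 in 10 traces, if the rate is a probability
```

Keeping the file at 1.0 for local work and overriding per deployment avoids maintaining two copies of the config.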

Enable OTLP logs

observability:
  logging:
    level: info                  # AUTHPLANE_LOG_LEVEL
    format: json
    outputs:
      stdout: true               # keep stdout so pod logs stay available
      otel: true                 # AUTHPLANE_LOG_OTEL
      otel_endpoint: otel-collector:4317
      insecure: true

Each log line includes trace_id, span_id, request_id, client_id, and the grant name — click a trace in Grafana Tempo to pivot to the matching logs in Loki.
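Outside Grafana, the same pivot can be approximated on raw stdout logs. A minimal sketch, assuming each log line is a single JSON object with a string `trace_id` field as described above; `logs_for_trace` is a hypothetical helper name, and the trace ID in the usage line is a placeholder:

```shell
# Print only the JSON log lines whose trace_id equals the given ID.
# Reads logs on stdin and tolerates optional whitespace around the colon.
# The ID is interpolated into a regex, which is safe for hex trace IDs.
logs_for_trace() {
  grep -E "\"trace_id\"[[:space:]]*:[[:space:]]*\"$1\""
}
```

For example, `kubectl logs deploy/authplane | logs_for_trace 4bf92f3577b34da6a3ce929d0e0e4736` would print the lines for that trace. Because this wraps `grep`, it exits non-zero when no line matches.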

The bundled LGTM stack

The repo ships a Grafana LGTM overlay at deploy/observability/docker-compose.observability.yml running Alloy (OTLP gateway on :4317), Tempo (traces), Loki (logs), Mimir (metrics), Prometheus, and Grafana. Run it standalone:

docker compose -f deploy/observability/docker-compose.observability.yml up -d
# Grafana at http://localhost:3000 (admin/admin)

Or pull it into your own stack with Compose’s include: directive and point logs, traces, and metrics at alloy:4317 — this is exactly what the reference deploy/docker-compose.yml does.
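Pointing the three signals at the bundled gateway might look like the following when done through environment variables. This is a sketch under the assumption that the variables named elsewhere on this page override the config file; it is not copied from the reference deploy/docker-compose.yml:

```shell
# Send logs, traces, and metrics to the Alloy OTLP gateway from the LGTM overlay.
export AUTHPLANE_LOG_OTEL=true
export AUTHPLANE_LOG_OTEL_ENDPOINT=alloy:4317
export AUTHPLANE_TRACING_ENABLED=true
export AUTHPLANE_TRACING_ENDPOINT=alloy:4317
export AUTHPLANE_TRACING_INSECURE=true           # local only: TLS off, as in the tracing example
export AUTHPLANE_METRICS_PROVIDER=both           # keep /metrics and also push over OTLP
export AUTHPLANE_METRICS_OTEL_ENDPOINT=alloy:4317
```

The hostname `alloy` resolves only inside the Compose network; from the host, the gateway port would need to be published and addressed differently.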

Metrics worth watching

The prefix is currently mixed: authserver_* for the OAuth core, authplane_* for newer subsystems (DPoP, token exchange, client credentials, XAA). Both are live.

Metric	Why operators care
`authserver_tokens_issued_total{grant_type}`	Throughput baseline, anomaly detection
`authserver_refresh_token_reuse_total`	Stolen-token reuse (RFC 6749 §10.4) — page on any non-zero increase
`authserver_auth_denied_total{reason}`	`locked_out` = brute-force; `invalid_client` = misconfigured caller
`authplane_dpop_proofs_rejected_total`	Token-binding violations
`authplane_token_exchange_denied_total`	Cross-client policy denials
`authserver_token_issuance_duration_seconds{grant_type}`	Per-grant p99 latency
`authserver_http_request_duration_seconds{method,path,status}`	Full HTTP-surface SLO
`authserver_active_token_families`	Outstanding token families; sizing input for the purge schedule

The exhaustive instrument list lives in docs/reference/metrics.md, generated from the metrics source of truth.
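For a quick look at one of the counters above without a Prometheus server, the raw exposition text can be summed across label sets. A minimal sketch, assuming the standard Prometheus text format (`name{labels} value [timestamp]`); `metric_sum` is a hypothetical helper, not part of AuthPlane:

```shell
# Sum every sample of one metric across its label sets.
# Reads Prometheus text exposition format on stdin; prints "absent" if no sample matches.
metric_sum() {
  awk -v name="$1" '
    /^#/ || NF == 0 { next }            # skip HELP/TYPE comments and blank lines
    {
      line = $0
      brace = index(line, "{")
      if (brace > 0) {
        metric = substr(line, 1, brace - 1)
        sub(/^.*[}][ \t]*/, "", line)   # drop everything through the closing brace
      } else {
        metric = $1
        sub(/^[^ \t]+[ \t]+/, "", line) # drop the bare metric name
      }
      split(line, rest, /[ \t]+/)       # rest[1] = value, rest[2] = optional timestamp
      if (metric == name) { total += rest[1]; found = 1 }
    }
    END { if (found) printf "%.15g\n", total; else print "absent" }
  '
}
```

For example, `curl -fsS http://localhost:9001/metrics | metric_sum authserver_refresh_token_reuse_total` prints the cumulative count since the process started. Counters reset on restart, so this is a spot check rather than a substitute for the alert rules.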

Alerts you should never deploy without

groups:
  - name: authserver
    rules:
      - alert: RefreshTokenReuse                 # page-worthy
        expr: increase(authserver_refresh_token_reuse_total[5m]) > 0
        for: 1m
        labels: { severity: critical }
        annotations:
          summary: "Refresh token reuse — possible theft"

      - alert: HighAuthDenialRate                # likely an attack or a regression
        expr: sum by (reason) (rate(authserver_auth_denied_total[5m])) > 10
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "auth_denied rate >10/s for 5m"

      - alert: TokenP99Slow                      # something's wrong with signing or DB
        expr: histogram_quantile(0.99, rate(authserver_token_issuance_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Token issuance p99 > 2s"

Verify

# Scrape the admin port — sanity-check metric names
curl -fsS http://localhost:9001/metrics | grep -E '^authserver_|^authplane_' | head -5

# Confirm tracing is exporting (look for the OTLP gRPC connection log line)
kubectl logs deploy/authplane | grep -iE 'otlp|tracer'
