Observability

Observability is not just "more logging." In real services, it is the design that connects traces, error events, structured logs, and request context into one coherent operating model. Without that, slow requests, hidden N+1 queries, downstream timeouts, and background-task failures end up scattered across unrelated tools.

Quick takeaway: a practical FastAPI stack often uses OpenTelemetry as the trace backbone, Sentry as the error and performance product, and `structlog` for structured request-scoped logs. The important part is not the brand list; it is initializing each tool once at startup, treating sampling rates and PII policy as separate decisions, and keeping high-cardinality values out of tags and span attributes.

A Practical Stack

| Role | Good default | Why |
| --- | --- | --- |
| distributed traces and spans | OpenTelemetry | vendor-neutral foundation with strong FastAPI and SQLAlchemy instrumentation |
| error and performance monitoring | Sentry | useful operational UI for errors plus tracing context |
| structured logs and request context | structlog | clean contextvars story for request and trace correlation |

The Big Picture

Good observability connects logs, traces, and errors through one request context instead of leaving them isolated.

Instrument OpenTelemetry Once During Bootstrap

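```py
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from sqlalchemy.engine import Engine


def configure_observability(app: FastAPI, engine: Engine) -> None:
    """Attach auto-instrumentation; call once during app bootstrap."""
    # excluded_urls is a comma-separated list of regexes matched against the URL.
    FastAPIInstrumentor.instrument_app(
        app,
        excluded_urls="health,metrics",
    )
    # For an AsyncEngine, pass engine.sync_engine instead.
    SQLAlchemyInstrumentor().instrument(
        engine=engine,
    )
```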

FastAPI and SQLAlchemy instrumentation should usually be attached once during bootstrap. Doing it inside routes or dependencies can cause duplicate instrumentation and confusing runtime behavior.
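
One way to make the "attach once" rule hard to violate is an idempotent guard around the setup call. The sketch below is a minimal illustration using only the standard library; `run_once` and `configure_everything` are hypothetical names, and the function body stands in for the real instrumentation calls.

```python
import threading
from collections.abc import Callable
from functools import wraps


def run_once(func: Callable[[], None]) -> Callable[[], None]:
    """Wrap a zero-argument setup function so repeat calls are no-ops."""
    lock = threading.Lock()
    done = False

    @wraps(func)
    def wrapper() -> None:
        nonlocal done
        with lock:
            if done:
                return
            func()
            # Only mark as done after success, so a failed setup can be retried.
            done = True

    return wrapper


setup_calls: list[str] = []


@run_once
def configure_everything() -> None:
    # Placeholder for the real work, e.g. instrumenting the app and engine.
    setup_calls.append("instrumented")


configure_everything()
configure_everything()  # the second call does nothing
print(setup_calls)
```

A guard like this is a safety net, not a substitute for calling the setup from a single bootstrap path.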

Design Sentry Sampling Deliberately

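```py
import sentry_sdk
from sentry_sdk.types import SamplingContext


def traces_sampler(context: SamplingContext) -> float:
    # Follow the upstream decision so a distributed trace is kept or dropped whole.
    if context.get("parent_sampled") is not None:
        return float(context["parent_sampled"])

    # Assumes transaction names look like "GET /health"; the actual format
    # depends on the integration and its settings, so verify it before relying on it.
    transaction_context = context.get("transaction_context", {})
    name = str(transaction_context.get("name", ""))
    if name.startswith("GET /health"):
        return 0.0
    if name.startswith("POST /checkout"):
        return 0.5
    return 0.1


sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",
    traces_sampler=traces_sampler,  # controls trace volume
    sample_rate=1.0,  # controls error events, independent of traces
)
```

  • Error retention and trace sampling usually deserve different rates: `sample_rate` governs error events, while `traces_sampler` governs transactions.
  • Sentry's docs offer two ways to control trace volume: a uniform traces_sample_rate, or a traces_sampler function for per-transaction decisions. Choose one deliberately.
  • Inherited parent sampling decisions usually should be preserved so distributed traces stay intact.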

Bind Request Context into Logs with structlog

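```py
import uuid

import structlog
from fastapi import Request, Response
from structlog.contextvars import bind_contextvars, clear_contextvars


log = structlog.get_logger()


# Register with: app.middleware("http")(logging_middleware)
async def logging_middleware(request: Request, call_next) -> Response:
    # Drop any context left over from a previous request.
    clear_contextvars()
    bind_contextvars(
        # Reuse the caller's ID when present; otherwise generate a unique one.
        request_id=request.headers.get("x-request-id") or str(uuid.uuid4()),
        path=request.url.path,
    )
    response = await call_next(request)
    log.info("request.complete", status_code=response.status_code)
    return response
```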

Good Patterns

  • bind request IDs and trace IDs in one request scope
  • exclude health checks and other noisy low-value paths from tracing
  • keep span names at route or business-action granularity
  • instrument meaningful boundaries such as DB, outbound HTTP, or queue publishing
  • decide PII and secret redaction before shipping events broadly

Patterns to Avoid

  • tracing every request at 100% without considering volume
  • adding high-cardinality values such as user_id, email, or order_id everywhere
  • reinitializing logger or Sentry scope repeatedly inside route handlers
  • logging whole request bodies or raw secrets
  • creating spans inside tight per-row loops
  • double auto-instrumenting the same path with overlapping SDKs

Operational Checklist

Initialize once

Instrumentation and SDK setup should live in app bootstrap or lifespan setup, not inside request handlers.

Sample by signal type

Error, trace, and profiling signals usually should not all use the same rate.
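
Keeping the three rates in one explicit structure makes it harder to set them all to the same value by accident. The sketch below assumes a hypothetical `SamplingPolicy` dataclass; the keyword names it produces (`sample_rate`, `traces_sample_rate`, `profiles_sample_rate`) are `sentry_sdk.init` options, and the uniform `traces_sample_rate` is used here only for brevity in place of a `traces_sampler` function. The rate values are illustrative, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingPolicy:
    """Hypothetical container for per-signal sample rates (0.0 to 1.0)."""

    errors: float = 1.0
    traces: float = 0.1
    profiles: float = 0.05

    def __post_init__(self) -> None:
        # Reject rates outside [0, 1] at construction time.
        for name in ("errors", "traces", "profiles"):
            rate = getattr(self, name)
            if not 0.0 <= rate <= 1.0:
                raise ValueError(f"{name} rate must be within [0, 1], got {rate}")

    def to_sentry_kwargs(self) -> dict[str, float]:
        # Keys follow sentry_sdk.init option names.
        return {
            "sample_rate": self.errors,
            "traces_sample_rate": self.traces,
            "profiles_sample_rate": self.profiles,
        }


production = SamplingPolicy(errors=1.0, traces=0.1, profiles=0.02)
print(production.to_sentry_kwargs())
```

The resulting dict could be unpacked into the init call, for example `sentry_sdk.init(dsn=..., **production.to_sentry_kwargs())`.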

Constrain cardinality

Metric tags and span attributes should stay searchable and bounded.
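
One way to keep attributes bounded is an allow-list plus bucketing for values that would otherwise explode. This is a minimal sketch: `ALLOWED_ATTRIBUTES`, `status_class`, and `bounded_attributes` are hypothetical names, and the key set is only an example.

```python
from collections.abc import Mapping

# Hypothetical allow-list: keys whose values come from a small, fixed set.
ALLOWED_ATTRIBUTES = frozenset(
    {"http.method", "http.route", "status_class", "tenant_tier"}
)


def status_class(status_code: int) -> str:
    """Collapse individual status codes into buckets such as '2xx' or '5xx'."""
    return f"{status_code // 100}xx"


def bounded_attributes(raw: Mapping[str, object]) -> dict[str, object]:
    """Keep only allow-listed keys so unbounded values never become tags."""
    return {key: value for key, value in raw.items() if key in ALLOWED_ATTRIBUTES}


raw = {
    "http.method": "POST",
    "http.route": "/orders/{order_id}",  # route template, not the concrete URL
    "status_class": status_class(503),
    "order_id": "ord_8f3a",  # unbounded: dropped
    "email": "a@example.com",  # unbounded and PII: dropped
}
print(bounded_attributes(raw))
# → {'http.method': 'POST', 'http.route': '/orders/{order_id}', 'status_class': '5xx'}
```

Identifiers such as `order_id` can still appear in a log line for the specific request; the point is to keep them out of dimensions that get indexed or aggregated.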

Set PII policy first

Broad instrumentation without redaction rules creates operational and compliance risk quickly.
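
A redaction rule can be expressed as a structlog-style processor, which is any callable taking `(logger, method_name, event_dict)` and returning the event dict. The sketch below is a minimal illustration: `SENSITIVE_KEYS` and `redact_sensitive` are hypothetical names, the deny-list is an example rather than a complete policy, and nested values are not inspected.

```python
from collections.abc import MutableMapping
from typing import Any

# Hypothetical deny-list; a real policy would come from a compliance review.
SENSITIVE_KEYS = frozenset(
    {"password", "authorization", "token", "email", "card_number"}
)
REDACTED = "[redacted]"


def redact_sensitive(
    logger: Any, method_name: str, event_dict: MutableMapping[str, Any]
) -> MutableMapping[str, Any]:
    """Mask top-level values whose key (case-insensitive) is on the deny-list."""
    for key in list(event_dict):
        if key.lower() in SENSITIVE_KEYS:
            event_dict[key] = REDACTED
    return event_dict


event = redact_sensitive(
    None, "info", {"event": "login.failed", "Password": "hunter2", "user": "bob"}
)
print(event)
# → {'event': 'login.failed', 'Password': '[redacted]', 'user': 'bob'}
```

In a structlog setup, a processor like this would go in the `processors` list passed to `structlog.configure`, placed before the final renderer so masked values are what gets serialized.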

Built with VitePress for a Python 3.14 handbook.