Observability
Observability is not just "more logging." In real services, it is the design that connects traces, error events, structured logs, and request context into one coherent operating model. Without that, slow requests, hidden N+1 queries, downstream timeouts, and background-task failures end up scattered across unrelated tools.
Quick takeaway: a practical FastAPI stack often uses OpenTelemetry as the trace backbone, Sentry as the error and performance product, and `structlog` for structured request-scoped logs. The important part is not the brand list; it is initializing them once at startup, separating sampling from PII policy, and avoiding high-cardinality spam.
A Practical Stack
| Role | Good default | Why |
|---|---|---|
| distributed traces and spans | OpenTelemetry | vendor-neutral foundation with strong FastAPI and SQLAlchemy instrumentation |
| error and performance monitoring | Sentry | useful operational UI for errors plus tracing context |
| structured logs and request context | structlog | clean contextvars story for request and trace correlation |
The Big Picture
Instrument OpenTelemetry Once During Bootstrap
```python
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor


def configure_observability(app: FastAPI, engine: object) -> None:
    # Call this once from app bootstrap, never from a route or dependency.
    FastAPIInstrumentor.instrument_app(
        app,
        excluded_urls="health,metrics",
    )
    SQLAlchemyInstrumentor().instrument(
        engine=engine,
    )
```

FastAPI and SQLAlchemy instrumentation should usually be attached once during bootstrap. Doing it inside routes or dependencies can cause duplicate instrumentation and confusing runtime behavior.
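The `excluded_urls="health,metrics"` value deserves a second look. As far as I can tell from the OpenTelemetry documentation, the string is a comma-separated list of regular expressions that are searched against the request URL, so a bare `health` would also match `/healthcare`. The sketch below approximates that behavior with the standard library only; `build_exclude_matcher` is a hypothetical helper written for illustration, not an OpenTelemetry API.

```python
import re
from typing import Callable


def build_exclude_matcher(excluded_urls: str) -> Callable[[str], bool]:
    """Approximate a comma-separated regex exclude list (assumed semantics)."""
    patterns = [p.strip() for p in excluded_urls.split(",") if p.strip()]
    regex = re.compile("|".join(patterns)) if patterns else None

    def is_excluded(url: str) -> bool:
        # search() matches anywhere in the URL, not only at the start.
        return bool(regex and regex.search(url))

    return is_excluded


# Bare words match any URL that contains them.
loose = build_exclude_matcher("health,metrics")
# Anchored patterns only match URLs that end with the exact path.
strict = build_exclude_matcher(r"/health$,/metrics$")

health_url = "http://api.local/health"
business_url = "http://api.local/healthcare/plans"

print(loose(health_url), loose(business_url))
print(strict(health_url), strict(business_url))
```

If the real matcher behaves like this sketch, anchoring the patterns keeps business routes that merely contain the word `health` in the trace data.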
Design Sentry Sampling Deliberately
```python
import sentry_sdk
from sentry_sdk.types import SamplingContext


def traces_sampler(context: SamplingContext) -> float:
    # Preserve the upstream decision so distributed traces stay intact.
    if context.get("parent_sampled") is not None:
        return float(context["parent_sampled"])

    transaction_context = context.get("transaction_context", {})
    name = str(transaction_context.get("name", ""))

    if name.startswith("GET /health"):
        return 0.0
    if name.startswith("POST /checkout"):
        return 0.5
    return 0.1


sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    traces_sampler=traces_sampler,  # trace sampling
    sample_rate=1.0,  # error event sampling
)
```

- Error retention and trace sampling usually deserve different rates.
- Sentry's docs explicitly recommend deliberate use of `traces_sample_rate` or `traces_sampler`.
- Inherited parent sampling decisions usually should be preserved so distributed traces stay intact.
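Because a sampler is a plain function of a dictionary, it can be exercised without initializing Sentry at all. The sketch below is a table-driven variant of the sampler shown earlier, typed against a plain mapping so it runs with the standard library only. The `ROUTE_RATES` table and `DEFAULT_RATE` are illustrative names, and the `"METHOD /path"` transaction-name format is an assumption carried over from the example; the actual names depend on the integration settings.

```python
from typing import Any, Mapping

# Hypothetical rate table: (transaction-name prefix, sample rate).
ROUTE_RATES = (
    ("GET /health", 0.0),
    ("POST /checkout", 0.5),
)
DEFAULT_RATE = 0.1


def traces_sampler(context: Mapping[str, Any]) -> float:
    # An upstream decision always wins, whatever the route is.
    parent_sampled = context.get("parent_sampled")
    if parent_sampled is not None:
        return float(parent_sampled)

    transaction_context = context.get("transaction_context") or {}
    name = str(transaction_context.get("name", ""))

    for prefix, rate in ROUTE_RATES:
        if name.startswith(prefix):
            return rate
    return DEFAULT_RATE


# A parent that was not sampled suppresses even a high-value route.
inherited = traces_sampler(
    {"parent_sampled": False, "transaction_context": {"name": "POST /checkout"}}
)
checkout = traces_sampler({"transaction_context": {"name": "POST /checkout"}})
other = traces_sampler({"transaction_context": {"name": "GET /orders"}})
print(inherited, checkout, other)
```

Keeping the rates in one table makes a sampling change a reviewable diff rather than a new branch in the function.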
Bind Request Context into Logs with structlog
```python
import uuid

import structlog
from structlog.contextvars import bind_contextvars, clear_contextvars

log = structlog.get_logger()


async def logging_middleware(request, call_next):
    # Start from a clean slate so context never leaks between requests.
    clear_contextvars()
    bind_contextvars(
        # Reuse the caller's request ID, or generate a fresh one per request.
        request_id=request.headers.get("x-request-id") or str(uuid.uuid4()),
        path=request.url.path,
    )
    response = await call_next(request)
    log.info("request.complete", status_code=response.status_code)
    return response
```

Good Patterns
- bind request IDs and trace IDs in one request scope
- exclude health checks and other noisy low-value paths from tracing
- keep span names at route or business-action granularity
- instrument meaningful boundaries such as DB, outbound HTTP, or queue publishing
- decide PII and secret redaction before shipping events broadly
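The first pattern, binding request IDs and trace IDs in one request scope, relies on context variables being isolated per task. The sketch below shows that mechanism with the standard library only. It is a simplified stand-in for what `structlog.contextvars` provides, not its implementation; `bind_context`, `clear_context`, `log_event`, and the hard-coded trace IDs are hypothetical names and values for illustration.

```python
import asyncio
import contextvars
import uuid
from typing import Any, Dict, List

# One context variable holds all fields bound for the current request.
_request_context = contextvars.ContextVar("request_context", default={})


def clear_context() -> None:
    _request_context.set({})


def bind_context(**fields: Any) -> None:
    # Build a new dict instead of mutating, so other tasks are unaffected.
    _request_context.set({**_request_context.get(), **fields})


def log_event(event: str, **fields: Any) -> Dict[str, Any]:
    # Every record automatically carries the bound request context.
    return {"event": event, **_request_context.get(), **fields}


async def handle_request(path: str, trace_id: str) -> Dict[str, Any]:
    clear_context()
    bind_context(request_id=str(uuid.uuid4()), trace_id=trace_id, path=path)
    await asyncio.sleep(0)  # yield control so the two requests interleave
    return log_event("request.complete", status_code=200)


async def main() -> List[Dict[str, Any]]:
    # gather() runs each coroutine in its own task with its own context copy.
    return await asyncio.gather(
        handle_request("/orders", "trace-a"),
        handle_request("/checkout", "trace-b"),
    )


records = asyncio.run(main())
for record in records:
    print(record)
```

In a real service the trace ID would come from the active span rather than a function argument; once both IDs share one scope, a log line can be joined to its trace without extra lookups.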
Patterns to Avoid
- tracing every request at 100% without considering volume
- adding high-cardinality values such as `user_id`, `email`, or `order_id` everywhere
- reinitializing logger or Sentry scope repeatedly inside route handlers
- logging whole request bodies or raw secrets
- creating spans inside tight per-row loops
- double auto-instrumenting the same path with overlapping SDKs
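One way to keep high-cardinality values out of span attributes and metric tags is to pass them through an allowlist before they are attached. The following is a minimal sketch under that assumption; the attribute keys, the `ALLOWED_ATTRIBUTE_KEYS` set, and the `bounded_attributes` helper are illustrative and not part of any SDK.

```python
from typing import Any, Dict, Mapping

# Hypothetical allowlist: keys whose set of possible values is small.
ALLOWED_ATTRIBUTE_KEYS = frozenset({"http.method", "http.route"})


def bounded_attributes(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only bounded keys and collapse status codes into classes."""
    kept = {key: value for key, value in raw.items() if key in ALLOWED_ATTRIBUTE_KEYS}

    status = raw.get("http.status_code")
    if isinstance(status, int):
        # 404 becomes "4xx": five possible values instead of dozens.
        kept["http.status_class"] = f"{status // 100}xx"
    return kept


attributes = bounded_attributes(
    {
        "http.method": "POST",
        "http.route": "/orders/{order_id}",  # route template, not the raw path
        "http.status_code": 404,
        "user_id": "u-1842",
        "email": "someone@example.com",
    }
)
print(attributes)
```

Identifiers such as `user_id` can still be bound to the request-scoped log context, where they are looked up occasionally rather than indexed on every span.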
Operational Checklist
- Initialize once: Instrumentation and SDK setup should live in app bootstrap or lifespan setup, not inside request handlers.
- Sample by signal type: Error, trace, and profiling signals usually should not all use the same rate.
- Constrain cardinality: Metric tags and span attributes should stay searchable and bounded.
- Set PII policy first: Broad instrumentation without redaction rules creates operational and compliance risk quickly.
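A PII policy is easier to enforce when it exists as one function that every event passes through. The sketch below is one possible shape, assuming a simple key-based policy: `SENSITIVE_KEYS`, `REDACTED`, and `redact` are hypothetical names. The `redact_processor` wrapper follows the `(logger, method_name, event_dict)` signature that structlog processors use, though it is shown here without importing structlog.

```python
from typing import Any, Dict

# Hypothetical policy: keys whose values must never leave the process.
SENSITIVE_KEYS = frozenset({"password", "authorization", "cookie", "email", "token"})
REDACTED = "[redacted]"


def redact(value: Any) -> Any:
    """Return a redacted copy of nested dicts and lists; input is not mutated."""
    if isinstance(value, dict):
        return {
            key: REDACTED if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        # Tuples come back as lists, which is fine for serialized events.
        return [redact(item) for item in value]
    return value


def redact_processor(logger: Any, method_name: str, event_dict: Dict[str, Any]) -> Dict[str, Any]:
    # Same call shape as a structlog processor; it only delegates to redact().
    return redact(event_dict)


event = {
    "event": "login.failed",
    "email": "someone@example.com",
    "headers": {"Authorization": "Bearer abc", "accept": "*/*"},
    "attempts": [{"token": "t-1", "ok": False}],
}
clean = redact_processor(None, "info", event)
print(clean)
```

Key-based redaction misses secrets embedded in free-text values, so it complements the rule against logging whole request bodies rather than replacing it.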