Performance and Ops

FastAPI is fast, but real bottlenecks rarely live in the framework alone. Hidden blocking work, expensive validation and serialization, poor query shapes, pool contention, mis-sized workers, and missing observability dominate most production slowdowns.

Quick takeaway: performance tuning starts with system boundaries, not with micro-optimizing route functions. Remove blocking work, tighten DTOs and queries, size pools and workers coherently, and add latency visibility before chasing framework overhead.

Where Latency Usually Comes From

API latency is the result of several layers interacting: parsing, service logic, I/O, serialization, pool contention, and sometimes CPU-heavy blocking work.

Choosing async def Versus def

  • Use async def when the handler directly awaits async I/O.
  • Use def when the stack is mostly sync; FastAPI runs sync handlers in a threadpool automatically.
  • In both cases, the key question is whether the code path blocks the event loop.
```python
from fastapi import FastAPI
import asyncio
import time

app = FastAPI()


@app.get("/bad")
async def bad_endpoint() -> dict[str, str]:
    # A sync sleep inside an async handler blocks the entire event loop.
    time.sleep(0.2)
    return {"status": "blocked"}


@app.get("/better")
async def better_endpoint() -> dict[str, str]:
    # Offloading to a thread keeps the event loop free for other requests.
    await asyncio.to_thread(time.sleep, 0.2)
    return {"status": "offloaded"}
```

`async def` does not make blocking code safe. A sync sleep or CPU-heavy operation still stalls the event loop unless it is moved to a worker thread or another process.
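
For CPU-heavy work, a thread does not help because of the GIL; a separate process does. A minimal sketch using a `ProcessPoolExecutor` (the `crunch` function and its workload are illustrative):

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def crunch(n: int) -> int:
    # CPU-bound loop: run inline, this would stall the event loop.
    return sum(i * i for i in range(n))


async def main() -> None:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # A separate process sidesteps the GIL and keeps the loop responsive.
        result = await loop.run_in_executor(pool, crunch, 100_000)
    print(result)


if __name__ == "__main__":
    asyncio.run(main())
```

For recurring heavy jobs, pushing the work to a task queue outside the API process is usually the better long-term shape.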

Validation and Serialization Are Part of the Cost

  • Deep response models increase serialization work.
  • Returning ORM entities directly couples lazy loading to response generation.
  • Narrower DTOs often reduce both latency and accidental I/O.
  • Boundary-specific strictness can avoid unnecessary coercion work.
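
A narrow DTO can be sketched with Pydantic v2; the `UserEntity` class below is a stand-in for a real ORM entity:

```python
from pydantic import BaseModel, ConfigDict


class UserOut(BaseModel):
    # Narrow DTO: only the fields the client needs, nothing lazy-loaded.
    model_config = ConfigDict(from_attributes=True)

    id: int
    email: str


class UserEntity:
    # Stand-in for an ORM entity with extra, expensive-to-serialize state.
    def __init__(self) -> None:
        self.id = 1
        self.email = "a@example.com"
        self.orders = ["..."]  # would trigger lazy loading in a real ORM


dto = UserOut.model_validate(UserEntity())
print(dto.model_dump())  # only id and email are serialized
```

Used as `response_model=UserOut` on a route, this keeps serialization cost bounded and stops relationship attributes from leaking into the response.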

Database Access Usually Dominates

Look here first

  • query count
  • N+1 behavior
  • loader options such as selectinload() or joinedload()
  • pool wait time
  • transaction length

Common failure modes

  • repeated queries inside routes
  • lazy loading during serialization
  • waiting for slow external I/O inside a long transaction

Think About the Worker Model Explicitly

| Layer | Question | Common mistake |
| --- | --- | --- |
| Uvicorn workers | Does worker count fit CPU and memory? | Increasing workers blindly |
| DB pool | Does pool sizing match worker concurrency? | Many workers, tiny pool |
| Request timeout | How are slow downstreams cut off? | Waiting forever |
| Background jobs | Should work leave the API process entirely? | Doing long-running work inline |
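
The coupling between workers and pool size is simple arithmetic. A sketch with hypothetical numbers (`pool_size` and `max_overflow` here match SQLAlchemy's defaults):

```python
# Hypothetical deployment: 4 Uvicorn worker processes, each with its own engine.
workers = 4
pool_size = 5        # steady-state connections per process
max_overflow = 10    # burst connections per process

# Each worker process holds its own pool, so the database can see up to
# workers * (pool_size + max_overflow) connections at peak load.
peak_db_connections = workers * (pool_size + max_overflow)
print(peak_db_connections)  # 60 -- must stay below the DB's max_connections
```

Doubling workers without revisiting pool limits either starves requests waiting for a connection or overruns the database's connection cap.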

Operational Defaults Worth Adding Early

  • separate access logs from application logs
  • request or trace IDs
  • DB query latency measurements
  • timeout, retry, and circuit-breaking policies
  • health and readiness endpoints
  • checks that response contracts match OpenAPI assumptions

A Good Debugging Order

  1. Measure query count and DB latency.
  2. Measure external API latency and timeout behavior.
  3. Measure payload size and serialization cost.
  4. Measure pool and worker contention.
  5. Only then worry about framework overhead.

The observability stack itself is covered separately in Observability.

Practical Checklist

Remove hidden blocking

Audit `async` code paths for sync I/O and CPU-heavy work that still blocks the loop.

Keep response models narrow

Return only the fields you need so serialization stays predictable and lazy loading does not leak across the boundary.

Tune pools and workers together

Application worker count and DB pool size are coupled system settings, not isolated knobs.

Do not tune blind

Without latency, error-rate, and query metrics, performance work becomes guesswork.

Built with VitePress for a Python 3.14 handbook.