Performance and Ops

FastAPI is fast, but real bottlenecks rarely live in the framework alone. Hidden blocking work, expensive validation and serialization, poor query shapes, pool contention, mis-sized workers, and missing observability dominate most production slowdowns.

Quick takeaway: performance tuning starts with system boundaries, not with micro-optimizing route functions. Remove blocking work, tighten DTOs and queries, size pools and workers coherently, and add latency visibility before chasing framework overhead.

Where Latency Usually Comes From

API latency is the result of several layers interacting: parsing, service logic, I/O, serialization, pool contention, and sometimes CPU-heavy blocking work.

Choosing async def Versus def

  • Use async def when the handler directly awaits async I/O.
  • Use def when the stack is mostly sync; FastAPI runs sync handlers in a threadpool automatically.
  • In both cases, the key question is whether the code path blocks the event loop.
```python
from fastapi import FastAPI
import asyncio
import time

app = FastAPI()


@app.get("/bad")
async def bad_endpoint() -> dict[str, str]:
    # A sync sleep inside an async handler blocks the entire event loop.
    time.sleep(0.2)
    return {"status": "blocked"}


@app.get("/better")
async def better_endpoint() -> dict[str, str]:
    # Offloading to a thread keeps the event loop free for other requests.
    await asyncio.to_thread(time.sleep, 0.2)
    return {"status": "offloaded"}
```

`async def` does not make blocking code safe. A sync sleep or CPU-heavy operation still stalls the event loop unless it is moved to a worker thread or another process.
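
For CPU-heavy work, a thread does not help because of the GIL; a separate process does. A minimal sketch using a `ProcessPoolExecutor` (the `crunch` function and its workload are illustrative):

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def crunch(n: int) -> int:
    # CPU-bound loop: run inline, this would stall the event loop.
    return sum(i * i for i in range(n))


async def main() -> None:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # A separate process sidesteps the GIL and keeps the loop responsive.
        result = await loop.run_in_executor(pool, crunch, 100_000)
    print(result)


if __name__ == "__main__":
    asyncio.run(main())
```

For recurring heavy jobs, pushing the work to a task queue outside the API process is usually the better long-term shape.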

Validation and Serialization Are Part of the Cost

  • Deep response models increase serialization work.
  • Returning ORM entities directly couples lazy loading to response generation.
  • Narrower DTOs often reduce both latency and accidental I/O.
  • Boundary-specific strictness can avoid unnecessary coercion work.
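
A narrow DTO can be sketched with Pydantic v2; the `UserEntity` class below is a stand-in for a real ORM entity:

```python
from pydantic import BaseModel, ConfigDict


class UserOut(BaseModel):
    # Narrow DTO: only the fields the client needs, nothing lazy-loaded.
    model_config = ConfigDict(from_attributes=True)

    id: int
    email: str


class UserEntity:
    # Stand-in for an ORM entity with extra, expensive-to-serialize state.
    def __init__(self) -> None:
        self.id = 1
        self.email = "a@example.com"
        self.orders = ["..."]  # would trigger lazy loading in a real ORM


dto = UserOut.model_validate(UserEntity())
print(dto.model_dump())  # only id and email are serialized
```

Used as `response_model=UserOut` on a route, this keeps serialization cost bounded and stops relationship attributes from leaking into the response.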

Database Access Usually Dominates

Look here first

  • query count
  • N+1 behavior
  • loader options such as selectinload() or joinedload()
  • pool wait time
  • transaction length

Common failure modes

  • repeated queries inside routes
  • lazy loading during serialization
  • waiting for slow external I/O inside a long transaction

Think About the Worker Model Explicitly

| Layer | Question | Common mistake |
| --- | --- | --- |
| Uvicorn workers | Does worker count fit CPU and memory? | Increasing workers blindly |
| DB pool | Does pool sizing match worker concurrency? | Many workers, tiny pool |
| Request timeout | How are slow downstreams cut off? | Waiting forever |
| Background jobs | Should work leave the API process entirely? | Doing long-running work inline |
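
The coupling between workers and pool size is simple arithmetic. A sketch with hypothetical numbers (`pool_size` and `max_overflow` here match SQLAlchemy's defaults):

```python
# Hypothetical deployment: 4 Uvicorn worker processes, each with its own engine.
workers = 4
pool_size = 5        # steady-state connections per process
max_overflow = 10    # burst connections per process

# Each worker process holds its own pool, so the database can see up to
# workers * (pool_size + max_overflow) connections at peak load.
peak_db_connections = workers * (pool_size + max_overflow)
print(peak_db_connections)  # 60 -- must stay below the DB's max_connections
```

Doubling workers without revisiting pool limits either starves requests waiting for a connection or overruns the database's connection cap.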

Operational Defaults Worth Adding Early

  • separate access logs from application logs
  • request or trace IDs
  • DB query latency measurements
  • timeout, retry, and circuit-breaking policies
  • health and readiness endpoints
  • checks that response contracts match OpenAPI assumptions

A Good Debugging Order

  1. Measure query count and DB latency.
  2. Measure external API latency and timeout behavior.
  3. Measure payload size and serialization cost.
  4. Measure pool and worker contention.
  5. Only then worry about framework overhead.

The observability stack itself is covered separately in Observability.

Practical Checklist

Remove hidden blocking

Audit `async` code paths for sync I/O and CPU-heavy work that still blocks the loop.

Keep response models narrow

Return only the fields you need so serialization stays predictable and lazy loading does not leak across the boundary.

Tune pools and workers together

Application worker count and DB pool size are coupled system settings, not isolated knobs.

Do not tune blind

Without latency, error-rate, and query metrics, performance work becomes guesswork.

Built with VitePress for a Python 3.14 handbook.