
Queues and Backpressure

Async systems often fail because they accept work faster than they can safely process it. Without queue limits, concurrency limits, and timeout policy, the first bottleneck is usually memory or a downstream dependency, not raw CPU speed.

Quick takeaway: the foundations of backpressure are `Queue(maxsize=...)`, `Semaphore`, timeouts, and an explicit overload policy. Unbounded queues and unbounded fan-out are easy to start with and hard to operate.

Backpressure Picture

Queue size absorbs bursts. Semaphore limits downstream concurrency. Both are needed for a stable async system.

Bounded Queue Example

```py
import asyncio


async def producer(queue: asyncio.Queue[int]) -> None:
    for item in range(6):
        print("produce", item, "qsize before:", queue.qsize())
        # Blocks once the queue holds `maxsize` items: that pause is backpressure.
        await queue.put(item)
        print("enqueued", item, "qsize after:", queue.qsize())


async def worker(
    name: str,
    queue: asyncio.Queue[int],
    limiter: asyncio.Semaphore,
) -> None:
    try:
        while True:
            item = await queue.get()
            try:
                # The semaphore caps how many workers touch the
                # downstream dependency at once.
                async with limiter:
                    await asyncio.sleep(0.15)
                    print(name, "handled", item)
            finally:
                queue.task_done()
    except asyncio.QueueShutDown:  # raised by get() after shutdown(); 3.13+
        print(name, "shutdown")


async def main() -> None:
    queue: asyncio.Queue[int] = asyncio.Queue(maxsize=2)
    limiter = asyncio.Semaphore(2)

    async with asyncio.TaskGroup() as task_group:
        task_group.create_task(producer(queue))
        for index in range(3):
            task_group.create_task(worker(f"worker-{index}", queue, limiter))

        # Wait until every item is processed, then wake the blocked
        # workers so the TaskGroup can exit cleanly.
        await queue.join()
        queue.shutdown()


asyncio.run(main())
```

With `maxsize=2`, the producer is forced to slow down when the queue is full. That is the most basic form of backpressure. `Queue.shutdown()` in Python 3.13+ gives you a cleaner shutdown path for consumers.

Design Questions You Must Answer

  • do producers wait when the system is full?
  • do they fail after a timeout?
  • do you drop old work or reject new work?
  • how many downstream operations may run concurrently?

Checklist

Use bounded queues

Bursts should be absorbed, not allowed to grow memory usage without limit.

Use semaphores

A queue alone does not limit how hard workers hit a downstream API or database.
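A sketch of semaphore-limited fan-out; `call_downstream` is a hypothetical stand-in for a real API or database call:

```py
import asyncio


async def call_downstream(limiter: asyncio.Semaphore, item: int) -> int:
    # No matter how many tasks are started, at most two of these
    # bodies run concurrently.
    async with limiter:
        await asyncio.sleep(0.01)  # stand-in for an API or database call
        return item * 2


async def fan_out(items: list[int]) -> list[int]:
    limiter = asyncio.Semaphore(2)  # cap concurrent downstream operations
    return await asyncio.gather(*(call_downstream(limiter, i) for i in items))
```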

Add timeout policy

Decide when enqueue or processing latency becomes failure, not just delay.

Pick a drop policy

Some real-time systems are better off dropping stale work than trying to process everything eventually.
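A drop-oldest sketch built from the non-blocking queue methods; `put_drop_oldest` is a hypothetical helper, not a stdlib API:

```py
import asyncio


def put_drop_oldest(queue: asyncio.Queue[int], item: int) -> None:
    """Never block the producer: evict the stalest item to make room."""
    while True:
        try:
            queue.put_nowait(item)
            return
        except asyncio.QueueFull:
            try:
                queue.get_nowait()  # discard the oldest queued item
                queue.task_done()   # keep join() accounting consistent
            except asyncio.QueueEmpty:
                pass  # a consumer freed space concurrently; retry the put
```

This trades completeness for freshness: the queue always holds the newest work, which suits metrics, telemetry, and other real-time feeds.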

Practical Connections

  • webhook ingestion
  • rate-limited API fan-out
  • background worker queues
  • burst-heavy internal pipelines
