Backend from First Principles / Module 19 — Concurrency & Async

Concurrency & Async

Event loop, goroutines, race conditions. Why one thread can serve 10k requests.


One cook, many dishes

A backend server has one job that sounds impossible at first: handle thousands of requests at the same time, on hardware with maybe eight CPU cores. Eight cores can run eight things truly simultaneously — so how does one process serve a thousand concurrent users?

The answer is that almost none of those thousand requests need the CPU at any given instant. A request spends the overwhelming majority of its life waiting — waiting for a database query to return, waiting for an external API, waiting for a file to be read. During that wait, the CPU is doing nothing for that request.

Picture one cook in a kitchen with ten orders. A bad cook does order one start to finish — including standing and watching the oven for twenty minutes — before touching order two. A good cook puts order one's dish in the oven, and while it bakes starts chopping for order two, then stirs order three. One cook, ten orders in flight, because the cook works during other orders' waiting time.

That is concurrency: making progress on many tasks by overlapping their idle periods. It is not the same as parallelism, and the next section makes the distinction precise — because confusing the two is the root of most misunderstanding here.


Concurrency is not parallelism

These two words get used interchangeably and they are not the same thing.

Concurrency is dealing with many things at once — a structure where multiple tasks are in progress, interleaved. The cook is concurrent: ten orders underway, but at any single instant the cook's hands are on exactly one.

Parallelism is doing many things at once — literally, simultaneously, which requires multiple workers. Ten cooks each on their own order is parallel.

Text
   CONCURRENCY (1 worker, interleaved)
   worker: [A][B][A][C][B][A][C]   ← switches between tasks
                                     during their wait time

   PARALLELISM (3 workers, simultaneous)
   worker1: [A][A][A]
   worker2: [B][B][B]              ← genuinely at the same time
   worker3: [C][C][C]

The reason this distinction matters for backend work: most backend requests are I/O-bound — they spend their time waiting for something external, not computing. For I/O-bound work, concurrency alone is enough, because the trick is just to use one task's waiting time to advance another. You do not need ten cooks; you need one cook who does not stand idle.

CPU-bound work — resizing an image, hashing a password, crunching a report — is the opposite. It needs the CPU the whole time; there is no waiting period to borrow. The only way to speed up CPU-bound work is genuine parallelism: more cores actually running. This split — I/O-bound wants concurrency, CPU-bound wants parallelism — drives every model in the rest of this module.
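A rough way to see the split on a single thread is to time three overlapping waits against three overlapping computations. The sketch below assumes Node.js; `ioTask` and `cpuTask` are hypothetical stand-ins (a timer for real I/O, a spin loop for real computation), so the numbers are only indicative.

```javascript
// Hypothetical I/O-bound task: waits `ms` milliseconds without using the CPU.
function ioTask(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical CPU-bound task: keeps the single thread busy for `ms` milliseconds.
async function cpuTask(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin: no waiting period to hand to anyone else */ }
}

// Start three copies of `task` together and time how long all three take.
async function timeThree(task, ms) {
  const start = Date.now();
  await Promise.all([task(ms), task(ms), task(ms)]);
  return Date.now() - start;
}

async function main() {
  const io = await timeThree(ioTask, 100);    // the waits overlap: roughly 100 ms
  const cpu = await timeThree(cpuTask, 100);  // the spins cannot overlap: roughly 300 ms
  console.log({ io, cpu });
  return { io, cpu };
}

const demo = main();
```

On one thread the three waits cost about as much as one, while the three spins cost the sum of all three. Only additional cores would shorten the second number.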


The event loop model

Node.js, and Python's asyncio, handle concurrency with a single thread and an event loop. This is the one-cook kitchen made real.

There is one thread running your code. When it hits an operation that has to wait — a database query, an HTTP call, a file read — it does not block and stand there. It hands the waiting off to the operating system, registers a callback, and immediately moves on to whatever else is ready to run. When the OS later signals "that query is done," the loop picks the result up and runs the callback.

Text
   ┌─────────────────────────────────────────┐
   │            EVENT LOOP (1 thread)         │
   │                                          │
   │   ready queue: [reqA-cb][reqC-cb]        │
   │        │                                 │
   │        ▼  run next ready callback         │
   │   hits an await (DB query)               │
   │        │                                 │
   │        └─► hand wait to OS, register cb,  │
   │            move on — DON'T block          │
   │                                          │
   │   OS: "query done" ─► callback re-queued  │
   └─────────────────────────────────────────┘
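The hand-off in the diagram can be imitated with raw callbacks. This is a minimal sketch, assuming Node.js: `fakeQuery` is a hypothetical helper, and a timer stands in for a query that the operating system completes later.

```javascript
// Records the order in which things actually run.
const order = [];

// Hypothetical query: register a callback and return immediately.
// The timer stands in for I/O that completes some time later.
function fakeQuery(label, ms, callback) {
  order.push(`start ${label}`);
  setTimeout(() => callback(`${label} done`), ms);
}

const finished = new Promise((resolve) => {
  let pending = 2;
  const onDone = (message) => {
    order.push(message);
    pending -= 1;
    if (pending === 0) resolve(order);
  };

  fakeQuery('A', 50, onDone);   // slower query
  fakeQuery('B', 10, onDone);   // faster query
  order.push('loop is free');   // runs before either result arrives
});

finished.then((o) => console.log(o.join(' -> ')));
// start A -> start B -> loop is free -> B done -> A done
```

Neither call blocks: both queries are started, the thread carries on, and the results come back in completion order rather than start order.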

In modern code you rarely write raw callbacks — async/await is the readable surface over the same machinery:

JavaScript
async function getOrder(req, res) {
  const order = await db.query('SELECT ...');   // loop runs OTHER work here
  const user  = await api.fetchUser(order.userId);  // and here
  res.json({ order, user });
}

Every await is a point where this request says "I am waiting — event loop, go do something useful." That is how one thread serves thousands of connections.

The model has one sharp edge, and it follows directly from "one thread." If a piece of your code does not wait — a long synchronous loop, a synchronous file read, a heavy computation — it holds the single thread the entire time, and every other request is frozen until it finishes. There is no other thread to pick up the slack. This is "blocking the event loop," and it is the cardinal sin of event-loop runtimes. CPU-bound work must be moved off the loop — to a worker thread, a separate process, or a job queue (Module 16) — or it stalls the whole server.
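The effect is easy to measure. The following sketch, assuming Node.js, schedules a 10 ms timer and then optionally hogs the thread with a synchronous loop; `measureTimerDelay` is a hypothetical helper written for this illustration.

```javascript
// Returns how many milliseconds a 10 ms timer actually took to fire
// when the thread is kept busy for `blockMs` right after scheduling it.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 10);

    // Synchronous busy loop: no callback can run until it ends.
    const end = Date.now() + blockMs;
    while (Date.now() < end) { /* spin */ }
  });
}

async function main() {
  const idle = await measureTimerDelay(0);       // loop free: fires near 10 ms
  const blocked = await measureTimerDelay(200);  // loop blocked: fires after 200 ms or more
  console.log({ idle, blocked });
  return { idle, blocked };
}

const demo = main();
```

The timer stands in for every other request on the server: while the loop spins, none of them can make progress.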


The thread-pool model

The other major approach, used by Java (pre-virtual-threads), Ruby, and traditional Python web servers, is to give each request its own thread.

A thread is an independent line of execution the operating system schedules. In this model, when a request's thread hits a database query, that thread blocks — it genuinely stops and waits. That is fine, because the other requests are on other threads, and the OS scheduler simply runs those instead. Blocking is allowed here; it only parks one thread, not the server.

Text
   THREAD-POOL MODEL
   request A ─► thread 1 ─► [blocks on DB]  (only thread 1 parked)
   request B ─► thread 2 ─► [running]
   request C ─► thread 3 ─► [running]
                  OS scheduler rotates the cores across threads

You do not spawn one thread per request without limit — threads cost memory (each needs its own stack, on the order of a megabyte) and switching between thousands of them has overhead. Instead you keep a fixed-size thread pool: a request borrows a thread, uses it, returns it. If all threads are busy, new requests queue for the next free one.
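The borrow, use, return discipline can be sketched even in JavaScript, with the caveat that this models only the queueing behaviour: the "threads" here are just slots, and `SlotPool` and `handle` are hypothetical names, not a real thread-pool API.

```javascript
// A fixed number of slots; callers queue in FIFO order when all are taken.
class SlotPool {
  constructor(size) {
    this.free = size;
    this.waiting = [];   // resolvers for callers waiting for a slot
  }

  acquire() {
    if (this.free > 0) {
      this.free -= 1;
      return Promise.resolve();
    }
    return new Promise((resolve) => this.waiting.push(resolve));
  }

  release() {
    const next = this.waiting.shift();
    if (next) next();        // hand the slot straight to the next waiter
    else this.free += 1;
  }

  // Borrow a slot, run the task, and always return the slot afterwards.
  async run(task) {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function main() {
  const pool = new SlotPool(2);
  let active = 0;
  let peak = 0;

  const handle = async (id) => {
    active += 1;
    peak = Math.max(peak, active);
    await sleep(20);         // stands in for a blocking database call
    active -= 1;
    return id;
  };

  const results = await Promise.all(
    [1, 2, 3, 4, 5].map((id) => pool.run(() => handle(id)))
  );
  console.log({ results, peak });
  return { results, peak };
}

const demo = main();
```

With two slots and five requests, at most two are ever in progress; the other three wait their turn, which is the behaviour a fixed-size pool gives a server under load.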

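The trade against the event loop: here blocking code is safe, because a blocked thread parks only itself, but every thread costs memory and scheduling overhead, so the pool size caps how many requests can be in progress at once. The event loop makes the opposite trade: a waiting request costs almost nothing, but one piece of code that does not yield stalls every request.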

Modern runtimes are blurring this line. Go's goroutines and Java's virtual threads give you code that reads like simple blocking thread-per-request, while the runtime underneath multiplexes thousands of them onto a few real OS threads — concurrency that is cheap to write and cheap to scale. That convergence is genuinely the direction the industry is heading.


Race conditions

The moment more than one task can touch the same data, a new category of bug appears — one that does not exist in single-task code. A race condition is a bug whose occurrence depends on timing: on the order in which interleaved tasks happen to run.

The textbook example. Two requests both increment a counter that starts at 100:

Text
   read-modify-write is THREE steps, and they can interleave:

   request A          request B          counter
   read   (100)                          100
                      read   (100)        100
   add 1 → 101                            100
                      add 1 → 101         100
   write  101                            101
                      write  101          101   ◄── should be 102!

Two increments happened; the counter went up by one. Both requests read 100 before either wrote 101, so one increment was silently lost. The window where this can happen is tiny — which is exactly what makes race conditions vicious. The code passes every test, works on your laptop, works in staging, and then corrupts data in production under load, intermittently, in a way you cannot reproduce on demand.

A subtle point: pure event-loop code is not immune. Node runs your code on a single thread, but a race can still happen across await points. If request A reads a value, awaits something, and request B runs during that await and changes the value, A resumes with a stale read. The interleaving unit is the await, not the CPU instruction, but the bug is the same.
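The lost update from the table can be reproduced on a single thread. In this sketch, assuming Node.js, `store` is a hypothetical in-memory stand-in for a database row, and the short sleeps stand in for network round trips.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical shared store; each operation takes one simulated round trip.
const store = {
  value: 100,
  async read() { await sleep(5); return this.value; },
  async write(v) { await sleep(5); this.value = v; },
};

// Read-modify-write with an await between the steps.
async function increment() {
  const current = await store.read();   // other tasks can run during this await
  await store.write(current + 1);       // and during this one
}

async function main() {
  await Promise.all([increment(), increment()]);   // two "requests" at once
  console.log(store.value);   // 101: one increment was lost
  return store.value;
}

const demo = main();
```

Both calls read 100 before either writes, exactly as in the table, even though no two lines of JavaScript ever ran at the same instant.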

The defence to build first is a design instinct: have tasks share as little mutable state as possible. The race you cannot have is the one with nothing to contend over.
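Where some shared state is unavoidable, one common defence is to serialise the read-modify-write so that no two tasks are inside it at once. The sketch below is one possible way to do that within a single Node.js process; `withLock` and `store` are hypothetical names. It would not help across several server processes, where the serialisation has to happen in the shared store itself.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical shared store; each operation takes one simulated round trip.
const store = {
  value: 100,
  async read() { await sleep(5); return this.value; },
  async write(v) { await sleep(5); this.value = v; },
};

// Minimal promise-chain lock: each task starts only after the previous one settles.
let tail = Promise.resolve();
function withLock(task) {
  const result = tail.then(task);
  tail = result.catch(() => {});   // a failed task must not jam the queue
  return result;
}

// The whole read-modify-write now runs as one uninterrupted unit.
function increment() {
  return withLock(async () => {
    const current = await store.read();
    await store.write(current + 1);
  });
}

async function main() {
  await Promise.all([increment(), increment()]);
  console.log(store.value);   // 102: both increments applied
  return store.value;
}

const demo = main();
```

The cost is that the locked section no longer overlaps its waits with other callers of the same lock, so it pays to keep that section as small as possible.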


Common mistakes — and when synchronous is fine

Blocking the event loop. In Node or asyncio, a synchronous heavy loop or a sync file read freezes every concurrent request. If work is CPU-bound, get it off the loop — worker thread, separate process, job queue.
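As one possible shape for "get it off the loop", the sketch below uses Node's built-in `worker_threads` module. The summing loop is a hypothetical stand-in for real CPU-bound work, and the worker source is passed as a string only so the example fits in one file.

```javascript
const { Worker } = require('node:worker_threads');

// Worker source as a string (eval mode) to keep the sketch in a single file.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let sum = 0;
  for (let i = 1; i <= workerData.n; i++) sum += i;   // stand-in for heavy CPU work
  parentPort.postMessage(sum);
`;

// Run the heavy loop on another thread and resolve with its result.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

async function main() {
  let ticks = 0;
  const timer = setInterval(() => { ticks += 1; }, 5);   // keeps firing if the loop is free
  const sum = await sumInWorker(50_000_000);
  clearInterval(timer);
  console.log({ sum, ticks });
  return { sum, ticks };
}

const demo = main();
```

The interval keeps ticking while the worker computes, which is the point: the main thread stays free to serve other requests.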

Awaiting in a loop when the calls are independent. This runs ten independent calls strictly one after another:

JavaScript
const results = [];
for (const id of ids) {
  results.push(await fetchUser(id));   // each call starts only after the previous one finishes: slow
}

If the calls do not depend on each other, fire them together and await once:

JavaScript
const results = await Promise.all(ids.map((id) => fetchUser(id)));   // overlap all the waits

That is the entire point of concurrency — overlapping idle time — being thrown away by a loop that serialises it. (Do cap the fan-out; ten thousand at once is its own problem.)
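Capping the fan-out can be sketched with a small helper that keeps at most a fixed number of calls in flight. `mapWithLimit` is a hypothetical name rather than a built-in, and `fetchUser` here is a simulated stand-in that also records how many calls overlap.

```javascript
// Map `items` through async `fn` with at most `limit` calls in flight.
async function mapWithLimit(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;   // index of the next unclaimed item

  async function runner() {
    while (next < items.length) {
      const i = next++;                  // claim an index; no await in between, so no race
      results[i] = await fn(items[i]);
    }
  }

  // Start `limit` runners (fewer if there are fewer items) and wait for all of them.
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

let inFlight = 0;
let peak = 0;

// Hypothetical stand-in for a real lookup; tracks how many calls overlap.
async function fetchUser(id) {
  inFlight += 1;
  peak = Math.max(peak, inFlight);
  await sleep(10);
  inFlight -= 1;
  return { id };
}

async function main() {
  const ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
  const users = await mapWithLimit(ids, 3, fetchUser);
  console.log({ count: users.length, peak });
  return { users, peak };
}

const demo = main();
```

Results keep the input order, and the waits still overlap, just never more than three at a time.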

Assuming single-threaded means race-free. As shown above, event-loop code still races across await points. "It is single-threaded" is not a correctness argument.

Unhandled promise rejections / swallowed async errors. An error in async code that nobody awaits or .catch()es can vanish silently or crash the process. Every async path needs an error path.
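Two common ways to give an async path an error path are sketched below: `try`/`catch` around an `await`, and `Promise.allSettled` for a batch. `fetchUser` is a hypothetical call that fails for one input, and returning `null` on failure is an illustrative choice, not a recommendation for every case.

```javascript
// Hypothetical lookup that fails for one particular id.
async function fetchUser(id) {
  if (id === 2) throw new Error(`user ${id} not found`);
  return { id };
}

// try/catch around await turns a rejection into a handled error.
async function getUserOrNull(id) {
  try {
    return await fetchUser(id);
  } catch (err) {
    console.error('lookup failed:', err.message);
    return null;
  }
}

// allSettled reports every outcome instead of rejecting on the first failure.
async function fetchMany(ids) {
  const settled = await Promise.allSettled(ids.map((id) => fetchUser(id)));
  return {
    ok: settled.filter((r) => r.status === 'fulfilled').map((r) => r.value),
    failed: settled.filter((r) => r.status === 'rejected').map((r) => r.reason.message),
  };
}

async function main() {
  const single = await getUserOrNull(2);
  const many = await fetchMany([1, 2, 3]);
  console.log({ single, many });
  return { single, many };
}

const demo = main();
```

Either way, the failure ends up somewhere a caller can see it instead of disappearing into an unobserved promise.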

Fire-and-forget work. Calling an async function without await and moving on means you never learn if it failed, and it may still be running when the request "finishes." If it matters, await it; if it is truly background, it belongs in a job queue, not dangling off a request.

When plain synchronous code is the right answer. Async is a tool for overlapping waiting. Where there is no waiting to overlap, async adds ceremony and buys nothing.

The skill is not "make everything async." It is recognising that I/O-bound work wants concurrency, CPU-bound work wants parallelism or an off-loop home, and computation that never waits wants neither — just plain, straight, synchronous code. The next module looks at real-time systems, where the concurrency model has to hold a connection open not for milliseconds but for hours.


Source & Credit

The Backend from First Principles series is based on what I learnt from Sriniously's YouTube playlist — a thoughtful, framework-agnostic walk through backend engineering. If this material helped you, please go check the original out: youtube.com/@Sriniously. The notes here are my own restatement for revisiting later.
