Concurrency & Async
Event loop, goroutines, race conditions. Why one thread can serve 10k requests.
One cook, many dishes
A backend server has one job that sounds impossible at first: handle thousands of requests at the same time, on hardware with maybe eight CPU cores. Eight cores can run eight things truly simultaneously — so how does one process serve a thousand concurrent users?
The answer is that almost none of those thousand requests need the CPU at any given instant. A request spends the overwhelming majority of its life waiting — waiting for a database query to return, waiting for an external API, waiting for a file to be read. During that wait, the CPU is doing nothing for that request.
Picture one cook in a kitchen with ten orders. A bad cook does order one start to finish — including standing and watching the oven for twenty minutes — before touching order two. A good cook puts order one's dish in the oven, and while it bakes starts chopping for order two, then stirs order three. One cook, ten orders in flight, because the cook works during other orders' waiting time.
That is concurrency: making progress on many tasks by overlapping their idle periods. It is not the same as parallelism, and the next section makes the distinction precise — because confusing the two is the root of most misunderstanding here.
Concurrency is not parallelism
These two words get used interchangeably and they are not the same thing.
Concurrency is dealing with many things at once — a structure where multiple tasks are in progress, interleaved. The cook is concurrent: ten orders underway, but at any single instant the cook's hands are on exactly one.
Parallelism is doing many things at once — literally, simultaneously, which requires multiple workers. Ten cooks each on their own order is parallel.
CONCURRENCY (1 worker, interleaved)
worker: [A][B][A][C][B][A][C]   ← switches between tasks during their wait time
PARALLELISM (3 workers, simultaneous)
worker1: [A][A][A]
worker2: [B][B][B] ← genuinely at the same time
worker3: [C][C][C]
The reason this distinction matters for backend work: most backend requests are I/O-bound — they spend their time waiting for something external, not computing. For I/O-bound work, concurrency alone is enough, because the trick is just to use one task's waiting time to advance another. You do not need ten cooks; you need one cook who does not stand idle.
CPU-bound work — resizing an image, hashing a password, crunching a report — is the opposite. It needs the CPU the whole time; there is no waiting period to borrow. The only way to speed up CPU-bound work is genuine parallelism: more cores actually running. This split — I/O-bound wants concurrency, CPU-bound wants parallelism — drives every model in the rest of this module.
The event loop model
Node.js, and Python's asyncio, handle concurrency with a single thread and an event loop. This is the one-cook kitchen made real.
There is one thread running your code. When it hits an operation that has to wait — a database query, an HTTP call, a file read — it does not block and stand there. It hands the waiting off to the operating system, registers a callback, and immediately moves on to whatever else is ready to run. When the OS later signals "that query is done," the loop picks the result up and runs the callback.
┌─────────────────────────────────────────┐
│ EVENT LOOP (1 thread) │
│ │
│ ready queue: [reqA-cb][reqC-cb] │
│ │ │
│ ▼ run next ready callback │
│ hits an await (DB query) │
│ │ │
│ └─► hand wait to OS, register cb, │
│ move on — DON'T block │
│ │
│ OS: "query done" ─► callback re-queued │
└─────────────────────────────────────────┘
In modern code you rarely write raw callbacks — async/await is the readable surface over the same machinery:
async function getOrder(req, res) {
  const order = await db.query('SELECT ...');      // loop runs OTHER work here
  const user = await api.fetchUser(order.userId);  // and here
  res.json({ order, user });
}
Every await is a point where this request says "I am waiting — event loop, go do something useful." That is how one thread serves thousands of connections.
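A minimal sketch of that overlap, under stated assumptions: fakeIo is a hypothetical stand-in for a database or API call (it only waits on a timer), and sequential and overlapped are illustrative names, not part of any library. It compares awaiting three waits one after another against letting the loop overlap them.

```javascript
// Hypothetical stand-in for an I/O call: resolves with `label` after `ms` milliseconds.
function fakeIo(label, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(label), ms));
}

// One after another: total time is roughly the SUM of the waits.
async function sequential(waits) {
  const start = Date.now();
  for (const ms of waits) {
    await fakeIo(`task-${ms}`, ms);
  }
  return Date.now() - start;
}

// Overlapped: all timers run at once, so total time is roughly the LONGEST wait.
async function overlapped(waits) {
  const start = Date.now();
  await Promise.all(waits.map((ms) => fakeIo(`task-${ms}`, ms)));
  return Date.now() - start;
}

async function demo() {
  const waits = [100, 100, 100];
  console.log('sequential ms:', await sequential(waits));
  console.log('overlapped ms:', await overlapped(waits));
}

demo();
```

On an otherwise idle machine the first figure should land near the sum of the waits and the second near a single wait. The thread did no extra work in either case; only the waiting was arranged differently.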
The model has one sharp edge, and it follows directly from "one thread." If a piece of your code does not wait — a long synchronous loop, a synchronous file read, a heavy computation — it holds the single thread the entire time, and every other request is frozen until it finishes. There is no other thread to pick up the slack. This is "blocking the event loop," and it is the cardinal sin of event-loop runtimes. CPU-bound work must be moved off the loop — to a worker thread, a separate process, or a job queue (Module 16) — or it stalls the whole server.
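The effect is easy to observe. The sketch below uses hypothetical helper names (blockFor, measureTimerDelay): it schedules a short timer, then hogs the thread with a synchronous busy loop and reports how late the timer actually fired.

```javascript
// Busy-wait for `ms` milliseconds: synchronous CPU work with no await in it.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else on this thread can run meanwhile
  }
}

// Schedule a timer for `timerMs`, then block the thread for `blockMs`.
// Resolves with the number of milliseconds the timer really took to fire.
function measureTimerDelay(timerMs, blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), timerMs);
    blockFor(blockMs); // the loop cannot run the timer callback until this returns
  });
}

measureTimerDelay(10, 200).then((elapsed) => {
  console.log(`timer asked for 10 ms, fired after ${elapsed} ms`);
});
```

The timer was due after 10 ms but cannot fire before the busy loop returns, so the reported delay is at least the blocking time. In a real server, every pending request's callback is stuck in the same queue behind that loop.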
The thread-pool model
The other major approach, used by Java (pre-virtual-threads), Ruby, and traditional Python web servers, is to give each request its own thread.
A thread is an independent line of execution the operating system schedules. In this model, when a request's thread hits a database query, that thread blocks — it genuinely stops and waits. That is fine, because the other requests are on other threads, and the OS scheduler simply runs those instead. Blocking is allowed here; it only parks one thread, not the server.
THREAD-POOL MODEL
request A ─► thread 1 ─► [blocks on DB] (only thread 1 parked)
request B ─► thread 2 ─► [running]
request C ─► thread 3 ─► [running]
OS scheduler rotates the cores across threads
You do not spawn one thread per request without limit — threads cost memory (each needs its own stack, on the order of a megabyte) and switching between thousands of them has overhead. Instead you keep a fixed-size thread pool: a request borrows a thread, uses it, returns it. If all threads are busy, new requests queue for the next free one.
The trade against the event loop:
- Thread-pool code is simpler to write. It is just straight-line synchronous code: no async, no await, no event-loop discipline. You can block; the model tolerates it.
- The event loop scales to more idle connections more cheaply. Ten thousand mostly-waiting connections is ten thousand cheap callbacks for an event loop, versus ten thousand expensive threads for a pure thread-per-request design.
Modern runtimes are blurring this line. Go's goroutines and Java's virtual threads give you code that reads like simple blocking thread-per-request, while the runtime underneath multiplexes thousands of them onto a few real OS threads — concurrency that is cheap to write and cheap to scale. That convergence is genuinely the direction the industry is heading.
Race conditions
The moment more than one task can touch the same data, a new category of bug appears — one that does not exist in single-task code. A race condition is a bug whose occurrence depends on timing: on the order in which interleaved tasks happen to run.
The textbook example. Two requests both increment a counter that starts at 100:
read-modify-write is THREE steps, and they can interleave:
request A        request B        counter
read (100)                        100
                 read (100)       100
add 1 → 101                       100
                 add 1 → 101      100
write 101                         101
                 write 101        101  ◄── should be 102!
Two increments happened; the counter went up by one. Both requests read 100 before either wrote 101, so one increment was silently lost. The window where this can happen is tiny — which is exactly what makes race conditions vicious. The code passes every test, works on your laptop, works in staging, and then corrupts data in production under load, intermittently, in a way you cannot reproduce on demand.
A subtle point: pure event-loop code is not immune. Node is single-threaded, but a race still happens across await points. If request A reads a value, awaits something, and request B runs during that await and changes the value, A resumes with a stale read. The interleaving unit is the await, not the CPU instruction — but the bug is the same.
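The lost update can be reproduced on a single thread. This is a sketch under stated assumptions: makeStore is a hypothetical in-memory store whose read and write await a short timer to mimic database round trips, and unsafeIncrement and runRace are illustrative names.

```javascript
// Hypothetical in-memory store; read/write are async to mimic database calls.
function makeStore(initial) {
  let counter = initial;
  const tick = () => new Promise((resolve) => setTimeout(resolve, 5));
  return {
    async read() { await tick(); return counter; },
    async write(value) { await tick(); counter = value; },
  };
}

// Read-modify-write with await points between the read and the write.
async function unsafeIncrement(store) {
  const current = await store.read();  // both requests read 100 here
  await store.write(current + 1);      // both then write 101
}

async function runRace() {
  const store = makeStore(100);
  await Promise.all([unsafeIncrement(store), unsafeIncrement(store)]);
  return store.read();
}

runRace().then((value) => console.log('counter after two increments:', value)); // 101
```

No threads are involved. Both calls finish their read before either finishes its write, which is the same interleaving as the table above.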
The defences:
- Do not share mutable state when you can avoid it. A request that only touches its own local variables and its own request context (Module 8) cannot race. The cheapest fix is having nothing to fight over.
- Make the operation atomic: indivisible, so it cannot be interrupted mid-way. The counter bug vanishes if the increment is one atomic database statement (UPDATE ... SET count = count + 1) instead of read-then-write in application code. Let the database, which is built for exactly this, do the increment.
- Use a lock when an operation genuinely must span multiple steps. A lock forces one task to finish the whole sequence before another may start it. Locks are correct but they have a cost — held too long or too broadly, they serialise your concurrency away, and badly-ordered locks deadlock.
The instinct to build first: design so tasks share as little mutable state as possible. The race you cannot have is the one with nothing to contend over.
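As an illustration of the lock defence, here is a minimal sketch of an in-process async lock. Mutex, makeStore, safeIncrement and runLocked are hypothetical names written for this example, not a library API. The lock only serialises tasks inside one Node process, so with several server instances the atomic database statement remains the safer fix.

```javascript
// Hypothetical in-memory store; read/write are async to mimic database calls.
function makeStore(initial) {
  let counter = initial;
  const tick = () => new Promise((resolve) => setTimeout(resolve, 5));
  return {
    async read() { await tick(); return counter; },
    async write(value) { await tick(); counter = value; },
  };
}

// Minimal async lock: each task starts only after the previous one settles.
class Mutex {
  constructor() {
    this.tail = Promise.resolve();
  }

  runExclusive(task) {
    const result = this.tail.then(() => task());
    // Swallow the error on the internal chain only, so one failed task
    // does not block later ones; the caller still sees it via `result`.
    this.tail = result.catch(() => {});
    return result;
  }
}

// The whole read-modify-write now runs as one uninterrupted unit.
function safeIncrement(store, mutex) {
  return mutex.runExclusive(async () => {
    const current = await store.read();
    await store.write(current + 1);
  });
}

async function runLocked() {
  const store = makeStore(100);
  const mutex = new Mutex();
  await Promise.all([safeIncrement(store, mutex), safeIncrement(store, mutex)]);
  return store.read();
}

runLocked().then((value) => console.log('counter after two increments:', value)); // 102
```

The cost is visible in the structure: the two increments no longer overlap at all. That is the serialisation trade-off described in the lock bullet above.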
Common mistakes — and when synchronous is fine
Blocking the event loop. In Node or asyncio, a synchronous heavy loop or a sync file read freezes every concurrent request. If work is CPU-bound, get it off the loop — worker thread, separate process, job queue.
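One way to get CPU-bound work off the loop in Node is the built-in worker_threads module. The sketch below is illustrative only: sumInWorker and the inlined workerSource are hypothetical, and the summing loop stands in for real heavy work such as image resizing. The worker source is passed as a string with eval: true purely to keep the example in one file; a real project would more likely point the Worker at a separate file.

```javascript
const { Worker } = require('node:worker_threads');

// Source for the worker, inlined so the sketch fits in one file.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let sum = 0;
  for (let i = 1; i <= workerData.n; i++) sum += i; // CPU-bound loop
  parentPort.postMessage(sum);
`;

// Run the loop on a separate thread; the main event loop stays free meanwhile.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

sumInWorker(1000000).then((sum) => console.log('sum computed off the loop:', sum));
```

From the caller's side the heavy computation now looks like any other awaitable I/O: the request awaits the result while the loop keeps serving everyone else.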
Awaiting in a loop when the calls are independent. This runs ten independent calls strictly one after another:
const results = [];
for (const id of ids) {
  results.push(await fetchUser(id)); // each call waits for the previous one: slow
}
If the calls do not depend on each other, fire them together and await once:
const results = await Promise.all(ids.map((id) => fetchUser(id))); // overlap all the waits
That is the entire point of concurrency — overlapping idle time — being thrown away by a loop that serialises it. (Do cap the fan-out; ten thousand at once is its own problem.)
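Capping the fan-out can be sketched with a small helper. mapWithLimit is a hypothetical name written for this example, not a built-in, and fakeFetchUser is a stub standing in for a real lookup. The helper starts at most limit runners, and each runner pulls the next unclaimed item when its current call finishes.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results come back in input order, like Promise.all.
async function mapWithLimit(items, limit, worker) {
  if (limit < 1) throw new RangeError('limit must be at least 1');
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims an index, awaits that call, then claims another.
  // Claiming is safe: there is no await between the check and `next++`.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Hypothetical stub standing in for a real user lookup.
const fakeFetchUser = (id) =>
  new Promise((resolve) => setTimeout(() => resolve({ id }), 20));

mapWithLimit([1, 2, 3, 4, 5], 2, fakeFetchUser).then((users) => {
  console.log(users.map((u) => u.id)); // [ 1, 2, 3, 4, 5 ]
});
```

As with Promise.all, the first rejection rejects the whole call. Runners that are already in flight still finish their current item.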
Assuming single-threaded means race-free. As shown above, event-loop code still races across await points. "It is single-threaded" is not a correctness argument.
Unhandled promise rejections / swallowed async errors. An error in async code that nobody awaits or .catch()es can vanish silently or crash the process. Every async path needs an error path.
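One possible shape for such an error path, when several independent calls run together, uses the standard Promise.allSettled so that a single failure is recorded rather than lost. In this sketch fetchUser is a hypothetical stub that fails for negative ids, and fetchAll is an illustrative name.

```javascript
// Hypothetical lookup that fails for negative ids.
async function fetchUser(id) {
  if (id < 0) throw new Error(`no user ${id}`);
  return { id };
}

// Wait for every call to settle, then split successes from failures.
async function fetchAll(ids) {
  const settled = await Promise.allSettled(ids.map((id) => fetchUser(id)));
  const users = [];
  const errors = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      users.push(outcome.value);
    } else {
      errors.push({ id: ids[i], message: outcome.reason.message });
    }
  });
  return { users, errors };
}

fetchAll([1, -2, 3]).then(({ users, errors }) => {
  console.log('users:', users);   // [ { id: 1 }, { id: 3 } ]
  console.log('errors:', errors); // [ { id: -2, message: 'no user -2' } ]
});
```

Whether a partial result is acceptable is a design decision for the caller. When any failure should abort the whole operation, Promise.all inside a try/catch is the simpler choice.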
Fire-and-forget work. Calling an async function without await and moving on means you never learn if it failed, and it may still be running when the request "finishes." If it matters, await it; if it is truly background, it belongs in a job queue, not dangling off a request.
When plain synchronous code is the right answer. Async is a tool for overlapping waiting. Where there is no waiting to overlap, async adds ceremony and buys nothing:
- A pure computation with no I/O: formatting a string, summing an array, validating a shape. There is nothing to await. async on a function that never waits just wraps the return value in a promise for no reason.
- A short script or a CLI that does one thing and exits. Sequential top-to-bottom code is clearer; nobody is waiting on a second request.
- Steps that genuinely depend on each other. If step two needs step one's result, they must run in order: await them in sequence. That is not a missed optimisation; it is the logic. Promise.all is for independent work only.
The skill is not "make everything async." It is recognising that I/O-bound work wants concurrency, CPU-bound work wants parallelism or an off-loop home, and computation that never waits wants neither — just plain, straight, synchronous code. The next module looks at real-time systems, where the concurrency model has to hold a connection open not for milliseconds but for hours.
The Backend from First Principles series is based on what I learnt from Sriniously's YouTube playlist — a thoughtful, framework-agnostic walk through backend engineering. If this material helped you, please go check the original out: youtube.com/@Sriniously. The notes here are my own restatement for revisiting later.