
Real-Time Systems

WebSockets vs SSE. Scaling stateful connections. When HTTP isn't enough.


The problem with request/response

Everything so far in this series assumes one shape of communication: the client asks, the server answers, the exchange ends. The client always speaks first. This is the request/response model, and for most of the web it is exactly right.

It has one structural limitation: the server cannot start a conversation. It can only ever reply to something the client said. So when the server is the one that knows something new — a chat message arrived, a stock price moved, another player took their turn, a background job finished — it has no way to tell the client. The client has to ask.

The crude workaround is polling: the client asks again and again, on a timer, just in case.

Text
   POLLING — client asks every 5 seconds, just in case
   client: "anything new?"  server: "no"
   client: "anything new?"  server: "no"
   client: "anything new?"  server: "no"
   client: "anything new?"  server: "YES — here it is"   ◄── up to 5s stale

Polling is wasteful in both directions and it is always wrong about timing. Poll every 5 seconds and you do 12 pointless round-trips a minute while idle, and news can still be up to 5 seconds stale. Poll faster to cut the staleness and you multiply the wasted requests. There is no good setting — it is wasteful and laggy at the same time.
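
The trade-off above is plain arithmetic, and a small sketch makes it concrete. The helper name `pollingCost` and the sample intervals are illustrative assumptions, not part of any library.

```javascript
// Hypothetical helper: what a fixed polling interval costs.
function pollingCost(intervalSeconds) {
  if (!(intervalSeconds > 0)) {
    throw new RangeError('intervalSeconds must be positive');
  }
  return {
    // Requests sent every minute, whether or not anything changed.
    requestsPerMinute: 60 / intervalSeconds,
    // Worst case: the event happens just after a poll returned "no".
    worstCaseStalenessSeconds: intervalSeconds,
  };
}

// Shrinking the interval trades staleness for request volume.
for (const interval of [5, 1, 0.25]) {
  console.log(interval, pollingCost(interval));
}
```

Every row improves one number only by making the other worse, which is the "no good setting" problem in miniature.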

Real-time systems exist to escape this. The rest of the module is the three ways to do it, what each costs, and — importantly — when polling was actually fine and you should not have built any of this.


Long-polling — the transitional trick

The first improvement keeps the familiar request/response machinery and bends one rule. With long-polling, the client makes a request — but instead of answering "nothing new" immediately, the server holds the request open and does not answer until it actually has something to say (or a timeout is reached).

Text
   LONG-POLLING — server holds the request until there's news
   client: "anything new?" ─► server holds... holds... holds...
                                 (event happens)
                              ◄─ "YES — here it is"
   client immediately re-asks ─► server holds again...

When the response finally arrives, the client instantly sends a new request, and the cycle repeats. The effect is near-instant delivery — news reaches the client the moment it exists, not on the next poll tick — using nothing but ordinary HTTP requests.

The cost is that it is still fundamentally a sequence of separate requests. Each delivered message is a full request/response round-trip with all its overhead, and there is a small gap between one response landing and the next request arriving, during which the server has to buffer any event that occurs or the client will miss it. It also ties up a server connection slot for the whole hold.
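
The hold-then-answer behaviour, and the buffering needed during the gap between requests, can be sketched without any HTTP framework. This is a minimal model under stated assumptions: `LongPollMailbox` is a hypothetical per-client class, `respond` stands in for whatever completes the held HTTP response in your server, and hold timeouts are left out.

```javascript
// Hypothetical per-client mailbox for long-polling (timeouts omitted).
class LongPollMailbox {
  constructor() {
    this.pendingRespond = null; // the held request, if the client is waiting
    this.buffered = [];         // events that arrived while no request was held
  }

  // A long-poll request arrived from the client.
  poll(respond) {
    if (this.pendingRespond) {
      this.pendingRespond([]);  // release an older held request with no news
      this.pendingRespond = null;
    }
    if (this.buffered.length > 0) {
      respond(this.buffered.splice(0)); // answer at once with what piled up
      return;
    }
    this.pendingRespond = respond;      // nothing yet: hold the request open
  }

  // Something happened on the server that this client should hear about.
  publish(event) {
    if (this.pendingRespond) {
      const respond = this.pendingRespond;
      this.pendingRespond = null;       // one response per request
      respond([event]);
    } else {
      this.buffered.push(event);        // client is between requests: buffer
    }
  }
}

const mailbox = new LongPollMailbox();
const delivered = [];
mailbox.poll((events) => delivered.push(events)); // held: nothing to report yet
mailbox.publish('a');                             // answers the held request
mailbox.publish('b');                             // gap: no request is open
mailbox.publish('c');
mailbox.poll((events) => delivered.push(events)); // answered immediately
console.log(delivered); // [ [ 'a' ], [ 'b', 'c' ] ]
```

A real server would also answer a held request with an empty result after a timeout, so that intermediaries do not cut the connection first.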

Long-polling's honest role today is a fallback. It works through nearly every proxy, firewall, and ancient corporate network, because to all of them it just looks like slow HTTP. When a real-time library cannot establish a WebSocket, this is typically what it quietly drops back to. You rarely choose it as the primary design anymore, but understanding it explains what the better tools improved on.


Server-Sent Events — a one-way stream

Server-Sent Events (SSE) make the one improvement long-polling was groping toward: instead of the server answering once and the client re-asking, the server keeps a single response open forever and writes new messages into it as a stream.

The client opens one connection. The server never closes it. Whenever there is news, the server writes another chunk down the same pipe. One connection, many messages, no re-asking.

Text
   SSE — one connection, server streams messages down it
   client: opens EventSource ──► server keeps response open
                              ◄── message 1
                              ◄── message 2
                              ◄── message 3   (same connection, hours later)

It is built into browsers and the client side is genuinely small:

JavaScript
const stream = new EventSource('/api/notifications');
stream.onmessage = (event) => {
  console.log('server pushed:', event.data);
};
// the browser auto-reconnects if the connection drops

SSE's defining trait — and the key to knowing when to use it — is that it is one-directional: server to client only. The client cannot send data back up the SSE connection; for that it uses normal, separate HTTP requests.

That sounds like a limitation, and sometimes it is, but very often it is an exact fit. A notification feed, a live activity log, a status dashboard, a progress indicator, a price ticker, a sports score — in all of these the server is the only party with news. The client just watches. For that whole class of "server broadcasts, client observes" problems, SSE is simpler than the alternative: it runs over plain HTTP, it reconnects automatically, it needs no special protocol handling. Reaching for something heavier when SSE fits is a common over-engineering mistake.
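
On the server side, "plain HTTP" means a response with `Content-Type: text/event-stream` whose body is a series of small text frames. The sketch below formats one such frame; the function name `formatSseEvent` and its argument shape are assumptions for illustration, while the `id:`, `event:` and `data:` field names and the blank-line terminator come from the event-stream format itself.

```javascript
// Formats one message in the text/event-stream wire format.
// Hypothetical helper: the name and argument shape are illustrative.
function formatSseEvent({ id, event, data }) {
  const lines = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event !== undefined) lines.push(`event: ${event}`);
  // A payload containing newlines is sent as one "data:" line per line.
  for (const line of String(data).split('\n')) {
    lines.push(`data: ${line}`);
  }
  return lines.join('\n') + '\n\n'; // a blank line ends the message
}

console.log(JSON.stringify(formatSseEvent({ id: 7, data: 'hello' })));
// → "id: 7\ndata: hello\n\n"
```

A frame with only `data` arrives at the client's `onmessage` handler. A frame with an `event` name is delivered to listeners registered for that name with `addEventListener`, and the last `id` seen is sent back by the browser in a `Last-Event-ID` header when it reconnects.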


WebSockets — a two-way channel

When the client also needs to push — frequently, with low latency, in both directions — you want a WebSocket.

A WebSocket starts life as an ordinary HTTP request carrying a special Upgrade header. If the server agrees, that connection is upgraded: it stops being HTTP and becomes a persistent, bidirectional channel. After the upgrade, either side can send a message to the other at any time, with very little per-message overhead.

Text
   WEBSOCKET — full duplex, either side sends anytime
   client ──► server:  HTTP request with "Upgrade: websocket"
   client ◄── server:  "101 Switching Protocols"
   ══════════ connection upgraded ══════════
   client ⇄ server   (messages flow both ways, any time, low overhead)

JavaScript
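const socket = new WebSocket('wss://example.com/game');
socket.onmessage = (event) => render(event.data);   // server → client (render is your own UI function)
// send() throws if called before the connection is open, so wait for onopen
socket.onopen = () => {
  socket.send(JSON.stringify({ move: 'e4' }));       // client → server
};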
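
For the curious, the "server agrees" step of the upgrade is a small computation defined by RFC 6455: the client sends a random `Sec-WebSocket-Key` header, and the server proves it understood the request by returning a derived `Sec-WebSocket-Accept` value alongside the 101 status. In practice a WebSocket library does this for you; the sketch below only shows the derivation, using Node's built-in `crypto` module, with `computeAcceptValue` as a hypothetical function name.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derives the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key:
// base64( SHA-1( key + GUID ) ).
function computeAcceptValue(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

console.log(computeAcceptValue('dGhlIHNhbXBsZSBub25jZQ=='));
// → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=   (the sample key and result from RFC 6455)
```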

This is the model for genuinely interactive systems: multiplayer games, collaborative editors where every keystroke matters, trading interfaces, video-call signalling, live chat with typing indicators. Anywhere both parties are active participants and latency is felt.

WebSockets are the most capable of the three options. They are also the most expensive to operate, and the next section is about that cost — because "most capable" is not the same as "the right default," and treating WebSockets as the automatic choice for anything live is the single most common mistake in this area.


The real cost — holding connections open

Request/response has a quiet operational virtue that is easy to take for granted: connections are short. A request arrives, is served in milliseconds, and the connection is freed. The server holds very little at once, and scaling is straightforward — any instance can serve any request.

Every real-time technology breaks that virtue. The connection is now long — open for minutes, hours, the whole time a user has the tab open. That changes the engineering in ways that catch teams off guard.

Connections are now state, and state is a ceiling. Every open WebSocket or SSE stream consumes memory and a file descriptor for its entire lifetime. Ten thousand idle users doing nothing are still ten thousand live connections your server is paying to hold. Capacity is no longer "requests per second"; it is "concurrent connections," a fundamentally different limit, and one an idle but connected audience can reach without sending a single request.
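
A back-of-envelope estimate shows how that ceiling arises. Everything numeric here is an assumption chosen for illustration (the memory budget, the per-connection footprint, the descriptor limit), and `connectionCeiling` is a hypothetical helper; real figures have to be measured on your own stack.

```javascript
// Hypothetical estimate: whichever resource runs out first sets the ceiling.
function connectionCeiling({ memoryBudgetBytes, bytesPerConnection, fileDescriptorLimit }) {
  const byMemory = Math.floor(memoryBudgetBytes / bytesPerConnection);
  return Math.min(byMemory, fileDescriptorLimit);
}

const ceiling = connectionCeiling({
  memoryBudgetBytes: 2 * 1024 ** 3, // assumed: 2 GiB set aside for connections
  bytesPerConnection: 50 * 1024,    // assumed: 50 KiB per open connection
  fileDescriptorLimit: 65536,       // assumed per-process descriptor limit
});
console.log(ceiling); // 41943, so memory is the binding limit in this example
```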

Horizontal scaling gets hard. With request/response, a load balancer sends each request to whatever instance is free. With persistent connections, user A is pinned to server 1 and user B to server 2 for the whole session. Now: a chat message from A to B has to get from server 1 to server 2. The instances must talk to each other — typically through a shared message backbone like Redis pub/sub or a dedicated broker — so a message can find whichever server holds its recipient.

Text
   user A ─ws─► server 1 ──┐
                           ├──► Redis pub/sub ──► routes between instances
   user B ─ws─► server 2 ──┘
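
The routing in the diagram can be sketched with an in-memory stand-in for the backbone. This is a toy model, not a Redis client: `InMemoryBus` and `ServerInstance` are hypothetical classes, and each connection is represented by a plain `send` function.

```javascript
// In-memory stand-in for a pub/sub backbone such as Redis pub/sub.
class InMemoryBus {
  constructor() { this.subscribers = []; }
  subscribe(handler) { this.subscribers.push(handler); }
  publish(message) { for (const handler of this.subscribers) handler(message); }
}

// One server instance, holding only the connections pinned to it.
class ServerInstance {
  constructor(name, bus) {
    this.name = name;
    this.bus = bus;
    this.localConnections = new Map(); // userId -> send function
    bus.subscribe((message) => this.deliverIfLocal(message));
  }
  connect(userId, send) { this.localConnections.set(userId, send); }
  // A locally connected user wants to reach someone who may be elsewhere.
  route(toUserId, payload) { this.bus.publish({ toUserId, payload }); }
  deliverIfLocal({ toUserId, payload }) {
    const send = this.localConnections.get(toUserId);
    if (send) send(payload); // instances that do not hold the user ignore it
  }
}

const bus = new InMemoryBus();
const server1 = new ServerInstance('server-1', bus);
const server2 = new ServerInstance('server-2', bus);
const inboxB = [];
server1.connect('A', () => {});
server2.connect('B', (payload) => inboxB.push(payload));
server1.route('B', 'hello from A'); // A is on server 1, B is on server 2
console.log(inboxB); // [ 'hello from A' ]
```

Neither instance needs to know where the recipient lives. Each one publishes to the backbone and delivers only what it can deliver locally.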

Reconnection is now your problem. Networks drop. Phones move between cell towers and wifi. Laptops sleep. A persistent connection will break, routinely, and the client must detect the drop, reconnect, and recover anything missed during the gap. SSE reconnects on its own; with WebSockets, you write the reconnection logic yourself.
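
One common shape for hand-written reconnection logic is exponential backoff with jitter, so that a thousand clients dropped by the same outage do not all reconnect in the same instant. That technique is not prescribed by the text above; the function name, the default delays and the injectable `random` parameter are all assumptions made for this sketch.

```javascript
// Hypothetical helper: how long to wait before reconnect attempt N (0-based).
function reconnectDelayMs(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  // Double the window on each failed attempt, up to a cap.
  const windowMs = Math.min(maxMs, baseMs * 2 ** attempt);
  // "Full jitter": pick a random delay inside the window.
  return Math.floor(random() * windowMs);
}

// With a fixed random source the growth is easy to see.
const fixed = { random: () => 0.5 };
console.log(reconnectDelayMs(0, fixed));  // 250
console.log(reconnectDelayMs(3, fixed));  // 2000
console.log(reconnectDelayMs(10, fixed)); // 15000 (window capped at 30000)
```

In a browser client this would typically be called from the socket's `onclose` handler to schedule the next connection attempt, with the attempt counter reset in `onopen`.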

Backpressure becomes real. If the server produces messages faster than a particular slow client can receive them, those messages queue up in memory for that connection. A handful of slow consumers can quietly grow your memory until the process falls over. A real-time server has to be able to slow down, drop, or disconnect a client that cannot keep up.
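
A minimal way to make "slow down, drop, or disconnect" concrete is a bounded per-client queue. The sketch below uses a drop-oldest policy, which is only one of several reasonable choices; `BoundedOutbox` and its limit are hypothetical.

```javascript
// Hypothetical per-connection outbox with a hard cap on queued messages.
class BoundedOutbox {
  constructor(limit) {
    this.limit = limit;
    this.queue = [];
    this.dropped = 0; // how many messages this client has lost so far
  }

  // Returns false when the client is so far behind that a message was dropped.
  enqueue(message) {
    this.queue.push(message);
    if (this.queue.length > this.limit) {
      this.queue.shift(); // drop-oldest policy: keep the freshest data
      this.dropped += 1;
      return false;
    }
    return true;
  }

  // Called when the socket can accept more data.
  drain(maxMessages) {
    return this.queue.splice(0, maxMessages);
  }
}

const outbox = new BoundedOutbox(3);
for (const tick of [1, 2, 3, 4, 5]) outbox.enqueue(tick);
console.log(outbox.queue, outbox.dropped); // [ 3, 4, 5 ] 2
```

Memory per client is now bounded by the limit, and the `dropped` counter gives the server a signal for disconnecting a client that never catches up.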

None of this is a reason to avoid real-time. It is a reason to be clear-eyed: you are trading the easy scaling of stateless request/response for a stateful system, and that trade has to be worth it.


Choosing — and when not to go real-time at all

A straight decision path, cheapest first:

Text
   not time-critical          → request/response
   occasional fresh data      → polling          (still stateless — fine)
   server pushes, one-way     → SSE
   both push, low-latency     → WebSockets        (accept the stateful cost)
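
The same path can be written as a function that checks the cheapest option first. The input flags are hypothetical names for the questions the table asks, and the thresholds behind each flag remain a product judgment.

```javascript
// Hypothetical encoding of the decision path above, cheapest option first.
function chooseTransport({ timeCritical, serverPushesContinuously, clientPushesFrequently }) {
  if (!timeCritical) return 'request/response';
  if (!serverPushesContinuously) return 'polling';
  if (!clientPushesFrequently) return 'SSE';
  return 'WebSockets';
}

console.log(chooseTransport({ timeCritical: true, serverPushesContinuously: true, clientPushesFrequently: false }));
// → SSE
```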

The honest summary: real-time is a genuine capability jump, and it is also a genuine architectural cost. You leave behind stateless scaling and take on connection limits, cross-instance routing, reconnection, and backpressure. That trade is clearly worth it for a chat app or a multiplayer game — the product is the real-time experience. It is clearly not worth it for a notification badge that could just update on the next page load.

Pick the lightest model the product actually requires, not the most impressive one. A team that ships SSE where SSE fits, and polling where polling fits, will operate a calmer system than one that puts WebSockets behind everything that moves. Next module: API design — including the design choices that decide whether you needed real-time in the first place.


Source & Credit

The Backend from First Principles series is based on what I learnt from Sriniously's YouTube playlist — a thoughtful, framework-agnostic walk through backend engineering. If this material helped you, please go check the original out: youtube.com/@Sriniously. The notes here are my own restatement for revisiting later.
