Graceful Shutdown
SIGTERM. Liveness vs readiness. Zero-downtime deploys in Kubernetes.
Every deploy kills your server
Starting a server is easy and you have done it a thousand times. Stopping one correctly is the part nobody teaches, and it is where a surprising share of deploy-time errors come from.
Here is the thing to internalise: your process gets killed constantly, and that is normal. Every deploy replaces old processes with new ones. Every autoscale-down event removes a process. Kubernetes reschedules pods. A node drains for maintenance. In a healthy system, processes are stopped and started all day long.
The question is not whether your process will be told to stop — it will, routinely — but what it does in the seconds between being told and actually exiting. Get that wrong and every single deploy sprays a handful of errors at whichever users were unlucky enough to have a request in flight. Get it right and deploys are invisible.
naive shutdown:         graceful shutdown:
┌──────────────────┐    ┌──────────────────┐
│ SIGTERM          │    │ SIGTERM          │
│ process exits    │    │ stop accepting   │
│ immediately      │    │ finish in-flight │
│                  │    │ close resources  │
│ in-flight reqs   │    │ THEN exit        │
│ → CONNECTION     │    │                  │
│   RESET          │    │ users see        │
│ jobs → abandoned │    │ nothing          │
└──────────────────┘    └──────────────────┘
That gap is what this module is about.
How a process is asked to stop
Processes are stopped with signals — small messages the operating system delivers to your process. Two matter here.
SIGTERM — "please stop." This is the polite request. It is what kill sends by default, what Docker sends on docker stop, what Kubernetes sends to a pod it is terminating, and what most process managers (systemd, for example) send on a restart. Crucially, SIGTERM is catchable: your process can install a handler, run cleanup code, and decide when to actually exit.
SIGKILL — "stop now." This cannot be caught, handled, or delayed. The kernel destroys the process immediately: in-flight requests are severed mid-byte, data still sitting in unflushed buffers is lost, and none of your cleanup code runs.
The shutdown sequence in any modern orchestrator is the same two-step:
SIGTERM sent ─► process has a grace period to clean up
                (grace period: typically 30s in Kubernetes)
still alive? ─► SIGKILL: forced, no recourse
This design is generous. You are given a window to shut down cleanly: Kubernetes defaults to 30 seconds (terminationGracePeriodSeconds), and docker stop waits 10 seconds by default. Graceful shutdown is simply the act of using that window instead of ignoring it. A process that ignores SIGTERM does not avoid death; it just guarantees the death is the violent kind, and wastes the grace period it was handed.
Your job: catch SIGTERM, and in the seconds it buys you, leave nothing broken behind.
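A minimal sketch of the "catchable" part in Node, assuming Linux or macOS (Windows emulates signals differently, and a SIGTERM sent there does not reach handlers this way). The interval is a hypothetical stand-in for a listening server: something has to keep the event loop alive for the handler to get a chance to run.

```javascript
// Stand-in for a listening server: an active interval keeps the event loop alive.
const keepAlive = setInterval(() => {}, 1_000);

let received = null;

process.once('SIGTERM', (signal) => {
  received = signal; // Node passes the signal name to the listener
  console.log(`${signal} caught; cleanup would run here`);
  clearInterval(keepAlive); // nothing holds the loop open now, so Node exits normally
});

// Deliver SIGTERM to this process: the same signal `kill <pid>` sends by default.
process.kill(process.pid, 'SIGTERM');
```

Note that installing a SIGTERM listener removes Node's default behaviour for that signal, which is to exit. That is exactly why the handler becomes responsible for exiting eventually.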
The shutdown sequence, step by step
A correct shutdown runs an ordered sequence. Order matters — doing these in the wrong sequence reintroduces the very errors you are trying to remove.
Step 1 — Stop accepting new work. Close the listening socket so no new connections are accepted. Connections already established keep working; new ones go elsewhere. In Node, this is the difference between server.close() (stop listening, let existing requests finish) and yanking the power.
Step 2 — Let in-flight requests finish. The requests already being handled when SIGTERM arrived are allowed to run to completion and send their responses. This is draining. A request that was 200ms from finishing gets its 200ms.
Step 3 — Stop background work cleanly. A job worker should finish its current job (or hand it back to the queue) rather than abandon it half-done. Stop polling for new jobs.
Step 4 — Close outbound resources. Now that nothing is still using them, close the database pool, the Redis connection, the message-broker channel. Doing this before step 2 would break the requests still draining — hence the order.
Step 5 — Exit. Cleanup done, call exit(0).
SIGTERM
│
▼
[1] close listening socket — no new requests accepted
│
▼
[2] drain in-flight requests — existing requests finish
│
▼
[3] stop background workers — current job completes
│
▼
[4] close DB / Redis / queue — safe now, nothing using them
│
▼
[5] exit(0)
The principle: stop intake first, finish existing work, tear down resources last.
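Steps 1 and 2 can be watched in isolation with Node's built-in http module. In this sketch (the 300 ms handler and the 100 ms delay before shutdown are arbitrary choices), one request is in flight when server.close() is called. It still completes, a brand-new connection attempt is refused, and the close callback fires only once the drain is done.

```javascript
const http = require('node:http');

const HOST = '127.0.0.1';
const events = [];

// Every request takes 300 ms: a stand-in for real work.
const server = http.createServer((req, res) => {
  setTimeout(() => res.end('finished'), 300);
});

server.listen(0, HOST, () => {
  const { port } = server.address();

  // A request that is already in flight when shutdown begins.
  http.get({ host: HOST, port, agent: false }, (res) => {
    let body = '';
    res.on('data', (chunk) => (body += chunk));
    res.on('end', () => events.push(`in-flight request got: ${body}`));
  });

  setTimeout(() => {
    // [1] stop accepting; the callback runs once in-flight work [2] has drained.
    server.close(() => events.push('close callback: drain complete'));

    // A new connection attempt after close() finds no listener.
    http
      .get({ host: HOST, port, agent: false })
      .on('error', (err) => events.push(`new request: ${err.code}`));
  }, 100);
});
```

The refused connection here is the same failure mode the load-balancer race produces on a real deploy.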
Implementation
Here is the sequence as real code. The shape is the same in every language — install a SIGTERM handler, stop the listener, await in-flight work, close resources, exit.
// Assumes `app` (for example an Express app), `jobWorker`, `db` (a pg-style
// pool with .end()) and `redis` (a client with .quit()) are created elsewhere.
const server = app.listen(3000);
let shuttingDown = false;

async function shutdown(signal) {
  if (shuttingDown) return; // ignore a second signal
  shuttingDown = true;
  console.log(`${signal} received, starting graceful shutdown`);

  // [1] stop accepting new connections; the callback fires once all
  // in-flight requests [2] have completed. (Node 19+ also closes idle
  // keep-alive sockets here; older versions wait for them to time out.)
  server.close(async () => {
    try {
      await jobWorker.stop(); // [3] finish current job, stop polling
      await db.end();         // [4] close the connection pool
      await redis.quit();     // [4] close Redis
      console.log('cleanup complete, exiting');
      process.exit(0);        // [5]
    } catch (err) {
      console.error('error during shutdown', err);
      process.exit(1);
    }
  });

  // safety net: if draining hangs, force exit before SIGKILL lands
  setTimeout(() => {
    console.error('drain timed out, forcing exit');
    process.exit(1);
  }, 25_000).unref(); // < the 30s orchestrator grace period
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT')); // Ctrl-C in local dev
Two details that are easy to miss but matter.
The shuttingDown guard. A second signal can arrive while the first shutdown is still running: a repeated Ctrl-C in development, or a supervisor that sends SIGTERM again. Without the guard, the second signal restarts the whole sequence and you race yourself.
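A variation on the boolean guard, sketched under the assumption that you would rather have a second signal wait for the shutdown already in progress than return immediately: wrap the routine so every call shares one promise. onlyOnce is a hypothetical helper name, not a Node API.

```javascript
// Hypothetical helper: the first call runs `fn`; every later call gets the same promise.
function onlyOnce(fn) {
  let pending = null;
  return (...args) => {
    if (pending === null) {
      pending = Promise.resolve().then(() => fn(...args));
    }
    return pending;
  };
}

let runs = 0;
const shutdown = onlyOnce(async (signal) => {
  runs += 1;
  console.log(`${signal} received, shutting down`);
});

const first = shutdown('SIGTERM');
const second = shutdown('SIGINT'); // a second signal joins the run already in progress
console.log(first === second); // true
```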
The timeout safety net, set below the orchestrator's grace period. If a request hangs forever (a stuck database call, a slow external API), draining never completes and you would sit there until SIGKILL arrives anyway. The internal timeout lets you decide to give up at 25s, log it, and exit on your own terms instead of being force-killed at 30s. Calling .unref() stops the timer itself from keeping the process alive.
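An alternative shape for the safety net, if you would rather await the drain than exit from inside a timer: wrap server.close() in a promise and race it against a deadline. Both helpers (closeServer, withDeadline) are hypothetical names. The demo shrinks the 25-second budget to 200 ms and uses a handler that never responds, so the deadline wins; it assumes Node 18.2+ for server.closeAllConnections().

```javascript
const http = require('node:http');

// Promise wrapper around server.close(): resolves once every connection has closed.
function closeServer(server) {
  return new Promise((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()));
  });
}

// Settle with `promise`, or reject once `ms` elapses, whichever comes first.
function withDeadline(promise, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`drain exceeded ${ms} ms`)), ms);
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Demo: a handler that never responds, standing in for a stuck request.
const server = http.createServer(() => {});
let outcome = null;

server.listen(0, '127.0.0.1', () => {
  const { port } = server.address();
  // This request will hang; its eventual socket error is deliberately ignored.
  http.get({ host: '127.0.0.1', port, agent: false }).on('error', () => {});

  setTimeout(async () => {
    try {
      await withDeadline(closeServer(server), 200);
      outcome = 'drained';
    } catch (err) {
      outcome = err.message;
      server.closeAllConnections(); // sever whatever is still open before exiting
    }
    console.log(outcome);
  }, 50);
});
```

Inside a real shutdown handler, the catch branch is where you would log the timeout and call process.exit(1). The difference from the timer version is only that the whole sequence reads top to bottom as awaits.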
The load balancer race nobody mentions
Implement everything above and you will still see occasional deploy errors. There is a subtle race, and it is worth understanding because it surprises almost everyone.
A load balancer (or Kubernetes Service) sends traffic to your process based on a readiness check — it routes to instances that report healthy. When a pod is terminated, two things happen at roughly the same time: SIGTERM is sent to your process, and the pod is removed from the load balancer's pool.
"At roughly the same time" is the problem. These are separate systems and the removal is not instant — it takes a moment to propagate. So there is a brief window where your process has already received SIGTERM and closed its listening socket, but the load balancer still thinks the pod is healthy and is still routing new requests to it. Those requests arrive at a socket that is no longer accepting. Connection refused.
t=0      SIGTERM sent    ┊  LB told to remove pod
t=0      socket closing  ┊  ...propagating...
t=0+ε    socket CLOSED   ┊  LB STILL routing here  ◄── requests refused
t=0+δ                    ┊  LB finally stops routing
The fix is counter-intuitive: on SIGTERM, do not close the socket immediately. First fail your readiness check, then wait a few seconds, then begin the real shutdown.
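const { setTimeout: sleep } = require('node:timers/promises');
let isReady = true; // the readiness endpoint returns 503 once this is false
async function shutdown(signal) {
  if (shuttingDown) return; // same double-signal guard as before
  shuttingDown = true;
  isReady = false;   // readiness probe now returns 503
  await sleep(5000); // give the LB time to notice and stop routing
  server.close(/* ...the real drain sequence... */);
}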
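For those few seconds the process is alive and still serving; it just reports "not ready", so a load balancer that polls that check drains it from rotation. (Inside Kubernetes, the pod is already being removed from the Service's endpoints because it is terminating; the pause is what gives that removal time to propagate.) By the time you actually close the socket, no new traffic is being sent to it, and the race is gone. If you keep the safety-net timer from the earlier handler, start it before the pause rather than after, so the pause and the drain share one budget that stays under the grace period.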
This is why your readiness check and your liveness check should be different endpoints. Liveness ("is the process alive?") stays green right through shutdown — you do not want to be restarted, you want to exit. Readiness ("should I get traffic?") goes red the instant SIGTERM lands.
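A minimal sketch of the two endpoints on one Node http server. The paths /livez and /readyz are a common convention, not a requirement (use whatever your probe configuration points at), and isReady is the same flag the shutdown handler flips. The small statusOf helper is only there so the demo can check itself.

```javascript
const http = require('node:http');

let isReady = true; // flipped to false as the first act of shutdown

const server = http.createServer((req, res) => {
  if (req.url === '/livez') {
    res.writeHead(200).end('alive'); // liveness: stays green until the process exits
  } else if (req.url === '/readyz') {
    // readiness: goes red as soon as shutdown starts
    res.writeHead(isReady ? 200 : 503).end(isReady ? 'ready' : 'draining');
  } else {
    res.end('hello');
  }
});

// Hypothetical helper: fetch a path and resolve with its status code.
function statusOf(port, path) {
  return new Promise((resolve, reject) => {
    http
      .get({ host: '127.0.0.1', port, path, agent: false }, (res) => {
        res.resume(); // discard the body so the socket can close
        resolve(res.statusCode);
      })
      .on('error', reject);
  });
}

const observed = [];

server.listen(0, '127.0.0.1', async () => {
  const { port } = server.address();
  observed.push(['before', await statusOf(port, '/livez'), await statusOf(port, '/readyz')]);
  isReady = false; // what the SIGTERM handler does first
  observed.push(['after', await statusOf(port, '/livez'), await statusOf(port, '/readyz')]);
  console.log(observed); // liveness 200 both times; readiness 200, then 503
  server.close();
});
```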
Common mistakes and what to skip
Closing the database before requests finish. Tearing down the connection pool while requests are still draining breaks exactly those requests. Resources close last, after the drain. Order is not optional.
No drain timeout. Trust every in-flight request to finish quickly and one stuck request makes shutdown hang until SIGKILL. Always cap the drain with an internal timeout set below the orchestrator's grace period.
Ignoring the load balancer race. The most common reason a "correct" graceful shutdown still produces deploy errors. Fail readiness first, pause, then drain.
Handling SIGKILL. You cannot. Any tutorial showing a SIGKILL handler is wrong — the kernel never delivers it to your code. Design for SIGTERM; SIGKILL is the failure mode you are trying to never reach.
Doing slow work in the shutdown handler. Flushing a huge cache to disk, processing one last backlog — if it does not fit in the grace window, it gets SIGKILL'd mid-way, which is worse than not starting it. Keep shutdown lean: stop intake, drain, close, exit.
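The jobWorker.stop() call in the implementation above is left abstract. Here is a hypothetical in-memory worker (a plain array stands in for a real queue, and none of these names come from a queue library) that shows the lean version of step 3: stop polling, let the current job finish, and leave the backlog queued for the next instance instead of racing to process it inside the grace window.

```javascript
// Hypothetical worker: polls an array "queue" and runs one job at a time.
class JobWorker {
  constructor(queue, handler, pollMs = 50) {
    this.queue = queue;     // stand-in for a real queue
    this.handler = handler; // async (job) => void
    this.pollMs = pollMs;
    this.stopping = false;
    this.current = null;    // promise for the job in progress, if any
    this.timer = null;
  }

  start() {
    this.timer = setTimeout(() => this.poll(), 0);
  }

  async poll() {
    if (this.stopping) return;
    const job = this.queue.shift();
    if (job !== undefined) {
      this.current = Promise.resolve().then(() => this.handler(job));
      try {
        await this.current;
      } catch (err) {
        console.error('job failed', err);
      } finally {
        this.current = null;
      }
    }
    if (!this.stopping) this.timer = setTimeout(() => this.poll(), this.pollMs);
  }

  async stop() {
    this.stopping = true;     // no further polls get scheduled
    clearTimeout(this.timer); // cancel a poll that is already scheduled
    if (this.current) await this.current.catch(() => {}); // let the current job finish
  }
}

const queue = ['a', 'b', 'c'];
const done = [];
const worker = new JobWorker(queue, async (job) => {
  await new Promise((resolve) => setTimeout(resolve, 100)); // 100 ms of "work"
  done.push(job);
});
worker.start();

// Simulate SIGTERM arriving 50 ms in, while job 'a' is still running.
setTimeout(async () => {
  await worker.stop();
  console.log('finished:', done, 'still queued:', queue);
}, 50);
```

Job 'a' runs to completion; 'b' and 'c' are never started, which is the point: the shutdown does a bounded amount of work no matter how deep the backlog is.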
A note on proportion. If you are running a single long-lived process that you restart by hand once a month, the load-balancer race does not apply to you and a basic SIGTERM handler that closes the server and the database is plenty. The full sequence — readiness flip, timed drain, the LB pause — earns its complexity when you deploy frequently behind a load balancer or orchestrator. That is also exactly the environment where skipping it produces a steady drip of deploy errors. Match the effort to how often you actually stop processes.
Graceful shutdown is unglamorous and it is the difference between deploys nobody notices and deploys that page someone. The next module covers scaling and performance — and a fleet you can scale up and down freely depends entirely on every instance being able to stop cleanly.
The Backend from First Principles series is based on what I learnt from Sriniously's YouTube playlist — a thoughtful, framework-agnostic walk through backend engineering. If this material helped you, please go check the original out: youtube.com/@Sriniously. The notes here are my own restatement for revisiting later.