Your backend works fine right now. It handles your current traffic without complaint. That’s exactly the problem: “works fine at current traffic” and “works fine at scale” are two completely different claims, and most backends only get tested against the first one.
Here are five mistakes that don’t show up in development, don’t show up in your first hundred users, and then take the whole system down the moment things actually take off.
1. No Connection Pooling
Every request opens a new database connection. At 10 users, nobody notices. At 500 concurrent users, you hit your database’s connection limit and every new request starts failing, not because your code is wrong, but because you’ve run out of connections to hand out.
The fix is a connection pool: a fixed set of database connections that get reused across requests instead of opened and closed every time. Most ORMs and database clients support this out of the box (Prisma, Drizzle, pg-pool for raw Postgres). The default pool size is usually too small for production. Check it, don't assume it.
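To make the idea concrete, here is a minimal sketch of what a pool does internally. It is not how Prisma or pg-pool are implemented; `Pool`, `FakeConnection`, and `openConnection` are made-up names, and the "connection" is a plain object so the example runs without a database.

```typescript
// Sketch of a fixed-size pool. `Pool`, `FakeConnection`, and `openConnection`
// are illustrative names, not part of any real database client.
class Pool<T> {
  private idle: T[] = [];
  private waiters: Array<(conn: T) => void> = [];
  private created = 0;
  private create: () => T;
  private max: number;

  constructor(create: () => T, max: number) {
    this.create = create;
    this.max = max;
  }

  // Reuse an idle connection, open a new one while under the cap,
  // otherwise wait in line until someone releases one.
  acquire(): Promise<T> {
    if (this.idle.length > 0) {
      return Promise.resolve(this.idle.pop() as T);
    }
    if (this.created < this.max) {
      this.created++;
      return Promise.resolve(this.create());
    }
    return new Promise<T>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  // Hand the connection straight to the oldest waiter, or park it as idle.
  release(conn: T): void {
    const next = this.waiters.shift();
    if (next) next(conn);
    else this.idle.push(conn);
  }

  // Borrow a connection for one unit of work and always give it back.
  async withConnection<R>(work: (conn: T) => Promise<R>): Promise<R> {
    const conn = await this.acquire();
    try {
      return await work(conn);
    } finally {
      this.release(conn);
    }
  }

  get totalCreated(): number {
    return this.created;
  }

  get waiting(): number {
    return this.waiters.length;
  }
}

// Stand-in for a real database connection (assumption for this sketch).
type FakeConnection = { id: number };
let opened = 0;
const openConnection = (): FakeConnection => ({ id: ++opened });

const pool = new Pool<FakeConnection>(openConnection, 5);
```

However many requests call `pool.withConnection(...)` at once, this pool never opens more than five connections; everyone else waits in line. Real pools also add acquire timeouts, health checks, and idle eviction, which this sketch leaves out.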
2. No Caching
The same data gets fetched from the database on every single request. Your homepage hits the DB ten thousand times an hour, serving the exact same content every time.
A basic caching layer eliminates most of that load instantly. It doesn’t need to be complicated:
- Redis for shared cache across instances
- In-memory cache for single-instance apps
- HTTP caching headers (Cache-Control, ETag) for anything that doesn't change per-user
The point isn’t to cache everything. It’s to stop re-fetching data that hasn’t changed since the last request.
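As a rough illustration of the in-memory option, here is a small time-to-live cache. `TtlCache` and `getOrLoad` are hypothetical names, the loader is synchronous to keep the sketch short, and the clock is injectable only so the behavior is easy to test.

```typescript
// Sketch of an in-memory cache with a time-to-live. All names are illustrative.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value while it is fresh; otherwise call `load`
  // (the expensive database fetch) and remember the result.
  getOrLoad(key: string, load: () => V): V {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value;
    }
    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Drop an entry early, for example right after the underlying row changes.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}

// Usage: the homepage content is loaded at most once per minute.
let dbHits = 0;
const homepageCache = new TtlCache<string>(60 * 1000);
const renderHomepage = (): string =>
  homepageCache.getOrLoad("homepage", () => {
    dbHits++; // stands in for the real database query
    return "<h1>Welcome</h1>";
  });
```

The TTL is the knob that matters: it is the longest you are willing to serve stale data. A cache like this lives inside one process, which is why multi-instance deployments usually move it to something shared like Redis.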
3. N+1 Queries
You fetch a list of 100 items, then run a separate query for each item’s related data. That’s 101 queries where one would do.
At small scale this is invisible: 101 fast queries still feel instant. At real scale, with real traffic, it’s a disaster: your database spends all its time doing the same repetitive lookups instead of serving new requests.
The fix is almost always a join, or a batched query (WHERE id IN (...)) instead of a loop that queries per item. Most ORMs have a way to eager-load relations. Use it. This is one of the most common and most avoidable failures in backend code.
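The difference is easiest to see by counting round trips. The sketch below uses a made-up `FakeDb` that only counts queries (the SQL in the comments shows what each call would stand for); `Post`, `Author`, `loadNPlusOne`, and `loadBatched` are illustrative names, not from any ORM.

```typescript
type Post = { id: number; authorId: number };
type Author = { id: number; name: string };
type PostWithAuthor = Post & { author: Author | undefined };

// Hypothetical stand-in for a database client; it only counts round trips.
class FakeDb {
  queries = 0;
  private posts: Post[];
  private authors = new Map<number, Author>();

  constructor(posts: Post[], authors: Author[]) {
    this.posts = posts;
    for (const a of authors) this.authors.set(a.id, a);
  }

  // SELECT * FROM posts
  listPosts(): Post[] {
    this.queries++;
    return this.posts.slice();
  }

  // SELECT * FROM authors WHERE id = $1
  authorById(id: number): Author | undefined {
    this.queries++;
    return this.authors.get(id);
  }

  // SELECT * FROM authors WHERE id IN (...)
  authorsByIds(ids: number[]): Author[] {
    this.queries++;
    const found: Author[] = [];
    for (const id of ids) {
      const a = this.authors.get(id);
      if (a !== undefined) found.push(a);
    }
    return found;
  }
}

// The N+1 shape: one query for the list, then one more per item.
function loadNPlusOne(db: FakeDb): PostWithAuthor[] {
  return db.listPosts().map((p) => ({ ...p, author: db.authorById(p.authorId) }));
}

// The batched shape: collect the distinct ids, fetch them in one query,
// then stitch the results together in memory.
function loadBatched(db: FakeDb): PostWithAuthor[] {
  const posts = db.listPosts();
  const ids = Array.from(new Set(posts.map((p) => p.authorId)));
  const byId = new Map(
    db.authorsByIds(ids).map((a): [number, Author] => [a.id, a]),
  );
  return posts.map((p) => ({ ...p, author: byId.get(p.authorId) }));
}
```

With 100 posts, `loadNPlusOne` issues 101 queries and `loadBatched` issues 2, and both return the same data. An ORM's eager-loading option typically generates something like the batched version (or a join) for you.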
4. No Rate Limiting
No protection against traffic spikes, bots, or a single user hammering your API. One bad actor, or a successful product launch, can take down an unprotected backend in minutes.
Rate limiting doesn’t need to be sophisticated to be effective. Even a basic per-IP or per-user request cap, enforced at the API gateway or middleware level, stops the most common failure mode: one client accidentally or intentionally sending far more requests than your system can handle.
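Here is roughly what a basic cap looks like: a fixed-window counter keyed by IP or user ID. This is one simple strategy among several (token buckets and sliding windows are common alternatives), and `FixedWindowLimiter` and `allow` are hypothetical names. The clock is injectable so the behavior can be checked without waiting.

```typescript
// Sketch of a per-key fixed-window rate limiter. All names are illustrative.
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number;
  private windowMs: number;
  private now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the request identified by `key` (an IP, a user ID)
  // is still within its budget for the current window.
  allow(key: string): boolean {
    const t = this.now();
    const w = this.windows.get(key);
    if (w === undefined || t - w.start >= this.windowMs) {
      // First request from this key, or the previous window has ended.
      this.windows.set(key, { start: t, count: 1 });
      return true;
    }
    if (w.count < this.limit) {
      w.count++;
      return true;
    }
    return false;
  }
}

// Usage: at most 100 requests per minute per client.
const limiter = new FixedWindowLimiter(100, 60 * 1000);
const clientIp = "203.0.113.7"; // documentation-range address, for illustration
const accepted = limiter.allow(clientIp);
```

In middleware, a `false` result would normally turn into an HTTP 429 response. Two caveats on this sketch: the counters live in one process, so several instances each enforce their own cap unless the state moves to a shared store, and old keys are never evicted from the map.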
5. Synchronous Everything
Long-running tasks blocking the main thread. File uploads, sending emails, generating PDFs, processing images: none of these should happen synchronously inside a request-response cycle.
When a task takes 3 seconds and your API is handling it inline, that’s 3 seconds where a request handler and a connection are tied up doing nothing but waiting. Move it to a queue (even a simple one). Your API response time drops immediately, and slow tasks stop being able to block fast ones.
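A "simple one" can be as small as the sketch below: the request handler calls `enqueue`, gets a job ID back immediately, and a background loop works through the jobs one at a time. `JobQueue`, `Job`, and the handler are hypothetical names, and the queue is in-process only, so jobs are lost if the process restarts.

```typescript
type Job = { id: number; payload: string };

// Sketch of an in-process job queue. All names are illustrative.
class JobQueue {
  private jobs: Job[] = [];
  private nextId = 1;
  private running = false;
  private handler: (job: Job) => Promise<void>;

  constructor(handler: (job: Job) => Promise<void>) {
    this.handler = handler;
  }

  // Called from the request handler: records the job and returns right away.
  enqueue(payload: string): number {
    const id = this.nextId++;
    this.jobs.push({ id, payload });
    // Start the worker on a later tick so the caller never waits on it.
    setTimeout(() => {
      void this.drain();
    }, 0);
    return id;
  }

  get pending(): number {
    return this.jobs.length;
  }

  // Worker loop: processes one job at a time until the queue is empty.
  async drain(): Promise<void> {
    if (this.running) return;
    this.running = true;
    try {
      let job = this.jobs.shift();
      while (job !== undefined) {
        try {
          await this.handler(job);
        } catch (err) {
          // A failed job must not stop the ones behind it.
          console.error("job " + job.id + " failed", err);
        }
        job = this.jobs.shift();
      }
    } finally {
      this.running = false;
    }
  }
}

// Usage: the handler stands in for the slow work (sending an email here).
const sentEmails: string[] = [];
const emailQueue = new JobQueue(async (job) => {
  sentEmails.push(job.payload);
});
const jobId = emailQueue.enqueue("welcome:user-42"); // respond to the client now
```

The API can return something like "202 Accepted" with `jobId` while the work happens later. Once losing jobs on restart is not acceptable, the same shape carries over to a durable queue with separate worker processes.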
The Pattern Behind All Five
None of these mistakes are hard to fix. Individually, they’re small, well-understood problems with well-understood solutions. What makes them dangerous is that they’re easy to skip when you’re moving fast to ship, and they don’t cost you anything until the exact moment you can least afford it: when traffic actually shows up.
The time to fix them is before you need to, not during the outage.
I build backends that don’t break when things get real: Node.js, Go, AWS. If you’re scaling and want a second pair of eyes on your architecture, find me on LinkedIn or check out my work at jealous.dev.