We shipped the login endpoint, ran it through QA, and called it done. Single-user tests looked fine. bcrypt with 10 rounds felt reasonable — secure, standard, what every tutorial recommends.
Then we pointed k6 at it with 100 virtual users and watched the whole authentication path fall apart.
Requests that normally returned in 200ms started timing out at 30 seconds. Error rates climbed. Pod CPU graphs looked like a heartbeat monitor having a bad day. And the strangest part: when we added logging around the password check alone, a single bcrypt.compare call was taking up to 6 seconds under load — not because bcrypt rounds were wrong, but because the requests were queuing behind each other like cars at a toll booth with one open lane.
This is the story of that bottleneck, why UV_THREADPOOL_SIZE wasn't the whole answer, and how we fixed it by moving password verification into a NestJS native script — a separate entry point in the same codebase, no new microservice required.
The setup
Our stack was straightforward:
- NestJS API behind an ingress controller on Kubernetes
- PostgreSQL for user records
- bcrypt (cost factor 10) for password hashing at registration and comparison at login
- k6 for load testing before a marketing launch that expected a traffic spike
The login flow was equally standard: look up the user by email, fetch the stored hash, run bcrypt.compare(plainPassword, storedHash), issue a JWT if it matched.
In development, this felt fast. On staging with one user clicking login, p95 latency was under 300ms. We had two replicas, horizontal pod autoscaling configured, and reasonable CPU limits — 500m request, 1000m limit per pod.
Confident, we ran the load test.
What 100 VUs actually looked like
The k6 script was simple: 100 virtual users, ramp up over 30 seconds, each executing a login against a pool of test accounts with known passwords.
Within two minutes:
- p95 response time exceeded 15 seconds
- p99 hit the 30-second client timeout
- HTTP 504 errors appeared at the ingress layer
- Application logs showed login handlers starting but not finishing
CPU on both pods pegged near their limits. Memory was fine. Database connection pool had headroom. The bottleneck wasn't I/O — it was something blocking inside the Node.js process itself.
We stripped the login handler down to isolate the cost:
- DB lookup only → fast, even under load
- DB lookup + JWT signing → still fine
- DB lookup + bcrypt.compare → everything collapsed
That's when we measured it: under 100 concurrent login attempts, individual compare operations that took ~80ms in isolation were taking 4–6 seconds wall-clock time. Not because bcrypt got slower — because they were waiting in a queue.
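A minimal sketch of that kind of measurement, assuming a hypothetical helper called `timed`. It uses `Date.now()`, which is coarse but more than enough to tell 80ms from 6 seconds, and it is not necessarily the exact instrumentation used in the incident.

```typescript
// Hypothetical helper: measures wall-clock time around any async call.
// Wall-clock time includes time spent waiting in a queue, not just CPU work.
async function timed<T>(
  fn: () => Promise<T>,
): Promise<{ result: T; ms: number }> {
  const start = Date.now()
  const result = await fn()
  return { result, ms: Date.now() - start }
}

// Usage inside a login handler (bcrypt, user and logger are assumed to exist there):
//   const { result: valid, ms } = await timed(() =>
//     bcrypt.compare(password, user.passwordHash),
//   )
//   this.logger.log(`bcrypt.compare took ${ms}ms wall-clock`)
```

Comparing this number under load against the same call in isolation is what separates "the work got slower" from "the work waited its turn".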
Why bcrypt chokes Node.js under concurrency
bcrypt is deliberately slow. That's the feature. Each compare is CPU-intensive work designed to make brute-force attacks expensive.
In Node.js, the bcrypt npm package offloads this work to libuv's thread pool so the main event loop can keep handling other requests — in theory. In practice, that thread pool is shared globally across your entire process for all async file I/O, DNS lookups, and native crypto operations that use it.
The default pool size? 4 threads.
So when 100 login requests arrive and each one calls bcrypt.compare, you get something like this:
| Concurrent compares waiting | Effective behaviour |
|---|---|
| 1–4 | Compares run in parallel, latency stays low |
| 5–20 | Requests queue, latency grows linearly |
| 50–100 | Queue depth explodes, timeouts everywhere |
Do the math: if each compare takes 80ms and you have 4 threads, throughput is roughly 50 compares/second. At 100 simultaneous logins, the last request in a batch might wait 2 seconds just in queue time — before you account for CPU throttling, GC pauses, or other pool consumers.
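The same back-of-the-envelope estimate as a function, so you can plug in your own numbers. It is an idealised model under stated assumptions: every compare takes exactly `compareMs`, all requests arrive at once, there are enough cores to run every pool thread in parallel, and nothing else uses the pool. `estimateQueue` is a hypothetical name.

```typescript
// Idealised queue model for a fixed-size thread pool.
// Assumes all requests arrive together and each compare takes exactly compareMs.
function estimateQueue(concurrent: number, poolSize: number, compareMs: number) {
  // How many compares the pool can finish per second when fully busy
  const throughputPerSec = (poolSize * 1000) / compareMs
  // Requests are served in waves of poolSize
  const waves = Math.ceil(concurrent / poolSize)
  return {
    throughputPerSec,
    lastStartsAfterMs: (waves - 1) * compareMs, // pure queue time for the last wave
    lastFinishesAfterMs: waves * compareMs, // queue time plus its own compare
  }
}

console.log(estimateQueue(100, 4, 80))
```

With 100 logins, 4 threads and 80ms per compare, this gives 50 compares per second, and the last wave starts after 1,920ms and finishes after 2,000ms.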
That 6-second compare we measured wasn't bcrypt running for 6 seconds. It was 80ms of work + 5+ seconds of waiting.
We tried raising UV_THREADPOOL_SIZE
The first fix everyone suggests — including Stack Overflow, including me before this incident — is bumping the thread pool:
UV_THREADPOOL_SIZE=128 node dist/main.js
We added it to our Kubernetes deployment manifest:
env:
  - name: UV_THREADPOOL_SIZE
    value: "128"
It helped. Timeouts dropped. p95 went from 15s to around 4s. Better, but still unacceptable for a login endpoint, and the numbers didn't match what we expected.
Three problems remained:
1. CPU limits on Kubernetes
More thread pool threads don't create more CPU cores. Our pods were capped at 1 core. Throwing 128 threads at 1 core mostly means 128 threads competing for the same CPU, with context-switch overhead on top. Under load, the kernel scheduler became part of the bottleneck.
2. Every pod has its own pool
With 2 replicas, we didn't have a pool of 256 threads — we had two independent pools of 128, each attached to a pod receiving roughly half the traffic. Scaling horizontally didn't fix the per-process concurrency math.
3. The HTTP server still shared the pool
Our NestJS process wasn't just verifying passwords. It was also serving health checks, handling token refresh endpoints, writing audit logs, and running background cron tasks via @nestjs/schedule. Anything in there that touched the file system, dns.lookup, zlib, or async crypto competed for the same libuv thread pool, and all of it competed for the same throttled CPU. Login traffic could starve everything else, or the reverse.
We had improved the symptom without fixing the architecture.
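The first problem can be put into rough numbers. This is a sketch under a simplifying assumption: bcrypt is purely CPU-bound, so useful parallelism is capped by whichever is smaller, the pool size or the CPU the pod is allowed to use. The function name and the model are illustrative, not a measurement.

```typescript
// Rough model: threads beyond the available CPU do not add throughput,
// they only take turns on the same cores.
function sustainableComparesPerSecond(
  poolSize: number,
  cpuMillicores: number, // Kubernetes CPU limit, e.g. 1000 for one core
  compareMs: number, // time for one compare on an otherwise idle core
): number {
  const cores = cpuMillicores / 1000
  const effectiveParallelism = Math.min(poolSize, cores)
  return effectiveParallelism * (1000 / compareMs)
}

console.log(sustainableComparesPerSecond(4, 8000, 80)) // laptop-like: 8 cores, default pool
console.log(sustainableComparesPerSecond(128, 1000, 80)) // pod capped at 1 core, pool of 128
```

Under this model the default 4-thread pool on an unthrottled 8-core machine sustains about 50 compares per second, while a 128-thread pool on a 1-core pod sustains about 12.5. The bigger pool removes the queue inside libuv and moves it into the kernel scheduler.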
Why this hurts more on Kubernetes than on a laptop
On a local machine with 8 cores and no CPU limit, UV_THREADPOOL_SIZE=16 often "just works" for moderate load tests. Kubernetes adds constraints that make bcrypt's behaviour much worse:
- CPU limits throttle your process mid-compute, stretching compare times unpredictably
- Multiple replicas split traffic but don't coordinate CPU-heavy work
- Liveness probes still hit /health while login hammers the thread pool; we saw health check latency spike during load tests, which almost triggered pod restarts
- Autoscaling on CPU kicked in, added a third pod, and briefly made things worse as new pods cold-started during the traffic peak
The load test that passed on a developer's M1 Mac failed miserably against production-like k8s limits. That gap is worth closing before you promise a launch date.
What didn't work (and why)
We tried several intermediate fixes. Each one taught us something:
Lowering bcrypt rounds (12 → 10 → 8) — We were already at 10. Dropping to 8 shaved maybe 30% off compare time but didn't solve queueing at 100 VUs. Also a security regression we weren't willing to ship.
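Switching between bcrypt.compare and compareSync: we were already on the async variant, which is the right one. compareSync does not use the thread pool at all; it runs the hash on the main thread and blocks the event loop for the duration of every compare, so it would have made things worse under this load pattern, not better.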
Rate limiting login — Correct for abuse prevention, but the business requirement was handling 100 legitimate concurrent logins during peak events. Rate limiting just moved the failure to the user.
Bigger pods (2 CPU, 2Gi memory) — Improved throughput, raised cost, still shared the pool with the rest of the app. p95 dropped to ~2s but wasn't reliable under spikes.
We needed to stop running bcrypt inside the HTTP server process — not tune the same process harder.
The fix: move bcrypt to a NestJS native script
NestJS supports running code outside the HTTP server through what we call a native script (the NestJS docs call it a standalone application): a separate entry point bootstrapped with NestFactory.createApplicationContext(). No Express adapter, no port binding, no request middleware. Just your modules, DI, and the logic you need.
That turned out to be exactly what we needed.
Before: bcrypt inside the login handler
The default NestJS pattern puts everything in one process:
// auth.service.ts, inside the HTTP app
async login(email: string, password: string) {
  const user = await this.users.findByEmail(email)
  if (!user) throw new UnauthorizedException() // unknown email: fail before reading passwordHash
  const valid = await bcrypt.compare(password, user.passwordHash) // occupies a libuv pool thread
  if (!valid) throw new UnauthorizedException()
  return this.signToken(user)
}
Under 100 concurrent logins, every request hit the same libuv thread pool. Health checks, cron jobs, and other endpoints shared that pool. Everything queued.
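The queueing is easy to reproduce without NestJS or bcrypt. The sketch below uses crypto.pbkdf2 from Node's standard library as a stand-in for bcrypt.compare. That substitution is an assumption made so the example has no dependencies; both run their work on the libuv thread pool. `runConcurrentHashes` is a hypothetical name.

```typescript
import { pbkdf2 } from 'node:crypto'

// Submit `count` CPU-heavy hashes at once and record when each one finishes.
// pbkdf2 stands in for bcrypt.compare here (assumption, see above).
function runConcurrentHashes(count: number, iterations = 100_000): Promise<number[]> {
  const start = Date.now()
  const jobs = Array.from(
    { length: count },
    (_, i) =>
      new Promise<number>((resolve, reject) => {
        pbkdf2('candidate', `salt-${i}`, iterations, 32, 'sha256', (err) => {
          if (err) return reject(err)
          resolve(Date.now() - start) // wall-clock ms since all jobs were submitted
        })
      }),
  )
  return Promise.all(jobs)
}

runConcurrentHashes(16).then((finishedAt) => {
  console.log(finishedAt.sort((a, b) => a - b))
})
```

With the default pool size and at least four free cores, the completion times should tend to arrive in groups of four, each group roughly one hash-duration later than the previous one. Exact numbers depend on the machine.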
After: bcrypt in a separate NestJS script
We added a second entry point in the same project:
src/
  main.ts                ← HTTP API (unchanged entry)
  scripts/
    verify-password.ts   ← native script (new entry)
The native script bootstraps only what it needs:
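// scripts/verify-password.ts
import { NestFactory } from '@nestjs/core'
// AuthScriptModule, PasswordVerifierService and readStdin are project-local; their imports are omitted here

async function bootstrap() {
  const app = await NestFactory.createApplicationContext(AuthScriptModule, {
    logger: false,
  })
  const verifier = app.get(PasswordVerifierService)
  // Read { hash, candidate } from stdin, write result to stdout
  const input = JSON.parse(await readStdin())
  const valid = await verifier.compare(input.candidate, input.hash)
  process.stdout.write(JSON.stringify({ valid }))
  await app.close()
}

// Run the script; a non-zero exit code tells the parent the compare could not run
bootstrap().catch((err) => {
  console.error(err)
  process.exit(1)
})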
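The script above calls readStdin(), which is not shown. One way it might look, as a sketch: the helper name matches the call above, but this body is an assumption, not the original implementation. It takes the stream as a parameter, defaulting to process.stdin, so it can be exercised without a real pipe.

```typescript
// Assumed implementation: collect every chunk until the stream ends,
// then decode the whole payload as UTF-8.
async function readStdin(
  input: AsyncIterable<string | Buffer> = process.stdin,
): Promise<string> {
  const chunks: Buffer[] = []
  for await (const chunk of input) {
    // Streams may yield strings or Buffers depending on their encoding settings
    chunks.push(typeof chunk === 'string' ? Buffer.from(chunk, 'utf8') : chunk)
  }
  return Buffer.concat(chunks).toString('utf8')
}
```

Decoding once at the end, rather than chunk by chunk, avoids corrupting multi-byte characters that happen to be split across two chunks.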
Build it as a separate output in nest-cli.json:
{
  "compilerOptions": {
    "assets": [],
    "plugins": []
  },
  "projects": {
    "api": { "type": "application", "root": "src", "entryFile": "main" },
    "verify-password": {
      "type": "application",
      "root": "src",
      "entryFile": "scripts/verify-password"
    }
  }
}
The login handler no longer calls bcrypt.compare directly. It delegates to the script:
// auth.service.ts (HTTP app, no bcrypt import)
async login(email: string, password: string) {
  const user = await this.users.findByEmail(email)
  if (!user) throw new UnauthorizedException() // unknown email: fail before reading passwordHash
  const { valid } = await this.scriptRunner.run('verify-password', {
    candidate: password,
    hash: user.passwordHash,
  })
  if (!valid) throw new UnauthorizedException()
  return this.signToken(user)
}
ScriptRunner spawns the native script as a child process via child_process.spawn. Each compare runs in its own Node.js process with its own thread pool — completely isolated from the HTTP server.
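ScriptRunner itself is not shown here. Below is a minimal sketch of the spawn-and-pipe pattern it describes, using only Node's child_process module. The function name `runJsonChild`, the timeout, and the error handling are assumptions for illustration, not the original class.

```typescript
import { spawn } from 'node:child_process'

// Hypothetical helper: start a child process, send `payload` as JSON on its
// stdin, and parse whatever it prints on stdout as JSON.
function runJsonChild<T>(
  command: string,
  args: string[],
  payload: unknown,
  timeoutMs = 5000,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const child = spawn(command, args)
    let stdout = ''
    let stderr = ''

    // Kill the child if it hangs so a stuck compare cannot pin a request forever
    const timer = setTimeout(() => {
      child.kill()
      reject(new Error(`child timed out after ${timeoutMs}ms`))
    }, timeoutMs)

    child.stdout.setEncoding('utf8')
    child.stderr.setEncoding('utf8')
    child.stdout.on('data', (chunk) => (stdout += chunk))
    child.stderr.on('data', (chunk) => (stderr += chunk))
    child.on('error', (err) => {
      clearTimeout(timer)
      reject(err)
    })
    child.on('close', (code) => {
      clearTimeout(timer)
      if (code !== 0) {
        reject(new Error(`child exited with code ${code}: ${stderr}`))
        return
      }
      try {
        resolve(JSON.parse(stdout) as T)
      } catch (err) {
        reject(err)
      }
    })

    // Ignore EPIPE if the child exits before reading its input
    child.stdin.on('error', () => {})
    // end() writes the payload and closes stdin, which signals end-of-input
    child.stdin.end(JSON.stringify(payload))
  })
}
```

A `run('verify-password', payload)` method could then be a thin wrapper that calls `runJsonChild(process.execPath, [pathToCompiledScript], payload)`, where the compiled script path depends on your build output.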
Why this reduces the bottleneck
The improvement isn't magic — it's process isolation:
| | HTTP server (before) | Native script (after) |
|---|---|---|
| Thread pool | Shared with all requests | Dedicated per compare |
| CPU competition | bcrypt vs health checks vs cron | bcrypt only |
| Under 100 VUs | 100 compares queue on 4 threads | API spawns compares in parallel processes |
| API event loop | Free, but login handlers stall waiting on the pool | Free for I/O |
Each verify-password script process exits after one compare. The overhead of spawning a process (~10–20ms) is negligible compared to a 6-second queue wait.
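Spawn overhead varies with the machine and the pod's CPU limit, so it is worth measuring in your own environment rather than trusting a single figure. A rough sketch: `measureSpawnMs` is a hypothetical helper that times starting and exiting an empty Node.js process. Treat the result as a lower bound, because it excludes NestJS bootstrap and the compare itself.

```typescript
import { spawnSync } from 'node:child_process'
import { performance } from 'node:perf_hooks'

// Hypothetical helper: median wall-clock time to start and exit an empty
// Node.js process, in milliseconds.
function measureSpawnMs(runs = 5): number {
  const samples: number[] = []
  for (let i = 0; i < runs; i++) {
    const start = performance.now()
    const result = spawnSync(process.execPath, ['-e', ''])
    if (result.status !== 0) throw new Error('child process failed to run')
    samples.push(performance.now() - start)
  }
  samples.sort((a, b) => a - b)
  return samples[Math.floor(samples.length / 2)]
}

console.log(`median spawn overhead: ${measureSpawnMs().toFixed(1)}ms`)
```

Running this inside a pod with the same CPU limit as the API gives a more honest number than running it on a laptop.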
On Kubernetes, the HTTP deployment stays lean. No need to oversize UV_THREADPOOL_SIZE on the API pod. The script processes inherit the pod's CPU limit but don't block each other inside a single event loop.
Deploying on Kubernetes
Same Docker image, two commands:
# API deployment: handles HTTP only
containers:
  - name: api
    image: my-app:latest
    command: ["node", "dist/main.js"]
# No separate worker deployment needed.
# Script runs on-demand via child_process inside the API pod.
We also tested a variant where the script runs as a long-lived sidecar process (still a NestJS native script, just kept alive). Results were similar, but spawn-per-request was simpler to ship and easier to reason about under load.
Results after the change
Same k6 script, same 100 VUs, same k8s cluster:
| Metric | Before | After |
|---|---|---|
| p95 login latency | 15,000ms+ | 380ms |
| p99 login latency | 30,000ms (timeout) | 650ms |
| Error rate | 34% | 0% |
| bcrypt compare (wall clock) | up to 6,000ms | 90–110ms |
| API pod CPU during test | ~95% | ~30% |
The API pod stopped fighting bcrypt for CPU. Compare time dropped because each verification got its own process instead of waiting in a shared queue.
Lessons I'd pass on
Load-test the auth path separately. Login is not a CRUD endpoint. It has different CPU characteristics and failure modes. A load test that only hits GET /products will miss this entirely.
Don't fix a process isolation problem with UV_THREADPOOL_SIZE alone. Raising the pool helps in dev, but on k8s with CPU limits you're still cramming all concurrent work into one Node.js process. A NestJS native script gives you isolation without a new microservice.
Kubernetes CPU limits multiply every Node.js concurrency footgun. If your pod has 500m CPU, assume you can sustain far fewer parallel bcrypt operations than your thread pool size suggests.
Measure wall-clock time, not just handler time. Our APM showed login handlers taking 6 seconds in total, and we initially blamed the database. Only when we logged timestamps around the compare call did we see the queueing gap.
Use NestJS native scripts before reaching for a new service. NestFactory.createApplicationContext() is built for exactly this — reuse your modules and DI, run CPU-heavy work in a separate process, zero HTTP overhead. No Redis queue, no gRPC, no second deployment required.
The pain point in one sentence
Node.js login endpoints using bcrypt will silently queue under concurrent load — and Kubernetes CPU limits turn a thread pool tuning problem into a production outage.
If you're running NestJS on k8s and haven't load-tested login at realistic concurrency, do it this week. The failure mode is invisible until it isn't.