nestjs · nodejs · kubernetes · performance · pain-point

100 Users Log In at Once and Your API Dies — The bcrypt Bottleneck Nobody Warns You About

Login load tests at 100 VUs timed out on Kubernetes. A single bcrypt.compare call took up to 6 seconds of wall-clock time. UV_THREADPOOL_SIZE wasn't enough. We fixed it by moving bcrypt out of the HTTP server into a NestJS native script.

· haile37

We shipped the login endpoint, ran it through QA, and called it done. Single-user tests looked fine. bcrypt with 10 rounds felt reasonable — secure, standard, what every tutorial recommends.

Then we pointed k6 at it with 100 virtual users and watched the whole authentication path fall apart.

Requests that normally returned in 200ms started timing out at 30 seconds. Error rates climbed. Pod CPU graphs looked like a heartbeat monitor having a bad day. And the strangest part: when we added logging around the password check alone, a single bcrypt.compare call was taking up to 6 seconds under load — not because bcrypt rounds were wrong, but because the requests were queuing behind each other like cars at a toll booth with one open lane.

This is the story of that bottleneck, why UV_THREADPOOL_SIZE wasn't the whole answer, and how we fixed it by moving password verification into a NestJS native script — a separate entry point in the same codebase, no new microservice required.

The setup

Our stack was straightforward:

  • NestJS API behind an ingress controller on Kubernetes
  • PostgreSQL for user records
  • bcrypt (cost factor 10) for password hashing at registration and comparison at login
  • k6 for load testing before a marketing launch that expected a traffic spike

The login flow was equally standard: look up the user by email, fetch the stored hash, run bcrypt.compare(plainPassword, storedHash), issue a JWT if it matched.

In development, this felt fast. On staging with one user clicking login, p95 latency was under 300ms. We had two replicas, horizontal pod autoscaling configured, and reasonable CPU limits — 500m request, 1000m limit per pod.

Confident, we ran the load test.

What 100 VUs actually looked like

The k6 script was simple: 100 virtual users, ramp up over 30 seconds, each executing a login against a pool of test accounts with known passwords.

Within two minutes:

  • p95 response time exceeded 15 seconds
  • p99 hit the 30-second client timeout
  • HTTP 504 errors appeared at the ingress layer
  • Application logs showed login handlers starting but not finishing

CPU on both pods pegged near their limits. Memory was fine. Database connection pool had headroom. The bottleneck wasn't I/O — it was something blocking inside the Node.js process itself.

We stripped the login handler down to isolate the cost:

  1. DB lookup only → fast, even under load
  2. DB lookup + JWT signing → still fine
  3. DB lookup + bcrypt.compare → everything collapsed

That's when we measured it: under 100 concurrent login attempts, individual compare operations that took ~80ms in isolation were taking 4–6 seconds wall-clock time. Not because bcrypt got slower — because they were waiting in a queue.

Why bcrypt chokes Node.js under concurrency

bcrypt is deliberately slow. That's the feature. Each compare is CPU-intensive work designed to make brute-force attacks expensive.

In Node.js, the bcrypt npm package offloads this work to libuv's thread pool so the main event loop can keep handling other requests — in theory. In practice, that thread pool is shared globally across your entire process for all async file I/O, DNS lookups, and native crypto operations that use it.

The default pool size? 4 threads.

So when 100 login requests arrive and each one calls bcrypt.compare, you get something like this:

| Concurrent compares waiting | Effective behaviour |
|---|---|
| 1–4 | Compares run in parallel, latency stays low |
| 5–20 | Requests queue, latency grows linearly |
| 50–100 | Queue depth explodes, timeouts everywhere |

Do the math: if each compare takes 80ms and you have 4 threads, throughput is roughly 50 compares/second. At 100 simultaneous logins, the last request in a batch might wait 2 seconds just in queue time — before you account for CPU throttling, GC pauses, or other pool consumers.
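The arithmetic above can be reproduced with a deliberately simple batch-queue model. This is a sketch under stated assumptions: every compare takes exactly the same time, all requests arrive at once, each pool thread has a core to itself, and nothing else uses the pool. The function and field names are illustrative, not from any library.

```typescript
// Idealized model: N identical jobs submitted at once to a fixed-size pool.
// Assumes constant service time, one core per thread, and no other pool consumers.
interface QueueEstimate {
  throughputPerSec: number // jobs completed per second once the pool is saturated
  lastJobWaitMs: number // time the last job sits in the queue before starting
  lastJobDoneMs: number // wall-clock time until the last job finishes
}

function estimatePoolQueue(jobs: number, poolSize: number, serviceMs: number): QueueEstimate {
  if (jobs < 1 || poolSize < 1 || serviceMs <= 0) {
    throw new RangeError('jobs and poolSize must be >= 1, serviceMs must be > 0')
  }
  // Jobs start in waves of poolSize: job i (0-based) starts in wave floor(i / poolSize).
  const lastJobWaitMs = Math.floor((jobs - 1) / poolSize) * serviceMs
  return {
    throughputPerSec: (poolSize * 1000) / serviceMs,
    lastJobWaitMs,
    lastJobDoneMs: lastJobWaitMs + serviceMs,
  }
}

// Numbers from the text: 100 logins, default pool of 4 threads, ~80ms per compare.
console.log(estimatePoolQueue(100, 4, 80))
// → { throughputPerSec: 50, lastJobWaitMs: 1920, lastJobDoneMs: 2000 }
```

Plug in a pool of 128 and the model predicts no queueing at all for 100 jobs. That is exactly the assumption that breaks on a 1-core pod: 128 threads still share one core's worth of CPU time.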

That 6-second compare we measured wasn't bcrypt running for 6 seconds. It was 80ms of work + 5+ seconds of waiting.
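You can watch the same queueing happen without bcrypt installed. The sketch below uses Node's built-in crypto.pbkdf2 as a stand-in, because its async form also runs on the libuv thread pool; the iteration count is an arbitrary value picked to make each call take tens of milliseconds, and the helper names are hypothetical. Launch more calls than the pool has threads, and the sorted per-call latencies should climb in steps of roughly one call's duration.

```typescript
import { pbkdf2 } from 'node:crypto'
import { performance } from 'node:perf_hooks'

// Stand-in for bcrypt.compare: async pbkdf2 is also executed on the libuv thread pool.
function hashOnPool(iterations: number): Promise<void> {
  return new Promise((resolve, reject) => {
    pbkdf2('password', 'salt', iterations, 64, 'sha512', (err) => {
      if (err) reject(err)
      else resolve()
    })
  })
}

// Fire `count` jobs at once and record each job's wall-clock latency (queue wait + work).
async function measureConcurrentLatencies(count: number, iterations: number): Promise<number[]> {
  const start = performance.now()
  const latencies = await Promise.all(
    Array.from({ length: count }, async () => {
      await hashOnPool(iterations)
      return performance.now() - start
    }),
  )
  return latencies.sort((a, b) => a - b)
}

// With the default pool of 4 threads, 16 jobs should finish in roughly 4 "waves".
measureConcurrentLatencies(16, 50_000).then((ms) => {
  console.log(ms.map((t) => Math.round(t)).join(' ms, ') + ' ms')
})
```

Running it again with UV_THREADPOOL_SIZE=16 on a machine with enough free cores should flatten the waves; on a CPU-limited pod it may not.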

We tried raising UV_THREADPOOL_SIZE

The first fix everyone suggests — including Stack Overflow, including me before this incident — is bumping the thread pool:

UV_THREADPOOL_SIZE=128 node dist/main.js

We added it to our Kubernetes deployment manifest:

env:
  - name: UV_THREADPOOL_SIZE
    value: "128"

It helped. Timeouts dropped. p95 went from 15s to around 4s. Better, but still unacceptable for a login endpoint, and the numbers didn't match what we expected.

Three problems remained:

1. CPU limits on Kubernetes

More thread pool threads don't create more CPU cores. Our pods were capped at 1 core. Throwing 128 threads at 1 core mostly means 128 threads competing for the same CPU, with context-switch overhead on top. Under load, the kernel scheduler became part of the bottleneck.
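One way to see the mismatch from inside a pod is to compare the container's CPU quota with the configured pool size. The sketch below assumes cgroup v2, where the quota lives in /sys/fs/cgroup/cpu.max as "<quota> <period>" (or "max <period>" when unlimited); on cgroup v1 nodes the file doesn't exist and the function returns null. The helper names are hypothetical.

```typescript
import { readFileSync } from 'node:fs'

// Parse cgroup v2 cpu.max content, e.g. "100000 100000" (1 CPU) or "max 100000" (no limit).
// Returns the CPU limit in cores, or null when unlimited or unparseable.
function parseCpuMax(content: string): number | null {
  const [quota, period] = content.trim().split(/\s+/)
  if (!quota || !period || quota === 'max') return null
  const q = Number(quota)
  const p = Number(period)
  if (!Number.isFinite(q) || !Number.isFinite(p) || q <= 0 || p <= 0) return null
  return q / p
}

// Read the limit for the current container; null on cgroup v1, non-Linux hosts, or no limit.
function containerCpuLimit(path = '/sys/fs/cgroup/cpu.max'): number | null {
  try {
    return parseCpuMax(readFileSync(path, 'utf8'))
  } catch {
    return null
  }
}

const cores = containerCpuLimit()
// UV_THREADPOOL_SIZE as configured via the environment (libuv defaults to 4 when unset)
const pool = Number(process.env.UV_THREADPOOL_SIZE ?? '4')
const limitText = cores === null ? 'no limit detected' : `${cores} cores`
console.log(`CPU limit: ${limitText}, thread pool: ${pool} threads`)
```

With a 1000m limit and Kubernetes' default 100ms CFS period, cpu.max would typically read "100000 100000", so this reports 1 core sitting under a 128-thread pool.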

2. Every pod has its own pool

With 2 replicas, we didn't have a pool of 256 threads — we had two independent pools of 128, each attached to a pod receiving roughly half the traffic. Scaling horizontally didn't fix the per-process concurrency math.

3. The HTTP server still shared the pool

Our NestJS process wasn't just verifying passwords. It was also serving health checks, handling token refresh endpoints, writing audit logs, and running background cron tasks via @nestjs/schedule. All of them shared one process: the same event loop, the same CPU quota and, for file writes and crypto, the same libuv thread pool. Login traffic could starve everything else, or the reverse.

We had improved the symptom without fixing the architecture.

Why this hurts more on Kubernetes than on a laptop

On a local machine with 8 cores and no CPU limit, UV_THREADPOOL_SIZE=16 often "just works" for moderate load tests. Kubernetes adds constraints that make bcrypt's behaviour much worse:

  • CPU limits throttle your process mid-compute, stretching compare times unpredictably
  • Multiple replicas split traffic but don't coordinate CPU-heavy work
  • Liveness probes still hit /health while login hammers the thread pool — we saw health check latency spike during load tests, which almost triggered pod restarts
  • Autoscaling on CPU kicked in, added a third pod, and briefly made things worse as new pods cold-started during the traffic peak

The load test that passed on a developer's M1 Mac failed miserably against production-like k8s limits. That gap is worth closing before you promise a launch date.

What didn't work (and why)

We tried several intermediate fixes. Each one taught us something:

Lowering bcrypt rounds (10 → 8): We were already at 10. Dropping to 8 shaved maybe 30% off compare time but didn't solve queueing at 100 VUs. It was also a security regression we weren't willing to ship.

Switching between bcrypt.compare and compareSync: We were already on the async variant, which runs on the thread pool. compareSync skips the pool but runs on the main thread, blocking the event loop for the full duration of every compare. No meaningful improvement under this load pattern.

Rate limiting login: Correct for abuse prevention, but the business requirement was handling 100 legitimate concurrent logins during peak events. Rate limiting just moved the failure to the user.

Bigger pods (2 CPU, 2Gi memory): Improved throughput, raised cost, and still shared the pool with the rest of the app. p95 dropped to ~2s but wasn't reliable under spikes.

We needed to stop running bcrypt inside the HTTP server process — not tune the same process harder.

The fix: move bcrypt to a NestJS native script

NestJS supports running code outside the HTTP server through what we call a native script: a standalone entry point bootstrapped with NestFactory.createApplicationContext() (the NestJS docs call these standalone applications). No Express adapter, no port binding, no request middleware. Just your modules, DI, and the logic you need.

That turned out to be exactly what we needed.

Before: bcrypt inside the login handler

The default NestJS pattern puts everything in one process:

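// auth.service.ts (inside the HTTP app)
import * as bcrypt from 'bcrypt'
import { UnauthorizedException } from '@nestjs/common'

// ...inside AuthService:
async login(email: string, password: string) {
  const user = await this.users.findByEmail(email)
  if (!user) throw new UnauthorizedException()
  // Each compare occupies one libuv thread-pool thread until it finishes
  const valid = await bcrypt.compare(password, user.passwordHash)
  if (!valid) throw new UnauthorizedException()
  return this.signToken(user)
}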

Under 100 concurrent logins, every request hit the same libuv thread pool. Health checks, cron jobs, and other endpoints shared that pool. Everything queued.

After: bcrypt in a separate NestJS script

We added a second entry point in the same project:

src/
  main.ts              ← HTTP API (unchanged entry)
  scripts/
    verify-password.ts ← native script (new entry)

The native script bootstraps only what it needs:

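// scripts/verify-password.ts
import { NestFactory } from '@nestjs/core'
// Module and service paths below are illustrative
import { AuthScriptModule } from './auth-script.module'
import { PasswordVerifierService } from './password-verifier.service'

// Collect all of stdin into a single string
async function readStdin(): Promise<string> {
  process.stdin.setEncoding('utf8')
  let data = ''
  for await (const chunk of process.stdin) data += chunk
  return data
}

async function bootstrap() {
  const app = await NestFactory.createApplicationContext(AuthScriptModule, { logger: false })
  const verifier = app.get(PasswordVerifierService)

  // Read { hash, candidate } from stdin, write result to stdout
  const input = JSON.parse(await readStdin())
  const valid = await verifier.compare(input.candidate, input.hash)
  process.stdout.write(JSON.stringify({ valid }))
  await app.close()
}

// Exit non-zero on failure so the parent process can detect it
bootstrap().catch((err) => { console.error(err); process.exit(1) })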

Build it as a separate output in nest-cli.json:

{
  "compilerOptions": {
    "assets": [],
    "plugins": []
  },
  "projects": {
    "api": { "type": "application", "root": "src", "entryFile": "main" },
    "verify-password": {
      "type": "application",
      "root": "src",
      "entryFile": "scripts/verify-password"
    }
  }
}

The login handler no longer calls bcrypt.compare directly. It delegates to the script:

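// auth.service.ts (HTTP app, no bcrypt import)
import { UnauthorizedException } from '@nestjs/common'

// ...inside AuthService:
async login(email: string, password: string) {
  const user = await this.users.findByEmail(email)
  if (!user) throw new UnauthorizedException()
  // Delegate the compare to a short-lived child process
  const { valid } = await this.scriptRunner.run('verify-password', {
    candidate: password,
    hash: user.passwordHash,
  })
  if (!valid) throw new UnauthorizedException()
  return this.signToken(user)
}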

ScriptRunner spawns the native script as a child process via child_process.spawn. Each compare runs in its own Node.js process with its own thread pool — completely isolated from the HTTP server.
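A minimal sketch of what a spawn-based runner like ScriptRunner could look like follows. It assumes the script reads one JSON object from stdin and writes one JSON object to stdout, as verify-password.ts does; the function name, the 5-second timeout, and the error handling are assumptions, not the original implementation.

```typescript
import { spawn } from 'node:child_process'

// Spawn a child process, send `input` as JSON on stdin, and parse JSON from stdout.
// Rejects on spawn failure, non-zero exit, invalid JSON, or timeout. (Hypothetical helper.)
function runJsonChild<T>(
  command: string,
  args: string[],
  input: unknown,
  timeoutMs = 5_000,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // Default stdio is 'pipe' for stdin, stdout, and stderr.
    const child = spawn(command, args)
    let stdout = ''
    let stderr = ''
    child.stdout.setEncoding('utf8')
    child.stderr.setEncoding('utf8')
    child.stdout.on('data', (chunk: string) => (stdout += chunk))
    child.stderr.on('data', (chunk: string) => (stderr += chunk))

    // Kill children that hang so a stuck compare can't pin a request forever.
    const timer = setTimeout(() => {
      child.kill('SIGKILL')
      reject(new Error(`child timed out after ${timeoutMs}ms`))
    }, timeoutMs)

    child.on('error', (err) => {
      clearTimeout(timer)
      reject(err)
    })
    child.on('close', (code) => {
      clearTimeout(timer)
      if (code !== 0) return reject(new Error(`child exited with ${code}: ${stderr}`))
      try {
        resolve(JSON.parse(stdout) as T)
      } catch (err) {
        reject(err)
      }
    })

    // A child that exits before reading stdin causes EPIPE; the exit code is reported via 'close'.
    child.stdin.on('error', () => {})
    child.stdin.end(JSON.stringify(input))
  })
}
```

In the API this might be called as runJsonChild<{ valid: boolean }>(process.execPath, ['dist/scripts/verify-password.js'], { candidate, hash }), where the compiled script path is an assumption about your build output. Sending the candidate password over stdin rather than as a command-line argument keeps it out of the process list that ps can show.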

Why this reduces the bottleneck

The improvement isn't magic — it's process isolation:

| | HTTP server (before) | Native script (after) |
|---|---|---|
| Thread pool | Shared with all requests | Dedicated per compare |
| CPU competition | bcrypt vs health checks vs cron | bcrypt only |
| Under 100 VUs | 100 compares queue on 4 threads | API spawns compares in parallel processes |
| API event loop | Blocked waiting on pool | Free for I/O |

Each verify-password script process exits after one compare. The overhead of spawning a process (~10–20ms) is negligible compared to a 6-second queue wait.
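Spawn overhead depends on the image, the Node version, and how much the script imports, so it's worth measuring in your own pod. The sketch below times a bare Node process that does nothing, which gives a lower bound; a script that bootstraps a Nest application context will take longer. The helper names are hypothetical.

```typescript
import { spawn } from 'node:child_process'
import { performance } from 'node:perf_hooks'

// Time how long it takes to start and exit a child Node.js process that does nothing.
// Lower bound only: a real script also pays for loading its modules and DI container.
function timeEmptyNodeSpawn(): Promise<number> {
  return new Promise((resolve, reject) => {
    const start = performance.now()
    const child = spawn(process.execPath, ['-e', '0'], { stdio: 'ignore' })
    child.on('error', reject)
    child.on('close', (code) => {
      if (code === 0) resolve(performance.now() - start)
      else reject(new Error(`node exited with ${code}`))
    })
  })
}

// Median of several sequential runs (upper median for even counts) to smooth out noise.
async function medianSpawnMs(runs = 5): Promise<number> {
  const samples: number[] = []
  for (let i = 0; i < runs; i++) samples.push(await timeEmptyNodeSpawn())
  samples.sort((a, b) => a - b)
  return samples[Math.floor(samples.length / 2)]
}

medianSpawnMs().then((ms) => console.log(`median empty spawn: ${ms.toFixed(1)} ms`))
```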

On Kubernetes, the HTTP deployment stays lean. There's no need for an oversized UV_THREADPOOL_SIZE on the API pod. The script processes inherit the pod's CPU limit but don't block each other inside a single event loop.

Deploying on Kubernetes

Same Docker image, two commands:

# API deployment — handles HTTP only
containers:
  - name: api
    image: my-app:latest
    command: ["node", "dist/main.js"]

# No separate worker deployment needed.
# Script runs on-demand via child_process inside the API pod.

We also tested a variant where the script runs as a long-lived sidecar process (still a NestJS native script, just kept alive). Results were similar, but spawn-per-request was simpler to ship and easier to reason about under load.

Results after the change

Same k6 script, same 100 VUs, same k8s cluster:

| Metric | Before | After |
|---|---|---|
| p95 login latency | 15,000ms+ | 380ms |
| p99 login latency | 30,000ms (timeout) | 650ms |
| Error rate | 34% | 0% |
| bcrypt compare (wall clock) | up to 6,000ms | 90–110ms |
| API pod CPU during test | ~95% | ~30% |

The API pod stopped fighting bcrypt for CPU. Compare time dropped because each verification got its own process instead of waiting in a shared queue.

Lessons I'd pass on

Load-test the auth path separately. Login is not a CRUD endpoint. It has different CPU characteristics and failure modes. A load test that only hits GET /products will miss this entirely.

Don't fix a process isolation problem with UV_THREADPOOL_SIZE alone. Raising the pool helps in dev, but on k8s with CPU limits you're still cramming all concurrent work into one Node.js process. A NestJS native script gives you isolation without a new microservice.

Kubernetes CPU limits multiply every Node.js concurrency footgun. If your pod has 500m CPU, assume you can sustain far fewer parallel bcrypt operations than your thread pool size suggests.

Measure wall-clock time, not just handler time. Our APM showed login handlers taking 6 seconds in total, and we initially blamed the database. Only when we logged timestamps around the compare call did we see the queueing gap.
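A small wrapper is enough to separate the compare from the rest of the handler. The sketch below records wall-clock time around any async call, including time spent queued; the helper name and log format are assumptions. Pairing it with Node's built-in event loop delay histogram can help tell thread-pool queueing (slow compares, low loop delay) apart from a blocked event loop (both high), though CPU throttling can raise both.

```typescript
import { monitorEventLoopDelay, performance } from 'node:perf_hooks'

// Measure the wall-clock duration of an async operation, including any time spent queued.
async function timed<T>(label: string, fn: () => Promise<T>): Promise<{ result: T; ms: number }> {
  const start = performance.now()
  try {
    const result = await fn()
    return { result, ms: performance.now() - start }
  } finally {
    // Logged even when fn throws, so slow failures show up too.
    console.log(`${label}: ${(performance.now() - start).toFixed(1)} ms`)
  }
}

// Sample event loop delay in the background (resolution is in milliseconds).
const loopDelay = monitorEventLoopDelay({ resolution: 10 })
loopDelay.enable()

// Example: a fake 50ms "compare"; setTimeout stands in for bcrypt.compare here.
const fakeCompare = () => new Promise<boolean>((resolve) => setTimeout(() => resolve(true), 50))

timed('compare', fakeCompare).then(({ result, ms }) => {
  loopDelay.disable()
  // Histogram values are reported in nanoseconds.
  console.log({ result, ms: Math.round(ms), loopDelayP99Ms: loopDelay.percentile(99) / 1e6 })
})
```

In the login handler, wrapping only the compare, as in timed('bcrypt.compare', () => bcrypt.compare(password, hash)), keeps database and JWT time out of the number.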

Use NestJS native scripts before reaching for a new service. NestFactory.createApplicationContext() is built for exactly this — reuse your modules and DI, run CPU-heavy work in a separate process, zero HTTP overhead. No Redis queue, no gRPC, no second deployment required.

The pain point in one sentence

Node.js login endpoints using bcrypt will silently queue under concurrent load — and Kubernetes CPU limits turn a thread pool tuning problem into a production outage.

If you're running NestJS on k8s and haven't load-tested login at realistic concurrency, do it this week. The failure mode is invisible until it isn't.
