100 Users Log In at Once and Your API Dies — The bcrypt Bottleneck Nobody Warns You About

We shipped the login endpoint, ran it through QA, and called it done. Single-user tests looked fine. bcrypt with 10 rounds felt reasonable — secure, standard, what every tutorial recommends.

Then we pointed k6 at it with 100 virtual users and watched the whole authentication path fall apart.

Requests that normally returned in 200ms started timing out at 30 seconds. Error rates climbed. Pod CPU graphs looked like a heartbeat monitor having a bad day. And the strangest part: when we added logging around the password check alone, a single bcrypt.compare call was taking up to 6 seconds under load — not because bcrypt rounds were wrong, but because the requests were queuing behind each other like cars at a toll booth with one open lane.

This is the story of that bottleneck, why UV_THREADPOOL_SIZE wasn't the whole answer, and how we fixed it by moving password verification into a NestJS native script — a separate entry point in the same codebase, no new microservice required.

The setup

Our stack was straightforward:

NestJS API behind an ingress controller on Kubernetes
PostgreSQL for user records
bcrypt (cost factor 10) for password hashing at registration and comparison at login
k6 for load testing before a marketing launch that expected a traffic spike

The login flow was equally standard: look up the user by email, fetch the stored hash, run bcrypt.compare(plainPassword, storedHash), issue a JWT if it matched.

In development, this felt fast. On staging with one user clicking login, p95 latency was under 300ms. We had two replicas, horizontal pod autoscaling configured, and reasonable CPU limits — 500m request, 1000m limit per pod.

Confident, we ran the load test.

What 100 VUs actually looked like

The k6 script was simple: 100 virtual users, ramp up over 30 seconds, each executing a login against a pool of test accounts with known passwords.

Within two minutes:

p95 response time exceeded 15 seconds
p99 hit the 30-second client timeout
HTTP 504 errors appeared at the ingress layer
Application logs showed login handlers starting but not finishing

CPU on both pods pegged near their limits. Memory was fine. Database connection pool had headroom. The bottleneck wasn't I/O — it was something blocking inside the Node.js process itself.

We stripped the login handler down to isolate the cost:

DB lookup only → fast, even under load
DB lookup + JWT signing → still fine
DB lookup + bcrypt.compare → everything collapsed

That's when we measured it: under 100 concurrent login attempts, individual compare operations that took ~80ms in isolation were taking 4–6 seconds wall-clock time. Not because bcrypt got slower — because they were waiting in a queue.

Why bcrypt chokes Node.js under concurrency

bcrypt is deliberately slow. That's the feature. Each compare is CPU-intensive work designed to make brute-force attacks expensive.

In Node.js, the bcrypt npm package offloads this work to libuv's thread pool so the main event loop can keep handling other requests — in theory. In practice, that thread pool is shared globally across your entire process for all async file I/O, DNS lookups, and native crypto operations that use it.

The default pool size? 4 threads.

So when 100 login requests arrive and each one calls bcrypt.compare, you get something like this:

Concurrent compares in flight	Effective behaviour
1–4	Compares run in parallel, latency stays low
5–20	Requests queue, latency grows linearly
50–100	Queue depth explodes, timeouts everywhere

Do the math: if each compare takes 80ms and you have 4 threads, throughput is roughly 50 compares/second. At 100 simultaneous logins, the last request in a batch might wait 2 seconds just in queue time — before you account for CPU throttling, GC pauses, or other pool consumers.

That 6-second compare we measured wasn't bcrypt running for 6 seconds. It was 80ms of work + 5+ seconds of waiting.
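The arithmetic above can be written as a small model. This sketch rests on idealised assumptions: every compare costs the same, each pool thread gets a full CPU core, and nothing else uses the pool. Treat its output as a best case, not a prediction. estimatePool is a hypothetical helper, not part of any library.

```typescript
// Hypothetical back-of-the-envelope model, not a measurement: FIFO queueing on a
// fixed-size thread pool, assuming every compare costs the same time and each
// pool thread gets a full CPU core (no CPU limit, no other pool users).
interface PoolEstimate {
  throughputPerSec: number // compares completed per second at saturation
  lastFinishMs: number // when the last of `concurrent` simultaneous compares finishes
  lastQueueWaitMs: number // how much of that was spent waiting, not hashing
}

function estimatePool(concurrent: number, poolSize: number, compareMs: number): PoolEstimate {
  // Compares run in "waves" of poolSize; the last request lands in the final wave
  const waves = Math.ceil(concurrent / poolSize)
  const lastFinishMs = waves * compareMs
  return {
    throughputPerSec: (poolSize * 1000) / compareMs,
    lastFinishMs,
    lastQueueWaitMs: lastFinishMs - compareMs,
  }
}

// The article's numbers: 100 simultaneous logins, default pool of 4, ~80 ms per compare
const estimate = estimatePool(100, 4, 80)
console.log(estimate) // { throughputPerSec: 50, lastFinishMs: 2000, lastQueueWaitMs: 1920 }
```

With a pool of 128, the same model predicts the last compare finishes at 80 ms. That idealised result is the intuition behind raising UV_THREADPOOL_SIZE, and it depends entirely on the "one full core per thread" assumption.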

We tried raising UV_THREADPOOL_SIZE

The first fix everyone suggests — including Stack Overflow, including me before this incident — is bumping the thread pool:

UV_THREADPOOL_SIZE=128 node dist/main.js

We added it to our Kubernetes deployment manifest:

env:
  - name: UV_THREADPOOL_SIZE
    value: "128"

It helped. Timeouts dropped. p95 went from 15s to around 4s. Better, but still unacceptable for a login endpoint, and the numbers didn't match what we expected.

Three problems remained:

1. CPU limits on Kubernetes

More thread pool threads don't create more CPU cores. Our pods were capped at 1 core. Throwing 128 threads at 1 core mostly means 128 threads competing for the same CPU, with context-switch overhead on top. Under load, the kernel scheduler became part of the bottleneck.

2. Every pod has its own pool

With 2 replicas, we didn't have a pool of 256 threads — we had two independent pools of 128, each attached to a pod receiving roughly half the traffic. Scaling horizontally didn't fix the per-process concurrency math.

3. The HTTP server still shared the pool

Our NestJS process wasn't just verifying passwords. It was also serving health checks, handling token refresh endpoints, writing audit logs, and running background cron tasks via @nestjs/schedule. All of them shared one process and one CPU allowance, and anything that used the libuv thread pool (async fs calls, dns.lookup, async crypto) waited behind bcrypt. Login traffic could starve everything else, or the reverse.

We had improved the symptom without fixing the architecture.
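Problems 1 and 2 can be put into rough numbers. Under a CPU limit, the CPU quota caps how fast CPU-bound compares can complete, whatever the pool size. The sketch below assumes the ~80ms compare time was measured on an unthrottled core and that bcrypt is the only significant CPU consumer in the pod. maxComparesPerSec is a hypothetical helper.

```typescript
// Hypothetical ceiling on CPU-bound compare throughput under a Kubernetes CPU limit.
// Assumes compareMs was measured on an unthrottled core and that bcrypt is the
// only significant CPU consumer in the pod.
function maxComparesPerSec(
  cpuLimitMillicores: number,
  poolSize: number,
  compareMs: number,
  replicas = 1,
): number {
  const cores = cpuLimitMillicores / 1000
  // Parallelism is capped by the smaller of: pool threads, CPU the quota allows
  const effectiveParallelism = Math.min(poolSize, cores)
  const perPod = (effectiveParallelism * 1000) / compareMs
  // Each replica brings its own pool and its own limit, but queues its own share of traffic
  return perPod * replicas
}

console.log(maxComparesPerSec(1000, 4, 80)) // 12.5 (default pool, 1-core limit)
console.log(maxComparesPerSec(1000, 128, 80)) // 12.5 (128 threads don't add CPU)
console.log(maxComparesPerSec(1000, 128, 80, 2)) // 25 (two replicas)
```

This is a ceiling, not a prediction. It ignores GC, the event loop's own CPU use, and CFS throttling granularity. Even so, it shows why the pool size stops mattering once it exceeds the cores the quota allows.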

Why this hurts more on Kubernetes than on a laptop

On a local machine with 8 cores and no CPU limit, UV_THREADPOOL_SIZE=16 often "just works" for moderate load tests. Kubernetes adds constraints that make bcrypt's behaviour much worse:

CPU limits throttle your process mid-compute, stretching compare times unpredictably
Multiple replicas split traffic but don't coordinate CPU-heavy work
Liveness probes still hit /health while login hammers the thread pool — we saw health check latency spike during load tests, which almost triggered pod restarts
Autoscaling on CPU kicked in, added a third pod, and briefly made things worse as new pods cold-started during the traffic peak

The load test that passed on a developer's M1 Mac failed miserably against production-like k8s limits. That gap is worth closing before you promise a launch date.

What didn't work (and why)

We tried several intermediate fixes. Each one taught us something:

Lowering the bcrypt cost factor (10 → 8): Each cost step doubles the hashing work, so cost 8 does roughly a quarter of the work of cost 10. Compares got faster, but queueing at 100 VUs remained, and it's a security regression we weren't willing to ship.

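Async compare vs compareSync: We were already on the async bcrypt.compare. compareSync would have made things worse, not better. It hashes on the main thread, so every login would block the event loop (and with it health checks and every other request) for the full duration of the hash.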

Rate limiting login — Correct for abuse prevention, but the business requirement was handling 100 legitimate concurrent logins during peak events. Rate limiting just moved the failure to the user.

Bigger pods (2 CPU, 2Gi memory) — Improved throughput, raised cost, still shared the pool with the rest of the app. p95 dropped to ~2s but wasn't reliable under spikes.

We needed to stop running bcrypt inside the HTTP server process — not tune the same process harder.
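The compareSync point from the list above is easy to see in a small experiment. This sketch uses Node's built-in crypto.pbkdf2 as a stand-in for bcrypt, on the assumption that the bcrypt package isn't installed. The async pbkdf2 runs on the libuv thread pool like bcrypt.compare does, and pbkdf2Sync runs on the main thread like compareSync does. The experiment measures how late a 0 ms timer fires while each variant is hashing. The iteration count is an arbitrary choice.

```typescript
import { pbkdf2, pbkdf2Sync } from 'node:crypto'
import { performance } from 'node:perf_hooks'

// Stand-in for bcrypt: pbkdf2 (async) runs on the libuv thread pool like bcrypt.compare,
// pbkdf2Sync runs on the main thread like compareSync.
const ITERATIONS = 200_000 // assumption: enough work to take tens of milliseconds

// How late does a 0 ms timer fire while `work` is in progress?
async function timerLagMs(work: () => unknown): Promise<number> {
  const start = performance.now()
  const fired = new Promise<number>((resolve) =>
    setTimeout(() => resolve(performance.now() - start), 0),
  )
  await work()
  return fired
}

const hashSync = () => pbkdf2Sync('password', 'salt', ITERATIONS, 32, 'sha256')

const hashAsync = () =>
  new Promise<Buffer>((resolve, reject) =>
    pbkdf2('password', 'salt', ITERATIONS, 32, 'sha256', (err, key) =>
      err ? reject(err) : resolve(key),
    ),
  )

async function compareLag(): Promise<{ syncLagMs: number; asyncLagMs: number }> {
  const syncLagMs = await timerLagMs(hashSync) // timer can't fire until the sync hash returns
  const asyncLagMs = await timerLagMs(hashAsync) // timer fires while a pool thread hashes
  return { syncLagMs, asyncLagMs }
}

compareLag().then(console.log)
```

Expect the sync lag to be close to the full hash time and the async lag to be a millisecond or two. The exact numbers depend on the machine.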

The fix: move bcrypt to a NestJS native script

NestJS supports running code outside the HTTP server through what we've been calling a native script: a standalone entry point bootstrapped with NestFactory.createApplicationContext() (the NestJS docs call this a "standalone application"). No Express adapter, no port binding, no request middleware. Just your modules, DI, and the logic you need.

That turned out to be exactly what we needed.

The default NestJS pattern puts everything in one process:

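// auth.service.ts: inside the HTTP app
import { UnauthorizedException } from '@nestjs/common'
import * as bcrypt from 'bcrypt'

// AuthService method; this.users and this.signToken are defined elsewhere in the class
async login(email: string, password: string) {
  const user = await this.users.findByEmail(email)
  if (!user) throw new UnauthorizedException()
  // Async, so the event loop stays free, but each call occupies a libuv pool thread (4 by default)
  const valid = await bcrypt.compare(password, user.passwordHash)
  if (!valid) throw new UnauthorizedException()
  return this.signToken(user)
}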

Under 100 concurrent logins, every compare waited for the same libuv thread pool, and health checks, cron jobs, and other endpoints ran in the same process on the same CPU. Everything queued.

After: bcrypt in a separate NestJS script

We added a second entry point in the same project:

src/
  main.ts              ← HTTP API (unchanged entry)
  scripts/
    verify-password.ts ← native script (new entry)

The native script bootstraps only what it needs:

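// scripts/verify-password.ts
import { NestFactory } from '@nestjs/core'
// Module and service names from the article; these import paths are assumed
import { AuthScriptModule } from '../auth/auth-script.module'
import { PasswordVerifierService } from '../auth/password-verifier.service'

// Collect everything the parent process writes to stdin
async function readStdin(): Promise<string> {
  const chunks: Buffer[] = []
  for await (const chunk of process.stdin) chunks.push(chunk as Buffer)
  return Buffer.concat(chunks).toString('utf8')
}

async function bootstrap() {
  // Application context only: no HTTP adapter, no port binding
  const app = await NestFactory.createApplicationContext(AuthScriptModule, {
    logger: false,
  })
  const verifier = app.get(PasswordVerifierService)

  // Read { hash, candidate } from stdin, write { valid } to stdout
  const input = JSON.parse(await readStdin())
  const valid = await verifier.compare(input.candidate, input.hash)
  process.stdout.write(JSON.stringify({ valid }))

  await app.close()
}

bootstrap().catch((err) => {
  console.error(err) // stderr only, so stdout stays parseable JSON
  process.exit(1)
})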

Build it as a separate output in nest-cli.json:

{
  "compilerOptions": {
    "assets": [],
    "plugins": []
  },
  "projects": {
    "api": { "type": "application", "root": "src", "entryFile": "main" },
    "verify-password": {
      "type": "application",
      "root": "src",
      "entryFile": "scripts/verify-password"
    }
  }
}

The login handler no longer calls bcrypt.compare directly. It delegates to the script:

// auth.service.ts: HTTP app, no bcrypt import
async login(email: string, password: string) {
  const user = await this.users.findByEmail(email)
  if (!user) throw new UnauthorizedException()
  // The compare runs in a child process with its own thread pool
  const { valid } = await this.scriptRunner.run('verify-password', {
    candidate: password,
    hash: user.passwordHash,
  })
  if (!valid) throw new UnauthorizedException()
  return this.signToken(user)
}

ScriptRunner spawns the native script as a child process via child_process.spawn. Each compare runs in its own Node.js process with its own thread pool, isolated from the HTTP server's pool, although every child still runs inside the same pod and counts against its CPU limit.
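ScriptRunner itself isn't shown here. Below is a minimal sketch of what it could look like. It assumes the stdin/stdout JSON protocol used by verify-password.ts, compiled scripts at dist/scripts/<name>.js, and a 5-second timeout. runJsonChild and those details are hypothetical, not the original implementation.

```typescript
import { spawn } from 'node:child_process'
import * as path from 'node:path'

// Hypothetical sketch: one JSON payload on stdin, one JSON result on stdout.
function runJsonChild<T>(command: string, args: string[], payload: unknown, timeoutMs = 5000): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const child = spawn(command, args) // stdin, stdout, stderr are all pipes by default
    let stdout = ''
    let stderr = ''

    // Kill a stuck child so a hung compare can't hold the login request forever
    const timer = setTimeout(() => {
      child.kill('SIGKILL')
      reject(new Error(`child timed out after ${timeoutMs}ms`))
    }, timeoutMs)

    child.stdout.setEncoding('utf8').on('data', (chunk: string) => (stdout += chunk))
    child.stderr.setEncoding('utf8').on('data', (chunk: string) => (stderr += chunk))
    child.stdin.on('error', () => {}) // e.g. EPIPE if the child exits early; reported via 'close'

    child.on('error', (err) => {
      clearTimeout(timer)
      reject(err)
    })
    child.on('close', (code) => {
      clearTimeout(timer)
      if (code !== 0) return reject(new Error(`child exited with ${code}: ${stderr.trim()}`))
      try {
        resolve(JSON.parse(stdout) as T)
      } catch (err) {
        reject(err)
      }
    })

    // Write the request and close stdin so the child sees end-of-input
    child.stdin.end(JSON.stringify(payload))
  })
}

class ScriptRunner {
  // Assumed layout: compiled scripts live in dist/scripts/<name>.js
  private readonly scriptsDir = path.join(process.cwd(), 'dist', 'scripts')

  run<T = { valid: boolean }>(name: string, payload: unknown): Promise<T> {
    // process.execPath: reuse the same node binary that runs the API
    return runJsonChild<T>(process.execPath, [path.join(this.scriptsDir, `${name}.js`)], payload)
  }
}
```

In the Nest app, you'd mark ScriptRunner with @Injectable() and register it as a provider so AuthService can inject it. The timeout matters: without it, a wedged child process would hold the login request open until the ingress gave up.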

Why this reduces the bottleneck

The improvement isn't magic — it's process isolation:

	HTTP server (before)	Native script (after)
Thread pool	Shared with all requests	Dedicated per compare
CPU competition	bcrypt vs health checks vs cron	bcrypt only
Under 100 VUs	100 compares queue on 4 threads	API spawns compares in parallel processes
API event loop	Free, but requests stall waiting for a pool thread	Free, with no bcrypt work queued in its pool

Each verify-password script process exits after one compare. Spawning one adds fixed overhead (Node.js startup plus bootstrapping the NestJS application context), but that is small next to a 6-second queue wait.

On Kubernetes, the HTTP deployment stays lean: there's no need to oversize UV_THREADPOOL_SIZE on the API pod. The script processes inherit the pod's CPU limit, but they don't queue behind each other in a single shared thread pool.
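Per-spawn overhead depends on the image, the Node.js version, and especially the pod's CPU limit, so it's worth measuring where the code will actually run. Here is a small sketch that times an empty Node.js process from spawn to exit. It's a lower bound, since the real script also loads NestJS, builds the application context, and loads bcrypt. The helper names are hypothetical.

```typescript
import { spawn } from 'node:child_process'
import { performance } from 'node:perf_hooks'

// Time from spawn() to exit for a trivial Node.js script. Only a floor: a real
// script also loads NestJS, builds the application context, and loads bcrypt.
function spawnOnceMs(): Promise<number> {
  return new Promise<number>((resolve, reject) => {
    const start = performance.now()
    const child = spawn(process.execPath, ['-e', '0'], { stdio: 'ignore' })
    child.on('error', reject)
    child.on('close', (code) =>
      code === 0 ? resolve(performance.now() - start) : reject(new Error(`exit code ${code}`)),
    )
  })
}

// Spawn sequentially and report the median, which is less noisy than a single run
async function medianSpawnMs(runs = 9): Promise<number> {
  const samples: number[] = []
  for (let i = 0; i < runs; i++) samples.push(await spawnOnceMs())
  samples.sort((a, b) => a - b)
  return samples[Math.floor(samples.length / 2)]
}

medianSpawnMs().then((ms) => console.log(`median spawn-to-exit: ${ms.toFixed(1)} ms`))
```

Running it inside a pod with the production CPU limit, rather than on a laptop, gives the number that matters for the spawn-per-request design.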

Deploying on Kubernetes

Same Docker image, one deployment. The script is a second entry point inside the image, spawned on demand by the API:

# API deployment — handles HTTP only
containers:
  - name: api
    image: my-app:latest
    command: ["node", "dist/main.js"]

# No separate worker deployment needed.
# Script runs on-demand via child_process inside the API pod.

We also tested a variant where the script runs as a long-lived sidecar process (still a NestJS native script, just kept alive). Results were similar, but spawn-per-request was simpler to ship and easier to reason about under load.
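The write-up doesn't say how the kept-alive variant was wired. One common shape is sketched here under assumptions: the script stays running as a child process and exchanges newline-delimited JSON over stdin/stdout, with an id on each message so that concurrent requests can share one process. LineJsonWorker and the message format are hypothetical.

```typescript
import { spawn } from 'node:child_process'
import type { ChildProcessWithoutNullStreams } from 'node:child_process'
import { createInterface } from 'node:readline'

// Hypothetical client for a kept-alive script: newline-delimited JSON over stdin/stdout,
// responses matched to requests by id. Message shapes are assumptions.
type Pending = { resolve: (value: unknown) => void; reject: (err: Error) => void }

class LineJsonWorker {
  private nextId = 0
  private readonly pending = new Map<number, Pending>()
  private readonly child: ChildProcessWithoutNullStreams

  constructor(command: string, args: string[]) {
    this.child = spawn(command, args)
    this.child.stderr.pipe(process.stderr) // surface worker logs instead of buffering them

    // Each stdout line is one response: { id, result }
    createInterface({ input: this.child.stdout }).on('line', (line) => {
      const { id, result } = JSON.parse(line) as { id: number; result: unknown }
      const entry = this.pending.get(id)
      if (!entry) return
      this.pending.delete(id)
      entry.resolve(result)
    })

    // If the worker dies, fail everything still in flight
    this.child.on('exit', (code) => {
      this.pending.forEach(({ reject }) => reject(new Error(`worker exited with ${code}`)))
      this.pending.clear()
    })
  }

  request<T>(payload: unknown): Promise<T> {
    const id = this.nextId++
    return new Promise<T>((resolve, reject) => {
      this.pending.set(id, { resolve: (v) => resolve(v as T), reject })
      this.child.stdin.write(JSON.stringify({ id, payload }) + '\n')
    })
  }

  close(): void {
    this.child.stdin.end() // the worker is expected to exit once its stdin closes
  }
}
```

The worker side would read stdin line by line and answer each { id, payload } with { id, result } using the same PasswordVerifierService. One trade-off to keep in mind: a single long-lived process has a single thread pool again (4 threads by default), so under load it can queue the same way the API did unless you run several of them.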

Results after the change

Same k6 script, same 100 VUs, same k8s cluster:

Metric	Before	After
p95 login latency	15,000ms+	380ms
p99 login latency	30,000ms (timeout)	650ms
Error rate	34%	0%
bcrypt compare (wall clock)	up to 6,000ms	90–110ms
API pod CPU during test	~95%	~30%

The API process stopped fighting bcrypt for CPU and thread-pool slots. Compare time dropped because each verification got its own process instead of waiting in a shared queue.

Lessons I'd pass on

Load-test the auth path separately. Login is not a CRUD endpoint. It has different CPU characteristics and failure modes. A load test that only hits GET /products will miss this entirely.

Don't fix a process isolation problem with UV_THREADPOOL_SIZE alone. Raising the pool helps in dev, but on k8s with CPU limits you're still cramming all concurrent work into one Node.js process. A NestJS native script gives you isolation without a new microservice.

Kubernetes CPU limits multiply every Node.js concurrency footgun. If your pod has 500m CPU, assume you can sustain far fewer parallel bcrypt operations than your thread pool size suggests.

Measure wall-clock time, not just handler time. Our APM showed login handlers taking 6 seconds in total, and we initially blamed the database. Only when we logged timestamps around the compare call did we see the queueing gap.

Use NestJS native scripts before reaching for a new service. NestFactory.createApplicationContext() is built for exactly this — reuse your modules and DI, run CPU-heavy work in a separate process, zero HTTP overhead. No Redis queue, no gRPC, no second deployment required.
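The wall-clock lesson is cheap to act on. Here is a minimal sketch of a timing wrapper, assuming you log to stdout. timed is a hypothetical helper name, and performance.now() comes from Node's perf_hooks.

```typescript
import { performance } from 'node:perf_hooks'

// Hypothetical helper: log the wall-clock duration of any async call, including time
// spent queued (e.g. waiting for a libuv pool thread), not just time spent computing.
async function timed<T>(
  label: string,
  fn: () => Promise<T>,
  log: (line: string) => void = console.log,
): Promise<T> {
  const start = performance.now()
  try {
    return await fn()
  } finally {
    // finally: log even when fn throws, so slow failures are visible too
    log(`${label}: ${(performance.now() - start).toFixed(1)}ms`)
  }
}

// In the login handler (bcrypt and user as in the earlier snippet):
// const valid = await timed('bcrypt.compare', () => bcrypt.compare(password, user.passwordHash))
```

Comparing the logged duration under load with the ~80ms single-user compare time is what exposes the queue. The difference is time spent waiting, not hashing.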

The pain point in one sentence

Node.js login endpoints using bcrypt will silently queue under concurrent load — and Kubernetes CPU limits turn a thread pool tuning problem into a production outage.

If you're running NestJS on k8s and haven't load-tested login at realistic concurrency, do it this week. The failure mode is invisible until it isn't.
