[{"data":1,"prerenderedAt":1992},["ShallowReactive",2],{"home-posts":3},[4,1504,1656,1851],{"id":5,"title":6,"author":7,"body":8,"date":1480,"description":1481,"draft":1482,"extension":1483,"image":1484,"keywords":1485,"meta":1493,"modified":1480,"navigation":720,"path":1494,"seo":1495,"stem":1496,"tags":1497,"__hash__":1503},"blog\u002Fblog\u002Fbcrypt-login-bottleneck-kubernetes.md","100 Users Log In at Once and Your API Dies — The bcrypt Bottleneck Nobody Warns You About","haile37",{"type":9,"value":10,"toc":1462},"minimark",[11,15,18,31,42,47,50,81,88,91,94,98,101,104,135,138,141,159,170,174,177,191,197,203,247,254,260,264,267,299,302,344,347,350,355,362,367,374,379,386,389,393,400,434,441,445,448,454,466,472,478,485,489,500,503,508,511,635,638,642,645,653,656,845,852,993,999,1116,1130,1134,1140,1200,1207,1213,1217,1220,1291,1294,1298,1301,1373,1376,1380,1390,1399,1405,1415,1424,1428,1434,1437,1441,1458],[12,13,14],"p",{},"We shipped the login endpoint, ran it through QA, and called it done. Single-user tests looked fine. bcrypt with 10 rounds felt reasonable — secure, standard, what every tutorial recommends.",[12,16,17],{},"Then we pointed k6 at it with 100 virtual users and watched the whole authentication path fall apart.",[12,19,20,21,30],{},"Requests that normally returned in 200ms started timing out at 30 seconds. Error rates climbed. Pod CPU graphs looked like a heartbeat monitor having a bad day. 
And the strangest part: when we added logging around the password check alone, ",[22,23,24,25,29],"strong",{},"a single ",[26,27,28],"code",{},"bcrypt.compare"," call was taking up to 6 seconds"," under load — not because bcrypt rounds were wrong, but because the requests were queuing behind each other like cars at a toll booth with one open lane.",[12,32,33,34,37,38,41],{},"This is the story of that bottleneck, why ",[26,35,36],{},"UV_THREADPOOL_SIZE"," wasn't the whole answer, and how we fixed it by moving password verification into a ",[22,39,40],{},"NestJS native script"," — a separate entry point in the same codebase, no new microservice required.",[43,44,46],"h2",{"id":45},"the-setup","The setup",[12,48,49],{},"Our stack was straightforward:",[51,52,53,63,69,75],"ul",{},[54,55,56,59,60],"li",{},[22,57,58],{},"NestJS"," API behind an ingress controller on ",[22,61,62],{},"Kubernetes",[54,64,65,68],{},[22,66,67],{},"PostgreSQL"," for user records",[54,70,71,74],{},[22,72,73],{},"bcrypt"," (cost factor 10) for password hashing at registration and comparison at login",[54,76,77,80],{},[22,78,79],{},"k6"," for load testing before a marketing launch that expected a traffic spike",[12,82,83,84,87],{},"The login flow was equally standard: look up the user by email, fetch the stored hash, run ",[26,85,86],{},"bcrypt.compare(plainPassword, storedHash)",", issue a JWT if it matched.",[12,89,90],{},"In development, this felt fast. On staging with one user clicking login, p95 latency was under 300ms. 
We had two replicas, horizontal pod autoscaling configured, and reasonable CPU limits — 500m request, 1000m limit per pod.",[12,92,93],{},"Confident, we ran the load test.",[43,95,97],{"id":96},"what-100-vus-actually-looked-like","What 100 VUs actually looked like",[12,99,100],{},"The k6 script was simple: 100 virtual users, ramp up over 30 seconds, each executing a login against a pool of test accounts with known passwords.",[12,102,103],{},"Within two minutes:",[51,105,106,112,118,124],{},[54,107,108,111],{},[22,109,110],{},"p95 response time"," exceeded 15 seconds",[54,113,114,117],{},[22,115,116],{},"p99"," hit the 30-second client timeout",[54,119,120,123],{},[22,121,122],{},"HTTP 504"," errors appeared at the ingress layer",[54,125,126,127,131,132],{},"Application logs showed login handlers ",[128,129,130],"em",{},"starting"," but not ",[128,133,134],{},"finishing",[12,136,137],{},"CPU on both pods pegged near their limits. Memory was fine. Database connection pool had headroom. The bottleneck wasn't I\u002FO — it was something blocking inside the Node.js process itself.",[12,139,140],{},"We stripped the login handler down to isolate the cost:",[142,143,144,147,150],"ol",{},[54,145,146],{},"DB lookup only → fast, even under load",[54,148,149],{},"DB lookup + JWT signing → still fine",[54,151,152,153,155,156],{},"DB lookup + ",[26,154,28],{}," → ",[22,157,158],{},"everything collapsed",[12,160,161,162,165,166,169],{},"That's when we measured it: under 100 concurrent login attempts, individual compare operations that took ~80ms in isolation were taking ",[22,163,164],{},"4–6 seconds"," wall-clock time. Not because bcrypt got slower — because they were ",[22,167,168],{},"waiting in a queue",".",[43,171,173],{"id":172},"why-bcrypt-chokes-nodejs-under-concurrency","Why bcrypt chokes Node.js under concurrency",[12,175,176],{},"bcrypt is deliberately slow. That's the feature. 
Each compare is CPU-intensive work designed to make brute-force attacks expensive.",[12,178,179,180,182,183,186,187,190],{},"In Node.js, the ",[26,181,73],{}," npm package offloads this work to ",[22,184,185],{},"libuv's thread pool"," so the main event loop can keep handling other requests — in theory. In practice, that thread pool is ",[22,188,189],{},"shared globally"," across your entire process for all async file I\u002FO, DNS lookups, and native crypto operations that use it.",[12,192,193,194],{},"The default pool size? ",[22,195,196],{},"4 threads.",[12,198,199,200,202],{},"So when 100 login requests arrive and each one calls ",[26,201,28],{},", you get something like this:",[204,205,206,219],"table",{},[207,208,209],"thead",{},[210,211,212,216],"tr",{},[213,214,215],"th",{},"Concurrent compares waiting",[213,217,218],{},"Effective behaviour",[220,221,222,231,239],"tbody",{},[210,223,224,228],{},[225,226,227],"td",{},"1–4",[225,229,230],{},"Compares run in parallel, latency stays low",[210,232,233,236],{},[225,234,235],{},"5–20",[225,237,238],{},"Requests queue, latency grows linearly",[210,240,241,244],{},[225,242,243],{},"50–100",[225,245,246],{},"Queue depth explodes, timeouts everywhere",[12,248,249,250,253],{},"Do the math: if each compare takes 80ms and you have 4 threads, throughput is roughly 50 compares\u002Fsecond. At 100 simultaneous logins, the last request in a batch might wait ",[22,251,252],{},"2 seconds"," just in queue time — before you account for CPU throttling, GC pauses, or other pool consumers.",[12,255,256,257,169],{},"That 6-second compare we measured wasn't bcrypt running for 6 seconds. 
It was ",[22,258,259],{},"80ms of work + 5+ seconds of waiting",[43,261,263],{"id":262},"we-tried-raising-uv_threadpool_size","We tried raising UV_THREADPOOL_SIZE",[12,265,266],{},"The first fix everyone suggests — including Stack Overflow, including me before this incident — is bumping the thread pool:",[268,269,274],"pre",{"className":270,"code":271,"language":272,"meta":273,"style":273},"language-bash shiki shiki-themes github-dark","UV_THREADPOOL_SIZE=128 node dist\u002Fmain.js\n","bash","",[26,275,276],{"__ignoreMap":273},[277,278,281,284,288,292,296],"span",{"class":279,"line":280},"line",1,[277,282,36],{"class":283},"s95oV",[277,285,287],{"class":286},"snl16","=",[277,289,291],{"class":290},"sU2Wk","128",[277,293,295],{"class":294},"svObZ"," node",[277,297,298],{"class":290}," dist\u002Fmain.js\n",[12,300,301],{},"We added it to our Kubernetes deployment manifest:",[268,303,307],{"className":304,"code":305,"language":306,"meta":273,"style":273},"language-yaml shiki shiki-themes github-dark","env:\n  - name: UV_THREADPOOL_SIZE\n    value: \"128\"\n","yaml",[26,308,309,318,333],{"__ignoreMap":273},[277,310,311,315],{"class":279,"line":280},[277,312,314],{"class":313},"s4JwU","env",[277,316,317],{"class":283},":\n",[277,319,321,324,327,330],{"class":279,"line":320},2,[277,322,323],{"class":283},"  - ",[277,325,326],{"class":313},"name",[277,328,329],{"class":283},": ",[277,331,332],{"class":290},"UV_THREADPOOL_SIZE\n",[277,334,336,339,341],{"class":279,"line":335},3,[277,337,338],{"class":313},"    value",[277,340,329],{"class":283},[277,342,343],{"class":290},"\"128\"\n",[12,345,346],{},"It helped. Timeouts dropped. p95 went from 15s to around 4s. Better, but still unacceptable for a login endpoint, and the numbers didn't match what we expected.",[12,348,349],{},"Three problems remained:",[12,351,352],{},[22,353,354],{},"1. CPU limits on Kubernetes",[12,356,357,358,361],{},"More thread pool threads don't create more CPU cores. Our pods were capped at 1 core. 
Throwing 128 threads at 1 core mostly means ",[22,359,360],{},"128 threads competing for the same CPU",", with context-switch overhead on top. Under load, the kernel scheduler became part of the bottleneck.",[12,363,364],{},[22,365,366],{},"2. Every pod has its own pool",[12,368,369,370,373],{},"With 2 replicas, we didn't have a pool of 256 threads — we had ",[22,371,372],{},"two independent pools of 128",", each attached to a pod receiving roughly half the traffic. Scaling horizontally didn't fix the per-process concurrency math.",[12,375,376],{},[22,377,378],{},"3. The HTTP server still shared the pool",[12,380,381,382,385],{},"Our NestJS process wasn't just verifying passwords. It was also serving health checks, handling token refresh endpoints, writing audit logs, and running background cron tasks via ",[26,383,384],{},"@nestjs\u002Fschedule",". All of them competed for the same libuv thread pool. Login traffic could starve everything else — or the reverse.",[12,387,388],{},"We had improved the symptom without fixing the architecture.",[43,390,392],{"id":391},"why-this-hurts-more-on-kubernetes-than-on-a-laptop","Why this hurts more on Kubernetes than on a laptop",[12,394,395,396,399],{},"On a local machine with 8 cores and no CPU limit, ",[26,397,398],{},"UV_THREADPOOL_SIZE=16"," often \"just works\" for moderate load tests. 
Kubernetes adds constraints that make bcrypt's behaviour much worse:",[51,401,402,408,414,424],{},[54,403,404,407],{},[22,405,406],{},"CPU limits"," throttle your process mid-compute, stretching compare times unpredictably",[54,409,410,413],{},[22,411,412],{},"Multiple replicas"," split traffic but don't coordinate CPU-heavy work",[54,415,416,419,420,423],{},[22,417,418],{},"Liveness probes"," still hit ",[26,421,422],{},"\u002Fhealth"," while login hammers the thread pool — we saw health check latency spike during load tests, which almost triggered pod restarts",[54,425,426,429,430,433],{},[22,427,428],{},"Autoscaling on CPU"," kicked in, added a third pod, and briefly made things ",[128,431,432],{},"worse"," as new pods cold-started during the traffic peak",[12,435,436,437,440],{},"The load test that passed on a developer's M1 Mac failed miserably against production-like k8s limits. That gap is worth closing ",[22,438,439],{},"before"," you promise a launch date.",[43,442,444],{"id":443},"what-didnt-work-and-why","What didn't work (and why)",[12,446,447],{},"We tried several intermediate fixes. Each one taught us something:",[12,449,450,453],{},[22,451,452],{},"Lowering bcrypt rounds (12 → 10 → 8)"," — We were already at 10. Dropping to 8 shaved maybe 30% off compare time but didn't solve queueing at 100 VUs. Also a security regression we weren't willing to ship.",[12,455,456,465],{},[22,457,458,459,461,462],{},"Switching to ",[26,460,28],{}," vs ",[26,463,464],{},"compareSync"," — We were already on the async variant. The sync variant skips the thread pool and blocks the event loop instead, so it was never an option under this load pattern.",[12,467,468,471],{},[22,469,470],{},"Rate limiting login"," — Correct for abuse prevention, but the business requirement was handling 100 legitimate concurrent logins during peak events. 
Rate limiting just moved the failure to the user.",[12,473,474,477],{},[22,475,476],{},"Bigger pods (2 CPU, 2Gi memory)"," — Improved throughput, raised cost, still shared the pool with the rest of the app. p95 dropped to ~2s but wasn't reliable under spikes.",[12,479,480,481,484],{},"We needed to ",[22,482,483],{},"stop running bcrypt inside the HTTP server process"," — not tune the same process harder.",[43,486,488],{"id":487},"the-fix-move-bcrypt-to-a-nestjs-native-script","The fix: move bcrypt to a NestJS native script",[12,490,491,492,495,496,499],{},"NestJS supports running code outside the HTTP server through a ",[22,493,494],{},"native script"," — a standalone entry point bootstrapped with ",[26,497,498],{},"NestFactory.createApplicationContext()",". No Express adapter, no port binding, no request middleware. Just your modules, DI, and the logic you need.",[12,501,502],{},"That turned out to be exactly what we needed.",[504,505,507],"h3",{"id":506},"before-bcrypt-inside-the-login-handler","Before: bcrypt inside the login handler",[12,509,510],{},"The default NestJS pattern puts everything in one process:",[268,512,516],{"className":513,"code":514,"language":515,"meta":273,"style":273},"language-typescript shiki shiki-themes github-dark","\u002F\u002F auth.service.ts — inside the HTTP app\nasync login(email: string, password: string) {\n  const user = await this.users.findByEmail(email)\n  const valid = await bcrypt.compare(password, user.passwordHash) \u002F\u002F blocks the thread pool\n  if (!valid) throw new UnauthorizedException()\n  return this.signToken(user)\n}\n","typescript",[26,517,518,524,535,562,586,613,629],{"__ignoreMap":273},[277,519,520],{"class":279,"line":280},[277,521,523],{"class":522},"sAwPA","\u002F\u002F auth.service.ts — inside the HTTP app\n",[277,525,526,529,532],{"class":279,"line":320},[277,527,528],{"class":283},"async ",[277,530,531],{"class":294},"login",[277,533,534],{"class":283},"(email: string, password: string) 
{\n",[277,536,537,540,544,547,550,553,556,559],{"class":279,"line":335},[277,538,539],{"class":286},"  const",[277,541,543],{"class":542},"sDLfK"," user",[277,545,546],{"class":286}," =",[277,548,549],{"class":286}," await",[277,551,552],{"class":542}," this",[277,554,555],{"class":283},".users.",[277,557,558],{"class":294},"findByEmail",[277,560,561],{"class":283},"(email)\n",[277,563,565,567,570,572,574,577,580,583],{"class":279,"line":564},4,[277,566,539],{"class":286},[277,568,569],{"class":542}," valid",[277,571,546],{"class":286},[277,573,549],{"class":286},[277,575,576],{"class":283}," bcrypt.",[277,578,579],{"class":294},"compare",[277,581,582],{"class":283},"(password, user.passwordHash) ",[277,584,585],{"class":522},"\u002F\u002F blocks the thread pool\n",[277,587,589,592,595,598,601,604,607,610],{"class":279,"line":588},5,[277,590,591],{"class":286},"  if",[277,593,594],{"class":283}," (",[277,596,597],{"class":286},"!",[277,599,600],{"class":283},"valid) ",[277,602,603],{"class":286},"throw",[277,605,606],{"class":286}," new",[277,608,609],{"class":294}," UnauthorizedException",[277,611,612],{"class":283},"()\n",[277,614,616,619,621,623,626],{"class":279,"line":615},6,[277,617,618],{"class":286},"  return",[277,620,552],{"class":542},[277,622,169],{"class":283},[277,624,625],{"class":294},"signToken",[277,627,628],{"class":283},"(user)\n",[277,630,632],{"class":279,"line":631},7,[277,633,634],{"class":283},"}\n",[12,636,637],{},"Under 100 concurrent logins, every request hit the same libuv thread pool. Health checks, cron jobs, and other endpoints shared that pool. 
Everything queued.",[504,639,641],{"id":640},"after-bcrypt-in-a-separate-nestjs-script","After: bcrypt in a separate NestJS script",[12,643,644],{},"We added a second entry point in the same project:",[268,646,651],{"className":647,"code":649,"language":650},[648],"language-text","src\u002F\n  main.ts              ← HTTP API (unchanged entry)\n  scripts\u002F\n    verify-password.ts ← native script (new entry)\n","text",[26,652,649],{"__ignoreMap":273},[12,654,655],{},"The native script bootstraps only what it needs:",[268,657,659],{"className":513,"code":658,"language":515,"meta":273,"style":273},"\u002F\u002F scripts\u002Fverify-password.ts\nasync function bootstrap() {\n  const app = await NestFactory.createApplicationContext(AuthScriptModule, {\n    logger: false,\n  })\n\n  const verifier = app.get(PasswordVerifierService)\n\n  \u002F\u002F Read { hash, candidate } from stdin, write result to stdout\n  const input = JSON.parse(await readStdin())\n  const valid = await verifier.compare(input.candidate, input.hash)\n  process.stdout.write(JSON.stringify({ valid }))\n\n  await app.close()\n}\n",[26,660,661,666,680,700,711,716,722,740,745,751,781,800,822,827,840],{"__ignoreMap":273},[277,662,663],{"class":279,"line":280},[277,664,665],{"class":522},"\u002F\u002F scripts\u002Fverify-password.ts\n",[277,667,668,671,674,677],{"class":279,"line":320},[277,669,670],{"class":286},"async",[277,672,673],{"class":286}," function",[277,675,676],{"class":294}," bootstrap",[277,678,679],{"class":283},"() {\n",[277,681,682,684,687,689,691,694,697],{"class":279,"line":335},[277,683,539],{"class":286},[277,685,686],{"class":542}," app",[277,688,546],{"class":286},[277,690,549],{"class":286},[277,692,693],{"class":283}," NestFactory.",[277,695,696],{"class":294},"createApplicationContext",[277,698,699],{"class":283},"(AuthScriptModule, {\n",[277,701,702,705,708],{"class":279,"line":564},[277,703,704],{"class":283},"    logger: 
",[277,706,707],{"class":542},"false",[277,709,710],{"class":283},",\n",[277,712,713],{"class":279,"line":588},[277,714,715],{"class":283},"  })\n",[277,717,718],{"class":279,"line":615},[277,719,721],{"emptyLinePlaceholder":720},true,"\n",[277,723,724,726,729,731,734,737],{"class":279,"line":631},[277,725,539],{"class":286},[277,727,728],{"class":542}," verifier",[277,730,546],{"class":286},[277,732,733],{"class":283}," app.",[277,735,736],{"class":294},"get",[277,738,739],{"class":283},"(PasswordVerifierService)\n",[277,741,743],{"class":279,"line":742},8,[277,744,721],{"emptyLinePlaceholder":720},[277,746,748],{"class":279,"line":747},9,[277,749,750],{"class":522},"  \u002F\u002F Read { hash, candidate } from stdin, write result to stdout\n",[277,752,754,756,759,761,764,766,769,772,775,778],{"class":279,"line":753},10,[277,755,539],{"class":286},[277,757,758],{"class":542}," input",[277,760,546],{"class":286},[277,762,763],{"class":542}," JSON",[277,765,169],{"class":283},[277,767,768],{"class":294},"parse",[277,770,771],{"class":283},"(",[277,773,774],{"class":286},"await",[277,776,777],{"class":294}," readStdin",[277,779,780],{"class":283},"())\n",[277,782,784,786,788,790,792,795,797],{"class":279,"line":783},11,[277,785,539],{"class":286},[277,787,569],{"class":542},[277,789,546],{"class":286},[277,791,549],{"class":286},[277,793,794],{"class":283}," verifier.",[277,796,579],{"class":294},[277,798,799],{"class":283},"(input.candidate, input.hash)\n",[277,801,803,806,809,811,814,816,819],{"class":279,"line":802},12,[277,804,805],{"class":283},"  process.stdout.",[277,807,808],{"class":294},"write",[277,810,771],{"class":283},[277,812,813],{"class":542},"JSON",[277,815,169],{"class":283},[277,817,818],{"class":294},"stringify",[277,820,821],{"class":283},"({ valid }))\n",[277,823,825],{"class":279,"line":824},13,[277,826,721],{"emptyLinePlaceholder":720},[277,828,830,833,835,838],{"class":279,"line":829},14,[277,831,832],{"class":286},"  
await",[277,834,733],{"class":283},[277,836,837],{"class":294},"close",[277,839,612],{"class":283},[277,841,843],{"class":279,"line":842},15,[277,844,634],{"class":283},[12,846,847,848,851],{},"Build it as a separate output in ",[26,849,850],{},"nest-cli.json",":",[268,853,857],{"className":854,"code":855,"language":856,"meta":273,"style":273},"language-json shiki shiki-themes github-dark","{\n  \"compilerOptions\": {\n    \"assets\": [],\n    \"plugins\": []\n  },\n  \"projects\": {\n    \"api\": { \"type\": \"application\", \"root\": \"src\", \"entryFile\": \"main\" },\n    \"verify-password\": {\n      \"type\": \"application\",\n      \"root\": \"src\",\n      \"entryFile\": \"scripts\u002Fverify-password\"\n    }\n  }\n}\n","json",[26,858,859,864,872,880,888,893,900,940,947,958,969,979,984,989],{"__ignoreMap":273},[277,860,861],{"class":279,"line":280},[277,862,863],{"class":283},"{\n",[277,865,866,869],{"class":279,"line":320},[277,867,868],{"class":542},"  \"compilerOptions\"",[277,870,871],{"class":283},": {\n",[277,873,874,877],{"class":279,"line":335},[277,875,876],{"class":542},"    \"assets\"",[277,878,879],{"class":283},": [],\n",[277,881,882,885],{"class":279,"line":564},[277,883,884],{"class":542},"    \"plugins\"",[277,886,887],{"class":283},": []\n",[277,889,890],{"class":279,"line":588},[277,891,892],{"class":283},"  },\n",[277,894,895,898],{"class":279,"line":615},[277,896,897],{"class":542},"  \"projects\"",[277,899,871],{"class":283},[277,901,902,905,908,911,913,916,919,922,924,927,929,932,934,937],{"class":279,"line":631},[277,903,904],{"class":542},"    \"api\"",[277,906,907],{"class":283},": { ",[277,909,910],{"class":542},"\"type\"",[277,912,329],{"class":283},[277,914,915],{"class":290},"\"application\"",[277,917,918],{"class":283},", 
",[277,920,921],{"class":542},"\"root\"",[277,923,329],{"class":283},[277,925,926],{"class":290},"\"src\"",[277,928,918],{"class":283},[277,930,931],{"class":542},"\"entryFile\"",[277,933,329],{"class":283},[277,935,936],{"class":290},"\"main\"",[277,938,939],{"class":283}," },\n",[277,941,942,945],{"class":279,"line":742},[277,943,944],{"class":542},"    \"verify-password\"",[277,946,871],{"class":283},[277,948,949,952,954,956],{"class":279,"line":747},[277,950,951],{"class":542},"      \"type\"",[277,953,329],{"class":283},[277,955,915],{"class":290},[277,957,710],{"class":283},[277,959,960,963,965,967],{"class":279,"line":753},[277,961,962],{"class":542},"      \"root\"",[277,964,329],{"class":283},[277,966,926],{"class":290},[277,968,710],{"class":283},[277,970,971,974,976],{"class":279,"line":783},[277,972,973],{"class":542},"      \"entryFile\"",[277,975,329],{"class":283},[277,977,978],{"class":290},"\"scripts\u002Fverify-password\"\n",[277,980,981],{"class":279,"line":802},[277,982,983],{"class":283},"    }\n",[277,985,986],{"class":279,"line":824},[277,987,988],{"class":283},"  }\n",[277,990,991],{"class":279,"line":829},[277,992,634],{"class":283},[12,994,995,996,998],{},"The login handler no longer calls ",[26,997,28],{}," directly. 
It delegates to the script:",[268,1000,1002],{"className":513,"code":1001,"language":515,"meta":273,"style":273},"\u002F\u002F auth.service.ts — HTTP app, no bcrypt import\nasync login(email: string, password: string) {\n  const user = await this.users.findByEmail(email)\n  const { valid } = await this.scriptRunner.run('verify-password', {\n    candidate: password,\n    hash: user.passwordHash,\n  })\n  if (!valid) throw new UnauthorizedException()\n  return this.signToken(user)\n}\n",[26,1003,1004,1009,1017,1035,1068,1073,1078,1082,1100,1112],{"__ignoreMap":273},[277,1005,1006],{"class":279,"line":280},[277,1007,1008],{"class":522},"\u002F\u002F auth.service.ts — HTTP app, no bcrypt import\n",[277,1010,1011,1013,1015],{"class":279,"line":320},[277,1012,528],{"class":283},[277,1014,531],{"class":294},[277,1016,534],{"class":283},[277,1018,1019,1021,1023,1025,1027,1029,1031,1033],{"class":279,"line":335},[277,1020,539],{"class":286},[277,1022,543],{"class":542},[277,1024,546],{"class":286},[277,1026,549],{"class":286},[277,1028,552],{"class":542},[277,1030,555],{"class":283},[277,1032,558],{"class":294},[277,1034,561],{"class":283},[277,1036,1037,1039,1042,1045,1048,1050,1052,1054,1057,1060,1062,1065],{"class":279,"line":564},[277,1038,539],{"class":286},[277,1040,1041],{"class":283}," { ",[277,1043,1044],{"class":542},"valid",[277,1046,1047],{"class":283}," } ",[277,1049,287],{"class":286},[277,1051,549],{"class":286},[277,1053,552],{"class":542},[277,1055,1056],{"class":283},".scriptRunner.",[277,1058,1059],{"class":294},"run",[277,1061,771],{"class":283},[277,1063,1064],{"class":290},"'verify-password'",[277,1066,1067],{"class":283},", {\n",[277,1069,1070],{"class":279,"line":588},[277,1071,1072],{"class":283},"    candidate: password,\n",[277,1074,1075],{"class":279,"line":615},[277,1076,1077],{"class":283},"    hash: 
user.passwordHash,\n",[277,1079,1080],{"class":279,"line":631},[277,1081,715],{"class":283},[277,1083,1084,1086,1088,1090,1092,1094,1096,1098],{"class":279,"line":742},[277,1085,591],{"class":286},[277,1087,594],{"class":283},[277,1089,597],{"class":286},[277,1091,600],{"class":283},[277,1093,603],{"class":286},[277,1095,606],{"class":286},[277,1097,609],{"class":294},[277,1099,612],{"class":283},[277,1101,1102,1104,1106,1108,1110],{"class":279,"line":747},[277,1103,618],{"class":286},[277,1105,552],{"class":542},[277,1107,169],{"class":283},[277,1109,625],{"class":294},[277,1111,628],{"class":283},[277,1113,1114],{"class":279,"line":753},[277,1115,634],{"class":283},[12,1117,1118,1121,1122,1125,1126,1129],{},[26,1119,1120],{},"ScriptRunner"," spawns the native script as a child process via ",[26,1123,1124],{},"child_process.spawn",". Each compare runs in its ",[22,1127,1128],{},"own Node.js process with its own thread pool"," — completely isolated from the HTTP server.",[504,1131,1133],{"id":1132},"why-this-reduces-the-bottleneck","Why this reduces the bottleneck",[12,1135,1136,1137,851],{},"The improvement isn't magic — it's ",[22,1138,1139],{},"process isolation",[204,1141,1142,1154],{},[207,1143,1144],{},[210,1145,1146,1148,1151],{},[213,1147],{},[213,1149,1150],{},"HTTP server (before)",[213,1152,1153],{},"Native script (after)",[220,1155,1156,1167,1178,1189],{},[210,1157,1158,1161,1164],{},[225,1159,1160],{},"Thread pool",[225,1162,1163],{},"Shared with all requests",[225,1165,1166],{},"Dedicated per compare",[210,1168,1169,1172,1175],{},[225,1170,1171],{},"CPU competition",[225,1173,1174],{},"bcrypt vs health checks vs cron",[225,1176,1177],{},"bcrypt only",[210,1179,1180,1183,1186],{},[225,1181,1182],{},"Under 100 VUs",[225,1184,1185],{},"100 compares queue on 4 threads",[225,1187,1188],{},"API spawns compares in parallel processes",[210,1190,1191,1194,1197],{},[225,1192,1193],{},"API event loop",[225,1195,1196],{},"Blocked waiting on 
pool",[225,1198,1199],{},"Free for I\u002FO",[12,1201,1202,1203,1206],{},"Each ",[26,1204,1205],{},"verify-password"," script process exits after one compare. The overhead of spawning a process (~10–20ms) is negligible compared to a 6-second queue wait.",[12,1208,1209,1210,1212],{},"On Kubernetes, the HTTP deployment stays lean. No need to oversize ",[26,1211,36],{}," on the API pod. The script processes inherit the pod's CPU limit but don't block each other inside a single event loop.",[504,1214,1216],{"id":1215},"deploying-on-kubernetes","Deploying on Kubernetes",[12,1218,1219],{},"Same Docker image, two commands:",[268,1221,1223],{"className":304,"code":1222,"language":306,"meta":273,"style":273},"# API deployment — handles HTTP only\ncontainers:\n  - name: api\n    image: my-app:latest\n    command: [\"node\", \"dist\u002Fmain.js\"]\n\n# No separate worker deployment needed.\n# Script runs on-demand via child_process inside the API pod.\n",[26,1224,1225,1230,1237,1248,1258,1277,1281,1286],{"__ignoreMap":273},[277,1226,1227],{"class":279,"line":280},[277,1228,1229],{"class":522},"# API deployment — handles HTTP only\n",[277,1231,1232,1235],{"class":279,"line":320},[277,1233,1234],{"class":313},"containers",[277,1236,317],{"class":283},[277,1238,1239,1241,1243,1245],{"class":279,"line":335},[277,1240,323],{"class":283},[277,1242,326],{"class":313},[277,1244,329],{"class":283},[277,1246,1247],{"class":290},"api\n",[277,1249,1250,1253,1255],{"class":279,"line":564},[277,1251,1252],{"class":313},"    image",[277,1254,329],{"class":283},[277,1256,1257],{"class":290},"my-app:latest\n",[277,1259,1260,1263,1266,1269,1271,1274],{"class":279,"line":588},[277,1261,1262],{"class":313},"    command",[277,1264,1265],{"class":283},": 
[",[277,1267,1268],{"class":290},"\"node\"",[277,1270,918],{"class":283},[277,1272,1273],{"class":290},"\"dist\u002Fmain.js\"",[277,1275,1276],{"class":283},"]\n",[277,1278,1279],{"class":279,"line":615},[277,1280,721],{"emptyLinePlaceholder":720},[277,1282,1283],{"class":279,"line":631},[277,1284,1285],{"class":522},"# No separate worker deployment needed.\n",[277,1287,1288],{"class":279,"line":742},[277,1289,1290],{"class":522},"# Script runs on-demand via child_process inside the API pod.\n",[12,1292,1293],{},"We also tested a variant where the script runs as a long-lived sidecar process (still a NestJS native script, just kept alive). Results were similar, but spawn-per-request was simpler to ship and easier to reason about under load.",[504,1295,1297],{"id":1296},"results-after-the-change","Results after the change",[12,1299,1300],{},"Same k6 script, same 100 VUs, same k8s cluster:",[204,1302,1303,1316],{},[207,1304,1305],{},[210,1306,1307,1310,1313],{},[213,1308,1309],{},"Metric",[213,1311,1312],{},"Before",[213,1314,1315],{},"After",[220,1317,1318,1329,1340,1351,1362],{},[210,1319,1320,1323,1326],{},[225,1321,1322],{},"p95 login latency",[225,1324,1325],{},"15,000ms+",[225,1327,1328],{},"380ms",[210,1330,1331,1334,1337],{},[225,1332,1333],{},"p99 login latency",[225,1335,1336],{},"30,000ms (timeout)",[225,1338,1339],{},"650ms",[210,1341,1342,1345,1348],{},[225,1343,1344],{},"Error rate",[225,1346,1347],{},"34%",[225,1349,1350],{},"0%",[210,1352,1353,1356,1359],{},[225,1354,1355],{},"bcrypt compare (wall clock)",[225,1357,1358],{},"up to 6,000ms",[225,1360,1361],{},"90–110ms",[210,1363,1364,1367,1370],{},[225,1365,1366],{},"API pod CPU during test",[225,1368,1369],{},"~95%",[225,1371,1372],{},"~30%",[12,1374,1375],{},"The API pod stopped fighting bcrypt for CPU. 
Compare time dropped because each verification got its own process instead of waiting in a shared queue.",[43,1377,1379],{"id":1378},"lessons-id-pass-on","Lessons I'd pass on",[12,1381,1382,1385,1386,1389],{},[22,1383,1384],{},"Load-test the auth path separately."," Login is not a CRUD endpoint. It has different CPU characteristics and failure modes. A load test that only hits ",[26,1387,1388],{},"GET \u002Fproducts"," will miss this entirely.",[12,1391,1392,1398],{},[22,1393,1394,1395,1397],{},"Don't fix a process isolation problem with ",[26,1396,36],{}," alone."," Raising the pool helps in dev, but on k8s with CPU limits you're still cramming all concurrent work into one Node.js process. A NestJS native script gives you isolation without a new microservice.",[12,1400,1401,1404],{},[22,1402,1403],{},"Kubernetes CPU limits multiply every Node.js concurrency footgun."," If your pod has 500m CPU, assume you can sustain far fewer parallel bcrypt operations than your thread pool size suggests.",[12,1406,1407,1410,1411,1414],{},[22,1408,1409],{},"Measure wall-clock time, not just handler time."," Our APM showed login handlers taking 6 seconds total, and we initially blamed the database. Only when we logged timestamps ",[128,1412,1413],{},"around"," the compare call did we see the queueing gap.",[12,1416,1417,1420,1421,1423],{},[22,1418,1419],{},"Use NestJS native scripts before reaching for a new service."," ",[26,1422,498],{}," is built for exactly this — reuse your modules and DI, run CPU-heavy work in a separate process, zero HTTP overhead. 
No Redis queue, no gRPC, no second deployment required.",[43,1425,1427],{"id":1426},"the-pain-point-in-one-sentence","The pain point in one sentence",[1429,1430,1431],"blockquote",{},[12,1432,1433],{},"Node.js login endpoints using bcrypt will silently queue under concurrent load — and Kubernetes CPU limits turn a thread pool tuning problem into a production outage.",[12,1435,1436],{},"If you're running NestJS on k8s and haven't load-tested login at realistic concurrency, do it this week. The failure mode is invisible until it isn't.",[43,1438,1440],{"id":1439},"related-reading","Related reading",[51,1442,1443,1451],{},[54,1444,1445,1450],{},[1446,1447,1449],"a",{"href":1448},"\u002Fblog\u002Fdocker-container-alerts-missing","Docker Containers Die Quietly — And Nobody Gets Paged"," — another production failure that only shows up under real load",[54,1452,1453,1457],{},[1446,1454,1456],{"href":1455},"\u002Fblog\u002Femail-deliverability-pain-points","Why Email Deliverability Is Still Broken in 2025"," — silent degradation patterns in production systems",[1459,1460,1461],"style",{},"html pre.shiki code .s95oV, html code.shiki .s95oV{--shiki-default:#E1E4E8}html pre.shiki code .snl16, html code.shiki .snl16{--shiki-default:#F97583}html pre.shiki code .sU2Wk, html code.shiki .sU2Wk{--shiki-default:#9ECBFF}html pre.shiki code .svObZ, html code.shiki .svObZ{--shiki-default:#B392F0}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html pre.shiki code .s4JwU, html code.shiki .s4JwU{--shiki-default:#85E89D}html pre.shiki code .sAwPA, html code.shiki 
.sAwPA{--shiki-default:#6A737D}html pre.shiki code .sDLfK, html code.shiki .sDLfK{--shiki-default:#79B8FF}",{"title":273,"searchDepth":320,"depth":320,"links":1463},[1464,1465,1466,1467,1468,1469,1470,1477,1478,1479],{"id":45,"depth":320,"text":46},{"id":96,"depth":320,"text":97},{"id":172,"depth":320,"text":173},{"id":262,"depth":320,"text":263},{"id":391,"depth":320,"text":392},{"id":443,"depth":320,"text":444},{"id":487,"depth":320,"text":488,"children":1471},[1472,1473,1474,1475,1476],{"id":506,"depth":335,"text":507},{"id":640,"depth":335,"text":641},{"id":1132,"depth":335,"text":1133},{"id":1215,"depth":335,"text":1216},{"id":1296,"depth":335,"text":1297},{"id":1378,"depth":320,"text":1379},{"id":1426,"depth":320,"text":1427},{"id":1439,"depth":320,"text":1440},"2026-06-14","Login load tests at 100 VUs timed out on Kubernetes. A single bcrypt.compare took 6 seconds. UV_THREADPOOL_SIZE wasn't enough. We fixed it by moving bcrypt out of the HTTP server into a NestJS native script.",false,"md",null,[1486,36,1487,1488,1489,1490,1491,1492],"bcrypt bottleneck","NestJS performance","Kubernetes load test","login timeout","Node.js thread pool","password hashing","k6 load testing",{},"\u002Fblog\u002Fbcrypt-login-bottleneck-kubernetes",{"title":6,"description":1481},"blog\u002Fbcrypt-login-bottleneck-kubernetes",[1498,1499,1500,1501,1502],"nestjs","nodejs","kubernetes","performance","pain-point","gsQXKPLuX2PP-7lAe9VVgGxe6ICSv75DEg5xkpKXt90",{"id":1505,"title":1506,"author":7,"body":1507,"date":1638,"description":1639,"draft":1482,"extension":1483,"image":1484,"keywords":1640,"meta":1647,"modified":1638,"navigation":720,"path":1648,"seo":1649,"stem":1650,"tags":1651,"__hash__":1655},"blog\u002Fblog\u002Fshopify-app-review-research.md","What Shopify Merchants Actually Complain About (And What Apps Don't 
Fix)",{"type":9,"value":1508,"toc":1626},[1509,1512,1515,1519,1533,1537,1541,1548,1551,1555,1558,1562,1565,1569,1576,1580,1585,1588,1592,1595,1606,1609,1617,1619],[12,1510,1511],{},"Market research doesn't have to mean expensive surveys or user interviews. Sometimes the signal is sitting in public, screaming at you — in 1-star app reviews and Shopify community forum threads.",[12,1513,1514],{},"I spent two weeks doing exactly this. Here's what I found.",[43,1516,1518],{"id":1517},"the-methodology","The methodology",[142,1520,1521,1524,1527,1530],{},[54,1522,1523],{},"Scraped the \"most reviewed\" categories on the Shopify App Store.",[54,1525,1526],{},"Filtered to apps with 100+ reviews and at least 15% 1-star ratings.",[54,1528,1529],{},"Read every 1-star review across 30 apps in three categories: inventory, automation, and loyalty.",[54,1531,1532],{},"Grouped complaints into themes.",[43,1534,1536],{"id":1535},"the-recurring-themes","The recurring themes",[504,1538,1540],{"id":1539},"_1-it-worked-until-my-store-scaled","1. \"It worked until my store scaled\"",[12,1542,1543,1544,1547],{},"The most common 1-star complaint isn't that apps are broken — it's that they ",[22,1545,1546],{},"break at scale",". An app that syncs inventory works fine at 500 SKUs. At 5,000 SKUs it times out, duplicates records, or silently skips items.",[12,1549,1550],{},"Merchants don't find out until a customer orders something out of stock.",[504,1552,1554],{"id":1553},"_2-automation-that-requires-a-developer-to-maintain","2. Automation that requires a developer to maintain",[12,1556,1557],{},"Shopify Flow is powerful. But the moment you need a conditional rule more complex than \"if order > $X, add tag Y,\" most merchants hit a wall. They either pay a developer to set it up (who then becomes a dependency) or they abandon automation entirely.",[504,1559,1561],{"id":1560},"_3-analytics-that-show-the-past-but-not-the-future","3. 
Analytics that show the past but not the future",[12,1563,1564],{},"Almost every analytics app shows you what happened last month. Very few help you act on what's likely to happen next month, such as predicting inventory stockouts 3 weeks before Black Friday.",[504,1566,1568],{"id":1567},"_4-loyalty-programs-that-feel-transactional","4. Loyalty programs that feel transactional",[12,1570,1571,1572,1575],{},"Merchants want customers to feel ",[128,1573,1574],{},"valued",". Most loyalty apps send a generic \"You have 150 points!\" email. Merchants want personalization but get a points ledger with a mail merge.",[43,1577,1579],{"id":1578},"the-biggest-underserved-niche-i-found","The biggest underserved niche I found",[12,1581,1582],{},[22,1583,1584],{},"Inventory forecasting for mid-size merchants (100–2,000 SKUs).",[12,1586,1587],{},"Enterprise tools like Inventory Planner exist and cost $299+\u002Fmonth. Spreadsheets are what everyone else uses. There's a gap in the $49–$99\u002Fmonth range for something smart enough to be useful but simple enough to not require an analyst.",[43,1589,1591],{"id":1590},"what-im-validating-next","What I'm validating next",[12,1593,1594],{},"I'm scoping a small Shopify app in this space. The core feature is a weekly email report that tells you:",[51,1596,1597,1600,1603],{},[54,1598,1599],{},"Which products will stock out in the next 30 days based on current velocity.",[54,1601,1602],{},"Which products are overstocked and tying up cash.",[54,1604,1605],{},"One-click link to reorder from your supplier.",[12,1607,1608],{},"No dashboard. No onboarding call. Just a useful email, once a week.",[12,1610,1611,1612,1616],{},"If you run a Shopify store with at least 50 SKUs, ",[1446,1613,1615],{"href":1614},"mailto:hi@haile37.com","I'd love to talk",". 
I'm looking for 5 beta testers to validate this before I write a line of code.",[43,1618,1440],{"id":1439},[51,1620,1621],{},[54,1622,1623,1625],{},[1446,1624,1456],{"href":1455}," — how silent failures show up in other domains",{"title":273,"searchDepth":320,"depth":320,"links":1627},[1628,1629,1635,1636,1637],{"id":1517,"depth":320,"text":1518},{"id":1535,"depth":320,"text":1536,"children":1630},[1631,1632,1633,1634],{"id":1539,"depth":335,"text":1540},{"id":1553,"depth":335,"text":1554},{"id":1560,"depth":335,"text":1561},{"id":1567,"depth":335,"text":1568},{"id":1578,"depth":320,"text":1579},{"id":1590,"depth":320,"text":1591},{"id":1439,"depth":320,"text":1440},"2025-06-18","Two weeks reading Shopify forums and 1-star app reviews revealed four recurring merchant pain points — and one underserved niche in inventory forecasting.",[1641,1642,1643,1644,1645,1646],"Shopify apps","Shopify merchant pain points","inventory forecasting","Shopify App Store reviews","ecommerce automation","stockout prediction",{},"\u002Fblog\u002Fshopify-app-review-research",{"title":1506,"description":1639},"blog\u002Fshopify-app-review-research",[1652,1653,1654,1502],"shopify","ecommerce","research","KrGGPU5EbU3dhMDwd5keUb-EEX9BLohwM0tT0aRZC7g",{"id":1657,"title":1449,"author":7,"body":1658,"date":1833,"description":1834,"draft":1482,"extension":1483,"image":1484,"keywords":1835,"meta":1843,"modified":1833,"navigation":720,"path":1448,"seo":1844,"stem":1845,"tags":1846,"__hash__":1850},"blog\u002Fblog\u002Fdocker-container-alerts-missing.md",{"type":9,"value":1659,"toc":1826},[1660,1663,1680,1683,1687,1697,1703,1709,1755,1759,1779,1782,1784,1789,1793,1796,1807,1810,1817,1819],[12,1661,1662],{},"Here's a scenario that plays out in almost every small engineering team I've spoken with:",[142,1664,1665,1668,1671,1674,1677],{},[54,1666,1667],{},"A container OOMKills at 2 AM.",[54,1669,1670],{},"Docker restarts it automatically.",[54,1672,1673],{},"The service comes back up, slightly 
degraded.",[54,1675,1676],{},"Nobody is paged because \"the service is technically running.\"",[54,1678,1679],{},"Six hours later, a customer reports stale data.",[12,1681,1682],{},"The restart masked the problem. The alert never fired. And now the team is debugging a ghost.",[43,1684,1686],{"id":1685},"why-this-happens","Why this happens",[12,1688,1689,1690,1693,1694,169],{},"Docker and Docker Compose are excellent for ",[128,1691,1692],{},"running"," containers. They are terrible at ",[128,1695,1696],{},"telling you when something goes wrong",[12,1698,1699,1702],{},[26,1700,1701],{},"docker events"," exists, but nobody is watching it. Health checks exist, but only catch a container that's fully down — not one that's cycling restarts every five minutes.",[12,1704,1705,1706,851],{},"The gap is ",[22,1707,1708],{},"runtime signal without operational visibility",[204,1710,1711,1721],{},[207,1712,1713],{},[210,1714,1715,1718],{},[213,1716,1717],{},"What Docker gives you",[213,1719,1720],{},"What you actually need",[220,1722,1723,1731,1739,1747],{},[210,1724,1725,1728],{},[225,1726,1727],{},"Container status (running\u002Fstopped)",[225,1729,1730],{},"Exit reason + frequency",[210,1732,1733,1736],{},[225,1734,1735],{},"Restart count",[225,1737,1738],{},"Alert when restarts exceed threshold",[210,1740,1741,1744],{},[225,1742,1743],{},"Logs (when you pull them)",[225,1745,1746],{},"Proactive log streaming to Slack",[210,1748,1749,1752],{},[225,1750,1751],{},"Health check pass\u002Ffail",[225,1753,1754],{},"Trend over time",[43,1756,1758],{"id":1757},"who-feels-this-most","Who feels this most",[51,1760,1761,1767,1773],{},[54,1762,1763,1766],{},[22,1764,1765],{},"Indie hackers and small teams"," running self-hosted apps on a single VPS.",[54,1768,1769,1772],{},[22,1770,1771],{},"Early-stage startups"," that moved fast and haven't wired up full observability yet.",[54,1774,1775,1778],{},[22,1776,1777],{},"Freelancers"," managing production servers for 
clients.",[12,1780,1781],{},"Datadog and New Relic solve this — for $500+\u002Fmonth and a two-week integration project. That's not a viable option for a $30\u002Fmonth VPS.",[43,1783,1427],{"id":1426},[1429,1785,1786],{},[12,1787,1788],{},"There's no lightweight, affordable way to get alerted when Docker containers crash, cycle, or degrade — without setting up a full observability stack.",[43,1790,1792],{"id":1791},"what-a-lightweight-alert-tool-looks-like","What a lightweight alert tool looks like",[12,1794,1795],{},"A single binary that:",[51,1797,1798,1801,1804],{},[54,1799,1800],{},"Connects to the Docker socket",[54,1802,1803],{},"Watches for OOMKills, restarts, and exit codes",[54,1805,1806],{},"Sends Slack\u002Femail notifications with context (container name, exit code, last 20 log lines)",[12,1808,1809],{},"No Kubernetes required. No cloud agent. Just install it next to your containers.",[12,1811,1812,1813,1816],{},"If you're managing Docker in production on a VPS, start with restart-count thresholds — even a cron job that alerts when ",[26,1814,1815],{},"docker ps"," shows high restart counts beats complete silence.",[43,1818,1440],{"id":1439},[51,1820,1821],{},[54,1822,1823,1825],{},[1446,1824,1456],{"href":1455}," — another production problem that fails silently",{"title":273,"searchDepth":320,"depth":320,"links":1827},[1828,1829,1830,1831,1832],{"id":1685,"depth":320,"text":1686},{"id":1757,"depth":320,"text":1758},{"id":1426,"depth":320,"text":1427},{"id":1791,"depth":320,"text":1792},{"id":1439,"depth":320,"text":1440},"2025-06-10","OOMKilled, CrashLoopBackOff, exit code 137 — Docker containers fail silently in production. 
Learn why teams miss alerts and what lightweight monitoring looks like.",[1836,1837,1838,1839,1840,1841,1842],"Docker monitoring","OOMKill","container alerts","CrashLoopBackOff","exit code 137","VPS DevOps","Docker restart loop",{},{"title":1449,"description":1834},"blog\u002Fdocker-container-alerts-missing",[1847,1848,1849,1502],"docker","devops","monitoring","ZnGqX2h9tU12drvYNM1OKaEnw1GQqK2tyZsPjhm_6Pg",{"id":1852,"title":1456,"author":7,"body":1853,"date":1975,"description":1976,"draft":1482,"extension":1483,"image":1484,"keywords":1977,"meta":1985,"modified":1975,"navigation":720,"path":1455,"seo":1986,"stem":1987,"tags":1988,"__hash__":1991},"blog\u002Fblog\u002Femail-deliverability-pain-points.md",{"type":9,"value":1854,"toc":1967},[1855,1858,1862,1865,1876,1879,1883,1886,1904,1906,1911,1915,1918,1929,1932,1936,1939,1955,1958,1960],[12,1856,1857],{},"Every week I talk to developers who ship transactional emails and assume the job is done after they paste an SPF record into DNS. Three months later, their password-reset emails land in spam and they have no idea why.",[43,1859,1861],{"id":1860},"the-silent-failure","The silent failure",[12,1863,1864],{},"Unlike a 500 error that fires immediately, email deliverability degrades gradually:",[51,1866,1867,1870,1873],{},[54,1868,1869],{},"A new DMARC policy at a major mailbox provider tightens rejection rules.",[54,1871,1872],{},"A shared IP pool gets flagged because another tenant sent spam.",[54,1874,1875],{},"An SPF record grows past the 10-lookup limit and silently breaks.",[12,1877,1878],{},"None of these send you an alert. You find out when a customer tweets that they never received your invoice.",[43,1880,1882],{"id":1881},"what-the-current-tooling-misses","What the current tooling misses",[12,1884,1885],{},"Existing solutions fall into two camps:",[142,1887,1888,1894],{},[54,1889,1890,1893],{},[22,1891,1892],{},"One-off checkers"," — You paste a domain, get a result, and never check again. 
There's no monitoring.",[54,1895,1896,1899,1900,1903],{},[22,1897,1898],{},"Enterprise mail platforms"," — Postmark, SendGrid, and Mailgun have dashboards, but they only show data for mail ",[128,1901,1902],{},"sent through them",". If your domain is used for phishing by a third party, you won't see it.",[43,1905,1427],{"id":1426},[1429,1907,1908],{},[12,1909,1910],{},"Developers have no way to be alerted when their domain's email authentication breaks — until a customer complains.",[43,1912,1914],{"id":1913},"what-a-solution-looks-like","What a solution looks like",[12,1916,1917],{},"A lightweight background service that:",[51,1919,1920,1923,1926],{},[54,1921,1922],{},"Checks SPF, DKIM, and DMARC records on a schedule.",[54,1924,1925],{},"Alerts via Slack\u002Femail when something changes or breaks.",[54,1927,1928],{},"Shows a clear history so you can correlate policy changes with delivery drops.",[12,1930,1931],{},"Until automated monitoring is in place, the best defense is regular manual checks and a recurring reminder to re-verify your DNS records after any infrastructure change.",[43,1933,1935],{"id":1934},"quick-self-check-right-now","Quick self-check right now",[12,1937,1938],{},"You can verify your current setup using the free tools on this site:",[51,1940,1941,1948],{},[54,1942,1943,1947],{},[1446,1944,1946],{"href":1945},"\u002Ftools\u002Fspf-checker","SPF Checker"," — validate your SPF record and lookup count",[54,1949,1950,1954],{},[1446,1951,1953],{"href":1952},"\u002Ftools\u002Fdmarc-checker","DMARC Checker"," — inspect your DMARC policy and alignment",[12,1956,1957],{},"If either of those reveals something broken, fix it immediately — and schedule a monthly re-check, because DNS records change without warning.",[43,1959,1440],{"id":1439},[51,1961,1962],{},[54,1963,1964,1966],{},[1446,1965,1449],{"href":1448}," — another silent production failure 
pattern",{"title":273,"searchDepth":320,"depth":320,"links":1968},[1969,1970,1971,1972,1973,1974],{"id":1860,"depth":320,"text":1861},{"id":1881,"depth":320,"text":1882},{"id":1426,"depth":320,"text":1427},{"id":1913,"depth":320,"text":1914},{"id":1934,"depth":320,"text":1935},{"id":1439,"depth":320,"text":1440},"2025-06-01","Most developers configure SPF and DMARC once and forget them — until emails start bouncing. Learn why email authentication fails silently and how to monitor it.",[1978,1979,1980,1981,1982,1983,1984],"SPF record","DMARC policy","DKIM","email authentication","transactional email","DNS monitoring","email deliverability",{},{"title":1456,"description":1976},"blog\u002Femail-deliverability-pain-points",[1989,1990,1502],"email","deliverability","oBcj7LiKfxI18Kz6nsIX8noIi7f677a9JsN7N0xDxNo",1781441628654]