fix: prevent rate limiter bypass via Lua atomic script #593
YashIIT0909 wants to merge 8 commits into cameri:main from fix/rate-limiter
Conversation
🦋 Changeset detected — latest commit: 3fe35ef. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
…s in SlidingWindowRateLimiter
Pull request overview
This PR hardens the SlidingWindowRateLimiter against concurrent-burst bypasses by moving its Redis sliding-window logic into an atomic Lua script (and switching expiry to millisecond precision), addressing the race/collision behavior described in #467.
Changes:
- Replaces multi-call Redis logic in `SlidingWindowRateLimiter.hit()` with a single Redis-evaluated Lua script (atomic ZSET maintenance + unique member generation).
- Adds a raw Lua eval helper to the Redis cache adapter and updates the cache adapter typing/tests to stub `eval`.
- Adds a changeset documenting the patch release.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/utils/sliding-window-rate-limiter.ts | Migrates sliding-window operations into an atomic Lua script and uses PEXPIRE for ms-accurate expiry. |
| src/adapters/redis-adapter.ts | Adds evalRaw alongside the existing SHA-cached eval implementation. |
| src/@types/adapters.ts | Updates ICacheAdapter typing context around eval (formatting/spacing). |
| test/unit/utils/sliding-window-rate-limiter.spec.ts | Updates unit tests to stub cache.eval and assert boolean outcomes. |
| .changeset/metal-snails-prove.md | Patch changeset for the rate-limiter bypass fix. |
```typescript
it('returns true if rate limited', async () => {
  const now = Date.now()
  getRangeFromSortedSetStub.resolves([`${now}:6`, `${now}:4`, `${now}:1`])
  evalStub.resolves(1)

  const actualResult = await rateLimiter.hit('key', 1, { period: 60000, rate: 10 })

  expect(actualResult).to.be.true
})

it('returns false if not rate limited', async () => {
  const now = Date.now()
  getRangeFromSortedSetStub.resolves([`${now}:10`])
  evalStub.resolves(0)

  const actualResult = await rateLimiter.hit('key', 1, { period: 60000, rate: 10 })
```
These tests now only assert the boolean mapping of the Redis result. Given the security impact of this change, it would be better to also assert that cache.eval is invoked with the expected keys/args (period/step/rate) and add a case where the stub resolves to a string (e.g. '1') to ensure the production code handles Redis return types robustly.
…string return types from Redis
…nto fix/rate-limiter
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
```lua
if hits + step > max_rate then
  return 1
end

local base_member = timestamp .. ':' .. step
local member = base_member
local counter = 0
while redis.call('ZSCORE', key, member) do
  counter = counter + 1
  member = base_member .. ':' .. counter
end

redis.call('ZADD', key, timestamp, member)
```
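The collision-avoidance loop above can be mirrored in plain TypeScript, with a `Set` standing in for the ZSET's member names. This is illustrative only — the real logic runs inside Redis — and `uniqueMember` is a hypothetical name, not a function from the PR.

```typescript
// Mirrors the Lua ZSCORE loop: find a member name not already present,
// appending an incrementing counter for same-millisecond collisions.
function uniqueMember(existing: Set<string>, timestamp: number, step: number): string {
  const base = `${timestamp}:${step}`
  let member = base
  let counter = 0
  while (existing.has(member)) { // stands in for redis.call('ZSCORE', key, member)
    counter += 1
    member = `${base}:${counter}`
  }
  return member
}
```

Because every probe-and-insert happens inside one atomic script invocation, no two concurrent hits in the same millisecond can claim the same member name.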
```typescript
it('returns true if rate limited', async () => {
  const now = Date.now()
  getRangeFromSortedSetStub.resolves([`${now}:6`, `${now}:4`, `${now}:1`])
  evalStub.resolves(1)

  const actualResult = await rateLimiter.hit('key', 1, { period: 60000, rate: 10 })

  expect(actualResult).to.be.true
  expect(evalStub).to.have.been.calledOnce
  const args = evalStub.firstCall.args
  expect(args[1]).to.deep.equal(['key'])
  expect(args[2][1]).to.equal('60000') // period
  expect(args[2][2]).to.equal('1') // step
  expect(args[2][3]).to.equal('10') // max_rate
})

it('returns false if not rate limited', async () => {
  const now = Date.now()
  getRangeFromSortedSetStub.resolves([`${now}:10`])
  evalStub.resolves(0)

  const actualResult = await rateLimiter.hit('key', 1, { period: 60000, rate: 10 })

  expect(actualResult).to.be.false
})

it('robustly handles string return types from Redis', async () => {
  evalStub.resolves('1')

  const actualResult = await rateLimiter.hit('key', 1, { period: 60000, rate: 10 })

  expect(actualResult).to.be.true
```
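The boolean mapping these tests exercise can be made explicit with a tiny coercion helper. This is a hypothetical sketch — `toRateLimited` is not a name from this PR — showing one way to handle both numeric and string replies from Redis clients:

```typescript
// Coerce whatever EVAL returns (number 1/0 on most clients, but
// possibly the string '1'/'0' depending on the client) into the
// boolean that hit() promises its callers.
function toRateLimited(result: unknown): boolean {
  return Number(result) > 0
}
```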
@YashIIT0909 The Lua script isn't actually being tested, right? How do we know for sure this change is working as intended?
Description
This PR resolves a critical rate-limiter bypass vulnerability in the SlidingWindowRateLimiter by migrating the sliding-window operations from Node.js `Promise.all` logic into an atomic Lua script executed natively on the Redis client. It accomplishes three crucial fixes:

- Performs the count check and the write in a single atomic Lua script, closing the race window between them.
- Generates collision-free ZSET members via `while redis.call('ZSCORE', key, member)`, entirely avoiding data loss for exact-millisecond bursts without relying on heavy external UUID libraries for uniqueness.
- Switches from the seconds-based `EXPIRE` command to the precisely accurate milliseconds-based `PEXPIRE` command.

Related Issue
Fixes #467
Motivation and Context
Previously, when hundreds of WebSocket connections or REST payloads arrived in the exact same millisecond, Redis answered `0` to the initial grouped read checks before any write actually completed, due to Node's asynchronous event loop. This allowed massive concurrent bursts to completely bypass the configured limits. Furthermore, because concurrent writes were formatted strictly as `timestamp:step`, they overwrote each other as exact-duplicate members in the ZSET, effectively erasing accurate metrics from the cache. Executing the check and write atomically via an eval wrapper serializes throughput in Redis, making the limiter resistant to race-condition exploitation.
How Has This Been Tested?
- All existing unit tests pass (`npm run test:unit`).
- Configured a `5 / minute` threshold locally via Docker Compose.
- Fired `1,000` concurrent payloads inside the exact same millisecond using a WebSocket parallel `Promise.all()` PoC script.
- Observed `995` requests returning `["NOTICE", "rate limited"]` while safely accepting exactly `5` requests, demonstrating robustness against intentional bypass bursts.

Screenshots (if appropriate):
The screenshot is of a checking script that is being run against a limit of 5.
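The 995/5 outcome described above can be reproduced with a small in-process model of the atomic check. This is an illustrative sketch under stated assumptions (a single window, every request in the same millisecond), not the production code; `simulateBurst` is a hypothetical name.

```typescript
// In-process model of an atomic sliding-window check: because the
// count check and the write happen as one step, a same-millisecond
// burst can never observe a stale count.
function simulateBurst(burstSize: number, rate: number): { accepted: number; rejected: number } {
  const window: number[] = [] // timestamps currently inside the window
  let accepted = 0
  let rejected = 0
  const now = 1700000000000 // every request arrives in the same millisecond
  for (let i = 0; i < burstSize; i++) {
    if (window.length + 1 > rate) { // hits + step > max_rate, as in the Lua script
      rejected++
    } else {
      window.push(now) // check and write are one atomic step here
      accepted++
    }
  }
  return { accepted, rejected }
}
```

With `simulateBurst(1000, 5)`, exactly 5 requests are accepted and 995 rejected, matching the behavior the PoC script observed.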
Types of changes
Checklist: