After upgrading from apollo-datasource-rest@3.x to @apollo/datasource-rest@6.x, our GraphQL service (~30,000 requests/minute, 27 datasource classes) started experiencing Kubernetes pod restarts. Pods were killed with exit code 137 (SIGKILL) after liveness probes failed - not due to memory exhaustion, but because the Node.js event loop was blocked by synchronous cloneParsedBody calls.
Root Cause
cloneParsedBody (introduced in v5 via commit 609ba1f) runs synchronously on every response return - including every deduplicated cache hit:
// RESTDataSource.ts: deduplication path
if (previousRequestPromise)
  return previousRequestPromise.then((result) =>
    this.cloneDataSourceFetchResult(result, ...) // clones on EVERY dedup hit
  );
For high-fanout GraphQL operations (e.g., a query returning 300 items, each triggering 5+ field resolvers that make deduplicated GET calls), this results in thousands of synchronous clone operations back-to-back on the main thread with no yielding.
At our scale, this blocks the event loop for long enough that HTTP health probes cannot respond within their timeout window (5 seconds), causing the orchestrator to kill the pod.
Evidence
v3 had no equivalent — memoized responses were returned as-is without cloning
Replacing lodash.cloneDeep with structuredClone (as suggested in a comment on PR #270, "use lodash cloneDeep to clone parsed body") reduced overhead but did not resolve the issue - structuredClone is still synchronous
Disabling cloning entirely (return parsedBody) immediately resolved all pod restarts (0 restarts vs. every 28-76 minutes before)
Reproduction
High-fanout GraphQL operation with deduplicate-until-invalidated policy:
A query returns N items (e.g., 300)
Each item has multiple field resolvers
Each field resolver makes a GET call to a datasource (deduplicated)
Each dedup hit triggers cloneParsedBody synchronously
Result: roughly N x (field resolvers per item) x (dedup hits per resolver) synchronous clone calls
The problem scales with both the number of items and the number of field resolvers per item.
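The fan-out described above can be counted with a small simulation. This is only a sketch: `CountingDataSource`, `simulate`, and the `/shared/config` URL are hypothetical names, not part of @apollo/datasource-rest, and `JSON.parse(JSON.stringify(...))` stands in for the library's synchronous deep clone. It assumes every resolver of every item requests the same deduplicated resource.

```typescript
// Hypothetical stand-in for a datasource with a deduplication cache.
// None of these names come from @apollo/datasource-rest.
type Json = Record<string, unknown>;

class CountingDataSource {
  cloneCalls = 0;
  fetchCalls = 0;
  private cache = new Map<string, Promise<Json>>();

  // Stand-in for a synchronous deep clone such as lodash.cloneDeep.
  private clone(body: Json): Json {
    this.cloneCalls += 1;
    return JSON.parse(JSON.stringify(body)) as Json;
  }

  async get(url: string): Promise<Json> {
    const previous = this.cache.get(url);
    if (previous) {
      // Dedup hit: no upstream call, but still one synchronous clone.
      return previous.then((body) => this.clone(body));
    }
    this.fetchCalls += 1;
    // Simulated upstream response; a real datasource would perform a fetch here.
    const pending = Promise.resolve<Json>({ url, payload: [1, 2, 3] });
    this.cache.set(url, pending);
    return pending.then((body) => this.clone(body));
  }
}

async function simulate(items: number, resolversPerItem: number) {
  const ds = new CountingDataSource();
  const calls: Promise<Json>[] = [];
  for (let i = 0; i < items; i++) {
    for (let r = 0; r < resolversPerItem; r++) {
      // Assumption: every resolver of every item asks for the same resource.
      calls.push(ds.get("/shared/config"));
    }
  }
  await Promise.all(calls);
  return { fetchCalls: ds.fetchCalls, cloneCalls: ds.cloneCalls };
}
```

Under these assumptions, `simulate(300, 5)` performs a single upstream fetch but 1,500 clone calls, so the cloning cost grows with the fan-out even though the network cost does not.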
Suggested Fix
The commit message for 609ba1f already acknowledges this:
"There's probably an optimization that can be done to skip the cloning if there was no duplicate operation"
Possible approaches:
Skip cloning when there was no deduplication - if only one consumer exists, cloning is unnecessary
Make cloning opt-in rather than opt-out - let consumers who mutate responses opt into cloning, rather than penalizing all consumers
Use an async clone or copy-on-write pattern - avoid blocking the event loop
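As an illustration of the opt-in approach, here is a minimal sketch of a cache that only clones when the consumer asks for it. `OptInCloneCache` and the `cloneResponses` flag are hypothetical and do not exist in @apollo/datasource-rest; `JSON.parse(JSON.stringify(...))` again stands in for a real deep clone.

```typescript
interface CloneOptions {
  // Hypothetical flag: consumers that mutate responses would set this to true.
  cloneResponses: boolean;
}

class OptInCloneCache {
  private cache = new Map<string, unknown>();
  private options: CloneOptions;

  constructor(options: CloneOptions) {
    this.options = options;
  }

  set(key: string, body: unknown): void {
    this.cache.set(key, body);
  }

  get<T>(key: string): T | undefined {
    const body = this.cache.get(key) as T | undefined;
    if (body === undefined || !this.options.cloneResponses) {
      // Default path: hand out the shared reference, no clone cost.
      return body;
    }
    // Opt-in path: each consumer gets its own copy and may mutate it.
    return JSON.parse(JSON.stringify(body)) as T;
  }
}
```

With `cloneResponses: false`, every consumer receives the same object, which is only safe if none of them mutates it; with `cloneResponses: true`, the behaviour matches the current clone-on-every-hit model.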
Current Workaround
Override cloneParsedBody to return the body as-is:
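A minimal sketch of such an override follows. To keep it self-contained it uses a stand-in base class (`RESTDataSourceStandIn`, hypothetical) in place of `RESTDataSource`; in a real service the subclass would extend `RESTDataSource` from @apollo/datasource-rest, whose `cloneParsedBody` is assumed here to take the parsed body and return a copy of it. The `handOut` helper is also hypothetical and only exists to exercise the hook.

```typescript
// Stand-in for RESTDataSource, reduced to the one hook that matters here.
// Assumption: the real cloneParsedBody deep-clones non-string bodies.
class RESTDataSourceStandIn {
  protected cloneParsedBody<TResult>(parsedBody: TResult): TResult {
    if (typeof parsedBody === "string") return parsedBody;
    return JSON.parse(JSON.stringify(parsedBody)) as TResult;
  }

  // Hypothetical helper: calls the hook the way a response return would.
  handOut<TResult>(parsedBody: TResult): TResult {
    return this.cloneParsedBody(parsedBody);
  }
}

// The workaround: skip cloning and return the parsed body as-is.
class NoCloneDataSource extends RESTDataSourceStandIn {
  protected cloneParsedBody<TResult>(parsedBody: TResult): TResult {
    return parsedBody;
  }
}
```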
This is safe when resolvers do not mutate datasource response objects (which is the common pattern - most resolvers spread/map rather than mutate in place).
Environment
@apollo/datasource-rest@6.4.1
node-fetch@2.7.0