cloneParsedBody blocks event loop under high-fanout operations #433

Description

@cevou

After upgrading from apollo-datasource-rest@3.x to @apollo/datasource-rest@6.x, our GraphQL service (~30,000 requests/minute, 27 datasource classes) started experiencing Kubernetes pod restarts. Pods were killed with exit code 137 (SIGKILL) after liveness probes failed - not due to memory exhaustion, but because the Node.js event loop was blocked by synchronous cloneParsedBody calls.

Root Cause

cloneParsedBody (introduced in v5 via commit 609ba1f) runs synchronously on every response return - including every deduplicated cache hit:

// RESTDataSource.ts — deduplication path
if (previousRequestPromise)
  return previousRequestPromise.then((result) =>
    this.cloneDataSourceFetchResult(result, ...)  // clones on EVERY dedup hit
  );

For high-fanout GraphQL operations (e.g., a query returning 300 items, each triggering 5+ field resolvers that make deduplicated GET calls), this results in thousands of synchronous clone operations back-to-back on the main thread with no yielding.

At our scale, this blocks the event loop long enough that HTTP liveness probes cannot respond within their 5-second timeout, so the orchestrator kills the pod.

Evidence

  • v3 had no equivalent — memoized responses were returned as-is without cloning
  • Replacing lodash.cloneDeep with structuredClone (as suggested in a comment on PR #270, "use lodash cloneDeep to clone parsed body") reduced overhead but did not resolve the issue - structuredClone is still synchronous
  • Disabling cloning entirely (return parsedBody) immediately resolved all pod restarts (0 restarts vs. every 28-76 minutes before)

Reproduction

High-fanout GraphQL operation with deduplicate-until-invalidated policy:

  1. A query returns N items (e.g., 300)
  2. Each item has multiple field resolvers
  3. Each field resolver makes a GET call to a datasource (deduplicated)
  4. Each dedup hit triggers cloneParsedBody synchronously
  5. Result: N × (resolvers per item) × (dedup hits per resolver) synchronous clone calls

The problem scales with both the number of items and the number of field resolvers per item.
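
To make the multiplication concrete, here is a minimal model of the fan-out. It is only a sketch: `jsonClone`, `simulateFanOut`, and the payload shape are made up for illustration, the JSON round-trip merely stands in for whatever deep clone the library performs, and a plain loop replaces the promise callbacks that run the clones in the real service.

```typescript
// Stand-in deep clone for plain JSON data (an assumption; the library's clone may differ).
function jsonClone<T>(body: T): T {
  return JSON.parse(JSON.stringify(body)) as T;
}

// Models one operation: every resolver call is a dedup hit that clones the cached body.
function simulateFanOut(items: number, resolversPerItem: number, cachedBody: unknown) {
  let cloneCalls = 0;
  const start = Date.now();
  for (let item = 0; item < items; item++) {
    for (let resolver = 0; resolver < resolversPerItem; resolver++) {
      jsonClone(cachedBody); // synchronous: nothing else runs until it returns
      cloneCalls++;
    }
  }
  return { cloneCalls, blockedMs: Date.now() - start };
}

// Hypothetical payload: 200 small records standing in for a REST response body.
const payload = Array.from({ length: 200 }, (_, i) => ({
  id: i,
  name: `item-${i}`,
  tags: ["a", "b", "c"],
}));

const fanOut = simulateFanOut(300, 5, payload);
console.log(`clone calls: ${fanOut.cloneCalls}, blocked for ~${fanOut.blockedMs} ms`);
```

With 300 items and 5 resolvers per item the model performs 1,500 clones in a row. The measured time depends on the payload size and the machine, so treat it as an illustration of the shape of the problem rather than a benchmark.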

Suggested Fix

The commit message for 609ba1f already acknowledges this:

"There's probably an optimization that can be done to skip the cloning if there was no duplicate operation"

Possible approaches:

  1. Skip cloning when there was no deduplication - if only one consumer exists, cloning is unnecessary
  2. Make cloning opt-in rather than opt-out - let consumers who mutate responses opt into cloning, rather than penalizing all consumers
  3. Use an async clone or copy-on-write pattern - avoid blocking the event loop
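
As an illustration of approach 2, the sketch below shows what an opt-in flag could look like. Everything in it is hypothetical: `SketchDataSource`, the `cloneResponses` option, and `deliver` are invented names, not part of the @apollo/datasource-rest API, and `deepCopy` is a JSON round-trip used only as a stand-in clone.

```typescript
// Stand-in deep clone for plain JSON data (an assumption, not the library's implementation).
function deepCopy<T>(body: T): T {
  return JSON.parse(JSON.stringify(body)) as T;
}

// Hypothetical base class; `cloneResponses` is an invented option for this sketch.
class SketchDataSource {
  private readonly cloneResponses: boolean;

  constructor(options: { cloneResponses?: boolean } = {}) {
    // Cloning is off unless a consumer explicitly asks for it.
    this.cloneResponses = options.cloneResponses ?? false;
  }

  // Called once per consumer, including every deduplicated hit.
  protected cloneParsedBody<T>(parsedBody: T): T {
    return this.cloneResponses ? deepCopy(parsedBody) : parsedBody;
  }

  // Simulates handing a cached body to one consumer.
  deliver<T>(cachedBody: T): T {
    return this.cloneParsedBody(cachedBody);
  }
}

const cachedBody = { id: 1, tags: ["a"] };

// Default: the consumer receives the cached object itself, with no clone cost.
const sharedBody = new SketchDataSource().deliver(cachedBody);

// Opt-in: a consumer that mutates responses pays for its own isolated copy.
const isolatedBody = new SketchDataSource({ cloneResponses: true }).deliver(cachedBody);
isolatedBody.tags.push("b"); // does not affect cachedBody
```

The trade-off is that the default becomes unsafe for consumers that mutate responses in place, so a change like this would presumably need a major version or a clearly documented option.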

Current Workaround

Override cloneParsedBody to return the body as-is:

override cloneParsedBody<TResult>(parsedBody: TResult): TResult {
  return parsedBody;
}

This is safe when resolvers do not mutate datasource response objects (which is the common pattern - most resolvers spread/map rather than mutate in place).
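
One way to check the "resolvers do not mutate" assumption before relying on this workaround is to freeze response bodies in development or test environments, so that an accidental write throws instead of silently corrupting a shared object. The helper below is a sketch, not part of the library: `deepFreeze` is a hypothetical name, and it only handles plain objects and arrays.

```typescript
// Recursively freezes plain objects and arrays. Already-frozen values are skipped,
// which also keeps the recursion from looping on cyclic structures.
function deepFreeze<T>(value: T): T {
  if (value !== null && typeof value === "object" && !Object.isFrozen(value)) {
    Object.freeze(value);
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      deepFreeze(record[key]);
    }
  }
  return value;
}

// Hypothetical response body shared between several resolvers.
const frozenBody = deepFreeze({ id: 1, tags: ["a"] });

let mutationBlocked = false;
try {
  frozenBody.tags.push("b"); // a resolver mutating in place
} catch {
  mutationBlocked = true; // push on a frozen array throws a TypeError
}
console.log(`mutation blocked: ${mutationBlocked}`);
```

A possible use, assuming the override shown above, is to return `deepFreeze(parsedBody)` from `cloneParsedBody` outside production and the plain `parsedBody` in production, so that mutating resolvers surface in tests without adding per-hit clone cost to live traffic.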

Environment

  • @apollo/datasource-rest@6.4.1
  • node-fetch@2.7.0
  • ~30,000 GraphQL requests/minute
  • 27 datasource classes extending RESTDataSource
