Skip to content

JAVA-5950 Update Transactions Convenient API with exponential backoff on retries#1899

Open
nhachicha wants to merge 65 commits intomongodb:backpressurefrom
nhachicha:nh/backpressure_convenient_api
Open

JAVA-5950 Update Transactions Convenient API with exponential backoff on retries#1899
nhachicha wants to merge 65 commits intomongodb:backpressurefrom
nhachicha:nh/backpressure_convenient_api

Conversation

@nhachicha
Copy link
Copy Markdown
Collaborator

@nhachicha nhachicha commented Feb 25, 2026

Original PR accidentally closed #1852, it has outstanding review comments for @stIncMale to go over when re-reviewing.

Relevant specification changes:

AI usage and effectiveness

  • Implementation (Sonnet/Opus 4.6): Partially AI-generated but went through many rounds of manual modifications. Tests were AI-generated based on the spec prose test descriptions, then reviewed and adjusted

  • It did identify a potential spec divergence: the implementation wraps non-labeled errors in MongoTimeoutException, which later resulted in a discussion during catch-up and https://jira.mongodb.org/browse/DRIVERS-3436

  • Copilot review 2/13 actionable (both minor nits) see details

Review generated by Claude Opus 4.6 as of commit `4a3d1ae1` on 2026-04-01.

Findings Table — Diff-Only Review vs. PR Context

# Finding Verdict Rationale
1 Non-volatile testJitterSupplier static field — data race risk across threads Valid, tracked Multiple reviewers flagged. Test-only, deferred to JAVA-6079. Should be volatile at minimum.
2 Thread.sleep(backoffMs) may overshoot remaining timeout False positive shortenBy(...).onExpired(...) checks before sleeping; next iteration fails fast if expired. Overshoot bounded by
500ms cap.
3 clearTransactionContextOnError(e) call removed False positive @stIncMale confirmed: commitTransaction calls it internally; the outer call was redundant.
4 Error wrapping obscures original exception type Valid, intentional Spec-mandated (DRIVERS-3391). Labels are now copied from cause to wrapper. Tests verify.
5 Visibility widening of CommandOperationHelper + constants False positive Everything under com.mongodb.internal is internal by definition. Enables cross-package dedup.
6 @VisibleForTesting on timeoutOrAlternative Resolved @stIncMale requested removal; addressed in cherry-picked refactor (60acf51d).
7 TRANSACTION_MAX_MS package-private; hardcoded expected values in test Minor nit Iteration bounds now based on EXPECTED_BACKOFFS_MAX_VALUES.length (adopted). Hardcoded array is an
independent oracle — acceptable.
8 TODO comments in production code Valid, accepted Will be resolved before backpressuremain merge. Blocked on docs PR.
9 testRetryBackoffIsEnforced wall-clock timing sensitivity Valid, mitigated Jitter controlled via setTestJitterSupplier. 500ms tolerance. Spec-mandated prose test.
10 Deleted copyTimeoutContext() — verify no other callers False positive Explicitly requested by @stIncMale. Private constructor also removed as dead code.

Summary: 4 false positives, 2 resolved, 1 valid+intentional, 1 tracked, 1 accepted, 1 minor nit.


Additional Findings from PR Reviewers (not caught in diff-only review)

# Finding Source Status
A calculateTransactionBackoffMs Javadoc said 0-based but implementation is 1-based Copilot, @stIncMale Fixed
B Error labels not copied from wrapped cause to MongoTimeoutException @stIncMale Fixed (d4bc4c70)
C applyMajorityWriteConcernToTransactionOptions called on outer retry @stIncMale Fixed (60acf51d)
D withTransaction code structure doesn't follow spec algorithm ordering @stIncMale Fixed (60acf51d)
E Spec submodule not updated to latest spec tests @stIncMale Deferredmain updated separately
F AI-generated comments cluttering test code @stIncMale Fixed
G MongoTimeoutException message inconsistency (missing period) Copilot Fixed
H timeoutOrAlternative removal + timeoutMsConfigured renaming @stIncMale Fixed (60acf51d)

Remaining Issues to Address

# Issue Severity File(s) Status Tracking
1 testJitterSupplier should be volatile — mutable static read/written across threads without memory visibility guarantee Low (test-only) ExponentialBackoff.java Open — deferred JAVA-6079

|
| 2 | Spec "Note 1" about error propagation needs rework — spec language around propagation/raising is inconsistent | Medium | Spec-side | Open — spec change needed | DRIVERS-3436 |
| 3 | TODO-BACKPRESSURE comments in production code — must be resolved before merging backpressuremain | Low | MongoException.java, ExponentialBackoff.java | Open — blocked on docs PR
(10gen/docs-mongodb-internal#17281) | Implicit |
| 4 | Spec submodule not pointing to latest spec tests — new transactions-convenient-api JSON tests not included | Medium | testing/resources/specifications (submodule) | Open — main updated
(55e1861) but not merged into backpressure | TODO-BACKPRESSURE |
| 5 | Verify spec test withTransaction surfaces a timeout after exhausting transient transaction retries is run | Medium | Test runner config | Open — reminder in PR comments | — |
| 6 | Verify prose tests assert all error labels are copied to wrapping exception | Medium | WithTransactionProseTest.java | Partially addressed — labels copied in code, test coverage completeness not
confirmed | — |
| 7 | OTel tracing terminology inconsistency (finalize vs finish vs stop) | Low (orthogonal) | ClientSessionImpl.java tracing code | Open — no ticket yet | — |

Priority Summary

  • Must resolve before merge to backpressure: Items 5, 6
  • Must resolve before merge to main: Items 3 (TODOs), 4 (submodule)
  • Tracked for future work: Items 1 (JAVA-6079), 2 (DRIVERS-3436), 7

What Looks Good

  • Exponential backoff formula (base=5ms, growth=1.5x, cap=500ms) with jitter is well-designed for transaction retries
  • Replacing ClientSessionClock singleton with SystemNanoTime + Mockito is a significant test infrastructure improvement
  • ExponentialBackoffTest has good coverage of boundary conditions (jitter=0, jitter=1, cap enforcement)
  • backpressure: true handshake flag is a clean protocol extension
  • Error label propagation through timeout wrapping preserves diagnostic information
  • abortIfInTransaction() extraction reduces duplication and improves clarity
  • @stIncMale's refactor of withTransaction (60acf51d) now aligns the code with the spec algorithm, making correctness easier to verify
  • New prose tests (testRetryBackoffIsEnforced, testExponentialBackoffOnTransientError) provide functional validation of backoff behavior

strogiyotec and others added 30 commits February 25, 2026 13:00
…Impl.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…s exceeded (ex operationContext.getTimeoutContext().getReadTimeoutMS())
…tionProseTest.java

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread driver-core/src/main/com/mongodb/MongoException.java
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread driver-sync/src/main/com/mongodb/client/internal/ClientSessionImpl.java Outdated
…Impl.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread driver-sync/src/test/functional/com/mongodb/client/WithTransactionProseTest.java Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@nhachicha nhachicha requested review from katcharov and stIncMale April 1, 2026 17:20
UnknownTransactionCommitResult is retriable in the commit loop if we don't exceed the timeout, so it makes sense to wrap it into a Timeout error if we exceed the timeout and want to throw and return (as described in section 10.1.1)
@stIncMale
Copy link
Copy Markdown
Member

I updated the PR description with the information about DRIVERS-3436 and the JAVA-6160 which the former split into.

@vbabanin vbabanin self-requested a review April 15, 2026 03:54
stIncMale added a commit to stIncMale/mongo-java-driver that referenced this pull request Apr 15, 2026
I manually copied it from mongodb#1899.
Add a dedicated MongoClientException subtype thrown when the convenient transactions API exceeds its overall timeout, replacing the generic MongoTimeoutException wrap in ClientSessionImpl.
Comment thread driver-core/src/main/com/mongodb/WithTransactionTimeoutException.java Outdated
Comment thread driver-core/src/main/com/mongodb/WithTransactionTimeoutException.java Outdated
Comment thread driver-core/src/main/com/mongodb/WithTransactionTimeoutException.java Outdated
Comment thread driver-sync/src/main/com/mongodb/client/internal/ClientSessionImpl.java Outdated
… to the @mongodb.driver.manual taglet.

- Narrow wrapInMongoTimeoutException and wrapInNonTimeoutMsMongoTimeoutException return types from MongoException to MongoClientException.
public class WithTransactionProseTest extends DatabaseTestCase {
private static final long START_TIME_MS = 1L;
private static final long ERROR_GENERATING_INTERVAL = 121000L;
private static final Duration ERROR_GENERATING_INTERVAL = Duration.ofSeconds(120);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] The current name reads like it's describing how often errors happen. Should we use TIMEOUT_EXCEEDING_DURATION?

}

/**
* This test is not from the specification.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Optional] For consistency, should we apply the same naming/annotation to the other two non-spec tests in this class as well? It would keep the class organized and make it clear which tests are spec-driven vs local coverage.

filesCollectionHelper = new CollectionHelper<>(new BsonDocumentCodec(), gridFsFileNamespace);
chunksCollectionHelper = new CollectionHelper<>(new BsonDocumentCodec(), gridFsChunksNamespace);
commandListener = new TestCommandListener();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

try {
for (int attemptNumber = 1; attemptNumber <= EXPECTED_BACKOFFS_MAX_VALUES.length; attemptNumber++) {
long backoff = ExponentialBackoff.calculateTransactionBackoffMs(attemptNumber);
assertEquals(0, backoff, "With jitter=0, backoff should always be 0 ms");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertEquals(0, backoff, "With jitter=0, backoff should always be 0 ms");
assertEquals(0, backoff, "Attempt %d: with jitter=0, backoff should always be 0 ms");

}

@Test
void testCustomJitterWithOne() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other tests use the testCalculateTransactionBackoffMs prefix, which matches the SUT (calculateTransactionBackoffMs). This test also exercises the same method with a different jitter parameter - could we keep the same naming for consistency?

Suggested change
void testCustomJitterWithOne() {
void testCalculateTransactionBackoffMsWithJitterOne() {

}

@Test
void testCustomJitterWithZero() {
Copy link
Copy Markdown
Member

@vbabanin vbabanin Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as in #1899 (comment).

Suggested change
void testCustomJitterWithZero() {
void testCalculateTransactionBackoffMsWithJitterZero() {

Comment on lines +306 to 310
} else if (labelCarryingException.hasErrorLabel(UNKNOWN_TRANSACTION_COMMIT_RESULT_LABEL)) {
throw e;
} else {
throw mongoException;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw e (line 307) and throw mongoException (line 309) throw the same object - mongoException is just e cast to MongoException at line 298. Using different throw statements makes it look like the branches throw different exceptions. Could we make both branches throw e (line 312) for simplicity?

Suggested change
} else if (labelCarryingException.hasErrorLabel(UNKNOWN_TRANSACTION_COMMIT_RESULT_LABEL)) {
throw e;
} else {
throw mongoException;
}
}

Comment on lines +288 to 294
} catch (Throwable e) {
abortIfInTransaction();
throw e;
}
T retVal;
try {
retVal = transactionBody.execute();
Copy link
Copy Markdown
Member

@vbabanin vbabanin Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a separate try here?

As I see it, the separate try around startTransaction (lines 282–291) mirrors the spec’s separation of step 4 (startTransaction error) vs step 7 (callback error). However, merging into a single try should produce identical behavior.

startTransaction only throws client-side validation exceptions (no server I/O, no error labels):
IllegalArgumentException, IllegalStateException and MongoException do not have TransientTransactionError label, which means they fall through to throw e

Both paths end with abortIfInTransaction() and throw e, so the separate try could be removed to simplify.

Suggested change
} catch (Throwable e) {
abortIfInTransaction();
throw e;
}
T retVal;
try {
retVal = transactionBody.execute();
T retVal = transactionBody.execute();

MongoClientException timeoutException = timeoutMsConfigured
? createMongoTimeoutException(cause)
: wrapInNonTimeoutMsMongoTimeoutException(cause);
if (timeoutException != cause) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (timeoutException != cause) {
//TODO-JAVA-6154 constructor should be used.
if (timeoutException != cause) {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants