JAVA-5950 Update Transactions Convenient API with exponential backoff on retries#1899
JAVA-5950 Update Transactions Convenient API with exponential backoff on retries#1899nhachicha wants to merge 65 commits intomongodb:backpressurefrom
Conversation
…Impl.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…s exceeded (ex operationContext.getTimeoutContext().getReadTimeoutMS())
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…Impl.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
UnknownTransactionCommitResult is retriable in the commit loop if we don't exceed the timeout, so it makes sense to wrap it into a Timeout error if we exceed the timeout and want to throw and return (as described in section 10.1.1)
…re_convenient_api
|
I updated the PR description with the information about DRIVERS-3436 and the JAVA-6160 which the former split into. |
I manually copied it from mongodb#1899.
Add a dedicated MongoClientException subtype thrown when the convenient transactions API exceeds its overall timeout, replacing the generic MongoTimeoutException wrap in ClientSessionImpl.
… to the @mongodb.driver.manual taglet. - Narrow wrapInMongoTimeoutException and wrapInNonTimeoutMsMongoTimeoutException return types from MongoException to MongoClientException.
| public class WithTransactionProseTest extends DatabaseTestCase { | ||
| private static final long START_TIME_MS = 1L; | ||
| private static final long ERROR_GENERATING_INTERVAL = 121000L; | ||
| private static final Duration ERROR_GENERATING_INTERVAL = Duration.ofSeconds(120); |
There was a problem hiding this comment.
[nit] The current name reads like it's describing how often errors happen. Should we use TIMEOUT_EXCEEDING_DURATION?
| } | ||
|
|
||
| /** | ||
| * This test is not from the specification. |
There was a problem hiding this comment.
[Optional] For consistency, should we apply the same naming/annotation to the other two non-spec tests in this class as well? It would keep the class organized and make it clear which tests are spec-driven vs local coverage.
| filesCollectionHelper = new CollectionHelper<>(new BsonDocumentCodec(), gridFsFileNamespace); | ||
| chunksCollectionHelper = new CollectionHelper<>(new BsonDocumentCodec(), gridFsChunksNamespace); | ||
| commandListener = new TestCommandListener(); | ||
|
|
| try { | ||
| for (int attemptNumber = 1; attemptNumber <= EXPECTED_BACKOFFS_MAX_VALUES.length; attemptNumber++) { | ||
| long backoff = ExponentialBackoff.calculateTransactionBackoffMs(attemptNumber); | ||
| assertEquals(0, backoff, "With jitter=0, backoff should always be 0 ms"); |
There was a problem hiding this comment.
| assertEquals(0, backoff, "With jitter=0, backoff should always be 0 ms"); | |
| assertEquals(0, backoff, "Attempt %d: with jitter=0, backoff should always be 0 ms"); |
| } | ||
|
|
||
| @Test | ||
| void testCustomJitterWithOne() { |
There was a problem hiding this comment.
All other tests use the testCalculateTransactionBackoffMs prefix, which matches the SUT (calculateTransactionBackoffMs). This test also exercises the same method with a different jitter parameter - could we keep the same naming for consistency?
| void testCustomJitterWithOne() { | |
| void testCalculateTransactionBackoffMsWithJitterOne() { |
| } | ||
|
|
||
| @Test | ||
| void testCustomJitterWithZero() { |
There was a problem hiding this comment.
Same as in #1899 (comment).
| void testCustomJitterWithZero() { | |
| void testCalculateTransactionBackoffMsWithJitterZero() { |
| } else if (labelCarryingException.hasErrorLabel(UNKNOWN_TRANSACTION_COMMIT_RESULT_LABEL)) { | ||
| throw e; | ||
| } else { | ||
| throw mongoException; | ||
| } |
There was a problem hiding this comment.
throw e (line 307) and throw mongoException (line 309) throw the same object - mongoException is just e cast to MongoException at line 298. Using different throw statements makes it look like the branches throw different exceptions. Could we make both branches throw e (line 312) for simplicity?
| } else if (labelCarryingException.hasErrorLabel(UNKNOWN_TRANSACTION_COMMIT_RESULT_LABEL)) { | |
| throw e; | |
| } else { | |
| throw mongoException; | |
| } | |
| } |
| } catch (Throwable e) { | ||
| abortIfInTransaction(); | ||
| throw e; | ||
| } | ||
| T retVal; | ||
| try { | ||
| retVal = transactionBody.execute(); |
There was a problem hiding this comment.
Do we need a separate try here?
As I see it, the separate try around startTransaction (lines 282–291) mirrors the spec’s separation of step 4 (startTransaction error) vs step 7 (callback error). However, merging into a single try should produce identical behavior.
startTransaction only throws client-side validation exceptions (no server I/O, no error labels):
IllegalArgumentException, IllegalStateException and MongoException do not have TransientTransactionError label, which means they fall through to throw e
Both paths end with abortIfInTransaction() and throw e, so the separate try could be removed to simplify.
| } catch (Throwable e) { | |
| abortIfInTransaction(); | |
| throw e; | |
| } | |
| T retVal; | |
| try { | |
| retVal = transactionBody.execute(); | |
| T retVal = transactionBody.execute(); |
| MongoClientException timeoutException = timeoutMsConfigured | ||
| ? createMongoTimeoutException(cause) | ||
| : wrapInNonTimeoutMsMongoTimeoutException(cause); | ||
| if (timeoutException != cause) { |
There was a problem hiding this comment.
| if (timeoutException != cause) { | |
| //TODO-JAVA-6154 constructor should be used. | |
| if (timeoutException != cause) { |
Original PR accidentally closed #1852, it has outstanding review comments for @stIncMale to go over when re-reviewing.
Relevant specification changes:
JAVA-5950, JAVA-6046, (JAVA-6093, JAVA-6160), JAVA-6113 (only the part about
transactions-convenient-api)AI usage and effectiveness
Implementation (Sonnet/Opus 4.6): Partially AI-generated but went through many rounds of manual modifications. Tests were AI-generated based on the spec prose test descriptions, then reviewed and adjusted
It did identify a potential spec divergence: the implementation wraps non-labeled errors in MongoTimeoutException, which later resulted in a discussion during catch-up and https://jira.mongodb.org/browse/DRIVERS-3436
Copilot review 2/13 actionable (both minor nits) see details
Review generated by Claude Opus 4.6 as of commit `4a3d1ae1` on 2026-04-01.
Findings Table — Diff-Only Review vs. PR Context
testJitterSupplierstatic field — data race risk across threadsvolatileat minimum.Thread.sleep(backoffMs)may overshoot remaining timeoutshortenBy(...).onExpired(...)checks before sleeping; next iteration fails fast if expired. Overshoot bounded byclearTransactionContextOnError(e)call removedcommitTransactioncalls it internally; the outer call was redundant.CommandOperationHelper+ constantscom.mongodb.internalis internal by definition. Enables cross-package dedup.@VisibleForTestingontimeoutOrAlternative60acf51d).TRANSACTION_MAX_MSpackage-private; hardcoded expected values in testEXPECTED_BACKOFFS_MAX_VALUES.length(adopted). Hardcoded array is anbackpressure→mainmerge. Blocked on docs PR.testRetryBackoffIsEnforcedwall-clock timing sensitivitysetTestJitterSupplier. 500ms tolerance. Spec-mandated prose test.copyTimeoutContext()— verify no other callersSummary: 4 false positives, 2 resolved, 1 valid+intentional, 1 tracked, 1 accepted, 1 minor nit.
Additional Findings from PR Reviewers (not caught in diff-only review)
calculateTransactionBackoffMsJavadoc said 0-based but implementation is 1-basedMongoTimeoutExceptiond4bc4c70)applyMajorityWriteConcernToTransactionOptionscalled on outer retry60acf51d)withTransactioncode structure doesn't follow spec algorithm ordering60acf51d)mainupdated separatelyMongoTimeoutExceptionmessage inconsistency (missing period)timeoutOrAlternativeremoval +timeoutMsConfiguredrenaming60acf51d)Remaining Issues to Address
testJitterSuppliershould bevolatile— mutable static read/written across threads without memory visibility guaranteeExponentialBackoff.java|
| 2 | Spec "Note 1" about error propagation needs rework — spec language around propagation/raising is inconsistent | Medium | Spec-side | Open — spec change needed | DRIVERS-3436 |
| 3 |
TODO-BACKPRESSUREcomments in production code — must be resolved before mergingbackpressure→main| Low |MongoException.java,ExponentialBackoff.java| Open — blocked on docs PR(10gen/docs-mongodb-internal#17281) | Implicit |
| 4 | Spec submodule not pointing to latest spec tests — new
transactions-convenient-apiJSON tests not included | Medium |testing/resources/specifications(submodule) | Open —mainupdated(
55e1861) but not merged intobackpressure| TODO-BACKPRESSURE || 5 | Verify spec test
withTransaction surfaces a timeout after exhausting transient transaction retriesis run | Medium | Test runner config | Open — reminder in PR comments | — || 6 | Verify prose tests assert all error labels are copied to wrapping exception | Medium |
WithTransactionProseTest.java| Partially addressed — labels copied in code, test coverage completeness notconfirmed | — |
| 7 | OTel tracing terminology inconsistency (
finalizevsfinishvsstop) | Low (orthogonal) |ClientSessionImpl.javatracing code | Open — no ticket yet | — |Priority Summary
backpressure: Items 5, 6main: Items 3 (TODOs), 4 (submodule)What Looks Good
ClientSessionClocksingleton withSystemNanoTime+ Mockito is a significant test infrastructure improvementExponentialBackoffTesthas good coverage of boundary conditions (jitter=0, jitter=1, cap enforcement)backpressure: truehandshake flag is a clean protocol extensionabortIfInTransaction()extraction reduces duplication and improves claritywithTransaction(60acf51d) now aligns the code with the spec algorithm, making correctness easier to verifytestRetryBackoffIsEnforced,testExponentialBackoffOnTransientError) provide functional validation of backoff behavior