Skip to content

Try to make integration tests run in parallel#1163

Draft
bzawisto wants to merge 3 commits intomagicblock-labs:masterfrom
bzawisto:parallel-integration-tests
Draft

Try to make integration tests run in parallel#1163
bzawisto wants to merge 3 commits intomagicblock-labs:masterfrom
bzawisto:parallel-integration-tests

Conversation

@bzawisto
Copy link
Copy Markdown

@bzawisto bzawisto commented Apr 18, 2026

Summary

  • Attempt to run all integration tests in parallel in run_tests.rs
  • CI matrix is no longer needed as single runner runs all of the tests in their own thread
  • Added non-conflicting config files for magic-block and Solana validator

This unfortunately doesn't nicely work due to some test constrains and behaviors - e.g. how long response is awaited before making the test as failed - in congested setup where there are many independent validators running in parallel some deadlines are not met. Also, some of the tests are flaky (e.g. 02_commit_and_undelegate.rs failing sporadically before this change).
It's been assumed that each test requires independent Solana or/and magicblock validator (since the original config files are different and have e.g. different programs laid out), thus validator has its own dir path + unique port(s) for rpc/ws and gossip/airdrops/etc.

Compatibility

  • No breaking changes
  • Config change (describe):
  • Migration needed (describe):

Testing

  • tests (or explain)

Checklist

  • docs updated (if needed)
  • closes #

Summary by CodeRabbit

  • Chores
    • Refactored integration test runner to execute test suites in parallel instead of sequentially.
    • Updated test configuration infrastructure to support dynamic endpoint and URL resolution via environment variables with sensible defaults.
    • Reorganized service port assignments across test configurations for improved isolation and resource management.

@bzawisto bzawisto marked this pull request as draft April 18, 2026 19:38
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 18, 2026

Warning

Rate limit exceeded

@bzawisto has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 41 minutes and 20 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 41 minutes and 20 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 609af4a0-b60b-4981-b146-51fd06fc0b89

📥 Commits

Reviewing files that changed from the base of the PR and between 8d7cd9b and 6429e0b.

⛔ Files ignored due to path filters (1)
  • test-integration/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • test-integration/configs/schedule-task.devnet.toml
  • test-integration/test-runner/bin/run_tests.rs
📝 Walkthrough

Walkthrough

This pull request refactors the integration test infrastructure to support dynamic, environment-driven URL and port configuration instead of hardcoded constants. Changes include migrating fixed URL constants to environment-aware accessor functions with lazy initialization, reorganizing test configuration files with updated port assignments and storage paths, refactoring the test runner to execute test suites concurrently with per-suite configuration support, and updating test code across multiple modules to resolve RPC and WebSocket endpoints through an IntegrationTestContext utility. The validator startup process now supports per-suite ledger directories and deterministic port assignments derived from suite-specific RPC port values.

Suggested reviewers

  • Dodecahedr0x
  • GabrielePicco
  • thlorenz
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test-integration/test-tools/src/integration_test_context.rs (1)

130-144: ⚠️ Potential issue | 🟡 Minor

Ephem WS URL is not overridden when using try_new_with_ephem_port.

try_new_with_ephem_port(port) rewrites only the HTTP ephemeral URL from the supplied port, while ws_url_ephem() still resolves from EPHEM_WS_URL/default and is then cached in its OnceLock globally. Any caller using both the port-based constructor and IntegrationTestContext::ws_url_ephem() (e.g., test-pubsub) will get a WS URL that does not correspond to port unless EPHEM_WS_URL is also set consistently by the caller/runner. Consider either deriving the WS URL from the port too (e.g., ws://localhost:{port+1}) and storing it on the instance, or documenting that callers must set EPHEM_WS_URL before the first call.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-integration/test-tools/src/integration_test_context.rs` around lines 130
- 144, The ephemeral WS URL is not aligned when using try_new_with_ephem_port
because try_new_with_ephem_url only sets the HTTP ephem URL while ws_url_ephem()
resolves/caches EPHEM_WS_URL in a global OnceLock; update the constructor to
also set the WS URL derived from the port (e.g., ws://localhost:{port+1}) and
ensure the instance exposes/uses that value instead of the global OnceLock or
explicitly set the OnceLock during construction. Concretely, modify
try_new_with_ephem_port / try_new_with_ephem_url to compute ephem_ws (derived
from port), store it on the IntegrationTestContext instance (add a field like
ephem_ws_url) and change ws_url_ephem() callers to prefer self.ephem_ws_url (or,
if you must keep the global helper, set the OnceLock with the computed value
when constructing) so callers like test-pubsub get a matching WS URL.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test-integration/configs/schedule-task.ephem.toml`:
- Line 2: The remotes entry in schedule-task.ephem.toml still points at port
7799; update the remotes value to use the suite-specific chain port (9090) so
the ephemeral validator connects to the same chain as
schedule-task.devnet.toml—edit the remotes array in
test-integration/configs/schedule-task.ephem.toml (the "remotes" key) to replace
"http://0.0.0.0:7799" with "http://0.0.0.0:9090".

In `@test-integration/configs/schedulecommit-conf-fees.ephem.toml`:
- Line 2: Replace the bind-only address in the remotes entry so tests use a
routable loopback address: change remotes = ["http://0.0.0.0:9000"] to use
127.0.0.1 (e.g., "http://127.0.0.1:9000"); this file's remotes line is the only
change needed because ensure_websocket() (magicblock-config's ensure_websocket)
will still derive the correct ws:// endpoint automatically.

In `@test-integration/test-runner/bin/run_tests.rs`:
- Around line 64-87: The test runner currently may skip kill_validators() if a
panic occurs while building or running tests; create a panic-safe cleanup guard
(e.g., a small struct like ValidatorKillerGuard with a Drop impl) and
instantiate it before calling thread::scope/register_tests! so its Drop always
calls kill_validators(), ensuring validators are killed even if handles.join()
or assert_cargo_tests_passed panics; place the guard creation before the code
that produces results and keep existing explicit kill_validators() removal or
leave it as no-op to avoid double-kill.

In `@test-integration/test-tools/src/integration_test_context.rs`:
- Around line 1280-1302: The URL getters (url_ephem, ws_url_ephem, url_chain,
ws_url_chain) use OnceLock<String> to cache environment-resolved URLs per
process, which means the first-resolved value is reused for the lifetime of the
process; add a short comment above these functions (or a module-level comment
near the OnceLock usage) stating this per-process caching invariant and that
test suites must be run in separate child processes (as done by run_tests.rs) to
allow different CHAIN_URL/EPHEM_URL/*_WS_URL values for parallel suites, so
future maintainers don't attempt to reuse multiple suites in a single process
and inadvertently get stale URLs.

In `@test-integration/test-tools/src/toml_to_args.rs`:
- Around line 56-78: The port-derivation code (faucet_port, gossip_port, slot,
dyn_start, dyn_end) assumes rpc_port meets undocumented invariants (>=9000,
multiple of 10, unique) and currently panics or silently collides; add explicit
validation at the start (check rpc_port >= 9000 and rpc_port % 10 == 0) and
return an error instead of panicking when violated (or reject the default 8899
from rpc_port_from_config), derive gossip_port from the same slot-based scheme
instead of rpc_port - 200 to avoid external collisions, increase the dynamic
range allocation (e.g., from 50 to 100) or leave a TODO comment, and add a short
doc comment near this block stating the required invariant and why rpc_port must
be spaced by 10 so future changes won’t reintroduce collisions.
- Around line 90-91: The ledger path is currently built as a relative path via
format!("test-ledger/{suite_name}") which depends on the process cwd; change
callers to pass and toml_to_args.rs to emit an absolute ledger path instead so
both start_test_validator_with_config (which sets current_dir(root_dir) and
pre-creates root_dir.join("test-ledger").join(suite_name)) and validator.rs
(which uses create_dir_all) reference the same location. Update the API that
constructs arguments (trace where toml_to_args::... constructs the "--ledger"
value) to accept/receive a Path or String representing
root_dir.join("test-ledger").join(suite_name).canonicalize() or .absolutize(),
and replace format!("test-ledger/{suite_name}") with that absolute path string
so the validator and test setup stay in sync.

In `@test-integration/test-tools/src/validator.rs`:
- Around line 86-92: The ledger directory created by
root_dir.join("test-ledger").join(suite_name) via create_dir_all can contain
leftover files from prior runs; update the setup in the validator to remove any
existing ledger at ledger_path before recreating it: call
fs::remove_dir_all(&ledger_path) (handling errors similar to the existing
unwrap_or_else panic) and then call fs::create_dir_all(&ledger_path) to ensure a
fresh, deterministic ledger directory for each run (refer to the ledger_path
variable and the existing create_dir_all call to locate where to insert the
removal).

---

Outside diff comments:
In `@test-integration/test-tools/src/integration_test_context.rs`:
- Around line 130-144: The ephemeral WS URL is not aligned when using
try_new_with_ephem_port because try_new_with_ephem_url only sets the HTTP ephem
URL while ws_url_ephem() resolves/caches EPHEM_WS_URL in a global OnceLock;
update the constructor to also set the WS URL derived from the port (e.g.,
ws://localhost:{port+1}) and ensure the instance exposes/uses that value instead
of the global OnceLock or explicitly set the OnceLock during construction.
Concretely, modify try_new_with_ephem_port / try_new_with_ephem_url to compute
ephem_ws (derived from port), store it on the IntegrationTestContext instance
(add a field like ephem_ws_url) and change ws_url_ephem() callers to prefer
self.ephem_ws_url (or, if you must keep the global helper, set the OnceLock with
the computed value when constructing) so callers like test-pubsub get a matching
WS URL.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 35cb0369-92f4-4302-ba3a-0c30e589b57a

📥 Commits

Reviewing files that changed from the base of the PR and between 6d77e7d and 8d7cd9b.

⛔ Files ignored due to path filters (1)
  • test-integration/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (41)
  • .github/workflows/ci-test-integration.yml
  • magicblock-chainlink/src/testing/chain_pubsub.rs
  • magicblock-chainlink/src/testing/utils.rs
  • magicblock-committor-service/src/config.rs
  • test-integration/configs/api-conf.ephem.toml
  • test-integration/configs/chainlink-conf.devnet.toml
  • test-integration/configs/cloning-conf.devnet.toml
  • test-integration/configs/cloning-conf.ephem.toml
  • test-integration/configs/committor-conf.devnet.toml
  • test-integration/configs/config-conf.devnet.toml
  • test-integration/configs/magicblock-api-chain.devnet.toml
  • test-integration/configs/pubsub-chain.devnet.toml
  • test-integration/configs/pubsub-ephem.toml
  • test-integration/configs/restore-ledger-conf.devnet.toml
  • test-integration/configs/schedule-intents-chain.devnet.toml
  • test-integration/configs/schedule-intents-ephem.toml
  • test-integration/configs/schedule-task.devnet.toml
  • test-integration/configs/schedule-task.ephem.toml
  • test-integration/configs/schedulecommit-conf-fees.ephem.toml
  • test-integration/configs/schedulecommit-conf.devnet.toml
  • test-integration/configs/schedulecommit-conf.ephem.toml
  • test-integration/test-chainlink/src/ixtest_context.rs
  • test-integration/test-chainlink/tests/chain_pubsub_client.rs
  • test-integration/test-chainlink/tests/ix_programs.rs
  • test-integration/test-chainlink/tests/ix_remote_account_provider.rs
  • test-integration/test-committor-service/Cargo.toml
  • test-integration/test-committor-service/tests/common.rs
  • test-integration/test-committor-service/tests/test_ix_commit_local.rs
  • test-integration/test-committor-service/tests/utils/transactions.rs
  • test-integration/test-magicblock-api/src/lib.rs
  • test-integration/test-magicblock-api/tests/test_claim_fees.rs
  • test-integration/test-magicblock-api/tests/test_domain_registry.rs
  • test-integration/test-pubsub/src/lib.rs
  • test-integration/test-runner/bin/run_tests.rs
  • test-integration/test-runner/src/cleanup.rs
  • test-integration/test-table-mania/Cargo.toml
  • test-integration/test-table-mania/tests/ix_lookup_table.rs
  • test-integration/test-table-mania/tests/utils/mod.rs
  • test-integration/test-tools/src/integration_test_context.rs
  • test-integration/test-tools/src/toml_to_args.rs
  • test-integration/test-tools/src/validator.rs

@@ -1,5 +1,5 @@
lifecycle = "ephemeral"
remotes = ["http://0.0.0.0:7799", "ws://0.0.0.0:7800"]
remotes = ["http://0.0.0.0:7799"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Point schedule-task ephem at its suite-specific chain port.

Line 2 still targets 7799, but test-integration/configs/schedule-task.devnet.toml now listens on 9090. This can make the ephemeral validator connect to the wrong/default chain instance when suites run in parallel.

Proposed fix
-remotes = ["http://0.0.0.0:7799"]
+remotes = ["http://0.0.0.0:9090"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
remotes = ["http://0.0.0.0:7799"]
remotes = ["http://0.0.0.0:9090"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-integration/configs/schedule-task.ephem.toml` at line 2, The remotes
entry in schedule-task.ephem.toml still points at port 7799; update the remotes
value to use the suite-specific chain port (9090) so the ephemeral validator
connects to the same chain as schedule-task.devnet.toml—edit the remotes array
in test-integration/configs/schedule-task.ephem.toml (the "remotes" key) to
replace "http://0.0.0.0:7799" with "http://0.0.0.0:9090".

@@ -1,8 +1,9 @@
lifecycle = "ephemeral"
remotes = ["http://0.0.0.0:7799", "ws://0.0.0.0:7800"]
remotes = ["http://0.0.0.0:9000"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP -C3 '\bremotes\b' magicblock-config/ magicblock-chainlink/ magicblock-api/

Repository: magicblock-labs/magicblock-validator

Length of output: 8305


🏁 Script executed:

# Find where ensure_websocket and ensure_http are called
rg -n 'ensure_websocket|ensure_http' magicblock-config/

Repository: magicblock-labs/magicblock-validator

Length of output: 718


🏁 Script executed:

# Check the Config loading/initialization logic
rg -n -A20 'impl.*Config' magicblock-config/src/lib.rs | head -100

Repository: magicblock-labs/magicblock-validator

Length of output: 62


🏁 Script executed:

# Look at the Remote::to_websocket implementation
rg -n -A10 'to_websocket' magicblock-config/src/

Repository: magicblock-labs/magicblock-validator

Length of output: 3272


🏁 Script executed:

# Check if ephemeral lifecycle has special handling
rg -n -B2 -A10 'LifecycleMode::Ephemeral' magicblock-config/

Repository: magicblock-labs/magicblock-validator

Length of output: 2195


🏁 Script executed:

# Get context around lines 108-109 to see what method they're in
sed -n '100,115p' magicblock-config/src/lib.rs

Repository: magicblock-labs/magicblock-validator

Length of output: 660


🏁 Script executed:

# Check the actual config file mentioned in the review
cat test-integration/configs/schedulecommit-conf-fees.ephem.toml | head -20

Repository: magicblock-labs/magicblock-validator

Length of output: 780


🏁 Script executed:

# Check how 0.0.0.0 is handled - search for usage patterns and any special address handling
rg -n '0\.0\.0\.0' magicblock-config/ magicblock-api/ test-integration/

Repository: magicblock-labs/magicblock-validator

Length of output: 4721


WS endpoint auto-derivation is working correctly.

The single HTTP remote will automatically derive the WS endpoint through ensure_websocket(), which is called during config loading and converts http://0.0.0.0:9000 to ws://0.0.0.0:9000/ by changing the scheme. The implementation at magicblock-config/src/types/network.rs:132-142 handles this conversion correctly for all lifecycle modes including Ephemeral.

However, the 0.0.0.0 address is a bind-only address and cannot be used as a routable client target on Windows/macOS. While this pattern is used throughout the test integration configs, 127.0.0.1 would be safer and more portable for these test configurations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-integration/configs/schedulecommit-conf-fees.ephem.toml` at line 2,
Replace the bind-only address in the remotes entry so tests use a routable
loopback address: change remotes = ["http://0.0.0.0:9000"] to use 127.0.0.1
(e.g., "http://127.0.0.1:9000"); this file's remotes line is the only change
needed because ensure_websocket() (magicblock-config's ensure_websocket) will
still derive the correct ws:// endpoint automatically.

Comment thread test-integration/test-runner/bin/run_tests.rs Outdated
Comment on lines 1280 to 1302
pub fn url_ephem() -> &'static str {
URL_EPHEM
static URL: OnceLock<String> = OnceLock::new();
URL.get_or_init(|| resolve_url("EPHEM_URL", DEFAULT_URL_EPHEM))
.as_str()
}
pub fn ws_url_ephem() -> &'static str {
static URL: OnceLock<String> = OnceLock::new();
URL.get_or_init(|| resolve_url("EPHEM_WS_URL", DEFAULT_WS_URL_EPHEM))
.as_str()
}
pub fn url_local_ephem_at_port(port: u16) -> String {
format!("http://localhost:{}", port)
}
pub fn url_chain() -> &'static str {
URL_CHAIN
static URL: OnceLock<String> = OnceLock::new();
URL.get_or_init(|| resolve_url("CHAIN_URL", DEFAULT_URL_CHAIN))
.as_str()
}
pub fn ws_url_chain() -> &'static str {
WS_URL_CHAIN
static URL: OnceLock<String> = OnceLock::new();
URL.get_or_init(|| resolve_url("CHAIN_WS_URL", DEFAULT_WS_URL_CHAIN))
.as_str()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

OnceLock-cached URLs assume per-process isolation.

Since these getters cache the env-resolved URL globally per process, running suites in parallel only works because run_tests.rs spawns each suite as a separate child process with its own CHAIN_URL/EPHEM_URL/*_WS_URL. Worth a short comment here noting this invariant so future refactors don't attempt to invoke multiple suites within a single process (which would silently reuse the first-resolved URLs).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-integration/test-tools/src/integration_test_context.rs` around lines
1280 - 1302, The URL getters (url_ephem, ws_url_ephem, url_chain, ws_url_chain)
use OnceLock<String> to cache environment-resolved URLs per process, which means
the first-resolved value is reused for the lifetime of the process; add a short
comment above these functions (or a module-level comment near the OnceLock
usage) stating this per-process caching invariant and that test suites must be
run in separate child processes (as done by run_tests.rs) to allow different
CHAIN_URL/EPHEM_URL/*_WS_URL values for parallel suites, so future maintainers
don't attempt to reuse multiple suites in a single process and inadvertently get
stale URLs.

Comment on lines +56 to +78
// Default ports (faucet 9900, gossip 8000, dynamic 1024-65535) collide
// when multiple chain validators run concurrently. Derive unique, non-
// overlapping port zones from rpc_port (which is already unique per suite
// via the TOML configs, spaced by 10).
let faucet_port = rpc_port
.checked_add(1000)
.expect("rpc_port + 1000 overflows u16");
let gossip_port = rpc_port
.checked_sub(200)
.expect("rpc_port - 200 underflows u16");
// Solana requires a minimum dynamic-port-range (~26 ports). Give each
// suite 50 ports, allocated by its "slot" — derived from rpc_port which
// is spaced by 10 starting at 9000 in the TOML configs.
let slot = rpc_port
.checked_sub(9000)
.expect("rpc_port must be >= 9000")
/ 10;
let dyn_start = 11000u16
.checked_add(slot.checked_mul(50).expect("slot * 50 overflows u16"))
.expect("dynamic-port-range start overflows u16");
let dyn_end = dyn_start
.checked_add(49)
.expect("dynamic-port-range end overflows u16");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fragile port-derivation scheme coupled to an undocumented TOML convention.

The derivation assumes every chain-side config satisfies: rpc_port >= 9000, rpc_port aligned to a multiple of 10, and rpc_port unique per suite. None of this is validated at the point of the computation, and violations surface as panics ("rpc_port must be >= 9000", overflow expects) or silent port collisions (two configs with rpc_port in the same decade share a slot and thus the same dynamic range, faucet_port differs by <10 etc.).

A few concrete risks worth addressing:

  • rpc_port_from_config falls back to 8899 when no aperture config is present or parsing fails (line 140). That default would immediately panic here via checked_sub(9000). Either reject such configs explicitly or pick a default that satisfies the invariant.
  • gossip_port = rpc_port - 200 can collide with arbitrary user/system services (e.g. 8860 for 9060). Consider deriving gossip from the same slot-based scheme used for the dynamic range.
  • The 50-port dynamic range is close to the documented Solana minimum (~26); under load (many concurrent TPU/TVU sockets) this may exhaust. Worth leaving a TODO or bumping to e.g. 100.
  • Document the invariant (rpc_port ∈ [9000, …], multiple of 10, unique) near this code or in the configs, so future additions don't silently reintroduce collisions the PR is trying to eliminate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-integration/test-tools/src/toml_to_args.rs` around lines 56 - 78, The
port-derivation code (faucet_port, gossip_port, slot, dyn_start, dyn_end)
assumes rpc_port meets undocumented invariants (>=9000, multiple of 10, unique)
and currently panics or silently collides; add explicit validation at the start
(check rpc_port >= 9000 and rpc_port % 10 == 0) and return an error instead of
panicking when violated (or reject the default 8899 from rpc_port_from_config),
derive gossip_port from the same slot-based scheme instead of rpc_port - 200 to
avoid external collisions, increase the dynamic range allocation (e.g., from 50
to 100) or leave a TODO comment, and add a short doc comment near this block
stating the required invariant and why rpc_port must be spaced by 10 so future
changes won’t reintroduce collisions.

Comment on lines +90 to +91
"--ledger".to_string(),
format!("test-ledger/{suite_name}"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Ledger path is relative — relies on current_dir being root_dir.

format!("test-ledger/{suite_name}") is interpreted relative to the process cwd. start_test_validator_with_config sets current_dir(root_dir) and pre-creates root_dir.join("test-ledger").join(suite_name), so this works today — but the coupling is implicit. If anyone ever launches the validator without setting current_dir, the ledger will be created under an unrelated directory while create_dir_all in validator.rs created a different one. Consider passing the absolute path to keep the two sites in sync:

Proposed change (requires threading the absolute path in from validator.rs)
-        "--ledger".to_string(),
-        format!("test-ledger/{suite_name}"),
+        "--ledger".to_string(),
+        ledger_path.to_string_lossy().into_owned(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-integration/test-tools/src/toml_to_args.rs` around lines 90 - 91, The
ledger path is currently built as a relative path via
format!("test-ledger/{suite_name}") which depends on the process cwd; change
callers to pass and toml_to_args.rs to emit an absolute ledger path instead so
both start_test_validator_with_config (which sets current_dir(root_dir) and
pre-creates root_dir.join("test-ledger").join(suite_name)) and validator.rs
(which uses create_dir_all) reference the same location. Update the API that
constructs arguments (trace where toml_to_args::... constructs the "--ledger"
value) to accept/receive a Path or String representing
root_dir.join("test-ledger").join(suite_name).canonicalize() or .absolutize(),
and replace format!("test-ledger/{suite_name}") with that absolute path string
so the validator and test setup stay in sync.

Comment on lines +86 to +92
let ledger_path = root_dir.join("test-ledger").join(suite_name);
fs::create_dir_all(&ledger_path).unwrap_or_else(|err| {
panic!(
"Failed to create ledger directory '{}': {err}",
ledger_path.display()
)
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Ledger directory is not cleaned between runs.

create_dir_all is idempotent but will happily reuse a pre-existing test-ledger/{suite_name} from a previous run. Since solana-test-validator is launched with -r (reset), the ledger is rebuilt, but any other stray files under that path persist across runs. For a fully deterministic parallel setup (which this PR is trying to stabilize), consider removing the directory first:

Proposed tweak
 let ledger_path = root_dir.join("test-ledger").join(suite_name);
+let _ = fs::remove_dir_all(&ledger_path);
 fs::create_dir_all(&ledger_path).unwrap_or_else(|err| {
     panic!(
         "Failed to create ledger directory '{}': {err}",
         ledger_path.display()
     )
 });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let ledger_path = root_dir.join("test-ledger").join(suite_name);
fs::create_dir_all(&ledger_path).unwrap_or_else(|err| {
panic!(
"Failed to create ledger directory '{}': {err}",
ledger_path.display()
)
});
let ledger_path = root_dir.join("test-ledger").join(suite_name);
let _ = fs::remove_dir_all(&ledger_path);
fs::create_dir_all(&ledger_path).unwrap_or_else(|err| {
panic!(
"Failed to create ledger directory '{}': {err}",
ledger_path.display()
)
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-integration/test-tools/src/validator.rs` around lines 86 - 92, The
ledger directory created by root_dir.join("test-ledger").join(suite_name) via
create_dir_all can contain leftover files from prior runs; update the setup in
the validator to remove any existing ledger at ledger_path before recreating it:
call fs::remove_dir_all(&ledger_path) (handling errors similar to the existing
unwrap_or_else panic) and then call fs::create_dir_all(&ledger_path) to ensure a
fresh, deterministic ledger directory for each run (refer to the ledger_path
variable and the existing create_dir_all call to locate where to insert the
removal).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant