feat: add code-execution routing mode (GCORE_MCP_ROUTING=code_exec) #13
algis-dumbris wants to merge 8 commits into
Add design doc for a new GCORE_MCP_ROUTING=code_exec mode that exposes three meta-tools (search_tools, get_tool_schema, execute_code) backed by a Pydantic Monty sandbox, replacing the ~700-tool registration. Direct mode remains available as an opt-out.
Move the SDK-result serializer out of server.py into a small shared module so the upcoming code_exec dispatcher can reuse it. Renamed _serialize_result → serialize_result since it's now a public helper. Behavior unchanged.
- pydantic-monty (>=0.0.17,<0.1): embedded secure Python interpreter used by the new code_exec mode to safely run LLM-generated scripts.
- rank-bm25 (>=0.2,<1): BM25 search index over the SDK catalog so the search_tools meta-tool can rank ~700 SDK methods by relevance.

Both ship pre-built wheels for macOS, Linux, and Windows × CPython 3.10-3.14; combined install adds ~8 MB.
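For intuition about how a BM25 index can rank catalog entries for `search_tools`, here is a minimal pure-Python sketch. It deliberately does not call `rank-bm25`; the `tokenize` and `bm25_rank` helpers, the parameter values, and the three sample entries are illustrative assumptions, not code from this PR.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics (an assumed tokenizer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def bm25_rank(query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Return doc names sorted by descending BM25 score for the query."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    # Document frequency: number of docs that contain each term.
    df = Counter(term for t in tokens.values() for term in set(t))
    scores: dict[str, float] = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical catalog entries: dotted tool name -> searchable text.
CATALOG = {
    "cloud.regions.list": "cloud regions list List available regions",
    "cloud.ssh_keys.create": "cloud ssh keys create Create an SSH key",
    "cloud.instances.delete": "cloud instances delete Delete an instance",
}

top = bm25_rank("list regions", CATALOG)
```

Here `top[0]` is `cloud.regions.list`, because it is the only entry containing either query term. The commit messages also mention "boosts" on top of BM25; this sketch does not model them.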
Introduce a new routing mode that exposes three meta-tools
(search_tools, get_tool_schema, execute_code) instead of registering
each SDK method individually. The LLM-generated Python runs in a
Pydantic Monty sandbox and calls SDK methods via host-injected
call_tool().
Package layout:
- code_exec/catalog.py — ToolEntry + BM25-indexed Catalog
- code_exec/dispatch.py — make_call_tool() with auto-injection of
project_id and region_id
- code_exec/runner.py — execute_code() driving Pydantic Monty with
result/stream truncation and typed ExecResult
- code_exec/meta_tools.py — registers the three meta-tools on FastMCP
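As a rough illustration of what the auto-injection in dispatch.py could look like, the sketch below builds a `call_tool()` that fills in `project_id` and `region_id` only when the target method accepts them and the caller did not pass them. Only the name `make_call_tool` comes from this PR; its signature, the `registry` and `defaults` arguments, and the stub SDK methods are assumptions.

```python
import asyncio
import inspect
from typing import Any, Callable


def make_call_tool(registry: dict[str, Callable[..., Any]], defaults: dict[str, Any]):
    """Build an async call_tool() bound to a registry of callables (hypothetical shape)."""

    async def call_tool(tool_name: str, /, **kwargs: Any) -> Any:
        try:
            method = registry[tool_name]
        except KeyError:
            raise KeyError(f"unknown tool: {tool_name}") from None
        accepted = inspect.signature(method).parameters
        for key, value in defaults.items():
            # Inject only if the method takes the parameter and the caller omitted it.
            if key in accepted and key not in kwargs:
                kwargs[key] = value
        result = method(**kwargs)
        # SDK methods may be sync or async; await only when needed.
        if inspect.isawaitable(result):
            result = await result
        return result

    return call_tool


# Stub SDK methods standing in for the real client.
def list_instances(project_id: int, region_id: int, limit: int = 10) -> dict:
    return {"project_id": project_id, "region_id": region_id, "limit": limit}


async def list_regions() -> list[int]:
    return [1, 2]


call_tool = make_call_tool(
    {"cloud.instances.list": list_instances, "cloud.regions.list": list_regions},
    defaults={"project_id": 42, "region_id": 7},
)

injected = asyncio.run(call_tool("cloud.instances.list", limit=5))
explicit = asyncio.run(call_tool("cloud.instances.list", project_id=1))
regions = asyncio.run(call_tool("cloud.regions.list"))
```

An explicit `project_id=1` wins over the default, and `cloud.regions.list` receives no injected arguments because its signature does not accept them.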
Selected via GCORE_MCP_ROUTING={code_exec|direct}, parsed in
config/settings.py:get_routing_mode(). Default is code_exec; set
GCORE_MCP_ROUTING=direct to restore the legacy ~700-tool surface.
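A minimal sketch of how the env parsing might behave, assuming the normalization is strip plus lowercase and that unknown values fall back to the default with a warning. The function name matches `get_routing_mode()` from this commit, but the `environ` parameter and the constants are assumptions added to keep the example testable.

```python
import logging
import os

logger = logging.getLogger(__name__)

VALID_MODES = ("code_exec", "direct")
DEFAULT_MODE = "code_exec"


def get_routing_mode(environ=None) -> str:
    """Read GCORE_MCP_ROUTING, normalize it, and fall back to the default."""
    env = os.environ if environ is None else environ
    raw = env.get("GCORE_MCP_ROUTING", "")
    mode = raw.strip().lower()
    if not mode:
        # Unset or empty: use the default without a warning.
        return DEFAULT_MODE
    if mode not in VALID_MODES:
        logger.warning("Unknown GCORE_MCP_ROUTING=%r; falling back to %s", raw, DEFAULT_MODE)
        return DEFAULT_MODE
    return mode


unset = get_routing_mode({})
legacy = get_routing_mode({"GCORE_MCP_ROUTING": " Direct "})
bogus = get_routing_mode({"GCORE_MCP_ROUTING": "bogus"})
```

With these assumptions, `unset` and `bogus` both resolve to `code_exec`, and `legacy` resolves to `direct`.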
- 41 unit tests in tests/code_exec/ covering Catalog (build + BM25 search + boosts + get_schema), dispatch (call_tool injection + awaitable handling + error paths), result/stream truncation, Pydantic Monty runner integration with stub catalogs, register meta_tools, and get_routing_mode env parsing.
- 6 e2e tests against the real Gcore API. They auto-load credentials from ../gcore-terraform/.env so local devs can run them without shell config; CI skips when no real key is available.

The e2e real_client fixture clears the SDK introspection cache so methods rebind to the real client (other test files run earlier may have poisoned the cache with a dummy-keyed client).
Add a Routing modes section near the top of the README so users discover code_exec as the new default and know how to opt out via GCORE_MCP_ROUTING=direct. Lists the sandbox's supported / unsupported Python features and default resource limits.
Pull request overview
Adds a new default “code execution” routing mode to avoid MCP tool-list overflow by exposing only three meta-tools (search/schema/code execution) while preserving the existing per-SDK-method tool registration in a legacy direct mode.
Changes:
- Introduces `GCORE_MCP_ROUTING` with `code_exec` (new default) vs `direct` (legacy) and updates server startup/registration accordingly.
- Adds a `code_exec` package implementing SDK catalog + search, host-side dispatch (`call_tool`), and Monty sandbox execution with truncation.
- Adds dependencies (`pydantic-monty`, `rank-bm25`) plus a dedicated `tests/code_exec/` suite and README/docs updates.
Reviewed changes
Copilot reviewed 19 out of 21 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `uv.lock` | Locks new dependencies (Monty, BM25 + transitive deps like numpy) and updates resolution metadata. |
| `pyproject.toml` | Adds pydantic-monty and rank-bm25 runtime dependencies. |
| `README.md` | Documents routing modes, default behavior, and sandbox capabilities/limitations. |
| `gcore_mcp_server/server.py` | Branches tool registration by routing mode; registers 3 meta-tools in code_exec and preserves legacy direct behavior. |
| `gcore_mcp_server/core/serialize.py` | Extracts shared result serialization for reuse by the sandbox dispatcher. |
| `gcore_mcp_server/config/settings.py` | Adds routing-mode parsing (GCORE_MCP_ROUTING) with normalization + warning fallback. |
| `gcore_mcp_server/code_exec/__init__.py` | Exposes the code-exec public surface (catalog/runner/dispatch/meta-tools). |
| `gcore_mcp_server/code_exec/catalog.py` | Builds SDK catalog via introspection; BM25 search + schema export. |
| `gcore_mcp_server/code_exec/dispatch.py` | Implements host-side call_tool with project/region auto-injection + serialization. |
| `gcore_mcp_server/code_exec/meta_tools.py` | Registers search_tools, get_tool_schema, execute_code on the FastMCP server. |
| `gcore_mcp_server/code_exec/runner.py` | Runs Monty sandbox, captures stdout/stderr, enforces limits, truncates outputs/results. |
| `docs/superpowers/specs/2026-05-15-code-execution-mode-design.md` | Design spec describing the new routing mode, tool surface, architecture, and risks. |
| `tests/code_exec/__init__.py` | Marks the code_exec test package. |
| `tests/code_exec/conftest.py` | Sets anyio backend for async tests in this suite. |
| `tests/code_exec/test_catalog.py` | Verifies catalog build from real SDK + search/schema behavior. |
| `tests/code_exec/test_dispatch.py` | Tests call_tool dispatch, async handling, injection rules, and serialization. |
| `tests/code_exec/test_e2e_real_api.py` | Optional real-API e2e coverage (skipped unless real creds are available). |
| `tests/code_exec/test_meta_tools.py` | Ensures exactly three meta-tools are registered and callable. |
| `tests/code_exec/test_runner.py` | Validates sandbox execution contract (stdout, errors, timeout, truncation, async). |
| `tests/code_exec/test_settings_routing.py` | Tests routing-mode env parsing and warning fallback. |
| `tests/code_exec/test_truncation.py` | Tests truncation helpers for unicode safety and marker behavior. |
Comments suppressed due to low confidence (1)
gcore_mcp_server/code_exec/runner.py:131
- In dict truncation, the budget check only triggers when `d` is non-empty (`and d`). If the first key/value pair exceeds the remaining budget, it will still be included, the budget can go negative, and no `_truncated` marker / `hit=True` will be produced. Handle the empty-dict case explicitly so oversized first entries still yield a truncated result within budget.

      entry_size = _json_size({key: trimmed})
      if state.budget - entry_size < 0 and d:
          state.hit = True
          dropped = total_keys - idx
          d["_truncated"] = True
          d["_dropped_items"] = dropped
    short_name: str
    doc_short: str
    doc_full: str
    params: list[ParamInfo] = field(default_factory=list[ParamInfo])

    size = _json_size(trimmed)
    if state.budget - size < 0 and out:
        state.hit = True
        dropped = total - idx
        out.append({"_truncated": True, "_dropped_items": dropped})
        return out

    # Scalars (or anything else) – just account for size; do not trim mid-string.
    state.budget -= _json_size(value)
    return value

    state = _TruncationState(budget=int(max_bytes))
    trimmed = _truncate_value(value, state)
    return trimmed, state.hit
SDK methods that take their own `name` argument (e.g.
cloud.ssh_keys.create) collided with the dispatcher's first parameter,
so `call_tool('cloud.ssh_keys.create', name='k')` raised
"Multiple values provided for parameter name" from Monty's type
checker. Made the tool selector positional-only (`tool_name`, /) in
both the dispatcher and the TYPE_STUBS so a `name` keyword is
forwarded to the SDK method. Found via real-API create+delete testing
through mcpproxy. Adds a regression test.
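The collision and the positional-only fix can be reproduced in plain CPython. This is a standalone sketch, not the dispatcher itself: `call_tool_old` and `call_tool_new` are hypothetical stand-ins, and CPython raises a `TypeError` with its own wording rather than the Monty type-checker message quoted above.

```python
def call_tool_old(name, **kwargs):
    """Selector is a normal parameter, so a `name` keyword collides with it."""
    return name, kwargs


def call_tool_new(tool_name, /, **kwargs):
    """Selector is positional-only, so `name` (or even `tool_name`) lands in kwargs."""
    return tool_name, kwargs


try:
    call_tool_old("cloud.ssh_keys.create", name="k")
    collided = False
except TypeError:
    # CPython reports that the `name` argument received multiple values.
    collided = True

forwarded = call_tool_new("cloud.ssh_keys.create", name="k")
```

With the `/` marker, `forwarded` is `("cloud.ssh_keys.create", {"name": "k"})`, so the keyword reaches the SDK method untouched.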
Valid findings fixed in runner.py:
- _truncate_value: drop the `and out`/`and d` guard that let an oversized FIRST list/dict element through with the budget going negative and `hit` left False. Containers now check the budget before descending into each child and emit a _truncated marker once exhausted.
- Oversized scalar strings are now truncated in place via _truncate_bytes instead of being returned whole; every over-budget path sets state.hit so ExecResult.truncated can no longer report False while the byte cap was breached.
- Budget is decremented exactly once per scalar leaf (containers no longer double-account children).
- _truncate_for_return annotated -> tuple[Any, bool] with corrected docstring.

Added regression tests for the oversized-first-element and oversized-scalar-string cases.

catalog.py: kept `field(default_factory=list[ParamInfo])` (added a clarifying comment). The Copilot suggestion to use bare `list` is incorrect here: `list[ParamInfo]()` does not raise (GenericAlias is callable and returns []), and bare `list` makes pyright strict mode report `params` as partially-unknown.
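The runner.py truncation rules described above can be sketched as follows. This is a simplified reconstruction, not the code in bfe6bc2: the names drop the leading underscore, the string cut is inlined instead of calling a `_truncate_bytes` helper, and the budget is charged only for leaf values, so keys and JSON punctuation are not counted against the cap.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class TruncationState:
    budget: int        # bytes still available
    hit: bool = False  # set whenever anything is cut or dropped


def truncate_value(value: Any, state: TruncationState) -> Any:
    """Return a copy of value trimmed to the remaining budget."""
    if isinstance(value, list):
        out: list[Any] = []
        for idx, item in enumerate(value):
            if state.budget <= 0:  # check before descending into the child
                state.hit = True
                out.append({"_truncated": True, "_dropped_items": len(value) - idx})
                break
            out.append(truncate_value(item, state))
        return out
    if isinstance(value, dict):
        trimmed: dict[Any, Any] = {}
        for idx, (key, item) in enumerate(value.items()):
            if state.budget <= 0:
                state.hit = True
                trimmed["_truncated"] = True
                trimmed["_dropped_items"] = len(value) - idx
                break
            trimmed[key] = truncate_value(item, state)
        return trimmed
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) > state.budget:
            state.hit = True
            # errors="ignore" drops a multi-byte character that was cut in half.
            cut = raw[: max(state.budget, 0)].decode("utf-8", errors="ignore")
            state.budget = 0
            return cut
        state.budget -= len(raw)
        return value
    # Other scalars cannot be shortened: charge them once and flag any overshoot.
    size = len(json.dumps(value, default=str))
    if size > state.budget:
        state.hit = True
    state.budget -= size
    return value


def truncate_for_return(value: Any, max_bytes: int) -> tuple[Any, bool]:
    state = TruncationState(budget=int(max_bytes))
    return truncate_value(value, state), state.hit


big_first, big_hit = truncate_for_return(["x" * 100, "y"], max_bytes=10)
small, small_hit = truncate_for_return({"a": 1, "b": [2, 3]}, max_bytes=100)
```

In the oversized-first-element case, `big_first` keeps ten characters of the first string followed by a `_truncated` marker for the dropped second item, and `big_hit` is `True`; the small payload passes through unchanged with `small_hit` left `False`.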
Addressed the Copilot review in bfe6bc2.

runner.py, all three valid, fixed:

catalog.py:36, not changed (respectfully disagree): the comment states All 50
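The runtime half of the catalog.py disagreement is easy to check in isolation. In the sketch below, `ParamInfo` and `ToolEntry` are cut-down stand-ins with assumed fields; only the `field(default_factory=list[ParamInfo])` line mirrors the PR. The pyright strict-mode claim is not something this snippet can demonstrate.

```python
from dataclasses import dataclass, field


@dataclass
class ParamInfo:
    name: str  # assumed field, for illustration only


@dataclass
class ToolEntry:
    short_name: str
    # list[ParamInfo] is a types.GenericAlias; calling it builds a plain empty list.
    params: list[ParamInfo] = field(default_factory=list[ParamInfo])


a = ToolEntry("regions.list")
b = ToolEntry("ssh_keys.create")
a.params.append(ParamInfo("limit"))
```

Each instance gets its own list: appending to `a.params` leaves `b.params` empty, and `type(b.params)` is the ordinary built-in `list`.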
Summary
Adds a new routing mode that exposes 3 meta-tools instead of registering all ~700 SDK methods individually, addressing the tool-list overflow that prevents most LLM clients from connecting the server with all tools enabled.
- `GCORE_MCP_ROUTING` accepts `code_exec` (new default) or `direct` (legacy).
- `code_exec` mode exposes `search_tools(query)`, `get_tool_schema(name)`, and `execute_code(code)`. The LLM-generated Python runs in a Pydantic Monty sandbox and reaches the SDK only through a host-injected `call_tool()`.
- `direct` mode preserves today's behavior byte-for-byte.

Branched off the design spec PR; the spec commit (`docs/superpowers/specs/2026-05-15-code-execution-mode-design.md`) is included in this branch's history.

Files
- `gcore_mcp_server/code_exec/`: new package with `catalog.py`, `dispatch.py`, `runner.py`, `meta_tools.py`
- `gcore_mcp_server/core/serialize.py`: extracted from `server.py` so the dispatcher can reuse it
- `gcore_mcp_server/config/settings.py`: adds `GCORE_MCP_ROUTING` parser
- `gcore_mcp_server/server.py`: branches registration on routing mode
- `pyproject.toml`: adds `pydantic-monty>=0.0.17,<0.1` and `rank-bm25>=0.2,<1` (~8 MB combined wheel size, pre-built wheels on macOS/Linux/Windows × CPython 3.10-3.14)
- `tests/code_exec/`: 41 unit tests + 6 e2e tests against the real Gcore API
- `README.md`: adds Routing modes section
- `docs/superpowers/specs/2026-05-15-code-execution-mode-design.md`: design doc

Test plan
- `uv run ruff format .`: clean
- `uv run ruff check .`: passes
- `uv run pyright gcore_mcp_server/code_exec/ gcore_mcp_server/core/serialize.py gcore_mcp_server/config/settings.py`: 0 errors in new code (pre-existing pyright errors in legacy `make_wrapper` are untouched)
- `uv run pytest tests/code_exec/`: 47 passed (41 unit + 6 e2e)
- `uv run pytest tests/test_schema.py tests/test_inspection.py tests/test_pattern_filtering.py`: 38 passed (no regression in pre-existing tests)
- `GCORE_MCP_ROUTING=code_exec` → "Registered 3 meta-tools over 668 SDK methods"
- `GCORE_MCP_ROUTING=direct GCORE_TOOLS=management` → "Registered 17 tools" (unchanged from main)
- `../gcore-terraform/.env` credentials):
  - `search_tools("list regions")` returns `cloud.regions.list` in top 5
  - `await call_tool('cloud.regions.list')` succeeds and returns real region IDs
  - `call_tool('nope.does.not.exist')` surfaces a clean KeyError

Migration / compatibility
- `code_exec` is the new default. Existing clients with hard-coded tool names will see a different surface. The one-line opt-out is `GCORE_MCP_ROUTING=direct`. Documented in the README "Routing modes" section.
- In `code_exec` mode, `GCORE_TOOLS` is logged as ignored: catalog filtering doesn't apply when only 3 meta-tools are registered.

Sandbox notes (for clients that will use `code_exec`)
- Supported: `async`/`await`, comprehensions, exceptions, stdlib `json`/`re`/`datetime`.
- Not supported: `class`, `with`, `import`, `match`, generators.
- `_truncated` marker so the model can re-query with narrower filters).
- `call_tool` but cannot read the key.
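To make the client side concrete, here is roughly what a script passed to `execute_code` might look like, restricted to features the notes above mention (async/await, comprehensions, exceptions). It runs under plain `asyncio` with a stub `call_tool`, not inside Pydantic Monty; the stub, the fake region records, and the `script` wrapper are assumptions for illustration only.

```python
import asyncio


async def call_tool(tool_name, /, **kwargs):
    """Stub standing in for the host-injected call_tool(); returns fake data."""
    fake_api = {
        "cloud.regions.list": [
            {"id": 1, "name": "region-a"},
            {"id": 2, "name": "region-b"},
        ]
    }
    if tool_name not in fake_api:
        raise KeyError(tool_name)
    return fake_api[tool_name]


async def script():
    # Body an LLM might generate after search_tools + get_tool_schema.
    regions = await call_tool("cloud.regions.list")
    region_ids = [r["id"] for r in regions]
    try:
        await call_tool("nope.does.not.exist")
        missing = None
    except KeyError as exc:
        missing = str(exc)
    return {"region_ids": region_ids, "missing": missing}


result = asyncio.run(script())
```

The script only ever sees what `call_tool` returns, which is the property the sandbox notes rely on: SDK access goes through the host, and nothing in the script body needs credentials.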