
feat(openai): route reasoning + tools to the Responses API (refs #785) #786

Open
andrew-woblavobla wants to merge 6 commits into crmne:main from andrew-woblavobla:feat/openai-responses-reasoning-tools

Conversation

@andrew-woblavobla andrew-woblavobla commented May 29, 2026

What

Transparently routes with_thinking(effort:) + tools for OpenAI to /v1/responses — the only endpoint that accepts reasoning together with function tools for gpt-5.x / o-series. The default /v1/chat/completions path is unchanged (gated by @openai_responses_mode).

Why

OpenAI reasoning models 400 on reasoning_effort + function tools via chat/completions ("use /v1/responses instead"), so chat.with_thinking(effort:).with_tools(...) is impossible for the whole gpt-5 reasoning family. Details + repro in #785.

How (auto-route within the OpenAI provider)

  • OpenAI#render_payload sets @openai_responses_mode = instance_of?(OpenAI) && responses_api?(tools:, thinking:) (true only when both are present) and renders a Responses payload; completion_url / parse_completion_response branch on it. The Chat module's render_payload stays pure chat/completions, and the instance_of? guard keeps subclasses (Azure/OpenRouter/Mistral/Perplexity/xAI/GPUStack) on chat/completions — they have no /v1/responses.
  • New OpenAI::Responses module. Request translation: input items (incl. function_call / function_call_output round-trip), flat {type:"function",…} tools, top-level reasoning:{effort:}, text.format for structured output, store:false. Response parsing: output[] → Message / ToolCall / Thinking + usage (incl. reasoning_tokens).
  • stream_response raises a clear error in responses mode (Responses SSE streaming not implemented yet).
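
To make the request-translation bullet concrete, here is a minimal sketch of how chat-style messages and tools might be mapped onto a Responses-style body. It is not the code in this PR: `sketch_responses_payload`, the message hash layout, and the `get_weather` tool are hypothetical names chosen for illustration, and the exact item shapes are assumptions based on the fields listed above.

```ruby
require "json"

# Hypothetical helper (not the PR's implementation): build a Responses-style
# request body from simplified chat messages and tool definitions.
def sketch_responses_payload(model:, messages:, tools:, effort: nil)
  input = messages.flat_map do |msg|
    case msg[:role]
    when :tool
      # A tool result becomes a function_call_output item keyed by call_id.
      [{ type: "function_call_output", call_id: msg[:tool_call_id], output: msg[:content].to_s }]
    when :assistant
      items = []
      unless msg[:content].to_s.empty?
        items << { role: "assistant", content: [{ type: "output_text", text: msg[:content].to_s }] }
      end
      # Each earlier tool call is replayed as a function_call item.
      (msg[:tool_calls] || []).each do |call|
        items << { type: "function_call", call_id: call[:id], name: call[:name],
                   arguments: JSON.generate(call[:arguments] || {}) }
      end
      items
    else
      [{ role: msg[:role].to_s, content: msg[:content].to_s }]
    end
  end

  payload = { model: model, input: input, store: false }
  # Flat tool shape: name/description/parameters sit next to type, with no
  # nested "function" wrapper as in chat/completions.
  payload[:tools] = tools.map { |tool| { type: "function", **tool } } unless tools.empty?
  payload[:reasoning] = { effort: effort.to_s } if effort
  payload
end

# Illustrative inputs only; the tool and conversation are made up.
weather_tool = {
  name: "get_weather",
  description: "Look up the weather for a city",
  parameters: { type: "object", properties: { city: { type: "string" } }, required: ["city"] }
}
messages = [
  { role: :user, content: "Weather in Oslo?" },
  { role: :assistant, content: nil,
    tool_calls: [{ id: "call_1", name: "get_weather", arguments: { city: "Oslo" } }] },
  { role: :tool, tool_call_id: "call_1", content: "4C, rain" }
]

puts JSON.pretty_generate(
  sketch_responses_payload(model: "gpt-5.5", messages: messages, tools: [weather_tool], effort: :high)
)
```

The sketch leaves out `instructions`, `text.format`, and `tool_choice`, which the PR also translates.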

Verified

  • Live: gpt-5.5 + with_thinking(:high) + a function tool completes the tool loop with reasoning tokens (previously a 400).
  • 18 keyless unit specs for the request/response translation + routing; the provider suite passes; RuboCop clean.

Known follow-ups

  • Responses streaming (different SSE event set)
  • input_image multimodal
  • Reasoning-item round-trip across turns (include: ["reasoning.encrypted_content"])

I'm open to design changes — e.g. extracting this into a dedicated :openai_responses provider rather than auto-routing within OpenAI, or any other shape you'd prefer.

Refs #785.

OpenAI reasoning models (gpt-5.x, o-series) reject `reasoning_effort`
together with function tools on /v1/chat/completions:
"Function tools with reasoning_effort are not supported for gpt-5.5 in
/v1/chat/completions. Please use /v1/responses instead." So
`chat.with_thinking(effort:).with_tools(...)` is impossible for the entire
gpt-5 reasoning family today.

This transparently routes that combo to /v1/responses inside the OpenAI
provider: render_payload sets @openai_responses_mode when thinking && tools,
and completion_url / parse_completion_response branch on it. The default
chat/completions path is unchanged (gated). Translates request (input items,
flat tools, reasoning:{effort:}, text.format) and response (output[] ->
Message/ToolCall/Thinking + usage).
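
The response side (output[] -> Message/ToolCall/Thinking + usage) can be sketched in the same spirit. This is an assumption-laden illustration rather than the PR's parse_responses_response: `sketch_parse_responses` is a hypothetical name, it returns a plain Hash instead of RubyLLM objects, and the sample body uses made-up values.

```ruby
require "json"

# Hypothetical parser (not the PR's implementation): fold a Responses-style
# body into text, thinking summary, tool calls, and token usage.
def sketch_parse_responses(body)
  return nil if body.nil? || body.empty?

  text = +""
  thinking = +""
  tool_calls = {}

  Array(body["output"]).each do |item|
    case item["type"]
    when "message"
      Array(item["content"]).each { |block| text << block["text"].to_s if block["type"] == "output_text" }
    when "function_call"
      raw = item["arguments"].to_s
      tool_calls[item["call_id"]] = { name: item["name"], arguments: raw.empty? ? {} : JSON.parse(raw) }
    when "reasoning"
      Array(item["summary"]).each { |block| thinking << block["text"].to_s if block["type"] == "summary_text" }
    end
    # Unknown item types are skipped on purpose.
  end

  usage = body["usage"] || {}
  {
    content: text,
    thinking: thinking,
    tool_calls: tool_calls,
    input_tokens: usage["input_tokens"],
    output_tokens: usage["output_tokens"],
    reasoning_tokens: usage.dig("output_tokens_details", "reasoning_tokens")
  }
end

# Made-up sample body, shaped like the fields described above.
sample = {
  "output" => [
    { "type" => "reasoning", "summary" => [{ "type" => "summary_text", "text" => "Need the weather tool." }] },
    { "type" => "function_call", "call_id" => "call_1", "name" => "get_weather", "arguments" => '{"city":"Oslo"}' }
  ],
  "usage" => { "input_tokens" => 40, "output_tokens" => 120, "output_tokens_details" => { "reasoning_tokens" => 64 } }
}

p sketch_parse_responses(sample)
```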

Verified live against gpt-5.5: reasoning (88 reasoning tokens) + a function
tool complete in one turn.

Prototype scope — not yet implemented: Responses streaming (guarded), image
input, reasoning-item round-trip across turns, cassette tests.

Chat#render_payload is also called directly as a module function in specs
(RubyLLM::Providers::OpenAI::Chat.render_payload). Calling responses_api?
from there raised NoMethodError because that helper lives in OpenAI::Responses,
which is mixed into the provider *instance*, not the Chat module — breaking 3
schema render_payload specs.

Move the thinking+tools -> Responses routing into an OpenAI#render_payload
override (instance context, both modules mixed in); the Chat module's
render_payload is pure chat/completions again. Gate on instance_of?(OpenAI) so
the OpenAI subclasses (Azure/OpenRouter/Mistral/Perplexity/xAI/GPUStack) keep
chat/completions — they have no /v1/responses endpoint.
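
The routing described above can be illustrated with a small stand-alone sketch. All `Sketch*` names are hypothetical stand-ins, not RubyLLM classes. It assumes only what the message states: the chat module's render_payload is callable as a module function, the responses_api? helper is mixed into the provider instance, and the override is gated with instance_of?.

```ruby
# Stand-in for the Chat module: callable as SketchChat.render_payload and
# also available (privately) to classes that include it.
module SketchChat
  module_function

  def render_payload(messages, tools:, thinking:)
    { endpoint: "chat/completions", messages: messages }
  end
end

# Stand-in for the Responses module: instance-level helper only.
module SketchResponses
  def responses_api?(tools:, thinking:)
    !tools.nil? && !tools.empty? && !thinking.nil?
  end
end

class SketchOpenAI
  include SketchChat
  include SketchResponses

  # Override in instance context, where both modules are mixed in.
  def render_payload(messages, tools:, thinking:)
    # instance_of? is false for subclasses, so they never enter responses mode.
    @responses_mode = instance_of?(SketchOpenAI) && responses_api?(tools: tools, thinking: thinking)
    return { endpoint: "responses", input: messages } if @responses_mode

    super
  end

  def completion_url
    @responses_mode ? "responses" : "chat/completions"
  end

  def stream_response(*)
    raise NotImplementedError, "streaming is not implemented in responses mode" if @responses_mode

    :streamed
  end
end

# Stand-in for a subclass such as an Azure-style provider.
class SketchAzure < SketchOpenAI; end

args = { tools: [:get_weather], thinking: { effort: :high } }
puts SketchOpenAI.new.render_payload([], **args)[:endpoint]
puts SketchAzure.new.render_payload([], **args)[:endpoint]
puts SketchChat.render_payload([], **args)[:endpoint]
```

Because the module-level method never references `responses_api?`, calling `SketchChat.render_payload` directly cannot raise NoMethodError, which mirrors the fix for the schema specs.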

Re-verified live: gpt-5.5 + with_thinking + a function tool completes the tool
loop with reasoning tokens.

codecov Bot commented May 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.43%. Comparing base (5bdda1a) to head (135489a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #786      +/-   ##
==========================================
+ Coverage   87.21%   87.43%   +0.21%     
==========================================
  Files         121      122       +1     
  Lines        5703     5802      +99     
  Branches     1442     1478      +36     
==========================================
+ Hits         4974     5073      +99     
  Misses        729      729              


…ion + routing)

Adds keyless unit specs for OpenAI::Responses: responses_api? gating, the
render_payload routing (thinking+tools -> /v1/responses; subclasses + no-thinking
stay on chat/completions), render_responses_payload request shape (input items,
flat tools, reasoning effort, instructions, text.format), function_call /
function_call_output round-trip, parse_responses_response (message/tool_call/
reasoning/usage + output_text fallback + error body), tool_choice, and the
streaming guard. Raises patch coverage flagged by Codecov.

@andrew-woblavobla andrew-woblavobla changed the title from "feat(openai): route reasoning + tools to the Responses API (prototype, refs #785)" to "feat(openai): route reasoning + tools to the Responses API (refs #785)" May 29, 2026
@andrew-woblavobla andrew-woblavobla marked this pull request as ready for review May 29, 2026 15:57

Adds cases for responses_tool_for provider_params deep-merge and the
responses_text_content Content/.text and to_s fallbacks — the 4 lines Codecov
flagged. responses.rb is now fully covered.

…ches

Exercises the last partial branches Codecov flagged: an assistant message with
text content rendering an output_text input item, and parse_responses_response
returning nil for an empty body.

Covers the 8 partial branches Codecov folded into patch %: render without
effort (no :reasoning), tool_prefs choice/parallel_calls, unknown output items,
non-output_text / non-summary_text content blocks, empty tool-call arguments,
and empty-content message build. responses.rb now 100% line + branch coverage.