feat(openai): route reasoning + tools to the Responses API (refs #785) #786
Open
andrew-woblavobla wants to merge 6 commits into
OpenAI reasoning models (gpt-5.x, o-series) reject `reasoning_effort`
together with function tools on /v1/chat/completions:
"Function tools with reasoning_effort are not supported for gpt-5.5 in
/v1/chat/completions. Please use /v1/responses instead." So
`chat.with_thinking(effort:).with_tools(...)` is impossible for the entire
gpt-5 reasoning family today.

This transparently routes that combo to /v1/responses inside the OpenAI
provider: render_payload sets @openai_responses_mode when thinking && tools,
and completion_url / parse_completion_response branch on it. The default
chat/completions path is unchanged (gated). Translates request (input items,
flat tools, reasoning:{effort:}, text.format) and response (output[] ->
Message/ToolCall/Thinking + usage).

Verified live against gpt-5.5: reasoning (88 reasoning tokens) + a function
tool complete in one turn.

Prototype scope. Not yet implemented: Responses streaming (guarded), image
input, reasoning-item round-trip across turns, cassette tests.
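The request translation described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `flatten_tool` and `build_responses_payload` are hypothetical names, and messages are assumed to be plain hashes with `:role` and `:content`. It shows the nested chat/completions tool shape being flattened and `reasoning: {effort:}` being attached only when an effort is given.

```ruby
require "json"

# Hypothetical helper: chat/completions nests the definition under :function,
# while the Responses API expects the same fields at the top level.
def flatten_tool(chat_tool)
  fn = chat_tool.fetch(:function)
  {
    type: "function",
    name: fn.fetch(:name),
    description: fn[:description],
    parameters: fn[:parameters]
  }
end

# Hypothetical helper: builds a Responses-style request body.
def build_responses_payload(model:, messages:, tools:, effort: nil)
  payload = {
    model: model,
    input: messages.map { |m| { role: m.fetch(:role).to_s, content: m.fetch(:content).to_s } },
    tools: tools.map { |t| flatten_tool(t) },
    store: false
  }
  # Only send :reasoning when an effort was requested.
  payload[:reasoning] = { effort: effort.to_s } if effort
  payload
end

chat_tool = {
  type: "function",
  function: {
    name: "weather",
    description: "Look up current weather",
    parameters: { type: "object", properties: { city: { type: "string" } }, required: ["city"] }
  }
}

payload = build_responses_payload(
  model: "gpt-5.5",
  messages: [{ role: :user, content: "Weather in Paris?" }],
  tools: [chat_tool],
  effort: :high
)
puts JSON.pretty_generate(payload)
```

The resulting hash would be POSTed to /v1/responses instead of /v1/chat/completions; structured output (`text.format`) is left out of this sketch.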
Chat#render_payload is also called directly as a module function in specs (RubyLLM::Providers::OpenAI::Chat.render_payload). Calling responses_api? from there raised NoMethodError because that helper lives in OpenAI::Responses, which is mixed into the provider *instance*, not the Chat module, and this broke 3 schema render_payload specs.

Move the thinking+tools -> Responses routing into an OpenAI#render_payload override (instance context, both modules mixed in); the Chat module's render_payload is pure chat/completions again.

Gate on instance_of?(OpenAI) so the OpenAI subclasses (Azure/OpenRouter/Mistral/Perplexity/xAI/GPUStack) keep chat/completions, since they have no /v1/responses endpoint.

Re-verified live: gpt-5.5 + with_thinking + a function tool completes the tool loop with reasoning tokens.
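A minimal sketch of why `instance_of?` (rather than `is_a?`) keeps subclasses on chat/completions. The class names `OpenAIProvider` and `AzureLikeProvider` and the method `responses_mode?` are stand-ins invented for this example, and the routing is computed directly instead of being stored in `@openai_responses_mode` as the PR does.

```ruby
# Stand-in for the OpenAI provider; not the real RubyLLM class.
class OpenAIProvider
  # True only for the exact base class, and only when both tools and
  # thinking are present. Subclasses fail the instance_of? check.
  def responses_mode?(tools:, thinking:)
    has_tools = !(tools.nil? || tools.empty?)
    instance_of?(OpenAIProvider) && has_tools && !thinking.nil?
  end

  def completion_url(tools:, thinking:)
    responses_mode?(tools: tools, thinking: thinking) ? "responses" : "chat/completions"
  end
end

# Stand-in for an OpenAI-compatible subclass with no /v1/responses endpoint.
class AzureLikeProvider < OpenAIProvider; end

base = OpenAIProvider.new
sub  = AzureLikeProvider.new
thinking = { effort: :high }

puts base.completion_url(tools: [:weather], thinking: thinking)
puts base.completion_url(tools: [:weather], thinking: nil)
puts sub.completion_url(tools: [:weather], thinking: thinking)
```

With `is_a?(OpenAIProvider)` the subclass would also be routed to the Responses endpoint, which is the case the gate is meant to prevent.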
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff (main vs. #786)
              main     #786      +/-
Coverage    87.21%   87.43%   +0.21%
Files          121      122       +1
Lines         5703     5802      +99
Branches      1442     1478      +36
Hits          4974     5073      +99
Misses         729      729
…ion + routing) Adds keyless unit specs for OpenAI::Responses: responses_api? gating, the render_payload routing (thinking+tools -> /v1/responses; subclasses + no-thinking stay on chat/completions), render_responses_payload request shape (input items, flat tools, reasoning effort, instructions, text.format), function_call / function_call_output round-trip, parse_responses_response (message/tool_call/reasoning/usage + output_text fallback + error body), tool_choice, and the streaming guard. Raises patch coverage flagged by Codecov.
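The function_call / function_call_output round-trip mentioned above could look roughly like this. It is a sketch under assumptions: `to_input_items` is a hypothetical name, and the conversation history is modelled as plain hashes rather than RubyLLM message objects. An assistant tool call becomes a `function_call` item and the tool result becomes a `function_call_output` item that shares its `call_id`.

```ruby
require "json"

# Hypothetical helper: turn a chat-style history into Responses input items.
def to_input_items(messages)
  messages.flat_map do |msg|
    case msg[:role]
    when :tool
      # A tool result is linked back to its call via call_id.
      [{ type: "function_call_output", call_id: msg.fetch(:tool_call_id), output: msg.fetch(:content).to_s }]
    when :assistant
      items = []
      unless msg[:content].to_s.empty?
        items << { role: "assistant", content: [{ type: "output_text", text: msg[:content].to_s }] }
      end
      (msg[:tool_calls] || []).each do |call|
        items << {
          type: "function_call",
          call_id: call.fetch(:id),
          name: call.fetch(:name),
          # Arguments travel as a JSON string, not a nested object.
          arguments: JSON.generate(call[:arguments] || {})
        }
      end
      items
    else
      [{ role: msg.fetch(:role).to_s, content: msg.fetch(:content).to_s }]
    end
  end
end

history = [
  { role: :user, content: "Weather in Paris?" },
  { role: :assistant, content: "", tool_calls: [{ id: "call_1", name: "weather", arguments: { city: "Paris" } }] },
  { role: :tool, tool_call_id: "call_1", content: "18C, sunny" }
]

items = to_input_items(history)
puts JSON.pretty_generate(items)
```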
Adds cases for responses_tool_for provider_params deep-merge and the responses_text_content Content/.text and to_s fallbacks — the 4 lines Codecov flagged. responses.rb is now fully covered.
…ches Exercises the last partial branches Codecov flagged: an assistant message with text content rendering an output_text input item, and parse_responses_response returning nil for an empty body.
Covers the 8 partial branches Codecov folded into patch %: render without effort (no :reasoning), tool_prefs choice/parallel_calls, unknown output items, non-output_text / non-summary_text content blocks, empty tool-call arguments, and empty-content message build. responses.rb now 100% line + branch coverage.
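To make the branches listed in these commits concrete, here is a rough sketch of parsing a Responses body. `parse_responses_body` is a hypothetical name, the sample body is hand-written rather than captured from the API, and the top-level `output_text` fallback is assumed from the commit text. It handles message, function_call and reasoning items, ignores unknown item types, treats empty tool-call arguments as `{}`, and returns nil for an empty body.

```ruby
require "json"

# Hypothetical parser: collapses output[] into text, thinking, tool calls and usage.
def parse_responses_body(body)
  return nil if body.nil? || body.empty?

  text = +""
  thinking = +""
  tool_calls = []

  (body["output"] || []).each do |item|
    case item["type"]
    when "message"
      (item["content"] || []).each { |b| text << b["text"].to_s if b["type"] == "output_text" }
    when "function_call"
      raw = item["arguments"].to_s
      tool_calls << { id: item["call_id"], name: item["name"], arguments: raw.empty? ? {} : JSON.parse(raw) }
    when "reasoning"
      (item["summary"] || []).each { |b| thinking << b["text"].to_s if b["type"] == "summary_text" }
    end
    # Any other item type is skipped.
  end

  usage = body["usage"] || {}
  {
    content: text.empty? ? body["output_text"].to_s : text,
    thinking: thinking,
    tool_calls: tool_calls,
    input_tokens: usage["input_tokens"],
    output_tokens: usage["output_tokens"],
    reasoning_tokens: usage.dig("output_tokens_details", "reasoning_tokens")
  }
end

# Hand-written sample body for illustration only.
sample = {
  "output" => [
    { "type" => "reasoning", "summary" => [{ "type" => "summary_text", "text" => "Need the weather tool." }] },
    { "type" => "function_call", "call_id" => "call_1", "name" => "weather", "arguments" => "{\"city\":\"Paris\"}" },
    { "type" => "mystery_item" }
  ],
  "usage" => { "input_tokens" => 40, "output_tokens" => 30, "output_tokens_details" => { "reasoning_tokens" => 12 } }
}

parsed = parse_responses_body(sample)
puts parsed.inspect
```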
What
Transparently routes `with_thinking(effort:)` + tools for OpenAI to `/v1/responses`, the only endpoint that accepts `reasoning` together with function tools for gpt-5.x / o-series. The default `/v1/chat/completions` path is unchanged (gated by `@openai_responses_mode`).

Why
OpenAI reasoning models return a 400 for `reasoning_effort` + function tools via chat/completions ("use /v1/responses instead"), so `chat.with_thinking(effort:).with_tools(...)` is impossible for the whole gpt-5 reasoning family. Details + repro in #785.

How (auto-route within the OpenAI provider)
- `OpenAI#render_payload` sets `@openai_responses_mode = instance_of?(OpenAI) && responses_api?(tools:, thinking:)` (true only when both are present) and renders a Responses payload; `completion_url` / `parse_completion_response` branch on it. The `Chat` module's `render_payload` stays pure chat/completions, and the `instance_of?` guard keeps subclasses (Azure/OpenRouter/Mistral/Perplexity/xAI/GPUStack) on chat/completions, since they have no `/v1/responses`.
- `OpenAI::Responses` module:
  - request translation: `input` items (incl. `function_call` / `function_call_output` round-trip), flat `{type:"function",…}` tools, top-level `reasoning:{effort:}`, `text.format` for structured output, `store:false`;
  - response parsing: `output[]` → `Message`/`ToolCall`/`Thinking` + usage (incl. `reasoning_tokens`).
- `stream_response` raises a clear error in responses mode (Responses SSE streaming not implemented yet).

Verified
`gpt-5.5` + `with_thinking(:high)` + a function tool completes the tool loop with reasoning tokens (previously a 400).

Known follow-ups
- `input_image` multimodal
- reasoning-item round-trip across turns (`include: ["reasoning.encrypted_content"]`)

I'm open to design changes, e.g. extracting this into a dedicated `:openai_responses` provider rather than auto-routing within `OpenAI`, or any other shape you'd prefer.

Refs #785.