feat: detect always-thinking models and surface them as always-on in the model selector#613
Draft
7Sageer wants to merge 7 commits into
Draft
feat: detect always-thinking models and surface them as always-on in the model selector#6137Sageer wants to merge 7 commits into
7Sageer wants to merge 7 commits into
Conversation
🦋 Changeset detectedLatest commit: 8aa99f0 The changes in this PR will be included in the next version bump. This PR includes changesets to release 7 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
commit: |
Some models cannot run with thinking turned off — claude-fable-5 today, where adaptive thinking is always on. The TUI model selector already renders an always-on state for aliases whose capabilities include 'always_thinking', but nothing populated it automatically. - kosong: add ModelCapability.always_thinking; register claude-fable prefixes in a dedicated always-thinking capability group; map the catalog extension field always_reasoning onto it; export getProviderModelCapability for pure (no provider construction) capability lookups - node-sdk: materialize always_thinking when converting catalog models to config aliases - agent-core: enrich runtime model aliases with detected always_thinking in loadRuntimeConfig (runtime-only; write-back reads the file from disk, so nothing is persisted); pass the field through resolveModelCapabilities
Review follow-ups on the always-thinking capability flow: - capabilities strings are now a refinement, not a replacement: always_thinking is always spelled out alongside thinking, in both producers (catalog materialization and runtime detection), so consumers checking plain 'thinking' membership (e.g. the ACP model catalog) keep working - setKimiConfig strips detected-only capabilities from incoming patches before merging onto the on-disk config. Callers commonly write back models obtained from getConfig(), whose aliases were enriched at load time; persisting those would turn runtime detection into a user declaration that outlives future detection fixes. Capabilities the user or catalog actually declared on disk are preserved verbatim.
Replace the load-time enrich / write-time strip pair with resolveAliasCapabilities: config objects stay pure declarations, and detection from kosong's built-in model knowledge merges additively at read time. Write-back paths persist snapshots verbatim, and future corrections to the detection knowledge take effect without migrating materialized state. - Extract the Claude id parser into claude-version.ts, shared by the anthropic wire layer and the capability registry so the advertised capability cannot drift from request building; this also covers vendor-prefixed Fable ids like us.anthropic.claude-fable-5-v1:0. - Mark OpenAI o-series and Gemini 2.5 Pro as always_thinking, matching their wire behavior (no off switch / minimum thinking budget). - Let custom api.json registries declare always_reasoning, same contract as the models.dev catalog extension. - Drop the capability probe in provider-manager: session resolution now goes through the same resolver instead of constructing a provider. - Pass undefined directly instead of conditional spread for optional always_thinking / providers fields.
- isFableModel now owns the full prefix-or-parser predicate in claude-version.ts, so the wire layer (omit thinking: disabled, xhigh effort) and the capability registry classify version-less ids like claude-fable-latest identically instead of drifting apart. - ModelCapability.always_thinking is typed `true` instead of boolean: absent means toggleable, which removes the === true narrowing at every consumer. - The always-thinking registry constants derive from their toggleable base capability via spread, so the only stated difference is the flag. - resolveAliasCapabilities documents that unknown detection flattens to false and that permissive-by-default consumers (the TUI media gate) must keep reading declared strings. - listModelsFromHarness reads config.providers directly (the schema defaults it), fixing the one test stub that omitted it. - Drop unnecessary casts from the model-selector test fixtures.
deriveThinkingSupported turning true for claude-fable-5 (via capability resolution) had a regression as a side effect: ACP started advertising an Off/On thinking select for models whose thinking cannot be turned off — and with defaultThinking unset the toggle even started on "Thinking Off" while the model reasons (and bills) regardless. AcpModelEntry now carries alwaysThinking (set only by capability resolution; the name-regex route cannot tell always-on variants from toggleable ones and never sets it), and buildSessionConfigOptions omits the thinking option for such models — the same dynamic-visibility rule already used for non-thinking models, since ACP's select arm has no read-only entry.
- listModelsFromHarness no longer dereferences config.providers outside its try block: getConfig crosses an RPC/stub boundary where a partial snapshot can omit the field the static type marks required, and the resulting TypeError escaped the catch. The harness mock goes back to omitting providers so the partial-stub path stays exercised. - The model selector's providers prop narrows to Record<string, Pick<ProviderConfig, 'type'>>: the component only reads the wire type, so its signature should not claim credentials. Callers keep passing their full ProviderConfig map via structural subtyping. - isFableModel's version-less prefix branch anchors at a separator so ids merely containing the substring don't classify as Fable. - A cross-check test pins getProviderModelCapability to each ChatProvider's getCapability for every provider type, so the pure lookup and the instance path cannot silently drift apart.
Declared-capability fixtures now use generic provider/model names; which providers ship which capabilities is the catalog's statement to make, not the test suite's. Detection-unknown coverage moves to an uncatalogued model name on a known wire type, which exercises the same UNKNOWN_CAPABILITY path.
c4edb94 to
8aa99f0
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Related Issue
Follow-up to #610 — the problem is explained below.
Problem
Some models cannot run with thinking turned off:
claude-fable-5today (adaptive thinking is always on per the official docs), OpenAI o-series reasoning models, and Gemini 2.5 Pro. For these models the model selector still offers a thinking on/off toggle; selecting "off" silently runs with thinking anyway — the status line reads off while thinking tokens are billed.The TUI already has the right rendering for this:
thinkingAvailabilityreturns'always-on'(no toggle) when a model resolves toalways_thinking. But nothing populated that capability automatically — users would have to hand-declare it per model alias.What changed
Capabilities are resolved at read time through a single shared resolver, with kosong's
ModelCapabilityas the single source of truth. Config objects stay pure declarations — nothing is materialized intomodels.<alias>.capabilities, so getConfig→setConfig round-trips persist snapshots verbatim and future corrections to the detection knowledge take effect without migrating persisted state.kosong
ModelCapabilitygains an optionalalways_thinkingflag, typedtrue(absent = toggleable; present impliesthinking).claude-version.ts, andisFableModelthere is the single predicate shared by the anthropic wire layer (omittingthinking: disabledon Fable) and the capability registry — the advertised capability and the request-building behavior cannot drift. This covers vendor-prefixed ids likeus.anthropic.claude-fable-5-v1:0and version-less ids likeclaude-fable-latest.always_thinking, matching their wire behavior (withThinking('off')omitsreasoning_effortand the server still reasons; Gemini 2.5 Pro rejectsthinking_budget: 0).always_reasoningextension field (models.dev-compatible: absent means false), mapped ontoalways_thinking— the route for any model that declares it through a catalog rather than kosong's built-in knowledge.getProviderModelCapability(type, model)export: a pure lookup of the same per-provider mappingChatProvider.getCapabilityexposes, without constructing a provider.agent-core
resolveAliasCapabilities(providerType, alias): the single place declared capability strings and kosong detection meet. The merge is additive — declarations can add capabilities on top of detection but never veto it.provider-manager) now goes through this resolver, replacing the previous capability probe that constructed a throwaway provider with a placeholder API key.node-sdk / consumers
resolveAliasCapabilitiesis re-exported for UIs; the TUI model selector and ACP's model catalog call it instead of interpreting rawcapabilitiesstrings themselves.AcpModelEntrycarriesalwaysThinkingand the session config options omit the Off/On thinking select for such models — ACP's select arm has no read-only entry, and an "off" choice would misreport a model that reasons (and bills thinking tokens) regardless. Same dynamic-visibility rule as non-thinking models.catalogModelToAliasand custom api.json registries materializealways_thinkingalongsidethinkingfor catalog-declared always-reasoning models, so consumers checking plain'thinking'membership keep working.With this, picking
claude-fable-5(or an o-series / Gemini 2.5 Pro alias) shows Thinking as always-on with no off toggle, and catalog-added always-reasoning models behave the same.Checklist
gen-changesetsskill, or this PR needs no changeset.gen-docsskill, or this PR needs no doc update.Review follow-ups (addressed in-place)
Earlier revisions materialized detected capabilities into the runtime config at load time and stripped them back out of write-back patches. Review found two problems with that: the strip logic re-implemented a slice of merge semantics, and its "declarations are preserved verbatim" contract did not actually hold for models kosong knows about. The latest revision removes the enrich/strip pair entirely in favor of read-time resolution — there is no detected state to leak to disk, so no strip is needed. It also replaces the capability probe in
provider-managerwith the new pure lookup, and normalization of capability strings now lives in exactly one place (the resolver).A second review pass tightened the result further: the Fable matcher's prefix branch moved into
isFableModelitself so the wire layer and the registry use one predicate (previously the registry was wider — a version-less id likeclaude-fable-latestadvertised always-on while--thinking offwould still send the explicitdisabledconfig Fable 400s on);always_thinkingis typedtrueso no consumer needs=== truenarrowing; the always-thinking registry constants derive from their toggleable base via spread; and the resolver's docs now state explicitly that unknown detection flattens tofalse, which is why permissive-by-default consumers (the TUI media-attachment gate) keep reading declared strings instead.The same pass caught a regression this PR would have introduced on the ACP surface:
deriveThinkingSupportedturning true for Fable made ACP advertise an Off/On thinking select for models whose thinking cannot be turned off (previously they had no toggle at all, since nothing classified them as thinking) — and withdefaultThinkingunset the select even started on "Thinking Off". Fixed by carryingalwaysThinkingonAcpModelEntryand omitting the thinking option for such models. The name-regex route (/thinking|reason/i) deliberately never setsalwaysThinking: it cannot tell always-on variants from toggleable ones, so those models keep their toggle as before.A third pass hardened the resolution boundaries.
listModelsFromHarnessno longer dereferencesconfig.providersoutside its try block:getConfigcrosses an RPC/stub boundary where a partial snapshot can omit the field the static type marks required, and the resulting TypeError escaped the catch (the harness mock goes back to omittingprovidersso that path stays exercised). The model selector'sprovidersprop narrows toRecord<string, Pick<ProviderConfig, 'type'>>— the component only reads the wire type, so its signature no longer claims credentials; callers keep passing their fullProviderConfigmap via structural subtyping.isFableModel's version-less prefix branch anchors at a separator so ids merely containing the substring (claude-fabled-x) don't classify. And a cross-check test pinsgetProviderModelCapabilityto eachChatProvider.getCapabilityacross every provider type, so the pure lookup and the instance path cannot silently drift apart.Known limitation (intentionally out of scope)
Session-level thinking resolution (
resolveThinkingLevel) still does not consult model capabilities: withdefault_thinking = false(or explicit--thinking offvia CLI/ACP), an always-thinking model resolves tothinkingLevel: "off"— the status line reads off while the model runs adaptive thinking anyway (the request itself is correct; kosong omits thethinkingfield). This is pre-existing behavior from #610, orthogonal to the capability plumbing here, and touches the session resolution path — better handled as a focused follow-up that normalizes thinking level once the model alias is resolved.