
feat: desktop voice input (Right Alt to record → Zhipu ASR transcription) #888

Open
wishfay wants to merge 3 commits into NanmiCoder:main from wishfay:feature-voice-input


wishfay commented Jun 21, 2026


Summary

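In the chat input box, press Right Alt to start recording and press it again to stop. The speech is transcribed and the resulting text is inserted into the input box (it is not sent automatically, so it can be edited first). The speech recognition model is configured under "Settings → Providers"; Zhipu glm-asr-2512 was used by default for testing.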
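The toggle-and-insert behavior above can be sketched as two pure functions, so it runs outside the browser. This is a minimal sketch, not the PR's `useVoiceInput` API: the `VoicePhase` names, `onKey`, and `insertAtCursor` are hypothetical, and it assumes Right Alt is identified by `KeyboardEvent.code === "AltRight"` with auto-repeat events ignored.

```typescript
// Hypothetical phases; the PR's indicator covers recording, recognizing, and error.
type VoicePhase = "idle" | "recording" | "transcribing" | "error";

// Only the KeyboardEvent fields this sketch reads.
interface KeyLike {
  code: string;
  repeat: boolean;
}

// What the caller should do after a key press.
type KeyAction = "start" | "stop" | "none";

function onKey(phase: VoicePhase, e: KeyLike): { phase: VoicePhase; action: KeyAction } {
  // Ignore other keys and auto-repeat while Right Alt is held down.
  if (e.code !== "AltRight" || e.repeat) return { phase, action: "none" };
  switch (phase) {
    case "idle":
    case "error": // a new press clears a previous error and records again
      return { phase: "recording", action: "start" };
    case "recording":
      return { phase: "transcribing", action: "stop" };
    case "transcribing":
      return { phase, action: "none" }; // wait for the in-flight request
  }
}

// Splice transcribed text into the input value at the current selection.
// The caller applies the new value and caret; nothing is sent automatically.
function insertAtCursor(
  value: string,
  selStart: number,
  selEnd: number,
  text: string,
): { value: string; caret: number } {
  const next = value.slice(0, selStart) + text + value.slice(selEnd);
  return { value: next, caret: selStart + text.length };
}
```

In a hook, `action: "start"` would open the microphone and `action: "stop"` would encode and upload the audio; on success the hook would apply `insertAtCursor` to the textarea and restore the caret with `setSelectionRange`.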

Approach

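Call chain: frontend recording → local server (port 3456) → the server calls the speech recognition provider with the configured key →
text is returned → inserted into the input box.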
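The frontend half of this chain can be sketched as a single POST to the local server. This assumes the body is the raw WAV bytes sent as `audio/wav` and that the server replies `{ text }` on success or `{ error }` on failure; the real `api/voice.ts` may use a multipart upload and different field names. `fetchFn` is a hypothetical injected parameter so the request shape can be checked without a network.

```typescript
// Minimal fetch shapes used here, so the sketch type-checks without DOM typings.
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: Uint8Array },
) => Promise<ResponseLike>;

// Port and path come from the PR; the exact host string is an assumption.
const TRANSCRIBE_URL = "http://localhost:3456/api/voice/transcribe";

async function transcribe(wav: Uint8Array, fetchFn: FetchLike): Promise<string> {
  const res = await fetchFn(TRANSCRIBE_URL, {
    method: "POST",
    headers: { "Content-Type": "audio/wav" }, // assumed: raw WAV body
    body: wav,
  });
  // Assumed response shape: { text } on success, { error } on failure.
  const data = (await res.json()) as { text?: string; error?: string };
  if (!res.ok) {
    // Surface the server's actual reason instead of a generic message.
    throw new Error(data.error ?? `transcription failed (HTTP ${res.status})`);
  }
  return data.text ?? "";
}
```

In the app, `fetchFn` would simply be the global `fetch`; the browser never talks to the provider directly.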

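Why relay through the server: the frontend CSP forbids direct connections to external domains (only localhost is allowed), and this matches how the existing provider and Tavily
keys work: external keys are held by the server, which makes the outbound calls.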
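The server side of the relay can be sketched as a function that checks the incoming request and then builds the outbound call from stored settings, so the API key never leaves the server. The `VoiceInputSettings` field names, the Bearer-token header, the `model` form field, and the exact status-code choices are assumptions for illustration; the provider's own API reference is authoritative. A real handler would encode `fields` and `file` as `multipart/form-data` (for example with `FormData`) and `fetch` the endpoint.

```typescript
// Hypothetical shape of the voiceInput settings held by the server.
interface VoiceInputSettings {
  endpoint: string; // provider URL from user settings
  apiKey: string; // stays on the server, never returned to the frontend
  model: string; // e.g. "glm-asr-2512"
}

// Description of the outbound provider request, before multipart encoding.
interface UpstreamCall {
  url: string;
  headers: Record<string, string>;
  fields: Record<string, string>;
  file: { name: string; type: string; bytes: Uint8Array };
}

type RelayResult =
  | { kind: "error"; status: number; message: string }
  | { kind: "forward"; call: UpstreamCall };

function planTranscription(
  method: string,
  audio: Uint8Array,
  settings: VoiceInputSettings | undefined,
): RelayResult {
  if (method !== "POST") {
    return { kind: "error", status: 405, message: "use POST" };
  }
  if (!settings || !settings.endpoint || !settings.apiKey || !settings.model) {
    return { kind: "error", status: 400, message: "voice input is not configured" };
  }
  if (audio.length === 0) {
    return { kind: "error", status: 400, message: "empty audio" };
  }
  return {
    kind: "forward",
    call: {
      url: settings.endpoint,
      // Assumed auth scheme: the user's key sent as a bearer token.
      headers: { Authorization: `Bearer ${settings.apiKey}` },
      fields: { model: settings.model },
      file: { name: "audio.wav", type: "audio/wav", bytes: audio },
    },
  };
}
```

Keeping the planning step pure makes the request shape easy to test with mocks, while the actual network call stays a thin wrapper.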

Key points

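  • Zhipu's cloud ASR only accepts wav/mp3, and Chromium's MediaRecorder cannot produce either format, so the frontend
    captures raw PCM with a ScriptProcessorNode and encodes it to WAV inline (no third-party dependencies)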
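  • A document-level listener on Right Alt toggles
    recording on and off; the recording, recognizing, and error states each have an indicator, and errors pass through the actual cause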
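  • "Settings → Providers" is split into two sections: "Language models" (the existing providers, which drive chat and
    the Agent) and
    "Voice models" (speech-to-text configuration)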
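The inline WAV encoding from the first point amounts to writing a 44-byte RIFF header in front of 16-bit PCM samples. A minimal sketch, assuming mono audio collected as the `Float32Array` chunks (values in [-1, 1]) that a `ScriptProcessorNode` delivers; `mergeChunks` and `encodeWav` are hypothetical names, not necessarily the PR's functions.

```typescript
// Concatenate the per-callback chunks collected while recording.
function mergeChunks(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

// Encode mono float samples as a 16-bit PCM WAV file (little-endian).
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = samples.length * bytesPerSample;
  const view = new DataView(new ArrayBuffer(44 + dataSize));
  const writeAscii = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size = file size - 8
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample to [-1, 1] and scale to signed 16-bit.
  samples.forEach((x, i) => {
    const s = Math.max(-1, Math.min(1, x));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  });
  return new Uint8Array(view.buffer);
}
```

In the recorder callback, each chunk would be copied before storing it (e.g. `new Float32Array(event.inputBuffer.getChannelData(0))`), since the underlying buffer may be reused between callbacks, and the header's sample rate should be the `AudioContext`'s actual `sampleRate`.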

Changes

Server: added server/api/voice.ts (POST /api/voice/transcribe), registered the route, and added voiceInput to the settings
schema
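Desktop: added api/voice.ts + hooks/useVoiceInput.ts and wired them into ChatInput (insertion at the cursor +
indicator); added a voice configuration section to Settings.tsx; updated store/types/i18n (5 languages) to match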
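For the new `voiceInput` settings entry, a sketch of the validation the server schema and the Settings form might share. The field names are hypothetical (the PR only states that the endpoint, key, and model are read from user settings), and the http(s) check on the endpoint is an illustrative choice.

```typescript
// Hypothetical persisted shape of the voiceInput settings entry.
interface VoiceInputSettings {
  endpoint: string;
  apiKey: string;
  model: string;
}

// Validate untrusted JSON (e.g. a settings file or request body) and return
// either normalized settings or a list of problems to show the user.
function parseVoiceInput(
  raw: unknown,
): { ok: true; value: VoiceInputSettings } | { ok: false; errors: string[] } {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, errors: ["voiceInput must be an object"] };
  }
  const obj = raw as Record<string, unknown>;
  const errors: string[] = [];

  // Read a required, non-blank string field and trim it.
  const str = (key: keyof VoiceInputSettings): string => {
    const v = obj[key];
    if (typeof v !== "string" || v.trim() === "") {
      errors.push(`${key} is required`);
      return "";
    }
    return v.trim();
  };

  const value = { endpoint: str("endpoint"), apiKey: str("apiKey"), model: str("model") };
  if (value.endpoint && !/^https?:\/\//.test(value.endpoint)) {
    errors.push("endpoint must be an http(s) URL");
  }
  return errors.length ? { ok: false, errors } : { ok: true, value };
}
```

Returning a list of errors rather than throwing lets the Settings page show every missing field at once.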

Verification

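  • cd desktop && bun run lint (tsc) passes
  • The server starts without errors; requests to the /api/voice/transcribe route returned 400/405/404 as expected
  • End-to-end recording → transcription → insertion should still be verified in a real environment (with a Zhipu API key configured)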

Notes

Zhipu GLM-ASR only supports wav/mp3 (≤25MB), so the frontend must record WAV. On macOS, NSMicrophoneUsageDescription may need to be added to the
Info.plist under src-tauri (not covered in this PR).
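Because WAV is uncompressed, the ≤25MB limit translates directly into a maximum recording length: bytes per second = sample rate × channels × bytes per sample. A small sketch, assuming 16-bit mono PCM with a 44-byte header and reading "25MB" as 25 × 1024 × 1024 bytes (the provider docs should be checked for the exact definition); `maxWavSeconds` is a hypothetical helper.

```typescript
// Longest 16-bit PCM WAV, in whole seconds, that fits under a byte limit.
function maxWavSeconds(sampleRate: number, channels: number, limitBytes: number): number {
  const headerBytes = 44;
  const bytesPerSecond = sampleRate * channels * 2; // 2 bytes per 16-bit sample
  return Math.floor((limitBytes - headerBytes) / bytesPerSecond);
}

const LIMIT_BYTES = 25 * 1024 * 1024; // assumed reading of "25MB"
```

An `AudioContext` commonly runs at 44.1 kHz or 48 kHz depending on the device; at 48 kHz mono, a recording reaches the limit after about 273 s (roughly 4.5 minutes), while downsampling to 16 kHz before encoding would allow about 819 s.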

wishfay and others added 3 commits June 16, 2026 22:33
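- Frontend: Right Alt toggles recording; a ScriptProcessor captures PCM and encodes it as WAV (Zhipu ASR only supports wav/mp3)
- The local server adds /api/voice/transcribe, which reads the endpoint/key/model from the user's settings and forwards the request to the speech recognition provider
- Settings → Providers is split into two sections, "Language models" and "Voice models"
- Transcribed text is inserted at the cursor in the input box (not sent automatically); recording, recognition, and error states have indicators
- i18n updated for 5 languages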

Co-Authored-By: Claude <noreply@anthropic.com>
dosubot added the labels size:XL (This PR changes 500-999 lines, ignoring generated files.) and enhancement (New feature or request) on Jun 21, 2026
github-actions


PR quality triage

Changed areas: area:cli-core, area:desktop, area:release, area:server

CLI core policy: Blocked by policy until a maintainer applies allow-cli-core-change and approves the PR.

Missing-test policy: Blocked by policy until a maintainer applies allow-missing-tests or matching tests are added.

Coverage baseline policy: No coverage-baseline policy block detected.

CLI core files:

  • src/utils/settings/types.ts

Coverage policy files:

  • none

Expected checks:

  • change-policy
  • desktop-checks
  • server-checks
  • desktop-native-checks
  • coverage-checks

Test coverage signals:

  • BLOCKING unless allow-missing-tests is applied: Desktop product files changed without a desktop test file in the PR.
  • BLOCKING unless allow-missing-tests is applied: Server product files changed without a server test file in the PR.
  • BLOCKING unless allow-missing-tests is applied: Agent/runtime product files changed without a tools/utils test file in the PR.
  • Agent/model runtime path changed: use mock/request-shape tests in PR and maintainer live-model smoke before release.

Risk notes:

  • Desktop state/API layer changed: verify store persistence, WebSocket behavior, and startup errors.

Hard merge gates still come from GitHub Actions, not AI review.

Dosu handoff: Dosu can be used as the AI reviewer for risk explanation, missing-test prompts, and maintainer Q&A. If it does not comment automatically from the PR template, ask:

@dosubot review this PR for changed-area risk, missing tests, docs impact, desktop startup risk, and CLI core impact.


Labels

area:cli-core, area:desktop, area:release, area:server, enhancement (New feature or request), needs-maintainer-approval, size:XL (This PR changes 500-999 lines, ignoring generated files.)
