feat: Desktop voice input (right Alt to record → Zhipu ASR transcription) by wishfay · Pull Request #888 · NanmiCoder/cc-haha

wishfay · 2026-06-21T19:46:07Z

Summary

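Press right Alt in the chat input box to start recording and press it again to stop. The speech is transcribed and inserted into the input box as text; it is not sent automatically and remains editable. The speech recognition model is configured under "Settings → Providers" and was tested with Zhipu glm-asr-2512 as the default.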
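The toggle behaviour described above can be sketched as a small state machine. This is only an illustration: `VoiceState`, `VoiceEvent`, `KeyLike`, `isRightAltPress`, and `nextVoiceState` are hypothetical names, and the actual hooks/useVoiceInput.ts may be organised differently. The one external fact relied on is that `KeyboardEvent.code` is `"AltRight"` for the right Alt key.

```typescript
// Hypothetical names throughout; not taken from the PR's source.
type VoiceState = "idle" | "recording" | "transcribing" | "error";

type VoiceEvent =
  | { type: "toggle" } // right Alt pressed
  | { type: "transcribed"; text: string } // server returned text
  | { type: "failed"; reason: string }; // recording or transcription failed

// Minimal shape of a keyboard event, so the sketch does not need DOM typings.
interface KeyLike {
  code: string;
  repeat?: boolean;
}

// Auto-repeat is ignored so that holding the key does not toggle repeatedly.
function isRightAltPress(event: KeyLike): boolean {
  return event.code === "AltRight" && !event.repeat;
}

function nextVoiceState(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (event.type) {
    case "toggle":
      if (state === "recording") return "transcribing"; // second press stops
      if (state === "transcribing") return state; // ignore presses while busy
      return "recording"; // from idle or error, start a new recording
    case "transcribed":
      // Text is handed to the input box for editing; nothing is sent.
      return state === "transcribing" ? "idle" : state;
    case "failed":
      return "error";
  }
}
```

Keeping the transitions in a pure function makes the record/stop/error paths easy to unit-test without a microphone.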

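Approach

Call chain: frontend recording → local server (3456) → server calls the speech recognition provider with the configured key → text is returned → text is inserted into the input box.

Relaying through the server: the frontend CSP forbids direct connections to external domains (only localhost is allowed), and this matches how the existing provider/Tavily keys are handled: external keys are held by the server, which makes the outbound calls.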
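A rough sketch of the server-side relay step, to show where the key lives. Everything here is assumed for illustration: the `VoiceInputSettings` field names, the `ProviderRequest` shape, the bearer-token header, and the `model`/`file` form fields are not confirmed by the PR, which only states that the server reads endpoint/key/model from the user's settings.

```typescript
// Hypothetical settings shape; field names are assumptions.
interface VoiceInputSettings {
  endpoint: string;
  apiKey: string;
  model: string;
}

// Plain description of the outbound call the server would make.
interface ProviderRequest {
  url: string;
  headers: Record<string, string>;
  fields: { model: string; file: Uint8Array; filename: string };
}

// Builds the outbound request on the server. The frontend only ever sends
// audio to localhost and never sees the API key.
function buildProviderRequest(
  settings: VoiceInputSettings,
  wav: Uint8Array,
): ProviderRequest {
  if (!settings.endpoint || !settings.apiKey) {
    throw new Error("Voice input is not configured: endpoint and API key are required");
  }
  if (wav.byteLength === 0) {
    throw new Error("No audio received");
  }
  return {
    url: settings.endpoint,
    // Assumption: the provider accepts bearer-token authentication.
    headers: { Authorization: `Bearer ${settings.apiKey}` },
    // Assumption: the provider expects multipart fields named model and file.
    fields: { model: settings.model, file: wav, filename: "recording.wav" },
  };
}
```

Throwing with a specific message when the configuration is incomplete is in keeping with the PR's goal of surfacing the real cause of an error to the user.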

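Key points

Zhipu's cloud ASR accepts only wav/mp3, and Chromium's MediaRecorder cannot produce either format, so the frontend captures PCM with a ScriptProcessorNode and encodes it to WAV inline (no third-party dependencies).
A document-level listener on right Alt toggles recording on and off. Recording, transcribing, and error states each have an indicator, and errors surface the real cause.
"Settings → Providers" is split into two sections: "Language models" (the existing providers, which drive chat and the Agent) and "Voice models" (speech-to-text configuration).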
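For reference, encoding captured PCM as WAV without a library takes only a 44-byte header plus the samples. The following is a minimal sketch, not the PR's code: `encodeWav` is a hypothetical name, and mono 16-bit output is an assumption.

```typescript
// Encode mono Float32 PCM samples (range -1..1) as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channel count (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample and scale it to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a browser, the Float32Array chunks delivered by the audio callback would be concatenated before being passed to a function like this, and the result wrapped in a Blob for upload.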

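Changes

Server: adds server/api/voice.ts (POST /api/voice/transcribe), registers the route, and adds voiceInput to the settings schema.
Desktop: adds api/voice.ts and hooks/useVoiceInput.ts, wires them into ChatInput (insertion at the cursor plus an indicator), adds a voice configuration section to Settings.tsx, and updates store/types/i18n (5 languages) to match.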
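Insertion at the cursor comes down to a small string operation. The sketch below assumes the input exposes `selectionStart`/`selectionEnd` in the usual textarea sense; `insertAtCursor` and `InsertResult` are hypothetical names rather than the PR's.

```typescript
interface InsertResult {
  value: string; // new content of the input box
  cursor: number; // where the caret should be placed afterwards
}

// Replace the current selection (or insert at the caret when the selection
// is empty) with the transcribed text. Nothing is submitted.
function insertAtCursor(
  value: string,
  selectionStart: number,
  selectionEnd: number,
  text: string,
): InsertResult {
  // Clamp the offsets so stale selection values cannot corrupt the text.
  const start = Math.max(0, Math.min(selectionStart, value.length));
  const end = Math.max(start, Math.min(selectionEnd, value.length));
  return {
    value: value.slice(0, start) + text + value.slice(end),
    cursor: start + text.length,
  };
}
```

For example, `insertAtCursor("hello world", 5, 5, " there")` yields the value `"hello there world"` with the caret at offset 11.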

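Verification

cd desktop && bun run lint (tsc) passes.
The server starts without errors, and the /api/voice/transcribe route returns 400/405/404 as expected when tested.
The end-to-end flow (record → transcribe → insert) should be verified further in a real environment with a Zhipu API key configured.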
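The PR does not say which condition produces which status code. One plausible mapping, given purely as an assumption, is 404 for an unknown path, 405 for a non-POST method, and 400 for a request without audio. `IncomingRequest` and `transcribeStatus` are hypothetical names.

```typescript
// Hypothetical, simplified view of a request reaching the local server.
interface IncomingRequest {
  method: string;
  path: string;
  audioBytes: number; // size of the uploaded audio, 0 if none
}

// Assumed mapping of failure conditions to status codes; the real handler
// in server/api/voice.ts may differ.
function transcribeStatus(req: IncomingRequest): number {
  if (req.path !== "/api/voice/transcribe") return 404; // unknown route
  if (req.method !== "POST") return 405; // only POST is accepted
  if (req.audioBytes <= 0) return 400; // no audio in the request
  return 200;
}
```

A table-driven test over these cases would be one way to address the missing server test flagged by the triage bot.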

Notes

Zhipu GLM-ASR accepts only wav/mp3 (≤25MB), so the frontend must record as WAV. macOS may need NSMicrophoneUsageDescription added to Info.plist under src-tauri (not covered in this PR).
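Because uncompressed WAV grows linearly with duration, the 25MB cap translates into a maximum recording length. The sketch below works this out under assumptions the PR does not state: 16-bit mono audio, a 44-byte header, and the limit read as 25 × 1024 × 1024 bytes. The function names are hypothetical.

```typescript
const WAV_HEADER_BYTES = 44;

// Size of a PCM WAV file of the given duration.
function wavSizeBytes(
  seconds: number,
  sampleRate: number,
  channels = 1,
  bitsPerSample = 16,
): number {
  const frames = Math.ceil(seconds * sampleRate);
  return WAV_HEADER_BYTES + frames * channels * (bitsPerSample / 8);
}

// Longest whole number of seconds that still fits under the byte limit.
function maxRecordingSeconds(
  limitBytes: number,
  sampleRate: number,
  channels = 1,
  bitsPerSample = 16,
): number {
  const bytesPerSecond = sampleRate * channels * (bitsPerSample / 8);
  return Math.floor((limitBytes - WAV_HEADER_BYTES) / bytesPerSecond);
}

const LIMIT = 25 * 1024 * 1024; // assumption: "25MB" read as 25 MiB
console.log(maxRecordingSeconds(LIMIT, 16000)); // 819 (about 13.6 minutes)
console.log(maxRecordingSeconds(LIMIT, 48000)); // 273 (about 4.5 minutes)
```

If the capture runs at the device's native rate (often 48 kHz), downsampling before encoding or enforcing a duration cap in the UI would keep uploads under the limit.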

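- Frontend: right Alt toggles recording; ScriptProcessor captures PCM and encodes WAV (Zhipu ASR accepts only wav/mp3)
- Local server: adds /api/voice/transcribe, which reads endpoint/key/model from the user's settings and forwards the audio to the speech recognition provider
- Settings → Providers is split into "Language models" and "Voice models" sections
- Transcribed text is inserted at the cursor in the input box (not sent automatically); recording/transcribing/error states have indicators
- i18n updated for 5 languages

Co-Authored-By: Claude <noreply@anthropic.com>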

github-actions · 2026-06-21T19:46:27Z

PR quality triage

Changed areas: area:cli-core, area:desktop, area:release, area:server

CLI core policy: Blocked by policy until a maintainer applies allow-cli-core-change and approves the PR.

Missing-test policy: Blocked by policy until a maintainer applies allow-missing-tests or matching tests are added.

Coverage baseline policy: No coverage-baseline policy block detected.

CLI core files:

src/utils/settings/types.ts

Coverage policy files:

none

Expected checks:

change-policy
desktop-checks
server-checks
desktop-native-checks
coverage-checks

Test coverage signals:

BLOCKING unless allow-missing-tests is applied: Desktop product files changed without a desktop test file in the PR.
BLOCKING unless allow-missing-tests is applied: Server product files changed without a server test file in the PR.
BLOCKING unless allow-missing-tests is applied: Agent/runtime product files changed without a tools/utils test file in the PR.
Agent/model runtime path changed: use mock/request-shape tests in PR and maintainer live-model smoke before release.

Risk notes:

Desktop state/API layer changed: verify store persistence, WebSocket behavior, and startup errors.

Hard merge gates still come from GitHub Actions, not AI review.

Dosu handoff: Dosu can be used as the AI reviewer for risk explanation, missing-test prompts, and maintainer Q&A. If it does not comment automatically from the PR template, ask:

@dosubot review this PR for changed-area risk, missing tests, docs impact, desktop startup risk, and CLI core impact.

wishfay and others added 3 commits June 16, 2026 22:33

Updated some code for Windows debugging and local testing.

f69835b

Merge branch 'main' of github.com:NanmiCoder/cc-haha

06ca8f7

dosubot Bot added the size:XL (This PR changes 500-999 lines, ignoring generated files.) and enhancement (New feature or request) labels Jun 21, 2026

github-actions Bot added the area:cli-core, area:desktop, area:release, area:server, and needs-maintainer-approval labels Jun 21, 2026

wishfay wants to merge 3 commits into NanmiCoder:main from wishfay:feature-voice-input
