feat(captions): add editable caption timeline and editor#711
feat(captions): add editable caption timeline and editor#711joehachemx wants to merge 6 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (17)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (12)
📝 WalkthroughWalkthroughAdds silence-aware caption resegmentation in the Electron pipeline and a caption editing surface across the video editor timeline, settings panel, playback, persistence, and localized settings text. ChangesSilence-aware auto-caption segmentation (Electron)
Caption editing UI (frontend)
Sequence Diagram(s)sequenceDiagram
participant User
participant TimelineCanvas
participant VideoEditor
participant CaptionListPanel
participant captionOps
participant VideoPlayback
User->>TimelineCanvas: click caption lane or caption item
TimelineCanvas->>VideoEditor: onAddCaptionAtMs / onSelectCaption
VideoEditor->>captionOps: createCaptionCue / addCue / retimeCue / splitCue / mergeCues / deleteCue
VideoEditor->>CaptionListPanel: selectedCaptionId, cue data, edit callbacks
User->>CaptionListPanel: edit text or timing
CaptionListPanel->>VideoEditor: onCaptionTextEdit / onCaptionRetime / onCaptionSplit / onCaptionMerge / onCaptionDelete
User->>VideoPlayback: Escape while editing
VideoPlayback->>VideoPlayback: cancelCaptionEdit()
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/components/video-editor/timeline/hooks/useTimelineDndBindings.ts (1)
101-106: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winDon't exempt captions from overlap checks.
useTimelineCaptionActionsalready prevents adding a caption inside an existing cue, buthasOverlap()now hard-codes captions tofalse. That means drag/resize can still create overlapping caption spans on the single caption lane, breaking the invariant enforced by quick-add and leaving multiple cues active for the same time range.Also applies to: 168-186
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/components/video-editor/timeline/hooks/useTimelineDndBindings.ts` around lines 101 - 106, The overlap check in useTimelineDndBindings’s hasOverlap currently hard-codes captions to never overlap, which bypasses the single-caption-lane invariant during drag/resize. Remove the special-case exemption for the "caption" item kind so captions are validated the same as other timeline items, while keeping the existing annotation behavior intact. Make the fix in the hasOverlap callback and any related overlap path referenced by useTimelineCaptionActions so dragged or resized captions cannot create overlapping spans.
🧹 Nitpick comments (3)
src/i18n/locales/zh-TW/settings.json (1)
110-123: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueAmbiguous singular/plural distinction in
sections.captionvssections.captions.Both keys translate to "字幕", losing the singular/plural distinction present in other locales. Consider using "單條字幕" or "字幕編輯" for
sections.captionto differentiate the individual caption editing section.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/i18n/locales/zh-TW/settings.json` around lines 110 - 123, The `sections.caption` and `sections.captions` entries in the zh-TW locale are currently identical, so update the `sections` object in `settings.json` to give `sections.caption` a distinct singular meaning while keeping `sections.captions` for the plural/collection label. Use a clearer translation such as “單條字幕” or “字幕編輯” for `sections.caption`, and leave `sections.captions` as the existing plural form.src/i18n/locales/zh-CN/settings.json (1)
127-140: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueAmbiguous singular/plural distinction in
sections.captionvssections.captions.Both keys translate to "字幕", losing the singular/plural distinction present in English ("Caption" vs "Captions") and other locales (e.g., Spanish "Subtítulo"/"Subtítulos", French "Sous-titre"/"Sous-titres"). Consider using "单条字幕" or "字幕编辑" for
sections.captionto differentiate the individual caption editing section from the general captions section.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/i18n/locales/zh-CN/settings.json` around lines 127 - 140, The zh-CN locale has an ambiguous translation in the settings `sections` object because `caption` and `captions` both use the same text. Update `sections.caption` in the locale entry to a distinct singular/section label (for example, something like an individual caption or caption editing label) while keeping `sections.captions` as the plural/general captions label, so the `sections` keys remain clearly differentiated.src/components/video-editor/captionOps.test.ts (1)
49-84: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd a regression test for very short retimes.
This suite never exercises shrinking a multi-word cue below its word count, which is the path where
retimeCuecurrently produces overlapping word timings.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/components/video-editor/captionOps.test.ts` around lines 49 - 84, Add a regression test in captionOps.retimeCue that retimes a multi-word cue to a very short interval smaller than its word count, since this is the path where overlapping word timings appear. Extend the existing retimeCue describe block in captionOps.test.ts with a case using makeCues()/a multi-word cue and a tiny startMs/endMs range, then assert the returned cue and its words remain valid and non-overlapping via assertCuesValid. Use the retimeCue helper and cue id lookup patterns already present so the new test covers the shrinking behavior directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@electron/ipc/captions/generate.ts`:
- Around line 281-286: The silence-cleanup path in generate.ts is incorrectly
treating an empty result from segmentCuesIntoPhrases() as a failure, which
causes the code to fall back to the original cues instead of preserving a valid
empty resegmentation. Update the try block around detectSilenceIntervals() and
segmentCuesIntoPhrases() so cuesToReturn is assigned the resegmented value
unconditionally on success, and only retain the existing cues when an exception
is thrown.
In `@electron/ipc/captions/segment.ts`:
- Around line 311-345: The merge logic in mergeShortAdjacentCaptions() is
evaluating already padded cue timings, which can make real pauses look smaller
and incorrectly merge captions across a gap. Update resegmentCuesBySilence() /
padSpans() call flow so merge eligibility uses unpadded speech bounds, or defer
padding until after merging is complete, and keep mergeTwoCues() /
mergeShortAdjacentCaptions() working on the true pause. Add a regression test
covering a gap just above mergeGapMs to ensure it does not merge after padding.
- Around line 407-413: In segment phrase splitting, the silence-based break is
being evaluated even when consecutive words overlap or abut. Update the
shouldBreak logic in segment.ts to guard the silenceInGap() call with a positive
gap check using gapMs, so overlap cases in the word timing loop do not trigger a
bogus split; keep the rest of the break conditions in the same place around
endsSentence(), pauseMs, and maxPhraseMs.
In `@src/components/video-editor/CaptionListPanel.tsx`:
- Around line 99-102: The caption editor in CaptionListPanel still contains
hardcoded visible strings, so replace those label texts with the same i18n
translation keys used for the new settings UI. Update the label blocks around
the Text, speaker/speed/other caption controls in CaptionListPanel so all
user-facing strings resolve through the localization system instead of English
literals.
- Around line 75-82: The blur handler in CaptionListPanel’s editing flow is
still able to call commitText after Escape because setDraftText(cue.text) is
batched, so the stale draft can be saved instead of discarded. Update the
CaptionListPanel logic around commitText and the Escape/blur handling to track
when editing was explicitly cancelled and skip the next blur-driven commit in
that case. Use the existing cue.id, cue.text, draftText, onTextEdit, and
setDraftText flow to keep the cancel path from invoking the save path.
In `@src/components/video-editor/captionOps.ts`:
- Around line 28-38: createCaptionCue is returning cues without enforcing the
minimum duration invariant, which can leave endMs less than or equal to startMs.
Update createCaptionCue to clamp the returned endMs so it is always at least
startMs + 1 after rounding, and keep the existing rounding/text handling intact.
Use the createCaptionCue helper in captionOps as the place to preserve this
invariant for all callers.
- Around line 92-108: The retime logic in captionOps should guard against spans
that are too short for the cue’s word count before calling rescaleWordsIntoSpan.
In the cue retiming flow that uses normalizeCaptionWords(cue) and
rescaleWordsIntoSpan, add a check to detect when the new span cannot support
monotonic 1 ms word ranges for all words, and fall back to a safer behavior such
as skipping word-level rescaling or clamping to a minimum viable span. Keep the
fix localized to the cue retime path so later words do not overlap or reorder.
In `@src/components/video-editor/captionStyle.ts`:
- Around line 50-56: getCaptionWordVisualState is ignoring hasWordTimings and
CaptionWordState, which makes all caption words render as fully active. Update
this helper to use the incoming word timing/state information again so
VideoPlayback can distinguish spoken, upcoming, and inactive words when valid
timings exist, while preserving the fallback behavior only when word timings are
unavailable or unreliable.
In `@src/components/video-editor/SettingsPanel.tsx`:
- Around line 2785-2791: The quick-add toggle in SettingsPanel’s Switch control
is currently unnamed to assistive tech because the nearby text is not
programmatically linked. Add an accessible name to this Switch by associating it
with a label or providing an explicit aria-label/aria-labelledby tied to the
quick-add text, using the existing autoCaptionSettings.timelineQuickAdd and
updateAutoCaptionSettings wiring.
In `@src/components/video-editor/timeline/components/viewport/TimelineCanvas.tsx`:
- Around line 912-921: The row-count logic in TimelineCanvas is missing the
always-visible empty caption lane when captions are enabled, so the
min-height/stretch calculation can be too small before any caption items exist.
Update the row counting in TimelineCanvasRows/timelineRowCount to include
CAPTION_ROW_ID whenever captionsEnabled is true, not only when an item with that
rowId is present, and keep the existing annotation/audio row counting intact.
In
`@src/components/video-editor/timeline/hooks/actions/useTimelineCaptionActions.ts`:
- Around line 48-59: `addCaptionAtMs()` in `useTimelineCaptionActions` is
already resolving the actual inserted span with `nextCaptionStartMs` and
`totalMs`, but the hover preview still assumes `DEFAULT_CAPTION_DURATION_MS`.
Update the caption placement flow so the resolved `{ start, end }` span from
`addCaptionAtMs()` is exposed to the hover layer and reused for the ghost
preview, keeping the preview identical to the final inserted caption. Use the
existing `onCaptionAdded` path and any caption-placement state/hooks around
`useTimelineCaptionActions` to pass this resolved span through instead of only
the default duration.
In `@src/components/video-editor/timeline/hooks/useTimelineSelection.ts`:
- Around line 16-26: Finish the caption selection wiring in useTimelineSelection
by treating captions the same way as the other block types. Update
clearSelectedBlocks() and activateSelectAllZooms() to also clear
selectedCaptionId, and add a handleSelectCaption() path that mirrors the other
select handlers while resetting selectAllBlocksActive. Make sure the caption
delete/select callbacks (onCaptionDelete, onSelectCaption) are used consistently
so a background clear or Cmd/Ctrl+A on zooms does not leave caption selection
stale or route Delete to zooms.
In `@src/components/video-editor/VideoEditor.tsx`:
- Line 554: The caption selection state can become stale when the underlying
cues change, so add cleanup for selectedCaptionId in VideoEditor.tsx alongside
the existing zoom/annotation/audio resets. Update the effect or synchronization
logic that already responds to cue/source changes so it clears selectedCaptionId
whenever the current caption no longer exists after regeneration, undo, merge,
or reset. Use the existing selectedCaptionId and setSelectedCaptionId state in
VideoEditor to keep the panel and timeline from pointing at a missing cue.
---
Outside diff comments:
In `@src/components/video-editor/timeline/hooks/useTimelineDndBindings.ts`:
- Around line 101-106: The overlap check in useTimelineDndBindings’s hasOverlap
currently hard-codes captions to never overlap, which bypasses the
single-caption-lane invariant during drag/resize. Remove the special-case
exemption for the "caption" item kind so captions are validated the same as
other timeline items, while keeping the existing annotation behavior intact.
Make the fix in the hasOverlap callback and any related overlap path referenced
by useTimelineCaptionActions so dragged or resized captions cannot create
overlapping spans.
---
Nitpick comments:
In `@src/components/video-editor/captionOps.test.ts`:
- Around line 49-84: Add a regression test in captionOps.retimeCue that retimes
a multi-word cue to a very short interval smaller than its word count, since
this is the path where overlapping word timings appear. Extend the existing
retimeCue describe block in captionOps.test.ts with a case using makeCues()/a
multi-word cue and a tiny startMs/endMs range, then assert the returned cue and
its words remain valid and non-overlapping via assertCuesValid. Use the
retimeCue helper and cue id lookup patterns already present so the new test
covers the shrinking behavior directly.
In `@src/i18n/locales/zh-CN/settings.json`:
- Around line 127-140: The zh-CN locale has an ambiguous translation in the
settings `sections` object because `caption` and `captions` both use the same
text. Update `sections.caption` in the locale entry to a distinct
singular/section label (for example, something like an individual caption or
caption editing label) while keeping `sections.captions` as the plural/general
captions label, so the `sections` keys remain clearly differentiated.
In `@src/i18n/locales/zh-TW/settings.json`:
- Around line 110-123: The `sections.caption` and `sections.captions` entries in
the zh-TW locale are currently identical, so update the `sections` object in
`settings.json` to give `sections.caption` a distinct singular meaning while
keeping `sections.captions` for the plural/collection label. Use a clearer
translation such as “單條字幕” or “字幕編輯” for `sections.caption`, and leave
`sections.captions` as the existing plural form.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 731b72e8-c302-425c-b1c6-63c2975b767f
📒 Files selected for processing (41)
electron/ipc/captions/generate.tselectron/ipc/captions/segment.test.tselectron/ipc/captions/segment.tselectron/ipc/captions/silence.test.tselectron/ipc/captions/silence.tssrc/components/video-editor/CaptionListPanel.tsxsrc/components/video-editor/SettingsPanel.tsxsrc/components/video-editor/VideoEditor.tsxsrc/components/video-editor/VideoPlayback.tsxsrc/components/video-editor/captionEditing.tssrc/components/video-editor/captionLayout.test.tssrc/components/video-editor/captionLayout.tssrc/components/video-editor/captionOps.test.tssrc/components/video-editor/captionOps.tssrc/components/video-editor/captionStyle.tssrc/components/video-editor/projectPersistence.tssrc/components/video-editor/timeline/Item.tsxsrc/components/video-editor/timeline/ItemGlass.module.csssrc/components/video-editor/timeline/TimelineEditor.tsxsrc/components/video-editor/timeline/components/viewport/TimelineCanvas.tsxsrc/components/video-editor/timeline/components/wrapper/TimelineWrapper.tsxsrc/components/video-editor/timeline/core/constants.tssrc/components/video-editor/timeline/core/timelineTypes.tssrc/components/video-editor/timeline/hooks/actions/useTimelineCaptionActions.tssrc/components/video-editor/timeline/hooks/useTimelineDndBindings.tssrc/components/video-editor/timeline/hooks/useTimelineEditorRuntime.tssrc/components/video-editor/timeline/hooks/useTimelineKeyboardShortcuts.tssrc/components/video-editor/timeline/hooks/useTimelineSelection.tssrc/components/video-editor/timeline/hooks/utils/timelineSelectionUtils.tssrc/components/video-editor/timeline/model/timelineModel.tssrc/components/video-editor/types.tssrc/i18n/locales/en/settings.jsonsrc/i18n/locales/es/settings.jsonsrc/i18n/locales/fr/settings.jsonsrc/i18n/locales/it/settings.jsonsrc/i18n/locales/ko/settings.jsonsrc/i18n/locales/nl/settings.jsonsrc/i18n/locales/pt-BR/settings.jsonsrc/i18n/locales/ru/settings.jsonsrc/i18n/locales/zh-CN/settings.jsonsrc/i18n/locales/zh-TW/settings.json
| const [selectedAnnotationId, setSelectedAnnotationId] = useState<string | null>(null); | ||
| const [audioRegions, setAudioRegions] = useState<AudioRegion[]>([]); | ||
| const [selectedAudioId, setSelectedAudioId] = useState<string | null>(null); | ||
| const [selectedCaptionId, setSelectedCaptionId] = useState<string | null>(null); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Clear stale caption selection when cues change.
selectedCaptionId can outlive the cue it points to after regenerating captions, undoing an added cue, merging away the selected cue, or loading/resetting a source. Add the caption equivalent of the existing zoom/annotation/audio cleanup so the panel/timeline do not stay selected on a missing cue.
Proposed fix
+ useEffect(() => {
+ if (selectedCaptionId && !autoCaptions.some((cue) => cue.id === selectedCaptionId)) {
+ setSelectedCaptionId(null);
+ setActiveEffectSection((section) => (section === "caption" ? "scene" : section));
+ }
+ }, [selectedCaptionId, autoCaptions]);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const [selectedCaptionId, setSelectedCaptionId] = useState<string | null>(null); | |
| const [selectedCaptionId, setSelectedCaptionId] = useState<string | null>(null); | |
| useEffect(() => { | |
| if (selectedCaptionId && !autoCaptions.some((cue) => cue.id === selectedCaptionId)) { | |
| setSelectedCaptionId(null); | |
| setActiveEffectSection((section) => (section === "caption" ? "scene" : section)); | |
| } | |
| }, [selectedCaptionId, autoCaptions]); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/components/video-editor/VideoEditor.tsx` at line 554, The caption
selection state can become stale when the underlying cues change, so add cleanup
for selectedCaptionId in VideoEditor.tsx alongside the existing
zoom/annotation/audio resets. Update the effect or synchronization logic that
already responds to cue/source changes so it clears selectedCaptionId whenever
the current caption no longer exists after regeneration, undo, merge, or reset.
Use the existing selectedCaptionId and setSelectedCaptionId state in VideoEditor
to keep the panel and timeline from pointing at a missing cue.
Description
Adds a full caption editing workflow to the video editor: a dedicated timeline track for captions, a side panel editor (CaptionListPanel) to view/edit individual caption lines, hover-to-add on the timeline, and smarter automatic segmentation that splits captions by sentence using punctuation and real audio pauses (silence trimming).
Motivation
Previously, captions were auto-generated and dropped straight onto the video timeline with no way to interact with them. You couldn't adjust their timing or fix any of the words the LLM/transcription got wrong (whatever was generated was what you were stuck with), which made the feature hard to actually use. This PR adds a dedicated caption timeline track and editing panel, plus smarter sentence-level segmentation, so captions are now fully customizable: users can reposition, retime, correct, and add caption lines—turning captions from a take-it-or-leave-it output into something genuinely usable.
Type of Change
Related Issue(s)
N/A
Screenshots / Video
The video file was too big for GitHub; watch the demo from YouTube.
Testing Guide
Summary by CodeRabbit
New Features
Bug Fixes