TextFileReader: strip leading invisible Unicode characters on first line only by niaBaldoni · Pull Request #615 · TypesettingTools/Aegisub

niaBaldoni · 2026-06-20T22:25:15Z

Fixes #614

Problem

Subtitle files generated by tools like faster-whisper sometimes contain unexpected invisible Unicode characters (e.g. U+200E LEFT-TO-RIGHT MARK, U+202A LEFT-TO-RIGHT EMBEDDING) at the very start of the file. These cause SRT parsing to fail with the error:

Parsing SRT: Expected subtitle index at line 1

The SRT parser's digit check fails because the invisible character precedes the subtitle index on the first line. Since the characters are invisible in common text editors, the user has no way of knowing what went wrong.
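To illustrate the failure, here is a minimal sketch of a digit-only index check. `looks_like_subtitle_index` is a hypothetical stand-in, not Aegisub's actual SRT parser code; it only shows why a leading U+200E (bytes `E2 80 8E` in UTF-8) makes an otherwise valid index line fail.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for an SRT index check (not Aegisub's code):
// a line counts as a subtitle index only if it is non-empty and all digits.
bool looks_like_subtitle_index(const std::string& line) {
    if (line.empty()) return false;
    for (unsigned char c : line) {
        // Any non-digit byte, including the bytes of an invisible
        // Unicode character, makes the check fail.
        if (!std::isdigit(c)) return false;
    }
    return true;
}
```

With this check, `"1"` is accepted, while the same line preceded by U+200E is rejected even though both look identical in most editors.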

Changes

The existing U+FEFF (BOM) check ran on every line of every file. It has been moved into a first_line guard so that it runs only once per file. The guard now also strips a broader set of invisible Unicode characters before any format parser sees the first line. Because the stripping is scoped to the start of the first line, invisible characters within subtitle content (such as the right-to-left marks that can appear in Arabic or Persian subtitles) are intentionally preserved.
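The guard described above could look roughly like the following sketch. It is not the PR's implementation: `FirstLineStrippingReader`, `invisible_prefix_len` and `read_all` are hypothetical names, the input is assumed to be UTF-8 already, and the set of stripped code points is an assumption, since the description only names U+FEFF, U+200E and U+202A.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns 3 if an invisible character (UTF-8 encoded) starts at `pos`, else 0.
// ASSUMPTION: this code point set is illustrative, not the PR's actual list.
static std::size_t invisible_prefix_len(const std::string& s, std::size_t pos) {
    if (pos > s.size() || s.size() - pos < 3) return 0;
    const unsigned char b0 = s[pos], b1 = s[pos + 1], b2 = s[pos + 2];
    if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return 3;   // U+FEFF (BOM)
    if (b0 == 0xE2 && b1 == 0x80) {
        if (b2 >= 0x8B && b2 <= 0x8F) return 3;             // U+200B..U+200F
        if (b2 >= 0xAA && b2 <= 0xAE) return 3;             // U+202A..U+202E
    }
    if (b0 == 0xE2 && b1 == 0x81) {
        if (b2 == 0xA0) return 3;                           // U+2060
        if (b2 >= 0xA6 && b2 <= 0xA9) return 3;             // U+2066..U+2069
    }
    return 0;
}

// Hypothetical line reader: strips leading invisible characters on the
// first line only and leaves every later line untouched.
class FirstLineStrippingReader {
    std::istream& in_;
    bool first_line_ = true;

public:
    explicit FirstLineStrippingReader(std::istream& in) : in_(in) {}

    bool read_line(std::string& out) {
        if (!std::getline(in_, out)) return false;
        if (first_line_) {
            first_line_ = false;
            std::size_t pos = 0;
            // Loop so that stacked characters (e.g. BOM + LRM) are all removed.
            while (std::size_t n = invisible_prefix_len(out, pos)) pos += n;
            out.erase(0, pos);
        }
        return true;
    }
};

// Convenience helper for the example: read every line of `text`.
std::vector<std::string> read_all(const std::string& text) {
    std::istringstream in(text);
    FirstLineStrippingReader reader(in);
    std::vector<std::string> lines;
    for (std::string line; reader.read_line(line);) lines.push_back(line);
    return lines;
}
```

Because the flag is cleared after the first call, a U+200F at the start of a later line passes through unchanged, which matches the intent of preserving directional marks in subtitle content.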

Tests

Added tests/tests/text_file_reader.cpp:

strips_bom_on_first_line: existing BOM behaviour is preserved
strips_leading_invisible_char: single invisible character at file start is stripped
strips_stacked_leading_invisible_chars: multiple stacked invisible characters are all stripped
preserves_invisible_chars_in_content: U+200F inside subtitle content is not stripped

The test input files are in tests/text_file_reader/.

niaBaldoni added 2 commits June 20, 2026 22:54

TextFileReader: strip leading invisible Unicode chars on first line only

7eced3b

TextFileReader: add tests for leading invisible Unicode char stripping

dd43966

niaBaldoni wants to merge 2 commits into TypesettingTools:master from niaBaldoni:strip_leading_invisible_char
