fix: remove `use locale` from Filters.pm for Unicode correctness (GH #137) #386

Draft
toddr-bot wants to merge 1 commit into abw:master from toddr-bot:koan.toddr.bot/fix-locale-unicode-filters

toddr-bot commented Apr 10, 2026

What

Remove `use locale` from Template::Filters and add `utf8::upgrade` to the case-changing filter functions (upper, lower, ucfirst, lcfirst).

Why

The `use locale` pragma (added in 2006) causes `uc`/`lc` to operate on bytes rather than Unicode characters. When the filters process strings containing code points above U+00FF (e.g. Cyrillic, CJK), this produces "Wide character in substitution" warnings and can silently corrupt output. Reported in #137.

How

  • Removed `use locale` from Filters.pm. No other code in the module depends on locale-sensitive behavior (the `\s+` and `\d+` patterns used in trim/collapse/indent are locale-invariant).
  • Added `utf8::upgrade` to the four case filters so that Perl applies Unicode case folding rules even to Latin-1 range characters (0x80-0xFF) that might be stored internally as bytes.
  • `utf8::upgrade` has been available since Perl 5.8 (the project's minimum version), so the change introduces no compatibility issues.
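
The effect of the `utf8::upgrade` step can be illustrated with a minimal sketch. It is not the actual Template::Filters code: `upper_filter` is a hypothetical stand-in, and the script assumes neither `use locale` nor the `unicode_strings` feature is in scope.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the `upper` filter; not the real
# Template::Filters implementation.
sub upper_filter {
    my $text = shift;
    utf8::upgrade($text);   # change internal storage to UTF-8; the characters stay the same
    return uc $text;
}

# "café" with e-acute (U+00E9) stored as a single native byte
my $latin1 = "caf\x{e9}";

# Plain uc on a byte-stored string only maps ASCII letters here
my $plain    = uc $latin1;
my $upgraded = upper_filter($latin1);

print "without upgrade: ",
    ( $plain eq "CAF\x{e9}" ? "e-acute left unchanged" : "e-acute uppercased" ), "\n";
print "with upgrade:    ",
    ( $upgraded eq "CAF\x{c9}" ? "e-acute uppercased" : "e-acute left unchanged" ), "\n";

# Cyrillic input (code points above U+00FF), written with escapes so the
# script needs no `use utf8`
my $cyrillic = "\x{43f}\x{440}\x{438}\x{432}\x{435}\x{442}";
print "cyrillic ok\n"
    if upper_filter($cyrillic) eq "\x{41f}\x{420}\x{418}\x{412}\x{415}\x{422}";
```

The upgrade works on a copy of the argument, so the caller's string keeps its original internal representation.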

Testing

  • New t/unicode_filters.t with 10 tests: Latin-1 case folding (upper/lower/ucfirst/lcfirst), Cyrillic wide characters (upper/lower + no-warnings check), and ASCII regression tests.
  • Full existing test suite passes (all files, 0 failures).
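
As a rough idea of the kind of checks described above, here is a small Test::More sketch. It is not the contents of t/unicode_filters.t: the real test presumably drives the filters through Template, whereas this version calls a hypothetical `upper_filter` stand-in directly so that it runs without Template Toolkit installed.

```perl
use strict;
use warnings;
use Test::More tests => 4;

# Hypothetical stand-in for the `upper` filter (assumption, not the real code)
sub upper_filter {
    my $text = shift;
    utf8::upgrade($text);
    return uc $text;
}

# Collect any warnings raised while the filter runs
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

is( upper_filter("caf\x{e9}"),      "CAF\x{c9}",      'Latin-1 e-acute is uppercased' );
is( upper_filter("\x{43f}\x{440}"), "\x{41f}\x{420}", 'Cyrillic letters are uppercased' );
is( upper_filter('hello'),          'HELLO',          'ASCII behaviour is unchanged' );
is( scalar @warnings,               0,                'no warnings were emitted' );
```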

Closes #137


Quality Report

Changes: 2 files changed, 121 insertions(+), 5 deletions(-)

Code scan: clean

Tests: passed (OK)

Branch hygiene: clean

Generated by Kōan post-mission quality pipeline

…w#137)

The `use locale` pragma in Filters.pm caused wide character warnings
when processing strings with codepoints > 255 through case-changing
filters (upper, lower, ucfirst, lcfirst). It also made uc/lc operate
on bytes rather than characters for non-ASCII strings.

Fix: remove `use locale` and use `utf8::upgrade` in the case filters
to ensure Perl applies Unicode case folding rules regardless of the
string's internal storage format.

Closes abw#137

Co-Authored-By: Kōan <[email protected]>

Development

Successfully merging this pull request may close these issues.

use locale in template::filters [rt.cpan.org #119992]
