Skip to content

fix(executor): tolerate nil field-arrays in FT.SEARCH/FT.AGGREGATE replies#39

Merged
rbs333 merged 1 commit into
mainfrom
fix/ft-search-nil-field-array
Jun 25, 2026
Merged

fix(executor): tolerate nil field-arrays in FT.SEARCH/FT.AGGREGATE replies#39
rbs333 merged 1 commit into
mainfrom
fix/ft-search-nil-field-array

Conversation

@vishal-bala

@vishal-bala vishal-bala commented Jun 24, 2026

Copy link
Copy Markdown
Member

Motivation

Executor parses FT.SEARCH/FT.AGGREGATE replies by slicing each per-document field-array directly: dict(zip(row_data[::2], row_data[1::2])). This assumes every field-array is a list. When one document's field-array instead comes back as None, the slice raises TypeError: 'NoneType' object is not subscriptable.

The cost to a sql-redis user is out of all proportion to the trigger. A single nil field-array does not degrade one row, it aborts the entire query with an unhandled TypeError. From the caller's side the failure looks inexplicable: the same SELECT succeeds on the next run, the SQL is valid, the data is intact, and nothing in the query or schema points at a cause. It surfaces as a rare, non-reproducible crash in an otherwise healthy application, the kind of flake that erodes trust in the library and is nearly impossible to diagnose from application logs alone.

Root cause

We reproduced the nil on the wire and confirmed the mechanism with the Redis search team. It comes from multithreaded query execution. In 8.8 the default number of background search worker threads was raised above zero (aligning OSS Redis with similar projects, which runs searches on background executors by default); in 8.6 and earlier it defaulted to zero. When a query runs on a background worker and a matching document's key or indexed field expires (TTL) while that query is in flight, the document id can still come back in the results, but with its fields slot set to nil. On the wire (RESP2) this is the document key followed by $-1 (a nil bulk string) where a field-array belongs, which redis-py decodes to None.

This is why the behavior looks version-specific but is not. An identical probe reproduced readily on 8.8.0 and never on 8.2/8.4/8.6, yet the search team reproduced the same nil on 8.6 once background workers were enabled. The real variable is the worker-thread default, not the version. 8.8 simply turns workers on out of the box, which is why it appears with the stock redis:latest image and no special configuration.

There are two parts to the behavior, and they are judged differently upstream. That a document expiring mid-query can still surface in results is, to an extent, expected and documented (see the FT.SEARCH RETURN docs). Returning a nil in the field-array slot specifically is treated as a bug on the Redis side, and a server-side fix is in progress. Either way the client should not crash on this reply shape. Tolerating it is the correct behavior both as a guard against the current nil and as graceful handling of the legitimately expected "the document just expired" case, so this fix is worth keeping regardless of the upstream outcome.

Fix

The unguarded slice pattern was present at six parse sites: the sync and async executors each parse three reply layouts (WITHSCORES, standard, and FT.AGGREGATE). Each site now normalizes the field-array with row_data = ... or [] before slicing, so a nil or empty array yields an empty field set rather than a crash.

I chose to keep the matched document as a row (an empty dict) rather than dropping it with continue. This preserves the correspondence between the reported match count and the rows returned, and in the WITHSCORES branch it lets the row retain its score even when the fields are unavailable. The change is deliberately minimal and local to parsing; nothing about command construction or count handling is touched.

Tests

Because the trigger depends on a timing-sensitive expiration race under background workers, there is no reliable way to provoke one from a live Redis on demand, so a container-based test would be flaky at best and most likely never fire. The new tests/test_nil_field_array.py therefore exercises the parser directly: it stubs the translator and client to feed crafted nil-bearing replies (mirroring the reply shape captured from 8.8 on the wire) through all three branches for both the sync and async executors. The tests are container-free and run in well under a second.

Fixes #38

…plies

Every reply-parse site sliced the per-document field-array directly, so a
nil field-array raised TypeError and failed the whole query. Normalize each
field-array to an empty list at all six parse sites (sync and async, across
WITHSCORES, standard, and FT.AGGREGATE), turning an intermittent hard crash
into a well-defined empty-fields row. In the WITHSCORES branch the row keeps
its score.

Add container-free regression tests that feed crafted nil-bearing replies
through every parse branch.

Fixes #38

@rbs333 rbs333 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

@rbs333 rbs333 merged commit 08e62d2 into main Jun 25, 2026
8 checks passed
@rbs333 rbs333 deleted the fix/ft-search-nil-field-array branch June 25, 2026 15:29
@github-actions

Copy link
Copy Markdown
Contributor

🚀 PR was released in v0.7.1 🚀

@github-actions github-actions Bot added the released This issue/pull request has been released. label Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto:release Create a release when this PR is merged released This issue/pull request has been released.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FT.SEARCH reply parser crashes on a nil field-array (TypeError: 'NoneType' object is not subscriptable)

2 participants