fix(executor): tolerate nil field-arrays in FT.SEARCH/FT.AGGREGATE replies#39
Merged
Conversation
…plies Every reply-parse site sliced the per-document field-array directly, so a nil field-array raised TypeError and failed the whole query. Normalize each field-array to an empty list at all six parse sites (sync and async, across WITHSCORES, standard, and FT.AGGREGATE), turning an intermittent hard crash into a well-defined empty-fields row. In the WITHSCORES branch the row keeps its score. Add container-free regression tests that feed crafted nil-bearing replies through every parse branch. Fixes #38
This was referenced Jun 25, 2026
Contributor
|
🚀 PR was released in |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
ExecutorparsesFT.SEARCH/FT.AGGREGATEreplies by slicing each per-document field-array directly:dict(zip(row_data[::2], row_data[1::2])). This assumes every field-array is a list. When one document's field-array instead comes back asNone, the slice raisesTypeError: 'NoneType' object is not subscriptable.The cost to a
sql-redisuser is out of all proportion to the trigger. A single nil field-array does not degrade one row, it aborts the entire query with an unhandledTypeError. From the caller's side the failure looks inexplicable: the sameSELECTsucceeds on the next run, the SQL is valid, the data is intact, and nothing in the query or schema points at a cause. It surfaces as a rare, non-reproducible crash in an otherwise healthy application, the kind of flake that erodes trust in the library and is nearly impossible to diagnose from application logs alone.Root cause
We reproduced the nil on the wire and confirmed the mechanism with the Redis search team. It comes from multithreaded query execution. In 8.8 the default number of background search worker threads was raised above zero (aligning OSS Redis with similar projects, which runs searches on background executors by default); in 8.6 and earlier it defaulted to zero. When a query runs on a background worker and a matching document's key or indexed field expires (TTL) while that query is in flight, the document id can still come back in the results, but with its fields slot set to nil. On the wire (RESP2) this is the document key followed by
$-1(a nil bulk string) where a field-array belongs, which redis-py decodes toNone.This is why the behavior looks version-specific but is not. An identical probe reproduced readily on 8.8.0 and never on 8.2/8.4/8.6, yet the search team reproduced the same nil on 8.6 once background workers were enabled. The real variable is the worker-thread default, not the version. 8.8 simply turns workers on out of the box, which is why it appears with the stock
redis:latestimage and no special configuration.There are two parts to the behavior, and they are judged differently upstream. That a document expiring mid-query can still surface in results is, to an extent, expected and documented (see the FT.SEARCH RETURN docs). Returning a nil in the field-array slot specifically is treated as a bug on the Redis side, and a server-side fix is in progress. Either way the client should not crash on this reply shape. Tolerating it is the correct behavior both as a guard against the current nil and as graceful handling of the legitimately expected "the document just expired" case, so this fix is worth keeping regardless of the upstream outcome.
Fix
The unguarded slice pattern was present at six parse sites: the sync and async executors each parse three reply layouts (WITHSCORES, standard, and
FT.AGGREGATE). Each site now normalizes the field-array withrow_data = ... or []before slicing, so a nil or empty array yields an empty field set rather than a crash.I chose to keep the matched document as a row (an empty dict) rather than dropping it with
continue. This preserves the correspondence between the reported match count and the rows returned, and in the WITHSCORES branch it lets the row retain its score even when the fields are unavailable. The change is deliberately minimal and local to parsing; nothing about command construction or count handling is touched.Tests
Because the trigger depends on a timing-sensitive expiration race under background workers, there is no reliable way to provoke one from a live Redis on demand, so a container-based test would be flaky at best and most likely never fire. The new
tests/test_nil_field_array.pytherefore exercises the parser directly: it stubs the translator and client to feed crafted nil-bearing replies (mirroring the reply shape captured from 8.8 on the wire) through all three branches for both the sync and async executors. The tests are container-free and run in well under a second.Fixes #38