Handle result from PyObject_VisitManagedDict#6032
Conversation
|
How much trouble is it to add a test? If adding a test is too much trouble: could you please explain in the PR description what the problem was, and why this is the correct fix? I'm totally willing to believe this fix is needed and correct, but without any test or explanation it's very opaque. |
|
This is the same handling as PY_VISIT. I had the same bug in the Python wrapper I am implementing at my company and so checked whether others had this problem as well. For us this fixes the following case: instance = m.DynamicAttr()
instance.a = "test"
assert instance in gc.get_referrers(instance.__dict__)For pybind11 this unit test still fails due to python/cpython#130327. It should presumably work once we upgrade to 3.14.4 which was released yesterday. |
|
Worth noting that cpython doesn't always properly handle the result either (python/cpython#148275). |
|
I was hoping that since 3.14.4 is released in https://github.com/actions/python-versions it would pick it up now, but apparently that isn't the case |
In the past I've seen it taking a day or so. Do you see the option to "rerun failed tests"? (I'll try to look at this PR carefully on the weekend.) |
|
No I don't have that option. |
|
I just clicked the button, but it's still picking up: Successfully set up CPython (3.14.3) Under https://github.com/pybind/pybind11/actions/runs/24187486767/job/70595813802?pr=6032 I see: I think it'll be best if you skip that test for Python versions between 3.14.0 and 3.14.3 (or similar; I'm not sure). |
|
If you need more reruns, tag me here with "another rerun needed" or similar. I cannot focus on this PR before the weekend, but I can do quick button clicks. |
This avoids a warning on msvc about Py_Visit shadowing the vret variable.
Yes once we have confirmed this is indeed working under 3.14.4 we should skip this test on most targets:
I didn't have Cpython 3.14.4 for local testing either and so only tested by manually hacking in the way Cpython 3.14.4 handles managed dict visits.
It's still giving me 3.14.3 after my variable name fix to avoid shadowing warnings on msvc |
|
@maxbachmann Based on local testing, your fix seems to work. First with Python 3.14.2 installed locally from sources: Then with Python 3.14.4 installed locally from sources, everything else exactly the same: |
|
FYI — Cursor generated the skips already (one minute). It works locally with 3.14.2 and 3.14.4. I'll push them when the CI for #6029 is done. |
The managed-dict referrer check is only known to work on CPython 3.13.13+ and 3.14.4+, while earlier releases and non-CPython interpreters can report different traversal behavior. Made-with: Cursor
|
@maxbachmann I pushed commit e9f51a1 and pasted a Cursor-generated full PR description; I only glanced through quickly. Could you please review everything and fix as needed? |
|
(sorry copy-paste mishap before, I had the wrong commit hash. corrected) |
|
Looks good to me |
Cursor-generated:
Summary
This PR fixes
pybind11_traverse()forpy::dynamic_attr()instances on Python 3.13+ by propagating the return value fromPyObject_VisitManagedDict().Without this, pybind11 could silently swallow a non-zero return from the GC/referrer visitor callback. In practice, that means the relationship between an instance and its managed
__dict__can be missed during traversal-based operations. A visible symptom is thatgc.get_referrers(instance.__dict__)can fail to report the owning pybind11 instance.This PR also adds a regression test for that behavior, together with version/runtime skips so the test only runs where the interpreter behavior is known to be reliable.
Background
Types created with
py::dynamic_attr()store per-instance attributes in a Python__dict__. For garbage collection and related traversal-based introspection to work correctly, pybind11'stp_traverseimplementation must expose that dictionary to the visitor callback.Historically, pybind11 did this by obtaining the instance dict pointer directly and visiting it with
Py_VISIT(dict). That matters becausePy_VISITdoes not just call the visitor: it also immediately propagates a non-zero return value.On Python 3.13+, CPython moved dynamic instance dictionaries to the managed-dict APIs, so pybind11 uses
PyObject_VisitManagedDict(self, visit, arg)instead. However, pybind11 did not check or return that function's integer result.That changed the semantics of
pybind11_traverse()on the managed-dict path:Py_VISIT(dict), including early return on non-zero visitor resultsPyObject_VisitManagedDict(): pybind11 always continued and eventually returned0, even if the visitor had requested an early exitThis is subtle, but it changes the contract of
tp_traverse. The concrete user-visible failure that motivated this PR is:Without this PR, that assertion can fail because the traversal machinery does not correctly report the instance as a referrer of its managed
__dict__.Why this fix is correct
The implementation change is intentionally small:
This is the same behavior pybind11 already had on the pre-3.13 path via
Py_VISIT(dict): if the visitor returns non-zero, propagate that result immediately instead of discarding it.That makes the Python 3.13+ managed-dict path match the long-standing non-managed-dict path and restores the expected
tp_traversebehavior rather than introducing new semantics.In other words, this PR does not change what pybind11 traverses. It only fixes how pybind11 handles the visitor result while traversing managed dicts.
Test coverage
This PR adds a focused regression test for the exact externally observable behavior:
DynamicAttrtest type is added withpy::dynamic_attr()__dict__, and checks thatgc.get_referrers(instance.__dict__)includes the owning instanceThis test is intentionally high-level. Instead of asserting on implementation details, it checks the behavior that users actually care about: whether the runtime can discover the instance-to-dict relationship through traversal.
Why the new test is version- and runtime-gated
While working on the regression test, it became clear that CPython itself has version-specific managed-dict behavior in this area.
Relevant upstream context:
python/cpythonissue: #130327python/cpythonfollow-up/fix discussion: #148275Based on local testing and the PR discussion, the new regression test is currently treated as reliable only on:
>= 3.13.13>= 3.14.4Accordingly, the test is skipped on:
< 3.13.133.14.0through3.14.3The non-CPython skip is deliberate. PyPy and GraalPy already differ from CPython in several GC- and referrer-sensitive cases, and this PR is about restoring correct CPython managed-dict traversal semantics. Skipping other interpreters avoids overstating what has been verified there.
This is also why the skip is expressed in terms of interpreter behavior, not platform behavior. Android, iOS, and Emscripten are handled elsewhere in CI as platform targets; the uncertainty here is specifically about interpreter traversal semantics.
Validation and debugging history
This PR had an awkward testing window because CI was still installing CPython 3.14.3 for a while even after 3.14.4 had been released. That made the new regression test appear to fail in CI even though the underlying pybind11 fix looked correct.
The version split was later confirmed locally:
With the version/runtime gating in place:
pre-commitpassesNet effect
After this PR:
PyObject_VisitManagedDict()return valuesPy_VISIT(dict)path