Skip to content

gh-145921: Add "_DuringGC" functions for tp_traverse#145925

Open
encukou wants to merge 13 commits intopython:mainfrom
encukou:during-gc
Open

gh-145921: Add "_DuringGC" functions for tp_traverse#145925
encukou wants to merge 13 commits intopython:mainfrom
encukou:during-gc

Conversation

@encukou
Copy link
Member

@encukou encukou commented Mar 13, 2026

There are newly documented restrictions on tp_traverse:

The traversal function must not have any side effects.
It must not modify the reference counts of any Python
objects nor create or destroy any Python objects.

  • Add several functions that are guaranteed side-effect-free, with a _DuringGC suffix.
  • Use these in ctypes
  • Consolidate tp_traverse docs in gcsupport.rst, moving unique content from typeobj.rst there

📚 Documentation preview 📚: https://cpython-previews--145925.org.readthedocs.build/

There are newly documented restrictions on tp_traverse:

    The traversal function must not have any side effects.
    It must not modify the reference counts of any Python
    objects nor create or destroy any Python objects.

* Add several functions that are guaranteed side-effect-free,
  with a _DuringGC suffix.
* Use these in ctypes
* Consolidate tp_traverse docs in gcsupport.rst, moving unique
  content from typeobj.rst there
Copy link
Member

@zooba zooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly doc suggestions, the actual change LGTM!

Comment on lines +300 to +302
The traversal function has a limitation:

.. warning::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this kind of structure flows well in the docs? It probably works better to go straight to the warning, then in the paragraph after it replace "This means" with "The limitation on side effects means ..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'll push back here.
The red box looks like it stands on its own. Maybe it's my banner blindness that kicks in, but my brain doesn't immediately connect "The limitation on side effects" to the box above it, so I added the line to make it a part of the running text more obviously.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're pushing in the same direction - I don't think the "traversal function has a limitation" flows into the warning banner (I don't think anything really flows into or out of these banners, apart from headings)

Maybe the following advice should just be rolled into the banner itself? Or maybe give the detailed advice, and then a brief warning banner afterwards for anyone who skipped over it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did something similar in https://docs.python.org/3/c-api/call.html#vectorcall, but ended up also having an introductory sentence.

Could you give a concrete suggestion?

.. note::

The :c:member:`~PyTypeObject.tp_traverse` function can be called from any
thread.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mention something about "though all other Python threads are guaranteed to be blocked at the time" here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK that's an implementation detail some people want to get rid of.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't these new functions explicitly not thread safe, though? If we can't guarantee that you can safely access your objects during tp_traverse, what are we expecting - authors should figure out their own way to lock all other Python threads? How are they meant to coordinate it?

Maybe there's a way to describe it that says "the entire object graph is guaranteed to not be changed by any other Python thread" that leaves open the possibility of those threads not being blocked, but also not being able to interfere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question -- one we should have an answer to. Are you OK leaving it to future PRs? Alas, it wouldn't be the only unclear detail around C API in free-threading.

Comment on lines +394 to +395
Note that these functions may fail (return ``NULL`` or -1).
Only creating and setting the exception is suppressed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they fail during tp_traverse, should we be bailing out as quickly as possible? What value should we return?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question; honestly I'm not sure about the intentions behind tp_traverse's result.

@pablogsal: would it be correct to say tp_traverse should return zero on recoverable errors, including being unable to visit some objects?
Currently gc.c ignores errors*, but non-zero a result makes VISIT return early; gc_free_threading.c calls PyErr_NoMemory if traversal fails.


* except _PyGC_GetReferrers (gc.get_referrers) which instead assumes that tp_traverse sets an exception on failure -- that's wrong, right?

@stestagg
Copy link

stestagg commented Mar 18, 2026

Previous invalid comment

EDIT: Ok, please ignore this for now, the errors are almost certainly stack overflows from the fuzz input, but I'm not sure why I'm seeing it only for this PR, it's looking more likely to be an issue at my end

Ok, this is going to be a horrible comment, sorry!

I'm running some independent fuzzing on PRs, and this pr is triggering some segfaults where others don't seem to be.

Unfortunately it seems to be some sort of heisencorruption because if I run the same input in isolation then I can't reproduce, but it seems to be only happening with this PR.

Over the course of 5 hours of full-tilt fuzzing, I have 227 cores.

I'm continuing to investigate, and will update when I know more, but I wanted to put this out there slightly early incase it helps.

I think what happend was a test input for a different pr got picked up and run for this pr which exposed the bug the other pr was trying to fix. Sorry, please ignore this.


The :c:member:`~PyTypeObject.tp_traverse` handler must have the following type:
The :c:member:`~PyTypeObject.tp_clear` handler must be of the :c:type:`inquiry` type, or ``NULL``
if the object is immutable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to clarify "if the object is immutable". A tuple is immutable and can contain reference cycle:

>>> t=(1,2,[])
>>> t[-1].append(t)
>>> t
(1, 2, [(...)])

Maybe something like:

"or NULL if the object doesn't contain other objects or only contains built-in scalar objects (bool, int, float, complex, bytes, str, NoneType)."

It's not easy to define when it's not needed to implement tp_clear().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's for tp_clear -- I only moved those docs; I prefer to keep it out of scope of this PR.

@encukou encukou marked this pull request as draft March 19, 2026 10:53
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

You should copy the NEWS entry to What's New in Python 3.15.

I tested ./python -m test test_capi -R 3:3: it does not leak.

return NULL;
}
return (char *)obj + Py_TYPE(obj)->tp_basicsize;
return PyObject_GetItemData_DuringGC(obj);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling PyObject_GetItemData_DuringGC() here repeats !PyType_HasFeature() test, it's inefficient. You can leave (char *)obj + Py_TYPE(obj)->tp_basicsize here, or write a sub-function used by PyObject_GetItemData_DuringGC() and PyObject_GetItemData().

@encukou encukou marked this pull request as ready for review March 19, 2026 12:27
@encukou encukou requested a review from AA-Turner as a code owner March 19, 2026 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants