gh-145921: Add "_DuringGC" functions for tp_traverse#145925
gh-145921: Add "_DuringGC" functions for tp_traverse#145925encukou wants to merge 13 commits intopython:mainfrom
Conversation
There are newly documented restrictions on tp_traverse:
The traversal function must not have any side effects.
It must not modify the reference counts of any Python
objects nor create or destroy any Python objects.
* Add several functions that are guaranteed side-effect-free,
with a _DuringGC suffix.
* Use these in ctypes
* Consolidate tp_traverse docs in gcsupport.rst, moving unique
content from typeobj.rst there
zooba
left a comment
There was a problem hiding this comment.
Mostly doc suggestions, the actual change LGTM!
| The traversal function has a limitation: | ||
|
|
||
| .. warning:: |
There was a problem hiding this comment.
I don't think this kind of structure flows well in the docs? It probably works better to go straight to the warning, then in the paragraph after it replace "This means" with "The limitation on side effects means ..."
There was a problem hiding this comment.
Hmm, I'll push back here.
The red box looks like it stands on its own. Maybe it's my banner blindness that kicks in, but my brain doesn't immediately connect "The limitation on side effects" to the box above it, so I added the line to make it a part of the running text more obviously.
There was a problem hiding this comment.
I think we're pushing in the same direction - I don't think the "traversal function has a limitation" flows into the warning banner (I don't think anything really flows into or out of these banners, apart from headings)
Maybe the following advice should just be rolled into the banner itself? Or maybe give the detailed advice, and then a brief warning banner afterwards for anyone who skipped over it?
There was a problem hiding this comment.
I did something similar in https://docs.python.org/3/c-api/call.html#vectorcall, but ended up also having an introductory sentence.
Could you give a concrete suggestion?
| .. note:: | ||
|
|
||
| The :c:member:`~PyTypeObject.tp_traverse` function can be called from any | ||
| thread. |
There was a problem hiding this comment.
Should we mention something about "though all other Python threads are guaranteed to be blocked at the time" here?
There was a problem hiding this comment.
AFAIK that's an implementation detail some people want to get rid of.
There was a problem hiding this comment.
Aren't these new functions explicitly not thread safe, though? If we can't guarantee that you can safely access your objects during tp_traverse, what are we expecting - authors should figure out their own way to lock all other Python threads? How are they meant to coordinate it?
Maybe there's a way to describe it that says "the entire object graph is guaranteed to not be changed by any other Python thread" that leaves open the possibility of those threads not being blocked, but also not being able to interfere?
There was a problem hiding this comment.
That's a good question -- one we should have an answer to. Are you OK leaving it to future PRs? Alas, it wouldn't be the only unclear detail around C API in free-threading.
Doc/c-api/gcsupport.rst
Outdated
| Note that these functions may fail (return ``NULL`` or -1). | ||
| Only creating and setting the exception is suppressed. |
There was a problem hiding this comment.
If they fail during tp_traverse, should we be bailing out as quickly as possible? What value should we return?
There was a problem hiding this comment.
That's a good question; honestly I'm not sure about the intentions behind tp_traverse's result.
@pablogsal: would it be correct to say tp_traverse should return zero on recoverable errors, including being unable to visit some objects?
Currently gc.c ignores errors*, but non-zero a result makes VISIT return early; gc_free_threading.c calls PyErr_NoMemory if traversal fails.
* except _PyGC_GetReferrers (gc.get_referrers) which instead assumes that tp_traverse sets an exception on failure -- that's wrong, right?
Previous invalid comment
EDIT: Ok, please ignore this for now, the errors are almost certainly stack overflows from the fuzz input, but I'm not sure why I'm seeing it only for this PR, it's looking more likely to be an issue at my end Ok, this is going to be a horrible comment, sorry! I'm running some independent fuzzing on PRs, and this pr is triggering some segfaults where others don't seem to be. Unfortunately it seems to be some sort of heisencorruption because if I run the same input in isolation then I can't reproduce, but it seems to be only happening with this PR. Over the course of 5 hours of full-tilt fuzzing, I have 227 cores. I'm continuing to investigate, and will update when I know more, but I wanted to put this out there slightly early incase it helps. I think what happend was a test input for a different pr got picked up and run for this pr which exposed the bug the other pr was trying to fix. Sorry, please ignore this. |
|
|
||
| The :c:member:`~PyTypeObject.tp_traverse` handler must have the following type: | ||
| The :c:member:`~PyTypeObject.tp_clear` handler must be of the :c:type:`inquiry` type, or ``NULL`` | ||
| if the object is immutable. |
There was a problem hiding this comment.
I would prefer to clarify "if the object is immutable". A tuple is immutable and can contain reference cycle:
>>> t=(1,2,[])
>>> t[-1].append(t)
>>> t
(1, 2, [(...)])Maybe something like:
"or NULL if the object doesn't contain other objects or only contains built-in scalar objects (bool, int, float, complex, bytes, str, NoneType)."
It's not easy to define when it's not needed to implement tp_clear().
There was a problem hiding this comment.
That's for tp_clear -- I only moved those docs; I prefer to keep it out of scope of this PR.
Co-authored-by: Victor Stinner <vstinner@python.org>
vstinner
left a comment
There was a problem hiding this comment.
LGTM.
You should copy the NEWS entry to What's New in Python 3.15.
I tested ./python -m test test_capi -R 3:3: it does not leak.
Objects/typeobject.c
Outdated
| return NULL; | ||
| } | ||
| return (char *)obj + Py_TYPE(obj)->tp_basicsize; | ||
| return PyObject_GetItemData_DuringGC(obj); |
There was a problem hiding this comment.
Calling PyObject_GetItemData_DuringGC() here repeats !PyType_HasFeature() test, it's inefficient. You can leave (char *)obj + Py_TYPE(obj)->tp_basicsize here, or write a sub-function used by PyObject_GetItemData_DuringGC() and PyObject_GetItemData().
There are newly documented restrictions on tp_traverse:
📚 Documentation preview 📚: https://cpython-previews--145925.org.readthedocs.build/