The [Sphinx regression caused by the incremental GC changes in CPython](https://github.com/python/cpython/issues/124567) was largely missed by the pyperformance benchmark suite. We should add a benchmark that reproduces this case.
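
As a rough starting point, a benchmark along these lines might exercise the collector with many cyclic, container-heavy objects. This is only a sketch under assumptions: `Node`, `build_tree`, `count_nodes`, and `bench_doctree` are hypothetical names, and the parent/child tree is a stand-in for a document tree, not the actual Sphinx workload from the linked issue.

```python
import gc
import time


class Node:
    """Hypothetical stand-in for a document-tree node (not a real Sphinx class)."""

    def __init__(self, parent=None):
        self.parent = parent      # back-reference: creates a reference cycle
        self.children = []
        self.attributes = {}
        if parent is not None:
            parent.children.append(self)


def build_tree(depth, fanout):
    """Build a tree with `fanout` children per node, `depth` levels below the root."""
    root = Node()
    level = [root]
    for _ in range(depth):
        level = [Node(parent) for parent in level for _ in range(fanout)]
    return root


def count_nodes(root):
    """Count the nodes reachable from `root` via `children`."""
    total = 0
    stack = [root]
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(node.children)
    return total


def bench_doctree(loops, depth=6, fanout=4, keep=8):
    """Time building `loops` trees while keeping only the last `keep` alive.

    Dropped trees are cyclic, so only the cyclic garbage collector can
    reclaim them; the trees still held in `live` are long-lived objects
    the collector has to cope with in the meantime.
    """
    live = []
    start = time.perf_counter()
    for _ in range(loops):
        live.append(build_tree(depth, fanout))
        if len(live) > keep:
            live.pop(0)
    return time.perf_counter() - start


if __name__ == "__main__":
    gc.enable()
    print(f"elapsed: {bench_doctree(20):.3f}s")
```

`bench_doctree` takes a loop count and returns elapsed seconds, so it could probably be registered through pyperf's `Runner.bench_time_func`. Whether this shape actually triggers the same slowdown would need to be checked against the real Sphinx build.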