[mypyc] Use cached ASCII characters in CPyStr_GetItem #21035

Merged
ilevkivskyi merged 1 commit into python:master from VaggelisD:str-getitem-cache
Mar 18, 2026

VaggelisD (Contributor) commented Mar 18, 2026

For characters with code points below 256, use PyUnicode_FromOrdinal(), which returns CPython's cached single-character Latin-1 string objects instead of allocating a new PyUnicode object on every str[i] access. This avoids allocation and deallocation overhead in character-scanning hot loops.

Characters >= 256 (BMP, supplementary) keep the original PyUnicode_New allocation path unchanged.
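The sharing that the fast path relies on can also be observed from plain Python. The following is a minimal sketch, assuming CPython, where single-character strings below code point 256 are shared objects; this is an implementation detail rather than a language guarantee. The helper name `shares_object` is hypothetical, and the snippet shows interpreter behaviour, not the mypyc C helper changed in this PR.

```python
def shares_object(code_point: int) -> bool:
    """Return True if indexing the same character twice yields one shared object."""
    s = chr(code_point) * 4  # a four-character string of one repeated character
    first = s[0]             # each index expression produces a 1-character str
    last = s[3]
    return first is last     # identity check, not equality


if __name__ == "__main__":
    samples = [
        ("ASCII", 0x61),
        ("Latin-1", 0xE9),
        ("BMP", 0x4E2D),
        ("Supplementary", 0x1F600),
    ]
    for label, cp in samples:
        # On CPython the first two are expected to be shared; characters at or
        # above 256 are normally allocated as fresh objects on each access.
        print(f"{label:14} U+{cp:04X} shared: {shares_object(cp)}")
```

When the two lookups return the same object, no allocation takes place on either access, which is the cost the fast path avoids.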

I ran the following micro-benchmark: scan a 50k-character string with s[i] in a loop, repeating the scan 5000 times:

| String type | Before (ms/iter) | After (ms/iter) | Speedup |
|---|---|---|---|
| ASCII (0–127) | 0.651 | 0.166 | 3.9x (-75%) |
| Latin-1 (128–255) | 0.752 | 0.162 | 4.6x (-78%) |
| BMP (256–65535) | 0.901 | 0.809 | no change |
| Supplementary (>65535) | 0.842 | 0.743 | no change |
| Mixed (25% each) | 0.817 | 0.542 | 1.5x (-34%) |
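The harness behind these numbers is not included in the PR. A benchmark of this shape could look roughly like the sketch below. The function names, the code point ranges chosen for each category, and the default repeat count are all assumptions made for illustration. To measure the mypyc code path, the module would need to be compiled with mypyc first; run under plain CPython it only exercises the interpreter.

```python
import time


def make_string(lo: int, hi: int, length: int = 50_000) -> str:
    """Build a string of `length` characters cycling through code points lo..hi."""
    span = hi - lo + 1
    return "".join(chr(lo + (i % span)) for i in range(length))


def count_matches(s: str, target: str) -> int:
    """Scan the string with s[i] and count characters equal to `target`."""
    count = 0
    for i in range(len(s)):
        c = s[i]  # the single-character lookup being measured
        if c == target:
            count += 1
    return count


def bench(s: str, repeats: int = 100) -> float:
    """Return the mean time of one full scan, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_matches(s, "x")
    return (time.perf_counter() - start) * 1000 / repeats


if __name__ == "__main__":
    # Illustrative ranges only; each stays inside the category it is named after.
    cases = {
        "ASCII": make_string(0x20, 0x7E),
        "Latin-1": make_string(0xA0, 0xFF),
        "BMP": make_string(0x4E00, 0x4FFF),
        "Supplementary": make_string(0x1F600, 0x1F64F),
    }
    for name, text in cases.items():
        print(f"{name:14} {bench(text):.3f} ms/iter")
```

The loop compares each character instead of discarding it, so the lookup result is actually used; a compiler may still optimise such patterns, so absolute numbers should be read with care.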

This was co-authored with @tobymao.

ilevkivskyi (Member) left a comment

LG, thanks!

ilevkivskyi merged commit 6bcd02e into python:master on Mar 18, 2026
17 checks passed