fix: exclude kernel memory from server memory stats to prevent double-counting#8426
Conversation
…-counting The usage endpoint was counting kernel process memory in both the 'server' and 'kernel' fields. Since the kernel is a child of the server process, main_process.children(recursive=True) included the kernel and its children, causing server + kernel to exceed total system memory. Fix by collecting kernel PIDs first and skipping them when summing server memory. Fixes marimo-team#8320
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
I have read the CLA Document and I hereby sign the CLA You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot. |
|
I have read the CLA Document and I hereby sign the CLA |
There was a problem hiding this comment.
Pull request overview
This PR fixes incorrect memory reporting in the /api/usage endpoint by preventing kernel RSS from being counted both under server.memory (server process + recursive children) and kernel.memory.
Changes:
- Collects the kernel process PID (and its recursive children PIDs) before computing server memory.
- Excludes kernel-related PIDs when summing
server_memoryto avoid double-counting.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| try: | ||
| if session and (pid := session.kernel_pid()) is not None: | ||
| kernel_process = psutil.Process(pid) | ||
| kernel_pids.add(kernel_process.pid) | ||
| kernel_memory = kernel_process.memory_info().rss | ||
| kernel_children = kernel_process.children(recursive=True) | ||
| for child in kernel_children: | ||
| kernel_pids.add(child.pid) | ||
| try: | ||
| kernel_memory += child.memory_info().rss | ||
| except psutil.NoSuchProcess: | ||
| pass | ||
| except psutil.ZombieProcess: | ||
| LOGGER.warning("Kernel process is a zombie") | ||
|
|
There was a problem hiding this comment.
Kernel memory collection can raise psutil.NoSuchProcess (e.g., kernel exits between session.kernel_pid() and psutil.Process(pid) / memory_info()), which would currently bubble up and fail the /api/usage request. Consider catching psutil.NoSuchProcess for the whole kernel block (similar to the child loop) and leaving kernel_memory=None (and kernel_pids empty) when the kernel process is gone.
| # Server memory (excluding kernel processes to avoid double-counting) | ||
| main_process = psutil.Process() | ||
| server_memory = main_process.memory_info().rss | ||
| children = main_process.children(recursive=True) | ||
| for child in children: | ||
| if child.pid in kernel_pids: | ||
| continue | ||
| try: | ||
| server_memory += child.memory_info().rss | ||
| except psutil.NoSuchProcess: | ||
| pass |
There was a problem hiding this comment.
The new behavior (excluding kernel PIDs from server_memory) isn't covered by tests. It would be valuable to add a unit test that monkeypatches psutil.Process / .children() / .memory_info().rss to assert that kernel RSS is not included in the reported server.memory while still being included in kernel.memory.
Summary
Fixes a bug where the usage endpoint reports
server + kernel > total system memory.Problem:
server_memorywas calculated by summing the main process + all recursive children viamain_process.children(recursive=True). Since the kernel process is a child of the server, its memory (and its children's memory) was included in both theserverandkernelfields of the response.Fix: Collect kernel PIDs first, then skip them when summing server memory.
Fixes #8320
Test Plan
test_health.pypasskernel_pidsset filter