
fix: exclude kernel memory from server memory stats to prevent double-counting #8426

Merged
mscolnick merged 1 commit into marimo-team:main from bxff:fix/memory-stats-double-counting
Feb 23, 2026


@bxff
Contributor

@bxff bxff commented Feb 22, 2026

Summary

Fixes a bug where the /api/usage endpoint reported server + kernel memory greater than total system memory.

Problem: server_memory was calculated by summing the RSS of the main process and all of its recursive children (main_process.children(recursive=True)). Since the kernel process is a child of the server, its memory (and its children's memory) was included in both the server and kernel fields of the response.

Fix: collect the kernel PIDs (the kernel process and its recursive children) first, then skip them when summing server memory.

Fixes #8320
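A minimal sketch of the reordering, using a hypothetical in-memory process tree instead of psutil so it runs anywhere. Proc, descendants(), kernel_usage(), and server_usage() are illustrative names, not marimo's code, and the RSS sizes are made up. It computes the kernel total and PID set first, then sums server memory with and without those PIDs excluded:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Proc:
    """Hypothetical stand-in for a process: a PID, its RSS in bytes, and direct children."""
    pid: int
    rss: int
    children: list[Proc] = field(default_factory=list)

    def descendants(self) -> list[Proc]:
        # Mirrors psutil.Process.children(recursive=True): all descendants, excluding self.
        out: list[Proc] = []
        for child in self.children:
            out.append(child)
            out.extend(child.descendants())
        return out


def kernel_usage(kernel: Proc) -> tuple[int, set[int]]:
    """Kernel RSS plus its descendants' RSS, and the set of PIDs that were counted."""
    pids = {kernel.pid}
    memory = kernel.rss
    for child in kernel.descendants():
        pids.add(child.pid)
        memory += child.rss
    return memory, pids


def server_usage(server: Proc, exclude: set[int]) -> int:
    """Server RSS plus its descendants' RSS, skipping any PID in `exclude`."""
    memory = server.rss
    for child in server.descendants():
        if child.pid in exclude:
            continue
        memory += child.rss
    return memory


# Illustrative tree (made-up sizes): server -> kernel -> worker, plus one other server child.
worker = Proc(pid=12, rss=100)
kernel = Proc(pid=11, rss=400, children=[worker])
helper = Proc(pid=13, rss=50)
server = Proc(pid=10, rss=200, children=[kernel, helper])

kernel_mem, kernel_pids = kernel_usage(kernel)        # computed first
before = server_usage(server, exclude=set())          # old behavior: kernel counted again
after = server_usage(server, exclude=kernel_pids)     # fixed behavior
print(kernel_mem, before, after)  # → 500 750 250
```

With the exclusion, server + kernel (250 + 500) equals the 750 bytes the tree actually holds; without it, the kernel's 500 is counted twice (750 + 500).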

Test Plan

  • All 6 existing tests in test_health.py pass
  • The change itself is small: compute kernel memory before server memory, and skip any server child whose PID is in the resulting kernel_pids set

…-counting

The usage endpoint was counting kernel process memory in both the
'server' and 'kernel' fields. Since the kernel is a child of the server
process, main_process.children(recursive=True) included the kernel and
its children, causing server + kernel to exceed total system memory.

Fix by collecting kernel PIDs first and skipping them when summing
server memory.

Fixes marimo-team#8320
@vercel

vercel bot commented Feb 22, 2026

The latest updates on your projects.

Project      Deployment  Actions           Updated (UTC)
marimo-docs  Ready       Preview, Comment  Feb 22, 2026 9:49pm


@github-actions


Thank you for your submission; we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@bxff
Contributor Author

bxff commented Feb 22, 2026

I have read the CLA Document and I hereby sign the CLA

@mscolnick mscolnick requested a review from Copilot February 22, 2026 22:43
@mscolnick mscolnick added the bug Something isn't working label Feb 22, 2026

Copilot AI left a comment


Pull request overview

This PR fixes incorrect memory reporting in the /api/usage endpoint by preventing kernel RSS from being counted both under server.memory (server process + recursive children) and kernel.memory.

Changes:

  • Collects the kernel process PID (and its recursive children PIDs) before computing server memory.
  • Excludes kernel-related PIDs when summing server_memory to avoid double-counting.


Comment on lines 228 to 242
try:
    if session and (pid := session.kernel_pid()) is not None:
        kernel_process = psutil.Process(pid)
        kernel_pids.add(kernel_process.pid)
        kernel_memory = kernel_process.memory_info().rss
        kernel_children = kernel_process.children(recursive=True)
        for child in kernel_children:
            kernel_pids.add(child.pid)
            try:
                kernel_memory += child.memory_info().rss
            except psutil.NoSuchProcess:
                pass
except psutil.ZombieProcess:
    LOGGER.warning("Kernel process is a zombie")


Copilot AI Feb 22, 2026


Kernel memory collection can raise psutil.NoSuchProcess (e.g., kernel exits between session.kernel_pid() and psutil.Process(pid) / memory_info()), which would currently bubble up and fail the /api/usage request. Consider catching psutil.NoSuchProcess for the whole kernel block (similar to the child loop) and leaving kernel_memory=None (and kernel_pids empty) when the kernel process is gone.

Copilot uses AI. Check for mistakes.
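Copilot's suggestion could look roughly like the sketch below: the whole kernel block is wrapped so that a vanished kernel yields (None, set()) instead of failing the request. This is a hedged illustration, not the merged code. NoSuchProcess, ZombieProcess, FakeProcess, collect_kernel_memory, and make_factory are hypothetical stand-ins so the sketch runs without psutil. One detail does carry over from psutil: ZombieProcess subclasses NoSuchProcess there, so the zombie handler has to come first to keep its warning.

```python
import logging
from types import SimpleNamespace
from typing import Any, Callable, Optional

LOGGER = logging.getLogger(__name__)


class NoSuchProcess(Exception):
    """Stand-in for psutil.NoSuchProcess."""


class ZombieProcess(NoSuchProcess):
    """Stand-in for psutil.ZombieProcess, which subclasses NoSuchProcess in psutil."""


def collect_kernel_memory(pid: Optional[int], make_process: Callable[[int], Any]) -> tuple:
    """Return (kernel RSS, kernel PIDs), or (None, set()) if there is no live kernel."""
    if pid is None:
        return None, set()
    try:
        kernel_process = make_process(pid)
        pids = {kernel_process.pid}
        memory = kernel_process.memory_info().rss
        for child in kernel_process.children(recursive=True):
            pids.add(child.pid)
            try:
                memory += child.memory_info().rss
            except NoSuchProcess:
                pass  # child exited mid-scan; skip its RSS
    except ZombieProcess:  # must precede NoSuchProcess, its base class
        LOGGER.warning("Kernel process is a zombie")
        return None, set()
    except NoSuchProcess:
        # Kernel exited between kernel_pid() and memory_info(): report no kernel memory.
        return None, set()
    return memory, pids


class FakeProcess:
    """Hypothetical process double exposing only the psutil-like methods used above."""

    def __init__(self, pid: int, rss: int, children: tuple = ()) -> None:
        self.pid = pid
        self._rss = rss
        self._children = list(children)

    def memory_info(self) -> SimpleNamespace:
        return SimpleNamespace(rss=self._rss)

    def children(self, recursive: bool = False) -> list:
        if not recursive:
            return list(self._children)
        out = []
        for child in self._children:
            out.append(child)
            out.extend(child.children(recursive=True))
        return out


def make_factory(table: dict) -> Callable[[int], FakeProcess]:
    """Build a Process(pid)-like constructor that raises NoSuchProcess for unknown PIDs."""
    def make(pid: int) -> FakeProcess:
        if pid not in table:
            raise NoSuchProcess(pid)
        return table[pid]
    return make


factory = make_factory({11: FakeProcess(11, 400, children=(FakeProcess(12, 100),))})
print(collect_kernel_memory(11, factory))  # (500, {11, 12})
print(collect_kernel_memory(99, factory))  # (None, set()): kernel already gone
```

Returning an empty PID set for a missing kernel means the server loop simply has nothing to exclude, which matches the "leave kernel_pids empty" behavior Copilot describes.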
Comment on lines +243 to +253
# Server memory (excluding kernel processes to avoid double-counting)
main_process = psutil.Process()
server_memory = main_process.memory_info().rss
children = main_process.children(recursive=True)
for child in children:
    if child.pid in kernel_pids:
        continue
    try:
        server_memory += child.memory_info().rss
    except psutil.NoSuchProcess:
        pass

Copilot AI Feb 22, 2026


The new behavior (excluding kernel PIDs from server_memory) isn't covered by tests. It would be valuable to add a unit test that monkeypatches psutil.Process / .children() / .memory_info().rss to assert that kernel RSS is not included in the reported server.memory while still being included in kernel.memory.
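A test along the lines Copilot describes might look like this pytest-style sketch. It uses unittest.mock.MagicMock objects in place of psutil.Process instances; usage_memory and fake_proc are hypothetical names, and usage_memory is a reduced model of the endpoint logic (kernel first, then server minus kernel PIDs), not marimo's actual function. In the real suite one would more likely monkeypatch psutil.Process where the endpoint module looks it up.

```python
from typing import Any, Callable, Optional
from unittest.mock import MagicMock


def usage_memory(process_cls: Callable[..., Any], kernel_pid: Optional[int]) -> dict:
    """Hypothetical reduction of the usage logic: kernel first, then server minus kernel PIDs."""
    kernel_pids: set = set()
    kernel_memory: Optional[int] = None
    if kernel_pid is not None:
        kernel = process_cls(kernel_pid)
        kernel_pids.add(kernel.pid)
        memory = kernel.memory_info().rss
        for child in kernel.children(recursive=True):
            kernel_pids.add(child.pid)
            memory += child.memory_info().rss
        kernel_memory = memory
    main = process_cls()  # like psutil.Process(): the current (server) process
    server_memory = main.memory_info().rss
    for child in main.children(recursive=True):
        if child.pid not in kernel_pids:
            server_memory += child.memory_info().rss
    return {"server": server_memory, "kernel": kernel_memory}


def fake_proc(pid: int, rss: int, children: Optional[list] = None) -> MagicMock:
    """MagicMock shaped like a psutil.Process: .pid, .memory_info().rss, .children()."""
    proc = MagicMock()
    proc.pid = pid
    proc.memory_info.return_value.rss = rss
    # Already flattened, as children(recursive=True) would return.
    proc.children.return_value = children or []
    return proc


def test_kernel_rss_not_counted_in_server_memory() -> None:
    worker = fake_proc(12, 100)
    kernel = fake_proc(11, 400, [worker])
    helper = fake_proc(13, 50)
    server = fake_proc(10, 200, [kernel, worker, helper])
    by_pid = {11: kernel}
    # Process() returns the server; Process(pid) looks the PID up.
    process_cls = MagicMock(side_effect=lambda pid=None: server if pid is None else by_pid[pid])

    usage = usage_memory(process_cls, kernel_pid=11)

    assert usage["kernel"] == 400 + 100
    assert usage["server"] == 200 + 50  # kernel (11) and its worker (12) excluded
```

Because the mocks return a pre-flattened children list, the test pins down exactly which PIDs the server sum sees, which is the property the fix is meant to guarantee.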

@mscolnick mscolnick merged commit 4a7ca82 into marimo-team:main Feb 23, 2026
42 of 45 checks passed

Labels

bug Something isn't working


Development

Successfully merging this pull request may close these issues.

Machine stats card on bare metal host (ie not in container environment) showing higher marimo server and kernel usage than computer memory

4 participants