fix: isolation when running multiple notebooks in an app server#8611
Draft
When marimo serves multiple apps (via `create_asgi_app()` or `marimo run app1.py app2.py`), all kernel threads previously shared the same Python process and `sys.modules`. This caused module clashes when different apps imported modules with the same name from different directories. This change runs different apps in different OS processes while keeping multiple clients of the same app as kernel threads within a single worker process for performance.

Architecture:
- WorkerProcessPool: manages worker processes keyed by file path
- WorkerProcess: wraps multiprocessing.Process per notebook file
- WorkerKernelManager: implements the KernelManager protocol via ZeroMQ IPC
- worker_entry.py: subprocess entry point, spawns kernel threads

Activation is automatic for all multi-app scenarios. Single-app serving is unaffected. pyzmq is required (with a graceful fallback if missing).
Worker subprocesses are in the same process group as the main process, so Ctrl-C sends SIGINT to them too. Without this fix, workers crash on SIGINT, then the main process hangs trying to send shutdown commands to dead workers via ZMQ/queues. The fix: workers ignore SIGINT and rely on the main process to send ShutdownWorkerCmd via the management queue for graceful teardown.
When shutting down, PUSH sockets with pending messages block context.destroy() indefinitely (default linger=-1). Setting linger=0 drops unsent messages immediately, allowing clean shutdown when the remote end is already gone.
The key abstraction is one OS process per app/notebook file. "AppProcess" conveys this directly, while "Worker" was generic.

Renames:
- WorkerProcess -> AppProcess
- WorkerProcessPool -> AppProcessPool
- WorkerKernelManager -> AppKernelManager
- ShutdownWorkerCmd -> ShutdownAppProcessCmd
- worker_main -> app_process_main
- worker*.py -> app_process*.py
Move the pyzmq check for multi-app process isolation into cli.py using MarimoCLIMissingDependencyError, which shows a clean error with install instructions instead of an ugly traceback.
- Move nested imports to module top level in app_process*.py
- Reorder functions bottom-up (leaf helpers before callers)
- Remove an unused import (QueueManager) and an unused parameter (tmp_path)
- Fix the SpawnProcess/Process type mismatch with proper TYPE_CHECKING imports
- Replace an assert with a ValueError guard for file_path narrowing
Replace multiprocessing.Process + multiprocessing.Queue with subprocess.Popen + ZeroMQ for app process management. This enables future support for sandboxed apps, where each notebook needs its own Python interpreter/venv.

Changes:
- AppProcess now uses subprocess.Popen to launch the app process
- Management channel (commands/responses) uses ZMQ PUSH/PULL sockets instead of multiprocessing.Queue
- Commands converted from dataclasses to msgspec.Struct with tagged JSON serialization
- app_process_entry.py is now launchable as a module (python -m marimo._session.managers.app_process_entry)
- Startup args passed via stdin, ready signal via stdout (same pattern as launch_kernel.py)
- AppProcess accepts an optional python= parameter for custom interpreters
Move zmq and app_process imports behind TYPE_CHECKING or into local scope so that importing marimo doesn't fail when pyzmq is absent. This restores the CLI-friendly missing dependency error.
Instead of creating 12 ZMQ sockets per client connection (6 on each side via IPCQueueManager), use 4 shared ZMQ channels per app process that multiplex all kernel communication via session_id tagging. New clients now just create threading.Queue objects and spawn a kernel thread, eliminating the ~500ms ZMQ setup overhead per connection.

- Add MuxQueueManager and _MuxPushQueue for multiplexed command sending
- Add cmd dispatcher and stream collector threads in the app process
- Pass full ZMQ addresses (not ports) to decouple transport from the entry point
- Remove per-kernel Connection.create()/connect() dependency
Renames:
- MuxQueueManager -> AppProcessQueueManager
- _MuxPushQueue -> _AppProcessPushQueue
- Shut down dead AppProcess (ZMQ sockets/context) before respawning
- Move install_thread_local_proxies() to process startup (not per-kernel)
- Remove unreachable None checks in AppKernelManager
- Log dropped stream messages for unknown sessions
- Add a cross-reference comment for channel name constants
mscolnick (Contributor) reviewed Mar 7, 2026:
I'm starting to think maybe pyzmq should be a required dep, at least in recommended, if it isn't already.
- merge management loop and command loop
**Summary.** This PR introduces process-level isolation when serving multiple apps from the same server (`marimo run directory/`, `create_asgi_app()`); clients of any given app are still run in threads in the app's process for efficiency. This fixes a critical bug in which different apps shared the same Python globals, leading to undefined behavior such as collisions in `sys.modules`. It does make multi-app servers consume slightly more RAM.

**Dependency on pyzmq.** The proposed implementation also adds a dependency on `pyzmq` for multi-app servers, to pave the way for allowing sandboxed multi-app servers. It would be possible to design a different solution that used `multiprocessing` instead of `pyzmq`, at the cost of not supporting package sandboxes. It is perhaps worth discussing whether we are okay with making pyzmq a required dependency of marimo.

**Context.** When marimo was originally designed, `marimo run` only ever served a single notebook. A single process could safely serve multiple clients of the same notebook, since they all share the same code. When multi-app serving was introduced, we continued serving all clients from a single process, even though the clients were potentially running different programs. When two notebooks both `import utils` but expect different implementations from different directories, whichever app loads first wins, and the second app silently gets the wrong module. Similar problems may exist for other Python globals.

**This PR.** This PR fixes the problem by running each app in its own OS process. Multiple clients of the same app still share a process (as kernel threads), which allows for fast and cheap sessions. The isolation boundary is per-app, not per-client.
**IPC.** Each app process communicates with the main process over 4 shared ZeroMQ sockets (not per-client). Kernel commands and stream output are multiplexed over these shared channels using session ID tagging.
**When this path is activated.**
- `create_asgi_app()`: always enables process isolation (the whole point is multi-app)
- `marimo run app1.py app2.py` / directory serving: auto-enables when multiple files are detected
- `marimo run app.py` (single file): no change; uses existing thread-based kernels
- `marimo edit`: no change; uses existing process-based kernels