
fix: isolation when running multiple notebooks in an app server#8611

Draft
akshayka wants to merge 24 commits into main from aka/fix-run-multiple-notebooks

Conversation


@akshayka akshayka commented Mar 7, 2026

Summary. This PR introduces process-level isolation when serving multiple apps from the same server (`marimo run directory/`, `create_asgi_app()`); clients of any given app still run as threads in that app's process for efficiency. This fixes a critical bug in which different apps shared the same Python globals, leading to undefined behavior such as collisions in `sys.modules`. The tradeoff is that multi-app servers consume slightly more RAM.

Dependency on pyzmq. The proposed implementation also adds a dependency on pyzmq for multi-app servers, paving the way for sandboxed multi-app servers. It would be possible to design a different solution that used multiprocessing instead of pyzmq, at the cost of not supporting package sandboxes. It is perhaps worth discussing whether we are okay with making pyzmq a required dependency of marimo.

Context. When marimo was originally designed, marimo run only ever served a single notebook. A single process could safely serve multiple clients of the same notebook since they all share the same code.

When multiple-app serving was introduced, we continued serving all clients from a single process, even though the clients were potentially running different programs. When two notebooks both import `utils` but expect different implementations from different directories, whichever app loads first wins, and the second app silently gets the wrong module. Similar collisions can affect other process-wide Python state.
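The collision is reproducible with nothing but the standard library; the directory and module names below are illustrative, not marimo's:

```python
import sys
import tempfile
from pathlib import Path

# Two hypothetical apps each ship their own utils.py with different contents.
root = Path(tempfile.mkdtemp())
for app, impl in [("app1", "VALUE = 'from app1'"), ("app2", "VALUE = 'from app2'")]:
    d = root / app
    d.mkdir()
    (d / "utils.py").write_text(impl)

# App 1 imports utils first; its directory is at the front of sys.path.
sys.path.insert(0, str(root / "app1"))
import utils  # loads app1's copy

# App 2 now "imports" utils from its own directory -- but sys.modules
# already holds app1's copy, so the import is a silent cache hit.
sys.path.insert(0, str(root / "app2"))
import utils  # still app1's copy: the collision this PR fixes
```

Because `sys.modules` is keyed by module name alone, the second import never touches the filesystem; only a separate process (with its own `sys.modules`) gives each app its own `utils`.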

This PR. This PR fixes the problem by running each app in its own OS process. Multiple clients of the same app still share a process (as kernel threads), which allows for fast and cheap sessions. The isolation boundary is per-app, not per-client.

Before (shared process — sys.modules collisions):

  ┌──────────────── Main Process ─────────────────┐
  │                                               │
  │   Kernel(app1, client A)    ← all kernels     │
  │   Kernel(app1, client B)      share one       │
  │   Kernel(app2, client C)      sys.modules     │
  │   Kernel(app2, client D)                      │
  └───────────────────────────────────────────────┘

After (per-app process isolation):

  ┌──────────────── Main Process ─────────────────┐
  │           (HTTP, WebSocket, routing)           │
  └──────────────────┬──────────────┬──────────────┘
                     │ ZMQ          │ ZMQ
       ┌─────────────▼──┐   ┌──────▼──────────────┐
       │  App Process    │   │  App Process        │
       │  (app1.py)      │   │  (app2.py)          │
       │                 │   │                     │
       │  Kernel: cl. A  │   │  Kernel: cl. C      │
       │  Kernel: cl. B  │   │  Kernel: cl. D      │
       └─────────────────┘   └─────────────────────┘
        isolated sys.modules  isolated sys.modules

IPC. Each app process communicates with the main process over 4 shared ZeroMQ sockets (not per-client). Kernel commands and stream output are multiplexed over these shared channels using session ID tagging:

IPC channels (4 shared ZMQ sockets per app process):

  Main Process                             App Subprocess
  ────────────                             ──────────────
  mgmt     [PUSH] ─────────────────────▶ [PULL]  mgmt loop
  response [PULL] ◀───────────────────── [PUSH]  (create/stop kernel)
  cmd      [PUSH] ──[sid, channel, msg]─▶ [PULL]  dispatcher ──▶ kernel queues
  stream   [PULL] ◀──[sid, msg]───────── [PUSH]  collector  ◀── kernel output
Per session (main-process side):

  AppProcessQueueManager
    control_queue    ──┐
    ui_element_queue ──┤──▶ cmd socket (tagged with session_id)
    completion_queue ──┤
    input_queue      ──┘
    stream_queue     ◀──── stream receiver thread (regular Queue)
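The tagging scheme above can be sketched with stdlib queues standing in for the shared ZMQ socket pair; names like `TaggedPushQueue` are illustrative, not the PR's actual identifiers:

```python
import queue
from collections import defaultdict

# A shared queue stands in for the cmd PUSH/PULL socket pair.
cmd_channel: queue.Queue = queue.Queue()

class TaggedPushQueue:
    """Queue-like facade: kernel code calls put() as on a plain queue,
    and the message is forwarded to the shared channel tagged with
    (session_id, channel) so the app process can demultiplex it."""

    def __init__(self, shared: queue.Queue, session_id: str, channel: str):
        self._shared, self._sid, self._channel = shared, session_id, channel

    def put(self, msg: object) -> None:
        self._shared.put((self._sid, self._channel, msg))

# App-process side: the dispatcher routes frames to per-session kernel queues.
kernel_queues: dict = defaultdict(lambda: defaultdict(queue.Queue))

def dispatch_one() -> None:
    sid, channel, msg = cmd_channel.get()
    kernel_queues[sid][channel].put(msg)

# Two sessions share one channel without interfering.
TaggedPushQueue(cmd_channel, "sid-A", "control").put({"op": "run"})
TaggedPushQueue(cmd_channel, "sid-B", "ui_element").put({"op": "set"})
dispatch_one()
dispatch_one()
msg_a = kernel_queues["sid-A"]["control"].get()
msg_b = kernel_queues["sid-B"]["ui_element"].get()
```

The facade keeps kernel code agnostic to the transport: it sees an ordinary `put()`, while the shared channel carries every session's traffic.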

When this path is activated.

  • create_asgi_app(): always enables process isolation (the whole point is multi-app)
  • marimo run app1.py app2.py / directory serving: auto-enables when multiple files are detected
  • marimo run app.py (single file): no change, uses existing thread-based kernels
  • marimo edit: no change, uses existing process-based kernels
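The activation rule above amounts to a small predicate; this is a sketch under stated assumptions (`needs_process_isolation` is a hypothetical helper, not marimo's actual code):

```python
from pathlib import Path

def needs_process_isolation(targets: list, is_asgi_app: bool) -> bool:
    """Isolate whenever more than one app may be served from one server."""
    if is_asgi_app:
        # create_asgi_app() exists to serve multiple apps, so always isolate.
        return True
    files: list = []
    for t in targets:
        p = Path(t)
        # A directory target contributes every notebook file inside it.
        files.extend(sorted(p.glob("*.py")) if p.is_dir() else [p])
    return len(files) > 1

# Single file: unchanged thread-based kernels. Multiple files: isolate.
single = needs_process_isolation(["app.py"], is_asgi_app=False)
multi = needs_process_isolation(["app1.py", "app2.py"], is_asgi_app=False)
```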

akshayka added 12 commits March 6, 2026 22:11
When marimo serves multiple apps (via `create_asgi_app()` or
`marimo run app1.py app2.py`), all kernel threads previously shared
the same Python process and `sys.modules`. This caused module clashes
when different apps imported modules with the same name from different
directories.

This change runs different apps in different OS processes while
keeping multiple clients of the same app as kernel threads within
a single worker process for performance.

Architecture:
- WorkerProcessPool: manages worker processes keyed by file path
- WorkerProcess: wraps multiprocessing.Process per notebook file
- WorkerKernelManager: implements KernelManager protocol via ZeroMQ IPC
- worker_entry.py: subprocess entry point, spawns kernel threads

Activation is automatic for all multi-app scenarios. Single-app
serving is unaffected. pyzmq is required (graceful fallback if missing).
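The pool-keyed-by-file-path idea can be sketched as follows (illustrative only, not the PR's actual WorkerProcessPool; `spawn` stands in for launching a subprocess):

```python
class AppProcessPool:
    """At most one process per notebook file, reused across clients."""

    def __init__(self, spawn):
        self._spawn = spawn            # callable: file_path -> process handle
        self._procs: dict = {}         # file_path -> process handle

    def get_or_spawn(self, file_path: str):
        # Clients of the same app share one process; a new app file
        # triggers exactly one new spawn.
        if file_path not in self._procs:
            self._procs[file_path] = self._spawn(file_path)
        return self._procs[file_path]

spawned = []
pool = AppProcessPool(spawn=lambda path: spawned.append(path) or f"proc:{path}")
a1 = pool.get_or_spawn("app1.py")
a2 = pool.get_or_spawn("app1.py")  # same app -> same process, no new spawn
b = pool.get_or_spawn("app2.py")   # different app -> second process
```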
Worker subprocesses are in the same process group as the main process,
so Ctrl-C sends SIGINT to them too. Without this fix, workers crash
on SIGINT, then the main process hangs trying to send shutdown commands
to dead workers via ZMQ/queues.

The fix: workers ignore SIGINT and rely on the main process to send
ShutdownWorkerCmd via the management queue for graceful teardown.
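The SIGINT policy described above is a one-liner in the subprocess entry point; this sketch is illustrative, not marimo's actual code:

```python
import signal

def install_sigint_policy() -> None:
    """Ignore SIGINT in the app subprocess.

    Ctrl-C delivers SIGINT to every process in the foreground process
    group, so without this the worker dies before the main process can
    coordinate a graceful teardown over the management channel.
    """
    signal.signal(signal.SIGINT, signal.SIG_IGN)

install_sigint_policy()
# From here on, Ctrl-C no longer kills this process; shutdown happens
# only when the main process sends an explicit shutdown command.
handler = signal.getsignal(signal.SIGINT)
```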
When shutting down, PUSH sockets with pending messages block
context.destroy() indefinitely (default linger=-1). Setting linger=0
drops unsent messages immediately, allowing clean shutdown when the
remote end is already gone.

The key abstraction is one OS process per app/notebook file.
"AppProcess" conveys this directly, while "Worker" was generic.

Renames:
- WorkerProcess -> AppProcess
- WorkerProcessPool -> AppProcessPool
- WorkerKernelManager -> AppKernelManager
- ShutdownWorkerCmd -> ShutdownAppProcessCmd
- worker_main -> app_process_main
- worker*.py -> app_process*.py

Move the pyzmq check for multi-app process isolation into cli.py using
MarimoCLIMissingDependencyError, which shows a clean error with install
instructions instead of an ugly traceback.

- Move nested imports to module top level in app_process*.py
- Reorder functions bottom-up (leaf helpers before callers)
- Fix unused import (QueueManager), unused param (tmp_path)
- Fix SpawnProcess/Process type mismatch with proper TYPE_CHECKING imports
- Replace assert with ValueError guard for file_path narrowing

Replace multiprocessing.Process + multiprocessing.Queue with
subprocess.Popen + ZeroMQ for app process management. This enables
future support for sandboxed apps where each notebook needs its own
Python interpreter/venv.

Changes:
- AppProcess now uses subprocess.Popen to launch the app process
- Management channel (commands/responses) uses ZMQ PUSH/PULL sockets
  instead of multiprocessing.Queue
- Commands converted from dataclasses to msgspec.Struct with tagged
  JSON serialization
- app_process_entry.py is now launchable as a module
  (python -m marimo._session.managers.app_process_entry)
- Startup args passed via stdin, ready signal via stdout
  (same pattern as launch_kernel.py)
- AppProcess accepts optional python= parameter for custom interpreter
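The stdin/stdout handshake mentioned above can be sketched with `subprocess` alone; the child script and argument names here are illustrative, not marimo's launch code:

```python
import json
import subprocess
import sys

# Hypothetical child: read startup args as one JSON line from stdin,
# then print a ready marker on stdout once initialization is done.
CHILD = r"""
import json, sys
args = json.loads(sys.stdin.readline())
# ... a real entry point would bind its ZMQ sockets here using the
# addresses carried in args ...
print("READY", flush=True)
"""

proc = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
# Parent: write args, then block until the child signals readiness.
startup_args = {"file": "app1.py", "cmd_addr": "tcp://127.0.0.1:0"}
proc.stdin.write(json.dumps(startup_args) + "\n")
proc.stdin.flush()
ready = proc.stdout.readline().strip()
proc.wait()
```

Passing args over stdin (rather than argv) keeps socket addresses out of the process list, and the ready line gives the parent a cheap synchronization point before it starts routing traffic.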
Move zmq and app_process imports behind TYPE_CHECKING or into local
scope so that importing marimo doesn't fail when pyzmq is absent.
This restores the CLI-friendly missing dependency error.

Instead of creating 12 ZMQ sockets per client connection (6 on each
side via IPCQueueManager), use 4 shared ZMQ channels per app process
that multiplex all kernel communication via session_id tagging.

New clients now just create threading.Queue objects and spawn a kernel
thread, eliminating the ~500ms ZMQ setup overhead per connection.

- Add MuxQueueManager and _MuxPushQueue for multiplexed command sending
- Add cmd dispatcher and stream collector threads in app process
- Pass full ZMQ addresses (not ports) to decouple transport from entry point
- Remove per-kernel Connection.create()/connect() dependency

MuxQueueManager -> AppProcessQueueManager
_MuxPushQueue -> _AppProcessPushQueue

- Shut down dead AppProcess (ZMQ sockets/context) before respawning
- Move install_thread_local_proxies() to process startup (not per-kernel)
- Remove unreachable None checks in AppKernelManager
- Log dropped stream messages for unknown sessions
- Add cross-reference comment for channel name constants

@akshayka akshayka added the bug (Something isn't working), bash-focus (Area to focus on during release bug bash), and breaking (A breaking change) labels on Mar 7, 2026
@mscolnick mscolnick left a comment


I'm starting to think maybe pyzmq should be a required dep. at least in recommended, if not already.
