
fix: use non-blocking stdout writes in stdio_server to prevent event loop deadlock#2070

Closed
retospect wants to merge 1 commit into modelcontextprotocol:main from retospect:fix/stdio-nonblocking-writes

Conversation

@retospect

fix: use non-blocking stdout writes in stdio_server to prevent event loop deadlock

Problem

When an MCP server tool returns a response larger than the OS pipe buffer (64 KB on macOS), stdout_writer blocks the entire event loop on the await stdout.write() call. This happens because anyio.wrap_file delegates to a synchronous write() on a blocking fd — if the pipe buffer is full (client hasn't read yet), the write syscall blocks, and no other async tasks can run.

In practice this manifests as:

  • Server hangs indefinitely after returning a large tool result (e.g. list_papers with 500+ entries returning ~74 KB of JSON)
  • The hang is silent — no error, no timeout, no log entry
  • The server process stays alive but is completely unresponsive
  • Only affects macOS (64 KB pipe buffer) in practice; Linux has a 1 MB default

Reported in #547.

Root cause

anyio.wrap_file(TextIOWrapper(sys.stdout.buffer)) wraps the synchronous file in a thread worker, but the underlying write() still blocks when the kernel pipe buffer is full. Since MCP stdio transport is a single pipe between server and client, the client must read before the server can write more — but the server can't process the client's next read request because the event loop is blocked on the write.
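The pipe-full condition itself is easy to observe. A standalone sketch (illustrative, not part of the PR; POSIX-only) that fills a fresh pipe until the kernel buffer is exhausted, i.e. the point at which a blocking write would stall:

```python
import os

def pipe_capacity() -> int:
    """Fill a fresh pipe until the kernel reports EAGAIN and return how
    many bytes fit -- the point at which a write on a *blocking* fd would
    have stalled the writer instead of raising."""
    r, w = os.pipe()
    os.set_blocking(w, False)  # full buffer raises instead of blocking
    total = 0
    try:
        while True:
            total += os.write(w, b"x" * 4096)
    except BlockingIOError:
        pass  # buffer full: a blocking fd would hang right here
    finally:
        os.close(r)
        os.close(w)
    return total
```

On macOS this typically reports 65536; a tool result larger than that cannot leave the server until the client starts reading.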

Fix

For the default stdout path (no custom override):

  1. Set the stdout fd to non-blocking (os.set_blocking(fd, False))
  2. Write in small chunks (4 KB) directly via os.write(), catching BlockingIOError (EAGAIN) and yielding to the event loop with await anyio.sleep(0.005) before retrying

This ensures the event loop never blocks on a pipe-full condition. The 4 KB chunk size is well below the 64 KB macOS pipe buffer, so most writes complete in a single syscall. When the buffer fills, the coroutine yields and retries after the client drains some data.
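A minimal sketch of this write loop (names are illustrative, and asyncio.sleep stands in for the PR's anyio.sleep so the snippet is self-contained):

```python
import asyncio
import os

CHUNK_SIZE = 4096  # well under the 64 KB macOS pipe buffer

async def write_nonblocking(fd: int, data: bytes) -> None:
    """Chunked non-blocking write: on a full pipe, yield to the event
    loop instead of stalling it, then retry once the reader has drained."""
    os.set_blocking(fd, False)
    view = memoryview(data)
    while view:
        try:
            written = os.write(fd, view[:CHUNK_SIZE])
            view = view[written:]
        except BlockingIOError:  # EAGAIN: kernel pipe buffer is full
            # Yield so other tasks (e.g. the stdin reader) keep running,
            # then retry after a short pause.
            await asyncio.sleep(0.005)
```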

Custom stdout overrides (the stdout parameter) use the original anyio.wrap_file path unchanged.

Testing

Tested in production with an MCP server managing 500+ research papers, where list_papers regularly returns 60-80 KB responses. Before this fix, the server would hang ~1 in 3 calls. After the fix, zero hangs over weeks of use.

Closes #547

…loop deadlock

When a tool returns a response larger than the OS pipe buffer (64 KB on
macOS), stdout_writer blocks the entire event loop on write() because
anyio.wrap_file delegates to a synchronous write on a blocking fd.

Fix: set stdout fd to non-blocking mode and write in 4 KB chunks via
os.write(), catching BlockingIOError (EAGAIN) and yielding to the event
loop before retrying.  Custom stdout overrides use the original path.

Closes modelcontextprotocol#547
@maxisbey
Contributor

maxisbey commented Mar 5, 2026

Thanks for the investigation and the detailed writeup — but after a thorough review, I don't think we can merge this. Closing for the following reasons:

This doesn't fix #547

Issue #547 describes a server "hanging" after server.run() when run directly in a terminal (python mcp_stdio_test.py). That's expected behavior — a stdio server blocks reading stdin until a client sends JSON-RPC messages. It's meant to be spawned by a client, not run interactively. No tool is ever called in that repro, so a fix for large tool responses can't address it.

The root cause analysis is incorrect

The PR states that anyio.wrap_file blocks the event loop when the pipe buffer fills. It doesn't — AsyncFile.write() calls to_thread.run_sync(), which blocks a worker thread while the event loop continues. I verified this empirically: with a pipe held full for 0.5s, a 50ms heartbeat task recorded 10 ticks during the blocked write. The default anyio thread limiter is 40, so the 2 threads stdio uses (reader + writer) won't exhaust it.
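That heartbeat experiment can be reproduced standalone (asyncio's run_in_executor standing in for anyio's to_thread.run_sync; all names here are illustrative):

```python
import asyncio
import os
import time

async def heartbeat_during_blocked_write() -> int:
    """Block a worker thread on a full pipe and count how many 50 ms
    heartbeat ticks the event loop completes in the meantime."""
    r, w = os.pipe()
    # Pre-fill the pipe so the next blocking write stalls.
    os.set_blocking(w, False)
    try:
        while True:
            os.write(w, b"x" * 4096)
    except BlockingIOError:
        pass
    os.set_blocking(w, True)

    loop = asyncio.get_running_loop()
    # The blocking write runs in a worker thread, as anyio.wrap_file's
    # AsyncFile.write does; the event loop itself is never blocked.
    pending = loop.run_in_executor(None, os.write, w, b"y" * 4096)

    ticks = 0
    deadline = time.monotonic() + 0.5
    while time.monotonic() < deadline:
        await asyncio.sleep(0.05)  # 50 ms heartbeat
        ticks += 1

    # Drain the pipe so the worker thread's write can complete.
    os.set_blocking(r, False)
    try:
        while True:
            os.read(r, 65536)
    except BlockingIOError:
        pass
    await pending
    os.close(r)
    os.close(w)
    return ticks
```

The heartbeat keeps ticking (roughly 10 ticks over 0.5 s) while the worker thread sits in the blocked write, which is the reviewer's point: the event loop stays responsive.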

If you genuinely observed hangs with 74KB responses, the cause is elsewhere — most likely a client-side issue or a true bidirectional deadlock (both sides blocked on write), which this change also wouldn't fix since the data still has to go somewhere.

Regressions this would introduce

  • Global stdout mutation: os.set_blocking(sys.stdout.fileno(), False) affects the entire process and is never restored. Any print() or logging call anywhere can now raise BlockingIOError when the pipe is momentarily full.
  • Windows: os.set_blocking() on stdout fails on console handles and only supports pipes on Python 3.12+. No platform guard.
  • Merge conflict: the PR base uses exclude_none=True, while main switched to exclude_unset=True in #2056 (fix: allow null id in JSONRPCError per JSON-RPC 2.0 spec). A merge risks silently reverting that fix.
  • Busy-wait polling: the anyio.sleep(0.005) retry loop polls the pipe, which is less efficient than a blocked worker thread that the kernel wakes exactly when the pipe drains.
  • No tests: the existing test passes a custom stdout, so the new fd path has 0% coverage.

If you have a minimal reproducer for the large-response hang (including the client side), please open a new issue — happy to dig into what's actually going on there.


maxisbey closed this Mar 5, 2026


Development

Successfully merging this pull request may close these issues.

MCP Server Hangs Indefinitely After KqueueSelector Log on macOS (Python 3.12, Both STDIO & SSE)
