truncate html output of run stale cells tool #8578
mscolnick merged 2 commits into marimo-team:main
Conversation
Light2Dark left a comment:
I think this is good, thank you! I'm going to clean up the code in a follow-up
Pull request overview
This PR adds output truncation logic to the frontend RunStaleCellsTool to prevent exceeding LLM token limits when running stale cells in marimo's AI agent mode. Large cell outputs (like Plotly figures producing millions of characters of HTML) were breaking conversations by exceeding Anthropic's 200k token input limit.
Changes:
- Added output size limits: `text/html` outputs are summarized to a short description, text outputs are capped at 2,000 chars, and error outputs at 3,000 chars, with a 40k-char total budget across all cells
- Restructured the cell output processing loop to track total output size and skip remaining cells when the budget is exceeded
- Added tests for HTML summarization and text truncation
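To make the behavior concrete, here is a minimal sketch of what the two truncation helpers could look like. The function names, signatures, and exact message strings below are assumptions inferred from the tests (`[TRUNCATED:`, `Full output visible in the notebook UI.`, `HTML Output:`), not code copied from the PR diff:

```typescript
// Hypothetical sketch of the truncation helpers; names and message
// formats are inferred from the PR's tests, not from the actual diff.
const MAX_TEXT_OUTPUT_CHARS = 2000;

function truncateString(text: string, maxChars: number): string {
  if (text.length <= maxChars) {
    return text;
  }
  const omitted = text.length - maxChars;
  return (
    text.slice(0, maxChars) +
    `\n[TRUNCATED: ${omitted} chars omitted] Full output visible in the notebook UI.`
  );
}

function summarizeHtmlOutput(html: string): string {
  // Replace the raw markup with a short description the LLM can act on.
  return `HTML Output: <${html.length} chars of text/html omitted; rendered in the notebook UI>`;
}

const summary = summarizeHtmlOutput(`<div>${"x".repeat(2_000_000)}</div>`);
const truncated = truncateString("a".repeat(10_000), MAX_TEXT_OUTPUT_CHARS);
```

The key design point is that HTML is summarized rather than truncated: a prefix of raw Plotly markup is useless to the model, while a one-line description still tells it the cell rendered successfully.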
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| frontend/src/core/ai/tools/run-cells-tool.ts | Added output truncation constants and logic: HTML outputs are summarized, text/error outputs are truncated, and a global character budget limits total output size |
| frontend/src/core/ai/tools/__tests__/run-cells-tool.test.ts | Added tests for HTML output summarization and text output truncation |
```typescript
describe("output truncation", () => {
  it("should summarize text/html output instead of dumping raw content", async () => {
    const notebook = MockNotebook.notebookState({
      cellData: {
        [cellId1]: { code: "fig.show()", edited: true },
      },
    });
    store.set(notebookAtom, notebook);

    vi.mocked(runCells).mockImplementation(async () => {
      const updatedNotebook = store.get(notebookAtom);
      updatedNotebook.cellRuntime[cellId1] = {
        ...updatedNotebook.cellRuntime[cellId1],
        status: "idle",
      };
      store.set(notebookAtom, updatedNotebook);
    });

    const largeHtml = `<div>${"x".repeat(2_000_000)}</div>`;
    vi.mocked(getCellContextData).mockReturnValue({
      cellOutput: {
        outputType: "text",
        processedContent: null,
        imageUrl: null,
        output: { mimetype: "text/html", data: largeHtml },
      },
      consoleOutputs: null,
      cellName: "cell1",
    } as never);

    const result = await tool.handler({}, toolContext as never);

    expect(result.status).toBe("success");
    const output = result.cellsToOutput?.[cellId1]?.cellOutput ?? "";
    expect(output).toContain("HTML Output:");
    expect(output).toContain("text/html");
    expect(output.length).toBeLessThan(200);
    expect(output).not.toContain(largeHtml);
  });

  it("should truncate large text output to MAX_TEXT_OUTPUT_CHARS", async () => {
    const notebook = MockNotebook.notebookState({
      cellData: {
        [cellId1]: { code: "print(big_string)", edited: true },
      },
    });
    store.set(notebookAtom, notebook);

    vi.mocked(runCells).mockImplementation(async () => {
      const updatedNotebook = store.get(notebookAtom);
      updatedNotebook.cellRuntime[cellId1] = {
        ...updatedNotebook.cellRuntime[cellId1],
        status: "idle",
      };
      store.set(notebookAtom, updatedNotebook);
    });

    const largeText = "a".repeat(10_000);
    vi.mocked(getCellContextData).mockReturnValue({
      cellOutput: {
        outputType: "text",
        processedContent: largeText,
        imageUrl: null,
        output: { mimetype: "text/plain", data: largeText },
      },
      consoleOutputs: null,
      cellName: "cell1",
    } as never);

    const result = await tool.handler({}, toolContext as never);

    const output = result.cellsToOutput?.[cellId1]?.cellOutput ?? "";
    expect(output).toContain("[TRUNCATED:");
    expect(output).toContain("Full output visible in the notebook UI.");
    // Output should be capped (2000 chars content + "Output:\n" prefix + truncation message)
    expect(output.length).toBeLessThan(2200);
  });
});
```
The new truncation logic has several important code paths that lack test coverage:
- Budget limiting (`MAX_TOOL_OUTPUT_CHARS`): No test verifies that when total output exceeds 40k chars, subsequent cells get the "output omitted due to context limits" message. This is the core mechanism for preventing token limit issues with many cells.
- Error output truncation: The code uses a higher limit (`MAX_ERROR_OUTPUT_CHARS = 3000`) for error outputs, but there's no test verifying that error outputs use this separate limit rather than the standard 2000-char limit.
- Console output truncation: Console outputs are truncated via `this.truncateString(consoleOutputString, MAX_TEXT_OUTPUT_CHARS)`, but there's no test covering large console output truncation.
```typescript
// Output size limits to prevent exceeding LLM token limits.
const MAX_TEXT_OUTPUT_CHARS = 2000;
const MAX_ERROR_OUTPUT_CHARS = 3000;
const MAX_TOOL_OUTPUT_CHARS = 40_000;
```
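These constants drive a shared per-call budget across all cells. A minimal sketch of how such a budget check might work follows; the function name, loop structure, and omission message are illustrative assumptions, not the PR's exact code:

```typescript
// Hypothetical sketch of the shared-budget mechanism described in the review.
const MAX_TOOL_OUTPUT_CHARS = 40_000;

function applyBudget(cellOutputs: Map<string, string>): Map<string, string> {
  const limited = new Map<string, string>();
  let total = 0;
  for (const [cellId, output] of cellOutputs) {
    if (total + output.length > MAX_TOOL_OUTPUT_CHARS) {
      // Once the shared budget is spent, remaining cells get a stub
      // instead of their output, so the tool response stays bounded.
      limited.set(cellId, "[output omitted due to context limits]");
      continue;
    }
    total += output.length;
    limited.set(cellId, output);
  }
  return limited;
}

// Two 30k-char outputs: the first fits, the second exceeds the 40k budget.
const result = applyBudget(
  new Map([
    ["cell-1", "x".repeat(30_000)],
    ["cell-2", "y".repeat(30_000)],
  ]),
);
```

Per-cell truncation alone is not enough: a notebook with dozens of cells could still blow past the token limit at 2,000 chars each, which is why the global cap exists on top of the per-output limits.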
The PR description states that backend changes were made to cells.py (applying HTML summarization and truncation to GetCellOutputs tool) and shared utilities (truncate_string, summarize_html_output) were added to output_cleaning.py. However, none of these backend changes are present in this PR. The GetCellOutputs.handle() method in marimo/_ai/_tools/tools/cells.py still passes raw visual_output (including potentially large HTML) without any truncation. This means the backend MCP tool path is still vulnerable to the same large-output issue that the frontend fix addresses.
It does look like this is missing based on your PR description @PranavGopinath
Oh, I forgot to remove it from the PR description. I initially implemented the change, but figured: what is the point of GetCellOutputs if it's not returning the full output? That said, the output is still ridiculously large, but I find the function doesn't get called unless you explicitly ask the agent to use it. Run stale cells, on the other hand, is used after every notebook edit, so it's more of a necessary fix.
Got it, thank you. Will merge this in for next release
🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.20.5-dev20
📝 Summary
Truncate large cell outputs in run stale cells tool responses to prevent exceeding LLM token limits.
🔍 Description of Changes
When marimo's agent runs stale cells or retrieves cell outputs, large outputs such as Plotly figures or rich dataframes (text/html) can produce millions of characters. In larger notebooks this can reach 10M characters, which far exceeds Anthropic's 200k-token input limit and breaks the conversation. Down the line, it might be possible to make the limits configurable in marimo.toml, which could work better.
Frontend (run-cells-tool.ts):
Tested end-to-end: 3 Plotly cells went from ~125k chars → ~300 chars (245x reduction).
📋 Checklist
- Discussions (please provide a link if applicable): messaged @mscolnick over Slack