
truncate html output of run stale cells tool#8578

Merged
mscolnick merged 2 commits into marimo-team:main from PranavGopinath:truncate-tool-output
Mar 6, 2026

Conversation

@PranavGopinath
Contributor

@PranavGopinath PranavGopinath commented Mar 4, 2026

📝 Summary

Truncate large cell outputs in run stale cells tool responses to prevent exceeding LLM token limits.

🔍 Description of Changes

When marimo's agent runs stale cells or retrieves cell outputs, large outputs such as Plotly figures or rich dataframes (text/html) can produce enormous amounts of text. In larger notebooks this can reach roughly 10M characters, which far exceeds Anthropic's 200k-token input limit and breaks the conversation. Later down the line, a configurable limit in marimo.toml might work better.

Frontend (run-cells-tool.ts):

  • Summarize text/html outputs (not interpretable by LLMs) into a short description with char count
  • Truncate text outputs to 2,000 chars, error outputs to 3,000 chars
  • Track total output size across all cells with a 40k char limit; summarize remaining cells when exceeded
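A minimal sketch of the two per-cell helpers described above. The constant names match those added in run-cells-tool.ts, and the "Full output visible in the notebook UI." wording matches the tests below, but the function bodies here are illustrative assumptions, not the PR's exact implementation:

```typescript
// Sketch only: constant names mirror the diff; bodies are assumptions.
const MAX_TEXT_OUTPUT_CHARS = 2000;
const MAX_ERROR_OUTPUT_CHARS = 3000;

// Cap a string at `limit` chars, appending a marker that points the
// reader back to the full output in the notebook UI.
function truncateString(text: string, limit: number): string {
  if (text.length <= limit) {
    return text;
  }
  const omitted = text.length - limit;
  return `${text.slice(0, limit)}\n[TRUNCATED: ${omitted} chars omitted. Full output visible in the notebook UI.]`;
}

// HTML (e.g. a serialized Plotly figure) is rarely interpretable by an
// LLM, so replace it entirely with a short description plus char count.
function summarizeHtmlOutput(html: string): string {
  return `HTML Output: text/html content, ${html.length} chars. Full output visible in the notebook UI.`;
}
```

Error outputs would go through the same truncateString with the larger MAX_ERROR_OUTPUT_CHARS limit, since tracebacks carry more signal per character than plain output.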

Tested end-to-end: 3 Plotly cells went from ~125k chars → ~300 chars (245x reduction).

📋 Checklist

  • I have read the contributor guidelines.
  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (please provide a link if applicable).
    → Messaged @mscolnick over Slack.
  • Tests have been added for the changes made.
  • Documentation has been updated where applicable, including docstrings for API changes.
  • Pull request title is a good summary of the changes - it will be used in the release notes.

@vercel

vercel bot commented Mar 4, 2026

The latest updates on your projects.

Project: marimo-docs · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Mar 4, 2026 8:28pm


@github-actions

github-actions bot commented Mar 4, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@PranavGopinath
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@Light2Dark Light2Dark added the bug Something isn't working label Mar 5, 2026
Collaborator

@Light2Dark Light2Dark left a comment


I think this is good, thank you! I'm going to clean up the code in a follow-up

Contributor

Copilot AI left a comment


Pull request overview

This PR adds output truncation logic to the frontend RunStaleCellsTool to prevent exceeding LLM token limits when running stale cells in marimo's AI agent mode. Large cell outputs (like Plotly figures producing millions of characters of HTML) were breaking conversations by exceeding Anthropic's 200k token input limit.

Changes:

  • Added output size limits: text/html outputs are summarized to a short description, text outputs capped at 2,000 chars, error outputs at 3,000 chars, with a 40k char total budget across all cells
  • Restructured the cell output processing loop to track total output size and skip remaining cells when the budget is exceeded
  • Added tests for HTML summarization and text truncation

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
frontend/src/core/ai/tools/run-cells-tool.ts Added output truncation constants and logic: HTML outputs are summarized, text/error outputs are truncated, and a global character budget limits total output size
frontend/src/core/ai/tools/__tests__/run-cells-tool.test.ts Added tests for HTML output summarization and text output truncation


Comment on lines +414 to +491
describe("output truncation", () => {
  it("should summarize text/html output instead of dumping raw content", async () => {
    const notebook = MockNotebook.notebookState({
      cellData: {
        [cellId1]: { code: "fig.show()", edited: true },
      },
    });
    store.set(notebookAtom, notebook);

    vi.mocked(runCells).mockImplementation(async () => {
      const updatedNotebook = store.get(notebookAtom);
      updatedNotebook.cellRuntime[cellId1] = {
        ...updatedNotebook.cellRuntime[cellId1],
        status: "idle",
      };
      store.set(notebookAtom, updatedNotebook);
    });

    const largeHtml = `<div>${"x".repeat(2_000_000)}</div>`;
    vi.mocked(getCellContextData).mockReturnValue({
      cellOutput: {
        outputType: "text",
        processedContent: null,
        imageUrl: null,
        output: { mimetype: "text/html", data: largeHtml },
      },
      consoleOutputs: null,
      cellName: "cell1",
    } as never);

    const result = await tool.handler({}, toolContext as never);

    expect(result.status).toBe("success");
    const output = result.cellsToOutput?.[cellId1]?.cellOutput ?? "";
    expect(output).toContain("HTML Output:");
    expect(output).toContain("text/html");
    expect(output.length).toBeLessThan(200);
    expect(output).not.toContain(largeHtml);
  });

  it("should truncate large text output to MAX_TEXT_OUTPUT_CHARS", async () => {
    const notebook = MockNotebook.notebookState({
      cellData: {
        [cellId1]: { code: "print(big_string)", edited: true },
      },
    });
    store.set(notebookAtom, notebook);

    vi.mocked(runCells).mockImplementation(async () => {
      const updatedNotebook = store.get(notebookAtom);
      updatedNotebook.cellRuntime[cellId1] = {
        ...updatedNotebook.cellRuntime[cellId1],
        status: "idle",
      };
      store.set(notebookAtom, updatedNotebook);
    });

    const largeText = "a".repeat(10_000);
    vi.mocked(getCellContextData).mockReturnValue({
      cellOutput: {
        outputType: "text",
        processedContent: largeText,
        imageUrl: null,
        output: { mimetype: "text/plain", data: largeText },
      },
      consoleOutputs: null,
      cellName: "cell1",
    } as never);

    const result = await tool.handler({}, toolContext as never);

    const output = result.cellsToOutput?.[cellId1]?.cellOutput ?? "";
    expect(output).toContain("[TRUNCATED:");
    expect(output).toContain("Full output visible in the notebook UI.");
    // Output should be capped (2000 chars content + "Output:\n" prefix + truncation message)
    expect(output.length).toBeLessThan(2200);
  });
});

Copilot AI Mar 5, 2026


The new truncation logic has several important code paths that lack test coverage:

  1. Budget limiting (MAX_TOOL_OUTPUT_CHARS): No test verifies that when total output exceeds 40k chars, subsequent cells get the "output omitted due to context limits" message. This is the core mechanism for preventing token limit issues with many cells.

  2. Error output truncation: The code uses a higher limit (MAX_ERROR_OUTPUT_CHARS = 3000) for error outputs, but there's no test verifying that error outputs use this separate limit rather than the standard 2000-char limit.

  3. Console output truncation: Console outputs are truncated via this.truncateString(consoleOutputString, MAX_TEXT_OUTPUT_CHARS), but there's no test covering large console output truncation.
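As an illustration of the first point, the global-budget mechanism can be reduced to a pure function and tested in isolation. This is a sketch: the real code tracks the budget inside the cell-processing loop, and the function name and exact placeholder message here are assumptions:

```typescript
// Sketch: once the cumulative size of emitted outputs passes the budget,
// every subsequent cell gets a placeholder instead of its output.
const MAX_TOOL_OUTPUT_CHARS = 40_000;

function applyOutputBudget(outputs: string[]): string[] {
  let total = 0;
  return outputs.map((out) => {
    if (total >= MAX_TOOL_OUTPUT_CHARS) {
      return "[output omitted due to context limits]";
    }
    total += out.length;
    return out;
  });
}
```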

Copilot uses AI. Check for mistakes.
Comment on lines +27 to +30
// Output size limits to prevent exceeding LLM token limits.
const MAX_TEXT_OUTPUT_CHARS = 2000;
const MAX_ERROR_OUTPUT_CHARS = 3000;
const MAX_TOOL_OUTPUT_CHARS = 40_000;

Copilot AI Mar 5, 2026


The PR description states that backend changes were made to cells.py (applying HTML summarization and truncation to GetCellOutputs tool) and shared utilities (truncate_string, summarize_html_output) were added to output_cleaning.py. However, none of these backend changes are present in this PR. The GetCellOutputs.handle() method in marimo/_ai/_tools/tools/cells.py still passes raw visual_output (including potentially large HTML) without any truncation. This means the backend MCP tool path is still vulnerable to the same large-output issue that the frontend fix addresses.

Collaborator


It does look like this is missing based on your PR description @PranavGopinath

Contributor Author

@PranavGopinath PranavGopinath Mar 5, 2026


Oh, I forgot to remove that from the PR description. I initially implemented the change, but figured: what is the point of GetCellOutputs if it's not returning the full output? That said, the output is still extremely large, but I find the function doesn't get called unless you explicitly ask the agent to use it. Run stale cells, on the other hand, runs after every notebook edit, so truncating it is the more necessary fix.

Collaborator


Got it, thank you. Will merge this in for next release

@mscolnick mscolnick merged commit 64d9584 into marimo-team:main Mar 6, 2026
33 of 37 checks passed
@github-actions

github-actions bot commented Mar 6, 2026

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.20.5-dev20
