fix: strip stray <p></p> HTML tags from imported Jupyter markdown cells by giulio-leone · Pull Request #8674 · marimo-team/marimo

giulio-leone · 2026-03-13T05:36:57Z

Summary

When importing Jupyter notebooks (e.g. SM_sphere_S2.ipynb), markdown cells containing <p>…</p> HTML paragraph wrappers are kept verbatim inside mo.md(). This breaks LaTeX rendering because the markdown/math renderer does not process LaTeX delimiters that appear inside HTML <p> elements.

Example

Jupyter markdown cell:

<p>We declare that $\mathbb{S}^2 = U \cup V$:</p>

Before (broken in marimo): The <p> tags prevent the LaTeX formula from rendering.

After (this fix): Converted to plain text:

We declare that $\mathbb{S}^2 = U \cup V$:

Fix

Added a _strip_paragraph_tags() helper in marimo/_convert/ipynb/to_ir.py that removes bare <p> / </p> tags (including opening tags that carry attributes) from markdown cell source before passing it to markdown_to_marimo().

Other HTML tags (<div>, etc.) are preserved; only paragraph wrappers are stripped since they are redundant in plain markdown.

Closes #8651


Copilot

Pull request overview

This PR updates the Jupyter (.ipynb) importer to normalize markdown cell sources by removing HTML paragraph wrappers (<p> / </p>) before converting them into mo.md(...), improving LaTeX rendering for notebooks that embed math inside <p> tags.

Changes:

Add a _strip_paragraph_tags() helper (and compiled regex) to remove <p> wrappers from markdown cell source.
Apply the stripping step during convert_from_ipynb_to_notebook_ir() before calling markdown_to_marimo().

Comments suppressed due to low confidence (1)

marimo/_convert/ipynb/to_ir.py:1431

This new import-time normalization changes markdown cell semantics, but there's no targeted test asserting the intended behavior (strip <p> wrappers, preserve other HTML, and avoid breaking code fences). Since the repo has an established ipynb importer test suite (e.g. tests/_convert/ipynb/test_ipynb_to_ir.py), please add a unit test covering representative <p>-wrapped markdown cells.

        if is_markdown:
            source = _strip_paragraph_tags(source)
            cell_meta = cell.get("metadata", {})
            md_prefix = cell_meta.get("marimo", {}).get(
                "md_prefix", DEFAULT_MARKDOWN_PREFIX
            )
            source = markdown_to_marimo(source, prefix=md_prefix)


marimo/_convert/ipynb/to_ir.py

+_PARAGRAPH_TAG_RE = re.compile(
+    r"<p(?:\s[^>]*)?>|</p>",
+    re.IGNORECASE,
+)
+
+
+def _strip_paragraph_tags(source: str) -> str:
+    """Remove bare ``<p>`` / ``</p>`` HTML tags from markdown source.
+
+    Jupyter markdown cells often wrap content in ``<p>…</p>`` tags which are
+    redundant in plain markdown and can break LaTeX rendering inside
+    ``mo.md()``.  This helper removes them while preserving all other HTML
+    tags and the text content.
+    """
+    return _PARAGRAPH_TAG_RE.sub("", source)
+
+


mscolnick · 2026-03-13T13:35:34Z

marimo/_convert/ipynb/to_ir.py

+    ``mo.md()``.  This helper removes them while preserving all other HTML
+    tags and the text content.
+    """
+    return _PARAGRAPH_TAG_RE.sub("", source)
+
+


these are good suggestions and we can add test cases for each one

Thanks @mscolnick! I'll add test cases for these edge cases — specifically:

Preserving <p> tags inside fenced code blocks

Handling adjacent paragraph tags (<p>a</p><p>b</p>) with proper separation

Will push an update shortly.
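To make the two edge cases concrete, the snippet below reuses the regex from the first revision of this PR and shows what it does to adjacent paragraphs and to a fenced code block. The function name strip_all_p_tags and the sample strings are made up for illustration; only the regex itself comes from the diff.

```python
import re

# Regex from the first revision of this PR: matches <p>, <p ...> and </p>.
_PARAGRAPH_TAG_RE = re.compile(r"<p(?:\s[^>]*)?>|</p>", re.IGNORECASE)


def strip_all_p_tags(source: str) -> str:
    """First-revision behavior: delete every paragraph tag, wherever it is."""
    return _PARAGRAPH_TAG_RE.sub("", source)


# Adjacent paragraphs lose their separation entirely.
adjacent = strip_all_p_tags("<p>a</p><p>b</p>")

# An HTML example inside a fenced code block is rewritten as well.
fenced = strip_all_p_tags("```html\n<p>example</p>\n```")

print(repr(adjacent))
print(repr(fenced))
```

Both results differ from what a reader of the original notebook would expect, which is why the follow-up revision treats closing tags and fenced blocks specially.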

mscolnick

Thanks for the contribution! Could we add some tests for this? Could we also check that it does not strip <p> tags that carry attributes, since that is a real change to output?

giulio-leone · 2026-03-14T19:35:02Z

Added tests and addressed reviewer feedback:

Improvements to _strip_paragraph_tags:

Now skips <p> / </p> tags inside fenced code blocks (both backtick and tilde fences), so HTML examples in documentation/code are preserved
Replaces closing </p> tags with a newline instead of removing them, preventing adjacent paragraphs from collapsing (e.g. <p>a</p><p>b</p> no longer becomes ab)
Cleans up excessive blank lines introduced by replacements
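The three improvements above can be sketched roughly as follows. This is not the code from the PR: the function name strip_p_outside_fences and the fence-detection regex are assumptions made for this illustration, and the fence handling is a simplified line-by-line approximation of Markdown's rules.

```python
import re
from typing import Optional

# Hypothetical patterns for this sketch; the PR's real regexes may differ.
_FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
_OPEN_P_RE = re.compile(r"<p(?:\s[^>]*)?>", re.IGNORECASE)
_CLOSE_P_RE = re.compile(r"</p>", re.IGNORECASE)


def strip_p_outside_fences(source: str) -> str:
    """Strip paragraph tags line by line, leaving fenced code untouched."""
    out = []
    fence: Optional[str] = None  # marker that opened the current fence
    for line in source.split("\n"):
        m = _FENCE_RE.match(line)
        if fence is None:
            if m:
                fence = m.group(1)  # entering a fenced block
            else:
                line = _OPEN_P_RE.sub("", line)
                line = _CLOSE_P_RE.sub("\n", line)  # keep paragraphs apart
        elif (
            m
            and m.group(1)[0] == fence[0]
            and len(m.group(1)) >= len(fence)
            and line.strip() == m.group(1)
        ):
            fence = None  # closing fence: same character, at least as long
        out.append(line)
    # Collapse runs of blank lines introduced by the replacements.
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(out))
    return text.strip()


print(repr(strip_p_outside_fences("<p>a</p><p>b</p>")))
print(repr(strip_p_outside_fences("```html\n<p>x</p>\n```")))
```

In this sketch a closing tag becomes a single newline, so adjacent paragraphs stay on separate lines, and anything between a matching pair of fences passes through unchanged.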

12 test cases added in tests/_convert/ipynb/test_strip_paragraph_tags.py:

Basic tag removal
Tags with attributes
Case-insensitive matching
Adjacent paragraph separation
Fenced code block preservation (backtick and tilde)
Multiline paragraphs
No-tags passthrough
Empty string
LaTeX content preservation
Nested HTML preservation

All tests verified locally.

Jupyter notebooks often wrap paragraph text in <p>...</p> tags which Jupyter renders natively. When converting to marimo, these tags are kept verbatim inside mo.md() where they interfere with LaTeX rendering: LaTeX delimiters inside <p> elements are not processed by the markdown/math renderer. Add a _strip_paragraph_tags() helper that removes bare <p> / </p> tags (including opening tags that carry attributes) before passing the markdown source to markdown_to_marimo(). Other HTML tags (<div>, etc.) are preserved. Closes marimo-team#8651


Address reviewer suggestions from @mscolnick and @Copilot:
- Skip <p> / </p> tags inside fenced code blocks (backtick and tilde)
- Replace </p> with newline to preserve paragraph separation
- Add 12 test cases covering: basic removal, attributes, case sensitivity, adjacent paragraph separation, fenced code block preservation (backtick and tilde), multiline, passthrough, empty string, LaTeX, nested HTML


Address review feedback from mscolnick:
- Change regex to only match bare <p> (no attributes)
- Styled <p> tags (those with attributes) are preserved
- Pair-matching ensures </p> is only stripped when closing a bare <p>
- Add tests for styled tag preservation, mixed bare+styled, adjacent styled

giulio-leone · 2026-03-14T21:44:22Z

Updated the implementation based on the review feedback:

Changes

Styled <p> tags are now preserved: only bare <p> tags (no attributes) are stripped. Tags that carry attributes (for example style or class) remain intact since they carry semantic meaning.
Stack-based pair matching: each </p> is matched to its corresponding <p> opener. Only pairs where the opener is bare get stripped.
Fenced code blocks still protected: <p> tags inside ``` or ~~~ blocks are never touched.
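A minimal sketch of the stack-based pair matching described above, assuming a single combined regex and a hypothetical function name strip_bare_p_pairs. Fenced-code protection is left out to keep the example short, and the choice to keep an unmatched </p> is an assumption of this sketch rather than something the PR states.

```python
import re

# Group 1 captures the attribute text of an opening tag, if any.
_P_TAG_RE = re.compile(r"<p(\s[^>]*)?>|</p>", re.IGNORECASE)


def strip_bare_p_pairs(source: str) -> str:
    """Strip <p>...</p> pairs only when the opening tag has no attributes."""
    out = []
    stack = []  # one bool per open <p>: True when that opener was bare
    pos = 0
    for m in _P_TAG_RE.finditer(source):
        out.append(source[pos:m.start()])  # text between tags
        pos = m.end()
        tag = m.group(0)
        if tag.lower() == "</p>":
            # Unmatched closers are kept (an assumption of this sketch).
            opener_was_bare = stack.pop() if stack else False
            out.append("\n" if opener_was_bare else tag)
        else:
            attrs = m.group(1)
            bare = attrs is None or not attrs.strip()
            stack.append(bare)
            if not bare:
                out.append(tag)  # styled opener stays verbatim
    out.append(source[pos:])
    return "".join(out)


print(repr(strip_bare_p_pairs('<p>bare</p><p class="lead">styled</p>')))
```

Because each closing tag consults the stack, a styled paragraph keeps both its opener and its closer even when it sits next to a bare one.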

Tests added

test_preserves_styled_p_tag: <p style="color: red"> stays
test_preserves_p_with_class: <p class="lead"> stays
test_preserves_p_with_id: <p id="intro"> stays
test_mixed_bare_and_styled — bare stripped, styled preserved in same cell
test_adjacent_styled_paragraphs_fully_preserved — adjacent styled pairs intact
All existing tests updated and passing

Copilot

Pull request overview

This PR improves the Jupyter notebook (.ipynb) importer by preprocessing markdown cell content to remove problematic HTML paragraph wrappers that interfere with downstream markdown/LaTeX rendering in mo.md().

Changes:

Add _strip_paragraph_tags() to remove bare <p>...</p> wrappers from imported markdown source (with fenced code block protection).
Apply paragraph-tag stripping during markdown cell conversion in convert_from_ipynb_to_notebook_ir().
Add unit tests covering paragraph stripping behavior and fenced-code-block preservation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
marimo/_convert/ipynb/to_ir.py	Introduces `_strip_paragraph_tags()` and applies it to markdown cells during ipynb → IR conversion.
tests/_convert/ipynb/test_strip_paragraph_tags.py	Adds focused unit tests for paragraph tag stripping behavior.


marimo/_convert/ipynb/to_ir.py

+    # Clean up excessive blank lines introduced by replacements
+    source = re.sub(r"\n{3,}", "\n\n", source)
+    return source.strip()


marimo/_convert/ipynb/to_ir.py

+    Only bare ``<p>`` tags (without attributes) are removed.  Styled tags such
+    as ``<p style="color: red">`` are preserved because they carry semantic
+    meaning.  The matching ``</p>`` is only removed when it closes a bare
+    ``<p>``.


tests/_convert/ipynb/test_strip_paragraph_tags.py

+    def test_preserves_styled_p_tag(self) -> None:
+        """Styled <p> tags carry semantic meaning and must be preserved."""
+        source = '<p style="color: red">Red text</p>'
+        result = _strip_paragraph_tags(source)
+        assert '<p style="color: red">' in result
+        assert "</p>" in result
+
+    def test_preserves_p_with_class(self) -> None:
+        result = _strip_paragraph_tags('<p class="lead">Styled text</p>')
+        assert '<p class="lead">' in result
+        assert "</p>" in result
+
+    def test_preserves_p_with_id(self) -> None:
+        result = _strip_paragraph_tags('<p id="intro">Intro text</p>')
+        assert '<p id="intro">' in result
+


dmadisetti

Preference to not use regex to strip HTML. I think we can use a markdown preprocessor to handle code blocks and the builtin HTMLParser accordingly.

But maybe a larger change, and this is fine
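For comparison, here is one way the HTMLParser idea could look. It is only a sketch under stated assumptions: the class and function names are hypothetical, the parser is used purely to locate tags (via getpos() and get_starttag_text()) so the rest of the source is copied through byte for byte, and markdown code blocks are not handled, which is the part the comment above would delegate to a markdown preprocessor. Math containing a literal "<" may also confuse the HTML parser.

```python
from html.parser import HTMLParser


class _BarePFinder(HTMLParser):
    """Record the source spans of bare <p> tags and their closing tags."""

    def __init__(self, source: str) -> None:
        super().__init__(convert_charrefs=False)
        self.source = source
        self.spans = []  # (start, end, replacement)
        self._stack = []  # True when the open <p> had no attributes
        # Offsets of each line start, to turn (line, column) into an index.
        self._line_starts = [0] + [
            i + 1 for i, ch in enumerate(source) if ch == "\n"
        ]

    def _offset(self) -> int:
        line, col = self.getpos()  # position of the tag being handled
        return self._line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if tag != "p":
            return
        bare = not attrs
        self._stack.append(bare)
        if bare:
            start = self._offset()
            end = start + len(self.get_starttag_text())
            self.spans.append((start, end, ""))

    def handle_endtag(self, tag):
        if tag != "p":
            return
        bare = self._stack.pop() if self._stack else False
        if bare:
            start = self._offset()
            end = self.source.index(">", start) + 1
            self.spans.append((start, end, "\n"))

    def handle_startendtag(self, tag, attrs):
        pass  # leave self-closing tags such as <p/> untouched


def strip_bare_p_with_parser(source: str) -> str:
    finder = _BarePFinder(source)
    finder.feed(source)
    finder.close()
    # Apply replacements from the end so earlier offsets stay valid.
    for start, end, repl in sorted(finder.spans, reverse=True):
        source = source[:start] + repl + source[end:]
    return source


print(repr(strip_bare_p_with_parser("<p>We declare that $x$:</p>")))
```

Working from recorded offsets rather than re-serializing the parse keeps entities, LaTeX and other HTML exactly as written, at the cost of some bookkeeping.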


Replace hand-rolled _FENCED_BLOCK_RE regex with RE_NESTED_FENCE_START from pymdownx.superfences, which is already used elsewhere in the codebase (marimo/_convert/markdown/to_ir.py) and is a battle-tested dependency of marimo. As suggested by @dmadisetti in review. Signed-off-by: giulio-leone <giulio97.leone@gmail.com>

giulio-leone requested a review from dmadisetti as a code owner · March 13, 2026 05:36
Copilot AI review requested due to automatic review settings · March 13, 2026 05:36
Copilot started reviewing on behalf of giulio-leone · March 13, 2026 05:37
Copilot AI reviewed · Mar 13, 2026
mscolnick reviewed · Mar 13, 2026
giulio-leone added 3 commits · March 14, 2026 21:07
[pre-commit.ci] auto fixes from pre-commit.com hooks · c3a6806
giulio-leone force-pushed the fix/ipynb-strip-html-paragraph-tags branch from a9f69ea to e8185f0 · March 14, 2026 20:07
giulio-leone force-pushed the fix/ipynb-strip-html-paragraph-tags branch from cf23997 to f8f3e26 · March 15, 2026 16:20
mscolnick requested a review from Copilot · March 15, 2026 21:28
Copilot started reviewing on behalf of mscolnick · March 15, 2026 21:29
Copilot AI reviewed · Mar 15, 2026
dmadisetti mentioned this pull request · Mar 16, 2026: fix: unwrap top-level paragraph HTML in ipynb markdown import #8691 (Closed)
dmadisetti reviewed · Mar 16, 2026
dmadisetti added the bug (Something isn't working) label · Mar 16, 2026

giulio-leone wants to merge 5 commits into marimo-team:main from giulio-leone:fix/ipynb-strip-html-paragraph-tags
