
feat(xml): add XML file support for parsing and ontology generation #190

Open
Eskeeet wants to merge 1 commit into 666ghj:main from Eskeeet:feat/xml-support-pr



Eskeeet commented Mar 15, 2026

Summary

  • Add XML parsing to FileParser with streaming support for MediaWiki/Wikipedia dumps and a generic XML fallback for all other XML files
  • Add .xml to ALLOWED_EXTENSIONS in config
  • Update frontend file accept filter and validation to include .xml
  • Add unit tests covering generic XML, MediaWiki XML, and edge cases (12/12 passing)
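The streaming-plus-fallback approach in the first bullet could look roughly like the sketch below. It is not the code from this PR: `extract_xml_text` and `_local` are hypothetical names, and the detection rule (a root element named `mediawiki`) is an assumption. It uses only the standard library's `xml.etree.ElementTree.iterparse`, which yields elements incrementally so a large dump need not be loaded whole.

```python
import io
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip a namespace prefix: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def extract_xml_text(source) -> list[str]:
    """Hypothetical helper: stream an XML file and return text chunks.

    MediaWiki dumps yield one "title\\nbody" chunk per <page>.
    Any other XML falls back to the stripped text of each element,
    collected in the order the elements close.
    """
    chunks: list[str] = []
    root = None
    is_mediawiki = False
    title, body = "", ""

    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = _local(elem.tag)
        if root is None:
            # The first event is always the start of the root element.
            root = elem
            is_mediawiki = name == "mediawiki"  # assumed detection rule
        if event == "start":
            continue

        if is_mediawiki:
            if name == "title":
                title = (elem.text or "").strip()
            elif name == "text":
                body = (elem.text or "").strip()
            elif name == "page":
                chunks.append(f"{title}\n{body}")
                title, body = "", ""
                # Drop finished pages so memory stays flat on large dumps.
                root.clear()
        else:
            text = (elem.text or "").strip()
            if text:
                chunks.append(text)

    return chunks


if __name__ == "__main__":
    sample = b"<catalog><item>alpha</item><item>beta</item></catalog>"
    print(extract_xml_text(io.BytesIO(sample)))
```

Calling `root.clear()` after each `<page>` is what keeps memory bounded; without it, the root element keeps a reference to every page already processed.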

Test plan

  • Run backend/tests/test_file_parser_xml.py unit tests (12/12 passed)
  • Upload a generic .xml file via the UI and verify text is extracted correctly
  • Upload a MediaWiki XML dump and verify article titles and content are extracted
  • Verify unsupported file extensions are still rejected
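The last check in the plan can be illustrated with a minimal sketch of an extension filter. The real `ALLOWED_EXTENSIONS` lives in the project's config and its other entries are not shown in this PR, so the set below is a placeholder in which only `xml` is confirmed. `allowed_file` is a hypothetical name.

```python
from pathlib import Path

# Assumption: placeholder contents. Only "xml" is confirmed by this PR;
# the project's actual set is defined in its config.
ALLOWED_EXTENSIONS = {"txt", "xml"}


def allowed_file(filename: str) -> bool:
    """Return True if the file's final extension is in the allow-list."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return suffix in ALLOWED_EXTENSIONS


if __name__ == "__main__":
    for name in ("dump.xml", "DUMP.XML", "payload.exe", "notes"):
        print(name, allowed_file(name))
```

Only the final suffix is checked, so a name like `archive.xml.exe` is still rejected.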

🤖 Generated with Claude Code

- Add XML parsing to FileParser with streaming MediaWiki dump support and generic XML fallback
- Allow local file paths via `file_paths` form param in /generate-ontology (for large files like 1GB XML dumps)
- Add .xml to ALLOWED_EXTENSIONS in config and frontend file accept filter
- Add unit tests for XML parsing (generic and MediaWiki formats)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dosubot (bot) added the size:L (This PR changes 100-499 lines, ignoring generated files.) and enhancement (New feature or request) labels Mar 15, 2026

Labels

  • enhancement: New feature or request
  • size:L: This PR changes 100-499 lines, ignoring generated files.
