Populate text_block field during OCR processing#1216

Open
majiayu000 wants to merge 1 commit into llmware-ai:main from majiayu000:fix/ocr-populate-text-block

@majiayu000

Summary

  • Fixes OCR processing to populate both the 'text' and 'text_search' fields
  • Previously only 'text_search' was populated, leaving 'text' (stored as 'text_block' in the database) empty
  • Semantic queries now return the extracted text content for OCR-processed images

Root cause

The ocr_images_in_library method only called new_block.update({"text_search": text_chunk})
but not new_block.update({"text": text_chunk}). Since 'text' maps to 'text_block' in the
database and semantic queries retrieve from 'text_block', the text was not returned.
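
To make the root cause concrete, here is a minimal sketch of the behavior before and after the change. It is not the library's actual code: the helper names (build_ocr_block_before, build_ocr_block_after, to_db_record) and the sample block are hypothetical stand-ins, and the 'text' to 'text_block' mapping is modeled only as described above.

```python
# Illustrative sketch only. All function names below are hypothetical;
# the real logic lives inside ocr_images_in_library.

def build_ocr_block_before(base_block: dict, text_chunk: str) -> dict:
    """Old behavior as described in this PR: only 'text_search' is set."""
    new_block = dict(base_block)
    new_block.update({"text_search": text_chunk})
    return new_block


def build_ocr_block_after(base_block: dict, text_chunk: str) -> dict:
    """Fixed behavior: both 'text' and 'text_search' receive the OCR text."""
    new_block = dict(base_block)
    new_block.update({"text": text_chunk})
    new_block.update({"text_search": text_chunk})
    return new_block


def to_db_record(block: dict) -> dict:
    """Assumed mapping: the record key 'text' is stored as 'text_block'."""
    return {
        "text_block": block.get("text", ""),
        "text_search": block.get("text_search", ""),
    }


# An image block is assumed to start with empty text fields.
image_block = {"text": "", "text_search": ""}
ocr_text = "Invoice total: 42.00"

before = to_db_record(build_ocr_block_before(image_block, ocr_text))
after = to_db_record(build_ocr_block_after(image_block, ocr_text))
```

Under these assumptions, `before["text_block"]` stays empty while `after["text_block"]` holds the OCR text, which is why a query that reads 'text_block' only returns content after the fix.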

Test plan

  • Added unit tests verifying both fields are populated
  • Tests confirm text and text_search contain matching content
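
The tests themselves are not shown on this page. As a rough sketch of what such checks might look like, the example below uses a hypothetical helper, populate_ocr_text, in place of the real OCR code path; the sample block and its keys other than 'text' and 'text_search' are assumptions for illustration.

```python
import unittest


def populate_ocr_text(block: dict, text_chunk: str) -> dict:
    """Hypothetical stand-in for the fixed update step in the OCR method."""
    updated = dict(block)
    updated.update({"text": text_chunk, "text_search": text_chunk})
    return updated


class OcrTextFieldsTest(unittest.TestCase):
    def setUp(self):
        # Sample block and OCR output are made up for this sketch.
        start = {"content_type": "image", "text": "", "text_search": ""}
        self.block = populate_ocr_text(start, "Invoice total: 42.00")

    def test_both_fields_populated(self):
        self.assertTrue(self.block["text"])
        self.assertTrue(self.block["text_search"])

    def test_fields_match(self):
        self.assertEqual(self.block["text"], self.block["text_search"])
```

Saved as a module, this can be run with `python -m unittest`. A test against the real method would need a library with image blocks and an OCR backend, or mocks for both.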

Fixes #1123

Signed-off-by: majiayu000 <1835304752@qq.com>

When running OCR on library images, only the text_search field was being
populated, leaving text_block (mapped from 'text' in the record) empty.

This caused semantic queries to return empty text results since they
retrieve content from text_block, not text_search.

The fix ensures both 'text' and 'text_search' fields are populated with
the OCR-extracted content.

Fixes llmware-ai#1123

Signed-off-by: majiayu000 <1835304752@qq.com>