Populate text_block field during OCR processing by majiayu000 · Pull Request #1216 · llmware-ai/llmware

majiayu000 · 2025-12-30T03:39:26Z

Summary

Fixes OCR processing to populate both 'text' and 'text_search' fields
Previously only 'text_search' was populated, leaving 'text_block' empty
Semantic queries now return the OCR-extracted text for processed images

Root cause

The ocr_images_in_library method only called new_block.update({"text_search": text_chunk}),
never new_block.update({"text": text_chunk}). Because 'text' maps to the 'text_block' column in the
database, and semantic queries retrieve content from 'text_block', the OCR text was never returned.
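The mechanism above can be sketched as follows. This is a minimal illustration under stated assumptions, not llmware's code: a plain dict stands in for the block record, and FIELD_TO_COLUMN, add_ocr_text, to_db_row, and semantic_result_text are hypothetical names. Only the 'text' to 'text_block' mapping and the fact that semantic queries read from 'text_block' come from this PR.

```python
# Hypothetical sketch: a plain dict stands in for llmware's block record.
# FIELD_TO_COLUMN is an assumed, simplified key -> column mapping; only
# 'text' -> 'text_block' is stated in the PR description.
FIELD_TO_COLUMN = {"text": "text_block", "text_search": "text_search"}


def add_ocr_text(new_block: dict, text_chunk: str) -> dict:
    """Write OCR output to both fields, as the fix describes."""
    new_block.update({"text": text_chunk})         # stored as text_block
    new_block.update({"text_search": text_chunk})  # used for text search
    return new_block


def to_db_row(block: dict) -> dict:
    """Rename record keys to their (assumed) database column names."""
    return {FIELD_TO_COLUMN.get(k, k): v for k, v in block.items()}


def semantic_result_text(row: dict) -> str:
    """Semantic queries read from text_block, per the PR description."""
    return row.get("text_block", "")


# Before the fix: only text_search is set, so text_block is missing.
buggy = to_db_row({"text_search": "invoice total 42"})
print(repr(semantic_result_text(buggy)))  # prints ''

# After the fix: both fields are set, so text_block carries the OCR text.
fixed = to_db_row(add_ocr_text({}, "invoice total 42"))
print(repr(semantic_result_text(fixed)))  # prints 'invoice total 42'
```

Writing the same chunk to both keys keeps keyword search and semantic retrieval consistent without changing how either path reads the record.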

Test plan

Added unit tests verifying both fields are populated
Tests confirm that 'text' and 'text_search' contain matching content
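A test along these lines might look like the sketch below. It does not reproduce the PR's actual tests: build_ocr_block is a hypothetical stand-in for the block-building step inside ocr_images_in_library, so the test checks only the field-population logic and does not run OCR.

```python
import unittest


def build_ocr_block(text_chunk: str) -> dict:
    """Hypothetical stand-in for the block-building step in ocr_images_in_library."""
    new_block = {}
    new_block.update({"text": text_chunk})         # maps to text_block
    new_block.update({"text_search": text_chunk})  # searchable copy
    return new_block


class TestOcrBlockFields(unittest.TestCase):
    def test_both_fields_populated(self):
        block = build_ocr_block("Scanned page text")
        # Neither field may be missing or empty after OCR.
        self.assertTrue(block.get("text"))
        self.assertTrue(block.get("text_search"))

    def test_fields_match(self):
        block = build_ocr_block("Scanned page text")
        # Both fields should hold the same extracted content.
        self.assertEqual(block["text"], block["text_search"])
```

Saved as a module, this can be run with `python -m unittest <module_name>`.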

Fixes #1123

Signed-off-by: majiayu000 <1835304752@qq.com>

When running OCR on library images, only the text_search field was being populated, leaving text_block (mapped from 'text' in the record) empty. This caused semantic queries to return empty text results, since they retrieve content from text_block, not text_search. The fix ensures both 'text' and 'text_search' fields are populated with the OCR-extracted content.

Fixes llmware-ai#1123

Signed-off-by: majiayu000 <1835304752@qq.com>


majiayu000 wants to merge 1 commit into llmware-ai:main from majiayu000:fix/ocr-populate-text-block

