Bug Description
Bug: Semantic Search (Find) Does Not Return Results for Uploaded Resources
Description
When uploading files via the Console's "Add Resource" feature, the semantic search (/api/v1/search/find) does not return the uploaded content, even though:
- The files are successfully uploaded and stored
- The embedding generation process completes successfully (logs show "All embedding tasks completed")
- Text-based search (/api/v1/search/grep) CAN find the content
- The files are visible in the FileSystem browser
Steps to Reproduce
1. Deploy OpenViking using Docker with the following configuration:

    {
      "storage": { "workspace": "/app/data" },
      "server": { "host": "0.0.0.0", "root_api_key": "your-api-key" },
      "vlm": {
        "provider": "openai",
        "api_key": "your-api-key",
        "model": "glm-4-flash",
        "api_base": "https://open.bigmodel.cn/api/paas/v4"
      },
      "embedding": {
        "dense": {
          "provider": "openai",
          "api_key": "your-api-key",
          "model": "embedding-3",
          "api_base": "https://open.bigmodel.cn/api/paas/v4",
          "dimension": 1024
        }
      }
    }

2. Upload a file via Console (http://localhost:8020):
   - Navigate to "Add Resource"
   - Select "Upload" tab
   - Choose a local file (e.g., report.md)
   - Click "Add Resource"
   - Wait for processing to complete
3. Verify the file exists:
   - Navigate to "FileSystem"
   - Browse to viking://resources/upload_xxx/
   - Confirm the file is visible
4. Search for the content:
   - Navigate to "Find"
   - Enter a query that matches content in the uploaded file
   - Click "Run Find"
Expected Behavior
The semantic search should return results from the uploaded file with relevant scores.
Actual Behavior
The search returns results from other resources (e.g., project code in viking://temp/) but NOT from the uploaded file in viking://resources/upload_xxx/.
Workaround
Text-based search (grep) works correctly:
curl -X POST 'http://localhost:1933/api/v1/search/grep' \
  -H 'x-api-key: your-api-key' \
  -H 'Content-Type: application/json' \
  -d '{"pattern": "search term", "uri": "viking://resources/"}'
Root Cause Analysis
Looking at the logs, the embedding generation appears to complete successfully:
Enqueued semantic generation for: viking://resources/upload_xxx
Processing semantic generation for: viking://temp/xxx/upload_xxx
All embedding tasks(8) completed for SemanticMsg xxx
Completed semantic generation for: viking://temp/xxx/upload_xxx
WARNING - [SyncDiff] Failed to list viking://resources/upload_xxx: Directory not found
The issue appears to be:
- Embeddings are created with the temp URI (viking://temp/xxx/...)
- After processing, the content is moved to resources URI (viking://resources/upload_xxx/...)
- The SyncDiff fails with "Directory not found" error
- The embeddings are not properly linked to the final resource URI
- Subsequently, semantic search queries cannot match against the uploaded content
Environment
- OpenViking version: v0.2.7
- Deployment: Docker
- Embedding provider: OpenAI-compatible API (Zhipu AI)
- Embedding model: embedding-3 (1024 dimensions)
Log Excerpt
2026-03-17 02:50:40,747 - INFO - Enqueued semantic generation for: viking://resources/upload_ad9d004ee0a34889a92e0012b431b46b
2026-03-17 02:50:40,790 - INFO - Processing semantic generation for: viking://temp/03170250_45b0b2/upload_ad9d004ee0a34889a92e0012b431b46b
2026-03-17 02:51:16,104 - INFO - Completed semantic generation for: viking://temp/03170250_45b0b2/upload_ad9d004ee0a34889a92e0012b431b46b
2026-03-17 02:51:16,378 - INFO - All embedding tasks(8) completed for SemanticMsg 88520fb7-d857-44fa-b1ae-55f2cae140fa
2026-03-17 02:51:16,379 - WARNING - [SyncDiff] Failed to list viking://resources/upload_ad9d004ee0a34889a92e0012b431b46b: Directory not found
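The URI mismatch in the excerpt above can be checked mechanically. The following is a minimal sketch, not part of OpenViking: the function name find_uri_mismatches and the regular expression are my own and assume the log message wording shown in the excerpt. It pairs each "Enqueued" line with the "Processing" line for the same upload directory and reports the pairs whose URIs differ.

```python
import re

# Log lines copied from the excerpt above.
LOG = """\
2026-03-17 02:50:40,747 - INFO - Enqueued semantic generation for: viking://resources/upload_ad9d004ee0a34889a92e0012b431b46b
2026-03-17 02:50:40,790 - INFO - Processing semantic generation for: viking://temp/03170250_45b0b2/upload_ad9d004ee0a34889a92e0012b431b46b
2026-03-17 02:51:16,379 - WARNING - [SyncDiff] Failed to list viking://resources/upload_ad9d004ee0a34889a92e0012b431b46b: Directory not found
"""

# Assumption: message wording matches the excerpt; this is not an official log format.
PATTERN = re.compile(r"(Enqueued|Processing) semantic generation for: (\S+)")


def find_uri_mismatches(log_text):
    """Return (enqueued_uri, processed_uri) pairs whose URIs differ."""
    enqueued = {}    # upload directory name -> URI seen at enqueue time
    mismatches = []
    for stage, uri in PATTERN.findall(log_text):
        # The last path segment (upload_<id>) identifies the upload.
        name = uri.rstrip("/").rsplit("/", 1)[-1]
        if stage == "Enqueued":
            enqueued[name] = uri
        elif name in enqueued and enqueued[name] != uri:
            mismatches.append((enqueued[name], uri))
    return mismatches


for queued, processed in find_uri_mismatches(LOG):
    print(f"enqueued as {queued}\nprocessed as {processed}")
```

On the excerpt, this reports one pair: the job is enqueued under viking://resources/ but processed under viking://temp/, which is consistent with the analysis above.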
Additional Context
- Chinese directory/file names work correctly for upload and storage
- The issue affects all uploaded files regardless of language
- FileSystem browsing works correctly
- Direct file reading via /api/v1/content/read works correctly
Suggested Fix
The embedding URIs should be updated to point to the final resource location after the SyncDiff process, or the SyncDiff should correctly handle the directory structure for newly uploaded resources.
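To illustrate the first option, here is a rough sketch of the URI rewrite I have in mind. I have not looked at how OpenViking stores embedding records, so the helper retarget_uri and the record layout (a dict with a "uri" key) are hypothetical; only the two URI prefixes come from the logs.

```python
def retarget_uri(uri, temp_root, final_root):
    """Rewrite a URI under temp_root so it lives under final_root instead."""
    temp_root = temp_root.rstrip("/")
    final_root = final_root.rstrip("/")
    if uri == temp_root:
        return final_root
    # Require a "/" after the prefix so sibling directories are not rewritten.
    if uri.startswith(temp_root + "/"):
        return final_root + uri[len(temp_root):]
    return uri  # not under temp_root: leave unchanged


# Hypothetical embedding records; the real storage schema may differ.
TEMP = "viking://temp/03170250_45b0b2/upload_ad9d004ee0a34889a92e0012b431b46b"
FINAL = "viking://resources/upload_ad9d004ee0a34889a92e0012b431b46b"
records = [
    {"uri": TEMP + "/report.md"},
    {"uri": "viking://temp/other/file.py"},
]
for record in records:
    record["uri"] = retarget_uri(record["uri"], TEMP, FINAL)
```

After the loop, the first record points at the resources location and the second is left alone, since it does not live under the upload's temp directory.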
OpenViking Version
v0.2.7
Python Version
3.12
Operating System
Linux
Model Backend
None