[Bug]: Semantic Search (Find) Does Not Return Results for Uploaded Resources #677

@chenhunhun

Bug Description

Description

When files are uploaded via the Console's "Add Resource" feature, semantic search (/api/v1/search/find) does not return the uploaded content, even though:

  1. The files are successfully uploaded and stored
  2. The embedding generation process completes successfully (logs show "All embedding tasks completed")
  3. Text-based search (/api/v1/search/grep) CAN find the content
  4. The files are visible in the FileSystem browser

Steps to Reproduce

  1. Deploy OpenViking using Docker with the following configuration:

    {
      "storage": { "workspace": "/app/data" },
      "server": { "host": "0.0.0.0", "root_api_key": "your-api-key" },
      "vlm": {
        "provider": "openai",
        "api_key": "your-api-key",
        "model": "glm-4-flash",
        "api_base": "https://open.bigmodel.cn/api/paas/v4"
      },
      "embedding": {
        "dense": {
          "provider": "openai",
          "api_key": "your-api-key",
          "model": "embedding-3",
          "api_base": "https://open.bigmodel.cn/api/paas/v4",
          "dimension": 1024
        }
      }
    }
  2. Upload a file via Console (http://localhost:8020):

    • Navigate to "Add Resource"
    • Select "Upload" tab
    • Choose a local file (e.g., report.md)
    • Click "Add Resource"
    • Wait for processing to complete
  3. Verify the file exists:

    • Navigate to "FileSystem"
    • Browse to viking://resources/upload_xxx/
    • Confirm the file is visible
  4. Search for the content:

    • Navigate to "Find"
    • Enter a query that matches content in the uploaded file
    • Click "Run Find"
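Step 4 can also be reproduced outside the Console. The following is a minimal sketch, not taken from the OpenViking documentation: the port and the x-api-key header come from the grep example in this report, while the body fields `query`, `target_uri`, and `limit` are assumptions about what /api/v1/search/find accepts and may need adjusting.

```python
import json
import urllib.request

# Base URL and header name follow the grep example in this report.
BASE_URL = "http://localhost:1933"


def build_find_request(query, target_uri, api_key, limit=10):
    """Build (but do not send) a POST request for /api/v1/search/find.

    The body fields `query`, `target_uri`, and `limit` are assumptions,
    not confirmed against the OpenViking API.
    """
    body = json.dumps({"query": query, "target_uri": target_uri, "limit": limit})
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/search/find",
        data=body.encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it. Scoping `target_uri` to viking://resources/ makes it easy to see whether any hit comes from the uploaded file.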

Expected Behavior

The semantic search should return results from the uploaded file with relevant scores.

Actual Behavior

The search returns results from other resources (e.g., project code in viking://temp/) but NOT from the uploaded file in viking://resources/upload_xxx/.
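To check this behavior on a set of results, the hit URIs can be partitioned by prefix. This is a small illustrative helper; it assumes the URIs have already been extracted from the find response, whose exact shape is not shown in this report.

```python
def split_by_prefix(uris, prefix="viking://resources/"):
    """Partition result URIs into those under `prefix` and all others."""
    inside = [u for u in uris if u.startswith(prefix)]
    outside = [u for u in uris if not u.startswith(prefix)]
    return inside, outside
```

With the bug present, `inside` stays empty for queries that should match the uploaded file, while `outside` contains hits such as those under viking://temp/.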

Workaround

Text-based search (grep) works correctly:

curl -X POST 'http://localhost:1933/api/v1/search/grep' \
  -H 'x-api-key: your-api-key' \
  -H 'Content-Type: application/json' \
  -d '{"pattern": "search term", "uri": "viking://resources/"}'

Root Cause Analysis

Looking at the logs, the embedding generation appears to complete successfully:

Enqueued semantic generation for: viking://resources/upload_xxx
Processing semantic generation for: viking://temp/xxx/upload_xxx
Completed semantic generation for: viking://temp/xxx/upload_xxx
All embedding tasks(8) completed for SemanticMsg xxx
WARNING - [SyncDiff] Failed to list viking://resources/upload_xxx: Directory not found

Based on these logs, the likely sequence is:

  1. Semantic generation is enqueued for the final URI (viking://resources/upload_xxx) but runs against the temp URI (viking://temp/xxx/...), so the embeddings are created with the temp URI
  2. After processing, the content is moved to the resources URI (viking://resources/upload_xxx/...)
  3. SyncDiff fails to list the resources URI with a "Directory not found" error
  4. The embeddings are apparently never linked to the final resource URI
  5. As a result, semantic search queries cannot match the uploaded content

Environment

  • OpenViking version: v0.2.7
  • Deployment: Docker
  • Embedding provider: OpenAI-compatible API (Zhipu AI)
  • Embedding model: embedding-3 (1024 dimensions)

Log Excerpt

2026-03-17 02:50:40,747 - INFO - Enqueued semantic generation for: viking://resources/upload_ad9d004ee0a34889a92e0012b431b46b
2026-03-17 02:50:40,790 - INFO - Processing semantic generation for: viking://temp/03170250_45b0b2/upload_ad9d004ee0a34889a92e0012b431b46b
2026-03-17 02:51:16,104 - INFO - Completed semantic generation for: viking://temp/03170250_45b0b2/upload_ad9d004ee0a34889a92e0012b431b46b
2026-03-17 02:51:16,378 - INFO - All embedding tasks(8) completed for SemanticMsg 88520fb7-d857-44fa-b1ae-55f2cae140fa
2026-03-17 02:51:16,379 - WARNING - [SyncDiff] Failed to list viking://resources/upload_ad9d004ee0a34889a92e0012b431b46b: Directory not found
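The URI mismatch in this excerpt can be detected mechanically. The sketch below only parses the message formats shown above; the function names are illustrative and not part of OpenViking.

```python
import re

# Matches the "Enqueued"/"Processing"/"Completed" lines shown in the excerpt.
LINE_RE = re.compile(
    r"(?P<stage>Enqueued|Processing|Completed) semantic generation for: (?P<uri>\S+)"
)


def uris_by_stage(log_text):
    """Map each stage name to the list of URIs logged for it."""
    stages = {}
    for match in LINE_RE.finditer(log_text):
        stages.setdefault(match["stage"], []).append(match["uri"])
    return stages


def mismatched_uploads(log_text):
    """Return (enqueued_uri, processed_uri) pairs that share an upload ID
    (the last path segment) but differ in their full URI."""
    stages = uris_by_stage(log_text)
    processed = {u.rsplit("/", 1)[-1]: u for u in stages.get("Processing", [])}
    pairs = []
    for enqueued in stages.get("Enqueued", []):
        upload_id = enqueued.rsplit("/", 1)[-1]
        proc = processed.get(upload_id)
        if proc is not None and proc != enqueued:
            pairs.append((enqueued, proc))
    return pairs
```

Run against the excerpt above, `mismatched_uploads` reports the upload that was enqueued under viking://resources/ but processed under viking://temp/.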

Additional Context

  • Chinese directory/file names work correctly for upload and storage
  • The issue affects all uploaded files regardless of language
  • FileSystem browsing works correctly
  • Direct file reading via /api/v1/content/read works correctly

Suggested Fix

Either the embedding URIs should be updated to point to the final resource location after the SyncDiff process, or SyncDiff should correctly handle the directory structure of newly uploaded resources.
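The first option amounts to a prefix rewrite on each stored URI. The following is a hypothetical sketch of that rewrite only; how OpenViking stores embedding records is not shown in this report, and `remap_uri` is an illustrative name rather than an existing function.

```python
def remap_uri(uri, temp_root, final_root):
    """Rewrite `uri` from under `temp_root` to under `final_root`.

    URIs outside `temp_root` are returned unchanged.
    """
    temp_root = temp_root.rstrip("/")
    final_root = final_root.rstrip("/")
    if uri == temp_root:
        return final_root
    # Require a "/" after the root so sibling directories with a
    # similar name are not rewritten by accident.
    if uri.startswith(temp_root + "/"):
        return final_root + uri[len(temp_root):]
    return uri
```

Applied to every embedding record created during processing, with `temp_root` set to the temp upload directory and `final_root` to viking://resources/upload_xxx, this would leave the records pointing at the location that Find searches.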

OpenViking Version

v0.2.7

Python Version

3.12

Operating System

Linux

Model Backend

None

Labels

bug

Status

Backlog