Bug: Documents stored with raw JSON instead of parsed fields #234


Closed

opened 2026-01-24 22:12:05 +00:00 by jack · 2 comments

jack commented

2026-01-24 22:12:05 +00:00

Owner

Description

Documents are stored with the entire JSON response in the content field instead of extracting and storing individual fields properly.

Current Behavior

The content column contains the full JSON blob:

{
  "bytes": 46325,
  "code": 200,
  "codeText": "OK",
  "result": "> ## Documentation Index...",
  "durationMs": 448,
  "url": "https://code.claude.com/docs/en/hooks"
}

This causes the UI to display "{" or "[" as the document title (first character of the JSON string).

Expected Behavior

When saving documents, the storage logic should:

1. Parse the JSON response
2. Extract `url` → store in `url` or `title` column
3. Extract `result` → store in `content` column (the actual documentation text)
4. Store metadata (`bytes`, `code`, `durationMs`) separately if needed
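
The steps above could be sketched roughly as follows. This is only an illustration, not the project's actual storage code: `ParsedDocument` and `parse_webfetch_response` are hypothetical names, and the real `documents` table may use different columns.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedDocument:
    # Hypothetical record shape; the real `documents` columns may differ.
    url: Optional[str]
    content: str
    metadata: dict


def parse_webfetch_response(raw: str) -> ParsedDocument:
    """Split a WebFetch JSON response into url, content, and metadata."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON at all: keep the raw text as content rather than dropping it.
        return ParsedDocument(url=None, content=raw, metadata={})

    if not isinstance(payload, dict):
        # Arrays (e.g. Context7 content blocks) are out of scope for this sketch.
        return ParsedDocument(url=None, content=raw, metadata={})

    result = payload.get("result")
    # Copy only the metadata keys named in the issue, if they are present.
    metadata = {k: payload[k] for k in ("bytes", "code", "durationMs") if k in payload}
    return ParsedDocument(
        url=payload.get("url"),
        # Fall back to the raw string if `result` is missing or not text.
        content=result if isinstance(result, str) else raw,
        metadata=metadata,
    )
```

Falling back to the raw string when parsing fails is a design choice made for this sketch, so that a document is never stored with empty content.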

Location

The bug is in the document storage/caching logic, likely in:

- MCP tool interceptor for Context7/WebFetch
- Or wherever documents are saved to the `documents` table


jack added labels

2026-01-24 22:12:08 +00:00

jack commented

2026-01-24 22:13:01 +00:00

Author

Owner

Additional Context

The document content field is a JSON string like:

{
  "bytes": 46325,
  "code": 200,
  "codeText": "OK",
  "result": "> ## Documentation Index...",
  "durationMs": 448,
  "url": "https://code.claude.com/docs/en/hooks"
}

The UI should parse this JSON and extract a meaningful title:

- For **WebFetch** documents: Use `url` field (e.g., "code.claude.com/docs/en/hooks")
- For **Context7** (Library Docs): Parse the content and extract library name or topic

Currently it seems like the UI is displaying content[0] (first character of the JSON string) instead of parsing the JSON and extracting a title field.
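
For the WebFetch case, deriving a title from the `url` field might look like the sketch below. `title_from_url` is a hypothetical helper name, and dropping the scheme is an assumption based on the example title given above.

```python
from urllib.parse import urlparse


def title_from_url(url: str) -> str:
    """Turn a URL into a short display title by dropping the scheme."""
    parsed = urlparse(url)
    if not parsed.netloc:
        # No recognizable host: show the input unchanged.
        return url
    # Host plus path, without a trailing slash.
    return (parsed.netloc + parsed.path).rstrip("/")
```

With the URL from the example above, `title_from_url("https://code.claude.com/docs/en/hooks")` returns `"code.claude.com/docs/en/hooks"`.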


jack changed title from ~~Bug: Documents view shows JSON characters instead of titles~~ to Bug: Documents stored with raw JSON instead of parsed fields

2026-01-24 22:14:00 +00:00

jack commented

2026-01-24 22:14:26 +00:00

Author

Owner

Context7 Format

Context7 returns an array of content blocks:

[
  {
    "type": "text",
    "text": "### SubagentStop Hook Event\n\nSource: https://github.com/anthropics/claude-code/blob/main/...\n\n..."
  }
]

Parsing needed:

1. Extract `text` from each content block
2. Parse the first header (e.g., `### SubagentStop Hook Event`) as title
3. Or extract the `Source:` URL as title
4. Store the combined `text` content in the `content` column
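
A minimal sketch of these parsing steps, assuming the block shape shown above. `parse_context7_blocks` is a hypothetical name, and two choices here are assumptions rather than anything the issue specifies: joining blocks with a blank line, and preferring the header over the `Source:` URL.

```python
import json
import re
from typing import Optional, Tuple

# First markdown header (any level) and first "Source:" line in the text.
HEADER_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)
SOURCE_RE = re.compile(r"^Source:\s*(\S+)", re.MULTILINE)


def parse_context7_blocks(raw: str) -> Tuple[Optional[str], str]:
    """Return (title, combined_text) for a Context7 content-block array."""
    blocks = json.loads(raw)
    if not isinstance(blocks, list):
        # Unexpected shape: no title, keep the raw text.
        return None, raw

    # Step 1: collect the text of every text block.
    texts = [
        b["text"]
        for b in blocks
        if isinstance(b, dict) and b.get("type") == "text" and isinstance(b.get("text"), str)
    ]
    combined = "\n\n".join(texts)

    # Step 2: prefer the first header as the title.
    header = HEADER_RE.search(combined)
    if header:
        return header.group(1), combined

    # Step 3: otherwise fall back to the Source: URL.
    source = SOURCE_RE.search(combined)
    if source:
        return source.group(1), combined

    return None, combined
```

The caller would store `combined_text` in the `content` column and decide what to show when no title can be extracted.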

Summary of formats to handle:

| Source | Format | Title extraction |
|--------|--------|------------------|
| WebFetch | `{url, result, bytes, ...}` | Use `url` field |
| Context7 | `[{type: "text", text: "..."}]` | Parse first `###` header or `Source:` URL |
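
One possible way to tell the two formats apart before parsing is to look at the top-level JSON type. This is a sketch under the assumption that the stored string carries no other source marker; `detect_source` is a hypothetical name.

```python
import json


def detect_source(raw: str) -> str:
    """Classify a stored response as 'webfetch', 'context7', or 'unknown'."""
    try:
        payload = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return "unknown"

    # WebFetch: a single object carrying at least `url` and `result`.
    if isinstance(payload, dict) and "url" in payload and "result" in payload:
        return "webfetch"

    # Context7: a non-empty array of typed content blocks.
    if isinstance(payload, list) and payload and all(
        isinstance(b, dict) and "type" in b for b in payload
    ):
        return "context7"

    return "unknown"
```

Returning `"unknown"` instead of raising lets the caller fall back to storing the raw text unchanged.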
