feat(search): Improve FTS5 search with ranking and highlighting #211

Closed
opened 2026-01-24 17:15:14 +00:00 by jack · 0 comments
Owner

Problem

Current FTS5 search:

  • No relevance sorting (results come back in arbitrary order)
  • No highlighting of matches
  • No snippet extraction
  • No phrase search or advanced operators

Solution

1. Relevance ranking with BM25

// ObservationRepository.search()
async search(query: string, options: SearchOptions = {}): Promise<SearchResult[]> {
  const { limit = 20, offset = 0, project } = options;
  
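  // BM25 ranking (built into FTS5). The numeric arguments are per-column weights,
  // and a lower (more negative) value means a better match, so ascending order is correct.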
  const sql = `
    SELECT 
      o.*,
      bm25(observations_fts, 1.0, 0.75, 0.5) as rank,
      snippet(observations_fts, 0, '<mark>', '</mark>', '...', 32) as title_snippet,
      snippet(observations_fts, 1, '<mark>', '</mark>', '...', 64) as text_snippet
    FROM observations_fts
    JOIN observations o ON observations_fts.rowid = o.id
    WHERE observations_fts MATCH ?
    ${project ? 'AND o.project = ?' : ''}
    ORDER BY rank
    LIMIT ? OFFSET ?
  `;
  
  const params = project 
    ? [query, project, limit, offset]
    : [query, limit, offset];
  
  return this.em.execute(sql, params);
}
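
The query above returns flat rows (observation columns plus `rank`, `title_snippet` and `text_snippet`), while the declared return type is `SearchResult[]`. A minimal sketch of the mapping step could look like the following. The row shape, the `MappedResult` type and the helper names are assumptions for illustration, not part of the existing code, and the sketch assumes that a snippet without a `<mark>` tag means the column did not match.

```typescript
// Hypothetical flat row shape produced by the raw SQL (an assumption).
interface SearchRow {
  id: number;
  title: string;
  text: string;
  rank: number;
  title_snippet: string | null;
  text_snippet: string | null;
  [column: string]: unknown;
}

// Simplified stand-in for the SearchResult interface in this issue.
interface MappedResult {
  observation: Record<string, unknown>;
  rank: number;
  highlights: { title?: string; text?: string };
  matchedFields: string[];
}

// Treat a snippet as a hit only if it contains the highlight tag.
function hasHit(snippet: string | null): snippet is string {
  return snippet !== null && snippet.includes('<mark>');
}

function toSearchResult(row: SearchRow): MappedResult {
  // Everything except the computed columns belongs to the observation.
  const { rank, title_snippet, text_snippet, ...observation } = row;
  const highlights: MappedResult['highlights'] = {};
  const matchedFields: string[] = [];

  if (hasHit(title_snippet)) {
    highlights.title = title_snippet;
    matchedFields.push('title');
  }
  if (hasHit(text_snippet)) {
    highlights.text = text_snippet;
    matchedFields.push('text');
  }
  return { observation, rank, highlights, matchedFields };
}
```

With such a helper, `search()` could end with `rows.map(toSearchResult)` instead of returning the raw rows.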

2. Extended query syntax

// packages/backend/src/utils/search-query.ts
export function parseSearchQuery(input: string): string {
  // Support several operators
  let query = input;
  
  // Phrase search: "exact phrase" → "exact phrase"
  // Already supported by FTS5
  
  // OR: term1 OR term2 → term1 OR term2
  // Already supported by FTS5
  
  // NOT: -term → NOT term
  // FTS5 treats NOT as a binary operator, so a left-hand term is required
  query = query.replace(/\s-(\w+)/g, ' NOT $1');
  
  // Prefix: term* → term*
  // Already supported by FTS5
  
  // Field-specific: title:search → title:search
  // Already supported by FTS5
  
  return query;
}

// Examples:
// "authentication bug"     → Phrase match
// auth OR login           → Either term
// session -timeout        → session without timeout
// auth*                   → Prefix match (auth, authentication, authorize)
// title:setup             → Search the title only
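
One caveat with the regex above is that it also rewrites a `-term` that appears inside a quoted phrase. A possible quote-aware variant is sketched below. The function name is hypothetical, and the behaviour for unbalanced quotes (the stray quote is dropped) is a side effect of this sketch rather than a decided requirement.

```typescript
// Hypothetical variant of parseSearchQuery that leaves quoted phrases untouched.
export function parseSearchQueryQuoteAware(input: string): string {
  // Split the input into quoted phrases and runs of unquoted text.
  // An unmatched quote matches neither alternative and is therefore dropped.
  const parts = input.match(/"[^"]*"|[^"]+/g) ?? [];

  return parts
    .map((part) =>
      part.startsWith('"')
        ? part // phrase: pass through unchanged
        : part.replace(/\s-(\w+)/g, ' NOT $1'), // same rewrite as the original
    )
    .join('');
}
```

For example, `"rate -limit" api -beta` would become `"rate -limit" api NOT beta`, leaving the phrase intact.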

3. Search Result Interface

interface SearchResult {
  observation: Observation;
  rank: number;           // BM25 score (lower = more relevant)
  highlights: {
    title?: string;       // With <mark> tags
    text?: string;        // With <mark> tags
    narrative?: string;   // With <mark> tags
  };
  matchedFields: string[]; // Which fields matched
}

4. Faceted Search

// Aggregations for filters
async getSearchFacets(query: string): Promise<SearchFacets> {
  const sql = `
    SELECT 
      o.type,
      o.project,
      COUNT(*) as count
    FROM observations_fts
    JOIN observations o ON observations_fts.rowid = o.id
    WHERE observations_fts MATCH ?
    GROUP BY o.type, o.project
  `;
  
  const results = await this.em.execute(sql, [query]);
  
  return {
    types: groupBy(results, 'type'),
    projects: groupBy(results, 'project'),
  };
}
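
The SQL above groups by the pair `(type, project)`, so each row carries the count for one combination, and the `groupBy` helper is not defined in this issue. If `SearchFacets` is meant to hold one total per type and one per project (an assumption, since its shape is not shown), the counts have to be summed per key. A minimal sketch with a hypothetical `sumBy` helper:

```typescript
// Assumed row shape of the facet query.
interface FacetRow {
  type: string;
  project: string;
  count: number;
}

// Sum the per-combination counts into one total per distinct key value.
function sumBy(rows: FacetRow[], key: 'type' | 'project'): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    totals[row[key]] = (totals[row[key]] ?? 0) + row.count;
  }
  return totals;
}

// Hypothetical facet shape: value → number of matching observations.
function toFacets(rows: FacetRow[]) {
  return { types: sumBy(rows, 'type'), projects: sumBy(rows, 'project') };
}
```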

5. API extensions

// GET /api/search
interface SearchRequest {
  q: string;              // Search query
  project?: string;       // Filter by project
  type?: string[];        // Filter by observation type
  from?: string;          // Date range start (ISO)
  to?: string;            // Date range end (ISO)
  limit?: number;         // Results per page (default 20)
  offset?: number;        // Pagination offset
  highlight?: boolean;    // Include highlights (default true)
  facets?: boolean;       // Include facet counts (default false)
}

interface SearchResponse {
  results: SearchResult[];
  total: number;
  facets?: SearchFacets;
  query: {
    parsed: string;       // Normalized query
    took: number;         // Search duration in ms
  };
}
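
The `search()` method in section 1 only handles `project`, whereas the request also carries `type`, `from` and `to`. One way to turn these filters into a parameterized WHERE clause is sketched below. The `o.created_at` column and the page-size cap of 100 are assumptions, and `buildSearchWhere` is a hypothetical helper.

```typescript
// Subset of SearchRequest that affects the SQL.
interface SearchFilters {
  q: string;
  project?: string;
  type?: string[];
  from?: string;
  to?: string;
  limit?: number;
  offset?: number;
}

interface BuiltQuery {
  where: string;
  params: (string | number)[];
  limit: number;
  offset: number;
}

function buildSearchWhere(f: SearchFilters): BuiltQuery {
  const clauses: string[] = ['observations_fts MATCH ?'];
  const params: (string | number)[] = [f.q];

  if (f.project) {
    clauses.push('o.project = ?');
    params.push(f.project);
  }
  if (f.type && f.type.length > 0) {
    // One placeholder per requested type; values are never interpolated.
    clauses.push(`o.type IN (${f.type.map(() => '?').join(', ')})`);
    params.push(...f.type);
  }
  // Assumption: observations have an ISO-formatted created_at column.
  if (f.from) {
    clauses.push('o.created_at >= ?');
    params.push(f.from);
  }
  if (f.to) {
    clauses.push('o.created_at <= ?');
    params.push(f.to);
  }

  // Clamp pagination to sane bounds (the cap of 100 is an assumption).
  const limit = Math.min(Math.max(f.limit ?? 20, 1), 100);
  const offset = Math.max(f.offset ?? 0, 0);

  return { where: clauses.join(' AND '), params, limit, offset };
}
```

The same `where` and `params` could be reused for a `COUNT(*)` query to fill `total` in `SearchResponse`.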

6. UI Integration

// Search result component with highlighting
function SearchResultItem({ result }: { result: SearchResult }) {
  return (
    <div className="search-result">
      <h3 dangerouslySetInnerHTML={{ __html: result.highlights.title || result.observation.title }} />
      <p dangerouslySetInnerHTML={{ __html: result.highlights.text || truncate(result.observation.text, 200) }} />
      <div className="meta">
        <span className="type">{result.observation.type}</span>
        <span className="project">{result.observation.project}</span>
        {/* FTS5 bm25() returns negative values for matches, so negate for a positive score */}
        <span className="rank">Relevance: {(-result.rank).toFixed(2)}</span>
      </div>
    </div>
  );
}
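
Because the snippets contain raw observation text, passing them to `dangerouslySetInnerHTML` would render any HTML stored in an observation. One possible mitigation, sketched here under the assumption that `snippet()` is called with control characters as start and end markers instead of `<mark>` and `</mark>`, is to escape the snippet first and only then turn the markers into tags. The marker constants and function names are hypothetical.

```typescript
// Assumed sentinel markers passed to snippet() instead of '<mark>' / '</mark>'.
const HL_OPEN = '\u0001';
const HL_CLOSE = '\u0002';

// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Escape first, then replace the sentinels, so only our own <mark> tags survive.
function renderHighlight(snippet: string): string {
  return escapeHtml(snippet)
    .split(HL_OPEN).join('<mark>')
    .split(HL_CLOSE).join('</mark>');
}
```

The component could then pass `renderHighlight(result.highlights.title)` to `__html`, and the plain-text fallbacks would go through `escapeHtml` as well.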

Example searches

| Query | Meaning |
|-------|---------|
| `authentication` | Contains "authentication" |
| `"session timeout"` | Exact phrase |
| `auth OR login` | Contains "auth" or "login" |
| `database -migration` | "database" but not "migration" |
| `setup*` | Terms beginning with "setup" |
| `title:refactor` | "refactor" in the title only |

Acceptance criteria

  • BM25 ranking implemented
  • Snippets/highlighting work
  • Extended query syntax (OR, NOT, phrase, prefix)
  • Faceted search for filters
  • API supports all parameters
  • UI displays highlights
  • Query syntax documented
Reference
customable/claude-mem#211