feat: Endless Mode - Real-time context compression for extended sessions #109
Summary
Endless Mode transforms tool outputs into compressed observations during the session rather than afterwards. This enables dramatically longer sessions through a dual-memory architecture with ~95% token reduction.
Current Problems
Context Limit Exhaustion
Latency Challenge
Solution Architecture
Dual-Memory Concept
Real-time Compression Pipeline
PostToolUse Hook Enhancement
Implementation Plan
Phase 1: Archive Infrastructure
Tasks:
Phase 2: Real-time Compression
Tasks:
Phase 3: Version Channel System
Tasks:
Phase 4: User Experience
Tasks:
Phase 5: Optimization
Tasks:
Configuration
Acceptance Criteria
Risks
Estimated Effort
Success Metrics
References
Related Issues
How Endless Mode v7.1 Actually Works
After examining the upstream implementation, here's the actual mechanism:
The Transcript File Trick
Claude Code stores the conversation in a local JSONL file:
Key insight: Hooks receive transcript_path as a parameter and can directly modify this file using fs.writeFile(). When Claude Code makes the next API call, it reads the modified transcript.
Current v7.1 Implementation (Synchronous)
Result: 110s latency after EVERY tool use while waiting for compression.
Improved Implementation: Async Worker-Based Approach
Instead of blocking the hook, let the worker handle everything asynchronously:
1. PostToolUse Hook (Zero Latency)
2. Worker Background Processing
3. How It Works
Advantages Over Synchronous Approach
Key Benefits
Trade-off
Tool outputs stay in context for 1-2 requests (until worker completes and cleans up) instead of being immediately compressed. For typical workflows:
This approach prioritizes user experience (zero latency) while maintaining token efficiency through background processing.
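A minimal sketch of the worker-side cleanup step, assuming each transcript line is a JSON object and that tool results appear as content blocks of type tool_result carrying a tool_use_id. The transcript schema is not specified in this issue, so the field names and the compressTranscript helper are illustrative assumptions rather than the actual implementation.

```typescript
// Hypothetical shapes: the real Claude Code transcript schema may differ.
interface ContentBlock {
  type: string;
  tool_use_id?: string;
  content?: unknown;
}

interface TranscriptEntry {
  message?: { content?: ContentBlock[] | string };
}

/**
 * Replace the content of one tool_result block with a compressed observation.
 * Operates on raw JSONL text so it can be tested without touching disk.
 * Lines that are empty, unparseable, or unrelated are passed through unchanged.
 */
function compressTranscript(
  jsonl: string,
  toolUseId: string,
  observation: string,
): string {
  return jsonl
    .split("\n")
    .map((line) => {
      if (line.trim() === "") return line;
      let entry: TranscriptEntry;
      try {
        entry = JSON.parse(line) as TranscriptEntry;
      } catch {
        return line;
      }
      if (typeof entry !== "object" || entry === null) return line;
      const blocks = entry.message?.content;
      if (!Array.isArray(blocks)) return line;
      let changed = false;
      for (const block of blocks) {
        if (block.type === "tool_result" && block.tool_use_id === toolUseId) {
          block.content = observation;
          changed = true;
        }
      }
      return changed ? JSON.stringify(entry) : line;
    })
    .join("\n");
}
```

In this sketch the hook would only enqueue the tool output and return; the worker would later read the file at transcript_path, pass its text through compressTranscript, and write the result back with fs.writeFile().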
Findings from Claude Platform API Documentation
Research into the current Claude API documentation revealed several relevant features and patterns that could inform the Endless Mode implementation.
1. Official Context Editing API (Beta)
Anthropic has built server-side context management into the API:
Key features:
keep parameter to preserve the N most recent tool uses
exclude_tools to protect specific tools from clearing
clear_at_least for cache invalidation optimization
Limitation for claude-mem: These are API request parameters - Claude Code plugins cannot inject them. However, the patterns are useful for our own implementation.
2. SDK Compaction - Default Summary Prompt
The Python/TypeScript SDKs have a built-in compaction feature with a well-structured summary prompt that could serve as a template for observation generation:
Recommendation: Adapt this structure for our observation compression prompts.
3. Context Awareness in Claude 4.5
Claude 4.5 models have native context awareness - they receive automatic token budget updates:
Potential use: Instead of fixed token thresholds, we could leverage Claude's own awareness of its remaining budget. When Claude reports high usage in its responses, that could trigger more aggressive compression.
4. Memory Tool Pattern
The official Memory Tool (memory_20250818) uses a structured command interface: view, create, str_replace, insert, delete, rename.
Relevance: claude-mem's archive/recall mechanism could expose a similar MCP tool interface for explicit recall of cleared tool outputs:
5. Exclude-Tools Pattern
The API's exclude_tools parameter allows protecting specific tools from context clearing. This pattern should be configurable in Endless Mode:
Use cases:
6. 1M Token Context Window (Beta)
Claude Sonnet 4/4.5 now supports 1M token context windows (beta, tier 4 required, premium pricing).
Implication: For users with access, this significantly extends the threshold before Endless Mode becomes necessary. Configuration could auto-detect available context size.
Summary
While these API features cannot be directly used from Claude Code plugins (since Claude Code controls the API calls), they provide:
exclude_tools, thresholds, model selection
The transcript file modification approach from v7.1 remains the viable implementation path, but these patterns can inform the design.
Additional Research: Claude Code-Compatible Approaches
Note: The previous comment covered API-level features that cannot be directly used from Claude Code plugins. This comment focuses on patterns and approaches that work within Claude Code's hook system.
1. Continuous-Claude-v3: Alternative Architecture
Continuous-Claude-v3 is another Claude Code plugin that solves context management differently - "Compounding instead of Compacting":
TLDR Code Analysis (5-Layer AST):
Instead of full AI compression for code, it extracts structured representations:
Result: ~1,200 tokens vs 23,000 for raw files (95% savings without AI latency)
Relevance for Endless Mode: For Read tool outputs containing code, AST-based compression could be a fast-path alternative to AI compression, avoiding the 60-90s latency.
2. Proactive vs. Reactive Compression
Research from arxiv:2601.07190 found:
Key finding: Mandatory compression every 10-15 tool calls + system reminders achieved 22.7% token savings while maintaining accuracy. Passive/threshold-based compression only achieved 6% with accuracy degradation.
Implementation for claude-mem:
This is compatible with Claude Code's hook system.
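A minimal sketch of a tool-count trigger, assuming the PostToolUse hook (or the worker behind it) can keep a per-session counter. The ToolCallCounter class and the default interval of 12 are hypothetical; 12 is simply a value inside the 10-15 range quoted above.

```typescript
// Hypothetical helper: counts tool calls per session and signals when a
// mandatory compression pass is due.
class ToolCallCounter {
  private counts = new Map<string, number>();
  private readonly interval: number;

  constructor(interval: number = 12) {
    if (!Number.isInteger(interval) || interval < 1) {
      throw new RangeError("interval must be a positive integer");
    }
    this.interval = interval;
  }

  /** Record one tool call; returns true when compression should run now. */
  record(sessionId: string): boolean {
    const next = (this.counts.get(sessionId) ?? 0) + 1;
    if (next >= this.interval) {
      // Reset so the next compression is a full interval away.
      this.counts.set(sessionId, 0);
      return true;
    }
    this.counts.set(sessionId, next);
    return false;
  }
}
```

Because the counter depends only on the number of tool calls, it works without any knowledge of the actual context size.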
3. Dual-Threshold System
Factory.ai's approach uses two thresholds:
Why this matters: A single threshold causes either too-frequent compression (high overhead) or too-aggressive compression (information loss). The gap between thresholds controls compression frequency.
Configuration example:
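The following is a sketch under one reading of the dual-threshold idea: compression starts once usage exceeds T_max and continues until usage is back at T_retained. The field names, the tokensToFree helper, and the 80%/60% defaults (borrowed from values mentioned elsewhere in this thread) are assumptions, not Factory.ai's or claude-mem's actual configuration.

```typescript
interface DualThreshold {
  /** Percent of the context window at which compression is triggered. */
  tMaxPct: number;
  /** Percent of the context window to compress down to. */
  tRetainedPct: number;
}

// Illustrative defaults; treat both values as tunable.
const DEFAULT_THRESHOLDS: DualThreshold = { tMaxPct: 80, tRetainedPct: 60 };

/**
 * Returns how many tokens must be freed to get back to tRetainedPct,
 * or 0 while usage is still at or below tMaxPct.
 */
function tokensToFree(
  usedTokens: number,
  windowTokens: number,
  t: DualThreshold = DEFAULT_THRESHOLDS,
): number {
  if (t.tRetainedPct >= t.tMaxPct) {
    throw new RangeError("tRetainedPct must be below tMaxPct");
  }
  const triggerAt = (windowTokens * t.tMaxPct) / 100;
  if (usedTokens <= triggerAt) return 0;
  const target = (windowTokens * t.tRetainedPct) / 100;
  return Math.ceil(usedTokens - target);
}
```

A wider gap between the two percentages means each compression pass frees more tokens, so passes happen less often.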
4. What Must Survive Compression
From multiple sources, the essential elements to preserve:
This aligns with the SDK's default summary prompt structure.
5. Fast-Path Compression Strategies
To reduce the 60-90s AI compression latency, consider tiered approaches:
Tool output types covered: Read(code), Read(text), Grep, Bash(success), Bash(error), Write/Edit
Implementation: PostToolUse hook checks tool type and applies appropriate strategy. Only complex/ambiguous outputs go through AI compression.
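A sketch of how such a dispatch could look in the PostToolUse hook. The strategy assigned to each tool category below is a placeholder chosen for illustration, not the mapping from the original table; the Strategy names, the ToolResult shape, and the extension list are likewise hypothetical.

```typescript
// Hypothetical strategy names for illustration only.
type Strategy = "ast" | "truncate" | "keep" | "ai";

interface ToolResult {
  toolName: string;
  filePath?: string;
  exitCode?: number;
  output: string;
}

// Assumed list of extensions treated as code.
const CODE_EXTENSIONS = [".ts", ".tsx", ".js", ".py", ".go", ".rs"];

/** Pick a compression strategy from the tool type alone, without calling an AI model. */
function pickStrategy(r: ToolResult): Strategy {
  switch (r.toolName) {
    case "Read": {
      const path = r.filePath ?? "";
      const isCode = CODE_EXTENSIONS.some((ext) => path.endsWith(ext));
      // Code can take the AST fast path; free text falls back to AI compression.
      return isCode ? "ast" : "ai";
    }
    case "Grep":
      return "truncate";
    case "Bash":
      // Keep failures verbatim: the error text is usually what the next step needs.
      return r.exitCode === 0 ? "truncate" : "keep";
    case "Write":
    case "Edit":
      return "truncate";
    default:
      // Unknown or ambiguous outputs go through AI compression.
      return "ai";
  }
}
```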
6. PreCompact Hook Integration
Claude Code's PreCompact hook fires before native auto-compact (at ~95% context). This is the last chance to preserve context:
This works within Claude Code's existing architecture.
Summary: Claude Code-Compatible Implementation Path
The API-level features (context_management, compaction_control) serve as validated patterns but must be reimplemented using transcript file modification for Claude Code compatibility.
Strategic Consideration: API Access vs. Claude Code Plugin
The Fundamental Limitation
Claude Code controls the API calls. As a plugin, claude-mem can only:
additionalContext
We cannot set API parameters like:
What We're Missing
clear_tool_uses
compaction_control
memory_20250818 tool
exclude_tools
Option: Build a Custom CLI
To use the full API feature set, we would need to build our own CLI that:
Pros:
Cons:
Alternative: Feature Request to Anthropic
A more practical approach might be requesting that Claude Code expose these API parameters:
This would allow plugins to benefit from server-side context management without rebuilding the entire CLI.
Recommended Path Forward
context_management API exposure in Claude Code
The transcript file trick gets us 80% of the way there. The API features would be the remaining 20% - nice to have, but not strictly necessary for a functional Endless Mode.
Alternative CLIs: OpenCode, Crush & API Access
Research into open-source Claude Code alternatives reveals interesting options for full API control.
OpenCode (sst/opencode)
GitHub: sst/opencode - 81.8k Stars, MIT License
Architecture:
Context Management:
SessionCompaction - automatic summarization when approaching token limits
compaction configuration option exists
ProviderTransform class normalizes API calls across providers
API Customization:
Key Insight: Since OpenCode is MIT licensed, we could:
context_management support directly
Crush (charmbracelet/crush)
GitHub: charmbracelet/crush - 12k Stars
Architecture:
Provider Configuration:
Extensibility:
disabled_tools, allowed_tools configuration
Key Insight: claude-mem could be packaged as an MCP server for Crush, providing memory/context management as a tool.
Comparison: API Control
Strategic Options
Option A: Stay with Claude Code
Option B: Build for OpenCode
context_management API parameter support
Option C: Build for Crush
Option D: Multi-Platform Support
Recommendation
Given that OpenCode and Crush are both:
A multi-platform approach could be valuable:
This would:
context_management API usage on OpenCode/Crush
References
New claude-mem Monorepo: Architecture Analysis
The new claude-mem monorepo (/home/jonas/repos/claude-mem) is already designed with multi-platform extensibility in mind.
Current Architecture
Platform-Agnostic Design
The hooks package already uses platform-agnostic types:
Comment in source: "Designed for easy extension to new platforms and events."
Worker Agents with Direct API Access
The worker package has agents that make direct API calls:
Multi-Platform Strategy
The architecture already supports adding new platform adapters:
Option 1: Add Platform Adapters to Hooks Package
Option 2: MCP Server Package
Endless Mode Integration Points
PostToolUse Handler (packages/hooks/src/handlers/post-tool-use.ts)
transcriptPath to enable transcript modification
Worker Agents (packages/worker/src/agents/)
context_management for observation extraction
New: Transcript Modifier Service
New: Context Management Route
For OpenCode/Crush Integration
The key difference: We control the API calls
This is impossible with Claude Code but straightforward with OpenCode/Crush.
Recommended Next Steps
Short-term (Claude Code)
TranscriptService to backend
Medium-term (Multi-Platform)
packages/mcp-server for Crush/OpenCode integration
Long-term (Full API Control)
context_management API support
Relevant Claude Code Release Notes
Two recent Claude Code features are directly useful for Endless Mode:
1. Context Window Percentage (v2.1.6)
Available in status line input - provides real-time context window usage.
Why this matters:
Implementation idea:
This aligns with the dual-threshold system mentioned earlier:
T_max (80%) = Aggressive compression trigger
T_retained (60%) = Background compression starts
2. Session ID in Skills (v2.1.9)
Available as string substitution in skills.
Why this matters:
/endless skill that manages compression for the current session
Implementation idea:
Combined: Smart Compression Skill
These two features together enable a context-aware compression skill:
Action Items
context_window.* from hooks, not just status line?
/endless skill - Manual compression trigger + status display
These features reduce our dependency on the transcript file trick for knowing when to compress, even if we still need it for how to compress.
Clarification: context_window Access in Hooks
After checking the Claude Code documentation:
Hooks CANNOT Access context_window
context_window.*
The context_window.used_percentage and context_window.remaining_percentage fields are only available to StatusLine, not to hooks.
What Hooks DO Receive
PostToolUse Input:
PreCompact Input:
No token counts, no context window info.
Workarounds for Endless Mode
1. PreCompact Hook as Threshold Indicator
When trigger: "auto", it means Claude Code's native auto-compact triggered at ~95% context. This is an indirect signal:
Limitation: Only fires at 95%, not at configurable thresholds.
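A minimal sketch of treating the PreCompact trigger as a near-limit signal. The input interface lists only fields mentioned in this thread (trigger, custom_instructions, transcript_path) plus session_id; the PreCompactDecision shape and the onPreCompact function are hypothetical.

```typescript
interface PreCompactInput {
  session_id: string;
  transcript_path: string;
  trigger: "manual" | "auto";
  custom_instructions?: string | null;
}

// Hypothetical decision record passed on to the worker.
interface PreCompactDecision {
  /** True when native auto-compact fired, i.e. context is nearly full. */
  nearLimit: boolean;
  action: "emergency_archive" | "none";
}

/** Map the PreCompact trigger to an Endless Mode action. */
function onPreCompact(input: PreCompactInput): PreCompactDecision {
  if (input.trigger === "auto") {
    // Auto-compact implies roughly 95% usage: archive before context is lost.
    return { nearLimit: true, action: "emergency_archive" };
  }
  // Manual /compact says nothing about how full the context is.
  return { nearLimit: false, action: "none" };
}
```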
2. Tool-Count Based Compression
Research showed this is actually more effective than threshold-based:
Advantage: Works without knowing context size.
3. StatusLine Skill + Backend Bridge (Creative Workaround)
StatusLine CAN access context_window. We could build a bridge:
Limitation: StatusLine runs on display refresh, not on every tool use.
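As a sketch of the StatusLine side of such a bridge, the helper below extracts context_window.used_percentage from the JSON a status line script receives. The function name is hypothetical, and how the value is forwarded to the backend (HTTP call, file, socket) is left open.

```typescript
/**
 * Extract context_window.used_percentage from StatusLine input JSON.
 * Returns undefined when the input is malformed or the field is missing.
 */
function readUsedPercentage(statusLineJson: string): number | undefined {
  try {
    const data = JSON.parse(statusLineJson) as
      | { context_window?: { used_percentage?: unknown } }
      | null;
    const pct = data?.context_window?.used_percentage;
    return typeof pct === "number" && Number.isFinite(pct) ? pct : undefined;
  } catch {
    return undefined;
  }
}
```

A status line script could call this on its stdin payload and hand the result to the worker, where hooks can look up the most recent value for the session.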
4. Transcript Size Heuristic
Since hooks have transcript_path, we could estimate context by file size:
Limitation: Very imprecise, doesn't account for caching or actual token counts.
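A rough sketch of the heuristic, assuming about 4 bytes of transcript per token and a 200,000-token window. Both constants are guesses for illustration; JSON overhead in the transcript alone makes the true ratio differ.

```typescript
// Assumption: roughly 4 bytes of transcript text per token.
const BYTES_PER_TOKEN = 4;

/**
 * Estimate context usage as a fraction between 0 and 1 from the transcript size.
 * windowTokens defaults to an assumed 200,000-token context window.
 */
function estimateContextUsage(
  transcriptBytes: number,
  windowTokens: number = 200_000,
): number {
  const estimatedTokens = transcriptBytes / BYTES_PER_TOKEN;
  return Math.min(1, estimatedTokens / windowTokens);
}
```

A hook could obtain transcriptBytes from the size reported by fs.stat() on transcript_path and compare the estimate against its compression thresholds.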
Recommendation
Given these limitations, the most reliable approach combines:
trigger: "auto") - Emergency fallback
This doesn't require context_window access in hooks and aligns with research showing proactive compression is more effective than reactive threshold-based compression.
Agent SDK Features Relevant to Endless Mode
The TypeScript Agent SDK has several features we haven't documented yet:
1. SDKCompactBoundaryMessage - Token Count Before Compaction!
Why this matters: The pre_tokens field gives us the actual token count before compaction happened. This is the missing piece for understanding context usage!
Use case: After receiving this message, we know exactly how many tokens were used before Claude Code compacted.
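A sketch of reading pre_tokens from a message stream. The CompactBoundaryLike interface is a local mirror written for this example and assumes the field sits under compact_metadata on a system message with subtype compact_boundary; check the SDK's exported types before relying on this layout.

```typescript
// Local, assumed mirror of the SDK message shape; not imported from the SDK.
interface CompactBoundaryLike {
  type: "system";
  subtype: "compact_boundary";
  compact_metadata: { trigger?: "manual" | "auto"; pre_tokens: number };
}

/** Type guard: true when msg looks like a compact boundary message. */
function isCompactBoundary(msg: unknown): msg is CompactBoundaryLike {
  if (typeof msg !== "object" || msg === null) return false;
  const m = msg as Record<string, unknown>;
  const meta = m.compact_metadata as Record<string, unknown> | undefined;
  return (
    m.type === "system" &&
    m.subtype === "compact_boundary" &&
    typeof meta === "object" &&
    meta !== null &&
    typeof meta.pre_tokens === "number"
  );
}

/** Token count before compaction, or undefined for any other message. */
function preTokens(msg: unknown): number | undefined {
  return isCompactBoundary(msg) ? msg.compact_metadata.pre_tokens : undefined;
}
```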
2. SessionStart Source: 'compact'
When source === 'compact', the session "restarted" after native compaction.
Use case: Detect post-compaction state and inject our preserved context:
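A minimal sketch of that use case, assuming the hook returns context through hookSpecificOutput.additionalContext and that the preserved context has already been loaded (for example from the archive) into a string. The function name and the way preserved context is obtained are hypothetical.

```typescript
interface SessionStartInput {
  session_id: string;
  /** For example "startup", "resume", "clear" or "compact". */
  source: string;
}

interface SessionStartOutput {
  hookSpecificOutput: {
    hookEventName: "SessionStart";
    additionalContext: string;
  };
}

/**
 * Build the hook output that re-injects preserved context after native compaction.
 * Returns null when the session did not start from a compaction or when
 * there is nothing to inject.
 */
function onSessionStart(
  input: SessionStartInput,
  preserved: string | undefined,
): SessionStartOutput | null {
  if (input.source !== "compact" || !preserved) return null;
  return {
    hookSpecificOutput: {
      hookEventName: "SessionStart",
      additionalContext: preserved,
    },
  };
}
```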
3. PreCompact custom_instructions
The custom_instructions field allows customizing what the native compaction should preserve!
Use case: Inject instructions for Claude Code's native compaction:
4. Programmatic Hooks (SDK only)
Why this matters: For SDK-based applications (OpenCode integration?), hooks can be defined programmatically without settings.json.
5. 1M Context Window Beta
Enables 1M token context for Sonnet 4/4.5.
Use case: For users with access, delay compression triggers significantly.
6. File Checkpointing
Use case: If Endless Mode compression goes wrong, could potentially rewind to a known good state.
7. V2 SDK - Simpler Session Management
The V2 preview simplifies multi-turn conversations:
Use case: For building custom CLIs or OpenCode integration, V2 makes session management cleaner.
Summary: New Integration Points
compact_boundary.pre_tokens
SessionStart source: 'compact'
PreCompact custom_instructions
The compact_boundary message and SessionStart source: 'compact' are particularly valuable - they give us hooks into Claude Code's native compaction lifecycle that we weren't aware of before.
Phase 1 & 2 Implementation Complete
Phase 1: Archive Infrastructure ✅
Database Schema:
ArchivedOutput entity with full tool input/output storage
(pending, processing, completed, failed, skipped)
Repository Pattern:
IArchivedOutputRepository interface
MikroOrmArchivedOutputRepository implementation
create, getPendingCompression, updateCompressionStatus, findByObservationId, search, getStats, cleanup
Migration:
Migration20260125124900_add_archived_outputs creates table with proper indexes
Phase 2: Settings ✅
New settings added to packages/shared/src/settings.ts:
Setting | Default
ENDLESS_MODE_ENABLED | false
ENDLESS_MODE_COMPRESSION_MODEL | claude-haiku-4-5
ENDLESS_MODE_COMPRESSION_TIMEOUT | 90000
ENDLESS_MODE_FALLBACK_ON_TIMEOUT | true
ENDLESS_MODE_SKIP_SIMPLE_OUTPUTS | true
ENDLESS_MODE_SIMPLE_OUTPUT_THRESHOLD | 1000
Commits
1f8321f - feat(database): add Endless Mode infrastructure
Next Steps (Phase 3+)
postToolUse hook flow
Phase 3 Complete: MCP Tools & API Endpoints
Implemented
API Endpoints (DataRouter):
GET /api/data/archived-outputs - List with filtering (sessionId, project, status, toolName)
GET /api/data/archived-outputs/search - Semantic search for archived outputs
GET /api/data/archived-outputs/stats - Compression statistics
GET /api/data/archived-outputs/:id - Get by ID
GET /api/data/archived-outputs/by-observation/:observationId - Recall by observation
MCP Tools:
recall_archived - Search or retrieve full tool outputs that were compressed
archived_stats - Get compression efficiency statistics
Commit
aadf1a6 - feat(endless-mode): add MCP tools and API endpoints for archived output recall
Remaining Work (Phase 4 & 5)
Phase 4 Complete: User Experience
Implemented
Dashboard Widget (EndlessModeCard):
Settings UI (ProcessingSettings):
API Client:
ArchivedOutput and ArchivedOutputStats interfaces
getArchivedOutputs, searchArchivedOutputs, getArchivedOutputStats, getArchivedOutput, getArchivedOutputByObservation
Commit
e5b3fd9 - feat(ui): add Endless Mode dashboard widget with compression stats
Phase Summary
Acceptance Criteria Status
Remaining Work (can be deferred)
The core Endless Mode functionality is complete and usable. Phase 5 optimizations can be addressed in follow-up issues.