Session Memory Consolidation Service

A background service that compresses and consolidates agent conversation history into a structured memory store, then serves it as a compact context prefix at the start of the next session.

Difficulty: 1 week | Stack: Python, FastAPI, SQLite + sqlite-vec, Claude API (Haiku for compression), APScheduler

Who this is for

Developers building multi-session agents (coding assistants, research agents) where re-deriving prior context burns tokens and causes drift.

Build steps

Build a session logger: any agent posts raw turns to POST /sessions/{id}/turns, and each turn is stored in SQLite with a timestamp.
Write a consolidation job (APScheduler, runs during idle periods) that calls Haiku with a compression prompt to extract decisions, facts, and open tasks into a structured JSON memory object.
Store versioned memory snapshots; GET /sessions/{id}/context returns a compact Markdown summary (<500 tokens) ready to prepend to a new session's system prompt.
Add a semantic search endpoint: POST /memory/search takes a query and returns the top-k relevant memory chunks using embeddings stored in sqlite-vec.
Ship a Python SDK wrapper (3 functions: log_turn, get_context, search_memory) that any agent framework can call in 5 lines.
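
A minimal sketch of the storage side of these steps, using only the standard library's sqlite3 module. The table names, column names, and the open_store and save_snapshot helpers are assumptions made for illustration; log_turn and get_context borrow the SDK function names above but take a connection argument here. The FastAPI routes, the Haiku call, and the sqlite-vec search are left out, and the 500-token budget is not enforced.

```python
import json
import sqlite3
import time

# Hypothetical schema: one table for raw turns, one for versioned snapshots.
SCHEMA = """
CREATE TABLE IF NOT EXISTS turns (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS memory_snapshots (
    session_id TEXT NOT NULL,
    version INTEGER NOT NULL,
    memory_json TEXT NOT NULL,
    created_at REAL NOT NULL,
    PRIMARY KEY (session_id, version)
);
"""


def open_store(path=":memory:"):
    """Open (or create) the SQLite store and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_turn(conn, session_id, role, content):
    """Store one raw turn with a timestamp and return its row id."""
    cur = conn.execute(
        "INSERT INTO turns (session_id, role, content, created_at) "
        "VALUES (?, ?, ?, ?)",
        (session_id, role, content, time.time()),
    )
    conn.commit()
    return cur.lastrowid


def save_snapshot(conn, session_id, memory):
    """Append a new memory snapshot version instead of overwriting the last one."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM memory_snapshots "
        "WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO memory_snapshots VALUES (?, ?, ?, ?)",
        (session_id, version, json.dumps(memory), time.time()),
    )
    conn.commit()
    return version


def get_context(conn, session_id):
    """Render the latest snapshot as a short Markdown prefix ('' if none)."""
    row = conn.execute(
        "SELECT memory_json FROM memory_snapshots WHERE session_id = ? "
        "ORDER BY version DESC LIMIT 1",
        (session_id,),
    ).fetchone()
    if row is None:
        return ""
    memory = json.loads(row[0])
    sections = (
        ("decisions", "Decisions"),
        ("facts", "Facts"),
        ("open_tasks", "Open tasks"),
    )
    lines = []
    for key, title in sections:
        items = memory.get(key, [])
        if items:  # skip empty sections to keep the prefix compact
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)
```

Keeping snapshots append-only means a bad compression run can be rolled back by reading an earlier version. A real service would also need to truncate or re-compress the rendered text to stay under the token budget.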

Risks

Haiku compression may hallucinate or drop critical details. Mitigation: a recall eval that asks questions about the original turns and checks that answers drawn from memory match.
Memory grows without bound across many sessions. Mitigation: a tiered eviction policy (recent sessions kept in full, older ones compressed, oldest kept as summaries only).
The APScheduler job can collide with active sessions. Mitigation: a session-active lock so consolidation never runs mid-conversation.
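
Two of these mitigations come down to small, testable decisions, sketched below. The function names and every threshold (15 minutes, 7 days, 30 days) are assumptions chosen for illustration, not values from the design above.

```python
from datetime import datetime, timedelta, timezone

# All thresholds are illustrative assumptions; tune them per deployment.
IDLE_WINDOW = timedelta(minutes=15)
FULL_TIER_MAX_AGE = timedelta(days=7)
COMPRESSED_TIER_MAX_AGE = timedelta(days=30)


def is_session_idle(last_turn_at, now, idle_window=IDLE_WINDOW):
    """True when no turn has arrived within the idle window.

    The consolidation job would check this before touching a session.
    """
    return now - last_turn_at >= idle_window


def retention_tier(session_ended_at, now):
    """Map a session's age onto the three eviction tiers."""
    age = now - session_ended_at
    if age <= FULL_TIER_MAX_AGE:
        return "full"          # keep raw turns
    if age <= COMPRESSED_TIER_MAX_AGE:
        return "compressed"    # keep the structured memory object
    return "summary_only"      # keep only the short summary


now = datetime(2025, 1, 31, 12, 0, tzinfo=timezone.utc)
print(is_session_idle(now - timedelta(minutes=5), now))
print(retention_tier(now - timedelta(days=10), now))
```

The idle check alone is a heuristic, not a lock: a turn can still arrive while a job is running. A fuller version would probably also record a per-session "consolidating" flag in SQLite and have the turn endpoint either wait on it or mark the snapshot as stale.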
