AI Pulse
← Projects · weekend

Novelty Memory Bot for Your Reading List

A CLI tool that scores each new paper you add against your personal reading history, flagging genuine novelty vs. incremental rehash.

Difficulty: weekend | Stack: Python, Claude API (claude-haiku for embeddings/scoring), SQLite + sqlite-vec or ChromaDB, arXiv API, Typer (CLI)

Who this is for

Researchers and technical leads who read 10-30 papers a week and waste time on papers that repackage known ideas — the tool surfaces only papers that introduce something not already in their personal corpus.

Build steps

  1. Build a CLI command add <arXiv_id> that fetches abstract + metadata from the arXiv API and stores it in a local ChromaDB collection with vector embeddings.
  2. On each add, retrieve the top-5 most similar past papers by cosine similarity and send them alongside the new abstract to Claude with a prompt: ‘Given these prior papers the user has read, what — if anything — is genuinely novel in the new one? Score novelty 1-10 and list specific novel claims.’
  3. Persist novelty scores and Claude’s reasoning in SQLite so the user can query list --min-novelty 7 to surface high-value papers.
  4. Add a digest command that batches the last N unreviewed papers and returns a ranked reading list ordered by novelty score.
  5. Export to markdown or Obsidian-compatible notes with novelty annotations embedded as frontmatter.

Risks

  • Embedding similarity alone can miss novelty in cross-domain papers (a physics technique applied to biology looks dissimilar but may be well-known in its origin field) — Claude’s reasoning helps but isn’t foolproof.
  • ChromaDB persistence across sessions can silently corrupt if the process is killed mid-write; need WAL mode or periodic export to a flat JSON backup.
  • For users with large existing libraries (500+ papers), the initial bulk-import + embedding pass can take hours and cost non-trivial API dollars — must warn upfront and add a --dry-run flag.

Business Angle

A CLI tool that scores arXiv papers for genuine novelty against your personal reading history, so researchers stop re-reading recycled ideas

Customer: Solo ML researcher or technical lead at a seed-stage AI startup — reads 15-25 papers/week on arXiv, tracks papers in Notion or Zotero, constantly annoyed that 40% of papers are incremental tweaks on things they already know cold

Pricing: one-time — $800 in first 60 days via lifetime licenses, then reassess

Full business breakdown →