Stop burning tokens.
Cure context rot.
AI agents re-read your entire context on every turn - costs explode, quality drops. The Librarian fixes this: up to 85% fewer tokens, no context rot, and near-infinite scalability. Open source.
The Hidden Cost of AI Agents
Modern agentic systems burn through tokens fast. Even after just a few interactions, the context window fills up - and three compounding problems kick in.
Exponential Cost
By turn 50, brute-force approaches send 6× more tokens than necessary. Every turn re-processes the entire history - costs scale as n².
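The arithmetic behind the quadratic claim can be sketched in a few lines. This is an illustration, not the Librarian's code: the per-message size (~500 tokens) and the curated budget (~800 tokens, from the Hydrate step below) are assumptions for the example.

```python
TOKENS_PER_MESSAGE = 500  # assumed average message size for illustration

def brute_force_tokens(turns: int) -> int:
    """Total tokens processed when every turn re-reads the full history.

    Turn k re-sends all k messages so far, so the total is
    (1 + 2 + ... + n) * TOKENS_PER_MESSAGE, i.e. O(n^2).
    """
    return sum(k * TOKENS_PER_MESSAGE for k in range(1, turns + 1))

def curated_tokens(turns: int, budget: int = 800) -> int:
    """Total tokens when each turn receives a fixed curated context."""
    return turns * budget

print(brute_force_tokens(50))  # 637500 tokens re-processed by turn 50
print(curated_tokens(50))      # 40000 tokens with a fixed ~800-token context
```

Under these assumptions, the brute-force total is roughly 16× the curated total by turn 50, and the gap keeps widening because one grows quadratically and the other linearly.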
Context Rot
As context grows, LLMs lose track. Key instructions get buried under noise. Research shows the "Lost in the Middle" effect can cause quality to drop by 20-85% as context length increases.
Latency Ceiling
At 100K tokens, brute-force response generation can take up to 60 seconds. The prefill cost scales linearly with history size.
How the Librarian Works
A simple three-step process that replaces brute-force context with intelligent reasoning.
Index
After each message, a lightweight model creates a ~100-token summary. This builds a compressed index of the entire conversation - 10× smaller than the raw history. This happens asynchronously, so the user never waits.
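The Index step can be sketched as a background task that writes one short summary per message. This is a minimal, runnable illustration: `SummaryIndex` and `summarize` are hypothetical names, and the summarizer is stubbed with truncation where the real system would call a lightweight LLM.

```python
import asyncio
from dataclasses import dataclass, field

async def summarize(text: str) -> str:
    # Stand-in for the lightweight summarizer model call;
    # truncation keeps this sketch self-contained and runnable.
    return text[:100]

@dataclass
class SummaryIndex:
    """Compressed index: one ~100-token summary per stored message."""
    summaries: dict[int, str] = field(default_factory=dict)

    async def index_message(self, msg_id: int, text: str) -> None:
        self.summaries[msg_id] = await summarize(text)

async def main() -> None:
    index = SummaryIndex()
    # Indexing is scheduled as a background task, so the user-facing
    # response path never waits on it.
    task = asyncio.create_task(
        index.index_message(1, "User asked about pricing tiers and discounts.")
    )
    await task
    print(index.summaries[1])

asyncio.run(main())
```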
Select
When a new message arrives, the Librarian reads the summary index and reasons about which messages are relevant. Unlike vector search, it understands temporal logic and dependencies between messages.
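The Select step's shape can be shown with a stub. In the real system an LLM reasons over the summary index; here, keyword overlap stands in for that reasoning so the sketch stays runnable. `select_relevant` and the sample index are hypothetical.

```python
def select_relevant(summary_index: dict[int, str], query: str) -> list[int]:
    """Pick message IDs whose summaries look relevant to the query.

    Stand-in for the Librarian's LLM-based selection: a real selector
    can reason about temporal logic and cross-message dependencies,
    which plain keyword or vector matching cannot.
    """
    keywords = set(query.lower().split())
    return [
        msg_id
        for msg_id, summary in sorted(summary_index.items())
        if keywords & set(summary.lower().split())
    ]

index = {
    1: "user asked about pricing tiers",
    2: "assistant explained the free plan",
    3: "user reported a login bug",
}
print(select_relevant(index, "pricing decision"))  # [1]
```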
Hydrate
Only the selected messages are fetched in full and passed to the responder. The result: a highly curated context of ~800 tokens instead of 2,000+ tokens of noise. Less noise → better answers.
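The Hydrate step amounts to fetching the selected messages in full, under a token budget. A minimal sketch, assuming an in-memory message store; token counting is approximated by word count here, where a real implementation would use the model's tokenizer.

```python
def hydrate(selected_ids: list[int], store: dict[int, str], budget: int = 800) -> str:
    """Fetch full text for selected messages, stopping at the token budget."""
    context: list[str] = []
    used = 0
    for msg_id in selected_ids:
        full = store[msg_id]
        cost = len(full.split())  # crude proxy for a real token count
        if used + cost > budget:
            break
        context.append(full)
        used += cost
    return "\n".join(context)

store = {
    1: "Full text of the pricing discussion, including tiers and discounts.",
    3: "Full text of the login bug report, with reproduction steps.",
}
# Only the IDs chosen by the Select step are hydrated; everything else
# stays out of the responder's context.
print(hydrate([1, 3], store))
```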
Built for Everyone
For Developers
Drop-in integrations for LangGraph and OpenClaw. Install with pip, configure your models, and get sub-linear context scaling in minutes.
View integration guides →
For Teams & Founders
Reduce your LLM costs by up to 85%, improve response quality, and unlock conversations that scale to 100K+ tokens without performance degradation.
See performance data →
For Researchers
Full benchmark suite, reproducible datasets, and detailed methodology. Every claim is backed by data you can verify yourself.
Explore the science →
Fine-Tuned Librarian Endpoints
We're building specialized LLM endpoints optimized for the Librarian's selection task. Early benchmarks show 1.3s context creation - an 84% reduction from general-purpose models. Zero config, drop-in replacement.