Cut your LLM API costs
30–70% with
one line of code.

ContextPilot intercepts your LLM calls, compresses redundant context through a quality-gated pipeline, and sends the optimal payload. Your code stays identical.

$ pip install contextpilot
30–70%
API cost reduction
<50ms
at 100K tokens
4
integration surfaces
0
prompt changes needed

Before & after.

Same conversation, same output quality. ContextPilot strips what the model doesn't need.

Before 24,800 tokens
# system prompt (full, repeated each turn)
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
# turn 1–12 full history
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
# RAG chunks (12 retrieved, 4 relevant)
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
After ContextPilot 9,100 tokens
# system prompt (deduped, once)
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
# compressed history summary
▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓
# RAG chunks (4 relevant only)
▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
# current turn (unchanged)
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
↓ 63% reduction · quality score 94/100
01
History summary
Old turns → dense summary
02
System dedup
Repeated prompts stripped
03
RAG pruning
Irrelevant chunks dropped
04
Quality gate
Score <85 → fallback original
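The four stages above can be sketched as one quality-gated pipeline. Everything here is illustrative: the function names, payload shape, and toy scorer are assumptions, not ContextPilot's published internals.

```python
def summarize_history(history, keep_last=2):
    # 01: collapse older turns into one dense summary line
    if len(history) <= keep_last:
        return history
    summary = f"[summary of {len(history) - keep_last} earlier turns]"
    return [summary] + history[-keep_last:]

def dedup_system(system_prompts):
    # 02: repeated system prompts are sent once (order preserved)
    return list(dict.fromkeys(system_prompts))

def prune_rag(chunks, relevant_ids):
    # 03: keep only the chunks judged relevant for this turn
    return [c for i, c in enumerate(chunks) if i in relevant_ids]

def quality_score(before, after):
    # 04: toy gate -- a real scorer would compare semantic fidelity,
    # not raw character counts
    kept = sum(len(s) for s in after) / max(1, sum(len(s) for s in before))
    return 100 if kept > 0.2 else 0  # refuse degenerate compressions

def optimize(payload, threshold=85):
    before = payload["system"] + payload["history"] + payload["rag"]
    candidate = {
        "system": dedup_system(payload["system"]),
        "history": summarize_history(payload["history"]),
        "rag": prune_rag(payload["rag"], payload["relevant_ids"]),
    }
    after = candidate["system"] + candidate["history"] + candidate["rag"]
    if quality_score(before, after) < threshold:
        return payload  # fail-safe: send the original, unmodified
    return candidate
```

The gate is the important part: compression is only ever an optimization, never a requirement, so a low score falls back to the untouched payload.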

One engine. Any workflow.

Same compression pipeline, four different entry points. Start with the surface that fits your stack.

A · Library
pip install contextpilot

Python SDK wrapper

Drop-in for OpenAI, Anthropic, and Google Vertex AI. One call wraps your existing client. Zero prompt changes.

import contextpilot
import anthropic
client = anthropic.Anthropic()
client = contextpilot.wrap(client)
# all your existing calls unchanged
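`contextpilot.wrap` is the real entry point; a minimal interceptor in the same spirit could look like the sketch below. The `Wrapped` class, `FakeClient`, and the `optimize` callable are illustrative, not ContextPilot's actual implementation.

```python
class Wrapped:
    """Transparent proxy: intercepts calls, compresses `messages` first."""

    def __init__(self, inner, optimize):
        self._inner = inner
        self._optimize = optimize

    def __getattr__(self, name):
        attr = getattr(self._inner, name)
        if not callable(attr):
            # descend into namespaces like client.messages
            return Wrapped(attr, self._optimize)

        def call(*args, **kwargs):
            if "messages" in kwargs:
                # compress the payload before it leaves the process
                kwargs["messages"] = self._optimize(kwargs["messages"])
            return attr(*args, **kwargs)

        return call

# tiny stand-in for a real SDK client, for demonstration only
class FakeMessages:
    def create(self, **kwargs):
        return kwargs["messages"]

class FakeClient:
    def __init__(self):
        self.messages = FakeMessages()

client = Wrapped(FakeClient(), optimize=lambda msgs: msgs[-1:])
```

Because the proxy forwards every attribute it doesn't recognize, call sites like `client.messages.create(...)` keep their exact shape, which is what makes the one-line integration possible.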
B · Proxy
OpenAI-compatible

Local proxy server

For AI coding tools (Claude Code, Aider, Cursor). Set one env var — ContextPilot intercepts transparently.

# terminal 1 — start the proxy
$ contextpilot proxy --port 8432
# terminal 2 — point your tool at it
$ export ANTHROPIC_BASE_URL=http://localhost:8432
C · MCP
Claude Desktop / Code

MCP server

Exposes optimize_context, get_savings, and suggest_config as native MCP tools inside Claude.

# run the MCP server
$ contextpilot mcp
# claude.json entry (command and args are separate fields)
{ "contextpilot": { "command": "contextpilot", "args": ["mcp"] } }
D · CLI
Migration agent

AST migration agent

For existing codebases with 50+ LLM calls. Scans, wraps, and patches automatically. Dry-run first.

# preview changes first
$ contextpilot migrate ./src/ --dry-run
# apply when satisfied
$ contextpilot migrate ./src/ --apply
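The migration agent's mechanics aren't published here, but the dry-run scan can be pictured with Python's standard `ast` module: find every line where a known LLM client is constructed, so a later pass can wrap it. The `CLIENTS` pattern set is an assumption for illustration.

```python
import ast

# Hypothetical (module, constructor) pairs to look for -- not
# ContextPilot's actual rule set.
CLIENTS = {("anthropic", "Anthropic"), ("openai", "OpenAI")}

def scan(source: str):
    """Return (lineno, call) for each client construction in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            base = node.func.value
            if isinstance(base, ast.Name) and (base.id, node.func.attr) in CLIENTS:
                hits.append((node.lineno, f"{base.id}.{node.func.attr}()"))
    return hits
```

Working on the AST rather than regexes is what lets a migration tool patch call sites safely: it sees real constructor calls, not strings that merely look like them.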

Built to be trusted.

Zero-trust payload

Telemetry transmits only numeric metadata — token counts, latency, scores. Never prompt text, response content, or PII.
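As a concrete picture of that guarantee, a numeric-only telemetry record might look like this. The field names are assumptions; the invariant, numbers only, never prompt or response text, is the point.

```python
def telemetry_record(tokens_before, tokens_after, latency_ms, quality_score):
    """Build a telemetry record containing only numeric metadata."""
    record = {
        "tokens_before": tokens_before,
        "tokens_after": tokens_after,
        "latency_ms": latency_ms,
        "quality_score": quality_score,
    }
    # enforce the zero-trust invariant: no strings can ever leave
    assert all(isinstance(v, (int, float)) for v in record.values())
    return record
```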

Fail-safe fallback

If compression fails or the quality score drops below 85, ContextPilot sends the original payload unmodified. Nothing breaks.

Provider-agnostic

Works with OpenAI, Anthropic, Google Vertex AI, and any gateway (Portkey, Helicone). Middleware, not a router.

Get early access.

The ContextPilot dashboard will show real-time token savings, before/after comparisons, cost projections by model, and A/B shadow test results. Join the waitlist and we'll notify you first.

No spam. One email when we launch. Unsubscribe anytime.