Cut your LLM API costs
30–70% with
one line of code.

ContextPilot intercepts your LLM calls, compresses redundant context through a quality-gated pipeline, and sends the optimal payload. Your code stays identical.

$ pip install contextpilot
30–70%
API cost reduction
<50ms
at 100K tokens
4
integration surfaces
0
prompt changes needed

Before & after.

Same conversation, same output quality. ContextPilot strips what the model doesn't need.

Before 24,800 tokens
# system prompt (full, repeated each turn)
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
# turn 1–12 full history
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
# RAG chunks (12 retrieved, 4 relevant)
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
After ContextPilot 9,100 tokens
# system prompt (deduped, once)
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
# compressed history summary
▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓
# RAG chunks (4 relevant only)
▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
# current turn (unchanged)
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
↓ 63% reduction · quality score 94/100
01
History summary
Old turns → dense summary
02
System dedup
Repeated prompts stripped
03
RAG pruning
Irrelevant chunks dropped
04
Quality gate
Score <85 → fallback original
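The four stages above can be sketched as one quality-gated pipeline. Everything here is illustrative: the function names, payload shape, and toy scorer are assumptions, not ContextPilot's published internals.

```python
def summarize_history(history, keep_last=2):
    # 01: collapse older turns into one dense summary line
    if len(history) <= keep_last:
        return history
    summary = f"[summary of {len(history) - keep_last} earlier turns]"
    return [summary] + history[-keep_last:]

def dedup_system(system_prompts):
    # 02: repeated system prompts are sent once (order preserved)
    return list(dict.fromkeys(system_prompts))

def prune_rag(chunks, relevant_ids):
    # 03: keep only the chunks judged relevant for this turn
    return [c for i, c in enumerate(chunks) if i in relevant_ids]

def quality_score(before, after):
    # 04: toy gate -- a real scorer would compare semantic fidelity,
    # not raw character counts
    kept = sum(len(s) for s in after) / max(1, sum(len(s) for s in before))
    return 100 if kept > 0.2 else 0  # refuse degenerate compressions

def optimize(payload, threshold=85):
    before = payload["system"] + payload["history"] + payload["rag"]
    candidate = {
        "system": dedup_system(payload["system"]),
        "history": summarize_history(payload["history"]),
        "rag": prune_rag(payload["rag"], payload["relevant_ids"]),
    }
    after = candidate["system"] + candidate["history"] + candidate["rag"]
    if quality_score(before, after) < threshold:
        return payload  # fail-safe: send the original, unmodified
    return candidate
```

The gate is the important part: compression is only ever an optimization, never a requirement, so a low score falls back to the untouched payload.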

One engine. Any workflow.

Same compression pipeline, four different entry points. Start with the surface that fits your stack.

A · Library
pip install contextpilot

Python SDK wrapper

Drop-in for OpenAI, Anthropic, and Google Vertex AI. One call wraps your existing client. Zero prompt changes.

import contextpilot
import anthropic
client = anthropic.Anthropic()
client = contextpilot.wrap(client)
# all your existing calls unchanged
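`contextpilot.wrap` is the real entry point; a minimal interceptor in the same spirit could look like the sketch below. The `Wrapped` class, `FakeClient`, and the `optimize` callable are illustrative, not ContextPilot's actual implementation.

```python
class Wrapped:
    """Transparent proxy: intercepts calls, compresses `messages` first."""

    def __init__(self, inner, optimize):
        self._inner = inner
        self._optimize = optimize

    def __getattr__(self, name):
        attr = getattr(self._inner, name)
        if not callable(attr):
            # descend into namespaces like client.messages
            return Wrapped(attr, self._optimize)

        def call(*args, **kwargs):
            if "messages" in kwargs:
                # compress the payload before it leaves the process
                kwargs["messages"] = self._optimize(kwargs["messages"])
            return attr(*args, **kwargs)

        return call

# tiny stand-in for a real SDK client, for demonstration only
class FakeMessages:
    def create(self, **kwargs):
        return kwargs["messages"]

class FakeClient:
    def __init__(self):
        self.messages = FakeMessages()

client = Wrapped(FakeClient(), optimize=lambda msgs: msgs[-1:])
```

Because the proxy forwards every attribute it doesn't recognize, call sites like `client.messages.create(...)` keep their exact shape, which is what makes the one-line integration possible.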
B · Proxy
OpenAI-compatible

Local proxy server

For AI coding tools (Claude Code, Aider, Cursor). Set one env var — ContextPilot intercepts transparently.

# terminal 1 — start the proxy
$ contextpilot proxy --port 8432
# terminal 2 — point your tool at it
$ export ANTHROPIC_BASE_URL=http://localhost:8432
C · MCP
Claude Desktop / Code

MCP server

Exposes optimize_context, get_savings, and suggest_config as native MCP tools inside Claude.

# run the MCP server
$ contextpilot mcp
# claude.json entry (command and args are separate fields)
{ "contextpilot": { "command": "contextpilot", "args": ["mcp"] } }
D · CLI
Migration agent

AST migration agent

For existing codebases with 50+ LLM calls. Scans, wraps, and patches automatically. Dry-run first.

# preview changes first
$ contextpilot migrate ./src/ --dry-run
# apply when satisfied
$ contextpilot migrate ./src/ --apply
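The migration agent's mechanics aren't published here, but the dry-run scan can be pictured with Python's standard `ast` module: find every line where a known LLM client is constructed, so a later pass can wrap it. The `CLIENTS` pattern set is an assumption for illustration.

```python
import ast

# Hypothetical (module, constructor) pairs to look for -- not
# ContextPilot's actual rule set.
CLIENTS = {("anthropic", "Anthropic"), ("openai", "OpenAI")}

def scan(source: str):
    """Return (lineno, call) for each client construction in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            base = node.func.value
            if isinstance(base, ast.Name) and (base.id, node.func.attr) in CLIENTS:
                hits.append((node.lineno, f"{base.id}.{node.func.attr}()"))
    return hits
```

Working on the AST rather than regexes is what lets a migration tool patch call sites safely: it sees real constructor calls, not strings that merely look like them.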

Built to be trusted.

Zero-trust payload

Telemetry transmits only numeric metadata — token counts, latency, scores. Never prompt text, response content, or PII.
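As a concrete picture of that guarantee, a numeric-only telemetry record might look like this. The field names are assumptions; the invariant, numbers only, never prompt or response text, is the point.

```python
def telemetry_record(tokens_before, tokens_after, latency_ms, quality_score):
    """Build a telemetry record containing only numeric metadata."""
    record = {
        "tokens_before": tokens_before,
        "tokens_after": tokens_after,
        "latency_ms": latency_ms,
        "quality_score": quality_score,
    }
    # enforce the zero-trust invariant: no strings can ever leave
    assert all(isinstance(v, (int, float)) for v in record.values())
    return record
```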

Fail-safe fallback

If compression fails or the quality score drops below 85, ContextPilot sends the original payload unmodified. Nothing breaks.

Provider-agnostic

Works with OpenAI, Anthropic, Google Vertex AI, and any gateway (Portkey, Helicone). Middleware, not a router.

Get early access.

The ContextPilot dashboard will show real-time token savings, before/after comparisons, cost projections by model, and A/B shadow test results. Join the waitlist and we'll notify you first.

No spam. One email when we launch. Unsubscribe anytime.