github.com/msousa202/ContextPilot

Python 3.10+ · MIT License · v0.2.2

Python · MIT · Open Source · v0.2.2

Stop paying for tokens
your model ignores.

Every LLM call carries dead weight. ContextPilot compresses it before your model sees it, with a quality gate so nothing important gets dropped.

$pip install contextpilot-ai
60-80%
cost reduction
<50ms
at 100K tokens
4
integration surfaces
0
prompt changes
01How it works

Context compression in four stages.

LLM APIs bill per token in the context window. Over time that window fills with repeated system prompts, stale history, and RAG chunks with nothing to do with the current query. ContextPilot intercepts every call and removes what the model does not need.

01
Analyze
Staleness, redundancy, relevance, density
02
Compress
History, dedup, RAG pruning, structural
03
Gate
Score below 72 — original forwarded unchanged
04
Forward
Optimal payload sent to your provider
Before24,800 tokens
# system prompt (repeated every turn)
████████████████████████████████
████████████████████████████
# conversation history (turns 1-12)
████████████████████████████
█████████████████████████████████
████████████████████████
# RAG chunks (12 retrieved, 4 relevant)
██████████████████████████████
█████████████████████████████████
After ContextPilot5,600 tokens
# system prompt (deduped, sent once)
████████████████████
█████████████████
# compressed history summary
███████████████
████████████
# RAG (4 relevant only)
████████████████
# current turn (unchanged)
███████████████████████
77% reduction· quality 83/100
02Integration surfaces

Four ways to plug in. Pick one.

Whether you own the code that calls the LLM or just use an AI coding tool, there is a surface that requires zero changes to what you already have.

A · Library

Python SDK wrapper

Drop-in for OpenAI and Anthropic. One call wraps your client and all existing calls stay unchanged.

import contextpilot
from openai import OpenAI
 
client = contextpilot.wrap(OpenAI())
# all existing calls unchanged
03Engineering principles

Built to be boring in production.

No configuration required, no data leaves your environment, and if anything goes wrong the original request goes through untouched.

ZERO-TRUST PAYLOAD
Your prompts never leave.
Telemetry emits only numeric metadata: token counts, latency, quality scores, model IDs. No prompt content, response text, or PII ever leaves your environment. Architectural guarantee, not a policy.
FAIL-SAFE BY DEFAULT
Nothing breaks. Ever.
Compression failures or quality score below threshold (default 72/100) means the original payload is forwarded unmodified. Every call is wrapped in try/except. The proxy always forwards, even if compression throws.
PROVIDER-AGNOSTIC
Works with your stack.
OpenAI, Anthropic, Portkey, Helicone, any OpenAI-compatible gateway. ContextPilot is middleware, not a router and not a lock-in.
MINIMAL INTEGRATION
One line, then forget.
pip install and wrap() for library users. One env var for proxy users. One MCP registration for Claude Code. No configuration required to start.
PERFORMANCE BUDGET
Fast enough to be invisible.
Analysis runs in under 50ms at 100K tokens, under 10ms at 10K tokens. Telemetry is async non-blocking. Overhead is never the reason your call is slow.
A/B SHADOW TESTING
Validate before trusting.
Shadow mode samples 5% of requests, runs both compressed and original, compares via cosine similarity. Validate against your actual workload before enabling globally.
04Dashboard · coming soon

See exactly what you are saving.

Real-time token savings, before/after comparisons, cost projections by model, and A/B shadow test results. One email when it is ready.

No spam. Unsubscribe anytime.

Token savings · last 30dpreview
37.3%
reduction
106K
tokens saved
83.2
avg quality
$0.53
est. saved
live data via CONTEXTPILOT_API_KEY
ContextPilotMIT · Python 3.10+
2026 ContextPilot contributors · Released under the MIT License