Python · MIT · Open Source · v0.2.2

Stop paying for tokens
your model ignores.

Every LLM call carries dead weight. ContextPilot compresses it before your model sees it, with a quality gate so nothing important gets dropped.

$pip install contextpilot-ai

View on GitHub How it works PyPI ↗

60-80%

cost reduction

<50ms

at 100K tokens

integration surfaces

prompt changes

60-80%

cost reduction

<50ms

at 100K tokens

integration surfaces

prompt changes

01How it works

Context compression in four stages.

LLM APIs bill per token in the context window. Over time that window fills with repeated system prompts, stale history, and RAG chunks with nothing to do with the current query. ContextPilot intercepts every call and removes what the model does not need.

Analyze

Staleness, redundancy, relevance, density

Compress

History, dedup, RAG pruning, structural

Gate

Score below 72 — original forwarded unchanged

Forward

Optimal payload sent to your provider

Before24,800 tokens

# system prompt (repeated every turn)

████████████████████████████████

████████████████████████████

# conversation history (turns 1-12)

████████████████████████████

█████████████████████████████████

████████████████████████

# RAG chunks (12 retrieved, 4 relevant)

██████████████████████████████

█████████████████████████████████

After ContextPilot5,600 tokens

# system prompt (deduped, sent once)

████████████████████

█████████████████

# compressed history summary

███████████████

████████████

# RAG (4 relevant only)

████████████████

# current turn (unchanged)

███████████████████████

77% reduction· quality 83/100

02Integration surfaces

Four ways to plug in. Pick one.

Whether you own the code that calls the LLM or just use an AI coding tool, there is a surface that requires zero changes to what you already have.

A · Library

Python SDK wrapper

Drop-in for OpenAI and Anthropic. One call wraps your client and all existing calls stay unchanged.

import contextpilot

from openai import OpenAI

client = contextpilot.wrap(OpenAI())

# all existing calls unchanged

03Engineering principles

Built to be boring in production.

No configuration required, no data leaves your environment, and if anything goes wrong the original request goes through untouched.

ZERO-TRUST PAYLOAD

Your prompts never leave.

Telemetry emits only numeric metadata: token counts, latency, quality scores, model IDs. No prompt content, response text, or PII ever leaves your environment. Architectural guarantee, not a policy.

FAIL-SAFE BY DEFAULT

Nothing breaks. Ever.

Compression failures or quality score below threshold (default 72/100) means the original payload is forwarded unmodified. Every call is wrapped in try/except. The proxy always forwards, even if compression throws.

PROVIDER-AGNOSTIC

Works with your stack.

OpenAI, Anthropic, Portkey, Helicone, any OpenAI-compatible gateway. ContextPilot is middleware, not a router and not a lock-in.

MINIMAL INTEGRATION

One line, then forget.

pip install and wrap() for library users. One env var for proxy users. One MCP registration for Claude Code. No configuration required to start.

PERFORMANCE BUDGET

Fast enough to be invisible.

Analysis runs in under 50ms at 100K tokens, under 10ms at 10K tokens. Telemetry is async non-blocking. Overhead is never the reason your call is slow.

A/B SHADOW TESTING

Validate before trusting.

Shadow mode samples 5% of requests, runs both compressed and original, compares via cosine similarity. Validate against your actual workload before enabling globally.

04Dashboard · coming soon

See exactly what you are saving.

Real-time token savings, before/after comparisons, cost projections by model, and A/B shadow test results. One email when it is ready.

No spam. Unsubscribe anytime.

Token savings · last 30dpreview

37.3%

reduction

106K

tokens saved

83.2

avg quality

$0.53

est. saved

live data via CONTEXTPILOT_API_KEY

ContextPilotMIT · Python 3.10+

GitHub PyPI Security Contributing