The AI Gateway that TL;DRs your tokens

Edgee compresses prompts before they reach LLM providers.
Same code, fewer tokens, lower bills.

curl https://api.edgee.ai/v1/chat/completions \
  -H "Authorization: Bearer $EDGEE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "A long prompt and tool calls that could be optimized..."}
    ]
  }'
Run the command above to see the results.
3B+ Requests/Month
100+ Global PoPs


Token Compression & Cost Governance

Use powerful models at a fraction of the cost

Edgee optimizes your prompts at the edge using intelligent token compression, removing redundancy while preserving meaning. The compressed requests are then forwarded to your LLM provider of choice. Tag requests with custom metadata to track usage and costs, and get alerted when costs spike, before they become a problem.
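
The same send call shown later on this page handles long prompts too; compression happens transparently at the edge before the request reaches your provider. A minimal sketch (it assumes no extra configuration is needed to enable compression):

import Edgee from 'edgee';

const edgee = new Edgee('your-edgee-api-key');

// A long, redundant prompt: the gateway compresses it at the edge
// before forwarding, so application code does not change.
const longPrompt = [
  'You are a support assistant.',
  '...many paragraphs of retrieved context...',
  'Question: how do I rotate my API key?',
].join('\n');

const res = await edgee.send({
  model: 'openai/gpt-4o',
  input: longPrompt,
});

console.log(res.text);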

Up to 50% input token reduction

Reduce LLM costs by intelligently compressing prompts at the edge; a worked savings example follows these highlights.

Semantic preservation

Preserves context and intent while removing redundant tokens.

Universal compatibility

Works with any LLM provider, including OpenAI, Anthropic, Gemini, xAI, and Mistral.

Cost governance

Tag requests with custom metadata to track usage and costs, and get alerted when spending spikes.
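
To make the headline figure concrete, here is a back-of-the-envelope savings calculation. The price and volume below are assumptions for illustration, not Edgee or provider quotes:

// Illustrative input-token savings at a 50% compression ratio.
const pricePerMillionInputTokens = 2.5; // assumed USD list price
const monthlyInputTokens = 500_000_000; // assumed monthly volume
const compressionRatio = 0.5;           // the "up to 50%" case

const costBefore = (monthlyInputTokens / 1_000_000) * pricePerMillionInputTokens;
const costAfter = costBefore * (1 - compressionRatio);

console.log(`$${costBefore}/mo before, $${costAfter}/mo after`); // $1250/mo before, $625/mo after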

Tags & Alerts
Add tags to requests
await edgee.send({
  model: 'openai/gpt-4',
  input: 'Generate report',
  tags: ['feature:reports', 'team:analytics']
});
Cost alerts
Cost spike detected
Tag feature:reports exceeded $500 in the last 24h
+250% vs previous period
→ Track by feature, team, project, or custom tags

How it works

One gateway, many providers

Your application calls Edgee. We apply policies at the edge (routing, privacy controls, retries), then forward the request to the best provider for the job.

  • Normalize responses across models so you can switch providers easily (see the provider-switch sketch after the quick example)
  • Observe and debug production AI traffic end-to-end
  • Control costs with routing policies and caching
Quick example
import Edgee from 'edgee';

const edgee = new Edgee('your-edgee-api-key');

const res = await edgee.send({
  model: 'openai/gpt-5.2',
  input: 'Explain edge computing like I’m 5',
});

console.log(res.text);
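
Because responses are normalized, switching providers is just a change to the model string. A sketch reusing the client above; the Anthropic model id is illustrative:

// Same call shape, different provider.
const claude = await edgee.send({
  model: 'anthropic/claude-sonnet-4', // illustrative model id
  input: "Explain edge computing like I'm 5",
});

console.log(claude.text); // same normalized response shape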

Why Edgee AI Gateway?

An edge intelligence layer for your AI traffic

Edgee sits between your application and LLM providers behind a single OpenAI-compatible API. It adds edge-level intelligence, including token compression, routing policies, cost controls, private models, and tools, so you can ship AI features faster and with confidence.

Token compression

Reduce prompt size without losing intent to lower costs and latency, especially for long contexts, RAG pipelines, and multi-turn agents.

Learn more

Edge Tools

Invoke shared tools managed by Edgee, or deploy your own private tools at the edge, closer to users and providers for lower latency and tighter control.

Learn more
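
As a sketch of what a tool invocation could look like through the gateway, assuming send accepts an OpenAI-style tools array (the tools field and the tool name below are assumptions, not a documented Edgee signature):

// Hypothetical: exposing an Edgee-managed tool to the model.
const res = await edgee.send({
  model: 'openai/gpt-4o',
  input: 'What is the weather in Paris right now?',
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather', // hypothetical shared tool
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  }],
});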

Bring Your Own Keys

Use Edgee’s keys for convenience, or plug in your own provider keys for billing control and custom models.

Learn more
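
A hypothetical sketch of what plugging in your own key could look like; the providerKeys option is an assumption for illustration, not a documented signature:

// Hypothetical: requests to OpenAI are billed to your own account.
const edgee = new Edgee('your-edgee-api-key', {
  providerKeys: { openai: process.env.OPENAI_API_KEY }, // illustrative option
});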

Observability

Monitor latency, errors, usage, and cost per model, per app, and per environment.

Learn more
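
For example, per-request numbers could be read straight off the normalized response, assuming it carries an OpenAI-style usage object (an assumption, not confirmed here):

// Hypothetical: token accounting per request.
const res = await edgee.send({
  model: 'openai/gpt-4o',
  input: 'Summarize this incident report: ...',
});

console.log(res.usage?.prompt_tokens, res.usage?.completion_tokens); // assumed fields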

Edge Models

Run small, fast models at the edge to classify, redact, enrich, or route requests before they reach an LLM provider.

Learn more
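
A sketch of such a pre-processing hop; the edge model id 'edgee/redact-small' is illustrative, not a documented model:

// Hypothetical: redact PII at the edge, then call the main model.
const rawUserMessage = 'My card is 4242 4242 4242 4242, why was I charged twice?';

const redacted = await edgee.send({
  model: 'edgee/redact-small', // illustrative edge model id
  input: rawUserMessage,
});

const res = await edgee.send({
  model: 'openai/gpt-4o',
  input: redacted.text,
});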

Private Models

Deploy serverless open-source LLMs on demand, where you need them, and expose them through the same gateway API alongside public providers.

Learn more
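
Since private models sit behind the same gateway API, calling one is the same send call; the model id below is illustrative:

// Hypothetical private model id; the call shape is unchanged.
const res = await edgee.send({
  model: 'private/llama-3.1-8b', // illustrative
  input: 'Classify this support ticket: ...',
});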

The vision behind Edgee

Every technological shift creates a new foundation: the web had bandwidth, the cloud had compute, and AI has tokens. In a world powered by models, intelligence has a cost: tokens flow through every interaction, decision, and response.

At Edgee, we believe intelligence should move efficiently, closer to users, intent, and action. It should be compressed, routed, and optimized so decisions happen instantly. Hear from Sacha, Edgee’s co-founder, on how AI scales by mastering how intelligence moves.
