LiteLLM vs OpenRouter vs Portkey (2026)

Updated: June 3, 2026

An AI gateway is a single proxy layer that gives your application one unified API for many LLM providers, with built-in routing, caching, retries, observability, and cost controls. In 2026 the three serious contenders are LiteLLM (open-source proxy you self-host), OpenRouter (hosted marketplace with 300+ models), and Portkey (commercial control plane with governance). Pick LiteLLM for full control, OpenRouter for breadth and zero-ops experimentation, and Portkey for enterprise guardrails. The rest of this guide explains exactly when each one wins.

LiteLLM is an MIT-licensed Python proxy and SDK that translates 100+ provider APIs into the OpenAI schema; you self-host it, you own the data plane.
OpenRouter is a hosted gateway with 300+ models behind a single OpenAI-compatible endpoint; pay-as-you-go with a 5.5% markup and zero infrastructure.
Portkey is a commercial gateway plus observability and governance stack with prompt management, guardrails, and SOC 2 / HIPAA tiers built for enterprises.
All three support fallback routing, retries, and load balancing, but only Portkey ships prompt templates and human-review queues out of the box.
For semantic caching at scale, LiteLLM + Redis is the cheapest path; Portkey's hosted cache wins for teams that don't want to run Redis.
OpenRouter is the fastest way to A/B test new models like Claude Opus 4.7, GPT-5.1, and DeepSeek V3.1 without onboarding each vendor's billing.

What is an AI gateway?

An AI gateway sits between your application and one or more LLM providers and normalizes the traffic. Think of it as the API gateway pattern (Kong, Apigee, Envoy) applied to model inference. Instead of writing provider-specific SDK code for Anthropic, OpenAI, Google, Mistral, Bedrock, and Azure, your service speaks one schema and the gateway handles translation, authentication, retries, fallbacks, rate limiting, caching, and telemetry.

In practice, gateways solve four production headaches that I've hit on every serious LLM project. First, provider risk: when OpenAI's API rate-limits you mid-incident, you want to fail over to Claude in 200ms, not 200 minutes. Second, cost visibility: finance wants per-team, per-feature, per-tenant token spend, and that data doesn't live in any single vendor's dashboard. Third, governance: PII redaction, prompt-injection scanning, and policy enforcement belong at one chokepoint, not duplicated in every microservice. Fourth, experimentation: swapping models for evals or A/B tests should be a config change, not a code change.
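The failover behavior in the first point can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any of the three gateways implements it: `call_with_fallback`, `ProviderError`, and the two stand-in provider functions are hypothetical names, and a production gateway would add backoff, timeouts, and health tracking.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error standing in for a rate limit or provider outage."""


def call_with_fallback(
    providers: Sequence[tuple[str, Callable[[str], str]]],
    prompt: str,
    retries_per_provider: int = 1,
) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, response).

    Each provider gets 1 + retries_per_provider attempts before the next
    provider in the list is tried.
    """
    errors: list[str] = []
    for name, call in providers:
        for attempt in range(1 + retries_per_provider):
            try:
                return name, call(prompt)
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


def rate_limited(prompt: str) -> str:
    # Simulates a primary provider that rejects every request.
    raise ProviderError("429 rate limited")


def healthy(prompt: str) -> str:
    # Simulates a backup provider that answers normally.
    return f"echo: {prompt}"


print(call_with_fallback([("primary", rate_limited), ("backup", healthy)], "hi"))
```

The point of the sketch is where the logic lives: once it sits in the gateway, no application service needs its own copy of this loop.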

Most teams adopt a gateway the second they cross two providers or three production services. The OpenAI-compatible schema has become the lingua franca, so a well-designed gateway lets your existing openai SDK talk to anything from Claude Opus 4.7 to a self-hosted Llama 4 endpoint by changing the base_url. (Honestly, that one trick alone has saved me weeks of integration work.)
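To make the base_url point concrete, here is a small sketch that builds (but never sends) the same OpenAI-style chat request against two backends using only the standard library. The helper name `build_chat_request`, the placeholder key, and the `http://gateway:4000/v1` address are illustrative assumptions; in real code you would normally just pass `base_url` to the OpenAI client.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request without sending it."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same payload shape for both backends: only base_url and the model name change.
direct = build_chat_request("https://api.openai.com/v1", "sk-placeholder", "gpt-5.1", "Hello")
via_gateway = build_chat_request("http://gateway:4000/v1", "sk-placeholder", "smart-default", "Hello")
print(direct.full_url)
print(via_gateway.full_url)
```

Because the request body is identical, switching backends is a configuration change rather than a rewrite of the calling code.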

LiteLLM vs OpenRouter vs Portkey at a glance

The three gateways occupy different points on the build-versus-buy spectrum. LiteLLM is a library and proxy you run yourself. OpenRouter is a hosted endpoint with credit-based billing. Portkey is a SaaS control plane with an optional self-hosted gateway and an entire observability and prompt-management product on top. Here's how they line up on the dimensions that matter when you're choosing.

Dimension	LiteLLM	OpenRouter	Portkey
License / model	MIT open source + paid enterprise tier	Closed, hosted SaaS	Closed SaaS, gateway is open source (Apache 2.0)
Deployment	Self-host (Docker, Kubernetes, ECS)	Hosted only	Hosted or self-host the gateway
Provider coverage	100+ providers via adapters	300+ models in one marketplace	250+ models
Routing & fallbacks	Yes, declarative YAML	Yes, ordered fallback list per request	Yes, configurable routing strategies
Semantic caching	Yes, Redis-backed	No native semantic cache (exact-match only)	Yes, hosted simple + semantic cache
Observability	Logs to Langfuse, Helicone, OTel	Built-in usage dashboard	Full-featured traces, evals, prompt versioning
Pricing	Free; enterprise quoted	Provider cost + ~5.5% markup	Free tier + per-request paid plans
Best for	Engineering teams who want full control	Solo devs and rapid model experimentation	Enterprises needing governance & prompt ops

LiteLLM deep dive: the open-source proxy

LiteLLM from BerriAI is the most adopted open-source AI gateway. It ships as both a Python SDK (pip install litellm) and a standalone proxy server. The SDK gives you a single completion() function that takes a model string like "anthropic/claude-opus-4-7" or "bedrock/meta.llama4-70b" and routes accordingly. The proxy mode is what you actually run in production: a FastAPI process exposing an OpenAI-compatible /v1/chat/completions endpoint with virtual keys, budgets, and a Postgres-backed admin UI.

A minimal LiteLLM proxy config that fans out across Claude and GPT, with automatic fallback when one is rate-limited, looks like this:

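# config.yaml
model_list:
  # Two deployments share one model_name, so the router balances between them.
  - model_name: smart-default
    litellm_params:
      model: anthropic/claude-opus-4-7
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: smart-default
    litellm_params:
      model: openai/gpt-5.1
      api_key: os.environ/OPENAI_API_KEY
  # Fallback target; fallbacks refer to a model_name defined in model_list.
  # AWS credentials for Bedrock are read from the environment.
  - model_name: smart-fallback
    litellm_params:
      model: bedrock/claude-sonnet-4-6

router_settings:
  routing_strategy: latency-based-routing
  fallbacks:
    - smart-default: ["smart-fallback"]
  num_retries: 2
  timeout: 30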

litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: redis.internal
    port: 6379
    ttl: 3600

general_settings:
  master_key: sk-litellm-master  # placeholder; load the real key from a secret store
  database_url: postgresql://litellm:pw@db/litellm  # placeholder credentials

Start it with litellm --config config.yaml --port 4000 and your application now points at http://gateway:4000/v1 using the OpenAI SDK. You get virtual API keys per team (with their own budgets and rate limits), a usage dashboard, structured logs you can ship to production-grade LLM observability tools like Langfuse or Helicone, and a Prometheus metrics endpoint.
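For intuition about what a latency-based routing strategy does, here is a rough sketch of the idea: track recent latencies per deployment and send the next request to the fastest one. This is my own simplification, not LiteLLM's actual algorithm; the `LatencyRouter` class, its window size, and the deployment names are all hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean


class LatencyRouter:
    """Pick the deployment with the lowest recent average latency."""

    def __init__(self, deployments: list, window: int = 20) -> None:
        self.deployments = list(deployments)
        # Keep only the last `window` latency samples per deployment.
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, deployment: str, latency_s: float) -> None:
        """Store the observed latency (in seconds) of a finished request."""
        self.samples[deployment].append(latency_s)

    def pick(self) -> str:
        # Deployments with no samples yet go first so they get measured.
        for name in self.deployments:
            if not self.samples[name]:
                return name
        return min(self.deployments, key=lambda name: mean(self.samples[name]))


router = LatencyRouter(["claude", "gpt"])
router.record("claude", 1.2)
router.record("gpt", 0.4)
print(router.pick())
```

The sliding window matters: without it, one slow minute from last week would keep penalizing a deployment that has since recovered.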

The killer feature for cost-sensitive teams is the Redis-backed semantic cache. When combined with prompt caching at the provider level, LiteLLM is the cheapest way I've benchmarked to run high-volume chat traffic. The tradeoff is operational: you own the database, the Redis cluster, and the upgrades. LiteLLM ships breaking changes more often than I'd like, so pin the version and read the changelog before bumping. (I learned that one the hard way during a quiet Friday deploy.)
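If semantic caching is new to you, the core idea fits in a short sketch: embed each prompt, and serve a stored response when a new prompt is close enough to an old one. Everything here is an assumption for illustration. The bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for Redis, and the 0.75 threshold is arbitrary.

```python
import math
import re
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real cache would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.75) -> None:
        self.threshold = threshold
        self.entries: list = []  # (vector, response) pairs; Redis in a real setup

    def get(self, prompt: str) -> Optional[str]:
        """Return the best-matching cached response, or None on a miss."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))


cache = SemanticCache()
cache.put("What is the refund policy?", "Refunds are accepted within 30 days.")
print(cache.get("What is your refund policy?"))  # near-duplicate wording: cache hit
print(cache.get("How do I reset my password?"))  # unrelated prompt: cache miss
```

The threshold is the knob to tune carefully: set it too low and users get answers to questions they didn't ask, too high and the cache rarely hits.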

OpenRouter deep dive: the hosted model marketplace

OpenRouter takes a completely different angle. It's a hosted endpoint at https://openrouter.ai/api/v1 that proxies to more than 300 models from every major provider plus dozens of inference partners (Fireworks, Together, DeepInfra, Lambda). You load credits with a card or crypto, pick a model by its slug like anthropic/claude-opus-4.7 or deepseek/deepseek-v3.1, and OpenRouter handles the rest. There's no proxy to run, no Postgres to manage, and no SDK to install beyond the standard OpenAI client.

The pricing model is straightforward: you pay the provider's list price plus a ~5.5% markup that funds the platform. For models hosted by multiple inference partners, OpenRouter picks the cheapest healthy endpoint by default, or you can pin a specific provider. A request that uses fallback looks like this:
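Before looking at that request, it may help to see the default selection rule as code. The sketch below picks the cheapest healthy endpoint unless a provider is pinned. It is a guess at the shape of the logic, not OpenRouter's implementation: the `Endpoint` type, `pick_endpoint`, and the prices are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    provider: str
    price_per_mtok: float  # USD per million tokens (made-up figures)
    healthy: bool


def pick_endpoint(endpoints: list, pinned: Optional[str] = None) -> Endpoint:
    """Return the cheapest healthy endpoint, optionally restricted to one provider."""
    candidates = [e for e in endpoints if e.healthy]
    if pinned is not None:
        candidates = [e for e in candidates if e.provider == pinned]
    if not candidates:
        raise LookupError("no healthy endpoint available")
    return min(candidates, key=lambda e: e.price_per_mtok)


endpoints = [
    Endpoint("Fireworks", 0.90, healthy=True),
    Endpoint("Together", 0.80, healthy=False),  # cheapest, but currently unhealthy
    Endpoint("DeepInfra", 0.85, healthy=True),
]
print(pick_endpoint(endpoints).provider)
print(pick_endpoint(endpoints, pinned="Fireworks").provider)
```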

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="anthropic/claude-opus-4.7",
    messages=[{"role": "user", "content": "Draft a release note for v2.4."}],
    extra_body={
        "models": [
            "anthropic/claude-opus-4.7",
            "openai/gpt-5.1",
            "google/gemini-2.5-pro",
        ],
        "route": "fallback",
        "provider": {"sort": "throughput"},
    },
)
print(resp.choices[0].message.content)

OpenRouter is unbeatable for two use cases. First, model experimentation: when a new frontier model drops, it appears on OpenRouter within hours, so you can A/B it against your incumbent without onboarding a new vendor. Second, indie projects: solo developers and small teams avoid the operational overhead of running a proxy entirely.

Where OpenRouter falls short is enterprise control. There's no native PII redaction, no prompt-injection scanning, no semantic cache, and limited per-team governance. The usage dashboard is good, but it isn't production-grade cost optimization tooling. The 5.5% markup also stings at scale: at $50K/month in spend you're paying $2,750/month for what amounts to a smarter routing layer.

Portkey deep dive: the enterprise control plane

Portkey is the most product-shaped of the three. The gateway itself is open source (Apache 2.0) and runs anywhere a Node.js process runs, but the value is in the hosted control plane: a web app with prompt versioning, evaluation runs, observability traces, semantic and exact-match caching, virtual keys, guardrails, and granular role-based access control. SOC 2 Type II, HIPAA, and ISO 27001 attestations are available on paid tiers.

A typical Portkey integration looks more declarative than the others. You define a "config" in the dashboard that bundles routing, retries, cache settings, and guardrails, then reference it by ID:

import os

from portkey_ai import Portkey

client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    config="pc-prod-claude-fallback-9f2",  # routing + cache + guardrails
)

resp = client.chat.completions.create(
    messages=[{"role": "user", "content": "Summarize this support ticket."}],
    model="claude-opus-4-7",
    metadata={"team": "support", "feature": "ticket-summary"},
)

The metadata dimension is where Portkey shines: every request is automatically attributed to a team, feature, environment, and user, and the dashboard breaks down spend and latency along any of those axes. For teams already using a structured-outputs library, Portkey plays well alongside our Instructor and Pydantic extraction pipelines because it logs the raw structured response next to the original prompt.
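Here is a small sketch of what that breakdown amounts to once each request carries metadata. The record layout, the `spend_by` helper, and the dollar figures are assumptions for illustration, not Portkey's log schema or API.

```python
from collections import defaultdict


def spend_by(records: list, dimension: str) -> dict:
    """Sum request cost along one metadata dimension (team, feature, ...)."""
    totals = defaultdict(float)
    for record in records:
        # Requests missing the tag are grouped so they stay visible in reports.
        key = record["metadata"].get(dimension, "unattributed")
        totals[key] += record["cost_usd"]
    return dict(totals)


records = [
    {"metadata": {"team": "support", "feature": "ticket-summary"}, "cost_usd": 0.5},
    {"metadata": {"team": "support", "feature": "chat"}, "cost_usd": 0.25},
    {"metadata": {"team": "sales", "feature": "chat"}, "cost_usd": 0.25},
    {"metadata": {"feature": "chat"}, "cost_usd": 0.125},  # no team tag
]
print(spend_by(records, "team"))
print(spend_by(records, "feature"))
```

The "unattributed" bucket is worth keeping: it tells you which services still send untagged traffic.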

Portkey's guardrails system deserves a callout: it ships dozens of pre-built filters (PII, regex denylists, toxicity, prompt-injection patterns, JSON schema validation) and lets you compose them into pre-request and post-request chains. This overlaps significantly with the patterns we cover in our LLM guardrails defense-in-depth guide, but Portkey gives you a hosted, auditable version with no code.
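To show what composing filters into pre-request and post-request chains means, here is a hand-rolled sketch. It is not Portkey's API: `redact_emails`, `require_json_object`, `run_chain`, `guarded_call`, and the fake model are hypothetical, and the email regex is deliberately simple.

```python
import json
import re
from typing import Callable

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_emails(text: str) -> str:
    """Pre-request filter: mask anything that looks like an email address."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def require_json_object(text: str) -> str:
    """Post-request check: the response must parse as a JSON object."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError("response is not valid JSON") from exc
    if not isinstance(parsed, dict):
        raise ValueError("response is not a JSON object")
    return text


def run_chain(text: str, checks: list) -> str:
    """Apply each check in order; a check may rewrite the text or raise."""
    for check in checks:
        text = check(text)
    return text


def guarded_call(prompt: str, model: Callable[[str], str], pre: list, post: list) -> str:
    """Run pre-request checks, call the model, then run post-request checks."""
    return run_chain(model(run_chain(prompt, pre)), post)


def fake_model(prompt: str) -> str:
    # Stand-in for an LLM call: echoes the prompt it received as JSON.
    return json.dumps({"received": prompt})


print(guarded_call("Contact bob@example.com today", fake_model, [redact_emails], [require_json_object]))
```

The model never sees the raw address, and a malformed response is rejected before it reaches the caller.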

Do you actually need an LLM gateway?

Not always. If you have exactly one provider, one application, and one engineer, a gateway is overhead. The decision flips the moment any of the following is true: you use more than one model provider, you have more than three services calling LLMs, you need per-team cost reporting, you have a compliance team asking about PII or audit logs, or you're doing prompt experimentation that touches production traffic.
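The checklist above is simple enough to write down as code, which can be handy in an architecture review. The `Stack` fields and `gateway_signals` are names I made up; the thresholds are the ones from the paragraph.

```python
from dataclasses import dataclass


@dataclass
class Stack:
    providers: int
    services_calling_llms: int
    needs_per_team_cost_reporting: bool = False
    has_compliance_requirements: bool = False
    experiments_on_production_traffic: bool = False


def gateway_signals(stack: Stack) -> list:
    """Return the reasons, if any, that this stack would benefit from a gateway."""
    signals = []
    if stack.providers > 1:
        signals.append("more than one model provider")
    if stack.services_calling_llms > 3:
        signals.append("more than three services calling LLMs")
    if stack.needs_per_team_cost_reporting:
        signals.append("per-team cost reporting")
    if stack.has_compliance_requirements:
        signals.append("PII or audit-log requirements")
    if stack.experiments_on_production_traffic:
        signals.append("prompt experiments on production traffic")
    return signals


print(gateway_signals(Stack(providers=1, services_calling_llms=1)))
print(gateway_signals(Stack(providers=2, services_calling_llms=5)))
```

An empty list means the gateway is probably overhead for now; any entry at all flips the decision.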

The other strong signal is operational pain. If you've written your own retry loop, your own fallback to a backup model, your own token counter, or your own usage dashboard, congratulations: you've built half a gateway badly. Adopt one of these three and delete the code.

One pattern I've seen succeed: start with OpenRouter for the first six months to validate that your application needs the breadth, then graduate to LiteLLM or Portkey once you've settled on three or four models and want the cost or governance wins. Migrating is straightforward because all three speak the OpenAI schema.

Cost comparison and hidden fees

The headline cost story is misleading because each gateway hides cost differently. LiteLLM is "free" until you account for the EC2 / GKE instances, the Postgres database, the Redis cluster, and the on-call engineer who keeps it running. Figure $300 to $1,500/month in infra plus engineering time. OpenRouter is "5.5%" until you realize that 5.5% on $100K/month in token spend is $5,500/month. Portkey is "$0 free tier" until you cross the request limit and jump to a per-request plan that typically lands between $0.0001 and $0.001 per request depending on volume and features.

For a back-of-envelope: if you spend less than $5K/month on LLM tokens, OpenRouter is almost always the cheapest path because you avoid all infrastructure overhead. Between $5K and $50K/month, LiteLLM self-hosted usually wins on raw dollars. Above $50K/month, the calculus shifts toward whoever can save you the most via caching and routing, and that's often Portkey because of its semantic cache and prompt-management features, or LiteLLM if you have the team to tune it.
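The arithmetic behind these thresholds is easy to check yourself. The sketch below uses only the figures quoted in this section (the ~5.5% markup, the $300 to $1,500 infrastructure range, and a per-request rate you supply). It ignores engineering time and any savings from caching, and the function names are my own.

```python
OPENROUTER_MARKUP = 0.055  # ~5.5% markup quoted above


def openrouter_overhead(token_spend_usd: float) -> float:
    """Monthly markup paid on top of provider list price."""
    return token_spend_usd * OPENROUTER_MARKUP


def portkey_overhead(requests_per_month: int, per_request_usd: float) -> float:
    """Monthly per-request fee; the rate is an input, not a published price."""
    return requests_per_month * per_request_usd


def breakeven_token_spend(infra_usd: float) -> float:
    """Token spend at which the OpenRouter markup equals a fixed infra bill."""
    return infra_usd / OPENROUTER_MARKUP


for infra in (300, 1_500):
    print(f"${infra}/month infra breaks even at ${breakeven_token_spend(infra):,.0f}/month in tokens")
print(f"Markup on $100,000/month: ${openrouter_overhead(100_000):,.0f}")
```

On raw infrastructure dollars alone, self-hosting breaks even somewhere between roughly $5,500 and $27,000 a month in token spend, which is why the $5K floor above is a reasonable rule of thumb once engineering time is left out.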

Self-hosting versus hosted: the operational tradeoff

Self-hosting (LiteLLM, or Portkey's open-source gateway) gives you data sovereignty and unlimited scale at the cost of operations work. You'll be responsible for upgrades, security patches, database backups, Redis failover, and the inevitable middle-of-the-night incident when the proxy OOMs under a traffic spike. In return, no third party ever sees your prompts or responses, which matters for regulated industries.

Hosted (OpenRouter, Portkey SaaS) trades sovereignty for speed-to-value. You can be live in 15 minutes with no infrastructure. The vendor handles uptime, scaling, and most of the security perimeter. The downsides are vendor lock-in on the dashboard (your historical logs live there), data-processing concerns, and the fact that when the vendor has an outage, so do you. Both OpenRouter and Portkey publish status pages; check their 90-day uptime before betting your product on them.

How to pick the right gateway for your stack

Here's the decision framework I use when consulting on this choice:

Pick LiteLLM if you have a platform team, you care about data residency, you want to ship logs to your existing observability stack, or you're spending more than $10K/month on LLMs and want to claw back every dollar.
Pick OpenRouter if you're a small team that values model breadth above all else, you're doing rapid model experimentation, you don't want to run any infrastructure, and your spend is under $20K/month.
Pick Portkey if you're an enterprise that needs prompt versioning, evaluation tooling, guardrails, audit logs, and a single vendor relationship with an SLA, and you're willing to pay per-request for that.

None of these are mutually exclusive. I've seen teams run LiteLLM as the internal proxy for production traffic and use OpenRouter from notebooks for evaluation work. Portkey can also sit in front of LiteLLM if you want hosted observability over a self-hosted data plane. The OpenAI-compatible schema makes these compositions surprisingly clean.

Whichever you choose, the win isn't the gateway itself; it's the operational discipline that comes from having one chokepoint for cost, reliability, and governance. The team that adopts any of these three is materially ahead of the team rolling their own retry loops in 14 services.

Frequently Asked Questions

Is LiteLLM better than OpenRouter?

Neither is universally better. LiteLLM wins on cost, control, and data sovereignty for teams willing to self-host. OpenRouter wins on speed-to-value, model breadth, and zero operations for small teams or experimentation workloads. Pick LiteLLM if you have a platform team; pick OpenRouter if you don't.

Can I self-host OpenRouter?

No. OpenRouter is a closed, hosted SaaS product with no self-hosted version. If self-hosting is a hard requirement, choose LiteLLM (fully open source under MIT) or Portkey's gateway (Apache 2.0, hosted control plane optional).

Does Portkey support function calling and structured outputs?

Yes. Portkey passes through tool calls, JSON mode, and structured outputs to any provider that supports them, including Claude Opus 4.7, GPT-5.1, and Gemini 2.5 Pro. The gateway also logs the structured response and can validate it against a JSON schema as a post-request guardrail.

How much does LiteLLM cost to run in production?

The software is free under MIT. Infrastructure typically costs $300 to $1,500 per month for a small production deployment: two FastAPI replicas, a Postgres instance for usage logs, and a Redis cluster for caching. Add engineering time for upgrades and on-call. The enterprise tier with SSO, prompt management, and support is quoted per-seat.

What is the difference between an AI gateway and an API gateway?

An API gateway (Kong, Envoy, AWS API Gateway) handles generic HTTP concerns like auth, rate limiting, and routing for any service. An AI gateway is purpose-built for LLM traffic and adds model-aware features: token counting, semantic caching, provider fallback on rate-limit errors, prompt-injection scanning, and per-model cost tracking. Many production stacks run both, with the AI gateway behind the general API gateway.
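As a small example of what "model-aware" means, here is a sketch of per-model cost tracking from the token counts an OpenAI-style response reports in its `usage` field. The price table and model names are invented for illustration; a real gateway maintains current prices per provider.

```python
from collections import defaultdict

# Hypothetical prices in USD per million tokens; not real list prices.
PRICES_PER_MTOK = {
    "model-a": {"input": 2.0, "output": 10.0},
    "model-b": {"input": 0.5, "output": 1.5},
}


def request_cost(model: str, usage: dict) -> float:
    """Cost of one request from its prompt and completion token counts."""
    price = PRICES_PER_MTOK[model]
    total = usage["prompt_tokens"] * price["input"] + usage["completion_tokens"] * price["output"]
    return total / 1_000_000


def cost_per_model(requests: list) -> dict:
    """Accumulate spend per model across (model, usage) pairs."""
    totals = defaultdict(float)
    for model, usage in requests:
        totals[model] += request_cost(model, usage)
    return dict(totals)


requests = [
    ("model-a", {"prompt_tokens": 1000, "completion_tokens": 500}),
    ("model-b", {"prompt_tokens": 2000, "completion_tokens": 1000}),
    ("model-a", {"prompt_tokens": 1000, "completion_tokens": 500}),
]
print(cost_per_model(requests))
```

A generic API gateway sees only bytes and status codes, so it cannot produce this breakdown without understanding the response body.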
