Inngest vs Trigger.dev vs Hatchet (2026)

Updated: June 3, 2026

For AI workflows in 2026, Inngest, Trigger.dev, and Hatchet are the three production-grade durable job queues teams pick when Celery and BullMQ can't handle the long-running, partially-failing nature of LLM pipelines. Inngest wins on developer experience and serverless fit; Trigger.dev wins on long-running tasks with rich UI and self-hosting; Hatchet wins on raw throughput and Postgres-only operational simplicity. This guide compares them on durable execution, retries, concurrency, observability, pricing, and the specific quirks that matter for agent and RAG pipelines.

Inngest, Trigger.dev, and Hatchet all implement durable execution: each step is checkpointed so the workflow survives crashes, timeouts, and deploys mid-run.
Trigger.dev v4 allows individual tasks up to 24 hours via maxDuration, making it the most forgiving for slow LLM tool chains and large batch jobs.
Hatchet runs entirely on Postgres with no Redis or Kafka, and v1 sustains roughly 4x the throughput of v0 for fan-out workloads.
Inngest's step.ai.infer() wraps LLM calls so retries don't re-bill you for completed model spans, and integrates natively with Vercel, Cloudflare Workers, and AWS Lambda.
All three support concurrency keys, rate limiting, dead-letter queues, scheduled crons, and TypeScript SDKs. Hatchet and Trigger.dev have first-class Python SDKs, while Inngest's Python SDK is still in beta.
For Zapier and n8n teams adding code-level reliability, Trigger.dev's task-as-a-file model is the gentlest on-ramp; for high-volume agent fleets, Hatchet's worker model scales further per dollar.

Why LLM workflows break traditional job queues

The failure mode is almost always the same. I've spent the last eighteen months migrating clients off Celery and BullMQ, and it goes like this: a four-step agent pipeline calls a tool, waits forty seconds for a model response, hits a transient 529 (overloaded) error from Anthropic, and the worker dies mid-step. Without durable execution, the entire job restarts. You pay again for the embedding tokens you already bought, re-run the vector search you already ran, and throw away the partial state your agent built up. At current Claude Sonnet 4.6 pricing, re-sending a 200k-token context three times for every job caught in a retry storm quickly costs more than the engineer-hour spent debugging it.

The new generation of queues solves this with step-based durable execution. Each await step.run(...) persists its result before the next step starts. If the worker crashes between steps, the queue picks up exactly where it left off and replays the deterministic parts without re-executing the expensive ones. That model maps cleanly onto LLM tool loops, RAG indexing pipelines, and async agent fleets, which is why Inngest, Trigger.dev, and Hatchet have all converged on it independently.
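Here is a toy, in-memory Python sketch of that replay model. It is not any vendor's implementation: DurableRun, embed, and summarize are hypothetical stand-ins, and a real queue persists checkpoints in its own datastore rather than in a dict. The first attempt crashes after the embedding step. The retry replays the function, gets the embedding from the checkpoint, and only pays for the summary.

```python
from typing import Any, Callable, Dict, List


class WorkerCrash(Exception):
    """Simulates a worker dying between two steps."""


class DurableRun:
    """Toy step runner: each step's result is checkpointed under its id."""

    def __init__(self, checkpoints: Dict[str, Any]) -> None:
        # A real queue persists this store; here it is just a dict.
        self.checkpoints = checkpoints

    def run(self, step_id: str, fn: Callable[[], Any]) -> Any:
        if step_id in self.checkpoints:
            return self.checkpoints[step_id]  # completed earlier: skip the call
        result = fn()
        self.checkpoints[step_id] = result  # persist before the next step starts
        return result


calls = {"embed": 0, "summarize": 0}  # counts how often each "paid" call runs


def embed() -> List[float]:
    calls["embed"] += 1  # stands in for a paid embedding request
    return [0.1, 0.2, 0.3]


def summarize(vectors: List[float]) -> str:
    calls["summarize"] += 1  # stands in for a paid LLM request
    return f"summary of {len(vectors)} dims"


def pipeline(step: DurableRun, crash_after_embed: bool) -> str:
    vectors = step.run("embed", embed)
    if crash_after_embed:
        raise WorkerCrash("worker died after the embed step")
    return step.run("summarize", lambda: summarize(vectors))


store: Dict[str, Any] = {}
try:
    pipeline(DurableRun(store), crash_after_embed=True)  # first attempt crashes
except WorkerCrash:
    pass
result = pipeline(DurableRun(store), crash_after_embed=False)  # retry replays
print(result, calls)  # summary of 3 dims {'embed': 1, 'summarize': 1}
```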

If you want the deeper background on durable execution itself, the LangGraph and Temporal orchestration guide covers the underlying state-machine theory. This article focuses on the three queues that have eaten that mindshare for application-layer AI work.

Inngest vs Trigger.dev vs Hatchet: feature comparison

Before we get into the deep dives, here's the side-by-side I share with clients during architecture reviews. All data reflects each product's state as of May 2026.

Feature	Inngest	Trigger.dev v4	Hatchet v1
Primary language SDK	TypeScript (Python beta)	TypeScript, Python (GA)	Python, TypeScript, Go
Durable execution model	Step memoization	Checkpointed tasks	DAG of step functions
Max single task duration	2 hours per step	24 hours via maxDuration	Unlimited (worker-owned)
Storage backend	Managed (proprietary)	Postgres + Redis + ClickHouse	Postgres only
LLM-aware primitive	step.ai.infer()	ai.* SDK helpers	None (DIY in task)
Concurrency keys	Yes, per-key with priority	Yes, named queues	Yes, sticky workers
Self-hosting	OSS dev server only	Full OSS, Docker Compose	Full OSS, Helm chart
Free tier (cloud)	50k runs / month	$10 / month credit	Generous OSS, paid cloud TBD
Typical fit	Vercel and Lambda serverless	Long agent tasks, video AI	High-throughput Python agents

Inngest: serverless-first durable functions

Inngest's pitch is that you write a normal TypeScript function, sprinkle step.run() around each side effect, and the platform handles retries, timeouts, fan-out, and observability. Each step result is memoized. The next time the function is invoked after a crash, Inngest replays the function code but short-circuits any step that already returned a value. The mental model is close to React's reconciliation: deterministic re-render, idempotent commits.

For LLM workflows, the killer feature is step.ai.infer(). It treats the model call as a durable step and also reports token usage, model name, and latency to the Inngest dashboard, so you can correlate a slow run with the offending Anthropic prompt without bolting on a separate observability tool. The primitive is documented in full in the official Inngest documentation.

import { Inngest } from "inngest";
// semanticChunk, embedBatch, and qdrant are app-specific helpers (not shown).

const inngest = new Inngest({ id: "rag-pipeline" });

export const indexDocument = inngest.createFunction(
  { id: "index-document", concurrency: { key: "event.data.tenantId", limit: 5 } },
  { event: "doc/uploaded" },
  async ({ event, step }) => {
    // Step 1: deterministic and cheap, safe to re-run on retry
    const chunks = await step.run("chunk", () =>
      semanticChunk(event.data.text, { maxTokens: 800 })
    );

    // Step 2: paid embedding call, checkpointed so retries skip it
    const vectors = await step.run("embed", () =>
      embedBatch(chunks, "voyage-3-large")
    );

    // Step 3: LLM call with token accounting in the dashboard.
    // step.ai.infer takes an Inngest model adapter, not a Vercel AI SDK model.
    const summary = await step.ai.infer("summarize", {
      model: step.ai.models.anthropic({
        model: "claude-sonnet-4-6",
        defaultParameters: { max_tokens: 1024 }, // Anthropic requires max_tokens
      }),
      body: { messages: [{ role: "user", content: `Summarize: ${event.data.text}` }] },
    });

    await step.run("upsert", () => qdrant.upsert(vectors, summary));
    return { chunks: chunks.length };
  }
);

What I like in production: the per-key concurrency limit above means tenant A can't drown out tenant B, even under burst load, which matters when one customer uploads a 200-PDF batch on a Tuesday morning. What I dislike: the Python SDK is still labeled beta, and the proprietary backend means you can't currently move historical run data out of Inngest cloud. If you're already on Vercel or Cloudflare Workers, none of that matters; the deploy story is unbeatable, and event-driven fan-out across a serverless fleet is genuinely a two-line change.
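The per-key limit amounts to one semaphore per key. Here is a rough Python sketch, assuming an asyncio worker and hypothetical names (KeyedConcurrency, index_doc). Inngest enforces its limit server-side, so this only illustrates the behavior, not how Inngest implements it.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, DefaultDict, Dict, List, TypeVar

T = TypeVar("T")


class KeyedConcurrency:
    """Caps in-flight jobs per key (e.g. tenant id), one semaphore per key."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.sems: Dict[str, asyncio.Semaphore] = {}
        self.in_flight: DefaultDict[str, int] = defaultdict(int)
        self.peak: DefaultDict[str, int] = defaultdict(int)

    async def run(self, key: str, job: Callable[[], Awaitable[T]]) -> T:
        if key not in self.sems:
            self.sems[key] = asyncio.Semaphore(self.limit)
        async with self.sems[key]:  # waits only for this key's own slots
            self.in_flight[key] += 1
            self.peak[key] = max(self.peak[key], self.in_flight[key])
            try:
                return await job()
            finally:
                self.in_flight[key] -= 1


completed: List[str] = []  # tenant of each job, in completion order


async def index_doc(tenant: str) -> None:
    await asyncio.sleep(0.02)  # stands in for chunk/embed/upsert work
    completed.append(tenant)


async def main() -> KeyedConcurrency:
    limiter = KeyedConcurrency(limit=5)
    burst = [limiter.run("tenant-a", lambda: index_doc("tenant-a")) for _ in range(200)]
    small = [limiter.run("tenant-b", lambda: index_doc("tenant-b")) for _ in range(3)]
    await asyncio.gather(*burst, *small)
    return limiter


limiter = asyncio.run(main())
print(dict(limiter.peak))  # {'tenant-a': 5, 'tenant-b': 3}
```

Tenant A never has more than five jobs in flight. Tenant B's three jobs never queue behind A's 200 because they wait only on their own semaphore, so they finish in the first wave.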

Trigger.dev: long-running tasks with a UI

Trigger.dev v4 is the queue I reach for when a single workflow legitimately needs to run for hours: a multi-hop research agent, a video generation pipeline, an embedding job over a million-row Postgres table. The default task duration is one hour, and raising maxDuration takes you to twenty-four. That sounds like a small detail until you've watched a Lambda time out at fifteen minutes during a Claude computer-use session and had to explain to a stakeholder why the work has to start over.

The model is "task as a file." You write a tasks/index-customer.ts module that exports a task, the CLI introspects it, and the dashboard shows a typed timeline of every run, retry, and payload. The platform checkpoints a run when it waits (for example on wait.for() or triggerAndWait()) and resumes it later, possibly on a fresh container. The official Trigger.dev v4 documentation describes the new lazy-attempt model that finally fixed the cold-start surprises v3 suffered from.

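import { task, logger, queue } from "@trigger.dev/sdk";
import Anthropic from "@anthropic-ai/sdk";

// webSearchTools, runTools, and appendToolResults are app-specific helpers (not shown).
const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Declare the queue once at module level so tasks can share it.
const agentsQueue = queue({ name: "agents", concurrencyLimit: 20 });

export const researchAgent = task({
  id: "research-agent",
  maxDuration: 60 * 60 * 4, // 4 hours, expressed in seconds
  queue: agentsQueue,
  retry: { maxAttempts: 5, minTimeoutInMs: 2000, factor: 2 },
  run: async (payload: { question: string }) => {
    // Typed as MessageParam[] so assistant and tool-result turns can be appended.
    let messages: Anthropic.MessageParam[] = [{ role: "user", content: payload.question }];
    for (let turn = 0; turn < 12; turn++) {
      const reply = await anthropic.messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 4096,
        tools: webSearchTools,
        messages,
      });
      logger.info("turn complete", { turn, stop: reply.stop_reason });
      if (reply.stop_reason === "end_turn") return reply;
      messages = appendToolResults(messages, reply, await runTools(reply));
    }
    // Turn budget exhausted: return explicitly instead of falling off the loop.
    logger.warn("turn budget exhausted", { turns: 12 });
    return null;
  },
});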
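The retry block above is exponential backoff. Here is a small Python sketch of the waits it implies, assuming maxAttempts counts the first attempt and ignoring any jitter or maximum delay the platform may apply. backoff_schedule is a hypothetical helper, not a Trigger.dev API.

```python
from typing import List, Optional


def backoff_schedule(
    max_attempts: int,
    min_timeout_ms: int,
    factor: float,
    max_timeout_ms: Optional[int] = None,
) -> List[int]:
    """Delays (ms) before each retry under plain exponential backoff.

    Assumes max_attempts includes the first attempt and that no jitter is applied.
    """
    delays = []
    for retry_index in range(max_attempts - 1):  # attempts after the first
        delay = min_timeout_ms * factor ** retry_index
        if max_timeout_ms is not None:
            delay = min(delay, max_timeout_ms)  # optional ceiling on a single wait
        delays.append(int(delay))
    return delays


schedule = backoff_schedule(max_attempts=5, min_timeout_ms=2000, factor=2)
print(schedule)  # [2000, 4000, 8000, 16000]
print(sum(schedule) / 1000)  # 30.0 seconds of backoff before the last attempt
```

Under those assumptions, four retries add roughly 30 seconds of waiting on top of the time spent in the attempts themselves.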

The standout: full self-hosting via Docker Compose. You bring Postgres, Redis, and ClickHouse; Trigger.dev gives you the orchestration UI and the worker manager. For regulated industries (health, finance, government), that's frequently the only acceptable architecture. The cost: you operate three databases instead of one, which is why I steer high-throughput-but-low-regulation teams toward Hatchet instead.

Hatchet: Postgres-only throughput beast

Hatchet is the one I hadn't heard of two years ago and now reach for first when a client says "we're doing four hundred agent runs a minute and Celery is on fire." The architectural choice that makes it interesting is that it runs entirely on Postgres, with no Kafka, no Redis, and no proprietary state store. Their v1 release rewrote the scheduler around Postgres logical replication and advisory locks, and the team published benchmarks and architecture notes on the Hatchet GitHub repository showing roughly four times the fan-out throughput of v0.

The programming model is a DAG: you declare step functions, declare their parents, and the engine schedules workers to drain each level. For a RAG ingestion fleet where the same five-step pipeline runs across thousands of documents per hour, that's structurally simpler than Inngest's per-function memoization, because parallelism happens at the queue layer rather than inside one function.

from hatchet_sdk import Context, Hatchet

# semantic_chunk, embed_batch, and qdrant are app-specific helpers (not shown).
hatchet = Hatchet()

@hatchet.workflow(on_events=["doc:uploaded"], concurrency_limit=50)
class IngestDoc:
    @hatchet.step(timeout="2m", retries=3)
    def chunk(self, ctx: Context):
        text = ctx.workflow_input()["text"]
        return {"chunks": semantic_chunk(text, max_tokens=800)}

    @hatchet.step(parents=["chunk"], timeout="10m", retries=5)
    def embed(self, ctx: Context):
        chunks = ctx.step_output("chunk")["chunks"]
        # Hatchet retries this step only; chunk() result is durable
        return {"vectors": embed_batch(chunks, model="voyage-3-large")}

    @hatchet.step(parents=["embed"])
    def upsert(self, ctx: Context):
        qdrant.upsert(ctx.step_output("embed")["vectors"])
        return {"ok": True}

worker = hatchet.worker("ingest-worker", max_runs=200)
worker.register_workflow(IngestDoc())
worker.start()
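To see why the DAG model parallelizes at the queue layer, here's a sketch that uses Python's standard-library graphlib to group a three-document fan-out into waves of runnable steps. This is not Hatchet's scheduler. It assumes each wave finishes before the next is released, whereas a real engine can start a document's embed step as soon as that document's chunk step completes.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Step -> parent steps, mirroring chunk -> embed -> upsert in the workflow above.
INGEST_DAG: Dict[str, Set[str]] = {"chunk": set(), "embed": {"chunk"}, "upsert": {"embed"}}


def waves_for(docs: List[str]) -> List[List[str]]:
    """Group every (document, step) node into waves whose parents are all done."""
    graph: Dict[str, Set[str]] = {}
    for doc in docs:
        for step, parents in INGEST_DAG.items():
            graph[f"{doc}:{step}"] = {f"{doc}:{p}" for p in parents}
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all nodes runnable right now
        waves.append(ready)
        sorter.done(*ready)  # pretend the workers finished this whole wave
    return waves


for wave in waves_for(["doc1", "doc2", "doc3"]):
    print(wave)
# ['doc1:chunk', 'doc2:chunk', 'doc3:chunk']
# ['doc1:embed', 'doc2:embed', 'doc3:embed']
# ['doc1:upsert', 'doc2:upsert', 'doc3:upsert']
```

Every document's chunk step is runnable at once, which is exactly the parallelism a worker pool drains.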

What you lose: Hatchet has no LLM-specific primitives. You bring your own Anthropic or OpenAI client, your own token accounting, your own retry-after-rate-limit logic. What you gain: pure operational simplicity. The same Postgres instance that already holds your application data can run Hatchet's queue, and one fewer moving part during an outage is a real win. Teams I work with that already maintain Postgres at scale almost always converge here.
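Token accounting is the piece you'll end up writing yourself. Here is a minimal sketch, assuming you feed it from the usage block on each model response (for example usage.input_tokens and usage.output_tokens on Anthropic's Messages API) and supply your own price table. TokenLedger, "example-model", and the prices in the demo are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TokenLedger:
    """Accumulates token usage per model and prices it from a caller-supplied table."""

    # model name -> (USD per 1M input tokens, USD per 1M output tokens); no real prices baked in
    prices: Dict[str, Tuple[float, float]]
    usage: Dict[str, List[int]] = field(default_factory=dict)

    def record(self, model: str, input_tokens: int, output_tokens: int) -> None:
        totals = self.usage.setdefault(model, [0, 0])
        totals[0] += input_tokens
        totals[1] += output_tokens

    def cost_usd(self) -> float:
        total = 0.0
        for model, (input_tokens, output_tokens) in self.usage.items():
            input_price, output_price = self.prices[model]
            total += input_tokens / 1_000_000 * input_price
            total += output_tokens / 1_000_000 * output_price
        return total


# Illustrative numbers only: "example-model" and its prices are made up.
ledger = TokenLedger(prices={"example-model": (2.0, 10.0)})
for _attempt in range(3):  # e.g. the same request retried three times
    ledger.record("example-model", input_tokens=500_000, output_tokens=10_000)
print(ledger.usage)  # {'example-model': [1500000, 30000]}
print(round(ledger.cost_usd(), 2))  # 3.3
```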

What is the best job queue for LLM workflows?

Honestly, there isn't one universal winner. The right pick depends on where your team already lives. After running this comparison across roughly thirty engagements in 2025 and 2026, these are the heuristics I trust:

You ship on Vercel or Cloudflare Workers and write TypeScript. Inngest. The integration is two lines, step.ai.infer() handles LLM observability without a Helicone or LangSmith bill, and the concurrency-key model maps cleanly onto multi-tenant SaaS.
You need genuinely long-running tasks (more than 15 minutes) or you must self-host. Trigger.dev. The 24-hour task ceiling, combined with a full Docker Compose deployment, makes it the natural pick in regulated environments where Inngest's hosted-only model is a non-starter.
You're Python-heavy, run thousands of jobs per minute, and want Postgres-only ops. Hatchet. You'll write a bit more retry plumbing for LLM rate limits, but the throughput-per-dollar is unmatched and the operational surface area is the smallest.
You're orchestrating long-lived stateful agents with branching reasoning loops. Pair any of these with LangGraph or a Temporal-style state machine. The queue handles delivery, the state machine handles cognition. This is the architecture I detail in the durable agent pipelines piece linked earlier.

One thing none of these queues solve out-of-the-box is cost telemetry across providers. Model-level routing decisions still belong in an LLM gateway. Pair the queue with a gateway and tracing stack and you get the full picture; the LLM observability guide covers the OpenTelemetry conventions that all three queues can emit into.

Self-hosting, pricing, and lock-in

Pricing in this category in 2026 is annoyingly inconsistent. Each vendor charges along a different axis, which makes apples-to-apples math hard until you've modeled your own workload.

Inngest

Free tier covers 50,000 runs per month with one concurrency slot. The Pro tier starts at $50 per month, but the real cost driver is step count. A workflow with twelve step.run calls bills as twelve units. For RAG pipelines with many small steps, model this carefully before signing up. The self-host story is "local dev server," not production-ready.

Trigger.dev

Hybrid model: cloud starts with a $10 monthly credit, then $0.0025 per compute-minute. A two-hour Claude computer-use session burns 120 compute-minutes at $0.0025 each, or $0.30, usually a rounding error next to the model bill. Self-hosting is genuinely free and supported; you pay only for the Postgres, Redis, and ClickHouse infrastructure you provide.
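Because each vendor bills on a different axis, I model both before a pricing call. This Python sketch uses the per-step and per-compute-minute figures quoted above. step_units and compute_cost_usd are hypothetical helpers, and the sketch ignores plan tiers, concurrency limits, and machine sizes.

```python
def step_units(runs_per_month: int, steps_per_run: int) -> int:
    """Per-step billing as described above: every step.run call is one unit."""
    return runs_per_month * steps_per_run


def compute_cost_usd(
    runs_per_month: int,
    minutes_per_run: float,
    usd_per_minute: float = 0.0025,  # the per-compute-minute rate quoted above
    monthly_credit_usd: float = 10.0,  # the monthly credit quoted above
) -> float:
    """Per-compute-minute billing, net of the monthly credit (never below zero)."""
    gross = runs_per_month * minutes_per_run * usd_per_minute
    return max(0.0, gross - monthly_credit_usd)


# One two-hour computer-use session, before the credit: 120 minutes x $0.0025.
print(round(compute_cost_usd(1, 120, monthly_credit_usd=0.0), 2))  # 0.3
# The same RAG pipeline split into 12 steps vs. collapsed into 4, at 10,000 docs/month.
print(step_units(10_000, 12), step_units(10_000, 4))  # 120000 40000
```

Collapsing twelve steps into four cuts billable units threefold. Each checkpoint then covers more work, though, so a retry re-executes more of the pipeline. That trade-off is the one to model for RAG workloads.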

Hatchet

Open-source under the MIT license, with a managed cloud in early access. Most teams I've migrated have stayed on the OSS deployment behind a small Kubernetes cluster; one Postgres instance plus a Helm chart is the entire footprint. The cost is operational, not licensed.

Migrating from Celery, BullMQ, or n8n code nodes

If you're moving off Celery (Python) or BullMQ (Node), the migration is mostly mechanical: replace the task decorator, wrap each network-touching block in a step, and delete your hand-rolled retry middleware. The harder shift is mental. Durable functions are not stateless workers. The function body re-runs from the top on every retry, and only completed steps are answered from stored results. Anything non-deterministic (UUID generation, current timestamps, random sampling) must live inside a step.run so the result is memoized.
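Here is a toy Python illustration of why that rule matters. Replayer is a hypothetical stand-in for the runtime. The handler runs twice, as it would on a retry, and only the ID generated inside a step survives the replay.

```python
import uuid
from typing import Any, Callable, Dict, Tuple


class Replayer:
    """Toy durable runtime: the handler re-runs from the top, steps are memoized."""

    def __init__(self) -> None:
        self.memo: Dict[str, Any] = {}

    def step(self, step_id: str, fn: Callable[[], Any]) -> Any:
        if step_id not in self.memo:
            self.memo[step_id] = fn()  # first execution: run and store
        return self.memo[step_id]  # later replays: return the stored value


def handler(rt: Replayer) -> Tuple[str, str]:
    unsafe_id = str(uuid.uuid4())  # outside a step: regenerated on every replay
    safe_id = rt.step("make-id", lambda: str(uuid.uuid4()))  # inside a step: stable
    return unsafe_id, safe_id


rt = Replayer()
first = handler(rt)  # original attempt
second = handler(rt)  # replay after a retry re-runs the whole body
print(first[0] == second[0])  # False
print(first[1] == second[1])  # True
```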

From n8n, the pattern I push teams toward is hybrid: keep n8n for the SaaS connectors (Salesforce, HubSpot, Slack), trigger a Hatchet or Trigger.dev workflow from an HTTP node whenever the work involves an LLM tool loop, and write the result back via webhook. This avoids the n8n agent footguns I documented in the n8n AI agent infinite loop fix, and keeps non-engineers in control of the connectors they understand.

Frequently Asked Questions

Inngest vs Trigger.dev: which is faster?

Cold-start latency favors Inngest (sub-100ms on Vercel edge) because its workers are stateless functions. Throughput at sustained load favors Trigger.dev v4 thanks to its lazy-attempt scheduler. For LLM workflows where each step waits seconds on a model, the difference is essentially invisible. Pick on developer experience, not raw queue speed.

Does Hatchet support Python natively?

Yes. Hatchet's Python SDK is generally available and is the most-used SDK in production today, ahead of the TypeScript and Go SDKs. The workflow-class pattern with @hatchet.step decorators feels natural to anyone coming from Celery or Prefect.

Can you self-host Trigger.dev for free?

Yes. Trigger.dev v4 is fully open source under the Apache 2.0 license, with a documented Docker Compose deployment that requires Postgres, Redis, and ClickHouse. You pay only for the infrastructure you provide; there is no source-available throttle that pushes you to cloud.

How do you handle LLM retries with rate limits?

All three queues retry failed steps with configurable exponential backoff. For 429 errors specifically, set the retry factor to 2 or higher, read the retry-after header from the model provider's response, and throw a custom error that the queue will respect. Inngest provides RetryAfterError for exactly this case (and NonRetriableError for failures that should not be retried at all); the others rely on thrown exceptions carrying retry metadata.
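Here is a minimal Python sketch of that Retry-After handling. It assumes the header arrives either as delay-seconds or as an HTTP-date, the two forms HTTP allows, and falls back to exponential backoff otherwise. retry_delay_seconds is a hypothetical helper. You would pass its result to whatever retry mechanism your queue exposes, such as Inngest's RetryAfterError.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay_seconds(
    retry_after: Optional[str],
    attempt: int,
    base: float = 2.0,
    factor: float = 2.0,
    now: Optional[datetime] = None,
) -> float:
    """Prefer the provider's Retry-After header; otherwise fall back to exponential backoff."""
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():  # delay-seconds form, e.g. "30"
            return float(value)
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            when = None  # unparseable header: ignore it
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    return base * factor ** attempt  # attempt 0 -> base, 1 -> base * factor, ...


fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_delay_seconds("30", attempt=0))  # 30.0
print(retry_delay_seconds("Wed, 21 Oct 2015 07:28:00 GMT", attempt=0, now=fixed_now))  # 60.0
print(retry_delay_seconds(None, attempt=3))  # 16.0
```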

When should I use Temporal instead of these queues?

Reach for Temporal when you're orchestrating long-lived business processes that span multiple services, languages, and teams. The polyglot signal-and-query model is unmatched there. For application-layer AI work (single-team RAG pipelines, agent fleets, async LLM tasks) Inngest, Trigger.dev, and Hatchet ship faster and cost less to operate.