Zero-infrastructure LLM & AI

Build LLM and AI chains reliably in minutes, with no memory, state, or infrastructure to manage. Test locally, then deploy to any platform using normal code.

Automatic Memory & Context

Functions automatically maintain state, allowing you to reference the output of any API call in normal code without using databases or caching.
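
As a minimal sketch of this (the event name, endpoint, and payload here are assumptions, not part of the example above), one step's output can be used directly in the next:

import { inngest } from "./client";

// Each step.run result is persisted by Inngest, so `user` is plain data
// that later steps reference directly, with no database or cache required.
inngest.createFunction(
  { id: "enrich-user" },
  { event: "api/user.signup" },
  async ({ event, step }) => {
    const user = await step.run("fetch-user", async () => {
      const res = await fetch(`https://api.example.com/users/${event.data.userId}`); // assumed API
      return res.json();
    });

    // `user` is just the memoized return value of the previous step.
    const greeting = await step.run("draft-greeting", async () => `Welcome, ${user.name}!`);

    return { greeting };
  }
);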

Fully Serverless

Deploy to any provider, on any platform. Inngest ensures each step runs exactly once, spreading a function's steps across multiple invocations while maintaining state.
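
As a sketch, assuming the Next.js adapter (the module paths and function name are assumptions), exposing functions from a serverless route looks like this:

import { serve } from "inngest/next";
import { inngest } from "./client";
import { summarizeChat } from "./functions"; // assumed module exporting your Inngest functions

// Inngest calls back into this endpoint once per step, so each step runs
// as its own short invocation and the function survives serverless timeouts.
export const { GET, POST, PUT } = serve({
  client: inngest,
  functions: [summarizeChat],
});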

Reliable by Default

Inngest automatically retries steps within functions on error. Never worry about provider availability or API downtime again.
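
A sketch of a function that leans on this behavior (the endpoint and retry count are assumptions; `retries` caps attempts per step):

import { inngest } from "./client";

inngest.createFunction(
  { id: "resilient-call", retries: 5 },
  { event: "api/chat.submitted" },
  async ({ event, step }) => {
    return await step.run("call-provider", async () => {
      const res = await fetch("https://api.example.com/v1/complete", { // assumed provider endpoint
        method: "POST",
        body: JSON.stringify({ input: event.data.input }),
      });
      // Throwing marks the step as failed; Inngest retries only this step,
      // never earlier, already-successful steps.
      if (!res.ok) throw new Error(`Provider error: ${res.status}`);
      return res.json();
    });
  }
);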

Build reliable AI products in a few lines of code

Chained LLMs

1 Define an event to trigger your chain function

2 Use step.run for reliable API calls

3 Return state from each step

4 Use that state in subsequent steps of your chain

Automatic retries and persisted state across all steps in your chain.

import OpenAI from "openai";
import { inngest } from "./client";
import { db } from "./db"; // assumed local database client
import { createSummaryPrompt, createTitlePrompt } from "./prompts"; // assumed prompt helpers

inngest.createFunction(
  { id: "summarize-chat-and-documents" },
  { event: "api/chat.submitted" },
  async ({ event, step }) => {
    const llm = new OpenAI();

    // Each step's result is persisted, so a retry never re-runs earlier,
    // already-successful steps.
    const output = await step.run("summarize-input", async () => {
      const res = await llm.chat.completions.create({
        model: "gpt-3.5-turbo",
        messages: [
          { role: "user", content: createSummaryPrompt(event.data.input) },
        ],
      });
      return res.choices[0].message.content;
    });

    const title = await step.run("generate-a-title", async () => {
      const res = await llm.chat.completions.create({
        model: "gpt-3.5-turbo",
        messages: [{ role: "user", content: createTitlePrompt(output) }],
      });
      return res.choices[0].message.content;
    });

    await step.run("save-to-db", async () => {
      await db.summaries.create({
        output,
        title,
        requestID: event.data.requestID,
      });
    });

    return { output, title };
  }
);
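
To kick off the chain, send the triggering event from anywhere in your app; the payload below is a hypothetical example:

import { inngest } from "./client";

// Sending the event starts the function above in the background,
// decoupled from the request that produced it.
await inngest.send({
  name: "api/chat.submitted",
  data: {
    input: "full chat transcript here",
    requestID: "req_123",
  },
});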

Advanced features for production-ready systems

Cancellation

Cancel long-running functions automatically or via an API call, keeping your resources free.
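
For example, a sketch using a cancellation event (the event names and `match` field here are assumptions):

import { inngest } from "./client";

inngest.createFunction(
  {
    id: "process-documents",
    // Any in-flight run is cancelled when a matching event arrives; `match`
    // pairs the trigger and cancellation events on a shared field.
    cancelOn: [{ event: "api/chat.cancelled", match: "data.requestID" }],
  },
  { event: "api/chat.submitted" },
  async ({ step }) => {
    await step.sleep("wait-before-processing", "1h"); // long waits are safe to cancel
    return { done: true };
  }
);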

Concurrency

Set custom concurrency limits on functions or specific API calls, so work runs only when there's capacity.
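
A sketch with a per-user limit (the function, event, endpoint, and key are assumptions):

import { inngest } from "./client";

inngest.createFunction(
  {
    id: "generate-embeddings",
    // At most 10 runs execute at once per distinct userId; further runs
    // queue until capacity frees up.
    concurrency: { limit: 10, key: "event.data.userId" },
  },
  { event: "api/embeddings.requested" },
  async ({ event, step }) => {
    return await step.run("embed", async () => {
      const res = await fetch("https://api.example.com/embed", { // assumed GPU-bound endpoint
        method: "POST",
        body: JSON.stringify({ text: event.data.text }),
      });
      return res.json();
    });
  }
);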

Per-User Rate-Limiting

Set hard rate limits on functions using custom keys like user IDs, ensuring you spend model tokens and GPU time efficiently.
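
A sketch of a hard per-user cap (the function name and key are assumptions):

import { inngest } from "./client";

inngest.createFunction(
  {
    id: "chat-completion",
    // At most 10 runs per minute per distinct userId; events over the
    // limit within the period are skipped rather than queued.
    rateLimit: { limit: 10, period: "1m", key: "event.data.userId" },
  },
  { event: "api/chat.submitted" },
  async ({ event, step }) => {
    return await step.run("respond", async () => `echo: ${event.data.input}`);
  }
);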

Learn more

Dive into our resources and learn why Inngest is the best solution for building reliable LLM and AI products in production.