Architecture Patterns for LLM Apps

LLM apps aren't CRUD apps with an API call bolted on. They have unique constraints:

- Non-deterministic outputs (same input, different output)
- Token limits (you can't shove everything in)
- Cost per call (every request costs money)
- Latency (1-5 seconds per call is normal)
- Failure modes (rate limits, hallucinations, timeouts)
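
To make a few of these constraints concrete, here is a minimal sketch of a wrapper around a single model call. It is an illustration under stated assumptions, not a provider's API: `TransientLLMError`, `call_with_guardrails`, the 4-characters-per-token estimate, and the price are all hypothetical. It covers token limits (truncation), cost (a rough estimate), and transient failures (retry with exponential backoff); it does not address non-determinism or hallucinations.

```python
import time
from dataclasses import dataclass
from typing import Callable


class TransientLLMError(Exception):
    """Hypothetical stand-in for a provider's rate-limit or timeout error."""


@dataclass
class CallResult:
    text: str
    attempts: int
    estimated_cost: float


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    # A real app would use the provider's tokenizer instead.
    return max(1, len(text) // 4)


def fit_to_budget(prompt: str, max_tokens: int) -> str:
    # Token limits: if the prompt is too long, keep only its most recent part.
    max_chars = max_tokens * 4
    return prompt if len(prompt) <= max_chars else prompt[-max_chars:]


def call_with_guardrails(
    call_model: Callable[[str], str],
    prompt: str,
    max_prompt_tokens: int = 1000,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    price_per_1k_tokens: float = 0.001,  # made-up price, for illustration only
    sleep: Callable[[float], None] = time.sleep,
) -> CallResult:
    prompt = fit_to_budget(prompt, max_prompt_tokens)
    for attempt in range(1, max_attempts + 1):
        try:
            text = call_model(prompt)
        except TransientLLMError:
            # Failure modes: retry transient errors, give up on the last attempt.
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
            continue
        # Cost per call: estimate from the tokens of the successful attempt only.
        tokens = estimate_tokens(prompt) + estimate_tokens(text)
        return CallResult(text, attempt, tokens / 1000 * price_per_1k_tokens)
    # Only reached when the loop body never ran.
    raise ValueError("max_attempts must be at least 1")
```

`call_model` is whatever function actually talks to your provider. Passing it in (along with `sleep`) keeps the wrapper testable without network access or real delays.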

These constraints demand specific architecture patterns. Here are the three you'll use most.

