LLM apps aren't CRUD apps with an API call bolted on. They have unique constraints:
- Non-deterministic outputs (the same input can produce different outputs)
- Token limits (you can't shove everything in)
- Cost per call (every request costs money)
- Latency (1-5 seconds per call is normal)
- Failure modes (rate limits, hallucinations, timeouts)
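
As a rough illustration of how several of these constraints surface in code, here is a minimal Python sketch of a wrapper around a model call. Everything named in it is hypothetical: `guarded_call`, `RateLimitError`, the `call_model` callable, the 4-characters-per-token estimate, and the placeholder price are assumptions for illustration, not any provider's API or pricing.

```python
import time
from dataclasses import dataclass
from typing import Callable


class RateLimitError(Exception):
    """Hypothetical error raised when the provider throttles a request."""


@dataclass
class CallResult:
    text: str
    attempts: int
    est_cost_usd: float


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def guarded_call(
    call_model: Callable[[str], str],
    prompt: str,
    max_prompt_tokens: int = 4000,
    usd_per_1k_tokens: float = 0.002,  # placeholder price, not a real rate
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> CallResult:
    # Token limits: refuse prompts that exceed the budget before paying for them.
    prompt_tokens = estimate_tokens(prompt)
    if prompt_tokens > max_prompt_tokens:
        raise ValueError(f"prompt is ~{prompt_tokens} tokens, over the budget")

    for attempt in range(1, max_attempts + 1):
        try:
            text = call_model(prompt)
        except (RateLimitError, TimeoutError):
            # Failure modes: retry with exponential backoff, then give up.
            if attempt == max_attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))
            continue
        # Cost per call: estimate spend from prompt plus completion tokens.
        total_tokens = prompt_tokens + estimate_tokens(text)
        return CallResult(text, attempt, total_tokens / 1000 * usd_per_1k_tokens)

    raise RuntimeError("max_attempts must be at least 1")


if __name__ == "__main__":
    # Stand-in for a real client call; swap in an actual SDK call here.
    result = guarded_call(lambda p: f"echo: {p}", "Summarize this ticket.")
    print(result)
```

Passing `call_model` and `sleep` in as arguments keeps the sketch testable without a network. A real client raises its own rate-limit and timeout exceptions, which you would catch in place of the hypothetical ones here.
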

These constraints demand specific architecture patterns. Here are the three you'll use most.