Category

AI.

AI engineering, LLM patterns, RAG systems, evals, and production agent architectures we actually ship.

Latest AI posts

AI
9 min

Claude 4.7 vs GPT-5 vs Gemini 3: which LLM wins for production SaaS in 2026

Three flagship models, three different strengths. Here's how we pick between Claude 4.7, GPT-5, and Gemini 3 when wiring an LLM into a production SaaS — and the tradeoffs nobody talks about.

AI
9 min

From prototype to production: shipping an AI feature that doesn't hallucinate

The prototype impressed the stakeholders. The production version invented a policy number and sent it to a customer. Here's the layered defence we ship now so that doesn't happen.

AI
9 min

The real cost of running LLMs at scale: token economics for SaaS founders

What LLMs actually cost at production scale in 2026 — per-model pricing, cache math, batch savings, input/output ratios, and the trap that sinks more SaaS margins than any other.

AI
8 min

Vector database showdown 2026: Pinecone vs Weaviate vs pgvector vs Qdrant

Four vector databases, four different sweet spots. Here is how our team picks between Pinecone, Weaviate, pgvector, and Qdrant — with the pricing, latency, and filtering tradeoffs that matter in production.

AI
10 min

AI agents in production: the architectural patterns that survived 2025

The 2024–2025 agent hype cycle shipped a lot of demos and very few production systems. Here are the architectural patterns that actually survived — and the failure modes that killed the rest.

AI
10 min

Building production RAG systems in 2026: vector DBs, hybrid search, and eval frameworks

Vector DBs, embedding models, hybrid search, rerankers, and evals: the production RAG stack our team actually ships in 2026, with the numbers and pitfalls that shape each decision.

AI
8 min

AI observability: logging, tracing, and evals for LLM apps

Production LLM apps fail silently. Here is how our team wires up traces, evals, cost tracking, and drift detection with the tools that actually earned their place in 2026.

AI
8 min

How to add AI features to an existing SaaS without burning your runway

Ship AI features that earn their token budget. A pragmatic playbook for bootstrapped SaaS — picking the right first feature, tier-gating usage, capping spend, and designing for graceful failure.

AI
9 min

Prompt caching strategies that cut Claude API bills by 70%

Prompt caching is the single biggest cost lever on RAG and agent workloads. Here's the math, the right cache_control placement, and the traps that quietly tank cache hit rates.

AI
8 min

Fine-tuning vs RAG vs prompting: a decision framework for 2026

Three tools, three jobs. Here is the framework our team uses to decide when to fine-tune, when to reach for RAG, and when a well-designed prompt is genuinely all the problem needs.
