Foundation Models & Retrieval

Summary

Consulting essays on RAG patterns, when to fine-tune vs. prompt vs. tools, embedding drift, retrieval latency, and structured retrieval with small adapters.

Overview

The architecture choice that shapes cost, quality, and speed

How you combine a foundation model with your own data does more to determine cost, quality, and latency than which model you pick. Retrieval, fine-tuning, and tools are not competing religions; they are levers with different economics, and the skill is knowing which to pull for a given job.

These essays cover the retrieval and adaptation patterns that hold up once real data, real scale, and real users arrive.

RAG patterns that survive contact with real data

Retrieval quality, not model size, is usually the true bottleneck. Chunking strategy, hybrid search, and reranking typically do more for answer quality than swapping to a larger, costlier model, and they cost a fraction as much to change.
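As one illustration of the hybrid search idea, the sketch below merges a keyword ranking and a vector ranking with reciprocal rank fusion (RRF). RRF is a commonly used fusion method, chosen here as an assumption; the essays may recommend something different. The function name, the document IDs, and the constant `k=60` are all hypothetical placeholders.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    k=60 is a conventional default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword index and a vector index.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # documents found by both retrievers come first
```

A reranker, if used, would typically run on the top of this fused list rather than on the full corpus, which is part of why these changes tend to be cheap compared with a model swap.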

When to fine-tune vs. prompt vs. tools

Fine-tune for format and tone, prompt for reasoning, and reach for tools when the task needs ground truth or real-world actions. Most teams fine-tune too early and instrument too late, paying for both mistakes in production.

Embedding drift and retrieval latency

Embeddings age as your corpus and your models change, and retrieval latency compounds across a multi-step chain. Both are operational realities that need monitoring and periodic rebuilds, not a one-time setup you can forget.
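A minimal sketch of what drift monitoring could look like, assuming you keep a fixed set of canary queries and store their embeddings from a known-good baseline. It compares the centroid of the baseline embeddings with the centroid of freshly computed ones using cosine distance. The function names, the toy two-dimensional vectors, and the `DRIFT_THRESHOLD` value are hypothetical; a real threshold would need to be calibrated on your own data.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def drift_score(baseline_vectors, current_vectors):
    """Distance between the centroids of two embedding sets."""
    return cosine_distance(centroid(baseline_vectors), centroid(current_vectors))


# Hypothetical alert threshold; calibrate against historical scores.
DRIFT_THRESHOLD = 0.05

# Toy embeddings of the same canary queries at two points in time.
baseline = [[1.0, 0.0], [0.9, 0.1]]
current = [[0.0, 1.0], [0.1, 0.9]]

score = drift_score(baseline, current)
if score > DRIFT_THRESHOLD:
    print(f"drift score {score:.3f} exceeds threshold; consider a rebuild")
```

Centroid distance is a coarse signal: it can miss changes that move individual embeddings without shifting the mean, so it is best treated as a cheap first alarm rather than a full diagnosis.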

In this collection

Essays from the Stratenity Advisory Team on foundation models and retrieval.

RAG Isn’t a Silver Bullet, But This Setup Works Often

Cross-Industry

Notes on chunking, grounding, and when to skip RAG entirely.

When to Fine-Tune vs. Prompt vs. Tools

Technology & Software

A practical decision tree we use in workshops.

Retrieval-Augmented Generation: Design Patterns for Scale

Cross-Industry

End-to-end patterns for chunking, indexing, grounding, citations, and fallbacks at scale.

Embedding Drift: Detecting When “Meaning” Moves

Finance & Banking

A lightweight drift monitor using canary queries and centroid distance.

Retrieval Latency: Where the Milliseconds Hide

Retail & Consumer

Profiling the path: network, serialization, vector I/O, and cold caches.

Structured Retrieval with Small Adapters

Manufacturing & Production

Marrying structured stores with vector recall without heroics.

Go further

Go deeper with Stratenity frameworks

The public essays sketch the trade-offs. The full library holds the reference architectures, retrieval-tuning guides, and build-vs-buy diagnostics teams use to commit with confidence.
