The Deterministic Core: Algorithms and Data Structures That Make Marketing Agents Fast, Safe, and Scalable
A Practitioner’s Guide to Classical Patterns Inside Modern Agent Systems
This is the eighth guide in our series on agentic marketing systems. The first taught you to write skills. The second explained agent anatomy. The third wired tools and MCP. The fourth built a working OpenClaw marketing agent. The fifth covered maintenance and security harnesses. The sixth introduced portable plugins. The seventh reverse-engineered Claude Code’s leaked architecture.
This guide goes underneath all of them.
Every skill you have written, every MCP tool you have connected, every plugin you have packaged, and every CLI command you have scripted is doing one of six things: finding something, exploring something, ordering something, deduplicating something, routing through dependencies, or compressing data. These are not LLM operations. They are classical computer science operations. And getting them wrong is the difference between an agent that runs a weekly reporting pipeline in twelve seconds and one that times out, retries endlessly, or silently drops data.
The model gives your agent intelligence. Algorithms give it reliability.
Interactive companion: Explore the live algorithm visualisations in AI Harness Lab, where the same marketing data shapes from this guide are rendered step by step.
Why This Article Exists
There is a growing gap in the agentic AI conversation. The industry talks constantly about model capabilities, context windows, and reasoning benchmarks. What gets far less attention is the deterministic control layer - the part of the system that does not hallucinate, does not require prompt engineering, and does not cost tokens.
When we examined Claude Code’s leaked architecture in our previous guide, we found that most of the 512,000 lines of TypeScript were not about interacting with the model. They were about managing state, routing tasks, deduplicating tool results, compressing context, and scheduling work in dependency order. Those are algorithmic problems. The model is called when reasoning or generation is needed. Everything else is classical engineering.
This matters for marketing teams for a practical reason. As agent systems scale from a single skill running on a developer’s laptop to a fleet of coordinated agents managing campaigns, creative workflows, compliance checks, and reporting across platforms, the bottleneck shifts. It stops being “can the model do this?” and becomes “can the system around the model handle this reliably at speed?” The answer depends on whether you have chosen the right data structures and algorithms for the job.
This guide does not teach algorithms from textbook first principles. It teaches them from agent-builder first principles: what problem does this solve, why does it matter when building marketing agents, and where does it show up in the stack you are already working with.
Think in Agent Operations
Before looking at any individual algorithm, it helps to have a classification system for the work agents actually do. Not every task requires the model. In fact, most of the compute in a well-designed agent system is spent on deterministic operations that happen before the model is called, after the model responds, or instead of calling the model at all.
We classify agent operations into six categories.
Study mode: The companion lab uses this same operations model, so you can move from the categories in this article into live walkthroughs of find, explore, order, deduplicate, route, and memory patterns.
Find covers search, lookup, and retrieval. When an agent needs to locate a specific campaign in a sorted performance log, check whether a URL has already been processed, or find the exact point where a metric crossed a threshold, it is doing a find operation. These are the most common operations in any data-touching workflow.
Explore covers crawling, traversing, and inspecting. When a skill needs to walk through every page of a website to audit schema markup, or recurse through a repository’s folder structure to find all SKILL.md files, or inspect every node in an ad group hierarchy, it is exploring. The critical question is whether to explore breadth-first or depth-first, and that choice has real consequences for performance and memory usage.
Order covers sorting, ranking, and prioritisation. When a reporting plugin needs to surface the top five campaigns by return on ad spend, or a lead-scoring skill needs to rank prospects, or a CLI export needs to sort thousands of rows before writing a file, it is ordering. The choice of sorting algorithm matters when the dataset is large, partially sorted, or needs to remain stable.
Deduplicate covers removing repeated items. When an agent crawls a site and encounters the same URL through different paths, or processes an event stream where the same alert fires multiple times, or receives overlapping results from parallel tool calls, it needs to detect and remove duplicates efficiently. Naive deduplication is expensive. The right data structure makes it nearly free.
Route covers following dependencies and task graphs. When a workflow has steps that must run in a specific order - extract data before transforming it, transform before analysing, analyse before reporting - the agent needs to resolve that dependency graph correctly. When there are multiple possible tool paths with different costs or latencies, the agent needs to find the best route. This is graph processing, and it is the backbone of every multi-step agent workflow.
Compress covers reducing tokens, logs, and storage. When session transcripts grow too long for the context window, when tool call logs accumulate faster than they can be stored, or when cached data needs to fit in limited memory, the agent needs compression strategies. The leaked Claude Code architecture revealed five distinct compaction strategies for exactly this reason.
Every algorithm in this guide maps to one or more of these six operations. If you can classify the task, you can choose the right algorithm.
Part I: Search and Retrieval
This is where most agent work begins. Before the model can reason about data, the system needs to find the right data. Before a tool can act on a target, it needs to locate the target. Before a workflow can proceed, it needs to confirm that a prerequisite exists.
Binary Search: Finding the Threshold
Binary search is the simplest algorithm that most agent builders underuse. It works on sorted data and finds a target value - or the boundary where a condition changes - by repeatedly halving the search space.
In marketing agent systems, binary search appears whenever you need to answer a question like “when did this metric cross a threshold?” If you have a sorted time series of daily campaign performance data and you want to find the first date where cost per acquisition exceeded your target, a linear scan examines every row. Binary search finds the answer in a handful of steps regardless of whether the dataset has a hundred rows or a hundred thousand.
This pattern shows up in skills that analyse historical performance, MCP tools that serve threshold-based queries, and CLI exports that need to locate pivot points in large datasets. The implementation is small, but the performance difference at scale is not.
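The boundary-finding form described above fits in a dozen lines. This is a minimal sketch, assuming the series is monotone (a cumulative metric such as month-to-date spend); the function name and sample data are illustrative, not part of any library.

```python
def first_crossing(values, threshold):
    """Binary search for the first index where values[i] >= threshold.

    Assumes `values` is sorted ascending (e.g. cumulative spend by day).
    Returns len(values) if the threshold is never reached.
    """
    lo, hi = 0, len(values)
    while lo < hi:
        mid = (lo + hi) // 2
        if values[mid] >= threshold:
            hi = mid          # mid might be the answer; keep it in range
        else:
            lo = mid + 1      # everything up to mid is below threshold
    return lo

cumulative_spend = [120, 250, 410, 640, 910, 1300, 1800]
budget_hit_day = first_crossing(cumulative_spend, 1000)  # index 5, day six
```

The same loop answers any question expressible as a monotone yes/no condition - "first row past a given date", "first point where a running total exceeds budget" - only the comparison changes.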
BFS: Exploring Breadth-First
Breadth-first search explores a structure level by level. It visits every node at the current depth before moving deeper. In agent systems, BFS is the right choice when you need to cover an entire level of a hierarchy before drilling down.
The most common marketing use case is site auditing. When a skill needs to crawl a website to check for broken links, missing schema markup, or compliance issues, BFS ensures you audit all top-level pages before exploring nested subpages. This matters because top-level pages are typically higher-traffic and higher-priority. An agent that uses BFS for site crawling can surface critical issues on your main landing pages before it has even started examining deep blog archives.
BFS also appears in ad account auditing. When inspecting a Google Ads account structure - account, campaigns, ad groups, ads - a breadth-first traversal examines all campaigns before diving into any single campaign’s ad groups. This gives you a complete picture of campaign-level health before committing to detailed analysis.
The trade-off is memory. BFS must hold the entire current level in memory before moving to the next, which matters when dealing with very wide hierarchies.
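A level-by-level crawl is a queue plus a visited set. In this sketch, `get_links` stands in for a real fetch-and-parse step, and the function name and site map are illustrative assumptions.

```python
from collections import deque

def bfs_audit(start_url, get_links, max_pages=100):
    """Breadth-first crawl: visits every page at depth d before depth d+1.

    `get_links(url)` is a stand-in for fetching a page and extracting
    its links. Returns (url, depth) pairs in visit order.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    order = []
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append((url, depth))
        for link in get_links(url):
            if link not in seen:       # cheap duplicate gate
                seen.add(link)
                queue.append((link, depth + 1))
    return order

site = {"/": ["/a", "/b"], "/a": ["/deep"], "/b": [], "/deep": []}
pages = bfs_audit("/", site.get)  # top-level pages audited first
```

Note the `queue` variable itself is the memory trade-off: it holds the entire frontier, which grows with the widest level of the site.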
DFS: Following One Branch Deep
Depth-first search takes the opposite approach. It follows a single path as deep as it goes before backtracking. In agent systems, DFS is the right choice when the structure is deep and narrow, or when you need to find something that lives at the bottom of a hierarchy.
DFS is the natural fit for repository scanning. When a maintenance skill needs to find all configuration files in a project - CLAUDE.md files, SKILL.md files, package.json files scattered through nested directories - DFS follows each folder path to its end before backtracking. It uses minimal memory because it only needs to track the current path, not the entire level.
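The repository scan can be sketched recursively. Here a nested dict stands in for a real directory tree (directories map to dicts, files to `None`); `dfs_find` and the sample layout are illustrative.

```python
def dfs_find(tree, name, path=""):
    """Depth-first scan of a nested dict representing a folder tree.

    Yields the full path of every file called `name`, following each
    branch to its end before backtracking. Only the current path is
    held in memory, not an entire level.
    """
    for entry, child in tree.items():
        full = f"{path}/{entry}"
        if child is None:             # leaf: a file
            if entry == name:
                yield full
        else:                         # directory: recurse deeper first
            yield from dfs_find(child, name, full)

repo = {
    "skills": {"seo": {"SKILL.md": None}, "ads": {"SKILL.md": None}},
    "README.md": None,
}
skill_files = list(dfs_find(repo, "SKILL.md"))
```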
DFS also appears in dependency resolution. When an agent needs to trace a chain of task dependencies to find the root cause of a failure - this report failed because the transform failed because the data extraction failed because the API credential expired - it is doing a depth-first traversal of the failure graph.
The choice between BFS and DFS is not about which is “better.” It is about which matches the shape of your problem. Wide and priority-driven means BFS. Deep and path-driven means DFS.
KMP and Rabin-Karp: Pattern Matching in Text
String-matching algorithms become important the moment agents work with text at scale, which in marketing means immediately.
KMP (Knuth-Morris-Pratt) and Rabin-Karp both solve the same problem: finding a pattern inside a larger text. They differ in mechanism - KMP preprocesses the pattern to avoid redundant comparisons, while Rabin-Karp uses hashing to compare pattern and text windows efficiently - but both dramatically outperform naive character-by-character matching on large inputs.
In agent systems, these patterns show up in three places. First, prompt-injection scanning. When an MCP tool receives input from an external source - a webpage, a user-submitted document, a feed - a pattern matcher can check for known injection phrases before the content reaches the model. Second, compliance checking. When a skill audits ad copy or landing page content for prohibited terms, trademark violations, or regulatory language, pattern matching is the underlying operation. Third, UTM and tracking parameter analysis. When a CLI processes thousands of URLs to detect duplicate or malformed UTM patterns, efficient string matching turns an operation that could take minutes into one that takes milliseconds.
Rabin-Karp has an additional advantage: it can search for multiple patterns simultaneously using a set of hashes, making it particularly useful for scanning against a list of known injection strings or prohibited terms.
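A sketch of that multi-pattern form follows. Patterns are grouped by length so each distinct length gets one rolling-hash pass over the text, and every hash hit is confirmed by a direct comparison, so the output is exact. The function name, base, and modulus are illustrative choices, not fixed constants of the algorithm.

```python
def multi_pattern_scan(text, patterns, base=256, mod=1_000_003):
    """Rabin-Karp scan for several patterns in one set of passes.

    Returns a list of (index, pattern) hits. Hash collisions are
    resolved by a direct string comparison, so results are exact.
    """
    hits = []
    by_len = {}
    for p in patterns:                     # pre-hash every pattern
        h = 0
        for ch in p:
            h = (h * base + ord(ch)) % mod
        by_len.setdefault(len(p), {}).setdefault(h, []).append(p)
    for m, table in by_len.items():        # one pass per pattern length
        if m == 0 or m > len(text):
            continue
        high = pow(base, m - 1, mod)       # weight of the outgoing char
        h = 0
        for ch in text[:m]:
            h = (h * base + ord(ch)) % mod
        for i in range(len(text) - m + 1):
            for p in table.get(h, []):
                if text[i:i + m] == p:     # confirm on hash match
                    hits.append((i, p))
            if i + m < len(text):          # roll the window one char
                h = ((h - ord(text[i]) * high) * base
                     + ord(text[i + m])) % mod
    return hits
```

For a prohibited-terms check, the pattern list is the banned phrases and the text is each page or ad; the scan cost stays close to one pass per document regardless of how many terms you add at a given length.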
Bloom Filter: The Cheap Duplicate Gate
A Bloom filter is a probabilistic data structure that answers one question very fast: “have I probably seen this before?” It can produce false positives (saying “yes” when the answer is “no”) but never false negatives (it will never say “no” when the answer is “yes”).
This makes Bloom filters ideal as a gate before expensive operations. In agent workflows, the expensive operation is usually an API call, a model invocation, or a database query. If your site-crawling agent encounters a URL it may have already processed, a Bloom filter check costs almost nothing. If the filter says “not seen,” you can proceed with confidence. If the filter says “possibly seen,” you fall back to a definitive check - but that fallback happens rarely.
Bloom filters appear in URL deduplication during crawls, document-processing pipelines where the same file might arrive through multiple paths, and prompt caching systems where you want to quickly check whether a similar prompt has already been processed. The memory footprint is tiny compared to storing every item you have ever seen, which matters when agents run in constrained environments or process millions of items.
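A minimal sketch of the structure, carving k bit positions out of a single SHA-256 digest. The sizes here are illustrative; a production filter would size the bit array and hash count from the expected item count and an acceptable false-positive rate.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: `add` and a probabilistic `in` check.

    `False` is definitive (never seen); `True` means "probably seen"
    and should trigger a definitive fallback check.
    """
    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits              # must be a multiple of 8 here
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.k):
            # carve k positions out of one 32-byte digest
            chunk = digest[i * 4:(i + 1) * 4]
            yield int.from_bytes(chunk, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

seen = BloomFilter()
seen.add("https://example.com/pricing")
```

The whole filter above is one kilobyte, versus storing every URL string you have ever crawled.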
Part II: Ranking and Batching
Once agents have found and retrieved data, the next most common operation is ranking it. Reporting skills rank campaigns. Lead-scoring tools rank prospects. Creative optimisation agents rank ad variants. Export CLIs sort thousands of rows before writing output.
Heap: Keeping the Top K
A heap is a tree-based data structure that efficiently maintains either the smallest or largest element at the top. In agent systems, the critical use case is top-K selection: continuously maintaining the best N items as new data arrives.
Consider a live performance dashboard skill. As campaign data streams in, you want to maintain the top ten campaigns by return on ad spend. A naive approach re-sorts the entire list every time a new data point arrives. A min-heap of size ten accepts each new data point, compares it to the current minimum in the heap, and only swaps if the new point is better. The work per update is logarithmic in K rather than linear in the total dataset size.
This pattern scales to any “surface the best” operation: top creatives, highest-converting landing pages, best-performing keywords, most active leads. Whenever an agent needs a live leaderboard without re-sorting everything every time, a heap is the right structure.
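The streaming top-K pattern maps directly onto Python's `heapq`. This sketch assumes a stream of (score, name) pairs; the function name and campaign data are illustrative.

```python
import heapq

def streaming_top_k(stream, k):
    """Maintain the top-k items by score from a stream of
    (score, name) pairs, using a min-heap of size k.
    """
    heap = []
    for score, name in stream:
        if len(heap) < k:
            heapq.heappush(heap, (score, name))
        elif score > heap[0][0]:
            # new item beats the current worst of the top k
            heapq.heapreplace(heap, (score, name))
    return sorted(heap, reverse=True)      # best first

campaigns = [(3.1, "brand"), (5.4, "retarget"), (2.2, "display"),
             (6.8, "search"), (4.9, "social")]
top3 = streaming_top_k(campaigns, 3)
```

Each arriving data point costs at most one heap operation on a ten-element (or k-element) heap, however large the full stream grows.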
Merge Sort and Quick Sort: The Workhorses
Merge sort and quick sort are the general-purpose sorting algorithms that handle most batch-ranking operations in agent systems.
Merge sort guarantees consistent performance regardless of input order and preserves the relative order of equal elements (stability). This matters when you are sorting campaigns first by category and then by performance within each category - you need the sub-ordering to survive the primary sort.
Quick sort is typically faster in practice due to better cache behaviour, but with naive pivot selection (always taking the first or last element) it degrades to quadratic time on already-sorted or nearly-sorted input - exactly the kind of data marketing systems often produce (yesterday's ranked list with a few changes). The standard mitigation is randomised pivot selection, or an introsort variant that falls back to a guaranteed algorithm when quick sort degrades.
For most agent builders, the choice is straightforward: use your language’s built-in sort (which is almost always an optimised hybrid) for batch operations, and understand that the underlying algorithm matters when you are sorting hundreds of thousands of items or more.
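Stability is easy to see in practice. Python's built-in sort (Timsort, a merge-sort hybrid) is stable, so a two-pass sort - secondary key first, then primary - preserves the sub-ordering the article describes. The campaign records here are illustrative.

```python
campaigns = [
    {"category": "search", "roas": 4.2, "name": "c1"},
    {"category": "social", "roas": 5.1, "name": "c2"},
    {"category": "search", "roas": 5.1, "name": "c3"},
]
# Sort by the secondary key first, then the primary key. Because the
# built-in sort is stable, the ROAS order survives within each category.
ranked = sorted(campaigns, key=lambda c: -c["roas"])
ranked = sorted(ranked, key=lambda c: c["category"])
```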
Counting Sort and Radix Sort: When Values Are Small
Counting sort and radix sort exploit a specific property of the data: the sort keys are integers. When the values fall within a known, small range, counting sort runs in linear time; when the range is larger but the keys have a fixed number of digits, radix sort achieves the same by sorting one digit at a time - beating the lower bound that applies to any comparison-based sort.
In marketing agent systems, this shows up when sorting event logs by status code, campaigns by a small set of categorical labels, or items by priority tier. If your lead-scoring system uses a 1-to-100 integer score, counting sort will rank a million leads faster than any comparison sort. If your agent categorises campaigns into five performance tiers, counting sort is essentially free.
The pattern is simple: when the range of values is small relative to the number of items, use a counting-based sort. When it is not, use a comparison-based sort. Recognising which situation you are in avoids both overengineering and unnecessary slowness.
Part III: Dependency-Aware Workflows
This is where algorithms become directly visible in agent orchestration. Every multi-step workflow - and most useful agent workflows have multiple steps - is a dependency graph. Getting the execution order wrong means failures, wasted work, or incorrect results.
Topological Sort: Running Tasks in the Right Order
Topological sort takes a directed acyclic graph of dependencies and produces a linear ordering where every task runs after its prerequisites. It is the algorithm behind every “extract → transform → analyse → report” pipeline you build.
In the Claude Code architecture we examined in the previous guide, task scheduling for subagents follows exactly this pattern. When a marketing agent receives a request like “generate this week’s performance report,” the system decomposes it into steps: fetch data from the analytics API, clean and transform the data, calculate derived metrics, generate narrative summaries, assemble the report, and distribute it. Some of these steps can run in parallel (fetching data from multiple platforms), but the downstream steps must wait for their inputs.
Topological sort resolves this automatically. You declare the dependencies - “transform depends on extract, analyse depends on transform” - and the algorithm produces a valid execution order. If you have declared dependencies that form a cycle (A depends on B, B depends on C, C depends on A), the algorithm detects it and fails explicitly rather than letting the agent loop forever.
This is not abstract. Every skill that has ordered steps, every MCP tool that orchestrates a pipeline, every CLI that runs batch operations with dependencies is implementing topological sort, whether the builder knows it or not. Making it explicit means you can parallelise independent steps, detect cycles before runtime, and produce clear execution plans that are auditable.
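Making it explicit can be done with Kahn's algorithm in about twenty lines. This sketch takes a mapping from each task to its prerequisites; the function name and pipeline are illustrative.

```python
from collections import deque

def topo_order(deps):
    """Kahn's algorithm. `deps` maps each task to the tasks it depends
    on. Returns a valid execution order, or raises on a cycle."""
    tasks = set(deps) | {d for ds in deps.values() for d in ds}
    indegree = {t: 0 for t in tasks}       # unmet prerequisites per task
    dependents = {t: [] for t in tasks}    # who is waiting on each task
    for task, prereqs in deps.items():
        for p in prereqs:
            indegree[task] += 1
            dependents[p].append(task)
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for nxt in dependents[t]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:         # all prerequisites satisfied
                ready.append(nxt)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order

pipeline = {"transform": ["extract"], "analyse": ["transform"],
            "report": ["analyse"]}
```

Everything sitting in `ready` at the same time has no unmet dependencies, which is exactly the set of steps an orchestrator could run in parallel.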
Dijkstra: Choosing the Cheapest Path
Dijkstra’s algorithm finds the shortest (or cheapest) path through a weighted graph. In agent systems, the “graph” is the set of possible tool paths, and the “weights” are costs - latency, API charges, token usage, or a combination.
Consider a marketing agent that needs to fetch campaign performance data. It could call the Google Ads API directly (fast but rate-limited), query a cached data warehouse (slower but no rate limit), or ask a reporting MCP tool that aggregates multiple sources (slowest but most complete). If the agent needs to perform this operation hundreds of times in a batch job, the cheapest overall path may not be the fastest individual path.
Dijkstra’s algorithm models this as a graph problem: each tool or API is a node, the connections between them are edges, and the weights are the relevant costs. The algorithm finds the path from “need data” to “have data” that minimises total cost. When tool chains become complex - and they do in enterprise marketing stacks - this turns ad-hoc “which API should I use?” decisions into systematic optimisation.
This pattern also applies to multi-step workflows where each step has alternative implementations. If you can generate a creative brief using a fast cheap model or a slower expensive one, and the downstream formatting step can work with either quality level, Dijkstra helps the system choose the combination that meets quality requirements at minimum cost.
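A compact sketch of Dijkstra over a tool graph follows. The nodes, edges, and weights here are illustrative - in practice the weights would come from measured latency, rate-limit penalties, or per-call cost.

```python
import heapq

def cheapest_path(graph, start, goal):
    """Dijkstra's algorithm over a weighted graph of tool transitions.

    `graph[node]` is a list of (neighbour, cost) edges. Returns
    (total_cost, path), or (inf, []) if the goal is unreachable.
    """
    best = {start: 0}
    frontier = [(0, start, [start])]       # min-heap ordered by cost
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue                        # stale heap entry; skip
        for nxt, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return float("inf"), []

tools = {
    "need_data": [("ads_api", 1), ("warehouse", 3)],
    "ads_api":   [("have_data", 9)],   # cheap to reach, costly to finish
    "warehouse": [("have_data", 2)],   # slower start, cheaper overall
}
```

On this graph the direct API looks cheaper at the first hop (1 vs 3), but the warehouse route wins overall - which is the whole point of optimising the total path rather than each step greedily.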
Union-Find: Clustering Related Problems
Union-Find (also called Disjoint Set Union) tracks which items belong to the same group and can efficiently merge groups. In agent systems, it solves the “shared root cause” problem.
When a monitoring agent detects that five campaigns all experienced a cost spike on the same day, it needs to determine whether these are five independent problems or one problem affecting five campaigns. Union-Find efficiently groups items that share a connection - same time window, same targeting segment, same landing page, same creative template - and answers “how many distinct problem clusters are there?” without comparing every pair of items.
This appears in incident clustering (grouping alerts that share a root cause), campaign analysis (identifying which campaigns share common factors driving performance), and content deduplication (grouping pages that are substantially similar). The efficiency comes from two optimisations called path compression and union by rank that keep the data structure nearly flat regardless of how many merge operations you perform.
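A minimal sketch of the structure with both optimisations follows; the campaign identifiers and pairings are illustrative.

```python
class UnionFind:
    """Disjoint Set Union with path compression and union by rank."""
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def find(self, x):
        if x not in self.parent:           # lazily register new items
            self.parent[x] = x
            self.rank[x] = 0
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # compress path
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:  # attach shorter under taller
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

# Merge campaigns that share a suspected common factor, then count
# how many distinct problem clusters remain.
uf = UnionFind()
for a, b in [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]:
    uf.union(a, b)
clusters = len({uf.find(c) for c in ["c1", "c2", "c3", "c4", "c5"]})
```

Five affected campaigns collapse to two root causes without a single pairwise comparison across the full set.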
Part IV: Memory and Streaming
Agents that run continuously - monitoring dashboards, processing event streams, maintaining session context - need data structures that handle time-bounded data efficiently. You cannot store everything forever, and you should not reprocess everything from the beginning each time.
Circular Buffer: The Rolling Window
A circular buffer is a fixed-size array that overwrites the oldest entry when full. It is the simplest and most efficient way to maintain a “last N items” window.
In agent systems, circular buffers appear everywhere there is a recency requirement. The last 100 tool calls for debugging. The last 20 report runs for trend detection. The last 50 alerts for pattern recognition. The rolling window of recent campaign performance data that feeds a live summary.
The leaked Claude Code architecture uses this pattern for tool call history. Rather than growing an unbounded log, the system maintains a fixed window of recent activity. When the buffer is full and a new entry arrives, the oldest entry is silently dropped. There is no memory allocation, no garbage collection pressure, and no risk of unbounded growth.
For marketing agents that run as long-lived processes - a monitoring daemon, a live reporting skill, a continuous compliance checker - circular buffers are essential infrastructure. They guarantee bounded memory usage while preserving the most relevant recent data.
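In Python, `collections.deque` with a `maxlen` gives you this structure from the standard library; the window size and call names below are illustrative.

```python
from collections import deque

# A deque with maxlen is a circular buffer: appending to a full deque
# silently drops the oldest entry, with bounded memory and no
# reallocation as the stream grows.
recent_calls = deque(maxlen=3)
for call in ["fetch", "transform", "score", "report"]:
    recent_calls.append(call)
# the oldest entry ("fetch") has been overwritten
```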
Compression and Context Management
The connection between compression algorithms and agent systems is more direct than it might appear. The fundamental problem of LLM context windows - fitting relevant information into a finite space - is a compression problem.
Huffman coding and its descendants work by assigning shorter representations to more frequent items. In agent systems, the analogy is context compaction: summarising verbose tool outputs, condensing repetitive log entries, and replacing frequently referenced information with shorter references. The five compaction strategies we documented in the Claude Code architecture analysis - summary compaction, tool output truncation, conversation pruning, context window rotation, and session checkpointing - are all compression operations applied to conversational context rather than binary data.
For marketing agents, the practical application is session management. When a campaign analysis session grows long - dozens of tool calls, multiple data retrievals, evolving hypotheses - the agent needs to compress older context without losing critical information. Understanding that this is a compression problem, not just a “summarise the conversation” problem, leads to better strategies: keep the decisions and conclusions, compress the intermediate reasoning, and discard the raw tool outputs that have already been processed.
Index structures - tries, suffix arrays, and inverted indexes - serve a complementary purpose. When agents maintain persistent memory across sessions, they need to retrieve relevant past knowledge quickly. An unstructured memory file requires scanning the entire contents for every retrieval. An indexed memory allows targeted lookup by keyword, topic, or time range. Claude Code’s three-layer memory design (MEMORY.md index → topic files → searchable transcripts) is exactly this pattern: an index layer for fast routing, a topic layer for focused retrieval, and a raw layer for when complete detail is needed.
How These Algorithms Map to Your Stack
The algorithms above are not theoretical knowledge. They appear in specific layers of the systems you are building. Here is where each one shows up.
In Skills
Skills are structured workflows over data, and they are where ordering and exploration algorithms do most of their work. When a skill has steps that must run in sequence, topological sort determines the order. When a skill needs to inspect nested artefacts - repository structures, ad group hierarchies, site trees - BFS or DFS provides the traversal. When a skill needs to avoid reprocessing items from a previous run, a Bloom filter provides the deduplication gate. When a skill maintains a running summary, a circular buffer holds the recent state.
In MCP Servers
MCP tools expose structured queries over datasets, which means search and ranking algorithms are central. Binary search powers threshold lookups. Heaps power top-K endpoints. Hash maps combined with Bloom filters provide efficient deduplication and caching. Pattern-matching algorithms power text-search and compliance-checking tools.
In Plugins
Plugins package repeated workflows with persistent memory, making compression and indexing patterns essential. Circular buffers maintain recent activity windows. Suffix and index structures make searchable memory practical. Compression reduces the storage cost of session logs and cached transcripts.
In CLIs
CLIs handle batch operations, which is where sorting algorithms earn their keep. Merge sort and quick sort handle large export files. Topological sort manages dependency-aware command sequences. BFS and DFS power repository scans and site crawls.
In Agent Orchestration
Agent orchestration is graph processing. Tasks are nodes, dependencies are edges. Topological sort schedules work. Union-Find clusters related tasks and failures. Dijkstra finds the cheapest execution path when tool calls have different costs. The fork-join subagent pattern from Claude Code’s architecture is a graph-parallel execution model with dependency resolution at each join point.
Three Marketing Scenarios
To make this concrete, here are three common marketing workflows with the algorithms they use.
Scenario 1: Weekly Performance Reporting
A reporting skill runs every Monday morning. It needs to fetch data from multiple platforms, identify the top-performing campaigns, generate a summary, and distribute the report.
The workflow uses BFS to crawl the dashboard tree, visiting each platform at the top level before diving into individual campaign details. Topological sort ensures the pipeline runs in order: data fetch completes before transformation, transformation completes before analysis, analysis completes before report generation. A heap maintains the top five campaigns by return on ad spend as data streams in from each platform, avoiding a full re-sort. A circular buffer keeps the last twenty report runs, enabling the skill to compare this week’s performance against recent history and flag significant changes.
Scenario 2: Content Compliance Scanner
A security skill scans landing pages and ad copy for prohibited terms, prompt-injection patterns, and trademark violations.
KMP or Rabin-Karp scans each document against a set of known prohibited patterns. Multi-pattern Rabin-Karp is particularly efficient here because the skill can check for dozens of patterns in a single pass through each document. A Bloom filter tracks which documents have already been scanned, so incremental runs skip previously processed content. DFS recurses through the repository or site structure, following each branch to its deepest page before backtracking. Union-Find clusters violations that share a common root cause - for example, multiple pages using the same non-compliant creative template - so the remediation team can fix one source rather than patching individual pages.
Scenario 3: Multi-Agent Campaign Orchestration
A fleet of specialised agents collaborates on a quarterly campaign launch: one handles creative generation, one manages bid strategy, one runs compliance checks, one coordinates targeting, and an orchestrator agent manages the workflow.
Topological sort sequences the dependencies: targeting parameters must be defined before bid strategy can be configured, creative must be generated before compliance can check it, and all components must be approved before launch. Dijkstra selects the cheapest tool path at each step - when the creative agent can use either a fast model for draft generation or a more capable model for final output, the path choice depends on the stage of the workflow and the remaining budget. Union-Find clusters any failures that occur during the orchestration: if three targeting configurations fail simultaneously, the system identifies whether they share a common cause (for example, an API outage affecting a specific platform) rather than treating them as independent failures.
The Rule of Thumb
You do not need to memorise every algorithm. You need to recognise which category your problem belongs to, and then choose the simplest algorithm that solves it well.
For text problems - scanning content, matching patterns, detecting injections - reach for KMP, Rabin-Karp, or Bloom filters.
For task and ordering problems - scheduling steps, managing dependencies, maintaining priority queues - reach for topological sort, Union-Find, and heaps.
For exploration problems - crawling sites, scanning repositories, inspecting hierarchies - reach for BFS or DFS, and choose based on whether breadth or depth matters more for your use case.
For ranking problems - surfacing top performers, sorting exports, ordering lists - reach for your language’s built-in sort for batch operations and heaps for streaming top-K.
For memory problems - maintaining recent history, indexing persistent knowledge, managing context windows - reach for circular buffers, index structures, and compression.
That is twelve algorithms and data structures covering the vast majority of deterministic operations in marketing agent systems. Everything else is the model’s job.
What This Means for the Series
Throughout this series, we have been building layers. Skills gave agents procedural knowledge. Tools and MCP gave them the ability to act on external systems. Plugins packaged those capabilities for sharing. The harness architecture from the Claude Code analysis showed how production systems manage memory, permissions, and orchestration at scale.
This guide reveals what sits underneath all of those layers. The algorithms and data structures in this article are not additions to your stack. They are already present in every skill, tool, plugin, and orchestration pattern you have built. Making them explicit, choosing them deliberately rather than relying on whatever your language’s standard library happens to provide, is the difference between agent systems that work in demos and agent systems that work in production.
The live companion at AI Harness Lab lets you step through these structures visually, using the same marketing-oriented examples discussed in this guide.
The companion repository at github.com/ai-knowledge-hub/ai-skills-guide includes reference implementations for each algorithm discussed here, with marketing-specific examples: a binary-search threshold finder for campaign metrics, a BFS site crawler, a topological-sort pipeline scheduler, a Bloom-filter deduplication gate, and a circular-buffer recent-activity tracker. The implementations are intentionally minimal - clear enough to read, small enough to embed in a skill or MCP tool, and tested against realistic marketing data shapes.
The model provides intelligence. The harness provides structure. The algorithms provide speed. You need all three.
This guide is part of the Performics Labs AI Knowledge Hub series on agentic marketing systems. Previous guides: Building AI Skills · Agent Architecture · Tools, MCP, and CLI · Your OpenClaw Marketing Agent · Maintenance and Security Harness · Portable Plugins · Claude Code Architecture
Implementation examples and skill templates are available in the companion repository at github.com/ai-knowledge-hub/ai-skills-guide.