Stop the “Oops I Skipped a Step” Bot - Routine Brings Order to Agent Chaos 🤖➡️📈

Large-language-model agents love to hallucinate plans and forget half the tools you give them. Routine (Hugging Face / Zhejiang U) fixes that by forcing explicit, step-by-step scripts with typed parameters. Accuracy on multi-tool enterprise tasks rose 34 pts on GPT-4o and 51 pts on Qwen3-14B. Below we translate the research into concrete plays for Search, Programmatic, Social and E-commerce teams.

Key Facts

Signal	Detail
Structured Plan DSL	JSON plan → `{step, tool, inputs, outputs}`; zero implicit reasoning
Parameter Passing	Output vars flow into the next step; no context bleed
Planner ≠ Executor	Two modules; easy to debug & fine-tune separately
Distillation Gains	Small models learn the recipe → cut infra cost by 40 %
Enterprise Bench	Financial-report parsing, SKU enrichment, ad-ops setups
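
The plan shape in the table can be checked mechanically before anything runs. The following is a minimal Python sketch, not Routine's own code: `PlanStep` and `parse_plan` are illustrative names, and `outputs` is treated as optional on the assumption that side-effect-only steps produce none.

```python
import json
from dataclasses import dataclass, field

# Keys every step must carry; "outputs" is assumed optional (see lead-in).
REQUIRED_KEYS = {"step", "tool", "inputs"}


@dataclass(frozen=True)
class PlanStep:
    step: int
    tool: str
    inputs: dict
    outputs: list = field(default_factory=list)


def parse_plan(raw: str) -> list[PlanStep]:
    """Parse a JSON plan and reject steps with missing keys or bad numbering."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("plan must be a JSON array of steps")
    steps = []
    for position, item in enumerate(data, start=1):
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"step {position} is missing keys: {sorted(missing)}")
        if item["step"] != position:
            raise ValueError(f"expected step {position}, got {item['step']}")
        steps.append(
            PlanStep(item["step"], item["tool"], item["inputs"], list(item.get("outputs", [])))
        )
    return steps
```

Rejecting a malformed plan at parse time keeps the failure close to the planner, where it is cheapest to retry.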

Why It Matters for Every Channel

Channel	Old Reality	Routine-Style Boost	Practical Win
Search (SEO/SEM)	Prompt chains drop schema push or miss re-crawl ping	Planner lists “generate JSON-LD → pingIndexNow” as atomic steps	Faster inclusion in AI answers; less manual QA
Programmatic	Bid-shift agents mis-order “pull-stats / set-bid” calls	Ordered workflow → fetch KPI ➟ compute delta ➟ update bid	Stops runaway CPC spikes; improves ROAS stability
Social	Meta creative-rotate bot forgets fatigue check	`checkFatigue()` must run before `swapCreative()`	CPA ↓ as bad ads pause on time
E-commerce (AMC)	Multi-step feed-enrichment scripts lose size/colour data	Explicit param pass keeps variant-level attributes intact	Better match & ranking on Amazon shelves
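
The "must run before" rule in the Social row can be enforced as a simple check over the tool order in a plan. A rough sketch, assuming a hand-written rule table: `MUST_PRECEDE`, `ordering_violations` and the `generateJsonLd` tool name are hypothetical, while `checkFatigue`, `swapCreative` and `pingIndexNow` come from the table above.

```python
# Hypothetical rule table: each key tool needs the listed tools earlier in the plan.
MUST_PRECEDE = {
    "swapCreative": ["checkFatigue"],
    "pingIndexNow": ["generateJsonLd"],
}


def ordering_violations(tools: list[str]) -> list[str]:
    """Return one message for every tool that runs before its prerequisite."""
    seen = set()
    problems = []
    for tool in tools:
        for prerequisite in MUST_PRECEDE.get(tool, []):
            if prerequisite not in seen:
                problems.append(f"{tool} requires {prerequisite} earlier in the plan")
        seen.add(tool)
    return problems
```

An empty list means the order is acceptable; anything else can be fed back to the planner as a rejection reason.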

How Routine Works (60-sec Tech Stack)

Figure 1: Routine agentic framework

Planner (large LLM) → produces an immutable plan with clear I/O.
Executor → a smaller LLM or plain code that reads the plan and calls the tools.
Distiller → trains a cheaper model on successful plans for high-volume jobs.
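
The Distiller needs a record of plans that actually worked. One way to collect them is sketched below; the `prompt`/`completion` field names, the `succeeded` flag and the JSONL layout are assumptions, since the real training format depends on the fine-tuning stack in use.

```python
import json
from pathlib import Path


def log_successful_run(path: Path, task: str, plan: list[dict], succeeded: bool) -> bool:
    """Append one training example per successful run; skip failed runs."""
    if not succeeded:
        return False
    # One JSON object per line (JSONL), with the plan stored as a string.
    record = {"prompt": task, "completion": json.dumps(plan, sort_keys=True)}
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return True
```

Filtering on success at write time keeps failed plans out of the training set without a separate cleaning pass.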

Pros & Cons

✔ Pros	⚠ Cons
Fewer skipped steps, so higher task success	Planner & executor both need guard-rails
Easy debug: failed steps show in logs	
Works with small models after distillation	

Strategic To-Dos for Performance Teams

#	Move	What to Do	Outcome
1	Map Your Playbooks	Write 3–5 key workflows as plain-language steps (e.g., “pull yesterday spend → calc ROAS → if ROAS < 2, cut bid 15 %”).	Blueprint for planner prompts.
2	Define Typed Parameters	List required inputs/outputs per tool: campaignId:int, roas:float.	Stops context leakage.
3	Use JSON Plans	Ask GPT-4o: “Return plan as array of {step,tool,inputs,outputs}”.	Deterministic executor can parse.
4	Log & Distill	Save successful plan/param pairs → fine-tune a 7B open model.	Cheaper runtime for hourly jobs.
5	Roll Out Guardrails	Add a Reflexion-style critic that rejects plans missing mandatory tools.	Maintains compliance at scale.
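
Moves 2 and 5 can share one pre-flight check that runs before the executor touches any API. This is a sketch under stated assumptions: the `TOOL_SIGNATURES` registry, the `MANDATORY_TOOLS` rule and the `critique` function are hypothetical, and the tool names are borrowed from the demo prompt further down.

```python
# Hypothetical registry: expected input types per tool.
TOOL_SIGNATURES = {
    "meta.getStats": {"campaignId": int},
    "meta.updateBid": {"campaignId": int, "percent": (int, float)},
}
MANDATORY_TOOLS = {"meta.getStats"}  # assumed compliance rule


def critique(plan: list[dict]) -> list[str]:
    """Return reasons to reject the plan; an empty list means it passes."""
    reasons = []
    used = {step["tool"] for step in plan}
    for tool in sorted(MANDATORY_TOOLS - used):
        reasons.append(f"missing mandatory tool: {tool}")
    for step in plan:
        signature = TOOL_SIGNATURES.get(step["tool"])
        if signature is None:
            reasons.append(f"unknown tool: {step['tool']}")
            continue
        for name, expected in signature.items():
            value = step["inputs"].get(name)
            # "$1.roas"-style references are resolved at run time, so skip them.
            if isinstance(value, str) and value.startswith("$"):
                continue
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                reasons.append(f"{step['tool']}.{name} has wrong or missing value")
    return reasons
```

Returning reasons rather than a bare pass/fail lets the same messages go into logs and back into a planner retry prompt.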

🤖 Quick Demo Prompt

[
  {
    "role": "system",
    "content": "You are a **Planner**. Output strict JSON. Task: Reduce spend on low-ROAS adsets."
  },
  {
    "role": "user",
    "content": "Goal: keep daily ROAS ≥ 3 for campaign 8723 on Meta. If ROAS < 3, cut bid 20 %. Then push report to Slack."
  }
]

Expected JSON

[
  { "step": 1, "tool": "meta.getStats", "inputs": {"campaignId": 8723}, "outputs": ["roas"] },
  { "step": 2, "tool": "compute.checkRoas", "inputs": {"roas": "$1.roas", "threshold": 3}, "outputs": ["action"] },
  { "step": 3, "tool": "meta.updateBid", "inputs": {"campaignId": 8723, "percent": -20}, "outputs": [], "condition": "$2.action == 'cut'" },
  { "step": 4, "tool": "slack.post", "inputs": {"channel": "#ad-ops", "text": "ROAS + actions taken"}, "outputs": [] }
]

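The Executor loops through the steps in order, resolving $1.roas from step 1's output before it calls step 2, so no variable is lost between steps. Step 3 runs only when its condition holds.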
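
A deterministic executor for a plan like this can be plain code. The sketch below is illustrative only: the tools are stubs with made-up return values, and the condition grammar is assumed to be limited to the `$N.name == 'literal'` form seen in step 3, since the source does not define one.

```python
import re

REF = re.compile(r"^\$(\d+)\.(\w+)$")                # matches "$1.roas"
COND = re.compile(r"^\$(\d+)\.(\w+) == '([^']*)'$")  # matches "$2.action == 'cut'"


def resolve(value, results):
    """Replace a "$step.name" reference with the stored output value."""
    match = REF.match(value) if isinstance(value, str) else None
    return results[int(match.group(1))][match.group(2)] if match else value


def run_plan(plan, tools):
    """Run steps in order; return {step number: outputs dict} for executed steps."""
    results = {}
    for step in plan:
        condition = step.get("condition")
        if condition is not None:
            match = COND.match(condition)
            if match is None:
                raise ValueError(f"unsupported condition: {condition}")
            step_no, name, literal = match.groups()
            if results.get(int(step_no), {}).get(name) != literal:
                continue  # condition is false: skip the step, keep going
        inputs = {key: resolve(val, results) for key, val in step["inputs"].items()}
        results[step["step"]] = tools[step["tool"]](**inputs) or {}
    return results


# Stub tools standing in for real API clients (all hypothetical).
calls = []
tools = {
    "meta.getStats": lambda campaignId: {"roas": 2.4},
    "compute.checkRoas": lambda roas, threshold: {"action": "cut" if roas < threshold else "keep"},
    "meta.updateBid": lambda campaignId, percent: calls.append(("bid", campaignId, percent)),
    "slack.post": lambda channel, text: calls.append(("slack", channel)),
}
plan = [
    {"step": 1, "tool": "meta.getStats", "inputs": {"campaignId": 8723}, "outputs": ["roas"]},
    {"step": 2, "tool": "compute.checkRoas", "inputs": {"roas": "$1.roas", "threshold": 3}, "outputs": ["action"]},
    {"step": 3, "tool": "meta.updateBid", "inputs": {"campaignId": 8723, "percent": -20}, "condition": "$2.action == 'cut'"},
    {"step": 4, "tool": "slack.post", "inputs": {"channel": "#ad-ops", "text": "ROAS + actions taken"}},
]
results = run_plan(plan, tools)
```

With the stubbed ROAS of 2.4, step 2 returns "cut", so the bid update runs before the Slack post. Keeping the condition grammar this narrow avoids evaluating arbitrary planner output as code.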