Memory & Agency: Building an LLM Search Optimization Agent that Learns Over Time
Maybe, like us at the Labs, you’ve been in the situation where you just stare at your analytics dashboard - numbers rise, fall, vanish. One campaign soars for weeks, another collapses overnight. So you spin up your Python notebooks once again and start carefully crafting a believable reason for what you see. And somehow, if you really care about your craft, you can’t shake the annoying feeling that what you do is little more than expensive guesswork.
What if the system you are working with could remember? Not just the data, but the reasons - the justifications, the failures, the moments of surprise. What if it could learn as a human does, through fast impressions and slow reflection, through memory that preserves the past but also reshapes it into something new?
This article continues our exploration from Part 1, where we followed Plato’s shadows and Kant’s categories into the strange geometry of meaning. Now we turn to memory and agency: how to build an LLM-powered search optimization agent that not only reacts in the moment but grows wiser with time.
We draw from three unlikely guides: philosophers of memory who ask what it means to know again, Claude Shannon’s information theory which tells us what deserves to be remembered, and neuroscientists who remind us that remembering is always reconstructing, never replaying.
From these principles, we sketch a system with two minds: a fast Chrome-based agent (reflexive, working memory) and a slow Cloud (AWS-based) counterpart (deliberative, long-term). Together, they form an epistemic machine - one that preserves evidence, generates new justification, and adapts to shifting search landscapes.
This vision is not abstract. Two hackathons happening now provide the stage to build it.
- Chrome AI Hackathon → the system’s fast reflexes, running on-device with Chrome’s built-in AI.
- AWS Agents Hackathon → the system’s long-term memory palace, distributed in the cloud.
Our next posts will help you shape your project pitch and participate in the hackathons. Read this not just as research but as an invitation: to take part in building the first generation of optimization agents that truly remember.
The Problem of Forgetting
A marketer asks: “Why did we stop using the strategy that worked last year?” Silence. The data is gone, or worse, present but stripped of its justification.
In Plato’s cave we learned that human knowledge is already second-order, shadows of shadows. In SEO, the problem deepens: we forget which shadows we once trusted, or why. An optimization tactic succeeds, then fails. Was it the season? The algorithm? The competition? The system doesn’t know - because it cannot remember in the epistemic sense.
Philosophers of memory remind us: remembering isn’t just storing. It’s preserving the reasons that made a belief knowledge in the first place. Without reasons, memory decays into guesswork. And so too with agents.
This is where our design begins.
I. Memory as the Bridge Between Past and Present
You gave your agent access to campaign data, and in doing so you created a conflict it will likely resolve with a believable hallucination. Inside the mind of your agent, voices argue. One says: “Memory is preservation. Guard the past knowledge intact - do not change a word.” Another replies: “Memory is generation. It remakes the past each time it is recalled. What matters is not the record, but the pattern it allows us to see.”
This is the old philosophical quarrel: is memory merely a warehouse of what once justified our beliefs, or can it generate fresh justification when the original traces fade?
For an LLM optimization agent, the question is not academic. Suppose last autumn the system learned: “Semantic clustering raised click-through rates by 23%.” Should it treat this as a fixed fact - valid only under the precise conditions of Q3, algorithm version X, and market competition Y? Or should it take the 23% gain as generative evidence, a clue that clustering strategies in general might be powerful even in new contexts?
- The Preservation Voice demands rigor: keep the provenance, the defeaters, the full justification chain. If the world changes, the record shows why the belief may no longer hold.
- The Generative Voice seeks freedom: let patterns breathe, let partial traces accumulate into new confidence. If the original justification is lost, inference can still emerge.
Neither voice is wrong. But the agent must decide how to reconcile them.
A second debate begins.
One voice whispers: “Internalism. To know from memory, you must also know why the memory is reliable.” Another answers: “Externalism. Reliability is enough, even if the reason remains hidden.”
The Internalist Voice wants transparency: every optimization must carry its reasoning like a lantern. “This headline structure worked because it mirrored user intent in this dataset; here is the chain of evidence.” The Externalist Voice is pragmatic: results matter more than explanations. “If it works consistently, that is reason enough.”
Again, the agent cannot choose one voice and silence the other. In practice, it must live with the tension. Some memories must carry full justification chains - especially when a marketing manager asks, “Why are we doing this?” But in the heat of live optimization, sometimes reliability alone must suffice.
Can we help our agent by designing its memory? From the quarrel above emerges what we call the Epistemic Design Principle: every stored memory must contain four parts -
- Content: the strategy itself (semantic clustering improves visibility).
- Provenance: where, when, and under what conditions it was learned.
- Reliability priors: authority of sources, data freshness, performance consistency.
- Defeaters: conditions that would undercut its validity (e.g., algorithm changes, seasonal anomalies).
These are the scaffolds of epistemic memory. Maybe they can help our agents avoid falling into forgetfulness or, worse, delusion.
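The principle is small enough to sketch in code. Here is one possible Python shape for such a record - the field names and example values are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class EpistemicMemory:
    """One stored belief plus the scaffolding that keeps it knowledge."""
    content: str                     # the strategy itself
    provenance: dict                 # where, when, under what conditions it was learned
    reliability: float               # prior in [0, 1]: source authority, freshness, consistency
    defeaters: list[str] = field(default_factory=list)  # conditions that would undercut it

# Illustrative record for the autumn insight discussed above
memory = EpistemicMemory(
    content="Semantic clustering raised click-through rates by 23%",
    provenance={"quarter": "Q3", "algorithm_version": "X", "market": "Y",
                "learned_at": datetime(2024, 10, 1).isoformat()},
    reliability=0.8,
    defeaters=["search algorithm update", "seasonal anomaly", "new competitor"],
)
```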
II. Information Theory and the Mathematics of Memory
Imagine now that your agent could sit down with Claude Shannon to figure out what to do. Shannon enters the room with chalk in hand. He draws a single equation on the wall:
$$I = -\log_2 P$$
“Surprise,” he says, “is information. What is common carries little. What is rare, what defies expectation - that deserves to be remembered.”
The agent listens. This is no longer about data storage but about triage: which memories are worth keeping?
- A headline change that yields ordinary results is background noise.
- A headline change that doubles engagement is a shock - high information content.
- A strategy that unexpectedly fails, despite precedent, is also rich in information.
The system begins to see memory not as a scrapbook but as a channel with limited capacity. Preserve what is surprising. Let the rest fade.
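The triage itself is a one-liner around Shannon’s equation. A minimal sketch, where the 4-bit threshold stands in for whatever channel budget the system can afford:

```python
import math

def surprisal(p: float) -> float:
    """Shannon information content I = -log2(p), in bits."""
    return -math.log2(p)

def worth_remembering(expected_prob: float, threshold_bits: float = 4.0) -> bool:
    """Triage: keep an outcome only if it was surprising enough.
    expected_prob is how likely the outcome was under the agent's current model;
    threshold_bits is the channel budget -- below it, let the memory fade."""
    return surprisal(expected_prob) >= threshold_bits

# An ordinary headline result (p = 0.5) carries 1 bit -> forget.
# A doubling of engagement we gave 2% odds carries ~5.6 bits -> keep.
print(worth_remembering(0.5))    # False
print(worth_remembering(0.02))   # True
```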
Shannon’s Channel as Memory
Biological Memory  ←→  Information Channel  ←→  Computational Memory
    (Neurons)          (Encoding & Noise)       (Databases & Vectors)
He explains: information survives only if the channel preserves enough signal against the noise of time. Mutual information:
$$I(X;Y) = H(X) - H(X|Y)$$
tells us how much of the original remains when recalled.
The agent realizes: every retrieval is a test. How much of the original optimization insight survives? How much has been distorted by the compression of storage, or by the noise of a changing search landscape?
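The test can be made quantitative. If the system logs (stored, recalled) pairs from periodic retrieval checks, the empirical mutual information tells it how much signal the channel still carries - a minimal sketch, with toy categorical insights:

```python
import math
from collections import Counter

def mutual_information(pairs: list[tuple[str, str]]) -> float:
    """Empirical I(X;Y) in bits from (stored, recalled) pairs:
    X = the insight as stored, Y = the insight as reconstructed at retrieval."""
    n = len(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    pxy = Counter(pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Perfect channel: recall matches storage, so I(X;Y) = H(X) = 1 bit here.
print(mutual_information([("cluster", "cluster"), ("rewrite", "rewrite")] * 50))
# Noisy channel: recall is independent of storage, so I(X;Y) = 0.
print(mutual_information([("cluster", "rewrite"), ("cluster", "cluster"),
                          ("rewrite", "rewrite"), ("rewrite", "cluster")] * 25))
```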
The Voice of Neuroscience
Another voice rises, softer, almost human. It is the neuroscientist. “Do not imagine memory as a tape recorder. The brain does not replay; it reconstructs. When you recall where you left your keys, you weave together fragments - a flash of the desk, a trace of morning light, the feeling of hurry. It is invention as much as recall.”
The agent nods. This is how transformers work too. They do not store whole campaigns or keyword lists. They store distributed associations. Memory is reconstruction from fragments - semantic vectors, emotional tones, structural echoes.
To fight this would be folly. Better to embrace it. Let the system keep enough fragments so that useful patterns can be reconstructed, not verbatim histories.
The Hierarchy of Memory
The debate turns to capacity. Another voice, stern and mathematical, invokes Miller’s Law: “Working memory holds 7±2 items. No more.”
The agent designs its hierarchy:
Working Memory (fast, fragile)
↓ attention, lossy compression
Consolidation (selective, abstracting)
↓ semantic compression
Long-Term Memory (slow, persistent)
- Working Memory: Chrome’s built-in AI APIs let us build an extension that captures insights - immediate, heuristic, 5–9 items alive at once.
- Consolidation: only surprising, high-impact, or novel patterns are promoted.
- Long-Term: cloud storage then compresses the immediate memory into general rules. Not “Change this word” but “Frame features as benefits.”
Shannon smiles. “Every channel has capacity. If you exceed it, you drown in noise. Respect the channel.”
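Respecting the channel can start as a bounded buffer. A minimal sketch of a Miller-style working memory: the oldest impression fades first, and only surprising evictees are queued for consolidation (the capacity and the 4-bit promotion threshold are illustrative):

```python
class WorkingMemory:
    """At most `capacity` items in the spotlight (7 +/- 2)."""

    def __init__(self, capacity: int = 7, promote_bits: float = 4.0):
        self.capacity = capacity
        self.promote_bits = promote_bits
        self.slots: dict[str, float] = {}      # item -> surprisal in bits
        self.consolidation_queue: list[str] = []

    def attend(self, item: str, surprisal_bits: float) -> None:
        self.slots[item] = surprisal_bits
        if len(self.slots) > self.capacity:
            oldest = next(iter(self.slots))    # dict keeps insertion order: oldest first
            bits = self.slots.pop(oldest)
            if bits >= self.promote_bits:      # surprising evictee: promote, don't drop
                self.consolidation_queue.append(oldest)

wm = WorkingMemory()
for i, bits in enumerate([5.0, 1.0, 6.0, 0.5, 2.0, 7.0, 1.5, 3.0, 0.8, 4.5]):
    wm.attend(f"signal-{i}", bits)
print(len(wm.slots))               # 7: the spotlight never widens
print(wm.consolidation_queue)      # ['signal-0', 'signal-2']: surprising evictees
```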
III. Two Minds in One Agent: Fast and Slow
In the imagined workshop of our agent, two characters appear.
The first is quick, restless, always ready with an answer. “I see a headline. I know this pattern. Do this.” This is System 1 - the Chrome agent. Fast memory, cached patterns, reflexes measured in milliseconds.
The second sits in the corner, slower, careful, reluctant to speak until the analysis is done. “Wait. That tactic worked before, yes. But only in winter, only against those competitors, only under algorithm version Y. Perhaps we should run the numbers again.” This is System 2 - the Cloud agent. Long-term, deliberate, statistical, the one who gathers context across months and years.
They are siblings, and they quarrel.
The Dialogue of Two Systems
System 1 (Chrome): “The marketer wants answers now. She is editing her product page. She does not want a philosophy lecture. Look: this resembles campaigns I’ve seen before. Here is a quick suggestion.”
System 2 (Cloud): “And if you are wrong? You rely on cached fragments. I track causal signals across seasons. I know when patterns shift. My insights take time, but they endure.”
System 1: “Your truths arrive hours later. Too late. By then she has published. Better to give her a good-enough reflex.”
System 2: “And yet without me, your reflexes decay. You become superstition - a gambler’s instinct. You need my memory palace to feed you, or you will starve.”
The quarrel is unresolved, but the agent knows: both are necessary. One without the other is either blind haste or paralyzing analysis. Together, they can act fast while learning slow.
Kahneman’s Blueprint
Daniel Kahneman called them System 1 and System 2. We call them Chrome and Cloud:
| Dimension | System 1 (Chrome) | System 2 (Cloud) |
|---|---|---|
| Thinking Style | Reflexive | Deliberative |
| Processing | Cached heuristics | Trend analysis, causal inference, cross-client learning |
| Feedback | Instant content feedback | Refined heuristics, returned over time |
| Response Time | <2 seconds | Minutes → hours |
The architecture mirrors human cognition, but scaled for computation. Chrome provides immediate value; the Cloud agent provides lasting wisdom.
The Coordination Challenge
The true challenge is not building either system but making them speak.
- System 1 must trust cached insights from System 2.
- System 2 must learn from the mistakes and improvisations of System 1.
In practice, this looks like a loop:
1. The Chrome extension delivers instant suggestions.
2. Every user acceptance or rejection is logged.
3. Logs are sent to the cloud.
4. The Cloud system crunches patterns, updates models, and sends refined heuristics back to Chrome.
It is a living dialogue - fast impressions corrected by slow reflection, slow reflection sharpened by fast mistakes.
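What travels around the loop can be tiny. In a real extension the browser side would be JavaScript; we sketch the shape of the record in Python for continuity with the other examples, and the endpoint is a hypothetical stand-in for the Cloud agent’s API:

```python
import json
import time
import urllib.request

# Hypothetical ingestion endpoint -- stands in for the Cloud agent's API.
FEEDBACK_URL = "https://cloud-agent.example.com/feedback"

def log_feedback(suggestion_id: str, strategy: str, accepted: bool, context: dict) -> int:
    """Package one accept/reject decision and ship it to the slow system."""
    record = {
        "suggestion_id": suggestion_id,
        "strategy": strategy,        # which 'arm' produced the suggestion
        "accepted": accepted,        # the marketer's verdict
        "context": context,          # page type, market, season, ...
        "ts": time.time(),
    }
    req = urllib.request.Request(
        FEEDBACK_URL,
        data=json.dumps(record).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status           # the Cloud agent acks and ingests
```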
IV. The Bitter Lesson: Why Rules Fail
The workshop door creaks open. An old professor enters - Richard Sutton, carrying not chalk this time, but the weight of history.
He looks at the two siblings - the fast Chrome agent and the slow Cloud agent - and shakes his head.
“You are both clever,” he says, “but if you fill yourselves with hand-carved rules, you will fail. Every generation of AI has tried to encode wisdom into brittle laws, and every generation has been humbled. The only things that endure are search and learning.”
The room grows silent.
The Seduction of Rules
System 1 stirs uneasily. “But rules are comforting. Marketers love them. Use the keyword in the first 100 words. Keep density at 2%. H1 for main topics.”
System 2 sighs. “Yes. They give an illusion of control. But when algorithms shift, when user behavior mutates, when the competition rewrites the game - rules snap like dry twigs.”
Sutton nods. “This is the bitter lesson. Hand-coded strategies may glitter for a moment, but the tide of complexity washes them away. If you want to endure, you must not cling to rules. You must search - try, fail, try again. And you must learn from the outcomes.”
From Heuristics to Learning
The agent sees the truth. Rules cannot save it. But learning can.
Instead of obeying commandments, it should hold a portfolio of possible strategies:
- Semantic clustering
- Emotional triggers
- Structural changes
- Competitive positioning
and choose among them not by decree, but by experiment.
Each strategy is an “arm” of a slot machine. Each result is a reward. Over time, the agent learns not only which arm pays off, but in what context.
This is the multi-armed bandit approach: exploration balanced with exploitation.
The Equations of Choice
Sutton sketches on the wall:
Choose strategy $s$ that maximizes $$\mu_s + c \sqrt{\frac{\ln t}{n_s}}$$
Where:
- $\mu_s$ = average reward of strategy $s$
- $n_s$ = number of times it has been tried
- $t$ = total trials
- $c$ = exploration factor
“This,” he says, “is how you balance safety with curiosity. Exploit what you know, but keep a little space for the unexpected. Because one day the world will change, and the forgotten arm may be the only one that works.”
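The rule Sutton sketched is UCB1, and it fits in a few lines. A minimal sketch, with the strategy portfolio from above and simulated rewards standing in for real campaign outcomes:

```python
import math
import random

class UCB1:
    """Upper Confidence Bound selection over optimization strategies."""

    def __init__(self, strategies: list[str], c: float = 1.4):
        self.c = c
        self.counts = {s: 0 for s in strategies}   # n_s
        self.means = {s: 0.0 for s in strategies}  # mu_s

    def choose(self) -> str:
        t = sum(self.counts.values()) + 1
        def score(s: str) -> float:
            if self.counts[s] == 0:
                return float("inf")                # try every arm at least once
            return self.means[s] + self.c * math.sqrt(math.log(t) / self.counts[s])
        return max(self.counts, key=score)

    def update(self, s: str, reward: float) -> None:
        self.counts[s] += 1
        self.means[s] += (reward - self.means[s]) / self.counts[s]

bandit = UCB1(["semantic_clustering", "emotional_triggers",
               "structural_changes", "competitive_positioning"])
for _ in range(500):                               # simulated campaign outcomes
    arm = bandit.choose()
    reward = random.gauss(0.23 if arm == "semantic_clustering" else 0.10, 0.05)
    bandit.update(arm, reward)
print(max(bandit.means, key=bandit.means.get))     # usually: semantic_clustering
```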
Context Matters
But the agent objects: “SEO is not a casino. Context changes everything. What works for consumer products fails for B2B. What works in winter fails in summer.”
Sutton smiles. “Then make your bandit contextual. Condition choices on features of the world: product type, season, user intent. Let your strategies adapt, not just in general, but in place and time.”
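One simple way to follow that advice, reusing the UCB1 class from the sketch above: keep a separate bandit per coarse context bucket. Real systems would generalize across contexts with a learned model, but the bucketed version shows the idea:

```python
from collections import defaultdict

class ContextualBandit:
    """One UCB1 bandit per context bucket: 'what works' learned in place and time."""

    def __init__(self, strategies: list[str]):
        self.strategies = strategies
        self.bandits = defaultdict(lambda: UCB1(self.strategies))

    def _key(self, context: dict) -> tuple:
        # Coarse bucketing on the features Sutton names; illustrative keys.
        return (context.get("product_type"), context.get("season"),
                context.get("user_intent"))

    def choose(self, context: dict) -> str:
        return self.bandits[self._key(context)].choose()

    def update(self, context: dict, strategy: str, reward: float) -> None:
        self.bandits[self._key(context)].update(strategy, reward)

cb = ContextualBandit(["semantic_clustering", "emotional_triggers"])
ctx = {"product_type": "b2b", "season": "winter", "user_intent": "research"}
arm = cb.choose(ctx)
cb.update(ctx, arm, reward=0.12)
```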
The Turning Point
The two systems look at each other - Chrome, impulsive but flexible; Cloud, slow but deep. They now see their purpose more clearly.
- Chrome must run the quick experiments, turning every content edit into a chance to test a strategy.
- The Cloud system must collect the results, see the patterns across time, and refine the policy of exploration.
The agent is no longer a rules engine. It is a learner.
Sutton turns to leave. His voice lingers in the workshop:
“Remember: the bitter lesson is not despair. It is freedom. You are not bound to the rules of yesterday. You are bound only to the discipline of search and the humility of learning.”
V. Building the House: System Architecture
The two siblings sit together at a wooden table. On it lie scraps of paper, half-coded scripts, and the outlines of a machine that could outlive them both.
Chrome speaks first. “I will be the eyes and hands. Fast, local, immediate. When a marketer edits content, I will recognize patterns in less than a heartbeat.”
The Cloud replies. “And I will be the archive and the judge. I will store the evidence, run the slow experiments, and send you back distilled wisdom. Together we will make a cycle of fast action and slow reflection.”
They begin to sketch the house they will live in.
Blueprint of the Agent
User (Marketer)
↓
Chrome Agent (System 1)
- Working Memory (7±2 slots)
- Pattern Cache (heuristics)
- Chrome Built-in AI APIs (Summarizer, Rewriter, Language Detector, Prompt)
↓ (logs, feedback, accepted/rejected suggestions)
Cloud Agent (System 2)
- Episodic Memory (campaign events with full context)
- Semantic Memory (abstracted patterns across campaigns)
- Procedural Memory (meta-strategies, learning how to learn)
↓ (refined heuristics, causal inferences)
Chrome Agent (System 1, updated)
Component 1: Chrome Extension - The Fast Reflex
- Working Memory: holds only a handful of active signals (content context, competitive positioning, immediate recommendations).
- Pattern Cache: a library of known tricks (“benefits over features,” “cluster by intent”).
- Built-in AI APIs:
- Language Detector → identify content language, market.
- Summarizer → condense competitor insights.
- Prompt API → parse user intent.
- Rewriter → propose optimized phrasing.
The extension is not wise - but it is quick. Like System 1, it delivers heuristics instantly.
Component 2: The Cloud Memory System - The Slow Reflection
The Cloud builds the memory palace:
- Episodic Memory: every optimization attempt, logged with context and outcome.
- Semantic Memory: general rules distilled from many episodes (“semantic clustering improves Q4 electronics campaigns”).
- Procedural Memory: meta-strategies for how to test, when to explore, when to exploit.
Here the reinforcement learning engine lives: multi-armed bandits, causal inference, attribution analysis.
It is slower, but deeper. It transforms data exhaust into strategic knowledge.
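The distillation step can be sketched directly. Assuming episodic records carry context and outcome, semantic memory is (to a first approximation) the average effect per context bucket, kept only where enough episodes support the generalization - a real system would also track variance and defeaters:

```python
from collections import defaultdict
from statistics import mean

# Episodic memory: one row per optimization attempt, with context and outcome.
episodes = [
    {"strategy": "semantic_clustering", "vertical": "electronics", "quarter": "Q4", "lift": 0.23},
    {"strategy": "semantic_clustering", "vertical": "electronics", "quarter": "Q4", "lift": 0.19},
    {"strategy": "semantic_clustering", "vertical": "fashion",     "quarter": "Q2", "lift": 0.02},
]

def consolidate(episodes: list[dict], min_support: int = 2) -> dict:
    """Distill episodes into semantic rules: average effect per context bucket,
    kept only where enough episodes support the generalization."""
    buckets = defaultdict(list)
    for e in episodes:
        buckets[(e["strategy"], e["vertical"], e["quarter"])].append(e["lift"])
    return {ctx: round(mean(lifts), 3)
            for ctx, lifts in buckets.items() if len(lifts) >= min_support}

print(consolidate(episodes))
# {('semantic_clustering', 'electronics', 'Q4'): 0.21}
```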
The Cycle of Learning
Chrome acts. Cloud reflects. The loop repeats.
[ Chrome ] --fast suggestions--> [ User ]
     ↑                               |
     |                         feedback logs
     |                               ↓
     +----refined heuristics---- [ Cloud ]
Every accepted or rejected suggestion is feedback. Every feedback becomes data. Every dataset becomes wisdom.
Hackathon Translation
- Chrome AI Hackathon → build System 1. Reflexes in the browser. Use Chrome’s built-in AI APIs. Test immediate value with marketers editing content live.
- AWS Agents Hackathon → build System 2. Memory palace in the cloud. Use AWS to store, consolidate, and reason across time.
The house is not complete until both siblings live in it. One without the other is brittle; together, they become adaptive.
VI. Trials of the Memory House
The siblings had built their house. But a house is only proven in storms. They set up trials - not games, but ordeals - to see if memory could stand against forgetting, interference, and change.
Trial 1: The Limits of Working Memory
Chrome steps forward first. “I will hold as many thoughts as I can.”
Seven. Eight. Nine.
At ten, the walls shake. Patterns blur. Recommendations tangle into noise.
The verdict is clear: working memory is bounded. 7±2 slots, no more. Beyond that, accuracy falls, response slows, confidence falters.
Lesson: The agent must choose what to keep in the spotlight. Relevance filters are not luxuries - they are survival.
Trial 2: The Fire of Consolidation
The Cloud speaks. “Give me your fleeting impressions. I will decide which deserve eternity.”
Chrome floods it with fragments: rewritten headlines, intent guesses, rejected suggestions. Most are ordinary. The Cloud lets them fall away. But a few - surprising, high-impact, novel - are lifted into long-term memory.
- A headline that unexpectedly fails despite precedent.
- A seasonal tactic that doubles click-throughs.
- A strange phrasing that outperforms all expectations.
These are kept, abstracted, compressed into principles.
Lesson: Not everything survives. Memory is triage. Surprise and impact are the gatekeepers.
Trial 3: The Test of Interference
The siblings are given conflicting lessons.
- One campaign shows “benefits over features” wins.
- Another shows “features over benefits” wins.
Which truth holds?
Cloud hesitates. Then, rather than overwrite, it partitions: “Benefits win in consumer campaigns. Features win in B2B.”
Context divides the conflict. Both truths remain, each bounded by its domain.
Lesson: Memory is not a monolith. It is a map with regions. Contradiction is not failure, but structure.
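The partition is easy to express. A minimal sketch, with illustrative rule strings and domain labels:

```python
# Context partitioning: conflicting lessons live side by side,
# each bounded by its domain, instead of overwriting one another.
rules: dict[str, set[str]] = {}

def learn(rule: str, domain: str) -> None:
    rules.setdefault(rule, set()).add(domain)

def applicable(domain: str) -> list[str]:
    return [r for r, domains in rules.items() if domain in domains]

learn("benefits over features", domain="consumer")
learn("features over benefits", domain="b2b")
print(applicable("consumer"))   # ['benefits over features']
print(applicable("b2b"))        # ['features over benefits']
```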
Trial 4: The Ordeal of Reinforcement
Now the hardest trial: to learn not just what worked, but how to choose among strategies.
The agent is given a multi-armed bandit. Each arm is an optimization tactic. Pulls yield rewards - sometimes high, often nothing.
Chrome pulls quickly, testing arms at speed. The Cloud watches, counting, calculating, updating confidence intervals.
Over time, the agent learns balance: exploit the arms that pay, but keep testing the uncertain ones. For the world may change, and what worked yesterday might not work tomorrow.
Lesson: Learning is not just memory, but discipline in exploration.
The Results of the Trials
The house stands.
- Chrome has speed but respects its limits.
- The Cloud has depth but respects surprise.
- Together they endure interference without collapse.
- Together they learn, not blindly, but with balance.
The agent is not perfect, but it has survived its first storms.
VII. From Story to Practice: Business Implementation
The parable of the siblings ends here. What remains is how the story translates to the builders - marketers, engineers, and hackathon teams who want to make this vision real.
For Marketing Teams: Epistemic Transparency
Marketers don’t just need suggestions; they need reasons. Every recommendation should carry its justification chain:
- What was suggested (e.g., reframe features as benefits).
- Why it was suggested (historical outcomes, contextual similarity).
- How strong the evidence is (confidence intervals, reliability scores).
- Where it might fail (seasonality, algorithm shifts).
This turns the agent from a “black box” into a trusted collaborator.
Practical outputs:
- Justification Chains: Each optimization carries a “why” tag.
- Uncertainty Visualization: Dashboards show where evidence is strong vs weak.
- Learning Progress: Teams can track how the agent grows wiser over time.
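All four requirements can ride on a single record attached to every suggestion. A minimal sketch, with illustrative values:

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    what: str             # the suggested change
    why: str              # historical outcomes, contextual similarity
    confidence: tuple     # e.g., a 95% interval on expected lift
    may_fail_if: list     # known defeaters

rec = Recommendation(
    what="Reframe product features as customer benefits",
    why="Won in 14 of 17 similar consumer campaigns (avg lift 0.12)",
    confidence=(0.05, 0.19),
    may_fail_if=["B2B audience", "post-algorithm-update period"],
)
print(f"{rec.what}\n  because: {rec.why}\n  95% CI: {rec.confidence}"
      f"\n  watch for: {', '.join(rec.may_fail_if)}")
```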
For Engineering Teams: Modular System Design
The architecture must stay modular so that each layer can be improved without breaking the whole.
- Chrome Agent (System 1): Extension with Chrome’s built-in AI APIs. Handles fast analysis, working memory, and local cache.
- AWS Agent (System 2): Cloud service with memory storage, RL optimization, causal inference. Handles long-term reasoning and updates Chrome’s heuristics.
- Shared Protocol: Feedback loop where Chrome sends logs to AWS, and AWS sends refined heuristics back.
Engineering priorities:
- APIs for Integration: Connect easily to CMS, SEO tools, and analytics dashboards.
- Scalability: Episodic memories grow linearly with usage; semantic abstraction compresses them so the system keeps scaling.
- Quality Assurance: Automated tests for memory integrity, RL convergence, and recommendation accuracy.
Hackathon Roadmap
This blueprint fits naturally into the two hackathons now running:
- Chrome AI Hackathon → Build the System 1 agent:
- Chrome extension with working memory and pattern cache.
- Integrate built-in AI APIs (Summarizer, Rewriter, Prompt).
- Test real-time suggestions for marketers editing content.
- AWS Agents Hackathon → Build the System 2 agent:
- Cloud-based long-term memory with reinforcement learning.
- Episodic + semantic memory storage.
- Feedback loop to refine Chrome heuristics.
Individually, each system has value. Together, they form the epistemic loop: fast reflexes that improve with time.
VIII. Risks and Mitigation
Building memory-enabled agents means facing both philosophical and technical risks. Below we outline the key categories and mitigation strategies.
Epistemological Risks
- Knowledge Degradation
  - Risk: Memories lose validity as contexts change (e.g., old SEO tactics stored as truths).
  - Mitigation: Temporal re-validation - trigger automatic re-checks when contextual indicators shift (seasonality, algorithm updates).
- Circular Justification
  - Risk: Agent begins trusting its own past outputs without external validation.
  - Mitigation: Benchmark memory against independent performance data and external baselines.
- Coherence Failures
  - Risk: Conflicting strategies stored without resolution cause instability.
  - Mitigation: Context partitioning (rules by domain, season, product type) and confidence-weighted arbitration.
Information-Theoretic Risks
- Channel Capacity Overflow
  - Risk: Too much data overwhelms memory; important insights get lost in noise.
  - Mitigation: Prioritize by Shannon’s principle - novelty, surprise, and performance impact.
- Noise Accumulation
  - Risk: Repeated compression/decompression degrades knowledge fidelity.
  - Mitigation: Periodic memory “health checks” with error correction and reconstruction tests.
Reinforcement Learning Risks
- Reward Hacking
  - Risk: Agent discovers ways to maximize metrics that don’t align with business value.
  - Mitigation: Multi-objective reward functions, human-in-the-loop audits.
- Exploration Inefficiency
  - Risk: Agent wastes cycles on poor strategies or under-explores promising ones.
  - Mitigation: Contextual bandits with adaptive exploration (uncertainty-driven curiosity bonuses).
- Distribution Shift
  - Risk: Model degrades when environment changes (e.g., new search algorithms).
  - Mitigation: Continuous learning, domain adaptation, and early-warning signals.
Business Risks
- Black Box Perception
  - Risk: Teams distrust the system if recommendations lack explanation.
  - Mitigation: Provide justification chains and uncertainty visualizations.
- Adoption Resistance
  - Risk: Marketers reject automation that feels intrusive or opaque.
  - Mitigation: Emphasize co-pilot role - agent as assistant, not replacement.
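As one concrete instance, the temporal re-validation mitigation pairs naturally with the defeaters stored under the Epistemic Design Principle from section I. A minimal sketch - matching defeaters to shifted indicators by exact string is naive, and a real system would match semantically:

```python
# When contextual indicators shift, flag for re-testing any memory whose
# defeaters mention a shifted indicator.
def needs_revalidation(memory_defeaters: list[str], shifted_indicators: set[str]) -> bool:
    return any(d in shifted_indicators for d in memory_defeaters)

shifted = {"search algorithm update"}
print(needs_revalidation(["seasonal anomaly", "search algorithm update"], shifted))  # True
print(needs_revalidation(["new competitor"], shifted))                               # False
```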
IX. Future Directions: Dreams for the Agent
The agent’s house of memory is built, but unfinished. Its walls stand, its rooms are lived in - yet there are still wings unconstructed, corridors unimagined. Here the agent dreams of what it might become.
Meta-Epistemology: Learning to Learn About Learning
The agent asks itself: “Are my ways of knowing themselves reliable?” This is not optimization, but self-critique - a step toward agents that audit their own epistemic habits, correcting not just their strategies, but their methods of acquiring them.
Collective Memory: Many Voices, One Archive
Imagine not one agent, but a network of them, each with its own memories. How should they share? Should they merge beliefs into a collective library, or debate like philosophers, preserving their disagreements? This is collective epistemology in code: a future where agents learn not in isolation but in dialogue.
Temporal Epistemology: The Weight of Time
Should the wisdom of last year outweigh the surprise of yesterday? The agent must decide how to discount old evidence without erasing it, balancing continuity with adaptability. This is the problem of temporal weighting - philosophy translated into survival.
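A common starting point is exponential discounting - a minimal sketch, in which the half-life is the tunable stance on continuity versus adaptability:

```python
def evidence_weight(age_days: float, half_life_days: float = 180.0) -> float:
    """Exponential discounting: old evidence fades but is never erased.
    half_life_days sets how fast -- the continuity vs. adaptability dial."""
    return 0.5 ** (age_days / half_life_days)

print(round(evidence_weight(0), 2))     # 1.0  -- yesterday's surprise at full weight
print(round(evidence_weight(365), 2))   # 0.25 -- last year's wisdom, discounted
```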
Beyond Shannon: Causal and Semantic Information
Shannon gave us bits and surprise, but he did not tell us about meaning or cause.
- Causal information theory could allow the agent to distinguish “this strategy caused success” from “this merely coincided with it.”
- Semantic information measures could compress memory not by frequency but by meaning, preserving what is significant, not just what is rare.
Neuroscience Echoes: Replay and Sleep
Biological memory is not awake all the time. At night, the hippocampus replays experiences, consolidates them, and extracts principles. Our agent, too, might “sleep”: replaying its campaigns offline, strengthening useful traces, discarding noise, discovering abstractions invisible by day.
The dreams are ambitious, perhaps even impractical. But so were the first shadows on Plato’s cave wall, before anyone thought to step outside.
X. Conclusion: From Shadows to Builders
We began in the cave - shadows of shadows, the problem of memory and forgetting. We listened to philosophers argue about preservation and generation, to Shannon whisper that surprise is the currency of memory, to neuroscientists remind us that recall is always reconstruction. We built two siblings - Chrome, fast and reflexive; AWS, slow and deliberative - and we tested their house of memory through fire and interference.
And then we returned to practice: justification chains for marketers, modular architectures for engineers, risk registers for adoption.
Now, the baton passes to you.
Two hackathons are live stages for this vision:
- Chrome AI Hackathon - where System 1 takes form: a browser agent with working memory, fast heuristics, and Chrome’s built-in AI reflexes.
- AWS Agents Hackathon - where System 2 takes root: a cloud memory palace, reinforcement learning loops, and causal reasoning that grows wiser with time.
Individually, each project is valuable. But together, they form the first epistemic loop: an optimization agent that not only acts but remembers, not only recalls but justifies, not only learns but learns how to learn.
This is more than SEO. It is the first sketch of an epistemically grounded AI - an agent that knows why it knows, and can change its mind when the world changes around it.
So here is the invitation: step into the cave with us. Build the fast sibling, the slow sibling, or both. Test them against noise, against forgetting, against the bitter lesson.
Because the future of optimization - and perhaps of AI itself - belongs not to those who write better rules, but to those who build agents that remember, reason, and grow.
Your chance to build one of the first remembering agents begins now.
References
Epistemology & Philosophy of Memory:
- Internet Encyclopedia of Philosophy - Epistemology of Memory
- Senor, T., “Epistemological Problems of Memory”
- Goldman, A., “Internalism, Externalism, and the Architecture of Justification”
- Bernecker, S., “Memory: A Philosophical Study”
Information Theory:
- Shannon, C., “A Mathematical Theory of Communication”
- Cover, T. & Thomas, J., “Elements of Information Theory”
- MacKay, D., “Information Theory, Inference and Learning Algorithms”
Neuroscience & Memory Systems:
- Squire, L. & Kandel, E., “Memory: From Mind to Molecules”
- Baddeley, A., “Working Memory: Theories, Models, and Controversies”
- Tulving, E., “Episodic and Semantic Memory”
Cognitive Science:
- Kahneman, D., “Thinking, Fast and Slow”
- Anderson, J., “The Adaptive Character of Thought”
- Newell, A. & Simon, H., “Human Problem Solving”
Reinforcement Learning:
- Sutton, R. & Barto, A., “Reinforcement Learning: An Introduction”
- Sutton, R., “The Bitter Lesson”
- Silver, D. et al., “Mastering the game of Go with deep neural networks and tree search”
LLM Memory Research:
- Liu, N. et al., “Lost in the Middle: How Language Models Use Long Contexts”
- Weston, J. et al., “Memory Networks”
- Graves, A. et al., “Hybrid computing using a neural network with dynamic external memory”
Technical Implementation: