Build AI Apps with Python: AI Agent That Remembers — JSON Memory Tools | Episode 19
Video: Build AI Apps with Python: AI Agent That Remembers — JSON Memory Tools | Episode 19 by Taught by Celeste AI - AI Coding Coach
Student code: github.com/GoCelesteAI/build-ai-apps-python/tree/main/episode19 Two new tools —
save_note,recall_notes— and a JSON file. Now the agent remembers what it did last session.
The ReAct agent in Episode 18 thought, acted, and observed — but only within a single session. The moment the script exited, everything the agent had researched was gone. That works for one-shot questions. It doesn't work for an assistant you keep coming back to: a research agent, a coding assistant, a tutor that should remember what you covered yesterday.
The fix is persistent memory. We give the agent two new tools — save_note(topic, content) and recall_notes(topic) — that read and write a JSON file on disk. The agent decides what to remember and when to remember it. Across multiple runs of the same script, the memory file accumulates.
This is also a structural insight worth absorbing: memory is just another kind of tool. The model doesn't need a special "memory subsystem." Give it functions to write things and read them back, and tell it to use them, and it has memory.
What we're building
A research assistant agent with three tools:
save_note(topic, content)— adds a note to the memory file, indexed by topic.recall_notes(topic)— returns saved notes for a topic.lookup_info(query)— looks up programming-topic facts (a small mocked knowledge base).
Three sequential calls to the same agent, with the memory file persisting between them:
- "Research the history of Python and save your findings." — looks up Python, saves notes.
- "Research Rust memory management and save your findings." — looks up Rust, saves notes.
- "What do you remember about Python from your earlier research?" — recalls notes from the first session, no new lookup needed.
By session 3, the agent answers from its own memory, not from the lookup tool. That's the persistence working.
The memory layer
MEMORY_FILE = "agent_memory.json"
def load_memory():
if os.path.exists(MEMORY_FILE):
with open(MEMORY_FILE) as f:
return json.load(f)
return {}
def save_memory(memory):
with open(MEMORY_FILE, "w") as f:
json.dump(memory, f, indent=2)
def save_note(topic, content):
memory = load_memory()
if topic not in memory:
memory[topic] = []
memory[topic].append(content)
save_memory(memory)
return f"Saved note under '{topic}'"
def recall_notes(topic):
memory = load_memory()
if topic in memory:
notes = memory[topic]
return f"Found {len(notes)} notes: " + " | ".join(notes)
return f"No notes found for {topic}"
The memory is a JSON file. The structure is {topic: [list_of_notes]}. Saving appends to the list for the given topic; recalling returns all notes for the topic joined by a separator.
For tutorial purposes this is the right shape. For real apps you'd want richer structure — timestamps, tags, embedding-based search, deduplication — but everything that follows works the same way. The model treats memory as a tool. The implementation can be as simple or as fancy as your needs.
The load_memory() / save_memory() pattern reads-and-writes-the-whole-file on every operation. That's fine for small notebooks. For megabytes of state, you'd want a real database. But notice what we don't have: any in-memory cache that could go stale. Every operation reads fresh from disk. That's the right default until performance forces you to optimise.
Telling the agent it has memory
system="You are a research assistant with persistent memory. Use lookup_info to find information. Always save important findings with save_note. Use recall_notes to check what you already know before researching. Be concise."
Three instructions in one prompt:
- "Use lookup_info to find information." — fact retrieval.
- "Always save important findings with save_note." — habit of writing.
- "Use recall_notes to check what you already know before researching." — habit of reading first.
The third instruction is the one that makes session 3 work. Without it, the agent would call lookup_info again instead of checking memory. With it, the agent's first move is "do I already know this?"
What the trace looks like
Session 1 (research Python):
Question: Research the history of Python and save your findings.
==================================================
Step 1 — Think: Let me look up Python history first.
Step 1 — Act: lookup_info({'query': 'python history'})
Step 1 — Observe: Python was created by Guido van Rossum in 1991. Named after Monty Python.
Step 2 — Think: Now I'll save this finding.
Step 2 — Act: save_note({'topic': 'python history', 'content': 'Created by Guido van Rossum in 1991. Named after Monty Python.'})
Step 2 — Observe: Saved note under 'python history'
Final Answer: Python was created by Guido van Rossum in 1991 and named after Monty Python. I've saved this for future reference.
After this session, agent_memory.json exists on disk:
{
"python history": [
"Created by Guido van Rossum in 1991. Named after Monty Python."
]
}
Session 2 (research Rust) does the same thing for Rust. Now memory has two topics.
Session 3 is the interesting one:
Question: What do you remember about Python from your earlier research?
==================================================
Step 1 — Think: Let me check my memory for Python notes.
Step 1 — Act: recall_notes({'topic': 'python history'})
Step 1 — Observe: Found 1 notes: Created by Guido van Rossum in 1991. Named after Monty Python.
Final Answer: From my earlier research: Python was created by Guido van Rossum in 1991 and is named after Monty Python.
No lookup_info call. The agent went straight to recall_notes, found the saved note, and answered. The persistence is doing the work.
Memory designs (a brief tour)
The JSON-file approach is the simplest persistent memory. Here are richer alternatives, in order of complexity:
SQLite. Same shape as JSON but indexed and faster. Replace load_memory / save_memory with sqlite3 calls. Survives concurrent access better.
Vector store (RAG-as-memory). Embed each note when saving, query by similarity when recalling. The agent recalls by meaning rather than by exact topic match. "What did I learn about programming language history?" finds both the Python and Rust notes. Episodes 14–17 are reusable here.
Structured + summarised. Periodically summarise old notes into "long-term memory" and clear the raw entries. Mimics how human memory actually works — recent events vivid, old events compressed.
Hierarchical. Episodic memory (what I did) + semantic memory (what I know). Two stores. The agent decides which to write to and which to query.
For most agent projects, start with a JSON or SQLite key-value store and only graduate to fancier designs when the agent's behaviour demands it.
When memory hurts more than it helps
A few cases where persistent memory is the wrong choice:
The user expects a fresh session every time. A customer-support chatbot probably shouldn't carry state from one customer to another. The persistence model has to match the privacy model.
The state is short-lived. A code-completion agent doesn't need to remember what you did yesterday. Stateless is simpler.
The memory leaks across users. If you're hosting a multi-tenant agent, every user's memory file should be isolated by user ID. Otherwise you're shipping a privacy violation.
Memory drift compounds errors. If the agent saves an incorrect fact, that fact persists and gets recalled in future sessions. Memory amplifies bad output. Build in a way to correct or invalidate notes — a delete_note(topic, index) tool, or periodic review.
Common mistakes
Saving everything. A note-taking agent that saves every observation produces a memory file no one can navigate. Be selective; the system prompt should encourage saving important findings, not everything.
Recalling without filtering. A memory of 10,000 notes is unwieldy if you return all of them every time. Filter by topic, recency, or relevance.
Concurrent writes. If two scripts hit the same JSON file simultaneously, you'll get corrupt JSON. SQLite is safer for any non-trivial use.
Not telling the agent the memory exists. Without explicit instruction in the system prompt, the agent won't think to check memory or save findings. Memory tools without prompt support are dead weight.
Storing the API key in memory. Don't ever pass secrets into a tool call. The model can include them in subsequent reasoning, surface them in answers, and persist them in memory files. Keep secrets server-side.
What's next
Next episode: multi-agent pipelines. One agent decomposes a task and delegates sub-tasks to specialised agents — a researcher, a writer, a reviewer. We orchestrate three agents in sequence and watch them produce a final report none of them could produce alone.
Recap
What we did today. Added two memory tools — save_note and recall_notes — backed by a JSON file on disk. Updated the system prompt to tell the agent it has persistent memory and should consult it before researching. Ran the agent across three sessions and watched the third session answer from memory rather than re-researching. Recognised that memory is a tool, not a special architectural component.
You haven't built a long-lived assistant. But you've removed the "amnesia between sessions" limitation that makes most basic agents disposable.
Next episode: multi-agent pipelines. See you in the next one.