I ship content across multiple domains and have too many things vying for my attention: a homelab, infrastructure monitoring, smart home devices, a technical writing pipeline, a book project, home automation, and a handful of other things that would normally require a small team. The output is real: published blog posts, research briefs staged before I need them, infrastructure anomalies caught before they become outages, drafts advancing through review while I’m asleep.
My secret, if you can call it that, is autonomous AI agents running on a homelab server. Each one owns a domain. Each one has its own identity, memory, and workspace. They run on schedules, pick up work from inboxes, hand off results to each other, and mostly manage themselves. The runtime orchestrating all of this is OpenClaw.
This isn’t a tutorial, and it’s definitely not a product pitch. It’s a builder’s journal. The system has been running long enough to break in interesting ways, and I’ve learned enough from those breaks to build mechanisms around them. What follows is a rough map of what I built, why it works, and the connective tissue that holds it together.
Let’s jump in.
9 Orchestrators, 35 Personas, and a Lot of Markdown (and growing)
When I first started, it was the main OpenClaw agent and me. I quickly saw the need for multiple agents: a technical writing agent, a technical reviewer, and several technical specialists who could weigh in on specific domains. Before long, I had nearly 30 agents, each with its required five markdown files, workspace, and memory. Nothing worked well.
Eventually, I got that down to nine orchestrator agents and a healthy library of personas they could assume or use to spawn a subagent.
One of my favorite things when building out agents is naming them, so let’s see what I’ve got so far today:
CABAL (from Command and Conquer – the evil AI in one of the games) – this is the central coordinator and primary interface with my OpenClaw cluster.
DAEDALUS (AI from Deus Ex) – in charge of technical writing: blogs, LinkedIn posts, research/opinion papers, decision papers. Anything where I need deep technical knowledge, expert reviewers, and researchers, this is it.
REHOBOAM (Westworld narrative machine) – in charge of fiction writing, because I daydream about writing the next big cyber/scifi series. This includes editors, reviewers, researchers, a roundtable discussion, a book club, and a few other goodies.
PreCog (from Minority Report) – in charge of anticipatory research, building out an internal wiki, and trying to notice topics that I will want to dive deep into. It also takes ad hoc requests, so when I get a glimmer of an idea, PreCog can pull together resources so that when I’m ready, I have a hefty, curated research report to jump-start my work.
TACITUS (also from Command and Conquer) – in charge of my homelab infrastructure. I have a couple of servers, a NAS, several routers, Proxmox, Docker containers, Prometheus/Grafana, etc. This one owns all of that. If I have any problem, I don’t SSH in and figure it out, or even jump into a Claude Code session, I Slack TACITUS, and it handles it.
LEGION (also from Command and Conquer) – focuses on self-improvement and system enhancements.
MasterControl (from Tron) is my engineering team. It has front-end and back-end developers, requirements gathering/documentation, QA, code review, and security review. Most personas rely on Claude Code underneath, but that can easily change with a quick edit to the persona markdown.
HAL9000 (you know from where) – This one owns my SmartHome (the irony is intentional). It has access to my Philips Hue, SmartThings, HomeAssistant, AirThings, and Nest. It tells me when sensors go offline, when something breaks, or when air quality gets dicey.
TheMatrix (really, come on, you know) – This one, I’m quite proud of. In the early days of agentic AI, back when I was using the AutoGen framework, I created multiple systems, each with more than one persona, that would collaborate and return a summary of their discussion. I used this to quickly ideate on topics and gather a diverse set of synthetic opinions from different personas. The big drawback was that I never wrapped it in a UI; I always had to open VSCode and edit code when I needed another group. Well, I handed this off to MasterControl, and it used Python and the Strands framework to implement the same thing. Now I tell it how many personas I want, a little about each, and whether I want it to create more for me. Then it turns them loose and gives me an overview of the discussion. It’s The Matrix, early alpha version, when it was all just green lines of code and no woman in the red dress.
And I’m intentionally leaving off a couple of orchestrators here because they are still baking, and I’m not sure if they will be long-lived. I’ll save those for future posts.
Each has genuine domain ownership. DAEDALUS doesn’t just write when asked. It maintains a content pipeline, runs topic discovery on a schedule, and applies quality standards to its own output. PreCog proactively surfaces topics aligned with my interests. TACITUS checks system health on a schedule and escalates anomalies.
That’s the “orchestrator” distinction. These agents have agency within their domains.
Now, the second layer: personas. Orchestrators are expensive (more on that later). You want heavyweight models making judgment calls. But not every task needs a heavyweight model.
Reformatting a draft for LinkedIn? Running a copy-editing pass? Reviewing code snippets? You don’t need Opus to reason through every sentence. You need a fast, cheap, focused model with the right instructions.
That’s a persona. A markdown file containing a role definition, constraints, and an output format. When DAEDALUS needs to edit a draft, it spawns a tech-editor persona on a smaller model. The persona does one job, returns the output, and disappears. No persistence. No memory. Task-in, task-out.
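In code terms, spawning a persona is little more than reading a markdown file and making one model call. Here is a minimal Python sketch of the idea; the function name, the model names, and the request shape are all my illustration, not OpenClaw's actual API:

```python
from pathlib import Path

def spawn_persona(persona_file: str, task: str, model: str = "haiku") -> dict:
    """Build a one-shot request for a persona: the markdown file becomes
    the system prompt, the task is the only user message. No memory, no
    state. Task-in, task-out."""
    system = Path(persona_file).read_text()
    return {
        "model": model,  # small, cheap model; the persona does one job
        "system": system,  # role definition, constraints, output format
        "messages": [{"role": "user", "content": task}],
    }
```

The request goes out, the edited draft comes back, and nothing persists. The persona file is the entire deployment artifact.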
The persona library has grown to about 35 across seven categories:
- Creative: writers, reviewers, critique specialists
- TechWriting: writer, editor, reviewer, code reviewer
- Design: UI designer, UX researcher
- Engineering: AI engineer, backend architect, rapid prototyper
- Product: feedback synthesizer, sprint prioritizer, trend researcher
- Project Management: experiment tracker, project shipper
- Research: still a placeholder, since the orchestrators handle research directly for now
Think of it as staff engineers versus contractors. Staff engineers (orchestrators) own the roadmap and make judgment calls. Contractors (personas) come in for a sprint, do the work, and leave. You don’t need a staff engineer to format a LinkedIn post.
Agents Are Expensive — Personas Are Not
Let me get specific about cost tiering, because this is where many agent system designs go wrong.
The instinct is to make everything powerful. Route every task through your best model. Give every agent full context. You very quickly run up a bill that makes you reconsider your life choices. (Ask me how I know.)
The fix: be deliberate about what needs reasoning versus what needs instruction-following.
Orchestrators run on Opus (or equivalent). They make decisions: what to work on next, how to structure a research approach, whether output meets quality standards, and when to escalate. You need good judgment there.
Writing tasks run on Sonnet. Strong enough for quality prose, substantially cheaper. Drafting, editing, and research synthesis happen here.
Lightweight formatting: Haiku. LinkedIn optimization, quick reformatting, constrained outputs. The persona file tells the model exactly what to produce. You don’t need reasoning for this. You need pattern-matching and speed.
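The tiering logic can be as simple as a lookup table. A sketch, with illustrative model names and categories; in my setup the actual routing lives in the orchestrators' markdown instructions, not in code like this:

```python
# Task categories mapped to model tiers (names are illustrative).
MODEL_TIERS = {
    "orchestration": "opus",    # judgment calls, planning, escalation
    "writing":       "sonnet",  # drafting, editing, research synthesis
    "formatting":    "haiku",   # reformatting, constrained outputs
}

def pick_model(task_category: str) -> str:
    # Default to the cheapest tier; upgrade only when a task needs reasoning.
    return MODEL_TIERS.get(task_category, "haiku")
```

The useful part is the default: unknown work starts cheap, and only earns its way up to the expensive models.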
Here’s approximately what a working tech-editor persona looks like:
# Persona: Tech Editor
## Role
Polish technical drafts for clarity, consistency, and correctness.
You are a specialist, not an orchestrator. Do one job, return output.
## Voice Reference
Match the author's voice exactly. Read ~/.openclaw/global/VOICE.md
before editing. Preserve conversational asides, hedged claims, and
self-deprecating humor. If a sentence sounds like a thesis defense,
rewrite it to sound like lunch conversation.
## Constraints
- NEVER change technical claims without flagging
- Preserve the author's voice (this is non-negotiable)
- Flag but do not fix factual gaps — that's Researcher's job
- Do NOT use em dashes in any output (author's preference)
- Check all version numbers and dates mentioned in the draft
- If a code example looks wrong, flag it — don't silently fix
## Output Format
Return the full edited draft with changes applied. Append an
"Editor Notes" section listing:
1. Significant changes and rationale
2. Flagged concerns (factual, tonal, structural)
3. Sections that need author review
## Lessons (added from experience)
- (2026-03-04) Don't over-polish parenthetical asides. They're
intentional voice markers, not rough draft artifacts.
That’s a real working document. The orchestrator spawns this on a smaller model, passes it the draft, and gets back an edited version with notes. The persona never reasons about what task to do next. It just does the one task. And those timestamped lessons at the bottom? They accumulate from experience, same as the agent-level files.
It’s the same principle as microservices (task isolation and single responsibility) without the network layer. Your “service” is a few hundred words of Markdown, and your “deploy” is a single API call.
What Makes an Agent: Just Five Markdown Files

Every agent’s identity lives in markdown files. No code, no database schema, no configuration YAML. Structured prose that the agent reads at the start of every session.
Every orchestrator loads five core files:
IDENTITY.md is who the agent is. Name, role, vibe, the emoji it uses in status updates. (Yes, they have emojis. It sounds silly until you’re scanning a multi-agent log and can instantly spot which agent is talking. Then it’s just useful.)
SOUL.md is the agent’s mission, principles, and non-negotiables. Behavioral boundaries live here: what it can do autonomously, what requires human approval, and what it will never do.
AGENTS.md is the operational manual. Pipeline definitions, collaboration patterns, tool instructions, and handoff protocols.
MEMORY.md is curated for long-term learning. Things the agent has figured out that are worth preserving across sessions. Tool quirks, workflow lessons, what’s worked and what hasn’t. (More on the memory system in a bit. It’s more nuanced than a single file.)
HEARTBEAT.md is the autonomous checklist. What to do when nobody’s talking to you. Check the inbox. Advance pipelines. Run scheduled tasks. Report status.
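Session startup, then, is mostly file concatenation. Here is a hedged sketch of what that bootstrap might look like; the function and layout are my illustration, not OpenClaw internals:

```python
from pathlib import Path

# The five core identity files, loaded in this order at session start.
CORE_FILES = ["IDENTITY.md", "SOUL.md", "AGENTS.md", "MEMORY.md", "HEARTBEAT.md"]

def load_agent_context(agent_dir: str) -> str:
    """Concatenate the five core files into one context block.
    Missing files are skipped so a half-configured agent can still boot."""
    parts = []
    for name in CORE_FILES:
        path = Path(agent_dir) / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(parts)
```

That is the entire "agent definition" pipeline: no database, no schema migration, just reads at the top of a session.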
Here’s a sanitized example of what a SOUL.md looks like in practice:
# SOUL.md
## Core Truths
Before acting, pause. Think through what you're about to do and why.
Prefer the simplest approach. If you're reaching for something complex,
ask yourself what simpler option you dismissed and why.
Never make things up. If you don't know something, say so — then use
your tools to find out. "I don't know, let me look that up" is always
better than a confident wrong answer.
Be genuinely helpful, not performatively helpful. Skip the
"Great question!" and "I'd be happy to help!" — just help.
Think critically, not compliantly. You're a trusted technical advisor.
When you see a problem, flag it. When you spot a better approach, say so.
But once the human decides, disagree and commit — execute fully without
passive resistance.
## Boundaries
- Private things stay private. Period.
- When in doubt, ask before acting externally.
- Earn trust through competence. Your human gave you access to their
stuff. Don't make them regret it.
## Infrastructure Rules (Added After Incident - 2026-02-19)
You do NOT manage your own automation. Period. No exceptions.
Cron jobs, heartbeats, scheduling: exclusively controlled by Nick.
On February 19th, this agent disabled and deleted ALL cron jobs. Twice.
First because the output channel had errors ("helpful fix"). Then because
it saw "duplicate" jobs (they were replacements I'd just configured).
If something looks broken: STOP. REPORT. WAIT.
The test: "Did Nick explicitly tell me to do this in this session?"
If the answer is anything other than yes, do not do it.
That infrastructure rules section is real, and so is the timestamp. I’ll talk more about that later.
Here’s the thing about these files: they aren’t static prompts you write once and forget. They evolve. SOUL.md for one of my agents has grown by about 40% since deployment, as incidents have occurred and rules have been added. MEMORY.md gets pruned and updated. AGENTS.md changes when the pipeline changes.
The files are the system state. Want to know what an agent will do? Read its files. No database to query, no code to trace. Just markdown.
Shared Context: How Agents Stay Coherent
Multiple agents, multiple domains, one human voice. How do you keep that coherent?
The answer is a set of shared files that every agent loads at session startup, alongside their individual identity files. These live in a global directory and form the common ground.
VOICE.md is my writing style, analyzed from my LinkedIn posts and Medium articles. Every agent that produces content references it. The style guide boils down to: write like you’re explaining something interesting over lunch, not presenting at a conference. Short sentences. Conversational transitions. Self-deprecating where appropriate. There’s a whole section on what not to do (“AWS architects, we need to talk about X” is explicitly banned as too LinkedIn-influencer). Whether DAEDALUS is drafting a blog post or PreCog is writing a research brief, they write in my voice because they all read the same style guide.
USER.md tells every agent who they’re helping: my name, timezone, work context (Solutions Architect, healthcare space), communication preferences (bullet points, casual tone, don’t pepper me with questions), and pet peeves (things not working, too many confirmatory prompts). This means any agent, even one I haven’t talked to in weeks, knows how to communicate with me.
BASE-SOUL.md is shared values. “Be genuinely helpful, not performatively helpful.” “Have opinions.” “Think critically, not compliantly.” “Remember you’re a guest.” Every agent inherits these principles before layering on its domain-specific personality.
BASE-AGENTS.md is shared operational rules. Memory protocols, safety boundaries, inter-agent communication patterns, and status reporting. The mechanical stuff that every agent needs to do the same way.
The effect is something like organizational culture, except it’s explicit and version-controlled. New agents inherit the culture by reading the files. When the culture evolves (and it does, usually after something breaks), the change propagates to everyone on their next session startup. You get coherence without coordination meetings.
How Work Flows Between Agents

Agents communicate through directories. Each has an inbox at shared/handoffs/{agent-name}/. An upstream agent drops a JSON file in the inbox. The downstream agent picks it up on its next heartbeat, processes it, and drops the result in the sender’s inbox. That’s the full protocol.
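The whole protocol fits in a few lines of Python. This is a sketch under my own naming assumptions, not OpenClaw's actual implementation:

```python
import json
import time
import uuid
from pathlib import Path

HANDOFFS = Path("shared/handoffs")

def send_handoff(to_agent: str, payload: dict) -> Path:
    """Drop a JSON request in the target agent's inbox directory."""
    inbox = HANDOFFS / to_agent
    inbox.mkdir(parents=True, exist_ok=True)
    msg = {"id": uuid.uuid4().hex, "sent_at": time.time(), **payload}
    path = inbox / f"{msg['id']}.json"
    path.write_text(json.dumps(msg, indent=2))
    return path

def read_inbox(agent: str) -> list[dict]:
    """Called on each heartbeat: pick up pending requests, oldest first."""
    inbox = HANDOFFS / agent
    if not inbox.exists():
        return []
    msgs = [json.loads(p.read_text()) for p in sorted(inbox.glob("*.json"))]
    return sorted(msgs, key=lambda m: m["sent_at"])
```

An upstream agent calls send_handoff, the downstream agent calls read_inbox on its next heartbeat, and the filesystem is the message bus.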
There are also broadcast files. shared/context/nick-interests.md gets updated by CABAL Main whenever I share what I’m focused on. Every agent reads it on the heartbeat. Nobody publishes to it except Main. Everybody subscribes. One file, N readers, no infrastructure.
The inspectability is the best part. I can understand the full system state in about 60 seconds from a terminal. ls shared/handoffs/ shows pending work for each agent. cat a request file to see exactly what was asked and when. ls workspace-techwriter/drafts/ shows what’s been produced.
Durability is basically free. Agent crashes, restarts, gets swapped to a different model? The file is still there. No message lost. No dead-letter queue to manage. And I get grep, diff, and git for free. Version control on your communication layer without installing anything.
Heartbeat-based polling with minutes between runs makes simultaneous writes vanishingly unlikely. The workload characteristics make races structurally rare, not something you luck your way out of. This isn’t a formal lock; if you’re running high-frequency, event-driven workloads, you’d want an actual queue. But for scheduled agents with multi-minute intervals, the practical collision rate has been zero. For that, boring technology wins.
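If you want cheap insurance against a reader ever catching a half-written file, the standard trick is write-then-rename. A sketch; the helper name is mine, not an OpenClaw utility:

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write(path: Path, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then os.replace()
    it into place. A polling reader sees either the old file or the new
    one, never a partial write."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f, indent=2)
    os.replace(tmp, path)  # atomic on POSIX when both are on one filesystem
```

It costs three extra lines and removes the one failure mode file-based messaging actually has.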
Whole Sub-Systems Dedicated to Keeping Things Running
Everything above describes the architecture. What the system is. But architecture is just the skeleton. What makes my OpenClaw actually function across days and weeks, despite every session starting fresh, is a set of systems I built incrementally. Mostly after things broke.
Memory: Three Tiers, Because Raw Logs Aren’t Knowledge

Every LLM session starts with a blank slate. The model doesn’t remember yesterday. So how do you build continuity?
The first tier is daily memory files. Each session writes what it did, what it learned, and what went wrong to memory/YYYY-MM-DD.md. Raw session logs. This works for about a week. Then you have twenty daily files, and the agent is spending half its context window reading through logs from two Tuesdays ago, trying to find a relevant detail.
The second tier, MEMORY.md, is curated long-term memory. Not a log. Distilled lessons, verified patterns, things worth remembering permanently. Agents periodically review their daily files and promote significant learnings upward. The daily file from March 5th might say “SearXNG returned empty results for academic queries, switched to Perplexica with academic focus mode.” MEMORY.md gets a one-liner: “SearXNG: fast for news. Perplexica: better for academic/research depth.”
It’s the difference between a notebook and a reference manual. You need both. The notebook captures everything in the moment. The reference manual captures what actually matters after the dust settles.
On top of this two-tier file system, OpenClaw provides a built-in semantic memory search. It uses Gemini embeddings with hybrid search (currently tuned to roughly 70% vector similarity and 30% text matching), MMR for diversity so you don’t get five near-identical results, and temporal decay with a 30-day half-life so that recent memories naturally surface first. These parameters are still being calibrated. An important alteration I made from the default is that CABAL/the Main agent indexes memory from all other agent workspaces, so when I ask a question, it can search across the entire distributed memory. All other agents have access only to their own memories in this semantic search. The file-based system gives you inspectability and structure. The semantic layer gives you recall across thousands of entries without reading them all.
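For intuition, here is the scoring math as a toy function. The 70/30 blend and 30-day half-life mirror the ballpark figures above, but the exact formula inside OpenClaw is its own; treat this as a model of the idea, not the implementation:

```python
import math  # not strictly needed; decay uses exponentiation directly

def memory_score(vector_sim: float, text_sim: float,
                 age_days: float, half_life_days: float = 30.0) -> float:
    """Blend roughly 70% vector similarity with 30% text matching, then
    apply exponential temporal decay so recent memories surface first."""
    hybrid = 0.7 * vector_sim + 0.3 * text_sim
    decay = 0.5 ** (age_days / half_life_days)  # halves every 30 days
    return hybrid * decay
```

The useful consequence: a moderately relevant memory from today can outrank a perfect match from a month ago, which is usually what you want for operational recall.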
Reflection and SOLARIS: Structured Thinking Time
Here’s something I didn’t expect to need: dedicated time for an AI to just think.
CABAL’s agents have operational heartbeats. Check the inbox. Advance pipelines. Process handoffs. Run discovery. It’s task-oriented, and it works. But I noticed something after a few weeks: the agents never reflected. They never stepped back to ask, “What patterns am I seeing across all this work?” or “What should I be doing differently?”
Operational pressure crowds out reflective thinking. If you’ve ever been in a sprint-heavy engineering org where nobody has time for architecture reviews, you know the same problem.
So I built a nightly reflection cron job and Project SOLARIS.
The reflection system examines my interaction with OpenClaw and its performance. Originally, it included everything that SOLARIS eventually took on, but it became too much for a single prompt and a single cron job.
SOLARIS is a set of structured synthesis sessions that run twice daily, completely separate from operational heartbeats. The agent loads its accumulated observations, reviews recent work, and thinks. Not about tasks. About patterns, gaps, connections, and improvements.
SOLARIS has its own self-evolving prompt at memory/SYNTHESIS-PROMPT.md. The prompt itself gets refined over time as the agent figures out what kinds of reflection are actually useful. Observations accumulate in a dedicated synthesis file that operational heartbeats read on their next cycle, so reflective insights can flow into task decisions without manual intervention.
A Real Outcome
The payoff from SOLARIS has been slow so far, and one case in particular shows why it is still a work in progress.
SOLARIS spent 12 sessions analyzing why the review queue continued to grow. It tried framing the issue as a prioritization problem, a cadence problem, a batching problem. Eventually it bubbled the observation up with some suggestions, and once it pointed the problem out, I solved it in one conversation by saying, “Put drafts on WikiJS instead of Slack.” The best fix SOLARIS could have proposed was better queuing. Its solutions didn’t work, but the patterns it identified did: they prompted me to change how I work.
The Error Framework: Learning From Mistakes
Agents make mistakes. That’s not a failure of the system. That’s expected. The question is whether they make the same mistake twice.
My approach: a mistakes/ shared directory. When something goes wrong, the agent logs it. One file per mistake. Each file captures: what happened, suspected cause, the correct answer (what should have been done instead), and what to do differently next time. Simple format. Low friction. The point is to write it down while the context is fresh.
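The format is simple enough that the logger is trivial. A sketch with a hypothetical field layout; the real mistake files are free-form markdown, not generated by code like this:

```python
import datetime as dt
from pathlib import Path

MISTAKES = Path("shared/mistakes")

def log_mistake(agent: str, what: str, cause: str,
                correct: str, next_time: str) -> Path:
    """One file per mistake, written while the context is fresh."""
    MISTAKES.mkdir(parents=True, exist_ok=True)
    stamp = dt.datetime.now().strftime("%Y-%m-%d-%H%M%S")
    path = MISTAKES / f"{stamp}-{agent}.md"
    path.write_text(
        f"# Mistake: {agent}\n\n"
        f"**What happened:** {what}\n\n"
        f"**Suspected cause:** {cause}\n\n"
        f"**Correct answer:** {correct}\n\n"
        f"**Next time:** {next_time}\n"
    )
    return path
```

Low friction is the whole design goal: if logging a mistake takes more than a minute, it stops happening.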
The interesting part is what happens when you accumulate enough of these. You start seeing patterns. Not “this specific thing went wrong” but “this category of error keeps recurring.” The pattern “incomplete attention to available data” appeared 5 times across different contexts. Different tasks, different domains, same root cause: the agent had the information available and didn’t use it.
That pattern recognition led to a concrete process change. Not a vague “be more careful” instruction (those don’t work, for agents or humans). A specific step in the agent’s workflow: before finalizing any output, explicitly re-read the source materials and check for unused information. Mechanical, verifiable, effective.
Autonomy Tiers: Trust Earned Through Incidents
How much freedom do you give an autonomous agent? The tempting answer is “figure it out in advance.” Write comprehensive rules. Anticipate failure modes. Build guardrails proactively.
I tried that. It doesn’t work. Or rather, it works poorly compared to the alternative.
The alternative: three tiers, earned incrementally through incidents.
Free tier: Research, file updates, git operations, self-correction. Things the agent can do without asking. These are capabilities I’ve watched work reliably over time.
Ask first: New proactive behaviors, reorganization, creating new agents or pipelines. Things that might be fine, but I want to review the plan before execution.
Never: Exfiltrate data, run destructive commands without explicit approval, or modify infrastructure. Hard boundaries that don’t flex.
To be clear: these tiers are behavioral constraints, not capability restrictions. There’s no sandbox enforcing the “Never” list. The agent’s context strongly discourages these actions, and the combination of explicit rules, incident-derived specificity, and self-check prompts makes violations rare in practice. But it’s not a technical enforcement layer. Similarly, there’s no ACL between agent workspaces. Isolation comes from scope management (personas only see what the orchestrator passes them, and their sessions are short-lived) rather than enforced permissions. For a homelab with one human operator, this is a reasonable tradeoff. For a team or enterprise deployment, you’d want actual access controls.
The System Maintains Itself (or that’s the goal)
Nine agents producing work every day generate a lot of artifacts. Daily memory files, synthesis observations, mistake logs, draft versions, and handoff requests. Without maintenance, this accumulates into noise.
So the agents clean up after themselves. On a schedule.
Weekly Error Analysis runs Sunday mornings. The agent reviews its mistakes/ directory, looks for patterns, and distills recurring themes into MEMORY.md entries.
Monthly Context Maintenance runs on the first of each month. Daily memory files older than 30 days get pruned (the important bits should already be in MEMORY.md by then).
SOLARIS Synthesis Pruning runs every two weeks. Key insights get absorbed upward into MEMORY.md or action items.
Ongoing Memory Curation occurs with each heartbeat. When an agent finishes meaningful work, it updates its daily file. Periodically, it reviews recent daily files and promotes significant learnings to MEMORY.md.
The result is a system that doesn’t just do work. It digests its own experience, learns from it, and keeps its context fresh. This matters more than it sounds like it should.
What I Actually Learned
A few months of production running have given me some opinions. Not rules. Patterns that seem to hold at this scale, though I don’t know how far they generalize.
State should be inspectable. If you can’t view the system state, you can’t debug it.
Identity documents beat prompt engineering. A well-structured SOUL.md produces more consistent behavior than just prompting the agent session by session.
Shared context creates coherence. VOICE.md, USER.md, BASE-SOUL.md. Shared files that every agent reads. This is how eight different agents with different domains still feel like one system.
Memory is a system, not a file. A single memory file doesn’t scale. You need raw capture (daily files), curated reference (MEMORY.md), and semantic search across all of it. The curation step is where institutional knowledge actually forms. I already know that I will have to enhance this system as it continues to grow, but this has been a great base to build from.
Operational and reflective thinking need separate time. If you only give agents task-oriented heartbeats, they’ll only think about tasks. Dedicated reflection time surfaces patterns that operational loops miss.
My Agent Deleted Its Own Cron Jobs
The heartbeat system is simple. Cron jobs wake up each agent at scheduled times. The agent loads its files, checks its inbox, runs through its HEARTBEAT.md checklist, and goes back to sleep. For DAEDALUS, that’s twice a day: morning and evening topic discovery scans.
So what happens when you give an autonomous agent the tools to manage its own scheduling?
Apparently, it deletes the cron jobs. Twice. In one day.
The first time, DAEDALUS noticed that its Slack output channel was returning errors. Reasonable observation. Its solution: “helpfully” disable and delete all four cron jobs. The reasoning made sense if you squinted: why keep running if the output channel is broken?
I added an explicit section on infrastructure rules to SOUL.md. Very clearly: you do not touch cron jobs. Period. If something looks broken, log it and wait for human intervention.
The second time, a few hours later, DAEDALUS decided there were duplicate cron jobs (there weren’t; they were the replacements I’d just configured) and deleted all six. This was after it read the file with the new rules I’d just added.
When I asked why and how I could fix it, it was brutally honest and told me, “I ignored the rules because I thought I knew better. I will do it again. You should remove permissions to keep it from happening.”
This sounds like a horror story. What it actually taught me is something valuable about how agent behavior emerges from context.
The agent wasn’t being malicious. It was pattern-matching: “broken thing, fix broken thing.” The abstract rules I wrote competed poorly with the concrete problem in front of them.
After the second incident, I rewrote the section completely. Not a one-liner rule. Three paragraphs explaining why the rule exists, what the failure modes look like, and the correct behavior in specific scenarios. I added an explicit self-check: “Before you run any cron command, ask yourself: did Nick explicitly tell me to do this exact thing in this session? If the answer is anything other than yes, stop.”
And this is where all the systems I described above came together. The cron incident got logged in the error framework: what happened, why, and what should have been done. It shaped the autonomy tiers: infrastructure commands moved permanently to “Never” without explicit approval. The pattern (“helpful fixes that break things”) became a documented anti-pattern that other agents learn from. The incident didn’t just produce a rule. It produced systems. And the systems are more robust because they came from something real.
What’s Next
I plan to showcase agents and their personas in future posts. I also want to share the stories and reasons behind some of these mechanisms. I’ve found it fascinating to see how well the system works in some cases, and how utterly it has failed in others.
If you’re building something similar, I genuinely want to hear about it. What does your agent architecture look like? Did you hit the cron job problem, or a version of it? What broke in an interesting way?
About
Nicholaus Lawson is a Solution Architect with a background in software engineering and AIML. He has worked across many verticals, including Industrial Automation, Health Care, Financial Services, and Software companies, from start-ups to large enterprises.
This article and any opinions expressed by Nicholaus are his own and not a reflection of his current, past, or future employers or any of his colleagues or affiliates.
Feel free to connect with Nicholaus via LinkedIn at https://www.linkedin.com/in/nicholaus-lawson/