From Ad Hoc Prompting to Repeatable AI Workflows with Claude Code Skills



1. Introduction

It is easy to get an AI workflow working once. It is much harder to make it repeatable.

Prompting ChatGPT or Claude for each run is fast, but the results are inconsistent and hard to reproduce. Building everything in Python or locking down the workflow improves reliability, but often removes the flexibility that makes LLMs useful for exploration.

A Claude Code skill can bridge this gap. It preserves the flexibility of natural language, while SKILL.md and bundled scripts provide enough structure to keep the workflow consistent.

This approach works best for tasks that repeat with small changes, where natural-language instructions are important, and where hardcoding everything would add unnecessary complexity.

In my previous article, I walked through how to design, build, and distribute a Claude Code skill from scratch. In this article, I will focus on a concrete case study to show where a skill adds real value.

2. Use Case: Virtual Customer Research

The case study is LLM persona interviews—using an LLM to simulate customer conversations for qualitative research.

Customer research is valuable, but expensive. A qualitative study with a specialist agency can easily cost tens of thousands of dollars.

That is why more teams are turning to LLMs as a stand-in. You might tell ChatGPT, ‘You are a 25-year-old woman interested in skincare,’ and then ask for reactions to a new concept. This approach is fast, free, and always available.

However, when you try this approach on real projects, several issues come up. They reflect the core limitations of ad hoc prompting.

3. What Goes Wrong With Ad Hoc Prompting

It is straightforward to have an LLM play a persona and answer questions. The real problems start when you try to make that process repeatable across multiple personas, sessions, or projects.


In persona interview workflows, these problems show up fast. Responses in a shared chat start to anchor on earlier answers, outputs drift toward a generic middle, and the panel is hard to reuse for later tests or follow-up questions.

That is why better prompting alone does not solve the problem. The issue is not just wording. The workflow itself needs structure: stable persona definitions, deliberate diversity, and independent interview contexts.

4. From Prompting to a Reusable Skill

The key step was not writing a better prompt. It was turning a fragile, multi-step prompting workflow into a reusable Claude Code skill.

Instead of manually repeating panel setup, persona generation, and follow-up instructions every time, I can now trigger the whole workflow with a single command:

/persona generate 10 Gen Z skincare shoppers in the US

From the user’s perspective, this looks simple. But behind that one line, the skill handles panel design, persona generation, validation, and output packaging in a repeatable way.

5. What Runs Behind the Command

That single command triggers a workflow, not just a single prompt.
Behind the scenes, the skill does two things: it defines the panel structure and generates personas in a controlled way. This lets us run virtual interviews in isolated contexts, so the outputs can be reused for later tests or follow-ups.
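Conceptually, the two phases can be sketched in plain Python. This is an illustrative outline only; the segment names, counts, and dict shapes are hypothetical stand-ins for the skill's actual internals:

```python
# Sketch of the two-phase design behind the command: plan the panel
# first, then generate personas within that plan. Segment names and
# counts here are illustrative, not the skill's real configuration.
panel_plan = {
    "market": "Gen Z skincare shoppers in the US",
    "size": 10,
    "segments": {  # attitudinal mix, decided before any persona exists
        "routine devotee": 2,
        "skincare skeptic": 2,
        "budget-conscious shopper": 2,
        "trend chaser": 2,
        "problem-driven buyer": 2,
    },
}

def generate_panel(plan):
    """Phase two: create one persona stub per planned slot.

    In the real skill the LLM fills in each persona's details; here a
    placeholder dict stands in for that step.
    """
    personas = []
    for segment, count in plan["segments"].items():
        for i in range(count):
            personas.append({"id": f"{segment}-{i}", "segment": segment})
    return personas

personas = generate_panel(panel_plan)
```

The point of the split is that the panel's shape is fixed before the model writes a single word of persona detail.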


5a. Treat Personas as Structured Objects

The first change was to treat a persona as a structured data object, not just a line of conversational setup. This shift makes the workflow more reliable and easier to analyze.

A naive approach usually looks like this:

You are a 22-year-old college student interested in skincare.

What do you think about a concept called "Barrier Repair Cream"?

The persona here is vague, and as you ask more questions, the character drifts. Instead, I define each persona as a JSON object.
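A minimal sketch of what such an object might look like follows. The field names here are illustrative, not the skill's exact schema, which lives in its bundled files:

```python
import json

# A persona treated as a structured data object rather than a line of
# conversational setup. Field names and values are hypothetical.
persona = {
    "id": "p01",
    "name": "Maya",
    "age": 22,
    "segment": "budget-conscious shopper",
    "occupation": "college student",
    "attitudes": ["price-sensitive", "curious about ingredients"],
    "routine": "drugstore cleanser and moisturizer",
}

# Stored as JSON, the same persona can be reloaded for a later test.
with open("persona_p01.json", "w") as f:
    json.dump(persona, f, indent=2)

with open("persona_p01.json") as f:
    assert json.load(f) == persona  # round-trips intact
```

Because every attribute is an explicit field rather than an implied trait, the model has something concrete to stay consistent with.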


This structure lets us pin down the key attributes, so the persona does not drift across questions. Since each persona is stored in a JSON file, you can reload the same panel for your next concept test or follow-up.

5b. Design Panel Diversity Up Front, and Validate It

The second change was to define the diversity of the customer panel before letting the AI model generate persona details.

If you just ask the LLM to generate 10 personas at once, you cannot control the balance of the panel. Ages may cluster too narrowly, and attitudes often end up sounding like small variations of the same person.

So I designed the Claude Code skill to define the attitudinal mix up front, then generate personas within that structure, and finally validate the result afterward. For a Gen Z skincare panel, that might mean a planned mix of routine devotees, skincare skeptics, budget-conscious shoppers, trend chasers, and problem-driven buyers.

Once the segments are set, the skill generates personas and then validates the distribution after generation.
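The validation step can be a small deterministic check over the generated personas. A sketch, with field names and thresholds assumed for illustration:

```python
from collections import Counter

def validate_panel(personas, plan_segments, min_age_spread=6):
    """Check a generated panel against the planned mix.

    Returns a list of human-readable problems; an empty list means the
    panel passes. Field names and the age-spread threshold are
    illustrative, not the skill's exact rules.
    """
    problems = []
    counts = Counter(p["segment"] for p in personas)
    for segment, target in plan_segments.items():
        if counts[segment] != target:
            problems.append(f"{segment}: got {counts[segment]}, wanted {target}")
    ages = [p["age"] for p in personas]
    if max(ages) - min(ages) < min_age_spread:
        problems.append(f"ages cluster too narrowly: {min(ages)}-{max(ages)}")
    return problems

# Usage on a toy three-person panel:
panel = [
    {"segment": "skincare skeptic", "age": 18},
    {"segment": "skincare skeptic", "age": 24},
    {"segment": "trend chaser", "age": 21},
]
issues = validate_panel(panel, {"skincare skeptic": 2, "trend chaser": 1})
```

Because this check is plain code rather than another prompt, a panel that drifts from the plan fails loudly instead of silently.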

One more design choice matters at interview time: each persona runs in an isolated context. That prevents later answers from anchoring on earlier ones and helps preserve sharper differences across the panel.
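The mechanics of isolation can be pictured as building each interview from scratch instead of appending to one long chat. A sketch, with a stubbed `ask_llm` callable standing in for the model:

```python
def interview(personas, question, ask_llm):
    """Ask every persona the same question, each in a fresh context.

    ask_llm is a stand-in for the model call. Each persona gets its own
    message list, so no answer can anchor on an earlier one.
    """
    answers = {}
    for p in personas:
        messages = [
            {"role": "system", "content": f"Stay in character as: {p}"},
            {"role": "user", "content": question},
        ]
        answers[p["id"]] = ask_llm(messages)
    return answers

# Stubbed usage; a real run would route messages to the model instead.
stub = lambda messages: f"reply built from {len(messages)} messages"
out = interview(
    [{"id": "p01"}, {"id": "p02"}],
    "What frustrates you most about choosing skincare products?",
    stub,
)
```

Every persona sees exactly two messages: its own definition and the question. Nothing from any other interview leaks in.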

6. Why a Claude Code Skill — Not a Prompt, Not a Python Library

The design choices above were inspired by TinyTroupe, a Python library from Microsoft Research for LLM-powered multiagent persona simulation. One of its core ideas is treating personas as objects in a multi-agent setup. I borrowed that concept, but found that using it as a Python library added more friction than I wanted for daily work. So I rebuilt the workflow as a Claude Code skill.

A skill fit this workflow better than a prompt or a library because it sits in the middle ground between flexibility and structure.

Comparison: prompts are flexible but unstructured, Python libraries are structured but rigid, and a skill sits between the two (image by author).

Based on this comparison, the advantages of a Claude Code skill come down to three main points.

No extra billing. Python libraries that call LLMs, including TinyTroupe, require a separate OpenAI or Claude API key, and you have to watch usage costs. When you are still experimenting, that small meter running in the background creates friction. A Claude Code skill runs inside the subscription you already have, so scaling the panel from 10 to 20 personas does not add extra overhead.

Parameters pass as natural language. With a Python library, you have to match the function signature, for example: factory.generate_person(context="A hospital in São Paulo", prompt="Create a Brazilian doctor who loves pets"). With a Claude Code skill, you can just write:

/persona generate 10 Gen Z skincare shoppers in the US

That is enough.

SKILL.md acts as a guardrail. The rules for structuring a persona, the diversity design steps, and the overall workflow live in the instruction file. You do not have to rewrite the prompt each time. Whatever the user types, the workflow skeleton is protected by the skill.
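For illustration, a SKILL.md skeleton for this kind of workflow might look like the following. The frontmatter fields follow the Claude Code skill format; the body text is a hypothetical condensation, not the actual file from the repository:

```markdown
---
name: persona
description: Generate and interview structured customer persona panels
---

When the user asks to generate a panel:
1. Plan the attitudinal segment mix before generating any persona.
2. Generate each persona as a JSON object with fixed attributes.
3. Run the bundled validation script on segment and age distribution.

When the user asks the panel a question:
- Interview each persona in an isolated context, never a shared chat.
```

Whatever phrasing the user types after the command, these steps run in the same order.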

Here is what it looks like in practice. Generating the panel takes one natural-language command:

/persona generate 10 Gen Z skincare shoppers in the US

Ten diverse personas are generated and saved as structured JSON objects, and the segment distribution and age spread are validated automatically. Then a follow-up command interviews each persona in an independent context:

/persona ask What frustrates you most about choosing skincare products?

This returns a full picture of the panel's frustrations and needs. A complete demo, including a concept test and verbatims, is available in the demo folder on GitHub.

7. Where Claude Code Skills Fit — and Where They Don’t

There are cases where a skill is not the right tool. Fully deterministic pipelines are better as plain code. Logic that needs audit or regulatory review is a poor fit for natural-language instructions. For a one-off exploratory question, just asking in a chat window is fine.

A Claude Code skill is not limited to natural-language instructions. You can include Python scripts inside the skill as well. In the persona skill, I use Python for panel diversity validation and for aggregating results. This lets you mix the parts where you want the LLM’s flexible judgment with the parts that should be deterministic, all in the same skill. That is what sets it apart from a prompt template.
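As an illustration of that split, the deterministic aggregation side might look like this sketch. The answer format is assumed for the example; the skill's actual scripts may differ:

```python
from collections import Counter

def aggregate(answers):
    """Roll up per-persona interview answers into panel-level views.

    Each answer is assumed to be a dict with "persona_id", "segment",
    and "themes" keys -- a hypothetical format for illustration.
    """
    theme_counts = Counter()
    by_segment = {}
    for a in answers:
        theme_counts.update(a["themes"])
        by_segment.setdefault(a["segment"], []).append(a["persona_id"])
    return theme_counts, by_segment

# Usage on two toy answers:
answers = [
    {"persona_id": "p01", "segment": "budget-conscious shopper",
     "themes": ["price", "confusing labels"]},
    {"persona_id": "p02", "segment": "skincare skeptic",
     "themes": ["price"]},
]
themes, segments = aggregate(answers)
```

The LLM supplies the verbatims; the counting stays in code, where it is exact and repeatable.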

The rule of thumb is simple: when your workflow needs structure but full hardcoding would be too heavy, a skill is often the right fit.

8. Conclusion

There is a middle ground in repetitive AI work: too unstable for ad hoc prompting, too rigid for a Python library. A Claude Code skill fills that gap, keeping the flexibility of natural language while SKILL.md and bundled scripts act as guardrails.

In this article, I used LLM persona interviews as a case study and walked through key design choices behind that workflow: structuring personas as objects and designing panel diversity up front. The core concepts were inspired by Microsoft’s TinyTroupe research.

The full SKILL.md, Python code, and a detailed demo for claude-persona are on GitHub.

Key takeaways

  • A Claude Code skill sits between ad hoc prompting and a Python library. The balance of flexibility and guardrails makes it a good fit for packaging AI workflows that repeat but are not identical each run.
  • LLM persona interviews become much more reliable once you structure personas as objects and design panel-level diversity deliberately.
  • If you have an AI workflow that is too fragile as a prompt but too fluid to justify a library, a Claude Code skill might be the right middle layer.

If you have questions or want to share what you built, find me on LinkedIn.
